Towards an Integrated Chemistry Information Environment: Introduction
Although experiments in on-line provision of chemical information and
journals can be traced back to the early 1970s, such services were
introduced into most chemistry departments only in the mid 1980s,
often via only a single low bandwidth "point of presence". The use of
such services in taught courses has been much more variable, largely
because significant, and in a teaching environment unmanageable, costs
were associated with these services. During this period, the standard
user interface was mostly restricted to either the 24 line by 80
character telnet "VT100" terminal mode, or Tektronix 4014 vector
graphics mode. There was little integration between various
information services from the point of view of query formulation or
chemical structure definition. Equally variable was the quality of
documentation and on-line help, too often depending purely on
the user having access to printed material.
Other significant limitations were the lack of integration of
information sources into other laboratory-based exercises or
molecular modelling themes and, on a wider scale, the failure to
incorporate related projects being conducted in other teaching
faculties around the world.
Recently, solutions to such problems have evolved
in both a commercial and an educational context.
The commercial model is an interesting one, in
that it must of necessity evolve around robust and
affordable charging models. For example, Current
Science has recently launched the "BioMedNet" club,
which offers an environment in which subscribers
can browse electronic journals, perform keyword
searches, and have access to other network
resources in a self-consistent and "user-friendly"
manner. This club makes use of standard software
such as a World-Wide Web client and an assumed
Internet connectivity to provide access. A rather
different, and very much more proprietary model is
the "SciFinder"
interface to the Chemical Abstracts database,
representing the latest stage in the 15 year
evolution of on-line services provided by this
organisation. SciFinder in its current state of
development is very much a closed turn-key
client-single server system which does not appear
to offer a viable model for the implementation of
any local teaching resources. Moreover, the
current cost of subscribing to such a service
would represent a very significant increase in
most library or teaching budgets, at a time when
these budgets are under severe pressure to
contract. The focus of SciFinder on the commercial
sector means that this charging model fits with
difficulty into any teaching environment.
A quite different open approach is rapidly evolving in many
teaching institutions and is based on a client-multiple
server model known as the World-Wide Web system which you
are using to view this document. The Web originated in 1989
at the European Laboratory for Particle Physics (CERN) with
the first definition of HTML (HyperText Markup Language)
and a transfer protocol called HTTP
(HyperText Transfer Protocol). The participation of the
National Center for Supercomputing Applications (NCSA) in
1993 introduced a Web client called Mosaic, which allowed a
combination of text and two dimensional images to be used to
create a cohesive environment for describing various
information services. The real technical innovation of the
Web over earlier hypertext systems was the introduction of a
global resource locator known as a URL (Uniform Resource
Locator), which allows a section of text or a graphic to be
seamlessly linked to other relevant documents or resources
anywhere on the Internet.
Starting in September 1993, a significant chemical
presence began to build up using these
technologies [1].
The "critical mass" was probably achieved in 1995,
when for the first time it became
possible to devise an experiment in molecular
information retrieval which could be completely
integrated, not merely on a local but on a global
scale, with other chemical resources [2]. In this
article, we describe our own implementation of an
experiment in molecular information retrieval in a
teaching environment which takes into account the
increasing molecular richness and diversity of the
Internet.
[ Abstract |1: Introduction | 2: The Basic Chemical Web
Technologies|MIME |CSML |VRML | Java | CML | The Virtual Chemistry Library | The Future | Acknowledgements and References | What's New ]
The last two years have seen the introduction of a number of
World-Wide Web clients designed to display the contents of a
document written in hypertext markup language (HTML).
Programs such as NCSA Mosaic, Microsoft Internet Explorer,
Netscape Navigator or Apple Cyberdog allow the two
dimensional page metaphor used in traditional chemistry
texts to be translated to on-line form. However, there is no
reason why we should continue to be bounded by two
dimensions. From the outset, we considered it necessary to
develop an infra-structure for linking documents written in
HTML with chemically specific datafiles, which could be
processed in an explicitly molecular manner by the user. In
this section, we discuss various methods that we have
evolved over the last two years for more closely integrating
chemistry into the traditional "document".
Our first solution to
this problem was to adopt a mechanism derived from mail
handling programs called MIME or Multipurpose Internet Mail
Extensions [3].
This mechanism is integrated in a generic
manner into most HTML browsers. Our particular
implementation of this was termed chemical MIME [4]. It
enables a browser to pass on any documents of an explicitly
chemical nature to a program of the user's choice present on
their computer. This means that HTML documents can contain
hyperlinks to chemical data, which can then be displayed in
a visual manner which a generic HTML browser is incapable
of. A typical example is a hyperlink to a "pdb" file
containing 3D molecular coordinates, which can be displayed
using an external program such as RasMol or as an "in-lined"
molecule using Chemscape Chime (Figure 1).
Figure 1. Adenosine triphosphate displayed using chemical MIME.
If you have installed Chemscape Chime as a "plug-in" to Netscape 2.0, this
molecule should appear as a rotating image. If you are using other WWW
clients, and have configured the chemical MIME type as chemical/x-pdb,
clicking on the thumbnail image will activate the molecule.
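The dispatch step behind chemical MIME can be illustrated with a minimal sketch, assuming a Python environment rather than any particular browser. The table HELPER_PROGRAMS and the function helper_for are hypothetical names introduced here for illustration; only the MIME type chemical/x-pdb and the RasMol viewer come from the text above. The sketch registers the ".pdb" extension with the standard-library type map and then looks up which external program should receive the file.

```python
import mimetypes
from typing import Optional

# Hypothetical helper table: MIME type -> external viewer command.
# A real browser keeps an equivalent table in its own configuration.
HELPER_PROGRAMS = {"chemical/x-pdb": "rasmol"}

# Tell the standard-library type map that .pdb files are chemical/x-pdb.
mimetypes.add_type("chemical/x-pdb", ".pdb")


def helper_for(filename: str) -> Optional[str]:
    """Return the viewer configured for this file, or None if none is set."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    return HELPER_PROGRAMS.get(mime_type)


print(helper_for("atp.pdb"))  # rasmol
```

A file whose type has no entry in the table simply yields None, which corresponds to a browser that has not been configured for chemical MIME.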
Whilst such an approach is capable of adding a rich seam of
chemical content to a document, there are specific
limitations which soon become apparent. Because programs
such as RasMol or derived plug-ins such as Chime cannot
themselves resolve hyperlinks, the chemically specific
document becomes something of a cul-de-sac into which
further hyperlinks cannot easily be inserted. For example,
one might wish to have a hyperlink in the master HTML
document which when invoked might highlight one specific
atom or functional group in the molecule display. To accomplish this, it is
necessary to establish subsequent communication from the original HTML
document to the chemical display window. Our original solution to this
specific problem was to develop what we termed "CSML" or
chemical-structure-markup-language [3b], achieving communication
between the HTML browser and RasMol using a feature built
into the Unix version of RasMol, by which a script can
communicate with a running RasMol process. By this means, we
were able to associate peaks in a 2D NMR spectrum displayed
in an HTML document with the individual protons responsible,
highlighted in a molecule display window (Figure 2). The
user could navigate around the spectrum using a device known
as an "image-map", identifying individual proton pairs as
they went. Subsequently, the CSML mechanism has also been
integrated into the Chemscape Chime plug-in, and applied to
a "molecule-of-the-month"
current awareness collection at Imperial College. Such annotation provides
a powerful new teaching tool for use on the Web.
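The link between an image-map and the molecule display can be sketched roughly as follows. This is not the CSML implementation itself, whose syntax is not given here; it is a hypothetical illustration in which each clickable rectangle on the spectrum carries a list of atom serial numbers, and a click is translated into a short RasMol script. The names PeakRegion, find_region and rasmol_script, and the pixel coordinates and atom numbers in the usage lines, are all assumptions. The script text uses RasMol's select and colour commands with the atomno property.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class PeakRegion:
    """A rectangular hot spot on the spectrum image and the atoms it maps to."""
    left: int
    top: int
    right: int
    bottom: int
    atom_numbers: Tuple[int, ...]


def find_region(regions: Sequence[PeakRegion], x: int, y: int) -> Optional[PeakRegion]:
    """Return the first region containing the clicked pixel, if any."""
    for region in regions:
        if region.left <= x <= region.right and region.top <= y <= region.bottom:
            return region
    return None


def rasmol_script(region: PeakRegion) -> str:
    """Build a script that resets the colouring, then highlights the atoms."""
    selection = " or ".join(f"atomno={n}" for n in region.atom_numbers)
    return f"select all\ncolour cpk\nselect {selection}\ncolour yellow\n"


# Hypothetical cross-peak linking the protons on atoms 12 and 15.
regions = [PeakRegion(10, 10, 20, 20, (12, 15))]
clicked = find_region(regions, 15, 15)
if clicked is not None:
    print(rasmol_script(clicked), end="")
```

How the script text reaches the running viewer is left out deliberately, since that depends on the platform-specific mechanism mentioned above.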
The information display mechanisms described thus far
represent essentially a one-directional communication
between a hyperlinked document and a molecular visualiser.
There is no capability for reversing the direction, from a
"marked-up" molecule to HTML or other documents. Two recent
developments offer solutions to this problem. During 1995, a
three dimensional object description language called VRML or
Virtual Reality Modelling Language was introduced. If HTML
is thought of as a language used to choreograph the two
dimensional ASCII character set, then VRML would correspond
to a similar description of a set of three dimensional
objects such as spheres, cylinders and other primitive
graphical objects. A VRML browser can display these objects
in 3D space, and the user can navigate around in this space.
Unlike a custom display program such as RasMol, VRML
browsers also fully support the hyperlink concept via URLs.
Thus a molecule described using VRML can have hyperlinks
associated with individual atoms or larger groups, so that a
bidirectional information flow between, say, an HTML and a
VRML document can now be achieved, with each invoking the
other. As with RasMol, the VRML scene can be rendered in
either a separate window, or as an in-line image using a
"plug-in".
Figure 3. Dimethyl sulfate encoded in VRML, containing embedded hyperlinks
associated with individual atoms and bonds, illustrating the hydrolysis of
this species.
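To make the idea of hyperlinked atoms concrete, the following minimal sketch generates a VRML 1.0 scene in which each atom is a sphere wrapped in a WWWAnchor node, so that selecting the sphere follows a URL. The function names atom_node and vrml_scene are hypothetical, as are the coordinates, radii, colours and target URLs in the usage lines; they are placeholders and not the geometry of the molecule in Figure 3.

```python
def atom_node(x, y, z, radius, colour, url, label):
    """Return a VRML 1.0 WWWAnchor that wraps one sphere representing an atom."""
    r, g, b = colour
    return (
        "WWWAnchor {\n"
        f'  name "{url}"\n'
        f'  description "{label}"\n'
        "  Separator {\n"
        f"    Translation {{ translation {x} {y} {z} }}\n"
        f"    Material {{ diffuseColor {r} {g} {b} }}\n"
        f"    Sphere {{ radius {radius} }}\n"
        "  }\n"
        "}\n"
    )


def vrml_scene(atoms):
    """Assemble a complete VRML 1.0 file from a list of atom descriptions."""
    body = "".join(atom_node(*atom) for atom in atoms)
    return "#VRML V1.0 ascii\nSeparator {\n" + body + "}\n"


# Placeholder data: (x, y, z, radius, (r, g, b), url, label).
atoms = [
    (0.0, 0.0, 0.0, 1.0, (1.0, 1.0, 0.0), "hydrolysis.html#sulfur", "S"),
    (1.5, 0.0, 0.0, 0.7, (1.0, 0.0, 0.0), "hydrolysis.html#oxygen", "O"),
]
print(vrml_scene(atoms), end="")
```

Each atom sits inside its own Separator so that its translation and material do not leak into the next one, which is how VRML 1.0 scopes state changes.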
Currently, VRML in version 1.0 supports no chemical
semantics, i.e. bonds and atoms are not explicitly
identified as such, and hence the way they are displayed
cannot be changed. Java is a programming language which
allows the molecular display code (e.g. RasMol), the display
data (e.g. a pdb file) and the hyperlink communication to be
built into a single file, or "applet". The applet window is
in-lined into the main body of the HTML document.
Furthermore, two or more Java applets can establish mutual
communication, such that a 2D NMR spectrum can be associated
with a 3D rotatable model of the corresponding molecule,
with appropriate atoms again highlighted. Thus Java allows
small compact applets to be written by users for a specific
task. In this, it does not necessarily supersede a
specialised display program such as RasMol, and all three
mechanisms outlined above have their particular roles to
play in creating a rich chemical environment for the
user.
Because Java is highly customisable, and also secure,
several other issues come to the fore which the community
will need to solve. The first is the recognition that two or
more Java applets may need to intercommunicate. To achieve
this easily and seamlessly, chemical standards will have to
be created. Secondly, some
mechanism for indexing the action and content of a Java
applet will need to be created. Such issues also apply to
the VRML concepts outlined above. We envisage the major
thrust of such work coming from the commercial software
developers, but perhaps with an impartial standards body set
up to attempt to control the evolution.
Figure 4. A molecule rendered using a Java applet.
If your WWW client is "Java-aware", the image you see should be
rotatable. If your client does not support Java, a simple static
image will be present.
The technologies covered thus far relate to the visualisation and
interpretation of molecular coordinate data, with spectral data represented
as simple bit-mapped images. However,
the variety of disciplines and techniques that chemistry covers is
enormous, so it's not surprising that information exchange between
different types of molecular datafile is difficult.
It is generally accepted that the best way to tackle these problems is
through the use of markup languages. You
are reading a document written in a markup language (HTML) at the moment! Markup languages
add meta-information to a document to tell the recipient more
about it. In this spirit, we have started to develop what is termed
Chemical Markup Language (CML) [5].
CML consists of three parts (in ascending hierarchy):
These are quite general, so that markup might appear as
<X.VAR TITLE="Heat of Evaporation" REL="glossary"
HREF="/chem/theor?deltahevap"
UNITS="kilocalorie/mole">34.12</X.VAR>
The most important result of this is that a very large body of current
chemical information can be encoded with CML. CML documents can have
a very flexible structure and have already been used to describe
precisely:
- Instrument output (e.g. spectra and crystallography).
- Program output (e.g. molecular orbital calculations).
- Database entries.
- Publications (management of whole papers is already tractable).
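The X.VAR fragment shown earlier illustrates why such markup is useful to programs as well as to readers: the value, its units and its glossary link can all be recovered mechanically. The sketch below is one possible way of doing so, assuming the fragment is well-formed enough to be read by Python's standard XML parser; the text does not name a parser, and the function name read_variable is hypothetical.

```python
import xml.etree.ElementTree as ET

# The markup fragment quoted in the text, joined onto one line.
FRAGMENT = (
    '<X.VAR TITLE="Heat of Evaporation" REL="glossary" '
    'HREF="/chem/theor?deltahevap" '
    'UNITS="kilocalorie/mole">34.12</X.VAR>'
)


def read_variable(markup):
    """Extract the title, numeric value, units and link from one X.VAR element."""
    element = ET.fromstring(markup)
    return {
        "title": element.get("TITLE"),
        "value": float(element.text),
        "units": element.get("UNITS"),
        "link": element.get("HREF"),
    }


variable = read_variable(FRAGMENT)
print(variable["title"], variable["value"], variable["units"])
# Heat of Evaporation 34.12 kilocalorie/mole
```

Because the units travel with the number, a receiving program can convert or check the quantity without guessing what the bare figure 34.12 means.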
In the future, we expect mechanisms
such as this to achieve a closer integration of virtual
chemistry libraries.