The Web and its Chemistry.
Henry S Rzepa
Department of Chemistry, Imperial College, London, SW7 2AY.
Introduction.
Chemical knowledge has been collected and indexed on a global scale for more than ninety years, and this body of information now
includes descriptions of more than ten million discrete chemicals and their
properties. Chemicals in turn can be considered as collections of
objects (atoms) connected by bonds, with a higher-order structure
defined by three-dimensional coordinates or other spatial descriptors.
Some very sophisticated chemical display, searching and indexing methods have been
developed over the last twenty years by a number of commercial and academic
information providers, but these tools have rarely been suitable
for teachers or researchers wishing to present and communicate their
own work on a global scale.
A Chemical MIME Type.
The World-Wide Web offers an excellent mechanism with which
to unify and further develop
chemical information technology. When we implemented first Gopher and
subsequently Web servers to disseminate our research results,
it became apparent that none of the existing MIME types was ideally
suited to classifying chemical information. Whilst many of the chemical data
formats developed during the computer age could be classified as text
files, their content is capable of a much richer interpretation. We
propose that a new primary MIME type, to be known as
chemical, be defined, with initially at least three
secondary group types:
- chemical constitution and connection types
- three dimensional and chemical property types
- chemical spectroscopic and analytical formats
These formats cover a broad spectrum of chemical and information types, including
small molecules, macromolecules and biomolecules such as
proteins and enzymes. Approximately twenty specific and standard
formats already exist and are in common use for defining chemical
information. Typical examples might be;
We expect that the adoption and use of such descriptors will
encourage the development of new chemically orientated browsing
tools, for use both stand-alone and within the context of the web.
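As a minimal sketch of how a server might assign such descriptors, the following maps file extensions to types under the proposed chemical primary type. Only chemical/pdb is named in this article; the other subtype names, the extension table and the function names are assumptions made purely for illustration.

```python
import os

# Hypothetical extension-to-type table. Only "chemical/pdb" appears in the
# article; the other subtype names are placeholders for illustration.
CHEMICAL_TYPES = {
    ".pdb": "chemical/pdb",         # three-dimensional coordinates
    ".mol": "chemical/x-mol",       # assumed: constitution and connection
    ".jdx": "chemical/x-jcamp-dx",  # assumed: spectroscopic data
}


def chemical_mime_type(filename, default="text/plain"):
    """Return the chemical MIME type for a filename, or a text fallback."""
    ext = os.path.splitext(filename)[1].lower()
    return CHEMICAL_TYPES.get(ext, default)


def content_type_header(filename):
    """Build the Content-Type header line a server could send for the file."""
    return "Content-Type: " + chemical_mime_type(filename)
```

A browser receiving such a header could then hand the file to a chemically aware helper program rather than displaying it as plain text.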
Three-dimensional Properties and Dynamics.
There is no doubt, however, that even within the existing MIME types,
much can be achieved in a chemical sense. One particularly high
priority is to escape the confines of the two-dimensional journal page
by adding some sense of a three-dimensional
perspective to a chemical structure. In many of the
cutting-edge themes in chemistry, such as chirality, molecular recognition or materials science,
three-dimensional aspects are paramount. Yet the best that most
printed journals can offer is a "stereo pair" that often fails to
achieve the desired effect. Some journals have taken to distributing
a floppy
disk containing, for example, molecular coordinates for use with a local program, but this is
an expensive and very much non-standard solution.
Our solution to
this problem has been to prepare video/mpeg or video/quicktime (QuickTime) animations of complex
chemical structures, and to include
these in many of our
documents mounted on the web.
This allows the viewer to perceive the structure as a three-dimensional
object, and also to some extent allows the viewing angle to be selected.
Animation also allows time dependent properties to be
effectively displayed, such as in the area of chemical simulation
known as molecular dynamics, where the
time-evolution of structure and energy is studied.
Hitherto, researchers wishing to present such results have had to
resort to film or tape based animations, and routine dissemination via journals
has not been possible. We believe that the web forms
an ideal delivery mechanism for such information.
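The frames of a simple rotating animation of the kind described above could be generated along the following lines. This is an illustrative sketch only: the function names are hypothetical, the rotation is taken about the y axis by assumption, and rendering the frames and encoding them as video are left to other tools.

```python
import math


def rotate_y(coords, angle_deg):
    """Rotate a list of (x, y, z) tuples about the y axis by angle_deg degrees."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in coords]


def animation_frames(coords, n_frames=36):
    """Return n_frames coordinate sets that together cover one full turn."""
    step = 360.0 / n_frames
    return [rotate_y(coords, i * step) for i in range(n_frames)]
```

The same loop would serve for a molecular dynamics trajectory if each frame were taken from a successive time step rather than from a rotation.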
Where particular aspects of a chemical structure might need elaboration,
we have applied ISMAP image mapping to a 2D chemical bitmap in image/gif format,
with, for example, an audio/au annotation added to explain the
particular features. This is clearly an area where the development of
suitable software tools is a high priority.
The Need for Chemical Mapping.
Although such technology offers a significant improvement on
the more conventional methods of scientific publishing, it is
apparent that far more could be accomplished. For example, a
chemical/pdb file can contain three-dimensional information
about a chemical system such as an enzyme, which could readily be used to prepare a fully
navigable 3D rendered object for viewing. This object could be
mapped in a 3D rather than a 2D sense, using perhaps atoms, bonds, amino acids or nucleotide residues as
locators rather than rectangles or circles. Since the chemical is exactly defined,
specified helper programs can be used to add value to the image,
for example by calculating molecular energies or wavefunctions for the system.
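One way to picture such 3D mapping is sketched below: the atoms are read from the fixed-column ATOM and HETATM records of a PDB file, and a selected point in space is resolved to the nearest atom, whose chain, residue and atom name then act as the locator. The function names and the locator string format are assumptions for illustration, not part of any existing tool.

```python
import math


def parse_atoms(pdb_text):
    """Extract atoms from the fixed-column ATOM/HETATM records of a PDB file."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            atoms.append({
                "name": line[12:16].strip(),     # columns 13-16: atom name
                "residue": line[17:20].strip(),  # columns 18-20: residue name
                "chain": line[21],               # column 22: chain identifier
                "resseq": int(line[22:26]),      # columns 23-26: residue number
                "xyz": (float(line[30:38]),      # columns 31-54: x, y, z
                        float(line[38:46]),
                        float(line[46:54])),
            })
    return atoms


def pick(atoms, point):
    """Return the atom nearest to a selected 3D point."""
    return min(atoms, key=lambda a: math.dist(a["xyz"], point))


def locator(atom):
    """Hypothetical locator string of the form chain:residue-number:atom."""
    return "{chain}:{residue}{resseq}:{name}".format(**atom)
```

A server could then associate each locator with a link or annotation, much as rectangles and circles are associated with links in a 2D image map.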
One might also imagine the copy-paste metaphor for text and images
being applied via the mechanism of chemical MIME types. Full or partial molecular
structures could be selected and pasted into other chemically
cognisant programs in the way that text is currently handled. Further
extensions of this concept could involve chemical FORMS, in which
chemical structures rather than text are defined, and indexing is based
on chemical sub-structures and functional groups. Much of the methodology for
implementing sub-structure searches has been developed, and the integration
of such techniques into web indexing seems highly desirable.
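To make the idea of sub-structure indexing concrete, the following is a deliberately naive sketch of a sub-structure test by backtracking over atom assignments. It is not one of the established search methods referred to above, which are far more efficient; the data layout (element symbols plus a bond table keyed by atom pairs, hydrogens omitted) and all names are assumptions for illustration.

```python
def bond_table(*triples):
    """Build a bond table from (atom_i, atom_j, bond_order) triples."""
    return {frozenset((i, j)): order for i, j, order in triples}


def contains_substructure(mol, query):
    """Return True if every atom and bond of query can be matched within mol.

    mol and query are (atoms, bonds) pairs: atoms is a list of element
    symbols, bonds maps frozenset({i, j}) to a bond order.
    """
    m_atoms, m_bonds = mol
    q_atoms, q_bonds = query

    def compatible(q_idx, m_idx, mapping):
        if q_atoms[q_idx] != m_atoms[m_idx]:
            return False
        # Each query bond to an already mapped atom must exist in mol
        # with the same bond order.
        for q_other, m_other in mapping.items():
            order = q_bonds.get(frozenset((q_idx, q_other)))
            if order is not None and m_bonds.get(frozenset((m_idx, m_other))) != order:
                return False
        return True

    def extend(mapping):
        if len(mapping) == len(q_atoms):
            return True
        q_idx = len(mapping)  # query atoms are assigned in index order
        for m_idx in range(len(m_atoms)):
            if m_idx not in mapping.values() and compatible(q_idx, m_idx, mapping):
                mapping[q_idx] = m_idx
                if extend(mapping):
                    return True
                del mapping[q_idx]
        return False

    return extend({})
```

For example, with heavy atoms only, acetic acid could be written as (["C", "C", "O", "O"], bond_table((0, 1, 1), (1, 2, 2), (1, 3, 1))) and a carbonyl query as (["C", "O"], bond_table((0, 1, 2))).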
The Role of High Bandwidth Networks.
The rapid development of high capacity national network backbones, and
the availability of individual workstations with network connections running at
up to 100 Mbps, has enormous significance for promoting both
national and international collaborations, and for enhancing
productive liaisons within universities for both research and
teaching. The use of chemical MIME types and the web
as a management tool is a feature of our own pilot ATM co-project
with the University of Leeds, using the UK SuperJANET network
and involving videoconferencing,
the real-time display of chemical images and whiteboarding techniques.
We have each installed our own web home page for the project,
with a link to the
other site, and with experimental chemical MIME types mapped to
commonly used helper programs. Of equal importance is the prospect
of collaborating with the chemical and pharmaceutical industries,
provided important aspects of security and the "firewall" are resolved.
The Issues Raised.
In common with other areas of science, such enabling technology
brings with it many fascinating new issues which
need to be discussed and resolved:
- Establishing mechanisms for ensuring quality control via suitable peer review.
- Authenticating data with a date stamp to establish priority.
- Developing permanent archives for web based chemical information.
- Establishing how scholarly electronic publication may be recognised for career development and funding applications.
These four themes derive from the current acceptance that
printed journals constitute an infrastructure, developed over
several hundred years, for permanently archiving material which has been
subject to some form of quality control. We
are now asking for entirely new mechanisms to be implemented,
recognised and routinely used, on a timescale of just a year or
two! Perhaps encryption schemes
are now inevitable, but in their implementation, the scientific
community must be careful to protect the mechanisms that have allowed
serendipity to flourish and teaching to develop. Some element of
cost recovery from authors rather than readers
might perhaps also be considered.
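The date-stamping theme listed above can be illustrated by pairing a cryptographic digest of the data with a timestamp, as in the sketch below. The choice of SHA-256, the record layout and the function names are all assumptions; a stamp issued by the author alone establishes nothing, so in practice the record would need to be lodged with, or signed by, an independent trusted party.

```python
import hashlib
from datetime import datetime, timezone


def date_stamp(data, when=None):
    """Pair a SHA-256 digest of the data (bytes) with a UTC timestamp."""
    when = when or datetime.now(timezone.utc)
    return {
        "sha256": hashlib.sha256(data).hexdigest(),
        "stamped": when.isoformat(),
    }


def verify(data, record):
    """Check that the data still matches the digest held in the record."""
    return hashlib.sha256(data).hexdigest() == record["sha256"]
```

Any later change to the data, however small, alters the digest, so a lodged record can show that a particular version existed at the stated time.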
At the beginning of 1994, only four predominantly chemically orientated web
servers existed. The list has grown substantially since then.
Given the wide ranging possibilities for developing chemically aware
browsing mechanisms, some of which have been outlined above, there
is every prospect that the exponential growth in this area will be
maintained.
For further information, contact the author via either E-mail:
rzepa@ic.ac.uk or the URL: http://www.ch.ic.ac.uk/rzepa.html