The Web and its Chemistry.

Henry S Rzepa

Department of Chemistry, Imperial College, London, SW7 2AY.


Introduction.

Chemical knowledge has been collected and indexed on a global scale for more than ninety years, and this body of information now includes descriptions of more than ten million discrete chemicals and their properties. Chemicals in turn can be considered as collections of objects (atoms) connected by bonds, with a higher-order structure defined by three-dimensional coordinates or other spatial descriptors. Some very sophisticated chemical display, searching and indexing methods have been developed over the last twenty years by a number of commercial and academic information providers, but these tools have rarely been suitable for teachers or researchers wishing to present and communicate their own work on a global scale.

A Chemical MIME Type.

The World-Wide-Web system offers an excellent mechanism with which to unify and further develop chemical information technology. It became apparent to us, in implementing first gopher and subsequently WWW servers as mechanisms for disseminating our research results, that none of the existing MIME types was ideally suited to classifying chemical information. Whilst many of the chemical data formats developed during the computer age could be classified as text files, their content is capable of a much richer interpretation. We are proposing that a new primary MIME type, to be known as chemical, be defined, with initially at least three secondary group types.

These formats cover a broad spectrum of chemical and information types, including small molecules, macromolecules and biomolecules such as proteins and enzymes. Approximately twenty specific and standard formats already exist and are in common use for defining chemical information. Typical examples might be;

We expect that the adoption and use of such descriptors will encourage the development of new chemically orientated browsing tools for use both as stand-alone tools and within the context of the web.
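As an illustration of how a browsing tool might recognise such a descriptor, the following minimal Python sketch registers a chemical type alongside the existing ones and classifies a file name. Only chemical/pdb is taken from this paper; the ".pdb" file extension, the function names and the fallback type are assumptions made for the example, not part of the proposal itself.

```python
import mimetypes

# Assumption for illustration: files ending in ".pdb" carry the
# "chemical/pdb" type mentioned in this paper. Other secondary types
# would be added to this table in the same way.
CHEMICAL_TYPES = {".pdb": "chemical/pdb"}


def build_registry():
    """Return a MIME registry extended with the chemical types above."""
    registry = mimetypes.MimeTypes()
    for extension, mime_type in CHEMICAL_TYPES.items():
        registry.add_type(mime_type, extension)
    return registry


def classify(filename, registry=None):
    """Return the MIME type for a file name, with a generic fallback."""
    registry = registry or build_registry()
    mime_type, _encoding = registry.guess_type(filename)
    return mime_type or "application/octet-stream"


def primary_type(mime_type):
    """Return the primary part of a MIME type, e.g. 'chemical'."""
    return mime_type.split("/", 1)[0]


if __name__ == "__main__":
    print(classify("enzyme.pdb"))
    print(primary_type(classify("enzyme.pdb")))
```

A browser or server could branch on the primary type returned here to decide whether a chemically aware helper program should handle the content.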

Three-dimensional Properties and Dynamics.

There is no doubt, however, that even within the existing MIME types, much can be achieved in a chemical sense. One particularly high priority is to escape the confines of the two-dimensional journal page by enabling some sense of a three-dimensional perspective to be added to a chemical structure. In many of the cutting-edge themes in chemistry such as chirality, molecular recognition or materials science, three-dimensional aspects are paramount. Yet the best that most printed journals can offer is a "stereo pair" that often fails to achieve the desired effect. Some journals have taken to distributing a floppy disk containing e.g. molecular coordinates for use with a local program, but this is an expensive and very much non-standard solution.

Our solution to this problem has been to prepare video/mpg or video/mov (Quicktime) animations of complex chemical structures, and to include these in many of our documents mounted on the web. This allows the viewer to perceive the structure as a three-dimensional object, and also to some extent allows the viewing angle to be selected. Animation also allows time-dependent properties to be displayed effectively, such as in the area of chemical simulation known as molecular dynamics, where the time-evolution of structure and energy is studied. Hitherto, researchers wishing to present such results have had to resort to film- or tape-based animations, and routine dissemination via journals has not been possible. We believe that the web forms an ideal delivery mechanism for such information.
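The idea behind such a rotation animation can be sketched in a few lines of Python. This is not the procedure used to prepare the animations described above; it is a minimal illustration, assuming rotation about the y axis and a simple orthographic projection, and every function and parameter name is hypothetical. Each frame holds the 2D positions of the atoms at one viewing angle; a rendering program would draw these and encode the frames as a video.

```python
import math


def rotate_y(point, angle_deg):
    """Rotate a 3D point about the y axis by angle_deg degrees."""
    x, y, z = point
    a = math.radians(angle_deg)
    return (
        x * math.cos(a) + z * math.sin(a),
        y,
        -x * math.sin(a) + z * math.cos(a),
    )


def animation_frames(coords, n_frames=36):
    """Return n_frames lists of (x, y) positions covering one full turn."""
    step = 360.0 / n_frames
    frames = []
    for i in range(n_frames):
        rotated = [rotate_y(p, i * step) for p in coords]
        # Orthographic projection: drop the z coordinate.
        frames.append([(x, y) for x, y, _z in rotated])
    return frames


if __name__ == "__main__":
    # Two illustrative atom positions, not taken from a real structure.
    for frame in animation_frames([(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], n_frames=4):
        print(frame)
```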

Where particular aspects of a chemical structure might need elaboration, we have introduced image mapping (ISMAP) into a 2D chemical bitmap in image/gif format, with for example an audio/au annotation added to explain the particular features. This is clearly an area where the development of suitable software tools is a high priority.

The Need for Chemical Mapping.

Although such technology offers a significant improvement on the more conventional methods of scientific publishing, it is apparent that far more could be accomplished. For example, a chemical/pdb file can contain three-dimensional information about a chemical system such as an enzyme, which could readily be used to prepare a fully navigable 3D rendered object for viewing. This object could be mapped in a 3D rather than a 2D sense, using perhaps atoms, bonds, amino acids or nucleotide residues as locators rather than rectangles or circles. Since the chemical is exactly defined, specified helper programs can be used to add value to the image, for example by calculating molecular energies or wavefunctions for the system.
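A minimal Python sketch of what mapping by atom or residue might involve is given below. It reads the fixed-column ATOM records of the PDB format and returns the atom nearest to a selected point in space, so that the residue, rather than a rectangle or circle, acts as the locator. The three sample records are invented coordinates for illustration only, and the function and class names are hypothetical.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    name: str       # atom name, e.g. "CA"
    res_name: str   # residue name, e.g. "GLY"
    chain: str      # chain identifier
    res_seq: int    # residue sequence number
    xyz: tuple      # (x, y, z) in angstroms


# Invented sample records laid out in the PDB fixed-column format.
SAMPLE_PDB = "\n".join([
    "ATOM      1  N   GLY A   1       0.000   0.000   0.000",
    "ATOM      2  CA  GLY A   1       1.450   0.000   0.000",
    "ATOM      3  N   SER A   2       3.300   1.200   0.000",
    "END",
])


def parse_pdb_atoms(text):
    """Extract atoms from ATOM/HETATM records using fixed columns."""
    atoms = []
    for line in text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        atoms.append(Atom(line[12:16].strip(), line[17:20].strip(),
                          line[21], int(line[22:26]), xyz))
    return atoms


def nearest_atom(atoms, point):
    """Return the atom closest to a selected 3D point."""
    return min(atoms, key=lambda atom: math.dist(atom.xyz, point))


if __name__ == "__main__":
    hit = nearest_atom(parse_pdb_atoms(SAMPLE_PDB), (3.0, 1.0, 0.0))
    print(hit.res_name, hit.res_seq, hit.name)
```

The residue identified in this way could then serve as the anchor for a hyperlink or annotation, in the same manner as a region of a 2D image map.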

One might also imagine the copy-paste metaphor for text and images being applied via the mechanism of chemical MIME types. Full or partial molecular structures could be selected and pasted into other chemically cognisant programs in the way that text is currently handled. Further extensions of this concept could involve chemical FORMS, in which chemical structures rather than text are defined, and indexing is based on chemical sub-structures and functional groups. Much of the methodology for implementing sub-structure searches has been developed, and the integration of such techniques into web indexing seems highly desirable.
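The principle of a sub-structure search can be sketched as a small graph-matching routine in Python. This is a deliberately simplified illustration and not one of the established methods referred to above: it ignores hydrogens, bond orders and stereochemistry, uses plain backtracking, and all names are hypothetical. A molecule is represented as a list of element symbols plus a list of bonded atom pairs.

```python
def make_molecule(elements, bonds):
    """Build (element table, adjacency table) from atoms and bonded pairs."""
    adjacency = {i: set() for i in range(len(elements))}
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return dict(enumerate(elements)), adjacency


def has_substructure(target, query):
    """Return True if the query graph can be embedded in the target graph."""
    t_elem, t_adj = target
    q_elem, q_adj = query
    order = list(q_elem)

    def extend(mapping):
        if len(mapping) == len(order):
            return True  # every query atom has been placed
        q = order[len(mapping)]
        for t in t_elem:
            if t in mapping.values() or t_elem[t] != q_elem[q]:
                continue
            # Every already-placed neighbour of q must be bonded to t.
            if all(mapping[n] in t_adj[t] for n in q_adj[q] if n in mapping):
                mapping[q] = t
                if extend(mapping):
                    return True
                del mapping[q]  # backtrack
        return False

    return extend({})


if __name__ == "__main__":
    # Heavy-atom skeleton of ethanol: C-C-O
    ethanol = make_molecule(["C", "C", "O"], [(0, 1), (1, 2)])
    c_o = make_molecule(["C", "O"], [(0, 1)])
    o_c_o = make_molecule(["O", "C", "O"], [(0, 1), (1, 2)])
    print(has_substructure(ethanol, c_o))
    print(has_substructure(ethanol, o_c_o))
```

An index built on such matches would allow a chemical FORM to return every document containing a given functional group, rather than every document containing a given word.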

The Role of High Bandwidth Networks.

The rapid development of high-capacity national network backbones, and the availability of individual workstations with network connections running at up to 100 Mbps, has enormous significance for promoting both national and international collaborations, and for enhancing productive liaisons within universities for both research and teaching. The use of chemical MIME types and the web as a management tool is a feature of our own pilot ATM co-project with the University of Leeds using the UK SuperJANET network and involving videoconferencing, the real-time display of chemical images and whiteboarding techniques. We have each installed our own web home page for the project, with a link to the other site, and with experimental chemical MIME types mapped to commonly used helper programs. Of equal importance is the prospect of collaborating with the chemical and pharmaceutical industries, provided important aspects of security and the "firewall" are resolved.

The Issues Raised.

In common with other areas of science, such enabling technology brings with it many fascinating new issues which need to be discussed and resolved:

These four themes derive from the current acceptance that printed journals constitute an infrastructure, developed over several hundred years, for permanently archiving material which has been subject to some form of quality control. We are now asking for entirely new mechanisms to be implemented, recognised and routinely used, on a timescale of just a year or two! Perhaps encryption schemes are now inevitable, but in their implementation, the scientific community must be careful to protect the mechanisms that have allowed serendipity to flourish and teaching to develop, and where perhaps some element of cost recovery from authors rather than readers might be considered.

At the beginning of 1994, only four predominantly chemically orientated web servers existed. The list has grown substantially since then. Given the wide ranging possibilities for developing chemically aware browsing mechanisms, some of which have been outlined above, there is every prospect for the exponential growth in this area to be maintained.


For further information, contact the author via either E-mail: rzepa@ic.ac.uk or the URL: http://www.ch.ic.ac.uk/rzepa.html