During this 150-200 year evolution of the learned printed journal or book in molecular science, a rich, subtle and sophisticated notation has developed to describe molecules and their properties on the printed page. For some time now, an international committee structure (IUPAC, the International Union of Pure and Applied Chemistry) has overseen the systematisation and publication of this nomenclature. It has been estimated that representing this notation on the printed page requires some 3000 special typesetting characters in molecular science. Amongst the younger generations of scientists, the skills necessary to translate complex molecular concepts into printed notation are increasingly absent. In part at least, this has been brought about in the last ten years by the introduction of a range of "chemical structure" drawing programs, which allow a highly visual representation of molecular structures to be constructed, thus by-passing the need to use the more traditional printers' symbols and nomenclature. This trend has been reinforced by the growing tendency of scientists to manipulate graphic objects resulting from computer-based database searches, molecular modelling and other techniques. The effect in turn is that fewer chemists rely on purely text-based nomenclature to formulate and communicate their ideas.
In the last ten years then, most scientists have acquired access to software tools that enable them to generate accurate descriptions of their subject in a variety of digital formats. With few exceptions, the ultimate destination of all these digital formats is, ironically, the printed page of the conventional journal. But consider the reader's point of view. They are faced with an essentially "analogue" medium, and if they require quantitative information they will have to re-key it into a computer, with all the attendant risk of transcription error. OCR (optical character recognition) is another option, but its relatively poor accuracy means it is still little used in this context, especially since chemistry has its own unique character set! Molecular structures present even more of a challenge, since 100% accuracy must really be achieved (there is often little or no internal redundancy in such structures). Similar operations would be needed on symbolic or mathematical equations, and the difficulties become extreme when it comes to recognising instrumental data, which is often presented in highly consolidated or perhaps symbolic form in the journal.
It is as well to remind oneself of the philosophy of the scientific paper, which is not only to introduce new concepts and theories to the reader, but to provide the reader with sufficient information to allow them to reproduce the original research and experiments as appropriate. In part this arose as an "error-correction" mechanism, so that erroneous or faulty concepts and theories could be edited out of the "body of knowledge". In practice, with the widespread adoption of printed journals, most readers were also faced with identifying the inevitable transcription and typesetting errors contained on the printed page. Whilst a refereeing system does exist to catch both the errors of science and the transcription errors, neither referee nor reader has any "tools" to assist them in this process other than their own eyes and minds! The reader has in effect to use their own knowledge of the subject to identify whether the scientific "checksums" are correct. Because of the modern career pressures to publish, one suspects that few papers are ever fully subjected to such a rigorous analysis of either their factual content or the scientific concepts presented. To put it simply, whereas the computer era has enabled most of us to simplify the process of producing a learned article, it has done almost nothing to help us read, understand and apply published materials.
The advent of electronic journals, where in principle at least the content can be made fully digital, allows us finally to explore how the science of reading articles, as opposed to producing them, can be improved. In this respect, it is unfortunate that most of the discussion of electronic journals has not centered around this requirement, but on rather different and, it has to be said, much more commercial points of view. Thus librarians see the medium as an opportunity to re-vitalise a market where the spiraling costs of many journals appear to be casting doubts on the viability of small and departmental libraries. Publishers are largely focusing on what are termed "parallel" track printed and electronic publications, since this allows them to capitalise on the established reputation and quality of an existing journal, and to sell into an established market. Put crudely, the "look and feel" and the copyright ownership of the journal become important, dare we say pre-eminent, considerations. This in turn translates into an emphasis on the physical appearance of the electronic representation of the journal, on the ability of the reader to print pages that closely resemble the printed version, and on the controlled way in which the journal may be disseminated.
The central aim of the present article is to present the perspective of the reader and the scientist, one which we feel has not hitherto been so actively promoted by publishers or librarians. The scientist would very much like to see the journal move towards being regarded as much more of a scientific instrument, to be used and, more importantly, integrated into the laboratory as part of everyday research and teaching activity. Thus information and data should be capable of flowing transparently, and of course with complete digital accuracy, from say a learned article in a journal into the software, tools and instruments used for basic research. In effect, our aim is to reconcile the reader with at least a proportion of the quantitative information the original authors had at their disposal when they formulated their original theories. In the last several years, the various technologies that have become associated with the World-Wide Web have finally allowed this dream to become a reality.
Traditional journals provide all this information in printed
and often highly symbolic form. Cost decrees that only a
minimal amount of information actually appears on the
printed pages. The rest is often discarded, or at best
"deposited" in the form of printed supplementary
information, and it can require real determination to
reconcile all this information into a form that can
then be productively put to use. Thus in the electronic
version of this article, the "thumbnail image" (Figure 1)
should not simply be a static two-dimensional diagram which
still requires substantial (and possibly error-prone)
interpretation by the reader of the paper, but in fact a
starting point for the reader's own exploration of the
content. By hyperlinking this image to a file of
three-dimensional information about the molecule, the reader can
easily acquire a full set of molecular data in digital form
which is theirs to process as they wish. We have coined the
expression "hyperactive molecules" [1] to express this
concept. In this case, it is achieved by using the
following HTML markup:
<A
HREF="http://www.ch.ic.ac.uk/clic/halo.pdb"><IMG
SRC="http://www.ch.ic.ac.uk/clic/halo.gif"></A>
Implicit in this syntax is another concept we have introduced, that of chemical MIME [2]. This is a chemical implementation of a standard mechanism that allows the reader to associate files of assumed chemical content (in this case halo.pdb) with a "viewer" of their choice that is capable of processing this content, in this case to produce a rotatable image of the molecule on the screen. In the 18 months since we wrote our first "Internet draft" proposing this standard, the concept has come to be widely accepted throughout the molecular community, and we anticipate that it will be extensively used in electronic chemistry journals.
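As a rough illustration of the association described above, the following Python sketch maps a linked file to a media type and then to a viewer. It is only a sketch under stated assumptions: the type name "chemical/x-pdb" and the viewer name "molecule-viewer" are not given in this article and are used here purely for illustration, and the type is inferred from the file extension, whereas in practice the server would normally announce it when delivering the file.

```python
import mimetypes

# Assumption: "chemical/x-pdb" is taken as the chemical MIME type for .pdb files.
# Assumption: "molecule-viewer" is a hypothetical placeholder for whichever
# viewer the reader has chosen to configure.
CHEMICAL_TYPES = {".pdb": "chemical/x-pdb"}
VIEWERS = {"chemical/x-pdb": "molecule-viewer"}


def build_registry():
    """Return a MIME registry extended with the chemical types above."""
    registry = mimetypes.MimeTypes()
    for extension, mime_type in CHEMICAL_TYPES.items():
        registry.add_type(mime_type, extension)
    return registry


def viewer_for(url, registry=None):
    """Return (mime_type, viewer) for a URL; viewer is None if none is configured."""
    if registry is None:
        registry = build_registry()
    mime_type, _encoding = registry.guess_type(url)
    return mime_type, VIEWERS.get(mime_type)


if __name__ == "__main__":
    # The coordinate file is handed to the configured viewer ...
    print(viewer_for("http://www.ch.ic.ac.uk/clic/halo.pdb"))
    # ... whereas the thumbnail image has no chemical viewer associated with it.
    print(viewer_for("http://www.ch.ic.ac.uk/clic/halo.gif"))
```

The point of the design is that the article itself names only the file; which program handles it remains the reader's choice, recorded in a table such as VIEWERS.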
Another of our research projects currently involves exploring metaphors for creating 3D objects (scenes as they are referred to), using authoring environments such as virtual reality modelling language (VRML) [3]. Here again, one has the ability to directly associate visual images with the quantitative data and definitions behind them. In the case of halofantrin for example, one could hyperlink the carbon atom involved in the so-called "chiral centre" which gives rise to the R/S symbolism, with say a remotely held glossary of information describing the rules governing the Cahn-Ingold-Prelog nomenclature. The addition of the powerful and secure Java language [4] to the VRML object descriptor provides a flexible authoring environment which we anticipate will allow the creation of many new and innovative applications of electronic journals that far surpass what is currently possible with printed journals. We feel that it is only by the introduction of such tools that the user community will come to fully accept electronic journals as a valuable new resource.
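To make the idea of a hyperlinked atom concrete, the following Python sketch writes out a small VRML scene in which a single sphere is wrapped in an anchor pointing at a glossary page. It is an illustration only: it assumes the VRML 2.0 Anchor node, and the glossary URL, the atom position and the function name anchored_atom are hypothetical and do not come from the projects described here.

```python
def anchored_atom(url, description, position, radius=0.4):
    """Return VRML 2.0 text for a sphere that links to `url` when selected."""
    x, y, z = position
    lines = [
        "#VRML V2.0 utf8",
        "Anchor {",
        f'  url "{url}"',
        f'  description "{description}"',
        "  children [",
        "    Transform {",
        f"      translation {x} {y} {z}",
        "      children [",
        "        Shape {",
        "          appearance Appearance { material Material { diffuseColor 0.3 0.3 0.3 } }",
        f"          geometry Sphere {{ radius {radius} }}",
        "        }",
        "      ]",
        "    }",
        "  ]",
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical glossary address and coordinates, chosen for illustration.
    scene = anchored_atom(
        url="http://example.org/glossary/cip-rules.html",
        description="Chiral centre: Cahn-Ingold-Prelog rules",
        position=(1.2, 0.0, -0.5),
    )
    print(scene)
```

A full scene would contain one such shape per atom, with only the atoms of interest wrapped in an anchor.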
"Chemistry Markup Language" or CML [5] is another project which explores the ethos of providing full chemistry content to the reader of an electronic journal. This is a semantically rich SGML based language applicable to all areas of chemistry and molecular biology. It is designed for accurate and facile interchange and deposition of information, with the following characteristic features;
The concepts outlined above lead to an interesting potential conflict with one of the basic principles of conventional publishing, namely clear identification of copyright ownership. What we are advocating is that the reader of any learned article be actively encouraged to acquire digital, and hence exact, copies of the information and data associated with that article. Potentially, any one article might be associated with many different sources of information, some of which may reside with the publishers of the article, others of which may refer back to the original authors of the article, whilst others may point to third parties or commercial sources of information. The entire ethos of our argument is that the reader of the article becomes enabled to acquire this information easily and quickly. However, the nightmare scenario is that they are faced with negotiating a "transaction" with many different owners of this information. If for example, the reader of an article were to be faced with a demand for payment in exchange for each set of molecular coordinates or other information they wished to download, then the entire system would not function in the manner intended. If the scientific community sets out on such a road, then clear and workable guidelines will need to be formulated. Whether existing copyright law is up to this task is debatable!