NMR Analysis using Hyperactive Molecules and the World-Wide Web

Henry S. Rzepa* and Christopher Leach

Department of Chemistry, Imperial College, London, SW7 2AY.

Introduction.

The use of two-dimensional NMR and other spectroscopic data to interpret the three-dimensional structures of molecules in solution underpins much of modern chemistry research, and yet this area can be a particularly difficult one to teach using the "conventional methods" of books and printed diagrams. Conceptually, the two-dimensional NOESY technique is especially simple: a cross-peak off the diagonal indicates merely that the distance between two nuclei (normally both protons), the chemical shifts of which are associated with each of the two axes of the plot, must be within the approximate limits of 2.0-4.5 Å. The challenge lies in assigning each observed peak in the 2D NMR spectrum to a particular pair of nuclei and then deducing structural or conformational properties of the molecule. Building a complete three-dimensional structure from such information is then normally a case of gradually accumulating approximate distance information of this type and refining a geometrical model of the molecule until a probable three-dimensional structure emerges. Assuming the molecular constitution is known, and further that the bond connectivity is established and assignments deduced, only the molecular conformation remains to be determined.

To illustrate this technique in operation to our students, one of the examples we chose was the DNA oligomer abbreviated to CGCGTTTTCGCG, where the letters C, G and T refer to the nucleotides shown below and HR denotes the anomeric proton (Figure 1).



Figure 1. The basic structural components of the CGCGTTTTCGCG Oligonucleotide

As it happens, the two ends of this system are self-complementary, and fold around to form what is known as a hairpin loop. The resultant structure exhibits the well-known helical structure in the two CGCG regions, whereas the TTTT region, where the chain bends back on itself, can be expected to show anomalies. An expansion of the 2D proton NMR spectrum of this system was taken from the work of Hare and colleagues [1] (Figure 2). The cross-region which shows only close (3-4.5 Å) contacts between an aromatic proton and a so-called anomeric proton is relatively free of confusing artifacts and other irrelevant peaks, and is particularly useful in illustrating how such spectral data can be directly related to 3D molecular structure. This is largely because the anomeric protons (as shown in the diagrams above) have fairly distinct and isolated chemical shifts, as indeed do the aromatic protons of the three bases. Thus whenever a proton of one type closely approaches a proton of the other type, the resulting 2D NMR cross-peak can be clearly identified.

Figure 2. The Partial 2D NOESY Proton NMR Spectrum of CGCGTTTTCGCG. This represents a "clickable map". First, activate the molecule by clicking on the image at the top left. Clicking on cross-peaks will then result in annotation of the relevant protons.
The first and perhaps the most difficult point to get across to students is that because of the helical structure of DNA, each anomeric proton is likely to be close to at least two aromatic protons, one from each adjacent base. Similarly, the aromatic protons can approach to within 4.5 Å of the anomeric proton from two different ribose rings. Hence the 2D NOESY cross-peaks will form a pattern of pairs, sharing either a common aromatic or a common anomeric proton. The cross-peaks are therefore capable of connection by either horizontal or vertical traces in the 2D spectrum. This connectivity can be used either to sequence the DNA oligomer or to build a quite accurate 3D structure model of the entire system.
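The pairing pattern described above can be sketched in a few lines of Python. This is only an illustration, assuming that each cross-peak has already been reduced to a pair of proton labels (aromatic, anomeric). The labels and the peak list are invented for the example and are not read from the spectrum in Figure 2.

```python
from collections import defaultdict

def walk_cross_peaks(peaks, start):
    """Follow cross-peaks that share a proton, starting from `start`.

    `peaks` is a list of (aromatic, anomeric) label pairs. Each step moves
    along a horizontal or vertical trace to the next unvisited proton.
    """
    neighbours = defaultdict(list)
    for aromatic, anomeric in peaks:
        neighbours[aromatic].append(anomeric)
        neighbours[anomeric].append(aromatic)

    path = [start]
    visited = {start}
    current = start
    while True:
        unvisited = [p for p in neighbours[current] if p not in visited]
        if not unvisited:
            return path
        current = unvisited[0]
        visited.add(current)
        path.append(current)

# Hypothetical peak list for a three-residue stretch (labels invented).
peaks = [
    ("C1:H6", "C1:H1'"),
    ("G2:H8", "C1:H1'"),
    ("G2:H8", "G2:H1'"),
    ("C3:H6", "G2:H1'"),
    ("C3:H6", "C3:H1'"),
]
path = walk_cross_peaks(peaks, "C1:H6")
print(" -> ".join(path))
```

Because every peak shares a proton with the next one, the walk alternates between aromatic and anomeric protons and visits the residues in sequence order. A real spectrum would also contain overlapping and missing peaks, which this sketch does not attempt to handle.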

The explanation as given above in words does little justice to the actual simplicity of this analysis. We felt that a much more visual presentation was necessary for student acceptance, and to do this, we made use of several technologies which are now routinely available in most teaching laboratories.

Implementation

The essential tools needed for implementation are as follows.
  1. A computer system with a World-Wide Web browser such as NCSA Mosaic, MacWeb or Netscape and of course Internet connectivity.
  2. A program to visualise the 3D structure of the molecule. We chose the RasMol program written by Roger Sayle because it is particularly simple to operate, requires little memory, is fast in operation, supports a rich set of visualisation features, is freely distributable, and comes in Unix, Macintosh and Windows versions. The program can be obtained from the following anonymous ftp site: ftp.dcs.ed.ac.uk in the directory pub/Rasmol. In principle, however, any 3D visualisation program available locally could be used, although the CSML component described below would probably not operate.
  3. The NMR spectrum is presented as a so called mapped GIF image within the WWW browser [2], prefaced by a textual description of the experiment written in a "style description language" called HTML, much as illustrated by the present document. In essence, all the relevant NMR cross peaks are defined as "hot-spots" within the GIF image, such that if the user clicks with the cursor on the peaks, a new hyperlink is invoked. The identity of these hyperlinks will be elaborated later. One corner of the NMR spectrum is reserved for a small "thumbnail" image of the DNA molecule itself. This image represents a "hot-spot" corresponding to a hyperlink to a set of 3D molecular coordinates stored in so-called "pdb" format (although this could be any well known 3D format for storing molecular information) and corresponding to the CGCGTTTTCGCG system.
  4. The WWW browser has to be configured initially for the various hyperlinks in the NMR spectrum to work correctly. The first task is to enable the "pdb" hot-link to associate correctly with the RasMol 3D viewer. This is accomplished using a mechanism known as chemical MIME, the details of which do not need to be known by the students. In essence, the WWW server where the pdb coordinate file is stored will be configured to associate the filename extension of the coordinates (i.e. .pdb in this case) with a so-called MIME header corresponding to
    chemical/x-pdb
    The x is to indicate that formal ratification of this MIME header by the international Internet standards bodies is not yet complete.[3] When a WWW browser receives this header, it looks to see if any associated "helper" program has been configured to process the file. Thus the local browser's configuration file will need an entry giving the chemical/x-pdb MIME type, the associated file suffix of .pdb, and information relating these to the RasMol program. A typical such entry for the Netscape browser is shown in Figure 3.


    Figure 3. The Chemical MIME Configuration of Netscape.

    This has the effect that when the small DNA thumbnail is clicked, 3D coordinates are transferred to the user and RasMol is automatically activated to display them. This concept of "hyperactive molecules" [4] also has enormously important implications outside the specific example of NMR analysis being discussed here, since it allows visually rich 3D information regarding molecules to be embedded into any textual or graphical description of chemistry. The user now has complete freedom to rotate the molecule as they wish, to apply various "styles" to the molecule such as wireframe or spacefilling renderings, even to add ribbon representations of larger proteins and enzymes.
  5. The final component of this experiment is a little more complex to set up, and indeed we have currently only implemented it for the SGI Indigo systems which form the mainstay of our molecular modelling laboratory. Implementations for Windows and Macintosh systems are planned. To introduce this component, we need to expand on the concept of a molecular "style" as noted above. Just as text can be annotated with italics, bold emphasis, headings and so on, so can a chemical structure be "marked-up" with various styles. In the context of analysing an NMR spectrum, we might wish to apply a particular style to, say, the two protons involved in a particular spectral cross-peak by uniquely assigning them a space-filling rendering whilst leaving the rest of the molecule as, say, a wire-frame model. To achieve this, we created what we have called CSML, or chemical-structure-markup-language. This is simply a sequence of commands which can be sent to the 3D visualiser to achieve the effect. This works particularly well with the RasMol program, as illustrated with the sequence:
    all grey wireframe on
    atomno=106 yellow spacefill on 106
    atomno=137 yellow spacefill on 137
    

    which indicates that atoms 106 and 137 (in the particular sequence used for the coordinate file) will be rendered as spacefilled spheres in yellow, and with an accompanying label appearing on the screen, whilst the remainder of the molecule remains as a wireframe representation.

    In practice, this is implemented as follows. Each 2D NMR cross-peak in the mapped spectral image is associated with a separate file given the suffix .csml and containing appropriate CSML instructions, such as those shown above, to highlight the two protons involved. The user's WWW browser must have a second MIME type defined, of the type:
    chemical/x-csml

    This is associated not so much with a discrete application program as with a Tk/Tcl script resident on the user's computer, which is itself capable of passing the appropriate instructions on to a running RasMol process. This is of course a Unix-specific implementation. As noted above, Windows and Macintosh systems would require somewhat different implementations, for example a set of AppleScript commands which would communicate with the Macintosh version of RasMol. This rather convoluted process is largely required in order to maintain the security of the local computer system. By passing pure data to a "trusted" local program (in this case a combination of the CSML script and the RasMol program), one avoids the risk of virus contamination. The CSML script required and the Tk/Tcl environment are both available from the anonymous ftp server ftp.ch.ic.ac.uk in the directory /pub/csml, along with installation instructions.
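As a rough illustration of items 4 and 5, the following Python sketch registers the two chemical MIME types against their file suffixes and generates the CSML text for one cross-peak, in the same form as the three-line sequence quoted above. It is not the Tk/Tcl script distributed by the authors. The function name csml_for_pair and the file names are hypothetical, and the registration is local to the running Python process rather than a browser or server configuration.

```python
import mimetypes

# Register the two chemical MIME types named in the text, so that file
# suffixes can be mapped to them (an in-process registration only).
mimetypes.add_type("chemical/x-pdb", ".pdb")
mimetypes.add_type("chemical/x-csml", ".csml")

def csml_for_pair(atom_a, atom_b, colour="yellow"):
    """Return CSML text highlighting two atoms, in the form quoted above."""
    lines = ["all grey wireframe on"]
    for atom in (atom_a, atom_b):
        lines.append(f"atomno={atom} {colour} spacefill on {atom}")
    return "\n".join(lines) + "\n"

# Atoms 106 and 137 are the pair used in the example sequence above.
script = csml_for_pair(106, 137)
print(script, end="")

# Hypothetical file names: the suffix alone decides the MIME type.
csml_type = mimetypes.guess_type("peak_106_137.csml")[0]
pdb_type = mimetypes.guess_type("dna.pdb")[0]
print(csml_type, pdb_type)
```

One such small file per cross-peak keeps the data passed over the network purely declarative, which is the security point made above: the receiving helper only ever interprets atom numbers and style keywords.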

The Experiment in Action

The first action performed by the student is to acquire the 3D coordinates, activate RasMol and spend some time inspecting the structure. An attempt should be made to locate the aromatic and the anomeric protons responsible for the region of the 2D NMR spectrum under scrutiny. In the current implementation of RasMol (2.5), it is possible to directly display the distance between two selected protons, and hence the students can go in search of suitable contacts of between 3 and 4.2 Å.
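The same search for suitable contacts can be sketched outside RasMol. The Python fragment below is a minimal illustration, assuming the coordinate file uses the standard fixed-column ATOM records of the pdb format and that the protons carry the atom names H1', H6 and H8; naming conventions vary between files, so these names are an assumption. The coordinates are invented test data, not taken from the CGCGTTTTCGCG structure, and demo_record is a hypothetical helper used only to build them.

```python
import math

def parse_atoms(pdb_text):
    """Read serial number, atom name and coordinates from ATOM/HETATM
    records, using the fixed column positions of the pdb format."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            atoms.append({
                "serial": int(line[6:11]),
                "name": line[12:16].strip(),
                "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
            })
    return atoms

def contacts(atoms, names_a, names_b, lo=3.0, hi=4.2):
    """Return (serial_a, serial_b, distance) for pairs within lo..hi angstroms."""
    found = []
    for a in atoms:
        if a["name"] not in names_a:
            continue
        for b in atoms:
            if b["name"] not in names_b:
                continue
            d = math.dist(a["xyz"], b["xyz"])
            if lo <= d <= hi:
                found.append((a["serial"], b["serial"], round(d, 2)))
    return found

def demo_record(serial, name, x, y, z):
    """Build one ATOM record with aligned columns (invented demo data only)."""
    return (f"ATOM  {serial:5d} {name:<4s} {'C':>3s} A{1:4d}{'':4s}"
            f"{x:8.3f}{y:8.3f}{z:8.3f}")

pdb_text = "\n".join([
    demo_record(1, "H1'", 0.0, 0.0, 0.0),
    demo_record(2, "H8", 3.5, 0.0, 0.0),   # 3.5 angstroms from atom 1
    demo_record(3, "H6", 0.0, 6.0, 0.0),   # 6.0 angstroms: too far
    demo_record(4, "H6", 0.0, 0.0, 2.0),   # 2.0 angstroms: below the window
])
result = contacts(parse_atoms(pdb_text), {"H6", "H8"}, {"H1'"})
for serial_a, serial_b, distance in result:
    print(serial_a, serial_b, distance)
```

The default window of 3 to 4.2 angstroms follows the range quoted above; widening hi to 4.5 would match the limit given in the Introduction.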

To verify any putative assignments, the student can then click on relevant cross-peaks in the NMR spectrum. The two protons will then be highlighted as spacefilled spheres in the RasMol window. Further selection of cross-peaks should result in these spheres being replaced by new ones. At each point, the student can rotate the structure to see the interaction better, and can measure the distances for themselves. One can even attempt to approximately correlate the magnitude of the NOESY cross-peak with the distance. A full exploration of the CGCGTTTTCGCG spectrum, along with an investigation of what happens in the TTTT region, can represent several hours' work. Various other elaborations, such as invoking a so-called user "form" via the hot-spot hyperlink instead of a CSML script, would allow the user to enter a structural assignment themselves, rather than having the answer given to them. Depending on the correctness of the answer, a CSML script could then be issued to provide the user with further help.
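The step from a mouse click to a highlighted pair of protons can be sketched as a lookup of the click position in a list of rectangular hot-spots. The Python fragment below is an illustration only. The line layout "rect URL x1,y1 x2,y2" is modelled loosely on server-side image map files, and the exact syntax depends on the server software, so it should be treated as an assumption. All file names and pixel coordinates are hypothetical.

```python
def parse_map(map_text):
    """Parse 'rect URL x1,y1 x2,y2' lines into (url, x1, y1, x2, y2) tuples."""
    regions = []
    for line in map_text.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[0] == "rect":
            x1, y1 = (int(v) for v in fields[2].split(","))
            x2, y2 = (int(v) for v in fields[3].split(","))
            regions.append((fields[1], min(x1, x2), min(y1, y2),
                            max(x1, x2), max(y1, y2)))
    return regions

def resolve_click(regions, x, y, default=None):
    """Return the URL of the first rectangle containing the point (x, y)."""
    for url, x1, y1, x2, y2 in regions:
        if x1 <= x <= x2 and y1 <= y <= y2:
            return url
    return default

# Hypothetical hot-spots: the molecule thumbnail plus two cross-peaks.
MAP_TEXT = """\
# lines that do not start with 'rect' are ignored by this sketch
rect dna.pdb 0,0 60,60
rect peaks/c1h6_c1h1.csml 210,140 222,152
rect peaks/g2h8_c1h1.csml 310,140 322,152
"""
regions = parse_map(MAP_TEXT)
print(resolve_click(regions, 215, 145))
print(resolve_click(regions, 30, 30))
print(resolve_click(regions, 400, 400, default="no_peak.html"))
```

A click inside the thumbnail returns the coordinate file, a click on a peak returns its CSML file, and anything else falls through to the default. The "form" elaboration mentioned above would simply replace the CSML file names with the address of a form.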

The Future.

In principle, any form of spectroscopic or analytical visual information can be associated with 3D structural features. Thus peaks in an HPLC or GC chromatogram could be associated with molecular coordinates or indeed mass spectral fragmentation patterns [5]. We believe that such intimate connection between experimental data and structural features represents a new teaching aid, the potential of which we have barely begun to explore.[6]

Looking further ahead, one can envisage hyperlinks being present in 3D coordinate files themselves. This would enable annotation of a 3D structure, thus allowing the user to "navigate" through a complex 3D molecular world. One way this might be accomplished is to make the 3D molecular visualisation program itself capable of identifying hyperlinks and acting upon them. One such implementation, called EyeChem for SGI systems, has been described [7]. Also just around the corner is a variation on the HTML authoring language used to produce this document, called VRML, or virtual reality modelling language. Whereas HTML operates entirely in a two dimensional sense, VRML describes inherently three dimensional objects, such as for example molecules [8]. Just as text or 2D images can have embedded hyperlinks, so too can 3D objects in VRML. The opportunity is then offered to reverse the sense of the experiment described above, whereby clicking on an individual atom in a rotatable 3D molecular object would result in the display of the appropriate region of a 2D NMR spectrum. The opportunities for integrating World-Wide Web information systems, HTML documents and VRML molecular descriptions are exciting indeed.
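To give a flavour of what a hyperlinked 3D object might look like, the Python sketch below writes a small VRML 1.0 scene in which each atom is a sphere and one atom is wrapped in a WWWAnchor node pointing at a region of a spectrum. It is a hedged illustration only: the coordinates, radii and the target address are invented, the output has not been checked in a VRML browser, and it is not the approach used in the examples cited in [8].

```python
def vrml_atom(x, y, z, radius, url=None):
    """Return a VRML 1.0 fragment for one atom drawn as a sphere.

    If `url` is given, the sphere is wrapped in a WWWAnchor hyperlink.
    """
    sphere = (
        "Separator {\n"
        f"  Translation {{ translation {x} {y} {z} }}\n"
        f"  Sphere {{ radius {radius} }}\n"
        "}\n"
    )
    if url is None:
        return sphere
    return f'WWWAnchor {{\n  name "{url}"\n{sphere}}}\n'

def vrml_scene(atoms):
    """Wrap all atom fragments in a single top-level Separator node."""
    body = "".join(vrml_atom(*atom) for atom in atoms)
    return "#VRML V1.0 ascii\nSeparator {\n" + body + "}\n"

# Two invented atoms; only the first carries a hyperlink (hypothetical URL).
atoms = [
    (0.0, 0.0, 0.0, 0.4, "spectrum.html#peak1"),
    (3.5, 0.0, 0.0, 0.4, None),
]
scene = vrml_scene(atoms)
print(scene, end="")
```

Clicking the anchored sphere in a suitable viewer would then follow the link, which is the reverse of the experiment described in this article.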

Acknowledgements.

We thank Roger Sayle and Peter Murray-Rust for much help and encouragement during the development of these techniques and Omer Casher for work on VRML and EyeChem.

References.

[1] K. M. Banks, D. R. Hare and B. R. Reid, Biochemistry, 1989, 28, 6996.

[2] Such mapping is slightly more complex than preparing a conventional WWW document, and requires the use of separate programs to identify the "hot-spots" and to write out a map configuration file. The "ISMAP" file is held on the WWW server and resolved using a program on the server known as "imagemap". For further details, the following pages should be consulted: http://www.ch.ic.ac.uk/ectoc/ectoc_instructions.html In future, it is expected that the entire operation will actually be performed locally by the WWW browser.

[3] For further details of the chemical MIME project, the current proposed standard and discussions, see http://www.ch.ic.ac.uk/chemime2.html and http://www.ch.ic.ac.uk/hypermail/chemime/

[4] O. Casher, G. Chandramohan, M. Hargreaves, P. Murray-Rust, R. Sayle, H. S. Rzepa and B. J. Whitaker, "Hyperactive Molecules and the World-Wide-Web Information System", J. Chem. Soc., Perkin Trans 2, 1995, 7.

[5] A collection of related materials can be found in the "Global Instructional Chemistry" pages; http://www.ch.ic.ac.uk/GIC/

[6] For some further applications of this concept, including an exploration of the photosystem reaction center, see http://www.ch.ic.ac.uk/chemical_mime.html

[7] O. Casher and H. S. Rzepa, IEEE Computer Graphics, 1995, May issue.

[8] For examples of molecules described in VRML, see O. Casher and H. S. Rzepa, http://www.ch.ic.ac.uk/VRML/