NMR Analysis using Hyperactive Molecules and the World-Wide Web
Department of Chemistry, Imperial College, London, SW7
2AY.
Introduction.
The use of two-dimensional NMR and other spectroscopic data to
interpret the three dimensional structures of molecules in solution
underpins much of modern chemistry research, and yet this area can be
a particularly difficult one to teach using the "conventional
methods" of books and printed diagrams. Conceptually, the
two-dimensional NOESY technique is particularly simple: a
cross-peak off the diagonal indicates merely that the distance
between two nuclei (normally both protons), the chemical shifts of
which are associated with each of the two axes of the plot, must be
within the approximate limits of 2.0-4.5 Å. The challenge lies in
assigning each observed peak in the 2D NMR spectrum to a particular
pair of nuclei and then deducing structural or conformational
properties of the molecule. Building a complete three-dimensional
structure is then normally a matter of gradually
accumulating approximate distance information of this type and refining a
geometrical model of the molecule until
a probable three-dimensional structure emerges. Assuming the
molecular constitution is known, and further that the bond
connectivity is established and assignments deduced, only the
molecular conformation remains to be determined.
To illustrate this technique in operation to our students, one of the
examples we chose was the DNA oligomer abbreviated to
CGCGTTTTCGCG, where the letters C, G and T refer to the nucleotides
shown below and HR denotes the anomeric proton (Figure 1).
Figure 1. The basic structural components of the CGCGTTTTCGCG Oligonucleotide
As it happens, the two ends of this system are self-complementary, and
fold around to form what is known as a hairpin loop. The resultant
structure exhibits the well known helical structure in the two CGCG
regions, whereas the TTTT region involving the bending of the chain
is clearly going to show anomalies. An expansion of the 2D proton NMR
spectrum of this system was taken from the work of Hare and colleagues
[1] (Figure 2). The cross-region which shows only close (3-4.5 Å) contacts
between one aromatic proton and another so-called anomeric proton is
relatively free of confusing artifacts and other irrelevant peaks and
is particularly useful in illustrating how such spectral data can be
directly related to 3D molecular structure. This is largely because
the anomeric protons (as shown in the diagrams above) have fairly
distinct and isolated chemical shifts, as indeed do the aromatic
protons of the three bases. Thus whenever one proton of each type
approaches the other closely, the resulting 2D NMR cross-peak can be
clearly identified.
Figure 2. The Partial 2D NOESY Proton NMR Spectrum of CGCGTTTTCGCG. This represents
a "clickable map". First, activate the molecule by clicking on the image at the
top left. Clicking on a cross peak will then result in annotation of the relevant
protons.
The first and perhaps the most difficult point to get across to
students is that because of the helical structure of DNA, each
anomeric proton is likely to be close to at least two aromatic
protons, one from each adjacent base. Similarly, each aromatic proton
can approach to within 4.5 Å of the anomeric protons of two different
ribose rings. Hence the 2D NOESY cross-peaks will form a pattern of
pairs, sharing either a common aromatic or a common anomeric proton.
The cross-peaks are therefore capable of connection by either
horizontal or vertical traces in the 2D spectrum. This connectivity
can be used either to sequence the DNA oligomer or to build a quite
accurate 3D structural model of the entire system.
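The way such pairs of cross-peaks chain together can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the proton labels and the peak list are hypothetical, and the pattern assumed (each aromatic proton seeing its own anomeric proton and that of the preceding residue) is the idealised helical case, not data taken from the spectrum in Figure 2.

```python
from collections import defaultdict

# Hypothetical cross-peaks for a four-residue helical stretch,
# each written as (aromatic proton, anomeric proton).
peaks = [
    ("C1.H6", "C1.H1'"),
    ("G2.H8", "C1.H1'"),
    ("G2.H8", "G2.H1'"),
    ("C3.H6", "G2.H1'"),
    ("C3.H6", "C3.H1'"),
    ("G4.H8", "C3.H1'"),
    ("G4.H8", "G4.H1'"),
]


def walk(peaks, start):
    """Follow the chain of cross-peaks outwards from the proton `start`.

    Two peaks are linked when they share an aromatic proton or an
    anomeric proton, i.e. when they lie on a common horizontal or
    vertical trace in the 2D spectrum.
    """
    neighbours = defaultdict(list)
    for aromatic, anomeric in peaks:
        neighbours[aromatic].append(anomeric)
        neighbours[anomeric].append(aromatic)

    path, visited, current = [start], {start}, start
    while True:
        onward = [p for p in neighbours[current] if p not in visited]
        if not onward:
            return path
        current = onward[0]
        visited.add(current)
        path.append(current)


print(" -> ".join(walk(peaks, "C1.H6")))
```

Each step in the printed path corresponds to one cross-peak, and the walk alternates between aromatic and anomeric protons exactly as the horizontal and vertical traces do in the spectrum. A break in the chain, as might be expected in the TTTT loop, would simply end the walk early.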
The explanation as given above in words does little justice to the
actual simplicity of this analysis. We felt that a much more visual
presentation was necessary for student acceptance, and to do this, we
made use of several technologies which are now routinely available in
most teaching laboratories.
Implementation
The essential tools needed for implementation are as follows.
- A computer system with a World-Wide Web
browser such as NCSA Mosaic, MacWeb or Netscape and of course Internet
connectivity.
- A program to visualise the 3D structure of the molecule. We chose the
RasMol program written by Roger Sayle because it is particularly simple to
operate, requires little memory, is fast in operation, supports a rich set of
visualisation features, is freely distributable, and comes in Unix, Macintosh and Windows versions. The
program can be obtained from the anonymous ftp site
ftp.dcs.ed.ac.uk in the directory pub/Rasmol. In
principle, however, any 3D visualisation program available locally could be
used, although the CSML component described below would probably not operate.
- The NMR spectrum is presented as a so-called mapped GIF image
within the WWW browser [2], prefaced by a textual description of the
experiment written in a "style description language" called HTML,
much as illustrated by the present document. In essence, all the
relevant NMR cross peaks are defined as "hot-spots" within the GIF
image, such that if the user clicks with the cursor on the peaks, a
new hyperlink is invoked. The identity of these hyperlinks will be
elaborated later. One corner of the NMR spectrum is reserved for a
small "thumbnail" image of the DNA molecule itself. This image
represents a "hot-spot" corresponding to a hyperlink to a set of 3D
molecular coordinates stored in so-called "pdb" format (although this
could be any well known 3D format for storing molecular information)
and corresponding to the CGCGTTTTCGCG system.
- The WWW browser has to be configured initially for the various
hyperlinks in the NMR spectrum to work correctly. The first task is
to enable the "pdb" hot-link to associate correctly with the RasMol
3D viewer. This is accomplished using a mechanism known as chemical
MIME, the details of which do not need to be known by the students.
In essence, the WWW server where the pdb coordinate file is stored
will be configured to associate the filename extension of the
coordinates (i.e. .pdb in this case) with a so-called MIME
header corresponding to
chemical/x-pdb
The x- prefix indicates that formal ratification of this MIME header
by the international Internet standards bodies is not yet complete.[3]
When a WWW browser receives this header, it looks to see if any
associated "helper" program has been configured to process the file.
Thus the local browser's configuration file will need an entry giving the
chemical/x-pdb MIME type, the associated file suffix
.pdb, and information relating these to the RasMol program. A
typical such entry for the Netscape browser is shown in Figure 3.
Figure 3. The Chemical MIME Configuration of Netscape.
This has the effect that when the small DNA thumbnail is clicked, 3D
coordinates are transferred to the user and RasMol is automatically
activated to display them. This concept of "hyperactive molecules"
[4] also has enormously important implications outside the specific
example of NMR analysis being discussed here, since it allows
visually rich 3D information regarding molecules to be embedded into
any textual or graphical description of chemistry. The user now has
complete freedom to rotate the molecule as they wish, to apply
various "styles" to the molecule such as wireframe or spacefilling
renderings, even to add ribbon representations of larger proteins and
enzymes.
- The final component of this experiment is a little more
complex to set up, and indeed we have currently only implemented it
for the SGI Indigo systems which form the mainstay of our molecular
modelling laboratory. Implementations for Windows and Macintosh
systems are planned. To introduce this component, we need to
expand on the concept of a molecular "style" as noted above. Just as
text can be annotated with italics, bold emphasis, headings and so
on, so can a chemical structure be "marked-up" with various styles.
In the context of analysing an NMR spectrum, we might wish to apply a
particular style to, say, the two protons involved in a particular
spectral cross-peak by uniquely assigning them a space-filling
rendering whilst leaving the rest of the molecule as, say, a wire-frame
model. To achieve this, we created what we have called CSML, or
chemical-structure-markup-language. This is simply a sequence of
commands which can be sent to the 3D visualiser to achieve the effect.
This works particularly well with the RasMol program, as illustrated
with the sequence:
all grey wireframe on
atomno=106 yellow spacefill on 106
atomno=137 yellow spacefill on 137
which indicates that atoms 106 and 137 (in the particular sequence
used for the coordinate file) will be rendered as spacefilled spheres
in yellow, and with an accompanying label appearing on the screen,
whilst the remainder of the molecule remains as a wireframe representation.
In practice, this is implemented as follows. Each 2D NMR cross-peak
in the mapped spectral image is associated with a separate file given
the suffix .csml and containing appropriate CSML
instructions, such as those shown above,
to highlight the two protons involved. The user's WWW
browser must have a second MIME type defined:
chemical/x-csml
This is associated not so much with a
discrete application program as with a Tk/Tcl script resident on the
user's computer, which is itself capable of passing the
appropriate instructions on to a running RasMol process. This, of
course, is a Unix-specific implementation. As noted above, Windows
and Macintosh systems would require somewhat different
implementations, for example a set of
AppleScript commands which would communicate with the Macintosh
version of RasMol. This rather
convoluted process is largely required in order to maintain the
security of the local computer system. By passing pure data to a
"trusted" local program (in this case a combination of the CSML
script and the RasMol program), one avoids the risk of
virus contamination. The CSML script required and the Tk/Tcl
environment are both available from the anonymous ftp server
ftp.ch.ic.ac.uk in the directory /pub/csml, along
with installation instructions.
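Since one .csml file is needed for every cross-peak in the mapped image, the files lend themselves to being generated rather than typed by hand. The following Python sketch shows one way this might be done. It is not the script distributed from the ftp server: the function names and the peak name "peak01" are hypothetical, and the CSML layout is inferred solely from the three-line example given above, of which only the atom pair 106/137 comes from the text.

```python
from pathlib import Path


def csml_for_pair(atom_a, atom_b, colour="yellow"):
    """Return CSML text that highlights two atoms.

    The layout copies the three-line example in the text: one line that
    resets the whole molecule to a grey wireframe, then one line per
    atom giving its number, colour, style and label.
    """
    lines = ["all grey wireframe on"]
    for serial in (atom_a, atom_b):
        lines.append(f"atomno={serial} {colour} spacefill on {serial}")
    return "\n".join(lines) + "\n"


def write_csml_files(assignments, directory):
    """Write one .csml file per cross-peak and return the paths written.

    `assignments` maps a (hypothetical) peak name to a pair of atom numbers.
    """
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    written = []
    for peak_name, (atom_a, atom_b) in assignments.items():
        target = directory / f"{peak_name}.csml"
        target.write_text(csml_for_pair(atom_a, atom_b))
        written.append(target)
    return written


# Hypothetical assignment table with a single entry.
assignments = {"peak01": (106, 137)}
print(csml_for_pair(*assignments["peak01"]), end="")
```

Each hot-spot in the image map would then point at the file written for its cross-peak, so that correcting an assignment means changing one entry in the table and regenerating the files.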
The Experiment in Action
The first action performed by the student is to acquire
the 3D coordinates, activate RasMol and spend some time
inspecting the structure. The student should then attempt to locate the aromatic
and the anomeric protons responsible for the region of the 2D NMR
spectrum under scrutiny.
In the current implementation of RasMol
(2.5), it is possible to directly display the distance between two
selected protons, and hence the students can go in search of suitable
contacts of between 3 and 4.2 Å.
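The same distance check can be made outside RasMol, directly on the coordinate file. The Python sketch below reads the fixed-column ATOM records of a pdb file and measures the separation of two atoms. The two records are fabricated purely for illustration (they are not coordinates from the CGCGTTTTCGCG file), and the helper atom_record exists only to lay them out in the correct columns.

```python
import math


def read_coordinates(pdb_text):
    """Map atom serial number to (x, y, z) for every ATOM/HETATM record."""
    coords = {}
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            serial = int(line[6:11])      # columns 7-11
            x = float(line[30:38])        # columns 31-38
            y = float(line[38:46])        # columns 39-46
            z = float(line[46:54])        # columns 47-54
            coords[serial] = (x, y, z)
    return coords


def distance(coords, atom_a, atom_b):
    """Distance between two atoms, in the units of the file (Å for pdb)."""
    return math.dist(coords[atom_a], coords[atom_b])


def within_noe_range(d, low=3.0, high=4.2):
    """True if d lies in the contact range the students are asked to find."""
    return low <= d <= high


def atom_record(serial, name, residue, number, x, y, z):
    """Format a made-up ATOM record with the fields in their pdb columns."""
    return (f"ATOM  {serial:5d} {name:<4s} {residue:>3s} A{number:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


# Hypothetical coordinates, chosen only so the arithmetic is easy to follow.
sample = "\n".join([
    atom_record(106, "H6", "C", 4, 0.0, 0.0, 0.0),
    atom_record(137, "H1'", "G", 5, 1.0, 2.0, 2.0),
])

coords = read_coordinates(sample)
d = distance(coords, 106, 137)
print(f"{d:.2f} A, candidate NOE contact: {within_noe_range(d)}")
```

Looping this check over every aromatic and anomeric proton pair would list all the contacts a student might expect to find as cross-peaks.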
To verify any putative assignments,
the student can then click on relevant cross peaks in the NMR
spectrum. The two protons will then be highlighted as spacefilled
spheres in the RasMol window. Further selection of cross peaks should
result in these spheres being replaced by new ones. At each point,
the student can rotate the structure to see the
interaction more clearly, and can measure the distances for themselves. One can
even attempt to approximately correlate the magnitude of the NOESY
cross-peak with the distance. A full exploration of the CGCGTTTTCGCG
spectrum, along with an investigation of what happens in the TTTT
region, can represent several hours' work. Various other elaborations,
such as invoking a so-called user "form" via the hot-spot hyperlink
instead of a CSML script, would allow the user to enter a structural
assignment themselves, rather than having the answer given to them.
Depending on the correctness of the answer, a CSML script could then
be issued to provide the user with further help.
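The approximate correlation between cross-peak magnitude and distance mentioned above can be made concrete with a short Python sketch. It assumes the usual isolated spin-pair approximation, in which the NOE intensity falls off as the inverse sixth power of the internuclear distance, and it ignores spin diffusion and differences in local motion. The reference intensity and reference distance used here are hypothetical numbers, not values measured from the spectrum.

```python
def distance_from_noe(intensity, ref_intensity, ref_distance):
    """Estimate a distance from a cross-peak intensity.

    Assumes intensity is proportional to r**-6, so that
        r = r_ref * (I_ref / I) ** (1/6)
    where (I_ref, r_ref) come from a proton pair of known separation.
    """
    if intensity <= 0 or ref_intensity <= 0:
        raise ValueError("intensities must be positive")
    return ref_distance * (ref_intensity / intensity) ** (1.0 / 6.0)


# Hypothetical calibration: a reference pair 2.5 Å apart gives intensity 64
# (arbitrary units); a cross-peak of intensity 1 is then estimated as follows.
estimate = distance_from_noe(1.0, ref_intensity=64.0, ref_distance=2.5)
print(f"estimated distance: {estimate:.2f} A")
```

Because of the sixth-root dependence, even a large error in the measured intensity changes the estimated distance only slightly, which is why such estimates are normally treated as loose distance ranges rather than precise values.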
The Future.
In principle, any form of spectroscopic or
analytical visual information can be associated with 3D structural
features. Thus peaks in an HPLC or GC trace could be associated
with molecular coordinates or indeed mass spectral fragmentation
patterns [5]. We believe that such intimate connection between
experimental data and structural features represents a new teaching
aid, the potential of which we have barely begun to explore.[6]
Looking further ahead, one can envisage hyperlinks being present
in 3D coordinate files themselves. This would enable annotation of a 3D
structure, thus allowing the user to "navigate" through a complex
3D molecular world. One way this might be accomplished is to make
the 3D molecular visualisation program itself capable of identifying
hyperlinks and acting upon them. One such implementation, called
EyeChem for SGI systems, has been described [7]. Also just around the corner
is a variation on the HTML authoring language used to produce this
document, called VRML, or virtual reality modelling language. Whereas
HTML operates entirely in a two dimensional sense, VRML describes
inherently three dimensional objects, such as for example molecules [8].
Just as text or 2D images can have embedded hyperlinks, so too can
3D objects in VRML. The opportunity is then offered to reverse the sense
of the experiment described above, whereby clicking on an individual
atom in a rotatable 3D molecular object would result in the display of
the appropriate region of a 2D NMR spectrum. The opportunities for
integrating World-Wide Web information systems, HTML documents and VRML molecular
descriptions are exciting indeed.
Acknowledgements.
We thank Roger Sayle and Peter
Murray-Rust for much help and encouragement during the development of
these techniques and Omer Casher for work on VRML and EyeChem.
References.
[1] K. M. Banks, D. R. Hare and B. R. Reid,
Biochemistry, 1989, 28, 6996.
[2] Such mapping is slightly more complex than preparing a conventional
WWW document, and requires the use of separate programs to identify the
"hot-spots" and to write out a map configuration file. The "ISMAP" file is
held on the WWW server and resolved using a program on the server known
as "imagemap". For further details, the following pages should be
consulted; http://www.ch.ic.ac.uk/ectoc/ectoc_instructions.html In
future, it is expected that the entire operation will actually be
performed locally by the WWW browser.
[3] For further details of the chemical MIME project, the current
proposed standard and discussions, see http://www.ch.ic.ac.uk/chemime2.html
and http://www.ch.ic.ac.uk/hypermail/chemime/
[4] O. Casher, G. Chandramohan, M. Hargreaves, P. Murray-Rust, R.
Sayle, H. S. Rzepa and B. J. Whitaker, "Hyperactive Molecules and the
World-Wide-Web Information System", J. Chem.
Soc., Perkin Trans. 2, 1995, 7.
[5] A collection of related materials can be found in the
"Global Instructional Chemistry" pages; http://www.ch.ic.ac.uk/GIC/
[6] For some further applications of this concept, including an
exploration of the photosystem reaction center, see http://www.ch.ic.ac.uk/chemical_mime.html
[7] O. Casher and H. S. Rzepa, IEEE Computer Graphics, 1995, May issue.
[8] For examples of molecules described in VRML, see O. Casher and H. S. Rzepa,
http://www.ch.ic.ac.uk/VRML/