Overview of Molecular Modelling
Molecular modelling is a very diverse subject, ranging from the acquisition and subsequent display of molecular coordinates through to highly accurate (i.e. better than experiment) numerical simulation using theoretically derived functions. Depending on the context and the rigour, the subject is also often referred to as "molecular graphics", "molecular visualisation", "computational chemistry", "computational quantum chemistry" or "theoretical chemistry". A related area known as "molecular simulation" applies molecular modelling techniques to describing and understanding the statistical behaviour and properties of collections of molecules on a "macroscopic" scale. "Molecular dynamics" deals with the time-dependent properties of collections of molecules, and uses many of the techniques of molecular modelling and statistical mechanics. These last two areas are beyond the scope of this lecture course.
Six Characteristic Features and Classifications
Five Molecular Scales. Molecular modelling spans an enormous range of molecule size.
- Solutions of the quantum mechanical (QM) Schrödinger equation which predict properties at least as accurately as they can be measured by experiment are only available for molecules with up to ~8 atoms (rather more with symmetry).
- Acceptably accurate "QM" solutions are nowadays available for molecules with perhaps 250 atoms if large-scale computational resources are available (see DNA).
- The "macromolecular" region of up to 10,000 atoms can be treated using semi-empirical (SE) QM methods, giving approximate chemical accuracy.
- Beyond this, in the region up to about 50,000 atoms, non-quantum mechanical solutions based on "Molecular Mechanics" (MM) methods can be used.
- For larger systems (the "Mesoscopic" region), the "atom" is replaced by "unified groups" of several atoms, or by simple descriptions (spheres, ellipsoids, etc) of entire molecules, and one can then treat collections of perhaps up to a million molecules.
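As a rough illustration, the size regimes above can be written as a simple lookup. This is a minimal Python sketch: the function name `suggest_method` and the returned labels are hypothetical, and the thresholds are just the approximate atom counts quoted in the list, not hard limits.

```python
def suggest_method(n_atoms: int) -> str:
    """Map an atom count onto the approximate size regimes listed above."""
    if n_atoms <= 8:
        return "high-accuracy QM"
    if n_atoms <= 250:
        return "QM (large-scale resources)"
    if n_atoms <= 10_000:
        return "semi-empirical QM"
    if n_atoms <= 50_000:
        return "molecular mechanics"
    # Beyond this, atoms are grouped or whole molecules are simplified.
    return "mesoscopic"


for n in (4, 120, 5_000, 30_000, 2_000_000):
    print(n, suggest_method(n))
```

In practice the boundaries shift with the available computing resources and with the accuracy required, so the numbers should be read as orders of magnitude.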
Molecular Coordinates. Molecular modellers were amongst the first to take full advantage of on-line databases and information sources, and were the first chemists to adapt to the modern Internet. 3D atom coordinates are an essential feature of many modelling methods.
Sources of Coordinates: Accurate or approximate 3D coordinates can be obtained from several experimental (Expt) and computational (Calc) sources, in order of preference:
- Expt: The Cambridge Crystallographic Data Centre (CCDC)
- Expt: Microwave, electron diffraction and other specialised spectroscopies (small molecules only)
- Expt: NMR data (NOE experiments which determine how close e.g. two H atoms might be).
- Calc: Specialised databases deriving from archived calculations or algorithms (Corina)
- Calc: Sources such as electronic journals e.g. DOI: 10.1021/ja061400a or 10.1021/ic0519988
- Calc: Digital Repositories. These are only just now starting to be used.
Coordinate Systems: For N atoms in a system to be modelled, at least 3N-6 coordinates are needed to specify the system geometrically. These coordinates can be:
- None at all: some very simple modelling methods (Hückel) need only the atom connectivity and not atom coordinates.
- Cartesian (XYZ), of which there are 3N, 6 of them normally redundant because they correspond to translations and rotations of the whole molecule. They are the coordinates of choice for unconnected collections of molecules (e.g. host/guest assemblies).
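- Internal or "Z" matrix coordinates, in which each atom is located by a distance, an angle and a dihedral angle relative to previously defined atoms (shown here for hydrogen peroxide):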
H
O 0.96 1
O 1.4 2 111 1
H 0.96 3 111 2 90 1
- Redundant internal coordinates (DOI: 10.1063/1.462844), which are particularly useful for bridged polycyclic compounds and cage structures, and tend to be the method of choice for most modern programs.
- Symmetry adapted coordinates can be specified using exact symmetry restrictions (Gaussview is a program that can symmetrize a coordinate set). A Web site for handling coordinate symmetry even allows you to determine the symmetry group by providing XYZ coordinates. These coordinates can speed up calculations severalfold.
- Crystallographers have their own system of coordinates in a unit cell and associated symmetry operators (for periodic systems). Sometimes you see polar coordinates in this context.
- Some modelling methods abandon the atom as the smallest unit whose coordinate needs to be known, and use larger scale approximations such as protein backbone positions, or even spherical or ellipsoidal approximations to whole molecules. Quaternion coordinates can be used in such cases.
Coordinate File Types: Historically, various computer file formats were developed to describe these coordinates, of which the best known are the "Molfile", the "PDB" and "CML" formats.
- The MDL Molfile is really a database format, not a modelling format, and can lead to difficulties for small molecule modellers:
h2o2.mol
4 3 0 0 0 1 V2000
0.1332 0.6883 2.1950 O 0 0 0 0 0 \n
0.2562 0.6410 0.9013 O 0 0 0 0 0
0.8290 1.3074 2.5089 H 0 0 0 0 0
0.2935 -0.3133 0.6690 H 0 0 0 0 0
1 2 1 6 0 0
1 3 1 0 0 0
2 4 1 0 0 0
M END
- Issues with the above format include the fixed precision (4 decimal places) of the coordinates
- Each atom record is terminated only by a line break (shown as \n in the listing above), which can be lost on passage between different computers and which causes difficulties on e.g. iPads.
- The atom connectivity descriptors (the bond lines that follow the atom block above) are fixed width, and break for more than 100 atoms, because adjacent fields then run together!
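The precision and fixed-width issues can be made concrete with a short Python sketch. It assumes the V2000 layout in which coordinates are written with four decimal places in ten-character fields and each bond line starts with three three-character fields (first atom, second atom, bond type); the helper names `bond_line` and `parse_fixed` are hypothetical.

```python
def bond_line(a1: int, a2: int, order: int) -> str:
    """Write two atom numbers and a bond type in three-character fields."""
    return f"{a1:3d}{a2:3d}{order:3d}"

def parse_fixed(line: str):
    """Read the same three fields back by column position."""
    return int(line[0:3]), int(line[3:6]), int(line[6:9])

small = bond_line(1, 2, 1)       # fields are separated by padding spaces
large = bond_line(100, 101, 1)   # no space is left between the atom numbers

print(repr(small), small.split())
print(repr(large), large.split())
print(parse_fixed(large))

# Four decimal places: anything beyond the fourth digit is rounded away.
print(repr(f"{0.123456789:10.4f}"))
```

A reader that splits on whitespace sees two tokens instead of three for the bond between atoms 100 and 101, whereas reading by column still recovers the intended values.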
- The PDB format contains much more information about bio-molecules, but is poorly suited for small molecules (note only 3 decimal places for coordinates). The CIF file is a small molecule variation used for crystal structure coordinates and mmCIF for large systems.
SEQRES 1 A 467 GLY ALA MET ALA SER SER VAL LEU VAL THR GLN GLU PRO
SEQRES 2 A 467 GLU ILE GLU LEU PRO ARG GLU PRO ARG PRO ASN GLU GLU
HET COA 101 48
HETNAM COA COENZYME A
HETNAM MAH 3-HYDROXY-3-METHYL-GLUTARIC ACID
FORMUL 5 COA 4(C21 H36 N7 O16 P3 S1)
HELIX 1 1 PRO A 444 LEU A 449 1 6
HELIX 2 2 SER A 463 LYS A 474 1 12
SHEET 1 A 4 LYS A 549 ALA A 556 0
SHEET 2 A 4 VAL A 530 LEU A 546 -1 N GLY A 539 O MET A 555
CISPEP 1 GLY A 542 PRO A 543 0 0.61
CRYST1 75.297 130.182 92.547 90.00 106.48 90.00 P 1 21 1 8
ATOM 1 N PRO A 439 -7.194 -13.702 30.538 1.00 76.06 N
ATOM 8 N ARG A 440 -7.440 -15.246 28.234 1.00 76.37 N
- A more modern example is CML (Chemical Markup Language), an extensible format which can carry as much (molecular modelling) information as is needed:
<cml:molecule xmlns:cml="http://www.xml-cml.org/schema/cml2/core">
<cml:metadataList title="generated automatically from Openbabel">
<cml:metadata name="dc:creator" content="OpenBabel version 1-100.1"/>
<cml:metadata name="dc:description" content="CCSD(T)//CCSD/6-31G(d) Gaussian 09 optimised geometries"/>
</cml:metadataList>
<cml:atomArray atomID="a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14" elementType="C C O O C C C C H H H H H H" formalCharge="0 0 0 0 0 0 0 0 0 0 0 0 0 0" x3="0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000100" y3="0.675900 -0.675900 -1.705300 1.705300 -1.701000 1.701000 -0.722800 0.722800 1.111600 -1.111600 -1.147000 1.147000 2.739700 -2.739700" z3="-1.572000 -1.572000 -0.678200 -0.678200 0.682200 0.682200 1.617300 1.617300 -2.568500 -2.568500 2.622800 2.622800 1.006400 1.006400"/>
<cml:bondArray atomRef1="a1 a1 a1 a2 a2 a3 a4 a5 a5 a6 a6 a7 a7 a8" atomRef2="a2 a4 a9 a3 a10 a5 a6 a7 a14 a8 a13 a8 a11 a12" order="2 1 1 1 1 1 1 2 1 2 1 1 1 1"/>
</cml:molecule>
The advantage of such modern (XML-based) formats is that e.g. molecular coordinates and properties can be embedded in a variety of delivery systems, including podcasts, Microsoft Office (CML4Word) and Journals.
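Because CML is XML, a general-purpose XML parser is enough to recover the coordinates. The following minimal Python sketch uses the standard library's `xml.etree.ElementTree`; the embedded document is a trimmed copy of the example above (first four atoms only), and the function name `read_atoms` is hypothetical. It assumes the array-style layout shown above, in which `atomID`, `elementType`, `x3`, `y3` and `z3` are space-separated attribute values.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema/cml2/core"

# Trimmed copy of the example above: first four atoms only.
CML = f"""<cml:molecule xmlns:cml="{CML_NS}">
  <cml:atomArray atomID="a1 a2 a3 a4" elementType="C C O O"
      x3="0.000000 0.000000 0.000000 0.000000"
      y3="0.675900 -0.675900 -1.705300 1.705300"
      z3="-1.572000 -1.572000 -0.678200 -0.678200"/>
</cml:molecule>"""

def read_atoms(cml_text):
    """Return a list of (id, element, x, y, z) from an array-style atomArray."""
    root = ET.fromstring(cml_text)
    # Namespaced elements are addressed as {namespace-uri}localname.
    arr = root.find(f"{{{CML_NS}}}atomArray")
    cols = [arr.get(k).split() for k in ("atomID", "elementType", "x3", "y3", "z3")]
    return [(i, el, float(x), float(y), float(z)) for i, el, x, y, z in zip(*cols)]

for atom in read_atoms(CML):
    print(atom)
```

Unlike the fixed-width formats, nothing here depends on column positions or line breaks, and the number of decimal places is not limited by the format.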
Molecular Visualisation. Once 3D coordinates are available, they can be visualised, an important aid to interpretation of molecular modelling:
- Wireframe, Ball and Stick and Spacefill for small and medium sized molecules
- Ribbon for protein, nucleotide and carbohydrate structures to render the tertiary molecular structures, Polyhedral modes for e.g. ionic lattices.
- Isosurfaces, which are generated from the sizes of atoms, and onto which further properties such as MOs, charges etc. can be colour coded.
- Animation to view molecular vibrations and the time dependent properties of molecules such as (intrinsic) reaction coordinates, protein folding dynamics, etc.
- Integration and Scripting. Programs such as Jmol or ChemDoodle allow seamless integration of models as part of lecture courses, electronic journals, podcasts, iPads, etc and increasingly elaborate scripting of the models to illustrate scientific points.
Molecular Structure Analysis. Once a visual model is available, simple "heuristics" (rules) can be applied. These include:
- Detecting close atom contacts, defined as two atoms approaching significantly closer than the sum of their van der Waals radii (10.1021/jp8111556 and 10.1524/zkri.2009.1158), due to e.g. hydrogen bonding.
- Detecting bond lengths and the pattern of bond length alternation (e.g. aromaticity or anti-aromaticity, see 10.1021/ol703129z or anomeric effects)
- Detecting e.g. stereoelectronic effects such as atom antiperiplanarity or ring planarity
- Mapping to the simple ideas of e.g. arrow pushing which organic chemists tend to develop (and carry around in their heads).
- IsoSTAR is a "data mining" method which can detect structural patterns in a large number of related structures.
- The Importance of being bonded (DOI: 10.1038/nchem.373): A sub-area of molecular modelling devoted to discussing what a chemical bond is, and how to characterise it.
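The close-contact heuristic in the list above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the radii are the commonly quoted Bondi van der Waals values, the `margin` that decides what counts as "significantly closer" is an arbitrary choice, the function name `close_contacts` is hypothetical, and the three-atom O-H...O test geometry is invented for the example rather than taken from a real structure.

```python
import math
from itertools import combinations

# Bondi van der Waals radii in angstroms for a few common elements.
VDW = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52}

def close_contacts(atoms, bonded=frozenset(), margin=0.3):
    """Return (i, j, distance) for non-bonded pairs closer than the
    sum of their van der Waals radii minus `margin`.

    atoms  : list of (element, (x, y, z))
    bonded : set of (i, j) index pairs with i < j to be skipped
    """
    hits = []
    for (i, (el_i, p)), (j, (el_j, q)) in combinations(enumerate(atoms), 2):
        if (i, j) in bonded:
            continue
        d = math.dist(p, q)
        if d < VDW[el_i] + VDW[el_j] - margin:
            hits.append((i, j, round(d, 3)))
    return hits

# Hypothetical linear O-H...O arrangement, coordinates in angstroms.
atoms = [("O", (0.0, 0.0, 0.0)),
         ("H", (0.96, 0.0, 0.0)),
         ("O", (2.8, 0.0, 0.0))]
hits = close_contacts(atoms, bonded={(0, 1)})
print(hits)
```

For a real molecule one would also exclude atoms separated by two bonds, which are always closer than the van der Waals sum, before interpreting the remaining contacts.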
Molecular Structure and Property Prediction
- Simple rule-based (heuristic) methods (e.g. CORINA) for approximations to molecular structures. They can be applied to most of the 51,000,000 known molecules, but are unlikely to lead to "new effects".
- Molecular Mechanics methods for larger molecules such as organics, carbohydrates, peptides, DNA oligomers, metal ion binding and some organometallics, zeolites, and complex ionic lattices.
- Semi-empirical Quantum mechanics methods for obtaining wavefunctions of molecules up to the size of medium sized enzymes.
- Ab initio and Density Functional Quantum Mechanics for accurate reproduction of geometries, and for e.g. weak molecular interactions such as hydrogen bonds, non-classical bonds (e.g. agostic interactions in organometallic species).
- Ab initio Coupled-cluster and double-hybrid correlated theories, particularly for "difficult" molecules such as ozone, FOOF, aromatics, transition states, etc.
- Topological analysis of the wavefunction-derived electron density (AIM, ELF, etc) for defining bonding, and spectroscopic predictions.
Molecular Reactivity and Potential energy surfaces
- Hückel theories (heuristics) for aromaticity, frontier orbital analysis and simple relative energetics
- Extended Hückel theories for orbital correlation diagrams and metal systems.
- Semi-empirical, ab initio and density functional theories (DF) for
- reactions and bond formation/cleavage,
- potential energy surfaces and transition state modelling,
- electron density distributions and stereoelectronic properties,
- excited state properties and reactions,
- intermolecular properties,
- molecular surfaces and charge distributions (electrostatic potentials, etc).
- Multi-reference methods for exploring excited states, conical intersections etc.
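Simple Hückel theory, the first entry in the list above, needs only the atom connectivity. In the usual formulation the pi orbital energies are E = α + xβ, where the x values are the eigenvalues of the connectivity (adjacency) matrix. The Python sketch below is a minimal illustration with no external libraries: `jacobi_eigenvalues` is a small textbook Jacobi rotation routine written for this example, `huckel_levels` is a hypothetical helper name, and all atoms are assumed equivalent (same α, same β for every bond).

```python
import math

def jacobi_eigenvalues(a, sweeps=50, tol=1e-24):
    """Eigenvalues of a small real symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    a = [row[:] for row in a]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle chosen so that the (p, q) element becomes zero.
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):      # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):      # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))

def huckel_levels(n_atoms, bonds):
    """Return x in E = alpha + x*beta for each pi orbital, most bonding first."""
    adj = [[0.0] * n_atoms for _ in range(n_atoms)]
    for i, j in bonds:
        adj[i][j] = adj[j][i] = 1.0
    return sorted(jacobi_eigenvalues(adj), reverse=True)

benzene = huckel_levels(6, [(i, (i + 1) % 6) for i in range(6)])
butadiene = huckel_levels(4, [(0, 1), (1, 2), (2, 3)])
print([round(x, 3) for x in benzene])
print([round(x, 3) for x in butadiene])
```

Since β is negative, the largest x corresponds to the most stable orbital. Filling the three lowest benzene levels with six electrons gives a pi energy of 6α + 8β, compared with 6α + 6β for three isolated double bonds, which is the simple relative energetics referred to above.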
The next two aspects of modelling are largely beyond the scope of the current lecture course:
- Molecular Solvation and Condensed Phase Properties. Supermolecule and condensed phase models of specific and bulk solvation effects.
- Molecular Dynamics and Simulations. Theories of free energies and reaction kinetics.
Typical Molecular Modelling Software Tools
The "tools of the trade" have gradually evolved from physical models (Dreiding, CPK, etc) and calculators, through programmable computers (starting around 1956 with the introduction of Fortran, the first scientific programming language), computers as visualisation aids (from around 1970) and computers running commercially written analysis "packages" such as Sybyl (from around 1984), to, most recently, integration using Internet-based tools and workbenches (from 1994) based on languages such as HTML, JavaScript, Java and C++. A forum for discussing such tools, and other general queries, is the Computational Chemistry List (CCL).
A typical selection of molecular modelling teaching tools available within the department is listed below.
- Mercury: A (free) Crystallographic unit cell viewer and editor.
- Jmol: A (free) Web-browser applet that can display molecules, and some of their properties such as surfaces, spectra, vibrations, etc.
- Ghemical: An open-source molecular editing and molecular mechanics program. Superseded by Avogadro.
- ChemDraw/ChemBio3D: Molecule editor and 3D geometry molecular mechanics/quantum mechanics optimisation and display tool (STEREO ENABLED)
- Gaussview+Gaussian 09: Ab initio quantum mechanics editors and programs.
- DS Viewer Pro (STEREO ENABLED)
- VMD visualisation of molecular dynamics (STEREO ENABLED)
(c) H. S. Rzepa 1998-2012. No reproduction rights granted to this material without permission.