Archive for the ‘Chemical IT’ Category
More simple experiments with crystal data. The pyramidalisation of nitrogen.
Saturday, November 1st, 2014
We are approaching 1 million recorded crystal structures (actually, around 716,000 in the CCDC and just over 300,000 in COD). One delight of having this wealth of information is the simple little explorations that can take just a minute or so to do. This one was sparked by my helping a colleague update a set of interactive lecture demos dealing with stereochemistry. Three of the examples included molecules where chirality originates in stereogenic centres with just three attached groups. An example might be a sulfoxide, for which the priority rules assign the lone pair an atomic number of zero. The question then arises as to whether such a centre is configurationally stable, i.e. whether it inverts by an umbrella motion slowly or quickly. My initial intention was to see if crystal structures could cast any light at all on this aspect.
Electronic notebooks: a peek into the future?
Tuesday, September 16th, 2014
ELNs (electronic laboratory notebooks) have been around for a long time in chemistry, largely of course due to the needs of the pharmaceutical industry. We did our first extensive evaluation probably at least 15 years ago, and nowadays there are many on the commercial market, with a few more coming from open-source communities. Here I thought I would bring to your attention the potential of an interesting new entrant from the open community.
One molecule, one identifier: Viewing molecular files from a digital repository using metadata standards.
Monday, September 8th, 2014
In the beginning (taken here as prior to ~1980), libraries held five-year printed consolidated indices of molecules, organised by formula or name (Chemical Abstracts). These could occupy about 2 m of shelf space for each five-year period, and there was an equivalent set of printed volumes from the Beilstein collection. Those of us who needed to track down information about molecules before ~1980 spent many an afternoon (or indeed a whole day) in the library thumbing through these weighty volumes. Fast forward to the present, when (closed) commercial databases such as SciFinder, Reaxys and CCDC offer information online for around 100 million molecules (CAS indicates it has 89,506,154 today, for example). These have been joined by many open databases (e.g. PubChem). Each of these sources of molecular information has its own way of accessing individual entries, and the wonderful program Jmol (nowadays JSmol) has several of these custom interfaces programmed in. Here I describe some work we have recently done[cite]10.1021/ci500302p[/cite] on how one might generalise access to an individual molecule held in what is now called a digital data repository.
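As a small illustration of identifier-based access (a sketch, not the method of the cited article), a DOI can be asked for machine-readable metadata rather than its human-readable landing page, using HTTP content negotiation at the doi.org resolver. The function name below is hypothetical, and which media types are honoured depends on the registration agency (Crossref, DataCite and so on), so treat the two listed here as assumptions.

```python
from urllib.request import Request

# Media types for DOI content negotiation (assumed to be honoured by the
# registration agency; CSL JSON is the most widely supported).
CSL_JSON = "application/vnd.citationstyles.csl+json"
DATACITE_XML = "application/vnd.datacite.datacite+xml"


def metadata_request(doi: str, media_type: str = CSL_JSON) -> Request:
    """Build (but do not send) a request for a DOI's metadata.

    Hypothetical helper: the Accept header asks the resolver for metadata
    in the given format instead of redirecting to the landing page.
    """
    return Request(f"https://doi.org/{doi}", headers={"Accept": media_type})


if __name__ == "__main__":
    req = metadata_request("10.1021/ci500302p")
    print(req.full_url, req.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would actually fetch the metadata; that network step is deliberately left out. For a DOI registered with DataCite (as data repositories usually are), `DATACITE_XML` could be requested instead.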
Data galore! 134 kilomolecules.
Wednesday, August 6th, 2014
I do go on a lot about the importance of having modern access to data. And so the appearance of this article[cite]10.1038/sdata.2014.22[/cite] immediately struck me as important. It is, appropriately enough, in the new journal Scientific Data. The data contain properties computed at the B3LYP/6-31G(2df,p) level for 133,885 species with up to nine heavy atoms, and the entire data set has its own DOI[cite]10.6084/m9.figshare.978904[/cite]. The data were generated by subjecting a set of molecules to a number of validation protocols, including obtaining relaxed (optimised) geometries at the B3LYP/6-31G(2df,p) level. It would be good to replicate this set using a functional that also includes dispersion, and of course having all the coordinates available in this manner greatly facilitates this. The collection also includes data for, e.g., 6095 constitutional isomers of C7H10O2, which reminds me of an early, delightfully entitled article adopting such an approach in quantum chemistry[cite]10.1021/jp057107z[/cite]. Such collections are an important part of the process of validating computational methods[cite]10.1007/s00894-005-0278-1[/cite]. This way of publishing data does raise some interesting discussion points.
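For anyone wanting to reuse the coordinates, here is a minimal Python sketch that reads a single XYZ-style record and counts its heavy (non-hydrogen) atoms. The layout assumed (an atom count, a line of scalar properties, then one `element x y z charge` line per atom, with some numbers written in Mathematica style such as `1.5*^-6`) is my reading of the data set's files and should be checked against its own documentation; the function names are hypothetical and the sample record is invented purely for illustration.

```python
"""Minimal reader for one QM9-style XYZ record (layout is an assumption)."""

from typing import List, Tuple

Atom = Tuple[str, Tuple[float, float, float]]


def to_float(token: str) -> float:
    # Some values use Mathematica exponent notation ("1.5*^-6"); map it to "1.5e-6".
    return float(token.replace("*^", "e"))


def parse_record(text: str) -> List[Atom]:
    lines = text.strip().splitlines()
    n_atoms = int(lines[0].split()[0])  # line 1: number of atoms
    # Line 2 holds the scalar properties and is skipped here;
    # any lines after the atom block are also ignored.
    atoms: List[Atom] = []
    for line in lines[2:2 + n_atoms]:
        fields = line.split()
        x, y, z = (to_float(v) for v in fields[1:4])
        atoms.append((fields[0], (x, y, z)))
    return atoms


def heavy_atom_count(atoms: List[Atom]) -> int:
    # "Heavy" simply means any element other than hydrogen.
    return sum(1 for element, _ in atoms if element != "H")


# Invented toy record (a water-like molecule), not taken from the data set.
SAMPLE = """3
gdb 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
O 0.0 0.0 0.1173 -0.5
H 0.0 0.7572 -0.4692 0.25
H 0.0 -7.572*^-1 -0.4692 0.25
"""

if __name__ == "__main__":
    atoms = parse_record(SAMPLE)
    n_heavy = heavy_atom_count(atoms)
    print(n_heavy, n_heavy <= 9)  # the set is limited to nine heavy atoms
```

Looping `parse_record` over every file in the collection would then give a quick check that no entry exceeds nine heavy atoms, or a starting point for re-optimising the geometries with a dispersion-corrected functional.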
The price of information: Evaluating big deal journal bundles
Thursday, July 3rd, 2014
Increasingly, our access to scientific information is becoming a research topic in itself. Thus an analysis of big deal journal bundles[cite]10.1073/pnas.1403006111[/cite] has attracted much interesting commentary (including one from a large scientific publisher[cite]10.1038/510447f[/cite]). In the UK, our funding councils have been proactive in promoting the so-called GOLD publishing model, in which the authors (aided by grants from their own institution or others) pay the perpetual up-front publication costs (more precisely, the costs demanded by the publishers, which are not necessarily the same thing) so that their article is removed from the normal subscription pay wall erected by the publisher and becomes accessible to anyone. As the proportion of GOLD content increases, it was anticipated (hoped?) that the costs of accessing the remaining non-GOLD articles via a pay-walled subscription would decrease.
Test of JSmol in WordPress: the background story.
Sunday, June 8th, 2014
A word of explanation about this test page for experimenting with JSmol. Many moons ago I posted about how to include a generated 3D molecular model in a blog post, and I have used that method on many posts here ever since. It relied on Java as the underlying software, first introduced in 1996, almost 20 years ago. As with most software technologies, much has changed since, and Java itself (as a compiled language) has had to evolve to improve its underlying security. In the last year, the Java code itself (in this case Jmol) has needed to be digitally signed in a standard manner, which means that many an old site using unsigned older versions has started to throw up increasingly alarming messages.
A newcomer in the game of how we find and use data.
Saturday, May 17th, 2014
I remember a time when tracking down a particular property of a specified molecule was an all-day effort, spent in the central library (or further afield). Then came the likes of STN Online (~1980) and later Beilstein, but only if your institution had a subscription. Let me then cut to the chase: consider this URL: http://search.datacite.org/ui?q=InChIKey%3DLQPOSWKBQVCBKS-PGMHMLKASA-N The site is DataCite, which collects metadata about cited data! Most of that data is open in the sense that it can be retrieved without a subscription (but see here that it is not always made easy to do so). So the above is a search for cited data containing the InChIKey LQPOSWKBQVCBKS-PGMHMLKASA-N. This produces the result:
This tells you who published the data (but, oddly, its date is given only to the nearest year; it is beta software, after all). The advanced equivalent of this search looks like this:
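The DataCite query URL quoted in this post is just the string `InChIKey=` followed by the key, percent-encoded (so the `=` becomes `%3D`). A minimal Python sketch of building it, with a sanity check on the InChIKey's shape (14 letters, 10 letters and 1 letter, separated by hyphens), might look like this. The function name is hypothetical, and the 2014 beta search address is simply copied from the post; it may well no longer resolve, so nothing is fetched here.

```python
import re
from urllib.parse import quote

# Standard InChIKey shape: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")

# Beta search interface quoted in the post (2014); may no longer be live.
SEARCH_BASE = "http://search.datacite.org/ui?q="


def datacite_inchikey_search_url(inchikey: str) -> str:
    """Build (but do not fetch) a DataCite search URL for one InChIKey."""
    if not INCHIKEY_RE.fullmatch(inchikey):
        raise ValueError(f"not a well-formed InChIKey: {inchikey!r}")
    # safe="" makes quote() encode "=" as %3D, as in the URL in the post;
    # hyphens and capital letters are left untouched.
    return SEARCH_BASE + quote(f"InChIKey={inchikey}", safe="")


if __name__ == "__main__":
    print(datacite_inchikey_search_url("LQPOSWKBQVCBKS-PGMHMLKASA-N"))
```

Rejecting malformed keys before building the query is a design choice: a typo in a 27-character hash otherwise just returns an empty result, which is easily mistaken for "no data published".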
Disambiguation/provenance of claimed scientific opinion and research.
Monday, May 5th, 2014
My name is displayed pretty prominently on this blog, but it is not always easy to find out who the real person is behind many a blog. In science, I am troubled by such anonymity. Well, a new era is about to hit us. When you come across an Internet resource, or an opinion/review of some scientific topic, I argue here that you should immediately ask: “what is its provenance?”
Trigonal bipyramidal or square pyramidal: Another ten minute exploration.
Friday, May 2nd, 2014
This is rather cranking the handle, but if one takes my previous post and alters the crystal structure database search definition from 4- to 5-coordinate metals, one gets the following.
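One widely used single-number summary of where a five-coordinate centre sits between these two ideals is Addison's τ5 index: take the two largest ligand–metal–ligand angles β ≥ α and form τ5 = (β − α)/60°, which is 0 for an ideal square pyramid and 1 for an ideal trigonal bipyramid. I am not claiming this is the measure used in the search above; the Python sketch below simply computes it from Cartesian coordinates, using hypothetical function names and idealised test geometries.

```python
import itertools
import math
from typing import Sequence

Vec = Sequence[float]


def angle_deg(centre: Vec, a: Vec, b: Vec) -> float:
    """Angle a-centre-b in degrees."""
    u = [a[i] - centre[i] for i in range(3)]
    v = [b[i] - centre[i] for i in range(3)]
    cos_t = sum(p * q for p, q in zip(u, v)) / (math.hypot(*u) * math.hypot(*v))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def tau5(metal: Vec, ligands: Sequence[Vec]) -> float:
    """Addison tau5 index: 0 = square pyramid, 1 = trigonal bipyramid."""
    if len(ligands) != 5:
        raise ValueError("tau5 needs exactly five ligand positions")
    angles = sorted(
        (angle_deg(metal, a, b) for a, b in itertools.combinations(ligands, 2)),
        reverse=True,
    )
    beta, alpha = angles[0], angles[1]  # the two largest L-M-L angles
    return (beta - alpha) / 60.0


if __name__ == "__main__":
    c, s = math.cos(math.radians(120)), math.sin(math.radians(120))
    tbp = [(0, 0, 1), (0, 0, -1), (1, 0, 0), (c, s, 0), (c, -s, 0)]
    sqp = [(0, 0, 1), (1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)]
    origin = (0.0, 0.0, 0.0)
    print(round(tau5(origin, tbp), 3), round(tau5(origin, sqp), 3))
```

Real structures fall between these limits; in a square pyramid with the metal lifted above the basal plane the two trans angles shrink below 180° but stay roughly equal, so τ5 remains close to 0.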