How FAIR are the data associated with the 2017 Molecules-of-the-Year?

C&EN has again run a vote for the Molecules of the Year, this time for 2017. Here I take a look not just at these molecules, but at how FAIR (Findable, Accessible, Interoperable and Reusable) the data associated with them actually are.

I went about finding out as follows:

  1. The article DOIs for all seven candidates were taken from the links on the C&EN site.
  2. From each article I manually tracked down the Supporting Information (SI).
  3. Some of this SI gave a CCDC deposition number for crystal structure data for the molecule in question. The easiest way of going directly to the data was to use the search.datacite.org search engine and to enter the keywords CCDC + deposition number. This gives a DOI for the data, examples of which are included in the table below.
  4. In other examples, I used the CSD Conquest search program and entered the names of 2-3 of the authors of the articles. This also worked well.
  5. Most of the SI files, downloaded as PDF files, also had static images of NMR spectra included. This is not active data, and hence does not fulfil the F and I of FAIR, and probably not the A either. None of it is FAIR as defined by my post here, although it is actually really easy to make it so. One of the examples had ~116 spectra rendered unFAIR in this way.
  6. In another example there was also computational data, included simply as a set of XYZ coordinates and again contained in the PDF file. This too is not really FAIR, since one has to know how to extract it from this container and repurpose it. It also represents a tiny subset of the data potentially available.
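
The lookup in step 3 can also be expressed as a URL rather than typed by hand. The following is a minimal sketch, not the procedure actually used for this post: it assumes that the DataCite REST endpoint `https://api.datacite.org/dois` with a free-text `query` parameter behaves like the search.datacite.org search box, and the function names are hypothetical. It only builds the URLs and does not contact any server.

```python
from urllib.parse import urlencode

# Assumption: the DataCite REST API accepts the same free-text keywords
# that were typed into search.datacite.org by hand for this post.
DATACITE_API = "https://api.datacite.org/dois"


def datacite_query_url(deposition_number: str) -> str:
    """Build a search URL for the keywords 'CCDC' + deposition number."""
    params = {"query": f"CCDC {deposition_number}"}
    return f"{DATACITE_API}?{urlencode(params)}"


def doi_resolver_url(doi: str) -> str:
    """Turn a bare DOI into a resolvable https://doi.org/ link."""
    return f"https://doi.org/{doi}"


# A deposition number and a data DOI quoted in the table for entry 7.
print(datacite_query_url("1457983"))
print(doi_resolver_url("10.5517/ccdc.csd.cc1ky4qc"))
```

Whether the first URL returns the same hits as the web search would need to be checked against the live service.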
| # | Title | Article DOI | Data DOI |
|---|---|---|---|
| 1 | Persulfurated Coronene: A New Generation of “Sunflower” | 10.1021/jacs.6b12630 | Data available only as PDF, hosted by Figshare. The SI also has its own DOI: 10.1021/jacs.6b12630.s001 |
| 2 | A Truncated Molecular Star | 10.1021/jacs.6b12630 | Crystal structure data: 10.5517/ccdc.csd.cc1nb303 |
| 3 | Synthesis of trinorbornane | 10.1039/c7cc06273g | Crystal structure data: 10.5517/ccdc.csd.cc1p7806 |
| 4 | Braiding a molecular knot with eight crossings | 10.1126/science.aal1619 | Crystal structure data: 10.5517/ccdc.csd.cc1m85y0 |
| 5 | Unique physicochemical and catalytic properties dictated by the B3NO2 ring system | 10.1038/nchem.2708 | Crystal structure data: 10.5517/ccdc.csd.cc1lkff0 |
| 6 | Total synthesis of mycobacterial arabinogalactan containing 92 monosaccharide units | 10.1038/ncomms148510 | 116 NMR spectra available only as PDF. No crystal structure |
| 7 | Nitrogen Lewis Acids | 10.1021/jacs.6b12360 | NMR spectra available only as PDF. Computed coordinates available only as PDF. Crystal structure data: CCDC 1457983-1457987, 1458000-1458001, e.g. 10.5517/ccdc.csd.cc1ky4qc, 10.5517/ccdc.csd.cc1ky4rd |

The FAIRness of the data for these molecules of the year is largely rescued by the crystal structure data deposited with the CCDC in their CSD database. These data are made findable (the F of FAIR) by persistent identifiers, either the (parochial) deposition numbers or the more general DOIs. Now if the NMR and computational data were also covered in this way, we would be making great progress. There are of course many other types of data included with these examples, and procedures for making such data FAIR as well have to be worked out by the community.
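
To put a number on "largely rescued", the table can be tallied in a few lines. This is only an illustrative sketch: the dictionary is transcribed by hand from the Data DOI column of the table above, and the variable names are my own.

```python
# Crystal-structure data DOIs per table entry, transcribed from the table.
# Entries 1 and 6 offer their data only inside a PDF, so their lists are empty.
# (The SI DOI for entry 1 points at a PDF, so it is not counted as a data DOI.)
crystal_data_dois = {
    1: [],
    2: ["10.5517/ccdc.csd.cc1nb303"],
    3: ["10.5517/ccdc.csd.cc1p7806"],
    4: ["10.5517/ccdc.csd.cc1m85y0"],
    5: ["10.5517/ccdc.csd.cc1lkff0"],
    6: [],
    7: ["10.5517/ccdc.csd.cc1ky4qc", "10.5517/ccdc.csd.cc1ky4rd"],
}

# Entries that have at least one findable data DOI.
entries_with_doi = sorted(n for n, dois in crystal_data_dois.items() if dois)

print(f"{len(entries_with_doi)} of {len(crystal_data_dois)} entries "
      f"have at least one data DOI: {entries_with_doi}")
# prints: 5 of 7 entries have at least one data DOI: [2, 3, 4, 5, 7]
```

The count covers crystal structures only; none of the NMR or computational data in the table contributes to it.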

In order to construct the table above, I had to put about two hours of effort into tracking down the items (and this only because I have done this sort of search before). Perhaps next year I might persuade C&EN to include such a table in their own article!
