The university sector in the UK has the quality of its research outputs inspected every seven years, in an exercise known as the REF or Research Excellence Framework. The next one is due around 2020, and already preparations are under way! Here I describe how I have interpreted one of its strictures: that all UK-funded research outputs (i.e. research publications in international journals) must be made available in open, unrestricted form within three months of the article being accepted for publication, or they will not be eligible for consideration in 2020.
At the outset, I should say that one infrastructure to help researchers adhere to the guidelines is being implemented in the form of the Symplectic system. This allows a researcher to upload the final accepted version of a manuscript. At Imperial College, a digital repository called Spiral serves this purpose and also acts as the front end for collecting informative metadata to enhance discoverability. The final accepted version is then converted by the publisher into a version-of-record. This contains styling unique to the publisher and the content is subjected to further scrutiny by the authors as proof corrections. In an ideal world, these latter changes should also be faithfully propagated back to the final accepted version, as would all the supporting information associated with the article. Since most authors do not exactly enjoy the delights of proof corrections, this final reconciliation of the two versions may not always be assiduously undertaken.
I became concerned about the existence of two versions of any given scientific report, and that the task of ensuring total fidelity between the content of both versions would eat into the author’s time. It would be much better if the publisher could grant permission for the author to archive the version-of-record into a digital repository.
Some experiments were needed, and I decided to start them in reverse, by archiving my oldest publications. Since Symplectic now provides a system to do this, I began by using it. Symplectic identifies each publisher’s policies for archival, of which the most liberal are known as ROMEO GREEN. To quote from the definition, this colour allows the author to “archive pre-print and post-print or publisher’s version/PDF”. In an afternoon I had processed most of my ROMEO GREEN articles. You know how it is sometimes: you do not read the fine print! And so the library soon informed me that archival of ROMEO GREEN articles was in fact only permitted on the author’s “personal web page”. Spiral, as an institutional repository, apparently does not constitute a personal web page for me, and so none of my Symplectic submissions could be accepted for archival there.
Time to rethink the experiment. Firstly, I very much wanted the reprints to be held by a proper digital repository rather than a conventional web page. Why? I wanted my reprints to adhere as much as possible to FAIR: findable, accessible, interoperable and re-usable. Well, at least the first two of those (the last two relate more to data). A repository is designed to hold metadata in a formal and standards-based manner and metadata helps achieve FAIR. So I asked the Royal Society of Chemistry (as a ROMEO GREEN publisher) whether a personal web page hosted on a digital repository would qualify. I was soon informed that I had proposed a neat solution here, and they couldn’t see an issue.
Now, all I had to do was find a repository where I could create such a personal web page. The chemistry department at Imperial College has for ten years hosted a DSpace repository called SPECTRa[cite]10.1021/ci7004737[/cite], which already has the functionality for individuals to create personal collections. I had also picked up on the increasing attention being given to Zenodo, like the World-Wide Web itself an offshoot of CERN (of Large Hadron Collider fame) and born from the need for researchers to archive the outputs of their research more permanently. These outputs include software, videos, images, presentations, posters, publications and (most obviously for CERN) datasets. I thought I would include Zenodo in my experiment as well. The results are summarised below.
| | DSpace-SPECTRa | Zenodo |
|---|---|---|
| Community | Henry Rzepa personal web page reprint collection | Rzepa personal computational chemistry data and reprint page |
| Collection | Royal Society of Chemistry reprints | |
| Publication | 10042/195577 | 10.5281/zenodo.18758[cite]10.5281/zenodo.18758[/cite] |
| Thesis | 10044/1/20860[cite]http://doi.org/10044/1/20860[/cite] | 10.5281/zenodo.18777[cite]10.5281/zenodo.18777[/cite] |
| Dataset | 10.14469/ch/191342[cite]10.14469/ch/191342[/cite] | 10.5281/zenodo.18632[cite]10.5281/zenodo.18632[/cite] |
| Harvesting | OAI-ORE | OAI-PMH |
The last line of this table points to another design feature of a repository: the ability to harvest its content. The ContentMine project (“The right to read is the right to mine!”) has shown how such harvesting of facts from the literature can be automated on a vast scale, and (IMHO) represents an example of those disruptive innovations that have the power to change the world forever. It also enshrines the idea that scientific facts funded by the public purse should be capable of being openly liberated from their containers. A harvestable repository seems an ideal container for achieving this.
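To give a flavour of what harvesting means in practice, here is a minimal sketch of the OAI-PMH side of the table, written in Python. It only builds a `ListRecords` request and parses a canned response; the base URL `https://repository.example.org/oai`, the sample record and the function names are all invented for illustration and do not describe how SPECTRa or Zenodo are actually configured.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build the URL for an OAI-PMH ListRecords request."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec  # optionally restrict to one set/collection
    return f"{base_url}?{urlencode(params)}"


def parse_records(xml_text):
    """Return (identifier, title) pairs found in a ListRecords response."""
    root = ET.fromstring(xml_text)
    pairs = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        pairs.append((identifier, title))
    return pairs


# A hand-written response standing in for what a repository might return
# (hypothetical identifier and title, not real data).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:1234</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An illustrative reprint</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("https://repository.example.org/oai"))
    for identifier, title in parse_records(SAMPLE_RESPONSE):
        print(identifier, "|", title)
```

A real harvester would fetch the URL over HTTP and keep following the `resumptionToken` that the protocol uses to page through large result sets; both steps are left out here to keep the sketch self-contained.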
My experiment is part of what might be seen as the increasingly subtle interplay between:
- scientific authors, whose creative endeavour research is and without whom scientific publishers would not exist.
- publishers, who create a business model from the content freely given to them by authors but who also (especially if commercial) need to be accountable to their shareholders.
- the funding councils, many of whom now wish the outcomes of the research they fund to be openly available to all.
- the local libraries/administrators who have to adhere to/enforce all the rules contractually handed down to them by publishers whose direct customers they are, but who also need to serve their community of readers and authors.
- researchers who would rather do research than fret about the above, and who would rather spend limited resources doing that research rather than diverting an increasing amount of their attention into the above system.
- readers, who need unimpeded access to the research endeavours of others, but often have little influence on the policies and actions of all the other stakeholders, since they are NOT considered customers (of the publishers).
- etc. etc.
My experiment was in part designed to explore these rules, their interpretations and their boundaries. For the time being at least I seem to have found an arrangement that allows me to distribute versions-of-record of my own work, thanks to a generous and far-sighted learned society publisher. Watch this space!
Acknowledgments
This post has been cross-posted in PDF format at Authorea.