16 April 2018 – The European Commission has taken the bold step of proclaiming that by 2020 Europe is to embrace ‘Open Innovation’, ‘Open Science’ and ‘Open to the World’. Given the speed of these developments, it is crucial to face up to a key issue: trust in, and the quality of, data shared through Open Science mechanisms.
Clearly, a 24/7 release of raw data by all labs will rapidly clog up even the most well-resourced repositories, while – in that form – providing only limited benefit to the community. Without the provision of stable, structured databases and repositories that are curated and quality-controlled, we risk sinking in a swamp of data that will be largely ignored.
Preprints ought to play a key part in any Open Science agenda as a bridge between data and research papers, and they are poised to be accepted as a community standard in the biosciences. Now is the time to establish mechanisms that ensure we select for rigorous results that are associated with sufficient metadata to render them reproducible and discoverable.
Curation is key to discoverability
Key to the success of preprints is ease and speed of submission. As their volume increases, we need to find scalable ways to integrate quality control steps that reinforce the rigor and utility of the science shared, while posing minimal friction for researchers.
This is an opportunity to go beyond what the average scientific journal can muster by fortifying new authoring tools with templated methods sections that allow the reporting of materials and methods in a structured, machine-readable way.
The data presented in figures and tables represent the core of the scientific evidence in a manuscript. The level of confidence a reader has in non-peer-reviewed scientific work will be reinforced if preprints can be linked to each other and to published research papers via the experimental results shown in figures. For example, the EMBO SourceData platform turns figures and their legends into searchable machine-readable metadata that describe the design of the reported experiments. Interlinking the data presented in figures helps users to place a given preprint in the context of related data.
At the same time, such carefully curated figures are directly discoverable by data-directed search technology. It will be crucial to enable researchers to interrogate the published literature for specific experiments including the associated data, materials and methods.
Quality control: how and by whom?
The assessment of the quality of preprints can happen at three levels:
1. Are the experiments reported in a way that allows their interpretation and replication by others? Is the preprint marred by problems such as image manipulation or sub-par statistics?
2. Are the experiments carried out in a robust manner that allows meaningful interpretation?
3. Are the experiments reported valuable to the research community? Do they warrant preservation, curation and dissemination?
The current assessment of preprints does not formally cover any of these. We need to decide which of these assessment levels must be implemented and which would merely be desirable. Ideally, we would find a scalable way to systematically apply all three levels, but who would take on these responsibilities?
Curation and quality control of the data can be executed optimally as an integrated workflow by quality checkers, be they editors, curators or researchers, who are assisted by state-of-the-art automation.
Screening for work that is meaningful and valuable to the community (points 2 and 3, respectively) will have to be done by knowledgeable experts. This might in principle include the established pool of senior academics who review papers. However, we ought to tap into the pool of more junior researchers to avoid exacerbating the peer review bottleneck. There is a vast and highly capable community of experienced postdocs who would be well placed to carry out these tasks. A second group might be retired academics interested in staying engaged with the community.
Linking preprints and journals
Journals apply peer review and editorial assessment to submitted manuscripts. Some progressive journals have started to apply additional screening processes of varying complexity to complement peer review. It is imperative to avoid redundancy in such screening processes. We therefore hope that the quality control and curation exercise can be applied to preprints at the point of their intersection with journals, such as when a manuscript is submitted for publication at the same time as it is posted as a preprint.
But of course not all preprints need be destined for journal publication, so ultimately the screens outlined here should be applied to all preprints systematically. Once the issues of quality control, reproducibility and discoverability are addressed in preprints, the resulting improvements will carry over to the published journal paper. In other words, a central quality-controlled preprint server can play an important role in improving the journal literature.
Regardless of the mechanisms through which curation and quality control will be applied to preprints, there will be a need for appropriate resources to run such undertakings in a sustainable manner.
The full version of this Commentary is available at asapbio.org/pulverer-qc