Proposal for a ‘Virtuous Cycle’ of Data Citations

A group of colleagues from the Association of American Medical Colleges, the Multi-Regional Clinical Trials Center at Brigham and Women’s Hospital, and Harvard Medical School recently published a commentary piece in Nature, “Credit data generators for data reuse,” that encourages researchers and others working with data to place increased emphasis on data provenance as a way to track both where data originated and how it is reused (who created it and who reused it).

Their commentary responds to the increase in mandates from funders and publishers for researchers to publish and share their data at the end of a research project; however, there is not yet a robust system that streamlines tracking and crediting data reuse and shows how a data set has been used since its original creation or publication. The aim of citing a data set when it provides the basis for new scientific discoveries and conclusions, as well as when it is reused for further discovery by the same or other researchers, reinforces the importance of ensuring that data adhere to the FAIR principles (findable, accessible, interoperable, reusable).

The solution that the authors of this commentary present is to ensure that both the data set and the individual researchers (those who created the data set and those who are potentially reusing it) have persistent identifiers (PIDs). They call this a “Virtuous Cycle.” Individual researchers should ideally use an ORCID iD as their persistent identifier, and data sets should be identified with a digital object identifier (DOI). Data repositories commonly assign DOIs when data sets are deposited. A diagram showing a data set’s journey and connections to researchers and research outputs is below.

When data sets, the researchers who create, analyze, and reuse them, and the resulting research outputs (such as journal articles) all carry persistent identifiers, they create lasting linkages between people and their work, allowing both the work and the underlying data to be tracked within searchable databases. A further aspect of linking research data and outputs to researchers is to link grant funding to researchers, research projects, and outputs; this area is not as widely discussed as research data, but it can be another way of assessing the impact of research projects.
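
The linkages described above can be pictured as a small set of records, each connecting one persistent identifier to another. The following Python sketch is only an illustration and is not taken from the commentary: the `Link` class, the relation names, and the `credit_for` function are hypothetical, and every ORCID iD and DOI in it is a made-up placeholder rather than a real identifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One hypothetical linkage between two persistent identifiers."""
    subject: str   # PID of a researcher or a research output
    relation: str  # illustrative relation name: "created", "reused", "authored", "cites"
    target: str    # PID of the data set or output being linked to


# Placeholder identifiers only; none of these refer to real people or data.
LINKS = [
    Link("orcid:0000-0000-0000-0001", "created", "doi:10.5555/dataset-a"),
    Link("orcid:0000-0000-0000-0002", "reused", "doi:10.5555/dataset-a"),
    Link("orcid:0000-0000-0000-0002", "authored", "doi:10.5555/article-b"),
    Link("doi:10.5555/article-b", "cites", "doi:10.5555/dataset-a"),
]


def credit_for(dataset_pid, links):
    """Collect who created and reused a data set, and which outputs cite it."""
    def subjects(relation):
        return sorted(
            link.subject
            for link in links
            if link.relation == relation and link.target == dataset_pid
        )

    return {
        "creators": subjects("created"),
        "reusers": subjects("reused"),
        "citing_outputs": subjects("cites"),
    }


credit = credit_for("doi:10.5555/dataset-a", LINKS)
print(credit)
```

Because every record refers only to identifiers, the same lookup works whether the reuser is the original creator or another researcher, which is the point of giving both people and data sets PIDs.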

Using such identifier systems to link researchers with data and publications gives researchers a more established way to be recognized and credited for their work, and makes it easier to assess their citation impact.