Good data management covers not only how you treat your data while you are actively working with it, but also what you do once your research project has ended and you are ready to archive and share your data. When choosing a repository for the final version of your data, consider the repository’s stability, any discipline-specific expectations, and any associated costs.
What Is a Repository?
A data repository is a centralized place to store digital data, usually supported and maintained by an organization or institution, that preserves your data while making it accessible either to the public or to a subset of users, such as other researchers.
Repositories are a great solution for those who want both to preserve or archive their data for the long term and to share it.
Sharing data, often through a trusted repository, is becoming increasingly common across many disciplines, and some researchers are required to do it. Many funding agencies and journals require researchers to make their data publicly accessible when appropriate (sensitive or restricted data, for example, should not always be made fully public). Some funding agencies and journals name specific repositories in which grant recipients and authors must deposit their data. Others expect researchers to deposit their data in an ‘appropriate repository’, though they may or may not give clear guidelines for what makes a repository ‘appropriate’.
In addition to providing a place for data to be archived long-term, repositories provide other benefits for your data. Typically, a repository assigns a persistent, unique identifier to your deposit, which makes your data citable and more easily discoverable by others. Repositories also provide a landing page for each dataset. A landing page contains metadata that helps others find the dataset, gives context for it, links it to associated publications, and tells readers how to cite it.
Choosing a Repository
The repository that you choose for your data should match your particular data needs. This means that you should choose a repository that is commonly used and relevant within your research domain. The repository should also support the data format that you work with and will deposit.
Budgeting for a Repository
Try to identify a repository at the outset of your project, as some repositories charge deposit or curation fees. Knowing in advance what costs data archiving might incur will help you budget for your data management needs. Some funding agencies will also allow you to include data management costs in your proposal budget.
Sharing data often has the added benefits of raising interest in your publications (and thus improving citation metrics) and speeding the advancement of research. Faster research is especially valuable in complex fields, such as those related to the health sciences.
The Registry of Research Data Repositories (re3data) provides a directory of repositories that users can browse by discipline. Common generalist repositories, which accept submissions from any discipline, include Zenodo and OSF (Open Science Framework). It was also recently announced that Dryad, a repository for the natural and medical sciences, will partner with Zenodo to streamline and simplify deposit options for researchers.
If you have questions about selecting a repository, feel free to contact us!