Preserving digital data is more complex than saving it on a hard drive or a server, and much more complex than storing paper copies. Digital data can become unusable faster than paper, because the computer and software systems needed to read it keep changing. It is wise to submit your final data to a trusted repository that is equipped to preserve it, and funders will likely ask you for details about your plan for preserving or archiving your data.
What data should be preserved?
You will rarely need to preserve all of the data created over the course of a research project. Instead, prioritize data that cannot be re-created, data that is costly to reproduce, data from one-time events, experimental data, etc. You will also need to preserve the data required to validate your research findings, in accordance with funding agency requirements.
How long is “long-term” preservation?
Many funders want a DMP to include a plan for “long-term preservation” without specifying how far into the future data needs to be preserved. It can be easier to state a minimum amount of time that you plan to preserve your data than a maximum. A good rule of thumb for the minimum is to take the amount of time it typically takes for a paper to be cited and add 5 years. Always make sure that the time period you commit to is acceptable to your peers and meets the funding agency’s requirements. Institutions, publishers, and repositories may also have a minimum retention period to follow. Example statements:
- Data will be kept for at least 10 years. After this time, the data may be subject to deletion if it has not been reused, accessed, or cited.
- Data will be available for the lifetime of the repository, as it will be deposited to Dryad, which is part of the DataONE network, a group committed to long-term preservation and access.
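The rule of thumb above is simple arithmetic, and a short sketch can make it concrete. The function name `minimum_retention_years` and the 5-year citation lag in the usage line are illustrative assumptions, not values from any funder or repository; substitute the figure that is typical for your field.

```python
def minimum_retention_years(years_until_cited: int, buffer_years: int = 5) -> int:
    """Rule of thumb: typical time for a paper to be cited, plus a buffer.

    years_until_cited is an assumption you supply for your own field;
    buffer_years defaults to the 5 years suggested in the guidance above.
    """
    if years_until_cited < 0 or buffer_years < 0:
        raise ValueError("durations must be non-negative")
    return years_until_cited + buffer_years


# Hypothetical field where papers are typically cited within 5 years.
print(minimum_retention_years(5))
```

Treat the result as a floor only: a funder, institution, publisher, or repository may require a longer period.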
Who is responsible for preserving data?
The most responsible and reliable way to preserve your data is to seek out a data custodian, such as a data repository. When possible, deposit your data in a repository that provides curation services rather than preservation services alone. Curated data is more valuable, easier to locate and reuse, and more highly cited. Curation activities include verifying the integrity and quality of data, migrating data formats, and creating descriptive records for data. If you deposit in a repository that does not offer curation services, be sure to document your data thoroughly, or reach out to an RDS consultant or data librarian.
- An example of a data repository that offers data curation services is ICPSR (Inter-university Consortium for Political and Social Research). Because UW-Madison is an ICPSR member institution, UW-Madison researchers can deposit data at no cost, but they must pay a fee to make the data open to the public. ICPSR provides a DOI and a data citation for all deposited datasets.
- Other options for data repositories (that may or may not offer curation services) include: MINDS@UW, Dryad, Figshare, and OSF (Open Science Framework).
- Try to use a disciplinary repository commonly used in your research field whenever possible. If your domain does not have a common repository, or if the common repository in your area isn’t a good fit for your data, then you should seek alternatives.
- Another option for UW-Madison researchers is to use the institutional repository MINDS@UW.
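Verifying integrity, one of the curation activities mentioned above, is something you can also do yourself when a repository offers preservation only. The following is a minimal sketch, assuming your final data sits in a single local folder: it records a SHA-256 checksum per file and later reports files whose checksum no longer matches. The function names are hypothetical and do not correspond to any repository's tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to data_dir) to its checksum."""
    return {
        p.relative_to(data_dir).as_posix(): sha256_of(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(data_dir: Path, manifest: dict[str, str]) -> list[str]:
    """List manifest entries that are now missing or have different content."""
    current = build_manifest(data_dir)
    return sorted(name for name in manifest if current.get(name) != manifest[name])
```

One way to use this is to store the manifest alongside your data documentation at deposit time and rerun `changed_files` after each copy or migration; an empty list means every recorded file still matches.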
Many data repositories have deposit requirements covering data types, file formats, file size limits, or Creative Commons licenses. Make sure that you are familiar with the requirements of any repository you intend to deposit with. For more information on sharing data and selecting a repository, refer to our Data Sharing Essentials page.
- Which data should be preserved?
- How long will the data be preserved?
- Who is responsible for preserving the data?
This content was adapted from Iowa State University Library’s Data Management Plan Guide.