Reasons to share your data
Data sharing encourages collaboration between researchers and across disciplines, reduces duplication of existing research, enables reproducibility, and benefits the public. The following are some other reasons to share your data:
- To raise interest in publications. One recent study found a 25% increase in citations for articles whose associated data were available online.
- To speed research. Particularly in complex fields, data sharing can accelerate discovery, as researchers studying Alzheimer’s disease and UW-Madison researchers studying the Zika virus have found.
- To establish priority. Data shared in a repository online can be time stamped to establish the date they were produced, blocking “scooping” tactics.
- To fulfill funder and journal requirements. Grant funders and (in some disciplines) journals may require data sharing. If you have questions about data sharing or about writing data management plans for grants, contact us.
Considerations before sharing
Before sharing your data, it is important to know the expectations, standards, policies, and laws that affect your data. The following are a few key questions to consider before you share. For more information please see our Responsible Data Planning, Use, and Sharing micro-course.
- Restrictions. Do your data contain confidential, sensitive, or private personal information? If you anonymize, can individuals in the dataset be reidentified? Are there any legal or intellectual property-related restrictions?
- Documentation. Are your datasets understandable to those who wish to use them? Have you included all the metadata, methodology descriptions, codebooks, data dictionaries, and other descriptive material that someone looking at the dataset for the first time would need?
- Standards. Do your datasets comply with description, format, metadata, and sharing standards in your field?
- Licensing. What reuse policies do you wish for your data? Consider the Open Knowledge Foundation’s definition of open data carefully before you attach reuse restrictions. For more information on data licensing, see Introduction to intellectual property rights in data management from Cornell University.
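The reidentification question above can be partly explored in code. The following is a minimal sketch, assuming a tabular dataset with direct identifiers already removed; the column names, the records, and the threshold `k` are all hypothetical. It flags combinations of indirect identifiers that are shared by fewer than `k` records, since rare combinations are the ones most likely to point to a single person. A check like this does not establish that a dataset is safe to share.

```python
from collections import Counter


def small_groups(rows, quasi_identifiers, k=5):
    """Return quasi-identifier combinations shared by fewer than k rows."""
    counts = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return {combo: n for combo, n in counts.items() if n < k}


# Hypothetical, made-up records with names already removed.
rows = [
    {"zip": "53703", "birth_year": 1980, "sex": "F"},
    {"zip": "53703", "birth_year": 1980, "sex": "F"},
    {"zip": "53706", "birth_year": 1975, "sex": "M"},
]

# With k=2, any combination that appears only once is flagged.
risky = small_groups(rows, ["zip", "birth_year", "sex"], k=2)
print(risky)  # {('53706', 1975, 'M'): 1}
```

Flagged combinations are candidates for generalizing (for example, reporting a birth decade instead of a birth year) or for removal before sharing.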
The FAIR Principles provide guidelines for preparing and sharing your data in ways that improve its reusability and machine-actionability. The four guiding principles focus on making your data Findable, Accessible, Interoperable, and Reusable. Each principle includes specific best practices that can help you describe and license your data and guide your choice of an appropriate repository. There are also guidelines for applying the FAIR principles to research software, known as the FAIR4RS Principles.
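One way to make a dataset's description machine-readable is to ship a small metadata file alongside it. The sketch below is an illustration only, not a FAIR-endorsed schema: the field names, the dataset title, the license choice, and the helper `build_data_dictionary` are all assumptions made for the example. It builds a basic data dictionary from a CSV header and prints it as JSON together with a title and a license.

```python
import csv
import io
import json


def build_data_dictionary(csv_text, descriptions):
    """Describe each column: name, description, missing count, example value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    dictionary = []
    for name in reader.fieldnames:
        values = [row[name] for row in rows if row[name] != ""]
        dictionary.append({
            "variable": name,
            # Undescribed columns are marked so they are not silently skipped.
            "description": descriptions.get(name, "TODO: describe"),
            "n_missing": len(rows) - len(values),
            "example": values[0] if values else None,
        })
    return dictionary


# Hypothetical dataset contents and descriptions.
csv_text = "site_id,temp_c\nA1,12.5\nA2,\n"
descriptions = {"temp_c": "Water temperature in degrees Celsius"}

record = {
    "title": "Example stream temperature readings",  # hypothetical title
    "license": "CC0-1.0",  # assumed license for the example
    "variables": build_data_dictionary(csv_text, descriptions),
}
print(json.dumps(record, indent=2))
```

Many repositories collect similar information through their own deposit forms, so a file like this supplements, rather than replaces, the repository's metadata.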
Places to share data
- Disciplinary repository. If your field has well-known disciplinary data repositories, we recommend prioritizing them, as depositing there will make your data more discoverable. Examples include ICPSR and the National Snow and Ice Data Center.
- Institutional repository. An institutional repository is a campus repository for the university's scholarly outputs, with a commitment to long-term preservation. At UW-Madison, we have MINDS@UW for data from any UW-Madison affiliated researcher and the Data and Information Services Center’s Online Data Archive for Social Sciences datasets.
- Generalist repositories. Generalist repositories accept data regardless of disciplinary focus or data type. Dryad, Zenodo, and OSF (Open Science Framework) are good options, as they accept submissions from any discipline and have focused on infrastructure and preservation plans. UW-Madison researchers can deposit in Dryad for free through our institutional membership, and Dryad lets you upload scripts and software source code and publish them on Zenodo.
- For more information on MINDS@UW and our Dryad membership, visit our Data Repositories page.
To browse data repository options for various disciplines, check out DataCite’s Re3Data.org, the Registry of Research Data Repositories. Please check repository pages for up-to-date features and information or contact us with any questions.
If your research project makes use of code for the cleaning, analysis, or transformation of data, consider including that code in any data sharing plans. Because code runs in specific computing environments, uses specific software packages, or depends on custom installs, it can be difficult to share, but there are steps you can take to help the code you use translate to other contexts.
- Documentation: As with data, it’s important to attach metadata to your code. A common practice is to generate a README file that explains variable names, operators, and other features of the code that help others reuse and cite it.
- Commenting: Well-written code features commentary that explains the code to future users. It’s usually not necessary to explain what the code does (that’s often legible in the code itself), but good comments illuminate why your team made certain decisions in the drafting of your code base.
- File Formats: As with data, it’s a best practice to store your code in open formats. Plain text files are the simplest way to ensure that anyone who wants to use the code can.
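The practices above can be sketched in a few lines. This is an illustration under stated assumptions: the sentinel value, the function `clean_readings`, and the README wording are hypothetical. The comment explains why a decision was made rather than restating what the code does, and the README text is plain text that could be saved next to the script.

```python
# Hypothetical error code written by the data logger when a sensor drops out.
SENTINEL = -999.0


def clean_readings(readings):
    """Return readings with sensor error codes removed."""
    # Why: error codes are dropped rather than replaced with an average,
    # because imputed values would hide how often the sensor failed.
    return [r for r in readings if r != SENTINEL]


# Plain-text README content describing the script for future users.
README = """\
clean_readings.py
=================
Purpose:   Removes sensor error codes from temperature readings.
Variables: SENTINEL is the error code written by the logger (-999.0).
Input:     A list of floating-point readings in degrees Celsius.
Output:    The same list without error codes.
"""

print(clean_readings([12.5, -999.0, 13.1]))  # [12.5, 13.1]
print(README)
```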
Repositories for code
There are short-term and long-term repositories for sharing code, some that are better for sharing code alongside your other research materials, and specific ones that may be required by your field, funders, or journals.
- Short-term repositories: Specialized code repositories prioritize short-term sharing, collaboration, and version control and are less concerned with long-term storage and preservation of the code. Examples include GitHub, GitLab, Bitbucket, and Code Ocean. These repositories can often be combined with the options below for long-term preservation or to share code alongside other research outputs.
- Long-term repositories: Long-term repositories provide long-term storage of any file format and make your code and other research outputs shareable into the future. Examples include Open Science Framework (OSF), Figshare, and Zenodo. Zenodo offers custom digital object identifiers (DOIs), including for GitHub codebases through a partnership designed to make code stored in GitHub citable.
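When archiving a GitHub codebase on Zenodo, descriptive metadata can be supplied in a `.zenodo.json` file at the top level of the code repository. The sketch below generates such a file's contents. The project title, creator, description, and license are hypothetical, and the fields shown are a small assumed subset, so check Zenodo's current documentation for the full list of supported fields before relying on it.

```python
import json


def zenodo_metadata(title, description, creators, license_id, keywords):
    """Build a metadata dictionary for a software deposit."""
    return {
        "upload_type": "software",
        "title": title,
        "description": description,
        "creators": creators,
        "license": license_id,
        "keywords": keywords,
    }


# Hypothetical project details.
metadata = zenodo_metadata(
    title="Stream temperature cleaning scripts",
    description="Scripts used to clean and summarize temperature readings.",
    creators=[{"name": "Doe, Jane", "affiliation": "Example University"}],
    license_id="MIT",
    keywords=["data cleaning", "temperature"],
)

# Save this text as .zenodo.json in the top level of the code repository.
zenodo_json = json.dumps(metadata, indent=2)
print(zenodo_json)
```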
What help is available?
These people and services can help you make data-sharing decisions:
- An RDS team member.
- The subject librarian for your department.
- An informatics specialist or IT consultant in your department.
- Data and Digital Scholarship Manager Cameron Cook.