Barriers to data reuse
- Discovery: Many datasets are not findable through Google. Search journal articles for references to datasets, or ask a librarian to search for relevant datasets.
- Acquisition: Even when authors say they offer dataset access, they may not respond to requests. Be persistent.
- Cost: If you need a dataset that must be paid for, contact your liaison librarian to see whether the library can purchase it.
- Licensing: Some dataset owners impose stringent reuse requirements. Read any data license carefully to confirm that it permits your intended use.
- Security: Datasets with licensing or other confidentiality restrictions on access need strong digital security. If you need to safeguard a dataset, talk to the Office of Cybersecurity.
Finding data for reuse
Valuable digital data live in many places:
- On the open web. Government websites often include useful datasets, and some university researchers are also choosing to open their data for reuse.
- In general repositories. Open Science Framework, Zenodo, and figshare are examples of general repositories where researchers from any discipline share data.
- In discipline-based data repositories. If you're not sure whether a repository exists in your research area, search re3data, a global registry of research data repositories, or ask a librarian, who can suggest resources.
- In campus-based data repositories. Many academic institutions now host repositories for data and for other research outputs such as papers, presentations, and theses. Their contents are typically findable on the open web.
- With researchers or research groups. Data held privately by researchers are the hardest to locate and gain access to. The best route may be to contact the authors of relevant publications.
Be prepared for:
- Data cleanup. Many datasets are poorly organized, or only available in difficult-to-reuse forms.
- Data interpretation difficulties. Many datasets lack data dictionaries and other necessary documentation. You may have to work with the data creator to understand the dataset.
- Data disappearance. Like any other web content, online datasets can and do disappear without warning. Data formally deposited in data repositories tend to last longest. Whenever possible, cite data using persistent identifiers such as DOIs or Handles.
Cite datasets for the same reasons you cite books and journal articles: so that dataset creators receive appropriate credit for their work, and so that the antecedents of your research are clear.
Data citation standards are still emerging in many disciplines, but you can cite data much as you would cite any other work. Some disciplinary style manuals, professional organizations, journals, and repositories provide guidance on preferred data citation formats; others do not.
ICPSR suggests the following minimum elements for a data citation:
- Persistent identifier (such as a Digital Object Identifier (DOI), a Uniform Resource Name (URN), or a Handle)
- Persistent identifiers are preferred, but if none is available, a URL will suffice.
Other information on data citation: