Finding, reusing, and citing data

Barriers to data reuse

  • Discovery: Many datasets cannot be found through general web search engines such as Google. Search journals for references to datasets, or ask a librarian to help you search for relevant ones.
  • Acquisition: Even when authors say they offer dataset access, they may not respond to requests. Be persistent.
  • Cost: If you need a for-pay dataset, contact your liaison librarian to see whether the library can purchase it.
  • Licensing: Some dataset owners impose stringent reuse requirements. Read any data license carefully to be sure you will be allowed the use you want.
  • Security: Datasets whose access is restricted by a license or by confidentiality requirements need strong digital security. If you need to safeguard a dataset, talk to the Office of Campus Information Security.

Finding data for reuse

Valuable digital data live in many places:

  • On the open web. Government websites often include useful datasets, and some university researchers are also choosing to open their data for reuse.
  • In discipline-based data repositories. If you’re not sure whether a repository exists in your research area, ask a librarian.
  • In campus-based data repositories. Typically these are findable on the open web.
  • With researchers or research groups. These are hardest to locate and gain access to. The best route may be through contacting authors of relevant publications.

Reusing data

Be prepared for:

  • Data cleanup. Many datasets are poorly organized, or available only in difficult-to-reuse forms.
  • Data interpretation difficulties. Many datasets lack data dictionaries and other necessary documentation. You may have to work with the data creator to understand the dataset at all.
  • Data disappearance. Like any other web content, online datasets can and do disappear without warning. Data formally deposited in data repositories tend to live longest.
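
Because online datasets can disappear, it may help to keep a local copy together with a note of where and when you obtained it. The following is a minimal Python sketch of that idea, not an established procedure: the function name `describe_local_copy`, the record fields, and the example URL are all hypothetical, and it assumes you have already downloaded the file and are permitted to keep a copy.

```python
import hashlib
from datetime import date
from pathlib import Path


def describe_local_copy(path, source_url, retrieved=None):
    """Build a small provenance record for a dataset file saved locally.

    The field names are illustrative assumptions, not a standard.
    """
    digest = hashlib.sha256()
    # Read in 1 MiB chunks so large datasets do not need to fit in memory.
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return {
        "file": Path(path).name,
        "source_url": source_url,
        # Default to today's date when no retrieval date is supplied.
        "retrieved": (retrieved or date.today()).isoformat(),
        # The checksum lets you confirm later that the copy is unchanged.
        "sha256": digest.hexdigest(),
    }
```

A call such as `describe_local_copy("survey.csv", "https://example.org/survey.csv")` (both names hypothetical) returns a dictionary you could store alongside the file, for instance as a small text or JSON note.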

Citing data

Many disciplines do not yet have data citation standards, though the DataCite initiative is working on them. Current workarounds include:

  • Citing a “data paper,” where available.
  • Citing a journal article describing the dataset.
  • Citing the dataset as a website, where possible.

Cite datasets for the same reasons you cite books and journal articles: so that dataset creators receive appropriate credit for their work, and to make clear the antecedents of your research.
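
Where your discipline has no prescribed style, a consistent pattern for dataset citations can still be assembled by hand. The sketch below is one possible approach in Python, loosely modelled on the creator, year, title, publisher, identifier ordering that DataCite recommends; check the requirements of your own field or journal before relying on it. The function name `format_dataset_citation` and every value in the usage example are hypothetical.

```python
def format_dataset_citation(creators, year, title, publisher, identifier, version=None):
    """Assemble a dataset citation string from its basic elements.

    Assumed ordering: creators (year). title. [Version n.] publisher. identifier
    """
    parts = [f"{'; '.join(creators)} ({year})", title]
    if version:
        # The version is optional and is omitted when not supplied.
        parts.append(f"Version {version}")
    parts.append(publisher)
    # No trailing period after the identifier, so a URL or DOI stays intact.
    return ". ".join(parts) + ". " + identifier


# Hypothetical example values, for illustration only.
print(format_dataset_citation(
    ["Doe, J.", "Roe, R."],
    2020,
    "Example Survey Data",
    "Example Repository",
    "https://doi.org/10.1234/example",
))
# prints: Doe, J.; Roe, R. (2020). Example Survey Data. Example Repository. https://doi.org/10.1234/example
```

Keeping the identifier last and unpunctuated means readers can copy it directly, which matters most when the dataset is cited as a website.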