Definition
Metadata is information about the context, content, quality, provenance, and/or accessibility of a set of data.
Relevance
Metadata may be:
- required for depositing a data set in disciplinary repositories or for publishing it in research journals
- critical documentation for the longevity and reproducibility of research data
- useful for visualizing or analyzing the data in data files
What are some examples of metadata?
Metadata can exist in a variety of formats. Some of the most common are summarized in the table below.
Type of metadata | Example of this type |
---|---|
A text or HTML document. | Metadata includes authors, dates, location, etc. This metadata accompanies the dataset on Seasonal Frost Depths, Midwestern USA (1971-1981) that is archived at the National Snow and Ice Data Center. |
An XML document linked to data files. | Metadata includes authors, locations, dates, etc. This metadata is linked to a dataset of locations in Northeastern Illinois, Northwestern Indiana, and Southeastern Wisconsin where alternative vehicle fuels are available. This data was collected by the City of Chicago and is provided on Data.gov. (Note: you may need to select “View page source” in your browser to see the XML format.) |
Information embedded in an XML data file. | Metadata includes authors, dates, organism, publication, instrument, etc. It is kept within the X-ray diffraction data file for UDP-galactopyranose mutase in the Protein Data Bank repository, and it follows the PDBML (Protein Data Bank Markup Language) specification. (Note: you may need to select “View page source” in your browser to see the XML format.) |
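
To make the "embedded in an XML data file" case more concrete, here is a minimal Python sketch that pulls a metadata block out of a data file. The file contents, the element names (`dataset`, `metadata`, `author`, and so on), and the helper `read_embedded_metadata` are all hypothetical and chosen only for illustration; they do not follow PDBML or any other real schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical data file: the element names below are illustrative only
# and do not follow PDBML or any other real schema.
SAMPLE = """<dataset>
  <metadata>
    <author>A. Researcher</author>
    <date>2012-05-01</date>
    <instrument>Example diffractometer</instrument>
  </metadata>
  <measurements>
    <value>1.0</value>
    <value>2.5</value>
  </measurements>
</dataset>"""


def read_embedded_metadata(xml_text):
    """Return the children of <metadata> as a dict of tag -> text."""
    root = ET.fromstring(xml_text)
    block = root.find("metadata")
    if block is None:
        # No metadata block in this file.
        return {}
    return {child.tag: (child.text or "").strip() for child in block}


metadata = read_embedded_metadata(SAMPLE)
for field, value in metadata.items():
    print(f"{field}: {value}")
```

A real data file would be read according to its published schema, which defines the actual element names and where the metadata lives.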
What metadata help is available?
A data specialist from one of the following groups may be able to help you find, adapt, and use an appropriate metadata standard.
- An informatics specialist or IT consultant in your department.
- A digital curation consultant.
- The subject librarian for your department.
- A disciplinary society in your research area.

What metadata should I use?
Metadata standards specify what pieces of information are included and how they are expressed in digital files. Some are generic enough to be useful across a wide array of disciplines, while others are highly specific to disciplinary areas. You may select a metadata standard based on the discipline that you’re working in, or the type of data that you’re working with.
We cannot provide a comprehensive list here. Instead, we include examples in broad disciplinary areas, plus a “general” category. Where possible, we selected examples that appear to have broad adoption within or across disciplinary areas.
Disciplinary area | Metadata standard | Description |
---|---|---|
General | Dublin Core | Widely used in disciplinary and institutional repositories. |
General | Disciplinary Metadata from the DCC | Searchable list of disciplinary metadata standards and related information. Includes biology, Earth science, physical science, social science & humanities and general research data. |
General | Altova Schema library | A reference library to common (and uncommon) industry and cross-industry schemas. |
Life Sciences | Darwin Core | Designed to facilitate the sharing of information about biological diversity. It is primarily based on taxa, their occurrence in nature as documented by observations, specimens, and samples and related information. |
Life Sciences | EML (Ecological Metadata Language) | Maintained by the Ecological Society of America. Consists of XML modules that can be used to document ecological datasets. |
Humanities | Seeing Standards: A Visualization of the Metadata Universe | Information on 105 cultural heritage metadata standards. |
Humanities | TEI (Text Encoding Initiative) | A widely used standard for representing textual materials in XML. |
Humanities | VRA (Visual Resources Association) Core | A metadata standard for works of visual culture and the images that document them. |
Social Sciences | DDI (Data Documentation Initiative) | A metadata specification for the social and behavioral sciences, created by the Data Documentation Initiative. It is used to document data through its lifecycle and to enhance dataset interoperability. |
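
As an illustration of how a standard fixes both the pieces of information and their expression in a file, the following Python sketch writes a small Dublin Core record as XML. It uses the namespace of the Dublin Core Metadata Element Set; the `<metadata>` wrapper element, the helper `build_dublin_core_record`, and every field value except the dataset title are assumptions made for this example, not values taken from a real record.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15-element Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_dublin_core_record(fields):
    """Build an XML record from a dict of Dublin Core element name -> value.

    The <metadata> wrapper is an assumption for this sketch; repositories
    usually define their own container element.
    """
    record = ET.Element("metadata")
    for name, value in fields.items():
        element = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        element.text = value
    return record


record = build_dublin_core_record({
    "title": "Seasonal Frost Depths, Midwestern USA (1971-1981)",
    "creator": "Example Author",   # placeholder, not the real creator
    "date": "1981",                # placeholder value
    "type": "Dataset",
    "format": "text/plain",        # placeholder value
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

A repository that accepts Dublin Core will typically publish its own template or schema for the wrapper element and for which fields are required, so a sketch like this would be adapted to that template.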