What is “digital curation”? How is it relevant to research data?
Digital curation covers cradle-to-grave data management, including storage, preservation, selection, transfer, description, sharing, access, reuse, and transformation. With funding agencies, publishers, and research disciplines increasingly focused on data sharing and preservation, having sound data-management practices in place is more relevant than ever.
Why should I care about managing my research data?
Digital curation and data-management plans may already be a requirement of your grant-funded research. The NIH Data Sharing Policy has been in effect since October 1st, 2003. As of January 2011, data management plans are also a requirement of all NSF-funded grant applications. Digital curation enables faster knowledge discovery and increases the visibility and impact of published work.
What are the top recommendations for managing data?
There are six key recommendations for managing your data/digital materials to ensure their longevity and usefulness:
- keep data/digital materials in sustainable formats
- store and back up your data
- include metadata to preserve contextual information about who collected or created the data, the date of collection, instrument settings, etc.
- organize and structure data using file naming/versioning conventions, ontologies/vocabularies, and/or databases
- keep data secure and implement procedures for keeping sensitive data private
- include explanations about how data may be re-used and how the source of the data should be acknowledged
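The metadata and file-naming recommendations above can be sketched in a few lines of Python. This is only an illustration: the field names, the naming scheme (project_date_version), and the idea of a JSON “sidecar” file that travels alongside the data are example conventions, not a formal standard.

```python
import json

# Illustrative metadata record; the field names here are an example, not a standard.
metadata = {
    "creator": "A. Researcher",
    "date_collected": "2024-03-15",
    "instrument": "spectrometer",
    "instrument_settings": {"wavelength_nm": 532, "integration_ms": 100},
    "description": "Raman spectra of sample batch 7",
}

# A simple naming/versioning convention: project_date_version.extension
data_file = "ramanproject_2024-03-15_v01.csv"

# Write the metadata to a "sidecar" file with the same base name,
# so the context stays with the data wherever it is copied.
sidecar = data_file.replace(".csv", ".json")
with open(sidecar, "w") as f:
    json.dump(metadata, f, indent=2)
```

Even this small amount of structure means a future lab member can tell who made the file, when, and with what settings, without tracking down the original collector.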
Where can I store my data long term?
Ask the journals in which you publish whether they will curate data, perhaps as “supplementary materials.” If they do, ask for some sample articles with underlying data to make sure they do a good job; some don’t! Alternately, find out whether a data repository exists for your discipline or research specialty; the subject librarian for your discipline may be able to help.
How can I make sure that data collected by several lab members can be found and used in the future, since lab members come and go?
This is entirely dependent on the processes and procedures that your lab designs and agrees upon. You may wish to look into electronic lab notebook solutions; if they turn out to be overkill, a simple wiki may be perfectly adequate. Examples of “open notebook science” may give you ideas for useful tools, even if you are unprepared to make your process public.
Is storing my data in Excel spreadsheets adequate?
It may be, particularly for short-term needs. Keep in mind, however, that the Excel file format changes over time; your spreadsheet may lose functionality or become unreadable by current software. When you finish your analysis, replace formulas with their computed values, and keep at least one copy of your work in comma-separated value (CSV) or OpenDocument format.
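One way to follow that advice, assuming your final values are already available in a script, is to write them to CSV with Python's standard library, so nothing proprietary is needed to read them back later. The file name and column headers below are illustrative.

```python
import csv

# Final computed values (not formulas), written to a plain-text CSV file
rows = [
    ["sample", "mean_signal", "std_dev"],
    ["A", 12.4, 0.8],
    ["B", 9.7, 1.1],
]

# newline="" is the documented way to open CSV files for writing in Python
with open("results_final.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)
```

A CSV file like this can be opened by any spreadsheet program, text editor, or analysis tool, decades from now.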
What are ontologies and metadata? Do I need to use them?
An ontology is a shared vocabulary for describing objects, their properties, and their interrelationships in a domain or field. More simply put, ontologies represent knowledge in a structured and consistent fashion to facilitate the organization and automated processing of information.
Metadata (“data about data”) includes but is not limited to information on the authorship, subject, origin, organization, and intellectual-property rights that pertain to a set of data or a publication. Without metadata, datasets are easy to lose and difficult or impossible to interpret.
Ontologies and metadata become increasingly important as datasets become larger and more complex, particularly when data doesn’t lend itself to search-engine retrieval, as with image data. Incorporating appropriate ontologies and/or metadata into your research processes can help you organize data, meet funding requirements, and help other researchers collaborate or build on your results more easily.
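As a small illustration of the consistency that a shared vocabulary buys you, the sketch below checks descriptive terms against a controlled list. The vocabulary and function here are invented for the example; real projects would draw terms from a published ontology for their discipline.

```python
# A toy controlled vocabulary: only these terms may describe specimen type.
# (Invented for illustration; not from a published ontology.)
SPECIMEN_TYPES = {"tissue", "cell_culture", "whole_organism"}

def validate_specimen_type(term: str) -> str:
    """Normalize a free-text term and reject anything outside the vocabulary,
    so every record describes specimens with exactly the same words."""
    normalized = term.strip().lower().replace(" ", "_")
    if normalized not in SPECIMEN_TYPES:
        raise ValueError(f"{term!r} is not in the controlled vocabulary")
    return normalized

validate_specimen_type("Cell culture")  # accepted, normalized to 'cell_culture'
```

Because every record ends up with the same spelling of the same term, datasets from different lab members (or different labs) can be searched and combined automatically.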