by Cid Freitag, Instructional Technology Program Manager at DoIT Academic Technology
If the data you need still exists;
If you found the data you need;
If you understand the data you found;
If you trust the data you understand;
If you can use the data you trust;
Someone did a good job of data management.
Rex Sanders ‐ USGS‐Santa Cruz*
Data management practices are described in detail in a variety of documentation and tutorials, many of which focus on the specific needs and resources of the organization that produced them. The following is a selected list of resources that are general enough to apply across disciplines and beyond the university or agency that developed them.
Guides and Tutorials
- The University of Washington offers a well-organized, comprehensive data management guide. Most of the resources listed are publicly available.
- Georgia Tech’s guide includes a webpage that aggregates the data management requirements of several federal funding agencies. Learn about data management requirements.
- Multiple authors contributed to the short guide “10 Simple Rules for the Care and Feeding of Scientific Data,” which offers researchers practical advice on managing their data for sharing and reuse.
- The USGS Data Management Training Modules are tailored to the needs of the USGS, but many of the practices are applicable to any discipline.
- In particular, three short narrated tutorials give overviews of the value of data management, planning, and best practices for preparing data to share.
Data Science MOOCs
Several Massive Open Online Courses (MOOCs) cover topics related to data analysis and research methods. Even if you choose not to do the coursework and earn a statement of completion, it’s easy to sign up for the courses, which gives you access to lectures and examples.
The Class Central website has curated a list of several data science and analysis methods MOOCs, developed by reputable sources.
The MOOCs listed here were developed through Johns Hopkins University and are offered through the Coursera platform. They are part of a Data Science Specialization series of courses, and they apply to data management practices beyond specific analytical techniques. Each of these courses lasts 4 weeks and is offered frequently. Currently, there is a new offering of each course starting each month from March through June 2015.
The Data Scientist’s Toolbox, Jeff Leek, Roger Peng, Brian Caffo
“The course gives an overview of the data, questions, and tools that data analysts and data scientists work with.” It focuses on a practical introduction to tools, including version control, Markdown, Git, GitHub, R, and RStudio.
Getting and Cleaning Data, Jeff Leek, Roger Peng, Brian Caffo
“This course will cover the basic ways that data can be obtained… It will also cover the basics of data cleaning and how to make data ‘tidy’… The course will also cover the components of a complete data set including raw data, processing instructions, codebooks, and processed data. The course will cover the basics needed for collecting, cleaning, and sharing data.” Tools used in this course: GitHub, R, RStudio
Reproducible Research, Jeff Leek, Roger Peng, Brian Caffo
“Reproducible research is the idea that data analyses, and more generally, scientific claims, are published with their data and software code so that others may verify the findings and build upon them… This course will focus on literate statistical analysis tools which allow one to publish data analyses in a single document that allows others to easily execute the same analysis to obtain the same results.” Tools: R Markdown, knitr
*Rex Sanders quote from: Environmental Data Management: CHALLENGES AND OPPORTUNITIES, Jamie Gerrard | March 2014
Looking for additional information about research data management? Contact us.