Information from Love Your Data Week.
Message of the day
Good documentation tells people they can trust your data by enabling validation, replication, and reuse.
Things to consider
Why does having good documentation matter?
- It contributes to the quality and usefulness of your research and the data itself – for yourself, colleagues, students, and others.
- It makes the analysis and write-up stages of your project easier and less stressful.
- It helps your teammates, colleagues, and students understand and build on your work.
- It helps to build trust in your research by allowing others to validate your data or methods.
- It can help you answer questions about your work during pre-publication peer review and after publication.
- It can make it easier for others to replicate or reuse your data. When they cite the data, you get credit! Include these citations in your CV, funding proposal, or promotion and tenure package.
- It improves the integrity of the scholarly record by providing a more complete picture of how your research was conducted. This promotes public trust and support of research!
- Some communities and fields have been talking about documentation for decades and have well-developed standards for it (e.g., geospatial data and clinical data), while others do not (e.g., psychology, education, and engineering). No matter where your research community or field falls on this spectrum, you can start improving your documentation today!
Stories (learn from others’ mistakes and successes)
- Error-laden database kills paper (Retraction Watch): http://retractionwatch.com/2016/12/27/error-laden-database-kills-paper-extinction-patterns/#more-46585
- The value of a good inventory system: https://www.dataone.org/data-stories/inventory-overload
- Metadata? I thought you were in charge of that: https://www.dataone.org/data-stories/metadata-i-thought-you-were-charge
- The case of the missing research protocol: https://www.dataone.org/data-stories/case-missing-research-protocol
- The importance of documenting how your images and visualizations are created: http://retractionwatch.com/2016/07/04/diabetes-researcher-logged-1-retraction-3-correx-after-pubpeer-comments/
Resources
Practical Tips by data type & format
- Numeric/Spreadsheets
  - Check out Christine Bahlai’s guide for using spreadsheets for scientific data
  - Check out Kristin Briney’s video and blog post on data dictionaries
  - Check out Colectica for Excel to document your spreadsheet
- Lab notebooks
- Observation
  - Define all your codes clearly and operationally
  - Document introductory & debriefing comments
  - Make sure you’ve defined codes for non-verbal behavior
  - Identify annotations separately from quotes or notes
- Interview
  - Documentation should include
    - your assumptions
    - rationale for choices in designing the interview
    - the interview questions or script (if applicable)
    - relationship or map between the research questions and the interview questions
    - codes or notations for non-verbal behavior
    - syntax or codes to indicate annotations versus interview responses
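For spreadsheet data, one way to start the kind of data dictionary mentioned above is to draft a skeleton from the column headers and fill in the descriptions by hand. The sketch below is only an illustration: the function name `draft_data_dictionary`, the sample columns, and the choice of fields (`description`, `units`, `allowed_values`) are assumptions, not a format prescribed by any of the guides listed here.

```python
import csv
import io


def draft_data_dictionary(csv_text, max_examples=3):
    """Draft one data-dictionary entry per column of a CSV.

    Hypothetical helper: it records the column name, a few example
    values, and a count of blank cells, and leaves the fields that
    need human judgment (description, units, allowed values) empty.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    examples = {name: [] for name in columns}
    missing = {name: 0 for name in columns}

    for row in reader:
        for name in columns:
            value = (row.get(name) or "").strip()
            if value == "":
                missing[name] += 1
            elif value not in examples[name] and len(examples[name]) < max_examples:
                examples[name].append(value)

    return [
        {
            "variable": name,
            "example_values": "; ".join(examples[name]),
            "missing_count": missing[name],
            "description": "",      # fill in by hand
            "units": "",            # fill in by hand
            "allowed_values": "",   # fill in by hand
        }
        for name in columns
    ]


# Made-up sample data, for illustration only.
sample = "site_id,temp_c,observer\nA1,12.5,JS\nA2,,JS\nB1,14.0,MK\n"
dictionary = draft_data_dictionary(sample)
for entry in dictionary:
    print(entry)
```

The empty fields are the point: the script only saves typing, while the definitions, units, and codes still have to come from the person who knows the data.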
General Resources
- RDS’s documentation tips: http://wwwtest.researchdata.wisc.edu/dmp-3-data-documentation/
- Best Practices for Project Metadata: http://ropensci.github.io/reproducibility-guide/sections/metaData/
- Readme files are a simple and low-tech way to start documenting your data better. Check out the sample readme.txt (filename = readme_template.txt) from IU, or Cornell University’s data working group guide, which has tips for using readme files
- Check out Kristin Briney’s post on taking better notes
- Reining in your metadata – advice from an archivist
- Cornell University data working group also has some tips for writing metadata
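To show how low-tech a readme file can be, here is a minimal sketch that assembles one as plain text. The section headings, the function name `build_readme`, and the example project details are all assumptions made for illustration; they are not taken from the IU template or the Cornell guide, which you should consult for recommended content.

```python
from datetime import date


def build_readme(title, contact, files, methods_note="", created=None):
    """Assemble a plain-text readme (hypothetical layout).

    `files` maps each filename to a one-line description.
    Sections left empty are marked TODO so gaps stay visible.
    """
    created = created or date.today().isoformat()
    lines = [
        f"Title: {title}",
        f"Contact: {contact}",
        f"Date created: {created}",
        "",
        "Files:",
    ]
    for filename, description in files.items():
        lines.append(f"  {filename} - {description}")
    lines += [
        "",
        "Methods and processing notes:",
        f"  {methods_note or 'TODO'}",
    ]
    return "\n".join(lines) + "\n"


# Made-up project details, for illustration only.
readme_text = build_readme(
    title="Stream temperature survey (hypothetical example)",
    contact="Jane Researcher <jane@example.org>",
    files={"temps.csv": "hourly water temperature, degrees C"},
    created="2017-02-13",
)
print(readme_text)
```

Saving the returned text as readme.txt next to the data files keeps the documentation and the data together.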
Activities
Option 1: Check out some of the documentation guidelines and standards out there. What can you borrow or learn from them to improve your own documentation?
- USGS Data Management guidelines: https://www2.usgs.gov/datamanagement/describe/metadata.php
- CDISC has three foundational standards for clinical research data: CDASH (Clinical Data Acquisition Standards Harmonization), SDTM (Study Data Tabulation Model), and ADaM (Analysis Data Model)
- Marine Metadata Interoperability: https://marinemetadata.org/
- 10 Simple Rules for a Computational Biologist’s Laboratory Notebook: http://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1004385
Option 2: Have a colleague, labmate, or teammate review your documentation for a specific project. Ask them to tell you what is missing or unclear. If it’s a long list, choose 2-3 things to focus on improving throughout the semester.
Bonus points: Set up a regular schedule for your lab or team to review and sign off on lab notebook pages, protocols, procedures manuals, data dictionaries, or whatever forms of documentation you use. Afterwards, reward yourselves!