Rescuing Unloved Data – Love Your Data Week 2017

Information from Love Your Data Week.

Message of the day

“Data that is mobile, visible and well-loved stands a better chance of surviving” ~ Kurt Bollacker

Things to consider

Legacy, heritage and at-risk data share a common theme: barriers to access. Data that have been recorded by hand (field notes, lab notebooks, handwritten transcripts, measurements or ledgers), on outdated technology, or in proprietary formats are at risk.

Securing legacy data takes time, resources and expertise, but it is well worth the effort: old data can enable new research, and the loss of data can impede future research. So how should you approach reviving legacy or at-risk data?

How do you eat an elephant? One bite at a time.

  1. Recover and inventory the data (see the sketch after this list)
    • Format, type
    • Accompanying material: codebooks, notes, marginalia
  2. Organize the data
    • Depending on discipline/subject: date, variable, content/subject
  3. Assess the data
    • Are there any gaps or missing information?
    • Triage: consider the nature of the data along with the ease of recovery
  4. Describe the data
    • Assign metadata at the collection/file level
  5. Digitize/normalize the data
    • Digitization is not preservation. Choose a file format that will retain its functionality (and accessibility!) over time: “Which file formats should I use?”
  6. Review
    • Confirm there are no gaps, or indicate where gaps exist
  7. Deposit and disseminate
    • Make the data open and available for re-use
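
Here is a minimal sketch of step 1 in Python, assuming the rescued files sit under a single directory. The directory name legacy_data and the output file inventory.csv are hypothetical placeholders; adapt both to your own collection.

```python
# Walk a directory of rescued files and record basic facts about each
# one: path, format (file extension), size, and last-modified date.
# "legacy_data" and "inventory.csv" are hypothetical names.
import csv
from datetime import datetime, timezone
from pathlib import Path

def inventory(root: str, out_csv: str) -> None:
    """Write one CSV row per file found under `root`."""
    rows = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            stat = path.stat()
            rows.append({
                "path": str(path),
                "format": path.suffix.lower() or "(none)",
                "size_bytes": stat.st_size,
                "modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).date().isoformat(),
                "notes": "",  # room for accompanying material: codebooks, marginalia
            })
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["path", "format", "size_bytes", "modified", "notes"])
        writer.writeheader()
        writer.writerows(rows)

inventory("legacy_data", "inventory.csv")
```

The resulting spreadsheet doubles as a triage worksheet for step 3: sort it by format to spot proprietary or obsolete file types, and use the notes column to flag gaps.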


Finding the Right Data – Love Your Data Week 2017

Information from Love Your Data Week.

Message of the day

Need to find the right data? Start with a clear question, then locate quality data sources.

Things to consider

In a 2004 Science Daily News article, the National Science Foundation used the phrase “here there be data” to highlight the exploratory nature of traversing the “untamed” scientific data landscape. The phrase harkens back to older maps of the world, where unexplored territories bore the warning ‘here, there be [insert mythical/fantastical creatures]’ to alert explorers to the dangers of the unknown. While the research data landscape is (slightly) less foreboding, there’s still an adventurous quality to looking for research data.


Good Data Examples – Love Your Data Week 2017

Information from Love Your Data Week.

Message of the day

Good data are FAIR – Findable, Accessible, Interoperable, Re-usable

Things to consider

What makes data good?

  1. They have to be readable and well enough documented for others (and a future you) to understand.
  2. Data have to be findable to keep them from being lost. Information scientists have started to call such data FAIR – Findable, Accessible, Interoperable, Re-usable. One of the most important things you can do to keep your data FAIR is to deposit it in a trusted digital repository. Do not use your personal website as your data archive.
  3. Tidy data are good data. Messy data are hard to work with (see the sketch after this list).
  4. Data quality is a process, starting with planning and continuing through to curation of the data for deposit.
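
As a concrete illustration of point 3, here is a minimal tidy-data sketch in Python using pandas; the table and its column names are invented for the example. In the messy layout each year is a column header, so the variable “year” is trapped in the headers; the tidy layout gives one observation per row.

```python
# Hypothetical example: reshaping a messy table into tidy form with pandas.
import pandas as pd

# Messy layout: one column per year.
messy = pd.DataFrame({
    "site": ["A", "B"],
    "2015": [12, 7],
    "2016": [15, 9],
})

# Tidy layout: one row per (site, year) observation.
tidy = messy.melt(id_vars="site", var_name="year", value_name="count")
tidy["year"] = tidy["year"].astype(int)
print(tidy)
#   site  year  count
# 0    A  2015     12
# 1    B  2015      7
# 2    A  2016     15
# 3    B  2016      9
```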

(Comic from PHD Comics: http://www.phdcomics.com/comics/archive.php?comicid=1612)
Remember! “Documentation is a love letter to your data”


Documenting, Describing, Defining – Love Your Data Week 2017

Information from Love Your Data Week.

Message of the day

Good documentation tells people they can trust your data by enabling validation, replication, and reuse.

Things to consider

Why does having good documentation matter?

  • It contributes to the quality and usefulness of your research and the data itself – for yourself, colleagues, students, and others.
  • It makes the analysis and write-up stages of your project easier and less stressful.
  • It helps your teammates, colleagues, and students understand and build on your work.
  • It helps to build trust in your research by allowing others to validate your data or methods.
  • It can help you answer questions about your work during pre-publication peer review and after publication.
  • It can make it easier for others to replicate or reuse your data. When they cite the data, you get credit! Include these citations in your CV, funding proposal, or promotion and tenure package.
  • It improves the integrity of the scholarly record by providing a more complete picture of how your research was conducted. This promotes public trust and support of research!
  • Some communities and fields have been talking about documentation for decades and have well-developed standards for documentation (e.g., geospatial data, clinical data), while others do not (e.g., psychology, education, engineering). No matter where your research community or field falls on this spectrum, you can start improving your documentation today!


Defining Data Quality – Love Your Data Week 2017

Information from Love Your Data Week.

Message of the day

Data quality is the degree to which data meets the purposes and requirements of its use. Depending on the use, good quality data may be complete, accurate, credible, consistent or simply “good enough”.

Things to consider

What is data quality and how can we distinguish between good and bad data? How are the issues of data quality being addressed in various disciplines?

  • The most straightforward definition of data quality is the quality of the content (values) in one’s dataset. For example, if a dataset contains names and addresses of customers, all names and addresses have to be recorded (the data are complete), they have to correspond to the actual names and addresses (the data are accurate), and all records have to be up to date (the data are current). A minimal sketch of such checks appears after this list.
  • The most common characteristics of data quality are completeness, validity, consistency, timeliness and accuracy. Additionally, data have to be useful (fit for purpose), documented, and reproducible/verifiable.
  • At least four activities impact the quality of data: modeling the world (deciding what to collect and how), collecting or generating data, storage/access, and formatting/transformation.
  • Assessing data quality requires disciplinary knowledge and is time-consuming.
  • Open data quality issues include how to measure quality, how to track the lineage of data (provenance), when data are “good enough”, what happens when data are mixed and triangulated (especially high quality and low quality data), and crowdsourcing for quality.
  • Data quality is the responsibility of both data providers and data curators: data providers ensure the quality of their individual datasets, while curators help the community with consistency, coverage and metadata.
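
To make the first bullet concrete, here is a minimal sketch of completeness and currency checks on the customer-record example. The field names and the two-year currency threshold are hypothetical, and a real accuracy check would need an authoritative source to compare against, so it is not attempted here.

```python
# Hypothetical quality checks for customer records: completeness
# (every required field is filled) and currency (verified recently).
from datetime import date

REQUIRED = ("name", "address", "last_verified")

def quality_issues(record: dict, today: date) -> list:
    """Return a list of quality problems found in one record."""
    issues = []
    for field in REQUIRED:  # completeness
        if not record.get(field):
            issues.append(f"missing {field}")
    verified = record.get("last_verified")
    if verified and (today - verified).days > 2 * 365:  # currency
        issues.append(f"stale: last verified {verified.isoformat()}")
    return issues

records = [
    {"name": "A. Ada", "address": "1 Main St", "last_verified": date(2016, 5, 1)},
    {"name": "", "address": "2 Elm St", "last_verified": date(2010, 1, 1)},
]
for r in records:
    print(r["name"] or "(no name)", quality_issues(r, today=date(2017, 2, 1)))
# A. Ada []
# (no name) ['missing name', 'stale: last verified 2010-01-01']
```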

“Care and Quality are internal and external aspects of the same thing. A person who sees Quality and feels it as he works is a person who cares. A person who cares about what he sees and does is a person who’s bound to have some characteristic of quality.” 
― Robert M. Pirsig, Zen and the Art of Motorcycle Maintenance: An Inquiry Into Values
