Rescuing Unloved Data – Love Your Data Week 2017

Information from Love Your Data Week.

Message of the day

“Data that is mobile, visible and well-loved stands a better chance of surviving” ~ Kurt Bollacker

Things to consider

Legacy, heritage, and at-risk data share one common theme: barriers to access. Data recorded by hand (field notes, lab notebooks, handwritten transcripts, measurements, or ledgers), stored on outdated technology, or saved in proprietary formats are at risk.

Securing legacy data takes time, resources, and expertise, but it is well worth the effort: old data can enable new research, and its loss could impede future research. So how should you approach reviving legacy or at-risk data?

How do you eat an elephant? One bite at a time.

  1. Recover and inventory the data (a minimal inventory sketch follows this list)
    • Format, type
    • Accompanying material: codebooks, notes, marginalia
  2. Organize the data
    • Depending on discipline/subject: by date, variable, or content/subject
  3. Assess the data
    • Are there any gaps or missing information?
    • Triage: consider the nature of the data along with the ease of recovery
  4. Describe the data
    • Assign metadata at the collection/file level
  5. Digitize/normalize the data
    • Digitization is not preservation. Choose a file format that will retain its functionality (and accessibility!) over time: “Which file formats should I use?”
  6. Review
    • Confirm there are no gaps, or indicate where gaps exist
  7. Deposit and disseminate
    • Make the data open and available for re-use
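As a starting point for step 1, here is a minimal sketch of an inventory script. It assumes the recovered files sit under a single directory (the `legacy_data/` path and the output file name are hypothetical placeholders) and records each file's name, format, size, and last-modified date to a CSV that can later be enriched by hand with notes about codebooks, marginalia, and condition.

```python
import csv
import datetime
from pathlib import Path

# Hypothetical locations; adjust to wherever the recovered files actually live.
SOURCE_DIR = Path("legacy_data")
INVENTORY_FILE = Path("legacy_data_inventory.csv")


def build_inventory(source_dir: Path, inventory_file: Path) -> None:
    """Walk the source directory and record basic facts about every file."""
    with inventory_file.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["path", "format", "size_bytes", "last_modified", "notes"])
        for item in sorted(source_dir.rglob("*")):
            if item.is_file():
                modified = datetime.datetime.fromtimestamp(item.stat().st_mtime)
                writer.writerow([
                    str(item.relative_to(source_dir)),
                    item.suffix.lower() or "unknown",
                    item.stat().st_size,
                    modified.date().isoformat(),
                    "",  # fill in by hand: accompanying codebooks, marginalia, condition, etc.
                ])


if __name__ == "__main__":
    build_inventory(SOURCE_DIR, INVENTORY_FILE)
```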


Finding the Right Data – Love Your Data Week 2017

Information from Love Your Data Week.

Message of the day

Need to find the right data? Have a clear question and locate quality data sources.

Things to consider

In a 2004 Science Daily News article, the National Science Foundation used the phrase “here there be data” to highlight the exploratory nature of traversing the “untamed” scientific data landscape. The phrase harkens back to older maps of the world, where unexplored territories bore the warning ‘here, there be [insert mythical/fantastical creature]’ to alert explorers to the dangers of the unknown. While the research data landscape is (slightly) less foreboding, there’s still an adventurous quality to looking for research data.


Good Data Examples – Love Your Data Week 2017

Information from Love Your Data Week.

Message of the day

Good data are FAIR – Findable, Accessible, Interoperable, Re-usable

Things to consider

What makes data good?

  1. It has to be readable and well enough documented for others (and a future you) to understand.
  2. Data has to be findable to keep it from being lost. Information scientists have started to call such data FAIR — Findable, Accessible, Interoperable, Re-usable. One of the most important things you can do to keep your data FAIR is to deposit it in a trusted digital repository. Do not use your personal website as your data archive.
  3. Tidy data are good data; messy data are hard to work with. (A small reshaping sketch follows this list.)
  4. Data quality is a process, starting with planning and continuing through curation of the data for deposit.
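To illustrate point 3: tidy data give each variable its own column and each observation its own row. The sketch below is an assumed example (pandas, with invented column names and values) that reshapes a “messy” wide table, where each year has its own column, into a tidy long table.

```python
import pandas as pd

# A "messy" wide table: measurement years spread across columns (invented example data).
messy = pd.DataFrame({
    "site": ["A", "B"],
    "2015": [3.1, 2.8],
    "2016": [3.4, 2.9],
})

# Tidy version: one row per site-year observation, one column per variable.
tidy = messy.melt(id_vars="site", var_name="year", value_name="measurement")
print(tidy)
#   site  year  measurement
# 0    A  2015          3.1
# 1    B  2015          2.8
# 2    A  2016          3.4
# 3    B  2016          2.9
```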

[PhD Comics cartoon – source: http://www.phdcomics.com/comics/archive.php?comicid=1612]
Remember! “Documentation is a love letter to your data”


Documenting, Describing, Defining – Love Your Data Week 2017

Information from Love Your Data Week.

Message of the day

Good documentation tells people they can trust your data by enabling validation, replication, and reuse.

Things to consider

Why does having good documentation matter?

  • It contributes to the quality and usefulness of your research and the data itself – for yourself, colleagues, students, and others.
  • It makes the analysis and write-up stages of your project easier and less stressful.
  • It helps your teammates, colleagues, and students understand and build on your work.
  • It helps to build trust in your research by allowing others to validate your data or methods.
  • It can help you answer questions about your work during pre-publication peer review and after publication.
  • It can make it easier for others to replicate or reuse your data. When they cite the data, you get credit! Include these citations in your CV, funding proposal, or promotion and tenure package.
  • It improves the integrity of the scholarly record by providing a more complete picture of how your research was conducted. This promotes public trust and support of research!
  • Some communities and fields have been talking about documentation for decades and have well-developed standards for documentation (e.g., geospatial data, clinical data), while others do not (e.g., psychology, education, engineering). No matter where your research community or field falls on this spectrum, you can start improving your documentation today; a minimal codebook sketch follows this list.
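One low-effort way to start, whatever your field's standards, is a small machine-readable codebook kept next to the data. The snippet below is a hypothetical sketch; the dataset name, variables, and descriptions are invented placeholders to adapt to your own project.

```python
import json

# Hypothetical data dictionary for a survey dataset; adapt the fields to your discipline's standard.
data_dictionary = {
    "dataset": "commuting_survey_2016.csv",
    "collected": "2016-09-01 to 2016-12-15",
    "variables": {
        "respondent_id": "Anonymous integer identifier assigned at data entry.",
        "commute_minutes": "Self-reported one-way commute time, in minutes.",
        "mode": "Primary commute mode; coded values: car, bus, bike, walk, other.",
    },
    "missing_values": "Blank cells indicate the respondent skipped the question.",
}

# Write the codebook alongside the data so it travels with the dataset.
with open("commuting_survey_2016_codebook.json", "w", encoding="utf-8") as handle:
    json.dump(data_dictionary, handle, indent=2)
```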


Defining Data Quality – Love Your Data Week 2017

Information from Love Your Data Week.

Message of the day

Data quality is the degree to which data meets the purposes and requirements of its use. Depending on the use, good-quality data may mean complete, accurate, credible, consistent, or simply “good enough” data.

Things to consider

What is data quality and how can we distinguish between good and bad data? How are the issues of data quality being addressed in various disciplines?

  • The most straightforward definition of data quality is the quality of the content (values) in a dataset. For example, if a dataset contains customer names and addresses, all names and addresses have to be recorded (the data are complete), they have to correspond to the actual names and addresses (the data are accurate), and all records have to be up to date (the data are current). A sketch of simple checks on an example like this follows this list.
  • The most common characteristics of data quality are completeness, validity, consistency, timeliness, and accuracy. Additionally, data have to be useful (fit for purpose), documented, and reproducible/verifiable.
  • At least four activities affect the quality of data: modeling the world (deciding what to collect and how), collecting or generating data, storage/access, and formatting/transformation.
  • Assessing data quality requires disciplinary knowledge and is time-consuming.
  • Open issues in data quality include how to measure it, how to track the lineage (provenance) of data, when data are “good enough”, what happens when data are mixed and triangulated (especially high-quality with low-quality data), and how to crowdsource quality.
  • Data quality is the responsibility of both data providers and data curators: data providers ensure the quality of their individual datasets, while curators help the community with consistency, coverage, and metadata.
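To make the customer-record example above concrete, the sketch below (pandas, with invented records, column names, and thresholds) runs three simple checks that map onto completeness, validity, and timeliness; accuracy would still require comparison against an authoritative source.

```python
import pandas as pd

# Invented example records; in practice these would be read from the dataset under review.
customers = pd.DataFrame({
    "name": ["Ada Lovelace", None, "Grace Hopper"],
    "address": ["12 Analytical Way", "44 Relay Rd", ""],
    "last_updated": pd.to_datetime(["2017-01-15", "2016-03-02", "2014-11-30"]),
})

# Completeness: every record should have a non-empty name and address.
complete = customers["name"].notna() & customers["address"].str.strip().ne("")

# Validity: addresses should at least contain a street number (a crude, assumed rule).
valid = customers["address"].str.contains(r"\d", na=False)

# Timeliness: records untouched for more than two years are flagged as stale (assumed threshold).
stale = customers["last_updated"] < pd.Timestamp("2017-02-01") - pd.DateOffset(years=2)

report = pd.DataFrame({"complete": complete, "valid": valid, "stale": stale})
print(report)
```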

“Care and Quality are internal and external aspects of the same thing. A person who sees Quality and feels it as he works is a person who cares. A person who cares about what he sees and does is a person who’s bound to have some characteristic of quality.” 
― Robert M. Pirsig, Zen and the Art of Motorcycle Maintenance: An Inquiry Into Values


October RDS Brown Bag Talk: Jason Fishbain

by Cameron Cook

The Rebecca J. Holz series in Research Data Management is a monthly lecture series hosted during the spring and fall academic semesters. Research Data Services invites speakers from a variety of disciplines to talk about their research or involvement with data.

This October, Jason Fishbain, UW-Madison’s Chief Data Officer, gave a talk entitled “The Role the Chief Data Officer Can Play in Helping the Research Community”. You can find the archived presentation slides on MINDS@UW.

Jason’s talk began with an overview of the data governance program he is working to establish at UW. He provided an introduction to the different types of data the system produces – administrative, local, and research – as well as his ideas for a data governance framework whose overarching goal is information literacy for the campus. Information literacy, in turn, means educating users and instituting change in four specific areas:

1) Policies and Standards – i.e., making decisions on who is accountable for data and data policies, and crafting a data stewardship policy.

2) Information Quality – i.e., establishing control workflows that ensure data quality and process quality.

3) Privacy, Compliance, and Security – i.e., deciding what ‘restricted’ and ‘classified’ mean, and how to comply with privacy and security laws (like FERPA) within our data management plan.

4) Architecture and Integration – i.e., maintaining consistent data definitions and data dictionaries.

Creating a data governance framework is a major culture change for an institution, and it takes both technology and people to make the changes work. However, it gives the Chief Data Officer real potential to assist the research community: it opens the door to greater institutional support of the research enterprise through resources such as help with preparing grant proposals, complying with funding requirements, and storing data.

September RDS Brown Bag Talk: Mattie Burkert

by Cameron Cook

The Rebecca J. Holz series in Research Data Management is a monthly lecture series hosted during the spring and fall academic semesters. Research Data Services invites speakers from a variety of disciplines to talk about their research or involvement with data.

This September, Mattie Burkert, a PhD student from the Department of English, gave a talk entitled “Recovering the London Stage Information Bank (1970-1978): Data Preservation Lessons from an Early Humanities Computing Project”.

You can find the archived presentation slides on MINDS@UW.

Mattie’s talk focused on her work piecing together what remains of the London Stage Information Bank, an early digital humanities computing initiative from the 1970s that sought to transform the printed text The London Stage into a data bank queryable by researchers. Her work touches on the rapid obsolescence of media and the misconceptions about data preservation at the time, both of which can be seen as lessons for today’s digital humanities and digital scholarship worlds. Her talk also gave a brief view into the project as it stands, the tools and techniques she has used to reconstruct lost data, and the difficulties she faces as she continues the project.