Information from Love Your Data Week.
Message of the day
Good documentation tells people they can trust your data by enabling validation, replication, and reuse.
Things to consider
Why does having good documentation matter?
- It contributes to the quality and usefulness of your research and the data itself – for yourself, colleagues, students, and others.
- It makes the analysis and write-up stages of your project easier and less stressful.
- It helps your teammates, colleagues, and students understand and build on your work.
- It helps to build trust in your research by allowing others to validate your data or methods.
- It can help you answer questions about your work during pre-publication peer review and after publication.
- It can make it easier for others to replicate or reuse your data. When they cite the data, you get credit! Include these citations in your CV, funding proposal, or promotion and tenure package.
- It improves the integrity of the scholarly record by providing a more complete picture of how your research was conducted. This promotes public trust and support of research!
- Some communities and fields have been talking about documentation for decades and have well-developed standards for it (e.g., geospatial data, clinical data), while others do not (e.g., psychology, education, engineering). No matter where your research community or field falls on this spectrum, you can start improving your documentation today!
Information from Love Your Data Week.
Message of the day
Data quality is the degree to which data meets the purposes and requirements of its use. Depending on the uses, good quality data may refer to complete, accurate, credible, consistent or “good enough” data.
Things to consider
What is data quality and how can we distinguish between good and bad data? How are the issues of data quality being addressed in various disciplines?
- The most straightforward definition of data quality is the quality of the content (values) in one’s dataset. For example, if a dataset contains names and addresses of customers, all names and addresses have to be recorded (the data is complete), they have to correspond to the actual names and addresses (the data is accurate), and all records have to be up to date (the data is current).
- The most common characteristics of data quality include completeness, validity, consistency, timeliness, and accuracy. Additionally, data has to be useful (fit for purpose), documented, and reproducible or verifiable.
- At least four activities affect the quality of data: modeling the world (deciding what to collect and how), collecting or generating data, storage and access, and formatting and transformation.
- Assessing data quality requires disciplinary knowledge and is time-consuming.
- Open data quality issues include how to measure quality, how to track the lineage of data (provenance), when data is “good enough”, what happens when data is mixed and triangulated (especially high-quality and low-quality data), and how to use crowdsourcing for quality.
- Data quality is the responsibility of both data providers and data curators: data providers ensure the quality of their individual datasets, while curators help the community with consistency, coverage, and metadata.
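As a rough illustration of the customer example above, the completeness and currency checks could be sketched in Python along these lines. The records, the field names (`name`, `address`, `updated`), and the one-year freshness threshold are all made up for this sketch; accuracy is left out because it can only be checked against a trusted reference source.

```python
from datetime import date

# Hypothetical customer records; the field names are assumptions for illustration.
RECORDS = [
    {"name": "Ada Lovelace", "address": "12 Example St", "updated": date(2017, 1, 15)},
    {"name": "", "address": "34 Sample Ave", "updated": date(2016, 12, 1)},
    {"name": "Grace Hopper", "address": "56 Demo Rd", "updated": date(2012, 6, 30)},
]

REQUIRED_FIELDS = ("name", "address")


def is_complete(record):
    """Completeness: every required field is present and non-blank."""
    return all(str(record.get(field) or "").strip() for field in REQUIRED_FIELDS)


def is_current(record, as_of, max_age_days=365):
    """Currency: the record was updated within max_age_days of as_of."""
    updated = record.get("updated")
    return updated is not None and (as_of - updated).days <= max_age_days


def quality_report(records, as_of):
    """Return the fraction of records that pass each check."""
    total = len(records)
    if total == 0:
        return {"complete": 0.0, "current": 0.0}
    return {
        "complete": sum(is_complete(r) for r in records) / total,
        "current": sum(is_current(r, as_of) for r in records) / total,
    }


REPORT = quality_report(RECORDS, as_of=date(2017, 2, 13))
print(REPORT)
```

What counts as "current" or "complete" depends on the purpose of the data, so thresholds like these are best set with someone who knows the discipline.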
“Care and Quality are internal and external aspects of the same thing. A person who sees Quality and feels it as he works is a person who cares. A person who cares about what he sees and does is a person who’s bound to have some characteristic of quality.”
― Robert M. Pirsig, Zen and the Art of Motorcycle Maintenance: An Inquiry Into Values
We love your data too. New flyers coming soon!
By Brianna Marshall, RDS Chair
We have been working on revamping our marketing materials. Though we are very fond of “data man,” as we’ve come to know him, we couldn’t help but feel that we could do a much better job of representing the diversity of our research community at UW-Madison.
RDS is committed to supporting all researchers, period. Our differences strengthen our abilities as collaborators and innovators. To further this commitment, we’ve created a new community of “data people” that moves us closer to our aim of being as inclusive as possible in our outreach to campus. We won’t be sharing everyone until later in the month but stay tuned. We hope you’ll like the results!
Last year RDS participated in Love Your Data Week, a great campaign to raise awareness around research data management.
This year, we’ll be participating again! The event starts next Monday, 2/13, and runs until Friday, 2/17. We’ll be sharing the great content they’ve created on our blog and pointing to resources that the Love Your Data Week crew is sharing. You can also follow the larger conversation among the other participating institutions through the hashtags #LYD17 and #loveyourdata on Twitter.
This year the focus for each day is as follows:
- Monday – Defining Data Quality
- Tuesday – Documenting, Describing, Defining
- Wednesday – Good Data Examples
- Thursday – Finding the Right Data
- Friday – Rescuing Unloved Data
If your institution would like to participate or you’d like to learn more, you can visit the Love Your Data Week website.
In this series, members of the RDS team share links to research data related stories, resources, and news that caught their eye each month. Feel free to share your favorite stories with us on Twitter @UWMadRschSvcs!
I’ve had Data Carpentry on the brain lately as I work to become more involved in the community, so I thought I’d share a post I really liked by Christine Bahlai called Soft(ware) Skills.
I’ve also run into the importance of knowing good spreadsheet practices again this semester, so I’ll also link my favorite little lesson from Data Carpentry, common formatting problems in spreadsheets.
This post is a cool look into applying the Open Science Framework to the Eleanor Roosevelt Papers Project.
Information adapted from ReproZip.org.
What is ReproZip?
ReproZip is a software packaging tool developed by Fernando Chirigati, Juliana Freire, Rémi Rampin, Dennis Shasha, and Vicky Steeves at NYU. ReproZip is designed to make the computational components of research easier to reproduce across different machines.
RDS is excited to announce that our project on the Open Science Framework is now public!
What does this mean? Well, while this blog and website are our pipeline for important research data tools, news, and resources for our UW-Madison researchers, students, and staff, RDS also creates resources and outputs related to our services that may be valuable for other information professionals and research communities beyond UW-Madison.
So, we have created a project on the Open Science Framework where we can share those outputs openly with those who may be interested in building their own data or digital scholarship services.
You can find our project at the following address – https://osf.io/m7q9p/. Within the project, you’ll find separate components for events, marketing and outreach, presentations, and flyers. Within each component, you will find all the files we’ve uploaded as well as documentation on the component wiki.
The space may seem a little bare right now, but it is a living project. So, as RDS continues to develop our programming, marketing, and documentation, we’ll continue to update our project space as well. Feel free to explore and please send us feedback on how to make it more beneficial to you and others!