Teaching Data Management to Undergraduates: A Biocore + RDS Partnership


RDS and Dr. Michelle Harris of the Biocore Honors Program partnered to bring research data management into the classroom and introduce it to her undergraduates as a lifelong research skill. With the advent of federal funding requirements and a general shift toward sharing and reproducibility, good data management is a critical skill for students. Introducing these concepts early on gives students a chance to adopt them as habits and incorporate them into their workflows. Below you’ll find a discussion of our approach to teaching data management this semester and how we can adapt these exercises to your classroom or lab.


Changing Scholarly Communication Models: Reflections from the 2015 Teaching and Learning Retreat

By Cameron Cook, Digital Curation Assistant and SLIS Graduate Student

Amy Buckland presenting at the Teaching and Learning Retreat. August 6th, 2015. Image from Brianna Marshall.

On August 6th, 2015, I attended the afternoon workshop of the UW-Madison Teaching and Learning Retreat, which was led by Amy Buckland, the University of Chicago’s Institutional Repository Manager. The focus of the daylong retreat was the intersection of scholarly communication and information literacy. Amy’s talk homed in on issues of public access and libraries’ role in scholarly communication – both as content consumers and content creators.

What then, you might ask, does Amy’s talk have to do with researchers and the purpose of Research Data Services? The answer is simple, but it is a key concept for all of us involved in research and research data to keep in mind as we move forward. As Amy said, “the new normal will be public access.”


Let’s Talk About Storage

By Luke Bluma, IT Engagement Manager for the Campus Computing Infrastructure (CCI)

Data is a critical part of our lives here at UW-Madison. We collect, analyze, and share data every day to get our jobs done. Data comes in all shapes and sizes and it needs the right place to live. That’s where storage comes in.

However, storage can be a loaded term. It can mean a thumb drive, your computer’s hard drive, storage that is accessed via a server, cloud storage, or a large campus-wide storage service. It is all of these things, but not all of them will fit your needs. Your needs are what matter, and they will drive which solution(s) will work for you.

I am the Engagement Manager for the Campus Computing Infrastructure (CCI) initiative. I work with campus partners on their data center, server, storage and/or backup needs. Storage is currently a big focus for me, so I wanted to share some thoughts about evaluating potential storage solutions.

Storage Array in Data Center

Storage for CCI

The main areas to think about are:

  • What kinds of data are you working with?
  • What are your “must-haves”?
  • What storage options are available at UW-Madison?

What kinds of data are you working with?

This is the first big question you want to focus on because it drastically impacts what options are available to you. Are you working with FERPA data, sensitive data, restricted data, PCI data, etc.? Each of these will impact what service(s) you can or can’t utilize. For more information on Restricted Data see: https://www.cio.wisc.edu/security/about/campus-initiatives/restricted-data-security-standards/

What are your “must-haves”?

Once you have identified the types of data you are working with, it is crucial to determine your must-have requirements for a storage solution. Does it need to be secure? If so, how secure? Does it need to be accessed by people outside of UW-Madison? Does it need to be high performance storage? Does it need to scale to 20+ TB? Does it need to be accessible via the web? These are just example questions, and the key here is that there is no perfect storage solution. Some services do X, Y, Z and others do X, Y, A but not Z. So determining your “must-haves” will help you figure out which services you can work with, and which you can’t.
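The "X, Y, Z" versus "X, Y, A" comparison can be sketched in a few lines of Python. This is only an illustration of the filtering idea, not a description of any real campus service: the names `StorageOption`, `matching_options`, "Service 1", and "Service 2" are hypothetical, and the feature letters are the placeholders from the paragraph above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageOption:
    """A hypothetical storage service and the features it offers."""
    name: str
    features: frozenset


def matching_options(options, must_haves):
    """Return the names of options that offer every must-have feature."""
    required = frozenset(must_haves)
    # An option qualifies only if the required set is a subset of its features.
    return [opt.name for opt in options if required <= opt.features]


# Hypothetical services, echoing the placeholder features X, Y, Z, and A.
OPTIONS = [
    StorageOption("Service 1", frozenset({"X", "Y", "Z"})),
    StorageOption("Service 2", frozenset({"X", "Y", "A"})),
]

print(matching_options(OPTIONS, {"X", "Y"}))  # both services qualify
print(matching_options(OPTIONS, {"X", "Z"}))  # only Service 1 qualifies
```

In practice the list of must-haves is usually short, and writing it down first makes it easier to rule services in or out quickly.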

What storage options are available at UW-Madison?

Now that you have identified the kinds of data and the “must-haves” for your solution, the final step is to evaluate what storage options are available to you at UW-Madison. Storage is an evolving technology, so specific services will change over time, but here are good places to start learning about what services are available to you:

  • Local IT – if you have a local IT group, then talk to them first about what local options may be available to you
  • Campus Computing Infrastructure (CCI) – if you need network storage or server storage that isn’t focused on high performance computing then CCI has several options that could work depending on your needs
  • Advanced Computing Initiative (ACI) – if you need to do high performance or high throughput computing then ACI has several options that could work depending on your needs
  • Division of Information Technology (DoIT) – if you need cloud storage, like Box.com, or local storage, like an external hard drive, then DoIT has solutions that could work for you as well

This can seem like a lot to think about, and to be honest it can be quite confusing at times. The good news is that you have help! Research Data Services (RDS) can be a great starting point for your storage needs. We can focus on the key question: what are you looking to do? Then we can help you evaluate some potential options for moving forward based on your needs.

To get started, contact RDS at http://researchdata.wisc.edu/help/contact-us/ or contact me at cci@cio.wisc.edu

Guides, Tutorials, and Courses for Learning About Data Management

By Cid Freitag, Instructional Technology Program Manager at DoIT Academic Technology


If the data you need still exists;
If you found the data you need;
If you understand the data you found;
If you trust the data you understand;
If you can use the data you trust;
Someone did a good job of data management.

Rex Sanders ‐ USGS‐Santa Cruz*

Data management practices have been described in detail in a variety of documentation and tutorials, many of which focus on the specific needs and resources of the organization that produced them. The following is a selected list of resources that are general enough to apply across disciplines and beyond the university or agency that developed them.

Guides and Tutorials

Data Science MOOCs

Several Massive Open Online Courses (MOOCs) cover topics related to data analysis and research methods. Even if you choose not to do the coursework and earn a statement of completion, it’s easy to sign up for the courses, which gives you access to lectures and examples.

The Class Central website has curated a list of several data science and analysis methods MOOCs, developed by reputable sources.

The MOOCs listed here were developed by Johns Hopkins University and are offered through the Coursera platform. They are part of a Data Science Specialization series of courses, and they apply to data management practices beyond specific analytical techniques. Each course lasts 4 weeks and is offered frequently. Currently, a new offering of each course starts each month from March through June 2015.

The Data Scientist’s Toolbox, Jeff Leek, Roger Peng, Brian Caffo

“The course gives an overview of the data, questions, and tools that data analysts and data scientists work with.” It focuses on a practical introduction to tools, using version control, Markdown, Git, GitHub, R, and RStudio.

Getting and Cleaning Data, Jeff Leek, Roger Peng, Brian Caffo

“This course will cover the basic ways that data can be obtained… It will also cover the basics of data cleaning and how to make data ‘tidy’… The course will also cover the components of a complete data set including raw data, processing instructions, codebooks, and processed data. The course will cover the basics needed for collecting, cleaning, and sharing data.” Tools used in this course: GitHub, R, RStudio

Reproducible Research, Jeff Leek, Roger Peng, Brian Caffo

“Reproducible research is the idea that data analyses, and more generally, scientific claims, are published with their data and software code so that others may verify the findings and build upon them… This course will focus on literate statistical analysis tools which allow one to publish data analyses in a single document that allows others to easily execute the same analysis to obtain the same results.” Tools: R Markdown, knitr

*Rex Sanders quote from: Environmental Data Management: Challenges and Opportunities, Jamie Gerrard, March 2014


Looking for additional information about research data management? Contact us.