Love Your Data Week – Day 5

Content is adapted from the Love Your Data website.

It’s the final day of Love Your Data Week! Today’s topic is the big picture – transforming, extending, and reusing. Today is about putting the entire week together! Think about all the topics we’ve talked about this week and focus on how to make your data reusable for your future self and others. Share your plan with us or ask us about ways you can share and reuse on Twitter!

“Wear your open on your sleeve” (Mike Eisen, OpenCon 2015 keynote)

Good Practice

Good practice here is still emerging, and it depends on your field and research community!

Things to Avoid

Locking down your data if it can be reused by others without legal or ethical restrictions

Today’s Activity

Think about how your data might be used by your “future self.” How can you plan, document, and share your data to make it more reusable in the future?

Check out these stories about how data are shared and reused by others.

Want to learn more about open science? Check out the talks and projects from OpenCon 2015.

Love Your Data Week – Day 4

Content is adapted from the Love Your Data website.

As we reach the last few days of Love Your Data Week, let’s talk about a harder topic – data sharing. Sharing is a great way to give and get credit – it’s also required by some federal funding agencies. Today’s post will introduce you to key components of sharing and provide an activity to help you become comfortable with it. If you have any questions or want to let us know how you shared your data, reach out to us on Twitter!

Respect Your Data – Give & Get Credit

Data are becoming valued scholarly products rather than byproducts of the research process. Federal funding agencies and publishers are encouraging, and sometimes requiring, researchers to share data that have been created with public funds. The benefit to researchers is that sharing your data can increase the impact of your work, lead to new collaborations or projects, enable verification of your published results, give you credit as the creator, and provide great resources for education and training. Data sharing also benefits the greater scientific community, funders, and the public by encouraging scientific inquiry and debate, increasing transparency, reducing the cost of duplicating data, and enabling informed public policy.

There are many ways to comply with these requirements – talk to your local librarian to figure out how, where, and when to share your data.

Good Practice

  • Share your data upon publication.
  • Share your data in an open, accessible, and machine-readable format (e.g., csv rather than xlsx, odf rather than docx).
  • Deposit your data in a subject or institutional repository so your colleagues can find and use it.
  • Deposit your data in your institution’s repository to enable long-term preservation.
  • License your data so people know what they can do with it.
  • Tell people how to cite your data.
  • When choosing a repository, ask how it supports tracking the use of your data. Does it provide a handle or DOI? Can you see how many views and downloads your data get? Is it indexed by Google, Google Scholar, or the Data Citation Index?
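If your data live in memory or in a spreadsheet-like structure, writing them out as CSV is one simple way to get an open, machine-readable copy. The sketch below uses only Python's standard library. The function name `rows_to_csv` and the column names in the usage note are hypothetical, chosen only for illustration.

```python
import csv
import io


def rows_to_csv(rows: list[dict[str, object]], columns: list[str]) -> str:
    """Serialize a list of row dictionaries to CSV text with a header row."""
    buffer = io.StringIO(newline="")  # newline="" leaves the csv module's line endings untouched
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()      # first line: the column names
    writer.writerows(rows)    # one line per row, in the order given
    return buffer.getvalue()
```

For example, `rows_to_csv([{"sample_id": "S001", "mass_g": 1.5}], ["sample_id", "mass_g"])` returns a header line followed by one data line. Write the returned text to a `.csv` file and describe each column in your documentation.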

Things to Avoid

  • “Data available upon request” is NOT sharing the data.
  • Sharing data in PDF files.
  • Sharing raw data if the publication doesn’t provide sufficient detail to replicate your results.

Today’s Activity

Take the plunge and share some of your data today! Check out the list of resources below, or contact your local librarians to get started.

If your data are not quite ready to go public, go check out 1-2 of the repositories below and see what kinds of data are already being shared.

If you have used someone else’s data, make sure you are giving them credit. Take a minute to learn how to cite data.

Love Your Data Week – Day 3

Content is adapted from the Love Your Data website.

It’s Day 3 of Love Your Data Week! Today is all about documentation – one of the easiest and hardest things about data management! Documenting your data is all about giving it context and ensuring that it continues to be usable to you and others. Today’s activity will help you think through using readme files, writing metadata, and taking better notes. If you’ve made improvements to your data documentation, share them with us on Twitter!

Good Practice

Document, document, document! You probably won’t remember that weird thing that happened yesterday unless you write it down. Your documentation provides crucial context for your data. So whatever your preferred method of record keeping is, today is the day to make it a little bit better! Some general strategies that work for any format:

  • Be clear, concise, and consistent.
  • Write legibly.
  • Number pages.
  • Date everything, using a standard format (e.g., YYYYMMDD).
  • Try to organize information in a logical and consistent way.
  • Define your assumptions, parameters, codes, abbreviations, etc.
  • If documentation is scattered across more than one place or file (e.g., protocols & lab notebook), remind yourself of the file names and where those files are located.
  • Review your notes regularly and keep them current.
  • Keep all of your notes for at least 7 years after the project is completed.

Things to Avoid

  • Writing illegibly.
  • Using abbreviations or codes that aren’t defined.
  • Using abbreviations or codes inconsistently.
  • Forgetting to jot down what was unusual or what went wrong. This is usually the most important type of information when it comes to analysis and write up!

Today’s Activity

Take a few minutes to think about how you document your data. What’s missing? Where are the gaps?
If your documentation could be better, try out some of these strategies and tools.

Love Your Data Week – Day 2

Content is adapted from the Love Your Data website.

It’s Day 2 of Love Your Data Week! Today we’re talking about organizing your data. Having a set file naming and organization system ensures that you’ll always know how and where to find your data. It can be hard to break old habits, but not having to dig through multiple ‘final’ versions of a document will be worth it! Share your organization plan with us on Twitter or tweet us if you have questions!

Good Practice

Have a plan for organizing your data. This usually includes a folder structure and a file naming scheme. Easier said than done, but check out the tips below!

Things to Avoid

[Comic: phd052810s_storyinfilenames]

Source: http://www.phdcomics.com/comics/archive.php?comicid=1323

[Comic: phd101212s]

Source: http://www.phdcomics.com/comics/archive.php?comicid=1531

Google “bad file names” and browse through the images for laughs.

Today’s Activity

If you don’t already have a folder structure and/or file naming plan, come up with one and share it. Some good practices for naming files are described below.

  • Be Clear, Concise, Consistent, and Correct
  • Make it meaningful (to you and anyone else who is working on the project)
  • Provide context so the file stays unique and recognizable even if it is moved to another location.
  • For sequential numbering, use leading zeros.
    • For example, a sequence of 1-10 should be numbered 01-10; a sequence of 1-100 should be numbered 001-100 (001, 010, 100).
  • Do not use special characters: & , * % # ; ( ) ! @ $ ^ ~ ‘ { } [ ] ? < >
    • Some people like to use a dash ( - ) to separate words
    • Others like to separate words by capitalizing the first letter of each (e.g., DST_FileNamingScheme_20151216)
  • Dates should be formatted like this: YYYYMMDD (e.g., 20150209)
    • Put dates at the beginning or the end of your files, not in the middle, to make it easy to sort files by name
      • OK: DST_FileNamingScheme_20151216
      • OK: 20151216_DST_FileNamingScheme
      • AVOID: DST_20151216_FileNamingScheme
  • Use only one period, placed before the file extension (e.g., name_paper.doc NOT name.paper.doc OR name_paper..doc)
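If you want to check existing file names against these practices, a small script can do it. The sketch below is one possible reading of the rules above, not an official tool. The names `check_file_name` and `sequence_label` are hypothetical, and treating any 8-digit run that parses as a calendar date as "a date" is an assumption.

```python
import re
from datetime import datetime

# Special characters from the list above (both straight and curly single quotes included).
SPECIAL_CHARS = set("&,*%#;()!@$^~'‘{}[]?<>")


def check_file_name(name: str) -> list[str]:
    """Return a list of problems found in a file name (an empty list means none found)."""
    problems = []
    if any(ch in SPECIAL_CHARS for ch in name):
        problems.append("contains special characters")
    if name.count(".") > 1:
        problems.append("uses more than one period")
    stem = name.rsplit(".", 1)[0]  # the name without its extension
    for match in re.finditer(r"(?<!\d)\d{8}(?!\d)", stem):
        try:
            datetime.strptime(match.group(), "%Y%m%d")
        except ValueError:
            continue  # an 8-digit number that is not a valid YYYYMMDD date
        if match.start() != 0 and match.end() != len(stem):
            problems.append("date is in the middle of the name")
    return problems


def sequence_label(index: int, total: int) -> str:
    """Pad a sequence number with leading zeros to the width of the largest number."""
    return f"{index:0{len(str(total))}d}"
```

With this sketch, `check_file_name("DST_FileNamingScheme_20151216.doc")` reports no problems, while `check_file_name("DST_20151216_FileNamingScheme.doc")` flags the date in the middle. `sequence_label(10, 100)` pads to three digits.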

There are generally two approaches to folder structures. The first is filing, which uses a hierarchical folder structure. The other is piling, which relies on fewer folders and uses the search, sort, and tagging functions of your operating system or of cloud storage tools like Box.

[Image: folder structure example, dsp_folderstructure-ex21]

[Image: folder structure example, dsp_folderstructure-ex1]

Love Your Data Week – Day 1

Content is adapted from the Love Your Data website.


Happy Love Your Data Week!

We’re excited to start the weeklong celebration of data! The first day of Love Your Data Week focuses on keeping your data safe. Below you’ll find some tips and an activity to get you thinking about when and how you can protect your data. When you’re done, share it with us on Twitter! We’d love to see how you mapped your project and it’ll help spread the word about keeping your data safe!

Good Practice

Follow the 3-2-1 Rule

  • Keep 3 copies of any important file (1 primary, 2 backup copies)
  • Store files on at least 2 different media types (e.g., 1 copy on an internal hard drive and a second in a secure cloud account or an external hard drive)
  • Keep at least 1 copy offsite (i.e., not at your home or in the campus lab)
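One way to audit your own setup is to list every copy of an important file and check the list against the three parts of the rule. The sketch below is illustrative only. The `Copy` class, its fields, and the media labels are hypothetical names, and you decide what counts as a distinct media type or as offsite.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    location: str   # free-text description, e.g. "office laptop"
    media: str      # media type label, e.g. "internal_drive", "external_drive", "cloud"
    offsite: bool   # True if the copy is stored away from the other copies


def check_321(copies: list[Copy]) -> dict[str, bool]:
    """Report which parts of the 3-2-1 rule a set of copies satisfies."""
    return {
        "three_copies": len(copies) >= 3,
        "two_media_types": len({c.media for c in copies}) >= 2,
        "one_offsite": any(c.offsite for c in copies),
    }
```

A laptop copy, an external drive copy, and an offsite cloud copy pass all three checks. A single laptop copy fails all three.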

Things to Avoid

  • Storing the only copy of your data on your laptop or flash drive
  • Storing critical data on an unencrypted laptop or flash drive
  • Saving copies of your files haphazardly across 3 or 4 places
  • Sharing the password to your laptop or cloud storage account

Today’s Activity

Data snapshots or data locks are great for tracking your data from collection through analysis and write up. Librarians call this provenance, and it can be really important. Errors are inevitable, and data snapshots can save you lots of time when you make a mistake in cleaning or coding your data. Taking periodic snapshots of your data, especially before the next phase (collection, processing, or analysis) begins, can keep you from losing crucial data and time if you need to make corrections.

These snapshots then get archived somewhere safe (not where you store active files) just in case you need them. If something should go wrong, copy the files you need back to your active storage location, keeping the original snapshot in your archival location.

For a 5-year longitudinal study, you might take snapshots every quarter. If you will be collecting all the data for your study in a 2-week period, you will want to take snapshots more often, probably every day. How much data can you afford to lose?

Oh, and (almost) always keep the raw data! The only time you might not is when it’s easier and less expensive to recreate the data than to keep it around.

Instructions: Draw a quick workflow diagram of the data lifecycle for your project (check out our examples on Instagram and Pinterest). Think about when major data transformations happen in your workflow. Taking a snapshot of your data just before and after the transformation can save you from heartache and confusion if something goes wrong.
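If your active files live in a single folder, a snapshot can be as simple as a dated copy of that folder in your archival location. The sketch below is one way to do that with Python's standard library, not a prescribed tool. The function name `take_snapshot`, the `MANIFEST.sha256` file name, and the choice of SHA-256 checksums are assumptions added for illustration.

```python
import hashlib
import shutil
from datetime import datetime
from pathlib import Path


def take_snapshot(active_dir: Path, archive_dir: Path, label: str) -> Path:
    """Copy active_dir into a new dated folder under archive_dir and record checksums."""
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    snapshot = archive_dir / f"{stamp}_{label}"
    shutil.copytree(active_dir, snapshot)  # raises an error if the snapshot folder already exists

    # Record a checksum for every copied file so later changes or corruption can be detected.
    lines = []
    for path in sorted(p for p in snapshot.rglob("*") if p.is_file()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        lines.append(f"{digest}  {path.relative_to(snapshot).as_posix()}")
    (snapshot / "MANIFEST.sha256").write_text("\n".join(lines) + "\n", encoding="utf-8")
    return snapshot
```

Calling it just before and just after a major transformation, with labels such as "before_cleaning" and "after_cleaning", gives you a pair of snapshots to fall back on. The active folder is only read, never changed.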