Love Your Data Week – Day 4

Content is adapted from the Love Your Data website.

As we reach the last few days of Love Your Data Week, let’s talk about a harder topic – data sharing. Sharing is a great way to give and get credit – it’s also required by some federal funding agencies. Today’s post will introduce you to key components of sharing and provide an activity to help you become comfortable with it. If you have any questions or want to let us know how you shared your data, reach out to us on Twitter!

Respect Your Data – Give & Get Credit

Data are becoming valued scholarly products rather than a byproduct of the research process. Federal funding agencies and publishers are encouraging, and sometimes requiring, researchers to share data that have been created with public funds. For researchers, sharing data can increase the impact of your work, lead to new collaborations or projects, enable verification of your published results, give you credit as the creator, and provide great resources for education and training. Data sharing also benefits the greater scientific community, funders, and the public by encouraging scientific inquiry and debate, increasing transparency, reducing the cost of duplicating data, and enabling informed public policy.

There are many ways to comply with these requirements – talk to your local librarian to figure out how, where, and when to share your data.

Good Practice

  • Share your data upon publication.
  • Share your data in an open, accessible, and machine-readable format (e.g., csv rather than xlsx, odf rather than docx).
  • Deposit your data in a subject or institutional repository so your colleagues can find and use it.
  • Deposit your data in your institution’s repository to enable long-term preservation.
  • License your data so people know what they can do with it.
  • Tell people how to cite your data.
  • When choosing a repository, ask what support it offers for tracking use of your data. Does it provide a handle or DOI? Can you see how many views and downloads your data receive? Is it indexed by Google, Google Scholar, or the Data Citation Index?
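
To make the advice about open, machine-readable formats concrete, here is a minimal Python sketch that writes a small table to CSV text and reads it back. It uses only the standard library. The column names and values are hypothetical placeholders, not data from any real project.

```python
import csv
import io

# Hypothetical example records; replace these with your own observations.
ROWS = [
    {"sample_id": "S1", "collected_on": "2016-02-08", "mass_g": 12.5},
    {"sample_id": "S2", "collected_on": "2016-02-09", "mass_g": 11.9},
]


def rows_to_csv(rows):
    """Serialize a list of dictionaries to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def csv_to_rows(text):
    """Parse CSV text back into dictionaries (every value is returned as a string)."""
    return list(csv.DictReader(io.StringIO(text)))


if __name__ == "__main__":
    csv_text = rows_to_csv(ROWS)
    print(csv_text)
    # Reading the text back is a quick check that the file is usable by others.
    print(csv_to_rows(csv_text))
```

Because CSV is plain text, anyone can open the result without special software. Note that numbers come back as strings, so it helps to document units and data types in a short README alongside the file.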

Things to Avoid

  • “Data available upon request” is NOT sharing the data.
  • Sharing data in PDF files.
  • Sharing raw data if the publication doesn’t provide sufficient detail to replicate your results.

Today’s Activity

Take the plunge and share some of your data today! Check out the list of resources below, or contact your local librarians to get started.

If your data are not quite ready to go public, go check out 1-2 of the repositories below and see what kinds of data are already being shared.

If you have used someone else’s data, make sure you are giving them credit. Take a minute to learn how to cite data:
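
As one illustration, a data citation usually names the creators, the year, the title, the repository, and a persistent identifier. The sketch below is a minimal Python example that assembles those elements into a single string. The function name, the sample values, and the DOI are hypothetical placeholders, and your repository or style guide may prescribe a different order.

```python
def format_data_citation(creators, year, title, publisher, doi, version=None):
    """Build a citation: Creators (Year). Title. [Version.] Publisher. DOI link."""
    parts = [f"{'; '.join(creators)} ({year}).", f"{title}."]
    if version:
        # The version is optional; include it only when the repository assigns one.
        parts.append(f"Version {version}.")
    parts.append(f"{publisher}.")
    # DOIs resolve through the doi.org proxy, so the link form is easy to follow.
    parts.append(f"https://doi.org/{doi}")
    return " ".join(parts)


if __name__ == "__main__":
    # All values below are made up for illustration.
    print(
        format_data_citation(
            creators=["Doe, J.", "Roe, R."],
            year=2016,
            title="Example soil samples",
            publisher="Example Repository",
            doi="10.1234/example.5678",
        )
    )
```

Many repositories display a ready-made citation on the dataset’s landing page; when one is provided, use that instead of building your own.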

Data Archiving Platforms: Figshare

by Brianna Marshall, Digital Curation Coordinator

This is part three of a three-part series where I explore platforms for archiving and sharing your data. Read the first post in the series, focused on UW’s institutional repository, MINDS@UW, or read the second post, focused on the data repository Dryad.

To help you better understand your options, here are the areas I will address for each platform:

  • Background information on who can use it and what type of content is appropriate.
  • Options for sharing and access
  • Archiving and preservation benefits the platform offers
  • Whether the platform complies with the forthcoming OSTP mandate

figshare

About

figshare is a discipline-neutral platform for sharing research in many formats, including figures, datasets, media, papers, posters, presentations and filesets. All items uploaded to figshare are citable, shareable and discoverable.

Sharing and access

All publicly available research outputs are stored under Creative Commons Licenses. By default, figures, media, posters, papers, and filesets are available under a CC-BY license, datasets are available under CC0, and software/code is available under the MIT license. Learn more about sharing your research on figshare.

Archiving and preservation

figshare notes that items will be retained for the lifetime of the repository and that its sustainability model “includes the continued hosting and persistence of all public research outputs.” Research outputs are stored directly in Amazon Web Services S3 buckets. Data files and metadata are backed up nightly and replicated into multiple copies in the online system. Learn more about figshare’s preservation policies.

OSTP mandate

The OSTP mandate requires all federal funding agencies with over $100 million in R&D funds to make greater efforts to make grant-funded research outputs more accessible. This will likely mean that data must be publicly accessible and have an assigned DOI (though you’ll need to check with your funding agency for the exact requirements). figshare mints a DataCite DOI for every uploaded item, so as long as your data are set to public, figshare is a good candidate for complying with the mandate.

Visit figshare.

Have additional questions or concerns about where you should archive your data? Contact us.

Data Archiving Platforms: Dryad

by Brianna Marshall, Digital Curation Coordinator

This is part two of a three-part series where I explore platforms for archiving and sharing your data. Read the first post in the series, focused on UW’s institutional repository, MINDS@UW.

To help you better understand your options, here are the areas I address for each platform:

  • Background information on who can use it and what type of content is appropriate.
  • Options for sharing and access
  • Archiving and preservation benefits the platform offers
  • Whether the platform complies with the forthcoming OSTP mandate

Dryad

About

Dryad is a repository appropriate for data that accompanies published articles in the sciences or medicine. Many journals partner with Dryad to provide submission integration, which makes linking the data between Dryad and the journal easy for you. Pricing varies depending on the journal you are publishing in; some journals cover the data publishing charge (DPC) while others do not. Read more about Dryad’s pricing model or browse the journals with sponsored DPCs.

Sharing and access

Data uploaded to Dryad are made available for reuse under the Creative Commons Zero (CC0) license. There are no format restrictions to what you upload, though you are encouraged to use community standards if possible. Your data will be given a DOI, enabling you to get credit for sharing.

Archiving and preservation

According to the Dryad website, “Data packages in Dryad are replicated across multiple systems to support failover, improve access times, allow recovery from disk failures, and preserve bit integrity. The data packages are discoverable and backed up for long-term preservation within the DataONE network.”
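
“Bit integrity” means that a stored file is byte-for-byte identical to what was deposited. A common way to check this is to record a checksum at deposit time and compare it later. The Python sketch below illustrates the general idea with SHA-256; it is not a description of Dryad’s internal process, and the function names and sample bytes are hypothetical.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=65536):
    """Return the SHA-256 hex digest of a binary stream, read in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(stream, recorded_checksum):
    """Compare a stream's current checksum with the one recorded earlier."""
    return sha256_of_stream(stream) == recorded_checksum


if __name__ == "__main__":
    # Made-up file contents standing in for a deposited dataset.
    original = io.BytesIO(b"sample_id,mass_g\nS1,12.5\n")
    recorded = sha256_of_stream(original)

    intact_copy = io.BytesIO(b"sample_id,mass_g\nS1,12.5\n")
    altered_copy = io.BytesIO(b"sample_id,mass_g\nS1,12.6\n")
    print(is_unchanged(intact_copy, recorded))
    print(is_unchanged(altered_copy, recorded))
```

To check a real file, open it in binary mode (for example, `open(path, "rb")`) and pass the file object to `sha256_of_stream`. Keeping a checksum next to your own copies lets you confirm that a downloaded dataset matches what you uploaded.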

OSTP mandate

The OSTP mandate requires all federal funding agencies with over $100 million in R&D funds to make greater efforts to make grant-funded research outputs more accessible. This will likely mean that data must be publicly accessible and have an assigned DOI (though you’ll need to check with your funding agency for the exact requirements). As long as the data you need to share is associated with a published article, Dryad is a good candidate for OSTP-compliant data: it mints DOIs and makes data openly available under a CC0 license.

Visit Dryad.

Have additional questions or concerns about where you should archive your data? Contact us.

NSF Releases New Public Access Plan

By Allan Barclay, Ebling Library

New Requirements to Make Work and Data More Transparent and Reusable

April 2015 – The National Science Foundation (NSF) recently released a set of public access requirements for researchers applying for grants with an effective date on or after January 2016. According to the plan, entitled Today’s Data, Tomorrow’s Discoveries, the objectives of increasing public accessibility are to make research and data easier for other investigators and educational institutions to use, and to spur innovation from these same communities.

The NSF sees these requirements as the “initial implementation” of a framework that will change and grow over time to include additional research products and degrees of accessibility.

The scope of the plan is initially focused on four types of outcome products:

  • Articles in peer-reviewed journals
  • Papers accepted as part of juried conference proceedings
  • Articles/juried papers in conference proceedings authored entirely or in part by NSF employees
  • Data generated and curated as part of an NSF-required Data Management Plan (DMP).

Researchers who receive full or partial NSF funding will be required to:

  • Deposit either the version of record or final accepted peer-reviewed manuscript of these products in a public access compliant repository as designated by the NSF. At this time, the NSF has designated the Department of Energy’s PAGES (Public Access Gateway for Energy and Science) system as its repository.
  • Make these outcome products freely available for download, reading and analysis no later than 12 months after initial publication.
  • Provide a minimum level of machine-readable metadata with each product at the time of initial publication.
  • Ensure the long-term preservation of products.
  • Provide a unique persistent identifier to all products in the award annual and final reports.
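
To show what “machine-readable metadata” with a persistent identifier might look like in practice, here is a minimal Python sketch that checks a metadata record for a few basic fields and writes it as JSON. The NSF plan as summarized here does not specify a schema, so the field names, the list of required fields, and all sample values are hypothetical assumptions.

```python
import json

# Hypothetical minimum set of fields; not an NSF or repository schema.
REQUIRED_FIELDS = ("title", "creators", "publication_year", "identifier")

# Made-up record for illustration only.
EXAMPLE_RECORD = {
    "title": "Example soil samples",
    "creators": ["Doe, J.", "Roe, R."],
    "publication_year": 2016,
    "identifier": "https://doi.org/10.1234/example.5678",
}


def missing_fields(record):
    """List the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def to_json(record):
    """Serialize the record to JSON, refusing records with missing fields."""
    missing = missing_fields(record)
    if missing:
        raise ValueError(f"Missing metadata fields: {', '.join(missing)}")
    return json.dumps(record, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(to_json(EXAMPLE_RECORD))
```

In practice, your repository or funder will define which fields are required, so treat this only as a way to see how structured metadata differs from a free-text description.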

The NSF expects that investigators will be able to deposit research products into the PAGES system by the end of the 2015 calendar year. Data underlying journal article or conference paper findings should be deposited in a repository as specified by the publication or as described in the research proposal’s DMP.

Public access requirement specifics will be provided in future NSF documents and grant solicitations.

For more information on how these new requirements could affect your grant proposal, contact the solicitation’s Cognizant Program Officer or UW-Madison’s Research Data Services.

An Introduction to Research Data Sharing and Open Data

by Lisa Abler, Assistant Scientist, Dept. of Comparative Biosciences, School of Veterinary Medicine

Image courtesy of Colleen Simon for opensource.com

Researchers are increasingly exposed to the concepts of research data sharing and open data. Funders, publishers, research institutions and possibly even colleagues are introducing these phrases. Often, the differences between these concepts can be confusing, and understanding how one or both could affect your research may be a mystery. The following provides definitions, some benefits to researchers and a list of resources for learning more about these topics, as well as why and how to implement them for your lab.

What is Research Data Sharing?
Research data sharing is the act of making your research data available to others for reuse. There are a number of aspects that factor into data sharing:

  • Which data to share: raw data, processed data, both?
  • How to share the data: lab meetings, scientific meetings, journal publication, online databases?
  • With whom to share: coworkers, collaborators, peers, funders, the public?
  • How soon to share: immediately, after ensuring your own lab’s publishing needs are met, never?

There are also restrictions that may apply to data sharing at many levels, from institutions to publishers to the federal government (e.g., privacy). Fortunately, there are resources to help navigate these restrictions. See what UW-Madison, Johns Hopkins University and the University of Oregon have to offer on this topic:

What is Open Data?
As defined by The Open Definition, “Open data and content can be freely used, modified, and shared by anyone for any purpose.” In essence, open data is an unrestricted mode of research data sharing. The most important points to consider when making data open include:

  • Availability: Data must be available to anyone to use, with no restrictions based on person, group or undertaking
  • Access and usability: Data must be accessible, preferably downloadable over the Internet, and must be available as a whole, in a reusable format, for a reasonable reproduction cost
  • Licensing: Data must be made available as a whole and licensing must allow utilization without restrictions on use (i.e., in whole or in part), redistribution or modification

What are the benefits of sharing my data?
There are many potential benefits to sharing your research data:

  • Increases recognition through citation of datasets
  • Facilitates the exchange of ideas and sharing of expertise among peers
  • May be required by research funders or publishers
  • Increases visibility of and interest in your research, especially in a global research environment
  • Provides evidence of research findings, as well as opportunities for verification or validation
  • May allow opportunities for reciprocity; by sharing your data to further research, others may be willing to share with you
  • Can encourage collaborations or co-authorships
  • Can accelerate discovery if many qualified scientists are working on a common problem, particularly data analysis in complex fields
  • Avoids experimental duplications, especially in the case of negative findings or failed experiments
  • Advances scientific progress, benefiting both research endeavors and the public

Where can I find more information?
Research Data Sharing

Open Data

Further Reading

DoE Proposes First Plan for Expanded Public Access of Research

For University of Wisconsin researchers who rely on Department of Energy federal grants, the other shoe has dropped. To be precise, the DoE’s “shoe,” or plan to increase access to the works and data of its federally funded investigators, is one of approximately thirteen plans that federal agencies will likely be announcing in the next several weeks. In February 2013, the White House Office of Science and Technology Policy issued a memo (known as the OSTP memo) that required all agencies that fund over $100 million in research annually to create a plan to allow greater public access to their researchers’ work and data after a 12-month embargo period. The Washington Post’s recent article on the announcement indicates that this particular plan is not without its detractors.

RDS will be covering the release of all OSTP Memo plans as they are announced.