OpenCitations Enhances Citation Data with COCI

OpenCitations has been working to make citation data more easily discoverable and retrievable. In July, OpenCitations released COCI, the OpenCitations Index of Crossref open DOI-to-DOI citations. The initial release of COCI turned the citation information in Crossref into first-class data entities, indexing it and making it machine-readable. The July release also included the OpenCitations Corpus (OCC), a repository of downloadable bibliographic and citation data. OpenCitations has continued to build on that data model and released the newest version of COCI this week: the data model has been extended, and the index now contains almost 450 million citation links between DOIs, drawn from Crossref reference data.
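
Because COCI exposes these citation links through a public REST API, the new index can be queried programmatically. Below is a minimal sketch; the endpoint path and the field names ("citing", "cited") are assumptions based on OpenCitations' published v1 API and should be verified against the current documentation at opencitations.net.

    # Hedged sketch: fetch the open citations pointing at one DOI via COCI.
    # Endpoint layout and field names are assumptions; check the API docs.
    import json
    import urllib.request

    doi = "10.1186/1756-8722-6-59"  # example DOI; substitute your own

    url = f"https://opencitations.net/index/coci/api/v1/citations/{doi}"
    with urllib.request.urlopen(url) as resp:
        records = json.load(resp)

    # Each record is expected to link a citing DOI to a cited DOI.
    for record in records[:5]:
        print(record["citing"], "->", record["cited"])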

Tool: OpenICPSR

Written by Chiu-chuang Lu Chou; information adapted from OpenICPSR

OpenICPSR is a self-service data repository for researchers who need to deposit their social and behavioral science research data to meet public access requirements. Researchers can share up to 2 GB of data in OpenICPSR for free. Depositors prepare all of the data and documentation files necessary for their data collection to be read and interpreted independently, along with metadata that allows the data to be searched and discovered in the ICPSR catalog and major search engines. A DOI and a data citation are provided to the depositor once the data are published.

Depositors will receive data download reports from OpenICPSR. All OpenICPSR data are governed by the Creative Commons Attribution 4.0 license, and server-side encryption is used to encrypt all files uploaded to OpenICPSR. Data deposited through the self-deposit package are distributed and preserved as-is, exactly as they arrive, without the standard curation and preservation features available in the professional curation package.

OpenICPSR also offers a professional curation package for researchers who would like to use ICPSR’s curation services, including full metadata generation and a bibliography search, statistical package conversion, and user support. The cost of professional curation is based on the number of variables and the complexity of the data. To learn more about OpenICPSR, please visit their website.

Top 5 Data Management Tips for Undergraduates

by Cameron Cook

With fall well under way on campus and final projects just around the corner, it’s the perfect time to review our top five data management tips for undergrads! As an undergraduate, data management may not seem important, but giving it a few moments of your day will ensure your assignments are safe, even in the face of a hard drive meltdown the night before a due date.

If keeping your final projects safe isn’t enough of an incentive, here is one more: undergraduate publishing opportunities. As you learn and grow as a researcher, you can publish your work in a number of undergraduate research journals. Practicing good data management will help keep your research reproducible, understandable, findable, and organized for when you submit your work to a journal.

1) Clear, consistent file naming and structure

Or: know where your data lives. Keep file names short and simple, but descriptive. Include dates (in a standardized format) to version your files so that you can always go back to a previous copy in case of mistakes. Keep files in a consistent, clear structure with easy-to-follow labels (these might be date, file type, instrument, or analysis type) so that you will never misplace an important file.
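
As a small illustration of this tip, the sketch below builds file names with an ISO 8601 date suffix so that versions sort chronologically; the naming pattern and the helper name are hypothetical examples, not a standard.

    # Illustrative helper: short, descriptive file names with a standardized
    # (ISO 8601) date for versioning. The pattern itself is a made-up example.
    from datetime import date

    def versioned_name(project, label, ext, when=None):
        """Return a name like 'thesis_survey-data_2014-11-05.csv'."""
        when = when or date.today()
        return f"{project}_{label}_{when.isoformat()}.{ext}"

    print(versioned_name("thesis", "survey-data", "csv"))
    print(versioned_name("thesis", "survey-data", "csv", date(2014, 11, 5)))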

Data Archiving Platforms: Figshare

by Brianna Marshall, Digital Curation Coordinator

This is part three of a three-part series where I explore platforms for archiving and sharing your data. Read the first post in the series, focused on UW’s institutional repository, MINDS@UW or read the second post, focused on data repository Dryad.

To help you better understand your options, here are the areas I will address for each platform:

  • Background information on who can use it and what type of content is appropriate
  • Options for sharing and access
  • Archiving and preservation benefits the platform offers
  • Whether the platform complies with the forthcoming OSTP mandate

figshare

About

figshare is a discipline-neutral platform for sharing research in many formats, including figures, datasets, media, papers, posters, presentations and filesets. All items uploaded to figshare are citable, shareable and discoverable.

Sharing and access

All publicly available research outputs are stored under Creative Commons Licenses. By default, figures, media, posters, papers, and filesets are available under a CC-BY license, datasets are available under CC0, and software/code is available under the MIT license. Learn more about sharing your research on figshare.

Archiving and preservation

figshare notes that items will be retained for the lifetime of the repository and that its sustainability model “includes the continued hosting and persistence of all public research outputs.” Research outputs are stored directly in Amazon Web Services’ S3 buckets. Data files and metadata are backed up nightly and replicated into multiple copies in the online system. Learn more about figshare’s preservation policies.

OSTP mandate

The OSTP mandate requires all federal funding agencies with over $100 million in R&D funds to make greater efforts to make grant-funded research outputs more accessible. This will likely mean that data must be publicly accessible and have an assigned DOI (though you’ll need to check with your funding agency for the exact requirements). All items uploaded to figshare are minted a DataCite DOI, so as long as your data is set to public it is a good candidate for complying with the mandate.
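
If you want to check those two ingredients, public visibility and a DOI, programmatically, figshare offers a public REST API. The sketch below is illustrative only: the article ID is a placeholder, and the endpoint and field names are assumptions based on figshare’s documented v2 API (see docs.figshare.com).

    # Hedged sketch: look up a public figshare article and print its DOI.
    # The article ID is a placeholder; endpoint/fields per figshare's v2 docs.
    import json
    import urllib.request

    article_id = 96803  # placeholder; substitute a real public article ID

    url = f"https://api.figshare.com/v2/articles/{article_id}"
    with urllib.request.urlopen(url) as resp:
        article = json.load(resp)

    print("Title:", article.get("title"))
    print("DOI:  ", article.get("doi"))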

Visit figshare.

Have additional questions or concerns about where you should archive your data? Contact us.

Data Archiving Platforms: Dryad

by Brianna Marshall, Digital Curation Coordinator

This is part two of a three-part series where I explore platforms for archiving and sharing your data. Read the first post in the series, focused on UW’s institutional repository, MINDS@UW.

To help you better understand your options, here are the areas I address for each platform:

  • Background information on who can use it and what type of content is appropriate
  • Options for sharing and access
  • Archiving and preservation benefits the platform offers
  • Whether the platform complies with the forthcoming OSTP mandate

Dryad

About

Dryad is a repository appropriate for data that accompanies published articles in the sciences or medicine. Many journals partner with Dryad to provide submission integration, which makes linking the data between Dryad and the journal easy for you. Pricing varies depending on the journal you are publishing in; some journals cover the data publishing charge (DPC) while others do not. Read more about Dryad’s pricing model or browse the journals with sponsored DPCs.

Sharing and access

Data uploaded to Dryad are made available for reuse under the Creative Commons Zero (CC0) license. There are no format restrictions to what you upload, though you are encouraged to use community standards if possible. Your data will be given a DOI, enabling you to get credit for sharing.
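
One practical benefit of that DOI: anyone can turn it into a formatted citation through standard DOI content negotiation, a DataCite/Crossref feature rather than anything Dryad-specific. A minimal sketch, with a placeholder DOI:

    # Hedged sketch: resolve a dataset DOI to an APA-style citation using
    # DOI content negotiation. The DOI below is a placeholder, not real data.
    import urllib.request

    doi = "10.5061/dryad.xxxxx"  # placeholder Dryad-style DOI

    req = urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": "text/x-bibliography; style=apa"},
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.read().decode("utf-8"))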

Archiving and preservation

According to the Dryad website, “Data packages in Dryad are replicated across multiple systems to support failover, improve access times, allow recovery from disk failures, and preserve bit integrity. The data packages are discoverable and backed up for long-term preservation within the DataONE network.”

OSTP mandate

The OSTP mandate requires all federal funding agencies with over $100 million in R&D funds to make greater efforts to make grant-funded research outputs more accessible. This will likely mean that data must be publicly accessible and have an assigned DOI (though you’ll need to check with your funding agency for the exact requirements). As long as the data you need to share is associated with a published article, Dryad is a good candidate for OSTP-compliant data: it mints DOIs and makes data openly available under a CC0 license.

Visit Dryad.

Have additional questions or concerns about where you should archive your data? Contact us.

Building a Practical DM Foundation

By Elliott Shuppy, Master’s Candidate, School of Library and Information Studies

In addition to being an active research lab on the UW-Madison campus, the Laboratory for Optical and Computational Imaging (LOCI) originates many experimental instrumentation techniques and develops software to support them. One major database platform development is OMERO, which stands for Open Microscopy Environment Remote Object. OMERO is an open, consortium-driven software package for viewing, organizing, sharing, and analyzing image data. One hiccup is that it is not yet widely used at LOCI.

Having identified this problem, my mentor Kevin Eliceiri, LOCI director, and I thought it would be a good idea for me to develop expertise in this software as a project for ZOO 699 and figure out how to incorporate it into a researcher workflow at LOCI. On-site researcher Jayne Squirrell was the ideal candidate: she is a highly organized researcher working in the lab, which gave us an excellent use case. Before we could insert OMERO into her workflow, we had to lay some formal foundational management practices, which will carry over into her use of OMERO.

We identified four immediate needs:

  • A simple, consistent folder structure
  • A way to identify all associated files
  • An ID system that can be used in the OMERO database
  • Documentation

We then developed solutions to meet each need. The first was a formalized folder structure, organized around Jayne’s workload:

Lab\Year (YYYY)\Project\Sub-project\Experiment\Replicates\Files

This folder structure will help organize and regularize naming of files and data sets not only locally and on the backup server, but also within the OMERO platform.
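
For illustration, the hierarchy can be materialized on disk with a few lines of Python; the project and experiment names below are hypothetical placeholders, not Jayne’s actual projects.

    # Sketch: create the Lab\Year\Project\Sub-project\Experiment\Replicate
    # hierarchy. All names here are placeholders for illustration only.
    from pathlib import Path

    for replicate in ("R1", "R2", "R3"):
        path = Path("Lab") / "2014" / "ProjectA" / "SubProject1" / "Exp02" / replicate
        path.mkdir(parents=True, exist_ok=True)  # creates any missing levels
        print("created", path)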

In order to identify all files associated with a particular experiment, we developed a unique identifier that we termed the Experiment ID. This identifier leads each file name and consists of the following values: the initial of the collaborating lab (O or H) and a numerical sequence based on the current year, month, the experiment’s series number within the month, and the replicate.

Example: O_1411_02_R1

The example reads Ogle lab, 2014, November, second experiment (within the month of November), replicate one. Incorporating this ID into file names will help to identify and recall data sets of a particular experiment and any related files such as processed images and analyses.
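
To make the scheme concrete, here is a small sketch that builds and parses IDs of this form; the helper functions are my own invention, and only the ID format itself comes from the workflow described above.

    # Sketch of the Experiment ID scheme: lab initial, YYMM, experiment
    # number within the month, replicate (e.g. O_1411_02_R1).
    import re

    def make_experiment_id(lab, year, month, seq, rep):
        return f"{lab}_{year % 100:02d}{month:02d}_{seq:02d}_R{rep}"

    ID_PATTERN = re.compile(r"^([OH])_(\d{2})(\d{2})_(\d{2})_R(\d+)$")

    def parse_experiment_id(exp_id):
        m = ID_PATTERN.match(exp_id)
        if m is None:
            raise ValueError(f"not a valid Experiment ID: {exp_id!r}")
        lab, yy, mm, seq, rep = m.groups()
        return {"lab": lab, "year": 2000 + int(yy), "month": int(mm),
                "experiment": int(seq), "replicate": int(rep)}

    print(make_experiment_id("O", 2014, 11, 2, 1))  # -> O_1411_02_R1
    print(parse_experiment_id("O_1411_02_R1"))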

Further, both the file organization and the Experiment ID aid organization and identification within OMERO. The platform offers two levels of nesting: the folder is the top tier; within each folder, datasets can be nested; and each dataset contains a number of images. So we can adapt the folder structure naming to organize folders and datasets, and apply the unique identifier to name uploaded image objects. These upgrades make searching more robust and similar in process to local drive searches.
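
For readers curious how that nesting looks in code, the sketch below walks it with omero-py’s BlitzGateway. The connection details are placeholders, and while the calls follow the omero-py documentation, they should be checked against your OMERO version.

    # Hedged sketch: walk OMERO's project -> dataset -> image nesting.
    # Host, credentials, and port are placeholders.
    from omero.gateway import BlitzGateway  # pip install omero-py

    conn = BlitzGateway("username", "password", host="omero.example.edu", port=4064)
    if conn.connect():
        for project in conn.getObjects("Project"):    # top tier ("folder")
            print("Project:", project.getName())
            for dataset in project.listChildren():    # nested datasets
                print("  Dataset:", dataset.getName())
                for image in dataset.listChildren():  # image objects
                    print("    Image:", image.getName())
        conn.close()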

Lastly, we developed documentation for reference. We realized that Experiment IDs need to be accessible at the prep bench and at the microscope, so we created a mobile-accessible spreadsheet containing information on each experiment. We termed this document the Experimental Worksheet; it contains the following information, with a sketch of how it might be generated after the list:

  • Experiment ID
  • Experiment Description
  • Experiment Start Date
  • Project Name
  • Sub-project Name
  • Notes
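
As a sketch of how such a worksheet might be generated (the column names come from the list above; the sample row is hypothetical):

    # Sketch: write the Experimental Worksheet as a CSV that any spreadsheet
    # app (including mobile ones) can open. The sample row is made up.
    import csv

    COLUMNS = ["Experiment ID", "Experiment Description", "Experiment Start Date",
               "Project Name", "Sub-project Name", "Notes"]

    rows = [{
        "Experiment ID": "O_1411_02_R1",
        "Experiment Description": "Second November imaging run",
        "Experiment Start Date": "2014-11-10",
        "Project Name": "ProjectA",
        "Sub-project Name": "SubProject1",
        "Notes": "",
    }]

    with open("experimental_worksheet.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)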

This document will act as a quick reference of bare-bones experiment information for Jayne and her student workers. We also realized that Jayne’s student workers need to know what the processes are at each step of her workflow, so we developed step-by-step procedures and policies for each phase of the workflow. These procedural and policy documents set expectations for how Jayne’s data are managed. Now, with such a data management foundation laid, the next step is to get to our root problem: discerning how Jayne can best benefit from using OMERO and where it makes sense in her workflow.

Let’s Talk About Storage

By Luke Bluma, IT Engagement Manager for the Campus Computing Infrastructure (CCI)

Data is a critical part of our lives here at UW-Madison. We collect, analyze, and share data every day to get our jobs done. Data comes in all shapes and sizes and it needs the right place to live. That’s where storage comes in.

However, storage can be a loaded term. It can mean a thumb drive, your computer’s hard drive, storage accessed via a server, cloud storage, or a large campus-wide storage service. It is all of these things, but not all of them will fit your needs. Your needs are what matter, and they will drive which solution(s) will work for you.

I am the Engagement Manager for the Campus Computing Infrastructure (CCI) initiative. I work with campus partners on their data center, server, storage and/or backup needs. Storage is currently a big focus for me, so I wanted to share some thoughts about evaluating potential storage solutions.

The main areas to think about are:

  • What kinds of data are you working with?
  • What are your “must-haves”?
  • What storage options are available at UW-Madison?

What kinds of data are you working with?

This is the first big question to focus on, because it drastically affects what options are available to you. Are you working with FERPA data, sensitive data, restricted data, PCI data, etc.? Each of these will affect which service(s) you can or can’t utilize. For more information on restricted data, see: https://www.cio.wisc.edu/security/about/campus-initiatives/restricted-data-security-standards/

What are your “must-haves”?

Once you have identified the types of data you are working with, it is crucial to determine your must-have requirements for a storage solution. Does it need to be secure? If so, how secure? Does it need to be accessible by people outside of UW-Madison? Does it need to be high-performance storage? Does it need to scale to 20+ TB? Does it need to be accessible via the web? These are just example questions; the key is that there is no perfect storage solution. Some services do X, Y, and Z; others do X, Y, and A but not Z. Determining your “must-haves” will help you figure out which services you can work with and which you can’t.

What storage options are available at UW-Madison?

Now that you have identified the kinds of data and the “must-haves” for your solution, the final step is to evaluate the storage options available to you at UW-Madison. Storage is an evolving technology, so specific services will change over time, but here are good places to start learning about what is available to you:

  • Local IT – if you have a local IT group, then talk to them first about what local options may be available to you
  • Campus Computing Infrastructure (CCI) – if you need network storage or server storage that isn’t focused on high performance computing then CCI has several options that could work depending on your needs
  • Advanced Computing Initiative (ACI) – if you need to do high performance or high throughput computing then ACI has several options that could work depending on your needs
  • Division of Information Technology (DoIT) – if you need cloud storage, like Box.com, or local storage, like an external hard drive, then DoIT has solutions that could work for you as well

This can seem like a lot to think about, and to be honest it can be quite confusing at times. The good news is that you have help! Research Data Services (RDS) can be a great starting point for your storage needs. We can focus on the key question: what are you looking to do? Then we can help you evaluate some potential options for moving forward based on your needs.

To get started, contact RDS at http://researchdata.wisc.edu/help/contact-us/ or contact me at cci@cio.wisc.edu.