If your time as a researcher or student at UW-Madison is coming to an end, good luck with your new opportunities! As you make the shift, it’s important to begin the process of off-boarding – taking all the necessary steps to ensure a seamless transition when formally separating from the university.
This is especially important when it comes to your research data. Off-boarding requires a careful assessment of all the data, accounts, and tools you have used while at UW-Madison and an understanding of policies on transitioning your research data to your collaborators, departments, or new institutions.
To help, we have put together this brief guide. But remember, many labs, departments, and colleges have their own off-boarding procedures, so it’s best to inquire there for more specific guidance. UW-Madison has also gathered some role-specific resources to get started.
Is the spreadsheet you’re using for research data starting to get unwieldy?
For research projects of any size, it’s important to think about the tools that will best suit your data collection, recording, and analysis needs now and into the future. Spreadsheets and databases are two of the most versatile and widely used tools for managing research data. However, to ensure the long-term integrity of your data, there are many scenarios in which the limitations of spreadsheets mean it may be time to transition your research data to a database.
Spreadsheets, whether in proprietary software like Microsoft Excel or open-source applications like OpenOffice, are among the most widely used data management tools. This is no accident: spreadsheets are an easy way to get started storing both qualitative and quantitative data. For larger research projects, or those with highly specialized data types and vocabularies, however, spreadsheets have limitations that can jeopardize data integrity and introduce errors into your research process.
Benefits of spreadsheets:
- They are easy to learn, and many people are already familiar with their core functions
- They provide flexible and customizable features for organizing and analyzing data
- Many other applications and coding languages work seamlessly with data stored in spreadsheets
- Data in spreadsheets can easily be converted into CSV or other tabular formats that are well suited for long-term storage and sharing
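As a quick illustration of that last point, here is a minimal sketch, using Python’s standard csv module, of writing tabular data to CSV for long-term storage (the file name and column names are hypothetical):

```python
import csv

# Hypothetical observations, as they might appear in a spreadsheet table
rows = [
    {"sample_id": "S001", "date_collected": "2020-03-01", "mass_g": 12.4},
    {"sample_id": "S002", "date_collected": "2020-03-02", "mass_g": 11.9},
]

with open("observations.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["sample_id", "date_collected", "mass_g"])
    writer.writeheader()    # the first row holds the column names
    writer.writerows(rows)  # one line per observation
```

Because the result is plain text, any tool (or any future reader) can open it without special software.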
Drawbacks of spreadsheets:
- Limited control over data integrity and format
- Some proprietary spreadsheet software has limits on the amount of data it can hold
- Most importantly, it’s easy for humans to introduce errors into data through mistakes like accidentally overwriting cells or recording data in non-normalized structures (e.g., Excel famously mangles dates, as described in this blog post by a data librarian)
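The date problem in particular has a simple defense: record dates as unambiguous ISO 8601 text (YYYY-MM-DD) rather than relying on a spreadsheet’s automatic conversion. A small sketch in Python’s standard datetime module (the variable names are illustrative):

```python
from datetime import date

# "3/4/2021" is ambiguous: March 4 or April 3, depending on locale.
# ISO 8601 (YYYY-MM-DD) has exactly one reading and sorts correctly as text.
collected = date(2021, 3, 4)
iso = collected.isoformat()
print(iso)  # -> 2021-03-04

# Parsing the text back is just as unambiguous
parsed = date.fromisoformat(iso)
assert parsed == collected
```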
For more information, consider taking a look at this Data Carpentry course on how to use spreadsheets for data management.
Databases are among the most powerful and ubiquitous tools for managing data of all kinds because of their dynamic ability to organize, query, and share the data they contain. There are both proprietary and open-source database systems that can serve as effective solutions for managing your research data. In addition, data that is already stored in a spreadsheet can, in many cases, be migrated into a database system.
Benefits of databases:
- Databases provide highly customizable controls over the formats, vocabularies, and data types that govern how data is entered and stored
- They make it easy to enforce quality control by requiring specific formats or values for fields and limiting access to tables
- There are many Database Management Systems (DBMS) like MySQL Workbench or Microsoft Access that support the design and implementation of databases without extensive coding know-how
- Databases facilitate complex relationships between data elements and support sophisticated queries
- Command line controls make it possible to implement detailed customization of database structure
- Databases often include customizable interfaces that provide access to data as well as forms for entering new data
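To make the quality control point concrete, here is a minimal sketch using Python’s built-in sqlite3 module; the table and column names are invented for illustration. The CHECK and NOT NULL constraints do exactly what a spreadsheet cannot: they reject bad values at entry time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a throwaway in-memory database
conn.execute("""
    CREATE TABLE samples (
        sample_id  TEXT PRIMARY KEY,        -- must be unique
        collected  TEXT NOT NULL,           -- required field
        mass_g     REAL CHECK (mass_g > 0)  -- rejects impossible values
    )
""")
conn.execute("INSERT INTO samples VALUES ('S001', '2020-03-01', 12.4)")

# The CHECK constraint enforces quality control at entry time:
try:
    conn.execute("INSERT INTO samples VALUES ('S002', '2020-03-02', -5.0)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)

# Queries can then filter and aggregate without touching raw files
total = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
print(total)  # only the valid row was stored
```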
Drawbacks of databases:
- Learning database languages such as SQL can be labor and time intensive
- Designing a database often requires extensive up-front planning and may slow the research process down
- Data stored in proprietary database software can be difficult to convert to other formats
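As noted above, data already in a spreadsheet can often be moved into a database by exporting it to CSV first. Here is a minimal sketch of that migration using Python’s standard csv and sqlite3 modules (the file name and column names are hypothetical):

```python
import csv
import sqlite3

# Write a small CSV file standing in for a spreadsheet export
with open("export.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["sample_id", "mass_g"])
    writer.writerow(["S001", "12.4"])
    writer.writerow(["S002", "11.9"])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (sample_id TEXT PRIMARY KEY, mass_g REAL)")

# Load each CSV row into the table, converting text to numeric types
with open("export.csv", newline="") as f:
    for row in csv.DictReader(f):
        conn.execute(
            "INSERT INTO samples VALUES (?, ?)",
            (row["sample_id"], float(row["mass_g"])),
        )

count = conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]
print(count)  # -> 2
```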
Where to get help: If you are interested in further exploring database solutions, you can access database training courses on LinkedIn Learning (formerly Lynda.com) or consult with Teaching and Research Application Development (TRAD) about your options.
Happy New Year! The start of a new year and a new semester is as good a time as any to evaluate your data management practices. Here are some reminders about data management best practices, groups on campus who can help you manage your data, and some upcoming opportunities to sharpen your skills.
OpenCitations has been working toward enhancing citations to make citation data more easily discoverable and retrievable. In July, OpenCitations released COCI, the OpenCitations Index of Crossref open DOI-to-DOI references. The initial release of COCI created first-class data entities out of citation information in order to index Crossref and to make this information machine-readable. The July release also included the OpenCitations Corpus (OCC), a repository of downloadable bibliographic and citation data. OpenCitations has been building upon the data model that they created, and released the newest version of COCI this week: they have extended the data model, and the index now contains almost 450 million citation links between DOIs from Crossref reference data.
Written by Chiu-chuang Lu Chou; Information adapted from OpenICPSR
OpenICPSR is a self-deposit data repository for researchers who need to share their social and behavioral science research data for public access compliance. Researchers can share up to 2 GB of data in OpenICPSR for free. Researchers prepare all of the data and documentation files necessary to allow their data collection to be read and interpreted independently. They also prepare metadata so that their data can be searched and discovered in the ICPSR catalog and major search engines. A DOI and a data citation are provided to the depositor after the data are published.
Depositors will receive data download reports from OpenICPSR. All OpenICPSR data is governed by the Creative Commons Attribution 4.0 license. Server-side encryption is used to encrypt all files uploaded to OpenICPSR. Data deposited through the self-deposit package are distributed and preserved as-is, exactly as they arrive, without the standard curation and preservation features available with the professional curation package.
OpenICPSR also offers a professional curation package for researchers who would like to use ICPSR’s curation services, including full metadata generation and a bibliography search, statistical package conversion, and user support. The cost of professional curation is based on the number of variables and the complexity of the data. To learn more about OpenICPSR, please visit their website.
by Cameron Cook
With fall well under way on campus and final projects just around the corner, it’s the perfect time to review our top five data management tips for undergrads! As an undergraduate, data management may not seem important, but giving it a few moments of your day will ensure your assignments are safe – even in the face of a hard drive meltdown the night before a due date.
If keeping your final projects safe isn’t enough of an incentive, there is one more: undergraduate publishing opportunities. As you learn and grow as a researcher, you can publish your work in a number of undergraduate research journals. Practicing good data management will help keep your research reproducible, understandable, findable, and organized for when you submit your work to a journal.
1) Clear, consistent file naming and structure
Or: know where your data lives. Keep file names short, simple, and descriptive. Include dates (in a standardized format) to version your files so that you can always go back to a previous copy in case of mistakes. Keep files in a consistent, clear structure with easy-to-follow labels (by date, file type, instrument, or analysis type, for example) so that you never misplace an important file.
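One way to make dated, descriptive file names automatic is to generate them in code. A small Python sketch using the standard datetime module (the project and analysis names are made up for illustration):

```python
from datetime import date

def versioned_name(project: str, step: str, ext: str, when: date) -> str:
    """Build a short, descriptive, date-versioned file name.

    YYYY-MM-DD sorts chronologically as plain text, so a directory
    listing doubles as a version history.
    """
    return f"{when.isoformat()}_{project}_{step}.{ext}"

print(versioned_name("soilsurvey", "cleaned", "csv", date(2015, 11, 3)))
# -> 2015-11-03_soilsurvey_cleaned.csv
```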
by Brianna Marshall, Digital Curation Coordinator
This is part three of a three-part series where I explore platforms for archiving and sharing your data. Read the first post in the series, focused on UW’s institutional repository, MINDS@UW or read the second post, focused on data repository Dryad.
To help you better understand your options, here are the areas I will address for each platform:
- Background information on who can use it and what type of content is appropriate
- Options for sharing and access
- Archiving and preservation benefits the platform offers
- Whether the platform complies with the forthcoming OSTP mandate
figshare is a discipline-neutral platform for sharing research in many formats, including figures, datasets, media, papers, posters, presentations and filesets. All items uploaded to figshare are citable, shareable and discoverable.
Sharing and access
All publicly available research outputs are shared under Creative Commons licenses. By default, figures, media, posters, papers, and filesets are available under a CC BY license, datasets are available under CC0, and software/code is available under the MIT license. Learn more about sharing your research on figshare.
Archiving and preservation
figshare notes that items will be retained for the lifetime of the repository and that its sustainability model “includes the continued hosting and persistence of all public research outputs.” Research outputs are stored directly in Amazon Web Service’s S3 buckets. Data files and metadata are backed up nightly and replicated into multiple copies in the online system. Learn more about figshare’s preservation policies.
The OSTP mandate requires all federal funding agencies with over $100 million in annual R&D expenditures to make grant-funded research outputs more accessible. This will likely mean that data must be publicly accessible and have an assigned DOI (though you’ll need to check with your funding agency for the exact requirements). All items uploaded to figshare are minted a DataCite DOI, so as long as your data is set to public, it is a good candidate for complying with the mandate.
Have additional questions or concerns about where you should archive your data? Contact us.