From September 3-5, the Workshop on Open Citations was held in Bologna: researchers, scholarly publishers, funders, policy makers, and advocates for open citations gathered to present new tools and practices for the creation, management, and reuse of citation data, and to participate in a hackathon.
Written by Jarrod Irwin
Researchers across disciplines care about the security and privacy of their data, especially data containing personally identifiable information, since even limited combinations of certain data points can lead to the identification of a subject. Researchers in the health sciences have an especially strong need to protect their data, because patient health data has special legal restrictions on its use and handling.
Here’s a quick look at the most important U.S. law affecting patient health data, as well as some best practices for preserving data privacy when working with laboratory tests and other types of medical research data.
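To illustrate how a limited combination of data points can single out a subject, here is a minimal Python sketch. The records and field names (zip, birth_year, sex) are invented for illustration and do not come from any real dataset. The sketch counts how many records share each combination of values and reports the combinations that occur only once.

```python
from collections import Counter

# Invented example records; no real subjects are represented here.
records = [
    {"zip": "53703", "birth_year": 1980, "sex": "F"},
    {"zip": "53703", "birth_year": 1980, "sex": "F"},
    {"zip": "53703", "birth_year": 1975, "sex": "M"},
    {"zip": "53706", "birth_year": 1980, "sex": "F"},
]


def unique_combinations(rows, fields):
    """Return the value combinations of `fields` that match exactly one row."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return [combo for combo, n in counts.items() if n == 1]


# A single field leaves most records indistinguishable from one another...
by_zip = unique_combinations(records, ["zip"])

# ...but combining three fields makes more records unique.
by_all = unique_combinations(records, ["zip", "birth_year", "sex"])
```

A combination that matches exactly one record is a candidate for re-identification, which is why fields that look harmless on their own may still need to be generalized or removed before data are shared.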
Q: What types of research is LabKey Server suited for?
Mark: LabKey Server helps teams of scientists bring together many different kinds of information from different sources for integrated analysis, secure sharing and collaboration.
Analysis across large datasets is a common need in fields that generate large volumes of data from high throughput techniques, such as proteomics and genomics. But geneticists also use LabKey Server to store phenotypes. Microscopists use it to document and point to high-resolution images. And clinical researchers use it to track diagnostic and other clinical data. Scientists across many fields of biomedical research face common challenges in managing, integrating and securely sharing their data.
Q: What types of data files can be used for analysis in LabKey Server?
Mark: A wide range of data file formats, particularly tabular formats such as MS Excel spreadsheets.
Q: What’s involved in running an analysis on data files in LabKey Server?
Mark: LabKey Server provides web-based tools for analyzing and visualizing data. For example, you can use interactive data grids to filter, sort, and join tabular data from multiple experiments. You can also write R scripts to run analyses within LabKey Server and conduct SAS or SQL queries on data you are authorized to view.
Q: How does analysis across spreadsheet data from multiple experiments or multiple labs work? What if each lab or experiment had a different way of naming columns or coding data values in spreadsheets?
Mark: It’s relatively easy to remap column names or set up aliases when you import each spreadsheet into LabKey Server. Inconsistent data types are a bigger problem. If the data values come from lookups, there are a few ways you can fix that by writing a script in LabKey Server. However, analysis across spreadsheet data always works best when spreadsheet data are simple and coded in consistent ways. When we help research groups set up their LabKey Servers, we usually help them define consistent ways of coding data and variable names.
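The idea of column aliases can be sketched outside LabKey Server in plain Python. This is not LabKey Server's API or import mechanism. It is a minimal illustration, with hypothetical column names and an invented alias table, of mapping each lab's headers onto one shared set of names so that rows from different spreadsheets can be combined.

```python
import csv
import io

# Hypothetical alias table: each lab's column name -> one shared name.
COLUMN_ALIASES = {
    "subject id": "participant_id",
    "ptid": "participant_id",
    "visit date": "visit_date",
    "date": "visit_date",
    "cd4": "cd4_count",
    "cd4 count": "cd4_count",
}


def normalize_header(name):
    """Map a raw column name to its shared name if an alias is known."""
    key = name.strip().lower()
    return COLUMN_ALIASES.get(key, key)


def read_normalized(csv_text):
    """Read CSV text and return rows keyed by the shared column names."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [normalize_header(h) for h in next(reader)]
    return [dict(zip(header, row)) for row in reader]


# Two invented exports that describe the same fields under different names.
lab_a = "Subject ID,Visit Date,CD4\nP001,2013-01-15,512\n"
lab_b = "PTID,Date,CD4 Count\nP002,2013-02-03,430\n"

combined = read_normalized(lab_a) + read_normalized(lab_b)
```

Renaming columns this way handles only the naming problem. Values that are coded differently (for example, dates in different formats) would still need their own conversion step.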
Q: How does LabKey Server differ from Electronic Lab Notebook software?
Mark: ELNs are based on the traditional lab notebook paradigm: a place to describe and store information about each experiment. LabKey Server is a tool for loading data and descriptive metadata in a structured way so you can compare and analyze across large volumes of data using the power of a database.
LabKey Server can be used like an ELN. For example, you can create data structures for specific experiment types, like a chemistry assay, then load data files from individual experiments, adding annotations about specific parameters for each experiment. LabKey Server then can read contents of the data files, perform transformations and visualizations across experiments, and populate the underlying database with the transformed data. It can also compare the quality of results from different experiments and show you any trends in quality due to differences in reagents or other conditions, as in the example below.
Q: So, researchers can write custom scripts for data analysis and other steps in LabKey Server. Does LabKey Server work like a code repository?
Mark: LabKey Server isn’t a code repository per se. For example, it doesn’t have a built-in versioning system for code. It does audit changes in configuration and security, and it can show you what code was used to run an analysis. Anyone who is writing code to implement on a LabKey Server should follow best practices for code versioning and use a code versioning application that is external to the LabKey Server.
The UW-Madison has implemented a utility for exporting your files out of your UW Google Drive account (as well as YouTube and Google Contacts) in one step. This is useful for archiving files in your account if you are leaving the University or if you want a copy of the files to place in another location. Takeout doesn’t delete the files; it creates a copy, so if you need to delete them, you will need to do that directly in Google Drive.
See Exporting Data Using Google Takeout for instructions on how to do this.
Google Takeout creates a zipped folder, named <yourNetID>@wisc.edu-takeout.zip, which you can download from the browser.
DMPTool Webinar Series Brown Bag
Join us for a ~15 part webinar series on the Data Management Planning Tool, DMPTool, from the California Digital Library. This series will introduce the tool, discuss how to use it effectively, and describe how it can be customized for institutional needs. Librarians, staff, and information professionals interested in promoting the use of the DMPTool by researchers are encouraged to attend.
More information on the DMPTool webinar series.
Webinar 8, Tuesday, August 13, 12-1pm, 126 Memorial Library – Data curation profiles and the DMPTool – Jake Carlson
LabKey Server is an open source data management platform designed for organizing and managing data from large-scale research; for example, data from thousands of samples and/or subjects. It provides a secure environment for collaborators at different locations to share, combine, and query data. It is an extensible platform, allowing developers to create custom applications for data analysis and visualization through its API (application programming interface).
LabKey has been used in several biomedical research communities to integrate and analyze data from high throughput assays conducted in distributed labs, including the Immune Tolerance Network, the Atlas data portal for HIV Vaccine studies, and others.
LabKey is currently in use at the UW-Madison Primate Center.
What It Is: Cloud-based file storage, synchronization, and back-ups. SpiderOak is available on Windows, Linux, OS X, iOS, Android, and N900 Maemo.
Cost: Free, premium, and enterprise accounts available. The pricing for storage is better compared to Dropbox; $10/month gets you 100GB at SpiderOak vs. 50GB from Dropbox. SpiderOak also has no maximum storage limit. Additionally, it offers a 50% educational discount to anyone with a valid .edu email address.
Ease of Use: SpiderOak’s forte is security, not interface design. The web and mobile interfaces are fairly plain and not nearly as user-friendly as Dropbox’s interfaces. Additionally, while Dropbox has a very simple setup (everything goes in the Dropbox folder and syncs to all your devices unless you tell it not to), SpiderOak’s setup is a bit more involved. First, you need to set up a back-up. You can choose multiple folders and even specific types of files. After you’ve done this, you can sync the folders across your devices. Finally, access from the web and mobile interfaces is read-only. You can only upload files from the desktop client.
Sharing and Collaboration: SpiderOak provides ShareRooms which allow you to selectively share folders (with anyone; not limited to other SpiderOak users), but the files are read-only. It also allows sharing of a single file, but this is read-only as well. The sharing is more secure: the ShareRoom is accessed through a unique URL and a RoomKey (password) must be entered, but there is no mechanism for collaborative editing.
Organizing: Other than the traditional hierarchical file system structure, SpiderOak does not have any built-in organizational features.
Exporting: Files can easily be exported. Simply de-select the folders or files in question from the syncing and back-up.
Backups and Versioning: This is one area where SpiderOak does well. It saves all historical versions of a file, and does extensive de-duplication, so only the parts that are different are saved, not the entire file.
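The general idea behind de-duplicated versioning can be sketched in a few lines of Python. This is not SpiderOak's actual algorithm. It is a minimal illustration that assumes fixed-size chunks and a hypothetical ChunkStore class: each version is stored as a list of chunk hashes, and a chunk's bytes are kept only the first time that chunk is seen.

```python
import hashlib

CHUNK_SIZE = 4  # tiny for illustration; a real system would use far larger chunks


class ChunkStore:
    """Hypothetical store that keeps every version but each distinct chunk once."""

    def __init__(self):
        self.chunks = {}    # SHA-256 hex digest -> chunk bytes
        self.versions = []  # each version is an ordered list of digests

    def save_version(self, data):
        """Split data into chunks, store unseen chunks, and record the version."""
        digests = []
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # no-op if already stored
            digests.append(digest)
        self.versions.append(digests)
        return len(self.versions) - 1

    def restore(self, version):
        """Rebuild the full contents of a saved version from its chunks."""
        return b"".join(self.chunks[d] for d in self.versions[version])


store = ChunkStore()
v0 = store.save_version(b"AAAABBBBCCCC")
v1 = store.save_version(b"AAAAXXXXCCCC")  # only the middle chunk changed
```

After both saves, the store holds four distinct chunks rather than six, yet either version can still be restored in full.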
Security: SpiderOak is, as Ars Technica puts it, “Dropbox for the security obsessive.” Its main selling point is not that it’s cloud storage, but that it is secure cloud storage. Unlike the other major cloud storage services, SpiderOak employees cannot access your files. Both Dropbox and SpiderOak encrypt their data, but SpiderOak also encrypts the decryption key. The downside to SpiderOak’s superior security is that if you forget your password, your files are gone.