Tools: Interview with Mark Igra, LabKey Server

Earlier this year, I spoke with Mark Igra, a partner at LabKey Software, and learned more about how LabKey Server works and how it’s used by researchers.

Q: What types of research is LabKey Server suited for?

Mark: LabKey Server helps teams of scientists bring together many different kinds of information from different sources for integrated analysis, secure sharing and collaboration.

Analysis across large datasets is a common need in fields that generate large volumes of data from high-throughput techniques, such as proteomics and genomics. But geneticists also use LabKey Server to store phenotypes. Microscopists use it to document and point to high-resolution images. And clinical researchers use it to track diagnostic and other clinical data. Scientists across many fields of biomedical research face common challenges in managing, integrating and securely sharing their data.

An overview of the kinds of data, analyses, and collaborations LabKey Server supports.
LabKey Server helps scientists integrate, analyze, and share many different kinds of research information through a secure web portal. Collaborators can view only the data they have permission to see.

Q: What types of data files can be used for analysis in LabKey Server?

Mark: A wide range of data file formats, particularly tabular formats such as MS Excel spreadsheets.

Q: What’s involved in running an analysis on data files in LabKey Server?

Mark: LabKey Server provides web-based tools for analyzing and visualizing data. For example, you can use interactive data grids to filter, sort, and join tabular data from multiple experiments. You can also write R scripts to run analyses within LabKey Server and conduct SAS or SQL queries on data you are authorized to view.

This image shows a data grid in LabKey being filtered by treatment group.
Interactive data grids support sorting, filtering, adding/removing columns, data export, and a variety of visualization and analysis options, such as R scripting. This image shows a data grid being filtered by treatment group. A live view of this grid: http://goo.gl/oTQXbP
Screenshots showing the R script, the grid containing data used in the R analysis, and the plotted results of the analysis.
LabKey Server’s built-in interface for R scripting helps users create and share R-based analyses and visualizations through the web-based portal. Users with sufficient security credentials can explore alternative analyses by editing existing scripts and saving private copies. As shown here, source data, scripts and script results (“views”) are displayed on separate tabs. A live view: http://goo.gl/2MnzIM
A screenshot of a Chart Wizard in LabKey Server showing how data types are filtered with checkboxes.
Chart wizards make it easy to produce interactive plots of results. This time-based chart shows progression relative to baseline for several cohorts. The checkboxes on the right allow users to filter the data displayed. A live version of this chart: http://goo.gl/23aukH
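
As a rough illustration of the grid operations described in this answer (join, filter, sort), here is a minimal sketch in plain Python. It does not use LabKey Server or its API; the tables, column names, and values are hypothetical and exist only to show the idea.

```python
# Hypothetical rows standing in for two experiment tables; not LabKey data.
subjects = [
    {"subject_id": "S1", "treatment_group": "placebo"},
    {"subject_id": "S2", "treatment_group": "drug"},
    {"subject_id": "S3", "treatment_group": "drug"},
]
results = [
    {"subject_id": "S1", "viral_load": 5200},
    {"subject_id": "S2", "viral_load": 900},
    {"subject_id": "S3", "viral_load": 1500},
]


def join_on(left, right, key):
    """Inner-join two lists of dict rows on a shared key column.

    Assumes each key value appears at most once in `right`.
    """
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def grid_view(rows, column, value, sort_by):
    """Keep rows where `column` equals `value`, sorted ascending by `sort_by`."""
    kept = [row for row in rows if row[column] == value]
    return sorted(kept, key=lambda row: row[sort_by])


joined = join_on(subjects, results, "subject_id")
drug_rows = grid_view(joined, "treatment_group", "drug", "viral_load")

for row in drug_rows:
    print(row["subject_id"], row["viral_load"])
```

In a real deployment the grid does this work interactively in the browser; the sketch only makes the three operations concrete.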

Q: How does analysis across spreadsheet data from multiple experiments or multiple labs work? What if each lab or experiment had a different way of naming columns or coding data values in spreadsheets?

Mark: It’s relatively easy to remap column names or set up aliases when you import each spreadsheet into LabKey Server. Inconsistent data types are a bigger problem. If the data values come from lookups, there are a few ways you can fix that by writing a script in LabKey Server. However, analysis across spreadsheet data always works best when the data are simple and coded in consistent ways. When we help research groups set up their LabKey Servers, we usually help them define consistent ways of coding data and naming variables.
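
To make the column-alias idea concrete, the following is a minimal sketch in plain Python, assuming two labs export the same measurements under different column headers. It is not how LabKey Server implements aliases; the alias table, column names, and sample rows are all hypothetical.

```python
import csv
import io

# Hypothetical alias table: each lab's column header -> one shared name.
COLUMN_ALIASES = {
    "ptid": "participant_id",
    "ParticipantID": "participant_id",
    "visit": "visit_date",
    "VisitDate": "visit_date",
}


def read_with_aliases(csv_text, aliases):
    """Parse CSV text and rename any aliased column to its shared name.

    Columns without an alias keep their original header.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {aliases.get(name, name): value for name, value in row.items()}
        for row in reader
    ]


# Two hypothetical exports with different headers for the same fields.
lab_a = "ptid,visit,cd4\nP01,2014-01-05,512\n"
lab_b = "ParticipantID,VisitDate,cd4\nP02,2014-01-09,430\n"

combined = (
    read_with_aliases(lab_a, COLUMN_ALIASES)
    + read_with_aliases(lab_b, COLUMN_ALIASES)
)

for row in combined:
    print(row)
```

Note that renaming headers does nothing for the harder problem Mark mentions: values are still strings here, and inconsistent data types or codings would need separate handling.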

Screenshot showing how values for data from pre-defined vocabularies are selected during data input.
To ensure standardized data entry, administrators can configure table fields as lookups to predefined vocabulary lists. Users must then pick from the predefined list of terms when entering data in this field. The screenshot shows an example of how a field is configured as a lookup with a default value.
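
The effect of a lookup field with a default value can be sketched in a few lines of plain Python. This is only an illustration of the concept, not LabKey Server's implementation; the field name, vocabulary terms, and default are hypothetical.

```python
# Hypothetical vocabulary and default for a "specimen type" field.
SPECIMEN_TYPES = {"Blood", "Plasma", "Serum", "Urine"}
DEFAULT_SPECIMEN_TYPE = "Blood"


def clean_specimen_type(value):
    """Return a vocabulary term, applying the default to blank input.

    Raises ValueError for any term outside the vocabulary, which is the
    scripted equivalent of only offering predefined choices in a form.
    """
    if value is None or value.strip() == "":
        return DEFAULT_SPECIMEN_TYPE
    term = value.strip()
    if term not in SPECIMEN_TYPES:
        raise ValueError(f"{term!r} is not in the specimen type vocabulary")
    return term


print(clean_specimen_type(""))          # blank input falls back to the default
print(clean_specimen_type(" Plasma "))  # surrounding whitespace is trimmed
```

The check is deliberately case-sensitive, so a value such as "plasma" is rejected rather than silently accepted as a second spelling of the same term.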

Q: How does LabKey Server differ from Electronic Lab Notebook software?

Mark: ELNs are based on the traditional lab notebook paradigm: a place to describe and store information about each experiment. LabKey Server is a tool for loading data and descriptive metadata in a structured way so you can compare and analyze across large volumes of data using the power of a database.

LabKey Server can be used like an ELN. For example, you can create data structures for specific experiment types, like a chemistry assay, then load data files from individual experiments, adding annotations about specific parameters for each experiment. LabKey Server can then read the contents of the data files, perform transformations and visualizations across experiments, and populate the underlying database with the transformed data. It can also compare the quality of results from different experiments and show you any trends in quality due to differences in reagents or other conditions, as in the example below.

A screenshot showing how data quality from 10 experimental runs of the same assay is visualized in LabKey Server.
LabKey Server can help with experimental quality control by visualizing the progression of quality metrics over time. This figure shows a Levey-Jennings plot for a quality metric for a Luminex assay across 10 experimental runs, enabling early detection of problematic trends and outliers. A live version of this chart: http://goo.gl/f6I6nV
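
A Levey-Jennings plot draws each run's quality metric against the mean and standard-deviation bands of a set of baseline runs. The arithmetic behind it can be sketched as follows in plain Python. The metric values are made up, and the 2 SD "warning" and 3 SD "out of control" thresholds are a common convention assumed here, not something stated in the interview or taken from LabKey Server.

```python
from statistics import mean, stdev


def levey_jennings_flags(baseline, new_values):
    """Score new runs against the mean and sample SD of baseline runs.

    Returns (center, sd, flags), where flags holds one
    (value, z_score, status) tuple per new run.
    """
    center = mean(baseline)
    sd = stdev(baseline)  # sample standard deviation of the baseline runs
    flags = []
    for value in new_values:
        z = (value - center) / sd
        if abs(z) > 3:
            status = "out of control"  # assumed 3 SD limit
        elif abs(z) > 2:
            status = "warning"         # assumed 2 SD limit
        else:
            status = "ok"
        flags.append((value, round(z, 2), status))
    return center, sd, flags


# Hypothetical quality-metric values from earlier runs of one assay.
baseline_runs = [100, 102, 98, 101, 99, 100, 103, 97]
later_runs = [101, 105, 93]

center, sd, flags = levey_jennings_flags(baseline_runs, later_runs)
for value, z, status in flags:
    print(value, z, status)
```

With these made-up numbers the baseline mean is 100 and the sample SD is 2, so the three later runs sit at 0.5, 2.5, and -3.5 SD from the mean.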

Q: So, researchers can write custom scripts for data analysis and other steps in LabKey Server. Does LabKey Server work like a code repository?

Mark: LabKey Server isn’t a code repository per se. For example, it doesn’t have a built-in versioning system for code. It does audit changes in configuration and security. And it can show you what code was used to run an analysis. Anyone writing code to run on a LabKey Server should follow best practices for code versioning and use a version control application that is external to the LabKey Server.