Earlier this year, I spoke with Mark Igra, a partner at LabKey Software, and learned more about how LabKey Server works and how it’s used by researchers.
Q: What types of research is LabKey Server suited for?
Mark: LabKey Server helps teams of scientists bring together many different kinds of information from different sources for integrated analysis, secure sharing and collaboration.
Analysis across large datasets is a common need in fields that generate large volumes of data from high throughput techniques, such as proteomics and genomics. But geneticists also use LabKey Server to store phenotypes. Microscopists use it to document and point to high-resolution images. And clinical researchers use it to track diagnostic and other clinical data. Scientists across many fields of biomedical research face common challenges in managing, integrating and securely sharing their data.
Q: What types of data files can be used for analysis in LabKey Server?
Mark: A wide range of data file formats, particularly tabular formats such as MS Excel spreadsheets.
Q: What’s involved in running an analysis on data files in LabKey Server?
Mark: LabKey Server provides web-based tools for analyzing and visualizing data. For example, you can use interactive data grids to filter, sort, and join tabular data from multiple experiments. You can also write R scripts to run analyses within LabKey Server and conduct SAS or SQL queries on data you are authorized to view.
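To make the filter, sort, and join operations Mark describes more concrete, here is a minimal sketch in generic SQL, run through Python's built-in sqlite3 module. It does not use LabKey Server or its API; the table names (exp_a, exp_b), the sample_id column, and the join_experiments helper are all hypothetical and chosen only for illustration.

```python
import sqlite3


def join_experiments(rows_a, rows_b, min_value):
    """Join two experiments' results on sample_id, then filter and sort.

    Illustrative only: the tables and columns are hypothetical and this
    uses an in-memory SQLite database, not LabKey Server.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE exp_a (sample_id TEXT, value REAL)")
    conn.execute("CREATE TABLE exp_b (sample_id TEXT, value REAL)")
    conn.executemany("INSERT INTO exp_a VALUES (?, ?)", rows_a)
    conn.executemany("INSERT INTO exp_b VALUES (?, ?)", rows_b)

    query = """
        SELECT a.sample_id, a.value AS value_a, b.value AS value_b
        FROM exp_a AS a
        JOIN exp_b AS b ON a.sample_id = b.sample_id  -- join on shared sample
        WHERE a.value >= ?                            -- filter
        ORDER BY a.value DESC                         -- sort
    """
    result = conn.execute(query, (min_value,)).fetchall()
    conn.close()
    return result


# Two small made-up result sets that share some sample IDs.
rows_a = [("S1", 0.9), ("S2", 0.4), ("S3", 0.7)]
rows_b = [("S1", 12.0), ("S3", 15.5), ("S4", 9.1)]
joined = join_experiments(rows_a, rows_b, 0.5)
```

Only samples that appear in both experiments and pass the filter end up in the joined result, which is the kind of cross-experiment view an interactive data grid would present.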
Q: How does analysis across spreadsheet data from multiple experiments or multiple labs work? What if each lab or experiment had a different way of naming columns or coding data values in spreadsheets?
Mark: It’s relatively easy to remap column names or set up aliases when you import each spreadsheet into LabKey Server. Inconsistent data types are a bigger problem. If the data values come from lookups, there are a few ways you can fix that by writing a script in LabKey Server. However, analysis across spreadsheet data always works best when spreadsheet data are simple and coded in consistent ways. When we help research groups set up their LabKey Servers, we usually help them define consistent ways of coding data and variable names.
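The idea of remapping column names and harmonizing coded values can be sketched in a few lines of plain Python. This is a generic illustration, not LabKey Server's import mechanism: the alias table, the value codes, and the normalize_row helper are assumptions made up for this example.

```python
# Hypothetical alias table: maps each lab's column name to one shared name.
COLUMN_ALIASES = {
    "ptid": "participant_id",
    "subject id": "participant_id",
    "participant": "participant_id",
    "gender": "sex",
    "sex": "sex",
}

# Hypothetical value codes: maps each lab's coding to one shared vocabulary.
VALUE_CODES = {
    "sex": {"m": "male", "male": "male", "f": "female", "female": "female"},
}


def normalize_row(row):
    """Return a copy of row with shared column names and value codes."""
    out = {}
    for col, val in row.items():
        key = col.strip().lower()
        name = COLUMN_ALIASES.get(key, key)  # unknown columns keep their name
        codes = VALUE_CODES.get(name)
        if codes is not None and isinstance(val, str):
            # Unrecognized values pass through unchanged for later review.
            val = codes.get(val.strip().lower(), val)
        out[name] = val
    return out


# Two labs that name and code the same information differently.
lab_a = [{"PTID": "P1", "Sex": "M"}]
lab_b = [{"Subject ID": "P2", "Gender": "female"}]
combined = [normalize_row(r) for r in lab_a + lab_b]
```

After normalization, rows from both labs share the same column names and codes, so they can be compared directly. As Mark notes, the mapping stays much smaller when labs agree on names and codes up front.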
Q: How does LabKey Server differ from Electronic Lab Notebook software?
Mark: ELNs are based on the traditional lab notebook paradigm: a place to describe and store information about each experiment. LabKey Server is a tool for loading data and descriptive metadata in a structured way so you can compare and analyze across large volumes of data using the power of a database.
LabKey Server can be used like an ELN. For example, you can create data structures for specific experiment types, like a chemistry assay, then load data files from individual experiments, adding annotations about specific parameters for each experiment. LabKey Server can then read the contents of the data files, perform transformations and visualizations across experiments, and populate the underlying database with the transformed data. It can also compare the quality of results from different experiments and show you any trends in quality due to differences in reagents or other conditions, as in the example below.
Q: So, researchers can write custom scripts for data analysis and other steps in LabKey Server. Does LabKey Server work like a code repository?
Mark: LabKey Server isn’t a code repository per se. For example, it doesn’t have a built-in versioning system for code. It does audit changes in configuration and security. And it can show you what code was used to run an analysis. Anyone who is writing code to implement on a LabKey Server should follow best practices for code versioning and use a code versioning application that is external to the LabKey Server.