The Big Picture of Data Management in an Electronic Lab Notebook (ELN)

by Jan Cheetham

The ELN is designed to be a single place to hold the different types of digital data you produce in your research. Now that UW-Madison researchers have unlimited storage on the LabArchives platform, there are no physical limits on the amount of data you include in your ELN notebook. It may be time to take a holistic view of how to manage all those data files in the ELN.

To get a big picture view, it helps to consider the different ways to add data and information to a LabArchives notebook and the options for getting it out again, as a backup copy or to archive it for the long haul. In an ELN system, a notebook consists of essentially three different types of data: notebook pages, attached files, and linked files.

Getting data in

This schematic illustrates the different ways to add data to a notebook: manually uploading it to an attachment entry; typing, pasting, or dragging it into a rich text entry; or using an automated workflow to “push” it from instruments or software into an attachment entry (blue arrows). Once a data file is attached to an entry, LabArchives keeps track of the file name and versions, including times and dates of changes.

Linking to data

If your data files need to live on an external server outside of the ELN, you can create a link to them from inside a notebook entry. However, if the files move, this link will break.

Getting data out

The green arrows in the diagram represent the export process, which is manual. LabArchives has two types of export packages: Offline Notebook, a zipped folder that includes notebook pages in HTML format plus attached data files (the most recent version only), and Print to PDF, a single PDF file of the pages that includes only the names and icons of attached files. Neither option contains a copy of linked files that are stored outside the ELN; only the path to the external files is recorded in the ELN.

Backing up and archiving notebooks require some planning. There are several factors to consider:

  1. How often will you need to back up your notebook? Which type(s) of export package(s) will be most useful for this?
  2. Where will you store these export package files? Will you also print out paper copies and retain them? Is there a digital repository or data archiving platform you can keep them in for the long term?
  3. Will you keep a backup copy of attached data files outside of the ELN? If yes, you may find that the PDF format is sufficient for periodic backups of your notebook. If no, you may want to have a regular schedule for exporting the Offline Notebook as a backup of both notebook pages and attached data files.
  4. Are the external data files you link to from the ELN in a permanent location? If not, consider moving them into the ELN or storing them in a more permanent location.
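If you settle on a regular schedule for exporting the Offline Notebook, it can help to confirm that each downloaded zip file is readable and contains what you expect before you file it away. The sketch below is one possible way to do that in Python. It assumes nothing about how LabArchives arranges files inside the export; it only tests the archive for corruption and counts files by extension. The function name summarize_export is hypothetical.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_export(zip_path: str) -> Counter:
    """Check a zipped export for corruption and count its files by extension."""
    counts = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        # testzip() returns the name of the first corrupt member, or None.
        bad_member = archive.testzip()
        if bad_member is not None:
            raise ValueError(f"corrupt member in export: {bad_member}")
        for info in archive.infolist():
            if info.is_dir():
                continue
            suffix = PurePosixPath(info.filename).suffix.lower() or "(none)"
            counts[suffix] += 1
    return counts
```

Comparing the counts from one backup to the next (for example, the number of .html pages) gives a quick signal that an export is complete.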

For more information

ELN Archiving

Scenarios to help you create an archival and backup plan.

Archiving Electronic Lab Notebooks

Components of an ELN notebook and what information in each needs to be archived.

Manage Your Data with LabArchives

More details about export packages as well as data management tips for sharing, organizing, and documenting data inside the ELN.

Data Storage and Backup

Storage and backup resources at UW-Madison for exported ELN files.

Digital Archiving Platforms

A sampling of digital data repositories that may be useful for archiving ELNs.

Tools: OnCore and REDCap

by Jan Cheetham

REDCap (Research Electronic Data Capture) and OnCore (Online Collaborative Environment) are clinical data management tools supported by the UW Institute for Clinical and Translational Research. See table for a comparison of the features of the two tools.

These systems are used by researchers who conduct clinical trials in the School of Medicine and Public Health and in other units. OnCore is required for some types of clinical protocols. The two systems are designed for use with clinical research data, including identifiable information about subjects. In both systems, data is entered in forms. OnCore provides standard forms for managing clinical trials that can be customized by ICTR staff. REDCap users create their own forms, and these can be used to collect survey data. Supporting files and documents in various formats can also be uploaded to both systems.

Security

Both the OnCore and REDCap systems are HIPAA compliant, employing secure networks, architectures, and appliances such as firewalls, routers, and gateways for routing data. Data in the systems are encrypted and all actions are tracked and audited. In addition, access to data centers is restricted to authorized personnel only.

Sharing

Both systems allow access by researchers at multiple study sites. Access rights can be specified for each user, to limit access to personal health information fields to specific individuals, or to allow only some users to enter data and others to electronically sign/verify and lock records.

Tracking Changes/Versions

The audit trail for OnCore is not complete for all consoles within the system. The EDC system does have a decent audit trail for data entered into forms, and the trail for each individual subject can be printed. However, there is no summary export of the full audit trail for all subjects.

Data Documentation

Both systems allow upload of supporting documents describing the data and collection methods, such as data dictionaries, code books, and protocols. OnCore has a general location to upload Protocol documents but, unlike REDCap, it does not support importing data.

Data Quality Controls

In both systems, forms can include several measures that enhance the accuracy/validity of data entered in forms. These include field notes describing allowed data values and field validation settings that limit data entry to specified ranges of values. Data quality rules can also be applied to search for missing values and empty fields in forms. In addition, data records can be verified and locked.
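To make the idea of range validation and missing-value rules concrete, here is a minimal Python sketch of the same kind of check applied to a single record. It is a generic illustration only and does not use the REDCap or OnCore interfaces; the field names and allowed ranges are hypothetical.

```python
def check_record(record, rules):
    """Return a list of problems found in one record.

    rules maps a field name to an inclusive (low, high) range.
    A field that is absent, None, or an empty string counts as missing.
    """
    problems = []
    for field, (low, high) in rules.items():
        value = record.get(field)
        if value is None or value == "":
            problems.append(f"{field}: missing")
        elif not (low <= value <= high):
            problems.append(f"{field}: {value} outside {low}-{high}")
    return problems


# Hypothetical rules for two numeric fields.
rules = {"age": (18, 90), "systolic_bp": (70, 250)}
print(check_record({"age": 17, "systolic_bp": None}, rules))
```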

Exporting

Both systems allow export of data in a variety of formats for use in statistical software, such as Excel, SAS, SPSS, and others.

Information updated 8/2/17 by Allan Barclay with assistance from ICTR.

Manage Your Data with LabArchives

By Jan Cheetham, Research and Instructional Technologies Consultant, DoIT

LabArchives is an ELN (Electronic Lab Notebook) that provides data storage, data documentation, collaboration, and export features. Like traditional paper lab notebooks, an ELN can serve as a continuous and complete record of the research process.

Basics

Collaboration and Sharing

LabArchives provides flexible permissions and roles for lab members and their collaborators. It is recommended that PIs assume the Owner role in all their lab’s notebooks, in alignment with UW-Madison’s Policy on Data Stewardship, Access, and Retention and to ensure that no data is lost when lab members graduate or leave the university.

There are several approaches for organizing notebooks and managing edit/read rights of individuals. Permissions can be set at the level of the notebook, page, or entry. It is also possible for individuals in the Owner or Admin role to share notebooks, pages, and entries with collaborators outside the university. Although LabArchives has a method for creating Digital Object Identifiers (DOIs) for notebooks, this requires making the notebook publicly available. The UW-Madison LabArchives site currently has the public sharing feature turned off as a security measure to prevent inadvertent sharing of notebooks.

The ELN provides a timestamp and record of every user action, creating an electronic record of who added or edited an entry and when. In addition, nothing can be permanently deleted from the ELN. (LabArchives allows you to move a notebook, page, or entry to a Delete Bin; however, these items are not actually deleted and can be recovered at any time.)

Organizing and Documenting

The ability to blend digital data with the human readable narrative of the research process is one of the main advantages of an ELN over other file sharing/storage services or hybrid paper/electronic systems. LabArchives has a number of different entry types for entering data and recording the narratives. Below are a few suggestions that will help ensure that the information you enter in LabArchives can be readily retrieved.

Naming conventions
LabArchives currently does not offer a way to browse through folders or pages chronologically. Therefore, you may want to use file-naming conventions for pages (and possibly folders). Names should contain a project name, date, experiment identifier, etc. For more specific suggestions, see naming conventions in an ELN. It is also a good idea to use similar naming conventions for files you attach or link to in the ELN, to make it easier to trace through versions and locate those with transformations.
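One way to keep names consistent is to generate them rather than type them. The following Python sketch builds a name from a project, a date, and an experiment identifier. The pattern, the separators, and the function name page_name are assumptions for illustration, not a LabArchives requirement. Writing the date in year-month-day order means that names within a project sort chronologically when sorted alphabetically.

```python
import re
from datetime import date


def page_name(project: str, when: date, experiment_id: str, description: str = "") -> str:
    """Build a name of the form Project_YYYY-MM-DD_ExperimentID_description."""

    def clean(text: str) -> str:
        # Replace each run of characters other than letters and digits with one hyphen.
        return re.sub(r"[^A-Za-z0-9]+", "-", text).strip("-")

    parts = [clean(project), when.isoformat(), clean(experiment_id)]
    if description:
        parts.append(clean(description))
    return "_".join(parts)


print(page_name("Yeast Stress", date(2017, 8, 2), "EXP012", "heat shock assay"))
# prints Yeast-Stress_2017-08-02_EXP012_heat-shock-assay
```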

Documenting attached files
In LabArchives, you upload and attach a single data file to an attachment entry on a page. The file can be of any type and up to 250 MB in size. The entry will display the name of the attached file, and you can also enter a description with detailed information (metadata) about the file. When you upload a new version of the file to the same entry, LabArchives retains all prior versions and lets you revert to older versions through the entry’s revision history. However, as noted below, only the most recent version is included in HTML export. Therefore, to ensure that all data files that you or someone else would need to reproduce your findings are archived both inside the ELN and in HTML exports, be sure to create a separate attachment entry for each essential file that needs to be retained in its original, unaltered form. Then, new versions of the data file (in which the original data are cleaned, transformed, analyzed, visualized, etc.) should be added to the ELN as one or more new entries.

Documenting linked files
When data files are too big (>250 MB) or too numerous to attach to the ELN, you can create links to them from within a rich text entry. However, LabArchives does not check links or verify locations, so you will need to ensure the files are in a secure and permanent location. It is also a good practice to record the name of the file and its location directly in the rich text entry since the URL you add when you create a link is not directly visible in the entry.
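Before uploading a large batch of files, you may want to sort them into those that can be attached and those that will need to be linked. The Python sketch below does this by file size. It treats the 250 MB limit as 250,000,000 bytes, which is an assumption chosen to be on the safe side; the function names are hypothetical.

```python
from pathlib import Path

# Assumption: "250 MB" is interpreted conservatively as 250,000,000 bytes.
MAX_ATTACHMENT_BYTES = 250 * 1000 * 1000


def plan_entries(sizes):
    """Split files into those small enough to attach and those that must be linked.

    sizes maps a file name to its size in bytes.
    """
    plan = {"attach": [], "link": []}
    for name, size in sorted(sizes.items()):
        key = "attach" if size <= MAX_ATTACHMENT_BYTES else "link"
        plan[key].append(name)
    return plan


def folder_sizes(folder):
    """Collect the size in bytes of every file under a folder."""
    return {str(p): p.stat().st_size for p in Path(folder).rglob("*") if p.is_file()}
```

Calling plan_entries(folder_sizes("path/to/data")) on a data folder gives a list of files to record as links, each of which should then be documented by name and location in the rich text entry.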

Exporting and Archiving

LabArchives has two export formats, PDF and HTML. The PDF version is similar to a scanned paper notebook page. The HTML version lacks some of the appearance of the notebook but contains more complete information, including attached files. As with any digital platform you use for your research data, you will want to have a backup and archival plan. This should take into account how often you make changes to the notebook and include methods for retaining duplicate copies of important data files in alternate locations.

PDF
PDFs can be created for a single entry, a page, or an entire notebook. PDFs include: text entries, thumbnails of images and widgets, annotations and descriptions of attachments, user name and time/date stamps. They do not include: attached files, version history of attachments, or comments. URLs of links in rich text entries may be retrievable, depending on the application you use to read the PDF.

HTML
The HTML option exports an entire notebook. Each page in the notebook is a separate HTML file and the most recent version of each attached file is also included. This export option also does not include version history of attachments or comments. Again, URLs that you add to create links in rich text entries may be retrievable, depending on the browser you use to read the HTML pages.
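If you want a record of link URLs that does not depend on the browser, one option is to pull them out of the exported HTML pages with a short script. The Python sketch below uses the standard library HTML parser. It assumes that links appear in the exported pages as ordinary anchor tags with an href attribute, which has not been verified against an actual LabArchives export.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (link text, URL) pairs from anchor tags that have an href."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Only keep text that falls inside an open link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html_text):
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links
```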

Do you have additional questions or concerns about electronic lab notebooks? Contact us.

Tools: Archiving Electronic Lab Notebooks

Electronic Lab Notebooks are becoming important data management tools for researchers in a number of fields. Since ELNs are replacing paper lab notebooks in many labs, can we anticipate a future in which boxes and shelves of decades-old notebooks are replaced with a digital archive of ELN entries? Since ELNs are relative newcomers to the data management ecosystem, some basic discussion about what an ELN archive should contain seems relevant.

There are four general types of data “assets” that can be recorded in an ELN and each has a separate set of considerations for archiving.

1. Notebook pages/entries and folders

In ELNs, pages and entries are containers in which text, symbols, equations, and other entities are entered using tools in the ELN interface. ELN pages/entries may be further organized within folders in the notebook.

What needs to be preserved?

All the information entered in ELN notebook fields, including tags and comments. In addition, the organizational structure of the page and hierarchical structure of folders and subfolders needs to be preserved. Therefore, an export package should include notebook page files in formats such as XML, HTML, or PDF that preserve the content, appearance, and layout of notebook pages and folders. It should also retain the naming schemas and folder hierarchies within the notebook.

2. Attached data files

These are data files and documents that were not created in the ELN interface but uploaded to the ELN platform and attached to an ELN entry. These can include things like images, spreadsheets, and data files from lab instruments. ELN platforms generally allow the user to add annotations and comments and associate them with these data files.

What needs to be preserved?

All the data files in their original, native formats plus any annotations added in the ELN interface. Annotations and comments should be preserved as either separate files linked to the data files or as components of page/entry files in the export package, rather than altering the data files themselves. If multiple versions of individual files were attached to an ELN entry/page, metadata about the versions, including dates, should also be preserved.

3. Linked data files

These are files and documents that are linked to an ELN entry but reside on other systems such as lab or department servers.

What needs to be preserved?

Although linked files are located external to the ELN platform, an archive of all the data associated with a notebook should include a record of the server address of the linked file plus evidence of whether the server location is still accurate for the file at the time of archiving. One mechanism to assure that the file associated with an ELN entry is valid is to generate a checksum using common algorithms like MD5 or SHA-1 that would be stored with the file location. Ideally, the ELN platform would manage this checksum generation. In addition, it would be beneficial for the ELN platform to perform periodic link checking even before archiving is done to assure the continued presence of the remote file.
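Until ELN platforms manage checksums themselves, a researcher can generate and store them by hand. The Python sketch below computes a checksum with the standard hashlib module and pairs it with the file location and the time of the check. The record layout and the function names are hypothetical. SHA-1 is the default here only because it is one of the algorithms named above; passing "md5" or "sha256" works the same way.

```python
import hashlib
from datetime import datetime, timezone


def file_checksum(path, algorithm="sha1", chunk_size=1 << 20):
    """Compute a hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def link_record(location, local_path, algorithm="sha1"):
    """Build a record to store alongside a link in the notebook entry."""
    return {
        "location": location,
        "algorithm": algorithm,
        "checksum": file_checksum(local_path, algorithm),
        "checked_at": datetime.now(timezone.utc).isoformat(),
    }


def still_matches(record, local_path):
    """Return True if the file at local_path has the checksum stored in the record."""
    return file_checksum(local_path, record["algorithm"]) == record["checksum"]
```

Running still_matches at archiving time gives the evidence described above that the file at the recorded location is still the one the entry refers to.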

4. Metadata

This is information about the provenance of an ELN page/entry and includes things such as date and time, name of the individual creating/editing, the version history of attached data files, etc.

What needs to be preserved?

Provenance information that is viewable in the ELN interface should be included in archives of the ELN pages. More detailed metadata is contained in log files collected on the database and application servers of the ELN platform and some components of this information that provide evidence of user access and actions may also need to be preserved in an ELN archive.

Tools: Interview with Mark Igra, LabKey Server

Earlier this year, I spoke with Mark Igra, a partner at LabKey Software, and learned more about how LabKey Server works and how it’s used by researchers.

Q: What types of research is LabKey Server suited for?

Mark: LabKey Server helps teams of scientists bring together many different kinds of information from different sources for integrated analysis, secure sharing and collaboration.

Analysis across large datasets is a common need in fields that generate large volumes of data from high throughput techniques, such as proteomics and genomics. But geneticists also use LabKey Server to store phenotypes. Microscopists use it to document and point to high-resolution images. And clinical researchers use it to track diagnostic and other clinical data. Scientists across many fields of biomedical research face common challenges in managing, integrating and securely sharing their data.

An overview of the kinds of data, analyses, and collaborations LabKey Server supports.

LabKey Server helps scientists integrate, analyze, and share many different kinds of research information through a secure web portal. Collaborators can only view the data they have permissions to see.

Q: What types of data files can be used for analysis in LabKey Server?

Mark: A wide range of data file formats, particularly tabular formats such as MS Excel spreadsheets.

Q: What’s involved in running an analysis on data files in LabKey Server?

Mark: LabKey Server provides web-based tools for analyzing and visualizing data. For example, you can use interactive data grids to filter, sort, and join tabular data from multiple experiments. You can also write R scripts to run analyses within LabKey Server and conduct SAS or SQL queries on data you are authorized to view.

Interactive data grids support sorting, filtering, adding/removing columns, data export, and a variety of visualization and analysis options, such as R scripting. This image shows a data grid being filtered by treatment group. A live view of this grid: http://goo.gl/oTQXbP

Screenshots showing the R script, the grid containing data used in the R analysis, and the plotted results of the analysis.

LabKey Server’s built-in interface for R scripting helps users create and share R-based analyses and visualizations through the web-based portal. Users with sufficient security credentials can explore alternative analyses by editing existing scripts and saving private copies. As shown here, source data, scripts and script results (“views”) are displayed on separate tabs. A live view: http://goo.gl/2MnzIM

A screenshot of a Chart Wizard in LabKey Server showing how data types are filtered with checkboxes.

Chart wizards make it easy to produce interactive plots of results. This time-based chart shows progression relative to baseline for several cohorts. The checkboxes on the right allow users to filter the data displayed. A live version of this chart: http://goo.gl/23aukH

Q: How does analysis across spreadsheet data from multiple experiments or multiple labs work? What if each lab or experiment had a different way of naming columns or coding data values in spreadsheets?

Mark: It’s relatively easy to remap column names or set up aliases when you import each spreadsheet into LabKey Server. Inconsistent data types are a bigger problem. If the data values come from lookups, there are a few ways you can fix that by writing a script in LabKey Server. However, analysis across spreadsheet data always works best when spreadsheet data are simple and coded in consistent ways. When we help research groups set up their LabKey Servers, we usually help them define consistent ways of coding data and variable names.
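The general idea of remapping column names through aliases can be illustrated outside of LabKey Server. The Python sketch below is not LabKey Server code and does not use its import tools; it simply shows how an alias table can bring spreadsheets from two labs onto the same column names. The alias table and column names are hypothetical, and the sketch assumes each row has the same number of values as the header.

```python
import csv
import io

# Hypothetical aliases: each lab's column name maps to one shared name.
ALIASES = {
    "subject": "participant_id",
    "ptid": "participant_id",
    "trt": "treatment",
    "treatment group": "treatment",
}


def normalize_rows(csv_text, aliases=ALIASES):
    """Read CSV text and rename columns to shared names (case-insensitive)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        renamed = {}
        for column, value in row.items():
            key = column.strip().lower()
            renamed[aliases.get(key, key)] = (value or "").strip()
        rows.append(renamed)
    return rows


lab_a = "Subject,TRT\nS1,placebo\n"
lab_b = "ptid,Treatment Group\nS2,drug\n"
combined = normalize_rows(lab_a) + normalize_rows(lab_b)
print(combined)
```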

Screenshot showing how values for data from pre-defined vocabularies are selected during data input.

To ensure standardized data entry, administrators can configure table fields as lookups to pre-defined lists of vocabulary. Users must then pick from a predefined list of terms when entering data in this field. The screenshot shows an example of how a field is configured as a lookup with a default value.

Q: How does LabKey Server differ from Electronic Lab Notebook software?

Mark: ELNs are based on the traditional lab notebook paradigm: a place to describe and store information about each experiment. LabKey Server is a tool for loading data and descriptive metadata in a structured way so you can compare and analyze across large volumes of data using the power of a database.

LabKey Server can be used like an ELN. For example, you can create data structures for specific experiment types, like a chemistry assay, then load data files from individual experiments, adding annotations about specific parameters for each experiment. LabKey Server then can read contents of the data files, perform transformations and visualizations across experiments, and populate the underlying database with the transformed data. It can also compare the quality of results from different experiments and show you any trends in quality due to differences in reagents or other conditions, as in the example below.

A screenshot showing how data quality from 10 experimental runs of the same assay is visualized in LabKey Server.

LabKey Server can help with experimental quality control by visualizing the progression of quality metrics over time. This figure shows a Levey-Jennings plot for a quality metric for a Luminex assay across 10 experimental runs, enabling early detection of problematic trends and outliers. A live version of this chart: http://goo.gl/f6I6nV
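A Levey-Jennings plot draws each run's value against a center line at the mean and control lines at multiples of the standard deviation. The Python sketch below shows the underlying arithmetic, independent of LabKey Server: it derives the mean and standard deviation from a set of baseline runs and flags later runs that fall outside a chosen number of standard deviations. The numbers and the two-standard-deviation threshold are illustrative assumptions, not values from the figure.

```python
from statistics import mean, stdev


def control_limits(baseline):
    """Return the mean and sample standard deviation of baseline runs."""
    return mean(baseline), stdev(baseline)


def flag_runs(values, center, sd, n_sd=2.0):
    """Return 1-based run numbers whose value is more than n_sd deviations from center."""
    return [i for i, v in enumerate(values, start=1) if abs(v - center) > n_sd * sd]


baseline = [100, 102, 98, 101, 99]   # made-up quality metric values
center, sd = control_limits(baseline)
print(flag_runs([100, 101, 104, 97, 99], center, sd))
# prints [3]
```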

Q: So, researchers can write custom scripts for data analysis and other steps in LabKey Server. Does LabKey Server work like a code repository?

Mark: LabKey Server isn’t a code repository per se. For example, it doesn’t have a built-in versioning system for code. It does audit changes in configuration and security. And it can show you what code was used to run an analysis. Anyone who is writing code to implement on a LabKey Server should follow best practices for code versioning and use a code versioning application that is external to the LabKey Server.

Tools: Google Takeout

UW-Madison has implemented a utility for exporting your files out of your UW Google Drive account (as well as YouTube and Google Contacts) in one step. This is useful for archiving files in your account if you are leaving the University or if you want a copy of the files to place in another location. Takeout doesn’t delete the files; it creates a copy, so if you need to delete them, you will need to do that directly in Google Drive.

See Exporting Data Using Google Takeout for instructions on how to do this.

Google Takeout creates a zipped folder named <yourNetID>@wisc.edu-takeout.zip, which you can download from the browser.