Tools: Archiving Electronic Lab Notebooks

Electronic lab notebooks (ELNs) are becoming important data management tools for researchers in a number of fields. With ELNs replacing paper lab notebooks in many labs, can we anticipate a future in which boxes and shelves of decades-old notebooks are replaced by a digital archive of ELN entries? Because ELNs are relative newcomers to the data management ecosystem, some basic discussion of what an ELN archive should contain seems relevant.

There are four general types of data “assets” that can be recorded in an ELN, and each has a separate set of considerations for archiving.

1. Notebook pages/entries and folders

In ELNs, pages and entries are containers in which text, symbols, equations, and other entities are entered using tools in the ELN interface. ELN pages/entries may be further organized within folders in the notebook.

What needs to be preserved?

All the information entered in ELN notebook fields, including tags and comments. In addition, the organizational structure of each page and the hierarchical structure of folders and subfolders need to be preserved. An export package should therefore include notebook page files in formats such as XML, HTML, or PDF that preserve the content, appearance, and layout of notebook pages and folders. It should also retain the naming schemas and folder hierarchies used within the notebook.
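One way to check that an export package kept the folder hierarchy is to list every folder and page file relative to the export root and compare that list with the notebook. The sketch below is only an illustration: the function name `build_manifest` and the idea that the export is an ordinary directory tree on disk are assumptions, not features of any particular ELN platform.

```python
from pathlib import Path


def build_manifest(export_root):
    """List every folder and file under export_root, relative to the root.

    Folders are marked with a trailing slash so that empty folders in the
    notebook hierarchy are still visible in the manifest.
    """
    root = Path(export_root)
    entries = []
    for path in root.rglob("*"):
        relative = path.relative_to(root).as_posix()
        entries.append(relative + "/" if path.is_dir() else relative)
    return sorted(entries)
```

Saving such a manifest alongside the export gives a simple record of the naming schemas and folder structure at the time of archiving.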

2. Attached data files

These are data files and documents that were not created in the ELN interface but uploaded to the ELN platform and attached to an ELN entry. These can include things like images, spreadsheets, and data files from lab instruments. ELN platforms generally allow the user to add annotations and comments and associate them with these data files.

What needs to be preserved?

All the data files in their original, native formats, plus any annotations added in the ELN interface. Annotations and comments should be preserved either as separate files linked to the data files or as components of page/entry files in the export package, rather than by altering the data files themselves. If multiple versions of a file were attached to an ELN entry/page, metadata about the versions, including dates, should also be preserved.
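The "separate file linked to the data file" option could look something like the following sketch, which writes annotations and version metadata to a JSON file next to the data file and leaves the data file untouched. The function name, the `.annotations.json` suffix, and the field names are all hypothetical; a real ELN export may use a different layout.

```python
import json
from pathlib import Path


def write_annotation_sidecar(data_file, annotations, versions):
    """Write annotations and version metadata to a JSON file beside data_file.

    The data file itself is never opened for writing, so its bytes stay
    exactly as they were uploaded.
    """
    data_path = Path(data_file)
    # Hypothetical naming convention: <original name>.annotations.json
    sidecar_path = data_path.with_name(data_path.name + ".annotations.json")
    payload = {
        "data_file": data_path.name,  # link back to the native-format file
        "annotations": annotations,   # e.g. list of {"author", "text", "date"}
        "versions": versions,         # e.g. list of {"version", "date"}
    }
    sidecar_path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return sidecar_path
```

Because the sidecar records the data file's name, the two can be matched up again after the package is moved or unpacked.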

3. Linked data files

These are files and documents that are linked to an ELN entry but reside on other systems such as lab or department servers.

What needs to be preserved?

Although linked files are located outside the ELN platform, an archive of all the data associated with a notebook should include a record of the server address of each linked file, plus evidence of whether that address is still accurate at the time of archiving. One mechanism for confirming that the file associated with an ELN entry is still the same file is to generate a checksum, using a common algorithm such as MD5 or SHA-1, and store it with the file location. Ideally, the ELN platform would manage this checksum generation. It would also be beneficial for the ELN platform to perform periodic link checking, even before archiving is done, to confirm the continued presence of the remote file.
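As a rough illustration of storing a checksum with the file location and re-checking it later, the sketch below uses Python's standard `hashlib` module. It assumes the linked file is reachable as a path on a mounted file system; the function names and record fields are hypothetical, and SHA-256 is used as the default here by assumption (passing `"md5"` or `"sha1"` selects the algorithms named above).

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def file_checksum(path, algorithm="sha256", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_link_record(location, algorithm="sha256"):
    """Record where a linked file lives and, if reachable, its checksum."""
    path = Path(location)
    reachable = path.is_file()
    return {
        "location": str(location),
        "reachable": reachable,
        "algorithm": algorithm,
        "checksum": file_checksum(path, algorithm) if reachable else None,
        "checked_at": datetime.now(timezone.utc).isoformat(),
    }


def verify_link_record(record):
    """Return True only if the file still exists and its checksum matches."""
    path = Path(record["location"])
    if not path.is_file():
        return False
    return file_checksum(path, record["algorithm"]) == record["checksum"]
```

Running `verify_link_record` on a schedule would be one simple form of the periodic link checking described above: a `False` result means the file has moved, disappeared, or changed since the record was made.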

4. Metadata

This is information about the provenance of an ELN page/entry and includes things such as date and time, name of the individual creating/editing, the version history of attached data files, etc.

What needs to be preserved?

Provenance information that is viewable in the ELN interface should be included in archives of the ELN pages. More detailed metadata is contained in log files collected on the database and application servers of the ELN platform. Some components of this information, particularly those that provide evidence of user access and actions, may also need to be preserved in an ELN archive.