Version Control for Research Projects

Working on individual or collaborative projects of any size requires keeping your files organized. Inconsistent file management can result in lost work, redundancies, errors in the final products, or difficulty for others building on your work later on. One of the foundational practices for keeping your files organized is to use version control conventions and tools. Choosing the appropriate tool depends on many factors, including the types of files and data you are using and producing, the size of your team, and how frequently your files change.

When considering which platform to use for maintaining your research files, it is important to understand your responsibilities for identifying, transmitting, redistributing, storing, or disposing of sensitive information. For more information, refer to UW-Madison’s guide to handling sensitive data and the UW System policy.

Consistent File Naming Conventions

For projects of any size, forming some simple habits can keep your files and edits in order. At the most basic level, establishing a clear and consistent file naming system helps you identify files and understand the relationships between them. Naming files “final_draft” and then “final_final_draft” gets confusing as you make additional edits, and can be mystifying for collaborators or for anyone who wants to access the files at a later date.

Instead, as Stanford Libraries suggests, consistent versioning conventions like “v1, v2, v3” are an extensible and transparent way to keep your versions straight. In addition, if a given file includes major changes, indicating that in the file name with a brief descriptive word or phrase can help you locate significant versions later in the research process.
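The “v1, v2, v3” convention can also be applied programmatically. The following is a minimal Python sketch, not a prescribed method: the naming pattern (stem, then “_v” and a number, then an optional short label), the function name next_version_name, and the sample file names are all assumptions made for illustration.

```python
import re

# Assumed naming pattern for this sketch: <stem>_v<number>[_<label>]<extension>
VERSION_PATTERN = re.compile(
    r"^(?P<stem>.+)_v(?P<num>\d+)(?:_(?P<label>[A-Za-z0-9-]+))?(?P<ext>\.[^.]+)$"
)


def next_version_name(existing_names, stem, ext, label=None):
    """Return the next file name in the v1, v2, v3 sequence for a given stem."""
    highest = 0
    for name in existing_names:
        match = VERSION_PATTERN.match(name)
        # Only count files that share the same stem and extension.
        if match and match.group("stem") == stem and match.group("ext") == ext:
            highest = max(highest, int(match.group("num")))
    # An optional label marks a version that contains major changes.
    suffix = f"_{label}" if label else ""
    return f"{stem}_v{highest + 1}{suffix}{ext}"


# Hypothetical file names, used only to demonstrate the convention.
files = ["survey_analysis_v1.docx", "survey_analysis_v2.docx", "budget_v1.xlsx"]
print(next_version_name(files, "survey_analysis", ".docx"))
print(next_version_name(files, "survey_analysis", ".docx", label="new-methods"))
```

With the sample list above, the two calls produce survey_analysis_v3.docx and survey_analysis_v3_new-methods.docx. Keeping the label short and free of spaces makes the names easier to sort and search.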

Version Control and Collaboration

One of the primary benefits of version control is that it helps you track changes to files as multiple people edit them. Having different researchers contribute to a shared collection of files is a common work model, and in practice everyone introduces changes to the files on overlapping schedules and with different ideas of how to document them. In these scenarios, establishing well-defined version control guidelines or relying on tools to maintain version control is a huge benefit.

Tools for Version Control

There are many tools for aiding researchers with version control. These tools vary from simpler automated backup systems to platforms with customizable backup and versioning capabilities. 

Cloud Services:

UW-Madison provides cloud services such as Google Drive, Box, and Microsoft Office 365. These services have many benefits, such as:

  • ease of use
  • automatic backup and versioning
  • flexible sharing options
  • seamless remote access
  • built-in applications such as word processors and spreadsheets

Drawbacks to these services include: 

  • dependency on their built-in applications
  • limited software capabilities

They also may not be the best option in cases where it’s necessary to have a highly formalized version control system in place. If your project requires a customized backup schedule and detailed documentation of changes made to your files, Git may be your best option.

Git and GitHub:

Git is a distributed version control system (DVCS) that allows users to introduce new edits and versions to files in collaboration with others. It also lets you document the different versions and edits that collaborators create, whether simultaneously or over time.

GitHub is a widely used hosting platform for Git repositories that supports strict version control, sharing, and documentation of your work for the future. It provides a stable and shareable home for your project, its documentation, and its files. Git and GitHub are used by enterprise development teams and researchers alike, and though the details of how they work are complex, learning the basics and incorporating them into your workflow is relatively simple.
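To give a sense of those basics, here is a small Python sketch that drives the Git command line to record two versions of a file and then list the documented changes. It assumes Git is installed and is meant only as an illustration: the file name, commit messages, and identity values are placeholders, everything happens in a temporary folder that is deleted afterward, and nothing is sent to GitHub. In everyday use you would typically type the same git commands (init, add, commit, log) directly in a terminal.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def run_git(repo_dir, *args):
    """Run a git command inside repo_dir and return its standard output."""
    result = subprocess.run(
        ["git", *args], cwd=repo_dir, check=True, capture_output=True, text=True
    )
    return result.stdout.strip()


def demo_history():
    """Create a throwaway repository, record two versions, and return the log."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        run_git(repo, "init")
        # Placeholder identity, set only for this throwaway repository.
        run_git(repo, "config", "user.name", "Example Researcher")
        run_git(repo, "config", "user.email", "researcher@example.edu")

        # First version of a hypothetical notes file.
        notes = repo / "methods_notes.txt"
        notes.write_text("Sample size: 40\n")
        run_git(repo, "add", "methods_notes.txt")
        run_git(repo, "commit", "-m", "Add initial methods notes")

        # Second version: edit the file, then record the change with a message.
        notes.write_text("Sample size: 52\n")
        run_git(repo, "add", "methods_notes.txt")
        run_git(repo, "commit", "-m", "Update sample size after second recruitment round")

        # One commit message per line, newest first.
        return run_git(repo, "log", "--format=%s").splitlines()


if shutil.which("git"):
    for message in demo_history():
        print(message)
else:
    print("git is not installed; skipping the demonstration")
```

Each commit pairs a snapshot of the files with a short message, which is what provides the detailed documentation of changes described above.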

Other tools:

There are many other tools for research sharing and collaboration. UW-Madison supports LabArchives, an Electronic Lab Notebook application available to PIs, which includes built-in version control functions. The Open Science Framework is another tool for research collaboration, but it is not supported by UW-Madison, so be sure to read the terms and conditions.

This free online course from The Carpentries can get you started with Git and GitHub, and the RDS team is ready to help you implement consistent version control on any platform in your research projects.