Spreadsheets and Databases as Tools for Data Management

Is the spreadsheet you’re using for research data starting to get unwieldy? 

For research projects of any size it’s important to think about the tools that will best suit your data collection, recording, and analysis needs now and into the future. Spreadsheets and databases are two of the most versatile and widely used tools for managing research data. However, to ensure the long term integrity of your data, there are many scenarios in which the limitations of spreadsheets mean it may be time to transition your research data to a database. 

Spreadsheets:

Spreadsheets, whether in a proprietary software like Microsoft Excel or open source applications like OpenOffice, are some of the most widely used data management tools. This is not by accident: spreadsheets are an easy way to get started storing both qualitative and quantitative data. For larger research projects or those with highly specialized data types and vocabularies, however, spreadsheets have limitations that can jeopardize data integrity and introduce errors into your research process. 

Benefits of Spreadsheets: 

  • They are easy to learn and many people already know many of their functions 
  • They provide flexible and customizable features for organizing and analyzing data
  • Many other applications and coding languages work seamlessly with data stored in spreadsheets
  • Data in spreadsheets can be easily converted into csv or tabular formats that are perfectly suited for long term storage and sharing

Drawbacks of spreadsheets:

  • Limited control over data integrity and format
  • Some proprietary spreadsheet software have limits on the amount of data they can house
  • Most importantly, it’s easy for humans to introduce errors into data through mistakes like accidentally overwriting cells or by recording data in non-normalized structures (e.g. Excel famously mangles dates as cited in this blog post by a data librarian.)

For more information, consider taking a look at this Data Carpentries course on how to use spreadsheets for data management.

Databases: 

Databases are among the most powerful and ubiquitous tools for managing data of all kinds because of their dynamic ability to organize, query, and share the data they contain. There are both proprietary and open source database software that can serve as effective solutions for managing your research data. In addition, data that is already stored in a spreadsheet can, in many cases, be integrated into a database system

Benefits of databases: 

  • Databases provide highly customizable controls over formats, vocabularies, and data types according to which data is entered and stored
  • Easy to enforce quality control by requiring specific formats or values for fields and limiting access to tables
  • There are many Database Management Systems (DBMS) like MySQL Workbench or Microsoft Access that support the design and implementation of databases without extensive coding know how
  • Databases facilitate complex relationships between data elements and support sophisticated queries 
  • Command line controls make it possible to implement detailed customization of database structure 
  • Databases often include customizable interfaces that provide access to data as well as forms for entering new data

Drawbacks of databases: 

  • Learning database languages such as SQL can be labor and time intensive
  • Planning databases often requires extensive planning and may slow the research process down
  • Data stored in proprietary database software can be difficult to convert to other formats

Where to get help: If you are interested in further exploring database solutions you can access database training courses on LinkedIn Learning (formerly Lynda.com) or you can consult with Teach and Research Application Development (TRAD) on your options.