Tools: Colectica for Excel

Many researchers use spreadsheets to collect, analyze and archive their research data, but spreadsheets are notoriously poor data management tools that are subject to common and costly errors. As part of an ongoing series by RDS that examines the use of spreadsheets in research, the following slide deck (RDS_brownbag_20140313) is from a presentation by RDS personnel advocating for best practices for use of spreadsheets in quantitative research. The end of the presentation contains a demonstration of a recently developed tool that documents Excel spreadsheets according to a metadata standard called the Data Documentation Initiative.

Using Spreadsheets for Research Data Management

The RDS group advocates for good data management practices at UW-Madison. Data management is attracting more and more attention in the era of Big Data. One ubiquitous practice among researchers and analysts in academia and business is the use of spreadsheets for data entry, storage, and analysis. Although spreadsheets are widely used in research, there are few guidelines for such use, and spreadsheets pose some troublesome issues from the perspective of documenting and managing research data. The topic has gained considerable traction since spring 2013, when a controversial and widely cited academic paper on government debt and growth was shown to be based on a faulty Excel dataset.

Prompted in part by this and other related events, RDS has recently updated its recommendations on using spreadsheets in research data management. Another great resource to consult before deciding to use spreadsheets for your research is a primer on using Excel for data entry assembled by the UW Social Science Computing Cooperative. Some tools that can potentially improve the documentation of spreadsheet data and analysis are Colectica for Excel and Data Up.

Popular Economic Paper Criticized for Undocumented Errors

A new review of an influential research article on fiscal austerity and GDP finds that the results were tainted in part by an undocumented error in the authors’ Excel dataset. The original research by Carmen Reinhart and Ken Rogoff, titled “Growth in a Time of Debt,” claimed that economic growth slowed quite dramatically for countries whose public debt crossed a threshold of 90% of Gross Domestic Product. Since its publication, this finding has often been cited in stimulus/austerity debates, but many economists were unable to replicate it, in part because of the authors’ reluctance to share their original data.

The authors of the new review were able to obtain the original data and found a number of problems in the analysis, which are well summarized in this blog post. This episode stands as a cautionary tale about proper data management and open access; these issues are finally being recognized as critical to the integrity of science.