Documenting DH: Deidre Stuffer and the Visualizing English Print Project

Written by Heather Wacha

In last month’s RDS newsletter, the Digital Humanities Research Network (DHRN) announced its new project, “Documenting DH,” a series of audio interviews with humanities scholars and students around the University of Wisconsin-Madison campus. Each interviewee is given a chance to talk about what they see as their project’s data and how that data is managed. In November we interviewed our first digital humanist, Deidre Stuffer of the English Department here at the University of Wisconsin-Madison, and her interview will be accessible to the public on the DHRN website starting December 12th, 2016. She has been working on the Visualizing English Print (VEP) project, which involves encoding 61,000 early printed literary texts and finding interesting and significant ways to visualize and analyze all or part of that corpus.

What data is involved in this project?

As both a literary scholar and a computer scientist, Deidre sees her data as messy. After she described its several iterations (XML files, plain text files, metadata for each, as well as a myriad of visualizations of said data), I, as a historian with no computer science background, still had no idea what her raw “data” was. Deidre saw the somewhat puzzled look on my face and immediately clarified: “61,000 early printed texts, 1470-1800.” Since she bridges the gap between humanist and scientist, Deidre easily recognizes when someone is not quite comprehending what she is saying, and can rephrase her explanations depending on whether she is speaking from a humanist perspective or a computer science perspective; a gentle reminder to all of us that one of the greatest benefits of cross-pollination between disciplines is learning to speak in a manner that facilitates the translation of ideas.

How is the data managed?

As our conversation turned toward the management of all this data, Deidre admitted that she does a lot of babysitting of XML and plain-text files in filesystems and spreadsheets hosted on GitHub. GitHub gives the VEP project coordinators access to previous versions of the datasets, so that Deidre and others can make corrections or edits when needed.

What are some of the challenges of working with this data?

One of the challenges of such a project is simply its scope, especially when it comes to figuring out how to survey the data. For example, Deidre talked about when she was given the task of streamlining spellings and had to find the best spelling for the word “tire.” She surveyed all the texts and found that the most frequent spelling of the word was indeed “tire,” as opposed to other variants such as “tyre.” While this may seem logical and obvious at first glance, the comprehensive change meant that the Prince of Tyre in Shakespeare’s Pericles became the Prince of Tire. In the end the Prince’s title remains “Tire,” but the text is able to show that the spelling has been altered to fit standardization decisions.
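For readers curious about what this kind of standardization looks like in practice, here is a minimal sketch (not VEP’s actual code) of the idea Deidre describes: survey a corpus for the frequency of each spelling variant, pick the most common one as the standard, and then rewrite the texts accordingly. The toy corpus and the `standardize` helper are hypothetical, invented purely for illustration.

```python
from collections import Counter
import re

# Hypothetical toy corpus; the real VEP corpus holds 61,000 texts.
corpus = [
    "the tire was flat",
    "a tyre of iron",
    "he did tire of the road",
]

variants = {"tire", "tyre"}  # spelling variants of one word

# Survey the corpus: count how often each variant appears.
counts = Counter(
    tok
    for text in corpus
    for tok in re.findall(r"[a-z]+", text.lower())
    if tok in variants
)
standard = counts.most_common(1)[0][0]  # the most frequent variant wins

# Replace every variant with the chosen standard.
# \b word boundaries keep us from touching words like "entire".
pattern = re.compile(r"\b(" + "|".join(variants) + r")\b", re.IGNORECASE)

def standardize(text):
    def repl(match):
        word = match.group(0)
        # Preserve capitalization, which is how "Prince of Tyre"
        # ends up as "Prince of Tire".
        return standard.capitalize() if word[0].isupper() else standard
    return pattern.sub(repl, text)

print(standard)                                  # -> tire
print(standardize("Pericles, Prince of Tyre"))   # -> Pericles, Prince of Tire
```

As the sketch shows, a purely frequency-based rule has no notion of context, which is exactly how a proper noun like “Tyre” can get swept up in the standardization.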

What are some of the pleasures of working with this data?

Deidre loves to research the metadata associated with her corpus of texts. Just last week she was researching the names of all the authors who figure in these 61,000 texts and found that one of the authors has a descendant named Taylor Swift, the musician. And no, this particular ancestor of Taylor’s was not Jonathan.