The Holz Featured Researcher series invites UW-Madison researchers to discuss their work and illuminate their workflows, insights, and challenges when it comes to working with their research data.
RDS: What are some of the challenges you’ve encountered while working with these types of data?
EH: Yeah, I mean, I’m trying to think, there’s so many. I mean, I’ve already said, I guess, a fair amount about the data model and how it’s been kind of messy and decentralized. So that’s been a challenge. You know, now that we’re trying to do this important and overdue work of diversifying and globalizing the collection, it feels good to do now. But that’s also created challenges too, where the American copyright laws were always very favorable to our work since in most cases, the publishers of these magazines never renewed the copyright. So for the most part, movie magazines published up until 1964 are in the public domain. And you can check that just by going through the Copyright Office records at the Library of Congress. But when you start trying to scan things from Mexico and Brazil, Japan, and Portugal, and France, Germany, there’s just a whole new web of intellectual property laws to try to learn about.
And besides the legal side, there’s also what I think of as being more of the ethical component of this work. As an American-led project, we don’t want to just sort of go out in some kind of imperial-type fashion trying to conquer everything. We really want to be respectful of the work that archives are already doing. Some archives are already scanning magazines from their collections and posting them freely to the web. In some of those cases, the magazines are very old and they are no longer protected by copyright. In those cases, it’s important to me to get the consent of the institution that was doing the work. If they don’t want us to use the files within Media History Digital Library, my attitude is, ‘Let’s respect those wishes.’ There are enough other things we can spend our time on without using their work in a way that they would object to, even if it’s legally permissible.
RDS: What tools or methods have you found helpful? Any that you wish you’d found sooner?
EH: Hmm. Let’s see. I mean, I guess I tend to think about it from the researcher’s standpoint. But I have also thought about it from the standpoint of ‘Where can you get funding to develop tools?’ And I think when I came into this work, it was like, 2011 to 2012. It was kind of a hot moment for the digital humanities, but also where part of that heat and luster was sort of these calls to say ‘We need to go beyond search, search is too simple, we need topic modelling and we need to have the deeper analytics.’ So, it was easier to go after grants to do that kind of work than it was to just say, ‘Hey, I’d like to scan more things. I’d like to build a search engine.’ So, we did some of that. I’m proud of it. We have this data mining app called Arclight that we developed because we were able to get a Digging into Data grant back in 2014-2016. I spent a lot of time working on that and I’m proud of it. My collaborators and I published a book that shared research from the process and also shared reflections from scholars in the field. I’m proud of all that. But when I look at the user statistics, Lantern is used so much more than Arclight. So, in other words, there is far more demand for core functions of search and keyword searching. That’s what people keep coming back to and seem to find most helpful for their research.
One thing that’s been really gratifying about being able to spend time on this ACLS-funded project is that it really is like kind of coming back to the basics, like ‘Let’s make the database better or make the collection better.’ So I’m grateful to the ACLS for creating that competition where instead of asking for projects to submit proposals to do groovy data visualizations, it’s like ‘Let’s take projects that are already there but could be a lot better and try to, you know, find ways to support them to make that leap from good to great.’ So, yeah, that feels good. I hope more granting organizations (I think they are) follow that lead. The NEH is recognizing that it makes a lot of sense to give out sustainability grants and grants to help, again, people improve those core things they’re doing rather than putting all the focus on the new cutting-edge methods. Of course, you need both. I’m not saying ‘Don’t fund data analytics’ or new versions of machine learning that could be applied here. I think there’s a lot of potential and promise. But for a while, it felt like you could get the money to pay for the frosting, but not the cake. And I’m glad we’re past that.
RDS: Do you have any big picture insights or lessons learned about preparing data, developing project ideas from plans to reality, or collaborating on data intensive research?
EH: It takes a lot of work. It takes time. And, you know, I started out with, like, the metaphor of the Venn diagram. But the big difference is that in one of these circles, when I’m writing articles and books, I can just like publish them and walk away. It’s done. And when you’re doing anything related to database and software development and web platform work, you’re never finished. You know, you walk away for too long and the whole thing will just fall apart or you become compromised by malware. So, it takes ongoing work to sustain it. And that work is substantial, and it should be recognized. But I also think that the work is worth it. You know, there’s over 10,000 people, unique users from around the world, who use Lantern every month. Weekly analytics suggest that they’re generally spending quite a bit of time with it; at least 12 or 13 minutes is the average per user session. So, they’re using it and they’re getting things done with it. And when I go to conferences now, like, I see in people’s PowerPoints Moving Picture World from 1921. I had that magazine in my trunk. Sometimes they’ll acknowledge it and then they’ll say, ‘Here’s Moving Picture World. I got this from Media History Digital Library’ and sometimes they won’t, but I still know.
It’s gratifying to see the way it’s become really woven into the research fabric of the field. And people continue to use it. And right now, we’re at a moment in time where you can’t even get to the old microfilm readers. People really are dependent in a big way on online access. I look forward to the archives and libraries reopening, but I’m also glad that both now and in the future, as they’re doing research, they can just go to lantern.mediahist.org and run a query and start researching that way from their living room, sitting in sweatpants. So, my final reflection is that it’s hard work to keep up. But I think it’s work that’s worth doing.
The interview has been lightly edited for content and clarity.