An Interview with the Center for High Throughput Computing (CHTC)

We spoke with Lauren Michael, a Research Computing Facilitator with CHTC, to learn more about the center and how their role supports researchers at UW-Madison.

Can you tell us a little bit about the CHTC? Its history and purposes?

The Center for High Throughput Computing (CHTC) serves as UW-Madison’s core research computing center, leveraging a long history of international contributions to the field of parallel computing as the pioneer of high-throughput computing (HTC) principles. This work has included decades of ongoing development of HTC technologies like the HTCondor software suite and many others that are used by companies, research institutions, and major research collaborations. The CHTC was established in 2006 around this work within the Department of Computer Sciences. It is led by Director Miron Livny, professor of Computer Sciences and Chief Technology Officer for both the Wisconsin Institute for Discovery and the Morgridge Institute for Research (in the Discovery building).


DH Tools Part 1: Off-the-Shelf

You don’t have to learn an entirely new programming language to do cutting-edge digital humanities work. There are many sophisticated, useful off-the-shelf tools that you can use for your research. Many of them are as simple to use as a web browser and can produce thoughtful, well-designed, and interactive research outputs. If your research requires some coding know-how, read our post on tutorials and resources for acquiring programming skills.

As always, be sure to read the Terms of Service and Privacy Policy for any tool you use. Seek to understand how your data will be stored and shared, and whether your final output will be made public. Part of good data management is understanding how your tools handle your data and making responsible choices about which tools you select.


Link Roundup – May 2020

Cameron Cook

John Yin, Professor of Chemical and Biological Engineering at UW-Madison, is using computational methods to understand the material conditions viruses use to reproduce themselves. He is hopeful that such an approach will allow us to interfere with their reproduction and prevent the spread of viruses like COVID-19.

Brian Foo, one of the Library of Congress Innovators in Residence, has released a beta version of “Citizen DJ”. You can create remixes, make music, or download free-to-use audio clips from the Library’s collections to use as part of your projects or as a dataset.

Jennifer Patiño

Led by Innovator in Residence Ben Lee, the Library of Congress is using sophisticated machine learning tools to digitize and organize images from several centuries of American newspapers. The result is “Newspaper Navigator”, a tool for searching a truly massive collection of historical newspaper images.

Stat News provides tips for researchers for staying connected, moving to virtual research, and reusing datasets to ask new questions during COVID-19.

Kent Emerson

Researchers at UW-Madison have been involved in the development of a desktop and mobile app designed to help Wisconsinites navigate the COVID-19 pandemic. The app, called Wisconsin Connect, features discussion rooms, fact checkers, prevention techniques, symptom trackers and much more. The app should be available via the Apple Store and Google Play in May.

The Media History Digital Library project’s search interface, called Lantern, has received some exciting updates to facilitate more precise searching and resource location. Take a minute to browse this huge media history resource.

Link Roundup – April 2020

Cameron Cook

UW-Madison’s American Family Insurance Data Science Institute has posted a collection of COVID-19 resources that demonstrate the contributions data science can make to better understanding the virus. These resources include projections for the virus’ spread and treatment, visualizations, research datasets and code bases, as well as stories about how data scientists are helping the efforts to combat the virus.

Semantic Scholar has made the COVID-19 Open Research Dataset (CORD-19) accessible online for download and analysis. Access to the data provides researchers with an opportunity to apply the newest methods of analysis to the data to aid in understanding the virus. The dataset was prepared through a partnership between leading researchers and the Allen Institute for AI.

Jennifer Patiño

In their series called Chart Chat, Tableau has shared a discussion of COVID-19 data visualizations. It covers the history of pandemic visualizations, different iterations of what flattening the curve might look like, and how to use data responsibly in visualizations.

Open access publisher Frontiers has developed a portal that connects researchers studying the COVID-19 virus to sources of funding. In addition to listing open funding calls, the portal features a dashboard that presents essential information about funding requirements, deadlines, and organizations, which streamlines researchers’ search for funding. You can also find resources for COVID-19 research funding, general funding, and tools to help throughout the award lifecycle from UW-Madison’s Research and Sponsored Programs.

Kent Emerson

To help during the COVID-19-related campus closure, DoIT has shared guides to technology for working remotely and technology for learning remotely.

Tool: Digital Mappa

Written by Martin Foys and Maxwell Gray

What is Digital Mappa?

Digital Mappa 2.0 (DM) is an open-source, collaborative digital humanities platform for public or private workspaces, projects and scholarly publications. The platform software can be installed on a local or cloud server, and collaborators can highlight, annotate and link collections of digital texts and local, online and/or IIIF images through an array of easy-to-use tools. Development of the platform is directed by UW-Madison Professor of English Martin Foys, and a UW instance is run under the UW Center for the History of Print and Digital Culture.

What Can DM Do For You?

The premise of DM is simple: if you have a collection of digital images and texts, then you should be able to develop a project where you can identify specific moments on these images and texts, annotate them as much as you want, link them together, generate searchable text content, collaborate with your colleagues, and publish your work online at a durable URL for others to see and share. DM 2.0 gives you the environment to make this happen, and users can create basic or sophisticated individual or collaborative projects with no coding ability whatsoever. Users can add images to DM projects from local files, remote URLs, or IIIF image protocols. A search feature compiles all annotation text in a project into a searchable resource, and projects can be set to private, group or public access.

DM 2.0 was initially developed for deployment on the Heroku cloud-server platform, where installation and administration are straightforward for server administrators, after which developing individual projects requires no specific IT expertise. DM is now also available for local server installation and administration.


An Introduction to Web Scraping for Research

Like web archiving, web scraping is a process by which you can collect data from websites and save it for further research or preserve it over time. Also like web archiving, web scraping can be done through manual selection or it can involve the automated crawling of web pages using pre-programmed scraping applications.

Unlike web archiving, which is designed to preserve the look and feel of websites, web scraping is mostly used for gathering textual data. Most web scraping tools also allow you to structure the data as you collect it. So, instead of massive unstructured text files, you can transform your scraped data into spreadsheet, CSV, or database formats that allow you to analyze and use it in your research.
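To make the idea of structuring data as you collect it more concrete, here is a minimal sketch in Python that uses only the standard library. The HTML fragment, the class names ("post", "author", "body"), and the names PostParser and scrape_to_csv are all hypothetical, invented for this illustration; a real page will have its own markup, and many researchers will prefer a dedicated scraping tool over hand-written code.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical fragment standing in for a downloaded forum page.
SAMPLE_HTML = """
<ul>
  <li class="post"><span class="author">ada</span><span class="body">First post</span></li>
  <li class="post"><span class="author">grace</span><span class="body">A reply, with a comma</span></li>
</ul>
"""


class PostParser(HTMLParser):
    """Collect the author and body text of each (hypothetical) post element."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one dict per post
        self._field = None  # which span we are currently inside, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "post":
            self.rows.append({"author": "", "body": ""})
        elif tag == "span" and css_class in ("author", "body") and self.rows:
            self._field = css_class

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        if self._field is not None:
            self.rows[-1][self._field] += data.strip()


def scrape_to_csv(html):
    """Turn the unstructured page text into CSV with one row per post."""
    parser = PostParser()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["author", "body"])
    writer.writeheader()
    writer.writerows(parser.rows)
    return out.getvalue()


print(scrape_to_csv(SAMPLE_HTML))
```

The result is a small table with an "author" column and a "body" column that can be opened in a spreadsheet program or loaded into an analysis tool, rather than a block of raw page text.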

There are many applications for web scraping. Companies use it for market and pricing research, weather services use it to track weather information, and real estate companies harvest data on properties. But researchers also use web scraping to study web forums and social media such as Twitter and Facebook, to collect large sets of data or documents published on the web, and to monitor changes to web pages over time. If you are interested in identifying, collecting, and preserving textual data that exists online, there is almost certainly a scraping tool that can fit your research needs.
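One simple way to think about monitoring changes to a page over time is to store a fingerprint of the page text on each visit and compare it with the previous one. The sketch below assumes you have already retrieved the page text by some other means; the function names fingerprint and has_changed and the sample strings are hypothetical, and this is only one possible approach, not a description of how any particular scraping tool works.

```python
import hashlib


def fingerprint(page_text):
    """Return a SHA-256 digest of the page text, ignoring whitespace differences."""
    normalized = " ".join(page_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(previous_fingerprint, page_text):
    """Compare a saved fingerprint against the text from a later visit."""
    return fingerprint(page_text) != previous_fingerprint


# Hypothetical snapshots of the same page on different days.
first_visit = "<p>Office hours: Monday</p>"
reformatted = "<p>Office hours:   Monday</p>\n"
edited = "<p>Office hours: Tuesday</p>"

saved = fingerprint(first_visit)
print(has_changed(saved, reformatted))  # False: only whitespace differs
print(has_changed(saved, edited))       # True: the content was edited
```

A fingerprint only tells you that something changed, not what changed, so you would still want to save a copy of each version you intend to analyze.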

Please be advised that if you are collecting data from web pages, forums, social media, or other web materials for research purposes and it may constitute human subjects research, you must consult with and follow the appropriate UW-Madison Institutional Review Board process as well as follow their guidelines on “Technology & New Media Research”.