April 2016 RDS Brown Bag: Robert A. Haworth

The Rebecca J. Holz series in Research Data Management is a monthly lecture series hosted during the spring and fall academic semesters. Research Data Services invites speakers from a variety of disciplines to talk about their research or involvement with data. 

On April 13, 2016, Robert A. Haworth, Distinguished Scientist Emeritus at the University of Wisconsin – Madison, gave a talk entitled “labElephant: A Metadatabase Application for Managing the Research Endeavor”. You can find the slides on the Research Data Services Speakerdeck page.

Robert Haworth’s talk focused on labElephant, a software application he developed to help labs manage the research and discovery process. Many researchers may be familiar with electronic lab notebooks, which help manage the research data lifecycle. Haworth, however, designed labElephant to fold data management into a larger cycle that also includes the knowledge production process and the housekeeping aspects of the lab environment. labElephant provides an interface through which users can import content from their citation manager, record important information gleaned from papers, conferences, books, and other sources, and then connect those items to synthesize larger ideas or hypotheses. Users can also link these ideas and hypotheses to the related experiments. labElephant is described as a metadatabase because the software does not store any experimental data itself; instead, it links to where the data already lives in the lab’s existing systems. Through labElephant, researchers can track experiments, outcomes, materials, and methods, and link that information back to the idea or hypothesis that originally motivated each experiment. The experiment information and hypotheses can also be produced as a report that serves as a skeleton for a paper or article.

 

March 2016 RDS Brown Bag: Alex Hanna

On March 9, 2016, Alex Hanna, a PhD candidate in the Department of Sociology at the University of Wisconsin – Madison, gave a talk entitled “Data Pipelines and Computational Methods for the Social Sciences”. You can find her slides on the Research Data Services Speakerdeck page.

Alex Hanna’s talk covered three key areas of her work. The first, ‘Twitter and Politics’, concerned Hanna’s work with the Social Media and Democracy research group at UW-Madison. The research group has a Twitter archive that currently contains over 50 billion tweets and continues to download around one percent of all tweets produced each day. With this data, the group can study tweets in relation to political events. For example, they can study how users respond to the sound and physical appearance of the candidates, and then map mentions of candidates onto key speaking points of the debates. Hanna then discussed the hardware and software changes made to the archive as it has grown, and how those changes have enabled the group to process the data more quickly.

The second part of the talk covered protest event data, which forms the crux of Hanna’s dissertation. Protest event data is extracted from information reported in news articles; the work focuses on newspaper articles because of their publishing consistency and their role in the historical record. Collecting event data used to be labor intensive and costly, as articles had to be collected, filtered, coded by hand according to a codebook, and then turned into usable data. Hanna’s dissertation focuses on creating the Machine-learning Protest Event Data System (MPEDS), which improves this process through automation with limited human intervention. This portion of the talk covered the changes to the process, which allow work on a larger scale while also providing better searching and indexing upon insertion.

In the final portion of the talk, Hanna discussed “Computational Social Science Education”. Hanna seeks to help new and veteran scientists expand their computational literacy beyond the traditional social science tools of SPSS and Stata by teaching tools such as R, Python, the command line, and Hadoop. These tools give social scientists new and more flexible ways to conduct their research, such as web scraping, large-scale network analysis, and automated text analysis. Hanna covered the pedagogical approaches, workshop examples, and lessons learned from teaching these tools.

 

February RDS Brown Bag Talk: Jack Williams & Simon Goring

by Cameron Cook

On February 17, 2016, Jack Williams and Simon Goring, researchers from the Department of Geography at the University of Wisconsin-Madison, gave a talk for the Holz series entitled “Community-Supported Data Repositories in Paleoecoinformatics: Building the Middle Tail”. You can find their slides on the Research Data Services Speakerdeck page.

Jack Williams covered the first part of the talk, “Understanding the Data, Framing the Challenge”, which introduced the field of paleoecology, the characteristics of paleoecology data and its data cycle, and the Neotoma Paleoecology Database. Neotoma is one of several community repositories that have developed in response to researchers’ need to access many datasets and combine them into larger networks to answer questions spanning large stretches of time and space. These community repositories share similar features: open data, community curation, standardized taxonomy, and age controls and age models. Jack ended with an important question: “Neotoma is one informatic initiative among many, how best to cross-link and cross-leverage?”

The second part of the talk, “Connecting Users, Data, and Repositories”, was presented by Simon Goring, who gave an overview of how community repositories can bring together two disparate parts of the research process: the journal and the database. Jack and Simon suggested that repositories can help mediate between these pieces by building workflow tools that make data submission an integrated part of the research process, and by acting as a broker between researchers and the myriad platforms and formats available. Simon also covered examples of successfully linking and building on initiatives from Neotoma and EarthCube, as well as other cyberinfrastructure tools created by EarthCube.

December RDS Brown Bag Talk: Jaime Martindale & AJ Wortley

by Cameron Cook

This December, Jaime Martindale, Map & GIS Data Librarian at the Arthur Robinson Map Library, and AJ Wortley of the State Cartographer’s Office, shared their talk “Geospatial Data Preservation & Management”. You can find their archived slides here.

Jaime and AJ’s talk focused on the work they have done over the past year archiving and releasing data through a UW-Madison campus geoportal, as well as the work they will do over the next year to help get a CIC-wide geoportal off the ground. During their talk they explained the two main forms of geospatial data, raster and vector; how geospatial data is used in educational contexts through the geoportal they built; and the process of collecting and archiving data for the geoportal databases. Giving a brief look at the process of starting the new CIC-wide geoportal, they also noted that they are weighing choices similar to those faced throughout the research data lifecycle, such as decisions about file formats, software, and standards.

 

November RDS Brown Bag Talk: Karl Broman

By Cameron Cook

This November, Karl Broman, a professor in the Department of Biostatistics and Medical Informatics at the University of Wisconsin – Madison, gave a talk entitled “Reproducible Research”. You can find the slides and notes for his talk on his website.

Karl’s talk centered on steps that can be implemented over time to make one’s work fully reproducible. He began with an overview of the issues he faces in his own work, highlighting common problems and misconceptions that many researchers have experienced. He also explained what reproducibility means and how it differs from replicability. The steps to reproducibility outlined in Karl’s talk are listed below; further explanation can be found in the notes attached to his archived slides.

1) Everything with a script – Everything you do, do in code.

2) Organize your data and code – Make your code or data meaningful to someone else.

3) Automate the process – Karl uses GNU Make for the process.

4) Turn your scripts into reproducible reports – Give a better picture of the data.

5) Turn repeated code into functions – Write better code!

6) Create a package/module – Don’t repeat yourself. Reuse and improve upon the code you have.

7) Use version control – Through tools like git/GitHub.

8) License your software – License your code so you can share it with others.
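To make steps 1 and 5 concrete, here is a minimal, hypothetical Python sketch (it is not taken from Karl’s notes): the whole analysis lives in a script, and a summary calculation that might otherwise be copy-pasted once per group is written a single time as a function. The function name `summarize`, the group names, and the measurements are invented placeholders.

```python
"""Hypothetical analysis script illustrating two of the steps above:
everything is done in code (step 1), and a calculation that would
otherwise be repeated for each group lives in one function (step 5)."""
import statistics


def summarize(values):
    """Return (n, mean, sample standard deviation) for one group."""
    return len(values), statistics.mean(values), statistics.stdev(values)


def main():
    # Invented example data; a real script would read the raw data files.
    groups = {
        "control": [4.1, 3.9, 4.4, 4.0],
        "treated": [5.2, 5.0, 5.5, 5.1],
    }
    # One call per group instead of a copy-pasted block per group.
    for name, values in groups.items():
        n, mean, sd = summarize(values)
        print(f"{name}: n={n} mean={mean:.2f} sd={sd:.3f}")


if __name__ == "__main__":
    main()
```

Running the script prints one summary line per group. Because every number comes from code, fixing an error in the data or in `summarize` and rerunning the script regenerates all of the results consistently.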

 

October RDS Brown Bag Talk: Jason Fishbain

by Cameron Cook

This October, Jason Fishbain, UW-Madison’s Chief Data Officer, gave a talk entitled “The Role the Chief Data Officer Can Play in Helping the Research Community”. You can find the archived presentation slides on MINDS@UW.

Jason’s talk began with an overview of the data governance program he is working to establish at UW. He introduced the different types of data that the system produces (administrative, local, and research) as well as his ideas for a data governance framework whose overarching goal is information literacy for the campus. Achieving information literacy means educating users and instituting change in four specific areas:

1) Policies and Standards – e.g., deciding who is accountable for the data and data policies, and crafting a data stewardship policy.

2) Information Quality – e.g., control workflows that ensure data quality and process quality.

3) Privacy, Compliance, and Security – e.g., deciding what ‘restricted’ and ‘classified’ mean and how to comply with privacy and security laws (like FERPA) within our data management plan.

4) Architecture and Integration – e.g., consistent data definitions and data dictionaries.

Creating a data governance framework requires a large cultural overhaul for an institution, and it takes both technology and people to make the changes work. However, it also gives the Chief Data Officer great potential to assist the research community. It can lead to greater institutional support for the research enterprise through more resources, such as assistance with preparing grant proposals, complying with funding requirements, and storing data.

September RDS Brown Bag Talk: Mattie Burkert

by Cameron Cook

This September, Mattie Burkert, a PhD student from the Department of English, gave a talk entitled “Recovering the London Stage Information Bank (1970-1978): Data Preservation Lessons from an Early Humanities Computing Project”.

You can find the archived presentation slides on MINDS@UW.

Mattie’s talk focused on her work piecing together what remains of the London Stage Information Bank, an early humanities computing initiative from the 1970s that sought to transform the printed text The London Stage into a data bank queryable by researchers. Her work touches on the rapid media obsolescence and the misconceptions about data preservation of that era, both of which offer lessons for today’s digital humanities and digital scholarship communities. Her talk also gave a brief view of the project as it currently stands, the tools and techniques she has used to reconstruct lost data, and the difficulties she faces as she continues the project.