October 2017 Brown Bag: Matthew Garcia

The Rebecca J. Holz Series in Research Data Management is a monthly lecture series hosted during the spring and fall academic semesters. Research Data Services invites speakers from a variety of disciplines to talk about their research or involvement with data.

On October 12th, Matthew Garcia, a PhD candidate in Forest Science with the Department of Forest & Wildlife Ecology at UW-Madison, gave a talk entitled “The accurate data management plan (if such exists) in the presence of ‘Big Data’”. His slides are available on the Research Data Services Speakerdeck page.

Fall 2017 Holz Brown Bag Lineup

The Rebecca J. Holz Series in Research Data Management presents talks on various data-related topics. Each presentation is held in room 126 of Memorial Library from noon to 1:00 PM (bring your lunch!).

The next presentation is October 12, 2017. Matthew Garcia, Ph.D. Candidate, Forest Science, Dept. of Forest & Wildlife Ecology, will present “The accurate Data Management Plan (if such exists) in the presence of ‘Big Data’”.

On November 15, Morton Ann Gernsbacher, Vilas Research Professor and Sir Frederic Bartlett Professor at UW-Madison, will present “Benefits of Open Data and Open Stimuli”.

April 2016 RDS Brown Bag: Robert A. Haworth

On April 13, 2016, Robert A. Haworth, Distinguished Scientist Emeritus at the University of Wisconsin – Madison, gave a talk entitled “labElephant: A Metadatabase Application for Managing the Research Endeavor”. You can find the slides on the Research Data Services Speakerdeck page.

Robert Haworth’s talk focused on labElephant, a software application he has developed to help labs manage the research and discovery process. Many researchers are familiar with electronic lab notebooks, which help manage the research data lifecycle. Haworth, however, designed labElephant to fold data management into a larger cycle that also includes the knowledge production process and the housekeeping aspects of the lab environment. labElephant provides an interface through which users can import content from their citation manager, track important information gleaned from papers, conferences, books, and other sources, and then connect these pieces to synthesize larger ideas or hypotheses. Users can also link those ideas and hypotheses to the related experiments. labElephant is described as a metadatabase because the software itself does not contain any experimental data; instead, it links to where the data already lives in the lab’s existing systems. Through labElephant, researchers can track experiments, outcomes, materials, and methods, and link that information back to the idea or hypothesis that motivated each experiment. The experiment information and hypotheses can also be generated as a report that serves as a skeleton for a paper or article.

March 2016 RDS Brown Bag: Alex Hanna

On March 9, 2016, Alex Hanna, a PhD candidate in the Department of Sociology at the University of Wisconsin – Madison, gave a talk entitled “Data Pipelines and Computational Methods for the Social Sciences”. You can find her slides on the Research Data Services Speakerdeck page.

Alex Hanna’s talk covered three key areas of her work. The first, “Twitter and Politics”, described her work with the Social Media and Democracy research group at UW-Madison. The group maintains a Twitter archive that currently holds over 50 billion tweets and continues to collect roughly one percent of all tweets produced each day. With this data, the group can study tweets in relation to political events; for example, they can examine how users respond to the sound and physical appearance of the candidates and then map mentions of candidates onto the key speaking points of the debates. Hanna then discussed the hardware and software changes made to the archive as it has grown, and how those changes have let the group process the data more quickly.

The second part of the talk covered protest event data, the subject at the crux of Hanna’s dissertation. Protest event data is extracted from news reports, with a focus on newspaper articles because of their consistent publication and their role in the historical record. Building event data used to be labor-intensive and costly: articles had to be collected, filtered, hand-coded against a codebook, and then converted into usable data. Hanna’s dissertation has focused on creating the Machine-learning Protest Event Data System (MPEDS), which automates much of this process with limited human intervention. This portion of the talk focused on those changes to the process, which allow work at a larger scale while also providing better searching and indexing as data is inserted.

In the final portion of the talk, Hanna discussed “Computational Social Science Education”. Hanna seeks to help new and veteran scientists expand their computational literacy beyond the traditional social science tools SPSS and Stata by teaching tools such as R, Python, Hadoop, and the command line. These tools give social scientists more flexible ways to carry out their research, such as web scraping, large-scale network analysis, and automated text analysis. Hanna covered her pedagogical approaches, workshop examples, and lessons learned from teaching these tools.

February RDS Brown Bag Talk: Jack Williams & Simon Goring

by Cameron Cook

On February 17, 2016, Jack Williams and Simon Goring, researchers in the Department of Geography at the University of Wisconsin-Madison, gave a talk for the Holz series entitled “Community-Supported Data Repositories in Paleoecoinformatics: Building the Middle Tail”. You can find their slides on the Research Data Services Speakerdeck page.

Jack Williams covered the first part of the talk, “Understanding the Data, Framing the Challenge”, which introduced the field of paleoecology, the characteristics of paleoecology data and its data cycle, and the Neotoma Paleoecology Database. Neotoma is one of the community repositories that developed in response to researchers’ need to combine many datasets into larger networks in order to answer questions spanning large stretches of time and space. These community repositories share similar features: open data, community curation, standardized taxonomy, and age controls and age models. Jack ended with an important question: “Neotoma is one informatic initiative among many, how best to cross-link and cross-leverage?”

The second part of the talk, “Connecting Users, Data, and Repositories”, was presented by Simon Goring, who gave an overview of the opportunities for community repositories to bring together two disparate parts of the research process: the journal and the database. Jack and Simon suggested that repositories can help mediate between these pieces by building workflow tools that make data submission an integrated part of the research process, and by acting as a broker between researchers and the myriad platforms and formats available. Simon also covered examples of successfully linking and building on initiatives from Neotoma and EarthCube, as well as other cyberinfrastructure tools created by EarthCube.

December RDS Brown Bag Talk: Jaime Martindale & AJ Wortley

by Cameron Cook

This December, Jaime Martindale, Map & GIS Data Librarian at the Arthur Robinson Map Library, and AJ Wortley of the State Cartographer’s Office presented their talk “Geospatial Data Preservation & Management”. Their slides have been archived online.

Jaime and AJ’s talk focused on the work they have done over the past year archiving and releasing data through a UW-Madison campus geoportal, as well as the work they will be doing over the next year to help a CIC-wide geoportal get off the ground. During their talk they introduced geospatial data in terms of raster and vector data, showed how geospatial data is used in educational contexts through the geoportal they have built, and described the process of collecting and archiving data for the geoportal databases. In giving a brief look at the process of launching the new CIC-wide geoportal, they noted that they are weighing choices similar to those faced throughout the research data lifecycle, such as decisions about file formats, software, and standards.

November RDS Brown Bag Talk: Karl Broman

by Cameron Cook

This November, Karl Broman, a professor in the Department of Biostatistics and Medical Informatics at the University of Wisconsin – Madison, gave a talk entitled “Reproducible Research”. You can find the slides and notes for his talk on his website.

Karl’s talk centered on steps that can be implemented over time to make one’s work fully reproducible. He began with an overview of the issues he faces in his own work, highlighting common problems and misconceptions that many researchers have experienced. He also explained what reproducibility means and how it differs from replicability. The steps to reproducibility outlined in Karl’s talk are listed below; further explanation can be found in the notes accompanying his archived slides.

1) Everything with a script – Everything you do, do in code.

2) Organize your data and code – Make your code or data meaningful to someone else.

3) Automate the process – Karl uses GNU Make for the process.

4) Turn your scripts into reproducible reports – Give a better picture of the data.

5) Turn repeated code into functions – Write better code!

6) Create a package/module – Don’t repeat yourself. Reuse and improve upon the code you have.

7) Use version control – Through tools like git/GitHub.

8) License your software – License your code so you can share it with others.
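To make step 5 a little more concrete, here is a minimal, hypothetical Python sketch; it is not taken from Karl’s slides. Instead of copying the same cleaning-and-summary lines once per data file, those steps live in a single function. The function name `summarize_measurements` and the sample data are made up for illustration.

```python
# Hypothetical illustration of step 5 (not from Karl's slides): logic that
# would otherwise be copied for every data file lives in one function.


def summarize_measurements(values):
    """Drop missing entries (None) and return (count, mean) of the rest."""
    kept = [v for v in values if v is not None]
    if not kept:
        # No usable values: report a count of zero and no mean.
        return 0, None
    return len(kept), sum(kept) / len(kept)


if __name__ == "__main__":
    # Made-up example data for two sites.
    datasets = {
        "site_a": [2.0, None, 4.0],
        "site_b": [1.0, 3.0, None, 5.0],
    }
    for name, values in datasets.items():
        count, mean = summarize_measurements(values)
        print(name, count, mean)
```

Because the logic lives in one place, a fix to the function applies to every dataset at once, and a function like this is a natural candidate to move into a package later (step 6).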