Pierce Edmiston, PhD candidate, Department of Psychology | University of Wisconsin-Madison
What can scientists learn from the open source community when it comes to improving the reproducibility of research methods and results? I’ll introduce three problems in reproducibility, explain how they’ve been solved by the open source community, and demonstrate how scientists can use these solutions to make for better and more reproducible research. I’ll cover version control systems, dynamic documents, and “hollow research projects,” which I consider to be the epitome of reproducible research. I’ll review the empirical evidence that open source practices make for better science, and offer some speculation from the perspective of cultural evolution as to why open source has been so successful.
Social coding: It’s not a communicable disease
Ankur R Desai, Professor, Atmospheric and Oceanic Sciences, Ned P Smith Professorship of Climatology, University of Wisconsin-Madison
We all work broadly among networks of scholars diffused among multiple institutions. Over time this has led to practices for sharing of ideas, data, and tools for group organization, meeting, and writing. At the same time, virtually all data analyses in all fields have become more computational and programming intensive. However, broadly accessible tools for joint coding and reproducing computational results of others have not been widely adopted by researchers. Social coding and development tools, such as GitHub, Docker, and Slack, have had widespread adoption in commercial and non-profit software engineering. Here, I will discuss how these tools also enhance research by improving interaction on large programming-based data analyses, ensuring code reliability and reproducibility, and providing a means to widely document and share code and results. I will draw on examples from my own lab’s research on development of community code for processing of environmental and atmospheric field observations and on the development of an informatics tool for ecosystem modeling. I will also highlight a colleague’s novel experiment to run an entire lab-based experiment from idea to instrument output to code to results to manuscript in GitHub. Finally, I will wrap up with a discussion of the pros and cons of social coding in a research environment and what the future may look like for best practices in code development and sharing for research.
Lauren Michael & Christina Koch, Center for High Throughput Computing | UW-Madison
For research aiming to extract novel understanding from large and/or complex sets of data, the process of selecting from an ever-growing list of computational tools and analytical approaches is often a significant bottleneck. Beyond the difficulty in gaining awareness of various methods is the daunting task of identifying which of these will apply to and scale for the evolving complexity of future research problems. In fact, the ability of research projects to expand to greater dimensionality, validity, and impact is often unknowingly limited by researcher perceptions of the scalability of previously-selected computational tools. Ultimately, the scalability of any computational research project or tool relies on an ability to effectively break up (or “parallelize”) large and time-consuming tasks for better ease and throughput. As a general approach, high-throughput computing not only applies to a multitude of data-intensive research problems but also provides for built-in scalability into the future.
To provide a discussion of high-throughput computing approaches, Lauren Michael from UW-Madison’s Center for High Throughput Computing will discuss the applicability of high-throughput computing for executing data-intensive work and will describe the free support and computational capacity of the CHTC, which serves as UW-Madison’s research computing center. The seminar will include examples of projects from a diverse set of research disciplines that have been transformed through the use of high-throughput computing.
Sarah Stevens, Department of Bacteriology | UW-Madison
Want a community to help build up your own computational skills, be it R, Python, MATLAB, Unix, or something subject specific? Start your own study group! Sarah will share her experiences starting the Computational Biology, Ecology, and Evolution (ComBEE) group and its R and Python study groups. She will also share information about resources available to help you start your own group.
Matt Moehr, Survey of the Health of Wisconsin | UW-Madison
Researchers have long known that geographic location is an important part of public health. However, precise geographic locations of people are usually restricted due to concerns of confidentiality. In this talk I will review the IRB and HIPAA regulations around geographical identifiers, and ask questions about balancing risks with the potential benefits from research. I will share examples from public health research and my experience working at the Survey of the Health of Wisconsin that demonstrate the challenges in finding this balance.
Jack Williams, Professor of Geography, Director of Nelson Center for Climatic Research | UW-Madison
Simon Goring, Assistant Scientist – Department of Geography | UW-Madison
Paleoecologists use geological data to study ecological dynamics during past environmental change. Our data is hard-won and expensive, usually requiring weeks to months of fieldwork and years of laboratory analyses. Our scientific expertise is dispersed and distributed across taxonomic groups, regions, geological time periods, and research questions. There is an inherent disconnect between the global-scale questions that motivate much of our research (e.g. the responses of global biodiversity to climate change) and the scale of data collection (site-level spatial data, long time scales).
Because of the above, paleoecologists have a long and proud tradition of contributing their data to community-supported data repositories (CSDRs), to enable them to tackle big questions and work at regional, continental, and global spatial scales. Now, the on-going revolution in information sciences is creating both challenges and opportunities for CSDRs, but mostly opportunities. In this talk we will present our perspective as paleoecologists who collect primary data, engage in large-scale synthetic research, and are increasingly taking on leadership roles in the building and development of the Neotoma Paleoecology Database (www.neotomadb.org), a CSDR dedicated to supporting community research into ecological dynamics over the large climate changes of the Quaternary Period. We’ll present recent developments and then discuss some of the current challenges that we are either solving or seeking solutions to.
Alex Hanna, PhD Candidate, Department of Sociology | UW-Madison
With the mass proliferation of digital traces of social life, computational and data scientists have developed many different methods for manipulating and analyzing these data at scale. These data have the potential to bear on enduring questions in the social sciences. In this talk, I discuss some basic skills needed for literacy in computational methods and programming for the social sciences.
Robert A. Haworth, Distinguished Scientist Emeritus | UW-Madison
labElephant is a Microsoft Access database application that I developed as a biological research scientist at UW-Madison for managing my own research laboratory. Most e-notebooks focus on laboratory housekeeping and data management. labElephant covers this, but more importantly facilitates the scientific process itself: tracking of knowledge learned from the literature, hypothesis development, and integration of knowledge with experiments. The application is designed for use in a multi-user lab environment with a secure shared lab folder on the department server, where each lab member stores their own data in their own sub-folder. labElephant serves not only as a way for the Principal Investigator to organize and track this lab data, but also as a stimulus to thought, and as a training tool for graduate students.
Mattie Burkert, PhD student – Department of English | UW-Madison
This paper traces the little-known history of the London Stage Information Bank, a digital initiative that ran from 1970 to 1978 under the direction of Professor Ben R. Schneider, Jr. at Lawrence University. With support from the NEH, the ACLS, and the Mellon Foundation, Schneider’s team produced a database from the multi-volume reference work The London Stage, 1660-1800 (Carbondale: SIU Press, 1960). Today, however, many of the project’s outputs have been lost or corrupted — and despite an ongoing need for this resource, few theater researchers know that it ever existed. In detailing my archival and forensic efforts to recover the material artifacts of this nearly decade-long project, I present the London Stage Information Bank as an object lesson in crucial issues of access, preservation, and institutional memory facing digital humanities work today.
Jason Fishbain, Chief Data Officer | UW-Madison
As the champion for establishing strategies to effectively manage data in order to support the mission of UW-Madison, the Chief Data Officer is working on establishing our institution’s first Data Governance program. While the genesis of the program was the need to effectively manage UW-Madison’s administrative data and the systems that contain them, the scope of the program also includes establishing institutional support for managing data within the research enterprise. One of the goals of the CDO and the Data Governance program is to make the case for and provide the necessary resources to assist researchers in managing the data-lifecycle of their research endeavors. Jason will speak to the work that has been done to date and what planned efforts are upcoming in this realm.
Karl W Broman, Biostatistics & Medical Informatics | UW-Madison
A minimal standard for data analysis and other scientific computations is that they be reproducible: that the code and data are assembled in a way so that another group can re-create all of the results (e.g., the figures in a paper). I will discuss my personal struggles to make my work reproducible and will present a series of suggested steps on the path towards reproducibility (see http://kbroman.org/steps2rr).
Jaime Martindale, Map & Geospatial Data Librarian, Arthur H. Robinson Map Library – Department of Geography | UW-Madison
AJ Wortley, State Cartographer’s Office – Department of Geography | UW-Madison
The Robinson Map Library has been archiving local geospatial data collected from Wisconsin counties and municipalities since 2006. Much has changed since then with regard to best practices related to management of archived geospatial data. In partnership with the WI State Cartographer’s Office, the library recently launched an online geoportal to provide access to archived geospatial data collections for educational use. Planning for and implementing the geoportal project meant creating a management plan for metadata, file organization, and preservation. In this presentation, we will share details on how we acquire, store, manage, and provide access to geospatial data for educational use at UW-Madison, and how we are planning to expand this access to all UW System institutions in the fall of 2015. In addition, we’ll share preliminary details on how this work might extend to geospatial research data produced at UW-Madison through an inter-campus effort still in its early stages with support from UW Libraries.
Elliott Shuppy, SLIS Graduate Student | UW-Madison
A growing demand for convenient sharing of research image data between the Laboratory for Optical and Computational Instrumentation (LOCI) and partner laboratories stimulated the need for enhanced data management processes and accompanying documentation. SLIS graduate student Elliott Shuppy began working with scientists as a researcher at LOCI in Fall 2014 to meet these needs. During his talk he will discuss his involvement in augmenting one scientist’s data management workflow, including reflections on current practices, positioned and proposed tactics, and next steps in the process.
Kristin Briney, Data Services Librarian | UW-Milwaukee Libraries
What does it take to create research data services where none existed before? Kristin Briney will discuss establishing Data Services at the University of Wisconsin-Milwaukee. Her talk will include strategy and lessons learned 18 months into the process.
Jim Jonas, Information and Instructional Services Librarian | MERIT Library
Brianna Marshall, Digital Curation Coordinator | UW-Madison Libraries
Carrie Nelson, Public Services/Instructional Content Librarian | UW-Madison Libraries
Doug Way, AUL for Collections & Research Services | UW-Madison Libraries
In this talk, the presenters will introduce the concepts of open access, open data, and open educational resources. They will share recent updates in each domain and highlight existing resources for learning more. The second half of the presentation will be reserved for questions and unstructured conversation about these issues.
Barry Radler, Researcher | UW-Madison Institute on Aging
The increasing availability of research and other data via the internet has spurred interest in and the need for better documentation of such data. The Open Data movement gaining momentum among federal funding agencies, academic libraries, and professional journals is also contributing to a recognition that good documentation and metadata are essential to distinguishing the quality of research datasets and facilitating their discovery and use in an online environment of ever-expanding information. This presentation will provide a primer in metadata use and metadata standards like the Data Documentation Initiative (DDI). It will also include reflections by the presenter on his particular DDI use cases, as well as his experience hosting the 3rd annual North American DDI Conference. There will be an opportunity for questions and discussion.
You’re Doing It Wrong!: Data Retention and the Cloud
Dan Uhlrich, Associate Vice Chancellor of Research Policy and Judy Caruso, CIO Office Director of Policy and Planning
You have data, but do you know how long you should keep it? And where? You might be surprised by the answers. Dan Uhlrich, Associate Vice Chancellor of Research Policy, will discuss the Graduate School’s Policy on Data Stewardship, Access, and Retention and how it applies to you, and Judy Caruso, CIO Office Director of Policy and Planning, will discuss the legal aspects of using cloud storage services.
Jan Cheetham, DoIT Academic Technology; Barry Radler, UW-Madison Institute on Aging; Emily Utzerath, Madison Teaching and Learning Excellence; Cid Frietag, DoIT Academic Technology
Do you want to get the most out of spreadsheets? Do you need to securely store and share data and files with collaborators across campus and beyond? Are you interested in learning more about data visualization? Learn how to be a power user of Excel, Box, and data visualization tools.
Introduction to Research Data Management Services at UW-Madison
Ryan Schryver, Wendt Commons; Alan Wolf, Office of the Vice Provost for Information Technology; Trisha Adamus, Ebling Library; Tom Mish, School of Public Health
Wondering what a data management plan is? Why funders are asking for something called metadata? Where you will store your data and make it accessible to colleagues? If so, learn about many of the research data resources available to UW-Madison researchers. Speakers involved with assisting researchers with data management issues will speak on a variety of issues and services, including MINDS@UW, Advanced Computing Infrastructure, ORCID, VIVO, and data security.
Dr. Krishanu Saha, Wisconsin Institutes for Discovery
Dr. Saha will discuss his laboratory’s contributions to online stem-cell databanks, including data preparation and the advantages of participation.
Ross Tredinnick, Wisconsin Institutes for Discovery
Ross Tredinnick will demonstrate and explain the technology and data-management requirements associated with WID’s Living Environments Laboratory, also known as “the CAVE.”
Dr. Karen Strier, Department of Anthropology
Dr. Karen Strier will discuss her data-design process, aimed at collecting behavioral data on wild monkeys in Brazil. After consulting with DoIT, she engaged a student to design and build a custom relational database. She has been testing the design with uploads of existing data, and will soon have enough entered to start query-based analysis.
Data and Tangible Research Products: Stewardship, Access, and Retention
Assistant Dean Stephen J. Harsy, School of Medicine and Public Health
Dr. Harsy will discuss UW-Madison’s Policy on Data Stewardship, Access and Retention, which clarifies ownership of data produced by university researchers, as well as researcher responsibility for data before, during, and after specific research projects.
This lecture will take place in Room 1220-1222, Health Sciences Learning Center.
De-Mystifying the Data Management Requirements of Research Funders
Trisha Adamus, Data, Network, and Translational Research Librarian, Ebling Library
Since the announcement of the NSF Data Management Plan requirement, researchers have had many questions about it: Will I be required to make all of my data available on the web in perpetuity? Where will I put all of this data? Who will pay for storage after the grant is finished? The answers to these questions aren’t always straightforward, especially since the NSF requirements are relatively general.
The NSF requirement isn’t the only one researchers face, however: many federal funding agencies have data management requirements for PIs. What does this landscape look like? What can we learn from examining a range of data policies? This talk will review over two dozen funder policies and provide strategies for writing a data management plan.
This lecture will take place in the third-floor Teaching Lab at WID/MIR.
Managing Sensitive Data Across Research Sites: the Wisconsin Alzheimer’s Institute
Dr. Erin Jonaitis, Wisconsin Alzheimer’s Institute
The Wisconsin Alzheimer’s Institute conducts longitudinal medical research at several sites across Wisconsin. Learn how WAI collects, transports, and stores data safely and securely despite the logistical difficulties inherent in multiple-site research.
This lecture will take place in the third-floor Teaching Lab at WID/MIR.
Semantics and Geospatial Data
Dr. Nancy Wiegand, Space Science and Engineering Center
Data are becoming more voluminous and more available, but finding and re-using data across agencies or jurisdictions remains difficult because different terms are used. This lack of semantic interoperability has been recognized as a stumbling block to collaboration. This talk introduces semantics, including ontologies, the vision of enabling semantic interoperability, and the linked data initiative.
This lecture will take place in the third-floor Teaching Lab at WID/MIR.
What should grad students know about data management? (Event recording)
IMPORTANT: This brownbag will take place in the SLIS LIBRARY CLASSROOM, 4191F Helen C. White. Go into the SLIS Laboratory Library and turn left.
This summer Dorothea Salo will inaugurate a one-credit bootcamp-style course on research-data management (LIS 341) for graduate students across the disciplines. What context, best practices, and technologies should be covered? PIs, PAs, RAs, postdocs, dissertators: come prepared with horror stories, experiences good and bad, and your own “if only I’d known that early on” wishlists!
GIS Data Preservation, with Jaime Stoltenberg and AJ Wortley (Adobe Connect archive)
Learn how the Robinson Map Library and the State Cartographer’s Office are archiving geospatial data for use in research and teaching. Preservation of at-risk and temporally significant digital geospatial content poses challenges:
- Geospatial data layers containing information about land parcels, roads, and administrative boundaries change often.
- Existing copies of these data are often at risk of being overwritten when updates or changes are made, and these superseded snapshots of data are then lost for future use and analysis.
Teacher Incentive Fund, Sara Kraemer and Lexy Spry (Adobe Connect link)
The Teacher Incentive Fund is a data-driven project that takes advantage of knowledge-management and online-collaboration tools. The talk will also discuss how school districts involved in the project use data to inform incentive and compensation decisions.
Long-Term Ecological Research Network, with Corinna Gries (Adobe Connect link)
LTER is an NSF-funded network of 26 sites throughout the US, Antarctica and French Polynesia. LTER has had the mandate to archive data and employ a dedicated Information Manager since its inception 30 years ago. The group of LTER Information Managers was instrumental in developing procedures and approaches for long-term ecological data storage, including the Ecological Metadata Language, employed for data discovery and data access. New challenges include:
- streaming sensor data management
- a centralized network information system, and
- implementation of workflow systems for quality control measures.
May 7 bonus: posters from LIS 855 “Digital Curation” service-learning projects centered on local datasets.
NSF Data Management Plans
Research Data Services will facilitate a discussion of the NSF’s new guidelines for grant applicants. Written an NSF grant and gotten DMP feedback? About to write one, and have questions or concerns? Have a wishlist for campus support of data management and DMPs? We want to hear from you!
Kevin Eliceiri, Image Informatics for Multidimensional Biological Microscopy
Dr. Eliceiri and the Laboratory for Optical and Computational Instrumentation (http://loci.wisc.edu/) are developing a complete, open source system for handling biomedical images, including image acquisition, data storage, metadata (experimental data associated with an image), visualization, analysis, annotation, and database interconnectivity.
Puneet Kishor, Building a data-sharing database: PaleoDB
Noted open-data practitioner Puneet Kishor of the Department of Geology will demo the Paleobiology Database (http://paleodb.org/), an open, crowdsourced online database of fossils. Questions about technical infrastructure, sustainability, and data sharing welcome!
Brian Yandell, Statistics support for UW-Madison research
Statistics department chair Brian Yandell will speak about how his department’s strategic plan emphasizes improving statistical literacy and access to statistics support and assistance for UW-Madison researchers.