Open Science Grid (OSG) User School 2020 – Application Deadline Extended

OSG User School 2020

Announcement from Christina Koch + OSG

UPDATED: Due to the COVID-19 pandemic, we are extending the application
deadline to Friday, April 10. If you are unsure about applying, given all
the uncertainties lately, go ahead and submit an application! Even if this
year’s School is postponed or canceled, the first choice of seats at the
next School that is offered will go to applicants selected this year.

Could you transform your research with vast amounts of computing?

Come spend a week at the beautiful University of Wisconsin–Madison and learn how.

The Open Science Grid User School is seeking applicants for July 2020

During the school, July 6–10, you will learn to use high-throughput computing (HTC) systems — at your own campus or using the national Open Science Grid (OSG) — to run large-scale computing applications that are at the heart of today’s cutting-edge science. Through lectures, discussions, and lots of hands-on activities with experienced OSG staff, you will learn how HTC systems work, how to run and manage lots of jobs and huge datasets to implement a scientific computing workflow, and where to turn for more information and help. Take a look at the high-level curriculum and syllabus for more details.

The school is ideal for graduate students in any science or research domain where large-scale computing is a vital part of the research process, plus we will consider applications from advanced undergraduates, post-doctoral students, faculty, and staff. Students accepted to this program will receive financial support for basic travel and local costs associated with the School.


Spreadsheets and Databases as Tools for Data Management

Is the spreadsheet you’re using for research data starting to get unwieldy? 

For research projects of any size, it’s important to think about the tools that will best suit your data collection, recording, and analysis needs now and into the future. Spreadsheets and databases are two of the most versatile and widely used tools for managing research data. However, spreadsheets have limitations, and to ensure the long-term integrity of your data there are many scenarios in which it may be time to move your research data to a database.

Spreadsheets:

Spreadsheets, whether in proprietary software like Microsoft Excel or open source applications like OpenOffice, are some of the most widely used data management tools. This is not by accident: spreadsheets are an easy way to get started storing both qualitative and quantitative data. For larger research projects or those with highly specialized data types and vocabularies, however, spreadsheets have limitations that can jeopardize data integrity and introduce errors into your research process.

Benefits of Spreadsheets: 

  • They are easy to learn, and many people already know their most common functions
  • They provide flexible and customizable features for organizing and analyzing data
  • Many other applications and programming languages can read data stored in spreadsheets
  • Data in spreadsheets can be easily converted into CSV or other tabular formats that are well suited for long-term storage and sharing
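
As a minimal sketch of the last two points, a sheet exported as CSV can be read with Python's standard csv module. The column names (sample_id, collected_on, mass_g) and the values are made up for illustration, and the data is kept in a string so the example runs on its own.

```python
import csv
import io

# Hypothetical export of a small spreadsheet. A real script would read a file,
# e.g. open("samples.csv", newline=""), instead of an in-memory string.
EXPORTED_CSV = """sample_id,collected_on,mass_g
S001,2020-03-02,12.5
S002,2020-03-03,9.8
S003,2020-03-03,14.1
"""

def read_rows(csv_text):
    """Return the sheet as a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def mean_mass(rows):
    """Average the mass_g column; CSV values arrive as text, so convert them."""
    masses = [float(row["mass_g"]) for row in rows]
    return sum(masses) / len(masses)

rows = read_rows(EXPORTED_CSV)
print(len(rows), "rows; mean mass:", round(mean_mass(rows), 2))
```

Because CSV is plain text, the same file can be opened years later without the spreadsheet program that produced it.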

Drawbacks of spreadsheets:

  • Limited control over data integrity and format
  • Some spreadsheet software limits the amount of data it can hold (an Excel worksheet, for example, is capped at 1,048,576 rows)
  • Most importantly, it’s easy for humans to introduce errors into data through mistakes like accidentally overwriting cells or recording data in non-normalized structures (e.g., Excel famously mangles dates, as described in this blog post by a data librarian)
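
One way to catch the date problem early is to check an exported column before analysis. The sketch below assumes the project has agreed to store dates as YYYY-MM-DD text; the function name find_bad_dates and the sample values are illustrative, not part of any particular tool.

```python
import re
from datetime import datetime

# Strict shape check: four digits, two digits, two digits.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")

def find_bad_dates(values):
    """Return (row_index, value) pairs that are not valid YYYY-MM-DD dates."""
    bad = []
    for index, value in enumerate(values):
        text = value.strip()
        if not ISO_DATE.fullmatch(text):
            bad.append((index, value))  # wrong shape, e.g. 3/6/2020 or a bare number
            continue
        try:
            datetime.strptime(text, "%Y-%m-%d")
        except ValueError:
            bad.append((index, value))  # right shape, impossible date
    return bad

# Hypothetical column mixing one good value with three typical problems.
column = ["2020-03-06", "3/6/2020", "43896", "2020-02-30"]
for index, value in find_bad_dates(column):
    print(f"row {index}: suspicious date {value!r}")
```

A check like this only flags problems; deciding what the original date was still requires going back to the source record.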

For more information, take a look at this Data Carpentry course on how to use spreadsheets for data management.

Databases: 

Databases are among the most powerful and ubiquitous tools for managing data of all kinds because of their dynamic ability to organize, query, and share the data they contain. Both proprietary and open source database software can serve as effective solutions for managing your research data. In addition, data that is already stored in a spreadsheet can, in many cases, be integrated into a database system.
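
As a rough illustration of moving spreadsheet data into a database, the sketch below loads a CSV export into SQLite through Python's built-in sqlite3 module. SQLite is chosen here only because it needs no server; the table name samples and its columns are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical CSV export of a spreadsheet (kept in a string so this runs as-is).
EXPORTED_CSV = """sample_id,collected_on,mass_g
S001,2020-03-02,12.5
S002,2020-03-03,9.8
S003,2020-03-03,14.1
"""

def load_samples(conn, csv_text):
    """Create a typed table and copy every CSV row into it."""
    conn.execute(
        "CREATE TABLE samples ("
        " sample_id TEXT PRIMARY KEY,"
        " collected_on TEXT NOT NULL,"
        " mass_g REAL NOT NULL)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [
        (row["sample_id"], row["collected_on"], float(row["mass_g"]))
        for row in reader
    ]
    conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # use a file path to keep the database
load_samples(conn, EXPORTED_CSV)

# Once loaded, questions about the data become queries.
per_day = conn.execute(
    "SELECT collected_on, COUNT(*) FROM samples "
    "GROUP BY collected_on ORDER BY collected_on"
).fetchall()
print(per_day)
```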

Benefits of databases: 

  • Databases provide highly customizable controls over the formats, vocabularies, and data types used when data is entered and stored
  • It is easy to enforce quality control by requiring specific formats or values for fields and by limiting access to tables
  • Many database management systems (DBMS) and companion tools, like MySQL Workbench or Microsoft Access, support the design and implementation of databases without extensive coding know-how
  • Databases facilitate complex relationships between data elements and support sophisticated queries 
  • Command line controls make it possible to implement detailed customization of database structure 
  • Databases often include customizable interfaces that provide access to data as well as forms for entering new data
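
The quality-control and relationship points above can be sketched with SQLite constraints. This is an illustration under assumed names (the sites and samples tables and the habitat vocabulary are invented), not a recommended schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign keys off by default
conn.executescript("""
CREATE TABLE sites (
    site_id TEXT PRIMARY KEY,
    -- controlled vocabulary: only these habitat names are allowed
    habitat TEXT NOT NULL CHECK (habitat IN ('prairie', 'wetland', 'forest'))
);
CREATE TABLE samples (
    sample_id TEXT PRIMARY KEY,
    site_id   TEXT NOT NULL REFERENCES sites(site_id),
    mass_g    REAL NOT NULL CHECK (mass_g > 0)
);
""")
conn.execute("INSERT INTO sites VALUES ('A', 'prairie')")

def try_insert(sql):
    """Run one INSERT and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as err:
        return f"rejected: {err}"

results = [
    try_insert("INSERT INTO samples VALUES ('S001', 'A', 12.5)"),   # valid row
    try_insert("INSERT INTO samples VALUES ('S002', 'A', -3.0)"),   # negative mass
    try_insert("INSERT INTO samples VALUES ('S003', 'Z', 8.0)"),    # unknown site
    try_insert("INSERT INTO sites VALUES ('B', 'praire')"),         # misspelled habitat
]
for outcome in results:
    print(outcome)

# A query that follows the relationship between the two tables.
summary = conn.execute(
    "SELECT sites.habitat, COUNT(*) FROM samples "
    "JOIN sites USING (site_id) GROUP BY sites.habitat"
).fetchall()
print(summary)
```

In a spreadsheet, all four rows would have been accepted silently; here the three invalid ones never reach the table.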

Drawbacks of databases: 

  • Learning database languages such as SQL can be labor and time intensive
  • Designing a database often requires extensive planning and may slow the research process down
  • Data stored in proprietary database software can be difficult to convert to other formats

Where to get help: If you are interested in further exploring database solutions, you can access database training courses on LinkedIn Learning (formerly Lynda.com), or you can consult with Teaching and Research Application Development (TRAD) on your options.

 

Link Roundup March 2020


Cameron Cook

In the wake of changes to the pricing structure and licensing policies of the Box cloud storage system, UW–Madison has created a task force to evaluate Box usage on campus. The task force will host listening sessions and asks for your input through its Box Evaluation Survey.

DoIT provides some tips for keeping your data safe before, during, and after travel.

Jennifer Patiño

Saturday March 7 is Open Data Day, an annual celebration of open data all over the world!

Kent Emerson

Several UW-Madison researchers are at the forefront of efforts to understand and treat the coronavirus, as well as how to prepare for and prevent similar outbreaks in the future. The researchers stress the importance of public data sharing to help find effective treatments more quickly and efficiently.

Link Roundup February 2020


Jennifer Patiño

It’s Love Data Week this week! In the spirit of Valentine’s Day, we’re blogging about some of our most loved resources in and around UW–Madison. ICPSR is celebrating Love Data Week by encouraging everyone to adopt and share a dataset. You can follow the conversation and share your own love for data on Twitter using the hashtag #lovedata20.

Researcher, poet of code, and activist Joy Buolamwini was recently interviewed by NPR’s Code Switch about her findings at the MIT Media Lab on bias in facial recognition algorithms, which incorrectly classified Black women as male. The interview also covers her spoken word piece “AI, Ain’t I a Woman?”, which drew attention to the issue.

Kent Emerson

DoIT has some recommendations for maintaining security while working remotely.

SciScore is software developed at the University of California-San Diego that uses natural language processing tools to measure the reproducibility of research projects.

ResearchDrive Now Available

The UW–Madison Office of the Vice Chancellor for Research and Graduate Education (VCRGE) and the Division of Information Technology (DoIT) are excited to announce ResearchDrive, a secure, shareable data storage solution for faculty principal investigators (PIs), permanent PIs, and their research group members. The new service is the first phase of a Research Cyberinfrastructure strategic initiative, a collaborative effort among the VCRGE, DoIT, the Research Technology Advisory Group (RTAG), the Libraries, and campus research computing centers to support the growing data and computing needs of researchers.

The university provides each PI with 5 terabytes (TB) at no cost and additional storage at $200/TB/year, including support, training, and onboarding for researchers. The quota per PI ensures that ResearchDrive is a predictable resource that can be leveraged for faculty recruitment and included in data management plans and grant proposals. ResearchDrive is suited for a variety of research purposes, including storing research data and files, holding data inputs and outputs of research computing, and archiving data. It is a secure and permanent place to store data and includes security and data protection features based on the NIST Cybersecurity Framework, such as encryption, snapshots, off-site replication, ransomware protection, and monitoring by the Cybersecurity Operations Center (CSOC).
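
For budgeting in a grant proposal, the pricing above can be turned into a quick estimate. This is a sketch only: it assumes the charge is simply $200 for each TB beyond the free 5 TB, billed per year, and the announcement does not say how partial terabytes or mid-year changes are handled.

```python
FREE_TB = 5              # no-cost allocation per PI, from the announcement
RATE_PER_TB_YEAR = 200   # dollars per additional TB per year, from the announcement

def annual_cost(total_tb):
    """Estimated yearly charge in dollars for a PI storing total_tb terabytes.

    Assumes linear per-TB billing beyond the free quota (an assumption,
    not a documented billing rule).
    """
    if total_tb < 0:
        raise ValueError("storage size cannot be negative")
    return max(0, total_tb - FREE_TB) * RATE_PER_TB_YEAR

for size in (3, 5, 12):
    print(f"{size} TB -> ${annual_cost(size)} per year")
```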
