An Introduction to R

R logo

Logo from R-project under CC BY-SA 4.0, no changes made.

In this series, we’ll cover different aspects of the R programming language. This first post will cover a brief introduction to the language.

R is an open-source programming language and software environment for statistical computing and visualization, created by Ross Ihaka and Robert Gentleman and released in 1993. It was developed from the S statistical programming language, which was created by John Chambers and colleagues at Bell Laboratories in 1976. The basic R interface is a command-line interface, but GUI options such as RStudio are available for download.

Per the R-project,

R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It includes:

  • an effective data handling and storage facility,
  • a suite of operators for calculations on arrays, in particular matrices,
  • a large, coherent, integrated collection of intermediate tools for data analysis,
  • graphical facilities for data analysis and display either on-screen or on hardcopy, and
  • a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions and input and output facilities.

R’s abilities can also be extended through packages: user-created scripts, functions, documentation, etc. that are available for download and use in your local R environment. The number of packages available can seem overwhelming, but there are resources to help you sort them out. RStudio has an article with some of the most useful R packages available, and R-bloggers has an introduction to packages as well as a number of tutorials. There are also multiple packages available for text analysis, natural language processing, topic modeling, word clouds, etc. for those who may be interested in R’s use for computing outside of the sciences.

R and its packages are available for download through the Comprehensive R Archive Network (CRAN). To download, select the geographically closest mirror and then download the appropriate version for your device.

In the next post, we’ll cover important features of the language and I will discuss some of the challenges I have experienced while learning R during the course of the semester.

Sources:

https://www.r-project.org/about.html

https://en.wikipedia.org/wiki/R_(programming_language)

https://en.wikipedia.org/wiki/S_(programming_language)

The Big Picture of Data Management in an Electronic Lab Notebook (ELN)

by Jan Cheetham

The ELN is designed to be a single place to hold the different types of digital data you produce in your research. Now that UW-Madison researchers have unlimited storage on the LabArchives platform, there are no physical limits on the amount of data you include in your ELN notebook. It may be time to take a holistic view of how to manage all those data files in the ELN.

To get a big picture view, it helps to consider the different ways to add data and information to a LabArchives notebook and the options for getting it out again, as a backup copy or to archive it for the long haul. In an ELN system, a notebook consists of essentially three different types of data: notebook pages, attached files, and linked files.

 

 

Getting data in

This schematic illustrates the different ways to add data to a notebook: manually uploading it to an attachment entry; typing, pasting, or dragging it into a rich text entry; or using an automated workflow to “push” it from instruments or software into an attachment entry (blue arrows). Once a data file is attached to an entry, LabArchives keeps track of the file name and versions, including times and dates of changes.

Linking to data

If your data files need to live on an external server outside of the ELN, you can create a link to them from inside a notebook entry. However, if the files move, this link will break.
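One way to catch broken links early is to check periodically that every linked file is still where the notebook says it is. The following is only a minimal sketch in Python, not a LabArchives feature: the function name `find_broken_links` and the example paths are hypothetical, and it assumes the external server is mounted so that its files are reachable as ordinary file paths.

```python
from pathlib import Path

def find_broken_links(linked_paths):
    """Return the linked paths that no longer point at an existing file."""
    return [p for p in linked_paths if not Path(p).is_file()]

# Hypothetical paths copied from notebook entries that link to external files.
linked = [
    "/mnt/labshare/2017/assay_run_01.csv",
    "/mnt/labshare/2017/assay_run_02.csv",
]

for path in find_broken_links(linked):
    print(f"Link target missing: {path}")
```

Running a check like this before each backup gives you a short list of links to repair or files to move into the ELN.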

Getting data out

The green arrows in the diagram represent the export process, which is manual. LabArchives has two types of export packages: Offline Notebook, a zipped folder that includes notebook pages in HTML format plus attached data files (the most recent version only), and Print to PDF, a single PDF file of the pages that includes names and icons of attached files only. Of course, neither option contains a copy of linked files that are stored outside the ELN; only the path to the external files gets recorded in the ELN.

Backing up and archiving notebooks require some planning. There are several factors to consider:

  1. How often will you need to back up your notebook? Which type(s) of export package(s) will be most useful for this?
  2. Where will you store these export package files? Will you also print out paper copies and retain them? Is there a digital repository or data archiving platform you can keep them in for the long term?
  3. Will you keep a backup copy of attached data files outside of the ELN? If yes, you may find that the PDF format is sufficient for periodic backups of your notebook. If no, you may want to have a regular schedule for exporting the Offline Notebook as a backup of both notebook pages and attached data files.
  4. Are the external data files you link to from the ELN in a permanent location? If not, consider moving them into the ELN or storing them in a more permanent location.

For more information

ELN Archiving

Scenarios to help you create an archival and backup plan.

Archiving Electronic Lab Notebooks

Components of an ELN notebook and what information in each needs to be archived.

Manage Your Data with LabArchives

More details about export packages as well as data management tips for sharing, organizing, and documenting data inside the ELN.

Data Storage and Backup

Storage and backup resources at UW-Madison for exported ELN files.

Digital Archiving Platforms

A sampling of digital data repositories that may be useful for archiving ELNs.

Tool: Electronic Lab Notebook

by Cameron Cook

UW-Madison now offers an electronic lab notebook (ELN) service called LabArchives.

What is an electronic lab notebook?

An electronic lab notebook is software designed to replace traditional paper lab notebooks and allows you to contain all the different types of digital data your research produces in one place. It also makes your research easier by helping you organize, document, and share your work.

What can LabArchives help you do?

Per the university ELN website and the DoIT ELN Information, some of the available features are:

  • Organize research by structuring files and folders, with versioning and tracking.
  • Designate multiple user roles and customize permission settings.
  • Share and collaborate with other researchers whether internally or externally.
  • Integrate it with “Microsoft Office (Windows only), Google docs, ChemDoodle, PubMed, and GraphPad Prism”.
  • Attach files of many different formats.
  • Link to files living on an external server.
  • Unlimited storage on the LabArchives platform.
  • Securely access research anywhere in the world via an internet connection.
  • Access to basic drawing and chemistry tools.
  • Export your data with ‘Offline Notebook’ or ‘Print to PDF’ options.
  • Using LabArchives through the university provides legal protections for your research like those provided through UW-Google Apps and UW-Madison Box. It also helps you comply with UW’s Policy on Data Stewardship, Access, and Retention.

How do you get it?

It is available at no cost to students, staff, faculty, and researchers upon request or approval by a Principal Investigator or other appropriate designee.

What else should you know?

If you are interested in using the LabArchives service with data that is subject to campus IRB or federal oversight, it is recommended that you consult with your IRB or information security officer. At this time, the campus LabArchives service is not considered a suitable place for sensitive data elements, such as personal health information.

For more information about LabArchives and ELN best practices suggested by UW, visit eln.wisc.edu. For more information about ELN implementation at UW-Madison, visit UW-Madison News’ recent article.

 

Top 5 Data Management Tips for Undergraduates


by Cameron Cook

With fall well under way on campus and final projects just around the corner, it’s the perfect time to review our top five data management tips for undergrads! As an undergraduate, data management may not seem important, but giving it a few moments of your day will ensure your assignments are safe, even in the face of a hard drive meltdown the night before a due date.

If keeping your final projects safe isn’t enough of an incentive, there is one more. You have undergraduate publishing opportunities. As you learn and grow as a researcher, you can publish your work in a number of undergraduate research journals. Practicing good data management will help keep your research reproducible, understandable, findable, and organized for when you submit your work to a journal.

 

1) Clear, consistent file naming and structure

Or, put another way, know where your data lives. Keep file names short and simple, but descriptive. Include dates (in a standardized format) to version your files so that you can always go back to a previous copy in case of mistakes. Keep files in a consistent, clear structure with easy-to-follow labels (this may be date, file type, instrument, or analysis type) so that you will never misplace an important file.
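As an illustration of this tip, here is a minimal Python sketch of one possible naming scheme. It is only an example: the helper `versioned_name`, the project-description-date pattern, and the sample values are assumptions, not a required convention. It uses the ISO 8601 date format (YYYY-MM-DD), which has the side benefit that files sort chronologically when sorted by name.

```python
import re
from datetime import date

def versioned_name(project, description, day, ext):
    """Build a short, descriptive file name that ends in an ISO 8601 date."""
    def slug(text):
        # Lowercase the text and replace each run of other characters with a hyphen.
        return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    return f"{slug(project)}_{slug(description)}_{day.isoformat()}.{ext.lstrip('.')}"

# Hypothetical course project and file description.
print(versioned_name("Bio 152", "Enzyme Assay raw", date(2017, 11, 3), "csv"))
# prints: bio-152_enzyme-assay-raw_2017-11-03.csv
```

Saving a new dated copy each time you make major changes gives you a simple version history to fall back on.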


Tools: OnCore and REDCap

by Jan Cheetham


 

REDCap (Research Electronic Data Capture) and OnCore (Online Collaborative Environment) are clinical data management tools supported by the UW Institute for Clinical and Translational Research. See table for a comparison of the features of the two tools.

These systems are used by researchers who conduct clinical trials in the School of Medicine and Public Health and in other units. OnCore is required for some types of clinical protocols. The two systems are designed for use with clinical research data, including identifiable information about subjects. In both systems, data is entered in forms. OnCore provides standard forms for managing clinical trials that can be customized by ICTR staff. REDCap users create their own forms, and these can be used to collect survey data. Supporting files and documents in various formats can also be uploaded to both systems.

Security

Both the OnCore and REDCap systems are HIPAA compliant, employing secure networks, architectures, and appliances such as firewalls, routers, and gateways for routing data. Data in the systems are encrypted and all actions are tracked and audited. In addition, access to data centers is restricted to authorized personnel only.

Sharing

Both systems allow access by researchers at multiple study sites. Access rights can be specified for each user, to limit access to personal health information fields to specific individuals, or to allow only some users to enter data and others to electronically sign/verify and lock records.

Tracking Changes/Versions

The audit trail for OnCore is not complete for all consoles within the system. The EDC system does have a decent audit trail for data entered into forms, and the trail for each individual subject can be printed, but there is no summary export of the full audit trail for all subjects.

Data Documentation

Both systems allow upload of supporting documents describing the data and collection methods, such as data dictionaries, code books, protocols, etc. OnCore has a general location to upload protocol documents, but, unlike REDCap, it does not allow data to be imported.

Data Quality Controls

In both systems, forms can include several measures that enhance the accuracy/validity of data entered in forms. These include field notes describing allowed data values and field validation settings that limit data entry to specified ranges of values. Data quality rules can also be applied to search for missing values and empty fields in forms. In addition, data records can be verified and locked.
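To make the idea of range validation and missing-value checks concrete, here is a small Python sketch. It is not the REDCap or OnCore interface, where these rules are configured in the forms themselves; the function `check_record`, the field names, and the allowed ranges are all hypothetical.

```python
def check_record(record, rules):
    """Return a list of problems found in one data record.

    rules maps a field name to its allowed (minimum, maximum) range.
    """
    problems = []
    for field, (low, high) in rules.items():
        value = record.get(field)
        if value is None or value == "":
            problems.append(f"{field}: missing value")
        elif not low <= value <= high:
            problems.append(f"{field}: {value} outside allowed range {low}-{high}")
    return problems

# Hypothetical validation rules and one hypothetical record.
rules = {"age_years": (18, 90), "systolic_bp": (60, 250)}
record = {"age_years": 17, "systolic_bp": None}

for problem in check_record(record, rules):
    print(problem)
# prints:
# age_years: 17 outside allowed range 18-90
# systolic_bp: missing value
```

Applying the same rules to every record before it is locked is what keeps out-of-range and empty fields from reaching the analysis stage.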

Exporting

Both systems allow export of data in a variety of formats for use in statistical software, such as Excel, SAS, SPSS, and others.

Information updated 8/2/17 by Allan Barclay with assistance from ICTR.

Let’s Talk About Storage

By Luke Bluma, IT Engagement Manager for the Campus Computing Infrastructure (CCI)

Data is a critical part of our lives here at UW-Madison. We collect, analyze, and share data every day to get our jobs done. Data comes in all shapes and sizes and it needs the right place to live. That’s where storage comes in.

However, storage can be a loaded term. It can mean a thumb drive, or your computer’s hard drive, or storage that is accessed via a server or cloud storage or a large campus-wide storage service. It is all of these things, but not all of these will fit your needs. Your needs are what matters and they will drive what solution(s) will work for you.

I am the Engagement Manager for the Campus Computing Infrastructure (CCI) initiative. I work with campus partners on their data center, server, storage and/or backup needs. Storage is currently a big focus for me, so I wanted to share some thoughts about evaluating potential storage solutions.

Storage Array in Data Center

Storage for CCI

The main areas to think about are:

  • What kinds of data are you working with?
  • What are your “must-haves”?
  • What storage options are available at UW-Madison?

What kinds of data are you working with?

This is the first big question you want to focus on because it drastically impacts what options are available to you. Are you working with FERPA data, sensitive data, restricted data, PCI data, etc.? Each of these will impact what service(s) you can or can’t utilize. For more information on Restricted Data see: https://www.cio.wisc.edu/security/about/campus-initiatives/restricted-data-security-standards/

What are your “must-haves”?

Once you have identified the types of data you are working with, it is crucial to determine your must-have requirements for a storage solution. Does it need to be secure? If so, how secure? Does it need to be accessed by people outside of UW-Madison? Does it need to be high performance storage? Does it need to scale to 20+ TB? Does it need to be accessible via the web? These are just example questions, and the key here is that there is no perfect storage solution. Some services do X, Y, Z and others do X, Y, A but not Z. So determining your “must-haves” will help you figure out which services you can work with, and which you can’t.
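The X, Y, Z comparison above can be treated as a simple filter: a service is a candidate only if it offers every one of your must-haves. The Python sketch below shows that reasoning with made-up data; the service names and feature labels are hypothetical placeholders, not actual UW-Madison offerings.

```python
def matching_services(services, must_haves):
    """Return the names of services that offer every must-have feature."""
    return sorted(
        name for name, features in services.items()
        if must_haves <= features  # subset test: all must-haves are present
    )

# Hypothetical services and the features each one offers.
services = {
    "Service A": {"secure", "web access", "scales to 20+ TB"},
    "Service B": {"secure", "web access", "external sharing"},
}
must_haves = {"secure", "scales to 20+ TB"}

print(matching_services(services, must_haves))  # prints: ['Service A']
```

Anything you would merely like to have stays out of `must_haves`, so it narrows your choice only after the hard requirements are met.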

What storage options are available at UW-Madison?

Now that you have identified the kinds of data and the “must-haves” for your solution, the final step is to evaluate what storage options are available to you at UW-Madison. Storage is an evolving technology, so specific services will change over time, but here are good places to start to learn more about what services are available to you:

  • Local IT – if you have a local IT group, then talk to them first about what local options may be available to you
  • Campus Computing Infrastructure (CCI) – if you need network storage or server storage that isn’t focused on high performance computing then CCI has several options that could work depending on your needs
  • Advanced Computing Initiative (ACI) – if you need to do high performance or high throughput computing then ACI has several options that could work depending on your needs
  • Division of Information Technology (DoIT) – if you need cloud storage, like Box.com, or local storage, like an external hard drive, then DoIT has solutions that could work for you as well

This can seem like a lot to think about, and to be honest it can be quite confusing at times. The good news is that you have help! Research Data Services (RDS) can be a great starting point for your storage needs. We can focus on the key question: what are you looking to do? Then we can help you evaluate some potential options for moving forward based on your needs.

To get started, contact RDS at http://researchdata.wisc.edu/help/contact-us/ or contact me at cci@cio.wisc.edu