Link Roundup March 2020


Cameron Cook

In the wake of changes in the pricing structure and licensing policies of the Box cloud storage system, UW-Madison has created a task force to evaluate Box usage on campus. The task force will host listening sessions and asks for your input through its Box Evaluation Survey.

DoIT provides some tips for keeping your data safe before, during, and after travel.

Jennifer Patiño

Saturday March 7 is Open Data Day, an annual celebration of open data all over the world!

Kent Emerson

Several UW-Madison researchers are at the forefront of efforts to understand and treat the coronavirus, as well as how to prepare for and prevent similar outbreaks in the future. The researchers stress the importance of public data sharing to help find effective treatments more quickly and efficiently.

Version Control for Research Projects

Working on individual or collaborative projects of any size requires keeping your files organized. Inconsistent file management can result in lost work, redundancies, errors in the final products, or difficulty for others building on your work later on. One of the foundational practices for ensuring you keep your files organized is to use version control conventions and tools. Choosing the appropriate service depends on many factors including the types of files and data you are using and producing, the size of your team, and the frequency with which changes are made to your files. 
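As one illustration of a version control convention, here is a minimal Python sketch of a file naming scheme that combines a date, a short descriptive name, and a zero-padded version number. The pattern itself (date_name_vNN.ext) and the helper names `versioned_name` and `next_version` are assumptions made for this example, not a campus standard; adapt them to whatever your team agrees on.

```python
import re
from datetime import date
from typing import List


def versioned_name(stem: str, version: int, ext: str, on: date) -> str:
    """Build a name such as 2020-03-07_interview-notes_v03.txt."""
    # Lowercase the stem and collapse anything that is not a letter or
    # digit into single hyphens, so names sort and search predictably.
    slug = re.sub(r"[^a-z0-9]+", "-", stem.lower()).strip("-")
    return f"{on.isoformat()}_{slug}_v{version:02d}.{ext.lstrip('.')}"


def next_version(existing: List[str]) -> int:
    """Return one more than the highest _vNN found in the given names."""
    versions = []
    for name in existing:
        match = re.search(r"_v(\d+)\.", name)
        if match:
            versions.append(int(match.group(1)))
    return max(versions, default=0) + 1


print(versioned_name("Interview Notes", 3, "txt", date(2020, 3, 7)))
# prints 2020-03-07_interview-notes_v03.txt
```

A naming convention like this works for any file type, but it relies on everyone following it by hand. Dedicated version control tools track changes automatically and are usually the better choice for code and other text files.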

When considering the proper platform for maintaining your research files it is important to understand your responsibilities for identifying, transmitting, redistributing, storing or disposing of sensitive information. For more information, refer to UW-Madison’s guide to handling sensitive data, and the UW System policy.


An Introduction to Web Archiving for Research

Web archiving is the practice of collecting and preserving resources from the web. The best-known and most widely used web archive is the Internet Archive’s Wayback Machine. Brewster Kahle launched the Internet Archive in 1996 with the mission of providing “Universal Access to All Knowledge.” The Wayback Machine uses an automated process called crawling to collect pages from all over the web and stores them on servers at the Internet Archive headquarters in San Francisco.
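If you want to check from a script whether the Wayback Machine already holds a capture of a page, the Internet Archive offers a public availability lookup. The Python sketch below only builds the query URL and reads a response; it makes no network request. The endpoint and the response fields are written as we understand them from the Internet Archive’s public documentation, so verify them before relying on this. The sample response is hand-written for illustration and is not a real capture record.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint of the Wayback Machine availability lookup.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query(url: str, timestamp: str = "") -> str:
    """Build the lookup URL; timestamp is an optional YYYYMMDD string."""
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(response_text: str) -> Optional[str]:
    """Return the URL of the closest capture, or None if there is none."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


# Hand-written sample response, for illustration only.
sample = json.dumps({
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20200307000000/http://example.com/",
            "timestamp": "20200307000000",
            "status": "200",
        }
    }
})

print(availability_query("example.com", "20200307"))
print(closest_snapshot(sample))
```

To run this against the live service, you would fetch the URL returned by `availability_query` with the HTTP client of your choice and pass the response body to `closest_snapshot`.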

Institutions such as government agencies, universities, and libraries also actively archive the web, but often with narrower collection scope. There are also many web archiving projects run by smaller teams and individual researchers, and these too usually have specific areas of focus. If there are web resources you are interested in collecting and preserving, with a little research and learning of the tools, you can absolutely create your own web archive. 

Please be advised that if you are archiving web pages, forums, social media, or other web materials for research purposes, and the work may constitute human subjects research, you must consult with and follow the appropriate UW-Madison Institutional Review Board process as well as its guidelines on “Technology & New Media Research.”

Tools: Google Takeout

UW-Madison has implemented a utility for exporting your files out of your UW Google Drive account (as well as YouTube and Google Contacts) in one step. This is useful for archiving files in your account if you are leaving the University or if you want a copy of the files to place in another location. Takeout doesn’t delete the files; it creates a copy, so if you need to delete them, you will need to do that directly in Google Drive.

See Exporting Data Using Google Takeout for instructions on how to do this.

Google Takeout creates a zipped folder, named <yourNetID>, which you can download from the browser.
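Before you delete anything from Google Drive, it is worth confirming that the downloaded archive is complete and readable. Here is a minimal sketch using Python’s standard `zipfile` module. The function name `summarize_archive` and the file paths in the demo are hypothetical; the demo builds a tiny archive in memory so the example runs without a real export.

```python
import io
import zipfile


def summarize_archive(zip_source):
    """Return (file_count, total_uncompressed_bytes, first_bad_file)."""
    with zipfile.ZipFile(zip_source) as archive:
        files = [info for info in archive.infolist() if not info.is_dir()]
        total = sum(info.file_size for info in files)
        # testzip() re-reads every member and returns the name of the
        # first one that fails its CRC check, or None if all are intact.
        bad = archive.testzip()
    return len(files), total, bad


# Demo: build a small stand-in archive in memory (paths are made up).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("Takeout/Drive/notes.txt", "hello")
    demo.writestr("Takeout/Drive/data.csv", "a,b\n1,2\n")
buffer.seek(0)

print(summarize_archive(buffer))  # prints (2, 13, None)
```

For a real export, pass the path of the downloaded file instead of the in-memory buffer, and compare the file count against what you expect from your Drive account.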


Tools: Electronic Lab Notebooks

What are they?

Electronic Lab Notebooks (ELNs) are software counterparts to paper lab notebooks. The name suggests a physical notebook device, but ELNs are simply software that runs on a computer, and some also have apps for tablets and phones. The ELN interface resembles a notebook page, with fields for creating text entries and for attaching and annotating data files. Most allow you to create and modify templates for frequently used protocols. Other functionality may include inventory tools that let you track samples and reagents. Some have chemistry tools for drawing and searching on chemical structures.

UW-Madison has piloted a couple of ELNs, and we are currently evaluating a few others that have recently emerged. Interest in ELNs at the UW has grown over the past two years from a few interested researchers at the start of the pilot to hundreds. In response to this interest, a campus-wide effort spearheaded by the Office of the CIO, WARF, CALS, and DoIT is seeking software, funding, and infrastructure to establish an enterprise ELN service.

Are ELNs useful for data management?

In general, ELNs let you keep data and descriptive information (e.g. materials, methods, analysis, and interpretations of the data) in digital form and all in one place. (An exception may be digital data files that are too large to upload/attach to the ELN. In that case, links can be added in the ELN to the server location of these files.) For simplicity in the following discussion, we’ll refer to both data files and the descriptive information as “data,” although the descriptive information might more accurately be called metadata.

Storing data in ELNs

From a data storage perspective, ELNs come in two flavors: those that can be locally hosted and those that are hosted, and store data, in the cloud.

Locally hosted ELNs have the advantage of keeping data on campus servers. However, they usually consist of application, file system, and database layers and can be fairly complex for individual labs and departments to install, administer, and maintain. An enterprise-level ELN service could offer economies of scale by supplying a common infrastructure with the hardware and services needed for a locally hosted ELN.

Other ELNs are cloud services that store data in the cloud, external to UW servers. However, there are a lot of questions about security, protection of intellectual property, and other issues when research data is moved to the cloud. An advantage of exploring an enterprise cloud ELN would be that the university could, through a purchasing exercise, negotiate favorable terms with vendors to secure agreements about the geographic location of cloud servers, segmentation of our data from that of other institutions, encryption and advanced security provisions, etc.

Organizing and Tracking Versions of Files

When it comes to keeping an audit trail and preserving versioning information for entries and attached data, ELNs have a lot going for them. ELNs log time, date, user names, and actions, and track version information. In many ELNs, these audit trails are designed to meet Federal government requirements for electronic records: the Code of Federal Regulations Title 21, Part 11, Electronic Records; Electronic Signatures.
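To make the idea of an audit trail concrete, here is a minimal Python sketch of an append-only history in which every save records who did what, when, and under which version number. The class and field names are hypothetical and do not reflect any vendor’s data model, and a structure like this does not by itself satisfy any regulatory requirement.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class AuditRecord:
    """One immutable line of the audit trail."""
    timestamp: datetime
    user: str
    action: str
    version: int
    content: str


@dataclass
class NotebookEntry:
    title: str
    history: List[AuditRecord] = field(default_factory=list)

    def save(self, user: str, content: str) -> AuditRecord:
        """Append a new version; earlier versions are never overwritten."""
        action = "create" if not self.history else "edit"
        record = AuditRecord(
            timestamp=datetime.now(timezone.utc),
            user=user,
            action=action,
            version=len(self.history) + 1,
            content=content,
        )
        self.history.append(record)
        return record

    def version(self, number: int) -> str:
        """Return the content as it stood at the given version number."""
        return self.history[number - 1].content


entry = NotebookEntry("Buffer preparation")
entry.save("researcher_a", "Mixed 50 mL of buffer.")
entry.save("researcher_b", "Mixed 50 mL of buffer; pH adjusted to 7.4.")

for record in entry.history:
    print(record.version, record.user, record.action)
```

Because each save appends rather than replaces, any earlier version can still be retrieved with `entry.version(n)`.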

Sharing and Collaborating
Similar to collaborative tools such as Google Drive, ELNs let you tag, comment, define workgroups, and share entries, files, folders, and templates with specified groups and individuals. In addition, a few labs on campus have been showing data recorded in their ELN at lab meetings and find that this generally works well.

Most ELNs also have features that allow electronic signing and witnessing of entries, enabling a level of legal documentation needed for research leading to patents.

Exporting and Archiving
Most ELNs let you export (and print) entries as PDF files and export attached data files in their native formats. Some have XML and/or HTML export formats. However, there are no export standards that would make it easier to move records from one vendor’s ELN to another. We’d like to see that happen.

One of the strongest arguments for using an ELN system is its promise as a solution for data stewardship. Since the campus data stewardship policy specifies that data be retained for at least 7 years (longer for some types of research), an enterprise ELN service would need to have a backend archival system. This would allow older data in the ELN to be moved to cheaper storage while still allowing search and retrieval from the archive with the appropriate access permissions in place. Researchers can also save ELN entries in PDF format and retain both printouts and electronic versions of these files.

ELNs also offer the potential for publishing data. For example, one ELN provides a permanent digital object identifier (DOI) for each entry, and such DOIs have been used in a publication.

Tools: SpiderOak

What It Is: Cloud-based file storage, synchronization, and back-ups. SpiderOak is available on Windows, Linux, OS X, iOS, Android, and N900 Maemo.

Cost: Free, premium, and enterprise accounts available. The pricing for storage is better than Dropbox’s: $10/month gets you 100GB at SpiderOak vs. 50GB from Dropbox. SpiderOak also has no maximum storage limit. Additionally, it offers a 50% educational discount to anyone with a valid .edu email address.

Ease of Use: SpiderOak’s forte is security, not interface design. The web and mobile interfaces are fairly plain and not nearly as user-friendly as Dropbox’s interfaces. Additionally, while Dropbox has a very simple setup (everything goes in the Dropbox folder and syncs to all your devices unless you tell it not to), SpiderOak’s setup is a bit more involved. First, you need to set up a back-up. You can choose multiple folders and even specific types of files. After you’ve done this, you can sync the folders across your devices. Finally, access from the web and mobile interfaces is read-only. You can only upload files from the desktop client.

Sharing and Collaboration: SpiderOak provides ShareRooms, which allow you to selectively share folders (with anyone; not limited to other SpiderOak users), but the files are read-only. It also allows sharing of a single file, but this is read-only as well. The sharing is more secure: the ShareRoom is accessed through a unique URL and a RoomKey (password) must be entered, but there is no mechanism for collaborative editing.

Organizing: Other than the traditional hierarchical file system structure, SpiderOak does not have any built-in organizational features.

Exporting: Files can easily be exported. Simply de-select the folders or files in question from the syncing and back-up.

Backups and Versioning: This is one area where SpiderOak does well. It saves all historical versions of a file and does extensive de-duplication, so only the parts that are different are saved, not the entire file.
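The general idea behind de-duplication can be sketched in a few lines of Python. This is not SpiderOak’s actual algorithm, which we have not examined; it is a simplified illustration that splits each version into fixed-size chunks and stores each distinct chunk only once, keyed by its SHA-256 hash. The class name `ChunkStore` and the tiny chunk size are assumptions chosen to keep the demo readable.

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 4  # bytes; unrealistically small so the demo is easy to follow


class ChunkStore:
    """Stores each distinct chunk once, no matter how many versions use it."""

    def __init__(self) -> None:
        self.chunks: Dict[str, bytes] = {}

    def save_version(self, data: bytes) -> List[str]:
        """Store any new chunks and return the list of hashes (the recipe)."""
        recipe = []
        for start in range(0, len(data), CHUNK_SIZE):
            chunk = data[start:start + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # setdefault leaves an already stored chunk untouched.
            self.chunks.setdefault(digest, chunk)
            recipe.append(digest)
        return recipe

    def restore(self, recipe: List[str]) -> bytes:
        """Rebuild a version from its recipe."""
        return b"".join(self.chunks[digest] for digest in recipe)


store = ChunkStore()
v1 = store.save_version(b"AAAABBBBCCCC")
v2 = store.save_version(b"AAAAXXXXCCCC")  # only the middle chunk changed

print(len(store.chunks))  # prints 4: six chunk references, four stored
print(store.restore(v1))  # prints b'AAAABBBBCCCC'
```

Both versions remain fully restorable, yet the second one added only a single new chunk to storage.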

Security: SpiderOak is, as Ars Technica puts it, “Dropbox for the security obsessive.” Its main selling point is not that it’s cloud storage, but that it is secure cloud storage. Unlike the other major cloud storage services, SpiderOak employees cannot access your files. Both Dropbox and SpiderOak encrypt their data, but SpiderOak also encrypts the decryption key. The downside to SpiderOak’s superior security is that if you forget your password, your files are gone.
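Why would a forgotten password mean lost files? One common way to build this kind of system is to derive the key material from the password itself, so the provider never stores anything that can unlock your data. The Python sketch below shows that general idea using PBKDF2 from the standard library. It is an assumption-laden illustration, not a description of SpiderOak’s actual scheme; the function name, iteration count, and salt size are arbitrary choices for the example.

```python
import hashlib
import hmac
import os


def derive_key(password: str, salt: bytes, iterations: int = 100_000) -> bytes:
    """Derive a 32-byte key from a password with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )


# The salt is random but not secret; it is stored alongside the data.
salt = os.urandom(16)

key = derive_key("correct horse battery staple", salt)
same = derive_key("correct horse battery staple", salt)
wrong = derive_key("a guess", salt)

print(hmac.compare_digest(key, same))   # prints True
print(hmac.compare_digest(key, wrong))  # prints False
```

The same password and salt always reproduce the same key, while any other password yields a different one. If nobody but you knows the password, nobody else can re-derive the key, which is exactly why there is no password reset.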


Case Study: Box

I recently sat down with Breanne Litts, a doctoral candidate in Digital Media, Curriculum & Instruction, who has been using Box for file storage and collaboration for her research on learning in makerspaces.

Project needs:
The research project, Learning in the Making: Studying and Designing Makerspaces, is funded by the National Science Foundation. Breanne and her advisor are collaborating with co-investigators from George Mason University and the Children’s Museum of Pittsburgh. Box appealed to them as a tool for file storage, sharing, and collaboration because it was free and supported cross-institutional collaboration.
The group is conducting ethnographic research at makerspaces in Madison, Detroit, and along the east coast, with the goal of designing activities for the Makeshop in Pittsburgh. They are conducting interviews and generating video and large audio files, as well as meeting notes and other documentation related to the research. They also do brainstorming and initial analysis in Box. There are eight individuals working on this project, including undergraduate students, so another requirement for their data management tool was the ability to grant differential access privileges. They organize files using Box’s folder system and have a main folder, a public folder, a private folder in which their sensitive data is stored, and a folder for each research site.

Favorite features:
Storage and sharing – The group creates Word documents and Google Docs right in Box and appreciates the ability to lock open files to prevent conflicting copies.  This feature is also available on the mobile app.  The previews for documents, audio, and photos are “fantastic”, and the folder system for organization, tagging capability, and search feature are helpful.  Breanne expressed the opinion that the 50 GB of free storage that UW affiliates have access to will be a huge draw for graduate students.
Security – Box makes it easy to comply with IRB requirements regarding access to sensitive information.  In fact, the biggest attraction of Box was that it meets NSF and IRB standards for secure data management.  The ability to create, open, edit, and save directly to Box and not on your machine adds to this security.
Permissions – It’s simple to manage permissions of each individual file, unlike other project management tools the group looked into, which required users to go through an administrator.
Collaboration – Comments, tasks, and discussion features facilitate cross-institution, cross-country collaboration, making it easy to communicate while minimizing the need to email.  The group also found it easy to control email notifications to avoid being overwhelmed, compared to other project management tools.  The ability to link directly to files and folders is very convenient, as is the ability to track changes and revert to previous versions.
Overall, Breanne felt that it was easy to get started with Box.  There’s a low barrier to entry: one can use it without exploiting its total functionality and start getting things done without being overwhelmed.  In contrast, other tools the group considered require too many decisions to set up, as well as requiring meetings with an administrator.  Box offers collaborative teams autonomy, flexibility, and adaptability.
She’s found it to be a great tool for project and data management and collaboration and described it as “Facebook, Dropbox, and a project management tool in one!”  She feels that it does data management, as well as day-to-day project management, better than other tools.