Tools: Google Takeout

UW-Madison has implemented a utility for exporting the files in your UW Google Drive account (as well as YouTube and Google Contacts) in one step. This is useful for archiving files in your account if you are leaving the University or if you want a copy of the files to place in another location. Takeout doesn’t delete the files; it creates a copy, so if you need to delete them, you will need to do that directly in Google Drive.

See Exporting Data Using Google Takeout for instructions on how to do this.

Google Takeout creates a zipped folder named <yourNetID>@wisc.edu-takeout.zip, which you can download from the browser.
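Before relying on the download as an archive copy, it can be worth checking that the zip file opens cleanly and holds roughly what you expect. The following is a minimal Python sketch, not part of the UW instructions; the function name and the example file name are assumptions made for illustration.

```python
import zipfile
from pathlib import Path


def summarize_takeout(zip_path):
    """Return (file_count, total_uncompressed_bytes, first_corrupt_member_or_None)."""
    with zipfile.ZipFile(zip_path) as archive:
        # testzip() reads every member and returns the name of the first
        # one whose checksum fails, or None if all members are readable.
        corrupt = archive.testzip()
        files = [info for info in archive.infolist() if not info.is_dir()]
        total_bytes = sum(info.file_size for info in files)
    return len(files), total_bytes, corrupt


if __name__ == "__main__":
    # Hypothetical file name following the naming pattern described above.
    path = Path("yourNetID@wisc.edu-takeout.zip")
    if path.exists():
        count, size, corrupt = summarize_takeout(path)
        print(f"{count} files, {size} bytes, corrupt member: {corrupt}")
```

If the third value is not None, the download is likely incomplete or damaged and should be repeated.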

 

Tools: Electronic Lab Notebooks

What are they?

Electronic Lab Notebooks (ELNs) are software counterparts to paper lab notebooks. Although the name suggests a physical notebook device, ELNs are simply software that runs on a computer; some also have apps for tablets and phones. The ELN interface resembles a notebook page, with fields for creating text entries and for attaching and annotating data files. Most allow you to create and modify templates for frequently used protocols. Other functionalities may include inventory tools that allow you to track samples and reagents. Some have chemistry tools for drawing and searching on chemical structures.

UW-Madison has piloted a couple of ELNs (http://academictech.doit.wisc.edu/ideas/electronic-lab-notebooks) and we are currently evaluating a few others that have recently emerged. Interest in ELNs at the UW has grown over the past 2 years from a few interested researchers at the start of the pilot to hundreds. In response to this interest, a campus-wide effort spearheaded by the Office of the CIO, WARF, CALS, and DoIT is seeking software, funding, and infrastructure to establish an enterprise ELN service.

Are ELNs useful for data management?

In general, ELNs let you keep data and descriptive information (e.g. materials, methods, analysis, and interpretations of the data) in digital form and all in one place. (An exception may be digital data files that are too large to upload/attach to the ELN. In that case, links can be added in the ELN to the server location of these files.) For simplicity in the following discussion, we’ll refer to both data files and the descriptive information as “data,” although the descriptive information might more accurately be called metadata.

Storing data in ELNs

From a data storage perspective, ELNs come in two flavors: those that can be locally hosted and those that are hosted, and store their data, in the cloud.

Locally hosted ELNs have the advantage of keeping data on campus servers. However, they usually consist of application, file system, and database layers and can be fairly complex for individual labs and departments to install, administer, and maintain. An enterprise-level ELN service could offer some economies of scale by supplying the common hardware and services needed for a locally hosted ELN.

Other ELNs are cloud services that store data in the cloud, external to UW servers. However, moving research data to the cloud raises many questions about security, protection of intellectual property, and other issues. An advantage of exploring an enterprise cloud ELN would be that the university could, through a purchasing exercise, negotiate favorable terms with vendors to secure agreements about the geographic location of cloud servers, segmentation of our data from that of other institutions, encryption and advanced security provisions, etc.

Organizing and Tracking Versions of Files

When it comes to keeping an audit trail and preserving versioning information for entries and attached data, ELNs have a lot going for them. ELNs log time, date, user names and actions, and track version information. In many ELNs, these audit trails are designed to meet Federal government requirements for electronic records: the Code of Federal Regulations Title 21, Part 11, Electronic Records; Electronic Signatures.

Sharing and Collaborating
Similar to collaborative tools such as Google Drive, ELNs let you tag, comment, define workgroups, and share entries, files, folders, and templates with specified groups and individuals. In addition, a few labs on campus have been showing data recorded in their ELN at lab meetings and find that this generally works well.

Most ELNs also have features that allow electronic signing and witnessing of entries, enabling a level of legal documentation needed for research leading to patents.

Exporting and Archiving
Most ELNs let you export (and print) entries as PDF files and export attached data files in their native formats. Some have XML and/or HTML export formats. However, there are no standards for exports that would make it easier to move records from one vendor’s ELN to another. We’d like to see such standards emerge.

One of the strongest arguments for using an ELN system is its promise as a solution for data stewardship. Since the campus data stewardship policy specifies that data be retained for at least 7 years (longer for some types of research), an enterprise ELN service would need to have a backend archival system. This would allow older data in the ELN to be moved to cheaper storage while still allowing search and retrieval from the archive with the appropriate access permissions in place. Researchers can also save ELN entries in PDF format and retain both printouts and electronic versions of these files.

Publishing
ELNs also offer the potential for publishing data. For example, one ELN provides a permanent digital object identifier (DOI) for each entry, and such DOIs have been used in a publication.

Tools: SpiderOak

What It Is: Cloud-based file storage, synchronization, and back-ups. SpiderOak is available on Windows, Linux, OS X, iOS, Android, and N900 Maemo.

Cost: Free, premium, and enterprise accounts available. The pricing for storage is better than Dropbox’s: $10/month gets you 100GB at SpiderOak vs. 50GB from Dropbox. SpiderOak also has no maximum storage limit. Additionally, it offers a 50% educational discount to anyone with a valid .edu email address.

Ease of Use: SpiderOak’s forte is security, not interface design. The web and mobile interfaces are fairly plain and not nearly as user-friendly as Dropbox’s interfaces. Additionally, while Dropbox has a very simple setup (everything goes in the Dropbox folder and syncs to all your devices unless you tell it not to), SpiderOak’s setup is a bit more involved. First, you need to set up a back-up. You can choose multiple folders and even specific types of files. After you’ve done this, you can sync the folders across your devices. Finally, access from the web and mobile interfaces is read-only. You can only upload files from the desktop client.

Sharing and Collaboration: SpiderOak provides ShareRooms, which allow you to selectively share folders (with anyone; not limited to other SpiderOak users), but the files are read-only. It also allows sharing of a single file, but this is read-only as well. The sharing is more secure: the ShareRoom is accessed through a unique URL, and a RoomKey (password) must be entered, but there is no mechanism for collaborative editing.

Organizing: Other than the traditional hierarchical file system structure, SpiderOak does not have any built-in organizational features.

Exporting: Files can easily be exported. Simply de-select the folders or files in question from the syncing and back-up.

Backups and Versioning: This is one area where SpiderOak does well. It saves all historical versions of a file and does extensive de-duplication, so only the parts that are different are saved, not the entire file.
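SpiderOak’s actual de-duplication scheme is not described here. As a rough illustration of the general idea only, the Python sketch below splits each version of a file into fixed-size blocks and stores a block only if an identical one has not been stored before. The block size, class name, and method names are all assumptions made for the example.

```python
import hashlib


class BlockStore:
    """Toy version store that keeps each unique block only once."""

    def __init__(self, block_size=4):
        self.block_size = block_size
        self.blocks = {}    # SHA-256 digest -> block bytes
        self.versions = []  # one list of digests per saved version

    def save_version(self, data: bytes) -> int:
        """Store a version; return how many new blocks had to be written."""
        new_blocks = 0
        digests = []
        for start in range(0, len(data), self.block_size):
            block = data[start:start + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            if digest not in self.blocks:
                self.blocks[digest] = block
                new_blocks += 1
            digests.append(digest)
        self.versions.append(digests)
        return new_blocks

    def restore(self, version_index: int) -> bytes:
        """Rebuild a saved version from its list of block digests."""
        return b"".join(self.blocks[d] for d in self.versions[version_index])


store = BlockStore(block_size=4)
first = store.save_version(b"AAAABBBBCCCC")   # all three blocks are new
second = store.save_version(b"AAAAXXXXCCCC")  # only the middle block is new
print(first, second)
```

Both versions remain fully restorable, yet the second one costs only one additional block of storage.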

Security: SpiderOak is, as Ars Technica puts it, “Dropbox for the security obsessive.” Its main selling point is not that it’s cloud storage, but that it is secure cloud storage. Unlike the other major cloud storage services, SpiderOak employees cannot access your files. Both Dropbox and SpiderOak encrypt their data, but SpiderOak also encrypts the decryption key. The downside to SpiderOak’s superior security is that if you forget your password, your files are gone.

 

Case Study: Box

I recently sat down with Breanne Litts, a doctoral candidate in Digital Media, Curriculum & Instruction, who has been using Box for file storage and collaboration for her research on learning in makerspaces.

Project needs:
The research project, Learning in the Making: Studying and Designing Makerspaces, is funded by the National Science Foundation.  Breanne and her advisor are collaborating with co-investigators from George Mason University and the Children’s Museum Pittsburgh.  Box appealed to them as a tool for file storage, sharing, and collaboration because it was free and supported cross-institutional collaboration.
The group is conducting ethnographic research at makerspaces in Madison, Detroit, and along the East Coast, with the goal of designing activities for the Makeshop in Pittsburgh.  They are conducting interviews and generating video and large audio files, as well as meeting notes and other documentation related to the research.  They also do brainstorming and initial analysis in Box.  There are eight individuals working on this project, including undergraduate students, so another requirement for their data management tool was the ability to grant differential access privileges.  They organize files using Box’s folder system: a main folder, a public folder, a private folder in which their sensitive data is stored, and a separate folder for each research site.

Favorite features:
Storage and sharing – The group creates Word documents and Google Docs right in Box and appreciates the ability to lock open files to prevent conflicting copies.  This feature is also available on the mobile app.  The previews for documents, audio, and photos are “fantastic”, and the folder system for organization, tagging capability, and search feature are helpful.  Breanne expressed the opinion that the 50 GB of free storage that UW affiliates have access to will be a huge draw for graduate students.
Security – Box makes it easy to comply with IRB requirements regarding access to sensitive information.  In fact, the biggest attraction of Box was that it meets NSF and IRB standards for secure data management.  The ability to create, open, edit, and save directly to Box and not on your machine adds to this security.
Permissions – It’s simple to manage permissions of each individual file, unlike other project management tools the group looked into, which required users to go through an administrator.
Collaboration – Comments, tasks, and discussion features facilitate cross-institution, cross-country collaboration, making it easy to communicate while minimizing the need to email.  The group also found it easy to control email notifications to avoid being overwhelmed, compared to other project management tools.  The ability to link directly to files and folders is very convenient, as is the ability to track changes and revert to previous versions.
Overall, Breanne felt that it was easy to get started with Box.  There’s a low barrier to entry: one can use it without exploiting its total functionality and start getting things done without being overwhelmed.  In contrast, other tools the group considered require too many decisions to set up, as well as requiring meetings with an administrator.  Box offers collaborative teams autonomy, flexibility, and adaptability.
She’s found it to be a great tool for project and data management and collaboration and described it as “Facebook, Dropbox, and a project management tool in one!”  She feels that it does data management, as well as day-to-day project management, better than other tools.

Tools: Box

www.box.com

What it is

Box is a cloud-based file storage, synchronization, and collaboration service.  It can be used by groups for storing, creating, editing, and sharing data files, documents, and other digital objects.

Cost

A personal account offers 5 GB of storage and a 100 MB file size limit for free; you can pay for more storage and larger file limits.  Business and enterprise accounts are also available for a fee.  UW-Madison affiliates can participate in the Box pilot, which includes 50 GB of storage.

Sharing and collaboration

Users can share folders with other users, share files with individuals without Box accounts, and embed files in websites.  In addition to sharing and editing files, users can post comments and discussions and assign tasks.  You can also lock files while you are editing them.  Email notifications keep you updated about edits, comments, tasks, and uploads.  In addition to the web interface, there are desktop clients for Windows and Mac as well as apps for Windows, Android, and Apple phones and tablets.

If you are the owner of a Box account used for collaboration, keep any private files you also store there separate from the files the group sees. One way to do this is to set up separate folders for shared and private at the top level of your folder structure and ensure that files always go in the appropriate area.

Organizing

You can’t readily make links between documents in Box like you can on a wiki or website; putting related documents together in folders is the most straightforward way to associate them in Box. Before your group starts adding documents to Box, collaborators should agree on a folder hierarchy structure so that all parties know where everything goes and related files/docs can be tied together by their folder location.

Exporting

To export files in bulk from Box, download the folders that contain them. It will be more convenient to do this if you have a well-organized hierarchy of nested folders rather than multiple folders at the top of your hierarchy. If you have installed Box Sync on your computer, you can also duplicate the files in your Box Documents folder and save them in another location on your computer.

Box does not give you options for exporting your documents in new formats. If you need to convert a file to a different format, you will need to do this in an application that can open and read the file.

If you have important information about what was done to create different versions of files that you need to preserve, there are currently only a couple of options for getting this information out of Box:

  • Take screenshots of screens in Box that display the comments, version information, discussions, etc.
  • Copy and paste the text from these items into a “read me” file.

Be sure to give the files you create using either of these methods meaningful names so you know which files they are describing. Save them alongside those files in the same folders.
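As one possible way to keep such “read me” files consistently named, the Python sketch below derives the read-me name from the file it describes and writes the pasted notes into the same folder. The `_README.txt` suffix and both function names are assumptions for illustration, not a Box convention.

```python
from pathlib import Path


def readme_path_for(data_file):
    """Name the read-me after the file it describes, in the same folder."""
    data_file = Path(data_file)
    return data_file.with_name(f"{data_file.stem}_README.txt")


def write_readme(data_file, notes):
    """Write pasted comments or version notes next to the data file."""
    target = readme_path_for(data_file)
    lines = [f"Describes: {Path(data_file).name}", ""]
    lines.extend(notes)
    target.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return target
```

For a hypothetical file site1_notes.docx, calling write_readme("site1_notes.docx", ["v2: added coding scheme"]) would create site1_notes_README.txt in the same folder.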

Archiving

It is always a good idea to keep multiple copies of your data in several secure places and in formats that will be usable over a long period. You can download folders in Box and save them in other locations. But will this preserve everything you need? As noted under Exporting, comments, version information, and discussions are not carried along with the downloaded files.
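Whatever else a download preserves, you can at least confirm that each copy you keep is identical to the folder you downloaded. The following Python sketch compares two folders file by file using SHA-256 checksums; the function names are hypothetical, and it is a starting point under those assumptions rather than a complete archiving workflow.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatched_files(original_dir, copy_dir):
    """Return relative paths that are missing or differ in the copy."""
    original_dir, copy_dir = Path(original_dir), Path(copy_dir)
    problems = []
    for source in sorted(original_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(original_dir)
        duplicate = copy_dir / relative
        if not duplicate.is_file() or sha256_of(source) != sha256_of(duplicate):
            problems.append(relative.as_posix())
    return problems
```

An empty list means every file in the original folder has an identical counterpart in the copy; it does not check for extra files that exist only in the copy.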

Backups and Versioning

Box keeps a record of file versions, and users can restore a previous version.

Security

Box is a member of the Cloud Security Alliance and has provided safeguards to help ensure HIPAA compliance.

 

 

Tools: Dropbox

https://www.dropbox.com/

What It Is: Cloud-based file storage and synchronization

Cost: Free, premium, and enterprise accounts available.

Ease of Use: With its focus on appealing to a broad audience of users, the interface is designed to be simple and the software is engineered to be easy for the average person to install and use.

Sharing and Collaboration: Users can share folders or individual files; items can also be shared with people who do not have Dropbox accounts. The software is also available on many platforms; in addition to its web interface, there are desktop clients for Windows, Mac, Linux, and the major mobile devices as well. Dropbox’s focus on mass appeal and cross-platform availability makes it a good fit for both groups with users who have varying levels of technical expertise and groups whose members use different operating systems.

Organizing: Other than the traditional hierarchical filesystem structure, Dropbox does not have any built-in organizational features. There are add-ons for tagging and attaching notes, but that isn’t a particularly active area among the third-party development community and the few available add-ons are still in alpha and beta-testing phases. For group collaboration with Dropbox, maintaining findability of files is dependent upon all group members following the same folder-organization and file-naming conventions.

Exporting: Dropbox also shines in its ability to easily export the files put into it. Items can simply be dragged and dropped to another folder on the computer.

Backups and Versioning: Two areas in which Dropbox does quite well are versioning and backups. It automatically creates conflicted copies of files that have been edited by multiple people at the same time, and all free accounts come with a 30-day file history, meaning that you can recover deleted files and revert to old versions of files from within the previous 30 days. Indefinite versioning and file history are available as a paid add-on, but that service is not retroactive, meaning that if you purchase it, the indefinite history will only apply to changes made after you’ve purchased the add-on.

Security: Lack of adequate security has been brought up as a criticism by Dropbox’s detractors. While there are both third-party apps and a number of DIY methods to make individual Dropbox accounts more secure, if you are working with sensitive data and/or things which are covered by legislation such as HIPAA or FERPA, you’ll want to look for a syncing solution that places a higher priority on security.

Tools: Google Drive

Using Google Drive to manage research data

Sharing

Google Drive can be used by groups for creating, editing, and sharing data files, documents, and other digital objects. To be an effective steward of your data in this collaborative environment:

Use your UW Google Apps Drive area to protect your intellectual property

UW-Madison and Google have an agreement that protects your intellectual property in the UW Google Apps area of Google Drive, i.e. the area you access using your NetID/password. This agreement does NOT extend to your private Google Drive, which you access using your Gmail account. It also does not extend to collaborators from outside UW-Madison.

Have a schema for organizing folders

Links between Google docs aren’t preserved as relative links when you export the files like they are when you export Wiki pages. Therefore, collaborators should agree on a folder hierarchy structure so that all parties know where everything goes and related files/docs can be tied together by their folder location.

Maintain separate private and sharing areas

If you are the owner of a Google Drive used for collaboration, keep any private files you also store there separate from the files the group sees. One way to do this is to set up separate folders for shared and private at the top level of your folder structure and ensure that files always go in the appropriate area.

Maintain your own copies of files shared with you in Google Drive

Each UW Google Drive space is connected to an individual. If that individual leaves the UW, their account will be de-activated and all collaborators will lose access to the Google Drive space. For that reason, all collaborators should keep their own copies of shared files.

File versioning

You can track version information, including person, date, time, etc., of both Google docs (.gdoc, .gsheet, etc.) and other file types in Google Drive. For a Google doc, you can step through revision changes by using “See revision history.” For a non-Google doc, use “Manage revisions” to see and download current and past versions. There are a few things to keep in mind to get the most out of version tracking:

Revision history is lost when you export

Time-stamp and author information are not retained outside of Google Drive. If you will be exporting your files to archive them elsewhere and version information needs to be preserved, use another versioning method, such as a file-naming convention, while you are working with the files in Google Drive.
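One way to make a file-naming convention easy to follow is to generate the names rather than type them. The Python sketch below builds names of the form base_date_version_initials; this particular pattern, the function name, and the example values are assumptions for illustration, and your group may prefer a different scheme.

```python
import re
from datetime import date
from pathlib import Path


def versioned_name(filename, version, author_initials, on_date=None):
    """Build a name such as 'Survey-results_2014-03-05_v02_ab.csv'."""
    on_date = on_date or date.today()
    path = Path(filename)
    # Keep only letters, digits, and hyphens in the base name so it is
    # safe on most file systems.
    base = re.sub(r"[^A-Za-z0-9-]+", "-", path.stem).strip("-")
    stamp = on_date.isoformat()  # ISO dates sort chronologically by name
    return f"{base}_{stamp}_v{version:02d}_{author_initials.lower()}{path.suffix}"


print(versioned_name("Survey results.csv", 2, "AB", date(2014, 3, 5)))
```

Because the date and a zero-padded version number are embedded in the name, the lineage of a file stays readable after export, when Google Drive’s own revision history is no longer available.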

Watch out for auto-delete

By default, Google Drive deletes older versions of non-Google docs files after 30 days or 100 revisions, unless you specify that a revision should not be auto-deleted. If you need to keep older versions but don’t want them using your space on Google Drive, you’ll need to download the files and store them elsewhere.

Beware of the potential for version divergence with “Export to Google Docs”

When you use “Export to Google Docs,” you convert a file to a Google docs version that you can edit directly in Google Drive. Google Drive saves that new file under the same name as the original. However, any edits to this new file won’t be made to the original file or tracked in its revision history. This has the potential to create a new lineage of the file, which could lead to confusion if version information is important.

Archiving

It is always a good idea to keep multiple copies of your data in several secure places and in formats that will be usable over a long period. Google Drive has many options for exporting files and folders and for converting Google docs to more sustainable formats (such as PDF, OpenOffice, etc.) for long-term archiving elsewhere. But will exporting preserve everything you need?

Comments are not included in all types of exports

If the comments your group has created in Google docs are important documentation of your research workflow, you can preserve them by exporting as a Word document. Other export formats don’t preserve comments.

Version and revision information are not exported

See the discussion about versioning above.

Formulas in spreadsheets are retained in some export formats

Exporting Google spreadsheets in Excel or OpenOffice formats will preserve formulas; exporting as .csv will not.