Exploring the Power of Text Analysis with Gale Digital Scholar Lab

by Rahil Virani

At a recent event held by Gale for UW-Madison, participants explored investigative journalism and digital humanities using the Gale Digital Scholar Lab. The event, titled “Going Hands-on with the Gale Digital Scholar Lab: A Closer Look at the ‘Female Stunt Reporter’ Dataset,” gave attendees a chance to build new skills and prepare to apply text analysis techniques in their own projects and research.

Overview of the Dataset
The Female Stunt Reporters Dataset comprises 91 documents sourced from various archives including the Women’s Studies Archive, Nineteenth Century U.S. Newspapers, American Historical Periodicals, and others, covering the period from 1885 to 1925. These documents, compiled using Gale Primary Sources and Gale Digital Scholar Lab, offer a rich resource for exploring the roles and contributions of female journalists during the late 19th and early 20th centuries. The dataset represents an array of publications and perspectives, providing insight into the challenges, achievements, and evolving narratives surrounding women in journalism during this period. Researchers utilizing this dataset can delve into the experiences and impacts of female reporters, shedding light on their historical significance within the field.

Getting Started
Users could sign in with Google or Microsoft Office 365 accounts via NetID to access the lab and save their research securely in the Gale Digital Scholar Lab. Each project or workspace can be built from a different dataset, which keeps work organized. The Lab features 16 preinstalled datasets covering historical events and social movements, providing valuable resources for exploration and research. However, users should watch for incomplete data and biases, and should expect to clean the data before analysis. To learn how to create and set up an account, please click on this link.

Building Your Dataset
Building a Content Set in Gale Digital Scholar Lab involves two main steps: searching for documents and curating the results.

  1. Search: Search Gale Primary Sources, accessible through your institution’s library. Use keywords or terms relevant to your research interests to find documents. You can search based on words within documents or by utilizing metadata fields.
  2. Review and Curate Results: After searching, review the information of each document to assess its suitability for your Content Set. Consider factors such as relevance to your research questions, quality of content, and metadata associated with the document.

Once you’ve identified and curated the documents you want to analyze, add them to your Content Set for further exploration and analysis in Gale Digital Scholar Lab.
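The search-and-curate steps above can be loosely sketched in code. This is only an illustration of the idea, not how Gale Digital Scholar Lab works internally: the record fields (`title`, `year`, `text`), the sample records, and the `matches` helper are all hypothetical.

```python
# Hypothetical records standing in for search results; not real Gale data.
documents = [
    {"title": "Sample article A", "year": 1888, "text": "A stunt reporter enters the asylum."},
    {"title": "Sample article B", "year": 1930, "text": "A reporter covers the election."},
    {"title": "Sample article C", "year": 1901, "text": "Weather and shipping news."},
]


def matches(doc, keyword, start_year, end_year):
    """Return True if the document falls in the date range and mentions the keyword."""
    in_range = start_year <= doc["year"] <= end_year
    has_keyword = keyword.lower() in doc["text"].lower()
    return in_range and has_keyword


# Curate: keep only documents from 1885 to 1925 that mention "reporter".
content_set = [d for d in documents if matches(d, "reporter", 1885, 1925)]
print([d["title"] for d in content_set])
```

In the Lab itself, this filtering is done through the search interface and by reviewing each document by hand.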

Preparing Your Dataset
Preparing your dataset is a crucial step in text analysis, and Gale Digital Scholar Lab offers a cleaning feature to ensure your documents are formatted appropriately. By creating multiple cleaning configurations, users can tailor the cleaning process to specific analysis needs, such as removing unwanted words or characters that could skew results. The process involves creating a cleaning configuration, testing it on a subset of documents to confirm it works as intended, and then applying it during analysis. This keeps data preparation consistent, allowing researchers to focus on extracting insights rather than dealing with data inconsistencies.
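To make the idea of a cleaning configuration concrete, here is a minimal sketch in Python. It assumes a configuration is simply a set of options applied to each document; the option names (`lowercase`, `remove_punctuation`, `stop_words`) and the `clean_text` function are illustrative and are not the Lab's actual settings.

```python
import string

# Hypothetical cleaning configuration; keys are illustrative, not the Lab's options.
CLEANING_CONFIG = {
    "lowercase": True,
    "remove_punctuation": True,
    "stop_words": {"the", "a", "an", "of", "and", "to", "in"},
}


def clean_text(text, config):
    """Apply the configured cleaning steps to one document's text."""
    if config["lowercase"]:
        text = text.lower()
    if config["remove_punctuation"]:
        # Delete every ASCII punctuation character.
        text = text.translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace and drop any configured stop words.
    tokens = [t for t in text.split() if t not in config["stop_words"]]
    return " ".join(tokens)


# Test the configuration on a small sample before applying it more widely.
sample = "The Reporter went undercover, and the story shocked the City!"
print(clean_text(sample, CLEANING_CONFIG))
```

Trying the configuration on one or two sample documents first, as here, mirrors the "test on a subset" step described above.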

Analyzing Your Dataset
This phase lets you interrogate hundreds or thousands of documents using digital tools, a task that would be too time-consuming to do by hand. The Lab guides you through selecting the right tool by asking pertinent questions about your analysis goals, and it also covers setting up and running tools effectively to refine results. The available tools include Document Clustering for grouping documents based on similarity, Named Entity Recognition for extracting named entities such as people, places, and organizations, Ngrams for analyzing term frequencies, Parts of Speech for identifying grammatical components, Sentiment Analysis for gauging overall sentiment, and Topic Modeling for grouping frequently co-occurring terms into topics. Each tool offers a different view of the data, and together they support a comprehensive analysis of your dataset.
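As a rough illustration of what an Ngrams tool computes, the sketch below counts two-word phrases (bigrams) across a few documents using only the Python standard library. The `ngram_counts` function and the sample sentences are made up for this example, and the Lab's own tool may tokenize and count differently.

```python
from collections import Counter


def ngram_counts(text, n):
    """Count each run of n consecutive words in a text."""
    tokens = text.lower().split()
    # Zipping the token list against shifted copies of itself yields the n-grams.
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)


# Made-up sample sentences, not taken from the dataset.
docs = [
    "girl reporter goes undercover",
    "the girl reporter writes again",
]

# Add up the bigram counts across all documents.
totals = Counter()
for doc in docs:
    totals.update(ngram_counts(doc, 2))

print(totals.most_common(3))
```

Phrases that recur across documents rise to the top of the count, which is the kind of pattern term-frequency analysis is meant to surface.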

Overview – Gale Digital Scholar Lab
Overall, the event helped participants understand how Gale’s platform facilitates research and academic endeavors by integrating its extensive Primary Sources collections with open-source text mining and natural language processing tools. This Single Platform Text and Data Mining Environment streamlines content set creation, cleaning, parsing, and analysis, while optimized cloud-hosted data ensures accessibility and ease of use. To find out about future text analysis related events, please visit UW-Madison Libraries’ Data Services page.