Data Visualization: Choosing Tools and Workflows Across the Research Process

Introduction:

Data visualization can complement statistics and support your research process from analysis through publication. Visualization works with the human eye-brain system and can help viewers see relationships, patterns, and outliers in their data.

Data visualization as a broad term can refer to anything from a small bar graph with a few values to an elaborate poster-like display that integrates multiple graphs, maps, photographs, short annotations, and longer text.

Visualization tools and visualization types vary in how closely they align with data analysis tools. When choosing a tool and a workflow, a model developed in cartography can help connect the purpose of a visualization with your design and communication needs. Although the model was developed for mapping, it applies to other disciplines as a way to consider the audience and purpose of a visualization and to inform tool and workflow choices.

Model:

This model, proposed by DiBiase (1990), presents a research process with four stages:

  1. Exploration of data to reveal pertinent questions
  2. Confirmation of apparent relationships in the data in light of a formal hypothesis
  3. Synthesis or generalization of findings
  4. Presentation of the research at professional conferences and in scholarly publications

[Figure: DiBiase diagram]

DiBiase Model: Visual Thinking/Private Realm

The visual thinking, tools, and methods can change as your research moves through these stages. During the early stages of data analysis, visualizations may complement statistical methods and help you explore the data for patterns or outliers. You might not show these initial visualizations to anyone else, and not all of them will yield meaningful insights.

The early stages are typically private work, done by an individual or a small team of experts deeply involved with the research subject. At this stage, visualization tools should help you use your time efficiently, generating multiple visualizations with repeatable, documentable methods. Visual design elements, such as colors and graphic symbols, types of visualization, and levels of detail, can be chosen to help you identify patterns, similarities, and outliers. The audience is an individual researcher or small team already familiar with the data; the visualizations aren’t intended for a broader audience.

DiBiase Model: Visual Communication/Public Realm

As the research progresses, the work shifts as you begin to communicate ideas and results to colleagues and peers, and eventually to a broader public.

As your audience widens, the visualizations become a tool for communicating beyond the research team, possibly to an audience with less expertise in the field. Visualizations that were clear to experts might not be understood by a broader audience that lacks the same depth of knowledge or interest in the subject.

Graphic design elements become more important as you use visualizations to communicate research results to an external audience. Choices of chart type, level of detail, color, symbols, typography, labels, and annotation can affect how clearly the results come across.

A Simple Example

This simple example illustrates the distinction between an exploratory graph and a communication graph, along with potential tools and one example workflow.

Exploratory Graph:

[Figure: CO_In_obs]

This graph was produced in R, a language and environment that offers a wide variety of statistical and graphical techniques. R is available as free software, and its strengths include handling data, providing a programming language, and allowing users to define new functions. R is extensible through packages, many of which provide specialized functions for particular domains. For visualization in a research process, one strength of R is that graphs, whether one or many, are generated through scripts, and those scripts then document the data handling and visualization process.
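As a minimal sketch of that scripted approach, the R code below loops over several variables and writes one quick scatterplot per variable into a single multi-page PDF. The built-in mtcars dataset, the variable names, and the output file name are stand-ins chosen for illustration; they are not the data or script behind the graph shown above.

```r
# Stand-in data: R's built-in mtcars dataset (an assumption for illustration).
dat <- mtcars
response <- "mpg"
predictors <- c("wt", "hp", "disp")

# Open one PDF device; each plot() call below starts a new page.
out_file <- file.path(tempdir(), "exploratory_plots.pdf")
pdf(out_file, width = 6, height = 5)

for (p in predictors) {
  # Default-styled scatterplot: quick to produce, not polished.
  plot(dat[[p]], dat[[response]],
       xlab = p, ylab = response,
       main = paste(response, "vs", p))
  # A lowess smoother gives a rough sense of the trend.
  lines(lowess(dat[[p]], dat[[response]]), col = "red")
}

# Close the device so the file is written to disk.
dev.off()
```

Because the whole process lives in a script, rerunning it after a data update regenerates every graph the same way, and the script itself records how each graph was made.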

When a research project has progressed to the point of showing graphs to people beyond the researchers closely familiar with the data, the communication value of the graphs can be enhanced by moving beyond R’s default graph functions and using packages that offer additional graph functions.
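One such package is ggplot2, used here only as an example; the article does not say which package, if any, was used for its own graphs. The sketch below assumes ggplot2 is installed and again uses the built-in mtcars dataset as a stand-in. It adds a descriptive title, axis labels with units, and a simpler theme, which are the kinds of choices that matter more once a graph leaves the research team.

```r
# Assumes the ggplot2 package is installed: install.packages("ggplot2")
library(ggplot2)

# Stand-in data: mtcars (wt is weight in 1,000 lb, mpg is miles per gallon).
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(colour = "steelblue", size = 2.5) +
  # Straight-line fit to make the overall relationship easy to read.
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "grey30") +
  labs(title = "Heavier cars travel fewer miles per gallon",
       x = "Weight (1,000 lb)",
       y = "Fuel economy (miles per gallon)") +
  theme_minimal(base_size = 13)

# Save the finished graph; the file name is hypothetical.
plot_file <- file.path(tempdir(), "communication_plot.pdf")
ggsave(plot_file, plot = p, width = 6, height = 4)
```

The design choices stay in the script, so the polished graph remains as repeatable as the exploratory one.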

Communication Graph:

[Figure: CO-ObsInact-Commun]

Another option for generating public-audience graphs involves exporting a graph produced directly from the data into software that offers flexibility in design and pre-publication details. The example shown here was created by importing the scatterplot created in R into Adobe Illustrator, a vector graphics editor, and editing its design elements. A strength of illustration software is the flexibility to fine-tune graphic design through wide choices in type, colors, shapes, and annotations, and the ability to reposition design elements. A disadvantage is that hand editing is prone to human error. Because the graphic is separated from the data management environment, the editing tasks cannot be automatically replicated or easily traced.
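The export step of such a workflow might look like the following sketch, which writes a scatterplot to a vector PDF that an illustration program can open. The dataset (mtcars) and file name are stand-ins, and the article does not state which export format was actually used. Setting useDingbats = FALSE is a precaution: when the Dingbats font is used for small circles, plotting symbols can arrive in a vector editor as text characters.

```r
# Hypothetical output file for hand editing in illustration software.
export_file <- file.path(tempdir(), "scatter_for_editing.pdf")

# pdf() produces vector output, so points, lines, and text stay editable.
pdf(export_file, width = 6, height = 4, useDingbats = FALSE)

# Stand-in data: mtcars (an assumption for illustration).
plot(mtcars$wt, mtcars$mpg,
     pch = 16,
     xlab = "Weight (1,000 lb)",
     ylab = "Miles per gallon")

# Close the device so the file is complete before opening it elsewhere.
dev.off()
```

Keeping this script alongside the edited file at least documents how the underlying graph was produced, even though the hand edits themselves are not recorded.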

The tools and methods that are effective for data exploration and analysis might not be the same as those for fine-tuning the visualizations for a public audience. As you work through a research process, considering the purpose and audience for your visualizations may help inform your choices in tools, methods, and efforts spent in polishing the graphic presentation.

DiBiase, David. 1990. Visualization in the Earth Sciences. Department of Geography, The Pennsylvania State University. https://scholar.google.com/citations?user=K9CPOLwAAAAJ&hl=en

The R Project for Statistical Computing. http://www.r-project.org

Adobe Illustrator. http://www.adobe.com/products/illustrator.html