Data visualization can complement statistics and can be part of your research process from analysis through publication. Visualization works with the human eye-brain system and can help a viewer see relationships, patterns, and outliers that would not otherwise be readily apparent in the data.
Data visualization as a broad term can refer to anything from a small bar graph with a few values to an elaborate poster-like display or interactive dashboard that integrates multiple graphs, maps, photographs, short annotations, and longer text.
Visualization tools and types of visualizations vary in how well they interoperate with other data analysis tools. When choosing a tool and a workflow, a model developed by David DiBiase for application in geography can help you connect the purpose of a visualization with your design and communication needs. Although the model was developed for mapping, it applies to other disciplines as a way to consider the audience and purpose of a visualization and to inform tool and workflow choices.
This model presents a research process with four stages:
- Exploration of data to reveal pertinent questions
- Confirmation of apparent relationships in the data in light of a formal hypothesis
- Synthesis or generalization of findings
- Presentation of the research at professional conferences and in scholarly publications
DiBiase Model: Visual Thinking/Private Realm
The visual thinking, tools, and methods can change as your research moves from stage to stage. During the early stages of your data analysis, visualizations may complement statistical methods and help you explore the data for patterns or outliers. You might not show these initial visualizations to anyone else, and not all of them will yield meaningful insights.
The early stages are typically carried out privately, by an individual or a small team of experts who are deeply involved with the research subject and familiar with the data; the visualizations aren’t intended for a broader audience. At this stage, visualization tools should help you work efficiently, generating multiple visualizations with repeatable, documentable methods. Different visual design elements, such as colors and graphic symbols, types of visualizations, and levels of detail, are best suited to reveal different features of the data.
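As a rough illustration of a private, exploratory look at data, the sketch below pairs a simple statistical check with a quick text histogram. It is written in Python using only the standard library, not in the R environment discussed later in this article. The data values, the function names `iqr_outliers` and `text_histogram`, and the 1.5 × IQR rule are all assumptions chosen for the example.

```python
import statistics

# Hypothetical measurements; 41.0 is planted as an obvious outlier.
values = [12.1, 13.4, 12.8, 14.0, 13.1, 12.5, 41.0, 13.7, 12.9, 13.3]


def iqr_outliers(data, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in data if v < low or v > high]


def text_histogram(data, bin_width=5.0):
    """Build a rough text histogram: one line per bin, one '*' per value."""
    counts = {}
    for v in data:
        start = (v // bin_width) * bin_width  # lower edge of the value's bin
        counts[start] = counts.get(start, 0) + 1
    return [
        f"{start:5.1f}-{start + bin_width:5.1f} | {'*' * counts[start]}"
        for start in sorted(counts)
    ]


# A quick look meant for the researcher only: no labels, colors, or polish.
print("mean:", round(statistics.mean(values), 2))
print("median:", statistics.median(values))
print("flagged:", iqr_outliers(values))
print("\n".join(text_histogram(values)))
```

The histogram shows at a glance what the gap between the mean and the median only hints at: one value sits far from the rest. Because the whole look is a script, it can be rerun unchanged when the data are updated.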
DiBiase Model: Visual Communication/Public Realm
As the research progresses, you begin to communicate ideas and results to colleagues and peers, and eventually to a broader public.
As your audience widens, the visualizations become a tool for communicating beyond the research team, possibly to an audience with less expertise in the field. Visualizations that are clear to experts might not be understood by a broader audience that lacks the same depth of knowledge or interest in the subject.
Graphic design elements become more important as you use your visualizations to communicate research results to an external audience. Choices of chart type, level of detail, color, symbols, typography, labels, and annotations can make a difference in the clarity of the communication.
A Simple Example
This simple example can help illustrate a distinction between an exploratory graph and a communication graph, potential tools, and one example workflow.
This graph was produced in R, a language and environment that offers a wide variety of statistical and graphical capabilities. R is freely available software that features a programming language for handling and analyzing data and allows users to define and implement new functions. R is extensible through packages that provide specialized functions applicable to a variety of domains. One strength of R for visualization in a research process is its ability to generate individual or multiple graphs through scripts, which then serve as documentation of the data handling and visualization process.
When a research project reaches an audience beyond the researchers closely familiar with the data, the communication value of the graphs can be enhanced by using R’s more advanced packages, which offer additional graphing functions.
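The idea of a script that generates several graphs and doubles as documentation can be sketched in a few lines. The example below is not the R code behind the graph in this article. It is a minimal Python sketch using only the standard library, and the dataset, the function names (`scale`, `scatter_svg`, `build_all_plots`, `write_plots`), and the output layout are all hypothetical.

```python
from itertools import combinations
from pathlib import Path

# Hypothetical dataset: each key is a variable, each list a column of observations.
data = {
    "height": [150, 160, 165, 172, 180],
    "weight": [52, 58, 63, 70, 81],
    "age": [23, 35, 31, 44, 29],
}


def scale(values, lo, hi):
    """Linearly map values onto the pixel range [lo, hi]."""
    vmin, vmax = min(values), max(values)
    span = (vmax - vmin) or 1  # avoid dividing by zero for a constant column
    return [lo + (v - vmin) / span * (hi - lo) for v in values]


def scatter_svg(xs, ys, xlabel, ylabel, size=300, pad=40):
    """Return a bare-bones SVG scatterplot as a string."""
    px = scale(xs, pad, size - pad)
    py = scale(ys, size - pad, pad)  # SVG's y axis points down, so flip it
    dots = "\n".join(
        f'  <circle cx="{x:.1f}" cy="{y:.1f}" r="3" />' for x, y in zip(px, py)
    )
    inner = size - 2 * pad
    mid = size / 2
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{size}" height="{size}">\n'
        f'  <rect x="{pad}" y="{pad}" width="{inner}" height="{inner}" '
        f'fill="none" stroke="black" />\n'
        f"{dots}\n"
        f'  <text x="{mid}" y="{size - 10}" text-anchor="middle">{xlabel}</text>\n'
        f'  <text x="12" y="{mid}" text-anchor="middle" '
        f'transform="rotate(-90 12 {mid})">{ylabel}</text>\n'
        "</svg>\n"
    )


def build_all_plots(columns):
    """Build one scatterplot per pair of variables, keyed by file name."""
    return {
        f"{x}_vs_{y}.svg": scatter_svg(columns[x], columns[y], x, y)
        for x, y in combinations(columns, 2)
    }


def write_plots(plots, out_dir):
    """Write each SVG string to a file inside out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, svg in plots.items():
        (out_dir / name).write_text(svg, encoding="utf-8")


plots = build_all_plots(data)
print(sorted(plots))
```

Calling `write_plots(plots, "exploratory_plots")` would save the three files to a folder of that (hypothetical) name. Every step from data to graph is recorded in the script, so rerunning it reproduces the same graphs, and the output is SVG, a vector format that illustration software can typically open.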
It’s also possible to export a graph produced directly from the data into software that offers flexibility in design and pre-publication details. The example shown in Figure 3 was created by importing the scatterplot created in R into Adobe Illustrator, a vector graphics editor, and editing its design elements. A strength of illustration software is the flexibility to fine-tune graphic design through wide choices in type, colors, shapes, and annotations, and the ability to alter the placement of design elements. A disadvantage is that hand editing is prone to human error. Because the graphic is separated from the data management environment, the editing tasks cannot be automatically replicated or easily traced.
The tools and methods that are effective for data exploration and analysis might not be the same as those for fine-tuning visualizations for a public audience. As you work through a research process, considering the purpose and audience for your visualizations may help inform your choices of tools and methods and the effort you spend polishing the graphic presentation.
DiBiase, David. 1990. Visualization in the Earth Sciences. Department of Geography, The Pennsylvania State University. https://scholar.google.com/citations?user=K9CPOLwAAAAJ&hl=en
The R Project for Statistical Computing. http://www.r-project.org
Adobe Illustrator. http://www.adobe.com/products/illustrator.html