How I approach a data visualization story: A California wildfire story
By Peter Aldhous, Data, Science and Investigative Reporter, and Instructor
Find the story by exploring the data
As a journalist who makes charts and maps to find and tell stories, I typically start with questions to ask of the data, or a clear idea of what I’m trying to show based on prior interviews and background reading.
For this 2018 BuzzFeed News story on California wildfires, experts in fire ecology had told me how climate change and population growth, which has increased ignitions caused by people and their infrastructure, had combined to create the firestorm gripping the state. So my challenge was to find a chart form to tell this story.
Understanding the data
I use the R programming language to analyze and visualize data. Here I worked with historical data on fire perimeters maintained by the California Department of Forestry and Fire Protection, or Cal Fire, which includes information on the source of the ignition and the area burned for each wildfire over the decades. Cal Fire scientists told me they were confident that the data was complete and accurate from about 1950, so I used R to remove data prior to 1950.
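As a rough illustration of this filtering step, here is a minimal sketch in Python rather than the R used for the story. The records, field names, and values are invented for the example and do not reflect Cal Fire's actual schema; only the 1950 cutoff comes from the text.

```python
from datetime import date

# Hypothetical records: names, fields, and values are made up for illustration.
fires = [
    {"name": "A", "alarm_date": date(1932, 9, 1), "acres": 1200.0, "cause": "natural"},
    {"name": "B", "alarm_date": date(1950, 7, 4), "acres": 350.0, "cause": "human"},
    {"name": "C", "alarm_date": date(2017, 12, 4), "acres": None, "cause": "human"},
    {"name": "D", "alarm_date": date(2018, 7, 27), "acres": 90000.0, "cause": "unknown"},
]

FIRST_RELIABLE_YEAR = 1950  # cutoff described in the text


def usable(fire):
    """Keep fires from 1950 onward that have an alarm date and an area burned."""
    return (
        fire["alarm_date"] is not None
        and fire["acres"] is not None
        and fire["alarm_date"].year >= FIRST_RELIABLE_YEAR
    )


clean = [f for f in fires if usable(f)]
print([f["name"] for f in clean])
```

In this toy dataset, fire A is dropped for predating 1950 and fire C for lacking an area burned.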
Communicating data with charts and maps
That left more than 13,000 fires for which I had all the required data, including the area burned. The human brain can’t extract much meaning from a sprawling table of data, but it’s very good at extracting meaning from visual patterns. My plan was to put all the fires on the same chart to reveal at a glance how their seasonal timing and size had changed over the years, and how this varied for fires started naturally and those started by people and their infrastructure.
I decided to approach this story in a couple of steps using the same chart form. Once your audience is familiar with a graphic, using variants of it to explore different facets of the story makes the key messages easier to process.
Rather than thinking in terms of finished charts, my starting point is often to think about the variables that will capture the story in the data and how they can be encoded on a chart. First, I wanted to show how large fires have become more common in recent years, and how the fire season, which once was mostly constrained to the summer and fall, has expanded. To show the changing seasonality in the data, I plotted years on the vertical or Y axis, and a timeline within each year on the horizontal or X axis. Along the timeline for each year, I then plotted a partially transparent circle for each fire, scaled by the total area burned, and positioned on the horizontal axis according to its “alarm date” – when Cal Fire first responded to the incident.
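The encoding described above can be sketched in a few lines. This is an illustrative Python version, not the ggplot2 code behind the published chart. It assumes the circle's area (not its radius) is made proportional to the area burned, and the function name, `max_radius`, and the sample values are hypothetical.

```python
import math
from datetime import date


def encode(alarm_date, acres, max_acres, max_radius=20.0):
    """Map one fire to chart coordinates and a circle radius.

    y: the year, so each year gets its own row.
    x: fraction of the way through that year, 0.0 on Jan. 1.
    r: radius chosen so the circle's AREA is proportional to acres burned
       (an assumption of this sketch; max_radius is an arbitrary size).
    """
    year_start = date(alarm_date.year, 1, 1)
    next_year_start = date(alarm_date.year + 1, 1, 1)
    days_in_year = (next_year_start - year_start).days  # handles leap years
    x = (alarm_date - year_start).days / days_in_year
    r = max_radius * math.sqrt(acres / max_acres)
    return {"x": x, "y": alarm_date.year, "r": r}


# Hypothetical fire: alarm date of July 2, 2018, a quarter the size of the largest fire.
mark = encode(date(2018, 7, 2), acres=25_000.0, max_acres=100_000.0)
print(mark)
```

Taking the square root matters: scaling the radius directly by area burned would make a fire four times larger look sixteen times larger.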
This step-by-step approach to building charts, starting with the variables to be plotted on each of the axes, and then adding geometric marks like circles, lines, or rectangles to encode other variables, is called the “grammar of graphics.” It lies at the heart of R’s ggplot2 charting package, which is what I used to make the chart.
I chose orange on a dark background to evoke fire. The preponderance of large fires near the bottom of the chart, in the most recent years, jumps out immediately. Look at the total amount of orange on the chart, and you can also see that the fire season seems to have crept earlier in the year over time.
Next, I brought in the sources of ignition, creating three views: one for natural fires started by lightning, one for fires started by people and infrastructure, including power lines, and one where the initial cause was unknown. I used lavender and yellow, two strongly contrasting colors, for the natural and human-caused fires, and a neutral white for the unknowns. For each view I adjusted the opacity of the other fires to zero in my ggplot2 code, so they were fully transparent and could no longer be seen. Then I made a GIF to animate between the different views.
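The logic of hiding all but one group per view can be sketched as follows. This is a Python illustration of the idea, not the original ggplot2 code. The cause labels, the 0.5 opacity for visible circles, and the sample records are assumptions.

```python
CAUSES = ("natural", "human", "unknown")
COLORS = {"natural": "lavender", "human": "yellow", "unknown": "white"}


def alpha_for(cause, view, visible_alpha=0.5):
    """Fires in the current view stay partially transparent; all others vanish."""
    return visible_alpha if cause == view else 0.0


def frame(fires, view):
    """Return a copy of the data with color and opacity set for one view."""
    return [
        {**f, "color": COLORS[f["cause"]], "alpha": alpha_for(f["cause"], view)}
        for f in fires
    ]


# Hypothetical records, one per cause.
fires = [
    {"id": 1, "cause": "natural"},
    {"id": 2, "cause": "human"},
    {"id": 3, "cause": "unknown"},
]

# One frame per view; each would become one image in the animated GIF.
frames = {view: frame(fires, view) for view in CAUSES}
print([f["alpha"] for f in frames["human"]])
```

Because every frame keeps all the fires and changes only their opacity, the axes and circle positions stay identical from one view to the next.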
In this case, the chart form emerged very quickly after my initial decision to plot years on the Y axis and a timeline within each year on the X axis. I honed its appearance by experimenting with colors, the maximum size for the circles, their transparency, and the weight of the lines used for each year. Sometimes, however, you may need to try different visual encodings of data and chart forms before you find the one that works best. Sketch and experiment until the story is clear!
Having established the preponderance of human-caused fires in California, especially in recent years, I turned to a wildfire dataset maintained by the U.S. Department of Agriculture’s Forest Service to put this in a national context. I made a map, dividing the continental United States into a grid with a resolution of half a degree latitude and longitude. In each grid cell I plotted a circle scaled by the total area burned from 1992 to 2015 and colored according to the percent of that area burned by fires ignited by people or their infrastructure.
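A minimal sketch of the gridding step, written in Python for illustration rather than taken from the original R analysis. The coordinates and acreages are invented, and the treatment of fires of unknown cause (counted in the total but not as human-caused) is an assumption of this example.

```python
import math
from collections import defaultdict

CELL = 0.5  # grid resolution in degrees of latitude and longitude


def cell_of(lat, lon, cell=CELL):
    """Return the south-west corner of the grid cell containing a point."""
    return (math.floor(lat / cell) * cell, math.floor(lon / cell) * cell)


def summarize(fires):
    """Total area burned per cell, plus the percent of it from human-caused fires."""
    totals = defaultdict(lambda: {"acres": 0.0, "human_acres": 0.0})
    for f in fires:
        key = cell_of(f["lat"], f["lon"])
        totals[key]["acres"] += f["acres"]
        if f["cause"] == "human":
            totals[key]["human_acres"] += f["acres"]
    return {
        key: {"acres": t["acres"], "pct_human": 100.0 * t["human_acres"] / t["acres"]}
        for key, t in totals.items()
        if t["acres"] > 0
    }


# Hypothetical fires: the first two fall in the same half-degree cell.
fires = [
    {"lat": 38.6, "lon": -122.3, "acres": 300.0, "cause": "human"},
    {"lat": 38.9, "lon": -122.1, "acres": 100.0, "cause": "natural"},
    {"lat": 44.2, "lon": -114.7, "acres": 500.0, "cause": "natural"},
]

grid = summarize(fires)
print(grid)
```

Each cell's `acres` would then set the circle size and `pct_human` its color.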
This map showed that California is unlike most of the rest of the Western U.S., where fires are mostly ignited by lightning. It was inspired by a similar map in a scientific paper.
Choosing color palettes
I made some subtle but important changes to fit with my visual story. First, I used a simple yellow-white-lavender diverging color palette to fit with the colors used in the earlier GIF. And rather than scaling the circles in my grid by the number of fires, I scaled them by the total area burned. This gave continuity with the visual encodings used on the earlier chart, where the colors and scaling of circles similarly represented the source of ignition and the area burned.
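A diverging palette of this kind can be sketched as a simple interpolation between three anchor colors. The Python below is illustrative only: the RGB values, the white midpoint at 50 percent, and the direction (lavender for mostly natural, yellow for mostly human-caused, matching the colors used in the GIF) are assumptions, not the values used in the published map.

```python
# Assumed anchor colors as (red, green, blue); not the published map's exact values.
LAVENDER = (181, 126, 220)
WHITE = (255, 255, 255)
YELLOW = (255, 215, 0)


def lerp_color(c0, c1, t):
    """Linearly interpolate between two RGB colors, with t running from 0.0 to 1.0."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))


def diverging(pct_human):
    """Map percent of area burned by human-caused fires to a hex color."""
    t = max(0.0, min(100.0, pct_human)) / 100.0  # clamp, then rescale to 0..1
    if t < 0.5:
        rgb = lerp_color(LAVENDER, WHITE, t / 0.5)
    else:
        rgb = lerp_color(WHITE, YELLOW, (t - 0.5) / 0.5)
    return "#{:02x}{:02x}{:02x}".format(*rgb)


for pct in (0, 50, 100):
    print(pct, diverging(pct))
```

Placing white at the midpoint means cells with a roughly even split fade into neutrality, while cells dominated by one cause stand out in the color the reader already associates with it.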
Continuity in visual encodings of data to represent the same or similar variables helps guide your audience through a visual story. But avoid using the same visual encodings to represent different variables at different steps in the narrative. It may be tempting from a design perspective to use the same colors through a piece, but your audience will likely remember what the colors represented when they were first introduced, and misread a chart or map as a result. So when introducing new variables, use a different color palette.
This was a fairly complex story, showing how different factors had combined to escalate the wildfire crisis in California. But by thinking carefully about visual encodings of the data, I was able to find charts that could be quickly scanned and understood. My goal was to design for the human brain, working with the way it processes visual information.
Learn how to create data visualizations using R in Berkeley AMI’s Advanced Data Analysis & Visualization online course.
For beginners interested in learning the basics of data visualization, check out Berkeley AMI’s Data Visualization for Storytellers.