Tutorial: Use Google Refine to Export JSON
Google Refine is great at cleaning large sets of data. But one amazing under-documented feature is the ability to design and output JSON files. With Google Refine, you can turn a simple spreadsheet into a straight forward JSON dataset or multidimensional array quickly and easily.
Basic JSON structure
Let’s start with a simple spreadsheet in Refine. Download the project files and import CaStimContracts.csv into Google Refine. You should see a straight forward spreads sheet that looks like this:
When converted to JSON, each row of data is converted into an element in an array. The data will look like this:
Converting your spreadsheet to JSON is easy. In the top-right corner, click the export button to get a drop-down menu and select Templating.
Clicking Export saves a text file.
Export JSON to match templates
The real power of Refine is that it gives you control over how the JSON will be formatted. You can edit once and make sure the changes are applied to huge data sets. This makes it very easy to make sure your data matches your custom templates. Lets make this data match the data used in the External Data Protovis tutorial. Here’s what the top of the file looks like:
To get the same thing in our file, we edit the prefix. Notice that the preview on the right will update to show you how your data will appear.
We also need to make sure the suffix matches. The template file ends like this:
Edit the suffix to match.
We can now export and use this is the external data tutorial example. The dollar amount can display in the chart and the related information as text.
Understanding multi-dimensional arrays
“That’s great,” you say. “But how can I edit the data to use the really cool Protovis examples?” The answer lies in multi-dimensional arrays. While this may sound intimidating, the concept is straight forward. All it really means is that a data element (the parent) may have subsets (children).
In this example,
race is the parent and the children are
American Indian, etc. This adds a dimension to the array.
Let’s take it a step further and look at race in three different counties in Florida. Now
Palm_Beach are all parents. And each has children (individual race data).
Now here’s where it gets a bit tricky. Just as in life, children can become parents and have their own offspring. In this example,
American Indian is subdivided by the
Pahokee tribes. This creates another dimension in the array.
This example is easy to understand but it’s only the beginning. Any child can become a parent with it’s own children and add a dimension to the array.
Creating multi-dimensional arrays
So why do we care about multi-dimensional arrays? Well, there are a lot of free data visualization templates that use them. We’re going to stick with Protovis and create data that can be used can in the Treemap, Bubble, Sunburst, and Dendogram examples.
Start by uploading “race.csv” from the example files into Refine and click Export/Template. Edit the Prefix and Suffix to match the examples we just looked at. Your window should look like this:
We create multi-dimensional arrays by editing the Row Template. Select and cut the first row.
Paste the line in front of the first curly bracket so that it looks like this:
"County" : Replace the comma at the end of the line with a colon. If you have an extra line after the curly bracket, delete that too. Your window should look like this:
Delete the code after
American Indian and replace it with an open curly bracket. Add a couple of spaces before the
Pahokee lines to keep things easy to read. Finally, add an extra closing curly bracket just before the comma at the end of the
Click export and you’re done. Congrats! You just built a multi-dimensional array. Try plugging it into the Protovis examples linked to earlier.
About this Tutorial
This tutorial was made to support the Knight Digital Media Center’s Interactive Census Workshop in Dec. 2010.
This content may not be republished in print or digital form without express written permission from KDMC. Please see our Content Redistribution Policy at kdmc.berkeley.edu/license.