Tutorial: Use Google Refine to Export JSON
Google Refine is great at cleaning large sets of data. But one amazing under-documented feature is the ability to design and output JSON files. With Google Refine, you can turn a simple spreadsheet into a straight forward JSON dataset or multidimensional array quickly and easily.
If you haven’t used Refine before, here’s some videos
to get you started
Basic JSON structure
Let’s start with a simple spreadsheet in Refine. Download the project files
and import CaStimContracts.csv into Google Refine. You should see a straight forward spreads sheet that looks like this:
When converted to JSON, each row of data is converted into an element in an array. The data will look like this:
Converting your spreadsheet to JSON is easy. In the top-right corner, click the export button to get a drop-down menu and select Templating.
Clicking Export saves a text file.
Export JSON to match templates
The real power of Refine is that it gives you control over how the JSON will be formatted. You can edit once and make sure the changes are applied to huge data sets. This makes it very easy to make sure your data matches your custom templates. Lets make this data match the data used in the External Data Protovis tutorial
. Here’s what the top of the file looks like:
To get the same thing in our file, we edit the prefix. Notice that the preview on the right will update to show you how your data will appear.
We also need to make sure the suffix matches. The template file ends like this:
Edit the suffix to match.
We can now export and use this is the external data tutorial example. The dollar amount can display in the chart and the related information as text.
Understanding multi-dimensional arrays
“That’s great,” you say. “But how can I edit the data to use the really cool Protovis examples?” The answer lies in multi-dimensional arrays. While this may sound intimidating, the concept is straight forward. All it really means is that a data element (the parent) may have subsets (children).
In this example,
is the parent and the children are
, etc. This adds a dimension to the array.
Let’s take it a step further and look at race in three different counties in Florida. Now
are all parents. And each has children (individual race data).
Now here’s where it gets a bit tricky. Just as in life, children can become parents and have their own offspring. In this example,
is subdivided by the
tribes. This creates another dimension in the array.
This example is easy to understand but it’s only the beginning. Any child can become a parent with it’s own children and add a dimension to the array.
Creating multi-dimensional arrays
So why do we care about multi-dimensional arrays? Well, there are a lot of free data visualization templates that use them. We’re going to stick with Protovis and create data that can be used can in the Treemap
, and Dendogram
Start by uploading “race.csv” from the example files into Refine and click Export/Template. Edit the Prefix and Suffix to match the examples we just looked at. Your window should look like this:
We create multi-dimensional arrays by editing the Row Template. Select and cut the first row.
Paste the line in front of the first curly bracket so that it looks like this:
Replace the comma at the end of the line with a colon. If you have an extra line after the curly bracket, delete that too. Your window should look like this:
Delete the code after
and replace it with an open curly bracket. Add a couple of spaces before the
lines to keep things easy to read. Finally, add an extra closing curly bracket just before the comma at the end of the
Click export and you’re done. Congrats! You just built a multi-dimensional array. Try plugging it into the Protovis examples linked to earlier.
This content may not be republished
in print or digital form without express written permission from Berkeley Advanced Media Institute. Please see our Content Redistribution Policy at multimedia.journalism.berkeley.edu/content_redistribution/.