Use Google Refine to Export JSON

Tutorial: Use Google Refine to Export JSON

Introduction

Google Refine is great at cleaning large sets of data. But one amazing under-documented feature is the ability to design and output JSON files.  With Google Refine, you can turn a simple spreadsheet into a straight forward JSON dataset or multidimensional array quickly and easily.

If you haven’t used Refine before, here’s some videos to get you started.

Basic JSON structure

JSON stands for JavaScript Object Notation and is an increasingly common way of incorporating data in JavaScript. Think of this text file as a cleaner, lighter version of XML.

Let’s start with a simple spreadsheet in Refine. Download the project files  and import CaStimContracts.csv into Google Refine. You should see a straight forward spreads sheet that looks like this:

google refine

When converted to JSON, each row of data is converted into an element in an array. The data will look like this:

json format

Converting your spreadsheet to JSON is easy. In the top-right corner, click the export button to get a drop-down menu and select Templating.

refine export

A new window will open showing you the default export template. The default export will work with a lot of JavaScript templates.

json refine export

Clicking Export saves a text file.

Export JSON to match templates

The real power of Refine is that it gives you control over how the JSON will be formatted. You can edit once and make sure the changes are applied to huge data sets. This makes it very easy to make sure your data matches your custom templates. Lets make this data match the data used in the External Data Protovis tutorial.  Here’s what the top of the file looks like:

population data top

To get the same thing in our file, we edit the prefix. Notice that the preview on the right will update to show you how your data will appear.

edit json template

We also need to make sure the suffix matches. The template file ends like this:

pop data end

Edit the suffix to match.

edit suffix refine

We can now export and use this is the external data tutorial example. The dollar amount can display in the chart and the related information as text.

Understanding multi-dimensional arrays

“That’s great,” you say. “But how can I edit the data to use the really cool Protovis examples?” The answer lies in multi-dimensional arrays. While this may sound intimidating, the concept is straight forward. All it really means is that a data element (the parent) may have subsets (children).

In this example, race is the parent and the children are Asian, American Indian, etc. This adds a dimension to the array.

data simple

Let’s take it a step further and look at race in three different counties in Florida. Now Broward, MiamiDade and Palm_Beach are all parents. And each has children (individual race data).

data sample medium

Now here’s where it gets a bit tricky. Just as in life, children can become parents and have their own offspring. In this example, American Indian is subdivided by the Seminole and Pahokee tribes. This creates another dimension in the array.

data advanced

This example is easy to understand but it’s only the beginning. Any child can become a parent with it’s own children and add a dimension to the array.

Creating multi-dimensional arrays

So why do we care about multi-dimensional arrays? Well, there are a lot of free data visualization templates that use them. We’re going to stick with Protovis and create data that can be used can in the Treemap, Bubble, Sunburst, and Dendogram examples.

Start by uploading “race.csv” from the example files into Refine and click Export/Template. Edit the Prefix and Suffix to match the examples we just looked at. Your window should look like this:

multi-start

We create multi-dimensional arrays by editing the Row Template. Select and cut the first row.

multi 2

Paste the line in front of the first curly bracket so that it looks like this:

multi 3

Delete "County" :   Replace the comma at the end of the line with a colon. If you have an extra line after the curly bracket, delete that too. Your window should look like this:

multi 4

Delete the code after American Indian and replace it with an open curly bracket. Add a couple of spaces before the Seminole and Pahokee lines to keep things easy to read. Finally, add an extra closing curly bracket just before the comma at the end of the Pahokee line.

multi 5

Click export and you’re done. Congrats! You just built a multi-dimensional array. Try plugging it into the Protovis examples linked to earlier.

About this Tutorial

This tutorial was made to support the Knight Digital Media Center’s Interactive Census Workshop in Dec. 2010.

Republishing Policy

This content may not be republished in print or digital form without express written permission from KDMC. Please see our Content Redistribution Policy at kdmc.berkeley.edu/license.