the kdmcinfo weblog

Behind the Scenes at NPR's Argo Network

Is it possible to share stories and data seamlessly between a dozen separate sites without expensive commercial content management systems? Here's how NPR's Argo network uses Django to unify a collection of independent affiliate sites running on WordPress.

NPR's Argo Network is a collection of websites by NPR member stations committed to strengthening local journalism. The sites cover a range of topics: global health, climate change, public safety, local music and more. Each Argo site is run by a different member station, but all of them cover news that resonates nationally.

I recently had an excellent conversation with Argo developer Marc Lavallee, who explained how the Argo system works, and provided some screenshots of the editorial back-end.

According to Lavallee, Argo is a grant-funded project that's trying to come up with the "just right" model for blogging, interaction with the public, and integration with other member sites. Two developers work with 12 NPR member stations, each focused on a coverage area. For example, KQED covers technology, while KALW does cops and courts (see the bottom of this page for a complete list of Argo sites).

Most stories produced by member stations are local, but many also have national interest. While each site gets its own look and feel, each also needs the ability to share stories and links with other stations on the back-end. Since each site runs on WordPress, these requirements meant finding a way to modify the WordPress Dashboard to integrate content from the rest of the network.

Choosing Systems

The Argo developers were already committed to WordPress as an ideal platform for building small, publication-oriented sites, but they also felt that the Python-based Django framework couldn't be beat for working with custom data models and sophisticated aggregation rules.

It was important that authors and editors not have to log into multiple systems - they wanted their writers to be able to do it all from within the WordPress Dashboard. What they ended up with was a centralized Django installation at the center of a hub-and-spoke system, ingesting content from all of the member sites, then making that content available to other member sites as cleanly tagged JSON data (Argo has written its own JSON emitters/consumers, rather than using the popular Piston app). The WordPress sites then consume that JSON data and present it to authors for re-publication via custom WordPress plugins.
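
To make the emitter/consumer idea concrete, here is a minimal sketch in plain Python. It is not Argo's code: the Story fields, the function names and the example.org URLs are all assumptions for illustration, since the actual JSON schema wasn't part of our conversation.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Story:
    # Hypothetical fields; the real Argo schema is not shown in this post.
    site: str
    title: str
    url: str
    tags: List[str] = field(default_factory=list)


def emit_stories(stories: List[Story], exclude_site: str) -> str:
    """Hub side: serialize stories from every site except the one asking."""
    shared = [asdict(s) for s in stories if s.site != exclude_site]
    return json.dumps({"stories": shared}, sort_keys=True)


def consume_stories(payload: str) -> List[Story]:
    """Spoke side: rebuild Story objects from the hub's JSON payload."""
    return [Story(**item) for item in json.loads(payload)["stories"]]


# Made-up content standing in for what the hub has ingested.
hub = [
    Story("kqed", "Example tech story", "https://example.org/kqed/1", ["technology"]),
    Story("kalw", "Example courts story", "https://example.org/kalw/1", ["courts"]),
]
payload = emit_stories(hub, exclude_site="kqed")
received = consume_stories(payload)
```

In this sketch the KQED site asks the hub for content and gets back only the KALW story, tags included. In the real system the consuming side is a WordPress plugin rather than Python.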

The result is that Django aggregates and redistributes content from around the network, while WordPress provides tools to pick and choose content from that aggregated data, generate new content, and to serve the actual pages. Lavallee:

Our bloggers don't ever log into the Django admin. We embed sections of the Django admin as iframes inside the WordPress Dashboard. We re-use CSS classes (via jQuery/Ajax request) from the Django admin inside WordPress, so the look and feel comes into the Dashboard as well.

But original stories aren't the only content aggregated by Argo. The network also grabs feeds from Twitter (each blogger follows 200 people related to their beat), Delicious (each blogger has a Delicious account, with special tags being used for link roundups), and the publishing workflow aggregator DayLife.

The Argo team has built custom WordPress plugins to gather content from various external networks in addition to content coming from member sites.

Instead of taking those three different inputs and putting three boxes on each page, all links, regardless of how they were found, exist in the database exactly once, and are scored by various criteria (e.g. how often the link has been used). The interface makes clear to authors which content is externally sourced and which is internally generated.
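
Here is one way the "exactly once, with a score" idea could look, as a rough Python sketch. The only scoring criterion used is the one mentioned above (how often a link has been used). The normalization rule, the function names and the sample URLs are assumptions, not Argo's actual logic.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def normalize(url: str) -> str:
    """Crude normalization (an assumption): trim whitespace and a trailing slash."""
    return url.strip().rstrip("/")


def score_links(feeds: Dict[str, Iterable[str]]) -> List[Tuple[str, int]]:
    """Return each distinct link once, paired with how many times it was used."""
    counts: Counter = Counter()
    for urls in feeds.values():
        for url in urls:
            counts[normalize(url)] += 1
    # Highest score first; ties broken alphabetically so the order is stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Made-up links standing in for the three inputs.
feeds = {
    "twitter": ["https://example.org/a", "https://example.org/b/"],
    "delicious": ["https://example.org/a"],
    "daylife": ["https://example.org/b", "https://example.org/c", "https://example.org/a/"],
}
ranked = score_links(feeds)
```

Each URL comes out once no matter how many feeds mentioned it, and the most-used link sorts to the top. A production version would presumably combine several criteria and keep track of where each link came from.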

For example, consider the right sidebar at KALW's The Informant. The Latest Links section is composed of data from three separate sources aggregated by Django, then pushed into the WordPress rich text editor and made into a WordPress post, which the author can modify if needed.

[Screenshot: combined admin]
To create a new Latest Links post, editors use the Recent Roundup Links section below the WordPress post editor. The links are aggregated from around the network by a remote Django server and pulled into WordPress in real time. An editor selects the links they want to use, clicks a "Send to Editor" button at the bottom, and the selected collection is pushed into the rich text editor. While the process could have been fully automated, this gives humans the ability to pick and choose, or to edit if necessary before publication -- a "just right" mix of automation and human editing.
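
The "Send to Editor" step boils down to turning a hand-picked set of links into markup an author can still edit. A small Python sketch of that idea follows. The real thing lives in a WordPress plugin, so the function name, the (title, URL) pairs and the list markup here are all hypothetical.

```python
from html import escape
from typing import List, Tuple


def links_to_html(selected: List[Tuple[str, str]]) -> str:
    """Render the editor's chosen (title, url) pairs as an editable HTML list."""
    items = [
        f'<li><a href="{escape(url, quote=True)}">{escape(title)}</a></li>'
        for title, url in selected
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


# Made-up selection standing in for what an editor ticked in the Dashboard.
snippet = links_to_html([
    ("Courts & cops roundup", "https://example.org/a?x=1&y=2"),
    ("Budget hearing notes", "https://example.org/b"),
])
```

Because the output is ordinary HTML dropped into the post body, the author can reorder, trim or annotate the links before publishing.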

To trigger the process of pushing new content to the Django system, Argo uses WordPress hooks to fire off events (HTTP requests). For example, when an author updates a tag in WordPress, an event is fired off to Django to update its own tables.
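
As a sketch of the receiving end of such an event, here is how a hub might apply a tag change to its own copy of the data. This is plain Python with a dictionary standing in for Django's tables. The payload shape and the field names ("site", "action", "tag_id", "name") are assumptions, since the post doesn't describe the actual wire format.

```python
import json
from typing import Dict, Tuple

# In-memory stand-in for the hub's tag table, keyed by (site, tag_id).
TagTable = Dict[Tuple[str, int], str]


def handle_event(table: TagTable, body: str) -> str:
    """Apply one event (the body of an HTTP request) to the hub's tag table."""
    event = json.loads(body)
    key = (event["site"], event["tag_id"])
    if event["action"] == "tag_updated":
        table[key] = event["name"]
        return "updated"
    if event["action"] == "tag_deleted":
        table.pop(key, None)
        return "deleted"
    # Unknown actions are ignored rather than treated as errors.
    return "ignored"


tags: TagTable = {}
first = handle_event(tags, json.dumps(
    {"site": "kalw", "action": "tag_updated", "tag_id": 7, "name": "cops"}))
second = handle_event(tags, json.dumps(
    {"site": "kalw", "action": "tag_updated", "tag_id": 7, "name": "cops and courts"}))
third = handle_event(tags, json.dumps(
    {"site": "kalw", "action": "tag_merged", "tag_id": 7}))
```

The second event overwrites the first, so the hub always reflects the latest name for a given site's tag. In a real deployment this function would sit behind a Django view, and the WordPress side would send the request from a hook callback.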

The whole system was built by two developers in three months (for initial launch), with refinements being added regularly since. Though the two-system approach was initially considered an experiment, Lavallee says it's worked out great, and that they're very pleased with the results.

Coming Soon

The next big challenge for Argo will be to integrate with the NPR API so that content can move in both directions to/from the mothership. The team is also building tools to support key metrics. For example, if someone says "Posting five times per day is good for traffic," they want to be able to prove or disprove it, and to be able to visualize the data.

The platform will also be expanding to accommodate a new grant project to improve coverage of state houses and legislatures. Under that model, multiple stations in each state will be sharing data.

Interested in setting up something similar for your organization? The team plans to open source the Argo system in January 2011!

Other Approaches

Lavallee says the team also considered building things the other way around, with Django serving pages and WordPress as the content repository. Argo didn't go that route because, "At the end of the day, most of the public facing site is handled nicely by WordPress logic. And the integration would have been harder."

And why not Drupal?

"WordPress 3 had the multi-site plumbing functionality we needed in the core - the 12 ARGO sites are a single WordPressMU instance. We spent weeks before committing to a platform, and thought a lot about Drupal, but Drupal has an identity crisis. It wants to make things easy, but all that easy stuff ultimately gets in the way. Either you need to be a Drupal hot shot to work around Drupal's assumptions, or wait for Drupal 8 for some of the stuff we needed."

(Interestingly, that's the same conclusion I came to back in 2009 in Drupal or Django: A Guide for Decision Makers).

"Just Right"

Of course, every publication wants to streamline and consolidate toolchains to reduce duplication of effort and to increase news gathering efficiency. And NPR isn't alone in being a parent organization with lots of satellite sites to gather under one roof.

Some parent organizations require all of their brands to use the same CMS, while others use disparate systems that barely talk to each other, if at all. Many of these systems are expensive, heinously complicated, and saddled with antiquated technology. Upgrading to more modern systems is a luxury that most publications can't afford. But for those who have an opportunity to start fresh, Argo is a great example of how a hybrid mix of open source tools can be fused to create custom solutions - in this case providing a "just right" balance between independence and cooperation among sibling publications.