JSON shell scripting with Node.js

08 Jun, 2012

I’m currently in a project team working on an application that stores much of its data in CouchDB. One of the lovely things about Couch is its RESTful API. It’s all simple HTTP and JSON, easy to understand and easy to program to.
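Fetching a single document, for example, is nothing more than an HTTP GET (the host, database name and document id here are made up for illustration):

[sourcecode language="bash"]
curl http://localhost:5984/dbname/123456
# returns something like: {"_id":"123456","_rev":"1-acf7f3...", ...}
[/sourcecode]
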
One aspect where this interface isn’t so readily accessible is in shell scripting. There’s curl to handle all the HTTP stuff we could ever need, but to transform a JSON structure or extract information from it proved less straightforward. We can cover simple cases with grep and awk, but JSON is complex enough that we (or, well, I) wouldn’t want to. If the documents were XML, we could have used xpath and xslt to do our heavy lifting. There is to my knowledge no equivalent to xmlstarlet for JSON to reliably handle these chores.
We solved our shell scripting problem and the solution is dead obvious, because there’s an excellent DSL for manipulating JSON structures. It’s called JavaScript :-).
Our admin scripts use Node.js as a pipeline stage. There are so many blogs and articles talking about Node as a server platform that one easily forgets that it’s a generic JavaScript interpreter and REPL.
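If you haven’t tried it that way, start node without arguments and you get an interactive JavaScript prompt, handy for experimenting with the snippets below:

[sourcecode language="bash"]
$ node
> JSON.parse('{"total_rows":22}').total_rows
22
> .exit
[/sourcecode]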

A simple example

This script reads a CouchDB view that lists documents we want to get rid of. In this example, I’ve used the standard _all_docs view that’s available in every Couch database. It then processes this list in JavaScript to create a delete command for CouchDB’s bulk API. The command is then executed and the response entity is pretty-printed with jsonpp.

[sourcecode language="bash"]
#!/bin/sh
COUCH="https://xebia.com/blog:5984/dbname"
curl "${COUCH}/_all_docs&include_docs=true" \
| ./format_delete_command.js \
| curl --header 'Content-Type: application/json' \
--header 'Accept: application/json' \
-X POST --data-binary @- "${COUCH}/_bulk_docs" \
| tr -d '\n' | jsonpp
[/sourcecode]

For those unfamiliar with curl: the --data-binary parameter supplies the request body. An @-sign indicates that what follows is a file name; to post from stdin, use - as the file name. Without an @-sign, the parameter value itself is used as the request body.
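To make that concrete, here are the three variants side by side (headers left out for brevity; the file name delete.json is just a placeholder):

[sourcecode language="bash"]
# parameter value used as the request body directly
curl -X POST --data-binary '{"docs":[]}' "${COUCH}/_bulk_docs"

# @file: request body read from a file
curl -X POST --data-binary @delete.json "${COUCH}/_bulk_docs"

# @-: request body read from standard input
cat delete.json | curl -X POST --data-binary @- "${COUCH}/_bulk_docs"
[/sourcecode]
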
When we read from that CouchDB view, the response looks something like this:

[sourcecode language="javascript"]
{"total_rows":22,"offset":0,"rows":[
{"id":"123456","key":"123456","value":{"rev":"1-acf7f3..."},"doc":{...}},
{"id":"123457","key":"123457","value":{"rev":"1-b67df1..."},"doc":{...}}
...20 more...
]}
[/sourcecode]

Note that the doc field is left out unless you pass include_docs=true as a query parameter.
To post a bulk delete command, we send this:

[sourcecode language="javascript"]
{ "docs":[
{"_id":"123456","_rev":"1-acf6f39495a2cd4465be504cd435629e","_deleted":true}
{"_id":"123457","_rev":"1-b67df18954264dbb65be341294e572a5","_deleted":true}
...20 more...
]}
[/sourcecode]

awk could have handled that. Here’s the JavaScript that we use instead:

[sourcecode language="javascript"]
#!/usr/bin/env node
var inputStream = process.stdin
  , data = '';

process.stdin.resume();

// Read the entire input stream into the data variable.
inputStream.on('data', function(chunk) {
  data += chunk;
});

// At end of stream, load the JSON object and process it.
inputStream.on('end', function() {
  var json = JSON.parse(data)
    , rows = json['rows']
    , postdata = { docs: [] };

  rows.forEach(function(row) {
    var doc = row['doc'];
    postdata.docs.push({ _id: doc._id, _rev: doc._rev, _deleted: true });
  });

  console.log(JSON.stringify(postdata));
});
[/sourcecode]

A few things worth noting:

  • Node.js is clever enough to ignore the hashbang line.
  • We simply buffer the entire input in a temporary variable. Our documents are small, so we don’t need a clever streaming parser.
  • Use console.log to write to standard out and console.warn to write to standard error. You can use process.exit() with a non-zero code to signal failure to the calling script; see the sketch after this list.
  • For this particular view, the ids and revs were already present in the view output, so we could have formatted the delete command without passing the include_docs parameter at all. That’s because I used the _all_docs view for the example. Our actual views have varying output, and relying on include_docs makes the script work with all of them. (Please don’t feed your _all_docs view through this script, just delete the database!)
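
For example, a slightly more defensive version of the end handler might guard the JSON.parse call like this (just a sketch; the message text is made up):

[sourcecode language="javascript"]
inputStream.on('end', function() {
  var json;
  try {
    json = JSON.parse(data);
  } catch (e) {
    // Complain on standard error so standard out stays clean for the pipeline...
    console.warn('format_delete_command: input is not valid JSON: ' + e.message);
    // ...and report failure to the calling shell script.
    process.exit(1);
  }
  // ...build and print the bulk delete command as before.
});
[/sourcecode]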

Our other scripts use the same approach.

Summary

Node.js is so often referred to as a server that it’s easy to forget you can use it in a shell pipeline. This post demonstrates how you might do that.
Updated 2012-06-27:
You may want to check out my follow-up post, JSON shell scripting with jsawk.
