JSON-like config, a spectrum of under/overengineering

by havoc

Cynics might say that overengineered means I didn’t write it and don’t understand it yet.

I found a nice real-world example, JSON-like configuration file formats, where reasonable developers have implemented many points on a complexity spectrum. (Full disclosure: I implemented one of these.)

STOP. Don’t take this post as an excuse to defend the answer you already like!

We’ll learn more if we spend some time wrapping our heads around other developers’ thinking. Someone saw a good rationale for each point on this spectrum. The point of the exercise is to look at why each of these could make sense, to see what we can learn.

Most of these file formats are principled. They have a rationale.

Here’s an overview of the spectrum I’ve chosen (not an exhaustive list of JSON-like formats, but an illustrative range):

(sorry for the unclickable links, click on the section headers below)

Most of these are JSON supersets or near-supersets, and they end up producing a JSON-style data structure in memory.

The points on the spectrum

I’d encourage you to click on these. Go look at the details of each one.

1. JSON

You all know this one already. JSON‘s principle might be ease of implementation, which means not much code to write, and less room for interoperability problems. Every software stack you’re likely to use comes with a JSON parser.

(It’s not uncommon to see JSON-with-comments, as a simple single-feature extension to JSON.)

2. HJSON

One step beyond JSON-with-comments, HJSON adds more syntactic sugar to JSON, including comments, multiline strings, and the ability to omit quotes and commas. But HJSON avoids features that introduce abstraction (HJSON does not give a config file maintainer any way to clean up duplicate or repetitive configuration). Everything in the file is a literal value.

3. Ad Hoc Play 1.x and Ad Hoc Akka 1.x

Unlike the other examples on my spectrum, these aren’t specifications or libraries. They are obsolete chunks of code intended to illustrate “I’ll just do something simple and custom,” a common developer decision. Neither one has a specification, and both have implementations involving regular expressions. HOCON (discussed next) replaced both of these in Play 2.x and Akka 2.x.

Play 1.x’s ad hoc format is a riff on Java properties, adding include statements and a ${foo} syntax to define one property in terms of another.

Akka 1.x’s ad hoc format is sort of like HOCON or HJSON in syntax, and also adds include statements to allow assembling a config from multiple files.

These ad hoc formats evolved organically and may be interesting data points showing what people want from a config format.

4. HOCON

HOCON includes similar syntactic niceties to HJSON, but introduces abstractions. That is, it tries to help the config file maintainer avoid duplication. It does this by adding two features: “merging” (two objects or two files can be combined in a defined way), and “substitution” (a reference syntax ${foo} used to point to other parts of the config or to environment variables). Include statements are also supported (defined in terms of merging, that is, an include inserts another file inline and merges its fields).

HOCON avoids anything that feels like “programming”; it lacks loops, conditionals, or arithmetic. It remains purely a data file.

5. YAML

YAML doesn’t quite belong here, because it wasn’t designed for configuration specifically. It’s a more readable way to write JSON. In that sense, it’s closer to HJSON than it is to HOCON or Jsonnet, but I’ve put it on the “more engineering” end of the spectrum because YAML has a large specification with quite a few features. Because YAML has an extension mechanism, it could in principle be extended (using tags) to support abstraction features such as includes.

6. Jsonnet

With Jsonnet we jump into the world of configuration-as-code. Jsonnet is a domain-specific programming language designed to generate JSON, with conditionals, expressions, and functions.

7. Writing code in a general-purpose programming language

Many developers are passionate advocates of avoiding config-specific languages entirely; they prefer to load and evaluate a chunk of regular code, instead. This code could be anything from JavaScript to Scala (often, it’s the same language used to implement the application).

Principled Design

Most of these formats have a thoughtful philosophy — an overall approach that guides them as they include or exclude features. This is a Good Thing, and it’s often overlooked by less-experienced developers.

Tradeoffs

What are some of the tradeoffs, when choosing a point on this spectrum? Here are some that I came up with.

  • Dependencies. Do you need a custom library?
  • Library size. How large is the code to read/write config files?
  • Leakiness of abstraction. How much are you going to have to care about the file format, when you’re using it to get some settings for your app?
  • Config file readability. Can people tell what your config file means?
  • DRY-ness of config files. Are there any means of abstraction?
  • Composing external sources. Can config files reference environment variables, remote resources, and the like?
  • Machine-editability. Can a program reliably load/edit/save a config file without sci-fi AI?
  • Cross-language interoperability. Are multiple implementations of the config file format likely to be compatible?
  • Learnability. Can the people editing your file format guess or easily learn how the format works?

The right answer hinges on people, not tech

Often, tradeoffs like these push a problem around between people.

An application developer who chooses to use JSON config to keep things simple, may be pushing complexity onto someone else — perhaps a customer, or someone in ops who will be deploying the app.

An application developer who uses anything more complex than JSON for their config may be asking customers, ops, or support to learn a new syntax, or even to learn how to program.

When we think about engineering tradeoffs, sometimes we feel we’re advocating the Right Thing, but in fact we’re advocating the Easiest Thing For Us Personally.

There won’t be a single right way to balance different interests. Who will configure your app? What background do they have? The people matter.

All of the choices work

None of these choices for config are categorically broken. When we choose one, we’re making a judgment that matters about tradeoffs, and we’re applying some measure of personal taste, but we aren’t choosing between broken and not-broken. (That’s what makes this an example worth discussing, I think.)

My Twitter account is @havocp.
Interested in becoming a better software developer? Sign up for my email list and I'll let you know when I write something new.