JSON-like config, a spectrum of under/overengineering
by havoc
Cynics might say that “overengineered” means “I didn’t write it and don’t understand it yet.”
I found a nice real-world example, JSON-like configuration file formats, where reasonable developers have implemented many points on a complexity spectrum. (Full disclosure: I implemented one of these.)
STOP. Don’t take this post as an excuse to defend the answer you already like!
We’ll learn more if we spend some time wrapping our heads around other developers’ thinking. Someone saw a good rationale for each point on this spectrum. The point of the exercise is to look at why each of these could make sense, to see what we can learn.
Most of these file formats are principled. They have a rationale.
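Here’s an overview of the spectrum I’ve chosen (not an exhaustive list of JSON-like formats, but an illustrative range), ordered roughly from least to most engineering: JSON, HJSON, the ad hoc Play 1.x and Akka 1.x formats, HOCON, YAML, Jsonnet, and code written in a general-purpose programming language.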
Most of these are JSON supersets or near-supersets, and they end up producing a JSON-style data structure in memory.
The points on the spectrum
I’d encourage you to go look at the details of each one.
1. JSON
You all know this one already. JSON’s principle might be ease of implementation, which means not much code to write, and less room for interoperability problems. Every software stack you’re likely to use comes with a JSON parser.
(It’s not uncommon to see JSON-with-comments, as a simple single-feature extension to JSON.)
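To make the “single-feature extension” concrete, here is a minimal Python sketch of a JSON-with-comments loader. It assumes that comments start with // or # and run to the end of the line; that choice, and the names strip_line_comments and loads_with_comments, are mine for illustration and don’t come from any particular library or spec. The only subtlety is that comment markers inside strings must be left alone.

```python
import json


def strip_line_comments(text: str) -> str:
    """Drop // and # comments that appear outside of JSON strings."""
    out = []
    in_string = False
    escaped = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            # Inside a string: copy everything, tracking backslash escapes.
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif ch == "#" or text.startswith("//", i):
            # Skip to the end of the line; the newline itself is kept.
            while i < len(text) and text[i] != "\n":
                i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def loads_with_comments(text: str):
    """Parse JSON after removing line comments (hypothetical helper)."""
    return json.loads(strip_line_comments(text))


EXAMPLE = '''
{
  // where to listen
  "host": "example.com",  # trailing comment
  "url": "http://example.com/#frag",
  "port": 8080
}
'''

config = loads_with_comments(EXAMPLE)
```

Even this one feature costs something: the file is no longer readable by a stock JSON parser, and a program that loads, edits, and saves the file will lose the comments unless it does extra work.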
2. HJSON
One step beyond JSON-with-comments, HJSON adds more syntactic sugar to JSON, including comments, multiline strings, and the ability to omit quotes and commas. But HJSON avoids features that introduce abstraction (HJSON does not give a config file maintainer any way to clean up duplicate or repetitive configuration). Everything in the file is a literal value.
3. Ad Hoc Play 1.x and Ad Hoc Akka 1.x
Unlike the other examples on my spectrum, these aren’t specifications or libraries. They are obsolete chunks of code intended to illustrate “I’ll just do something simple and custom,” a common developer decision. Neither one has a specification, and both have implementations involving regular expressions. HOCON (discussed next) replaced both of these in Play 2.x and Akka 2.x.
Play 1.x’s ad hoc format is a riff on Java properties, adding include statements and a ${foo} syntax to define one property in terms of another.
Akka 1.x’s ad hoc format is sort of like HOCON or HJSON in syntax, and also adds include statements to allow assembling a config from multiple files.
These ad hoc formats evolved organically and may be interesting data points showing what people want from a config format.
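To give a feel for the “simple and custom” approach, here is a small Python sketch of a properties-style format with ${foo} references, implemented with a regular expression. This is not Play’s or Akka’s actual code or exact semantics; the function names, the comment syntax, and the depth limit are assumptions of mine, and include statements are left out.

```python
import re

# Matches ${some.key}; the key is whatever sits between the braces.
_REF = re.compile(r"\$\{([^}]+)\}")


def parse_properties(text: str) -> dict:
    """Parse 'key = value' lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # assumption: silently ignore lines without '='
        props[key.strip()] = value.strip()
    return props


def resolve_references(props: dict, max_depth: int = 10) -> dict:
    """Expand ${key} references; a missing key raises KeyError."""

    def expand(value: str, depth: int) -> str:
        if depth > max_depth:
            raise ValueError("reference cycle or nesting too deep")
        return _REF.sub(lambda m: expand(props[m.group(1)], depth + 1), value)

    return {key: expand(value, 0) for key, value in props.items()}


EXAMPLE = """
# application settings
app.name = shop
app.home = /srv/${app.name}
log.dir = ${app.home}/logs
"""

settings = resolve_references(parse_properties(EXAMPLE))
```

A few dozen lines get you surprisingly far, which is probably why this approach is so common. The missing pieces (escaping, error messages, a written-down grammar) are where ad hoc formats tend to hurt later.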
4. HOCON
HOCON includes similar syntactic niceties to HJSON, but introduces abstractions. That is, it tries to help the config file maintainer avoid duplication. It does this by adding two features: “merging” (two objects or two files can be combined in a defined way), and “substitution” (a reference syntax ${foo} used to point to other parts of the config or to environment variables). Include statements are also supported (defined in terms of merging, that is, an include inserts another file inline and merges its fields).
HOCON avoids anything that feels like “programming”; it lacks loops, conditionals, or arithmetic. It remains purely a data file.
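The two ideas, merging and substitution, can be sketched in Python on plain dictionaries. This illustrates the concepts only and is not the HOCON algorithm: the real specification defines many more cases (substitutions inside longer strings, optional substitutions, value concatenation, cycle detection). The function names are hypothetical, and the sketch assumes a substitution is a whole string value of the form ${path}.

```python
import os


def merge(base: dict, override: dict) -> dict:
    """Merge two objects: nested objects merge recursively, else override wins."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = merge(result[key], value)
        else:
            result[key] = value
    return result


def lookup(root: dict, path: str):
    """Follow a dotted path such as 'db.port' from the root of the config."""
    node = root
    for part in path.split("."):
        node = node[part]
    return node


def resolve(node, root: dict, env=None):
    """Replace whole-value '${path}' strings, falling back to the environment.

    Assumption: cycles are not detected (they end in RecursionError).
    """
    env = os.environ if env is None else env
    if isinstance(node, dict):
        return {k: resolve(v, root, env) for k, v in node.items()}
    if isinstance(node, list):
        return [resolve(v, root, env) for v in node]
    if isinstance(node, str) and node.startswith("${") and node.endswith("}"):
        path = node[2:-1]
        try:
            target = lookup(root, path)
        except (KeyError, TypeError):
            return env[path]  # not in the config: try an environment variable
        return resolve(target, root, env)
    return node


defaults = {"db": {"host": "localhost", "port": 5432}, "timeout": 30}
site = {"db": {"host": "${DB_HOST}"}, "backup": {"port": "${db.port}"}}

merged = merge(defaults, site)
config = resolve(merged, merged, env={"DB_HOST": "db.internal"})
```

Note that nothing here loops, branches on config values, or computes anything: the result is still a plain tree of data, just assembled from several sources.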
5. YAML
YAML doesn’t quite belong here, because it wasn’t designed for configuration specifically. It’s a more readable way to write JSON. In that sense, it’s closer to HJSON than it is to HOCON or Jsonnet, but I’ve put it on the “more engineering” end of the spectrum because YAML has a large specification with quite a few features. Because YAML has an extension mechanism, it could in principle be extended (using tags) to support abstraction features such as includes.
6. Jsonnet
With Jsonnet we jump into the world of configuration-as-code. Jsonnet is a domain-specific programming language designed to generate JSON, with conditionals, expressions, and functions.
7. Writing code in a general-purpose programming language
Many developers are passionate advocates of avoiding config-specific languages entirely; they prefer to load and evaluate a chunk of regular code, instead. This code could be anything from JavaScript to Scala (often, it’s the same language used to implement the application).
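As a rough illustration of the configuration-as-code end of the spectrum, here is a Python sketch that generates a JSON document using a function, a conditional, and a loop. The setting names (log_level, profiling, and so on) are invented for the example; Jsonnet would express the same idea in its own syntax.

```python
import json


def server(name: str, port: int, debug: bool = False) -> dict:
    """Build one server entry; debug servers get extra settings."""
    entry = {
        "name": name,
        "port": port,
        "log_level": "debug" if debug else "info",
    }
    if debug:
        entry["profiling"] = True
    return entry


def build_config(env: str) -> dict:
    """Generate the whole config for an environment such as 'dev' or 'prod'."""
    is_dev = env == "dev"
    return {
        "env": env,
        "servers": [
            server(f"web{i}", 8000 + i, debug=is_dev) for i in range(1, 3)
        ],
    }


# The application (or a build step) can serialize the result to plain JSON.
prod_json = json.dumps(build_config("prod"), indent=2, sort_keys=True)
```

The duplication is gone, but whoever edits this config now has to read and write code, and a tool can no longer edit it mechanically.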
Principled Design
Most of these formats have a thoughtful philosophy — an overall approach that guides them as they include or exclude features. This is a Good Thing, and it’s often overlooked by less-experienced developers.
Tradeoffs
What are some of the tradeoffs, when choosing a point on this spectrum? Here are some that I came up with.
- Dependencies. Do you need a custom library?
- Library size. How large is the code to read/write config files?
- Leakiness of abstraction. How much are you going to have to care about the file format, when you’re using it to get some settings for your app?
- Config file readability. Can people tell what your config file means?
- DRY-ness of config files. Are there any means of abstraction?
- Composing external sources. Can config files reference environment variables, remote resources, and the like?
- Machine-editability. Can a program reliably load/edit/save a config file without sci-fi AI?
- Cross-language interoperability. Are multiple implementations of the config file format likely to be compatible?
- Learnability. Can the people editing your file format guess or easily learn how the format works?
The right answer hinges on people, not tech
Often, tradeoffs like these push a problem around between people.
An application developer who chooses JSON config to keep things simple may be pushing complexity onto someone else, perhaps a customer or someone in ops who will be deploying the app.
An application developer who uses anything more complex than JSON for their config may be asking customers, ops, or support to learn a new syntax, or even to learn how to program.
When we think about engineering tradeoffs, sometimes we feel we’re advocating the Right Thing, but in fact we’re advocating the Easiest Thing For Us Personally.
There won’t be a single right way to balance different interests. Who will configure your app? What background do they have? The people matter.
All of the choices work
None of these choices for config are categorically broken. When we choose one, we’re making a judgment about tradeoffs, and that judgment matters; we’re also applying some measure of personal taste. But we aren’t choosing between broken and not-broken. (That’s what makes this an example worth discussing, I think.)
See also: This planet.mozilla.org post from today, “Standardizing Things My Way”.
Though, for what it’s worth, I don’t see the differences in these particular choices as irrelevant cosmetics – I do think there are some meaningful differences, and some real context-specific reasons to prefer one or another way to do application config.
One problem I find with most of these formats is that they *look* like JSON, but because they don’t conform to spec, they’re not readable by standard JSON tools, and they’re too obscure for people to know which tools to use.
Most people know JSON, but I’d never heard of HOCON or HJSON before reading this post. As such, if I saw it, I’d assume it was JSON, then find myself cursing the developers responsible for producing “crappy non-conformant config files”.
To me the question is whether, in a particular context, your desire to parse the file with a JSON parser outweighs, for example, an ops team’s desire to reference env vars in a config. It’s easy to find pros and cons; it can be hard to decide which ones matter most in a given situation.
Many widely adopted apps don’t even seem to have something like HJSON or HOCON that’s factored out and documented … popular apps often seem to have stuff more like the Play/Akka 1.x ad hoc hacks … I think it’s because JSON and XML alone really leave some needs unmet, so people start to elaborate, but usually in a half-assed way.
I kinda missed your point that a non-JSON-like or more clearly unique format could be good. Pros and cons to that 🙂 I think JSON superset is nice for learnability for example.
Yeah, I’m not saying that those formats don’t add useful elements that aren’t in JSON – comments is a big one, no question. It’s just that to anyone not familiar with those variants, JSON+extensions is basically just malformed JSON…
That’s one of the useful properties of YAML – as well as being easier to read/write without specialised tools, it has the advantage of not looking anything like JSON… it doesn’t create unwanted expectations.
Speaking from an ops point of view, for me, HOCON represents an almost ideal blend of simplicity (JSON-like, human editable) and features (value tree merging, config key references).
In fact, we built an entire configuration database around the merge feature. It allows us to store server meta- and config data in a hierarchical fashion – we can define common properties for a group of servers in one config file and only override / amend specific properties for individual (special case) servers. The resulting DRY-ness of such structure was a key benefit in maintaining config for thousands of servers.