The Java ecosystem and Scala ABI versioning
by havoc
On the sbt mailing list there’s a discussion of where to go with “cross versioning.” Here’s how I’ve been thinking about it.
Disclaimer
I’m a relative newcomer to the Scala community. If I push anyone’s buttons it’s not intentional. This is a personal opinion.
Summary
Two theories:
- The largest problem created by changing ABI contracts is an explosion of combinations rather than the ABI change per se.
- The ABI of the Scala standard library is only one of the many ABIs that can cause problems by changing. A general solution to ABI issues would help cope with ABI changes to any jar file, even those unrelated to Scala.
Proposal: rather than attacking the problem piecemeal by cross-versioning with respect to a single jar (such as the Scala library), cross-version with respect to a global universe of ABI-consistent jars.
This idea copies from the Linux world, where wide enterprise adoption has been achieved despite the open source Linux kernel project’s active hostility to a fixed ABI, and despite relatively frequent ABI changes in userspace (for example, GTK+ moving from 1.2 to 2.0 to 3.0). I believe there’s a sensible balance between allowing innovation and providing a stable platform for application developers.
Problem definition: finding an ABI-consistent universe
If you’re writing an application or library in Scala, you have to select a Scala ABI version; then also select an ABI version for any dependencies you use, whether they are implemented in Scala or not. For example, Play, Akka, Netty, slf4j, whatever.
Not all combinations of dependencies exist and work. For example, Play 1.2 cannot be used with Akka 1.2 because Play depends on an SBT version which depends on a different Scala version from Akka.
Due to a lack of coordination, identifying an ABI-consistent universe involves trial-and-error, and the desired set of dependencies may not exist.
Projects don’t reliably use something like semantic versioning, so it can be hard even to determine which versions of a given jar share the same ABI. Worse, if you get this wrong, the JVM will complain very late in the game, often at runtime; unfortunately, there is no mechanism on the JVM platform to encode an ABI version in a jar.
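To make the failure mode concrete, here's a small sketch (the Codec library, its versions, and the application code are all invented for illustration). A change that is source-compatible can still be binary-incompatible, and the JVM only notices when the missing method is actually called:

```scala
// Jar version 1.0 of a hypothetical library, which the application was compiled against:
object Codec {
  def encode(s: String): Array[Byte] = s.getBytes("UTF-8")
}

// Jar version 1.1 adds a default parameter. Every caller still compiles from
// source, but at the bytecode level the one-argument encode(String) method is
// gone, replaced by encode(String, Boolean) plus a synthetic default:
//
//   object Codec {
//     def encode(s: String, strict: Boolean = true): Array[Byte] = s.getBytes("UTF-8")
//   }

// Application bytecode built against 1.0 but run with the 1.1 jar on the classpath:
object App {
  def main(args: Array[String]): Unit = {
    // Builds and runs fine against 1.0, then throws java.lang.NoSuchMethodError
    // here at runtime if the 1.1 jar is substituted.
    println(Codec.encode("hello").length)
  }
}
```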
Whenever one jar in your stack changes its ABI, you have a problem. To upgrade that jar, anything which depends on it (directly or transitively) also has to be upgraded. This is a coordination problem for the community.
To see the issue on a small scale, look at what happens when a new SBT version comes out. Initially, no plugins are using the new version so you cannot upgrade to it if you’re using plugins. Later, half your plugins might be using it and half not using it: you still can’t upgrade. Eventually all the plugins move, but it takes a while. You must upgrade all your plugins at once.
Whenever a dependency, such as sbt, changes its ABI, then the universe becomes a multiverse: the ecosystem of dependencies splits. Changing the ABI of the Scala library, or any widely-used dependency such as Akka, has the same effect. The real pain arrives when many modules change their ABI, slicing and dicing the ecosystem into numerous incompatible, undocumented, and ever-changing universes.
Developers must choose among these universes, finding a working one through trial and error.
For another description of the problem, see this post from David Pollak.
Often, projects are reluctant to have dependencies on other projects, because the more dependencies you have the worse this problem becomes.
One solution: coordinate an explicit universe
This idea shamelessly takes a page from Linux distributions.
We could declare that there is a Universe 1.0. This universe contains a fixed ABI version of the Scala standard library, of SBT, of Akka, of Play — in principle, though initially not in practice, of everything.
To build your application, rather than being forced to specify the version of each individual dependency, you could specify that you would like Universe 1.0. Then you get the latest release for each dependency as long as its ABI remains Universe-1.0-compatible.
There’s also a Universe 2.0. In Universe 2.0, the ABI can be changed with respect to Universe 1.0, but again Universe 2.0 is internally consistent; everything in Universe 2.0 works with everything else in Universe 2.0, and the ABI of Universe 2.0 does not ever change.
The idea is simple: convert an undocumented, ever-changing set of implicit dependency sets into a single series of documented, explicit, testable dependency sets. Rather than an ad hoc M versions of Scala times N versions of SBT times O versions of Akka times P versions of whatever else, there’s Universe 1.0, Universe 2.0, Universe 3.0, etc.
This could be straightforwardly mapped to repositories; a repository per universe. Everything in the Universe 1.0 repository has guaranteed ABI consistency. Stick to that repository and you won’t have ABI problems.
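As a purely hypothetical sketch of what that could look like in a build (the universeVersion key, the repository URL, and the "universe" version marker are all invented; nothing like this exists today):

```scala
// Hypothetical sbt-style sketch, not a real feature.
val universeVersion = settingKey[String]("which ABI-consistent universe to build against")

universeVersion := "1.0"

// One repository per universe; everything published into it is ABI-consistent.
resolvers += "universe-repo" at s"https://repo.example.org/universe-${universeVersion.value}/"

// Dependencies carry no individual version numbers; the imaginary "universe"
// marker means "the latest release of this artifact in the chosen universe."
libraryDependencies ++= Seq(
  "com.typesafe.akka" % "akka-actor" % "universe",
  "org.slf4j"         % "slf4j-api"  % "universe"
)
```

Upgrading the whole stack would then be a one-line change to universeVersion.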
One of the wins could be community around these universes. With everyone sharing the same small number of dependency sets, everyone can contribute to solving problems with those sets. Today, every application developer has to figure out and maintain their own dependency set.
How to do it
Linux distributions and large multi-module open source projects such as GNOME provide a blueprint; see, for example, the current Fedora and GNOME descriptions of their release processes.
For these projects, there’s a schedule with a development phase (not yet ABI frozen), freeze periods, and release dates. During the development phase incompatibilities are worked out and the final ABI version of everything is selected.
At some point in time it’s all working, and there’s a release. Post-release, the ABI of the released universe isn’t allowed to change anymore. ABI changes can only happen in the next version of the universe.
Creating the universe is simply another open source project, one which develops release engineering infrastructure. “Meta-projects” such as Fedora and GNOME involve a fair amount of code to automate and verify their releases as a whole. The code in a Universe project would convert some kind of configuration describing the Universe into a published repository of artifacts.
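For concreteness, the configuration describing a universe might be little more than a list of coordinates that the release tooling fetches, checks, and republishes as a single repository. A minimal sketch follows; the case classes, group IDs, and version numbers are illustrative, not a real format:

```scala
// Illustrative data model for a universe description; not an existing tool or format.
case class Artifact(group: String, name: String, version: String)
case class Universe(version: String, artifacts: Seq[Artifact])

val universe10 = Universe(
  version = "1.0",
  artifacts = Seq(
    Artifact("org.scala-lang", "scala-library",  "2.9.1"),
    Artifact("org.slf4j",      "slf4j-api",      "1.6.4"),
    Artifact("net.example",    "some-framework", "1.2")
  )
)

// Release tooling would read a description like this, pull or build each jar,
// verify that nothing breaks ABI relative to what the universe already contains,
// and publish the whole set as the Universe 1.0 repository.
```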
There are important differences between the way the Linux ecosystem works today and the way the Java ecosystem works. Linux packages are normally released as source code by upstream open source developers, leaving Linux distributions to compile against particular system ABIs and to sign the resulting binaries. Java packages are released as binaries by upstream, and while they could be signed, often they are not. As far as I know, however, there is nothing stopping a “universe repository” project from picking and choosing which jar versions to include, or even signing everything in the universe repository with a common key.
I believe that in practice, there must be a central release engineering effort of some kind (with automated checks to ensure that ABIs don’t change, for example). Another approach would be completely by convention, similar to the current cross-build infrastructure, where individual package maintainers could put a universe version in their builds when they publish. I don’t believe a by-convention-only approach can work.
To make this idea practical, there would have to be a “release artifact” (which would be the entire universe repository) and it would have to be tested as a whole and stamped “released” on a certain flag day. There would have to be provisions for “foreign” jars, where a version of an arbitrary already-published Java jar could be included in the universe.
It would not work to rely on getting everyone on earth to buy into the plan and follow it closely. A small release engineering team would have to create the universe repository independently, without blocking on others. Close coordination with the important packages in the universe would still be very helpful, of course, but a workable plan can’t rely on getting hundreds of individuals to pay attention and take action.
Scala vs. Java
I don’t believe this is a “Scala” problem. It’s really a Java ecosystem problem. The Scala standard library is a jar which changes ABI when the major version is bumped. A lot of other jars depend on the standard library jar. Any widely-used plain-Java jar that changes ABI creates the same issues.
(Technicality: the Scala compiler also changes its code generation, which changes ABIs, but since that breaks ABIs at the same time the standard library does, I don’t think it creates unique issues.)
Thinking of this as a “Scala problem” frames it poorly and leads to incomplete solutions like cross-versioning based only on the Scala version. A good solution would also support ABI changes in something like slf4j or commons-codec or whatever example you’d like to use.
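For comparison, this is roughly what the existing Scala-only cross-versioning looks like in an sbt build (the coordinates are just examples): the %% operator appends the Scala version to the artifact name, but there is no equivalent marker for ABI changes in plain-Java dependencies.

```scala
// %% resolves an artifact whose name has the Scala version appended,
// e.g. scalatest_2.9.1 rather than plain scalatest:
libraryDependencies += "org.scalatest" %% "scalatest" % "1.6.1" % "test"

// A plain-Java dependency gets no such marker, so an ABI break in slf4j
// is invisible to this scheme:
libraryDependencies += "org.slf4j" % "slf4j-api" % "1.6.4"
```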
btw, it would certainly be productive to look at what .NET and Ruby and Python and everyone else have done in this area. I won’t try to research and catalog all those in this post (but feel free to discuss in comments).
Rehash
The goal is that rather than specifying the version for every dependency in your build, you would specify “Universe 1.0”, which would mean “the latest version of everything in the ABI-frozen and internally consistent 1.0 universe of dependencies.” When you get ready to update to a newer stack, you’d change that to “Universe 2.0” and get another ABI-frozen, internally consistent universe of dependencies (but everything would be shinier and newer).
This solution scales to any number of ABI changes in any number of dependencies; no matter how many dependencies or how many ABI changes in those dependencies, application developers only have to specify one version number (the universe version). Given the universe, an application will always get a coherent set of dependencies, and the ABI will never change for that universe version.
This solution is tried and true. It works well for the universe of open source C/C++ programs. Enterprise adoption has been just fine.
After all, the problem here is not new and unique to Java. It wasn’t new in Linux either; when we were trying to work out what to do in the GNOME Project in 1999-2001 or so, in part we looked at Sun’s longstanding internal policies for Solaris. Other platforms such as .NET and Ruby have wrestled with it. There’s a whole lot of prior art. If there’s an issue unique to Java and Scala, it seems to be that we find the problem too big and intimidating to solve, given the weight of Java tradition.
I’m just writing down half-baked ideas in a blog post; making anything like this a reality hinges on people doing a whole lot of work.
Comments
You are welcome to comment on this post, but it may make more sense to add to the sbt list thread (use your judgment).
This is at least somewhat related to Java module systems. OSGi is pretty well-deployed (Eclipse is built on it, for example) but OpenJDK is attempting something for Java 8 AFAIK.
So, although I think the idea is generally interesting, you sometimes just want to upgrade one library. Say you need to support new hardware, so you need a new libX.
If that requires you to upgrade all libraries, and all of those libraries potentially have a new ABI, that means you have to re-code significant parts of your software and re-test the whole thing.
In the end, there’s no real solution for this unless we expect the ABI/APIs never to change, which would not be very productive either.
I do agree that an automated tool to validate that the libraries are internally consistent would be nice. Except it would not work for reflection…
You can always fall back to the current situation and specify the repository/version for some libraries by hand, or even drop them in a lib/ directory… it’s just that you’d be back to the pain of the current situation. Nothing would stop you from making that tradeoff though. Having some defined common universes doesn’t keep you from rolling your own, if you really want to, just as you do now.
Both now and with common universes, of course, it’s possible that your new libX flat-out conflicts with other stuff you’re using, and it’s therefore impossible to upgrade (at least without rebuilding/patching one or more of your other dependencies to remove the conflicts).
If the new libX doesn’t conflict with your other stuff, then you could upgrade it either way, by just specifying it by hand instead of asking for the common universe version.
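A tiny sketch of that fallback, reusing the imaginary “universe” marker from the post (libX’s coordinates are made up): everything else stays on the universe, and the one library you need is pinned by hand.

```scala
libraryDependencies ++= Seq(
  "com.example" % "libX"      % "2.0",       // pinned manually, outside the universe
  "org.slf4j"   % "slf4j-api" % "universe"   // still resolved from Universe 1.0
)
```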
For Python and Ruby, the solution to ABI issues is fairly simple: there is no ABI. OK, there is stuff for C extensions, but the interfaces between modules do not have any binary compatibility stuff.
Maybe another way to look at it is that the API and the ABI are the same… no compile-time vs. runtime distinction, but there’s still an issue of the interface remaining compatible or not.
I think in some ways it boils down to collaboration, a concept hard to grasp for the enterprise world.
It’s not that things don’t break in Debian, or that there wouldn’t be problems with incompatibilities. But all Debian developers collaborate constantly to keep the universe running. Debian even collaborates with Fedora (and Ubuntu, of course).
In the enterprise world every team or shop struggles on its own to get a working universe. For a small team this is of course a heroic challenge.
And we work with source code. So if a new version of a compiler or dependency comes in, we just automatically rebuild everything that depends on it. That’s one reason why distribution devs are so religious about having the source code and easy builds.
Great thoughts!
Some of it is already starting to happen, e.g. the idea of having a Universe of libraries which are compiled together and verified to work.
The Typesafe Stack is another example where specific versions are bundled and shipped together.
As far as I have understood, Jigsaw will enable us to describe the dependencies of our libraries in a much better and machine-verified way.
In the end, compiling from source and the growing maturity of key parts of the ecosystem both reduce the problem.
A lot of work remains, but I’m optimistic!
Coming from a Linux world background, you’re picturing the ‘Universe’ according to the release life cycles of distributions/desktop environments; coming from a mainly C#/.NET Framework background, I’m just reminded of .NET Framework versions!
I second that it’s not a Scala-specific problem; the JDK modularisation JSRs have been raising expectations for quite some time, with OSGi already here.
As .NET versioned assemblies (http://msdn.microsoft.com/en-us/library/51ket42z%28v=vs.100%29.aspx) led the way out of DLL hell (by allowing different dependencies of a project to use multiple versions of the same library), things like OSGi are said to lead the way out of JAR hell.
Best regards, Max