Dear package managers: dependency resolution results should be in version control
by havoc
If your build depends on a non-exact dependency version (like “somelibrary >= 3.1”), and the exact version gets recomputed every time you run the build, your project is broken.
- You can no longer build old versions and get the same results.
- Want to cut a bugfixes-only release from an old branch? Sorry.
- Want to use git bisect? Nope.
- You can’t rely on your code working because it will change by itself. Maybe it worked today, but that doesn’t mean it will work tomorrow. Maybe it worked in continuous integration, but that doesn’t mean it will work when deployed.
- Wondering whether any dependency versions changed and when? No way to figure it out.
Package management and build tools should get this right by default. It is a real problem; I’ve seen it bite projects I’m working on countless times.
(I know that some package managers get it right, and good for them! But many don’t. Not naming names here because it’s beside the point.)
What’s the solution? I’d argue that it’s been well-known for a while. Persist the output of the dependency resolution process and keep it in version control.
- Start with the “logical” description of the dependencies as hand-specified by the developers (leaf nodes only, with version ranges or minimum versions).
- Have a manual update command to run the dependency resolution algorithm, arriving at an exhaustive list of all packages (ideally: identified by content hash and including results for all possible platforms). Write this to a file with deterministic sort order, and encourage keeping this file in git. This is sometimes called a “lock file.”
- Both CI and production deployment should use the lock file to download and install an exact set of packages, ideally bit-for-bit content-hash-verified.
- When you want to update dependencies, run the update command manually and submit a pull request with the new lock file, so CI can check that the update is safe. There will be a commit in git history showing exactly what was upgraded and when.
Bonus: downloading a bunch of fixed package versions can be extremely efficient; there’s no need to download package A in order to find its transitive dependencies and decide package B is needed, instead you can have a list of exact URLs and download them all in parallel.
You may say this is obvious, but several major ecosystems do not do this by default, so I’m not convinced it’s obvious.
Reproducible builds are (very) useful, and when package managers can’t snapshot the output of dependency resolution, they break reproducible builds in a way that matters quite a bit in practice.
(Note: of course this post is about the kind of package manager or build tool that manages packages for a single build, not the kind that installs packages globally for an OS.)