I recently built cargo-semver-checks, a linter that ensures crates adhere to semantic versioning. This is why and how I built it.
Fearless development is a key theme throughout Rust. "If it compiles, it works", fearless concurrency, etc.
But there's one aspect of Rust (and nearly all other languages) that isn't entirely fearless yet: cargo update, which upgrades the versions of the project's dependencies.
Most of the time, running cargo update is uneventful, or even joyous due to performance optimizations in the dependencies.
But on rare occasions, cargo update can also cause a project to no longer compile.
cargo assumes that crates follow semantic versioning (semver), but despite maintainers' best efforts, a variety of accidental semver violations have happened:
- pyo3 v0.5.1 accidentally changed a function signature;
- clap v3.2.x accidentally had a type stop implementing an auto-trait;
- multiple block-buffer versions accidentally broke their MSRV contract;
- and many more: this paper claims 43% of yanked (un-published) releases mentioned semver breaks as a reason for yanking.
In this situation, maintainers face a difficult choice: yank the problematic versions, or ignore the semver break and embrace the accidental change?
Yanking can break the builds of everyone using that version.
For example, the clap case above would have required all 3.2 versions to be yanked, breaking many projects.
In other cases, and especially if the semver break is detected early, yanking is feasible and preferable: pyo3 v0.5.1 was yanked and replaced with v0.5.2, which was semver-compatible with v0.5.0.
But what if we could detect semver breaks even earlier ... perhaps even before the problematic cargo publish?
This is what cargo-semver-checks helps with.
And why do these semver accidents happen even in well-run projects like pyo3 and clap?
See if you spot the semver break in the example code in the next section.
Breaking semver with auto-traits
Semver in Rust is easy to break by accident.
I am not even talking about ambiguous situations like #[repr(transparent)]. Per the Rustonomicon: "This repr is only considered part of the public ABI of a type if either the single field is pub, or if its layout is documented in prose. Otherwise, the layout should not be relied upon by other crates."
Consider the following example:
```rust
struct Foo {
    x: String,
}

pub struct Bar {
    y: Foo,
}
```
Then change Foo.x from String to Rc<str>. 💥
This feels completely benign, doesn't it?
We changed the non-public field x of the non-public struct Foo.
Surely changes in non-public code can't affect the crate's public API, right?
Right??
Nope! This is a breaking change, and semver says our crate needs a new MAJOR version.
To understand the breaking change, we need to briefly talk about the Send and Sync traits, which are automatically implemented by the compiler whenever possible:
"Send and Sync are also automatically derived traits. This means that, unlike every other trait, if a type is composed entirely of Send or Sync types, then it is Send or Sync. Almost all primitives are Send and Sync, and as a consequence pretty much all types you'll ever interact with are Send and Sync. Major exceptions include: [...] Rc isn't Send or Sync (because the refcount is shared and unsynchronized)."
String is both Send and Sync, but Rc<str> is neither.
Changing x to Rc<str> made struct Foo no longer Send or Sync, which in turn made struct Bar (via its y: Foo field) no longer Send or Sync either.
The original implementation was safe to use in concurrent contexts, but the new one is not, breaking any downstream code that, for example, moves a Bar into another thread.
Oops!
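To make the breakage concrete, here is a minimal sketch built on the original String-based definitions. The constructor Bar::new, the name_len method, and the downstream function len_on_other_thread are hypothetical additions for illustration only. Moving a Bar into another thread requires Bar: Send, so this compiles as written; after switching Foo.x to Rc<str>, both the assert_send_sync::<Bar>() call and the thread::spawn call would fail to compile.

```rust
use std::thread;

// Original definitions from the example: every field is Send + Sync, so the
// compiler automatically implements Send and Sync for Foo and Bar.
struct Foo {
    x: String,
}

pub struct Bar {
    y: Foo,
}

impl Bar {
    // Hypothetical constructor, only here so the example can build a Bar.
    pub fn new(s: &str) -> Self {
        Bar { y: Foo { x: s.to_string() } }
    }

    // Hypothetical accessor, for illustration.
    pub fn name_len(&self) -> usize {
        self.y.x.len()
    }
}

// Compiles only if T is both Send and Sync; calling it is a compile-time check.
fn assert_send_sync<T: Send + Sync>() {}

// Hypothetical downstream code: moving a Bar into another thread requires
// Bar: Send. If Foo.x became Rc<str>, this function would no longer compile.
pub fn len_on_other_thread(bar: Bar) -> usize {
    thread::spawn(move || bar.name_len()).join().unwrap()
}

fn main() {
    assert_send_sync::<Bar>();
    println!("{}", len_on_other_thread(Bar::new("hello")));
}
```

The empty generic function assert_send_sync is a common idiom for turning an auto-trait expectation into a compile error. It only protects the types someone remembered to list, though, which is exactly the kind of vigilance that doesn't scale.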
Here, both structs were in the same file. But what if one of them were in a different file elsewhere in the crate? We might edit a file containing only crate-internal code and thereby break the public API in the opposite corner of the crate.
No wonder accidental semver breaks happen. "Just be careful" is not going to work here.
Finding semver violations with cargo-semver-checks
Checking whether changing a field type affects the public API of the crate is hard.
We're essentially asking: "Are there any public types in the crate API that were Send or Sync, but would stop being Send or Sync as a result of changing this field?"
Answering that question is way too much work for a human. It's unreasonable to do that much work just to change one private field. With that much overhead, we'd never get anything done.
The compiler, on the other hand, does this sort of work all the time. Let's ask it to help!
The rustdoc tool allows us to get our crate's type information as a (large) JSON file.
We can make one JSON file showing the crate's information before our changes — the "baseline" — and another file showing the crate information after our changes: the "current" JSON.
Our Send and Sync problem from earlier would be reflected in these files: the baseline JSON would show that Bar is both Send and Sync, but the current JSON would not include implementations of either Send or Sync for Bar.
With the right flags, rustdoc can also output information about non-public types like Foo, whose Send and Sync implementations will similarly have disappeared.
This is how we'd answer our earlier "types that previously were Send or Sync but now are not" question.
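Here is a minimal sketch of that question as a set difference, under a loudly simplified assumption: each JSON file has already been reduced to a set of (type name, trait name) pairs. The real rustdoc JSON schema is much richer, and TraitImpls, impls, and lost_auto_traits are hypothetical names, not part of rustdoc or cargo-semver-checks.

```rust
use std::collections::BTreeSet;

// Hypothetical, heavily simplified view of one rustdoc JSON file:
// only the (type name, implemented trait name) pairs. Real rustdoc JSON
// has a much richer structure than this.
type TraitImpls = BTreeSet<(String, String)>;

// Helper to build a TraitImpls set from string literals.
fn impls(pairs: &[(&str, &str)]) -> TraitImpls {
    pairs
        .iter()
        .map(|(ty, tr)| (ty.to_string(), tr.to_string()))
        .collect()
}

// "Which types previously implemented Send or Sync, but no longer do?"
// Keeps pairs present in the baseline but absent from the current crate,
// restricted to the two auto-traits we care about here.
fn lost_auto_traits(baseline: &TraitImpls, current: &TraitImpls) -> Vec<(String, String)> {
    baseline
        .difference(current)
        .filter(|(_, tr)| tr == "Send" || tr == "Sync")
        .cloned()
        .collect()
}

fn main() {
    // Baseline: x: String, so Foo and Bar are both Send and Sync.
    let baseline = impls(&[("Foo", "Send"), ("Foo", "Sync"), ("Bar", "Send"), ("Bar", "Sync")]);
    // Current: x: Rc<str>, so none of those auto-trait impls remain.
    let current = impls(&[]);
    for (ty, tr) in lost_auto_traits(&baseline, &current) {
        println!("{ty} is no longer {tr}");
    }
}
```

Filtering after the difference means a lost non-auto trait (say, a removed Clone impl) is ignored by this particular check; in a design like this, it would belong to a separate rule.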
Zooming out, we can also detect other semver issues in the same manner.
Public function got removed?
"Find public functions that previously existed but don't exist anymore."
An existing enum became #[non_exhaustive] for the first time?
"Find public enums that were previously exhaustive but are now #[non_exhaustive]."
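The same before-and-after pattern covers these other rules. This sketch again assumes a hypothetical, simplified ApiSnapshot type rather than the real rustdoc JSON, and writes the two example checks as plain Rust functions; the item paths in main are made up.

```rust
use std::collections::BTreeMap;

// Hypothetical simplified snapshot of a crate's public API
// (not the real rustdoc JSON schema).
struct ApiSnapshot {
    // Paths of public functions, e.g. "my_crate::parse".
    functions: Vec<String>,
    // Public enums, mapped to whether they are #[non_exhaustive].
    enums: BTreeMap<String, bool>,
}

// "Find public functions that previously existed but don't exist anymore."
fn removed_functions(baseline: &ApiSnapshot, current: &ApiSnapshot) -> Vec<String> {
    baseline
        .functions
        .iter()
        .filter(|f| !current.functions.contains(*f))
        .cloned()
        .collect()
}

// "Find public enums that were previously exhaustive but are now #[non_exhaustive]."
// Enums missing from the current snapshot are not reported here; removal
// would be a separate rule.
fn newly_non_exhaustive(baseline: &ApiSnapshot, current: &ApiSnapshot) -> Vec<String> {
    baseline
        .enums
        .iter()
        .filter(|(name, was_non_exhaustive)| {
            !**was_non_exhaustive && current.enums.get(*name) == Some(&true)
        })
        .map(|(name, _)| name.clone())
        .collect()
}

fn main() {
    let baseline = ApiSnapshot {
        functions: vec!["my_crate::parse".into(), "my_crate::render".into()],
        enums: BTreeMap::from([("my_crate::Mode".to_string(), false)]),
    };
    let current = ApiSnapshot {
        functions: vec!["my_crate::parse".into()],
        enums: BTreeMap::from([("my_crate::Mode".to_string(), true)]),
    };
    println!("removed functions: {:?}", removed_functions(&baseline, &current));
    println!("newly #[non_exhaustive]: {:?}", newly_non_exhaustive(&baseline, &current));
}
```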
And so on for each of the dozens, if not hundreds, of semver rules.
No wonder they are hard to follow when I can't even say for sure how many of them there are...
I'll borrow (immutably, of course) a line from @oli_obk:
"If you're creating rules that need more than a page of prosa or more than a month to come up with, consider writing the rules in an executable way instead."
— unsafe async const fn oli (@oli_obk) July 23, 2022
cargo-semver-checks is just:
- a collection of lint checks over the "before-and-after" JSON files, where each lint looks for a particular kind of semver violation;
- a way to execute those checks, and
- some code that runs rustdoc with the right flags, and generally ties everything together.
Alternative approaches exist, such as using unstable internal compiler APIs, diffing source code ASTs, or diffing the rustdoc JSON files. The cargo-semver-checks docs summarize the tradeoffs, and the details have been discussed at length in GitHub PRs and issues.
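Structurally, that amounts to a list of named checks plus a loop that runs them. The sketch below models each lint as a named Rust function over two hypothetical snapshots; it only illustrates the shape, since cargo-semver-checks itself expresses lints as queries rather than as Rust functions. The Snapshot type, the Lint struct, run_lints, and the lint name removed_item are all made up for this example.

```rust
use std::collections::BTreeSet;

// Hypothetical simplified crate snapshot: the set of public item paths.
type Snapshot = BTreeSet<String>;

// One lint: a name plus a check over the before-and-after snapshots
// that returns the offending items.
struct Lint {
    name: &'static str,
    check: fn(&Snapshot, &Snapshot) -> Vec<String>,
}

// Example check: public items present in the baseline but not the current crate.
fn removed_items(baseline: &Snapshot, current: &Snapshot) -> Vec<String> {
    baseline.difference(current).cloned().collect()
}

// Runs every lint and collects (lint name, offending item) pairs.
fn run_lints(lints: &[Lint], baseline: &Snapshot, current: &Snapshot) -> Vec<(&'static str, String)> {
    lints
        .iter()
        .flat_map(|lint| {
            (lint.check)(baseline, current)
                .into_iter()
                .map(move |item| (lint.name, item))
        })
        .collect()
}

fn main() {
    let lints = [Lint { name: "removed_item", check: removed_items }];
    let baseline: Snapshot = ["my_crate::parse", "my_crate::render"]
        .iter()
        .map(|s| s.to_string())
        .collect();
    let current: Snapshot = ["my_crate::parse"].iter().map(|s| s.to_string()).collect();
    for (lint, item) in run_lints(&lints, &baseline, &current) {
        println!("{lint}: {item}");
    }
}
```

Keeping lints as data, separate from the runner, is what makes adding a new semver rule a matter of writing one more check rather than touching the execution code.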
And when I say lint checks, I really just mean queries: in cargo-semver-checks, "Are there any types in the API that stopped being Send or Sync?" is a query written in a strongly-typed query language and executed by the Trustfall "query everything" engine I've also been working on.
To learn more about Trustfall, check out my 10-minute talk "How to Query (Almost) Everything" from the HYTRADBOI 2022 conference.
We'll save the details on how exactly queries are executed for another time.
Onward to fearless cargo update
cargo update isn't fearless yet.
cargo-semver-checks as of v0.9.1 implements only 18 semver checks, with many more to go.
(Happy to mentor folks to help add new checks!)
rustdoc's JSON output format is still in the process of being stabilized, and is currently only available on nightly Rust.
And there are always more ergonomics improvements and more exciting new features ahead of us.
But there's also a plan for making cargo-semver-checks a part of cargo itself, and members of the cargo team are already helping improve cargo-semver-checks.
There's a GitHub Action for running cargo-semver-checks right before your cargo publish step:
```yaml
- name: Check semver
  uses: obi1kenobi/cargo-semver-checks-action@v1
- name: Publish to crates.io
  run: cargo publish  # replace with your own publish command
```
There's even interest in hosting each crate version's generated JSON files on docs.rs once the format is stabilized, which would further speed up and simplify semver-checking.
There's still work to do, but the future is bright!
Thanks to Bojan Serafimov, Luca Palmieri, and Doc Jones for their feedback on drafts of this post. Any mistakes are mine alone.