cargo-semver-checks today and in 2023

December 23, 2022 semver rust retrospective year in review

cargo-semver-checks ends 2022 with 40,000 downloads from crates.io and the ability to prevent 30 different kinds of semver issues, which it has already done in real-world use cases. Inspired by Yoshua Wuyts' "Rust in 2023 (by Yosh)" post, here are my thoughts on cargo-semver-checks in 2022, and what I look forward to in 2023 and beyond.

Following semver in Rust is a perfect example of a workflow worth automating:

Important to get right, painful if done wrong: cargo requires all crates to follow semver, so breaking semver in one crate can have a ripple effect across the ecosystem. [Sidenote: I recently re-learned this lesson myself, for the umpteenth time.] But if done right, semver is completely invisible.
Countless complex rules: There are hundreds of ways to cause a breaking change, many of them non-obvious. [Sidenote: The tracking issue for not-yet-implemented lints in cargo-semver-checks lists 60+ ways, and is far from an exhaustive list. I'm currently reading Rust for Rustaceans and discovering new ways to break semver, each more surprising than the last. For a quick taste, check out my previous blog post.]
Code that violates semver doesn't look wrong: No code reviewer can be expected to reliably flag most of the semver issues, even assuming they are well-versed in all the semver rules. The evidence on this point is particularly overwhelming.
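To make the last point concrete, here is a small sketch of a change that looks harmless in review but is a major breaking change: adding a public field to a struct whose fields are all public. The modules `v1` and `v2` and the `Config` struct are hypothetical stand-ins for two releases of the same crate.

```rust
// Two hypothetical releases of the same crate, modeled as modules.
mod v1 {
    pub struct Config {
        pub retries: u32,
    }
}

mod v2 {
    pub struct Config {
        pub retries: u32,
        pub verbose: bool, // the "harmless" new field
    }
}

/// Downstream code written against v1: a plain struct literal.
fn downstream_on_v1() -> u32 {
    let config = v1::Config { retries: 3 };
    config.retries
}

/// The same downstream code only compiles against v2 after being edited.
/// The unedited literal `v2::Config { retries: 3 }` is rejected by the
/// compiler because the `verbose` field is missing.
fn downstream_on_v2() -> (u32, bool) {
    let config = v2::Config { retries: 3, verbose: false };
    (config.retries, config.verbose)
}

fn main() {
    println!("v1 retries: {}", downstream_on_v1());
    println!("v2 config: {:?}", downstream_on_v2());
}
```

Nothing in the `v2` diff looks wrong on its own: the breakage only shows up in other people's code.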

Some might say the solution is to "git gud". I deeply respect operational excellence, but this is not the way.

Civilization advances at the rate at which we develop robust abstractions. I am writing this on a computer I cannot build, under a blanket I cannot weave, having enjoyed a meal with ingredients I cannot grow. I dedicated ten years to math competitions, [Sidenote: And developing test-taking strategies aimed at getting a perfect score given limited time!] and I can't even calculate a logarithm by hand! Can you? [Sidenote: If my life depended on it, I'd use the Newton-Raphson method to approximate my way to it, but there's zero chance that's actually the best way. My friends with aero-astro engineering degrees still find it hilarious that I once used binary search to calculate orbital maneuvers for Kerbal Space Program, instead of the closed-form formula that apparently existed 😅]

Gatekeeping to only include people with a PhD in "Semver in Rust" won't cut it.

Yosh Wuyts quotes another Rust contributor as saying: "The job of an expert is to learn everything about a field there is to learn, and then distill it so that others don't have to." [Sidenote: I'll gladly put their name here if the quote is confirmed as coming from them. I wasn't present when this was said, and didn't want to risk misattributing.] I couldn't agree more!

2022: Rust + semver - tedium = 💖

cargo-semver-checks was born in mid-July 2022, when I realized that building a semver linter boils down to only two things:

a list of machine-checkable rules, and
a system to check them.

At a high level, that's all cargo-semver-checks is: a checklist, and a for-loop over it.
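As a toy illustration of "a checklist, and a for-loop over it", here is a minimal sketch. It is not how cargo-semver-checks is implemented: the `PublicApi` type, the `Lint` struct, and the `item_missing` rule are all hypothetical, and the public API is reduced to a set of item paths instead of real rustdoc JSON.

```rust
use std::collections::BTreeSet;

/// A drastically simplified public API: just the set of public item paths.
/// (Hypothetical stand-in for the rustdoc JSON the real tool consumes.)
struct PublicApi {
    items: BTreeSet<&'static str>,
}

/// One machine-checkable rule: a name plus a function that reports violations.
struct Lint {
    name: &'static str,
    check: fn(&PublicApi, &PublicApi) -> Vec<String>,
}

/// Rule: every item that was public in the baseline must still exist.
fn removed_items(baseline: &PublicApi, current: &PublicApi) -> Vec<String> {
    baseline
        .items
        .difference(&current.items)
        .map(|item| format!("public item `{item}` was removed"))
        .collect()
}

/// The "system to check them": a for-loop over the checklist.
fn run_lints(lints: &[Lint], baseline: &PublicApi, current: &PublicApi) -> Vec<String> {
    let mut report = Vec::new();
    for lint in lints {
        for finding in (lint.check)(baseline, current) {
            report.push(format!("[{}] {}", lint.name, finding));
        }
    }
    report
}

fn main() {
    let baseline = PublicApi { items: BTreeSet::from(["mycrate::Foo", "mycrate::bar"]) };
    let current = PublicApi { items: BTreeSet::from(["mycrate::Foo"]) };
    let lints = [Lint { name: "item_missing", check: removed_items }];
    for line in run_lints(&lints, &baseline, &current) {
        println!("{line}");
    }
}
```

Adding a rule in this sketch means adding one more entry to the `lints` array; the loop itself never changes.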

As is usually the case:

I wasn't the first person to realize this. cargo-semver-checks isn't the first attempt at a semver linter for Rust.
cargo-semver-checks stands on the shoulders of giants: without rustdoc JSON and serde, the same work would have taken ten times as long.

The novel trick in cargo-semver-checks is that lint rules are written declaratively.

Given the need to have hundreds of different lints defined over an ever-changing data format, [Sidenote: The rustdoc JSON format is unstable and frequently has breaking changes — sometimes even multiple times per week in nightly Rust.] this is a huge win.

But creating a good declarative query language is a much harder problem than semver! Generally one shouldn't replace an easier problem with a harder one. This is why linters rarely build their own query language.

Fortunately, I spent the last 7+ years of my career working on high-performance query languages for heterogeneous data, so I didn't need to start from scratch. Instead, I just plugged in my existing Trustfall query engine, which can query any data source(s), whether they are local files, remote APIs, or a terabyte-scale SQL cluster. [Sidenote: Ever wonder which lints popular crates like itertools allow in their code? Or maybe you're curious which GitHub or Twitter users comment on HackerNews stories about OpenAI? The answers are one browser-executed query away!]

Thanks to Trustfall, each cargo-semver-checks lint is a type-checked structured query in Trustfall's GraphQL-like syntax. (More on this in future blog posts!) In practice, this means:

New lints are super easy to add: writing a new lint takes only 1-2 minutes. The vast majority of effort can then be spent on great test cases that reflect the diversity of use cases for each Rust language construct.
Lints are not tied to a specific rustdoc JSON format version. Even though the rustdoc JSON format changes frequently, the changes are absorbed by the Trustfall adapter for rustdoc and are completely invisible to the lints — an airtight abstraction layer.
cargo-semver-checks benefits from the performance and correctness guarantees of Trustfall, whose optimizations and test suite are far more intricate than would be feasible to write for a semver-checker alone. (If you'd like to hear more, tell me and I'll write more blog posts!)

All this allowed us to go from zero to 30 different semver lints in just five months.

We are ending 2022 on a particularly high note: four students have begun contributing to cargo-semver-checks as part of their bachelor's theses! The pace of development has sped up dramatically thanks to their hard work, and the codebase is healthier than ever.

Looking ahead to 2023

At RustConf 2022 I had the pleasure of meeting several cargo team members, and we decided that the end goal for cargo-semver-checks is merging into cargo itself.

Another goal for cargo-semver-checks is adding even more lints to prevent more kinds of semver violations.

These goals are self-explanatory, and I won't dig into them further. Instead, I'll mention three of my personal favorite things I'd like to see in cargo-semver-checks in 2023.

Proactively discover and prevent false-positives

A false-positive error in cargo-semver-checks is when the tool incorrectly claims it found a semver violation. I consider false-positives extremely serious bugs [Sidenote: Much more serious than false-negatives! A false-negative means there was a semver violation but the tool didn't find it. There are dozens of ways to break semver that cargo-semver-checks can't yet detect, each of which is a false-negative.] because they give the user incorrect advice, confusing them and slowing them down while also hurting the credibility of cargo-semver-checks itself.

Unfortunately, in 2022 our users reported multiple false-positive errors. I am grateful to everyone who spent their precious time helping debug problems that shouldn't have happened in the first place.

We have already begun strengthening the cargo-semver-checks test systems to discover and prevent future false-positives, so our users won't have to. In the process, we have already discovered and fixed three previously-unknown false-positives.

In 2023, we plan to take a page from Rust's book: testing cargo-semver-checks on the most popular crates on crates.io as part of our release process. This would have a dual benefit: in addition to proactively discovering false-positives, it would also ensure cargo-semver-checks is ready to be adopted by those crates at their maintainers' convenience. And if we happen to discover more semver issues in the wild, that'll be a nice bonus!

Faster semver-checking via rustdoc caching

A cargo-semver-checks run consists of two steps: generating rustdoc JSON, and running lints over the generated JSON files.

The "run the lints" step is much faster [Sidenote: Even though we've put negligible effort into optimizing the lints beyond what Trustfall provides out of the box.] than generating the rustdoc, which can take a few minutes in CI environments with low core counts like GitHub Actions.

In 2023, we'll implement rustdoc caching to limit how often the rustdoc has to be rebuilt.

We expect to cut rustdoc generation time roughly in half. Each run needs rustdoc for two versions of the crate: the baseline and the current one. We'll still have to generate the current version's rustdoc, but we can avoid repeatedly rebuilding rustdoc for crate versions that are already published on crates.io.
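The caching idea can be sketched in a few lines. This is only an illustration under stated assumptions, not the planned implementation: the `Checker` type, its `builds` counter, and the `String` standing in for a rustdoc JSON file are all hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical checker that counts how often the expensive
/// "generate rustdoc" step actually runs.
struct Checker {
    /// Rustdoc for already-published versions, keyed by (crate name, version).
    cache: HashMap<(String, String), String>,
    builds: u32,
}

impl Checker {
    fn new() -> Self {
        Checker { cache: HashMap::new(), builds: 0 }
    }

    /// Stand-in for the slow rustdoc generation step.
    fn generate(&mut self, name: &str, version: &str) -> String {
        self.builds += 1;
        format!("rustdoc for {name} {version}")
    }

    /// Published versions never change, so their rustdoc is built at most once.
    fn baseline(&mut self, name: &str, version: &str) -> String {
        let key = (name.to_string(), version.to_string());
        if let Some(doc) = self.cache.get(&key) {
            return doc.clone();
        }
        let doc = self.generate(name, version);
        self.cache.insert(key, doc.clone());
        doc
    }

    /// One semver-check run: cached baseline, freshly built current version.
    fn run(&mut self, name: &str, published_version: &str) -> (String, String) {
        let baseline = self.baseline(name, published_version);
        let current = self.generate(name, "working-copy");
        (baseline, current)
    }
}

fn main() {
    let mut checker = Checker::new();
    let _ = checker.run("mycrate", "1.0.0");
    println!("builds after first run: {}", checker.builds);
    let _ = checker.run("mycrate", "1.0.0");
    println!("builds after second run: {}", checker.builds);
}
```

In this sketch the first run pays for two builds, and every later run against the same published version pays for only one, which is where the "half" comes from.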

Semver-check PRs, not just `cargo publish`

Currently, cargo-semver-checks is most ergonomic when used right before cargo publish: it checks whether the publish step with the specified version [Sidenote: If the version in Cargo.toml is already on crates.io, it assumes a patch version bump.] would result in a semver-compliant release.
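The version assumption from the sidenote can be sketched as follows. This is a simplified illustration, not the tool's actual code: `parse_version` and `version_to_check` are hypothetical names, and pre-release and build metadata are ignored.

```rust
/// (major, minor, patch); pre-release and build metadata are ignored here.
type Version = (u64, u64, u64);

/// Parses exactly three dot-separated numbers, e.g. "1.2.3".
fn parse_version(text: &str) -> Option<Version> {
    let mut parts = text.split('.').map(|part| part.parse::<u64>().ok());
    let version = (parts.next()??, parts.next()??, parts.next()??);
    if parts.next().is_some() {
        return None; // more than three components
    }
    Some(version)
}

/// If the manifest version is already published, assume the next release
/// is a patch bump; otherwise check the manifest version as written.
fn version_to_check(manifest: Version, published: &[Version]) -> Version {
    if published.contains(&manifest) {
        (manifest.0, manifest.1, manifest.2 + 1)
    } else {
        manifest
    }
}

fn main() {
    let published = [(1, 2, 2), (1, 2, 3)];
    if let Some(manifest) = parse_version("1.2.3") {
        let (major, minor, patch) = version_to_check(manifest, &published);
        println!("checking as {major}.{minor}.{patch}");
    }
}
```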

But wouldn't it be nice to know about breaking changes in a pull request before merging it and committing to a major version bump? Multiple projects have already begun running cargo-semver-checks like this, generally via custom scripts they've adapted specifically for that purpose.

In 2023, I hope we're able to make this an officially-supported mode of operation, complete with a GitHub Action. Bonus points if the Action reports semver issues as inline PR comments using the lints' span information!

Onwards!

I'm thrilled and humbled by the response that cargo-semver-checks has received in the Rust community. I've never been more excited about building the future with Rust, and I'm excited to see what 2023 has in store for cargo-semver-checks and the Rust ecosystem as a whole.