# cargo-semver-checks today and in 2023

_Published: 2022-12-23_

*`cargo-semver-checks` ends 2022 with [40,000 downloads from crates.io](https://crates.io/crates/cargo-semver-checks), able to prevent 30 different kinds of semver issues, and having done so [in real-world use cases](https://twitter.com/PredragGruevski/status/1587877518018756609).
Inspired by Yoshua Wuyts' ["Rust in 2023 (by Yosh)"](https://blog.yoshuawuyts.com/rust-2023/) post, here are my thoughts on `cargo-semver-checks` in 2022, and what I look forward to in 2023 and beyond.*

Following semver in Rust is a perfect example of a workflow worth automating:
- **Important to get right, painful if done wrong:** `cargo` requires all crates to follow semver, so breaking semver in one crate can have a ripple effect across the ecosystem.[^sn-1]
  But if done right, semver is completely invisible.
- **Countless complex rules:** There are *hundreds* of ways to cause a breaking change, many of them non-obvious.[^sn-2]
- **Code that violates semver doesn't look wrong:** No code reviewer can be expected to reliably flag most semver issues, *even assuming* they are well-versed in all the semver rules.
  [The evidence](https://github.com/PyO3/pyo3/issues/285) [on this](https://github.com/clap-rs/clap/issues/3876) [point is](https://github.com/RustCrypto/utils/issues/22) [particularly](https://twitter.com/PredragGruevski/status/1587877518018756609) [overwhelming](https://arxiv.org/pdf/2201.11821.pdf).

Some might say the solution is to ["git gud"](https://en.wiktionary.org/wiki/git_gud).
I deeply [respect operational excellence](https://twitter.com/PredragGruevski/status/1289949333626986496), but this is not the way.

Civilization advances at the rate at which we develop robust abstractions.
I am writing this on a computer I cannot build, under a blanket I cannot weave, having enjoyed a meal with ingredients I cannot grow.
I dedicated *ten years* to math competitions,[^sn-3] and I can't even calculate a logarithm by hand! Can you?[^sn-4]

Gatekeeping to only include people with a PhD in "Semver in Rust" won't cut it.

Yosh Wuyts [quotes another Rust contributor](https://blog.yoshuawuyts.com/rust-2023/) as saying: "The job of an expert is to learn everything about a field there is to learn, and then distill it so that others don't have to."[^sn-5] I&nbsp;couldn't agree more!

## 2022: Rust + semver - tedium = 💖

`cargo-semver-checks` was born in mid-July 2022, when I realized that building a semver linter boils down to only two things:
- a list of machine-checkable rules, and
- a system to check them.

At a high level, that's all `cargo-semver-checks` is: [a checklist](https://github.com/obi1kenobi/cargo-semver-checks/tree/main/src/lints), and [a for-loop over it](https://github.com/obi1kenobi/cargo-semver-checks/blob/4567eca9e1b9e957b2282140ca63e4a8c51349b3/src/check_release.rs#L142).
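To make the "checklist plus for-loop" shape concrete, here is a minimal sketch. It is not the real implementation: the actual lints are Trustfall queries over rustdoc JSON, whereas this sketch uses plain Rust functions over a made-up `ApiSnapshot` type. The names `ApiSnapshot`, `Lint`, `removed_items`, and `run_lints` are all hypothetical.

```rust
/// Simplified stand-in for one version of a crate's public API.
/// (Assumption: the real tool works with rustdoc JSON, not a list of names.)
struct ApiSnapshot {
    public_items: Vec<&'static str>,
}

/// One entry in the checklist: a name plus a rule to check.
struct Lint {
    name: &'static str,
    check: fn(&ApiSnapshot, &ApiSnapshot) -> Vec<String>,
}

/// Hypothetical rule: flag public items that exist in the baseline
/// but are missing from the current version.
fn removed_items(baseline: &ApiSnapshot, current: &ApiSnapshot) -> Vec<String> {
    baseline
        .public_items
        .iter()
        .filter(|item| !current.public_items.contains(*item))
        .map(|item| format!("public item `{item}` was removed"))
        .collect()
}

/// The for-loop over the checklist: run every lint, collect every finding.
fn run_lints(
    lints: &[Lint],
    baseline: &ApiSnapshot,
    current: &ApiSnapshot,
) -> Vec<(&'static str, String)> {
    let mut findings = Vec::new();
    for lint in lints {
        for message in (lint.check)(baseline, current) {
            findings.push((lint.name, message));
        }
    }
    findings
}
```

Calling `run_lints` with a one-element checklist containing `removed_items` reports one finding per item that disappeared between the two snapshots. Adding a new kind of check means adding one more entry to the list, while the loop stays the same.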

As is usually the case:
- I wasn't the first person to realize this.
  `cargo-semver-checks` isn't the first attempt at a semver linter for Rust.
- `cargo-semver-checks` stands on the shoulders of giants: without rustdoc JSON and serde, the same work would have taken ten times as long.

The novel trick in `cargo-semver-checks` is that lint rules are written *declaratively*.

Given the need to have hundreds of different lints defined over an ever-changing data format,[^sn-6] this is a huge win.

But creating a good declarative query language is a much harder problem than semver!
Generally one shouldn't replace an easier problem with a harder one.
This is why linters rarely build their own query language.

Fortunately, I spent the last 7+ years of my career working on [high-performance query languages for heterogeneous data](https://blog.kensho.com/database-agnostic-querying-is-unavoidable-at-scale-18895f6df2f0), so I didn't need to start from scratch.
Instead, I just plugged in my existing [Trustfall query engine](https://github.com/obi1kenobi/trustfall) which is [able to query any data source(s)](https://www.hytradboi.com/2022/how-to-query-almost-everything) no matter whether they are local files, remote APIs, or a terabyte-scale SQL cluster.[^sn-7]

Thanks to Trustfall, each `cargo-semver-checks` lint is a type-checked structured query in Trustfall's GraphQL-like syntax.
(More on this in future blog posts!)
In practice, this means:
- New lints are super easy to add: writing a new lint takes only 1-2 minutes.
  The vast majority of effort can then be spent on great test cases that reflect the diversity of use cases for each Rust language construct.
- Lints are not tied to a specific rustdoc JSON format version.
  Even though the rustdoc JSON format changes frequently, the changes are absorbed by the Trustfall adapter for rustdoc and are completely invisible to the lints — an airtight abstraction layer.
- `cargo-semver-checks` benefits from the performance and correctness guarantees of Trustfall, whose optimizations and test suite are far more intricate than would be feasible to write for a semver-checker alone.
  (If you'd like to hear more, [tell me](https://hachyderm.io/@predrag) and I'll write more blog posts!)
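The "abstraction layer" idea can be illustrated with a small sketch in plain Rust. This is not Trustfall's adapter API: the trait `RustdocAdapter`, the two format structs, and the `missing_structs` check are all hypothetical, invented only to show how a lint can stay unchanged while the underlying data format changes shape.

```rust
/// Hypothetical stable interface that lints are written against.
trait RustdocAdapter {
    fn struct_names(&self) -> Vec<String>;
}

/// Made-up "old" format: structs are stored in their own list.
struct FormatV1 {
    structs: Vec<String>,
}

/// Made-up "new" format: all items live in one list of (kind, name) pairs.
struct FormatV2 {
    items: Vec<(String, String)>,
}

impl RustdocAdapter for FormatV1 {
    fn struct_names(&self) -> Vec<String> {
        self.structs.clone()
    }
}

impl RustdocAdapter for FormatV2 {
    fn struct_names(&self) -> Vec<String> {
        self.items
            .iter()
            .filter(|(kind, _)| kind.as_str() == "struct")
            .map(|(_, name)| name.clone())
            .collect()
    }
}

/// A check written only against the trait: it never sees either format.
fn missing_structs(baseline: &dyn RustdocAdapter, current: &dyn RustdocAdapter) -> Vec<String> {
    let current_names = current.struct_names();
    baseline
        .struct_names()
        .into_iter()
        .filter(|name| !current_names.contains(name))
        .collect()
}
```

When the format changes, only a new `impl RustdocAdapter` is needed in this sketch, and `missing_structs` keeps working untouched.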

All this allowed us to go from zero to 30 different semver lints in just five months.

We are ending 2022 on a particularly high note: [four students](https://twitter.com/PredragGruevski/status/1584563200011382784) have begun contributing to `cargo-semver-checks` as part of their bachelor's theses!
The pace of development has sped up dramatically thanks to their hard work, and the codebase is healthier than ever.

## Looking ahead to 2023

At RustConf 2022 I had the pleasure of meeting several `cargo` team members, and we decided that the end goal for `cargo-semver-checks` is [merging into `cargo` itself](https://github.com/obi1kenobi/cargo-semver-checks/issues/61).

Another goal for `cargo-semver-checks` is adding [even more lints](https://github.com/obi1kenobi/cargo-semver-checks/issues/5) to prevent more kinds of semver violations.

These goals are self-explanatory, and I won't dig into them further. Instead, I'll mention three of my personal favorite things I'd like to see in `cargo-semver-checks` in&nbsp;2023.

### Proactively discover and prevent false-positives

A false-positive error in `cargo-semver-checks` is when the tool *incorrectly* claims it found a semver violation.
I consider false-positives extremely serious bugs[^sn-8] because they give the user incorrect advice, confusing them and slowing them down while also hurting the credibility of `cargo-semver-checks` itself.

Unfortunately, in 2022 our users [reported](https://github.com/obi1kenobi/cargo-semver-checks/issues/147) [multiple](https://github.com/obi1kenobi/cargo-semver-checks/issues/193) [false-positive](https://github.com/obi1kenobi/cargo-semver-checks/issues/167) [errors](https://github.com/obi1kenobi/cargo-semver-checks/issues/202).
I am grateful to everyone who spent their precious time helping debug problems that shouldn't have happened in the first place.

We have [already begun strengthening](https://github.com/obi1kenobi/cargo-semver-checks/issues/225) the `cargo-semver-checks` test systems to discover and prevent future false-positives, so our users won't have to. In the process, we discovered and [fixed three](https://github.com/obi1kenobi/cargo-semver-checks/pull/222) [previously-unknown](https://github.com/obi1kenobi/cargo-semver-checks/pull/220) [false-positives](https://github.com/obi1kenobi/cargo-semver-checks/pull/218).

In 2023, we plan to take a page from Rust's book: testing `cargo-semver-checks` on the most popular crates on [crates.io](https://crates.io/) as part of our release process.
This would have a dual benefit: in addition to proactively discovering false-positives, it would also ensure `cargo-semver-checks` is ready to be adopted by those crates at their maintainers' convenience.
And if we happen to discover more semver issues in the wild, that'll be a&nbsp;nice&nbsp;bonus!

### Faster semver-checking via rustdoc caching

A `cargo-semver-checks` run consists of two steps: generating rustdoc JSON, and running lints over the generated JSON files.

The "run the lints" step is *much faster*[^sn-9] than the process of generating the rustdoc, which can take a few minutes in CI environments with low core counts like GitHub Actions.

In 2023, we'll implement rustdoc caching to limit how often the rustdoc has to be&nbsp;rebuilt.

We expect to cut rustdoc generation time in half: we'll still have to generate the current version's rustdoc, but we can avoid repeatedly rebuilding rustdoc for crate versions that are already published on [crates.io](https://crates.io/).
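One way such a cache could look is sketched below. It relies on the fact that a version published on crates.io never changes, so a (crate name, version) pair is a safe cache key. Everything here is an assumption for illustration: the `RustdocCache` type, its in-memory `HashMap`, and the `build` closure standing in for the expensive rustdoc step are hypothetical, and a real cache would persist files on disk.

```rust
use std::collections::HashMap;
use std::path::PathBuf;

/// Hypothetical cache of generated rustdoc JSON files,
/// keyed by (crate name, published version).
struct RustdocCache {
    entries: HashMap<(String, String), PathBuf>,
    /// Number of times the expensive build step actually ran.
    builds: usize,
}

impl RustdocCache {
    fn new() -> Self {
        RustdocCache { entries: HashMap::new(), builds: 0 }
    }

    /// Return the cached rustdoc path, running `build` only on a cache miss.
    fn get_or_build(
        &mut self,
        name: &str,
        version: &str,
        build: impl FnOnce() -> PathBuf,
    ) -> PathBuf {
        let key = (name.to_string(), version.to_string());
        if let Some(path) = self.entries.get(&key) {
            return path.clone();
        }
        // Cache miss: pay the rustdoc generation cost once, then remember it.
        let path = build();
        self.builds += 1;
        self.entries.insert(key, path.clone());
        path
    }
}
```

In this sketch, only the already-published baseline would go through `get_or_build`. The current, unpublished version changes with every commit, so it would still be rebuilt on each run.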

### Semver-check PRs, not just `cargo publish`

Currently, `cargo-semver-checks` is most ergonomic when used right before `cargo publish`: it checks whether the publish step with the specified version[^sn-10] would result in a semver-compliant release.
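The final "is this release compliant?" decision can be sketched as a comparison between the bump the detected changes require and the bump the version numbers actually make. This is an illustration, not the tool's code: `Bump`, `Version`, `actual_bump`, and `is_compliant` are hypothetical names, and the sketch assumes the new version is greater than the old one. For `0.x` versions it follows Cargo's convention that the leftmost non-zero component is the breaking one.

```rust
/// Size of a version bump, ordered from smallest to largest.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Bump {
    Patch,
    Minor,
    Major,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Version {
    major: u64,
    minor: u64,
    patch: u64,
}

/// Classify the bump from `old` to `new` (assumes `new` is the later version).
fn actual_bump(old: Version, new: Version) -> Bump {
    if old.major != new.major {
        return Bump::Major;
    }
    if old.major == 0 {
        // Pre-1.0: a change in the leftmost non-zero component is breaking.
        if old.minor != new.minor || old.minor == 0 {
            return Bump::Major;
        }
        // 0.y.z -> 0.y.(z+1) is the largest non-breaking bump available.
        return Bump::Minor;
    }
    if old.minor != new.minor {
        Bump::Minor
    } else {
        Bump::Patch
    }
}

/// A release is compliant when its bump is at least as large as required.
fn is_compliant(required: Bump, old: Version, new: Version) -> bool {
    actual_bump(old, new) >= required
}
```

For example, if a lint finds a change that requires a major bump, going from `1.2.3` to `1.3.0` is not compliant in this sketch, while going to `2.0.0` is.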

But wouldn't it be nice to know about breaking changes in a pull request *before* merging it and committing to a major version bump?
[Multiple](https://github.com/libp2p/rust-libp2p) [projects](https://github.com/pest-parser/pest) have already begun running `cargo-semver-checks` like this, generally via custom scripts they've adapted specifically for that purpose.

In 2023, I hope we're able to make this an officially-supported mode of operation, complete with a GitHub Action.
Bonus points if the Action reports semver issues as inline PR comments using the lints' span information!

## Onwards!

I'm thrilled and humbled by the response that `cargo-semver-checks` has received in the Rust community.
I've never been more excited about building the future with Rust, and I can't wait to see what 2023 has in store for `cargo-semver-checks` and the Rust ecosystem as a whole.

[^sn-1]: I recently [re-learned this lesson](https://github.com/obi1kenobi/cargo-semver-checks/issues/210) myself, for the umpteenth time.

[^sn-2]: [The tracking issue](https://github.com/obi1kenobi/cargo-semver-checks/issues/5) for not-yet-implemented lints in `cargo-semver-checks` lists 60+ ways, and is far from an exhaustive list. I'm currently reading [Rust for Rustaceans](https://rust-for-rustaceans.com/) and discovering new ways to break semver, each more surprising than the last. For a quick taste, check out my [previous blog post](https://predr.ag/blog/toward-fearless-cargo-update/#breaking-semver-with-auto-traits).

[^sn-3]: And developing test-taking strategies aimed at [getting a perfect score given limited time!](https://predr.ag/blog/to-ace-exams-get-better-at-the-easy-questions/)

[^sn-4]: If my life depended on it, I'd use the [Newton-Raphson method](https://en.wikipedia.org/wiki/Newton%27s_method) to approximate my way to it, but there's *zero chance* that's actually the best way. My friends with aero-astro engineering degrees still find it hilarious that I once used [binary search to calculate orbital maneuvers](https://github.com/obi1kenobi/kosmos/blob/master/maneuver_planning.ks#L56-L70) for Kerbal Space Program, instead of the closed-form formula that apparently existed 😅

[^sn-5]: I'll gladly put their name here if the quote is confirmed as coming from them. I wasn't present when this was said, and didn't want to risk misattributing.

[^sn-6]: The rustdoc JSON format is unstable and frequently has breaking changes — sometimes even multiple times per week in nightly Rust.

[^sn-7]: Ever wonder [which lints do popular crates like `itertools` allow in their code](https://play.predr.ag/rustdoc#?f=1&q=IyBJdGVtcyB3aGVyZSBsaW50cyB3ZXJlIGFsbG93ZWQuIE5vdCBhbGwgY3JhdGVzIGhhdmUgdGhlc2UsCiMgdHJ5IG9uZSBvZjogYW55aG93LCBjbGFwLCBodHRwLCBodHRwYXJzZSwgaHlwZXIsIGl0ZXJ0b29scy4KcXVlcnkgewogIENyYXRlIHsKICAgIGl0ZW0gewogICAgICBuYW1lIEBvdXRwdXQKCiAgICAgIGF0dHJpYnV0ZSB7CiAgICAgICAgYXR0cjogdmFsdWUgQG91dHB1dAogICAgICAgICAgICAgICAgICAgIEBmaWx0ZXIob3A6ICJyZWdleCIsIHZhbHVlOiBbIiRwYXR0ZXJuIl0pCiAgICAgIH0KCiAgICAgIHNwYW4gewogICAgICAgIGZpbGVuYW1lIEBvdXRwdXQKICAgICAgICBiZWdpbl9saW5lIEBvdXRwdXQKICAgICAgfQogICAgfQogIH0KfQ%3D%3D&v=ewogICJwYXR0ZXJuIjogIiNcXFthbGxvd1xcKC4rXFwpXFxdIgp9)? Or maybe you're curious [which GitHub or Twitter users comment on HackerNews stories about OpenAI](https://play.predr.ag/hackernews#?f=1&q=IyBDcm9zcyBBUEkgcXVlcnkgKEFsZ29saWEgKyBGaXJlYmFzZSk6CiMgRmluZCBjb21tZW50cyBvbiBzdG9yaWVzIGFib3V0ICJvcGVuYWkuY29tIiB3aGVyZQojIHRoZSBjb21tZW50ZXIncyBiaW8gaGFzIGF0IGxlYXN0IG9uZSBHaXRIdWIgb3IgVHdpdHRlciBsaW5rCnF1ZXJ5IHsKICAjIFRoaXMgaGl0cyB0aGUgQWxnb2xpYSBzZWFyY2ggQVBJIGZvciBIYWNrZXJOZXdzLgogICMgVGhlIHN0b3JpZXMvY29tbWVudHMvdXNlcnMgZGF0YSBpcyBmcm9tIHRoZSBGaXJlYmFzZSBITiBBUEkuCiAgIyBUaGUgdHJhbnNpdGlvbiBpcyBzZWFtbGVzcyAtLSBpdCBpc24ndCB2aXNpYmxlIGZyb20gdGhlIHF1ZXJ5LgogIFNlYXJjaEJ5RGF0ZShxdWVyeTogIm9wZW5haS5jb20iKSB7CiAgICAuLi4gb24gU3RvcnkgewogICAgICAjIEFsbCBkYXRhIGZyb20gaGVyZSBvbndhcmQgaXMgZnJvbSB0aGUgRmlyZWJhc2UgQVBJLgogICAgICBzdG9yeVRpdGxlOiB0aXRsZSBAb3V0cHV0CiAgICAgIHN0b3J5TGluazogdXJsIEBvdXRwdXQKICAgICAgc3Rvcnk6IHN1Ym1pdHRlZFVybCBAb3V0cHV0CiAgICAgICAgICAgICAgICAgICAgICAgICAgQGZpbHRlcihvcDogInJlZ2V4IiwgdmFsdWU6IFsiJHNpdGVQYXR0ZXJuIl0pCgogICAgICBjb21tZW50IHsKICAgICAgICByZXBseSBAcmVjdXJzZShkZXB0aDogNSkgewogICAgICAgICAgY29tbWVudDogdGV4dFBsYWluIEBvdXRwdXQKCiAgICAgICAgICBieVVzZXIgewogICAgICAgICAgICBjb21tZW50ZXI6IGlkIEBvdXRwdXQKICAgICAgICAgICAgY29tbWVudGVyQmlvOiBhYm91dFBsYWluIEBvdXRwdXQKCiAgICAgICAgICAgICMgVGhlIHByb2ZpbGUgbXVzdCBoYXZlIGF0IGxlYXN0IG9uZQogICAgICAgICAg
ICAjIGxpbmsgdGhhdCBwb2ludHMgdG8gZWl0aGVyIEdpdEh1YiBvciBUd2l0dGVyLgogICAgICAgICAgICBsaW5rCiAgICAgICAgICAgICAgQGZvbGQKICAgICAgICAgICAgICBAdHJhbnNmb3JtKG9wOiAiY291bnQiKQogICAgICAgICAgICAgIEBmaWx0ZXIob3A6ICI%2BPSIsIHZhbHVlOiBbIiRtaW5Qcm9maWxlcyJdKQogICAgICAgICAgICB7CiAgICAgICAgICAgICAgY29tbWVudGVySURzOiB1cmwgQGZpbHRlcihvcDogInJlZ2V4IiwgdmFsdWU6IFsiJHNvY2lhbFBhdHRlcm4iXSkKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICBAb3V0cHV0CiAgICAgICAgICAgIH0KICAgICAgICAgIH0KICAgICAgICB9CiAgICAgIH0KICAgIH0KICB9Cn0%3D&v=ewogICJzaXRlUGF0dGVybiI6ICJodHRwW3NdOi8vKFteLl0qXFwuKSpvcGVuYWkuY29tLy4qIiwKICAibWluUHJvZmlsZXMiOiAxLAogICJzb2NpYWxQYXR0ZXJuIjogIihnaXRodWJ8dHdpdHRlcilcXC5jb20vIgp9)? The answers are one browser-executed query away!

[^sn-8]: Much more serious than false-negatives! A false-*negative* means there *was* a semver violation but the tool *didn't* find it. There are dozens of ways to break semver that `cargo-semver-checks` can't yet detect, each of which is a false-negative.

[^sn-9]: Even though we've put in negligible effort at optimizing them beyond what Trustfall provides out of the box.

[^sn-10]: If the version in `Cargo.toml` is already on [crates.io](https://crates.io), it assumes a patch version bump.

Copyright (C) Predrag Gruevski 2022. [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en)
