# Toward fearless cargo update

_Published: 2022-08-25_

*I recently built `cargo-semver-checks`, a linter that ensures crates adhere to semantic versioning. This is why and how I built it.*

Fearless development is a key theme throughout Rust.
["If it compiles, it works"](https://rustacean-principles.netlify.app/how_rust_empowers/reliable.html),
[fearless concurrency](https://doc.rust-lang.org/book/ch16-00-concurrency.html), etc.

But there's one aspect of Rust (and nearly all other languages) that isn't entirely fearless yet: `cargo update`, upgrading the versions of the project's dependencies.



Most of the time, running `cargo update` is uneventful, or even joyous due to performance optimizations in the dependencies.

But on rare occasions, `cargo update` can also cause a project to no longer compile.
`cargo` assumes that crates follow [semantic versioning (semver)](https://doc.rust-lang.org/cargo/reference/resolver.html#semver-compatibility), but despite maintainers' best efforts, a variety of accidental semver violations have happened:
- `pyo3 v0.5.1` accidentally [changed a function signature](https://github.com/PyO3/pyo3/issues/285);
- `clap v3.2.x` accidentally had [a type stop implementing an auto-trait](https://github.com/clap-rs/clap/issues/3876);
- multiple `block-buffer` versions accidentally [broke their MSRV contract](https://github.com/RustCrypto/utils/issues/22);
- and many more: [this paper](https://arxiv.org/pdf/2201.11821.pdf) claims 43% of yanked (un-published) releases mentioned semver breaks as a reason for yanking.

In this situation, maintainers face a difficult choice: [yank the problematic versions](https://doc.rust-lang.org/cargo/commands/cargo-yank.html), or ignore the semver break and embrace the accidental change.
Yanking can break the builds of everyone using that version.
For example, the `clap` case above would have [required all 3.2 versions to be yanked, breaking many projects](https://github.com/clap-rs/clap/issues/3876#issuecomment-1168133551).
In other cases, and especially if the semver break is detected early, yanking is feasible and preferable: `pyo3 v0.5.1` was yanked and replaced with `v0.5.2` which was semver-compatible with `v0.5.0`.

But what if we could detect semver breaks even earlier ... perhaps even before the problematic `cargo publish`?
This is what `cargo-semver-checks` helps with.

And why do these semver accidents happen even in well-run projects like `pyo3` and `clap`?
See if you spot the semver break in the example code in the next section.

## Breaking semver with auto-traits

Semver in Rust is easy to break by accident.[^sn-1]
Consider the following example:

```rust
struct Foo {
    x: String
}

pub struct Bar {
    y: Foo
}
```

Then change `Foo.x` from `String` to `Rc<str>` 💥

This feels completely benign, doesn't it?
We changed the non-public field `x` of the non-public struct `Foo`.
Surely changes in non-public code can't affect the crate's public API, right?
Right??

Nope!
This is a breaking change, and semver says our crate needs a new **MAJOR** version.

To understand the breaking change, we need to briefly talk about the `Send` and `Sync` traits, which are automatically implemented by the compiler whenever possible:

> Send and Sync are also automatically derived traits. This means that, unlike every other trait, if a type is composed entirely of Send or Sync types, then it is Send or Sync. Almost all primitives are Send and Sync, and as a consequence pretty much all types you'll ever interact with are Send and Sync. Major exceptions include: \[...\] `Rc` isn't Send or Sync (because the refcount is shared and unsynchronized).
>
> Source: [Send and Sync, Rustonomicon](https://doc.rust-lang.org/nomicon/send-and-sync.html)

`String` is both `Send` and `Sync`, but `Rc<str>` is neither.
Changing the field to `x: Rc<str>` means `struct Foo` is no longer `Send` or `Sync`, which in turn means `struct Bar` (via its `y: Foo` field) is no longer `Send` or `Sync` either.
The original `Bar` could be sent to or shared between threads, but the new one cannot, so downstream code that does so (for example, by moving a `Bar` into a spawned thread) stops compiling.
Oops!
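
To see the break the way a downstream user would, here is a minimal sketch that puts both versions of the structs side by side. The `before` and `after` modules and the `is_send_and_sync` helper are hypothetical names I'm using for illustration; they stand in for two releases of the same crate.

```rust
// Hypothetical module standing in for the crate before the change.
#[allow(dead_code)]
mod before {
    struct Foo {
        x: String,
    }

    pub struct Bar {
        y: Foo,
    }
}

// Hypothetical module standing in for the crate after the change.
#[allow(dead_code)]
mod after {
    use std::rc::Rc;

    struct Foo {
        x: Rc<str>,
    }

    pub struct Bar {
        y: Foo,
    }
}

// This function can only be instantiated for types that are both `Send` and `Sync`,
// so merely calling it is a compile-time check.
fn is_send_and_sync<T: Send + Sync>() -> bool {
    true
}

fn main() {
    // Compiles: `String` is `Send + Sync`, so `Foo` and `Bar` are too.
    assert!(is_send_and_sync::<before::Bar>());

    // Uncommenting the next line is a compile error, because `Rc<str>`
    // is neither `Send` nor `Sync`, and so neither is `after::Bar`:
    // assert!(is_send_and_sync::<after::Bar>());
}
```

Any downstream crate with a bound like `T: Send + Sync` on a `Bar` would hit the same compile error after upgrading.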

Here, both structs were in the same file.
But what if one of them was in a different file elsewhere in the crate?
We might edit a file containing only crate-internal code and thus break the public API of the opposite corner of the crate.

No wonder accidental semver breaks happen.
"Just be careful" is not going to work here.

## Finding semver violations with `cargo-semver-checks`

Checking whether changing a field type affects the public API of the crate is *hard*.

We're essentially asking: "Are there any public types in the crate API that were `Send` or `Sync`, but would stop being `Send` or `Sync` as a result of changing this field?"

Answering that question is *way too much work* for a human.
It's unreasonable to do that much work just to change one private field.
With that much overhead, we'd never get anything done.

The compiler, on the other hand, does this sort of work *all the time*.
Let's ask it to help!

The `rustdoc` tool allows us to get our crate's type information as a (large) JSON file.
We can make one JSON file showing the crate's information *before* our changes — the "baseline" — and another file showing the crate information *after* our changes: the "current" JSON.
Our `Send` and `Sync` problem from earlier would be reflected in these files: the *baseline* JSON would show `Bar` is both `Send` and `Sync`, but the *current* JSON would not include `Bar` implementations for either `Send` or `Sync`.[^sn-2]

This is how we'd answer our earlier "types that previously were `Send` or `Sync` but now are not" question.
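
As a rough sketch of that comparison, assume each JSON file has already been boiled down to a map from type name to the set of auto-traits the type implements. This `AutoTraitImpls` shape and the helper names are assumptions made for illustration; the real rustdoc JSON format is much richer than this.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical, heavily simplified stand-in for the rustdoc JSON contents:
/// maps each public type's name to the set of auto-traits it implements.
type AutoTraitImpls = BTreeMap<String, BTreeSet<String>>;

/// Returns `(type name, trait name)` pairs for auto-traits that were
/// implemented in the baseline but are missing in the current version.
fn removed_auto_traits(
    baseline: &AutoTraitImpls,
    current: &AutoTraitImpls,
) -> Vec<(String, String)> {
    let mut removed = Vec::new();
    for (type_name, old_traits) in baseline {
        // A type that disappeared entirely is a different kind of break,
        // so this check only looks at types present in both versions.
        if let Some(new_traits) = current.get(type_name) {
            for trait_name in old_traits.difference(new_traits) {
                removed.push((type_name.clone(), trait_name.clone()));
            }
        }
    }
    removed
}

/// Small helper for building a set of trait names.
fn traits(names: &[&str]) -> BTreeSet<String> {
    names.iter().map(|name| name.to_string()).collect()
}

fn main() {
    let baseline = BTreeMap::from([("Bar".to_string(), traits(&["Send", "Sync"]))]);
    let current = BTreeMap::from([("Bar".to_string(), traits(&[]))]);

    // Prints:
    //   Bar no longer implements Send
    //   Bar no longer implements Sync
    for (type_name, trait_name) in removed_auto_traits(&baseline, &current) {
        println!("{type_name} no longer implements {trait_name}");
    }
}
```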

Zooming out, we can also detect other semver issues in the same manner.
Public function got removed?
"Find public functions that previously existed but don't exist anymore."
An existing enum became `#[non_exhaustive]` for the first time?
"Find public enums that were previously exhaustive but are now `#[non_exhaustive]`."
And so on for each of the dozens, if not hundreds, of semver rules.[^sn-3]
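
Both of those example rules come down to the same before-and-after comparison. Here is a minimal sketch over a made-up, simplified model of a crate's public API. The `Item` and `Api` types and the function names are hypothetical, not the actual rustdoc JSON schema.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified description of one public item in a crate's API.
#[derive(Debug)]
enum Item {
    Function,
    Enum { non_exhaustive: bool },
}

/// Maps each public item's path (e.g. "demo::parse") to its description.
type Api = BTreeMap<String, Item>;

/// "Find public functions that previously existed but don't exist anymore."
fn removed_functions(baseline: &Api, current: &Api) -> Vec<String> {
    let mut removed = Vec::new();
    for (path, item) in baseline {
        let still_a_function = matches!(current.get(path), Some(Item::Function));
        if matches!(item, Item::Function) && !still_a_function {
            removed.push(path.clone());
        }
    }
    removed
}

/// "Find public enums that were previously exhaustive but are now `#[non_exhaustive]`."
fn newly_non_exhaustive_enums(baseline: &Api, current: &Api) -> Vec<String> {
    let mut changed = Vec::new();
    for (path, item) in baseline {
        let was_exhaustive = matches!(item, Item::Enum { non_exhaustive: false });
        let is_non_exhaustive =
            matches!(current.get(path), Some(Item::Enum { non_exhaustive: true }));
        if was_exhaustive && is_non_exhaustive {
            changed.push(path.clone());
        }
    }
    changed
}

fn main() {
    let baseline = Api::from([
        ("demo::parse".to_string(), Item::Function),
        ("demo::Mode".to_string(), Item::Enum { non_exhaustive: false }),
    ]);
    let current = Api::from([
        ("demo::Mode".to_string(), Item::Enum { non_exhaustive: true }),
    ]);

    // Prints: removed functions: ["demo::parse"]
    println!("removed functions: {:?}", removed_functions(&baseline, &current));
    // Prints: newly non_exhaustive enums: ["demo::Mode"]
    println!(
        "newly non_exhaustive enums: {:?}",
        newly_non_exhaustive_enums(&baseline, &current)
    );
}
```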

I'll borrow[^sn-4] a line from [@oli_obk](https://twitter.com/oli_obk):

> If you're creating rules that need more than a page of prosa or more than a month to come up with, consider writing the rules in an executable way instead. <https://t.co/XvMFJO3Yjv>
>
> Source: [@oli_obk on Twitter, July 23, 2022](https://twitter.com/oli_obk/status/1550790900703526912)



`cargo-semver-checks` is just:
- a [collection of lint checks](https://github.com/obi1kenobi/cargo-semver-check/tree/main/src/queries) over the "before-and-after" JSON files, where each lint looks for a particular kind of semver violation;[^sn-5]
- a way to [execute those checks](https://github.com/obi1kenobi/cargo-semver-check/blob/main/src/adapter.rs), and
- some code that runs `rustdoc` with the right flags, and generally ties everything together.
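
The overall shape of those three pieces can be sketched in a few lines of plain Rust. This is only an illustration of the structure under simplifying assumptions: the `ApiSnapshot` and `Lint` types and the `run_lints` function are hypothetical, and the real tool expresses its checks as queries rather than as Rust functions.

```rust
/// Hypothetical, simplified API snapshot: just the names of public functions.
struct ApiSnapshot {
    public_functions: Vec<&'static str>,
}

/// A lint pairs a name with a check that compares two snapshots and
/// returns a description of each violation it finds.
struct Lint {
    name: &'static str,
    check: fn(&ApiSnapshot, &ApiSnapshot) -> Vec<String>,
}

/// One example check: public functions present in the baseline but not in the current version.
fn function_missing(baseline: &ApiSnapshot, current: &ApiSnapshot) -> Vec<String> {
    let mut violations = Vec::new();
    for name in &baseline.public_functions {
        if !current.public_functions.contains(name) {
            violations.push(format!("function `{name}` was removed"));
        }
    }
    violations
}

/// Runs every lint over the same pair of snapshots and collects the results.
fn run_lints(lints: &[Lint], baseline: &ApiSnapshot, current: &ApiSnapshot) -> Vec<String> {
    let mut report = Vec::new();
    for lint in lints {
        for violation in (lint.check)(baseline, current) {
            report.push(format!("{}: {}", lint.name, violation));
        }
    }
    report
}

fn main() {
    let lints = [Lint { name: "function_missing", check: function_missing }];
    let baseline = ApiSnapshot { public_functions: vec!["parse", "render"] };
    let current = ApiSnapshot { public_functions: vec!["render"] };

    // Prints: function_missing: function `parse` was removed
    for line in run_lints(&lints, &baseline, &current) {
        println!("{line}");
    }
}
```

Keeping each check independent of the runner means a new semver rule is just one more entry in the list.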

![cargo-semver-checks run reporting auto-trait semver issues: five types are found to no longer be Send, Sync, Unpin, UnwindSafe, and RefUnwindSafe. The run completes in 0.177 seconds.](/blog/2022-08-25-toward-fearless-cargo-update/auto_traits.png)

*Example semver violation due to auto-traits, as caught by `cargo-semver-checks`.*

And when I say *lint checks*, I really just mean *queries*. In `cargo-semver-checks`, the question "Are there any types in the API that stopped being `Send` or `Sync`?" is [a query written in a strongly-typed query language](https://github.com/obi1kenobi/cargo-semver-check/blob/main/src/queries/auto_trait_impl_removed.ron#L8-L70), executed by [the Trustfall "query everything" engine](https://github.com/obi1kenobi/trustfall) that I've also been working on.[^sn-6]

We'll save the details on how exactly queries are executed for another time.

## Onward to fearless `cargo update`

`cargo update` isn't fearless yet.
`cargo-semver-checks` as of `v0.9.1` implements only 18&nbsp;semver checks, with [many more to go](https://github.com/obi1kenobi/cargo-semver-check/issues/5).
([Happy to mentor folks to help add new checks!](https://github.com/obi1kenobi/cargo-semver-check/blob/main/CONTRIBUTING.md#contributing))
`rustdoc`'s JSON output format is still in the process of being stabilized, and is currently only available on nightly Rust.
And there are [always more ergonomics improvements](https://github.com/obi1kenobi/cargo-semver-check/issues/86) and [more](https://github.com/obi1kenobi/cargo-semver-checks-action/issues/1) [exciting](https://github.com/obi1kenobi/cargo-semver-check/issues/38) [new](https://github.com/obi1kenobi/cargo-semver-check/issues/56) [features](https://github.com/obi1kenobi/cargo-semver-check/issues/6) [ahead](https://github.com/obi1kenobi/cargo-semver-check/issues/60) [of us](https://github.com/obi1kenobi/cargo-semver-check/issues/67).

But there's also a plan for making `cargo-semver-checks` a [part of `cargo` itself](https://github.com/obi1kenobi/cargo-semver-check/issues/61), and [members of the `cargo` team are already helping](https://github.com/obi1kenobi/cargo-semver-check/graphs/contributors) improve `cargo-semver-checks`.
There's [a GitHub Action](https://github.com/obi1kenobi/cargo-semver-checks-action#cargo-semver-checks-action) for running `cargo-semver-checks` right before your `cargo publish` step:
```yaml
- name: Check semver
  uses: obi1kenobi/cargo-semver-checks-action@v1
- name: Publish to crates.io
  run: # your `cargo publish` code here
```

There's even [interest in hosting each crate version's generated JSON files](https://github.com/rust-lang/docs.rs/issues/1285) on `docs.rs` once the format is stabilized, which would further speed up and simplify semver-checking.

There's still work to do, but the future is bright!

*Thanks to [Bojan Serafimov](https://twitter.com/Bojan93112526), [Luca Palmieri](https://twitter.com/algo_luca), and [Doc Jones](https://twitter.com/mojosd) for their feedback on drafts of this post.*
*Any mistakes are mine alone.*

[^sn-1]: I am not even talking about ambiguous situations like <span class="nobr">`#[repr(transparent)]`</span>. [Per the Rustonomicon](https://doc.rust-lang.org/nomicon/other-reprs.html#reprtransparent): "This repr is only considered part of the public ABI of a type if either the single field is pub, or if its layout is documented in prose. Otherwise, the layout should not be relied upon by other crates."

[^sn-2]: With the right flags, `rustdoc` can also output information about non-public types like `Foo`, whose `Send` and `Sync` implementations will similarly have disappeared.

[^sn-3]: No wonder they are hard to follow when I can't even say for sure how many of them there are...

[^sn-4]: Immutably, of course.

[^sn-5]: Alternative approaches exist: [using unstable internal compiler APIs](https://github.com/rust-lang/rust-semverver), [diffing source code ASTs](https://github.com/iomentum/cargo-breaking), [diffing the rustdoc JSON files](https://github.com/Enselic/cargo-public-api). The `cargo-semver-checks` docs [summarize the tradeoffs](https://github.com/obi1kenobi/cargo-semver-check#why-cargo-semver-checks-instead-of-), and the details [have been discussed](https://github.com/rust-lang/cargo/issues/374) [at length in](https://github.com/obi1kenobi/cargo-semver-check/pull/39/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R64) [GitHub PRs and issues](https://github.com/obi1kenobi/cargo-semver-check/issues/97).

[^sn-6]: To learn more about Trustfall, check out my 10-minute talk ["How to Query (Almost) Everything" from the HYTRADBOI 2022 conference](https://predr.ag/talks/#how-to-query-almost-everything).

Copyright (C) Predrag Gruevski 2022. [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en)
