Four challenges cargo-semver-checks has yet to tackle

January 23, 2024 semver rust

My last post covered the key cargo-semver-checks achievements from 2023. Here are the biggest challenges that lie ahead!

Many of the remaining challenges in cargo-semver-checks are obvious: we all want more lints, fewer false-positives, etc. etc. Let's set those aside.

Instead, let's talk about four non-obvious challenges we have yet to tackle: project edge cases, cross-crate analysis, checking of types, and sustainable project funding.

Discuss this post on r/rust or lobste.rs. Subscribe to future posts.

Obvious in retrospect — Project edge cases

In theory, cargo-semver-checks is simple: for each crate being scanned, grab an appropriate prior version, then diff it against the current source code to find breaking changes.

In practice, that sentence glosses over an unbelievable number of edge cases. It would only become accurate if we put^* footnotes^* next^* to^* each^* word^* (not real footnotes, just a narrative aid). "Diff the source code to find breaking changes" sounds like it would be the hardest part, but in practice, that's the only straightforward bit!

Here are some of the edge cases we may run into before starting any semver checking:

GitHub screenshot from the opening comment at this link: https://github.com/obi1kenobi/cargo-semver-checks/pull/600
It reads:

As cargo-semver-checks is getting adopted by more and more projects, we're seeing more and more cases where our current APIs and config flags give users insufficient control over various edge cases that happen in the real world. To address such edge cases, we'll need to expose a richer API in our lib target, which can then be used both as a dependency of other tools (e.g. release-plz) and directly via the CLI. I'm adding a draft for such a new API, and I'm looking for feedback! Here are the issues we've seen so far, lightly grouped:

"We failed to generate or read the rustdoc JSON.": Hitting a recursion limit while running cargo semver-check on diesel; Crate fails to compile; Failure to generate rustdoc when RUSTFLAGS env var sets -Dwarnings.

"There's no baseline version to check against.": Should skip if no baseline versions available; Make running on bin-only targets an error, and skip bin-only targets in --workspace; Improve error message when testing on not-yet-published crate; Option to skip crates whose Cargo.toml version is already published on the registry; Error message on previously nonexistent crate is a bit unhelpful.

"The baseline version exists but cannot be used.": The tool fails when generating rustdoc of a yanked release from registry.

"Unusual edge cases, where the library API can force handling them to happen at the layer above, since it is better suited to the task.": cargo semver-checks might exit 0 if it finds only publish = false crates in a workspace; can fail when projects of same name are in a subdirectory.
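To make two of these groups concrete ("there's no baseline version to check against" and "the baseline version exists but cannot be used"), here is a minimal Rust sketch of only the baseline-selection step. It assumes a hypothetical policy of picking the newest non-yanked release strictly older than the current version. The `Version`, `Release`, `BaselineError`, and `pick_baseline` names are illustrative inventions, not cargo-semver-checks' actual API or selection logic, and the sketch ignores pre-release versions entirely.

```rust
/// Hypothetical, simplified version number (no pre-release or build metadata).
/// The derived ordering compares `major`, then `minor`, then `patch`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Version {
    major: u64,
    minor: u64,
    patch: u64,
}

/// Hypothetical stand-in for a published release's registry metadata.
#[derive(Debug, Clone, Copy)]
struct Release {
    version: Version,
    yanked: bool,
}

/// Why no baseline could be chosen.
#[derive(Debug, PartialEq, Eq)]
enum BaselineError {
    /// Nothing was ever published, e.g. a brand-new crate.
    NeverPublished,
    /// Releases exist, but none is both non-yanked and older than the current version.
    NoUsablePriorRelease,
}

/// Hypothetical policy: the newest non-yanked release strictly older than `current`.
fn pick_baseline(current: Version, releases: &[Release]) -> Result<Version, BaselineError> {
    if releases.is_empty() {
        return Err(BaselineError::NeverPublished);
    }
    releases
        .iter()
        .filter(|r| !r.yanked && r.version < current)
        .map(|r| r.version)
        .max()
        .ok_or(BaselineError::NoUsablePriorRelease)
}

fn v(major: u64, minor: u64, patch: u64) -> Version {
    Version { major, minor, patch }
}

fn main() {
    let releases = [
        Release { version: v(1, 0, 0), yanked: false },
        Release { version: v(1, 1, 0), yanked: true },
        Release { version: v(1, 2, 0), yanked: false },
    ];
    // 1.2.0 is the newest usable release below 1.3.0.
    println!("{:?}", pick_baseline(v(1, 3, 0), &releases));
    // 1.1.0 is yanked, so the baseline for 1.2.0 falls back to 1.0.0.
    println!("{:?}", pick_baseline(v(1, 2, 0), &releases));
    // A crate that was never published has no baseline at all.
    println!("{:?}", pick_baseline(v(0, 1, 0), &[]));
}
```

Even this toy version has to decide what "no baseline" means: a never-published crate and a crate with no usable prior release fail for different reasons, and a real tool would likely want to report them differently.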

When I started cargo-semver-checks, the priority was getting the tool working at all. I didn't have time to consider, nor could I have imagined, the vast space of possible edge cases.

In retrospect, many of them seem obvious. In 2024, we should tackle them.

Blocked upstream — Cross-crate analysis

cargo-semver-checks currently analyzes crates independently, without looking at their dependencies. It never sees any data for items defined in dependencies, even if those items are part of the checked crate's public API. It's as if the foreign items simply don't exist.

Most of the time, this is fine! Many people still use cargo-semver-checks despite this limitation.

But the lack of cross-crate analysis is the top cause of false positives in cargo-semver-checks today. In our ecosystem-wide semver study, it came in second only to the false positives caused by #[doc(hidden)] handling, which are no longer an issue as of our v0.25 release.
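A common source of trouble is moving a type into another crate and re-exporting it from its original location. Here is a minimal sketch of that scenario, with two modules standing in for two separate crates (the `new_home`, `my_crate`, and `Config` names are hypothetical). Downstream code that uses the old path keeps working, and the re-export names the very same type. With real separate crates, though, cargo-semver-checks sees only the re-export in `my_crate`, pointing at an item it has no data for.

```rust
// Hypothetical sketch: two modules stand in for two separate crates.

// Stand-in for the crate that `Config` was moved into.
mod new_home {
    #[derive(Debug, Default, PartialEq)]
    pub struct Config {
        pub verbose: bool,
    }
}

// Stand-in for the original crate: `Config` used to be defined here,
// and is now only re-exported from its original path.
mod my_crate {
    pub use crate::new_home::Config;
}

fn main() {
    // Downstream code that names the old path keeps compiling...
    let cfg = my_crate::Config { verbose: true };
    // ...because the re-export names the very same type as the new definition.
    let same: new_home::Config = cfg;
    println!("{:?}", same);
}
```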

In fact, the lack of cross-crate analysis causes both false positives and false negatives.

A false positive: say you move a type to another crate, then re-export it in its original location. This falsely looks like a deletion: the re-export references some unresolved (missing) item, and we cannot determine whether it's the same type or not.

A false negative: say your crate has a re-export of another crate's item, and the re-export gets deleted or stops being pub. Since the re-export seemed to point to an unresolved (missing) item, cargo-semver-checks would have discarded it as non-analyzable before it could notice that it was deleted.

So why haven't we fixed it yet?

While cargo-semver-checks could be better at handling the downstream symptoms, any solution that tackles all edge cases will require new functionality in both rustc and rustdoc:

GitHub screenshot from this comment: https://github.com/obi1kenobi/cargo-semver-checks/issues/609
It reads:

Cargo-semver-checks can't currently do cross-crate analysis, so all items from other crates look like they are missing. This is because:

(a) Rustdoc doesn't inline re-exported items across crates. This used to be the case, but was removed because it was causing lots of bugs and other kinds of unpleasantness.

(b) There's currently no reliable way to determine which dependency (including crate and exact version, to distinguish between multiple major versions) a cross-crate item came from. There has been interest in adding such a mechanism, though progress seems to have slowed recently: https://github.com/rust-lang/compiler-team/issues/635

(c) There will still be probably 3 person-months of work needed on the cargo-semver-checks side after such a mechanism is added. Such a large investment of time would require finding a more permanent source of funding for the project; sponsoring me on GitHub is the way to do that. When I can cover rent with cargo-semver-checks, I'll be much more able to work on time-consuming improvements like this.

Surprising limitation — No checking of types

Changing the type of a public function's parameters is an obvious breaking change. Surely?

A precise and correct answer needs another^* thousand^* footnotes^* (still not real footnotes). It hinges on understanding dark corners of the Rust language (e.g. type coercions, or lifetime subtyping and variance) as well as community-designed gadgets like sealed traits that aren't official language features (yet?) but are nevertheless commonly used in real-world Rust.

Currently, no lints check type information since they can't see it (#149), so all the type-related lints are on the not-yet-implemented list: #5.

While simpler cases like the one in your repro exist, this is a very hard problem in general since it has to take into account generic types, lifetimes, lifetime variance, whether traits are sealed, etc. In a sense, it's the "final boss" of semver checking.

For example, changing pub fn example(value: i64) { ... } to pub fn example(value: impl Into<i64>) { ... } is not a breaking change. But if the function is inside a trait, it is a breaking change. Except if the trait is sealed, in which case it isn't a breaking change. Etc. etc.

We'll need quite a bit more time and more funding before we can tackle this.
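Here is a minimal runnable sketch of the free-function case, with two modules standing in for the "before" and "after" releases of the same crate. The function bodies and the `i64` return value are illustrative additions, since the original example elides the bodies with `{ ... }`.

```rust
// Hypothetical sketch: modules stand in for two releases of the same crate.

mod before {
    pub fn example(value: i64) -> i64 {
        value * 2
    }
}

mod after {
    // Accepts anything convertible into `i64`, including `i64` itself.
    pub fn example(value: impl Into<i64>) -> i64 {
        value.into() * 2
    }
}

fn main() {
    // Existing callers that pass an `i64` compile unchanged against both versions.
    let x: i64 = 21;
    assert_eq!(before::example(x), after::example(x));

    // The new signature also accepts types the old one rejected, such as `i32`.
    let y: i32 = 21;
    println!("{}", after::example(y));
}
```

The trait case differs because implementors, not just callers, depend on the signature: a downstream `impl` that still declares `value: i64` would no longer match the trait's new generic method. Sealing the trait rules out such downstream `impl`s, which is why the change becomes non-breaking again.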

It's not all doom and gloom, though! I'm positive this is doable, the only question is what's the best way to do it.

The process here is going to be analogous to how I've been working on extending our trait-checking capabilities:

I found that correct trait semver-checks depend on accurately detecting whether a trait is sealed — meaning that it may not be implemented outside of the trait's own crate.
I did extensive research on the various ways to seal traits, which turned out to be a broad topic! I couldn't find any single resource that covered all the details, so I wrote it myself. Since then, I've learned a few even more advanced trait-sealing techniques. This GitHub issue has more details; it might be time for a follow-up post, though!
Then, I started working with the rustdoc-json maintainers toward removing roadblocks and making this feature easier to implement and maintain.
In the meantime, I've come up with a sketch for representing trait-sealing info in the schema that powers the cargo-semver-checks lint queries.
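For readers unfamiliar with sealed traits, here is a minimal sketch of one common sealing technique: a supertrait that is declared `pub` but lives in a private module, so code outside the crate cannot name it and therefore cannot implement the sealed trait. The `my_crate` module stands in for a library crate, and the `Shape`, `Square`, and `total_area` names are hypothetical. This is just one of the many ways to seal a trait.

```rust
// Hypothetical sketch of the "public supertrait in a private module" pattern.
// The `my_crate` module stands in for a library crate.
mod my_crate {
    mod private {
        // `pub`, but inside a private module: code outside `my_crate`
        // cannot name this trait, so it cannot implement it either.
        pub trait Sealed {}
    }

    pub trait Shape: private::Sealed {
        fn area(&self) -> f64;
    }

    pub struct Square(pub f64);

    impl private::Sealed for Square {}

    impl Shape for Square {
        fn area(&self) -> f64 {
            self.0 * self.0
        }
    }
}

use my_crate::{Shape, Square};

// Downstream code can still *use* the sealed trait...
fn total_area(shapes: &[&dyn Shape]) -> f64 {
    shapes.iter().map(|s| s.area()).sum()
}

fn main() {
    let a = Square(2.0);
    let b = Square(3.0);
    let shapes: [&dyn Shape; 2] = [&a, &b];
    println!("{}", total_area(&shapes));
    // ...but `impl Shape for SomeType` outside `my_crate` would fail to compile,
    // because `SomeType` cannot implement the unnameable `private::Sealed`.
}
```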

Which brings us to the next challenge...

Existential threat — Sustainable project funding

I'm not employed by any company to work on Rust or cargo-semver-checks. Instead, I'm trying to become a professional open-source maintainer.

I'm deeply grateful to every single one of my GitHub sponsors. Many of you are donating out of your own pocket 🙏 And behind each company that sponsors me are individuals who advocated for me to their company 🙏 I appreciate it more than I can describe!

Unfortunately, the current funding situation is far from sustainable.

The bottom line is this: all the sponsorships I've received for cargo-semver-checks in total would cover one week of my usual consulting fee.

Either I manage to figure out how to get more recurring funding to support my family, or I eventually join the list of burned-out maintainers who had to make the difficult choice to abandon the projects they care so deeply about. The recent post on burnout in the Rust project hit quite close to home. We all know this problem goes well beyond just Rust itself. I love cargo-semver-checks, and I'm blessed to have an extremely supportive family, but on our current trajectory either our savings or their near-infinite patience will run out.

The best person to help is — you, dear reader.

If you work at a company that uses cargo-semver-checks, please ask them to support its development.
If you aren't sure how to approach that conversation, reach out and I'll help you through it!
If you're an experienced professional open-source maintainer who's made the system work — I'd love to learn from you!
