cargo-semver-checks v0.37 can now scan Cargo.toml files for breakage! In this post: a primer on Rust package features, and how innocuous-looking Cargo.toml changes can break your users.
This Cargo.toml diff contains a major breaking change. Can you spot it?
[package]
name = "my_crate"
version = "0.1.0"
edition = "2021"
[dependencies]
# Optionally, process data in parallel.
rayon = { version = "1.10.0", optional = true }
# Optionally, use a faster hash table.
rustc-hash = { version = "2.1.0", optional = true }
[features]
+ # Allow selecting "max perf" mode with just one feature,
+ # instead of managing multiple optional dependencies.
+ max_performance = ["dep:rayon", "dep:rustc-hash"]
The change looks extremely innocuous! my_crate is gaining a simple convenience feature that lets users easily opt into maximizing performance.
But shipping this change would cause breakage for precisely the performance-maximizing users we were aiming to help! Fortunately, the latest release of cargo-semver-checks can save the day 😇
Like most code examples in blog posts, this one is intended to be pedagogical: simple and accurate, at a small expense to realism. Rest assured that this flavor of breakage does happen in the real world, albeit possibly with extra steps: for example, together with Rust code changes instead of just a Cargo.toml change.
We'll explain the problem in a few steps.
Feel free to skip ahead if you're already familiar with Cargo.toml package features, or if you just want the answer:
- Basics of Rust package features
- Deleting a package feature is a major breaking change
- Optional dependencies (sometimes) create implicit features
- Breakage! in the Cargo.toml
- How this works under the hood in cargo-semver-checks
- Wrapping up
Basics of Rust package features
Cargo allows packages to ship with functionality that can be conditionally enabled and compiled as part of the package. Users may opt in to get that conditional functionality, or opt out, in which case it's as if that functionality never existed. I'm glossing over some details for brevity. For example, features can be enabled or disabled by default, i.e. when the user hasn't actively opted in or out. The cargo reference has an excellent chapter on the topic if you'd like all the details.
The following Cargo.toml defines an opt-in feature called randomize:
[package]
name = "example"
version = "0.1.0"
[features]
randomize = []
The crate can define functionality that is only present with the randomize feature enabled. For example, the crate can say that its random module exists only when the randomize feature is enabled:
#[cfg(feature = "randomize")]
pub mod random;
Downstream users who depend on this crate may opt into the randomize feature like so:
[package]
name = "downstream_crate"
version = "0.1.0"
[dependencies]
+ example = { version = "0.1.0", features = ["randomize"]}
If the features = ["randomize"] portion weren't present, the feature would remain disabled and the example crate's random module would not be present.
The code inside that module wouldn't become private: it would quite literally not exist from Rust's point of view.
Hence, conditional compilation.
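To make that concrete, here's a minimal sketch of what the example crate's lib.rs could look like. The random::pick function and the randomize_enabled helper are hypothetical names invented for illustration; only the random module and the randomize feature come from the example above.

```rust
// Hypothetical lib.rs for the `example` crate.

/// Compiled only when the `randomize` feature is enabled.
/// Without the feature, this module does not exist at all.
#[cfg(feature = "randomize")]
pub mod random {
    /// Placeholder for illustration: a real crate would use an actual RNG.
    pub fn pick(items: &[u32]) -> Option<u32> {
        items.first().copied()
    }
}

/// Always compiled. Reports whether the `randomize` feature
/// was enabled when this crate was built.
pub fn randomize_enabled() -> bool {
    cfg!(feature = "randomize")
}
```

Built without the feature, randomize_enabled() returns false, and any code that refers to random::pick fails to compile because that module was never part of the crate.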
Deleting a package feature is a major breaking change
Referring to a package feature that doesn't exist triggers a build error, since cargo and rustc have no idea what code to compile:
error: failed to select a version for `example`.
... required by package `downstream_crate v0.1.0`
versions that meet the requirements `^0.1.0` (locked to 0.1.0) are: 0.1.0
the package `downstream_crate` depends on `example`,
with features: `nonexistent` but `example` does not have these features.
failed to select a version for `example` which could resolve this conflict
Thus, deleting a package feature that used to exist is a major breaking change. Users would run into build errors if a crate feature they use is no longer present.
(Don't worry, the new cargo-semver-checks version catches those cases too!)
Optional dependencies (sometimes) create implicit features
As part of the features system, cargo also allows defining optional dependencies.
That's handy when a dependency is only needed by some of the package's features, and is otherwise not worth including.
Our post's motivating example Cargo.toml file had two optional dependencies:
[package]
name = "my_crate"
version = "0.1.0"
edition = "2021"
[dependencies]
# Optionally, process data in parallel.
rayon = { version = "1.10.0", optional = true }
# Optionally, use a faster hash table.
rustc-hash = { version = "2.1.0", optional = true }
Each of these optional dependencies defines an implicit feature by the same name.
This way, only users who use my_crate with the rayon feature will get the rayon dependency. Otherwise, it won't be included.
While implicit features are meant to be a convenience, they can often be a footgun too. At issue: the fact that the implicit feature isn't always created.
Breakage! in the Cargo.toml
Package features can reference both other features and optional dependencies.
To distinguish optional dependencies from feature names, naming an optional dependency uses the dep: prefix.
Our example's feature syntax says: "define a max_performance feature that requires the rayon and rustc-hash optional dependencies."
[features]
# Allow selecting "max perf" mode with just one feature,
# instead of managing multiple optional dependencies.
max_performance = ["dep:rayon", "dep:rustc-hash"]
Here lies the problem:
In some cases, you may not want to expose a feature that has the same name as the optional dependency. For example, perhaps the optional dependency is an internal detail, or you want to group multiple optional dependencies together, or you just want to use a better name. If you specify the optional dependency with the dep: prefix anywhere in the [features] table, that disables the implicit feature.
Listing dep:rayon in the max_performance feature quietly removes the rayon feature, and the same goes for dep:rustc-hash and the rustc-hash feature!
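The rule is easier to see as code. Below is a rough sketch, in Rust, of how the set of features visible to downstream users could be computed from a manifest. The Manifest struct, effective_features, and my_crate are hypothetical names made up for this illustration, and the model is deliberately simplified: it ignores default features, the dep_name/feature syntax, and everything else cargo really does. It is not how cargo or cargo-semver-checks is implemented.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Simplified, hypothetical model of the relevant parts of a Cargo.toml.
pub struct Manifest {
    /// Names of dependencies declared with `optional = true`.
    pub optional_deps: Vec<&'static str>,
    /// The `[features]` table: feature name -> entries it enables.
    pub features: BTreeMap<&'static str, Vec<&'static str>>,
}

/// Returns every feature name a downstream crate could enable.
pub fn effective_features(manifest: &Manifest) -> BTreeSet<String> {
    // Features written out in the `[features]` table always exist.
    let mut result: BTreeSet<String> =
        manifest.features.keys().map(|name| name.to_string()).collect();

    // Every dependency named as `dep:<name>` anywhere in `[features]`.
    let dep_prefixed: BTreeSet<&str> = manifest
        .features
        .values()
        .flatten()
        .filter_map(|entry| entry.strip_prefix("dep:"))
        .collect();

    // An optional dependency keeps its implicit feature
    // only if it is never referenced with the `dep:` prefix.
    for dep in &manifest.optional_deps {
        if !dep_prefixed.contains(*dep) {
            result.insert(dep.to_string());
        }
    }
    result
}

/// The post's `my_crate`, before or after adding `max_performance`.
pub fn my_crate(with_max_performance: bool) -> Manifest {
    let mut features = BTreeMap::new();
    if with_max_performance {
        features.insert("max_performance", vec!["dep:rayon", "dep:rustc-hash"]);
    }
    Manifest {
        optional_deps: vec!["rayon", "rustc-hash"],
        features,
    }
}
```

Under this model, my_crate(false) exposes the rayon and rustc-hash features, while my_crate(true) exposes only max_performance.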
The current state of the lint is "don't let perfect be the enemy of good." The diagnostic text in particular could be far more helpful, and I plan to iterate on it in the immediate future! In its current state, it is primarily a demonstration that we can lint Cargo.toml files.
Human eyes may miss this, but cargo-semver-checks v0.37 does not:
$ cargo semver-checks
# ... snip ...
Checking my_crate v0.1.0 -> v0.1.0 (no change)
Checked [ 0.006s] 95 checks: 94 pass, 1 fail, 0 warn, 0 skip
--- failure feature_missing: package feature removed or renamed ---
Description:
A feature has been removed from this package's Cargo.toml. This will
break downstream crates which enable that feature.
ref: https://doc.rust-lang.org/cargo/reference/semver.html
#cargo-feature-remove
impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/
v0.37.0/src/lints/feature_missing.ron
Failed in:
feature rayon in the package's Cargo.toml
feature rustc-hash in the package's Cargo.toml
Summary semver requires new major version: 1 major and 0 minor
checks failed
If you aren't excited to hear about how this works under the hood, that's fine — you can skip the next section.
How this works under the hood in cargo-semver-checks
This was a challenging feature to build!
Up to this point, cargo-semver-checks had relied solely on rustdoc JSON: the unstable, machine-readable description of the API and ABI of a given package, produced by Rust's built-in rustdoc tool. But rustdoc JSON is emitted for a specific combination of enabled features, so it does not by itself contain any feature information. We had to look elsewhere.
The data source we chose to use is the JSON output of cargo metadata. It proved a superior choice compared to alternatives like "parse the Cargo.toml file" or "query crates.io".
cargo metadata lets us ask the authoritative source which features exist. It means we can't accidentally diverge from cargo in our interpretation of the Cargo.toml file. We also remain robust to future improvements in the feature system: for example, RFCs are in progress for private/hidden features, nightly/unstable features, etc.
But that was just the start of the journey.
The most intricate part of cargo-semver-checks isn't linting: it's gathering the data needed to lint.
cargo-semver-checks today spends 5-10x longer gathering data (e.g. running rustdoc) than it does actually running lints!
As a result, we rely on tricks like caching reused rustdoc data so we don't have to re-generate it repeatedly — and caching means "fun" problems like cache invalidation! For example: "the cache is no longer valid because the user has since upgraded to a newer Rust, which uses a different and incompatible rustdoc format."
To make the problem tractable, our system used to make a fundamental assumption: "there is only one data source, and it's rustdoc JSON."
Introducing cargo metadata as a second data source required first unwinding that assumption with tedious refactoring, and then weaving cargo metadata through the entire system.
Of course, we also had to ensure we didn't break any existing functionality — work that was made much easier by the improved test harnesses built by Max Carr as part of this year's Google Summer of Code!
Having acquired the metadata, the rest was analogous to how we ship any other lint:
- First, express the metadata's data model as a Trustfall query engine schema, and implement its bindings in the adapter component. This is a one-time operation, only necessary the first time a new piece of data is brought in.
- Then, write a new lint file. It includes the query to run ("find features that used to exist in the previous release, but don't exist currently", written as a Trustfall query), as well as the diagnostic text to show. It also includes a community-sourced list of common private or unstable feature names to ignore. Until cargo has a better mechanism for this, we don't want to flag features with names like unstable-do-not-use! The full list of name patterns we ignore is here.
- Finally, add test crates with various edge cases, to ensure the lint fires exactly when it should, no more and no less.
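In spirit, the lint boils down to a set difference with a filter. The actual lint is a Trustfall query, not Rust code, so treat the following as an analogy only. The function names and the single ignore prefix are assumptions made for illustration; the real ignore patterns may differ.

```rust
use std::collections::BTreeSet;

/// Hypothetical ignore list, for illustration only.
/// The real lint's name patterns may differ.
const IGNORED_PREFIXES: &[&str] = &["unstable"];

/// True if the feature name looks private or unstable under our assumed rules.
fn is_ignored(feature: &str) -> bool {
    IGNORED_PREFIXES
        .iter()
        .any(|prefix| feature.starts_with(*prefix))
}

/// Convenience helper for building a feature set from string literals.
pub fn feature_set(names: &[&str]) -> BTreeSet<String> {
    names.iter().map(|name| name.to_string()).collect()
}

/// Features that existed in the baseline release but are gone in the
/// current one, skipping names that match the ignore list.
pub fn feature_missing(
    baseline: &BTreeSet<String>,
    current: &BTreeSet<String>,
) -> Vec<String> {
    baseline
        .difference(current)
        .filter(|name| !is_ignored(name.as_str()))
        .cloned()
        .collect()
}
```

Given a baseline with rayon, rustc-hash, and unstable-do-not-use, and a current release with only max_performance, feature_missing reports rayon and rustc-hash and stays quiet about the unstable one.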
Wrapping up
I'd love to know: did you spot the breakage in the Cargo.toml example at the top of the post?
If so, do you feel like you would have caught it in a more realistic scenario too? Say, if the Cargo.toml change were part of a bigger pull request, and you didn't already know to look for breakage in it?
Personally, I would fail to catch this most of the time. I'd much rather have cargo-semver-checks deal with it!
If you liked this essay, consider subscribing to my blog or following me on Mastodon, Bluesky, or Twitter/X. You can also fund my writing and work on cargo-semver-checks via GitHub Sponsors, for which I'd be most grateful ❤