# Checking semver in the presence of doc(hidden) items

_Published: 2023-11-18_

*`cargo-semver-checks` v0.25 squashes nearly all bugs related to `doc(hidden)` items — its most common source of false-positives. What does `doc(hidden)` mean in Rust, and why was handling it correctly so hard?*



`cargo-semver-checks` is a linter that helps prevent accidental breaking changes in the public APIs of Rust libraries.

Seems simple enough, no? And yet, almost every word in that sentence is unexpectedly load-bearing!

In today's post, we dig into how "public API" isn't the same thing as "all `pub` items," and why that caused headaches for `cargo-semver-checks` users prior to v0.25. Resolving those headaches is now as simple as upgrading to the latest `cargo-semver-checks`!

*Discuss on [r/rust](https://www.reddit.com/r/rust/comments/17y7vbi/checking_semver_in_the_presence_of_dochidden_items/) or [lobste.rs](https://lobste.rs/s/zh49df/checking_semver_presence_doc_hidden).*

## TL;DR + table of contents

The `#[doc(hidden)]` attribute is a way to mark `pub` items exported by a crate as "not public API" and therefore exempt from semver obligations. It's [commonly used](https://github.com/search?q=doc%28hidden%29+language%3ARust&type=code&l=Rust) — any Rust project likely has at least one dependency containing public `doc(hidden)` items.

Prior to v0.25, `doc(hidden)` could be a huge problem for `cargo-semver-checks` users. [Our study of semver compliance in the top 1000 Rust crates](/blog/2023-09-07-semver-violations-are-common-better-tooling-is-the-answer/) showed it was the source of **over 60% of our false-positives**.[^sn-1] For crates that heavily rely on `doc(hidden)`, using `cargo-semver-checks` was a frustrating experience whenever the hidden items changed.

**`cargo-semver-checks` v0.25 addresses this problem.**[^sn-2] It's not every day that one eliminates over 60% of outstanding bugs! 🎉

This was a [long-term](https://github.com/obi1kenobi/cargo-semver-checks/issues/120) team effort that stands on the shoulders of giants. [David Hewitt](https://twitter.com/__davidhewitt__), a maintainer of [PyO3](https://crates.io/crates/pyo3), helped flesh out the [initial idea and prototypes](https://github.com/obi1kenobi/trustfall-rustdoc-adapter/pull/260). I [consulted the `rustdoc` maintainers](https://rust-lang.zulipchat.com/#narrow/stream/266220-t-rustdoc/topic/pub.20re-export.20of.20item.20from.20.60doc.28hidden.29.60.20module) about the subtleties of `doc(hidden)`. All of us benefit from [shockingly good tooling](https://twitter.com/PredragGruevski/status/1719743377602687115) in the Rust ecosystem, from `rustc`'s diagnostics onward. And there's no way this would have shipped today without [the sponsors that fund my work](https://github.com/sponsors/obi1kenobi).[^sn-3]

### Contents
* [The `doc(hidden)` attribute](#the-doc-hidden-attribute)
* [Why the obvious solutions don't work](#why-the-obvious-solutions-don-t-work)
* ["Lints are queries" to the rescue!](#lints-are-queries-to-the-rescue)
* [What about deprecated items?](#what-about-deprecated-items)
* [I'd like to hear from you!](#i-d-like-to-hear-from-you)

## The `doc(hidden)` attribute

_If you are already familiar with `doc(hidden)`, feel free to [skip ahead](#why-the-obvious-solutions-don-t-work)._

Here are a few examples of using `doc(hidden)`:

```rust
#[doc(hidden)]
pub struct Example;

#[doc(hidden)]
pub fn frobnicate() {
    // implementation here
}

#[doc(hidden)]
pub mod hidden {
    pub struct AnotherExample;

    pub fn frobnicate_more() {
        // more implementation
    }
}
```

All items in this snippet are public in the sense of "accessible outside their crate."
None of them are part of the crate's public API.
If this crate had a [docs.rs](https://docs.rs/) page, none of these items would show up on it.
We can verify this in the real world: the `impl_` module in PyO3 is [public but hidden](https://docs.rs/pyo3/0.20.0/src/pyo3/lib.rs.html#410-411) — can you find it in [the module listing on its docs.rs page](https://docs.rs/pyo3/latest/pyo3/#modules)?
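To make "accessible outside their crate" concrete, here is a single-file sketch. A hypothetical inline module `this_crate` stands in for the library crate, and the function after it plays the role of downstream code. `doc(hidden)` only affects rendered documentation. It does not change privacy or name resolution, so the "consumer" can still name every hidden item by its full path.

```rust
// Hypothetical stand-in for the library crate; in a real project this would
// be a separate crate that the consumer depends on.
pub mod this_crate {
    #[doc(hidden)]
    pub struct Example;

    #[doc(hidden)]
    pub fn frobnicate() -> &'static str {
        "frobnicated"
    }

    #[doc(hidden)]
    pub mod hidden {
        pub struct AnotherExample;

        pub fn frobnicate_more() -> &'static str {
            "frobnicated more"
        }
    }
}

// "Consumer" code: hidden items are missing from the docs, but the compiler
// still lets us construct and call them through their `pub` paths.
pub fn use_hidden_items() -> (&'static str, &'static str) {
    let _example = this_crate::Example;
    let _another = this_crate::hidden::AnotherExample;
    (this_crate::frobnicate(), this_crate::hidden::frobnicate_more())
}
```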

The most common use case for `doc(hidden)` is in crates that export macros. Code generated by macros is part of the _consumer's crate_, so it can only call `pub` code from the macro definition's crate. In practice, that requires large portions of macros' implementation details to be `pub` even though they are never intended to be used _directly_ by end users. Changes to internal implementation details shouldn't require a major version bump, whether in macros or not![^sn-4]
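A minimal sketch of that pattern, using hypothetical names (`double!`, `__private`, `double_impl`): the exported macro expands to a `$crate::` path, so the helper it calls has to be `pub`, while `doc(hidden)` keeps the helper off the crate's documentation page. Everything sits in one file here for brevity; in practice the macro would be invoked from a downstream crate.

```rust
// Implementation details the macro relies on. They must be `pub` because the
// expanded code lives in the caller's crate, but they are hidden from docs.
#[doc(hidden)]
pub mod __private {
    pub fn double_impl(x: u32) -> u32 {
        x * 2
    }
}

// `$crate` resolves to the defining crate wherever the macro is expanded,
// so a downstream expansion calls `<defining crate>::__private::double_impl`.
#[macro_export]
macro_rules! double {
    ($e:expr) => {
        $crate::__private::double_impl($e)
    };
}
```

Renaming or reworking `__private::double_impl` would break downstream code that called it directly, yet it is exactly the kind of change that shouldn't force a major version bump.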

Prior to v0.25, `cargo-semver-checks` was oblivious to `doc(hidden)` and treated all public items as public API. This worked just fine for many crates! However, crates like PyO3 that define many complex macros often found that `cargo-semver-checks` reported too many false-positives to be practical. As of v0.25 — not anymore!

Why did handling one Rust attribute take [well over a year](https://github.com/obi1kenobi/cargo-semver-checks/issues/120)?

Because none of the obvious approaches would have worked.

## Why the obvious solutions don't work

*"Easy," one might say, "just pretend `#[doc(hidden)]` items don't exist at all."*

And here I thought we were trying to _fix_ false-positives, not _cause them_ 😂 Watch this!

```rust
pub enum Example {
    Regular,

    #[doc(hidden)]
    Sneaky,
}
```

Say we make `Example::Sneaky` no longer `doc(hidden)`. Is this a breaking change? No — we've only expanded the public API surface area. Any code that worked before will continue to work.[^sn-5]
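One way to see why un-hiding `Sneaky` gives downstream code nothing new to handle: `doc(hidden)` does not exempt a variant from exhaustiveness checking, so any exhaustive `match` on `Example` already has to cover `Sneaky` (or use a wildcard arm). The `describe` function below is a hypothetical piece of downstream code.

```rust
pub enum Example {
    Regular,

    #[doc(hidden)]
    Sneaky,
}

// Hypothetical downstream code. Dropping the `Sneaky` arm (without adding a
// `_` wildcard) fails to compile with a "non-exhaustive patterns" error,
// even though `Sneaky` is hidden from the docs.
pub fn describe(e: &Example) -> &'static str {
    match e {
        Example::Regular => "regular",
        Example::Sneaky => "sneaky",
    }
}
```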

[Recall how `cargo-semver-checks` lints work](/blog/2023-02-07-speeding-up-rust-semver-checking-by-over-2000x/): they look for differences between a baseline ("already published") version and a "current" version that is being checked. One example of such a difference is when an exhaustive enum gains a new variant:[^sn-6]

```rust
// Exhaustive enum -- no `#[non_exhaustive]` attribute.
pub enum SomeEnum {
    First,

    // Added in the new version.
    // Breaking change!
    Second,
}
```

Adding `SomeEnum::Second` is a major breaking change! Downstream code is allowed to match on `SomeEnum` exhaustively, without a wildcard arm, and any such `match` will stop compiling until it gains an arm for `SomeEnum::Second`.
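The comparison behind this lint can be sketched in plain Rust. This is not how `cargo-semver-checks` implements it (its lints are Trustfall queries); the `EnumInfo` type and `added_variants` function are hypothetical, and the model ignores everything except variant names and `#[non_exhaustive]`.

```rust
use std::collections::BTreeSet;

// Hypothetical, heavily simplified view of one enum in one crate version.
pub struct EnumInfo {
    pub non_exhaustive: bool,
    pub variants: Vec<&'static str>,
}

// Report variants present in `current` but absent from `baseline`.
// If the baseline enum was already `#[non_exhaustive]`, downstream matches
// were required to have a wildcard arm, so new variants cannot break them.
pub fn added_variants(baseline: &EnumInfo, current: &EnumInfo) -> Vec<&'static str> {
    if baseline.non_exhaustive {
        return Vec::new();
    }
    let old: BTreeSet<&str> = baseline.variants.iter().copied().collect();
    current
        .variants
        .iter()
        .copied()
        .filter(|name| !old.contains(name))
        .collect()
}
```

For `SomeEnum`, a baseline with `["First"]` and a current version with `["First", "Second"]` yields `["Second"]`.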

Now let's go back to the `Example` enum from earlier:

```rust
pub enum Example {
    Regular,

    #[doc(hidden)]  // removed in new version
    Sneaky,
}
```

If `cargo-semver-checks` pretended that `doc(hidden)` items don't exist, then `Example::Sneaky` would look just like `SomeEnum::Second`: a brand-new variant in an exhaustive enum, an apparent breaking change!

Behold, a false-positive!

And far from the only one! Pretending `doc(hidden)` items don't exist would also have failed us in a dozen other ways.[^sn-7]
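Here is that failure mode in a few lines of plain Rust, using a hypothetical `Variant` model. Filtering out hidden variants before diffing makes `Sneaky` look brand-new, while diffing all variants, hidden or not, does not. This only illustrates the false positive and is not how the real lint is written; ignoring `doc(hidden)` entirely, as `all_variants_added` does, would misfire elsewhere, for example by flagging the removal of a hidden variant.

```rust
use std::collections::BTreeSet;

// Hypothetical, simplified view of an enum variant in one crate version.
pub struct Variant {
    pub name: &'static str,
    pub doc_hidden: bool,
}

// Naive approach: pretend hidden variants don't exist, then diff the names.
pub fn naive_added(baseline: &[Variant], current: &[Variant]) -> Vec<&'static str> {
    let old: BTreeSet<&str> = baseline
        .iter()
        .filter(|v| !v.doc_hidden)
        .map(|v| v.name)
        .collect();
    current
        .iter()
        .filter(|v| !v.doc_hidden && !old.contains(v.name))
        .map(|v| v.name)
        .collect()
}

// Diffing all variants, hidden or not: `Sneaky` already existed in the
// baseline, so it is not reported as added.
pub fn all_variants_added(baseline: &[Variant], current: &[Variant]) -> Vec<&'static str> {
    let old: BTreeSet<&str> = baseline.iter().map(|v| v.name).collect();
    current
        .iter()
        .filter(|v| !old.contains(v.name))
        .map(|v| v.name)
        .collect()
}
```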

I spent _months_ coming up with ideas, then finding counterexamples. This is how a few hundred lines of code can take over a year to write.

## "Lints are queries" to the rescue!

In the end, the query-based design of `cargo-semver-checks` once again played a key role in the solution.

Each lint in `cargo-semver-checks` is implemented as a query over an abstract data model implemented as [an adapter](https://github.com/obi1kenobi/trustfall-rustdoc-adapter) for the [Trustfall query engine](https://github.com/obi1kenobi/trustfall). I've written up how those queries work [in a prior post](/blog/2023-02-07-speeding-up-rust-semver-checking-by-over-2000x/), and you can try them out by querying Rust crates' data in our query playground. For example: "[Find the enums and their variants defined in the `itertools` crate.](https://play.predr.ag/rustdoc#?f=2&q=*3-Enums*L-variants*L-and-their-associated-information.*lquery---0Crate---2item---4*E-Enum---6enum_*n-name-*o*l--_6*l--_6importable_path---8enum_path*B-path-*o*l--_6*J*l*l--_6variant_*B-variant---8name-*o*l--_8docs-*o*l--_8attrs-*o*l--_8*l--_8declared_in_*B-span---afilename-*o*l--_aline*B-begin_line-*o*l--_8--*6--*4--*2--*0*J*l*J&v=*C*l*l*J&crate=itertools-0.10.4)"

The data backing that abstract model is derived from rustdoc JSON, but the additional layer of abstraction buys us a lot of flexibility. We already rely on that flexibility in two ways I've touched on in previous posts:
- [supporting multiple rustdoc JSON formats concurrently](https://predr.ag/blog/speeding-up-rust-semver-checking-by-over-2000x/#with-the-right-tools-anything-can-be-a-database), without rewriting lint queries for each format change, and
- [resolving names and re-exports](https://predr.ag/blog/breaking-semver-in-rust-by-adding-private-type-or-import/) in order to catch edge cases that can [sneakily break crates' public APIs in entirely non-obvious ways](https://github.com/kenba/opencl3/pull/54).

We handle `doc(hidden)` by leaning on that flexibility just a bit more. We add two new "synthetic" fields to our data model:[^sn-8]
- In our `Item` vertex type and its subtypes, we [add a new property](https://github.com/obi1kenobi/trustfall-rustdoc-adapter/pull/260/files#diff-7a7f49515432b8b2e2fb74a47dc49b20ea0478726e890841ebb77082a06a4b46R66-R85), `public_api_eligible`. This property is `true` if the item, viewed in isolation, is able to participate in the crate's public API: neither its own visibility (such as `pub(crate)`) nor `doc(hidden)` prevents it from being public API.
- In our `ImportablePath` vertex type, we [add another new property](https://github.com/obi1kenobi/trustfall-rustdoc-adapter/pull/260/files#diff-7a7f49515432b8b2e2fb74a47dc49b20ea0478726e890841ebb77082a06a4b46R877-R883), `public_api`. This property is `true` if the item is accessible via that path without passing through any items that are not themselves public API.

```graphql
interface Item {
  # <... existing properties and edges ...>

  """
  Whether this item is eligible to be in the public API.

  # <... more docs ...>
  """
  public_api_eligible: Boolean!
}

type ImportablePath {
  # <... existing properties and edges ...>

  """
  This path should be treated as public API.

  # <... more docs ...>
  """
  public_api: Boolean!
}
```

These two properties capture the two ways that public items can be suppressed from the public API: by marking the item itself `doc(hidden)`, or by making sure every path under which the item is importable passes through some `doc(hidden)` item.

Both properties are needed to handle edge cases like the following:

```rust
#[doc(hidden)]
pub mod foo {
    pub struct Bar;
}

// `this_crate::Bar` is public API here!
//
// `this_crate::foo::Bar` is pub-and-hidden,
// but this re-export is not hidden
// and neither is `Bar` itself.
pub use foo::Bar;
```

In this case, `cargo-semver-checks` detects that `Bar` is importable from this crate in two ways: `this_crate::Bar` and `this_crate::foo::Bar`. The former is public API, and the latter is not since it passes through the hidden `this_crate::foo` module.

In our data model:
- the `Bar` item would have `public_api_eligible = true`;
- the `this_crate::foo::Bar` path would have `public_api = false`; and
- the `this_crate::Bar` path would have `public_api = true`.[^sn-9]
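Those values can be reproduced with a small sketch. The `Item` struct, its `public_api_eligible` method, and `path_is_public_api` below are hypothetical simplifications, not the adapter's real types; they consider only visibility and `doc(hidden)`, and model a path as the list of items it passes through (modules, re-exports, and the final item).

```rust
// Hypothetical, simplified item model for this sketch only; the real adapter
// computes these properties from rustdoc JSON.
pub struct Item {
    pub is_pub: bool,
    pub doc_hidden: bool,
}

impl Item {
    // Viewed in isolation, could this item be part of the public API?
    pub fn public_api_eligible(&self) -> bool {
        self.is_pub && !self.doc_hidden
    }
}

// A path is public API only if every item it passes through, including the
// item it ends at, is itself eligible.
pub fn path_is_public_api(path: &[&Item]) -> bool {
    path.iter().all(|item| item.public_api_eligible())
}
```

For the snippet above: `foo` is `pub` but hidden, while the crate root, the `pub use` re-export, and `Bar` are `pub` and not hidden. The path through `foo` is therefore not public API, but the path through the re-export is.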

Equipped with these new additions, making `cargo-semver-checks` lints aware of `doc(hidden)` was just a matter of updating their queries to use these new properties appropriately.

Updating 50+ queries sounds like a lot of work, but in reality it wasn't that bad: it was just a matter of [adding an extra clause or two](https://github.com/obi1kenobi/cargo-semver-checks/pull/576/files#diff-211f3404f6c9c152f6923afdcb9f7813d8f8a5450cf5bdccddb48899bf534ea5) to queries that are type-checked by the Trustfall engine, then checked for correctness against a substantial test suite. If our lints were specified imperatively, the change could have been a hundred times harder to get right.

Declarative lints for the win! 🚀

## What about deprecated items?

Our [semver study of the top Rust crates](https://predr.ag/blog/semver-violations-are-common-better-tooling-is-the-answer/) surfaced many excellent edge cases that guided the handling of `doc(hidden)` items in `cargo-semver-checks`. It was great to find these edge cases proactively, before launching this new functionality, rather than retroactively via bug reports from annoyed users after launch 😅[^sn-10]

Via the results of that study, I learned that maintainers sometimes [use `doc(hidden)` when deprecating public API items](https://github.com/search?q=language%3Arust+doc%28hidden%29+deprecated&type=code). This allows them to suppress the deprecated items from documentation without breaking code that still depends on them.

```rust
#[deprecated = "Use crate::Other instead."]
#[doc(hidden)]
pub struct LegacyCode;
```

It turns out this is reasonably common! As such, `cargo-semver-checks` must accommodate this use case as well.

Our answer: while normally `doc(hidden)` items are not `public_api_eligible`, items that are both deprecated and hidden _are_ considered `public_api_eligible` and remain in the public API. This allows us to ensure that public APIs remain unchanged both _during_ and _after_ their deprecation.
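As a rough sketch of that rule, assuming a hypothetical `ItemFlags` struct rather than the adapter's real data model, eligibility could be computed like this:

```rust
// Hypothetical flags for one item; not the adapter's real data structures.
pub struct ItemFlags {
    pub is_pub: bool,
    pub doc_hidden: bool,
    pub deprecated: bool,
}

// Hidden items are normally excluded from the public API, but items that are
// both hidden and deprecated remain eligible, so the API does not shrink
// while (or after) they are being deprecated.
pub fn public_api_eligible(item: &ItemFlags) -> bool {
    item.is_pub && (!item.doc_hidden || item.deprecated)
}
```

Under this rule, removing `#[deprecated]` from `LegacyCode` while keeping `#[doc(hidden)]` flips the result from `true` to `false`, which is why that change registers as a removal from the public API.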

This approach also introduces a small quirk: if an item is both hidden and deprecated, removing the deprecation is considered a breaking change, since it makes the item no longer public API. Whether this is correct or not is debatable, and likely best decided on a case-by-case basis. Two reasons make me think this is unlikely to be a problem in practice:
- One wouldn't usually mark non-public items as deprecated in the first place. If an item is private to a crate and outdated, maintainers would most often replace it immediately rather than deprecating first.
- Deprecating and hiding an item, then un-deprecating _without un-hiding_ seems like a very strange sequence of changes. I can't think of many (any?) cases where it could realistically happen.

## I'd like to hear from you!

That's a wrap for the biggest feature of `cargo-semver-checks` v0.25!

What do you think? Are you using `cargo-semver-checks` in your project? Please let me know!

Whether your opinion is positive, negative, or just plain "meh," I'd like to hear about it. Feedback helps drive the project forward: positive feedback is motivation, and constructive negative feedback helps improve the tool in places where it needs improvement the most.

I'd love to make `cargo-semver-checks` my full-time job that pays my rent, and I'd love your help with that as well.

Is there a feature or improvement that your team or company really wants? Do you want to lint your project's codebase for other things beyond just semver? Is your organization interested in a private talk about the tech behind `cargo-semver-checks` and how it can best be used and scaled up? I can do all this and more — [give me a ping](https://github.com/obi1kenobi)!

[^sn-1]: The study ended up discarding nearly half (46.6%) of all issues reported by `cargo-semver-checks` as `doc(hidden)` false-positives _alone_. To put it mildly: not great. Thankfully, these false-positives were not evenly distributed: most crates had zero or very few and were able to use `cargo-semver-checks` without issue, while a minority of crates were `doc(hidden)` [Georg](https://en.wikipedia.org/wiki/Spiders_Georg) and `cargo-semver-checks` was practically unusable for them. The next section offers intuition for why this is the case.

[^sn-2]: A tiny handful of rare edge cases remain unsolved, since solving them first requires another feature: the ability to detect whether, and to what degree, a trait is sealed. As readers of this blog know, that's _also_ [more complex than it may seem](/blog/2023-04-05-definitive-guide-to-sealed-traits-in-rust/) at a glance. Fortunately, real-world data shows those cases are extremely rare in practice, and we have a path toward handling them correctly in the future. More on this in future posts!

[^sn-3]: If you or your employer use `cargo-semver-checks`, please consider supporting my work! If you aren't sure how to approach your employer about this — [let's](https://twitter.com/PredragGruevski) [chat](https://hachyderm.io/@predrag)!

[^sn-4]: Aside from "feeling right," there's also a pragmatic argument here: cargo will not automatically update libraries to new major versions. We don't want projects to end up with multiple major versions of macro crates for no reason — that just adds bloat, wastes compile time, and worsens maintainability. At scale, even small amounts of friction can drastically change outcomes.

[^sn-5]: One can argue that clippy should raise a warning lint here, since it's probably a good idea to make the enum non-exhaustive if we're hiding its variants. I'm sympathetic to that argument! But as of Rust 1.74, this is legal, lint-free Rust code.

[^sn-6]: Here's [the implementation](https://github.com/obi1kenobi/cargo-semver-checks/blob/main/src/lints/enum_variant_added.ron) for that lint.

[^sn-7]: Another failure mode is a variant of the [re-exports issue](https://predr.ag/blog/breaking-semver-in-rust-by-adding-private-type-or-import/) that I discussed in a previous post, where knowledge of private items is required to determine which names are publicly visible in a module. There are more — let me know if you'd like me to write about them!

[^sn-8]: "Synthetic" because they don't exist per se in rustdoc JSON, and are instead [computed on-demand](https://github.com/obi1kenobi/trustfall-rustdoc-adapter/pull/260/files#diff-bab205220e099630e9ba8d3ac0832a819e400befa0ed9bcb480d08e85d421582R62-R77) by the Trustfall adapter for rustdoc.

[^sn-9]: Modules are also a type of `Item` in our data model. The `foo` module's own item would have `public_api_eligible = false` since it's marked `doc(hidden)`, and its path `this_crate::foo` would have `public_api = false` for the same reason.

[^sn-10]: In [my recent talk at P99 CONF](https://www.youtube.com/watch?v=Fqo8r4bInsk), I argued that having high performance is what made it _viable_ to find such issues proactively. If `cargo-semver-checks` were slower, scanning 14000 Rust crate releases would have been infeasibly expensive. In my (admittedly biased) opinion, building on top of Trustfall is a key reason why `cargo-semver-checks` was able to achieve high performance despite all the domain's challenges and without expending an overwhelming amount of effort.

Copyright (C) Predrag Gruevski 2023. [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en)
