# My HYTRADBOI'22 Jam

_Published: 2022-10-03_

I had a lot of fun spending nights-and-weekends time participating in the [HYTRADBOI Jam](https://www.hytradboi.com/jam), a global hack week aimed at building "exciting and weird" data-centric solutions to familiar problems.
The name [HYTRADBOI](https://www.hytradboi.com/) might sound familiar: the jam is associated with the same conference where I gave my ["How to Query (Almost) Everything" talk](/talks/#how-to-query-almost-everything) talk in April this year.

I jammed on two projects: one solo and one with [a friend](https://twitter.com/Bojan93112526). The projects ultimately were very successful and mostly-successful, respectively.

I'll summarize the goals and progress on each project here.
I'll save the deep technical details for a future post.

## Semver queries over multiple rustdoc versions

My solo project was to [add support for multiple rustdoc versions](https://github.com/obi1kenobi/cargo-semver-check/pull/133) to `cargo-semver-checks`.[^sn-1]

Before the Jam, each `cargo-semver-checks` version could only support a single version of rustdoc.
As an unstable Rust feature under active development, the rustdoc JSON format changes frequently: I counted 7 breaking changes in a ~two-month period.

Thus, users had to match their version of `cargo-semver-checks` to the nightly Rust compiler version they were using.
Also, every time a new nightly included an updated rustdoc JSON format, I had to scramble to publish a corresponding update for `cargo-semver-checks`.
All around, this was [less than ideal](https://github.com/obi1kenobi/cargo-semver-check/issues/122).

My Jam project was to use [Trustfall](https://github.com/obi1kenobi/trustfall) to decouple the semver lint queries from the changes in the rustdoc JSON format version.
In broad strokes, JSON format changes don't affect the data model semver queries are interested in — Rust functions, structs, fields etc. — so the Trustfall adapter can absorb format changes and prevent them from leaking through to the queries.
More details in a future post!

This project was very successful: `cargo-semver-checks` version 0.12.0 includes this functionality, and *none of the semver queries needed any changes* whatsoever.
From this version onward, each new `cargo-semver-checks` version will support at least the most-recent stable and beta Rust versions.
Version 0.12.0 simultaneously supports [three rustdoc format versions](https://github.com/obi1kenobi/cargo-semver-check/pull/133):
- rustdoc v16, shipped in Rust stable 1.64
- rustdoc v21, shipped in Rust beta 1.65
- rustdoc v22, in the current nightly Rust.
The [GitHub Action](https://github.com/obi1kenobi/cargo-semver-checks-action) is also updated to match, and now uses the latest stable Rust by default.

## Unlimited-depth recursion in Trustfall queries

The `@recurse` directive is one of [Trustfall's](https://github.com/obi1kenobi/trustfall) key improvements over GraphQL.
For example, it allows queries like ["What comments and replies up to 3 levels deep have been posted on HackerNews' top stories?"](https://play.predr.ag/hackernews#?f=1&q=IyBHZXQgY29tbWVudHMgYW5kIHJlcGxpZXMgdXAgdG8gMyBsZXZlbHMgZGVlcAojIG9uIHRvcCBIYWNrZXJOZXdzIHN0b3JpZXMuCnF1ZXJ5IHsKICBUb3AgewogICAgLi4uIG9uIFN0b3J5IHsKICAgICAgdGl0bGUgQG91dHB1dAogICAgICBzdWJtaXR0ZWRVcmwgQG91dHB1dAogICAgICBzdG9yeVVybDogdXJsIEBvdXRwdXQKICAgICAgCiAgICAgIGNvbW1lbnQgewogICAgICAgIHJlcGx5IEByZWN1cnNlKGRlcHRoOiAzKSB7CiAgICAgICAgICBjb21tZW50OiB0ZXh0UGxhaW4gQG91dHB1dAogICAgICAgICAgY29tbWVudGVyOiBieVVzZXJuYW1lIEBvdXRwdXQKICAgICAgICB9CiAgICAgIH0KICAgIH0KICB9Cn0%3D&v=ewoKfQ%3D%3D)[^sn-2]

When deploying [Trustfall](https://github.com/obi1kenobi/trustfall) over a dataset, engineers don't need to worry about the complexity of implementing recursion over the edges (relations) in their data model.
The [Trustfall](https://github.com/obi1kenobi/trustfall) interpreter handles the execution of `@recurse` directives transparently, and the underlying dataset adapter sees only a series of non-recursive edge-traversal requests.

This outward-facing elegance comes at the cost of significant internal complexity within [Trustfall](https://github.com/obi1kenobi/trustfall).
In fact, the internals that make `@recurse` work are *the most complex code in the entire project*, and are already the product of many months of painstaking work.

But [my friend Bojan](https://twitter.com/Bojan93112526) and I love to push the limits of what can be achieved.
We wanted to further improve `@recurse` in two ways:
- Today, `@recurse` requires a fixed maximum recursion depth, and can't express "recurse until you run out of data" queries.
  This is in part because the current implementation eagerly allocates `O(d)` memory for `d`-depth recursion.
  Thus, it would attempt to allocate an infinite amount of memory if attempting to perform unlimited recursion, which wouldn't work.
- Today's `@recurse` implementation is asymptotically optimal,[^sn-3] and we wanted to make it *directly optimal* instead.
  This would improve `@recurse` query performance by returning all results earlier than before: the first result would become available after `O(1)` recursion-related work instead of `O(d)` for `d`-depth recursion, etc.

[Several prototypes](https://github.com/bojanserafimov/recurse) and lots of collaboration later, we have *mostly* achieved our objective: we have [a draft PR](https://github.com/obi1kenobi/trustfall/pull/83) containing a directly-optimal `@recurse` implementation that can support unlimited-depth recursion.

Our traces show the new implementation is *more than twice as fast* at delivering a query's first result compared to the existing `@recurse` execution approach.
As the recursion depth increases, the new implementation's performance advantage grows proportionally — the more-than-2x improvement is a *lower bound*.

The best part: this is all without making *any* changes to the adapter interface used to plug in data sources into [Trustfall](https://github.com/obi1kenobi/trustfall). All Trustfall users would benefit from this improvement without needing to change even one line of query or data adapter code.

This project was *mostly* successful.
We had (rather ambitiously!) hoped to have the new implementation merged by the end of the Jam.
However, [Trustfall's](https://github.com/obi1kenobi/trustfall) thorough test suite revealed that our new code currently mishandles [a particularly subtle edge case](https://github.com/obi1kenobi/trustfall/blob/main/trustfall_core/src/resources/test_data/valid_queries/implicit_coercion_in_recurse_to_supertype.graphql.ron#L4-L23) that can arise when recursing in particularly diabolical schemas.

We won't let one edge case deter us from our pursuit of optimal performance!
But first, we need some rest after a week of thrilling jamming 😴

Thanks to [Jamie Brandon](https://twitter.com/sc13ts) for organizing the [HYTRADBOI Jam](https://www.hytradboi.com/jam)! Participating was a lot of fun, and I am hoping there will be many more Jams to come.

*This blog post includes joint work with [Bojan Serafimov](https://twitter.com/Bojan93112526). However, any mistakes in this blog post are mine alone.*

[^sn-1]: That's the [semver linter for Rust](https://crates.io/crates/cargo-semver-checks) I built using [Trustfall](https://github.com/obi1kenobi/trustfall) a few months ago.

[^sn-2]: This isn't the same as nesting the `reply` edge in that query three times: that would give us all replies that are *exactly* 3 levels deep in the comment tree, instead of all comments at levels 0-3 inclusive.

[^sn-3]: Essentially, computing the first result involves a bit more work (`O(d)` for `d`-depth recursion) than strictly necessary. Subsequent results progressively recoup this upfront cost, so most of the time this isn't a problem. But it can become noticeable when using extremely deep recursion, say 10,000+ levels deep.

Copyright (C) Predrag Gruevski 2022. [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en)
