Debugging Safari: If at first you succeed, don't try again

September 19, 2022 debugging browser caching

The saying usually goes: "If at first you don't succeed, try, try again." But in the Safari web browser, under the right conditions, trying again after succeeding once can get you in trouble. This is the story of my recent debugging adventure, written up as requested on Twitter.

While building Trustfall Playground, I noticed that the playground editor in Safari was sporadically broken. You guessed it: it was a caching problem. It's always caching, unless it's DNS.

When the page requested the problematic script, cache misses worked fine. But on cache hits, Safari refused to load the resource, claiming it violated security rules, even though it had already accepted and loaded that same resource earlier on the same page. As long as the resource remained in cache, every attempt to use it would fail.

To explain what was happening, we'll need a bit of background. The bug is caused by a complex interaction of browser caching logic, new APIs for high-performance computing, and new browser security features. We'll get charged with breaking security rules, successfully proclaim our innocence, and ride off into the sunset while filing a bug against the Safari/WebKit browser engine.

If mysteries aren't your thing, skip ahead to the "A bug in Safari / WebKit" section for the answer.

Playground needs special security headers

Trustfall Playground executes queries using one or more web workers. Web workers are a bit like threads in a native app: they move computation off the UI thread to keep it responsive. One of the workers compiles and executes queries, another handles network requests, a third provides query editor functionality like syntax highlighting, and so on.

The Trustfall Playground website's page on querying HackerNews APIs, showing an example query titled "Comments With Links to HackerNews Stories." The schema pane is open and shows documentation about fields available to query, like "FrontPage," "Top," and "Latest."

The workers coordinate through an object called a SharedArrayBuffer. It represents a region of memory accessible to multiple workers, so that data written by one worker can be read by another. Techniques that use shared memory buffers are a cornerstone of modern multicore computing: for example, they are used extensively both by the browser in which you are reading these words and by the operating system it runs on.

However, their pervasiveness also made them targets for exploitation via attacks like Spectre. To keep users safe, web worker code that uses such high-performance functionality needs to meet a whole list of requirements that ensure the sensitive code is delivered safely over HTTPS and remains isolated from other components, just in case they are up to no good.

Any page needing this functionality must be served with the following headers (this is just a high-level overview; for more depth, see https://web.dev/coop-coep/):

Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp

The first header demands that the browser not run our page's code together with code from other origins, to prevent those other websites from running Spectre-like attacks on our code. The second header restricts the page to using only resources whose CORP or CORS headers specifically opt them into being used on our page.
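To try these headers locally, a small static file server is enough. The sketch below is an assumption-laden stand-in for Netlify's configuration, not how the Playground is actually deployed: it uses Python's standard `http.server` module, attaches COOP and COEP to every response, and also attaches `Cross-Origin-Resource-Policy: same-origin` so that scripts opt in. The `serve` helper and its defaults are hypothetical.

```python
import functools
import http.server

# Headers from this post. COOP and COEP opt the page into cross-origin isolation;
# CORP lets each response (including worker scripts) opt in to being used by the page.
ISOLATION_HEADERS = {
    "Cross-Origin-Opener-Policy": "same-origin",
    "Cross-Origin-Embedder-Policy": "require-corp",
    "Cross-Origin-Resource-Policy": "same-origin",
}


class IsolatedHandler(http.server.SimpleHTTPRequestHandler):
    """Static file handler that adds ISOLATION_HEADERS to every response."""

    def end_headers(self):
        for name, value in ISOLATION_HEADERS.items():
            self.send_header(name, value)
        super().end_headers()


def serve(directory=".", port=8000):
    """Build (but do not start) a threaded server for `directory` on localhost."""
    handler = functools.partial(IsolatedHandler, directory=directory)
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Calling `serve("dist").serve_forever()` would then serve a hypothetical `dist` build directory at http://127.0.0.1:8000. One side effect worth knowing: because the headers are added in `end_headers`, this server also includes them on the 304 responses it sends for `If-Modified-Since` requests, so it would likely not reproduce the Safari behavior described below.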

Scripts loaded into the Playground web workers signal that they opt in with:

Cross-Origin-Resource-Policy: same-origin

If a worker's script is delivered without this header, the browser will refuse to start the worker:

Refused to load 'https://play.predr.ag/broken_script.js' worker
because of Cross-Origin-Embedder-Policy.

Worker load was blocked by Cross-Origin-Embedder-Policy
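The rule behind that console message can be modeled in a few lines. This is a toy model of the behavior described in this post, not the algorithm from the Fetch or HTML specifications, which have more cases; the function names are hypothetical, and the check only recognizes the `same-origin` value the Playground uses.

```python
def normalize(headers):
    # Header names are case-insensitive; this toy model also lowercases values.
    return {name.lower(): value.strip().lower() for name, value in headers.items()}


def worker_script_allowed(page_headers, script_headers):
    """Toy model: may a page with these headers start a worker from this script?"""
    page = normalize(page_headers)
    script = normalize(script_headers)
    if page.get("cross-origin-embedder-policy") != "require-corp":
        return True  # the page did not opt into require-corp, so nothing is enforced here
    # Under require-corp, the script response must opt in via Cross-Origin-Resource-Policy.
    return script.get("cross-origin-resource-policy") == "same-origin"
```

With page headers containing `Cross-Origin-Embedder-Policy: require-corp`, a script response that lacks `Cross-Origin-Resource-Policy` makes the function return `False`, which corresponds to the blocked worker load above.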

Violating policy, but only sometimes

Back to the story: the Trustfall Playground was broken in Safari. The page was visibly missing functionality, and those "worker load was blocked" messages were being printed in the console.

Over the next couple of hours, the adventure unfolded something like this:

Wipe out all caches, so we start from a known configuration.
Open the browser's dev tools network pane, then reload the page.
Find Safari's request for the script that gets blocked. Is the Cross-Origin-Resource-Policy header set correctly in the response? Yes. A missing header was unlikely to be the problem (the page would have been broken in other browsers too), but it's important to check our assumptions.
Double-check the top-level page's headers too. They look good.
Nothing obvious is broken. Look for clues: things that look out of place.

Checking lots of things that ended up not being the problem...

Hmm, the script is loaded three times but Safari blocks it (and complains via the console) only twice. Curious!

Let's try to figure out which requests generate which complaints.

Clear caches, and pay close attention to order of requests and console log lines.
First request — miss, works fine. Two more requests, two cache hits — each with a complaint.
Repeat a few times. It's reproducible!

Time to put the caching logic under a microscope.

Caching on the web

By default, Netlify sends caching headers that ask the browser to ensure any cached copy of a resource is up-to-date before using it. If the browser has the resource in cache, it still makes a request to the server, but includes the cached copy's ETag value (in an If-None-Match header), which the server uses to determine whether the cached copy is outdated or still good.

On a cache miss (outdated or missing ETag), the server responds with 200 OK and sends the resource together with all of its headers, including the Cross-Origin-Resource-Policy header we require.

On a cache hit (matching ETag), the server instead responds with 304 Not Modified and no body.

Inspecting the requests, we find a clue: the 200 OK responses include our security header, but the 304 Not Modified responses do not!
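The server side of this revalidation exchange can be sketched as follows. It is a simplified model under a few assumptions, not Netlify's implementation: the ETag is a truncated SHA-256 of the body, `Cache-Control: no-cache` stands in for "revalidate before every reuse", and `CONFIGURED_HEADERS` is a hypothetical stand-in for per-site header configuration. The 304 carries only cache-related headers, which is exactly the pattern observed above.

```python
import hashlib

# Hypothetical stand-in for per-site header configuration (e.g. a host's settings).
CONFIGURED_HEADERS = {"Cross-Origin-Resource-Policy": "same-origin"}


def make_etag(body):
    """A strong ETag derived from the content; any stable fingerprint would do."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body, if_none_match=None):
    """Return (status, headers, body) for a GET, honoring If-None-Match.

    Simplified: real servers also handle weak validators, lists of ETags, and "*".
    """
    etag = make_etag(body)
    cache_headers = {"ETag": etag, "Cache-Control": "no-cache"}
    if if_none_match == etag:
        # Cache hit: 304 with no body and only cache-related headers.
        return 304, cache_headers, b""
    # Cache miss: 200 with the body and every configured header.
    return 200, {**cache_headers, **CONFIGURED_HEADERS}, body
```

A first call without `if_none_match` yields a 200 carrying `Cross-Origin-Resource-Policy`; repeating the call with that response's ETag yields a 304 without it.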

Let's run an experiment: disable caching for web worker scripts by setting the Cache-Control: no-store header. Yes, no-store; confusingly, no-cache means something else:

Note that no-cache does not mean "don't cache". no-cache allows caches to store a response but requires them to revalidate it before reuse. If the sense of "don't cache" that you want is actually "don't store", then no-store is the directive to use.

Cache-Control Response Directives, MDN Web Docs
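A simplified classifier makes the distinction concrete. It is a sketch that only looks at directive names and ignores `max-age` and the other freshness directives, so its last outcome is deliberately coarse; the function names are hypothetical.

```python
def parse_cache_control(value):
    """Split a Cache-Control header into a set of lowercase directive names."""
    return {part.split("=", 1)[0].strip().lower() for part in value.split(",") if part.strip()}


def cache_behavior(cache_control):
    """Simplified reading of how a browser cache treats these directives."""
    directives = parse_cache_control(cache_control)
    if "no-store" in directives:
        # Nothing is kept, so every load is a full request answered with a 200.
        return "do not store"
    if "no-cache" in directives:
        # Stored, but each reuse needs a conditional request, which may get a 304.
        return "store, revalidate before reuse"
    # Freshness rules (max-age and friends) are not modeled in this sketch.
    return "store, reuse while fresh"
```

`no-cache` lands in the revalidation branch, which is where 304 responses come from, while `no-store` avoids them entirely.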

With caching disabled, Safari loads all web worker scripts successfully, no matter how many times we reload the page. The problem is definitely related to caching. But which side is causing it: the browser or the site?

The Playground uses Netlify's default caching configuration. Are Netlify's defaults broken?

And why does all this work fine in other browsers?

Whose responsibility are these headers, anyway?

Opening Chrome, we see that Playground works fine there even though the 304 Not Modified responses still don't include the Cross-Origin-Resource-Policy security header. Of course, the 200 OK responses do include that header; the site was broken when they did not.

So we're either slipping a script past Chrome's security protocols ... or Safari is failing to properly apply the security header on cached resources.

When in doubt, consult the official specification:

The 304 (Not Modified) status code indicates that a conditional GET or HEAD request has been received and would have resulted in a 200 (OK) response [...] the server is therefore redirecting the client to make use of that stored representation as if it were the content of a 200 (OK) response.
[...]
Since the goal of a 304 response is to minimize information transfer when the recipient already has one or more cached representations, a sender SHOULD NOT generate representation metadata other than the above listed fields unless said metadata exists for the purpose of guiding cache updates [...]

Section 15.4.5: 304 Not Modified, RFC 9110, Internet Engineering Task Force (IETF)

As the Cross-Origin-Resource-Policy header has nothing to do with cache updates, the spec unambiguously says it should not be included in 304 Not Modified responses. Instead, the client is expected to use the stored resource as if from a 200 OK response. This surely includes any relevant headers that the 200 OK would (and previously did!) contain, like our Cross-Origin-Resource-Policy header.
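Here is a sketch of what "use the stored resource as if from a 200 OK" means for headers. It is a simplified version of the header-update ("freshening") rules in RFC 9111, the HTTP caching specification: header fields present in the 304 replace their stored counterparts, everything else is kept, and `Content-Length` is never updated. The real rules have further exceptions that this sketch omits, and `freshen` is a hypothetical name.

```python
def freshen(stored_headers, not_modified_headers):
    """Combine a stored 200 response's headers with a 304's headers.

    Fields present in the 304 replace stored ones; fields absent from the 304
    (such as Cross-Origin-Resource-Policy) are kept from the stored response.
    Returned header names are lowercased, since HTTP field names are case-insensitive.
    """
    merged = {name.lower(): value for name, value in stored_headers.items()}
    for name, value in not_modified_headers.items():
        if name.lower() == "content-length":
            continue  # RFC 9111 excludes Content-Length from this update
        merged[name.lower()] = value
    return merged
```

Under these rules, the effective headers after a 304 still include `Cross-Origin-Resource-Policy` from the earlier 200 OK.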

A bug in Safari / WebKit

Knowing what we know now, here's a review of what was breaking in Playground and why:

On cache miss, Safari gets the web worker script with proper security headers, and everything is fine.
On cache hit, the server doesn't include security headers, expecting Safari to act as if the response were the prior 200 OK which included those headers.
Safari does not do so, and the missing headers cause the web worker script to be rejected.
If the page loads the script multiple times, we'll see one rejection message in the console for each cache hit for that script.
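The sequence above can be replayed as a small simulation. It is a hypothetical model, not WebKit code: one 200 OK followed by 304 Not Modified responses, evaluated either against the stored headers updated by the 304 (what the spec asks for) or against the 304's own headers (what Safari appeared to do). The header merge is a simplified stand-in for the full caching rules.

```python
# Hypothetical responses for one worker script: a full 200, then a revalidated 304.
FULL_200 = (200, {"ETag": '"v1"', "Cross-Origin-Resource-Policy": "same-origin"})
HIT_304 = (304, {"ETag": '"v1"'})


def worker_allowed(stored_headers, response, spec_compliant):
    """Toy check: does the worker script start, given which headers get evaluated?"""
    status, headers = response
    if status == 304 and spec_compliant:
        # Per the spec: the stored 200's headers, updated by the 304's (simplified merge).
        effective = {**stored_headers, **headers}
    else:
        # A 200, or the buggy path: only the headers on this response are checked.
        effective = headers
    return effective.get("Cross-Origin-Resource-Policy") == "same-origin"


def count_rejections(spec_compliant, loads=3):
    """Load the script `loads` times; return how many loads get blocked."""
    stored_headers = None
    rejections = 0
    for _ in range(loads):
        response = FULL_200 if stored_headers is None else HIT_304
        if response[0] == 200:
            stored_headers = response[1]  # cache miss: store the full response
        if not worker_allowed(stored_headers, response, spec_compliant):
            rejections += 1  # one console complaint per blocked load
    return rejections
```

With `spec_compliant=False`, three loads produce two rejections, the same pattern seen in the console; with `spec_compliant=True` they produce none.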

The specification points to the Safari caching code as the cause of the bug.

But it's also perfectly understandable how a bug like this might happen. Browsers are among the most complex pieces of software in existence, and this bug arises from the interaction of several systems that evolved under competing pressures: maximum performance versus uncompromising security.

Rather than being frustrated with the Safari / WebKit maintainers, we should appreciate their hard work and make their job a bit easier by providing the best possible description and reproduction of the bug. I filed a bug report containing the best information I could gather, given my limited web development experience.

In the meantime, we can work around the bug by disabling caching for web worker scripts with the Cache-Control: no-store header.
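Expressed as code rather than host configuration, the workaround amounts to choosing response headers by path. This is a minimal sketch: the path patterns and the `response_headers_for` helper are hypothetical, and on Netlify itself the equivalent header would be set through its own configuration rather than Python.

```python
import fnmatch

# Hypothetical path patterns for the worker scripts; adjust to the real build output.
WORKER_SCRIPT_PATTERNS = ["/workers/*.js", "/*.worker.js"]


def response_headers_for(path):
    """Pick response headers for a path, disabling caching for worker scripts."""
    headers = {"Cross-Origin-Resource-Policy": "same-origin"}
    if any(fnmatch.fnmatchcase(path, pattern) for pattern in WORKER_SCRIPT_PATTERNS):
        # Never store worker scripts, so there is never a 304 for Safari to mishandle.
        headers["Cache-Control"] = "no-store"
    return headers
```

The cost of this design is that worker scripts are downloaded in full on every load; everything else keeps the default revalidation behavior.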

Playground is sadly not yet functional on Safari or iOS due to another, more severe Safari/WebKit bug that the other Playground authors and I haven't managed to work around yet. We have some ideas, though, so all is not lost!

Thanks to Bojan Serafimov and Amos Wenger for their feedback on drafts of this post. Any mistakes are mine alone.