# Falsehoods programmers believe about undefined behavior

_Published: 2022-11-27_

[Undefined behavior (UB)](https://en.wikipedia.org/wiki/Undefined_behavior) is a tricky concept in programming languages and compilers.
Over the many years I've been an industry mentor for [MIT's 6.172 Performance Engineering course](https://www.youtube.com/playlist?list=PLUl4u3cNGP63VIBQVWguXxZZi0566y7Wf),[^sn-1] I've heard many misconceptions about what the compiler guarantees in the presence of UB.
This is unfortunate but not surprising!

For a primer on undefined behavior and why we can't just "define all the behaviors," I highly recommend [Chandler Carruth's talk](https://www.youtube.com/watch?v=yG1OZ69H_-o) "Garbage In, Garbage Out: Arguing about Undefined Behavior with Nasal Demons."

You might also be familiar with my [Compiler Adventures blog series](https://predr.ag/tags/compiler-adventures/) on how compiler optimizations work.
An upcoming episode is about implementing optimizations that take advantage of undefined behavior like dividing by zero, where we'll see UB "from the other side."

## Undefined behavior != implementation-defined behavior

Undefined behavior is not the same as implementation-defined behavior.[^sn-2]
Program behaviors fall into _three_ buckets, not two:
- **Specification-defined:** The programming language itself defines what happens. This is the vast majority of every program.
- **Implementation-defined:** The exact behavior is defined by your compiler, operating system, or hardware. For example: [how many bits exactly](https://en.wikipedia.org/wiki/C_data_types#Basic_types) are in a `char` or `int` in C++.[^sn-3]
- **Undefined behavior:** *Anything* is allowed to happen, and you might no longer have a computer left after it all happens. No outcome caused by UB counts as a compiler bug. For example: signed integer overflow in C, or using `unsafe` to create two `&mut` references to the same data in Rust.[^sn-4]
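
To make the buckets a bit more concrete, here is a minimal Rust sketch of the *defined* counterparts to the two UB examples above. The function names `add_or_none` and `bump_both_halves` are made up for illustration; `checked_add`, `split_at_mut`, and `usize::BITS` come from the standard library.

```rust
/// Overflow handled explicitly: `checked_add` returns `None` instead of
/// overflowing, so every pair of inputs has a specified result.
fn add_or_none(a: i32, b: i32) -> Option<i32> {
    a.checked_add(b)
}

/// Two `&mut` references into the same slice, obtained without `unsafe`.
/// `split_at_mut` hands back non-overlapping halves, so the two references
/// never point at the same data.
fn bump_both_halves(data: &mut [i32]) {
    let mid = data.len() / 2;
    let (left, right) = data.split_at_mut(mid);
    if let (Some(l), Some(r)) = (left.first_mut(), right.first_mut()) {
        *l += 1;
        *r += 1;
    }
}

fn main() {
    println!("{:?}", add_or_none(i32::MAX, 1)); // None

    let mut data = [10, 20, 30, 40];
    bump_both_halves(&mut data);
    println!("{:?}", data); // [11, 20, 31, 40]

    // Target-dependent rather than undefined: the width of `usize`.
    println!("usize is {} bits on this target", usize::BITS);
}
```

Neither function needs `unsafe`, so every input has an outcome the language specifies. The `usize` width is the closest everyday Rust analogue of the "depends on your platform" bucket.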

Here's the list of guarantees compilers make about the outcomes of undefined behavior:

That's the whole list. No, I didn't forget any items. Yes, seriously.

It is possible to analyze how UB affects *a specific program* when compiled by *a specific compiler* or executed on *a specific target platform*.
For example, there exist exotic compilers, operating systems, and hardware that offer additional guarantees[^sn-5] relative to most common platforms, which only guarantee OS-level [process isolation](https://en.wikipedia.org/wiki/Process_isolation).
We aren't talking about those in this post.

The mindset for this post is this: "If my program contains UB, and the compiler produced a binary that does X, is that a compiler bug?"

It's not a compiler bug.

## All of the following assumptions are wrong

### Falsehoods about when UB "happens"

1. Undefined behavior only "happens" at high optimization levels like <span class="nobr">`-O2`</span> or <span class="nobr">`-O3`</span>.
1. If I turn off optimizations with a flag like <span class="nobr">`-O0`</span>, then there's no UB.
1. If I include debug symbols in the build, there's no UB.
1. If I run the program under a debugger, there's no UB.
1. Okay, there's still UB with all of these, but my code will "do the right thing" regardless.
1. It will either "do the right thing" or crash with a `Segmentation Fault` (`SIGSEGV` signal).
1. It will either "do the right thing" or crash _somehow_.
1. It will either "do the right thing" or crash or infinite-loop or deadlock.
1. At least it won't run some unrelated code from elsewhere in the program.
1. At least it won't [run any unreachable code](https://kristerw.blogspot.com/2017/09/why-undefined-behavior-may-call-never.html) the program might contain.

### Falsehoods around the behavior of executing UB

<ol start="11">
<li>If a line with UB previously "did the right thing," then it will continue to "do the right thing" the next time we run the program.</li>
<li>The UB line will at least continue to "do the right thing" while the program is still running.</li>
<li>It's possible to determine if a previous line was UB and prevent it from causing problems.</li>
<li>At least the impact of the UB is limited to code which uses values produced from the UB.</li>
<li>At least the impact of the UB is limited to code which is in the same compilation unit as the line with UB.</li>
<li id="falsehood-16">Okay, but at least the impact of the UB is limited to code which runs after the line with UB.[^sn-6]</li>
</ol>
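
As a rough illustration of why a check placed *after* a risky operation cannot be relied upon, here is a small Rust sketch. The UB-prone variant appears only inside a comment, and `lookup_checked` is a hypothetical name chosen for this example. The only thing the compiled code does is validate the index *before* reading.

```rust
// UB-prone variant, shown for contrast only (not compiled):
//
//     let value = unsafe { *table.get_unchecked(index) }; // UB if index >= table.len()
//     if index >= table.len() {
//         return None; // too late: the read above already happened
//     }
//
// The unchecked read is only defined for in-range indices, so the compiler
// is allowed to assume the later check never fires and remove it entirely.

/// Hypothetical helper: returns the element at `index`, or `None` when the
/// index is out of range. The validation happens before any read.
fn lookup_checked(table: &[i32], index: usize) -> Option<i32> {
    table.get(index).copied()
}

fn main() {
    let table = [1, 2, 3];
    println!("{:?}", lookup_checked(&table, 1)); // Some(2)
    println!("{:?}", lookup_checked(&table, 7)); // None
}
```

The design point is ordering: once no execution path can reach the undefined operation, there is nothing for the optimizer to "reason backwards" from.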

### Falsehoods about the possible outcomes of UB

<ol start="17">
<li> At least it won't corrupt the memory of the program.</li>
<li> At least it won't corrupt the memory of the program other than where the UB-affected data was located.</li>
<li> At least it won't corrupt the heap.</li>
<li> At least it won't corrupt the stack.</li>
<li> At least it won't corrupt the current stack frame. (My name for this is the "local variables are safely in registers" fallacy.)</li>
<li> At least it won't corrupt the stack pointer.</li>
<li> At least it won't corrupt the CPU flags register / any other CPU state.</li>
<li> At least it won't corrupt the <em>executable</em> memory of the program.[^sn-7]</li>
<li> At least it won't corrupt streams like stdout or stderr.</li>
<li> At least it won't overwrite any files the program already had open.</li>
<li> At least it won't open new files and overwrite them.</li>
<li> At least <a href="https://kristerw.blogspot.com/2017/09/why-undefined-behavior-may-call-never.html">it won't completely wipe the drive.</a></li>
<li> At least it won't damage or destroy any hardware components.[^sn-8]</li>
<li> At least it won't start playing Doom if the program didn't already have the Doom source code in it.[^sn-9]</li>
</ol>

### Falsehoods like "but it worked fine before"

<ol start="31">
<li>If a UB-containing program "worked fine" previously, recompiling the program without any code changes will still produce a binary that "works fine."</li>
<li>Recompiling without code changes and with the same compiler and flags will produce a binary that still "works fine."</li>
<li>Recompiling as above + on the same machine will produce a binary that still "works fine."</li>
<li>Recompiling as above + if you haven't rebooted the machine since the last compilation will produce a binary that still "works fine."</li>
<li>Recompiling as above + with the same environment variables will produce a binary that still "works fine."</li>
<li>Recompiling as above + at the same time of day and day of week as before, during a lunar eclipse, having first sacrificed a fresh stick of RAM to the binary gods, will produce a binary that still "works fine."</li>
</ol>

### Falsehoods about self-consistent behavior of UB

<ol start="37">
<li>Multiple runs of the program compiled as above and with the same inputs will produce the same behavior in each run.</li>
<li>Those multiple runs will produce the same behavior if the program, ignoring the UB, is deterministic.</li>
<li>But they will if the program is also single-threaded.</li>
<li>But they will if the program also doesn't read any external data (files, network, environment variables, etc.).</li>
</ol>

### Community-contributed falsehoods around UB

<ol start="41">
<li>Using a debugger on a UB-containing program will show program state that corresponds to the source code.[^sn-10]</li>
<li>Undefined behavior is purely a runtime phenomenon.[^sn-11]</li>
</ol>

### False expectations around UB, in general

<ol start="43">
<li>Any kind of reasonable or unreasonable behavior happening with any consistency or any guarantee of any sort.</li>
</ol>

The moment your program contains UB, **all bets are off**.
Even if it's just one little UB.
Even if it's never executed.
Even if you don't know it's there at all.
Probably even if you wrote the language spec and compiler yourself.[^sn-12]

This is not to say that all outcomes in the list above are equally likely, or even plausible.[^sn-13]
But they are all allowed, valid, spec-compliant behavior.

It's perfectly possible that your program has UB, and it's been running fine for years without issues.
That's great!
I'm happy to hear it!
I'm not even saying you need to go back and rewrite it to remove the UB.
But as you make your decisions, it's good to know the full picture of what the compiler will or won't guarantee for your program.

## Honorable mention for one special assumption

"If the program compiles without errors then it doesn't have UB."

This is 100% false in C and C++.

It's also false as stated in Rust, but with one tweak it's _almost_ true.
If your Rust program never uses `unsafe`, then it _should_ be free of UB.
In other words: causing UB without `unsafe` is considered
[a bug in the Rust compiler](https://github.com/rust-lang/rust/issues?q=is%3Aopen+is%3Aissue+label%3AI-unsound).
Such bugs are rare, and you are quite unlikely to run into them.

When `unsafe` is used in Rust, all bets are off just as in C or C++.
But the assumption that "Safe Rust programs that compile are free of UB" is _mostly true_.
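
One way to lean on that property is to have the compiler reject `unsafe` in your own code. The sketch below assumes the built-in `unsafe_code` lint is applied to a single module; the module name `safe_only` and the function `nth_byte` are made up for illustration. Note that the lint only covers the code it is attached to, not the standard library or other dependencies.

```rust
// Forbidding the `unsafe_code` lint on this module turns any `unsafe`
// block written inside it into a compile error.
#[forbid(unsafe_code)]
mod safe_only {
    /// Out-of-range access yields `None` instead of reading past the end.
    pub fn nth_byte(bytes: &[u8], n: usize) -> Option<u8> {
        bytes.get(n).copied()
    }
}

fn main() {
    println!("{:?}", safe_only::nth_byte(b"abc", 1)); // Some(98)
    println!("{:?}", safe_only::nth_byte(b"abc", 9)); // None
}
```

The same attribute can be written as `#![forbid(unsafe_code)]` at the top of a crate to cover all of its own code.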

This is not an easy feat.
We owe a debt of gratitude to the folks who cumulatively put engineer-centuries into making it so.
It's Thanksgiving, and I thank you!

## Errata and edit history

### 2022-11-29: Items 13-16 corrected and updated

The original version of this post contained the following items at positions 13-16 in the list:
<ol start="13">
<li>But if the line with UB isn't executed, then the program will work normally as if the UB wasn't there.</li>
<li>Okay, but if the line with UB is <a href="https://en.wikipedia.org/wiki/Dead_code">unreachable (dead) code</a>, then it's as if the UB wasn't there.[^sn-14]</li>
<li>If the line with UB is unreachable code, then the program won't crash because of the UB.</li>
<li>If the line with UB is unreachable code, then the program will at least stop running <em>somehow</em> and <em>at some point</em>.</li>
</ol>

This wording was not precise enough, and as a result the claims were arguably incorrect as stated.
I have updated the post near those claims to better capture the subtleties involved.

### 2022-11-29: Added community-contributed items

The "False expectations around UB, in general" section now contains a selection of community-suggested items.
Previously it only contained a single item (the last one in the current list) at position number 41.

*Thanks to [arriven](https://github.com/arriven), [Conrad Ludgate](https://github.com/conradludgate), [sharnoff](https://github.com/sharnoff), Brian Graham, and a few folks who preferred to remain unnamed, for feedback on drafts of this post.*
*Any mistakes are mine alone.*

[^sn-1]: An *excellent* class that I *highly* recommend. It's very thorough and hands-on, at the expense of also requiring a lot of work at a very fast pace. When I took it as an undergrad, that was a great tradeoff, but YMMV.

[^sn-2]: Undefined behavior is also not the same as [_unspecified behavior_](https://en.wikipedia.org/wiki/Unspecified_behavior), which is similar to implementation-defined behavior minus the requirement that the implementation document its choices and stick to them. Here we're focusing on _undefined_ behavior, not _unspecified_ behavior, so we'll lump _unspecified_ behavior and implementation-defined behavior together.

[^sn-3]: The specification guarantees _at least_ 8 bits for `char` and at least 16 bits for `int`. The rest is implementation-defined.

[^sn-4]: Wikipedia has [an excellent list of examples](https://en.wikipedia.org/wiki/Undefined_behavior#Examples_in_C_and_C++) if you'd like to see more.

[^sn-5]: Like [CHERI](https://www.cl.cam.ac.uk/research/security/ctsrd/cheri/cheri-faq.html), with awesome powers around pointer safety.

[^sn-6]: UB is explicitly allowed to alter the behavior of other code, even including operations preceding it! "Alter" here encompasses corrupting, undoing, or altogether preventing (as if it never happened) the outcomes of that other code. To learn more and see examples of UB causing "time travel," check out [this blog post](https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=633).<br><br>Thanks to [these two](https://www.reddit.com/r/rust/comments/z7115a/comment/iy4w557/) [Reddit posts](https://www.reddit.com/r/rust/comments/z7115a/comment/iy51rtl/) for suggesting better wording for these items. For the original text, see the Errata section at the end of this post.

[^sn-7]: OS and hardware security features like [W^X](https://en.wikipedia.org/wiki/W%5EX) can make this unlikely, but self-modifying programs can be built so it's in principle possible through UB as well. Certainly there's no guarantee that UB _won't_ do this!

[^sn-8]: Not all devices have the same level of self-protection against bad inputs written to their control registers. This is the kind of lesson one tends to learn the hard way.

[^sn-9]: I'd be quite impressed if you made a compiler that makes programs run Doom when they encounter UB. Consider it a challenge!

[^sn-10]: This is a corollary of [falsehood #16](#falsehood-16), further explained in [this post](https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=633). UB can corrupt the behavior of the program both before and after the UB, so the source code you see in your editor no longer matches the actual executing program. Of course, you can still use the debugger to step through assembly instructions and view register state. But highly optimized assembly isn't easy to understand to begin with, and UB-induced weirdness will only make it harder. Overall, a situation that is best avoided. Contributed [here](https://www.reddit.com/r/rust/comments/z7115a/comment/iy51rtl/).

[^sn-11]: In Rust, a counter-example is misusing <span class="nobr"><code>#[no_mangle]</code></span> [to overwrite a symbol with an incorrect type](https://old.reddit.com/r/rust/comments/z7115a/falsehoods_programmers_believe_about_undefined/iy4ztkm/). [A C++ counter-example](https://news.ycombinator.com/item?id=33776047) is violations of the [One Definition Rule (ODR)](https://en.wikipedia.org/wiki/One_Definition_Rule), some of which the compiler is not required to report before causing havoc.

[^sn-12]: Speaking from experience. Hopefully not one you have to relive to believe.

[^sn-13]: Especially the one about running Doom.

[^sn-14]: Surprising, right? It isn't obvious why code that should be perfectly safe to delete would have any effect on the behavior of the program. But it turns out that sometimes [optimizations can make some dead code live again](https://www.ralfj.de/blog/2020/07/15/unused-data.html). EDIT: This was originally footnote #6 before being moved down here.

Copyright (C) Predrag Gruevski 2022. [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en)
