Learn to model and handle any error using idiomatic Rust.
🦀🦀🦀
Introduction
Are you overwhelmed by the amount of choice Rust gives us for handling errors? Confused about when to return a structured error type or a Box<dyn Error>? Intimidated by Box<dyn Error + Send + Sync + 'static>'s beefy type signature?
Whether you're building an application or library, this guide will help you make the right decision.
I love error handling. I'm obsessed. I work in the finance and space industries, and things go wrong a lot.
Failure cases vastly outnumber success cases. Knowing how to communicate what went wrong, to the right audience, in an appropriate amount of detail is a skill that sets you apart from other developers.
Think about how great the Rust compiler's error messages are compared to other programming languages. We want users of our code to have that same reaction, whether they're on our team or using our library. We want them to be impressed when things go wrong!
Before we dazzle anyone with our error handling skills, though, let's nail the fundamentals.
🦀🦀🦀
Part I
Rust error handling basics
What is an error in Rust?
In Rust, an error is any type that implements the std::error::Error trait. Here's the definition:
Source: src/core/error.rs
This is a moderately threatening trait definition, but all four of these methods have default implementations provided for us.
Any type that implements both Debug and Display can implement Error. There's very little manual work required.
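To see how little manual work that is, here's a minimal sketch with a hypothetical EmptyInputError: Debug is derived, Display is written by hand, and the Error impl body is empty because every method has a default.

```rust
use std::error::Error;
use std::fmt;

/// A hypothetical error type, used only for illustration.
#[derive(Debug)]
pub struct EmptyInputError;

impl fmt::Display for EmptyInputError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("input was empty")
    }
}

// Every Error method has a default, so the body can stay empty.
// The only requirement is that Debug and Display are implemented.
impl Error for EmptyInputError {}

/// Returns the trimmed input, or an error if nothing is left.
pub fn non_empty(input: &str) -> Result<&str, EmptyInputError> {
    let trimmed = input.trim();
    if trimmed.is_empty() {
        Err(EmptyInputError)
    } else {
        Ok(trimmed)
    }
}
```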
In fact, Error::cause and description are deprecated in favor of Error::source and the Display implementation, respectively. You should never have to worry about them, except when working with older code.
Error::provide is an unstable API that's only available on nightly Rust, so I won't discuss it here. You won't have to worry about it unless you're working with cutting-edge, unstable code.
The return type of Error::source warrants closer examination (1), because we'll see similar types throughout this guide.
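You know what Option is already. &(dyn Error + 'static) simply means "a reference to some error that may live for the whole duration of the program". Strictly speaking, the 'static bound applies to the error's type, not to the reference: the type can't contain any borrows shorter than 'static.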
The 'static lifetime is important for error handling, because errors are often handled long after the code that causes them returns, sometimes on a different thread.
Good luck handling an error that's been dropped unexpectedly! Rust protects us from this scenario.
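Here's a small sketch of source in action, assuming a hypothetical ConfigError that keeps the ParseIntError that caused it. Each call to source hands back an Option<&(dyn Error + 'static)>, so a loop can walk the whole chain:

```rust
use std::error::Error;
use std::fmt;
use std::num::ParseIntError;

/// Hypothetical higher-level error that keeps its lower-level cause.
#[derive(Debug)]
pub struct ConfigError {
    source: ParseIntError,
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("failed to read config")
    }
}

impl Error for ConfigError {
    // ParseIntError owns all its data, so it satisfies the 'static bound.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

/// Collects the message of an error and of every error in its source chain.
pub fn error_chain(err: &dyn Error) -> Vec<String> {
    let mut messages = vec![err.to_string()];
    let mut current = err.source();
    while let Some(cause) = current {
        messages.push(cause.to_string());
        current = cause.source();
    }
    messages
}
```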
You'll often see 'static alongside Send and Sync bounds. dyn Error + Send + Sync + 'static describes "some error that can live as long as the program, be sent between threads by value or shared across threads by immutable reference".
Error::source's return type, &(dyn Error + 'static), doesn't make any promises about thread safety.
In general, standard library code places more relaxed bounds on dynamic errors than you'll see in the broader ecosystem and use in your own projects.
This allows the widest variety of things to behave as errors, with stricter requirements left to the user's discretion.
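As a sketch of why those stricter bounds matter (BoxError and parse_on_worker are hypothetical names), here's an error created on a worker thread and handed back to the caller. With a plain Box<dyn Error>, the closure's return type wouldn't be Send, and thread::spawn would refuse to compile it.

```rust
use std::error::Error;
use std::thread;

/// Thread-safe dynamic error: it can cross thread boundaries and outlive
/// the function that created it.
pub type BoxError = Box<dyn Error + Send + Sync + 'static>;

/// Hypothetical helper: parses on a worker thread and hands any error back.
pub fn parse_on_worker(input: String) -> Result<i64, BoxError> {
    let worker = thread::spawn(move || -> Result<i64, BoxError> {
        // `?` boxes the ParseIntError into a BoxError.
        let value = input.trim().parse::<i64>()?;
        Ok(value)
    });
    // join() only returns Err if the worker thread panicked.
    worker.join().expect("worker thread panicked")
}
```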
How do we make an Error type 'static? Simple – use only owned fields, or fields which specify the 'static lifetime for references and trait objects.
The following type is only 'static if the reference assigned to field happens to be 'static itself:
These are always 'static:
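A minimal sketch of both cases, with hypothetical type names: BorrowedError is only 'static when its &str is, while OwnedError and StaticStrError always are. The into_boxed helper only accepts 'static errors, which is what boxing into Box<dyn Error> requires.

```rust
use std::error::Error;
use std::fmt;

/// Only 'static if `field` borrows from something that is itself 'static.
#[derive(Debug)]
pub struct BorrowedError<'a> {
    field: &'a str,
}

/// Always 'static: it owns its data.
#[derive(Debug)]
pub struct OwnedError {
    field: String,
}

/// Always 'static: the reference is explicitly 'static.
#[derive(Debug)]
pub struct StaticStrError {
    field: &'static str,
}

impl fmt::Display for BorrowedError<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.field)
    }
}

impl fmt::Display for OwnedError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.field)
    }
}

impl fmt::Display for StaticStrError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.field)
    }
}

impl Error for BorrowedError<'_> {}
impl Error for OwnedError {}
impl Error for StaticStrError {}

/// Box<dyn Error> means Box<dyn Error + 'static>, so this only compiles
/// for error values whose type is 'static.
pub fn into_boxed<E: Error + 'static>(err: E) -> Box<dyn Error> {
    Box::new(err)
}
```

Passing a BorrowedError that borrows from a local String would be rejected by into_boxed, because the borrow isn't 'static. A string literal works, since literals are &'static str.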
Errors in the context of Result
Surprisingly, the type wrapped by std::result::Result::Err doesn't need an Error bound:
You can use whatever type you want to represent an error inside Result.
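For example, this hypothetical parse_percentage compiles happily even though String doesn't implement Error:

```rust
/// Compiles fine: Result<T, E> places no bound on E, and String does not
/// implement std::error::Error.
pub fn parse_percentage(input: &str) -> Result<u8, String> {
    let value = input
        .trim()
        .parse::<u8>()
        .map_err(|_| format!("{input:?} is not a number between 0 and 255"))?;
    if value > 100 {
        return Err(format!("{value} is greater than 100"));
    }
    Ok(value)
}
```

It works, but because String isn't an Error, it can't be returned from another error's source method or passed to code that expects E: Error.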
The same is true for associated types in many trait definitions, such as std::str::FromStr:
Err isn't bounded by Error (2)!
Although you can use any types in these contexts, I strongly encourage you to only use Error implementations.
Other Rust developers will expect these things to behave like Errors, and we should strive to be as unsurprising as possible. That doesn't stop you from implementing additional functionality on your Errors, though.
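Here's a sketch of the recommended route, using a hypothetical Toggle type. FromStr would accept any Err type, but pointing it at a real Error implementation gives callers a type they can match on, display, box and chain like any other error.

```rust
use std::error::Error;
use std::fmt;
use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Toggle {
    On,
    Off,
}

/// Hypothetical parse error; FromStr doesn't require it to implement Error.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ParseToggleError {
    input: String,
}

impl fmt::Display for ParseToggleError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "expected \"on\" or \"off\", got {:?}", self.input)
    }
}

impl Error for ParseToggleError {}

impl FromStr for Toggle {
    type Err = ParseToggleError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "on" => Ok(Self::On),
            "off" => Ok(Self::Off),
            other => Err(ParseToggleError { input: other.to_owned() }),
        }
    }
}

/// Because ParseToggleError implements Error, `?` can box it.
pub fn parse_both(a: &str, b: &str) -> Result<(Toggle, Toggle), Box<dyn Error>> {
    Ok((a.parse::<Toggle>()?, b.parse::<Toggle>()?))
}
```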
There are exceptions to this rule, often within the standard library itself. Look out for the discussion of Error::downcast and Box<dyn Error> in the next section.
Okay, we've nailed the essentials. Let's get into the choice that confuses most new Rust developers: should we use dynamic or statically typed errors?
🦀🦀🦀
Part II
Dynamic error handling in Rust
When to use Box<dyn Error> and friends
Box<dyn Error> is Rust's vaguest error type. It's just some object that implements Error 🤷.
Box<dyn Error + Send + Sync + 'static> is its thread-safe counterpart.
The Error is boxed because, as a dynamic trait object, we don't know its size at compile time. We have to allocate it on the heap.
Box<dyn Error> simply says "something went wrong, check my message or my optional cause to know more".
This has two key properties:
- It's excellent for quickly communicating that something went wrong.
- It's god-awful at providing structured data for an error handler to act on.
If you would like consumers of your error – whether they're error handlers in your own application or users of your library – to be able to dynamically change their program's behavior based on the internal details of an error, don't use Box<dyn Error>.
Parsing error details from messages is fragile and hard to maintain. If you expect people to rely on your error messages to drive program behavior, you've also inadvertently made those error messages part of your public API. If that error message changes, code that parses it may break.
If you know that there's nothing useful a receiving program can do with the error, but that the message is helpful for a human debugger, then Box<dyn Error> and related trait objects are very convenient.
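A minimal sketch of that convenience, using a hypothetical parse_port helper: ? boxes the ParseIntError, and .into() boxes a plain message.

```rust
use std::error::Error;

/// Hypothetical helper whose callers can only report the problem.
pub fn parse_port(input: &str) -> Result<u16, Box<dyn Error>> {
    // `?` converts the ParseIntError into Box<dyn Error> automatically.
    let port = input.trim().parse::<u16>()?;
    if port == 0 {
        // A plain message converts into Box<dyn Error> too.
        return Err("port 0 is reserved".into());
    }
    Ok(port)
}
```

From the caller's side, both failures look the same: a Box<dyn Error> whose message is all they get. Box<dyn Error> is also a fat pointer (a data pointer plus a vtable pointer), so it occupies two machine words.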
I work on an astrodynamics library for a space mission simulator funded by the European Space Agency. If someone inputs garbage data, like the time 23:59:60 in a year without leap seconds, there's really no way to recover. In this scenario, it would be perfectly reasonable to return Box<dyn Error> with a message that explains how silly they are.
Now, we don't actually do this – that's a story for Part III on structured errors – but it is a valid Rust error handling strategy.
Handling dynamic errors from other people's code
What if library code you call returns a dynamic error?
Hopefully, you just want to log it for a future debugging session. Surely the thoughtfully crafted error message will give you everything you need to solve the problem 🤡.
But say it doesn't, and you need to find out what's inside the dyn Error?
I don't envy you this situation. It's often an indicator of bad library design.
Moaning about it won't help you in the moment, though. You need to downcast.
Downcasting errors in Rust
Did you know that you can get a concrete error type back out of a boxed dyn Error?
I'm not going to get into how the std::error module does this, because it involves some scary unsafe code that has nothing to do with handling errors. That won't stop us from using it.
dyn Error trait objects have three methods for attempting a transformation into some concrete type T: downcast_ref, downcast_mut and, on a Box<dyn Error>, downcast.
If the dyn Error is of type T, you'll get a T for closer inspection. Whether that T is owned or borrowed depends on which method you call.
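Here's a sketch of all three, using a hypothetical parse_count that hides a ParseIntError behind Box<dyn Error>. downcast_ref and downcast_mut borrow; downcast consumes the box and hands it back unchanged if the type doesn't match.

```rust
use std::error::Error;
use std::num::{IntErrorKind, ParseIntError};

/// Hypothetical library function that hides its error type.
pub fn parse_count(input: &str) -> Result<u32, Box<dyn Error>> {
    Ok(input.trim().parse::<u32>()?)
}

/// Borrowing downcast: inspect the concrete type without taking ownership.
pub fn is_empty_input(err: &(dyn Error + 'static)) -> bool {
    match err.downcast_ref::<ParseIntError>() {
        Some(parse_err) => *parse_err.kind() == IntErrorKind::Empty,
        None => false,
    }
}

/// Owned downcast: consumes the box. On a type mismatch, the original
/// box comes back in the Err variant.
pub fn into_parse_error(err: Box<dyn Error>) -> Result<ParseIntError, Box<dyn Error>> {
    err.downcast::<ParseIntError>().map(|boxed| *boxed)
}
```

downcast_mut works like downcast_ref, but needs mutable access to the error and returns Option<&mut T>.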
All of this is useless if the underlying type is private to the crate the dyn Error came from. In this scenario, politely explain your predicament to the maintainers, then scream into a pillow.
Avoid forcing callers to downcast
I don't encourage designing your errors to require downcasting to figure out what's gone wrong.
If you choose to return a dynamic error, you are communicating that the internal structure of the error shouldn't matter to callers.
Forcing them to dig into your crate's error types, identify the possible culprits, downcast, and react dynamically screams "leaky implementation details".
This is Rust, not Go.
So what's the point of downcasting?
If downcasting isn't an ideal way to handle errors, what is it good for? Let's use Actix Web 4.7.0 as an example.
The primary Actix error struct, Error, has a single field, cause, that holds a Box<dyn ResponseError>.
Source: actix-web, src/error/error.rs
ResponseError is a trait with identical bounds to std::error::Error, but specifies methods to return a status code and an HTTP response body:
Source: actix-web, src/error/response_error.rs
It has default implementations for both of these methods, but they're not important here.
What is important is the large number of concrete error types that Actix provides ResponseError implementations for: Box<dyn std::, Infallible, serde_json::Error, std::io::Error, and many more.
Naturally, Actix users can implement ResponseError for their own types too, so actix_web::error::Error chooses a dynamic error type to wrap a theoretically infinite variety of ResponseErrors.
Actix itself doesn't care about the internal structure of any particular ResponseError. It just needs a way to get a status code and response body when something goes wrong. This is a scenario where dynamic errors shine.
But you know who might care? The team whose code produced the error.
If an Actix user converts an error into Actix's opaque error format, they should reasonably expect to be able to get it out again. That's why actix_web::error::Error provides the as_error method, which downcasts to the user's original error type.
Source: actix-web, src/error/error.rs
There are no leaky abstractions here, because the caller of as_error also owns the code that created the error in the first place.
Actix never calls downcast_ref itself. It doesn't use downcast_ref to handle errors. Rather, it provides as_error as a means for external parties using Actix's wrapper type to inspect their own implementation details.
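A std-only sketch of that pattern follows. FrameworkError, status and UserNotFound are hypothetical; only the as_error name echoes Actix, and this isn't how Actix implements it.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical framework error: it only needs a message and a status code,
/// so it stores the user's error dynamically.
#[derive(Debug)]
pub struct FrameworkError {
    status: u16,
    cause: Box<dyn Error + Send + Sync + 'static>,
}

impl FrameworkError {
    pub fn new<E>(status: u16, cause: E) -> Self
    where
        E: Error + Send + Sync + 'static,
    {
        Self { status, cause: Box::new(cause) }
    }

    /// The only detail the framework itself looks at.
    pub fn status(&self) -> u16 {
        self.status
    }

    /// Lets the code that created the error get its own type back.
    pub fn as_error<T: Error + 'static>(&self) -> Option<&T> {
        self.cause.downcast_ref::<T>()
    }
}

impl fmt::Display for FrameworkError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} ({})", self.cause, self.status)
    }
}

impl Error for FrameworkError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        let cause: &(dyn Error + 'static) = &*self.cause;
        Some(cause)
    }
}

/// A user's own error type, created in a request handler.
#[derive(Debug, PartialEq, Eq)]
pub struct UserNotFound {
    pub id: u64,
}

impl fmt::Display for UserNotFound {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "user {} not found", self.id)
    }
}

impl Error for UserNotFound {}
```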
Handling Rust errors with anyhow
What discussion of dynamic error handling in Rust would be complete without talking about anyhow?
anyhow is Rust's most-loved crate for handling errors in the laziest way possible.
anyhow::Error is effectively a Box<dyn Error + Send + Sync + 'static> with bells on. It can capture a backtrace (once backtraces are enabled via the RUST_BACKTRACE or RUST_LIB_BACKTRACE environment variables), and, unlike Box, it takes up only one machine word, not two (a "narrow pointer").
anyhow comes with a selection of macros, methods and blanket implementations to make wrapping and adding context to any Display + Send + Sync + 'static type a breeze.
Just like actix_web::error::Error, anyhow::Error is a wrapper for user-provided types. Seeing as those users might want their types back, it provides downcast methods in your three favorite flavors: owned, & and &mut.
I use anyhow often, and I find it's a better fit for applications than libraries.
If you return a concrete anyhow::Error across a crate boundary, you force the caller to depend directly on anyhow, and not everyone will want to.
As a general rule, return only your own or standard library error types across crate boundaries to minimize leakage of your implementation details into other people's code.
Who is your audience and what will they do with your error?
I hope it's becoming clear that how you choose to handle your errors depends on two key things:
- Who the audience for the error is.
- What they should be able to do with an error you give them.
Dynamic errors are great for consolidating a wide range of error types and returning them in a format where the only reasonable thing to do is write to output, whether that's a logger or an HTTP connection.
In Part III, we'll look at structured, statically typed errors as carriers of data that we can handle programmatically. More than that, though, we'll see how they serve as invaluable, innate documentation for other developers.
When we understand both of these error handling styles, we'll bring them together, equipping ourselves with the knowledge to handle any kind of error that might arise, and avoid some nasty footguns.
🦀🦀🦀
Part III
Structured error handling in Rust
Sane APIs support programmatic error handling
Knock knock. It's Hyrum's Law.
In other words, someone, somewhere will end up depending on your error messages. You might not say these messages are part of your public API, but the public has access to them, and if they've got no better way to handle your errors, they're going to if-else your strings.
Changing an error message in the popular library you maintain is going to fuck someone up – and they will end up at your door. Knock knock.
If you're thinking that this is a low-impact edge case, consider that error strings from deep within the Go standard library are depended on by programs of real consequence.
Here's a sample from Go's http package:
Source: go, src/net/http/request.go
I didn't write the comment at (3). One of the Go team did. Good thing, too, because here's Grafana depending on it.
I'm calling out Go because it was famously unergonomic to discern whether a specific type of error was present in a long chain of errors. Things improved in Go 1.13, but Hyrum's Law had already had its way with the Go codebase.
In fact, MaxBytesError was only added to Go's public API in 2022, replacing the anonymous error that forced Grafana and others to depend on the error string. The message it outputs can't change without breaking their code.
Shouldn't they have known better than to depend on an undocumented implementation detail? Are they software engineers or kindergarteners?
Kids need structure, and Go didn't give them any. There was no stable way to identify this error.
This is precisely why you should avoid forcing callers to downcast your Rust errors. Whenever there's the slightest possibility that someone might want to react to your error programmatically, a dynamic error type won't do.
Luckily, Rust makes it simple to build strong, beautiful errors into our API contracts.
Build expressive Rust errors with enums
Consider a simple, Gregorian Date type:
When deciding what type of error to return (4), start by listing all the ways someone might lose their mind when calling your function. In our case:
- The month may be outside the range 1..=12.
- The day may be zero, or greater than the number of days in the given month.
- The caller requests February 29th on a non-leap year.
Expressing the constructor return type as Result<Self, Box<dyn Error>> is convenient – just box a string explaining the problem. Convenient, that is, until Hyrum wants his pound of flesh. We can't change these strings because we've forced people to depend on them.
In Rust, our weapon of choice is the enum:
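Here's a sketch of how Date and DateError might look. The field layout, the variant names and the date_or_feb_28 caller are illustrative assumptions:

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Date {
    year: i32,
    month: u8,
    day: u8,
}

/// Every way Date::new can fail. Variant names are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DateError {
    /// The month was outside 1..=12.
    InvalidMonth(u8),
    /// The day was zero or past the end of the month.
    InvalidDay { month: u8, day: u8 },
    /// February 29th was requested in a non-leap year.
    NotALeapYear(i32),
}

impl fmt::Display for DateError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::InvalidMonth(m) => write!(f, "month {m} is out of range 1..=12"),
            Self::InvalidDay { month, day } => write!(f, "day {day} does not exist in month {month}"),
            Self::NotALeapYear(y) => write!(f, "{y} is not a leap year, so it has no February 29th"),
        }
    }
}

impl Error for DateError {}

/// Gregorian leap-year rule.
pub fn is_leap_year(year: i32) -> bool {
    (year % 4 == 0 && year % 100 != 0) || year % 400 == 0
}

impl Date {
    pub fn new(year: i32, month: u8, day: u8) -> Result<Self, DateError> {
        if !(1..=12).contains(&month) {
            return Err(DateError::InvalidMonth(month));
        }
        // Checked before the generic day check so callers get the specific variant.
        if month == 2 && day == 29 && !is_leap_year(year) {
            return Err(DateError::NotALeapYear(year));
        }
        let days_in_month = match month {
            2 if is_leap_year(year) => 29,
            2 => 28,
            4 | 6 | 9 | 11 => 30,
            _ => 31,
        };
        if day == 0 || day > days_in_month {
            return Err(DateError::InvalidDay { month, day });
        }
        Ok(Self { year, month, day })
    }
}

/// A hypothetical caller that reacts to one variant programmatically.
pub fn date_or_feb_28(year: i32, month: u8, day: u8) -> Result<Date, DateError> {
    match Date::new(year, month, day) {
        Err(DateError::NotALeapYear(_)) => Date::new(year, 2, 28),
        other => other,
    }
}
```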
DateError gives us two massive benefits:
- The entire universe of errors that the caller needs to handle is obvious from the function signature. There's no need to dig through the Date constructor call chain to figure out what errors it might return. This is a key shortcoming with dynamic errors or, God forbid, exceptions in other languages.
- It encodes our list of problem states in a way that callers can respond to programmatically. They match each variant of interest to act on the cause, supported by structured data describing the invalid fields.
DateError's variants are a documented part of our public API. Adding or removing variants or their fields is still a breaking change, but, unlike string error messages, these changes are governed by an explicit contract between us and our users.
If your users still choose to depend on your messages rather than your enum variants, that's very much a them problem, not a you problem, which is the best kind of problem.
Composing structured error types
So far, so simple. But in real-life code, fallible functions call other fallible functions, and each failure may be represented by a different error type. We need to compose these errors into a single return type.
Let's extend our budding time library with a new struct and a corresponding error:
The UtcTimestampError variants for hour-, minute- and second-related errors are obvious. However, the International Earth Rotation and Reference Systems Service (IERS – they hold the best parties) occasionally adds leap seconds to keep UTC in sync with the rotation of the Earth.
This is why – and I say this as an author of an astronomical time library – UTC is the Devil's Timescale.
Leap seconds always occur at 23:59:60. If we have a second field of 60, and hour and minute fields that aren't 23 and 59, respectively, someone's messed up. We capture this with UtcTimestampError::.
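A sketch of UtcTimestamp and its error, under the same caveat: the variant names, including MisplacedLeapSecond for the 23:59 rule, are hypothetical.

```rust
use std::error::Error;
use std::fmt;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct UtcTimestamp {
    hour: u8,
    minute: u8,
    second: u8,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum UtcTimestampError {
    /// The hour was greater than 23.
    InvalidHour(u8),
    /// The minute was greater than 59.
    InvalidMinute(u8),
    /// The second was greater than 60.
    InvalidSecond(u8),
    /// second == 60 outside 23:59 (hypothetical variant name).
    MisplacedLeapSecond { hour: u8, minute: u8 },
}

impl fmt::Display for UtcTimestampError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::InvalidHour(h) => write!(f, "hour {h} is out of range 0..=23"),
            Self::InvalidMinute(m) => write!(f, "minute {m} is out of range 0..=59"),
            Self::InvalidSecond(s) => write!(f, "second {s} is out of range 0..=60"),
            Self::MisplacedLeapSecond { hour, minute } => {
                write!(f, "leap seconds occur at 23:59:60, not {hour:02}:{minute:02}:60")
            }
        }
    }
}

impl Error for UtcTimestampError {}

impl UtcTimestamp {
    pub fn new(hour: u8, minute: u8, second: u8) -> Result<Self, UtcTimestampError> {
        if hour > 23 {
            return Err(UtcTimestampError::InvalidHour(hour));
        }
        if minute > 59 {
            return Err(UtcTimestampError::InvalidMinute(minute));
        }
        if second > 60 {
            return Err(UtcTimestampError::InvalidSecond(second));
        }
        // A second of 60 is only legal in the last minute of the day.
        if second == 60 && (hour, minute) != (23, 59) {
            return Err(UtcTimestampError::MisplacedLeapSecond { hour, minute });
        }
        Ok(Self { hour, minute, second })
    }
}
```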
Now, leap seconds don't happen every year, praise be to IERS. And in practice they only occur at the end of June or December. So when we define a UtcDateTime, we need to account for three things:
- DateErrors.
- UtcTimestampErrors.
- Leap seconds with valid timestamps, but which fall in a year or month in which there was no leap second.
How do we compose three errors that occur in the course of a single function call? That's right – with another enum.
DateError and UtcTimestampError are thinly wrapped in UtcDateTime-specific equivalents. Their messages carry a little more context for human readers.
Having access to both a date and a time, the UtcDateTime constructor can also validate whether a leap second timestamp falls on a leap second date. UtcDateTimeError:: is a new variant specific to the compound struct.
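A trimmed-down sketch of that composition: DateError and UtcTimestampError are cut to one variant each, the InvalidLeapSecond fields are assumed, and validate is a hypothetical stand-in for the constructor. The hand-written Display, source and From impls are the boilerplate that derive macros like thiserror remove.

```rust
use std::error::Error;
use std::fmt;

// Stand-ins for the component errors (assumed, one variant each).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum DateError {
    InvalidMonth(u8),
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum UtcTimestampError {
    InvalidHour(u8),
}

impl fmt::Display for DateError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let Self::InvalidMonth(m) = self;
        write!(f, "month {m} is out of range 1..=12")
    }
}
impl Error for DateError {}

impl fmt::Display for UtcTimestampError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let Self::InvalidHour(h) = self;
        write!(f, "hour {h} is out of range 0..=23")
    }
}
impl Error for UtcTimestampError {}

/// The compound error: wraps both component errors and adds its own case.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum UtcDateTimeError {
    Date(DateError),
    Time(UtcTimestampError),
    InvalidLeapSecond { year: i32, month: u8 },
}

impl fmt::Display for UtcDateTimeError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Date(e) => write!(f, "invalid date: {e}"),
            Self::Time(e) => write!(f, "invalid time: {e}"),
            Self::InvalidLeapSecond { year, month } => {
                write!(f, "no leap second was scheduled for {year}-{month:02}")
            }
        }
    }
}

impl Error for UtcDateTimeError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            Self::Date(e) => Some(e),
            Self::Time(e) => Some(e),
            Self::InvalidLeapSecond { .. } => None,
        }
    }
}

// These From impls let `?` convert component errors automatically.
impl From<DateError> for UtcDateTimeError {
    fn from(e: DateError) -> Self {
        Self::Date(e)
    }
}
impl From<UtcTimestampError> for UtcDateTimeError {
    fn from(e: UtcTimestampError) -> Self {
        Self::Time(e)
    }
}

fn check_month(month: u8) -> Result<u8, DateError> {
    if (1..=12).contains(&month) { Ok(month) } else { Err(DateError::InvalidMonth(month)) }
}

fn check_hour(hour: u8) -> Result<u8, UtcTimestampError> {
    if hour <= 23 { Ok(hour) } else { Err(UtcTimestampError::InvalidHour(hour)) }
}

/// Hypothetical validator. Simplification: any leap second outside June or
/// December is rejected; a real library would consult the IERS announcements.
pub fn validate(year: i32, month: u8, hour: u8, second: u8) -> Result<(), UtcDateTimeError> {
    let month = check_month(month)?; // DateError -> UtcDateTimeError::Date
    check_hour(hour)?; // UtcTimestampError -> UtcDateTimeError::Time
    if second == 60 && month != 6 && month != 12 {
        return Err(UtcDateTimeError::InvalidLeapSecond { year, month });
    }
    Ok(())
}
```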
Ok, next question: what error type should this alternative UtcDateTime constructor return?
With the onus on the caller to construct valid Dates and UtcTimestamps and handle their errors, the constructor's error space shrinks to just InvalidLeapSecond, which could plausibly become its own struct error type.
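A sketch of that narrower error, with hypothetical names (from_parts, InvalidLeapSecondError) and stand-in Date and UtcTimestamp structs that are assumed to be valid already:

```rust
use std::error::Error;
use std::fmt;

// Stand-ins for already-validated parts (assumed; real versions would come
// from their own fallible constructors).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Date {
    pub year: i32,
    pub month: u8,
    pub day: u8,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct UtcTimestamp {
    pub hour: u8,
    pub minute: u8,
    pub second: u8,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct UtcDateTime {
    pub date: Date,
    pub time: UtcTimestamp,
}

/// The only thing that can go wrong when combining valid parts.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct InvalidLeapSecondError {
    pub date: Date,
}

impl fmt::Display for InvalidLeapSecondError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let Date { year, month, day } = self.date;
        write!(f, "no leap second can occur on {year}-{month:02}-{day:02}")
    }
}

impl Error for InvalidLeapSecondError {}

impl UtcDateTime {
    /// Hypothetical constructor that takes pre-validated parts.
    pub fn from_parts(date: Date, time: UtcTimestamp) -> Result<Self, InvalidLeapSecondError> {
        // Simplified: leap seconds are only allowed on the last day of June or
        // December. A real implementation would also check the IERS table.
        let last_day_of_june_or_december =
            (date.month == 6 && date.day == 30) || (date.month == 12 && date.day == 31);
        if time.second == 60 && !last_day_of_june_or_december {
            return Err(InvalidLeapSecondError { date });
        }
        Ok(Self { date, time })
    }
}
```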
What's the proper way to support both constructors? This?
Maybe.
This approach succeeds in giving the caller only the most relevant information about the issue, at a cost to you, the developer. All this nesting creates a lot of code. We want to avoid module-scale umbrella errors, but while a bespoke error per domain type is one thing, you may think that a bespoke error per function is excessive.
Ultimately, you decide whether it's reasonable for your users to handle unrelated error variants. Trust me, they'll let you know if not. Stick to our rule of thumb and you'll be fine:
How to improve the ergonomics of your Rust errors
Manually implementing errors is boilerplatey. In this section, we'll remove that barrier to implementing robust error types for every occasion.
thiserror
A titan among error handling crates, thiserror dramatically simplifies the process of defining and constructing situational error types.
It's the order to anyhow's dynamic chaos. Perfectly balanced, as all things should be.
Let's reimplement UtcDateTimeError with thiserror:
First off, the manual Display implementation is gone, replaced by annotations. thiserror::Error is a derive macro that handles the legwork for us (5).
At (6), we take advantage of the transparent annotation to make thiserror forward the error message from the wrapped DateError. This is useful when the wrapping enum doesn't have any additional context that could clarify the problem for users.
Next, we generate an implementation of From<DateError> for UtcDateTimeError, which builds the UtcDateTimeError::Date variant, and From<UtcTimestampError>, which builds UtcDateTimeError::Time (7). This makes constructing the UtcDateTimeError wrapper from its causes trivial.
Best of all, Results containing either DateError or UtcTimestampError will be transparently morphed into Result<T, UtcDateTimeError> when returned with the try operator, ?:
Unlike DateError and UtcTimestampError, UtcDateTimeError:: has no Display implementation of its own, so the final step is to generate one at (8), interpolating the wrapped Date.
Structured error handling examples from the Rust ecosystem
Don't take my word for it. Here are prime examples of structured errors from two popular Rust crates.
tracing
tracing is the number-one framework for instrumenting your Rust applications. Collecting the events you emit requires a collector – some implementation of tracing_. As the name suggests, there can be only one global default collector. What happens if you try to set it twice?
Source: tracing, tracing-core/src/dispatch.rs
Since there's only one way setting the global default can fail – when it's already been set – this is neatly represented by an empty struct: SetGlobalDefaultError (9).
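The same shape works for any set-once global. Here's a std-only sketch with hypothetical names (GLOBAL_NAME, set_global_name, SetGlobalNameError); it isn't tracing's code, and the private field that keeps other crates from constructing the error is my own design choice. OnceLock needs Rust 1.70 or later.

```rust
use std::error::Error;
use std::fmt;
use std::sync::OnceLock;

/// Hypothetical global slot that can be set exactly once.
static GLOBAL_NAME: OnceLock<String> = OnceLock::new();

/// There's only one way to fail, so the error carries no data.
/// The private field stops other crates from constructing it.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SetGlobalNameError {
    _private: (),
}

impl fmt::Display for SetGlobalNameError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("a global name has already been set")
    }
}

impl Error for SetGlobalNameError {}

pub fn set_global_name(name: impl Into<String>) -> Result<(), SetGlobalNameError> {
    GLOBAL_NAME
        .set(name.into())
        // OnceLock::set hands the rejected value back; we only report the failure.
        .map_err(|_rejected| SetGlobalNameError { _private: () })
}

pub fn global_name() -> Option<&'static str> {
    GLOBAL_NAME.get().map(String::as_str)
}
```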
wgpu
Here's an all-singing, all-dancing example from wgpu, a cross-platform graphics API based on the WebGPU standard. Creating compute shader pipelines is fraught with danger:
Source: wgpu, wgpu-core/src/pipeline.rs
CreateComputePipelineError showcases a thiserror-derived enum error. It includes variants composed from granular, low-level errors (11), and new errors exclusive to the creation of the pipeline (12).
If you'd like to see more examples from wgpu, which adopts the maximalist approach of having distinct error types for each operation, wgpu_ contains several error definitions, including one 27-variant monster!
std::io::Error, Rust's most challenging error type
std::io::Error isn't the prettiest part of the Rust standard library. It's trying to solve a very hard problem – to represent any possible IO error, on all supported operating systems, with the smallest possible overhead. In doing so, it ends up being too low-level for some use cases, and too high-level for others.
We'll scavenge what looks tasty, and leave the bits that look off. Like vultures.
Here's the implementation (for clarity, I've left out the bit-packing optimization used on 64-bit systems):
Source: rust, library/std/src/io/error.rs and repr_unpacked.rs
Aha! Four error representations wearing a trench coat! And they would have gotten away with it if it wasn't for us meddling crabs.
ErrorData specifies four broad forms of error (13):
- Os wraps error codes returned by the operating system. RawOsError is an i32 alias on most platforms.
- SimpleMessage is, simply, an error message.
- Simple wraps an ErrorKind – another enum, which we'll discuss imminently.
- Custom is a catch-all variant for anything that isn't covered by the other three. Specifically, std::io::Error uses an ErrorData<Box<Custom>>, meaning ErrorData::Custom holds a Box<Custom>. Custom itself combines an ErrorKind and a boxed, dynamic error. Capeesh?
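ErrorData itself is private, so you can't match on it. The public accessors expose it indirectly, though. This sketch (describe and ChecksumMismatch are hypothetical) shows which accessor answers for each representation. SimpleMessage isn't constructible through stable public APIs, so it's left out.

```rust
use std::error::Error;
use std::fmt;
use std::io::{self, ErrorKind};

/// A hypothetical custom error to wrap inside io::Error.
#[derive(Debug)]
pub struct ChecksumMismatch;

impl fmt::Display for ChecksumMismatch {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("checksum mismatch")
    }
}

impl Error for ChecksumMismatch {}

/// Reports which public accessor has something to say about an io::Error.
pub fn describe(err: &io::Error) -> String {
    match (err.raw_os_error(), err.get_ref()) {
        // Os: the raw code from the operating system.
        (Some(code), _) => format!("OS error code {code}"),
        // Custom: an ErrorKind plus a boxed, dynamic error.
        (None, Some(inner)) => format!("{:?} wrapping a custom error: {inner}", err.kind()),
        // Simple: just an ErrorKind.
        (None, None) => format!("{:?} with no payload", err.kind()),
    }
}
```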
I won't reproduce ErrorKind in full – it has more variants than Covid. Here's a sample of the many, many ways IO goes wrong:
Source: rust, library/std/src/io/error.rs
ErrorKind is a smash-up of network failures (14), filesystem errors (15) and OS process complaints (16). There are write-only error cases, like ReadOnlyFilesystem, in an enum that's shared by read operations. This is not the tight error definition we're used to.
Down in the basement of your program, std::io doesn't know what sort of operation you're attempting. It shovels bytes into the OS via the Write trait, and gets bytes out via the Read trait. std::io::Error is baked into their definitions.
What are the consequences? Since Read and Write depend on std::io::Error, these traits must live in std, not core. std::io::Error presumes the presence of an operating system. But if you're running no_std, there's a chance you are the operating system! no_std programs have to reinvent these traits without this dependency.
There's strangeness for std programs too. Read and Write are the basis for higher-level readers and writers. If you design an HTTP connection, a database connection, a packet library, a logger, or anything else with sophisticated IO, odds are that you'll define specialized readers and writers based on lower-level implementations of Read and Write.
Since specialist implementations must return std::io::Error to satisfy the IO trait signatures, the Rust devs had to give std::io::Error a way to represent errors that std::io doesn't know about.
That's what Custom is for. It's built from any ErrorKind variant – probably Other – and a Box<dyn Error + Send + Sync>. In other words, custom readers and writers are forced to represent their custom errors dynamically. In this mirror world, the more specialized the use case, the more vague std::io::Error becomes.
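A sketch of that trade-off, with a hypothetical FrameReader that rejects reserved bytes: its own BadFrame error has to be boxed inside io::Error to fit Read's signature, and getting it back means going through get_ref and a downcast.

```rust
use std::error::Error;
use std::fmt;
use std::io::{self, ErrorKind, Read};

/// Hypothetical: bytes from here up are reserved by our made-up protocol.
const RESERVED_START: u8 = 0xF8;

/// Hypothetical protocol error that std::io knows nothing about.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct BadFrame {
    pub byte: u8,
}

impl fmt::Display for BadFrame {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "unexpected frame byte 0x{:02x}", self.byte)
    }
}

impl Error for BadFrame {}

/// Hypothetical reader that wraps another reader and validates its bytes.
pub struct FrameReader<R> {
    inner: R,
}

impl<R: Read> FrameReader<R> {
    pub fn new(inner: R) -> Self {
        Self { inner }
    }
}

impl<R: Read> Read for FrameReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.inner.read(buf)?;
        if let Some(&byte) = buf[..n].iter().find(|&&b| b >= RESERVED_START) {
            // Read's signature forces io::Error, so BadFrame gets boxed inside it.
            return Err(io::Error::new(ErrorKind::InvalidData, BadFrame { byte }));
        }
        Ok(n)
    }
}

/// Recovering the concrete error means downcasting through get_ref.
pub fn bad_frame(err: &io::Error) -> Option<&BadFrame> {
    err.get_ref()?.downcast_ref::<BadFrame>()
}
```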
What's up with Other and Uncategorized?
Ever get that creeping feeling – late at night, long after the world has gone to sleep – of something lurking just beyond the corner of your eye? That's Hyrum.
He comes for all of us, just like he came for std::io.
That link directs to a Rust language tracking issue, in which a number of Rust Nightly users complain of failing tests following the addition of several new ErrorKind variants. But ErrorKind is non-exhaustive, so how did this happen?
Hyrum's Law.
Other (17) was formerly a catch-all variant not just for Rust users, but for the Rust standard library itself. For example, there is no ErrorKind representing a failure to write to stdout. Instead, a message describing the problem was bundled into Other.
Did the ErrorKind documentation explicitly warn users that this was not a stable contract, and that these "other" errors may be replaced as time went on? Yes, it did.
Did Rust users depend on this anyway? Naturally.
When these vague errors became bespoke ErrorKind variants, code that expected to find them in Other stopped working.
Enter Uncategorized. Reason can't stop developers from depending on implicit behavior, but rustc can.
Uncategorized is the new home for errors the Rust team hasn't figured out what to do with. The standard library no longer assigns errors to Other. Since Uncategorized is marked as unstable (18), you can't match it without enabling an unstable feature yourself – you know what you're getting yourself in for.
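In practice, that means matching on ErrorKind with a wildcard arm and never treating Other as a stable bucket for standard library errors. A minimal sketch, assuming a hypothetical retry policy:

```rust
use std::io::{self, ErrorKind};

/// Hypothetical retry policy keyed on ErrorKind rather than on messages.
pub fn should_retry(err: &io::Error) -> bool {
    match err.kind() {
        ErrorKind::Interrupted | ErrorKind::WouldBlock | ErrorKind::TimedOut => true,
        // ErrorKind is #[non_exhaustive], so a wildcard arm is mandatory outside std.
        // Errors the standard library once reported as Other may move to new,
        // more specific variants, so don't special-case Other here.
        _ => false,
    }
}
```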
That's std::io::Error. Pros: enum-based variants for every error kind Rust knows about. A valiant, workable solution to an unforgiving problem. Cons: everything else.
When designing your own error types, consider these pitfalls carefully, and plan your escape route.
Now that you're equipped with the strengths and weaknesses of both dynamic and structured errors in Rust, it should be clear that you're not faced with a binary choice to adopt one or the other.
This isn't Highlander. anyhow and thiserror serve different purposes and may happily coexist within the same codebase.
Choose how to represent each error on a case-by-case basis, guided by what you expect users to do with your error.
And keep an eye out for Hyrum.
He hunts at night.
🦀🦀🦀
Part IV
Error handling in exceptional circumstances: panic!, no_std and FFIs
Let me cook.