Improve type safety, clarify business logic and improve test coverage with Rust's newtype wrappers.
Contents
- What are newtype wrappers in Rust?
- Why newtype-driven design will change your life
- Newtype essentials
- The most important newtype trait implementations
- Write ergonomic newtype constructors with `From` and `TryFrom`
- From newtypes to primitives: `AsRef`, `Deref`, `Borrow` and more
- Bypassing newtype validations
- How to eliminate newtype boilerplate
What are newtype wrappers in Rust?
You've read The Book, so I'm sure you've heard of newtypes – thin wrappers around other Rust types that allow you to extend their functionality.
But if reading The Book is your only exposure to newtypes, you might think that they're only useful for getting around the Orphan Rule. Think again.
By wrapping `Vec<String>` in a tuple struct, The Book shows us how to implement a trait defined in a crate we don't control on a struct which is also outside our control.
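For reference, The Book's example looks roughly like this. `Wrapper` is a local type, so implementing the foreign `Display` trait on it is allowed even though `Vec<String>` lives in the standard library.

```rust
use std::fmt;

// A local tuple struct wrapping a type we don't own.
struct Wrapper(Vec<String>);

// `Display` is foreign, but `Wrapper` is ours, so the Orphan Rule is satisfied.
impl fmt::Display for Wrapper {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        write!(f, "[{}]", self.0.join(", "))
    }
}
```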
This is an essential Rust skill, to be sure. But it pales in comparison to the true power of newtypes as the building blocks of type-safe, maintainable, well tested applications.
Would you like to prevent your codebase from becoming a hot mess? Is your codebase already a hot mess? Well then, let's get into it.
Why newtype-driven design will change your life
Newtypes are the raw ingredients of type-driven design in Rust, a practice which makes it almost impossible for invalid data to enter your system.
Right now, somewhere in your organization's codebase, is a function that looks like this – I guarantee it.
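Here's a minimal sketch of the kind of function I mean. The names `User`, `CreateUserError` and `EXISTING_USERS`, and the deliberately naive validation rules, are assumptions made up for illustration. The numbered comments line up with the markers used in the discussion that follows.

```rust
/// Hypothetical domain type; a real one would carry more fields.
#[derive(Debug, PartialEq)]
pub struct User {
    email: String,
    password: String,
}

/// Validation failures and business failures share one error type.
#[derive(Debug, PartialEq)]
pub enum CreateUserError {
    InvalidEmail(String),
    InvalidPassword { reason: String },
    UserAlreadyExists(String),
}

/// Stand-in for a real user store (assumption for this sketch).
const EXISTING_USERS: &[&str] = &["taken@example.com"];

// [1] Two `&str` arguments that are trivially easy to swap.
pub fn create_user(email: &str, password: &str) -> Result<User, CreateUserError> {
    // [2] Checks the business function shouldn't have to care about.
    if !email.contains('@') {
        return Err(CreateUserError::InvalidEmail(email.to_string()));
    }
    if password.len() < 8 {
        return Err(CreateUserError::InvalidPassword {
            reason: "too short".to_string(),
        });
    }

    // [3] The actual business logic.
    if EXISTING_USERS.contains(&email) {
        return Err(CreateUserError::UserAlreadyExists(email.to_string()));
    }
    Ok(User {
        email: email.to_string(),
        password: password.to_string(),
    })
}
```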
Does this make you uneasy? It should.
At [1] we accept two `&str`s: `email` and `password`. The probability that someone, at some point, will screw up the order of these arguments and pass a password as `email` and an email address as `password` is 1.
I've been that person. You've been that person. Everyone on your team is that person. It's a time bomb.
This is problematic when you consider that an email address is generally safe to log (depending on your data protection regime 🙃), whereas logging a plaintext password will bring great shame upon your family – and great fines upon your company when the predictable breach occurs.
Because of this liability, your business-critical function has to concern itself with checking that the `&str`s it's been given are, in fact, an email address and a password [2].
It would much rather be doing important business-logic things [3], but it's cluttered with code making sure its arguments are what they claim to be. What is this, JavaScript?
This uncomfortable cohabitation results in complex error types.
And complex error types mean you need a lot of test cases to cover all practical outcomes.
If this looks reasonable and easy to follow, keep in mind that I'm a freak for naming and documentation. You should be too. But we both have to come to terms with the fact that your standard teammate won't name these functions consistently.
When asked what their test function tests, this teammate might tell you, "just read the code". This individual is dangerous, and should be treated with the same fear and suspicion you reserve for C++.
Functions with this many branching return values are not reasonable.
Imagine if the validations inside `create_user` occurred in parallel, or that the success of the function was dependent on a subset of validations succeeding, but not all of them. Suddenly you'd find yourself testing permutations of failure cases – a scenario that should induce tremors and a cold sweat.
This is how many real production functions behave and, let me tell you, I don't want to be the one testing that code.
Newtyping is the practice of investing extra time upfront to design datatypes that are always valid. In the long run, you prevent human error, maintain fabulously readable code and make unit testing trivial.
I'll show you how.
Newtype essentials
We agree on why this matters. Excellent. Now, let's take our first step in the right direction.
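A minimal sketch of that first step, assuming a hypothetical `User` type and a `create_user` stripped down to its signature. The numbered comments match the markers in the text below.

```rust
// [5] Tuple structs wrapping owned `String`s.
#[derive(Debug)]
pub struct EmailAddress(String);

#[derive(Debug)]
pub struct Password(String);

/// Hypothetical domain type for this sketch.
#[derive(Debug)]
pub struct User {
    email: EmailAddress,
    password: Password,
}

// [6] Distinctly typed arguments: swapping them is now a compile error.
pub fn create_user(email: EmailAddress, password: Password) -> User {
    User { email, password }
}
```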
We can define tuple structs [5] as wrappers around owned `String`s that represent an email address and a password. Just like The Book showed us!
Now, our function requires distinctly typed arguments [6]. It's impossible to pass a `Password` as an argument of type `EmailAddress`, and vice versa.
We've eliminated one source of human error, but believe me, plenty remain. Never forget, if a software engineer can fuck it up, they will fuck it up.
In the depths of a hangover, you might be tempted to unleash this particular evil into your repo:
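The evil in question, sketched under the assumption that it's nothing more than a `pub` on the wrapped field:

```rust
// Public inner fields: anyone can construct or mutate these without validation.
pub struct EmailAddress(pub String);
pub struct Password(pub String);
```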
Don't.
If you make the wrapped type `pub`, there's absolutely nothing to stop you from doing this after a little hair of the dog:
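A sketch of how that goes wrong. `build_credentials` is a hypothetical helper invented for this example; the point is only that it compiles.

```rust
pub struct EmailAddress(pub String);
pub struct Password(pub String);

/// Hypothetical helper that mixes up its inputs.
pub fn build_credentials(raw_email: &str, raw_password: &str) -> (EmailAddress, Password) {
    // Compiles happily: with a public field, any string can become any newtype.
    (
        EmailAddress(raw_password.to_string()),
        Password(raw_email.to_string()),
    )
}
```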
Good work. 👏
A strong test suite will catch this mistake. Hell, a crappy test suite should catch this mistake. But you're on How To Code It – you've sworn off the crappy code. So how can you guarantee that any `EmailAddress` or `Password` that `create_user` gets passed is valid?
I'm glad you asked.
Constructors as the source of truth
Instead of running validations on data that may or may not be valid when it's already inside the core of your application, require your business logic to accept only data that has been parsed into an acceptable representation.
First, let's pull our email address validation code out of the business function it was cluttering. From this point onwards, I'll be giving code only for `EmailAddress` – I've left the implementation of `Password` as an exercise.
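A minimal sketch of the idea. The validation rule is deliberately naive, the `existing` parameter is a hypothetical stand-in for a user store, and the error types are left as plain structs where real code would also implement `std::error::Error`. The numbered comments match the markers in the text below.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct EmailAddress(String);

// [7] The only error the constructor can return.
#[derive(Debug, Clone, PartialEq)]
pub struct EmailAddressError(String);

impl EmailAddress {
    /// The single place where "what is a valid email address?" is decided.
    pub fn new(raw: &str) -> Result<Self, EmailAddressError> {
        // Naive check for illustration: something on both sides of an `@`.
        match raw.split_once('@') {
            Some((local, domain)) if !local.is_empty() && !domain.is_empty() => {
                Ok(Self(raw.to_string()))
            }
            _ => Err(EmailAddressError(raw.to_string())),
        }
    }
}

/// Hypothetical domain type for this sketch.
#[derive(Debug, PartialEq)]
pub struct User {
    email: EmailAddress,
}

// [9] The only error the business function can return, carrying a valid newtype.
#[derive(Debug, PartialEq)]
pub struct UserAlreadyExistsError(EmailAddress);

pub fn create_user(
    email: EmailAddress,
    existing: &[EmailAddress],
) -> Result<User, UserAlreadyExistsError> {
    // [10] No validation here: an `EmailAddress` that exists is already valid.
    if existing.contains(&email) {
        return Err(UserAlreadyExistsError(email));
    }
    Ok(User { email })
}
```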
This is very exciting. In order to get hold of an `EmailAddress`, a raw string must pass the validation performed in the `EmailAddress::new` constructor. That means that any email address passed to `create_user` must be valid, so `create_user` no longer needs to check – it's all business logic, baby! [10]
Voilà. We have drastically simplified the error handling. Both `EmailAddress::new` and `create_user` now return only one type of error each [7] [9]. And notice how, at [9], even our error types contain guaranteed-valid, type-safe fields!
Now, we can write sane unit tests instead of badly disguised integration tests.
Do you see how we're getting extraordinary value from a small shift in mindset? We're using Rust's remarkable type system to do a lot of heavy lifting for us. If an instance of a newtype exists, we know that it's valid.
Newtype mutability
It makes sense for some newtypes to be mutable. Just take care that every mutating method preserves the newtype's invariants:
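A sketch of what such a type could look like. The `new` and `push` methods are assumptions added so the example can be exercised; the numbered comments match the markers in the text below.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct NonEmptyVec<T>(Vec<T>);

impl<T> NonEmptyVec<T> {
    /// Hypothetical constructor: demanding a first element guarantees non-emptiness.
    pub fn new(first: T) -> Self {
        Self(vec![first])
    }

    /// Growing the vector can never break the invariant.
    pub fn push(&mut self, value: T) {
        self.0.push(value);
    }

    /// [11] Refuses to remove the final element.
    pub fn pop(&mut self) -> Option<T> {
        if self.0.len() == 1 {
            None
        } else {
            self.0.pop()
        }
    }

    /// [12] Infallible, because there is always at least one element.
    pub fn last(&self) -> &T {
        self.0
            .last()
            .expect("NonEmptyVec always holds at least one element")
    }
}
```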
`NonEmptyVec` is a wrapper around a `Vec<T>` that must always have at least one element. I've omitted the constructor for brevity.
`NonEmptyVec::pop` takes `&mut self`, which means we need to check that we make only valid mutations. We can't pop the final element from a `NonEmptyVec` [11]!
The flip side of taking these precautions is that other operations become simpler. Unlike `Vec<T>::last`, `NonEmptyVec<T>::last` is infallible, so we don't need to return an `Option<&T>` [12].
The most important newtype trait implementations
So we're agreed that newtypes are fire. 🔥 Let's turn our attention to making them as easy as possible to work with. I'll start simple, and work up to more adventurous code.
`derive` standard traits
You likely want your newtype to behave in a similar way to the underlying type. An `EmailAddress` is identical to a `String` for the purposes of equality, ordering, hashing, cloning and debugging. We can handle this simple case with `derive`:
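In sketch form, the derive list for the traits named above would look something like this:

```rust
// Equality, ordering, hashing, cloning and debugging all delegate to the inner `String`.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct EmailAddress(String);
```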
`String` also implements `Default`, but a "default email address" doesn't make much sense, so we don't derive it. And, since `String` isn't `Copy`, neither is `EmailAddress`.
But what about `Display`? There's no `derive` macro for `Display`, so let's do it manually for now.
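A minimal manual implementation, assuming we simply want the wrapped string printed as-is:

```rust
use std::fmt;

pub struct EmailAddress(String);

impl fmt::Display for EmailAddress {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Print the inner string unchanged.
        write!(f, "{}", self.0)
    }
}
```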
Manually implement special cases
For more complex newtypes, manual implementations of common traits may be required. Here's `Subsecond`, an `f64` wrapper that represents a fraction of a single second in the range `0.0..1.0`.
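A sketch of `Subsecond` with a validating constructor. `SubsecondError` and the exact derive list are assumptions for illustration.

```rust
#[derive(Debug, Copy, Clone, PartialEq, PartialOrd)]
pub struct Subsecond(f64);

/// Hypothetical error type carrying the rejected value.
#[derive(Debug, Copy, Clone, PartialEq)]
pub struct SubsecondError(f64);

impl Subsecond {
    pub fn new(raw: f64) -> Result<Self, SubsecondError> {
        // `contains` is false for NaN, so NaN is rejected along with out-of-range values.
        if (0.0..1.0).contains(&raw) {
            Ok(Self(raw))
        } else {
            Err(SubsecondError(raw))
        }
    }
}
```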
`f64` can implement neither `Eq` nor `Ord`, since `f64::NAN` isn't equal to anything – even itself! `f64` has no "total" equality and ordering that encompasses every possible value an `f64` may be. How sad. 😔
Happily, that's not true of `Subsecond`. There is a total equality and ordering of all `f64`s between `0.0` and `1.0`. This calls for manual trait implementations.
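One way those manual implementations might look. The constructor returns an `Option` here purely to keep the sketch short, and the placement of the numbered comments against the markers in the text is my best guess.

```rust
use std::cmp::Ordering;

// [13] `PartialEq` is derived; `PartialOrd` is not.
#[derive(Debug, Copy, Clone, PartialEq)]
pub struct Subsecond(f64);

impl Subsecond {
    /// Simplified constructor for this sketch: `None` for NaN or out-of-range input.
    pub fn new(raw: f64) -> Option<Self> {
        if (0.0..1.0).contains(&raw) {
            Some(Self(raw))
        } else {
            None
        }
    }
}

// Marker trait: we promise that equality is total for every valid `Subsecond`.
impl Eq for Subsecond {}

impl PartialOrd for Subsecond {
    // [14] Wrap the result of `Ord::cmp` so the two orderings can never disagree.
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl Ord for Subsecond {
    // [15] The constructor rejects NaN, so comparing the inner floats cannot fail.
    fn cmp(&self, other: &Self) -> Ordering {
        self.0
            .partial_cmp(&other.0)
            .expect("Subsecond never holds NaN")
    }
}
```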
Notice that we're now deriving `PartialEq` but not `PartialOrd` [13]. How come?
If an implementation of `Ord` exists, a correct implementation of `PartialOrd::partial_cmp` simply wraps the return value of `Ord::cmp` in an `Option` [14] [15]. This prevents us from accidentally implementing `Ord` and `PartialOrd` in ways that disagree with each other.
A derived implementation of `PartialOrd` wouldn't call our manual `Ord` implementation – it would call `PartialOrd` on the underlying `f64`. This isn't what we want, so we need to define both `PartialOrd` and `Ord` ourselves, or Clippy will yell at us.
The logic is reversed for `Eq` and `PartialEq`. If an implementation of `PartialEq` exists, `Eq` is simply a marker trait that indicates that the type has a total equality. By deriving `PartialEq` and manually adding an `Eq` implementation, we're telling the compiler, "chill out, I know what's good".
Write ergonomic newtype constructors with `From` and `TryFrom`
At some point your newtypes will – sadly – have to interact with Other People's Code. These Other People didn't have your domain-specific type in mind when they wrote their "code". They either have their own set of newtypes, or they pass around `&str` and `f64` like lunatics.
We need to make it easy to convert from their types to our types, using classic Rust patterns that won't surprise other devs. That means `From` and `TryFrom`.
Infallible conversions
Choose `From` when your conversion is infallible. The standard library provides a blanket implementation of `Into`, so implementing `From<T> for U` gives you `T: Into<U>` automatically. For example:
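A small sketch using a made-up `WrappedI32` newtype, chosen only because every `i32` is a valid value for it. The numbered comment matches the marker in the text below.

```rust
/// Hypothetical newtype with no invariants beyond those of `i32`.
#[derive(Debug, Copy, Clone, PartialEq, Eq)]
pub struct WrappedI32(i32);

impl From<i32> for WrappedI32 {
    fn from(raw: i32) -> Self {
        Self(raw)
    }
}

pub fn demonstrate() -> (WrappedI32, WrappedI32) {
    let explicit = WrappedI32::from(84);
    // [16] `Into` comes from the standard library's blanket implementation.
    let via_into: WrappedI32 = 84.into();
    (explicit, via_into)
}
```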
We only implemented `From`, but we got `Into` for free [16].
Fallible conversions
More often than not, though, newtypes don't have infallible conversions from other types. We can't turn just any `f64` into a `Subsecond`!
`TryFrom` is the trait of choice in this scenario.
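Sketched for `Subsecond`, with an assumed `SubsecondError` type. The numbered comment matches the marker in the text below.

```rust
use std::convert::TryFrom;

#[derive(Debug, Copy, Clone, PartialEq)]
pub struct Subsecond(f64);

/// Hypothetical error type carrying the rejected value.
#[derive(Debug, Copy, Clone, PartialEq)]
pub struct SubsecondError(f64);

impl Subsecond {
    pub fn new(raw: f64) -> Result<Self, SubsecondError> {
        if (0.0..1.0).contains(&raw) {
            Ok(Self(raw))
        } else {
            Err(SubsecondError(raw))
        }
    }
}

impl TryFrom<f64> for Subsecond {
    type Error = SubsecondError;

    fn try_from(raw: f64) -> Result<Self, Self::Error> {
        // [17] Delegate to the canonical constructor rather than re-validating here.
        Subsecond::new(raw)
    }
}
```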
Note how `TryFrom` is implemented as a simple call to the `Subsecond` constructor [17]. A newtype's constructor serves as its source of truth – never have multiple constructors for the same use case.
For instance, it's valid to have two constructors `Subsecond::default()` and `Subsecond::new(raw: f64)`, since these serve two distinct purposes. Avoid having competing implementations for `Subsecond::new(raw: f64)` and `Subsecond::`, however. This doubles the code you need to maintain and test for no benefit. Define conversion traits in terms of a canonical constructor.
From newtypes to primitives: `AsRef`, `Deref`, `Borrow` and more
How should you get an underlying primitive back out of a newtype? I don't know about your database client, but mine accepts `&str`, not `EmailAddress`.
This requires a little more care than you might think.
Writing user-friendly getters
As a starting point, we should implement getters with common-sense names to return the inner value.
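A sketch of such getters. The names `as_str` and `into_string` are my assumption, borrowed from standard library naming conventions; the numbered comment matches the marker in the text below.

```rust
use std::fmt;

pub struct EmailAddress(String);

impl EmailAddress {
    /// Borrow the inner value without giving up ownership.
    pub fn as_str(&self) -> &str {
        &self.0
    }

    /// Consume the newtype and hand back the owned `String`.
    pub fn into_string(self) -> String {
        self.0
    }

    // [18] No inherent `to_string`: `Display` below already provides `ToString`.
}

impl fmt::Display for EmailAddress {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.0)
    }
}
```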
Recall that implementing `Display` gives us a free implementation of `ToString`. Shadowing this implementation is such bad news that Clippy considers this an error. That's why I haven't defined `EmailAddress::to_string` at [18].
Consider carefully how other developers will use your code. If you're writing an application maintained by one team, which won't be a dependency of some higher-level code, you can stop implementing getters here.
If you're a library author with a specific target audience in mind, think about whether there are conversion traits in third-party crates that your users will almost certainly need.
I'm a contributor to an astrodynamics library for a space mission simulator. We perform many conversions between numeric types, and it's safe to assume that our users will too. This makes implementing `num::ToPrimitive` from the popular `num` crate on our newtypes a reasonable thing to do.
AsRef
Library code you interface with will expect `&str`, not `&EmailAddress`.
`AsRef` provides a convenient way to get a reference to a newtype's wrapped type:
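A minimal sketch. `byte_len` is a hypothetical stand-in for library code that is generic over anything string-like.

```rust
pub struct EmailAddress(String);

impl AsRef<str> for EmailAddress {
    fn as_ref(&self) -> &str {
        &self.0
    }
}

/// Hypothetical library function that accepts anything viewable as a `&str`.
pub fn byte_len(s: impl AsRef<str>) -> usize {
    s.as_ref().len()
}
```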
Deref
Sometimes your newtype will behave so much like its underlying type that you'd like it to dereference directly to the wrapped type.
Stop. Pause what you're listening to. Take a toilet break. Move your cat off your keyboard. I need your full attention.
`Deref` is a powerful trait that you should approach like you would disarming a very small bomb. It might not kill you, but it could leave you with nasty burns.
I'll show you the right wires to cut after demonstrating why it's attractive for use with newtypes:
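A sketch of `Deref` in action. `takes_str` is a hypothetical function standing in for any API that wants a `&str`; the numbered comments match the markers in the text below.

```rust
use std::ops::Deref;

pub struct EmailAddress(String);

impl Deref for EmailAddress {
    type Target = str;

    fn deref(&self) -> &Self::Target {
        &self.0
    }
}

/// Hypothetical function that only knows about `&str`.
fn takes_str(s: &str) -> usize {
    s.len()
}

pub fn demonstrate() -> (usize, bool, usize) {
    let email = EmailAddress("ferris@example.com".to_string());

    // [19] `&email` is a `&EmailAddress`, coerced to `&str` by the compiler.
    let len = takes_str(&email);

    // [20] `str::contains`, called directly on the newtype.
    let has_at = email.contains('@');

    // [21] Explicit dereference to the `str` target.
    let explicit_len = (*email).len();

    (len, has_at, explicit_len)
}
```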
Pretty impressive, no? `&email` isn't a `&str` [19], but `Deref` tells the compiler to treat it like one. This is an example of "deref coercion". We could also have written `takes_`.
Deref coercion also gives us all the `&self` methods of `str` on `EmailAddress` [20]. This feels similar to inheritance in object-oriented languages, and it's what makes a `Box<T>` or `Rc<T>` almost as convenient to use as an instance of `T` itself. The Rust Reference has the full details on how method lookup works in these cases.
Finally, it causes `*email` to desugar to `*Deref::deref(&email)` [21].
So why do we have to be cautious with `Deref`?
By adding every `&self` method of a wrapped type to a newtype, we vastly expand the newtype's public interface. This is the opposite of how we typically choose to publish methods, exposing the minimum viable set of operations and gradually extending the type as new use cases arise.
Does it make sense for your newtype to have all these methods? That's a judgement call. There's no good reason for an `EmailAddress` to have an `is_empty` method, since an `EmailAddress` can never be empty, but implementing `Deref` means that `str::is_empty` comes along for the ride.
This decision is critical if your newtype wraps a user-controlled type generically. In this situation, you don't know what methods will be available on the user's underlying type. What happens if your newtype defines a method that happens to have the same name as a method on the user-provided type? The newtype's method takes precedence, so if the user is relying on your newtype's `Deref` implementation to call through to their underlying type, they're out of luck.
The best advice I've seen on this issue comes from Rust for Rustaceans (an essential read): prefer associated functions to inherent methods on generic wrapper types.
If a newtype has only associated functions, it has no methods that could inadvertently intercept method calls intended for the wrapped type. Behold:
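A sketch of the pattern. The message strings are placeholders I've made up, and both functions return their message as well as printing it so the behaviour can be checked in a test. The numbered comments match the markers in the text below.

```rust
use std::ops::Deref;

pub struct SmartBox<T>(T);

// [22] `SmartBox<T>` dereferences to the wrapped `T`.
impl<T> Deref for SmartBox<T> {
    type Target = T;

    fn deref(&self) -> &T {
        &self.0
    }
}

impl<T> SmartBox<T> {
    // [23] Associated function: no `self` receiver, so it is not a method.
    pub fn print_best_rust_resource(_smart_box: &Self) -> &'static str {
        let message = "howtocodeit.com";
        println!("{}", message);
        message
    }
}

pub struct ConfusedUnitStruct;

impl ConfusedUnitStruct {
    // [24] A method with the same name, holding the wrong opinion (placeholder text).
    pub fn print_best_rust_resource(&self) -> &'static str {
        let message = "random forum threads";
        println!("{}", message);
        message
    }
}

pub fn demonstrate() -> (&'static str, &'static str) {
    let smart_box = SmartBox(ConfusedUnitStruct);

    // [25] Path syntax reaches the wrapper; method syntax derefs to the wrapped type.
    let ours = SmartBox::print_best_rust_resource(&smart_box);
    let theirs = smart_box.print_best_rust_resource();
    (ours, theirs)
}
```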
Calling `demonstrate` prints the message from `SmartBox` and the message from `ConfusedUnitStruct`.
At [22] we implement `Deref` so that `SmartBox<T>::deref` returns `&T`. Our `SmartBox`, being clever, has an associated function informing us that howtocodeit.com is the best Rust resource [23].
But what's this? A bewildered developer wants to wrap their `ConfusedUnitStruct` in `SmartBox`! `ConfusedUnitStruct` has a method with some very concerning views about the path to Rust mastery [24].
Luckily, `SmartBox` believes that all views should be heard – even the ones that are wrong. Because `SmartBox` implements `print_best_rust_resource` as an associated function, it can't clash with any method implemented by the types it derefs to. Both functions can be called unambiguously [25].
Borrow
`Borrow` is deceptively simple. A type which is `Borrow<T>` can give you a `&T`.
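A minimal sketch for an `EmailAddress` whose equality, ordering and hashing are all derived, and therefore agree with those of the inner string:

```rust
use std::borrow::Borrow;

// Derived traits delegate to the inner `String`, which hashes and compares like `str`.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct EmailAddress(String);

impl Borrow<str> for EmailAddress {
    fn borrow(&self) -> &str {
        &self.0
    }
}
```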
When used properly, a newtype that implements `Borrow` says, "I am identical to my underlying type for all practical purposes". It is expected that the outputs of `Eq`, `Ord` and `Hash` are the same for both the owned newtype and the borrowed, wrapped type – but this isn't statically enforced. 🫠
For example, if we manually implement `PartialEq` for `EmailAddress` to be case insensitive (as most providers treat email addresses in practice), `EmailAddress` cannot implement `Borrow<str>` without unleashing this unspeakable darkness:
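A sketch of the failure. The example addresses are made up, and `demonstrate` returns both lookups instead of printing them so the result can be checked. The numbered comments match the markers in the text below.

```rust
use std::borrow::Borrow;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

#[derive(Debug, Clone)]
pub struct EmailAddress(String);

// [26] Case-insensitive equality.
impl PartialEq for EmailAddress {
    fn eq(&self, other: &Self) -> bool {
        self.0.eq_ignore_ascii_case(&other.0)
    }
}

impl Eq for EmailAddress {}

// [27] Equal values must hash equally, so hash the lowercased form.
impl Hash for EmailAddress {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.0.to_ascii_lowercase().hash(state);
    }
}

// [28] The problem: this `&str` compares case-sensitively, unlike `EmailAddress`.
impl Borrow<str> for EmailAddress {
    fn borrow(&self) -> &str {
        &self.0
    }
}

pub fn demonstrate() -> (Option<u32>, Option<u32>) {
    // [29]
    let mut login_attempts: HashMap<EmailAddress, u32> = HashMap::new();

    let raw_uppercase = "FERRIS@EXAMPLE.COM";
    let raw_lowercase = "ferris@example.com";

    // [30]
    let email = EmailAddress(raw_uppercase.to_string());
    login_attempts.insert(email, 2);

    // [32] Lookup by `&str` compares the stored uppercase text case-sensitively and misses.
    let by_str = login_attempts.get(raw_lowercase).copied();

    // Lookup by `EmailAddress` uses the case-insensitive `Eq` and finds the entry.
    let by_key = login_attempts
        .get(&EmailAddress(raw_lowercase.to_string()))
        .copied();

    (by_str, by_key)
}
```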
At [26] we make `EmailAddress` equality case insensitive, remembering that two equal objects must also have equal hashes [27]. The `Borrow` implementation at [28] heralds the end of days.
We instantiate a `HashMap` with `EmailAddress` keys [29]. `HashMap` owns its keys, but allows you to look up entries using any hashable type for which the owned key type implements `Borrow`. Since `EmailAddress` implements `Borrow<str>`, we should be able to query `login_attempts` using `&str`.
We create an `EmailAddress` from the raw, uppercase `&str` [30], and insert it into `login_attempts` with a value of `2`.
When we attempt to get the value back out using `raw_lowercase` as the key [32]... armageddon. There is no corresponding entry in the `HashMap`. 😱
This happens because we've violated `HashMap`'s assumption about how `Borrow` should be implemented. A type which is `Borrow<T>` must hash and compare identically to `T`. Since `EmailAddress` hashes and defines equality differently to `&str`, we cannot use `&str` to look up `EmailAddress`es in a `HashMap`.
Since these invariants are assumed but not enforced, I consider `Borrow` implementations "unofficially unsafe". Scrutinize any `Borrow` implementation you see in code review.
In its current form, `EmailAddress` should not implement `Borrow`. We can fix it though. If we perform the lowercasing in the `EmailAddress` constructor, there's no need to implement `PartialEq` and `Hash` manually.
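A sketch of the fix, reusing a deliberately naive validation rule and an assumed `EmailAddressError` type:

```rust
use std::borrow::Borrow;

// With normalization done up front, the derived traits are correct again.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub struct EmailAddress(String);

/// Hypothetical error type carrying the rejected input.
#[derive(Debug, Clone, PartialEq)]
pub struct EmailAddressError(String);

impl EmailAddress {
    pub fn new(raw: &str) -> Result<Self, EmailAddressError> {
        // Naive check for illustration: something on both sides of an `@`.
        match raw.split_once('@') {
            Some((local, domain)) if !local.is_empty() && !domain.is_empty() => {
                // Lowercase once, here, so every instance is already normalized.
                Ok(Self(raw.to_lowercase()))
            }
            _ => Err(EmailAddressError(raw.to_string())),
        }
    }
}

impl Borrow<str> for EmailAddress {
    fn borrow(&self) -> &str {
        &self.0
    }
}
```

One thing the sketch makes visible: a `&str` used as a lookup key still has to be lowercase itself, because the map compares it to the stored, normalized text.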
In this arrangement, `EmailAddress` is equivalent to `&str` for the sake of hashing, equality and ordering, so it's safe to implement `Borrow`. Yet another reason to make your constructors the source of truth for what it means to be a valid instance of a type.
You have been warned.
Bypassing newtype validations
We've come so far. We've learned how to build a type-driven utopia, populated by happy, valid instances of newtypes. Testing is easy now, and human error is a passing memory.
But wait – what's that on the horizon? Hark, 'tis the dark, acrid smoke of a developer who thinks they know better, seeking to bypass our constructors!
You might be expecting me to cast this evil into the abyss, but bypassing strict constructors can be a reasonable thing to do. We often know that a raw value is valid and don't want to incur the cost of revalidating it just to wrap it in a newtype. Every email address that went into the database was valid, so why would we check them again when pulling them out?
Building backdoors in your newtypes requires care, but there are two ways to limit the fallout: marker types and `unsafe`.
Marker types are a powerful feature of Rust's type system which we can use to constrain what can be done with a value depending on its source. Consider this approach a role-based permissions system within the type system itself!
As it sounds, using marker types to represent provenance is more advanced than anything we've discussed so far. In fact, it can introduce more complexity than it solves. For this reason, I'll be saving marker types for a separate article. I know, I'm a tease, but they need at least 3,000 words of their own.
There's plenty to keep us occupied with `unsafe`, however.
`unsafe` is a signal to other developers that some invariant exists that is not enforced by the compiler, and extra care must be taken to make sure that it holds. It's also a signal to yourself, when a week has passed and you have no memory of what you were doing. A better keyword would perhaps be `check_`.
The Rust standard library provides several examples of `unsafe` constructors that bypass costly validations. Convention is to suffix such functions with `_unchecked`. For example, `std::string::String` has `from_utf8_unchecked`, which doesn't check that the content of `bytes` is valid UTF-8.
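For instance, here's a small sketch that calls `String::from_utf8_unchecked`, whose signature in the standard library is `pub unsafe fn from_utf8_unchecked(bytes: Vec<u8>) -> String`. The `sparkle_heart` wrapper is just a name for this example.

```rust
pub fn sparkle_heart() -> String {
    // The UTF-8 encoding of U+1F496.
    let bytes = vec![240, 159, 146, 150];

    // SAFETY: the bytes above are known to be valid UTF-8, so skipping the check is sound.
    unsafe { String::from_utf8_unchecked(bytes) }
}
```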
Following this convention for `EmailAddress` gives us this:
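A sketch of what that could look like. The name `new_unchecked` is my assumption, formed by applying the `_unchecked` suffix convention to `new`; the validation rule is the same naive placeholder as before.

```rust
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct EmailAddress(String);

/// Hypothetical error type carrying the rejected input.
#[derive(Debug, Clone, PartialEq)]
pub struct EmailAddressError(String);

impl EmailAddress {
    /// Validating, normalizing constructor: the source of truth.
    pub fn new(raw: &str) -> Result<Self, EmailAddressError> {
        match raw.split_once('@') {
            Some((local, domain)) if !local.is_empty() && !domain.is_empty() => {
                Ok(Self(raw.to_lowercase()))
            }
            _ => Err(EmailAddressError(raw.to_string())),
        }
    }

    /// Wraps `raw` without validating or normalizing it.
    ///
    /// # Safety
    ///
    /// The caller must guarantee that `raw` is exactly what `EmailAddress::new`
    /// would have accepted and stored, for example a value it produced earlier.
    pub unsafe fn new_unchecked(raw: &str) -> Self {
        Self(raw.to_string())
    }
}
```

Nothing here causes memory unsafety, so marking the function `unsafe` is a design choice: it forces callers to write an `unsafe` block and own the invariant.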
For raw email addresses entering your system from web requests, `EmailAddress::new` should be used to ensure validity before they're saved to the database. For email addresses coming out of the database, `EmailAddress::` is acceptable, because the validation and case normalization have already been done.
If you ever see `unsafe` used in code you're reviewing, make yourself a fresh coffee. You need to pay attention.
How to eliminate newtype boilerplate
We're almost at the end of our journey. We're walking taller, writing better code and better tests than before. We stride like giants over Rust developers still writing validations into their business logic. You know which traits to implement, and when to implement them. You understand the power and responsibility of dipping into `unsafe` code. But...
...doesn't this all seem like a lot of effort?
Indeed, writing robust newtypes comes with a lot of boilerplate. I have two libraries to share with you that can slash that boilerplate down to size, but let's make an agreement first: before you start using them, practice writing your own newtypes by hand.
Before introducing magic from external crates, you should understand what you're automating and why. Check out the exercises at the end of this article for a head start!
Using `derive_more` to... derive more
`derive_more` is a crate designed to ease the burden of implementing traits on newtypes. It lets you `derive` implementations for `From`, `IntoIterator`, `AsRef`, `Deref`, arithmetic operators, and more.
If you want to simplify the process of writing newtypes while minimizing the magic in your codebase, this is where I'd start.
Going all-in with nutype
`nutype` is a formidable procedural macro that generates sanitization and validation code for your newtypes, including dedicated error types.
As you can see, practically everything we've been doing manually up to this point can be generated by `nutype`. It's a huge time saver.
Beware the corners that you choose to cut, though. At the time of writing, `nutype`'s generated error messages are quite vague, and there's no way to override them or include additional detail.
This hampers debugging and puts the onus on the caller to wrap the newtype's associated error with additional context. It's good practice to do so regardless, but this omission feels out of place in Rust – just think about how good Rust's compiler error messages are.
Go now. I have nothing more to teach you.