Sentinel values
A sentinel value is an in-band value that indicates a special situation, such as having reached the end of valid data in an iterator.
nullptr
Many designs in C++ borrow the convention from C of using a null pointer as a
sentinel value for a method that returns owned pointers. For example, a method
that parses a large structure may return nullptr in the case of failure.
A similar situation in Rust would make use of the type Option<Box<LargeStructure>>.
#include <cstddef>
#include <memory>

class LargeStructure {
    int field;
    // many fields ...
};

std::unique_ptr<LargeStructure>
parse(char *data, std::size_t len) {
    // ...
    // on failure
    return nullptr;
}
struct LargeStructure {
    field: i32,
    // many fields ...
}

fn parse(
    data: &[u8],
) -> Option<Box<LargeStructure>> {
    // ...
    // on failure
    None
}
The Box<T> type has the same meaning as std::unique_ptr<T> in terms of being
a uniquely owned pointer to some T on the heap, but unlike std::unique_ptr,
it cannot be null. Rust's Option<T> is like std::optional<T> in C++, except
that it can also wrap references and never-null pointers such as Box<T>. In
those cases (and in some other cases) the compiler uses the otherwise
impossible null value to represent None, so Option<Box<T>> is the same size
as Box<T>.
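As a rough check of the size claim above, the following sketch compares the two pointer types with std::mem::size_of. The helper name pointer_sizes and the one-field LargeStructure are placeholders invented for this illustration.

```rust
use std::mem::size_of;

// Placeholder for the structure discussed in the text.
#[allow(dead_code)]
struct LargeStructure {
    field: i32,
    // many fields ...
}

/// Hypothetical helper: returns the size in bytes of
/// `Box<LargeStructure>` and of `Option<Box<LargeStructure>>`.
fn pointer_sizes() -> (usize, usize) {
    (
        size_of::<Box<LargeStructure>>(),
        size_of::<Option<Box<LargeStructure>>>(),
    )
}
```

Both numbers should be equal to the size of a plain pointer, because None is stored as the null address that a Box can never hold.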
In Rust it is also common to use a return type of Result<T, E> (which is akin
to std::expected in C++23), paying for a possibly larger return value in order
to make the reason for the failure available at runtime.
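A minimal sketch of the Result-based variant follows. The ParseError type, its TooShort variant, and the assumed input format (four little-endian bytes holding the field) are all invented for this illustration and are not part of the original example.

```rust
use std::fmt;

#[derive(Debug, PartialEq)]
struct LargeStructure {
    field: i32,
    // many fields ...
}

// Hypothetical error type describing why parsing failed.
#[derive(Debug, PartialEq)]
enum ParseError {
    TooShort { needed: usize, got: usize },
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParseError::TooShort { needed, got } => {
                write!(f, "need {} bytes, got {}", needed, got)
            }
        }
    }
}

// Assumed format: the first four bytes are a little-endian i32.
fn parse(data: &[u8]) -> Result<Box<LargeStructure>, ParseError> {
    match data {
        [a, b, c, d, ..] => Ok(Box::new(LargeStructure {
            field: i32::from_le_bytes([*a, *b, *c, *d]),
        })),
        // Instead of a bare None, the error says what went wrong.
        _ => Err(ParseError::TooShort {
            needed: 4,
            got: data.len(),
        }),
    }
}
```

A caller can match on the Err case, or propagate it with the ? operator, and still has the reason for the failure in hand.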
Integer sentinels
When a possibly-failing function produces an integer, it is also common to use
an otherwise unused or unlikely integer value as a sentinel value, such as 0
or INT_MAX.
In Rust, the Option type is used for this purpose. In cases where the zero
value really is not possible to produce, as with the gcd algorithm below, the
type NonZero<T> can be used to indicate that fact. As with Option<Box<T>>,
the compiler optimizes the representation to make use of the unused value (in
this case 0) to represent the None case, which ensures that
Option<NonZero<T>> has the same size as T.
#include <cstdlib>

int gcd(int a, int b) {
    if (b == 0 || a == 0) {
        // returns 0 to indicate invalid input
        return 0;
    }
    while (b != 0) {
        int temp = b;
        b = a % b;
        a = temp;
    }
    return std::abs(a);
}
use std::num::NonZero;

fn gcd(
    mut a: i32,
    mut b: i32,
) -> Option<NonZero<i32>> {
    if a == 0 || b == 0 {
        return None;
    }
    while b != 0 {
        let temp = b;
        b = a % b;
        a = temp;
    }
    // At this point, a is guaranteed to not be
    // zero. The `Some` case from `NonZero::new`
    // has a different meaning than the `Some`
    // returned from this function, but here it
    // happens to coincide.
    NonZero::new(a.abs())
}

fn main() {
    assert!(gcd(5, 0) == None);
    assert!(gcd(0, 5) == None);
    assert!(gcd(5, 1) == NonZero::new(1));
    assert!(gcd(1, 5) == NonZero::new(1));
    assert!(gcd(2 * 2 * 3 * 5 * 7, 2 * 2 * 7 * 11) == NonZero::new(2 * 2 * 7));
    assert!(gcd(2 * 2 * 7 * 11, 2 * 2 * 3 * 5 * 7) == NonZero::new(2 * 2 * 7));
}
As an aside, it is also possible to avoid the redundant check for zero at the end, and without using unsafe Rust, by preserving the non-zeroness property throughout the algorithm.
use std::num::NonZero;

fn gcd(x: i32, mut b: i32) -> Option<NonZero<i32>> {
    if b == 0 {
        return None;
    }
    // a is guaranteed to be non-zero, so we
    // record the fact in the type of a.
    let mut a = NonZero::new(x)?;
    while let Some(temp) = NonZero::new(b) {
        b = a.get() % b;
        a = temp;
    }
    Some(a.abs())
}

fn main() {
    assert!(gcd(5, 0) == None);
    assert!(gcd(0, 5) == None);
    assert!(gcd(5, 1) == NonZero::new(1));
    assert!(gcd(1, 5) == NonZero::new(1));
    assert!(gcd(2 * 2 * 3 * 5 * 7, 2 * 2 * 7 * 11) == NonZero::new(2 * 2 * 7));
    assert!(gcd(2 * 2 * 7 * 11, 2 * 2 * 3 * 5 * 7) == NonZero::new(2 * 2 * 7));
}
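The layout point made earlier in this section can be checked with a small sketch, assuming a toolchain recent enough to provide the generic NonZero<T>. The helper name option_sizes is invented for this illustration.

```rust
use std::mem::size_of;
use std::num::NonZero;

/// Hypothetical helper: returns the sizes in bytes of `i32`,
/// `Option<NonZero<i32>>`, and `Option<i32>`, in that order.
fn option_sizes() -> (usize, usize, usize) {
    (
        size_of::<i32>(),
        // 0 is never a valid NonZero<i32>, so it can encode None.
        size_of::<Option<NonZero<i32>>>(),
        // Every bit pattern of i32 is a valid value, so None
        // needs extra space here.
        size_of::<Option<i32>>(),
    )
}
```

The first two sizes should match, while Option<i32> has to be larger because a plain i32 has no spare bit pattern left to stand for None.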
std::optional
In situations where std::optional would be used as a sentinel value in C++,
Option can be used for the same purpose in Rust. The main difference between
the two is that safe Rust requires explicitly checking whether the value is
None before accessing it, while in C++ one can attempt to access the value
without checking (at the risk of undefined behavior).
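The sketch below illustrates what that explicit check looks like in practice. The functions find_first_even, describe, and first_even_or are invented for this illustration.

```rust
// Hypothetical search that may or may not produce a value.
fn find_first_even(values: &[i32]) -> Option<i32> {
    values.iter().copied().find(|v| v % 2 == 0)
}

// The value inside the Option is only reachable through a
// construct that also handles the None case, such as `match`.
fn describe(values: &[i32]) -> String {
    match find_first_even(values) {
        Some(v) => format!("first even value: {}", v),
        None => String::from("no even value"),
    }
}

// Combinators such as `unwrap_or` perform the check internally
// and make the caller state what happens when the value is absent.
fn first_even_or(values: &[i32], fallback: i32) -> i32 {
    find_first_even(values).unwrap_or(fallback)
}
```

Methods such as unwrap skip the visible branch, but they panic on None instead of invoking undefined behavior.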