C++ to Rust Phrasebook
This book is designed to help C++ programmers learn Rust. It provides translations of common C++ patterns into idiomatic Rust. Each pattern is described through concrete code examples along with high-level discussion of engineering trade-offs.
The book can be read front-to-back, but it is designed to be used random-access. When you are writing Rust code and think, "I know how to do this in C++ but not Rust," then look for the corresponding chapter in this book.
This book was hand-written by expert C++ and Rust programmers at Brown University's Cognitive Engineering Lab. Our goal is to provide accurate information with a tasteful degree of detail. No text in this book was written by AI.
Other resources
If you have zero Rust experience, you might consider first reading The Rust Programming Language or getting a quick overview at Learn X in Y Minutes.
If you are primarily an embedded systems programmer using C or C++, this book is a complement to The Embedded Rust Book.
Compared to resources like the Rustonomicon and Learn Rust With Entirely Too Many Linked Lists, this book is less about "Rust behind the scenes" and more about explicitly describing how Rust works in terms of C++.
Feedback on this book
At the bottom of every page there is a link to a form where you can submit feedback: typos, factual errors, or any other issues you spot.
If you answer the quizzes at the end of each chapter, we will save your responses anonymously for research purposes.
Constructors
In C++, constructors initialize objects. At the point when a constructor is executed, storage for the object has been allocated and the constructor is only performing initialization.
Rust does not have constructors in the same way as C++. In Rust, there is a single fundamental way to create an object, which is to initialize all of its members at once. The term "constructor" or "constructor method" in Rust refers to something more like a factory: a static method associated with a type (i.e., a method that does not have a self parameter), which returns a value of the type.
#include <thread>
unsigned int cpu_count() {
return std::thread::hardware_concurrency();
}
class ThreadPool {
unsigned int num_threads;
public:
ThreadPool() : num_threads(cpu_count()) {}
ThreadPool(unsigned int nt) : num_threads(nt) {}
};
int main() {
ThreadPool p1;
ThreadPool p2(4);
}
fn cpu_count() -> usize {
    std::thread::available_parallelism().unwrap().get()
}
struct ThreadPool {
    num_threads: usize,
}
impl ThreadPool {
    fn new() -> Self {
        Self { num_threads: cpu_count() }
    }
    fn with_threads(nt: usize) -> Self {
        Self { num_threads: nt }
    }
}
fn main() {
    let p1 = ThreadPool::new();
    let p2 = ThreadPool::with_threads(4);
}
In Rust, typically the primary constructor for a type is named new, especially if it takes no arguments. (See the chapter on default constructors.) Constructors based on some specific property of the value are usually named with_<something>, e.g., ThreadPool::with_threads. See the naming guidelines for the conventions on how to name constructor methods in Rust.
If the fields to be initialized are visible, there is a reasonable default value, and the value does not manage a resource, then it is also common to use record update syntax to initialize a value based on some default value.
struct Point {
    x: i32,
    y: i32,
    z: i32,
}
impl Point {
    const fn zero() -> Self {
        Self { x: 0, y: 0, z: 0 }
    }
}
fn main() {
    let x_unit = Point { x: 1, ..Point::zero() };
}
Despite the name, "record update syntax" does not modify a record but instead creates a new value based on another one, taking ownership of it in order to do so.
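As a small sketch of the ownership point, the example below uses a hypothetical Config type and with_retries helper (neither is from the text above) with a non-Copy field. The helper takes the base value by value, and the update syntax moves the fields that are not listed explicitly out of it.

```rust
#[derive(Debug, PartialEq)]
struct Config {
    name: String,
    retries: u32,
    verbose: bool,
}

impl Config {
    // Hypothetical base configuration used only for this illustration.
    fn base(name: &str) -> Self {
        Self { name: name.to_string(), retries: 3, verbose: false }
    }
}

// Takes `base` by value: the fields that are not listed explicitly
// are moved (or copied, for `Copy` fields) out of `base`.
fn with_retries(base: Config, retries: u32) -> Config {
    Config { retries, ..base }
}

fn main() {
    let base = Config::base("worker");
    let tuned = with_retries(base, 10);
    // `base` has been moved into `with_retries` and cannot be used here.
    println!("{tuned:?}");
}
```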
Storage allocation vs initialization
In Rust, the actual construction of a structure or enum value occurs where the structure construction syntax ThreadPool { ... } is, after the evaluation of the expressions for the fields.
A significant implication of this difference is that storage is not allocated for a struct in Rust at the point where the constructor method (such as ThreadPool::with_threads) is called, and in fact is not allocated until after the values of the fields of a struct have been computed (in terms of the semantics of the language; the optimizer may still avoid the copy). Therefore there is no straightforward way in Rust to translate patterns such as a class which stores a pointer to itself upon construction (in Rust, this requires tools like Pin and MaybeUninit).
Fallible constructors
In C++, the primary way constructors can indicate failure is by throwing exceptions. In Rust, because constructors are normal static methods, fallible constructors can instead return Result (akin to std::expected) or Option (akin to std::optional).
#include <iostream>
#include <stdexcept>
class ThreadPool {
unsigned int num_threads;
public:
ThreadPool(unsigned int nt) : num_threads(nt) {
if (num_threads == 0) {
throw std::domain_error("Cannot have zero threads");
}
}
};
int main() {
try {
ThreadPool p(0);
} catch (const std::domain_error &e) {
std::cout << e.what() << std::endl;
}
}
struct ThreadPool {
    num_threads: usize,
}
impl ThreadPool {
    fn with_threads(nt: usize) -> Result<Self, String> {
        if nt == 0 {
            Err("Cannot have zero threads".to_string())
        } else {
            Ok(Self { num_threads: nt })
        }
    }
}
fn main() {
    match ThreadPool::with_threads(0) {
        Err(err) => println!("{err}"),
        Ok(p) => { /* ... */ }
    }
}
See the chapter on exceptions for more information on how C++ exceptions and exception handling translate to Rust.
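A possible variation on the fallible constructor, sketched under the assumption that callers want a dedicated error type rather than a String. The PoolError enum and the total_threads helper are hypothetical names introduced for illustration. The ? operator returns early with the error, which is roughly where an exception would propagate in C++.

```rust
use std::fmt;

// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum PoolError {
    ZeroThreads,
}

impl fmt::Display for PoolError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PoolError::ZeroThreads => write!(f, "Cannot have zero threads"),
        }
    }
}

impl std::error::Error for PoolError {}

struct ThreadPool {
    num_threads: usize,
}

impl ThreadPool {
    fn with_threads(nt: usize) -> Result<Self, PoolError> {
        if nt == 0 {
            return Err(PoolError::ZeroThreads);
        }
        Ok(Self { num_threads: nt })
    }
}

// Each `?` returns early from `total_threads` if construction failed.
fn total_threads(a: usize, b: usize) -> Result<usize, PoolError> {
    let first = ThreadPool::with_threads(a)?;
    let second = ThreadPool::with_threads(b)?;
    Ok(first.num_threads + second.num_threads)
}

fn main() {
    match total_threads(4, 0) {
        Ok(n) => println!("{n} threads"),
        Err(err) => println!("{err}"),
    }
}
```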
Default constructors
C++ has a special concept of default constructors to support several scenarios in which they are implicitly called.
Rust does not have the same notion of a default constructor. The most similar mechanism is the Default trait.
class Person {
int age;
public:
// Default constructor
Person() : age(0) {}
};
struct Person {
    age: i32,
}
impl Person {
    pub const fn new() -> Self {
        Self { age: 0 }
    }
}
impl Default for Person {
    fn default() -> Self {
        Self::new()
    }
}
If a structure has a useful default value (such as would be constructed by a default constructor in C++), then the type should provide both a new method that takes no arguments and an implementation of Default.
Implicit initialization of class members
In C++, if a member is not explicitly initialized by a constructor, then it is default-initialized. When the type of the member is a class, the default-initialization invokes the default constructor.
In Rust, if all of the fields of a struct implement the Default trait, then an implementation for the structure can be provided by the compiler.
class Person {
int age;
public:
Person() : age(0) {}
};
class Student {
Person person;
};
#[derive(Default)]
struct Person {
    age: i32,
}
#[derive(Default)]
struct Student {
    person: Person,
}
The #[derive(Default)] macros in Rust are equivalent to writing the following.
struct Person {
    age: i32,
}
impl Default for Person {
    fn default() -> Self {
        Self { age: Default::default() }
    }
}
struct Student {
    person: Person,
}
impl Default for Student {
    fn default() -> Self {
        Self { person: Default::default() }
    }
}
Unlike C++ where the default initialization value for integers is indeterminate, in Rust the default value for the primitive integer and floating point types is zero.
Deriving the Default trait has a similar effect on code concision as eliding initialization in C++. In situations where all of the types implement the Default trait, but only some of the fields should have their default values, one can use struct update syntax to define a constructor method without enumerating the values for all of the fields.
#[derive(Default)]
struct Person {
    age: i32,
}
#[derive(Default)]
struct Student {
    person: Person,
    favorite_color: Option<String>,
}
impl Student {
    pub fn with_favorite_color(color: String) -> Self {
        Student {
            favorite_color: Some(color),
            ..Default::default()
        }
    }
}
Implicit initialization of array values
In C++, arrays without explicit initialization are default-initialized using the default constructors.
In Rust, the value with which to initialize the array must be provided.
class Person {
int age;
public:
Person() : age(0) {}
};
int main() {
Person people[3];
// ...
}
#[derive(Default)]
struct Person {
    age: i32,
}
fn main() {
    // std::array::from_fn provides the index to the callback
    let people: [Person; 3] = std::array::from_fn(|_| Default::default());
    // ...
}
If the type happens to be trivially copyable, then a shorthand can be used.
#[derive(Clone, Copy, Default)]
struct Person {
    age: i32,
}
fn main() {
    let people: [Person; 3] = [Default::default(); 3];
    // ...
}
Container element initialization
In C++, the default constructor can be used to implicitly initialize the elements of collection types, such as std::vector. Before C++11, one value would be default constructed, and the elements would be copy constructed from that initial element. Since C++11, all elements are default constructed.
As with array initialization, the values must be explicitly specified in Rust. The vector can be constructed from an array, enabling the same syntax as with arrays.
#include <vector>
class Person {
int age;
public:
Person() : age(0) {}
};
int main() {
std::vector<Person> people(3);
// ...
}
#[derive(Default)]
struct Person {
    age: i32,
}
fn main() {
    let people_arr: [Person; 3] = std::array::from_fn(|_| Default::default());
    let people: Vec<Person> = Vec::from(people_arr);
    // ...
}
In Rust, the vector can also be constructed from an iterator.
#[derive(Default)]
struct Person {
    age: i32,
}
fn main() {
    let people: Vec<Person> = (0..3).map(|_| Default::default()).collect();
    // ...
}
If the type implements the Clone trait, then the vector can be constructed using the vec! macro. See the chapter on copy constructors for more details on Clone.
#[derive(Clone, Default)]
struct Person {
    age: i32,
}
fn main() {
    let people: Vec<Person> = vec![Default::default(); 3];
    // ...
}
Implicit initialization of local variables
In C++, the default constructor is used to perform default-initialization of local variables that are not explicitly initialized.
In Rust, initialization of local variables is always explicit.
class Person {
int age;
public:
Person() : age(0) {}
};
int main() {
Person person;
// ...
}
#[derive(Clone, Default)]
struct Person {
    age: i32,
}
fn main() {
    let person = Person::default();
    // ...
}
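Explicit initialization does not have to happen on the same line as the declaration. The sketch below (the pick function is a hypothetical helper) declares a local without an initializer and assigns it on every path. The compiler rejects any read of the variable before it has definitely been assigned, so there is no counterpart to an indeterminate value in C++.

```rust
#[derive(Debug, Default, PartialEq)]
struct Person {
    age: i32,
}

fn pick(adult: bool) -> Person {
    // Declared without an initializer; no value exists yet.
    let person;
    if adult {
        person = Person { age: 18 };
    } else {
        person = Person::default();
    }
    // Both branches assigned `person`, so it can be used here.
    // Removing either assignment would be a compile error.
    person
}

fn main() {
    println!("{:?}", pick(true));
    println!("{:?}", pick(false));
}
```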
Implicit initialization of the base class object
In C++, the default constructor is used to initialize the base class object if no other constructor is specified.
class Base {
int x;
public:
Base() : x(0) {}
};
class Derived : Base {
public:
// Calls the default constructor for Base
Derived() {}
};
Since Rust does not have inheritance, there is no equivalent to this case. See the chapter on implementation reuse or the section on traits in the Rust book for alternatives.
std::unique_ptr
There are some additional cases where the Default trait is used in Rust, but default constructors are not used for initialization in C++.
Rust's equivalents of smart pointers implement Default by delegating to the Default implementation of the contained type.
#[derive(Default)]
struct Person {
    age: i32,
}
fn main() {
    let b: Box<Person> = Default::default();
    // ...
}
This differs from the treatment of std::unique_ptr in C++ because unlike Box, std::unique_ptr is nullable, and so the default constructor for std::unique_ptr produces a pointer that owns nothing. The equivalent type in Rust is Option<Box<Person>>, for which the Default implementation produces None.
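The contrast between the two defaults can be checked directly. This is a minimal sketch; the defaults helper is a hypothetical name used only to return both values.

```rust
#[derive(Debug, Default, PartialEq)]
struct Person {
    age: i32,
}

// Returns the default value of a non-nullable and of a nullable owning pointer.
fn defaults() -> (Box<Person>, Option<Box<Person>>) {
    let owned: Box<Person> = Default::default();
    let nullable: Option<Box<Person>> = Default::default();
    (owned, nullable)
}

fn main() {
    let (owned, nullable) = defaults();
    // `owned` always points at a `Person`; `nullable` starts out empty,
    // like a default-constructed `std::unique_ptr`.
    println!("{:?} {:?}", owned, nullable);
}
```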
Other uses of Default
Option::unwrap_or_default makes use of Default, which makes getting a default value when the Option does not contain a value more convenient.
fn go(x: Option<i32>) {
    let a: i32 = x.unwrap_or_default();
    // if x was None, then a is 0
    // ...
}
In C++, std::optional does not have an equivalent method.
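Two other standard library functions that rely on Default are std::mem::take and the map entry method or_default. The sketch below shows both; the count_words and drain_pending helpers are hypothetical names chosen for illustration.

```rust
use std::collections::HashMap;

// Counts words; `or_default` inserts `usize::default()` (zero) for a new key.
fn count_words(text: &str) -> HashMap<&str, usize> {
    let mut counts = HashMap::new();
    for word in text.split_whitespace() {
        *counts.entry(word).or_default() += 1;
    }
    counts
}

// Moves the pending items out, leaving `Vec::default()` (an empty vector) behind.
fn drain_pending(pending: &mut Vec<String>) -> Vec<String> {
    std::mem::take(pending)
}

fn main() {
    println!("{:?}", count_words("a b a"));
    let mut pending = vec!["job".to_string()];
    let taken = drain_pending(&mut pending);
    println!("{:?} {:?}", taken, pending);
}
```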
Copy and move constructors
In both C++ and Rust, one rarely has to write copy or move constructors (or their Rust equivalents) by hand. In C++ this is because the implicit definitions are good enough for most purposes, especially when using smart pointers (i.e., following the rule of zero). In Rust this is because move semantics are the default, and the automatically derived implementations of the Clone and Copy traits are good enough for most purposes.
For the following C++ classes, the implicitly defined copy and move constructors are sufficient. The equivalent in Rust uses a derive macro provided by the standard library to implement the corresponding traits.
#include <memory>
#include <string>
struct Age {
unsigned int years;
Age(unsigned int years) : years(years) {}
// copy and move constructors and destructor
// implicitly declared and defined
};
struct Person {
Age age;
std::string name;
std::shared_ptr<Person> best_friend;
Person(Age age,
std::string name,
std::shared_ptr<Person> best_friend)
: age(age), name(name),
best_friend(best_friend) {}
// copy and move constructors and destructor
// implicitly declared and defined
};
use std::rc::Rc;
#[derive(Clone, Copy)]
struct Age {
    years: u32,
}
#[derive(Clone)]
struct Person {
    age: Age,
    name: String,
    best_friend: Rc<Person>,
}
User-defined constructors
On the other hand, the following example requires a user-defined copy and move constructor because it manages a resource (a pointer acquired from a C library). The equivalent in Rust requires a custom implementation of the Clone trait.
#include <cstdlib>
#include <cstring>
// widget.h
struct widget_t;
widget_t *alloc_widget();
void free_widget(widget_t *);
void copy_widget(widget_t *dst, widget_t *src);
// widget.cc
class Widget {
widget_t *widget;
public:
Widget() : widget(alloc_widget()) {}
Widget(const Widget &other) : widget(alloc_widget()) {
copy_widget(widget, other.widget);
}
Widget(Widget &&other) : widget(other.widget) {
other.widget = nullptr;
}
~Widget() {
free_widget(widget);
}
};
mod widget_ffi {
    // Models an opaque type.
    // See https://doc.rust-lang.org/nomicon/ffi.html#representing-opaque-structs
    #[repr(C)]
    pub struct CWidget {
        _data: [u8; 0],
        _marker: core::marker::PhantomData<(
            *mut u8,
            core::marker::PhantomPinned,
        )>,
    }
    extern "C" {
        // Named to match alloc_widget in widget.h above
        pub fn alloc_widget() -> *mut CWidget;
        pub fn copy_widget(
            dst: *mut CWidget,
            src: *mut CWidget,
        );
        pub fn free_widget(ptr: *mut CWidget);
    }
}
use self::widget_ffi::*;
struct Widget {
    widget: *mut CWidget,
}
impl Widget {
    fn new() -> Self {
        Widget {
            widget: unsafe { alloc_widget() },
        }
    }
}
impl Clone for Widget {
    fn clone(&self) -> Self {
        let widget = unsafe { alloc_widget() };
        unsafe {
            copy_widget(widget, self.widget);
        }
        Widget { widget }
    }
}
impl Drop for Widget {
    fn drop(&mut self) {
        unsafe { free_widget(self.widget) };
    }
}
Just as it is uncommon in C++ to need user-defined implementations for copy and move constructors or for destructors, in Rust it is rare to need to implement the Clone and Drop traits by hand for types that do not represent resources.
There is one exception to this. If the type has type parameters, it might be desirable to implement Clone (and Copy) manually even if the clone should be done field-by-field. See the standard library documentation of Clone and of Copy for details.
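The reason for the exception is that a derived Clone implementation adds a Clone bound on every type parameter, even when cloning never touches a value of that type. The sketch below illustrates this with a hypothetical Shared wrapper around Rc and a hypothetical Connection type that is deliberately not Clone.

```rust
use std::rc::Rc;

// Deriving `Clone` here would require `T: Clone`, even though cloning
// the wrapper only copies the `Rc` pointer and never clones a `T`.
struct Shared<T> {
    inner: Rc<T>,
}

// The manual implementation has no bound on `T`.
impl<T> Clone for Shared<T> {
    fn clone(&self) -> Self {
        Shared { inner: Rc::clone(&self.inner) }
    }
}

// A type that deliberately does not implement `Clone`.
struct Connection {
    id: u32,
}

fn main() {
    let a = Shared { inner: Rc::new(Connection { id: 7 }) };
    let b = a.clone();
    // Both handles refer to the same `Connection`.
    println!("{} {}", a.inner.id, b.inner.id);
}
```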
Trivially copyable types
In C++, a class type is trivially copyable when it has no non-trivial copy constructors, move constructors, copy assignment operators, move assignment operators and it has a trivial destructor. Values of a trivially copyable type are able to be copied by copying their bytes.
In the first C++ example above, Age is trivially copyable, but Person is not. This is because despite using a default copy constructor, the constructor is not trivial because std::string and std::shared_ptr are not trivially copyable.
Rust indicates whether types are trivially copyable with the Copy trait. Just as with trivially copyable types in C++, values of types that implement Copy in Rust can be copied by copying their bytes. Rust requires explicit calls to the clone method to make copies of values of types that do not implement Copy.
In the first Rust example above, Age implements the Copy trait but Person does not. This is because neither String nor Rc<Person> implements Copy. They do not implement Copy because they own data that lives on the heap, and so are not trivially copyable.
Rust prevents implementing Copy for a type if any of its fields are not Copy, but does not prevent implementing Copy for types that should not be copied bit-for-bit due to their intended meaning, which is usually indicated by a user-defined Clone implementation.
Rust does not permit the implementation of both Copy and Drop for the same type. This aligns with the C++ standard's requirement that trivially copyable types not implement a user-defined destructor.
Move constructors
In Rust, all types support move semantics by default, and custom move semantics cannot be (and do not need to be) defined. This is because what "move" means in Rust is not the same as it is in C++. In Rust, moving a value means changing what owns the value. In particular, there is no "old" object to be destructed after a move, because the compiler will prevent the use of a variable whose value has been moved.
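One way to observe that a move only changes the owner is to compare the address of a vector's heap buffer before and after the move. This is a minimal sketch; the take and buffer_survives_move functions are hypothetical helpers written for this illustration.

```rust
// Takes ownership of the vector and hands it back to the caller.
fn take(v: Vec<u32>) -> Vec<u32> {
    v
}

// Returns true if the heap buffer is at the same address after the moves.
fn buffer_survives_move() -> bool {
    let v = vec![1u32, 2, 3];
    let before = v.as_ptr();
    // `v` is moved into `take` and the result is moved into `moved`.
    let moved = take(v);
    // Using `v` here would be a compile error: its value has been moved.
    before == moved.as_ptr()
}

fn main() {
    println!("same buffer: {}", buffer_survives_move());
}
```

No user code runs during either move, and no destructor runs for v: only the vector's pointer, length, and capacity change hands.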
Assignment operators
Rust does not have a copy or move assignment operator. Instead, assignment either moves (by transferring ownership), explicitly clones and then moves, or implicitly copies and then moves.
fn main() {
    let x = Box::<u32>::new(5);
    let y = x; // moves
    let z = y.clone(); // explicitly clones and then moves the clone
    let w = *y; // implicitly copies the content of the Box and then moves the copy
}
For situations where something like a user-defined copy assignment could avoid allocations, the Clone trait has an additional method called clone_from. The method is automatically defined, but can be overridden when implementing the Clone trait to provide an efficient implementation.
The method is not used for normal assignments, but can be explicitly used in situations where the performance of the assignment is significant and would be improved by using the more efficient implementation, if one is defined. The implementation can be made more efficient because clone_from takes a mutable reference (&mut self) to the object to which the values are being assigned, and so can do things like reuse its existing memory to avoid allocations.
fn go(x: &Vec<u32>) {
    let mut y = vec![0; x.len()];
    // ...
    y.clone_from(&x);
    // ...
}
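For a user-defined type, overriding clone_from might look like the following sketch. The Buffer type is hypothetical; the override reuses the destination's existing allocation instead of allocating a fresh vector.

```rust
// Hypothetical type that owns a heap-allocated byte buffer.
struct Buffer {
    data: Vec<u8>,
}

impl Clone for Buffer {
    fn clone(&self) -> Self {
        Buffer { data: self.data.clone() }
    }

    // Overrides the default, which would be `*self = source.clone()`.
    fn clone_from(&mut self, source: &Self) {
        // `clear` keeps the existing capacity, so no new allocation is
        // needed when the source fits in the current buffer.
        self.data.clear();
        self.data.extend_from_slice(&source.data);
    }
}

fn main() {
    let mut dst = Buffer { data: Vec::with_capacity(1024) };
    let src = Buffer { data: vec![1, 2, 3] };
    dst.clone_from(&src);
    println!("{:?} (capacity {})", dst.data, dst.data.capacity());
}
```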
Performance concerns and Copy
The decision to implement Copy should be based on the semantics of the type, not on performance. If the size of objects being copied is a concern, then one should instead use a reference (&T or &mut T) or put the value on the heap (Box<T> or Rc<T>). These approaches correspond to passing by reference, or using a std::unique_ptr or std::shared_ptr in C++.
Rule of three/five/zero
Rule of three
In C++ the rule of three is a rule of thumb that if a class has a user-defined destructor, copy constructor or copy assignment operator, it probably should have all three.
The corresponding rule for Rust is that if a type has a user-defined Clone or Drop implementation, it probably needs both. This is for the same reason as the rule of three in C++: if a type has a user-defined implementation for Clone or Drop, it is probably because the type manages a resource, and both Clone and Drop will need to take special actions for the resource.
Rule of five
The rule of five in C++ states that if move semantics are needed for a type with a user-defined copy constructor or copy assignment operator, then a user-defined move constructor and move assignment should also be provided, because no implicit move constructor or move assignment operator will be generated.
In Rust, this rule is not relevant because of the difference in move semantics between C++ and Rust.
Rule of zero
The rule of zero states that classes with user-defined copy/move constructors, assignment operators, and destructors should deal only with ownership, and other classes should not have those constructors or destructors. In practice, most classes should make use of types from the STL (shared_ptr, vector, etc.) for dealing with ownership concerns so that the implicitly defined copy and move constructors are sufficient.
In Rust, the same is true. See the list of Rust type equivalents for equivalents of C++ smart pointer types and equivalents of C++ container types.
One difference between C++ and Rust in applying the rule of zero is that in C++ std::unique_ptr can take a custom deleter, making it possible to use std::unique_ptr for wrapping raw pointers that require custom destruction logic. In Rust, the Box type is not parameterized in the same way. To accomplish the same goal, one instead must define a new type with a user-defined Drop implementation, as is done in the example in the chapter on copy and move constructors.
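A self-contained sketch of the newtype approach follows. The open_handle and close_handle functions and the OPEN_HANDLES counter are stand-ins written in Rust for a C API (all hypothetical); with a real library they would be declared in an extern "C" block.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Stand-ins for a C API (hypothetical). The counter only exists so that
// the effect of the cleanup can be observed.
static OPEN_HANDLES: AtomicUsize = AtomicUsize::new(0);

fn open_handle() -> u32 {
    OPEN_HANDLES.fetch_add(1, Ordering::SeqCst);
    42
}

fn close_handle(_handle: u32) {
    OPEN_HANDLES.fetch_sub(1, Ordering::SeqCst);
}

fn open_count() -> usize {
    OPEN_HANDLES.load(Ordering::SeqCst)
}

// The newtype plays the role of `std::unique_ptr` with a custom deleter.
struct Handle(u32);

impl Handle {
    fn open() -> Self {
        Handle(open_handle())
    }
}

impl Drop for Handle {
    // The custom destruction logic lives here.
    fn drop(&mut self) {
        close_handle(self.0);
    }
}

fn main() {
    {
        let _h = Handle::open();
        println!("open inside scope: {}", open_count());
    }
    println!("open after scope: {}", open_count());
}
```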
Destructors and resource cleanup
In C++, a destructor for a class T is defined by providing a special member function ~T(). To achieve the equivalent in Rust, the Drop trait is implemented for a type.
For an example, see the chapter on copy and move constructors.
Drop implementations play the same role as destructors in C++ for types that manage resources. That is, they enable cleanup of resources owned by the value at the end of the value's lifetime.
In Rust the Drop::drop method of a value is called automatically by a destructor when the variable that owns the value goes out of scope. Unlike in C++, the drop method cannot be called manually. After it runs, the automatic "drop glue" implicitly calls the destructors of the fields.
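The interaction between a user-defined drop and the drop glue can be observed by logging each call. In this sketch the Part, Machine, and Log names are hypothetical; the log shows the containing value's drop running first, followed by the fields in declaration order.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Shared log used to record the order of `drop` calls.
type Log = Rc<RefCell<Vec<&'static str>>>;

struct Part {
    name: &'static str,
    log: Log,
}

impl Drop for Part {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

struct Machine {
    first: Part,
    second: Part,
    log: Log,
}

impl Drop for Machine {
    fn drop(&mut self) {
        // Runs before the fields are dropped; the fields are still valid here.
        self.log.borrow_mut().push("machine");
    }
}

fn drop_order() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    let machine = Machine {
        first: Part { name: "first", log: Rc::clone(&log) },
        second: Part { name: "second", log: Rc::clone(&log) },
        log: Rc::clone(&log),
    };
    // Ownership moves into `drop`, which ends the value's lifetime.
    drop(machine);
    let order = log.borrow().clone();
    order
}

fn main() {
    println!("{:?}", drop_order());
}
```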
Lifetimes and destructors
C++ destructors are called in reverse order of construction when variables go out of scope, or for dynamically allocated objects, when they are deleted. This includes destructors of moved-from objects.
In Rust, the drop order is similar to that of C++ (reverse order of declaration). If additional specific details about the drop order are needed (e.g., for writing unsafe code), the full rules for the drop order are described in the language reference. However, moving an object in Rust does not leave a moved-from object on which a destructor will be called.
#include <iostream>
#include <utility>
struct A {
int id;
A(int id) : id(id) {}
// copy constructor
A(const A &other) : id(other.id) {}
// move constructor
A(A &&other) : id(other.id) {
other.id = 0;
}
// destructor
~A() {
std::cout << id << std::endl;
}
};
int accept(A x) {
return x.id;
} // the destructor of x is called after the
// return expression is evaluated
// Prints:
// 2
// 3
// 0
// 1
int main() {
A x(1);
A y(2);
accept(std::move(y));
A z(3);
return 0;
}
struct A {
    id: i32,
}
impl Drop for A {
    fn drop(&mut self) {
        println!("{}", self.id)
    }
}
fn accept(x: A) -> i32 {
    return x.id;
}
// Prints:
// 2
// 3
// 1
fn main() {
    let x = A { id: 1 };
    let y = A { id: 2 };
    accept(y);
    let z = A { id: 3 };
}
In Rust, after ownership of y is moved into the function accept, there is no additional object remaining, and so there is no additional Drop::drop call (which in the C++ example prints 0).
Rust's drop methods do run when leaving scope due to a panic, though not if the panic occurs in a destructor that was called in response to an initial panic.
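The behavior during a panic can be sketched with std::panic::catch_unwind, assuming the default panic strategy (unwinding rather than abort). The Guard type and cleaned_up_after_panic function are hypothetical names for this illustration.

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};

// Records that its `drop` ran by setting the flag it refers to.
struct Guard<'a>(&'a AtomicBool);

impl Drop for Guard<'_> {
    fn drop(&mut self) {
        self.0.store(true, Ordering::SeqCst);
    }
}

// Returns true if the guard was dropped while unwinding from the panic.
// Assumes the program is built with panic = "unwind" (the default).
fn cleaned_up_after_panic() -> bool {
    let cleaned = AtomicBool::new(false);
    let result = panic::catch_unwind(panic::AssertUnwindSafe(|| {
        let _guard = Guard(&cleaned);
        panic!("something went wrong");
    }));
    result.is_err() && cleaned.load(Ordering::SeqCst)
}

fn main() {
    println!("cleaned up: {}", cleaned_up_after_panic());
}
```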
Early cleanup and explicitly destroying values
In C++ you can explicitly destroy an object. This is mainly useful for situations where placement new has been used to allocate the object at a specific memory location, and so the destructor will not be implicitly called.
However, once the destructor has been explicitly called, it may not be called again, even implicitly. Thus the destructor can't be used for early cleanup. Instead, either the class must be designed with a separate cleanup method that releases the resources but leaves the object in a state where the destructor can be called or the function using the object must be structured so that the variable goes out of scope at the desired time.
In Rust, values can be dropped early for early cleanup by using std::mem::drop. This works because (for non-Copy types) ownership of the object is actually transferred to the std::mem::drop function, and so Drop::drop is called at the end of std::mem::drop when the lifetime of the parameter ends.
Thus, std::mem::drop can be used for early cleanup of resources without having to restructure a function to force variables out of scope early.
For example, the following allocates a large vector on the heap, but explicitly drops it before allocating a second large vector on the heap, reducing the overall memory usage.
fn main() {
    let v = vec![0u32; 100000];
    // ... use v
    std::mem::drop(v);
    // can no longer use v here
    let v2 = vec![0u32; 100000];
    // ... use v2
}
Data modeling
In C++ the mechanisms available for data modeling are classes, enums, and unions.
Rust, on the other hand, uses records (structs) and algebraic data types (enums).
Although Rust supports one major piece of object oriented design, polymorphism using interfaces, Rust also has language features for modeling things using algebraic data types (which in simple cases are like a much more ergonomic std::variant).
This section gives examples of common constructions used when programming in C++ and how to achieve the same effects using Rust's features.
Fixed operations, varying data
In situations where one needs to model a fixed set of operations that clients will use, but the data that implements those operations is not fixed ahead of time, the approach in C++ and the approach in Rust are the same. In both cases, interfaces that specify the required operations are defined. Concrete types, possibly defined by the client, implement those interfaces.
This way of modeling data can make use of either dynamic or static dispatch, each of which is covered in its own section.
Fixed data, varying operations
In situations where there is a fixed set of data but the operations that the data must support vary, there are a few approaches in C++. Which approaches are available depends on the version of the standard in use.
In older versions of the standard, one might use manually defined tagged unions. In newer versions, std::variant is available to improve the safety and ergonomics of tagged unions. Both of these approaches map to the same approach in Rust.
Additionally, despite it not being strictly necessary to model a fixed set of variants, the visitor pattern is sometimes used for this situation, especially when using versions of the C++ standard before the introduction of std::variant. In most of these cases the idiomatic Rust solution is the same as what one would do when converting a C++ solution that uses tagged unions. The chapter on the visitor pattern describes when to use a Rust version of the visitor pattern or when to use Rust's enums (which are closer to std::variant than to C++ enums) to model the data.
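A minimal sketch of the enum approach, with a hypothetical Shape enum: the set of variants is fixed, while operations such as area and perimeter are ordinary functions that can be added independently, each matching on the variants.

```rust
use std::f64::consts::PI;

// The fixed set of data: every shape is one of these variants.
enum Shape {
    Circle { radius: f64 },
    Rectangle { width: f64, height: f64 },
}

// One operation over the data.
fn area(shape: &Shape) -> f64 {
    match shape {
        Shape::Circle { radius } => PI * radius * radius,
        Shape::Rectangle { width, height } => width * height,
    }
}

// Another operation, added without touching `Shape` or `area`.
fn perimeter(shape: &Shape) -> f64 {
    match shape {
        Shape::Circle { radius } => 2.0 * PI * radius,
        Shape::Rectangle { width, height } => 2.0 * (width + height),
    }
}

fn main() {
    let shapes = [
        Shape::Circle { radius: 1.0 },
        Shape::Rectangle { width: 2.0, height: 3.0 },
    ];
    for shape in &shapes {
        println!("{} {}", area(shape), perimeter(shape));
    }
}
```

Because match must be exhaustive, adding a variant to Shape makes the compiler flag every operation that does not yet handle it.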
Varying data and operations
When both data and operations may be extended by a client, the visitor pattern is used in both C++ and in Rust.
Abstract classes, interfaces, and dynamic dispatch
In C++ when an interface will be used with dynamic dispatch to resolve invoked methods, the interface is defined using an abstract class. Types that implement the interface inherit from the abstract class. In Rust the interface is given by a trait, which is then implemented for the types that support that trait. Programs can then be written over trait objects that use that trait as their base type.
The following example defines an interface, two implementations of that interface, and a function that takes an argument that satisfies the interface. In C++ the interface is defined with an abstract class with pure virtual methods, and in Rust the interface is defined with a trait. In both languages, the function (printArea in C++ and print_area in Rust) invokes a method using dynamic dispatch.
#include <iostream>
#include <memory>
// Define an abstract class for an interface
struct Shape {
Shape() = default;
virtual ~Shape() = default;
virtual double area() = 0;
};
// Implement the interface for a concrete class
struct Triangle : public Shape {
double base;
double height;
Triangle(double base, double height)
: base(base), height(height) {}
double area() override {
return 0.5 * base * height;
}
};
// Implement the interface for a concrete class
struct Rectangle : public Shape {
double width;
double height;
Rectangle(double width, double height)
: width(width), height(height) {}
double area() override {
return width * height;
}
};
// Use an object via a reference to the interface
void printArea(Shape &shape) {
std::cout << shape.area() << std::endl;
}
int main() {
Triangle triangle = Triangle{1.0, 1.0};
printArea(triangle);
// Use an object via an owned pointer to the
// interface
std::unique_ptr<Shape> shape;
if (true) {
shape = std::make_unique<Rectangle>(1.0, 1.0);
} else {
shape = std::make_unique<Triangle>(
std::move(triangle));
}
// Convert to a reference to the interface
printArea(*shape);
}
// Define an interface
trait Shape {
    fn area(&self) -> f64;
}
struct Triangle {
    base: f64,
    height: f64,
}
// Implement the interface for a concrete type
impl Shape for Triangle {
    fn area(&self) -> f64 {
        0.5 * self.base * self.height
    }
}
struct Rectangle {
    width: f64,
    height: f64,
}
// Implement the interface for a concrete type
impl Shape for Rectangle {
    fn area(&self) -> f64 {
        self.width * self.height
    }
}
// Use a value via a reference to the interface
fn print_area(shape: &dyn Shape) {
    println!("{}", shape.area());
}
fn main() {
    let triangle = Triangle {
        base: 1.0,
        height: 1.0,
    };
    print_area(&triangle);
    // Use a value via an owned pointer to the
    // interface
    let shape: Box<dyn Shape> = if true {
        Box::new(Rectangle {
            width: 1.0,
            height: 1.0,
        })
    } else {
        Box::new(triangle)
    };
    // Convert to a reference to the interface
    print_area(shape.as_ref());
}
There are several places where the Rust implementation differs slightly from the C++ implementation.
In Rust, a trait's methods are always visible whenever the trait itself is visible. Additionally, the fact that a type implements a trait is always visible whenever both the trait and the type are visible. These properties of Rust explain the lack of visibility declarations in places where one might find them in C++.
In C++, to associate methods with a type rather than a value of that type, you use the static keyword. In Rust, non-static methods take an explicit self parameter. This syntactic choice makes it possible to indicate (in a way similar to other parameters) whether the method mutates the object (by taking &mut self instead of &self) and whether it takes ownership of the object (by taking self instead of &self).
Rust methods do not need to be declared as virtual. Because of differences in vtable representation, all methods for a type are available for dynamic dispatch. Types of values that use vtables are indicated with the dyn keyword. This is further described below.
Additionally, Rust does not have an equivalent for the virtual destructor declaration because in Rust every vtable includes the drop behavior (whether given by a user defined Drop implementation or not) required for the value.
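The following sketch illustrates the point about drop behavior: a value dropped through Box<dyn Shape> still runs the Drop implementation of its concrete type, with nothing resembling a virtual destructor declared on the trait. The Square type and its dropped flag are hypothetical and exist only to make the call observable.

```rust
use std::cell::Cell;
use std::rc::Rc;

trait Shape {
    fn area(&self) -> f64;
}

struct Square {
    side: f64,
    // Set to true when this value is dropped.
    dropped: Rc<Cell<bool>>,
}

impl Shape for Square {
    fn area(&self) -> f64 {
        self.side * self.side
    }
}

impl Drop for Square {
    fn drop(&mut self) {
        self.dropped.set(true);
    }
}

// Returns true if `Square`'s `drop` ran when the trait object was dropped.
fn drop_through_trait_object() -> bool {
    let dropped = Rc::new(Cell::new(false));
    let shape: Box<dyn Shape> = Box::new(Square {
        side: 2.0,
        dropped: Rc::clone(&dropped),
    });
    println!("{}", shape.area());
    // The static type here is `Box<dyn Shape>`, not `Box<Square>`.
    drop(shape);
    dropped.get()
}

fn main() {
    println!("dropped: {}", drop_through_trait_object());
}
```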
Vtables and Rust trait object types
C++ and Rust both require some kind of indirection to perform dynamic dispatch against an interface. In C++ this indirection takes the form of a pointer to the abstract class (instead of the derived concrete class), making use of a vtable to resolve the virtual method.
In the above Rust example, the type dyn Shape is the type of a trait object for the Shape trait. A reference or pointer to a trait object carries a pointer to a vtable along with a pointer to the underlying value.
In typical C++ implementations, all objects whose class inherits from a class with a virtual method carry a pointer to a vtable in their representation, whether dynamic dispatch is used or not. Pointers or references to such objects are the same size as pointers to objects without virtual methods, but every object includes its vtable pointer.
In Rust, vtables are present only when values are represented as trait objects.
The reference to the trait object is twice the size of a normal reference since
it includes both the pointer to the value and the pointer to the vtable. In the
Rust example above, the local variable triangle
in main
does not have a
vtable in its representation, but when the reference to it is converted to a
reference to a trait object (so that it can be passed to print_area
), that
does include a pointer to the vtable.
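The size difference can be checked directly with std::mem::size_of. This is only a sketch: it prints sizes rather than assuming particular values, since those depend on the target.

```rust
use std::mem::size_of;

trait Shape {
    fn area(&self) -> f64;
}

struct Triangle {
    base: f64,
    height: f64,
}

impl Shape for Triangle {
    fn area(&self) -> f64 {
        0.5 * self.base * self.height
    }
}

fn main() {
    // The value stores only its two fields; there is no vtable pointer in it.
    println!("Triangle:   {} bytes", size_of::<Triangle>());
    // An ordinary reference is a single pointer.
    println!("&Triangle:  {} bytes", size_of::<&Triangle>());
    // A reference to a trait object carries a data pointer and a vtable pointer.
    println!("&dyn Shape: {} bytes", size_of::<&dyn Shape>());
}
```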
Additionally, just as abstract classes in C++ cannot be used as the type of a
local variable, the type of a parameter of a function, or the type of a return
value of a function, trait object types in Rust cannot be used in corresponding
contexts. In Rust, this is enforced by the type dyn Shape
not implementing the
Sized
marker trait, preventing it from being used in contexts that require
knowing the size of a type statically.
The following example shows some places where a trait object type can and cannot
be used due to not implementing Sized
. The uses forbidden in Rust would also
be forbidden in C++ because Shape
is an abstract class.
trait Shape {
    fn area(&self) -> f64;
}
struct Triangle {
    base: f64,
    height: f64,
}
impl Shape for Triangle {
    fn area(&self) -> f64 {
        0.5 * self.base * self.height
    }
}
fn main() {
    // Local variables must have a known size.
    // let v: dyn Shape = Triangle { base: 1.0, height: 1.0 };
    // References always have a known size.
    let shape: &dyn Shape = &Triangle {
        base: 1.0,
        height: 1.0,
    };
    // Boxes also always have a known size.
    let boxed_shape: Box<dyn Shape> = Box::new(Triangle {
        base: 1.0,
        height: 1.0,
    });
    // Types like Option<T> hold the value of type T directly, and so also
    // need to know the size of T.
    // let v: Option<dyn Shape> = Some(Triangle { base: 1.0, height: 1.0 });
}
// Parameter types must have a known size.
// fn print_area(shape: dyn Shape) { }
fn print_area(shape: &dyn Shape) {}
The decision to include the vtable in the reference instead of in the value is one part of what makes it reasonable to use traits both for polymorphism via dynamic dispatch and for polymorphism via static dispatch, where one would use concepts in C++.
Limitations of trait objects in Rust
In Rust, not all traits can be used as the base trait for trait objects. The most commonly encountered restriction is that traits that require knowledge of the object's size via a Sized supertrait are not dyn-compatible. There are additional restrictions.
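One of the additional restrictions is that a method returning Self by value is normally not allowed in a dyn-compatible trait. The sketch below shows a common way to keep such a trait usable with dyn, by restricting that one method with where Self: Sized. The Square type and the scaled method are hypothetical names for illustration.

```rust
trait Shape {
    fn area(&self) -> f64;

    // Returning Self by value requires a statically known size. The `where`
    // clause excludes this method from trait objects instead of making the
    // whole trait unusable with `dyn`.
    fn scaled(&self, factor: f64) -> Self
    where
        Self: Sized;
}

struct Square {
    side: f64,
}

impl Shape for Square {
    fn area(&self) -> f64 {
        self.side * self.side
    }

    fn scaled(&self, factor: f64) -> Self {
        Square {
            side: self.side * factor,
        }
    }
}

fn main() {
    let square = Square { side: 2.0 };
    // Called on the concrete type, where the size is known.
    let bigger = square.scaled(2.0);
    // The trait is still dyn-compatible.
    let shape: &dyn Shape = &bigger;
    println!("{}", shape.area());
    // shape.scaled(2.0); // would not compile: not callable on a trait object
}
```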
Trait objects and lifetimes
Objects which are used with dynamic dispatch may contain pointers or references to other objects. In C++ the lifetimes of those references must be tracked manually by the programmer.
Rust checks the bounds on the lifetimes of references that the trait objects may contain. If the bounds are not given explicitly, they are determined according to the lifetime elision rules. The bound is part of the type of the trait object.
Usually the elision rules pick the correct lifetime bound. Sometimes, the rules
result in surprising error messages from the compiler. In those situations or
when the compiler cannot determine which lifetime bound to assign, the bound may
be given manually. The following example shows explicitly what the inferred
lifetimes are for a structure storing a trait object and for the print_area
function.
trait Shape {
    fn area(&self) -> f64;
}
struct Triangle {
    base: f64,
    height: f64,
}
impl Shape for Triangle {
    fn area(&self) -> f64 {
        0.5 * self.base * self.height
    }
}
struct Scaled {
    scale: f64,
    // 'static is the lifetime that would be inferred by the lifetime elision
    // rule [lifetime-elision.trait-object.default].
    shape: Box<dyn Shape + 'static>,
}
impl Shape for Scaled {
    fn area(&self) -> f64 {
        self.scale * self.shape.area()
    }
}
// These are the lifetimes that would be inferred by the lifetime elision rule
// [lifetime-elision.function.implicit-lifetime-parameters] for the reference
// and [lifetime-elision.trait-object.containing-type-unique] for the trait
// bound.
fn print_area<'a>(shape: &'a (dyn Shape + 'a)) {
    println!("{}", shape.area());
}
fn main() {
    let triangle = Triangle {
        base: 1.0,
        height: 1.0,
    };
    print_area(&triangle);
    let scaled_triangle = Scaled {
        scale: 2.0,
        shape: Box::new(triangle),
    };
    print_area(&scaled_triangle);
}
Concepts, interfaces, and static dispatch
In C++, static dispatch over an interface is achieved by implementing a template function or template method that interacts with the type using some expected interface.
The template function twiceArea in the example below makes use of an area() method on the template type parameter.
To achieve the same goal in Rust involves defining a trait (Shape) with the desired method (area) and using the trait as a bound on the type parameter for the generic function (twice_area).
#include <iostream>
struct Triangle {
double base;
double height;
Triangle(double base, double height)
: base(base), height(height) {}
// NOT virtual: it will be used with static dispatch
double area() {
return 0.5 * base * height;
}
};
// Generic function using interface
template <class T>
double twiceArea(T &shape) {
return shape.area() * 2;
}
int main() {
Triangle triangle{1.0, 1.0};
std::cout << twiceArea(triangle) << std::endl;
return 0;
}
// Interface that generic function will use
trait Shape {
    fn area(&self) -> f64;
}
struct Triangle {
    base: f64,
    height: f64,
}
// Implementation of interface for type
impl Shape for Triangle {
    fn area(&self) -> f64 {
        0.5 * self.base * self.height
    }
}
// Generic function using interface
fn twice_area<T: Shape>(shape: &T) -> f64 {
    2.0 * shape.area()
}
fn main() {
    let triangle = Triangle {
        base: 1.0,
        height: 1.0,
    };
    println!("{}", twice_area(&triangle));
}
Note that in the Rust example, the definition of the trait and the struct have not changed from the example in the chapter on virtual methods and dynamic dispatch. Even so, this example does use static dispatch. This is the result of a design trade-off in Rust around the representation of vtables and vptrs which is described later in that chapter.
The difference between Rust and C++ in the above examples arises from Rust being nominally typed (types must opt in to supporting a specific interface, merely having the right methods isn't enough) and C++'s template meta-programming enabling a kind of structural or duck typing (types only need to have the methods actually used, and there is no need to explicitly opt in to supporting an interface).
Templates vs generic functions
The reason why Rust is nominally typed instead of structurally typed has to do with the difference between C++ templates and Rust generic functions. In particular, C++ templates are only type checked after all of the template arguments are provided and they are fully expanded, while Rust generic functions are type checked independently of the type arguments.
Since the functions are checked before the type arguments are known, the methods and functions that can be applied to values of those types also need to be known before the type arguments are known.
This point in the programming language design space favors simplicity of reasoning about these functions over the flexibility that comes from the template programming approach. This becomes especially valuable when writing libraries that provide generic functions defined in terms of other generic functions. For such libraries a C++ compiler can give far fewer static guarantees, since it is not possible to test all possible instantiations.
In both C++ and Rust, however, multiple implementations are generated by the compiler in order to achieve static dispatch.
C++ constraints and concepts
Rust's approach to static dispatch over an interface can be partially (but only partially) modeled with a strict application of C++ concepts.
The usual way to apply concepts is still structural and does not model Rust's approach: it only requires that a method with specific properties be present on the type.
#include <concepts>
template <typename T>
concept shape = requires(T t) {
{ t.area() } -> std::same_as<double>;
};
template <shape T>
double twiceArea(T shape) {
return shape.area() * 2;
}
A closer equivalent to the above Rust program in C++ is to use a combination of abstract classes and concepts.
#include <concepts>
#include <iostream>
struct Shape {
Shape() {}
virtual ~Shape() {}
virtual double area() = 0;
};
template <typename T>
concept shape = std::derived_from<T, Shape>;
struct Triangle : Shape {
double base;
double height;
Triangle(double base, double height) : base(base), height(height) {}
// Overrides the virtual method, but twiceArea below calls it
// through the concrete type T, so static dispatch can be used.
double area() override {
return 0.5 * base * height;
}
};
template <shape T>
double twiceArea(T shape) {
return shape.area() * 2;
}
int main() {
Triangle triangle{1.0, 1.0};
std::cout << twiceArea(triangle) << std::endl;
return 0;
}
This is still not the same, however, because the concept only creates a
requirement on the use of the template, not on the use of values of type T
within the template. In Rust, the trait bound constrains both. So the following
still compiles in C++.
#include <concepts>
struct Shape {
Shape() {}
virtual ~Shape() {}
virtual double area() = 0;
};
template <typename T>
concept shape = std::derived_from<T, Shape>;
template <shape T>
double twiceArea(T shape) {
// note the call to a method not defined in Shape
return shape.volume() * 2;
}
However, the equivalent does not compile in Rust and instead produces an error.
trait Shape {
fn area(&self) -> f64;
}
fn twice_area<T: Shape>(shape: &T) -> f64 {
// note the call to a method not defined in Shape
2.0 * shape.volume()
}
error[E0599]: no method named `volume` found for reference `&T` in the current scope
--> example.rs:7:17
|
7 | 2.0 * shape.volume()
| ^^^^^^ method not found in `&T`
These additional static checks mean that in many situations where C++ templates would be useful but hard to implement correctly, Rust generics are freely used.
Required traits and ergonomics
In the above examples, the function requiring a trait was defined like the following.
fn twice_area<T: Shape>(shape: &T) -> f64 {
2.0 * shape.area()
}
This is a commonly used shorthand for the following:
fn twice_area<T>(shape: &T) -> f64
where
T: Shape,
{
2.0 * shape.area()
}
The more verbose form is preferred when there are many type parameters or those
type parameters must implement many traits. An even shorter form, available in some cases, uses the impl keyword:
fn twice_area(shape: &impl Shape) -> f64 {
2.0 * shape.area()
}
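As a sketch of when the where form pays off, the following hypothetical function has two type parameters with two bounds each. The describe_larger name and the use of Debug are assumptions made for this illustration.

```rust
use std::fmt::Debug;

trait Shape {
    fn area(&self) -> f64;
}

#[derive(Debug)]
struct Triangle {
    base: f64,
    height: f64,
}

impl Shape for Triangle {
    fn area(&self) -> f64 {
        0.5 * self.base * self.height
    }
}

// Hypothetical helper: returns the Debug rendering of the larger shape.
// With several parameters and bounds, the where clause keeps the
// parameter list readable.
fn describe_larger<A, B>(a: &A, b: &B) -> String
where
    A: Shape + Debug,
    B: Shape + Debug,
{
    if a.area() >= b.area() {
        format!("{:?}", a)
    } else {
        format!("{:?}", b)
    }
}

fn main() {
    let small = Triangle { base: 1.0, height: 1.0 };
    let large = Triangle { base: 2.0, height: 2.0 };
    println!("{}", describe_larger(&small, &large));
}
```

One trade-off of the impl form in argument position is that the caller cannot name the type argument explicitly, so the named-parameter forms are still needed when that matters.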
Generics and lifetimes
When defining a template in C++ that makes use of a type template parameter, the lifetimes of references stored within objects of that type must be tracked manually by the programmer.
The following (contrived) C++ example compiles without error, but could be used in a way that results in undefined behavior.
#include <memory>
struct Shape {
Shape() {}
virtual ~Shape() {}
virtual double area() = 0;
};
template<typename S>
void store(S s, std::unique_ptr<Shape> data) {
// Will pointers or references in `s` become dangling while `data`
// is still in use?
*data = s;
}
Rust checks the bounds on lifetimes of references contained within type parameters. Just as with trait object types, these bounds are usually inferred according to the lifetime elision rules. When they cannot be inferred, or they are inferred incorrectly, the bounds can be declared manually.
In the Rust transliteration of the above example, the lifetime bounds have to be given manually because the inferred bounds are incorrect. Without explicit bounds, the compiler produces an error.
trait Shape {}
fn store<S: Shape>(x: S, data: &mut Box<dyn Shape>) {
*data = Box::new(x);
}
error[E0310]: the parameter type `S` may not live long enough
--> example.rs:7:5
|
7 | *data = Box::new(x);
| ^^^^^
| |
| the parameter type `S` must be valid for the static lifetime...
| ...so that the type `S` will meet its required lifetime bounds
|
The error message becomes clearer when the inferred lifetime bounds are made explicit. With the given type for store, the argument for x could be something whose lifetime does not last as long as the lifetime required of the contents of the box.
trait Shape {}
struct Triangle {
base: f64,
height: f64,
}
impl Shape for Triangle {}
// The type parameter S is assigned no lifetime bound.
fn store<'a, S: Shape>(
x: S,
// The reference is assigned a fresh lifetime by rule
// [lifetime-elision.function.implicit-lifetime-parameters].
//
// The trait object is assigned 'static by rule
// [lifetime-elision.trait-object.default] and
// [lifetime-elision.trait-object.innermost-type].
data: &'a mut Box<dyn Shape + 'static>,
) {
*data = Box::new(x);
}
// An example of how the implementation of store could be misused with
// the given type.
fn main() {
let triangle = Triangle {
base: 1.0,
height: 2.0,
};
let mut b: Box<dyn Shape> = Box::new(triangle);
{
let short_lived_triangle = Triangle {
base: 5.0,
height: 10.0,
};
store(short_lived_triangle, &mut b);
}
// Here b contains a dangling reference.
}
For this specific case, the most general solution is to define a new lifetime parameter to bound both S and dyn Shape. The lifetime parameter for the reference can be elided, because it will be assigned a fresh lifetime parameter.
trait Shape {}
// Note the common bound
// -----------------here-\
// ----------------------|---------------------------and here-\
//                       v                                    v
fn store<'s, S: Shape + 's>(x: S, data: &mut Box<dyn Shape + 's>) {
    *data = Box::new(x);
}
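A minimal sketch of what the common bound allows: with this signature, store also accepts a shape that borrows its data, as long as the borrowed data outlives the box. The BorrowedRectangle type and the stored_area function are hypothetical, and an area method is added to Shape here only so that the result can be observed.

```rust
trait Shape {
    fn area(&self) -> f64;
}

// Hypothetical shape that borrows its dimensions instead of owning them.
struct BorrowedRectangle<'a> {
    dims: &'a [f64; 2],
}

impl Shape for BorrowedRectangle<'_> {
    fn area(&self) -> f64 {
        self.dims[0] * self.dims[1]
    }
}

fn store<'s, S: Shape + 's>(x: S, data: &mut Box<dyn Shape + 's>) {
    *data = Box::new(x);
}

fn stored_area() -> f64 {
    // Both arrays are declared before the box, so they outlive it.
    let first = [1.0, 1.0];
    let second = [2.0, 3.0];
    let mut slot: Box<dyn Shape + '_> = Box::new(BorrowedRectangle { dims: &first });
    // Accepted: the borrow of `second` lasts at least as long as `slot`.
    store(BorrowedRectangle { dims: &second }, &mut slot);
    slot.area()
}

fn main() {
    println!("{}", stored_area());
}
```

If second were declared in an inner scope that ended before slot was last used, the borrow checker would reject the call to store.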
Enums
In C++, enums are often used to model a fixed set of alternatives, especially when each of those enumerators corresponds to a specific integer value, such as is needed when working with hardware, system calls, or protocol implementations.
For example, the various modes for a GPIO pin could be modeled as an enum, which would restrict methods using the mode to valid values.
While Rust enums are more general, they can still be used for this sort of modeling.
#include <cstdint>
enum Pin : uint8_t {
Pin1 = 0x01,
Pin2 = 0x02,
Pin3 = 0x04
};
enum Mode : uint8_t {
Output = 0x03,
Pullup = 0x04,
Analog = 0x27
// ...
};
void low_level_set_pin(uint8_t pin, uint8_t mode);
void set_pin_mode(Pin pin, Mode mode) {
low_level_set_pin(pin, mode);
}
#[repr(u8)]
#[derive(Clone, Copy)]
enum Pin {
    Pin1 = 0x01,
    Pin2 = 0x02,
    Pin3 = 0x04,
}
#[repr(u8)]
#[derive(Clone, Copy)]
enum Mode {
    Output = 0x03,
    Pullup = 0x04,
    Analog = 0x27,
    // ...
}
extern "C" {
    fn low_level_set_pin(pin: u8, mode: u8);
}
fn set_pin_mode(pin: Pin, mode: Mode) {
    unsafe { low_level_set_pin(pin as u8, mode as u8) };
}
The #[repr(u8)] attribute ensures that the representation of the enum is the same as a byte (like declaring the underlying type of an enum in C++). The enum values can then be freely converted to the underlying type with the as operator.
In C++ the standard way to convert from an integer to an enum is a static cast. However, this requires that the user check the validity of the cast themselves. Often the conversion is done by a function that checks that the value to convert is a valid enum value.
In Rust the standard way to perform the conversion is to implement the TryFrom trait for the type and then use the try_from method or try_into method.
#include <cstdint>
enum Pin : uint8_t {
Pin1 = 0x01,
Pin2 = 0x02,
Pin3 = 0x04
};
struct InvalidPin {
uint8_t pin;
};
Pin to_pin(uint8_t pin) {
// The values are not contiguous, so we can't
// just check the bounds and then cast.
switch (pin) {
case 0x1: { return Pin1; }
case 0x2: { return Pin2; }
case 0x4: { return Pin3; }
}
throw InvalidPin{pin};
}
int main() {
try {
Pin p(to_pin(2));
// use pin p (it is only in scope inside the try block)
} catch (InvalidPin &e) {
return 0;
}
}
#[repr(u8)]
#[derive(Clone, Copy)]
enum Pin {
    Pin1 = 0x01,
    Pin2 = 0x02,
    Pin3 = 0x04,
}
use std::convert::TryFrom;
struct InvalidPin(u8);
impl TryFrom<u8> for Pin {
    type Error = InvalidPin;
    fn try_from(value: u8) -> Result<Self, Self::Error> {
        match value {
            0x01 => Ok(Pin::Pin1),
            0x02 => Ok(Pin::Pin2),
            0x04 => Ok(Pin::Pin3),
            pin => Err(InvalidPin(pin)),
        }
    }
}
fn main() {
    let Ok(p) = Pin::try_from(2) else {
        return;
    };
    // use pin p
}
See Exceptions and error handling for examples of how to ergonomically handle the result of try_from.
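As a small preview, and only as a sketch, the ? operator can propagate a failed conversion to the caller. The parse_pins function is a hypothetical helper, and the extra derives are added here so that the results can be printed and compared.

```rust
use std::convert::{TryFrom, TryInto};

#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum Pin {
    Pin1 = 0x01,
    Pin2 = 0x02,
    Pin3 = 0x04,
}

#[derive(Debug, PartialEq)]
struct InvalidPin(u8);

impl TryFrom<u8> for Pin {
    type Error = InvalidPin;

    fn try_from(value: u8) -> Result<Self, Self::Error> {
        match value {
            0x01 => Ok(Pin::Pin1),
            0x02 => Ok(Pin::Pin2),
            0x04 => Ok(Pin::Pin3),
            pin => Err(InvalidPin(pin)),
        }
    }
}

// Hypothetical helper: converts two raw bytes and returns early with the
// first error, using `?` instead of matching on each Result.
fn parse_pins(a: u8, b: u8) -> Result<(Pin, Pin), InvalidPin> {
    let first = Pin::try_from(a)?;
    // try_into is the same conversion, called on the source value.
    let second: Pin = b.try_into()?;
    Ok((first, second))
}

fn main() {
    match parse_pins(0x01, 0x04) {
        Ok((a, b)) => println!("{:?} {:?}", a, b),
        Err(InvalidPin(raw)) => println!("invalid pin value {}", raw),
    }
}
```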
If low-level performance is more of a concern than memory safety,
std::mem::transmute
is analogous to a C++ reinterpret cast, but requires
unsafe Rust because its use can result in undefined behavior. Uses of
std::mem::transmute
for this purpose should not be hidden behind an interface
that can be called from safe Rust unless the interface can actually guarantee
that the call will never happen with an invalid value.
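The following sketch shows one way to structure such an interface: the transmute lives in an unsafe function whose contract is documented, and the safe wrapper validates the value before calling it. The from_u8_unchecked and from_u8 names are hypothetical. In this particular sketch the safe wrapper repeats the check, so it gains nothing over a plain match; the point is only where the unsafe boundary sits.

```rust
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum Pin {
    Pin1 = 0x01,
    Pin2 = 0x02,
    Pin3 = 0x04,
}

impl Pin {
    /// # Safety
    ///
    /// `value` must be one of 0x01, 0x02, or 0x04. Any other value is
    /// undefined behavior.
    unsafe fn from_u8_unchecked(value: u8) -> Pin {
        // SAFETY: the caller guarantees that `value` is a valid discriminant,
        // and Pin is #[repr(u8)], so the sizes match.
        unsafe { std::mem::transmute::<u8, Pin>(value) }
    }

    // Safe interface: it can guarantee the precondition because it checks
    // the value itself before calling the unchecked conversion.
    fn from_u8(value: u8) -> Option<Pin> {
        match value {
            // SAFETY: value was just checked to be a valid discriminant.
            0x01 | 0x02 | 0x04 => Some(unsafe { Pin::from_u8_unchecked(value) }),
            _ => None,
        }
    }
}

fn main() {
    println!("{:?}", Pin::from_u8(0x02));
    println!("{:?}", Pin::from_u8(0x03));
}
```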
Enums and methods
In C++ enums cannot have methods. Instead, to model an enum with methods one must define a wrapper class for the enum and define the methods on that wrapper class. In Rust, methods can be defined on an enum with an impl block, just like any other type.
#include <cstdint>
// Actual enum
enum PinImpl : uint8_t {
Pin1 = 0x01,
Pin2 = 0x02,
Pin3 = 0x04
};
class LastPin{};
// Wrapper type
struct Pin {
PinImpl pin;
// Conversion constructor so that PinImpl can be
// used as a Pin.
Pin(PinImpl p) : pin(p) {}
// Conversion method so wrapper type can be
// used with switch statement.
operator PinImpl() {
return this->pin;
}
Pin next() const {
switch (pin) {
case Pin1:
return Pin(Pin2);
case Pin2:
return Pin(Pin3);
default:
throw LastPin{};
}
}
};
#[repr(u8)]
#[derive(Clone, Copy)]
enum Pin {
    Pin1 = 0x01,
    Pin2 = 0x02,
    Pin3 = 0x04,
}
struct LastPin;
impl Pin {
    fn next(&self) -> Result<Self, LastPin> {
        match self {
            Pin::Pin1 => Ok(Pin::Pin2),
            Pin::Pin2 => Ok(Pin::Pin3),
            Pin::Pin3 => Err(LastPin),
        }
    }
}
Tagged unions and std::variant
C-style tagged unions
Because unions cannot be used for type punning in C++, they are usually used with a tag to discriminate between which variant of the union is active.
Rust's equivalent to union types are always tagged. They are a generalization of Rust enums, where additional data may be associated with the enum variants.
enum Tag { Rectangle, Triangle };
struct Shape {
Tag tag;
union {
struct {
double width;
double height;
} rectangle;
struct {
double base;
double height;
} triangle;
};
double area() {
switch (this->tag) {
case Rectangle: {
return this->rectangle.width *
this->rectangle.height;
}
case Triangle: {
return 0.5 * this->triangle.base *
this->triangle.height;
}
}
}
};
enum Shape {
    Rectangle { width: f64, height: f64 },
    Triangle { base: f64, height: f64 },
}
impl Shape {
    fn area(&self) -> f64 {
        match self {
            Shape::Rectangle { width, height } => width * height,
            Shape::Triangle { base, height } => 0.5 * base * height,
        }
    }
}
When matching on an enum, Rust requires that all variants of the enum be handled. In situations where default would be used with a C++ switch on the tag, a wildcard can be used in the Rust match.
#include <iostream>
enum Tag { Rectangle, Triangle, Circle };
struct Shape {
Tag tag;
union {
struct {
double width;
double height;
} rectangle;
struct {
double base;
double height;
} triangle;
struct {
double radius;
} circle;
};
void print_shape() {
switch (this->tag) {
case Rectangle: {
std::cout << "Rectangle" << std::endl;
break;
}
default: {
std::cout << "Some other shape"
<< std::endl;
break;
}
}
}
};
enum Shape {
    Rectangle { width: f64, height: f64 },
    Triangle { base: f64, height: f64 },
    Circle { radius: f64 },
}
impl Shape {
    fn print_shape(&self) {
        match self {
            Shape::Rectangle { .. } => {
                println!("Rectangle");
            }
            // The wildcard covers Triangle and Circle.
            _ => {
                println!("Some other shape");
            }
        }
    }
}
Rust does not support C++-style fallthrough where some behavior can be done before falling through to the next case. However, in Rust one can match on multiple enum variants simultaneously, so long as the simultaneous match patterns bind the same names with the same types.
enum Shape {
    Rectangle { width: f64, height: f64 },
    Triangle { base: f64, height: f64 },
}
impl Shape {
    fn bounding_area(&self) -> f64 {
        match self {
            Shape::Rectangle { height, width }
            | Shape::Triangle {
                height,
                base: width,
            } => width * height,
        }
    }
}
Accessing the value without checking the discriminant
Unlike with C-style unions, Rust always requires matching on the discriminant before accessing the values. If the variant is already known, e.g., due to an earlier check, then the code can usually be refactored to encode the knowledge in the type so that the second check (and corresponding error handling) can be omitted.
A C++ program like the following requires more restructuring of the types to achieve the same goal in Rust.
The corresponding Rust program requires defining separate types for each variant of the Shape enum so that the fact that all of the values are of a given type can be expressed in the type system by having an array of Triangle instead of an array of Shape.
#include <ranges>
#include <vector>
// Uses the same Shape definition.
enum Tag { Rectangle, Triangle };
struct Shape {
Tag tag;
union {
struct {
double width;
double height;
} rectangle;
struct {
double base;
double height;
} triangle;
};
};
std::vector<Shape> get_shapes() {
return std::vector<Shape>{
Shape{Triangle, {.triangle = {1.0, 1.0}}},
Shape{Triangle, {.triangle = {1.0, 1.0}}},
Shape{Rectangle, {.rectangle = {1.0, 1.0}}},
};
}
std::vector<Shape> get_shapes();
int main() {
std::vector<Shape> shapes = get_shapes();
auto is_triangle = [](Shape shape) {
return shape.tag == Triangle;
};
// Create an iterator that only sees the
// triangles. (std::views::filter is from C++20,
// but the same effect can be achieved with a
// custom iterator.)
auto triangles =
shapes | std::views::filter(is_triangle);
double total_base = 0.0;
for (auto &triangle : triangles) {
// Skip checking the tag because we know we
// have only triangles.
total_base += triangle.triangle.base;
}
return 0;
}
// Define a separate struct for each variant.
struct Rectangle {
    width: f64,
    height: f64,
}
struct Triangle {
    base: f64,
    height: f64,
}
enum Shape {
    Rectangle(Rectangle),
    Triangle(Triangle),
}
fn get_shapes() -> Vec<Shape> {
    vec![
        Shape::Triangle(Triangle {
            base: 1.0,
            height: 1.0,
        }),
        Shape::Triangle(Triangle {
            base: 1.0,
            height: 1.0,
        }),
        Shape::Rectangle(Rectangle {
            width: 1.0,
            height: 1.0,
        }),
    ]
}
fn main() {
    let shapes = get_shapes();
    // This iterator only iterates over triangles
    // and demonstrates that by iterating over
    // the Triangle type instead of the Shape type.
    let triangles = shapes
        .iter()
        // Keep only the triangles
        .filter_map(|shape| match shape {
            Shape::Triangle(t) => Some(t),
            _ => None,
        });
    let mut total_base = 0.0;
    for triangle in triangles {
        // Because the iterator produces Triangles
        // instead of Shapes, base can be accessed
        // directly.
        total_base += triangle.base;
    }
}
This kind of use is common enough in Rust that the variants are often designed to have their own types from the start.
This approach is also possible in C++. It is more commonly used along with
std::variant
in C++17 or later.
std::variant (since C++17)
When programming in C++ standards since C++17, std::variant
can be used to
represent a tagged union in a way that has more in common with Rust enums.
#include <variant>
struct Rectangle {
double width;
double height;
};
struct Triangle {
double base;
double height;
};
using Shape = std::variant<Rectangle, Triangle>;
double area(const Shape &shape) {
return std::visit(
[](auto &&arg) -> double {
using T = std::decay_t<decltype(arg)>;
if constexpr (std::is_same_v<T, Rectangle>) {
return arg.width * arg.height;
} else if constexpr (std::is_same_v<T, Triangle>) {
return 0.5 * arg.base * arg.height;
}
},
shape);
}
Because Rust doesn't depend on templates for this language feature, error
messages when a variant is missed or when a new variant is added are easier to
read, which removes one of the barriers to using tagged unions more frequently.
Compare the errors in C++ (using gcc) and Rust when the Triangle
case is
omitted.
The following two programs have the same error: each fails to handle a case of
Shape
.
#include <variant>
struct Rectangle {
double width;
double height;
};
struct Triangle {
double base;
double height;
};
using Shape = std::variant<Rectangle, Triangle>;
double area(const Shape &shape) {
return std::visit(
[](auto &&arg) -> double {
using T = std::decay_t<decltype(arg)>;
if constexpr (std::is_same_v<T, Rectangle>) {
return arg.width * arg.height;
}
},
shape);
}
enum Shape {
Rectangle { width: f64, height: f64 },
Triangle { base: f64, height: f64 },
}
impl Shape {
fn area(&self) -> f64 {
match self {
Shape::Rectangle {
width,
height,
} => width * height,
}
}
}
However, the error messages differ significantly.
example.cc: In instantiation of ‘area(const Shape&)::<lambda(auto:27&&)> [with auto:27 = const Triangle&]’:
/usr/include/c++/14.2.1/bits/invoke.h:61:36: required from ‘constexpr _Res std::__invoke_impl(__invoke_other, _Fn&&, _Args&& ...) [with _Res = double; _Fn = area(const Shape&)::<lambda(auto:27&&)>; _Args = {const Triangle&}]’
61 | { return std::forward<_Fn>(__f)(std::forward<_Args>(__args)...); }
| ~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/c++/14.2.1/bits/invoke.h:96:40: required from ‘constexpr typename std::__invoke_result<_Functor, _ArgTypes>::type std::__invoke(_Callable&&, _Args&& ...) [with _Callable = area(const Shape&)::<lambda(auto:27&&)>; _Args = {const Triangle&}; typename __invoke_result<_Functor, _ArgTypes>::type = double]’
96 | return std::__invoke_impl<__type>(__tag{}, std::forward<_Callable>(__fn),
| ~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
97 | std::forward<_Args>(__args)...);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/c++/14.2.1/variant:1060:24: required from ‘static constexpr decltype(auto) std::__detail::__variant::__gen_vtable_impl<std::__detail::__variant::_Multi_array<_Result_type (*)(_Visitor, _Variants ...)>, std::integer_sequence<long unsigned int, __indices ...> >::__visit_invoke(_Visitor&&, _Variants ...) [with _Result_type = std::__detail::__variant::__deduce_visit_result<double>; _Visitor = area(const Shape&)::<lambda(auto:27&&)>&&; _Variants = {const std::variant<Rectangle, Triangle>&}; long unsigned int ...__indices = {1}]’
1060 | return std::__invoke(std::forward<_Visitor>(__visitor),
| ~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1061 | __element_by_index_or_cookie<__indices>(
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1062 | std::forward<_Variants>(__vars))...);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/usr/include/c++/14.2.1/variant:1820:5: required from ‘constexpr decltype(auto) std::__do_visit(_Visitor&&, _Variants&& ...) [with _Result_type = __detail::__variant::__deduce_visit_result<double>; _Visitor = area(const Shape&)::<lambda(auto:27&&)>; _Variants = {const variant<Rectangle, Triangle>&}]’
1820 | _GLIBCXX_VISIT_CASE(1)
| ^~~~~~~~~~~~~~~~~~~
/usr/include/c++/14.2.1/variant:1882:34: required from ‘constexpr std::__detail::__variant::__visit_result_t<_Visitor, _Variants ...> std::visit(_Visitor&&, _Variants&& ...) [with _Visitor = area(const Shape&)::<lambda(auto:27&&)>; _Variants = {const variant<Rectangle, Triangle>&}; __detail::__variant::__visit_result_t<_Visitor, _Variants ...> = double]’
1882 | return std::__do_visit<_Tag>(
| ~~~~~~~~~~~~~~~~~~~~~^
1883 | std::forward<_Visitor>(__visitor),
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1884 | static_cast<_Vp>(__variants)...);
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
example.cc:17:20: required from here
17 | return std::visit(
| ~~~~~~~~~~^
18 | [](auto &&arg) -> double {
| ~~~~~~~~~~~~~~~~~~~~~~~~~~
19 | using T = std::decay_t<decltype(arg)>;
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
20 | if constexpr (std::is_same_v<T, Rectangle>) {
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
21 | return arg.width * arg.height;
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
22 | }
| ~
23 | },
| ~~
24 | shape);
| ~~~~~~
example.cc:23:7: error: no return statement in ‘constexpr’ function returning non-void
23 | },
| ^
example.cc: In lambda function:
example.cc:23:7: warning: control reaches end of non-void function [-Wreturn-type]
error[E0004]: non-exhaustive patterns: `&Shape::Triangle { .. }` not covered
--> example.rs:8:15
|
8 | match self {
| ^^^^ pattern `&Shape::Triangle { .. }` not covered
|
note: `Shape` defined here
--> example.rs:1:6
|
1 | enum Shape {
| ^^^^^
2 | Rectangle { width: f64, height: f64 },
3 | Triangle { base: f64, height: f64 },
| -------- not covered
= note: the matched value is of type `&Shape`
help: ensure that all possible cases are being handled by adding a match arm with a wildcard pattern or an explicit pattern as shown
|
12~ } => width * height,
13~ &Shape::Triangle { .. } => todo!(),
|
Using unsafe Rust to avoid checking the discriminant
In situations where rewriting code to use the above approach is not possible, one can check the discriminant anyway and then use the unreachable! macro to avoid handling the impossible case. However, that still involves actually checking the discriminant. If the cost of checking the discriminant must be avoided, then the unsafe function unreachable_unchecked can be used to both avoid handling the case and to indicate to the compiler that the optimizer should assume that the case cannot be reached, so the discriminant check can be optimized away.
Much like how in the C++ example accessing an inactive variant is undefined
behavior, reaching unreachable_unchecked
is also undefined behavior.
enum Shape {
    Rectangle { width: f64, height: f64 },
    Triangle { base: f64, height: f64 },
}
impl Shape {
    fn area(&self) -> f64 {
        match self {
            Shape::Rectangle { width, height } => width * height,
            Shape::Triangle { base, height } => 0.5 * base * height,
        }
    }
}
fn get_triangles() -> Vec<Shape> {
    vec![
        Shape::Triangle {
            base: 1.0,
            height: 1.0,
        },
        Shape::Triangle {
            base: 1.0,
            height: 1.0,
        },
    ]
}
use std::hint::unreachable_unchecked;
fn main() {
    let mut total_base = 0.0;
    for triangle in get_triangles() {
        match triangle {
            Shape::Triangle { base, .. } => {
                total_base += base;
            }
            _ => unsafe {
                unreachable_unchecked();
            },
        }
    }
}
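For comparison, here is a sketch of the safe alternative mentioned above, using the unreachable! macro. The discriminant is still checked, but reaching the impossible case panics instead of causing undefined behavior. The total_base function is a hypothetical helper introduced for this illustration.

```rust
enum Shape {
    Rectangle { width: f64, height: f64 },
    Triangle { base: f64, height: f64 },
}

fn get_triangles() -> Vec<Shape> {
    vec![
        Shape::Triangle { base: 1.0, height: 1.0 },
        Shape::Triangle { base: 1.0, height: 1.0 },
    ]
}

// Hypothetical helper: sums the bases, assuming every shape is a triangle.
fn total_base(shapes: &[Shape]) -> f64 {
    let mut total = 0.0;
    for shape in shapes {
        match shape {
            Shape::Triangle { base, .. } => total += base,
            // Safe: if the assumption is ever violated, this panics with the
            // given message rather than invoking undefined behavior.
            _ => unreachable!("expected only triangles"),
        }
    }
    total
}

fn main() {
    println!("{}", total_base(&get_triangles()));
}
```

Starting with the safe form and switching to unreachable_unchecked only after measurement shows that the check matters keeps the unsafe surface small.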
Inheritance and implementation reuse
Rust does not have inheritance and so the primary means of reuse of implementations in Rust are composition, aggregation, and generics.
However, Rust traits do support default methods, which resemble one simple case of using inheritance to reuse implementations. In the following example, two virtual methods are used to support a method whose implementation is provided by the abstract class.
#include <iostream>
#include <string>
class Device {
public:
virtual void powerOn() = 0;
virtual void powerOff() = 0;
virtual void resetDevice() {
std::cout << "Resetting device..." << std::endl;
powerOff();
powerOn();
}
virtual ~Device() {}
};
class Printer : public Device {
bool powered = false;
public:
void powerOn() override {
this->powered = true;
std::cout << "Printer is powered on." << std::endl;
}
void powerOff() override {
this->powered = false;
std::cout << "Printer is powered off." << std::endl;
}
};
int main() {
Printer myPrinter;
myPrinter.resetDevice();
}
trait Device {
    fn power_on(&mut self);
    fn power_off(&mut self);
    fn reset_device(&mut self) {
        println!("Resetting device...");
        // Same order as the C++ version: off, then on.
        self.power_off();
        self.power_on();
    }
}
struct Printer {
    powered: bool,
}
impl Printer {
    fn new() -> Printer {
        Printer { powered: false }
    }
}
impl Device for Printer {
    fn power_on(&mut self) {
        self.powered = true;
        println!("Printer is powered on");
    }
    fn power_off(&mut self) {
        self.powered = false;
        println!("Printer is powered off");
    }
}
fn main() {
    let mut p = Printer::new();
    p.reset_device();
}
In practice, the resetDevice() method in the Device class might be made non-virtual in C++ if it is not expected that it will be overridden. In order to make it align with the Rust example, we have made it virtual here, since Rust traits can be used either for dynamic dispatch or static dispatch (with no vtable overhead in the static dispatch case).
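As a minimal sketch of the two dispatch modes, the following uses a trimmed-down version of the Device trait. The Lamp type and the reset_static and reset_dynamic helpers are hypothetical names introduced only for this illustration.

```rust
trait Device {
    fn power_on(&mut self);
    fn power_off(&mut self);

    // Default method shared by every implementor.
    fn reset_device(&mut self) {
        self.power_off();
        self.power_on();
    }
}

// Hypothetical device that counts how often it was powered on.
struct Lamp {
    powered: bool,
    times_powered_on: u32,
}

impl Device for Lamp {
    fn power_on(&mut self) {
        self.powered = true;
        self.times_powered_on += 1;
    }

    fn power_off(&mut self) {
        self.powered = false;
    }
}

// Static dispatch: compiled separately for each concrete type, no vtable.
fn reset_static<D: Device>(device: &mut D) {
    device.reset_device();
}

// Dynamic dispatch: one compiled function, calls go through a vtable.
fn reset_dynamic(device: &mut dyn Device) {
    device.reset_device();
}
```

The same trait and the same implementation serve both helpers. The choice between the two forms is made by the code that uses the trait, not by the code that defines it.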
Rust traits differ from abstract classes in a few more ways. For example, Rust traits cannot define data members and cannot define private or protected methods. This limits the effectiveness of using traits to implement the template method pattern.
Rust traits also cannot be privately implemented. Anywhere that both a trait and a type that implements that trait are visible, the methods of the trait are visible as methods on the type.
Traits can, however, inherit from each other, including multiple inheritance. As in modern C++, inheritance hierarchies in Rust tend to be shallow. In situations with complex multiple inheritance, however, the diamond problem cannot arise in Rust because traits cannot override other traits' implementations. Therefore, all paths to a common parent trait resolve to the same implementation.
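The following sketch shows a diamond-shaped trait hierarchy. All of the names (Named, Walker, Swimmer, Amphibious, Duck) are hypothetical and chosen only for illustration.

```rust
trait Named {
    fn name(&self) -> String;
}

// Walker and Swimmer both inherit from Named.
trait Walker: Named {
    fn walk(&self) -> String {
        format!("{} walks", self.name())
    }
}

trait Swimmer: Named {
    fn swim(&self) -> String {
        format!("{} swims", self.name())
    }
}

// Amphibious inherits from both, closing the diamond.
trait Amphibious: Walker + Swimmer {
    fn describe(&self) -> String {
        format!("{} and {}", self.walk(), self.swim())
    }
}

struct Duck;

// A type can implement Named only once, so both paths through the
// diamond reach this single implementation.
impl Named for Duck {
    fn name(&self) -> String {
        "Duck".to_string()
    }
}
impl Walker for Duck {}
impl Swimmer for Duck {}
impl Amphibious for Duck {}
```

Neither Walker nor Swimmer can supply its own version of name, so there is no ambiguity to resolve when Amphibious calls through either of them.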
Template classes, functions, and methods
The most common uses of templates in C++ are to define classes, methods, traits, or functions that work for any type (or at least for any type that provides certain methods). This use case is common in the STL for container classes (such as <vector>) and for the algorithms library (<algorithm>).
The following example defines a template for a directed graph represented as an adjacency list, where the graph is generic in the type of the labels on the nodes. Though the example shows a template class, the same comparisons with Rust apply to template methods and template functions.
The same kind of reusable code can be created in Rust using generic types.
#include <stdexcept>
#include <vector>
template <typename Label>
class DirectedGraph {
std::vector<std::vector<size_t>> adjacencies;
std::vector<Label> nodeLabels;
public:
size_t addNode(Label label) {
adjacencies.push_back(std::vector<size_t>());
nodeLabels.push_back(label);
return numNodes() - 1;
}
void addEdge(size_t from, size_t to) {
size_t numNodes = this->numNodes();
if (from >= numNodes || to >= numNodes) {
throw std::invalid_argument(
"Node index out of range");
}
adjacencies[from].push_back(to);
}
size_t numNodes() const {
return adjacencies.size();
}
};
#![allow(unused)] fn main() { pub struct DirectedGraph<Label> { adjacencies: Vec<Vec<usize>>, node_labels: Vec<Label>, } impl<Label> DirectedGraph<Label> { pub fn new() -> Self { DirectedGraph { adjacencies: Vec::new(), node_labels: Vec::new(), } } pub fn add_node( &mut self, label: Label, ) -> usize { self.adjacencies.push(Vec::new()); self.node_labels.push(label); self.num_nodes() - 1 } pub fn add_edge( &mut self, from: usize, to: usize, ) -> Result<(), &str> { let num_nodes = self.num_nodes(); if from >= num_nodes || to >= num_nodes { Err("Node index out of range.") } else { self.adjacencies[from].push(to); Ok(()) } } pub fn num_nodes(&self) -> usize { self.node_labels.len() } } }
In the use case demonstrated in the above example, there are few practical differences between using a C++ template to define a class and using Rust's generics to define a struct. Whenever one would use a template that takes a typename or class parameter in C++, one can instead take a type parameter in Rust.
Operations on the parameterized type
The differences become more apparent when one attempts to perform operations on the values. The following code listing adds a method to get the smallest node in the graph to both the Rust and the C++ examples.
#include <optional>
#include <stdexcept>
#include <vector>
template <typename Label>
class DirectedGraph {
std::vector<std::vector<size_t>> adjacencies;
std::vector<Label> nodeLabels;
public:
size_t addNode(Label label) {
adjacencies.push_back(std::vector<size_t>());
nodeLabels.push_back(label);
return numNodes() - 1;
}
void addEdge(size_t from, size_t to) {
size_t numNodes = this->numNodes();
if (from >= numNodes || to >= numNodes) {
throw std::invalid_argument(
"Node index out of range");
}
adjacencies[from].push_back(to);
}
size_t numNodes() const {
return adjacencies.size();
}
std::optional<size_t> smallestNode() {
if (nodeLabels.empty()) {
return std::nullopt;
}
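// Track the smallest label through a pointer. Assigning through a
// reference would overwrite nodeLabels[0] instead of rebinding.
const Label *least = &nodeLabels[0];
size_t index = 0;
for (size_t i = 1; i < nodeLabels.size(); i++) {
if (*least > nodeLabels[i]) {
least = &nodeLabels[i];
index = i;
}
}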
return std::optional(index);
}
};
pub struct DirectedGraph<Label> {
    adjacencies: Vec<Vec<usize>>,
    node_labels: Vec<Label>,
}
impl<Label> DirectedGraph<Label> {
    pub fn new() -> Self {
        DirectedGraph {
            adjacencies: Vec::new(),
            node_labels: Vec::new(),
        }
    }
    pub fn add_node(&mut self, label: Label) -> usize {
        self.adjacencies.push(Vec::new());
        self.node_labels.push(label);
        self.num_nodes() - 1
    }
    pub fn num_nodes(&self) -> usize {
        self.node_labels.len()
    }
    pub fn add_edge(&mut self, from: usize, to: usize) -> Result<(), &str> {
        // Valid indices are 0..num_nodes(), so the check uses >=.
        if from >= self.num_nodes() || to >= self.num_nodes() {
            Err("Node not in graph.")
        } else {
            self.adjacencies[from].push(to);
            Ok(())
        }
    }
    pub fn smallest_node(&self) -> Option<usize>
    where
        Label: Ord,
    {
        // Matches the C++, but is not the idiomatic
        // implementation!
        if self.node_labels.is_empty() {
            None
        } else {
            let mut least = &self.node_labels[0];
            let mut index = 0;
            for i in 1..self.node_labels.len() {
                if *least > self.node_labels[i] {
                    least = &self.node_labels[i];
                    index = i;
                }
            }
            Some(index)
        }
    }
}
The major difference between these implementations is that in the C++ version operator> is used on the values without knowing whether the operator is defined for the type. In the Rust version, there is a constraint requiring that the Label type implement the Ord trait. (See the chapter on concepts, interfaces, and static dispatch for more details on Rust traits and how they relate to C++ concepts.)
Unlike C++ templates, generic definitions in Rust are type checked at the point of definition rather than at the point of use. This means that for operations to be used on values with the type of a type parameter, the parameter has to be constrained to types that implement some trait. As can be seen in the above example, much like with C++ concepts and requires, the constraint can be required for individual methods rather than for the whole generic class.
It is best practice in Rust to put the trait bounds on the specific things that require the bounds, in order to make the overall use of the types more flexible.
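As a small sketch of why per-method bounds are more flexible, consider the following container. The Labels type and its methods are hypothetical names used only for this illustration and are not part of the graph example.

```rust
pub struct Labels<Label> {
    items: Vec<Label>,
}

// No bounds on the impl block: any label type can be stored and counted.
impl<Label> Labels<Label> {
    pub fn new() -> Self {
        Labels { items: Vec::new() }
    }

    pub fn add(&mut self, label: Label) {
        self.items.push(label);
    }

    pub fn len(&self) -> usize {
        self.items.len()
    }

    // Only this method requires a total order on the labels.
    pub fn smallest(&self) -> Option<&Label>
    where
        Label: Ord,
    {
        self.items.iter().min()
    }
}
```

Because f64 does not implement Ord, a Labels<f64> can still use new, add, and len, while a call to smallest on it is rejected at compile time. Had the bound been placed on the struct or on the whole impl block, Labels<f64> could not have been used at all.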
As an aside, a more idiomatic implementation of smallest_node makes use of Rust's iterators. This style of implementation may take some getting used to for programmers more accustomed to implementations in the style used in the earlier example.
pub struct DirectedGraph<Label> {
    adjacencies: Vec<Vec<usize>>,
    node_labels: Vec<Label>,
}
impl<Label> DirectedGraph<Label> {
    pub fn new() -> Self {
        DirectedGraph {
            adjacencies: Vec::new(),
            node_labels: Vec::new(),
        }
    }
    pub fn add_node(&mut self, label: Label) -> usize {
        self.adjacencies.push(Vec::new());
        self.node_labels.push(label);
        self.num_nodes() - 1
    }
    pub fn num_nodes(&self) -> usize {
        self.node_labels.len()
    }
    pub fn add_edge(&mut self, from: usize, to: usize) -> Result<(), &str> {
        // Valid indices are 0..num_nodes(), so the check uses >=.
        if from >= self.num_nodes() || to >= self.num_nodes() {
            Err("Node not in graph.")
        } else {
            self.adjacencies[from].push(to);
            Ok(())
        }
    }
    pub fn smallest_node(&self) -> Option<usize>
    where
        Label: Ord,
    {
        // Pair each label with its index, take the minimum by label,
        // then keep only the index.
        self.node_labels
            .iter()
            .enumerate()
            .map(|(i, l)| (l, i))
            .min()
            .map(|(_, i)| i)
    }
}
An even more idiomatic implementation would make use of the itertools crate.
use itertools::*;
pub struct DirectedGraph<Label> {
adjacencies: Vec<Vec<usize>>,
node_labels: Vec<Label>,
}
impl<Label> DirectedGraph<Label> {
pub fn new() -> Self {
DirectedGraph {
adjacencies: Vec::new(),
node_labels: Vec::new(),
}
}
pub fn add_node(
&mut self,
label: Label,
) -> usize {
self.adjacencies.push(Vec::new());
self.node_labels.push(label);
self.num_nodes() - 1
}
pub fn num_nodes(&self) -> usize {
self.node_labels.len()
}
pub fn add_edge(
&mut self,
from: usize,
to: usize,
) -> Result<(), &str> {
if from >= self.num_nodes()
|| to >= self.num_nodes()
{
Err("Node not in graph.")
} else {
self.adjacencies[from].push(to);
Ok(())
}
}
pub fn smallest_node(&self) -> Option<usize>
where
Label: Ord,
{
self.node_labels.iter().position_min()
}
}
constexpr template parameters
Rust also supports the equivalent of constexpr template parameters. For example, one can define a generic function that returns an array of consecutive integers starting from a specific value and whose size is determined at compile time.
#include <array>
#include <cstddef>
template <size_t N>
std::array<int, N>
makeSequentialArray(int start) {
std::array<int, N> arr;
for (size_t i = 0; i < N; i++) {
arr[i] = start + i;
}
return arr;
}
#![allow(unused)] fn main() { fn make_sequential_array<const N: usize>( start: i32, ) -> [i32; N] { std::array::from_fn(|i| start + i as i32) } }
The corresponding idiomatic Rust function uses the helper std::array::from_fn to construct the array. from_fn itself takes as type parameters the element type and the constant. Those arguments are elided because Rust can infer them: both are part of the type of the produced array.
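For comparison, here is a sketch of the same function with the elided arguments written out. The function name make_sequential_array_explicit is hypothetical. The third generic parameter of from_fn is the closure type, which cannot be named and so is left as an underscore for the compiler to infer.

```rust
fn make_sequential_array_explicit<const N: usize>(start: i32) -> [i32; N] {
    // Element type i32, length N, closure type inferred.
    std::array::from_fn::<i32, N, _>(|i| start + i as i32)
}
```

Callers supply the length either explicitly, as in make_sequential_array_explicit::<3>(5), or implicitly through the expected type of the result.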
Rust's Self type
Within a Rust struct definition, impl block, or impl trait block, there is a Self type that is in scope. The Self type is the type being defined with all of the generic type parameters filled in. It can be useful to refer to this type especially in cases where there are many parameters that would otherwise have to be listed out.
The Self type is necessary when defining generic traits to refer to the concrete implementing type. Because Rust does not have inheritance between concrete types and does not have method overriding, this is sufficient to avoid the need to pass the implementing type as a type parameter.
For examples of this, see the chapter on the curiously recurring template pattern.
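A brief sketch of both uses of Self follows. The Pair type and the Merge trait are hypothetical and exist only for this illustration.

```rust
struct Pair<A, B> {
    first: A,
    second: B,
}

impl<A, B> Pair<A, B> {
    // Inside this block, Self stands for Pair<A, B>.
    fn new(first: A, second: B) -> Self {
        Self { first, second }
    }
}

// Inside a trait, Self stands for whichever concrete type implements it.
trait Merge {
    fn merge(self, other: Self) -> Self;
}

impl Merge for Pair<i32, String> {
    // Here Self is Pair<i32, String>.
    fn merge(self, other: Self) -> Self {
        Self::new(self.first + other.first, self.second + &other.second)
    }
}
```

The Merge trait never needs a type parameter naming the implementor, which is the role that the derived class parameter plays in the C++ pattern.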
A note on type checking and type errors
The checking of generic types at the point of definition rather than at the point of template expansion impacts when errors are detected and how they are reported. Some of this difference cannot be achieved by consistently using C++ concepts to declare the operations required.
For example, one might accidentally make the nodeLabels member a vector of size_t instead of a vector of the label parameter. If all of the test cases for the graph used label types that were convertible to integers, the error would not be detected.
A similar Rust program fails to compile, even without a function that instantiates the generic structure with a concrete type.
#include <stdexcept>
#include <vector>
template <typename Label>
class DirectedGraph {
std::vector<std::vector<size_t>> adjacencies;
// The mistake is here: size_t should be Label
std::vector<size_t> nodeLabels;
public:
Label getNode(size_t nodeId) {
return nodeLabels[nodeId];
}
size_t addNode(Label label) {
adjacencies.push_back(std::vector<size_t>());
nodeLabels.push_back(label);
return numNodes() - 1;
}
size_t numNodes() const {
return adjacencies.size();
}
};
#define BOOST_TEST_MODULE DirectedGraphTests
#include <boost/test/included/unit_test.hpp>
BOOST_AUTO_TEST_CASE(test_add_node_int) {
DirectedGraph<int> g;
auto n1 = g.addNode(1);
BOOST_CHECK_EQUAL(1, g.getNode(n1));
}
BOOST_AUTO_TEST_CASE(test_add_node_float) {
DirectedGraph<float> g;
float label = 1.0f;
auto n1 = g.addNode(label);
BOOST_CHECK_CLOSE(label, g.getNode(n1), 0.0001);
}
pub struct DirectedGraph<Label> {
adjacencies: Vec<Vec<usize>>,
// The mistake is here: usize should be Label
node_labels: Vec<usize>,
}
impl<Label> DirectedGraph<Label> {
pub fn new() -> Self {
DirectedGraph {
adjacencies: Vec::new(),
node_labels: Vec::new(),
}
}
pub fn get_node(
&self,
node_id: usize,
) -> Option<&Label> {
self.node_labels.get(node_id)
}
pub fn add_node(
&mut self,
label: Label,
) -> usize {
self.adjacencies.push(Vec::new());
self.node_labels.push(label);
self.num_nodes() - 1
}
pub fn num_nodes(&self) -> usize {
self.node_labels.len()
}
}
Despite the error, the C++ example compiles and passes the tests.
Running 2 test cases...
*** No errors detected
Even without test cases, the Rust example fails to compile and produces a message useful for identifying the error.
error[E0308]: mismatched types
--> example.rs:26:31
|
6 | impl<Label> DirectedGraph<Label> {
| ----- found this type parameter
...
26 | self.node_labels.push(label);
| ---- ^^^^^ expected `usize`, found type parameter `Label`
| |
| arguments to this method are incorrect
|
= note: expected type `usize`
found type parameter `Label`
Lifetime parameters
Rust's generics are also used for classes, methods, traits, and functions that are generic in the lifetimes of the references they manipulate. Unlike with type parameters, using a function with different lifetimes does not cause additional copies of the function to be generated in the compiled code, because lifetimes do not impact the runtime representation.
The chapter on concepts includes examples of how lifetimes interact with Rust's generics.
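As a minimal sketch, the following function and struct are generic in a lifetime. The names strip_prefix_or_all and Window are hypothetical and used only for illustration.

```rust
// Generic in the lifetime 'a: the result borrows from `text`, not from
// `prefix`, so `prefix` may be dropped while the result is still in use.
fn strip_prefix_or_all<'a>(text: &'a str, prefix: &str) -> &'a str {
    text.strip_prefix(prefix).unwrap_or(text)
}

// A struct generic in both a lifetime and a type.
struct Window<'a, T> {
    items: &'a [T],
}

impl<'a, T> Window<'a, T> {
    // The returned reference lives as long as the borrowed slice, not
    // merely as long as the Window value itself.
    fn first(&self) -> Option<&'a T> {
        self.items.first()
    }
}
```

Window<'a, i32> and Window<'a, String> produce separate compiled code, but calls to strip_prefix_or_all with different lifetimes all share one compiled function.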
Conditional compilation
One significant difference between C++ templates and Rust generics is that C++ templates are actually a more general purpose macro language, supporting things like conditional compilation (e.g., when used in conjunction with if constexpr, requires, or std::enable_if). Rust supports these use cases with its macro system, which differs significantly from C++. The most common use of the macro system, conditional compilation, is provided by the cfg attribute and cfg! macro.
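The following sketch shows both forms. The function names pointer_bits and build_kind are hypothetical, while target_pointer_width and debug_assertions are standard configuration predicates.

```rust
// The attribute form removes the item entirely unless the predicate
// holds, so exactly one definition of pointer_bits is compiled.
#[cfg(target_pointer_width = "64")]
fn pointer_bits() -> u32 {
    64
}

#[cfg(not(target_pointer_width = "64"))]
fn pointer_bits() -> u32 {
    usize::BITS
}

// The macro form expands to the literal true or false. Both branches
// are still type checked, unlike code removed by the attribute.
fn build_kind() -> &'static str {
    if cfg!(debug_assertions) {
        "debug"
    } else {
        "release"
    }
}
```

Note that neither form depends on a type parameter. Selecting an implementation based on a type is instead done through traits.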
The separation of conditional compilation from generics in Rust involves similar design considerations as the omission of template specialization from Rust.
Template specialization
Template specialization in C++ makes it possible for a template entity to have different implementations for different parameters. Most STL implementations make use of this to, for example, provide a space-efficient representation of std::vector<bool>.
Because of the possibility of template specialization, when a C++ function operates on values of a template class like std::vector, the function is essentially defined in terms of the interface provided by the template class, rather than for a specific implementation.
To accomplish the same thing in Rust requires defining the function in terms of a trait for the interface against which it operates. This enables clients to select their choice of representation for data by using any concrete type that implements the interface.
This is more practical to do in Rust than in C++, because generics not being a general metaprogramming facility means that generic entities can be type checked locally, making them easier to define. It is more common to do in Rust than in C++ because Rust does not have implementation inheritance, so there is a sharper line between interface and implementation than there is in C++.
The following example shows how a Rust function can be implemented so that different concrete representations can be selected by a client. For a compact bit vector representation, the example uses the BitVec type from the bitvec crate. BitVec is intended to provide an API similar to Vec<bool> or std::vector<bool>.
#include <string>
#include <vector>
template <typename T>
void push_if_even(int n,
std::vector<T> &collection,
T item) {
if (n % 2 == 0) {
collection.push_back(item);
}
}
int main() {
// Operate on the default std::vector
// implementation
std::vector<std::string> v{"a", "b"};
push_if_even(2, v, std::string("c"));
// Operate on the (likely space-optimized)
// std::vector<bool> specialization
std::vector<bool> bv{false, true};
push_if_even(2, bv, false);
}
// The Extend trait is for types that support
// appending values to the collection.
fn push_if_even<T, I: Extend<T>>(
n: u32,
collection: &mut I,
item: T,
) {
if n % 2 == 0 {
collection.extend([item]);
}
}
use bitvec::prelude::*;
fn main() {
// Operate on Vec
let mut v =
vec!["a".to_string(), "b".to_string()];
push_if_even(2, &mut v, "c".to_string());
// Operate on BitVec
let mut bv = bitvec![0, 1];
push_if_even(2, &mut bv, false);
}
Trade-offs between generics and templates
Because generic functions can only interact with generic values in ways defined by the trait bounds, it is easier to test generic implementations. In particular, code testing a generic implementation only has to consider the possible behaviors of the given trait.
For a comparison, consider the following programs.
#include <concepts>
template <std::totally_ordered T>
T max(const T &x, const T &y) {
return (x > y) ? x : y;
}
template <>
int max(const int &x, const int &y) {
return (x > y) ? x + 1 : y + 1;
}
#![allow(unused)] fn main() { fn max<'a, T: Ord>(x: &'a T, y: &'a T) -> &'a T { if x > y { x } else { y } } }
In the Rust program, parametricity means that (assuming safe Rust) from the type alone one can tell that if the function returns, it must return exactly one of x or y. This is because the trait bound Ord doesn't give any way to construct new values of type T, and the use of references doesn't give any way for the function to store one of x or y from an earlier call to return in a later call.
In the C++ program, a call to max with int as the template parameter will give a distinctly different result than with any other parameter because of the template specialization enabling the behavior of the function to vary based on the type.
The trade-off is that in Rust specialized implementations are harder to use because they must have different names, but that they are easier to write because it is easier to write generic code while being confident about its correctness.
Niche optimization
There are several cases where the Rust compiler will perform optimizations to achieve more efficient representations. Those situations are all ones where the efficiency gains do not otherwise change the observable behavior of the code.
The most common case is with the Option type. When Option is used with a type where the compiler can tell that there are unused values, one of those unused values will be used to represent the None case, so that Option<T> will not require an extra word of memory to indicate the discriminant of the enum.
This optimization is applied to reference types (& and &mut), since references cannot be null. It is also applied to NonNull<T>, which represents a non-null pointer to a value of type T, and to NonZeroU8 and other non-zero integral types. The optimization for the reference case is what makes Option<&T> and Option<&mut T> safer equivalents to using non-owning observation pointers in C++.
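The effect can be observed with std::mem::size_of. In this sketch, option_is_free is a hypothetical helper name.

```rust
use std::mem::size_of;
use std::num::NonZeroU8;
use std::ptr::NonNull;

// Returns true when wrapping T in Option costs no extra space.
fn option_is_free<T>() -> bool {
    size_of::<Option<T>>() == size_of::<T>()
}

// Collects the result for several types mentioned in the text.
fn niche_report() -> [bool; 5] {
    [
        option_is_free::<&i32>(),         // references are never null
        option_is_free::<Box<i32>>(),     // Box is never null
        option_is_free::<NonNull<i32>>(), // non-null raw pointer
        option_is_free::<NonZeroU8>(),    // 0 is unused
        option_is_free::<u8>(),           // every bit pattern is in use
    ]
}
```

The first four entries are true and the last is false: a plain u8 has no unused value, so Option<u8> needs additional space for the discriminant.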
Null (nullptr)
This section covers idiomatic uses of nullptr in C++ and how to achieve the same results in Rust.
Some uses of nullptr in C++ don't arise in the first place in Rust because of other language differences. For example, moved objects don't leave anything behind that needs to be destroyed. Therefore there is no need to use nullptr as a placeholder for a moved pointer that can have delete or free called on it.
Other uses are replaced by Option, which in safe Rust requires checking for the empty case before accessing the contained value. This use is common enough that Rust has an optimization for when Option is used with a reference (& or &mut), Box (the equivalent of unique_ptr), and NonNull (a non-null raw pointer).
Sentinel values
A sentinel value is an in-band value that indicates a special situation, such as having reached the end of valid data in an iterator.
nullptr
Many designs in C++ borrow the convention from C of using a null pointer as a sentinel value for a method that returns owned pointers. For example, a method that parses a large structure may produce nullptr in the case of failure.
A similar situation in Rust would make use of the type Option<Box<LargeStructure>>.
#include <cstddef>
#include <memory>
class LargeStructure {
int field;
// many fields ...
};
std::unique_ptr<LargeStructure>
parse(char *data, size_t len) {
// ...
// on failure
return nullptr;
}
struct LargeStructure {
    field: i32,
    // many fields ...
}
fn parse(data: &[u8]) -> Option<Box<LargeStructure>> {
    // ...
    // on failure
    None
}
The Box<T> type has the same meaning as std::unique_ptr<T> in terms of being a uniquely owned pointer to some T on the heap, but unlike std::unique_ptr, it cannot be null. Rust's Option<T> is like std::optional<T> in C++, except that it can be used with pointers and references. In those cases (and in some other cases) the compiler optimizes the representation to be the same size as Box<T> by leveraging the fact that Box cannot be null.
In Rust it is also common to pay the cost for the extra byte to use a return type of Result<T, E> (which is akin to std::expected in C++23) in order to make the reason for the failure available at runtime.
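A sketch of the Result variant of parse might look as follows. The ParseError type, its variants, and the input format (a valid input starts with the byte 0x4C) are all assumptions made up for this illustration.

```rust
// Hypothetical error type describing why parsing failed.
#[derive(Debug, PartialEq)]
enum ParseError {
    Empty,
    BadHeader(u8),
}

struct LargeStructure {
    field: i32,
    // many fields ...
}

fn parse(data: &[u8]) -> Result<Box<LargeStructure>, ParseError> {
    match data.first() {
        None => Err(ParseError::Empty),
        // Assumed format: the first byte must be 0x4C.
        Some(&0x4C) => Ok(Box::new(LargeStructure {
            field: data.len() as i32,
        })),
        Some(&other) => Err(ParseError::BadHeader(other)),
    }
}
```

Callers that only care whether parsing succeeded can still convert the result with ok(), which recovers the Option<Box<LargeStructure>> shape from the earlier example.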
Integer sentinels
When a possibly-failing function produces an integer, it is also common to use an otherwise unused or unlikely integer value as a sentinel value, such as 0 or INT_MAX.
In Rust, the Option type is used for this purpose. In cases where the zero value really is not possible to produce, as with the gcd algorithm below, the type NonZero<T> can be used to indicate that fact. As with Option<Box<T>>, the compiler optimizes the representation to make use of the unused value (in this case 0) to represent the None case to ensure that the representation of Option<NonZero<T>> is the same size as the representation of T.
#include <cstdlib>
int gcd(int a, int b) {
if (b == 0 || a == 0) {
// returns 0 to indicate invalid input
return 0;
}
while (b != 0) {
int temp = b;
b = a % b;
a = temp;
}
return std::abs(a);
}
use std::num::NonZero;
fn gcd(mut a: i32, mut b: i32) -> Option<NonZero<i32>> {
    if a == 0 || b == 0 {
        return None;
    }
    while b != 0 {
        let temp = b;
        b = a % b;
        a = temp;
    }
    // At this point, a is guaranteed to not be
    // zero. The `Some` case from `NonZero::new`
    // has a different meaning than the `Some`
    // returned from this function, but here it
    // happens to coincide.
    NonZero::new(a.abs())
}
fn main() {
    assert!(gcd(5, 0) == None);
    assert!(gcd(0, 5) == None);
    assert!(gcd(5, 1) == NonZero::new(1));
    assert!(gcd(1, 5) == NonZero::new(1));
    assert!(gcd(2 * 2 * 3 * 5 * 7, 2 * 2 * 7 * 11) == NonZero::new(2 * 2 * 7));
    assert!(gcd(2 * 2 * 7 * 11, 2 * 2 * 3 * 5 * 7) == NonZero::new(2 * 2 * 7));
}
As an aside, it is also possible to avoid the redundant check for zero at the end, and without using unsafe Rust, by preserving the non-zeroness property throughout the algorithm.
use std::num::NonZero;
fn gcd(x: i32, mut b: i32) -> Option<NonZero<i32>> {
    if b == 0 {
        return None;
    }
    // a is guaranteed to be non-zero, so we record the fact in the type of a.
    let mut a = NonZero::new(x)?;
    while let Some(temp) = NonZero::new(b) {
        b = a.get() % b;
        a = temp;
    }
    Some(a.abs())
}
fn main() {
    assert!(gcd(5, 0) == None);
    assert!(gcd(0, 5) == None);
    assert!(gcd(5, 1) == NonZero::new(1));
    assert!(gcd(1, 5) == NonZero::new(1));
    assert!(gcd(2 * 2 * 3 * 5 * 7, 2 * 2 * 7 * 11) == NonZero::new(2 * 2 * 7));
    assert!(gcd(2 * 2 * 7 * 11, 2 * 2 * 3 * 5 * 7) == NonZero::new(2 * 2 * 7));
}
std::optional
In situations where std::optional would be used as a sentinel value in C++, Option can be used for the same purpose in Rust. The main difference between the two is that safe Rust requires explicitly checking whether the value is None, while in C++ one can attempt to access the value without checking (at the risk of undefined behavior).
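As a short sketch of what the explicit check looks like in practice, consider the following. The functions first_even and describe are hypothetical names used only for this illustration.

```rust
// Hypothetical lookup that may have no answer.
fn first_even(values: &[i32]) -> Option<i32> {
    values.iter().copied().find(|v| v % 2 == 0)
}

fn describe(values: &[i32]) -> String {
    // The contained value is only reachable through a pattern (or a
    // method such as unwrap_or) that also accounts for the None case.
    match first_even(values) {
        Some(v) => format!("first even: {v}"),
        None => "no even value".to_string(),
    }
}
```

Methods such as unwrap_or supply a fallback in place of the match. Methods such as unwrap skip the fallback but panic on None, which is a defined failure rather than undefined behavior.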
Moved members
Moving values out of variables or fields in Rust is more explicit than it is in C++. A value that might be moved with nothing left behind needs to be represented using an Option<Box<T>> type in Rust, while in C++ it would just be a std::unique_ptr<T>.
#include <memory>
#include <mutex>
void readMailbox(std::unique_ptr<int> &mailbox,
std::mutex &mailboxMutex) {
std::lock_guard<std::mutex> guard(mailboxMutex);
if (!mailbox) {
return;
}
int x = *mailbox;
mailbox = nullptr;
// use x
}
use std::sync::Arc;
use std::sync::Mutex;
fn read(mailbox: Arc<Mutex<Option<i32>>>) {
    let Ok(mut x) = mailbox.lock() else {
        return;
    };
    let x = x.take();
    // use x
}
Additionally, when taking ownership of a value from within a mutable reference, something has to be left in its place. This can be done using std::mem::swap, and many container-like types have methods for making common ownership-swapping more ergonomic, like Option::take as seen in the earlier example, Option::replace or Vec::swap.
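The following sketch exercises a few of these helpers. The Inbox type and the three functions are hypothetical. std::mem::take, which is not mentioned above, is a standard library function that swaps in the default value of the type.

```rust
use std::mem;

struct Inbox {
    current: Vec<String>,
    pinned: Option<String>,
}

// Takes ownership of the queued messages through a mutable reference,
// leaving an empty Vec in their place.
fn drain_messages(inbox: &mut Inbox) -> Vec<String> {
    mem::take(&mut inbox.current)
}

// Replaces the pinned message, returning the previous one if any.
fn repin(inbox: &mut Inbox, message: String) -> Option<String> {
    inbox.pinned.replace(message)
}

// Exchanges the contents of two inboxes in place.
fn swap_inboxes(a: &mut Inbox, b: &mut Inbox) {
    mem::swap(a, b);
}
```

In each case the mutable reference still points at a fully initialized value afterwards, which is why none of these functions needs unsafe code.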
Deleting moved objects
Another common use of null pointers in modern C++ is as values for the members of moved objects so that the destructor can still safely be called. E.g.,
#include <cstdlib>
#include <cstring>
// widget.h
struct widget_t;
widget_t *alloc_widget();
void free_widget(widget_t*);
void copy_widget(widget_t* dst, widget_t* src);
// widget.cc
class Widget {
widget_t* widget;
public:
Widget() : widget(alloc_widget()) {}
Widget(const Widget &other) : widget(alloc_widget()) {
copy_widget(widget, other.widget);
}
Widget(Widget &&other) : widget(other.widget) {
other.widget = nullptr;
}
~Widget() {
free_widget(widget);
}
};
Rust's notion of moving objects does not involve leaving behind an object on which a destructor will be called, and so this use of null does not have a corresponding idiom. See the chapter on copy and move constructors for more details.
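The behavior can be observed by counting destructor calls. In this sketch, the Widget type is a hypothetical stand-in for a type that owns a resource, and count_drops_after_moves is a made-up helper name.

```rust
use std::cell::Cell;
use std::rc::Rc;

struct Widget {
    // Shared counter incremented every time a Widget is dropped.
    drops: Rc<Cell<u32>>,
}

impl Drop for Widget {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

// Moves a widget twice and returns how many times drop ran.
fn count_drops_after_moves() -> u32 {
    let drops = Rc::new(Cell::new(0));
    {
        let first = Widget { drops: Rc::clone(&drops) };
        let second = first; // `first` can no longer be used
        let _third = second; // `second` can no longer be used
    } // only `_third` is dropped here
    drops.get()
}
```

The function returns 1: the moved-from bindings are statically inaccessible, so no destructor runs for them and no null placeholder is needed.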
Zero-length arrays
In C++ codebases that are written in a C style or that make use of C libraries, null pointers may be used to represent empty arrays.
In Rust, arrays of arbitrary size are represented as slices. These slices can have zero length. Since Rust vectors are convertible to slices, defining functions that work with slices enables them to be used with vectors as well.
#include <cstddef>
#include <cassert>
int c_style_sum(std::size_t len, int arr[]) {
int sum = 0;
for (size_t i = 0; i < len; i++) {
sum += arr[i];
}
return sum;
}
int main() {
int sum = c_style_sum(0, nullptr);
assert(sum == 0);
}
fn sum_slice(arr: &[i32]) -> i32 { let mut sum = 0; for x in arr { sum += x; } sum } fn main() { let sum = sum_slice(&[]); assert!(sum == 0); let sum2 = sum_slice(&vec![]); assert!(sum2 == 0); }
Encapsulation
In C++ the encapsulation boundary is the class. In Rust the encapsulation boundary is the module, which may contain several types along with standalone functions. In larger projects, the crate may also act as an encapsulation boundary.
This difference means that in Rust one is more likely to have multiple, tightly coupled types that work together which are defined in one module and encapsulated as a whole.
This section provides ways to translate between C++ and Rust's notions of encapsulation both mechanically and conceptually.
Header files
One use of header files in C++ is to expose declarations that are defined in one translation unit to other translation units without requiring the duplication of the declarations in multiple files. By convention, declarations that are not included in the header are considered to be private to the defining translation unit (though, to enforce this convention, other mechanisms, such as anonymous namespaces, are required).
In contrast, Rust uses neither textually-included header files nor forward declarations. Instead, Rust modules control visibility and linkage simultaneously and expose public definitions for use by other modules.
// person.h
#include <string>
class Person {
std::string name;
public:
Person(std::string name) : name(name) {}
const std::string &getName();
};
// person.cc
#include <string>
#include "person.h"
const std::string &Person::getName() {
return this->name;
}
// client.cc
#include <string>
#include "person.h"
int main() {
Person p("Alice");
const std::string &name = p.getName();
// ...
}
// person.rs
pub struct Person {
name: String,
}
impl Person {
pub fn new(name: String) -> Person {
Person { name }
}
pub fn name(&self) -> &String {
&self.name
}
}
// client.rs
mod person;
use person::*;
fn main() {
let p = Person::new("Alice".to_string());
// doesn't compile, private field
// let name = p.name;
let name = p.name();
//...
}
In person.rs, the Person type is public but the name field is not. This prevents both direct construction of values of the type (similar to private members preventing aggregate initialization in C++) and prevents field access. The static method Person::new(String) and method Person::name() are exposed to clients of the module by the pub visibility declarations.
In the client module, the mod declaration defines the content of person.rs as a submodule named person. The use declaration brings the contents of the person module into scope.
The essence of the difference
A C++ program is a collection of translation units. Header files are required to make providing forward declarations of definitions from other translation units manageable.
A Rust program is a tree of modules. Definitions in one module may access items from other modules based on visibility declarations given in the definitions of the modules themselves.
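As a minimal sketch of such a tree, the following uses inline modules in place of separate files. The module names, the Circle type, and the deliberately rough constant PI_ISH are all hypothetical.

```rust
mod shapes {
    pub mod circle {
        pub struct Circle {
            radius: f64, // private to the circle module
        }

        impl Circle {
            pub fn new(radius: f64) -> Circle {
                Circle { radius }
            }

            pub fn area(&self) -> f64 {
                // Child modules can see private items of their ancestors.
                super::PI_ISH * self.radius * self.radius
            }
        }
    }

    // Private: visible inside shapes and its descendants only.
    const PI_ISH: f64 = 3.0;

    pub fn unit_area() -> f64 {
        circle::Circle::new(1.0).area()
    }
}
```

Code outside of shapes can call shapes::unit_area and shapes::circle::Circle::new, but can name neither PI_ISH nor the radius field. No declaration of these items is repeated anywhere.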
Submodules and additional visibility features
Modules and visibility declarations are more powerful than shown in the above example. More details on how to use modules, pub, and use to achieve encapsulation goals are described in the chapter on private members and friends.
Anonymous namespaces and static
Anonymous namespaces in C++ are used to avoid symbol collisions between different translation units. Such collisions violate the one definition rule and result in undefined behavior (which at best manifests as linking errors).
For example, without the use of anonymous namespaces, the following would result in undefined behavior (and no linking error, due to the use of inline producing weak symbols in the object files).
/// a.cc
namespace {
inline void common_function_name() {
// ...
}
}
/// b.cc
namespace {
inline void common_function_name() {
// ...
}
}
C++ static declarations are also used to achieve the same goal by making it so that a declaration has internal linkage (and so is not visible outside of the translation unit).
Rust avoids the linkage problem by controlling linkage and visibility simultaneously, with declarations always also being definitions. Instead of translation units, programs are structured in terms of modules, which provide both namespaces and visibility controls over definitions, enabling the Rust compiler to guarantee that symbol collision issues cannot happen.
The following Rust program achieves the same goal as the C++ program above, in terms of avoiding the collision of the two functions while making them available for use within the defining files.
// a.rs
mod a {
    fn common_function_name() {
        // ...
    }
}
// b.rs
mod b {
    fn common_function_name() {
        // ...
    }
}
Additionally,
- Unlike C++ namespaces, Rust modules (which provide namespacing as well as visibility controls) can only be defined once, and this is checked by the compiler.
- Each file defines a module which has to be explicitly included in the module hierarchy.
- Modules from Rust crates (libraries) are always qualified with some root module name, so they cannot conflict. If they would conflict, the root module name must be replaced with some user-chosen name.
Caveats about C interoperability
When using libraries not managed by Rust, the usual problems can occur if there are symbol collisions in the object files. This can arise when using C or C++ static or dynamic libraries. It can also arise when using Rust static or dynamic libraries built for use in C or C++ programs.
Rust provides #[unsafe(no_mangle)] to bypass name mangling in order to produce functions that can be easily referred to from C or C++. This can also cause undefined behavior due to name collision.
Private members and friends
Private members
In C++ the unit of encapsulation is the class. Access specifiers (private, protected, and public) that control access to members are enforced at the class boundary.
In Rust the module is the unit of encapsulation. Item visibility (Rust's analog to access specifiers) controls access to items at the module boundary.
#include <iostream>
#include <string>
class Person {
int age;
public:
std::string name;
// Because age is private, a public constructor
// method is needed to create instances.
Person(std::string name, int age)
: name(name), age(age) {}
// Free functions cannot access private members,
// so this has to be a member function.
static void example() {
Person alice{"Alice", 42};
std::cout << alice.name << std::endl;
// The private field is visible here, within
// the class.
std::cout << alice.age << std::endl;
}
};
int main() {
Person alice("Alice", 42);
std::cout << alice.name << std::endl;
// compilation error
// std::cout << alice.age << std::endl;
}
mod person {
    pub struct Person {
        pub name: String,
        // this field is private
        age: i32,
    }
    impl Person {
        // Because age is private, a public
        // constructor method is needed to create
        // values outside of the person module.
        pub fn new(name: String, age: i32) -> Person {
            Person { name, age }
        }
    }
    // Free functions in the same module can
    // access private fields because the unit of
    // encapsulation is the module, not the
    // struct.
    fn example() {
        let alice = Person::new("Alice".to_string(), 42);
        println!("{}", alice.name);
        // The private field is visible here,
        // within the module.
        println!("{}", alice.age);
    }
}
use person::Person;
fn main() {
    let alice = Person::new("Alice".to_string(), 42);
    println!("{}", alice.name);
    // compilation error
    // println!("{}", alice.age);
}
In the Rust example, the constructor for Person is private because one of the fields is private.
Friends
Because encapsulation is at the module level in Rust, associated methods for types can access internals of other types defined in the same module. This subsumes most uses of the C++ friend declaration.
For example, defining a binary tree in C++ requires that the class representing the nodes of the tree declare the main binary tree class as a friend in order for it to access internal methods while keeping them private from other uses. This would be required even if the TreeNode class were defined as an inner class of BinaryTree.
In Rust, however, both types can be defined in the same module, and so have access to each other's private fields and methods. The module as a whole provides a collection of types, methods, and functions that together define an encapsulated concept.
#include <memory>
class BinaryTree {
// This needs to be an inner class in order for
// it to be private.
class TreeNode {
friend class BinaryTree;
int value;
std::unique_ptr<TreeNode> left;
std::unique_ptr<TreeNode> right;
public:
TreeNode(int value)
: value(value), left(nullptr),
right(nullptr) {}
private:
static void
insert(std::unique_ptr<TreeNode> &node,
int value) {
if (node) {
node->insert(value);
} else {
node = std::make_unique<TreeNode>(value);
}
}
void insert(int value) {
if (value < this->value) {
insert(this->left, value);
} else {
insert(this->right, value);
}
}
};
std::unique_ptr<TreeNode> root;
public:
BinaryTree() : root(nullptr) {}
void insert(int value) {
TreeNode::insert(root, value);
}
};
int main() {
BinaryTree b;
b.insert(42);
return 0;
}
mod binary_tree {
    pub struct BinaryTree {
        // This field is not visible outside of
        // the module.
        root: Option<Box<TreeNode>>,
    }
    impl BinaryTree {
        pub fn new() -> BinaryTree {
            BinaryTree { root: None }
        }
        pub fn insert(&mut self, value: i32) {
            insert(&mut self.root, value);
        }
    }
    // This struct and all its fields are not
    // visible outside of the module.
    struct TreeNode {
        value: i32,
        left: Option<Box<TreeNode>>,
        right: Option<Box<TreeNode>>,
    }
    impl TreeNode {
        fn new(value: i32) -> TreeNode {
            TreeNode {
                value,
                left: None,
                right: None,
            }
        }
        fn insert(&mut self, value: i32) {
            if value < self.value {
                insert(&mut self.left, value);
            } else {
                insert(&mut self.right, value);
            }
        }
    }
    // This free function is not visible outside
    // of the module.
    fn insert(node: &mut Option<Box<TreeNode>>, value: i32) {
        match node {
            None => {
                *node = Some(Box::new(TreeNode::new(value)));
            }
            // `child` binds as a mutable reference to
            // the boxed node.
            Some(child) => {
                child.insert(value);
            }
        }
    }
}
// This brings the (public) type into scope.
use binary_tree::BinaryTree;
fn main() {
    let mut b = BinaryTree::new();
    b.insert(42);
}
Passkey idiom
In the previous C++ example, the TreeNode constructor has to be public in order to be used with make_unique. Fortunately, the constructor is still inaccessible outside of the containing class, but it is not always the case that such helper classes can be inner classes.
To make the constructor effectively private when that is not possible, one might need to use a programming pattern like the passkey idiom.
The passkey idiom is also sometimes used to provide finer-grained control over access to members than is possible with friend declarations. In either case, the effect is achieved by modeling a capability-like system.
In Rust, it is possible to express the same idiom in order to achieve the same effect.
#include <iostream>
#include <memory>
#include <string>
class Person {
int age;
class Passkey {};
public:
std::string name;
Person(Passkey, std::string name, int age)
: name(name), age(age) {}
static std::unique_ptr<Person>
createPerson(std::string name, int age) {
// Other uses of make_unique are not possible
// because the Passkey type cannot be
// constructed.
return std::make_unique<Person>(Passkey(),
name, age);
}
};
pub trait Maker<K, B> {
    fn make(passkey: K, args: B) -> Self;
}
// Generic helper that we want to be able to call
// an otherwise private function or method.
fn alloc_thing<K, B, T: Maker<K, B>>(passkey: K, args: B) -> Box<T> {
    Box::new(Maker::<K, B>::make(passkey, args))
}
mod person {
    use super::*;
    use std::marker::PhantomData;
    pub struct Person {
        pub name: String,
        age: u32,
    }
    // A zero-sized type to act as the passkey.
    pub struct Passkey {
        // This field is zero-sized. It is also
        // private, which prevents construction
        // of Passkey outside of the person
        // module.
        _phantom: PhantomData<()>,
    }
    impl Person {
        // Private method that will be exposed
        // with a passkey wrapper.
        fn new(name: String, age: u32) -> Person {
            Person { name, age }
        }
        // Method that uses external helper that
        // requires access to another
        // otherwise-private method.
        fn alloc(name: String, age: u32) -> Box<Person> {
            alloc_thing(
                Passkey {
                    _phantom: PhantomData {},
                },
                MakePersonArgs { name, age },
            )
        }
    }
    // Helper structure needed to make the trait
    // providing the interface generic.
    pub struct MakePersonArgs {
        pub name: String,
        pub age: u32,
    }
    // Implementation of the trait that exposes
    // the method requiring a passkey.
    impl Maker<Passkey, MakePersonArgs> for Person {
        fn make(_passkey: Passkey, args: MakePersonArgs) -> Person {
            Person::new(args.name, args.age)
        }
    }
}
fn main() {}
However, the passkey idiom is unlikely to be used in Rust because
- coupled types are usually defined in the same module (or a pub(in path) declaration can be used), making it unnecessary, and
- it requires cooperation from the interface by which the calling function will use a type.
The second point contrasts with the use above involving std::make_unique, which is able to forward to the underlying constructor without knowing about it at the point of the definition of std::make_unique. While the Rust example above is not useful (because alloc_thing is not a useful helper), it does demonstrate what types would have to be defined in order to achieve the same effect as when using the idiom in C++.
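The restricted-visibility alternative mentioned in the first point can be sketched as follows. This is an illustration under assumptions: the module names `containers`, `tree`, and `stats` and the `len` field are hypothetical, and `pub(super)` is used as the shorthand for `pub(in super)`.

```rust
mod containers {
    pub mod tree {
        pub struct BinaryTree {
            // Visible in the parent module `containers` (and everything
            // nested inside it), but not outside of `containers`.
            pub(super) len: usize,
        }

        impl BinaryTree {
            pub fn new() -> BinaryTree {
                BinaryTree { len: 0 }
            }

            // Placeholder for a real insertion; it only counts elements.
            pub fn insert(&mut self, _value: i32) {
                self.len += 1;
            }
        }
    }

    pub mod stats {
        use super::tree::BinaryTree;

        // A sibling module can read the field because it is also
        // inside `containers`.
        pub fn len(tree: &BinaryTree) -> usize {
            tree.len
        }
    }
}

fn main() {
    let mut tree = containers::tree::BinaryTree::new();
    tree.insert(42);
    println!("{}", containers::stats::len(&tree));
    // compilation error: the field is private outside of `containers`
    // println!("{}", tree.len);
}
```

Here the coupled types live in different modules, yet no passkey type is needed: the visibility annotation names the set of modules that may see the field.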
Friends and testing
Another common use of friend declarations is to make the internals of a class available for unit testing. Though this practice is often discouraged in C++, it is sometimes necessary in order to test otherwise private helper inner classes or helper methods.
In Rust, tests are usually defined in the same module as the code being tested. Because the content of modules is visible to submodules, this makes it so that all of the content of the module is available for testing.
// Using Boost.Test
// https://www.boost.org/doc/libs/1_84_0/libs/test/doc/html/index.html
#include <string>
class Person {
public:
std::string name;
private:
int age;
friend class PersonTest;
public:
Person(std::string name, int age)
: name(name), age(age) {}
void have_birthday() {
this->age = this->age + 1;
}
};
#define BOOST_TEST_MODULE PersonTestModule
#include <boost/test/included/unit_test.hpp>
class PersonTest {
public:
static void test_have_birthday() {
Person alice("Alice", 42);
BOOST_CHECK_EQUAL(alice.age, 42);
alice.have_birthday();
BOOST_CHECK_EQUAL(alice.age, 43);
}
};
BOOST_AUTO_TEST_CASE(have_birthday_test) {
PersonTest::test_have_birthday();
}
pub struct Person {
    pub name: String,
    age: u32,
}
impl Person {
    pub fn new(name: String, age: u32) -> Person {
        Person { name, age }
    }
    pub fn have_birthday(&mut self) {
        self.age = self.age + 1;
    }
}
#[cfg(test)]
mod test {
    use super::Person;
    #[test]
    fn test_have_birthday() {
        let mut alice = Person::new("alice".to_string(), 42);
        assert_eq!(alice.age, 42);
        alice.have_birthday();
        assert_eq!(alice.age, 43);
    }
}
Visibility of methods on Rust traits
Because traits in Rust are intended for the definition of interfaces, the methods for some type that are declared by a trait are visible whenever both the trait and the type are visible. In other words, it is not possible to have private trait methods.
The default visibility for trait methods differs from Rust structs where the default visibility is private to the defining module.
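The difference can be seen in a small sketch. The `Area` trait and `Square` type are hypothetical names used only for illustration: the trait method carries no `pub` keyword yet is callable wherever the trait and type are visible, while the inherent method without `pub` stays private to the module.

```rust
mod shape {
    pub trait Area {
        // No visibility modifier is written (or permitted) here: the
        // method is as visible as the trait itself.
        fn area(&self) -> f64;
    }

    pub struct Square {
        // Private to the `shape` module.
        side: f64,
    }

    impl Square {
        pub fn new(side: f64) -> Square {
            Square { side }
        }

        // Inherent methods are private to the module by default.
        fn side(&self) -> f64 {
            self.side
        }
    }

    impl Area for Square {
        fn area(&self) -> f64 {
            self.side() * self.side()
        }
    }
}

use shape::{Area, Square};

fn main() {
    let square = Square::new(2.0);
    // The trait method is callable because `Area` and `Square` are
    // both visible here.
    println!("{}", square.area());
    // compilation error: the inherent method is private
    // println!("{}", square.side());
}
```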
Private constructors and friends
In C++ one can control which classes can derive from a specific class by making all of the constructors private and then declaring classes which may derive from it as friends.
In Rust, one can achieve the similar goal of controlling which types can implement a trait by using the sealed trait pattern.
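A minimal sketch of the sealed trait pattern follows. The names `Shape`, `Sealed`, `Square`, and `Rectangle` are hypothetical. The idea is that the public trait requires a supertrait that lives in a private module, so code outside of the defining module cannot name the supertrait and therefore cannot implement the public trait.

```rust
mod shape {
    // The module is private, so `Sealed` cannot be named from outside
    // of `shape`, even though the trait itself is declared `pub`.
    mod private {
        pub trait Sealed {}
    }

    // Every implementor of `Shape` must also implement `Sealed`.
    pub trait Shape: private::Sealed {
        fn area(&self) -> f64;
    }

    pub struct Square {
        pub side: f64,
    }

    pub struct Rectangle {
        pub width: f64,
        pub height: f64,
    }

    // Only this module can opt types in.
    impl private::Sealed for Square {}
    impl private::Sealed for Rectangle {}

    impl Shape for Square {
        fn area(&self) -> f64 {
            self.side * self.side
        }
    }

    impl Shape for Rectangle {
        fn area(&self) -> f64 {
            self.width * self.height
        }
    }
}

use shape::{Rectangle, Shape, Square};

// compilation error if uncommented: `Triangle` does not implement the
// private `Sealed` supertrait, and it cannot be implemented from here.
// struct Triangle;
// impl Shape for Triangle {
//     fn area(&self) -> f64 { 0.0 }
// }

fn main() {
    let shapes: Vec<Box<dyn Shape>> = vec![
        Box::new(Square { side: 3.0 }),
        Box::new(Rectangle { width: 2.0, height: 5.0 }),
    ];
    for s in &shapes {
        println!("{}", s.area());
    }
}
```

Clients can still call the trait's methods and use it as a bound; only the set of implementors is closed.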
Private constructors
In C++ constructors for classes can be made private by declaring them private, or by defining a class using class and using the default private visibility.
In Rust, constructors (the actual constructors, not "constructor methods") for structs are visible from wherever the type and all fields are visible. To achieve similar visibility restrictions as in the C++ example, an additional private field needs to be added to the struct in Rust. Because Rust supports zero-sized types, the additional field can have no performance cost. The unit type has zero size and can be used for this purpose.
#include <string>
struct Person {
std::string name;
int age;
private:
Person() = default;
};
int main() {
// fails to compile, Person::Person() private
// Person nobody;
// fails to compile since C++20
// Person alice{"Alice", 42};
return 0;
}
mod person {
    pub struct Person {
        pub name: String,
        pub age: i32,
        _private: (),
    }
    impl Person {
        pub fn new(name: String, age: i32) -> Person {
            Person {
                name,
                age,
                _private: (),
            }
        }
    }
}
use person::*;
fn main() {
    // field `_private` of struct `person::Person`
    // is private
    // let alice = Person {
    //     name: "Alice".to_string(),
    //     age: 42,
    //     _private: (),
    // };
    // cannot construct `person::Person` with
    // struct literal syntax due to private fields
    // let bob = Person {
    //     name: "Bob".to_string(),
    //     age: 55,
    // };
    let carol = Person::new("Carol".to_string(), 20);
    // Can match on the public fields, and then
    // use .. to ignore the remaining ones.
    let Person { name, age, .. } = carol;
}
Enums
Unlike C++ unions, but like std::variant, Rust enums do not have direct control over the visibility of their variants or the fields of their variants.
In the following example, the circle variant of the Shape union is not public, so it can only be accessed from within the definition of Shape, as it is by the make_circle static method.
#include <iostream>
struct Triangle {
double base;
double height;
};
struct Circle {
double radius;
};
union Shape {
Triangle triangle;
private:
Circle circle;
public:
static Shape make_circle(double radius) {
Shape s;
s.circle = Circle(radius);
return s;
};
};
int main() {
Shape triangle;
triangle.triangle = Triangle{1.0, 2.0};
Shape circle = Shape::make_circle(1.0);
// fails to compile
// circle.circle = Circle{1.0};
// fails to compile
// std::cout << circle.circle.radius;
}
In Rust visibility modifiers cannot be applied to individual enum variants or their fields.
mod shape {
    pub enum Shape {
        Triangle { base: f64, height: f64 },
        Circle { radius: f64 },
    }
}
use shape::*;
fn main() {
    // Variant constructor is accessible despite not being marked pub.
    let triangle = Shape::Triangle {
        base: 1.0,
        height: 2.0,
    };
    let circle = Shape::Circle { radius: 1.0 };
    // Fields accessible despite not being marked pub.
    match circle {
        Shape::Triangle { base, height } => {
            println!("Triangle: {}, {}", base, height);
        }
        Shape::Circle { radius } => {
            println!("Circle {}", radius);
        }
    }
}
Instead, to control construction of and pattern matching on the enum implementation, one of two approaches can be taken. The first controls construction of and access to the fields, but not inspection of which variant is active.
mod shape {
    pub struct Triangle {
        pub base: f64,
        pub height: f64,
        _private: (),
    }
    pub struct Circle {
        pub radius: f64,
        _private: (),
    }
    pub enum Shape {
        Triangle(Triangle),
        Circle(Circle),
    }
    impl Shape {
        pub fn new_triangle(base: f64, height: f64) -> Shape {
            Shape::Triangle(Triangle {
                base,
                height,
                _private: (),
            })
        }
        pub fn new_circle(radius: f64) -> Shape {
            Shape::Circle(Circle {
                radius,
                _private: (),
            })
        }
    }
}
use shape::*;
fn main() {
    let triangle = Shape::new_triangle(1.0, 2.0);
    let circle = Shape::new_circle(1.0);
    match circle {
        Shape::Triangle(Triangle { base, height, .. }) => {
            println!("Triangle: {}, {}", base, height);
        }
        Shape::Circle(Circle { radius, .. }) => {
            println!("Circle: {}", radius);
        }
    }
}
The second places the enum in a struct with a private field, preventing both construction and inspection from outside of the module.
mod shape {
    enum ShapeKind {
        Triangle { base: f64, height: f64 },
        Circle { radius: f64 },
    }
    pub struct Shape(ShapeKind);
    impl Shape {
        pub fn new_circle(radius: f64) -> Shape {
            Shape(ShapeKind::Circle { radius })
        }
        pub fn new_triangle(base: f64, height: f64) -> Shape {
            Shape(ShapeKind::Triangle { base, height })
        }
        pub fn print(&self) {
            match self.0 {
                ShapeKind::Triangle { base, height } => {
                    println!("Triangle: {}, {}", base, height);
                }
                ShapeKind::Circle { radius } => {
                    println!("Circle: {}", radius);
                }
            }
        }
    }
}
use shape::*;
fn main() {
    let triangle = Shape::new_triangle(1.0, 2.0);
    let circle = Shape::new_circle(1.0);
    // Does not compile because Shape has private fields.
    // match circle {
    //     Shape(_) => {}
    // }
    circle.print();
}
If the purpose of making the variants private is to ensure that invariants are met, then it can be useful to expose the implementing enum (ShapeKind) but not the field of the wrapping struct (Shape), with the invariants only being guaranteed when the wrapping struct is used. In this case, it is necessary to make the field private and define a getter function, since otherwise the field would be modifiable, possibly violating the invariant that the wrapping struct represents.
mod shape {
    pub enum ShapeKind {
        Triangle { base: f64, height: f64 },
        Circle { radius: f64 },
    }
    // The field of Shape is private.
    pub struct Shape(ShapeKind);
    impl Shape {
        pub fn new(kind: ShapeKind) -> Option<Shape> {
            // ... check invariants ...
            Some(Shape(kind))
        }
        pub fn get_kind(&self) -> &ShapeKind {
            &self.0
        }
    }
}
use shape::*;
fn main() {
    let triangle = Shape::new(ShapeKind::Triangle {
        base: 1.0,
        height: 2.0,
    });
    let Some(circle) = Shape::new(ShapeKind::Circle { radius: 1.0 }) else {
        return;
    };
    // Does not compile because Shape has private fields.
    // match circle {
    //     Shape(c) => {}
    // };
    match circle.get_kind() {
        ShapeKind::Triangle { base, height } => {
            println!("Triangle: {}, {}", base, height);
        }
        ShapeKind::Circle { radius } => {
            println!("Circle: {}", radius);
        }
    }
}
The situation in Rust resembles the situation in C++ when using std::variant, for which it is not possible to make the variants themselves private. Instead either the constructors for the types that form the variants can be made private or the variant can be wrapped in a class with appropriate visibility controls.
Rust's #[non_exhaustive] annotation
If a struct or enum is intended to be public within a crate, but should not be constructed outside of the crate, then the #[non_exhaustive] attribute can be used to constrain construction. The attribute can be applied both to structs and to individual enum variants with the same effect as adding a private field.
However, the attribute applies the constraint at the level of the crate, not at the level of a module.
#[non_exhaustive]
pub struct Person {
    pub name: String,
    pub age: i32,
}
pub enum Shape {
    #[non_exhaustive]
    Triangle { base: f64, height: f64 },
    #[non_exhaustive]
    Circle { radius: f64 },
}
The attribute is more typically used to force clients of a library to include the wildcard when matching on the struct fields, so that adding fields to a struct is not a breaking change (i.e., it does not require increasing the major version component when using semantic versioning).
Applying the #[non_exhaustive] attribute to the enum itself makes it as if one of the variants were private, requiring a wildcard when matching on the variant itself. This has the same effect in terms of versioning as when used on a struct but is less advantageous. In most cases, code failing to compile when a new enum variant is added is desirable, since that indicates a new case that requires handling logic.
Setter and getter methods
Setters and getters work similarly in C++ and Rust, but are used less frequently in Rust.
It would not be unusual to see the following representation of a two-dimensional vector in C++, which hides its implementation and provides setters and getters to access the fields. This choice would typically be made in case a representation change (such as using polar instead of rectangular coordinates) needed to be made later without breaking clients.
On the other hand, in Rust such a type would almost always be defined with public fields.
class Vec2 {
double x;
double y;
public:
Vec2(double x, double y) : x(x), y(y) {}
double getX() const { return x; }
double getY() const { return y; }
// ... vector operations ...
};
pub struct Vec2 {
    // public fields instead of getters
    pub x: f64,
    pub y: f64,
}
impl Vec2 {
    // ... vector operations ...
}
One major reason for the difference is a limitation of the borrow checker. With a getter function the entire structure is borrowed, preventing mutable use of other fields of the structure.
The following program will not compile because get_name() borrows all of alice.
struct Person {
name: String,
age: u32,
}
impl Person {
fn get_name(&self) -> &String {
&self.name
}
}
fn main() {
let mut alice = Person { name: "Alice".to_string(), age: 42 };
let name = alice.get_name();
alice.age = 43;
println!("{}", name);
}
error[E0506]: cannot assign to `alice.age` because it is borrowed
--> example.rs:16:5
|
14 | let name = alice.get_name();
| ----- `alice.age` is borrowed here
15 |
16 | alice.age = 43;
| ^^^^^^^^^^^^^^ `alice.age` is assigned to here but it was already borrowed
17 |
18 | println!("{}", name);
| ---- borrow later used here
error: aborting due to 1 previous error
Some additional reasons for the difference in approach are:
- Ergonomics: Public members make it possible to use pattern matching.
- Transparency of performance: A change in representation would dramatically change the costs involved with the getters. Exposing the representation makes the cost change visible.
- Control over mutability: Static lifetime checking of mutable references removes concerns of unintended mutation of the value through Rust's equivalent of observation pointers.
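For contrast with the failing getter example, the following sketch accesses the public fields directly. The `have_birthday` function is a hypothetical helper introduced here: borrowing `person.name` borrows only that field, so assigning to `person.age` is accepted, and the public fields can also be destructured with a pattern.

```rust
pub struct Person {
    pub name: String,
    pub age: u32,
}

/// Increments the age and returns the fields as a tuple.
fn have_birthday(mut person: Person) -> (String, u32) {
    // Borrows only the `name` field, not the whole struct.
    let name = &person.name;
    // A different field can still be assigned while `name` is live.
    person.age += 1;
    println!("{} is now {}", name, person.age);

    // Public fields can be moved out with a destructuring pattern.
    let Person { name, age } = person;
    (name, age)
}

fn main() {
    let alice = Person {
        name: "Alice".to_string(),
        age: 42,
    };
    let (name, age) = have_birthday(alice);
    println!("{} {}", name, age);
}
```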
Types with invariants and newtypes
When types need to preserve invariants but the benefits of exposing fields are desired, a newtype pattern can be used. A wrapping "newtype" struct that represents the data with an invariant is defined and access to the fields of the underlying struct is provided via a non-mut reference.
pub struct Vec2 {
    pub x: f64,
    pub y: f64,
}
/// Represents a 2-vector that has magnitude 1.
pub struct Normalized(Vec2); // note the private field
fn sqrt_approx_zero(x: f64) -> bool {
    // Compare the absolute value so that large
    // negative differences are also rejected.
    x.abs() < 0.001
}
impl Normalized {
    pub fn from_vec2(v: Vec2) -> Option<Self> {
        // The squared magnitude is 1 exactly when
        // the magnitude is 1.
        if sqrt_approx_zero(v.x * v.x + v.y * v.y - 1.0) {
            Some(Self(v))
        } else {
            None
        }
    }
    // The getter provides a reference to the underlying Vec2 value
    // without permitting mutation.
    pub fn get(&self) -> &Vec2 {
        &self.0
    }
}
Borrowing from indexed structures
A significant limitation that arises from the way that getter methods interact with the borrow checker is that it isn't possible to mutably borrow multiple elements from an indexed structure like a vector using methods like Vec::get_mut.
The built-in indexed types have several methods for creating split views onto a structure. These can be used to create helper functions that match the requirements of a specific application.
The Rustonomicon has examples of implementing this pattern, using both safe and unsafe Rust.
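As a sketch of such a helper in safe Rust, the hypothetical function `get_two_mut` below uses the standard `split_at_mut` method to hand out mutable references to two distinct elements of a slice. The function name and its `Option`-returning signature are choices made for this illustration.

```rust
/// Returns mutable references to the elements at `i` and `j`, or `None`
/// if the indices are equal or either is out of bounds.
fn get_two_mut<T>(items: &mut [T], i: usize, j: usize) -> Option<(&mut T, &mut T)> {
    if i == j || i >= items.len() || j >= items.len() {
        return None;
    }
    if i < j {
        // `left` covers indices [0, j) and `right` covers [j, len).
        let (left, right) = items.split_at_mut(j);
        Some((&mut left[i], &mut right[0]))
    } else {
        // `left` covers indices [0, i) and `right` covers [i, len).
        let (left, right) = items.split_at_mut(i);
        Some((&mut right[0], &mut left[j]))
    }
}

fn main() {
    let mut values = vec![1, 2, 3];
    if let Some((first, last)) = get_two_mut(&mut values, 0, 2) {
        // Both references are usable at the same time.
        std::mem::swap(first, last);
    }
    println!("{:?}", values);
}
```

The split produces two non-overlapping slices, which is what lets the borrow checker accept two simultaneous mutable borrows.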
Setter methods
Setter methods also borrow the entire value, which causes the same problems as getters that return mutable references. As with getter methods, setter methods are mainly used when needed to preserve invariants.
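A setter that exists to preserve an invariant might look like the following sketch. The `Percentage` type and its error message are hypothetical: the field is private so that every change goes through `set`, which rejects values that would break the invariant.

```rust
/// A whole-number percentage. Invariant: the value is at most 100.
pub struct Percentage {
    value: u8,
}

impl Percentage {
    pub fn new(value: u8) -> Option<Percentage> {
        if value <= 100 {
            Some(Percentage { value })
        } else {
            None
        }
    }

    pub fn get(&self) -> u8 {
        self.value
    }

    /// Updates the value, leaving it unchanged if the new value would
    /// violate the invariant.
    pub fn set(&mut self, value: u8) -> Result<(), String> {
        if value > 100 {
            return Err(format!("{} is not a valid percentage", value));
        }
        self.value = value;
        Ok(())
    }
}

fn main() {
    let mut progress = Percentage::new(10).expect("10 is in range");
    println!("{:?}", progress.set(55));
    println!("{:?}", progress.set(120));
    println!("{}", progress.get());
}
```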
Exceptions and error handling
In C++ errors that are to be handled by the caller are sometimes indicated by sentinel values (e.g., std::map::find returning the end iterator), sometimes indicated by exceptions (e.g., std::vector::at throwing std::out_of_range), and sometimes indicated by setting an error bit (e.g., std::fstream::fail). Errors that are not intended to be handled by the caller are usually indicated by exceptions (e.g., std::bad_cast). Errors that are due to programming bugs often just result in undefined behavior (e.g., std::vector::operator[] when the index is out-of-bounds).
In contrast, safe Rust has two mechanisms for indicating errors. When the error is expected to be handled by the caller (because it is due to, e.g., user input), the function returns a Result or Option. When the error is due to a programming bug, the function panics. Undefined behavior can only occur if unchecked variants of functions are used with unsafe Rust.
Many libraries in Rust will offer two versions of an API, one which returns a Result or Option type and one of which panics, so that the interpretation of the error (expected exceptional case or programmer bug) can be chosen by the caller.
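The standard library's slice API is one instance of this convention: `get` returns an `Option`, while indexing panics. The two wrapper functions below are hypothetical and exist only to put the pair side by side.

```rust
/// Returns the element at `i`, or `None` when `i` is out of bounds.
/// The caller decides how to treat a missing element.
fn lookup(values: &[i32], i: usize) -> Option<i32> {
    values.get(i).copied()
}

/// Returns the element at `i` and panics when `i` is out of bounds.
/// An out-of-bounds index is treated as a bug in the caller.
fn lookup_or_panic(values: &[i32], i: usize) -> i32 {
    values[i]
}

fn main() {
    let values = vec![10, 20, 30];
    println!("{:?}", lookup(&values, 1));
    println!("{:?}", lookup(&values, 7));
    println!("{}", lookup_or_panic(&values, 1));
    // Integer arithmetic offers a similar choice: `checked_add`
    // returns an Option instead of overflowing.
    println!("{:?}", i32::MAX.checked_add(1));
}
```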
The major differences between using Result or Option and using exceptions are that
- Result and Option force explicit handling of the error case in order to access the contained value. This also differs from std::expected in C++23.
- When propagating errors with Result, the types of the errors must match. There are libraries for making this easier to handle.
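Without any library, the matching of error types can be handled with the standard `From` trait, which the `?` operator uses to convert errors. The sketch below is an illustration only: `ConfigError` and `parse_port` are hypothetical names.

```rust
use std::num::ParseIntError;

/// A single error type for the function below.
#[derive(Debug)]
enum ConfigError {
    Parse(ParseIntError),
    OutOfRange(i64),
}

// Allows `?` to convert a ParseIntError into a ConfigError.
impl From<ParseIntError> for ConfigError {
    fn from(e: ParseIntError) -> Self {
        ConfigError::Parse(e)
    }
}

fn parse_port(text: &str) -> Result<u16, ConfigError> {
    // `parse` fails with ParseIntError; `?` applies the From impl so
    // that the error matches this function's error type.
    let n: i64 = text.trim().parse()?;
    if n < 0 || n > i64::from(u16::MAX) {
        return Err(ConfigError::OutOfRange(n));
    }
    // The range check above makes this cast lossless.
    Ok(n as u16)
}

fn main() {
    println!("{:?}", parse_port("8080"));
    println!("{:?}", parse_port("not a number"));
    println!("{:?}", parse_port("70000"));
}
```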
Result vs Option
The approaches demonstrated in the Rust examples in this chapter apply to both Result and Option. When the type is Option it indicates that there is no additional information to provide in the error case: Option::None does not contain a value, but Result::Err does. When there is no additional information, it is usually because there is exactly one circumstance which can cause the error case.
It is possible to convert between the two types.
fn main() {
    let r: Result<i32, &'static str> = None.ok_or("my error message");
    let r2: Result<i32, &'static str> = None.ok_or_else(|| "expensive error message");
    let o: Option<i32> = r.ok();
}
Expected errors
In C++, throw both produces an error (the thrown exception) and initiates non-local control flow (unwinding to the nearest catch block). In Rust, error values (Option::None or Result::Err) are returned as normal values from a function. Rust's return statement can be used to return early from a function.
#include <stdexcept>
double divide(double dividend, double divisor) {
if (divisor == 0.0) {
throw std::domain_error("zero divisor");
}
return dividend / divisor;
}
fn divide(dividend: f64, divisor: f64) -> Option<f64> {
    if divisor == 0.0 {
        return None;
    }
    Some(dividend / divisor)
}
The requirement to have the return type indicate that an error is possible means that callbacks that are permitted to have errors need to be given an Option or Result return type. Omitting that is like requiring callbacks to be noexcept in C++. Functions that do not need to indicate errors but that will be used as callbacks where errors are permitted will need to wrap their results in Option::Some or Result::Ok.
#include <stdexcept>
int produce_42() {
return 42;
}
int fail() {
throw std::runtime_error("oops");
}
int useCallback(int (*func)(void)) {
return func();
}
int main() {
try {
int x = useCallback(produce_42);
int y = useCallback(fail);
// use x and y
} catch (std::runtime_error &e) {
// handle error
}
}
fn produce_42() -> i32 {
    42
}
fn fail() -> Option<i32> {
    None
}
fn use_callback(f: impl Fn() -> Option<i32>) -> Option<i32> {
    f()
}
fn main() {
    // need to wrap produce_42 to match the
    // expected type
    let Some(x) = use_callback(|| Some(produce_42())) else {
        // handle error
        return;
    };
    let Some(y) = use_callback(fail) else {
        // handle error
        return;
    };
    // use x and y
}
Handling errors
In C++, the only way to handle exceptions is catch. In Rust, all of the features for dealing with tagged unions can be used with Result and Option. The most appropriate approach depends on the intention of the program.
The basic way of handling an error indicated by a Result in Rust is by using match.
Using match is the most general approach, because it enables handling additional cases explicitly and can be used as an expression. match connotes equal importance of all branches.
#include <vector>
#include <stdexcept>
int main() {
std::vector<int> v;
// ... populate v ...
try {
auto x = v.at(0);
// use x
} catch (std::out_of_range &e) {
// handle error
}
}
fn main() {
    let mut v = Vec::<i32>::new();
    // ... populate v ...
    match v.get(0) {
        Some(x) => {
            // use x
        }
        None => {
            // handle error
        }
    }
}
Because handling only a single variant of a Rust enum is so common, the if let syntax supports that use case. The syntax both makes it clear that only the one case is important and reduces the levels of indentation.
if let is less general than match. It can also be used as an expression, but can only distinguish one case from the rest. if let connotes that the else case is not the normal case, but that some default handling will occur or some default value will be produced.
Note that with Result, if let does not enable accessing the error value.
fn main() {
    let mut v = Vec::<i32>::new();
    // ... populate v ...
    if let Some(x) = v.get(0) {
        // use x
    } else {
        // handle error
    }
}
When the error handling involves some kind of control flow operation, like break or return, the let else syntax is even more concise.
Much like normal let statements, let else statements can only be used where statements are expected. let else statements also connote that the else case is not the normal case, and that no further (normal) processing will occur.
fn main() {
    let mut v = Vec::<i32>::new();
    // ... populate v ...
    let Some(x) = v.get(0) else {
        // handle error
        return;
    };
    // use x
}
Result and Option also have some helper methods for handling errors. These methods resemble the methods on std::expected in C++.
#include <expected>
#include <string>
int main() {
std::expected<int, std::string> res(42);
auto x(res.transform([](int n) { return n * 2; }));
}
fn main() {
    let res: Result<i32, String> = Ok(42);
    let x = res.map(|n| n * 2);
}
These helper methods and others are described in detail in the documentation for Option and Result.
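Two more of the helper methods are sketched below: `unwrap_or` supplies a default for the error case, and `map_err` transforms the error value while leaving a success untouched. The function names are hypothetical.

```rust
/// Returns the first element, or 0 when the slice is empty.
fn first_or_zero(values: &[i32]) -> i32 {
    values.first().copied().unwrap_or(0)
}

/// Parses a number and doubles it, replacing the parse error with a
/// human-readable message.
fn parse_doubled(text: &str) -> Result<i32, String> {
    text.parse::<i32>()
        // Applied only to the Ok value.
        .map(|n| n * 2)
        // Applied only to the Err value.
        .map_err(|e| format!("cannot parse {:?}: {}", text, e))
}

fn main() {
    println!("{}", first_or_zero(&[7, 8]));
    println!("{}", first_or_zero(&[]));
    println!("{:?}", parse_doubled("21"));
    println!("{:?}", parse_doubled("twenty-one"));
}
```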
Borrowed results
In the above examples, the successful results are borrowed from the vector. It is common to need to clone or copy the result into an owned copy, and to want to do so without having to match on and reconstruct the value. Result and Option have helper methods for these purposes.
fn main() {
    let mut v = Vec::<i32>::new();
    v.push(42);
    let x: Option<&i32> = v.get(0);
    let y: Option<i32> = v.get(0).copied();
    let mut w = Vec::<String>::new();
    w.push("hello".to_string());
    let s: Option<&String> = w.get(0);
    let r: Option<String> = w.get(0).cloned();
}
Propagating errors
In C++, exceptions propagate automatically. In Rust, errors indicated by Result or Option must be explicitly propagated. The ? operator is a convenience for this. There are also several methods for manipulating Result and Option that have a similar effect to propagating the error.
#include <cstddef>
#include <vector>
int accessValue(std::vector<std::size_t> indices,
std::vector<int> values,
std::size_t i) {
// vector::at throws
size_t idx(indices.at(i));
// vector::at throws
return values.at(idx);
}
fn access_value(indices: Vec<usize>, values: Vec<i32>, i: usize) -> Option<i32> {
    // * dereferences the &i32 to copy it
    // ? propagates the None
    let idx = *indices.get(i)?;
    // returns the Option directly
    values.get(idx).copied()
}
The above Rust example is equivalent to the following, which does not use the ? operator. The version using ? is more idiomatic.
fn access_value(indices: Vec<usize>, values: Vec<i32>, i: usize) -> Option<i32> {
    // matching through the & makes a copy of the i32
    let Some(&idx) = indices.get(i) else {
        return None;
    };
    // still returns the Option directly
    values.get(idx).copied()
}
The following example is also equivalent. It is not idiomatic (using ? here is more readable), but does demonstrate one of the helper methods. Option::and_then is similar to std::optional::and_then in C++23.
fn access_value(indices: Vec<usize>, values: Vec<i32>, i: usize) -> Option<i32> {
    // and_then runs the second lookup only if the
    // first one succeeded; a None from either step
    // becomes the result. copied() turns the
    // Option<&i32> into an Option<i32>.
    indices
        .get(i)
        .and_then(|idx| values.get(*idx))
        .copied()
}
These helper methods and others are described in detail in the documentation for Option and Result.
Uncaught exceptions in main
In C++ when an exception is uncaught, it terminates the program with a non-zero exit code and an error message. To achieve a similar result using Result in Rust, main can be given a return type of Result.
#include <stdexcept>
int main() {
throw std::runtime_error("oops");
}
fn main() -> Result<(), &'static str> {
Err("oops")
}
The success type is usually unit () (more generally, it must implement the Termination trait), and the error type can be any type that implements the Debug trait.
#[derive(Debug)]
struct InterestingError {
    message: &'static str,
    other_interesting_value: i32,
}
fn main() -> Result<(), InterestingError> {
    Err(InterestingError {
        message: "oops",
        other_interesting_value: 9001,
    })
}
Running this program produces the output Error: InterestingError { message: "oops", other_interesting_value: 9001 } with an exit code of 1.
Limitations to forcing error handling with Result
Returning Result or Option does not give the usual benefits when used with APIs that pass pre-allocated buffers by mutable reference. This is because the buffer is accessible outside of the Result or Option, and so the compiler cannot force handling of the error case.
For example, in the following program the result of read_line can be ignored, resulting in logic errors in the program. However, since the buffer is required to be initialized, it will not result in memory safety violations or undefined behavior.
fn main() {
    let mut buffer = String::with_capacity(1024);
    std::io::stdin().read_line(&mut buffer);
    // use buffer
}
Rust will produce a warning in this case, because of the #[must_use] attribute on Result.
warning: unused `Result` that must be used
--> example.rs:3:5
|
3 | std::io::stdin().read_line(&mut buffer);
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
= note: this `Result` may be an `Err` variant, which should be handled
= note: `#[warn(unused_must_use)]` on by default
help: use `let _ = ...` to ignore the resulting value
|
3 | let _ = std::io::stdin().read_line(&mut buffer);
| +++++++
Option
does not have a #[must_use]
attribute, so functions that return an
Option
that must be handled (due to the None
case indicating an error)
should be annotated with the #[must_use]
attribute. For example, the get
method on slices returns Option
and is annotated as
#[must_use]
.
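As a small sketch of that advice, the hypothetical function below returns an `Option` whose `None` case the caller is expected to handle, so it carries the attribute itself. The function name and the message text are assumptions made for illustration.

```rust
// Hypothetical lookup function; `None` signals that no element matched.
// With the attribute, discarding the returned Option produces a warning.
#[must_use = "a missing negative value should be handled"]
fn position_of_negative(values: &[i32]) -> Option<usize> {
    values.iter().position(|&v| v < 0)
}
```

A call such as `position_of_negative(&data);` on its own line would then trigger the same `unused_must_use` lint shown above for `Result`.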
Type equivalents
The type equivalents listed in this document are equivalent for the purposes of
programming in Rust as one would program in C++. They are not necessarily
equivalent in terms of being useful for interacting with C or C++ programs via
an FFI. For types that are useful for interoperability with C or C++, see the
Rust std::ffi
module
documentation and the FFI
documentation in the Rustonomicon.
Primitive types
Integer types
In C++, many of the integer types (like int
and long
) have implementation
defined widths. In Rust, integer types are always specified with their widths,
much like the types in <cstdint>
in C++. When it isn't clear what integer type
to use, it is common to default to i32
, which is the type that Rust defaults
to for integer
literals.
C++ type | Rust type |
---|---|
uint8_t | u8 |
uint16_t | u16 |
uint32_t | u32 |
uint64_t | u64 |
int8_t | i8 |
int16_t | i16 |
int32_t | i32 |
int64_t | i64 |
size_t | usize |
 | isize |
In C++ size_t
is conventionally used only for sizes and offsets. The same is
true in Rust for usize
, which is the pointer-sized integer type. The isize
type is the signed equivalent of usize
and has no direct equivalent in C++.
The isize
type is typically only used to represent pointer offsets.
Floating point types
As with integer types in C++, the floating point types float
, double
, and
long double
have implementation defined widths. C++23 introduced types
guaranteed to be IEEE 754 floats of specific widths. Of those, float32_t
and
float64_t
correspond to what is usually expected from float
and double
.
Rust's floating point types are analogous to these.
C++ type | Rust type |
---|---|
float16_t | |
float32_t | f32 |
float64_t | f64 |
float128_t | |
The Rust types analogous to float16_t
and float128_t
(f16
and f128
) are
not yet available in stable
Rust.
Raw memory types
In C++ pointers to or arrays of char
, unsigned char
, or byte
are used to
represent raw memory. In Rust, arrays ([u8; N]
), vectors (Vec<u8>
), or
slices (&[u8]
) of u8
are used to accomplish the same goal. However,
accessing the underlying memory of another Rust value in that way requires
unsafe Rust. There are libraries for creating safe wrappers
around that kind of access for purposes such as serialization or interacting
with hardware.
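For simple cases, safe standard library methods are enough to move between integers and bytes without viewing another value's memory. The sketch below assumes a hypothetical wire format (a little-endian `u32` identifier followed by a payload) and hypothetical function names `encode` and `decode`.

```rust
use std::convert::TryInto;

// Serialize a u32 identifier (little-endian) followed by the payload bytes.
fn encode(id: u32, payload: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(4 + payload.len());
    out.extend_from_slice(&id.to_le_bytes());
    out.extend_from_slice(payload);
    out
}

// Split a byte slice back into the identifier and the remaining payload.
// Returns None when fewer than four bytes are available.
fn decode(bytes: &[u8]) -> Option<(u32, &[u8])> {
    let header: [u8; 4] = bytes.get(..4)?.try_into().ok()?;
    Some((u32::from_le_bytes(header), &bytes[4..]))
}
```

Both functions work entirely on `Vec<u8>`, `[u8; 4]`, and `&[u8]`, so no unsafe code is needed.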
Character and string types
The C++ char
or wchar_t
types have implementation defined widths. Rust does
not have an equivalent to these types. When working with string encodings in
Rust one would use unsigned integer types where one would use the fixed width
character types in C++.
C++ type | Rust type |
---|---|
char8_t | u8 |
char16_t | u16 |
The Rust char
type represents a Unicode scalar value. Thus, a Rust char
is
the same size as a u32
. For working with characters in Rust strings (which are
guaranteed to be valid UTF-8), the char
type is appropriate. For representing
a byte, one should instead use u8
.
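The difference between bytes and characters can be seen by measuring the same string both ways. This is a small sketch; the helper names are made up for the example.

```rust
// Number of bytes in the UTF-8 encoding of `s`.
fn byte_len(s: &str) -> usize {
    s.len()
}

// Number of Unicode scalar values (Rust `char`s) in `s`.
fn char_count(s: &str) -> usize {
    s.chars().count()
}

// Size in bytes of a single `char` value, which matches the size of a u32.
fn char_size() -> usize {
    std::mem::size_of::<char>()
}
```

For the string `"h\u{e9}llo"` the byte length is 6 while the character count is 5, because U+00E9 takes two bytes in UTF-8.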
The Rust standard library includes a type for UTF-8 strings and string slices:
String
and &str
, respectively. Both types guarantee that represented strings
are valid UTF-8. The Rust char
type is appropriate for representing elements
of a String
.
Because str
(without the reference) is a slice, it is unsized and therefore
must be used behind a pointer-like construct, such as a reference or box. For
this reason, string slices are often described as &str
instead of str
in
documentation, even though they can also be used as Box<str>
, Rc<str>
, etc.
Rust also includes types for platform-specific string representations and slices
of those strings:
std::ffi::OsString
and &std::ffi::OsStr
. While these strings use the OS-specific representation,
to use one with the Rust FFI, it must still be converted to a
CString
.
Unlike C++ which has std::u16string
, Rust has no specific representation for
UTF-16 strings. Something like Vec<u16>
can be used, but the type will not
guarantee that its contents are a valid UTF-16 string. Rust does provide
mechanisms for converting String
to and from a UTF-16 encoding
(String::encode_utf16
and
String::from_utf16
,
among others) as well as similar mechanisms for accessing the underlying UTF-8
encoding (for example, String::from_utf8).
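A round trip through UTF-16 using those methods might look like the following sketch. The wrapper function names are assumptions. The decoding direction is fallible because an arbitrary `&[u16]` need not be valid UTF-16.

```rust
// Encode a string slice as UTF-16 code units.
fn to_utf16(s: &str) -> Vec<u16> {
    s.encode_utf16().collect()
}

// Decode UTF-16 code units, returning None if they are not valid UTF-16
// (for example, an unpaired surrogate).
fn from_utf16(units: &[u16]) -> Option<String> {
    String::from_utf16(units).ok()
}
```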
Purpose | Rust type |
---|---|
representing text | String and &str |
representing bytes | vectors, arrays, or slices of u8 |
interacting with OS | OsString and &OsStr |
representing UTF-8 | String |
representing UTF-16 | use a library |
Boolean types
The bool
type in Rust is analogous to the bool
type in C++. Unlike C++, Rust
makes guarantees about the size, alignment, and bit pattern used to represent
values of the bool
type.
void
In C++ void
indicates that a function does not return a value. Because Rust is
expression-oriented, all functions return values. In the place of void
, Rust
uses the unit type ()
. When a function does not have a return type declared,
()
is the return type.
#include <iostream>
void process() {
std::cout
<< "Does something, but returns nothing."
<< std::endl;
}
fn process() {
    println!("Does something but returns nothing.");
}
Since the unit type has only one value (also written ()
), values of the type
provide no information. This also means that the return value can be left
implicit, as in the above example. The following example makes the unit type
usage explicit.
fn process() -> () {
    let () = println!("Does something but returns nothing.");
    ()
}
The syntax of the unit type and syntax of the unit value resemble that of an empty tuple. Essentially, that is what the type is. The following example shows some equivalent types, though without the special syntax or language integration.
struct Pair<T1, T2>(T1, T2); // the same as (T1, T2)
struct Single<T>(T); // a tuple with just one value, (T,)
struct Unit; // the same as ()
// can also be written as
// struct Unit();

fn main() {
    let pair = Pair(1, 2.0);
    let single = Single(1);
    let unit = Unit;
    // can also be written as
    // let unit = Unit();
}
Using a unit type instead of void
enables expressions with unit type (such as
function calls that would return void
in C++) to be used in contexts that
expect a value. This is especially helpful with defining and using generic
functions, instead of needing something like std::is_void
to special-case the
handling when a type is void
.
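As a sketch of that point, the hypothetical generic helper below is written once and works both for closures that return a value and for closures that return unit, with no special case.

```rust
// Hypothetical generic helper: calls `f` twice and collects both results.
// `T` may be any type, including the unit type `()`.
fn call_twice<T>(mut f: impl FnMut() -> T) -> [T; 2] {
    [f(), f()]
}
```

Calling `call_twice(|| 7)` yields an array of integers, while passing a closure with only side effects yields `[(), ()]`. An equivalent C++ template would need separate handling for a `void` return type, since an array of `void` cannot be formed.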
Pointers
The following table maps the ownership-managing classes from C++ to equivalent types in Rust.
Use | C++ type | Rust type |
---|---|---|
Owned | T | T |
Single owner, dynamic storage | std::unique_ptr<T> | Box<T> |
Shared owner, dynamic storage, immutable, not thread-safe | std::shared_ptr<T> | std::rc::Rc<T> |
Shared owner, dynamic storage, immutable, thread-safe | std::shared_ptr<T> | std::sync::Arc<T> |
Shared owner, dynamic storage, mutable, not thread-safe | std::shared_ptr<T> | std::rc::Rc<std::cell::RefCell<T>> |
Shared owner, dynamic storage, mutable, thread-safe | std::shared_ptr<T> guarded by a std::mutex | std::sync::Arc<std::sync::Mutex<T>> |
Const reference | const T& | &T |
Mutable reference | T& | &mut T |
Const observer pointer | const T* | &T |
Mutable observer pointer | T* | &mut T |
In C++, the thread safety of std::shared_ptr
is more nuanced than it appears
in this table (e.g., some uses may require std::atomic
). However, in safe Rust
the compiler will prevent the incorrect use of the shared owner types.
Unlike with C++ references, Rust can have references-to-references. Rust references are more like observer pointers than they are like C++ references.
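The two mutable shared-owner rows of the table can be sketched as follows. The function names are hypothetical. The first uses `Rc<RefCell<T>>` on a single thread and the second uses `Arc<Mutex<T>>` across threads.

```rust
use std::cell::RefCell;
use std::rc::Rc;
use std::sync::{Arc, Mutex};
use std::thread;

// Shared, mutable, single-threaded: two owners push into the same vector.
fn shared_single_threaded() -> Vec<i32> {
    let log = Rc::new(RefCell::new(Vec::<i32>::new()));
    let other_owner = Rc::clone(&log);
    log.borrow_mut().push(1);
    other_owner.borrow_mut().push(2);
    let result = log.borrow().clone();
    result
}

// Shared, mutable, thread-safe: each thread increments the same counter.
fn shared_across_threads(threads: u32) -> u32 {
    let counter = Arc::new(Mutex::new(0u32));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                *counter.lock().unwrap() += 1;
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}
```

Replacing `Arc<Mutex<_>>` with `Rc<RefCell<_>>` in the second function is rejected at compile time, because `Rc` cannot be sent to another thread.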
void*
Rust does not have anything directly analogous to void*
in C++. The upcoming chapter
on RTTI
will cover some use cases where the goal is dynamic
typing. The FFI chapter of the
Rustonomicon
covers some use cases where the goal is interoperability with C programs that
use void*
.
Containers
Both C++ and Rust containers own their elements. However, in both the element type may be a non-owning type, such as a pointer in C++ or a reference in Rust.
C++ type | Rust type |
---|---|
std::vector<T> | Vec<T> |
std::array<T, N> | [T; N] |
std::list<T> | std::collections::LinkedList<T> |
std::queue<T> | std::collections::VecDeque<T> |
std::deque<T> | std::collections::VecDeque<T> |
std::stack<T> | Vec<T> |
std::map<K,V> | std::collections::BTreeMap<K,V> |
std::unordered_map<K,V> | std::collections::HashMap<K,V> |
std::set<K> | std::collections::BTreeSet<K> |
std::unordered_set<K> | std::collections::HashSet<K> |
std::priority_queue<T> | std::collections::BinaryHeap<T> |
std::span<T> | &[T] |
For maps and sets, instead of the container being parameterized over the hash or
comparison function used, the types require that the key types implement the
std::hash::Hash
and Eq (unordered) or std::cmp::Ord
(ordered) traits. To use the containers
with different hash or comparison functions, one must use a wrapper type with a
different implementation of the required trait.
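The standard library's `std::cmp::Reverse` is one such wrapper type: it implements `Ord` by reversing the ordering of the wrapped value. The sketch below (with a hypothetical function name) uses it to get a set ordered from largest to smallest, where C++ would pass `std::greater` as a template argument to `std::set`.

```rust
use std::cmp::Reverse;
use std::collections::BTreeSet;

// Collect the values into an ordered set whose ordering is reversed by the
// wrapper type, then unwrap them again in iteration order.
fn sorted_descending(values: &[i32]) -> Vec<i32> {
    let set: BTreeSet<Reverse<i32>> =
        values.iter().copied().map(Reverse).collect();
    set.into_iter().map(|Reverse(v)| v).collect()
}
```

A custom ordering follows the same pattern with a user-defined wrapper struct and a hand-written `Ord` implementation.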
Some C++ container types provided by the STL have no equivalent in Rust. Many of those have equivalents available in third-party libraries.
One significant difference in the use of these types between C++ and Rust is with
the Vec<T>
and array [T; N]
types, from which slice references &[T]
or
&mut [T]
to part or all of the data can be cheaply created. For this reason,
when defining a function that does not modify the length of a vector and does
not need to statically know the number of elements in an array, it is more
idiomatic to take a parameter as &[T]
or &mut [T]
than as a reference to the
owned type.
In C++ it is better to take begin and end iterators than a span
when possible,
since iterators are more general. The same is true with Rust and taking a
generic type that implements IntoIterator<Item = &T>
or IntoIterator<Item = &mut T>
instead of
&[T]
.
#include <iterator>
#include <vector>
template <typename InputIter>
void go(InputIter first, InputIter last) {
for (auto it = first; it != last; ++it) {
// ...
}
}
int main() {
std::vector<int> v = {1, 2, 3};
go(v.begin(), v.end());
}
use std::iter::IntoIterator;

fn go<'a>(iter: impl IntoIterator<Item = &'a mut i32>) {
    for x in iter {
        // ...
    }
}

fn main() {
    let mut v = vec![1, 2, 3];
    go(&mut v);
}
Type promotions and conversions
lvalue to rvalue
In C++ lvalues are automatically converted to rvalues when needed.
In Rust the equivalent of lvalues are "place expressions" (expressions that represent memory locations) and the equivalent of rvalues are "value expressions". Place expressions are automatically converted to value expressions when needed.
int main() {
// Local variables are lvalues,
int x(0);
// and therefore may be assigned to.
x = 42;
// x is converted to an rvalue when needed.
int y = x + 1;
}
fn main() {
    // Local variables are place expressions,
    let mut x = 0;
    // and therefore may be assigned to.
    x = 42;
    // x is converted to a value expression when
    // needed.
    let y = x + 1;
}
Array to pointer
In C++, arrays are automatically converted to pointers as required.
The equivalent to this in Rust is the automatic conversion of vector and array references to slice references.
#include <cstring>
int main() {
char example[6] = "hello";
char other[6];
// strncpy takes arguments of type char*
strncpy(other, example, 6);
}
fn third(ts: &[char]) -> Option<&char> {
    ts.get(2)
}

fn main() {
    let vec: Vec<char> = vec!['a', 'b', 'c'];
    let arr: [char; 3] = ['a', 'b', 'c'];
    third(&vec);
    third(&arr);
}
Because slice references can be easily used in a memory-safe way, it is generally recommended in Rust to define functions in terms of slice references instead of in terms of references to vectors or arrays, unless vector-specific or array-specific functionality is needed.
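Unlike in C++ where the conversion from arrays to pointers is built into the
language, the conversion from vector references to slice references is a general
mechanism provided by the Deref
trait, which provides one
kind of user-defined conversion. The conversion from array references to slice
references is a built-in unsized coercion.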
Function to pointer
In C++ functions and static member functions are automatically converted to function pointers.
Rust performs the same conversion. In addition to functions and members that do
not take self
as an argument, constructors (proper constructors) also have
function type and can be converted to function pointers. Non-capturing closures
do not have function type, but can also be converted to function pointers.
int twice(int n) {
return n * n;
}
struct MyPair {
int x;
int y;
MyPair(int x, int y) : x(x), y(y) {}
static MyPair make() {
return MyPair{0, 0};
}
};
int main() {
// convert a function to a function pointer
int (*twicePtr)(int) = twice;
int result = twicePtr(5);
// Per C++23 11.4.5.1.6, can't take the address
// of a constructor.
// MyPair (*ctor)(int, int) = MyPair::MyPair;
// MyPair pair = ctor(10, 20);
// convert a static method to a function
// pointer
MyPair (*methodPtr)() = MyPair::make;
MyPair pair2 = methodPtr();
// convert a non-capturing closure to a
// function pointer
int (*closure)(int) = [](int x) -> int {
return x * 5;
};
int closureRes = closure(2);
}
fn twice(x: i32) -> i32 {
    x * x
}

struct MyPair(i32, i32);

impl MyPair {
    fn new() -> MyPair {
        MyPair(0, 0)
    }
}

fn main() {
    // convert a function to a function pointer
    let twice_ptr: fn(i32) -> i32 = twice;
    let res = twice_ptr(5);

    // convert a constructor to a function pointer
    let ctor_ptr: fn(i32, i32) -> MyPair = MyPair;
    let pair = ctor_ptr(10, 20);

    // convert a static method to a function
    // pointer
    let method_ptr: fn() -> MyPair = MyPair::new;
    let pair2 = method_ptr();

    // convert a non-capturing closure to a
    // function pointer
    let closure: fn(i32) -> i32 = |x: i32| x * 5;
    let closure_res = closure(2);
}
Numeric promotion and numeric conversion
In C++ there are several kinds of implicit conversions that occur between numeric types. The most commonly encountered are numeric promotions, which convert numeric types to larger types.
These lossless conversions are not implicit in Rust. Instead, they must be
performed explicitly using the Into::into()
method. These conversions are
provided by implementations of the
From
and
Into
traits. The list
of conversions provided by the Rust standard library is given on the
documentation
page for
the trait.
int main() {
int x(42);
long y = x;
float a(1.0);
double b = a;
}
fn main() {
    let x: i32 = 42;
    let y: i64 = x.into();

    let a: f32 = 1.0;
    let b: f64 = a.into();
}
There are several implicit conversions that occur in C++ that are not lossless. For example, signed integers can be implicitly converted to unsigned integers in C++.
In Rust, these conversions are also required to be explicit and are provided by
the TryFrom
and
TryInto
traits
which require handling the cases where the value does not map to the other type.
int main() {
int x(42);
unsigned int y(x);
float a(1.0);
double b(a);
}
use std::convert::TryInto;

fn main() {
    let x: i32 = 42;
    let y: u32 = match x.try_into() {
        Ok(x) => x,
        Err(err) => {
            panic!("Can't convert! {:?}", err);
        }
    };
}
Some conversions that occur in C++ are supported by neither From
nor TryFrom
because there is not a clear choice of conversion or because they are not
value-preserving. For example, in C++ int32_t
can implicitly be converted to
float
despite float
not being able to represent all 32 bit integers
precisely, but in Rust there is no TryFrom<i32>
implementation for f32
.
In Rust the only way to convert from an i32
to an f32
is with the as
operator.
The operator can actually be used to convert between other primitive types as
well and does not panic or produce undefined behavior, but may not convert in
the desired way (e.g., it may use a different rounding mode than desired or it
may truncate rather than saturate as desired).
#include <cstdint>
int main() {
int32_t x(42);
float a = x;
}
fn main() {
    let x: i32 = 42;
    let a: f32 = x as f32;
}
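The behavior of `as` for a few representative conversions can be sketched with the small wrappers below. The function names are hypothetical. Integer-to-integer casts truncate, float-to-integer casts saturate (and map NaN to zero), and integer-to-float casts round to a representable value.

```rust
// Integer to narrower integer: keeps only the low 8 bits.
fn truncate_to_u8(x: i32) -> u8 {
    x as u8
}

// Float to integer: rounds toward zero and saturates at the integer bounds.
fn float_to_i32(x: f32) -> i32 {
    x as i32
}

// Integer to float: rounds when the integer is not exactly representable.
fn i32_to_f32(x: i32) -> f32 {
    x as f32
}
```

For example, `truncate_to_u8(300)` is 44 and `i32_to_f32(16_777_217)` is 16777216.0, neither of which signals that information was lost.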
isize
and usize
In the Rust standard library the isize
and usize
types are used for values
intended to be used as indices (much like size_t
in C++). However, their use for
other purposes is usually discouraged in favor of using explicitly sized types
such as u32
. This results in a situation where values of type u32
have to be
converted to usize
for use in indexing, but Into<usize>
is not implemented
for u32
.
In these cases, best practice is to use TryInto
, and if further error handling
of the failure case is not desired, to call unwrap
, creating a panic at the
point of conversion.
This is preferred because it prevents the possibility of moving forward with an
incorrect value. E.g., consider converting a u64
to a usize
that has a
32-bit representation with as
, which truncates the result. A value that is one
greater than u32::MAX
will truncate to 0
, which would probably result in
successfully retrieving the wrong value from a data structure, thus masking a
bug and producing unexpected behavior.
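A minimal sketch of that practice follows, using hypothetical function names. The first function panics at the point of conversion if the index does not fit in `usize`. The second leaves the failure case to the caller.

```rust
use std::convert::TryInto;

// Convert the u32 index to usize, panicking if it does not fit, then index.
fn element_at(values: &[i32], index: u32) -> i32 {
    let index: usize = index.try_into().unwrap();
    values[index]
}

// Convert a u64 to usize, reporting failure as None instead of truncating.
fn checked_index(index: u64) -> Option<usize> {
    index.try_into().ok()
}
```

On a target where `usize` is 32 bits, `checked_index(u64::from(u32::MAX) + 1)` returns `None`, whereas the same value cast with `as` would become 0.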
Enums
In C++ unscoped enums can be implicitly converted to integer types.
In Rust the conversion requires the use of the as
operator, and providing
From
and TryFrom
implementations to move back and forth between the enum and
its representation type is recommended. Examples and additional details are
given in the chapter on enums.
Qualification conversion
In C++ qualification conversions enable the use of values that lack a const (or volatile) qualifier where a const-qualified (or volatile-qualified) value is expected.
In Rust the equivalent enables mut
variables and mut
references
to be used where non-mut
variables or references are expected.
#include <iostream>
#include <string>
void display(const std::string &msg) {
std::cout << "Displaying: " << msg << std::endl;
}
int main() {
// no const qualifier
std::string message("hello world");
// used where const expected
display(message);
}
fn display(msg: &str) {
    println!("{}", msg);
}

fn main() {
    let mut s: String = "hello world".to_string();
    let message: &mut str = s.as_mut();
    display(message);
}
Integer literals
In C++ integer literals with no suffix indicating type have the smallest type in
which they can fit from int
, long int
, or long long int
. When the literal
is then assigned to a variable of a different type, an implicit conversion is
performed.
In Rust, integer literals have their type inferred depending on context. When
there is insufficient information to infer a type, either i32
is assumed or a
type annotation is required.
#include <cstdint>
#include <iostream>
int main() {
// Compiles without error (but with a warning).
uint32_t x = 4294967296;
// assumes int
auto y = 1;
// literal is given a larger type, so it prints
// correctly
std::cout << 4294967296 << std::endl;
// these work as expected
std::cout << INT64_C(4294967296) << std::endl;
uint64_t z = INT64_C(4294967296);
std::cout << z << std::endl;
}
fn main() {
    // error: literal out of range for `u32`
    // let x: u32 = 4294967296;

    // assumes i32
    let y = 1;

    // fails to compile because it is inferred as i32
    // print!("{}", 4294967296);

    // These work, though.
    println!("{}", 4294967296u64);
    let z: u64 = 4294967296;
    println!("{}", z);
}
Safe bools
The safe bool idiom exists to make it possible to use types as conditions. Since C++11 this idiom is straightforward to implement.
In Rust instead of converting the value to a boolean, the normal idiom matches
on the value instead. Depending on the situation, the mechanism used for
matching might be match
, if let
, or let else
.
struct Wire {
bool ready;
unsigned int value;
explicit operator bool() const { return ready; }
};
int main() {
Wire w{false, 0};
// ...
if (w) {
// use w.value
} else {
// do something else
}
}
enum Wire {
    Ready(u32),
    NotReady,
}

fn main() {
    let wire = Wire::NotReady;
    // ...

    // match
    match wire {
        Wire::Ready(v) => {
            // use value v
        }
        Wire::NotReady => {
            // do something else
        }
    }

    // if let
    if let Wire::Ready(v) = wire {
        // use value v
    }

    // let else
    let Wire::Ready(v) = wire else {
        // do something that doesn't continue,
        // like early return
        return;
    };
}
User-defined conversions
User-defined conversions are covered in a separate chapter.
User-defined conversions
In C++ user-defined conversions are created using converting
constructors
or conversion
functions. Because
converting constructors are opt-out (via the explicit
specifier), implicit
conversions occur with regularity in C++ code. In the following example both the
assignments and the function calls make use of implicit conversions as provided
by a converting constructor.
Rust makes significantly less use of implicit conversions. Instead most
conversions are explicit. The
std::convert
module
provides several traits for working with user-defined conversions. In Rust, the
below example makes use of explicit conversions by implementing the From
trait.
struct Widget {
Widget(int) {}
Widget(int, int) {}
};
void process(Widget w) {}
int main() {
Widget w1 = 1;
Widget w2 = {4, 5};
process(1);
process({4, 5});
return 0;
}
struct Widget;

impl From<i32> for Widget {
    fn from(_x: i32) -> Widget {
        Widget
    }
}

impl From<(i32, i32)> for Widget {
    fn from(_x: (i32, i32)) -> Widget {
        Widget
    }
}

fn process(w: Widget) {}

fn main() {
    let w1: Widget = 1.into();
    // For construction this is more idiomatic:
    let w1b = Widget::from(1);

    let w2: Widget = (4, 5).into();
    // For construction this is more idiomatic:
    let w2b = Widget::from((4, 5));

    process(1.into());
    process((4, 5).into());
}
The into
method used above is provided via a blanket
implementation
for the Into trait
for types that implement the From
trait. Because of the existence of the
blanket
implementation,
it is generally preferred to implement the From
trait instead of the Into
trait, and let the Into
trait be provided by that blanket implementation.
Conversion functions
C++ conversion functions enable conversions in the other direction, from the defined class to another type.
To achieve the same in Rust, the From
trait can be implemented in the other
direction. At least one of the source type or the target type must be defined in
the same crate as the trait implementation.
#include <utility>
struct Point {
int x;
int y;
operator std::pair<int, int>() const {
return std::pair(x, y);
}
};
void process(std::pair<int, int>) {}
int main() {
Point p1{1, 2};
Point p2{3, 4};
std::pair<int, int> xy = p1;
process(p2);
return 0;
}
struct Point {
    x: i32,
    y: i32,
}

impl From<Point> for (i32, i32) {
    fn from(p: Point) -> (i32, i32) {
        (p.x, p.y)
    }
}

fn process(x: (i32, i32)) {}

fn main() {
    let p1 = Point { x: 1, y: 2 };
    let p2 = Point { x: 3, y: 4 };
    let xy: (i32, i32) = p1.into();
    process(p2.into());
}
Conversion functions are often used to implement the safe bool pattern in C++, which is addressed in a different way in Rust.
Borrowing conversions
The methods in the From
and Into
traits take ownership of the values to be
converted. When this is not desired in C++, the conversion function can just
take and return references.
To achieve the same in Rust the AsRef
trait or AsMut
trait are used.
#include <iostream>
#include <string>
struct Person {
std::string name;
operator std::string &() {
return this->name;
}
};
void process(const std::string &name) {
std::cout << name << std::endl;
}
int main() {
Person alice{"Alice"};
process(alice);
return 0;
}
struct Person {
    name: String,
}

impl AsRef<str> for Person {
    fn as_ref(&self) -> &str {
        &self.name
    }
}

fn process(name: &str) {
    println!("{}", name);
}

fn main() {
    let alice = Person {
        name: "Alice".to_string(),
    };
    process(alice.as_ref());
}
It is common to use AsRef
or AsMut
as a trait bound in function definitions.
Using generics with an AsRef
or AsMut
bound allows clients to call the
functions with anything that can be cheaply viewed as the type that the function
wants to work with. Using this technique, the above definition of process
would be defined as in the following example.
struct Person {
    name: String,
}

impl AsRef<str> for Person {
    fn as_ref(&self) -> &str {
        &self.name
    }
}

fn process<T: AsRef<str>>(name: T) {
    println!("{}", name.as_ref());
}

fn main() {
    let alice = Person {
        name: "Alice".to_string(),
    };
    process(alice);
}
This technique is often used with functions that take file system paths, so that literal strings can more easily be used as paths.
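A sketch of that use with file system paths follows. The function `extension_of` is hypothetical. It accepts anything that can be viewed as a `Path`, which includes `&str`, `String`, and `&Path`.

```rust
use std::path::Path;

// Return the file extension of `path` as an owned String, if there is one
// and it is valid UTF-8.
fn extension_of<P: AsRef<Path>>(path: P) -> Option<String> {
    path.as_ref().extension()?.to_str().map(str::to_owned)
}
```

A caller can then write `extension_of("notes/todo.txt")` without first constructing a `Path` from the string literal.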
Fallible conversions
In C++ when conversions might fail it is possible (though usually discouraged) to throw an exception from the converting constructor or conversion function.
Error handling in Rust does not use exceptions. Instead
the TryFrom
trait
and TryInto
trait
are used for fallible conversions. These traits differ from From
and Into
in
that they return a Result
, which may indicate a failing case. When a
conversion may fail one should implement TryFrom
and rely on the client to
call unwrap
on the result, rather than panic in a From
implementation.
#include <stdexcept>
#include <string>
class NonEmpty {
std::string s;
public:
NonEmpty(std::string s) : s(s) {
if (this->s.empty()) {
throw std::domain_error("empty string");
}
}
};
int main() {
std::string s("");
NonEmpty x = s; // throws
return 0;
}
use std::convert::TryFrom;
use std::convert::TryInto;

struct NonEmpty {
    s: String,
}

#[derive(Clone, Copy, Debug)]
struct NonEmptyStringError;

impl TryFrom<String> for NonEmpty {
    type Error = NonEmptyStringError;

    fn try_from(s: String) -> Result<NonEmpty, NonEmptyStringError> {
        if s.is_empty() {
            Err(NonEmptyStringError)
        } else {
            Ok(NonEmpty { s })
        }
    }
}

fn main() {
    let res: Result<NonEmpty, NonEmptyStringError> = "".to_string().try_into();
    match res {
        Ok(ne) => {
            println!("Converted!");
        }
        Err(err) => {
            println!("Couldn't convert");
        }
    }
}
Just like with From
and Into
, there is a blanket
implementation
for TryInto
for everything that implements TryFrom
.
Implicit conversions
Rust does have one kind of user-defined implicit conversion, called deref
coercions,
provided by the Deref
trait and
DerefMut
trait. These
coercions exist for making pointer-like types more ergonomic to use.
An example of implementing the traits for a custom pointer-like type is given in the Rust book.
Summary
A summary of when to use which kind of conversion interface is given in the
documentation for the std::convert
module.
Overloading
C++ supports overloading of functions, so long as the invocations of the functions can be distinguished by the number or types of their arguments.
Rust does not support this kind of function overloading. Instead, Rust has a few different mechanisms (some of which C++ also has) for achieving the effects of overloading in a way that interacts better with type inference. The mechanisms usually involve making the commonalities between the overloaded functions apparent in the code.
#include <string>
double twice(double x) {
return x + x;
}
int twice(int x) {
return x + x;
}
fn twice(x: f64) -> f64 {
    x + x
}

// error[E0428]: the name `twice` is defined multiple times
// fn twice(x: i32) -> i32 {
//     x + x
// }
In practice, an example like the above would also likely be implemented in a more structured way even in C++, using templates.
When phrased this way, the example can be translated to Rust, with the notable addition of requiring a trait bound on the type.
template <typename T>
T twice(T x) {
return x + x;
}
fn twice<T>(x: T) -> T::Output
where
    T: std::ops::Add<T>,
    T: Copy,
{
    x + x
}
Overloaded methods
In C++ it is possible to have methods with the same name but different signatures on the same type. In Rust there can be at most one method with the same name for each trait implementation and at most one inherent method with the same name for a type.
In cases where there are multiple methods with the same names because the method is defined for multiple traits, the desired method must be distinguished at the call site by specifying the trait.
trait TraitA {
    fn go(&self) -> String;
}

trait TraitB {
    fn go(&self) -> String;
}

struct MyStruct;

impl MyStruct {
    fn go(&self) -> String {
        "Called inherent method".to_string()
    }
}

impl TraitA for MyStruct {
    fn go(&self) -> String {
        "Called Trait A method".to_string()
    }
}

impl TraitB for MyStruct {
    fn go(&self) -> String {
        "Called Trait B method".to_string()
    }
}

fn main() {
    let my_struct = MyStruct;

    // Calling the inherent method
    println!("{}", my_struct.go());

    // Calling the method from TraitA
    println!("{}", TraitA::go(&my_struct));

    // Calling the method from TraitB
    println!("{}", TraitB::go(&my_struct));
}
One exception to this is when the methods are all from the same generic trait
with different type parameters for the implementations. In that case, if
the signature is sufficient to determine which implementation to use, the trait
does not need to be specified to resolve the method. This is common when using
the From
trait.
struct Widget;

impl From<i32> for Widget {
    fn from(x: i32) -> Widget {
        Widget
    }
}

impl From<f32> for Widget {
    fn from(x: f32) -> Widget {
        Widget
    }
}

fn main() {
    // Calls <Widget as From<i32>>::from
    let w1 = Widget::from(5);
    // Calls <Widget as From<f32>>::from
    let w2 = Widget::from(1.0);
}
Overloaded operators
In C++ most operators can be overloaded either with a free-standing function or by providing a method defining the operator on a class.
Rust provides operator overloading via the implementation of specific traits. Implementing a method with the same name as the one required by the trait will not make a type usable with the operator if the trait itself is not implemented.
struct Vec2 {
double x;
double y;
Vec2 operator+(const Vec2 &other) const {
return Vec2{x + other.x, y + other.y};
}
};
int main() {
Vec2 a{1.0, 2.0};
Vec2 b{3.0, 4.0};
Vec2 c = a + b;
}
#[derive(Clone, Copy)]
struct Vec2 {
    x: f64,
    y: f64,
}

impl std::ops::Add for &Vec2 {
    type Output = Vec2;

    // Note that the type of self here is &Vec2.
    fn add(self, other: Self) -> Vec2 {
        Vec2 {
            x: self.x + other.x,
            y: self.y + other.y,
        }
    }
}

fn main() {
    let a = Vec2 { x: 1.0, y: 2.0 };
    let b = Vec2 { x: 3.0, y: 4.0 };
    let c = &a + &b;
}
Additionally, sometimes it is best to provide trait implementations for various
combinations of reference types, especially for types that implement the Copy trait
, since they are
likely to want to be used either with or without taking a reference. For the
example above, that involves defining four implementations.
#[derive(Clone, Copy)]
struct Vec2 {
    x: f64,
    y: f64,
}

impl std::ops::Add<&Vec2> for &Vec2 {
    type Output = Vec2;

    fn add(self, other: &Vec2) -> Vec2 {
        Vec2 {
            x: self.x + other.x,
            y: self.y + other.y,
        }
    }
}

// If Vec2 weren't so small, it might be desirable to re-use space in the below
// implementations, since they take ownership.
impl std::ops::Add<Vec2> for &Vec2 {
    type Output = Vec2;

    fn add(self, other: Vec2) -> Vec2 {
        Vec2 {
            x: self.x + other.x,
            y: self.y + other.y,
        }
    }
}

impl std::ops::Add<&Vec2> for Vec2 {
    type Output = Vec2;

    fn add(self, other: &Vec2) -> Vec2 {
        Vec2 {
            x: self.x + other.x,
            y: self.y + other.y,
        }
    }
}

impl std::ops::Add<Vec2> for Vec2 {
    type Output = Vec2;

    fn add(self, other: Vec2) -> Vec2 {
        Vec2 {
            x: self.x + other.x,
            y: self.y + other.y,
        }
    }
}

fn main() {
    let a = Vec2 { x: 1.0, y: 2.0 };
    let b = Vec2 { x: 3.0, y: 4.0 };
    let c = a + b;
}
The repetition can be addressed by defining a macro.
#[derive(Clone, Copy)]
struct Vec2 {
    x: f64,
    y: f64,
}

macro_rules! impl_add_vec2 {
    ($lhs:ty, $rhs:ty) => {
        impl std::ops::Add<$rhs> for $lhs {
            type Output = Vec2;

            fn add(self, other: $rhs) -> Vec2 {
                Vec2 {
                    x: self.x + other.x,
                    y: self.y + other.y,
                }
            }
        }
    };
}

impl_add_vec2!(&Vec2, &Vec2);
impl_add_vec2!(&Vec2, Vec2);
impl_add_vec2!(Vec2, &Vec2);
impl_add_vec2!(Vec2, Vec2);

fn main() {
    let a = Vec2 { x: 1.0, y: 2.0 };
    let b = Vec2 { x: 3.0, y: 4.0 };
    let c = a + b;
}
Default arguments
Default arguments in C++ are sometimes implemented in terms of function overloading.
Rust does not have default arguments. Instead, arguments with Option
type can
be used to provide a similar effect.
unsigned int shift(unsigned int x,
unsigned int shiftAmount) {
return x << shiftAmount;
}
unsigned int shift(unsigned int x) {
return shift(x, 2);
}
int main() {
unsigned int a = shift(7); // shifts by 2
}
use std::ops::Shl;

fn shift(
    x: u32,
    shift_amount: Option<u32>,
) -> u32 {
    let a = shift_amount.unwrap_or(2);
    x.shl(a)
}

fn main() {
    let res = shift(7, None); // shifts by 2
}
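As a small sketch of how both call forms look at the call site, the following variant of shift uses the << operator directly and shows a caller overriding the default with Some. The values passed are illustrative only.

```rust
/// Shifts `x` left by `shift_amount`, or by 2 when no amount is given.
fn shift(x: u32, shift_amount: Option<u32>) -> u32 {
    x << shift_amount.unwrap_or(2)
}

fn main() {
    // None selects the default shift amount of 2.
    println!("{}", shift(7, None));
    // Some(n) overrides the default.
    println!("{}", shift(7, Some(3)));
}
```

Unlike a C++ default argument, the caller always has to write the argument, so the use of the default is visible at every call site.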
Unrelated overloads
The lack of completely ad hoc overloading in Rust encourages the definition of traits that capture essential commonalities between types, so that functions can be implemented in terms of those interfaces and used generally. However, it also sometimes encourages the anti-pattern of defining traits that capture only incidental commonalities (such as having methods of the same name).
It is better programming practice in those cases to simply define separate functions, rather than to shoehorn in a trait where no real commonality exists.
This is commonly seen in Rust in the naming conventions for constructor static methods. Instead of all being named new with different arguments, they are usually given names of the form from_something, where the something varies based on what the value is being constructed from, or a more specific name if appropriate.
struct Vec3 {
    x: f64,
    y: f64,
    z: f64,
}

impl Vec3 {
    fn from_x(x: f64) -> Vec3 {
        Vec3 { x, y: 0.0, z: 0.0 }
    }

    fn from_y(y: f64) -> Vec3 {
        Vec3 { x: 0.0, y, z: 0.0 }
    }

    fn diagonal(d: f64) -> Vec3 {
        Vec3 { x: d, y: d, z: d }
    }
}
This differs from the conversion methods supported by the From and Into traits, which have the additional purpose of supporting trait bounds on generic functions that should accept any type convertible to a specific type.
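To illustrate that additional purpose, here is a minimal sketch in which a generic function accepts anything convertible to Vec3. The From implementation for arrays and the function length_squared are hypothetical names chosen for this example.

```rust
struct Vec3 {
    x: f64,
    y: f64,
    z: f64,
}

// Hypothetical conversion from an array of three components.
impl From<[f64; 3]> for Vec3 {
    fn from(a: [f64; 3]) -> Vec3 {
        Vec3 { x: a[0], y: a[1], z: a[2] }
    }
}

// Hypothetical generic function: the trait bound accepts any type
// that can be converted into a Vec3, including Vec3 itself.
fn length_squared(v: impl Into<Vec3>) -> f64 {
    let v = v.into();
    v.x * v.x + v.y * v.y + v.z * v.z
}

fn main() {
    let from_array = length_squared([1.0, 2.0, 2.0]);
    let from_vec3 = length_squared(Vec3 { x: 3.0, y: 0.0, z: 4.0 });
    println!("{} {}", from_array, from_vec3);
}
```

Implementing From also provides the matching Into implementation, which is why only From is implemented here.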
Object identity
In C++ the pointer to an object is sometimes used to represent its identity in terms of the logic of a program.
In some cases, this is a standard optimization, such as when implementing the copy assignment operator.
In other cases the pointer value is used as a logical identity to distinguish between specific instances of an object that otherwise have the same properties. For example, representing a labeled graph where there may be distinct nodes that have the same label.
In Rust, some of these cases are not applicable, and other cases are typically handled instead by implementing a synthetic notion of identity for the values.
Overloading copy assignment and equality comparison operators
For example, when implementing the copy-assignment operator, one might short-circuit when the copied object and the assignee are the same. Note that in this use the pointer values are not stored.
This kind of optimization is unnecessary when implementing Rust's equivalent of the copy assignment operator, Clone::clone_from. The type of Clone::clone_from prevents the same object from being passed as both arguments, because one of the arguments is a mutable reference, which is exclusive, and so prevents the other reference argument from referring to the same object.
struct Person
{
std::string name;
// many other expensive-to-copy fields
Person& operator=(const Person& other) {
// compare object identity first
if (this != &other) {
this->name = other.name;
// copy the other expensive-to-copy fields
}
return *this;
}
};
struct Person {
    name: String,
}

impl Clone for Person {
    fn clone(&self) -> Self {
        Self { name: self.name.clone() }
    }

    fn clone_from(&mut self, source: &Self) {
        // self and source cannot be the same here,
        // because that would mean there are a
        // mutable and an immutable reference to
        // the same memory location. Therefore, a
        // check for assignment to self is not
        // needed, even for the purpose of
        // optimization.
        self.name.clone_from(&source.name);
    }
}
In cases in C++ where most comparisons are between an object and itself (e.g., the object's primary use is to be stored in a hash set), and comparison of unequal objects is expensive, comparing object identity might be used as an optimization for the equality comparison operator overload.
To support similar operations in Rust, std::ptr::eq can be used.
struct Person
{
std::string name;
// many other expensive-to-compare fields
};
bool operator==(const Person& lhs, const Person& rhs) {
// compare object identity first
if (&lhs == &rhs) {
return true;
}
// compare the other expensive-to-compare fields
return true;
}
struct Person {
    name: String,
    // many other expensive-to-compare fields
}

impl PartialEq for Person {
    fn eq(&self, other: &Self) -> bool {
        if std::ptr::eq(self, other) {
            return true;
        }
        // compare other expensive-to-compare fields
        true
    }
}

impl Eq for Person {}
Distinguishing between values in a relational structure
The other use is when relationships between values are represented using a data structure external to the values, such as when representing a labeled graph in which multiple nodes might share the same label, but have edges between different sets of other nodes. This differs from the earlier case because the pointer value is preserved.
One real-world example of this is in the LLVM codebase, where occurrences of declarations, statements, and expressions in the AST are distinguished by object identity. For example, variable expressions (class DeclRefExpr) contain the pointer to the occurrence of the declaration to which the variable refers.
Similarly, when comparing whether two variable declarations represent declarations of the same variable, a pointer to some canonical VarDecl is used:
VarDecl *VarDecl::getCanonicalDecl();
bool CapturedStmt::capturesVariable(const VarDecl *Var) const {
for (const auto &I : captures()) {
if (!I.capturesVariable() && !I.capturesVariableByCopy())
continue;
if (I.getCapturedVar()->getCanonicalDecl() == Var->getCanonicalDecl())
return true;
}
return false;
}
This kind of use is often discouraged in C++ because of the risk of use-after-free bugs, but might be used in performance sensitive applications where either storing the memory to represent the mapping or the additional indirection to resolve an entity's value from its identity is cost prohibitive.
In Rust it is generally preferred to represent the identity of objects with synthetic identifiers. This is in part because the same technique is used for modeling self-referential data structures.
As an example, the popular Rust graph library petgraph uses u32 as its default node identity type. This incurs the cost of an extra call to resolve the synthetic identifier to the label of the represented node, as well as the extra memory required to store the mapping from nodes to labels.
A simplified graph representation using the same synthetic identifier technique would look like the following, which represents the node identities by their index in the vectors that represent the labels and the edges.
enum Color {
    Red,
    Blue,
}

struct Graph {
    /// Maps from node id to node labels, which here are colors.
    nodes_labels: Vec<Color>,
    /// Maps from node id to adjacent nodes ids.
    edges: Vec<Vec<usize>>,
}
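The following sketch shows one way such a graph might be used. The methods new, add_node, add_edge, and label are hypothetical helpers added for illustration; they are not part of petgraph or of the representation above.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Color {
    Red,
    Blue,
}

struct Graph {
    /// Maps from node id to node labels, which here are colors.
    nodes_labels: Vec<Color>,
    /// Maps from node id to adjacent node ids.
    edges: Vec<Vec<usize>>,
}

impl Graph {
    fn new() -> Graph {
        Graph { nodes_labels: Vec::new(), edges: Vec::new() }
    }

    /// Adds a node and returns its synthetic identity (its index).
    fn add_node(&mut self, label: Color) -> usize {
        self.nodes_labels.push(label);
        self.edges.push(Vec::new());
        self.nodes_labels.len() - 1
    }

    /// Adds a directed edge between two existing nodes.
    fn add_edge(&mut self, from: usize, to: usize) {
        self.edges[from].push(to);
    }

    /// Resolves an identity to its label: the extra lookup
    /// that a pointer-based representation would not need.
    fn label(&self, id: usize) -> Color {
        self.nodes_labels[id]
    }
}

fn main() {
    let mut g = Graph::new();
    let a = g.add_node(Color::Red);
    let b = g.add_node(Color::Red);
    let c = g.add_node(Color::Blue);
    g.add_edge(a, c);

    // a and b share a label but remain distinct nodes.
    println!("{:?} {:?} {}", g.label(a), g.label(b), a != b);
    println!("{:?}", g.edges[a]);
}
```

Because the identity is an index rather than an address, the graph can be moved or cloned without invalidating node identities.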
If performance requirements make the use of synthetic identifiers unacceptable, then it may be necessary to prevent the value from being moved. The Pin and PhantomPinned structs can be used to achieve an effect similar to deleting the move constructor in C++.
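As a rough sketch of that technique, the example below pins heap-allocated nodes and compares them by address. The Node type and the same_node function are hypothetical, and a real design would need more care around how references to the pinned values are stored.

```rust
use std::marker::PhantomPinned;
use std::pin::Pin;

struct Node {
    label: String,
    // Opts out of Unpin, so a pinned Node cannot be moved by safe code.
    _pin: PhantomPinned,
}

impl Node {
    /// Allocates the node on the heap and pins it there.
    fn new(label: &str) -> Pin<Box<Node>> {
        Box::pin(Node {
            label: label.to_string(),
            _pin: PhantomPinned,
        })
    }
}

/// Compares identity (the address), not the label.
fn same_node(a: &Node, b: &Node) -> bool {
    std::ptr::eq(a, b)
}

fn main() {
    let a = Node::new("x");
    let b = Node::new("x");

    // Equal labels, but different identities.
    println!("{}", a.label == b.label);
    println!("{}", same_node(&a, &a));
    println!("{}", same_node(&a, &b));
}
```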
Out parameters
There are several idioms in C++ that involve the use of out parameters: passing pointers or references to functions for the function to mutate to provide its results.
The chapters in this section address idiomatic ways to achieve the same goals that out parameters are used for in C++. Many of the Rust idioms resemble the recommended alternatives to out parameters when programming against newer C++ standards.
Multiple return values
One idiom for returning multiple values from a function or method in C++ is to pass in references to which the values can be assigned.
There are several reasons why this idiom might be used:
- compatibility with versions of C++ earlier than C++11,
- working in a codebase that uses a C style of C++, or
- performance concerns.
The idiomatic translation of this program into Rust makes use of either tuples or a named structure for the return type.
void get_point(int &x, int &y) {
x = 5;
y = 6;
}
int main() {
int x, y;
get_point(x, y);
// ...
}
fn get_point() -> (i32, i32) {
    (5, 6)
}

fn main() {
    let (x, y) = get_point();
    // ...
}
Rust has a dedicated tuple syntax and supports pattern matching with let bindings in part to support use cases like this one.
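The named-structure alternative mentioned above could look like the following sketch, where Point is a hypothetical type introduced for this example. Named fields help when a tuple of same-typed values would be easy to mix up.

```rust
struct Point {
    x: i32,
    y: i32,
}

fn get_point() -> Point {
    Point { x: 5, y: 6 }
}

fn main() {
    // Struct patterns work with let bindings just like tuple patterns.
    let Point { x, y } = get_point();
    println!("{} {}", x, y);
}
```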
Problems with the direct transliteration
It is possible to transliterate the original example that uses out parameters to Rust, but Rust requires the initialization of the variables before they can be passed to a function. The resulting program is not idiomatic Rust.
// NOT IDIOMATIC RUST
fn get_point(x: &mut i32, y: &mut i32) {
    *x = 5;
    *y = 6;
}

fn main() {
    let mut x = 0; // initialized to arbitrary values
    let mut y = 0;
    get_point(&mut x, &mut y);
    // ...
}
This approach requires assigning arbitrary initial values to the variables and making the variables mutable, both of which make it harder for the compiler to help with avoiding programming errors.
Additionally, the Rust compiler is tuned for optimizing the idiomatic version of the program, and produces a significantly faster binary for that version.
In situations where the performance of memory allocation is a concern (such as when it is necessary to reuse entire buffers in memory), the trade-offs may be different. That situation is discussed in the chapter on pre-allocated buffers.
Similarities with idiomatic C++ since C++11
In C++11 and later, std::pair and std::tuple are available for returning multiple values instead of assigning to reference parameters.
#include <tuple>
#include <utility>
std::pair<int, int> get_point() {
return std::make_pair(5, 6);
}
int main() {
int x, y;
std::tie(x, y) = get_point();
// ...
}
This more closely aligns with the normal Rust idiom for returning multiple values.
Optional return values
One idiom in C++ for optionally producing a result from a method or function is to use a reference parameter along with a boolean or integer return value to indicate whether the result was produced. This might be done for the same reasons as for using out parameters for multiple return values:
- compatibility with versions of C++ earlier than C++11,
- working in a codebase that uses a C style of C++, or
- performance concerns.
The idiomatic Rust approach for optionally returning a value is to return a value of type Option.
#include <iostream>
bool safe_divide(unsigned int dividend,
unsigned int divisor,
unsigned int &quotient) {
if (divisor != 0) {
quotient = dividend / divisor;
return true;
} else {
return false;
}
}
void go(unsigned int dividend,
unsigned int divisor) {
unsigned int quotient;
if (safe_divide(dividend, divisor, quotient)) {
std::cout << quotient << std::endl;
} else {
std::cout << "Division failed!" << std::endl;
}
}
int main() {
go(10, 2);
go(10, 0);
}
fn safe_divide(
    dividend: u32,
    divisor: u32,
) -> Option<u32> {
    if divisor != 0 {
        Some(dividend / divisor)
    } else {
        None
    }
}

fn go(dividend: u32, divisor: u32) {
    match safe_divide(dividend, divisor) {
        Some(quotient) => {
            println!("{}", quotient);
        }
        None => {
            println!("Division failed!");
        }
    }
}

fn main() {
    go(10, 2);
    go(10, 0);
}
When there is useful information to provide in the failing case, the Result type can be used instead. The chapter on error handling describes the use of Result.
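As a brief sketch of the Result variant, the function below reports why the division failed. The DivideError type is a hypothetical error type defined only for this example.

```rust
// Hypothetical error type describing why the division failed.
#[derive(Debug, PartialEq)]
enum DivideError {
    DivisionByZero,
}

fn safe_divide(dividend: u32, divisor: u32) -> Result<u32, DivideError> {
    if divisor != 0 {
        Ok(dividend / divisor)
    } else {
        Err(DivideError::DivisionByZero)
    }
}

fn main() {
    for divisor in [2, 0] {
        match safe_divide(10, divisor) {
            Ok(quotient) => println!("{}", quotient),
            Err(e) => println!("Division failed: {:?}", e),
        }
    }
}
```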
Returning a pointer
When the value being returned is a pointer, another common idiom in C++ is to use nullptr to represent the optional case. In the Rust translation of that idiom, Option is also used, along with a reference type, such as & or Box.
See the chapter on using nullptr as a sentinel value for more details.
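A minimal sketch of the reference case follows, where None plays the role that nullptr would play in C++. The Person type and the find_by_name function are hypothetical names used for illustration.

```rust
struct Person {
    name: String,
    age: u32,
}

// Returns a reference to the matching element, or None where a
// C++ function would return nullptr.
fn find_by_name<'a>(people: &'a [Person], name: &str) -> Option<&'a Person> {
    people.iter().find(|p| p.name == name)
}

fn main() {
    let people = vec![
        Person { name: "Ada".to_string(), age: 36 },
        Person { name: "Grace".to_string(), age: 45 },
    ];

    match find_by_name(&people, "Grace") {
        Some(p) => println!("{} is {}", p.name, p.age),
        None => println!("not found"),
    }
}
```

The compiler requires the None case to be handled before the reference can be used, so there is no equivalent of dereferencing a null pointer.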
Problems with the direct transliteration
It is possible to transliterate the original example that uses out parameters to Rust, but the resulting code is not idiomatic.
// NOT IDIOMATIC RUST
fn safe_divide(dividend: u32, divisor: u32, quotient: &mut u32) -> bool {
    if divisor != 0 {
        *quotient = dividend / divisor;
        true
    } else {
        false
    }
}

fn go(dividend: u32, divisor: u32) {
    let mut quotient: u32 = 0; // initialized to arbitrary value
    if safe_divide(dividend, divisor, &mut quotient) {
        println!("{}", quotient);
    } else {
        println!("Division failed!");
    }
}

fn main() {
    go(10, 2);
    go(10, 0);
}
This shares the same problems as with using out-parameters for multiple return values.
Similarities with C++ since C++17
C++17 and later offer std::optional, which can be used to express optional return values in a way similar to the idiomatic Rust example.
#include <iostream>
#include <optional>
std::optional<unsigned int> safe_divide(unsigned int dividend,
unsigned int divisor) {
if (divisor != 0) {
return std::optional<unsigned int>(dividend / divisor);
} else {
return std::nullopt;
}
}
void go(unsigned int dividend, unsigned int divisor) {
if (auto quotient = safe_divide(dividend, divisor)) {
std::cout << *quotient << std::endl;
} else {
std::cout << "Division failed!" << std::endl;
}
}
int main() {
go(10, 2);
go(10, 0);
}
Helpful Option utilities
Rust provides syntactic sugar that simplifies the use of functions that return Option. If a failure should be propagated to the caller, then use the ? operator:
fn safe_divide(dividend: u32, divisor: u32) -> Option<u32> {
    if divisor != 0 {
        Some(dividend / divisor)
    } else {
        None
    }
}

fn go(dividend: u32, divisor: u32) -> Option<()> {
    let quotient = safe_divide(dividend, divisor)?;
    println!("{}", quotient);
    Some(())
}
If None should not be propagated, it is sometimes clearer to use let-else syntax:
fn safe_divide(dividend: u32, divisor: u32) -> Option<u32> {
    if divisor != 0 {
        Some(dividend / divisor)
    } else {
        None
    }
}

fn go(dividend: u32, divisor: u32) {
    let Some(quotient) = safe_divide(dividend, divisor) else {
        println!("Division failed!");
        return;
    };
    println!("{}", quotient);
}

fn main() {
    go(10, 2);
    go(10, 0);
}
If there is a default value that should be used in the None case, the Option::unwrap_or, Option::unwrap_or_else, or Option::unwrap_or_default methods can be used. The Option::unwrap method supplies no default and panics in the None case instead:
fn safe_divide(dividend: u32, divisor: u32) -> Option<u32> {
    if divisor != 0 {
        Some(dividend / divisor)
    } else {
        None
    }
}

fn expensive_computation() -> u32 {
    // ...
    0
}

fn go(dividend: u32, divisor: u32) {
    // If None, returns the given value.
    let result = safe_divide(dividend, divisor).unwrap_or(0);

    // If None, returns the result of calling the given function.
    let result2 = safe_divide(dividend, divisor).unwrap_or_else(expensive_computation);

    // If None, returns Default::default(), which is 0 for u32.
    let result3 = safe_divide(dividend, divisor).unwrap_or_default();

    // If None, panics. Prefer the other methods!
    // let result4 = safe_divide(dividend, divisor).unwrap();
}

fn main() {
    go(10, 2);
    go(10, 0);
}
In performance-sensitive code where you have manually checked that the result is guaranteed to be Some, Option::unwrap_unchecked can be used, but it is an unsafe method.
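For illustration only, a use of Option::unwrap_unchecked might look like the sketch below. The halve function is hypothetical; the point is the SAFETY comment recording why the result can never be None.

```rust
fn safe_divide(dividend: u32, divisor: u32) -> Option<u32> {
    if divisor != 0 {
        Some(dividend / divisor)
    } else {
        None
    }
}

fn halve(x: u32) -> u32 {
    // SAFETY: the divisor is the constant 2, which is never zero,
    // so safe_divide always returns Some here.
    unsafe { safe_divide(x, 2).unwrap_unchecked() }
}

fn main() {
    println!("{}", halve(10));
}
```

If the stated invariant were ever violated, the behavior would be undefined rather than a panic, so this is only worth considering when profiling shows the check matters.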
There are additional utility methods that enable concise handling of Option values, which this book covers in the chapter on exceptions and error handling.
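As a short preview, and assuming the same safe_divide function as above, a few of the standard combinators on Option look like this:

```rust
fn safe_divide(dividend: u32, divisor: u32) -> Option<u32> {
    if divisor != 0 {
        Some(dividend / divisor)
    } else {
        None
    }
}

fn main() {
    // map transforms the value inside Some and leaves None untouched.
    let doubled = safe_divide(10, 2).map(|q| q * 2);

    // and_then chains another operation that may itself fail.
    let chained = safe_divide(100, 5).and_then(|q| safe_divide(q, 0));

    // ok_or converts an Option into a Result with the given error.
    let as_result = safe_divide(10, 0).ok_or("division by zero");

    println!("{:?} {:?} {:?}", doubled, chained, as_result);
}
```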
An alternative approach
An alternative approach in Rust to returning optional values is to require that the caller of a function prove that the value with which they call a function will not result in the failing case.
For the above safe division example, this involves the caller guaranteeing that the provided divisor is non-zero. In the following example this is done with a dynamic check. In other contexts the evidence needed may be available statically, provided from callers further upstream, or used more than once. In those cases, this approach reduces both runtime cost and code complexity.
use std::convert::TryFrom;
use std::num::NonZero;

fn safe_divide(dividend: u32, divisor: NonZero<u32>) -> u32 {
    // This is more efficient because the check for a zero divisor is skipped.
    dividend / divisor
}

fn go(dividend: u32, divisor: u32) {
    let Ok(safe_divisor) = NonZero::try_from(divisor) else {
        println!("Can't divide!");
        return;
    };
    let quotient = safe_divide(dividend, safe_divisor);
    println!("{}", quotient);
}

fn main() {
    go(10, 2);
    go(10, 0);
}
Pre-allocated buffers
There are situations where large quantities of data need to be returned from a function that will be called repeatedly, so that incurring the copies involved in returning by value or repeated heap allocations would be cost prohibitive. Some of these situations include:
- performing file or network IO,
- communicating with graphics hardware,
- communicating with hardware on embedded systems, or
- implementing cryptography algorithms.
In these situations, C++ programs tend to pre-allocate buffers that are reused for all calls. This also usually enables allocating the buffer on the stack, rather than having to use dynamic storage.
The following example pre-allocates a buffer and reads a large file into it within a loop.
#include <fstream>
int main() {
std::ifstream file("/path/to/file");
if (!file.is_open()) {
return -1;
}
char buf[1024];
while (file.good()) {
file.read(buf, sizeof buf);
std::streamsize count = file.gcount();
// use data in buf
}
return 0;
}
use std::fs::File;
use std::io::{BufReader, Read};

fn main() -> Result<(), std::io::Error> {
    let mut f = BufReader::new(File::open(
        "/path/to/file",
    )?);

    let mut buf = [0u8; 1024];

    loop {
        let count = f.read(&mut buf)?;
        if count == 0 {
            break;
        }
        // use data in buf
    }

    Ok(())
}
The major difference between the C++ program and the Rust program is that in the Rust program the buffer must be initialized before it can be used. In most cases, this one-time initialization cost is not significant. When it is, unsafe Rust is required to avoid the initialization.
The technique for avoiding initialization makes use of std::mem::MaybeUninit. Examples of safe usage of MaybeUninit are given in the API documentation for the type.
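The sketch below shows the general shape of the technique under simplifying assumptions. The produce function is a hypothetical stand-in for a data source that writes into an uninitialized buffer and reports how many elements it initialized; first_bytes is likewise hypothetical.

```rust
use std::mem::MaybeUninit;

// Hypothetical producer: initializes up to four bytes of the buffer
// and returns how many elements it wrote.
fn produce(buf: &mut [MaybeUninit<u8>]) -> usize {
    let n = buf.len().min(4);
    for (i, slot) in buf[..n].iter_mut().enumerate() {
        slot.write(i as u8 + 1);
    }
    n
}

fn first_bytes() -> Vec<u8> {
    // No initialization cost is paid for the 1024 bytes.
    let mut buf = [MaybeUninit::<u8>::uninit(); 1024];
    let n = produce(&mut buf);

    // SAFETY: produce initialized the first n elements, and
    // MaybeUninit<u8> has the same layout as u8.
    let init: &[u8] =
        unsafe { std::slice::from_raw_parts(buf.as_ptr().cast::<u8>(), n) };
    init.to_vec()
}

fn main() {
    println!("{:?}", first_bytes());
}
```

Only the prefix reported as initialized is ever read; reading any element beyond it would be undefined behavior.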
The IO API in stable Rust does not include support for MaybeUninit. Instead, there is a new safe API being developed that will enable avoiding initialization without requiring unsafe Rust in code that uses the API.
If the callee might need to grow the provided buffer and dynamic allocation is allowed, then a &mut Vec<T> can be used instead of &mut [T]. This is similar to providing a std::vector<T>& in C++. To avoid unnecessary reallocation, the vector can be created using Vec::<T>::with_capacity(n).
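A minimal sketch of the growable variant follows. The fill_squares function is a hypothetical callee that replaces the contents of a caller-provided vector on each call.

```rust
// Hypothetical callee: replaces the contents of `out` with n values,
// growing the vector only if its capacity is insufficient.
fn fill_squares(n: u32, out: &mut Vec<u32>) {
    out.clear(); // drops the old contents but keeps the allocation
    out.extend((0..n).map(|i| i * i));
}

fn main() {
    // Allocate once, up front, and reuse across calls.
    let mut buf = Vec::with_capacity(16);
    for n in [3, 5] {
        fill_squares(n, &mut buf);
        println!("{:?}", buf);
    }
}
```

Calling clear rather than creating a new vector is what allows the allocation to be reused between calls.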
A note on reading files
While the examples here use IO to demonstrate re-using pre-allocated buffers, there are higher-level interfaces available for reading from Files, both from the Read and BufRead traits, and from convenience functions in std::io and in std::fs.
The techniques described here are useful, however, in other situations where a reusable buffer is required, such as when interacting with hardware APIs, when using existing C or C++ libraries, or when implementing algorithms that produce large amounts of data in chunks, such as cryptography algorithms.
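For comparison, the higher-level reading interfaces mentioned above can be sketched as follows. To keep the example self-contained, an in-memory std::io::Cursor stands in for a File; that substitution and the helper names read_all and count_lines are assumptions made for this illustration.

```rust
use std::io::{BufRead, Cursor, Read};

// Reads everything from the reader into a String using the Read trait.
fn read_all(mut reader: impl Read) -> std::io::Result<String> {
    let mut text = String::new();
    reader.read_to_string(&mut text)?;
    Ok(text)
}

// Counts lines using the iterator provided by the BufRead trait.
fn count_lines(reader: impl BufRead) -> std::io::Result<usize> {
    let mut count = 0;
    for line in reader.lines() {
        line?; // propagate any IO error
        count += 1;
    }
    Ok(count)
}

fn main() -> std::io::Result<()> {
    let data = "a\nb\nc\n";
    println!("{}", read_all(Cursor::new(data))?.len());
    println!("{}", count_lines(Cursor::new(data))?);
    Ok(())
}
```

With a real file, std::fs::read_to_string performs the equivalent of read_all in a single call.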
Upcoming changes and BorrowedBuf
The Rust community is refining approaches to working with uninitialized buffers. On the nightly branch of Rust, one can use BorrowedBuf to achieve the same results as when using slices of MaybeUninit, but without having to write any unsafe code. The IO APIs for avoiding unnecessary initialization use BorrowedBuf instead of slices of MaybeUninit.
Curiously recurring template pattern (CRTP)
The C++ curiously recurring template pattern is used to make the concrete type of the derived class available in the definition of methods defined in the base class.
Sharing implementations with static polymorphism
The basic use of the CRTP is for reducing redundancy in implementations that make use of static polymorphism. In this use case, the this pointer is cast to the type provided by the template parameter so that methods from the derived class can be called. This enables methods implemented in the base class to call methods in the derived class without having to declare them virtual, avoiding the cost of dynamic dispatch.
In the following example, Triangle and Square have a common implementation of twiceArea without the need for dynamic dispatch. This use case is addressed in Rust using default trait methods.
#include <iostream>
template <typename T>
struct Shape {
// This implementation is shared and can call
// the area method from derived classes without
// declaring it virtual.
double twiceArea() {
return 2.0 * static_cast<T *>(this)->area();
}
};
struct Triangle : public Shape<Triangle> {
double base;
double height;
Triangle(double base, double height)
: base(base), height(height) {}
double area() {
return 0.5 * base * height;
}
};
struct Square : public Shape<Square> {
double side;
Square(double side) : side(side) {}
double area() {
return side * side;
}
};
int main() {
Triangle triangle{2.0, 1.0};
Square square{2.0};
std::cout << triangle.twiceArea() << std::endl;
std::cout << square.twiceArea() << std::endl;
}
trait Shape {
    fn area(&self) -> f64;

    fn twice_area(&self) -> f64 {
        2.0 * self.area()
    }
}

struct Triangle {
    base: f64,
    height: f64,
}

impl Shape for Triangle {
    fn area(&self) -> f64 {
        0.5 * self.base * self.height
    }
}

struct Square {
    side: f64,
}

impl Shape for Square {
    fn area(&self) -> f64 {
        self.side * self.side
    }
}

fn main() {
    let triangle = Triangle {
        base: 2.0,
        height: 1.0,
    };
    let square = Square { side: 2.0 };

    println!("{}", triangle.twice_area());
    println!("{}", square.twice_area());
}
The reason why nothing additional needs to be done for the default method to invoke area statically in Rust is that calls to methods on self are always resolved statically in Rust. This is possible because Rust does not have inheritance between concrete types. Despite being defined in the trait, the default method is actually implemented as part of the implementing struct.
Method chaining
Another common use for the CRTP is for implementing method chaining when an implementation of a method to be chained is provided by a base class.
In C++ the template parameter is used to ensure that the type returned from the shared function is that of the derived class, so that further methods defined in the derived class can be called on it. The template parameter is also used to call a method on the derived type without declaring the method as virtual.
In Rust the template parameter is not required because the Self type is available in traits to refer to the type of the implementing struct.
#include <iostream>
#include <span>
#include <string>
#include <vector>
// D is the type of the derived class
template <typename D>
struct Combinable {
D combineWith(D &d);
// concat is implemented in the base class, but
// operates on values of the derived class.
D concat(std::span<D> vec) {
D acc(*static_cast<D *>(this));
for (D &v : vec) {
acc = acc.combineWith(v);
}
return acc;
}
};
struct Sum : Combinable<Sum> {
int sum;
Sum(int sum) : sum(sum) {}
Sum combineWith(Sum s) {
return Sum(sum + s.sum);
}
// Sum includes an additional method that can be
// chained.
Sum mult(int n) {
return Sum(sum * n);
}
};
int main() {
Sum s(0);
std::vector<Sum> v{1, 2, 3, 4};
Sum x = s.concat(v)
// Even though concat is part of the
// base class, it returns a value of
// the implementing class, making it
// possible to chain methods
// specific to that class.
.mult(2)
.combineWith(5);
std::cout << x.sum << std::endl;
}
// No generic type is required: Self already
// refers to the implementing type.
trait Combinable {
    fn combine_with(&self, other: &Self) -> Self;

    // concat has a default implementation in
    // terms of Self.
    fn concat(&self, others: &[Self]) -> Self
    where
        Self: Clone,
    {
        let mut acc = self.clone();
        for v in others {
            acc = acc.combine_with(v);
        }
        acc
    }
}

#[derive(Clone)]
struct Sum(i32);

impl Sum {
    // Sum includes an additional method that can be
    // chained.
    fn mult(&self, n: i32) -> Self {
        Self(self.0 * n)
    }
}

impl Combinable for Sum {
    fn combine_with(&self, other: &Self) -> Self {
        Self(self.0 + other.0)
    }
}

fn main() {
    let s = Sum(0);
    let v = vec![Sum(1), Sum(2), Sum(3), Sum(4)];
    let x = s
        .concat(&v)
        // Even though concat is part of the
        // trait, it returns a value of the
        // implementing type, making it possible
        // to chain methods specific to that type.
        .mult(2)
        .combine_with(&Sum(5));

    println!("{}", x.0)
}
Again, the reason why Self can refer to the implementing type is that Rust does not have inheritance between concrete types. This contrasts with C++, where a value may be used at any number of concrete types, and so it would not be clear which type something like Self should refer to.
Libraries
C++ programs tend to use libraries that either come with operating system distributions or are vendored.
Rust programs tend to rely on a central registry of Rust libraries ("crates") called crates.io (along with a central documentation repository created from the in-code documentation of those crates called docs.rs). Dependencies on crates are managed using the Cargo package manager.
Lib.rs is a good resource for finding popular crates organized by category.
Some specific alternatives
C++ library | Rust alternative |
---|---|
STL UTF-16 and UTF-32 strings | widestring |
STL random | rand |
STL regex | regex |
Boost.Test | cargo test |
pybind11 | PyO3 |
OpenSSL | rustls |
If there is a C++ library that you use where you cannot find a Rust alternative, please leave feedback using the link below, letting us know the name and purpose of the library.
Supply chain management
In situations where managing the library supply chain is important, Cargo can be used either with custom self-managed or organization-managed registries or with vendored versions of dependencies fetched from crates.io.
Both approaches provide mechanisms for reviewing dependencies as part of supply chain security.
Solutions for supply chain security that do not involve vendoring or custom registries are in progress.
Attribution notices
This book makes use of the Standard C++ Foundation logo under their posted terms of use.
This book makes use of the Rust logo, including a modified version of the logo, under the Creative Commons CC-BY license, as posted in the rust-artwork repository and under the posted terms of use for the trademark.
Click here to leave us feedback about this page.