Object identity
In C++ the pointer to an object is sometimes used to represent its identity in terms of the logic of a program.
In some cases, this is a standard optimization, such as when implementing the copy assignment operator.
In other cases the pointer value is used as a logical identity to distinguish between specific instances of an object that otherwise have the same properties. For example, representing a labeled graph where there may be distinct nodes that have the same label.
In Rust, some of these cases are not applicable, and others cases are typically handled by instead by implementing a synthetic notion of identity for the values.
Overloading copy assignment and equality comparison operators
For example, when implementing the copy-assignment operator, one might short-circuit when the copied object and the assignee are the same. Note that in this use the pointer values are not stored.
This kind of optimization is unnecessary when implementing Rust's equivalent to
the copy assignment
operator
Clone::clone_from
. The type of Clone::clone_from
prevents the same object
from being passed as both arguments, because one of the arguments is a mutable
reference, which is exclusive, and so prevents the other reference argument from
referring to the same object.
struct Person
{
std::string name;
// many other expensive-to-copy fields
Person& operator=(const Person& other) {
// compare object identity first
if (this != &other) {
this.name = other.name;
// copy the other expensive-to-copy fields
}
return *this;
}
};
#![allow(unused)] fn main() { struct Person { name: String, } impl Clone for Person { fn clone(&self) -> Self { Self { name: self.name.clone() } } fn clone_from(&mut self, source: &Self) { // self and source cannot be the same here, // because that would mean there are a // mutable and an immutable reference to // the same memory location. Therefore, a // check for assignment to self is not // needed, even for the purpose of // optimization. self.name.clone_from(&source.name); } } }
In cases in C++ where most comparisons are between an object and itself (e.g., the object's primary use is to be stored in a hash set), and comparison of unequal objects is expensive, comparing object identity might be used as optimization for the equality comparison operator overload.
For supporting similar operations in Rust,
std::ptr::eq
can be used.
struct Person
{
std::string name;
// many other expensive-to-compare fields
};
bool operator==(const Person& lhs, const Person& rhs) {
// compare object identity first
if (&lhs == &rhs) {
return true;
}
// compare the other expensive-to-compare fields
return true;
}
#![allow(unused)] fn main() { struct Person { name: String, // many other expensive-to-compare fields } impl PartialEq for Person { fn eq(&self, other: &Self) -> bool { if std::ptr::eq(self, other) { return true; } // compare other expensive-to-compare fields true } } impl Eq for Person {} }
Distinguishing between values in a relational structure
The other use is when relationships between values are represented using a data structure external to the values, such as when representing a labeled graph in which multiple nodes might share the same label, but have edges between different sets of other nodes. This differs from the earlier case because the pointer value is preserved.
One real-world example of this is in the LLVM codebase, where occurrences of
declarations, statements, and expressions in the AST are distinguished by object
identity. For example, variable expressions (class DeclRefExpr
) contain the
pointer to the occurrence of the declaration to which the variable
refers.
Similarly, when comparing whether two variable declarations represent
declarations of the same variable, a pointer to some canonical VarDecl
is
used:
VarDecl *VarDecl::getCanonicalDecl();
bool CapturedStmt::capturesVariable(const VarDecl *Var) const {
for (const auto &I : captures()) {
if (!I.capturesVariable() && !I.capturesVariableByCopy())
continue;
if (I.getCapturedVar()->getCanonicalDecl() == Var->getCanonicalDecl())
return true;
}
return false;
}
This kind of use is often discouraged in C++ because of the risk of use-after-free bugs, but might be used in performance sensitive applications where either storing the memory to represent the mapping or the additional indirection to resolve an entity's value from its identity is cost prohibitive.
In Rust it is generally preferred to represent the identity of the objects with synthetic identifiers. This is in part as a technique for modeling self-referential data structures.
As an example, one popular Rust graph library
petgraph uses u32
as its default
node identity type. This incurs the cost of an extra call to dereference the
synthetic identifier to the label of the represented node as well as the extra
memory required to store the mapping from nodes to labels.
A simplified graph representation using the same synthetic identifier technique would look like the following, which represents the node identities by their index in the vectors that represent the labels and the edges.
#![allow(unused)] fn main() { enum Color { Red, Blue } struct Graph { /// Maps from node id to node labels, which here are colors. nodes_labels: Vec<Color>, /// Maps from node id to adjacent nodes ids. edges: Vec<Vec<usize>>, } }
If performance requirements make the use of synthetic identifiers unacceptable,
then it may be necessary to use prevent the value from being moved. The Pin
and PhantomPinned
structs can
be used to achieve an effect similar to deleting the move constructor in C++.