In the previous blog post in this series, we reviewed some functional aspects of domain modeling in Rust. In this post, as promised, we will evaluate some more advanced features we need to consider in our domain modeling due to the Rust memory management system.
The Ownership and Borrowing model
Rust has a strong ownership and borrowing model. Although delving into it is beyond the scope of this post, in a nutshell, the model ensures that every value has exactly one owner and that the owner controls the value’s lifetime. This contrasts with languages like Scala, Java, Go, or Python, where a garbage collector reclaims memory automatically at run time.
The ownership and borrowing rules are enforced at compile time, which means that:
If the program compiles, the program is guaranteed to be memory-safe*.
(* Note: This is true except for Unsafe Rust)
Thus, these rules highlight the importance of memory management in Rust. We need to know that there are two memory regions: the stack and the heap. Without going into much detail, the stack stores data whose size is known and fixed at compile time, such as local variables, function arguments, and return values. In contrast, the heap stores data whose size is unknown at compile time or may change at run time, such as the contents of vectors or strings.
Ownership Model
Every value in Rust has an owner responsible for managing the value’s memory. When the owner goes out of scope, the value is dropped, and its memory is deallocated.
fn main() {
    let str1 = String::from("I'll be placed in the heap");
    let str2 = str1;
    println!("{}", str1); // Compile error
}
In this example, we assign the String str1 to str2. Because the contents of a String live on the heap, the assignment does not copy that data. Instead, str2 takes ownership of the memory that str1 was pointing to. As a result, str1 is no longer valid after the assignment, and using it on the following line is a compile error.
This illustrates the importance of understanding ownership: Rust won’t allow us to use a value that has been moved or dropped (if the program compiles, it is guaranteed to be memory-safe).
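As a minimal sketch of two ways to keep using the original value, we can either clone the String before moving it or lend it out as a reference. The helper names shout and length_of are hypothetical and exist only for this illustration:

```rust
// Takes ownership of its argument: the caller can no longer use the value.
fn shout(s: String) -> String {
    s.to_uppercase()
}

// Only borrows its argument: the caller keeps ownership.
fn length_of(s: &str) -> usize {
    s.len()
}

fn main() {
    let str1 = String::from("heap data");

    // Option 1: clone explicitly, which copies the heap data.
    let str2 = str1.clone();
    let loud = shout(str2); // str2 is moved into `shout` and is gone afterwards

    // Option 2: borrow, which copies nothing.
    let len = length_of(&str1);

    // str1 is still valid because it was never moved.
    println!("{} / {} / {}", str1, loud, len);
}
```

Cloning is explicit on purpose: the cost of copying heap data is visible in the code, whereas a borrow is free.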
Borrowing Model
In addition to ownership, the borrowing model allows us to temporarily loan out references to owned values. A reference is a pointer to a value that lets us access the value without owning it. The borrowing rules ensure that references never outlive the value they point to and that no data races can occur.
fn print_message(msg: &str) {
    println!("{}", msg);
}

fn main() {
    let message = String::from("Hello, I belong to the main function!");
    print_message(&message);
}
In the example above, message is a String; when we call the print_message function with &message as an argument, we pass a reference to the message.
The print_message function takes a string slice (&str) as its parameter, and Rust automatically converts the &String we pass into a &str (deref coercion). The function therefore borrows the String: it has access to its contents but does not own it. Because the reference is shared rather than mutable, the function cannot modify the borrowed String.
When we call println!("{}", msg), we print the contents of the borrowed String without modifying them.
Mutability
The combination of ownership and borrowing allows Rust to control mutability in a fine-grained manner. Regarding mutability in Rust, we could list the following properties:
- Mutability is tied to ownership.
- By default, values are immutable.
- If we want to mutate a value, the context must own or borrow it mutably (adding the mut keyword).
- Borrowing mutably is exclusive, meaning only one mutable reference can exist to a value at any given time. This prevents data races and makes it easy to reason about your code.
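The properties above can be sketched in a few lines. The functions add_tag and count_tags are hypothetical helpers, not part of the domain model from this series:

```rust
// Borrows the vector mutably: the caller keeps ownership, and the function
// gets exclusive access for the duration of the call.
fn add_tag(tags: &mut Vec<String>, tag: &str) {
    tags.push(tag.to_string());
}

// Borrows the vector immutably: read-only access.
fn count_tags(tags: &[String]) -> usize {
    tags.len()
}

fn main() {
    // `mut` is required because the owner intends to mutate the value.
    let mut tags = vec![String::from("rust")];

    add_tag(&mut tags, "ownership"); // the exclusive borrow ends when the call returns
    let n = count_tags(&tags); // a shared borrow is allowed again

    println!("{} tags: {:?}", n, tags);

    // The following would not compile: the shared borrow `first` would still
    // be in use while we try to borrow `tags` mutably.
    // let first = &tags[0];
    // add_tag(&mut tags, "borrowing");
    // println!("{}", first);
}
```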
Smart Pointers
The previous sections show us where smart pointers come into play. They can be a powerful tool for functional domain modeling, allowing us to manage the ownership and sharing of complex data structures. Indeed, smart pointers like Box, Rc, and Arc can help enforce immutability in functional domain modeling, so we ensure that data is not mutated unintentionally, which is a fundamental principle here.
Furthermore, smart pointers help to manage ownership and borrowing by providing additional functionality on top of regular pointers. For example, Box provides heap allocation and automatic deallocation of memory when the Box goes out of scope, making it useful for storing data that needs to outlive the current stack frame. Rc and Arc provide shared ownership, allowing multiple owners of a given piece of memory, which can be helpful in data structures that need to be shared across various program parts. Let’s see some concrete examples of these three types of pointers.
Box Smart Pointer
To start explaining the Box smart pointer, we can refer to the following example:
let candidate_id: CandidateId = CandidateId(2);
Here, the candidate_id value will be allocated on the stack, which is where values are placed by default. If we want to store this value on the heap instead, we could write:
let candidate_id: Box<CandidateId> = Box::new(CandidateId(2));
In this case, Box::new will allocate the CandidateId instance on the heap and return a smart pointer to it. The candidate_id variable is that pointer: the Box itself lives on the stack and points to the heap allocation. So, the CandidateId instance is stored on the heap, and its ownership is managed by the Box.
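The following sketch makes the stack/heap split visible. It assumes that CandidateId wraps a u32 (the inner type is an assumption made for this example), and boxed_id is a hypothetical helper:

```rust
use std::mem::size_of;

// Assumption: the id wraps a u32; the real inner type may differ.
#[derive(Debug, PartialEq)]
struct CandidateId(u32);

// Moves the value to the heap and returns the owning pointer.
fn boxed_id(raw: u32) -> Box<CandidateId> {
    Box::new(CandidateId(raw))
}

fn main() {
    let on_stack = CandidateId(2);
    let on_heap = boxed_id(2);

    // Dereferencing the Box gives access to the value stored on the heap.
    assert_eq!(*on_heap, on_stack);

    // The Box itself is just a pointer-sized value living on the stack.
    assert_eq!(size_of::<Box<CandidateId>>(), size_of::<usize>());

    println!("{:?} and {:?}", on_stack, on_heap);
} // `on_heap` goes out of scope here, and its heap allocation is freed.
```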
Now, let’s introduce a new requirement for managing candidates in the hiring pipeline. The HiringPipeline model will need to contain a list of candidates, so the first implementation we can think of would be:
struct HiringPipeline {
    candidates: Vec<Candidate>,
}
The candidates vector will be stored as a field inside the HiringPipeline struct. When an instance of HiringPipeline is created as a local variable, the struct lives on the stack, including the vector’s header (a pointer, a length, and a capacity). The vector’s elements are stored in a separate buffer that the vector allocates on the heap.
Regarding ownership, the HiringPipeline struct owns the Candidate objects directly:
- When we add a candidate to the vector, its data is stored directly in the vector’s memory allocation.
- When we remove a candidate from the vector, its data is moved out of the vector’s buffer, and it is dropped unless the caller keeps the returned value.
It is the right choice if the Candidate objects are always accessed together and don’t need to be individually transferred.
However, this implementation is less convenient if we frequently need to move the ownership of individual Candidate objects to another struct or function, because each move copies the whole struct by value. In this case, our model could look like this:
struct HiringPipeline {
    candidates: Vec<Box<Candidate>>,
}
In this second implementation, the candidates field is a vector of Box objects: the HiringPipeline struct owns a vector of pointers to Candidate objects stored on the heap.
- When we add a candidate to the vector, a new Box is created that stores the Candidate object on the heap, and this Box (a pointer) is added to the vector.
- When we remove a candidate from the vector, the Box is moved out of the vector and handed to the caller. A Box is the sole owner of its contents, so the Candidate object stays alive for as long as that Box does and is automatically deallocated as soon as the Box is dropped.
Let’s see a concrete use case for this:
let alice = Candidate {
    id: CandidateId(1),
    name: CandidateName(String::from("Alice")),
    email: CandidateEmail::new(String::from("alice@example.com")).unwrap(),
    experience_level: ExperienceLevel::Senior,
    interview_status: Some(InterviewStatus::Scheduled),
    application_status: ApplicationStatus::Submitted,
};
let bob = Candidate {
    id: CandidateId(2),
    name: CandidateName(String::from("Bob")),
    email: CandidateEmail::new(String::from("bob@example.com")).unwrap(),
    experience_level: ExperienceLevel::MidLevel,
    interview_status: None,
    application_status: ApplicationStatus::Rejected,
};
let pipeline = HiringPipeline {
    candidates: vec![
        Box::new(alice),
        Box::new(bob),
    ],
};
let senior_candidates: Vec<&Box<Candidate>> = pipeline
    .candidates
    .iter()
    .filter(|c| c.experience_level == ExperienceLevel::Senior)
    .collect();
println!("Senior candidates: {:?}", senior_candidates);
// Senior candidates: [Candidate { id: CandidateId(1), name: CandidateName("Alice"), email: CandidateEmail("alice@example.com"), experience_level: Senior, interview_status: Some(Scheduled), application_status: Submitted }]
The resulting senior_candidates vector contains references to the boxes owned by the pipeline, not copies of the Candidate objects, so filtering allocates nothing beyond the result vector itself. Note that these references borrow from pipeline.candidates: the compiler requires the vector to stay alive and unmodified for as long as they are in use, and passing them to other functions or storing them in other data structures is subject to that lifetime. The same would be true if we had used Vec<Candidate> instead. What Vec<Box<Candidate>> changes is the cost of moving candidates around, which matters for objects that are expensive to copy or move: when the vector grows or is reordered, or when a candidate is handed over to another owner, only a pointer is moved.
In summary, using Box, we keep each Candidate object in its own heap allocation with a single owner. We can pass around references to the candidates for read access, and when ownership has to change hands, we move the Box (a pointer) rather than the object itself, which is what we want in this case.
Rc Smart Pointer
It’s important to remember that functional programming favors immutable data and avoids shared mutable state. With this in mind, Rc lets us share immutable data, so programs can maintain a functional style while still sharing data efficiently: cloning an Rc only increments a reference count instead of copying the underlying data, which could be expensive in time and memory.
Rc stands for "reference counted," and it allows a value to have multiple "owners" without having to transfer ownership. By comparison, the Box smart pointer we have seen has a single owner and is used when we need to transfer ownership of a value from one part of our program to another.
Therefore, each part of the program that needs to use the data can hold its own Rc handle to the same instance, so the data can be safely accessed as long as at least one handle is still alive. When the number of handles drops to zero, the value is automatically dropped, and its memory is deallocated.
Here is an example where we could use the Rc smart pointer instead of the Box that we have used before:
use std::rc::Rc;

struct HiringPipeline {
    candidates: Vec<Rc<Candidate>>,
}

impl HiringPipeline {
    fn new() -> HiringPipeline {
        HiringPipeline { candidates: vec![] }
    }

    fn add_candidate(&mut self, candidate: Candidate) {
        self.candidates.push(Rc::new(candidate));
    }
}
In this implementation, the HiringPipeline would own the Rc objects, not the original Candidate objects. When a new candidate is added to the HiringPipeline, it is wrapped in an Rc using the Rc::new method and then pushed into the vector. The Rc keeps track of the number of references to the underlying Candidate object, and when the last reference is dropped, the Candidate object is deallocated.
Since Rc does not provide exclusive ownership of the underlying object, we cannot directly modify the Candidate object through an Rc; it only gives us shared, read-only access.
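A short sketch of the reference counting in action follows. The Candidate here is reduced to a name, and the shortlist method is a hypothetical addition used only to show a second owner:

```rust
use std::rc::Rc;

// Simplified stand-in for the Candidate type used in this series.
#[derive(Debug)]
struct Candidate {
    name: String,
}

struct HiringPipeline {
    candidates: Vec<Rc<Candidate>>,
}

impl HiringPipeline {
    // Returns new Rc handles to the matching candidates; no Candidate is copied.
    fn shortlist(&self, prefix: &str) -> Vec<Rc<Candidate>> {
        self.candidates
            .iter()
            .filter(|c| c.name.starts_with(prefix))
            .map(Rc::clone)
            .collect()
    }
}

fn main() {
    let pipeline = HiringPipeline {
        candidates: vec![
            Rc::new(Candidate { name: String::from("Alice") }),
            Rc::new(Candidate { name: String::from("Bob") }),
        ],
    };

    let shortlist = pipeline.shortlist("A");
    // Two owners now: the pipeline and the shortlist.
    println!("owners of {}: {}", shortlist[0].name, Rc::strong_count(&shortlist[0]));

    // Dropping the pipeline does not free Alice: the shortlist still owns her.
    drop(pipeline);
    println!("owners of {}: {}", shortlist[0].name, Rc::strong_count(&shortlist[0]));
}
```

Rc::clone only increments the counter, so building the shortlist costs one pointer copy per candidate regardless of how large Candidate is.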
Arc Smart Pointer
In concurrent programs, sharing data between multiple threads is critical. How could we model our domain considering this in Rust?
Well, Arc is an abbreviation for "atomically reference-counted smart pointer," and it provides thread-safe reference counting. As a result, multiple threads can share data ownership without causing data races or memory issues. In addition, the reference count is updated atomically, ensuring it is always accurate, even when accessed by multiple threads simultaneously.
Therefore, Arc is a valuable tool for functional domain modeling when working with concurrent and parallel computations that require shared access to values across multiple threads. By using Arc, the ownership of values can be safely shared between threads, allowing for more efficient use of resources and better performance.
For example, in a concurrent hiring pipeline, multiple threads could process candidates simultaneously. By using Arc, each thread can safely read the shared candidate data without needing to clone the data for each thread, which can be costly in time and memory. Modifying shared data additionally requires a synchronization primitive such as Mutex.
use rayon::prelude::*;
use std::sync::Arc;

#[derive(Debug)]
struct HiringPipeline {
    candidates: Vec<Arc<Candidate>>,
}

impl HiringPipeline {
    fn new() -> HiringPipeline {
        HiringPipeline { candidates: vec![] }
    }

    fn add_candidate(&mut self, candidate: Candidate) {
        self.candidates.push(Arc::new(candidate));
    }

    fn filter_candidates<F>(&self, predicate: F) -> Vec<Arc<Candidate>>
    where
        F: Fn(&Candidate) -> bool + Send + Sync,
    {
        let filtered: Vec<Arc<Candidate>> = self
            .candidates
            .par_iter()
            .filter(|candidate| predicate(candidate.as_ref()))
            .cloned()
            .collect();
        filtered
    }
}
In this illustration, you might have noticed that we are using the rayon crate for filtering candidates. Indeed, rayon makes it easy to convert a sequential computation into a parallel one. Additionally, the filter_candidates function accepts a predicate that must implement the Send and Sync traits so that it can be safely used from multiple threads.
Let’s look at one example using this model from a multi-threaded program:
use std::sync::Mutex;
use std::thread;

fn main() {
    let alice = Candidate {
        id: CandidateId(1),
        name: CandidateName(String::from("Alice")),
        email: CandidateEmail::new(String::from("alice@example.com")).unwrap(),
        experience_level: ExperienceLevel::Senior,
        interview_status: Some(InterviewStatus::Scheduled),
        application_status: ApplicationStatus::Submitted,
    };
    let bob = Candidate {
        id: CandidateId(2),
        name: CandidateName(String::from("Bob")),
        email: CandidateEmail::new(String::from("bob@example.com")).unwrap(),
        experience_level: ExperienceLevel::MidLevel,
        interview_status: None,
        application_status: ApplicationStatus::Rejected,
    };
    let mut pipeline = HiringPipeline::new();
    pipeline.add_candidate(alice);
    pipeline.add_candidate(bob);
    let pipeline_arc = Arc::new(Mutex::new(pipeline));

    let pipeline_seniors = pipeline_arc.clone();
    let handle1 = thread::spawn(move || {
        let pipeline = pipeline_seniors.lock().unwrap();
        let filtered = pipeline
            .filter_candidates(|candidate| candidate.experience_level == ExperienceLevel::Senior);
        println!("Filtered candidates in thread 1: {:?}", filtered);
    });

    let pipeline_mids = pipeline_arc.clone();
    let handle2 = thread::spawn(move || {
        let pipeline = pipeline_mids.lock().unwrap();
        let filtered = pipeline
            .filter_candidates(|candidate| candidate.experience_level == ExperienceLevel::MidLevel);
        println!("Filtered candidates in thread 2: {:?}", filtered);
    });

    handle1.join().unwrap();
    handle2.join().unwrap();
}
This program creates two threads using thread::spawn(). Each thread locks the mutex to get exclusive access to the HiringPipeline struct and filters the candidates based on their experience level using the filter_candidates() method. The filtered results are then printed to the console (the order may vary if you try it on your local machine):
Filtered candidates in thread 1: [Candidate { id: CandidateId(1), name: CandidateName("Alice"), email: CandidateEmail("alice@example.com"), experience_level: Senior, interview_status: Some(Scheduled), application_status: Submitted }]
Filtered candidates in thread 2: [Candidate { id: CandidateId(2), name: CandidateName("Bob"), email: CandidateEmail("bob@example.com"), experience_level: MidLevel, interview_status: None, application_status: Rejected }]
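Since both threads above only read the pipeline, the Mutex is not strictly required for this access pattern. As a variation that relies on the standard library alone (no rayon), the sketch below shares a read-only list through a plain Arc. The simplified Candidate and the count_by_level function are assumptions made for this example:

```rust
use std::sync::Arc;
use std::thread;

#[derive(Debug, Clone, Copy, PartialEq)]
enum ExperienceLevel {
    MidLevel,
    Senior,
}

// Simplified stand-in for the Candidate type used in this series.
#[derive(Debug)]
struct Candidate {
    name: String,
    experience_level: ExperienceLevel,
}

// Spawns one thread per level; every thread gets its own Arc handle to the
// same list, so the candidates themselves are never copied.
fn count_by_level(candidates: Arc<Vec<Candidate>>, levels: &[ExperienceLevel]) -> Vec<usize> {
    let handles: Vec<_> = levels
        .iter()
        .map(|&level| {
            let shared = Arc::clone(&candidates);
            thread::spawn(move || {
                shared
                    .iter()
                    .filter(|c| c.experience_level == level)
                    .count()
            })
        })
        .collect();

    // Joining in spawn order keeps the results aligned with `levels`.
    handles
        .into_iter()
        .map(|handle| handle.join().expect("worker thread panicked"))
        .collect()
}

fn main() {
    let candidates = Arc::new(vec![
        Candidate { name: String::from("Alice"), experience_level: ExperienceLevel::Senior },
        Candidate { name: String::from("Bob"), experience_level: ExperienceLevel::MidLevel },
        Candidate { name: String::from("Carol"), experience_level: ExperienceLevel::Senior },
    ]);

    let counts = count_by_level(
        Arc::clone(&candidates),
        &[ExperienceLevel::Senior, ExperienceLevel::MidLevel],
    );
    println!("seniors: {}, mid-level: {}", counts[0], counts[1]);

    // The main thread still owns a handle, so the data is still available.
    for candidate in candidates.iter() {
        println!("{} is still in the pipeline", candidate.name);
    }
}
```

A lock (Mutex or RwLock) becomes necessary only once some thread needs to mutate the shared data.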
Conclusion
In conclusion, Box, Rc, and Arc are all valuable tools for functional domain modeling in Rust. Several other smart pointers can be helpful too, but we have not covered them in this article.
We have seen that Box is ideal for handling heap-allocated data with single ownership. Rc is helpful when several parts of a single-threaded program need to share ownership of the same data. Arc is beneficial in concurrent programs where ownership of data must be shared across threads. By using these smart pointers, Rust allows us to manage memory safely and efficiently, enabling us to build complex domain models that can be easily shared and used in concurrent environments.
Understanding the differences between these smart pointers and choosing the right one for the job, in combination with ADTs, the Result and Option types, and traits, is vital to building elegant domain models that are safe, efficient, and easy to reason about and maintain.