In the previous blog post in this series, we reviewed some functional aspects of domain modeling in Rust. In this post, as promised, we will look at some more advanced aspects of domain modeling that Rust's memory management system requires us to consider.

The Ownership and Borrowing model

Rust has a strong ownership and borrowing model. Although delving into it is out of the scope of this blog post, in a nutshell, this model ensures that every value has exactly one owner at a time and that the owner controls the value's lifetime: when the owner goes away, so does the value. This contrasts with languages like Scala, Java, Go, or Python, where a garbage collector reclaims unused memory at runtime.

The ownership and borrowing rules are enforced at compile time, which means that:

If the program compiles, the program is guaranteed to be memory-safe*.

(* Note: This is true except for Unsafe Rust)

These rules highlight the importance of memory management in Rust, so it helps to keep in mind that a program uses two memory regions: the stack and the heap. Without going too deep, the stack stores data with a known, fixed size, such as local variables, function arguments, and return values. The heap, in contrast, stores data whose size is unknown at compile time or can change at runtime, such as the contents of vectors and strings (the String or Vec value itself is a small, fixed-size handle that points into the heap).
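A minimal sketch of this split, using only the standard library: a String value is a fixed-size handle (a pointer, a length, and a capacity) that can live on the stack, while the text it owns lives in a heap buffer. The helper name handle_sizes is hypothetical, chosen only for illustration.

```rust
use std::mem::size_of_val;

/// Returns the handle size of a short and a long String,
/// plus the number of bytes each one owns on the heap.
fn handle_sizes() -> (usize, usize, usize, usize) {
    let short = String::from("Bob");
    let long = "a much longer candidate name ".repeat(100);

    // The handle has the same fixed size regardless of the content...
    let short_handle = size_of_val(&short);
    let long_handle = size_of_val(&long);

    // ...while the text itself lives in a heap buffer of dynamic size.
    (short_handle, long_handle, short.len(), long.len())
}
```

Both handles report the same size, even though one string owns 3 bytes of text and the other owns 2,900.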

Ownership Model

Every value in Rust has an owner responsible for managing the value’s memory. When the owner goes out of scope, the value is dropped, and its memory is deallocated.
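To see the "dropped when the owner goes out of scope" rule in action, here is a small sketch that implements the standard Drop trait on a hypothetical Noisy type, which records its own destruction in a shared log.

```rust
use std::cell::RefCell;

/// Records its own destruction in a shared log when it is dropped.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("dropped {}", self.name));
    }
}

/// Returns the order in which scope exits and drops happened.
fn scope_demo() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        let _outer = Noisy { name: "outer", log: &log };
        {
            let _inner = Noisy { name: "inner", log: &log };
            log.borrow_mut().push(String::from("leaving inner scope"));
        } // `_inner` goes out of scope: it is dropped here
        log.borrow_mut().push(String::from("leaving outer scope"));
    } // `_outer` is dropped here
    log.into_inner()
}
```

The log shows each value being dropped exactly when its owning scope ends, with no garbage collector involved.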

fn main() {
    let str1 = String::from("I'll be placed in the heap");
    let str2 = str1;
    println!("{}", str1); // Compile error
}

In this example, we assign str1 to str2. Because String manages a heap-allocated buffer, the assignment does not copy the data on the heap; instead, it moves ownership of that buffer to str2 (only the small handle, with its pointer, length, and capacity, is copied). As a result, str1 is no longer valid after the assignment, and trying to use it on the following line produces a compile error (E0382, borrow of a moved value).

This example illustrates why understanding ownership matters: Rust won't allow us to use a value that has been moved or dropped (if the program compiles, it is guaranteed to be memory-safe).
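If we do need str1 after the assignment, there are two usual options: borrow it instead of moving it, or clone it, which performs a deep copy into a second heap buffer. A short sketch (the function name borrow_or_clone is ours):

```rust
/// Two ways to keep `str1` usable: borrow it, or clone it (a deep copy).
fn borrow_or_clone() -> (usize, String, String) {
    let str1 = String::from("I'll be placed in the heap");

    // Option 1: borrow instead of moving; `str1` keeps ownership.
    let borrowed: &String = &str1;
    let len = borrowed.len();

    // Option 2: clone, which allocates a second heap buffer with the same bytes.
    let str2 = str1.clone();

    // Both owners are still valid here, so we can return them.
    (len, str1, str2)
}
```

Cloning is explicit on purpose: the extra allocation is visible in the code rather than hidden behind an assignment.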

Borrowing Model

In addition to ownership, the borrowing model allows us to temporarily loan out references to owned values. A reference is a pointer to a value, allowing you to access the value without owning it. Borrowing rules ensure that references are used safely and no data races exist.

fn print_message(msg: &str) {
    println!("{}", msg);
}

fn main() {
    let message = String::from("Hello, I belong to the main function!");
    print_message(&message);
}

In the example above, message is a String; when we call the print_message function with &message as an argument, we pass a reference to the message.

The print_message function takes a string slice (&str) as its argument. Because &message is a &String, Rust automatically converts it to a &str (deref coercion). The function therefore borrows the String's contents and can read them, but it does not own them and cannot modify them.

When we call println!("{}", msg), we are printing the borrowed String contents without modifying it.
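A variation of the same example makes two points checkable: deref coercion lets a &String be passed where a &str is expected, and the caller still owns the String after the borrow ends. The function names are hypothetical.

```rust
/// Takes a string slice; callers can pass `&String` thanks to deref coercion.
fn message_length(msg: &str) -> usize {
    msg.len()
}

fn borrow_demo() -> (usize, String) {
    let message = String::from("Hello, I belong to the main function!");
    // `&message` is a `&String`, coerced to `&str` at the call site.
    let len = message_length(&message);
    // The borrow has ended, and `message` is still owned by this function.
    (len, message)
}
```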

Mutability

The combination of ownership and borrowing allows Rust to control mutability in a fine-grained manner. Regarding mutability in Rust, we could list the following properties:

Mutability is tied to ownership.
By default, values are immutable.
If we want to mutate a value, the binding must be declared with the mut keyword, or we must hold a mutable reference (&mut) to it.
Borrowing mutably is exclusive: while a mutable reference to a value is in use, no other reference to it (mutable or shared) can be used. This prevents data races and makes it easy to reason about your code.
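The rules above can be sketched in a few lines; mark_senior is a hypothetical helper that needs a mutable borrow to change the caller's String.

```rust
/// Requires a mutable (exclusive) borrow to change the caller's String.
fn mark_senior(name: &mut String) {
    name.push_str(" (senior)");
}

fn mutability_demo() -> String {
    // Without `mut`, the binding could not be mutated or mutably borrowed.
    let mut name = String::from("Alice");

    let exclusive = &mut name;
    mark_senior(exclusive); // implicit reborrow of `exclusive`
    // Uncommenting the next line would fail to compile: `name` cannot be
    // borrowed as shared while `exclusive` is still used below.
    // let shared = &name;
    exclusive.push('!');

    // `exclusive` is no longer used, so `name` is accessible again.
    name
}
```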

Smart Pointers

This is where smart pointers come into play. They can be a powerful tool for functional domain modeling, allowing us to manage ownership and share complex data structures. Rc and Arc, in particular, only give shared (read-only) access to the value they wrap, which helps us enforce immutability and ensure that data is not mutated unintentionally, a fundamental principle here.

Furthermore, smart pointers help manage ownership and borrowing by adding functionality on top of regular references. For example, Box provides heap allocation and automatically frees that memory when the Box goes out of scope, which makes it useful for data that should be cheap to move or whose size is not known at compile time. Rc and Arc provide shared ownership, allowing a value to have multiple owners, which is helpful for data structures that need to be shared across different parts of a program. Let's see some concrete examples of these three types of pointers.

Box Smart Pointer

To start explaining the Box smart pointer, we can refer to the following example:

let candidate_id: CandidateId = CandidateId(2);

Here, the candidate_id value is allocated on the stack, where values are placed by default. If we want to store this value on the heap instead, we can write:

let candidate_id: Box<CandidateId> = Box::new(CandidateId(2));

In this case, Box::new allocates the CandidateId instance on the heap and returns a smart pointer to it. The Box itself, which is just a pointer, is stored on the stack as the candidate_id variable, while the CandidateId it points to lives on the heap. The Box owns that heap allocation and frees it when candidate_id goes out of scope.
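A compilable version of this snippet, assuming CandidateId is a simple newtype over u32 (the definition from the previous post may differ), shows that the Box is pointer-sized and that field access reaches through it automatically.

```rust
use std::mem::size_of;

// Assumption: a simple newtype standing in for the CandidateId of the previous post.
#[derive(Debug)]
struct CandidateId(u32);

fn boxed_id() -> (u32, usize) {
    // The CandidateId lives on the heap; `candidate_id` (the Box) is a pointer on the stack.
    let candidate_id: Box<CandidateId> = Box::new(CandidateId(2));
    // Field access auto-derefs through the Box to the heap value.
    let inner = candidate_id.0;
    (inner, size_of::<Box<CandidateId>>())
} // `candidate_id` goes out of scope: the Box frees the heap allocation here
```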

Now, let’s introduce a new requirement for managing candidates in the hiring pipeline. The HiringPipeline model will need to contain a list of candidates, so the first implementation we can think of would be:

struct HiringPipeline {
    candidates: Vec<Candidate>,
}

The candidates vector is stored as a field inside the HiringPipeline struct. When an instance of HiringPipeline is created, the vector's small, fixed-size header (a pointer, a length, and a capacity) is stored inline in the struct, on the stack if the struct is a local variable. The vector itself allocates a buffer on the heap to store its elements.

Regarding ownership, the HiringPipeline struct owns the Candidate objects directly:

When we add a candidate to the vector, it is moved into the vector's heap buffer.
When we remove a candidate from the vector, it is moved back out to the caller, and its memory is freed once that value is dropped.
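A short sketch of both moves, using a simplified, hypothetical Candidate with only two fields (the real one from the previous post has more):

```rust
// Simplified, hypothetical Candidate for illustration.
#[derive(Debug)]
struct Candidate {
    id: u32,
    name: String,
}

struct HiringPipeline {
    candidates: Vec<Candidate>,
}

fn move_in_and_out() -> (Candidate, usize) {
    let mut pipeline = HiringPipeline { candidates: Vec::new() };

    // push moves each Candidate into the vector's heap buffer.
    pipeline.candidates.push(Candidate { id: 1, name: String::from("Alice") });
    pipeline.candidates.push(Candidate { id: 2, name: String::from("Bob") });

    // remove moves the Candidate back out; the caller now owns it.
    let alice = pipeline.candidates.remove(0);
    (alice, pipeline.candidates.len())
}
```

The removed candidate is not destroyed by remove; it is only freed if whoever ends up owning it drops it.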

This is a good default when the Candidate objects are mostly accessed together through the pipeline.

However, every time a Candidate is moved (into or out of the vector, or into another struct or function), the whole struct is copied bitwise to its new location, and when the vector grows, it may move every element to a new buffer. If Candidate values are large, or we want to move individual candidates around frequently, we can store each one behind a pointer instead:

struct HiringPipeline {
    candidates: Vec<Box<Candidate>>,
}

In this second implementation, the candidates field is a vector of Box objects, where the HiringPipeline struct owns a vector of pointers to Candidate objects stored on the heap.

When we add a candidate to the vector, the Candidate is moved into a new heap allocation owned by a Box, and that Box (a pointer) is added to the vector.
When we remove a candidate from the vector, the Box is moved out of the vector. The Candidate stays at the same heap address, still owned by that Box, and is deallocated as soon as the Box is dropped. Because a Box is the sole owner of its value, there is never more than one Box pointing to the same Candidate.
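The following sketch, again with a simplified, hypothetical Candidate and a hypothetical archive function, checks that removing a boxed candidate and passing it to a function moves only the pointer: the Candidate keeps the same heap address throughout.

```rust
// Simplified, hypothetical Candidate for illustration.
#[derive(Debug)]
struct Candidate {
    id: u32,
    name: String,
}

/// Takes ownership of a boxed candidate; only the pointer is moved in.
fn archive(candidate: Box<Candidate>) -> (u32, *const Candidate) {
    (candidate.id, &*candidate as *const Candidate)
} // the Box is dropped here, which frees the Candidate on the heap

fn remove_boxed() -> (u32, bool, usize) {
    let mut candidates: Vec<Box<Candidate>> = vec![
        Box::new(Candidate { id: 1, name: String::from("Alice") }),
        Box::new(Candidate { id: 2, name: String::from("Bob") }),
    ];
    // Heap address of Alice while she is still in the vector.
    let before: *const Candidate = &*candidates[0];

    // remove moves the Box (a pointer) out of the vector.
    let alice = candidates.remove(0);
    let (id, after) = archive(alice);

    // The raw pointers are only compared, never dereferenced.
    (id, before == after, candidates.len())
}
```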

Let’s see a concrete use case for this:

let alice = Candidate {
    id: CandidateId(1),
    name: CandidateName(String::from("Alice")),
    email: CandidateEmail::new(String::from("alice@example.com")).unwrap(),
    experience_level: ExperienceLevel::Senior,
    interview_status: Some(InterviewStatus::Scheduled),
    application_status: ApplicationStatus::Submitted,
};
let bob = Candidate {
    id: CandidateId(2),
    name: CandidateName(String::from("Bob")),
    email: CandidateEmail::new(String::from("bob@example.com")).unwrap(),
    experience_level: ExperienceLevel::MidLevel,
    interview_status: None,
    application_status: ApplicationStatus::Rejected,
};

let pipeline = HiringPipeline {
    candidates: vec![
        Box::new(alice),
        Box::new(bob),
    ],
};

let senior_candidates: Vec<&Box<Candidate>> = pipeline
    .candidates
    .iter()
    .filter(|c| c.experience_level == ExperienceLevel::Senior)
    .collect();

println!("Senior candidates: {:?}", senior_candidates);
// Senior candidates: [Candidate { id: CandidateId(1), name: CandidateName("Alice"), email: CandidateEmail("alice@example.com"), experience_level: Senior, interview_status: Some(Scheduled), application_status: Submitted }]

The resulting senior_candidates vector holds references (&Box<Candidate>) into pipeline.candidates: no Candidate is copied or re-allocated. The same would be true with a plain Vec<Candidate>, where filter would yield &Candidate values. In both cases, the references borrow from the vector, so they cannot outlive it: passing them to other functions or storing them in other data structures requires the vector and its contents to remain valid for as long as those references are used. What Box changes is the cost of moving a candidate: moving a Box<Candidate> out of the vector, into another struct, or into a function copies a single pointer, while the Candidate itself stays at the same heap address.

In summary, with Box we keep each Candidate on the heap under a single owner that is cheap to move. Transferring ownership of a candidate then means moving a pointer rather than the whole object, which is what we want when Candidate objects are large or frequently passed around.
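When filtered results must outlive the pipeline, one option is to consume the vector with into_iter instead of borrowing it with iter. This sketch (simplified, hypothetical Candidate and ExperienceLevel) partitions the boxes into two owned vectors, moving one pointer per candidate.

```rust
#[derive(Debug, PartialEq)]
enum ExperienceLevel {
    Senior,
    MidLevel,
}

// Simplified, hypothetical Candidate for illustration.
#[derive(Debug)]
struct Candidate {
    name: String,
    experience_level: ExperienceLevel,
}

/// Splits candidates into (seniors, others) by moving each Box,
/// i.e. one pointer per candidate, never the Candidate data itself.
fn split_seniors(candidates: Vec<Box<Candidate>>) -> (Vec<Box<Candidate>>, Vec<Box<Candidate>>) {
    candidates
        .into_iter()
        .partition(|c| c.experience_level == ExperienceLevel::Senior)
}
```

Unlike the borrowed senior_candidates, the returned vectors own their candidates, so they are not tied to the original vector's lifetime.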

Rc Smart Pointer

In functional programming, we favor immutable data and avoid shared mutable state. With this in mind, Rc lets us share immutable data, so programs can keep a functional style while still sharing data efficiently, without cloning or copying it, which can be costly in both performance and memory usage.

Rc stands for "reference counted", and it allows a value to have multiple owners at the same time. By comparison, the Box smart pointer we have seen has a single owner: ownership of a boxed value can be transferred from one part of the program to another, but not shared.

Therefore, each part of the program that needs the data can hold its own Rc handle to the same instance (created with Rc::clone, which increments the count instead of copying the data), and the data remains accessible as long as at least one handle is alive. When the count drops to zero, the value is automatically dropped, and its memory is deallocated.
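The counting itself can be observed with the standard Rc::strong_count function. A minimal sketch, with a simplified, hypothetical Candidate:

```rust
use std::rc::Rc;

// Simplified, hypothetical Candidate for illustration.
#[derive(Debug)]
struct Candidate {
    name: String,
}

fn reference_counts() -> Vec<usize> {
    let mut counts = Vec::new();
    let alice = Rc::new(Candidate { name: String::from("Alice") });
    counts.push(Rc::strong_count(&alice)); // one owner

    {
        // Rc::clone copies the pointer and increments the count; no Candidate is copied.
        let also_alice = Rc::clone(&alice);
        counts.push(Rc::strong_count(&alice)); // two owners
        assert!(Rc::ptr_eq(&alice, &also_alice));
        assert_eq!(also_alice.name, "Alice");
    } // `also_alice` is dropped: the count goes back down

    counts.push(Rc::strong_count(&alice)); // one owner again
    counts
} // the last Rc is dropped: the Candidate is deallocated
```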

Here is an example where we could use the Rc smart pointer instead of the Box that we have used before:

use std::rc::Rc;
struct HiringPipeline {
    candidates: Vec<Rc<Candidate>>,
}

impl HiringPipeline {
    fn new() -> HiringPipeline {
        HiringPipeline { candidates: vec![] }
    }

    fn add_candidate(&mut self, candidate: Candidate) {
        self.candidates.push(Rc::new(candidate));
    }
}

In this implementation, the HiringPipeline owns Rc handles rather than the Candidate objects themselves; each Candidate is co-owned by every Rc that points to it. When a new candidate is added to the HiringPipeline, it is wrapped in an Rc using the Rc::new method and then pushed into the vector. The Rc keeps track of the number of owners of the underlying Candidate object, and when the last one is dropped, the Candidate object is deallocated.
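Shared ownership pays off when a second part of the model needs the same candidates. In this sketch, InterviewSchedule is a hypothetical aggregate (not part of the original model) that co-owns a candidate with the pipeline; the Candidate is simplified.

```rust
use std::rc::Rc;

// Simplified, hypothetical Candidate for illustration.
#[derive(Debug)]
struct Candidate {
    name: String,
}

struct HiringPipeline {
    candidates: Vec<Rc<Candidate>>,
}

impl HiringPipeline {
    fn new() -> HiringPipeline {
        HiringPipeline { candidates: vec![] }
    }

    fn add_candidate(&mut self, candidate: Candidate) {
        self.candidates.push(Rc::new(candidate));
    }
}

// Hypothetical second aggregate that needs the same candidates.
struct InterviewSchedule {
    today: Vec<Rc<Candidate>>,
}

fn share_candidate() -> (usize, String) {
    let mut pipeline = HiringPipeline::new();
    pipeline.add_candidate(Candidate { name: String::from("Alice") });

    // The schedule co-owns Alice instead of copying her.
    let schedule = InterviewSchedule {
        today: vec![Rc::clone(&pipeline.candidates[0])],
    };

    drop(pipeline); // the pipeline's handle goes away...
    // ...but Alice stays alive because the schedule still holds an Rc to her.
    (Rc::strong_count(&schedule.today[0]), schedule.today[0].name.clone())
}
```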

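Since Rc only provides shared ownership of the underlying object, we cannot modify a Candidate directly through an Rc. To change one, we either create an updated copy (the functional approach) or opt into interior mutability with a type such as RefCell.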
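For the functional approach, the standard Rc::make_mut offers copy-on-write: if other handles still point to the value, it clones it first and gives mutable access only to the new copy. A sketch with simplified, hypothetical Candidate and ApplicationStatus types (make_mut requires Clone):

```rust
use std::rc::Rc;

#[derive(Debug, Clone, PartialEq)]
enum ApplicationStatus {
    Submitted,
    Rejected,
}

// Simplified, hypothetical Candidate; Clone is required by Rc::make_mut.
#[derive(Debug, Clone)]
struct Candidate {
    name: String,
    application_status: ApplicationStatus,
}

/// Returns a rejected version of `candidate`, leaving the original untouched.
fn reject(candidate: &Rc<Candidate>) -> Rc<Candidate> {
    let mut updated = Rc::clone(candidate);
    // Another Rc handle still points at the value, so make_mut clones it
    // first (copy-on-write) and gives us `&mut` access to the new copy.
    Rc::make_mut(&mut updated).application_status = ApplicationStatus::Rejected;
    updated
}
```

Every other holder of the original Rc keeps seeing the unchanged candidate, which is the immutability guarantee we are after.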

Arc Smart Pointer

In concurrent programs, sharing data between multiple threads is critical. How could we model our domain considering this in Rust?

Well, Arc is an abbreviation for "atomically reference-counted smart pointer," and it provides thread-safe reference counting. As a result, multiple threads can share data ownership without causing data races or memory issues. In addition, the reference count is updated atomically, ensuring it is always accurate, even when accessed by multiple threads simultaneously.

Therefore, Arc is a valuable tool for functional domain modeling when working with concurrent and parallel computations that require shared access to values across multiple threads. By using Arc, the ownership of values can be safely shared between threads, allowing for more efficient use of resources and better performance.

For example, in a concurrent hiring pipeline, multiple threads could process candidates simultaneously. With Arc, each thread can read the shared candidate data without cloning it for every thread, which would waste memory; if threads also need to modify the data, Arc is combined with a lock such as Mutex or RwLock.

use rayon::prelude::*;
use std::sync::{Arc, Mutex};

#[derive(Debug)]
struct HiringPipeline {
    candidates: Vec<Arc<Candidate>>,
}

impl HiringPipeline {
    fn new() -> HiringPipeline {
        HiringPipeline { candidates: vec![] }
    }

    fn add_candidate(&mut self, candidate: Candidate) {
        self.candidates.push(Arc::new(candidate));
    }

    fn filter_candidates<F>(&self, predicate: F) -> Vec<Arc<Candidate>>
    where
        F: Fn(&Candidate) -> bool + Send + Sync,
    {
        let filtered: Vec<Arc<Candidate>> = self
            .candidates
            .par_iter()
            .filter(|candidate| predicate(candidate.as_ref()))
            .cloned()
            .collect();

        filtered
    }
}

In this illustration, you might have noticed that we use the rayon crate (an external dependency that must be added to Cargo.toml) to filter candidates. rayon makes it easy to turn a sequential computation into a parallel one: par_iter() takes the place of iter(). Because rayon may call the predicate from several worker threads at the same time, filter_candidates requires it to implement the Send and Sync traits.
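These trait requirements are checked at compile time. A common idiom is an empty generic function whose bounds must be satisfied; assert_send_sync and require_thread_safe_predicate below are hypothetical helpers, and Candidate is simplified.

```rust
use std::sync::{Arc, Mutex};

// Simplified, hypothetical Candidate for illustration.
#[derive(Debug)]
struct Candidate {
    name: String,
}

/// Compiles only if `T` can be moved to, and shared between, threads.
fn assert_send_sync<T: Send + Sync>() {}

/// Same bounds as filter_candidates' predicate; returns it unchanged.
fn require_thread_safe_predicate<F>(predicate: F) -> F
where
    F: Fn(&Candidate) -> bool + Send + Sync,
{
    predicate
}

fn thread_safety_checks() {
    assert_send_sync::<Candidate>();
    assert_send_sync::<Arc<Candidate>>();
    assert_send_sync::<Arc<Mutex<Vec<Arc<Candidate>>>>>();
    // Would NOT compile: Rc uses a non-atomic count, so it is neither Send nor Sync.
    // assert_send_sync::<std::rc::Rc<Candidate>>();
}
```

A closure that captures only Send and Sync data, like the experience-level predicates used in this post, satisfies the bounds automatically.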

Let’s look at one example using this model from a multi-threaded program:

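// Arc and Mutex are already imported above by use std::sync::{Arc, Mutex};
// importing Mutex a second time in the same module would be a compile error.
use std::thread;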

fn main() {
    let alice = Candidate {
        id: CandidateId(1),
        name: CandidateName(String::from("Alice")),
        email: CandidateEmail::new(String::from("alice@example.com")).unwrap(),
        experience_level: ExperienceLevel::Senior,
        interview_status: Some(InterviewStatus::Scheduled),
        application_status: ApplicationStatus::Submitted,
    };
    let bob = Candidate {
        id: CandidateId(2),
        name: CandidateName(String::from("Bob")),
        email: CandidateEmail::new(String::from("bob@example.com")).unwrap(),
        experience_level: ExperienceLevel::MidLevel,
        interview_status: None,
        application_status: ApplicationStatus::Rejected,
    };

    let mut pipeline = HiringPipeline::new();

    pipeline.add_candidate(alice);
    pipeline.add_candidate(bob);

    let pipeline_arc = Arc::new(Mutex::new(pipeline));

    let pipeline_seniors = pipeline_arc.clone();
    let handle1 = thread::spawn(move || {
        let pipeline = pipeline_seniors.lock().unwrap();
        let filtered = pipeline
            .filter_candidates(|candidate| candidate.experience_level == ExperienceLevel::Senior);
        println!("Filtered candidates in thread 1: {:?}", filtered);
    });

    let pipeline_mids = pipeline_arc.clone();
    let handle2 = thread::spawn(move || {
        let pipeline = pipeline_mids.lock().unwrap();
        let filtered = pipeline
            .filter_candidates(|candidate| candidate.experience_level == ExperienceLevel::MidLevel);
        println!("Filtered candidates in thread 2: {:?}", filtered);
    });

    handle1.join().unwrap();
    handle2.join().unwrap();
}

This program creates two threads using thread::spawn(). Each thread locks the mutex to gain exclusive access to the HiringPipeline struct and filters the candidates by experience level using the filter_candidates() method. The filtered results are then printed to the console (the order may vary if you run it on your machine):

Filtered candidates in thread 1: [Candidate { id: CandidateId(1), name: CandidateName("Alice"), email: CandidateEmail("alice@example.com"), experience_level: Senior, interview_status: Some(Scheduled), application_status: Submitted }]
Filtered candidates in thread 2: [Candidate { id: CandidateId(2), name: CandidateName("Bob"), email: CandidateEmail("bob@example.com"), experience_level: MidLevel, interview_status: None, application_status: Rejected }]
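Note that each thread holds the lock while it filters, so the two filters run one after the other. Since both threads only read the pipeline, an Arc alone is enough. The sketch below assumes that read-only scenario; it uses standard threads and a sequential iter() instead of rayon to stay dependency-free, and names_at and names_by_level_in_threads are hypothetical helpers over a simplified Candidate.

```rust
use std::sync::Arc;
use std::thread;

#[derive(Debug, PartialEq)]
enum ExperienceLevel {
    Senior,
    MidLevel,
}

// Simplified, hypothetical Candidate for illustration.
#[derive(Debug)]
struct Candidate {
    name: String,
    experience_level: ExperienceLevel,
}

/// Names of the candidates at `level`.
fn names_at(candidates: &[Candidate], level: ExperienceLevel) -> Vec<String> {
    candidates
        .iter()
        .filter(|c| c.experience_level == level)
        .map(|c| c.name.clone())
        .collect()
}

/// Both threads only read, so Arc alone is enough: no Mutex, no waiting on a lock.
fn names_by_level_in_threads(candidates: Vec<Candidate>) -> (Vec<String>, Vec<String>) {
    let shared = Arc::new(candidates);

    let for_seniors = Arc::clone(&shared);
    let seniors = thread::spawn(move || names_at(&for_seniors, ExperienceLevel::Senior));

    let for_mids = Arc::clone(&shared);
    let mids = thread::spawn(move || names_at(&for_mids, ExperienceLevel::MidLevel));

    (seniors.join().unwrap(), mids.join().unwrap())
}
```

The Mutex becomes necessary again as soon as some thread needs to add, remove, or update candidates while others are reading.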

Conclusion

In conclusion, Box, Rc, and Arc are all valuable tools for functional domain modeling in Rust. Several other smart pointers can also be helpful, but we have not covered them in this article.

We have seen that Box is ideal for handling heap-allocated data and providing single ownership. Rc is helpful when multiple references to the same data are needed, and sharing ownership is necessary but not concurrent. Arc is beneficial in concurrent programs where shared data ownership is crucial. By using these smart pointers, Rust allows us to manage memory more safely and efficiently, enabling us to build complex domain models that can be easily shared and utilized in concurrent environments.

Understanding the differences between these smart pointers and choosing the right one for the job, in combination with ADTs, the Result and Option types, and Traits, is vital to building elegant domain models that are easy to reason about and maintain, as well as safe and efficient.

Functional Domain Modeling in Rust – Part 2