Rust - Unsafe Rust

Overview

Estimated time: 45–55 minutes

Understand when and how to use unsafe Rust for low-level programming, interfacing with C libraries, and performance-critical code. Learn about raw pointers, unsafe blocks, and the responsibilities that come with bypassing Rust's safety guarantees.

Learning Objectives

Prerequisites

What is Unsafe Rust?

Safe vs Unsafe Rust

Understanding the boundary between safe and unsafe code:

fn main() {
    // This is safe Rust - compiler guarantees memory safety
    let mut numbers = vec![1, 2, 3, 4, 5];
    let first = &numbers[0];
    // numbers.push(6); // This would cause a compile error
    println!("First: {}", first);
    
    // Creating a raw pointer is safe; dereferencing it requires `unsafe`,
    // because the compiler can no longer verify that the access is valid
    unsafe {
        let raw_ptr = numbers.as_mut_ptr();
        *raw_ptr = 42; // Direct memory access to element 0
    }
    
    println!("Modified vector: {:?}", numbers);
    
    // The unsafe operations
    println!("Unsafe operations available:");
    println!("1. Dereference raw pointers");
    println!("2. Call unsafe functions");
    println!("3. Access/modify static mutable variables");
    println!("4. Implement unsafe traits");
    println!("5. Access fields of unions");
}
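
The listing above only exercises the first operation in its list. As a minimal sketch of two more (calling an unsafe function and touching a `static mut`), the following may help; `CALL_COUNT`, `read_i32`, and `bump_call_count` are hypothetical names invented for this illustration.

```rust
// All names here are hypothetical, chosen for illustration only.
static mut CALL_COUNT: u32 = 0;

/// Reads the `i32` behind `ptr`.
///
/// # Safety
/// `ptr` must be non-null, properly aligned, and point to an initialized `i32`.
unsafe fn read_i32(ptr: *const i32) -> i32 {
    // SAFETY: guaranteed by the caller, per the contract above.
    unsafe { *ptr }
}

/// Increments the global counter and returns the new value.
///
/// # Safety
/// Must not be called from more than one thread at a time (data race).
unsafe fn bump_call_count() -> u32 {
    // SAFETY: the caller guarantees exclusive access to CALL_COUNT.
    unsafe {
        CALL_COUNT += 1;
        CALL_COUNT
    }
}

fn main() {
    let value = 7;
    // SAFETY: `&value` is a valid, aligned pointer to an initialized i32.
    let read = unsafe { read_i32(&value) };
    // SAFETY: this program is single-threaded.
    let count = unsafe { bump_call_count() };
    println!("read = {read}, count = {count}");
}
```

Both functions state their contract in a `# Safety` section, and each call site says why the contract holds. For shared counters in real code, an atomic type is usually a safer choice than `static mut`.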

When to Use Unsafe

Common scenarios where unsafe code is necessary:

// 1. Implementing fundamental data structures
struct MyVec<T> {
    ptr: *mut T,
    len: usize,
    capacity: usize,
}

impl<T> MyVec<T> {
    fn new() -> Self {
        MyVec {
            ptr: std::ptr::null_mut(),
            len: 0,
            capacity: 0,
        }
    }
    
    fn push(&mut self, value: T) {
        if self.len == self.capacity {
            self.resize();
        }
        
        unsafe {
            // Write to uninitialized memory
            std::ptr::write(self.ptr.add(self.len), value);
        }
        self.len += 1;
    }
    
    fn resize(&mut self) {
        let new_capacity = if self.capacity == 0 { 1 } else { self.capacity * 2 };
        
        unsafe {
            let layout = std::alloc::Layout::array::<T>(new_capacity).unwrap();
            let new_ptr = if self.capacity == 0 {
                std::alloc::alloc(layout) as *mut T
            } else {
                let old_layout = std::alloc::Layout::array::<T>(self.capacity).unwrap();
                std::alloc::realloc(self.ptr as *mut u8, old_layout, layout.size()) as *mut T
            };
            
            if new_ptr.is_null() {
                panic!("Allocation failed");
            }
            
            self.ptr = new_ptr;
            self.capacity = new_capacity;
        }
    }
}

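// 2. FFI with C libraries
// The C types come from the standard library, so no external crate is needed.
// (On the 2024 edition, extern blocks must be written `unsafe extern "C"`.)
use std::os::raw::{c_char, c_void};

extern "C" {
    fn strlen(s: *const c_char) -> usize;
    fn malloc(size: usize) -> *mut c_void;
    fn free(ptr: *mut c_void);
}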

// 3. Performance-critical code that needs to bypass bounds checks
fn fast_sum_slice(slice: &[i32]) -> i32 {
    let mut sum = 0;
    let ptr = slice.as_ptr();
    let len = slice.len();
    
    unsafe {
        for i in 0..len {
            sum += *ptr.add(i); // No bounds checking
        }
    }
    sum
}

fn main() {
    // Example usage
    let numbers = [1, 2, 3, 4, 5];
    let sum = fast_sum_slice(&numbers);
    println!("Fast sum: {}", sum);
}
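
Before reaching for raw pointers to skip bounds checks, it is worth comparing against the safe alternatives. The sketch below is an illustration rather than a benchmark; `sum_iter` and `sum_unchecked` are hypothetical names. Iterators never index, so there is usually no bounds check to remove in the first place, and `get_unchecked` keeps the unsafe part down to a single expression when indexing is really needed.

```rust
/// Safe version: iterating a slice does not perform per-element index checks.
fn sum_iter(slice: &[i32]) -> i32 {
    slice.iter().sum()
}

/// Unchecked indexing, with the unsafe scope limited to one expression.
fn sum_unchecked(slice: &[i32]) -> i32 {
    let mut sum = 0;
    for i in 0..slice.len() {
        // SAFETY: `i < slice.len()` is guaranteed by the loop range.
        sum += unsafe { *slice.get_unchecked(i) };
    }
    sum
}

fn main() {
    let numbers = [1, 2, 3, 4, 5];
    println!("iterator sum:  {}", sum_iter(&numbers));
    println!("unchecked sum: {}", sum_unchecked(&numbers));
}
```

Whether the unchecked version is measurably faster depends on the optimizer and the surrounding code, so it should be profiled before being adopted.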

Raw Pointers

Creating and Using Raw Pointers

Raw pointers provide direct memory access without safety guarantees:

fn main() {
    let mut num = 42;
    
    // Create raw pointers. Both are derived from the same mutable borrow, so
    // creating one does not invalidate the other
    let raw_mut: *mut i32 = &mut num;
    let raw_const: *const i32 = raw_mut;
    
    println!("Raw const pointer: {:p}", raw_const);
    println!("Raw mut pointer: {:p}", raw_mut);
    
    // Safe operations on raw pointers (no dereferencing)
    println!("Pointer is null: {}", raw_const.is_null());
    println!("Pointer address: {:p}", raw_const);
    
    // Unsafe operations
    unsafe {
        // Dereference raw pointers
        println!("Value through const pointer: {}", *raw_const);
        
        // Modify through mutable pointer
        *raw_mut = 100;
        println!("Modified value: {}", *raw_mut);
        
        // Pointer arithmetic
        let arr = [1, 2, 3, 4, 5];
        let ptr = arr.as_ptr();
        
        for i in 0..arr.len() {
            println!("Element {}: {}", i, *ptr.add(i));
        }
    }
    
    // Creating pointers from arbitrary addresses (very dangerous!)
    let address = 0x12345678usize;
    let ptr = address as *const i32;
    
    // Don't dereference arbitrary pointers!
    // unsafe { println!("{}", *ptr); } // This would likely crash
    
    println!("Final num value: {}", num);
}
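
The `is_null` check shown above can be folded into the read itself. As a small sketch, `read_or` (a hypothetical helper, not part of any library) uses the standard `as_ref` method on raw pointers, which returns `None` for a null pointer and `Some(&T)` otherwise.

```rust
/// Reads the value behind `ptr`, or returns `default` when `ptr` is null.
///
/// # Safety
/// If `ptr` is non-null, it must be aligned and point to an initialized `i32`
/// that stays alive for the duration of the call.
unsafe fn read_or(ptr: *const i32, default: i32) -> i32 {
    // SAFETY: `as_ref` handles null; validity of a non-null pointer is the
    // caller's obligation, as documented above.
    match unsafe { ptr.as_ref() } {
        Some(value) => *value,
        None => default,
    }
}

fn main() {
    let num = 42;
    let valid: *const i32 = &num;
    let null: *const i32 = std::ptr::null();

    // SAFETY: `valid` points to a live local; `null` is handled by the helper.
    unsafe {
        println!("valid pointer: {}", read_or(valid, -1));
        println!("null pointer:  {}", read_or(null, -1));
    }
}
```

A null check only rules out one kind of invalid pointer. A dangling or misaligned pointer is non-null and would still be undefined behavior to read, which is why the function itself stays `unsafe`.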

Pointer Arithmetic and Memory Layout

Working with memory layout and pointer calculations:

use std::mem;

#[repr(C)]
struct Point {
    x: f64,
    y: f64,
}

fn analyze_memory_layout() {
    let points = vec![
        Point { x: 1.0, y: 2.0 },
        Point { x: 3.0, y: 4.0 },
        Point { x: 5.0, y: 6.0 },
    ];
    
    println!("Point size: {} bytes", mem::size_of::<Point>());
    println!("Point alignment: {} bytes", mem::align_of::<Point>());
    
    unsafe {
        let ptr = points.as_ptr();
        
        // Access as raw bytes
        let byte_ptr = ptr as *const u8;
        
        println!("First point as bytes:");
        for i in 0..mem::size_of::<Point>() {
            print!("{:02x} ", *byte_ptr.add(i));
        }
        println!();
        
        // Pointer arithmetic
        for i in 0..points.len() {
            let point_ptr = ptr.add(i);
            println!("Point {}: ({}, {})", i, (*point_ptr).x, (*point_ptr).y);
            
            // Access individual fields
            let x_ptr = &(*point_ptr).x as *const f64;
            let y_ptr = &(*point_ptr).y as *const f64;
            
            println!("  X at {:p}: {}", x_ptr, *x_ptr);
            println!("  Y at {:p}: {}", y_ptr, *y_ptr);
        }
    }
}

fn main() {
    analyze_memory_layout();
    
    // Manual memory management example
    unsafe {
        // Allocate memory for 5 integers
        let layout = std::alloc::Layout::array::<i32>(5).unwrap();
        let ptr = std::alloc::alloc(layout) as *mut i32;
        
        if ptr.is_null() {
            panic!("Allocation failed");
        }
        
        // Initialize the memory
        for i in 0..5 {
            std::ptr::write(ptr.add(i), i as i32 * 10);
        }
        
        // Read back the values
        println!("Allocated array:");
        for i in 0..5 {
            println!("  [{}] = {}", i, *ptr.add(i));
        }
        
        // Clean up - must deallocate what we allocated
        std::alloc::dealloc(ptr as *mut u8, layout);
    }
}
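
`Point` has two fields of the same type, so its layout contains no padding. As a sketch of what `#[repr(C)]` does with mixed field sizes, the hypothetical `Header` struct below is laid out in declaration order with padding inserted for alignment. The offsets in the comments assume a target where `u32` is 4-byte aligned, which holds on mainstream platforms.

```rust
use std::mem;
use std::ptr;

// Hypothetical struct for illustration: fields of different sizes.
#[repr(C)]
struct Header {
    tag: u8,   // offset 0, followed by 3 padding bytes
    len: u32,  // offset 4 (needs 4-byte alignment)
    kind: u16, // offset 8, followed by 2 bytes of tail padding
}

/// Returns the byte offsets of (tag, len, kind) within `Header`.
fn field_offsets() -> (usize, usize, usize) {
    let h = Header { tag: 0, len: 0, kind: 0 };
    let base = ptr::addr_of!(h) as usize;
    (
        ptr::addr_of!(h.tag) as usize - base,
        ptr::addr_of!(h.len) as usize - base,
        ptr::addr_of!(h.kind) as usize - base,
    )
}

fn main() {
    println!("size:    {} bytes", mem::size_of::<Header>());
    println!("align:   {} bytes", mem::align_of::<Header>());
    println!("offsets: {:?}", field_offsets());
}
```

Taking field addresses with `ptr::addr_of!` needs no `unsafe`, because nothing is dereferenced. Without `#[repr(C)]` the compiler is free to reorder fields, so these offsets would not be guaranteed.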

Unsafe Functions and Traits

Defining Unsafe Functions

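A function is marked unsafe when it has preconditions the compiler cannot check, so every caller must uphold them inside an unsafe block. A function that only uses unsafe internally can stay safe if it guarantees those preconditions itself: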

// Unsafe function - caller must ensure safety invariants
//
// # Safety
// `ptr` must be valid for reads of `len` consecutive, initialized i32 values.
// The pointer is only read, so it is taken as *const.
unsafe fn dangerous_function(ptr: *const i32, len: usize) -> i32 {
    let mut sum = 0;
    for i in 0..len {
        // SAFETY: in bounds by the caller's contract; out of bounds otherwise
        sum += unsafe { *ptr.add(i) };
    }
    sum
}

// Safe wrapper that ensures safety
fn safe_sum_array(slice: &[i32]) -> i32 {
    unsafe {
        // SAFETY: a slice guarantees `len` valid, initialized elements behind its pointer
        dangerous_function(slice.as_ptr(), slice.len())
    }
}

// Unsafe trait - implementor must uphold safety invariants
unsafe trait UnsafeTrait {
    fn dangerous_method(&self);
}

// Implementing unsafe trait requires unsafe impl
struct MyStruct;

unsafe impl UnsafeTrait for MyStruct {
    fn dangerous_method(&self) {
        println!("Implementing dangerous method safely");
    }
}

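// Example: Custom allocator interface
// The GlobalAlloc trait must be in scope to call `alloc`/`dealloc` on System
use std::alloc::{GlobalAlloc, Layout, System};

struct SimpleAllocator;

unsafe impl GlobalAlloc for SimpleAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // This is a simplified example - real allocators are complex
        // SAFETY: the caller upholds GlobalAlloc's contract for `layout`
        unsafe { System.alloc(layout) }
    }
    
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // SAFETY: `ptr` was returned by `alloc` above with the same layout
        unsafe { System.dealloc(ptr, layout) }
    }
}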

fn main() {
    let numbers = [1, 2, 3, 4, 5];
    
    // Using safe wrapper
    let sum = safe_sum_array(&numbers);
    println!("Sum: {}", sum);
    
    // Using unsafe function directly
    unsafe {
        let sum = dangerous_function(numbers.as_ptr() as *mut i32, numbers.len());
        println!("Direct unsafe sum: {}", sum);
    }
    
    // Using unsafe trait
    let my_struct = MyStruct;
    my_struct.dangerous_method();
}
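
`UnsafeTrait` above does not show why a trait would need to be unsafe. A trait is unsafe when other unsafe code relies on a promise made by the implementor. As a sketch, the hypothetical marker trait `Zeroable` below promises that the all-zero bit pattern is a valid value, which is what makes the safe function `zeroed` sound.

```rust
/// Hypothetical marker trait: implementors promise that a value made
/// entirely of zero bytes is a valid instance of the type.
unsafe trait Zeroable: Sized {}

// SAFETY: all-zero bits are valid for integers, floats (0.0), and byte arrays.
unsafe impl Zeroable for u32 {}
unsafe impl Zeroable for f64 {}
unsafe impl Zeroable for [u8; 4] {}

/// Safe to call: the trait bound carries the proof obligation.
fn zeroed<T: Zeroable>() -> T {
    // SAFETY: `T: Zeroable` guarantees the all-zero pattern is valid for T.
    unsafe { std::mem::zeroed() }
}

fn main() {
    let n: u32 = zeroed();
    let x: f64 = zeroed();
    let bytes: [u8; 4] = zeroed();
    println!("{n} {x} {bytes:?}");
}
```

Implementing `Zeroable` for a type such as `&u8` or `Box<u8>` would compile but be unsound, since null is not a valid reference. The `unsafe impl` keyword marks the place where that responsibility is accepted.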

FFI (Foreign Function Interface)

Calling C Functions

Interface with C libraries using extern blocks:

use std::ffi::{CStr, CString};
use std::os::raw::{c_char, c_int, c_void};

// Declare external C functions
extern "C" {
    fn strlen(s: *const c_char) -> usize;
    fn strcpy(dest: *mut c_char, src: *const c_char) -> *mut c_char;
    fn malloc(size: usize) -> *mut c_void;
    fn free(ptr: *mut c_void);
    fn printf(format: *const c_char, ...) -> c_int;
}

fn main() {
    // Working with C strings
    let rust_string = "Hello from Rust!";
    let c_string = CString::new(rust_string).expect("CString::new failed");
    
    unsafe {
        // Call C strlen function
        let len = strlen(c_string.as_ptr());
        println!("C strlen result: {}", len);
        
        // Call C printf function
        let format = CString::new("C printf: %s\n").unwrap();
        printf(format.as_ptr(), c_string.as_ptr());
        
        // Manual memory management with C malloc/free
        let size = 100;
        let ptr = malloc(size) as *mut c_char;
        
        if !ptr.is_null() {
            // Copy string to allocated memory
            strcpy(ptr, c_string.as_ptr());
            
            // Convert back to Rust string
            let copied_cstr = CStr::from_ptr(ptr);
            let copied_string = copied_cstr.to_string_lossy();
            println!("Copied string: {}", copied_string);
            
            // Free the allocated memory
            free(ptr as *mut c_void);
        }
    }
    
    // Safe wrapper for C string operations
    fn safe_c_strlen(s: &str) -> usize {
        let c_string = CString::new(s).expect("CString::new failed");
        unsafe { strlen(c_string.as_ptr()) }
    }
    
    println!("Safe wrapper result: {}", safe_c_strlen("Test string"));
}

// Define C-compatible functions that can be called from C
#[no_mangle]
pub extern "C" fn rust_add(a: c_int, b: c_int) -> c_int {
    a + b
}

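// # Safety
// `ptr` must be null or point to `len` valid, initialized `c_int` values.
// The function is marked unsafe because it cannot verify this itself.
#[no_mangle]
pub unsafe extern "C" fn rust_process_array(ptr: *mut c_int, len: usize) {
    if ptr.is_null() {
        return;
    }
    
    unsafe {
        for i in 0..len {
            // wrapping_mul avoids an overflow panic inside a function called from C
            let item = ptr.add(i);
            *item = (*item).wrapping_mul(2);
        }
    }
}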
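
When a Rust function returns a string to C, ownership of the allocation has to cross the boundary and come back to Rust to be freed. The following is a sketch of that pattern using `CString::into_raw` and `CString::from_raw`; `make_greeting`, `free_greeting`, and `greet` are hypothetical names. To keep the sketch runnable from Rust alone, the functions are not exported with `#[no_mangle]`, which a real C-facing library would need.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Builds "Hello, <name>!" and transfers ownership of the buffer to the caller.
///
/// # Safety
/// `name` must be null or point to a valid NUL-terminated string.
pub unsafe extern "C" fn make_greeting(name: *const c_char) -> *mut c_char {
    if name.is_null() {
        return std::ptr::null_mut();
    }
    // SAFETY: non-null was checked; NUL termination is the caller's contract.
    let name = unsafe { CStr::from_ptr(name) }.to_string_lossy();
    match CString::new(format!("Hello, {}!", name)) {
        Ok(greeting) => greeting.into_raw(), // caller now owns the allocation
        Err(_) => std::ptr::null_mut(),
    }
}

/// Takes back and frees a buffer returned by `make_greeting`.
///
/// # Safety
/// `ptr` must be null or a pointer from `make_greeting` that has not been freed.
pub unsafe extern "C" fn free_greeting(ptr: *mut c_char) {
    if !ptr.is_null() {
        // SAFETY: the pointer came from `CString::into_raw`, per the contract.
        drop(unsafe { CString::from_raw(ptr) });
    }
}

/// Safe wrapper that plays the role of the C caller.
fn greet(name: &str) -> Option<String> {
    let c_name = CString::new(name).ok()?; // fails on interior NUL bytes
    // SAFETY: `c_name` is a valid C string; `raw` is freed exactly once.
    unsafe {
        let raw = make_greeting(c_name.as_ptr());
        if raw.is_null() {
            return None;
        }
        let out = CStr::from_ptr(raw).to_string_lossy().into_owned();
        free_greeting(raw);
        Some(out)
    }
}

fn main() {
    println!("{:?}", greet("Rust"));
}
```

The buffer must be released through `free_greeting` rather than C's `free`, because it was allocated by Rust's allocator and not by `malloc`.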

Union Types

Working with C-style Unions

Unions allow multiple interpretations of the same memory:

use std::mem;

#[repr(C)]
union MyUnion {
    integer: u32,
    float: f32,
    bytes: [u8; 4],
}

fn main() {
    let mut my_union = MyUnion { integer: 0x41424344 };
    
    unsafe {
        println!("As integer: 0x{:08x}", my_union.integer);
        println!("As float: {}", my_union.float);
        println!("As bytes: {:?}", my_union.bytes);
        
        // Modify through different fields
        my_union.float = 3.14159;
        println!("After setting float:");
        println!("As integer: 0x{:08x}", my_union.integer);
        println!("As float: {}", my_union.float);
        println!("As bytes: {:?}", my_union.bytes);
        
        // Byte manipulation
        my_union.bytes[0] = 0xFF;
        println!("After modifying first byte:");
        println!("As integer: 0x{:08x}", my_union.integer);
        println!("As float: {}", my_union.float);
    }
    
    // Tagged union for type safety
    #[repr(C)]
    union Value {
        int_val: i32,
        float_val: f32,
        bool_val: bool,
    }
    
    #[repr(C)]
    struct TaggedValue {
        tag: u8, // 0 = int, 1 = float, 2 = bool
        value: Value,
    }
    
    impl TaggedValue {
        fn new_int(val: i32) -> Self {
            TaggedValue {
                tag: 0,
                value: Value { int_val: val },
            }
        }
        
        fn new_float(val: f32) -> Self {
            TaggedValue {
                tag: 1,
                value: Value { float_val: val },
            }
        }
        
        fn as_int(&self) -> Option<i32> {
            if self.tag == 0 {
                unsafe { Some(self.value.int_val) }
            } else {
                None
            }
        }
        
        fn as_float(&self) -> Option<f32> {
            if self.tag == 1 {
                unsafe { Some(self.value.float_val) }
            } else {
                None
            }
        }
    }
    
    let tagged_int = TaggedValue::new_int(42);
    let tagged_float = TaggedValue::new_float(3.14);
    
    println!("Tagged int: {:?}", tagged_int.as_int());
    println!("Tagged float: {:?}", tagged_float.as_float());
}
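
For plain reinterpretation of a float's bits, the standard library already has safe equivalents. This sketch compares a union read against `f32::to_bits`; `Bits`, `bits_via_union`, and `bits_via_std` are hypothetical names.

```rust
#[repr(C)]
union Bits {
    float: f32,
    integer: u32,
}

/// Reinterprets the bits of `x` through a union read.
fn bits_via_union(x: f32) -> u32 {
    let b = Bits { float: x };
    // SAFETY: both fields are 4 bytes and every bit pattern is a valid u32.
    unsafe { b.integer }
}

/// Same result with no unsafe code.
fn bits_via_std(x: f32) -> u32 {
    x.to_bits()
}

fn main() {
    let x = 1.0_f32;
    println!("union: 0x{:08x}", bits_via_union(x));
    println!("std:   0x{:08x}", bits_via_std(x));
    println!("bytes: {:?}", x.to_ne_bytes());
}
```

Unions remain useful for matching C layouts in FFI. When the goal is only a bit-level conversion, `to_bits`, `from_bits`, and `to_ne_bytes` avoid the unsafe block entirely.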

Safe Abstractions

Building Safe APIs on Unsafe Foundations

Create safe interfaces that encapsulate unsafe code:

use std::ptr;
use std::alloc::{alloc, dealloc, Layout};

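// Safe vector implementation using unsafe code internally
// Note: this sketch does not handle zero-sized types, for which calling
// `alloc` with a zero-size layout is undefined behavior
pub struct SafeVec<T> {
    ptr: *mut T,
    len: usize,
    capacity: usize,
}

impl<T> SafeVec<T> {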
    pub fn new() -> Self {
        SafeVec {
            ptr: ptr::null_mut(),
            len: 0,
            capacity: 0,
        }
    }
    
    pub fn with_capacity(capacity: usize) -> Self {
        if capacity == 0 {
            return Self::new();
        }
        
        let layout = Layout::array::<T>(capacity).unwrap();
        let ptr = unsafe { alloc(layout) as *mut T };
        
        if ptr.is_null() {
            panic!("Allocation failed");
        }
        
        SafeVec {
            ptr,
            len: 0,
            capacity,
        }
    }
    
    pub fn push(&mut self, value: T) {
        if self.len == self.capacity {
            self.resize();
        }
        
        unsafe {
            ptr::write(self.ptr.add(self.len), value);
        }
        self.len += 1;
    }
    
    pub fn pop(&mut self) -> Option<T> {
        if self.len == 0 {
            None
        } else {
            self.len -= 1;
            unsafe {
                Some(ptr::read(self.ptr.add(self.len)))
            }
        }
    }
    
    pub fn get(&self, index: usize) -> Option<&T> {
        if index < self.len {
            unsafe {
                Some(&*self.ptr.add(index))
            }
        } else {
            None
        }
    }
    
    pub fn len(&self) -> usize {
        self.len
    }
    
    pub fn capacity(&self) -> usize {
        self.capacity
    }
    
    fn resize(&mut self) {
        let new_capacity = if self.capacity == 0 { 1 } else { self.capacity * 2 };
        
        let new_layout = Layout::array::<T>(new_capacity).unwrap();
        let new_ptr = unsafe { alloc(new_layout) as *mut T };
        
        if new_ptr.is_null() {
            panic!("Allocation failed");
        }
        
        unsafe {
            if !self.ptr.is_null() {
                // Copy existing elements
                ptr::copy_nonoverlapping(self.ptr, new_ptr, self.len);
                
                // Deallocate old memory
                let old_layout = Layout::array::<T>(self.capacity).unwrap();
                dealloc(self.ptr as *mut u8, old_layout);
            }
        }
        
        self.ptr = new_ptr;
        self.capacity = new_capacity;
    }
}

impl<T> Drop for SafeVec<T> {
    fn drop(&mut self) {
        // Drop all elements
        while let Some(_) = self.pop() {}
        
        // Deallocate memory
        if !self.ptr.is_null() {
            unsafe {
                let layout = Layout::array::<T>(self.capacity).unwrap();
                dealloc(self.ptr as *mut u8, layout);
            }
        }
    }
}

// Safe usage of unsafe code
fn main() {
    let mut vec = SafeVec::new();
    
    // All operations are safe from the user's perspective
    vec.push(1);
    vec.push(2);
    vec.push(3);
    
    println!("Length: {}", vec.len());
    println!("Capacity: {}", vec.capacity());
    
    while let Some(value) = vec.pop() {
        println!("Popped: {}", value);
    }
    
    // Demonstrate safety - this returns None instead of crashing
    println!("Get index 0 from empty vec: {:?}", vec.get(0));
}
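
The same idea of a safe API over unsafe internals also works without manual allocation. As a sketch under that assumption, the hypothetical `FixedStack` below stores its elements inline in an array of `MaybeUninit<T>` slots. The invariant it maintains is that exactly the first `len` slots are initialized, and every unsafe block relies on that.

```rust
use std::mem::MaybeUninit;

/// Hypothetical fixed-capacity stack. Invariant: slots[..len] are initialized.
pub struct FixedStack<T, const N: usize> {
    slots: [MaybeUninit<T>; N],
    len: usize,
}

impl<T, const N: usize> FixedStack<T, N> {
    pub fn new() -> Self {
        // SAFETY: an array of MaybeUninit requires no initialization.
        let slots = unsafe { MaybeUninit::<[MaybeUninit<T>; N]>::uninit().assume_init() };
        FixedStack { slots, len: 0 }
    }

    /// Pushes a value, handing it back if the stack is full.
    pub fn push(&mut self, value: T) -> Result<(), T> {
        if self.len == N {
            return Err(value);
        }
        self.slots[self.len].write(value);
        self.len += 1;
        Ok(())
    }

    pub fn pop(&mut self) -> Option<T> {
        if self.len == 0 {
            return None;
        }
        self.len -= 1;
        // SAFETY: the slot was initialized by `push`; `len` was decremented
        // first, so the slot is never read again.
        Some(unsafe { self.slots[self.len].assume_init_read() })
    }

    pub fn len(&self) -> usize {
        self.len
    }
}

impl<T, const N: usize> Drop for FixedStack<T, N> {
    fn drop(&mut self) {
        // Drop the remaining initialized elements exactly once.
        while self.pop().is_some() {}
    }
}

fn main() {
    let mut stack: FixedStack<String, 2> = FixedStack::new();
    stack.push("a".to_string()).unwrap();
    stack.push("b".to_string()).unwrap();
    println!("full, rejected: {:?}", stack.push("c".to_string()));
    println!("popped: {:?}, remaining: {}", stack.pop(), stack.len());
}
```

Users of the type cannot break the invariant, because `slots` and `len` are private and every public method preserves it. That privacy boundary is what makes the API safe.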

Best Practices

Guidelines for Safe Unsafe Code

// Good: Clear documentation and minimal unsafe scope
/// Sums elements in a slice using unsafe pointer arithmetic.
/// 
/// # Safety
/// 
/// This function is safe because it only accesses elements within
/// the slice bounds, which are guaranteed by the slice type.
fn unsafe_sum_slice(slice: &[i32]) -> i32 {
    let mut sum = 0;
    let len = slice.len();
    
    if len == 0 {
        return 0;
    }
    
    let ptr = slice.as_ptr();
    
    // SAFETY: We know ptr is valid for `len` elements because it comes from a slice
    unsafe {
        for i in 0..len {
            sum += *ptr.add(i);
        }
    }
    
    sum
}

// Bad: Large unsafe block without clear justification
fn bad_example(data: &mut [i32]) {
    unsafe {
        // Too much code in unsafe block
        let ptr = data.as_mut_ptr();
        let len = data.len();
        
        for i in 0..len {
            *ptr.add(i) *= 2;
        }
        
        // More operations...
        if len > 0 {
            *ptr = 100;
        }
        
        // Even more code...
    }
}
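
Nothing in `bad_example` actually needs unsafe code. As a sketch, the hypothetical `double_then_mark_first` below performs the same two steps with safe slice methods, which is usually the first option to try before writing an unsafe block.

```rust
/// Doubles every element, then overwrites the first element with 100.
/// Mirrors the operations in `bad_example` without any unsafe code.
fn double_then_mark_first(data: &mut [i32]) {
    for value in data.iter_mut() {
        *value *= 2;
    }
    // `first_mut` returns None for an empty slice, replacing the `len > 0` check.
    if let Some(first) = data.first_mut() {
        *first = 100;
    }
}

fn main() {
    let mut data = [1, 2, 3];
    double_then_mark_first(&mut data);
    println!("{:?}", data);
}
```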

Common Pitfalls

Mistakes to Avoid

Checks for Understanding

  1. What are the five things you can do in unsafe Rust that you can't do in safe Rust?
  2. When must a function be marked as unsafe, and when can a function that uses unsafe blocks stay safe?
  3. What's the difference between *const T and *mut T?
  4. How do you create a safe abstraction over unsafe code?

Answers

  1. Dereference raw pointers, call unsafe functions, access/modify static mutable variables, implement unsafe traits, access union fields
  2. A function must be marked unsafe when callers have to uphold invariants the compiler cannot check; calling it then requires an explicit unsafe block. A function that upholds those invariants itself can keep its unsafe blocks behind a safe signature
  3. Both are raw pointers with no validity or aliasing guarantees. *const T does not allow writes through the pointer; *mut T does
  4. Encapsulate unsafe operations in functions that maintain safety invariants and expose only safe APIs to users