reverse engineering arm64 hypervisor internals, exploring the el2 abstraction

Posted: Dec 15 2024

i’ve spent a bit of time analyzing apple’s hypervisor framework on m-series silicon. what started as just curiosity kinda turned into an obsession with understanding how these translation layers between high-level virtualization apis and the bare-metal hardware primitives actually function.

the el2 privilege boundary

arm64’s exception level hierarchy creates isolation boundaries with el2 carved out specifically for hypervisor operations. what’s particularly cool is seeing how hypervisor.framework abstracts away the complex VMSA constructs like stage-2 page tables while still delivering good performance for guest vms.
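to make "stage-2 page tables" a bit more concrete, here’s a minimal sketch of how an intermediate physical address gets sliced into per-level table indices. the function name `s2_table_index` is my own, and it assumes the plain 4KB or 16KB granule layout with 8-byte descriptors. it ignores the concatenated top-level tables that stage 2 allows, so treat it as an illustration rather than a description of what the framework does.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Index into the translation table at `level` (0..3) for a given IPA.
 * granule_shift is log2 of the page size: 12 for 4KB, 14 for 16KB.
 * Each table holds 2^(granule_shift - 3) eight-byte descriptors.
 * Sketch only: stage 2 can concatenate tables at the starting level,
 * which widens the top-level index; that case is ignored here.
 */
static unsigned s2_table_index(uint64_t ipa, unsigned granule_shift, unsigned level) {
    assert(level <= 3);
    assert(granule_shift == 12 || granule_shift == 14);
    unsigned bits_per_level = granule_shift - 3;
    unsigned shift = granule_shift + (3 - level) * bits_per_level;
    return (unsigned)((ipa >> shift) & ((1ULL << bits_per_level) - 1));
}
```

for a guest address of 0x10000000 this gives a level-2 index of 128 with a 4KB granule and 8 with a 16KB granule, with a level-3 index of 0 in both cases.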

i spent a few weeks reverse engineering the framework. the most intriguing part was the memory-mapping mechanism that ties guest physical addresses to host virtual addresses:

// this is a simplified interpretation, not actual apple code.
// get_physical_address() and update_stage2_tables() are stand-in helpers.
static int map_guest_memory(hv_vcpu_t vcpu, hv_ipa_t gpa, void *hva, size_t len, hv_memory_flags_t flags) {
    // stage-2 translation table entry components based on ARM architecture
    uint64_t s2tte = 0;
    uint64_t phys_addr = 0;
    
    // in a real implementation the hypervisor would convert HVA to PA here
    phys_addr = get_physical_address(hva);
    
    // these are based on the ARM architecture reference manual
    const uint64_t S2TTE_VALID = 0x1;
    const uint64_t S2TTE_TABLE = 0x3;  // For table entries
    const uint64_t S2TTE_PAGE = 0x3;   // For page entries
    const uint64_t S2TTE_AF = (1ULL << 10); // Access flag
    const uint64_t S2TTE_AP_RW = (3ULL << 6); // S2AP = 0b11: read-write
    const uint64_t S2TTE_AP_RO = (1ULL << 6); // S2AP = 0b01: read-only (0b00 = no access)
    const uint64_t S2TTE_XN = (1ULL << 54); // Execute-never
    
    s2tte = phys_addr & ~(0xFFFULL); // 4KB aligned for simplicity; macOS on apple silicon uses 16KB pages
    s2tte |= S2TTE_VALID | S2TTE_PAGE | S2TTE_AF;
    
    if (flags & HV_MEMORY_READ) {
        if (flags & HV_MEMORY_WRITE) {
            s2tte |= S2TTE_AP_RW;
        } else {
            s2tte |= S2TTE_AP_RO;
        }
    }
    
    if (!(flags & HV_MEMORY_EXEC)) {
        s2tte |= S2TTE_XN;
    }
    
    return update_stage2_tables(vcpu, gpa, s2tte, 0); // 0 = 4KB page level
}

i find it frankly elegant how the hypervisor framework bridges the gap between the high-level memory-mapping api and the arm64 translation table format.
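a quick way to sanity-check a descriptor layout like this is a standalone encoder plus decoder that round-trips. the sketch below assumes a 4KB granule, S2AP in bits [7:6] (bit 6 enables reads, bit 7 enables writes), and the single-bit XN form at bit 54. the `PERM_*` flags and the `s2_*` function names are made up for this example and are not the Hypervisor.framework constants.

```c
#include <assert.h>
#include <stdint.h>

/* hypothetical permission flags for this sketch, not the Hypervisor.framework constants */
enum { PERM_R = 1u << 0, PERM_W = 1u << 1, PERM_X = 1u << 2 };

#define S2_PAGE     0x3ULL                 /* bits [1:0] = 0b11: valid page descriptor */
#define S2_S2AP_R   (1ULL << 6)            /* S2AP[0]: reads allowed */
#define S2_S2AP_W   (1ULL << 7)            /* S2AP[1]: writes allowed */
#define S2_AF       (1ULL << 10)           /* access flag */
#define S2_XN       (1ULL << 54)           /* execute-never (single-bit form) */
#define S2_OA_MASK  0x0000FFFFFFFFF000ULL  /* output address bits [47:12] */

/* build a stage-2 page descriptor for a 4KB-aligned physical address */
static uint64_t s2_encode(uint64_t pa, unsigned perms) {
    assert((pa & ~S2_OA_MASK) == 0);       /* must be aligned and fit in 48 bits */
    uint64_t desc = pa | S2_PAGE | S2_AF;
    if (perms & PERM_R)    desc |= S2_S2AP_R;
    if (perms & PERM_W)    desc |= S2_S2AP_W;
    if (!(perms & PERM_X)) desc |= S2_XN;
    return desc;
}

/* recover the permission flags; anything that isn't a valid page entry means no access */
static unsigned s2_decode_perms(uint64_t desc) {
    if ((desc & 0x3) != S2_PAGE) return 0;
    unsigned perms = 0;
    if (desc & S2_S2AP_R)  perms |= PERM_R;
    if (desc & S2_S2AP_W)  perms |= PERM_W;
    if (!(desc & S2_XN))   perms |= PERM_X;
    return perms;
}

/* recover the output address */
static uint64_t s2_decode_pa(uint64_t desc) {
    return desc & S2_OA_MASK;
}
```

with these definitions, a read-write, non-executable page at 0x80001000 encodes to 0x00400000800014C3, and decoding it hands back the same address and permissions. memory-type and shareability attributes are left out to keep it short.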

memory management complexities

i noticed some edge cases in the memory management subsystem. when you start playing with overlapping memory regions the hypervisor has to perform some serious gymnastics to track references and permissions correctly:

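// needs <Hypervisor/Hypervisor.h>, <sys/mman.h>, <stdint.h> and <stdio.h>
void explore_memory_mapping(void) {
    // on arm64 there is one vm per process, so there's no vm handle to pass around
    if (hv_vm_create(NULL) != HV_SUCCESS) return;

    const size_t MB = 1024 * 1024;
    const size_t PAGE = 16 * 1024; // apple silicon uses 16KB pages
    size_t region_size = 16 * MB;
    uint8_t *host_region = mmap(NULL, region_size, PROT_READ | PROT_WRITE,
                                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (host_region == MAP_FAILED) return;

    hv_ipa_t guest_base = 0x10000000;
    hv_return_t ret = hv_vm_map(host_region, guest_base, region_size,
                                HV_MEMORY_READ | HV_MEMORY_WRITE);
    printf("base map: %d\n", (int)ret);

    // here i create overlapping sub-regions with different permissions
    for (int i = 0; i < 16; i++) {
        size_t offset = i * MB;
        size_t subregion_size = MB + PAGE; // one page larger than 1MB
        if (offset + subregion_size > region_size)
            subregion_size = region_size - offset; // don't run past the host buffer

        // the framework may reject overlapping requests, so log every result
        ret = hv_vm_map(host_region + offset, guest_base + offset, subregion_size,
                        HV_MEMORY_READ | HV_MEMORY_WRITE | HV_MEMORY_EXEC);
        printf("sub-region %d map: %d\n", i, (int)ret);
    }

    // unmap alternating regions
    for (int i = 0; i < 16; i += 2) {
        ret = hv_vm_unmap(guest_base + i * MB, MB);
        printf("sub-region %d unmap: %d\n", i, (int)ret);
    }

    // at this point, whichever requests were accepted leave partially
    // overlapping regions with different permissions
}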

this kind of pattern creates some fascinating challenges for the hypervisor’s internal state tracking, particularly in how it has to maintain reference counts when overlapping regions carry different permissions or get unmapped in weird, partial chunks.
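to get a feel for the bookkeeping, here’s a toy per-page reference counter that replays the pattern above in units of 16KB pages. everything in it (`page_tracker_t`, `tracker_map`, `replay_pattern`, and so on) is hypothetical. it assumes every accepted mapping simply adds one reference per page, that the "slightly larger than 1MB" sub-regions round to 1MB plus one page, and that the last one is clamped to the region. it says nothing about how apple actually tracks this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_PAGES 1024u   /* 16MB region / 16KB pages */

typedef struct { uint16_t refs[NUM_PAGES]; } page_tracker_t;

/* add one reference to every page in [first, first + count); -1 if out of range */
static int tracker_map(page_tracker_t *t, size_t first, size_t count) {
    if (count == 0 || first >= NUM_PAGES || count > NUM_PAGES - first) return -1;
    for (size_t p = first; p < first + count; p++) t->refs[p]++;
    return 0;
}

/* drop one reference from every page in the range, all or nothing */
static int tracker_unmap(page_tracker_t *t, size_t first, size_t count) {
    if (count == 0 || first >= NUM_PAGES || count > NUM_PAGES - first) return -1;
    for (size_t p = first; p < first + count; p++)
        if (t->refs[p] == 0) return -1;   /* would underflow: reject the whole request */
    for (size_t p = first; p < first + count; p++) t->refs[p]--;
    return 0;
}

/* how many pages currently hold exactly `refs` references */
static size_t tracker_count(const page_tracker_t *t, uint16_t refs) {
    size_t n = 0;
    for (size_t p = 0; p < NUM_PAGES; p++)
        if (t->refs[p] == refs) n++;
    return n;
}

/* replay the map/unmap pattern from explore_memory_mapping() in page units */
static void replay_pattern(page_tracker_t *t) {
    const size_t chunk = 64;              /* 1MB expressed in 16KB pages */
    tracker_map(t, 0, NUM_PAGES);
    for (size_t i = 0; i < 16; i++) {
        size_t first = i * chunk;
        size_t count = chunk + 1;         /* 1MB plus one page */
        if (count > NUM_PAGES - first) count = NUM_PAGES - first;
        tracker_map(t, first, count);
    }
    for (size_t i = 0; i < 16; i += 2)
        tracker_unmap(t, i * chunk, chunk);
}
```

in this model the replay leaves 505 pages with one reference, 511 with two, and 8 with three. the pages with three are the single-page overlaps at the start of each odd megabyte, which is exactly the sort of boundary where an off-by-one in real tracking code would hide.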

theoretical security considerations

the complexity of memory management in hypervisors raises some incredibly interesting security questions. in any sophisticated hypervisor implementation, you have to consider:

the precision of reference counting algorithms when handling overlapping memory regions
coherent permission tracking across multiple mappings of the same physical memory pages
subtle race conditions during concurrent mapping/unmapping operations
the correctness of partially unmapped page handling

these aren’t unique to apple’s implementation tbh; they’re fundamental challenges in hypervisor design. the arm64 architecture provides robust hardware isolation through el2, but the software layers orchestrating these hardware features require meticulous attention to these subtle edge cases.
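the partial-unmap case is the easiest of these to show in code. below is a minimal sketch of cutting a range out of a tracked region, which can leave zero, one, or two surviving pieces. the `region_t` type and `region_subtract` function are invented for illustration and assume regions are tracked as half-open byte ranges with one permission value each.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t start;   /* inclusive */
    uint64_t end;     /* exclusive */
    unsigned perms;
} region_t;

/*
 * remove [cut_start, cut_end) from r and write the surviving pieces to out.
 * returns how many pieces survive: 0, 1 or 2. survivors keep r's permissions.
 */
static int region_subtract(region_t r, uint64_t cut_start, uint64_t cut_end,
                           region_t out[2]) {
    assert(r.start < r.end && cut_start < cut_end);
    int n = 0;
    if (cut_end <= r.start || cut_start >= r.end) {   /* no overlap: r is untouched */
        out[n++] = r;
        return n;
    }
    if (cut_start > r.start) {                        /* left remainder */
        out[n] = r;
        out[n].end = cut_start;
        n++;
    }
    if (cut_end < r.end) {                            /* right remainder */
        out[n] = r;
        out[n].start = cut_end;
        n++;
    }
    return n;
}
```

unmapping the first 1MB of a sub-region that is 1MB plus one 16KB page leaves a single-page sliver that still carries the original permissions, and an unmap in the middle of a region splits it in two. every one of those leftover pieces is state the hypervisor has to keep consistent.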

the most fascinating aspect to me is the gap between architectural specifications and actual implementations.

future research directions

this exploration has opened up several tantalizing research paths i’m itching to pursue:

automated fuzzing of hypervisor memory management edge cases
formal verification approaches for reference counting algorithms
comparative performance analysis of different memory mapping strategies
architectural differences in hypervisor implementations across platforms

i’m particularly interested in developing new methodologies for testing hypervisor memory management subsystems.
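as a starting point for the fuzzing idea, here’s a rough sketch of a deterministic operation generator. it only produces page-aligned map/unmap/protect requests inside a fixed window, so that a failing seed can be replayed exactly. all the names (`fuzz_op_t`, `next_op`, `count_malformed`) are hypothetical, the 16KB page size and 16MB window are assumptions, and actually feeding the ops to `hv_vm_map`, `hv_vm_unmap` and `hv_vm_protect` is left out.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE     0x4000ULL   /* 16KB, assumed */
#define REGION_PAGES  1024ULL     /* 16MB window to fuzz inside */

typedef enum { OP_MAP, OP_UNMAP, OP_PROTECT } op_kind_t;

typedef struct {
    op_kind_t kind;
    uint64_t  offset;   /* byte offset into the window, page aligned */
    uint64_t  size;     /* byte length, page aligned, never past the window */
    unsigned  perms;    /* bit 0 = read, bit 1 = write, bit 2 = exec */
} fuzz_op_t;

/* xorshift64: tiny deterministic PRNG; the state must never be zero */
static uint64_t next_rand(uint64_t *state) {
    assert(*state != 0);
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/* draw the next operation; ranges start anywhere and may overlap earlier ones */
static fuzz_op_t next_op(uint64_t *state) {
    fuzz_op_t op;
    op.kind = (op_kind_t)(next_rand(state) % 3);
    uint64_t first = next_rand(state) % REGION_PAGES;
    uint64_t count = 1 + next_rand(state) % (REGION_PAGES - first);
    op.offset = first * PAGE_SIZE;
    op.size   = count * PAGE_SIZE;
    op.perms  = (unsigned)(next_rand(state) % 8);
    return op;
}

/* invariants every generated op should satisfy */
static int op_is_well_formed(const fuzz_op_t *op) {
    return op->size > 0
        && op->offset % PAGE_SIZE == 0
        && op->size % PAGE_SIZE == 0
        && op->offset + op->size <= REGION_PAGES * PAGE_SIZE;
}

/* generate n ops from seed and count any that break the invariants above */
static unsigned count_malformed(uint64_t seed, unsigned n) {
    uint64_t state = seed;
    unsigned bad = 0;
    for (unsigned i = 0; i < n; i++) {
        fuzz_op_t op = next_op(&state);
        if (!op_is_well_formed(&op)) bad++;
    }
    return bad;
}
```

the interesting part would come after this: running the same op stream against both the real framework and a simple reference model, then diffing the results. keeping the generator seed-driven is what makes any mismatch reproducible.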

this started as a fun side project and it has been a fascinating journey. the memory management subsystem in particular represents a masterclass in careful design, requiring sophisticated handling of permissions, references, and address translations.

my exploration is fundamentally based on reverse engineering and experimentation, but i believe it highlights the importance of understanding the implementation details beneath our convenient abstraction layers. as virtualization becomes more mainstream and more central to modern computing infrastructure, these details carry significant implications for both performance engineering and security architecture.

i’ll be publishing more detailed findings if this work progresses. if you’re working on something similar or are interested in collaborating on any of these research directions, feel free to reach out.