Stefan's CV

When we write in high-level languages, we’re abstracted away from what the machine is actually seeing. But underneath every Python script, every Java application, and every C program lies assembly language - the last human-readable layer before raw machine code.

I’m not here to discuss compilation in great detail (there are plenty of great courses on that), but rather something else. Confusingly, when people mention Assembly in passing, they typically talk about it the same way we talk about C, as if it were one language. But as I learnt through doing “Assembly” (ARM) at university, “Assembly” (x86/x64) for a personal project, and “Assembly” (MIPS) in industry, Assembly is more of a class of “languages” than a single language in itself.

I’d like to explore the three dominant assembly language architectures: MIPS, ARM, and x86. Each represents a different philosophy in processor design, shaped by decades of evolution and vastly different design goals. By looking at how each architecture approaches the same fundamental operations, we can better understand not just how processors work, but why they work the way they do.

The Three Architectures

Let’s begin by getting a quick intro to the three contenders.

MIPS: The Academic Standard

MIPS represents RISC design in its purest form. Developed in the early 1980s at Stanford University, MIPS (Microprocessor without Interlocked Pipeline Stages) prioritizes simplicity and orthogonality. Every instruction is exactly 32 bits long, and the instruction set is remarkably regular - there are very few special cases to remember.

This consistency makes MIPS particularly valuable for education. When you’re learning how pipelining works or understanding hazards, MIPS’s transparency helps you see exactly what’s happening at each stage.
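To make that regularity concrete, here is a small Python sketch that packs the fields of add $t0, $t1, $t2 into one 32-bit word. It is an illustration, not a real assembler, and the function and table names are made up. It assumes the standard register numbers ($t0-$t2 are registers 8-10) and add’s function code of 0x20. Other R-type instructions differ mainly in that funct field.

```python
# Register numbers for the names used below ($t0-$t7 are registers 8-15).
REGS = {"$t0": 8, "$t1": 9, "$t2": 10}

ADD_FUNCT = 0x20  # funct field for 'add'; all R-type instructions use opcode 0


def encode_r_type(funct: int, rd: int, rs: int, rt: int, shamt: int = 0) -> int:
    """Pack opcode(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6) into 32 bits."""
    for name, value, bits in (("rd", rd, 5), ("rs", rs, 5), ("rt", rt, 5),
                              ("shamt", shamt, 5), ("funct", funct, 6)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
    opcode = 0
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


# add $t0, $t1, $t2  ->  rd = $t0, rs = $t1, rt = $t2
word = encode_r_type(ADD_FUNCT, rd=REGS["$t0"], rs=REGS["$t1"], rt=REGS["$t2"])
print(f"{word:#010x}")  # 0x012a4020
```

Every instruction has this fixed width, so the processor always knows where the next instruction starts. That is part of what keeps the MIPS pipeline so easy to reason about.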

ARM: Efficiency Through Evolution

ARM began at Acorn Computers in the 1980s, but its modern incarnation - particularly the 64-bit AArch64 we’ll focus on - represents decades of refinement. ARM processors dominate mobile devices and are making significant inroads into the server market.

What makes ARM interesting is how it balances RISC principles with practical optimizations. While maintaining fixed-length instructions and a load-store architecture, ARM includes features like sophisticated addressing modes and a barrel shifter that can transform operands on the fly. The architecture manages to be both simple and efficient - a difficult balance to strike.

x86: The Survivor

The x86 architecture tells a different story - one of evolution through accretion. Starting with Intel’s 8086 in 1978, x86 has grown through successive extensions: 16-bit to 32-bit (386) to 64-bit (x86-64/AMD64). It’s a CISC architecture, meaning single instructions can perform complex operations.

Where MIPS and ARM were designed, x86 evolved. This shows in its variable-length instructions (1 to 15 bytes), complex addressing modes, and occasionally bizarre conventions. Yet x86’s complexity enables remarkable code density and performance optimizations that keep it competitive despite its baggage.

The Evolution of Register Names: A Historical Perspective

The naming conventions for registers reveal much about each architecture’s history and design philosophy.

x86: An Archaeological Record

The x86 register names are essentially an archaeological record of computing history. In 1978, Intel’s 8086 introduced 16-bit registers with purposeful names:

AX - The Accumulator, inherited from earlier processors where most arithmetic operations involved a special accumulator register
CX - The Count register, automatically used by loop and string instructions
DX - The Data register, used for I/O port addressing and extended precision arithmetic
BX - The Base register, designed for base addressing in segmented memory models

Interestingly, although many sources list them alphabetically, the A, C, D, B order above is the one the 8086 actually uses. It is the order of the registers’ numbers in the instruction encoding (AX=0, CX=1, DX=2, BX=3), due to some fun history.

Each could be split into high and low bytes (AH/AL, BH/BL, etc.), allowing byte-level operations when memory was precious.
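That register order is visible in the machine code itself. The one-byte PUSH instructions, for example, are 0x50 plus a 3-bit register number, and those numbers run AX, CX, DX, BX. The byte registers reuse the same numbers: AL, CL, DL, BL, then AH, CH, DH, BH when no REX prefix is present. Here is a Python lookup-table sketch of that encoding, with illustrative names. In 64-bit mode the same opcodes push RAX, RCX, RDX and RBX.

```python
# 3-bit register numbers used in x86 instruction encodings, in encoding order.
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]
# The byte registers reuse the same 3-bit numbers: low bytes first, then high bytes.
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]


def push_opcode(reg: str) -> int:
    """Opcode byte of the one-byte 'PUSH reg' form: 0x50 + register number."""
    return 0x50 + REG16.index(reg)


for name in ("AX", "CX", "DX", "BX"):
    print(name, hex(push_opcode(name)))   # AX 0x50, CX 0x51, DX 0x52, BX 0x53
```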

When Intel introduced the 80386 in 1985, they needed 32-bit registers but wanted to maintain backward compatibility. Their solution? Prefix the names with ‘E’ for Extended: EAX, EBX, ECX, EDX. The lower 16 bits remained accessible as AX, maintaining perfect compatibility with older code.

AMD faced the same challenge when designing x86-64 in the early 2000s. They chose ‘R’ as the prefix (RAX, RBX, etc.), with ‘R’ simply meaning Register. But when adding new registers beyond the original eight, they abandoned the letter scheme entirely, opting for R8 through R15. This pragmatic shift from mnemonic to numeric naming reflects the exhaustion of meaningful letter combinations.

ARM’s Systematic Approach

ARM’s evolution shows more deliberate planning. The 32-bit architecture used a straightforward R0-R15 naming scheme. When ARM designed AArch64, they made a clean break: X0-X30 for 64-bit registers, with W0-W30 providing 32-bit views of the same registers. This systematic approach avoids the historical baggage of x86 while maintaining clarity.

MIPS: Consistency from the Start

MIPS, being designed rather than evolved, maintains the most consistent naming: $0-$31, with meaningful aliases ($zero, $sp, $ra) that indicate conventional usage. The hardware enforces only two of these roles: $0 is hardwired to zero, and jal writes the return address into $31. The rest are conventions.

Register Lineup

Let’s see what we’re working with in each architecture:

MIPS Registers

$0 / $zero    - Hardwired to zero (always returns 0, writes are ignored)
$1 / $at      - Reserved for the assembler
$2-$3 / $v0-$v1 - Return values
$4-$7 / $a0-$a3 - Function arguments
$8-$15 / $t0-$t7 - Temporary (caller-saved)
$16-$23 / $s0-$s7 - Saved (callee-saved)
$24-$25 / $t8-$t9 - More temporaries
$26-$27 / $k0-$k1 - Reserved for kernel
$28 / $gp     - Global pointer
$29 / $sp     - Stack pointer
$30 / $fp     - Frame pointer
$31 / $ra     - Return address

ARM64/AArch64 Registers

X0-X7    - Arguments and results (X0 is also the return value)
X8       - Indirect result location
X9-X15   - Temporary registers
X16-X17  - Intra-procedure scratch (IP0, IP1)
X18      - Platform register (OS reserved)
X19-X28  - Callee-saved registers
X29/FP   - Frame pointer
X30/LR   - Link register (return address)
SP       - Stack pointer (not part of the general register file!)
PC       - Program counter (can't directly access)
XZR      - Zero register (always reads 0)

W0-W30   - 32-bit views of X0-X30 (lower 32 bits)

x86-64 Registers

RAX - Accumulator (return value, multiply/divide target)
RBX - Base (callee-saved, general use)
RCX - Counter (loop counts, 4th argument in Linux)
RDX - Data (3rd argument in Linux, high bits of multiply)
RSI - Source Index (2nd argument in Linux)
RDI - Destination Index (1st argument in Linux)
RBP - Base Pointer (frame pointer)
RSP - Stack Pointer
R8-R15 - "New" registers added with x86-64

And you can access smaller portions:
RAX → EAX (32-bit) → AX (16-bit) → AH/AL (8-bit each)
R8 → R8D (32-bit) → R8W (16-bit) → R8B (8-bit)
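Those smaller names are views onto the same register, not separate storage. Here is a quick Python model that treats RAX as a plain 64-bit integer; the helper name is illustrative. It shows which bits each name selects when you read it:

```python
def views(rax: int) -> dict[str, int]:
    """Return what each x86-64 name for the same register would read."""
    rax &= (1 << 64) - 1                  # keep the model to 64 bits
    return {
        "RAX": rax,
        "EAX": rax & 0xFFFF_FFFF,         # bits 0-31
        "AX": rax & 0xFFFF,               # bits 0-15
        "AH": (rax >> 8) & 0xFF,          # bits 8-15
        "AL": rax & 0xFF,                 # bits 0-7
    }


for name, value in views(0x1122_3344_5566_7788).items():
    print(f"{name:>3} = {value:#x}")
# prints RAX, EAX, AX, AH, AL as 0x1122334455667788, 0x55667788, 0x7788, 0x77, 0x88
```

R8-R15 follow the same pattern for R8D, R8W and R8B, but they have no AH-style high-byte view.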

Let’s Write Some Code!

Enough theory. Let’s see how these architectures handle common operations. (Apologies in advance for any mistakes in code)

Moving Data Around

The most basic operation - loading a value:

# MIPS - straightforward
li $t0, 100        # load immediate 100 into $t0

# ARM64 - equally simple
mov x0, #100       # 64-bit move
mov w0, #100       # 32-bit move (zeros upper 32 bits)

# x86-64 - same idea, different syntax
mov rax, 100       # 64-bit move
mov eax, 100       # 32-bit move (also zeros upper 32 bits!)

Loading from memory shows more differences:

# MIPS - base + offset addressing
lw $t0, 0($s0)     # load word from address in $s0

# ARM64 - square brackets for memory access
ldr x0, [x1]       # load 64-bit value
ldr w0, [x1]       # load 32-bit value

# x86-64 - also uses square brackets
mov rax, [rbx]     # load 64-bit value
mov eax, [rbx]     # load 32-bit value

Arithmetic

Addition is where you start seeing personality differences:

# MIPS - three-operand format (dst, src1, src2)
add $t0, $t1, $t2  # $t0 = $t1 + $t2

# ARM64 - same three-operand style
add x0, x1, x2     # x0 = x1 + x2

# x86-64 - two-operand (dst is also first source)
add rax, rbx       # rax = rax + rbx

x86’s two-operand format means it destroys one of its inputs. This seems limiting, but it makes the instruction encoding more compact. Classic CISC trade-off.

Division shows how different these architectures really are:

# MIPS - puts quotient and remainder in special registers
div $t1, $t2       # Lo = quotient, Hi = remainder
mflo $t0           # move quotient from Lo to $t0
mfhi $t3           # move remainder from Hi to $t3

# ARM64 - clean and simple (no remainder instruction)
sdiv x0, x1, x2    # x0 = x1 / x2 (signed)
msub x3, x0, x2, x1  # x3 = x1 - x0 * x2 (remainder)
udiv x0, x1, x2    # unsigned version

# x86-64 - the dividend is in rdx:rax (128-bit value!)
mov rax, 100       # low 64 bits of the dividend
cqo                # sign-extend rax into rdx (upper 64 bits)
mov rbx, 7         # divisor
idiv rbx           # rax = quotient, rdx = remainder

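x86’s division is weird because it’s designed for extended precision - you can divide a 128-bit number by a 64-bit number. That also means rdx must be set up before every division. For signed idiv, cqo fills rdx with copies of rax’s sign bit. For unsigned div, you clear it (e.g. xor edx, edx). Zeroing rdx before idiv only works for non-negative dividends, and if the quotient doesn’t fit in 64 bits, the CPU raises a divide error. The design made sense in the 1970s when registers were precious, but today it’s just a quirk we live with.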
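The rdx:rax pairing is easier to see in a model. Below is a Python sketch of idiv r64 that treats rdx:rax as one signed 128-bit dividend. The helper names are illustrative, and the CPU’s divide error is modeled as a Python exception. The sketch also shows why rdx must hold the sign extension of rax (what cqo produces) before a signed division: zeroing rdx only gives the right answer for non-negative dividends. One detail worth knowing is that Python’s // rounds toward negative infinity, while x86, ARM64 and MIPS all truncate toward zero, so the model truncates explicitly.

```python
MASK64 = (1 << 64) - 1


def to_signed(value: int, bits: int) -> int:
    """Reinterpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def cqo(rax: int) -> int:
    """Value cqo leaves in rdx: all ones if rax is negative, otherwise zero."""
    return MASK64 if to_signed(rax, 64) < 0 else 0


def idiv64(rdx: int, rax: int, divisor: int) -> tuple[int, int]:
    """Model 'idiv r64': signed rdx:rax / divisor -> (rax, rdx) = (quotient, remainder)."""
    dividend = to_signed(((rdx & MASK64) << 64) | (rax & MASK64), 128)
    d = to_signed(divisor, 64)
    if d == 0:
        raise ZeroDivisionError("#DE: divide by zero")
    q = abs(dividend) // abs(d)
    if (dividend < 0) != (d < 0):
        q = -q                        # truncate toward zero, not toward -infinity
    r = dividend - q * d              # remainder has the sign of the dividend
    if not -(1 << 63) <= q < (1 << 63):
        raise OverflowError("#DE: quotient does not fit in 64 bits")
    return q & MASK64, r & MASK64


rax = -100 & MASK64                   # mov rax, -100
q, r = idiv64(cqo(rax), rax, 7)       # cqo; idiv with divisor 7
print(to_signed(q, 64), to_signed(r, 64))   # -14 -2

q_bad, _ = idiv64(0, rax, 7)          # xor rdx, rdx instead of cqo
print(to_signed(q_bad, 64))           # a huge positive number, not -14
```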

Stack Operations Visualized

The stack is fundamental to function calls and local storage. Let’s examine how each architecture manipulates the stack, with visual representations of what happens in memory.

Push Operations

# MIPS - Manual stack manipulation
addi $sp, $sp, -4  # Decrement stack pointer
sw $t0, 0($sp)     # Store value

# Stack visualization (lower addresses at the top):
# Before:          After:
# ┌─────────┐      ┌─────────┐
# │ (empty) │      │  $t0    │ ←sp
# ├─────────┤      ├─────────┤
# │  data   │ ←sp  │  data   │
# └─────────┘      └─────────┘

# ARM (AArch64) - Pre-decrement addressing
str x0, [sp, #-16]! # Push x0 onto stack (pre-decrement)

# Stack visualization (lower addresses at the top):
# Before:          After:
# ┌─────────┐      ┌─────────┐
# │ (empty) │      │   x0    │ ←sp
# ├─────────┤      ├─────────┤
# │ (empty) │      │(padding)│
# ├─────────┤      ├─────────┤
# │  data   │ ←sp  │  data   │
# └─────────┘      └─────────┘
# sp moves down 16 bytes for one 8-byte register (16-byte alignment required)

# x86-64 - Dedicated push instruction
push rax           # Push rax onto stack

# Stack visualization (lower addresses at the top):
# Before:          After:
# ┌─────────┐      ┌─────────┐
# │ (empty) │      │   rax   │ ←rsp
# ├─────────┤      ├─────────┤
# │  data   │ ←rsp │  data   │
# ├─────────┤      ├─────────┤
# │  data   │      │  data   │
# └─────────┘      └─────────┘
# rsp moves down 8 bytes automatically

Pop Operations

# MIPS - Manual restoration
lw $t0, 0($sp)     # Load value
addi $sp, $sp, 4   # Increment stack pointer

# Stack visualization:
# Before:          After:
# ┌─────────┐      ┌─────────┐
# │  value  │ ←sp  │ (value) │
# ├─────────┤      ├─────────┤
# │  data   │      │  data   │ ←sp
# └─────────┘      └─────────┘
# $t0 = value

# ARM (AArch64) - Post-increment addressing
ldr x0, [sp], #16  # Pop from stack into x0

# Stack visualization:
# Before:          After:
# ┌─────────┐      ┌─────────┐
# │  value  │ ←sp  │ (value) │
# ├─────────┤      ├─────────┤
# │(padding)│      │(padding)│
# ├─────────┤      ├─────────┤
# │  data   │      │  data   │ ←sp
# └─────────┘      └─────────┘
# x0 = value (sp moves up 16 bytes)

# x86-64 - Dedicated pop instruction
pop rax            # Pop from stack into rax

# Stack visualization:
# Before:          After:
# ┌─────────┐      ┌─────────┐
# │  value  │ ←rsp │ (value) │
# ├─────────┤      ├─────────┤
# │  data   │      │  data   │ ←rsp
# └─────────┘      └─────────┘
# rax = value

The key observation here is that MIPS requires explicit pointer arithmetic, ARM cleverly combines memory access with pointer updates, and x86 wraps the entire operation into single dedicated instructions (push and pop). Each approach reflects the architecture’s design philosophy: MIPS values transparency, ARM values efficiency, and x86 values code density.
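All three share the same underlying convention: a full-descending stack. The stack pointer points at the most recently pushed item and moves toward lower addresses on a push. Here is a toy Python model of it. The Stack class, its 64-byte size, and the little-endian 8-byte words are assumptions for illustration, and values are unsigned. The optional stride mimics ARM64 reserving a 16-byte slot for a single 8-byte register.

```python
class Stack:
    """Toy full-descending stack over a bytearray, using little-endian words."""

    def __init__(self, size: int = 64, word: int = 8):
        self.mem = bytearray(size)
        self.word = word
        self.sp = size                # empty stack: sp sits just past the highest slot

    def push(self, value: int, stride: int = 0) -> None:
        """Move sp down first, then store. stride=0 means one word."""
        self.sp -= stride or self.word
        if self.sp < 0:
            raise MemoryError("stack overflow")
        self.mem[self.sp:self.sp + self.word] = value.to_bytes(self.word, "little")

    def pop(self, stride: int = 0) -> int:
        """Load first, then move sp back up; the old bytes stay in memory."""
        value = int.from_bytes(self.mem[self.sp:self.sp + self.word], "little")
        self.sp += stride or self.word
        return value


s = Stack()
s.push(0xAB)                          # like push rax
s.push(7, stride=16)                  # like str x0, [sp, #-16]!
print(s.sp)                           # 40
print(s.pop(stride=16), s.pop())      # 7 171
```

Note that pop only moves the pointer. The old bytes are still in memory, which is what the “(value)” cells in the diagrams represent.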

Branching and Conditions

This is where things get interesting:

# MIPS - compare and branch are often separate
slt $t0, $t1, $t2  # set $t0 to 1 if $t1 < $t2
beq $t0, $zero, skip  # branch if $t0 == 0

# ARM64 - sets condition flags
cmp x0, x1         # compare and set flags
b.eq label         # branch if equal
b.lt label         # branch if less than

# x86-64 - also uses flags
cmp rax, rbx       # compare and set flags
je label           # jump if equal
jl label           # jump if less

ARM64’s branch syntax (b.eq, b.lt) is pretty readable. x86 uses terse j-prefixed mnemonics (je = jump if equal, jl = jump if less, plus jne, jge, and so on). MIPS doesn’t have condition codes at all. beq and bne compare two registers directly, and any other test uses a comparison instruction like slt that puts its result in a register.
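Flags are simply bits derived from a subtraction whose result is thrown away. The Python sketch below models the four x86 flags that cmp sets and the conditions that je, jl and jb test. MIPS’s slt is included for contrast, and the function names are illustrative. ARM64’s N, Z, C and V flags work analogously, except that its C flag means “no borrow” after a subtraction, the inverse of x86’s CF.

```python
MASK64 = (1 << 64) - 1
SIGN64 = 1 << 63


def cmp_flags(a: int, b: int) -> dict[str, bool]:
    """Flags x86 'cmp a, b' sets: it computes a - b, keeps the flags, drops the result."""
    a, b = a & MASK64, b & MASK64
    diff = (a - b) & MASK64
    return {
        "ZF": diff == 0,                              # result was zero
        "SF": bool(diff & SIGN64),                    # result's sign bit
        "CF": a < b,                                  # unsigned borrow
        "OF": bool((a ^ b) & (a ^ diff) & SIGN64),    # signed overflow
    }


def je(f: dict[str, bool]) -> bool:
    return f["ZF"]


def jl(f: dict[str, bool]) -> bool:
    return f["SF"] != f["OF"]     # signed less-than


def jb(f: dict[str, bool]) -> bool:
    return f["CF"]                # unsigned below


def slt(a: int, b: int) -> int:
    """MIPS 'slt' on 32-bit registers: the comparison result goes into a register."""
    def signed(x: int) -> int:
        x &= 0xFFFF_FFFF
        return x - (1 << 32) if x & 0x8000_0000 else x
    return int(signed(a) < signed(b))


flags = cmp_flags(-1, 1)          # -1 is all ones as a 64-bit pattern
print(jl(flags), jb(flags))       # True False: less if signed, not below if unsigned
print(slt(-1, 1))                 # 1
```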

Complete Examples: Theory in Practice

Let’s examine how these architectural differences manifest in real code.

Array Summation

This fundamental operation reveals how each architecture approaches memory access and loop construction.

# MIPS Version
# a0 = array pointer, a1 = length, returns sum in v0
sum_array:
    li $v0, 0              # sum = 0
    li $t0, 0              # i = 0
loop:
    beq $t0, $a1, done     # if i == length, exit
    sll $t1, $t0, 2        # t1 = i * 4 (sizeof(int))
    add $t1, $a0, $t1      # t1 = &array[i]
    lw $t2, 0($t1)         # t2 = array[i]
    add $v0, $v0, $t2      # sum += array[i]
    addi $t0, $t0, 1       # i++
    j loop
done:
    jr $ra                 # return

# MIPS requires explicit calculation of array offsets.
# The shift-left-logical (sll) computes the byte offset,
# which must then be added to the base pointer.

# ARM64 Version
# x0 = array pointer, x1 = length, returns sum in x0
sum_array:
    mov x2, #0             # sum = 0
    mov x3, #0             # i = 0
loop:
    cmp x3, x1
    b.eq done
    ldr w4, [x0, x3, lsl #2]  # load array[i] with scaled index
    add w2, w2, w4         # sum += array[i]
    add x3, x3, #1         # i++
    b loop
done:
    mov x0, x2             # return sum
    ret

# ARM's scaled register addressing [x0, x3, lsl #2]
# performs array indexing in a single instruction.
# The processor automatically shifts the index left by 2.

# x86-64 Version (System V ABI)
# rdi = array pointer, rsi = length, returns sum in rax
sum_array:
    xor eax, eax           # sum = 0
    xor rcx, rcx           # i = 0
loop:
    cmp rcx, rsi
    je done
    mov edx, [rdi + rcx*4] # array[i] using SIB addressing
    add eax, edx           # sum += array[i]
    inc rcx                # i++
    jmp loop
done:
    ret

# x86-64's Scale-Index-Base (SIB) addressing mode
# [rdi + rcx*4] is even more flexible than ARM's,
# allowing base + (index × scale) + displacement
# in a single instruction.
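Underneath the syntax, all three versions compute the same byte address, base + i × 4, and load a 32-bit word from it. This Python sketch replays that loop over a bytes object standing in for memory, with struct.unpack_from playing the role of the load instruction. The buffer, the base offset and the little-endian layout are assumptions of the model.

```python
import struct


def sum_array(memory: bytes, base: int, length: int) -> int:
    """Sum `length` little-endian int32 values starting at byte offset `base`."""
    total = 0
    for i in range(length):                    # the loop counter (i)
        addr = base + (i << 2)                 # sll/add, lsl #2, or *4: byte offset of array[i]
        (value,) = struct.unpack_from("<i", memory, addr)   # the 32-bit load
        total = (total + value) & 0xFFFF_FFFF  # wraps like add w2 / add eax (MIPS add would trap)
    # Reinterpret the 32-bit result as signed, as a C 'int' return value would be.
    return total - (1 << 32) if total & 0x8000_0000 else total


memory = bytes(8) + struct.pack("<5i", 1, 2, 3, 4, 5)   # array starts at byte 8
print(sum_array(memory, 8, 5))   # 15
```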

Recursive Factorial: Stack Frame Management

Recursion provides an excellent lens through which to examine each architecture’s approach to function calls and stack management.

# MIPS - Explicit frame management
factorial:
    addi $sp, $sp, -8
    sw $ra, 4($sp)         # save return address
    sw $a0, 0($sp)         # save n
    
    slti $t0, $a0, 2       # t0 = (n < 2)
    beq $t0, $zero, recurse
    li $v0, 1              # base case: return 1
    j done
    
recurse:
    addi $a0, $a0, -1      # n - 1
    jal factorial          # recursive call
    lw $a0, 0($sp)         # restore n
    mul $v0, $v0, $a0      # result * n
    
done:
    lw $ra, 4($sp)
    addi $sp, $sp, 8
    jr $ra

# ARM64 - Paired operations
factorial:
    stp x29, x30, [sp, #-16]!  # save frame pointer and link register
    
    cmp x0, #1
    b.gt recurse
    mov x0, #1
    b done
    
recurse:
    str x0, [sp, #-16]!    # save n
    sub x0, x0, #1
    bl factorial           # branch and link
    ldr x1, [sp], #16      # restore n
    mul x0, x0, x1
    
done:
    ldp x29, x30, [sp], #16
    ret

# x86-64 - Hardware-assisted calls
factorial:
    push rbp
    mov rbp, rsp
    
    cmp rdi, 1
    jg recurse
    mov rax, 1
    jmp done
    
recurse:
    push rdi               # save n
    dec rdi
    call factorial
    pop rdi                # restore n
    imul rax, rdi          # result * n
    
done:
    pop rbp
    ret

MIPS requires manual management of every aspect of the stack frame. The return address must be explicitly saved before making a recursive call, as jal (jump and link) overwrites the $ra register.

ARM64’s stp and ldp instructions can save/restore register pairs, reducing instruction count. The link register (x30) serves the same purpose as MIPS’s $ra, but ARM’s addressing modes make stack operations more concise.

x86-64’s call instruction automatically pushes the return address onto the stack, and ret pops it, eliminating the need for explicit return address management. This hardware support for the call stack is a CISC characteristic that simplifies function call sequences.
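Stripped of register details, all three versions follow the same pattern. Before each recursive call they save the n they will need afterwards, and after the call returns they restore it and multiply. An iterative Python sketch with an explicit list as the stack makes that visible; the function name is illustrative. One difference from the assembly: Python integers never overflow, whereas a 64-bit register wraps for anything beyond 20!.

```python
def factorial_with_stack(n: int) -> int:
    """Replay the recursive factorial using an explicit stack of saved n values."""
    saved = []                 # stands in for the saved-n slots on the call stack
    while n >= 2:              # recursive case: n > 1
        saved.append(n)        # save n:       sw $a0 / str x0 / push rdi
        n -= 1                 # argument n-1: addi $a0 / sub x0 / dec rdi
    result = 1                 # base case:    li $v0, 1 / mov x0, #1 / mov rax, 1
    while saved:               # each pass is one recursive call returning
        result *= saved.pop()  # restore n and multiply: lw + mul / ldr + mul / pop + imul
    return result


print(factorial_with_stack(5))   # 120
```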

System Calls: Talking to the OS

System calls are where you really see the platform differences:

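# MIPS (SPIM simulator) - syscall numbers in $v0
li $v0, 1          # print_int syscall
move $a0, $t0      # integer to print
syscall

# ARM64 Linux - syscall numbers in x8
mov x8, #64        # write syscall
mov x0, #1         # fd 1 = stdout
adr x1, msg        # buffer address (msg: a hypothetical string label defined elsewhere)
mov x2, #len       # byte count (len: a hypothetical constant defined elsewhere)
svc #0             # supervisor call

# x86-64 Linux - syscall numbers in rax
mov rax, 1         # write syscall
mov rdi, 1         # fd 1 = stdout
lea rsi, [rip + msg]  # buffer address (same hypothetical msg label)
mov rdx, len       # byte count
syscall            # note: not "int 0x80"!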

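That old int 0x80 instruction you might have seen in 32-bit x86 code? In 64-bit code we use syscall instead, which is faster. int 0x80 still exists on most 64-bit Linux kernels, but it goes through the 32-bit ABI. That means different syscall numbers (write is 4 there, not 1), arguments in ebx, ecx and edx, and pointers truncated to 32 bits.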
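The same write(fd, buf, count) call is wired up differently on each platform, which is why int 0x80 and syscall are not interchangeable. The Python table below records the Linux conventions as plain data. The dictionary and helper are illustrative, but the registers and write numbers are the Linux ones. Note that x86-64 passes a fourth argument in r10 rather than rcx, because the syscall instruction itself overwrites rcx and r11.

```python
# Linux system call conventions: (number register, argument registers, trap, write's number)
LINUX_SYSCALL_ABI = {
    "x86-64": ("rax", ("rdi", "rsi", "rdx", "r10", "r8", "r9"), "syscall", 1),
    "i386": ("eax", ("ebx", "ecx", "edx", "esi", "edi", "ebp"), "int 0x80", 4),
    "arm64": ("x8", ("x0", "x1", "x2", "x3", "x4", "x5"), "svc #0", 64),
}


def write_setup(abi: str, fd, buf, count) -> list[str]:
    """Assembly-style register setup for write(fd, buf, count) under `abi`."""
    nr_reg, arg_regs, trap, nr_write = LINUX_SYSCALL_ABI[abi]
    lines = [f"{nr_reg} <- {nr_write}"]
    lines += [f"{reg} <- {val}" for reg, val in zip(arg_regs, (fd, buf, count))]
    return lines + [trap]


for abi in LINUX_SYSCALL_ABI:
    print(abi, "; ".join(write_setup(abi, 1, "msg", "len")))
# first line: x86-64 rax <- 1; rdi <- 1; rsi <- msg; rdx <- len; syscall
```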

Memory Addressing Modes: Philosophy in Silicon

The addressing modes available in each architecture reveal fundamental design philosophies.

MIPS: Simplicity Above All

Register: $t0
Immediate: 100
Base + Offset: 8($sp)

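For memory access, MIPS offers exactly one addressing mode: a base register plus a 16-bit signed offset. Branches and jumps add PC-relative and pseudo-direct forms for code addresses. This limitation is intentional - it simplifies hardware implementation and makes pipeline behavior predictable.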

ARM64: Pragmatic Flexibility

Register: x0
Immediate: #100
Register Indirect: [x0]
Base + Offset: [x0, #16]
Pre-indexed: [x0, #16]! (updates x0 before access)
Post-indexed: [x0], #16 (updates x0 after access)
Register offset: [x0, x1]
Scaled Register: [x0, x1, lsl #3]

ARM strikes a balance, offering enough addressing modes to handle common patterns efficiently without the complexity explosion of x86.

x86-64: Maximum Flexibility

Register: rax
Immediate: 100
Direct: [0x400000]
Register Indirect: [rax]
Base + Index: [rax + rbx]
Scale-Index-Base: [rax + rbx*8 + 16]
RIP-relative: [rip + label]

The x86-64 SIB byte enables encoding of base + (index × scale) + displacement in a single instruction. This complexity requires sophisticated decode hardware but enables extremely compact code for array and structure access.
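A typical use of the full base + (index × scale) + displacement form is an array stored at a fixed offset inside a structure. This Python sketch uses ctypes to lay out a hypothetical Record type, with two 8-byte header fields followed by an int64 array. It then checks that [rax + rbx*8 + 16], with rax holding the record’s address and rbx the index, lands exactly on data[i]:

```python
import ctypes

MASK64 = (1 << 64) - 1


class Record(ctypes.Structure):
    """Hypothetical layout: two 8-byte header fields, then an array of int64."""
    _fields_ = [
        ("id", ctypes.c_int64),
        ("flags", ctypes.c_int64),
        ("data", ctypes.c_int64 * 4),
    ]


def effective_address(base: int, index: int = 0, scale: int = 1, disp: int = 0) -> int:
    """Address named by the x86-64 operand [base + index*scale + disp]."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("the SIB byte can only encode scales 1, 2, 4 or 8")
    return (base + index * scale + disp) & MASK64


rec = Record(1, 0, (ctypes.c_int64 * 4)(10, 20, 30, 40))
base = ctypes.addressof(rec)                 # plays the role of rax
disp = Record.data.offset                    # 16: the displacement
for i in range(4):                           # i plays the role of rbx
    addr = effective_address(base, i, 8, disp)
    print(i, ctypes.c_int64.from_address(addr).value)   # prints 0 10, 1 20, 2 30, 3 40
```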

Calling Conventions: How Functions Talk

This is where you really need to pay attention:

MIPS: Arguments in $a0-$a3, return in $v0-$v1. Simple!

ARM64: Arguments in x0-x7, return in x0. Still pretty simple.

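x86-64: Oh boy. On Linux (and macOS), the System V ABI passes the first six integer arguments in rdi, rsi, rdx, rcx, r8, r9. On Windows, the Microsoft x64 convention uses only rcx, rdx, r8, r9, and the caller also reserves 32 bytes of “shadow space” on the stack. Why? Because history is messy.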
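A small lookup table makes the differences easier to compare. This Python sketch covers integer and pointer arguments only. Floating-point arguments use separate registers, and platform variants such as Apple’s arm64 ABI differ in the details. Stack offsets are relative to the stack pointer on entry to the called function. The table and helper are illustrative, not taken from any tool:

```python
# ABI name: (integer argument registers, offset of first stack argument, stack slot size)
ABIS = {
    "mips-o32": (("$a0", "$a1", "$a2", "$a3"), 16, 4),      # caller reserves 16 bytes for $a0-$a3
    "aapcs64": (tuple(f"x{i}" for i in range(8)), 0, 8),    # return address is in x30, not on the stack
    "sysv-x86-64": (("rdi", "rsi", "rdx", "rcx", "r8", "r9"), 8, 8),  # skip the return address
    "win64": (("rcx", "rdx", "r8", "r9"), 40, 8),           # return address + 32-byte shadow space
}


def arg_location(abi: str, index: int) -> str:
    """Where the integer argument at 0-based `index` is found on entry to the callee."""
    regs, first_stack, slot = ABIS[abi]
    if index < len(regs):
        return regs[index]
    return f"sp+{first_stack + slot * (index - len(regs))}"


for abi in ABIS:
    print(abi, [arg_location(abi, i) for i in range(6)])
```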

Architectural Quirks and Design Decisions

Every architecture has peculiarities that reflect its history and design trade-offs.

MIPS: The Delay Slot

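MIPS has a famous quirk: the instruction following a branch executes regardless of whether the branch is taken. This “delay slot” made sense in the original MIPS pipeline. By the time the branch is resolved, the next instruction has already been fetched, so executing it avoids a stall. Educational simulators such as SPIM and MARS typically hide delay slots by default, which is what the examples in this post assume. MIPS Release 6 also added “compact” branches without them. The classic branch instructions, however, still have a delay slot architecturally.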

beq $t0, $t1, target
addi $t2, $t2, 1    # delay slot: executes whether or not the branch is taken
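A toy interpreter makes the timing easier to see. The Python sketch below understands only beq and addi and can run with or without a delay slot. It uses instruction indices as branch targets, which is a simplification: real MIPS branches encode PC-relative offsets.

```python
def run(program: list[tuple], regs: dict[str, int], delay_slot: bool = True) -> dict[str, int]:
    """Execute a tiny MIPS-like program; returns the final register values."""
    pc = 0
    pending = None                      # branch target waiting for its delay slot
    while pc < len(program):
        op, *args = program[pc]
        redirect, pending = pending, None
        next_pc = pc + 1
        if op == "addi":
            rd, rs, imm = args
            regs[rd] = regs[rs] + imm
        elif op == "beq":
            rs, rt, target = args
            if regs[rs] == regs[rt]:
                if delay_slot:
                    pending = target    # the next instruction still runs first
                else:
                    next_pc = target
        else:
            raise ValueError(f"unsupported instruction {op!r}")
        pc = redirect if redirect is not None else next_pc
    return regs


program = [
    ("beq", "$t0", "$t1", 3),           # 0: branch to index 3 (taken: $t0 == $t1)
    ("addi", "$t2", "$t2", 1),          # 1: delay slot
    ("addi", "$t2", "$t2", 100),        # 2: skipped when the branch is taken
    ("addi", "$t3", "$t3", 1),          # 3: branch target
]
start = {"$t0": 0, "$t1": 0, "$t2": 0, "$t3": 0}
print(run(program, dict(start)))                    # $t2 is 1: the delay slot ran
print(run(program, dict(start), delay_slot=False))  # $t2 is 0: it was skipped
```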

ARM64: Mandatory Stack Alignment

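The AArch64 procedure call standard requires the stack pointer to stay 16-byte aligned. With stack-alignment checking enabled (as on Linux), any load or store that uses a misaligned SP as its base raises an alignment fault, which typically terminates the program. That is why pushing a single 8-byte register moves sp by 16 bytes. The requirement keeps 16-byte accesses, such as stp register pairs and 128-bit SIMD registers, naturally aligned, but it can surprise developers coming from more forgiving architectures.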
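The arithmetic behind keeping the stack aligned is just rounding up to a power of two. Here is a minimal Python sketch, with illustrative helper names, that rounds a frame size up to a multiple of 16. This rounding is why saving even one 8-byte register reserves a 16-byte slot:

```python
def align_up(size: int, alignment: int = 16) -> int:
    """Round size up to a multiple of alignment (which must be a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of two")
    return (size + alignment - 1) & ~(alignment - 1)


def is_aligned(address: int, alignment: int = 16) -> bool:
    """True if address is a multiple of alignment."""
    return address & (alignment - 1) == 0


for size in (8, 16, 24, 40):
    print(size, "->", align_up(size))   # 8 -> 16, 16 -> 16, 24 -> 32, 40 -> 48
```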

x86-64: Partial Register Updates

Perhaps the most subtle x86-64 quirk involves partial register updates:

Writing to a 32-bit register (e.g., mov eax, 1) zeros the upper 32 bits of the 64-bit register
Writing to 16-bit or 8-bit portions (e.g., mov ax, 1) leaves the upper bits unchanged

This inconsistency arose from AMD’s desire to eliminate false dependencies in 64-bit mode while maintaining backward compatibility. It’s a perfect example of how historical decisions create modern complexity.
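The rule fits in a few lines of Python. This sketch models what the full register holds after a write through each name. The function is illustrative, and RAX is treated as a plain 64-bit integer:

```python
MASK64 = (1 << 64) - 1


def write_reg(old: int, value: int, width: int, high_byte: bool = False) -> int:
    """New 64-bit contents after writing `value` through a `width`-bit name."""
    if high_byte and width != 8:
        raise ValueError("only 8-bit names (AH, BH, CH, DH) have a high-byte form")
    if width == 64:                         # mov rax, ...
        return value & MASK64
    if width == 32:                         # mov eax, ...: upper 32 bits zeroed
        return value & 0xFFFF_FFFF
    if width in (16, 8):                    # mov ax / al / ah: all other bits kept
        shift = 8 if high_byte else 0       # AH-style names cover bits 8-15
        mask = ((1 << width) - 1) << shift
        return (old & ~mask & MASK64) | ((value << shift) & mask)
    raise ValueError(f"unsupported width {width}")


rax = 0xFFFF_FFFF_FFFF_FFFF
print(hex(write_reg(rax, 1, 32)))   # 0x1                  (mov eax, 1)
print(hex(write_reg(rax, 1, 16)))   # 0xffffffffffff0001   (mov ax, 1)
print(hex(write_reg(rax, 1, 8)))    # 0xffffffffffffff01   (mov al, 1)
```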

Choose your Fighter

The choice of which assembly language to learn depends on your goals and context.

For Computer Architecture Students / Teachers

MIPS remains the gold standard for education. Its transparency and regularity make it ideal for understanding fundamental concepts like pipelining, hazards, and cache behavior. You’ll find MIPS in many computer architecture textbooks for good reason - it doesn’t obscure the concepts with complexity.

For Embedded Systems Development

ARM, particularly ARM Cortex-M series for microcontrollers, dominates the embedded space. If you’re working with IoT devices, robotics, or any battery-powered computing, ARM assembly knowledge is invaluable. The newer AArch64 is increasingly important as ARM moves into the server and desktop markets.

For Systems Programming

x86-64 remains essential for PC and server development. Despite its complexity, understanding x86-64 assembly is crucial for:

Operating system development
Compiler optimization
Security research and exploit development
High-performance computing

Philosophical Differences

These architectures embody different design philosophies that have shaped modern computing:

MIPS represents the RISC ideal: make the common case fast, keep the hardware simple, and let the compiler handle complexity. Its influence extends far beyond its market share - many modern architectures borrow MIPS concepts.

ARM demonstrates that commercial success requires pragmatism. By carefully adding complexity where it provides clear benefits (like 32-bit ARM’s conditional execution and sophisticated addressing modes), ARM achieved a balance well suited to mobile computing.

x86 proves that backward compatibility and raw performance can overcome architectural inelegance. Its CISC nature, once considered a liability, enables sophisticated hardware optimizations like micro-op and macro-op fusion.

Looking Forward

Understanding these three architectures provides insight into the tensions that shape processor design:

Simplicity vs. Performance: MIPS shows the benefits of simplicity, but ARM and x86 demonstrate that strategic complexity can yield performance gains
Hardware vs. Software Complexity: RISC pushes complexity to compilers; CISC handles it in hardware
Evolution vs. Revolution: x86 evolved through extensions; ARM64 made a clean break from ARM32
General Purpose vs. Specialization: All three have spawned specialized variants for specific domains

Final Thoughts

Assembly language programming offers unique insights into how computers actually work. While high-level languages abstract away the machine, assembly reveals it in full detail. The differences between MIPS, ARM, and x86 aren’t just syntax - they represent different solutions to the fundamental challenges of processor design.

Whether you’re optimizing critical code paths, debugging system-level issues, or simply satisfying curiosity about what happens beneath the abstractions, understanding these assembly languages provides invaluable perspective. Each architecture tells a story about the evolution of computing, the trade-offs in processor design, and the endless balance between simplicity and capability.

The machine code that ultimately executes on our processors may seem far removed from the Python or JavaScript we write daily. But understanding that gap - bridged by these assembly languages - makes us better programmers, better debuggers, and better engineers.
