What is assembly?

Assembly is a low-level programming language written to communicate directly with the processor hardware. Unlike high-level languages such as Python, assembly operates close to machine code, the binary instructions the CPU actually executes. Each assembly instruction generally represents a single CPU operation, such as moving data between registers, performing arithmetic, or controlling the flow of execution.

Assembly is not universal! It depends directly on the processor architecture. For example, assembly code written for x86 processors will not work on an ARM processor, so learning assembly also means learning the internal structure and operation of the CPU in question. If you want to learn assembly, papers like the one by our member voiiid can help you a lot on this journey: https://pwnbuffer.org/en/posts/void/x86/

Here is a table of the x86 and x86_64 registers, showing EAX, EBX… and their equivalents, RAX, RBX…

Assembly is a very old language, emerging around the 1950s, yet it remains very relevant. Why?

Simply because it is an essential tool in several areas of computing, such as reverse engineering, where it is used to analyze binaries and understand software without access to the source code.

Assembly is NOT a generic language! It reflects the architecture of the processor being used: registers, instructions and addressing modes vary between architectures such as x86, x86_64, ARM and MIPS. Learning assembly is learning the internal language of the processor. This direct connection makes assembly indispensable for understanding how C code is transformed into binary instructions and how the system actually executes each line of code. It reveals what is “under the hood”.


x86/x86_64 architecture

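The main difference between the 32-bit and 64-bit architectures is the width of the registers, that is, the amount of data the processor can handle at once, and with it the size of the addresses it can form:

  • 32 bits –> registers hold 32-bit values, so the address space is limited to 4 GiB (2³² bytes)
  • 64 bits –> registers hold 64-bit values, so the address space can theoretically reach 16 EiB (2⁶⁴ bytes), although the limit in practice is much lower (x86_64 CPUs commonly implement 48-bit virtual addresses, i.e. 256 TiB)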

In addition to the increased capacity, the x86_64 architecture brings more registers, which improves performance and reduces the need to constantly access memory.


Main registers

x86

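  • EAX, EBX, ECX, EDX –> general-purpose registers
  • ESP –> stack pointer
  • EBP –> base pointer
  • ESI/EDI –> source and destination index, used for memory copying & manipulation

Each register is 32 bits (i.e. 4 bytes) wide. The general-purpose registers can also be accessed partially: for EAX, AX is the low 16 bits, and AH and AL are the high and low 8 bits of AX.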


x86_64

  • all 32-bit registers have been extended to 64 bits: EAX –> RAX, EBX –> RBX, etc.
  • eight new registers have been added: R8 through R15
  • the stack pointer and base pointer have been extended as well: ESP –> RSP & EBP –> RBP

Each register now stores 64 bits (8 bytes) and can still be accessed in portions, for example: RAX (64 bits), EAX (low 32 bits), AX (low 16 bits), AL (low 8 bits).


Structure of an Assembly Program

An assembly language program is usually divided into sections that organize the data and code, for example:

  • .data –> where initialized data is stored (such as strings and variables with defined values)
  • .bss –> for uninitialized data (reserved variables, but without initial values)
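  • .text –> contains the executable code of the program

section .data
    msg db "Hello, World!", 0Ah    ; string followed by a newline
    len equ $ - msg                ; length computed by the assembler

section .text
    global _start
_start:
    ; sys_write(fd=1, buf=msg, count=len)
    mov eax, 4          ; syscall number 4 = write
    mov ebx, 1          ; file descriptor 1 = stdout
    mov ecx, msg        ; address of the string
    mov edx, len        ; number of bytes to write
    int 0x80            ; call the kernel

    ; sys_exit(status=0)
    mov eax, 1          ; syscall number 1 = exit
    xor ebx, ebx        ; exit status 0
    int 0x80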

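This example uses Linux system calls through int 0x80, the 32-bit (i386) system call interface. It simply prints "Hello, World!" to the terminal and then terminates the program. Assuming NASM and GNU ld are installed, it can be built with nasm -f elf32 followed by ld -m elf_i386.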


Key Concepts

  • Common instructions:

    • MOV –> copies data between registers and memory
    • ADD, SUB –> addition and subtraction
    • CMP –> compares two values by subtracting them and setting the flags, without storing the result (used before a conditional jump)
    • JMP, JE, JNE, etc. –> jumps (unconditional or conditional)
  • Flags:

    • ZF (Zero Flag) –> set if the result is zero
    • CF (Carry Flag) –> indicates a carry or borrow, i.e. overflow in unsigned operations
    • SF (Sign Flag) –> set if the result is negative
    • OF (Overflow Flag) –> indicates overflow in signed operations
  • Stack:

    • PUSH –> pushes a value onto the stack
    • POP –> removes the value at the top of the stack
    • CALL –> pushes the return address and jumps to the function
    • RET –> pops the return address and jumps back to it
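
The following example combines CALL, RET, PUSH and POP:

section .data
    msg db "exec func!", 0Ah
    len equ $ - msg         ; length of msg in bytes

section .text
    global _start

_start:
    call message            ; push the return address, jump to message

    ; sys_exit(status=0)
    mov eax, 1
    xor ebx, ebx
    int 0x80

message:
    push eax                ; save the caller's registers
    push ebx

    ; sys_write(fd=1, buf=msg, count=len)
    mov eax, 4
    mov ebx, 1
    mov ecx, msg
    mov edx, len
    int 0x80

    pop ebx                 ; restore in reverse order
    pop eax
    ret                     ; return to the instruction after call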

In this example, the program calls the message function using CALL, which automatically pushes the return address onto the stack.

Inside the function:

  • The eax & ebx registers are saved with PUSH
  • The write syscall prints "exec func!"
  • Then POP restores the original register values, in reverse order
  • Finally, RET jumps back to the instruction after the CALL, where the program continues

Calling Conventions

Calling conventions define how functions receive arguments, return values and manage the stack. These are the rules that ensure compatibility between assembly code and other languages such as C!

Main Conventions

  • cdecl –> C declaration, common on 32-bit Linux

    • arguments: passed on the stack, from right to left
    • responsible for cleaning the stack: the caller
    • return: via EAX
    • widely used with: GCC and the C/C++ languages
  • stdcall –> common on 32-bit Windows

    • arguments: passed on the stack, from right to left
    • responsible for cleaning the stack: the callee (the called function)
    • return: via EAX
    • widely used with: Windows APIs
  • sysv –> System V AMD64 ABI, default on 64-bit Linux

    • arguments (integers and pointers): via registers, in this order:
      • RDI, RSI, RDX, RCX, R8, R9
      • extras go on the stack
    • return: via RAX
    • responsible for saving temporary (caller-saved) registers: the caller

Visual summary:

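| Convention | Platform       | Arguments via        | Return | Who cleans the stack |
|------------|----------------|----------------------|--------|----------------------|
| cdecl      | Linux 32-bit   | Stack (right → left) | EAX    | Caller               |
| stdcall    | Windows 32-bit | Stack (right → left) | EAX    | Callee               |
| sysv       | Linux 64-bit   | Registers + stack    | RAX    | Caller               |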

Well, we’ve reached the end of another paper, thank you very much for reading this far! I hope I’ve helped you in some way, and here are the sources I used to create this article! ;)


Sources used to construct this paper: