Welcome to the second part of our series of articles on Basic Reverse Engineering!

██   ██  █████   ██████  
 ██ ██  ██   ██ ██       
  ███    █████  ███████  
 ██ ██  ██   ██ ██    ██ 
██   ██  █████   ██████  
                         

Now that we’ve explored the fundamental concepts of the CPU in the first article, it’s time to understand how data is represented and manipulated internally by computers. Let’s talk about number systems, with a focus on hexadecimal, which is crucial for anyone doing low-level reverse engineering.

In our first article, I said: don’t get bogged down in an endless string of “whys!?”. And I’ll say it again: don’t worry about understanding all the details at once. The aim here is to understand the essence of the concepts and how each one contributes to the functioning of the computer system. The idea is not to get lost in technical explanations, but to understand how everything fits together, and this will become very clear in practice later on.

Binary System

The binary system is the language of computers. It is a number system that uses only two symbols, which can be read as switch states: 0 -> off and 1 -> on. It is the basis of computing because computers, at their deepest level, work with electrical circuits that can be in two distinct states: on or off. The 0 and 1 are the way to represent these states. Under the hood, this is what the system looks like:

0001010101011010101011100101010010101010...

But obviously, we don’t have direct contact with it, and we rarely even see this binary. It is handled for us by layers of abstraction such as the circuits, the CPU, the operating system and the various other parts that make up the computer.

To form characters and numbers, bits are combined into sequences, just as digits are in any other number system.

  • 1 bit can represent 2 values: 0 or 1.
  • 2 bits can represent 4 values: 00, 01, 10 and 11.
  • 3 bits can represent 8 values: 000, 001, 010, 011, 100, 101, 110 and 111. And so on…
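
The pattern in the list above is that every extra bit doubles the number of values, so n bits give 2 to the power of n. Here is a minimal sketch of that rule in C. The function name values_for_bits is made up for this example.

```c
#include <stdint.h>

/* Number of distinct values that n bits can represent: 2^n.
   Only valid for n from 0 to 63, so the result fits in 64 bits.
   The name values_for_bits is hypothetical, chosen for this example. */
uint64_t values_for_bits(unsigned n)
{
    /* Shifting 1 left by n positions is the same as computing 2^n. */
    return (uint64_t)1 << n;
}
```

Calling values_for_bits(3) gives 8, which matches the eight 3-bit patterns listed above.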

Inside the computer, everything is represented in binary. Each binary sequence can represent something specific: an Escape key press, a space, an input, an output, a number, a letter, and so on. Everything has a meaning encoded in binary.

You can see in the ASCII table how text outputs are represented in binary.

  • Here is an example of such a table.
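
To see what an ASCII table entry looks like in practice, here is a small sketch in C that writes the 8 bits of one character as text. It assumes an ASCII-compatible character set, and the function name byte_to_binary is made up for this example.

```c
/* Writes the 8-bit pattern of c into out as text, most significant
   bit first, and returns out. out must hold at least 9 chars.
   The name byte_to_binary is hypothetical, chosen for this example. */
char *byte_to_binary(unsigned char c, char out[9])
{
    for (int i = 0; i < 8; i++) {
        /* Bit 7 goes to out[0], bit 0 goes to out[7]. */
        out[i] = ((c >> (7 - i)) & 1) ? '1' : '0';
    }
    out[8] = '\0';
    return out;
}
```

With ASCII, the letter 'A' (decimal 65) comes out as 01000001 and the space character (decimal 32) as 00100000.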

Characteristics of the Binary System

  • Base 2: The binary system is made up of two symbols: 0 and 1.
    • Just as the decimal system uses ten digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), the binary system uses two. With just these two symbols we can represent any number, simply by arranging them in longer sequences.
  • Positional: Each digit has a value that depends on its position in the sequence. Counting from the right, each position is worth the next power of 2 (1, 2, 4, 8, ...), so 101 in binary is 4 + 0 + 1 = 5.
  • Data storage: Hard disks, SSDs, RAM and other storage devices use the binary system to store information. Each string of bits represents data, and this data is interpreted by the operating system.
  • Data representation: All data and instructions on a computer are represented in binary. This includes text, images, videos, programs and any other type of information. Even the instructions that the CPU executes are represented in binary, and are interpreted according to context.
  • Notation: In source code and tools, a binary number is commonly written with the prefix 0b, e.g. 0b1100110.
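
The positional rule can be sketched in a few lines of C: read the bits from left to right, and for each new bit double the running value and add the bit. This is only an illustration, and the function name binary_to_uint is made up for this example.

```c
#include <stdint.h>

/* Converts a string of '0'/'1' characters to its value.
   An optional "0b" prefix is skipped, and parsing stops at the first
   character that is not a binary digit. Inputs longer than 32 bits
   wrap around. The name binary_to_uint is hypothetical. */
uint32_t binary_to_uint(const char *s)
{
    uint32_t value = 0;

    if (s[0] == '0' && (s[1] == 'b' || s[1] == 'B')) {
        s += 2; /* skip the prefix */
    }
    while (*s == '0' || *s == '1') {
        /* Moving one position to the left doubles the weight of
           everything read so far. */
        value = value * 2 + (uint32_t)(*s - '0');
        s++;
    }
    return value;
}
```

For the number 0b1100110 this gives 64 + 32 + 4 + 2 = 102.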

Conclusion

Although the binary system is fundamental to computing, you probably won’t need to work with it directly in your day-to-day life. However, understanding how it works is important, because in the future you may need some basic knowledge of binary to understand other concepts related to reverse engineering.

Auxiliary resources

Hexadecimal

Well, now we’re going to talk about hexadecimal. I believe it’s the most widely used number system for reverse engineering. When dealing with memory addresses and low-level data manipulation, the hexadecimal system is much more compact and readable than the binary system.

The hexadecimal system is base 16, which means that it uses 16 different symbols to represent values. These symbols are:

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F

Numbers from 10 to 15 are represented by letters:

  • A = 10
  • B = 11
  • C = 12
  • D = 13
  • E = 14
  • F = 15

So counting in hexadecimal goes:

... 8, 9, A, B, C, D, E, F, 10, 11, 12...

Here 10 in hexadecimal equals 16 in decimal, 11 equals 17, and so on…

How Hexadecimal Numbering Works

Just as each position in the decimal system represents a power of 10, each position in the hexadecimal system represents a power of 16.

Powers of 16 from 0 to 5:

Power  16⁰  16¹  16²  16³    16⁴     16⁵
Value  1    16   256  4,096  65,536  1,048,576

Let’s take an example:
In the number 2F (hexadecimal):

  • 2 occupies the second position from the right:
    • 2 x 16¹ = 2 x 16 = 32
  • F occupies the first position from the right:
    • F represents 15 in decimal, so:
    • 15 x 16⁰ = 15 x 1 = 15

Therefore, 2F (hexadecimal) is equal to 32 + 15 = 47 in decimal.
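
The same calculation can be sketched in C. Instead of computing each power of 16 separately, the loop multiplies the running value by 16 before adding the next digit, which gives the same result as summing digit x power for every position. The names hex_digit_value and hex_to_uint are made up for this example.

```c
#include <stdint.h>

/* Value of one hexadecimal digit (0 to 15), or -1 if c is not one.
   The name hex_digit_value is hypothetical. */
int hex_digit_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Converts a hex string to its value. An optional "0x" prefix is
   skipped, and parsing stops at the first non-hex character.
   The name hex_to_uint is hypothetical. */
uint64_t hex_to_uint(const char *s)
{
    uint64_t value = 0;
    int digit;

    if (s[0] == '0' && (s[1] == 'x' || s[1] == 'X')) {
        s += 2; /* skip the prefix */
    }
    while ((digit = hex_digit_value(*s)) >= 0) {
        /* Shift everything read so far one hex position to the left,
           then add the new digit in the 16^0 position. */
        value = value * 16 + (uint64_t)digit;
        s++;
    }
    return value;
}
```

For "2F" the loop computes 0 x 16 + 2 = 2, then 2 x 16 + 15 = 47, the same 47 we got by hand.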

Let’s take another example, the number: 0x543210.

  • 5 x 16⁵ = 5 x 1,048,576 = 5,242,880
  • 4 x 16⁴ = 4 x 65,536 = 262,144
  • 3 x 16³ = 3 x 4,096 = 12,288
  • 2 x 16² = 2 x 256 = 512
  • 1 x 16¹ = 1 x 16 = 16
  • 0 x 16⁰ = 0 x 1 = 0

Each digit is multiplied by the power of 16 that corresponds to its position, counting from 16⁰ at the rightmost digit.

Result

Now let’s add these values together to get the decimal value of 0x543210:

5,242,880 + 262,144 + 12,288 + 512 + 16 + 0 = 5,517,840

So 0x543210 is equal to 5,517,840 in decimal.

Inserting letters

We can also use letters in hexadecimal numbers. We learned above that the letters A to F represent the values 10 to 15, so let’s take the number 0xAB2345 as an example and convert it from hexadecimal to decimal:

Don’t forget: A = 10 and B = 11.

  • A x 16⁵ = 10 x 1,048,576 = 10,485,760
  • B x 16⁴ = 11 x 65,536 = 720,896
  • 2 x 16³ = 2 x 4,096 = 8,192
  • 3 x 16² = 3 x 256 = 768
  • 4 x 16¹ = 4 x 16 = 64
  • 5 x 16⁰ = 5 x 1 = 5

(AB2345)₁₆ = (10 × 16⁵) + (11 × 16⁴) + (2 × 16³) + (3 × 16²) + (4 × 16¹) + (5 × 16⁰) = (11215685)₁₀

Add up the results and we get 11,215,685. So 0xAB2345 is 11,215,685 in decimal.
Cool, right?
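
If you want to double-check a conversion like this without doing the math by hand, the C standard library can do it for you. The sketch below uses strtoul to read a hexadecimal string and snprintf to write a number back out in hexadecimal. The wrapper names parse_hex and format_hex are made up for this example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Parses a hexadecimal string with the C standard library.
   With base 16, strtoul accepts an optional "0x" prefix.
   The name parse_hex is hypothetical. */
unsigned long parse_hex(const char *text)
{
    return strtoul(text, NULL, 16);
}

/* Writes value as "0x" followed by uppercase hex digits and returns out.
   size is the capacity of out in bytes.
   The name format_hex is hypothetical. */
char *format_hex(unsigned long value, char *out, size_t size)
{
    snprintf(out, size, "0x%lX", value);
    return out;
}
```

parse_hex("0xAB2345") returns 11215685, and format_hex(11215685UL, buf, sizeof buf) writes 0xAB2345 back into buf.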

Additional Features:

Hexadecimal in Modern Operating Systems

Well, you won’t need to do all that math by hand, since the system and your tools handle the conversion for you. But there are a few things you should know and remember when programming, reverse engineering or the like. Here they are:

  • One hexadecimal digit is equivalent to a 4-digit binary number (4 bits). For example:
    1101 = D
    0001 = 1
    0110 = 6
    0011 = 3
  • On modern systems such as Windows and Linux, hexadecimal numbers are written with the prefix 0x, for example 0xD163 (the four digits shown above).

  • Dynamic memory addresses are represented in hexadecimal.

  • Static addresses/memory offsets are represented in hexadecimal
  • Data is represented in hexadecimal.
  • When we reverse engineer, we’ll be dealing with hexadecimal numbers.

In short, hexadecimal is very important because it shows up in almost everything you will look at: addresses, offsets and raw data.
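
Two of the points above are easy to try out in C: the 4-bits-per-digit rule, and data shown as hexadecimal bytes the way a hex editor or debugger shows it. This is just a sketch, and the function names word_to_nibbles and bytes_to_hex are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes a 16-bit value as four groups of 4 bits, one group per hex
   digit, and returns out. out must hold at least 20 chars
   (16 bits + 3 spaces + terminator).
   The name word_to_nibbles is hypothetical. */
char *word_to_nibbles(uint16_t value, char out[20])
{
    size_t pos = 0;

    for (int bit = 15; bit >= 0; bit--) {
        out[pos++] = ((value >> bit) & 1) ? '1' : '0';
        /* Put a space after every group of 4 bits except the last. */
        if (bit % 4 == 0 && bit != 0) {
            out[pos++] = ' ';
        }
    }
    out[pos] = '\0';
    return out;
}

/* Writes len bytes as two-digit uppercase hex values separated by
   spaces and returns out. out must hold at least 3 * len chars,
   and at least 1. The name bytes_to_hex is hypothetical. */
char *bytes_to_hex(const unsigned char *data, size_t len, char *out)
{
    static const char digits[] = "0123456789ABCDEF";
    size_t pos = 0;

    for (size_t i = 0; i < len; i++) {
        if (i > 0) {
            out[pos++] = ' ';
        }
        out[pos++] = digits[data[i] >> 4];   /* high 4 bits */
        out[pos++] = digits[data[i] & 0x0F]; /* low 4 bits */
    }
    out[pos] = '\0';
    return out;
}
```

word_to_nibbles(0xD163, buf) writes 1101 0001 0110 0011, and on an ASCII system bytes_to_hex applied to the two bytes of "Hi" writes 48 69.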

x86 Byte, Word, Double Word

Well, these are the names for the amount of data that a given unit of storage holds.

Measure      Size (Intel)  Intel Nomenclature
Byte         8 bits        BYTE
Word         16 bits       WORD
Double Word  32 bits       DWORD
Quad Word    64 bits       QWORD

When we write code or analyze malware source code, we will deal with these names, especially when working with the Windows API (WinAPI). This matters because, for example, if you declare a variable of type DWORD, you’ll be storing a 32-bit value:

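DWORD procid = 0; /* DWORD is a 32-bit unsigned integer from <windows.h>; use 0 here, since NULL is meant for pointers */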
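
To check the sizes for yourself without a Windows toolchain, here is a small sketch that uses the fixed-width types from <stdint.h> as stand-ins. The BYTE_T, WORD_T, DWORD_T and QWORD_T names are made up for this example and are not the real WinAPI typedefs, and so are BITS_OF and wrap_word_increment.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the Intel/WinAPI size names, so this
   builds without <windows.h>. */
typedef uint8_t  BYTE_T;
typedef uint16_t WORD_T;
typedef uint32_t DWORD_T;
typedef uint64_t QWORD_T;

/* Size of a type in bits. CHAR_BIT is 8 on x86. */
#define BITS_OF(type) (sizeof(type) * CHAR_BIT)

/* Adds 1 to a 16-bit value. Because a WORD only has 16 bits,
   0xFFFF + 1 wraps around to 0. */
WORD_T wrap_word_increment(WORD_T value)
{
    return (WORD_T)(value + 1u);
}
```

BITS_OF(DWORD_T) evaluates to 32, and wrap_word_increment(0xFFFF) returns 0, which shows what "a 16-bit value" means in practice.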

The registers come in these sizes too. See:

  • AX = 16 bits (WORD)
  • EAX = 32 bits (DWORD)
  • RAX = 64 bits (QWORD)
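
On x86-64 these three names refer to the same register: EAX is the low 32 bits of RAX, and AX is the low 16 bits. The sketch below models that with ordinary integers and bit masks. It does not touch real registers, and the function names low_dword and low_word are made up for this example.

```c
#include <stdint.h>

/* Low 32 bits of a 64-bit value: the part of RAX that is visible
   as EAX. The name low_dword is hypothetical. */
uint32_t low_dword(uint64_t rax)
{
    return (uint32_t)(rax & 0xFFFFFFFFu);
}

/* Low 16 bits of a 64-bit value: the part of RAX that is visible
   as AX. The name low_word is hypothetical. */
uint16_t low_word(uint64_t rax)
{
    return (uint16_t)(rax & 0xFFFFu);
}
```

If RAX held 0x1122334455667788, then low_dword gives 0x55667788 (EAX) and low_word gives 0x7788 (AX).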

Well, that’s it from here. The rest we learn with practice. So practice as much as you can, and these concepts will sink into your mind.
Thank you for reading this far!

 ▌ ▐·      ▪  ▪  ·▄▄▄▄  
▪█·█▌▪     ██ ██ ██▪ ██ 
▐█▐█• ▄█▀▄ ▐█·▐█·▐█· ▐█▌
 ███ ▐█▌.▐▌▐█▌▐█▌██. ██ 
. ▀   ▀█▄▀▪▀▀▀▀▀▀▀▀▀▀▀•