AbdooOS - Episode 2: The Boot Sector

Let's understand, just a teensy bit, what a boot sector is

What is a "Boot Sector"?

Remember what we said last time? That we should start from a boot sector to get into our beautiful OS' "real" part? I think I didn't say that...

So, our OS is composed of two main things:

  • Boot: A part of the OS which puts the system/computer into a state expected by the Kernel. And this happens by setting up some registers, setting up the stack, etc...

  • Kernel: The "real" part of our OS. This guy just starts doing really cool stuff. And the good thing is that this part lets us use higher-level languages like C! (You can also use Rust, but few people actually do that and the documentation about it is poor; you can still take a look at this blog).

What's real mode?

Real mode is the processor mode that x86 computers start in when they boot through the BIOS, so it is the mode most OSes out there boot with. It is kept around for backward compatibility. This mode is 16-bit and very limited: it can only address about 1 MB of memory. I'm too dumb to explain real mode more than that, so here is an explanation generated by ChatGPT: "Real mode in x86 processors is a 16-bit operating mode known for its direct hardware access, flat memory model, and backward compatibility with early CPUs and software. It allows direct interaction with hardware devices but has limitations such as a 1 MB memory limit and no memory protection. Despite its constraints, real mode remains relevant for legacy support and educational purposes, offering insights into low-level system programming and computer architecture". One correction to that quote: real mode memory is not flat, it is segmented. An address is built from a 16-bit segment and a 16-bit offset as segment * 16 + offset, which is where the 1 MB limit comes from.
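To make that 1 MB limit less abstract, here is a minimal sketch of how a real mode address is formed: the CPU computes segment * 16 + offset. It assumes the BIOS loaded our sector at physical address 0x7C00, and the label letter is just a made-up name for this example. The printing part uses the BIOS call explained in the printing section further down.

```nasm
; Sketch only: assumes the BIOS loaded this sector at physical 0x7C00.
[org 0]              ; offsets are counted from the start of the sector
[bits 16]

mov ax, 0x07C0       ; segment 0x07C0 -> 0x07C0 * 16 = 0x7C00
mov ds, ax           ; a segment register can't take a value directly

mov si, letter       ; offset of "letter" inside the sector
mov al, [si]         ; reads physical address 0x7C00 + offset

mov ah, 0x0e         ; BIOS teletype output
mov bh, 0            ; display page 0
int 0x10             ; prints the byte we just read

jmp $                ; loop forever

letter: db 'R'       ; hypothetical data byte for this example

times 510-($-$$) db 0
dw 0xaa55
```

If the segment math is right, an 'R' shows up on screen. With ds set to 0 and [org 0x7c00] instead, the same byte would be reached through a different segment:offset pair, because several pairs map to the same physical address.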

Now, last time we only wrote a sector full of zeroes with the boot signature, 0xaa55, in its last two bytes, so that it occupies exactly 512 bytes. So, our boot sector doesn't do anything interesting yet. It doesn't do anything at all! But this makes us wonder: what is a sector?

What is a sector?

Imagine your disk is like a stack of vinyl records. Each record in the stack is a "platter." The arm of the record player that reads the information is the "head." Each platter is divided into many concentric circles called "tracks." Each track is then divided into small, fixed-size segments called "sectors."

Think of a sector as a slice of a pie, but instead of spanning from the center to the edge, it's a small segment of a track. The combination of a head (which platter surface we read), a track, and a sector number uniquely identifies a location on the disk where data is stored.

Each sector traditionally holds 512 bytes. Considering that 512 bytes is not nearly enough to store substantial data, our disk contains many sectors. And because there are a lot of sectors, our BIOS would probably get lost in all of them. This is why we have that boot signature: so the BIOS recognizes that this is the boot sector. The boot sector has to be the first sector of the disk, and the boot signature in its last two bytes tells the BIOS that this sector is bootable, so it should load it into memory (at address 0x7C00) and execute it.
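As a reminder of what such a sector looks like, here is a minimal sketch of a boot sector that does nothing but loop forever. It may not match last episode's file line for line, and the label hang is just a name picked for this example.

```nasm
[org 0x7c00]            ; the BIOS loads the boot sector at 0x7C00
[bits 16]               ; real mode is 16-bit

hang:
    jmp hang            ; loop forever, we have nothing to do yet

times 510-($-$$) db 0   ; pad with zeroes up to byte 510
dw 0xaa55               ; boot signature: bytes 0x55, 0xAA at offsets 510 and 511
```

$ is the current position and $$ is the start of the section, so 510-($-$$) is the number of bytes still missing before the signature. The output file is exactly 512 bytes, one sector.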

Let's print text

Alright, having a cute little boot signature is not enough to make a whole OS. So our next step is printing text. This is actually optional: we won't really need to print text from Assembly, because we'll later use a higher-level language, C. But first, let's explain how the screen works.

How the screen works

The screen, or the screen buffer, is made of multiple "squares". Each of those squares is composed of 2 cells:

  • ASCII character cell: Where the ASCII character to print is stored.

  • Color cell: The color of the "square" (foreground + background).

Each of these cells is one byte large.

This means that if I want to print a new character at the cursor's current position, I'll have to set up the registers so that the BIOS knows which character to write. The ASCII character is stored in al, the low byte of the ax register. ah, the high byte of ax, stores the code that tells the BIOS what we'll do with what's in al. In our case, we'll set it to 0x0e, which means "teletype output": print one character and move the cursor forward. For this function, bh, the high byte of the bx register, holds the display page number (normally 0), and bl holds a foreground color that is only used in graphics modes. In text mode, the character keeps the color cell that is already on the screen, which is normally 0x07: light gray on black.
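To see the two cells of a square with your own eyes, here is a minimal sketch that skips the BIOS and writes them directly. It relies on an assumption the text above doesn't state: that the BIOS left the display in the standard 80x25 color text mode, whose buffer starts at physical address 0xB8000.

```nasm
[org 0x7c00]
[bits 16]

mov ax, 0xB800          ; segment 0xB800 -> physical address 0xB8000
mov es, ax              ; (assumed start of the text mode screen buffer)

mov byte [es:0], 'A'    ; character cell of the first square
mov byte [es:1], 0x1F   ; color cell: white (0xF) on blue (0x1)

jmp $                   ; loop forever

times 510-($-$$) db 0
dw 0xaa55
```

The first square is the top-left corner of the screen, so the 'A' replaces whatever the BIOS printed there. In the color byte, the high 4 bits are the background and the low 4 bits are the foreground.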

Printing a character

We'll first see how to print a character to the screen. We can do this by storing our character in al (the BIOS will read the character from this register) and setting ah to 0x0e (meaning we'll print one character). Then, to print, we'll order the BIOS to do so by triggering an "interrupt". There are many interrupt numbers, each one offering different services. In our case, we'll call interrupt 10h, the BIOS video services, which will print the character stored in al because of the function number stored in ah. To do that, we write this in Assembly (boot.asm):

WARNING: Assembly syntax varies between assemblers. I don't like the GNU assembler's syntax, so I'll use NASM instead.

; boot.asm

; This is a comment by the way

; We set the al register to our character
mov al, 'A'

; We can also write the character's Hex code instead of 
; the character directly, for example: 0x41 instead of 'A'.
; You can also use its decimal code: 65


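; We don't need to set a color in text mode: the character keeps
; the color already on screen (normally light gray on black).
mov ah, 0x0e    ; yo BIOS! Print one character!

; Call the interrupt
int 0x10

jmp $           ; loop here forever so the CPU doesn't run into the padding

; Keep the padding and the boot signature from last time at the end
times 510-($-$$) db 0
dw 0xaa55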

After assembling the boot sector with:

nasm boot.asm -f bin -o boot.bin

And running it with:

qemu-system-i386 -fda boot.bin

You'll see a beautiful 'A' printed where the cursor was. But doing all of this to print just one character is kinda annoying, so let's code a function to do it in 2 lines.

Print a character with a function

Functions in Assembly are quite easy, actually! Imagine that functions are islands: we jump to them using the call instruction, then we return to our lands using the ret instruction. The difference with a plain jmp is that call first saves the address of the next instruction on the stack, and ret takes that address back to know where to return. This is why, to actually call a function, we need to set up the stack (just imagine pancakes on top of each other: you can only add to the top, or remove from the top).

I'll give you a quick example (read the comments for comprehension):

; The binary generated by our assembler has no entry point:
; execution starts at the very first instruction
; (kinda like a Python script).

; set up the stack
mov ax, 0        ; a segment register can't be set to a value directly
mov ss, ax
mov sp, 0x7C00   ; the stack grows downward, away from our boot sector

; # call our function
; calling our function means "remember where we are, then jump
; to that piece of code". This jumps to the label "my_function".
call my_function

jmp $            ; stop here, otherwise we'd fall into my_function again

my_function:
    ; THIS IS OUR FUNCTION!!1!1!1!!!!
    ret    ; go back to where we called the function

This function doesn't do anything, but it explains (poorly) how a function works, in the most basic way. But we need a function that prints a character, so we'll start doing that. A function (like every programmer knows) can take arguments, but in Assembly, how are we gonna take arguments? The thing is, we don't, at least not the way a higher-level language does. We pick a variable, set it to the desired value for the argument, then the function uses that variable. In pseudo-code it is like this:

declare variable "char"

set "char" to 'A'
call function "print_char"

function "print_char"
    take the character in "char" and put it in al
    set ah to 0x0e
    call interrupt 0x10
    return from function

In Assembly, we don't use local variables; we use registers. You can think of them as local variables, but remember that they are limited. The four main general purpose registers are ax, bx, cx, and dx (there are a few others, like si, which we'll meet later). We can set them to whatever we want, but we need to be wise in how we use them because of their limited number. Knowing that, we can use them as arguments we can pass to functions.

; Arguments:
;    - bl (the low byte of bx): the character
print_char:
    mov al, bl      ; we take the character (al is 8 bits, so we read bl, not bx)
    mov ah, 0x0e    ; character printing function
    int 0x10        ; Yo BIOS! PRINT
    ret             ; return from function

Now if you do:

; We set up some registers
mov ax, 0
mov es, ax
mov ds, ax
mov fs, ax
mov gs, ax

; We set up the stack
mov ss, ax
mov sp, 0x7C00       ; the stack grows downwards

mov bx, 'B'          ; the character ends up in bl
call print_char

jmp $                ; stop here (put print_char below this line)

You should see "B" printed. If it does get printed, then congrats! You just coded a function to print a character.
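The stack we set up for call and ret can also be used directly with push and pop. Here is a minimal sketch of the pancake idea, reusing the print_char function from above: it pushes 'A' then 'B', and pops them back in the opposite order. Nothing in the rest of this episode depends on it.

```nasm
[org 0x7c00]
[bits 16]

; set up the stack
xor ax, ax           ; ax = 0
mov ss, ax
mov sp, 0x7C00       ; the stack grows downward from here

mov ax, 'A'
push ax              ; first pancake
mov ax, 'B'
push ax              ; second pancake, now on top

pop bx               ; takes the top pancake: 'B'
call print_char
pop bx               ; then the one below: 'A'
call print_char

jmp $                ; loop forever

; Arguments:
;    - bl: the character
print_char:
    mov al, bl
    mov ah, 0x0e
    int 0x10
    ret

times 510-($-$$) db 0
dw 0xaa55
```

This should print "BA": the last value pushed is the first one popped. Note that call also pushes its return address, so a function must pop everything it pushed before reaching ret.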

Printing a String

So now you know how to print a character, but now we gotta learn how to print full strings! Printing a string one character at a time with the previous technique, while keeping track of the string in our head, is quite frustrating.

String printing theory

Let's first suggest a solution to print a string using only one function. We could write a function that takes a string, which would be a collection of characters in memory, one after another. Then, we could have some kind of variable that tells us at what position of the string we are. The string would be null-terminated (meaning the string ends with a byte of value 0, not the character '0').

To sum up, what we'll do is:

  • Store our string in memory and null-terminate it.

  • Code a loop to print each character.

  • Stop printing when we read the null-character, zero.

String printing applied

To do what we suggested, we put our variable's name, then what this variable has. Like this:

msg_hello: db "Hello, World!", 0

msg_hello is the variable's name (WARNING: it is not really a variable, more of a label: a name for the address where that string is stored). db is a NASM directive that stands for "Define Byte". We write our string and then put a 0. This 0 means that the string is null-terminated. If we don't put that 0, our function will keep reading memory without stopping. In other words, it will read our string from memory but then continue reading what is NOT our string, and print whatever bytes come next (other data, or even our own code) as garbage characters until it happens to hit a zero byte.

Now, using one of the general purpose registers (ax, bx, cx, dx), we can keep track of what we're printing. I said using one. With only one register, I can know which string we're printing and how far into it we are. Let me explain: we stored our message at msg_hello's memory location. We can then put that address in bx. After that, I can do what we did earlier to print the character at that address, and increment bx by one to get to the next character.

So let's actually do all of this:

; prints the null-terminated string whose address is in bx
print_string:
    mov ah, 0x0e        ; printing character BIOS function
    mov al, [bx]        ; we set al to what's AT bx's address.
                        ; we're not setting al to bx's address.
    test al, al         ; check if al is 0 (the null terminator)
    je print_done       ; if zero, we're done printing

    int 0x10            ; print!
    inc bx              ; next character

    jmp print_string    ; go again (loop)

print_done:
    ret                ; return from function

Let's explain quickly:

  • We set ah to 0x0e to tell the BIOS that we're printing chars.

  • We set al to the byte stored at the address in bx (at first, the first character of the string).

  • We check whether al is 0. If it is, we jump to the print_done label, then we return to where we got called.

  • If we haven't reached the null terminator yet, we print al using the 10h interrupt.

  • We increment bx so next time in the loop, we'll print the next char.

That was simple, wasn't it? But this way is kind of heavy, and what if we're already using bx? We only have a handful of registers (there will be more registers when we quit real mode). So there is a better way of doing it: instead of using bx as our argument for the print_string function, we'll use the si register (source index). This register is useful because instead of setting al to the byte bx points to and then going to the next character ourselves, we can use a special instruction, lodsb, which means "Load String Byte". This instruction loads the byte at the address in si (more precisely ds:si) into al, and handles that "go to next character" thing by moving si for us (forward, as long as the direction flag is clear, which the cld instruction guarantees).
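Here is a minimal sketch that puts the manual way and lodsb side by side, so you can check that they do the same job. The label letters and its content are made up for this example.

```nasm
[org 0x7c00]
[bits 16]

xor ax, ax
mov ds, ax           ; lodsb reads from ds:si, so ds must be set
cld                  ; clear the direction flag: si moves forward

mov bh, 0            ; display page 0 for int 0x10
mov si, letters

; the manual way: read the byte si points to, then advance si
mov al, [si]
inc si
mov ah, 0x0e
int 0x10             ; prints 'X'

; the same two steps in a single instruction
lodsb
mov ah, 0x0e
int 0x10             ; prints 'Y'

jmp $                ; loop forever

letters: db "XY"     ; hypothetical data for this example

times 510-($-$$) db 0
dw 0xaa55
```

Both halves print one character and leave si pointing at the next one, so the screen should show "XY".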

Now we can change our function to do it in this new way:

print_string:
    lodsb            ; load next byte (character) from si 
    test al, al      ; is al = 0?
    je print_done    ; if null-terminated, then we're done
    mov ah, 0x0e     ; character printing function
    int 0x10         ; print!
    jmp print_string ; start again to do same thing but 
                     ; with next character, thanks to lodsb

print_done:
    ret              ; return to where we got called

Isn't that shorter and simpler? Now, let's try printing something, here is the whole code:

[org 0x7c00]    ; BIOS puts boot sector there
[bits 16]       ; This is 16 bit code (real mode moment)

; set up registers
mov ax, 0    ; segment registers can't be set to a value directly
mov es, ax
mov ds, ax
mov fs, ax
mov gs, ax

; set up the stack
mov ss, ax        ; ax is already 0
mov sp, 0x7C00    ; Stack grows downward

cld                  ; make lodsb move forward through the string
mov si, msg_hello    ; we say where our message is stored
call print_string

jmp $                ; stop here so we don't fall into print_string again

print_string:
    lodsb            ; load next byte
    test al, al      ; is al = 0?
    je print_done
    mov ah, 0x0e
    int 0x10
    jmp print_string

print_done:
    ret    ; return

msg_hello: db "Hello, World!", 0

times 510-($-$$) db 0    ; fill sector with 0s
dw 0xaa55

When assembling the boot code and running it, you should see a cool "Hello, World!" printed. You can also use macros in NASM to avoid rewriting the same thing over and over. Macros are like variables for the developer: names that the assembler replaces with their values before it assembles the code. Let me explain through an example: %define ENDL 0x0D, 0x0A. We just defined a macro. To start a new line when printing, we need those two characters (carriage return + line feed). We stored these two in one macro so it is easier for us, the developers. When assembling, ENDL gets replaced by its value. One useful way of using macros is to add a newline using only one word:

%define ENDL 0x0D, 0x0A    ; carriage return + line feed

msg_hello: db "Hello, World!", ENDL, 0    ; we put our new line 
                                          ; before null-termination
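To see ENDL in action, here is a minimal sketch of a complete boot sector that prints two strings, each ending with a new line. It reuses the print_string function from above; msg_first and msg_second are names made up for this example.

```nasm
[org 0x7c00]
[bits 16]

%define ENDL 0x0D, 0x0A    ; carriage return + line feed

xor ax, ax
mov ds, ax           ; lodsb reads from ds:si
mov ss, ax
mov sp, 0x7C00       ; the stack grows downward
cld                  ; lodsb moves forward

mov si, msg_first
call print_string
mov si, msg_second
call print_string

jmp $                ; loop forever

print_string:
    lodsb            ; load next byte into al, advance si
    test al, al      ; is al = 0?
    jz print_done
    mov ah, 0x0e
    int 0x10
    jmp print_string

print_done:
    ret

msg_first:  db "Hello, World!", ENDL, 0
msg_second: db "Hello again!", ENDL, 0

times 510-($-$$) db 0
dw 0xaa55
```

The carriage return moves the cursor back to the start of the line and the line feed moves it down one row, so the second message should appear right under the first one.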

Conclusion

Alright! I think we're done with printing stuff. Now, you should technically know what a boot sector is. I didn't explain it perfectly, but I want this blog to stay simple. The basic idea is that a boot sector is a small program the BIOS runs when the computer starts, before the rest of the OS, that's it. You, as the developer, can do anything with that boot sector, but it would be cool to do something cooler, no? So next time, we'll see how to set up our development environment (tools, environment, scripts...), and we'll also look at how to add a file system to our OS so we can do dope stuff.

You can always check some code on my OS' GitHub Repo. And if you wanna ask something, try DM'ing me, abdooowd, on Discord. And thanks for reading!
