As I write my tiny toy operating system, I’ll document some of the things I discover along the way. I’ll gloss over things that are better documented in places like the OSDev Wiki, but I’ll go into the little details that were non-obvious (to me) and that I had to find out through trial and error, poring over documents and asking on IRC.
So, where does one start to write an OS? We can choose to start at the very beginning: booting. Example code here is from the OS I am working on called Treehouse.
What happens when an x86 system boots? It powers on and runs the boot firmware (UEFI or BIOS), which in turn loads a bootloader that loads your kernel. A bootloader can be a simple thing that just loads your kernel from a specific place on disk, or it can be pretty sophisticated, performing hardware initialization, reading different filesystems, or presenting a user menu that lets you select a kernel to boot.
Intel machines boot in real mode, the legacy mode with no memory protection and a 20-bit address space that gets you a whopping 1 MiB of addressable memory. Why? “Historical reasons”, as real mode rarely has any modern uses. We’ll need protected mode, which all current operating systems rely on for things like using the entire address space, virtual memory, paging, etc. Now, we could switch to protected mode ourselves in our kernel, but it’s way easier to let a bootloader like GRUB do it for us, so we can assume protected mode by the time the system is handed over to our kernel. It’s a cheat but it makes life simpler, and dammit Jim, we’re writing a kernel not a bootloader.
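To make the 20-bit figure concrete, here is a small hosted C sketch (not kernel code) of how real mode forms an address from a segment and an offset. The names `REAL_MODE_LIMIT` and `real_mode_physical` are made up for illustration, and the wrap-around mask models the original 8086 behaviour, which is an assumption about the machine being modelled.

```c
#include <assert.h>
#include <stdint.h>

/* 20 address bits give 2^20 addressable bytes. */
#define REAL_MODE_ADDRESS_BITS 20u
#define REAL_MODE_LIMIT (1u << REAL_MODE_ADDRESS_BITS)

static_assert(REAL_MODE_LIMIT == 1048576u, "20 address bits give exactly 1 MiB");

/* Translate a real-mode segment:offset pair into a physical address.
 * The segment is shifted left by 4 bits (multiplied by 16) and the
 * offset is added. The mask models addresses wrapping at 1 MiB, as on
 * the original 8086 (an assumption of this sketch). */
uint32_t real_mode_physical(uint16_t segment, uint16_t offset)
{
    uint32_t linear = ((uint32_t)segment << 4) + offset;
    return linear & (REAL_MODE_LIMIT - 1u);
}
```

For example, `real_mode_physical(0x1234, 0x0010)` gives `0x12350`, and nothing this function returns can ever reach past the 1 MiB mark, which is exactly why we want protected mode.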
The boot.S file is the first Treehouse code that the bootloader will call. Most of it is from the OSDev Wiki Bare Bones tutorial, which documents it well, but here I’ll attempt to explain the bits that puzzled me and that the tutorial doesn’t cover. It’s written in AT&T-syntax assembly and it’s pretty short, so I’ll just plonk it down here:
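# Boot.S taken mostly from the OSDEV wiki.
# Constants for the multiboot header
.set MULTIBOOT_ALIGN, 1<<0
.set MULTIBOOT_MEMINFO, 1<<1
.set MULTIBOOT_MAGIC, 0x1BADB002
.set MULTIBOOT_FLAGS, MULTIBOOT_ALIGN | MULTIBOOT_MEMINFO
.set MULTIBOOT_CHECKSUM, -(MULTIBOOT_MAGIC + MULTIBOOT_FLAGS)
# The multiboot header
.section .multiboot # section name assumed here, as in the OSDev tutorial
.align 4
.long MULTIBOOT_MAGIC, MULTIBOOT_FLAGS, MULTIBOOT_CHECKSUM
.section .bootstrap_stack, "aw", @nobits
.skip 16384 # 16 KiB
stack_top: # the stack grows down from here
.section .text
.global _start
.type _start, @function
_start:
movl $stack_top, %esp
pushl %ebx # pushed for kernel_main
call kernel_main
cli # kernel_main returned: disable interrupts
hlt # and halt the CPU
.failsafeloop:
jmp .failsafeloop
.size _start, . - _start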
The .set lines at the top assign values to symbols for the multiboot header. The .set assembler directive works a little like #define in C, except that instead of a textual search-and-replace it actually assigns a value to the symbol, so it expects an expression there. Next, we plonk down the multiboot header in its own section, with the intent of placing that section at the very beginning of the binary in our linker script. Sections are just named parts of the source which can be called anything you like, but depending on what you’re building you’re going to need a few “canonical” sections like .text (if you’re building an executable), .data and .bss.
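The arithmetic behind MULTIBOOT_CHECKSUM is easy to try out in ordinary hosted C. The Multiboot specification requires that magic, flags and checksum add up to zero as a 32-bit unsigned sum, which is why the checksum is the negated sum of the other two. In this sketch the enum constants are evaluated values, much like .set symbols, rather than pasted text. The function names are hypothetical and not part of Treehouse.

```c
#include <assert.h>
#include <stdint.h>

/* Same values as the .set lines in boot.S. */
enum {
    MULTIBOOT_ALIGN   = 1u << 0,
    MULTIBOOT_MEMINFO = 1u << 1,
    MULTIBOOT_FLAGS   = MULTIBOOT_ALIGN | MULTIBOOT_MEMINFO
};
#define MULTIBOOT_MAGIC 0x1BADB002u

static_assert(MULTIBOOT_FLAGS == 3, "ALIGN | MEMINFO sets the two lowest bits");

/* Negate in unsigned 32-bit arithmetic so that
 * magic + flags + checksum wraps around to exactly zero. */
uint32_t multiboot_checksum(uint32_t magic, uint32_t flags)
{
    return (uint32_t)0 - (magic + flags);
}

/* The condition the header has to satisfy: right magic, and the
 * three fields summing to zero modulo 2^32. */
int multiboot_header_is_valid(uint32_t magic, uint32_t flags, uint32_t checksum)
{
    return magic == MULTIBOOT_MAGIC && (uint32_t)(magic + flags + checksum) == 0u;
}
```

With the flags used here, `multiboot_checksum(MULTIBOOT_MAGIC, MULTIBOOT_FLAGS)` works out to `0xE4524FFB`.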
The next section is the bootstrap stack, which isn’t a “canonical” section in binaries, but the assembler lets you create any kind of section that you like for your own nefarious ends, and you can describe the properties of that section with attributes. A bootstrap stack is the stack you’re going to use while the kernel is bootstrapping itself. Code usually needs a stack so that it can do things like use stack variables and make function calls, so we’re going to have to reserve some space for it. For the attributes we use “aw” and @nobits. The “aw” means “allocatable and writable”. Writable (which implies readable too) because you’d want to write to it (obviously), and allocatable means it’s loaded into memory at runtime. The @nobits type indicates that the section’s contents aren’t stored on disk; it exists only at runtime and is initialised to zeroes when starting up. The syntax is funky if you’ve never seen it before, but you won’t need to dabble with assembler syntax very much if you’re writing most of your code in C, like I’m doing here.
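If the @nobits idea feels abstract, the closest thing in C is a file-scope array with no initialiser. The C standard guarantees it starts out as all zeroes, and on ELF targets compilers typically place it in .bss, which is itself a nobits section. This is a hosted sketch for comparison only; `bootstrap_stack` and `count_nonzero_bytes` are illustrative names, not Treehouse code.

```c
#include <assert.h>
#include <stddef.h>

#define BOOTSTRAP_STACK_SIZE 16384u /* 16 KiB, matching .skip 16384 */

/* No initialiser: zero-initialised by the C standard, and typically
 * placed in .bss, so it takes no space in the file on disk. */
unsigned char bootstrap_stack[BOOTSTRAP_STACK_SIZE];

static_assert(sizeof bootstrap_stack == BOOTSTRAP_STACK_SIZE,
              "the array reserves exactly 16 KiB");

/* Count the non-zero bytes in a buffer. For the untouched array above
 * this is expected to be 0. */
size_t count_nonzero_bytes(const unsigned char *buf, size_t len)
{
    size_t count = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0) {
            count++;
        }
    }
    return count;
}
```

Doing it in assembly instead gives us control over the section name and attributes, which a plain C array doesn’t.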
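Next up is the .text section, which starts with the _start function, the literal starting point of execution for our kernel. As you can see, all it does is load the address of stack_top into the stack pointer register, push the EBX register onto the stack, and call the main function (kernel_main), which we’re going to implement in C. EBX matters because a Multiboot bootloader leaves the address of its boot information structure there, and pushing it before the call makes it available as the first argument to kernel_main.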
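Here is a rough hosted C model of what those first two instructions do, assuming the usual x86 behaviour that the stack grows downwards and that a push decrements the stack pointer before storing. Everything prefixed with `sim_` is made up for this sketch. Under the cdecl convention the pushed word is what a C function declared as, say, `void kernel_main(uint32_t multiboot_info)` would receive as its first argument; that signature is an assumption for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS (16384u / sizeof(uint32_t)) /* 16 KiB of 32-bit words */

uint32_t sim_stack[STACK_WORDS]; /* stands in for the .bootstrap_stack region */
uint32_t *sim_esp;               /* stands in for %esp */

/* pushl: decrement the stack pointer first, then store the value. */
void sim_push(uint32_t value)
{
    assert(sim_esp > sim_stack); /* going below the region is a stack overflow */
    *--sim_esp = value;
}

/* Model of the start of _start; returns the resulting stack pointer. */
uint32_t *sim_start(uint32_t ebx)
{
    sim_esp = sim_stack + STACK_WORDS; /* movl $stack_top, %esp */
    sim_push(ebx);                     /* pushl %ebx */
    return sim_esp;
}
```

Note that stack_top points one past the end of the reserved region, so the first push lands in the last word of it.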
Now we don’t ever expect to return from kernel_main, but in case it does, we have several failsafes: first the cli and hlt instructions, which disable interrupts and halt the CPU, and in case that fails too, I have an infinite loop there in the form of .failsafeloop.
Finally, the last line sets the size of the _start symbol to the current location minus its start. This comes from the OSDev tutorial and will be useful further down when I do call tracing and debugging, so we’ll talk about it later when it comes up.
Next up, we’ll talk about the linker script!