Posts with category - Operating Systems Development

Booting An Operating System On x86

As I am writing my tiny toy operating system, I’ll document some of the things I discover along the way. I’ll gloss over things that are better documented in places like the OSDev Wiki, but will go into the little details that were non-obvious (to me) and that I had to find out through trial and error, poring over documents and asking on IRC.

So, where does one start to write an OS? We can choose to start at the very beginning: booting. Example code here is from the OS I am working on called Treehouse.

What happens when an x86 system boots? It powers on, loads up the bootup firmware (UEFI or BIOS), and from there it will load up a bootloader that will load your kernel. A bootloader can be a simple thing that just loads up your kernel which it finds on a specific place on disk, or it can be pretty sophisticated, performing hardware initialization, reading different filesystems, or presenting a user menu that lets you select a kernel to boot.

x86 machines boot in real mode, the legacy mode with no memory protection and a 20-bit address space that gets you a whopping 1MB of addressable memory. Why? “Historical reasons”, as real mode rarely has any modern use. We’ll need protected mode, which all current operating systems need for things like being able to use your entire address space, virtual memory, paging, etc. Now, we can switch to protected mode ourselves in our kernel, but it’s way easier to just let a bootloader like GRUB do it for us, so that we can assume protected mode when the system gets handed over to our kernel. It’s a cheat but it makes life simpler, and dammit Jim, we’re writing a kernel not a bootloader.
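To make the 20-bit limit concrete, here is a small sketch of how a real-mode segment:offset pair turns into a linear address. The function and macro names are made up for illustration, and the wrap at 1 MiB assumes 8086-style behaviour (A20 line disabled):

```c
#include <stdint.h>

#define REAL_MODE_ADDRESS_BITS  20u
#define REAL_MODE_ADDRESS_SPACE ((uint32_t)1 << REAL_MODE_ADDRESS_BITS) /* 1 MiB */

/* Real-mode translation: linear = segment * 16 + offset.
   The mask models the wrap at 1 MiB (an assumption of this sketch). */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    uint32_t linear = ((uint32_t)segment << 4) + offset;
    return linear & (REAL_MODE_ADDRESS_SPACE - 1u);
}
```

Two different pairs can name the same byte: 0x07C0:0x0000 and 0x0000:0x7C00 both come out as 0x7C00. The highest address you can reach this way is 0xFFFFF, one byte short of 1 MiB.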

The boot.S file is the first Treehouse code that the bootloader will call. Most of it is from the OSDev Wiki Bare Bones Tutorial, where it’s well documented, but here I’ll attempt to explain the bits that puzzled me and that the tutorial doesn’t cover. It’s written in AT&T syntax assembly and it’s pretty short, so I’ll just plonk it down here:

# Boot.S taken mostly from the OSDEV wiki.

# Constants for the multiboot header
.set MULTIBOOT_ALIGN, 1<<0              # align loaded modules on page boundaries
.set MULTIBOOT_MEMINFO, 1<<1            # ask the bootloader for memory information
.set MULTIBOOT_MAGIC, 0x1BADB002        # lets the bootloader find the header
.set MULTIBOOT_FLAGS, MULTIBOOT_ALIGN | MULTIBOOT_MEMINFO
.set MULTIBOOT_CHECKSUM, -(MULTIBOOT_MAGIC + MULTIBOOT_FLAGS)

# The multiboot header
.section .multiboot
.global _multiboot

.align 4
.long MULTIBOOT_MAGIC
.long MULTIBOOT_FLAGS
.long MULTIBOOT_CHECKSUM

.section .bootstrap_stack, "aw", @nobits
stack_bottom:
.skip 16384 # 16 KiB
stack_top:

.section .text
.global _start
.type _start, @function
_start:
    movl $stack_top, %esp   # the stack grows down, so start at the top
    pushl %ebx              # EBX holds the multiboot info pointer

    call kernel_main

    cli
    hlt

.failsafeloop:
    jmp .failsafeloop

.size _start, . - _start

Lines 4-8 of the listing set symbols to expressions for the multiboot header. The .set assembler directive works a little like #define in C, except that instead of a textual search-and-replace it actually assigns a value to the symbol, so it expects an expression there. Next, we plonk down the multiboot header in its own section, with the intent of placing that section at the very beginning of the binary in our linker script. Sections are just named parts of the output which can be called anything you like, but depending on what you’re building you’re going to need a few “canonical” sections like .text (if you’re building an executable), .data and .bss.
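If the checksum expression looks like magic, this C sketch may help. It computes the same value as MULTIBOOT_CHECKSUM and then scans an image roughly the way I understand a Multiboot loader to do it: looking at 4-byte aligned offsets within the first 8 KiB for three words that sum to zero. The helper names are my own, and the scan assumes a little-endian host:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MULTIBOOT_MAGIC        0x1BADB002u
#define MULTIBOOT_ALIGN        (1u << 0)
#define MULTIBOOT_MEMINFO      (1u << 1)
#define MULTIBOOT_SEARCH_BYTES 8192u  /* header must lie within the first 8 KiB */

/* Same expression as MULTIBOOT_CHECKSUM in boot.S, in unsigned 32-bit math. */
uint32_t multiboot_checksum(uint32_t flags)
{
    return (uint32_t)0 - (MULTIBOOT_MAGIC + flags);
}

/* Hypothetical helper: return the byte offset of a valid header, or -1.
   Reads words in host byte order, so it assumes a little-endian host. */
long find_multiboot_header(const uint8_t *image, size_t size)
{
    size_t limit = size < MULTIBOOT_SEARCH_BYTES ? size : MULTIBOOT_SEARCH_BYTES;
    for (size_t off = 0; off + 12 <= limit; off += 4) {
        uint32_t w[3];  /* magic, flags, checksum */
        memcpy(w, image + off, sizeof w);
        if (w[0] == MULTIBOOT_MAGIC && (uint32_t)(w[0] + w[1] + w[2]) == 0)
            return (long)off;
    }
    return -1;
}
```

With the two flags used in boot.S the checksum works out to 0xE4524FFB. The 8 KiB search window is why the header section has to end up near the front of the binary.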

The next section is the bootstrap stack, which isn’t a “canonical” section in binaries, but the assembler lets you create any kind of section that you like for your own nefarious ends, and you can describe the properties of that section with attributes. A bootstrap stack is the stack you’re going to use while the kernel is bootstrapping itself. Code usually needs a stack to be able to do things like use stack variables and make function calls, so we’re going to have to reserve some space for it. For the attributes we use “aw” and @nobits. The “aw” means “allocatable and writable”: writable (which implies readable too) because you’d want to write to it, obviously, and allocatable because it’s loaded into memory at runtime. The @nobits attribute indicates that the section takes up no space in the file on disk; it exists only at runtime and is expected to be zero-filled when loaded, much like .bss. The syntax is funky if you’ve never seen it before, but you won’t need to dabble with assembler syntax very much if you’re writing most of your code in C, like I’m doing here.
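Here is a rough C model of that reserved space and of what a push does to it. This is an illustration rather than Treehouse code: the array stands in for the .bootstrap_stack section, and model_pushl is a made-up name for what the CPU does on a 32-bit push.

```c
#include <stdint.h>

#define BOOT_STACK_BYTES 16384u  /* same 16 KiB as ".skip 16384" */
#define BOOT_STACK_WORDS (BOOT_STACK_BYTES / sizeof(uint32_t))

/* Stand-in for the .bootstrap_stack section: zero-initialised static storage. */
static uint32_t boot_stack[BOOT_STACK_WORDS];

uint32_t *boot_stack_bottom(void) { return boot_stack; }
uint32_t *boot_stack_top(void)    { return boot_stack + BOOT_STACK_WORDS; }

/* Model of a 32-bit push: move the stack pointer down one word,
   then store the value at the new top of stack. */
uint32_t *model_pushl(uint32_t *sp, uint32_t value)
{
    sp -= 1;
    *sp = value;
    return sp;
}
```

Because each push moves the pointer towards lower addresses, the stack pointer has to start at the label after the reserved space, not the one before it.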

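Next up is the .text section, which starts with the _start function, the literal starting point of execution for our kernel. As you can see, all it does is load the address of stack_top into the stack pointer register (ESP), push the EBX register onto the stack, and call the main function (kernel_main), which we’re going to implement in C. We use stack_top rather than stack_bottom because the x86 stack grows downwards. A Multiboot bootloader leaves the address of its boot information structure in EBX, so pushing it makes that address the first argument to kernel_main.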
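On the C side, that pushed value shows up as an ordinary function argument. The sketch below is not Treehouse’s actual kernel_main; the signature, struct name and helper are assumptions for illustration. It declares only the first three fields of the Multiboot information structure, which is enough to read the basic memory sizes requested by MULTIBOOT_MEMINFO:

```c
#include <stddef.h>
#include <stdint.h>

#define MULTIBOOT_INFO_MEMORY (1u << 0)  /* mem_lower/mem_upper are valid */

/* First three fields of the Multiboot information structure; the real
   structure continues with further fields that this sketch ignores. */
struct multiboot_info_head {
    uint32_t flags;
    uint32_t mem_lower;  /* KiB of memory starting at address 0 */
    uint32_t mem_upper;  /* KiB of memory starting at 1 MiB */
};

/* Hypothetical helper: KiB reported by the bootloader, or 0 if absent. */
uint32_t reported_memory_kib(const struct multiboot_info_head *info)
{
    if (info == NULL || (info->flags & MULTIBOOT_INFO_MEMORY) == 0)
        return 0;
    return info->mem_lower + info->mem_upper;
}

/* Assumed signature: the value pushed from EBX arrives as the first
   cdecl argument. */
void kernel_main(const struct multiboot_info_head *info)
{
    uint32_t kib = reported_memory_kib(info);
    (void)kib;  /* a real kernel might size its allocator from this */
}
```

Checking the flags bit before trusting the fields matters, since the bootloader is only obliged to fill in what it marks as valid.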

Now, we don’t ever expect to return from kernel_main, but in case it does, we have several failsafes. First are the cli and hlt instructions, which disable interrupts and halt the CPU; in case that fails too, I have an infinite loop there in the form of .failsafeloop.

Finally, the last line sets the size of the _start symbol to the current location minus its start. This comes from the OSDev tutorial and will be useful further down when I do call tracing and debugging, so we’ll talk about it later when it comes up.

Next up, we’ll talk about the linker script!


Treehouse: A Beginner’s OS on x86

I am currently trying to learn how to write an operating system, from scratch, for the fun of it. I have a few ARM boards, but I also want to get my hands on the Intel Edison, which is a 32-bit 500MHz dual-core Atom SoC designed for the internet of thingamajigs. It’ll be a while before I even get something usable running, but I’ve started to write a little kernel called Treehouse, whose progress you can follow on my GitHub page.

I don’t have a lot of time to work on it, so progress will be slow, but I would like to document what I do on this blog, mainly for my own records, but if you’re interested you can try it out too. Currently you can run it within QEMU or Bochs, and it boots up and prints a little banner. To compile it you’ll need a baremetal 32-bit x86 compiler, or a cross-compiler if you’re on something else, like amd64.

 


Toolchains For Building A Baremetal Kernel

If you’re writing your own kernel, you’ll need a compiler for it. The one that comes packaged with your Linux distribution may not cut it, because it’s not a baremetal or freestanding compiler; it expects a C library to be present. When you’re starting off your OS, there won’t be any C library to link to. You’ll need a baremetal compiler at this stage.
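One way to see the difference from inside the code is the standard __STDC_HOSTED__ macro, which is 0 when compiling with GCC’s -ffreestanding option and 1 otherwise. The sketch below sticks to headers that a freestanding compiler still provides (stddef.h and stdint.h) and supplies its own string-length routine; the function names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* 1 when the compiler assumes a hosted C library, 0 under -ffreestanding. */
int built_hosted(void)
{
    return __STDC_HOSTED__;
}

/* With no C library to link against, the kernel supplies its own basics.
   The k_ prefix is only there to avoid clashing with a hosted strlen. */
size_t k_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}
```

Even in freestanding mode GCC may emit calls to memcpy, memmove, memset and memcmp, so a kernel typically ends up providing those as well.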

Obtaining The Compiler

There are a number of ways you can get yourself a compiler for building a kernel project; the simplest is to find one already built for your system. However, it also isn’t too hard to build one from scratch, and a number of guides exist on how to do this. The best of these is probably the OSDev Wiki guide, which worked flawlessly when I tried it today. If you’re building from scratch and assuming you want to use GCC as your compiler, you basically need the following sources and build tools:

  • GCC
  • Binutils
  • G++ (for building GCC)
  • GNU Make
  • Flex
  • GNU GMP
  • GNU MPFR
  • GNU MPC

There are optional libraries called CLooG and ISL which I left out, because they often aren’t in sync with the latest GCC. Also, if you’re building on Ubuntu, you’ll need to apt-get install the following packages: build-essential, bison, gawk, and texinfo.

Target Triplets

You need one compiler for each instruction set architecture (ISA) you’re going to target for your operating system. I intend to target ARM but I am building on an x86 workstation, so I want an ARM cross-compiler. The procedure is the same for any architecture you target; you simply need to be aware of the compiler’s target triplet for your chosen architecture.

There is something that I learned about ARM target triplets which isn’t entirely clear from most guides; some have claimed there is no difference between “EABI” and “GNUEABI”. This isn’t true. The GNUEABI option expects a GNU C library to be present, whereas EABI is the one you want for baremetal kernel building. This post by Mark Mitchell about compiling U-Boot enlightened me on the differences between the two:

U-Boot is not a GNU/Linux application. However, you’re using the
GNU/Linux toolchain to compile it — so the libraries assume the
presence of a GNU/Linux C library. In this case, the division routine
wants to call “raise” to signal a division-by-zero exception.

People often try to abuse the GNU/Linux toolchain to build U-Boot
because they want to use the same toolchain that they use to build the
Linux kernel and GNU/Linux applications. But, U-Boot is really a
bare-metal application, and, as such, should be built with a bare-metal
toolchain, like our “ARM EABI” toolchains. There are often these kinds
of problems with U-Boot when moving between different architectures or
toolchain versions because the U-Boot source code has tricks to try to
make the GNU/Linux toolchain work, and those tricks only work with
particular toolchains.
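To illustrate the point about the division routine: on cores without a hardware divide instruction, a plain a / b in C becomes a call into the compiler’s runtime library, and that routine has to decide what to do about a zero divisor. The sketch below is my own simplified model, not the actual libgcc code. A GNU/Linux flavoured runtime would call raise() where this one calls a kernel-provided hook, and all the names here are hypothetical:

```c
#include <stdint.h>

static int div0_hits;

/* Hypothetical bare-metal hook for division by zero. A runtime built for
   GNU/Linux would call raise() here, which a kernel does not have. */
void k_div0(void)       { div0_hits++; }
int  k_div0_count(void) { return div0_hits; }

/* Shift-and-subtract unsigned division, the kind of routine a compiler
   runtime supplies when the CPU cannot divide in hardware. */
uint32_t k_udiv(uint32_t n, uint32_t d)
{
    if (d == 0) {
        k_div0();
        return 0;
    }
    uint32_t q = 0, r = 0;
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1u);  /* bring down the next bit */
        if (r >= d) {
            r -= d;
            q |= (uint32_t)1 << i;
        }
    }
    return q;
}
```

The only part that depends on the environment is what happens in the zero-divisor branch, which is the dependency that differs between a bare-metal runtime and one that assumes a C library.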

Once you have your cross-compiler, you’ll probably want a cross GDB for debugging too. When compiling GDB, you need to set its target triplet to match the one used for your cross-compiler. That took me a while to figure out.

 
