What is the role of a bootloader, and why do you need one? Couldn't you just load the operating system kernel directly instead?
Overview
The bootloader is the first code running on your system that you control, and the second piece of code that runs after reset (after the BootROM). It is responsible for overseeing the tasks needed for the rest of the system to become functional. Due to its position in the boot process, it may have the only level of privilege appropriate for handling firmware updates. In some cases, it may also be combined with the operating system kernel.
Initialize DRAM
One of the most important roles for a bootloader on modern platforms is DRAM initialization. In my previous post, I mentioned that on new systems, the bootloader is often executed from SRAM and is therefore responsible for enabling DRAM. This approach is significantly more flexible for the developer, at the cost of the developer now having to implement DRAM initialization.
Because the BootROM is no longer responsible for enabling DRAM, it is possible to make design choices such as encoding a number of possible timings into the bootloader's software, and then reading a small number of hardware configuration GPIOs or fuses to determine which timings to use.
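As a minimal sketch of the strap-selection idea, assume a board with two configuration GPIOs and a table of three memory parts. The `dram_timing` fields, the `select_dram_timing` function, and the timing values are all hypothetical placeholders; a real DRAM controller needs many more parameters.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical DRAM timing parameters; real controllers need many more fields. */
typedef struct {
    uint32_t freq_mhz;
    uint8_t  cl;    /* CAS latency, in clock cycles */
    uint8_t  trcd;  /* RAS-to-CAS delay, in clock cycles */
    uint8_t  trp;   /* row precharge time, in clock cycles */
} dram_timing;

/* One entry per supported memory part, indexed by the board's strap value.
   The numbers are placeholders, not values for any specific part. */
static const dram_timing dram_timings[] = {
    { 1600, 11, 11, 11 },  /* strap 0: hypothetical part A */
    { 1866, 13, 13, 13 },  /* strap 1: hypothetical part B */
    { 2133, 15, 15, 15 },  /* strap 2: hypothetical part C */
};

/* Decode the strap from two configuration GPIO levels (bit 0 and bit 1).
   Returns NULL for an unknown strap so the caller can halt instead of
   programming the controller with the wrong timings. */
const dram_timing *select_dram_timing(int gpio0, int gpio1) {
    unsigned strap = (unsigned)(gpio0 & 1) | ((unsigned)(gpio1 & 1) << 1);
    if (strap >= sizeof dram_timings / sizeof dram_timings[0])
        return NULL;
    return &dram_timings[strap];
}
```

The same board binary can then ship with several memory vendors; only the strap resistors (or fuses) differ between builds.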
The drawback is that the bootloader must fit into SRAM, and 128 KiB may not be enough to hold all of the boot logic. In that case, a second stage can be loaded into DRAM once DRAM is initialized, giving it access to all of system memory instead of just the narrow SRAM window.
Initialize secure monitor
The initialization of the secure monitor should be done early in the boot process. This is because the operating system will depend on the secure monitor firmware for power management, and the secure monitor binary needs to occupy a specific region in physical memory.
The BootROM starts at the secure monitor privilege level and installs its own call vectors to allow loading a new image. Enabling the secure monitor requires the bootloader to load code into a specific region in memory and then call back into the BootROM to replace the secure monitor. The new code then runs and returns to the bootloader, where the boot process proceeds as normal.
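The hand-off can be sketched in hosted C, assuming a static buffer stands in for the fixed physical region and the BootROM's replace-monitor service is modeled as a function pointer. Both `secmon_region` and `bootrom_install_fn` are hypothetical; on real hardware the call would go through the platform-specific vectors the BootROM installed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed region reserved for the secure monitor. On real
   hardware this is a platform-defined physical address range; a static
   buffer stands in for it here. */
#define SECMON_REGION_SIZE 4096u
static uint8_t secmon_region[SECMON_REGION_SIZE];

/* Hypothetical BootROM service: replaces the running secure monitor with the
   image in `region` and returns once the new monitor has initialized.
   Returns 0 on success. */
typedef int (*bootrom_install_fn)(const void *region, size_t len);

/* Copy the new monitor into its fixed region, then call back into the
   BootROM to install it. Control returns here afterwards and the boot
   process continues. */
int install_secure_monitor(const uint8_t *image, size_t len,
                           bootrom_install_fn rom_install) {
    if (len == 0 || len > SECMON_REGION_SIZE) return -1;  /* must fit the region */
    if (rom_install == NULL) return -1;
    memcpy(secmon_region, image, len);
    return rom_install(secmon_region, len);
}
```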
Firmware update and anti-rollback
The bootloader is likely to be involved in device firmware updates. Though updates from userspace are preferred for file contents, it is common to add a bootloader mode that updates the system, often over a USB device port. Such code must be handled with extreme care: USB is a common source of vulnerabilities, and the update binary itself must also be verified.
The bootloader itself may blow an anti-rollback fuse when it is updated. At startup, it may check the anti-rollback fuses to ensure that only the correct number have been blown, and it may check the operating system's version during verification to ensure that only the correct version of the operating system is loaded.
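A minimal sketch of both checks, assuming a 32-bit fuse bank where each blown fuse represents one security version increment. The function names, the `arb_status` values, and the "burn on first boot of a new version" interpretation are assumptions for illustration.

```c
#include <stdint.h>

/* Count blown fuses in a hypothetical 32-bit anti-rollback fuse bank. */
static unsigned fuse_count(uint32_t bank) {
    unsigned n = 0;
    while (bank) {
        bank &= bank - 1;  /* clear the lowest set bit */
        n++;
    }
    return n;
}

typedef enum { ARB_OK, ARB_ROLLBACK, ARB_NEEDS_BURN } arb_status;

/* Bootloader self-check: the number of blown fuses should equal the
   bootloader's own security version. More fuses than expected means this
   image is older than one that already ran (a rollback attempt); fewer
   means this is the first boot after an update and a fuse must be blown. */
arb_status check_bootloader_version(uint32_t fuse_bank, unsigned my_version) {
    unsigned blown = fuse_count(fuse_bank);
    if (blown > my_version) return ARB_ROLLBACK;
    if (blown < my_version) return ARB_NEEDS_BURN;
    return ARB_OK;
}

/* OS check during verification: refuse any OS image whose (signed) version
   is below the minimum this bootloader release accepts. */
int os_version_allowed(unsigned os_version, unsigned min_os_version) {
    return os_version >= min_os_version;
}
```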
For systems which use a redundant partitioning scheme, the bootloader is also responsible for handling switching between the redundant partitions, as well as automatic rollback on repeated boot failures.
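Slot switching with automatic rollback can be sketched with a small per-slot record of priority, remaining boot attempts, and a success flag. The `slot_meta` layout and `select_boot_slot` are hypothetical; the assumption is that the OS sets `successful` once it has fully booted, and that the bootloader persists the updated records before jumping to the OS.

```c
#include <stdint.h>

/* Hypothetical per-slot metadata, stored in persistent storage. */
typedef struct {
    uint8_t priority;        /* higher wins; 0 means unbootable */
    uint8_t tries_remaining; /* consumed by boot attempts that have not yet succeeded */
    uint8_t successful;      /* set by the OS once it has booted fully */
} slot_meta;

/* Pick the slot to boot from slots[0] (A) and slots[1] (B). A slot not yet
   marked successful consumes one try per attempt; once its tries run out it
   is marked unbootable, so the next selection falls back to the other slot.
   Returns the slot index, or -1 if neither slot can be booted. */
int select_boot_slot(slot_meta slots[2]) {
    for (;;) {
        int best = -1;
        for (int i = 0; i < 2; i++) {
            if (slots[i].priority == 0) continue;
            if (best < 0 || slots[i].priority > slots[best].priority) best = i;
        }
        if (best < 0) return -1;

        slot_meta *s = &slots[best];
        if (s->successful) return best;
        if (s->tries_remaining > 0) {
            s->tries_remaining--;
            return best;
        }
        s->priority = 0;  /* out of tries: mark unbootable and retry selection */
    }
}
```

After an update, the new slot gets the higher priority and a few tries; if it keeps failing before the OS marks it successful, the loop above rolls back to the old slot without user intervention.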
Decompress the kernel
Even if your kernel image is small enough that you can read it from storage reasonably quickly, you should consider compressing the kernel with an algorithm like LZ4. This speeds up boot on all but the highest-performance storage, because decompressing is usually faster than reading the extra bytes.
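The trade-off can be estimated with a simple model. The symbols are introduced here for illustration: $S$ is the uncompressed kernel size, $r$ the compressed-to-uncompressed size ratio ($0 < r < 1$), $B_s$ the storage read bandwidth, and $B_d$ the decompressor's output throughput. The model assumes reading and decompressing happen one after the other rather than overlapping.

```latex
\begin{aligned}
t_{\text{raw}} &= \frac{S}{B_s} \\
t_{\text{lz4}} &= \frac{rS}{B_s} + \frac{S}{B_d} \\
t_{\text{lz4}} < t_{\text{raw}}
  &\iff \frac{r}{B_s} + \frac{1}{B_d} < \frac{1}{B_s}
  \iff B_s < (1 - r)\,B_d
\end{aligned}
```

With illustrative numbers (not measurements) of $r = 0.5$ and $B_d = 1000$ MB/s, compression wins whenever storage reads slower than 500 MB/s, which is why only the fastest storage escapes the benefit.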
The way this is typically done is:
- The finalized kernel image is generated by a linker script.
- A loader stub, written in C or assembly, is the first part of the kernel that runs. At link time, it is filled in with the relative offset and size of the compressed kernel appended to it.
- The loader stub decompresses the kernel into memory. For LZ4, the decompressor is only about 50 lines of code.
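The decompression step can be sketched as a decoder for the LZ4 block format (not the LZ4 frame format, which adds headers and checksums). This is a minimal version assuming the whole compressed kernel and an output buffer of known capacity are available; `lz4_decompress_block` is a hypothetical name, not a function from the reference LZ4 library.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode one LZ4 block. Returns the number of bytes written to dst,
   or -1 if the input is malformed or would overflow dst. */
long lz4_decompress_block(const uint8_t *src, size_t src_len,
                          uint8_t *dst, size_t dst_cap) {
    size_t ip = 0, op = 0;

    while (ip < src_len) {
        uint8_t token = src[ip++];

        /* Literal length: high nibble, extended by bytes while they are 255. */
        size_t lit_len = token >> 4;
        if (lit_len == 15) {
            uint8_t b;
            do {
                if (ip >= src_len) return -1;
                b = src[ip++];
                lit_len += b;
            } while (b == 255);
        }
        if (lit_len > src_len - ip || lit_len > dst_cap - op) return -1;
        memcpy(dst + op, src + ip, lit_len);
        ip += lit_len;
        op += lit_len;

        /* The last sequence ends after its literals. */
        if (ip == src_len) break;

        /* Match: 16-bit little-endian offset back into the output. */
        if (src_len - ip < 2) return -1;
        size_t offset = (size_t)src[ip] | ((size_t)src[ip + 1] << 8);
        ip += 2;
        if (offset == 0 || offset > op) return -1;

        /* Match length: low nibble plus the minimum match of 4, extended likewise. */
        size_t match_len = (size_t)(token & 0x0F) + 4;
        if ((token & 0x0F) == 15) {
            uint8_t b;
            do {
                if (ip >= src_len) return -1;
                b = src[ip++];
                match_len += b;
            } while (b == 255);
        }
        if (match_len > dst_cap - op) return -1;

        /* Copy byte by byte, since a match may overlap its own output. */
        for (size_t i = 0; i < match_len; i++, op++)
            dst[op] = dst[op - offset];
    }
    return (long)op;
}
```

In the loader stub, `src` and `src_len` would come from the offset and size filled in at link time, and `dst` would be the kernel's load address.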
Load the device tree and initial processes
The device tree will most likely be passed to the kernel by the bootloader and describes the hardware present on the system. The kernel should keep it mapped in memory and accessible to driver processes.
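Before keeping the blob mapped, the kernel can sanity-check the flattened device tree header it received. Per the Devicetree Specification, the header starts with the big-endian magic `0xd00dfeed`, `totalsize` is at offset 4, `last_comp_version` is at offset 24, and the version 17 header is 40 bytes long. The function name `fdt_check_header` here is hypothetical (libfdt has its own API), and this sketch only checks the header, not the structure block.

```c
#include <stddef.h>
#include <stdint.h>

#define FDT_MAGIC 0xd00dfeedu
#define FDT_V17_HEADER_SIZE 40u

/* All FDT header fields are big-endian 32-bit values. */
static uint32_t be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Validate the header of the blob the bootloader handed over and return its
   total size in bytes, or 0 if it looks wrong. The kernel can then map
   exactly that many bytes for driver processes to read. */
uint32_t fdt_check_header(const uint8_t *blob, size_t avail) {
    if (avail < FDT_V17_HEADER_SIZE) return 0;
    if (be32(blob) != FDT_MAGIC) return 0;
    uint32_t totalsize = be32(blob + 4);
    uint32_t last_comp_version = be32(blob + 24);
    if (totalsize < FDT_V17_HEADER_SIZE || totalsize > avail) return 0;
    if (last_comp_version > 17) return 0;  /* newer format than we understand */
    return totalsize;
}
```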
It is possible to have the kernel initialize storage and filesystem access by itself to load the initial processes. However, it is more robust if all filesystem management occurs in userspace driver processes. The best way to do this is to have the bootloader pass a bundle of initial process binaries as well; Linux calls this strategy initramfs.
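As an illustration of how the kernel might locate a binary inside such a bundle, here is a sketch using a deliberately simple, hypothetical layout (it is not initramfs's cpio format): the magic `IPB1`, a little-endian entry count, then fixed-size entries of name, offset, and size. `ipb_find` and the layout constants are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bundle layout (little-endian):
     "IPB1" | u32 count | count * { char name[24]; u32 offset; u32 size } | data
   Offsets are relative to the start of the bundle. */
#define IPB_NAME_LEN 24
#define IPB_ENTRY_LEN (IPB_NAME_LEN + 8)

static uint32_t le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Find a process binary by name. On success, stores a pointer into the
   bundle and its size, and returns 0; returns -1 if the bundle is malformed
   or the name is absent. Every entry is bounds-checked so a malformed
   bundle cannot make the kernel read outside it. */
int ipb_find(const uint8_t *bundle, size_t len, const char *name,
             const uint8_t **out, size_t *out_size) {
    if (len < 8 || memcmp(bundle, "IPB1", 4) != 0) return -1;
    uint32_t count = le32(bundle + 4);
    if (count > (len - 8) / IPB_ENTRY_LEN) return -1;

    for (uint32_t i = 0; i < count; i++) {
        const uint8_t *e = bundle + 8 + (size_t)i * IPB_ENTRY_LEN;
        uint32_t off = le32(e + IPB_NAME_LEN);
        uint32_t size = le32(e + IPB_NAME_LEN + 4);
        if (off > len || size > len - off) return -1;
        if (strncmp((const char *)e, name, IPB_NAME_LEN) == 0) {
            *out = bundle + off;
            *out_size = size;
            return 0;
        }
    }
    return -1;
}
```

The kernel would typically use this only to start the first few processes (for example, a filesystem driver and a process loader); everything after that can come from storage through userspace.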
Chain of trust
Assuming the BootROM has already cryptographically validated the bootloader, and the bootloader has validated the bundle of (kernel, device tree, initial processes), then the chain of trust is still intact at this point. To maintain it, the integrity of each loaded process must be verified. One possible design that respects the chain of trust is to delegate a specific process to be the "process loader" for the system - it calls into the filesystem to read the process binary, verifies the header signature, and only then can it create the process.
Cryptography-free kernel
For security purposes, it is recommended to keep the kernel as small as possible. This means you should generally keep extraneous functionality like drivers and cryptography out of it. Non-cryptographic random number generation, for example, can be handled by a Mersenne Twister that is updated with timing information from started processes. Applications that require cryptographically secure random numbers should have their own system for generating them, which might rely on other processes such as drivers.
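A Mersenne Twister is small enough to live in the kernel. The sketch below is the standard 32-bit MT19937 (seeding, twist, and tempering as in the reference implementation). The `mt_mix_timing` hook, which XORs a timestamp into the state, is a hypothetical way to realize "updated with timing information" and is not part of MT19937 itself.

```c
#include <stdint.h>

#define MT_N 624
#define MT_M 397

typedef struct {
    uint32_t s[MT_N];
    int      i;  /* index of the next state word to temper; MT_N forces a twist */
} mt19937;

void mt_seed(mt19937 *mt, uint32_t seed) {
    mt->s[0] = seed;
    for (int k = 1; k < MT_N; k++)
        mt->s[k] = 1812433253u * (mt->s[k - 1] ^ (mt->s[k - 1] >> 30)) + (uint32_t)k;
    mt->i = MT_N;
}

/* Regenerate all 624 state words in place (the "twist"). */
static void mt_twist(mt19937 *mt) {
    for (int k = 0; k < MT_N; k++) {
        uint32_t y = (mt->s[k] & 0x80000000u) | (mt->s[(k + 1) % MT_N] & 0x7fffffffu);
        uint32_t v = mt->s[(k + MT_M) % MT_N] ^ (y >> 1);
        if (y & 1u) v ^= 0x9908b0dfu;
        mt->s[k] = v;
    }
    mt->i = 0;
}

uint32_t mt_next(mt19937 *mt) {
    if (mt->i >= MT_N) mt_twist(mt);
    uint32_t y = mt->s[mt->i++];
    /* Tempering. */
    y ^= y >> 11;
    y ^= (y << 7) & 0x9d2c5680u;
    y ^= (y << 15) & 0xefc60000u;
    y ^= y >> 18;
    return y;
}

/* Hypothetical mixing hook: fold a timestamp taken when a process starts
   into the state word that will be used next. This perturbs the sequence
   but does NOT make the generator cryptographically secure. */
void mt_mix_timing(mt19937 *mt, uint64_t ticks) {
    int k = mt->i % MT_N;
    mt->s[k] ^= (uint32_t)ticks ^ (uint32_t)(ticks >> 32);
}
```

The kernel could call `mt_mix_timing` with a cycle-counter reading each time it starts a process. Keep in mind that MT19937's full state can be reconstructed from 624 consecutive outputs, which is why applications needing cryptographic randomness must use a separate source.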
Other security considerations
Data verification is often just as important as process verification, or more so. It is important to design your filesystem driver to validate the integrity of files as well as their authenticity.
Avoid system call designs that imitate functionality like fork. This functionality may seem attractive, but in practice it results in complicated copy-on-write code and causes pathological system behavior that affects availability and makes swap mandatory for sane memory reclamation (consider what happens when a process that is consuming all memory forks).
When your bootloader is the operating system
In this case, the BootROM will load an uncompressed version of the bootloader directly into DRAM or SRAM. This provides the fastest possible startup time, but can be very limiting due to the small loadable area. Use this strategy only when boot performance is the most important aspect of the system.