The Hello World Kernel Module

Cave programmers carving Hello World on cave computers


As a brief introduction to this (very long) essay: what lies below are my notes from completing the 1st in a set of 20 tasks from The Eudyptula Challenge. The tasks were emailed one at a time, starting with building a "hello world" kernel module (this essay) and progressing in difficulty until we ultimately submit patches into the main tree of the Linux kernel. The goal of The Eudyptula Challenge is to get new developers comfortable with the somewhat unique world of kernel development by breaking the "on-boarding" process into focused, manageable tasks.

Sadly, The Eudyptula Challenge is no longer accepting new applicants. However, if you wish to work on the tasks yourself, I've published the 20 tasks I've managed to find along with the code I used to "complete" them in a git repo here.

Task No.1

Write a Linux kernel module, and stand-alone Makefile, that when loaded prints to the kernel debug log level, "Hello World!" Be sure to make the module be able to be unloaded as well.

The Makefile should build the kernel module against the source for the currently running kernel, or, use an environment variable to specify what kernel tree to build it against.

Please show proof of this module being built, and running, in your kernel. What this proof is is up to you, I'm sure you can come up with something. Also be sure to send the kernel module you wrote, along with the Makefile you created to build the module.

—Little Penguin

What Is A Module

A kernel module is a piece of code designed to be loaded and unloaded on demand by our kernels. For example, the device drivers for your keyboard or network card are often built as modules. By separating the kernel into individual software components, we can keep the overall size of the kernel small, letting Linux fit into the smallest of embedded systems. Loadable modules, like the one we'll be building, can even be installed without recompiling and rebooting our kernel, making upgrades easy and saving us a lot of time.

If you have access to a Linux machine, you can find the modules that are currently loaded into the kernel by using the lsmod command, which gets its information from /proc/modules.
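Each line of /proc/modules starts with the module's name, its size in bytes, and how many users currently hold it, followed by its dependencies, its state (such as Live), and its load address. As a rough userspace sketch (struct module_entry, parse_module_line() and module_is_loaded() are hypothetical names of my own, not part of lsmod), reading just those first three fields is enough to check whether a module like ours is loaded:

```c
#include <stdio.h>
#include <string.h>

/* The first three whitespace-separated fields of a /proc/modules line. */
struct module_entry {
	char name[64];		/* module names are always shorter than this */
	unsigned long size;	/* memory used by the module, in bytes */
	int refcount;		/* current number of users */
};

/* Parse one line; returns 0 on success, -1 if the line is malformed. */
int parse_module_line(const char *line, struct module_entry *out)
{
	if (sscanf(line, "%63s %lu %d", out->name, &out->size, &out->refcount) != 3)
		return -1;
	return 0;
}

/* Returns 1 if a module called name is loaded, 0 if not, -1 on I/O error. */
int module_is_loaded(const char *name)
{
	char line[4096];
	struct module_entry e;
	int found = 0;
	FILE *fp = fopen("/proc/modules", "r");

	if (!fp)
		return -1;
	while (!found && fgets(line, sizeof(line), fp))
		if (parse_module_line(line, &e) == 0 && strcmp(e.name, name) == 0)
			found = 1;
	fclose(fp);
	return found;
}
```

After the insmod step later on, module_is_loaded("hello_world") should return 1; it returns -1 when /proc/modules can't be opened.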

Chiseling A Cave Module

Every kernel module needs a function that is called when we install the module and, if the module should be removable, another function to remove it cleanly from the kernel. Back in the pre v2.3 era (the late 1990s) this could only be done with a "start" function, called init_module(), and an "end" function, called cleanup_module(). There are more modern (and preferred) methods available to us today, however some developers still use these, so they're a great starting point.

#include <linux/kernel.h>  /* for KERN_DEBUG */
#include <linux/module.h>  /* for all kernel modules */

int init_module(void)
{
        printk(KERN_DEBUG "Hello World.\n");
        return 0; /* init_module loaded successfully */
}

void cleanup_module(void)
{
        printk(KERN_DEBUG "oh, the rest is silence.\n");
}

Typically init_module() registers handlers or otherwise hooks the module into another part of the kernel, for example to drive a device. cleanup_module() then undoes those changes, allowing the module to be removed safely from the kernel. Both functions are declared in linux/module.h (around line 75 as of version 5.7), which also provides everything else we need.

printk() != printf()

To print Hello World on "the kernel debug log level", we'll need to use another, very old, function called printk(). Unlike the printf() commonly used in userspace applications, printk() is not designed to communicate to the user (or say hello to worlds). It's a logging mechanism used to give warnings and to log messages. This is why each printk() statement also comes with a priority. There are currently 8 defined priorities we can use ranging from KERN_DEBUG to KERN_EMERG. You can see them all, and their definitions, currently (version 5.7) in linux/kern_levels.h in the source code.

Pay attention to how many arguments we pass to printk(). Looking into the source code shows that printk(const char *fmt, ...) takes a single format string, followed by optional extra arguments used to fill in that format. Since there is no comma between KERN_DEBUG and our message, they are not two separate arguments. Our "Hello World" statement from above passes just one format string and, because it needs no formatting, no extra arguments:

printk(KERN_DEBUG "Hello World.\n");

The KERN_DEBUG macro will expand to "\001" "7", turning our statement into:

printk("\001" "7" "Hello World.\n");

The compiler will then join these adjacent string literals into a single format string for the kernel to log (an octal escape stops after at most three digits, so \001 and the 7 remain separate characters):

printk("\0017Hello World.\n");
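We can watch the same concatenation happen in an ordinary userspace C program. In this sketch the KERN_* macros are copied from linux/kern_levels.h, while printk_level() and printk_text() are hypothetical helpers; the kernel's real parsing lives in printk_get_level(), which also understands special prefixes such as KERN_CONT:

```c
/* Copies of the definitions in include/linux/kern_levels.h. */
#define KERN_SOH	"\001"		/* ASCII Start Of Header */
#define KERN_EMERG	KERN_SOH "0"	/* system is unusable */
#define KERN_ALERT	KERN_SOH "1"	/* action must be taken immediately */
#define KERN_CRIT	KERN_SOH "2"	/* critical conditions */
#define KERN_ERR	KERN_SOH "3"	/* error conditions */
#define KERN_WARNING	KERN_SOH "4"	/* warning conditions */
#define KERN_NOTICE	KERN_SOH "5"	/* normal but significant condition */
#define KERN_INFO	KERN_SOH "6"	/* informational */
#define KERN_DEBUG	KERN_SOH "7"	/* debug-level messages */

/* Return the log level at the front of a printk()-style format string,
 * or -1 when it has no KERN_<LEVEL> prefix (simplified: only 0-7). */
int printk_level(const char *fmt)
{
	if (fmt[0] == KERN_SOH[0] && fmt[1] >= '0' && fmt[1] <= '7')
		return fmt[1] - '0';
	return -1;
}

/* Skip the two-byte level prefix, returning the text that gets logged. */
const char *printk_text(const char *fmt)
{
	return printk_level(fmt) >= 0 ? fmt + 2 : fmt;
}
```

For example, printk_level(KERN_DEBUG "Hello World.\n") returns 7, and sizeof("\0017") is 3: the characters \001 and 7 plus the terminating NUL.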

Even though printk() is falling out of style with modern Linux maintainers, as we will see in later sections, there is a lot more to read about how to work with printk() and format specifiers in the kernel in the documentation here if you're into that kind of stuff.

Making A Kernel Module

Much like how kernel modules are a little different than userspace application modules, the Makefiles that compile the kernel are also a bit different than Makefiles in userspace.

Originally, as the Linux code-base grew, so did its Makefiles. As they continued to grow in complexity, they eventually became a burden to maintain. Fortunately a solution, called the "kbuild system", was created and accepted into the kernel to help organize and simplify the kernel's building process. If you are interested, there is an entire section about the kbuild system in the documentation.

Kbuild Makefile

Just like Makefiles in userspace, we can start a Kbuild Makefile by creating a new file called …wait for it…Makefile in the same folder as our hello-world.c module we made in the sections above.

$ ls -l
total 8
-rw-rw-r-- 1 me us 903 Jul  5 00:00 hello-world.c
-rw-rw-r-- 1 me us 167 Jul  5 00:00 Makefile

We can alternatively name the file Kbuild to tell other developers that it is meant to be read by the kbuild system. Makefile is the preferred name, but interestingly, if both a Makefile and a Kbuild file exist in the same directory, the Kbuild file will be used. (source)

Goal Definitions

The "heart" of a kbuild Makefile is its "goal definitions": lines that define the files to be built, any special compilation options, and any sub-directories to enter recursively. When we compile the kernel (with its thousands of Makefiles), the goal definitions are collected and used to build all the object files, modules, and other files we need for our particular kernel.

The simplest Kbuild Makefile we can write for our module contains a single line:

obj-m += hello-world.o

obj-m tells kbuild that our hello-world.o object file is a loadable kernel module (LKM) that can be loaded and unloaded at any time without needing to reboot the kernel. This line also tells the kbuild system to look in our directory for a file named hello-world.c or hello-world.S to compile into the hello-world.o object file, which is then turned into the kernel object file hello-world.ko that we'll load into our kernel.

Convenience Targets

For the pure convenience of it, we can add extra phony targets to our Kbuild Makefile to easily compile our module for the kernel currently running on our computer, simplifying the task of compiling our module down to just typing make into our terminals:

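KDIR ?= /lib/modules/$(shell uname -r)/build

all:
	${MAKE} -C $(KDIR) M=$(PWD) modules

Because KDIR is assigned with ?=, the default above is only used when KDIR isn't already set, so we can build against another kernel tree straight from the environment (for example KDIR=/path/to/linux make), just as the task asks. And make clean to clean up everything afterwards:

clean:
	${MAKE} -C $(KDIR) M=$(PWD) clean

.PHONY: all clean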

Both of these phony targets use the -C option to leave our current directory and run make inside the kernel's build directory (by default /lib/modules/$(uname -r)/build, which is usually a symbolic link to the running kernel's source or headers). There, make can find and use the top-level kbuild Makefile, which takes the M option to locate the folder we are currently working in and builds the files defined by the obj-m goal definition we set up above.

Installing A Kernel Module

Just as with our userspace applications, kernel modules need to be compiled. Using the kbuild system, along with our convenience targets above, we can compile our kernel module by issuing the make command, and if all goes well, you should see an output similar to this:

$ make
make -C /lib/modules/4.15.0-108-generic/build M=/home/me/src/eudyptula ...
/tasks/01 modules
make[1]: Entering directory '/usr/src/linux-headers-4.15.0-108-generic'
  CC [M]  /home/me/src/eudyptula/tasks/01/hello-world.o
  Building modules, stage 2.
  MODPOST 1 modules
WARNING: modpost: missing MODULE_LICENSE() in /home/me/src/eudyptula ...
see include/linux/module.h for more information
  CC      /home/me/src/eudyptula/tasks/01/hello-world.mod.o
  LD [M]  /home/me/src/eudyptula/tasks/01/hello-world.ko
make[1]: Leaving directory '/usr/src/linux-headers-4.15.0-108-generic'

The .ko extension was introduced around kernel version 2.6 to help differentiate between userspace object files and kernel object files, which contain a .modinfo section holding extra metadata about the module. We can use the modinfo command to read and interpret the contents of that section:

$ modinfo hello-world.ko
filename:       /home/me/src/eudyptula/tasks/01/hello-world.ko
srcversion:     18005133D4ECFCDD12928D8
depends:
retpoline:      Y
name:           hello_world
vermagic:       4.15.0-108-generic SMP mod_unload
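Under the hood, each MODULE_*() documentation macro appends one NUL-terminated "tag=value" string to the .modinfo section, so MODULE_LICENSE("GPL") ends up as the bytes license=GPL followed by a NUL. As a hedged userspace sketch of how a tool can look up one tag (modinfo_get() is a hypothetical helper, not the modinfo program's actual code):

```c
#include <string.h>

/* Return the value stored for tag in a .modinfo-style buffer of len bytes,
 * or NULL if the tag is absent. Entries are NUL-terminated "tag=value"
 * strings; empty strings (padding) between them are skipped. */
const char *modinfo_get(const char *buf, size_t len, const char *tag)
{
	size_t taglen = strlen(tag);
	const char *p = buf;
	const char *end = buf + len;

	while (p < end) {
		const char *nul = memchr(p, '\0', (size_t)(end - p));
		size_t n;

		if (!nul)
			break;		/* unterminated tail: stop scanning */
		n = (size_t)(nul - p);
		if (n > taglen && strncmp(p, tag, taglen) == 0 && p[taglen] == '=')
			return p + taglen + 1;
		p = nul + 1;		/* move past this string and its NUL */
	}
	return NULL;
}
```

This returns only the first match; tags that can appear more than once, such as alias, would need a loop instead.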

Installing the Module

With our hello-world.c module freshly compiled, we can insert it into our kernel using the insmod command as root or another user with sudo privileges:

$ sudo insmod hello-world.ko

Congratulations, you have created your first kernel module! A quick look at the kernel's diagnostic messages, using dmesg, should show our Hello World. message:

$ dmesg | tail -1
[241745.247591] Hello World.

Removing the Module

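After the well-deserved pat on the back, and when you are ready to continue, we can uninstall our module with the rmmod command as root or someone with sudo privileges. Note that the module is called hello_world, with an underscore: kbuild replaces the dash from our file name, as the name: field in the modinfo output showed.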

$ sudo rmmod hello_world

The only indication we've uninstalled our module will be in dmesg from our printk() statement in the cleanup_module() function.

$ dmesg | tail -1
[241751.401232] oh, the rest is silence.

Kernel Taint

When something happens that could be important to an investigation later on, the kernel marks itself as "tainted". There are plenty of ways we can taint our kernel, but don't worry too much about it: most of the time it is completely fine to run a tainted kernel. The taint flags mostly serve as clues, telling developers who read a bug report whether something like a proprietary or out-of-tree module may have been involved.

We can find our kernel's taint state by reading the /proc/sys/kernel/tainted file. Each way we can taint our kernel is assigned one bit in a bit-field, so any value other than 0 means our kernel is tainted. To decode the bit-field, we can run the tools/debugging/kernel-chktaint script found in the source code:

$ tools/debugging/kernel-chktaint
Kernel is "tainted" for the following reasons:
 * proprietary module was loaded (#0)
 * kernel issued warning (#9)
 * externally-built ('out-of-tree') module was loaded  (#12)
 * unsigned module was loaded (#13)
For a more detailed explanation of the various taint flags see
 Documentation/admin-guide/tainted-kernels.rst in the the Linux kernel sources
 or https://kernel.org/doc/html/latest/admin-guide/tainted-kernels.html
Raw taint value as int/string: 12801/'P        W  OE    '
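The raw string is just the bit-field written out one character per bit. Here is a small userspace sketch of that decoding, assuming the 18 flags listed in Documentation/admin-guide/tainted-kernels.rst as of v5.7 (later kernels may define more); taint_string() and read_taint() are hypothetical helpers:

```c
#include <stdio.h>

/* One letter per taint bit, 0 through 17, in the documented order. */
static const char taint_letters[] = "PFSRMBUDAWCIOELKXT";
#define TAINT_BITS (sizeof(taint_letters) - 1)

/* Write the one-character-per-bit summary into out, which must hold
 * TAINT_BITS + 1 bytes. A clear bit prints as ' ', except bit 0, which
 * prints 'G' (only GPL-compatible modules loaded). */
void taint_string(unsigned long taint, char *out)
{
	for (size_t bit = 0; bit < TAINT_BITS; bit++) {
		if (taint & (1UL << bit))
			out[bit] = taint_letters[bit];
		else
			out[bit] = (bit == 0) ? 'G' : ' ';
	}
	out[TAINT_BITS] = '\0';
}

/* Read the raw bit-field from /proc/sys/kernel/tainted; -1 on error. */
long read_taint(void)
{
	long value = -1;
	FILE *fp = fopen("/proc/sys/kernel/tainted", "r");

	if (!fp)
		return -1;
	if (fscanf(fp, "%ld", &value) != 1)
		value = -1;
	fclose(fp);
	return value;
}
```

Feeding it the 12801 from above (bits 0, 9, 12 and 13) gives back the same P, W, O and E letters the script printed.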

Licensing & Documentation

One of the ways we can taint our kernels is by loading proprietary modules or modules that use licenses not compatible with the General Public License (GPL) (bit 0 in the tainting list). Modules that don't use the MODULE_LICENSE() macro will also be considered proprietary and taint our kernel, if loaded (this is why we saw the warning above).
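Which license strings count as compatible is decided by a fixed list. Below is a userspace copy of that check as I understand it from include/linux/license.h in v5.x kernels (treat the exact list as version-dependent); anything not on it, including a missing MODULE_LICENSE(), is treated as proprietary:

```c
#include <string.h>

/* Userspace copy of the kernel's GPL-compatible MODULE_LICENSE() strings.
 * Returns 1 if the kernel would accept license without setting taint bit 0. */
int license_is_gpl_compatible(const char *license)
{
	return strcmp(license, "GPL") == 0
		|| strcmp(license, "GPL v2") == 0
		|| strcmp(license, "GPL and additional rights") == 0
		|| strcmp(license, "Dual BSD/GPL") == 0
		|| strcmp(license, "Dual MIT/GPL") == 0
		|| strcmp(license, "Dual MPL/GPL") == 0;
}
```

The comparison is exact and case-sensitive, so a plain "MIT" (or "gpl") still sets taint bit 0, whereas "Dual MIT/GPL" does not.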

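There are many documentation macros defined in linux/module.h; some of the basics I added are below. The license string has to be one the kernel recognizes as GPL-compatible: a plain "MIT" silences the build warning but still marks the module as proprietary, so I switched to "Dual MIT/GPL". MODULE_SUPPORTED_DEVICE() did nothing at the time (v5.7) and was removed in v5.12, so drop it on newer kernels.

MODULE_LICENSE("Dual MIT/GPL");
MODULE_AUTHOR("Bryan Brattlof <email@example.com>");
MODULE_DESCRIPTION("A Hello World Driver");
MODULE_SUPPORTED_DEVICE("testdevice"); /* no-op in v5.7, removed in v5.12 */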

Once we add our module's license, author and other information to the end of our hello-world.c module, when we compile our module again using make, the WARNING should be gone:

$ make
make -C /lib/modules/4.15.0-108-generic/build M=/home/me/src/eudyptula ...
make[1]: Entering directory '/usr/src/linux-headers-4.15.0-108-generic'
  CC [M]  /home/me/src/eudyptula/tasks/01/hello-world.o
  Building modules, stage 2.
  MODPOST 1 modules
  CC      /home/me/src/eudyptula/tasks/01/hello-world.mod.o
  LD [M]  /home/me/src/eudyptula/tasks/01/hello-world.ko
make[1]: Leaving directory '/usr/src/linux-headers-4.15.0-108-generic'

There were many reasons to add this system to the kernel. For example, it gives developers an easy way to find who maintains a module, what the module does, and which license the code is released under. It also gives the kernel an easy way to inform users when they are running non-open-source software.

Updating the Module

Everything in my notes, to this point, was needed to complete the 1st task assigned to us by the Little Penguin. However, just like with every software project, the Linux kernel is constantly adding new features and adopting new coding styles, ensuring that my notes will become obsolete as soon as I've written them.

With that said, the sections below, while not technically needed to complete the task, are my notes on the macros and functions I saw in the drivers directory of the Linux source code that I found particularly interesting. These functions are mostly stylistic changes or they introduce functionality that improves efficiency and modularity of the Linux kernel in some way.

module_init() & module_exit()

Introduced in version 2.4 of the kernel, and defined today in linux/module.h (older kernels kept them in linux/init.h), these macros let us rename our "start" and "end" functions to whatever we wish. In this example, I've chosen to rename the "start" function to hello_world_init():

-int init_module(void)
+static int __init hello_world_init(void)
 {
         printk(KERN_DEBUG "Hello World.\n");
         return 0; /* init_module loaded successfully */
 }

And renamed the "exit" function hello_world_exit()

-void cleanup_module(void)
+static void __exit hello_world_exit(void)
 {
         printk(KERN_DEBUG "oh, the rest is silence.\n");
 }

The kernel will then use the module_init() macro to find the function to execute when the module is installed and module_exit() to find the function to cleanup before being removed.

module_init(hello_world_init);
module_exit(hello_world_exit);

To avoid compile errors, the module_init() and module_exit() lines must come after our newly named "start" and "end" functions, because both macros refer to those functions by name.
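For a loadable module, module_init(fn) roughly expands (as of v5.7) to a small type-checking helper plus a declaration that makes init_module an alias of fn, and module_exit() does the same for cleanup_module. The userspace imitation below assumes GCC or Clang on an ELF platform such as Linux, and uses hypothetical demo_* names so it doesn't collide with the real init_module symbol:

```c
#include <stdio.h>

/* Hypothetical stand-ins for the kernel's initcall_t and exitcall_t. */
typedef int (*initcall_t)(void);
typedef void (*exitcall_t)(void);

/* Simplified imitation of module_init()/module_exit() for loadable modules:
 * a helper that makes the compiler check fn's type, plus a fixed-name
 * entry point declared as an alias of fn. */
#define demo_module_init(initfn)					\
	static inline __attribute__((unused))				\
	initcall_t demo_inittest(void) { return initfn; }		\
	int demo_init_module(void) __attribute__((alias(#initfn)))

#define demo_module_exit(exitfn)					\
	static inline __attribute__((unused))				\
	exitcall_t demo_exittest(void) { return exitfn; }		\
	void demo_cleanup_module(void) __attribute__((alias(#exitfn)))

static int loaded;	/* pretend module state */

static int hello_world_init(void)
{
	loaded = 1;
	printf("Hello World.\n");
	return 0;
}

static void hello_world_exit(void)
{
	loaded = 0;
	printf("oh, the rest is silence.\n");
}

/* Like the real macros, these must follow the functions they name. */
demo_module_init(hello_world_init);
demo_module_exit(hello_world_exit);
```

Calling demo_init_module() here runs hello_world_init(), which is essentially what happens to init_module when we insmod the real module.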

__init & __exit

I also introduced two macros to our "start" and "end" functions above called __init and __exit. These macros, defined in linux/init.h of the source code, help reduce memory used by the kernel depending on how the module is installed.

For built-in modules, where our module cannot be removed from the kernel without recompiling and restarting, the __init macro tells the compiler to place our module's "start" function into a special section (.init.text) of the compiled kernel. Once booting has finished and every "start" function has run, the kernel will never have to run that code again until reboot, so the whole section can be freed, saving memory. Loadable modules get a similar treatment: the kernel frees a module's __init code as soon as its "start" function returns.

The same idea applies to the __exit macro, in reverse. A built-in module cannot be removed from the running kernel, so its "exit" function will never be called, and the section holding it (.exit.text) can be discarded when the kernel is linked, leaving the function out of the compiled kernel. For loadable modules the "exit" function is kept, since rmmod still needs it.
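We can't watch the kernel free its init memory from userspace, but we can reproduce the first half of the trick: tagging functions so the toolchain gathers them into one named section. This is only an analogy, assuming GCC or Clang with GNU ld or lld on Linux; demo_init, init_section_size() and in_init_section() are hypothetical, and the real __init uses the .init.text section plus a few more attributes:

```c
#include <stdint.h>

/* Hypothetical stand-in for __init: emit the function into a named section.
 * The name must be a valid C identifier for the linker to generate the
 * __start_ and __stop_ symbols used below. */
#define demo_init __attribute__((section("demo_init_text"), used))

/* Linker-generated bounds of the demo_init_text section. */
extern const char __start_demo_init_text[];
extern const char __stop_demo_init_text[];

static int demo_init hello_world_init(void)
{
	return 0;	/* one-time setup would go here */
}

/* Bytes occupied by everything tagged demo_init: the block a kernel-style
 * loader could release in one go once all of it has run. */
uintptr_t init_section_size(void)
{
	return (uintptr_t)__stop_demo_init_text - (uintptr_t)__start_demo_init_text;
}

/* Non-zero if fn's code lives inside the demo_init_text section. */
int in_init_section(int (*fn)(void))
{
	uintptr_t addr = (uintptr_t)fn;

	return addr >= (uintptr_t)__start_demo_init_text &&
	       addr < (uintptr_t)__stop_demo_init_text;
}
```

Because everything tagged this way sits between two linker-provided symbols, the whole block can be handled as one unit; the kernel's linker script and module loader do the equivalent for .init.text.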

pr_debug()

In the beginning there was printk(), and the structure of the kernel's diagnostic messages was formless. The lack of any format for printk() messages is one of a number of reasons why developers are replacing printk() statements with their newer equivalents. Depending on what part of the kernel we are in, there are newer functions that offer some benefits.

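For example, the pr_debug() macro is less syntactically verbose than printk(KERN_DEBUG ...), and on kernels built with CONFIG_DYNAMIC_DEBUG it also takes advantage of the dynamic debugging interface, which gives developers a uniform way to switch individual debug messages on and off at runtime instead of cluttering the kernel log. The catch is that pr_debug() messages are silent by default: our Hello World. only reaches dmesg if DEBUG is defined when the module is compiled (for example with ccflags-y += -DDEBUG in our Makefile) or if dynamic debug is told to print it (for example by loading the module with sudo insmod hello-world.ko dyndbg=+p).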

 static int __init hello_world_init(void)
 {
-    printk(KERN_DEBUG "Hello World.\n");
+    pr_debug("Hello World.\n");
     return 0; /* means init_module loaded successfully */
 }
 static void __exit hello_world_exit(void)
 {
-    printk(KERN_DEBUG "oh, the rest is silence.\n");
+    pr_debug("oh, the rest is silence.\n");
 }
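To see what pr_debug() turns into when dynamic debug is not involved, here is a simplified userspace imitation. demo_printk() writes into a buffer instead of the kernel log, hello_world is a hard-coded stand-in for the module name kbuild supplies, and pr_fmt() follows the common module pattern of prefixing that name (the kernel's default pr_fmt() leaves the format unchanged). The #else branch mirrors the kernel's no_printk() trick of keeping the format type-checked while printing nothing:

```c
#include <stdarg.h>
#include <stdio.h>

#define DEBUG	/* delete this line and demo_pr_debug() prints nothing */

#define KERN_SOH	"\001"
#define KERN_DEBUG	KERN_SOH "7"

/* Common module pattern; hello_world stands in for KBUILD_MODNAME. */
#define pr_fmt(fmt) "hello_world: " fmt

static char log_buf[256];	/* stand-in for the kernel log */

/* Userspace stand-in for printk(): format into log_buf, return the length. */
static __attribute__((format(printf, 1, 2)))
int demo_printk(const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(log_buf, sizeof(log_buf), fmt, ap);
	va_end(ap);
	return n;
}

#ifdef DEBUG
#define demo_pr_debug(fmt, ...) \
	demo_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__)
#else
/* Like no_printk(): never runs, but the compiler still checks the format. */
#define demo_pr_debug(fmt, ...) \
	({ if (0) demo_printk(KERN_DEBUG pr_fmt(fmt), ##__VA_ARGS__); 0; })
#endif
```

With DEBUG defined, demo_pr_debug("Hello World.\n") leaves "\0017hello_world: Hello World.\n" in log_buf, the same level prefix we saw printk(KERN_DEBUG ...) produce, plus the module-name prefix.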

Wrapping Up

If you made it here, all I can say is you are a very brave person, and I'm glad my notes were able to help you in some way. If you see any issues or have a question, please feel free to contact me, or better yet subscribe to the kernel newbies mailing list.

For the next challenge, we will be building the Linux Kernel from scratch, as well as installing and booting from it. If you want to work on this challenge before you read my notes (recommended), I've published a copy of the challenges in a git repo here.

Next: My notes on How to build the Linux Kernel from scratch.