Linux Loadable Kernel Module HOWTO

Bryan Henderson

21 May 2002

Revision History
Revision v1.02 2002-05-21 Revised by: bjh
Correct explanation of symbol versioning. Correct author of Linux Device Drivers. Add info about memory allocation penalty of LKM vs bound-in. Add LKM-to-LKM symbol matching requirement. Add open source licensing issue in LKM symbol resolution. Add SMP symbol versioning info.
Revision v1.01 2001-08-18 Revised by: bjh
Add material on various features created in the last few years: kernel module loader, ksymoops symbols, kernel-version-dependent LKM file location.
Revision v1.00 2001-06-14 Revised by: bjh
Initial release.

Table of Contents
1. Preface
2. Introduction to Linux Loadable Kernel Modules
2.1. Terminology
2.2. History of Loadable Kernel Modules
2.3. The Case For Loadable Kernel Modules
2.4. What LKMs Can't Do
2.5. What LKMs Are Used For
3. Making Loadable Kernel Modules
4. LKM Utilities
5. How To Insert And Remove LKMs
5.1. Could Not Find Kernel Version...
5.2. Intelligent Loading Of LKMs - Modprobe
5.3. Automatic LKM Loading and Unloading
5.4. /proc/modules
5.5. Where Are My LKM Files On My System?
6. Unresolved Symbols
6.1. Some LKMs Prerequire Other LKMs
6.2. An LKM Must Match The Base Kernel
6.3. If You Run Multiple Kernels
6.4. SMP symbols
6.5. You Are Not Licensed To Access The Symbol
6.6. An LKM Must Match Prerequisite LKMs
7. How To Boot Without A Disk Device Driver
8. About Module Parameters
9. Persistent Data
10. Technical Details
10.1. How They Work
10.2. The .modinfo Section
10.3. The __ksymtab And .kstrtab Sections
10.4. Ksymoops Symbols
10.5. Other Symbols
10.6. Memory Allocation For Loading
10.7. Linux internals
11. Writing Your Own Loadable Kernel Module
11.1. bug in hello.c
11.2. Rubini & Corbet: Linux Device Drivers
11.3. Improving On Use Counts
12. Related Documentation
13. Individual Modules
13.1. Executable Interpreters
13.2. Block Device Drivers
13.3. SCSI Drivers
13.4. Network Device Drivers
13.5. CDROM Device Drivers
13.6. Filesystem Drivers
13.7. Miscellaneous Device Driver
13.8. Serial Device Drivers
13.9. Parallel Device Drivers
13.10. Bus Mouse Device Drivers
13.11. Tape Device Drivers
13.12. Watchdog Timers
13.13. Sound Device Drivers
14. Maintenance Of This Document
15. History
16. Copyright

1. Preface

Copyright and license information, as well as credits, are at the end of this document.

This HOWTO is maintained by Bryan Henderson, bryanh@giraffe-data.com. It was released May 31, 2001. You can get the current version of this HOWTO from the Linux Documentation Project.


2. Introduction to Linux Loadable Kernel Modules

If you want to add code to a Linux kernel, the most basic way to do that is to add some source files to the kernel source tree and recompile the kernel. In fact, the kernel configuration process consists mainly of choosing which files to include in the kernel to be compiled.

But you can also add code to the Linux kernel while it is running. A chunk of code that you add in this way is called a loadable kernel module. These modules can do lots of things, but they typically are one of three things: 1) device drivers; 2) filesystem drivers; 3) system calls. The kernel isolates certain functions, including these, especially well so they don't have to be intricately wired into the rest of the kernel.


2.1. Terminology

Loadable kernel modules are often called just kernel modules or just modules, but those are rather misleading terms because there are lots of kinds of modules in the world and various pieces built into the base kernel can easily be called modules. We use the term loadable kernel module or LKM for the particular kinds of modules this HOWTO is about.

Some people think of LKMs as outside of the kernel. They speak of LKMs communicating with the kernel. This is a mistake; LKMs (when loaded) are very much part of the kernel. The correct term for the part of the kernel that is bound into the image that you boot, i.e. all of the kernel except the LKMs, is "base kernel." LKMs communicate with the base kernel.

In some other operating systems, the equivalent of a Linux LKM is called a "kernel extension."

Now what is "Linux"? Well, first of all, the name is used for two entirely different things, and only one of them is really relevant here:

  1. The kernel and related items distributed as a package by Linus Torvalds.

  2. A class of operating systems that traditionally are based on the Linux kernel.

Only the first of these is really useful in discussing LKMs. But even choosing this definition, people are often confused when it comes to LKMs. Is an LKM part of Linux or not? Though an LKM is always part of the kernel, it is part of Linux if it is distributed in the Linux kernel package, and not otherwise. Thus, if you have loaded into your kernel a device driver LKM that came with your device, you can't, strictly speaking, say that your kernel is Linux. Rather, it's a slight extension of Linux. As you might expect, it is commonplace to use the name "Linux" approximately -- lots of variations on Linux are in use and are widely distributed, and referred to as "Linux." In this document, though, we will stick to the strictest definition in the interest of clarity.


2.3. The Case For Loadable Kernel Modules

You often have a choice between putting a module into the kernel by loading it as an LKM or binding it into the base kernel. LKMs have a lot of advantages over binding into the base kernel and I recommend them wherever possible.

One advantage is that you don't have to rebuild your kernel as often. This saves you time and spares you the possibility of introducing an error in rebuilding and reinstalling the base kernel. Once you have a working base kernel, it is good to leave it untouched as long as possible.

Another advantage is that LKMs help you diagnose system problems. A bug in a device driver which is bound into the kernel can stop your system from booting at all. And it can be really hard to tell which part of the base kernel caused the trouble. If the same device driver is an LKM, though, the base kernel is up and running before the device driver even gets loaded. If your system dies after the base kernel is up and running, it's an easy matter to track the problem down to the trouble-making device driver and just not load that device driver until you fix the problem.

LKMs can save you memory, because you have to have them loaded only when you're actually using them. All parts of the base kernel stay loaded all the time. And in real storage, not just virtual storage.

LKMs are much faster to maintain and debug. What would require a full reboot to do with a filesystem driver built into the kernel, you can do with a few quick commands with LKMs. You can try out different parameters or even change the code repeatedly in rapid succession, without waiting for a boot.

LKMs are not slower, by the way, than base kernel modules. Calling either one is simply a branch to the memory location where it resides. [1]

Sometimes you have to build something into the base kernel instead of making it an LKM. Anything that is necessary to get the system up far enough to load LKMs must obviously be built into the base kernel. For example, the driver for the disk drive that contains the root filesystem must be built into the base kernel.


2.5. What LKMs Are Used For

There are six main things LKMs are used for:


3. Making Loadable Kernel Modules

An LKM lives in a single ELF object file (normally named like "serial.o"). You typically keep all your LKM object files in a particular directory (near your base kernel image makes sense). When you use the insmod program to insert an LKM into the kernel, you give the name of that object file.

For the LKMs that are part of Linux, you build them as part of the same kernel build process that generates the base kernel image. See the README file in the Linux source tree. In short, after you make the base kernel image with a command such as make zImage, you will make all the LKMs with the command
make modules     

This results in a bunch of LKM object files (*.o) throughout the Linux source tree. (In older versions of Linux, there would be symbolic links in the modules directory of the Linux source tree pointing to all those LKM object files). These LKMs are ready to load, but you probably want to install them in some appropriate directory. The conventional place is described in Section 5.5. The command make modules_install will copy them all over to the conventional locations.

Part of configuring the Linux kernel (at build time) is choosing which parts of the kernel to bind into the base kernel and which parts to generate as separate LKMs. In the basic question-and-answer configuration (make config), you are asked, for each optional part of the kernel, whether you want it bound into the kernel (a "Y" response), created as an LKM (an "M" response), or just skipped completely (an "N" response). Other configuration methods are similar.
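
The answers you give are recorded in the .config file at the top of the kernel source tree. As a rough illustration of the three-way choice, here is a Python sketch that sorts the options in such a file into bound-in, LKM, and skipped. It assumes the usual .config layout (CONFIG_FOO=y, CONFIG_FOO=m, and "# CONFIG_FOO is not set"); the sample text and the function name classify_config are made up for this example.

```python
import re

def classify_config(text):
    """Sort kernel configuration options by how they will be built."""
    result = {"bound_in": [], "lkm": [], "skipped": []}
    for line in text.splitlines():
        line = line.strip()
        unset = re.fullmatch(r"# (CONFIG_\w+) is not set", line)
        if unset:
            result["skipped"].append(unset.group(1))   # an "N" response
            continue
        if line.startswith("#") or "=" not in line:
            continue                                    # comment or blank line
        name, value = line.split("=", 1)
        if value == "y":
            result["bound_in"].append(name)             # a "Y" response
        elif value == "m":
            result["lkm"].append(name)                  # an "M" response
        # any other value is a setting, not a build choice
    return result

# Hypothetical excerpt of a .config file.
SAMPLE = """\
CONFIG_BLK_DEV_IDE=y
CONFIG_MSDOS_FS=m
CONFIG_FAT_FS=m
# CONFIG_MINIX_FS is not set
"""

choices = classify_config(SAMPLE)
print(choices)
```

Everything in the "lkm" list is what make modules would build as separate object files.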

As explained in Section 2.3, you should have only the bare minimum bound into the base kernel. And only skip completely the parts that you're sure you'll never want. There is very little to lose by building an LKM that you won't use. Some compile time, some disk space, some chance of a problem in the code killing the kernel build. That's it.

As part of the configuration dialog you also must choose whether to use symbol versioning or not. This choice affects building both the base kernel and the LKMs and it is crucial you get it right. See Section 6.

LKMs that are not part of Linux (i.e. not distributed with the Linux kernel) have their own build procedures which I will not cover. The goal of any such procedure, though, is always to end up with an ELF object file.

You don't necessarily have to rebuild all your LKMs and your base kernel image at the same time (e.g. you could build just the base kernel and use LKMs you built earlier with it) but it is always a good idea. See Section 6.


4. LKM Utilities

The programs you need to load and unload and otherwise work with LKMs are in the package modutils. You can find this package in this directory.

This package contains the following programs to help you use LKMs:

insmod

Insert an LKM into the kernel.

rmmod

Remove an LKM from the kernel.

depmod

Determine interdependencies between LKMs.

kerneld

Kerneld daemon program

ksyms

Display symbols that are exported by the kernel for use by new LKMs.

lsmod

List currently loaded LKMs.

modinfo

Display contents of .modinfo section in an LKM object file.

modprobe

Insert or remove an LKM or set of LKMs intelligently. For example, if you must load A before loading B, modprobe will automatically load A when you tell it to load B.

Changes to the kernel often require changes to modutils, so be sure you're using a current version of modutils whenever you upgrade your kernel. modutils is always backward compatible (it works with older kernels), so there's no such thing as having too new a modutils.

Warning: modprobe invokes insmod and has its location hardcoded as /sbin/insmod. There may be other instances in modutils of the PATH not being used to find programs. So either modify the source code of modutils before you build it, or make sure you install the programs in their conventional directories.


5. How To Insert And Remove LKMs

The basic programs for inserting and removing LKMs are insmod and rmmod. See their man pages for details.

Inserting an LKM is conceptually easy: Just type, as superuser, a command like
insmod serial.o
(serial.o contains the device driver for serial ports (UARTs)).

However, I would be misleading you if I said the command just works. It is very common, and rather maddening, for the command to fail either with a message about a module/kernel version mismatch or a pile of unresolved symbols.

If it does work, though, the way to prove to yourself that you know what you're doing is to look at /proc/modules as described in Section 5.4.

Now let's look at a more difficult insertion. If you try
insmod msdos.o
you will probably get a raft of error messages like:
  msdos.o: unresolved symbol fat_date_unix2dos
  msdos.o: unresolved symbol fat_add_cluster1
  msdos.o: unresolved symbol fat_put_super
  ...

This is because msdos.o contains external symbol references to the symbols mentioned and there are no such symbols exported by the kernel. To prove this, do a
cat /proc/ksyms
to list every symbol that is exported by the kernel (i.e. available for binding to LKMs). You will see that 'fat_date_unix2dos' is nowhere in the list.

How do you get it into the list? By loading another LKM, one which defines those symbols and exports them. In this case, it is the LKM in the file fat.o. So do
  insmod fat.o
and then see that "fat_date_unix2dos" is in /proc/ksyms. Now redo the
insmod msdos.o
and it works. Look at /proc/modules and see that both LKMs are loaded and one depends on the other:
msdos                   5632   0 (unused)
fat                    30400   0 [msdos]

How did I know fat.o was the module I was missing? Just a little ingenuity. A more robust way to address this problem is to use depmod and modprobe instead of insmod, as discussed below.

When your symbols look like "fat_date_unix2dos_R83fb36a1", the problem may be more complex than just getting prerequisite LKMs loaded. See Section 6.

When the error message is "kernel/module version mismatch," see Section 6.

Often, you need to pass parameters to the LKM when you insert it. For example, a device driver wants to know the address and IRQ of the device it is supposed to drive. Or the network driver wants to know how much diagnostic tracing you want it to do. Here is an example of that:

insmod ne.o io=0x300 irq=11

Here, I am loading the device driver for my NE2000-like Ethernet adapter and telling it to drive the Ethernet adapter at IO address 0x300, which generates interrupts on IRQ 11.

There are no standard parameters for LKMs and very few conventions. Each LKM author decides what parameters insmod will take for his LKM. Hence, you will find them documented in the documentation of the LKM. This HOWTO also compiles a lot of LKM parameter information in Section 13. For general information about LKM parameters, see Section 8.

To remove an LKM from the kernel, the command is like
rmmod ne

There is a command lsmod to list the currently loaded LKMs, but all it does is dump the contents of /proc/modules, with column headings, so you may just want to go to the horse's mouth and forget about lsmod.
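
If you would rather have a program read /proc/modules for you, the following Python sketch pulls the fields apart. It assumes the layout of the sample lines shown earlier in this section (name, size, use count, then optional flags in parentheses and referring LKMs in square brackets). The function names and the exact heading text are my own approximations, not taken from the lsmod source.

```python
import re

def parse_proc_modules(text):
    """Return (name, size, use_count, used_by) for each line of text."""
    rows = []
    for line in text.splitlines():
        fields = line.split(None, 3)
        if len(fields) < 3:
            continue                      # not a module line
        name, size, use_count = fields[0], int(fields[1]), int(fields[2])
        rest = fields[3] if len(fields) > 3 else ""
        refs = re.search(r"\[(.*?)\]", rest)
        used_by = refs.group(1).split() if refs else []
        rows.append((name, size, use_count, used_by))
    return rows

def lsmod(text):
    """Dump the module list under a line of column headings."""
    return "Module                  Size  Used by\n" + text

# The two lines from the msdos/fat example above.
SAMPLE = (
    "msdos                   5632   0 (unused)\n"
    "fat                    30400   0 [msdos]\n"
)

rows = parse_proc_modules(SAMPLE)
print(lsmod(SAMPLE), end="")
```

On a live system you would pass the contents of /proc/modules instead of SAMPLE.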


5.2. Intelligent Loading Of LKMs - Modprobe

Once you have module loading and unloading figured out using insmod and rmmod, you can let the system do more of the work for you by using the higher level program modprobe. See the modprobe man page for details.

The main thing that modprobe does is automatically load the prerequisites of an LKM you request. It does this with the help of a file that you create with depmod and keep on your system.

Example:
modprobe msdos

This performs an insmod of msdos.o, but before that does an insmod of fat.o, since you have to have fat.o loaded before you can load msdos.o.

The other major thing modprobe does for you is to find the object module containing the LKM given just the name of the LKM. For example, modprobe msdos might load /lib/modules/2.4.2-2/fs/msdos.o. Check out the man pages for modprobe and the configuration file modules.conf (usually /etc/modules.conf) for details on the search rules modprobe uses.

depmod scans your LKM object files (typically all the .o files in the appropriate /lib/modules subdirectory) and figures out which LKMs prerequire (refer to symbols in) other LKMs. It generates a dependency file (typically named modules.dep), which you normally keep in /lib/modules for use by modprobe.
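
The division of labor between depmod and modprobe can be sketched in a few lines of Python. This is only a model of the idea, not the real programs: the dictionary of defined and needed symbols stands in for what depmod reads out of the object files, any symbol that no LKM defines is assumed to come from the base kernel, and the function names are invented for the example.

```python
def depmod(modules):
    """Work out which LKMs each LKM prerequires.

    modules maps an LKM name to {"defines": [...], "needs": [...]}.
    """
    owner = {sym: name
             for name, info in modules.items()
             for sym in info["defines"]}
    deps = {}
    for name, info in modules.items():
        deps[name] = sorted({owner[s] for s in info["needs"]
                             if s in owner and owner[s] != name})
    return deps

def load_order(deps, target):
    """List the LKMs to insert, prerequisites first, ending with target."""
    order = []

    def visit(name, stack=()):
        if name in order:
            return
        if name in stack:
            raise ValueError("dependency loop at " + name)
        for prereq in deps.get(name, []):
            visit(prereq, stack + (name,))
        order.append(name)

    visit(target)
    return order

MODULES = {
    "fat": {"defines": ["fat_date_unix2dos", "fat_add_cluster1", "fat_put_super"],
            "needs": ["printk"]},
    "msdos": {"defines": [],
              "needs": ["fat_date_unix2dos", "fat_add_cluster1",
                        "fat_put_super", "printk"]},
}

deps = depmod(MODULES)
print(load_order(deps, "msdos"))
```

Reading the same list backwards gives a safe order for removing the stack.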

You can use modprobe to remove stacks of LKMs as well.

Via the LKM configuration file (typically /etc/modules.conf), you can fine tune the dependencies and do other fancy things to control LKM selections. And you can specify programs to run when you insert and remove LKMs, for example to initialize a device driver.

If you are maintaining one system and memory is not in short supply, it is probably easier to avoid modprobe and the various files and directories it needs, and just do raw insmods in a startup script.


5.3. Automatic LKM Loading and Unloading

5.3.1. Automatic Loading

You can cause an LKM to be loaded automatically when the kernel first needs it. You do this with either a kerneld daemon or with the more recent invention, the kernel module loader, which is part of Linux.

As an example, let's say you run a program that executes an open system call for a file in an MS-DOS filesystem. But you don't have a filesystem driver for the MS-DOS filesystem either bound into your base kernel or loaded as an LKM. So the kernel does not know how to access the file you're opening on the disk.

The kernel recognizes that it has no filesystem driver for MS-DOS, but that one of the two automatic module loading facilities is available, and uses it to cause the LKM to be loaded. The kernel then proceeds with the open.

Both kerneld and the kernel module loader use modprobe, ergo insmod, to insert LKMs.


5.3.1.1. Kerneld

kerneld is explained at length in the Kerneld mini-HOWTO, available from the Linux Documentation Project.

kerneld is a user process, which runs the kerneld program from the modutils package. kerneld sets up an IPC message channel with the kernel. When the kernel needs an LKM, it sends a message on that channel to kerneld and kerneld runs modprobe to load the LKM, then sends a message back to the kernel to say that it is done.


5.3.1.2. Kernel Module Loader

There is some documentation of the kernel module loader in the file Documentation/kmod.txt in the Linux source tree. As of this writing, this section is more complete and accurate than that file. You can also look at its source code in kernel/kmod.c.

The kernel module loader is an optional part of the Linux kernel. You get it if you select the CONFIG_KMOD feature when you configure the kernel at build time.

When a kernel that has the kernel module loader needs an LKM, it creates a user process (owned by the superuser, though) that executes modprobe to load the LKM, then exits. By default, it finds modprobe as /sbin/modprobe, but you can set up any program you like as modprobe by writing its file name to /proc/sys/kernel/modprobe. For example:
$ echo "/sbin/mymodprobe" >/proc/sys/kernel/modprobe

The kernel module loader passes the following arguments to modprobe: Argument Zero is the full file name of modprobe. The regular arguments are -s, -k, and the name of the LKM that the kernel wants. -s is the user-hostile form of --syslog; -k is the cryptic way to say --autoclean. I.e. messages from modprobe will go to syslog and the loaded LKM will have the "autoclean" flag set.

The kernel module loader runs modprobe with the following environment variables (only): HOME=/; TERM=linux; PATH=/sbin:/usr/sbin:/bin:/usr/bin.
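
Put together, the command the kernel module loader runs looks like what this Python sketch builds. The function name is invented, and modprobe_path stands for whatever file name is currently in /proc/sys/kernel/modprobe. The sketch only assembles the argument list and environment; it does not run anything.

```python
def modprobe_invocation(module, modprobe_path="/sbin/modprobe"):
    """Return (argv, env) as described for the kernel module loader."""
    # Argument zero is the full file name of modprobe, then -s (--syslog),
    # -k (--autoclean), and the name of the LKM the kernel wants.
    argv = [modprobe_path, "-s", "-k", module]
    # These three variables are the entire environment.
    env = {
        "HOME": "/",
        "TERM": "linux",
        "PATH": "/sbin:/usr/sbin:/bin:/usr/bin",
    }
    return argv, env

argv, env = modprobe_invocation("msdos")
print(argv)
print(env)
```

This is handy if you are writing a replacement for modprobe and want to test it by hand with the same arguments and environment the kernel would give it.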

The kernel module loader was new in Linux 2.2 and was designed to take the place of kerneld. It does not, however, have all the features of kerneld.

In Linux 2.2, the kernel module loader creates the above mentioned process directly. In Linux 2.4, the kernel module loader submits the module loading work to Keventd and it runs as a child process of Keventd.

The kernel module loader is a pretty strange beast. It violates layering as Unix programmers generally understand it and consequently is inflexible, hard to understand, and not robust. Many system designers would bristle just at the fact that it has the PATH hardcoded. You may prefer to use kerneld instead, or not bother with automatic loading of LKMs at all.


5.5. Where Are My LKM Files On My System?

The LKM world is flexible enough that the files you need to load could live just about anywhere on your system, but there is a convention that most systems follow: The LKM .o files are in the directory /lib/modules, divided into subdirectories. There is one subdirectory for each version of the kernel, since LKMs are specific to a kernel (see Section 6). Each subdirectory contains a complete set of LKMs.

The subdirectory name is the value you get from the uname --release command, for example 2.2.19. Section 6.3 tells how you control that value.
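
In a script, you can work out the conventional directory the same way. This Python sketch assumes the /lib/modules convention just described; the function name is my own.

```python
import platform
import posixpath

def lkm_directory(release=None, root="/lib/modules"):
    """Return the conventional LKM directory for a kernel release."""
    if release is None:
        # platform.release() reports the same value as uname --release
        release = platform.release()
    return posixpath.join(root, release)

print(lkm_directory("2.2.19"))
print(lkm_directory())          # for the kernel you are running now
```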

When you build Linux, a standard make modules and make modules_install should install all the LKMs that are part of Linux in the proper release subdirectory.

If you build a lot of kernels, another organization may be more helpful: keep the LKMs together with the base kernel and other kernel-related files in a subdirectory of /boot. The only drawback of this is that you cannot have /boot reside on a tiny disk partition. In some systems, /boot is on a special tiny "boot partition" and contains only enough files to get the system up to the point that it can mount other filesystems.


6. Unresolved Symbols

The most common and most frustrating failure in loading an LKM is a bunch of error messages about unresolved symbols, like this:
  msdos.o: unresolved symbol fat_date_unix2dos
  msdos.o: unresolved symbol fat_add_cluster1
  msdos.o: unresolved symbol fat_put_super
  ...
There are actually a bunch of different problems that result in this symptom. In any case, you can get closer to the problem by looking at /proc/ksyms and confirming that the symbols in the message are indeed not in the list.

6.2. An LKM Must Match The Base Kernel

The designers of loadable kernel modules realized there would be a problem with having the kernel in multiple files, possibly distributed independently of one another. What if the LKM mydriver.o was written and compiled to work with the Linux 1.2.1 base kernel, and then someone tried to load it into a Linux 1.2.2 kernel? What if there was a change between 1.2.1 and 1.2.2 in the way a kernel subroutine that mydriver.o calls works? These are internal kernel subroutines, so what's to stop them from changing from one release to the next? You could end up with a broken kernel.

To address this problem, the creators of LKMs endowed them with a kernel version number. The special .modinfo section of the mydriver.o object file in this example has "1.2.1" in it because it was compiled using header files from Linux 1.2.1. Try to load it into a 1.2.2 kernel and insmod notices the mismatch and fails, telling you you have a kernel version mismatch.

But wait. What's the chance that there really is an incompatibility between Linux 1.2.1 and 1.2.2 that will affect mydriver.o? mydriver.o only calls a few subroutines and accesses a few data structures. Surely they don't change with every minor release. Must we recompile every LKM against the header files for the particular kernel into which we want to insert it?

To ease this burden, insmod has a -f option that "forces" insmod to ignore the kernel version mismatch and insert the module anyway. Because it is so unusual for there to be a significant difference between any two kernel versions, I recommend you always use -f. You will, however, still get a warning message about the mismatch. There's no way to shut that off.
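
The logic of the version check and the -f override boils down to something like the following Python sketch. It is a model of the behavior described here, not insmod's code; the function name, the exception, and the wording of the message are all invented.

```python
class VersionMismatch(Exception):
    """Raised when the LKM was built for a different kernel version."""

def check_kernel_version(module_version, running_version, force=False):
    """Return a list of warnings; raise if versions differ and force is off."""
    if module_version == running_version:
        return []
    message = ("kernel/module version mismatch: LKM was compiled for "
               f"{module_version}, running kernel is {running_version}")
    if not force:
        raise VersionMismatch(message)
    # With force (insmod -f) the insertion goes ahead, but the warning stays.
    return ["Warning: " + message]

print(check_kernel_version("1.2.1", "1.2.1"))
print(check_kernel_version("1.2.1", "1.2.2", force=True))
```

The first argument corresponds to the version string recorded in the LKM's .modinfo section, the second to the version of the kernel you are inserting it into.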

But LKM designers still wanted to address the problem of incompatible changes that do occasionally happen. So they invented a very clever way to allow the LKM insertion process to be sensitive to the actual content of each kernel subroutine the LKM uses. It's called symbol versioning (or sometimes, less clearly, "module versioning"). It's optional, and you select it when you configure the kernel via the "CONFIG_MODVERSIONS" kernel configuration option.

When you build a base kernel or LKM with symbol versioning, the various symbols exported for use by LKMs get defined as macros. The definition of the macro is the same symbol name plus a hexadecimal hash value of the parameter and return value types for the subroutine named by the symbol (based on an analysis by the program genksyms of the source code for the subroutine). So let's look at the register_chrdev subroutine. register_chrdev is a subroutine in the base kernel that device driver LKMs often call. With symbol versioning, there is a C macro definition like

  #define register_chrdev register_chrdev_Rc8dc8350

This macro definition is in effect both in the C source file that defines register_chrdev and in any C source file that refers to register_chrdev, so while your eyes see register_chrdev as you read the code, the C preprocessor knows that the function is really called register_chrdev_Rc8dc8350.

What is the meaning of that garbage suffix? It is a hash of the data types of the parameters and return value of register_chrdev. For practical purposes, no two combinations of parameter and return value types have the same hash value.

So let's say someone adds a parameter to register_chrdev between Linux 1.2.1 and Linux 1.2.2. In 1.2.1, register_chrdev is a macro for register_chrdev_Rc8dc8350, but in 1.2.2, it is a macro for register_chrdev_R12f8dc01. In mydriver.o, compiled with Linux 1.2.1 header files, there is an external reference to register_chrdev_Rc8dc8350, but there is no such symbol exported by the 1.2.2 base kernel. Instead, the 1.2.2 base kernel exports a symbol register_chrdev_R12f8dc01.

So if you try to insmod this 1.2.1 mydriver.o into this 1.2.2 base kernel, you will fail. And the error message isn't one about mismatched kernel versions, but simply "unresolved symbol reference."
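
Here is the whole scenario as a small Python model. Be warned that it is only an illustration: I use zlib.crc32 over a one-line prototype as a stand-in for the real analysis that genksyms does, so the suffixes it prints will not match real ones such as Rc8dc8350. The "new" prototype with an extra int parameter is hypothetical, as is the function name versioned_name.

```python
import zlib

def versioned_name(symbol, signature):
    """Append _R plus 8 hex digits derived from the function's signature."""
    checksum = zlib.crc32(signature.encode("ascii")) & 0xFFFFFFFF
    return f"{symbol}_R{checksum:08x}"

# Prototype the LKM was compiled against, and a hypothetical changed one.
OLD_SIG = "int register_chrdev(unsigned int, const char *, struct file_operations *)"
NEW_SIG = "int register_chrdev(unsigned int, const char *, struct file_operations *, int)"

module_reference = versioned_name("register_chrdev", OLD_SIG)   # in mydriver.o
kernel_exports = {versioned_name("register_chrdev", NEW_SIG)}   # newer base kernel

if module_reference not in kernel_exports:
    print("unresolved symbol", module_reference)
```

Note that the name depends only on the text of the prototype. Change what a parameter means without changing its type, and the model, like the real thing, produces the same suffix.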

As clever as this is, it actually works against you sometimes. The way genksyms works, it often generates different hash values for parameter lists that are essentially the same.

And symbol versioning doesn't even guarantee compatibility. It catches only a small subset of the kinds of changes in the definition of a function that can make it not backward compatible. If the way register_chrdev interprets one of its parameters changes in a non-backward-compatible way, its version suffix won't change -- the parameter still has the same C type.

And there's no way an option like -f on insmod can get around this.

So it is generally not wise to use symbol versioning.

Of course, if you have a base kernel that was compiled with symbol versioning, then you must have all your LKMs compiled likewise, and vice versa. Otherwise, you're guaranteed to get those "unresolved symbol reference" errors.


7. How To Boot Without A Disk Device Driver

For most systems, the ATA disk device driver must be bound into the base kernel because the root filesystem is on an ATA disk [2] and the kernel cannot mount the root filesystem, much less read any LKMs from it, without the ATA disk driver. But if you really want the device driver for your root filesystem to be an LKM, here's how to do it with Initrd:

"Initrd" is the name of the "initial ramdisk" feature of Linux. With this, you have your loader (probably LILO) load a filesystem into memory (as a ramdisk) before starting the kernel. When it starts the kernel, it tells it to mount the ramdisk as the root filesystem. You put the disk device driver for your real root filesystem and all the software you need to load it in that ramdisk filesystem. Your startup programs (which live in the ramdisk) eventually mount the real (disk) filesystem as the root filesystem. Note that a ramdisk doesn't require any device driver.

This does not free you, however, from having to bind into the base kernel 1) the filesystem driver for the filesystem in your ramdisk, and 2) the executable interpreter for the programs in the ramdisk.


8. About Module Parameters

It is useful to compare parameters that get passed to LKMs and parameters that get passed to modules that are bound into the base kernel, especially since modules often can be run either way.

We've seen above that you pass parameters to an LKM by specifying something like io=0x300 on the insmod command. For a module that is bound into the base kernel, you pass parameters to it via the kernel boot parameters. One common way to specify kernel boot parameters is at a lilo boot prompt. Another is with an append statement in the lilo configuration file.

The kernel initializes an LKM at the time you load it. It initializes a bound-in module at boot time.

Since there is only one string of kernel boot parameters, you need some way within that string to identify which parameters go to which modules. The rule for this is that if there is a module named xyz, then a kernel boot parameter named xyz is for that module. The value of a kernel boot parameter is an arbitrary string that makes sense only to the module.

This is why you sometimes see an LKM whose only parameter is its own name. E.g. you load the Mitsumi CDROM driver with a command like
  insmod mcd mcd=0x340
It seems ridiculous to have the parameter named mcd instead of, say, io, but this is done for consistency with the case where you bind mcd into the base kernel, in which case you would select the I/O port address with the characters mcd=0x340 in the kernel boot parameters.
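
The routing rule is simple enough to show in Python. This sketch is not the kernel's actual parsing code; it only illustrates the rule that a boot parameter named after a module goes to that module. The function name and the sample boot parameter string are invented.

```python
def route_boot_parameters(command_line, bound_in_modules):
    """Map each bound-in module name to the value of its boot parameter."""
    routed = {}
    for word in command_line.split():
        name, sep, value = word.partition("=")
        if sep and name in bound_in_modules:
            # The value is an arbitrary string only the module understands.
            routed[name] = value
    return routed

params = route_boot_parameters("root=/dev/hda1 ro mcd=0x340", {"mcd", "ne"})
print(params)
```

Here root and ro are left alone because no module by that name is in the set, while mcd=0x340 is handed to the mcd driver.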


9. Persistent Data

Some LKMs are set up to retain information from one load to the next. This is called persistent data. When you remove one of these LKMs with rmmod, rmmod extracts certain values from the LKM's working storage and stores them in a file. When you next insert the LKM with insmod, insmod reads the persistent data from the file and inserts it into the LKM.

See the --persist option on insmod and rmmod.

Persistent data was introduced in November 2000.


10. Technical Details

10.1. How They Work

insmod makes an init_module system call to load the LKM into kernel memory. Loading it is the easy part, though. How does the kernel know to use it? The answer is that the init_module system call invokes the LKM's initialization routine right after it loads the LKM. insmod passes to init_module the address of the subroutine in the LKM named init_module as its initialization routine.

(This is confusing -- every LKM has a subroutine named init_module, and the base kernel has a system call by that same name, which is accessible via a subroutine in the standard C library also named init_module).

The LKM author set up init_module to call a kernel function that registers the subroutines that the LKM contains. For example, a character device driver's init_module subroutine might call the kernel's register_chrdev subroutine, passing the major number of the device it intends to drive and the address of its own "open" routine among the arguments. register_chrdev records in base kernel tables that when the kernel wants to open that particular device, it should call the open routine in our LKM.
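
If it helps to see the registration idea in miniature, here is a toy model in Python. It is nothing like real kernel code: the table is a dictionary keyed by major number only, the major number 42 and the driver name are made up, and the return values merely imitate the convention of zero for success and a negative number for failure.

```python
class ToyBaseKernel:
    """Stands in for the base kernel's table of character device drivers."""

    def __init__(self):
        self.chrdevs = {}   # major number -> (driver name, table of routines)

    def register_chrdev(self, major, name, fops):
        if major in self.chrdevs:
            return -1       # somebody already drives this device
        self.chrdevs[major] = (name, fops)
        return 0

    def open(self, major):
        # The kernel looks up the registered routine and calls it.
        name, fops = self.chrdevs[major]
        return fops["open"]()

# --- the "LKM" ---
def my_open():
    return "opened by mydriver"

def init_module(kernel):
    """What a driver's initialization routine does: register its routines."""
    return kernel.register_chrdev(42, "mydriver", {"open": my_open})

kernel = ToyBaseKernel()
status = init_module(kernel)    # called once, right after the LKM is loaded
result = kernel.open(42)
print(status, result)
```

The point is that nothing in ToyBaseKernel knew about my_open until init_module handed it over.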

But the astute reader will now ask how the LKM's init_module subroutine knew the address of the base kernel's register_chrdev subroutine. This is not a system call, but an ordinary subroutine bound into the base kernel. Calling it means branching to its address. So how does our LKM, which was not compiled anywhere near the base kernel, know that address? The answer to this is insmod relocation. insmod functions as a relocating linker/loader. The LKM object file contains an external reference to the symbol register_chrdev. insmod does a query_module system call to find out the addresses of various symbols that the existing kernel exports. register_chrdev is among these. query_module returns the address for which register_chrdev stands and insmod patches that into the LKM where the LKM refers to register_chrdev.
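
The patching step can be modeled in Python like this. The "module image" is just a list with a placeholder where the address belongs, the addresses are invented, and the function name relocate is mine; a real relocating loader works on ELF relocation entries, which this sketch does not attempt to show.

```python
def relocate(image, references, ksyms):
    """Patch exported symbol addresses into a copy of a module image.

    image      -- list of words making up the module (toy representation)
    references -- list of (slot index, symbol name) external references
    ksyms      -- symbol name -> address, as exported by the running kernel
    """
    unresolved = [sym for _, sym in references if sym not in ksyms]
    if unresolved:
        raise LookupError("unresolved symbol " + ", ".join(unresolved))
    patched = list(image)
    for slot, sym in references:
        patched[slot] = ksyms[sym]
    return patched

# Invented addresses standing in for what query_module would report.
KSYMS = {"register_chrdev": 0xC0134A20, "printk": 0xC0112F30}

image = ["call", 0, "ret"]                       # slot 1 awaits an address
patched = relocate(image, [(1, "register_chrdev")], KSYMS)
print(patched)
```

A reference to a symbol that is not in the table is exactly the "unresolved symbol" failure discussed in Section 6.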

If you want to see the kind of information insmod can get from a query_module system call, look at the contents of /proc/ksyms.

Note that some LKMs call subroutines in other LKMs. They can do this because of the __ksymtab and .kstrtab sections in the LKM object files. These sections together list the external symbols within the LKM object file that are supposed to be accessible by other LKMs inserted in the future. insmod looks at __ksymtab and .kstrtab and tells the kernel to add those symbols to its exported kernel symbols table.

To see this for yourself, insert the LKM msdos.o and then notice in /proc/ksyms the symbol fat_add_cluster (which is the name of a subroutine in the fat.o LKM). Any subsequently inserted LKM can branch to fat_add_cluster, and in fact msdos.o does just that.
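
If you want to pick such entries apart in a program, something like the following might do. It assumes each line of /proc/ksyms has a hexadecimal address, then the symbol name, then (for symbols exported by an LKM) the LKM name in square brackets; check that against your own system before relying on it. The struct and function names are made up for this example.

```c
#include <stdio.h>

struct ksym_line {
    unsigned long addr;
    char name[256];
    char module[64];    /* empty string for base kernel symbols */
};

/* Parse one line of /proc/ksyms.
   Assumed layout: "<hex address> <symbol>" optionally followed by
   whitespace and "[<module>]".
   Returns 0 on success, -1 if the line does not fit that layout. */
int parse_ksyms_line(const char *line, struct ksym_line *out)
{
    int n;

    out->module[0] = '\0';
    n = sscanf(line, "%lx %255s [%63[^]]]",
               &out->addr, out->name, out->module);
    return n >= 2 ? 0 : -1;
}
```

Feeding each line of the file through parse_ksyms_line and keeping the ones whose module field is "fat" would list what fat.o exports.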


10.4. Ksymoops Symbols

insmod adds a bunch of exported symbols to the LKM as it loads it. These symbols are all intended to help ksymoops do its job. ksymoops is a program that interprets an "oops" display. An "oops" display is stuff that the Linux kernel displays when it detects an internal kernel error (and consequently terminates a process). This information contains a bunch of addresses in the kernel, in hexadecimal.

ksymoops looks at the hexadecimal addresses, looks them up in the kernel symbol table (which you see in /proc/ksyms), and translates the addresses in the oops message to symbolic addresses, which you might be able to look up in an assembler listing.

So let's say you have an LKM crash on you. The oops message contains the address of the instruction that choked, and what you want ksymoops to tell you is 1) in what LKM is that instruction, and 2) where is the instruction relative to an assembler listing of that LKM? Similar questions arise for the data addresses in the oops message.

To answer those questions, ksymoops must be able to get the loadpoints and lengths of the various sections of the LKM from the kernel symbol table.

Well, insmod knows those addresses, so it just creates symbols for them and includes them in the symbols it loads with the LKM.

In particular, those symbols are named (and you can see this for yourself by looking at /proc/ksyms):

__insmod_name_Ssectionname_Llength

name is the LKM name (as you would see in /proc/modules).

sectionname is the section name, e.g. .text (don't forget the leading period).

length is the length of the section, in decimal.

The value of the symbol is, of course, the address of the section.
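
Taking such a symbol name apart is mostly string slicing. The sketch below is one way to do it in C, not what ksymoops actually does. It assumes the section name starts with a period, so that "_S." marks the end of the LKM name, and that the last "_L" in the symbol precedes the length. The names insmod_section and parse_insmod_section are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

struct insmod_section {
    char name[64];         /* LKM name */
    char section[64];      /* section name, e.g. ".text" */
    unsigned long length;  /* section length in bytes */
};

/* Split "__insmod_<name>_S<section>_L<length>".
   Returns 0 on success, -1 if the symbol does not have this form. */
int parse_insmod_section(const char *sym, struct insmod_section *out)
{
    static const char prefix[] = "__insmod_";
    const char *name, *s, *l, *p;
    char *end;
    size_t n;

    if (strncmp(sym, prefix, sizeof prefix - 1) != 0)
        return -1;
    name = sym + sizeof prefix - 1;

    /* Assumption: the section name begins with a period. */
    s = strstr(name, "_S.");
    if (s == NULL)
        return -1;

    /* Find the last "_L" after the section marker. */
    l = NULL;
    for (p = strstr(s + 2, "_L"); p != NULL; p = strstr(p + 2, "_L"))
        l = p;
    if (l == NULL)
        return -1;

    n = (size_t)(s - name);
    if (n == 0 || n >= sizeof out->name)
        return -1;
    memcpy(out->name, name, n);
    out->name[n] = '\0';

    n = (size_t)(l - (s + 2));
    if (n == 0 || n >= sizeof out->section)
        return -1;
    memcpy(out->section, s + 2, n);
    out->section[n] = '\0';

    /* The length is in decimal and must run to the end of the symbol. */
    out->length = strtoul(l + 2, &end, 10);
    if (end == l + 2 || *end != '\0')
        return -1;
    return 0;
}
```

Given a symbol such as __insmod_fat_S.text_L30208 (a made-up example), this yields the LKM name fat, the section .text and the length 30208. Combined with the symbol's value, that is the load point and extent of the section.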

insmod also adds a pretty useful symbol that tells from what file the LKM was loaded. That symbol's name is

__insmod_name_Ofilespec_Mmtime_Vversion

name is the LKM name, as above.

filespec is the file specification that was used to identify the file containing the LKM when it was loaded. Note that it isn't necessarily still under that name, and there are multiple file specifications that might have been used to refer to the same file. For example, ../dir1/mylkm.o and /lib/dir1/mylkm.o.

mtime is the modification time of that file, in the standard Unix representation (seconds since the beginning of 1970), in hexadecimal.

version tells the kernel version level for which the LKM was built (same as in the .modinfo section). It is the value of the macro LINUX_VERSION_CODE in Linux's linux/version.h file. For example, 132101.

The value of this symbol is meaningless.
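
The version field is easier to read once it is unpacked. LINUX_VERSION_CODE packs the three parts of the kernel version into one number the same way the KERNEL_VERSION macro in linux/version.h does. The following C sketch reverses that packing; the names VERSION_CODE and format_version_code are hypothetical, chosen so as not to collide with the kernel's own macros.

```c
#include <stdio.h>

/* Same packing as the KERNEL_VERSION macro in linux/version.h. */
#define VERSION_CODE(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Format a LINUX_VERSION_CODE value as "major.minor.patch" into buf,
   which holds size bytes.  Returns buf. */
char *format_version_code(unsigned long code, char *buf, size_t size)
{
    snprintf(buf, size, "%lu.%lu.%lu",
             code >> 16, (code >> 8) & 0xff, code & 0xff);
    return buf;
}
```

With this, the example value 132101 comes out as 2.4.5, since 2*65536 + 4*256 + 5 = 132101.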


10.6. Memory Allocation For Loading

This section is about how Linux allocates memory in which to load an LKM. It is not about how an LKM dynamically allocates memory, which is the same as for any other part of the kernel.

The memory where an LKM resides is a little different from that where the base kernel resides. The base kernel is always loaded into one big contiguous area of real memory, whose real addresses are equal to its virtual addresses. That's possible because the base kernel is the first thing ever to get loaded (besides the loader) -- it has a wide open empty space in which to load. And since the Linux kernel is not pageable, it stays in its homestead forever.

By the time you load an LKM, real memory is all fragmented -- you can't simply add the LKM to the end of the base kernel. But the LKM needs to be in contiguous virtual memory, so Linux uses vmalloc to allocate a contiguous area of virtual memory (in the kernel address space), which is probably not contiguous in real memory. But the memory is still not pageable. The LKM gets loaded into real page frames from the start, and stays in those real page frames until it gets unloaded.

Some CPUs can take advantage of the properties of the base kernel to effect faster access to base kernel memory. For example, on one machine, the entire base kernel is covered by one page table entry and consequently one entry in the translation lookaside buffer (TLB). Naturally, that TLB entry is virtually always present. For LKMs on this machine, there is a page table entry for each memory page into which the LKM is loaded. Much more often, the entry for a page is not in the TLB when the CPU goes to access it, which means a slower access.

This effect is probably trivial.
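
To get a feel for the numbers, the count of page table entries an LKM needs on such a machine is just its size divided by the page size, rounded up. The C sketch below assumes 4 KiB pages (as on i386); the names PAGE_SIZE_ASSUMED and lkm_pages are hypothetical.

```c
#define PAGE_SIZE_ASSUMED 4096UL   /* assumption: 4 KiB pages, as on i386 */

/* Number of pages, and so page table entries, needed to hold an
   LKM of the given size in bytes. */
unsigned long lkm_pages(unsigned long size)
{
    return (size + PAGE_SIZE_ASSUMED - 1) / PAGE_SIZE_ASSUMED;
}
```

Under that assumption, an LKM of 30208 bytes occupies 8 pages, so up to 8 TLB entries compete for space where the base kernel needs only one.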

It is also said that PowerPC Linux does something with its address translation that makes switching between accessing base kernel memory and accessing LKM memory costly. I don't know anything solid about that.

The base kernel contains within its prized contiguous domain a large expanse of reusable memory -- the kmalloc pool. In Linux 2.4, at the end of 2001, there was a proposal to make the module loader try first to get contiguous memory from that pool into which to load an LKM and only if a large enough space was not available, go to the vmalloc space. That function may be in some versions of Linux.


11. Writing Your Own Loadable Kernel Module

The Linux Kernel Module Programming Guide by Ori Pomerantz is a complete explanation of writing your own LKM. This book is also available in print.

It is, however, a little out of date and contains an error or two. Here are a few things about writing an LKM that aren't in there.


11.1. bug in hello.c

The simple hello.c program has a small bug that causes it to generate a warning about an implicit declaration of printk(). The warning is innocuous.

The program is also more complicated than it needs to be with current Linux and depends on your having kernel messaging set up a certain way on your system to see it work. Finally, the program requires you to include -D options on your compile command to work, because it does not define some macros in the source code, where the definitions belong.

Here is an improved version of hello.c. Compile this with the simple command

$ gcc -c -Wall hello.c

/* hello.c 
 * 
 * "Hello, world" - the loadable kernel module version. 
 *
 * Compile this with 
 *
 *          gcc -c hello.c -Wall
 */

/* Declare what kind of code we want from the header files */
#define __KERNEL__         /* We're part of the kernel */
#define MODULE             /* Not a permanent part, though. */

/* Standard headers for LKMs */
#include <linux/modversions.h> 
#include <linux/module.h>  

#define _LOOSE_KERNEL_NAMES
    /* With some combinations of Linux and gcc, tty.h will not compile if
       you don't define _LOOSE_KERNEL_NAMES.  It's a bug somewhere.
    */
#include <linux/tty.h>      /* console_print() interface */

/* Initialize the LKM */
int init_module()
{
  console_print("Hello, world - this is the kernel speaking\n");
  /* More normal is printk(), but there's less that can go wrong with 
     console_print(), so let's start simple.
  */

  /* If we return a non zero value, it means that 
   * init_module failed and the LKM can't be loaded 
   */
  return 0;
}


/* Cleanup - undo whatever init_module did */
void cleanup_module()
{
  console_print("Short is the life of an LKM\n");
}


11.3. Improving On Use Counts

In the original design, the LKM increments and decrements its use count to tell the module manager whether it is OK to unload it. For example, if it's a filesystem driver, it would increment the use count when someone mounts a filesystem of the type it drives, and decrement it at unmount time.

Now, there is a more flexible alternative. Your LKM can register a function that the module manager will call whenever it wants to know if it is OK to unload the module. If the function returns a true value, that means the LKM is busy and cannot be unloaded. If it returns a false value, the LKM is idle and can be unloaded. The module manager holds the big kernel lock from before calling the module-busy function until after its cleanup subroutine returns or sleeps, and unless you've done something odd, that should mean that your LKM cannot become busy between the time that you report "not busy" and the time you clean up.

So how do you register the module-busy function? By putting its address in the unfortunately named can_unload field in the module descriptor ("struct module"). The name is truly unfortunate because the boolean value it returns is the exact opposite of what "can unload" means: true if the module manager cannot unload the LKM.

The module manager ensures that it does not attempt to unload the module before its initialization subroutine has returned or slept, so you are safe in setting the can_unload field anywhere in the initialization subroutine except after a sleep.
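
Here is a user-space model of that decision, written in ordinary C so you can compile and run it anywhere. It is a sketch, not the kernel's code: struct model_module and model_try_unload are hypothetical, and the fallback on the use count when no function is registered is an assumption of this sketch. The one thing to take from it is the sense of the return value: nonzero from can_unload means "busy".

```c
#include <stddef.h>

/* Hypothetical, much-reduced stand-in for the module descriptor. */
struct model_module {
    int use_count;              /* original scheme */
    int (*can_unload)(void);    /* newer scheme; nonzero means BUSY */
    void (*cleanup)(void);      /* cleanup subroutine, may be NULL */
};

/* Model of the module manager's decision.
   Returns 0 if the module was unloaded, -1 if it is busy. */
int model_try_unload(struct model_module *mod)
{
    int busy;

    if (mod->can_unload != NULL)
        busy = mod->can_unload();      /* ask the LKM itself */
    else
        busy = mod->use_count != 0;    /* assumed fallback: use count */

    if (busy)
        return -1;
    if (mod->cleanup != NULL)
        mod->cleanup();
    return 0;
}

/* What a filesystem driver LKM might supply: busy while anything
   is mounted. */
static int mounted_filesystems = 0;
static int my_module_busy(void) { return mounted_filesystems > 0; }
```

An LKM written this way does not have to touch a use count at mount and unmount time; it just answers the question when asked.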


12. Related Documentation

For modules that are part of Linux (i.e. distributed with the base kernel), you can sometimes find documentation in the Documentation subdirectory of the Linux source code.

Many LKMs can be alternatively bound into the base kernel. If you do that, you will pass parameters to them via the kernel "command line," which in its most basic form means via a prompt at boot time. The BootPrompt HOWTO by Paul Gortmaker will help you with that. It is available from the Linux Documentation Project.

Don't forget that the source code of Linux and any LKM is always the documentation of last resort, and the most trustworthy.


13. Individual Modules

In this chapter, I document individual LKMs. Where possible, I do this by reference to more authoritative documentation for the particular LKM (probably maintained by the same person who maintains the LKM code).


13.1. Executable Interpreters

You must have at least one executable interpreter bound into the base kernel, because in order to load an executable interpreter LKM, you have to run an executable and something has to interpret that executable.

That one bound-in executable interpreter is almost certainly the ELF interpreter, since virtually all executables in a Linux system are ELF.

Historical note: Before ELF existed on Linux (c. 1995), the normal executable format was a.out. For a while, part ELF/part a.out systems were common. Some still exist.


13.2. Block Device Drivers

13.2.1. floppy: floppy disk driver

This is the device driver for floppy disks. You need this in order to access a floppy disk in any way.

This LKM is documented in the file README.fd in the linux/drivers/block directory of the Linux source tree. For detailed up to date information refer directly to this file.

Note that if you boot (or might boot) from a floppy disk or with a root filesystem on a floppy disk, you must have this driver bound into the base kernel, because your system will need it before it has a chance to insert the LKM.

Example:
  
  modprobe floppy 'floppy="daring two_fdc 0,thinkpad 0x8,fifo_depth"'

There is only one LKM parameter: floppy. But it contains many subparameters. The reason for this unusual parameter format is to be consistent with the way you would specify the same things in the kernel boot parameters if the driver were bound into the base kernel.

The value of floppy is a sequence of blank-delimited words. Each of those words is one of the following sequences of comma-delimited words:

asus_pci

Sets the bit mask of allowed drives to allow only units 0 and 1. Obsolete, as this is the default setting anyway.

daring

Tells the floppy driver that you have a well behaved floppy controller. This allows more efficient and smoother operation and may speed up certain operations, but may fail on certain controllers.

0,daring

Tells the floppy driver that your floppy controller should be used with caution.

one_fdc

Tells the floppy driver that you have only one floppy controller (default).

address,two_fdc

Tells the floppy driver that you have two floppy controllers. The second floppy controller is assumed to be at address. This option is not needed if the second controller is at address 0x370, and if you use the 'cmos' option.

two_fdc

Same as above, but with the default address.

thinkpad

Tells the floppy driver that you have an IBM Thinkpad model notebook computer. Thinkpads use an inverted convention for the disk change line.

0,thinkpad

Tells the floppy driver that you don't have a Thinkpad.

nodma

Tells the floppy driver not to use DMA for data transfers. This is needed on HP Omnibooks, which don't have a workable DMA channel for the floppy driver. This option is also useful if you frequently get "Unable to allocate DMA memory" messages. Indeed, DMA memory needs to be contiguous in physical memory, and is thus harder to find, whereas non-DMA buffers may be allocated in virtual memory. However, I advise against this if you have an FDC without a FIFO (8272A or 82072; 82072A and later are OK). You also need at least a 486 to use nodma. If you use nodma mode, I suggest you also set the FIFO threshold to 10 or lower, in order to limit the number of data transfer interrupts.

If you have a FIFO-able FDC, the floppy driver automatically falls back on non-DMA mode if it can't find any DMA-able memory. If you want to avoid this, explicitly specify "yesdma".

omnibook

Same as nodma.

yesdma

Tells the floppy driver that a workable DMA channel is available (the default).

nofifo

Disables the FIFO entirely. This is needed if you get "Bus master arbitration error" messages from your Ethernet card (or from other devices) while accessing the floppy.

fifo

Enables the FIFO (default)

threshold,fifo_depth

Sets the FIFO threshold. This is mostly relevant in DMA mode. A higher threshold lets the floppy driver tolerate more interrupt latency, but triggers more interrupts (i.e. it imposes more load on the rest of the system). A lower threshold triggers fewer interrupts, but requires the interrupt latency to be lower too (i.e. a faster processor).

To tune the fifo threshold, switch on over/underrun messages using 'floppycontrol --messages'. Then access a floppy disk. If you get a huge amount of "Over/Underrun - retrying" messages, then the fifo threshold is too low. Try with a higher value, until you only get an occasional Over/Underrun.

The value must be between 0 and 0xf, inclusive.

As you insert and remove the LKM to try different values, remember to redo the 'floppycontrol --messages' every time you insert the LKM. You shouldn't normally have to tune the fifo, because the default (0xa) is reasonable.

drive,type,cmos

Sets the CMOS type of drive to type. This is mandatory if you have more than two floppy drives (only two can be described in the physical CMOS), or if your BIOS uses non-standard CMOS types. The CMOS types are:

(Note: there are two valid types for ED drives. This is because 5 was initially chosen to represent floppy tapes, and 6 for ED drives. AMI ignored this, and used 5 for ED drives. That's why the floppy driver handles both.)

unexpected_interrupts

Print a warning message when an unexpected interrupt is received. (default behavior)

no_unexpected_interrupts

Don't print a message when an unexpected interrupt is received. This is needed on IBM L40SX laptops in certain video modes. (There seems to be an interaction between video and floppy. The unexpected interrupts only affect performance, and can safely be ignored.)

L40SX

Same as no_unexpected_interrupts.

broken_dcl

Don't use the disk change line, but assume that the disk was changed whenever the device node is reopened. Needed on some boxes where the disk change line is broken or unsupported. This should be regarded as a stopgap measure: it makes floppy operation less efficient due to unneeded cache flushings, and slightly more unreliable. Please verify your cable, connection and jumper settings if you have any DCL problems. However, some older drives, and also some laptops, are known not to have a DCL.

debug

Print debugging messages

messages

Print informational messages for some operations (disk change notifications, warnings about over and underruns, and about autodetection)

silent_dcl_clear

Uses a less noisy way to clear the disk change line (which doesn't involve seeks). Implied by daring.

nr,irq

Tells the driver to expect interrupts on IRQ nr instead of the conventional IRQ 6.

nr,dma

Tells the driver to use DMA channel nr instead of the conventional DMA channel 2.

slow

Use PS/2 stepping rate: PS/2 floppies have much slower step rates than regular floppies. It has been recommended to use about 1/4 of the default speed in some of the more extreme cases.

mask,allowed_drive_mask

Sets the bitmask of allowed drives to mask. By default, only units 0 and 1 of each floppy controller are allowed. This is done because certain non-standard hardware (ASUS PCI motherboards) messes up the keyboard when accessing units 2 or 3. This option is somewhat obsoleted by the 'cmos' option.

all_drives

Sets the bitmask of allowed drives to all drives. Use this if you have more than two drives connected to a floppy controller.
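
The word format is regular enough to parse mechanically: zero or more numbers, then the option name, separated by commas. The following C sketch splits a floppy parameter value that way. It is not the driver's own parser; the names floppy_word, parse_floppy_word and parse_floppy_param are hypothetical, and it assumes at most two numbers per word, which is the most that any form listed above uses.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_INTS 2   /* the longest form listed above is drive,type,cmos */

struct floppy_word {
    char name[32];         /* option name: the last comma-delimited field */
    long ints[MAX_INTS];   /* numeric fields that precede the name */
    int  nints;            /* how many numeric fields were given */
};

/* Parse one blank-delimited word such as "daring", "0,thinkpad" or
   "0x8,fifo_depth".  Returns 0 on success, -1 on a malformed word. */
int parse_floppy_word(const char *word, struct floppy_word *out)
{
    const char *p = word;
    const char *comma;
    char *end;

    out->nints = 0;
    while ((comma = strchr(p, ',')) != NULL) {
        if (out->nints == MAX_INTS)
            return -1;
        /* Base 0 accepts decimal as well as 0x-prefixed hexadecimal. */
        out->ints[out->nints] = strtol(p, &end, 0);
        if (end == p || end != comma)
            return -1;
        out->nints++;
        p = comma + 1;
    }
    if (*p == '\0' || strlen(p) >= sizeof out->name)
        return -1;
    strcpy(out->name, p);
    return 0;
}

/* Split a whole parameter value on blanks and parse each word.
   Returns the number of words parsed, or -1 on error. */
int parse_floppy_param(const char *value, struct floppy_word *out, int max)
{
    char buf[256];
    char *tok;
    int n = 0;

    if (strlen(value) >= sizeof buf)
        return -1;
    strcpy(buf, value);    /* strtok modifies its argument */
    for (tok = strtok(buf, " "); tok != NULL; tok = strtok(NULL, " ")) {
        if (n == max || parse_floppy_word(tok, &out[n]) != 0)
            return -1;
        n++;
    }
    return n;
}
```

Applied to the value in the modprobe example above, this gives four words: daring and two_fdc with no numbers, thinkpad with the number 0, and fifo_depth with the number 8.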


13.2.5. rd: ramdisk device driver

A ramdisk is a block device whose storage is composed of system memory (real memory; not virtual). You can use it like a very fast disk device and also in circumstances where you need a device, but don't have traditional hardware devices to play with.

A common example of the latter is for a rescue system -- a system you use to diagnose and repair your real system. Since you don't want to mess with your real disks, you run off ramdisks. You might load data into these ramdisks from external media such as floppy disks.

Sometimes, you have your boot loader (e.g. lilo) create a ramdisk and load it with data (perhaps from a floppy disk). Of course, if you do this, you cannot use the LKM version of the ramdisk driver because the driver will have to be in the kernel at boot time.

A ramdisk is actually conceptually simple in Linux. Disk devices operate through memory because of the buffer cache. The only difference with a ramdisk is that you never actually get past the buffer cache to a real device. This is because with a ramdisk, 1) when you first access a particular block, Linux just assumes it is all zeroes; and 2) the device's buffer cache blocks are never written to the device, ergo never stolen for use with other devices. This means reads and writes are always to the buffer cache and never reach the device.
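
Those two rules are simple enough to model in a few lines of user-space C. The sketch below is only an illustration of the concept, not the rd driver: the block size, device size and all the names (rd_cache, rd_read, rd_write) are assumptions made for the example, and an array of pointers stands in for the buffer cache.

```c
#include <stdlib.h>
#include <string.h>

#define RD_BLOCK_SIZE 1024   /* assumed block size for this sketch */
#define RD_NBLOCKS    16     /* assumed device size in blocks */

/* Each slot plays the role of a buffer cache block; NULL means the
   block has never been touched. */
static unsigned char *rd_cache[RD_NBLOCKS];

/* Read a block.  Rule 1: a block nobody has written reads as all
   zeroes.  Returns 0 on success, -1 on a bad block number. */
int rd_read(unsigned int block, unsigned char *dst)
{
    if (block >= RD_NBLOCKS)
        return -1;
    if (rd_cache[block] == NULL)
        memset(dst, 0, RD_BLOCK_SIZE);
    else
        memcpy(dst, rd_cache[block], RD_BLOCK_SIZE);
    return 0;
}

/* Write a block.  Rule 2: the data stays in the cache and is never
   written out anywhere else.  Returns 0 on success, -1 on error. */
int rd_write(unsigned int block, const unsigned char *src)
{
    if (block >= RD_NBLOCKS)
        return -1;
    if (rd_cache[block] == NULL) {
        rd_cache[block] = malloc(RD_BLOCK_SIZE);
        if (rd_cache[block] == NULL)
            return -1;
    }
    memcpy(rd_cache[block], src, RD_BLOCK_SIZE);
    return 0;
}
```

Note that there is no code anywhere in the model that talks to a device, which is the whole point.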

There is additional information about ramdisks in the file Documentation/ramdisk.txt in the Linux source tree.

Example:
  modprobe rd

There are no module parameters that you can supply to the LKM, but if you bind the module into the base kernel, there are kernel parameters you can pass to it. See BootPrompt-HOWTO.


13.3. SCSI Drivers

Detailed information about SCSI drivers is in SCSI-2.4-HOWTO.

Linux's SCSI function is implemented in three layers, and there are LKMs for all of them.

In the middle is the mid-level driver or SCSI core. This consists of the scsi_mod LKM. It does all those things that are common among SCSI devices regardless of what SCSI adapter you use and what class of device (disk, scanner, CD-ROM drive, etc.) it is.

There is a low-level driver for each kind of SCSI adapter -- typically, a different driver for each brand. For example, the low-level driver for Advansys adapters (made by the company which is now Connect.com) is named advansys. (If you are comparing ATA (aka IDE) and SCSI disk devices, this is a major difference -- ATA is simple and standard enough that one driver works with all adapters from all companies. SCSI is less standard and as a result you should have less confidence in any particular adapter being perfectly compatible with your system).

High-level drivers present to the rest of the kernel an interface appropriate to a certain class of devices. The SCSI high-level driver for tape devices, st, for example, has ioctls to rewind. The high-level SCSI driver for CD-ROM drives, sr, does not.

Note that you rarely need a high-level driver specific to a certain brand of device. At this level, there is little room for one brand to be distinguishable from another.

One SCSI high-level driver that deserves special mention is sg. This driver, called the "SCSI generic" driver, is a fairly thin layer that presents a rather raw representation of the SCSI mid-level driver to the rest of the kernel. User space programs that operate through the SCSI generic driver (because they access device special files whose major number is the one registered by sg (to wit, 21)) have a detailed understanding of SCSI protocols, whereas user space programs that operate through other SCSI high-level drivers typically don't even know what SCSI is. SCSI-Programming-HOWTO has complete documentation of the SCSI generic driver.

The layering order of the SCSI modules belies the way the LKMs depend upon each other and the order in which they must be loaded. You always load the mid-level driver first and unload it last. The low-level and high-level drivers can be loaded and unloaded in any order after that, and they hook themselves into and establish dependency on the mid-level driver at both ends. If you don't have a complete set, you will get a "device not found" error when you try to access a device.

Most SCSI low-level (adapter) drivers don't have LKM parameters; they do generally autoprobe for card settings. If your card responds to some unconventional port address you must bind the driver into the base kernel and use kernel "command line" options. See BootPrompt-HOWTO. Or you can twiddle The Source and recompile.

Many SCSI low-level drivers have documentation in the drivers/scsi directory in the Linux source tree, in files called README.*.


13.4. Network Device Drivers


13.4.10. baycom: BAYCOM AX.25 amateur radio driver

This is a driver for Baycom style simple amateur radio modems that connect to either a serial interface or a parallel interface. The driver works with the ser12 and par96 designs.

For more information, see http://www.baycom.org/~tom.

Example:
modprobe baycom modem=1 iobase=0x3f8 irq=4 options=1

Parameters:

major

major number the driver should use; default 60

modem

modem type of the first channel (minor 0):

1

ser12

2

par96/par97

iobase

base address of the port the driver is to drive. Common values are for ser12 0x3f8, 0x2f8, 0x3e8, 0x2e8 and for par96/par97 0x378, 0x278, 0x3bc.

irq

IRQ the driver is to service. Common values are 3 and 4 for ser12 and 7 for par96/par97.

options

0

use hardware DCD

1

use software DCD


13.4.11. strip: STRIP (Metricom starmode radio IP) driver

STRIP is a radio protocol developed for the MosquitoNet project to send Internet traffic using Metricom radios. Metricom radios are small, battery powered, 100kbit/sec packet radio transceivers, about the size and weight of a wireless telephone. (You may also have heard them called "Metricom modems" but we avoid the term "modem" because it misleads many people into thinking that you can plug a Metricom modem into a phone line and use it as a modem.) You can use STRIP on any Linux machine with a serial port, although it is obviously most useful for people with laptop computers.

Example:
modprobe strip

There are no module parameters.


13.5. CDROM Device Drivers


13.6. Filesystem Drivers


13.6.10. smbfs: SMB filesystem driver

SMBFS is a filesystem type which has an SMB protocol interface. This is the protocol Windows for Workgroups, Windows NT or Lan Manager use to talk to each other. SMBFS was inspired by Samba, the program written by Andrew Tridgell that turns any Unix host into a file server for DOS or Windows clients. See ftp://nimbus.anu.edu.au/pub/tridge/samba/ for this interesting program suite and much more information on SMB and NetBIOS over TCP/IP. There you will also find explanations of concepts like NetBIOS name and share.

To use SMBFS, you need a special mount program, which can be found in the ksmbfs package, found on ftp://ibiblio.org/pub/Linux/system/Filesystems/smbfs.

Example:
modprobe smbfs

There are no module parameters.


13.8. Serial Device Drivers


14. Maintenance Of This Document

This HOWTO is enthusiastically maintained by Bryan Henderson. If you find something incorrect or incomplete or can't understand something, Bryan wants to know so maybe the next reader can be saved the trouble you had.

The source for this document is DocBook SGML, and is available from the Linux Documentation Project.


15. History

I have derived this (in 2001) from the HOWTO of the same name by Lauri Tischler, dated 1997. While I have kept all of the information from that original document (where it is still useful), I have rewritten the presentation entirely and have added a lot of other information. The original HOWTO's primary purpose was to document LKM parameters.

The original HOWTO was first released (Release 1.0) June 20, 1996, with a second release (1.1) October 20, 1996.

The first release of Bryan's rewrite was in June 2001.


16. Copyright

Here is Lauri Tischler's copyright notice from the original document from which this is derived:

This document is Copyright 1996© by Lauri Tischler. Permisson is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies.

Permission is granted to copy and distribute modified versions of this document under the conditions for verbatim copying, provided that this copyright notice is included exactly as in the original, and that the entire resulting derived work is distributed under the terms of a permission notice identical to this one.

Permission is granted to copy and distribute translations of this document into another language, under the above conditions for modified versions.

Bryan Henderson, the current maintainer and contributing author of this document, licenses it under the same terms as above. His work is Copyright 2001©.

Notes

[1]

For the pedantic, see Section 10.6.

[2]

You probably know this type of disk as "IDE". Strictly speaking, IDE is an incorrect appellation. IDE refers to the "Integrated Drive Electronics" which all modern disk drives, notably all SCSI disk drives, use. The first IDE drives in common usage were ATA, and the names kind of got confused. ATA, like SCSI, is a precise specification of electrical signals, commands, etc.