Anyone can build the Linux kernel. The kernel source is freely available at http://www.kernel.org/, where everything from the earliest releases to the latest one can be downloaded. The kernel is released regularly and uses a versioning system to distinguish earlier releases from later ones. To find out which kernel version you are running, use the simple command uname. For example, on my machine I ran:
# uname -r
In the output (3.7.8-gnx-z30a on my machine), you can see the dotted decimal string 3.7.8. This is the Linux kernel version. In this string, the first value, 3, denotes the major release number, the second, 7, denotes the minor release, and the third, 8, is called the revision number. The major release combined with the minor release is called the kernel series; thus, I am using the 3.7 kernel series.
The rest of the string after 3.7.8 is -gnx-z30a. I'm using a self-compiled kernel and added -gnx-z30a as the signature of my kernel version. Some distributions, such as Ubuntu, Fedora and Red Hat, also append their own signature after the kernel version.
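To make the version scheme concrete, here is a small user-space C sketch that reads the running kernel's release string with uname(2), the same information uname -r prints, and splits it into major, minor and revision numbers plus the local signature. The names struct kver, parse_release(), running_kernel() and MY_KERNEL_VERSION() are hypothetical, chosen for illustration. The macro only mirrors the arithmetic of the kernel's KERNEL_VERSION() macro from <linux/version.h>. On a self-compiled kernel the suffix is typically set through the CONFIG_LOCALVERSION option (or the EXTRAVERSION line of the top-level Makefile).

```c
#include <stdio.h>
#include <string.h>
#include <sys/utsname.h>

/* Hypothetical macro: same arithmetic as KERNEL_VERSION(a, b, c). */
#define MY_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

struct kver {
    int major;       /* e.g. 3 */
    int minor;       /* e.g. 7 */
    int revision;    /* e.g. 8 */
    char local[65];  /* e.g. "-gnx-z30a", empty if there is no signature */
};

/* Parse a release string such as "3.7.8-gnx-z30a".
 * Returns 0 on success, -1 if it does not start with "<major>.<minor>". */
static int parse_release(const char *release, struct kver *v)
{
    int consumed = 0, more = 0;

    v->revision = 0;     /* releases such as "3.0" have no third number */
    v->local[0] = '\0';

    if (sscanf(release, "%d.%d%n", &v->major, &v->minor, &consumed) != 2)
        return -1;

    const char *rest = release + consumed;
    if (sscanf(rest, ".%d%n", &v->revision, &more) == 1)
        rest += more;

    /* Whatever follows the numbers is the local/distribution signature. */
    snprintf(v->local, sizeof v->local, "%s", rest);
    return 0;
}

/* Fill *v from the running kernel, like `uname -r`. Returns 0 or -1. */
static int running_kernel(struct kver *v)
{
    struct utsname u;

    if (uname(&u) != 0)
        return -1;
    return parse_release(u.release, v);
}
```

For example, parse_release("3.7.8-gnx-z30a", &v) yields major 3, minor 7, revision 8 and local "-gnx-z30a", and MY_KERNEL_VERSION(3, 7, 8) evaluates to 198408.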
An example of building the kernel can be found in this article.
Kernel Source Exploration
To build the Linux kernel, you need the latest or any other stable kernel source. As an example, we use the source of stable kernel release 3.8.2. Different versions of the Linux kernel source can be found at http://www.kernel.org; get the latest or any stable release from there.
Assuming you have downloaded the stable kernel release source to your machine, extract it and put it in the /usr/src directory.
Most of the kernel source is written in C. It is organized into various directories and subdirectories, each named after what it contains. The directory structure of the kernel may look like the diagram below.
Now let's dive into each directory.
The Linux kernel can run on anything from handheld devices to huge servers. It supports Intel, Alpha, MIPS, ARM and SPARC processor architectures, among others. The 'arch' directory contains a subdirectory for each processor architecture, and each subdirectory holds the architecture-dependent code. For example, code for a PC is under arch/x86 (arch/i386 in kernels before 2.6.24), and code for ARM processors is under arch/arm (arch/arm64 for 64-bit ARM).
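As a rough illustration of how one code base can pick architecture-specific code, the following C sketch maps the compiler's predefined target macros (__x86_64__, __aarch64__ and so on, as defined by GCC and Clang) to the matching arch/ subdirectory. The function name kernel_arch_dir() is hypothetical, and the kernel's own build does not work this way: it selects the arch/ subdirectory through the ARCH variable passed to make (for example make ARCH=arm).

```c
#include <stddef.h>

/* Hypothetical helper: return the arch/ subdirectory that matches the
 * architecture this file is being compiled for, or NULL if unknown. */
static const char *kernel_arch_dir(void)
{
#if defined(__x86_64__) || defined(__i386__)
    return "arch/x86";      /* 32- and 64-bit x86 share arch/x86 */
#elif defined(__aarch64__)
    return "arch/arm64";    /* 64-bit ARM */
#elif defined(__arm__)
    return "arch/arm";      /* 32-bit ARM */
#elif defined(__mips__)
    return "arch/mips";
#elif defined(__sparc__)
    return "arch/sparc";
#elif defined(__alpha__)
    return "arch/alpha";
#elif defined(__powerpc__)
    return "arch/powerpc";
#else
    return NULL;            /* not covered by this sketch */
#endif
}
```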
A boot loader such as LILO (the LInux LOader) or GRUB loads the kernel into memory, and control is then passed to an assembly routine in arch/x86/kernel/head_32.S (head_64.S on 64-bit systems). This routine is responsible for hardware initialization, and hence it is architecture specific. Once hardware initialization is done, control is passed to the start_kernel() routine defined in init/main.c. This routine is analogous to the main() function of a C program: it is the starting point of the generic kernel code. After the architecture-specific setup is done, kernel initialization starts, and this initialization code is kept under the init directory. It is responsible for proper kernel initialization, including the initialization of page addresses, the scheduler, traps, IRQs, signals, timers, the console, etc. The code under this directory is also responsible for processing the boot-time command-line arguments.
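The boot-time command line processed during initialization is a space-separated list of parameters such as root=/dev/sda1 ro quiet. The following user-space C sketch shows the basic splitting into name/value pairs. next_param() is a hypothetical helper that ignores quoting, and it is not the kernel's code: in the kernel, parsing is done by parse_args() in kernel/params.c, and handlers for individual parameters are registered with macros such as __setup().

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Fetch the next parameter from *cursor, splitting "key=value" in place.
 * Returns 1 and sets *param and *val (NULL for bare flags such as "ro"),
 * or 0 when the line is exhausted. Quoted values are not supported. */
static int next_param(char **cursor, char **param, char **val)
{
    char *p = *cursor;

    while (isspace((unsigned char)*p))      /* skip separators */
        p++;
    if (*p == '\0')
        return 0;

    *param = p;
    while (*p && !isspace((unsigned char)*p))
        p++;
    if (*p)
        *p++ = '\0';                        /* terminate this token */

    char *eq = strchr(*param, '=');
    if (eq)
        *eq = '\0';                         /* split "key=value" */
    *val = eq ? eq + 1 : NULL;

    *cursor = p;
    return 1;
}
```

On a running system you can try it on a writable copy of the contents of /proc/cmdline, which holds the command line the kernel was booted with.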
This directory contains the source code of various cryptographic algorithms, e.g. MD5, SHA-1, Blowfish, Serpent and many more. Most of these algorithms can be built as kernel modules, which can be loaded and unloaded at run time. We will talk about kernel modules in subsequent chapters.
This directory contains the documentation for the kernel sources.
Device driver code can be thought of as split into two parts. One part communicates with the user: it takes commands from the user, displays output to the user, etc. The other part communicates with the device: it controls the device, sends commands to it and receives data from it, etc. The part of the driver that communicates with the user is hardware independent and resides under this 'drivers' directory, which contains the source code of the various device drivers. Device drivers can be built as kernel modules or compiled directly into the kernel image. In fact, the majority of the Linux kernel code is device driver code, so much of our discussion will revolve around device drivers too.
This directory is further divided into subdirectories according to the kind of driver code they contain.
- contains drivers for block devices, e.g. hard disks.
- contains drivers for proprietary CD-ROM drives.
- contains drivers for character devices, e.g. terminals, serial ports, mice, etc.
- contains ISDN drivers.
- contains drivers for network cards.
- contains drivers for PCI bus access and control.
- contains drivers for SCSI devices.
- contains drivers for IDE devices.
- contains drivers for various sound cards.
The other part of a device driver, the one that communicates with the device, is hardware dependent, or more specifically bus dependent: it depends on the type of bus the device uses for communication. Architecture-specific bus code resides under the arch/ directory (for example arch/x86/pci), while generic bus support lives in drivers/ subdirectories such as drivers/pci.
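The split between the hardware-independent and hardware-dependent halves of a driver can be sketched in user space with tables of function pointers, loosely modelled on the kernel's struct file_operations. Everything here (struct dev_backend, struct drv_ops, generic_read(), fake_xfer_in() and the fake device) is hypothetical and only illustrates the idea; a real driver registers its operations with the kernel and talks to actual hardware.

```c
#include <stddef.h>
#include <string.h>

/* Hardware-dependent half (hypothetical): how bytes come from the device.
 * A real driver would talk to the bus (PCI, USB, ...) here. */
struct dev_backend {
    size_t (*xfer_in)(void *dev, char *buf, size_t len);
};

/* Hardware-independent half: the interface user space sees. */
struct drv_ops {
    size_t (*read)(const struct dev_backend *hw, void *dev,
                   char *buf, size_t len);
};

/* Generic read path: validates the request, then defers to the backend. */
static size_t generic_read(const struct dev_backend *hw, void *dev,
                           char *buf, size_t len)
{
    if (buf == NULL || len == 0)
        return 0;
    return hw->xfer_in(dev, buf, len);
}

/* Fake device standing in for real hardware: `dev` is a C string that
 * is copied out, truncated to the caller's buffer size. */
static size_t fake_xfer_in(void *dev, char *buf, size_t len)
{
    const char *msg = dev;
    size_t n = strlen(msg);

    if (n > len)
        n = len;
    memcpy(buf, msg, n);
    return n;
}

static const struct dev_backend fake_hw = { .xfer_in = fake_xfer_in };
static const struct drv_ops char_drv = { .read = generic_read };
```

Swapping fake_hw for a different backend changes how the bytes are obtained without touching generic_read(), which is the point of the split.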
Linux supports many file systems, e.g. ext2, ext3, FAT, VFAT, NTFS, NFS, JFFS2 and more. The source code for each supported file system is kept in its own subdirectory of this directory, e.g. fs/ext2, fs/ext3, etc.
Linux also provides a virtual file system (VFS) that acts as a wrapper around these different file systems. The VFS interface lets the user access different file systems under a single root ('/'). The VFS code also resides here. Data structures related to the VFS are defined in include/linux/fs.h; take note of it, since it is a very important header file for kernel development.
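The effect of the VFS can be observed from user space: the same statfs(2) call works on any mounted file system, and its f_type field reports which concrete file system handled the request. In the following C sketch, fs_name() is a hypothetical helper, and the magic numbers are copied from the kernel's linux/magic.h header so the example does not need kernel headers installed; only a few file systems are listed.

```c
#include <stddef.h>
#include <sys/vfs.h>   /* statfs(2) */

/* Superblock magic numbers, copied from the kernel's linux/magic.h. */
#define MAGIC_EXT2_3_4 0xEF53      /* ext2, ext3 and ext4 share this */
#define MAGIC_TMPFS    0x01021994
#define MAGIC_PROC     0x9fa0
#define MAGIC_NFS      0x6969
#define MAGIC_MSDOS    0x4d44

/* Ask which concrete file system backs `path`.
 * Returns a short name, "other" if unknown, or NULL if statfs() fails. */
static const char *fs_name(const char *path)
{
    struct statfs sfs;

    if (statfs(path, &sfs) != 0)
        return NULL;

    switch ((unsigned long)sfs.f_type) {
    case MAGIC_EXT2_3_4: return "ext2/ext3/ext4";
    case MAGIC_TMPFS:    return "tmpfs";
    case MAGIC_PROC:     return "proc";
    case MAGIC_NFS:      return "nfs";
    case MAGIC_MSDOS:    return "fat";
    default:             return "other";
    }
}
```

For example, fs_name("/proc") would normally return "proc", while the result for "/" depends on how your root file system is formatted.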
This is one of the most important directories in the kernel. It contains the generic code for kernel subsystems, i.e. code for system calls, timers, the scheduler, DMA, interrupt handling and signal handling. The architecture-specific kernel code is kept under arch/*/kernel.
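System calls are the entry points from user space into this generic kernel code. The following C sketch invokes getpid through the generic syscall(2) wrapper and the system call number SYS_getpid instead of the usual C library function; both paths end up in the same kernel implementation, which lives under kernel/. raw_getpid() is a hypothetical name.

```c
#define _GNU_SOURCE          /* for syscall() in <unistd.h> */
#include <sys/syscall.h>     /* SYS_getpid */
#include <unistd.h>

/* Enter the kernel's getpid system call directly by number,
 * bypassing the C library's getpid() wrapper. */
static long raw_getpid(void)
{
    return syscall(SYS_getpid);
}
```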
Along with the kernel/ directory, the include/ directory is also very important for kernel development. It contains the generic kernel headers, organized into subdirectories such as include/linux. Architecture-specific headers, which older kernels kept in include/asm-<arch> subdirectories, live under arch/*/include in 3.x kernels.
The code for all three System V IPC mechanisms (semaphores, shared memory and message queues) resides here.
The kernel's library code is kept in this directory. Architecture-specific library code resides under arch/*/lib.
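To see the System V shared memory code in action from user space, the following C sketch creates a private segment with shmget(), attaches it with shmat(), copies a string through it, then detaches and removes it. In the kernel these calls are handled by ipc/shm.c, while semaphores and message queues have their own files (ipc/sem.c, ipc/msg.c). shm_roundtrip() is a hypothetical helper; a real program would share the segment between two processes rather than use it within one.

```c
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Copy `msg` into a fresh System V shared memory segment and read it back
 * into `out`. Returns 0 on success, -1 on any failure. */
static int shm_roundtrip(const char *msg, char *out, size_t outlen)
{
    size_t len = strlen(msg) + 1;           /* include the terminator */
    if (len > outlen)
        return -1;

    int id = shmget(IPC_PRIVATE, len, IPC_CREAT | 0600);  /* create */
    if (id == -1)
        return -1;

    char *seg = shmat(id, NULL, 0);                       /* attach */
    if (seg == (void *)-1) {
        shmctl(id, IPC_RMID, NULL);
        return -1;
    }

    memcpy(seg, msg, len);   /* another process attached to `id` would see this */
    memcpy(out, seg, len);

    shmdt(seg);                                           /* detach */
    shmctl(id, IPC_RMID, NULL);                           /* remove */
    return 0;
}
```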
This too is a very important directory from a kernel development perspective. It contains the generic code for memory management and the virtual memory subsystem; again, the architecture-specific code is in arch/*/mm/. This part of the kernel is responsible for allocating and freeing memory, paging, page fault handling, memory mapping, the various caches, etc.
The code for the kernel's networking subsystem resides here. It includes code for various protocols such as TCP/IP, ARP, Ethernet, ATM, Bluetooth, etc. It also includes the socket implementation, which makes it an interesting directory for networking geeks to look into.
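Demand paging, one of the jobs of this code, can be observed with a short C sketch: anonymous memory obtained with mmap(2) is typically not backed by physical pages until it is first written, so each first write below usually triggers a page fault that the memory management code resolves. touch_anon_pages() is a hypothetical helper; the page size is queried with sysconf(_SC_PAGESIZE) rather than assumed.

```c
#define _GNU_SOURCE          /* for MAP_ANONYMOUS */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map `npages` pages of anonymous memory, write one byte to each page and
 * unmap it again. Returns the number of pages touched, or -1 on failure. */
static long touch_anon_pages(size_t npages)
{
    long page = sysconf(_SC_PAGESIZE);   /* usually 4096 on x86 */
    if (page <= 0)
        return -1;
    size_t len = npages * (size_t)page;

    unsigned char *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;

    long touched = 0;
    for (size_t i = 0; i < npages; i++) {
        mem[i * (size_t)page] = 1;       /* first write: usually a page fault */
        touched++;
    }

    munmap(mem, len);
    return touched;
}
```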
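The socket implementation can be exercised with a few lines of C. The following sketch creates a connected pair of UNIX-domain sockets with socketpair(2), sends a short message on one end and reads it from the other; these calls go through the generic socket code in net/socket.c and the UNIX-domain protocol code in net/unix/. unix_echo() is a hypothetical helper.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send `msg` through a connected pair of UNIX-domain sockets and read it
 * back into `out`. Returns the number of bytes received, or -1. */
static ssize_t unix_echo(const char *msg, char *out, size_t outlen)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    ssize_t got = -1;
    size_t len = strlen(msg);
    if (send(sv[0], msg, len, 0) == (ssize_t)len)
        got = recv(sv[1], out, outlen, 0);   /* read from the other end */

    close(sv[0]);
    close(sv[1]);
    return got;
}
```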
This directory contains the kernel's build and configuration subsystem: the scripts and code used to configure and build the kernel.
This directory contains security functions and the code of the Linux Security Modules, such as SELinux.
This directory contains the code for the sound subsystem.
When the kernel is compiled, a lot of code is built as modules that can be loaded into the running kernel later. This directory holds all those modules; it remains empty until the kernel has been built at least once.
Apart from these important directories, there are also a few files in the root of the kernel source tree.
- COPYING – Copyright and licensing (GNU GPL v2).
- CREDITS – A partial list of people who have contributed to the Linux project.
- MAINTAINERS – List of maintainers who maintain kernel subsystems and drivers. It also describes how to submit kernel changes.
- Makefile – Kernel’s main or root makefile.
- README – The release notes for the Linux kernel. It explains how to install and patch the kernel, and what to do if something goes wrong.
We can use the make documentation targets to generate the Linux kernel documentation in various formats, such as PDF, HTML, man pages or PostScript (psdocs).
To generate the kernel documentation, run any of these make targets from the root of your kernel sources.
Browsing the source code of a project as large as the Linux kernel can be very tedious and time consuming. Unix systems provide two tools, ctags and cscope, for browsing the code base of large projects, and they make source code browsing much more convenient. The Linux kernel has built-in support for both: the top-level Makefile provides the make tags and make cscope targets, which generate the index files these tools use.
Using cscope, we can:
- Find all references of a symbol
- Find a function's definition
- Find the callers of a function
- Find a particular text string
- Change a particular text string
- Find a particular file
- Find all the files that include a particular file