Linux Kernel Boot Time Optimization

The Linux kernel boot process is already highly optimized. By that I mean there is no major deficiency in the kernel's loading architecture (it is mostly serialized/linear), so little or nothing can be changed there.

 startup assembly code (setting up clock, CPU and SoC registers) ->
 detect memory layout, detect display, set up MMU, set up interrupt table, enable paging and start ->
 mode changes to protected ->
 kernel decompression ->
 transfer to the main kernel start, where it again sets up the interrupt table and initializes the stack ->
 user space -> schedule

Also, because the kernel is versatile enough to adapt to any board, it gives us so much control that boot time depends largely on us: how we ported the kernel and how devices are initialized for our platform.
So I will try to provide pointers that should be cross-checked whenever someone needs to reduce kernel boot time on a platform. These pointers may be important for some platforms and less relevant for others.
Anyone who wants to optimize booting:
  • Must completely understand the detailed steps of how the kernel boots.
  • Should also know how to measure the execution time of code and functions.
  • Can, depending on the platform, remove or defer less essential steps and prioritize the essential ones.
  • Must carry out this optimization work over the whole path from the boot loader to the initial prompt, using initramfs.
The main approach is as follows:
  1. Size
    • Reduce the size of the binaries for each successive component loaded.
    • Remove features that are not required.
  2. Speed
    • Optimize for the target processor.
    • Use a faster medium for loading the primary and secondary boot loaders and the kernel.
    • Reduce the number of tasks leading up to boot.
    • Remove features that are not required.
I will mention points as they come to mind.
Many of the settings described here for the kernel apply equally to X-Loader and U-Boot:
    • removing unused features
    • compiling for speed
    • debugging features
    • debug messages, etc.
Compile the kernel with optimization set for speed rather than size.
  • Start with minimal kernel options and add only what is required; this yields an optimized working .config file. (It removes many unneeded features: RAID/LVM, PCMCIA, PCI, IDE, IPv6, TIPC, SCTP, unneeded filesystems, ext. partitions, CPU hotplug, memory hotplug.)
  • Remove as many debugging features as possible (mostly under menuconfig -> Kernel hacking), including lock checks and kernel debugging.
  • Remove config info support.
The catch is that you have to find out how each of these is done for X-Loader, U-Boot and the kernel.
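
As a rough aid for the trimming step above, here is a minimal sketch that lists options still enabled in a kernel .config. The helper name audit_config and the option list are my own assumptions about typical candidates (CONFIG_IKCONFIG is the config info support, CONFIG_MD covers RAID/LVM); the right list is project specific.

```shell
# audit_config FILE: print every option from a list of usually-unneeded
# features that is still built in (=y) or built as a module (=m).
# The option list is an illustrative assumption, not a recommendation.
audit_config() {
    for opt in IKCONFIG DEBUG_KERNEL PROVE_LOCKING DEBUG_SPINLOCK \
               IPV6 IP_SCTP TIPC PCCARD MD HOTPLUG_CPU MEMORY_HOTPLUG \
               CC_OPTIMIZE_FOR_SIZE; do
        # grep exits non-zero when the option is absent; that is not an error here
        grep -E "^CONFIG_${opt}=(y|m)\$" "$1" || true
    done
}
```

Running audit_config .config in the kernel build directory prints the candidates that are still enabled; each one can then be reviewed in menuconfig.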

Kernel boot time
  • Generic kernel boot architecture
This covers how the kernel image is:
    1. Read (from network, non-volatile memory, UART or USB).
      ** It matters where the kernel is loaded from; flash and network are the fastest. (In U-Boot: setenv verify n)
    2. Decompressed (which format the kernel image is compressed in, or whether it is uncompressed, etc.).
      ** Disable comparison of the copied image and the CRC/MD5 check calculation, if present.
    3. Loaded into RAM, or not loaded at all when execute in place (XIP) is used.
      ** Disable comparison of the copied image and the CRC/MD5 check calculation, if present.

      ** Disable boot logo loading and display on the LCD.
      ** Disable beep or LED signalling/display or animation, if used.
Linux has options to configure these things to match our needs, so changing them is highly project specific. Every topic I discuss below comes with this condition, so here is an example of what I mean when I say that a choice depends heavily on the project and that no particular way can be called right and another wrong.
Example: consider sensor-network hardware for data logging and transmission. It may require Linux-kernel-based software to handle a multicore CPU and periodic data from a number of sensors, plus processing, storage, hosting and transmission of that data.
Such a device may be installed in a remote location and needs to wake up at regular intervals, collect data, and then sleep or shut down (another microcontroller boots it again next time); this is required to save power.
The logging software needs little or no updating. It is a non-realtime embedded system whose execution is very predictable (in the sense of fewer flushes/invalidates). In this situation we can reduce the amount of RAM and save considerably. Using NOR flash with XIP firmware and an XIP kernel saves boot time, because there is no reading from non-volatile memory, no decompression and no copying of the kernel to RAM. Since code is executed directly from non-volatile memory, less RAM is required.
But XIP is not suited to mobile devices or consumer electronics, where execution of kernel code is very random (because the device is highly interactive), so fast access to the complete code is a must. The XIP optimization is therefore of no use in that case. There we need to get the kernel code into RAM, so we have to consider which compression algorithms are less complex and faster while still producing a reasonably well-compressed image. So there is a storage vs. time trade-off.
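
The storage vs. time trade-off can be estimated with simple arithmetic: load time is roughly the compressed size divided by the read rate of the storage, plus the uncompressed size divided by the decompression rate. A minimal sketch follows; est_load_ms is a hypothetical helper, and every input has to be measured on the actual target.

```shell
# est_load_ms COMP_KB UNCOMP_KB READ_KBPS DECOMP_KBPS
# Estimate (in ms) the time to read a compressed image and decompress it.
#   COMP_KB     compressed image size in KB
#   UNCOMP_KB   uncompressed image size in KB
#   READ_KBPS   storage read rate in KB/s
#   DECOMP_KBPS decompressor output rate in KB/s
est_load_ms() {
    awk -v c="$1" -v u="$2" -v r="$3" -v d="$4" \
        'BEGIN { printf "%.0f\n", (c / r + u / d) * 1000 }'
}
```

For example, est_load_ms 2048 8192 4096 16384 prints 1000 (made-up numbers: 0.5 s of reading plus 0.5 s of decompression). Comparing this figure for two compression formats, each with its own measured sizes and rates, shows which one wins on a given board.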
  • Detection of the machine, setting the cache policy, setting up paging and interrupts.
  • Setting up memory required for e.g. the crash kernel, per-CPU variables, video buffers, etc.
    ** Check/analyze whether these can be set up once and stored so that they can be reused next time: building zone lists/pages, hash table entries for the dentry cache, hash table cache, etc.
    ** Timer calibration can be optimized: initially use an internal register timer for timing.
    ** Instead of calibrating, set loops per jiffy (lpj).
  • Setting up the L2 cache and calibrating timers.
  • From here onwards the peripherals of the SoC are "probed" and initialized.
    ** Defer setting the time from the network.
    ** Disable or defer peripherals that are not used.
    ** Disable creation of all tty devices and virtual consoles, but enable the required consoles and UART port dev entries.
    ** Try to minimize the probe function code of video, imaging and graphics components.
  • From here onwards external peripherals are probed and initialized.
    ** Try isolating modules that are not required immediately after boot: make them loadable modules (*.ko) and load them after kernel boot from init.rc.
    ** Try to defer or disable probing of all i2c devices, because probing each i2c/PCI/IDE device takes a lot of time.
    ** Instead of bus probing, use fixed probing of the devices actually present on the board.
    ** Use noprobe for IDE, and disable PCI scan show, or disable PCI entirely if no peripheral is on PCI.
    ** Defer loading/mounting of eMMC, flash and SD card partitions (if present).
    ** Try disabling the above in U-Boot too.
    ** Analyse which type of file system is best suited: initramfs, cramfs, and jffs2 later/deferred after boot.
    ** Try to compile everything in /bin into a single BusyBox binary.
    ** Try to use fast boot, which almost removes the boot loader.
    ** Disable prints and enable silent boot, i.e. disable display of boot information on the console. This saves quite some time. Disable printk, or keep printk but at the lowest debugging level.
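
Two of the points above (a preset lpj and silent boot) are set through the kernel command line, using the lpj= and quiet parameters. The sketch below pulls the lpj value out of the log of an earlier, fully calibrated boot and appends both parameters to existing boot arguments. The helper names and the example console/root values are hypothetical, and an lpj value is only valid for the same board running at the same CPU clock.

```shell
# extract_lpj: read a kernel log on stdin and print the first lpj value,
# taken from a line such as
#   "Calibrating delay loop... 1594.16 BogoMIPS (lpj=7970816)".
extract_lpj() {
    sed -n 's/.*(lpj=\([0-9][0-9]*\)).*/\1/p' | head -n 1
}

# fast_bootargs BASE LPJ: append "quiet" and a preset lpj to BASE.
fast_bootargs() {
    printf '%s quiet lpj=%s\n' "$1" "$2"
}
```

On the target this could be used as lpj=$(dmesg | extract_lpj), followed by fast_bootargs 'console=ttyS0,115200 root=/dev/ram0' "$lpj"; the resulting string then goes into the boot loader's bootargs.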

Deferring of modules can be done in the board-specific file, which lies in /arch/arm/mach-xxxx/board-xxxx.c.
In menuconfig you can choose which devices are built as loadable modules and which are compiled into the kernel.
Modules that are not required can be disabled completely.
Modules that can be deferred can be loaded from the init.rc script using "insmod xxxx.ko".
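
A minimal sketch of such a deferred-loading step, written as a small shell helper that a late boot script could call. The function name, the module directory and the module names in the usage note are hypothetical; only insmod itself comes from the text above.

```shell
# load_deferred DIR MODULE...: insert DIR/MODULE.ko for each MODULE, in order.
# Set INSMOD=echo for a dry run that only prints the paths.
load_deferred() {
    dir=$1
    shift
    for m in "$@"; do
        # Report a failed module but keep loading the rest
        "${INSMOD:-insmod}" "$dir/$m.ko" || echo "failed to load $m" >&2
    done
}
```

For example, load_deferred /lib/modules wifi_drv cam_sensor would run insmod on /lib/modules/wifi_drv.ko and then /lib/modules/cam_sensor.ko. Order matters when one module depends on another.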

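  • Profiling of functions
Apart from this, use ktime_t to measure the time taken by a group of functions or by a specific function, and try to optimize it.
Use kernel initcall timing (boot with initcall_debug on the kernel command line) to see how much time each initcall takes. The duration in microseconds is typically field 8 when printk timestamps are present in the log and field 6 when they are not:
 dmesg | grep initcall | sort -k8 -n
 dmesg | grep initcall | sort -k6 -n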
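
Because the field position shifts with the log format, a slightly more robust variant is to look for the word "after" and take the number that follows it. This is a minimal sketch assuming the usual "initcall <function>+<offset> returned <code> after <N> usecs" line format; top_initcalls is a hypothetical helper name.

```shell
# top_initcalls [N]: read a kernel log on stdin and print the N slowest
# initcalls (default 10) as "<usecs> <function>", slowest first.
top_initcalls() {
    awk '
        /initcall .* returned .* after [0-9]+ usecs/ {
            for (i = 1; i < NF; i++) {
                if ($i == "initcall") name = $(i + 1)
                if ($i == "after")    usecs = $(i + 1)
            }
            sub(/\+.*/, "", name)   # drop the +0x0/0x10 offset suffix
            print usecs, name
        }
    ' | sort -k1,1nr | head -n "${1:-10}"
}
```

Usage on the target: dmesg | top_initcalls 20. The functions at the top of the list are the first candidates for deferring or trimming.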

** Manually reduce long probe functions.
** Manually reduce/optimize long init functions.
Use kernel function timing to measure function timing and to trace calls:
 Linux Trace Toolkit
 ftrace, which can be used once Linux has booted

Disable the tracing mechanism in the kernel once the tracing output has been collected.
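
A minimal sketch of that start/collect/stop cycle with ftrace, assuming the kernel was built with the function graph tracer and that the tracing directory is mounted at /sys/kernel/debug/tracing (on newer kernels it may be /sys/kernel/tracing). The helper names and the TRACEFS override are my own; foo_probe in the usage note stands for whichever function is being examined.

```shell
# start_graph_trace FUNC: trace FUNC and everything it calls, with timings.
start_graph_trace() {
    t=${TRACEFS:-/sys/kernel/debug/tracing}
    echo 0 > "$t/tracing_on"                # pause while reconfiguring
    echo "$1" > "$t/set_graph_function"     # limit the graph to FUNC
    echo function_graph > "$t/current_tracer"
    echo 1 > "$t/tracing_on"
}

# stop_trace: stop recording and switch the tracer off again.
stop_trace() {
    t=${TRACEFS:-/sys/kernel/debug/tracing}
    echo 0 > "$t/tracing_on"
    echo nop > "$t/current_tracer"
}
```

Typical use as root: start_graph_trace foo_probe, trigger the code path, copy the trace file from the same directory, then call stop_trace so that tracing overhead does not remain in the running system.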
