Wednesday, April 29, 2009

network boot

Some of my colleagues here are working on the Web of Things. They are currently planning to set up a playground in our group's office. For the gateways, Dominique decided to use MicroClient Sr. machines, which are tiny VIA-based ultra-low-power PCs.

I suggested booting and running them entirely from the network. This saves money, because no local storage devices have to be bought. It also eases administration, because there is only one central image to maintain. I promised to explain to him how it works, so here it goes.

The first thing to mention is that the firmware of the Realtek network card supports PXE. If you enable "LAN BIOS Execute" in the boot settings of the AMI BIOS, you are prompted during boot to press Shift-F10 to configure the Realtek Boot Agent. There you can set the network boot protocol to PXE and configure the boot order to always boot from the network first. That's all that needs to be done on the MicroClient.


When you reboot the client, you can see it searching for a DHCP server to get its network configuration and further bootstrapping information. I decided to use version 3 of the Internet Systems Consortium's DHCP implementation, which is in the dhcp3-server package on my Debian system. The configuration in /etc/dhcp3/dhcpd.conf looks like this:

allow booting;
allow bootp;

# Standard configuration directives...
option domain-name "playground.example.com";
option subnet-mask 255.255.255.0;
option broadcast-address 192.168.1.255;
option domain-name-servers 192.168.1.1;
option routers 192.168.1.1;

subnet 192.168.1.0 netmask 255.255.255.0 {
}

# Group the PXE bootable hosts together
group {
       # PXE-specific configuration directives...
       next-server 192.168.1.1;
       filename "pxelinux.0";

       host hostname {
        hardware ethernet 44:4d:50:02:f7:3d;
        fixed-address 192.168.1.2;
       }
}

The important parts are the allow statements and the next-server and filename statements. Together they tell the PXE client that a TFTP server at 192.168.1.1 provides a boot loader called pxelinux.0.
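Every additional MicroClient needs its own host declaration inside the group block. The following is a minimal sketch for generating these declarations instead of typing them by hand. dhcp_host_entry is a hypothetical helper, and the host name, MAC and IP address in the usage line are made-up examples.

```shell
#!/bin/bash
# Print a dhcpd.conf host declaration for one PXE client.
# dhcp_host_entry is a hypothetical helper: paste its output inside the group { } block.
dhcp_host_entry() {
    local name="$1" mac="$2" ip="$3"
    printf '        host %s {\n' "$name"
    printf '                hardware ethernet %s;\n' "$mac"
    printf '                fixed-address %s;\n' "$ip"
    printf '        }\n'
}
```

For example, `dhcp_host_entry microclient2 00:11:22:33:44:55 192.168.1.3` prints a block like the one above for a second client. Before restarting the server, I would check the file with `dhcpd3 -t -cf /etc/dhcp3/dhcpd.conf`. As far as I know, dhcpd3 is the binary name in Debian's dhcp3-server package.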

The next step is to set up the TFTP server. I use the one from the tftpd-hpa package, which is derived from the OpenBSD TFTP server with some extra options added. The configuration is easy: you tell the server to run in stand-alone rather than inetd mode (-l), and you specify a root directory for the daemon's chroot environment (-s). So the contents of /etc/default/tftpd-hpa are just:

RUN_DAEMON="yes"
OPTIONS="-l -s /var/lib/tftpboot"

The DHCP server promised the PXE client that a boot loader would be available, so you need to provide one. The syslinux package ships a ready-to-use image, so you simply have to copy /usr/lib/syslinux/pxelinux.0 to the TFTP root directory.
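Here is a small sketch of that step. It also creates the pxelinux.cfg directory needed for the next step. The function name install_pxelinux is hypothetical. TFTP_ROOT and PXELINUX_BIN default to the paths used in this post and can be overridden if your system differs.

```shell
#!/bin/bash
# Populate the TFTP root with the PXELINUX boot loader.
# Defaults follow this post; override TFTP_ROOT / PXELINUX_BIN if needed.
TFTP_ROOT="${TFTP_ROOT:-/var/lib/tftpboot}"
PXELINUX_BIN="${PXELINUX_BIN:-/usr/lib/syslinux/pxelinux.0}"

install_pxelinux() {
    # pxelinux.cfg/ holds the per-client configuration files
    mkdir -p "$TFTP_ROOT/pxelinux.cfg" || return 1
    # a real copy, not a symlink: tftpd-hpa is chrooted into $TFTP_ROOT
    cp "$PXELINUX_BIN" "$TFTP_ROOT/pxelinux.0"
}
```

Run it as root with `install_pxelinux`.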

The next step is to provide a configuration for the boot loader. The PXELINUX wiki explains the rules PXELINUX follows when it searches for its configuration file on the TFTP server. In my case this is the file /var/lib/tftpboot/pxelinux.cfg/01-44-4d-50-02-f7-3d (compare it with the client's MAC address in the DHCP server configuration above). This file is a normal SYSLINUX configuration file and looks like this:
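The MAC-based file name is easy to get wrong by hand. As I read the PXELINUX documentation, it consists of "01-" (the ARP hardware type for Ethernet) followed by the MAC address in lower-case hex, separated by dashes. If that file is missing, PXELINUX tries the client's IP address in upper-case hex, then drops one trailing hex digit at a time, and finally falls back to "default". The two functions below are hypothetical helpers that only compute the first two of these names.

```shell
#!/bin/bash
# Config file name PXELINUX tries first: "01-" + MAC, lower case, dashes.
pxelinux_cfg_name() {
    # lower-case the hex digits, then turn ":" separators into "-"
    printf '01-%s\n' "$(printf '%s' "$1" | tr 'A-F' 'a-f' | tr ':' '-')"
}

# Config file name based on the client IP: each octet as two upper-case hex digits.
pxelinux_ip_name() {
    local IFS=.
    set -- $1    # split the dotted quad into $1..$4
    printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
}
```

For the client above, `pxelinux_cfg_name 44:4d:50:02:f7:3d` prints `01-44-4d-50-02-f7-3d` and `pxelinux_ip_name 192.168.1.2` prints `C0A80102`.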

default playground

label playground
kernel vmlinuz
initrd initrd.img
append ip=dhcp root=/dev/nfs ro nfsroot=192.168.1.1:/var/lib/nfs/debian-lenny-32/

This defines a single boot target using the kernel vmlinuz and the initial RAM disk initrd.img. The root file system is an NFS file system on the server 192.168.1.1 under the directory /var/lib/nfs/debian-lenny-32/.

Note that according to the Syslinux documentation you need version 3.71 or later to use the initrd statement. If you have an older version, for example because you run the current Ubuntu release *grml*, you need to add the kernel boot option initrd=initrd.img to the append statement instead.
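The sketch below writes the configuration shown above and covers both Syslinux variants. write_pxelinux_cfg and the LEGACY_INITRD switch are hypothetical names. The label and file names match this post.

```shell
#!/bin/bash
# Write a PXELINUX configuration for one client.
# Arguments: config file path, NFS root as server:/path.
# LEGACY_INITRD=1 targets Syslinux older than 3.71 (no "initrd" keyword).
write_pxelinux_cfg() {
    local cfg_file="$1" nfs_root="$2"
    local append="ip=dhcp root=/dev/nfs ro nfsroot=$nfs_root"
    {
        echo "default playground"
        echo
        echo "label playground"
        echo "kernel vmlinuz"
        if [ "${LEGACY_INITRD:-0}" = 1 ]; then
            # older Syslinux: pass the initrd as a kernel boot option instead
            append="initrd=initrd.img $append"
        else
            echo "initrd initrd.img"
        fi
        echo "append $append"
    } > "$cfg_file"
}
```

For example: `write_pxelinux_cfg /var/lib/tftpboot/pxelinux.cfg/01-44-4d-50-02-f7-3d 192.168.1.1:/var/lib/nfs/debian-lenny-32/`.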

The next thing you need is a system installation. You can use debootstrap to install a 32-bit lenny system under /var/lib/nfs/debian-lenny-32/ and then enter it like this:

for i in dev proc sys; do mount --bind /$i /var/lib/nfs/debian-lenny-32/$i; done
chroot /var/lib/nfs/debian-lenny-32 /bin/bash
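The debootstrap call itself is not shown above. A minimal sketch, assuming an i386 target and a mirror URL of my choosing, could look like this. lenny has since moved to archive.debian.org, so MIRROR defaults there. bootstrap_root is a hypothetical wrapper.

```shell
#!/bin/bash
# Install a 32-bit lenny tree for the clients with debootstrap.
# ROOT and MIRROR are assumptions and can be overridden.
ROOT="${ROOT:-/var/lib/nfs/debian-lenny-32}"
MIRROR="${MIRROR:-http://archive.debian.org/debian}"

bootstrap_root() {
    # --arch i386 gives the 32-bit installation used in this post
    debootstrap --arch i386 lenny "$ROOT" "$MIRROR"
}
```

Run `bootstrap_root` as root before the bind mounts and the chroot shown above.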

Then install the linux-image-2.6.26-2-486 kernel package and any other packages you want to have on the MicroClients later. As the Debian kernel needs an initrd image to boot, you have to create a suitable one. To do so, first set the root file system in /etc/fstab to
192.168.1.1:/var/lib/nfs/debian-lenny-32 /  nfs  ro,auto  0 0
and enable NFS boot support in the initrd by setting BOOT=nfs in /etc/initramfs-tools/initramfs.conf. Finally, create the initrd image with update-initramfs -k 2.6.26-2-486 -u.
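Here are the same steps as one function, meant to be run inside the chroot. prepare_nfs_root is a hypothetical name. FSTAB and INITRAMFS_CONF are only variables so that the function can be pointed at other files. The sed expression assumes initramfs.conf already contains a BOOT= line (BOOT=local by default).

```shell
#!/bin/bash
# Run inside the chroot: install the kernel and build an NFS-capable initrd.
KVER="${KVER:-2.6.26-2-486}"
NFS_ROOT="${NFS_ROOT:-192.168.1.1:/var/lib/nfs/debian-lenny-32}"
FSTAB="${FSTAB:-/etc/fstab}"
INITRAMFS_CONF="${INITRAMFS_CONF:-/etc/initramfs-tools/initramfs.conf}"

prepare_nfs_root() {
    apt-get install "linux-image-$KVER" || return 1
    # the root file system comes read-only from the NFS export
    echo "$NFS_ROOT / nfs ro,auto 0 0" >> "$FSTAB"
    # tell initramfs-tools to mount the root over NFS instead of a local disk
    sed -i 's/^BOOT=.*/BOOT=nfs/' "$INITRAMFS_CONF"
    update-initramfs -k "$KVER" -u
}
```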


As stated above, the PXE client looks for the kernel and the initrd on the TFTP server. So you have to copy both files from /var/lib/nfs/debian-lenny-32/boot to /var/lib/tftpboot, under the names vmlinuz and initrd.img used in the PXELINUX configuration. Symbolic links unfortunately do not work, because the TFTP server cannot access files outside its chroot directory.
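A minimal sketch of this copy step follows. It assumes the usual Debian file names /boot/vmlinuz-VERSION and /boot/initrd.img-VERSION. copy_boot_files is a hypothetical name. Because these are copies, you have to repeat the step after every update-initramfs run inside the chroot.

```shell
#!/bin/bash
# Copy kernel and initrd from the client installation into the TFTP root.
KVER="${KVER:-2.6.26-2-486}"
ROOT="${ROOT:-/var/lib/nfs/debian-lenny-32}"
TFTP_ROOT="${TFTP_ROOT:-/var/lib/tftpboot}"

copy_boot_files() {
    # real copies: tftpd-hpa cannot follow links out of its chroot
    cp "$ROOT/boot/vmlinuz-$KVER" "$TFTP_ROOT/vmlinuz" || return 1
    cp "$ROOT/boot/initrd.img-$KVER" "$TFTP_ROOT/initrd.img"
}
```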

Next you have to set up an NFS server. I recommend the nfs-kernel-server package, because I had problems with the user-space daemon. I did not hunt them down, though, so it might work as well. In any case you have to export the lenny installation directory and set proper access rights. This is done in /etc/exports with
/var/lib/nfs/debian-lenny-32/ 192.168.1.0/24(ro,no_root_squash,no_subtree_check)
Note that there must be no space between the network and the opening parenthesis. With a space, the options are granted to every host, while 192.168.1.0/24 only gets the defaults.
Start all daemons and let the client boot. If all went well, the client will find the DHCP server, the TFTP server, the PXELINUX image, the PXELINUX configuration, the kernel, the initrd and the NFS export. When the kernel runs /sbin/init, you will see some errors about a read-only file system. After quite some time all timeouts expire and you are prompted to log in. Congratulations! Your network boot system is running.
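"Start all daemons" could be scripted roughly like this. The sketch assumes the Debian init script names of the packages used in this post. start_boot_services is a hypothetical name, and EXPORTS and INITD are only variables so that the function can be pointed elsewhere. It adds the export line only once, so it is safe to re-run.

```shell
#!/bin/bash
# Add the NFS export (once) and restart the boot services.
EXPORTS="${EXPORTS:-/etc/exports}"
INITD="${INITD:-/etc/init.d}"
# no space before "(": the options must belong to 192.168.1.0/24
EXPORT_LINE='/var/lib/nfs/debian-lenny-32/ 192.168.1.0/24(ro,no_root_squash,no_subtree_check)'

start_boot_services() {
    # append the export only if the exact line is not there yet
    grep -qxF "$EXPORT_LINE" "$EXPORTS" 2>/dev/null || echo "$EXPORT_LINE" >> "$EXPORTS"
    local svc
    for svc in dhcp3-server tftpd-hpa nfs-kernel-server; do
        "$INITD/$svc" restart || return 1
    done
    # make the running NFS server re-read /etc/exports
    exportfs -ra
}
```

Afterwards, `showmount -e 192.168.1.1` should list the lenny directory.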


As we want to boot multiple clients from the same installation, we cannot grant them read-write access. Instead, the above configuration mounts the root file systems read-only. But on a standard Debian system there are daemons that want to write log files and create lock files, so we need writable file systems at least for /var and /tmp.

As we want to keep the MicroClients diskless, I chose tmpfs, a file system that keeps its contents in RAM. This has the valuable benefit that a reboot also resets all system state. For /tmp, tmpfs is clearly the file system of choice. For /var things are a bit different, because the system expects a certain directory structure on /var, which a fresh tmpfs does not have.

As a solution I decided to use aufs, a so-called union file system that can stack multiple file systems and perform copy-on-write. So on top of the read-only /var directory from the system installation we stack a read-write file system that absorbs the system's write accesses. As aufs is not part of the mainline Linux kernel, you have to install the aufs-modules-2.6-486 package. Additionally you need the aufs-tools package.

To create the setup sketched above, you need to alter the system installation (inside the chroot):
mkdir /aufs
mkdir -p /aufs/var/{mount,rw}
mv /var /aufs/var/ro
ln -s /aufs/var/mount /var

/aufs/var/mount will be the mount point of the aufs file system, /aufs/var/ro holds the original contents of /var, and /aufs/var/rw will be the mount point of the tmpfs file system. To mount them at boot, change /etc/fstab to:

192.168.1.1:/var/lib/nfs/debian-lenny-32 / nfs ro,auto 0 0
/dev/shm /tmp tmpfs size=128M,auto 0 0
/dev/shm /aufs/var/rw tmpfs size=128M,auto 0 0
none /aufs/var/mount aufs dirs=/aufs/var/rw=rw:/aufs/var/ro=ro,auto 0 0

The downside of this solution is that the symbolic link must point to /aufs/var/mount for the boot environment and to /aufs/var/ro for local maintenance. This is a bit cumbersome, but I don't currently know of a better solution.

When you reboot the client now, things are much better. But there is still an error about a read-only file system. Unfortunately, /var is not the only place the system wants to write to: udev generates rule files dynamically and wants to write them to /etc/udev/rules.d. To make this directory writable, you can apply the aufs trick again:

mkdir -p /aufs/etc/udev/rules.d/{rw,mount}
mv /etc/udev/rules.d /aufs/etc/udev/rules.d/ro
ln -s /aufs/etc/udev/rules.d/mount /etc/udev/rules.d
echo "/dev/shm /aufs/etc/udev/rules.d/rw tmpfs size=1M,auto 0 0" >> /etc/fstab
echo "none /aufs/etc/udev/rules.d/mount aufs dirs=/aufs/etc/udev/rules.d/rw=rw:/aufs/etc/udev/rules.d/ro=ro,auto 0 0" >> /etc/fstab
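Since the same pattern was needed for /var and for /etc/udev/rules.d, it can be wrapped in one function. This is a sketch, not the exact commands I ran. make_aufs_writable is a hypothetical helper. The boot-time link points at the aufs mount point, which is what the chroot script at the end of this post expects. ROOT is the client installation as seen from the server. Set ROOT="" when running it inside the chroot.

```shell
#!/bin/bash
# Turn a directory of the read-only installation into an aufs union
# of the original (ro) branch and a tmpfs (rw) branch.
ROOT="${ROOT-/var/lib/nfs/debian-lenny-32}"

make_aufs_writable() {
    local dir="${1#/}" size="$2"   # e.g. "etc/udev/rules.d" and "1M"
    local base="/aufs/$dir"
    mkdir -p "$ROOT$base/rw" "$ROOT$base/mount" || return 1
    # the original directory becomes the read-only branch
    mv "$ROOT/$dir" "$ROOT$base/ro" || return 1
    # boot environment: the old path points at the aufs mount point
    ln -s "$base/mount" "$ROOT/$dir"
    # tmpfs for the rw branch first, then the union on top of it
    {
        echo "/dev/shm $base/rw tmpfs size=$size,auto 0 0"
        echo "none $base/mount aufs dirs=$base/rw=rw:$base/ro=ro,auto 0 0"
    } >> "$ROOT/etc/fstab"
}
```

With this helper, the udev commands above correspond to `make_aufs_writable /etc/udev/rules.d 1M`, and the /var setup to `make_aufs_writable /var 128M` (after creating /aufs).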

That's it. I hope I did not forget anything. If this does not work for you, feel free to leave a comment or send me an email. I want to thank Ulrich Dangel, who helped me out when I got stuck. And I want to mention this article on setting up diskless booting of a VIA EDEN computer from an Ubuntu server, which was a very good starting point for me.

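To make maintaining the Debian installation easy, I wrote the following chroot script. It switches the symbolic links to the read-only branches while you work in the chroot and restores them afterwards: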

#!/bin/bash
# Enter a network boot installation for local maintenance.

usage() {
        echo "$0 <chroot name>" 1>&2
        exit 1
}

if [ -z "$1" ]; then
        usage
fi

chroot_path="/var/lib/nfs/$1"
if [ ! -d "$chroot_path" ]; then
        echo "chroot directory not found: $chroot_path" 1>&2
        exit 1
fi

# links that point to the aufs mount points in the boot environment
links="var etc/udev/rules.d"

# check all links first, so a failure leaves the system untouched
for link in $links; do
        if [ ! -L "$chroot_path/$link" ]; then
                echo "$chroot_path is not a valid wot installation. Symbolic link /$link is missing" 1>&2
                exit 1
        fi
done

# shut down nfs server so clients do not see the maintenance links
/etc/init.d/nfs-kernel-server stop > /dev/null

# mount extra file systems
mounts="dev proc sys"
for i in $mounts; do
        mount --bind "/$i" "$chroot_path/$i"
done

# prepare links for local maintenance (read-only branches)
for link in $links; do
        rm "$chroot_path/$link"
        ln -s "/aufs/$link/ro" "$chroot_path/$link"
done

# enter the chroot
/usr/sbin/chroot "$chroot_path" /bin/bash

# reset links for remote mount
for link in $links; do
        rm "$chroot_path/$link"
        ln -s "/aufs/$link/mount" "$chroot_path/$link"
done

# umount extra file systems
for i in $mounts; do
        umount "$chroot_path/$i"
done

# restart nfs server
/etc/init.d/nfs-kernel-server start > /dev/null
