Colin.Guthr.ie Illegitimi non carborundum

17 Dec 2007

Network booting + NFS root with Mandriva

I've been running a network-booted Mandriva on my VIA EPIA machine for a number of years, but recently the question came up on the Cooker mailing list, so I thought I'd pull my finger out and actually document it a little!

The concept is fairly simple for me. I have an EPIA machine that lives on my network and plays/displays my media at me: music, movies, photos etc. As it lives in my sitting room, the quieter the better. A hard disk is just one more device that produces unnecessary heat and noise and sucks power when it isn't strictly needed.

So my server also runs Mandriva and acts as a DHCP, DNS and NFS server. Now an NFS root filesystem works something like this:

  1. The client machine bootstraps itself and asks the network for an IP address.
  2. The server responds and also tells it which server to ask for boot information (the same server in my case) and which file to execute on that server (PXE - Preboot eXecution Environment).
  3. The client grabs this file via TFTP and "runs" it. I'm using pxelinux, so it also loads an additional config file that tells the client which kernel and ramdisk to use.
  4. The client then grabs the kernel and ramdisk from the TFTP server and kickstarts itself.
  5. As part of the bootup process, rather than mounting a local hard drive as the / filesystem, it mounts an NFS share instead.

So I'll break this write-up into a few sections: installing the root filesystem on the server, configuring DHCP on the server, configuring TFTP on the server, getting a suitable kernel and ramdisk, configuring PXE on the server, configuring NFS sharing on the server and, finally, patching the root filesystem for read-only operation (in theory we can use the same filesystem for multiple machines, and we want it to remain stateless).

Installing the root filesystem

This is relatively simple and assumes that you have the relevant URPMI sources configured on your server. It wouldn't take much to use different sources should you desire; it's just a matter of fiddling with the command line options a little. Package managers other than URPMI (e.g. YUM) can also be used with suitable tweakage. For this process I presume the path on your server's disk for the root filesystem is /path/to/nfs/root.

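# Create the skeleton of the new root filesystem
mkdir -p /path/to/nfs/root{,/dev,/etc,/var/lib/rpm}
# /dev/null (char device 1,3), which package scripts tend to redirect to
mknod /path/to/nfs/root/dev/null c 1 3
chmod 666 /path/to/nfs/root/dev/null
# Start with an empty fstab
touch /path/to/nfs/root/etc/fstab
# Create an empty RPM database inside the new root
rpm --root /path/to/nfs/root --initdb
# Install a minimal system, a kernel and the NFS client tools into it
urpmi --root /path/to/nfs/root /path/to/basesystem-chroot.noarch.rpm \
  basesystem vim-minimal sysklogd mandriva-release-Free \
  kernel-desktop-latest nfs-utils-clients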
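If you expect to rebuild the root now and then (say, for a new release), it may help to wrap the commands above in a function. This is a minimal sketch and not part of my actual setup. `build_nfsroot` and `run` are hypothetical names. Setting `DRY_RUN=1` only prints each command, so you can review the commands before running them for real as root.

```shell
#!/bin/bash
# Sketch only: build_nfsroot and run are hypothetical helper names.
# With DRY_RUN=1 the commands are printed instead of executed.

run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

build_nfsroot() {
  local root="$1"; shift   # e.g. /path/to/nfs/root
  # Remaining arguments are handed to urpmi as the packages to install.
  run mkdir -p "$root" "$root/dev" "$root/etc" "$root/var/lib/rpm"
  run mknod "$root/dev/null" c 1 3
  run chmod 666 "$root/dev/null"
  run touch "$root/etc/fstab"
  run rpm --root "$root" --initdb
  run urpmi --root "$root" "$@"
}
```

For example, `DRY_RUN=1 build_nfsroot /path/to/nfs/root basesystem vim-minimal` shows what would happen. Drop `DRY_RUN` to actually do it.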

The only RPM above that you won't find in the official media yet is the basesystem-chroot package. I'll add it to Mandriva's official contrib media before 2008.1, but you can grab it for 2008.0 here. To be honest it's not really needed, but it allows you to skip packages that are not needed for network booting (e.g. a bootloader!).

The above command should install a very minimal system. Other packages can be installed to your liking (e.g. task-gnome or task-kde etc.). This should be enough to get things going tho'.

Configuring DHCP

Firstly you need the dhcp-server package. There are plenty of guides on setting up dhcpd, so I won't go over old ground, but suffice it to say there are a few tweaks that are important for NFS root.

Something to note is that I opted for fixed IP addresses for my clients. This is because each client may be subtly different in terms of their setup/hardware and thus I want a way to differentiate them. So here is an extract of my config:

        group {
              filename "/X86PC/linux/linux.0";
              next-server 192.168.0.1;

              host epia { 
                   hardware ethernet aa:bb:cc:dd:ee:ff; 
                   fixed-address 192.168.0.2; 

                   ## The next couple of lines tell the remote machine
                   ## where to boot from.
                   option root-path = "-o ro,rsize=32768,wsize=32768,exec,nfsvers=3,hard,proto=tcp 192.168.0.1:/path/to/nfs/root";
                   if substring ( option vendor-class-identifier, 0, 5 ) = "udhcp" {
                        # Forcibly add root-path (option 17, written here as the hex octet 11) to the list
                       option dhcp-parameter-request-list = concat ( option dhcp-parameter-request-list, 11 );
                   }
              }
        }

Fill in the IP and MAC addresses as appropriate and slap the above into your subnet definition in /etc/dhcpd.conf. Remember to update the root filesystem path to match your setup too!
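With more than one client, each gets its own host block. Here is a sketch of a generator. `dhcp_host_stanza` is a hypothetical helper, and the root-path options are simply the ones from my example, so adjust them to taste. Paste its output inside the same `group { }` so the new host inherits `filename` and `next-server`.

```shell
#!/bin/bash
# Sketch only: dhcp_host_stanza is a hypothetical helper.
# Arguments: host name, MAC address, fixed IP, NFS export (server:/path).

dhcp_host_stanza() {
  local name="$1" mac="$2" ip="$3" nfs="$4"
  # Mount options copied from the example config above.
  local opts="-o ro,rsize=32768,wsize=32768,exec,nfsvers=3,hard,proto=tcp"
  cat <<EOF
host $name {
    hardware ethernet $mac;
    fixed-address $ip;
    option root-path = "$opts $nfs";
    if substring ( option vendor-class-identifier, 0, 5 ) = "udhcp" {
        option dhcp-parameter-request-list = concat ( option dhcp-parameter-request-list, 11 );
    }
}
EOF
}
```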

Configuring TFTP

The client needs to use Trivial FTP to grab some of its files to bootstrap itself, so we have to make sure the server is configured for this. Firstly, install tftp-server. It's disabled by default, so run chkconfig --add tftp. It's actually started by xinetd, so its configuration file is /etc/xinetd.d/tftp. As you'll see from looking at this file, the default path it serves its files from is /var/lib/tftpboot. It's easiest to just accept this path, as we'll see later on.
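If you want to check those two details (the served directory and whether the service is still disabled) without eyeballing the file, here is a small sketch. `tftp_root` and `tftp_disabled` are hypothetical helpers. They assume the usual xinetd layout with spaces around the `=` sign, e.g. `server_args = -s /var/lib/tftpboot`.

```shell
#!/bin/bash
# Sketch only: tftp_root and tftp_disabled are hypothetical helpers.
# Pass the file to inspect, normally /etc/xinetd.d/tftp.

tftp_root() {
  # Print the word following "-s" on the server_args line.
  awk '$1 == "server_args" {
         for (i = 1; i < NF; i++) if ($i == "-s") { print $(i + 1); exit }
       }' "$1"
}

tftp_disabled() {
  # Succeeds (exit 0) if the file still contains "disable = yes".
  grep -Eq '^[[:space:]]*disable[[:space:]]*=[[:space:]]*yes' "$1"
}
```

For example, `tftp_root /etc/xinetd.d/tftp` should print `/var/lib/tftpboot` on a default install.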

Kernel and ramdisk

OK, so now we need a kernel and ramdisk. When creating the ramdisk, the easiest situation is when the server is running the same kernel that you want for your client. It is possible to do clever things with chroot in order to use the NFS root itself, but this is slightly more complex. Either way, it is essential that you use a kernel that is installed in your NFS root so that the modules match.
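A quick way to catch a kernel/module mismatch is to check that the kernel's version has a modules directory inside the NFS root. Such a mismatch is the likely cause of the undefined symbol errors discussed in the comments below. `kernel_matches_root` is a hypothetical helper. It assumes the usual `vmlinuz-<version>` file naming, so pass it the real file rather than the `vmlinuz` symlink.

```shell
#!/bin/bash
# Sketch only: kernel_matches_root is a hypothetical helper.
# Usage: kernel_matches_root /path/to/nfs/root /boot/vmlinuz-<version>

kernel_matches_root() {
  local root="$1" kernel="$2"
  local ver
  # Strip the directory and the "vmlinuz-" prefix to get the version.
  ver=$(basename "$kernel")
  ver=${ver#vmlinuz-}
  if [ -d "$root/lib/modules/$ver" ]; then
    echo "OK: modules for $ver found in $root"
  else
    echo "MISMATCH: no $root/lib/modules/$ver" >&2
    return 1
  fi
}
```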

So install mkinitrd-net and then run it as follows:

mkinitrd-net via-rhine pcnet32

Here I've specified two network drivers to include in the ramdisk. My EPIA wants via-rhine, and I've used pcnet32 for VMware, which I use for testing this crazy setup! Depending on your hardware you will have to select different modules. Assuming you are running this on the server, you will find it has produced some files in /var/lib/tftpboot, one of them called initrd-via-rhine.pcnet32.img (technically this is just a symlink). This is your ramdisk! Remember its name.

Now you need the kernel. That's easy: just copy it from /boot or /path/to/nfs/root/boot. Remember to copy the real file and not its symlink. The file you want starts with vmlinuz. Place it next to the ramdisk from above. Let's assume it's called vmlinuz-2.6.22.12-desktop-1mdv.
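Here is one way to do the copy, as a sketch in which `copy_kernel` is a hypothetical helper. `readlink -f` follows the symlink chain to the real file, so the versioned name is kept. Plain `cp -L` would also copy the real contents, but under the symlink's own name (`vmlinuz`). The destination is whatever directory your ramdisk ended up in.

```shell
#!/bin/bash
# Sketch only: copy_kernel is a hypothetical helper.
# Usage: copy_kernel /path/to/nfs/root/boot/vmlinuz <dir holding the ramdisk>

copy_kernel() {
  local src="$1" dest="$2"
  local real
  # Resolve the symlink(s) to the real, versioned kernel file.
  real=$(readlink -f "$src") || return 1
  cp "$real" "$dest/" || return 1
  # Print the name to use on the KERNEL line of the pxelinux config.
  basename "$real"
}
```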

Configuring PXE

Simply install pxelinux. This will create a few files, including /var/lib/tftpboot/X86PC/linux/linux.0, which, if you've been paying attention, we already referenced in our dhcpd.conf above; the filename there is relative to the TFTP root. (See, things are starting to fall into place!!)

To customise what your clients boot, you need to edit the pxelinux config file. For simplicity you can just edit /var/lib/tftpboot/X86PC/linux/pxelinux.cfg/default, but you can have a finer-grained (e.g. per-client IP) config if you like. Just read up on PXELINUX for that.
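For the finer-grained option, PXELINUX tries a series of file names in pxelinux.cfg/ before falling back to `default`. It first tries the client's UUID. Next it tries `01-` followed by the MAC address (lower-case hex, separated by dashes). Finally it tries the IP address in upper-case hex, dropping one trailing digit at a time. The sketch below prints that search order, so you know what to call a per-client file. `pxe_cfg_names` is a hypothetical helper, and it skips the UUID step. For the EPIA at 192.168.0.2, the IP-based name is `C0A80002`.

```shell
#!/bin/bash
# Sketch only: pxe_cfg_names is a hypothetical helper.
# Prints the pxelinux.cfg/ names tried for a client, most specific first
# (a UUID-named file is tried before all of these; omitted here).

pxe_cfg_names() {
  local mac="$1" ip="$2"
  local hex a b c d
  # MAC: lower case, colons to dashes; "01" is the ARP type for Ethernet.
  echo "01-$(printf '%s' "$mac" | tr 'A-F:' 'a-f-')"
  # IP: four octets as upper-case hex, then shortened one digit at a time.
  IFS=. read -r a b c d <<<"$ip"
  hex=$(printf '%02X%02X%02X%02X' "$a" "$b" "$c" "$d")
  while [ -n "$hex" ]; do
    echo "$hex"
    hex=${hex%?}
  done
  echo default
}
```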

Here is my config:

DEFAULT default
LABEL default
    KERNEL vmlinuz-2.6.22.12-desktop-1mdv
    APPEND ro initrd=initrd-via-rhine.pcnet32.img splash=silent vga=788 root=/dev/nfs

Again, feel free to fiddle if this isn't suitable for your needs. There are lots of options available so you should find a way to do what you want.
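A typo in a file name only shows up as a failed boot on the client, so it can be worth checking the config on the server first. Here is a rough sketch. `check_pxe_cfg` is a hypothetical helper that pulls the KERNEL and initrd= names out of a pxelinux config and checks that they exist in a given directory. It assumes PXELINUX resolves those relative names against the directory containing linux.0 (/var/lib/tftpboot/X86PC/linux here). Adjust the directory if your files live elsewhere.

```shell
#!/bin/bash
# Sketch only: check_pxe_cfg is a hypothetical helper.
# Usage: check_pxe_cfg <pxelinux config file> <directory the names are relative to>

check_pxe_cfg() {
  local cfg="$1" dir="$2" f missing=0
  # pxelinux keywords are case-insensitive, hence toupper().
  for f in $(awk 'toupper($1) == "KERNEL" { print $2 }
                  toupper($1) == "APPEND" {
                    for (i = 2; i <= NF; i++)
                      if ($i ~ /^initrd=/) { sub(/^initrd=/, "", $i); print $i }
                  }' "$cfg"); do
    if [ ! -e "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```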

NFS exports

You need to configure your server to export the root filesystem over NFS. You can do this via the various drak tools or by editing /etc/exports. Here is my entry:

/path/to/nfs/root	*(ro,no_root_squash,no_subtree_check,sync)

Note the read-only option. It's important that this filesystem cannot be altered by a client, just in case one somehow manages to mount it read-write.
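Here is a sketch for double-checking the entry. `export_is_ro` is a hypothetical helper that finds the line for a path in an exports file and fails if any client on it is given rw. Note that ro is also the default in exports(5), but spelling it out, as above, makes the intent obvious. After editing /etc/exports, `exportfs -ra` re-exports everything, and `exportfs -v` shows what is actually being served.

```shell
#!/bin/bash
# Sketch only: export_is_ro is a hypothetical helper.
# Usage: export_is_ro /etc/exports /path/to/nfs/root

export_is_ro() {
  local exports="$1" path="$2" line
  line=$(awk -v p="$path" '$1 == p' "$exports")
  if [ -z "$line" ]; then
    echo "no export for $path" >&2
    return 1
  fi
  # Split every client's option list into one option per line and
  # fail if any of them is rw.
  if printf '%s\n' "$line" | grep -o '([^)]*)' | tr -d '()' | tr ',' '\n' | grep -qx rw; then
    echo "$path is exported read-write" >&2
    return 1
  fi
}
```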

Tweaking the NFS root

Most things are now in place and we are almost ready for the off!!! There are just a couple of small tweaks to make things work smoothly.

Firstly edit /path/to/nfs/root/etc/sysconfig/init and set PROMPT=no. This should not be needed but I had some weird issues with this. Next edit /path/to/nfs/root/etc/sysconfig/installkernel and set NOENTRY="yes" (probably not needed as we do not even have a bootloader!). Finally edit /path/to/nfs/root/etc/sysconfig/readonly-root and set READONLY=yes and TEMPORARY_STATE=no. Despite the optimistically named file, it does not work all that well, but we'll sort that out!
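If you'd rather script those edits (handy when rebuilding the root), a small helper does the job. `set_sysconfig` is hypothetical. It replaces an existing `KEY=` line or appends one, and writes the value verbatim. Values containing `|` or `&` would need escaping before they reach sed.

```shell
#!/bin/bash
# Sketch only: set_sysconfig is a hypothetical helper.
# Example, with ROOT=/path/to/nfs/root:
#   set_sysconfig "$ROOT/etc/sysconfig/init" PROMPT no
#   set_sysconfig "$ROOT/etc/sysconfig/installkernel" NOENTRY '"yes"'
#   set_sysconfig "$ROOT/etc/sysconfig/readonly-root" READONLY yes
#   set_sysconfig "$ROOT/etc/sysconfig/readonly-root" TEMPORARY_STATE no

set_sysconfig() {
  local file="$1" key="$2" value="$3"
  if grep -q "^${key}=" "$file" 2>/dev/null; then
    # Replace the existing assignment in place (GNU sed -i).
    sed -i "s|^${key}=.*|${key}=${value}|" "$file"
  else
    # No assignment yet: append one.
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}
```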

Now you need to make a small alteration to /path/to/nfs/root/etc/rc.sysinit. For convenience here is a patch:

--- /path/to/nfs/root/etc/rc.sysinit.orig	2007-10-03 22:49:52.000000000 +0100
+++ /path/to/nfs/root/etc/rc.sysinit	2007-12-16 19:06:45.000000000 +0000
@@ -33,6 +33,10 @@
 	mount -n -t sysfs /sys /sys >/dev/null 2>&1
 fi
 
+if [ -f /etc/rc.earlylocal ]; then
+    . /etc/rc.earlylocal
+fi
+
 . /etc/init.d/functions
 
 # This must be done before anything else because now most messages
@@ -579,7 +583,8 @@
 	READONLY=no
 fi
 
-if [ "$READONLY" = "yes" -o "$TEMPORARY_STATE" = "yes" ]; then
+if [ "$TEMPORARY_STATE" = "yes" ]; then
+#if [ "$READONLY" = "yes" -o "$TEMPORARY_STATE" = "yes" ]; then
 
 	mount_empty() {
 		if [ -e "$1" ]; then

As you can see, it fixes an if statement in the Fedora initscripts that assumes that if you have a READONLY filesystem, you also want a temporary state. This is a fair assumption to make, but the default TEMPORARY_STATE system is not all that great, so we'll implement our own version via the new file, rc.earlylocal (this keeps the changes to rc.sysinit to a minimum).

So you probably want to know what's in rc.earlylocal right? Well here it is:

#!/bin/bash

# This file is sourced early on in rc.sysinit and this means
# we can do some very early setup for the read only / filesystem.

# Firstly, we need some writable directories.
# We do this using unionfs to combine the readonly versions with
# a tmpfs location to provide writable space.

# Mount somewhere we can use in RAM. / is still read only at this point,
# so the mount point must already exist in the NFS root
# (i.e. create /path/to/nfs/root/mnt/tmpfs on the server).
TMPFS=/mnt/tmpfs
mount -t tmpfs none "$TMPFS" -o size=32m

for fs in etc var tmp root; do
  echo "Mounting unionfs on /$fs to make it writable (any changes will not be saved)"
  mkdir -p "$TMPFS/unionfs/$fs"
  # The tmpfs branch (rw) is listed first so writes land in RAM;
  # the NFS copy (ro) supplies the original contents.
  mount -t unionfs -o "dirs=$TMPFS/unionfs/$fs=rw:/$fs=ro" none "/$fs"
done

# Mount the correct config directory for our hostname.
# /var/config must exist to act as the mount point
# (simplest is to create it in the NFS root on the server).
HN=$(hostname -s)
if [ -d "/var/configs/$HN" ]; then
    mount "/var/configs/$HN" /var/config -o bind
else
    echo "Warning: Config directory /var/configs/$HN does not exist"
fi

The first part of this file uses tmpfs and unionfs to make /var, /etc, /root and /tmp "writable". All changes will be lost at powerdown, but this is acceptable to us.
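Once a client has booted, it's easy to confirm that the unions actually took effect. Here is a sketch. `check_unions` is a hypothetical helper that looks up each directory in /proc/mounts (or in a file you pass in) and expects the filesystem type to be unionfs.

```shell
#!/bin/bash
# Sketch only: check_unions is a hypothetical helper.
# Run on a booted client; reads /proc/mounts unless another file is given.

check_unions() {
  local mounts="${1:-/proc/mounts}" fs bad=0
  for fs in etc var tmp root; do
    # /proc/mounts fields: device, mount point, fs type, options, ...
    if awk -v m="/$fs" '$2 == m && $3 == "unionfs" { found = 1 } END { exit !found }' "$mounts"; then
      echo "/$fs: unionfs"
    else
      echo "/$fs: NOT unionfs (still read only?)"
      bad=1
    fi
  done
  return "$bad"
}
```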

The second part of this script simply mounts a host-specific config directory at /var/config. I use this to store each host's own copy of files such as xorg.conf, and just symlink the real file to this space. This allows me to vary key configuration files between hosts. If you only have one host, this is simply not needed. An arguably more elegant solution is to use something like cfengine.
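The symlink side of that trick can be scripted on the server too. Here is a sketch. `link_host_config` is a hypothetical helper that moves a file (say etc/X11/xorg.conf) from the shared root into var/configs/<host>/. It then replaces the file with a symlink to /var/config/<name>, which on the client resolves to whatever rc.earlylocal bind mounted there. Each further host needs its own copy in its own var/configs/<host>/ directory. All of a host's files share one flat directory, so two files with the same base name would clash.

```shell
#!/bin/bash
# Sketch only: link_host_config is a hypothetical helper, run on the server.
# Usage: link_host_config /path/to/nfs/root epia etc/X11/xorg.conf

link_host_config() {
  local root="$1" host="$2" relpath="$3"
  local name
  name=$(basename "$relpath")
  # Per-host store, plus the mount point rc.earlylocal binds onto.
  mkdir -p "$root/var/configs/$host" "$root/var/config"
  if [ -e "$root/$relpath" ] && [ ! -L "$root/$relpath" ]; then
    mv "$root/$relpath" "$root/var/configs/$host/$name"
  fi
  # Absolute target: it is resolved on the client, inside the NFS root.
  ln -sfn "/var/config/$name" "$root/$relpath"
}
```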

Suck it and see

OK, that's all folks. We've installed and configured pretty much everything we need. It's entirely possible that even if you follow the above to the letter, it won't work for you! This is because I've lied consistently throughout with regard to file paths and such. My setup is actually a little different, and I may have made a mistake in genericificating it. If so, and you are able to correct me, please drop me a mail and let me know.

I hope this helps someone out there. If you have any questions, please feel free to drop me a line.

  • AL13N

    it seems there’s no mkinitrd-net anymore in 2010.1

    i’ve tried lotsa things to make a good initrd (even dracut), all through a chrooted install, but dracut seems to not include any modules at all. and mkinitrd gives errors with missing symbols??? when insmodding the modules?

    • Colin

      If you are getting undefined symbols, then it smells like you are creating an initrd for a different kernel to the one being run. e.g. your actual boot kernel is 2.6.30 and your modules are from 2.6.31 or similar. Even just a variation in the kernel flavour may be enough to get these symbol errors. e.g. in Mandriva there are server kernels and desktop kernels and a few other variations too. Make sure you use the right kernel when running mkinitrd (or mkinitrd-net) to generate the initrd.

      • AL13N

        yes, that’s what i thought at first, but it’s odd, since it’s a chrooted install, with only one kernel… and i’m building it inside the chroot… so that would either mean the modules installed are built wrong for that kernel…

        I also doublechecked if i copied the kernel and initrd over to the PXE config file but all looked ok…

        the thing is, there is no mkinitrd-net anymore and mkinitrd isn’t even maintained… people are telling me i need to use dracut, but it’s still in development… 🙁

        tbh, i prolly will just make my own initrd manually… it’ll be easier and at least i’ll know if it doesn’t work it’s my own damn fault.

        • Colin

          Did you really copy the right initrd from /var/lib/tftpboot/? (this is where mkinitrd-net puts it by default).

          Not sure what you mean by “there is no mkinitrd-net anymore” as the package still exists here AFAICT. I used it not long ago after updating my network boot systems to 2010.1.

          You do have to make sure that the /dev and /proc on the NFS mount are suitably clean tho’, otherwise udev won’t start and other issues like sysfs not being mounted also get in the way.

          Also mkinitrd is still the recommended and maintained initrd creator in Mandriva. Dracut may be what we use next time around, but for now mkinitrd is still preferred.

  • AL13N

    i’ve had several issues getting it to work for myself:

    * mkinitrd-net doesn’t exist for x86_64
    * dracut seems to fail on several points to make a working initrd in relation to NFS, but after several modifications it sort of works
    * i added a tmpfs mountpoint in fstab for /tmp
    * even though i wanted to have it read-write (so i left readonly alone), the NFS mount point didn’t want to remount rw … i’ve forced it to boot by adding rw to the kernel command line.
    * there are still some kind of i18n warnings when booting, saying File Not Found, even though the settings appear normal-ish…

  • oboingo

    I’ve almost successfully done this with mandriva 2010.1. “Almost” being the operative word there.

    The client systems boot up completely to runlevel 3 (login:), but it refuses to let me log in! Frustrating to be so close and yet so far!

    Clearly I’m missing something rather obvious. You’ve probably moved on to Mageia but I can really use some pointers.

    Thanks!

    • oboingo

      Eh! Never mind. It didn’t work because I was trying to be too clever. Bought a Seagate Dockstar (armel arch) on sale. I thought in theory that could work well as a nfsroot server (hacked with openwrt) to x86 clients, and it does, except for the login issue. (PAM inheritance??)

      Long story short a Mandriva box works just fine as nfsroot server to other x86 clients.

  • Pingback: Network Boot Mageia: PXE + NFS Root goodness « Colin.Guthr.ie