Epic kvmtool adventures

These are quick notes from the kvmtool hacking session with @richard, @dxld, @goliath, @littlelion and @Lambda.

Preliminary discussion

Kvmtool currently has the problem that off-the-shelf OS images will not boot because it doesn’t really have BIOS/firmware support (at least on x86) yet.

We want to investigate what needs to be done to make that happen, or to make kvmtool usable with common Linux distros some other way.

Get it booting with firmware

Richard found Gerd’s patch series[1] for seabios, which seems to enable using seabios as a firmware image with kvmtool.

We found that we need to apply patches [1/3] and [2/3] to seabios master to get things working. To apply them we used patch -p1 < mbox...[2]. We didn’t look at [3/3] for now.

[2]: git-am is too picky and won’t allow the necessary fuzz

Kconfig needs some tweaking, though. We first need to run make defconfig and then set CONFIG_KVMTOOL=y and CONFIG_ROM_SIZE=128 in make menuconfig. The defconfig seems to default to ROM_SIZE=0 even though patch [1/3] adds a "default 128 if KVMTOOL". We’re unsure if/how that’s supposed to work.

With this we can get seabios and grub/syslinux booting. However, the kernel/initrd is unable to find any virtio block devices.

This is because kvmtool uses nasty kernel cmdline hacks to avoid having to provide various firmware tables; see kvm__arch_set_cmdline() in kvmtool/x86/kvm.c. We found that passing --sdl on the kvmtool command line gets PCI working in the guest.

Note that to enable --sdl support in kvmtool, libsdl1.2-dev (Debian) needs to be installed before running make.

While this might seem bizarre, we can see that the kernel’s pci_sanity_check() looks for either a HOST_BRIDGE or a DISPLAY_VGA class device. kvmtool does not seem to provide a host bridge, but with --sdl we at least get a DISPLAY_VGA device. This subsequently allows pci_check_type1() to succeed.

Note: When booting a kernel directly, kvmtool would usually pass pci=conf1 on the kernel cmdline (one of the hacks mentioned above) to skip this check entirely.

So together with --sdl we can successfully boot both Debian and OpenSUSE images.

Full working command line:

$ ./lkvm --firmware ../seabios/out/bios.bin --disk my-distro-disk.img --sdl

Get SMP working

When we set --cpus to something greater than one, we don’t see any extra CPUs in the guest. After much investigation we found that when a kernel is booted directly with plain kvmtool (--kernel), kvmtool passes in an MP table which contains the number of CPUs.

Seabios, however, overwrites the memory where kvmtool put the MP table in malloc_init(), specifically in the F-segment (0xF0000-0xFFFFF) of BIOS memory.


Seabios: ef88eeaf052c8a7d28c5f85e790c5e45bcffa45e

Config+diff: https://infraroot.at/seabios.tar.xz

Kvmtool: 90b2d3adadf218dfc6bdfdfcefe269843360223c

Edit: The diff in the tarball makes the image boot without --sdl.

Hacking session on [2021-01-26 Tue], with @richard, @dxld and @goliath. Attending: @Lambda, @littlelion and @konfusius

Using DMI/SMBIOS table to get past pci sanity check

Last time we saw that Linux is unable to detect kvmtool’s PCI configuration interface unless we add the --sdl option. Another way to get pci_sanity_check() to pass is to have dmi_get_bios_year() return a year after 2000.

While this didn’t make it into the notes, goliath found out last time how to get seabios to emit a DMI/SMBIOS table to get past this check.

First we need to enable "BIOS Tables" and USE_SMM for KVMTOOL in Kconfig, then add a call to smbios_setup() in the kvmtool_platform_setup() function. See commit “kvmtool: Enable generating BIOS Tables”.

As I’m writing this, I do wonder if SMM is truly necessary just to get the DMI table. Looking at the seabios code, USE_SMM seems to only control whether SMM interrupt handling is enabled. As far as I can see, that shouldn’t be necessary for the kernel to find the DMI table.

On the kernel side dmi_scan_machine() just looks at some BIOS memory to find the DMI table, no SMM involved.

Committing the rebased patches into git

After spending too much time figuring out what state each of our git checkouts was in, we started committing the rebased patches into git proper and pushing them to https://git.sr.ht/~dxld/seabios-kvmtool/ for now.

Since too much has changed on seabios master since the patches were sent, they don’t apply cleanly, as we saw last time. To commit them anyway, we first download the patches (and review comments) as an mbox file from the mailing list archive:

$ wget https://lore.kernel.org/kvm/20171102155031.17454-1-kraxel@redhat.com/t.mbox.gz

then we pipe that into git-am(1). It will try to apply each patch in turn and allows us to do stuff when that fails:

$ gunzip -c t.mbox.gz | git am -3
Applying: kvmtool: initial support
[...]
Patch failed at 0001 kvmtool: initial support
hint: Use 'git am --show-current-patch' to see the failed patch
[...]

Next we (ab)use GNU patch to apply this patch while allowing for fuzz:

$ git am --show-current-patch | patch -p1 --merge
patching file Makefile
patching file src/fw/paravirt.h
patching file src/fw/paravirt.c
Hunk #1 merged at 722-769.
patching file src/post.c
patching file src/sercon.c
patching file src/Kconfig

The -p1 is pretty standard, but --merge makes patch behave more like git apply --3way: it puts conflict markers into the conflicting file instead of writing *.rej files. In this case there weren’t any conflicts anyway.

$ git status -s
 M Makefile
 M src/Kconfig
 M src/fw/paravirt.c
 M src/fw/paravirt.h
 M src/post.c
 M src/sercon.c

This leaves the changes unstaged in our git checkout. Once we git-add them, git-am can take over again and fill in the commit message etc. from the patch.

$ git add -u
$ git am --continue
Applying: kvmtool: initial support
Patch is empty.
[...]

Now we’ve committed the first patch, but git-am complains that the next one is empty. Since we imported the entire ML thread’s mbox, this is most likely just a review comment, so we can git am --skip it:

$ git am --skip
Applying: kvmtool: allow mmio for legacy bar 0
Applying: kvmtool: support larger virtio queues
[...]
Falling back to patching base and 3-way merge...
Auto-merging src/hw/virtio-ring.h
CONFLICT (content): Merge conflict in src/hw/virtio-ring.h
error: Failed to merge in the changes.
Patch failed at 0004 kvmtool: support larger virtio queues
[...]

Alright, so git-am managed to apply the second patch, but the third one triggers a conflict around MAX_QUEUE_NUM:

<<<<<<< HEAD
#define MAX_QUEUE_NUM      (256)
||||||| merged common ancestors
#define MAX_QUEUE_NUM      (128)
=======
#define MAX_QUEUE_NUM      (260)
>>>>>>> kvmtool: support larger virtio queues

Simple enough: master changed it to 256 while the patch wants 260. After resolving the conflict and git-add(ing) the file, we continue and skip the rest of the comment emails:

$ git add -u
$ git am --continue
Applying: kvmtool: support larger virtio queues
[...]
$ git am --skip
$ git am --skip
$ git am --skip
$ git am --skip

NB: Yes, one could argue that downloading the patch emails individually would have been more straightforward than using the full mbox, but I didn’t want to paste three long URLs in here :stuck_out_tongue:

Quick notes

We compared the kernel dmesg output of a boot with and without seabios firmware (i.e. directly booting the kernel with --kernel). The APIC is still missing; we only get the legacy PIC. We need the APIC for SMP, but a quick look at the Linux sources suggests that APIC detection will likely come along with the MP table too.

We decided to refactor seabios’s malloc code so that it no longer overwrites kvmtool’s MP table. The overwrite currently happens in malloc_init() in src/malloc.c:

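/* Zero the whole F-segment zone managed by seabios' malloc; this is
   also where kvmtool put its MP table. */
memset(VSYMBOL(zonefseg_start), 0
       , SYMBOL(zonefseg_end) - SYMBOL(zonefseg_start));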

Next session: <2021-01-31 Sun>
