Epic kvmtool adventures

This is a quick write-up of the kvmtool hacking session with @richard, @dxld,
@goliath, @littlelion and @Lambda.

Preliminary discussion

Kvmtool currently has the problem that off-the-shelf OS images will not
boot because it doesn’t really have BIOS/firmware support (at least on x86)
yet.

We want to investigate what needs to be done to make that happen or make
kvmtool usable with common linux distros some other way.

Get it booting with firmware

Richard found Gerd’s patch series for seabios, which seems to
enable using seabios as a firmware image with kvmtool.

We found that we need to apply patches [1/3] and [2/3] to seabios master to
get things working. To apply them we used patch -p1 < mbox...[2]. We
didn’t look at [3/3] for now.

[2]: git-am is too picky and won’t allow the necessary fuzz

The Kconfig needs some tweaking though to get things working. We need to first do
make defconfig and then set CONFIG_KVMTOOL=y and CONFIG_ROM_SIZE=128
in make menuconfig. The defconfig seems to default to ROM_SIZE=0 even
though patch [1/3] adds a default 128 if KVMTOOL. We’re unsure if/how
that’s supposed to work.

With this we can get seabios and grub/syslinux booting. However the
kernel/initrd will be unable to find any virtio block devices.

This is because kvmtool uses nasty kernel cmdline hacks to avoid having to
write various tables, see kvm__arch_set_cmdline() in
kvmtool/x86/kvm.c. To get PCI working in the guest we found that enabling
--sdl on the kvmtool commandline works.

Note that to enable --sdl support in kvmtool, libsdl1.2-dev (Debian)
needs to be installed before running make.

While this might seem bizarre, we can see how the kernel will try to find
either a HOST_BRIDGE or DISPLAY_VGA device in pci_sanity_check() and
while kvmtool does not seem to provide a HOST_BRIDGE, with --sdl we get
a DISPLAY_VGA device at least. This will subsequently allow
pci_check_type1() to succeed.

Note: Kvmtool would usually pass pci=conf1 on the kernel cmdline as
discussed before to skip this check entirely.

So together with --sdl we can successfully boot both Debian and OpenSUSE
images.

Full working command line:

$ ./lkvm --firmware ../seabios/out/bios.bin --disk my-distro-disk.img --sdl

Get SMP working

When we set --cpus to something greater than one we don’t see any
extra cpus in the guest. After much investigating we found that when a
kernel is booted by plain kvmtool --kernel an MP table is passed in which
contains ncpus info.

Seabios, however, overwrites the memory where kvmtool put the mptable,
specifically the f-segment in the BIOS memory, in malloc_init().
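
For reference, this is roughly how a consumer locates the MP floating pointer structure, and why zeroing the f-segment makes the extra cpus disappear. It is a minimal sketch assuming the layout from the MultiProcessor Specification (a 16-byte structure starting with "_MP_" whose bytes sum to zero). find_mp_floating_pointer() and byte_sum() are hypothetical names, and the buffer stands in for a copy of the f-segment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MPF_SIZE 16 /* the floating pointer structure is one 16-byte paragraph */

/* Sum of all bytes modulo 256; a valid structure sums to zero. */
static uint8_t byte_sum(const uint8_t *p, size_t len)
{
    uint8_t sum = 0;
    while (len--)
        sum += *p++;
    return sum;
}

/* Scan a buffer (standing in for the f-segment) on 16-byte boundaries for a
 * valid "_MP_" structure. Returns its offset, or -1 if there is none. */
static long find_mp_floating_pointer(const uint8_t *seg, size_t len)
{
    assert(seg != NULL);
    for (size_t off = 0; off + MPF_SIZE <= len; off += MPF_SIZE) {
        if (memcmp(seg + off, "_MP_", 4) == 0 &&
            byte_sum(seg + off, MPF_SIZE) == 0)
            return (long)off;
    }
    return -1;
}
```

Once the region has been memset to zero there is no "_MP_" signature left to find, so the kernel has nothing to read the cpu count from.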

Seabios: ef88eeaf052c8a7d28c5f85e790c5e45bcffa45e

Config+diff: https://infraroot.at/seabios.tar.xz

Kvmtool: 90b2d3adadf218dfc6bdfdfcefe269843360223c

Edit: The diff in the tarball makes the image boot without --sdl

Hacking session on [2021-01-26 Tue], with @richard, @dxld and
@goliath. Attending: @Lambda, @littlelion and @konfusius

Using DMI/SMBIOS table to get past pci sanity check

Last time we saw that linux will not be able to detect kvmtool’s PCI
configuration interface unless we add the --sdl option. One of the cases
that allows pci_sanity_check() to pass is to have dmi_get_bios_year()
return something after the year 2000.
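
To make the two ways past pci_sanity_check() discussed so far concrete, here is a rough C model of the decision as far as we understand it. It is a simplified sketch, not the kernel code: struct fake_pci_dev and sanity_check_model() are made-up names, the class codes are the standard PCI ones, and the real function additionally accepts Intel and Compaq vendor IDs, which the model leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard PCI class/subclass codes. */
#define PCI_CLASS_DISPLAY_VGA 0x0300
#define PCI_CLASS_BRIDGE_HOST 0x0600

/* Hypothetical stand-in for one function on bus 0. The kernel reads this
 * word through config space accesses instead. */
struct fake_pci_dev {
    uint16_t class_code; /* class/subclass word at config offset 0x0a */
};

/* Simplified model: returns 1 if the config mechanism would be trusted,
 * 0 if the sanity check would fail. The vendor ID check is omitted. */
static int sanity_check_model(int bios_year,
                              const struct fake_pci_dev *devs, size_t ndevs)
{
    assert(devs != NULL || ndevs == 0);

    /* A DMI BIOS date after the year 2000 short-circuits the bus scan. */
    if (bios_year >= 2001)
        return 1;

    /* Otherwise some device must look like a host bridge or a VGA card. */
    for (size_t i = 0; i < ndevs; i++) {
        if (devs[i].class_code == PCI_CLASS_BRIDGE_HOST ||
            devs[i].class_code == PCI_CLASS_DISPLAY_VGA)
            return 1;
    }
    return 0;
}
```

With only a storage-class device on the bus and no BIOS date the model fails, which matches what we saw without --sdl. Adding a VGA device or a BIOS year of 2001 or later makes it pass.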

While this wasn’t in the notes, last time goliath found out how to get
seabios to emit a DMI/SMBIOS table to get past this check.

First we need to enable "BIOS Tables" and USE_SMM for KVMTOOL in
Kconfig, then add a call to smbios_setup() in the
kvmtool_platform_setup() function. See commit “kvmtool: Enable generating
BIOS Tables”.

As I’m writing this I do wonder if SMM is truly necessary just to get the
DMI table. Looking at the seabios code, USE_SMM seems to only control
whether SMM interrupt handling is enabled or not. As far as I can see that
shouldn’t be necessary for the kernel to get the DMI table.

On the kernel side dmi_scan_machine() just looks at some BIOS memory to
find the DMI table, no SMM involved.
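
To illustrate why SMM should not matter here: finding the table is a plain memory scan. The sketch below assumes the 32-bit SMBIOS 2.x entry point layout (anchor "_SM_", length at offset 0x05, table address at offset 0x18, bytes summing to zero) and searches a buffer standing in for the BIOS area on 16-byte boundaries. find_smbios_table() is a made-up name and the real dmi_scan_machine() does more validation than this.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets within the 32-bit SMBIOS (2.x) entry point structure. */
#define SM_OFF_LENGTH     0x05 /* length of the entry point, normally 0x1f */
#define SM_OFF_TABLE_ADDR 0x18 /* physical address of the structure table */

/* Return the table address from the first valid "_SM_" entry point found
 * on a 16-byte boundary, or 0 if there is none. */
static uint32_t find_smbios_table(const uint8_t *seg, size_t len)
{
    assert(seg != NULL);
    for (size_t off = 0; off + 0x20 <= len; off += 16) {
        const uint8_t *ep = seg + off;
        uint8_t eplen, sum = 0;

        if (memcmp(ep, "_SM_", 4) != 0)
            continue;
        eplen = ep[SM_OFF_LENGTH];
        if (eplen < 0x1f || off + eplen > len)
            continue;
        for (size_t i = 0; i < eplen; i++)
            sum += ep[i];
        if (sum != 0)
            continue; /* checksum mismatch, keep looking */

        /* The table address is stored as a little-endian 32-bit value. */
        return (uint32_t)ep[SM_OFF_TABLE_ADDR] |
               (uint32_t)ep[SM_OFF_TABLE_ADDR + 1] << 8 |
               (uint32_t)ep[SM_OFF_TABLE_ADDR + 2] << 16 |
               (uint32_t)ep[SM_OFF_TABLE_ADDR + 3] << 24;
    }
    return 0;
}
```

Nothing in this lookup depends on interrupts or SMM, only on the firmware having written the structure into memory before the kernel starts.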

Committing the rebased patches into git

After spending too much time figuring out what state each of our git
checkouts was in we started committing the rebased patches into git proper
and pushing them to https://git.sr.ht/~dxld/seabios-kvmtool/ for now.

Since too much has changed on seabios master after the patches were sent
they don’t apply cleanly, as we saw last time. To commit them anyway we
first download the patches (and review comments) as an mbox file off the ML
archive:

$ wget https://lore.kernel.org/kvm/20171102155031.17454-1-kraxel@redhat.com/t.mbox.gz

then we pipe that into git-am(1). It will try to apply each patch in turn
and lets us step in and fix things up when one fails:

$ gunzip -c t.mbox.gz | git am -3
Applying: kvmtool: initial support
[...]
Patch failed at 0001 kvmtool: initial support
hint: Use 'git am --show-current-patch' to see the failed patch
[...]

Next we (ab)use GNU patch to apply this patch while allowing for fuzz:

$ git am --show-current-patch | patch -p1 --merge
patching file Makefile
patching file src/fw/paravirt.h
patching file src/fw/paravirt.c
Hunk #1 merged at 722-769.
patching file src/post.c
patching file src/sercon.c
patching file src/Kconfig

The -p1 is pretty standard, but --merge makes patch behave more like
git-apply --3way in that it puts conflict markers into the conflicting file
instead of writing *.rej files. In this case there aren’t any conflicts
anyway.

$ git status -s
 M Makefile
 M src/Kconfig
 M src/fw/paravirt.c
 M src/fw/paravirt.h
 M src/post.c
 M src/sercon.c

This leaves the changes unstaged in our git checkout, so we have to
git-add them, after which git-am can take over again to fill in the commit
message etc. from the patch.

$ git add -u
$ git am --continue
Applying: kvmtool: initial support
Patch is empty.
[...]

Now we’ve committed the first patch, but git-am complains about the next one
being empty. Since we just imported the entire ML thread’s mbox this is
likely just a review comment, so we can git am --skip it:

$ git am --skip
Applying: kvmtool: allow mmio for legacy bar 0
Applying: kvmtool: support larger virtio queues
[...]
Falling back to patching base and 3-way merge...
Auto-merging src/hw/virtio-ring.h
CONFLICT (content): Merge conflict in src/hw/virtio-ring.h
error: Failed to merge in the changes.
Patch failed at 0004 kvmtool: support larger virtio queues
[...]

Alright, so git-am managed to apply the second patch, but the third one
triggers a conflict around MAX_QUEUE_NUM:

<<<<<<< HEAD
#define MAX_QUEUE_NUM      (256)
||||||| merged common ancestors
#define MAX_QUEUE_NUM      (128)
=======
#define MAX_QUEUE_NUM      (260)
>>>>>>> kvmtool: support larger virtio queues

Simple enough, master changed it to 256 while the patch wants 260. After
resolving and git-add(ing) that we continue and skip the rest of the
comment emails:

$ git add -u
$ git am --continue
Applying: kvmtool: support larger virtio queues
[...]
$ git am --skip
$ git am --skip
$ git am --skip
$ git am --skip

NB: Yes, one could argue downloading the patch emails individually would
have been more straightforward than using the full mbox but I didn’t want
to have to paste three long urls in here :P

Quick notes

We compared kernel dmesg output between a boot with and without seabios
firmware (i.e. directly booting the kernel with --kernel). With seabios, the
APIC is still missing; we only got the legacy PIC. We need the APIC for SMP,
but a quick look in the linux sources shows the APIC detection will likely
come along with the MPTABLE too.

We decided to refactor seabios’s malloc stuff to not overwrite kvmtool’s
MPTABLE. This currently happens in malloc_init in src/malloc.c:

memset(VSYMBOL(zonefseg_start), 0
          , SYMBOL(zonefseg_end) - SYMBOL(zonefseg_start)); 
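
One possible shape for that refactoring, as a sketch only: clear the zone in two pieces and leave a reserved range in the middle untouched. zero_zone_except() and its parameters are hypothetical and not taken from seabios. How the reserved range would be communicated to seabios is left open here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Zero [zone, zone + zone_len) except for the bytes in
 * [keep_off, keep_off + keep_len), given as offsets into the zone.
 * Hypothetical helper for illustration, not a seabios function. */
static void zero_zone_except(uint8_t *zone, size_t zone_len,
                             size_t keep_off, size_t keep_len)
{
    assert(zone != NULL);
    assert(keep_off <= zone_len && keep_len <= zone_len - keep_off);

    /* Clear everything below the reserved range ... */
    memset(zone, 0, keep_off);
    /* ... and everything above it, leaving the range itself alone. */
    memset(zone + keep_off + keep_len, 0, zone_len - keep_off - keep_len);
}
```

With the reserved range covering the bytes kvmtool wrote, the MPTABLE would survive malloc_init while the rest of the zone still starts out zeroed.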

Next session: <2021-01-31 Sun>