Google Coral TPU on a KVM Guest
I recently purchased a Coral.ai co-processor for use with Frigate, an open-source security-camera NVR that uses AI models to detect motion events with high accuracy. Google's USB-based Coral device promises to take the detection workload off my forty Intel CPU cores, which could barely keep up with it.
The snag?
I run everything in libvirt/KVM, including the VM that runs the Frigate Docker container, which means I have to figure out how to pass the Coral.ai TPU through to a KVM guest. I plugged the Coral.ai device into my server's USB 3.0 port and it came up right away as "Global Unichip Corp":
root@compute1:~# lsusb -tv
/: Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/6p, 5000M
ID 1d6b:0003 Linux Foundation 3.0 root hub
|__ Port 6: Dev 2, If 0, Class=Application Specific Interface, Driver=, 5000M
ID 1a6e:089a Global Unichip Corp.
...
root@compute1:~#
PCI Pass Through
First I considered passing through the USB device itself to my KVM guest. Unfortunately the simplest approaches are clunky: pinning to a USB bus/device number doesn't survive reboots, and pinning to vendor/product IDs breaks because the Coral re-enumerates after initialization (it shows up as 1a6e:089a "Global Unichip Corp." until the Edge TPU runtime loads its firmware, then comes back as 18d1:9302 "Google Inc."). Then I realized my server only has one USB 3.0 port, so I may as well pass the server's entire PCI USB 3.0 controller through to the KVM guest. For the curious, the single-device approach is sketched below.
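A minimal sketch of what that USB hostdev entry would have looked like in the guest XML, keyed to the pre-initialization IDs, which is exactly the part that breaks:

<!-- USB device passthrough pinned to vendor/product IDs (the approach I skipped);
     once the Coral re-enumerates as 18d1:9302 this entry no longer matches. -->
<hostdev mode='subsystem' type='usb' managed='yes'>
  <source>
    <vendor id='0x1a6e'/>
    <product id='0x089a'/>
  </source>
</hostdev>

With that ruled out, let's find the controller: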
root@compute1:~# lspci -vv | grep -i xhci
00:14.0 USB controller: Intel Corporation C610/X99 series chipset USB xHCI Host Controller (rev 05) (prog-if 30 [XHCI])
Subsystem: Lenovo C610/X99 series chipset USB xHCI Host Controller
root@compute1:~#
As you can see, my USB 3.0 controller is the Intel Corporation C610/X99 PCI device at 00:14.0, and we can use the virsh tool to dump the device's address in a form we can copy straight into the KVM guest configuration. In libvirt's node-device naming, 00:14.0 becomes pci_0000_00_14_0:
root@compute1:~# virsh nodedev-dumpxml pci_0000_00_14_0 | grep address
<address domain='0x0000' bus='0x00' slot='0x14' function='0x0'/>
root@compute1:~#
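Before committing, it's worth a quick sanity check that the controller sits in its own IOMMU group, since every device in a group gets assigned to the guest together:

root@compute1:~# ls /sys/bus/pci/devices/0000:00:14.0/iommu_group/devices/

Ideally that lists only 0000:00:14.0 itself. If the iommu_group path doesn't exist at all, the IOMMU isn't enabled on your host yet; see Known Issues below.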
With that address string in hand let’s edit our KVM guest:
root@compute1:~# virsh list
 Id   Name                         State
--------------------------------------------
 2    dockstarter.kvm              running
 3    librenms.kvm                 running
 4    ubuntu.22.lts.template.kvm   running
 5    docker.securefiber.net.kvm   running
root@compute1:~# virsh edit docker.securefiber.net.kvm
virsh edit should drop you into nano, vi, or a similar text editor depending on your operating system; I'm using Ubuntu 20 LTS in this case. In your editor, scroll down to the <devices> section and add a new entry:
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x00' slot='0x14' function='0x0'/>
  </source>
</hostdev>
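If you'd rather not hand-edit the domain XML, the same <hostdev> block can be applied from a file; here coral-controller.xml is a hypothetical file containing just the snippet above:

root@compute1:~# virsh attach-device docker.securefiber.net.kvm coral-controller.xml --config

The --config flag writes the change to the persistent definition, so like virsh edit it takes effect on the next cold start.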
At this point you can SSH or console to your guest and shut it down. Note that changes made with virsh edit only apply on the next cold start, so a reboot from inside the guest isn't enough; the domain has to stop fully and start again.
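From the host that looks like this:

root@compute1:~# virsh shutdown docker.securefiber.net.kvm
root@compute1:~# virsh start docker.securefiber.net.kvm

Once the guest is back up, the controller and the Coral should show up within it: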
root@docker:~# lsusb -tv
/: Bus 06.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/6p, 5000M
ID 1d6b:0003 Linux Foundation 3.0 root hub
|__ Port 6: Dev 2, If 0, Class=Application Specific Interface, Driver=, 5000M
ID 1a6e:089a Global Unichip Corp.
...
root@docker:~#
At this point you should be able to use your Coral.ai device with Docker just as you would on bare metal. If you're doing this for Frigate, you can pick things up in the Frigate documentation.
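Inside the guest, that mostly means handing the USB bus to the Frigate container. A minimal docker-compose excerpt, assuming the stock Frigate image (a sketch to merge into your own compose file, not a complete configuration):

services:
  frigate:
    image: ghcr.io/blakeblackshear/frigate:stable
    devices:
      - /dev/bus/usb:/dev/bus/usb   # expose the USB Coral to the container

Frigate's own config then selects the accelerator with a detector of type edgetpu and device: usb.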
Known Issues
Some of you may get an error when starting the guest saying something like:
root@compute1:~# virsh start docker.securefiber.net.kvm
error: Failed to start domain docker.securefiber.net.kvm
error: unsupported configuration: host doesn't support passthrough of host PCI devices
root@compute1:~#
The virt-host-validate tool will tell you what's wrong; in my case, IOMMU was not enabled at the kernel level:
root@compute1:~# virt-host-validate
QEMU: Checking for hardware virtualization : PASS
QEMU: Checking if device /dev/kvm exists : PASS
QEMU: Checking if device /dev/kvm is accessible : PASS
QEMU: Checking if device /dev/vhost-net exists : PASS
QEMU: Checking if device /dev/net/tun exists : PASS
QEMU: Checking for cgroup 'cpu' controller support : PASS
QEMU: Checking for cgroup 'cpuacct' controller support : PASS
QEMU: Checking for cgroup 'cpuset' controller support : PASS
QEMU: Checking for cgroup 'memory' controller support : PASS
QEMU: Checking for cgroup 'devices' controller support : PASS
QEMU: Checking for cgroup 'blkio' controller support : PASS
QEMU: Checking for device assignment IOMMU support : PASS
QEMU: Checking if IOMMU is enabled by kernel : WARN (IOMMU appears to be disabled in kernel. Add intel_iommu=on to kernel cmdline arguments)
QEMU: Checking for secure guest support : WARN (Unknown if this platform has Secure Guest support)
LXC: Checking for Linux >= 2.6.26 : PASS
LXC: Checking for namespace ipc : PASS
LXC: Checking for namespace mnt : PASS
LXC: Checking for namespace pid : PASS
LXC: Checking for namespace uts : PASS
LXC: Checking for namespace net : PASS
LXC: Checking for namespace user : PASS
LXC: Checking for cgroup 'cpu' controller support : PASS
LXC: Checking for cgroup 'cpuacct' controller support : PASS
LXC: Checking for cgroup 'cpuset' controller support : PASS
LXC: Checking for cgroup 'memory' controller support : PASS
LXC: Checking for cgroup 'devices' controller support : PASS
LXC: Checking for cgroup 'freezer' controller support : PASS
LXC: Checking for cgroup 'blkio' controller support : PASS
LXC: Checking if device /sys/fs/fuse/connections exists : PASS
root@compute1:~#
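You can also confirm from the host's kernel log whether the IOMMU was initialized at boot; on Intel platforms look for DMAR messages:

root@compute1:~# dmesg | grep -i -e dmar -e iommu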
To fix this I edited my /etc/default/grub file like so:
root@compute1:~# nano -w /etc/default/grub
...
GRUB_CMDLINE_LINUX="intel_iommu=on iommu=pt"
...
root@compute1:~# update-grub
Sourcing file `/etc/default/grub'
Sourcing file `/etc/default/grub.d/init-select.cfg'
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-5.4.0-171-generic
Found initrd image: /boot/initrd.img-5.4.0-171-generic
Found linux image: /boot/vmlinuz-5.4.0-170-generic
Found initrd image: /boot/initrd.img-5.4.0-170-generic
done
root@compute1:~#
Then safely shut down all of your KVM guests and reboot the server. The problem should be fixed!
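After the reboot you can verify the new kernel arguments took effect and the warning is gone:

root@compute1:~# cat /proc/cmdline
root@compute1:~# virt-host-validate qemu | grep -i iommu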