[ title slide ] Hi. I'm here to talk about one of the biggest risk areas for the security of x86 hypervisors: qemu, in its role as PC emulator. I'll be looking at it from the perspective of qemu's role in a Xen system. Other Free Software full-PC hypervisors also use qemu, but they are structured differently. In Xen systems qemu plays a smaller role (and is sometimes even absent), so there are different difficulties and different opportunities. I'm going to talk about some work that Stefano Stabellini and I have been doing to improve the security of one of the most important Xen configurations. [ PV vs HVM slide ] x86 Xen guests can run in two modes. The traditional one is PV (which stands for paravirtualised). This involves modifying the guest to run specifically under Xen. Many Free Software operating systems have been ported to run as Xen PV guests. This is still, in general, the best mode. But sometimes it's necessary to run unmodified guests. Xen HVM provides an environment that looks to the guest just like a whole PC, complete with a PCI bus and the traditional PC peripherals. To do this, Xen needs an implementation of those peripherals: and that is where qemu comes in. In many Xen HVM configurations, most if not all of the emulated PC is used only during booting: if the guest has suitable Xen-specific drivers, it can load them at an appropriate point during boot and switch over to them. This is a good idea because it's faster and more scalable. But from a security point of view, a malicious guest can still attack the whole of the emulated PC (although sometimes it might need to reboot itself to do so). [ XSAs slide ] This is a problem because a PC is a very complicated thing to emulate. Almost all code inevitably has bugs, and the more complicated the code, the more bugs. The PC is also a very broad interface: it provides a lot of different facilities, so the attack surface is very large. That means lots of security problems. Here we see a selection from the past year.
Several of these apply only in non-default configurations, and not all of them allow immediate compromise of the host by a guest. But together they still make up a substantial proportion of the security risk posed to a host by a guest running in a virtual x86 PC. In principle there is no architectural reason why the PC emulation needs to run with the full privileges of the host. It's that way in most Xen HVM setups because all of the PC emulation is combined in a single program, qemu, which runs as root in dom0. In most non-Xen scenarios where qemu is used, qemu is also responsible for most of the configuration and management of the domain, so it is necessarily trusted. But in Xen most of the other functions, which need privilege, are done by other parts of the Xen system: parts which generally have a much narrower and simpler, and therefore safer, interface to the guest. (The main exception to this is the Xen code which deals with the astonishing complexity and variety of the x86 architecture. That would be a whole talk by itself.) This theoretical flexibility is used by some more advanced Xen-based downstreams with a security focus. But the approaches available to an ordinary Xen administrator, who gets a fairly vanilla Xen from their operating system distro, are less sophisticated. [ User options ] As an x86 Xen user (that is, an administrator of a Xen system) you have several options for dealing with the risk from security bugs in the qemu PC emulator. The best approach in this kind of situation is simply to avoid exposing anything which is not strictly necessary to potentially hostile guests. In the case of qemu in Xen, that means running a PV guest. Xen PV guests are still the best choice from a security point of view. But if you need to run HVM for some reason (maybe your guest operating system doesn't run on Xen PV) then your choices right now aren't brilliant. Currently the default is simply to run qemu as a process, as root, in dom0.
Any security bug which allows the guest to compromise qemu is therefore a bug which compromises the whole system. It is also possible to run the device emulator, qemu, in a separate domain, where it runs only with the privileges of the guest. We call this a `device model stubdomain'. But there are a number of reasons why most ordinary installations are not able to use it. It is currently supported only with an ancient version of qemu (provided specifically for this purpose by the Xen project). This old qemu (we call it `qemu-xen-traditional') is no longer receiving new features. Most distros find building it too cumbersome and do not want to try to support it, so if you want to use it you will have to build it yourself. If you can, the limited feature set is probably still fine if you need the emulator only for booting. And in that case, bugs in that qemu are not security bugs at all: the stub device model domain runs only with the privileges of (and over) the guest VM. But it's not practical to ship this as the default configuration, particularly for distros. [ slide with more user options ] We have two alternatives in the works. The more sophisticated of these is to port a modern qemu to the Rump Kernel project. Rump Kernels are a form of unikernel. Derived from NetBSD, they provide a way to build existing programs, which expect a POSIX-like environment, to run as a single image directly on Xen (or indeed on bare metal, or in other hypervisors or environments). We have made some progress with this, but it's quite complicated. The build system, in particular, is exciting: it's essentially a miniature distribution. And we have to provide a lot of Xen-specific control interfaces to make a qemu device model work in a Rump Kernel. Sadly this probably isn't going to be ready for the next release of Xen, 4.7. In the meantime we want to do something to make security bugs in the device model less of a problem.
So we have started a miniature project to deprivilege the device emulator within dom0, by running it as a separate Unix user. This won't do much about resource exhaustion attacks, at least not right away, because it's difficult to stop all the ways that a Unix process might starve the rest of the system. But getting rid of the immediate host compromise is definitely worthwhile. We hope to have done this, or most of it, for Xen 4.7 (which is currently targeting June 2016). I'm going to take a look under the covers at how we intend to achieve this. [ slide with technical details ] Actually running a device model in dom0 as a non-root user is not entirely trivial. Firstly, the device model needs to be able to access the underlying resources (such as disks and networks) that it presents to the guest as emulated IDE and Ethernet controllers. For example, it will need access to the LVM volume containing the guest disk image. Fortunately, since these emulated PC devices don't support hotplug, we can have qemu open the relevant devices as root at startup, and drop privilege afterwards. There is one exception: the emulated cdrom, which needs to support insertion and removal. We will deal with this by having the toolstack library, which in a Xen system invokes and controls qemu, open the device and pass the relevant fd to qemu. Secondly, the device model needs intimate access to the innards of the guest, the same access that real hardware would have (for example, it needs to be able to do DMA). This involves making hypercalls to access and manipulate the guest domain. If qemu runs as root in dom0, it inherits dom0's whole-system privilege. When it runs in its own domain, that domain is specifically created as a service domain for the guest, so Xen knows that it may access that guest. If we want to deprivilege qemu within dom0, things are more complicated.
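The first point above (open everything the guest needs while still root, then drop to an ordinary Unix user) and the cdrom exception (the toolstack opens new media and hands the fd over) can be sketched in C. This is a minimal sketch assuming plain POSIX/Linux interfaces: `drop_privileges`, `send_fd` and `recv_fd` are hypothetical helper names, not qemu or libxl functions, and we assume the uid/gid is chosen by the toolstack and that the toolstack and qemu share a connected `AF_UNIX` socket, over which descriptors can be passed with the standard `SCM_RIGHTS` mechanism.

```c
#include <grp.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: drop from root to an unprivileged uid/gid, permanently.
 * Call only after every disk/network backend has been opened.
 * Returns 0 on success, -1 on failure (the caller should then exit). */
static int drop_privileges(uid_t uid, gid_t gid)
{
    if (setgroups(0, NULL) != 0)      /* shed root's supplementary groups */
        return -1;
    if (setgid(gid) != 0)             /* group first: this needs root */
        return -1;
    if (setuid(uid) != 0)             /* as root, sets real, effective and saved uid */
        return -1;
    if (uid != 0 && setuid(0) == 0)   /* sanity check: root must be unrecoverable */
        return -1;
    return 0;
}

/* Hypothetical toolstack-side helper: send one open fd (e.g. a newly
 * inserted cdrom image) over a connected AF_UNIX socket. 0 on success. */
static int send_fd(int sock, int fd)
{
    char byte = 'F';                  /* SCM_RIGHTS must ride on at least one data byte */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {                           /* union guarantees cmsghdr alignment */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg;

    memset(&ctrl, 0, sizeof ctrl);
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical device-model-side helper: receive one fd. Works after
 * drop_privileges() because no open() happens here. Returns fd or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg;
    int fd;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

In this sketch qemu would call `drop_privileges()` only after opening its disk and network backends, and would afterwards obtain new cdrom media solely via `recv_fd()`. Since receiving a descriptor involves no `open()`, the deprivileged process never needs permission on the device node itself.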
To make qemu's privilege drop effective, we need to give it the ability to make hypercalls (and access its guest's memory and so on), but restrict that to hypercalls relating to the management of its specific guest. Only the dom0 kernel can identify the qemu process and distinguish it from other parts of dom0, but only the hypervisor understands which hypercalls have which security properties. So we are developing a small new feature in the hypercall interface which would allow information about the caller to be presented to Xen along with the hypercall arguments. qemu will open the dom0 Xen hypercall and memory access devices at startup. Before dropping privilege, it makes a system call telling the kernel to attach the appropriate rider to all of its subsequent hypercalls. Xen can then make the right access control decision. A similar consideration, and a similar approach, applies to xenstore, the low-level structured interdomain communication system which is used as the control plane for domain management and paravirtualisation. Thirdly, depending on the configuration, in current systems qemu may be providing paravirtualised devices as well as full system emulation. (The toolstack decides whether PV devices are provided by qemu or by the dom0 or driver domain kernel, according to the requested guest configuration and the configured underlying resources.) The paravirtualised devices must support hotplug, but they provide a narrower, hypervisor-friendly interface: the paravirtualised protocols are generally the primary security boundary. So they can, and should, be provided by software with greater privilege to access underlying resources. To make this possible, we are reorganising the qemu support processes so that a guest might get two qemus: one deprivileged, which does full system emulation; and one which retains privilege but provides only paravirtualised interfaces.
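The division of labour described above (the dom0 kernel knows who is calling, Xen knows what each hypercall can do) can be illustrated with a toy model. This is a sketch under stated assumptions only: `struct rider`, the two-way `hcall_scope` classification, `rider_restrict` and `xen_permits` are all hypothetical names, and none of this reflects the actual interface being proposed for Xen or the dom0 kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model only: names and layout are illustrative, not Xen's ABI. */
typedef uint16_t domid_t;
#define NO_DOMID ((domid_t)0xFFFF)    /* placeholder meaning "no guest" */

/* Per-handle state the dom0 kernel attaches ("rides along") to each hypercall. */
struct rider {
    bool restricted;                  /* set by the one-way "restrict" system call */
    domid_t target;                   /* the only guest this handle may act on */
};

enum hcall_scope {
    HC_GUEST,                         /* acts on one named guest, e.g. map its memory */
    HC_SYSTEM,                        /* affects the whole host, e.g. create a domain */
};

struct hcall {
    enum hcall_scope scope;
    domid_t domid;                    /* guest operated on (meaningful for HC_GUEST) */
};

/* Kernel side: restriction is one-way; it cannot be lifted or retargeted. */
static int rider_restrict(struct rider *r, domid_t target)
{
    if (r->restricted && r->target != target)
        return -1;
    r->restricted = true;
    r->target = target;
    return 0;
}

/* Hypervisor side: Xen knows each call's scope; the rider says who is asking. */
static bool xen_permits(const struct rider *r, const struct hcall *h)
{
    if (!r->restricted)
        return true;                  /* ordinary dom0 caller: full privilege */
    if (h->scope != HC_GUEST)
        return false;                 /* restricted callers get no system-wide calls */
    return h->domid == r->target;     /* and only for their own guest */
}
```

Making the restriction one-way is what lets qemu request it itself, just before dropping privilege: once it is set, a compromised qemu cannot undo it or point it at a different guest.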
We have concrete proposals for all of these pieces, but they have not yet been fully agreed, tested, and deployed. Nevertheless, we hope to get this done for this summer's Xen release, 4.7. [ back to user options slide ] So, to summarise: it's still best to use a Xen PV guest if you can. For those of you who need a full virtual PC, we hope to reduce the security impact of bugs in the qemu system emulation by running the emulator as a non-privileged dom0 user, by default. More sophisticated, and more secure, privilege separation will of course continue to be available for Xen-based projects and vendors with a security focus, and in the longer term we hope that full device model stubdomains, with a modern qemu based on Rump Kernels, can become the default. But until then we will help users exploit the existing Unix user security boundary in their dom0 to contain the qemu device model. [ questions ]