Nested virtualization coming to POWER9


On the KVMPPC mailing list, Paul Mackerras posted for comments a new set of updates to KVM-HV allowing POWER9 systems in radix MMU mode to finally nest virtualization (i.e., run a virtualized POWER9 guest within another virtualized POWER9 guest through KVM-HV). This is not only a big boon to shops that run Power ISA virtual machines in terms of enhanced security and portability, but also offers the potential for improved debugging and development.

As you will no doubt recall from our previous series on turning your Talos into a Power Mac, the Kernel-based Virtual Machine functionality on Power ISA and PowerPC comes in two flavours. KVM-PR emulates supervisor instructions in software, so it is slower but more flexible and can be nested. KVM-HV uses the hardware hypervisor support in later Power ISA chips and is faster, but it cannot emulate most earlier CPUs and previously could not be nested (though a KVM-PR guest can run within a KVM-HV guest, and additional KVM-PR guests within that).

With these patches, nested KVM-HV guests are now possible, and can run at nearly full speed. Let's define the base hypervisor to be at level 0 ("L0"). L0 can use the hardware virtualization support to run a guest at level 1 ("L1"). An L1 guest, however, currently cannot do the same thing, so it can't spawn any additional nested VMs under its own control. The trick with these patches is to add hypercalls that let an L1 guest ask the L0 hypervisor to create another guest on its behalf, but with address translation set up so that the L1 guest can manipulate it. The new guest is actually another L1 guest, but it looks like an L2 guest because L0 will in effect translate the fake L2's addressing requests through the L1 guest that requested it, using a combination of instruction emulation and paravirtualization. The emulated L2 guest should then be able to turn around and request a new VM itself: the L0 hypervisor will make another L1 guest that the faux L2 guest can control and that acts like an L3 guest, and thus it's turtles all the way down.
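To make the idea concrete, here is a toy Python model of the arrangement described above. It is only a sketch under heavy assumptions: the class, the method names (`create_nested_guest`, `translate`), and the page-table-as-dictionary representation are all invented for illustration and are not the hypercalls or data structures in the actual patches. It shows just the shape of the idea: L1 asks L0 to create the guest, L1 keeps control of the "L2" address mapping, and L0 composes that mapping with its own to reach real memory.

```python
PAGE_SIZE = 0x1000  # assumed page size for this toy model


class L0Hypervisor:
    """Toy stand-in for the base hypervisor. Not the real KVM-HV interface."""

    def __init__(self, l1_to_host):
        # L0's own mapping: L1 guest page number -> host page number.
        self.l1_to_host = l1_to_host
        # Nested guest id -> mapping supplied by L1 (L2 page -> L1 page).
        self.nested_tables = {}
        self._next_id = 1

    def create_nested_guest(self, l2_to_l1):
        """Hypothetical hypercall: L1 asks L0 to create a guest on its behalf.

        L0 stores a reference to L1's table rather than a copy, so L1 can
        keep manipulating the nested guest's address translation.
        """
        guest_id = self._next_id
        self._next_id += 1
        self.nested_tables[guest_id] = l2_to_l1
        return guest_id

    def translate(self, guest_id, l2_addr):
        """Translate a nested guest address by going through L1's mapping."""
        l2_page, offset = divmod(l2_addr, PAGE_SIZE)
        l1_page = self.nested_tables[guest_id][l2_page]  # L1-controlled step
        host_page = self.l1_to_host[l1_page]             # L0-controlled step
        return host_page * PAGE_SIZE + offset


# L0 backs L1 pages 5 and 6 with host pages 42 and 43.
l0 = L0Hypervisor({5: 42, 6: 43})

# L1 builds a table for its "L2" guest and asks L0 to create that guest.
l2_table = {0: 5}
guest = l0.create_nested_guest(l2_table)
first = l0.translate(guest, 0x123)

# L1 later remaps the nested guest's page 0; L0 follows the change.
l2_table[0] = 6
second = l0.translate(guest, 0x123)
```

In this model the nested guest never sees host addresses and L1 never touches L0's table, which is the division of control the post describes. The real implementation has far more to deal with (faults, TLB invalidation, instruction emulation) that this sketch ignores.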

Because this is still inherently KVM-HV, however, it inherits all of KVM-HV's basic limitations, such as only supporting the current processor generation and the one immediately preceding it. In addition, the current nested guest implementation relies on radix MMU mode, the default MMU mode of the POWER9 (KVM-PR requires hashed page table MMU mode), meaning it does not support earlier Power ISA generations that only support hashed page tables. The patches are out for comments on the mailing list and hopefully will be incorporated into the Linux kernel tree in the very near future.
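The two constraints in the paragraph above can be written down as a small predicate. This is a sketch only: representing processor generations as plain integers (9 for POWER9, 8 for POWER8) and the function name are assumptions made for illustration, and nothing here queries real hardware or KVM.

```python
def nested_hv_possible(host_gen, guest_gen, guest_supports_radix):
    """Return True if a nested KVM-HV guest is plausible under the two rules
    stated in the post. Generations are plain integers (an assumption)."""
    # Rule 1: KVM-HV supports only the current generation (N) and N-1.
    generation_ok = guest_gen in (host_gen, host_gen - 1)
    # Rule 2: the nested implementation relies on radix MMU mode.
    return generation_ok and guest_supports_radix


# A POWER9 guest on a POWER9 host in radix mode passes both rules.
power9_on_power9 = nested_hv_possible(9, 9, True)

# A guest limited to hashed page tables fails rule 2 even though it is N-1.
hpt_only_previous_gen = nested_hv_possible(9, 8, False)
```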

Comments

  1. Thanks for the great blog.
    After your two-part article and the whole KVM-PR/HV thing, I did some more research because of the N / N-1 generation support.

    As the G5 derives from the Power 4, in theory we could be golden using a Power 4 or Power 5 IBM workstation and be able to run OS X with the KVM-HV benefits.

    Unfortunately, the Altivec units are reported as part of the Power 6 and later, making it out of reach of the N / N-1 requirements of KVM-HV :(

    Do you think there is a sweet spot for running OS X on a Power workstation at full-ish speed (thanks to KVM-HV)? Or were the G4/G5 too far from their Power ancestors, so that we should take comfort in all we can get from the KVM-PR mode?

    Note: I noticed some mentions of in-order / out-of-order exec mode in the Power evolution (and no idea if it can have an impact). And I found nothing about the MMU modes in Power 4 and later.

    Replies
    1. I think what will help most in the immediate timeframe (and is achievable) is getting KVM-PR to work properly in full 64-bit mode. Then it can be made to appear as a G5 to OS X, which would be a better match for the CPU. Right now this just causes the kernel to panic.

      There are PPC 970 workstations, by the way. The IntelliStation POWER 185 is a nice fit and would "just work" with this approach.

