
More news on the "Tiny Talos"


Raptor is revealing more details on what we'll christen the Tiny Talos. In posts on Phoronix, engineer Timothy Pearson indicated that the unit will take "low-end Sforza" parts (likely capped at four or possibly eight cores, assuming SMT-4), putting it at the low end, below the T2 Lite. Interestingly, integrated sound is available, as is (presumably) integrated video.

Whatever it is, the full reveal is scheduled for October. We'll be watching.

Alpine Linux updated to 3.8.1


Alpine Linux has been updated to version 3.8.1, with bug fixes and security updates (though note this information on a remote code execution (RCE) flaw this release supposedly fixes). The ppc64le versions available for download don't seem to support POWER9 or Talos systems yet (only POWER8 systems so far), but hopefully this will change in the near future.

More POWER in Firefox 62


The ppc64le compilation and performance fixes yours truly contributed to Firefox are now in the release channel with Firefox 62. These were bug 1464751, bug 1464754 and bug 1465274, which was a spin-off from bug 1434589.

Unfortunately, the Fedora pre-built Firefox 62 seems to have a crippling crash bug when typing addresses into the location bar. Your distro's package may vary. However, building from source doesn't seem to be affected, implying a build configuration issue on their end. Note that build system changes in 62 require some additional workarounds in your .mozconfig, and this might have been what bit them. Here's what I use for making a debug build:

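# Build with gcc rather than clang
export CC=/usr/bin/gcc
export CXX=/usr/bin/g++

# 24 parallel jobs (this system has 32 threads)
mk_add_options MOZ_MAKE_FLAGS="-j24"
ac_add_options --enable-application=browser
# Debugger-friendly optimization, tuned for POWER9
ac_add_options --enable-optimize="-Og -mcpu=power9"
ac_add_options --enable-debug
# jemalloc and release builds still have problems on POWER9
ac_add_options --disable-jemalloc
ac_add_options --disable-release
# gold doesn't work right on ppc64le, so use GNU ld
ac_add_options --enable-linker=bfd

# Unoptimized Rust code for debugging
export RUSTC_OPT_LEVEL=0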

Adjust -j24 to the number of threads you want (I like keeping some resources free, so I reserve eight of the 32 threads on this system). The linker defaults to gold, which doesn't work right on ppc64le; this configuration forces GNU ld ("bfd") instead. This config also forces the use of gcc instead of clang; change that to taste.

Making a release build seems to have some problems on POWER9 still, so that's disabled, along with jemalloc. I also have a binary of gn (from Chromium) used to regenerate some configurations, which I'm happy to provide upon request. If you have such a binary, then add export GN=/path/to/gn to let the build system use it.

Save this as .mozconfig in the root of the Mercurial tree you cloned or tarball you expanded, and then ./mach build to build.
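Rather than hard-coding -j24, the job count can be derived from the host's thread count while still reserving some threads, as described above. This is a minimal sketch assuming bash and GNU coreutils' nproc; make_jobs and mozconfig_make_flags are hypothetical helper names, and the reserve of eight threads simply mirrors the setup on this system.

```shell
#!/usr/bin/env bash
# Hypothetical helper: how many make jobs to run after reserving some threads.
# Usage: make_jobs TOTAL_THREADS RESERVED_THREADS
make_jobs() {
  local total=$1 reserve=$2
  local jobs=$(( total - reserve ))
  # Never go below one job, even on small machines
  if (( jobs < 1 )); then
    jobs=1
  fi
  echo "$jobs"
}

# Emit the mozconfig line for this machine, keeping eight threads free
# (on a 32-thread T2 this yields -j24).
mozconfig_make_flags() {
  echo "mk_add_options MOZ_MAKE_FLAGS=\"-j$(make_jobs "$(nproc)" 8)\""
}
```

You would then swap the -j24 line in .mozconfig for the output of mozconfig_make_flags.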

For an optimized build, such as the one this blog post is being typed in, the config is nearly the same:

export CC=/usr/bin/gcc
export CXX=/usr/bin/g++

mk_add_options MOZ_MAKE_FLAGS="-j24"
ac_add_options --enable-application=browser
ac_add_options --enable-optimize="-O3 -mcpu=power9"
ac_add_options --disable-jemalloc
ac_add_options --disable-release
ac_add_options --enable-linker=bfd

export RUSTC_OPT_LEVEL=2

Unfortunately, builds with MOZ_PGO (profile-guided optimization) or MOZ_LTO (link-time optimization) set, although they complete, seem to generate defective executables. That will be a project to work on later. The RUSTC_OPT_LEVEL setting is probably unnecessary here but doesn't hurt.
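Since the debug and optimized configs differ in only a few lines, one way to keep them in sync is to generate .mozconfig from a single template. This sketch assumes the two flavours differ only as shown above (optimization flags, --enable-debug and RUSTC_OPT_LEVEL); write_mozconfig is a hypothetical name, and MOZ_PGO/MOZ_LTO are deliberately left out given the defective executables.

```shell
#!/usr/bin/env bash
# Hypothetical generator for the two .mozconfig variants above.
# Usage: write_mozconfig debug|opt [JOBS] > .mozconfig
write_mozconfig() {
  local flavour=$1 jobs=${2:-24}
  local optflags rustopt debugline=""
  case $flavour in
    debug) optflags="-Og -mcpu=power9"; rustopt=0
           debugline="ac_add_options --enable-debug" ;;
    opt)   optflags="-O3 -mcpu=power9"; rustopt=2 ;;
    *)     echo "usage: write_mozconfig debug|opt [JOBS]" >&2; return 2 ;;
  esac

  # Common settings: gcc, chosen job count, chosen optimization flags
  cat <<EOF
export CC=/usr/bin/gcc
export CXX=/usr/bin/g++

mk_add_options MOZ_MAKE_FLAGS="-j${jobs}"
ac_add_options --enable-application=browser
ac_add_options --enable-optimize="${optflags}"
EOF
  # Only the debug flavour turns on --enable-debug
  if [ -n "$debugline" ]; then
    echo "$debugline"
  fi
  # No jemalloc, no release build, GNU ld instead of gold
  cat <<EOF
ac_add_options --disable-jemalloc
ac_add_options --disable-release
ac_add_options --enable-linker=bfd

export RUSTC_OPT_LEVEL=${rustopt}
EOF
}
```

For example, write_mozconfig opt > .mozconfig && ./mach build produces the optimized build.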

My internal builds also use a port of TenFourFox's basic adblock to reduce the amount of JavaScript the browser needs to run, since Firefox does not yet have a JIT for ppc64le. That's something I'm working on as well, inspired by the big-endian 32-bit PowerPC JIT in TenFourFox, but this JIT will be 64-bit and little-endian so that we can get wasm up and running. I'll be posting progress reports here as the work moves along. This is a rather different approach from that of the folks working on the ppc64le Chromium port, who are using the existing Power ISA support in V8 and trying to get the rest of the browser up. For philosophical reasons I won't be working on that project (I think Google's dissemination of Blink is not ultimately benign, a topic for another day), but I support more browser choice on our new platform, and I hope they are successful too.

Talos shines at OSS North America


Raptor was at the Linux Foundation's Open Source Summit North America, and at least one attendee was very impressed with the Talos II. They run Debian (we run Fedora here), but we can still be friends. The unit on display was the dual-socket T2 with two 4-core CPUs for 32 threads, the same configuration as the one your humble author is typing on.

Is there a tiny Talos in the timeline?


Did Raptor just tip their hand on a micro-ATX Talos? There were no announcements on price or capacity, but with a single CPU it would probably be most comparable to the Talos II Lite (1TB RAM maximum, PCIe x16 and x8). That is doable in an mATX form factor, but given the size of the heatsinks in the EATX T2 and T2 Lite, cooling might be an issue in a case this small. We'd imagine the price would be competitive with the T2 Lite as well.

Some fun case graphics might give this thing a little style, too (see our artist's impression of the image they linked).

Musing over POWER9 roadmap at Hot Chips


(See the presentation from AnandTech's live blog at Hot Chips.)

With the news that GlobalFoundries has stopped all 7nm development, the next step for the Power ISA got more nebulous. IBM really phoned in their presentation at Hot Chips this time around; there wasn't a lot of meat on the bone, and they probably got advance warning of the changes at GF, which likely limited what they were willing to say in public. But IBM still has one more stop on the roadmap for the POWER9, so they're not done with 14nm yet.

The 2019 "advanced I/O" POWER9 will increase memory bandwidth from the "scale up" 210GB/s to 350GB/s, over twice as much as the "scale out" cores in the Talos II at 150GB/s. IBM didn't appear to say if this would require buffering or if it was direct attached memory, though our incompletely informed suspicion here is the former. If so, it wouldn't be a direct replacement for the Sforza cores the T2 runs now; the board would probably need a redesign to accommodate whatever Centaur successor they require. That would also have power and thermal impacts in a workstation form factor. I/O on the "AIO" POWER9 jumps to OpenCAPI 4.0 from 3.0, allowing caching on accelerators and additional link widths, and NVLink 3.0 from 2.0, presumably both over Bluelink. IBM didn't announce clock speeds, but given that the core counts are the same, they're most likely identical or comparable.

IBM also said rather little about the POWER10. No core count was reported and the node size was pointedly not shown. However, signaling was announced at 32 and 50GT/s, up to double the POWER9, indicating IBM continues to prioritize bandwidth as their competitive advantage against x86 commodity servers. The timeframe is still 2020, so we can expect at least another 18 months of POWER9 goodness.

Making your Talos II into a Power Mac: dcbz considered harmful (part 2)


In the first part of this article we talked about getting your Talos II prepped to emulate a Power Mac using KVMPPC, the kernel virtualization facility in Linux. Having followed the instructions in that article, you've got your kernel in hash table mode, you've got the KVM-PR kernel module loaded (and patched it if necessary), you installed (or built) QEMU, and you have a blank QEMU disk image ready to go.

For this part, we will assume you have chosen 10.3 Panther, 10.4 Tiger or 10.5 Leopard to install. I won't discuss Leopard much beyond how to get you started in it; most of what applies to Tiger applies to Leopard as well. I'll briefly discuss booting OS 9 with TCG at the end.

Since we will use tun/tap networking, make sure the interface is up before booting. On Fedora, I do something like this:

sudo ip tuntap add dev tap0 mode tap user [your username]
sudo ip link set tap0 up promisc on

and, if you use libvirt,

sudo brctl addif virbr0 tap0
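Wrapped into a script, that interface setup might look like the sketch below. It assumes iproute2 and, optionally, a libvirt bridge named virbr0 as above; setup_tap and the DRY_RUN switch are hypothetical conveniences, and the existence check only keeps a second run from failing on an already-created tap0.

```shell
#!/usr/bin/env bash
# Hypothetical helper: bring up tap0 for QEMU networking.
# Usage: setup_tap USER [BRIDGE]   (set DRY_RUN=1 to print instead of run)

run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

setup_tap() {
  local user=$1 bridge=${2:-}
  # Only create the device if it doesn't already exist
  if ! ip link show tap0 >/dev/null 2>&1; then
    run sudo ip tuntap add dev tap0 mode tap user "$user"
  fi
  run sudo ip link set tap0 up promisc on
  # Optionally attach to a libvirt bridge, as with brctl addif above
  if [ -n "$bridge" ]; then
    run sudo brctl addif "$bridge" tap0
  fi
}
```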

For file sharing you could set up either Samba or Netatalk. I use Netatalk because I'm more accustomed to AppleTalk, it lets my T2 serve files over AFP to the other classic Macs here, and it also works fine with Mac OS 9 if you want to use that at some point.

Let's begin by constructing the command line to boot your emulated Mac from disc and install the OS. Each OS currently does better with certain combinations of emulated CPU and hardware features. We also need to keep the emulator on a single core for better performance (if it migrates to another core you will get random system stalls and generally impaired throughput), so we need to set CPU affinities appropriately.

We'll go with 10.4 for our example; substitute for your OS of choice where relevant. Start out with

taskset -a -c 0-3 qemu-system-ppc -M

This binds all of QEMU's threads to a single core. (Recall that the T2's Sforza CPUs are SMT-4 and each hardware thread appears as a logical CPU, so logical CPUs 0-3 are the four threads of one core.) Although QEMU spawns more than four threads, spreading it over two cores (i.e., 0-7) has no noticeable performance benefit and can sometimes unsettle Mac OS X by making timing loops unpredictable.
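Since each SMT-4 core shows up as four consecutive logical CPUs, the list for taskset follows directly from the core number. This small sketch assumes the kernel numbers the threads of core N as CPUs 4N through 4N+3 (check lscpu -e on your own system, since the numbering isn't guaranteed); core_cpus is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: logical CPU range for one SMT core.
# Assumes threads of core N are numbered N*SMT .. N*SMT+SMT-1.
# Usage: core_cpus CORE [THREADS_PER_CORE]
core_cpus() {
  local core=$1 smt=${2:-4}
  local first=$(( core * smt ))
  echo "${first}-$(( first + smt - 1 ))"
}
```

For instance, taskset -a -c "$(core_cpus 1)" qemu-system-ppc ... would pin the emulator to the second core (CPUs 4-7) instead of the first.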

For the -M option, we will specify mac99 and kvm. The OSes differ on what they prefer for the VIA. 10.3 and 10.4 need to run the emulated mac99 with an emulated CUDA chip onboard, or the OS is unable to detect the real-time clock. 10.5, however, requires the later PMU attached to the VIA. So that gets us to

taskset -a -c 0-3 qemu-system-ppc -M mac99,accel=kvm,via=cuda   # 10.3, 10.4
taskset -a -c 0-3 qemu-system-ppc -M mac99,accel=kvm,via=pmu    # 10.5
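If you keep one script for several OS versions, the VIA choice can be keyed off the release name. A sketch, assuming you refer to the releases as panther, tiger and leopard; machine_arg is a hypothetical helper that only builds the -M value.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the -M argument for each Mac OS X release.
# 10.3 and 10.4 need the CUDA VIA to find the real-time clock; 10.5 needs the PMU.
# Usage: machine_arg panther|tiger|leopard [kvm|tcg]
machine_arg() {
  local os=$1 accel=${2:-kvm} via
  case $os in
    panther|tiger) via=cuda ;;
    leopard)       via=pmu ;;
    *) echo "unknown release: $os" >&2; return 2 ;;
  esac
  echo "mac99,accel=${accel},via=${via}"
}
```

Usage: taskset -a -c 0-3 qemu-system-ppc -M "$(machine_arg tiger)" followed by the rest of the options.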

All three of these OSes work fine emulating a 7400-series G4. We will use the "Nitro" 7410 (-cpu nitro), which is a bit faster than the G3 (-cpu G3). 10.3 may have some problems with more than 1.5GB of RAM (-m 1536), but 10.4 and 10.5 work fine with 2GB (-m 2048). Don't use more than 2GB of RAM; it will cause various problems. A verbose boot is helpful in case you accidentally did something wrong (-prom-env boot-args=-v). We'll specify our disk image and some tuning parameters (-drive file=[filename].img,format=qcow2,l2-cache-size=4M), and tell QEMU to boot from the CD or DVD (-boot d -cdrom "/dev/cdrom"). Lastly, we'll enable the emulated RTL8139 NIC and USB tablet (-netdev tap,id=mynet0,ifname=tap0,script=no,downscript=no -device rtl8139,netdev=mynet0 -usb -device usb-tablet) and use a sane screen resolution (-g 1024x768x32). For my 10.4 booter, the full command line looks like this (using the filenames I use on this system):

taskset -a -c 0-3 qemu-system-ppc -M mac99,accel=kvm,via=cuda -cpu nitro -m 2048 -prom-env boot-args=-v -boot d -cdrom /dev/cdrom -drive file=tigerhd.img,format=qcow2,l2-cache-size=4M -netdev tap,id=mynet0,ifname=tap0,script=no,downscript=no -device rtl8139,netdev=mynet0 -usb -device usb-tablet -g 1024x768x32

I strongly suggest saving this as a shell script so that you can make any necessary variations. Insert your OS CD or DVD and run the script. It should go into the installer. If it didn't, make sure your filenames are correct, that you have OpenBIOS installed (it comes with QEMU) in a location the emulator can see, and that the KVM kernel modules (both kvm and kvm_pr) are loaded by checking lsmod.
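Here is one way that script could look: a sketch that wraps the 10.4 command line above and exposes the two settings you are most likely to vary, the accelerator (kvm or tcg) and the boot device (d for the disc, c for the disk image). The function name, argument order and DRY_RUN switch are hypothetical; tigerhd.img and /dev/cdrom are the filenames from the command line above.

```shell
#!/usr/bin/env bash
# Hypothetical launcher for the 10.4 command line above.
# Usage: launch_tiger [kvm|tcg] [d|c]   (set DRY_RUN=1 to print, not run)
launch_tiger() {
  local accel=${1:-kvm} boot=${2:-d}
  case $accel in kvm|tcg) ;; *) echo "accel must be kvm or tcg" >&2; return 2 ;; esac
  case $boot in c|d) ;; *) echo "boot must be c or d" >&2; return 2 ;; esac

  # Pin all QEMU threads to the first core, as discussed above
  local -a cmd=(taskset -a -c 0-3 qemu-system-ppc
    -M "mac99,accel=${accel},via=cuda"
    -cpu nitro -m 2048
    -prom-env boot-args=-v
    -boot "$boot"
    -drive file=tigerhd.img,format=qcow2,l2-cache-size=4M
    -netdev tap,id=mynet0,ifname=tap0,script=no,downscript=no
    -device rtl8139,netdev=mynet0
    -usb -device usb-tablet
    -g 1024x768x32)
  # Only attach the physical disc when booting from it
  if [ "$boot" = d ]; then
    cmd+=(-cdrom /dev/cdrom)
  fi

  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Add a final line launch_tiger "$@" to the saved script, then run it as, say, ./tiger.sh tcg d for the installer or ./tiger.sh kvm c once the OS is on the disk image.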

Once the installer has booted you can of course directly proceed to installation in KVM, but I actually recommend shutting down the emulated Mac at this point and bringing everything back up in TCG to get the OS installed. To do that, just use the same command line, but change accel=kvm to accel=tcg. As I mentioned in the first part, heavy I/O loads tend to be less performant on KVMPPC, and installing and upgrading an OS is a pretty heavy I/O load, so running it in TCG will complete the task more quickly and more reliably.

If you want to run Software Update to bring your emulated Mac up to date, it's probably best to do this in TCG as well. Alternatively, you can separately download one of the combo installers (such as the one for 10.4.11) and push it to the emulated Mac through your Samba share or Netatalk AFP share.

When the OS is installed, remove the -cdrom option from your command line (unless you want to keep the disc available) and change -boot d to -boot c to boot from the emulated drive image.

Ta-daa!

For best results with video updates, make sure that the display settings inside System Preferences match your physical display. I'm in 32-bit colour, so I made sure that System Preferences was using Millions instead of Thousands of colours. Because of timing variability, the OS X clock may run close to, but somewhat out of sync with, your host's clock, depending on how the delay loop was calibrated at bootup. This is mostly just a nuisance.

The next step is optional, but it hacks KVMPPC to improve the performance of the emulated Mac. Right now we're actually fooling the operating system; we're not really a G4. In fact, the closest Power Mac relative of the T2's POWER9 is the G5, i.e., the PowerPC 970, which is essentially a POWER4 with some modifications for workstation duty and a bolted-on AltiVec unit. Telling the OS we're a G4 doesn't change the attributes of the CPU, in particular (for this case) which instructions it does and does not support and how certain others are handled.

With "big POWER", IBM removed some of the PowerPC instructions that were infrequently used or scaled badly, such as dcba and mcrxr. You don't need to know what these do; just know that some software used them, but as of the G5 they no longer exist in hardware. Additionally, the G5 and later big POWER designs (including the POWER9) have a 128-byte cache line instead of the 32-byte cache line of the G3 and G4, which matters for the dcbz instruction because it zeroes an entire cache line and potentially spills it to memory. OS X has adaptations for dealing with these cases (an illegal instruction handler that simulates the missing instructions in software for the first case, and modified system routines for the second), but they only kick in if OS X knows the machine is a G5. Here it doesn't, so these adaptations are never installed.
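To see why the line size matters, consider which bytes a single dcbz touches: it clears the whole aligned cache block containing the effective address, so the same instruction clears 32 bytes on a G4 but 128 bytes on a G5 or POWER9. A minimal sketch of that address arithmetic; dcbz_block is a hypothetical name and the address is an arbitrary example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the byte range cleared by dcbz at ADDR for a given
# cache line size (dcbz zeroes the whole aligned block containing ADDR).
# Usage: dcbz_block ADDR LINE_SIZE
dcbz_block() {
  local addr=$(( $1 )) line=$2
  local start=$(( addr & ~(line - 1) ))
  printf '0x%X-0x%X\n' "$start" $(( start + line - 1 ))
}

# Same instruction, same address, very different results:
#   dcbz_block 0x1040 32    -> 0x1040-0x105F  (what G4 code expects)
#   dcbz_block 0x1040 128   -> 0x1000-0x107F  (what a G5 or POWER9 clears)
```

Code written for a G4 that zeroes a buffer 32 bytes at a time would, on the bigger line, also wipe the 96 bytes around each block, which is why the instruction can't simply run natively.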

KVM-PR gets around the dcbz problem on later POWER designs, including the POWER9, by scanning every new code page in a 32-bit guest for the dcbz instruction and replacing it with an illegal one it can detect. (Remember, dcbz is still a legal instruction on these CPUs; it just behaves differently.) When the replacement executes, it faults back into KVM-PR, which simulates a 32-byte dcbz in software and returns control to the guest. It's not a surprise that this process is quite slow, especially if it gets called in a loop. Unfortunately, Apple does exactly that for clearing memory: the instruction is a major portion of the OS's built-in implementation of bzero, which memset also calls. This is a hot routine and needs to run fast. The G5 version knows about the cache line difference and accounts for it; the G4 and G3 versions don't, and we're using the G4 version.
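The scanning step boils down to recognizing dcbz's encoding. dcbz rA,rB is an X-form instruction with primary opcode 31 and extended opcode 1014, so every variant matches 0x7C0007EC once the register fields are masked off. This is only an illustration of that matching over a list of instruction words, written in shell; it is not KVM-PR's actual code, and the illegal instruction KVM-PR substitutes is not shown here.

```shell
#!/usr/bin/env bash
# Illustrative only: find dcbz instructions in a list of 32-bit words.
# dcbz rA,rB = opcode 31 (0x7C000000) | rA<<16 | rB<<11 | (1014<<1 = 0x7EC)
DCBZ_MASK=0xFC0007FE   # keep primary and extended opcode, drop register fields
DCBZ_BITS=0x7C0007EC   # (the mask also ignores the field holding the G5's dcbzl bit)

is_dcbz() {
  (( ($1 & DCBZ_MASK) == DCBZ_BITS ))
}

# Print the index of every dcbz in the word list, as a code-page scan would
scan_dcbz() {
  local i=0 w
  for w in "$@"; do
    if is_dcbz "$w"; then
      echo "$i"
    fi
    i=$(( i + 1 ))
  done
}
```

For example, scan_dcbz 0x60000000 0x7C0327EC 0x4E800020 (nop, dcbz r3,r4, blr) reports index 1.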

Apple, however, also helped us out here a little bit by making it possible to predict where the routine is. This and other major components live in a section of memory called the "commpage," which is always located in the top eight pages of the 32-bit address space in every process. The kernel provides it as an optimization for fast access to important data and common routines. The bzero routine is virtually unchanged from 10.3 to 10.4, and both versions start with a distinctive instruction (cmplwi cr7,r4,32). If we see this instruction in the commpage, we can be confident we have found bzero. And now that we've found it, we can modify it.
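Both facts reduce to simple arithmetic. Eight 4 KB pages at the top of a 32-bit address space start at 2^32 - 8 × 4096 = 0xFFFF8000, and cmplwi cr7,r4,32 (that is, cmpli with L=0) has one fixed encoding, so matching it is a single 32-bit comparison. A sketch in shell; commpage_base, in_commpage and encode_cmplwi are hypothetical names, and the 4 KB page size is an assumption about the 32-bit PowerPC OS X setup.

```shell
#!/usr/bin/env bash
# Where the 32-bit commpage starts: the top eight 4 KB pages below 4 GB.
commpage_base() {
  printf '0x%X\n' $(( (1 << 32) - 8 * 4096 ))
}

# True if ADDR falls inside the commpage
in_commpage() {
  local addr=$(( $1 )) base=$(( (1 << 32) - 8 * 4096 ))
  (( addr >= base && addr < (1 << 32) ))
}

# Encode cmplwi BF,RA,UI, i.e. cmpli BF,0,RA,UI (D-form, primary opcode 10):
#   opcode<<26 | BF<<23 | L<<21 | RA<<16 | UI
encode_cmplwi() {
  local bf=$1 ra=$2 ui=$3
  printf '0x%08X\n' $(( (10 << 26) | (bf << 23) | (0 << 21) | (ra << 16) | (ui & 0xFFFF) ))
}
```

encode_cmplwi 7 4 32 prints 0x2B840020, the word a commpage scan would compare against the first instruction of bzero.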

Recall that KVM-PR must scan each new executable code page for the instruction and change it. We can alter KVM-PR to detect that distinctive leader instruction when it's mapping in the commpage, and then monkeypatch in a new routine that doesn't use dcbz and thus won't require slow simulation. To make this more reliable, since we know where the routine should be, we only patch it if the leader instruction is actually there. As a bonus, we also turn dcba into a nop anywhere in an executable section so that it doesn't need a trip to a special handler either. That is what this patch does.
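The dcba half of the idea is the simplest to picture. dcba (primary opcode 31, extended opcode 758, so 0x7C0005EC with the register fields cleared) is only a cache allocation hint, so replacing it with nop (ori 0,0,0, encoded 0x60000000) loses nothing but the hint. The sketch below applies that substitution to a list of instruction words; it illustrates the idea only, is not the actual KVM-PR patch, and patch_words is a hypothetical name.

```shell
#!/usr/bin/env bash
# Illustrative only: rewrite dcba to nop in a list of instruction words.
DCBA_MASK=0xFC0007FE   # primary + extended opcode; ignore rA/rB
DCBA_BITS=0x7C0005EC   # 31<<26 | 758<<1
NOP=0x60000000         # ori 0,0,0

patch_words() {
  local -a out=()
  local w
  for w in "$@"; do
    if (( (w & DCBA_MASK) == DCBA_BITS )); then
      out+=("$(printf '0x%08X' $(( NOP )))")
    else
      # Everything else (including dcbz) is left alone
      out+=("$(printf '0x%08X' $(( w )))")
    fi
  done
  echo "${out[*]}"
}
```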

Building KVMPPC with this patch uses the same steps we discussed for building and installing the kernel modules in part 1. This patch also applies with -p1.

Does it make a difference? You bet it does. On my system with Geekbench 32-bit on Mac OS X 10.4.11, it improved the overall benchmark by nearly 200 points over the unpatched version, almost all of it in (no surprise) the memory score.

This consequence of masquerading as a different CPU also carries over into which software you can run. Even though the emulated Mac reports a G4, you actually have to run the G5 version of TenFourFox, which avoids the other illegal instructions that aren't patched (just be patient: it will take TenFourFox almost a full minute to come up). If your software offers a G5 version, you should run that if you can. The discontinuity leads to amusing discrepancies like this one.

Interestingly, under TCG on POWER9, TenFourFox's JIT actually had errors during SunSpider that it doesn't have under KVMPPC, and even after warmup TCG was up to twice as slow as KVM at SunSpider. Go TenFourFox!

You'll find that performance is still fairly pedestrian even with KVMPPC. While the OS typically benchmarks my T2 as a "2.04GHz G4" (TCG usually gets computed as somewhere between "900 MHz" and "1.0GHz"), the actual throughput you get varies greatly with workload. Raw CPU performance is a bit better than my Quad G5's scores running single core in Reduced mode, though the Quad running full tilt easily surpasses it (the emulation overhead is only reduced, not eliminated). In applications the numbers differ a lot depending on how the workload is structured. For example, TenFourFox's G5 JIT in KVMPPC gets about 6800ms in SunSpider compared to around 3800ms on a "real" 1GHz iMac G4. Improving these numbers to reach parity, and especially getting QEMU to support SMP, will need to be an area of active future development.

Lastly, I promised to discuss the best way to run OS 9 on a Talos. Although limited to TCG, it's still pretty snappy, a testament to Mac OS 9's comparatively low system requirements. Mac OS 9 works better with the PMU than with CUDA (otherwise the mouse doesn't reliably respond to double-clicks) and is limited to 1.5GB of RAM. It also doesn't support the QEMU USB tablet, but it does support the RTL8139 with this driver. To get the driver installed, I just made an ISO image out of it, dropped the driver into the Extensions folder and rebooted. My command line looks like this:

qemu-system-ppc -M mac99,accel=tcg,via=pmu -m 1536 -boot c -drive file=classic.img,format=qcow2,l2-cache-size=4M -usb -netdev tap,id=mynet0,ifname=tap0,script=no,downscript=no -device rtl8139,netdev=mynet0 -rtc base=localtime

Mac OS 9 expects a different real-time clock base (local time rather than UTC), so this command line has an additional -rtc base=localtime option. You can use any CPU you want since it's fully emulated; I just use the default G4 7400 here instead of specifying one.

Post questions or things you've discovered in the comments.

GlobalFoundries stops all 7nm development


As reported in AnandTech, GlobalFoundries, which includes the former chip manufacturing foundries of AMD and, notably for our reporting, IBM, has scuttled its 7nm process roadmap. Instead, the company will be concentrating on its 14nm and 12nm FinFET technology, including the 14nm FinFET process that GlobalFoundries uses to manufacture the POWER9.

The POWER10, scheduled for 2020 in the IBM product roadmap, is supposedly being designed on a 10nm process. Assuming IBM doesn't redesign the POWER10 for 12nm, their other options are Intel (unlikely), TSMC or Samsung, all of whom have 10nm processes. POWER11 was planned for 7nm, but has no timeframe. Meanwhile, in a possible sign of what's to come, AMD has moved to TSMC.

Making your Talos II into a Power Mac: KVMPPC for POWER9 (part 1)


Talospace is a spinoff from the TenFourFox Development blog, which, for those unfamiliar with it, covers TenFourFox, a Firefox fork maintained for Power Macs running Mac OS X 10.4 and 10.5. It shouldn't be a surprise that the common architecture was a big plus for me: it's possible to run OS X with reduced emulation overhead on the processor using the same Kernel-based Virtual Machine (KVM) scheme used for virtualization on other platforms.

Emulation is of course just one of the things we old Mac users would like working properly on the new Power hotness. The other is the damn Command key working like it's supposed to. We'll address that pain point in another "First Person" post coming soooooon.

Anyway, a brief digression before we begin, for those unfamiliar with how KVM works on Power ISA. KVMPPC comes in two flavours, KVM-PR ("PRoblem") and KVM-HV ("HyperVisor"); both work in big and little endian modes. KVM-HV is the more modern of the two and the most technically like hypervisors on other architectures. It uses the hardware support in later Power ISA CPUs, so it's overall faster, particularly when many supervisor-level instructions must be executed. However, it cannot be nested (you can't run a KVM-HV guest inside a KVM-HV guest, though you can run a KVM-PR guest; more on that in a moment), and most importantly, it supports only virtualizing the same processor generation or the one immediately prior. Since no version of OS X ran on a POWER8 (let alone a POWER9), we won't be dealing with it further for the purposes of this article.

That brings us to KVM-PR. Unlike KVM-HV, KVM-PR runs the guest strictly in user mode, or what IBM docs refer to as the "Problem State." KVM-PR itself is a kernel module, so it's not in userspace, but it does not depend on the hardware support that powers KVM-HV, so the guest only ever executes user-level instructions directly. That means KVM-PR must trap and emulate supervisor-level instructions on behalf of the guest, which is much slower. However, KVM-PR can also emulate other instructions and their desired behaviour, which theoretically allows it to act like any supported Power ISA or PowerPC CPU, including a G3, G4 or G5. Instructions which aren't supported natively are trapped and executed just like supervisor-level instructions, and everything else can still run on the metal. Because the guest runs in user mode, KVM-PR can be nested (a KVM-PR guest can run inside of another KVM-PR guest, as well as inside a KVM-HV guest). KVM-PR was the original method of virtualization on PowerPC Linux, descending from the venerable old Mac-on-Linux project (which had its own peculiar hypercalls), and a specialized form of this method is how OS X runs Classic on 10.4 and earlier. This is the method we will use here.

Let's first talk about whether KVM is the way you want to go. For our Power Mac hardware emulation, we will use QEMU, which can use KVM (and KVMPPC) to accelerate the processor, and QEMU provides the rest of the platform. QEMU provides two platform profiles, g3beige, a Gossamer Beige Power Mac G3, as the name implies, and mac99, essentially a Sawtooth G4. We will only be using the mac99 platform since it provides the best combination of flexibility and compatibility.

QEMU also provides emulated USB devices. The most useful to us is the USB tablet, which lets QEMU detect when the mouse is within the QEMU window without having to grab it and makes using the emulator a lot more seamless. Unfortunately, the USB tablet is only supported by 10.3 Panther and up. No version of PowerPC Mac OS currently supports VirtIO devices either, so there is no graphics or disk acceleration. On the other hand, QEMU does provide an emulated RTL8139 network card, for which drivers are available for Mac OS 9 through 10.2 Jaguar and built into 10.3 Panther and up, and with tun/tap it runs with decent throughput. Sound is best described as a work in progress, and graphics work but are basically a dumb framebuffer. Still, this is enough to get the OS off the ground and be useful.

KVMPPC does not work in all situations with QEMU. Most notoriously, it cannot boot Mac OS 9, Rhapsody, 10.0 or 10.1, at least from disc. I've done some work on improving this, and it gets mostly through the nanokernel startup in OS 9 but doesn't get any further yet. For these operating systems you will currently need to use TCG, QEMU's software CPU emulator, which runs by default if you don't ask for KVM. TCG does have JIT acceleration, and the JIT supports Power ISA hosts, so while it's definitely slower it's at least somewhat better than it sounds. TCG also tends to run a little smoother than KVM since it's all within a user process, but compute-intensive tasks can run up to an order of magnitude slower. TCG is also involved if you run a completely alien inferior architecture like x86.

KVMPPC also tends to be problematic with heavy I/O loads. TCG can be noticeably faster when running installers, for example, or anything that involves substantial emulated disk access. This is probably due to the large amount of supervisor-level code that incurs a speed penalty with KVM-PR. I had better luck and faster install times installing things with QEMU using TCG, then shutting down and rebooting in QEMU with KVM to actually use them.

Finally, KVMPPC can currently mimic only certain processors. G3 works for every system, and Nitro (G4 7410) works for most of them, but right now that's all. None will boot in KVMPPC with any G4 7450-series processor, and trying to start KVMPPC in 64-bit mode to emulate a G5 currently crashes my Talos. There is also no support for SMP, so our monstrous multi-core beasts will only present one CPU to the emulated OS. The processor you choose doesn't necessarily change the underlying vagaries of the architecture, though, which will also be discussed in the next part.

Some specific notes on individual versions of Mac OS:

  • Mac OS 9, Rhapsody, 10.0 Cheetah and 10.1 Puma do not currently boot on KVMPPC, at least not from CD. They also don't support the USB tablet, so you must click in the window to grab the mouse and keyboard, and hit Ctrl-Alt-G to release the grab to do something else. Rhapsody can be notoriously hard to install and requires multiple steps which I won't discuss here. For OS 9 I'll talk about a couple of glitches with QEMU in Part 2, since many of you will still want to run it even though there is no CPU acceleration.

  • 10.2 Jaguar has various problems in KVMPPC, though it does work. Finder windows tend to glitch and not fully load when you double-click folders and devices on the desktop. Classic does not work in 10.2 with KVMPPC and aborts with a bus error. 10.2 also does not support the USB tablet, so you need to grab the mouse as with OS 9.

  • 10.3 Panther and 10.4 Tiger both run well in KVMPPC. Later on we'll talk about a specific optimization to the operating system "commpage" to make them run even better. 10.3 runs better than 10.4, but 10.4 has better compatibility. Both support the USB tablet and have built-in support for the RTL8139 NIC. Classic will boot and run in both, but is noticeably slower than on a real machine (this is true of both TCG and KVM), though Classic is somewhat faster in Panther.

  • 10.5 Leopard appears to work fine in KVMPPC. It supports everything that 10.3 and 10.4 do, though I haven't done the particular commpage optimization for 10.5 yet because I don't use, nor particularly like, Leopard personally. 10.5 obviously does not support Classic.

You'll need to do some preparation to get your Talos II to be an accelerated Mac with KVMPPC (this isn't needed if you're going to use TCG since it's purely userspace). The first is that you need to make sure your T2's MMU is in hash table mode, used by POWER8 and earlier CPU generations. The POWER9 introduces a new MMU mode called radix mode, but without going into the gory technical details, the particular memory mapping characteristics of radix mode mean certain tracts of memory cannot be properly manipulated by KVM-PR. All OSes that support the POWER9 in radix mode will support it in hash table mode. For Linux, just add disable_radix to your kernel command-line arguments. For my Fedora workstation, I just put it into GRUB, regenerated the configuration, and rebooted. If you did this right, dmesg will show a line like this:

[    0.000000] hash-mmu: Initializing hash mmu with SLB

You shouldn't see any mention of radix mode.
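To check the mode from a script rather than reading dmesg, you can look at the MMU line powerpc kernels report in /proc/cpuinfo (Hash or Radix) and confirm disable_radix is on the kernel command line. This sketch assumes that cpuinfo format; mmu_mode and radix_disabled are hypothetical helpers. On Fedora, grubby --update-kernel=ALL --args=disable_radix is one alternative to editing the GRUB configuration by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the MMU mode from a cpuinfo file.
# Usage: mmu_mode [CPUINFO]   (defaults to /proc/cpuinfo)
mmu_mode() {
  local file=${1:-/proc/cpuinfo}
  # powerpc kernels print a line like "MMU : Hash" or "MMU : Radix"
  awk -F: '/^MMU[[:space:]]*:/ { gsub(/[[:space:]]/, "", $2); print $2; exit }' "$file"
}

# True if disable_radix was passed on the kernel command line
radix_disabled() {
  grep -qw disable_radix "${1:-/proc/cmdline}"
}
```

If mmu_mode prints Hash and radix_disabled succeeds, the kernel is ready for KVM-PR.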

The next step may be to download a copy of the kernel source code. If you have kernel 4.17.x or earlier (as my Fedora 28 system does), you will need to apply patches to the KVMPPC kernel modules in that version to even get them to start. If you have kernel 4.18.x and up, the necessary patches should already be present for basic functionality, but you may still want to get the kernel source for some of the hack optimizations in this post that aren't (and probably won't ever be) included by default.

Let's assume for didactic purposes that you do need to patch the KVMPPC kernel module that comes with your distro. We will talk about adding the hacks to it a little later. This will taint your kernel. If your system behaves strangely and you are unable to unload the module, you may need to reboot.

  1. Download and unpack the source archive, and cd into the root of the unpacked source archive.
  2. Download this patch (written by yours truly) into the root of the source archive.
  3. patch -p1 < that_patch.diff
  4. cp /your/kernel/config .config (assuming you're still in the source archive)
  5. make -j24 modules (or if you're one of the lucky scum with more cores, adjust as appropriate; I have two of the 4-core CPUs for 32 threads, but I like to leave some threads free)
  6. When the make has run to completion, edit include/generated/utsrelease.h to make sure it matches what appears in uname -r, or your kernel may refuse to load the module.
  7. Regenerate the KVMPPC modules with the matching string: make -j24 SUBDIRS=arch/powerpc/kvm
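Step 6 can be scripted: the generated header holds a single #define UTS_RELEASE line, so a sed substitution can set it to the running kernel's uname -r before step 7. A sketch, assuming GNU sed (for -i) and that you run it from the root of the source tree; set_uts_release is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make include/generated/utsrelease.h match a kernel release.
# Usage: set_uts_release [RELEASE] [HEADER]   (defaults: uname -r, the generated header)
set_uts_release() {
  local release=${1:-$(uname -r)}
  local header=${2:-include/generated/utsrelease.h}
  # Replace the whole #define line, keeping the quotes around the release string
  sed -i "s|^#define UTS_RELEASE .*|#define UTS_RELEASE \"${release}\"|" "$header"
}
```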

Now you can load your custom modules:

cd arch/powerpc/kvm && sudo modprobe -r kvm_hv kvm_pr kvm && sudo insmod kvm.ko && sudo insmod kvm-pr.ko

and you should see something like this in dmesg:

[22198.130998] kvm: loading out-of-tree module taints kernel.
[22198.184535] kvm: module verification failed: signature and/or required key missing - tainting kernel

If you actually got an error message, you loaded the wrong thing, or you possibly forgot the patch (earlier KVMPPC versions won't even start KVM-PR on a POWER9).

If you already have the patches, chances are your OS already loaded KVM-PR. You can check this with lsmod. If it didn't, and trying to load it with sudo modprobe kvm_pr doesn't work, you may need to also patch your kernel modules with the steps above. On the other hand, if you see both kvm_pr and kvm_hv listed, do a sudo modprobe -r kvm_hv (unless you really do need it) to limit your system to KVM-PR and help to simplify the remaining steps in this article.
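The lsmod check can also be scripted by reading /proc/modules, which lists one loaded module per line with its name first. A sketch; kvm_flavours is a hypothetical helper, and the optional file argument exists only so it can be tried against a saved copy.

```shell
#!/usr/bin/env bash
# Hypothetical helper: which KVMPPC flavours are loaded?
# Usage: kvm_flavours [MODULES_FILE]   (defaults to /proc/modules)
kvm_flavours() {
  local file=${1:-/proc/modules}
  local -a found=()
  local mod
  for mod in kvm_pr kvm_hv; do
    # The module name is the first field of each line
    if awk -v m="$mod" '$1 == m { ok = 1 } END { exit !ok }' "$file"; then
      found+=("$mod")
    fi
  done
  echo "${found[*]:-none}"
}
```

If it prints kvm_pr kvm_hv, that's the case where unloading kvm_hv simplifies things.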

The third step is to install QEMU. QEMU 3.0 is strongly advised; if your package source doesn't have it, then download and compile from source (and you get to do -O3 -mcpu=power9 anyway for great justice). Although QEMU 2.12 will mostly work for these examples, many bugs and edge cases were fixed in the Mac hardware emulation, and some bugs can't be worked around easily any other way. You may also have to remove some command line options from the examples below that were not supported in 2.12.

Create your base disk image according to the QEMU instructions and get out your OS X disc. I saw little value in using a raw disk image, and it was substantially larger than a qcow2 image, so I'd just use qcow2. We'll assume this is your chosen format for the remainder of this series.
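For those building QEMU 3.0 themselves, building only the 32-bit PowerPC system emulator keeps the build short, and qemu-img creates the qcow2 image. This sketch assumes you're in an unpacked QEMU source tree; the 20G size and tigerhd.img name are only examples, and build_qemu, make_disk and the DRY_RUN switch are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; set DRY_RUN=1 to print the commands instead of running them.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Build just qemu-system-ppc (the ppc-softmmu target), tuned for POWER9.
# Usage: build_qemu [JOBS]   (run from the root of the QEMU source tree)
build_qemu() {
  local jobs=${1:-24}
  run ./configure --target-list=ppc-softmmu "--extra-cflags=-O3 -mcpu=power9" &&
    run make -j"$jobs"
}

# Create a blank qcow2 disk image; qcow2 only grows as the guest writes to it.
# Usage: make_disk FILE SIZE   (e.g. make_disk tigerhd.img 20G)
make_disk() {
  run qemu-img create -f qcow2 "$1" "$2"
}
```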

In part 2, we'll talk about how to get all this actually booting.

Intel: Benchmark our product and we'll sue


Making the rounds on the web: discovered in Intel's new firmware EULA, "You will not, and will not allow any third party to ... (v) publish or provide any Software benchmark or comparison test results."

It's probably unenforceable to limit what is essentially a product review of their CPUs. Probably. Or, just don't buy CPUs from a company that's willing to put junk like that in their EULAs in the first place.

Update: Intel has recanted.

CentOS announces support for POWER9


On the CentOS mailing list, CentOS 7 (1804) now supports POWER9 (running little-endian). This is not a surprise, as your humble author is typing this blog post on a Talos II in Fedora 28, but it's good to see the support trickling into downstream distros. The POWER9 version uses kernel 4.14 and is described as otherwise identical to the POWER8 distribution, which also runs little-endian. Download it from the CentOS mirror server.