Showing posts from 2020

Vikings' upcoming OpenPOWER retail channel

Many Talospace readers are familiar with Vikings, who offer libre hosting as well as hardware sales for libre-friendly devices and systems and peripherals certified by the FSF Respects Your Freedom program (for which Raptor systems qualify). However, the Vikings' storefront now shows a new tab for OpenPOWER hardware, hopefully a public demonstration of a new retail channel coming soon for those ready to pull the trigger on an OpenPOWER workstation or server of their own. This is of particular value to our readers outside North America, since this gets around a lot of the inconveniences of shipping and payment with United States businesses; Vikings is based in Germany, and accepts payments in euros, US dollars, British pounds, Australian dollars and New Zealand dollars. We have also already heard that Vikings is working on a water-cooling system for POWER systems with an aim to reach the market in two months or less, a great option for people trying to run the 18- and 22-core parts in desktop environments (current BTO cooling options are air-cooled only).

It is not yet known whether Vikings will sell full systems, parts and/or processors, whether the lineup will include OpenPOWER systems other than Raptor workstations and servers, or when general availability is expected. Still, the more retail options there are, the greater the volume of sales and the greater the economies of scale that will result. In the end, that can only be a good thing for growing our niche but very important market.

Will it build?

While I will always be big-endian at heart, ppc64le does get around a lot of the unfortunately pervasive endian assumptions in a cold blackboxed x86_64 world, and even things like MMX, SSE and SSSE3 can be automatically translated in many cases. It is therefore a happy result that even many software packages completely unaware of ppc64le will still build and function out of the box, assuming they don't do silly things like emit JITted x86_64 assembly code and try to run it, etc.

I ran across this project the other day which has over 1,000 build scripts for ppc64le (as shell scripts and/or Docker files) that you can either use directly, or as a hint whether your intended build will even work. Cursorily paging through a few I see IBM E-mail addresses, so no surprise much of it is tested on Red Hat (though largely RHEL 7.x), but there are Ubuntu scripts there as well and I imagine they'd accept other distros. Keep in mind that this is generic ppc64le, so it would work on POWER8 and up but without any special optimizations (for example, I always build optimized Firefox at -O3 -mcpu=power9), and the concentration favours server-side packages over workstation and client software. I also see relatively few platform-specific corrections, which could be either good (they weren't needed) or bad (they weren't tested). Still, it's nice to see more resources to aid porting and platform compatibility, and that can only in turn get more packages thinking about making ppc64le (and hopefully ppc64) a first-class citizen too.

Linux 5.8 on POWER

The 5.8 kernel milestone has arrived, with improvements to reduce thrashing (though with the amount of memory even a Blackbird can hold, there's no excuse not to load these suckers up), an API for receiving notifications of kernel events, support for hardware-assisted inline encryption at the block layer for storage devices and a nice convenience feature where you can put sysctl.something.or.other=999 right on the kernel command line.

On the Power ISA side, this kernel adds the first support for POWER10 and ISA 3.1, although our Raptor contacts have indicated some displeasure with IBM's management decisions and we suspect this is a way of saying firmware binary blobs might be required to enable maximal performance (though we don't know, and it's unclear how much is under NDA). Another nice feature is an ioctl to send gzip compression requests directly to POWER9's on-chip compression hardware via /dev/crypto/nx-gzip. This is part of the general family of Nest Accelerators (NXes) accessible through the Virtual Accelerator Switchboard. More about that in a later article, but in the meantime while we wait for compressors to add this support, here's an accelerated power-gzip userspace library that directly replaces zlib.

Finally, in addition to various improvements for the 40x and 8xx series, the most interesting commit was around prefixed instructions. These represent the first 64-bit instructions in the Power ISA (here's a code sample to show you the encoding) and allow much bigger 34-bit displacements for load-store operations than the 16-bit ones in current 32-bit instructions. I'm not too wild about the fact this makes Power ISA technically variable-length, but these D-form instructions are easy to identify and they are always 64 bits in size, and they should make certain types of code generation a lot simpler on chips that support them.
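To make the encoding a little more concrete, here is a minimal Python sketch of how one prefixed instruction, paddi, splits its immediate across the two words. This is my reading of the ISA 3.1 layout rather than anything taken from the kernel commit, and the function names are made up for illustration: the prefix word carries primary opcode 1 plus the upper 18 bits of the immediate, and the suffix word looks like an ordinary addi carrying the lower 16.

```python
def is_prefix_word(word):
    """A 32-bit word with primary opcode 1 starts a 64-bit prefixed instruction."""
    return (word >> 26) == 1


def encode_paddi(rt, ra, si, pc_relative=False):
    """Return the (prefix, suffix) words for paddi RT,RA,SI (illustrative helper).

    Assumed layout: prefix = opcode 1, type 2 (MLS), R bit, upper 18 bits of SI;
    suffix = opcode 14 (the same as addi), RT, RA, lower 16 bits of SI.
    """
    if not -(1 << 33) <= si < (1 << 33):
        raise ValueError("immediate does not fit in 34 bits")
    si &= (1 << 34) - 1                  # 34-bit two's complement
    si0, si1 = si >> 16, si & 0xFFFF     # 18 high bits, 16 low bits
    prefix = (1 << 26) | (2 << 24) | (int(pc_relative) << 20) | si0
    suffix = (14 << 26) | (rt << 21) | (ra << 16) | si1
    return prefix, suffix


if __name__ == "__main__":
    for word in encode_paddi(3, 0, 1):
        print(f"{word:08x}")
```

Since every prefix word has the same primary opcode, a disassembler only has to check the top six bits of a word to know whether it should consume the following word as well.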

Condor cancelled

Raptor has confirmed that, unfortunately but not unexpectedly, the LaGrange-based Condor that was announced at the OpenPOWER summit last year has been cancelled due to economic concerns. Certainly any new high-end product would be tough to launch in the present COVID-19 economy, and because its size (ATX) and capabilities (single CPU, OpenCAPI, four slots) would have slotted it between the Talos II and the Talos II Lite in our view, there just isn't a lot of slack not served by those two existing products to soak up. It's probably just as well because I think getting ready for POWER10 would mean more to many users (it certainly would to me), but that itself requires a lot of R&D capacity and Raptor's a small company. Rather than a niche POWER9 design, here's hoping the resources that would have gone to Condor will go to a really kick-a$$ new Rainier-based system instead.

Firefox 79 on POWER

Firefox 79 is out. There are many new web and developer-facing features introduced in this version, of which only a couple are of note to us in 64-bit PowerPC land specifically. The first is a migration of WebExtensions storage to a new Rust-based implementation; there was a bit of a pause while extension storage migrated, so don't panic if the browser seems to stall out for a few long seconds on first run. The second is a further rollout of WebRender to more Windows configurations, so this seemed like a good time to me to check again how well it's working on this side of the fence. With the Raptor BTO WX7100 installed in this Talos II, I've forced it on with gfx.webrender.enabled and layers.acceleration.force-enabled both set to true (restart the browser after) and worked with it all afternoon with no issues noted, so this time I'm just going to leave it on and see how it goes. Any GCN-based AMD video card from Northern Islands on up (the WX7100 is Polaris) should work. about:support will show you if WebRender and hardware acceleration are enabled, though currently no Linux configuration has it enabled by default.

Unfortunately, it turns out relatively few of us build the browser ourselves from source like I do, and it seems some distros are enabling features (most likely higher-level optimizations) that trigger broken builds on ppc64le (Ubuntu was mentioned by at least one user). It would be nice to narrow down the offending feature(s) they enabled, both to get local fixes into the distro package configurations and then to look at why they don't work (or make the default not to enable them on our platform, solving the problem in both places). I suspect LTO and PGO are to blame, which have a long history of being troublesome, as well as various defects in gold (use GNU bfd as the linker instead). Meanwhile, the build I'm typing this blog post into locally is still happily running on the same .mozconfigs from Firefox 67.

The littlest POWER9 booter

In our previous article we talked about emulating an OpenPOWER system from Skiboot up through the Petitboot boot menu using extracts from pre-built PNOR firmware images (and/or QEMU) instead of having to build your own. Well, what if you want to build your own?

You can certainly download and build Skiboot and Skiroot/Petitboot from scratch, or naturally any of the firmware stages in PNOR flash since we're a fully open platform, and there is an entire (huge) build system to automate this process. It's big and intimidating to the uninitiated, and it also works just dandy. But for this simpler example, let's start with something a little smaller which can serve as an educational tool as well.

Recall that Skiboot is the lowest level emulated by QEMU presently, although in reality it is an intermediate phase started by an earlier boot stage, i.e., Hostboot (the pretty graphical boot you see in current versions of the Raptor firmware). Among other tasks, Skiboot's most important one is to offer the services provided by OPAL, the OpenPOWER Abstraction Layer, which the operating system will need to talk to the hardware. These services range from shutting down the machine to writing to the console, starting interrupts, handling PCI devices and probably not doing your dishes. After OPAL is initialized, Skiboot then starts the bootloader for Petitboot, which unpacks Petitboot's Linux kernel and an initrd (i.e., a zImage containing Skiroot), and that image is what ultimately brings up Petitboot.

However, when you get right down to it it's still just an ELF binary, so we can replace it as long as we understand how Skiboot calls and starts it.

Up to this point the CPU is in big-endian mode no matter what the terminal operating system is (as an old Power Mac user, this warms my grizzled cybernetic heart) and uses real physical memory addresses. When Skiboot finishes, it loads the single ELF binary stored in the PNOR flash partition BOOTKERNEL and runs it from its given entry point. This binary can be big-endian or little-endian. Skiboot also provides the binary the location of the flattened device tree (the FDT) in register r3, and two special addresses: the base address for OPAL in r8 (in physical memory, mind you), and the actual address to call for OPAL services in r9. This is more or less what kexec() does for a regular kernel, except those registers are guaranteed to be provided by Skiboot no matter the implementation.

OPAL calls assume the machine will stay that way (big-endian, real addresses, and also no external interrupts), so some leg work is required unless you just keep the system that way in the first place. In this simplest case, we'll do exactly that: the Skiboot source code even includes such a minimal boot image which simply says "Hello World!" to the console and shuts down the machine. Here, we see the code save the OPAL registers to non-volatile ones (so that calling OPAL won't clobber them) and use those to make the two OPAL calls themselves, setting the OPAL call number in r0, providing the OPAL base in r2 and any relevant arguments in the standard r3 through r10 registers, and then calling the OPAL entry point.
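As a quick illustration of that calling convention, here is a toy Python model that just records which register gets what. Nothing here talks to real firmware: the function name is invented, and the two token values are what I believe Skiboot's OPAL API header uses for the console write and power down calls, so treat them as assumptions and check them against your Skiboot tree.

```python
# Assumed token numbers; verify against your Skiboot source before relying on them.
OPAL_CONSOLE_WRITE = 1
OPAL_CEC_POWER_DOWN = 5


def opal_call_registers(token, opal_base, *args):
    """Return the register assignments made just before jumping to the OPAL entry point.

    r0 holds the OPAL call number, r2 the OPAL base (handed to us in r8 at boot),
    and r3 through r10 hold up to eight arguments. The entry point itself is the
    address Skiboot handed over in r9.
    """
    if len(args) > 8:
        raise ValueError("only r3 through r10 are available for arguments")
    regs = {"r0": token, "r2": opal_base}
    for number, value in enumerate(args, start=3):
        regs[f"r{number}"] = value
    return regs


if __name__ == "__main__":
    # Hypothetical addresses purely for demonstration.
    print(opal_call_registers(OPAL_CONSOLE_WRITE, 0x30000000, 0, 0x1000, 0x2000))
    print(opal_call_registers(OPAL_CEC_POWER_DOWN, 0x30000000, 0))
```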

Let's see it in (brief) action. I will assume you already have QEMU set up to emulate an OpenPOWER machine as in the prior article (in particular, you should have either pnor.PAYLOAD or skiboot.lid available to provide Skiboot). To save you having to do so yourself, I added a little linker-assembler glue, some extra code to support both endian modes (more in a moment) and a trivial build system, and put it up on GitHub. If you're on an OpenPOWER system, as all right-thinking readers should be, then make should be sufficient to compile both the big and little endian versions, the latter of which I will come back to. If you are not, you will need a cross-building toolchain and should edit the Makefile to point to it.

Using what we learned last time, once you've run make, copy be_payload.elf into the same directory as skiboot.lid (QEMU's emulation doesn't work quite right with Raptor's PNOR Skiboot for this purpose), and let's kick it off:

qemu-system-ppc64 -M powernv8 -cpu power8 \
-nographic \
-bios ./skiboot.lid \
-kernel ./be_payload.elf


Now, what about the little-endian case? This is trickier, because the system starts big-endian and expects big-endian instructions, and simply twiddling the endian bit in the Machine State Register isn't enough (if you do so via typical means like mtmsrd, it is ignored). In fact, only three instructions are allowed to change endianness, namely rfid, its hypervisor analogue hrfid and rfscv, which are all returns from privileged code (interrupt handlers and vectored system calls respectively). Vectored system calls, in fact, weren't even supported in the Linux kernel until 5.9. For our purpose here rfid will suffice.

Let's look at the version of hello_kernel.S I marked up. You will notice that in little endian mode, we are assembling several handwritten opcodes immediately in the macro GO_LITTLE_ENDIAN. These are big-endian instructions (since we're little-endian we can't specify the instructions directly) that set the link register after this little stanza, copy over the MSR and toggle the endian bit, load the link register and the new MSR into the save-restore registers and then act as if we returned from an interrupt handler (rfid). rfid sets the new MSR and jumps to the link register which we have already rigged to be the following instruction. We now continue in little-endian mode.
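If the handwritten opcodes look odd, this small Python sketch shows the idea under stated assumptions: an assembler in little-endian mode writes each 32-bit value least significant byte first, so to get the bytes the still-big-endian CPU expects, the instruction word has to be given byte-swapped. The helper names are mine, not anything from hello_kernel.S; the rfid encoding (primary opcode 19, extended opcode 18) and the position of the LE bit (the least significant bit of the MSR) are the only hardware facts used.

```python
import struct

RFID = 0x4C000024   # rfid as a big-endian instruction word
MSR_LE = 0x1        # MSR[LE] is the least significant bit of the MSR


def as_le_long(insn):
    """Value to write in a little-endian data directive so the bytes in memory
    come out in big-endian instruction order."""
    return struct.unpack("<I", struct.pack(">I", insn))[0]


def toggle_le(msr):
    """Flip the endian bit in a copy of the MSR, as the stanza does before rfid."""
    return msr ^ MSR_LE


if __name__ == "__main__":
    print(f"rfid written as {as_le_long(RFID):#010x}")
    print(f"new MSR {toggle_le(0x9000000000000000):#018x}")
```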

Now, how do we do OPAL calls? I abstracted the code here a bit for both situations with an OPAL_CALL macro. Big-endian just sets the registers and jumps to the OPAL entry point, since we're in real mode and no external interrupts are presently enabled, exactly the same as the test code in Skiboot. For little-endian, however, I added a little subroutine at the end called le_opal_call which is nearly the same idea as GO_LITTLE_ENDIAN, but in reverse. We save the MSR and the LR in non-volatile registers, turn off the little endian bit in the MSR, compute the new return address for the trampoline after the oncoming rfid and load that into LR, set up srr0 and srr1 (but pointing to the OPAL entry point instead) and "return from the interrupt."

The OPAL call is thus executed big-endian in real mode. However, when we return following the rfid, we're still big-endian, so we immediately GO_LITTLE_ENDIAN again, restore the old MSR and LR (the LE bit is politely ignored) and return via the link register to the calling routine.
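For those who like to see the round trip spelled out, here is a deliberately tiny Python model of it. It is a sketch, not an emulator: the ToyCPU class is invented, rfid is reduced to "copy srr0 to the program counter and srr1 to the MSR" (the real instruction only transfers a subset of bits), the addresses are arbitrary, and the "+ 4" stands in for whatever the real distance is to the instruction following the GO_LITTLE_ENDIAN stanza.

```python
MSR_LE = 0x1  # MSR[LE], the least significant MSR bit


class ToyCPU:
    def __init__(self, msr, lr=0):
        self.msr, self.lr, self.pc = msr, lr, 0
        self.srr0 = self.srr1 = 0

    @property
    def little_endian(self):
        return bool(self.msr & MSR_LE)

    def rfid(self):
        # Simplified: next instruction address from srr0, new MSR from srr1.
        self.pc, self.msr = self.srr0, self.srr1


def le_opal_call(cpu, opal_entry, trampoline):
    """Walk through the endian switches around one OPAL call and record the mode."""
    saved_msr, saved_lr = cpu.msr, cpu.lr          # kept in "non-volatile registers"
    cpu.lr = trampoline                            # where OPAL will return to
    cpu.srr0, cpu.srr1 = opal_entry, cpu.msr & ~MSR_LE
    cpu.rfid()                                     # "return" into OPAL, big-endian
    modes = [cpu.little_endian]
    cpu.pc = cpu.lr                                # OPAL returns; still big-endian
    modes.append(cpu.little_endian)
    cpu.srr0, cpu.srr1 = cpu.pc + 4, cpu.msr | MSR_LE
    cpu.rfid()                                     # GO_LITTLE_ENDIAN again
    modes.append(cpu.little_endian)
    cpu.msr, cpu.lr = saved_msr, saved_lr          # restore what the caller had
    cpu.pc = cpu.lr                                # return via the link register
    return modes


if __name__ == "__main__":
    cpu = ToyCPU(msr=0x9000000000000001, lr=0x2000)
    print(le_opal_call(cpu, opal_entry=0x30000000, trampoline=0x1000))
```

The recorded modes are for the three interesting moments: inside OPAL, back at the trampoline, and after switching back.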

The last trick here is that the length of the string Hello World! will be stored according to the endianness we set for the assembler. If we don't account for this, we'll get a nonsense value in big-endian mode and the OPAL routine that prints a string to the console will spew garbage. When assembling in little-endian mode we thus manually specify the necessary bytes explicitly.
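Here is a minimal Python sketch of what goes wrong, assuming the length is stored as a 64-bit value that OPAL reads big-endian (my understanding of the console write call, so treat the width as an assumption) and assuming the message carries a trailing newline. The helper names are made up.

```python
import struct

MESSAGE = b"Hello World!\n"


def misread_length(length):
    """What a big-endian reader sees if the length was stored little-endian."""
    return struct.unpack(">Q", struct.pack("<Q", length))[0]


def be_length_bytes(length):
    """The bytes to emit explicitly so a big-endian reader sees the right length."""
    return struct.pack(">Q", length)


if __name__ == "__main__":
    print(len(MESSAGE), "becomes", misread_length(len(MESSAGE)))
    print("explicit bytes:", be_length_bytes(len(MESSAGE)).hex(" "))
```

A length that should be a dozen or so bytes turns into an enormous number, which is why the console routine would happily print far more than the string.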

After all that,

qemu-system-ppc64 -M powernv8 -cpu power8 \
-nographic \
-bios ./skiboot.lid \
-kernel ./le_payload.elf

A couple parting comments.

First, while you might think this would be sufficient to make something bootable from both Skiboot and Petitboot, it isn't; if you try to boot this as a kernel from Petitboot it will simply hang. We'll explore this further in a later article. Second, I have intentionally not described how you would actually flash this to PNOR on a real machine lest someone screw something up and blame me for it. In broad strokes, however, you would take either of the ELF binaries and turn it into a PNOR flash partition with fpart (not to be confused with other partition and file management utilities of the same name). Having done so, you would transfer this to the BMC and use pflash to replace the contents of BOOTKERNEL (after, hopefully, backing up the previous contents with pflash -r). At this point you may now start your machine so it can, um, shut down.

Finally, this entire exercise brings up an interesting question (to me, anyway): is there a performance ramification to running in little-endian vs big-endian, given the additional necessary overhead of flipping endianness every time OPAL is called? The answer is probably, but it's likely negligible in practice unless you're on the bare metal as we are here. Let's compare how little-endian Linux does this in opal-calls.S with big-endian OpenBSD's locore.S; in both listings, scroll down to opal_call and note the differences. Even though we don't have to do quite as much song and dance setting up a trampoline and switching endianness, we still have to twiddle the MSR (in this case to turn off external interrupts and return to real mode), and a similar amount of instruction synchronization must still occur (using isync; rfid and hrfid do this as a natural consequence). From a practical perspective, unless you have some pathological case that makes lots of OPAL calls back to back, the few extra instructions required are probably below the noise threshold when considering everything else that affects performance in modern operating systems.

When will OpenPOWER OpenBSD be now? Now.

We were delighted by the tease that OpenBSD is moving to OpenPOWER (although it is officially big-endian powerpc64, it requires OPAL, so a POWER8 is minimally required). Well, now you can try it out: a powerpc64 snapshot is now available with most of the standard binary distribution sets. The installation documentation is pretty much copy-pasta (I doubt very much that 64-bit PowerPC is supported on AMD Opteron, and I would be impressed to learn that the Pinebook Pro is OpenPOWER), but you should be able to boot from the miniroot (flash it to a USB drive using dd bs=1m) and manually set up and copy the file sets over. Curiously, the X11 distribution sets do not appear to be built yet, so you may be restricted to a text boot and/or the serial console. When I get my spare Talos II back up and running I intend to give this a full shakedown, since this would be a great basis for finally having NetBSD on OpenPOWER (my personal BSD of choice). The fact that NetBSD doesn't run here right now is a great shame for an OS that is supposed to run everywhere but doesn't on one of the most open platforms anywhere.

Firefox 78 on POWER

Firefox 78 is released and is running on this Talos II. This version in particular features an updated RegExp engine but is most notable (notorious) for disabling TLS 1.0/1.1 by default (leaving only 1.2/1.3). Unfortunately, because of craziness at $DAYJOB and the lack of a build waterfall or some sort of continuous integration for ppc64le, a build failure slipped through into release but fortunately only in the (optional) tests. The fix is trivial, another compilation bug in the profiler of the sort that periodically plagues unsupported platforms, and I have pushed it upstream in bug 1649653. You can either apply the patch from that bug to your tree or add ac_add_options --disable-tests to your .mozconfig. Speaking of, as usual, the .mozconfigs we use for debug and optimized builds have been stable since Firefox 67.

UPDATE: The patch has landed on release, beta and ESR 78, so you should be able to build straight from source.

The newest OpenPOWER chip: A2

Besides Microwatt, another open core implementation is now available, the PowerPC A2. The chip name may not be familiar, but its most famous application should be: the Blue Gene/Q supercomputer, based on 45nm 18-core chips (16 active, one unused for yield purposes and one for interrupts, I/O and other on-chip services) at 1.6GHz with a TDP of 55W. In 2012, Blue Gene/Qs took top positions on all three major supercomputer benchmark ratings.

The A2I VHDL on offer does in fact appear to be for the Blue Gene/Q variant. This is important, because A2 doesn't have an FPU or vector unit out of the box; it leaves these to be connected through the auxiliary execution unit (AXU). The A2I BG/Q version, however, does have an IEEE 754-compliant FPU connected to the AXU, and this appears to be provided in the VHDL. There is also apparently an MMU, but while the FPU offers SIMD instructions for up to 4 double-precision floats simultaneously it is not AltiVec, so no VMX/VSX. In addition, despite being SMT-4, it is only dual issue (one instruction to the ALU, one to the AXU), and execution is strictly in-order.

A2I isn't going to replace Microwatt. Microwatt is smaller and simpler, intended for small FPGAs and embedded projects, and is actively evolving by leaps and bounds to the point it can now boot Linux. More to the point, it is intended to be fully OpenPOWER compliant. A2I, however, despite being a fully realized core, is only ISA 2.06 compliant, lacks the radix MMU, lacks AltiVec, and at least right now lacks active developers. But it's small enough that with some work and a process shrink this could be the start of a mobile OpenPOWER system: at 7nm IBM claims it got up to 3.9GHz (their blurb at right claims even higher, to 4.2GHz). And it is indeed under the OpenPOWER license.

The really interesting question is what else might show up under @openpower-cores.

Don't just slack. Power Slack.

A port of Slackware to OpenPOWER ("Riscy Slack") is taking shape, something that delights me personally since Slackware was my first taste of Linux on an old 486 we christened calvin circa 1998. Never a distro for the novice, which is admittedly part of its charm, there's no handholding even on supported platforms and even less so here, so use at your own risk. The current build is based on a snapshot from Slackware64 current, though about a month old as of this writing, and you will need to download and extract the tarballs manually (no slackpkg support yet) with some tweaks (this is the described installation process right now). There is no specific support for POWER8, but X and KDE are apparently working, with some Qt issues still yet to be ironed out. Installation fragments are on a dedicated server and you can watch the progress on the porter's blog.

It's Talos all the way down

Still can't bear the sticker shock of your very own Talos II, or even an itty bitty Blackbird? Why not do what we all do for the machines we can't own and emulate one instead? (And then decide you like it a lot, and save your pennies?)

QEMU 5.0.0 offers a machine model for the bare-metal PowerNV profile, to which the Raptor systems and other OpenPOWER POWER8 and POWER9 designs intended for Linux (i.e., not PowerVM machines) belong. Using the Talos II firmware image (mostly: one snag to be mentioned), you can boot the machine in QEMU and from there bring up an operating system in emulation. In this article we'll prove it works by bringing up Void Linux for Power (hi, Daniel!) in a variety of configurations. A set-up like this might be enough to test that your software or open-source package builds and runs on OpenPOWER, even if you don't own one yet. In a future article we'll talk about how you can boot your own code on the metal so you can port your favourite OS or build a unikernel.

(For the purposes of this article I'll assume an audience that isn't as familiar with OpenPOWER terminology as our usual readership. Kindly humour me.)

The emulation is imperfect, whether you're emulating it on a real Raptor family system or on an icky PC. While QEMU can emulate an AST2500 (i.e., the ARM-based Baseboard Management Controller, which acts as the service processor and provides the video framebuffer), and QEMU can also emulate a PowerNV system, it doesn't do both at the same time. That means the very lowest levels are actually being simulated here -- you can't watch Raptor's pretty Hostboot display, for example, and only the barest functions of the BMC are simulated enough to allow bring-up, not including the framebuffer. In fact, the hardware profiles we will use here do not in general match a real Raptor system either: we're just virtually plugging in PCI devices that give us necessary functionality, though of course none of the peripheral devices in a Raptor system is Raptor-proprietary. Finally, even though I have tagged this entry with KVM, KVM currently doesn't work right with the QEMU PowerNV machine model even though I'm pretty sure it should be technically possible. Sadly, I tried in vain to do so, could never get KVM-HV to be happy, and ended up kernel panicking the machine with KVM-PR. See if you can triumph where I have failed. In the meantime, naturally you can do everything here on a T2 or Blackbird as well because that's how I did it writing this article, but there is no special acceleration for those systems right now.

The first order of business is the first order of business with any emulator: get the ROMs. Fortunately, no one is going to bust you for pirating a set of these because we're an open platform, remember?

The two pieces required are Skiboot and Petitboot, both of which live in the system's PNOR flash. Skiboot contains OPAL, the OpenPOWER Abstraction Layer. It comes in after the BMC has turned on main power and started the Power CPUs' self-boot engines, which then IPL ("initial program load") Hostboot for the second-stage power-on sequence. When Hostboot completes, it chains into Skiboot, which initializes the PCIe host bus controllers (PHBs) and provides all the basic hardware calls needed by a guest kernel to support the platform. You can think of it as something like an overgrown BIOS. This is the lowest firmware level of an OpenPOWER system that QEMU currently supports emulating.

Skiboot lives only to service a kernel, so it immediately starts one. This initial payload is the bootloader for Petitboot, which is also stored in firmware. Petitboot has a small Linux root (Skiroot) and acts as a boot menu, finding bootable volumes on attached devices or over the network. Having found one (or you select one), it chains into it to start the main OS, and from then on Skiboot will provide platform services via OPAL for this final guest until the system is shut down or restarted. Because it's in firmware, Petitboot is always available, which can come in really handy when you're trying to do system recovery.

The first, best and most dedicated way is to build Skiboot and Petitboot yourself. They are open-source and the process is relatively well documented and automated, and you should know how to do this if you own an OpenPOWER machine anyhow. If you aren't doing this on a real OpenPOWER machine you'll need a cross-compiler, but most Linux distros offer such a package nowadays. Do keep in mind that if it looks like you're building a tiny Linux distro, well, that's because that's exactly what you're doing. The advantage here is you can fool around with the firmware at your leisure, but it requires a bit of an investment in disk space and time.

The second way assumes you have a more casual interest and would prefer to go with something prefab. It's possible if you (or, you know, your "friend") have a Raptor-family system to extract the necessary components right from the BMC prompt. Log into the BMC over SSH (or via direct serial connection) and type pflash -i. You'll see a list of all the partitions stored in the PNOR flash. The ones we want are PAYLOAD (which contains Skiboot) and BOOTKERNEL (which contains Skiroot and Petitboot). The exact addresses may vary from system to system and firmware to firmware.

root@bmc:~# pflash -P PAYLOAD -r /tmp/pnor.PAYLOAD --skip=4096
Reading to "/tmp/pnor.PAYLOAD" from 0x021a1000..0x022a1000 !
[==================================================] 100%
root@bmc:~# pflash -P BOOTKERNEL -r /tmp/pnor.BOOTKERNEL --skip=4096
Reading to "/tmp/pnor.BOOTKERNEL" from 0x022a1000..0x03821000 !
[==================================================] 100%

We skip the first 4K page to leave out the header wrapped around each partition's contents. pnor.PAYLOAD is actually compressed and needs to be uncompressed prior to use, so:

root@bmc:~# cd /tmp
root@bmc:/tmp# xz -d < pnor.PAYLOAD > skiboot.lid

Finally, scp both skiboot.lid and pnor.BOOTKERNEL to your desired system from the BMC.

Admittedly we just talked at length about the two ways most of you won't get the firmware, so let's talk about the third method and the way most of you will, i.e., you'll just download it. Currently there is an irregularity about Raptor's present Skiboot build for this purpose: it only boots if you are emulating a single POWER8. That's not a typo. If you use it to boot an emulated POWER9, the guest will simply panic, and the guest will go into a bootloop if you are emulating multiple POWER8 CPUs (necessary if you need a larger number of PCIe devices). This is undoubtedly a QEMU deficiency which will be corrected in future releases. In the meantime, if you just care about playing around using a single POWER8 on a terminal, then Raptor's builds (either from BMC flash or downloaded) will suffice. However, if you intend to emulate a POWER9 or SMP POWER8 system, download QEMU's own pre-built skiboot.lid and use that instead.
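If you'd rather have that rule of thumb as something you can drop into a launcher script, here is a small Python sketch of it. The function name and the "raptor"/"qemu" labels are invented for illustration; the logic is only what is described above for the current builds and may well change as QEMU improves.

```python
def usable_skiboot_builds(cpu, sockets=1):
    """Which Skiboot builds are expected to boot for a given emulated configuration.

    Encodes the observation above: Raptor's build only works with a single
    emulated POWER8, while QEMU's own skiboot.lid covers the other cases.
    """
    if cpu not in ("power8", "power9"):
        raise ValueError(f"unexpected CPU type: {cpu}")
    if sockets < 1:
        raise ValueError("need at least one CPU")
    if cpu == "power8" and sockets == 1:
        return ["raptor", "qemu"]
    return ["qemu"]


if __name__ == "__main__":
    print(usable_skiboot_builds("power8"))
    print(usable_skiboot_builds("power8", sockets=2))
    print(usable_skiboot_builds("power9"))
```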

For Petitboot, we will extract that directly from Raptor's PNOR images. Assuming you didn't get it using the process above, download the current Talos II PNOR image and decompress it. In the shell_upgrade directory you will see the bzip2-compressed PNOR image. Uncompress that, leaving you with a filename like talos-ii-v2.00.pnor. Download my pnorex extractor tool (it's in Perl, because I'm one of those people) and run it on the PNOR image:

% pnorex talos-ii-v2.00.pnor
Version 1 PNOR archive with 33 entries.
Extracting PAYLOAD at offset 8601.
This is a xz format image.
Wrote 1020K successfully.
Extracting BOOTKERNEL at offset 8857.
This is an ELF executable image.
Wrote 22012K successfully.
Extracted 2 partitions successfully.

If you will be using Raptor's Skiboot, then uncompress pnor.PAYLOAD to skiboot.lid as above: xz -d < pnor.PAYLOAD > skiboot.lid
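If you don't have xz handy, or just want to double-check what you extracted, the same identification and decompression can be sketched in a few lines of Python. This is not how pnorex works internally (that's Perl, and I'm only mimicking its messages); it simply looks at the well-known magic numbers at the start of xz and ELF files and uses the standard lzma module. The function names are made up.

```python
import lzma

XZ_MAGIC = b"\xfd7zXZ\x00"   # first six bytes of an xz stream
ELF_MAGIC = b"\x7fELF"       # first four bytes of an ELF executable


def identify(image):
    """Classify an extracted partition by its leading magic bytes."""
    if image.startswith(XZ_MAGIC):
        return "xz"
    if image.startswith(ELF_MAGIC):
        return "elf"
    return "unknown"


def unpack_payload(image):
    """Decompress an xz-format image; hand anything else back untouched."""
    if identify(image) == "xz":
        return lzma.decompress(image)
    return image


if __name__ == "__main__":
    import sys
    # Usage sketch: python3 unpack.py pnor.PAYLOAD skiboot.lid
    with open(sys.argv[1], "rb") as src:
        data = src.read()
    print("This is a", identify(data), "image.")
    with open(sys.argv[2], "wb") as dst:
        dst.write(unpack_payload(data))
```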

Now, with skiboot.lid (for this first example, either Raptor's or QEMU's) and pnor.BOOTKERNEL in the same folder, grab an ISO you want to boot. I used the prefab one Daniel offers on the Void Linux for Power site since I know it boots fine on OpenPOWER hardware. For our first example let's do a simple boot of Void from a CD image on a POWER8 using the serial port. Our QEMU command line:

qemu-system-ppc64 -M powernv8 -m 4G -cpu power8 \
-nographic \
-bios ./skiboot.lid \
-kernel ./pnor.BOOTKERNEL \
-device ich9-ahci,id=ahci0 \
-drive id=cd0,media=cdrom,file=void-live-ppc64le-musl-20200411.iso,if=none \
-device ide-cd,bus=ahci0.0,drive=cd0

This configures a single-processor POWER8 system with 4GB of RAM, no graphics, and an Intel AHCI host controller with a single CD-ROM drive attached. The serial output should go to your terminal. It goes a little like this:

Here we are with Skiboot chaining into Petitboot. You can ignore the errors; there will be a lot of them since the platform is still incomplete. It will take a little bit of time to decompress the kernel (much slower than it would be on a regular system). You will notice a single device attached to the three available PCIe host bridges on the single POWER8 CPU, i.e., the host controller itself. Don't you just love that the vendor code for Intel is 8086?

This is Petitboot. When the bootable choices appear, cursor up to the starred option and press E before it autoboots, because we need to tell Void its console is the on-board serial port (otherwise it uses a VGA console: not sure whose bug that is).

Add console=hvc0 at the end, cursor down to OK and hit RETURN/ENTER a couple times to boot.

A successful login on your emulated baby POWER8. Ta-daa! To rudely pull the plug on the QEMU session, press Ctrl-A, and then X (QEMU: Terminated).

Let's now load out the POWER8. We would like to add a video card, an Ethernet card and a USB controller to our existing system, but POWER8 Turismo chips only offer enough PHBs for three PCI endpoints. How do we solve this problem? Easy: we'll add another processor!

At this point you will require the QEMU Skiboot and should use that where skiboot.lid appears in the remainder of this article. I use tun/tap networking in this example, which assumes you already have tap0 configured and up; change the -netdev setting if you want to use a different means of bridging the NIC. This example keeps the AHCI host controller and still displays debug output on the terminal, but uses the QEMU emulated VGA as a console instead and adds a good old Realtek 8139 NIC with a USB mouse and keyboard attached to a QEMU XHCI USB 3.0 controller.

qemu-system-ppc64 -M powernv8 -cpu power8 -m 4G -smp 2 \
-serial mon:stdio \
-device VGA \
-device ich9-ahci,id=ahci0,bus=pcie.0 \
-netdev tap,id=nic0,ifname=tap0,script=no,downscript=no \
-device rtl8139,netdev=nic0,bus=pcie.1 \
-device qemu-xhci,id=usb0,bus=pcie.2 \
-device usb-mouse \
-device usb-kbd \
-bios ./skiboot.lid \
-kernel ./pnor.BOOTKERNEL \
-drive id=cd0,media=cdrom,file=void-live-ppc64le-musl-20200411.iso,if=none \
-device ide-cd,bus=ahci0.0,drive=cd0
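
If tap0 isn't set up yet, something like the following sketch would do it on a typical Linux host with iproute2. It only prints the commands so you can review them before feeding them to a root shell. The bridge name br0 is an assumption (use whatever your bridge is actually called), and the helper name tap_commands is made up for this example.

```shell
#!/bin/sh
# Print the iproute2 commands that create a tap device and attach it to a
# bridge. Nothing is executed here; pipe the output to a root shell to apply.
# Assumptions: the bridge (default "br0") already exists, and the tap name
# matches the ifname= given to QEMU's -netdev option.
tap_commands() {
    tap=${1:-tap0}
    bridge=${2:-br0}
    owner=${3:-$(id -un)}
    echo "ip tuntap add dev $tap mode tap user $owner"
    echo "ip link set dev $tap master $bridge"
    echo "ip link set dev $tap up"
}

tap_commands "$@"
```

Run it as, say, sh tap.sh tap0 br0 | sudo sh once you're happy with what it prints. If you'd rather not touch host networking at all, QEMU's user-mode networking (-netdev user,id=nic0 in place of the tap line) needs no setup, at the cost of the guest not being directly reachable from your LAN.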

Let's spin this sucker like Superman's cape in a dryer:

I keep the serial output because the second CPU adds roughly another minute on this T2 before Petitboot appears. Here, you will notice we now have six PHBs available, three per CPU, so we have enough virtual PCI slots for the peripherals we require.
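
The socket arithmetic is simple enough to sketch. This takes the one-endpoint-per-PHB model used in this article at face value, with three PHBs per emulated POWER8 chip and the six per POWER9 Sforza chip that come up further on; the function name is just for illustration.

```shell
#!/bin/sh
# sockets_needed CPU DEVICES: smallest number of emulated sockets whose PHBs
# cover DEVICES PCI endpoints, assuming one endpoint per PHB.
# Per-chip PHB counts are the ones quoted in this article (POWER8: 3, POWER9: 6).
sockets_needed() {
    case $1 in
        power8) per_chip=3 ;;
        power9) per_chip=6 ;;
        *) echo "unknown CPU: $1" >&2; return 1 ;;
    esac
    # Integer ceiling of DEVICES / per_chip
    echo $(( ($2 + per_chip - 1) / per_chip ))
}

sockets_needed power8 4   # AHCI, VGA, NIC and USB controller: prints 2
sockets_needed power9 4   # prints 1
```

The result is what you would hand to -smp in these examples, since each emulated CPU here is a single-core socket.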

Petitboot shows up on both the 2D framebuffer and the serial terminal, and both work. You'll also see it probing the bridged Ethernet tap to see if it can boot that way, proving our Ethernet device is up and working. Whichever one you use is where boot messages will go, so we'll use the framebuffer as console and start Void by cursoring up and selecting the starred option (thus proving our USB devices work too).

Having booted Void, we can now demonstrate the PCI cards in the system, the attached peripherals and the number of CPUs. For the record, the DD2.3 POWER9 I'm typing this on shows its Spectre v2 status as "mitigated" with hardware acceleration.

Starting the Installer, which won't install anything because we haven't configured any storage to install to in our QEMU options. I'll leave that as an exercise to the reader.

If we switch to an emulated POWER9 system, Sforza CPUs support six PCI endpoints, so we get six PHBs. This means a single CPU is more than enough for our basic configuration without adding additional startup time. The QEMU command line to do so merely returns to single processor and changes the machine to powernv9 and the CPU to power9, i.e.,

qemu-system-ppc64 -M powernv9 -cpu power9 -m 4G \
-serial mon:stdio \
-device VGA \
-device ich9-ahci,id=ahci0,bus=pcie.0 \
-netdev tap,id=nic0,ifname=tap0,script=no,downscript=no \
-device rtl8139,netdev=nic0,bus=pcie.1 \
-device qemu-xhci,id=usb0,bus=pcie.2 \
-device usb-mouse \
-device usb-kbd \
-bios ./skiboot.lid \
-kernel ./pnor.BOOTKERNEL \
-drive id=cd0,media=cdrom,file=void-live-ppc64le-musl-20200411.iso,if=none \
-device ide-cd,bus=ahci0.0,drive=cd0

and it runs in the same way, but faster, because the emulation overhead is less. So let's totally do something stupid as our last parlour trick and run a POWER9 configuration with as many sockets as QEMU will let us hold (which right now is four). Note that these are all single-threaded cores, so this is still much less powerful than even a 4-core basic Blackbird.

qemu-system-ppc64 -M powernv9 -cpu power9 -m 4G -smp 4 \
-serial mon:stdio \
-device VGA \
-device ich9-ahci,id=ahci0,bus=pcie.0 \
-netdev tap,id=nic0,ifname=tap0,script=no,downscript=no \
-device rtl8139,netdev=nic0,bus=pcie.1 \
-device qemu-xhci,id=usb0,bus=pcie.2 \
-device usb-mouse \
-device usb-kbd \
-bios ./skiboot.lid \
-kernel ./pnor.BOOTKERNEL \
-drive id=cd0,media=cdrom,file=void-live-ppc64le-musl-20200411.iso,if=none \
-device ide-cd,bus=ahci0.0,drive=cd0
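
Since the three full configurations above differ only in machine type, CPU model and socket count, a small wrapper saves some retyping. This is a sketch that merely prints the command line; the function name and the environment variable names are invented for the example, and the defaults are the file names used throughout this article.

```shell
#!/bin/sh
# qemu_powernv MACHINE CPU SOCKETS: print the QEMU command line used in this
# article for the given machine type, CPU model and socket count.
# The command is only printed; pipe it to sh to actually launch QEMU.
# Assumed defaults (override via environment): the file names and tap device
# are the ones used in the examples above.
SKIBOOT=${SKIBOOT:-./skiboot.lid}
KERNEL=${KERNEL:-./pnor.BOOTKERNEL}
ISO=${ISO:-void-live-ppc64le-musl-20200411.iso}
TAP=${TAP:-tap0}

qemu_powernv() {
    # In an unquoted here-document, \\ yields one literal backslash, which
    # gives each output line its trailing continuation character.
    cat <<EOF
qemu-system-ppc64 -M $1 -cpu $2 -m 4G -smp $3 \\
  -serial mon:stdio \\
  -device VGA \\
  -device ich9-ahci,id=ahci0,bus=pcie.0 \\
  -netdev tap,id=nic0,ifname=$TAP,script=no,downscript=no \\
  -device rtl8139,netdev=nic0,bus=pcie.1 \\
  -device qemu-xhci,id=usb0,bus=pcie.2 \\
  -device usb-mouse \\
  -device usb-kbd \\
  -bios $SKIBOOT \\
  -kernel $KERNEL \\
  -drive id=cd0,media=cdrom,file=$ISO,if=none \\
  -device ide-cd,bus=ahci0.0,drive=cd0
EOF
}

# Default to the single-socket POWER9 configuration.
qemu_powernv "${1:-powernv9}" "${2:-power9}" "${3:-1}"
```

For example, sh powernv.sh powernv8 power8 2 | sh would launch the dual-socket POWER8 system. The single-socket POWER9 example above leaves -smp off entirely; passing 1 explicitly should amount to the same thing.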

With four emulated CPUs startup took over seven minutes from start to Petitboot on this dual-8 Talos II, so have patience if you're on a lesser workstation, but it does work:

You can see the watchdog complaining about the length of time OPAL calls are taking now (call 128 resets the XIVE VM interrupt controller on POWER9 chips). But we do have our four cores, and it's not impossibly slow on a beefy enough system (like another POWER9).

Incidentally, while the Power ISA emulation in QEMU allows SMT, it's very basic and not enough to get through the boot-up sequence, or at least not before the heat death of the universe. If you like listening to your cooling fans, see what happens when you try to emulate the biggest baddest dual-22 Talos II by adding -accel tcg,thread=multi -smp 176,threads=4,cores=22,sockets=2 to your QEMU command line. It's not pretty. That's why you should buy an OpenPOWER machine of your own instead of emulating one.
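
If you want to try other topologies, the number given to -smp has to equal threads times cores times sockets. A tiny helper (name hypothetical) keeps the arithmetic honest:

```shell
#!/bin/sh
# smp_total THREADS CORES SOCKETS: the CPU count that must be given to -smp
# so that it matches the threads/cores/sockets topology.
smp_total() {
    echo $(( $1 * $2 * $3 ))
}

# Dual 22-core SMT-4 Talos II, as in the text:
echo "-smp $(smp_total 4 22 2),threads=4,cores=22,sockets=2"
```

This prints the same -smp 176,threads=4,cores=22,sockets=2 string as above. Whether the result actually boots is another matter entirely.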


Today's featured entry in the increasingly inaccurately named #ShowUsYourTalos series is Karl S.'s Blackbird system, a 4-core unit with 32GB of RAM and a Sapphire RX5500 XT GPU in a rather arresting NZXT H400 case with red accents. A complete bill of materials and prices are proffered for your review. Mind the caution sticker, we wouldn't want to crack the glass.

If you have an OpenPOWER system you'd like to show off, post in the comments. Other than my personal T2, we haven't had any other Talos systems yet, but POWER8s, other POWER9s and of course Blackbirds are always welcome.

Firefox 77 on POWER

Firefox 77 is released. I really couldn't care less about Pocket recommendations, and I don't know who was clamouring for that exactly because everybody be tripping recommendations, but better accessibility options are always welcome and the debugging and developer tools improvements sound really nice. This post is being typed in it.

There are no OpenPOWER-specific changes in Fx77, though a few compilation issues were fixed expeditiously through Dan Horák's testing just in time for the Fx78 beta. Daniel Kolesa reported an issue with system NSS 3.52 and WebRTC, but I have not heard if this is still a problem (at least on the v2 ABI), and I always build using in-tree NSS myself which seems to be fine. This morning Daniel Pocock sent me a basic query of 64-bit Power ISA bugs yet to be fixed in Firefox; I suspect some are dupes (I closed one just this morning which I know I fixed myself already), and many are endian-specific, but we should try whittling down that list (and, as usual, LTO and PGO still need to be investigated further). I'm still using the same .mozconfigs from Firefox 67.

In a minor moment of self-promotion, I'm also shamelessly reminding readers that Fx77 comes out parallel with TenFourFox Feature Parity Release 23, relevant to Talospace readers because I made some fixes to its Content Security Policy support to properly support the web-based OpenBMC with System Package 2.00. Although the serial console-LAN redirector has some stuttery keystrokes, I think this is a timing problem rather than a feature deficiency, and everything else generally works. Connecting over ssh or serial port is naturally always an option, but I have to agree the web OpenBMC is a lot nicer and some tasks are certainly easier that way. If you're a long-term PowerPC dweeb like me and you want to use your beloved Power Mac to manage your brand-spanking-new Talos II or Blackbird, now you can.

Welcome to OpenPOWER, James Kulina!

I'm sad to see Hugh Blemings step down as executive director of the OpenPOWER Foundation (as a dual Aussie-USA citizen myself, it was a distinct pleasure to meet him at SCaLE 17x, back before this whole coronavirus thing), but I'm just as happy to welcome his successor, James Kulina. James also hails from the open source industry, and even worked for a little while for Red Hat when his prior company was acquired in 2014.

I particularly like the quote about his goal of making "OpenPOWER one of the easiest platforms to go from an idea to a silicon chip." With toolkits like Microwatt moving swiftly from proof of concept to actually useful, I fully agree I'd like to see OpenPOWER chips in all kinds of settings: definitely more workstation-level options, please (if for no other reason than to keep the Amiga crowd from making more bad choices in CPUs), but also there's no reason that small OpenPOWER designs can't have just as much fun in the embedded space. IBM and the usual corporate suspects can easily take care of the server world themselves; I'm not worried about Power ISA in those environments. Where I want to see it expand and thrive is back into all the market territory Power Architecture lost within the last decade, and a royalty-free ISA with chips, designs and full systems that already exist and are already performance-competitive is a great foundation to build upon. I look forward to hearing what the OpenPOWER Foundation under his leadership has in mind for making that happen.

Godspeed, Hugh, in your next project and do drop by and say hi! And James, if you're reading, drop me a line so we can feature your OpenPOWER setup for #ShowUsYourTalos. I've got one in the pipeline but we'd love to see your particular bona fides too. Welcome to the show!

Updates to Alpine Linux and RHEL, and now Devuan

Some distro updates of note for ppc64le. First, Alpine Linux is now updated to 3.12.0, primarily a maintenance update migrating to Linux 5.4 with refreshes for gcc, LLVM, Node.js and others. ppc64le builds are available in standard, netboot and miniroot flavours, though only x86_64 and x86 are supported for the extended build, and curiously there is still no virtual build either (though I'm sure the standard flavour would serve for that purpose). There is no big-endian ppc64 flavour.

Red Hat also has updated Red Hat Enterprise Linux version 7 to 7.8, approximately based on what was originally Fedora 19. RHEL 7 is notable in that it still supports big-endian ppc64 (on POWER7 and up, sorry G5 folks) as well as ppc64le, including POWER9 support since RHEL 7.4. It's also notable in that it'll cost ya. If you'd prefer not to pay for your OS but you still really want a big-endian Red Hat, then you may be better served by CentOS 7 instead. Maintenance support for RHEL 7 will last until June 30, 2024, with extra extended $$$upport to follow.

Finally, Devuan is updated to 3.0.0. This update officially introduces "ppc64el" support (using kernel 4.19) and given its derivation from Debian Buster 10.4 should boot on any Raptor family machine. I'd consider this distribution not yet fully at parity for OpenPOWER given that the only installation option for ppc64el is a netbooter. Still, I'd be interested to hear about package coverage for Devuan and who out there is currently using it.

Linux 5.7

Kernel version 5.7 is released. Besides the new Samsung exFAT support (replacing 5.4's Microsoft driver), a particularly interesting new feature is using thermal pressure as part of scheduling. While POWER9 has a lot of thermal headroom, incorporating this information could nevertheless yield greater efficiencies on high-core-count systems or you cray-cray people trying to cram 18-core CPUs into Blackbirds. Because the On-Chip Controllers have been observable in hwmon since 5.0, OpenPOWER support for this feature should "just work."
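
To see which hwmon devices your own kernel exposes, a generic lister like this sketch will do. It relies only on the standard sysfs hwmon layout (a name file in each hwmonN directory); what name the OCC sensors appear under on your system is not something I'm asserting here, and the function name is made up.

```shell
#!/bin/sh
# list_hwmon [ROOT]: print "hwmonN: name" for each hwmon device found under
# ROOT (default /sys/class/hwmon). Prints nothing if there are none.
list_hwmon() {
    root=${1:-/sys/class/hwmon}
    for dir in "$root"/hwmon*; do
        [ -r "$dir/name" ] || continue
        echo "$(basename "$dir"): $(cat "$dir/name")"
    done
}

list_hwmon
```

The optional argument exists mainly so the function can be pointed at a copy of the tree for testing.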

On the Power ISA side, other than refactoring the exception code, the most noteworthy changes (to me) are support for fast reboot, which might fix that obnoxious issue I keep seeing with failing over to the serial port on regular reboots; a means for discovery of secure guests under KVM-HV so that those of you using the DD2.3 Ultravisor and the Protected Execution Facility can keep secure guests only on systems that support them; and one of particular historical interest, where compatibility with 32-bit PowerPC binaries is now disabled on ppc64le by default (it can still be enabled with a configuration flag). Big-endian ppc64 does still contain this support by default, presumably because 32-bit PowerPC binaries are more common on those systems, but to be sure, none of my little-endian Power systems have any 32-bit binaries on them. Do yours? It'll be interesting to hear what some of our boutique OpenPOWER distros will do with this going forward.
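
If you want to answer that question for your own system, here is a rough sketch that looks for executable 32-bit ELF files. It reads the ELF header directly rather than relying on file(1) being installed; the function names are invented for this example, and it only inspects files with the owner-execute bit set.

```shell
#!/bin/sh
# elf_class FILE: print 32 or 64 for an ELF file, nothing for anything else.
# Byte 5 of an ELF header (EI_CLASS) is 1 for 32-bit and 2 for 64-bit objects.
elf_class() {
    # First five bytes of the file as space-separated decimal values
    set -- $(od -An -tu1 -N5 "$1" 2>/dev/null)
    [ "$1 $2 $3 $4" = "127 69 76 70" ] || return 0   # magic is not "\177ELF"
    case $5 in
        1) echo 32 ;;
        2) echo 64 ;;
    esac
}

# list_elf32 [DIR]: print every owner-executable 32-bit ELF file under DIR
# (default /usr), staying on one filesystem.
list_elf32() {
    find "${1:-/usr}" -xdev -type f -perm -100 2>/dev/null |
    while IFS= read -r f; do
        if [ "$(elf_class "$f")" = "32" ]; then
            echo "$f"
        fi
    done
}
```

Something like list_elf32 /usr or list_elf32 /opt after sourcing this will take a while on a big tree, but an empty result is a decent hint that you won't miss the compatibility layer.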

Fedora 32 mini-review on the Blackbird and Talos II

It took a little time due to hardware hijinx and the present chaos at the day job, but as is traditional, here is the regular mini-review on the latest release of Fedora (now F32) on OpenPOWER. Fedora is what we run here at Chez Floodgap/Talospace, largely because when I got my first Talos II in 2018 Fedora 28 was the only distro that worked right out of the box. However, it's also advantageous because it incorporates updates at a much faster pace than other distributions (which is why you could boot a T2 with it back then), and it's instructive because problems found here can be sorted out hopefully before they get to folks on more conservative distros. That's why I think it's worth all of us caring about it, even if you don't run it. This post is on Firefox 76 on F32.

With this release Fedora 30 progresses to end-of-life, which it hit two days ago. F31 will remain supported until F33 comes out in 6-ish months, when it too will have roughly a month's grace period before being EOLed. As before, I update Fedora manually rather than through GNOME Software. If you haven't seen the process in a while, it looks like this:

sudo dnf upgrade --refresh # bring the current release fully up to date first
sudo dnf install dnf-plugin-system-upgrade # install upgrade plugin if not already done
sudo dnf system-upgrade download --refresh --releasever=32 # download F32 packages
sudo dnf system-upgrade reboot # reboot into upgrader

This process did not go quite as smoothly as before; two packages (that apparently were dependencies of other things) were unupgradable, and the usual --skip-broken couldn't clear the logjam, so I had to dnf remove them by hand. In addition, there is an annoying bug that started sometime in F31 where reboots seem to end up using the serial port as the console no matter what the setting is in Petitboot (you have to do a shutdown and power on instead). Fortunately, I fixed TenFourFox FPR23 to finally work properly with the new web-based BMC firmware, so I could watch the action from the Power Mac G5 using the OpenBMC's serial-to-LAN redirector (see screenshot). If you are on the console, however, for some reason the upgrader on both my Blackbird (with a naked BMC) and my T2 (with a WX7100 workstation card and the firmware in BOOTKERNFW) still reboots to a black screen. Switch to another VTY, like CTRL-ALT-F2, and log in as root, and you can periodically issue

dnf system-upgrade log --number=-1

to watch the action. The install took a little less than an hour. It automatically restarts afterwards.

Relatively little seems to have changed in terms of system performance (on kernel 5.6.x). VLC seemed a bit quicker on my no-GPU Blackbird at pushing pixels, though mplayer is still the speed champion for media playback. Running Wayland on the Blackbird was yet again an unmitigated disaster and I have not seen any information to suggest this has changed for simple framebuffers in general (so if you're a straight BMC-only system as mine is, fuggedaboutit). On the T2, Wayland is still not as snappy as Xorg, but performance was not massively different and window theming and decorations seem more consistent. However, several games still won't launch in Wayland such as DXX, and because I'm dependent on appmodmap and there is still no equivalent feature in GNOME or Wayland to find out which window is active, Wayland on the whole remains a loss of functionality over Xorg for little gain in my book. startx forever.

If you use a lot of GNOME extensions like I do, then be prepared for some additional fallout as usual. Argos, which I use to power my little DIY IPMI fan monitor, doesn't appear to be working right in GNOME 3.36: scripts update the menu bar but their actual drop-down menus don't work, so I'll have to look into it since the maintainer isn't anymore (it looks like someone else is working on the issue, and this diff seems to fix it). Dash-to-Dock would not update properly with the GNOME tool in Firefox for a couple tries and then suddenly started working for reasons I still don't understand.

However, the worst issue was GNOME crashing every time I opened the Applications drawer. Nothing showed up under All, and if I clicked on Frequent, GNOME would either reset itself or actually force me to log out. After some poring through journalctl -e, I realized it was Appfolders Management that was the problem. Disabling that got me the apps back, and it's not really necessary anymore either since you can drag and drop apps into folders without it now. Other than modest UI changes, though, I don't notice much different about the current GNOME except that my build of Firefox Nightly now has a weirdly elongated icon in the top bar.

Overall, F32 is not a compelling release, but after a couple false starts it's working, and once I got past the usual growing pains there have been no serious problems. I'm hopeful that IBM will take a bit of a firmer hand with Red Hat and especially Fedora in the near future: I'd like OpenPOWER to finally graduate from the "alternative architectures" and be a first-class citizen with x86, which I also think will bring more attention to the port, and hopefully its niggling polish issues can be dealt with. But Wayland is still horrible without a GPU (and isn't feature-comparable with Xorg even with one), and GNOME continues to wreck its own extension ecosystem with wild abandon. Red Hat has substantial influence on these projects as many of these projects' developers work for them. Maybe we should be paying more attention there as well.

Give your OpenPOWER machine a big K1ss

This may be the wrong message in this era of COVID-19, but another operating system option for your POWER9 is to give it a big kiss. That's Kiss Linux, by the way.

Kiss Linux is an independent musl-based non-systemd distribution advertising itself as having "a focus on simplicity and the concept of less is more." Notably it uses busybox as both a source for core utilities and as init, and packages build from source a la Gentoo.

Ostensibly x86_64 only, there is an unofficial ppc64le port available specifically advertised as compatible with the Talos II and Blackbird family. A Debian netboot version is offered which can run directly from Petitboot as a staging area to do the full manual installation with a pre-compiled tarball. Although I don't see any reason why it couldn't be made to work on POWER8, the tarball is compiled for POWER9, so pre-ISA 3.0 systems will need to do more work and there doesn't seem to be current support for big-endian.

On the package side, most things seem to work. Firefox is advertised as functional, which is a good sign because that also means all of its prerequisites (rust, build system, font shaping, etc.) must also be. Netsurf is also advertised, my favourite little browser that can. However, Chromium and LibreOffice are missing, and since Kiss Linux seems to lack dbus this would also exclude GNOME. But, hey, there's no Wayland either, which is good news if you're running a system on just the BMC framebuffer.

If you're running or experimenting with giving your OpenPOWER machine a big sloppy kiss, wipe it off first (ewww), and then post in the comments.

When will OpenPOWER OpenBSD be now? Soon.

The tease of the week is this tweet from the OpenBSD maintainers, which indicates ppc64 (as "powerpc64" and implying big-endian) support is coming to OpenBSD.

It should be noted that this initial patch is very preliminary, largely just the rudiments for getting the kernel to boot, but that's more than it's done previously. It should also be noted that locore.S has dependencies on OPAL (i.e., the Open Power Abstraction Layer), which is provided by Skiboot, so pre-OpenPOWER systems like the Power Mac G5 need not apply (the G5 in particular is better served by OpenBSD/macppc). However, I don't see anything in this first pass that wouldn't work on POWER8.

When I get my "old" dual-4 Talos II up and running again this sounds like something worth experimenting with, and this would probably be the easiest route to getting NetBSD up on OpenPOWER hardware as well (my personal BSD of choice which I run on several systems currently, including a Macintosh IIci). Meanwhile, if you're going to go full Dark Helmet and want a BSD on your OpenPOWER systems that's available Now, look at FreeBSD, which is currently the most mature BSD available for our machines.

The case of the disappearing core

Here's a fun pro-tip: what do you do when one of your system's cores goes out to lunch and never comes back? On my original dual-4 Talos II my compile times got abnormally long and the system felt more sluggish. In dmesg I noted with alarm that it was reporting numa: Node 0 CPUs: 4-15 instead of starting at CPU 0. That means an entire core (because they're SMT-4) somehow went off-line! What gives?

The answer turns out to be related to Hostboot. The GUARD portion of the PNOR controls what hardware components have been disabled (which includes RAM sticks and individual cores), presumably due to defect, but it can also happen spuriously if Hostboot mistakes a driver glitch for actual hardware failure and erroneously turns off that component in the hardware guard entries. With main power off, a simple pflash -P GUARD -c at the BMC root prompt will clear the guard entries and indeed the prodigal core returned forthwith when I powered it back on again. Thanks to Tim Pearson at Raptor for the #protip.
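
For spotting this sort of thing before your compile times tip you off, here's a sketch that turns a kernel-style CPU list into a list of missing cores. It assumes SMT-4 by default and only checks whether the first thread of each core is present; the function name is made up for the example.

```shell
#!/bin/sh
# missing_cores LIST TOTAL [THREADS]: given a kernel-style CPU list such as
# "4-15", the number of logical CPUs expected on the node (TOTAL) and the
# threads per core (default 4 for SMT-4), print the index of each core whose
# first logical CPU is absent from LIST.
missing_cores() {
    list=$1 total=$2 threads=${3:-4}
    present=" "
    old_ifs=$IFS; IFS=,
    for range in $list; do
        lo=${range%-*}; hi=${range#*-}   # a bare "7" gives lo=hi=7
        cpu=$lo
        while [ "$cpu" -le "$hi" ]; do
            present="$present$cpu "
            cpu=$((cpu + 1))
        done
    done
    IFS=$old_ifs
    core=0
    while [ $((core * threads)) -lt "$total" ]; do
        first=$((core * threads))
        case $present in
            *" $first "*) ;;          # core is there
            *) echo "$core" ;;        # core is missing
        esac
        core=$((core + 1))
    done
}

missing_cores 4-15 16   # the dmesg line from the text: prints 0
```

On a live system you could feed it the list from dmesg, e.g. with dmesg | sed -n 's/.*numa: Node 0 CPUs: //p', along with the CPU count you expect that node to have.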


Today's #ShowUsYourTalos is a Blackbird, but hey, we're all family here. Brad describes his unit thusly: "POWER9 DD2.3, 8-core; 64 GB RAM; Fractal Design Meshify C case; be quiet! Shadow Wings 2 140mm front fan; Fractal Design Dynamic X2 120mm rear fan; Silicon Power 512GB PCIe 3 NVMe 1.3 SSD; Samsung Evo 850 1TB SSD (behind motherboard); Radeon Pro WX3200 GPU; FreeBSD 13-CURRENT." Nice! We like seeing some BSD action!

If you have an OpenPOWER system you'd like to show off, post in the comments or E-mail me.

Firefox 76 on POWER

Firefox 76 is released. Besides other CSS, HTML and developer features, it refines that somewhat obnoxious zooming bar a bit, improves Picture-in-Picture further (great for livestreams: using it a lot for church), and most notably adds critical alerts for website breaches and improved password security (both generating good secure passwords and notifying you when a password used on one or more sites may have been stolen). The .mozconfigs are unchanged from Firefox 67, which is good news, because we've been stable without changing build options for quite a while at this point and we might be able to start investigating why some build options that should work still fail. In particular, PGO and LTO would be nice to get working.

QEMU adds POWER10 support

"Wait," you say, "POWER10 isn't out yet!" No, it's not; even Axone (variously known as the AIO POWER9 "Advanced I/O" and "POWER9 Prime") is still yet to come out later this year (for that matter, we haven't heard much more about the LaGrange-based Raptor Condor which was announced at the same time). However, there are almost certainly simulations of the forthcoming 7nm POWER10 in IBM's chip labs, and some of that is public in the QEMU sources.

To be sure, what's currently in QEMU is at most rudimentary. The patches that have landed in the new 5.0 release describe the POWER10 as "very similar" to POWER9. The DD1.0 initial stepping (PVR base 0x00800000 as opposed to POWER9 0x004E0000) apparently introduces new ISA 3.10 (POWER9 is 3.0), though the MMU and CPU initialization code for POWER10 right now looks like a copy-pasta from the POWER9 section and doesn't expose any obvious new registers. Similarly, I can't find any IBM documentation on 3.10, so we can presume that's either under wraps or yet to be completed. However, a related commit in Skiboot drops the codename for the POWER10 machine under development: much as the POWER8 prototype was Palmetto and the POWER9 prototype was Witherspoon, POWER10's prototype is named Rainier (presumably for the mountain and not the Simpsons character). Given Axone's release expected later in 2020, the first POWER10 system descended from Rainier (like the AC922 was descended from Witherspoon) will likely not appear until 2021.

Fedora 32 released

Fedora 32 is released (changelog). We care about Fedora a lot at Talospace Low Earth Orbit HQ because it was one of the earliest distributions to run on POWER9 "out of the box" and it's what we run locally. Plus, being more bleeding edge than many distributions, even if you don't run it you should still care about it because OpenPOWER-specific issues on other distros usually get identified on Fedora first.

Given that history and that Red Hat's been an IBM property for a year and a half by now, we'd have hoped to see OpenPOWER move out of the "alternative architectures" (or at least see a little more promotion of the OpenPOWER build). That said, it's got GNOME 3.36, primarily a UX update but with a new suspend option that probably won't work on Raptor workstations anyway, TRIM by default (finally! though I turned this on in 31 anyhow), support for Free Pascal on ppc64le, gcc 10 and glibc 2.31, LLVM 10, Python 3.8 (and retirement of Python 2) and Golang 1.14. Unfortunately, 128-bit floats on ppc64le, necessary to fix some compiler and builder edge cases, have slipped deadline yet again. However, there is a semi-official workstation ISO for ppc64le now, which is welcome for new owners trying to get their machines bootstrapped.

As usual, giving a little time for the mirrors and packages to catch up, I'll try it on my basic Blackbird and Talos II and report back as we have in previous mini-reviews. Meanwhile, download it yourself if you're adventurous.

Eight four two one, twice the cores is (almost) twice as fun

It took a little while but I'm now typing on my second Raptor Talos II workstation, effectively upgrading two years in from a 32GB RAM dual quad-core POWER9 DD2.2 to a 64GB RAM dual octo-core DD2.3. It's rather notable to think about how far we've come with the platform. A number of you have asked about how this changed things in practise, so let's do yet another semi-review.

Again, I say "semi-review" because if I were going to do this right, I'd have set up both the dual-4 and the dual-8 identically, had them do the same tasks and gone back if the results were weird. However, when you're buying a $7000+ workstation you economize where you can, which means I didn't buy any new NVMe cards, bought additional rather than spare RAM, and didn't buy another GPU; the plan was always to consolidate those into the new machine and keep the old chassis, board and CPUs/HSFs as spares. Plus, I moved over the case stickers and those totally change the entire performance characteristics of the system, you dig? We'll let the Phoronix guy(s) do that kind of exacting head-to-head because I pay for this out of pocket and we've all gotta tighten our belts in these days of plague. Here's the new beast sitting beside me under my work area:

(By the way, I'm still taking candidates for #ShowUsYourTalos. If you have pictures uploaded somewhere, I'll rebroadcast them here with your permission and your system's specs. Blackbirds, POWER8s and of course any other OpenPOWER systems welcome. Post in the comments.)

This new "consolidated" system has 64GB of RAM, Raptor's BTO option Radeon WX7100 workstation GPU and two NVMe main drives on the current Talos II 1.01 board, running Fedora 31 as before. In normal usage the dual-8 runs a bit hotter than the dual-4, but this is absolutely par for the course when you've just doubled the number of processors onboard. In my low-Earth-orbit Southern California office the dual-4's fastest fan rarely got above 2100rpm while the dual-8 occasionally spins up to 2300 or 2600rpm. Similarly, give or take system load, the infrared thermometer pegged the dual-4's "exhaust" at around 95 degrees Fahrenheit; the dual-8 puts out about 110 F. However, idle power usage is only about 20W more when sitting around in Firefox and gnome-terminal (130W vs 110W), and the idle fan speeds are about the same such that overall the dual-8 isn't appreciably louder than the very quiet dual-4 was with the most current firmware (with the standard Supermicro fan assemblies, though I replaced the dual-4's PSUs with "super-quiets" a while back and those are in the dual-8 now too).

Naturally the CPUs are the most notable change. Recall that the Sforza "scale out" POWER9 CPUs in Raptor family workstations are SMT-4, i.e., each core offers four hardware threads, which appear as discrete CPUs to the operating system. My dual-4 appeared to be a 32 CPU system to Fedora; this dual-8 appears to have 64. These threads come from "slices," and SMT-4 cores have four which are paired into two "super-slices." They look like this:

Each slice has a vector-scalar unit and an address generator feeding a load-store unit. The VSU has 64-bit integer, floating point and vector ALU components; two slices are needed to get the full 128-bit width of a VMX vector, hence their pairing as super-slices. The super-slices do not have L1 cache of their own, nor do they handle branch instructions or branch prediction; all of that is per-core, which also does instruction fetch and dispatch to the slices. (This has some similar strengths and pitfalls to AMD Ryzen, for example, which also has a single branch unit and caches per core, but the Ryzen execution units are not organized in the same fashion.) The upshot of all this is that certain parallel jobs, especially those that may be competing for scarcer per-core resources like L1 cache or the branch unit, may benefit more from full cores than from threads and this is true of pretty much any SMT implementation. In a like fashion, since each POWER9 slice is not a full vector unit (only the super-slices are), heavy use of VMX would soak up to twice the execution resources though amortized over the greater efficiency vector code would offer over scalar.

The biggest task I routinely do on my T2 is frequent smoke-test builds of Firefox to make sure OpenPOWER-specific bugs are found before they get to release. This was, in fact, where I hoped I would see the most improvement. Indeed, a fair bit of it can be run in parallel, so if any of my typical workloads would show benefit, I felt it would likely be this one. Before I tore down the dual-4 I put all 64GB of RAM in it for a final timing run to eliminate memory pressure as a variable (and the same sticks are in the dual-8, so it's exactly the same RAM in exactly the same slot layout). These snapshots were done building the Firefox 75 source code from mozilla-release (current as of this writing) with my standard optimized .mozconfig, varying only in the number of jobs specified. I'm only reporting wall time here because frankly that's the only thing I personally cared about. All build runs were done at the text console before booting X and GNOME to further eliminate variability, and I did three runs of each configuration back to back (./mach clobber && ./mach build) to account for any caching that might have occurred. Power was measured at the UPS. Default Spectre and Meltdown mitigations were in effect.

Dual-4 (-j24)

average draw 170W

Dual-8 (-j48)

average draw 230W

Dual-8 (-j24)

average draw 230W

The dual-8 is approximately 40% faster than the dual-4 on this task (or, said another way, the dual-4 was about 1.6x slower), but doubling the number of make processes from my prior configuration didn't seem to yield any improvement despite being well within the 64 threads available. This surprised me, so given that the dual-8 has 16 cores, I tried 16 processes directly:

Dual-8 (-j16)

average draw 215W

This proves, at least for this workload, that SMT does make some difference, just not as much as I would have thought. It also argues that the sweet spot for the dual-4 might have been around -j12, but I'm not willing to tear this box back down to try it. Still, cutting down my build times by over 10 minutes is nothing to sneeze at.
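
If you'd like to repeat this sort of sweep on your own machine, the back-to-back timing can be sketched as below. The helper name is invented; it measures whole seconds with date +%s, which is not strictly POSIX but is available on Linux and the BSDs.

```shell
#!/bin/sh
# time_runs COUNT CMD [ARGS...]: run CMD COUNT times back to back and print
# the wall-clock seconds each run took. Command output is discarded.
time_runs() {
    count=$1; shift
    run=1
    while [ "$run" -le "$count" ]; do
        start=$(date +%s)
        "$@" >/dev/null 2>&1
        end=$(date +%s)
        echo "run $run: $((end - start))s"
        run=$((run + 1))
    done
}

# Example sweep, left commented out because it takes hours. It assumes the
# job count is passed with -j on the mach command line; setting it in the
# .mozconfig instead would work just as well.
# for j in 16 24 48; do
#     echo "-j$j"
#     time_runs 3 sh -c "./mach clobber && ./mach build -j$j"
# done
```

Three runs per configuration matches what I did above; averaging them is left to you and your calculator of choice.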

For other kinds of uses, though, I didn't see a lot different in terms of performance between DD2.2 and DD2.3 and to be honest you wouldn't expect to. DD2.3 does have improved Spectre mitigations and this would help the kind of branch-heavy code that would benefit least from additional slices, but the change is relatively minor and the difference in practise indeed seemed to be minimal. On my JIT-accelerated DOSBox build the benchmarks came in nearly exactly the same, as did QEMU running Mac OS 9. Booted into GNOME as I am right now, the extra CPU resources certainly do smooth out doing more things at once, but again, that's of course more a factor of the number of cores and slices than the processor stepping.

Overall I'm pretty pleased with the upgrade, and it's a nice, welcome boost that improves my efficiency further. Here are my present observations if you're thinking about a CPU upgrade too (or are a first time buyer considering how much you should get):

  • Upgrading is pretty easy with these machines: if you bought too little today, you can always drop in a beefier CPU or two tomorrow (assuming you have the dough and the sockets), and the Self-Boot Engine code is generic such that any Sforza POWER9 chip will work on any Raptor system that can accommodate it. I have been repeatedly assured no FPGA update is needed to use DD2.3 chips, even for "O.G." T2 systems. However, if you're running a Blackbird, you should think about case and cooling as well because the 8-core will run noticeably hotter than a 4-core. A lot of the more attractive slimline mATX cases are very thermally constrained, and the 8-core CPU really should be paired with the (3U) high speed fan assembly rather than the (2U) passive heatsink. This is a big reason why my personal HTPC Blackbird is still a little 4-core system.

  • The 4 and 8-core chips are familiar to most OpenPOWER denizens but the 18 and 22-core monsters have a complicated value proposition. People have certainly run them in Blackbirds; my personal thought is this is courting an early system demise, though I am impressed by how heroic some of those efforts are. I wouldn't myself, however: between the thermal and power demands you're gonna need a bigger boat for these sharks.

    The T2 is designed for them and will work fine, but after my experience here one wonders how loud and hot they would get in warmer environments. Plus, you really need to fill both of those sockets or you'll lose three slots (those serviced by the second CPU's PCIe lanes), which would make them even louder and hotter. The dual-8 setup here gets you 16 cores and all of the slots turned on, so I think it's the better workstation configuration even though it costs a little more than a single-18 and isn't nearly as performant. The dual-18 and dual-22 configurations are really meant for big servers and crazy people.

    With the T2 Lite, though, these CPUs make absolute sense and it would be almost a waste to run one with anything less. The T2 Lite is just a cut-down single-socket T2 board in the same form factor, so it will also easily accommodate any CPU but more cheaply. If you need the massive thread resources of a single-18 (72 thread) or single-22 (88 thread) workstation, and you can make do with an x16 and an x8 slot, it's really the overall best option for those configurations and it's not that much more than a Blackbird board. Plus, being a single CPU configuration it's probably a lot more liveable under one's desk.

  • Simply buying a DD2.3 processor to replace a DD2.2 processor of the same core count probably doesn't pay off for most typical tasks. Unless you need the features (and there are some: besides the Spectre mitigations, it also has Ultravisor support and proper hardware watchpoints), you'll just end up spending additional money for little or no observable benefit. However, if you're going to buy more cores at the same time, then you might as well get a DD2.3 chip and have those extra features just in case you'll need them later. The price difference is almost certainly worth a little futureproofing.

What to do when the BMC won't talk to you

Normally I'm typing TenFourFox posts from the Talos II, but today I'm typing a Talospace post from the old Quad G5. When my dual-8 second T2 arrived, it turned out the onboard NICs were duds (I'm not blaming Raptor for this, mind you; crib death happens with any system). When I started moving components off it to RMA the motherboard, however, one of the RAM sticks got jammed in my old T2 and when I tried to free it, the latch not only snapped off but with such force that it struck the NIC ROM chip and knocked one of the surface mount resistors connected to it off. I will simply say that the cat fled behind the washing machine and didn't come out for hours after I realized what had happened.

That untimely accident thus left me with two systems with dead network ports, and the BMC, the heartbeat of your OpenPOWER machine, the breath of life, the starter, the prime mover, only talks on one of them. Sure, I could plug in an (ugh) USB Ethernet dongle (protip: the old TrendNET USB NIC I dug out of storage doesn't work in Petitboot, so I had to unplug and replug it when Fedora booted), or waste a PCIe slot on another NIC or whathaveyou, but none of those things restores connectivity to the BMC that for security reasons is limited to that top onboard port. The BMC is now an island unto itself, never to be updated or talked to again.

Or ... is it? The BMC fortunately has a serial port. With an easily available 10-pin to DE-9 cable which you can get on eBay for a few bucks, you can talk to your BMC again.

There's no nice web interface (I'll have a later post about how to use TenFourFox and its built-in AppleScript-to-JavaScript bridge to support its later features if you're a Power Mac refugee like me) and everything is strictly by shell, but if you were already used to talking to your BMC over SSH then this won't seem at all unfamiliar. Here it is connected to a USB serial dongle back to the G5.

In ZTerm on this G5 I set the serial connection to 115200bps, 8-N-1. Assuming your system is running the 2.00 system package or later, hit RETURN/ENTER a couple times and you should get a "Phosphor" login prompt; log in as root and your BMC password. You'll get to the root prompt just as you would over SSH.
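If your assisting computer is a Unix-like box rather than a Mac running ZTerm, the same 115200bps 8-N-1 settings can be applied with Python's standard termios module. This is only a sketch: the device path in the usage comment (/dev/ttyUSB0) is an assumption about where your USB serial dongle shows up, and the function names are made up for illustration.

```python
import os
import termios
import tty


def eight_n_one(attrs, baud=termios.B115200):
    """Return a copy of a termios attribute list set to 8 data bits, no parity, 1 stop bit."""
    iflag, oflag, cflag, lflag, _ispeed, _ospeed, cc = attrs
    # Clear the character size, parity and two-stop-bit settings...
    cflag &= ~(termios.CSIZE | termios.PARENB | termios.CSTOPB)
    # ...then select 8 data bits, ignore modem control lines, enable the receiver.
    cflag |= termios.CS8 | termios.CLOCAL | termios.CREAD
    return [iflag, oflag, cflag, lflag, baud, baud, list(cc)]


def open_console(device):
    """Open a serial device in raw mode at 115200 8-N-1 and return its file descriptor."""
    fd = os.open(device, os.O_RDWR | os.O_NOCTTY)
    tty.setraw(fd)
    termios.tcsetattr(fd, termios.TCSANOW, eight_n_one(termios.tcgetattr(fd)))
    return fd


# Hypothetical usage; the device name depends on your dongle and OS:
# fd = open_console("/dev/ttyUSB0")
# os.write(fd, b"\r\r")        # hit RETURN a couple of times
# print(os.read(fd, 256))      # should contain the login prompt
```

In practice any terminal program that lets you set the speed and framing will do the same job; this just shows which knobs matter.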

That suffices for logging in and changing your password if necessary, but what about updating the firmware? Fortunately the clever people who maintain OpenBMC, the standard firmware in Raptor and most other OpenPOWER machines, added rz and sz to the build. That means you're just a ZMODEM upload away from updating your BMC and PNOR! Just go into the correct directory (/tmp for PNOR, /run/initramfs for the BMC firmware), run rz from the shell, and then start a ZMODEM upload from the terminal program on your assisting computer (to update the BMC, you'll upload both image-kernel and image-rofs, and to update the PNOR, the PNOR file under whatever name it has). Some terminal programs, or the sz utility if you use it on the client side, will allow you to specify an absolute path. Either way, once it's there, you can then finish the update the usual way.
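To keep the two cases straight, here is the upload procedure written out as a small Python lookup. It only prints a checklist and doesn't talk to the BMC; the directories and BMC image names come from the paragraph above, while the function name and the example PNOR filename are placeholders.

```python
# Where each kind of image goes on the BMC, per the description above.
UPLOAD_TARGETS = {
    "bmc": {"directory": "/run/initramfs", "files": ["image-kernel", "image-rofs"]},
    "pnor": {"directory": "/tmp", "files": None},  # filename varies; caller supplies it
}


def upload_plan(target, pnor_file=None):
    """Return the ordered steps for getting an update onto the BMC over serial."""
    spec = UPLOAD_TARGETS[target]
    files = spec["files"] if spec["files"] is not None else [pnor_file]
    if None in files:
        raise ValueError("a PNOR upload needs the name of the PNOR file")
    steps = [f"on the BMC: cd {spec['directory']}", "on the BMC: rz"]
    steps += [f"from the terminal program: ZMODEM-send {name}" for name in files]
    return steps


for step in upload_plan("bmc"):
    print(step)
# Hypothetical PNOR filename, purely as a placeholder:
for step in upload_plan("pnor", pnor_file="example.pnor"):
    print(step)
```

The final flashing step is deliberately left out, since it is the same as for an update delivered over the network.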

The Blackbird has such a port as well and uses the same cable as the T2 and T2 Lite. You can even leave the cable connected and this will allow you to access the BMC at any time. Note that rebooting the BMC while the main processors are still running may cause your machine to act a little funny until they're back in sync.

As a postscript, those SMT resistors are damn small, and I'm pretty sure it's not right on the pads anyway assuming I didn't damage something else in the process. But the RMA board did arrive and does work, so I'm rebuilding the new system and we'll talk about whether the jump from a dual-4 to a dual-8 T2 is worth it when I do. The old system still works otherwise and at least with that serial cable and a quad-port PCIe NIC, it may be one slot less but I can still put it to good use.

Firefox 75 on POWER

Firefox 75 seems to build uneventfully on this Raptor Talos II and as always this post is being typed in the new version. I'm not particularly enamoured of the zooming address bar and I'm sure you won't be able to turn it off eventually, but for now you can. A number of the developer-facing features are quite compelling, though. In addition, if you're on Wayland (Xorg forever), Firefox on Wayland now has H.264 VA-API and full WebGL support; I don't know how well these work on Wayland on ppc64le and I'm not going to be the one to tell you, but I'm sure some of you folks will try.

Total build time for opt on this dual-4 DD2.2 system was 36:36.67 (with -j24). Why mention it? Well, my dual-8 DD2.3 is almost here and this sounds like a convenient real-world benchmark to try out on the new box. I'm thinking -j48 sounds nice and still gives me a whole 16 threads for serious business during the compile.

The opt and debug .mozconfigs I'm using are unchanged from Firefox 67.