Showing posts from July, 2020

Condor cancelled

Raptor has confirmed that, unfortunately but not unexpectedly, the LaGrange-based Condor that was announced at the OpenPOWER summit last year has been cancelled due to economic concerns. Certainly any new high-end product would be tough to launch in the present COVID-19 economy, and because its size (ATX) and capabilities (single CPU, OpenCAPI, four slots) would have slotted it between the Talos II and the Talos II Lite in our view, there just isn't a lot of slack not served by those two existing products to soak up. It's probably just as well because I think getting ready for POWER10 would mean more to many users (it certainly would to me), but that itself requires a lot of R&D capacity and Raptor's a small company. Rather than a niche POWER9 design, here's hoping the resources that would have gone to Condor will go to a really kick-a$$ new Rainier-based system instead.

Firefox 79 on POWER

Firefox 79 is out. There are many new web and developer-facing features introduced in this version, of which only a couple are of note to us in 64-bit PowerPC land specifically. The first is a migration of WebExtensions storage to a new Rust-based implementation; there is a bit of a pause while extension storage migrates, so don't panic if the browser seems to stall out for a few long seconds on first run. The second is a further rollout of WebRender to more Windows configurations, so this seemed like a good time to check again how well it's working on this side of the fence. With the Raptor BTO WX7100 installed in this Talos II, I've forced it on with gfx.webrender.enabled and layers.acceleration.force-enabled both set to true (restart the browser after) and worked with it all afternoon with no issues noted, so this time I'm just going to leave it on and see how it goes. Any GCN-based AMD video card from Southern Islands on up (the WX7100 is Polaris) should work. about:support will show you if WebRender and hardware acceleration are enabled, though currently no Linux configuration has it enabled by default.

Unfortunately, it turns out relatively few of us build the browser from source as I do, and it seems some distros are enabling features (most likely higher-level optimizations) that trigger broken builds on ppc64le (Ubuntu was mentioned by at least one user). It would be nice to narrow down the offending feature(s) they enabled, both to get local fixes into the distro package configurations and then to look at why they don't work (or make the default not to enable them on our platform, solving the problem in both places). I suspect LTO and PGO, which have a long history of being troublesome, are to blame, along with various defects in gold (use GNU bfd as the linker instead). Meanwhile, the build I'm typing this blog post into locally is still happily running on the same .mozconfigs from Firefox 67.

The littlest POWER9 booter

In our previous article we talked about emulating an OpenPOWER system from Skiboot up through the Petitboot boot menu using extracts from pre-built PNOR firmware images (and/or QEMU) instead of having to build your own. Well, what if you want to build your own?

You can certainly download and build Skiboot and Skiroot/Petitboot from scratch, or naturally any of the firmware stages in PNOR flash since we're a fully open platform, and there is an entire (huge) build system to automate this process. It's big and intimidating to the uninitiated, and it also works just dandy. But for this simpler example, let's start with something a little smaller which can serve as an educational tool as well.

Recall that Skiboot is the lowest level emulated by QEMU presently, although in reality it is an intermediate phase started by an earlier boot stage, i.e., Hostboot (the pretty graphical boot you see in current versions of the Raptor firmware). Among its tasks, Skiboot's most important one is to offer the services provided by OPAL, the OpenPOWER Abstraction Layer, which the operating system will need to talk to the hardware. These services range from shutting down the machine to writing to the console, starting interrupts, handling PCI devices and probably not doing your dishes. After OPAL is initialized, Skiboot then starts the bootloader for Petitboot (a zImage containing Skiroot), which unpacks Petitboot's Linux kernel and an initrd, and that image is what ultimately brings up Petitboot.

However, when you get right down to it, that bootloader is still just an ELF binary, so we can replace it as long as we understand how Skiboot calls and starts it.

Up to this point the CPU is in big-endian mode no matter what the terminal operating system is (as an old Power Mac user, this warms my grizzled cybernetic heart) and uses real physical memory addresses. When Skiboot finishes, it loads the single ELF binary stored in the PNOR flash partition BOOTKERNEL and runs it from its given entry point. This binary can be big-endian or little-endian. Skiboot also provides the binary the location of the flattened device tree (the FDT) in register r3, and two special addresses: the base address for OPAL in r8 (in physical memory, mind you), and the actual address to call for OPAL services in r9. This is more or less what kexec() does for a regular kernel, except those registers are guaranteed to be provided by Skiboot no matter the implementation.

OPAL calls assume the machine will stay that way (big-endian, real addresses, and also no external interrupts), so some leg work is required unless you just keep the system that way in the first place. In this simplest case, we'll do exactly that: the Skiboot source code even includes such a minimal boot image which simply says "Hello World!" to the console and shuts down the machine. Here, we see the code save the OPAL registers to non-volatile ones (so that calling OPAL won't clobber them) and use those to make the two OPAL calls themselves, setting the OPAL call number in r0, providing the OPAL base in r2 and any relevant arguments in the standard r3 through r10 registers, and then calling the OPAL entry point.

Let's see it in (brief) action. I will assume you already have QEMU set up to emulate an OpenPOWER machine as in the prior article (in particular, you should have either pnor.PAYLOAD or skiboot.lid available to provide Skiboot). To save you having to do so yourself, I added a little linker-assembler glue, some extra code to support both endian modes (more in a moment) and a trivial build system, and put it up on GitHub. If you're on an OpenPOWER system, as all right-thinking readers should be, then make should be sufficient to compile both the big-endian and little-endian versions, the latter of which I will come back to. If you are not, you will need a cross-building toolchain and should edit the Makefile to point to it.

Using what we learned last time, once you've run make, copy be_payload.elf into the same directory as skiboot.lid (QEMU's emulation doesn't work quite right with Raptor's PNOR Skiboot for this purpose), and let's kick it off:

qemu-system-ppc64 -M powernv8 -cpu power8 \
-nographic \
-bios ./skiboot.lid \
-kernel ./be_payload.elf


Now, what about the little-endian case? This is trickier, because the system starts big-endian and expects big-endian instructions, and simply twiddling the endian bit in the Machine State Register isn't enough (if you do so via typical means like mtmsrd, it is ignored). In fact, only three instructions are allowed to change endianness, namely rfid, its hypervisor analogue hrfid and rfscv, which are all returns from privileged code (interrupt handlers and vectored system calls respectively). Vectored system calls, in fact, weren't even supported in the Linux kernel until 5.9. For our purpose here rfid will suffice.

Let's look at the version of hello_kernel.S I marked up. You will notice that in little-endian mode, we are emitting several hand-assembled opcodes directly in the macro GO_LITTLE_ENDIAN. These are big-endian instructions (since the assembler is in little-endian mode, we can't write them as ordinary mnemonics) that point the link register just past this little stanza, copy over the MSR and toggle the endian bit, load the link register and the new MSR into the save-restore registers and then act as if we returned from an interrupt handler (rfid). rfid sets the new MSR from srr1 and jumps to the address in srr0, which we have already rigged via the link register to be the following instruction. We now continue in little-endian mode.
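To make the "handwritten opcodes" idea concrete, here is a minimal Python sketch of how a big-endian instruction word has to be byte-swapped before a little-endian assembler can emit it as data. The three encodings are the standard Power ISA words for those instructions; the choice of r11 as a scratch register and the swap_word helper are purely illustrative assumptions, not necessarily what the marked-up hello_kernel.S uses.

```python
import struct

def swap_word(insn: int) -> int:
    """Return the 32-bit value a little-endian assembler must emit so the
    bytes in memory spell out the given big-endian instruction word."""
    return struct.unpack("<I", struct.pack(">I", insn))[0]

# Big-endian encodings; the register choice (r11) is only for illustration.
BIG_ENDIAN_INSNS = {
    "mfmsr r11": 0x7D6000A6,       # copy the MSR into r11
    "xori r11,r11,1": 0x696B0001,  # toggle the LE bit (lowest MSR bit)
    "rfid": 0x4C000024,            # "return from interrupt"
}

for mnemonic, word in BIG_ENDIAN_INSNS.items():
    # Each line is what you would hand to the little-endian assembler.
    print(f".long 0x{swap_word(word):08x}  # {mnemonic}")
```

Each printed .long is the same four bytes in memory as the big-endian instruction, which is all the CPU cares about while it is still running big-endian.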

Now, how do we do OPAL calls? I abstracted the code here a bit for both situations with an OPAL_CALL macro. Big-endian just sets the registers and jumps to the OPAL entry point, since we're in real mode and no external interrupts are presently enabled, exactly the same as the test code in Skiboot. For little-endian, however, I added a little subroutine at the end called le_opal_call which is nearly the same idea as GO_LITTLE_ENDIAN, but in reverse. We save the MSR and the LR in non-volatile registers, turn off the little-endian bit in the MSR, compute the new return address for the trampoline after the oncoming rfid and load that into LR, set up srr0 and srr1 (but point srr0 to the OPAL entry point instead) and "return from the interrupt."

The OPAL call is thus executed big-endian in real mode. However, when we return following the rfid, we're still big-endian, so we immediately GO_LITTLE_ENDIAN again, restore the old MSR and LR (the LE bit is politely ignored) and return via the link register to the calling routine.

The last trick here is that the length of the string Hello World! will be stored according to the endianness we set for the assembler. If we don't account for this, OPAL (which always runs big-endian) will read a nonsense value for a length that was assembled little-endian, and the OPAL routine that prints a string to the console will spew garbage. When assembling in little-endian mode we thus specify the necessary bytes by hand, in big-endian order.
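A quick Python sketch shows how badly the length goes wrong if it is stored in the wrong byte order. It assumes the length is a 64-bit value and uses the bare string without any trailing newline; both are assumptions for illustration, and the helper names are made up.

```python
import struct

MESSAGE = b"Hello World!"  # illustrative; the real payload string may differ

def le_stored_read_as_be(value: int) -> int:
    """What a big-endian reader sees if the value was stored little-endian."""
    return struct.unpack(">Q", struct.pack("<Q", value))[0]

def be64_byte_directive(value: int) -> str:
    """Spell out a 64-bit value byte by byte in big-endian order, so the
    result is the same no matter which endianness the assembler is in."""
    raw = struct.pack(">Q", value)
    return ".byte " + ",".join(f"0x{b:02x}" for b in raw)

length = len(MESSAGE)
print(length)                             # the length we meant
print(hex(le_stored_read_as_be(length)))  # the length a big-endian reader gets
print(be64_byte_directive(length))        # explicit bytes that are always right
```

Writing the bytes out one at a time sidesteps the assembler's endian setting entirely, which is the point of specifying them by hand.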

After all that,

qemu-system-ppc64 -M powernv8 -cpu power8 \
-nographic \
-bios ./skiboot.lid \
-kernel ./le_payload.elf

A couple parting comments.

First, while you might think this would be sufficient to make something bootable from both Skiboot and Petitboot, it isn't; if you try to boot this as a kernel from Petitboot it will simply hang. We'll explore this further in a later article. Second, I have intentionally not described how you would actually flash this to PNOR on a real machine lest someone screw something up and blame me for it. In broad strokes, however, you would take either of the ELF binaries and turn it into a PNOR flash partition with fpart (not to be confused with other partition and file management utilities of the same name). Having done so, you would transfer this to the BMC and use pflash to replace the contents of BOOTKERNEL, the partition Skiboot loads its payload from (after, hopefully, backing up the previous contents with pflash -r). At this point you may now start your machine so it can, um, shut down.

Finally, this entire exercise brings up an interesting question (to me, anyway): is there a performance ramification to running in little-endian vs big-endian, given the additional necessary overhead of flipping endianness every time OPAL is called? The answer is probably yes, but it's likely negligible in practice unless you're on the bare metal as we are here. Let's compare how little-endian Linux does this in opal-calls.S with big-endian OpenBSD's locore.S; in both listings, scroll down to opal_call and note the differences. Even though the big-endian code doesn't have to do quite as much song and dance setting up a trampoline and switching endianness, it still has to twiddle the MSR (in this case to turn off external interrupts and return to real mode), and a similar amount of instruction synchronization must still occur (using isync; rfid and hrfid do this as a natural consequence). From a practical perspective, unless you have some pathological case that makes lots of OPAL calls back to back, the few extra instructions required are probably below the noise threshold when considering everything else that affects performance in modern operating systems.
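As a rough sketch of the MSR twiddling involved, the Python below computes the MSR value a caller would need before entering OPAL: external interrupts off, address translation off, and big-endian. The bit positions follow the Power ISA; the function name and the sample MSR value are hypothetical, and neither the Linux nor the OpenBSD code is being reproduced here.

```python
# Power ISA MSR bits, counted from the least significant bit.
MSR_LE = 1 << 0   # little-endian mode
MSR_DR = 1 << 4   # data address translation
MSR_IR = 1 << 5   # instruction address translation
MSR_EE = 1 << 15  # external interrupts enabled

def msr_for_opal_call(msr: int) -> int:
    """Hypothetical helper: clear the bits OPAL expects to be off.
    For a big-endian caller LE is already clear, so the only extra work a
    little-endian caller does here is one more bit in the mask."""
    return msr & ~(MSR_EE | MSR_IR | MSR_DR | MSR_LE)

# Sample MSR for a little-endian kernel with translation and interrupts on
# (an assumed value chosen for illustration only).
sample = 0x9000000000009033
print(hex(msr_for_opal_call(sample)))
```

The mask is nearly the same either way, which is consistent with the argument that the endian flip adds only a handful of instructions per call.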

When will OpenPOWER OpenBSD be now? Now.

We were delighted by the tease that OpenBSD is moving to OpenPOWER (although it is officially big-endian powerpc64, it requires OPAL, so a POWER8 is minimally required). Well, now you can try it out: a powerpc64 snapshot is now available with most of the standard binary distribution sets. The installation documentation is pretty much copy-pasta (I doubt very much that 64-bit PowerPC is supported on AMD Opteron, and I would be impressed to learn that the Pinebook Pro is OpenPOWER), but you should be able to boot from the miniroot (flash it to a USB drive using dd bs=1m) and manually set up and copy the file sets over. Curiously, the X11 distribution sets do not appear to be built yet, so you may be restricted to a text boot and/or the serial console. When I get my spare Talos II back up and running I intend to give this a full shakedown, since this would be a great basis for finally having NetBSD on OpenPOWER (my personal BSD of choice). The fact that NetBSD doesn't run here right now is a great shame for an OS that is supposed to run everywhere but doesn't on one of the most open platforms anywhere.