Showing posts from 2021

Firefox 89 on POWER


Firefox 89 was released last week with much fanfare over its new interface, though being the curmudgeon I am I'm less enamoured of it. I like the improvements to menus and doorhangers but I'm a big user of compact tabs, which were deprecated, and even with compact mode surreptitiously enabled the tab bar is still about a third or so bigger than Firefox 88 (see screenshot). There do seem to be some other performance improvements, though, plus the usual lower-level changes, and WebRender is now on by default for all Linux configurations, including for you fools out there trying to run Nvidia GPUs.

The chief problem is that Fx89 may not compile correctly with certain versions of gcc 11 (see bugs 1710235 and 1713968). Fedora users, if you aren't on 11.1.1-3 (the current version as of this writing) you won't be able to compile the browser at all, and you may not be able to compile it fully even then without putting a #pragma GCC diagnostic ignored "-Wnonnull" at the top of js/src/builtin/streams/PipeToState.cpp (I still can't; see bug 1713968). gcc 10 is unaffected. I used the same .mozconfigs and PGO-LTO optimization patches as we used for Firefox 88. With those changes the browser runs well.
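If you'd rather not edit the file by hand each time you refresh the tree, the pragma workaround can be scripted. This is only a sketch: the helper name prepend_pragma is made up for illustration, it assumes you run it from the top of the Firefox source tree, and it does nothing more than put the line quoted above at the top of the file (skipping the step if it is already there).

```shell
# Hypothetical helper: put the -Wnonnull pragma at the top of a source file.
prepend_pragma() {
  file="$1"
  pragma='#pragma GCC diagnostic ignored "-Wnonnull"'

  # Re-running should be harmless, so stop if the pragma is already line 1.
  if [ "$(head -n 1 "$file")" = "$pragma" ]; then
    return 0
  fi

  tmp="$(mktemp)" || return 1
  # Write pragma + original contents to a temp file, then copy back over
  # the original (cat > file keeps the original file's permissions).
  { printf '%s\n' "$pragma"; cat "$file"; } > "$tmp" && cat "$tmp" > "$file"
  status=$?
  rm -f "$tmp"
  return "$status"
}

# Usage, from the top of the source tree:
#   prepend_pragma js/src/builtin/streams/PipeToState.cpp
```

Whether this is enough to get a full build depends on your gcc version, as noted above.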

While waiting for the updated gcc I decided to see if clang/clang++ could now build the browser completely on ppc64le (it couldn't before), even though gcc remains my preferred compiler as it generates higher performance objects. The answer is that now it can, merely by substituting clang for gcc in the .mozconfig, but even using the bfd linker it makes a defective Firefox that freezes or crashes outright on startup; it could not proceed to the second phase of PGO-LTO and the build system aborted with an opaque error -139. So much for that. For the time being I think I'd rather spend my free cycles on the OpenPOWER JavaScript JIT than figuring out why clang still sucks at this.

Some of you will also have noticed the Mac-style pulldown menus in the screenshot, even though this Talos II is running Fedora 34. This comes from firefox-appmenu, which since I build from source is trivial to patch in, and the Fildem global menu GNOME extension (additional tips) paired with my own custom gnome-shell theme. I don't relish adding another GNOME extension that Fedora 35 is certain to break, but it's kind of nice to engage my Mac mouse-le memory and it also gives me a little extra vertical room. You'll notice the window also lacks client-side decorations since I can just close the window with key combinations; this gives me a little extra horizontal tab room too. If you want that, don't apply this particular patch from the firefox-appmenu series and just use the other two .patches.

Progress on the OpenPOWER SpiderMonkey JIT


Progress!

% gdb --args obj/dist/bin/js --no-baseline --no-ion --no-native-regexp --blinterp-eager -e 'print("hello world")'
GNU gdb (GDB) Fedora 10.1-14.fc34
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "ppc64le-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from obj/dist/bin/js...
(gdb) run
Starting program: obj/dist/bin/js --no-baseline --no-ion --no-native-regexp --blinterp-eager -e print\(\"hello\ world\"\)
warning: Expected absolute pathname for libpthread in the inferior, but got .gnu_debugdata for /lib64/libpthread.so.0.
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
[New LWP 2797069]
[LWP 2797069 exited]
[New LWP 2797070]
[New LWP 2797071]
[New LWP 2797072]
[New LWP 2797073]
[New LWP 2797074]
[New LWP 2797075]
[New LWP 2797076]
[New LWP 2797077]
hello world
[LWP 2797072 exited]
[LWP 2797070 exited]
[LWP 2797074 exited]
[LWP 2797077 exited]
[LWP 2797073 exited]
[LWP 2797071 exited]
[LWP 2797076 exited]
[LWP 2797075 exited]
[Inferior 1 (process 2797041) exited normally]

This may not look like much, but it demonstrates that the current version of the OpenPOWER JavaScript JIT for Firefox can emit machine language instructions correctly (mostly — still more codegen bugs to shake out), handles the instruction cache correctly, handles ABI-compliant calls into the SpiderMonkey VM correctly (the IonMonkey JIT is not ABI-compliant except at those edges), and enters and exits routines without making a mess of the stack. Much of the code originates from TenFourFox's "IonPower" 32-bit PowerPC JIT, though obviously greatly expanded, and there is still ongoing work to make sure it is properly 64-bit aware and takes advantage of instructions available in later versions of the Power ISA. (No more spills to the stack to convert floating point, for example. Yay for VSX!)

Although it is only the lowest level of the JIT, what Mozilla calls the Baseline Interpreter, there is substantial code in common between the Baseline Interpreter and the second-stage Baseline Compiler. Because it has much less overhead compared to Baseline Compiler and to the full-fledged Ion JIT, the Baseline Interpreter can significantly improve page loads all by itself. In fact, my next step might be to get regular expressions and the OpenPOWER Baseline Interpreter to pass the test suite and then drag that into a current version of Firefox for continued work so that it can get banged on for reliability and improve performance for those people who want to build it (analogous to how we got PPCBC running first before full-fledged IonPower in TenFourFox). Eventually full Ion JIT and Wasm support should follow, though those both use rather different codepaths apart from the fundamental portions of the backend which still need to be shaped.

A big shout-out goes to Justin Hibbits, who took TenFourFox's code and merged it with the work I had initially done on JitPower way back in the Firefox 62 days but was never able to finish. With him having done most of the grunt work, I was able to get it to compile and then started attacking the various bugs in it.

Want to contribute? It's on GitHub. Tracking down bugs is labour-intensive, and involves a lot of emitting trap instructions and single-stepping in the debugger, but when you see those small steps add up into meaningful fixes (man, it was great to see those two words appear) it's really rewarding. I'm happy to give tips to anyone who wants to participate. Once it can pass the test suite at some JIT level, it will be time to forward-port it and if we can get our skates on it might even be possible to upstream it into the next Firefox ESR.

For better or worse, the Web is a runtime. Let's get OpenPOWER workstations running it better.

Tonight's game on OpenPOWER: Blake Stone Aliens of Gold


Everything is awful, so now that we've rescued a Blackbird let's go shoot more aliens. One of the more entertaining games based on id's Wolfenstein 3D engine was Apogee's Blake Stone: Aliens of Gold from 1993, developed by yet another set of apparent refugees from Softdisk, but that's another story for another day. It kept the basic formula but added subtle lighting effects and ceiling and floor textures, along with more varied environments, lots of (cartoony but fun to shoot) monsters, and a very helpful automap.

Ordinarily this wouldn't be worth a mention; that's what we have DOSBox for (see our article on adding an OpenPOWER JIT to DOSBox). However, although DOSBox does support this game and I do actually own a retail copy of Blake Stone from back in the day, it runs like dried snot even with a JIT.

Fortunately, the source code to Blake Stone was released back in the day as well after it was long believed to be lost, and an excellent SDL/OpenGL port called BStone is available which adds many graphical improvements, mouse look (well, side to side, anyway), and 16:9 modes as demonstrated in the screenshot. It also supports the IMHO inferior sequel, Planet Strike.

To start saving mankind, you can play the shareware version, but it's more fun to play with a retail copy (mine is the 1994 FormGen release, but the one you can still buy from Apogee will work), or extract the game assets from the DOSBox-based GOG installer. The CD or 3D Realms downloads are easiest to work with, as you can just copy the contents into a folder.

Clone the BStone GitHub project. You will need CMake and SDL 2.0.4 (or better) development headers and libraries. The CMake build recipe assumes that your SDL has a static libSDL2main.a library, which apparently the packages from Fedora, Slackware and possibly others lack, so you may need to modify the SDL CMake component that comes with it (I had to). Then mkdir build, cd build and cmake .. to kick it off.

Once built you can start the game either from the directory where your Blake Stone files are (I have cd ~/stuff/dosbox/BSTONE && ~/src/bstone/build/src/bstone), or pass bstone the --data_dir option with a path (if it fails to detect the correct game, try passing --aog, --aog_sw or --ps). If you don't have OpenAL-capable hardware, disable OpenAL sound from the in-game configuration menu, or you may get random crashes during play. Don't shoot the informants.
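The two launch styles above can be wrapped in a small shell function. This is a sketch under assumptions: run_bstone and BSTONE_BIN are names invented here, the default binary path is simply the build location used in the example above, and the only options it knows about are the ones already mentioned (--data_dir, --aog, --aog_sw and --ps).

```shell
# Path to the built binary; override BSTONE_BIN if yours lives elsewhere.
BSTONE_BIN="${BSTONE_BIN:-$HOME/src/bstone/build/src/bstone}"

# Hypothetical wrapper: run_bstone DATA_DIR [aog|aog_sw|ps]
run_bstone() {
  data_dir="$1"
  game="${2:-}"
  case "$game" in
    "")
      # Let bstone detect the game from the files in DATA_DIR.
      "$BSTONE_BIN" --data_dir "$data_dir"
      ;;
    aog|aog_sw|ps)
      # Force a specific game if detection picks the wrong one.
      "$BSTONE_BIN" --data_dir "$data_dir" "--$game"
      ;;
    *)
      echo "usage: run_bstone DATA_DIR [aog|aog_sw|ps]" >&2
      return 2
      ;;
  esac
}

# Usage:
#   run_bstone ~/stuff/dosbox/BSTONE
#   run_bstone ~/stuff/dosbox/BSTONE aog
```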

Fedora 34 mini-review on the Blackbird and Talos II (it sucks)


Once again it's time to upgrade Floodgap's stable of Raptor systems to the latest release of Fedora, which is up to version 34 (see our prior review of Fedora 33). You may not necessarily run Fedora yourself, but the fact that it does run is important, because it tends to be well ahead of most distros and many problems are identified in it and fixed before moving to other less advanced ones. And boy howdy, are there problems this time. I'm going to get it over with and tl;dr myself right now: if you use GNOME as your desktop environment and you haven't upgraded yet, DON'T. F34 and in particular GNOME 40 are half-baked, and the problems don't seem specific to OpenPOWER and the hard work of folks like Dan Horák; these issues are more generalized. There is always that sense of dread over what's going to break during the update, and while I'm finally typing in Firefox on this updated Talos II, it took me hours to get everything glued back together and the desktop performance problems in particular are cramping my ability to use the system well. Fedora 33 will still be supported until a month after F35 comes out; it may be worth sticking with F33 for a couple months for the GNOME team to work on the remaining performance issues.

The problems started from the very beginning, even before actually updating. I do my updates initially on the Blackbird to shake out any major problems before doing it to my daily driver T2. As I explained previously, neither the Blackbird nor the T2 uses gdm; they both boot to a text prompt, and we jump to GNOME with startx (or XDG_SESSION_TYPE=wayland dbus-run-session gnome-session if we want to explore the Wayland Wasteland). I do the upgrade at the text prompt so that there is minimal chance of interference. Our usual MO to update Fedora is, as root,

dnf upgrade --refresh # upgrade prior system and DNF
dnf install dnf-plugin-system-upgrade # install upgrade plugin if not already done
dnf system-upgrade download --refresh --releasever=34 # download F34 packages
dnf system-upgrade reboot # reboot into upgrader

If you do this with F34, however, you get a number of downgrades (unavoidable, apparently), missing groups and an instant conflict with iptables when you try to download the packages.

dnf suggests we add --best --allowerasing to deal with that. It doesn't work.

Neither does adding --skip-broken. The non-obvious solution is dnf system-upgrade download --refresh --releasever=34 --allowerasing, and just ignoring the duff package.

The Blackbird does not have a GPU; all video output is on the ASPEED BMC (using the Blackbird's HDMI port). Ordinarily I would select the new kernel from Petitboot when it restarts after the final command above to see a text log of the installation, but this time we get an actual graphical install screen.

After the installation completed, the machine rebooted uneventfully and came up to the text prompt. I entered startx as usual and ...

At this point GNOME just plain hung up. There was no mouse pointer, though pressing ENTER on the keyboard triggered the button and put it back to the text prompt. Nothing unusual was in the Xorg logs, and journalctl -e showed only what seemed like a non-fatal glitch (Window manager warning: Unsupported session type). Well, maybe the time for the Wayland Wasteland was now. I did an exec bash (gnome-session doesn't properly handle using another shell, or you get weird errors like Unknown option: -l because it tries to be cute with the options to exec) and XDG_SESSION_TYPE=wayland dbus-run-session gnome-session, and Wayland does start.

However, it still doesn't support 1920x1080 on the Blackbird on-board HDMI, just 1024x768. It also seemed a little sluggish with the mouse. I exited it and tried to start gnome-session --debug --failsafe but it wouldn't initialize.

It then dawned on me that I was setting XDG_SESSION_TYPE manually for Wayland; I previously left it unset for X11. Setting XDG_SESSION_TYPE to x11 finally brought up GNOME 40 in X with a full 1080p display.

I put that into my .cshrc and that was one problem solved. The Applications drawer seemed a little slower to come up, though I have a very vanilla installation on this Blackbird on purpose and few apps are loaded, so I didn't try scrolling through the list or running lots of applications at once. (More on that in a moment.)
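For csh users the .cshrc line would be setenv XDG_SESSION_TYPE x11. If you use a Bourne-style shell instead, something like the following sketch does the same job from the text console. The function name start_gnome_x11 and the STARTX override are inventions for this example (the override exists only so the function can be tried without actually launching X).

```shell
# Hypothetical helper: start an X11 GNOME session from a text console.
# XDG_SESSION_TYPE is forced to x11 for the session being launched only.
start_gnome_x11() {
  XDG_SESSION_TYPE=x11 "${STARTX:-startx}" "$@"
}

# Usage:
#   start_gnome_x11
# The Wayland equivalent from the text needs no wrapper:
#   XDG_SESSION_TYPE=wayland dbus-run-session gnome-session
```

Setting the variable on the command line rather than exporting it globally means a later Wayland experiment in the same login shell isn't affected.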

Just to see if anything shook out subsequently, I ran dnf upgrade again. This time the missing iptables compatibility packages came up.

That solves that mystery, so just ignore iptables during the initial download and the next time you run dnf after Fedora has been upgraded, it will clean up and install the right components. This whole sordid affair now shows up in the Release Notes.
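Putting the pieces together, the sequence that worked can be sketched as a small script. The run and upgrade_to_f34 names and the DRY_RUN switch are made up for this example; the dnf commands themselves are the ones given above, with --allowerasing added to the download step. It defaults to printing the commands so nothing happens by accident.

```shell
# Run a command, or just print it when DRY_RUN=1 (the default here).
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper for the F33 -> F34 upgrade steps described above.
upgrade_to_f34() {
  run dnf upgrade --refresh &&                  # upgrade prior system and DNF
  run dnf install dnf-plugin-system-upgrade &&  # install upgrade plugin
  run dnf system-upgrade download --refresh --releasever=34 --allowerasing &&
  run dnf system-upgrade reboot                 # reboot into upgrader
}

# Usage (as root, at the text prompt):
#   upgrade_to_f34              # print what would be run
#   DRY_RUN=0 upgrade_to_f34    # actually do it
# After the upgraded system is up, run "dnf upgrade" once more to pull in
# the iptables compatibility packages.
```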

Upgrading the Talos II is usually a much more complex undertaking anyway because I have custom GNOME themes and extensions installed on it and I always expect there will be some bustage. I don't like it, mind you, but I expect it. Armed with what I had learned from the Blackbird, I installed the packages on the T2 (some other groups also had "no match," though all of my optionally installed packages could and did upgrade) and rebooted.

Unlike the Blackbird, however, the installer still came up in a text screen as in prior upgrades when I selected that kernel from the Petitboot menu.
This machine has the BTO AMD WX7100 workstation card and does not use the ASPEED BMC framebuffer. If you don't select the kernel from the menu and just let the default go, you will get the usual black screen again, and as in prior versions you'll have to pick another VTY with CTRL-ALT-F2 or something, log in as root and periodically issue dnf system-upgrade log --number=-1 to watch.

I rebooted and started X (with XDG_SESSION_TYPE=x11), and GNOME came up, but it looked a little ... off.

If you noticed the weird pink-purple tint, you win the prize. However, my second monitor seemed to have a normal display (so did the Blackbird), and the difference is that my main display is colour-managed. When I selected the default profile, the tint went away but my colours weren't, you know, just right. I spent a few hours regenerating the profile with my Pantone huey manually with dispcal, but the same thing happened with the new profile.

The problem is the new colour transform matrix (CTM) support; the prior profile obviously worked fine in 3.38 but isn't compatible with 40. The proper way to solve this would be by letting GNOME make a new colour profile for you from the Settings app and it even allegedly supports the Pantone huey and other colourimeters. However, it has never (to my knowledge) worked properly on OpenPOWER (it crashes), so I've never been able to do this myself. Instead, my current solution is to just temporarily disable CTM with

xrandr --output DisplayPort-0 --set CTM 0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1

(that's 0, 1, seven zeroes, 1, seven more zeroes, and 1). Adjust DisplayPort-0 to where your colour-managed display is connected. Note that every time you (re)start GNOME or its shell, it will forget this setting and you'll have to enter it again. It would be nice if the colour manager could work with OpenPOWER, but CTM should have never broken working profiles in the first place.
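Those 18 numbers are less arbitrary than they look. The sketch below rests on an assumption about the encoding, not something stated above: each of the nine entries of the 3x3 matrix is a 64-bit fixed-point value handed to xrandr as two 32-bit halves, low half first, so 1.0 becomes 0,1 and 0.0 becomes 0,0. Under that assumption the string is just the identity matrix, and it can be generated and reapplied after each GNOME restart. The function names are invented for illustration.

```shell
# Build the identity CTM string: 3x3 entries, each written as "low,high".
# Assumed encoding: 1.0 -> 0,1 and 0.0 -> 0,0 (see the caveat above).
identity_ctm() {
  ctm_out=""
  for ctm_row in 0 1 2; do
    for ctm_col in 0 1 2; do
      if [ "$ctm_row" -eq "$ctm_col" ]; then
        ctm_out="${ctm_out}0,1,"   # diagonal entry: 1.0
      else
        ctm_out="${ctm_out}0,0,"   # off-diagonal entry: 0.0
      fi
    done
  done
  printf '%s\n' "${ctm_out%,}"     # drop the trailing comma
}

# Reapply it to a given output (defaults to the one used in the text).
apply_identity_ctm() {
  xrandr --output "${1:-DisplayPort-0}" --set CTM "$(identity_ctm)"
}

# Usage:
#   apply_identity_ctm               # DisplayPort-0
#   apply_identity_ctm DisplayPort-1
```

identity_ctm prints the same string as the xrandr command above, so apply_identity_ctm is a drop-in replacement for retyping it.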

However, that all got solved later, because an even more pressing concern popped up first: the UI was slow as molasses. GNOME 40 defaults to the Activities overview on startup with nothing running. It takes literally several seconds to move from one page of activities/apps to the next. Several seconds. This problem is not unique to OpenPOWER, and occurs on both Wayland and Xorg, but a general fix is apparently months away.

The performance problems are not X11-specific. In fact, Wayland is even worse, because the mouse stutters even just moving it around. This is the first time Wayland is actually worse on the system with the GPU (the T2) rather than the system without one (the Blackbird), though I hardly consider this regression a positive development.

What am I doing about it? Well, what can I do about it, short of trying to fix it myself? GNOME is the default environment for Fedora Desktop, and while I could switch to KDE or Xfce (and I might!), these are serious regressions that are hitting a decent proportion of users and were even evident during the beta phase. Did QA just fall asleep or something? To top it off, even if it were working well, whose freaking bright idea was it to make you go to the upper left corner to click Activities, then back to the bar to click the Show Applications button, just to pull up what you have installed? I've started using the Applications menu that Fedora includes by default; at least that doesn't take a Presidential administration or two or wild sweeping mouse gestures just to show you a list of apps, even though it's still noticeably slower than 3.38.

The slowdowns are entirely specific to GNOME. Once you actually get an app started, like Firefox or a game, display speed is fine, so the problem clearly isn't pushing pixels; it's something higher level in GNOME. Switching all the core scheduling to performance made at most minimal difference. Similarly, (as root) echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level instead of auto made things a little better, but there is still no excuse for how bad it is generally. About the only thing that made more difference than that was simply turning animations off altogether in GNOME Tweaks. Nothing was smooth anymore, but it was about twice as fast at doing anything, so that's how I'm limping along for the time being.
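The performance-level tweak is easy to flip back and forth with a helper. This is a sketch only: set_dpm_level and DPM_NODE are names made up here, the sysfs path is the one quoted above (card0 may be a different card on your system), and only the two values mentioned in the text are accepted. It needs to be run as root.

```shell
# sysfs node from the text; override DPM_NODE if your GPU isn't card0.
DPM_NODE="${DPM_NODE:-/sys/class/drm/card0/device/power_dpm_force_performance_level}"

# Hypothetical helper: set_dpm_level auto|high
set_dpm_level() {
  case "$1" in
    auto|high) ;;
    *)
      echo "usage: set_dpm_level auto|high" >&2
      return 2
      ;;
  esac
  printf '%s\n' "$1" > "$DPM_NODE"
}

# Usage (as root):
#   set_dpm_level high   # force the higher performance level
#   set_dpm_level auto   # back to the default
```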

With those significant problems on deck, the usual turmoil with custom themes and extensions is actually anticlimactic. I had to make some tweaks to my custom Tiger-like GNOME shell extension to fix the panel height and a weird glitch with slightly thicker border lines on the edges of the panel, which you can see in the screenshot below. Quite a few extensions could not automatically update to GNOME 40, either.

I've become irritated enough by this that I actually did set disable-extension-version-validation to true in dconf-editor, which made a couple start working immediately, including my beloved Argos custom script driver. For the others I downloaded the most current version of the shell system monitor and this fork of Dash-to-Dock, and manually installed them in ~/.local/share/gnome-shell/extensions/ (you may need to reset the GNOME shell with Alt+F2 and r to get gnome-extensions enable to actually see their UUIDs). A few I should have dispensed with earlier: No Topleft Hot Corner can now be simply replaced by gsettings set org.gnome.desktop.interface enable-hot-corners false, and AlternateTab's switcher behaviour now can be rigged manually from GNOME Settings.

I'm now more or less back where I started from, but working with apps is much less fluid and the desktop experience is undeniably inferior to prior releases, and I can't believe no one thought to blow the whistle during the test phase.

If you use Fedora purely as a command-line server, other than the initial hiccups with downloading packages, it seems to work. If you use KDE or Xfce or anything other than GNOME as your desktop, you're probably okay with F34 too, though I didn't test those (I may later). But if you use the default GNOME on Fedora, especially if you use Wayland, think twice about this update before installing it while you've still got some time with F33. Part of riding the bleeding edge is drawing blood now and then, but F34's wounds seem much more self-inflicted than usual. This is the worst Fedora update since I started using it in F28 and I'm not exaggerating in the slightest.

LibreBMC and Kestrel: two separate BMC tastes that taste separate separately


After our article earlier this week on LibreBMC and my concern as to what it meant for Raptor's own Kestrel "soft BMC" project, Raptor's Tim Pearson contacted me and advised that Kestrel is a standalone Raptor product on its own timetable that's still very much in development. In fact, the attached screenshot shows evolution from just last month with On-Chip Controller support now in the firmware, where it is able to read the temperature sensors and has scaffolding for monitoring fan RPM. Although Raptor makes it clear that Kestrel is and always will be open, and is developed on OpenPOWER systems using open tooling, it is mostly Raptor in-house code with the only non-Raptor bits being Zephyr (the OS), LiteX (the FPGA designer), Microwatt (the CPU core) and some minor components like I2C.

The other noteworthy thing is that Kestrel will indeed be offered as an aftermarket plug-and-play upgrade for existing Blackbird and Talos II systems, no soldering required. This is excellent news, because while LibreBMC is a very encouraging development and has wide-ranging implications beyond OpenPOWER systems, its basis on OpenBMC means a heavier installation that continues to be less well suited to a workstation environment (in terms of interface and sheer startup time amongst other things). While Antmicro's planned LibreBMC card may also work in Raptor systems, it's really meant for OpenPOWER servers as well as other server-class machines with onboard BMCs that aren't necessarily Power-based. Whether we like it or not, while there's a lot of nerd cred around OpenPOWER workstations, such systems presently represent a minority of the POWER9 installed base (let alone of all BMC-managed hardware). It thus makes sense that Raptor, currently the only manufacturer of OpenPOWER workstation machines, would be in the best position to create an open BMC that also is better tailored to improving the desktop user experience. As I've written here before, the envisioned improvements to fan control and user interface are very welcome, but Kestrel's promised 10-second power-to-IPL is a huge win for the desktop when ASPEED BMCs running OpenBMC right now take a minute or more. LibreBMC's OpenPOWER core will certainly improve performance but I doubt it will be to that extent.

But I also suspect more good news than just an aftermarket upgrade is afoot. I pointed out in the LibreBMC article how much one of the physical form factors in particular looks an awful lot like a single-board computer, because really when you have a system-on-a-chip and on-board peripherals, you kind of have a "PowerPi" already. Raptor isn't saying, but Kestrel, like LibreBMC, will necessarily have its own SoC at its heart and thus could easily be the basis of another OpenPOWER SBC project, possibly one that might be even better suited for that environment. Plus, being Raptor, you know all the components will be open and blob-free (especially now that they've eliminated the blobs for the onboard Broadcom Ethernet, leaving just the Microsemi PM8068 as the last blob firmware component and only if you buy it as a BTO option). It will be worth watching to see when and how all this comes to market. Either way, getting your choice of BMC alternatives is a really great thing especially when the BMC is so critical to the system as a whole.

GNU Guix 1.3


As previously promised, support for POWER9 is now officially a part of the newly released Guix 1.3. Besides many performance and functionality improvements allowing you to Scheme your way to a full installation (or roll back one), the OpenPOWER port should run on any POWER9 PowerNV system (no word on POWER8 support currently, although the support is for generic powerpc64le). Note that as many binary substitutes aren't yet built for POWER9, support is still considered a "technology preview" and you will need to build many of the packages from source for the time being. It is expected this support will catch up as more build capacity comes online.

The only disappointing thing about this announcement is that there doesn't appear to be a way to install standalone Guix (i.e., Guix System) on OpenPOWER yet, nor are there any ISO images you can just pop in your workstation; you'll need to install some other bootable foreign distro and install Guix-the-package-manager on top of it. Presumably Debian, Fedora, Void and others will be suitable, but a full from-scratch install option will be needed to bring this port fully up to parity.

LibreBMC announced, end of Kestrel?


Oh, how timely. On the heels of our article yesterday on "little Power" comes an official announcement from the OpenPOWER Foundation for LibreBMC, a fully open and Power ISA-based BMC replacement.

All PowerNV systems currently use some form of ASPEED BMC to provide the baseboard management features necessary to run the system (what we old fogies used to call "service processors"), which on these machines includes IPMI, offering the PNOR flash ROM to the processors to start the machine, a 2D frame buffer, environmental monitoring, network, front panel and other on-board interconnects. They run their own operating system, almost always OpenBMC, and function as a computer-within-a-computer. The goal here is to replace the ASPEED devices with a multi-vendor supported option, compliant with the Open Compute Project's Data Center Secure Control Module (DC-SCM) specification, that would (from our perspective as users and operators) appear much the same. It would also be compatible with Power, x86 and ARM systems like current BMC offerings. However, unlike ASPEED chips which are ARM-based, LibreBMC would run on a Power ISA core, possibly even a descendant of Microwatt.

Some of the names announced in connection with LibreBMC are familiar: Yadro, a Russian developer of high performance systems and storage solutions, appeared in yesterday's piece, and Google was one of the developers of the OCP DC-SCM spec and is undoubtedly advising on the high-level design. Raptor is also involved, which is very good news, because their involvement suggests there are unlikely to be hidden blobs and the LibreBMC will be truly an open device (the announcement says it's all open tooling using LiteX, and synthesizable on Lattice and Xilinx FPGAs). If you want a "little Power" system-on-a-chip, you could do a whole lot worse than simply ripping this off into a little stand-alone developer board. Maybe this is where the PowerPi will come from.

But, oddly, despite their involvement Raptor isn't making LibreBMC: that's being done by Antmicro, who currently has RISC-V and ARM-based products and will be "developing the LibreBMC card," implying that LibreBMC will also be offered as a physical component instead of just IP and VHDL code. Specifications for the physical form factor are even in the DC-SCM specification: it has a special slot similarly sized to an OCP NIC, but differently keyed, and the vertical form factor option 1 in particular looks very much like Beagle Boards and RPis. The fact that LibreBMC will run OpenBMC also suggests this is not a direct port of Kestrel, Raptor's own prototype BMC replacement; Kestrel currently runs its own OS based on Zephyr.

If this is a quiet way of announcing Kestrel is throwing in with LibreBMC, this would disappoint me somewhat. Kestrel, though granted in a very early form, seemed a better fit for those of us on the small but visible Power workstation side if its promises on faster start times were to be believed. In particular its trimmer OS choice was an important difference: OpenBMC's problem (from my outsider view) is that it wants to be all things to all workloads, which means a bigger loadout with more features but slower start times and a much larger firmware footprint. LibreBMC's move to a higher-performance Power core should help but software always expands to meet the available CPU power. BangBMC is still out there and aimed to address performance by cutting out fat like dbus and systemd, and apparently is now facilitated by Raptor, but it hasn't gotten a great deal of traction and nobody ships it as default, for example. A bespoke fast-start workstation-oriented BMC would be a great fit for our desktop systems but LibreBMC doesn't sound like that. And if this is going to be the simultaneous basis of a little Power board as well, bootup time really has to be better.

There are also the practical aspects for current owners: OpenBMC isn't going to drop ASPEED support anytime soon since there's a large install base running it (on lots of machines, not just OpenPOWER), but future updates would understandably prioritize the new hotness. Kestrel, although you needed a soldering iron, could be retrofitted to existing boards. Especially since open POWER10 systems will be slower to arrive, what about a LibreBMC retrofit option (that doesn't involve board work) for existing machines? Will the Antmicro "card" offer this, or will there be an aftermarket product from Raptor or another vendor? We'd also like whatever hardware and software improvements the LibreBMC initiative will foster especially if we won't be replacing our systems for a while.

Nevertheless, I wait to see what happens with cautious enthusiasm, because I think that the importance of the BMC to the system (it runs and sees everything, after all) really demands an open solution. Plus, the presence of a BMC means there's (going to be) an OpenPOWER SoC, meaning smaller Power solutions may be just around the corner. Whoever ends up making it, this is good news.

The strange bedfellows who want a little POWER


Thanks to reader Karl S, whose initial report got me down the rabbit hole, for discovering a quiet entry for a PowerPi workgroup at the OpenPOWER Foundation, clearly modeled on the ARM-based Raspberry Pi.

The obvious basis for a "little Power" device would be something like Microwatt, which Raptor is already using for Kestrel, and to which IBM staff regularly adds new features. While IBM also opened up other cores, notably the A2 family, Microwatt is (or at least will be) more like what's under our desk in terms of supported instructions and features, and is probably the easiest of the open synthesizable cores to work with right now.

Still, Microwatt is "merely" a core, albeit a rapidly developing one, and not actually a product for sale or even a chip (though it may be in the near future). This makes the PowerPi concept particularly interesting in that it's trying to use real hardware to appeal to the same market segment that would rather experiment with something smaller, but ends up unrealistically repelled by even Raptor's "low end" (relatively) pricing and reaches instead for ARM or RISC-V boards because of its sensitivity to cost.

The PowerPi workgroup has four members. The first three are IBM (naturally); YADRO, a Russian manufacturer of high performance servers and storage solutions; and Van Tosh, a Belgian cloud solutions provider with their own RHEL offshoot. With the exception of IBM, none of them seem to have the expertise to work on a mini-POWER system, nor do their current sites indicate any particular proclivity.

But the fourth is noteworthy, a company called ChipEleven, a new, privately held VC-backed fabless design firm based in New Orleans that started just last year. Its CEO is a Master's student at Georgia Institute of Technology. They advertise a system on a chip with lots of peripheral interconnects, a quadcore ISA 3.0B-compliant CPU, an ARM Mali G77 GPU, onboard VGA up to 1920x1080, Ethernet, SPI flash and DDR4 RAM controller, with an ETA of 2023.

Does this sound a little familiar? It might, because an earlier project exists called Libre-SOC. Libre-SOC started with a RISC-V design to serve as a libre GPU, interestingly based on the CDC 6600, but shifted to OpenPOWER for higher performance and now plans to release an entire SoC instead. Raptor has provided material assistance (though Kestrel is proceeding on its own separate path) and development by volunteers is funded by NLNet donations. Allegedly, ChipEleven started with the Libre-SOC project and cleaved off, though to what extent (if any) ChipEleven's design derives from Libre-SOC is disputed.

It's not generally my habit to report on new speculative projects because a lot of them, even those with well-meaning and motivated people, will eventually flame out or their organizers lose interest. And ChipEleven may do just that: 2023 is a long way off (POWER10 will have certainly arrived by that point) and that doesn't mean there will be an actual physical product you can buy, let alone what it would cost. For its part, Libre-SOC wisely makes no promises. Still, the fact such a workgroup exists and has the backing of the OpenPOWER Foundation is notable, even if the current members are a motley bunch. Whether they can actually ship something is another story.

OpenBSD 6.9


Great past few days for new OpenPOWER operating system support: now the newest release of OpenBSD is available as well to complement Ubuntu and Fedora updates earlier this week. Many improvements have landed to the big-endian powerpc64 port since its début in OpenBSD 6.8, most notably framebuffer support for the ASPEED BMC, workarounds for addressing AMD GPUs over PCIe, power-saving mode for POWER9, and IPMI on PowerNV systems (which is all Raptor-family machines and quite a few others). Changes general to all ports include an additional privacy guard for video devices (similar to the existing one for audio) among other webcam fixes, encrypted RAID-1, performance improvements to SMP, lots of additional hardware drivers, security fixes and updates to networking and crypto.

With this release, your BSD choices on OpenPOWER just got more solid between this and the mature FreeBSD port. Again, the real shame is that there's still no support for OpenPOWER in NetBSD.

Prefixed instructions and more in the OpenPOWER 64-bit ELF V2 ABI Specification


This is pretty nerdy, but you read this blog, so you have no room to talk. The ABI specification for 64-bit ELF v2, which is virtually everything running little-endian and many big-endian 64-bit Power systems, is now in draft for version 1.5. Although library interfaces are not in spec, the document defines a linking interface for executables and shared objects so that register usage and calling convention are consistent. Rarely if ever would such updates include breaking changes and the small version number increment implies no large shifts, but those of us who write JITs and compilers pay attention to these updates since they may yield new opportunities to generate better code, and they also occasionally shed light on new or upcoming instructions.

Besides incorporating errata from the previous version 1.4, and moving vector programming to a separate document, the most interesting change here is related to POWER10. We've mentioned prefixed instructions briefly here before, a new class of load/store operations available in POWER10 and PowerISA 3.1 that effectively create 64-bit instructions: a first half (prefix) containing up to 18 bits of a displacement, and a second half (suffix) containing the lower 16. This enables you to express as many as 34 bits of displacement for a memory address with an instruction like pld (as opposed to the 32-bit classical instruction ld with a 16-bit displacement); previously the maximum you could indicate was 26 bits, and only for instructions like unconditional branches that allowed it. However, the R bit in the prefix allows you to use the address of the prefix itself as the register against which the displacement is added, rather than having to have a general purpose register do it or (for non-local data) the dedicated TOC register. (Recall that in Power ISA the program counter is not a general purpose register.) This is actually a big deal for things like constant pools (embedding constants directly into many RISC-style instructions is generally unwieldy) because now you can just squirt local data right into whatever function fragment you're generating instead of having to keep track of it separately. This is mentioned briefly in the ISA 3.1 manual but the ABI spec makes it more prominent as a feature.
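To make the 18-plus-16 split concrete, here is a minimal sketch in plain shell arithmetic that breaks a signed 34-bit displacement into its two halves. The function name split_d34 and the d0/d1 labels are just names picked for this illustration, and it only does the bit slicing; it doesn't encode a real instruction.

```shell
# Split a signed 34-bit displacement ($1) into the 18 bits carried by the
# prefix and the 16 bits carried by the suffix. Names are illustrative only.
split_d34() {
    d=$(( $1 ))
    min=$(( -(1 << 33) ))        # smallest signed 34-bit value
    max=$(( (1 << 33) - 1 ))     # largest signed 34-bit value
    if [ "$d" -lt "$min" ] || [ "$d" -gt "$max" ]; then
        echo "displacement out of 34-bit range" >&2
        return 1
    fi
    d0=$(( (d >> 16) & 0x3FFFF ))   # upper 18 bits, for the prefix
    d1=$(( d & 0xFFFF ))            # lower 16 bits, for the suffix
    printf '0x%05x 0x%04x\n' "$d0" "$d1"
}

split_d34 0x123456789   # prints: 0x12345 0x6789
split_d34 -1            # prints: 0x3ffff 0xffff
```

The first value would never fit in a classic 16-bit displacement field, and the second shows that negative displacements simply keep their two's-complement bits across both halves.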

Unfortunately, various complexities in instruction dispatch make this less useful than it would appear to be. Prefixed instructions cannot be split over 64-byte (not bit!) instruction address boundaries or else an alignment exception occurs, which even if the OS handles it for you would be expensive. On the other hand, the CPU is clearly treating them as two 32-bit pieces because the prefix is always followed by the suffix regardless of the endianness (i.e., in little endian mode, the suffix is not in front of the prefix), and there are also debugging irregularities in that some suffixes are actually regular instructions that mean something different when used as a prefix. These can be detected by looking at bit 6 of the prefix (not the suffix), which if set indicates the suffix is a valid instruction that the prefix changes the behaviour of, but one wonders if more changes are to come and we don't need any more mscdfr (means something completely different for r0)-type situations in the ISA. A great example is pnop (yes, a prefixed no-operation instruction): you'd think the suffix would be ignored, and it mostly is, except if it's a branch instruction, rfebb, any context synchronizer other than isync or a service processor attention instruction. The ISA 3.1 book benignly says, "This restriction eases hardware implementation complexity." Well, thanks a lot! Does your head hurt too?
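If you're emitting these from a JIT, two checks fall straight out of the paragraph above. This is only a sketch under stated assumptions: the helper names are made up, instructions are assumed to be 4-byte aligned, bits are counted IBM-style (bit 0 is the most significant bit of the 32-bit prefix word), and padding with a nop is one plausible response rather than a documented requirement.

```shell
# Would an 8-byte prefixed instruction starting at address $1 straddle a
# 64-byte boundary? With 4-byte alignment that only happens at offset 60.
crosses_64b() {
    [ $(( $1 % 64 )) -eq 60 ]
}

# Is bit 6 (IBM numbering) set in the 32-bit prefix word $1? Per the text
# above, that marks a suffix which is a valid instruction on its own.
prefix_modifies_suffix() {
    [ $(( ($1 >> 25) & 1 )) -eq 1 ]
}

for addr in 0x10038 0x1003c 0x10040; do
    if crosses_64b "$addr"; then
        echo "$addr: would straddle a 64-byte boundary, pad first"
    else
        echo "$addr: ok"
    fi
done
```

Only the middle address trips the check, since 0x1003c is 60 bytes into its 64-byte block.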

Again, I'm not a fan of introducing variable length instructions into what was a fairly regular instruction set and there are many important gotchas which to me seemed avoidable, but the displacement features are welcome and it makes certain on-the-metal programming tasks easier. Always watch these deceptively boring documents closely because there are sometimes valuable signals in their changelogs. Unfortunately, until the situation with POWER10 and OMI gets worked out, this is of largely academic interest.

Fedora 34


And hot on the heels of Ubuntu 21.04 is the latest iteration of Fedora, version 34. Fedora is of particular interest here at Floodgap Orbiting HQ, not only because it serves as an early warning indicator for problems on OpenPOWER as one of the most cutting-edge distros, but it's also the distro I'm typing this blog post into and personally use on a daily basis. Now that F34 has hit release, F32 will become EOL in one month as usual.

Most of us are interested in Workstation-level changes and the most notable is GNOME 40, which introduces new and sure-to-be-controversial changes to Activities (though if this helps multiple display management I'll be a believer), separation in the dash of running apps and favourites (good), and additional shortcuts and gesture support. Other system-wide changes include transparent by-default zstd compression for btrfs, routing all audio through PipeWire (including PulseAudio, JACK and legacy ALSA), enabling systemd-oomd by default, updating to glibc 2.33 as well as gcc 11, llvm 12 and binutils 2.35, upgrading to Ruby 3.0, and (another controversial one) using Wayland by default for KDE Plasma users as well. Another nice minor change is that kernel firmware files will now be compressed by default, saving a bit of space.

On the OpenPOWER side, however, specific platform improvements are rather thin on the ground. 128-bit long double got deferred again, which I've been tracking since Fedora 30 (!!) as certain packages like MAME require it to build out of the box, and there has been little appetite to consider a Workstation-specific 4K page option.

Because I confidently expect GNOME 40 will break all my extensions, and some minor interval is required to ensure all the packages are built for ppc64le, our usual mini-review for F34 will follow in a couple weeks on both Blackbird and T2 systems. Meanwhile, read how it went with F33 in preparation.

Ubuntu 21.04 and the expanding Wayland Wasteland


It's not really Power-specific to be disenchanted with Wayland; there are lots of people who don't like it even on majority platforms like x86_64. I also think that, much like the residual disdain for systemd, a fair amount of the backlash comes from some profoundly unwarranted scope creep in the project. X11 has a lot of historical cruft in its lower reaches which deserved at minimum a solid refactor and I do appreciate Wayland's engineering improvements, but Wayland throws the baby out with the bathwater by putting way too much on the back of the compositor, and as far as claims over security and network transparency are concerned one man's security hole is always another one's convenience. Most of the Wayland developers just throw up their hands when confronted with some functionality that was fine in X11 and say "patches welcome" and "don't expect us to scratch your itch," and then wonder why people get cheesed off when stuff quits working. It would be less aggravating if the process were less headlong but such is the state of Linux desktop development where only established players with their own priorities have traction.

That said, the problem is somewhat more acute on OpenPOWER because of the lack of a libre GPU. Right now, if you don't trust AMD or Nvidia, your solitary choice is the on-board ASPEED BMC and that gives you a 2D framebuffer, period. (Even Kestrel won't fix that.) Performance used to be abysmal under Wayland and now is tolerable, though there are still various problems, and even performance with a GPU seemed to regress a little in Fedora 33. I still don't use it anyway because no current Wayland compositor will tell you what the front window is, nor does it seem any of them care about that, even though X11 facilitated this for literally decades. Again, why not run everything through XWayland by default, let Wayland-friendly apps opt out, and get the best of both worlds for (nearly) free? Why intentionally p*ss everyone off by telling them their working edge cases don't matter?

Nevertheless, the Wayland Wasteland expands with Ubuntu 21.04 "Hirsute Hippo" (release notes), which now also makes Wayland the default as it has been in Fedora for many versions now. Fedora allows you to opt out by either running startx manually (as I do) or for those of you running gdm to set WaylandEnable=false in /etc/gdm/custom.conf, and this functionality will probably remain for as long as X is supported in Red Hat (I'm guessing end of support for RHEL 7, maybe 8). In Ubuntu currently you can do the same thing, but the file is /etc/gdm3/daemon.conf instead (or use the cog on the login screen, though the login screen would still come up in Wayland unless you set that flag). As before little-endian OpenPOWER systems (which Ubuntu calls ppc64el) are officially only offered a Server build for download, but you can then convert it to Desktop.
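For the gdm route, here is a rough sketch of flipping that flag from a script instead of an editor. force_xorg is a hypothetical helper name, it assumes GNU sed and a config file that already has a [daemon] section, and the demo runs against a scratch file rather than the real one.

```shell
# Hypothetical helper: set WaylandEnable=false in the gdm config file $1.
force_xorg() {
    conf=$1
    if grep -q '^#[[:space:]]*WaylandEnable=' "$conf"; then
        # Uncomment an existing commented-out line.
        sed -i 's/^#[[:space:]]*WaylandEnable=.*/WaylandEnable=false/' "$conf"
    elif grep -q '^WaylandEnable=' "$conf"; then
        # Overwrite whatever value is already set.
        sed -i 's/^WaylandEnable=.*/WaylandEnable=false/' "$conf"
    else
        # GNU sed: append the key right after the [daemon] header.
        sed -i '/^\[daemon\]/a WaylandEnable=false' "$conf"
    fi
}

# Demonstrate on a scratch copy, not the live configuration.
tmp=$(mktemp)
printf '[daemon]\n#WaylandEnable=false\n' > "$tmp"
force_xorg "$tmp"
cat "$tmp"
rm -f "$tmp"
```

On a real system you would run it as root against the file named above for your distro and then restart gdm or reboot.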

Should you upgrade? If you're happy with X11 on your OpenPOWER system and the performance is good, maybe you should just stick with 20.04, which is a Long Term Support release (21.04 isn't) and will get updates until 2025. But if the future really is the Wayland Wasteland, at least getting more people stuck in the sand will mean some of these rough spots could get smoothed over, and a better software-only rendering pipeline would at least improve the firmware-free use case. In the meantime, hello, X11: you may be ugly and everyone says you smell bad, but you've never gotten in my way.

Will Kestrel become the better BMC?


Raptor Engineering's Microwatt-powered Kestrel BMC replacement is improving by leaps and bounds. The screenshot was from a Twitter post showing its internal Web server (like OpenBMC) and ability to update its own firmware (also like OpenBMC). And naturally everything is open-source, based on Zephyr.

But that's not the part that attracted me most. What really got me excited was a 10-second start time. Yup, you read that right: Kestrel is ready and able to bring up the system 10 seconds after power is applied, compared to a good couple minutes or more with the current ASPEED BMC running OpenBMC — and the majority of that ten seconds is programming the FPGA. While a couple minutes isn't a big deal on a server system, it's a real problem when it's a desktop, something I complained about way back in my Blackbird semi-review since in its role as a household HTPC it gets powered up and down a fair bit, and shaving off literal minutes of time to login is huge in that setting (as well as anywhere else OpenPOWER is being used as a "small" system).

OpenBMC on ASPEED is by no means perfect in other respects, either. Raptor claims upstream has been slow to incorporate improvements to the user interface and fan control (though the project disputes this). On the user side, this Raptor Talos II is pretty quiet but it's also a big EATX Supermicro chassis with two HSFs and multiple case fans; the Blackbird in its lithe mATX case tends to have an annoying habit of spooling up and down even in a cool room, even with quieted fan assemblies. And it's always been the case with IBM and IBM-derived hardware where one bad fan can sometimes make the difference between booting or not (the ASMI in my personal POWER6 will refuse, and has refused, to power on the main CPUs if all the fans aren't fully operational).

However, some of the slowness may be due to the current requirement on Power ISA designs that the BMC be fully up before offering the virtual PNOR to the main CPUs, which apparently isn't necessary on x86 and allows some parts of bring-up to occur in parallel. A partial hardware solution may be needed to mitigate that deficiency. OpenBMC does have some ideas, including converting OpenBMC's initramfs and UBI to zstd compression which shaves a few more seconds off, and some of the BMC forks out there have done more by jettisoning entire components judged generally unnecessary (but with corresponding impacts to flexibility, and none have gained significant traction).

It may well be that OpenBMC, because it needs to be all things to all deployments, may not be the best place for firmware designed primarily for workstations. If so, then Kestrel (when it's fully "a thing") would be the next best option. However, that doesn't yield an obvious solution for the installed base like the three Raptor systems here, and installing Kestrel on an existing board is still not a trivial process (nor, at least as of this writing, has it been advertised to work on the T2 or T2 Lite). Raptor probably doesn't want to be in the board upgrade business either, so any envisioned Kestrel upgrade for older systems would need to be user-installable, and preferably without a soldering iron. I'm all thumbs with one myself and even more so when it's SMT.

Baseboard management is only one part of what makes a system liveable, but for desktop machines it's not an insignificant one. No matter what form it ends up taking, any improvements make a difference. And if a Kestrel board is the current way forward, at least we know it will certainly be as trustworthy, if not more so, as the ASPEEDs we already use.

Firefox 88 on POWER


Firefox 88 is out. In addition to a bunch of new CSS properties, JavaScript is now supported in PDF files even within Firefox's own viewer, meaning there is no escape, and FTP is disabled, meaning you will need to use 78ESR (though you get two more weeks of ESR as a reprieve, since Firefox 89 has been delayed to allow UI code to further settle). I've long pondered doing a generic "cURL extension" that would reenable all sorts of protocols through a shim to either curl or libcurl; maybe it's time for it.

Fortunately Fx88 builds uneventfully as usual on OpenPOWER, though our PGO-LTO patches (apply to the tree with patch -p1) required a slight tweak to nsTerminator.cpp. Debug and optimized .mozconfigs are unchanged.

Also, an early milestone in the Firefox JavaScript JIT for OpenPOWER: Justin Hibbits merged my earlier jitpower work to a later tree (right now based on Firefox 86) and filled in the gaps with code from TenFourFox, and after some polishing up I did over the weekend, a JIT-enabled JavaScript shell now compiles on Fedora ppc64le. However, it immediately asserts, probably due to some missing definitions for register sets, and I'm sure there are many other assertions and lurking bugs to be fixed, but this is much further along than before. The fork is on Github for others who wish to contribute; I will probably decommission the old jitpower project soon since it is now superfluous. More to come.

FreeBSD 13 and Guix for OpenPOWER


After a bit of downtime, we're back. And cool stuff has happened in our absence, the most notable being additional improvements to the increasingly mature OpenPOWER port of FreeBSD. 13-RELEASE, among other changes, officially introduces the 64-bit little-endian port (previously exclusively big-endian, which is still supported), experimental radix MMU support for POWER9 (hashed page tables are of course supported everywhere), XIVE interrupt support on POWER9 (about 10% faster), optimized memcpy(), memmove() and like-minded standard functions, and many stability and performance improvements. The release notes say that "performance during bulk -a package building is at least 60% higher" which is very impressive. ISOs are available from their download server.

In addition, ppc64le support has been merged to the GNU Guix source tree, meaning with the next expected version 1.2.1 you'll hopefully be able to get a pre-built copy. It's been in development for several months and now it appears to be finally approaching reality. Like Guix the package manager, the GNU Guix System's most notable feature is its declarative service and package configuration, all on top of the GNU Shepherd init system and (right now) Linux 5.9. Currently there is still a reproducibility issue with gcc, rust is still at least somewhat experimental (which is relevant for librsvg) and many packages have not been tested. Still, since the Talos II and T2 Lite are GNU Respects Your Freedom systems, now you can run another GNU-free OS on them too and sooner than you think.

More 64K page problems and some solutions


Meanwhile, as the question of a 4K-page Fedora 34 remains as yet undecided, if you are using a more recent video card with your Linux OpenPOWER system Trung Lê reports that kernel 5.11.x still crashes with 64K pages on his AMD R9 Nano. (Older cards, like the AMD WX7100 workstation GPU in this Talos II that Raptor sells as a BTO option, may also be affected; see comments for more.) This is relevant since Fedora 33 is moving to 5.11. If you're a Fedora 33 user and wish to continue with the 5.10.x series until a fix for amdgpu emerges, kernels are available from his Github project.
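If you aren't sure which page size your running kernel was built with, getconf will tell you. A tiny sketch (page_size_kib is just a name for this example):

```shell
# Report the running kernel's page size in KiB, as reported by getconf.
page_size_kib() {
    echo $(( $(getconf PAGESIZE) / 1024 ))
}

echo "This kernel uses $(page_size_kib)K pages"
```

A stock Fedora ppc64le kernel should report 64 here; a 4K-page kernel would report 4.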

In the meantime, speaking personally as one of those people who still use FireWire/IEEE-1394, 64K pages are at least part of the problem as to why FireWire cards don't seem to work in my F33 Talos II (I tried a Rosewill one first without success, and more recently an Iocrest card using a more typical Texas Instruments controller). Although a patch for 64K page support was initially submitted, it was rejected, and the followup patch was never tested. I'll be getting around to trying this myself and hopefully getting it into the kernel, but in the meantime report back if using the patch works for you (I still use some FireWire devices, particularly for video capture and legacy interchange with the Power Macs that lurk around here).

Tonight's game on OpenPOWER: The Original Strife Veteran's Edition


I'm a big fan of Strife, famously the last game to use id Software's Doom 3-D engine, and a nice hybrid of light RPG and heavy action. The engine might have been old and the plot was more shooting than Shakespeare, but hey: the voice in your ear is named Blackbird. You can't beat that!

I actually do own a retail copy of Strife from back in the day for MS-DOS; I bought it new and played it on the 486 I still keep around for such things. It plays just fine in Chocolate Doom on this POWER9, but I later heard about Strife: Veteran Edition on GOG.com that fixed minor bugs with better music and improved graphics, and even threw in some extra enhancements and achievements, but still kept the plot, voice acting and character art. It had a Linux version, but clearly one for x86. But that's not a problem when you have the source code.

The source code builds largely uneventfully as long as you have the prerequisites (I built on Fedora 33 and did not test on big-endian). In particular, it will want cmake, SDL2, libogg, libtheora, libvorbis, zlib, libpng and OpenGL. However, it tries to link against libSDL2main which is no longer necessary; after you've run cmake and make, it will fail with No rule to make target 'SDL2_MAIN_LIBRARY-NOTFOUND'. To get around this, edit (in the build directory where you ran cmake) ./CMakeFiles/strife-ve.dir/link.txt and remove SDL2_MAIN_LIBRARY-NOTFOUND from the single long line link command, then edit ./CMakeFiles/strife-ve.dir/build.make and just delete the line strife-ve: SDL2_MAIN_LIBRARY-NOTFOUND. Run make again and it will link.
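If you'd rather not do those two edits by hand, something like the following should be equivalent. It's a sketch assuming GNU sed and the file layout described above; fix_strife_link is a made-up helper name, and it takes the build directory as its argument (defaulting to the current one).

```shell
# Hypothetical helper: strip the unresolved SDL2main reference from the
# generated CMake files in build directory $1 (default: current directory).
fix_strife_link() {
    dir=${1:-.}/CMakeFiles/strife-ve.dir
    # Drop the bogus library from the single-line link command.
    sed -i 's/ *SDL2_MAIN_LIBRARY-NOTFOUND//g' "$dir/link.txt"
    # Delete the dependency line that make cannot satisfy.
    sed -i '/^strife-ve:[[:space:]]*SDL2_MAIN_LIBRARY-NOTFOUND[[:space:]]*$/d' "$dir/build.make"
}
```

Call fix_strife_link from the build directory after the first failed make, then run make again. Rerunning cmake regenerates both files, so you would have to reapply it.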

Since it built, I decided to spend $10 and try to extract the game assets from the GOG pack. GOG gives this to you as a behemoth 400 megabyte "shell script" which is really a wrapper for a ZIP archive with a MojoSetup installer. Irritatingly the installer is all just binaries, but you can feed it to unzip and it will break it apart. If we list the contents of the file, it will conveniently ignore the header and go right for the ZIP archive, and the money is in data/noarch/game. Thus, do unzip ./the_original_strife_veteran_edition_1_1_1_43197.sh data/noarch/game/'*' and the assets will be extracted to data/noarch/game.

If you want, you can just move the files in game/ (maintain the tree under it, don't flatten it) in with the POWER9 binary of strife-ve, but if you don't want all the x86 binary crap you don't need, a quick find . -name '*.so.*' -print | xargs rm and rm strife-ve before you copy it over should remove the bulk of it. Conveniently, the GOG assets also include the original DOS version (in DOS/) and all the relevant WADs so you can also run it in Chocolate Doom or our OpenPOWER-JIT DOSBox, and another copy of the source code just in case you lose it. Anyway, with everything moved, if you run ./strife-ve it should then just work.

Don't keep the Front waiting.

MIPS goes RISC-V


The RISC-V community is buoyed by Wave Computing, fresh from Chapter 11 bankruptcy, reemerging with the name of its subsidiary MIPS Technologies to develop ... RISC-V chips.

This actually says less about RISC-V than it says about the new MIPS Technologies. You'll recall that MIPS Technologies, formerly Wave Computing, suspended the MIPS Open Initiative, apparently to position it for sale before they went under, and nobody bit. Understandably so: the once great architecture that powered the SGI MIPS workstations in my office (I own an Indy, an Indigo2 and a Fuel) has now been relegated to the "too cheap for ARM" embedded market, which coincidentally is exactly where RISC-V has gotten most of its design wins so far.

And MIPS Technologies isn't giving anything up. There is no mention of the licensing program for MIPS-the-architecture ending, which means it remains in operation, and it remains closed. Indeed, Tallwood Venture Capital probably demanded it, as a hedge in case their efforts with RISC-V aren't sufficiently profitable. MIPS Technologies will be entering a crowded field with other established players, notably SiFive, and not a lot of extant IP to suggest they will substantially leapfrog those existing designs in performance or power usage (if at all). In that sense, this announcement is best seen as a cynical way to capture public interest rather than an important engineering leap, and the RISC-V community should not in any way conclude they have gained a valuable partner. If anything, they've failed to avoid a new, shadier member of the ecosystem who actually took steps to make their previous products less open. That's not a good look for an architecture that has made openness its defining characteristic.

Juicing QEMU for fun, ??? and profit!


The number of packages and applications natively available for OpenPOWER continues to grow in just about every distro's package manager, and even if a prebuilt package doesn't exist even more will build from source. But emulation is still going to be a fact of life for Windows-only/x86/x86_64-only (maybe even aarch64-only) binaries we can't rebuild, and KVM only helps us with other Power ISA systems (in fact, it looks like KVM-PR broke and can't boot Mac OS X again, so I guess I'll be diving back into the source), so we need to wring as much speed out of QEMU's emulation engine as possible.

We are fortunate with QEMU in that there is ppc64le support in TCG, the Tiny Code Generator which implements a basic JIT, and the Power ISA TCG backend even emits those tasty newer POWER9 instructions to take better advantage of the processor. Without TCG, QEMU would be dreadfully slow when emulating a foreign architecture. However, unless IBM or some other OpenPOWER hardware developer implements instructions (a la Apple M1) in a future chip that specifically improve emulation of other CPUs (like, I dunno, x86_64), there's very little that can be done to improve the code the Power TCG backend generates and CPU emulation spends most of its time in TCG-generated code.

However, the software MMU that QEMU's CPU emulation uses has pre-compiled portions, and all the devices and components QEMU emulates (like the system bus, video, mass storage, USB, etc.) are also pre-compiled. This gives us an opportunity: with a little extra elbow grease, you can make a link-time-optimized and profile-guided-optimized (LTO-PGO) build of QEMU specific to the particular workload which can run the CPU anywhere from 3-8% faster and video and other devices up to 15% faster depending on the set of devices. While number crunching isn't substantially faster, and the modest CPU improvements don't improve user-mode emulation a great deal, full system emulation's general responsiveness improves and makes using more applications more feasible.

This process is not automated. For Firefox, we make LTO-PGO builds using the internal machinery and our patches for gcc compatibility, which is currently our preferred compiler on OpenPOWER systems. The Firefox build system generates a profiling build first, then automatically collects profiling data with it off a model workload and builds the optimized browser from that profile. QEMU doesn't have that infrastructure right now, but you can do it manually: you configure and compile a profiling build, run your workload with it to create a profile, and then configure and compile an optimized build with the profile thus generated.

I'll give instructions here for both QEMU 5.0 and 5.2, since 5.0 seems to be a bit more performant than 5.2 and has fewer build prerequisites, but 5.2 is more straightforward and we'll do it first. In these examples, I'm optimizing ppc-softmmu so that I can run Mac OS 9, which has never worked properly with KVM-PR; substitute with your desired target, such as x86_64-softmmu. Only do one target at a time, and you will want to do individual builds for each system image — even if you normally use the same executable binary for multiple OSes — because different code paths may be exercised with different workloads and/or configurations.

Let's start with making a profiling build. To do this, we'll add -fprofile-generate to the compiler flags (as well as -flto for LTO). For consistency we'll pass the same set of options to the C compiler, the C++ compiler and the linker (each will ignore options they don't need). In the QEMU source tree,

  • mkdir build
  • cd build
  • ../configure --extra-cflags="-O3 -mcpu=power9 -flto -fprofile-generate" \
    --extra-cxxflags="-O3 -mcpu=power9 -flto -fprofile-generate" \
    --extra-ldflags="-flto -fprofile-generate" --target-list=ppc-softmmu
  • make -j24 (or as appropriate: this is a dual-8 Talos II)

Wait for QEMU to build. When it finishes, back up your drive image because you may not be able to shut it down normally and it would suck to damage it inadvertently. With a backup copy saved, run the new QEMU as you ordinarily would on your target workload. For example, my classic script is (assuming you're still in the build directory)

./qemu-system-ppc -M mac99,accel=tcg,via=pmu -m 1536 -boot c \
-drive id=root,file=classic.img,format=qcow2,l2-cache-size=4M \
-usb -netdev tap,id=mynet0,ifname=tap0,script=no,downscript=no \
-device rtl8139,netdev=mynet0 -rtc base=localtime

You should use as close to your normal configuration as possible so that the device drivers you run are factored into the profile.

The first thing you'll notice is that QEMU is now really, really, really slow. Crust-of-the-earth-cooling slow. This is because it's storing all that profile data every time any block of compiled code is executed. As a result you will probably not be able to type or interact with the guest in any meaningful fashion, so let the system boot, grab a cup of a fortifying beverage and wait for it to get as far as it can. For Mac OS 9, it took several minutes to get to the desktop; for OS X 10.4, it took about a quarter of an hour (with a lot of timeouts in a verbose boot) to even start the login window. At some point you will not be able to usefully proceed any further with the guest, but fortunately you backed up your drive image already, so you can simply close the window.

Go back to the build directory. This time we will tell gcc to build with the generated profile (-fprofile-use), though we will allow it to account for certain changes (-fprofile-correction) and allow compilation to occur even if a profile doesn't exist for a particular target (-Wno-missing-profile) so that it can get through configure cleanly:

  • make clean (this doesn't remove the profile .gcda files)
  • ../configure --extra-cflags="-O3 -mcpu=power9 -flto -fprofile-correction -fprofile-use -Wno-missing-profile" \
    --extra-cxxflags="-O3 -mcpu=power9 -flto -fprofile-use -fprofile-correction -Wno-missing-profile" \
    --extra-ldflags="-flto -fprofile-use -fprofile-correction -Wno-missing-profile" \
    --target-list=ppc-softmmu
  • make -j24
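Since the two configure invocations differ only in their profiling flags, it's easy to fat-finger one of them. Here is a small sketch that generates both command lines from one place. The helper names are invented for this example, it only prints the command rather than running it, and unlike the recipe above it passes the full flag set to the linker as well, which should be harmless since gcc is driving the link.

```shell
# Hypothetical helper: emit the compiler flags for a PGO phase
# ($1 is "generate" for the profiling build, "use" for the optimized build).
pgo_flags() {
    case $1 in
        generate) phase="-fprofile-generate" ;;
        use)      phase="-fprofile-use -fprofile-correction -Wno-missing-profile" ;;
        *)        echo "usage: pgo_flags generate|use" >&2; return 1 ;;
    esac
    echo "-O3 -mcpu=power9 -flto $phase"
}

# Print (not run) the matching configure command for target $2.
pgo_configure_cmd() {
    flags=$(pgo_flags "$1") || return 1
    target=${2:-ppc-softmmu}
    printf '../configure --extra-cflags="%s" --extra-cxxflags="%s" --extra-ldflags="%s" --target-list=%s\n' \
        "$flags" "$flags" "$flags" "$target"
}

pgo_configure_cmd generate
pgo_configure_cmd use x86_64-softmmu
```

Paste the printed line into the build directory for each phase, or pipe it to sh once you trust it.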

Enjoy the new hotness. You should be able to see measurable improvements in the CPU emulation, but more importantly, boot times and responsiveness of the full system emulation should also be improved.

For 5.0.0, the process is a bit more complicated, but it's a bit quicker, so I found it worth it (and it's what I currently use for Mac OS 9). In the QEMU source tree, configure the build:

  • ./configure --extra-cflags="-O3 -mcpu=power9 -flto -fprofile-generate" \
    --extra-cxxflags="-O3 -mcpu=power9 -flto -fprofile-generate" \
    --extra-ldflags="-flto -fprofile-generate" --target-list=ppc-softmmu
  • make -j24

Run your profile as before. However, you need to preserve the profile before the rebuild because make clean will clobber it.

  • tar cvf instrumented.tar `find . -name '*.gcda' -print`
  • make clean
  • tar xf instrumented.tar
  • ./configure --extra-cflags="-O3 -mcpu=power9 -flto -fprofile-correction -fprofile-use -Wno-missing-profile" \
    --extra-cxxflags="-O3 -mcpu=power9 -flto -fprofile-use -fprofile-correction -Wno-missing-profile" \
    --extra-ldflags="-flto -fprofile-use -fprofile-correction -Wno-missing-profile" \
    --target-list=ppc-softmmu
  • make -j24
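
Since the whole second pass depends on the .gcda files surviving make clean, a quick count before and after can catch a profile that silently went missing. The following is a sketch under some assumptions: the helper names are hypothetical, and save_profiles assumes GNU find and GNU tar (it passes NUL-delimited names, which sidesteps the word splitting that the backtick form of the tar step relies on).

```shell
# Hypothetical helpers; neither is part of QEMU's build system.

# save_profiles ARCHIVE: archive every .gcda file under the current directory.
# Names are passed NUL-delimited (GNU find and GNU tar assumed).
save_profiles() {
    find . -type f -name '*.gcda' -print0 | tar --null -cf "$1" -T -
}

# count_profiles: print how many .gcda files exist under the current directory.
count_profiles() {
    find . -type f -name '*.gcda' | wc -l | tr -d ' '
}

# Example, run from the QEMU source tree after the profiling run:
#   save_profiles instrumented.tar
#   before=$(count_profiles)
#   make clean
#   tar xf instrumented.tar
#   [ "$(count_profiles)" -eq "$before" ] || echo "profile data missing" >&2
```

If the count after restoring doesn't match, stop there; with -Wno-missing-profile in effect, gcc will otherwise build without complaint and you simply won't get the optimization.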

Life's golden, and just a little bit zippier. It's not always possible to PGO all the things, but here's one where it makes a noticeable difference.

Firefox 86 on POWER


Firefox 86 is out, not only with multiple picture-in-picture (now have all the Weird Al videos open simultaneously!) and total cookie protection (not to be confused with other things called TCP) but also with some noticeable performance improvements, and it finally gets rid of Backspace backing you up, a key I have never pressed to go back a page. Or, maybe those performance improvements are due to further improvements to our LTO-PGO recipe, which uses Fedora's work to get rid of the sidecar shell script. Now with this single patch, plus their change to nsTerminator.cpp to allow optimization to be unbounded by time, you can build a fully link- and profile-guided optimized version for OpenPOWER and gcc with much less work. Firefox 86 also incorporates our low-level Power-specific fix to xpconnect.

Our .mozconfigs are mostly the same except for purging a couple iffy options. Here's Optimized:

export CC=/usr/bin/gcc
export CXX=/usr/bin/g++

mk_add_options MOZ_MAKE_FLAGS="-j24"
ac_add_options --enable-application=browser
ac_add_options --enable-optimize="-O3 -mcpu=power9"
ac_add_options --enable-release
ac_add_options --enable-linker=bfd
ac_add_options --enable-lto=full
ac_add_options MOZ_PGO=1

# uncomment if you have it
#export GN=/home/censored/bin/gn

And here's Debug:

export CC=/usr/bin/gcc
export CXX=/usr/bin/g++

mk_add_options MOZ_MAKE_FLAGS="-j24"
ac_add_options --enable-application=browser
ac_add_options --enable-optimize="-Og -mcpu=power9"
ac_add_options --enable-debug
ac_add_options --enable-linker=bfd

# uncomment if you have it
#export GN=/home/censored/bin/gn
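
If you keep both configurations around, one way to switch between them is to point mach at the one you want through the MOZCONFIG environment variable. This is only a sketch: the pick_mozconfig name and the directory and file names are hypothetical, so adjust them to wherever you actually keep yours.

```shell
# pick_mozconfig KIND: print the path of the .mozconfig to build with.
# The paths below are made up for illustration.
pick_mozconfig() {
    case "$1" in
        opt|optimized) echo "$HOME/mozconfigs/mozconfig.optimized" ;;
        debug)         echo "$HOME/mozconfigs/mozconfig.debug" ;;
        *)             echo "usage: pick_mozconfig opt|debug" >&2; return 1 ;;
    esac
}

# Example, run from the top of the Firefox source tree:
#   MOZCONFIG="$(pick_mozconfig opt)" ./mach build
```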

Blackbird supply chain likely to improve


(Thanks to a reader tip.) Although yours truly is a Talos man (and was ever since it was going to have a POWER8 in it), the Blackbird is certainly far more attractive in terms of price. Backorders due to COVID-19's effect on the global supply chain have plagued it for months, but Raptor management on IRC indicates that logjam may be breaking; the first sign just a few days ago is that the 18-core monster POWER9 v2s (DD2.3) were back in stock. Obviously 18-cores don't (routinely) go in Blackbirds, but their presence suggests the supply chain issues are resolving and that a minimum order from IBM was met.

Raptor is well aware that the Blackbirds, more so than the T2 and T2 Lite, are its leading workstation product, and said there was "lots of demand" too ("about the only positive in the whole pandemic-induced mess"). However, Raptor's Timothy Pearson in the same IRC chat also commented that "we're playing it safe and focusing more on the next generation products than taking risks with POWER9 ... I can categorically state that if COVID19 had never happened, we'd have already offered other chips and we'd have at least one other product on the market designed around P9 by now." The latter sounds like a reference to Condor, Raptor's cancelled LaGrange system, but as long as POWER10 still has openness concerns, what "other chips"?

Gentoo on little-endian


A nice write-up by Martin Kukač on getting Gentoo to be happy on little-endian: even though many Linux distributions support LE, and some now only do, if you install Gentoo from the Minimal Installation CD and try to use the ppc64le stage 3 tarball there's an endian mismatch and it doesn't work (dies during the install steps with /bin/bash in incompatible format). The issue appears to be that the Minimal Installation CD itself is big-endian; there is currently no analogous little-endian image. Martin's brainwave was to complete the installation from an already running little-endian system (he used RiscySlack but Void should work as well). Following his steps, the OS builds in little-endian mode from within the second OS, and you can then boot into it. Good to have the choice and a nice how-to.

A better theory on why there won't be an open POWER10 workstation for awhile


In our previous analysis we suspected that Raptor's indigestion over POWER10 was IBM failing to release some component of the firmware, meaning it wasn't a truly open platform after all. Raptor, under whatever NDA prohibited them, couldn't say, but there was enough to do some educated reading between the lines regarding the problem.

So hats off to Hugo Landau, who did his own research on the subject. As you will recall, for POWER8 IBM introduced the Centaur memory buffers which serve essentially as off-chip memory controllers and a fourth level of cache, and scale-up Cumulus POWER9s (not the Nimbus POWER9s in Raptor workstations) can use them too. This enables a lot of logic to be moved off-die and can turn what is a critical high-speed and potentially error-prone parallel interface into a serial one. IBM expanded this into the vendor-neutral Open Memory Interface, or OMI, which halves the latency of Centaur (to 5ns) and runs up to 25Gbps per lane. With OMI, RAM technology can advance separately from the CPU, and the processor can be completely agnostic about what it's attached to (as opposed to Cumulus, which only "speaks" Centaur, and our Nimbus systems which use commodity directly-attached DDR4 RAM through an on-chip controller).

We reported previously that at the 2019 OpenPOWER summit Microchip Technology was announced as the first vendor of OMI DDIMMs, and although Micron, Samsung and SMART Modular were listed as planning to release their own, so far the only vendor of OMI controllers appears to be Microchip. We haven't heard anything about a Nimbus-alike POWER10 yet with direct-attached memory, so we have to assume that at least the first wave of POWER10 processors will only use OMI. Hugo's discovery was an obscure GitHub repo that appears to contain the firmware for the Microchip OMI controller, and no source code. Read Hugo's article for the additional dirty details.

The concept of RAM that requires firmware binary blobs is frankly very disconcerting: I shouldn't have to explain to any regular reader of this blog that if you own the RAM, you own the store, and you could potentially own the RAM this way (even/especially with a vendor lock: see SolarWinds). I won't say how I have knowledge of this, but various other cues indicate to me Hugo has found the exact reason POWER10 can't be considered open under any reasonable definition.

POWER9 systems can't last forever, of course. If there were going to be a truly open POWER10 system, we'd either have to reverse-engineer the Microchip controller firmware or develop a separate open memory controller of "our" own. Likewise, I'm pretty sure Raptor doesn't want to be in the DDIMM business, so if a separate Raptor-specific controller were required it may be simpler to just have RAM on the board as a build-to-spec option. Either way, while I understand IBM's decision with OMI to cater to their bandwidth-hungry institutional customers, the implementation they've chosen may put those very same high-value customers at risk. We should be glad Raptor didn't make the same choice and fortunately POWER9 systems will still be able to hold their own for a while.

Followup on Firefox 85 for POWER: new low-level fix


Shortly after posting my usual update on Firefox on POWER, I started to notice odd occasional tab crashes in Fx85 that weren't happening in Firefox 84. Dan Horák independently E-mailed me to report the same thing. After some digging, it turned out that our fix way back when for Firefox 70 was incomplete: although it renovated the glue that allows scripts to call native functions and fixed a lot of problems, it had an undiagnosed edge case where if we had a whole lot of float arguments we would spill parameters to the wrong place in the stack frame. Guess what type of function was now getting newly called?

This fix is now in the tree as bug 1690152; read that bug for the dirty details. You will need to apply it to Firefox 85 and rebuild, though I plan to ask to land this on beta 86 once it sticks and it will definitely be in Firefox 87. It should also be applied to ESR 78, though that older version doesn't exhibit the crashes with the frequency Fx85 does. This bug also only trips in optimized builds.

Firefox 85 on POWER


Firefox 85 declares war on supercookies, enables link preloading and adds improved developer tools (just in time, since Google's playing games with Chromium users again). This version builds and runs so far uneventfully on this Talos II. If you want a full PGO-LTO build, which I strongly recommend if you're going to bother building it yourself, grab the shell script from Firefox 82 if you haven't already and use this updated diff to enable PGO for gcc. Either way, the optimized and debug .mozconfigs I use are also unchanged from Fx82. At some point I'll get around to writing an upstreamable patch and then we won't have to keep carrying the diff around.

Introducing Kestrel, part II: it's a soft-BMC


Well, geez, guys, why didn't you just say so in the first place? Kestrel is a "soft" BMC replacement, meaning you can devise your own Baseboard Management Controller/service processor to bring an OpenPOWER system up from absolutely nothing but a Lattice ECP5. Now, that's cool!

The underpinnings are strongly based on the Microwatt soft core, which makes Kestrel a true OpenPOWER processor itself, not ARM like the ASPEED BMC on shipping Raptor systems. Kestrel is not currently far enough along to bring up the On-Chip Controllers on the POWER9 (PowerPC 4xx-like cores), but this appears to merely be a matter of adding more IPMI command support. It is, however, enough to kick off the POWER9's Self-Boot Engines and go into Hostboot, so the basics absolutely work.

Right now I think this system is a little raw for general usage, and the soldering requirement on a several thousand dollar board that's badly backordered is not appealing. The whole dev stack is also intended for Raptor systems, though to be honest if you care about Kestrel you undoubtedly already own one. But Raptor is to be commended for making a shippable product out of Microwatt and making it truly open, as one would expect from them. What I'm interested to see, however, is whether future Raptor systems have Kestrels on board instead of the ASPEED. That would be really impressive in terms of owner control and would make the current valiant but vain efforts to neuter x86 firmware look even more pathetic.