
Showing posts from 2023

Fedora 39 mini-review on the Blackbird and Talos II (and other woes)


Merry Christmas and Happy Holidays: what a long, strange trip it's been getting to this point, between finding a place to live near the new $DAYJOB, fixing the Blackbird (which seemed to have gotten its BMC settings scrambled), and then upgrading it and the Talos II to Fedora 39 so that I can get back to work on the Firefox JIT. Here we finally are, just in time to write up our usual mini-review (see what I wrote for Fedora 38). As I always say in these mini-reviews, Fedora was one of the first mainstream distributions to support POWER9 out of the box, it's still one of the top distributions OpenPOWER denizens use, and its position closest to the bleeding, ragged edge is where we see problems emerge first and get fixed (hopefully) before they move further downstream. That's why it's worth caring about even if you don't run it yourself.

Also, as usual, recall that both my T2 and Blackbird are configured to come up in a text boot instead of gdm, and I start KDE manually from there. I still test GNOME on both systems, but my primary desktop environment has been KDE Plasma for several versions now, and I always recommend a non-graphical boot as a recovery mechanism in case your graphics card gets whacked by something or other. On Fedora this is easily done by ensuring the symlink /etc/systemd/system/default.target points to /lib/systemd/system/multi-user.target.
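
systemctl set-default multi-user.target creates that symlink and systemctl get-default reports it. If you'd rather check it from a script (say, as a pre-upgrade sanity check), here is a minimal Python sketch; the function name is hypothetical and it only looks at where the symlink finally resolves.

```python
import os

DEFAULT_TARGET = "/etc/systemd/system/default.target"


def boots_to_text(link=DEFAULT_TARGET):
    """True if default.target resolves to multi-user.target (text boot, no gdm)."""
    # realpath follows the whole chain (e.g. /lib -> usr/lib on Fedora);
    # a missing link simply resolves to itself and fails the name check.
    return os.path.basename(os.path.realpath(link)) == "multi-user.target"


if __name__ == "__main__":
    print("text boot" if boots_to_text() else "graphical (or custom) boot")
```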

Unfortunately, dnf kernel updates still! don't seem to properly update grub's config (basically bug 1921479, showing messages like 0ed84c0-p94177c1: integer expression expected during the process), so the process remains largely unchanged from F38:

dnf upgrade --refresh # upgrade prior system and DNF
grub2-mkconfig -o /boot/grub2/grub.cfg # force grub to update
dnf install dnf-plugin-system-upgrade # install upgrade plugin if not already done
dnf system-upgrade download --refresh --releasever=39 # download F39 packages
dnf system-upgrade reboot # reboot into upgrader

I do the Blackbird first as a check, since it can afford to be down waiting for updates, but I can't have that happening with the Talos II since it's my primary workstation. Unfortunately, while the packages downloaded okay, the actual upgrade process didn't do anything. After a couple of failed attempts where it rebooted back into 38 seemingly unchanged, I watched it like a hawk and observed the following:

The installer was saying that the Fedora 39 GPG signatures weren't valid "yet" and so nothing that was downloaded could be verified. This turns out to be the same basic issue that bedevils Raspberry Pi 4 and earlier models that lack a realtime clock (bug 2242759), but the Blackbird has an RTC, so what gives?

I checked the CMOS battery coin cell and got a full 3.0 volts, and couldn't find anything otherwise wrong with the installer itself. The answer came from laboriously going through the logs, which showed the system time was about two years in the past, confirmed in the Petitboot shell. In a normal boot this gets fixed by chrony syncing with NTP, but that apparently doesn't happen in the installer. Worse, it looked like everything had somehow gotten scrambled in the Blackbird's BMC, because I couldn't log on with the admin password to fix the time (either from the SSH server or the web interface). Time to crack the case and connect to the BMC's serial port.
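
The failure is simple arithmetic: OpenPGP verifiers generally treat a key or signature whose creation time lies after the current clock as not yet valid, so a clock stuck in 2022 rejects anything signed later. A hypothetical pre-upgrade check along these lines would have flagged it. It is a sketch, not something dnf or Fedora provides: you supply a "floor" date you know the real time is past (for example, the day you ran the download step), and it compares that against the system clock and, where the kernel exposes it, the RTC's /sys/class/rtc/rtc0/since_epoch value. The installer environment may still set its clock differently from a normal boot.

```python
import sys
from datetime import datetime, timezone

RTC_PATH = "/sys/class/rtc/rtc0/since_epoch"  # Linux RTC class sysfs attribute


def not_yet_valid(created, now, skew=0):
    """A key/signature created after 'now' (beyond the allowed skew) is from the 'future'."""
    return created > now + skew


def read_rtc(path=RTC_PATH):
    """Seconds since the epoch according to the RTC, or None if unavailable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def preflight(floor, system_now, rtc_now=None):
    """List clock problems relative to a timestamp the real time must already be past."""
    problems = []
    for label, now in (("system clock", system_now), ("RTC", rtc_now)):
        if now is not None and not_yet_valid(floor, now):
            behind_days = (floor - now) / 86400
            problems.append(f"{label} is at least {behind_days:.0f} days behind")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 clockcheck.py 2023-12-01
    floor = datetime.strptime(sys.argv[1], "%Y-%m-%d").replace(tzinfo=timezone.utc).timestamp()
    issues = preflight(floor, datetime.now(timezone.utc).timestamp(), read_rtc())
    print("\n".join(issues) or "clocks look sane")
    sys.exit(1 if issues else 0)
```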

Connect your terminal to the internal 9-pin serial port header at 115200bps, 8-N-1. We're basically following these instructions to reset the BMC's internal persistent storage, which will zap everything including the BMC password (if you're an old Mac user like me, think of this as the ASPEED equivalent of "zapping PRAM"). One wrinkle is that the BMC can reset several times before its kernel actually comes up, so you have to set the bootargs again every time it drops back into U-Boot.

Here's what you might see in your terminal program. Make sure the terminal program is running and the serial port is connected before you apply power to the system to start up the BMC, or you won't be early enough to talk to U-Boot.

DRAM Init-V12-DDR4
0abc1-4Gb-Done
Read margin-DL:0.3725/DH:0.3803 CK (min:0.30)


U-Boot 2016.07 (Feb 19 2020 - 11:51:39 +0000)

       Watchdog enabled
DRAM:  496 MiB
Flash: 32 MiB
In:    serial
Out:   serial
Err:   serial
Net:   aspeednic#0
Hit any key to stop autoboot:  0 
ast# setenv bootargs console=ttyS4,115200n8 root=/dev/ram overlay-filesystem-in-ram rw
ast# boot
## Loading kernel from FIT Image at 20080000 ...
   Using 'conf@aspeed-bmc-opp-blackbird.dtb' configuration
   Trying 'kernel@1' kernel subimage
     Description:  Linux kernel
     Type:         Kernel Image
     Compression:  uncompressed
     Data Start:   0x20080128
     Data Size:    2656176 Bytes = 2.5 MiB
     Architecture: ARM
     OS:           Linux
     Load Address: 0x80001000
     Entry Point:  0x80001000
     Hash algo:    sha1
     Hash value:   1815ece74a2e27241c471b9ed87885071dd9e143
   Verifying Hash Integrity ... sha1+ OK
## Loading ramdisk from FIT Image at 20080000 ...
   Using 'conf@aspeed-bmc-opp-blackbird.dtb' configuration
   Trying 'ramdisk@1' ramdisk subimage
     Description:  obmc-phosphor-initramfs
     Type:         RAMDisk Image
     Compression:  lzma compressed
     Data Start:   0x20310f88
     Data Size:    1583941 Bytes = 1.5 MiB
     Architecture: ARM
     OS:           Linux
     Load Address: unavailable
     Entry Point:  unavailable
     Hash algo:    sha1
     Hash value:   55e0853d6ad703d5ea225837d85223a73f7cf3a4
   Verifying Hash Integrity ... sha1
DRAM Init-V12-DDR4
0abc1-4Gb-Done
Read margin-DL:0.3745/DH:0.3784 CK (min:0.30)


U-Boot 2016.07 (Feb 19 2020 - 11:51:39 +0000)

       Watchdog enabled
DRAM:  496 MiB
Flash: 32 MiB
In:    serial
Out:   serial
Err:   serial
Net:   aspeednic#0
Hit any key to stop autoboot:  0 
ast# setenv bootargs console=ttyS4,115200n8 root=/dev/ram overlay-filesystem-in-ram rw
ast# boot
## Loading kernel from FIT Image at 20080000 ...
   Using 'conf@aspeed-bmc-opp-blackbird.dtb' configuration
   Trying 'kernel@1' kernel subimage
     Description:  Linux kernel
     Type:         Kernel Image
     Compression:  uncompressed
     Data Start:   0x20080128
     Data Size:    2656176 Bytes = 2.5 MiB
     Architecture: ARM
     OS:           Linux
     Load Address: 0x80001000
     Entry Point:  0x80001000
     Hash algo:    sha1
     Hash value:   1815ece74a2e27241c471b9ed87885071dd9e143
   Verifying Hash Integrity ... sha1+ OK
## Loading ramdisk from FIT Image at 20080000 ...
   Using 'conf@aspeed-bmc-opp-blackbird.dtb' configuration
   Trying 'ramdisk@1' ramdisk subimage
     Description:  obmc-phosphor-initramfs
     Type:         RAMDisk Image
     Compression:  lzma compressed
     Data Start:   0x20310f88
     Data Size:    1583941 Bytes = 1.5 MiB
     Architecture: ARM
     OS:           Linux
     Load Address: unavailable
     Entry Point:  unavailable
     Hash algo:    sha1
     Hash value:   55e0853d6ad703d5ea225837d85223a73f7cf3a4
   Verifying Hash Integrity ... sha1+ OK
## Loading fdt from FIT Image at 20080000 ...
   Using 'conf@aspeed-bmc-opp-blackbird.dtb' configuration
   Trying 'fdt@aspeed-bmc-opp-blackbird.dtb' fdt subimage
     Description:  Flattened Device Tree blob
     Type:         Flat Device Tree
     Compression:  uncompressed
     Data Start:   0x203089e8
     Data Size:    34013 Bytes = 33.2 KiB
     Architecture: ARM
     Hash algo:    sha1
     Hash value:   f53d8ad3a7c573a4903f910fd124507c73f0bbfb
   Verifying Hash Integrity ... sha1+ OK
   Booting using the fdt blob at 0x203089e8
   Loading Kernel Image ... OK
   Loading Ramdisk to 9ea16000, end 9eb98b45 ... OK
   Loading Device Tree to 9ea0a000, end 9ea154dc ... OK

Starting kernel ...

[...]

Phosphor OpenBMC (Phosphor OpenBMC Project Reference Distro) 0.1.0 blackbird ttyS4

blackbird login: root
Password: 0penBmc
root@blackbird:~# flash_eraseall /dev/mtd/rwfs
Erasing 64 Kibyte @ 400000 - 100% complete.
root@blackbird:~# reboot

Notice that it fell back to U-Boot once before actually going into the OpenBMC kernel. You need to set the bootargs both times (because the watchdog is aggressive and will reboot after even a brief period of inactivity, I recommend cutting and pasting that setenv command instead of typing it). Once that's done, the default 0penBmc password will work, and we can reset the persistent storage. You may need to power-cycle the BMC afterwards, as the manual reboot may not be enough.
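
Because the watchdog punishes slow typing, the U-Boot keystrokes can also be scripted. This is a minimal Python sketch, not anything from OpenBMC or Raptor. It assumes the third-party pyserial package, a serial adapter at a device path you supply (/dev/ttyUSB0 is only a placeholder), and that the autoboot banner and ast# prompt look like the log above. It stops the countdown, types the setenv and boot commands every time U-Boot comes back, and hands off once the login prompt appears; you still log in and run flash_eraseall yourself. Close your regular terminal program first, since only one program can own the port.

```python
import sys

# Bootargs from the session above: OpenBMC with its overlay held in RAM.
BOOTARGS = "console=ttyS4,115200n8 root=/dev/ram overlay-filesystem-in-ram rw"
AUTOBOOT = "Hit any key to stop autoboot"
PROMPT = "ast# "
LOGIN = "login:"


class BootargsInterceptor:
    """Consumes serial output and returns the text to type back, if any."""

    def __init__(self, bootargs=BOOTARGS):
        self.bootargs = bootargs
        self.state = "wait_autoboot"
        self.buffer = ""
        self.interrupts = 0  # how many times U-Boot has been caught

    def _consume(self, marker):
        # Drop everything up to and including the marker we just acted on.
        self.buffer = self.buffer.split(marker, 1)[1]

    def feed(self, text):
        self.buffer += text
        replies = []
        while self.state != "done":
            if LOGIN in self.buffer:
                self.state = "done"  # OpenBMC is up; the human takes over
            elif self.state == "wait_autoboot" and AUTOBOOT in self.buffer:
                self._consume(AUTOBOOT)
                self.interrupts += 1
                self.state = "wait_prompt_setenv"
                replies.append("\r")  # any key stops the countdown
            elif self.state == "wait_prompt_setenv" and PROMPT in self.buffer:
                self._consume(PROMPT)
                self.state = "wait_prompt_boot"
                replies.append(f"setenv bootargs {self.bootargs}\r")
            elif self.state == "wait_prompt_boot" and PROMPT in self.buffer:
                self._consume(PROMPT)
                # A watchdog reset lands us back in U-Boot, so start over.
                self.state = "wait_autoboot"
                replies.append("boot\r")
            else:
                break  # nothing actionable yet
        # Keep only a short tail so kernel boot chatter doesn't pile up.
        self.buffer = self.buffer[-256:]
        return replies


def run(port):
    import serial  # third-party pyserial (pip install pyserial)

    interceptor = BootargsInterceptor()
    # 115200 bps, 8 data bits, no parity, 1 stop bit (8-N-1).
    with serial.Serial(port, baudrate=115200, bytesize=serial.EIGHTBITS,
                       parity=serial.PARITY_NONE, stopbits=serial.STOPBITS_ONE,
                       timeout=0.1) as tty:
        while interceptor.state != "done":
            chunk = tty.read(256).decode("ascii", errors="replace")
            sys.stdout.write(chunk)
            sys.stdout.flush()
            for reply in interceptor.feed(chunk):
                tty.write(reply.encode("ascii"))
    print("\n[login prompt reached; log in as root and continue by hand]")


if __name__ == "__main__" and len(sys.argv) > 1:
    run(sys.argv[1])
```

Start it before applying power, just as with a terminal program, or it will miss the first countdown.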

With that corrected, I was able to log into the web interface and set the time manually, and I also made sure the time owner was set to Host (i.e., not the BMC) so that hopefully the system can deal with this itself on startup again. Petitboot confirmed the time was correct, and the install succeeded. On my Blackbird I got the usual graphical progress bar on the ASPEED BMC framebuffer; on my T2 with a BTO AMD WX7100, as long as you make sure to select the kernel manually in Petitboot on startup, you'll still get to see the install log live as text. If you forget to select the kernel manually and the system comes up to an apparently black screen, you can monitor progress on the serial port (directly, or from another system viewing the serial console through the BMC's web interface), or log into another VT as root (CTRL-ALT-F2 or as appropriate) and periodically issue dnf system-upgrade log --number=-1 to watch the log.

The update did not cause a stuck XFS log entry on the Blackbird, but after the reboot I still had to run grub2-mkconfig -o /boot/grub2/grub.cfg one more time and restart to make sure the right kernel and version were being used. As of this writing, the current kernel version is 6.6.7.

Fedora 39 is apparently the last hurrah for the X11 session of KDE Plasma, largely because the Fedora KDE SIG doesn't want to deal with it. While Plasma 6 still has a legacy X11 session, KDE as provided in the Fedora spin won't have it — Wayland Wasteland or bust. (In particular, this will obsolete both kwin-x11, presumably including /usr/bin/kwin_x11 and /usr/bin/startplasma-x11, and plasma-workspace-wayland.) Naturally this directly affects me personally, so let's start with the system with the worst Wayland support, our stripped-down Blackbird. All it has is the ASPEED BMC framebuffer connected over HDMI.

(GNOME Wayland started manually with XDG_SESSION_TYPE=wayland /usr/libexec/gnome-session-binary --builtin)
(KDE Plasma 5 Wayland started manually with /usr/bin/startplasma-wayland)

Both GNOME 45 (GNOME's own X11 session support removal is further behind, and doesn't appear imminent) and Plasma 5 in F39 are still limited to 1024x768 with the on-board HDMI. This is a terrible limitation that yet again remains unaddressed, especially for those people trying to run their systems completely blobless, and also affects Arctic Tern which uses the same IT66121FN HDMI transceiver PHY. While performance remains stably improved over the horrid morass it used to be, further fixes appear to be stuck on a plateau.

The good news is, a lot of the graphical glitches that plagued F38 and earlier in GNOME on the AST framebuffer (both Wayland and X11) appear to be corrected in GNOME 45. Window decorations and window movement seem to work properly again and performance is good enough. This places me in the unusual situation of recommending GNOME to blobless or no-GPU OpenPOWER users, even though it's not a particularly lightweight desktop environment, simply because you'll still be able to run it under X11 at least for a little while and you can add an X11 modeline for higher resolution — as I've done. Otherwise, start budgeting for a video card.
Plasma 5, of course, continues to work fine without a GPU in X11. Enjoy that while it lasts.
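
For the X11 modeline route, a common approach is to generate timings with the cvt utility and register them with xrandr. The helper below is a hypothetical sketch that turns a Modeline line of the form cvt prints into the usual three xrandr commands, plus the refresh rate as a sanity check (pixel clock divided by horizontal total times vertical total; interlace and doublescan flags are ignored). The output name is a placeholder: take yours from running xrandr with no arguments. Whether the AST framebuffer and IT66121 transceiver accept a given pixel clock is something to test, not assume.

```python
import shlex


def parse_modeline(line):
    """Split 'Modeline "name" clock h1 h2 h3 h4 v1 v2 v3 v4 [flags]' into name and timings."""
    tokens = shlex.split(line)
    if len(tokens) < 11 or tokens[0].lower() != "modeline":
        raise ValueError(f"not a modeline: {line!r}")
    return tokens[1], tokens[2:]


def refresh_hz(line):
    """Vertical refresh = pixel clock / (htotal * vtotal); ignores interlace/doublescan."""
    _, t = parse_modeline(line)
    clock_mhz, htotal, vtotal = float(t[0]), int(t[4]), int(t[8])
    return clock_mhz * 1e6 / (htotal * vtotal)


def xrandr_commands(line, output):
    """Define the mode, attach it to an output, then switch that output to it."""
    name, timings = parse_modeline(line)
    return [
        shlex.join(["xrandr", "--newmode", name, *timings]),
        shlex.join(["xrandr", "--addmode", output, name]),
        shlex.join(["xrandr", "--output", output, "--mode", name]),
    ]


if __name__ == "__main__":
    # Placeholder input; substitute your own cvt output and output name.
    example = 'Modeline "1920x1080_60.00"  173.00  1920 2048 2248 2576  1080 1083 1088 1120 -hsync +vsync'
    print(f"{refresh_hz(example):.2f} Hz")
    for cmd in xrandr_commands(example, "VGA-1"):
        print(cmd)
```

To make a working mode stick across sessions, the same timings can go into an Xorg configuration file instead of being re-added with xrandr each time.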

Next was the Talos II, which has a custom KDE theme and a lot more packages. Its clock is fine, so that wasn't the problem, but the install left a stuck log entry on the XFS root again and Petitboot duly puked. I haven't (quite, anyway) gotten to the point of replacing the XFS root with ext4, but I now have a script to automatically do the volume group gyrations and XFS repair on the Blackbird instead of typing in commands. For future upgrades I'll also run ipmitool chassis bootdev safe (at the suggestion of a commenter) before starting the installer, so that if I can't scan for devices, I can at least restart, immediately jump into the Petitboot shell and do repairs if necessary. Note that turning off the automatic device scan can impair autobooting, such as when I remotely bring up the system from the BMC after a power failure, so I set it back to normal with ipmitool chassis bootdev none after the upgrade and repair were successful.

So far F39 is running fine on the T2 as well, with only a couple of old compat packages that didn't make the jump (meanwhile, for those of you hanging on to F37, support for it is already over). F40, scheduled as of this writing for mid-April 2024, is where the problems are expected to start for me with KDE. As usual no one cares, least of all the Wayland community, which has simply fallen back on strong-arm tactics to drive adoption, functionality be damned. I'll have to evaluate my choices then based on my app mix and how well Wayland-only KDE can handle them, and it won't be just OpenPOWER users like me in that boat.

Firefox 121


We're still in the process of finding a place to live at the new job and alternating back and forth to the tune of 400 miles each way. Still, this weekend I updated Firefox on the Talos II to Fx121, which fortunately also builds fine with the WebRTC patch from Fx116 (or --disable-webrtc in your .mozconfig), the PGO-LTO patch from Fx117 and the .mozconfigs from Firefox 105.

Unfortunately I had intended to also sit down with the Blackbird and do a test upgrade to Fedora 39 before doing so on the Talos II, but the Blackbird BMC's persistent storage seems to be hosed, the BMC password is whacked and the clock is permanently stuck in June 2022, causing signature checks on the upgrade to fail (even with --nogpgcheck). This is going to require a little work with a serial console and I just didn't have enough spare cycles over the weekend, so I'll do that over the Christmas holiday when we have a few free days. Hopefully I can also get some more work done on upstreaming the JIT at the same time.

Fedora 39


Fedora 39 is out, the Linux distro I use personally on my Talos II and Blackbird systems. Since I'm doing a lot of remote work with the new $DAYJOB, it might be a bit before I can sit down with it, but there will be the usual mini-review (here's what I had to say about Fedora 38). This release is based on kernel 6.5 and GNOME 45, with LLVM 17, Perl 5.38, glibc 2.38 and gcc 13.2. As usual, Fedora 37 will EOL one month after this release.

However, F39 isn't really the big news, and probably won't be a very exciting (or bumpy) release. Instead, the bigger and possibly more obnoxious change will come with Fedora 40, when the Plasma spin (which I use instead of GNOME) is expected to drop X11 support completely, and the same is likely coming for GNOME 46, which is expected to ship in the same release. It will be interesting, to say the least, to see how this affects people running OpenPOWER systems without GPUs in order to be completely blob-less. I've not been terribly happy with Wayland on the Blackbird's ASPEED BMC framebuffer, and I haven't seen anything to indicate that its deficiencies have improved. Wonder how well it would work on the X1 ...

Firefox 119 and the next ppc64le JITeration


Although I've been a bit preoccupied lately with a new $DAYJOB which has required me to be remote, let's not bury the (larger) lede: the first iteration of the Firefox/SpiderMonkey ppc64le JIT is being evaluated by Mozilla to determine if the changes are acceptable. Please don't spam the Bugzilla entry with drive-by comments, but if you'd like to observe its progress, you can follow along in bug 1860412.

That doesn't mean, of course, that you can't try it yourself. The current JIT state for 115ESR now supports baseline Wasm as well as full optimizing Ion compilation for regular JS, and passes the complete test suite on Linux. It does not yet support POWER8, nor the optimizing Wasm compiler, so some applications will not run as well as they should (and obnoxiously asm.js code is not JITted at all in this configuration because it relies on the optimizing Wasm compiler, despite the fact it's regular JavaScript — for TenFourFox, which didn't support Wasm otherwise, I hacked JavaScript to simply compile asm.js with regular Ion). However, I do intend to add support for optimized Wasm and later POWER8, and with that said, the testers I've been seeding this with see good improvements for the vast majority of sites and no additional reproducible crashes so far.

If you'd like to give it a shot as well, then apply the new patches numerically and build as we did for Firefox 115, using the .mozconfigs from Firefox 105. For your convenience the JIT patch set already includes the PGO-LTO and WebRTC fixes for that version. If you don't want to roll your own browser (though I highly recommend it), then Dan Horák has you covered with a copr build for Fedora users. However, I don't intend to backport POWER8 or optimizing Wasm support to 115ESR; future work will be done on trunk, assuming Mozilla is fine with the existing changes. Do not post bugs with the ESR JIT to bug 1860412.

Apart from that, the other Firefox news is anticlimactic: Firefox 119 (I did a test build of Fx118 but hadn't tested enough to post about it) builds fine with the WebRTC patch from Fx116 (or --disable-webrtc in your .mozconfig), the PGO-LTO patch from Fx117 and the .mozconfigs from Firefox 105.

The next Raptor OpenPOWER systems are coming, but they won't be Power10


I'd like to first start out by saying I've been aware of new developments but made certain promises to keep my mouth shut until all the parties were ready to announce. (Phoronix is not so constrained.) Many of you noted an offhand comment in this YouTube video about Raptor announcing a new Power10 system. That got a lot of people excited, because while our POWER9 systems are doing well, in 2023 this dual-8 64-thread POWER9 is no longer cutting edge and we need new silicon in the pipeline to keep the ecosystem viable.

Raptor yesterday officially announced that we're not getting Power10 systems. The idea is we're going to be getting something better: the Solid Silicon S1. It's Power ISA 3.1 and fully compatible, but it's also a fully blob-free OpenPOWER successor to the POWER9, avoiding Power10's notorious binary firmware requirement for OMI RAM and I/O.

I asked Timothy Pearson at Raptor about the S1's specs, and he said it's a PCIe 5.0 DDR5 part running from the high 3GHz to low 4GHz clock range, with the exact frequency range to be determined. (OMI-based RAM not required!) The S1 is bi-endian, SMT-4 and will support at least two sockets with an 18-core option confirmed for certain and others to be evaluated. This compares very well with the Power10, which is also PCIe 5.0, also available as SMT-4 (though it has an SMT-8 option), and also clocks somewhere between 3.5GHz and 4GHz.

S1 embeds its own BMC, the X1 (or variant), which is (like Arctic Tern) a Microwatt-based ISA 3.1 core in Lattice ECP5 and iCE40 FPGAs with 512MB of DDR3 RAM, similar to the existing ASPEED BMC on current systems. X1 will in turn replace the existing Lattice-based FPGA in Arctic Tern as "Antarctic Tern," being a functional descendant of the same hardware, and should fill the same roles as a BMC upgrade for existing Raptor systems as well as the future BMC for the next generation systems and a platform in its own right. The X1 has "integrated 100% open root of trust" as you would expect for such a system-critical part.

Raptor's newest systems are planned for late 2024. There will be tiering, so most likely (though not confirmed) Blackbird, T2 and T2 server classes of systems will be available under new names. Price? Well, you'll just have to wait and see.

Solid Silicon is definitely a new name in the Power ecosystem and we don't know a lot about them. There's a web page, but the TwXitter and LinkedIn links are unpopulated as of this writing, and it's maddeningly minimal on actual content. Tim confirmed they are a new licensee and have been working on the design for at least a couple years. The press release gives a 737 area code, which is Austin, Texas, and the only Solid Silicon business entry I could find for Texas is this one for Solid Silicon Technology LLC in Plano. I'm told this isn't them, so if anyone from Solid Silicon would like to lift the corporate veil a little, drop me a line at ckaiser at floodgap dawt com. [UPDATE: The LinkedIn was updated after this posted, listing Todd Rooke as CEO. Rooke's listing indicates past experience with FPGAs, as well as his time at HPE and Microsoft. His location is given as Colorado Springs but Colorado lists no company by that name. Hopefully more to come.]

But besides new systems in the offing, it's also good news that we're getting — we hope — performant OpenPOWER chips that aren't from IBM. I don't have anything against IBM; I've worked with IBM hardware for literally decades, and my home server is a classic POWER6 that just keeps on truckin'. But IBM designs chips to benefit IBM's world, which is server rooms (ask anyone who's got one what it's like to share an office with a POWER8), and IBM doesn't do end-user sales. If Raptor has a good partner here who can design solid OpenPOWER chips for workstations and small servers, not traditionally IBM's present domain but one important for them to maintain if they want OpenPOWER to stay relevant, then in around a year we should be in for a treat — and a very rosy near future.

OpenBSD 7.4


OpenBSD 7.4 is released, an increasingly mature BSD option for OpenPOWER systems. While there are relatively few improvements this time around specifically for the powerpc64 port, this release does have improvements for SMP (a big deal for our pervasively SMT cores), performance and security upgrades with the virtual memory manager, and updates and bugfixes to userland. You can download it, or read the entire changelog.

Progress on the Firefox ppc64le JIT


A picture is worth a thousand Wasm opcodes. This is further along than we've gotten on earlier drafts. More soon.

Partial ppc64le JIT available again for Firefox 115ESR


I've been rehabilitating the old ppc64le JIT against more current Firefoxes, and there is now a set of patches you can apply to the current 115ESR. This does not yet include support for Ion or Wasm; the first still has some regressions, and the second has multiple outright crashes. Still, even with just the Baseline Interpreter and Baseline Compiler it is several times faster on benchmarks than the interpreter-only 115. I've also included the relevant PGO-LTO and WebRTC patches so you can just apply the changesets numerically and build. The patches and the needed .mozconfigs for building either an optimized browser or a debug JS shell (should you want to poke around) are in this GitHub issue.

While this passes everything that is expected to pass, you may still experience issues using it, and you should not consider it supported. Always back up your profile first. But it's now an option for those of you who were using the previous set of patches against 91ESR.

ppc64le JIT now officially landing (again) in DOSBox Staging


Waaaay back when, I wrote up a basic dynamic recompilation JIT for vanilla DOSBox (the most well-known of the DOS-specific emulators, if you've been under a rock for a while), which increases performance in x86 protected mode by as much as several times. This was an unofficial patch and I just kept it out of the tree, since the 32-bit PowerPC JIT it was based on wasn't part of it either.

Well, little did I know, but the patch got picked up as part of the DOSBox Staging spin six months later and apparently ran fine until an upstream commit broke it. I never noticed because I was happily using my old build, but Trung Lê did and reported it. So I fixed it and added proper support for 4K or 64K page sizes, and it was committed to the source tree today as part of 0.81. If you can't wait, build from source today, or wait for your package manager to pick it up whenever 0.81 gets formally released.
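
Why page size matters in general terms: a JIT has to size and protect its code buffers in whole host pages, and mprotect() requires a page-aligned start address, so arithmetic that hardcodes 4096 breaks on kernels built with 64K pages, as Fedora's ppc64le kernels are. The sketch below shows only the rounding arithmetic plus an anonymous mapping sized from the runtime page size. It is not the DOSBox Staging patch (which is C++), and Python's mmap module can't make memory executable, so the protection change itself is left out.

```python
import mmap


def round_up(n, page):
    """Round n up to a multiple of page; page must be a power of two."""
    return (n + page - 1) & ~(page - 1)


def align_down(addr, page):
    """Round addr down to the start of its page (what mprotect() wants)."""
    return addr & ~(page - 1)


def pages_spanned(addr, length, page):
    """Whole pages a protection change over [addr, addr + length) must cover."""
    return (round_up(addr + length, page) - align_down(addr, page)) // page


def alloc_code_buffer(size):
    """Anonymous read/write mapping rounded up to whole pages of the running kernel."""
    page = mmap.PAGESIZE  # queried at runtime, never hardcoded
    return mmap.mmap(-1, round_up(size, page))


if __name__ == "__main__":
    buf = alloc_code_buffer(5000)
    print(f"page size {mmap.PAGESIZE}, buffer {len(buf)} bytes")
    buf.close()
```

On a 4K-page kernel a 5000-byte request becomes two pages; on a 64K-page kernel it becomes one, and any neighbouring data sharing that 64K page gets its protection changed too unless the buffer is laid out with that in mind.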

Firefox 117 on POWER


Now that the Talos II is upgraded and tuned up, it's back to development work, starting with (after a TenFourFox patch dump) Firefox 117. Maybe it's just me, but it seems subjectively zippier than 116, even accounting for the cruft that builds up during long browser sessions, and there are some notable developer-facing improvements. As usual, for the long-playing bug 1775202, either put --disable-webrtc in your .mozconfig if you don't need WebRTC, or tweak third_party/libwebrtc/moz.build with the patch from Fx116. The browser otherwise builds and works with a tweaked PGO-LTO patch and the .mozconfigs from Firefox 105.

It's a plug and pray night


The Talos II got an upgrade today but not without a lot of messing around. Some of you have noted I've said little about further Firefox JIT updates and that's because of 1) $DAYJOB and 2) running dangerously low on space on the 1TB Samsung 960 EVO NVMe SSD mounted on /home, so having lots more source code sitting around wasn't happening.

Well, as of Monday I'm now officially between jobs (don't worry, I'll be getting a paycheck again in October) and I finally got the T2 to cooperate with a new 2TB Samsung 980 PRO. At the same time I also replaced the Raptor BTO Marvell 88SE9235 SATA card with a JMicron JMB582 card. It's two ports instead of four, but I'm only using it for the two optical drives, and it seems to be much more reliable than the Marvell which would sometimes come up with drives and other times stall out.

But getting them to all work together was unexpectedly tricky. Here's what's in there now.

% lspci
0000:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0000:01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon Pro WX 7100]
0000:01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590]
0001:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0001:01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO
0002:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0003:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0003:01:00.0 USB controller: Texas Instruments TUSB73x0 SuperSpeed USB 3.0 xHCI Host Controller (rev 02)
0004:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0004:01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0004:01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5719 Gigabit Ethernet PCIe (rev 01)
0005:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0005:01:00.0 PCI bridge: ASPEED Technology, Inc. AST1150 PCI-to-PCI Bridge (rev 04)
0005:02:00.0 Multimedia video controller: ASPEED Technology, Inc. ASPEED Graphics Family (rev 41)
0030:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0030:01:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM961/PM961/SM963
0031:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0032:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0033:00:00.0 PCI bridge: IBM POWER9 Host Bridge (PHB4)
0033:01:00.0 SATA controller: JMicron Technology Corp. JMB58x AHCI SATA controller

Originally, the NVMe drive I bought was a 2TB WD Black SN850X. This worked great in an external USB3 enclosure. I rsynced /home to it, put it into the PCIe carrier and restarted, and it failed to show up in Petitboot, lspci or Fedora. I tried a different passive adaptor on the off-chance that had something to do with it and moved it around the available slots, but nothing made it work. Later I found an old post on the Raptor forums reporting a similar problem, and I wasn't willing to buy one of those pricey switched multi-M.2 cards to try.

Since the boot drive is (still) a Raptor BTO Samsung 960 and the drive I was replacing was also a Samsung, I decided the cheaper option would be to just buy another Samsung, though the TLC flash in the 980 PRO makes it more like a 980 EVO in my book. Anyway, I left everything copying overnight, came back this morning, pulled the PCIe carrier, swapped the SSD sticks and fired it back up, and Petitboot wouldn't see it either. (I had installed the JMicron card at the same time and it did show up, but that wasn't too helpful just then.) At this point two drives acting exactly the same way, including one that would be very likely to work, made me suspicious this was a configuration problem.

This is a fully-populated dual-8 T2, so both CPUs and all five PCIe slots are live. At this point, other than the BMC, Ethernet and USB, the only device on the first CPU's slots was the Raptor BTO AMD Radeon Pro WX7100 workstation card in the 16x (the 8x is open); the original NVMe SSDs and the Marvell SATA card occupied the three lower slots (16x, 16x, 8x) handled by the second CPU.

I started off by pulling the SATA card completely and just leaving the two SSDs and the WX7100. Sforza POWER9s support three PCIe controllers (PECs), 48 PCIe lanes and six PCIe host bridges (PHBs) per processor module. In theory this looks like three x16 slots per CPU, but in practice PEC1 on each CPU is always bifurcated x8x8 and PEC2 is optionally trifurcatable x8x4x4. Plus, there are on-board resources also competing for those lanes, so some of the T2 slots are necessarily bifurcated and others aren't. The x8 slot on CPU0 is actually a bifurcated PEC1, with x4 allocated to the Microsemi PM8068 BTO option (whether present or not), and its phantom PEC2 is split between the BMC, the Broadcom Ethernet controllers and the TI USB 3.0 host controller. On CPU1, slot three is PEC2 x16, slot four is PEC0 x16 (never bifurcated on either CPU), and slot five is PEC1 x8, with x4 allocated to OCuLink.
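
The lspci listing above can be decoded back onto this layout. The sketch assumes skiboot numbers PHB4 PCI domains as chip ID times 6 plus PHB index, that the second socket reports chip ID 8, and that PEC0 owns PHB0, PEC1 owns PHB1-2 and PEC2 owns PHB3-5. Those assumptions are consistent with the listing (the second CPU's devices sit at 0030 and 0033; CPU0's on-board USB, Ethernet and BMC at 0003 through 0005), but treat them as a reading of this one machine rather than documentation. The slot table covers only the PHBs this particular T2 has populated.

```python
from typing import NamedTuple, Optional

PHBS_PER_CHIP = 6                         # assumed skiboot PHB4 numbering
SOCKET_FOR_CHIP = {0: "CPU0", 8: "CPU1"}  # assumed chip IDs on a dual-socket T2
PEC_FOR_PHB = {0: 0, 1: 1, 2: 1, 3: 2, 4: 2, 5: 2}  # PEC0 x16, PEC1 x8x8, PEC2 up to x8x4x4

# What this T2 has behind each populated PHB, per the lspci listing and slot notes.
T2_LAYOUT = {
    ("CPU0", 0): "slot 2 (x16)",
    ("CPU0", 1): "slot 1 (x8)",
    ("CPU0", 3): "on-board TI USB 3.0",
    ("CPU0", 4): "on-board Broadcom Ethernet",
    ("CPU0", 5): "on-board ASPEED BMC",
    ("CPU1", 0): "slot 4 (x16)",
    ("CPU1", 3): "slot 3 (x16)",
}


class Location(NamedTuple):
    socket: str
    pec: int
    phb: int
    where: Optional[str]  # None when the table doesn't know


def locate(address):
    """Map an lspci address such as '0033:01:00.0' to socket, PEC, PHB and slot."""
    domain = int(address.split(":", 1)[0], 16)
    chip_id, phb = divmod(domain, PHBS_PER_CHIP)
    socket = SOCKET_FOR_CHIP.get(chip_id, f"chip {chip_id}")
    return Location(socket, PEC_FOR_PHB[phb], phb, T2_LAYOUT.get((socket, phb)))


if __name__ == "__main__":
    for addr in ("0000:01:00.0", "0001:01:00.0", "0030:01:00.0", "0033:01:00.0"):
        print(addr, locate(addr))
```

Feeding it real lspci output is just a matter of taking the first field of each line.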

The old 960 EVO and the unresponsive new 980 were originally in slot 5 (CPU1, PEC1) and the boot 960 in slot 4 (CPU1, PEC0). Moving the new 980 into slot 1 (CPU0, PEC1) finally allowed it to coexist and be mountable with the boot 960, so next I put the JMicron SATA card into the newly available slot 5 and restarted ... and the x16 video card in slot two (CPU0, PEC0) failed to come up. Petitboot was fine on serial.

Figuring I had exceeded some maximum on CPU0 somehow, I moved the WX7100 to the open x16 on slot 3 (CPU1, PEC2). Not only did I still get a black screen, but I also got the dreaded PHB Freeze/Fence detected ! on that slot in the Hostboot output, which usually means the system planar is not happy with you now.

I returned to the immediately preceding configuration, returning the video card to CPU0 PEC0 in slot 2. Putting the 960 into slot 5 and the JMicron card into slot 4 also failed, so within those constraints the only thing I hadn't tried yet was putting the JMicron SATA card in slot 3 instead of slot 5. This seemed technically disgusting since I was wasting an entire x16 slot on a miserable little x1 card, and of course it worked.

The JMicron has a bright blue LED, so with the final configuration and the existing LEDs on the board I suppose I could slap a Honda hood ornament on the front and go street racing. This configuration leaves slot 5 unoccupied, which is an x8, so in the end it's no worse than what I started with (slot 1 was originally free).

I'm not sure what the moral of the story is here. It's possible the Marvell was misbehaving because of conflicts too, but it never failed to show up in lspci even though the drives connected to it sometimes did, whereas the new NVMe drive didn't even show up as a device, let alone a mountable filesystem. It also seems like the Radeon doesn't like being in a bifurcatable slot, though to test this I'd have to try it in slot 4. On the other hand, slot 5 should have been exactly the same as slot 1, yet neither the new 980 nor the SN850X would work in slot 5, nor would the JMicron card. Perhaps the SN850X would work fine in slot 1 as well. If I'm inside the case again to do something else, I should test some or all of these theories.

One thing that is worth remembering is that a PCIe device that initially fails to work or be recognized when installed may simply be picky about where you put it depending on what other devices are present. Unfortunately that means a whole lot more trial and error when you have multiple devices, and tonight I'm not interested in pushing my luck further. Once I've built the next Firefox (article to follow), then it's time to get back to work with a terabyte more space to expand into. And that'll hold a lot of source trees.

Linux 6.5


Linux 6.5 is released, including deprecation of the old SLAB allocator, faster PCIe waits (especially notable for us when things like SATA controllers start timing out), faster parallel direct I/O on ext4, and improvements to the workqueue. There's not a lot notable for Power ISA, though ELFv2 is now the default for 64-bit big-endian kernel builds, and if you're running Power10 this release adds support for the DEXCR SPR (Dynamic Execution Control Register) which helps to reduce speculative execution risk. Expect to see 6.5 in bleeding-edge distros like Fedora soon (and almost certainly in Fedora 39).

Tonight's game on OpenPOWER: Doom64EX and Doom64EX-Plus


We haven't done one of these in a while because it's been a bad, busy summer. I won't bore you with my personal life; suffice it to say there's obvious catharsis when you can unwind mowing down hordes of hell after a long day at the office. Rather than the same old Doom, though (which was the mistake they made with Nintendo 64 Quake), Midway Games made an actual sequel to Doom on the Nintendo 64 using an enhanced engine supporting more advanced level geometry and lighting effects. Monster sprites were higher resolution, sounds were updated and the music changed from Bobby Prince's synthetic metaloid to deeply unsettling ambient by Aubrey Hodges. Plus, all-new levels and a new weapon! And that was Doom 64.

Well, N64 decompilations and re-creations have really come into their own, and you can play Doom 64 on your desktop computer too with Doom64EX (done by the same guy who did Strife) or the updated Doom64EX-Plus which instead supports the Nightdive Studios 2020 remaster (Steam link provided for your convenience; I'm not affiliated and I don't get a cut). [UPDATE: Also on GoG.com.] Both releases have improved mouse and keyboard controls and support oodles of resolutions including widescreen.

However, unlike most of the re-creations we've talked about before, there's no getting around it: if you're not playing the remaster you'll need an N64 ROM. And that's all I'm going to say about that. If you have the N64 cartridge and a dump of it, play Doom64EX (it can't play the remaster); if you bought the remaster and have the data files, play EX-Plus (it can't play the original).

Anyhoo, if you want to build the original Doom64EX, it (at least with gcc on Fedora 38) has a glitch where you can't walk backwards. This took a little while to figure out but fortunately is easily fixed, and is already part of Doom64EX-Plus.

diff --git a/src/engine/doom_main/d_ticcmd.h b/src/engine/doom_main/d_ticcmd.h
index 2352bb2..1eef4bc 100644
--- a/src/engine/doom_main/d_ticcmd.h
+++ b/src/engine/doom_main/d_ticcmd.h
@@ -30,18 +30,18 @@
 #pragma interface
 #endif
 
 // The data sampled per tick (single player)
 // and transmitted to other peers (multiplayer).
 // Mainly movements/button commands per game tick,
 // plus a checksum for internal state consistency.
 typedef struct {
-    char    forwardmove;    // *2048 for move
-    char    sidemove;    // *2048 for move
+    signed char    forwardmove;    // *2048 for move
+    signed char    sidemove;    // *2048 for move
     short    angleturn;    // <<16 for angle delta
     short    pitch;
     byte    consistency;    // checks for net game
     byte    chatchar;
     byte    buttons;
     byte    buttons2;
 } ticcmd_t;
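Why does a one-word type change fix walking backwards? My best guess (not something the Doom64EX developers have told me) is char signedness: plain char is signed on x86, but the ABI used by ppc64le Linux makes it unsigned, so a negative "move backwards" value stored in a plain char reads back as a large positive number. This minimal sketch uses a hypothetical, cut-down struct (not the real ticcmd_t) to show the difference; it compiles anywhere, and plain_char_loses_sign() reports true only where char is unsigned.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical, cut-down stand-in for ticcmd_t: one field of each type. */
typedef struct {
    char        plain_move;  /* signedness depends on the ABI */
    signed char signed_move; /* always signed, as in the patch */
} move_demo_t;

/* Store a negative ("walk backwards") speed into both fields. */
static move_demo_t make_backwards_move(int speed)
{
    assert(speed >= SCHAR_MIN && speed < 0);
    move_demo_t m;
    m.plain_move = (char)speed;         /* becomes 256 + speed if char is unsigned */
    m.signed_move = (signed char)speed; /* stays negative on every ABI */
    return m;
}

/* Nonzero if reading the plain char back turned "backwards" into "forwards". */
static int plain_char_loses_sign(move_demo_t m)
{
    return m.plain_move > 0;
}
```

gcc's -fsigned-char flag would be another workaround, but spelling out signed char, as the patch does, fixes it in the source for every compiler and ABI.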
For classic Doom64EX, to compile you'll need CMake, SDL2, SDL2_net, zlib and libpng, which odds are you already have. I also recommend building against your system FluidSynth instead of the vendored FluidSynth-lite, so to build:

mkdir build
cd build
cmake -DENABLE_SYSTEM_FLUIDSYNTH=ON ..
make -j24 # or whatever suits your core count

Finally, generate the WAD data from the totally legally acquired N64 ROM you have with ./doom64ex -wadgen [path], which will digest the ROM and automatically start the game. (For future starts you can just run ./doom64ex directly and it will use the cached WAD.)

For the updated Doom64EX-Plus, cd src/engine && ./build.sh to build; you don't need CMake. Then put the remaster game data in the same directory or /usr/local/share/doom64ex-plus or /usr/share/games/doom64ex-plus, and start the game with ./DOOM64EX-Plus.

Either way, you have a feeling it wasn't meant to be touched.

Firefox 116 on POWER


Firefox 116 is out with user interface improvements (notably a sidebar switcher), faster HTTP/2 uploads, and some initial UI rework for changes to how recently closed tabs are handled. On the developer side, the Audio Output Devices API lets you redirect browser audio output to a permitted device without having to change it globally, plus directional attributes for certain HTML form elements for those of you using a right-to-left language system.

This release needs new patches. First, for the long-playing bug 1775202, either put --disable-webrtc in your .mozconfig if you don't need WebRTC, or tweak third_party/libwebrtc/moz.build with this updated patch. The browser otherwise builds and works with an updated PGO-LTO patch and the .mozconfigs from Firefox 105.

Firefox 115 on POWER


Firefox 115 is out, which is also the new Extended Support Release base. A nice feature on Linux is a middle click on the new tab button will open a new tab with whatever URL is on the clipboard (on the other hand, middle click on an existing tab closes it, so the interface is a wee bit muddled here); a more questionable feature is a Mozilla-controlled extension blocklist by domain. There are also additional updates to the Web platform.

The good news about building is that the fix for bug 1838584 landed on Fx115, so you won't need that patch to build like we did for Fx114. Apart from that, you'll just need to deal with bug 1775202 as usual, either by putting --disable-webrtc in your .mozconfig if you don't need WebRTC, or patching third_party/libwebrtc/moz.build with this patch. The browser otherwise builds and works with the PGO-LTO patch for Firefox 110 and the .mozconfigs from Firefox 105.

Linux 6.4


While we wait to see what Red Hat's new source code policy does to downstream RHEL rebuilds like Rocky Linux and AlmaLinux, Linux 6.4 came out this week. Aside from new hardware support, filesystem improvements (such as a small performance win for ext4), more Rust code and the removal of the old SLOB memory allocator, there's not a lot here on the Power ISA side this time around except for the removal of various older evaluation boards. Expect to see it in Fedora and other leading-edge distributions shortly.

Firefox 114 on POWER


Firefox 114 is released. The biggest update in my humble opinion is that (assuming you're running Linux) you can now use FIDO2/WebAuthn authenticators over USB, and virtually all of them should "just work" with OpenPOWER hardware. I'm going to try this out and report back but cursorily looking at the source code I don't see any reason why it would be incompatible. If you bought your POWER9 for security purposes, or even if you just like being secure-adjacent, here's another advance to take advantage of. A more nebulous new feature in 114 is support for WebTransport, which adds low-latency datagram-grade server communication and should facilitate more interactive applications but will probably just be another way for sites to spy on you. Oh well! This initial cut requires HTTP/3 but HTTP/2 support is coming.

Fx114 will not build out of the box. First, as usual you'll need to deal with bug 1775202, either by putting --disable-webrtc in your .mozconfig if you don't need WebRTC, or patching third_party/libwebrtc/moz.build. Unfortunately the patches in that bug no longer serve, so here's the diff on mine, which builds the browser I'm typing in now.

Second, there's a regression in the profiler which breaks linking, filed as bug 1838584 and also present in Firefox 115 currently, apparently a regression from bug 1824465. The provided patch in the bug fixes the issue but unless this lands on beta you'll need this patch again for Fx115.

The browser otherwise builds and works with the PGO-LTO patch for Firefox 110 and the .mozconfigs from Firefox 105.

Also, I might take the opportunity to make a performance reminder: if you have a lot of threads available (on this dual-8 Talos II, I have 64 because SMT-4), you can modestly increase dom.ipc.processCount and get better throughput. I use 12. Increase this pref with care, because you'll need memory to match (I have 64GB), and other kinds of content processes may not count against that cap: with that setting, ps auxww | grep firefox | grep -c contentproc currently gives me a count of 31. Be sure to uncheck "Use recommended performance settings" in Firefox's preferences before changing this count in about:config. about:processes can give you a better idea of what content processes exist and what they're doing.

SDL2 VMX bugfix


Another hat tip to Jeremy Rand, who got a fix landed for VMX endianness issues with blitting inside the SDL library. We all care about SDL, since many games (and quite a few other things) rely on it, and the bug was a real performance problem on ppc64le because at least some distros dealt with it by disabling the VMX code on little-endian (big-endian Power is unaffected). The patch is also faster because it moves the vector permute out of the loop, which is also the correct fix for the bug. The issue goes back all the way to the original SDL; the patch has landed on the legacy SDL 1.2.x as well as the next release of SDL 2.x, expected to be 2.28.
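To illustrate what "moving the permute out of the loop" means, here's a portable scalar analogue. It is not SDL's actual VMX code: the helper name and the ARGB-to-RGBA conversion are assumptions for illustration. The point is that the byte-shuffle control depends only on host endianness, so it can be chosen once before the loop instead of per pixel; real VMX code would keep the control in a vector register and apply it with vec_perm from altivec.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 1 on a little-endian host, 0 on big-endian. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Convert 0xAARRGGBB pixels (host-endian uint32_t) to bytes in R,G,B,A order.
   Hypothetical helper for illustration; not an SDL function. */
static void argb_to_rgba_bytes(const uint32_t *src, unsigned char *dst, size_t npixels)
{
    assert(src != NULL && dst != NULL);

    /* Where R, G, B and A sit in memory depends on endianness:
       little-endian stores 0xAARRGGBB as B,G,R,A; big-endian as A,R,G,B.
       Pick the shuffle control once, outside the loop. */
    static const unsigned char perm_le[4] = { 2, 1, 0, 3 };
    static const unsigned char perm_be[4] = { 1, 2, 3, 0 };
    const unsigned char *perm = host_is_little_endian() ? perm_le : perm_be;

    for (size_t i = 0; i < npixels; i++) {
        unsigned char in[4];
        memcpy(in, &src[i], sizeof in); /* pixel bytes in host memory order */
        for (int k = 0; k < 4; k++)
            dst[4 * i + k] = in[perm[k]];
    }
}
```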

Jeremy added, "My patch was reviewed promptly, and it's obvious that SDL actually cares about properly supporting diverse arches, even if they don't personally use them." Amen to that.

Maybe XFS truly is cursed


Bugs, bugs everywhere: for those of you (like me!) running XFS as your root in Fedora because you never got around to changing it, if Petitboot doesn't get you then the kernel will. Linux 6.3 apparently has one line of code missing in the XFS component which can corrupt metadata (thank goodness I'm on 6.2). The updated kernel is on its way to the Fedora testing repositories and should turn up as a 6.3 point release. I guess today is a good day to die, er, go ext4 or something.

Rocky Linux 9.2 is rocky


Although Rocky Linux 9.2 emerged on Tuesday, ppc64le wasn't among its architectures: that release was held back. This seems to be due to a Power-specific bug in the provided build of Python 3.9, which also affects RHEL 9.2 (but not, near as I can tell, Fedora 38, which ships with Python 3.11 and runs fine on my systems). There are also no build artifacts available and currently no ETA for a fix. Because Rocky Linux's mirrorlist can't hold back just one architecture, you'll need to add --releasever 9.1 (or change /etc/dnf/vars/releasever) to ensure dnf update doesn't get polluted with later metadata until the revised architecture spin is available.

Firefox 113 on POWER


Yes, I skipped a version, sosumi. I'm running a little low on development space on the NVMe drive, but still managed to squeeze in Firefox 113 which introduces enhanced video Picture-in-Picture, more secure private windows and password generation, support for AVIS images, debugger improvements and additional CSS and API features. As usual you'll need to deal with bug 1775202 either with this patch — but without the line containing desktop_capture/desktop_capture_gn, since that's long gone — or put --disable-webrtc in your .mozconfig if you don't need WebRTC. The browser otherwise builds and works with the PGO-LTO patch for Firefox 110 and the .mozconfigs from Firefox 105.

Fedora 38 mini-review on the Blackbird and Talos II


This article would have come out sooner except I also wanted to test building Firefox in Fedora 38, and then when I tried to run libnxz/power-gzip to test out the POWER9 nest accelerator make check made my daily driver Talos II machine check and caused Hostboot to guard out the entire CPU with the NVMe drives attached (and fixing that caused Petitboot to barf on a stuck XFS log entry again, requiring a trip to the Blackbird to mount and repair it). But here we are.

As I always say in these mini-reviews, Fedora was one of the first mainstream distributions to support POWER9 out of the box, it's still one of the top distributions OpenPOWER denizens use and its position closest to the bleeding, ragged edge is where we see problems emerge first and get fixed (hopefully) before they move further downstream. That's why it's worth caring about it even if you yourself don't run it.

Also, as usual, recall both my T2 and Blackbird are configured to come up in a text boot instead of gdm and I start KDE manually from there. I still test GNOME on both systems, but I've pretty much entirely migrated over to KDE Plasma, and you should never have considered my GNOME testing to be exhaustive anyway. I strongly recommend a non-graphical boot as a recovery mechanism in case your graphics card gets whacked by something or other. On Fedora this is easily done by ensuring the symlink /etc/systemd/system/default.target points to /lib/systemd/system/multi-user.target.

Because of issues with dnf kernel updates still sometimes not updating the grub config (basically bug 1921479, showing messages like 0ed84c0-p94177c1: integer expression expected during the process), I've added a little extra paranoia to the usual install dance. To wit:

dnf upgrade --refresh # upgrade prior system and DNF
grub2-mkconfig -o /boot/grub2/grub.cfg # force grub to update
dnf install dnf-plugin-system-upgrade # install upgrade plugin if not already done
dnf system-upgrade download --refresh --releasever=38 # download F38 packages
dnf system-upgrade reboot # reboot into upgrader

This went fairly smoothly on both systems. Other than a copr package with a stale prerequisite I had to remove, there were no issues or conflicts with the 38 packages. As long as you manually select the new kernel in Petitboot before the system starts, you'll get some sort of installation screen. On the Blackbird's HDMI output from the ASPEED BMC framebuffer, the same friendly GUI installer will appear as in prior releases:

But even without using BMC video, like on the T2 with the Raptor-BTO WX7100 workstation card, as before you'll still get to see the install log live as text (which by now I've found more useful anyway). If you forget to manually select the kernel and the system comes up to an apparently black screen, you can monitor the serial port (directly, or from a connected system viewing the serial console over the BMC's web server), or log into another VTY as root (CTRL-ALT-F2 or as appropriate) and periodically issue dnf system-upgrade log --number=-1 to watch log updates.

The update did not cause a stuck XFS log entry this time on either the Blackbird or the T2, but after the reboot I did need to do one more grub2-mkconfig -o /boot/grub2/grub.cfg and a restart to ensure the right kernel and version were being used. As of this writing the kernel version is 6.2.14.

Our first stop on the BMC-only Blackbird is GNOME on Wayland, started (awkwardly) with XDG_SESSION_TYPE=wayland /usr/libexec/gnome-session-binary --builtin. This configuration hasn't visibly improved any from Fedora 37; there are still prominent artifacts moving windows around and display through the HDMI adapter is still limited to 1024x768.

Performance wasn't hideous but the artifacts were distracting. I couldn't get a screenshot of it in Spectacle so I just grabbed a picture on my Pixel 7 Pro. However, the story isn't a whole lot better in GNOME on X11:
While we now have a full 1920x1080, you can see that the title bar still isn't being painted correctly. This occurred with most of the applications I tried. I consider this a critical fault due to the smearing, so I can't really recommend GNOME at all under any window system if you're using baseline BMC graphics. And KDE?
Well, it works fine. I use KDE on the T2, so now I'm using it on the Blackbird as well. If you really prefer a Gtk default, Xfce should also serve you well.

On the T2 with its AMD GPU, however, I dumped GNOME because of libadwaita encroaching on my customizations; even my shell theme has stopped working now. But the basics are fine: there are no more obvious problems with CTM, and performance seems similar to 37 with no obvious issues in Wayland or X11. On KDE, my customizations persisted without having to rework any of them, which is why I've converted fully over to KDE.

Overall, the F38 update was smooth and it runs pretty much like F37. If you had no problems with F37, you'll probably have no problems with this; you just won't see much improvement in some of the longstanding annoyances either.

Fedora 38


Fedora 38 is out, a week early for a change. Fedora matters to us here at Orbiting Floodgap HQ because it's what we run on our Talos II and Blackbird systems, and it should matter to you because, being a bleeding-edge distro, changes occur there first that trickle down to other distributions. That's why we make efforts to do mini-reviews of each release. With F38's release, F36 will reach end of life in one month.

The changeset for 38 is typically extensive. Possibly the most controversial was the change to globally build with -fno-omit-frame-pointer to facilitate better profiling and debugging, particularly where debugging information is not available, but at a cost as this also takes a register out of circulation to hold the frame pointer. The performance impact seems to be limited on x86_64 but I doubt much testing was done on ppc64le, and it should be noted that PowerPC is one of the gcc targets where leaf functions wouldn't use a frame pointer anyway. Time will tell if this pays off. Builds are also now made with _FORTIFY_SOURCE=3 (up from 2) for better security, and another interesting though probably irrelevant change for most is reducing the shutdown timer in systemd to 45 seconds from 2 minutes.

On the back-end F38 ships with kernel 6.2.x and gcc 13, LLVM 16, gmake 4.4, binutils 2.39, glibc 2.37 and gdb 12.1. F38 also has a major upgrade to microdnf as dnf5, the "future of package management" that may ultimately replace dnf entirely. On the front-end F38 updates GNOME to version 44, finally with grid thumbnail view in the file picker, a big overhaul to the Settings app and many new applications, as well as more apps moving to the unthemable libadwaita (but I run KDE Plasma now, and haven't looked back). Xfce also updates to 4.18, there's a new spin for the Sway window manager, and the SDDM display manager now also defaults to Wayland (we use a text boot to log in and start X11 manually, avoiding any display manager completely).

This is the first release to include the change that blocks clients with different endianness from connecting to the X server, including XWayland, which means that the compositor has to support the configurable option too (GNOME 44 Mutter does, others may not). At least you still have the option!
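For context, the X11 protocol has every client open its connection by declaring its byte order: 'B' (0x42) for most-significant byte first or 'l' (0x6C) for least-significant byte first, and the server byte-swaps requests from clients of the other persuasion. This sketch models only that first check; the allow_swapped flag is a hypothetical stand-in for the configurable option, not the actual Xorg or Xwayland setting name.

```c
#include <stdint.h>
#include <string.h>

/* Byte-order byte at the start of the X11 connection setup. */
#define X11_MSB_FIRST 0x42 /* 'B' */
#define X11_LSB_FIRST 0x6C /* 'l' */

/* The byte-order byte a native client on this host would send. */
static unsigned char native_x11_byte_order(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1 ? X11_LSB_FIRST : X11_MSB_FIRST;
}

/* Hypothetical policy check: does a server running on this host accept a
   client whose setup began with byte_order_byte? */
static int server_accepts(unsigned char byte_order_byte, int allow_swapped)
{
    if (byte_order_byte != X11_MSB_FIRST && byte_order_byte != X11_LSB_FIRST)
        return 0;               /* malformed setup */
    if (byte_order_byte == native_x11_byte_order())
        return 1;               /* same endianness, no swapping needed */
    return allow_swapped;       /* other endianness: up to the option */
}
```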

We'll give the mirrors a week or two to catch up on builds and then start the transition on our own machines, with the usual mini-review to follow. Stay tuned.

FreeBSD 13.2


And hot on the heels of the latest OpenBSD release is the latest FreeBSD iteration, 13.2-RELEASE. FreeBSD has a longer track record on OpenPOWER and in my cursory estimates is the most commonly installed BSD on modern Power ISA. One big jump is that the bhyve hypervisor now supports more than 16 virtual CPUs and by default can create the same number of vCPUs as physical CPUs, which is quite useful to us once you get away from the smallest single-4 machines given all our cores are SMT-4. Additionally, for those of you running FreeBSD in a VM (such as an LPAR or under KVM), nested POWER9 radix MMU mappings are now supported on the pseries flavour, substantially reducing hypercall overhead. The Linux compatibility ABI has also been expanded and on the security side ASLR is now enabled for all 64-bit executables by default, configurable through proccontrol. Downloads are available for big-endian and little-endian. Note that the release notes indicate that on all PowerPC and Power ISA platforms, for now you must manually run kldxref /boot/kernel after successfully installing an upgraded kernel and world.

OpenBSD 7.3


OpenBSD 7.3 is released. While most of the improvements are not specific to Power ISA, there's a lot we benefit from, including many kernel calls which are now "lock-free" (improving SMP performance) like mmap(2) and select(2), more device support, immutable permissions on address ranges to prevent permissions from being changed in the future — much of a running program's static address space like stack, code and most libraries is now automatically immutable — and support for execute-only memory on both Power ISA and the PowerPC 970 ("G5"). LibreSSL is updated to 3.7.2, OpenSSH is updated to 9.3, and the OS ships with LLVM/clang 13.0.0 and Perl 5.36.0. Download and install when ready, Puffy.

Firefox 111 on POWER


This got a bit delayed due to $DAYJOB interfering with my important hacking and writing time (darn having to make a living), but Firefox 111 is out. As usual you'll need to deal with bug 1775202 either with this patch — but without the line containing desktop_capture/desktop_capture_gn, since that's been gone since the latest WebRTC update — or put --disable-webrtc in your .mozconfig if you don't need WebRTC. The workaround adding #pragma GCC diagnostic ignored "-Wnonnull" to js/src/irregexp/imported/regexp-parser.cc for optimized builds fortunately was addressed by bug 1810584, so you no longer need it, and the browser otherwise builds and works with the PGO-LTO patch for Firefox 110 and the .mozconfigs from Firefox 105.

Now your LLaMa is playing with POWER


Now that the invasion of the large language models has occurred and we will all bow to our GPT overlords, I just submitted a pull request to add additional POWER9-specific optimizations to llama.cpp, which is what all the cool kids who aren't down with OpenAI are using for LLMs. This repo moves quick, but it's where the magic is happening if this is what you're into. It will work with both Alpaca and LLaMa models.

In a previous article we talked about autovectorization using conversion of Intel vector intrinsics to POWER9, but this is good old-fashioned assembly code and hand-written C. The part that really helped was changing their pure-C "F16" (half-precision) float conversion code to use VSX instead. The rolls-off-your-tongue POWER9-and-up xscvhpdp and xscvdphp instructions convert half-precision floats to and from double-precision respectively (xscvdphp will also work on single-precision, which is handy, because the explicit conversion is from single-precision "F32"), and we also use POWER8 mffprd and mtfprd for toll-free copies between general and float registers without requiring a spill to memory. That change alone is about 12 percent faster than the old pure-C compute and lookup code. We also have our own vectorized version of quantize_row_q4_0, like the existing ARM NEON and AVX-256 versions, written with VMX/VSX intrinsics. It's even a little better, because we were able to use our VMX floating-point multiply-add and remove a couple minor inefficiencies in the code. Furthermore, people used to G4 and G5-era AltiVec will enjoy the fact that the newer intrinsics substantially map directly to ARM's, making conversion much more straightforward when you need to write your own code. I especially liked vec_extract as an all-purpose replacement for all of the NEON vget_lane_* variations, as well as vec_signed for vcvtq_s32_f32 for converting floats in place, and the all-purpose simplified vec_splats for making a splat vector out of anything.
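For reference, here's roughly what a pure-C half-to-float conversion has to do in software, work that xscvhpdp does in a single instruction on POWER9. This is a generic IEEE 754 binary16-to-binary32 sketch, not llama.cpp's actual compute-and-lookup code; the function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Convert an IEEE 754 binary16 value (raw bits) to a binary32 float. */
static float fp16_to_fp32(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0x1F) {
        /* Inf or NaN: maximum exponent, keep the payload */
        bits = sign | 0x7F800000u | (mant << 13);
    } else if (exp != 0) {
        /* Normal number: rebias exponent from 15 to 127 */
        bits = sign | ((exp + 112u) << 23) | (mant << 13);
    } else if (mant == 0) {
        /* Signed zero */
        bits = sign;
    } else {
        /* Half subnormal: shift until the implicit bit appears, which
           yields a normal binary32 value */
        exp = 113; /* 127 - 15 + 1 */
        while ((mant & 0x400u) == 0) {
            mant <<= 1;
            exp--;
        }
        mant &= 0x3FFu;
        bits = sign | (exp << 23) | (mant << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f); /* reinterpret bits without aliasing UB */
    return f;
}
```

Single precision is enough for the result, since every binary16 value is exactly representable as a binary32 float.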

I did play with alpaca.cpp, the other older white meat, and the changes here should more or less apply to that codebase as well. However, given how quickly llama.cpp evolves and the greater development interest, llama.cpp seems the best way forward for continued evolution.

I will say in the spirit of full disclosure that despite these improvements my 16GB 4P/4E/8G M1 MacBook Air still pops out tokens several times faster than this 64GB dual-8 Talos II, even full-tilt with all 64 threads in use (the cat still looks startled every time the fans rev). On the other hand, we're also comparing a 2017 CPU with one from 2020, and one with specific hardware acceleration for neural networks that llama.cpp takes particular advantage of. Even with Power10's improved bfloat16 support and matrix math operations, specific work would be needed to support those features which won't be coming from me (stay tuned for Power11, I guess). There are other opportunities for vectorization to be done, though at the rate this code base evolves it would be better waiting for one of the mainstream architectures to pick up a SIMD version we can convert first. In the meantime, while you should be advised that going beyond the 7B or 13B models will require patience regardless of how much RAM you have, I think this is definitely better than what we started with.

Firefox 110 on POWER


Firefox 110 is out, with graphics performance improvements like GPU-accelerated 2D canvas and faster WebGL, and the usual under the hood updates. The record's still broken and bug 1775202 still is too, so you'll either need this patch — but this time without the line containing desktop_capture/desktop_capture_gn, since that's gone in the latest WebRTC update — or put --disable-webrtc in your .mozconfig if you don't need WebRTC at all. I also had to put #pragma GCC diagnostic ignored "-Wnonnull" into js/src/irregexp/imported/regexp-parser.cc for optimized builds to complete on this Fedora 37 system and I suspect this is a gcc bug; you may not need it if you're not using gcc 12.2.1 or build with clang. Finally, I trimmed yet another patch from the PGO-LTO diff, so use the new one for Firefox 110 and the .mozconfigs from Firefox 105.

Vikings now has Blackbirds


If you're on the other side of that great pond called the Atlantic, Vikings' OpenPOWER store now lists Blackbirds starting at €3695 + VAT. Not just the board, the package includes a "4-core DD2.3 (v2) CPU, 2U heatsink, 16GB ECC RAM, bequiet! TFX power supply, all packaged nicely in a Antec slim desktop case." That's already a nice quiet basic system and more than enough to get you started with OpenPOWER, but if you want something almost silent, consider pairing it with their so far exclusive water block assembly for POWER9 for €155 + VAT, though you'll need to BYO pump, tubing, reservoir and fluid.

Linux 6.2


Linux 6.2 is out. Among its marquee updates are improved Rust-in-kernel support (strings, formatting and printing, memory allocation, macros, etc.), adding TCP Protective Load Balancing (PLB) for IPv6, reducing the overhead of read-copy update (RCU) operations using lazy callbacks, performance and RAID improvements for Btrfs, and userspace support for runtime verification with safety-critical systems. And, of course, support for Apple silicon and Retbleed sucks less on Skylake, but who cares about that around here anyway?

On the Power ISA side, probably the most noteworthy change is official support for big-endian ELFv2 kernels. A nice upgrade for our Sir Mix-A-Lot brigade! Another interesting commit is the one to allow compile-time support for the lharx and lbarx instructions (present on ISA v2.06/POWER7 and up). The lwarx (32-bit word) and ldarx (64-bit doubleword) load instructions, along with the corresponding store instructions stwcx. and stdcx. (and a conditional branch), are used to implement atomic load-store-compare/exchange operations by placing and checking reservations on particular memory locations. The newer instructions can do this at halfword (short) and byte level respectively (with sthcx. and stbcx.) instead of reserving at least an entire 32-bit word, reducing contention in tightly packed structs. In the future, they might also benefit the Power ISA-specific spinlock implementation that is new in this release.
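In C you rarely write lbarx by hand. A byte-sized compare-and-swap loop like this minimal sketch (the struct and function are hypothetical) is the pattern those reservation instructions implement: load with a reservation, compute, store conditionally, and retry if someone else got there first. Whether your compiler emits lbarx/stbcx. or a masked lwarx/stwcx. on the containing word depends on the compiler version and -mcpu target, so check the disassembly rather than taking my word for it.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical tightly packed struct: two independent byte-sized flags
   that share one 32-bit word. */
struct packed_flags {
    _Atomic uint8_t busy;
    _Atomic uint8_t dirty;
};

/* Atomically OR bits into one byte with a compare-and-swap retry loop,
   returning the previous value. */
static uint8_t atomic_or_u8(_Atomic uint8_t *p, uint8_t bits)
{
    uint8_t old = atomic_load_explicit(p, memory_order_relaxed);
    while (!atomic_compare_exchange_weak_explicit(
               p, &old, (uint8_t)(old | bits),
               memory_order_acq_rel, memory_order_relaxed)) {
        /* On failure, old now holds the current value; just retry. */
    }
    return old;
}
```

atomic_fetch_or_explicit() would do the same job in one call; the explicit loop just mirrors the larx/stcx. structure described above.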

Expect 6.2 to make it to bleeding edge users and Fedora in the very near future.

Tonight's game on OpenPOWER: Shadow Warrior


Well, it's been a while since we expanded our games library, so let's go back to our regular fast food diet of FPSes and select one from the Build side of the house this time: Shadow Warrior. Build games have a reputation starting with Duke Nukem 3D (a game for another day) and that reputation is well-deserved, so let's get this out of the way: if you found these games iffy in the 1990s, rest assured they've aged badly, because you'll find the content level positively radioactive now between the adult humor, graphic violence and (this game in particular) incredibly inappropriate cultural stereotypes. If that bothers you, stop reading this article now and look at some of our other game builds.

On the other hand, Shadow Warrior was probably the most technically superior of the Build games (with the possible exception of Monolith's Blood): more sophisticated sector effects, coloured lighting, true transparency (including water, though used sparingly to avoid spoilers and performance issues), fog and clouds, larger levels, room-over-room effects and the part I liked the most (and was curiously missing from the classic Mac OS port by MacPlay-Westlake Interactive), voxel-based objects that were truly 3D. All of these features plus OpenGL have made it to JonoF's Shadow Warrior Port (JFSW), using Ken Silverman's Build and Polymost engines (more info).

JFSW builds pretty much out of the box with SDL 2; just type make (or make -j24 or such to exercise your other cores), then copy the .GRP group file from either the 3DRealms shareware install or a registered or retail version to ~/.jfsw (I used my MacPlay CD and named it swmac.grp). Shadow Warrior used redbook audio for the retail version, so for music, rip the tracks and save them as track02.ogg (intro) to track14.ogg ("Lo Wang Raps") in the same directory. Then go to where you've built JFSW and start the game with ./sw, and a configuration window will appear to select your resolution. Note that while widescreen resolutions are supported (and look good), the game still uses 4:3 assets, so things like Lo Wang's sword will be cut off.

A note on resolutions and colour depth: 8bpp modes are rendered 100% in software, which is very fast even on Blackbirds with just BMC graphics, and works beautifully on virtually any system. If you select a 24bpp mode, the game will try to use OpenGL. On my system this caused a freeze (actually an infinite loop, once I stepped through it in a debugger) whenever it attempts to render reflections in a mirror. This appears to be related to non-POT texture support which virtually every card anybody would be running probably supports properly. If you get the same freeze, kill the game and edit jfbuild/src/polymost.c. On line 4903 or thereabouts you'll see if ((method & METH_POW2XSPLIT) && (tsizx != xx)) which if you change to if (0) will get around the code that glitches. I can't tell if this is specific to my card, to OpenPOWER or to gcc, and it doesn't happen in software mode, which plays 100% fine all the way to the end including nuking Zilla himself.
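For anyone unfamiliar with the term, "POT" means power-of-two. Renderers that can't rely on non-power-of-two textures typically pad each image up to the next power of two and then sample only the part that holds real pixels. These helpers are a generic sketch of that arithmetic, not code from polymost.c; my guess that METH_POW2XSPLIT concerns this kind of padding comes only from its name.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if x is a power of two. */
static int is_pow2(uint32_t x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Smallest power of two >= x, valid for 1 <= x <= 2^31. */
static uint32_t next_pow2(uint32_t x)
{
    assert(x >= 1 && x <= (UINT32_C(1) << 31));
    x--;          /* so exact powers of two map to themselves */
    x |= x >> 1;  /* smear the highest set bit into all lower bits */
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    return x + 1;
}

/* Fraction of a padded texture's width that holds real pixels,
   i.e. the largest u texture coordinate worth sampling. */
static float pot_u_extent(uint32_t width)
{
    return (float)width / (float)next_pow2(width);
}
```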

Don't mess with Lo Wang.

Firefox 109 on POWER


Firefox 109 is out with new support for Manifest V3 extensions, but without the passive-aggressive deceitful crap Google was pushing (yet another reason not to use Chrome). There are also modest HTML, CSS and JS improvements.

As before, linking still requires patching for bug 1775202 using this updated small change, or the browser won't link on 64-bit Power ISA (alternatively, put ac_add_options --disable-webrtc in your .mozconfig if you don't need WebRTC). Otherwise, the browser builds and runs fine with the LTO-PGO patch for Firefox 108 and the .mozconfigs from Firefox 105.

In case you thought AIX had a future


In case you thought AIX, IBM's legacy proprietary Unix, had a future, IBM apparently doesn't. The Register reported Friday that IBM has moved the entire AIX development group to IBM India, apparently to its Bangalore office, and placed 80 US-based developers into "redeployment." That's a fairly craven way of replacing layoffs with musical chairs: the displaced developers must either find a new position within the company (possibly relocating as well) within some unspecified period, or retire. About a third of IBM's global staff is on the Indian subcontinent. IBM didn't publicly announce this move, and while it's undoubtedly good news for IBM India, it seems bad news for AIX's prospects: IBM tends to spend money on the technologies it thinks are up and coming, so an obvious cost-cutting move suggests IBM doesn't think AIX is one of them.

We've got a long history with AIX here at Floodgap Orbiting HQ, going back to when I first worked with AIX 3.2.5 and 4.1 in my university employment and consulting days, and I've run AIX as my primary personal server since 1998, first on an Apple Network Server 500 and now on an 8203-E4A POWER6 p520. AIX 3 and 4 were surprisingly compelling workstation and server OSes for their time, but AIX 5L was where it started to feel "legacy" and unloved, and IBM has always been tightfisted about APARs and other kinds of updates if you don't buy a support contract. Combine that with nonsense like Capacity on Demand, where my second CPU was locked out after a system planar update until IBM coughed up a new set of keys, and I've already concluded this will be my last AIX server. While the next one will almost certainly be OpenPOWER, I'll probably run FreeBSD on it instead.

And, well, IBM would rather you ran Linux on Power hardware anyway, and so would its subsidiary Red Hat. If you're still an AIX institutional customer paying the bills, you'll still get support (just as you would with IBM i, the other white meat), but newly migrating to AIX is increasingly more trouble than it's worth paying for. Apparently IBM thinks so too.

Your X server may no longer swing both ways by default


As a long-time PowerPC and Power ISA bigot, I have a lot of Power-based hardware in this house: primarily Apple, but some IBM, and of course several Raptor systems. While many CPUs are capable of running big-endian or little-endian, Power ISA is probably the last architecture where there is still notable interest in running in both modes: AIX, IBM i (a/k/a i5, AS/400), AmigaOS and OpenBSD run it big, FreeBSD primarily runs it big (but work exists to run it little), and most Linux distros run it little. Compare that with the ostensibly bi-endian ARM and MIPS, which virtually all run little, and SPARC, which virtually all runs big (versus s390x, which only runs big, and of course x86 and x86_64, which only run little). Little-endian is gradually displacing big-endian even in the Power world (sorry), but big-endian still matters.
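Which way a given machine is running is easy to check from C: store a known multi-byte value and look at the byte at the lowest address. A minimal sketch (the function name is hypothetical); it should report big-endian on an AIX box and little-endian on a Talos II running a ppc64le distro like Fedora, even though both are Power ISA.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if this host stores multi-byte integers most significant
   byte first (big-endian), 0 if least significant byte first. */
static int host_is_big_endian(void)
{
    const uint32_t probe = 0x01020304u;
    unsigned char lowest;

    memcpy(&lowest, &probe, 1);   /* copy out the byte at the lowest address */
    assert(lowest == 0x01 || lowest == 0x04);   /* no mixed-endian layouts */
    return lowest == 0x01;
}
```

memcpy is used here instead of a pointer cast purely for readability; inspecting an object through unsigned char is legal either way.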

When byte-order mismatches were more commonplace, such as between mainframes and desktop X terminals or PCs, the X protocol provided a feature where a connecting X client advertises its endianness and, if it doesn't match the server's, the server byteswaps for it. (Note that current Xorg may not allow remote connections unless you pass -listen tcp, either from gdm/your display manager of choice or on the command line. On my Fedora 37 system, I run startx -- -listen tcp to enable incoming connections on my secured wired network. Don't forget whatever you need to do with xhost or other authorization; SSH forwarding is, of course, an alternative.) This is what makes it possible to run X clients from my strictly big-endian AIX POWER6 on my Fedora 37 Talos II, which Fedora runs little-endian. Here's the old beast now from the "WalMart server rack" next door.

And here's proof of connection in my usual KDE Plasma desktop (running aixterm and xlogo), showing that even the most current Xorg still supports it.
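Under the hood, the mechanism is simple. Per the X11 protocol specification, the first byte a client sends in its connection setup is 'B' (0x42) if it sends multi-byte values most significant byte first, or 'l' (0x6C) if least significant byte first; all 16- and 32-bit quantities in both directions then use that order, so when it doesn't match the server's own, the server has to swap them for that client. The C sketch below models only the per-field decision; the names are hypothetical and this is not Xorg's code, which swaps whole requests field by field.

```c
#include <assert.h>
#include <stdint.h>

/* X11 connection-setup byte-order byte: 'B' = MSB first, 'l' = LSB first. */
enum { X_MSB_FIRST = 'B', X_LSB_FIRST = 'l' };

/* The byte-order byte this host would send as an X client. */
static char native_order_byte(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1 ? X_LSB_FIRST : X_MSB_FIRST;
}

static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Interpret a CARD16/CARD32 field received from a client whose setup
   byte was client_order, swapping only when the orders differ. */
static uint16_t client_card16(uint16_t raw, char client_order)
{
    assert(client_order == X_MSB_FIRST || client_order == X_LSB_FIRST);
    return client_order == native_order_byte() ? raw : swap16(raw);
}

static uint32_t client_card32(uint32_t raw, char client_order)
{
    assert(client_order == X_MSB_FIRST || client_order == X_LSB_FIRST);
    return client_order == native_order_byte() ? raw : swap32(raw);
}
```

An AIX client announces 'B', a ppc64le Xorg is natively 'l', so in this model every field from that client takes the swap path.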
A new change to Xorg will now prohibit automatic byteswapping in the X server by default: a client that advertises a different endianness from the server's will be kicked off with an error. If you want this support, you'll need to either pass +byteswappedclients to the X server on the command line, or put "AllowByteSwappedClients" "on" in the Options stanza of your xorg.conf. This is also a change request for Fedora 38, which as of this writing is still proposed and not yet accepted.
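In effect, the change adds one policy decision at connection time. Here is a hypothetical model of it in C (not Xorg's code; the enum, function and parameter names are made up for illustration): a client whose byte order matches the server's is accepted as before, and a mismatched one is accepted only if the administrator opted in via +byteswappedclients or the AllowByteSwappedClients option.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { ACCEPT_NATIVE, ACCEPT_SWAPPED, REJECT } conn_verdict;

/* client_order/server_order are X11 byte-order bytes ('B' or 'l');
   allow_swapped models +byteswappedclients / AllowByteSwappedClients. */
static conn_verdict check_client_order(char client_order, char server_order,
                                       bool allow_swapped)
{
    assert(client_order == 'B' || client_order == 'l');
    assert(server_order == 'B' || server_order == 'l');

    if (client_order == server_order)
        return ACCEPT_NATIVE;              /* no swapping needed */
    return allow_swapped ? ACCEPT_SWAPPED  /* old behaviour, now opt-in */
                         : REJECT;         /* new default: disconnect */
}
```

The AIX-to-Talos case is the ('B', 'l', false) call: it used to connect, and under the new default it gets REJECT unless the option is set.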

This means that not only will this arrangement of a big-endian client on a little-endian server, which I use infrequently but not rarely, stop working without changes, but it will also fail for anyone running a bleeding-edge version of Xorg on a big-endian host (say, Linux on your Power Mac G5) who wants to display clients like a more current web browser from a little-endian machine. The latter case is certainly less common than the former (it's mostly retrocomputing, whereas there are mainframe apps that people will want a local interface for), but I think there's more of both out there than folks suspect. Chesterton's fence and all that.

I will say that I appreciate this being turned into an option rather than removed outright, keeping in mind that this is usually a prelude to outright removal later. After all, the code seems to have no test coverage in a codebase poorly covered by testing generally, and it has caused documented security problems in the past. To the extent this is a better compromise than talking to the hand, I support it. However, it also makes Wayland even less attractive than it already is, because the ability to pass an option to Xwayland is compositor-specific (see this bug for, among others, GNOME Mutter), meaning you're at the mercy of whatever you're running and may not be able to change it easily yourself. Well, we're Xorg unto death around here anyway.