Showing posts from 2023

Rocky Linux 9.2 is rocky


Although Rocky Linux 9.2 emerged on Tuesday, ppc64le wasn't one of the architectures included: that release was held back. This seems to be due to a Power-specific bug in the provided build of Python 3.9, which also affects RHEL 9.2 (but not, as near as I can tell, Fedora 38, which ships with Python 3.11 and runs fine on my systems). No build artifacts are available either, and there is currently no ETA for a fix. Because Rocky Linux's mirrorlist can't hold back just one architecture, you'll need to add --releasever 9.1 to your dnf commands (or change /etc/dnf/vars/releasever) to ensure dnf update doesn't get polluted with later metadata until the revised architecture spin is available.
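
As a rough sketch of the persistent option, the helper below writes the pinned release into dnf's variable directory. The function name pin_releasever and its optional second argument are made up for illustration; only the --releasever flag and the /etc/dnf/vars/releasever path come from the text above.

```shell
# One-off alternative, for a single transaction:
#   sudo dnf --releasever 9.1 update

# Persistently pin the release dnf substitutes for $releasever.
# Helper name and arguments are illustrative assumptions.
pin_releasever() {
    # $1 = release to pin (e.g. 9.1)
    # $2 = vars directory, defaulting to the path named in the post
    ver="$1"
    dir="${2:-/etc/dnf/vars}"
    printf '%s\n' "$ver" > "$dir/releasever"
}
```

Run as root, pin_releasever 9.1 does the job; remove the file again once the ppc64le 9.2 spin is out so you don't stay stuck on 9.1.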

Firefox 113 on POWER


Yes, I skipped a version, sosumi. I'm running a little low on development space on the NVMe drive, but still managed to squeeze in Firefox 113 which introduces enhanced video Picture-in-Picture, more secure private windows and password generation, support for AVIS images, debugger improvements and additional CSS and API features. As usual you'll need to deal with bug 1775202 either with this patch — but without the line containing desktop_capture/desktop_capture_gn, since that's long gone — or put --disable-webrtc in your .mozconfig if you don't need WebRTC. The browser otherwise builds and works with the PGO-LTO patch for Firefox 110 and the .mozconfigs from Firefox 105.
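
If you take the no-WebRTC route, the option is a single ac_add_options line in the .mozconfig. Here is a small sketch that appends it only when it isn't already present; the helper name add_mozconfig_option is invented for this example and is not part of the Firefox build system.

```shell
# Append an ac_add_options line to a mozconfig unless it is already there.
# Helper name is an illustrative assumption.
add_mozconfig_option() {
    # $1 = path to the .mozconfig, $2 = configure option to add
    line="ac_add_options $2"
    grep -qxF -e "$line" "$1" 2>/dev/null || printf '%s\n' "$line" >> "$1"
}
```

For example, add_mozconfig_option .mozconfig --disable-webrtc from the top of your source tree. Running it twice leaves a single copy of the line.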

Fedora 38 mini-review on the Blackbird and Talos II


This article would have come out sooner, except I also wanted to test building Firefox in Fedora 38. Then, when I tried to run libnxz/power-gzip to test out the POWER9 nest accelerator, make check caused a machine check on my daily driver Talos II, and Hostboot guarded out the entire CPU with the NVMe drives attached. (Fixing that caused Petitboot to barf on a stuck XFS log entry again, requiring a trip to the Blackbird to mount and repair it.) But here we are.

As I always say in these mini-reviews, Fedora was one of the first mainstream distributions to support POWER9 out of the box, it's still one of the top distributions OpenPOWER denizens use and its position closest to the bleeding, ragged edge is where we see problems emerge first and get fixed (hopefully) before they move further downstream. That's why it's worth caring about it even if you yourself don't run it.

Also, as usual, recall both my T2 and Blackbird are configured to come up in a text boot instead of gdm and I start KDE manually from there. I still test GNOME on both systems, but I've pretty much entirely migrated over to KDE Plasma, and you should never have considered my GNOME testing to be exhaustive anyway. I strongly recommend a non-graphical boot as a recovery mechanism in case your graphics card gets whacked by something or other. On Fedora this is easily done by ensuring the symlink /etc/systemd/system/default.target points to /lib/systemd/system/multi-user.target.
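
A quick way to check where that symlink points, as a sketch only: the function name boots_to_text is my own invention, and it simply inspects the link target rather than asking systemd.

```shell
# Succeed if the default target symlink points at multi-user.target.
# Function name is an illustrative assumption.
boots_to_text() {
    # $1 = symlink to inspect, defaulting to the path named in the post
    link="${1:-/etc/systemd/system/default.target}"
    case "$(readlink "$link")" in
        multi-user.target|*/multi-user.target) return 0 ;;
        *) return 1 ;;
    esac
}

# To switch a system over (as root), systemd can manage the link itself:
#   systemctl set-default multi-user.target
```

Something like boots_to_text && echo "text boot" is enough to confirm the setting before you reboot.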

Because of issues with dnf kernel updates still sometimes not updating the grub config (basically bug 1921479, showing messages like 0ed84c0-p94177c1: integer expression expected during the process), I've added a little extra paranoia to the usual install dance. To wit:

dnf upgrade --refresh # upgrade prior system and DNF
grub2-mkconfig -o /boot/grub2/grub.cfg # force grub to update
dnf install dnf-plugin-system-upgrade # install upgrade plugin if not already done
dnf system-upgrade download --refresh --releasever=38 # download F38 packages
dnf system-upgrade reboot # reboot into upgrader

This went fairly smoothly on both systems. Other than a copr package with a stale prerequisite I had to remove, there were no issues or conflicts with the 38 packages. As long as you manually select the new kernel in Petitboot before the system starts, you'll get some sort of installation screen. On the Blackbird's HDMI output from the ASPEED BMC framebuffer, the same friendly GUI installer will appear as in prior releases:

But even without using BMC video, like on the T2 with the Raptor-BTO WX7100 workstation card, as before you'll still get to see the install log live as text (which by now I've found more useful anyway). If you forget to manually select the kernel and the system comes up to an apparently black screen, you can monitor progress on the serial port, from a connected system viewing the serial console over the BMC's web server, or by logging in as root on another VTY (with CTRL-ALT-F2 or as appropriate) and periodically issuing dnf system-upgrade log --number=-1 to watch log updates.

The update did not cause a stuck XFS log entry this time on either the Blackbird or the T2, but after the reboot I did need to do one more grub2-mkconfig -o /boot/grub2/grub.cfg and a restart to ensure the right kernel and version were being used. As of this writing the kernel version is 6.2.14.

Our first stop on the BMC-only Blackbird is GNOME on Wayland, started (awkwardly) with XDG_SESSION_TYPE=wayland /usr/libexec/gnome-session-binary --builtin. This configuration hasn't visibly improved any from Fedora 37; there are still prominent artifacts moving windows around and display through the HDMI adapter is still limited to 1024x768.

Performance wasn't hideous but the artifacts were distracting. I couldn't get a screenshot of it in Spectacle so I just grabbed a picture on my Pixel 7 Pro. However, the story isn't a whole lot better in GNOME on X11:

While we now have a full 1920x1080, you can see that the title bar still isn't being painted correctly. This occurred with most of the applications I tried. I consider this a critical fault due to the smearing, so I can't really recommend GNOME at all under any window system if you're using baseline BMC graphics. And KDE?

Well, it works fine. I use KDE on the T2, so now I'm using it on the Blackbird as well. If you really prefer a Gtk default, Xfce should also serve you well.

On the T2 with its AMD GPU, however, I dumped GNOME because of libadwaita encroaching on my customizations; even my shell theme has stopped working now. But the basics are fine: there are no more obvious problems with CTM, and performance seems similar to 37 with no obvious issues in Wayland or X11. On KDE, my customizations persisted without having to rework any of them, which is why I've converted fully over to KDE.

Overall, the F38 update was smooth and it runs pretty much like F37. If you had no problems with F37, you'll probably have no problems with this; you just won't see much improvement in some of the longstanding annoyances either.

Fedora 38


Fedora 38 is out, a week early for a change. Fedora matters to us here at Floodgap Orbiting HQ because it's what we run on our Talos II and Blackbird systems, and it should matter to you because, being a bleeding edge distro, changes occur there first that trickle down to other distributions. That's why we make efforts to do mini-reviews of each release. With F38's release, F36 will be End of Life in one month.

The changeset for 38 is typically extensive. Possibly the most controversial was the change to globally build with -fno-omit-frame-pointer to facilitate better profiling and debugging, particularly where debugging information is not available, but at a cost as this also takes a register out of circulation to hold the frame pointer. The performance impact seems to be limited on x86_64 but I doubt much testing was done on ppc64le, and it should be noted that PowerPC is one of the gcc targets where leaf functions wouldn't use a frame pointer anyway. Time will tell if this pays off. Builds are also now made with _FORTIFY_SOURCE=3 (up from 2) for better security, and another interesting though probably irrelevant change for most is reducing the shutdown timer in systemd to 45 seconds from 2 minutes.

On the back-end F38 ships with kernel 6.2.x and gcc 13, LLVM 16, gmake 4.4, binutils 2.39, glibc 2.37 and gdb 12.1. F38 also has a major upgrade to microdnf as dnf5, the "future of package management" that may ultimately replace dnf entirely. On the front-end F38 updates GNOME to version 44, finally with grid thumbnail view in the file picker, a big overhaul to the Settings app and many new applications, as well as more apps moving to the unthemable libadwaita (but I run KDE Plasma now, and haven't looked back). Xfce also updates to 4.18, there's a new spin for the Sway window manager, and the SDDM display manager now also defaults to Wayland (we use a text boot to log in and start X11 manually, avoiding any display manager completely).

This is the first release to include the change that blocks clients with different endianness from connecting to the X server, including XWayland, which means that the compositor has to support the configurable option too (GNOME 44 Mutter does, others may not). At least you still have the option!

We'll give the mirrors a week or two to catch up on builds and then start the transition on our own machines, with the usual mini-review to follow. Stay tuned.

FreeBSD 13.2


And hot on the heels of the latest OpenBSD release is the latest FreeBSD iteration, 13.2-RELEASE. FreeBSD has a longer track record on OpenPOWER and in my cursory estimates is the most commonly installed BSD on modern Power ISA. One big jump is that the bhyve hypervisor now supports more than 16 virtual CPUs and by default can create the same number of vCPUs as physical CPUs, which is quite useful to us once you get away from the smallest single-4 machines given all our cores are SMT-4. Additionally, for those of you running FreeBSD on a VM (such as an LPAR or under KVM), nested POWER9 radix MMU mappings are now supported on the pseries flavour, substantially reducing hypercall overhead. The Linux compatibility ABI has also been expanded and on the security side ASLR is now enabled for all 64-bit executables by default, configurable through proccontrol. Downloads are available for big-endian and little-endian. Note that the release notes indicate that all PowerPC and Power ISA releases right now require you to run kldxref /boot/kernel manually after a successful kernel and world upgrade.

OpenBSD 7.3


OpenBSD 7.3 is released. While most of the improvements are not specific to Power ISA, there's a lot we benefit from, including many kernel calls which are now "lock-free" (improving SMP performance) like mmap(2) and select(2), more device support, immutable permissions on address ranges to prevent permissions from being changed in the future — much of a running program's static address space like stack, code and most libraries is now automatically immutable — and support for execute-only memory on both Power ISA and the PowerPC 970 ("G5"). LibreSSL is updated to 3.7.2, OpenSSH is updated to 9.3, and the OS ships with LLVM/clang 13.0.0 and Perl 5.36.0. Download and install when ready, Puffy.

Firefox 111 on POWER


This got a bit delayed due to $DAYJOB interfering with my important hacking and writing time (darn having to make a living), but Firefox 111 is out. As usual you'll need to deal with bug 1775202 either with this patch — but without the line containing desktop_capture/desktop_capture_gn, since that's been gone since the latest WebRTC update — or put --disable-webrtc in your .mozconfig if you don't need WebRTC. The workaround adding #pragma GCC diagnostic ignored "-Wnonnull" to js/src/irregexp/imported/regexp-parser.cc for optimized builds fortunately was addressed by bug 1810584, so you no longer need it, and the browser otherwise builds and works with the PGO-LTO patch for Firefox 110 and the .mozconfigs from Firefox 105.

Now your LLaMa is playing with POWER


Now that the invasion of the large language models has occurred and we will all bow to our GPT overlords, I just generated a pull request to add additional POWER9-specific optimizations to llama.cpp, what all the cool kids are using for LLMs who aren't down with OpenAI. This repo moves quick but it's where the magic is happening if this is what you're into. It will work with both Alpaca and LLaMa models.

In a previous article we talked about autovectorization using conversion of Intel vector intrinsics to POWER9, but this is good old fashioned assembly code and hand-written C. The part that really helped was changing their pure-C "F16" (half-precision) float conversion code to use VSX instead. The rolls-off-your-tongue POWER9-and-up xscvhpdp and xscvdphp instructions convert half-precision floats to and from double-precision respectively (xscvdphp will also work on single-precision, which is handy, because the explicit conversion is from single-precision "F32"), and we also use POWER8 mffprd and mtfprd for toll-free copies between general and float registers without requiring a spill to memory. That change alone is about 12 percent faster than the old pure-C compute and lookup code. We also have our own vectorized version of quantize_row_q4_0, like the ARM NEON and AVX-256 ones, written with VMX/VSX intrinsics. It's even a little better, because we were able to use our VMX floating-point multiply-add and remove a couple minor inefficiencies in the code. Finally, people used to G4 and G5-era AltiVec will enjoy the fact that the newer intrinsics substantially map directly to ARM's, making conversion much more straightforward when you need to write your own code. I especially liked vec_extract as an all-purpose replacement for all of the NEON vget_lane_* variations, as well as vec_signed for vcvtq_s32_f32 for converting floats in place, and the all-purpose simplified vec_splats for making a splat vector out of anything.

I did play with alpaca.cpp, the other older white meat, and the changes here should more or less apply to that codebase as well. However, given how quickly llama.cpp evolves and the greater development interest, llama.cpp seems the best way forward for continued evolution.

I will say in the spirit of full disclosure that despite these improvements my 16GB 4P/4E/8G M1 MacBook Air still pops out tokens several times faster than this 64GB dual-8 Talos II, even full-tilt with all 64 threads in use (the cat still looks startled every time the fans rev). On the other hand, we're also comparing a 2017 CPU with one from 2020, and one with specific hardware acceleration for neural networks that llama.cpp takes particular advantage of. Even with Power10's improved bfloat16 support and matrix math operations, specific work would be needed to support those features which won't be coming from me (stay tuned for Power11, I guess). There are other opportunities for vectorization to be done, though at the rate this code base evolves it would be better waiting for one of the mainstream architectures to pick up a SIMD version we can convert first. In the meantime, while you should be advised that going beyond the 7B or 13B models will require patience regardless of how much RAM you have, I think this is definitely better than what we started with.

Firefox 110 on POWER


Firefox 110 is out, with graphics performance improvements like GPU-accelerated 2D canvas and faster WebGL, and the usual under the hood updates. The record's still broken and bug 1775202 still is too, so you'll either need this patch — but this time without the line containing desktop_capture/desktop_capture_gn, since that's gone in the latest WebRTC update — or put --disable-webrtc in your .mozconfig if you don't need WebRTC at all. I also had to put #pragma GCC diagnostic ignored "-Wnonnull" into js/src/irregexp/imported/regexp-parser.cc for optimized builds to complete on this Fedora 37 system and I suspect this is a gcc bug; you may not need it if you're not using gcc 12.2.1 or build with clang. Finally, I trimmed yet another patch from the PGO-LTO diff, so use the new one for Firefox 110 and the .mozconfigs from Firefox 105.

Vikings now has Blackbirds


If you're on the other side of that great pond called the Atlantic, Vikings' OpenPOWER store now lists Blackbirds starting at €3695 + VAT. Not just the board, the package includes a "4-core DD2.3 (v2) CPU, 2U heatsink, 16GB ECC RAM, bequiet! TFX power supply, all packaged nicely in a Antec slim desktop case." That's already a nice quiet basic system and more than enough to get you started with OpenPOWER, but if you want something almost silent, consider pairing it with their so far exclusive water block assembly for POWER9 for €155 + VAT, though you'll need to BYO pump, tubing, reservoir and fluid.

Linux 6.2


Linux 6.2 is out. Among its marquee updates are improved Rust-in-kernel support (strings, formatting and printing, memory allocation, macros, etc.), adding TCP Protective Load Balancing (PLB) for IPv6, reducing the overhead of read-copy update (RCU) operations using lazy callbacks, performance and RAID improvements for Btrfs, and userspace support for runtime verification with safety-critical systems. And, of course, support for Apple silicon and Retbleed sucks less on Skylake, but who cares about that around here anyway?

On the Power ISA side, probably the most noteworthy change is official support for big endian ELFv2 kernels. A nice upgrade for our Sir Mix-A-Lot brigade! Another interesting commit is the one to allow compile time support for the lharx and lbarx instructions (present on ISA v2.06/POWER7 and up). The lwarx (32-bit word) and ldarx (64-bit doubleword) load instructions, along with the corresponding store instructions stwcx. and stdcx. (and a conditional branch), are used to implement atomic load-store-compare/exchange operations by placing and checking reservations on particular memory locations. The newer instructions can do this at halfword (short) and byte level respectively (with sthcx. and stbcx.) instead of reserving at least an entire 32-bit word, reducing contention in tightly packed structs. In the future, they might also benefit the Power ISA-specific spinlock implementation, which is new in this release.

Expect 6.2 to make it to bleeding edge users and Fedora in the very near future.

Tonight's game on OpenPOWER: Shadow Warrior


Well, it's been a while since we expanded our games library, so let's go back to our regular fast food diet of FPSes and select one from the Build side of the house this time: Shadow Warrior. Build games have a reputation starting with Duke Nukem 3D (a game for another day) and that reputation is well-deserved, so let's get this out of the way: if you found these games iffy in the 1990s, rest assured they've aged badly, because you'll find the content level positively radioactive now between the adult humor, graphic violence and (this game in particular) incredibly inappropriate cultural stereotypes. If that's not for you, stop reading this article now and look at some of our other game builds.

On the other hand, Shadow Warrior was probably the most technically superior of the Build games (with the possible exception of Monolith's Blood): more sophisticated sector effects, coloured lighting, true transparency (including water, though used sparingly to avoid spoilers and performance issues), fog and clouds, larger levels, room-over-room effects and the part I liked the most (and was curiously missing from the classic Mac OS port by MacPlay-Westlake Interactive), voxel-based objects that were truly 3D. All of these features plus OpenGL have made it to JonoF's Shadow Warrior Port (JFSW), using Ken Silverman's Build and Polymost engines (more info).

JFSW builds pretty much out of the box with SDL 2; just type make (or make -j24 or such to exercise your other cores), then copy the .GRP group file from either the 3DRealms shareware install or a registered or retail version to ~/.jfsw (I used my MacPlay CD and named it swmac.grp). Shadow Warrior used redbook audio for the retail version, so for music, rip the tracks and save them as track02.ogg (intro) to track14.ogg ("Lo Wang Raps") in the same directory. Then go to where you've built JFSW and start the game with ./sw, and a configuration window will appear to select your resolution. Note that while widescreen resolutions are supported (and look good), the game still uses 4:3 assets, so things like Lo Wang's sword will be cut off.
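
For the music step, a tiny sketch that prints the thirteen file names the text describes, so you can feed them to whatever ripper or encoder you prefer. The function name jfsw_track_names is made up; the track02.ogg to track14.ogg range and the ~/.jfsw directory are from the paragraph above.

```shell
# Print the Ogg file names JFSW expects for the redbook audio tracks,
# track02.ogg (intro) through track14.ogg. Function name is illustrative.
jfsw_track_names() {
    i=2
    while [ "$i" -le 14 ]; do
        printf 'track%02d.ogg\n' "$i"
        i=$((i + 1))
    done
}

# Typical setup, assuming the GRP file was copied off the CD as swmac.grp:
#   mkdir -p ~/.jfsw && cp swmac.grp ~/.jfsw/
```

Audio track numbering starts at 02 because track 1 on the disc is the data track.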

A note on resolutions and colour depth: 8bpp modes are rendered 100% in software, which is very fast even on Blackbirds with just BMC graphics, and works beautifully on virtually any system. If you select a 24bpp mode, the game will try to use OpenGL. On my system this caused a freeze (actually an infinite loop, once I stepped through it in a debugger) whenever it attempts to render reflections in a mirror. This appears to be related to non-POT texture support which virtually every card anybody would be running probably supports properly. If you get the same freeze, kill the game and edit jfbuild/src/polymost.c. On line 4903 or thereabouts you'll see if ((method & METH_POW2XSPLIT) && (tsizx != xx)) which if you change to if (0) will get around the code that glitches. I can't tell if this is specific to my card, to OpenPOWER or to gcc, and it doesn't happen in software mode, which plays 100% fine all the way to the end including nuking Zilla himself.
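
If you'd rather script that edit than hunt for the line, something like the following should do it. This is a sketch under the assumption that the condition appears in your checkout exactly as quoted above; the function name patch_polymost is my own, and a .bak copy is kept in case the match isn't what you expected.

```shell
# Replace the METH_POW2XSPLIT test in polymost.c with "if (0)".
# Assumes the condition text matches the post exactly; keeps a .bak backup.
patch_polymost() {
    # $1 = path to polymost.c (jfbuild/src/polymost.c in the JFSW tree)
    sed -i.bak 's/if ((method & METH_POW2XSPLIT) && (tsizx != xx))/if (0)/' "$1"
}
```

Run patch_polymost jfbuild/src/polymost.c from the top of the source tree, then rebuild with make. If nothing changes, the line in your version probably differs and you'll need to edit it by hand.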

Don't mess with Lo Wang.

Firefox 109 on POWER


Firefox 109 is out with new support for Manifest V3 extensions, but without the passive-aggressive deceitful crap Google was pushing (yet another reason not to use Chrome). There are also modest HTML, CSS and JS improvements.

As before, linking still requires patching for bug 1775202 using this updated small change or the browser won't link on 64-bit Power ISA (alternatively put --disable-webrtc in your .mozconfig if you don't need WebRTC). Otherwise the browser builds and runs fine with the PGO-LTO patch for Firefox 108 and the .mozconfigs from Firefox 105.

In case you thought AIX had a future


In case you thought AIX, IBM's legacy proprietary Unix, had a future, IBM apparently doesn't. The Register reported Friday that IBM has moved the entire AIX development group to IBM India, apparently their Bangalore office, and placed 80 US-based developers into "redeployment." That's a fairly craven way of replacing layoffs with musical chairs, requiring the displaced developers to either find a new position within the company (possibly relocating as well) within some unspecified period, or retire. About a third of IBM's global staff is on the Indian subcontinent. IBM didn't publicly announce this move and while it's undoubtedly good news for IBM India it seems bad news for AIX's prospects: the technologies IBM thinks are up and coming IBM tends to spend money on, and so an obvious cost-cutting move suggests IBM doesn't think AIX is one of those things.

We've got a long history with AIX here at Floodgap Orbiting HQ, going back to when I first worked with AIX 3.2.5 and 4.1 in my University employment and consulting days, and I've run personal installations of AIX as my primary personal server since 1998, first on an Apple Network Server 500 and now on an 8203-E4A POWER6 p520. AIX 3 and 4 were surprisingly compelling workstation and server OSes for the time, but AIX 5L was where it started to feel "legacy" and unloved, and IBM has always been tightfisted about APARs and other kinds of updates if you don't buy a support contract. Combine that with nonsense like Capacity on Demand, where my second CPU was locked out after a system planar update until IBM coughed up a new set of keys, and I've already concluded this will be my last AIX server. While the next one will almost certainly be OpenPOWER, I'll probably run FreeBSD instead.

And, well, IBM would rather you ran Linux anyway on Power hardware, and so would their subsidiary Red Hat. If you're still an AIX institutional customer and you're still paying the bills, you'll still get support (just as you would with IBM i, the other white meat), but newly migrating to AIX is increasingly more trouble than it's worth paying for. Apparently IBM thinks so too.

Your X server may no longer swing both ways by default


As a long-time PowerPC and Power ISA bigot, I've got a lot of Power-based hardware in this house: primarily Apple, but some IBM, and of course several Raptor systems. While many CPUs are capable of running big-endian or little-endian, Power ISA is probably the last architecture where there is still notable interest in running it in both modes: AIX, IBM i (a/k/a i5, AS/400), AmigaOS and OpenBSD run it big, FreeBSD primarily runs it big (but work exists to run it little), and most Linux distros run it little. Compare with the ostensibly bi-endian ARM and MIPS, which virtually all run little, and SPARC, which virtually all runs big (versus s390x, which only runs big, and of course x86 and x86_64, which only run little). Little-endian is gradually displacing big-endian even in the Power world (sorry), but it's still important.
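
If you're not sure which way a given box is running, here is one hedged way to find out from the shell. It reads two known bytes back as a 16-bit integer in host byte order; the function name host_endianness is mine, and it relies only on POSIX printf, od and tr.

```shell
# Report the byte order the current userland is running in.
# Function name is an illustrative assumption.
host_endianness() {
    # Bytes 0x01 0x00 read as one unsigned 16-bit value in host order:
    # 1 on a little-endian host, 256 on a big-endian host.
    v=$(printf '\001\000' | od -An -tu2 | tr -d ' \n')
    case "$v" in
        1) echo little ;;
        256) echo big ;;
        *) echo unknown ;;
    esac
}
```

On a Fedora ppc64le install this should say little, and on AIX it should say big, which is exactly the mismatch the rest of this post is about.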

When it was more commonplace for a discrepancy to exist, such as between mainframes and desktop X terminals or PCs, a feature was added to the X protocol where a connecting X client would advertise its endianness and if this did not match the server's, the server would byteswap for it. (Note that current Xorg may not allow remote connections without passing -listen tcp either from gdm/your display manager of choice or on the command line. On my Fedora 37 system, I do startx -- -listen tcp to enable incoming connections on my secured wired network. Don't forget anything you need to do with xhost or other authorizations. ssh forwarding is of course an alternative means.) This makes running X clients from my AIX POWER6, which is strictly big, possible on my Fedora 37 Talos II, which Fedora runs little. Here's the old beast now from the "WalMart server rack" next door.

And here's proof of connection in my usual KDE Plasma desktop (running aixterm and xlogo), showing that even the most current Xorg still supports it.

A new change to Xorg will now prohibit automatic byteswapping in the X server by default. A client that advertises a different endianness from the server it connects to will be kicked off with an error. If you want this support, you'll either need to pass +byteswappedclients on the command line to the X server, or put "AllowByteSwappedClients" "on" in the Options stanza in your xorg.conf. This is also a change request for Fedora 38 which as of this writing is still proposed and not accepted.
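
For the configuration file route, a minimal sketch follows. I'm assuming the option goes in the ServerFlags section, which is where xorg.conf(5) documents it, and the drop-in file name in the usage note is purely hypothetical; the helper name write_byteswap_conf is likewise made up.

```shell
# Write an Xorg configuration snippet re-enabling byte-swapped clients.
# Helper name is illustrative; the section placement is an assumption
# based on xorg.conf(5), so check the man page for your server version.
write_byteswap_conf() {
    # $1 = destination file
    cat > "$1" <<'EOF'
Section "ServerFlags"
    Option "AllowByteSwappedClients" "on"
EndSection
EOF
}
```

As root, something like write_byteswap_conf /etc/X11/xorg.conf.d/10-byteswap.conf and a restart of the X server should be equivalent to passing +byteswappedclients on the command line.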

This means that not only will my own setup of big-endian clients on a little-endian server, which I use infrequently but not rarely, stop working without changes, but so will that of anyone running a bleeding-edge version of Xorg on a big-endian host (say, Linux on your Power Mac G5) who wants to run clients like a more current web browser from a little-endian system. The latter case is certainly less common than the former (mostly retrocomputing, whereas there are mainframe apps that people will want to have a local interface for), but I think there's more out there of both than folks suspect. Chesterton's fence and all that.

I will say that I appreciate this being turned into an option rather than outright removed, keeping in mind this is usually a prelude for outright removal later. After all, the code seems to have no test coverage in a codebase poorly covered by testing generally, and has caused documented security problems in the past. To the extent this is a better compromise than talking to the hand I support it. However, it also makes Wayland even less attractive than it already is because the ability to pass an option to Xwayland is compositor-specific (see this bug for, among others, GNOME Mutter), meaning you're at the mercy of what you're running and may not be able to change it easily yourself. Well, we're Xorg unto death around here anyway.