Posts
Rocky Linux 9.2 is rocky
Firefox 113 on POWER
Fedora 38 mini-review on the Blackbird and Talos II
As I always say in these mini-reviews, Fedora was one of the first mainstream distributions to support POWER9 out of the box, it's still one of the top distributions OpenPOWER denizens use, and its position closest to the bleeding, ragged edge is where we see problems emerge first and get fixed (hopefully) before they move further downstream. That's why it's worth caring about even if you yourself don't run it.
Also, as usual, recall both my T2 and Blackbird are configured to come up in a text boot instead of gdm and I start KDE manually from there. I still test GNOME on both systems, but I've pretty much entirely migrated over to KDE Plasma, and you should never have considered my GNOME testing to be exhaustive anyway. I strongly recommend a non-graphical boot as a recovery mechanism in case your graphics card gets whacked by something or other. On Fedora this is easily done by ensuring the symlink /etc/systemd/system/default.target points to /lib/systemd/system/multi-user.target.
Because of issues with dnf kernel updates still sometimes not updating the grub config (basically bug 1921479, showing messages like 0ed84c0-p94177c1: integer expression expected during the process), I've added a little extra paranoia to the usual install dance. To wit:
dnf upgrade --refresh # upgrade prior system and DNF
grub2-mkconfig -o /boot/grub2/grub.cfg # force grub to update
dnf install dnf-plugin-system-upgrade # install upgrade plugin if not already done
dnf system-upgrade download --refresh --releasever=38 # download F38 packages
dnf system-upgrade reboot # reboot into upgrader
This went fairly smoothly on both systems. Other than a copr package with a stale prerequisite I had to remove, there were no issues or conflicts with the 38 packages. As long as you manually select the new kernel in Petitboot before the system starts, you'll get some sort of installation screen. On the Blackbird's HDMI output from the ASPEED BMC framebuffer, the same friendly GUI installer will appear as in prior releases:
But even without using BMC video, like on the T2 with the Raptor-BTO WX7100 workstation card, as before you'll still get to see the install log live as text (which by now I've found more useful anyway). If you forget to manually select the kernel and the system comes up to an apparently black screen, you can monitor on the serial port, view the serial console over the BMC's web server from a connected system, or log into another VTY as root (with CTRL-ALT-F2 or as appropriate) and periodically issue dnf system-upgrade log --number=-1 to watch log updates.
The update did not cause a stuck XFS log entry this time on either the Blackbird or the T2, but after the reboot I did need to do one more grub2-mkconfig -o /boot/grub2/grub.cfg and a restart to ensure the right kernel and version were being used. As of this writing the kernel version is 6.2.14.
Our first stop on the BMC-only Blackbird is GNOME on Wayland, started (awkwardly) with XDG_SESSION_TYPE=wayland /usr/libexec/gnome-session-binary --builtin. This configuration hasn't visibly improved any from Fedora 37; there are still prominent artifacts moving windows around and display through the HDMI adapter is still limited to 1024x768.
Performance wasn't hideous but the artifacts were distracting. I couldn't get a screenshot of it in Spectacle so I just grabbed a picture on my Pixel 7 Pro.
However, the story isn't a whole lot better in GNOME on X11: while we now have a full 1920x1080, you can see that the title bar still isn't being painted correctly. This occurred with most of the applications I tried. I consider this a critical fault due to the smearing, so I can't really recommend GNOME at all under any window system if you're using baseline BMC graphics.
And KDE? Well, it works fine. I use KDE on the T2, so now I'm using it on the Blackbird as well. If you really prefer a Gtk default, Xfce should also serve you well.
On the T2 with its AMD GPU, however, I dumped GNOME because of libadwaita encroaching on my customizations; even my shell theme has stopped working now. But the basics are fine: there are no more obvious problems with CTM, and performance seems similar to 37 with no obvious issues in Wayland or X11. On KDE, my customizations persisted without having to rework any of them, which is why I've converted fully over to KDE.
Overall, the F38 update was smooth and it runs pretty much like F37. If you had no problems with F37, you'll probably have no problems with this; you just won't see much improvement in some of the longstanding annoyances either.
Fedora 38
The changeset for 38 is typically extensive. Possibly the most controversial was the change to globally build with -fno-omit-frame-pointer to facilitate better profiling and debugging, particularly where debugging information is not available, but at a cost as this also takes a register out of circulation to hold the frame pointer. The performance impact seems to be limited on x86_64 but I doubt much testing was done on ppc64le, and it should be noted that PowerPC is one of the gcc targets where leaf functions wouldn't use a frame pointer anyway. Time will tell if this pays off. Builds are also now made with _FORTIFY_SOURCE=3 (up from 2) for better security, and another interesting though probably irrelevant change for most is reducing the shutdown timer in systemd to 45 seconds from 2 minutes.
On the back-end F38 ships with kernel 6.2.x and gcc 13, LLVM 16, gmake 4.4, binutils 2.39, glibc 2.37 and gdb 12.1. F38 also has a major upgrade to microdnf as dnf5, the "future of package management" that may ultimately replace dnf entirely. On the front-end F38 updates GNOME to version 44, finally with grid thumbnail view in the file picker, a big overhaul to the Settings app and many new applications, as well as more apps moving to the unthemable libadwaita (but I run KDE Plasma now, and haven't looked back). Xfce also updates to 4.18, there's a new spin for the Sway window manager, and the SDDM display manager now also defaults to Wayland (we use a text boot to log in and start X11 manually, avoiding any display manager completely).
This is the first release to include the change that blocks clients with different endianness from connecting to the X server, including XWayland, which means that the compositor has to support the configurable option too (GNOME 44 Mutter does, others may not). At least you still have the option!
We'll give the mirrors a week or two to catch up on builds and then start the transition on our own machines, with the usual mini-review to follow. Stay tuned.
FreeBSD 13.2
OpenBSD 7.3
Firefox 111 on POWER
Now your LLaMa is playing with POWER
In a previous article we talked about autovectorization using conversion of Intel vector intrinsics to POWER9, but this is good old fashioned assembly code and hand-written C. The part that really helped was changing their pure-C "F16" (half-precision) float conversion code to use VSX instead. The rolls-off-your-tongue POWER9-and-up xscvhpdp and xscvdphp instructions convert half-precision floats to and from double-precision respectively (xscvdphp will also work on single-precision, which is handy, because the explicit conversion is from single-precision "F32"), and we also use POWER8 mffprd and mtfprd for toll-free copies between general and float registers without requiring a spill to memory. That change alone is about 12 percent faster than the old pure-C compute and lookup code.
We also have our own vectorized version of quantize_row_q4_0, like the ARM NEON and AVX-256 ones, written with VMX/VSX intrinsics. It's even a little better, because we were able to use our VMX floating-point multiply-add and remove a couple minor inefficiencies in the code. People used to G4 and G5-era AltiVec will also enjoy the fact that the newer intrinsics substantially map directly to ARM's, which makes conversion much more straightforward when you need to write your own code. I especially liked vec_extract as an all-purpose replacement for all of the NEON vget_lane_* variations, as well as vec_signed for vcvtq_s32_f32 for converting floats in place, and the all-purpose simplified vec_splats for making a splat vector out of anything.
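For anyone wondering what the half-precision conversion actually has to do, here is a minimal portable sketch in plain C that turns an IEEE 754 half-precision bit pattern into a float. This is not the llama.cpp code, and f16_to_f32 is a hypothetical name used only for illustration; it's just the sort of bit-twiddling that a single xscvhpdp takes over on POWER9.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative only: convert an IEEE 754 binary16 bit pattern to a float.
 * Hypothetical helper name; not taken from any real codebase. */
float f16_to_f32(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16; /* move sign to bit 31 */
    uint32_t exp  = (h >> 10) & 0x1Fu;             /* 5-bit exponent, bias 15 */
    uint32_t man  = h & 0x3FFu;                    /* 10-bit mantissa */
    uint32_t bits;
    float f;

    if (exp == 0x1Fu) {
        /* infinity or NaN: all-ones exponent, keep the mantissa payload */
        bits = sign | 0x7F800000u | (man << 13);
    } else if (exp != 0) {
        /* normal number: rebias the exponent from 15 to 127 */
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    } else if (man == 0) {
        /* signed zero */
        bits = sign;
    } else {
        /* subnormal half: shift until the implicit leading 1 appears */
        uint32_t shift = 0;
        while ((man & 0x400u) == 0) {
            man <<= 1;
            shift++;
        }
        man &= 0x3FFu; /* drop the now-implicit leading 1 */
        bits = sign | ((113u - shift) << 23) | (man << 13);
    }

    memcpy(&f, &bits, sizeof f); /* reinterpret the bits without aliasing issues */
    return f;
}
```

With this sketch, f16_to_f32(0x3C00) gives 1.0f and f16_to_f32(0xC000) gives -2.0f. Every one of those branches is work the hardware instruction does for you.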
I did play with alpaca.cpp, the other older white meat, and the changes here should more or less apply to that codebase as well. However, given how quickly llama.cpp evolves and the greater development interest, llama.cpp seems the best way forward for continued evolution.
I will say in the spirit of full disclosure that despite these improvements my 16GB 4P/4E/8G M1 MacBook Air still pops out tokens several times faster than this 64GB dual-8 Talos II, even full-tilt with all 64 threads in use (the cat still looks startled every time the fans rev). On the other hand, we're also comparing a 2017 CPU with one from 2020, and one with specific hardware acceleration for neural networks that llama.cpp takes particular advantage of. Even with Power10's improved bfloat16 support and matrix math operations, specific work would be needed to support those features which won't be coming from me (stay tuned for Power11, I guess). There are other opportunities for vectorization to be done, though at the rate this code base evolves it would be better waiting for one of the mainstream architectures to pick up a SIMD version we can convert first. In the meantime, while you should be advised that going beyond the 7B or 13B models will require patience regardless of how much RAM you have, I think this is definitely better than what we started with.
Firefox 110 on POWER
Vikings now has Blackbirds
Linux 6.2
On the Power ISA side, probably the most noteworthy change is official support for big endian ELFv2 kernels. A nice upgrade for our Sir Mix-A-Lot brigade! Another interesting commit is the one to allow compile time support for the lharx and lbarx instructions (present on ISA v2.06/POWER7 and up). The lwarx (32-bit word) and ldarx (64-bit doubleword) load instructions, along with the corresponding store instructions stwcx. and stdcx. (and a conditional branch), are used to implement atomic load-store-compare/exchange operations by placing and checking reservations on particular memory locations. The newer instructions can do this at halfword (short) and byte level respectively (with sthcx. and stbcx.) instead of reserving at least an entire 32-bit word, reducing contention in tightly packed structs. In the future, this might also benefit the Power ISA-specific spinlock implementation, which is also new in this release.
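To make the byte-level case concrete, here is a small sketch using standard C11 atomics on one-byte flags packed next to each other in a struct. The struct and function names are made up for illustration and this is not kernel code. The assumption is simply that a compiler targeting a CPU with byte reservations can lower the compare-exchange to an lbarx/stbcx. loop rather than reserving the whole surrounding word.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical example: two independent one-byte locks packed side by side. */
struct packed_locks {
    _Atomic uint8_t lock_a;
    _Atomic uint8_t lock_b;
};

/* Try to take a byte-wide lock by atomically changing 0 to 1.
 * Returns true if this caller acquired it, false if it was already held. */
bool byte_trylock(_Atomic uint8_t *lock)
{
    uint8_t expected = 0;
    return atomic_compare_exchange_strong_explicit(
        lock, &expected, 1,
        memory_order_acquire,  /* ordering on success */
        memory_order_relaxed); /* ordering on failure */
}

/* Release the lock by storing 0 with release ordering. */
void byte_unlock(_Atomic uint8_t *lock)
{
    atomic_store_explicit(lock, 0, memory_order_release);
}
```

Taking lock_a here never touches lock_b at the source level. Whether the two also avoid contending in hardware depends on the reservation granule of the CPU in question.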
Expect 6.2 to make it to bleeding edge users and Fedora in the very near future.
Tonight's game on OpenPOWER: Shadow Warrior
On the other hand, Shadow Warrior was probably the most technically superior of the Build games (with the possible exception of Monolith's Blood): more sophisticated sector effects, coloured lighting, true transparency (including water, though used sparingly to avoid spoilers and performance issues), fog and clouds, larger levels, room-over-room effects and the part I liked the most (and was curiously missing from the classic Mac OS port by MacPlay-Westlake Interactive), voxel-based objects that were truly 3D. All of these features plus OpenGL have made it to JonoF's Shadow Warrior Port (JFSW), using Ken Silverman's Build and Polymost engines (more info).
JFSW builds pretty much out of the box with SDL 2; just type make (or make -j24 or such to exercise your other cores), then copy the .GRP group file from either the 3DRealms shareware install or a registered or retail version to ~/.jfsw (I used my MacPlay CD and named it swmac.grp). Shadow Warrior used redbook audio for the retail version, so for music, rip the tracks and save them as track02.ogg (intro) to track14.ogg ("Lo Wang Raps") in the same directory. Then go to where you've built JFSW and start the game with ./sw, and a configuration window will appear to select your resolution. Note that while widescreen resolutions are supported (and look good), the game still uses 4:3 assets, so things like Lo Wang's sword will be cut off.
A note on resolutions and colour depth: 8bpp modes are rendered 100% in software, which is very fast even on Blackbirds with just BMC graphics, and works beautifully on virtually any system. If you select a 24bpp mode, the game will try to use OpenGL. On my system this caused a freeze (actually an infinite loop, once I stepped through it in a debugger) whenever it attempts to render reflections in a mirror. This appears to be related to non-POT texture support which virtually every card anybody would be running probably supports properly. If you get the same freeze, kill the game and edit jfbuild/src/polymost.c. On line 4903 or thereabouts you'll see if ((method & METH_POW2XSPLIT) && (tsizx != xx)) which if you change to if (0) will get around the code that glitches. I can't tell if this is specific to my card, to OpenPOWER or to gcc, and it doesn't happen in software mode, which plays 100% fine all the way to the end including nuking Zilla himself.
Don't mess with Lo Wang.
Firefox 109 on POWER
As before linking still requires patching for bug 1775202 using this updated small change or the browser won't link on 64-bit Power ISA (alternatively put --disable-webrtc in your .mozconfig if you don't need WebRTC). Otherwise the browser builds and runs fine with the LTO-PGO patch for Firefox 108 and the .mozconfigs from Firefox 105.
In case you thought AIX had a future
We've got a long history with AIX here at Floodgap Orbiting HQ, going back to when I first worked with AIX 3.2.5 and 4.1 in my University employment and consulting days, and I've run personal installations of AIX as my primary personal server since 1998, first on an Apple Network Server 500 and now on an 8203-E4A POWER6 p520. AIX 3 and 4 were surprisingly compelling workstation and server OSes for the time, but AIX 5L was where it started to feel "legacy" and unloved, and IBM has always been tightfisted about APARs and other kinds of updates if you don't buy a support contract. Combine that with nonsense like Capacity on Demand, where my second CPU was locked out after a system planar update until IBM coughed up a new set of keys, and I've already concluded this will be my last AIX server. While the next one will almost certainly be OpenPOWER, I'll probably run FreeBSD instead.
And, well, IBM would rather you ran Linux anyway on Power hardware, and so would their subsidiary Red Hat. If you're still an AIX institutional customer and you're still paying the bills, you'll still get support (just as you would with IBM i, the other white meat), but newly migrating to AIX is increasingly more trouble than it's worth paying for. Apparently IBM thinks so too.
Your X server may no longer swing both ways by default
When it was more commonplace for a discrepancy to exist, such as between mainframes and desktop X terminals or PCs, a feature was added to the X protocol where a connecting X client would advertise its endianness and if this did not match the server's, the server would byteswap for it. (Note that current Xorg may not allow remote connections without passing -listen tcp either from gdm/your display manager of choice or on the command line. On my Fedora 37 system, I do startx -- -listen tcp to enable incoming connections on my secured wired network. Don't forget anything you need to do with xhost or other authorizations. ssh forwarding is of course an alternative means.) This makes running X clients from my AIX POWER6, which is strictly big, possible on my Fedora 37 Talos II, which Fedora runs little. Here's the old beast now from the "WalMart server rack" next door.
And here's proof of connection in my usual KDE Plasma desktop (running aixterm and xlogo), showing that even the most current Xorg still supports it.
A new change to Xorg will now prohibit automatic byteswapping in the X server by default. A client that advertises a different endianness from the server's will be kicked off with an error. If you want this support, you'll either need to pass +byteswappedclients on the command line to the X server, or put "AllowByteSwappedClients" "on" in the Options stanza in your xorg.conf. This is also a change request for Fedora 38 which as of this writing is still proposed and not accepted.
This means that this usage of a big-endian client with a little-endian server, which I use infrequently but not rarely, will not work without changes, and it will also fail for anyone running a bleeding-edge version of Xorg on a big-endian host (say, Linux on your Power Mac G5) who wants to run clients like a more current web browser from a little-endian server. The latter case is certainly less common than the former (mostly retrocomputing, whereas there are mainframe apps that people will want to have a local interface for), but I think there's more out there of both than folks suspect. Chesterton's fence and all that.
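As a rough illustration of the policy change (this is not Xorg's actual code, and every function name here is hypothetical): the first byte of an X11 connection setup request is 'B' for a big-endian client or 'l' for a little-endian one. The server's decision then boils down to something like the following sketch, where allow_byte_swapped stands in for the new option.

```c
#include <stdbool.h>
#include <stdint.h>

/* Return the X11 byte-order mark this host would send as a client:
 * 'l' for little-endian (LSB first), 'B' for big-endian (MSB first). */
char host_byte_order_mark(void)
{
    const uint16_t probe = 1;
    /* On a little-endian host the low-order byte is stored first. */
    return (*(const unsigned char *)&probe == 1) ? 'l' : 'B';
}

/* Hypothetical policy check: accept same-endian clients always, and
 * opposite-endian clients only when byte swapping has been allowed. */
bool accept_client(char client_mark, char server_mark, bool allow_byte_swapped)
{
    if (client_mark != 'l' && client_mark != 'B')
        return false; /* not a valid byte-order mark at all */
    return client_mark == server_mark || allow_byte_swapped;
}

/* What the server has to do to every 32-bit field from a swapped client. */
uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}
```

Under the old behaviour allow_byte_swapped was effectively always true. With the new default it is false unless you ask for it, so accept_client('B', 'l', false) is exactly the AIX-to-Talos II case that now gets kicked off.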
I will say that I appreciate this being turned into an option rather than outright removed, keeping in mind this is usually a prelude for outright removal later. After all, the code seems to have no test coverage in a codebase poorly covered by testing generally, and has caused documented security problems in the past. To the extent this is a better compromise than talking to the hand I support it. However, it also makes Wayland even less attractive than it already is because the ability to pass an option to Xwayland is compositor-specific (see this bug for, among others, GNOME Mutter), meaning you're at the mercy of what you're running and may not be able to change it easily yourself. Well, we're Xorg unto death around here anyway.