Fedora 35 mini-review on the Blackbird and Talos II


Happy American Thanksgiving. While America watches football and eats deep-fried gobbler, we went to the Popeyes drive-thru for chicken and I finished updating Fedora on my daily driver, now at version 35 (see our prior review of Fedora 34). As I always point out: while Fedora is a very common distro on OpenPOWER systems, even if you don't run Fedora yourself the fact that it does run is important, because it tends to be well ahead of most distros and many problems are identified and fixed in it before they reach other, less advanced ones. I test it on my 4-core BMC graphics Blackbird and my dual-8 AMD WX7100 GPU Talos II.

F34 was a messy, unpleasant upgrade. I did the update first on my 4-core stock Blackbird, which I try to keep to stock Fedora as much as possible, though I note for the record both the Bird and the T2 are configured to come up in a text boot instead of gdm and I start GNOME manually from there. I strongly recommend this as a recovery mechanism in case your graphics card gets whacked by something or other. On Fedora this is easily done by ensuring the symlink /etc/systemd/system/default.target points to /lib/systemd/system/multi-user.target. Once you've logged into the console, jump to GNOME with startx (set XDG_SESSION_TYPE to x11 if this isn't already done), or XDG_SESSION_TYPE=wayland dbus-run-session gnome-session if you want to explore the Wayland Wasteland. Since this is a minimal boot I can also do the upgrade at the same text prompt, both for speed and to ensure as little interference as possible. As usual, the process is, from a root prompt:

dnf upgrade --refresh # upgrade prior system and DNF
dnf install dnf-plugin-system-upgrade # install upgrade plugin if not already done
dnf system-upgrade download --refresh --releasever=35 # download F35 packages
dnf system-upgrade reboot # reboot into upgrader
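
For the text-boot setup described before the upgrade commands, here is a minimal sketch for checking where the default target currently points. The helper name target_of is made up for illustration; systemctl get-default and systemctl set-default are the stock systemd ways to query and change the same symlink.

```shell
#!/bin/sh
# Hypothetical helper: print the unit name a default.target symlink points at.
# The path argument defaults to the location mentioned in the text.
target_of() {
    link="${1:-/etc/systemd/system/default.target}"
    # readlink prints the symlink's destination; basename strips the directory.
    basename "$(readlink "$link")"
}

# Example use (from a root prompt):
#   [ "$(target_of)" = "multi-user.target" ] || systemctl set-default multi-user.target
```

Running the check before rebooting into the upgrader is a cheap way to confirm the machine will land at a console afterwards.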

This went much more smoothly than F34, which had some weird conflicts; it was able to get the necessary packages right away and booted into the installer with no issue. Back at the text prompt, we started with Wayland, as I always do to see if it's still going to suck, and I'm still not disappointed. Performance was even worse than F34: it got glitchy just trying to take a grab with gnome-screenshot from the command line (see this Reddit thread), and BMC video (through the on-board HDMI connector) is still stuck at 1024x768. I ended up photographing the screen with my Pixel 3 after I got tired of mucking around with it.

As before, don't even bother with Wayland on a Blackbird if you don't have a GPU. Xorg worked fine but was still slow like F34 was. I'll get to that in a moment.
Otherwise, in Xorg, the system, Firefox and LibreOffice mostly worked as before modulo the performance problems, which was a relief.

The T2 tends to be a different story because I have this system heavily customized. Additionally, kernel 5.14 has a known problem with AMD Vega cards (add amdgpu.aspm=0 to your kernel command line as a workaround), and 5.15 may have an issue with amdgpu in power saving mode, so watch out for both of these problems depending on your GPU. (At least one user reported having to blacklist the AST BMC, though that wasn't necessary for me.)
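
As a rough sketch of applying the amdgpu.aspm=0 workaround: the helper below (has_karg, a name invented here) only checks whether a parameter is already on the running kernel's command line. The commented grubby invocation is one common way to add an argument on Fedora; whether it is the right tool for a heavily customized setup is an assumption you should verify first.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the kernel command line read from stdin
# contains the given parameter as a whole word.
has_karg() {
    tr ' ' '\n' | grep -qxF -- "$1"
}

# Example use (from a root prompt):
#   has_karg amdgpu.aspm=0 < /proc/cmdline || \
#       grubby --update-kernel=ALL --args="amdgpu.aspm=0"
```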

The first problem was more elemental, however: after I downloaded the packages and ran the installation, it still came up offering an impossibly old kernel, the same thing I had to work around when updating to F34!

When I selected it, it started Fedora 35, but with this old 5.11-series kernel from Fedora 34. I did a manual grub2-mkconfig -o /boot/grub2/grub.cfg and restarted, and the Petitboot menu (built off the grub configuration) looked sane again. The text boot came up without incident.
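
One way to spot this stale-menu situation before rebooting is to compare the newest kernel actually installed with the one the bootloader configuration treats as default. This is only a sketch: newest_kernel is a made-up helper, it relies on GNU sort's -V version ordering, and the /boot/vmlinuz-* naming is the usual Fedora layout rather than something confirmed for this particular system.

```shell
#!/bin/sh
# Hypothetical helper: read kernel version strings on stdin, one per line,
# and print the highest according to GNU sort's version ordering.
newest_kernel() {
    sort -V | tail -n 1
}

# Example use: compare the newest kernel in /boot with what grubby reports
# as the default boot entry.
#   ls /boot/vmlinuz-* | sed 's|.*/vmlinuz-||' | newest_kernel
#   grubby --default-kernel
```

If the two disagree, regenerating the configuration with grub2-mkconfig as described above is the fix that worked here.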

Next, the desktop environment. Usually GNOME upgrades break a large number of my cherished extensions. Surprisingly, only Dash-to-Dock broke this time, which I rebuilt from a fork using these instructions. Note, however, that I do have disable-extension-version-validation set to true in dconf-editor which helps avoid a lot of churn.

However, the same GNOME regressions turned up in F35 that were in F34: CTM still makes a mess out of my custom colour profiles (again something like xrandr --output DisplayPort-0 --set CTM 0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1 will fix it, but this changes based on how your monitors are connected, and every time you [re]start GNOME you'll have to do it), colour calibration still crashes with my Pantone huey, and graphics were still awfully slow. This performance problem is once again libgraphene not being properly built to enable SIMD; the fix was made by the maintainer but the Fedora-distributed library doesn't seem to incorporate it properly. I rebuilt it on F35 and put a copy on Github. It will replace the file of the same name in /lib64 (remember to make a backup and don't do this while GNOME is running).
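
Since the output name changes with how the monitors are connected and the setting has to be reapplied on every GNOME start, a small wrapper can save some typing. The sketch below assumes every connected output should get the same CTM value as in the command above, which may not hold for your setup (an output that lacks the CTM property will simply make xrandr complain). connected_outputs is a hypothetical helper name.

```shell
#!/bin/sh
# The CTM value from the xrandr command in the text.
CTM_VALUE="0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1"

# Hypothetical helper: read `xrandr --query` output on stdin and print the
# name of every connected output, one per line.
connected_outputs() {
    awk '$2 == "connected" { print $1 }'
}

# Example use, after GNOME is up:
#   xrandr --query | connected_outputs | while read -r out; do
#       xrandr --output "$out" --set CTM "$CTM_VALUE"
#   done
```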

I'll not comment much further about Wayland except to say that it continues to meet my low expectations on the T2, but as it still doesn't support what my work habits require, I still don't use it. But you can, at least if you have a working discrete graphics card and you've updated libgraphene. For me, Xorg forever, I guess.

My conclusion is damning with faint praise: at least it wasn't any worse. And with these tweaks it works fine. If you're on F34 you have no reason not to upgrade, and if you're on F33 you won't have much longer until you have to (and you might as well just jump right to F35 at that point). But it's still carrying an odd number of regressions (even though the workarounds for F35 are the same as for F34) and the installation on the T2 was bumpier than on the Blackbird for reasons that remain unclear to me. If you run KDE or Xfce or anything other than GNOME, you shouldn't have any problems, but if you still use GNOME as your desktop environment you should be prepared to do more preparatory work to get it off the ground. I have higher hopes for F36 because we may finally get that float128 update that still wrecks a small but notable selection of packages like MAME, but I also hope that some of these regressions get dealt with as well because that would make these updates a bit more liveable. Any system upgrade of any OS will make you wonder what's going to break this time, but the most recent Fedora updates have come off as more fraught with peril than they ought to be.

If you like big-endian and Void and cannot lie ...


... then you other brothers can't stand by: Void PPC, probably one of the most finely tuned distributions for Power ISA systems (and one of the few still supporting Power Macs), needs big endian maintainers due to the work needed to maintain those four flavours, i.e., 32-bit PowerPC and 64-bit BE Power multiplied by musl and glibc. I totally get the idea of not maintaining what you don't personally use, which is one of the reasons I cut loose TenFourFox and Classilla earlier. It's a shame but it's awfully hard to justify dedicating resources to a free product that isn't personally beneficial. The new BE Void PPC maintainer would be responsible for doing the builds as well as fixing issues, but it should be possible to coordinate hosting the packages on an official mirror. I imagine it's negotiable to do only glibc or only 64-bit or some such depending on the hardware or interest you have.

If no one steps up, the big-endian musl repos go first by the end of this year, and the glibc repos will be discontinued in January 2022. Little-endian 64-bit is unaffected as is the experimental little-endian 32-bit flavour. Interested community members will want to take a look at the Void PPC Github.

51,552 JavaScript tests can't be wrong


Yeah, so about that OpenPOWER Minimum Viable Product JavaScript JIT for Firefox. This happened (all timings from an unoptimized debug build on my dual-8 Talos II with -j24):

% ./mach jstests --args "--no-ion --no-baseline --blinterp-eager --regexp-warmup-threshold=0" -F -j24

[43359|    0|    0|  614] 100% ======================================>| 529.7s
PASS
% ./mach jstests --args "--no-ion --no-baseline" -F -j24
[43359|    0|    0|  614] 100% ======================================>| 499.0s
PASS
% js/src/jit-test/jit_test.py --args "--no-ion --no-baseline --blinterp-eager --regexp-warmup-threshold=0" -f -j24 obj/dist/bin/js
[8193|   0|   0|   0] 100% ==========================================>| 132.3s
PASSED ALL
% js/src/jit-test/jit_test.py --args "--no-ion --no-baseline" -f -j24 obj/dist/bin/js
[8193|   0|   0|   0] 100% ==========================================>| 133.3s
PASSED ALL

That's a wrap, folks: the MVP, defined as Baseline Interpreter with irregexp and Wasm support for little-endian POWER9, is now officially V. This is the first and lowest of the JIT tiers, but is already a significant improvement; the JavaScript conformance suite executed using the same interpreter with --no-ion --no-baseline --no-blinterp --no-native-regexp took 762.4 seconds (1.53x as long) and one test timed out completely. An optimized build would be even faster.
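
For anyone who wants to recheck the headline numbers, here is a small sketch. It assumes the first field of each harness summary line is the pass count (which is consistent with the title's total) and that the 1.53x figure is measured against the 499.0-second run; the helper names pass_count and ratio are invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: pull the first count out of a harness summary line
# such as "[43359|    0|    0|  614] 100% ...".
pass_count() {
    sed -n 's/^\[ *\([0-9][0-9]*\)|.*/\1/p'
}

# Hypothetical helper: ratio of two timings, rounded to two decimal places.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

js=$(echo '[43359|    0|    0|  614] 100% =====>| 499.0s' | pass_count)
jit=$(echo '[8193|   0|   0|   0] 100% =====>| 133.3s' | pass_count)
echo "total passes: $((js + jit))"
echo "plain interpreter vs. Baseline Interpreter: $(ratio 762.4 499.0)x"
```

Against the 529.7-second eager run the same calculation comes out lower, so the comparison in the text only reproduces with the 499.0-second figure.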

Currently the code generator makes heavy use of POWER9-specific instructions, as well as VSX to make efficient use of the FPU. There are secondary goals of little-endian POWER8 and big-endian support (including pre-OpenPOWER so your G5 can play too), but these weren't necessary for the MVP, and we'd need someone actually willing to maintain those since I don't run Linux on my G5 or my POWER6 and I don't run any of my OpenPOWER systems big. While we welcome patches for them, they won't hold up primary support for POWER9 little-endian, which is currently the only "tier 1" platform. I note parenthetically this should also work on LE Power10 but as a matter of policy I'm not going to allow any special support for the architecture until IBM gets off their corporate rear end and actually releases the firmware source code. No free work for a chip that isn't!

You should be able to build a JIT-enabled Firefox 86 off of what's in the Github tree now, but my current goal is to pull it up to 91ESR so that it can be issued as patches against a stable branch of Firefox. These patches will be part of my ongoing future status updates for Firefox on OpenPOWER (yes, you'll need to build it yourself, though I'm pondering setting up a Fedora copr at some point). The next phase will be getting Baseline Compiler passing everything, which should be largely done already because of the existing Baseline Interpreter and Wasm support, and then the final Ion JIT stage, which still needs a lot of work. We'll most likely set up a separate tree for it so you can help (ahem). No promises right now but I'd like to see the completed JIT reach the Firefox source tree in time for the next ESR, which is Firefox 102. That's more than you can say for Chrome/Chromium, which so far has refused to accept OpenPOWER-specific work at all.

#ShowUsYourTalos


It's been a while since we did this, and even longer since we showed an actual Talos system, but here's Martin Kukač's Blackbird's new sexy case to contain its 8-core CPU, 32GB RAM and GeForce 210 GPU. The polished metal and open bottom, plus the vertical row of ports and power, make for a nice transitional look from the old Power Mac G5.

If you've got a well-coiffed OpenPOWER workstation to show off, post in the comments. Plus, somebody has to have an actual T2 or T2 Lite they're proud of, or I'm going to have to come up with a new hash tag.

Big and little POWER shouldn't just be endian


While the majority of OpenPOWER installations by this point are probably running little-endian, every single POWER chip runs big — big power usage, that is. While POWER9 is still performance-competitive with x86_64 and this situation continues to improve as more software gets better optimized, and there have been huge gains since POWER4/the PowerPC 970 in particular, POWER chips still run relatively hot and relatively hungry. Anandtech tried to normalize this for POWER8 systems by estimating transactions per watt; power measurements can be very imprecise and depend on more than just the system architecture, but even with that consideration the tested Tyan POWER8 in particular was outclassed by nearly a factor of three by a Xeon E5-2699. Possibly in response POWER9 is more aggressive with power savings than POWER8 and makes a lot of microarchitectural improvements, using 25% less juice for 50% more zip (so roughly a doubling of performance per watt), and Power10 supposedly improves on POWER9's performance per watt even more by at least 2.6 times according to IBM's figures.
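
The "roughly a doubling" claim is just the ratio of the two quoted changes. A minimal sketch of the arithmetic, with perf_per_watt as a made-up helper and both inputs expressed as multiples of the older chip:

```shell
#!/bin/sh
# Hypothetical helper: relative performance per watt, given relative
# performance and relative power (both as multiples of the old chip).
perf_per_watt() {
    awk -v perf="$1" -v power="$2" 'BEGIN { printf "%.2f\n", perf / power }'
}

# 50% more performance at 25% less power, per the figures quoted above.
perf_per_watt 1.50 0.75
```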

But IBM's playbook for improving perf per watt hasn't really changed. Either you're boosting performance by juicing the microarch, jimmying IPC with more instructions and more cores, or both, or you're trying to diminish power usage with heavier clock speed throttling or turning off cores. While shooting the die budget at lower-wattage pack-in accelerators is a clever hybrid approach, their application-specific nature also means they're rather less useful in typical situations than their marketing would allege (look at how little currently uses the gzip accelerator in every POWER9, for example). You can do a lot with strategies like these — AMD certainly does — but sooner or later you'll hit a wall somewhere, either against the particular limitations of the design you're working with or against the intrinsic physical limitations of making a hippo do gymnastics while eating fewer calories.

Apple Silicon has a lot of concerning issues with it from a free computing perspective, but its performance is impressive, and its performance per watt is jaw-dropping. A lot of this is the secret sauce in their microarch which ironically came from P.A. Semi, originally a Power ISA licensee, and some may be due to details of the on-board GPU. But a good portion is also due to the big core-little core approach largely pioneered with the ARM big.LITTLE Cortex A7 and used to great effect in the M1 series. After all, if you want to get the best of both worlds, make some of the cores use less power and give those cores tasks that require less oomph (efficiency or E-cores), reserving the heavy tasks for the big ones (power or P-cores). Intel thinks so too: Lakefield and Alder Lake both attempt the same sort of heterogeneous CPU topology for x86_64, and it's inconceivable that AMD isn't looking to make the same jump for their next iteration.

The chief issue with going that route is making sure that the cores are getting work commensurate with their capabilities. This is easy for Apple since they control the whole banana: macOS Quality of Service is all about doing just that (you'd think they would do something based on nice levels as well, but I guess all the sweet talk about being desktop Un*x went out the window somewhere around Mavericks). Linux added initial support for big.LITTLE with kernel 3.10 but it took years for other improvements to the Linux scheduler to make it meaningful. Intel made things worse for themselves in Lakefield and Alder Lake by using lower power Atom-based E-cores that didn't support AVX-512 (and the Tremont E-cores in Lakefield didn't even support AVX2, meaning such tasks couldn't be run by them at all). Rather than hinting Windows 11 or the internal hardware not to send AVX-512 code to the Gracemont E-cores, Alder Lake just doesn't support AVX-512, full stop, on any core. Kernel 5.13 supports Alder Lake, but kernel 5.15 has dawned and there is no specific Intel Thread Director support so far, though there is scheduler support for AArch64 E-cores that can't run 32-bit code. And Alder Lake is turning out to be very power-hungry, which calls some of the design into question, in addition to various compatibility issues when software unwittingly puts tasks on the E-cores that don't work as expected.

Still, the time is coming where Power ISA should start thinking about a big-little CPU, maybe even for Power11. We already have big cores (if IBM will ever get their heads out of their rear ends and release the firmware source), but we also have an already extant little OpenPOWER core: Microwatt. While Microwatt doesn't support everything that POWER9 or Power10's large cores do, it's still intended to be a fully compliant OpenPOWER core, and since the Linux kernel is already starting to cater to heterogeneous designs, a set of POWER8-compliant Microwatt E-cores could still execute on the same die along with a set of Power11 full fat P-cores. Add logic on-chip to move threads to the P-cores if they hit an instruction the E-cores don't support and you're already most of the way there with relatively minor changes to the Linux kernel.

What IBM — or any future OpenPOWER chip builder, though so far no one else is in the performance category — needs to avoid is what seems to be dooming Alder Lake: they've managed to hit the bad luck jackpot with a chip that not only uses more power but has more compatibility problems. Software updates will fix this issue somewhat but a little more forethought might have staved it off, and the apparent greater wattage draw should have been noticed long before it left the lab. But IBM has already shown wattage improvements over the last two generations and if the P- and E-core functionalities are made appropriately comparable, a big-little Power11 — with open firmware please! — could be a very compelling next upgrade for the next generation of Power-based workstations and servers. Apple has clearly demonstrated that highly efficient and powerful computing experiences are possible when hardware and software align. There's no reason OpenPOWER and Linux or *BSD can't do the same on open platforms.

Firefox 94 on POWER


Firefox 94 is released. I have little interest in the colourizer, but I do like about:unloads and EGL support on Linux for great WebGL justice even on X11 (I don't use the Wayland Wasteland), at least if you have an AMD/ATI card like the WX7100 Raptor sells as a BTO option. There are also various performance improvements and a fun feature where you can use a different Mozilla VPN server for each separate multi-account container, the latter probably being Firefox's most useful capability right now. The LTO-PGO patch is unchanged from Firefox 93 and the .mozconfigs are unchanged from Firefox 90.