Showing posts from April, 2021

OpenBSD 6.9

Great past few days for new OpenPOWER operating system support: now the newest release of OpenBSD is available as well to complement Ubuntu and Fedora updates earlier this week. Many improvements have landed to the big-endian powerpc64 port since its début in OpenBSD 6.8, most notably framebuffer support for the ASPEED BMC, workarounds for addressing AMD GPUs over PCIe, power-saving mode for POWER9, and IPMI on PowerNV systems (which is all Raptor-family machines and quite a few others). Changes general to all ports include an additional privacy guard for video devices (similar to the existing one for audio) among other webcam fixes, encrypted RAID-1, performance improvements to SMP, lots of additional hardware drivers, security fixes and updates to networking and crypto.

With this release, your BSD choices on OpenPOWER just got more solid between this and the mature FreeBSD port. Again, the real shame is that there's still no support for OpenPOWER in NetBSD.

Prefixed instructions and more in the OpenPOWER 64-bit ELF V2 ABI Specification

This is pretty nerdy, but you read this blog, so you have no room to talk. The ABI specification for 64-bit ELF v2, which is virtually everything running little-endian and many big-endian 64-bit Power systems, is now in draft for version 1.5. Although library interfaces are not in spec, the document defines a linking interface for executables and shared objects so that register usage and calling convention are consistent. Rarely if ever would such updates include breaking changes and the small version number increment implies no large shifts, but those of us who write JITs and compilers pay attention to these updates since they may yield new opportunities to generate better code, and they also occasionally shed light on new or upcoming instructions.

Besides incorporating errata from the previous version 1.4, and moving vector programming to a separate document, the most interesting change here is related to POWER10. We've mentioned prefixed instructions briefly here before, a new class of load/store operations available in POWER10 and PowerISA 3.1 that effectively create 64-bit instructions: a first half (prefix) containing up to 18 bits of a displacement, and a second half (suffix) containing the lower 16. This enables you to express as many as 34 bits of displacement for a memory address with an instruction like pld (as opposed to the 32-bit classical instruction ld with a 16-bit displacement); previously the maximum you could indicate was 26 bits, and only for instructions like unconditional branches that allowed it. What's more, the R bit in the prefix allows you to use the address of the prefix itself as the base against which the displacement is added, rather than having to have a general purpose register do it or (for non-local data) the dedicated TOC register. (Recall that in Power ISA the program counter is not a general purpose register.) This is actually a big deal for things like constant pools (embedding constants directly into many RISC-style instructions is generally unwieldy) because now you can just squirt local data right into whatever function fragment you're generating instead of having to keep track of it separately. This is mentioned briefly in the ISA 3.1 manual but the ABI spec makes it more prominent as a feature.
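To make the 18+16 split concrete, here is a minimal Python sketch of how a 34-bit displacement could be carved into its prefix and suffix portions and put back together. The function names are made up for illustration, and it assumes the displacement is treated as a signed value that is sign-extended before being added to the base, which is my reading of the ISA 3.1 description of pld and friends; it models only the displacement arithmetic, not the full instruction encoding.

```python
from typing import Tuple

D0_BITS = 18                      # displacement bits carried in the prefix
D1_BITS = 16                      # displacement bits carried in the suffix
DISP_BITS = D0_BITS + D1_BITS     # 34 bits in total
MASK64 = (1 << 64) - 1


def split_displacement(disp: int) -> Tuple[int, int]:
    """Split a signed 34-bit displacement into (prefix part, suffix part)."""
    lowest = -(1 << (DISP_BITS - 1))
    highest = (1 << (DISP_BITS - 1)) - 1
    if not lowest <= disp <= highest:
        raise ValueError("displacement does not fit in 34 signed bits")
    raw = disp & ((1 << DISP_BITS) - 1)          # two's complement, 34 bits
    return raw >> D1_BITS, raw & ((1 << D1_BITS) - 1)


def join_displacement(d0: int, d1: int) -> int:
    """Reassemble the two fields and sign-extend the 34-bit result."""
    raw = (d0 << D1_BITS) | d1
    if raw & (1 << (DISP_BITS - 1)):             # sign bit set
        raw -= 1 << DISP_BITS
    return raw


def effective_address(disp: int, base: int = 0,
                      prefix_addr: int = 0, r_bit: bool = False) -> int:
    """With R set, the base is the address of the prefix itself;
    otherwise it is whatever value the base register holds."""
    anchor = prefix_addr if r_bit else base
    return (anchor + disp) & MASK64
```

For example, split_displacement(0x12345) yields (0x1, 0x2345), and a constant stashed 8 bytes before a prefix at 0x1000 would be reached with effective_address(-8, prefix_addr=0x1000, r_bit=True).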

Unfortunately, various complexities in instruction dispatch make this less useful than it would appear to be. Prefixed instructions cannot be split over 64-byte (not bit!) instruction address boundaries or else an alignment exception occurs, which even if the OS handles it for you would be expensive. On the other hand, the CPU is clearly treating them as two 32-bit pieces because the prefix is always followed by the suffix regardless of the endianness (i.e., in little endian mode, the suffix is not in front of the prefix), and there are also debugging irregularities in that some suffixes are actually regular instructions that mean something different when paired with a prefix. These can be detected by looking at bit 6 of the prefix (not the suffix), which if set indicates the suffix is a valid instruction that the prefix changes the behaviour of, but one wonders if more changes are to come, and we don't need any more mscdfr (means something completely different for r0)-type situations in the ISA. A great example is pnop (yes, a prefixed no-operation instruction): you'd think the suffix would be ignored, and it mostly is, except if it's a branch instruction, rfebb, any context synchronizer other than isync, or a service processor attention instruction. The ISA 3.1 book benignly says, "This restriction eases hardware implementation complexity." Well, thanks a lot! Does your head hurt too?
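If you're emitting these from a JIT, the boundary rule and the bit 6 check boil down to a few lines of arithmetic. The following Python sketch is only illustrative: the helper names are hypothetical, bits are numbered IBM-style (bit 0 is the most significant bit of the 32-bit word), and the detail that every prefix carries primary opcode 1 comes from my reading of ISA 3.1 rather than from anything above. Padding with a single ordinary nop is one plausible way to dodge the boundary, not necessarily what any particular assembler does.

```python
PREFIX_PRIMARY_OPCODE = 1   # assumed: primary opcode (bits 0-5) of every prefix
BOUNDARY = 64               # bytes; a prefixed instruction must not straddle this
WORD = 4                    # bytes per 32-bit instruction word


def is_prefix(word: int) -> bool:
    """True if this 32-bit word has the primary opcode used by prefixes."""
    return (word >> 26) & 0x3F == PREFIX_PRIMARY_OPCODE


def suffix_is_modified_regular(prefix_word: int) -> bool:
    """True if bit 6 of the prefix is set, i.e. the suffix is an otherwise
    valid instruction whose behaviour the prefix changes."""
    return bool((prefix_word >> 25) & 1)     # IBM bit 6 == shift of 31 - 6


def crosses_boundary(addr: int) -> bool:
    """Would an 8-byte prefixed instruction at addr span a 64-byte boundary?"""
    if addr % WORD:
        raise ValueError("instruction addresses are word-aligned")
    return addr // BOUNDARY != (addr + 2 * WORD - 1) // BOUNDARY


def nop_padding(addr: int) -> int:
    """Bytes of padding (one ordinary nop) to emit before the prefix."""
    return WORD if crosses_boundary(addr) else 0
```

Since instructions are word-aligned, the only bad slot is the last word of each 64-byte block (addresses ending at offset 60), so at most one nop of padding is ever needed.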

Again, I'm not a fan of introducing variable length instructions into what was a fairly regular instruction set and there are many important gotchas which to me seemed avoidable, but the displacement features are welcome and make certain on-the-metal programming tasks easier. Always watch these deceptively boring documents closely because there are sometimes valuable signals in their changelogs. Unfortunately, until the situation with POWER10 and OMI gets worked out, this is of largely academic interest.

Fedora 34

And hot on the heels of Ubuntu 21.04 is the latest iteration of Fedora, version 34. Fedora is of particular interest here at Floodgap Orbiting HQ, not only because it serves as an early warning indicator for problems on OpenPOWER as one of the most cutting-edge distros, but it's also the distro I'm typing this blog post into and personally use on a daily basis. Now that F34 has hit release, F32 will become EOL in one month as usual.

Most of us are interested in Workstation-level changes and the most notable is GNOME 40, which introduces new and sure-to-be-controversial changes to Activities (though if this helps multiple display management I'll be a believer), separation in the dash of running apps and favourites (good), and additional shortcuts and gesture support. Other system-wide changes include transparent by-default zstd compression for btrfs, routing all audio through PipeWire (including PulseAudio, JACK and legacy ALSA), enabling systemd-oomd by default, updating to glibc 2.33 as well as gcc 11, llvm 12 and binutils 2.35, upgrading to Ruby 3.0, and (another controversial one) using Wayland by default for KDE Plasma users as well. Another nice minor change is that kernel firmware files will now be compressed by default, saving a bit of space.

On the OpenPOWER side, however, specific platform improvements are rather thin on the ground. 128-bit long double got deferred again, which I've been tracking since Fedora 30 (!!) as certain packages like MAME require it to build out of the box, and there has been little appetite to consider a Workstation-specific 4K page option.

Because I confidently expect GNOME 40 will break all my extensions, and some minor interval is required to ensure all the packages are built for ppc64le, our usual mini-review for F34 will follow in a couple weeks on both Blackbird and T2 systems. Meanwhile, read how it went with F33 in preparation.

Ubuntu 21.04 and the expanding Wayland Wasteland

It's not really Power-specific to be disenchanted with Wayland; there are lots of people who don't like it even on majority platforms like x86_64. I also think that, much like the residual disdain for systemd, a fair amount of the backlash comes from some profoundly unwarranted scope creep in the project. X11 has a lot of historical cruft in its lower reaches which deserved at minimum a solid refactor and I do appreciate Wayland's engineering improvements, but Wayland throws the baby out with the bathwater by putting way too much on the back of the compositor, and as far as claims over security and network transparency are concerned one man's security hole is always another one's convenience. Most of the Wayland developers just throw up their hands when confronted with some functionality that was fine in X11 and say "patches welcome" and "don't expect us to scratch your itch," and then wonder why people get cheesed off when stuff quits working. It would be less aggravating if the process were less headlong but such is the state of Linux desktop development where only established players with their own priorities have traction.

That said, the problem is somewhat more acute on OpenPOWER because of the lack of a libre GPU. Right now, if you don't trust AMD or Nvidia, your solitary choice is the on-board ASPEED BMC and that gives you a 2D framebuffer, period. (Even Kestrel won't fix that.) Performance used to be abysmal under Wayland and now is tolerable, though there are still various problems, and even performance with a GPU seemed to regress a little in Fedora 33. I still don't use it anyway because no current Wayland compositor will tell you what the front window is, nor does it seem any of them care about that, even though X11 facilitated this for literally decades. Again, why not run everything through XWayland by default, let Wayland-friendly apps opt out, and get the best of both worlds for (nearly) free? Why intentionally p*ss everyone off by telling them their working edge cases don't matter?

Nevertheless, the Wayland Wasteland expands with Ubuntu 21.04 "Hirsute Hippo" (release notes), which now also makes Wayland the default as it has been in Fedora for many versions now. Fedora allows you to opt out either by running startx manually (as I do) or, for those of you running gdm, by setting WaylandEnable=false in /etc/gdm/custom.conf, and this functionality will probably remain for as long as X is supported in Red Hat (I'm guessing end of support for RHEL 7, maybe 8). In Ubuntu currently you can do the same thing, but the file is /etc/gdm3/daemon.conf instead (or use the cog on the login screen, though the login screen would still come up in Wayland unless you set that flag). As before, little-endian OpenPOWER systems (which Ubuntu calls ppc64el) are officially only offered a Server build for download, but you can then convert it to Desktop.
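For those who would rather script the gdm change than open an editor, here is a rough Python sketch that flips the flag in the text of the config file. The helper name is made up, and it assumes the key lives under a [daemon] section, as it does in the stock gdm config files I've seen; it works on a string and deliberately leaves reading and writing the actual file (as root, with a backup) to you.

```python
def disable_wayland(conf_text: str) -> str:
    """Return conf_text with WaylandEnable=false set under [daemon]."""
    out = []
    in_daemon = False
    done = False
    for line in conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Leaving [daemon] without having seen the key: add it now.
            if in_daemon and not done:
                out.append("WaylandEnable=false")
                done = True
            in_daemon = stripped == "[daemon]"
        elif in_daemon and stripped.lstrip("#").strip().startswith("WaylandEnable"):
            # Replace the existing setting, commented out or not.
            if not done:
                out.append("WaylandEnable=false")
                done = True
            continue
        out.append(line)
    if in_daemon and not done:
        out.append("WaylandEnable=false")
        done = True
    if not done:
        # No [daemon] section at all: append one.
        out.extend(["[daemon]", "WaylandEnable=false"])
    return "\n".join(out) + "\n"
```

Run it over the contents of whichever file your distro uses, write the result back, and restart gdm (or just reboot) for it to take effect.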

Should you upgrade? If you're happy with X11 on your OpenPOWER system and the performance is good, maybe you should just stick with 20.04, which is a Long Term Support release (21.04 isn't) and will get updates until 2025. But if the future really is the Wayland Wasteland, at least getting more people stuck in the sand will mean some of these rough spots could get smoothed over, and a better software-only rendering pipeline would at least improve the firmware-free use case. In the meantime, hello, X11: you may be ugly and everyone says you smell bad, but you've never gotten in my way.

Will Kestrel become the better BMC?

Raptor Engineering's Microwatt-powered Kestrel BMC replacement is improving by leaps and bounds. The screenshot was from a Twitter post showing its internal Web server (like OpenBMC) and ability to update its own firmware (also like OpenBMC). And naturally everything is open-source, based on Zephyr.

But that's not the part that attracted me most. What really got me excited was a 10-second start time. Yup, you read that right: Kestrel is ready and able to bring up the system 10 seconds after power is applied, compared to a good couple minutes or more with the current ASPEED BMC running OpenBMC — and the majority of that ten seconds is programming the FPGA. While a couple minutes isn't a big deal on a server system, it's a real problem when it's a desktop, something I complained about way back in my Blackbird semi-review since in its role as a household HTPC it gets powered up and down a fair bit, and shaving off literal minutes of time to login is huge in that setting (as well as anywhere else OpenPOWER is being used as a "small" system).

OpenBMC on ASPEED is by no means perfect in other respects, either. Raptor claims upstream has been slow to incorporate improvements to the user interface and fan control (though the project disputes this). On the user side, this Raptor Talos II is pretty quiet but it's also a big EATX Supermicro chassis with two HSFs and multiple case fans; the Blackbird in its lithe mATX case has an annoying habit of spooling up and down even in a cool room, even with quieted fan assemblies. And it's always been the case with IBM and IBM-derived hardware that one bad fan can sometimes make the difference between booting or not (the ASMI in my personal POWER6 will refuse, and has refused, to power on the main CPUs if all the fans aren't fully operational).

However, some of the slowness may be due to the current requirement on Power ISA designs that the BMC be fully up before offering the virtual PNOR to the main CPUs. That apparently isn't necessary on x86, which allows some parts of bring-up there to occur in parallel. A partial hardware solution may be needed to mitigate that deficiency. OpenBMC does have some ideas, including converting OpenBMC's initramfs and UBI to zstd compression which shaves a few more seconds off, and some of the BMC forks out there have done more by jettisoning entire components judged generally unnecessary (but with corresponding impacts to flexibility, and none have gained significant traction).

It may well be that OpenBMC, because it needs to be all things to all deployments, is not the best place for firmware designed primarily for workstations. If so, then Kestrel (when it's fully "a thing") would be the next best option. However, that doesn't yield an obvious solution for the installed base like the three Raptor systems here, and installing Kestrel on an existing board is still not a trivial process (nor, at least as of this writing, has it been advertised to work on the T2 or T2 Lite). Raptor probably doesn't want to be in the board upgrade business either, so any envisioned Kestrel upgrade for older systems would need to be user-installable, and preferably without a soldering iron. I'm all thumbs with one myself and even more so when it's SMT.

Baseboard management is only one part of what makes a system liveable, but for desktop machines it's not an insignificant one. No matter what form it ends up taking, any improvements make a difference. And if a Kestrel board is the current way forward, at least we know it will certainly be as trustworthy as, if not more so than, the ASPEEDs we already use.

Firefox 88 on POWER

Firefox 88 is out. In addition to a bunch of new CSS properties, JavaScript is now supported in PDF files even within Firefox's own viewer, meaning there is no escape, and FTP is disabled, meaning you will need to use 78ESR (though you get two more weeks of ESR as a reprieve, since Firefox 89 has been delayed to allow UI code to further settle). I've long pondered doing a generic "cURL extension" that would reenable all sorts of protocols through a shim to either curl or libcurl; maybe it's time for it.

Fortunately Fx88 builds uneventfully as usual on OpenPOWER, though our PGO-LTO patches (apply to the tree with patch -p1) required a slight tweak to nsTerminator.cpp. Debug and optimized .mozconfigs are unchanged.

Also, an early milestone in the Firefox JavaScript JIT for OpenPOWER: Justin Hibbits merged my earlier jitpower work to a later tree (right now based on Firefox 86) and filled in the gaps with code from TenFourFox, and after some polishing up I did over the weekend, a JIT-enabled JavaScript shell now compiles on Fedora ppc64le. However, it immediately asserts, probably due to some missing definitions for register sets, and I'm sure there are many other assertions and lurking bugs to be fixed, but this is much further along than before. The fork is on GitHub for others who wish to contribute; I will probably decommission the old jitpower project soon since it is now superfluous. More to come.

FreeBSD 13 and Guix for OpenPOWER

After a bit of downtime, we're back. And cool stuff has happened in our absence, the most notable being additional improvements to the increasingly mature OpenPOWER port of FreeBSD. 13-RELEASE, among other changes, officially introduces the 64-bit little-endian port (previously exclusively big-endian, which is still supported), experimental radix MMU support for POWER9 (hashed page tables are of course supported everywhere), XIVE interrupt support on POWER9 (about 10% faster), optimized memcpy(), memmove() and like-minded standard functions, and many stability and performance improvements. The release notes say that "performance during bulk -a package building is at least 60% higher" which is very impressive. ISOs are available from their download server.

In addition, ppc64le support has been merged to the GNU Guix source tree, meaning with the next expected version 1.2.1 you'll hopefully be able to get a pre-built copy. It's been in development for several months and now it appears to be finally approaching reality. Like Guix the package manager, the GNU Guix System's most notable feature is its declarative service and package configuration, all on top of the GNU Shepherd init system and (right now) Linux 5.9. Currently there is still a reproducibility issue with gcc, rust is still at least somewhat experimental (which is relevant for librsvg) and many packages have not been tested. Still, since the Talos II and T2 Lite are GNU Respects Your Freedom systems, now you can run another GNU-free OS on them too and sooner than you think.