Better x86 emulation with Live CDs


Yes, build a better emulator and the world will beat a path to your door to run their old brown x86 binaries. Right now that emulator is QEMU. Even if you run Hangover for Windows binaries, it's still QEMU underneath (and Hangover only works with 4K page kernels currently, leaving us stock Fedora ppc64le users out), and if you want to run Linux x86 or x86_64 binaries on your OpenPOWER box, it's going to be QEMU in user mode for sure.

However, one of the downers of this approach is that you also need system libraries. Hangover embeds Wine to solve this problem (and builds them natively for ppc64le to boot), but QEMU user mode needs the actual shared libraries themselves for the target architecture. This often involves laboriously copying them from foreign architecture packages and can be a slow process of trying and failing to acquire them all, and you get to do it all over again when you upgrade. Instead, just use a live CD/DVD as your library source: you can keep everything in one place (often using less space), and upgrading becomes merely a matter of downloading a new live image.

My real-world use for this is running the old brown Palm OS Emulator, which I've been playing with for retrocomputing purposes. Although the emulator source code is available, it's heavily 32-bit and I've had to make some really scary hacks to the files; I'm not sure I'll ever get it compiling on 64-bit Linux. But there is a pre-built 32-bit i386 binary. I've got a Palm m515 ROM, a death wish and too little to do after work. Let's boot this sucker up. Note that in these examples I'm "still" using QEMU 5.2.0; 6.1.0 had various problems and crashed at one point, which I haven't investigated in detail. You might consider building QEMU 5.2.0 in a separate standalone directory (juiced or otherwise) for this purpose.
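A minimal sketch of such a standalone build, assuming you only want the 32-bit and 64-bit x86 user-mode emulators (add targets to taste):

curl -LO https://download.qemu.org/qemu-5.2.0.tar.xz
tar xf qemu-5.2.0.tar.xz && cd qemu-5.2.0
./configure --target-list=i386-linux-user,x86_64-linux-user
make -j24 # or as you like

The binaries end up in build/, which is where the qemu-i386 paths below point.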

We'll use the Debian live CD in this article, though any suitable live distro should do. Since POSE is i386, we'll need that particular architecture image. Download it and mount the ISO (which appears as d-live 11.0.0 gn i386 as of this writing).

The actual filesystem during normal operation is a squashfs image in the live directory. You can mount this with mount, but I use squashfuse for convenience. Similarly, while you could mount the ISO itself every time you need to do this, I just copy the squashfs image out and save a couple hundred megabytes. Then, from where you put it, make sure you have an ~/mnt folder (mkdir ~/mnt), and then: squashfuse debian-11-i386.squashfs ~/mnt
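Putting that together, the one-time setup looks something like this (the ISO and squashfs filenames here are assumptions; check what your download actually contains):

mkdir -p /tmp/iso ~/mnt
sudo mount -o loop -r debian-live-11.0.0-i386-gnome.iso /tmp/iso
cp /tmp/iso/live/filesystem.squashfs ~/debian-11-i386.squashfs # keep just the squashfs
sudo umount /tmp/iso
squashfuse ~/debian-11-i386.squashfs ~/mnt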

Let's test it on Captain Solo. After all, we've just mounted a squashfs image with a whole mess of alien binaries, so:

% ~/src/qemu-5.2.0/build/qemu-i386 -L ~/mnt ~/mnt/bin/uname -m
i686

And now we can return Luke Skywalker to the Emperor: ~/src/qemu-5.2.0/build/qemu-i386 -L ~/mnt pose

Here it is, running a Palm image using an m515 ROM I copied over from my Mac.

However, uname and pose are both single binaries each in a single place. Let's pick a more complex example with resources, assets and other loadable components like a game. I happen to be a fan of the old Monolith anime-style shooter Shogo: Mobile Armor Division, which originated on Windows (GOG still sells it) but was also ported to the classic Mac OS and Linux by Hyperion. (The soundtrack CD is wonderful.) I own a boxed physical copy not only of the Windows release but also the Mac version, which is quite hard to find, and the retail Linux version is reportedly even rarer. While there have been promising recent developments with open-source versions of the LithTech engine, Shogo was the first LithTech game and apparently used a very old version which doesn't yet function. There is, however, a widely available Linux demo.

The demo which you download from there appears to just be a large i386 binary. But if you run it using the method above, you'll only get a weird error trying to run another binary from a temporary mount point. That's because it's actually an ISO image with an i386 ELF mounter in the header, so rename it to shogo.iso and mount it yourself. On my system GNOME puts it in /run/media/spectre/ISOIMAGE.
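If you'd rather mount it by hand than let GNOME do it, something like this works (the demo's downloaded filename is a placeholder; substitute your own mount point for /run/media/spectre/ISOIMAGE in the commands below):

mv shogo-demo.bin shogo.iso # whatever the download was actually called
mkdir -p ~/mnt-shogo
sudo mount -o loop -r shogo.iso ~/mnt-shogo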

To set options before bringing up the main game, Shogo uses a custom launcher (on all platforms), but you can't just run it directly because Debian doesn't have all the libraries the launcher wants:

% ~/src/qemu-5.2.0/build/qemu-i386 -L ~/mnt /run/media/spectre/ISOIMAGE/shogolauncher
/run/media/spectre/ISOIMAGE/shogolauncher: error while loading shared libraries: libgtk-1.2.so.0: cannot open shared object file: No such file or directory

You could try to scare up a copy of that impossibly old version of GTK, but the Loki_Compat directory of the Shogo ISO already contains the desired shared object. (Not Loki Entertainment: this Loki, a former Monolith employee.) You can't give qemu-i386 multiple -L options, but you can give environment variables to its ELF loader, so we'll just specify a custom LD_LIBRARY_PATH. For the next couple of steps it will be necessary to actually be in the mounted Shogo image so the game can find all of its data files, thusly:

% cd /run/media/spectre/ISOIMAGE
% ~/src/qemu-5.2.0/build/qemu-i386 -L ~/mnt -E LD_LIBRARY_PATH="/run/media/spectre/ISOIMAGE/Loki_Compat" ./shogolauncher

We've bypassed the shell script that actually handles the entire startup process, so when you select your options, instead of starting the game it will dump a command line to execute to the screen. This is convenient! To start out with, I picked a windowed 640x480 resolution using the software renderer and disabled sound (it doesn't work anyway, probably due to the age of the libraries it was developed with), got the command line and ran that through QEMU. Boom.

And, as long as you crank the detail level down to low from the main menu, it's playable!

A lot doesn't work: it doesn't save games because you're running it out of an ISO (copy it elsewhere if you want to); there is no sound, probably, as stated, due to the age of the libraries (the game itself dates to 1998 and the Linux port to 2001); and don't even think about trying to launch it using OpenGL (it bombs out with errors). There are also occasional graphics glitches and clipping problems, one of which makes it impossible to complete the level, though I don't know how much of this was their bug versus QEMU's.

Performance isn't revolutionary, either for POSE or for Shogo. However, keep in mind that all the system libraries are also running under emulation (only syscalls are native), and with Shogo in particular we've hobbled it even further by making the game render everything entirely in software. With that in mind, the fact the framerate is decent enough to actually play it is really rather remarkable. Moreover, I can certainly test things in POSE without much fuss and it's a lot more convenient than firing up a Mac OS 9 instance to run POSE there.

Best of all, when you're done running alien inferior binaries, just umount ~/mnt and it all goes away. When Debian 12 appears, just replace the squashfs image. Easy as pie! A much more straightforward way to run these sorts of programs when you need to.

A footnote: in an earlier article we discussed HQEMU. This was a heavily modified fork of QEMU that used LLVM to recompile code on the fly for substantially faster speeds at the occasional cost of stability. Unfortunately it has not received further updates in several years, and even after I hacked it to build again on Fedora 34 with the pre-built LLVM 6 it is known to work with, it simply hangs. Like I said, for now it's stock QEMU or bust.

Firefox 92 on POWER


Firefox 92 is out. Alongside some solid DOM and CSS improvements, the most interesting bug fix I noticed was a patch for open alerts slowing down other tabs in the same process. In the absence of a JIT we rely heavily on Firefox's multiprocess capabilities to make the most of our multicore beasts, and this apparently benefits (among others, but in particular) the Google sites we unfortunately have to use in these less-free times. I should note for the record that on this dual-8 Talos II (64 hardware threads) I have dom.ipc.processCount modestly increased to 12 from the default of 8 to take a little more advantage of the system when idle, which also takes down fewer tabs in the rare cases when a content process bombs out. The delay in posting this was waiting for the firefox-appmenu patches, but I decided to just build it now and add those in later. The .mozconfigs and LTO-PGO patches are unchanged from Firefox 90/91.

Meanwhile, in OpenPOWER JIT progress, I'm about halfway through getting the Wasm tests to pass, though I'm currently hung up on a memory corruption bug while testing Wasm garbage collection. It's our bug; it doesn't happen with the C++ interpreter, but unfortunately like most GC bugs it requires hitting it "just right" to find the faulty code. When it all passes, we'll pull everything up to 91ESR for the MVP, and you can try building it. If you want this to happen faster, please pitch in and help.

It's not just OMI that's the trouble with POWER10


Now that POWER10 is out, the gloves (or at least the NDA) are off. Raptor Computing had been careful not to explicitly say what about POWER10 they didn't like and considered non-free, though we note that they pointed to our (and, credit where credit's due, Hugo Landau's) article on OMI's closed firmware multiple times. After all, when even your RAM has firmware, even your RAM can get pwned.

Well, it looks like they're no longer so constrained. In a nerdily juicy Twitter thread, Raptor points out that there's something else iffy with POWER10: unlike the issue with OMI firmware, which is not intrinsically part of the processor (the missing piece is the on-DIMM memory controller), this additional concern is the firmware for the on-chip "PPE I/O processor." It's 16 kilowords of binary blob. The source code isn't available.

It's not clear what this component does exactly, either. The commit messages, such as they are, make reference to a Synopsys part, so my guess is it manages the PCIe bus. Although PPE would imply a Power Processing Element (a la Cell or Xenon), the firmware code does not obviously look like Power ISA instructions at first glance.

In any case, Raptor's concern is justified: on POWER9, you can audit everything, but on POWER10, you have to trust the firmware blobs for RAM and I/O. That's an unacceptable step down in transparency for OpenPOWER, and one we hope IBM rectifies pronto. Please release the source.

First POWER10 machine announced


IBM turns up the volume to 10 (and their server numbers to four digits) with the Power E1080 server, the launch system for POWER10. POWER10 is a 7nm chip fabbed by Samsung with up to 15 SMT-8 cores (a 16th core is disabled for yield) for up to 120 threads per chip. IBM bills POWER10 as having 2.5 times more performance per core than Intel Xeon Platinum (based on an HPE Superdome system running Xeon Platinum 8380H parts), 2.5 times the AES crypto performance per core of POWER9 (no doubt due to quadruple the crypto engines present), five times "AI inferencing per socket" (whatever that means) over Power E980 via the POWER10's matrix math and AI accelerators, and 33% less power usage than the E980 for the same workload. AIX, Linux and IBM i are all supported.

IBM targets its launch hardware at its big institutional customers, and true to form the E1080 can scale up to four nodes, each with four processors, for a capacity of 240 cores (that's 1,920 hardware threads for those of you keeping score at home). The datasheet lists 10, 12 and 15 core parts as available, with asymmetric 48/32K L1 and 2MB of L2 cache per core. Chips are divided into two hemispheres (the 15-core version has 7 and 8 core hemispheres) sharing a pool of 8MB L3 cache per core per side, so the largest 15 core part has 120MB of L3 cache split into shared 64MB and 56MB pools respectively. This is somewhat different from POWER9, which divvies up L3 per two-core slice (but recall that the lowest binned 4- and 8-core parts, like the ones in most Raptor systems, fuse off the other cores in a slice such that each active core gets the L3 all to itself). Compared with Telum's virtual L3 approach, POWER10's cache strategy seems like an interim step to what we suspect POWER11 might have.

I/O doesn't disappoint, as you would expect. Each node has 8 PCIe Gen5 slots on board and can add up to four expansion drawers, each adding an additional twelve slots. You do the math for a full four-node behemoth.

However, memory and especially OMI is what we've been watching most closely with POWER10 because OMI DIMMs have closed-source firmware. Unlike the DDIMMs announced at the 2019 OpenPOWER Summit, the E1080 datasheet specifies buffered DDR4 CDIMMs. This appears to be simply a different form factor; the datasheet intro blurb indicates they are also OMI-based. Each 4-processor node can hold 16TB of RAM for 64TB in the largest 16-socket configuration. IBM lists no directly-attached RAM option currently.

IBM is taking orders now and shipments are expected to begin before the end of September. Now that POWER10 is actually a physical product, let's hope there's news on the horizon about a truly open Open Memory Interface in the meantime. Just keep in mind that if you have to ask how much this machine costs you clearly can't afford it, and IBM doesn't do retail sales anyway.

Cache splash in Telum means seventh heaven for POWER11?


AnandTech has a great analysis of IBM's new z/Architecture mainframe processor Telum, the successor to z15 (so you could consider it the "z16" if you like) scheduled for 2022. The most noteworthy part of that article is Telum's unusual approach to cache.

Most conventional CPUs (keeping in mind mainframes are hardly conventional, at least in terms of system design), including OpenPOWER chips, have multiple levels of cache; so did z15. L1 cache (divided into instruction and data) is private to the core and closest to it, usually measured in double-digit kilobytes on contemporary designs. It then fans out into L2, which is also usually private to an individual core and in the triple-digit kilobyte range, and then some level of L3 (plus even L4) cache which is often shared by an entire processor and measured in megabytes. Cache size and how cache entries may be placed (i.e., associativity) is a tradeoff between the latency of searching a larger cache, die space considerations and power usage, versus the performance advantages of fewer cache misses and reduced use of slower peripheral memory.

While every design has some amount of L1, there certainly have been processors that dispensed with other tiers of cache. Most of Hewlett-Packard's late lamented PA-RISC architecture had no L2 cache at all, with the L1 cache being unusually large in some units (the 1997 PA-8200 had 4MB of total L1, 2MB each for data and instructions). Closer to home, the PowerPC 970 "G5" (derived from the POWER4) carried no L3; the 2005 dual-core 970MP, used in the Power Mac G5 Quad, IBM IntelliStation POWER 185 and YDL PowerStation, instead had 1MB of L2 per core, which was on the large side for that era. Conversely, the Intel Itanium 2 could have up to 64MB of L4 cache; Haswell CPUs with GT3e Iris Pro Graphics can use the integrated GPU's eDRAM as an L3 victim cache for the same purpose as an L4, though this feature was removed in Skylake. However, the Sforza POWER9 in Raptor workstations is more typical of modern chips with three levels of cache: the dual-8 02CY649 in this machine I'm typing on has 32/32KB L1, 512KB L2 and 10MB L3 for each of the eight CPU cores. In contrast, AMD Zen 3 uses a shared 32MB L3 between up to eight cores, with fewer cores splitting the pot in more upmarket parts.

With money and power consumption being less or little object in mainframes, however, large multi-level caches rule the day directly. The IBM z15 processor "drawer" (there are five drawers in a typical system) divides itself into four Compute Processors, each CP containing 12 cores with 128/128K L1 (compare to Apple M1 with 192/192K) and split 4MB/4MB L2 per core paired with 256MB of shared L3, overseen by a single System Controller which provides a whopping 960MB of shared L4. This gives it the kind of throughput and redundancy expected by IBM's large institutional customers who depend on transaction processing reliability. The SC services the four CPs almost like an old-school northbridge, but to L4 cache instead of main RAM.

Telum could have doubled down on this the way z15 literally doubled down on z14 (twice the L3, nearly half again as much L4), but instead it dispenses with L3 and L4 altogether. L1 jumps to 256/256K, and in shades of PA-RISC, L2 balloons to 32MB per core, with eight cores per chip. Let's zoom in on the die.

The 7nm 530mm² die shows the L2 cache in the centre of the eight cores, which is already a tipoff as to how IBM's arranged it: cores can reach into other cores' cache. If a cache line gets evicted from a core's L2 and the core can find space for it within another core, then the cache line goes to that core's L2 and is marked as L3. This process is not free and does incur more latency than a traditional L3 when an L3 line stored elsewhere must be retrieved, but the ample L2 makes this condition less frequent, and in the less common case where a core needs data that another core's eviction already placed in its L2 as L3, it can simply adopt that line. Overall, this strategy means better utilization of cache that adapts better to more diverse workloads, because the large total L2 space can be flexibly redirected as "virtual L3" to cores with greater bandwidth demands.

It doesn't stop there, though, because Telum has another trick for "virtual L4." Recall that the z15 uses five drawers in a typical system; each drawer has an SC that maintains the L4 cache. Telum is two chips to a package, with four packages to a unit (the equivalent of a z15 "drawer") and four units to a system. If you can reach into other cores' L2 to use them as L3, then it's a simple conceptual leap to reach into other chips (even in different units) and use their L2 as L4. Again, latency jumps over a more traditional L4 approach, but this means a typical Telum system theoretically has a total of 8GB of L2 that could be redirected as L4 (32 chips of eight 32MB L2s is 8192MB, or 7936MB if you don't count the requesting chip's own 256MB). With 256 cores in this system, there's bound to be room somewhere faster than main memory.

What makes this interesting for OpenPOWER is that z/Architecture and POWER naturally tend to cross-pollinate. (History favours POWER, too. POWER chips already took over IBM i first with the RS64-based A35 and finally with the eCLipz project; IBM AS/400 a/k/a i5/OS a/k/a i hardware used to be its own bespoke AS/400 architecture.) z/Architecture is decidedly not Power ISA but some microarchitectural features are sometimes shared, such as POWER6 and z10, which emerged from a common development process and as a result had similar fabrication technologies, execution units, floating-point units, busses and pipelines.

POWER10 is almost certainly already taped out if IBM is going to be anywhere close to a Q4 2021 release, so whatever influence Telum had on its creation has already happened. But Telum at the microarchitecture level sure looks more like POWER than z15 did: there is no more CP/SC division but rather general purpose cores in a NUMA topology more like POWER9, more typical PCIe controllers (in this case PCIe 5.0) for I/O and more reliance on specialized pack-in accelerators (Telum's headline feature is an AI accelerator for SIMD, matrix math and fast activation function computation; no doubt some of its design started with POWER10's own accelerator). Frankly, that reads like a recipe for POWER11. While a dual-CPU POWER11 workstation might not have much need for L4, the "virtual L3" strategy could really pay off for the variety of workloads workstations and non-mainframe servers have to do, and on a four or eight-socket server, the availability of virtual L4 starts outweighing any disadvantage in latency.

The commonalities should not be overstated, as Telum is also "only" SMT-2 (versus SMT-4 or SMT-8 for POWER9 and POWER10) and the deep 5GHz-plus pipeline the reduced SMT count facilitates doesn't match up with the shorter pipeline and lower clockspeeds on current POWER generations. But that's just part of the chips being customized for their respective markets, and if IBM can pull this trick off for z/Architecture it's a short jump to making the technology work on POWER. Assuming we don't have OMI to worry about by then, that could really be something to look forward to in future processor generations, and a genuinely unique advance for the architecture.

Kernel 5.14


Version 5.14 of the Linux kernel has landed. There's not much in PowerPC land this time around except for a few bug fixes, one of which repairs an issue that can hit certain hashtable-based CPUs (though I don't believe the POWER9 in HPTE mode is known to be affected). There are, however, some privacy-related features: memfd_secret(), which creates a tract of memory even a compromised kernel can't look into; a new ioctl for ext4 filesystems to prevent information leaks; and core-based scheduling, which allows restrictions on what processes may share cores as extra insurance against Spectre-type attacks (at the cost of less effective utilization, so this is largely of interest to hosting providers rather than what you run on your own box). Other new features of note include burstable "Completely Fair Scheduling" to allow a task group to roll over unused CPU quota under certain conditions, a cgroup "kill button" feature and some initial infrastructure for supporting signed BPF programs. Expect this version to appear in Fedora and other "leading edge" distributions soon.

OpenPOWER Firefox JIT update


As of this afternoon, the Baseline Interpreter-only form of the OpenPOWER JIT (64-bit little-endian) now passes all of the JIT tests except for the Wasm ones, which are being actively worked on. Remember, this is just the first of the three phases and we need all three for the full benefit, but it already yields a noticeable boost in my internal tests over the C++ interpreter. The MVP is Baseline Interpreter and Wasm, so once it passes the Wasm tests as well, it's time to pull it current with 91ESR. You can help.

Debian 11


Debian 11 bullseye is officially released, the latest stable version and the "other white meat" of the two big distros I suspect are commonly used on OpenPOWER workstations (Fedora being the other, and Ubuntu third). Little-endian 64-bit Power ISA (ppc64el) has been a supported architecture for Debian since 8 jessie. The component versions are conservative but solid, which is what you're looking for if you run Debian stable: kernel 5.10, GNOME 3.38, KDE Plasma 5.20, LXDE 11, LXQt 0.16, MATE 1.24, and Xfce 4.16, plus gcc 10.2 and LLVM 9 (with Clang 11). ISOs are already available on the mirrors. If you've updated, post your impressions in the comments.

Firefox 91 on POWER fur the fowk


Firefox 91 is out. Yes, it further improves cookie isolation and cleanup, has faster paint scheduling (noticeably, in some cases), and new JavaScript and DOM support. But for my money, the biggest news is the Scots support: aye, laddie, noo ye kin stravaig the wab lik Robert Burns did. We've waited tae lang fur this.

Anyway, Firefox 91 builds oot o the kist oa, er, Firefox 91 builds out of the box on OpenPOWER using the same .mozconfigs as Firefox 90; I made a wee change to the PGO-LTO patch since I messed up the diff the last time and didn't notice. The crypto issues in Fx90 are fixed in this release.

Meanwhile, the OpenPOWER JIT is now passing all but a handful of the basic tests in Baseline Interpreter mode, and some amount of Wasm, though this isn't nearly as far along. Ye kin hulp.

Tonight's game on OpenPOWER: System Shock Enhanced Edition


Yeah, I know we're doing a lot of FPSes in this series. It's what I tend to play, so deal. Tonight we'll be playing System Shock, the classic hacker-shooter (seems appropriate), courtesy of Shockolate, which adds higher resolutions, better controls, mouselook and OpenGL support. Our drug dealers at GoG, who don't pay us a cent for this kind of shameless plug and really ought to, make the game files easily available as System Shock Enhanced Edition. However, you can also use the DOS or Windows 95 CD-ROM; I tested with both. (I'll talk about the Macintosh release in a moment.)

Shockolate requires CMake and SDL2, and FluidSynth is strongly advised. Don't let Shockolate build with its bundled versions: edit CMakeLists.txt and change all "BUNDLED" libraries to "ON" (don't forget the quote marks). Once set, building should work out of the box (tested on Fedora 34):

mkdir build
cd build
cmake ..
make -j24 # or as you like
cd ..
ln -s build/systemshock systemshock

(The last command is to make running the binary a little more convenient.)
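If you'd rather script the CMakeLists.txt edit described above than do it by hand, a sed one-liner along these lines should work (this assumes the options are spelled exactly as "BUNDLED"; eyeball the result afterwards):

sed -i 's/"BUNDLED"/"ON"/g' CMakeLists.txt # then re-run cmake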

Now we need to provide the resources. For FluidSynth, you'll need a soundfont (I used the default that comes with Fedora's package). If you have the DOS/Windows CD-ROM, insert it now. We will assume it is mounted at /run/media/censored/EA.

mkdir res
cd res
ln -s /usr/share/soundfonts/default.sf2 default.sf2
cp -R /run/media/censored/EA/hd/data .
cp -R /run/media/censored/EA/hd/sound .
chmod -R +w . # if copying from CD makes things read only
cd data
rm -f intro.res
rm -f objprop.dat
cp /run/media/censored/EA/cdrom/data/* .
cd ../..

Then start the game with ./systemshock. The resolutions and choice of renderer (software or OpenGL) are set from the in-gameplay menu (press ESC). Shockolate also implements WASD motion (as well as the classic arrow keys) and F to toggle mouselook. Note that OpenGL is somewhat darker than software mode. It's not clear if this is actually a bug.

Playing System Shock Enhanced Edition in Shockolate is really just a more convenient way to get the DOS assets, since Shockolate uses those and not any of the patches (more about this in a second); gameplay and features are the same. Also, GOG only distributes it as a Windows installer and the file structure is a bit different. Use innoextract to break the installer EXE apart into a separate directory (see the sketch after the listing below) and delete everything but sshock.kpf, which is a cloaked ZIP archive containing the game assets. In your Shockolate source directory (note that this also creates res/, so if you did the steps above delete it first),

mkdir ssee
cd ssee
unzip /path/to/sshock.kpf
cd ..
mkdir res
mv ssee/res/pc/hd/data res
cp ssee/res/pc/cdrom/data/* res/data/
mv ssee/res/pc/hd/sound res
rm -rf ssee # if you want
ln -s /usr/share/soundfonts/default.sf2 res/default.sf2

Then start the game with ./systemshock.
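As for the innoextract step mentioned above, usage is roughly like this (the installer's filename is a placeholder; check your GOG download):

innoextract -d ssee_installer setup_system_shock_enhanced_edition.exe
find ssee_installer -name sshock.kpf # this is the only file you need to keep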

Oddly, although Shockolate was based on the (IMHO) superior Power Mac release, it doesn't seem to properly support its higher-resolution assets (SSEE does and includes a converted set, but the source for that, unlike Strife, isn't currently available). I actually own this version too. One rather unique reason to own it is that the cutscenes and audio files are all playable in QuickTime, so if you don't feel like slogging through the entire game you can just listen to the audio logs or go straight to the ending using a Mac emulator. However, you need to do a little song and dance to mount the HFS volume on Linux (as root):

losetup /dev/loop0 /dev/sr0 # or where your drive is
partx -av /dev/loop0

This will respond with something like

partition: none, disk: /dev/loop0, lower: 0, upper: 0
/dev/loop0: partition table type 'mac' detected
range recount: max partno=2, lower=0, upper=0
/dev/loop0: partition #1 added
/dev/loop0: partition #2 added

and you should see it mount in your desktop environment (note that many applications won't understand the resource fork). Do losetup -D before ejecting the physical disc. As a parenthetical note, since SSEE is presumably derived from the GPL-released Mac source code, you would think it, too, would be GPL. But I'm uncertain of the exact history there.

Salt the fries.

Rest in peace, Itanium


Intel has now ceased official support for IA-64 and will make no more Itanium processors (which matters really only to HPE). It's sad to see another processor architecture disappear, even one that seemed to be as universally unloved as this one, so consider this post a brief epitaph for yet another drop in computing diversity. If you want choices other than the cold, sterile x86-ARM duality, they won't happen if you keep buying the same old stuff.

Firefox 90 on POWER (and a JIT progress report)


Firefox 90 is out, offering expanded and improved software WebRender (not really a problem if you've got a supported GPU as most of us in OpenPOWER land do, though), an enhanced SmartBlock which ups the arms race with Facebook, and private fields and methods in JavaScript among other platform updates. FTP is now officially and completely gone (and really should be part of registerProtocolHandler as Gopher is), but at least you can still use compact layout for tabs.

Unfortunately, a promising OpenPOWER-specific update for Fx90 bombed. Ordinarily I would have noticed this with my periodic smoke-test builds, but I've been trying to continue work on the JavaScript JIT in my not-so-copious spare time (more on that in a moment), so I didn't notice this until I built Fx90 and no TLS connection would work (they all abort with SSL_ERROR_BAD_SERVER). I discussed this with Dan Horák and the official Fedora build of Firefox seemed to work just fine, including when I did a local fedpkg build. After a few test builds over the last several days I determined the difference was that the Fedora Firefox package is built with --with-system-nss to use the NSS included with Fedora, so it wasn't using whatever was included with Firefox.

Going to the NSS tree I found bug 1566124, an implementation of AES-GCM acceleration for Power ISA. (Interestingly, I tried to write an implementation of it last year for TenFourFox FPR22 but abandoned it since it would be riskier and not much faster with the more limited facilities on 32-bit PowerPC.) This was, to be blunt, poorly tested and Fedora's NSS maintainer indicated he would disable it in the shipping library. Thus, if you use Fedora's included NSS, it works, and if you use the included version in the Firefox tree (based on NSS 3.66), it won't. The fixes are in NSS 3.67, which is part of Firefox 91; they never landed on Fx90.

The two fixes are small (to security/nss/lib/freebl/ppc-gcm-wrap.c and security/nss/lib/freebl/ppc-gcm.s), so if you're building from source anyway the simplest and highest-performance option is just to include them. (And now that it's working, I do have to tip my hat to the author: the implementation is about 20 times faster.) Alternatively, Fedora 34 builders can still just add --with-system-nss to their .mozconfig as long as you have nspr-devel installed, or a third workaround is to set NSS_DISABLE_PPC_GHASH=1 before starting Firefox, which disables the faulty code at runtime. In Firefox 91 this whole issue should be fixed. I'm glad the patch is done and working, but it never should have been committed in its original state without passing the test suite.
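For the record, the runtime workaround is just an environment variable when you launch your self-built browser (the objdir path is whatever your build produced):

NSS_DISABLE_PPC_GHASH=1 ./obj/dist/bin/firefox # disables the faulty accelerated GHASH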

Another issue we have a better workaround for is bug 1713968, which causes errors building JavaScript with gcc. The reason that Fedora wasn't having any problem doing so is its rather voluminous generated .mozconfig that, amongst other things, uses -fpermissive. This is a better workaround than minor hacks to the source, so that is now in the .mozconfigs I'm using. I also did a minor tweak to the PGO-LTO patch so that it applies cleanly. With that, here are my current configurations:

Debug

export CC=/usr/bin/gcc
export CXX=/usr/bin/g++

mk_add_options MOZ_MAKE_FLAGS="-j24" # as you like
ac_add_options --enable-application=browser
ac_add_options --enable-optimize="-Og -mcpu=power9 -fpermissive"
ac_add_options --enable-debug
ac_add_options --enable-linker=bfd

export GN=/home/censored/bin/gn # if you have it

PGO-LTO Optimized

export CC=/usr/bin/gcc
export CXX=/usr/bin/g++

mk_add_options MOZ_MAKE_FLAGS="-j24" # as you like
ac_add_options --enable-application=browser
ac_add_options --enable-optimize="-O3 -mcpu=power9 -fpermissive"
ac_add_options --enable-release
ac_add_options --enable-linker=bfd
ac_add_options --enable-lto=full
ac_add_options MOZ_PGO=1

export GN=/home/censored/bin/gn # if you have it
export RUSTC_OPT_LEVEL=2

So, JavaScript. Since our last progress report our current implementation of the Firefox JavaScript JIT (the minimum viable product of which will be Baseline Interpreter + Wasm) is now able to run scripts of significant complexity, but it's still mostly a one-man show and I'm currently struggling with an issue fixing certain optimized calls to self-hosted scripts (notably anything that calls RegExp.prototype.* functions: it goes into an infinite loop and hits the recursion limit). There hasn't been any activity the last week because I've preferred not to commit speculative work yet, plus the time I wasted tracking down the problem above with TLS. The MVP will be considered "V" when it can pass the JavaScript JIT and conformance test suites and it's probably a third of the way there. You can help. Ask in the comments if you're interested in contributing. We'll get this done sooner or later because it's something I'm motivated to finish, but it will go a lot faster if folks pitch in.
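If you want to kick the tires, the stock Mozilla test harnesses live in the tree; assuming a built js shell at obj/dist/bin/js, the JIT and conformance suites mentioned above are run roughly like this (nothing here is OpenPOWER-specific):

python3 js/src/jit-test/jit_test.py obj/dist/bin/js
python3 js/src/tests/jstests.py obj/dist/bin/js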

Intel might fab your next POWER chip


The Wall Street Journal is reporting that Intel is in talks to buy GlobalFoundries' ("GloFo") manufacturing assets for potentially as much as US$30b (via Reuters). Much of the amusement on the Internet tech sector side is because GloFo started life in 2008 as the divestiture of AMD's manufacturing arm, an admittedly entertaining historical irony, but GloFo is important to us in OpenPOWER-land too because they manufacture the POWER9.

GloFo's fabbing of POWER and OpenPOWER comes from a July 2015 deal with IBM where IBM basically paid GloFo US$1.5b to take and operate their U.S.-based chip manufacturing business unit. In return, GloFo would provide chips to IBM through 2025. The POWER9 in your servers and workstations was initially manufactured at IBM's former East Fishkill, NY plant, which became GlobalFoundries Fab 10 and was sold to ON Semiconductor in 2019; it is now produced at GloFo Fab 9, which was IBM's previous plant in Essex Junction, VT.

This arrangement was odd even at the time, and admittedly wouldn't have been IBM's first choice, but semiconductor production has obvious national security implications and regulators wouldn't let it just sell off or shut down its entire domestic manufacturing arm. Predictably the deal doesn't seem to have gone well. In a lawsuit last month, IBM alleged that by fall 2015 GlobalFoundries told IBM they wouldn't proceed further on development of a 10nm process and instead would move to 7nm, jeopardizing the POWER9, which was originally supposed to be produced at 10nm. IBM claims that despite what it views as a contract breach, it nevertheless continued the payout to bring up a 7nm process instead by Q3 2018, which didn't happen either. At that point IBM says GloFo asked it for another $1.5b, which IBM refused, and GloFo suspended further development. The POWER9 was eventually manufactured with a 14nm FinFET process instead.

Intel's interest in GloFo is to secure a manufacturing pipeline for its silicon in the midst of the global chip shortage and, as voiced earlier this year by CEO Pat Gelsinger, to get a piece of the chip manufacturing business from companies other than itself. While GloFo is only about 7% of the market, they have a customer base and support infrastructure Intel completely lacks, and though GloFo's current node size is still not highly competitive, there is still a lot of money to be made in larger process sizes for less performance-sensitive applications. Chipzilla certainly has its own fabs, but their technology has not been nearly as advanced (certainly not as much as, say, TSMC's), and they cater nearly exclusively to Intel in-house designs. However, AMD still relies on its former facilities at GloFo for its own chips, which would certainly attract regulator scrutiny if Intel were to control its supply chain. The deal does not seem to involve GloFo the company, but would have to involve its physical assets and operations to be at all worth the headache.

The POWER9 is still made at Fab 9, but IBM, chastened by its problems with GloFo, has turned to Samsung and its 7nm process for POWER10. However, Intel has a strong interest in improving its node size, and although exact numbers have lately gotten more and more meaningless, Cannon Lake, Tiger Lake and others are "still" at 10nm, whatever that means. Plus, Intel would presumably have control over IBM's old facilities, which they would still know relatively well, and while IBM doesn't have the volume of other chip designers anymore, they're still considered a significant player and their parts are higher-end. AMD may not like their next chips being fabbed by Intel, but IBM may not have a problem with it, and if Samsung can't deliver on POWER10 after all, stranger things have happened.

Chimera Linux: if you like BSD and Linux, why not both


A new Linux option for ppc64le popped up on my radar today, and this one's really interesting: if you like the way FreeBSD does business, but you want a Linux kernel, now you don't have to choose. Chimera Linux gives you a FreeBSD userland with no GNU components in the base system except make and ncurses (for some of the readership this will be a feature), plus the ability to bootstrap itself from any musl-based distro like Alpine or Adelie or Void PPC's musl variant. ppc64le, aarch64 and x86_64 are the three launch architectures, making OpenPOWER a first-class citizen from the very beginning, and they promise portability with any architecture that has LLVM/clang available.

There are some important questions yet to answer, however, and the distro is clearly not yet ready for prime time. There's no init system yet (let's hope it's not systemd, because that would really be an unholy union) and there's not even a kernel, so I can't tell you what it runs; presumably it uses the kernel of the distro you bootstrap it on. For that reason, don't even think about asking for a bootable ISO. The source package build system is also custom and it wouldn't be surprising to me if it manifested rough edges for awhile.

The other question mark is LLVM; Chimera relies on clang, not gcc. clang works for a lot of things on ppc64le but at least in my usage doesn't properly work for some large projects like Firefox, and its performance is marginally poorer. This is undoubtedly no problem on the other two architectures, but they are the primary focus of LLVM and clang, and OpenPOWER isn't.

Still, I think there's real potential for this project, quite possibly for people who would ordinarily be attracted to Slackware's philosophy but are more used to the way BSDs do business. (As a product of the University of California, I have great empathy for this viewpoint.) And there's already precedent: Mac OS X, er, macOS is a Mach kernel, but with a BSD userland, and look at how successful that concept was. Sometimes the best choice is the one you don't have to end up making.

Bloody Friday at IBM?


The conspiracy theorist in me wonders about the timing of announcing a large-scale shift in IBM executive staff ... on a Friday going into a United States holiday weekend. That's the kind of press release that tries very hard to say "nothing to see here," and especially one with the anodyne title of "IBM Leadership Changes."

But of course there is something to see here. Cutting through nonsense like "our hybrid cloud and AI strategy is strongly resonating" (remember, things break if they resonate too much), the biggest thing to see is IBM President Jim Whitehurst stepping down. Whitehurst was, of course, the former CEO of Red Hat prior to its merger with Big Blue, and he lasted just over a year as IBM President (but over twelve with Red Hat). He will now be working as a "Senior Advisor" to IBM CEO and chairman Arvind Krishna, which is a nice way of saying a paid sinecure until he finds another gig, and it's unclear if he will get any further payments on his US$6 million retention bonus. This is what it looks like when someone is pushed out, presumably over fax given the current IBM E-mail migration mess.

Whitehurst wasn't the only executive blood spilled: the senior VP of global markets is also out (not to be confused with Global Technology Services), now similarly moved to an executive sinecure for "special projects," and the former senior VP of IBM systems and hardware has either been laterally moved or outright demoted to senior VP of "cloud and cognitive software." Various other new names are coming on board, too. It really reads like Krishna is cleaning house.

Is this reshuffle part of the bootup of the cloud-focused IBM "NewCo", now called the vaguely unpronounceable Kyndryl? Technically NewCo-Kyndryl doesn't exist until the end of this year, and none of the officers Kyndryl announced yesterday include the names in IBM's announcement today. But it would be surprising for any of the cloud operations to remain at IBM "OldCo" which is supposed to have the legacy software and hardware operations ... unless the reasoning for the split was just a load of hooey and it was never the intention for either half of the company to retain exclusive control over its respective market interests. That probably bodes more ill for Kyndling, er, Kyndryl than it does for its parent corporation. Either way, it certainly looks like Red Hat proved the old joke is true: anything plus IBM ends up equalling IBM.

PowerPC and the Western Digital My Book Live debacle


Users relying on the Western Digital My Book Live and My Book Duo NAS systems had an ugly surprise last week when they were abruptly and remotely reset to factory default, erasing all their data. A combination of multiple exploits and Western Digital commenting out a password check appear to be responsible not only for the injury of data loss, but also the added insult of being infected with malware at the same time to join a botnet.

The interest to us here is that the WD My Book Live and Duo family are 32-bit PowerPC devices, more specifically the 800MHz Applied Micro APM82181, an enhanced 90nm PowerPC 440 derivative with additional DSP instructions called the PPC 464. The PowerPC 464FP used here includes a 7-stage pipeline and floating-point unit, and the APM82181 adds a DDR2 controller (256MB onboard) and 256K of RAM configurable as L2 cache. You can boot Gentoo and OpenWRT on it, all of which is unsurprising because the My Book Live basically runs Debian. Western Digital has not issued updates for this device since 2015 and many distros (including Debian, starting with stretch) have dropped 32-bit PowerPC support, though it is still supported in the kernel (except for the PowerPC 601) and these operating systems plus Void PPC and others still support the architecture generally.

The attack abuses a zero-day (CVE-2021-18472) to drop a malware executable named .nttpd,1-ppc-be-t1-z. This is a 32-bit PowerPC-compiled ELF binary and is part of the Linux.Ngioweb family, which in its most recent iteration "supports" 32 and 64-bit x86, MIPS, ARM (32/64) and PowerPC, and there are rumours it's been spotted on s390x (!) and Hitachi SuperH. There is no "preferred device" and the new presence of this malware on PowerPC hosts simply means the authors write good portable code and are expanding to more targets (we'd rather they were porting more generally useful applications, of course).

The upshot of all this is platforms are only as good as their security. There's nothing about the vulnerability which is specific to PowerPC, merely to the spin of Debian they use and what they've layered on top of it. WD recommends disconnecting these NASes from the Internet, and technically as IoTs they probably shouldn't be out naked on a WAN in the first place, but a better idea is to put something on them that's actually supported and maintained. It's fortunate that these devices are "open enough" that you can do it. What about the systems or hardware that aren't?

How to make GNOME not suck on Fedora 34 ppc64le


Hats off to Daniel Kolesa, who noticed this initially. Today Floodgap Orbiting HQ lost power thanks to power company shenanigans, so since this T2 was down for the count anyway, I decided to do a little GNOME surgery and see if he was right.

For context, read our less than complimentary review of Fedora 34 (nothing personal, Dan!). Besides the usual wholesale breakage of add-ons with each GNOME update, GNOME 40 also broke colour profiles and, most importantly, suffered a dramatic reduction in graphics performance. Mucking with GPU performance settings, which seemed to work for other users on x86_64, didn't appear to make much difference in my case. Daniel's discovery, mentioned in the comments, was that graphene, one of the major underlying libraries, had support for gcc vector intrinsics but wasn't using them on anything but x86_64.

So, before I brought this Fedora 34 Talos II back to the desktop (remember that I boot to a text login), I downloaded the sources for the version of graphene (1.10) used in the Fedora 34 release and applied his patch. You'll need meson (at least 0.50.1), gobject-introspection-devel, gtk-doc, and gobject-2.0 (at least 2.30.0), and of course gcc, so make a quick dnf trip, unpack the source, apply his very trivial patch and mkdir build; cd build; meson ..; ninja to kick it off. (I also threw a common_cflags += ['-O3', '-mcpu=power9'] in meson.build to juice it a little more.) I backed up the old /lib64/libgraphene-1.0.so.0.1000.6 (it says 1.0, but it's 1.10) and replaced it with the one you'll find in build/src, and then did a startx to bring up GNOME.
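In shell form, the whole dance looks roughly like this (the source directory name is an assumption; do this from a text console, not a running GNOME session):

sudo dnf install meson gobject-introspection-devel gtk-doc
cd graphene-1.10.6 # your unpacked, already-patched source tree
mkdir build; cd build
meson ..
ninja
cp /lib64/libgraphene-1.0.so.0.1000.6 ~/libgraphene-1.0.so.0.1000.6.bak # back it up first
sudo cp src/libgraphene-1.0.so.0.1000.6 /lib64/libgraphene-1.0.so.0.1000.6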

The result? Not only didn't it crash, but dang is it buttah. I didn't test Wayland, because Wayland, but I would be very surprised if it didn't improve as much as X11 did — desktop performance is now just about back to Fedora 33 level, with only rare stuttering, even when I turned all GNOME animations back on. This also explains why mucking with the GPU performance settings didn't yield much improvement, because the GPU wasn't (at least primarily) the problem. Nice to have all that VSX and VMX doing something useful!

If you are using F34, you should just go ahead and do this; I've recommended to Dan Horák that it be adopted for the official package, but the performance regression for me was so bad that I'm delighted I don't have to wait. If you don't want to build it yourself, I uploaded a gzipped copy of the .so I'm using to the Github pull request, but make sure you back up your old file before you replace it, don't replace it while GNOME is running, and know that you do so at your own risk (but you kept the old file so you can back it out, right?). The version I built is for POWER9, so don't run it on a POWER8. Naturally if you run Void PPC (because Daniel) you already have this change, but if you're on Fedora as I am, using F34 is suddenly a lot more pleasant.

Make water cooling great again


Although I'm sure there are one-offs, the last and most powerful Power architecture workstation to ship liquid-cooled from the factory was the Power Mac Quad G5. It's nice and quiet compared to the air-cooled G5s and allowed Apple to get that extra edge out of the PowerPC 970, though I did have to replace my cooling system a few years ago and you always have to watch for leaks. While I don't find this dual-8 Raptor Talos II to be particularly loud with the stock HSFs, it might be nice to get my 4-core HTPC Blackbird a little quieter when I'm watching a movie and I think that would be worth having to do a little service on it now and then.

While Raptor has no known plans to ship liquid-cooled machines, Vikings has your back (note that their store is down as of this writing for "spring cleaning"). The key with the IBM HSFs is that they get good, high-pressure contact between the heat spreader and the fan heatsink such that you can run 4 and 8-core parts without thermal compound or indium pads. While the custom mount Vikings developed doesn't achieve that level presently, with MX5 thermal compound they demonstrated over a 10 degree C reduction in core temperatures under load compared to the stock HSFs on a single 22-core POWER9. The custom CPU fitting then connects to an off-the-shelf Laing DC pump and a 120mm radiator.

There are of course problems to be solved before this becomes a workable product even though this prototype is very promising. Vikings is still trying to figure out how much pressure should be applied by the CPU clamp; while the use of thermal compound allows a bit of wiggle room here, clamp down too much and you'll crack the chip but too little and it won't cool. (The IBM HSFs are very user-friendly in this regard: turn until it stops.) There are also concerns the compression fittings may be too tight and a manufacturing issue with the mechanism's upper plate. For my money, since I already have one expensive liquid-cooled computer under this desk, I'd also want easy serviceability to drain fluids and replace tubing, and I'd want high quality components and fittings to reduce evaporative loss and the chance of any dreaded leaks. Like anything else, pay now for quality or pay later for damage.

Still, additional cooling options would be great for getting OpenPOWER machines in more places they haven't been before, and while the little 4-cores run very cool with just passive heatsinks, running dual-8s or higher core counts might really benefit from a liquid cooling system (especially you nuts out there trying to cram 18-core parts into Blackbirds). No ETA on a saleable product yet but I'm looking forward to seeing it develop.

Debian 10.10


Debian 10.10 is released, the latest stable version of what I suspect is the other of the two most common distros on OpenPOWER workstations. Like all stable Buster releases, it concentrates almost exclusively on critical fixes and security updates; in particular, the kernel remains at 4.19. Support for Buster is expected through 2022 and updated ISOs are now available.

Firefox 89 on POWER


Firefox 89 was released last week with much fanfare over its new interface, though being the curmudgeon I am, I'm less enamoured of it. I like the improvements to menus and doorhangers, but I'm a big user of compact tabs, which were deprecated, and even with compact mode surreptitiously enabled the tab bar is still about a third or so bigger than Firefox 88's (see screenshot). There do seem to be some other performance improvements, though, plus the usual lower-level changes, and WebRender is now on by default for all Linux configurations, including for you fools out there trying to run Nvidia GPUs.

The chief problem is that Fx89 may not compile correctly with certain versions of gcc 11 (see bugs 1710235 and 1713968). For Fedora users, if you aren't on 11.1.1-3 (the current version as of this writing) you won't be able to compile the browser at all, and you may not be able to compile it fully even then without putting a #pragma GCC diagnostic ignored "-Wnonnull" at the top of js/src/builtin/streams/PipeToState.cpp (I still can't; see bug 1713968). gcc 10 is unaffected. I used the same .mozconfigs and PGO-LTO optimization patches as we used for Firefox 88. With those changes the browser runs well.
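If you script your builds, the pragma can be jammed in mechanically; a sketch with GNU sed (remember to back the change out once the gcc bug is fixed):

sed -i '1i #pragma GCC diagnostic ignored "-Wnonnull"' js/src/builtin/streams/PipeToState.cpp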

While waiting for the updated gcc I decided to see if clang/clang++ could now build the browser completely on ppc64le (it couldn't before), even though gcc remains my preferred compiler as it generates higher performance objects. The answer is that it now can, merely by substituting clang for gcc in the .mozconfig, but even using the bfd linker the result is a defective Firefox that freezes or crashes outright on startup; it could not proceed to the second phase of PGO-LTO, and the build system aborted with an opaque error -139. So much for that. For the time being I think I'd rather spend my free cycles on the OpenPOWER JavaScript JIT than figuring out why clang still sucks at this.

Some of you will also have noticed the Mac-style pulldown menus in the screenshot, even though this Talos II is running Fedora 34. This comes from firefox-appmenu, which since I build from source is trivial to patch in, and the Fildem global menu GNOME extension (additional tips) paired with my own custom gnome-shell theme. I don't relish adding another GNOME extension that Fedora 35 is certain to break, but it's kind of nice to engage my Mac mouse-le memory and it also gives me a little extra vertical room. You'll notice the window also lacks client-side decorations since I can just close the window with key combinations; this gives me a little extra horizontal tab room too. If you want that, don't apply this particular patch from the firefox-appmenu series and just use the other two .patches.

Progress on the OpenPOWER SpiderMonkey JIT


Progress!

% gdb --args obj/dist/bin/js --no-baseline --no-ion --no-native-regexp --blinterp-eager -e 'print("hello world")'
GNU gdb (GDB) Fedora 10.1-14.fc34
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "ppc64le-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from obj/dist/bin/js...
(gdb) run
Starting program: obj/dist/bin/js --no-baseline --no-ion --no-native-regexp --blinterp-eager -e print\(\"hello\ world\"\)
warning: Expected absolute pathname for libpthread in the inferior, but got .gnu_debugdata for /lib64/libpthread.so.0.
warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
[New LWP 2797069]
[LWP 2797069 exited]
[New LWP 2797070]
[New LWP 2797071]
[New LWP 2797072]
[New LWP 2797073]
[New LWP 2797074]
[New LWP 2797075]
[New LWP 2797076]
[New LWP 2797077]
hello world
[LWP 2797072 exited]
[LWP 2797070 exited]
[LWP 2797074 exited]
[LWP 2797077 exited]
[LWP 2797073 exited]
[LWP 2797071 exited]
[LWP 2797076 exited]
[LWP 2797075 exited]
[Inferior 1 (process 2797041) exited normally]

This may not look like much, but it demonstrates that the current version of the OpenPOWER JavaScript JIT for Firefox can emit machine language instructions correctly (mostly — still more codegen bugs to shake out), handles the instruction cache correctly, handles ABI-compliant calls into the SpiderMonkey VM correctly (the IonMonkey JIT is not ABI-compliant except at those edges), and enters and exits routines without making a mess of the stack. Much of the code originates from TenFourFox's "IonPower" 32-bit PowerPC JIT, though obviously greatly expanded, and there is still ongoing work to make sure it is properly 64-bit aware and takes advantage of instructions available in later versions of the Power ISA. (No more spills to the stack to convert floating point, for example. Yay for VSX!)

Although it is only the lowest level of the JIT, what Mozilla calls the Baseline Interpreter, there is substantial code in common between the Baseline Interpreter and the second-stage Baseline Compiler. Because it has much less overhead compared to Baseline Compiler and to the full-fledged Ion JIT, the Baseline Interpreter can significantly improve page loads all by itself. In fact, my next step might be to get regular expressions and the OpenPOWER Baseline Interpreter to pass the test suite and then drag that into a current version of Firefox for continued work so that it can get banged on for reliability and improve performance for those people who want to build it (analogous to how we got PPCBC running first before full-fledged IonPower in TenFourFox). Eventually full Ion JIT and Wasm support should follow, though those both use rather different codepaths apart from the fundamental portions of the backend which still need to be shaped.

A big shout-out goes to Justin Hibbits, who took TenFourFox's code and merged it with the work I had initially done on JitPower way back in the Firefox 62 days but was never able to finish. With him having done most of the grunt work, I was able to get it to compile and then started attacking the various bugs in it.

Want to contribute? It's on Github. Tracing down bugs is labour-intensive, and involves a lot of emitting trap instructions and single-stepping in the debugger, but when you see those small steps add up into meaningful fixes (man, it was great to see those two words appear) it's really rewarding. I'm happy to give tips to anyone who wants to participate. Once it can pass the test suite at some JIT level, it will be time to forward-port it and if we can get our skates on it might even be possible to upstream it into the next Firefox ESR.
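
Curious where it stands? Running the test suite against just the Baseline Interpreter looks something like this (a sketch only: the shell flags are the same ones as in the transcript above, but the jit_test.py harness and its --args option are as found in a standard mozilla-central tree, and your object directory may differ):

% python3 js/src/jit-test/jit_test.py --args="--no-baseline --no-ion --no-native-regexp --blinterp-eager" obj/dist/bin/js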

For better or worse, the Web is a runtime. Let's get OpenPOWER workstations running it better.

Tonight's game on OpenPOWER: Blake Stone Aliens of Gold


Everything is awful, so now that we've rescued a Blackbird let's go shoot more aliens. One of the more entertaining games based on id's Wolfenstein 3D engine was Apogee's Blake Stone: Aliens of Gold from 1993, developed by yet another set of apparent refugees from Softdisk, but that's another story for another day. It kept the basic formula but added subtle lighting effects and ceiling and floor textures, along with more varied environments, lots of (cartoony but fun to shoot) monsters, and a very helpful automap.

Ordinarily this wouldn't be worth a mention; that's what we have DOSBox for (see our article on adding an OpenPOWER JIT to DOSBox). DOSBox does support this game, and I do actually own a retail copy of Blake Stone from back in the day, but it runs like dried snot even with a JIT.

Fortunately, the source code to Blake Stone was released back in the day as well after it was long believed to be lost, and an excellent SDL/OpenGL port called BStone is available which adds many graphical improvements, mouse look (well, side to side, anyway), and 16:9 modes as demonstrated in the screenshot. It also supports the IMHO inferior sequel, Planet Strike.

To start saving mankind, you can play the shareware version, but it's more fun to play with a retail copy (mine is the 1994 FormGen release, but the one you can still buy from Apogee will work), or extract the game assets from the DOSBox-based GOG installer. The CD or 3D Realms downloads are easiest to work with, as you can just copy the contents into a folder.

Clone the BStone Github project. You will need CMake and SDL 2.0.4 (or better) development headers and libraries. The CMake build recipe assumes that your SDL has a static libSDL2main.a library, which the ones from Fedora, Slackware and possibly others apparently lack; that may require modifying the SDL CMake component that comes with it (I had to). Then mkdir build, cd build, cmake .. and make to kick it off.
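
Put together, the whole dance looks something like this (a sketch; the clone URL is my assumption of the usual upstream location):

% git clone https://github.com/bibendovsky/bstone.git
% cd bstone
% mkdir build
% cd build
% cmake ..
% make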

Once built you can start the game either from the directory where your Blake Stone files are (I have cd ~/stuff/dosbox/BSTONE && ~/src/bstone/build/src/bstone), or pass bstone the --data_dir option with a path (if it fails to detect the correct game, try passing --aog, --aog_sw or --ps). If you don't have OpenAL-capable hardware, disable OpenAL sound from the in-game configuration menu, or you may get random crashes during play. Don't shoot the informants.
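
For example, to aim a build directly at an existing game folder (the paths here are just mine; substitute your own):

% ~/src/bstone/build/src/bstone --data_dir ~/stuff/dosbox/BSTONE --aog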

Fedora 34 mini-review on the Blackbird and Talos II (it sucks)


Once again it's time to upgrade Floodgap's stable of Raptor systems to the latest release of Fedora, which is up to version 34 (see our prior review of Fedora 33). You may not necessarily run Fedora yourself, but the fact that it does run is important, because it tends to run well ahead of most distros, and many problems get identified and fixed there before they reach more conservative ones. And boy howdy, are there problems this time. I'm going to get it over with and tl;dr myself right now: if you use GNOME as your desktop environment and you haven't upgraded yet, DON'T. F34 and in particular GNOME 40 are half-baked, and the problems don't seem specific to OpenPOWER, nor are they any fault of the hard work of folks like Dan Horák; these issues are more generalized. There is always that sense of dread over what's going to break during the update, and while I'm finally typing in Firefox on this updated Talos II, it took me hours to get everything glued back together, and the desktop performance problems in particular are cramping my ability to use the system well. Fedora 33 will still be supported until a month after F35 comes out; it may be worth sticking with F33 for a couple of months to give the GNOME team time to work on the remaining performance issues.

The problems started from the very beginning, even before actually updating. I do my updates initially on the Blackbird to shake out any major problems before doing it to my daily driver T2. As I explained previously, neither the Blackbird nor the T2 uses gdm; they both boot to a text prompt, and we jump to GNOME with startx (or XDG_SESSION_TYPE=wayland dbus-run-session gnome-session if we want to explore the Wayland Wasteland). I do the upgrade at the text prompt so that there is minimal chance of interference. Our usual MO to update Fedora is, as root,

dnf upgrade --refresh # upgrade prior system and DNF
dnf install dnf-plugin-system-upgrade # install upgrade plugin if not already done
dnf system-upgrade download --refresh --releasever=34 # download F34 packages
dnf system-upgrade reboot # reboot into upgrader

If you do this with F34, however, you get a number of downgrades (unavoidable, apparently), missing groups and an instant conflict with iptables when you try to download the packages:

dnf suggests we add --best --allowerasing to deal with that; it doesn't work, and neither does adding --skip-broken. The non-obvious solution is dnf system-upgrade download --refresh --releasever=34 --allowerasing, and just ignoring the duff package.
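
Putting it together, the sequence that actually worked was:

dnf upgrade --refresh # upgrade prior system and DNF
dnf install dnf-plugin-system-upgrade # install upgrade plugin if not already done
dnf system-upgrade download --refresh --releasever=34 --allowerasing # download F34 packages, erasing the conflict
dnf system-upgrade reboot # reboot into upgrader
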
The Blackbird does not have a GPU; all video output is on the ASPEED BMC (using the Blackbird's HDMI port). Ordinarily I would select the new kernel from Petitboot when it restarts after the final command above to see a text log of the installation, but this time we get an actual graphical install screen.

After the installation completed, the machine rebooted uneventfully and came up to the text prompt. I entered startx as usual and ...

At this point GNOME just plain hung up. There was no mouse pointer, though pressing ENTER on the keyboard triggered the button and put it back to the text prompt. Nothing unusual was in the Xorg logs, and journalctl -e showed only what seemed like a non-fatal glitch (Window manager warning: Unsupported session type). Well, maybe the time for the Wayland Wasteland was now. I did an exec bash (gnome-session doesn't properly handle using another shell, or you get weird errors like Unknown option: -l because it tries to be cute with the options to exec) and XDG_SESSION_TYPE=wayland dbus-run-session gnome-session, and Wayland does start:

However, it still doesn't support 1920x1080 on the Blackbird's on-board HDMI, just 1024x768. It also seemed a little sluggish with the mouse. I exited it and tried to start gnome-session --debug --failsafe, but it wouldn't initialize.

It then dawned on me that I was setting XDG_SESSION_TYPE manually for Wayland; I previously left it unset for X11. Setting XDG_SESSION_TYPE to x11 finally brought up GNOME 40 in X with a full 1080p display:

I put that into my .cshrc and that was one problem solved. The Applications drawer seemed a little slower to come up, though I have a very vanilla installation on this Blackbird on purpose and few apps are loaded, so I didn't try scrolling through the list or running lots of applications at once. (More on that in a moment.)
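
For the record, that's a one-liner, in csh syntax since I use tcsh (Bourne-style shells would export it from a profile instead):

setenv XDG_SESSION_TYPE x11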

Just to see if anything shook out subsequently, I ran dnf upgrade again. This time the missing iptables compatibility packages came up:

That solves that mystery, so just ignore iptables during the initial download; the next time you run dnf after Fedora has been upgraded, it will clean up and install the right components. This whole sordid affair now shows up in the Release Notes.

Upgrading the Talos II is usually a much more complex undertaking anyway because I have custom GNOME themes and extensions installed on it and I always expect there will be some bustage. I don't like it, mind you, but I expect it. Armed with what I had learned from the Blackbird, I installed the packages on the T2 (some other groups also had "no match," though all of my optionally installed packages could and did upgrade) and rebooted.

Unlike the Blackbird, however, the installer still came up in a text screen as in prior upgrades when I selected that kernel from the Petitboot menu.

This machine has the BTO AMD WX7100 workstation card and does not use the ASPEED BMC framebuffer. If you don't select the kernel from the menu and just let the default go, you will get the usual black screen again, and as in prior versions you'll have to pick another VTY with CTRL-ALT-F2 or something, log in as root and periodically issue dnf system-upgrade log --number=-1 to watch.

I rebooted and started X (with XDG_SESSION_TYPE=x11), and GNOME came up, but it looked a little ... off.

If you noticed the weird pink-purple tint, you win the prize. However, my second monitor seemed to have a normal display (so did the Blackbird), and the difference is that my main display is colour-managed. When I selected the default profile, the tint went away but my colours weren't, you know, just right. I spent a few hours regenerating the profile with my Pantone huey manually with dispcal, but the same thing happened with the new profile.

The problem is the new colour transform matrix (CTM) support; the prior profile obviously worked fine in 3.38 but isn't compatible with 40. The proper way to solve this would be to let GNOME make a new colour profile for you from the Settings app, which allegedly even supports the Pantone huey and other colourimeters. However, it has never (to my knowledge) worked properly on OpenPOWER (it crashes), so I've never been able to do this myself. Instead, my current solution is to just temporarily disable CTM with

xrandr --output DisplayPort-0 --set CTM 0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1

(that's 0, 1, seven zeroes, 1, seven more zeroes, and 1: eighteen values forming nine fixed-point pairs, i.e. a 3x3 identity matrix). Adjust DisplayPort-0 to wherever your colour-managed display is connected. Note that every time you (re)start GNOME or its shell, it will forget this setting and you'll have to enter it again. It would be nice if the colour manager could work with OpenPOWER, but CTM should never have broken working profiles in the first place.
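
Since you'll be retyping it a lot, consider parking it behind an alias in your shell startup file; in csh syntax that's something like this (the alias name is just my own choice):

alias fixctm 'xrandr --output DisplayPort-0 --set CTM 0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1'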

However, that all got solved later, because an even more pressing concern popped up first: the UI was slow as molasses. GNOME 40 defaults to the Activities overview on startup with nothing running. It takes literally several seconds to move from one page of activities/apps to the next. Several seconds. This problem is not unique to OpenPOWER, and occurs on both Wayland and Xorg, but a general fix is apparently months away.

The performance problems are not X11-specific. In fact, Wayland is even worse, because the mouse stutters even just moving it around. This is the first time Wayland has actually been worse on the system with the GPU (the T2) than on the system without one (the Blackbird), though I hardly consider this regression a positive development.

What am I doing about it? Well, what can I do about it, short of trying to fix it myself? GNOME is the default environment for Fedora Desktop, and while I could switch to KDE or Xfce (and I might!), these are serious regressions that are hitting a decent proportion of users and were even evident during the beta phase. Did QA just fall asleep or something? To top it off, even if it were working well, whose freaking bright idea was it to make you go to the upper left corner to click Activities, then back to the bar to click the Show Applications button, just to pull up what you have installed? I've started using the Applications menu that Fedora includes by default; at least that doesn't take a Presidential administration or two or wild sweeping mouse gestures just to show you a list of apps, even though it's still noticeably slower than 3.38.

The slowdowns are entirely specific to GNOME. Once you actually get an app started, like Firefox or a game, display speed is fine, so the problem clearly isn't pushing pixels; it's something higher level in GNOME. Switching the frequency scaling governor on all cores to performance made at most minimal difference. Similarly, (as root) echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level instead of auto made things a little better, but there is still no excuse for how bad it is generally. About the only thing that made more difference than that was simply turning animations off altogether in GNOME Tweaks. Nothing was smooth anymore, but it was about twice as fast at doing anything, so that's how I'm limping along for the time being.
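
For reference, one way to flip every core to the performance governor (as root, assuming the standard cpufreq sysfs interface):

for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor ; do echo performance > $g ; done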

With those significant problems on deck, the usual turmoil with custom themes and extensions is actually anticlimactic. I had to make some tweaks to my custom Tiger-like GNOME shell extension to fix the panel height and a weird glitch with slightly thicker border lines on the edges of the panel, which you can see in the screenshot below. Quite a few extensions could not automatically update to GNOME 40, either:

I've become irritated enough by this that I actually did set disable-extension-version-validation to true in dconf-editor, which made a couple start working immediately, including my beloved Argos custom script driver. For the others I downloaded the most current version of the shell system monitor and this fork of Dash-to-Dock, and manually installed them in ~/.local/share/gnome-shell/extensions/ (you may need to reset the GNOME shell with Alt+F2 and r to get gnome-extensions enable to actually see their UUIDs). A few I should have dispensed with earlier: No Topleft Hot Corner can now be simply replaced by gsettings set org.gnome.desktop.interface enable-hot-corners false, and AlternateTab's switcher behaviour now can be rigged manually from GNOME Settings.
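
Incidentally, you don't need dconf-editor for that switch; the same setting is exposed through gsettings:

gsettings set org.gnome.shell disable-extension-version-validation true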

I'm now more or less back where I started from, but working with apps is much less fluid, the desktop experience is undeniably inferior to prior releases, and I can't believe no one thought to blow the whistle during the test phase.

If you use Fedora purely as a command-line server, other than the initial hiccups with downloading packages, it seems to work. If you use KDE or Xfce or anything other than GNOME as your desktop, you're probably okay with F34 too, though I didn't test those (I may later). But if you use the default GNOME on Fedora, especially if you use Wayland, think twice about this update before installing it while you've still got some time with F33. Part of riding the bleeding edge is drawing blood now and then, but F34's wounds seem much more self-inflicted than usual. This is the worst Fedora update since I started using it in F28 and I'm not exaggerating in the slightest.

LibreBMC and Kestrel: two separate BMC tastes that taste separate separately


After our article earlier this week on LibreBMC and my concern as to what it meant for Raptor's own Kestrel "soft BMC" project, Raptor's Tim Pearson contacted me and advised that Kestrel is a standalone Raptor product on its own timetable that's still very much in development. In fact, the attached screenshot shows evolution from just last month, with On-Chip Controller support now in the firmware: it is able to read the temperature sensors and has scaffolding for monitoring fan RPM. Although Raptor makes it clear that Kestrel is and always will be open, and is developed on OpenPOWER systems using open tooling, it is mostly Raptor in-house code, the only non-Raptor bits being Zephyr (the OS), LiteX (the FPGA SoC builder), Microwatt (the CPU core) and some minor components like I2C.

The other noteworthy thing is that Kestrel will indeed be offered as an aftermarket plug-and-play upgrade for existing Blackbird and Talos II systems, no soldering required. This is excellent news, because while LibreBMC is a very encouraging development and has wide-ranging implications beyond OpenPOWER systems, its basis on OpenBMC means a heavier installation that continues to be less well suited to a workstation environment (in terms of interface and sheer startup time amongst other things). While Antmicro's planned LibreBMC card may also work in Raptor systems, it's really meant for OpenPOWER servers as well as other server-class machines with onboard BMCs that aren't necessarily Power-based. Whether we like it or not, while there's a lot of nerd cred around OpenPOWER workstations, such systems presently represent a minority of the POWER9 installed base (let alone of all BMC-managed hardware). It thus makes sense that Raptor, currently the only manufacturer of OpenPOWER workstation machines, would be in the best position to create an open BMC that is also better tailored to improving the desktop user experience. As I've written here before, the envisioned improvements to fan control and user interface are very welcome, but Kestrel's promised 10-second power-to-IPL is a huge win for the desktop when ASPEED BMCs running OpenBMC right now take a minute or more. LibreBMC's OpenPOWER core will certainly improve performance but I doubt it will be to that extent.

But I also suspect more good news than just an aftermarket upgrade is afoot. I pointed out in the LibreBMC article how much one of the physical form factors in particular looks an awful lot like a single-board computer, because really when you have a system-on-a-chip and on-board peripherals, you kind of have a "PowerPi" already. Raptor isn't saying, but Kestrel, like LibreBMC, will necessarily have its own SoC at its heart and thus could easily be the basis of another OpenPOWER SBC project, possibly one even better suited to that environment. Plus, being Raptor, you know all the components will be open and blob-free (especially now that they've eliminated the blobs for the onboard Broadcom Ethernet, leaving just the Microsemi PM8068 as the last blob firmware component, and only if you buy it as a BTO option). It will be worth watching to see when and how all this comes to market. Either way, getting your choice of BMC alternatives is a really great thing, especially when the BMC is so critical to the system as a whole.