Showing posts from 2021

Fedora 35 mini-review on the Blackbird and Talos II


Happy American Thanksgiving. While America watches football and eats deep-fried gobbler, we went to the Popeyes drive-thru for chicken and I finished updating Fedora on my daily driver, now at version 35 (see our prior review of Fedora 34). As I always point out: Fedora is a very common distro on OpenPOWER systems, and even if you don't run Fedora yourself, the fact that it does run is important, because it tends to be well ahead of most distros, so many problems are identified and fixed in it before they reach the more conservative ones. I test it on my 4-core Blackbird with BMC graphics and my dual-8 Talos II with an AMD WX7100 GPU.

F34 was a messy, unpleasant upgrade. This time I did the update first on my 4-core stock Blackbird, which I try to keep as close to stock Fedora as possible, though I note for the record that both the Bird and the T2 are configured to come up in a text boot instead of gdm, and I start GNOME manually from there. I strongly recommend this as a recovery mechanism in case your graphics card gets whacked by something or other. On Fedora this is easily done by ensuring the symlink /etc/systemd/system/default.target points to /lib/systemd/system/multi-user.target. Once you've logged into the console, jump to GNOME with startx (set XDG_SESSION_TYPE to x11 if this isn't already done), or XDG_SESSION_TYPE=wayland dbus-run-session gnome-session if you want to explore the Wayland Wasteland. Since this is a minimal boot, I can also do the upgrade at the same text prompt for speed and with as little interference as possible. As usual, the process is, from a root prompt:

dnf upgrade --refresh # upgrade prior system and DNF
dnf install dnf-plugin-system-upgrade # install upgrade plugin if not already done
dnf system-upgrade download --refresh --releasever=35 # download F35 packages
dnf system-upgrade reboot # reboot into upgrader
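The text-boot setup described above can be sketched in shell as well. This assumes systemd's systemctl set-default, which manages the same default.target symlink; is_text_boot and start_gnome are hypothetical helper names, and nothing runs until you call them.

```shell
#!/bin/sh
# One-time setup (as root): make the console, not gdm, the default target.
# This points /etc/systemd/system/default.target at multi-user.target.
#   systemctl set-default multi-user.target

# Hypothetical helper: succeed if default.target points at multi-user.target.
is_text_boot() {
  link=${1:-/etc/systemd/system/default.target}
  case "$(readlink "$link")" in
    */multi-user.target) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: start GNOME from the console on Xorg (the default)
# or on Wayland with "start_gnome wayland".
start_gnome() {
  if [ "${1:-x11}" = wayland ]; then
    XDG_SESSION_TYPE=wayland dbus-run-session gnome-session
  else
    XDG_SESSION_TYPE=x11 startx
  fi
}
```

Keeping the check separate from the launcher means you can confirm the text boot is still configured after an upgrade without starting a session.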

This went much more smoothly than F34, which had some weird conflicts; it got the necessary packages right away and booted into the installer with no issue. Back at the text prompt, we started with Wayland, as I always do to see if it's still going to suck, and I'm still not disappointed. Performance was even worse than in F34: it got glitchy just trying to take a grab with gnome-screenshot from the command line (see this Reddit thread), and BMC video (through the on-board HDMI connector) is still stuck at 1024x768. I took this photo with my Pixel 3 after I got tired of mucking around with it.

As before, don't even bother with Wayland on a Blackbird if you don't have a GPU. Xorg worked fine but was still as slow as it was in F34; I'll get to that in a moment.
Otherwise, in Xorg, the system, Firefox and LibreOffice mostly worked as before, modulo the performance problems, which was a relief.

The T2 tends to be a different story because I have this system heavily customized. Additionally, kernel 5.14 has a known problem with AMD Vega cards (add amdgpu.aspm=0 to your kernel command line as a workaround), and 5.15 may have an issue with amdgpu in power saving mode, so watch out for both of these problems depending on your GPU. (At least one user reported having to blacklist the AST BMC, though that wasn't necessary for me.)
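On Fedora, one way to apply that kernel command-line workaround is grubby, which edits the kernel boot entries; a sketch follows. has_kernel_arg is a hypothetical helper that only confirms the running kernel actually picked the argument up. Whether Petitboot sees a grubby change without regenerating grub.cfg may depend on your firmware, so treat that part as an assumption.

```shell
#!/bin/sh
# One-time (as root): append the Vega workaround to every installed kernel's entry.
#   grubby --update-kernel=ALL --args="amdgpu.aspm=0"
# After rebooting, confirm it took effect:
#   has_kernel_arg amdgpu.aspm=0 && echo "workaround active"

# Hypothetical helper: succeed if an exact argument appears on the kernel
# command line (read from /proc/cmdline unless another file is given).
has_kernel_arg() {
  # One argument per line, then match whole lines as fixed strings.
  tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}
```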

The first problem was more elemental, however: after I downloaded the packages and ran the installation, it still came up offering an impossibly old kernel, the same thing I had to work around when updating to F34!

When I selected it, it started Fedora 35, but with this old 5.11-series kernel from Fedora 34. I did a manual grub2-mkconfig -o /boot/grub2/grub.cfg and restarted, and the Petitboot menu (built off the grub configuration) looked sane again. The text boot came up without incident.
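A quick sanity check after the upgrade reboot is whether the running kernel is the newest one installed; if it isn't, regenerating grub.cfg as described above is the fix. A sketch with hypothetical helper names, assuming GNU sort's -V version ordering and that /lib/modules holds only kernel version directories:

```shell
#!/bin/sh
# Hypothetical helper: print the highest version read from stdin, one per line.
# sort -V orders 5.9 before 5.11, where a plain lexical sort would not.
newest_version() {
  sort -V | tail -n 1
}

# Hypothetical helper: succeed if the running kernel is the newest one installed.
kernel_is_current() {
  newest=$(ls /lib/modules | newest_version)
  [ "$(uname -r)" = "$newest" ]
}

# Usage (as root), after the upgrade reboot:
#   kernel_is_current || grub2-mkconfig -o /boot/grub2/grub.cfg
```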

Next, the desktop environment. Usually GNOME upgrades break a large number of my cherished extensions. Surprisingly, only Dash-to-Dock broke this time, and I rebuilt it from a fork using these instructions. Note, however, that I do have disable-extension-version-validation set to true in dconf-editor, which helps avoid a lot of churn.
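The same setting can also be flipped from a terminal with gsettings. A sketch, assuming the key lives in the org.gnome.shell schema as it does in recent GNOME Shell releases; run it as your desktop user, not root:

```shell
# Let GNOME Shell load extensions whose metadata doesn't list the running
# shell version.
gsettings set org.gnome.shell disable-extension-version-validation true
# Check the current value:
gsettings get org.gnome.shell disable-extension-version-validation
```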

However, the same GNOME regressions that were in F34 turned up in F35: CTM still makes a mess out of my custom colour profiles (again, something like xrandr --output DisplayPort-0 --set CTM 0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1 will fix it, but the output name changes based on how your monitors are connected, and you'll have to do it every time you [re]start GNOME), colour calibration still crashes with my Pantone huey, and graphics were still awfully slow. This performance problem is once again libgraphene not being built with SIMD enabled; the maintainer made the fix, but the Fedora-distributed library doesn't seem to incorporate it properly. I rebuilt it on F35 and put a copy on Github. It replaces the file of the same name in /lib64 (remember to make a backup, and don't do this while GNOME is running).
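Both workarounds can be sketched as shell functions. The CTM string in the xrandr command appears to be an identity matrix, assuming xrandr presents the property as nine S31.32 fixed-point entries, each split into two 32-bit words with the low word first; generating it and applying it to every connected output avoids hardcoding DisplayPort-0. swap_library is a hypothetical helper, and the libgraphene filename in its usage comment is an assumption.

```shell
#!/bin/sh
# Assumption: CTM is nine S31.32 entries (row-major), each written as two
# 32-bit words, low word first. Then 1.0 is "0,1" and 0.0 is "0,0".
identity_ctm() {
  out=""
  for row in 0 1 2; do
    for col in 0 1 2; do
      if [ "$row" -eq "$col" ]; then entry="0,1"; else entry="0,0"; fi
      out="${out:+$out,}$entry"
    done
  done
  printf '%s\n' "$out"
}

# Reapply the identity CTM to every connected output; call this after
# each GNOME [re]start so output names never need hardcoding.
reset_ctm_all() {
  ctm=$(identity_ctm)
  xrandr --query | awk '$2 == "connected" { print $1 }' |
    while read -r output; do
      xrandr --output "$output" --set CTM "$ctm"
    done
}

# Hypothetical helper: back up a system library, then replace it.
# Run as root from a text console, never while gnome-shell is running.
swap_library() {
  new=$1
  target=$2
  if pgrep -x gnome-shell >/dev/null 2>&1; then
    echo "gnome-shell is running; log out first" >&2
    return 1
  fi
  cp -p "$target" "$target.bak" || return 1
  cp "$new" "$target"
}

# Usage (the library filename is an assumption; check what is in /lib64):
#   swap_library ./libgraphene-1.0.so.0 /lib64/libgraphene-1.0.so.0
```

If the replacement misbehaves, copying the .bak file back from the text console restores the Fedora-distributed library.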

I'll not comment much further about Wayland except to say that it continues to meet my low expectations on the T2, but as it still doesn't support what my work habits require, I still don't use it. But you can, at least if you have a working discrete graphics card and you've updated libgraphene. For me, Xorg forever, I guess.

My conclusion is damning with faint praise: at least it wasn't any worse. And with these tweaks it works fine. If you're on F34 you have no reason not to upgrade, and if you're on F33 you won't have much longer until you have to (and you might as well jump right to F35 at that point). But it's still carrying an odd number of regressions (the workarounds for F35 are the same as for F34), and the installation on the T2 was bumpier than on the Blackbird for reasons that remain unclear to me. If you run KDE or Xfce or anything other than GNOME, you shouldn't have any problems, but if you still use GNOME as your desktop environment, be prepared to do more work to get it off the ground. I have higher hopes for F36 because we may finally get that float128 update that still wrecks a small but notable selection of packages like MAME, but I also hope some of these regressions get dealt with, because that would make these updates a bit more liveable. Any system upgrade of any OS will make you wonder what's going to break this time, but the most recent Fedora updates have seemed more fraught with peril than they ought to be.

If you like big-endian and Void and cannot lie ...


... then you other brothers can't stand by: Void PPC, probably one of the most finely tuned distributions for Power ISA systems (and one of the few still supporting Power Macs), needs big-endian maintainers because of the work required to maintain those four flavours, i.e., 32-bit PowerPC and 64-bit BE Power multiplied by musl and glibc. I totally get the idea of not maintaining what you don't personally use, which is one of the reasons I cut loose TenFourFox and Classilla earlier. It's a shame, but it's awfully hard to justify dedicating resources to a free product that isn't personally beneficial. The new BE Void PPC maintainer would be responsible for doing the builds as well as fixing issues, but it should be possible to coordinate hosting the packages on an official mirror. I imagine it's negotiable to do only glibc or only 64-bit or some such, depending on the hardware or interest you have.

If no one steps up, the big-endian musl repos go first by the end of this year, and the glibc repos will be discontinued in January 2023. Little-endian 64-bit is unaffected as is the experimental little-endian 32-bit flavour. Interested community members will want to take a look at the Void PPC Github.

51,552 JavaScript tests can't be wrong


Yeah, so about that OpenPOWER Minimum Viable Product JavaScript JIT for Firefox. This happened (all timings from an unoptimized debug build on my dual-8 Talos II with -j24):

% ./mach jstests --args "--no-ion --no-baseline --blinterp-eager --regexp-warmup-threshold=0" -F -j24

[43359|    0|    0|  614] 100% ======================================>| 529.7s
PASS
% ./mach jstests --args "--no-ion --no-baseline" -F -j24
[43359|    0|    0|  614] 100% ======================================>| 499.0s
PASS
% js/src/jit-test/jit_test.py --args "--no-ion --no-baseline --blinterp-eager --regexp-warmup-threshold=0" -f -j24 obj/dist/bin/js
[8193|   0|   0|   0] 100% ==========================================>| 132.3s
PASSED ALL
% js/src/jit-test/jit_test.py --args "--no-ion --no-baseline" -f -j24 obj/dist/bin/js
[8193|   0|   0|   0] 100% ==========================================>| 133.3s
PASSED ALL

That's a wrap, folks: the MVP, defined as Baseline Interpreter with irregexp and Wasm support for little-endian POWER9, is now officially V. This is the first and lowest of the JIT tiers, but it is already a significant improvement: the JavaScript conformance suite executed with the same build using --no-ion --no-baseline --no-blinterp --no-native-regexp took 762.4 seconds (1.53x as long as the 499.0-second --no-ion --no-baseline run) and one test timed out completely. An optimized build would be even faster.
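Both headline numbers fall out of the runs above; a quick check in shell and awk, using only the figures shown in that output:

```shell
#!/bin/sh
# jstests passed 43359 tests and jit-tests passed 8193; together they give
# the count in the title.
total_tests() {
  echo $((43359 + 8193))
}

# The run without the Baseline Interpreter (762.4 s) versus the
# --no-ion --no-baseline jstests run (499.0 s).
slowdown() {
  awk 'BEGIN { printf "%.2f\n", 762.4 / 499.0 }'
}

total_tests   # prints 51552
slowdown      # prints 1.53
```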

Currently the code generator makes heavy use of POWER9-specific instructions, as well as VSX to make efficient use of the FPU. There are secondary goals of little-endian POWER8 and big-endian support (including pre-OpenPOWER so your G5 can play too), but these weren't necessary for the MVP, and we'd need someone actually willing to maintain those since I don't run Linux on my G5 or my POWER6 and I don't run any of my OpenPOWER systems big. While we welcome patches for them, they won't hold up primary support for POWER9 little-endian, which is currently the only "tier 1" platform. I note parenthetically this should also work on LE Power10 but as a matter of policy I'm not going to allow any special support for the architecture until IBM gets off their corporate rear end and actually releases the firmware source code. No free work for a chip that isn't!

You should be able to build a JIT-enabled Firefox 86 off of what's in the Github tree now, but my current goal is to pull it up to 91ESR so that it can be issued as patches against a stable branch of Firefox. These patches will be part of my ongoing future status updates for Firefox on OpenPOWER (yes, you'll need to build it yourself, though I'm pondering setting up a Fedora copr at some point). The next phase will be getting Baseline Compiler passing everything, which should be largely done already because of the existing Baseline Interpreter and Wasm support, and then the final Ion JIT stage, which still needs a lot of work. We'll most likely set up a separate tree for it so you can help (ahem). No promises right now but I'd like to see the completed JIT reach the Firefox source tree in time for the next ESR, which is Firefox 102. That's more than you can say for Chrome/Chromium, which so far has refused to accept OpenPOWER-specific work at all.

#ShowUsYourTalos


It's been a while since we did this, and even longer since we showed an actual Talos system, but here's the sexy new case for Martin Kukač's Blackbird, housing its 8-core CPU, 32GB RAM and GeForce 210 GPU. The polished metal and open bottom, plus the vertical row of ports and power, make for a nice transitional look from the old Power Mac G5.

If you've got a well-coiffed OpenPOWER workstation to show off, post in the comments. Plus, somebody has to have an actual T2 or T2 Lite they're proud of, or I'm going to have to come up with a new hashtag.

Big and little POWER shouldn't just be endian


While the majority of OpenPOWER installations by this point are probably running little-endian, every single POWER chip runs big: big power usage, that is. POWER9 is still performance-competitive with x86_64, this situation continues to improve as more software gets better optimized, and there have been huge gains since POWER4 and the PowerPC 970 in particular, but POWER chips still run relatively hot and relatively hungry. Anandtech tried to normalize this for POWER8 systems by estimating transactions per watt; power measurements can be very imprecise and depend on more than just the system architecture, but even with that caveat, the tested Tyan POWER8 in particular was outclassed by nearly a factor of three by a Xeon E5-2699. Possibly in response, POWER9 is more aggressive with power savings than POWER8 and makes a lot of microarchitectural improvements, using 25% less juice for 50% more zip (so roughly a doubling of performance per watt), and Power10 supposedly improves on POWER9's performance per watt by at least 2.6 times, according to IBM's figures.
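The "doubling" is simply relative performance divided by relative power draw. A tiny awk sketch with the POWER9-versus-POWER8 figures above; perf_per_watt is a hypothetical helper name:

```shell
#!/bin/sh
# Relative performance per watt = relative performance / relative power draw.
perf_per_watt() {
  awk -v perf="$1" -v power="$2" 'BEGIN { printf "%.2f\n", perf / power }'
}

# POWER9 vs POWER8: 50% more performance (1.50) on 25% less power (0.75).
perf_per_watt 1.50 0.75   # prints 2.00
```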

But IBM's playbook for improving perf per watt hasn't really changed. Either you're boosting performance by juicing the microarch, jimmying IPC with more instructions and more cores, or both, or you're trying to diminish power usage with heavier clock speed throttling or turning off cores. While shooting the die budget at lower-wattage pack-in accelerators is a clever hybrid approach, their application-specific nature also means they're rather less useful in typical situations than their marketing would allege (look at how little currently uses the gzip accelerator in every POWER9, for example). You can do a lot with strategies like these — AMD certainly does — but sooner or later you'll hit a wall somewhere, either against the particular limitations of the design you're working with or against the intrinsic physical limitations of making a hippo do gymnastics while eating fewer calories.

Apple Silicon has a lot of concerning issues from a free computing perspective, but its performance is impressive, and its performance per watt is jaw-dropping. A lot of this is the secret sauce in their microarch, which ironically came from P.A. Semi, originally a Power ISA licensee, and some may be due to details of the on-board GPU. But a good portion is also due to the big core-little core approach largely pioneered by ARM's big.LITTLE (originally pairing Cortex-A15 and Cortex-A7 cores) and used to great effect in the M1 series. After all, if you want to get the best of both worlds, make some of the cores use less power and give those cores tasks that require less oomph (efficiency or E-cores), reserving the heavy tasks for the big ones (performance or P-cores). Intel thinks so too: Lakefield and Alder Lake both attempt the same sort of heterogeneous CPU topology for x86_64, and it would be hard to believe AMD isn't looking to make the same jump for their next iteration.

The chief issue with going that route is making sure that the cores are getting work commensurate with their capabilities. This is easy for Apple since they control the whole banana: macOS Quality of Service is all about doing just that (you'd think they would do something based on nice levels as well, but I guess all the sweet talk about being desktop Un*x went out the window somewhere around Mavericks). Linux added initial support for big.LITTLE with kernel 3.10, but it took years of further improvements to the Linux scheduler to make it meaningful. Intel made things worse for themselves in Lakefield and Alder Lake by using lower-power Atom-based E-cores that didn't support AVX-512 (and the Tremont E-cores in Lakefield didn't even support AVX2, meaning such tasks couldn't be run by them at all). Rather than hinting Windows 11 or the internal hardware not to send AVX-512 code to the Gracemont E-cores, Alder Lake just doesn't support AVX-512 on any core, full stop. Kernel 5.13 supports Alder Lake, but kernel 5.15 has dawned and there is still no specific support for Intel Thread Director, though there is scheduler support for AArch64 E-cores that can't run 32-bit code. And Alder Lake is turning out to be very power-hungry, which calls some of the design into question, in addition to various compatibility issues when software unwittingly puts tasks on the E-cores that don't work as expected.

Still, the time is coming when Power ISA should start thinking about a big-little CPU, maybe even for Power11. We already have big cores (if IBM will ever get their heads out of their rear ends and release the firmware source), but we also have an already extant little OpenPOWER core: Microwatt. While Microwatt doesn't support everything that POWER9 or Power10's large cores do, it's still intended to be a fully compliant OpenPOWER core, and since the Linux kernel is already starting to cater to heterogeneous designs, a set of POWER8-compliant Microwatt E-cores could still execute on the same die along with a set of Power11 full-fat P-cores. Add logic on-chip to move threads to the P-cores if they hit an instruction the E-cores don't support, and you're already most of the way there with relatively minor changes to the Linux kernel.

What IBM — or any future OpenPOWER chip builder, though so far no one else is in the performance category — needs to avoid is what seems to be dooming Alder Lake: they've managed to hit the bad luck jackpot with a chip that not only uses more power but has more compatibility problems. Software updates will fix this issue somewhat but a little more forethought might have staved it off, and the apparent greater wattage draw should have been noticed long before it left the lab. But IBM has already shown wattage improvements over the last two generations and if the P- and E-core functionalities are made appropriately comparable, a big-little Power11 — with open firmware please! — could be a very compelling next upgrade for the next generation of Power-based workstations and servers. Apple has clearly demonstrated that highly efficient and powerful computing experiences are possible when hardware and software align. There's no reason OpenPOWER and Linux or *BSD can't do the same on open platforms.

Firefox 94 on POWER


Firefox 94 is released. I have little interest in the colourizer, but I do like about:unloads and EGL support on Linux for great WebGL justice even on X11 (I don't use the Wayland Wasteland), at least if you have an AMD/ATI card like the WX7100 Raptor sells as a BTO option. There are also various performance improvements and a fun feature where you can use a different Mozilla VPN server for each separate multi-account container, the latter probably being Firefox's most useful capability right now. The LTO-PGO patch is unchanged from Firefox 93 and the .mozconfigs are unchanged from Firefox 90.

Fedora 35


Fedora 35 is out, which we pay particular attention to at Floodgap Orbiting HQ since both our daily driver Talos II and HTPC Blackbird run Fedora. Even if you don't run it, it's a cutting-edge distro, so OpenPOWER-specific issues show up and (hopefully) get fixed here early, making it a good preview for other distros. I wasn't too happy with F34 so I'm hoping the only direction it can go is up this time around.

Fedora 35 upgrades to GNOME 41 with Wayland-specific performance improvements, a new default GL renderer for GTK4, and new options for power and window management. WirePlumber is also added as a session manager for PipeWire (introduced in F34), providing additional video and audio session policy control. Python 3.10, Perl 5.34 and PHP 8.0 also arrive. It ships with kernel 5.14, rpm 4.17, glibc 2.34 and gcc 11.

However, there is still no motion on the 128-bit long double transition for OpenPOWER, and the F34 tracking bug has not been reopened. This most notoriously affects MAME but also a small and growing number of other packages, and I have no idea what's holding this up for so long — like, literally, years.
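For the curious, one way to see which long double format a toolchain defaults to is GCC's predefined __LDBL_MANT_DIG__ macro: 106 bits of mantissa means IBM double-double, the current ppc64le default, while 113 means IEEE binary128, the target of the transition. A sketch, assuming a GCC- or Clang-compatible cc; the helper names are hypothetical:

```shell
#!/bin/sh
# Map the long double mantissa width (in bits) to its format.
classify_ldbl() {
  case "$1" in
    53)  echo "same as double (64-bit)" ;;
    64)  echo "x87 80-bit extended" ;;
    106) echo "IBM double-double" ;;
    113) echo "IEEE binary128" ;;
    *)   echo "unknown" ;;
  esac
}

# Ask the C compiler ($CC, default cc) which long double it uses by default,
# by dumping its predefined macros for an empty C file.
current_ldbl() {
  mant=$(${CC:-cc} -dM -E -x c /dev/null | awk '$2 == "__LDBL_MANT_DIG__" { print $3 }')
  classify_ldbl "$mant"
}
```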

Now that F35 is out, F33 will end-of-life on November 30. We'll do our usual deep dive review in a few days after everything has updated.

New Blackbird firmware


New firmware for the Blackbird is available from the Raptor wiki. This version fixes the Petitboot crashes that plagued users of the LSI SAS module, essentially replacing the "2.01 beta" that Raptor put out to fix the problem. If you were affected, you may wish to update in order to pick up the officially blessed fix.

Also, Raptor is hinting that more updates regarding the Blackbird's availability will come in November. I suspect this may have something to do with the shortage of SATA controllers which is also delaying some Talos II and T2 Lite orders; you can order a T2 without a SATA card, but the Blackbird has SATA on-board. Hopefully the logjam will break up soon.

First flight of Kestrel, the FPGA OpenPOWER-based BMC, and introducing the Arctic Tern dev board


Our alert readers yield the most interesting tips (thanks D!), including a video quietly uploaded to the Raptor wiki, currently linked nowhere else, showing a running Kestrel system connected to a Talos II. And it looks stupendous.

Kestrel, you will recall from our previous coverage, is a "soft BMC" that replaces the functionality of the onboard ASPEED BMC standard in all currently shipping POWER9 hardware. Like the ASPEED BMC, Kestrel provides remote access and management, system IPL (via FSI), firmware (via LPC and SPI), and a 2D framebuffer (though Kestrel is planned to use HDMI, not VGA). However, while the ASPEED BMC runs its own full Linux distribution (OpenBMC), Kestrel runs Zephyr, a small open-source real-time operating system, and can be built from the FPGA up with open tooling. Best of all, it's OpenPOWER just like the main system (instead of ARM), using a Microwatt core in little-endian mode as its CPU.

Here's the expanded block diagram of what it includes and provides:

What really impressed me about Kestrel was the potential for much faster BMC boot times: Raptor was promising boot within seconds (versus minutes with the stock BMC firmware, though third-party projects like BangBMC aim to improve on it). Even if that were all Kestrel could accomplish, it would be worth it.

Well, the video has made me a believer. A few short seconds after power was applied, and in less time than it took the announcer to describe what was happening, the Kestrel-enhanced T2 was ready to boot. I'll take two, Tim.

As before, Kestrel is incarnated on a Lattice ECP5 Versa development board, which the demo unit in the video has mounted to a little tray in the base of the T2's E-ATX case. The ECP5's PCIe edge is not connected. Instead, power is being drawn off an unknown source, and the Flexible Support Interface signals are coming from the on-board debug connector which is not mentioned in the T2 manual. Here's a picture of the one in my running T2 at J3200 (next to the boot and BMC flash):

The "FSI Adaptor v1.0" daughterboard plugged into J3200 is new and doesn't appear on the Raptor Engineering Kestrel page (for that matter, the page still says it's "not yet tested" on the Talos II). The TPM headers at J10105 are connected for LPC, and while it's hard for me to see at the photographed angle, the COM2 port at J7701 also seems connected as well as another set of lines that most likely service I2C. These signals all route to a hat sitting on the ECP5 which is also new, though its label is just out of focus. (The Ableconn card visible in the background looks like NVMe and doesn't appear to be part of Kestrel.)

For the demonstration the ASPEED BMC was completely disabled (but how wasn't said — perhaps the FSI connector is rigged to inhibit it, or maybe this method). The demo showed rapid power on into the Zephyr OS and IPL into Hostboot quickly afterwards. Once the On-Chip Controllers on the POWER9 become active, a separate thread in Zephyr continuously polls the CPU temperature sensors to set appropriate fan speeds, while maintaining the rest of the core functionality. Here's the Kestrel monitoring the system during IPL:

This demo didn't show remote access or management (though we have a screenshot) and it didn't show the framebuffer functionality. But the video does announce a dedicated soft-BMC development board called Arctic Tern, which will be "plug and play for all Raptor Computing products" and available in Q1 2022. This will likely be the hardware Kestrel is based on, and while it's not clear if it will still be ECP5-based, presumably Arctic Tern boards will come from the factory preconfigured as Kestrels, and you can reprogram them as you please for your own projects.

OpenBMC got us started, but its slow startup and heavier build requirements held back further functional progress, and it's just not well-suited to workstations. I'm blown away by how far Kestrel has come, I hope to see future Raptor hardware with these as a competitive advantage, and I'll be first in line to get one. Watch for a review here in the near future.

A water-cooled update


Earlier we reported on Vikings' planned watercooling system for OpenPOWER. Vikings now reports that their second revision, an improved lower-pressure mount, should be available for purchase from their store in two to four weeks. Unlike the IBM HSFs, this is a low-pressure mounting mechanism, which made it both less expensive and easier to engineer and also means a custom cooler for the higher pressures won't be necessary. (Vikings notes a short screw is used "so that it shouldn't be possible to tighten it too much.") No MSRP yet and no preorders currently, but it will be sold either as a full kit (fluid also available, or use your choice of appropriate fluids depending on the tubing) compatible with all existing Raptor systems, or as just the cooler/mount for those with an existing external radiator. For you crazy people trying to cram an 18-core into a Blackbird this might be your ticket, but I'm personally interested in stopping the fan bank in my POWER9 HTPC from spooling up and down; after all, the best advantage of liquid cooling is the peace and quiet. More to come when kits are available.

Ubuntu 21.10 and 20.04.3


Ubuntu 21.10 "Impish Indri" is also out, upgrading to kernel 5.13, GNOME 40 (but presumably past the teething pains in Fedora 34 which required later patching) and gcc 11. This is the last interim release before the next Ubuntu LTS, scheduled for April 2022; the current LTS is updated to 20.04.3. As usual, new installs on OpenPOWER require installing Ubuntu Server first, and then converting to Desktop.
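A sketch of that conversion on a fresh Ubuntu Server install, assuming the standard ubuntu-desktop metapackage is the desktop you want (run as root or via sudo):

```shell
# Pull in the full Ubuntu desktop on top of an existing Server install.
apt update
apt install ubuntu-desktop
```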

Tonight's game on OpenPOWER: Space Cadet Pinball


I've always loved pinball, even though in league play I was always pretty much bang-up average. My first experience was with a Williams Pin-Bot at the local roller rink (I can't rollerskate either) and I was hooked. At Floodgap Orbiting HQ we have a Williams Star Trek: The Next Generation, which is getting a long-running LED upgrade, and a Stern Sopranos.

Computer pinball, however, has been a mixed bag, largely because of the simulation fidelity necessary for good play. Nowadays you have Pinball Arcade on mobile devices and Visual Pinball on Windows, but for years the physics never really exceeded what you got in Bill Budge's 1982 Pinball Construction Set, and table features were even more limited. The mid-1990s introduced what was probably the first generation of computer pinball games that actually played vaguely like real pinball, and some real pinball tables were even ported (I played a credible if low-res version of Bally's Eight Ball Deluxe on my Mac).

Of these, one of the best known was Maxis' Full Tilt Pinball, whose Space Cadet table also appeared as 3D Pinball for Windows - Space Cadet, included first with Microsoft Plus! for Windows 95 and then with every version of Windows afterwards (including NT 4 and Windows 2000) through Windows XP inclusive. This version was a port of the original Space Cadet table written in cross-platform C and had a slightly different ruleset. I enjoyed this version on my father's AT&T Pentium 75; later I got Full Tilt Pinball for Mac, which came on a hybrid Mac/Windows disc.

Apparently I'm not the only one who liked it, because the 3D Pinball version was eventually decompiled and rewritten. This redux not only plays authentically with the assets from the Plus! version, but can also use the higher-res assets from Full Tilt, though the ruleset is still from the Plus! game. It uses SDL and can scale to larger screen sizes and faster frame rates.

Compilation on Fedora 34 on this Talos II was straightforward. With development headers installed for SDL2 and SDL_mixer, grab the tree (do this from tip, not version 1.1), mkdir build, cd build, cmake .. and make. Copy the resources from the game — for Full Tilt this is pretty much CADET.DAT and the SOUND folder, but for the Plus! version copy everything in the same folder as PINBALL.EXE — into the build directory (if you're using the Full Tilt version as I did, you may need to loop-mount the disc to get the Windows XA session to show up) and start with ./SpaceCadetPinball.
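The build steps above as a shell sketch. The repository URL and the Fedora package names are assumptions, and copy_assets, along with the disc path in the usage comment, is hypothetical.

```shell
#!/bin/sh
# Build from tip (a plain clone gets the default branch, not version 1.1).
# Runs in a subshell so the cd does not change the caller's directory.
build_pinball() (
  sudo dnf install -y git cmake gcc-c++ SDL2-devel SDL2_mixer-devel &&
    git clone https://github.com/k4zmu2a/SpaceCadetPinball.git &&
    mkdir -p SpaceCadetPinball/build &&
    cd SpaceCadetPinball/build &&
    cmake .. &&
    make
)

# Hypothetical helper: copy game data into the build directory.
# "fulltilt" copies CADET.DAT and the SOUND folder; anything else copies
# the whole folder, as for the Plus! version (the folder holding PINBALL.EXE).
copy_assets() {
  src=$1
  dest=$2
  flavour=${3:-fulltilt}
  if [ "$flavour" = fulltilt ]; then
    cp "$src/CADET.DAT" "$dest/" && cp -R "$src/SOUND" "$dest/"
  else
    cp -R "$src/." "$dest/"
  fi
}

# Usage:
#   build_pinball
#   copy_assets /mnt/fulltilt/CADET SpaceCadetPinball/build fulltilt
#   cd SpaceCadetPinball/build && ./SpaceCadetPinball
```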

For best results, under Options make sure Music is checked (you'll need something that plays MIDI files), under Options, Table Resolution make sure Use Maximum Resolution is checked (if you use the Full Tilt assets, you get 1024x768, and you can enlarge the window for sizes even larger), and under Options, Graphics make sure Uncapped UPS is checked so you get all the frames.

Good luck, Cadet.

OpenBSD 7.0


OpenBSD 7.0 is available, compatible with Raptor workstations in big-endian mode as well as "expected to be" with IBM PowerNV hardware generally. New powerpc64-specific improvements include MSI-X support, a fix for page faults under recursive locking, a bump in the maximum data size to 32GB, and support for the dynamic tracer. This is on top of better GPU support, additional driver and device support, updates to OpenSMTPD, LibreSSL and OpenSSH, and lots of new port packages. You can boot OpenBSD directly from Petitboot and install over the network; download mirrors are worldwide.

Firefox 93 on POWER


Firefox 93 is out, though because of inopportune scheduling at my workplace I haven't had time to do much of anything other than $DAYJOB for the past week or so. (Cue Bill Lumbergh.) Chief amongst its features are AVIF image support (derived from the AV1 codec), additional PDF forms support, blocking of HTTP downloads from HTTPS sites, new DOM/CSS/HTML support (including datetime-local), and, most controversially, Firefox Suggest, which I personally disabled since it gets in the way. I appreciate Mozilla trying to diversify its income streams, but I'd rather we could just donate directly to the browser's development rather than generally to Mozilla.

At any rate, a slight tweak was required to the LTO-PGO patch but otherwise the browser runs and functions normally using the same .mozconfigs from Firefox 90. Once I get through the next couple weeks hopefully I'll have more free time for JIT work, but you can still help.

DAWR YOLO even with DD2.3


Way back in Linux 5.2, a "YOLO" mode was added for the DAWR register, which is required for debugging with hardware watchpoints. This register functions properly on POWER8 but has an erratum on pre-DD2.3 POWER9 steppings (what Raptor sells as "v1") where the CPU will checkstop, invariably bringing the operating system to a screeching halt, if a watchpoint is set on cache-inhibited memory like device I/O. This is rare but catastrophic enough that the option to enable DAWR anyway is hidden behind a debugfs switch.

Now that I'm stressing out gdb a lot more working on the Firefox JIT, it turns out that even if you upgrade your CPUs to DD2.3 (as I did for my dual-8 Talos II system, or what Raptor sells as "v2"), you still don't automatically get access to the DAWR on a fixed POWER9 (at least on Fedora 34). Although you'll no longer be YOLOing it on such a system, you still need to echo Y > /sys/kernel/debug/powerpc/dawr_enable_dangerous as root and restart your debugger to pick up hardware watchpoint support.
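A sketch of checking and flipping that switch. The helper names are hypothetical, and DAWR_SWITCH can be overridden so the helpers can be tried against an ordinary file; debugfs must be mounted at /sys/kernel/debug, which systemd normally does.

```shell
#!/bin/sh
# The debugfs switch from the text above; overridable for testing.
DAWR_SWITCH=${DAWR_SWITCH:-/sys/kernel/debug/powerpc/dawr_enable_dangerous}

# Hypothetical helper: succeed if the switch currently reads Y.
dawr_enabled() {
  [ "$(cat "$DAWR_SWITCH")" = Y ]
}

# Hypothetical helper (as root): turn the switch on if it isn't already.
enable_dawr() {
  dawr_enabled || echo Y > "$DAWR_SWITCH"
}
```

After enabling it and restarting gdb, a watch on a variable should be reported as a "Hardware watchpoint" rather than a plain software "Watchpoint".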

Incidentally, I'm about two-thirds of the way through the wasm test cases. The MVP is little-endian POWER9 Baseline Interpreter and Wasm support, so we're getting closer and closer. You can help.

Whonix on OpenPOWER


Developer Jeremy Rand wrote in to report his functioning port of Whonix 16 to OpenPOWER. (I should point out that all links in this article are "clearnet.") Whonix is a second operating system based on Kicksecure (a Debian derivative formerly known as "Hardened Debian") that runs within VMs on your existing OS (compare with Tails). All connections within it are forced through Tor, using different paths for different applications; additionally, it uses kloak for keystroke anonymization and secure network time synchronization instead of NTP, has higher-quality RNGs, and enables AppArmor and hardened kernel profiles to protect against other types of attacks.

The current release of Whonix is based on Debian bullseye and runs "native" on OpenPOWER KVM-HV using libvirt. Note that ppc64le isn't a top-tier architecture yet, so there are bumps in the road: due to a bug in kernel versions prior to 5.14, you currently have to use Debian experimental for the VM, and there may be other glitches until support is mainstreamed. But if you bought an OpenPOWER workstation for its auditability and transparency, I doubt something like that's going to trip you up much. Detailed installation instructions, including Onion links if you prefer, are on the Raptor wiki.

Better x86 emulation with Live CDs


Yes, build a better emulator and the world will beat a path to your door to run their old brown x86 binaries. Right now that emulator is QEMU. Even if you run Hangover for Windows binaries, it's still QEMU underneath (and Hangover only works with 4K page kernels currently, leaving us stock Fedora ppc64le users out), and if you want to run Linux x86 or x86_64 binaries on your OpenPOWER box, it's going to be QEMU in user mode for sure.

However, one of the downers of this approach is that you also need system libraries. Hangover embeds Wine to solve this problem (and builds them natively for ppc64le to boot), but QEMU user mode needs the actual shared libraries themselves for the target architecture. This often involves laboriously copying them from foreign architecture packages and can be a slow process of trying and failing to acquire them all, and you get to do it all over again when you upgrade. Instead, just use a live CD/DVD as your library source: you can keep everything in one place (often using less space), and upgrading becomes merely a matter of downloading a new live image.

My real-world use for this is running the old brown Palm OS Emulator, which I've been playing with for retrocomputing purposes. Although the emulator source code is available, it's heavily 32-bit and I've had to make some really scary hacks to the files; I'm not sure I'll ever get it compiling on 64-bit Linux. But there is a pre-built 32-bit i386 binary. I've got a Palm m515 ROM, a death wish and too little to do after work. Let's boot this sucker up. Note that in these examples I'm "still" using QEMU 5.2.0; QEMU 6.1.0 had various problems and crashed at one point, which I haven't investigated in detail. You might consider building QEMU 5.2.0 in a separate standalone directory (with or without juicing it) for this purpose.

We'll use the Debian live CD in this article, though any suitable live distro should do. Since POSE is i386, we'll need that particular architecture image. Download it and mount the ISO (which appears as d-live 11.0.0 gn i386 as of this writing).

The actual filesystem during normal operation is a squashfs image in the live directory. You can mount this with mount, but I use squashfuse for convenience. Similarly, while you could mount the ISO itself every time you need to do this, I just copy the squashfs image out and save a couple hundred megabytes. Then, from where you put it, make sure you have an ~/mnt folder (mkdir ~/mnt), and then: squashfuse debian-11-i386.squashfs ~/mnt
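
The extract-and-mount steps can be wrapped up as shell functions. This is a sketch under stated assumptions: the image is taken to be live/filesystem.squashfs inside the mounted ISO (check yours, since all we know is that it's in the live directory), squashfuse is installed, and the function names and example paths are hypothetical.

```shell
# Sketch: turn a mounted Debian live ISO into a reusable i386 sysroot.
# Assumption: the squashfs is live/filesystem.squashfs inside the ISO.
sysroot_extract() {
    # $1 = where the ISO is mounted, $2 = destination squashfs file
    src="$1/live/filesystem.squashfs"
    [ -f "$src" ] || { echo "no squashfs at $src" >&2; return 1; }
    cp "$src" "$2"
}

sysroot_mount() {
    # $1 = squashfs image, $2 = mount point (created if needed)
    mkdir -p "$2"
    squashfuse "$1" "$2"
}

# Usage (example paths):
#   sysroot_extract "/run/media/$USER/d-live 11.0.0 gn i386" ~/debian-11-i386.squashfs
#   sysroot_mount ~/debian-11-i386.squashfs ~/mnt
```

Once the squashfs is copied out you can unmount the ISO and delete it; only the squashfs file needs to stick around.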

Let's test it on Captain Solo. After all, we've just mounted a squashfs image with a whole mess of alien binaries, so:

% ~/src/qemu-5.2.0/build/qemu-i386 -L ~/mnt ~/mnt/bin/uname -m
i686

And now we can return Luke Skywalker to the Emperor: ~/src/qemu-5.2.0/build/qemu-i386 -L ~/mnt pose

Here it is, running a Palm image using an m515 ROM I copied over from my Mac.

However, uname and pose are both single binaries each in a single place. Let's pick a more complex example with resources, assets and other loadable components like a game. I happen to be a fan of the old Monolith anime-style shooter Shogo: Mobile Armor Division, which originated on Windows (GOG still sells it) but was also ported to the classic Mac OS and Linux by Hyperion. (The soundtrack CD is wonderful.) I own a boxed physical copy not only of the Windows release but also the Mac version, which is quite hard to find, and the retail Linux version is reportedly even rarer. While there have been promising recent developments with open-source versions of the LithTech engine, Shogo was the first LithTech game and apparently used a very old version which doesn't yet function. There is, however, a widely available Linux demo.

The demo you download from there appears to be just a large i386 binary. But if you run it using the method above, you'll only get a weird error about trying to run another binary from a temporary mount point. That's because it's actually an ISO image with an i386 ELF mounter in the header, so rename it to shogo.iso and mount it yourself. On my system GNOME puts it in /run/media/spectre/ISOIMAGE.
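
If you want to confirm the rename-and-mount trick before relying on it, note that ISO 9660 images carry the signature CD001 at byte offset 32769 (the first volume descriptor starts at sector 16 with 2048-byte sectors, and the signature follows a one-byte type code), so a quick check can tell a genuine image from a plain executable. A sketch, assuming udisksctl is available for unprivileged loop mounting; the function names and the download's file name are hypothetical.

```shell
# Sketch: verify an ISO 9660 signature, then loop-mount via udisks.
is_iso9660() {
    # Read the 5 bytes at offset 32769; tr drops NULs so zeros compare cleanly.
    [ "$(dd if="$1" bs=1 skip=32769 count=5 2>/dev/null | tr -d '\000')" = "CD001" ]
}

mount_iso() {
    is_iso9660 "$1" || { echo "$1: no ISO 9660 signature" >&2; return 1; }
    # -r attaches the loop device read-only. GNOME usually auto-mounts it;
    # if not, run udisksctl mount -b on the /dev/loopN it reports.
    udisksctl loop-setup -r -f "$1"
}

# Usage (the demo's download name is an example):
#   mv shogo-demo-download shogo.iso && mount_iso shogo.iso
```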

To set options before bringing up the main game, Shogo uses a custom launcher (on all platforms), but you can't just run it directly because Debian doesn't have all the libraries the launcher wants:

% ~/src/qemu-5.2.0/build/qemu-i386 -L ~/mnt /run/media/spectre/ISOIMAGE/shogolauncher
/run/media/spectre/ISOIMAGE/shogolauncher: error while loading shared libraries: libgtk-1.2.so.0: cannot open shared object file: No such file or directory

You could try to scare up a copy of that impossibly old version of GTK, but in the Loki_Compat directory of the Shogo ISO is the desired shared object already. (Not Loki Entertainment: this Loki, a former Monolith employee.) You can't give qemu-i386 multiple -L options, but you can give environment variables to its ELF loader, so we'll just specify a custom LD_LIBRARY_PATH. For the next couple steps it will be necessary for us to actually be in the Shogo mounted image so it can find all of its data files, thusly:

% cd /run/media/spectre/ISOIMAGE
% ~/src/qemu-5.2.0/build/qemu-i386 -L ~/mnt -E LD_LIBRARY_PATH="/run/media/spectre/ISOIMAGE/Loki_Compat" ./shogolauncher
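
Typing the full QEMU path each time gets old, so here is a small wrapper covering both the plain -L case and the -E LD_LIBRARY_PATH case. It's a sketch: QEMU and SYSROOT default to the paths used in this post, and EXTRA_LIBS is a variable name I made up for the optional library directory.

```shell
# Sketch: wrapper around qemu-i386 user mode.
# EXTRA_LIBS (hypothetical) names an extra library dir for the guest loader.
QEMU=${QEMU:-$HOME/src/qemu-5.2.0/build/qemu-i386}
SYSROOT=${SYSROOT:-$HOME/mnt}

run_i386() {
    if [ -n "${EXTRA_LIBS:-}" ]; then
        # -E sets an environment variable for the emulated program only.
        "$QEMU" -L "$SYSROOT" -E LD_LIBRARY_PATH="$EXTRA_LIBS" "$@"
    else
        "$QEMU" -L "$SYSROOT" "$@"
    fi
}
```

For example, run_i386 ~/mnt/bin/uname -m for the smoke test, or from inside the Shogo image, EXTRA_LIBS=$PWD/Loki_Compat run_i386 ./shogolauncher.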

We've bypassed the shell script that actually handles the entire startup process, so when you select your options, instead of starting the game the launcher will print the command line it would have executed. This is convenient! To start out with, I picked a windowed 640x480 resolution using the software renderer and disabled sound (it doesn't work anyway, probably due to the age of the libraries it was developed with), got the command line and ran that through QEMU. Boom.
And, as long as you crank the detail level down to low from the main menu, it's playable!
A lot doesn't work: it doesn't save games because you're running it out of an ISO (copy it elsewhere if you want to); there is no sound, probably, as stated, due to the age of the libraries (the game itself dates to 1998 and the Linux port to 2001); and don't even think about trying to launch it using OpenGL (it bombs out with errors). There are also occasional graphics glitches and clipping problems, one of which makes it impossible to complete the level, though I don't know how much of this was their bug versus QEMU's bug.

Performance isn't revolutionary, either for POSE or for Shogo. However, keep in mind that all the system libraries are also running under emulation (only syscalls are native), and with Shogo in particular we've hobbled it even further by making the game render everything entirely in software. With that in mind, the fact the framerate is decent enough to actually play it is really rather remarkable. Moreover, I can certainly test things in POSE without much fuss and it's a lot more convenient than firing up a Mac OS 9 instance to run POSE there.

Best of all, when you're done running alien inferior binaries, just umount ~/mnt and it all goes away. When Debian 12 appears, just replace the squashfs image. Easy as pie! A much more straightforward way to run these sorts of programs when you need to.

A footnote: in an earlier article we discussed HQEMU, a heavily modified fork of QEMU that uses LLVM to recompile code on the fly for substantially faster speeds, at the occasional cost of stability. Unfortunately it has not received any updates in several years, and even after I hacked it to build again on Fedora 34 with the pre-built LLVM 6 it is known to work with, it simply hangs. Like I said, for now it's stock QEMU or bust.

Firefox 92 on POWER


Firefox 92 is out. Alongside some solid DOM and CSS improvements, the most interesting bug fix I noticed was a patch for open alerts slowing down other tabs in the same process. In the absence of a JIT we rely heavily on Firefox's multiprocess capabilities to make the most of our multicore beasts, and this apparently benefits (among others, but in particular) the Google sites we unfortunately have to use in these less-free times. I should note for the record that on this dual-8 Talos II (64 hardware threads) I have dom.ipc.processCount modestly increased to 12 from the default of 8 to take a little more advantage of the system when idle, which also takes down fewer tabs in the rare cases when a content process bombs out. The delay in posting this was waiting for the firefox-appmenu patches, but I decided to just build it now and add those in later. The .mozconfigs and LTO-PGO patches are unchanged from Firefox 90/91.
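
The processCount change can be made persistent from a shell by writing it to user.js in your profile, which Firefox reads at startup and which overrides the same pref in prefs.js. This is a sketch: the profile path is a placeholder (about:profiles shows the real one), and set_ff_pref is a hypothetical helper name. Quit Firefox before running it.

```shell
# Sketch: set (or replace) one pref in a profile's user.js.
set_ff_pref() {
    # $1 = profile dir, $2 = pref name, $3 = value (numbers/booleans unquoted)
    userjs="$1/user.js"
    touch "$userjs"
    # Drop any earlier line for this pref; grep exits 1 if nothing is left.
    grep -vF "user_pref(\"$2\"," "$userjs" > "$userjs.tmp" || true
    mv "$userjs.tmp" "$userjs"
    printf 'user_pref("%s", %s);\n' "$2" "$3" >> "$userjs"
}

# Usage (placeholder profile directory):
#   set_ff_pref ~/.mozilla/firefox/xxxxxxxx.default-release dom.ipc.processCount 12
```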

Meanwhile, in OpenPOWER JIT progress, I'm about halfway through getting the Wasm tests to pass, though I'm currently hung up on a memory corruption bug while testing Wasm garbage collection. It's our bug; it doesn't happen with the C++ interpreter, but unfortunately like most GC bugs it requires hitting it "just right" to find the faulty code. When it all passes, we'll pull everything up to 91ESR for the MVP, and you can try building it. If you want this to happen faster, please pitch in and help.

It's not just OMI that's the trouble with POWER10


Now that POWER10 is out, the gloves (or at least the NDA) are off. Raptor Computing had been careful not to explicitly say what about POWER10 they didn't like and considered non-free, though we note that they pointed to our (and, credit where credit's due, Hugo Landau's) article on OMI's closed firmware multiple times. After all, when even your RAM has firmware, even your RAM can get pwned.

Well, it looks like they're no longer so constrained. In a nerdily juicy Twitter thread, Raptor points out that there's something else iffy with POWER10: unlike the issue with OMI firmware, which is not intrinsically part of the processor (the missing piece is the on-DIMM memory controller), this additional concern is the firmware for the on-chip "PPE I/O processor." It's 16 kilowords of binary blob. The source code isn't available.

It's not clear what this component does exactly, either. The commit messages, such as they are, make reference to a Synopsys part, so my guess is it manages the PCIe bus. Although PPE would imply a Power Processing Element (a la Cell or Xenon), the firmware code does not obviously look like Power ISA instructions at first glance.

In any case, Raptor's concern is justified: on POWER9, you can audit everything, but on POWER10, you have to trust the firmware blobs for RAM and I/O. That's an unacceptable step down in transparency for OpenPOWER, and one we hope IBM rectifies pronto. Please release the source.

First POWER10 machine announced


IBM turns up the volume to 10 (and their server numbers to four digits) with the Power E1080 server, the launch system for POWER10. POWER10 is a 7nm chip fabbed by Samsung with up to 15 SMT-8 cores (a 16th core is disabled for yield) for up to 120 threads per chip. IBM bills POWER10 as having 2.5 times more performance per core than Intel Xeon Platinum (based on an HPE Superdome system running Xeon Platinum 8380H parts), 2.5 times the AES crypto performance per core of POWER9 (no doubt due to quadruple the crypto engines present), five times "AI inferencing per socket" (whatever that means) over Power E980 via the POWER10's matrix math and AI accelerators, and 33% less power usage than the E980 for the same workload. AIX, Linux and IBM i are all supported.

IBM targets its launch hardware at its big institutional customers, and true to form the E1080 can scale up to four nodes, each with four processors, for a capacity of 240 cores (that's 1,920 hardware threads for those of you keeping score at home). The datasheet lists 10, 12 and 15 core parts as available, with asymmetric 48/32K L1 and 2MB of L2 cache per core. Chips are divided into two hemispheres (the 15-core version has 7 and 8 core hemispheres) sharing a pool of 8MB L3 cache per core per side, so the largest 15 core part has 120MB of L3 cache split into shared 64MB and 56MB pools respectively. This is somewhat different from POWER9, which divvies up L3 per two-core slice (but recall that the lowest binned 4- and 8-core parts, like the ones in most Raptor systems, fuse off the other cores in a slice such that each active core gets the L3 all to itself). Compared with Telum's virtual L3 approach, POWER10's cache strategy seems like an interim step to what we suspect POWER11 might have.

I/O doesn't disappoint, as you would expect. Each node has 8 PCIe Gen5 slots on board and can add up to four expansion drawers, each adding an additional twelve slots. You do the math for a full four-node behemoth.

However, memory and especially OMI are what we've been watching most closely with POWER10, because OMI DIMMs have closed-source firmware. Unlike the DDIMMs announced at the 2019 OpenPOWER Summit, the E1080 datasheet specifies buffered DDR4 CDIMMs. This appears to be simply a different form factor; the datasheet intro blurb indicates they are also OMI-based. Each 4-processor node can hold 16TB of RAM for 64TB in the largest 16-socket configuration. IBM lists no directly-attached RAM option currently.
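
For those keeping score, here are the E1080 totals worked out with shell integer arithmetic, using only the figures quoted above. The slot count is a naive upper bound: it ignores any on-board slots that the expansion drawer connections might themselves occupy, which the figures here don't address.

```shell
# Worked E1080 arithmetic from the numbers quoted in this post.
nodes=4; chips_per_node=4; cores_per_chip=15; smt=8
l3_per_core_mb=8; hemi_a=8; hemi_b=7
slots_per_node=8; drawers_per_node=4; slots_per_drawer=12
ram_per_node_tb=16

e1080_summary() {
    cores=$(( nodes * chips_per_node * cores_per_chip ))
    echo "cores: $cores"
    echo "threads: $(( cores * smt ))"
    echo "L3 per chip: $(( hemi_a * l3_per_core_mb ))MB + $(( hemi_b * l3_per_core_mb ))MB = $(( (hemi_a + hemi_b) * l3_per_core_mb ))MB"
    # Naive: every on-board slot plus every drawer slot, nothing reserved.
    echo "PCIe slots: $(( nodes * (slots_per_node + drawers_per_node * slots_per_drawer) ))"
    echo "RAM: $(( nodes * ram_per_node_tb ))TB"
}

# e1080_summary prints:
#   cores: 240
#   threads: 1920
#   L3 per chip: 64MB + 56MB = 120MB
#   PCIe slots: 224
#   RAM: 64TB
```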

IBM is taking orders now and shipments are expected to begin before the end of September. Now that POWER10 is actually a physical product, let's hope there's news on the horizon about a truly open Open Memory Interface in the meantime. Just keep in mind that if you have to ask how much this machine costs you clearly can't afford it, and IBM doesn't do retail sales anyway.

Cache splash in Telum means seventh heaven for POWER11?


AnandTech has a great analysis of IBM's new z/Architecture mainframe processor Telum, the successor to z15 (so you could consider it the "z16" if you like) scheduled for 2022. The most noteworthy part of that article is Telum's unusual approach to cache.

Most conventional CPUs (keeping in mind mainframes are hardly conventional, at least in terms of system design), including OpenPOWER chips, have multiple levels of cache; so did z15. L1 cache (divided into instruction and data) is private to the core and closest to it, usually measured in double-digit kilobytes on contemporary designs. It then fans out into L2, which is also usually private to an individual core and in the triple-digit kilobyte range, and then some level of L3 (plus even L4) cache, which is often shared by an entire processor and measured in megabytes. Cache size and how cache entries may be placed (i.e., associativity) are a tradeoff between the latency of searching a larger cache, die space considerations and power usage, versus the performance advantages of fewer cache misses and reduced use of slower peripheral memory.

While every design has some amount of L1, there certainly have been processors that dispensed with other tiers of cache. Most of Hewlett-Packard's late, lamented PA-RISC processors had no L2 cache at all, with the L1 cache being unusually large in some units (the 1997 PA-8200 had 4MB of total L1, 2MB each for data and instructions). Closer to home, the PowerPC 970 "G5" (derived from the POWER4) carried no L3; the 2005 dual-core 970MP, used in the Power Mac G5 Quad, IBM POWER 185 and YDL PowerStation, instead had 1MB of L2 per core, which was on the large side for that era. Conversely, the Intel Itanium 2 could have up to 64MB of L4 cache, and Haswell CPUs with GT3e Iris Pro Graphics can use the integrated GPU's eDRAM as an L3 victim cache for the same purpose as an L4, though this feature was removed in Skylake. However, the Sforza POWER9 in Raptor workstations is more typical of modern chips with three levels of cache: the dual-8 02CY649 in this machine I'm typing on has 32/32KB L1, 512KB L2 and 10MB L3 for each of the eight CPU cores. In contrast, AMD Zen 3 uses a shared 32MB L3 between up to eight cores, with fewer cores splitting the pot in more upmarket parts.

With money and power consumption being little object in mainframes, however, large multi-level caches rule the day. The IBM z15 processor "drawer" (there are five drawers in a typical system) divides itself into four Compute Processors, each CP containing 12 cores with 128/128K L1 (compare to Apple M1's performance cores with 192/128K) and split 4MB/4MB L2 per core paired with 256MB of shared L3, overseen by a single System Controller which provides a whopping 960MB of shared L4. This gives it the kind of throughput and redundancy expected by IBM's large institutional customers who depend on transaction processing reliability. The SC services the four CPs almost like an old-school northbridge, but to L4 cache instead of main RAM.

Telum could have doubled down on this the way z15 literally doubled down on z14 (twice the L3, nearly half again as much L4), but instead it dispenses with L3 and L4 altogether. L1 jumps to 256/256K and, in shades of PA-RISC, L2 balloons to 32MB per core, with eight cores per chip. Let's zoom in on the die.
The 7nm 530mm² die shows the L2 cache in the centre of the eight cores, which is already a tipoff as to how IBM's arranged it: cores can reach into other cores' cache. If a cache line gets evicted from a core's L2 and the core can find space for it within another core, then the cache line goes to that core's L2 and is marked as L3. This process is not free and does incur more latency than a traditional L3 when an L3 line stored elsewhere must be retrieved, but the ample L2 makes this condition less frequent. And in the case where a core needs data that another core has already evicted into the first core's own L2 as L3, it can simply adopt it. Overall, this strategy means better cache utilization that adapts better to more diverse workloads, because the large total L2 space can be flexibly redirected as "virtual L3" to cores with greater bandwidth demands.

It doesn't stop there, though, because Telum has another trick for "virtual L4." Recall that the z15 uses five drawers in a typical system; each drawer has an SC that maintains the L4 cache. Telum is two chips to a package, with four packages to a unit (the equivalent of a z15 "drawer") and four units to a system. If you can reach into other cores' L2 to use them as L3, then it's a simple conceptual leap to reach into other chips (even in different units) and use their L2 as L4. Again, latency jumps over a more traditional L4 approach, but this means a typical Telum system theoretically has a total of 8GB that could be redirected as L4 (7936MB if you exclude the requesting chip's own 256MB of L2). With 256 cores in this system, there's bound to be room somewhere faster than main memory.
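
The Telum numbers work out as follows, again with shell arithmetic and only the figures above. The point worth making explicit is that virtual L4, seen from any one chip, is the L2 on every other chip, which is why the usable figure is 31 chips × 256MB rather than the full 8192MB.

```shell
# Telum system arithmetic: 8 cores/chip, 2 chips/package,
# 4 packages/unit, 4 units/system, 32MB L2 per core.
telum_summary() {
    cores_per_chip=8; chips_per_pkg=2; pkgs_per_unit=4; units=4; l2_mb=32
    chips=$(( chips_per_pkg * pkgs_per_unit * units ))
    cores=$(( chips * cores_per_chip ))
    total=$(( cores * l2_mb ))
    # Virtual L4 from one chip's point of view: every other chip's L2.
    l4=$(( (chips - 1) * cores_per_chip * l2_mb ))
    echo "chips=$chips cores=$cores total_l2=${total}MB virtual_l4=${l4}MB"
}

# telum_summary prints:
#   chips=32 cores=256 total_l2=8192MB virtual_l4=7936MB
```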

What makes this interesting for OpenPOWER is that z/Architecture and POWER naturally tend to cross-pollinate. (History favours POWER, too. POWER chips already took over IBM i first with the RS64-based A35 and finally with the eCLipz project; IBM AS/400 a/k/a i5/OS a/k/a i hardware used to be its own bespoke AS/400 architecture.) z/Architecture is decidedly not Power ISA but some microarchitectural features are sometimes shared, such as POWER6 and z10, which emerged from a common development process and as a result had similar fabrication technologies, execution units, floating-point units, busses and pipelines.

POWER10 is almost certainly already taped out if IBM is going to be anywhere close to a Q4 2021 release, so whatever influence Telum had on its creation has already happened. But Telum at the microarchitecture level sure looks more like POWER than z15 did: there is no more CP/SC division but rather general purpose cores in a NUMA topology more like POWER9, more typical PCIe controllers (in this case PCIe 5.0) for I/O and more reliance on specialized pack-in accelerators (Telum's headline feature is an AI accelerator for SIMD, matrix math and fast activation function computation; no doubt some of its design started with POWER10's own accelerator). Frankly, that reads like a recipe for POWER11. While a dual-CPU POWER11 workstation might not have much need for L4, the "virtual L3" strategy could really pay off for the variety of workloads workstations and non-mainframe servers have to do, and on a four or eight-socket server, the availability of virtual L4 starts outweighing any disadvantage in latency.

The commonalities should not be overstated, as Telum is also "only" SMT-2 (versus SMT-4 or SMT-8 for POWER9 and POWER10) and the deep 5GHz-plus pipeline the reduced SMT count facilitates doesn't match up with the shorter pipeline and lower clockspeeds on current POWER generations. But that's just part of the chips being customized for their respective markets, and if IBM can pull this trick off for z/Architecture it's a short jump to making the technology work on POWER. Assuming we don't have OMI to worry about by then, that could really be something to look forward to in future processor generations, and a genuinely unique advance for the architecture.

Kernel 5.14


Version 5.14 of the Linux kernel has landed. There's not much in PowerPC land this time around beyond a few bug fixes, one of which repairs an issue that can hit certain hashtable-based CPUs (though I don't believe the POWER9 in HPTE mode is known to be affected). There are, however, some privacy-related features: memfd_secret(), which creates a tract of memory that even the kernel itself can't normally look into; a new ioctl for ext4 filesystems to prevent information leaks; and of course core scheduling, which restricts which processes may share cores as extra insurance against Spectre-type attacks (at the cost of less effective utilization, so this is largely of more interest to hosting providers than to what you run on your own box). Other new features of note include burstable "Completely Fair Scheduler" bandwidth control, which allows a task group to roll over unused CPU quota under certain conditions, a cgroup "kill button" feature, and some initial infrastructure for supporting signed BPF programs. Expect this version to appear in Fedora and other "leading edge" distributions soon.
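
Two of those features are driven entirely from the cgroup v2 filesystem, so they're easy to try from a shell: cpu.max.burst sets how much unused quota a group may bank, and writing 1 to cgroup.kill kills every process in the group. A sketch, assuming cgroup v2 mounted at /sys/fs/cgroup, root (or a delegated subtree), and the cpu controller enabled in the parent's cgroup.subtree_control; the group and helper names are hypothetical.

```shell
# Sketch: CFS burst and the cgroup kill button (Linux 5.14+, cgroup v2).
CGROOT=${CGROOT:-/sys/fs/cgroup}

cg_create_burst() {
    # $1 = group, $2 = quota_us, $3 = period_us, $4 = burst_us
    # The kernel rejects a burst larger than the quota, so check up front.
    [ "$4" -le "$2" ] || { echo "burst must not exceed quota" >&2; return 1; }
    mkdir -p "$CGROOT/$1"
    echo "$2 $3" > "$CGROOT/$1/cpu.max"
    echo "$4" > "$CGROOT/$1/cpu.max.burst"
}

cg_add_pid() {
    # $1 = group, $2 = PID to move into it
    echo "$2" > "$CGROOT/$1/cgroup.procs"
}

cg_kill() {
    # Kills every process in the group and its descendants.
    echo 1 > "$CGROOT/$1/cgroup.kill"
}

# Usage: half a CPU (50ms per 100ms) with up to 20ms of banked burst.
#   cg_create_burst buildjob 50000 100000 20000
#   cg_add_pid buildjob $$
#   cg_kill buildjob
```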

OpenPOWER Firefox JIT update


As of this afternoon, the Baseline Interpreter-only form of the OpenPOWER JIT (64-bit little-endian) now passes all of the JIT tests except for the Wasm ones, which are being actively worked on. Remember, this is just the first of the three phases and we need all three for the full benefit, but it already yields a noticeable boost in my internal tests over the C++ interpreter. The MVP is Baseline Interpreter and Wasm, so once it passes the Wasm tests as well, it's time to pull it current with 91ESR. You can help.

Debian 11


Debian 11 bullseye is officially released, the latest stable version and the "other white meat" of the two big distros I suspect are commonly used on OpenPOWER workstations (Fedora being the other, and Ubuntu third). Little-endian 64-bit Power ISA (ppc64el) has been a supported architecture for Debian since 8 jessie. The updates are conservative but important, which is what you're looking for if you run Debian stable, such as kernel 5.10, GNOME 3.38, KDE Plasma 5.20, LXDE 11, LXQt 0.16, MATE 1.24, and Xfce 4.16, plus gcc 10.2 and the LLVM/Clang 9 and 11 toolchains. ISOs are already available on the mirrors. If you've updated, post your impressions in the comments.

Firefox 91 on POWER fur the fowk


Firefox 91 is out. Yes, it further improves cookie isolation and cleanup, has faster paint scheduling (noticeably, in some cases), and new JavaScript and DOM support. But for my money, the biggest news is the Scots support: aye, laddie, noo ye kin stravaig the wab lik Robert Burns did. We've waited tae lang fur this.

Anyway, Firefox 91 builds oot o the kist oa, er, Firefox 91 builds out of the box on OpenPOWER using the same .mozconfigs as Firefox 90; I made a wee change to the PGO-LTO patch since I messed up the diff the last time and didn't notice. The crypto issues in Fx90 are fixed in this release.

Meanwhile, the OpenPOWER JIT is now passing all but a handful of the basic tests in Baseline Interpreter mode, and some amount of Wasm, though this isn't nearly as far along. Ye kin hulp.

Tonight's game on OpenPOWER: System Shock Enhanced Edition


Yeah, I know we're doing a lot of FPSes in this series. It's what I tend to play, so deal. Tonight we'll be playing System Shock, the classic hacker-shooter (seems appropriate), courtesy of Shockolate, which adds higher resolutions, better controls, mouselook and OpenGL support. Our drug dealers at GoG, who don't pay us a cent for this kind of shameless plug and really ought to, make the game files easily available as System Shock Enhanced Edition. However, you can also use the DOS or Windows 95 CD-ROM; I tested with both. (I'll talk about the Macintosh release in a moment.)

Shockolate requires CMake and SDL2, and FluidSynth is strongly advised. Don't let Shockolate build with its bundled versions: edit CMakeLists.txt and change all "BUNDLED" libraries to "ON" (don't forget the quote marks). Once set, building should work out of the box (tested on Fedora 34):

mkdir build
cd build
cmake ..
make -j24 # or as you like
cd ..
ln -s build/systemshock systemshock

(The last command is to make running the binary a little more convenient.)
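
If you'd rather not hand-edit CMakeLists.txt, the same change can be made with sed and folded into the build steps. A sketch, assuming GNU sed and that every bundled option is spelled with the literal quoted string "BUNDLED", as described above; the function names are my own.

```shell
# Sketch: de-bundle Shockolate's libraries, then build as above.
shockolate_unbundle() {
    # $1 = Shockolate source dir; turns every "BUNDLED" into "ON".
    sed -i 's/"BUNDLED"/"ON"/g' "$1/CMakeLists.txt"
}

shockolate_build() {
    # $1 = source dir, $2 = parallel jobs (default: number of CPUs)
    (cd "$1" &&
     mkdir -p build && cd build &&
     cmake .. && make -j"${2:-$(nproc)}" &&
     cd .. && ln -sf build/systemshock systemshock)
}

# Usage:
#   shockolate_unbundle ~/src/shockolate && shockolate_build ~/src/shockolate 24
```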

Now we need to provide the resources. For FluidSynth, you'll need a soundfont (I used the default that comes with Fedora's package). If you have the DOS/Windows CD-ROM, insert it now. We will assume it is mounted at /run/media/censored/EA.

mkdir res
cd res
ln -s /usr/share/soundfonts/default.sf2 default.sf2
cp -R /run/media/censored/EA/hd/data .
cp -R /run/media/censored/EA/hd/sound .
chmod -R +w . # if copying from CD makes things read only
cd data
rm -f intro.res
rm -f objprop.dat
cp /run/media/censored/EA/cdrom/data/* .
cd ../..

Then start the game with ./systemshock. The resolutions and choice of renderer (software or OpenGL) are set from the in-gameplay menu (press ESC). Shockolate also implements WASD motion (as well as the classic arrow keys) and F to toggle mouselook. Note that OpenGL is somewhat darker than software mode. It's not clear if this is actually a bug.

Playing System Shock Enhanced Edition in Shockolate is simply a more convenient way to get the DOS assets, since Shockolate only uses those and not any of the patches (more about this in a second); gameplay and features are the same. Also, GoG only distributes it as a Windows installer and the file structure is a bit different. Use innoextract to break the installer EXE apart into a separate directory and delete everything but sshock.kpf, which is a cloaked ZIP archive containing the game assets. In your Shockolate source directory (note that this also creates res/, so if you did the steps above delete it first),

mkdir ssee
cd ssee
unzip /path/to/sshock.kpf
cd ..
mkdir res
mv ssee/res/pc/hd/data res
cp ssee/res/pc/cdrom/data/* res/data/
mv ssee/res/pc/hd/sound res
rm -rf ssee # if you want
ln -s /usr/share/soundfonts/default.sf2 res/default.sf2
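
Both routes should leave the same res/ layout behind, so a quick check before launching can save a confusing startup failure. A sketch, assuming innoextract is installed and that sshock.kpf turns up somewhere in its output; the helper names are hypothetical.

```shell
# Sketch: locate sshock.kpf in the GOG installer, and sanity-check res/.
ssee_find_kpf() {
    # $1 = GOG installer .exe, $2 = scratch dir; prints the path to sshock.kpf
    innoextract -d "$2" "$1" >/dev/null && find "$2" -iname sshock.kpf | head -n 1
}

res_check() {
    # $1 = Shockolate dir; lists anything missing and fails if so.
    # -e follows the default.sf2 symlink, so a dangling link counts as missing.
    ok=0
    for p in res/data res/sound res/default.sf2; do
        [ -e "$1/$p" ] || { echo "missing: $p" >&2; ok=1; }
    done
    return $ok
}

# Usage:
#   unzip "$(ssee_find_kpf setup_system_shock.exe /tmp/ssee-inno)" -d ssee
#   res_check . && ./systemshock
```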

Then start the game with ./systemshock.

Oddly, although Shockolate was based on the (IMHO) superior Power Mac release, it doesn't seem to properly support its higher-resolution assets (SSEE does and includes a converted set, but the source for that, unlike Strife, isn't currently available). I actually own this version also. One rather unique reason to own it is because the cutscenes and audio files are all playable in QuickTime, so if you don't feel like slogging through the entire game you can just listen to the audio logs or go straight to the ending using a Mac emulator. However, you need to do a little song and dance to mount the HFS volume on Linux (as root):

losetup /dev/loop0 /dev/sr0 # or where your drive is
partx -av /dev/loop0

This will respond with something like

partition: none, disk: /dev/loop0, lower: 0, upper: 0
/dev/loop0: partition table type 'mac' detected
range recount: max partno=2, lower=0, upper=0
/dev/loop0: partition #1 added
/dev/loop0: partition #2 added

and you should see it mount in your desktop environment (note that many applications won't understand the resource fork). When you're finished, detach the loop device with losetup -d /dev/loop0 (or losetup -D, which detaches every loop device) before ejecting the physical disc. As a parenthetical note, since SSEE is presumably derived from the GPL-released Mac source code, you would think it, too, would be GPL. But I'm uncertain of the exact history there.

Salt the fries.

Rest in peace, Itanium


Intel has now ceased official support for IA-64 and will make no more Itanium processors (which matters really only to HPE). It's sad to see another processor architecture disappear, even one that seemed to be as universally unloved as this one, so consider this post a brief epitaph for yet another drop in computing diversity. If you want choices other than the cold, sterile x86-ARM duality, they won't happen if you keep buying the same old stuff.

Firefox 90 on POWER (and a JIT progress report)


Firefox 90 is out, offering expanded and improved software WebRender (not really a problem if you've got a supported GPU as most of us in OpenPOWER land do, though), an enhanced SmartBlock which ups the arms race with Facebook, and private fields and methods in JavaScript among other platform updates. FTP is now officially and completely gone (and really should be part of registerProtocolHandler as Gopher is), but at least you can still use compact layout for tabs.

Unfortunately, a promising OpenPOWER-specific update for Fx90 bombed. Ordinarily I would have noticed this with my periodic smoke-test builds, but I've been trying to continue work on the JavaScript JIT in my not-so-copious spare time (more on that in a moment), so I didn't notice this until I built Fx90 and no TLS connection would work (they all abort with SSL_ERROR_BAD_SERVER). I discussed this with Dan Horák, and the official Fedora build of Firefox seemed to work just fine, including when I did a local fedpkg build. After a few test builds over the last several days I determined the difference was that the Fedora Firefox package is built with --with-system-nss to use the NSS included with Fedora, so it wasn't using whatever was included with Firefox.

Going to the NSS tree I found bug 1566124, an implementation of AES-GCM acceleration for Power ISA. (Interestingly, I tried to write an implementation of it last year for TenFourFox FPR22 but abandoned it since it would be riskier and not much faster with the more limited facilities on 32-bit PowerPC.) This was, to be blunt, poorly tested and Fedora's NSS maintainer indicated he would disable it in the shipping library. Thus, if you use Fedora's included NSS, it works, and if you use the included version in the Firefox tree (based on NSS 3.66), it won't. The fixes are in NSS 3.67, which is part of Firefox 91; they never landed on Fx90.

The two fixes are small (to security/nss/lib/freebl/ppc-gcm-wrap.c and security/nss/lib/freebl/ppc-gcm.s), so if you're building from source anyway the simplest and highest-performance option is just to include them. (And now that it's working, I do have to tip my hat to the author: the implementation is about 20 times faster.) Alternatively, Fedora 34 builders can still just add --with-system-nss to their .mozconfig as long as the nss-devel and nspr-devel packages are installed, or a third workaround is to set NSS_DISABLE_PPC_GHASH=1 before starting Firefox, which disables the faulty code at runtime. In Firefox 91 this whole issue should be fixed. I'm glad the patch is done and working, but it never should have been committed in its original state without passing the test suite.
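
The two non-patch workarounds look like this from a shell. This is a sketch: the environment variable and mozconfig option are the ones named above, while FIREFOX (defaulting to firefox on your PATH) and the helper names are assumptions.

```shell
# Sketch: runtime and build-time workarounds for the Fx90 NSS GHASH bug.
firefox_no_ppc_ghash() {
    # Disables the faulty Power ISA GHASH code in the bundled NSS at runtime.
    NSS_DISABLE_PPC_GHASH=1 "${FIREFOX:-firefox}" "$@"
}

use_system_nss() {
    # $1 = mozconfig to edit; adds the option only once. On Fedora this needs
    # the system NSS/NSPR development packages (nss-devel, nspr-devel).
    grep -qxF 'ac_add_options --with-system-nss' "$1" ||
        echo 'ac_add_options --with-system-nss' >> "$1"
}

# Usage:
#   firefox_no_ppc_ghash &
#   use_system_nss ~/mozconfigs/pgo-lto
```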

Another issue we now have a better workaround for is bug 1713968, which causes errors building JavaScript with gcc. Fedora wasn't having any problem with it because its rather voluminous generated .mozconfig, amongst other things, uses -fpermissive. That beats making minor hacks to the source, so it's now in the .mozconfigs I'm using. I also made a minor tweak to the PGO-LTO patch so that it applies cleanly. With that, here are my current configurations:

Debug

export CC=/usr/bin/gcc
export CXX=/usr/bin/g++

mk_add_options MOZ_MAKE_FLAGS="-j24" # as you like
ac_add_options --enable-application=browser
ac_add_options --enable-optimize="-Og -mcpu=power9 -fpermissive"
ac_add_options --enable-debug
ac_add_options --enable-linker=bfd

export GN=/home/censored/bin/gn # if you have it

PGO-LTO Optimized

export CC=/usr/bin/gcc
export CXX=/usr/bin/g++

mk_add_options MOZ_MAKE_FLAGS="-j24" # as you like
ac_add_options --enable-application=browser
ac_add_options --enable-optimize="-O3 -mcpu=power9 -fpermissive"
ac_add_options --enable-release
ac_add_options --enable-linker=bfd
ac_add_options --enable-lto=full
ac_add_options MOZ_PGO=1

export GN=/home/censored/bin/gn # if you have it
export RUSTC_OPT_LEVEL=2

So, JavaScript. Since our last progress report, our implementation of the Firefox JavaScript JIT (the minimum viable product of which will be Baseline Interpreter + Wasm) has become able to run scripts of significant complexity, but it's still mostly a one-man show, and I'm currently struggling to fix certain optimized calls to self-hosted scripts (notably anything that calls RegExp.prototype.* functions, which goes into an infinite loop and hits the recursion limit). There hasn't been any activity in the last week, partly because I'd rather not commit speculative work yet and partly because of the time I wasted tracking down the TLS problem above. The MVP will be considered "V" when it can pass the JavaScript JIT and conformance test suites, and it's probably a third of the way there. You can help. Ask in the comments if you're interested in contributing. We'll get this done sooner or later because it's something I'm motivated to finish, but it will go a lot faster if folks pitch in.

Intel might fab your next POWER chip


The Wall Street Journal is reporting that Intel is in talks to buy the manufacturing assets of GlobalFoundries ("GloFo") for potentially as much as US$30b (via Reuters). Much of the amusement on the tech side of the Internet comes from GloFo having started life in 2008 as AMD's divested manufacturing arm, an admittedly entertaining historical irony, but GloFo is important to us in OpenPOWER-land too because they manufacture the POWER9.

GloFo's fabbing of POWER and OpenPOWER chips comes from a July 2015 deal with IBM in which IBM basically paid GloFo US$1.5b to take over and operate its U.S.-based chip manufacturing business unit. In return, GloFo would provide chips to IBM through 2025. The POWER9 in your servers and workstations was initially manufactured at IBM's former East Fishkill, NY plant, which became GlobalFoundries Fab 10 and was sold to ON Semiconductor in 2019, and is now produced at GloFo Fab 9, IBM's previous plant in Essex Junction, VT.

This arrangement was odd even at the time, and admittedly wouldn't have been IBM's first choice, but semiconductor production has obvious national security implications and regulators wouldn't let IBM just sell off or shut down its entire domestic manufacturing arm. Predictably, the deal doesn't seem to have gone well. In a lawsuit last month, IBM alleged that by fall 2015 GlobalFoundries told IBM it wouldn't proceed further on development of a 10nm process and would instead move to 7nm, jeopardizing the POWER9, which was originally supposed to be produced at 10nm. IBM claims that despite what it views as a contract breach, it nevertheless continued the payout so GloFo could bring up a 7nm process instead by Q3 2018, which didn't happen either. At that point IBM says GloFo asked it for another $1.5b, which IBM refused, and GloFo suspended further development. The POWER9 was eventually manufactured with a 14nm node size using FinFET technology instead.

Intel's interest in GloFo is to secure a manufacturing pipeline for its silicon in the midst of the global chip shortage and, as CEO Pat Gelsinger voiced earlier this year, to get a piece of the chip manufacturing business from companies other than itself. Although GloFo holds only about 7% of the market, it has a customer base and support infrastructure that Intel completely lacks, and even if GloFo's current node size is not highly competitive, there is still a lot of money to be made at larger process sizes for less performance-sensitive applications. Chipzilla certainly has its own fabs, but their technology has not been nearly as advanced as, say, TSMC's, and they cater almost exclusively to Intel's in-house designs. However, AMD still relies on its former facilities at GloFo for its own chips, which would certainly attract regulatory scrutiny if Intel were to control its supply chain. The deal does not seem to involve GloFo the company, but it would have to involve its physical assets and operations to be at all worth the headache.

The POWER9 is still made at Fab 9, but IBM, chastened by its problems with GloFo, has turned to Samsung and its 7nm process for POWER10. However, Intel has a strong interest in shrinking its node size, and although the numbers have lately become more and more meaningless, Cannon Lake, Tiger Lake and others are "still" at 10nm, whatever that means. Plus, Intel would presumably gain control of IBM's old facilities, which IBM would still know relatively well, and while IBM doesn't have the volume of other chip designers anymore, it's still considered a significant player and its parts are higher-end. AMD may not like its next chips being fabbed by Intel, but IBM may not have a problem with it, and if Samsung can't deliver on POWER10 after all, stranger things have happened.

Chimera Linux: if you like BSD and Linux, why not both


A new Linux option for ppc64le popped up on my radar today, and this one's really interesting: if you like the way FreeBSD does business, but you want a Linux kernel, now you don't have to choose. Chimera Linux gives you a FreeBSD userland with no GNU components in the base system except make and ncurses (for some of the readership this will be a feature), plus the ability to bootstrap itself from any musl-based distro like Alpine or Adelie or Void PPC's musl variant. ppc64le, aarch64 and x86_64 are the three launch architectures, making OpenPOWER a first-class citizen from the very beginning, and they promise portability to any architecture that has LLVM/clang available.

There are some important questions yet to answer, however, and the distro is clearly not yet ready for prime time. There's no init system yet (let's hope it's not systemd, because that would really be an unholy union) and there's not even a kernel, so I can't tell you what it runs; presumably it uses the kernel of the distro you bootstrap it on. For that reason, don't even think about asking for a bootable ISO. The source package build system is also custom, and it wouldn't surprise me if it showed rough edges for a while.

The other question mark is LLVM: Chimera relies on clang, not gcc. clang works for a lot of things on ppc64le, but at least in my usage it doesn't properly work for some large projects like Firefox, and its performance is marginally poorer. That's undoubtedly no problem on the other two architectures, which are the primary focus of LLVM and clang; OpenPOWER isn't.

Still, I think there's real potential for this project, quite possibly for people who would ordinarily be attracted to Slackware's philosophy but are more used to the way BSDs do business. (As a product of the University of California, I have great empathy for this viewpoint.) And there's already precedent: macOS (née Mac OS X) is a Mach kernel, but with a BSD userland, and look at how successful that concept was. Sometimes the best choice is the one you don't have to end up making.

Bloody Friday at IBM?


The conspiracy theorist in me wonders about the timing of announcing a large-scale shift in IBM executive staff ... on a Friday going into a United States holiday weekend. That's the kind of press release that tries very hard to say "nothing to see here," especially one with the anodyne title of "IBM Leadership Changes."

But of course there is something to see here. Cutting through nonsense like "our hybrid cloud and AI strategy is strongly resonating" (remember, things break if they resonate too much), the biggest thing to see is IBM President Jim Whitehurst stepping down. Whitehurst was, of course, the CEO of Red Hat prior to its acquisition by Big Blue, and he lasted just over a year as IBM President (compared with over twelve years at Red Hat). He will now be working as a "Senior Advisor" to IBM CEO and chairman Arvind Krishna, which is a nice way of saying a paid sinecure until he finds another gig, and it's unclear if he will get any further payments on his US$6 million retention bonus. This is what it looks like when someone is pushed out, presumably over fax given the current IBM E-mail migration mess.

Whitehurst wasn't the only executive blood spilled: the senior VP of global markets is also out (not to be confused with Global Technology Services), now similarly moved to an executive sinecure for "special projects," and the former senior VP of IBM systems and hardware has either been laterally moved or outright demoted to senior VP of "cloud and cognitive software." Various other new names are coming on board, too. It really reads like Krishna is cleaning house.

Is this reshuffle part of the bootup of the cloud-focused IBM "NewCo", now called the vaguely unpronounceable Kyndryl? Technically NewCo-Kyndryl doesn't exist until the end of this year, and none of the officers Kyndryl announced yesterday appear among the names in IBM's announcement today. But it would be surprising for any of the cloud operations to remain at IBM "OldCo," which is supposed to keep the legacy software and hardware operations ... unless the reasoning for the split was just a load of hooey and it was never the intention for either half of the company to retain exclusive control over its respective market interests. That probably bodes more ill for Kyndling, er, Kyndryl than it does for its parent corporation. Either way, it certainly looks like Red Hat proved the old joke is true: anything plus IBM ends up equalling IBM.