Showing posts from 2020

Raptor suspends shipments due to COVID-19


UPDATE from Tim Pearson: "It looks like with the latest guidance released from the state we can in fact continue to operate mostly normally. Bundles/mainboards/etc. will continue to ship from stock, but any unusual options on built systems or built systems that require parts depleted from stock would have a TBD ship date until the parts can be brought into stock." Good news! Buy that Blackbird bundle after all!

Just in case you were planning on that Blackbird: because of Illinois' "shelter-in-place" order due to the COVID-19 pandemic, Raptor is announcing they are unable to ship. Existing orders (like my order for a second dual-8 Talos II) cannot be refunded or cancelled currently, not that I plan to, but don't make the order unless you are prepared to wait. In addition, tickets and phone support are also suspended, though their IT hosting services and community support are unaffected.

Still, if you have the money to spare, we don't want these folks to go under. Maybe spin up a VM to play with.

Combat coronavirus. Buy a Blackbird.


There hasn't been a lot of content on Talospace lately because of how exponentially busier my particular dayjob has gotten and bluntly I'll just say the Chinese government has a lot to answer for in the way they've handled all of this. I'm not going to belabour the obvious impact of COVID-19 on health, but it's also similarly roiling financial markets and a lot of businesses are going to fail the longer this goes on as supply chains get perturbed, especially for imported components.

Any contraction, but in particular one that has a high probability of being prolonged, is bad for niche computing markets like ours. Global contractions like the one that's hitting and the persistent one that's certain to follow disproportionately affect big-ticket items as people hold onto cash. (A finance colleague of mine is convinced there's about to be a liquidity crunch, and he makes a compelling argument I won't repeat here.) Unfortunately, largely because our sorts of systems lack various economies of scale, these are the systems most likely to die off if companies like Raptor go under because they're the ones far fewer people are buying.

This should not be interpreted as me telegraphing some belief that Raptor is financially unsound (they're a private company anyhow; I have no idea what their balance sheet looks like), but I'm rather invested in OpenPOWER and I wouldn't want it to be a casualty of the economic hit we're likely to take until a COVID-19 vaccine or effective treatment exists. And right now Raptor are the ones actually shipping workstations and I want that to continue.

I myself have already pulled the trigger on a second Talos II, upgrading to a dual-8 POWER9 DD2.3 system instead of the dual-4 DD2.2 I have now. I'll combine the RAM and cards, leaving me with a spare board, CPUs and chassis which can be a source for parts or to replace my aging POWER6 frontline server when it reaches my planned end of service period in a few years. I'm looking forward to it arriving, and I'm sure the extra few thousand would be appreciated by the back office to keep folks on the payroll getting systems configured and shipped.

If you're dithering over whether you want to pick up that Blackbird and you've got the cash or plastic on hand, consider just going ahead and doing it. Even just consider picking up a board and CPU if you need to save a little extra. Support the small businesses that support us. If this whole thing goes like we worry it's gonna, we're all going to be spending a lot more time at home anyway, so you might as well do it with a computer you can trust.

Firefox 74 on POWER


So far another uneventful release on ppc64le; I'm typing this blog post in Fx74. Most of what's new in this release is under the hood, and there are no OpenPOWER specific changes (I need to sit down with some of my other VMX/VSX patches and prep them for upstream). The working debug and optimized .mozconfigs are unchanged from Firefox 67.

Jon Masters, transparent ARM shill


I don't hate ARM. But I do hate cynical bloodymindedness.

Jon Masters' pronouncement of OpenPOWER as "dead" has been getting some press, and as far as this particular Power ISA bigot is concerned it's transparent twaddle. He's done a lot for ARM at Red Hat (lest we forget: a current subsidiary of IBM), but he's no longer at Red Hat: he's VP of Software at startup NUVIA, which is building ... a server-grade ARM chip. Knowing what he's planning on selling, that makes his "hot take" on OpenPOWER more than a little bit coloured by his own biases.

Despite being self-serving, though, not everything he points out is wrong. One valid concern is that currently the only manufacturer of high-performance OpenPOWER chips is IBM itself. We are fortunate in that Raptor is an accessible retail channel for these chips (and workstation-class systems), but most of the third-party builders are using Power in embedded applications, not high-performance desktops. Even in the Apple days the chip sources were pretty much just IBM and Motorola/Freescale, and for the G5 exclusively IBM (the brief existence of the PA6T notwithstanding); with the exception of Cell, the Power ISA game console generation was exclusively IBM too (i.e., Xenon, Gekko, Broadway and Espresso), and even Cell was an IBM co-design, so this is not a new issue. This is something that needs to be fixed and thanks to OpenPOWER being a royalty-free ISA there's a market opportunity here you don't have to pay IBM to exploit.

But to essentially argue it's okay to be open, but not that open is painfully self-serving. ARM can certainly compete in the server space; Apple's chips are already in striking distance even with their imposed limits on power consumption, and other companies have gotten into this business before. But none of them will be able to do it without paying ARM royalties, and with that investment in mind none of them want to do it without secret sauce (binary blob drivers) to deter competition. We're in a CPU age where what people think is the CPU is merely the target of a long line of intermediate operating steps and every one of these has firmware. On the Talos II I'm typing on, I can see the source code for every single boot stage. For Masters to argue that none of this matters until you pass into UEFI is like arguing that the Intel Management Engine, bless its little exposed backside, is somehow irrelevant, or that all the boot stages for POWER9 don't matter until you actually get to Petitboot, let alone all the sidecar auxiliary units like the GPEs and OCCs. Do we really need to go over again all the disastrous faults that have emerged in blackbox firmware you can't see or modify?

Masters knows this, too, and that makes his statements not just crap but disingenuous crap as well. (Perhaps he sees OpenPOWER as a threat?) Regardless, that also means you can confidently expect that NUVIA CPUs, if they ever even come out with a product (see also Calxeda), will be just as locked down as any other ARM core. So much for "reimagining silicon design."

Messing with the new 2.0 BMC


Tonight's attempt to upgrade the Blackbird to the new 2.0 BMC firmware did not go smoothly, though some of this was self-inflicted, and I'm also still flattened by whatever hellish virus has gripped me for the last month (it's not coronavirus, or at least not that coronavirus) which leaves me with little tolerance for glitches. TL;DR: the firmware basically works, but when I used it to reconfigure its IP address the BMC now can't see anything and nothing can see it, which has left me in somewhat of a foul mood [but see postscript]. I'll get it working when I'm feeling better and you should probably still update, but beware of these pitfalls.

Updating to 2.0 from any pre-2.0 version requires a complete flash of the BMC. Raptor warns against this generally because all your U-Boot/firmware settings will be reset, but in this case it's unavoidable. That brought us to the first problem: when sshed into the BMC, at the root prompt fw_printenv is supposed to show you the IPMI MAC address so you can reprogram it. On this Blackbird, however, it showed absolutely bupkis except for the serial port settings. After a brief moment of panic I realized I had a picture of the mainboard from the Blackbird semi-review and could enter it from that. Otherwise you'll have to drag the machine out, open it up and jot down the address printed on the board. Oddly, this does not reset anything else, including the BMC password or actual network settings. More about that in a moment. It did, however, change the ssh key.

Now updated, since my other main systems are Power Macs (and what better computer to be a "service processor" to your Blackbird than another PowerPC?), I decided to do further configuration through the BMC's new web interface in TenFourFox, which is essentially Firefox 45 with a lot of patches. This was done on my iBook G4 service laptop running the latest beta hot off the G5 in the backroom.

The first thing to keep in mind is that the certificate is self-signed. No biggie, just expect it.

The webapp appears to be written in Angular, and it's using JavaScript too recent for TenFourFox (which admittedly doesn't get along well with current React or AngularJS frameworks). Some stuff does work -- the IPMI sensor data loads -- but does not automatically update, and the server status never appeared. It might have been nicer to have a better fallback, especially for the NoScript people, so that data can be displayed even if it won't update until one reloads the page.

It didn't appear here either, even when directly queried.

Fortunately my main use case was to upload firmware through the web interface, so I decided to immediately update the PNOR (both out of necessity and as a useful test), and that worked. Just unpack the archive and upload the subarchive in the web_ipmi folder (the server will automatically unpack the .tar.gz and make the firmware available). TenFourFox threw a weird error at the end but the firmware uploaded, was verified, and could be activated.

IPL, showing the fans coming up. You can boot through the interface, but I just pushed the power button since I was sitting next to it.

The serial port output did not work on TenFourFox either, so I did it from Firefox on the MacBook Air, which I found technically disgusting but worked rather well. Fedora will happily run on the serial port. I was able to log in and look around from the BMC itself. Yes, using TenFourFox was a self-inflicted wound, but I thought it would have worked better than it did.

At this point I decided that I'd had enough mucking around with the Blackbird over WiFi and decided to give it a new static IP through the web interface and run it to the iBook over Ethernet. I did this from the Air, just in case the iBook screwed it up. The machine obligingly accepted the settings and then stopped responding on any address, even after a power cycle. Tomorrow I'll try to find a serial connector to talk to the board directly and try to start over from scratch. I would finalize your network settings before this update, as you probably should anyway.

I haven't tried doing this update on the Talos II. I might not anyway since my tax refund should be arriving and I'll be upgrading to a dual-8 system soon. I can't imagine there's much difference in firmware experience between the two systems, though.

The moral of the story is don't update firmware when you're ill.

POSTSCRIPT: ipmitool saves the day! After an obscure mention in an IBM technical manual I was reading for another purpose, it dawned on me that Petitboot (or, for that matter, Fedora) can set the BMC's address.

I started up the Blackbird and went into the Petitboot shell. A quick ipmitool lan print 1 showed what the problem was: the new web BMC interface claimed it had removed the old ZeroConf IP address, but had not, and that became the IP. Since the netmask was now all munged, nothing could see it and it couldn't see anything. So I forced the issue:

ipmitool lan set 1 ipsrc static
ipmitool lan set 1 ipaddr [your IPv4 address]
ipmitool lan set 1 netmask [your IPv4 netmask]
ipmitool lan set 1 defgw ipaddr 0.0.0.0

A quick powercycle confirmed it stuck, and the web BMC answers correctly on the expected address. I still consider this a bug in web BMC but fortunately it's recoverable without digging out the serial cable.
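If you want to sanity-check the values before handing them to the BMC (a munged netmask is what caused the trouble in the first place), something like the following can help. This is only a sketch: the function names is_ipv4 and bmc_static_cmds are made up for illustration, the addresses are example values, and it prints the same four ipmitool commands shown above rather than running them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validate the values, then PRINT (not run) the
# ipmitool commands for a static BMC address.

# Succeeds only for a dotted quad with every octet in the range 0-255.
is_ipv4() {
    local IFS=. octet
    local -a parts
    [[ $1 =~ ^([0-9]{1,3}\.){3}[0-9]{1,3}$ ]] || return 1
    read -r -a parts <<< "$1"
    for octet in "${parts[@]}"; do
        (( 10#$octet <= 255 )) || return 1
    done
    return 0
}

# Arguments: channel, address, netmask, optional gateway (default 0.0.0.0).
bmc_static_cmds() {
    local channel=$1 addr=$2 mask=$3 gw=${4:-0.0.0.0}
    is_ipv4 "$addr" && is_ipv4 "$mask" && is_ipv4 "$gw" || {
        echo "refusing: bad address, netmask or gateway" >&2
        return 1
    }
    printf 'ipmitool lan set %s ipsrc static\n' "$channel"
    printf 'ipmitool lan set %s ipaddr %s\n' "$channel" "$addr"
    printf 'ipmitool lan set %s netmask %s\n' "$channel" "$mask"
    printf 'ipmitool lan set %s defgw ipaddr %s\n' "$channel" "$gw"
}

# Example values only; substitute your own network's settings.
bmc_static_cmds 1 192.168.1.50 255.255.255.0
```

This uses bash features that the Petitboot shell may not have, so it is probably easiest to generate the commands on another machine and type them in at the Petitboot prompt.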

The BMC is getting new tricks


UPDATE: And it's out! Talos II, Blackbird. I'll do the Blackbird update this weekend first. The web view of the serial port (for Petitboot) is particularly nice for those of us without GPU blobs in our firmware.

On Twitter Raptor is teasing an upcoming new BMC build for all Raptor family systems (Talos II, T2 Lite and Blackbird) offering web-based environmental monitoring and firmware updates. I'm a command line jockey myself and I didn't find the previous SSH-based means too onerous, but a web-accessible environmental monitoring system could be quite useful for centralized setups (especially if there's an API or some other means to scrape the data into a dashboard). Since this data is directly served from the BMC, it would be more complete than what ibmpowernv offers now and being directly pushed should be faster than ipmitool which can take several seconds to gather information (see our DIY GNOME IPMI fan tool for a real-world example). If you can't wait, it looks like this code is publicly available in Raptor's git tree right now, but you'll have to build it yourself (you should anyway) since there don't appear to be beta builds just yet.

Firefox 73 on POWER


... seems to just work. New in this release are better dev tools and additional CSS features. This release includes the fix for certain extensions that regressed in Fx71, and so far seems to be working fine on this Talos II. The debug and optimized mozconfigs I'm using are, as before, unchanged from Firefox 67.

DIY IPMI fans


It was pointed out on the Raptor discussion board that the ibmpowernv hwmon module doesn't report fan speed for Raptor family systems, and I suspect this is true for most things based on the Romulus reference design (you can only see the fan of the graphics card, and of course only if it's installed). This means most of the GNOME shell extensions to display system status won't display it. However, it is accessible by talking to the BMC over IPMI, so you should be able to get it that way. Here's a quick-and-dirty method to put your Blackbird or T2 fan speed(s) into your GNOME shell (and probably works fine for other systems with IPMI-accessible fans). This is using Fedora 31; adjust for your distro to taste.

  1. First, verify that you do have fans. You'll need to do this as root: sudo ipmitool sdr type fan

    This will show output like this, after a couple seconds:

    fan0   | DDh | ok  | 29.1 | 2100 revolution
    fan1   | DEh | ok  | 29.2 | 2100 revolution
    fan2   | DFh | ok  | 29.3 | 1900 revolution
    fan3   | E2h | ok  | 29.4 | 2000 revolution
    fan4   | E3h | ok  | 29.5 | 1700 revolution
    fan5   | E4h | ok  | 29.6 | 1700 revolution
    fan6   | E5h | ns  | 29.7 | Disabled
    
  2. We don't want to have to constantly query the BMC as root, so create an ipmi group and put yourself in it (with vigr, vigr -s and vipw as needed). Log out and log back in, and check groups to make sure you have ipmi privileges.

  3. Create a udev rule to make IPMI group-accessible by our new group ipmi. In /etc/udev/rules.d/99_my.rules, I have

    # allow ipmi to be seen by ipmi group
    KERNEL=="ipmi*", GROUP="ipmi", MODE="0660"

    Restart your system to make sure this sticks, and/or chgrp ipmi /dev/ipmi0 ; chmod 0660 /dev/ipmi0 to make the change live. You should now be able to just do ipmitool sdr type fan as your IPMI-group user.
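Once the permissions are sorted, it can be handy to boil that table down to just the numbers. Here is a minimal sketch using awk. It assumes the pipe-separated layout shown in step 1; sample_fans and fan_rpms are names invented for this example, and the sample data is simply the listing from above so the snippet runs even on a machine without a BMC.

```shell
#!/usr/bin/env bash
# Sample data copied from the listing in step 1. On a real system you would
# instead pipe in the output of:  ipmitool sdr type fan
sample_fans() {
cat <<'EOF'
fan0   | DDh | ok  | 29.1 | 2100 revolution
fan1   | DEh | ok  | 29.2 | 2100 revolution
fan2   | DFh | ok  | 29.3 | 1900 revolution
fan3   | E2h | ok  | 29.4 | 2000 revolution
fan4   | E3h | ok  | 29.5 | 1700 revolution
fan5   | E4h | ok  | 29.6 | 1700 revolution
fan6   | E5h | ns  | 29.7 | Disabled
EOF
}

# Print "name rpm" for every sensor whose status column reads "ok".
fan_rpms() {
    awk -F'|' '
        # Trim leading and trailing blanks from every column.
        { for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i) }
        # Column 5 looks like "2100 revolution"; keep only the number.
        $3 == "ok" { split($5, reading, " "); print $1, reading[1] }
    '
}

sample_fans | fan_rpms
```

With real hardware, `ipmitool sdr type fan | fan_rpms` run as your ipmi-group user should give the same kind of list, and doubles as a check that the udev rule took.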

Now that your system is configured, let's actually integrate the output. At some point I'll maybe make this into a full-fledged extension but for prototyping and playing around purposes, there is an easier way: Argos. Though sadly the maintainer is no longer a GNOME user, the extension seems to work fine still for this purpose as of this writing.

  1. Install the Argos GNOME extension if not already done. You may wish to chmod -x ~/.config/argos/argos.sh afterwards to get the demo menu out of your menu bar.

  2. Download this simple script to format the output from ipmitool into Argos BitBar output. Its only dependencies are bash, awk and ipmitool. It gets the IPMI information, caches it (because it's expensive), and then figures out the fastest fan and puts that into the Argos button (click that for all the fans in the system, as shown in the screenshot).

  3. The script goes into ~/.config/argos, but the filename will be based on where you want it and how quickly you want it to update itself. My filename is ipmitool.6r.5s.sh, which says to put it at position six on the right side of the shell bar (this varies depending on what other shell components you have there) and to update it every 5 seconds.

  4. Once you have selected position and interval, chmod +x ~/.config/argos/[filename].sh, Argos will automatically see it, and it will start updating at the interval encoded in the filename. If it's in the wrong place, or you don't like how quickly or slowly it updates, just rename the file and Argos will "do the right thing" live.
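For illustration only, here is a minimal sketch of what a script along those lines could look like. It is not the downloadable script described above: it skips the caching entirely, and the function name argos_fan_menu is made up for this example. It assumes the Argos/BitBar convention that lines before a `---` separator form the button and lines after it form the dropdown, and the same pipe-separated ipmitool layout shown earlier.

```shell
#!/usr/bin/env bash
# Turn "ipmitool sdr type fan"-style lines on stdin into Argos/BitBar output:
# the fastest fan goes on the button, every working fan into the dropdown.
argos_fan_menu() {
    awk -F'|' '
        # Trim leading and trailing blanks from every column.
        { for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i) }
        $3 == "ok" {
            split($5, reading, " ")
            rpm = reading[1] + 0
            names[++n] = $1
            rpms[n] = rpm
            if (rpm > max) max = rpm
        }
        END {
            if (n == 0) { print "fans: n/a"; exit }
            print max " RPM"      # button text
            print "---"           # everything below is the dropdown
            for (i = 1; i <= n; i++) print names[i] ": " rpms[i] " RPM"
        }
    '
}

if command -v ipmitool >/dev/null 2>&1; then
    ipmitool sdr type fan 2>/dev/null | argos_fan_menu
else
    # No ipmitool on this machine: demonstrate with two sample lines.
    printf '%s\n' 'fan0 | DDh | ok | 29.1 | 2100 revolution' \
                  'fan4 | E3h | ok | 29.5 | 1700 revolution' | argos_fan_menu
fi
```

Because every refresh here pays the full cost of an ipmitool query, a short update interval will feel sluggish; that is the reason the real script caches its results.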

Do the brew*


I've long tried to position the Talos family as an upgrade path for Power Mac owners, and here's another way: the macOS Homebrew package manager has been ported to OpenPOWER.

The concept is a bit involved but most of the work has been done for you. To bootstrap Ruby requires building a version from the portable Ruby recipe, or you can borrow a ppc64le build and patch the vendor install script to find it. At that point you should be able to patch brew itself with the three patches linked in the instructions. We look forward to seeing these patches getting into Homebrew proper!

(*The authors of Talospace do not endorse the large-scale drinking of alcoholic beverages unless you are an Asgardian god or Australian. And even then.)

Bonuses for big-endian


As I recover from the flu and from my apoplexy over a local plumber who has stood me up for four days in a row, there's at least some good news for those of you who like big ends and cannot lie.

First, Void Linux for PowerPC reports that all available 64-bit big-endian (ppc64) and 32-bit PowerPC packages have been built, bringing them to near-parity with ppc64le. Of all total Void packages available, 32-bit ppc-musl has the lowest percentage of buildable packages, but even that is an impressively robust 88.28%. 32-bit ppc with glibc is at 88.79%, ppc64-musl at 90.15% and ppc64 at 90.49% (compare to 64-bit little-endian musl at 93.76% and glibc at 94.64%). Many 32-bit PowerPC users, particularly Power Mac owners, are looking for currently supported browser options: Firefox isn't available on 32-bit (due to an xptcall issue), but WebKit and various derivatives (such as Midori and various other shells) are, as well as NetSurf. (G5 owners can run Firefox on ppc64.) Chromium isn't available on any of the ports, but this seems fitting.

Secondly, someone pointed me at a distribution I hadn't encountered before called PowerEL, from VanTosh. PowerEL seems to be ultimately a Red Hat descendant and states it is ABI compatible with RHEL, but claims it derives other attributes from other distributions and has a list of repos you can browse. What I found most notable is that it offers support for big-endian POWER8 and up as well as little-endian POWER8/9 and Intel x86_64 with AVX. Note that I haven't tried booting this on a Talos II or Blackbird and it isn't clear to me how workstation-suitable it is; additionally, the internal version number suggests it's derived from RHEL 7 (not the current 8) which possibly explains why it still has big-endian support. If you use PowerEL or have tried it, post your impressions in the comments.

CentOS updated to 8.1.1911


CentOS, the distro for those of you who want Red Hat Enterprise Linux but don't want to pay for it, has been updated to 8.1.1911. While not linked from the main AltArch Downloads page, ISOs that should be compatible with all OpenPOWER systems are available from the various mirrors.

Alternatively, CentOS Stream (the Freshmaker!) is also available for ppc64le.

Another Amiga you don't want


The Amigaphile community is possibly more rabid about its ecosystem than even us OpenPOWER dweebs, so right off I present my Amiga bona fides before getting stuck into it: in this house is an Amiga Technologies '060 A4000T running AmigaOS 3.9, an Amiga 3000 (with the tape drive, so I might even try Amix on it someday), and several A500s.

My disenchantment with the current crop of PowerPC-based Amigas, however, is well-known. At their various asking prices you got embedded-class CPUs at mid-high desktop level prices with performance less than a generation-old Power Mac. (Many used and easily available models of which, I might add, can run MorphOS for considerably less cash out of pocket.) A-EON's AmigaOne X5000 in particular attracts my ire for actually running a CPU that doesn't have AltiVec to replace the X1000, which did. Part of this problem was the loss of availability of the PA6T-1682M CPU, the one and only CPU P. A. Semi ever officially produced under that corporate name before Apple bought them, but there were plenty better choices than the NXP P5020 which wasn't ever really intended for this application.

Since 2014 or thereabouts a "small" Amiga (presumably along the lines of the earlier Sam440ep and Sam460ex family, which used PowerPC 400-series CPUs) was allegedly in development by A-EON and unveiled in 2015 as the original A1222 "Tabor" board. At that time it was specced as a mini-ITX board with 8GB of RAM and most controversially another grossly underpowered embedded core, a QorIQ P1022 at 1.2GHz. Based on the 32-bit e500v2 microarchitecture, it not only also lacks AltiVec but even a conventional FPU, something that for many years was not only standard on PowerPC CPUs but considered one of its particular strengths. As an embedded CPU, this was forgivable. As an embedded CPU in a modern general-purpose computing environment, however, this was unconscionable. Even by Amiga standards this was an odd choice and one with potential impacts for compatibility, and one a G5 or high-end Power Mac G4 would mop the floor with.

The original Tabor was apparently manufactured in a small batch of 50 for testers. Between then and 2018 it's not clear what happened on A-EON's end, but their hardware partner had a change of management and various other components needed updating. Strangely, the CPU was not among them. Today, six years later, people may now publicly pre-order the "A1222 Plus," with no change in specs other than changing some board components, and expected to ship Q2 2020. If you're lucky and they're not sold out, your $128 will get you a AAA Bundle Package with some software and a certificate in a pretty box as a 20 percent down payment on the "special low" $450 purchase price. If you get your $128 in after they're gone, then it's just a deposit. The AAA Bundle Package was supposed to be available December 24 but the website wasn't even up until January 11.

I'm willing to make some allowances for a cheap(er) modern Amiga because like our OpenPOWER systems, getting the price down some expands the market (ergo, Blackbird). However, that $450 price is almost certainly not the intended MSRP (nor has it been reported what that eventual MSRP is), and for low production volumes the CPU is not the major cost of a system. I don't know what actual considerations went into its design, but if A-EON chose this CPU deliberately as an attempt to make the price of the system lower, they chose wrong.

I certainly don't want to pick on other boutique systems unnecessarily. After all, in many people's minds the Talos II I'm typing on is itself a pretty darn boutique system, and one where the out-the-door price can easily eclipse even the priciest AmigaOne configuration. If people want to have their fun and pay a crapload of money for it even if other people think it's junk, then as long as it's not hurting anyone else praise the Lord and pass the credit card.

However, this is not that situation. Amiga has a long history in computing consciousness and most computer nerds would at least recognize the brand. Similarly, people still remember Power Macs, and while recollections vary in accuracy and fondness, there's a rather pernicious and commonly-held belief that Apple's migration to Intel somehow "proved" the inferiority of Power ISA. Today, along comes yet another underwhelming PowerPC-based Amiga to confirm their preconceived notions: because it's Amiga, it sticks in people's minds, and because it confirms their own beliefs about PowerPC, it reinforces their unjustified biases. Regardless of the fact that POWER9 systems like this one outpace ARM and RISC-V and compete with even high-end x86_64 offerings, the presence of Tabor and X5000 et al simply gives the resolutely underinformed yet another stick to beat Power desktops with. These Amigas suck compared to PCs, they reason, so OpenPOWER must suck too.

At the very least modern Amiga systems need to beat these decade-plus-old Power Macs to be even vaguely taken seriously. If they must go embedded then at least an e6500-series processor would do better than anything that they're running, and while I certainly do appreciate the considerable AmigaOS porting work that would be necessary, going with a minimally-redesigned reference design board for a POWER8 or POWER9 in big-endian mode could still free up development resources for such a task that would otherwise be sunk into yet another quixotic product. As it is, these Amiga systems don't do the Amiga community any favours, and from this outsider's view their performance even seems to be going backwards. If cost were the main consideration, there are other ways of dealing with it, and it's a certainty that Tabor's eventual price will be much greater than $450. Given all that, plus the protracted gestation time for "Tabor Plus," don't the Amiga owners who would be willing to buy such a machine deserve something with a little more zip?

The Amiga community needs new hardware, to be sure, and A-EON has at least filled the business opportunity however slow and abortive its progress in doing so; whatever small number they end up producing this time I imagine they'll eventually sell. That doesn't change the fact that weaksauce systems like this being sold as desktop machines continue to tar the architecture as underpowered and do nothing to expand the market beyond the shrinking circle of faithfuls. When there exist today at this very moment Power ISA workstations that can be every bit as utilitarian and functional as Intel workstations, the last thing either of our two communities needs is yet another Amiga that people don't want.

Fedora Workstation Live ISOs up (and hopefully sticking)


After a little glitch getting them off the ground, Fedora Workstation Live ISOs are now a thing for ppc64le, and are exactly what they say on the tin: a try-before-you-metaphorically-buy image to check things out before you install. A bootable disk image of Rawhide (what will eventually become Fedora 32) is ready now for your entertainment and possible utility, and they should turn up as a standard download presumably when F32 goes live late April.

Firefox 72 on POWER


Firefox 72 builds out of the box and uneventfully on OpenPOWER. The marquee feature this time around is picture-in-picture, which is now supported in Linux and works just fine for playing Trooper Clerks ("salsa shark! we're gonna need a bigger boat!"). The blocking of fingerprinting scripts should also be very helpful since it will reduce the amount of useless snitchy JavaScript that gets executed. The irony of that statement on a Blogger site is not lost on me, by the way.

The bug that mashed Firefox 71 (ultimately fallout from bug 1601707 and its many dupes) did not get fixed in time for Firefox 72 and turned out to be a compiler issue. The lifetime change that the code in question relies upon is in Clang 7 and up, but unless you are using a pre-release build this fix is not (yet) in any official release of gcc 9 or 10. As Clang is currently unable to completely build the browser on ppc64le, if your extensions are affected (mine aren't) you may want to add this patch which also landed on the beta release channel for Firefox 73.

The debug and opt configurations are, again, otherwise unchanged from Firefox 67.

DOSBox JIT on ppc64le (and how you can write your own)


Apparently the quickest way to make software moar faster is to turn it into a tiny compiler and lots of things are doing it. As I get time between iterations of TenFourFox and smoke-testing Firefox builds on ppc64le, slow work on the Firefox JIT continues, but that doesn't mean we can't be JITting all the other things in the meantime.

One of my favourite games is LucasArts' Dark Forces, an FPS set in the Star Wars universe (but now apparently non-canon after the events of Rogue One). Although projects like XLEngine can run it, that, too, requires a code generator to be written (because of AngelScript). I decided that if I had to write a backend after all, a better approach would be to add a backend to DOSBox, the famous DOS emulator. That would happily run the copy of PC Dark Forces I already have and any of my other old DOS games like Extreme Pinball and Pinball Illusions and Death Rally and all those other great titles in my office besides. (The classic Mac version of Dark Forces is better, by the way, not least because of its beautiful high-resolution graphics that are double those of the PC release.)

Fortunately, a 32-bit big-endian PowerPC version of DOSBox already existed as unofficial patches (play it on your old Power Mac), which took only a few days for me to convert to 64-bit little-endian. While DOSBox in strictly interpreted mode on the Talos II is no slouch, this JIT, which is for DOSBox's dynamic recompiling "dynrec" core, increases performance roughly by a factor of four. This makes even the most demanding games playable and makes most other games run like butter (in fact, it's so fast it even destabilizes some timing loops, like the credits scroller in Descent). If I could shoot and take screenshots at the same time, you'd see me do better at blowing away Imperial officers, too.

You can build this yourself. Download the patch and backend, which we will call ppc64le_dosbox.diff and risc_ppc64le.h. This patch is intended to apply against SVN trunk.

svn checkout svn://svn.code.sf.net/p/dosbox/code-0/dosbox/trunk dosbox-code-0
cd dosbox-code-0
patch -p0 < ../ppc64le_dosbox.diff
cp ../risc_ppc64le.h src/cpu/core_dynrec
./autogen.sh
./configure CFLAGS="-O3 -mcpu=power9" CXXFLAGS="-O3 -mcpu=power9"
make -j24   # or as you like

Copy over your favourite games or install media, and run src/dosbox to start your fast-powered DOS machine. Or try this benchmark pack and see for yourself. (I like the PC Player one.) See the DOSBox documentation and DOSBox wiki for more.
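To make sure you're actually exercising the recompiler rather than the interpreter, you can point DOSBox at a config file that asks for the dynamic core explicitly. A minimal sketch, assuming the stock core and cycles settings in the [cpu] section of a DOSBox config file; the function name write_dynrec_conf and the temporary file are just for illustration.

```shell
#!/usr/bin/env bash
# Write a tiny DOSBox config that requests the recompiling core.
# Any setting not listed here keeps its DOSBox default.
write_dynrec_conf() {
    local conf=$1
    cat > "$conf" <<'EOF'
[cpu]
# "dynamic" asks for the recompiling core; "normal" is the plain interpreter.
core=dynamic
cycles=max
EOF
}

conf=$(mktemp)
write_dynrec_conf "$conf"
cat "$conf"

# Then, from the build directory, something like:
#   src/dosbox -conf "$conf"
```

Switching the core line to normal and re-running the same benchmark gives a rough before-and-after comparison on your own machine.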

But let's say you'd like to work on a JIT backend of your very own for some other ppc64le port. This is hardly the place for a tutorial on writing ppc64 assembly language -- you can read IBM's -- but I will talk about how to get generated code into memory and how to execute it, and how this might differ from x86_64 or ARM. The following examples should run on any 64-bit Power CPU from the 970/G5 to the POWER9 regardless of pagesize or endianness under recent Linux kernel versions, but they're tested on my ppc64le Talos II in Fedora 31, of course.

Being able to emit generated code to memory and run it is actually a rather big security hole unless it is done carefully and correctly. Indeed, SELinux will halt you right in your tracks (as it should) unless you do the correct mating dance. The basic steps are:

  1. Allocate writeable memory.
  2. Emit machine code to that memory.
  3. Flush the caches.
  4. Make that tract of memory executable. (This is the dangerous bit. We'll talk about how to mitigate it.)
  5. Run the now executable code.
  6. Profit!

For step 1 (and, indirectly, step 4), you need to know what the OS believes the memory page size is. (On Fedora 31, this Talos II has a pagesize of 64K. Some Linuces like Adelie use 4K pages.) For step 3, you need to know how large a cache line on your CPU is (for all current 64-bit Power ISA processors, this number is 128 bytes). We will handwave these away a bit here by keeping the example code no longer than a single cache line, and reading the OS's page size ourselves.
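As a minimal sketch of the page-size bookkeeping described above: the helper names are hypothetical, and it uses sysconf(_SC_PAGESIZE) as one common way to ask, which is an assumption and not necessarily the call the example file makes. The 64K fallback is likewise only for illustration.

```c
#include <stddef.h>
#include <unistd.h>

/* Ask the OS what it believes the page size is. The 64K fallback is an
   assumption for illustration; it is only used if the query fails. */
static size_t os_page_size(void)
{
    long ps = sysconf(_SC_PAGESIZE);
    return (ps > 0) ? (size_t)ps : (size_t)65536;
}

/* Round a byte count up to a whole number of pages. This relies on the
   page size being a power of two (4K and 64K both are). */
static size_t round_up_to_pages(size_t bytes, size_t pagesize)
{
    return (bytes + pagesize - 1) & ~(pagesize - 1);
}
```

With 64K pages, even an eight-byte code block costs one full 65536-byte page; with 4K pages, a 65537-byte block rounds up to 69632 bytes (17 pages).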

Let's look at the first example. You'll notice a couple blocks of code commented out. If you are not using SELinux for whatever reason, you may be able to get away with posix_memalign() and mprotect() to allocate your memory. However, if you use this on an SELinux system (Red Hat, recent Debians, etc.), you will have to modify your policy or temporarily disable some protection features to run that code.

The better way is to create the tract of memory as an in-RAM file and use mmap() to make it writeable. This only works in whole numbers of pages, hence the need to know your page size (we ask the kernel via syscall). You may be able to use the second commented block to call memfd_create() directly, but the syscall approach I've used here doesn't require setting _GNU_SOURCE. Once we have mmap()ed the memory, we can write to it. We do so as simply an array of 32-bit integers (even 64-bit Power still has 32-bit opcodes).

I've ripped off the assembler macros from DOSBox's backend because they match up nicely with the numbers in the Power ISA book and pretty much any Power assembly language reference. Our first example runs a very simple program:

li 3,2020
blr

This code loads the integer 2020 into r3, the first argument register and the standard return register in the ELFv2 ABI (and indeed for any PowerPC using either PowerOpen ABIs like AIX and Mac OS X, or SysV ABIs like Linux or the BSDs). It then branches back to the return address in the link register, ending the generated routine. We emit this code to the mmap()ed in-memory file.
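To make the "array of 32-bit integers" idea concrete, here is a hedged sketch of how those two instructions can be encoded by hand. The function names are made up for this illustration and are not the DOSBox macros; the field layout follows the D-form description in the Power ISA book.

```c
#include <stddef.h>
#include <stdint.h>

/* D-form: 6-bit primary opcode, RT, RA, then a 16-bit immediate. */
static uint32_t ppc_dform(uint32_t op, uint32_t rt, uint32_t ra, int32_t imm)
{
    return (op << 26) | (rt << 21) | (ra << 16) | ((uint32_t)imm & 0xffffu);
}

/* li rt,imm is the extended mnemonic for addi rt,0,imm (primary opcode 14). */
static uint32_t ppc_li(uint32_t rt, int16_t imm)
{
    return ppc_dform(14, rt, 0, imm);
}

/* blr (bclr 20,0,0) has no operands, so it is always the same word. */
static uint32_t ppc_blr(void)
{
    return 0x4e800020u;
}

/* Emit "li 3,value; blr" into buf and return how many words were written. */
static size_t emit_return_constant(uint32_t *buf, int16_t value)
{
    buf[0] = ppc_li(3, value);
    buf[1] = ppc_blr();
    return 2;
}
```

For the value 2020 this produces the words 0x386007e4 and 0x4e800020. Storing them as native uint32_t values means the same code emits the right byte order on both big- and little-endian hosts.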

The Linux kernel flushes the data and instruction caches of the processor separately, so we do the same. For every cache line we wrote code into, we use the dcbst instruction to write that data cache line back to memory, and once they are all written back, we use the sync instruction so that each CPU's view of memory is consistent. Then we invalidate the instruction cache in the same way, using the icbi instruction and finally the isync instruction to ensure consistency of the I-cache this time. Because this all fits into a single cache line we just do each instruction once.

As our last step, we do another mmap() to make the tract of memory executable. Since we are using a named file to store our code rather than just memory we managed to grab, SELinux does not block it the same way it would ordinarily block an anonymous allocation. The second mmap() adds further security, because we are not making the memory executable until the program is fully assembled in RAM. While the two mmap()s are linked and reference the same memory area as far as our program is concerned, assuming randomization is in effect, an attacker would now have to derive two completely random memory addresses to do any funny business. We then execute the code as if it were a C function (more on this in a moment).
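The two-mapping scheme can be sketched roughly like this. It is an illustration under assumptions, not the example file itself: the struct, the function names and the "jitlab-sketch" label are all hypothetical, and the protection flags for the second view are left as a parameter. For real JIT use you would pass PROT_READ | PROT_EXEC.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

struct jit_region {
    int fd;          /* the in-RAM file backing both views */
    size_t len;      /* must be a whole number of pages */
    uint32_t *rw;    /* first view: writeable, for emitting opcodes */
    void *rx;        /* second view: mapped only after emission is done */
};

/* Create the in-RAM file and map it writeable. Returns 0 on success. */
static int jit_region_open(struct jit_region *r, size_t len)
{
    void *p;

    r->rw = NULL;
    r->rx = NULL;
    r->len = len;
    r->fd = (int)syscall(SYS_memfd_create, "jitlab-sketch", 0);
    if (r->fd < 0)
        return -1;
    if (ftruncate(r->fd, (off_t)len) != 0) {
        close(r->fd);
        return -1;
    }
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, r->fd, 0);
    if (p == MAP_FAILED) {
        close(r->fd);
        return -1;
    }
    r->rw = p;
    return 0;
}

/* Map the same file a second time with the requested protection.
   The kernel picks a separate address for this view. */
static int jit_region_map_second(struct jit_region *r, int prot)
{
    void *p = mmap(NULL, r->len, prot, MAP_SHARED, r->fd, 0);
    if (p == MAP_FAILED)
        return -1;
    r->rx = p;
    return 0;
}
```

Because both views are MAP_SHARED mappings of the same file, anything written through rw is visible through rx, yet the two pointers are unrelated addresses. Remember that the cache flush still has to happen before you jump into the second view.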

Run the code like this:

% gcc -o jitlab1 jitlab1.c
% ./jitlab1
jitcode at 0x7fffac650000
result = 2020

Excellent! Note that the actual address of the jitcode may vary from run to run.

However, an example this trivial can only be this trivial because it didn't need to pull and manage a stack frame or maintain all but the most token adherence to the C ABI. We need a stack frame if we modify, or might modify, any non-volatile register or the link register (i.e., make any calls to other subroutines). Frankly, a JIT isn't much good if it can't call into its host somehow, either to run higher level operations or exchange data, so for all practical purposes you'll probably want to pull a stack frame in your code and be ABI compliant.

With that in mind, let's turn to our second contrived example. This slightly more complex demonstration will receive an integer value which it will perform basic math on (add 2020 to it), then call a function to display that result, double the result, and return that. As we are doing very little actual work in this JIT code (essentially messing with r3 as both the in and out argument, which is volatile, so it need not be saved in the stack frame we created), that makes control flow simpler. The function prologue, which is broken up a bit for performance reasons, can be as simple as stdu 1,-size(1):mflr 0:std 0,size+16(1), where size is the desired size of the stack frame (we use 256 here just for laziness).
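Here is a hedged sketch of how that prologue might be emitted as opcodes. The matching epilogue is my assumption of a plausible mirror image (reload LR's old value, pop the frame, return); the real example may order things differently. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* DS-form doubleword access: the displacement must be a multiple of 4,
   and the low two bits pick the variant (0 = std or ld, 1 = stdu). */
static uint32_t ppc_dsform(uint32_t op, uint32_t rs, uint32_t ra,
                           int32_t ds, uint32_t xo)
{
    return (op << 26) | (rs << 21) | (ra << 16) |
           ((uint32_t)ds & 0xfffcu) | xo;
}

/* Pull a stack frame of 'size' bytes and save the link register. */
static size_t emit_prologue(uint32_t *buf, int32_t size)
{
    buf[0] = ppc_dsform(62, 1, 1, -size, 1);      /* stdu 1,-size(1)  */
    buf[1] = 0x7c0802a6u;                         /* mflr 0           */
    buf[2] = ppc_dsform(62, 0, 1, size + 16, 0);  /* std 0,size+16(1) */
    return 3;
}

/* Assumed counterpart: restore LR, dismantle the frame and return. */
static size_t emit_epilogue(uint32_t *buf, int32_t size)
{
    buf[0] = ppc_dsform(58, 0, 1, size + 16, 0);       /* ld 0,size+16(1) */
    buf[1] = 0x38210000u | ((uint32_t)size & 0xffffu); /* addi 1,1,size   */
    buf[2] = 0x7c0803a6u;                              /* mtlr 0          */
    buf[3] = 0x4e800020u;                              /* blr             */
    return 4;
}
```

With size set to 256, the prologue comes out as 0xf821ff01, 0x7c0802a6, 0xf8010110. The frame size needs to stay a multiple of 16 to keep the stack pointer aligned as the ABI expects.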

Having pulled the frame and computed the first value, we now want to call the display routine to show our work. A minor complication is that there is no PowerPC/Power ISA instruction to directly branch to an address in a regular general purpose register. Instead, the ISA only supports indirect branching to either the address in the link register "LR" (blr) or the counter register "CTR" (bctr), both of which are special-purpose registers. As a practical measure, for branching other than returns from a routine, we prefer the counter register which does not need to be saved.

A more significant complication is setting up the actual function call itself. Without getting too deep into the weeds, 64-bit ELFv2 compliant functions can have two entry points, called the "global entry point" and the "local entry point." The global entry point is called when r2, the register customarily used for global symbol access through the Table of Contents, must be computed. The TOC, a holdover from PowerOpen systems, contains a conventional ELF global offset table "GOT" and optionally a small data section. The global entry point invariably consists of two instructions that compute r2 from the address of the routine itself, which is conventionally maintained in r12, and the local entry point follows at eight bytes after the global entry point. This scheme facilitates position-independent code generation as the compiler can emit code referencing globals as relative indexes on the TOC base stored in r2.

At link time the linker looks at branches and determines whether the caller function and callee function will share the same TOC. If they don't, then the linker points the branch at the global entry point either directly or through a procedure linkage table "PLT" stub. If they do, however, then this call is considered "local" (from the perspective of the global context), and the linker calls the local entry point instead. However, if it turns out the callee actually does no global access, the compiler generates only a single entry point because there is no need to compute r2, and the linker calls that.

The linker can juggle this because it has time to burn and the breadth of the codebase to scan. However, in our example here, we are the linker, and we have much less visibility into the codebase we're trying to call. Think of it as working in the basement with a flashlight trying not to walk into the walls and only a limited time to finish our job.

A hard branch (using b or bl) will always be faster, even if imperceptibly, and is less susceptible to Spectre-style attacks. However, if we actually turn out to be calling a function's global entry point and r12 is not set correctly, then r2 will also not be set correctly, and global access will either fault or just be plain wrong. If we end up doing all this computation, we might as well just branch to r12 via CTR, which is slower by a minor degree but will always work. Plus, if we're able, whatever speed hit is incurred can be mitigated by hoisting the mtctr up a few instructions ahead of the bctr or bctrl that uses it. Either way, whether we hit the global entry, the local entry or a non-global function, everything will be ABI compliant.

That does not mean you can never branch directly. Directly branching to a function with b or bl will work if you are calling a pure local function that accesses no globals, or you are calling the local entry point and nothing has modified r2 (though this is an awfully big gamble in complex codebases), or it's a function you generated (i.e., another JIT function) that you can guarantee to be purely local because it never touches r2. Or you can just set r12 and directly branch, I suppose.

The last consideration you need to keep in mind with JIT function calls is whether they need to be patchable. If they need to be patched or redirected as code addresses change, then you need to have a fixed-size branch stanza that can be changed on the fly. The minimum size of a branch stanza in 64-bit Power ISA is seven instructions (28 bytes) because we may call a 64-bit address and we need four immediate loads to compose it plus a rotation step. For example, to call a routine at 0x1234567876543210, the branch stanza looks like this:

lis 12,0x1234 (of course actually addis 12,0,0x1234)
ori 12,12,0x5678
rldicr 12,12,32,31
oris 12,12,0x7654
ori 12,12,0x3210
mtctr 12
bctrl

This yields a full 64-bit quantity, 0x1234567876543210. We can then mtctr 12 and bctrl to call the routine and come back to the generated code. Repatching this stanza is "merely" a matter of changing the bottom 16 bits of the four immediate loads and flushing the caches. You can have a direct branch in a stanza if the location of the routine is within the available displacement for those instructions, but then everything else should be nops, e.g.,

bl 0x1234567876543210
nop (of course actually ori 0,0,0)
nop
nop
nop
nop
nop

so that if the new address no longer fits into the branch instruction's displacement, you can rewrite it as a full stanza. If you choose to set r12 and end in a hard-coded branch at the same time, remember that you'll need to repatch both things if the address changes, which might make your code generator a bit more hairy. I consider the aggressive promotion of branch stanzas to hard branches to be a form of premature optimization and you shouldn't be doing this until you are sure the rest of your code generator is working correctly.
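For the curious, the two stanza shapes above can be sketched as a pair of emitters. The names are hypothetical and this is a simplified illustration, not the DOSBox backend's code. Note that the direct form does not set r12, so it is only appropriate in the cases described earlier where the callee does not need it.

```c
#include <stddef.h>
#include <stdint.h>

enum { STANZA_WORDS = 7 };      /* fixed size so it can be repatched */
#define PPC_NOP 0x60000000u     /* ori 0,0,0 */

/* Full stanza: build the 64-bit target in r12, then call it via CTR. */
static void emit_full_stanza(uint32_t *buf, uint64_t target)
{
    buf[0] = 0x3d800000u | (uint32_t)((target >> 48) & 0xffff); /* lis 12,..      */
    buf[1] = 0x618c0000u | (uint32_t)((target >> 32) & 0xffff); /* ori 12,12,..   */
    buf[2] = 0x798c07c6u;                                  /* rldicr 12,12,32,31 */
    buf[3] = 0x658c0000u | (uint32_t)((target >> 16) & 0xffff); /* oris 12,12,..  */
    buf[4] = 0x618c0000u | (uint32_t)(target & 0xffff);         /* ori 12,12,..   */
    buf[5] = 0x7d8903a6u;                                       /* mtctr 12       */
    buf[6] = 0x4e800421u;                                       /* bctrl          */
}

/* Read the target back out of the four immediates of a full stanza. */
static uint64_t stanza_target(const uint32_t *buf)
{
    return ((uint64_t)(buf[0] & 0xffff) << 48) |
           ((uint64_t)(buf[1] & 0xffff) << 32) |
           ((uint64_t)(buf[3] & 0xffff) << 16) |
            (uint64_t)(buf[4] & 0xffff);
}

/* Direct stanza: bl plus six nops, if the displacement fits in bl's
   signed 26-bit, word-aligned range. 'here' is the address the stanza
   will execute from. Returns 1 if emitted, 0 if it does not fit. */
static int emit_direct_stanza(uint32_t *buf, uint64_t here, uint64_t target)
{
    int64_t disp = (int64_t)(target - here);
    int i;

    if (disp < -0x2000000 || disp > 0x1fffffc || (disp & 3))
        return 0;
    buf[0] = 0x48000001u | ((uint32_t)disp & 0x03fffffcu);  /* bl target */
    for (i = 1; i < STANZA_WORDS; i++)
        buf[i] = PPC_NOP;
    return 1;
}
```

Repatching a full stanza is then just a second call to emit_full_stanza() on the same seven words followed by a cache flush. Because both shapes occupy the same 28 bytes, a direct stanza can be rewritten as a full one in place when the target moves out of range.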

This gets even more complex, by the way, if your JIT code must itself access globals; in that case you may need to save r2 yourself and/or do additional linkage work. I should note that in the JITs I've personally written this was never necessary. Let the C host code handle it and debugging your generated code will then be much less of a hassle.

Returning to our example, when the function call returns to the JIT code it finishes its "work," dismantles the stack frame, recovers the previous link register value and branches back to it to exit. The result remains in r3. We use decimal 1111 as our passed value in this example, so the resulting values should be 3131 (1111+2020) and 6262 (3131+3131). Since our intermediate print function helpfully returns the value it was called with, in our example here we don't need to worry about stashing it anywhere. You may not be so lucky calling someone else's work.

% gcc -o jitlab2 jitlab2.c
% ./jitlab2
jitcode at 0x7fff99dc0000
called with 3131
result = 6262

These were obviously toy examples generating small blebs of code. For code blocks greater than one cache line in size, which you will almost certainly generate, you need to dcbst (or similar) and icbi each cache line in whatever setup function gets called to flush the cache. The Z constraint we use here can make this very easy by having gcc do the work of setting up the register for each address for you. See the cache_block_closing() function in the DOSBox backend for a real-world example. In a like fashion, if your code spans multiple memory pages, with the method we have used here you will need to make each memory page writeable, and then ultimately executable, in turn.
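A rough sketch of that per-cache-line loop follows. It is not cache_block_closing() itself: the names are hypothetical, it uses a plain "r" register constraint where the examples use Z, and it hard-codes the 128-byte line size mentioned earlier. On non-Power hosts it falls back to the compiler's __builtin___clear_cache() so the file still builds.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 128u   /* assumed line size, per the figure given earlier */

/* How many cache lines does the byte range [start, start+len) touch? */
static size_t cache_lines_spanned(uintptr_t start, size_t len)
{
    uintptr_t first, last;

    if (len == 0)
        return 0;
    first = start & ~(uintptr_t)(CACHE_LINE - 1);
    last = (start + len - 1) & ~(uintptr_t)(CACHE_LINE - 1);
    return (size_t)((last - first) / CACHE_LINE) + 1;
}

/* Make freshly written code visible to instruction fetch. */
static void flush_code(void *start, size_t len)
{
#if defined(__powerpc64__)
    uintptr_t base = (uintptr_t)start & ~(uintptr_t)(CACHE_LINE - 1);
    size_t n = cache_lines_spanned((uintptr_t)start, len);
    size_t i;

    for (i = 0; i < n; i++)   /* data cache pass, one line at a time */
        __asm__ volatile("dcbst 0,%0" : : "r"(base + i * CACHE_LINE) : "memory");
    __asm__ volatile("sync" : : : "memory");
    for (i = 0; i < n; i++)   /* instruction cache pass */
        __asm__ volatile("icbi 0,%0" : : "r"(base + i * CACHE_LINE) : "memory");
    __asm__ volatile("isync" : : : "memory");
#else
    /* Portable fallback so the sketch compiles on other architectures. */
    __builtin___clear_cache((char *)start, (char *)start + len);
#endif
}
```

Note that a block need not start on a line boundary: two bytes starting at offset 127 straddle two cache lines, which is why the loop works from the aligned base and not from the raw start address.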

You should also get used to tossing trap (tw 31,0,0) instructions in code blocks you're not sure about. This lets you use the debugger to verify that you actually assembled the opcodes in memory correctly and allows you to trap at the point you believe might be problematic and single-step from there. Oddly, in Fedora 31, when a trap instruction is hit gdb does pause but doesn't register a SIGTRAP (whereas lldb does seem to trap correctly, but I'm much more used to gdb personally). In this case, when the trap is sprung in gdb, you would need to press CTRL-C and induce an interrupt when the trap is hit to actually get into the debugger. Debugging JITs can be a real nightmare especially when the bug is very subtle, so only write the minimum you need to get the JIT off the ground, use traps liberally during building and never optimize prematurely. In particular ensure that your actual assembler steps that write instructions to memory are doing so accurately, as this can yield some rather humiliating issues later if you hit edge cases.

Let's make all the ppc64le things faster!

Void Linux ppc64le packages at 100%


Happy New Year! The Power ISA port of Void Linux (currently supporting 32-bit PowerPC and 64-bit Power ISA in both big-endian and little-endian with musl or glibc) has hit an important milestone, at least for the musl and glibc flavours of ppc64le: 100% of the Void repo packages are either built or blacklisted. As of this writing, that means almost 95% of the Void repo runs on ppc64le-glibc and almost 94% on ppc64le-musl. This includes Firefox and Epiphany, but not Chromium. This is great news for folks who want another systemd-free alternative.

The numbers quickly fall off for big-endian systems, though. Both ppc64 and regular 32-bit ppc are building about 54% of the repo for either libc flavour. Notably, Firefox is available for big-endian ppc64, but not for 32-bit PowerPC. However, Epiphany and Midori are available all the way down to your beloved G3 or G4.