Showing posts from February, 2019

Raptor, help us out here on the software side

I'll preface this with a status report on the ppc64le Firefox JIT: stalled. It's stalled because my time is currently occupied with Mozilla bug 1512162, which has to do with the connection between JavaScript and the native representation causing debug builds to assert. I wallpapered over this for Firefox 66, and that seems to be okay, but the wallpaper seems to have just moved the badness around, and the underlying issue is more fundamental. That makes it the more serious problem, so it's my highest priority.

Unfortunately, work on this bug is getting stymied by the fact that POWER9 has hardware watchpoints disabled in the Linux kernel. Trying to catch who's writing the faulty values is nearly impossible in an application of Firefox's complexity without them, because without hardware assistance gdb has to single-step through the code. When you don't even know which haystack to look for the needle in, it's slow going, as in (no exaggeration) hundreds of times slower. I left Firefox to try to initialize overnight in the debugger on my 8-core Talos II. By the morning, 8 hours later, it hadn't even launched its first thread.

The reason they are disabled is an erratum which causes watchpoints set on cache-inhibited memory (such as devices) to make the CPU halt with a checkstop. Arguably some sort of fault on this is correct behaviour, but a checkstop is catastrophic; it's the equivalent of stopping your car by driving it off the road into a wall. I don't fault the PowerPC kernel maintainers for taking this interim approach, because without it an unprivileged user could instantly halt the machine, even inadvertently. Even if the kernel could detect that the watchpoint was pointing at a cache-inhibited address and return an error, a tricksy user could potentially set up the watchpoint on "good" memory and then change the memory mapping.

I talked to the PowerPC kernel maintainers about this and an interim solution we're sort of agreed on is to use a debugfs entry as a one-way switch so that workstation developers like me can turn on hardware watchpoints "at our own risk." When I'm ready to debug something that requires a proper watchpoint, then I create a debugfs file and the kernel will then allow the watchpoint in hardware until the next reboot. I'm the only user on the machine, so if I screw it up, it's only my pasty Roman sculptured tuckus (and filesystem).

But this isn't going to write itself. I need to scratch my own itch, get it working, get it accepted, get it actually in a kernel (instead of schlepping forward local changes), and then finish debugging Firefox, and then finish the JIT, in addition to my day job, my work on TenFourFox and not being in trouble with my lovely wife for not emerging from the back room for hours.

So now I'm going to be a little less than nice with Raptor. Raptor has been fairly public about their support for Chromium, ostensibly because of the (IMHO irresponsible) proliferation of Electron apps, but they've been very tepid on Firefox. I understand a small company can't do everything, but they have had time to do ports of Unreal Engine and WINE (though this in and of itself is not enough to make the QEMU-WINE fusion Hangover work, which I've been also trying to tinker with and have it about 70% building). These are fun things to do and are certainly interesting, but that makes statements like this a bit galling, and statements like this a bit disingenuous.

Mozilla has a chicken-and-egg problem when it comes to an architecture that until very recently had a very small desktop share: its share (and the share of desktop users using Firefox) would doubtless increase with a Firefox JIT, but the resources to expend on writing that JIT can't be justified until there is a larger share. Furthermore, it rings hollow for Raptor to ding Mozilla in that tweet about not being sufficiently open when Google, until literally days ago, wouldn't land the existing POWER9 Chromium work, while Mozilla has been allowing POWER9 (and other PowerPC) patches into Firefox as a tier-3 platform for pretty much its entire existence. You can put the open-source lipstick on the Google pig as much as you like but at the end of the day, it's still Google and it's still Google's repo. Freedom involves choice. I'm not going to slam the people who did hard work on the Chromium port, because it is hard work and unfortunately Electron is a thing despite my misgivings, but I am going to slam Raptor for endorsing it at Firefox's expense.

I'm not asking Raptor to do the Firefox JIT port, though I may be soliciting help to farm it out given my reduced number of available cycles. (Right now it's based on Firefox 62; once it works there, we'll forward-port it to trunk. More on that in a future post, but I'll probably put my current work up on GitHub. If you're interested in contributing, post in the comments.) I am asking Raptor to endorse the effort, however, and I am asking them to become more involved with developer-facing features to allow those of us who are working on ports to do so more productively.

As an example, developing and getting the interim debugfs switch for hardware watchpoints into the kernel would be an enormous help to me personally (and would save me a great deal of time), and would probably be very beneficial for other developers. Nearly everyone on the LinuxPPC kernel team I talked to agreed this is a big deficiency and one that is realistically implementable. It would be nice if this could proceed in parallel so I'm not blocked on doing everything myself because on a scale of 0 to even, I just can't. I'm sure there are many other developer pain points that will appear as more people start working on Talos systems, and I'd like Raptor to also treat these requests with priority and dedicate resources to worthy ones to allow more port work and development to flourish.

Let me soften a little bit in conclusion by saying Raptor has a very hard row to hoe being a small company jumpstarting an entire ecosystem. I'm being hard on them because I'm glad they exist, I intend to continue being a customer, and as a long-time Power ISA bigot I want them to succeed. But I also want to see the principles of free computing embodied in the hardware for the Talos family appropriately manifested in software. I don't see that being adequately expressed in the choices they've made so far and I'd like that to change. Developers need to be prioritized and software choice needs to be facilitated. Let's see more of that so we can see more POWER9 adoption and a brighter future for desktop computing.

OpenSUSE 15.1 Beta available, just not for us (yet)

The beta for OpenSUSE 15.1 is now available, at least for x86_64. This isn't actually news on the ppc64(le) side because the Leap releases don't currently have a Power ISA port. However, if you want to run OpenSUSE on your Talos II, you can with Tumbleweed, their rolling-release flavour, which apparently (I'm told) works quite well on the hardware. Nevertheless, hopefully as the install base grows there will be more interest in a stable Leap release for Power ISA as well.

Blackbirds to ship Q2 2019

On Twitter Raptor is reporting Blackbird shipments will start occurring in Q2 2019 instead of Q1 as previously announced. However, manufacture of the first production batch is in progress (we have one on order and will be doing a review as soon as it arrives). The ASpeed BMC, which is built-in, apparently also needs upcoming Linux kernel support to route its output over HDMI (via an ITE device) when hotplugging a display, which implies current distros compatible with the T2 may not fully work on the Blackbird without a discrete graphics card until they are also updated. We'll be watching.

assert_slb_presence aaargh_warnings_everywhere make_it_stop

We're tracking what seems to be a recent regression in Linux ppc64le (and probably big-endian as well, if we understand the actual cause) kernels from at least 4.20.5 and possibly a little earlier which throws recurrent kernel warnings to dmesg. Depending on your distro this may pass completely unnoticed except for your logs filling up a little faster, but systems that send notifications on such events may drive you up the wall (such as our Fedora 29 installation, where our testing of current Firefox trunk trips this assertion like mad). The output invariably looks like this:

[46425.991034] WARNING: CPU: 22 PID: 0 at arch/powerpc/mm/slb.c:74 assert_slb_presence+0x28/0x40
[46425.991039] WARNING: CPU: 18 PID: 0 at arch/powerpc/mm/slb.c:74 assert_slb_presence+0x28/0x40

followed by the usual debugging information. As the filename implies, this is related to the CPU's segment lookaside buffer, but failing the given assertion is otherwise harmless on the Talos. It looks like the bug has been there for a little while but at least as of 4.20.10 only occurs on CPUs that support the slbfee. instruction (POWER6 and up) and, if our understanding is correct, only on testing effective addresses with a particular bit set. If so, this patch should fix it, but there is no ETA.
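To gauge how badly a given machine is affected, the warning lines can be tallied per CPU. What follows is a minimal Python sketch, not anything from the kernel tree: the function name `tally_warnings` and the regular expression are our own illustrative assumptions, based only on the two sample lines shown above, and other kernel versions may format the line differently.

```python
import re
from collections import Counter

# Pattern inferred from the sample lines above, e.g.:
# [46425.991034] WARNING: CPU: 22 PID: 0 at arch/powerpc/mm/slb.c:74 assert_slb_presence+0x28/0x40
# This is an assumption about the format, not a documented kernel interface.
WARN_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) "
    r"at (?P<where>\S+) (?P<func>\w+)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def tally_warnings(lines, func="assert_slb_presence"):
    """Count kernel WARNING lines raised in `func`, keyed by CPU number."""
    per_cpu = Counter()
    for line in lines:
        match = WARN_RE.search(line)
        if match and match.group("func") == func:
            per_cpu[int(match.group("cpu"))] += 1
    return per_cpu
```

Pass it any iterable of log lines, such as the saved output of dmesg split into lines. Lines that do not match the pattern, including the rest of each warning's debugging dump, are simply ignored.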

In the meantime, if you're badly affected, one way to get the messages to temporarily quiet might be to twiddle your console logging level settings; see man klogctl for how this works. Alternatively, on a Red Hat-type system like ours (Fedora, CentOS, etc.), the notifications come from ABRT, so killall abrt-applet will temporarily quell the warnings (/usr/bin/abrt-applet --gapplication-service & to restart).
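For the console logging level route, it helps to know where the current setting sits relative to warning-level messages. Here is a small Python sketch under stated assumptions: it relies on the usual Linux layout of /proc/sys/kernel/printk (four integers, the first being the current console log level) and on kernel warnings being logged at level 4 (KERN_WARNING). The helper names `parse_printk` and `warnings_reach_console` are hypothetical, made up for this illustration.

```python
KERN_WARNING = 4  # log level the kernel uses for warning-class messages

# Field order assumed from the usual /proc/sys/kernel/printk layout.
PRINTK_FIELDS = (
    "console_loglevel",
    "default_message_loglevel",
    "minimum_console_loglevel",
    "default_console_loglevel",
)


def parse_printk(text):
    """Parse the four whitespace-separated integers from /proc/sys/kernel/printk."""
    values = [int(v) for v in text.split()]
    if len(values) != len(PRINTK_FIELDS):
        raise ValueError(f"expected {len(PRINTK_FIELDS)} values, got {len(values)}")
    return dict(zip(PRINTK_FIELDS, values))


def warnings_reach_console(levels):
    """A message is shown on the console only if its level is numerically
    lower than the current console log level."""
    return KERN_WARNING < levels["console_loglevel"]
```

Feed `parse_printk` the contents of /proc/sys/kernel/printk. If `warnings_reach_console` returns True, lowering the console level to 4 or below (for instance through klogctl, as root) should keep these warnings off the console. This only affects console output: the messages still land in the kernel log, and it does nothing about ABRT notifications.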

Ubuntu LTS 18.04.2 available

An updated release of the long-term support Ubuntu 18 (Bionic Beaver) is now available for ppc64el. Read the full changelog for 18.04.2. As with prior releases, Ubuntu 18 should "just work" on the Talos II. All Power ISA official releases of Ubuntu are Server branded and do not install a GUI by default.

The last POWER1 on Mars is dead

The Opportunity Rover, also known as the Mars Exploration Rover B (or MER-1), has finally been declared at end of mission today after 5,352 Mars solar days, when NASA was unable to re-establish contact. It had apparently been knocked off-line by a dust storm and was unable to restart, either due to power loss or some other catastrophic failure. Originally intended for a 90 Mars solar day mission, it ended up running almost 60 times longer than anticipated and traveled nearly 30 miles on the surface in total. Spirit, or MER-2, its sister unit, had previously reached end of mission in 2010.

And why would we report that here? Because Opportunity and Spirit were both in fact powered by the POWER1, or more accurately a 20MHz BAE RAD6000, a radiation-hardened version of the original IBM RISC Single Chip CPU and the indirect ancestor of the PowerPC 601. There are a lot of POWER chips in space, both with the original RAD6000 and its successor the RAD750, a radiation-hardened version of the PowerPC G3.

That's not the end of Power ISA chips on Mars, though: Curiosity, which is running a pair of RAD750s (one main and one backup, plus two SPARC accessory CPUs), is still in operation at 2,319 Mars solar days and ticking. There is also the 2001 Mars Odyssey orbiter, which is still circling the planet with its own RAD6000 and is expected to have enough propellant to continue survey operations until 2025. Curiosity's design is likely to be reused for the Mars 2020 rover, meaning possibly even more Power chips will be exploring space and doing science where it counts millions of miles from home.


On a recent Hacker News discussion someone pointed me to this weird historical oddity: the AMD Opteron-socket-compatible POWER7 as reported in El Reg, circa 2006.

We use a POWER6 here at Floodgap for the main server, which, as was typical for RISC servers of those days, uses a bespoke logic board, and getting a replacement for it was quite expensive (as we found out when it blew one in 2014). Part of this was no doubt due to their low production volumes, and in 2006 IBM was still producing x86 Xeon-based servers, so it made logical sense to try to consolidate their manufacturing. (Recall Apple did something similar with the Power Macintosh 4400 and the "Yellowknife"-derived "Gossamer" beige Power Macintosh G3, both of which were intended to use, or at least use more, off-the-shelf commodity PC components.)

What was particularly interesting about this concept, however, was that the envisioned AMD motherboard would also have accommodated SPARC processors, intended to attract IBM, Sun and Fujitsu at a time when Intel was planning to unify their own hardware for Xeon and Itanium (rip). In some respects it may have reflected an IBM perception that Itanium was potentially a threat to their RISC line, along with a desire to achieve the same economies Intel was planning for.

Did this happen? Although the Register's article implies some prototyping was done, it doesn't look like it ever saw the light of day, and it's not clear why the agreement foundered. Indeed, all the POWER7 systems I've seen continued to use a custom board, and I've never heard anything about SPARCs of that generation using such a common logic board either. In particular, the lowest level Power 720 820x machines (the ones that would have been most likely to use such a cost-reduced design) are in fact very similar to the POWER6 820x machines (including our local 8203-E4A), and there are even upgrade paths.

The idea didn't really die, though, because IBM finally opened up their architecture into OpenPOWER with the POWER8 and now anyone can make a board that a Power chip can go into. And, of course, one particular vendor's POWER9 workstation is what this very article is being typed on. Naturally this wasn't altruism on Big Blue's part; it was their attempt to build a larger multi-front ecosystem to combat x86 dominance in the server room, which would embiggen the pie for "big RISC" servers and thus IBM's slice of it. If it also caused Power chips to turn up in other environments, well, that would be more icing on the cake. While the "Opteron POWER7" looks like it never happened, and no one's putting Epyc chips in Talos IIs, at least some concept of a cross-vendor Power logic board did manage to survive and we OpenPOWER pioneers are the lucky beneficiaries today.

So long, Itanium

I have mixed feelings about Intel announcing the end of Itanic, er, the Itanium. Everyone knew this was coming, of course, but Intel finally officially gave Itanium a death date of mid-2021 instead of keeping it limping along as it has for the past several years.

On the one hand, I don't like Itanium for killing PA-RISC. The first machine I ever had root on was an HP K250 running HP-UX 10.20, and I still have a PA-RISC laptop (an RDI PrecisionBook C160L) and a C8000 with dual PA-8900s, the most powerful Precision Architecture workstation HP ever made. I thought PA-RISC was a nice, clean instruction set with decent performance, HP seemed to at least try to keep up with PowerPC and SPARC, and I think it died before its time (there was even talk of using PA-RISC in Amiga computers!). HP is still the Itanium's only customer, mostly because they would probably roil the remnant HP-UX user base with another architecture switch. In fairness, it should be noted that it was HP themselves that didn't think there was any money in continuing to develop their own CPU (the same logic they applied to killing the DEC Alpha, another exceptional architecture murdered way too early). That may have been true at the time, but Itanium has painted the Superdomes into a corner, and HP-UX and OpenVMS will probably go the way of emulation like the Unisys mainframes have.

On the other hand, the end of IA-64 marks the end of an era, not only for non-x86 designs within Intel, but as the last major VLIW CPU (which Intel and HP called "EPIC"). GPUs are of course largely VLIW, VLIW chips still turn up in embedded systems applications and there are still oddballs like the Russian Elbrus, but as a CPU architecture load/store designs have largely won. Even your typical Intel chip is essentially a modern RISC-style core with an x86_64 instruction decoder bolted on (and of course all that other black box crap that you don't get with POWER9). Itanium made it worse with a pretty weaksauce x86 emulator and its unusual architecture choices that make it relatively resistant to Spectre-style attacks but difficult to optimize typical software applications for, a recurrent problem with VLIW compilers in general. This was one of the big reasons the SGI IA-64 workstations never really took off compared to the MIPS systems they replaced.

Big classic proprietary CPUs were a big part of my early career and losing Itanium is a sad whimper. But while Power ISA is a much less adventurous design, it's a much more performant one, and OpenPOWER means it will stay relevant for a long time to come. If you've got a Talos II under your desktop as I do, you chose right.