Unfortunately, work on this bug is getting stymied by the fact that POWER9 has hardware watchpoints disabled in the Linux kernel. Without them, trying to catch what's writing the faulty values is nearly impossible in an application of Firefox's complexity, because without hardware assistance gdb has to single-step through the code and check the watched value after every instruction. When you don't even know which haystack the needle is in, it's slow going: (no exaggeration) hundreds of times slower. I left Firefox to try to initialize overnight in the debugger on my 8-core Talos II. By the morning, 8 hours later, it hadn't even launched its first thread.
The reason they are disabled is an erratum that causes watchpoints set on cache-inhibited memory (such as device memory) to make the CPU halt with a checkstop. Arguably some sort of fault here is correct behaviour, but a checkstop is catastrophic; it's the equivalent of stopping your car by driving it off the road into a wall. I don't fault the PowerPC kernel maintainers for taking this interim approach, because without it an unprivileged user could instantly halt the machine, even inadvertently. Even if the kernel could detect that the watchpoint was pointing at a cache-inhibited address and return an error, a tricksy user could potentially set the watchpoint on "good" memory and then change the memory mapping underneath it.
I talked to the PowerPC kernel maintainers about this, and an interim solution we've sort of agreed on is to use a debugfs entry as a one-way switch so that workstation developers like me can turn on hardware watchpoints "at our own risk." When I'm ready to debug something that requires a proper watchpoint, I create a debugfs file, and the kernel will then allow hardware watchpoints until the next reboot. I'm the only user on the machine, so if I screw it up, it's only my pasty Roman-sculptured tuckus (and filesystem).
But this isn't going to write itself. I need to scratch my own itch, get it working, get it accepted, actually get it into a kernel (instead of schlepping local changes forward), and then finish debugging Firefox, and then finish the JIT, all on top of my day job, my work on TenFourFox, and staying out of trouble with my lovely wife for not emerging from the back room for hours.
So now I'm going to be a little less than nice to Raptor. Raptor has been fairly public about their support for Chromium, ostensibly because of the (IMHO irresponsible) proliferation of Electron apps, but they've been very tepid on Firefox. I understand a small company can't do everything, but they have had time to do ports of Unreal Engine and WINE (though this by itself is not enough to make Hangover, the QEMU-WINE fusion, work; I've also been tinkering with that and have it about 70% building). These are fun things to do and are certainly interesting, but that makes statements like this a bit galling, and statements like this a bit disingenuous.
Mozilla has a chicken-and-egg problem with an architecture that until very recently had a very small desktop share: that share (and the share of its desktop users running Firefox) would undoubtedly increase with a Firefox JIT, but the resources needed to write that JIT can't be justified until the share is larger. Furthermore, it rings hollow for Raptor to ding Mozilla in that tweet for not being sufficiently open when, until literally days ago, Google wouldn't land the existing POWER9 Chromium work, while Mozilla has been accepting POWER9 (and other PowerPC) patches into Firefox as a tier-3 platform for pretty much Firefox's entire existence. You can put the open-source lipstick on the Google pig as much as you like, but at the end of the day, it's still Google and it's still Google's repo. Freedom involves choice. I'm not going to slam the people who did the hard work on the Chromium port, because it is hard work and unfortunately Electron is a thing despite my misgivings, but I am going to slam Raptor for endorsing it at Firefox's expense.
I'm not asking Raptor to do the Firefox JIT port, though I may be soliciting help to farm it out given my reduced number of available cycles. (Right now it's based on Firefox 62; once it works there, we'll forward-port it to trunk. More on that in a future post, but I'll probably put my current work up on GitHub. If you're interested in contributing, post in the comments.) I am, however, asking Raptor to endorse the effort, and I am asking them to become more involved with developer-facing features so that those of us working on ports can do so more productively.
As an example, developing the interim debugfs switch for hardware watchpoints and getting it into the kernel would be an enormous help to me personally (and would save me a great deal of time), and would probably be very beneficial for other developers. Nearly everyone on the LinuxPPC kernel team I talked to agreed this is a big deficiency and one that is realistically implementable. It would be nice if this could proceed in parallel so I'm not blocked on doing everything myself, because on a scale of 0 to even, I just can't. I'm sure many other developer pain points will appear as more people start working on Talos systems, and I'd like Raptor to treat these requests with priority and dedicate resources to the worthy ones so that more port work and development can flourish.
Let me soften a little in conclusion by saying Raptor has a very hard row to hoe as a small company jumpstarting an entire ecosystem. I'm being hard on them because I'm glad they exist, I intend to continue being a customer, and as a long-time Power ISA bigot I want them to succeed. But I also want to see the principles of free computing embodied in the Talos family's hardware manifested in its software, too. I don't see that being adequately expressed in the choices they've made so far, and I'd like that to change. Developers need to be prioritized and software choice needs to be facilitated. Let's see more of that so we can see more POWER9 adoption and a brighter future for desktop computing.