
Showing posts from December, 2021

W(h)ither POWER8


With the recent announcement that Ubuntu's ppc64le ("ppc64el") flavour is moving to require POWER9, it's worth asking how much life is left not only in POWER8 but also in POWER9, now that Power10 (such as it is) is available.

POWER8 was the first OpenPOWER processor and the one planned for the original Raptor Talos (that never got released to the public), and it also appeared in several third-party systems, largely by Tyan. It offered fully open firmware, and while it exclusively required Centaur memory buffer chips, these could be on riser cards, interposers or even on the logic board to allow attaching regular ECC DIMMs. It introduced ISA 2.07, which among other features expanded on the vector-scalar extension instructions first introduced in POWER7 (called VSX-2 in 2.07).

POWER8 systems are certainly more widely distributed than previous generations, which since about POWER5 were almost exclusively IBM. They were also the first Power ISA CPUs with a fully-functioning little-endian mode (the POWER7 implementation had gaps), which caused POWER8 to rapidly become the baseline for most distributions supporting Power. But POWER9 is even more widely distributed, not least because of "low end" systems like this Talos II and the Blackbird. It uses 25% less power but is 50% faster than a chip that was already two to three times faster than POWER7, and it has even more advantages in terms of instruction set: ISA 3.0 expands VSX further (VSX-3) and also adds a number of other useful instructions. The current incarnation of our Firefox JIT, for example, leverages new POWER9-specific instructions for remainders, accessing the program counter and 64-bit byte swapping. All this, and it's still a fully open architecture with fully open firmware.

On the other hand, Power10 is presently a step backwards. Putting its otiose binary blobs aside for the moment, there are only a few Power10 SKUs in its current infancy, none of them are workstations, and none of them don't say IBM. No Power10 hardware takes direct attach RAM, not even like the POWER8 did. No ODM has a channel for obtaining the actual CPUs. If there's a Rainier reference design to work from, no one seems to be talking about it. It's almost back to the bad old days when IBM wouldn't sell me a POWER7 and nobody else made one (my long-running POWER6 was a reseller purchase).

If Ubuntu's move is the first of many to decommission POWER8 support, that's still over six years as a first-tier citizen (almost five as second fiddle to POWER9), and no one else so far has talked about a similar move. (Even if RHEL 9 goes POWER9+ only, RHEL 8 would presumably support your POWER8 until 2029.) It's sad to see it happen, but POWER9, besides being easier to get, is an improvement in virtually every way and in ways Power10 right now is not. Besides the fact IBM's still selling POWER9 machines, the chip's time on top and its wider distribution are good signs for the first Power CPU in years to be in purpose-built desktops and more third-party servers. Nearly five years atop the heap buys you a lot of market penetration, especially with a questionable successor. While all good things must come to an end, POWER8's death is hardly imminent, and POWER9's is nowhere yet in sight.

91ESR with Baseline Compiler/Baseline wasm for POWER9


It's heeeeeee-re. I've completed the pull-up of the POWER9 Firefox JavaScript JIT to the current ESR, Firefox 91. As a bonus I also completed the second-stage Baseline Compiler (Baseline Interpreter being the first-stage compiler) at the same time for a reason I'll explain in a minute.

The build process is the same as Firefox 91, using the 91ESR tree, but requires adding --enable-jit to your .mozconfig and applying this patch and set of files. Please note that POWER9 remains the only supported architecture (Power10 grudgingly, but it should work), and only on little-endian. If you compile big-endian, the JIT should statically disable itself, even with --enable-jit. If you compile with -mcpu=power9, which is recommended, the JIT is statically enabled with --enable-jit and becomes slightly faster because there are fewer runtime checks. If you don't explicitly specify POWER9, or do something like -mcpu=power8, but still specify --enable-jit, then runtime detection should be enabled (which right now disables the JIT). I have not tested this on POWER8 because I don't have a POWER8, so I can't fix it myself. If this doesn't work or builds a defective Firefox or JavaScript shell, please submit a correction and I'll incorporate it.
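As a rough illustration of the paragraph above, here is a minimal sketch of a helper that writes a .mozconfig for a 91ESR build with the JIT turned on. The helper name write_jit_mozconfig is hypothetical, and the options other than --enable-jit are assumed to mirror the optimized configuration shown elsewhere on this page, so adjust them to match your own working Firefox 91 setup.

```sh
#!/bin/sh
# Sketch only: write a .mozconfig for a 91ESR build with the POWER9 JIT enabled.
# write_jit_mozconfig is a hypothetical helper name; everything except
# --enable-jit is assumed to match an ordinary optimized build config.
write_jit_mozconfig() {
    out="${1:-.mozconfig}"
    cat > "$out" <<'EOF'
export CC=/usr/bin/gcc
export CXX=/usr/bin/g++

mk_add_options MOZ_MAKE_FLAGS="-j24" # or as you like
ac_add_options --enable-application=browser
# -mcpu=power9 statically enables the JIT and avoids the runtime checks
ac_add_options --enable-optimize="-O3 -mcpu=power9 -fpermissive"
ac_add_options --enable-release
ac_add_options --enable-linker=bfd
# without this option the JIT is not built in at all
ac_add_options --enable-jit
EOF
}

# Usage (run from the top of the patched 91ESR tree):
#   write_jit_mozconfig .mozconfig
```

Dropping -mcpu=power9 from the optimize string should fall back to runtime detection as described above, which at the moment means the JIT stays off.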

What's working? What now works is the Baseline Interpreter and the Baseline Compiler, and Baseline compilation for WebAssembly. asm.js using Cranelift isn't supported yet, because this requires the third-stage Ion optimizing compiler, and WebAssembly transpiled to asm.js will simply compile in Baseline. This is not the fastest the browser can run, but it is certainly noticeably faster, and most of the pure JavaScript benchmarks I tested showed it is already several times more efficient than the C++ interpreter. I did not encounter any obvious crashes in things like Gmail, Google Docs and my workplace Office 365 instance (and I was a lot more productive!), but the reason for releasing this is to see if you find any. If you can reliably crash the browser in a way that doesn't crash with the JIT off, file an issue with exact steps to reproduce. If I can't reproduce it, I can't fix it. Steps to trigger an assertion in a debug build would be even more helpful.

What's not working yet? The third-stage optimizing compiler doesn't work and isn't enabled (our patches turn it off by default in the browser, and you should always specify --no-ion to the JS shell unless you're doing development), and as stated, this also means no specific Cranelift support for things like the asm.js-based DOSBox and MAME emulators on Internet Archive. These will run in the slower Baseline Compiler directly. There are also some failures in Wasm compared to x86_64 and ARM that didn't turn up in the test suite (it passes everything) which I'm unable to narrow down right now. For example, WAD Commander has graphical glitches even though the game plays fine, and Google Earth stalls out with a runtime error. I finished the Baseline Compiler support in the hope I'd smoke out some other bugs, and I did in fact find more to fix, but fixing them didn't resolve these. On the other hand, these handcoded Wasm demos seem to work, as does this Wasm RISC-V emulator, this somewhat funky karts game and this Wasm Gameboy emulator.

It is entirely possible that some of this is simply due to other pre-existing bugs on our platform that this support just unmasks — after all, we were never able to run code like this before — and there are naturally changes in later Firefoxen that aren't in the ESR. I won't be able to assess that until it's pulled up further, of course, but for the time being you can use the JIT in 91ESR if you prefer/need the speed while further development stabilizes. Until then, please don't file issues on Wasm stuff that doesn't work unless you know why it doesn't work.

Next steps? The plan is to pull the 91ESR JIT up to Firefox 97 or 98 alpha and start on Ion development on that new base, hopefully finishing in time to do one last pull-up to Firefox 102, i.e., the next ESR, and submit the finished JIT to Mozilla then. Longer term, we'd welcome support for additional configurations, and the key is SupportsFloatingPoint() in js/src/jit/ppc64/Assembler-ppc64.h, which I have abused as a runtime gate. You should be able to tell from the comments in that file how to force the JIT to run on an unsupported configuration. I have implemented HasPPCISA3(), which returns true on POWER9 (and Power10), so that appropriate codegen paths are run based on the CPU present. Most of the codegen will work on little-endian POWER8 except for a few places that will hit a forced crash. If you get this working and implement HasPPCISA27() or some such, then I will accept those changes assuming they are not massive. I will also accept big-endian patches, but you will have a much bigger job: unless you're prepared to do little-endian emulation for Wasm or asm.js (like the limited little-endian support in TenFourFox's IonPower-NVLE for typed arrays) and maintain those changes, certain things will never work on big-endian.
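If you're not sure whether your build host falls inside the supported configuration before you start poking at the runtime gate, a quick check from the shell may help. This is only a sketch and is not how the JIT itself detects the CPU: the function name jit_config_supported is hypothetical, and it assumes the Linux kernel reports the core on a "cpu" line in /proc/cpuinfo (for example "POWER9, altivec supported") and that uname -m prints ppc64le on little-endian systems.

```sh
#!/bin/sh
# jit_config_supported [MACHINE] [CPUINFO_FILE]
# Hypothetical helper: returns 0 for little-endian POWER9 or Power10,
# and 1 for anything else. Both arguments default to the running system.
jit_config_supported() {
    machine="${1:-$(uname -m)}"
    cpuinfo="${2:-/proc/cpuinfo}"

    # Only little-endian is supported; big-endian kernels report ppc64.
    [ "$machine" = "ppc64le" ] || return 1

    # Assumption: the "cpu" line names the core generation.
    grep -Eiq '^cpu[[:space:]]*:[[:space:]]*POWER(9|10)' "$cpuinfo"
}

# Usage:
#   if jit_config_supported; then echo "LE POWER9+: supported"; fi
```

A little-endian POWER8 fails this check, which matches the current state of the JIT; anyone adding a HasPPCISA27() path would want to loosen the pattern accordingly.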

Meanwhile, your contributions are still solicited especially on the new work to be done and we'll be getting that new tree up so you can participate. However, patches and PRs that will not be accepted are anything that regresses the core support for LE POWER9, spacing or style changes (we will be doing cleanup on the entire set before submitting to Mozilla, so please don't waste our time on this right now), or sets covering multiple issues (one catastrophe at a time, please). The faster we get this done, the faster we get it in the tree, and the better supported we'll be going forward.

Starting with Firefox 96, there will be the usual updates on building mozilla-release, but I'll also do a verification build on 91ESR, make any needed updates to patches, and upload updates to GitHub. Please post your constructive and reproducible issues in the comments or on GitHub for triage.

Firefox 95 on POWER


Firefox 95 is released, screenshot at right. The big new feature, besides speculative AOT JIT which doesn't apply to us yet, is RLBox, which compiles certain third-party libraries into safe WebAssembly, and then compiles them back into C, so they can be compiled a third time into pre-sanitized native code. This has obvious security benefits and the performance impact shouldn't be especially large, but it adds yet another build-time prerequisite: the WASI SDK. This kind of really sucks because now you have to have a third toolchain (it builds one whether you like it or not) besides clang and our preferred compiler, gcc. Pending internal package support, some distros have chosen simply to disable this for the immediate future, even including Fedora.

Besides the inconvenience, the other main issue with this is that while it's clearly safer native code, it's also slower native code by some non-zero factor, however small in a well-optimized PGO-LTO build. As such I've chosen to test to make sure it works, but my "official" PGO-LTO build configs will have it turned off for the time being with --without-wasm-sandboxed-libraries. While it was very easy to build the smaller WASI libc, it doesn't have C++ headers, so the build goes bang if you try to use it as a WASI sysroot. Fortunately you can download a pre-built copy of the SDK and just pull out the system-independent wasi-sysroot and feed that to the Firefox build system with --with-wasi-sysroot=/where/it/at/wasi-sysroot. Then, to get the built-ins for linking, pull out libclang_rt.builtins-wasm32.a and copy it to /usr/lib64/clang/13.0.0/lib/wasi (or wherever your clang libraries reside), and ensure you have wasm-ld. You may have to install lld to get wasm-ld; I had to, but Fedora has a package for it already. Now that you've looted the archive, you can just trash the rest of it and use your system version of clang assuming it's version 8+. This works and makes a functional version of Firefox, but I can totally understand why this is unacceptable if you want to build from the raw source code.
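The looting procedure above could be sketched roughly as follows. The function name loot_wasi_sdk and its three arguments are hypothetical, and the layout of the unpacked SDK is not assumed: the sketch simply searches the tree for the wasi-sysroot directory and libclang_rt.builtins-wasm32.a. Copying into a system clang directory such as /usr/lib64/clang/13.0.0/lib/wasi will usually need root.

```sh
#!/bin/sh
# loot_wasi_sdk SDK_DIR SYSROOT_DEST CLANG_WASI_DIR
# Hypothetical helper: copy the two pieces Firefox needs out of an unpacked
# pre-built WASI SDK, then print the matching mozconfig line.
loot_wasi_sdk() {
    sdk="$1"; sysroot_dest="$2"; clang_wasi_dir="$3"

    # The system-independent sysroot (includes the C++ headers).
    sysroot="$(find "$sdk" -type d -name wasi-sysroot | head -n 1)"
    # The compiler built-ins needed at link time.
    builtins="$(find "$sdk" -type f -name libclang_rt.builtins-wasm32.a | head -n 1)"

    if [ -z "$sysroot" ] || [ -z "$builtins" ]; then
        echo "wasi-sysroot or builtins not found under $sdk" >&2
        return 1
    fi

    mkdir -p "$sysroot_dest" "$clang_wasi_dir" || return 1
    cp -R "$sysroot" "$sysroot_dest/" || return 1
    cp "$builtins" "$clang_wasi_dir/" || return 1

    # wasm-ld may only appear after installing your distro's lld package.
    command -v wasm-ld >/dev/null 2>&1 ||
        echo "warning: wasm-ld not found in PATH; install lld" >&2

    echo "ac_add_options --with-wasi-sysroot=$sysroot_dest/wasi-sysroot"
}

# Usage (paths are examples only):
#   loot_wasi_sdk ./unpacked-sdk /where/it/at /usr/lib64/clang/13.0.0/lib/wasi
```

The printed line can be pasted into a .mozconfig in place of --without-wasm-sandboxed-libraries if you want RLBox enabled.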

After all that, though, I'm running Firefox 95 without RLBox simply because we need to wring all the performance out of the executable that we can. As such, here are the current .mozconfigs.

Debug

export CC=/usr/bin/gcc
export CXX=/usr/bin/g++

mk_add_options MOZ_MAKE_FLAGS="-j24" # or as you like
ac_add_options --enable-application=browser
ac_add_options --enable-optimize="-Og -mcpu=power9 -fpermissive"
ac_add_options --enable-debug
ac_add_options --enable-linker=bfd
ac_add_options --without-wasm-sandboxed-libraries

export GN=/home/censored/bin/gn # if you haz

Optimized

export CC=/usr/bin/gcc
export CXX=/usr/bin/g++

mk_add_options MOZ_MAKE_FLAGS="-j24"
ac_add_options --enable-application=browser
ac_add_options --enable-optimize="-O3 -mcpu=power9 -fpermissive"
ac_add_options --enable-release
ac_add_options --enable-linker=bfd
ac_add_options --enable-lto=full
ac_add_options --without-wasm-sandboxed-libraries
ac_add_options MOZ_PGO=1

export GN=/home/censored/bin/gn
export RUSTC_OPT_LEVEL=2

The PGO-LTO build patch is also updated.

Oh, by the way, in JIT news, I've mounted a debug browser and OpenPOWER Firefox can now run Doom. (Firefox 91 ESR shown.)

Still some glitches to work out, some of which I suspect aren't anything to do with Wasm support, but you couldn't do this on Firefox on OpenPOWER before. Does this count as a "Tonight's Game on OpenPOWER" entry? (*Firefox 2.0.0.20 on Windows 95 screenshot from Beta Archive.)