Latest Posts

The first production RISC-V workstation?


No, not the RiscPC, a RISC-V PC. And, not counting the various one-offs, it appears to be the very first production RISC-V workstation available. SiFive is announcing the RISC-V PC at the Linley Group Fall Virtual Processor Conference, based on the Freedom U740 ("FU740") to be introduced at the same time next month.

Precious few details are available, such as loadout, options, availability and most of all cost, but when has that stopped us from idly speculating before, eh? It is virtually certain the machine will be composed largely of off-the-shelf components other than the CPU, which is the real mystery of interest. The FU740 appears to be an evolution of the FU540, which is a 64-bit 1.5GHz+ part with four U54 "little" cores combined with one S1-series "big" core and 2MB of L2 cache on a 28nm process. Plainly, neither of these cores is even remotely in the ballpark with OpenPOWER: SiFive quotes CoreMark/MHz scores of 3.01 for both the U54 and S54, whereas the POWER9 easily achieves over 160. While the FU740 will almost certainly be faster due to its probable basis on the U74, it is difficult to imagine that the performance gulf will be narrowed significantly (the U74 edges up to around 5). You should not buy one and expect it to compare favourably with x86 or a Raptor system.

On the other hand, there's a good chance this will be another truly open system based on the fact that the Freedom E300 and U500 series are open source under the Apache license. While some parts of SiFive are proprietary, this line is not, and we presume that the U700 series will be likewise. RISC-V still lacks firm specs for vector and bit manipulation instructions, and this certainly hurts them for desktop and mobile applications, but this is a known deficiency and is being worked on. Assuming no shenanigans with the firmware, there's encouraging potential even in this early form.

I'm unambiguously on Team Power because of my long history with the architecture, but this blog is certainly interested in all kinds of free vendor-unencumbered computing, and this machine may well represent another such system. And it's newsworthy as the first RISC-V system that's at least workstation form factor even if its likely performance doesn't currently make it a credible daily driver. But maybe that's not the point: the point is to get developers on the architecture in a way that's bigger than an evaluation board (cf. Linus Torvalds and ARM), meaning it doesn't have to be their only daily driver; it just has to "be there" so people think about it. More on cost and specs and "how open is it" when we actually see it in October.

Moar OpenPOWER cores plz


More news from virtual OpenPOWER Summit 2020: I mentioned it would be interesting to see what other cores would pop up on the OpenPOWER GitHub, and indeed, following on from the PowerPC A2I, comes another A2 variant, the PowerPC A2O.

Announced today by IBM and released under the standard OpenPOWER license, the A2O is an evolved 64-bit PowerPC A2 compliant to ISA 2.07, comparable to POWER8 (the A2I was 2.06) under the embedded-focused Book III-E, and can run either big- or little-endian. At 45nm it was intended for 3GHz+ speeds; at 7nm it is expected to achieve 4.2GHz at 0.85W, or 3GHz at 0.25W. Unlike the strictly in-order and slightly more power-thrifty A2I, the A2O is out-of-order and prioritizes single-threaded performance, but it's only SMT-2 versus the A2I, which is SMT-4. Even this is theoretical, however, because the documentation notes that only single-thread generation has been attempted so far. Each core has an AXU similar to the A2I that appears to offer FPU operations in the Verilog code, plus a branch unit, FXUs for simple and complex integer operations respectively, and a load/store unit. There also appears to be a basic MMU, though the core allows running without one, relying entirely on ERATs, but unfortunately I couldn't find a vector unit (the A2I as released didn't come with one either).

IBM casts the A2O as being more appropriate for artificial intelligence, autonomous driving and security, whereas the A2I was meant for streaming, network processing and data analysis. I'm not sure I believe either of those claims, but despite apparently being just an evolutionary improvement over the A2I, I think the A2O is more promising, especially for smaller-scale systems. By being 2.07-compliant it's already almost a mainline POWER8, and the interest that has bubbled up around A2I should find even more to like in A2O. Adding a radix MMU implementation and vector operations wouldn't be trivial, and even this single-thread implementation has high FPGA utilization, but I think this would be a better basis than A2I for that hypothetical OpenPOWER developer board everybody seems to want or even a mythical modern PowerPC laptop. Like A2I, A2O still doesn't replace Microwatt, which is much better documented, better supported, can actually boot a Linux kernel, and if for no other purpose than pedagogy is a far more purposeful model for OpenPOWER systems. That said, A2O's very presence is yet another choice and yet another great reason to be on board with OpenPOWER.

IBM open-sources PowerAI as OpenCE


News from today's COVID-19 socially distanced virtual OpenPOWER Summit: IBM announced the open-sourcing of their PowerAI package as OpenCE, the Open Cognitive Environment for deep learning and machine learning applications. The code should build on any Linux-based OpenPOWER system, including Raptor-family workstations and servers, and the GitHub repository contains everything needed to build TensorFlow, PyTorch, XGBoost and related projects and dependencies. If building binaries from scratch leaves you cold waiting for the goodies, Oregon State University simultaneously announced plans to offer pre-built ppc64le binaries for each upcoming tagged release, both with and without CUDA support. Unfortunately, not everything is open: you'll still need to register and download a separate blob from Nvidia if you intend to use CUDA, even though it can reportedly be downloaded at no cost afterwards, and if you do you'll naturally be limited to Nvidia GPUs (which you can't use for 3D acceleration on OpenPOWER currently due to the lack of a working open-source driver). Still, here's a high-power option for your machines coming from someone who knows how to optimize for the platform, and Raptor's PowerAI-specific SKU is a turnkey package configured expressly for that purpose (and it's even in stock). Perhaps OpenCE is something they could preinstall for even greater value now that it's available.

Microwatt floats


When we last visited Microwatt, the little synthesizable OpenPOWER core that could, we looked at how you could hack instructions in. Or you can sit back and wait for the PRs from IBM, including now a simple FPU. While this pull request describes its performance in modest terms, impressively it operates exactly the same (and even authentically "fails" the same tests in the same fashion) as the FPU in the POWER9. There is still no (full) supervisor mode, and no vector unit, but Microwatt is now advanced enough to boot a Linux kernel. The possibility of a single-board Microwatt-based system (and fully reprogrammable, too) gets closer every day.

Firefox 80 on POWER


Firefox 80 is available, and we're glad it's here considering Mozilla's recent layoffs. I've observed in this blog before that Firefox is particularly critical to free computing, not just because of Google's general hostility to non-mainstream platforms but also the general problem of Google moving the Web more towards Google.

I had no issues building Firefox 79 because I was still on rustc 1.44, but rustc 1.45 asserted while compiling Firefox, as reported by Dan Horák. This was fixed with an llvm update, and with Fedora 32 up to date as of Sunday and using the most current toolchain available, Firefox 80 built out of the box with the usual .mozconfigs.

Since there was a toolchain update, I figured I would try out link-time optimization again since a few releases had elapsed since my last failed attempt (export MOZ_LTO=1 in your .mozconfig). This added about 15 minutes of build-time on the dual-8 Talos II to an optimized build, and part of it was spent with the fans screaming since it seemed to ignore my -j24 to make and just took over all 64 threads. However, it not only builds successfully, I'm typing this post in it, so it's clearly working. A cursory benchmark with Speedometer 2.0 indicated LTO yielded about a 4% improvement over the standard optimized build, which is not dramatic but is certainly noticeable. If this continues to stick, I might try profile-guided optimization for the next release. The toolchain on this F32 system is rustc 1.45.2, LLVM 10.0.1-2, gcc 10.2.1 and GNU ld.bfd 2.34-4; your mileage may vary with other versions.
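If you want to repeat the LTO experiment, here's a minimal sketch of adding the switch to an existing .mozconfig without clobbering whatever is already in it. The helper name enable_lto and its path argument are made up for illustration and are not part of the Firefox build system; the only line taken from this post is export MOZ_LTO=1.

```shell
#!/bin/sh
# enable_lto: hypothetical helper (not part of the Firefox build system).
# Appends the LTO switch from this post to a mozconfig exactly once.
enable_lto() {
    mozconfig="$1"
    # Create the file if it is missing so grep has something to read.
    [ -e "$mozconfig" ] || : > "$mozconfig"
    # -x matches the whole line, -F treats the pattern as a fixed string.
    grep -qxF 'export MOZ_LTO=1' "$mozconfig" ||
        printf '%s\n' 'export MOZ_LTO=1' >> "$mozconfig"
}

# Usage: enable_lto "$HOME/.mozconfig"
```

Running it twice leaves a single copy of the line, so it's safe to keep in a build script. Removing the line again is how you'd get back to the standard optimized build for comparison.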

There's not a lot new in this release, but WebRender is still working great with the Raptor BTO WX7100, and a new feature available in Fx80 is Video Acceleration API (VA-API) support for X11 (welcome, since Wayland is a disaster area without a GPU). The setup is a little involved. First, make sure WebRender and GPU acceleration are up and working with these prefs (set or create):

gfx.webrender.enabled true
layers.acceleration.force-enabled true

Restart Firefox and check in about:support that the video card shows up and that the compositor is WebRender, and that the browser works as you expect.
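If you'd rather not click through about:config, the same two prefs can go in a user.js file in your profile directory, which Firefox reads at startup. This is only a sketch: the function name webrender_prefs is mine, and PROFILE_DIR in the usage comment is a placeholder for wherever your profile actually lives.

```shell
#!/bin/sh
# webrender_prefs: hypothetical helper that prints user.js lines
# for the two acceleration prefs listed above.
webrender_prefs() {
    # printf reuses the format string for each name/value pair.
    printf 'user_pref("%s", %s);\n' \
        gfx.webrender.enabled true \
        layers.acceleration.force-enabled true
}

# Usage (PROFILE_DIR is a placeholder, not a real variable):
#   webrender_prefs >> "$PROFILE_DIR/user.js"
```

Note that prefs in user.js are reapplied on every start, so delete the lines from the file if you later want to change them from within the browser.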

VA-API support requires EGL to be enabled in Firefox. Shut down Firefox again and bring it up with the environment variable MOZ_X11_EGL set to 1 (e.g., for us tcsh dweebs, setenv MOZ_X11_EGL 1 ; firefox &, or for the rest of you plebs using bash and descendants, MOZ_X11_EGL=1 firefox &). Now set (or create):

media.ffmpeg.vaapi-drm-display.enabled true
media.ffmpeg.vaapi.enabled true
media.ffvpx.enabled false

The idea is that VA-API will direct video decoding through ffmpeg and theoretically obtain better performance; this is the case for H.264, and the third setting makes it true for WebM as well. This sounds really great, but there's kind of a problem:

Reversing the last three settings fixed this (the rest of the acceleration seems to work fine). It's not clear whose bug this is (ffmpeg, or something about VA-API on OpenPOWER, or both, though VA-API seems to work just fine with VLC), but either way this isn't quite ready for primetime yet on our platform. No worries since the normal decoder seemed more than adequate even on my no-GPU 4-core "stripper" Blackbird. There are known "endian" issues with ffmpeg, presumably because it isn't fully patched yet for little-endian PowerPC, and I suspect once these are fixed then this should "just work."
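Since you may end up flipping these three prefs back and forth while the bug gets sorted out, here's a small sketch that emits them as user.js lines in either state. The function name vaapi_prefs and its on/off argument are invented for this example; "on" is the set of values tried above and "off" is each of those values reversed.

```shell
#!/bin/sh
# vaapi_prefs: hypothetical helper that prints user.js lines for the
# three VA-API prefs. "on" = values tried in this post, "off" = reversed.
vaapi_prefs() {
    case "$1" in
        on)  vaapi=true;  ffvpx=false ;;
        off) vaapi=false; ffvpx=true  ;;
        *)   echo "usage: vaapi_prefs on|off" >&2; return 1 ;;
    esac
    printf 'user_pref("%s", %s);\n' \
        media.ffmpeg.vaapi-drm-display.enabled "$vaapi" \
        media.ffmpeg.vaapi.enabled "$vaapi" \
        media.ffvpx.enabled "$ffvpx"
}

# Usage (PROFILE_DIR is a placeholder for your profile path):
#   vaapi_prefs off >> "$PROFILE_DIR/user.js"
#   MOZ_X11_EGL=1 firefox &
```

If several copies of a pref end up in user.js, the last one wins, so appending the "off" set after an earlier "on" set is enough to undo it.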

In the meantime, the LTO improvement with the updated toolchain is welcome, and WebRender continues to be a win. So let's keep evolving Firefox on our platform and supporting Mozilla in the process, because it's supported us and other less common platforms when the big 1000kg gorilla didn't, and we really ought to return that kindness.

POWER10 sounds really great, but ...


IBM took the wraps off POWER10 officially today, a (Samsung-manufactured) 7nm monster in 18 layers with up to 15 SMT-8 cores (120 threads) with 2MB of L2 per core, up to 120MB of L3, 1 TB/s memory access, OpenCAPI and PCIe 5. New on board are an embedded matrix math accelerator for specialized AI performance, multipetabyte memory clusters and transparent memory encryption with four times as many AES engines as POWER9. Overall, IBM is touting that the processor is three times more energy efficient than POWER9 while being up to twice as fast at scalar and four times as fast at vector operations. General availability is announced for Q3 or Q4 of 2021.
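For the record, the thread arithmetic is just sockets times cores times SMT ways. A throwaway sketch follows; the function name hw_threads is made up, and the SMT-4 figure for the dual-8 POWER9 Talos II is my assumption about that configuration rather than anything from IBM's announcement.

```shell
#!/bin/sh
# hw_threads SOCKETS CORES_PER_SOCKET SMT_WAYS -> total hardware threads
# (hypothetical helper, plain integer arithmetic)
hw_threads() {
    echo $(( $1 * $2 * $3 ))
}

hw_threads 1 15 8   # one POWER10 as announced: 15 SMT-8 cores
hw_threads 2 8 4    # dual-8 POWER9 Talos II, assuming SMT-4
```

That gives 120 for a single POWER10 socket against 64 for the whole two-socket machine this blog is written on.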

First of all: damn. This sounds sweet. The dual-8 POWER9 Talos II under the desk with "just" 64 threads and PCIe 4 is already giving me sorrowful Eeyore eyes, even though there's no guarantee what, if any, lower-end systems suitable as workstations will be available when the processor is. But what we do know right now is that Raptor has said there won't be POWER10 systems, and as it stands presently nobody else is making workstation-class OpenPOWER machines. Raptor, probably for reasons of NDAs, is playing this close to the vest, so what follows is merely my variably informed personal conjecture and may be completely inaccurate.

One of the truly incredible things about OpenPOWER — or at least POWER8 and POWER9 — is how far down you can see what the hardware is doing. In previous articles, we looked at emulating OpenPOWER at the bare metal level, and then even writing your own firmware bootkernel. But the bootloader and high-level firmware are really only the beginning: the build image created by op-build not only contains the Petitboot bootloader, but its Skiroot filesystem, Skiboot (containing OPAL, the OpenPOWER Abstraction Layer, which handles PCIe, interrupt and operating system services), Hostboot (which initializes and trains RAM, buffers and the bus), and the Self-Boot Engine which initializes the CPUs. Even the fused-in first instructions the POWER9 executes from its OTPROM to run the Self-Boot Engine are open source, and other than the OTPROM itself (it is a One-Time Programmable ROM, after all), everything is inspectable and changeable. And before the POWER9 executes those very first instructions, the Baseboard Management Controller that powers the system on has its own open firmware too. You know what your computer is doing, and you don't have to trust anyone's firmware build if you don't want to because you can always build and flash the system yourself.
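To make the layering easier to picture, here's a sketch that prints the stages from the paragraph above in boot order, earliest first. The function name boot_chain is invented, and the ordering is my reading of how the pieces hand off to one another, not an official diagram.

```shell
#!/bin/sh
# boot_chain: hypothetical helper listing the POWER9 OpenPOWER firmware
# stages described above, earliest stage first.
boot_chain() {
    cat <<'EOF'
BMC firmware: powers the system on
OTPROM: fused-in first instructions, starts the Self-Boot Engine
Self-Boot Engine: initializes the CPUs
Hostboot: initializes and trains RAM, buffers and the bus
Skiboot (OPAL): PCIe, interrupt and operating system services
Skiroot and Petitboot: bootloader environment
EOF
}

# Number the stages for display.
boot_chain | nl
```

Every line in that list other than the OTPROM itself is something you can rebuild and reflash, which is the whole point.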

Contrast this against the gyrations that x86 "open" systems have to struggle with. Do not interpret this as a slam against vendors like System76 or Purism because they're doing the best they can to deliver the most frequently used architecture in workstations and servers, in as unlocked a fashion as possible from processor manufacturers who are going in exactly the opposite direction. And there have been great improvements in untangling the tendrils of the Intel Management Engine from the processor, primarily through Coreboot's steady evolution. But even with these improvements where significant portions of the Intel ME are disabled, secret sauce is still needed to bring up the CPU and you have to trust that the sauce is only and specifically doing what it says it is, in addition to the other partitions of the ME which activated or not are still not fully understood. The situation is even worse for AMD Ryzen processors with the Platform Security Processor, which (at least the 3000 and 4000 variants) aren't presently supported by Coreboot at all, though System76 is apparently working on a port.

Don't just take my word for it: as of this writing no recent x86 system appears on the FSF Respects Your Freedom list, but the Talos II and T2 Lite both do (and I imagine the Blackbird is soon to follow). The Vikings D8 is indisputably libre, and has an FSF RYF certification, but is an AMD Opteron 4200, which is about eight or nine years old. As it stands I believe this is the most powerful x86 system still available on the FSF RYF list now that the D16 is out of production (Opteron 6200).

I think there's a reasonable argument to be had about how "open" something needs to be to be considered "libre" and at what point you could be considered to have meaningful control of your machine, but there's no denying there are aspects of modern x86 machines which you are prohibited by policy from getting into, and that means putting more faith in the processor vendor than they may truly deserve. (Don't get me started on GPUs, either. Or, for that matter, ARM.) Again, Raptor won't say, but their public disenchantment with POWER10 suggests that some aspects of the processor firmware stack are not open. This is a situation which is no better than x86, and I'm hoping this is merely an oversight on IBM's part and not a future policy.

To be effective, OpenPOWER needs to be more open than just the ISA being royalty-free, even though that's huge. To be sure, I think there has to be room for processor manufacturers to distinguish themselves in the market or you run the risk of a race to the bottom where people simply rip off designs (this is, I think, a real concern for RISC-V). I think sharing reference designs is necessary to get systems bootstrapped but I can't deny there's money in high performance applications, and high performance microarchitecture demands a return on investment to justify development costs. Similarly, to the extent that any pack-in hardware (like POWER9's Nest Accelerators) isn't part of the open ISA and is a separately managed device that simply shares a die, to me it seems logical to also make it part of how a processor manufacturer can stand out to potential customers.

But the firmware absolutely needs to be as clean and available as the ISA. If the ISA is open and the instructions the CPU is running are part of that open standard, then any firmware components, which (ought to) entirely consist of those instructions, must be open too. If the CPU has pack-in hardware on the die that isn't part of the open ISA, then you should be able to bring up the chip without it. The standard that was set for current OpenPOWER should be the same standard for POWER10 or it doesn't really deserve the OpenPOWER name, and I'm worried that Raptor's insinuations imply IBM's standard isn't the same. Similarly, arguing that the currently incomplete situation with x86 is functionally equivalent to OpenPOWER (or, for that matter, RISC-V) may be well-intentioned but is disingenuous. The FSF may be ideologues on binary blobs, but that doesn't make their position wrong, and the entire OpenPOWER ecosystem from IBM on down should recognize how much goodwill and prominence the openness of POWER8 and POWER9 has generated for the community.

I hope I'm wrong, but I'm concerned I'm not. Let's make sure we get POWER10 right or we won't be practicing what we preach, and that's going to kill us in the crib.