POWER10 sounds really great, but ...


IBM took the wraps off POWER10 officially today, a (Samsung-manufactured) 7nm monster in 18 layers with up to 15 SMT-8 cores (120 threads) with 2MB of L2 per core, up to 120MB of L3, 1 TB/s memory access, OpenCAPI and PCIe 5. New on-board is an embedded matrix math accelerator for specialized AI performance, multipetabyte memory clusters and transparent memory encryption with four times the number of AES engines than POWER9. Overall, IBM is touting that the processor is three times more energy efficient than POWER9 while being up to twice as fast at scalar and four times as fast at vector operations. General availability is announced for Q3 or Q4 of 2021.

First of all: damn. This sounds sweet. The dual-8 POWER9 Talos II under the desk with "just" 64 threads and PCIe 4 is already giving me sorrowful Eeyore eyes even though there's no guarantee what, if any, lower-end systems suitable as being workstations will be available when the processor is. But right now, what we do know is that right now Raptor has said there won't be POWER10 systems, and as it stands presently nobody else is making workstation-class OpenPOWER machines. Raptor, probably for reasons of NDAs, is playing this close to the vest, so what follows is merely my variably informed personal conjecture and may be completely inaccurate.

One of the truly incredible things about OpenPOWER — or at least POWER8 and POWER9 — is how far down you can see what the hardware is doing. In previous articles, we looked at emulating OpenPOWER at the bare metal level, and then even writing your own firmware bootkernel. But the bootloader and high-level firmware are really only the beginning: the build image created by op-build not only contains the Petitboot bootloader, but its Skiroot filesystem, Skiboot (containing OPAL, the OpenPOWER Abstraction Layer, which handles PCIe, interrupt and operating system services), Hostboot (which initializes and trains RAM, buffers and the bus), and the Self-Boot Engine which initializes the CPUs. Even the fused-in first instructions the POWER9 executes from its OTPROM to run the Self-Boot Engine are open source, and other than the OTPROM itself (it is a One-Time Programmable ROM, after all), everything is inspectable and changeable. And before the POWER9 executes those very first instructions, the Baseboard Management Controller that powers the system on has its own open firmware too. You know what your computer is doing, and you don't have to trust anyone's firmware build if you don't want to because you can always build and flash the system yourself.

Contrast this against the gyrations that x86 "open" systems have to struggle with. Do not interpret this as a slam against vendors like System76 or Purism because they're doing the best they can to deliver the most frequently used architecture in workstations and servers, in as unlocked a fashion as possible from processor manufacturers who are going in exactly the opposite direction. And there have been great improvements in untangling the tendrils of the Intel Management Engine from the processor, primarily through Coreboot's steady evolution. But even with these improvements where significant portions of the Intel ME are disabled, secret sauce is still needed to bring up the CPU and you have to trust that the sauce is only and specifically doing what it says it is, in addition to the other partitions of the ME which activated or not are still not fully understood. The situation is even worse for AMD Ryzen processors with the Platform Security Processor, which (at least the 3000 and 4000 variants) aren't presently supported by Coreboot at all, though System76 is apparently working on a port.

Don't just take my word for it: as of this writing no recent x86 system appears on the FSF Respects Your Freedom list, but the Talos II and T2 Lite both do (and I imagine the Blackbird is soon to follow). The Vikings D8 is indisputably libre, and has an FSF RYF certification, but is an AMD Opteron 4200, which is about eight or nine years old. As it stands I believe this is the most powerful x86 system still available on the FSF RYF list now that the D16 is out of production (Opteron 6200).

I think there's a reasonable argument to be had about how "open" something needs to be to be considered "libre" and at what point you could be considered to have meaningful control of your machine, but there's no denying there are aspects of modern x86 machines which you are prohibited by policy from getting into, and that means putting more faith in the processor vendor than they may truly deserve. (Don't get me started on GPUs, either. Or, for that matter, ARM.) Again, Raptor won't say, but their public disenchantment with POWER10 suggests that some aspects of the processor firmware stack are not open. This is a situation which is no better than x86, and I'm hoping this is merely an oversight on IBM's part and not a future policy.

To be effective, OpenPOWER needs to be more open than just the ISA being royalty-free, even though that's huge. To be sure, I think there has to be room for processor manufacturers to distinguish themselves in the market or you run the risk of a race to the bottom where people simply rip off designs (this is, I think, a real concern for RISC-V). I think sharing reference designs is necessary to get systems bootstrapped but I can't deny there's money in high performance applications, and high performance microarchitecture demands a return on investment to justify development costs. Similarly, to the extent that any pack-in hardware (like POWER9's Nest Accelerators) isn't part of the open ISA and are separately managed devices that simply share a die, to me it seems logical to also make it part of how a processor manufacturer can stand out to potential customers.

But the firmware absolutely needs to be as clean and available as the ISA. If the ISA is open and the instructions the CPU is running are part of that open standard, then any firmware components, which (ought to) entirely consist of those instructions, must be open too. If the CPU has pack-in hardware on the die that isn't part of the open ISA, then you should be able to bring up the chip without it. The standard that was set for current OpenPOWER should be the same standard for POWER10 or it doesn't really deserve the OpenPOWER name, and I'm worried that Raptor's insinuations imply IBM's standard isn't the same. Similarly, arguing that the currently incomplete situation with x86 is functionally equivalent to OpenPOWER (or, for that matter, RISC-V) may be well-intentioned but is disingenuous. The FSF may be ideologues on binary blobs, but that doesn't make their position wrong, and the entire OpenPOWER ecosystem from IBM on down should recognize how much goodwill and prominence the openness of POWER8 and POWER9 has generated for the community.

I hope I'm wrong, but I'm concerned I'm not. Let's make sure we get POWER10 right or we won't be practicing what we preach, and that's going to kill us in the crib.

Comments

  1. I expect that it has something to do with an inability to segment the market through purely technical means. Look at the way they handle their virtualization offerings: missing IBM firmware means missing AIX functionality - which is hard to get mad about. But how mad would you be if missing firmware (and instructions used only by said firmware) meant that you've got silicon operating outside of your control, flipping switches in the dark to artificially segment the market? Meet the new boss same as the old boss.

    ReplyDelete
  2. Looking at the remaining partners of OpenPOWER, I think it is basically dead, nowadays. No major HW vendors left at the top, outside of IBM themselves. Nvidia is now downgraded from a top-tier member to somewhere in the bottom tier and Mellanox went with them. Without those two, we basically have system integrators or other companies that just don't openly sell that kind of HW anyways. Inspur even saw fit to recently change their partner logo to explicitly note "Power Systems." At that point, IBM no longer needs to publicly share the inner guts of the CPU, since they are the main partner in that organization.

    ReplyDelete
  3. Looking at https://twitter.com/raptorcompsys/status/1295364416469377026?lang=en it appears raptor is only saying it wont have power systems this year. That may be a courtesy thing to allow IBM a marketing window on the new chip. According to https://www.itjungle.com/2020/11/23/ibm-reveals-power10-rollout-plan-begins-power11/ they are releasing the bigger iron first at end of Q3 and not releasing the two socket ones till early next year. From what I have read about x86 recently, intel seem to have hit a bit of a wall with 10nm and 7nm chip yields (and are only shipping them in higher end/more pricey servers) - even looking outside Intel for expertise to fix the 7nm issue, so perhaps Power10 (with its smaller transister counts) is starting to look a better longer term option in the enterprise space (not sure if AMD has similar issues). Hopefully Quantum Computing will be ready to step in by the time conventional cpu architectures can no longer go forward. Wonder if there is a QC port of Linux in the works ?

    ReplyDelete

Post a Comment

Comments are subject to moderation. Be nice.