Low-level change to Firefox 70 and ESR coming


If you are using Firefox on 64-bit Power, you'll want to know about bug 1576303, which will be landing soon on the beta and ESR68 trees to be incorporated into 70 and the next ESR respectively. This fixes a long-standing issue with intermittent and difficult-to-trace crashes (thanks to Ted Campbell at Mozilla for figuring out the root cause and Dan Horák for providing the hardware access) due to what in retrospect was a blatant violation of the ELF ABI in xpconnect, which glues JavaScript to native XPCOM. This needed several dodgy workarounds until we found the actual culprit.

The patch is well tested on multiple little-endian systems including this Talos II, but because it's an issue with register allocation in function calls the issue also theoretically affects big-endian Power even though we haven't seen any reports. I'm pretty sure the code I wrote will work for big-endian but none of my big-endian Power systems run mainline Firefox (and TenFourFox even on the G5 is 32-bit, where the problem isn't present). If you're using a big-endian system, you may want to pull a current release and make sure there is no regression in the browser with the changes; if there is and you can bisect to it, post in the bug so we can do a follow-up fix. On the other hand, if you're building from an old ESR such as 52 (the last non-Rust-required one), you may want to backport this fix because the problem has been there pretty much since it was first written.

Stuff like this actually proves Linus Torvalds' point that "as long as everybody does cross-development, the platform won't be all that stable." Linus was talking about ARM-based servers being undercut by a dearth of ARM-based PCs, but the point is also true here: 64-bit Power may do well in the data center but it was rarely used for workstations other than the Power Mac G5 and the small number of non-Apple PowerPC 970 towers, meaning this bug went undiscovered until people like us finally started dogfooding Power-based desktops again. (For that matter, the official PowerPC Mac OS X builds of Firefox were also always 32-bit, even on the G5, so no one would have noticed it there.) There's just no substitute for improving the quality and quantity of software for Power ISA like having one under your desk, and as the number of machines increases I expect we'll get more of these ugly corner bugs ironed out in other packages too.

CentOS 7-1908 available


CentOS 7-1908 is now available; this is a maintenance release with multiple updated components derived from Red Hat Enterprise Linux 7.7. Particularly interesting is that there are no fewer than three Power ISA downloads available: one for big-endian ppc64 (though POWER7 and up only: sorry G5 owners), one for ppc64le and a special build for POWER9 (which appears to also be little-endian), each with its own Everything, NetInstall and Minimal flavours.

Linux 5.3 for POWER, and ppc64le gets a Fedora Desktop


Linus always says that no Linux release is a feature release and numbers are purely bookkeeping instead of goalposts, but Linux 5.3 has landed. There are many changes for the x86 side of the fence that I won't mention here, but in platform-agnostic changes, 5.3 adds support for the AMD Navi GPU in amdgpu, allows loading of xz-compressed firmware files, further improves the situation with process ID reuse with additional expansions to pidfd (including polling support), refines the scheduler by supporting clamped processor clock ranges, and supports 0.0.0.0/8 as a valid IPv4 range, allowing another 16 million IPv4 addresses while IPv6 continues to not set the world on fire.
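As a quick check on that "16 million" figure, here is a minimal C sketch of the arithmetic: a /8 prefix leaves 24 host bits, so the block holds 2^24 addresses. The function name is my own for illustration and has nothing to do with the kernel change itself.

```c
#include <assert.h>
#include <stdint.h>

/* Number of IPv4 addresses covered by a prefix of the given length.
 * A /8 leaves 32 - 8 = 24 host bits, so it covers 2^24 addresses.
 * (Illustrative helper only; the name is hypothetical.) */
uint64_t ipv4_block_size(unsigned prefix_len)
{
    assert(prefix_len <= 32);
    return UINT64_C(1) << (32 - prefix_len);
}
```

For 0.0.0.0/8 this gives 16,777,216 addresses, which is where the rounded figure comes from.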

Power ISA-specific changes in this release are relatively few but still noteworthy. Besides support for LZMA and LZO-compressed uImages, there is now Power ISA support for HAVE_ARCH_HUGE_VMAP, which enables (as the name would suggest) huge virtual memory mappings. With additional code in a future kernel, this should facilitate upcoming performance improvements. There is also additional /proc support for getting statistics on how virtual CPUs are dispatched to physical cores by systems using the Power hypervisor.

Meanwhile, this won't make much difference to people like me who have been using Fedora for a while, but if you want to experiment with other distros on your POWER9 system, Fedora is working on Live and Workstation ISOs for ppc64le. Currently this is Rawhide only (which is what will become F32) and you can of course already install from Server and switch to the Workstation flavour, or install over the network. However, it's just another positive indicator that IBM's purchase of Red Hat will continue facilitating improvements in Linux in general and Fedora/RHEL support for OpenPOWER in particular, especially as the installed base of POWER9 workstations like our T2s and Blackbirds continues to grow in numbers. In fact, although we don't have statistics, it's still quite possible (counting box for box) that there are now more discrete POWER9 workstations in operation out there than there are servers.

A beginner's guide to hacking Microwatt


Many improvements have occurred in Microwatt, the little VHDL Power ISA softcore, so far the easiest way — particularly for us hobbyists — of getting an OpenPOWER core in hardware you can play with. (The logo is not an official logo for Microwatt, but I figured it would be fun to try my hand at one in Krita.) Even though it still has many known and acknowledged deficiencies it's actually pretty easy to get it up and running in simulation, and easier still on POWER9 hardware where the toolchain is already ready to go.

I'm no VHDL genius personally, but this seemed like as good a time as any to learn. ghdl is available for most distros, though Fedora 30 and earlier curiously lack it for ppc64le; fortunately, Dan Horák's builds work fine. So let's get the basics up. If you're on F30 as I am, install ghdl from his repo first. These URLs may vary; they are what was current at the time of this article.

% sudo dnf install https://copr-be.cloud.fedoraproject.org/results/sharkcz/danny/fedora-30-ppc64le/01028671-ghdl/ghdl-grt-0.37dev-1.20190820gitf977ba0.fc30.ppc64le.rpm
[...]
% sudo dnf install https://copr-be.cloud.fedoraproject.org/results/sharkcz/danny/fedora-30-ppc64le/01028671-ghdl/ghdl-0.37dev-1.20190820gitf977ba0.fc30.ppc64le.rpm
[...]

For Debian and other distros, install from your package manager as appropriate.

Next, let's install Microwatt and MicroPython and make sure all that works. This is essentially the same demo Anton showed at the OpenPOWER summit. If you are doing this on an inferior x86_64 system (or at least something that isn't POWER8 or POWER9), you will need to have a Power ISA C cross-compilation toolchain installed to properly build MicroPython. Adjust the make -jXX to your number of threads. This sequence of commands will end up with microwatt/ and micropython/ installed in separate directories at the same filesystem depth (in my case, ~/src). Keep it this way because we will be adding one more project at the end.

% git clone git://github.com/antonblanchard/microwatt.git
Cloning into 'microwatt'...
[...]
Resolving deltas: 100% (818/818), done.
% cd microwatt
% make -j24
ghdl -a --std=08 decode_types.vhdl
[...]
% cd ..
% git clone git://github.com/mikey/micropython.git
Cloning into 'micropython'...
[...]
Resolving deltas: 100% (52248/52248), done.
% cd micropython
% git checkout powerpc
Already on 'powerpc'
Your branch is up to date with 'origin/powerpc'.
% cd ports/powerpc
% make -j24
mkdir -p build/genhdr
[...]
MISC freezing bytecode
CC build/_frozen_mpy.c
LINK build/firmware.elf
[...]
% cd ../../../microwatt
% ln -s ../micropython/ports/powerpc/build/firmware.bin simple_ram_behavioural.bin
% ./core_tb > /dev/null
MicroPython v1.11-320-g7747411e9 on 2019-09-14; bare-metal with POWERPC
Type "help()" for more information.
>>> 1+2
3

The simulation is rather slow, made worse by all the copious debugging output (which here is sent to the bitbucket), but it does work as advertised. To make core_tb stop, you will probably need to kill it from another terminal session, depending on your shell. (I had to.)

Let's now turn to adding instructions to Microwatt. Having to manually kill the simulation is annoying, and it would be nice if the simulation could gracefully halt under program control, so we'll implement a wait instruction as an educational example. This instruction is new in ISA 3.0B; the ISA book explains that it "causes instruction fetching and execution to be suspended. Instruction fetching and execution are resumed when the events specified by the WC field [the wait condition, its sole constant parameter] occur." Strictly speaking, the stop instruction would probably have the most authentic semantics ("The thread is placed into power-saving mode and execution is stopped."), but for obvious reasons it is a privileged instruction, because it would completely halt that hardware thread until a system reset or other system-level event. Also, it doesn't take any parameters, so it's not as nice an illustration.

wait's sole supported WC field code is 0b00; this causes the instruction to "[r]esume instruction fetching and execution when an exception, an event-based branch exception, or a platform notify occurs." In practical circumstances, if you execute wait from a userspace program, these events happen all the time and the instruction seems like a no-op.

% more test.c
#include <stdio.h>

int main(int argc, char** argv) {
  __asm__("wait 0\n");
  fprintf(stderr, "ok\n");
  return 0;
}
% gcc -o test test.c
% ./test
ok

However, on a little core doing nothing else, it well might be a terminal instruction sequence, so since we can run it from userspace anyway let's go ahead and implement a hal-fassed version of it which will cause the simulation to conclude gracefully. This is the diff that does so, applied against ab34c483. Let's analyze it piece by piece.

First, let us note that there's already code in Microwatt for an ungraceful exit, such as when you execute an undefined instruction; this terminates with an error. We could simply use that, but I'd prefer to do something cleaner, so we'll define a new signal for halting.

Next, we will define the opcode format in the instruction decoder. Conveniently, the instructions td and tdi (trap doubleword and trap doubleword immediate, respectively) have a similar encoding where their common constant argument — the "TO" or trap operation bits — occupies the same bit field. (Note that td et al. allow five bits here but wait only takes the two least significant bits with the other three reserved. We will handwave this away since they are invariably encoded as zero.) To get these bits decoded for us, we specify that the first constant argument is encoded as TOO. You can see other encodings for registers and immediates in the surrounding templates.
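To make the shared bit field concrete, here is a small C model (not the VHDL from the patch) of pulling the five-bit TO field out of an instruction word and then taking its two least significant bits as WC. The function names are mine, and the shift assumes the usual IBM bit numbering where bit 0 is the most significant bit and the field sits at bits 6:10.

```c
#include <assert.h>
#include <stdint.h>

/* Extract the five-bit field at instruction bits 6:10 (IBM numbering,
 * where bit 0 is the most significant). td and tdi keep their TO bits
 * here. Hypothetical helper name, for illustration only. */
unsigned field_to(uint32_t insn)
{
    return (insn >> 21) & 0x1fu;
}

/* wait only uses the two least significant bits of that same field
 * as its WC value; the upper three bits are reserved. */
unsigned field_wc(uint32_t insn)
{
    return field_to(insn) & 0x3u;
}
```

Because the reserved bits are expected to be zero, decoding the whole five-bit field and looking only at the low two bits gives the same answer as a dedicated two-bit decode would.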

Next, we tell Microwatt how to identify the opcode. The bit fields for the opcode pieces are simply cribbed from the ISA book.
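As a rough illustration of what those bit fields look like, this C sketch assembles the 32-bit word for wait. The field values (primary opcode 31, extended opcode 30, WC at bits 9:10) are as I read the X-form layout in ISA 3.0B, so treat them as assumptions and check them against the book; the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Field values as read from the ISA 3.0B layout for wait; these are
 * assumptions to verify against the ISA book, not taken from the patch. */
#define PPC_PRIMARY_31  31u  /* primary opcode, bits 0:5 (IBM numbering) */
#define PPC_XO_WAIT     30u  /* extended opcode, bits 21:30              */
#define PPC_WC_SHIFT    21   /* WC ends at bit 10, so shift by 31 - 10   */

/* Build the 32-bit instruction word for "wait WC". */
uint32_t encode_wait(unsigned wc)
{
    assert(wc <= 3);  /* WC is a two-bit field */
    return (PPC_PRIMARY_31 << 26)
         | ((uint32_t)wc << PPC_WC_SHIFT)
         | (PPC_XO_WAIT << 1);
}
```

Under those assumptions wait 0 comes out as 0x7C00003C and wait 3 as 0x7C60003C, which you can compare against a disassembly of your own binary.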

Next, we add the actual symbols for the instruction and the operation, thus linking them up with the decoder.

Then, we write the operation's logic. For illustrative purposes, since only 0b00 is allowed and other bit combinations are reserved, we will have the simulation assert and ungracefully terminate on other values using the existing code. Otherwise, we set the halted signal.
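The decision itself is tiny. Here is a C model of it, purely as a sketch of the behaviour described above and not the VHDL in the diff; the enum and function names are made up for the example.

```c
#include <assert.h>

/* Possible outcomes of the modelled execute step (hypothetical names). */
enum wait_result {
    WAIT_HALT,     /* WC == 0b00: raise the new halted signal     */
    WAIT_RESERVED  /* any other WC: take the existing error path  */
};

/* Model of the wait operation: only WC == 0 is defined, so anything
 * else is treated as a reserved encoding and reported as an error. */
enum wait_result execute_wait(unsigned wc)
{
    assert(wc <= 3);  /* WC is a two-bit field */
    return (wc == 0) ? WAIT_HALT : WAIT_RESERVED;
}
```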

Finally, we write the code to actually gracefully halt when the halted signal appears, using the built-in VHDL test bench function stop() (coincidentally named, as it happens).

With this patch applied, rebuild Microwatt with a make. To test it, we'll need something that actually executes this instruction, so let's make a simple "hello world" type example using pieces from MicroPython and Microwatt's own built-in "hello world." A small assembly language stub (in both of these examples, head.S) acts as a trampoline into whatever our main() is, detecting if we are running it within QEMU or from the VHDL test bench. However, we won't have a libc and we'll need routines to actually send and receive data with the "serial console" presented by the core. We also need a couple hints for the linker to make a binary we can actually run in the simulator.

I've compiled all of these pieces into a GitHub project, "Microhello," which you can use as a scaffold for your own programs to run on the core. I've tried to make it a little more modularized than the Microwatt "Hello World" example as well. Clone it at the same depth as microwatt/ and micropython/, then do make runrun to replace the symbolic link to the MicroPython binary with Microhello:

% git clone git://github.com/classilla/microhello.git
[...]
% cd microhello
% make runrun
cc -I. -g -Wall -std=c99 -msoft-float -mno-string -mno-multiple -mno-vsx -mno-altivec -mlittle-endian -fno-stack-protector -mstrict-align -ffreestanding -Os -fdata-sections -ffunction-sections -c -o build/main.o main.c
cc -I. -g -Wall -std=c99 -msoft-float -mno-string -mno-multiple -mno-vsx -mno-altivec -mlittle-endian -fno-stack-protector -mstrict-align -ffreestanding -Os -fdata-sections -ffunction-sections -c -o build/uart_core.o uart_core.c
cc -I. -g -Wall -std=c99 -msoft-float -mno-string -mno-multiple -mno-vsx -mno-altivec -mlittle-endian -fno-stack-protector -mstrict-align -ffreestanding -Os -fdata-sections -ffunction-sections -c -o build/string.o string.c
cc head.S -c -o build/head.o
ld -N -T powerpc.lds -o build/firmware.elf build/main.o build/uart_core.o build/string.o build/head.o powerpc.lds
size build/firmware.elf

text   data    bss    dec    hex filename
6508      0     24   6532   1984 build/firmware.elf
objcopy -O binary build/firmware.elf build/firmware.bin
( cd ../microwatt && rm -f simple_ram_behavioural.bin )
/usr/bin/make run
make[1]: Entering directory '/home/censored/src/microhello'
( cd ../microwatt && \
ln -s ../microhello/build/firmware.bin simple_ram_behavioural.bin && \
./core_tb > /dev/null )
PowerPC to the People

We neatly came to a halt. Yay!

The serial console library is in uart_core.c and a basic implementation of puts() (and strlen()) is in string.c. The main() is very simple. Minus the comments, here is main.c in its entirety:

#include "uart_core.h"
#include "string.h"

int main(int argc, char** argv) {
  uart_init_ppc(argc);

  puts("PowerPC to the People");
  __asm__("wait 0\n");
  return 0;
}

The trampoline uses the start of execution to determine what mode to initialize the serial console in, passing that to main() in r3, which in the Power ABI is the first argument to the function (argc). We then puts() the string and execute a wait 0 to terminate. Easy.
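If you're wondering what a libc-free puts() and strlen() amount to, here is a minimal sketch. It is not the actual string.c from Microhello: the sketch_ names, the emit_fn hook standing in for the UART write routine, and the capture buffer used to try it on a host are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical output hook standing in for the UART write routine. */
typedef void (*emit_fn)(char c);

/* Count bytes up to, but not including, the terminating NUL. */
size_t sketch_strlen(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* Write the string and a trailing newline one byte at a time;
 * returns the number of bytes written. */
size_t sketch_puts(const char *s, emit_fn emit)
{
    assert(s != NULL && emit != NULL);
    size_t len = sketch_strlen(s);
    for (size_t i = 0; i < len; i++)
        emit(s[i]);
    emit('\n');
    return len + 1;
}

/* Host-side stand-in for the serial port: collects output in a buffer
 * so the routines can be exercised without the simulator. */
char captured[64];
size_t captured_len;

void capture_emit(char c)
{
    if (captured_len < sizeof captured)
        captured[captured_len++] = c;
}
```

On the real core the hook would be whatever routine pushes a byte out the serial console; everything else stays the same.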

To prove the argument is being evaluated, change the instruction to wait 3 and re-run with make runrun. Notice how it terminates:

PowerPC to the People
make[1]: *** [Makefile:16: run] Error 1

If you run ./core_tb (in the microwatt/ directory) without sending the output to /dev/null, you will see the message from our implementation in the log with the invalid wait condition.

Lastly, if you remove the wait instruction entirely and re-run with make runrun, then the test bench will loop forever echoing our string repeatedly, bouncing in and out of our code on the trampoline, until you kill it.

Microwatt is fun, simple, easy to experiment with and a great way to better understand what Power ISA does under the hood. While its performance is no barnburner, as a pedagogical aid it's a great little proof of concept, and it can certainly be the basis for something bigger. In a future article we'll actually synthesize this core and do a little more with it in actual hardware.

Firefox 69 on POWER


A brief note to say so far no major issues with Firefox 69 on Power ISA and this post is being made from it on my T2. (We're still dealing with bug 1576303 for Firefox 70, however.) As with Fx68, the working build configurations for ppc64le are unchanged from Fx67.

The VMX eagle is landing in Firefox 70 (plus: which core should open the door?)


Hugo Landau has an interesting take on the new open-OpenPOWER world. He points out, correctly, that Power ISA is a big win for open architectures because it has maturity in both the embedded and server spaces, but he'd like to see an actual production core opened as well (Microwatt is a lovely MVP and a great proof of concept but it is clearly for experimentation, not for production).

His suggestion is a softcore version of the PPC 405. PowerPC 4xx is a very common embedded CPU family indeed (the POWER8 OCC even has one inside of it), and in the Power.org days IBM was even willing to make it available to academia and researchers. He also suggests open-sourcing Mambo, IBM's currently proprietary simulator.

Open-sourcing Mambo is especially appealing to me trying to do simulation work of my own and not being able to do it on a POWER9! (It claims there is a POWER9 version for Debian, but the install directions and download area strictly show x86_64.) I also think there would be little non-IBM IP to stand in the way of doing so. On the other hand, although opening up the 405 would be admirable, I'm not sure how much it would accomplish in practice: it's 32-bit, not 64-bit; it's strictly big-endian (I like big-endian personally and three of the five systems on this KVM are big-endian, but we all know where the market's going and OpenPOWER in particular emphasizes LE); and it lacks VMX, a/k/a AltiVec. That brings us to Firefox.

In the TenFourFox world several contributors and I did a fair bit of work on AltiVec acceleration to beef up performance on G4 and G5 systems. (Editorial note: I only use the term AltiVec for Apple systems and chips made by Motorola/Freescale, since Apple used both Motorola/Freescale and IBM parts, and Motorola/Freescale (now NXP) owned the trademark. IBM never owned nor licensed this trademark and always called it VMX, so in OpenPOWER, it's VMX. For that matter neither did P.A. Semi, so the PA6T has VMX too. Even with the G5, although its vector unit was popularly called AltiVec, IBM never officially referred to it by that name.) There are also many opportunities for VMX acceleration in mainline Firefox; depending on your compiler settings, these might get silently enabled already (such as qcms). libpng even has support for VSX. However, many in-tree components either never had the build-system glue written to turn on VMX support (libjpeg, libpng) or they're based on custom SIMD code Mozilla wrote that has no Power ISA equivalent.

For Firefox 70, build system support for VMX, VSX and VSX-3 compiler flags plus runtime detection is now available, written by yours truly, along with the first of the TenFourFox patches I updated and upstreamed to mainline Firefox (this one for fast scanning of text fragments for wide characters). I'm also hoping the libjpeg VMX enablement patch lands in time for merge with several more VMX patches to come. My work on the Firefox Power JIT is somewhat slowed by my continuing responsibilities to TenFourFox and compiler issues such as bug 1576303, which is why I wanted to get a couple quick wins with VMX stuff I already had on the shelf.
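For readers curious what runtime detection can look like on Linux, here is a hedged sketch using the auxiliary vector. This is not the code that landed in Firefox; the function names are mine, the fallback bit values are the ones I know from the Linux powerpc headers, and on anything that isn't Linux on Power the functions simply report false.

```c
#include <assert.h>
#include <stdbool.h>

#if defined(__linux__) && defined(__powerpc__)
#include <sys/auxv.h>  /* getauxval(), AT_HWCAP */
/* Fallback definitions in case the libc headers did not supply them. */
#ifndef PPC_FEATURE_HAS_ALTIVEC
#define PPC_FEATURE_HAS_ALTIVEC 0x10000000
#endif
#ifndef PPC_FEATURE_HAS_VSX
#define PPC_FEATURE_HAS_VSX 0x00000080
#endif
#endif

/* True if the kernel reports a VMX (AltiVec) unit for this CPU. */
bool cpu_has_vmx(void)
{
#if defined(__linux__) && defined(__powerpc__)
    return (getauxval(AT_HWCAP) & PPC_FEATURE_HAS_ALTIVEC) != 0;
#else
    return false;  /* not Linux on Power: nothing to detect */
#endif
}

/* True if the kernel reports VSX support for this CPU. */
bool cpu_has_vsx(void)
{
#if defined(__linux__) && defined(__powerpc__)
    return (getauxval(AT_HWCAP) & PPC_FEATURE_HAS_VSX) != 0;
#else
    return false;
#endif
}
```

A caller would check these once at startup and pick the vector or scalar code path accordingly, which is the whole point of not assuming the feature is present.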

Allow me to close the loop on our core digression, though. In bug 817058 I'm asked a question by one of the Mozilla devs: can they just assume every Power chip someone is running Firefox on would support VMX? The answer, even for 64-bit Power, is no, because of poor choices like the AmigaOne X5000 running the QorIQ P5020 which has no SIMD. However, Rust supports compiling for SIMD and Power is a supported architecture, which means Rust supports VMX too, and Mozilla would be foolish not to take advantage of that. Assuming SIMD features are "just present" will become increasingly common and that means that continuing to run parts that don't have VMX (let alone VSX) will become an even bigger losing game on the desktop than it already was. Rather than the 405 I'd personally like to see something like the G5 itself be made openly available: it's POWER4, so it's 64-bit and largely upwardly compatible but wouldn't be a commercially competitive product at the high tiers IBM cares about, it has a VMX unit but it's IBM's (co-developed from the G4/7400), and it's fairly well-understood. Downclock it to reduce power consumption and it could even be a credible upper-end embedded chip. The only thing it lacks is a true little-endian mode.

More on Firefox 69 when it is officially released next week.

Support from a silicon turnip


Raptor is asking users to do a brief checklist before asking for support to help streamline problem determination. I think this just makes good sense, but although it's reasonable to assume most users would have another system around that can talk to the BMC (I use my Quad G5 for this), it might be nice in a future firmware version to have some sort of confidence testing. I'm not sure how that would look necessarily in implementation but I know when I was trying to determine why my kernel was freaking out that eliminating hardware as a cause, however unlikely, would have been helpful.

Ordinarily this would merit merely a brief informational item, except let's consider it in the context of my earlier underdeveloped pontification: Raptor must now have enough of an installed base that streamlining support is now necessary. I'm an early adopter; my Talos II is serial #12 and my Blackbird is serial #75. Back in those distant bygone days of 2018 with the early firmware that ran like a wind tunnel, I pretty much conversed with support directly over E-mail (I suspect it was Tim himself) and handled everything that way, but that clearly wouldn't scale beyond a certain number of even technically adept users. (I did comment at the time that it was the best support I'd ever had with any computer system and I still think so.)

We don't know how many Talos-family systems are out there, and Raptor is not a public company, so sales figures are kept close to the vest. I don't really begrudge them this, either, because pro-Power bigots like me would still use the platform even if we were the only ones out there and haters gonna hate whether there's 10 or 10 million. (However, if people want to post serial numbers in the comments, we can find the highest one and make an educated guess.) I think we can safely assume the support volume is not being driven by poor quality, so if support volume has increased to a critical mass where changes must be made, then that must be due to enough machines out there actually being used. And as I said in the prior article, moving enough machines is the only sane way to get the cost down. I stand by my musings that a good second workstation-market supplier could have advantages for both volume and market stability, but we also don't want some knockoff company that isn't beholden to open libre computing principles sucking the life from this segment with a race to the bottom.

Having said that: I hear things and people tell me stuff, and while I'm sworn to secrecy right now, I am permitted to say obliquely that a promising development in moving more machines is afoot. I think that's as much as I should say on the subject but if it pans out, I think all of us in the OpenPOWER world will be very, very pleased.

Blood from a silicon turnip


Now that we all have the hangover from hell after the big OpenPOWER-is-open party and are sitting around nursing headaches and sipping raw eggs from brandy snifters, let's talk about squeezing blood out of silicon turnips.

In general my cursory view of the Internet demonstrates two, maybe three, reactions to the OpenPOWER announcement:

"Hey, cool!" (or, less commonly but frequently enough to be obnoxious, "Didn't PowerPC die years ago?")

and

"It's too expensive."

Uniformly these two statements are being said by individual developers talking about getting one of their own systems, at least publicly, anyway (enterprise customers may also be complaining but I haven't seen very much in the places that are publicly visible). For the sake of the discussion let's ignore both the fact that people who skimp on privacy and owner control for a cheaper system are slowly boiling themselves alive in their own cauldrons, and the fact that you can go get an (often substantially) cheaper Intel or AMD system and have similar performance if not better because the CPU optimizations already exist.

The problem really isn't the CPUs. You could cheap out and buy some lower binned consumer part, and I'm sure some of you are very happy with those, but realistically POWER9 is meant to compete against server-grade tiers. At the Cascade Lake level, Intel's most similar 16-thread part is the 8-core Xeon Silver 4209T, with 11MB of L3 and clocked from 2.2 to 3.2GHz for $500 MSRP, or you can go Coffee Lake and get its 8-core/16-thread part, clocked between 3.7 and 5.0GHz and with 16MB of L3 as the E-2288G for about $540 MSRP as of this writing, though the E-2288G also has a GPU. AMD has a 16-thread Rome Epyc (the 7232P) with clocks from 3.1 to 3.2GHz and 32MB of L3 for about $450 MSRP. I think we can agreeably stipulate that all three of those are ballpark comparable with a Sforza 4-core POWER9, also 16 threads, with 40MB L3 (10MB per core, unpaired); Raptor is the only retail source for this right now and they sell a 3.2-3.8GHz clocked part (CP9M01) for about $440.

As for a 32-thread Xeon, Intel doesn't sell a 32-thread Coffee Lake. You'll have to buy Cascade Lake, and your closest option is the 16-core/32-thread Xeon Silver 4216, also 2.1-3.2GHz, with 22MB of L3 for $1000. AMD offers the 16-core/32-thread 7302P, 3-3.3GHz and 128MB of L3, for $825 MSRP. Raptor, again, is the only retail source for the 8-core/32-thread POWER9 and they sell a 3.45-3.8GHz clocked part with 80MB L3 (CP9M02) for $690. In fact, let's be ridiculous and comparison-price the 22-core, 88-thread monster. Raptor sells this 2.75-3.8GHz part with 220MB L3 (CP9M08) for $2800. Coffee Lake, sir? Sorry, sir. Intel does list a Cascade Lake Xeon Platinum 9242 with 48 cores, 96 threads, 71.5MB of L3 and clocks from 2.3 to 3.8GHz, but the MSRP for such systems is atrociously high (estimated north of $25,000). The closest Epyc is probably the 48-core/96-thread 2.2-3.3GHz 7552 with 192MB L3; even that will set you back $4025.
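One crude way to line these up is dollars per hardware thread. The C sketch below does only that arithmetic, using the prices and thread counts quoted above; the struct and function names are mine, and it deliberately ignores that an SMT4 thread and an SMT2 thread are not equivalent units of performance.

```c
#include <assert.h>

/* One CPU from the comparison: list price in US dollars and hardware
 * thread count, both taken from the figures quoted in the text. */
struct cpu_quote {
    const char *name;
    double price_usd;
    unsigned threads;
};

/* Price divided by thread count; a rough yardstick, nothing more. */
double dollars_per_thread(struct cpu_quote q)
{
    assert(q.threads > 0);
    return q.price_usd / q.threads;
}

const struct cpu_quote quotes[] = {
    { "Xeon Silver 4209T",  500.0, 16 },
    { "Xeon E-2288G",       540.0, 16 },
    { "Epyc 7232P",         450.0, 16 },
    { "POWER9 CP9M01",      440.0, 16 },
    { "Xeon Silver 4216",  1000.0, 32 },
    { "Epyc 7302P",         825.0, 32 },
    { "POWER9 CP9M02",      690.0, 32 },
};
```

By this measure the 4-core POWER9 works out to $27.50 per thread against $31.25 for the Xeon Silver 4209T, and the 8-core POWER9 comes in under both of its 32-thread rivals.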

Not only can we conclude that POWER9 CPUs are reasonably priced, but I think there's also a credible argument that they're competitively priced. There's a reason for this: Raptor doesn't make them. They're shipped in from IBM's supply chain (presumably from GlobalFoundries) and IBM not only has them made in volume, but higher-cored parts where not all the cores are working can be binned lower for this market and increase overall yield, thus improving the economy of scale.

All right, so what about the logic boards? A quick survey of LGA 1151 (Coffee Lake) server-grade boards on Newegg averaged around $250 and LGA 3647 (Cascade Lake Gold/Silver) around $400, with varying numbers of expansion and RAM slots, though I have no idea what a BGA 5903 board for that Xeon Platinum part would run. SP3-socket boards for the Epyc look comparable. Meanwhile, the cheapest Raptor motherboard (as an item) is the basic Blackbird starting at $1100. Is this justified?

As it happens, we actually do have other PowerPC small-volume systems to compare against. They're called Amigas, or at least the AmigaOne. Even as an Amigaphile I have never been shy about voicing my displeasure with their running embedded parts as entire systems and the P5020 they're using in the current X5000 is basically at a G5 level of performance (until you factor in its loss of AltiVec, and then the G5 stomps it on such tasks), but they're out there and you can buy one. At £1800 from AmigaKit, that's about US$2200 right now prior to Brexit, plus shipping. It includes the case, Radeon GPU, 2GB RAM, optical and spinning disk, CPU and board. Ignoring the obvious performance differences, I paid about $2100 for my 4-core Blackbird system with everything there minus the GPU, but more RAM and an SSD. (By the way, the parents were visiting not too long ago and we watched Glove and Boots videos from YouTube on the home theatre with it. Worked fine. I may not install a GPU in it after all.)

We can extrapolate prosumer pricing too. While I couldn't find my Quad G5's original sales receipt, I seem to recall that I paid around $3600 in 2006 for it (4 cores, no SMT), 4GB RAM, a hard disk and an ATI 7800GT video card. That's about $4500 in current money for a system that was not massively high volume, but not particularly niche, and could be readily bought at the consumer level. IBM provided the chips for that too. Currently the Talos II with a single-4 (16 threads), 16GB of RAM, 500GB NVMe and a WX7100 is selling for $6500, despite selling in much smaller volumes than the Quad.

This is to say that Raptor's pricing is by no means out of whack for boutique low-volume sales, and again, arguably even competitive. Let's remember that first and foremost Raptor is a small company. Many of the people coming new to the platform don't remember the original POWER8 Talos crowdsourcing attempt, but I do, because I was one of the people who had my money in. They needed about $3.7 million to do the job and unsurprisingly that went aground as you might recall, but Raptor refunded people's money and this went a long way toward establishing their trustworthiness. As such, I imagine there was no small amount of internal investment required to launch the Talos II (which I was delighted to preorder as soon as I could do so). Even though the T2 (and the T2 Lite and Blackbird, by extension) is strongly based on the Romulus reference platform, that doesn't mean there wasn't any R&D required on their part, and there are still manufacturing, QA and support costs as well as the need to actually turn a profit. I mean, seriously, some of you actually seem to expect Raptor to sell these things at a loss. How long do you think they'd stay in business?

Now, with all that said, none of you who have bought one (or several) of these systems will need any convincing that the price is worth it, and it won't convince those of you who have heard these arguments before and discount them. This is a fair criticism because frankly there's no getting around the sticker price, even if I think I've made the case that Raptor cannot easily make it cheaper. So how will they ever get cheaper?

Raptor has a more or less natural monopoly on the OpenPOWER workstation market. Don't get me wrong: I am not accusing them of gouging. As monopolies go, this is about as benign as you can get because not only are they good stewards of the ecosystem but frankly they were simply the first ones in the pool. Look at the OpenPOWER membership list. Do you see anyone else catering to workstation users? (I nearly choked on my Mr Pibb when they talked about Raptor's "low end" systems at the Summit. This is, of course, purely by comparison.) There are some people running some of the Tyan POWER8 systems as workstations but they are clearly not designed as such, and the effect is not unlike running an Xserve G5 instead of the regular Power Mac tower. My POWER6 may be a "tower" system but I sure wouldn't want it under the desk. No one else makes OpenPOWER workstations. No one is even talking about it.

Raptor management may not like me saying this, but this is an independent blog, and if the OpenPOWER workstation market is going to grow and stabilize then there's going to have to be someone else. It's not a situation like the Mac clones where all Umax and Power Computing did was eat Apple's lunch (which is why Steve Jobs canned the whole thing), because Apple was big enough to saturate the market such that anyone who wanted a Mac had one and thus all the clones did was steal sales. By contrast Raptor is not big enough to saturate the OpenPOWER workstation market because they can't move enough units: there is pent-up demand waiting for the price to come down, and they can get backordered even on the systems that people do purchase. Yes, I'm hopeful that an open ISA will lead to new and more exciting chip designs, but as far as the actual cost of the chips themselves, the "big reveal" probably changes the retail cost of the actual CPUs very little if any because they were never priced out of the market to begin with. Where we need improvement is in the cost of the actual systems so that people can get them and there can be more of them. And Raptor cannot do this by themselves.

I like Raptor because I like their people, I like their products and I like the way they do business. If someone else entered this space I would probably still buy from them. But someone else in that space also means new ways of looking at the market, presumably newer niches to distinguish themselves, and hopefully more investor interest in the sector to increase the available capital needed to enable volume production and sales in a way that would actually then start lowering prices.

Plus, more players in the workstation market also means market resiliency. Raptor seems to be a stable company, but what if they weren't? What would we do if they had to close their doors? CPU manufacturers back in the dark ages (around 1978) had to have second sources to get design wins. We need second sources to make the market survive a loss of the primary manufacturer, however unlikely that would be, because their exit with no replacement would doom the Talos family to being a modern Power Mac G5: a dead end.

In the meantime, we want Raptor to do well and their success to attract other players to this market, because right now they're the only (though best) game in town. The cost for the CPU is reasonable, and criticism of their logic board pricing is unjustified, especially as you reach higher core counts (the same Talos board takes a single-4 or a dual-22). If people believe that a non-x86 system is valuable to have, if people (also) believe that supporting an open architecture is valuable to do, and if people (also) believe that a truly owner-controlled workstation is valuable to use, then people need to understand where the market is now and put their money where their mouths are. Best of all, you're not buying an underpowered conversation piece; you're getting a competitive system you can actually use. I don't see how we get much more blood out of that silicon turnip otherwise.

Day 2 keynote and OpenPOWER blows the doors off: royalty-free, open soft-core (RISC-V sweating gallons)


Holy monkeys of Mars. What a morning at the OpenPOWER Summit Keynote (Day 2)! I swear I'm not paid to write this stuff except for the trivial pittance from ads that goes to maintain the domain name (I'm writing this on my lunch break!). I'm just an old-timer Power ISA bigot who's finally seeing the faith pay off. And boy howdy did it.

Let's hit the big news right now. A reasonable criticism I hear of the OpenPOWER movement is that the ISA isn't, or at least wasn't (oops, spoiler), the open part. This is something that RISC-V in particular could claim superiority on. Somebody at IBM was listening, because today Ken King, general manager of OpenPOWER at IBM, announced "we are licensing [the ISA] to the OpenPOWER Foundation so that anyone can implement on top of it royalty-free with patent rights" (emphasis mine). That's a quote right off the livestream. ISA changes will be "done through the community" with "an open governance model" and a majority vote for ISA expansions and changes.

Let me spell out what this means: you, yes, you, can go out and make your own Power ISA chip and not have to pay IBM. OpenPOWER is now truly open.

The other surprise wasn't OpenCAPI; the announcement that it and the Open Memory Interface are moving into the OpenCAPI Consortium is welcome, but expected. The other big news is that the OpenPOWER Foundation is moving into the Linux Foundation. There were already close ties between them, but now the OpenPOWER Foundation will be a component of it, albeit still with its own board, governance structure and decision making.

This announcement was definitely not all talk, because they also introduced Microwatt: a Power ISA soft core. Yes! You can drop it in your design as soon as they upload it!

Anton Blanchard from IBM OzLabs in Canberra announced this one, which was actually demonstrated at the show. Now, this is a very basic core: it's single issue in-order (so your old clamshell blueberry iBook will thrash this), and it doesn't even have hardware divide or cache support yet, though this is planned. In fact, the gcc they used was even hacked to not issue divide instructions. But the darn thing actually works. Here's the super-polished block diagram:

MicroPython is provided, so you can drop this into your design and then talk to it. Here it is in the simulator (which took a couple seconds to compute the answer):

On real hardware it is definitely quicker. Here's the core running on an old Xilinx Artix-7 he found doing nothing in the office computing the Fibonacci sequence:

Xilinx was on stage as a sort of sponsor thing, naturally, so they also gave Anton an Alveo to try this on. They crammed forty cores onto it, and then made it say "Hello World" over and over, because that's exactly what I would do with an expensive programmable piece of hardware. (This is where the name "Microwatt" is kind of crummy, because saying "40 microwatts on an Alveo" sounds like a power consumption benchmark.)

The repo as of this writing is not yet live on GitHub, but should be within the next day or so.

I'm giving Anton a hard time here because his segment actually was the part of today's keynote that impressed me most. Microwatt is real and tangible and you can work on it, and it can scale from hobbyist to enterprise. This is what really put the "open" into OpenPOWER and I was so delighted to see it run.
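Since the core has no hardware divide yet, any division has to be done in software. I don't know what routine the hacked toolchain actually uses, so treat this as a minimal sketch of one classic approach (shift-and-subtract, also called restoring division) using only shifts, compares and subtracts. The function name and the 64-bit default width are my own choices for illustration.

```python
def udiv_shift_subtract(dividend, divisor, bits=64):
    """Unsigned division using only shifts, compares and subtracts.

    Illustrative only: this is NOT taken from Microwatt or its toolchain.
    Returns (quotient, remainder).
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    if not (0 <= dividend < (1 << bits)) or not (0 < divisor < (1 << bits)):
        raise ValueError("operands must fit in the given unsigned width")

    quotient = 0
    remainder = 0
    # Walk the dividend from its most significant bit down to bit 0.
    for i in range(bits - 1, -1, -1):
        # Bring down the next dividend bit into the running remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        # If the divisor fits, subtract it and record a 1 in the quotient.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder


print(udiv_shift_subtract(100, 7))
```

One loop iteration per bit is slow next to a real divider, which is part of why hardware divide is on the to-do list, but it works on a core that has nothing more than an adder and a shifter.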

I will say I see perhaps a little worry from IBM that RISC-V is going to steal the initiative and momentum, and this move (and the open soft core) is their attempt to recapture the vanguard. RISC-V people should actually be happy about this move: at minimum it means they're being taken seriously at the corporate level, it gets more people thinking about open architectures, and the more truly open architectures out there, the more viable and expected the concept becomes. OpenPOWER is the biggest fish in this sea and (with my bias showing) the most powerful, the most ready for migration and the most well-rounded of all of them, but with more water in the pool everyone can swim farther.

After all of that the rest of it was comparatively pedestrian. Red Hat was also there; Michael Cunningham gave a speech which was largely corporate happy talk, but I think he meant it, and I'm hopeful the big blue and little red merger will generate something of the same rich burgundy shade of my SGI Indigo2. Facebook was there too but their presentation was cloyingly light on tech and heavy on smarm, and I think Facebook is ruining the Internet and the psyche of all who touch it, so that's all I'm going to say about that.

The panel at the end was asked to react to the news, which was a little silly, because what else were they going to say? On stage were Derek Chiou, partner system architect at Microsoft and associate professor at UT-Austin; Alan Clark, CTO for SUSE; Tim Pearson, CTO for Raptor; Bapi Vinnakota, engineer from Netronome; Steve Hebert, CEO for Nimbix; and Peter Rutten, research director within IDC's Enterprise Infrastructure Practice. They all thought it was cool, because it is cool.

Microsoft was an interesting choice, but Dr. Chiou was complimentary, saying, "we're very supportive of the open source ... Microsoft sees that's where things are going." He also observed, to my interest, that "the interconnect is more important than the ISA." I'm not sure how true that is but I do agree with him that the ability to openly connect is certainly something that's been overlooked, and we need open tooling to make all of this possible. However, the best panel quote was this one, name censored to protect the innocent: "I'm a pretty incompetent developer, so ... [pauses] Python." Yep. Python definitely is the language of incompetent developers. :D (Hey, I got honourable mention in the obfuscated Perl contest one year! I couldn't resist.)

Tim put it best, though, when he said that "it's going to allow people to trust their computers again." That's why we're using OpenPOWER hardware in the first place. Mendy Furmanek, president of the OpenPOWER Foundation, closed up and said that "Christmas has to end sometime," but we got a whopper of a present today. The party's about to get started and IBM deserves all the credit for a move that really is courageous.

Read yesterday's Day 1 coverage for more if you haven't already.

Keynote notes from Day 1 of OpenPOWER Summit NA (and introducing the Condor)


UPDATE: An even bigger announcement from Day 2!

I'm still catching up on everything since I have to do this after $DAYJOB, but the big news from the OpenPOWER Summit keynote among all the great vendors and technology announcements (Day 1) was the last of the POWER9s and the next Raptor system.

Although there were many great pieces in the keynote, the IBM Power roadmap is of course of significant interest. The big one was a subtle but significant change in announced specs. Although one more generation of POWER9 is planned before POWER10, compare this slide to what we posted last year:

For the (now) 2020 "Advanced I/O" POWER9, there's still the same number of PCIe lanes, same signaling speed, same CAPI, NVLink and OpenCAPI 4.0 options. But memory bandwidth went from 350 GB/s to 650.

This whopping difference appears to be from OpenCAPI agnostic buffered memory, implemented as OMI, the Open Memory Interface. For POWER8 IBM introduced Centaur, a way of getting around the inherent limitations of running DDR RAM on a large number of channels by creating an intermediate controller. Instead of driving the RAM directly (as in POWER9 scale-out CPUs like the Sforzas in our Talos II systems), Centaur accepts high-level read and write commands from the CPU(s) and abstracts away the details of getting it to and from the actual DIMMs, including reordering requests and caching them as needed. Each differential memory interface channel on the CPU has its own Centaur which in the current implementation offers four DDR4 memory channels, giving a single CPU up to eight buffered channels to memory and effectively 32 channels to the underlying DDR4. Centaur is also supported on POWER9 scale-up so that people's investment in RAM won't go to waste, but a complex chip like that adds various board engineering constraints, which is why POWER9 scale-out with direct memory attachment was also offered as an option for systems that didn't quite need all Centaur had to offer. (Scale-out's emphasis on PCIe lanes also makes a difference in that market segment.)
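If you want to check the channel arithmetic, here's a throwaway Python sketch. The function and parameter names are made up for illustration; the counts are just the ones quoted above.

```python
def ddr4_channels(buffered_channels_per_cpu=8, ddr4_channels_per_centaur=4):
    """Total DDR4 channels behind one CPU, assuming one Centaur per buffered channel."""
    return buffered_channels_per_cpu * ddr4_channels_per_centaur


print(ddr4_channels())
```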

The idea with OMI is striking a balance between buffering memory and directly attaching it, and AIO POWER9 will be the first CPU to support it. By being "agnostic" it has no ties to any particular underlying memory technology, meaning it can grow as new technologies emerge. OMI runs at 25Gbps per lane and with a latency of just 5ns instead of the 10ns of present-day Centaur. Best of all, it will be non-proprietary, meaning any vendor that wants to make an OMI-compliant memory system can do so and hopefully increase the economies of scale. In fact, one of them did:

Microchip subsidiary Microsemi's OMI-compliant "differential DIMMs" (DDIMMs) should be simultaneously available with the AIO POWER9 next year, using their custom on-board DDR4 OMI interface with an eight-lane channel for a full 25GB/s. I have to say I'm a little cold over yet another RAM standard (looking at the weirdo RAM in my SGI Fuel), but as long as the prices are competitive and the performance is stonking, I could be convinced. Alternatively, the OMI controller could simply be on the board and fan out to regular DIMM slots more or less as things work now, though this robs the standard of some of the future proofing I think it's intended to have.
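The 25GB/s figure falls straight out of the lane numbers, at least as a raw signaling rate. This is a back-of-the-envelope sketch only: it assumes one direction and ignores any encoding or protocol overhead, neither of which I have numbers for, and the function name is my own.

```python
def omi_channel_gbytes_per_s(lanes=8, gbit_per_s_per_lane=25):
    """Raw one-direction bandwidth of one channel in GB/s.

    Assumption: no encoding or protocol overhead is subtracted.
    """
    return lanes * gbit_per_s_per_lane / 8  # 8 bits per byte


print(omi_channel_gbytes_per_s())
```

Pass a different lane count or per-lane rate to see how a narrower or faster channel would compare.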

Back to IBM. The 14nm "Bandwidth Beast," as they're nicknaming the AIO POWER9, will have 16 x8 OMI channels for 25 GT/s and -- there it is -- up to 650 GB/s peak bandwidth. Microchip's buffer won't get that high, though, which is a puzzling thing to pair it with; it seems to top out at "only" 410 GB/s (I know, cry me a river). Onboard will be up to 24 SMT-4 cores, up to 120MB eDRAM L3 cache, 48 PCIe 4.0 lanes (yes, same as our scale-out Sforzas) at 16 GT/s, and up to 48 lanes each for NVLink and OpenCAPI 4.0 attaches. Clearly IBM intends this to replace both scale-up and scale-out simultaneously, so I guess AIO also stands for "all in one":

Oh yeah ... Raptor was there too. Here's Hugh Blemings introducing Tim Pearson:

I'll gently needle Raptor here and say they need a PowerPoint or LibreOffice deity to sex up their slides a bit. But who needs eye candy when you can announce this?

Yes, friends, you too can have a big, intimidating vulture of a computer -- in a form factor smaller than the T2. The Condor is that mythical LaGrange system we heard about last fall. This is a single-socket system to get it to fit in an ATX form factor as opposed to the hulking EATX T2 I'm typing this on, so it won't take advantage of the extra X-Bus capacity, although a multi-socket LaGrange would probably have been too pricey and power-hungry (and too big) for our rarified workstation market anyway. It will have 4 PCIe slots and 8 DDR4 slots (42 PCIe lanes as opposed to Sforza's 48, but double Sforza's four DDR4 channels), which for my money would slot it between the T2 and T2 Lite, and Raptor seems to be encouraging this comparison with the board size. The extra PCIe slot probably would entice some buyers who don't find the Blackbird or T2 Lite expandable enough but don't want to go the full hog, as well as those looking for a less expensive platform to experiment with OpenCAPI (it offers one slot).

We would expect nothing less from Raptor than for it to be a fully open, blob-free platform, and it will be available Q1 2020. Price wasn't announced, but my guess is it will be commensurate with that same product placement.

One final miscellaneous note: I don't recall anything said about this, but Raptor seems to have it on its wiki now, so I'll assume the "embargo" is lifted. While you're waiting for the AIO, the Sforza DD2.3 stepping should be emerging soon, which will fix various errata including the DAWR for hardware watchpoints. Finally! This should drop right into your existing Talos and Blackbird systems.

More tomorrow, including the BIG ANNOUNCEMENT!
(About all I know is it isn't a laptop.)

Gearing up for OpenPOWER Summit


While unfortunately I won't be able to make the OpenPOWER US Summit on Monday and Tuesday (August 19-20) due to work commitments, apparently a big announcement is in the works and we should know about it on Tuesday. If you're there, Raptor will be at booth S2. We'll dive into it as soon as it's public.

Notable items on the schedule: Hugh Blemings, the executive director of the OpenPOWER Foundation, is of course opening and closing the Monday keynote and Tim Pearson, CTO for Raptor, is scheduled for 9:55. At 1:50pm Justin Lynn talks about using an OpenPOWER Workstation (gee, I wonder which model) as a daily driver in case you don't get enough of that here. Hugh, Tim and others are back for the Tuesday keynote and then the "special announcement" is scheduled for 10:30 (I'll make sure I'm near a computer). IBM talks about the POWER roadmap at 1:30pm and there's an update on the state of Power support for FreeBSD at 3pm. I'm sorry I'll be missing it because it sounds like a great program, but this blog doesn't pay the bills!