
Showing posts from August, 2018

Is there a tiny Talos in the timeline?


Did Raptor just tip their hand on a micro-ATX Talos? There's no announcement yet on price or capacity, but as a single-CPU board it would probably be most comparable to the Talos II Lite (1TB RAM maximum, PCIe x16 and x8). That is doable in an mATX form factor, but given the size of the heatsinks in the EATX T2 and T2 Lite, cooling might be an issue in a case this small. We'd imagine the price would be competitive with the T2 Lite as well.

Some fun case graphics might give this thing a little style, too (see our artist's impression of the image they linked).

Musing over POWER9 roadmap at Hot Chips


(See the presentation from AnandTech's live blog at Hot Chips.)

With the news that GlobalFoundries has stopped all 7nm development, the next step for the Power ISA got more nebulous. IBM really phoned in their presentation at Hot Chips this time around; there wasn't a lot of meat on the bone, and they probably got advance warning of the changes at GF, which likely limited what they were willing to say in public. But IBM still has one more stop on the POWER9 roadmap, so they're not done with 14nm yet.

The 2019 "advanced I/O" POWER9 will increase memory bandwidth from the 210GB/s of the "scale up" parts to 350GB/s, over twice the 150GB/s of the "scale out" chips in the Talos II. IBM didn't appear to say whether this would require buffered or direct-attached memory, though our incompletely informed suspicion here is the former. If so, it wouldn't be a drop-in replacement for the Sforza modules the T2 uses now; the board would probably need a redesign to accommodate whatever Centaur successor it requires, which would also have power and thermal impacts in a workstation form factor. I/O on the "AIO" POWER9 moves from OpenCAPI 3.0 to 4.0, allowing caching on accelerators and additional link widths, and from NVLink 2.0 to 3.0, presumably both over Bluelink. IBM didn't announce clock speeds, but since the core counts are the same, they're most likely identical or comparable.

IBM also said rather little about the POWER10. No core count was reported and the node size was pointedly not shown. However, signaling was announced at 32 and 50GT/s, up to double the POWER9, indicating IBM continues to prioritize bandwidth as their competitive advantage against x86 commodity servers. The timeframe is still 2020, so we can expect at least another 18 months of POWER9 goodness.

Making your Talos II into a Power Mac: dcbz considered harmful (part 2)


In the first part of this article we talked about getting your Talos II prepped to emulate a Power Mac using KVMPPC, the kernel virtualization facility in Linux. Having followed the instructions in that article, you've got your kernel in hash table mode, you've got the KVM-PR kernel module loaded (and patched it if necessary), you installed (or built) QEMU, and you have a blank QEMU disk image ready to go.

For this part, we will assume you have chosen 10.3 Panther, 10.4 Tiger or 10.5 Leopard to install. I will discuss Leopard relatively little beyond how to get you started in it, since most of what applies to Tiger applies to Leopard as well. I'll briefly discuss booting OS 9 with TCG at the end.

Since we will use tun/tap networking, make sure the tap interface exists and is up before booting. On Fedora, I do something like this:

sudo ip tuntap add dev tap0 mode tap user [your username]
sudo ip link set tap0 up promisc on

and, if you use libvirt,

sudo brctl addif virbr0 tap0
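If you'd rather not retype these after every reboot, here is a minimal sketch of a helper that wraps them. The function names and the DRY_RUN and BRIDGE variables are our own invention, not part of any tool; DRY_RUN=1 just prints the commands so you can check them before running them for real.

```shell
#!/bin/sh
# Hypothetical helper that brings up a tap device for QEMU.
# DRY_RUN=1 prints the commands instead of running them;
# BRIDGE=1 also adds the tap to libvirt's default virbr0 bridge.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_tap() {
    local tap="${1:-tap0}" user="${2:-$(id -un)}"
    run sudo ip tuntap add dev "$tap" mode tap user "$user"
    run sudo ip link set "$tap" up promisc on
    if [ "${BRIDGE:-0}" = 1 ]; then
        run sudo brctl addif virbr0 "$tap"
    fi
}
```

For example, `DRY_RUN=1 BRIDGE=1 setup_tap tap0` shows all three commands for the current user without touching your network configuration.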

For file sharing you could set up either Samba or Netatalk. I use Netatalk since I'm more accustomed to AppleTalk; it lets my T2 serve files over AFP to the other classic Macs here, and it also works fine with Mac OS 9 if you want to use that at some point.

Let's begin by constructing the command line to boot your emulated Mac from disc and install the OS. Each OS currently does better with certain combinations of emulated CPU and hardware features. We also need to keep the emulator within a single core for better performance (you will get random system stalls if it moves to another core, and throughput will generally suffer), so we need to set CPU affinities appropriately.

We'll go with 10.4 for our example; substitute for your OS of choice where relevant. Start out with

taskset -a -c 0-3 qemu-system-ppc -M

This binds all of QEMU's threads to a single core. Recall that the T2's Sforza CPUs are SMT-4 and each hardware thread appears as a logical CPU, so logical CPUs 0-3 make up exactly one physical core. While QEMU spawns more than four threads, spreading them across two cores (i.e., 0-7) has no noticeable performance benefit and can sometimes unsettle Mac OS X by making timing loops unpredictable.
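If you want to pin QEMU to a core other than the first, the range is easy to compute. This sketch assumes Linux numbers each core's SMT threads consecutively (cpu0-3 for core 0, cpu4-7 for core 1, and so on), which is worth confirming with lscpu on your own system; core_cpus is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the logical CPU range for one physical core,
# assuming threads are numbered consecutively per core (verify with lscpu).

core_cpus() {
    local core="$1" smt="${2:-4}"   # Sforza POWER9 parts are SMT-4
    local first=$((core * smt))
    echo "${first}-$((first + smt - 1))"
}

# Usage: pin all of QEMU's threads to core 1 (logical CPUs 4-7) instead of core 0.
# taskset -a -c "$(core_cpus 1)" qemu-system-ppc -M ...
```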

For the -M option, we will specify the mac99 machine with KVM acceleration. The OSes differ in what they prefer attached to the VIA: 10.3 and 10.4 need the emulated mac99 to have an emulated CUDA chip onboard, or the OS is unable to detect the real-time clock, while 10.5 requires the later PMU. So that gets us to

taskset -a -c 0-3 qemu-system-ppc -M mac99,accel=kvm,via=cuda (10.3, 10.4)
taskset -a -c 0-3 qemu-system-ppc -M mac99,accel=kvm,via=pmu (10.5)
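As a small sketch, the VIA choice can be captured in a lookup so a launch script can't get it wrong. via_for_os and machine_arg are hypothetical names, and the table only encodes the three rules stated above.

```shell
#!/bin/sh
# Hypothetical helpers encoding the VIA rules above.

via_for_os() {
    case "$1" in
        10.3|10.4) echo cuda ;;   # without CUDA the OS can't find the RTC
        10.5)      echo pmu ;;    # Leopard requires the PMU
        *) echo "no VIA rule for Mac OS X $1" >&2; return 1 ;;
    esac
}

machine_arg() {
    local via
    via=$(via_for_os "$1") || return 1
    echo "mac99,accel=${2:-kvm},via=$via"
}

# Usage: taskset -a -c 0-3 qemu-system-ppc -M "$(machine_arg 10.4)" ...
```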

All three of these OSes work fine emulating a 7400-series G4. We will use the "Nitro" 7410 (-cpu nitro), which is a bit faster than the G3 (-cpu G3). 10.3 may have problems if assigned more than 1.5GB of RAM (-m 1536), but 10.4 and 10.5 work fine with 2GB (-m 2048). Don't use more than 2GB of RAM; it will cause various problems. A verbose boot is helpful in case you accidentally did something wrong (-prom-env boot-args=-v). We'll specify our disk image and some tuning parameters (-drive file=[filename].img,format=qcow2,l2-cache-size=4M), and tell QEMU to boot from the CD or DVD (-boot d -cdrom /dev/cdrom). Lastly, we'll enable the emulated RTL8139 NIC and USB tablet (-netdev tap,id=mynet0,ifname=tap0,script=no,downscript=no -device rtl8139,netdev=mynet0 -usb -device usb-tablet) and use a sane screen resolution (-g 1024x768x32). For my 10.4 booter, the full command line looks like this (using the filenames I use on this system):

taskset -a -c 0-3 qemu-system-ppc -M mac99,accel=kvm,via=cuda -cpu nitro -m 2048 -prom-env boot-args=-v -boot d -cdrom /dev/cdrom -drive file=tigerhd.img,format=qcow2,l2-cache-size=4M -netdev tap,id=mynet0,ifname=tap0,script=no,downscript=no -device rtl8139,netdev=mynet0 -usb -device usb-tablet -g 1024x768x32

I strongly suggest saving this as a shell script so that you can make any necessary variations. Insert your OS CD or DVD and run the script, and it should start the installer. If it doesn't, make sure your filenames are correct, that OpenBIOS (it comes with QEMU) is installed in a location the emulator can see, and, by checking lsmod, that the KVM kernel modules (both kvm and kvm_pr) are loaded.

Once the installer has booted, you can of course proceed directly to installation in KVM, but I actually recommend shutting down the emulated Mac at this point and bringing everything back up in TCG to get the OS installed. To do that, just use the same command line, but change accel=kvm to accel=tcg. As I mentioned in the first part, heavy I/O loads tend to perform worse on KVMPPC, and installing and upgrading an OS is a pretty heavy I/O load, so running it in TCG will complete the task more quickly and more reliably.

If you want to run Software Update to bring your emulated Mac up to date, it's probably best to also do this in TCG. Alternatively, you can separately download one of the combo installers (such as the one for 10.4.11) and copy it to the emulated Mac over your Samba (SMB) or Netatalk (AFP) share.

When the OS is installed, remove the CD-ROM from your command line unless you want to keep it, and change the -boot argument to -boot c to boot from the emulated drive image.
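Putting the install-then-run workflow together, here is a minimal launcher sketch, not the script I actually use. It assumes the filenames from the 10.4 example above (override them with DISK and CDROM), uses TCG with -boot d for installing and KVM with -boot c afterwards, and caps RAM per the limits mentioned earlier. qemu_args and launch are hypothetical names.

```shell
#!/bin/sh
# Hypothetical launcher sketch. "install" mode: TCG, boot from CD/DVD.
# "run" mode: KVM, boot from the disk image.

DISK=${DISK:-tigerhd.img}
CDROM=${CDROM:-/dev/cdrom}

qemu_args() {
    local os="$1" mode="$2" via mem accel boot
    case "$os" in
        10.3) via=cuda mem=1536 ;;   # 10.3 may have trouble above 1.5GB
        10.4) via=cuda mem=2048 ;;   # CUDA, or the RTC isn't detected
        10.5) via=pmu  mem=2048 ;;   # Leopard needs the PMU; never exceed 2GB
        *) echo "unsupported OS: $os" >&2; return 1 ;;
    esac
    case "$mode" in
        install) accel=tcg boot="-boot d -cdrom $CDROM" ;;
        run)     accel=kvm boot="-boot c" ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
    printf '%s %s %s\n' \
        "-M mac99,accel=$accel,via=$via -cpu nitro -m $mem -prom-env boot-args=-v $boot" \
        "-drive file=$DISK,format=qcow2,l2-cache-size=4M" \
        "-netdev tap,id=mynet0,ifname=tap0,script=no,downscript=no -device rtl8139,netdev=mynet0 -usb -device usb-tablet -g 1024x768x32"
}

launch() {
    local args
    args=$(qemu_args "$1" "$2") || return 1
    # Deliberately unquoted so the string splits into separate arguments;
    # this assumes DISK and CDROM contain no spaces.
    taskset -a -c 0-3 qemu-system-ppc $args
}
```

Call it as `launch 10.4 install` for the first boot and `launch 10.4 run` once the OS is on the disk image.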

Ta-daa!

For best results with video updates, make sure that the display settings inside System Preferences match your physical display. I'm in 32-bit colour, so I made sure that System Preferences was using Millions instead of Thousands of colours. Because of timing variability, you may also notice that the OS X clock is close to, but drifts somewhat from, your host's clock, depending on how the delay loop was calibrated at bootup. This is mostly just a nuisance.

The next step is optional, but it hacks KVMPPC to improve the performance of the emulated Mac. Right now we're actually fooling the operating system; we're not really a G4. In fact, the closest Power Mac relative to the T2's POWER9 is the G5, i.e., the PowerPC 970, which is essentially a POWER4 with some modifications for workstation duty and a bolted-on AltiVec unit. Telling the OS we're a G4 doesn't change the attributes of the underlying CPU, in particular which instructions it does and does not support, and how certain others are handled.

With "big POWER", IBM removed some of the PowerPC instructions that were infrequently used or scaled badly, such as dcba and mcrxr. You don't need to know what these do; just know that some software used them, but as of the G5 they no longer exist in hardware. Additionally, the G5 and later big POWER designs (including the POWER9) have a 128-byte cache line instead of the 32-byte cache line of the G3 and G4. That matters for the dcbz instruction, which zeroes an entire cache line and potentially spills it to memory. OS X has adaptations for both cases (an illegal instruction handler that simulates the missing instructions in software, and modified system routines for the larger cache line), but it only installs them if it knows the machine is a G5. Here it doesn't, so these adaptations are never installed.

KVM-PR gets around the dcbz problem on later POWER designs, including the POWER9, by scanning every new code page in a 32-bit guest for the dcbz instruction and replacing it with an illegal instruction it can detect. (Remember, dcbz is still a legal instruction on these CPUs; it just zeroes 128 bytes instead of 32.) When the replacement executes, it faults back into KVM-PR, which simulates a 32-byte dcbz in software and returns control to the guest. Unsurprisingly, this process is quite slow, especially if it gets called in a loop. Unfortunately, Apple does exactly that to clear memory: the instruction is a major portion of the OS's built-in implementation of bzero, which memset also calls. This is a hot routine and needs to run fast. The G5 version knows about the cache line difference and accounts for it; the G4 and G3 versions don't, and we're using the G4 version.
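To make the scanning step concrete, here is an illustrative sketch (not KVM-PR's actual code) of how a scanner can recognize dcbz and dcba in 32-bit instruction words. Both are X-form instructions with primary opcode 31 and differ only in the 10-bit extended opcode (1014 for dcbz, 758 for dcba); since many other instructions share opcode 31, the extended opcode has to be checked too. The sample words used with it (dcbz 0,r8 = 0x7C0047EC, dcba 0,r8 = 0x7C0045EC) are our own hand-assembled examples, and classify_insn and count_dcbz are hypothetical names.

```shell
#!/bin/sh
# Illustrative sketch, not KVM-PR's code: classify a 32-bit PowerPC
# instruction word. dcbz and dcba are X-form with primary opcode 31;
# only the 10-bit extended opcode (bits 21-30) tells them apart.

classify_insn() {
    local w=$(( $1 ))
    local primary=$(( (w >> 26) & 0x3f ))
    local xo=$(( (w >> 1) & 0x3ff ))
    if [ "$primary" -eq 31 ] && [ "$xo" -eq 1014 ]; then
        echo dcbz
    elif [ "$primary" -eq 31 ] && [ "$xo" -eq 758 ]; then
        echo dcba
    else
        echo other
    fi
}

# Count dcbz words in a code page given as hex words, one per argument.
count_dcbz() {
    local n=0 w
    for w in "$@"; do
        if [ "$(classify_insn "$w")" = dcbz ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

For example, `count_dcbz 0x7C0047EC 0x60000000 0x7C0047EC` reports two dcbz instructions; each one found in a real guest is another trip through the slow software path.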

Apple, however, also helped us out here a little by making it possible to guess where the routine is. This and other major components live in a section of memory called the "commpage," which is always located in the top eight pages of the 32-bit address space in every process. The kernel provides it as an optimization for fast access to important data and common routines. The bzero routine is virtually unchanged from 10.3 to 10.4, and in both it starts with a distinctive instruction (cmplwi cr7,r4,32). If we see this instruction in the commpage, we can be confident we have found bzero, and once we've found it, we can modify it.

Recall that KVM-PR must scan each new executable code page for dcbz and change it. We can alter KVM-PR to detect that distinctive leader instruction when it maps in the commpage, and then monkeypatch in a new routine that doesn't use dcbz and thus doesn't require slow simulation. For reliability, since we know where the routine should be, we only patch it if the leader is actually there. As a bonus, we also map dcba to nop anywhere in an executable section so that it doesn't need a trip to a special handler either. That is what this patch does.
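The patch decision can be sketched the same way. This is a hypothetical illustration, not the patch itself: it assumes 4KB pages (so the eight commpage pages span 0xFFFF8000 through 0xFFFFFFFF), matches the bzero leader cmplwi cr7,r4,32 (which encodes as 0x2B840020), and rewrites dcba to the canonical nop (ori r0,r0,0, or 0x60000000). For simplicity it only checks that the address falls inside the commpage, not the exact offset the real patch expects; in_commpage and patch_action are hypothetical names.

```shell
#!/bin/sh
# Hypothetical sketch of the patch decision (not the actual patch).
# Assumes 4KB pages: the eight commpage pages span 0xFFFF8000-0xFFFFFFFF.

PAGE_SIZE=4096
COMMPAGE_BASE=$(( 0x100000000 - 8 * PAGE_SIZE ))   # 0xFFFF8000
BZERO_LEADER=$(( 0x2B840020 ))                     # cmplwi cr7,r4,32
NOP=$(( 0x60000000 ))                              # ori r0,r0,0

in_commpage() {
    local addr=$(( $1 ))
    [ "$addr" -ge "$COMMPAGE_BASE" ] && [ "$addr" -le $(( 0xFFFFFFFF )) ]
}

# Print what to do with the instruction word $2 found at address $1.
patch_action() {
    local addr="$1" word=$(( $2 ))
    if in_commpage "$addr" && [ "$word" -eq "$BZERO_LEADER" ]; then
        echo replace-bzero                      # swap in a routine that avoids dcbz
    elif [ $(( (word >> 26) & 0x3f )) -eq 31 ] && [ $(( (word >> 1) & 0x3ff )) -eq 758 ]; then
        printf 'rewrite-to 0x%08x\n' "$NOP"     # dcba becomes a nop
    else
        echo leave
    fi
}
```

Requiring both the commpage address and the exact leader word is what keeps an unrelated cmplwi elsewhere in the guest from being mistaken for bzero.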

Building KVMPPC with this patch uses the same steps we discussed for building and installing the kernel modules in part 1. This patch also applies with -p1.

Does it make a difference? You bet it does. On my system with Geekbench 32-bit on Mac OS X 10.4.11, it improved the overall benchmark by nearly 200 points over the unpatched version, almost all of it in (no surprise) the memory score.

This consequence of masquerading as a different CPU also carries over into which software you can run. Even though the guest thinks it's a G4, you actually have to run the G5 version of TenFourFox, which avoids the other instructions that are illegal on this CPU and aren't patched (just be patient; it will take TenFourFox almost a full minute to come up). If your software offers a G5 version, you should run that if you can. The discontinuity leads to amusing discrepancies like this one.

Interestingly, under TCG on POWER9, TenFourFox's JIT actually produced errors during SunSpider that it doesn't under KVMPPC, and even after warmup TCG was up to twice as slow as KVM at SunSpider. Go TenFourFox!

You'll find that performance is still fairly pedestrian even with KVMPPC. While the OS typically benchmarks my T2 as a "2.04GHz G4" (TCG usually gets computed as somewhere between "900 MHz" and "1.0GHz"), the actual throughput you get varies greatly with workload. Raw CPU performance is a bit better than my Quad G5's scores running a single core in Reduced mode, though the Quad running full tilt easily surpasses it (the emulation overhead is only reduced, not eliminated). Application numbers vary considerably depending on how the workload is structured. For example, TenFourFox's G5 JIT in KVMPPC gets about 6800ms in SunSpider compared to around 3800ms on a "real" 1GHz iMac G4. Improving these numbers to reach parity, and especially getting QEMU to support SMP, will need to be an area of active future development.

Lastly, I promised to discuss the best way to run OS 9 on a Talos. Although limited to TCG, it's still pretty snappy, a testament to Mac OS 9's comparatively low system requirements. Mac OS 9 works better with the PMU than with CUDA (otherwise the mouse may not respond reliably to double clicks) and is limited to 1.5GB of RAM. It also doesn't support the QEMU USB tablet, but it does support the RTL8139 with this driver. To get the driver installed, I just made an ISO image out of it, dropped the driver into the Extensions folder and rebooted. My command line looks like this:

qemu-system-ppc -M mac99,accel=tcg,via=pmu -m 1536 -boot c -drive file=classic.img,format=qcow2,l2-cache-size=4M -usb -netdev tap,id=mynet0,ifname=tap0,script=no,downscript=no -device rtl8139,netdev=mynet0 -rtc base=localtime

Mac OS 9 uses a different real-time clock base, so this has an additional -rtc option. You can use any CPU you want since it's emulated; I just use the default G4 7400 here instead of specifying one.

Post questions or things you've discovered in the comments.

GlobalFoundries stops all 7nm development


As reported by AnandTech, GlobalFoundries, which includes the former chip manufacturing foundries of AMD and, notably for our reporting, IBM, has scuttled its 7nm process roadmap. Instead, the company will concentrate on its 14nm and 12nm FinFET technology, including the 14nm FinFET process that GlobalFoundries uses to manufacture the POWER9.

The POWER10, scheduled for 2020 in the IBM product roadmap, is supposedly being designed on a 10nm process. Assuming IBM doesn't redesign the POWER10 for 12nm, their other options are Intel (unlikely), TSMC or Samsung, all of whom have 10nm processes. POWER11 was planned for 7nm, but has no timeframe. Meanwhile, in a possible sign of what's to come, AMD has moved to TSMC.

Making your Talos II into a Power Mac: KVMPPC for POWER9 (part 1)


UPDATE: On current systems (as of April 2020), see these errata.

Talospace is a spinoff of the TenFourFox Development blog; for those unfamiliar with it, TenFourFox is a Firefox fork maintained for Power Macs running Mac OS X 10.4 and 10.5. It shouldn't be a surprise that the common architecture was a big plus for me, and it's possible to run OS X with reduced processor emulation overhead using the same Kernel-based Virtual Machine (KVM) scheme used for virtualization on other platforms.

Emulation is of course just one of the things we old Mac users would like working properly on the new Power hotness. The other is the damn Command key working like it's supposed to. We'll address that pain point in another "First Person" post coming soooooon.

Anyway, a brief digression before we begin, for those unfamiliar with how KVM works on Power ISA. KVMPPC comes in two flavours, KVM-PR ("PRoblem") and KVM-HV ("HyperVisor"); both work in big and little endian modes. KVM-HV is the more modern of the two and technically the most like hypervisors on other architectures. It uses the hardware support in later Power ISA CPUs, so it's overall faster, particularly when many supervisor-level instructions must be executed. However, it cannot be nested (you can't run a KVM-HV guest inside a KVM-HV guest, though you can run a KVM-PR guest; more on that in a moment), and most importantly, it supports only virtualizing the same processor generation or the one immediately prior. Since no version of OS X ran on a POWER8 (let alone a POWER9), we won't be dealing with it further for the purposes of this article.

That brings us to KVM-PR. Unlike KVM-HV, KVM-PR runs strictly in user mode, or what IBM docs refer to as the "Problem State." It does run as a kernel module, so it's not in userspace, but it does not depend on the hardware support that powers KVM-HV, and so the guest only runs user-level instructions directly. That means KVM-PR must trap and emulate supervisor-level instructions on behalf of the guest, which is much slower. However, KVM-PR can also emulate other instructions and their desired behaviour, which theoretically allows it to act like any supported Power ISA or PowerPC CPU, including a G3, G4 or G5. Instructions which aren't supported natively are trapped and executed just like supervisor-level instructions, and everything else can still run on the metal. Because it's user mode, it can be nested (a KVM-PR guest can run inside another KVM-PR guest, as well as inside a KVM-HV guest). KVM-PR was the original method of virtualization on PowerPC Linux, descending from the venerable old Mac-on-Linux project (which had its own peculiar hypercalls), and a specialized form of this method is how OS X runs Classic on 10.4 and earlier. This is the method we will use here.

Let's first talk about whether KVM is the way you want to go. For our Power Mac hardware emulation, we will use QEMU, which can use KVM (and KVMPPC) to accelerate the processor, and QEMU provides the rest of the platform. QEMU provides two platform profiles, g3beige, a Gossamer Beige Power Mac G3, as the name implies, and mac99, essentially a Sawtooth G4. We will only be using the mac99 platform since it provides the best combination of flexibility and compatibility.

QEMU also provides emulated USB devices. The most useful to us is the USB tablet, which allows QEMU to detect when the mouse is within the QEMU window without having to grab it and makes using the emulator a lot more seamless. Unfortunately, the USB tablet is only supported by 10.3 Panther and up. No version of PowerPC Mac OS currently supports VirtIO devices either, so there is no graphics or disk acceleration. On the other hand, QEMU does provide an emulated RTL8139 network card, for which drivers are available for Mac OS 9 through 10.2 Jaguar and are built into 10.3 Panther and up; with tun/tap it runs with decent throughput. Sound is best described as a work in progress, and graphics work but are basically a dumb framebuffer. Still, this is enough to get the OS off the ground and be useful.

KVMPPC does not work in all situations with QEMU. Most notably, it cannot boot Mac OS 9, Rhapsody, 10.0 or 10.1, at least from disc. I've done some work on improving this, and it gets mostly through the nanokernel startup in OS 9 but no further yet. For these operating systems you will currently need to use TCG, QEMU's software CPU emulator, which runs by default if you don't ask for KVM. TCG does have JIT acceleration, and the JIT supports Power ISA, so while it's definitely slower, it's at least somewhat better than it sounds. TCG also tends to run a little smoother than KVM since it's all within a user process, but compute-intensive tasks can run up to an order of magnitude slower. TCG is also involved if you run a completely alien inferior architecture like x86.

KVMPPC also tends to be problematic with heavy I/O loads. TCG can be noticeably faster when running installers, for example, or anything that involves substantial emulated disk access. This is probably due to the large amount of supervisor-level code that incurs a speed penalty with KVM-PR. I had better luck and faster install times installing things with QEMU using TCG, then shutting down and rebooting in QEMU with KVM to actually use them.

Finally, KVMPPC only works to mimic certain processors currently. G3 works for every system, and Nitro (G4 7410) works for most of them, but right now that's all. None will boot in KVMPPC with any G4 7450-series processor, and trying to start KVMPPC in 64-bit mode to emulate a G5 currently crashes my Talos. There is also no support for SMP, so our monstrous multi-core beasts will only present one CPU to the emulated OS. The processor you choose doesn't necessarily change the underlying vagaries of the architecture, though, which will be discussed in the next part as well.

Some specific notes on individual versions of Mac OS:

  • Mac OS 9, Rhapsody, 10.0 Cheetah and 10.1 Puma do not currently boot on KVMPPC, at least not from CD. They also don't support the USB tablet, so you must click in the window to grab the mouse and keyboard, and hit Ctrl-Alt-G to release the grab to do something else. Rhapsody can be notoriously hard to install and requires multiple steps which I won't discuss here. For OS 9 I'll talk about a couple of glitches with QEMU in Part 2, since many of you will still want to run it even though there is no CPU acceleration.

  • 10.2 Jaguar has various problems in KVMPPC, though it does work. Finder windows tend to glitch and not fully load when you double-click folders and devices on the desktop. Classic does not work in 10.2 with KVMPPC and aborts with a bus error. 10.2 also does not support the USB tablet, so you need to grab the mouse as with OS 9.

  • 10.3 Panther and 10.4 Tiger both run well in KVMPPC. Later on we'll talk about a specific optimization to the operating system "commpage" to make them run even better. 10.3 runs better than 10.4, but 10.4 has better compatibility. Both support the USB tablet and have built-in support for the RTL8139 NIC. Classic will boot and run in both, but is noticeably slower than on a real machine (this is true of both TCG and KVM), though Classic is somewhat faster in Panther.

  • 10.5 Leopard appears to work fine in KVMPPC. It supports everything that 10.3 and 10.4 do, though I haven't done the particular commpage optimization for 10.5 yet because I don't use, nor particularly like, Leopard personally. 10.5 obviously does not support Classic.

You'll need to do some preparation to get your Talos II to be an accelerated Mac with KVMPPC (this isn't needed if you're going to use TCG since it's purely userspace). The first is that you need to make sure your T2's MMU is in hash table mode, used by POWER8 and earlier CPU generations. The POWER9 introduces a new MMU mode called radix mode, but without going into the gory technical details, the particular memory mapping characteristics of radix mode mean certain tracts of memory cannot be properly manipulated by KVM-PR. All OSes that support the POWER9 in radix mode will support it in hash table mode. For Linux, just add disable_radix to your kernel command-line arguments. For my Fedora workstation, I just put it into GRUB, regenerated the configuration, and rebooted. If you did this right, dmesg will show a line like this:

[    0.000000] hash-mmu: Initializing hash mmu with SLB

You shouldn't see any mention of radix mode.
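If you'd like to script that check, here is a rough sketch. mmu_mode is a hypothetical helper that reads /proc/cmdline and dmesg by default, or takes both as arguments for testing; note that dmesg may need root on some distros, and boot messages can scroll out of the ring buffer on a long-running system.

```shell
#!/bin/sh
# Hypothetical check for hash table MMU mode. Reads /proc/cmdline and
# dmesg by default; pass both as arguments to test offline.

mmu_mode() {
    local cmdline="${1:-$(cat /proc/cmdline)}"
    local log="${2:-$(dmesg 2>/dev/null)}"
    case " $cmdline " in
        *" disable_radix "*) ;;
        *) echo "disable_radix is not on the kernel command line"; return 1 ;;
    esac
    if printf '%s\n' "$log" | grep -q 'hash-mmu: Initializing hash mmu'; then
        echo hash
    elif printf '%s\n' "$log" | grep -qi 'radix'; then
        echo radix
        return 1
    else
        echo "unknown (boot messages may have scrolled out of dmesg)"
        return 1
    fi
}
```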

The next step may be to download a copy of the kernel source code. If you have kernel 4.17.x or earlier (as my Fedora 28 system does), you will need to apply patches to the KVMPPC kernel modules in that version to even get it to start. If you have kernel 4.18.x and up, the necessary patches should already be present for basic functionality, but you may still want to get the kernel source for some of the hack optimizations in this post that aren't (and probably won't ever be) included by default.

Let's assume for didactic purposes that you do need to patch the KVMPPC kernel module that comes with your distro. We will talk about adding the hacks to it a little later. This will taint your kernel. If your system behaves strangely and you are unable to unload the module, you may need to reboot.

  1. Download and unpack the source archive, and cd into the root of the unpacked source archive.
  2. Download this patch (written by yours truly) into the root of the source archive.
  3. patch -p1 < that_patch.diff
  4. cp /your/kernel/config .config (assuming you're still in the source archive)
  5. make -j24 modules (or if you're one of the lucky scum with more cores, adjust as appropriate; I have two of the 4-core CPUs for 32 threads, but I like to leave a core free)
  6. When the make has run to completion, edit include/generated/utsrelease.h to make sure it matches what appears in uname -r, or your kernel may refuse to load the module.
  7. Regenerate the KVMPPC modules with the matching string: make -j24 SUBDIRS=arch/powerpc/kvm
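Step 6 is the easiest one to get wrong by hand, so here is a minimal sketch that rewrites the UTS_RELEASE line to match uname -r. It assumes the generated header contains a single #define UTS_RELEASE line, as it does in the kernels we've used; fix_utsrelease is a hypothetical name, and it writes through a temporary file rather than relying on sed -i for portability.

```shell
#!/bin/sh
# Hypothetical helper for step 6: make UTS_RELEASE match the running
# kernel. Run from the root of the kernel source after "make modules".

fix_utsrelease() {
    local hdr="${1:-include/generated/utsrelease.h}" rel="${2:-$(uname -r)}"
    if [ ! -f "$hdr" ]; then
        echo "missing $hdr; run make modules first" >&2
        return 1
    fi
    sed "s|^#define UTS_RELEASE .*|#define UTS_RELEASE \"$rel\"|" "$hdr" > "$hdr.tmp" &&
        mv "$hdr.tmp" "$hdr"
}
```

With no arguments it edits the header in the current source tree using the output of uname -r; then proceed to step 7 as usual.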

Now you can load your custom modules:

cd arch/powerpc/kvm && sudo modprobe -r kvm_hv kvm_pr kvm && sudo insmod kvm.ko && sudo insmod kvm-pr.ko

and you should see something like this in dmesg:

[22198.130998] kvm: loading out-of-tree module taints kernel.
[22198.184535] kvm: module verification failed: signature and/or required key missing - tainting kernel

If you got an error message instead, you either loaded the wrong thing or possibly forgot the patch (earlier KVMPPC versions won't even start KVM-PR on a POWER9).

If you already have the patches, chances are your OS already loaded KVM-PR. You can check this with lsmod. If it didn't, and trying to load it with sudo modprobe kvm_pr doesn't work, you may need to also patch your kernel modules with the steps above. On the other hand, if you see both kvm_pr and kvm_hv listed, do a sudo modprobe -r kvm_hv (unless you really do need it) to limit your system to KVM-PR and help to simplify the remaining steps in this article.
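The checks in this paragraph can be rolled into a small sketch that reads lsmod output and tells you what to do next. kvm_status and has_module are hypothetical helpers; they only inspect module names in the first column, and since kvm_pr depends on kvm, kvm itself isn't checked separately.

```shell
#!/bin/sh
# Hypothetical helpers: read lsmod output (argument, or live lsmod) and
# report the next step for a KVM-PR-only setup.

has_module() {
    # $1 = module name, $2 = lsmod output; skips the header line
    printf '%s\n' "$2" |
        awk -v m="$1" 'NR > 1 && $1 == m { found = 1 } END { exit (found ? 0 : 1) }'
}

kvm_status() {
    local mods="${1:-$(lsmod)}"
    if ! has_module kvm_pr "$mods"; then
        echo "kvm_pr missing: try sudo modprobe kvm_pr, or patch and build it"
        return 1
    fi
    if has_module kvm_hv "$mods"; then
        echo "kvm_hv also loaded: sudo modprobe -r kvm_hv"
        return 0
    fi
    echo "ok: kvm_pr loaded"
}
```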

The third step is to install QEMU. QEMU 3.0 is strongly advised; if your package source doesn't have it, then download and compile from source (and you get to do -O3 -mcpu=power9 anyway for great justice). Although QEMU 2.12 will mostly work for these examples, many bugs and edge cases were fixed in the Mac hardware emulation and some bugs can't be worked around easily any other way. You may also have to remove some command line options from the examples that were not supported in 2.12.

Create your base disk image according to the QEMU instructions and get out your OS X disc. I saw little value in a raw disk image, and it was substantially larger than a qcow2 image, so I'd just use qcow2. We'll assume this is your chosen format for the remainder of this series.
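For reference, a qcow2 image is created with qemu-img. This sketch wraps it so you don't clobber an existing image; make_disk is a hypothetical name, the 20G default size and tigerhd.img filename are placeholders, and DRY_RUN=1 prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical wrapper around qemu-img that refuses to overwrite an image.

make_disk() {
    local img="${1:-tigerhd.img}" size="${2:-20G}"
    if [ -e "$img" ]; then
        echo "$img already exists; not overwriting" >&2
        return 1
    fi
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "qemu-img create -f qcow2 $img $size"
    else
        qemu-img create -f qcow2 "$img" "$size"
    fi
}
```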

In part 2, we'll talk about how to get all this actually booting.

Intel: Benchmark our product and we'll sue


Making the rounds on the web: discovered in Intel's new firmware EULA, "You will not, and will not allow any third party to ... (v) publish or provide any Software benchmark or comparison test results."

It's probably unenforceable to limit what is essentially a product review of their CPUs. Probably. Or, just don't buy CPUs from a company that's willing to put junk like that in their EULAs in the first place.

Update: Intel has recanted.

CentOS announces support for POWER9


As announced on the CentOS mailing list, CentOS 7 (1804) now supports POWER9 (running little-endian). This is not a surprise, as your humble author is typing this blog post on a Talos II in Fedora 28, but it's good to see the support trickling into downstream distros. The POWER9 version uses kernel 4.14 and is described as otherwise identical to the POWER8 distribution, which also runs little-endian. Download it from the CentOS mirror server.

IBM finishes Cumulus rollout


With a 24-core SMT-4 processor available as an option for the Talos systems, you can be forgiven for forgetting these beasts come even bigger. Let's look at IBM's recent announcement of the last of the POWER9 herd.

The Sforza modules in the T2 and T2 Lite are Nimbus systems, with four-way simultaneous multithreading and support for up to two processors (so-called "Scale Out"); the largest current 24-core variant thus has 96 threads per processor and therefore a maximum of 192 per system. These PowerNV systems are designed to run an OS on the bare metal and have "direct attached" RAM that does not require a memory buffer. By contrast, the Cumulus ("Scale Up") systems come as SMT-8 parts and support up to sixteen processors. Notably, the current maximum core count is "only" twelve, so the maximum number of threads per processor is still 96, and the Cumulus systems use Centaur memory buffers which act as a level of cache below L3 and offer about half the memory bandwidth of Nimbus. On the other hand, this also allows shops with big IBM investments to use the same RAM from their POWER8 machines, a substantial cost savings, and the "Fleetwood" Power E980 with all 16 processors enabled offers a staggering 1,536 threads. The Cumulus machines are also designed to boot and run AIX, IBM i or Linux virtualized under PowerVM, whereas Linux is the only supported option for the Nimbus machines.

The smaller of the two machines, though they're both beefy, is the 4U "Zeppelin" Power E950, which supports four processors and comes in 8, 10, 11 (?!) and 12 core variants running from 3.1 to 3.8GHz, thermal headroom permitting (the 8-core has a base frequency of 3.6GHz and the 12-core 3.1GHz), for up to 384 threads. A 16Gb/s NUMA interconnect links the processors, and up to 16TB of RAM is supported. It supports AIX and Linux.

The 5U "Fleetwood" Power E980 is basically up to four E950s lashed together with 25Gb/s Bluelink between the nodes, with four times, well, everything. And you can bet that includes the price tag. The (up to) 192 cores are also slightly faster, ranging from 3.55GHz in the 12-core processors to 3.9GHz in the 8-core. The E980 is the only one of the two that supports IBM i, as well as AIX and Linux.
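The thread counts above are just sockets times cores per socket times hardware threads per core; as a quick worked check (threads is a throwaway helper name):

```shell
#!/bin/sh
# threads = sockets x cores per socket x hardware threads per core
threads() { echo $(( $1 * $2 * $3 )); }

threads 2 24 4    # Talos II with two 24-core SMT-4 Nimbus parts: 192
threads 4 12 8    # E950, four 12-core SMT-8 Cumulus parts: 384
threads 16 12 8   # E980, sixteen 12-core SMT-8 Cumulus parts: 1536
```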

Obviously there will be future POWER9 systems as well as these, but this completes IBM's current roadmap. Read more about it at TheNextPlatform.

Foreshadow and L1 Terminal Fault: another good day to be on Power ISA


Boy, Spectre really is the gift that keeps on giving, isn't it? That is, if you're anybody but Intel. Read this excellent (as always) explanation from LWN and then sit back and relax, since neither Foreshadow nor Terminal Fault affects Power systems.

Linux kernel 4.18 available


Version 4.18 of the Linux kernel is now official; Phoronix lists the major hits. There's even some PA-RISC love in there.

For Talos II land, among all the other updates for POWER9, some of the KVMPPC work that enables QEMU to boot and run Mac OS X on T2 hardware without pure software CPU emulation is now in this release (disclaimer: yours truly is a contributor). This requires using KVM-PR instead of the true hypervisor KVM-HV (which is also the subject of substantial updates in this release), but it now works as long as your Talos II's MMU is set to use hash tables instead of radix mode (putting disable_radix as a kernel command line parameter in your GRUB configuration will do nicely). More details on getting this up and running will be the subject of a future post.

Our local Talos, let us show you it


The Talos II your humble editor is typing this on is probably pretty middling-spec by most of your standards, but it does the job. From the "factory" it came with two 4-core POWER9 CPUs, 32GB RAM, Radeon WX7100 workstation card, LSI RAID PCIe card, 4-port SATA PCIe card connected to an LG BD-RAM drive, and 500GB Samsung 960 EVO NVMe main storage, all in the standard Supermicro case Raptor is currently shipping.

We started with, and still use, Fedora. Fedora 28 "just works" with the T2 and should work just fine with the T2 Lite as well. Our Talos slotted right into the Aten CS1764A KVM on the desk, which it shares with the Power Mac Quad G5 (our other primary use system), a Silicon Graphics Fuel and a Power Mac G4 MDD. (The jet-black system it's sitting next to is an Alpha 164LX in a Nanoxia case.)

Since then, we swapped out the LSI RAID card for another 960 EVO NVMe, this time 1TB (we'll populate those front bays sooner or later, though), and added a Rosewill RC-504 PCIe FireWire card for video work. That takes care of all the slots we have available, and both new installs were immediately detected and operable in Linux. For sound we added a Sabrent AU-MMSA USB audio dongle (the Aten KVM switches sound sources, too), which "just worked" in Fedora. To read our old hard disks from the Mac, we added a Sabrent EC-HDFN dual 2.5"/3.5" SATA docking station connected over USB 3.0. This unit also "just worked" with no specific setup and has a nice disk-clone feature as well.

Lastly, for noise reduction, we replaced the factory Supermicro PWS-1K41P-1R power supplies with PWS-1K41P-SQ "super quiets." These are also 1100/1400W-rated PSUs of the same specification, and while they were a little hard to source, they were so worth it; they're just about whisper-quiet. The Talos II makes hardly any noise now, even under load and over long uptimes.

What's your rig running? All Talos-family systems are eligible. Tell us at talospace at floodgap dawt com. Please don't send photos until we're ready for them (unless you have them hosted somewhere). Cool loadouts will appear in future episodes of Show Us Your Talos.

New Talos II Special Developer System SKU: TLSDS3


Joining the ranks of the Talos II Special Developer Systems is a new SKU announced by Raptor on Twitter, TLSDS3. This unit offers a T2 Lite motherboard with a single 4-core Sforza POWER9, 8GB of RAM and 128GB of NVMe flash in a chassis with a 500W PSU for $2099. The processor may be upgraded, but only a single processor is supported. The TLSDS3 appears to be replacing the TLSDS2, which is no longer available for order.

Although the original TLSDS1 is cheaper by about $400, the TLSDS3 doesn't have the limitations of the DD2.1 stepping used in the original Special Developer System or its successor, the TLSDS2 (lower clocks and no support for virtualization).

The TLSDS3 package is available for order now.

Ubuntu LTS updates


An updated release of the long-term support Ubuntu 18 (Bionic Beaver) is now available for ppc64el. Read the full changelog for 18.04.1.

Ubuntu 18 is the first LTS version to officially support the POWER9 and the Talos II and should "just work." POWER8 and earlier 64-bit Power generations are still supported by Ubuntu 16 (Xenial Xerus). Read the full changelog for 16.04.5. You may be able to use the hardware enablement stack to boot a POWER9 system with Ubuntu 16, but Ubuntu 18 is strongly recommended.

Of the currently supported LTSes, only Ubuntu 14 and 16 still support 32-bit PowerPC; 32-bit systems are unfortunately no longer supported for Ubuntu 17 and up. For those using Ubuntu 14, 14.04.5 (Trusty Tahr) was also released.

All Power ISA official releases of Ubuntu are Server branded and do not install a GUI by default.

Adélie Linux ported to Talos II big-endian


As reported by its chief maintainer A. Wilcox on Twitter, the Adélie Linux distribution has been ported to the Talos II.

This distro is particularly interesting because it explicitly runs big-endian (the POWER9 in the T2 is capable of either mode); many Linux distros, including the Fedora 28 distribution your humble author is typing this on, currently only support little-endian operation (ppc64le). It's definitely an open question how much longer big-endian operation will be supported on future Power ISA chips, though the official word from Armonk is "IBM remains committed to transitioning the Linux on Power application ecosystem from big endian to little endian in an expeditious manner" (source), so the direction is clear for the Linux ecosystem.

What this means for AIX and IBM i customers is less certain, however. Backwards compatibility is very important to those shops, both operating systems remain big-endian, and these legacy markets are important revenue sources for IBM. It is entirely possible that future OpenPOWER designs will be strictly little-endian and only IBM-built POWER servers will support big-endian operation; it seems very unlikely that these operating systems, particularly IBM i, would ever support a little-endian mode.
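If the endian question is new to you, the difference is just which end of a multibyte value gets stored first in memory. This short Python sketch packs the same 32-bit integer both ways; sys.byteorder reports the mode the interpreter itself is running in, so it should say "big" on Adélie and "little" on a ppc64le distro like Fedora.

```python
import struct
import sys

value = 0x0A0B0C0D

big = struct.pack(">I", value)     # big-endian: most significant byte first
little = struct.pack("<I", value)  # little-endian: least significant byte first
native = struct.pack("=I", value)  # this interpreter's own byte order

print(big.hex(), little.hex())  # 0a0b0c0d 0d0c0b0a

# Native packing matches whichever layout the running system uses.
print(sys.byteorder, native == (big if sys.byteorder == "big" else little))
```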

You can download the T2 ISO of Adélie Linux, including a Live CD, from their distribution site.

Talos articles from our sister blog


Talospace originally started as a spinoff from the TenFourFox Development blog to move Talos-related content into its own concentrated space. However, a number of relevant articles were posted there earlier; here are permalinks for them:

And, just for historical fun, coverage of the original Talos POWER8 project (preorder announcement, preorder price drop, pledges open, Crowd Supply launch and the end of the campaign).