Showing posts from 2018

The POWER continues in Firefox 64


If you are able to build Firefox 63, you should be able to build Firefox 64 without anything additional. Meanwhile, a whole mess of fixes for long-standing issues have landed for Firefox 65 and later, but more about that in a future post.

FreeBSD 12.0 released


FreeBSD 12.0, the latest version of the venerable BSD-derived operating system, has been released with updates to crypto, TRIM support and Clang among other features and improvements. ISO images and downloadable archives are available for ppc64, but this support does not seem to include POWER9 (yet). One of our occasional readers is on this team and perhaps could give an update?

Note that Power support in FreeBSD is big-endian; no word on whether ppc64le will be supported as well.

Fedora 29 mini-review on Talos II


Although I don't think anyone has good data on this so far, I suspect that Fedora usage among Talos II users is pretty high because it was among the first to offer POWER9 support out of the box. I believe Debian has the greatest install base overall because of its general reputation and it seems to be the one Raptor recommends, but I still wager that Fedora is in the top couple. Indeed, coming from the Power Mac world without loyalties to any particular distribution, the fact I could just throw a bootable Fedora CD into my brand-spanking-new T2 and install an OS without any fuss was pretty much the whole reason I'm using Fedora now. Fortunately, after an initially bumpy start with some weird glitches here and there, F28 Workstation has generally been a very pleasant experience and I think most distros that support Talos systems are now to that point. This review, then, is not a general review of Fedora 29, but rather a review of F29 from the perspective of a Talos II user and anything relevant to this platform that I encountered.

If you are unfamiliar with Fedora, the most important thing to remember is that there is no such thing as a Fedora LTS (if you want that kind of extended support, then you really should be using CentOS or Red Hat Enterprise Linux). Releases are maintained on an N+2 system: now that Fedora 29 is out, F28 will be supported until one month after F30 is released and F27 will shortly be unsupported since we are roughly a month after F29 emerged. In practice this means any one release is supported for roughly a year, give or take, so it really is important to stay current with official releases. On the other hand, since the cadence is somewhat quick, the chance of major breaking changes between any two consecutive releases is relatively low.

A few notes about my configuration before we begin. My high security systems are on a standalone wired network that cannot route directly to the Internet, only through proxies, and this system has the BTO AMD WX7100 workstation card as its primary console with the VGA port jumpered off. It comes up with a textual boot (not graphical) so I can do updates with high confidence of not interfering with anything running; I manually do a startx when I want to start GNOME from there. (Incidentally, I didn't like the default icky PC VGA font, so I changed it to the Sun workstation font shown in the photograph which I think befits this machine better. Just set FONT="sun12x22" in /etc/vconsole.conf.)
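If you'd rather script the font change than edit the file by hand, here is a minimal sketch. It assumes the usual layout where /etc/vconsole.conf holds simple KEY="value" lines; the helper name set_console_font is made up for this example, and the file path is a parameter so you can try it on a copy first.

```shell
# set_console_font FONT [FILE]: set FONT= in a vconsole.conf-style file.
# Hypothetical helper, not a system tool. FILE defaults to
# /etc/vconsole.conf, which needs root to modify.
set_console_font() {
    font=$1
    conf=${2:-/etc/vconsole.conf}
    if [ -f "$conf" ] && grep -q '^FONT=' "$conf"; then
        # Replace the existing FONT= line in place (sed -i is GNU sed).
        sed -i "s|^FONT=.*|FONT=\"$font\"|" "$conf"
    else
        # No FONT= line yet (or no file at all): append one.
        printf 'FONT="%s"\n' "$font" >> "$conf"
    fi
}
```

For example, set_console_font sun12x22 /tmp/vconsole.conf edits a scratch copy. The change to the real file takes effect on the next boot.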

If you don't have your GPU's firmware loaded into Petitboot, you may wish to consider doing so before upgrading to or installing F29. If this is not possible or feasible, you may find life a bit easier if you use the VGA port (make sure it is not jumpered off), particularly if you are installing from scratch. In my case, I don't have the firmware on the Petitboot side yet since I rarely work with Petitboot directly, but I did need to have the VGA jumpered back on to actually install Fedora for the first time. For the upgrade this is optional but requires a few more steps. I'll talk about this in a moment for the benefit of those with unusual video cards or issues with them during early IPL.

Rather than upgrade immediately when F29 was available, I waited a few weeks to ensure that updated packages were available from both Fedora and the RPMFusion repositories. When I was ready to upgrade from F28 to F29, I quit GNOME and returned to the text console, and started the upgrade process with the standard steps:

sudo dnf upgrade --refresh # bring all installed packages (including DNF) up to date
sudo dnf install dnf-plugin-system-upgrade # install upgrade plugin
sudo dnf system-upgrade download --refresh --releasever=29 # download F29 packages
sudo dnf system-upgrade reboot # reboot into upgrader

(GNOME Software can apparently do this for you automatically, but I prefer to kick off such updates manually. If you are installing F29 from scratch, however, install from the ISO for the Server or Everything versions as usual for ppc64le and then convert to Workstation afterwards if desired.)

If you have the firmware loaded in Petitboot or you are using the VGA, you can see the upgrade progress automatically begin after the system reboots. If you don't, however, you can still monitor the system by pressing CTRL-ALT-F2 for an alternate console, logging in as root (other UIDs are locked out) and periodically issuing

dnf system-upgrade log --number=-1

which displays the log so far. You'll see the normal dnf strings you would ordinarily see as it goes through the packages.

On my system the upgrade process (after the reboot) took about an hour and I went to bed in the middle of it. I woke up a few hours later to find the Talos at a black screen with its fans roaring at top volume, which looked like it had gone berserk during IPL when the upgrade ended with an automatic reboot. This caused a few nervous moments when I had to hard-power it off from the front power button and bring it back up. Fortunately, the machine then booted immediately into F29 and the system upgrade log showed that dnf had completed the upgrade without any errors. All of the packages I currently use seem to have made it over, so pretty much anything else you need to have on ppc64le should "just work" at this stage.

There are many small improvements in F29 but the two big ones are Fedora Modularity, allowing shipping multiple package versions on the same platform base, and GNOME 3.30. Although this problem is not specific to the Talos, the GNOME upgrade introduces issues of its own:

One minor annoyance was that the system monitor extension I use had gone a little nuts and needed to be reconfigured. The big annoyance, however, is that as you can see from the screenshot Epiphany (GNOME Web) no longer supports plugins, including the one to manage GNOME extensions. The Tweaks window demonstrates I use quite a few of them, so losing that feature really hurts their management -- especially because trying to install and use the Firefox extension instead still gives up with an error about a missing "native host connector" even after I installed and manually confirmed the native host component was present. This mattered most for Dash to Dock, which needed an upgrade, and installing GNOME extensions by hand isn't exactly an entertaining pastime.

The trick is that the Fedora package for the native host connector only includes the JSON native messaging descriptor for Chrome. Since we don't/won't use Chrome and there isn't an official release for POWER9 anyway, we have to create our own: once you have downloaded and installed the Fedora native host connector package and the GNOME shell extension for Firefox, place the contents of this gist into ~/.mozilla/native-messaging-hosts/org.gnome.chrome_gnome_shell.json and restart Firefox. Now, when you visit the GNOME Extensions site, it should simply "just work" and the browser should now be able to automatically enumerate, install, disable, enable and upgrade your GNOME extensions.
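For reference, here is a sketch of what such a descriptor generally looks like, following Mozilla's documented native messaging manifest format (name, description, path, type, allowed_extensions). It is not a copy of the gist: the connector path /usr/bin/chrome-gnome-shell and the extension ID chrome-gnome-shell@gnome.org are assumptions you should check against your installed package and add-on, and the helper name write_gnome_manifest is made up for this example.

```shell
# write_gnome_manifest [DIR] [CONNECTOR]: write a Firefox native messaging
# manifest for the GNOME Shell host connector.
# ASSUMPTIONS (not taken from the gist): the default connector path and the
# extension ID below. Verify both before relying on this.
write_gnome_manifest() {
    dir=${1:-"$HOME/.mozilla/native-messaging-hosts"}
    connector=${2:-/usr/bin/chrome-gnome-shell}
    mkdir -p "$dir" || return 1
    cat > "$dir/org.gnome.chrome_gnome_shell.json" <<EOF
{
  "name": "org.gnome.chrome_gnome_shell",
  "description": "Native connector for extensions.gnome.org",
  "path": "$connector",
  "type": "stdio",
  "allowed_extensions": ["chrome-gnome-shell@gnome.org"]
}
EOF
}
```

Calling write_gnome_manifest with no arguments writes to the per-user directory named above; pass a scratch directory as the first argument to inspect the output before installing it.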

On the whole, a week into the installation, I don't notice a great deal else, which means the upgrade was relatively uneventful overall (the most desired attribute of any system update). One thing I do notice is that running updates is now quite a bit faster (downloading a lot less metadata) and this is very welcome. Bumps are unfortunately to be expected and we should be striving for better in Linux if we want to be a viable alternative to macOS and Windows, but I'm relieved to say that at least this update wasn't much bumpier on our unusual black beast.

CentOS 7.6.1810 available


CentOS 7.6.1810 is now available based on Red Hat Enterprise Linux 7.6 (fresh off the IBM merger). Separate ISO images are available for ppc64 and ppc64le, along with a standalone build for POWER9 which should support Talos family machines. You can read the release notes for CentOS-specific information and get more information on RHEL 7.6, from which it descends.

Why we need Firefox on Talos, not just Chromium


Today's news is that apparently EdgeHTML, the layout engine for the Edge browser, is being replaced not just on mobile, not just for ARM on Windows, but even on Windows 10 itself -- with Chromium. There's more about that on our sister blog, TenFourFox Development.

This potentially allows Chromium to arrogate even more browsershare to itself, enabling Google to continue eroding support for anything that isn't Chrome. If you're using a Talos II as I am, you of all people should recognize the vulnerabilities of an architectural monoculture. We've seen that with Intel's stagnation on x86, we saw that with Internet Explorer 6, and we're about to see it again if Chromium is successful in driving Gecko's marketshare to irrelevancy.

Chromium on POWER9 exists, and apparently works; I won't use it personally for the reasons I cited above, but I salute the work that went into it. (Too bad Google doesn't seem to.) Mozilla, on the other hand, has been willing to accept PowerPC patches even after PPC OS X was no longer a tier-1 platform (which is where TenFourFox came from), and is taking patches to repair Firefox builds on ppc64le today. They've taken some of mine, and they've even taken patches to fix big-endian PowerPC and PPC64. The irony with the current big-endian issues is that they're actually with Skia, which was written by ... Google.

Mozilla has proven willing to support platforms outside their core as long as those platforms take responsibility and the support for those platforms doesn't interfere with tier-1 builds. This is an eminently reasonable policy. Moreover, with the exception of the JIT, which I'm trying to work on between TenFourFox, Christmas holidays and visiting family, Firefox exists and works and can be built. We need to remember that after Microsoft imminently outsources their browser to Chromium, Gecko (Firefox) will be the last major rendering engine that isn't Chromium or WebKit. Nothing else has enough marketshare for any other developer to think about, least of all Google themselves. If we don't act to support and preserve its marketshare on a platform where choice and freedom are part of its DNA, we may confront a future where Chromium is the only choice. And Google's already showing us what that future's going to look like.

Blackbird's on sale for Black Friday weekend!


We've just bought one of the Basic Blackbird Bundles (BK1B01, 90W 4-core POWER9 CPU, mATX mainboard, I/O plate and recovery DVD). Yes, they're on sale for $999.99, but only until 11:59:59pm Central Standard Time on Monday, the 26th! That's a $175 savings over buying the motherboard and 4-core CPU separately, even though both are also on sale over the same period: $799.99 for the board (BK1MB1) plus $375 for the CPU.

There is of course some fine print: besides the fact the bundle does not include RAM, storage or a case, the most curious omission is that there is no heat sink-fan assembly in the pack-in deal. The 4-core should do fine with the 2U HSF, which is an extra $75 (and is what we ordered), but the 8-core will require the 3U HSF (160W max supported on the Blackbird).

We'll be reviewing it when it arrives. Get it while it's hot. Note that this is a pre-order, and Raptor hasn't given any more specific date for arrival than Q1 2019.

Update: Raptor is now also offering an 8-core Blackbird bundle (BK1B02) using the 160W part for $1329, and does include a 3U HSF with that SKU.

Sometimes it's necessary: running x86_64 binaries on the Talos II


Yes, it's gross, but sometimes it's necessary. There's a lot of software for Intel processors, and there's a lot of it that you can't recompile, so once in a while you've got to get a little dirty to run what you need to.

In prior articles we've used QEMU with KVMPPC to emulate virtual Power Macs and IBM pSeries, but this time around we'll exclusively use its TCG JITted software CPU emulation to run x86_64 programs and evaluate their performance on the Talos II. For this entry we will be using QEMU 3.0, compiled from source with gcc -O3 -mcpu=power9. Make sure you have built it with (at least) x86_64-linux-user,x86_64-softmmu in your --target-list for these examples, or if using your distro's package, you'll need the qemu-x86_64 and qemu-system-x86_64 binaries.

There is also a new experimental fork of QEMU to try called HQEMU. HQEMU uses LLVM to further optimize the code generated by TCG in the background and can yield impressive performance benefits, and the latest version 2.5.2 now supports ppc64le as a host architecture. However, despite its obvious performance gains, HQEMU is not currently suitable as a total QEMU replacement: it's based on an older QEMU (2.5.x) that is missing some later features and improvements, it still has some (possibly ppc64le-specific) notable bugs, and because it needs a modified LLVM it currently has to be built from source. For this reason I recommend you have them both available and select the one that works best.

The step by step instructions for building HQEMU on ppc64le (PDF) work pretty much as is, except for the following:

  • LLVM 7.0 is not supported; I used LLVM 6.0.1. The included patch for 6.0 does not apply against 7.x. I already have clang on the system and the LLVM build system uses it by default (though ordinarily I'm one of those people who prefer gcc), so I don't know if it will build with gcc, though it should. Rather than install into /usr/local, I prefer to install into hqemu/llvm in my source directory to avoid tainting the system's version. This makes your cmake command look like this, assuming you followed the steps in the manual exactly and are in the proper directory:

    cmake -G "Unix Makefiles" \
    -DCMAKE_INSTALL_PREFIX=../../llvm \
    -DCMAKE_BUILD_TYPE=Release ..

    It takes about 15 minutes to build LLVM and clang with make -j24.

  • Not all QEMU targets are supported. x86_64-linux-user and i386-linux-user compile more or less as is, but you cannot compile x86_64-softmmu in HQEMU 2.5.2 (or any of the other software MMU targets) on ppc64le without this patch. I haven't tried any of the ARM targets, but I have no reason to believe they don't work. None of the other architectures or targets are supported. My recommended configuration command line for the T2 family is:

    ../hqemu-2.5.2/configure --prefix=`pwd` \
    --extra-cflags="-O3 -mcpu=power9" \
    --target-list=x86_64-linux-user,x86_64-softmmu \
    --enable-llvm

    It takes a couple minutes for a build with make -j24.

  • If you rebuild hqemu and you get a weird compile error while building some of the LLVM-related files, make sure that the llvm-config from the modified LLVM suite is first in your PATH (not the system one).
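A quick way to guard against the PATH problem in the last item is to prepend the modified LLVM's bin directory and confirm which llvm-config the shell resolves. This is only a sketch: the helper name use_hqemu_llvm is made up, and the prefix you pass is wherever you actually installed the patched LLVM (for instance the hqemu/llvm directory used above).

```shell
# use_hqemu_llvm PREFIX: put PREFIX/bin first in PATH so that the patched
# llvm-config shadows the system one, then print the one that will be used.
# PREFIX is an assumption about your layout, e.g. ~/src/hqemu/llvm.
use_hqemu_llvm() {
    prefix=$1
    if [ ! -x "$prefix/bin/llvm-config" ]; then
        echo "no llvm-config under $prefix/bin" >&2
        return 1
    fi
    PATH="$prefix/bin:$PATH"
    export PATH
    # Show which llvm-config the build will now pick up.
    command -v llvm-config
}
```

Run it in the same shell session you build from (not in a subshell), and rerun HQEMU's configure afterwards so it sees the right LLVM.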

I'll include a couple screenshots of QEMU 3.0 and HQEMU 2.5.2 running a benchmark in ReactOS 0.4.9 under full system emulation here; you should be familiar with using QEMU by now, so I won't discuss its use further in this article. I used the CrystalMark benchmark not because it's particularly good but because most of the typical Windows benchmarking programs don't like ReactOS. First is QEMU, second is HQEMU.

You'll notice there were some zeroes in the benchmark under HQEMU. That's because they made HQEMU segfault! Also, oddly, the ALU score was worse, but the D2D and OpenGL scores -- done purely in software -- were two to four times higher. Indeed, ReactOS is a lot more responsive under HQEMU assuming you don't do anything it doesn't like. If you need to run a Windows application on your T2 and HQEMU doesn't poop its pants while running it, it is a much faster option. Note that some or all of these numbers can improve if you have any VirtIO support in your OS and/or appropriate drivers, which I have intentionally not used here to demonstrate the worst case. You may also be able to use your local GPU with virgl. We might look into that in a future article to see how practical non-native gaming is.

Instead of dithery benchmarks in a full system emulator, however, let's try to better quantify the actual CPU emulation penalty by running a simple math benchmark under QEMU's user mode. One of the easiest is to use the command line calculator bc to compute the digits of π, which can be done by taking the arctangent of 1 and multiplying it by 4. You can then use the scale= variable to set the difficulty level, such as echo "scale=5000;4*a(1)" | bc -l, which will (slowly) compute 5000 digits of π. (It takes around 30 seconds on modern systems.)

However, when you run a foreign architecture binary, you also need each and every one of the libraries it links to from that architecture and current versions of bc have several additional dependencies. This somewhat unnecessarily complicates our little benchmark test. Fortunately, its ancestor, the venerable dc utility, has no dependencies other than libc, as shown by the output of objdump -p:

[...]
Dynamic Section:
  NEEDED               libc.so.6
[...]
Version References:
  required from libc.so.6:

To port this simple benchmark we will take advantage of the little-known fact that bc used to be simply a front end to dc; systems such as AIX and apparently some BSDs can still "compile" bc scripts to dc scripts with the -c option. I've provided the stripped down output from my own POWER6 AIX server to compute digits of π in dc as a gist. Download that and put it somewhere convenient (for the examples in this article I saved it to ~/pi.dc). Note that versions of GNU dc prior to 1.06 or so will not properly parse this script but most of the dc binaries of non-GNU provenance I've encountered will run it fine. Get a baseline by running it against your system (here, my own Talos II):

% time dc ~/pi.dc
3.14159265358979323846264338327950288419716939937508
0.004u 0.001s 0:00.00 0.0% 0+0k 0+0io 0pf+0w

(The exact format of the time command's output will depend on your shell; this is the one built into tcsh.)

Next, increase the number of digits computed by changing the line 50k to 500k (i.e., 500 digits), and time that.

% time dc ~/pi.dc
3.1415926535897932384626433832795028841971693993751058209749445923078\
164062862089986280348253421170679821480865132823066470938446095505822\
317253594081284811174502841027019385211055596446229489549303819644288\
109756659334461284756482337867831652712019091456485669234603486104543\
266482133936072602491412737245870066063155881748815209209628292540917\
153643678925903600113305305488204665213841469519415116094330572703657\
595919530921861173819326117931051185480744623799627495673518857527248\
9122793818301194912
2.393u 0.001s 0:02.39 100.0% 0+0k 0+0io 0pf+0w

Assuming those numbers look accurate, finally bump it to 1000k (1000 digits) to get a less noisy test. I'll spare you the digits.

[...]
20.833u 0.007s 0:20.84 99.9% 0+0k 0+0io 0pf+0w

Call it about 20 seconds of wall time natively (though I should note that Fedora 28 ppc64le is compiled for POWER8, not POWER9). Now, let's set up our x86_64 library root for the emulator test. Your distro may offer you these files in some fashion as a package, but I'll assume it doesn't and show you how to do this manually.

  1. Create a folder called debian-lib-x86_64. Our libraries will live here.
  2. Download the desired x86_64 (a.k.a. amd64) .deb of libc. I used the one from Jessie, but any later version should work.
  3. Uncompress it and find data.tar.xz within the .deb. Uncompress that.
  4. Within the data subfolder thus created, drill down to lib/x86_64-linux-gnu. Move that folder to debian-lib-x86_64/lib.
  5. Within debian-lib-x86_64/, create a symlink named lib64 that points to lib (i.e., ln -s lib lib64).

If you did this correctly, you should have a debian-lib-x86_64/lib with a whole mess of files and symlinks in it, and a debian-lib-x86_64/lib64 that points to the same place. Any additional libraries you need can simply be thrown into debian-lib-x86_64/lib.
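The manual steps above can be sketched as two small shell functions. The names unpack_deb and arrange_lib_root are made up for this example, and the sketch assumes you have already downloaded the .deb and that its payload is data.tar.xz as described.

```shell
# unpack_deb DEB DEST: extract a .deb's payload into DEST/data.
# A .deb is an ar archive; data.tar.xz inside it holds the files.
unpack_deb() {
    deb=$(readlink -f "$1") && mkdir -p "$2" &&
    (cd "$2" && ar x "$deb" && mkdir -p data && tar -xf data.tar.xz -C data)
}

# arrange_lib_root DATA [ROOT]: turn an unpacked payload into a library
# root (steps 4 and 5). ROOT defaults to debian-lib-x86_64.
arrange_lib_root() {
    data=$1
    root=${2:-debian-lib-x86_64}
    mkdir -p "$root" &&
    # The amd64 libraries become ROOT/lib ...
    mv "$data/lib/x86_64-linux-gnu" "$root/lib" &&
    # ... and ROOT/lib64 is a relative symlink to the same place.
    ln -s lib "$root/lib64"
}
```

With a downloaded libc package in hand (the filename here is a placeholder), the sequence would be unpack_deb libc6_amd64.deb /tmp/libc6 followed by arrange_lib_root /tmp/libc6/data ~/src/debian-lib-x86_64.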

Next, grab the x86_64/amd64 build of dc. I used the version from Buster since it matched the one on my Fedora 28 install, 1.07.1. It will work fine with the Jessie libs, at least as of this writing. Uncompress the .deb, find data.tar.xz, uncompress that, and find the dc binary within the created data folder. Move it somewhere convenient. For the examples in this article I saved it to ~/dc.amd64 and my x86_64 Debian libraries are in ~/src/debian-lib-x86_64.

First, let's test with QEMU itself. This assumes your pi.dc script is still set to 1000k.

% time ~/src/qemu-3.0.0/x86_64-linux-user/qemu-x86_64 -L ~/src/debian-lib-x86_64 ~/dc.amd64 ~/pi.dc
[...]
62.736u 0.026s 1:02.77 99.9% 0+0k 0+0io 0pf+0w

This is about three times slower than native dc, which isn't as dismal as you might have expected because all the syscalls are native instead of being emulated as well. We already know HQEMU will do this faster, but it'll be interesting to see how much so.

% time ~/src/hqemu/build/bin/qemu-x86_64 -L ~/src/debian-lib-x86_64 ~/dc.amd64 ~/pi.dc
[...]
45.181u 1.976s 0:27.40 172.0% 0+0k 0+0io 0pf+0w

Yes, 172% CPU utilization because of HQEMU's background optimization threads, but wall clock time is only 27 seconds! That's "only" about a third longer than native!
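To put exact numbers on the penalty, divide each emulated wall time by the native one. A small sketch using awk for the floating-point division (slowdown is just a name picked for this example; the timings are the ones measured above):

```shell
# slowdown EMULATED NATIVE: print emulated/native wall time as a ratio.
slowdown() {
    awk -v e="$1" -v n="$2" 'BEGIN { printf "%.2f\n", e / n }'
}

slowdown 62.77 20.84   # QEMU 3.0 vs. native dc
slowdown 27.40 20.84   # HQEMU 2.5.2 vs. native dc
```

This prints 3.01 and then 1.31, so with the unrounded timings HQEMU's wall-clock penalty works out to roughly 31% against about 200% for plain TCG.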

Do note that HQEMU's optimization isn't free. If we reduce the number of digits back down to 50 (i.e., 50k), we see this:

% time ~/src/qemu-3.0.0/x86_64-linux-user/qemu-x86_64 -L ~/src/debian-lib-x86_64 ~/dc.amd64 ~/pi.dc
3.14159265358979323846264338327950288419716939937508
0.048u 0.002s 0:00.05 80.0% 0+0k 0+0io 0pf+0w
% time ~/src/hqemu/build/bin/qemu-x86_64 -L ~/src/debian-lib-x86_64 ~/dc.amd64 ~/pi.dc
3.14159265358979323846264338327950288419716939937508
0.164u 0.016s 0:00.17 100.0% 0+0k 0+0io 0pf+0w

In this case, HQEMU is about three times slower than regular QEMU because of the LLVM optimization overhead over a very brief runtime. This example is still a nearly imperceptible seventeen hundredths of a second in wall clock terms, but if your workload consists of repeatedly running an alien architecture binary with a short execution time over and over, HQEMU will cost you more. Admittedly, I can't think of too many workloads in this category, but I'm sure there are some.

The take-away from this is that if you have a Linux binary from an x86_64 system and you can collect all the needed libraries, it has an excellent chance of at least working, and if it's something HQEMU can run, working with a relatively low performance penalty. The trick, of course, is collecting all those libraries, which could be a quick trip to dependency hell, and messing around with binfmt for transparent execution is left as an exercise to the reader. Full system emulation still has a fair bit of overhead but it's easy to set up and generally liveable, even in pure TCG QEMU, so you can do what you need to if you have to. Now go take a shower and wash all that Intel off.
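If you do want a head start on that binfmt exercise, here is a hedged sketch that only prints a binfmt_misc registration line rather than installing it. The magic and mask strings are, to the best of my knowledge, the ones QEMU's own scripts/qemu-binfmt-conf.sh uses for x86_64, but verify them against your QEMU source tree; the helper name binfmt_x86_64_rule is made up for this example.

```shell
# binfmt_x86_64_rule INTERP: print a binfmt_misc rule that hands x86_64
# ELF executables to INTERP (the full path to a qemu-x86_64 binary).
# Field layout is :name:type:offset:magic:mask:interpreter:flags
# ASSUMPTION: magic/mask match QEMU's qemu-binfmt-conf.sh; double-check.
binfmt_x86_64_rule() {
    interp=$1
    magic='\x7fELF\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x3e\x00'
    mask='\xff\xff\xff\xff\xff\xfe\xfe\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff'
    # %s keeps the backslash escapes literal; the kernel decodes them.
    printf ':qemu-x86_64:M::%s:%s:%s:\n' "$magic" "$mask" "$interp"
}
```

To actually register it you would, as root and with binfmt_misc mounted, redirect the output into /proc/sys/fs/binfmt_misc/register. Since binfmt_misc can't pass -L for you, the library root has to come from the QEMU_LD_PREFIX environment variable instead (e.g. pointing at ~/src/debian-lib-x86_64).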

Would you pre-order a Blackbird for $875?


Would you pre-order the new "Tiny Talos" Raptor Blackbird for $875? (Hint: compare this to the new weaksauce Space Gray Mac mini and you tell us what you'd rather buy.) We're planning to because we think this is a fabulous deal on powerful user-controlled hardware and a much lower barrier to entry to the POWER9 ecosystem, and we'll be reviewing it right here to see how viable the low-end spec can be. Do note this is mainboard cost, and while the Blackbird has lots of on-board peripherals, the RAM, storage and (probably) CPU will be extra (the base 4-core POWER9 right now appears on Raptor's site for $375). Either way, tell Raptor your interest level on their straw poll.

Roadgeeking with the Talos II (or, Workstation FUD and Loathing)


Now that Phoronix has published very impressive comparison benchmarks between their 2x22 Talos, AMD Threadripper and Intel Core i9, the next bit of pooh-poohing is "ppc64 (and LE) aren't ready for workstation duty."

FUD. Absolute FUD.

It's definitely true that big-endian systems are sometimes a little rockier to work with (we know all about this from our TenFourFox gig, of which this blog is a spinoff) because after the Macintel transition there are a lot fewer big-endian workstations on developer desktops. This isn't to say that running the Talos II big-endian is impossible, however: much, even most, of the software out there works fine, and if you're prepared to put up with a hiccup now and then and/or pitch in with fixes where needed, it's perfectly liveable.

But this isn't the case for little-endian PPC64, which works well. A few major apps aren't fully there, though some of it is administrative screwaroundry like Google being Google about accepting PPC (and MIPS) changes back to Chromium. On the other hand, although JavaScript in ppc64le Firefox is currently interpreted (I'm done with the first draft of the Ion code generator for the POWER9 JIT, and I'm now working on the trampoline, but this is still several months away), it works fine with the right configuration options and I'm told it still works fine on big-endian PPC64 too. My F28 Talos runs VLC, LibreOffice, Krita, GIMP, QEMU and many other essential apps (like ioquake3) out of the box with the distribution-provided packages. It will only get better as people see the advantages of OpenPOWER for a truly free workstation experience.

Still, the goalposts gotta move because haters gotta hate. So here's an example of demonstrating the viability of the Talos II on the desktop with a decidedly unusual hobby: roadgeeking.

Roadgeeks like yours truly drive miles and miles over weird roads to photograph signs and scenery, trace alignments and routings, and annoy highway departments with detailed questions. As a baseline, that work requires the ability to view mapping applications (such as Google Maps, Bing Maps, Open Street Maps, etc.), send mail, and read and work with spreadsheets of mileage and documents and publications. Check, check, check. I use Firefox, LibreOffice and the GNOME Document Viewer for that.

What about photography? Previously I took shots manually with a hand camera, stopping at intervals to grab a picture. This is really hard work and requires multiple passes on the road because of stuff you miss and inclement weather conditions, and sometimes some rather unsafe setups to get the right view on something, but nothing beats manual work for great images and it avoids the windscreen goo and bad angles hand camera shots at highway speed tend to have. Dumping images from the CF card is part of the basic functionality of any Linux system, and the raw images are then formatted and minimally reprocessed for the web with a shell script and ImageMagick (also works fine on PPC64 and ppc64le). Although I really try to avoid retouching to maintain maximum veracity, any editing required was formerly done in Photoshop on my long-suffering Quad G5 but now can be done in Krita (also works fine on ppc64le). Here is my trip on US Highway 395 from San Diego to British Columbia, and part of my trip of US Highway 6 -- still America's longest continuous highway -- from Bishop, CA to Nebraska (the remainder are all on a hard disk and their publication is pending my full writeup). These were all done by hand and I sometimes still do this for shorter alignments.

The labour-intensive nature of this means of photography, however, demands a better technological solution. As a very modern roadgeek, today I use a "flying camera" system -- a high-definition video camera running at 30 frames/sec, taking progressive non-interlaced video at a very fast shutter speed, about 1/2000th of a second. This enables live views of the road from the driver's perspective at speeds up to 70mph, and every frame is potentially a high-definition still because of the progressive video and fast shutter speed. Although a 1080p image (effectively 1440x1080 upscaled to 1920x1080) is still poorer resolution than even most mobile phone cameras, let alone a point-and-shoot or DSLR, anything the camera can see is captured, I don't have to stop for images, I can stay at highway speeds, I can still adjust zoom and some of the positioning on the fly (or switch lanes as required), and certain shots would not even be possible without such a setup. When I eventually upgrade the camera, the resolution will only improve. Plus, HDV 1080p is perfectly acceptable resolution for the web even nowadays and I can always stop for a manual shot with a proper camera if I want something larger. The only major drawback is that the fast shutter speed requires a lot of light or it gets grainy, limiting photography to times of day and year when there is sufficient ambient sun.

I devised this system originally for my Quad G5 running Mac OS X 10.4, which doesn't have native AVCHD support in Final Cut or QuickTime but can support HDV (MPEG-2), so I selected the Canon Vixia HV30 which stores HDV to MiniDV tape and has FireWire output. Final Cut then acquires this video over FireWire and converts it to Apple Intermediate Codec which is display and edit-ready. Later I got a Focus FS-CF Pro DTE digital recorder which dumps the HDV video to QuickTime on CF cards; the HDV file can then just be copied directly from the CF card, and HDV yields smaller files than the edit-friendlier AIC which is an advantage for archiving. The G5 can play HDV video in QuickTime using the codec from recent PowerPC-compatible versions of Final Cut Pro, and of course the codec is available on the Talos II also. (I still use boxes of MiniDV tapes for long trips where I don't have enough CF cards, and a nice side benefit is that they double as "instant tape backup," but solid-state is just the way to go for this nowadays.) You can see the setup I use at the right. Note the black felt on the dashboard to reduce reflections, a suction-cup mount to the windscreen, and a 15% red tint on the lens to compensate for the green cast from the safety glass. The entire setup is powered by batteries and the car's 12 volt output for recharging.

I'm now in the process of converting my old US 395 exhibit to the new flying camera format (as a bonus, 16:9 instead of the old 4:3 aspect ratio, so landscapes just look more impressive, too) and recently drove the old alignments between Bishop, CA and Carson City, NV, and then the trans-Sierra Ebbetts Pass (CA 4) from Markleeville, CA to Angels Camp, CA and back to southern California. We had about eight hours of footage to sort through on our first trial of using the Talos II to ingest video instead of the G5. As a guard against data corruption the FS-CF DTE is programmed to dump video to the card in 2GB blobs, or about 10 minutes of footage each. With a USB 3.0 card reader and USB 3.0 hard drive dock we copied the HDV videos in Nautilus from the CF card to a spinning hard disk for archiving. As it turns out one 10 minute segment was corrupt on the card but it's a section we can easily rephotograph. Our Talos during the video ingest (show us yours) is at right.

For frame grabs, previously I would play the AIC or HDV file in QuickTime Player on the G5 and advance quickly up to a scene and go frame by frame until there was a nice image to use. I'd then grab the frame and save it as an image clipping, which I converted to PNG with AppleScript. (The CMOS image sensor "shimmy" actually helps here, because if I was slightly off position or rotated, sometimes the "shimmy" caused by the car's vibration would correct the geometry from frame to frame.)

With VLC, the same workflow is possible. I use the scroll wheel to scrub through the video and then hold down the advance-frame key to slow down until I get to the spot I want. It looks like this on the Talos (click for an enlarged screenshot):

The workflow is actually better with VLC, in fact, because the screen grabs are already PNG. I mentioned some shots would be impossible without the "flying camera." Here's one of them:

This unretouched screen grab from VLC was captured in Carson City, Nevada from the middle of the road at 55mph at about 5pm on an October day. It's as if I froze time, walked to the middle of a busy highway transitioning to full Interstate freeway, and took a picture right there. The slight red cast due to the changing light conditions with the lens tint is compensated for with a little white balance, and the highway gantry is ever so slightly distorted, but this image is pretty much ready to go. And it was acquired with a full libre stack of software and hardware. ImageMagick and Krita will do the rest of what needs to be done and the write-up will be done in Firefox.

Yes, this is my hobby. But it's also a great demonstration that doing this kind of work wouldn't be possible on the Talos II without the application support to match. That must mean the application support is already here. The handful of leftover glitches are disappearing by the day, and most everything else works as-is out of the box on most distros supporting ppc64le. If the sticker shock of a full T2 still gives you the shakes and even a T2 Lite is more than you had in mind, then hang out for the Raptor Blackbird next year, which will give you a taste of freedom for lots less green. No matter what your budget, the price points are improving just as fast as the software options.

The point is made, though: POWER9 is desktop-ready now. Anyone who says otherwise hasn't used one.

LaGrange system in the works?


We're all very jealous that Phoronix gets to play with a dual 22-core Talos II (we're just dual 4-core pikers here), but from the comments thread comes the mention of a possible future LaGrange-based system. All of the current Talos family (the Talos II, T2 Lite and upcoming Blackbird) use Sforza POWER9 processors, which currently offer the best flexibility for workstation, workstation-like and low-to-midrange server systems with 48 PCIe 4.0 lanes. However, Sforza "only" has half the memory bandwidth of the bigger beasts with "just" 4-channel DDR4 and a single X-bus SMP link, limiting such systems to two CPUs maximum. LaGrange, by contrast, has "only" 42 PCIe 4.0 lanes, but has 8-channel DDR4 and double the X-bus, making a 4-CPU system possible. LaGrange systems are already in use by Google and Rackspace for their Zaius/Barreleye designs. With SMT-4 and 22 cores, such a system could max out at a whopping 352 threads and would very clearly be positioned against AMD's offerings.
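The 352-thread figure is simply sockets times cores per socket times SMT threads per core. As a quick sanity check, here is the arithmetic in shell; total_threads is a made-up helper name, and the 4-socket configuration is the speculative one discussed above, not an announced product.

```shell
# total_threads SOCKETS CORES_PER_SOCKET SMT_THREADS_PER_CORE
total_threads() {
    echo $(( $1 * $2 * $3 ))
}

total_threads 4 22 4   # hypothetical 4-CPU LaGrange system with 22-core parts → 352
total_threads 2 4 4    # our dual 4-core SMT-4 Talos II → 32
```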

As an aside, those of you who know the entire Nimbus family may be wondering where Monza fits in the Talos product line, and our opinion is currently it doesn't. While Monza has 8-channel DDR4 and more relevantly the best OpenCAPI and NVLink interconnect bandwidth of the three chips, making it an excellent choice for large multinode systems like the gargantuan Summit supercomputer, it pays the price in just 34 PCIe 4.0 lanes. Raptor's current product line wouldn't seem a good fit for its more limited expandability, and Raptor systems aren't currently cheap enough to realize Monza's strength in clustering.

The mention is unofficial and no other details are available, including specs, price or release date, but we'll keep watching.

Making your Talos II into an IBM pSeries


This post has been updated with new information. Thanks to Zhuowei Zhang, the author of the post we reference, for pointing out QEMU did add SMP support for emulated pSeries hardware. Read on.

In our previous series on turning your Talos II into a Power Mac, we spent most of our time with the KVM-PR virtualizer, the "problem state" version of KVMPPC, which is lower performance but has no hardware dependencies and can emulate a great number of historical Power CPUs (including the G3 and G4, which were of most relevance to those articles).

Recently, however, someone pointed me to this blog post on running IBM's proprietary AIX operating system under QEMU and asked about how well this would work on the Talos II. AIX runs on IBM's own POWER hardware and thus affords a good opportunity for exploring KVM-HV, the hardware-assisted hypervisor flavour of KVMPPC, so let's find out.

Parenthetically I should say that I have a very long history with AIX: my first job out of college in 1997 was mostly working on a medium-size PA-RISC university server running HP-UX 10.20, but we also had a number of RS/6000 machines for E-mail running AIX 3.2.5 that I had access to as well. The RS/6000s are, of course, early implementations of the POWER architecture. In 1998, I ended up with an Apple Network Server 500 running AIX 4.1.4 (and later 4.1.5) that became the first floodgap.com until it was decommissioned in 2012. Its replacement was a 2-way SMT-2 IBM POWER6 p520 Express running AIX 6.1 TL.mumble with some hand-rolled patches, and this system still runs floodgap.com and gopher.floodgap.com today. I also have a couple of the oddball PowerPC ThinkPads, a ThinkPad "800" whose SCSI controller fuse got blown by a SCSI2SD upgrade, and a fully functional ThinkPad 860 with a German keyboard running AIX 4.1.5 as well.

I should also add that the licensing situation with AIX on non-IBM hardware is sticky. I may give the lawyers a heart attack with this oversimplification, but the salesdroids I worked with back in the day essentially had the rule that if you own IBM hardware that can run AIX, then you may run it, because you were considered to have an implicit license simply by possessing the hardware. This situation changed after IBM introduced pSeries hardware that was not allowed to run AIX, starting with the original POWER5 OpenPower machines: although they are IBM hardware, they are not licensed for AIX, even if you allegedly could coerce AIX to run on at least a subset of these machines with some work.

This handwavy "some work" is what QEMU provides. There is enough of a pSeries-like environment to at least boot AIX, though some pieces are still missing and the kernel appears able to detect it's running under QEMU. However, whether it functions or not, it may not be legal to run an AIX installation on an OpenPOWER or PowerNV system like the Talos II even under virtualization because OpenPOWER and non-IBM Power ISA systems are explicitly not licensed for AIX. IBM is unlikely to come after you if you're just playing around with it, but you have been warned.

First of all, make sure your system is able to run QEMU under virtualization. You should be running at least kernel version 4.18 (my Fedora 28 T2 has 4.18.16) and QEMU 3.0. Check that kvm_hv shows up in lsmod to make sure it has loaded; if it hasn't, try sudo modprobe kvm_hv to make sure the modules are enabled (check dmesg if you get errors). You shouldn't need to make any modifications to the module for this tutorial. There shouldn't be any problem if your kernel boots in HPT rather than radix MMU mode, as mine does in order to enable KVM-PR.
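If you'd rather script these checks than eyeball them, something like the following sketch should do. The helper names (version_at_least, kvm_hv_loaded, check_host) are invented for this example, sort -V assumes GNU coreutils, and the QEMU version check is left as a manual step.

```shell
# version_at_least HAVE WANT: succeed if dotted version HAVE >= WANT.
version_at_least() {
    # sort -V orders version strings; if WANT sorts first (or ties), HAVE is new enough.
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# kvm_hv_loaded: succeed if lsmod-style text on stdin lists the kvm_hv module.
kvm_hv_loaded() {
    awk '$1 == "kvm_hv" { found = 1 } END { exit !found }'
}

# check_host: print a warning for each prerequisite from the text that is not met.
# Not called here; run it yourself on the machine you want to check.
check_host() {
    version_at_least "$(uname -r | cut -d- -f1)" 4.18 ||
        echo "kernel is older than 4.18" >&2
    lsmod | kvm_hv_loaded ||
        echo "kvm_hv not loaded; try: sudo modprobe kvm_hv" >&2
}
```

Run check_host after sourcing this; no output means both the kernel version and the module look fine.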

Next, get bootable media. Although I have a set of install discs for AIX 7, the version I have is too old to boot on POWER9 systems (it's intended for my POWER6, when I get around to it), so for this demonstration we'll simply use the diagnostic image that the author of the blog post above uses. Although any of the diagnostic images compatible with POWER9 will work, download the CD72220.iso image to use the patch tool that author offers. This enables you to boot to a limited root shell to snoop around the filesystem. I haven't gotten around to updating the patcher for the more recent images, but this one will suffice for our purpose.

QEMU provides a graphical console and USB keyboard, but just like a real IBM system, only specific IBM-supplied devices are supported as the AIX console terminal (my own POWER6 requires a particular IBM USB keyboard and mouse, naturally provided at a confiscatory markup, to drive a console powered by a GXT145 graphics card). Since QEMU doesn't know how to provide these devices yet, we'll tell QEMU to provide an emulated serial terminal connected to one of the emulated system's VTYs instead, which will "just work." This emulated serial terminal is provided in the terminal session you run QEMU from, not the main QEMU window.

AIX will boot under TCG, the built-in JITted CPU emulation system. This is very slow but will demonstrate the speed differential versus running with hardware assistance. The same command line provided in the original blog post will work here too (I recommend keeping verbose booting enabled if you run with TCG so you can be reassured QEMU hasn't frozen); substitute your ISO filename below:

qemu-system-ppc64 -cpu POWER9 -machine pseries -m 2G -serial mon:stdio -cdrom iso/aix-72220-patched.iso -d guest_errors -prom-env "input-device=/vdevice/vty@71000000" -prom-env "output-device=/vdevice/vty@71000000" -prom-env "boot-command=dev / 0 0 s\" ibm,aix-diagnostics\" property boot cdrom:\ppc\chrp\bootfile.exe -s verbose"

When QEMU starts, just stay in the terminal session and minimize its graphical console; you won't be using it. Booting under TCG takes about seven minutes on my 32 thread (dual 4-core SMT-4) Talos II with QEMU built with -O3 -mcpu=power9. As the original author indicates, the boot will stall for some minutes (about six on my system) at the define_rspc step. You'll also notice four-digit hex codes appearing at the bottom of the terminal session representing the state of the bootloader which any AIX admin will recognize (real IBM hardware and the Apple Network Server display this on a front LCD or LED panel). Once the system prompts you to press 1 and press ENTER, do so, and it will either enter the diagnostics menu or the root shell depending on if you're using the patched ISO or not. This is sufficient to show it basically works but you will already appreciate this is dreadfully slow for any task of substance.

So, kill the QEMU process (or close the graphical console window) and let's bring it up with KVM-HV this time. SMP is supported, so let's give it four cores while we're at it to start with. You can continue to use a verbose boot if you want but this starts up so quickly you'll probably just find the messages annoying. As above, substitute your ISO filename below (if you get an error saying that the KVM type isn't supported and you know that kvm_hv is loaded, try booting it with just accel=kvm):

qemu-system-ppc64 -M accel=kvm,kvm-type=HV -cpu host -smp 4 -machine pseries -m 2G -serial mon:stdio -cdrom iso/aix-72220-patched.iso -d guest_errors -prom-env "input-device=/vdevice/vty@71000000" -prom-env "output-device=/vdevice/vty@71000000" -prom-env "boot-command=dev / 0 0 s\" ibm,aix-diagnostics\" property boot cdrom:\ppc\chrp\bootfile.exe"

Notice that we are using -cpu host. KVM-HV only supports virtualizing the actual CPU itself or the generation immediately before (-cpu power8 thus should work, but not -cpu power7 or before).
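Since that command line is a mouthful, you might wrap it in a small shell function. This is just a sketch built from the exact options shown above: the function name boot_aix_diag and the QEMU and ACCEL_OPTS override variables are my own inventions for illustration, not anything QEMU provides.

```shell
# boot_aix_diag ISO [CORES]: boot the diagnostics ISO under KVM-HV using the
# same options as the command line in the text.
#   QEMU        (hypothetical override) binary to run; defaults to qemu-system-ppc64
#   ACCEL_OPTS  (hypothetical override) value for -M; set to accel=kvm if the
#               kvm-type is rejected, as suggested in the text
boot_aix_diag() {
    iso=$1
    cores=${2:-4}                    # the text starts with four cores
    vty=/vdevice/vty@71000000        # emulated serial console used above
    "${QEMU:-qemu-system-ppc64}" \
        -M "${ACCEL_OPTS:-accel=kvm,kvm-type=HV}" -cpu host -smp "$cores" \
        -machine pseries -m 2G -serial mon:stdio \
        -cdrom "$iso" -d guest_errors \
        -prom-env "input-device=$vty" \
        -prom-env "output-device=$vty" \
        -prom-env 'boot-command=dev / 0 0 s" ibm,aix-diagnostics" property boot cdrom:\ppc\chrp\bootfile.exe'
}
```

With this sourced, boot_aix_diag iso/aix-72220-patched.iso starts the four-core guest, and ACCEL_OPTS=accel=kvm boot_aix_diag iso/aix-72220-patched.iso 2 tries the fallback accelerator setting with two cores.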

Once started, this virtualized boot shoots straight to the "press 1 on console" message in about 50 seconds on my box (!!), and all the way to the diags menu/root shell prompt in just under one minute. Much faster! As you explore the command line, do note that there are many missing binaries in the miniroot the diags disk provides and the terminal emulation (and my delete key: I manually backspaced with CTRL-H) has many glitches. This is to be expected since this disc was never meant to provide a shell environment and the components of the miniroot exist only to support the diagnostics front end. (In addition, it is not possible to actually configure the terminal correctly from the diags menu and therefore do anything useful, probably due to missing support in QEMU. Even if you enter a valid terminal type, the diagnostics front end will continue to complain the terminal was improperly initialized and prevent you from doing anything further.)

Nevertheless, once you get a root shell up, it's interesting to compare lsattr -E -lsys0 on real IBM hardware and on this emulated system. On my POWER6, here are some selected entries (I censored the system ID from the hardware VPD, nothing personal):

ent_capacity 2.00           Entitled processor capacity
frequency    2656000000     System Bus Frequency
fwversion    IBM,EL350_149  Firmware version and revision levels
modelname    IBM,8203-E4A   Machine name
systemid     IBM,{censored} Hardware system identifier

But some values are definitely different (and occasionally abnormal) on the emulated pSeries system. Some are even missing outright despite having a placeholder. Here are the corresponding ones from our virtualized 4-core box:

ent_capacity 4.00                           Entitled processor capacity
frequency                                   System Bus Frequency
fwversion    SLOF,HEAD                      Firmware version and revision levels
modelname    IBM pSeries (emulated by qemu) Machine name
systemid                                    Hardware system identifier

The difference in entitled processor capacity is due to our command line options, but the system bus frequency is oddly unreported and the various other identifiers have different values or are unpopulated. This is possibly how the kernel was able to detect it's running under virtualization.

If you're curious what other hardware support is present, lsdev looks like this (with the given command line):

# lsdev
L2cache0   Available       L2 Cache
cd0        Available       N/A
mem0       Available       Memory
pci0       Available       PCI Bus
proc0      Available 00-00 Processor
proc8      Available 00-08 Processor
proc16     Available 00-16 Processor
proc24     Available 00-24 Processor
rcm0       Defined         Rendering Context Manager Subsystem
sys0       Available       System Object
sysplanar0 Available       System Planar
vio0       Available       Virtual I/O Bus
vsa0       Available       LPAR Virtual Serial Adapter
vscsi0     Available       N/A
vty0       Available       Asynchronous Terminal

The (in)famous AIX smit system configuration tool can be made to work from the command line; try something like TERM=vt100 /usr/bin/smitty to start it. As we say in the biz™, "smit happens."™ Use CTRL-L to repaint the screen if needed; if you see key combinations like "Esc+0," press ESC, release it, and then quickly press the second key. Note that this version of smit is missing quite a few screens and not everything does anything.

To bring down the system cleanly, not like it really matters here, just type exit at the shell, eject the virtual CD if you want to (Y or N), and then indicate to halt the system (H). AIX will respond with Halt completed and QEMU will automatically exit.

IBM used to be a lot more interesting with AIX. AIX 4 in particular offered a lot of workstation features and even a few games (my ANS 500 has AIX ports of Quake and Abuse on it), but modern versions are intended as buttoned-down server OSes and any client functionality is either accidental or secondarily grafted on. That said, after AIX 5L it got a lot easier to build stuff on AIX (either with xlc or gcc) and my full-service POWER6 (web, gopher and E-mail) runs a good collection of servers and utilities I ported myself plus all my old binaries I built on the Apple Network Server without comment. AIX is definitely different (and arguably staid and humourless) and its underpinnings such as the ODM may not be immediately familiar, but it's a tough OS that can take punishment and run like a tank, and I have to admit that I do love the jackboots. Despite having my own real hardware, it is fun to see it boot and run on the Talos even if only in a limited sense.

Fedora 29 out


The IBM merger may be in, but Fedora 29 is out, and it's business as usual at Red Hat with its release today. Fedora is our distro here at Floodgap-Talospace and we'll be updating soon with a review focused on F29 and how it works on the Talos II.

Under the Fedora Alternative Architectures are server and "everything" install images for both big-endian ppc64 and ppc64le (though I still believe that ppc64's lifetime is limited on Fedora, and it's highly possible this issuance will be a dead end). You can then immediately download the components to turn these into workstation releases (I did a "quick" dnf install @workstation-product-environment on my own system but you can also go into gory detail).

More coverage in Fedora Magazine with update instructions (from F27 to F28, but the same steps will work from F28 to F29).

Also announced in parallel: Red Hat Enterprise Linux 7.6.

IBM buys Red Hat


Red Hat has been purchased by IBM in a deal worth US$33.4 billion or approximately US$190/share, well above its Friday closing price of US$116.68. The deal is clearly positioned to diversify IBM away from its stable but slow-growing legacy mainframe line of business, which recently had been the only major portion of the company with a positive showing, and jump-start its ailing cloud offerings.

What this means for Red Hat Enterprise Linux (RHEL), the company's premier enterprise offering, is continued stability for large customers and their service contracts. IBM has always served its large accounts well and this will increase confidence in using IBM server hardware outside of the traditional AIX and Z shops. It also almost certainly indicates an enhanced commitment to RHEL on IBM's POWER hardware, which due to being a Linux on Power partner was already the preferred Linux option on hardware IBM sold directly, and IBM may push RHEL to OpenPOWER hardware customers as a product more strongly in the future.

Fedora is also very unlikely to be affected by this: IBM has made substantial contributions to open source over the years, and fumbling a high-profile project like Fedora which even Linus himself uses (at least of late) would be an own goal of such epic proportions that even the sometimes brazenly incompetent IBM upper management would probably not do it. However, given that as far back as 2016 IBM said that ppc64le was the future for Linux on POWER, this all but guarantees big-endian ppc64 will not return to Fedora after its removal in the upcoming F29. (We run F28 on our own Talos II.)

The outlook for the various Fedora and Red Hat downstreams like CentOS is less clear, but it seems reasonable to assume that if Fedora remains intact, they will also continue to do so (albeit potentially with less official support post IBM-Red Hat, and it is possible CentOS developers who work for Red Hat may not be so employed after the merger). There is no word on what this means for Red Hat's other products.

From the perspective of the OpenPOWER ecosystem and the Talos systems specifically, there are good reasons for optimism. It more firmly weds IBM and Power ISA to Linux and will likely elevate ppc64le to a tier-1 offering within the Red Hat and Fedora ecosystem to further IBM's strategic goals with OpenPOWER. To the extent the underlying work makes its way back to the Linux source tree, it in turn can trickle down into better POWER9 and Power ISA support in any Linux distribution that chooses to take advantage of it, and that eventuality can only be positive. Unfortunately IBM management has recently become more ossified and less visionary than ever, and as a result the company has not shown good performance and innovation outside of its core large system competency and its research labs. The old joke during the Power Mac AIM (Apple-IBM-Motorola) alliance days was Apple plus IBM equals IBM, but no one was really laughing. Red Hat certainly has its own bureaucratic issues to deal with, but our worry is that the IBM monolith will affect Red Hat far more than the other way around.

Patches needed for Firefox 63


Unfortunately the most recent available package of Firefox 63 for Fedora 28 ppc64le doesn't even start, but again, it works fine if you build it from source. If you were able to successfully build Firefox 62 with our .mozconfigs, then you will need to install an updated cbindgen (preferably from your package manager, but I actually had to cargo install cbindgen to get a version recent enough that the build system would accept), node.js (?!), and the patches from bug 1494037 and bug 1498938. (The issue with the Fedora package has been filed as Red Hat bug 1643729.)

Meanwhile, I'm about halfway through the code generator on my task to add a POWER9 JIT to Firefox based on the work in TenFourFox. Going from a G5 to POWER9 is a big jump. Lots of delicious new instructions.

Chromium for ppc64le builds available


While waiting for progress on accepting their patches to the tree, the ppc64le Chromium team has binary builds available.

Blackbird scheduled for availability in Q1 2019


From Twitter, Blackbird exists in hardware, and is estimated for general availability in Q1 2019. Our spies at OpenPOWER found the unit to be well-developed and functional, but a solid first impression obviously is no substitute for a full review. I'll be purchasing one to look at how well POWER9 systems can cover the low-end, as well as a test system for development work, and we'll be reviewing it here. The price is still planned to be less than the Talos II Lite, but no word on exactly how much less.

Ubuntu 18.10 now available


As reported in the official announcement, Ubuntu 18.10 is now out of beta and with many useful changes. A server install ISO image is available for ppc64le, which you can then convert to the desktop flavour.

MXE fix for ppc64le


One of my favourite tools is MXE, which ably builds Windows applications on your platform of choice as shown in the screenshot, and offers many libraries and toolkits such as SDL, SDL2, Qt and NSIS. (I use it for OverbiteNX, for example.) However, the current version does not build on ppc64le because the gcc cross-compiler needs a later patch for POWER8+ which wasn't backported. Now it's been backported. While waiting for the pull request to enter the main tree, you can pull from our GitHub fork if you want to try it now.

The Talos II really performs nicely in QEMU, even with pure TCG emulation of the x86. ReactOS boots very smoothly on it.

Initial Blackbird specifications announced


Raptor in a series of tweets has made initial announcements about the specifications of the new Blackbird system. As expected, it is a single POWER9 CPU system with two ECC DDR4 2.666GHz RAM slots, two PCI slots (x16 and x8), onboard HDMI via an AST2500, same NIC as the Talos II, 4x onboard SATA and 5.1 sound with S/PDIF out. Combine that with an estimated power consumption of under 100W (with a 4-core CPU), which is very welcome, and you have a system that can live in many more settings than a desktop workstation. Neither the maximum core count nor a price has been announced yet, but our guess is that the Blackbird will top out at 8 cores.

We're planning to preorder one of these and we'll review it (and compare it with the big T2) here.

Update: A Raptor Wiki entry is now available which seems to confirm that 8 cores is the limit on this machine "due to power delivery limitations." In addition, 8-core systems have a slightly slower all-core turbo frequency. Raptor says on Twitter they "should in fact be able to support the full 8 core device at listed clock speeds."

Raptor confirms Talos II not subject to Supermicro chip hack


Bloomberg dropped a bomb earlier today alleging Chinese state actors compromised thousands of Supermicro motherboards by infiltrating the supply chain to insert tiny, almost undetectable chips as exfiltration hardware. The chips, manufactured by the Chinese military, were designed to look like innocuous board components but actually contained memory, networking and sufficient processing power to apparently exploit the machine's BMC at a very low level. The devices could literally do almost anything, and do so in a way that could be nearly undetectable.

It should be said in the interest of journalistic accuracy that Apple, which jettisoned Supermicro servers from its data centers in 2016 for reasons it said were unrelated to this issue and which disputes the account, and Amazon, which vehemently denied the report, have both attacked the article (as has Supermicro, of course). Nevertheless, we are informed by Raptor today that the Raptor systems, from the Talos II to the brand-new Blackbird, are designed and manufactured in the United States and are not subject to this issue. In addition, Raptor verifies manufactured boards against their own schematics, and OpenBMC as used in the T2 family is completely open-source. The Supermicro case that the T2 comes in has not been reported to be affected, and so far no malicious components have been identified in the power supplies or power routing systems, nor are we able to currently detect any in our system at Floodgap Talospace.

"Tiny Talos" reveal scheduled for tomorrow


The mATX "Tiny Talos" has a reveal date tomorrow at OpenPOWER, and seems to have an official name: Blackbird.

Updated: It's official! Specs to follow.

Ubuntu 18.10 beta announced


The beta release of Ubuntu 18.10 "Cosmic Cuttlefish" is now available. The release notes are currently a little sparse but the update is good news for our *buntu audience. As we previously reported, Ubuntu 18 is the first release to support the POWER9 and should "just work." You can download the server image, which you should be able to convert to a workstation installation.

Fedora 29 Beta announcement


The official Linux distro here at Floodgap-Talospace is Fedora, and we use 28 on our own system. Fedora has now made an announcement for the F29 beta, scheduled for final release by the end of October. Among other improvements in F29 are expanded modular repositories and GNOME 3.30, along with other updated core libraries.

Oddly, the images available for download do not, as of this writing, include any build for ppc64le. However, a ppc64le release is planned for the Server flavour. You can download beta Server images for ppc64le, which you can then turn around and convert into Workstation.

Chromium on POWER9 ready to land


As reported on the Chromium development mailing list, patches are ready to land to make Chromium production-ready on the Talos II and other POWER9 systems. The last remaining blocker in libvpx was worked around by disabling VSX-AltiVec support, presumably temporarily (the patches TenFourFox uses for AltiVec VP8 and VP9 are maintained independently and are not part of libvpx). This includes a functional JavaScript JIT using V8.

Chromium is an important dependency for many tools, such as QtWebEngine, so this port definitely improves the T2's viability and compatibility as a workstation. However, I'll say as a personal note, and with full disclosure as a long-time member of the Mozilla community, moves like Chrome 69's forced integration with Google web services (even cookies won't be cleared unless you log out) continue to make me unwilling to support this project myself even though I'm glad it exists. There are alternatives like Ungoogled Chromium, at least for right now.

Firefox doesn't have a JIT, but it also doesn't have Google or its general untrustworthiness and it otherwise works fine, and my goal is to get ppc64le supported in SpiderMonkey eventually. Meanwhile, if you really do prefer Chromium, you now have a fully-working port and this hard work should be commended.

Nested virtualization coming to POWER9


On the KVMPPC mailing list, Paul Mackerras posted for comments a new set of updates to KVM-HV allowing POWER9 systems in radix MMU mode to finally nest virtualization (i.e., run a virtualized POWER9 guest within another virtualized POWER9 guest through KVM-HV). This is not only a big boon to shops that run Power ISA virtual machines in terms of enhanced security and portability, but also offers the potential for improved debugging and development.

As you will no doubt recall from our previous series on turning your Talos into a Power Mac, the Kernel-based Virtual Machine functionality on Power ISA and PowerPC comes in two flavours: KVM-PR, which emulates supervisor instructions in software and thus is slower but more flexible and can be nested, and KVM-HV, which uses hardware hypervisor support in later Power ISA chips and is faster, but cannot emulate most earlier CPUs and previously could not be nested (though a KVM-PR guest can run within a KVM-HV guest, and additional KVM-PR guests within that).

With these patches, nested KVM-HV guests are now possible, and can run at nearly full speed. Let's define the base hypervisor to be at level 0 ("L0"). L0 can use the hardware virtualization support to run a guest at level 1 ("L1"). An L1 guest, however, currently cannot do the same thing, so it can't spawn any additional nested VMs under its own control. The trick with these patches is to add hypercalls to allow an L1 guest to ask the L0 hypervisor to create another guest on its behalf, but set up address translation that the L1 guest can manipulate. The new guest is actually another L1 guest, but it looks like an L2 guest because L0 will in effect translate the fake L2's addressing requests through the L1 guest that requested it using a combination of instruction emulation and paravirtualization. The emulated L2 guest should be able to then turn around and request a new VM itself, and the L0 hypervisor will make another L1 guest that the faux L2 guest can control that acts like an L3 guest, and thus turtles all the way down.

Because it is still inherently KVM-HV, however, it inherits all of its basic limitations such as only supporting the current processor generation and the one immediately preceding it. In addition, the current nested guest implementation relies on radix MMU mode, the default MMU mode of the POWER9 (KVM-PR requires hashed page table MMU mode), meaning it does not support earlier Power ISA generations that only support hashed page tables. The patches are out for comments on the mailing list and hopefully will be incorporated into the Linux kernel tree in the very near future.

More news on the "Tiny Talos"


Raptor is revealing more details on what we'll christen the Tiny Talos. In posts on Phoronix, engineer Timothy Pearson indicated that the unit will take "low-end Sforza" parts (likely capped at four or possibly eight cores, assuming SMT-4), putting it at the low-end under the T2 Lite. Interestingly, integrated sound is available as well as presumably integrated video.

Whatever it is, the full reveal is scheduled for October. We'll be watching.

Alpine Linux updated to 3.8.1


Alpine Linux has been updated to version 3.8.1, including bugfixes and security updates (though note this information on a RCE this release supposedly fixes). The ppc64le versions available for download don't seem to have support for POWER9 or Talos systems yet (only POWER8 systems so far), but hopefully this will change in the near future.

More POWER in Firefox 62


The fixes for compilation and better performance on ppc64le yours truly contributed to Firefox are now in the release channel with Firefox 62. These were bug 1464751, bug 1464754 and bug 1465274, which was a spin-off from bug 1434589.

Unfortunately, the Fedora pre-built Firefox 62 seems to have a crippling crash bug in it when typing addresses into the location bar. Your distro's package may vary. However, building from source doesn't seem to be affected, implying some build configuration issue on their end. Note that there are build system changes in 62 which require some additional workarounds in your .mozconfig and this might have been what bit them. Here's what I use for making a debug build:

export CC=/usr/bin/gcc
export CXX=/usr/bin/g++

mk_add_options MOZ_MAKE_FLAGS="-j24"
ac_add_options --enable-application=browser
ac_add_options --enable-optimize="-Og -mcpu=power9"
ac_add_options --enable-debug
ac_add_options --disable-jemalloc
ac_add_options --disable-release
ac_add_options --enable-linker=bfd

export RUSTC_OPT_LEVEL=0

Adjust the -j24 to the number of threads you want (I like keeping some resources free, so I reserve eight threads from the 32 on this system). The linker defaults to gold, which doesn't work right on ppc64le; this configuration forces it to GNU ld ("bfd"). This config also forces the use of gcc instead of clang; change that to suit your tastes.
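The arithmetic behind that -j24 is just the logical CPU count minus whatever you want to hold back. Here is a minimal sketch, assuming nproc from coreutils is available; the helper name build_jobs and its default reserve of eight are illustrative choices of mine, not something the Firefox build system provides:

```shell
# build_jobs [reserve] [total]: print a -j value that leaves some threads free.
# "build_jobs" is a hypothetical helper name, not part of mach.
build_jobs() (
    reserve=${1:-8}
    total=${2:-$(nproc)}          # logical CPUs; 32 on a dual 4-core SMT-4 T2
    jobs=$((total - reserve))
    if [ "$jobs" -lt 1 ]; then    # never drop below one job
        jobs=1
    fi
    echo "$jobs"
)

# Print the matching mozconfig line for this machine.
echo "mk_add_options MOZ_MAKE_FLAGS=\"-j$(build_jobs 8)\""
```

On a 32-thread system with eight reserved this yields -j24, matching the config above.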

Making a release build seems to have some problems on POWER9 still, so that's disabled, along with jemalloc. I also have a binary of gn (from Chromium) used to regenerate some configurations, which I'm happy to provide upon request. If you have such a binary, then add export GN=/path/to/gn to let the build system use it.

Save this as .mozconfig in the root of the Mercurial tree you cloned or tarball you expanded, and then ./mach build to build.

For an optimized build, such as the one this blog post is being typed in, the config is nearly the same:

export CC=/usr/bin/gcc
export CXX=/usr/bin/g++

mk_add_options MOZ_MAKE_FLAGS="-j24"
ac_add_options --enable-application=browser
ac_add_options --enable-optimize="-O3 -mcpu=power9"
ac_add_options --disable-jemalloc
ac_add_options --disable-release
ac_add_options --enable-linker=bfd

export RUSTC_OPT_LEVEL=2

Unfortunately, builds with MOZ_PGO (for profile-guided optimization) and MOZ_LTO (for link-time optimization) set do complete, but seem to generate defective executables. That will be a project to work on later. The RUSTC_OPT_LEVEL is probably unnecessary here but doesn't hurt.
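Since the debug and optimized configs differ in only three places (the optimization flags, --enable-debug, and the Rust optimization level), one way to keep them in sync is to generate both from a single template. This is only a sketch: emit_mozconfig is a made-up helper name, and every option it prints is copied from the two configs above rather than anything new.

```shell
# emit_mozconfig debug|opt [jobs]: print one of the two configs shown above.
# "emit_mozconfig" is a hypothetical helper, not part of the Firefox tree.
emit_mozconfig() (
    kind=${1:-opt}
    jobs=${2:-24}
    case "$kind" in
        debug) optflags="-Og -mcpu=power9"; rust_level=0 ;;
        opt)   optflags="-O3 -mcpu=power9"; rust_level=2 ;;
        *)     echo "usage: emit_mozconfig debug|opt [jobs]" >&2; exit 1 ;;
    esac

    echo "export CC=/usr/bin/gcc"
    echo "export CXX=/usr/bin/g++"
    echo
    echo "mk_add_options MOZ_MAKE_FLAGS=\"-j$jobs\""
    echo "ac_add_options --enable-application=browser"
    echo "ac_add_options --enable-optimize=\"$optflags\""
    if [ "$kind" = debug ]; then
        echo "ac_add_options --enable-debug"   # only the debug build has this
    fi
    echo "ac_add_options --disable-jemalloc"
    echo "ac_add_options --disable-release"
    echo "ac_add_options --enable-linker=bfd"
    echo
    echo "export RUSTC_OPT_LEVEL=$rust_level"
)
```

From the root of the source tree, something like emit_mozconfig opt 24 > .mozconfig followed by ./mach build would then reproduce the optimized setup.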

My internal builds also use a port of TenFourFox's basic adblock to reduce the amount of JavaScript it needs to run, since Firefox does not yet have a JIT for ppc64le. That's something I'm working on as well, inspired by the big-endian 32-bit PowerPC JIT in TenFourFox, but this JIT will be 64-bit and little-endian so that we can get wasm up and running. I'll be posting progress reports here as the work moves along. This is rather different than the folks working on the ppc64le Chromium port, which uses the existing Power ISA support in V8 and is trying to get the rest of the browser up. For philosophical reasons I won't be working on that project (I think Google's dissemination of Blink is not ultimately benign, a topic for another day), but I support more browser choice on our new platform, and I hope they are successful too.

Talos shines at OSS North America


Raptor was at the Linux Foundation's Open Source Summit North America, and at least one attendee was very impressed with the Talos II. Theirs runs Debian (we run Fedora here), but we can still be friends. The unit on display was the dual-socket T2 with two 4-core CPUs for 32 threads, the same as the one your humble author is typing on.

Is there a tiny Talos in the timeline?


Did Raptor just tip their hand on a micro-ATX Talos? No announcements on price or capacity, but being one CPU, it would probably be most comparable to the Talos II Lite (1TB RAM maximum, PCIe x16 and x8). This is doable in an mATX form factor, but given the size of the heatsinks in the EATX T2 and T2 Lite, cooling might be an issue in a case this small. We'd imagine the price would be competitive with the T2 Lite as well.

Some fun case graphics might give this thing a little style, too (see our artist's impression of the image they linked).

Musing over POWER9 roadmap at Hot Chips


(See the presentation from AnandTech's live blog at Hot Chips.)

With the news that GlobalFoundries has stopped all 7nm development, the next step for the Power ISA got more nebulous. IBM really phoned in their presentation at Hot Chips this time around; there wasn't a lot of meat on the bone, and they probably got advance warning of the changes at GF which likely cut what they were willing to say in public. But IBM still has one more stop on the roadmap for the POWER9, so they're not done with 14nm yet.

The 2019 "advanced I/O" POWER9 will increase memory bandwidth from the "scale up" 210GB/s to 350GB/s, over twice as much as the "scale out" cores in the Talos II at 150GB/s. IBM didn't appear to say if this would require buffering or if it was direct attached memory, though our incompletely informed suspicion here is the former. If so, it wouldn't be a direct replacement for the Sforza cores the T2 runs now; the board would probably need a redesign to accommodate whatever Centaur successor they require. That would also have power and thermal impacts in a workstation form factor. I/O on the "AIO" POWER9 jumps to OpenCAPI 4.0 from 3.0, allowing caching on accelerators and additional link widths, and NVLink 3.0 from 2.0, presumably both over Bluelink. IBM didn't announce clock speeds, but given that the core counts are the same, they're most likely identical or comparable.

IBM also said rather little about the POWER10. No core count was reported and the node size was pointedly not shown. However, signaling was announced at 32 and 50GT/s, up to double the POWER9, indicating IBM continues to prioritize bandwidth as their competitive advantage against x86 commodity servers. The timeframe is still 2020, so we can expect at least another 18 months of POWER9 goodness.

Making your Talos II into a Power Mac: dcbz considered harmful (part 2)


In the first part of this article we talked about getting your Talos II prepped to emulate a Power Mac using KVMPPC, the kernel virtualization facility in Linux. Having followed the instructions in that article, you've got your kernel in hash table mode, you've got the KVM-PR kernel module loaded (and patched it if necessary), you installed (or built) QEMU, and you have a blank QEMU disk image ready to go.

For this part, we will assume you have chosen 10.3 Panther, 10.4 Tiger or 10.5 Leopard to install. I will discuss Leopard relatively little other than how to get you started in it; most of what applies to Tiger also applies to Leopard. I'll briefly discuss booting OS 9 with TCG at the end.

Before starting, since we will use tun/tap networking, make sure the interface is up before booting. On Fedora, I do something like this:

sudo ip tuntap add dev tap0 mode tap user [your username]
sudo ip link set tap0 up promisc on

and, if you use libvirt,

sudo brctl addif virbr0 tap0
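If you bring the interface up and down often, the commands above can be wrapped in a small helper that prints them for review instead of running them directly. This is a sketch under a few assumptions: tap_commands is a hypothetical name, the bridge step is only emitted if you pass a bridge, and the teardown half is my own addition that simply mirrors the setup in reverse (ip tuntap del and brctl delif).

```shell
# tap_commands up|down [dev] [user] [bridge]
# Prints (does not run) the commands to create or remove the tap interface.
tap_commands() (
    action=$1
    dev=${2:-tap0}
    user=${3:-$(id -un)}
    bridge=${4:-}                 # e.g. virbr0 with libvirt; empty to skip
    case "$action" in
        up)
            echo "ip tuntap add dev $dev mode tap user $user"
            echo "ip link set $dev up promisc on"
            if [ -n "$bridge" ]; then
                echo "brctl addif $bridge $dev"
            fi
            ;;
        down)
            if [ -n "$bridge" ]; then
                echo "brctl delif $bridge $dev"
            fi
            echo "ip link set $dev down"
            echo "ip tuntap del dev $dev mode tap"
            ;;
        *)
            echo "usage: tap_commands up|down [dev] [user] [bridge]" >&2
            exit 1
            ;;
    esac
)
```

After checking the output, you could pipe it to a root shell, for example tap_commands up tap0 "$USER" virbr0 | sudo sh -e.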

For filesharing you could set up either Samba or Netatalk. I use Netatalk, since I'm more accustomed to AppleTalk and it enables my T2 to serve files over AFP to the other classic Macs here, and it also will work fine with Mac OS 9 if you want to use that at some point.

Let's begin by constructing the command line to boot your emulated Mac from disc and install the OS. Each OS does better currently with certain combinations of emulated CPU and hardware features. In addition, we also need to make sure that the emulator stays within a single core for better performance (you will get random system stalls if it moves over to another core and throughput will be generally impaired), so we need to set affinities appropriately.

We'll go with 10.4 for our example; substitute for your OS of choice where relevant. Start out with

taskset -a -c 0-3 qemu-system-ppc -M

This binds all of QEMU's threads to a single core (recall that the T2 Sforza cores are SMT-4, and each hardware thread appears as a logical CPU, so logical CPUs 0-3 are all one physical core). While QEMU spawns more than four threads, spreading them over two cores (i.e., 0-7) has no noticeable performance benefit and can sometimes unsettle Mac OS X by making timing loops unpredictable.
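If you would rather pin the emulator to a core other than the first, the range for taskset follows directly from the SMT width. A small sketch, assuming the threads of each core are numbered contiguously as they are in the 0-3 and 0-7 examples here (lscpu -e will show the actual mapping on your system); core_cpus is a name I made up:

```shell
# core_cpus CORE [SMT]: print the logical CPU range for one physical core,
# assuming a core's hardware threads are numbered contiguously.
core_cpus() (
    core=$1
    smt=${2:-4}                   # SMT-4 on the Sforza parts in the T2
    first=$((core * smt))
    last=$((first + smt - 1))
    echo "$first-$last"
)

core_cpus 0    # prints 0-3
core_cpus 1    # prints 4-7
```

So taskset -a -c "$(core_cpus 2)" would bind to the third core (logical CPUs 8-11) instead.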

For the -M option, we will specify mac99 and kvm. The OSes differ on what they prefer for the VIA. 10.3 and 10.4 need to run the emulated mac99 with an emulated CUDA chip onboard, or the OS is unable to detect the real-time clock. 10.5, however, requires the later PMU attached to the VIA. So that gets us to

taskset -a -c 0-3 qemu-system-ppc -M mac99,accel=kvm,via=cuda (10.3, 10.4)
taskset -a -c 0-3 qemu-system-ppc -M mac99,accel=kvm,via=pmu (10.5)

All three of these OSes work fine emulating a 7400-series G4. We will use the "Nitro" 7410 (-cpu nitro), which is a bit faster than the G3 (-cpu G3). 10.3 may have some problems with assigning more than 1.5GB of RAM (-m 1536), but 10.4 and 10.5 work fine with 2GB (-m 2048). Don't use more than 2GB of RAM; it will cause various problems. A verbose boot is helpful in case you accidentally did something wrong (-prom-env boot-args=-v). We'll specify our disk image and some tuning parameters (-drive file=[filename].img,format=qcow2,l2-cache-size=4M), and say boot from the CD or DVD (-boot d -cdrom "/dev/cdrom"). Lastly, we'll enable the emulated RTL8139 NIC and USB tablet (-netdev tap,id=mynet0,ifname=tap0,script=no,downscript=no -device rtl8139,netdev=mynet0 -usb -device usb-tablet) and use a sane screen resolution (-g 1024x768x32). For my 10.4 booter, the full command line looks like this (using the filenames I use on this system):

taskset -a -c 0-3 qemu-system-ppc -M mac99,accel=kvm,via=cuda -cpu nitro -m 2048 -prom-env boot-args=-v -boot d -cdrom /dev/cdrom -drive file=tigerhd.img,format=qcow2,l2-cache-size=4M -netdev tap,id=mynet0,ifname=tap0,script=no,downscript=no -device rtl8139,netdev=mynet0 -usb -device usb-tablet -g 1024x768x32

I strongly suggest saving this as a shell script so that you can make any necessary variations. Insert your OS CD or DVD and run the script. It should go into the installer. If it didn't, make sure your filenames are correct, that you have OpenBIOS installed (it comes with QEMU) in a location the emulator can see, and that the KVM kernel modules (both kvm and kvm_pr) are loaded by checking lsmod.
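The lsmod check is easy to script so it can sit at the top of your launcher. A minimal sketch follows; has_module and check_kvm_modules are hypothetical helper names, and note the assumption that both pieces are built as loadable modules (anything compiled directly into your kernel will not show up in lsmod).

```shell
# has_module NAME: succeed if NAME is listed in lsmod-style output on stdin.
has_module() {
    awk -v want="$1" '$1 == want { found = 1 } END { exit !found }'
}

# check_kvm_modules: report whether kvm and kvm_pr are currently loaded.
check_kvm_modules() {
    for m in kvm kvm_pr; do
        if lsmod | has_module "$m"; then
            echo "$m: loaded"
        else
            echo "$m: MISSING"
        fi
    done
}
```

Call check_kvm_modules before starting QEMU; if kvm_pr is reported missing, go back to the module steps in part 1.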

Once the installer has booted you can of course directly proceed to installation in KVM, but I actually recommend shutting down the emulated Mac at this point and bringing everything back up in TCG to get the OS installed. To do that, just use the same command line, but change accel=kvm to accel=tcg. As I mentioned in the first part, heavy I/O loads tend to be less performant on KVMPPC, and installing and upgrading an OS is a pretty heavy I/O load, so running it in TCG will complete the task more quickly and more reliably.

If you want to run Software Update to bring your emulated Mac up to date, it's probably best to also do this in TCG. You could also separately download one of the combo installers (such as the one for 10.4.11) and push that to the emulated Mac on your Samba or Netatalk AFP share.

When the OS is installed, remove the CD-ROM from your command line unless you want to keep it, and change the -boot argument to -boot c to boot from the emulated drive image.
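The install-in-TCG, run-in-KVM routine can be captured in one script so the two command lines never drift apart. The following is a sketch rather than a finished tool: qemu_mac_cmd, DISK and CDROM are names I made up, the per-OS settings are the ones described above, and it only prints the command so you can inspect it first.

```shell
# qemu_mac_cmd OS PHASE: print the QEMU command line for one stage.
#   OS:    10.3, 10.4 or 10.5
#   PHASE: install (TCG, boot from the CD/DVD) or run (KVM, boot from disk)
# DISK and CDROM are illustrative variables; point them at your own files.
# Paths containing spaces would need extra quoting.
qemu_mac_cmd() (
    os=$1
    phase=$2
    disk=${DISK:-tigerhd.img}
    cdrom=${CDROM:-/dev/cdrom}

    case "$os" in
        10.3) via=cuda; mem=1536 ;;   # 10.3 can misbehave above 1.5GB
        10.4) via=cuda; mem=2048 ;;
        10.5) via=pmu;  mem=2048 ;;   # Leopard wants the PMU
        *) echo "unsupported OS: $os" >&2; exit 1 ;;
    esac

    case "$phase" in
        install) accel=tcg; boot="-boot d -cdrom $cdrom" ;;
        run)     accel=kvm; boot="-boot c" ;;
        *) echo "phase must be install or run" >&2; exit 1 ;;
    esac

    echo "taskset -a -c 0-3 qemu-system-ppc" \
        "-M mac99,accel=$accel,via=$via -cpu nitro -m $mem" \
        "-prom-env boot-args=-v $boot" \
        "-drive file=$disk,format=qcow2,l2-cache-size=4M" \
        "-netdev tap,id=mynet0,ifname=tap0,script=no,downscript=no" \
        "-device rtl8139,netdev=mynet0 -usb -device usb-tablet" \
        "-g 1024x768x32"
)
```

For example, qemu_mac_cmd 10.4 install prints the installer command line with accel=tcg, and sh -c "$(qemu_mac_cmd 10.4 run)" would launch the installed system under KVM.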

Ta-daa!

For best results with video updates, make sure that the display settings inside System Preferences match your physical display. I'm in 32-bit colour, so I made sure that System Preferences was using Millions instead of Thousands of colours. Because of variability in timing, you may notice the OS X clock stays close to your host's clock but drifts somewhat out of sync with it, depending on how the delay loop was calibrated at bootup. This is mostly just a nuisance.

The next step is optional, but hacks KVMPPC to improve performance of the emulated Mac. Right now we're actually fooling the operating system; we're not really a G4. In fact, the closest Power Mac relative to the T2's POWER9 is the G5, i.e., the PowerPC 970, which is essentially a POWER4 with some modifications for workstation duty and a bolted-on AltiVec unit. Even though we told the OS we're a G4, this doesn't change the attributes of the CPU, in particular for this case the specific instructions it does and does not support and how certain others are handled.

With "big POWER" IBM removed some of the PowerPC instructions that were infrequently used or scaled badly, such as dcba and mcrxr. You don't need to know what these do; just know they were used in some software, but as of the G5 ceased to exist in hardware. Additionally, the G5 and later big POWER designs (including the POWER9) also have a 128-byte cache line instead of the 32-byte cache line of the G3 and G4, which is relevant to the dcbz instruction as it zeroes an entire cache line and potentially spills it to memory. OS X has adaptations for dealing with these cases (an illegal instruction handler in the first case that simulates the instructions in software, and modified system routines in the second), but that only happens if OS X knows the machine is a G5. In this case, it doesn't, so these adaptations are never installed.
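To see why the cache line size matters so much for dcbz, it helps to work out which bytes a single instruction clears. The instruction zeroes the whole cache line containing its effective address, so the range is found by rounding the address down to a line boundary. A quick sketch of that arithmetic (dcbz_range is a made-up helper and 0x1020 is an arbitrary example address):

```shell
# dcbz_range EA LINESIZE: print the byte range a dcbz at effective address EA
# zeroes on a CPU whose cache line is LINESIZE bytes (a power of two).
dcbz_range() (
    ea=$(($1))
    line=$(($2))
    start=$((ea & ~(line - 1)))   # round down to the start of the line
    end=$((start + line - 1))
    printf '0x%x-0x%x\n' "$start" "$end"
)

dcbz_range 0x1020 32     # G3/G4:     prints 0x1020-0x103f
dcbz_range 0x1020 128    # G5/POWER9: prints 0x1000-0x107f
```

Code written for the G4 expects only the 32 bytes at 0x1020 to be cleared; on a 128-byte line the same instruction would also wipe 0x1000-0x101f and 0x1040-0x107f, which is why it cannot simply be run natively.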

KVM-PR gets around the dcbz problem on later POWER designs, including the POWER9, by scanning every new code page in a 32-bit guest for the dcbz instruction and replacing it with an illegal one it can detect. (Remember, it's still a legal instruction; it just behaves differently.) When executed it faults and falls back to KVM-PR, which simulates a 32-byte dcbz instruction in software, and returns control to the guest. It's not a surprise that this process is quite slow, especially if it gets called in a loop. Unfortunately Apple does just exactly that for clearing memory and the instruction is a major portion of the OS' built-in implementation of bzero, which is also called by memset. This is a hot routine and needs to run fast. The G5 version knows about the cache line difference and accounts for it; the G4 and G3 versions don't, and we're using the G4 version.

Apple, however, also helped us out here a little bit by allowing us to guess where the routine is. This and other major components live in a section of memory called the "commpage," which is always located in the top eight pages of the 32-bit addressing space in every process. It is provided by the kernel as an optimization for fast access to important data and common routines. The bzero routine is virtually unchanged from 10.3 to 10.4, and both start with a distinctive instruction (cmplwi cr7,r4,32). If we see this instruction in the commpage, we can be confident we have found bzero. And now that we've found it, we can modify it.
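Both of those landmarks reduce to simple arithmetic. The sketch below works out where the top eight pages begin, assuming 4 KiB pages, and what cmplwi cr7,r4,32 looks like as a 32-bit word, using the Power ISA D-form layout for cmpli (primary opcode 10, with the L bit clear for a word compare). The helper names are made up, and this illustrates the numbers only; it is not an excerpt from the patch.

```shell
# commpage_base [pages] [pagesize]: start of the top N pages of a 32-bit
# address space. The 4096-byte page size is an assumption.
commpage_base() (
    pages=${1:-8}
    pagesize=${2:-4096}
    printf '0x%08x\n' $((0x100000000 - pages * pagesize))
)

# encode_cmplwi CRF RA UI: encode "cmplwi crCRF,rRA,UI".
# D-form cmpli: opcode 10 in the top six bits, then BF, L=0, RA, 16-bit UI.
encode_cmplwi() (
    crf=$1
    ra=$2
    ui=$3
    printf '0x%08x\n' $(( (10 << 26) | (crf << 23) | (ra << 16) | (ui & 0xffff) ))
)

commpage_base            # prints 0xffff8000
encode_cmplwi 7 4 32     # prints 0x2b840020
```

In other words, a scanner looking for bzero would be watching for the word 0x2b840020 in pages at or above 0xffff8000.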

Recall I mentioned that KVM-PR must scan each new executable code page for the instruction and change it. We can alter KVM-PR to detect that unique leader instruction if it's mapping in the commpage, and then monkeypatch in a new routine that doesn't use dcbz and thus won't require slow simulation. To make it more reliable, we know where the location should be, so we'll only patch it if it's actually there. As a bonus we'll also map dcba to nop anywhere in an executable section so that it doesn't need a trip to a special handler either. That is what this patch does.

Building KVMPPC with this patch uses the same steps we discussed for building and installing the kernel modules in part 1. This patch also applies with -p1.

Does it make a difference? You bet it does. On my system with Geekbench 32-bit on Mac OS X 10.4.11, it improved the overall benchmark by nearly 200 points over the unpatched version, almost all of it in (no surprise) the memory score.

This consequence of masquerading as a different CPU also carries over into which software you can run. Even though the guest believes it is a G4, you actually have to run the G5 version of TenFourFox, which avoids the other instructions that are illegal on this CPU and aren't patched (just be patient -- it will take TenFourFox almost a full minute to come up). If your software offers a G5 version, you should run that if you can. The discontinuity leads to amusing discrepancies like this one.

Interestingly, TCG on POWER9 actually had errors during SunSpider that the JIT in TenFourFox under KVMPPC doesn't, and even with the warmup was up to twice as slow as KVM at SunSpider. Go TenFourFox!

You'll find that performance is still fairly pedestrian even with KVMPPC. While the OS typically benchmarks my T2 as a "2.04GHz G4" (TCG usually gets computed as somewhere between "900 MHz" and "1.0GHz"), the actual throughput you get varies greatly with workload. Raw CPU performance is a bit better than my Quad G5 scores running single core in Reduced mode, though the Quad running full tilt easily surpasses it (the emulation overhead is only reduced, not eliminated). The numbers get a lot different in applications depending on how their workload is structured. For example, TenFourFox's G5 JIT in KVMPPC gets about 6800ms in SunSpider compared to around 3800ms on a "real" 1GHz iMac G4. Improving these numbers to get parity, and especially getting QEMU to support SMP, will need to be an area of active future development.

Lastly, I promised to cover the best way to run OS 9 on a Talos. Although limited to TCG, it's still pretty snappy, a testament to Mac OS 9's comparatively low system requirements. Mac OS 9 works better with the PMU than the CUDA (or you get problems with the mouse not responding to double clicks reliably) and is limited to 1.5GB of RAM. It also doesn't support the QEMU USB tablet, but it does support the RTL8139 with this driver. To get the driver installed, I actually just made an ISO image out of it, dropped the driver in the Extensions folder and rebooted. My command line looks like this:

qemu-system-ppc -M mac99,accel=tcg,via=pmu -m 1536 -boot c -drive file=classic.img,format=qcow2,l2-cache-size=4M -usb -netdev tap,id=mynet0,ifname=tap0,script=no,downscript=no -device rtl8139,netdev=mynet0 -rtc base=localtime

Mac OS 9 uses a different real-time clock base, so this has an additional -rtc option. You can use any CPU you want since it's emulated; I just use the default G4 7400 here instead of specifying one.

Post questions or things you've discovered in the comments.