Blackbird's on sale for Black Friday weekend!


We've just bought one of the Basic Blackbird Bundles (BK1B01, 90W 4-core POWER9 CPU, mATX mainboard, I/O plate and recovery DVD). Yes, they're on sale for $999.99, but only until 11:59:59pm Central Standard Time on Monday, the 26th! That's a $175 savings over buying the motherboard and 4-core CPU separately, which are also on sale over the same period at $799.99 for the board (BK1MB1) plus $375 for the CPU.

There is of course some fine print: besides the fact the bundle does not include RAM, storage or a case, the most curious omission is that there is no heat sink-fan assembly in the pack-in deal. The 4-core should do fine with the 2U HSF, which is an extra $75 (and is what we ordered), but the 8-core will require the 3U HSF (160W max supported on the Blackbird).

We'll be reviewing it when it arrives. Get it while it's hot. Note that this is a pre-order, and Raptor hasn't given any more specific date for arrival than Q1 2019.

Update: Raptor is now also offering an 8-core Blackbird bundle (BK1B02) using the 160W part for $1329, and that SKU does include the 3U HSF.

Sometimes it's necessary: running x86_64 binaries on the Talos II


Yes, it's gross, but sometimes it's necessary. There's a lot of software for Intel processors, and there's a lot of it that you can't recompile, so once in a while you've got to get a little dirty to run what you need to.

In prior articles we've used QEMU with KVMPPC to emulate virtual Power Macs and IBM pSeries, but this time around we'll exclusively use its TCG JITted software CPU emulation to run x86_64 programs and evaluate their performance on the Talos II. For this entry we will be using QEMU 3.0, compiled from source with gcc -O3 -mcpu=power9. Make sure you have built it with (at least) x86_64-linux-user,x86_64-softmmu in your --target-list for these examples, or if using your distro's package, you'll need the qemu-x86_64 and qemu-system-x86_64 binaries.
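
If you're building it yourself, the dance looks something like this (a minimal sketch; the tarball URL is qemu.org's standard download location, and you should adjust versions and parallelism to taste):

wget https://download.qemu.org/qemu-3.0.0.tar.xz
tar xf qemu-3.0.0.tar.xz && cd qemu-3.0.0
./configure --target-list=x86_64-linux-user,x86_64-softmmu \
  --extra-cflags="-O3 -mcpu=power9"
make -j$(nproc)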

However, there is also a new experimental fork of QEMU to try called HQEMU. HQEMU uses LLVM to further optimize the code generated by TCG in the background and can yield impressive performance benefits, and the latest version 2.5.2 now supports ppc64le as a host architecture. However, despite its obvious performance gains, HQEMU is not currently suitable as a total QEMU replacement: it's based on an older QEMU (2.5.x) that is missing some later features and improvements, it still has some (possibly ppc64le-specific) notable bugs, and because it needs a modified LLVM it currently has to be built from source. For this reason I recommend you have them both available and select the one that works best.

The step-by-step instructions for building HQEMU on ppc64le (PDF) work pretty much as is, except for the following:

  • LLVM 7.0 is not supported; I used LLVM 6.0.1. The included patch for 6.0 does not apply against 7.x. I already have clang on the system and the LLVM build system uses it by default (though ordinarily I'm one of those people who prefer gcc), so I don't know if it will build with gcc, though it should. Rather than install into /usr/local, I prefer to install into hqemu/llvm in my source directory to avoid tainting the system's version. This makes your cmake command look like this, assuming you followed the steps in the manual exactly and are in the proper directory:

    cmake -G "Unix Makefiles" \
    -DCMAKE_INSTALL_PREFIX=../../llvm \
    -DCMAKE_BUILD_TYPE=Release ..

    It takes about 15 minutes to build LLVM and clang with make -j24.

  • Not all QEMU targets are supported. x86_64-linux-user and i386-linux-user compile more or less as is, but you cannot compile x86_64-softmmu in HQEMU 2.5.2 (or any of the other software MMU targets) on ppc64le without this patch. I haven't tried any of the ARM targets, but I have no reason to believe they don't work. None of the other architectures or targets are supported. My recommended configuration command line for the T2 family is:

    ../hqemu-2.5.2/configure --prefix=`pwd` \
    --extra-cflags="-O3 -mcpu=power9" \
    --target-list=x86_64-linux-user,x86_64-softmmu \
    --enable-llvm

    It takes a couple minutes for a build with make -j24.

  • If you rebuild HQEMU and you get a weird compile error while building some of the LLVM-related files, make sure that the llvm-config from the modified LLVM suite is first in your PATH (not the system one), as shown below.
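
    For example, assuming the modified LLVM was installed into ~/src/hqemu/llvm per the steps above (bash syntax; tcsh users want setenv instead):

    export PATH=$HOME/src/hqemu/llvm/bin:$PATH
    which llvm-config    # should report the modified copy, not the system one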

I'll include a couple screenshots of QEMU 3.0 and HQEMU 2.5.2 running a benchmark in ReactOS 0.4.9 under full system emulation here; you should be familiar with using QEMU by now, so I won't discuss its use further in this article. I used the CrystalMark benchmark not because it's particularly good but because most of the typical Windows benchmarking programs don't like ReactOS. First is QEMU, second is HQEMU.

You'll notice there were some zeroes in the benchmark under HQEMU. That's because those tests made HQEMU segfault! Also, oddly, the ALU score was worse, but the D2D and OpenGL scores -- done purely in software -- were two to four times higher. Indeed, ReactOS is a lot more responsive under HQEMU, assuming you don't do anything it doesn't like. If you need to run a Windows application on your T2 and HQEMU doesn't poop its pants while running it, it is a much faster option. Note that some or all of these numbers can improve if you have any VirtIO support in your OS and/or appropriate drivers, which I have intentionally not used here to demonstrate the worst case. You may also be able to use your local GPU with virgl; we might look into that in a future article to see how practical non-native gaming is.

Instead of dithery benchmarks in a full system emulator, however, let's try to better quantify the actual CPU emulation penalty by running a simple math benchmark under QEMU's user mode. One of the easiest is to use the command line calculator bc to compute the digits of π, which can be done by taking the arctangent of 1 and multiplying it by 4. You can then use the scale= variable to set the difficulty level, such as echo "scale=5000;4*a(1)" | bc -l, which will (slowly) compute 5000 digits of π. (It takes around 30 seconds on modern systems.)

However, when you run a foreign-architecture binary, you also need each and every one of the libraries it links to from that architecture, and current versions of bc have several additional dependencies. This somewhat unnecessarily complicates our little benchmark test. Fortunately, its ancestor, the venerable dc utility, has no dependencies other than libc, as shown by the output of objdump -p:

[...]
Dynamic Section:
  NEEDED               libc.so.6
[...]
Version References:
  required from libc.so.6:
[...]

To port this simple benchmark we will take advantage of a little-known fact: bc used to be simply a front end to dc, and systems such as AIX and apparently some BSDs can still "compile" bc scripts to dc scripts with the -c option. I've provided the stripped-down output from my own POWER6 AIX server to compute digits of π in dc as a gist. Download that and put it somewhere convenient (for the examples in this article I saved it to ~/pi.dc). Note that versions of GNU dc prior to 1.06 or so will not properly parse this script, but most of the dc binaries of non-GNU provenance I've encountered will run it fine. Get a baseline by running it against your system (here, my own Talos II):

% time dc ~/pi.dc
3.14159265358979323846264338327950288419716939937508
0.004u 0.001s 0:00.00 0.0% 0+0k 0+0io 0pf+0w

(The exact format of the time command's output will depend on your shell; this is the one built into tcsh.)

Next, increase the number of digits computed by changing the line 50k to 500k (i.e., 500 digits), and time that.
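
If your copy of the script matches the gist, a one-liner like this should do it (assuming 50k sits on a line of its own, as it does in the AIX output):

sed -i 's/^50k$/500k/' ~/pi.dc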

% time dc ~/pi.dc
3.1415926535897932384626433832795028841971693993751058209749445923078\
164062862089986280348253421170679821480865132823066470938446095505822\
317253594081284811174502841027019385211055596446229489549303819644288\
109756659334461284756482337867831652712019091456485669234603486104543\
266482133936072602491412737245870066063155881748815209209628292540917\
153643678925903600113305305488204665213841469519415116094330572703657\
595919530921861173819326117931051185480744623799627495673518857527248\
9122793818301194912
2.393u 0.001s 0:02.39 100.0% 0+0k 0+0io 0pf+0w

Assuming those numbers look accurate, finally bump it to 1000k (i.e., 1000 digits) to get a less noisy test. I'll spare you the digits.

[...]
20.833u 0.007s 0:20.84 99.9% 0+0k 0+0io 0pf+0w

Call it about 20 seconds of wall time natively (though I should note that Fedora 28 ppc64le is compiled for POWER8, not POWER9). Now, let's set up our x86_64 library root for the emulator test. Your distro may offer you these files in some fashion as a package, but I'll assume it doesn't and show you how to do this manually.

  1. Create a folder called debian-lib-x86_64. Our libraries will live here.
  2. Download the desired x86_64 (a.k.a. amd64) .deb of libc. I used the one from Jessie, but any later version should work.
  3. Uncompress it and find data.tar.xz within the .deb. Uncompress that.
  4. Within the data subfolder thus created, drill down to lib/x86_64-linux-gnu. Move that folder to debian-lib-x86_64/lib.
  5. Within debian-lib-x86_64/, create a symlink from lib to lib64 (i.e., ln -s lib lib64); see the scripted version of these steps below.
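
That scripted version, as a hypothetical transcript (the libc6 .deb filename is illustrative; use whatever version you actually downloaded, and run it from ~/src so the paths match the later examples):

mkdir debian-lib-x86_64
ar x libc6_2.19-18+deb8u10_amd64.deb data.tar.xz   # a .deb is just an ar archive
tar xf data.tar.xz ./lib/x86_64-linux-gnu          # pull out only the library directory
mv lib/x86_64-linux-gnu debian-lib-x86_64/lib
(cd debian-lib-x86_64 && ln -s lib lib64)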

If you did this correctly, you should have a debian-lib-x86_64/lib with a whole mess of files and symlinks in it, and a debian-lib-x86_64/lib64 that points to the same place. Any additional libraries you need can simply be thrown into debian-lib-x86_64/lib.

Next, grab the x86_64/amd64 build of dc. I used the version from Buster since it matched the one on my Fedora 28 install, 1.07.1. It will work fine with the Jessie libs, at least as of this writing. Uncompress the .deb, find data.tar.xz, uncompress that, and find the dc binary within the created data folder. Move it somewhere convenient. For the examples in this article I saved it to ~/dc.amd64 and my x86_64 Debian libraries are in ~/src/debian-lib-x86_64.
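
The same ar-and-tar trick shown above extracts the binary (again, the .deb filename is illustrative):

ar x dc_1.07.1-2_amd64.deb data.tar.xz
tar xf data.tar.xz ./usr/bin/dc
mv usr/bin/dc ~/dc.amd64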

First, let's test with QEMU itself. This assumes your pi.dc script is still set to 1000k.

% time ~/src/qemu-3.0.0/x86_64-linux-user/qemu-x86_64 -L ~/src/debian-lib-x86_64 ~/dc.amd64 ~/pi.dc
[...]
62.736u 0.026s 1:02.77 99.9% 0+0k 0+0io 0pf+0w

This is about three times slower than native dc, which isn't as dismal as you might have expected because all the syscalls are native instead of being emulated as well. We already know HQEMU will do this faster, but it'll be interesting to see how much so.

% time ~/src/hqemu/build/bin/qemu-x86_64 -L ~/src/debian-lib-x86_64 ~/dc.amd64 ~/pi.dc
[...]
45.181u 1.976s 0:27.40 172.0% 0+0k 0+0io 0pf+0w

Yes, that's 172% CPU utilization because of HQEMU's background optimization threads, but wall clock time is only 27 seconds -- only about a third higher than the native run!

Do note that HQEMU's optimization isn't free. If we reduce the number of digits back down to 50 (i.e., 50k), we see this:

% time ~/src/qemu-3.0.0/x86_64-linux-user/qemu-x86_64 -L ~/src/debian-lib-x86_64 ~/dc.amd64 ~/pi.dc
3.14159265358979323846264338327950288419716939937508
0.048u 0.002s 0:00.05 80.0% 0+0k 0+0io 0pf+0w
% time ~/src/hqemu/build/bin/qemu-x86_64 -L ~/src/debian-lib-x86_64 ~/dc.amd64 ~/pi.dc
3.14159265358979323846264338327950288419716939937508
0.164u 0.016s 0:00.17 100.0% 0+0k 0+0io 0pf+0w

In this case, HQEMU is about three times slower than regular QEMU because of the LLVM optimization overhead over a very brief runtime. This example is still a nearly imperceptible seventeen hundredths of a second in wall clock terms, but if your workload consists of repeatedly running an alien architecture binary with a short execution time over and over, HQEMU will cost you more. Admittedly, I can't think of too many workloads in this category, but I'm sure there are some.

The take-away from this is that if you have a Linux binary from an x86_64 system and you can collect all the needed libraries, it has an excellent chance of at least working, and if it's something HQEMU can run, working with a relatively low performance penalty. The trick, of course, is collecting all those libraries, which could be a quick trip to dependency hell, and messing around with binfmt for transparent execution is left as an exercise to the reader. Full system emulation still has a fair bit of overhead but it's easy to set up and generally liveable, even in pure TCG QEMU, so you can do what you need to if you have to. Now go take a shower and wash all that Intel off.
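
(If you do want to attempt that binfmt exercise, here's a hedged starting point: the magic/mask pair below is the standard x86-64 ELF signature from QEMU's own qemu-binfmt-conf.sh, the interpreter path assumes you installed qemu-x86_64 to /usr/local/bin, and you'd also point QEMU_LD_PREFIX at your library root, since there's no -L flag in this scheme.)

echo ':qemu-x86_64:M::\x7fELF\x02\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x3e\x00:\xff\xff\xff\xff\xff\xfe\xfe\x00\xff\xff\xff\xff\xff\xff\xff\xff\xfe\xff\xff\xff:/usr/local/bin/qemu-x86_64:' | sudo tee /proc/sys/fs/binfmt_misc/register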

Would you pre-order a Blackbird for $875?


Would you pre-order the new "Tiny Talos" Raptor Blackbird for $875? (Hint: compare this to the new weaksauce Space Gray Mac mini and you tell us what you'd rather buy.) We're planning to because we think this is a fabulous deal on powerful user-controlled hardware and a much lower barrier to entry to the POWER9 ecosystem, and we'll be reviewing it right here to see how viable the low-end spec can be. Do note this is mainboard cost, and while the Blackbird has lots of on-board peripherals, the RAM, storage and (probably) CPU will be extra (the base 4-core POWER9 right now appears on Raptor's site for $375). Either way, tell Raptor your interest level on their straw poll.

Roadgeeking with the Talos II (or, Workstation FUD and Loathing)


Now that Phoronix has published very impressive comparison benchmarks between their 2x22 Talos, AMD Threadripper and Intel Core i9, the next bit of pooh-poohing is "ppc64 (and LE) aren't ready for workstation duty."

FUD. Absolute FUD.

It's definitely true that big-endian systems are sometimes a little rockier to work with (we know all about this from our TenFourFox gig, of which this blog is a spinoff) because after the Macintel transition there are a lot fewer big-endian workstations on developer desktops. This isn't to say that running the Talos II big-endian is impossible, however: much, even most, of the software out there works fine, and if you're prepared to put up with a hiccup now and then and/or pitch in with fixes where needed, it's perfectly liveable.

But this isn't the case for little-endian PPC64, which works well. A few major apps aren't fully there, though some of it is administrative screwaroundry like Google being Google about accepting PPC (and MIPS) changes back to Chromium. On the other hand, although JavaScript in ppc64le Firefox is currently interpreted (I'm done with the first draft of the Ion code generator for the POWER9 JIT, and I'm now working on the trampoline, but this is still several months away), it works fine with the right configuration options and I'm told it still works fine on big-endian PPC64 too. My F28 Talos runs VLC, LibreOffice, Krita, GIMP, QEMU and many other essential apps (like ioquake3) out of the box with the distribution-provided packages. It will only get better as people see the advantages of OpenPOWER for a truly free workstation experience.

Still, the goalposts gotta move because haters gotta hate. So here's an example of demonstrating the viability of the Talos II on the desktop with a decidedly unusual hobby: roadgeeking.

Roadgeeks like yours truly drive miles and miles over weird roads to photograph signs and scenery, trace alignments and routings, and annoy highway departments with detailed questions. As a baseline, that work requires the ability to view mapping applications (such as Google Maps, Bing Maps, Open Street Maps, etc.), send mail, and read and work with spreadsheets of mileage and documents and publications. Check, check, check. I use Firefox, LibreOffice and the GNOME Document Viewer for that.

What about photography? Previously I took shots manually with a hand camera, stopping at intervals to grab a picture. This is really hard work: it requires multiple passes on the road because of stuff you miss and inclement weather conditions, and sometimes some rather unsafe setups to get the right view on something. But nothing beats manual work for great images, and it avoids the windscreen goo and bad angles that hand camera shots at highway speed tend to have. Dumping images from the CF card is part of the basic functionality of any Linux system, and the raw images are then formatted and minimally reprocessed for the web with a shell script and ImageMagick (also works fine on PPC64 and ppc64le). Although I really try to avoid retouching to maintain maximum veracity, any editing required was formerly done in Photoshop on my long-suffering Quad G5 but can now be done in Krita (also works fine on ppc64le). Here is my trip on US Highway 395 from San Diego to British Columbia, and part of my trip on US Highway 6 -- still America's longest continuous highway -- from Bishop, CA to Nebraska (the remainder are all on a hard disk and their publication is pending my full writeup). These were all done by hand and I sometimes still do this for shorter alignments.
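
My actual script isn't reproduced here, but the ImageMagick step amounts to something like this hypothetical sketch (filenames, sizes and quality settings are illustrative):

mkdir -p web
for f in *.jpg; do
    convert "$f" -resize 1024x -strip -quality 85 "web/$f"   # downscale and drop EXIF for the web
done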

The labour-intensive nature of this means of photography, however, demands a better technological solution. As a very modern roadgeek, today I use a "flying camera" system -- a high-definition video camera running at 30 frames/sec, taking progressive (non-interlaced) video at a very fast shutter speed of about 1/2000th of a second. This enables live views of the road from the driver's perspective at speeds up to 70mph, and every frame is potentially a high-definition still because of the progressive video and fast shutter speed. Although a 1080p image (effectively 1440x1080 upscaled to 1920x1080) is still poorer resolution than even most mobile phone cameras, let alone a point-and-shoot or DSLR, anything the camera can see is captured, I don't have to stop for images, I can stay at highway speeds, I can still adjust zoom and some of the positioning on the fly (or switch lanes as required), and certain shots would not even be possible without such a setup. When I eventually upgrade the camera, the resolution will only improve. Plus, HDV 1080p is perfectly acceptable resolution for the web even nowadays, and I can always stop for a manual shot with a proper camera if I want something larger. The only major drawback is that the fast shutter speed requires a lot of light or it gets grainy, limiting photography to times of day and year when there is sufficient ambient sun.

I devised this system originally for my Quad G5 running Mac OS X 10.4, which doesn't have native AVCHD support in Final Cut or QuickTime but can support HDV (MPEG-2), so I selected the Canon Vixia HV30 which stores HDV to MiniDV tape and has FireWire output. Final Cut then acquires this video over FireWire and converts it to Apple Intermediate Codec which is display and edit-ready. Later I got a Focus FS-CF Pro DTE digital recorder which dumps the HDV video to QuickTime on CF cards; the HDV file can then just be copied directly from the CF card, and HDV yields smaller files than the edit-friendlier AIC which is an advantage for archiving. The G5 can play HDV video in QuickTime using the codec from recent PowerPC-compatible versions of Final Cut Pro, and of course the codec is available on the Talos II also. (I still use boxes of MiniDV tapes for long trips where I don't have enough CF cards, and a nice side benefit is being "instant tape backup," but solid-state is just the way to go for this nowadays.) You can see the setup I use at the right, or click it for an enlarged image. Note the black felt on the dashboard to reduce reflections, a suction-cup mount to the windscreen, and a 15% red tint on the lens to compensate for the green cast from the safety glass. The entire setup is powered by batteries and the car's 12 volt output for recharging.

I'm now in the process of converting my old US 395 exhibit to the new flying camera format (as a bonus, 16:9 instead of the old 4:3 aspect ratio, so landscapes just look more impressive, too) and recently drove the old alignments between Bishop, CA and Carson City, NV, and then the trans-Sierra Ebbetts Pass (CA 4) from Markleeville, CA to Angels Camp, CA and back to southern California. We had about eight hours of footage to sort through on our first trial of using the Talos II to ingest video instead of the G5. As a guard against data corruption the FS-CF DTE is programmed to dump video to the card in 2GB blobs, or about 10 minutes of footage each. With a USB 3.0 card reader and USB 3.0 hard drive dock we copied the HDV videos in Nautilus from the CF card to a spinning hard disk for archiving. As it turns out, one 10-minute segment was corrupt on the card, but it's a section we can easily rephotograph. Our Talos during the video ingest (show us yours) is at right; click for an enlargement.

For frame grabs, previously I would play the AIC or HDV file in QuickTime Player on the G5 and advance quickly up to a scene and go frame by frame until there was a nice image to use. I'd then grab the frame and save it as an image clipping, which I converted to PNG with AppleScript. (The CMOS image sensor "shimmy" actually helps here, because if I was slightly off position or rotated, sometimes the "shimmy" caused by the car's vibration would correct the geometry from frame to frame.)

With VLC, the same workflow is possible. I use the scroll wheel to scrub through the video and then hold down the advance-frame key to slow down until I get to the right spot I want. It looks like this on the Talos (click for an enlarged screenshot):

The workflow is actually better with VLC, in fact, because the screen grabs are already PNG. I mentioned some shots would be impossible without the "flying camera." Here's one of them:

This unretouched screen grab from VLC was captured in Carson City, Nevada from the middle of the road at 55mph at about 5pm on an October day. It's as if I froze time, walked to the middle of a busy highway transitioning to full Interstate freeway, and took a picture right there. The slight red cast, due to the changing light conditions with the lens tint, is compensated for with a little white balance, and the highway gantry is ever so slightly distorted, but this image is pretty much ready to go. And it was acquired with a fully libre stack of software and hardware. ImageMagick and Krita will do the rest of what needs to be done, and the write-up will be done in Firefox.

Yes, this is my hobby. But it's also a great demonstration that doing this kind of work wouldn't be possible on the Talos II without the application support to match. That must mean the application support is already here. The handful of leftover glitches are disappearing by the day, and most everything else works as-is out of the box on most distros supporting ppc64le. If the sticker shock of a full T2 still gives you the shakes and even a T2 Lite is more than you had in mind, then hang out for the Raptor Blackbird next year, which will give you a taste of freedom for lots less green. No matter what your budget, the price points are improving just as fast as the software options.

The point is made, though: POWER9 is desktop-ready now. Anyone who says otherwise hasn't used one.

LaGrange system in the works?


We're all very jealous that Phoronix gets to play with a dual 22-core Talos II (we're just dual 4-core pikers here), but from the comments thread comes the mention of a possible future LaGrange-based system. All of the current Talos family (the Talos II, T2 Lite and upcoming Blackbird) use Sforza POWER9 processors, which currently offer the best flexibility for workstation, workstation-like and low-to-midrange server systems with 48 PCIe 4.0 lanes. However, Sforza "only" has half the memory bandwidth of the bigger beasts with "just" 4-channel DDR4 and a single X-bus SMP link, limiting such systems to two CPUs maximum. LaGrange, by contrast, has "only" 42 PCIe 4.0 lanes, but has 8-channel DDR4 and double the X-bus, making a 4-CPU system possible. LaGrange systems are already in use by Google and Rackspace for their Zaius/Barreleye designs. With SMT-4 and 22 cores, such a system could max out at a whopping 352 threads and would very clearly be positioned against AMD's offerings.

As an aside, those of you who know the entire Nimbus family may be wondering where Monza fits in the Talos product line, and our opinion is that currently it doesn't. While Monza has 8-channel DDR4 and, more relevantly, the best OpenCAPI and NVLink interconnect bandwidth of the three chips, making it an excellent choice for large multinode systems like the gargantuan Summit supercomputer, it pays the price with just 34 PCIe 4.0 lanes. Raptor's current product line wouldn't seem a good fit for its more limited expandability, and Raptor systems aren't currently cheap enough to realize Monza's strength in clustering.

The mention is unofficial and no other details are available, including specs, price or release date, but we'll keep watching.

Making your Talos II into an IBM pSeries


This post has been updated with new information. Thanks to Zhuowei Zhang, the author of the post we reference, for pointing out QEMU did add SMP support for emulated pSeries hardware. Read on.

In our previous series on turning your Talos II into a Power Mac, we spent most of our time with the KVM-PR virtualizer, the "problem state" version of KVMPPC, which is lower performance but has no hardware dependencies and can emulate a great number of historical Power CPUs (including the G3 and G4, which were of most relevance to those articles).

Recently, however, someone pointed me to this blog post on running IBM's proprietary AIX operating system under QEMU and asked about how well this would work on the Talos II. AIX runs on IBM's own POWER hardware and thus affords a good opportunity for exploring KVM-HV, the hardware-assisted hypervisor flavour of KVMPPC, so let's find out.

Parenthetically I should say that I have a very long history with AIX: my first job out of college in 1997 was mostly working on a medium-size PA-RISC university server running HP-UX 10.20, but we also had a number of RS/6000 machines for E-mail running AIX 3.2.5 that I had access to as well. The RS/6000s are, of course, early implementations of the POWER architecture. In 1998, I ended up with an Apple Network Server 500 running AIX 4.1.4 (and later 4.1.5) that became the first floodgap.com until it was decommissioned in 2012. Its replacement was a 2-way SMT-2 IBM POWER6 p520 Express running AIX 6.1 TL.mumble with some hand-rolled patches, and this system still runs floodgap.com and gopher.floodgap.com today. I also have a couple of the oddball PowerPC ThinkPads, a ThinkPad "800" whose SCSI controller fuse got blown by a SCSI2SD upgrade, and a fully functional ThinkPad 860 with a German keyboard running AIX 4.1.5 as well.

I should also add that the licensing situation with AIX on non-IBM hardware is sticky. I may give the lawyers a heart attack with this oversimplification, but the salesdroids I worked with back in the day essentially had the rule that if you own IBM hardware that can run AIX, then you may run it, because you were considered to have an implicit license simply by possessing the hardware. This situation changed after IBM introduced pSeries hardware that was not allowed to run AIX, starting with the original POWER5 OpenPower machines: even though they are IBM hardware, they are not licensed for AIX, even though you allegedly could coerce AIX to run on at least a subset of these machines with some work.

This handwavy "some work" is what QEMU provides. There is enough of a pSeries-like environment to at least boot AIX, though some pieces are still missing and the kernel appears able to detect it's running under QEMU. However, whether it functions or not, it may not be legal to run an AIX installation on an OpenPOWER or PowerNV system like the Talos II even under virtualization because OpenPOWER and non-IBM Power ISA systems are explicitly not licensed for AIX. IBM is unlikely to come after you if you're just playing around with it, but you have been warned.

First of all, make sure your system is able to run QEMU under virtualization. You should be running at least kernel version 4.18 (my Fedora 28 T2 has 4.18.16) and QEMU 3.0. Check that kvm_hv shows up in lsmod to make sure it has loaded; if it hasn't, try sudo modprobe kvm_hv (check dmesg if you get errors). You shouldn't need to make any modifications to it for this tutorial. There shouldn't be any problem if your kernel boots in HPT mode instead of radix MMU mode, as mine does to enable KVM-PR.
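
In concrete terms, the preflight checks look something like this (a minimal sketch; the QEMU binary name assumes a system-emulation build is on your PATH):

uname -r                        # expect 4.18 or later
qemu-system-ppc64 --version     # expect QEMU 3.0 or later
lsmod | grep kvm_hv             # confirm the KVM-HV module is loaded
sudo modprobe kvm_hv            # if not, load it and check dmesg for errors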

Next, get bootable media. Although I have a set of install discs for AIX 7, the version I have is too old to boot on POWER9 systems (it's intended for when I get around to it with my POWER6), so for this demonstration we'll simply use the diagnostic image that the author of the blog post above uses. Although any of the diagnostic images compatible with POWER9 will work, download the CD72220.iso image to use the patch tool that author offers. This enables you to boot to a limited root shell to snoop around the filesystem. I haven't gotten around to updating the patcher for the more recent images, but this one will suffice for our purpose.

QEMU provides a graphical console and USB keyboard, but just like a real IBM system, only specific IBM-supplied devices are supported as the AIX console terminal (my own POWER6 requires a particular IBM USB keyboard and mouse, naturally provided at a confiscatory markup, to drive a console powered by a GXT145 graphics card). Since QEMU doesn't know how to provide these devices yet, we'll tell QEMU to provide an emulated serial terminal connected to one of the emulated system's VTYs instead, which will "just work." This emulated serial terminal is provided in the terminal session you run QEMU from, not the main QEMU window.

AIX will boot under TCG, the built-in JITted CPU emulation system. This is very slow but will demonstrate the speed differential versus running with hardware assistance. The same command line provided in the original blog post will work here too (I recommend keeping verbose booting enabled if you run with TCG so you can be reassured QEMU hasn't frozen); substitute your ISO filename below:

qemu-system-ppc64 -cpu POWER9 -machine pseries -m 2G \
  -serial mon:stdio -cdrom iso/aix-72220-patched.iso -d guest_errors \
  -prom-env "input-device=/vdevice/vty@71000000" \
  -prom-env "output-device=/vdevice/vty@71000000" \
  -prom-env "boot-command=dev / 0 0 s\" ibm,aix-diagnostics\" property boot cdrom:\ppc\chrp\bootfile.exe -s verbose"

When QEMU starts, just stay in the terminal session and minimize its graphical console; you won't be using it. Booting under TCG takes about seven minutes on my 32 thread (dual 4-core SMT-4) Talos II with QEMU built with -O3 -mcpu=power9. As the original author indicates, the boot will stall for some minutes (about six on my system) at the define_rspc step. You'll also notice four-digit hex codes appearing at the bottom of the terminal session representing the state of the bootloader which any AIX admin will recognize (real IBM hardware and the Apple Network Server display this on a front LCD or LED panel). Once the system prompts you to press 1 and press ENTER, do so, and it will either enter the diagnostics menu or the root shell depending on if you're using the patched ISO or not. This is sufficient to show it basically works but you will already appreciate this is dreadfully slow for any task of substance.

So, kill the QEMU process (or close the graphical console window) and let's bring it up with KVM-HV this time. SMP is supported, so let's give it four cores while we're at it to start with. You can continue to use a verbose boot if you want but this starts up so quickly you'll probably just find the messages annoying. As above, substitute your ISO filename below (if you get an error saying that the KVM type isn't supported and you know that kvm_hv is loaded, try booting it with just accel=kvm):

qemu-system-ppc64 -M accel=kvm,kvm-type=HV -cpu host -smp 4 \
  -machine pseries -m 2G -serial mon:stdio \
  -cdrom iso/aix-72220-patched.iso -d guest_errors \
  -prom-env "input-device=/vdevice/vty@71000000" \
  -prom-env "output-device=/vdevice/vty@71000000" \
  -prom-env "boot-command=dev / 0 0 s\" ibm,aix-diagnostics\" property boot cdrom:\ppc\chrp\bootfile.exe"

Notice that we are using -cpu host. KVM-HV only supports virtualizing the actual CPU itself or the generation immediately before (-cpu power8 thus should work, but not -cpu power7 or before).

Once started, this virtualized boot shoots straight to the "press 1 on console" message in about 50 seconds on my box (!!), and all the way to the diags menu/root shell prompt in just under one minute. Much faster! As you explore the command line, do note that there are many missing binaries in the miniroot the diags disk provides and the terminal emulation (and my delete key: I manually backspaced with CTRL-H) have many glitches. This is to be expected since this disc was never meant to provide a shell environment and the components of the miniroot exist only to support the diagnostics front end. (In addition, it is not possible to actually configure the terminal correctly from the diags menu and therefore do anything useful, probably due to missing support in QEMU. Even if you enter a valid terminal type, the diagnostics front end will continue to complain the terminal was improperly initialized and prevent you from doing anything further.)

Nevertheless, once you get a root shell up, it's interesting to compare lsattr -E -lsys0 on real IBM hardware and on this emulated system. On my POWER6, here are some selected entries (I censored the system ID from the hardware VPD, nothing personal):

ent_capacity 2.00 Entitled processor capacity
frequency 2656000000 System Bus Frequency
fwversion IBM,EL350_149 Firmware version and revision levels
modelname IBM,8203-E4A Machine name
systemid IBM,{censored} Hardware system identifier

But some values are definitely different (and occasionally abnormal) on the emulated pSeries system. Some are even missing outright despite having a placeholder. Here are the corresponding ones from our virtualized 4-core box:

ent_capacity 4.00 Entitled processor capacity
frequency System Bus Frequency
fwversion SLOF,HEAD Firmware version and revision levels
modelname IBM pSeries (emulated by qemu) Machine name
systemid Hardware system identifier

The difference in entitled processor capacity is due to our command line options, but the CPU frequency is oddly unreported and the various other identifiers have different values or are unpopulated. This is possibly how the kernel was able to detect it's running under virtualization.

If you're curious what other hardware support is present, lsdev looks like this (with the given command line):

# lsdev
L2cache0   Available       L2 Cache
cd0        Available       N/A
mem0       Available       Memory
pci0       Available       PCI Bus
proc0      Available 00-00 Processor
proc8      Available 00-08 Processor
proc16     Available 00-16 Processor
proc24     Available 00-24 Processor
rcm0       Defined         Rendering Context Manager Subsystem
sys0       Available       System Object
sysplanar0 Available       System Planar
vio0       Available       Virtual I/O Bus
vsa0       Available       LPAR Virtual Serial Adapter
vscsi0     Available       N/A
vty0       Available       Asynchronous Terminal

The (in)famous AIX smit system configuration tool can be made to work from the command line; try something like TERM=vt100 /usr/bin/smitty to start it. As we say in the biz™, "smit happens."™ Use CTRL-L to repaint the screen if needed; if you see key combinations like "Esc+0," press ESC, release it, and then quickly press the second key. Note that this version of smit is missing quite a few screens and not everything does anything.

To bring down the system cleanly, not like it really matters here, just type exit at the shell, eject the virtual CD if you want to (Y or N), and then indicate to halt the system (H). AIX will respond with Halt completed and QEMU will automatically exit.

IBM used to be a lot more interesting with AIX. AIX 4 in particular offered a lot of workstation features and even a few games (my ANS 500 has AIX ports of Quake and Abuse on it), but modern versions are intended as buttoned-down server OSes and any client functionality is either accidental or secondarily grafted on. That said, after AIX 5L it got a lot easier to build stuff on AIX (either with xlc or gcc) and my full-service POWER6 (web, gopher and E-mail) runs a good collection of servers and utilities I ported myself plus all my old binaries I built on the Apple Network Server without comment. AIX is definitely different (and arguably staid and humourless) and its underpinnings such as the ODM may not be immediately familiar, but it's a tough OS that can take punishment and run like a tank, and I have to admit that I do love the jackboots. Despite having my own real hardware, it is fun to see it boot and run on the Talos even if only in a limited sense.