Latest Posts

Firefox 67 on POWER


Firefox 67 has been released and to my relief (though I now build smoketest builds of Firefox on ppc64le approximately weekly to find such problems) mostly builds uneventfully. It has a number of nice features, including enhanced content blocking, improved full keyboard accessibility and various performance improvements. The marquee GPU-accelerated WebRender isn't on most Linux systems yet but that's coming soon, hopefully. I haven't experimented with it yet myself but the existing GPU acceleration works fine on this Talos II with the BTO AMD WX7100 card (set layers.acceleration.force-enabled to true in about:config).

That brings up the first catch, because I did say mostly uneventfully: changes to profile handling. If you build from mozilla-release as I do and recommend, you will end up with a "nightly release" version (assuming you don't pass --enable-release, which I advise against right now). Starting with Fx67, nightlies from any tree will try to create a new profile separate from your previous one, though the old one remains intact. You can explicitly select it from the Profile Manager (pass -P), or, if you know already which profile you want to use, you can specify it with -p (on my system the default profile is unimaginatively called default, ergo, -p default).

The second catch is one I haven't figured out the cause of, whether it's a kernel or a Firefox bug, but it periodically throws warnings that look like this in dmesg (this is on a 5.0.x kernel):

[337262.237052] ida_free called for id=170 which is not allocated.
[337262.237089] WARNING: CPU: 6 PID: 12276 at lib/idr.c:519 ida_free+0x114/0x1e0

If you are on a distribution where kernel warnings get converted into notifications (like the Fedora machine I'm typing on), this can be rather obnoxious. If you are badly afflicted, you can temporarily turn them off with these instructions. I haven't found the root cause for it yet and it's hardly a great hardship, but it didn't occur in Firefox 66.
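To gauge how often the warning actually fires, a rough sketch like the following could tally the ida_free lines in saved dmesg output. The function name and the regular expression are my own illustration, written only against the two sample lines above; other kernel versions may word the message differently.

```python
import re
from collections import Counter

# Pattern derived only from the sample warning shown above; other kernels
# may format the message differently (assumption).
IDA_FREE_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+ida_free called for id=(?P<id>\d+) "
    r"which is not allocated"
)

def count_ida_free_warnings(dmesg_text):
    """Return a Counter mapping each unallocated id to how often it was reported."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = IDA_FREE_RE.match(line)
        if match:
            counts[int(match.group("id"))] += 1
    return counts

# The two lines quoted in the post; only the first one matches the pattern.
sample = """\
[337262.237052] ida_free called for id=170 which is not allocated.
[337262.237089] WARNING: CPU: 6 PID: 12276 at lib/idr.c:519 ida_free+0x114/0x1e0
"""
print(count_ida_free_warnings(sample))
```

To run it against a live system, pass in the text captured from dmesg instead of the sample string; on some distributions reading dmesg requires root.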

As for the Firefox JIT for POWER9, I'm still plugging along, but other than a minor pull request to the documentation it's still 100% yours truly working on it. Of the remaining pieces the macro assembler is about 2/3rds written, leaving the low level assembler after that, and then trying to make it build. However, I'm also in the midst of a systems update for TenFourFox, which I still have a commitment to maintain in the short term, so any help will get it in your hands faster. Hopefully the commits make it clear how I'm translating the MIPS backend into POWER9, using all that 3.0B goodness (population count instructions! trailing zero count instructions! load PC in one instruction! it's an assembly language candy store!).

It's been a little while since I posted the .mozconfigs I use, so rather than direct you to old entries I'll just reproduce them here. Note that MOZ_PGO and MOZ_LTO don't seem to work properly and may generate defective binaries, thus their absence, and I explicitly pass --disable-release to an opt build because of various minor problems which hopefully we'll eventually smoke out. Adjust the number of cores as you like; this is a dual 4-core system, so with 32 threads available I reserve 8 to let me still play Descent II during build runs. :)

Debug

export CC=/usr/bin/gcc
export CXX=/usr/bin/g++

mk_add_options MOZ_MAKE_FLAGS="-j24"
ac_add_options --enable-application=browser
ac_add_options --enable-optimize="-Og -mcpu=power9"
ac_add_options --enable-debug
ac_add_options --enable-linker=bfd

export GN=/usr/bin/gn # if you have it
export RUSTC_OPT_LEVEL=0

Optimized

export CC=/usr/bin/gcc
export CXX=/usr/bin/g++

mk_add_options MOZ_MAKE_FLAGS="-j24"
ac_add_options --enable-application=browser
ac_add_options --enable-optimize="-O3 -mcpu=power9"
ac_add_options --disable-release
ac_add_options --enable-linker=bfd

export GN=/usr/bin/gn # if you have it
export RUSTC_OPT_LEVEL=2
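
Since the two files differ in only a few lines, the pair can be sketched as one template. The Python below is purely illustrative (the function name and its parameters are mine, not part of the Mozilla build system): it reproduces the two configurations above and derives the -j value from the thread count minus however many threads you want to hold back.

```python
import os

def render_mozconfig(debug, total_threads=None, reserved_threads=8):
    """Build the text of a .mozconfig matching the debug or optimized file above."""
    if total_threads is None:
        # Fall back to whatever the OS reports; may be None on odd platforms.
        total_threads = os.cpu_count() or 1
    # Never go below one job, even on small machines.
    jobs = max(1, total_threads - reserved_threads)
    opt_flags = "-Og -mcpu=power9" if debug else "-O3 -mcpu=power9"
    lines = [
        "export CC=/usr/bin/gcc",
        "export CXX=/usr/bin/g++",
        "",
        f'mk_add_options MOZ_MAKE_FLAGS="-j{jobs}"',
        "ac_add_options --enable-application=browser",
        f'ac_add_options --enable-optimize="{opt_flags}"',
        # The only option line that differs in kind between the two files.
        "ac_add_options --enable-debug" if debug else "ac_add_options --disable-release",
        "ac_add_options --enable-linker=bfd",
        "",
        "export GN=/usr/bin/gn # if you have it",
        f"export RUSTC_OPT_LEVEL={0 if debug else 2}",
    ]
    return "\n".join(lines) + "\n"

# 32 threads with 8 held back gives the -j24 used above.
print(render_mozconfig(debug=False, total_threads=32))
```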

What's missing in this picture


I've got the case (an inexpensive mATX Silverstone SST-ML03B), I've got the memory (16GB). The PSU, optical drive, wireless keyboard and WiFi should arrive next week. Now, what am I missing? Think think think!

Since the whole idea is a POWER9 system for the more price-sensitive, the trimmings cost about $500 on Amazon (minus tax and shipping) and could probably be found elsewhere for less. I also got in on the 4-core $999 Blackbird bundle special price, so with the 2U HSF and tooling that was $1090 before tax and shipping (now it would be roughly $1380) for a base outlay of about $1600. This is a nice attempt at a barebones 8-core for $1950, also apparently minus tax/shipping. Yes, I know you can get an Intel system for less, so don't even bother posting that. If price is your highest priority, you already know you're in the wrong place, but at least now price can still be a priority for what is a decent libre system regardless.

Obviously the aim for us here in the Floodgap household is to use it as an HTPC and that's how I'll be reviewing it. If you just want it as a workstation or to jam in a closet as a low-end server, you can almost certainly cut this parts list further.

ZombieLoad does not affect POWER9


If it's Tuesday, there must be yet another speculative execution attack debuting with a funny name and this Tuesday's entry is ZombieLoad. ZombieLoad works on the same conceptual basis of observable speculation flaws to exfiltrate data but implements it with a new class of Intel-specific side-channel attacks utilizing a technique the investigators termed MDS, or microarchitectural data sampling. While Spectre and Meltdown attack at the cache level, ZombieLoad targets Intel HyperThreading (HT), the company's implementation of simultaneous multithreading (SMT), by trying to snoop on the processor's line fill buffers (LFBs) used to load the L1 cache itself. In this case, side-channel leakages of data are possible if the malicious process triggers certain specific and ultimately invalid loads from memory -- hence the nickname -- that require microcode assistance from the CPU; these have side-effects on the LFBs which can be observed by methods similar to Spectre by other processes sharing the same CPU core. Other internal buffers of potential value can also be sussed out by related MDS-style techniques.

Because of the limited bandwidth of the LFBs and the effectively streaming nature of the technique, an attacking process can't select arbitrary addresses and therefore can't easily read arbitrary memory. Nevertheless, targeting easily recognizable kinds of data can still make the attack feasible, even against kernelspace. For example, since URLs can be picked out of memory, this apparent proof of concept shows a separate process running on the same CPU victimizing Firefox to extract the URL as the user types it in. As the user types, the values of the individual keystrokes go through the LFB to the L1 cache, allowing the malicious process to observe the changes and extract characters. By its nature there is much less data available to the attacking process but that also means there is less data to scan, making real-time attacks like this more feasible combined with other attacks or social engineering.

However, ZombieLoad is pretty much irrelevant against POWER9 because the LFBs it attempts to monitor are specific to Intel's implementation of HyperThreading (the same holds for really any SMT implementation other than Intel's; the authors of the attack say they even tried other SMT CPUs without success, almost certainly AMD, though it is not stated whether they tested on Power ISA). Even for unpatched Intel machines the actual risk from this (or even most speculative execution attacks, to be sure) is probably limited because it requires running a malicious process to do the snooping and such processes almost certainly have other, more reliable ways of pwning such machines. The decision to patch may simply come down to how much risk you're willing to tolerate: nearly every Intel chip since 2011 is apparently vulnerable and the performance impact of fixing ZombieLoad varies anywhere from Intel's rosy estimate of 3-9% to up to 40% if HT must be disabled completely.

Blackbird shipments start next week


The release of the Blackbird firmware indicates shipping is imminent and Raptor confirms on Twitter that shipments should start next week. Time to get those mATX cases!

Red Hat Enterprise Linux 8.0


The newly Big Blue Red Hat released Red Hat Enterprise Linux (RHEL) 8.0 today, based on Fedora 28. Because it's based on F28, this release of RHEL should "just work" on the Talos II (F28 was the first Fedora to support it), and mostly whatever applies to F28 applies to RHEL 8, including GNOME 3.28, Wayland-by-default and other changes. Although both big and little endian are apparently supported on the trial evaluation images, the documentation says only little-endian is supported, so it's possible not all parts of the site are in sync yet. Still, if you like your Linux corporate (and paying for it), now that Red Hat is an IBM product it's probably as corporate as you can get.

DAWR YOLO mode coming to kernel 5.2


Thanks to Michael Neuling at OzLabs who gave me the heads up and wrote up the patch. One of my pain points doing development on the POWER9 is that hardware watchpoints are disabled at the kernel level. This is because the CPU will checkstop if a watchpoint is set on cache-inhibited memory such as devices, and a checkstop will invariably bring the system down. The formal name for the special purpose register governing this feature (recall that Power ISA has three classes of registers, i.e., general purpose, floating point and special purpose) is the Data Access Watchpoint Register, or DAWR. There is no software workaround for this problem, and because a malicious local user could bring the system down without privileges by managing to provoke such a situation, setting such watchpoints via the DAWR is currently disabled for safety. Unfortunately, software watchpoints are sometimes hundreds of times slower than hardware watchpoints, which for certain debugging tasks (such as JIT code generation) are just about indispensable.

IBM notes this issue as an erratum, which implies they see it as a defect and therefore suggests it will be fixed in hardware in the future (it does not affect POWER8). Until then, Michael's patch enables "DAWR YOLO mode" for those of us (like me) who are single users on a workstation who know what we're doing, need hardware watchpoints to debug our software before the heat death of the universe, and accept the risk of system crashes. It creates a debugfs switch at /sys/kernel/debug/powerpc/dawr_enable_dangerous that enables the superuser to (mostly) freely turn access to the DAWR off and on; see the patch for more details. Fortunately this change has finally been queued for kernel version 5.2, which means I hopefully won't have to screw around with a custom kernel for much longer, and that is very good news for other developers in the same boat. Thanks, Michael!
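
On a kernel carrying the patch, flipping the switch should amount to a write to that debugfs file as root. A minimal sketch follows; the helper names are hypothetical, and writing Y or N is an assumption based on how debugfs boolean files conventionally behave, so check the patch itself before relying on it.

```python
import os
from pathlib import Path

# Path taken from the text above; it only exists on kernels carrying the patch.
DAWR_SWITCH = Path("/sys/kernel/debug/powerpc/dawr_enable_dangerous")

def set_dawr(enabled, switch=DAWR_SWITCH):
    """Write Y or N to the switch (assumed debugfs boolean convention). Needs root."""
    switch.write_text("Y\n" if enabled else "N\n")

def dawr_enabled(switch=DAWR_SWITCH):
    """Return True if the switch currently reads back as enabled."""
    return switch.read_text().strip().upper() in ("Y", "1")

if __name__ == "__main__":
    # os.path.exists() returns False rather than raising if debugfs is unreadable.
    if os.path.exists(str(DAWR_SWITCH)):
        print("DAWR enabled:", dawr_enabled())
    else:
        print("No DAWR switch visible (unpatched kernel, no debugfs, or not root).")
```

Both helpers take the path as a parameter so they can be exercised against an ordinary file before being pointed at the real switch.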

A friendly FPGA reminder


Raptor is seeking beta testers for the upcoming 2.00 Talos II firmware, including a tease for users of the built-in VGA port. This brings up a question: flashing the PNOR and BMC is relatively straightforward (I've done it from my Quad G5), but flashing the FPGA requires a programmer and not all of us are handy with those -- or even have one. Do we need to do that too to try the beta?

Fortunately, Raptor's great support staff responded to my query and said (emphasis mine), "The FPGA firmware is largely independent of the BMC and PNOR. FPGA updates are only released to improve compatibility with PSUs and chassis components encountered in the wild, and at no point will a BMC nor PNOR update be released that is incompatible with the earliest FPGA revisions." That's very reassuring since I have an early T2 that is basically on the same FPGA flash it came from the factory with.

If that's the case, then, when should you update the FPGA? Raptor Support answered that too: "The only time you need to upgrade the FPGA is when you need functionality a new FPGA release provides, for example to activate the VGA disable jumper or to allow the system to boot with a different, previously problematic PSU."

Looks like I've got something to try over the weekend.

Fedora 30 released (and a big Void)


Not to be outdone by the release of Ubuntu 19.04, Fedora 30 has been released as well. We pay special attention to Fedora here at Talospace since this Talos II runs F29. As with our mini-review of F29, we will be doing a similar mini-review of F30 after a couple weeks when the package repositories should be caught up for ppc64le. Chief amongst the updates are GNOME 3.32, gcc 9, bash 5.0 and PHP 7.3; here is the full change set. One disappointment is that 128-bit long doubles did not make this release as previously scheduled and have been held over to F31, which affects building MAME with gcc (see that post for a possible workaround) and a few other things. It's not clear what caused the delay since the issue plagues relatively few packages overall, but it's just enough to be obnoxious when it does. Until then, though, watch for our mini-review once we're ready to update.

Meanwhile, if you like big ends and you cannot lie, the POWER9 Void Linux port can now boot in big-endian mode (the maintainer clarifies: with glibc). With a little bootstrapping help from Adelie Linux, now you can choose best you suits endianness which (deny can't brothers other you). The plan is to get the Void package repos for POWER9 at parity between the big and little endian versions and then let users pick what's most appropriate for their circumstances, including your choice between glibc and musl on both endiannesses. The big-endian version is also planned to have support for the PowerPC G5. For more information, the maintainer now has updated documentation.

A quick trip to IBM OzLabs


(Before we begin: this post was not sponsored, vetted or in any other way written in official collabouration with IBM; I'm writing this up strictly as a Power ISA bigot, er, enthusiast and for no other purpose or consideration.)

Jetlagged greetings from a lovely trip, now back at Floodgap Orbiting HQ in sunny southern California (this post partially composed on Air New Zealand's in-flight WiFi). Many thanks to Hugh Blemings of OpenPOWER, whom I met at SCaLE 17x and who suggested, since I was going to be visiting family in Australia and having a holiday in Canberra, that I head by IBM OzLabs (where he used to be manager). With the kind indulgence of Leonard Low, the current manager, it actually came together and on a warm autumn day last week my wife and I trooped down the National Circuit to the office.

OzLabs has a very long history as a Linux hacker collective and was one of the first commercial labs set up for Linux and Linux software support. In fact, it is rather infrequently reported that a visit to the Canberra Linux User Group, an indirect ancestor of OzLabs, is where the Linux Tux mascot originated (from a 1994 incident at the National Zoo & Aquarium where Linus Torvalds was actually bitten by a penguin; a somewhat inaccurate sign at the Zoo commemorates the injury). Formed at what was then the Australian division of Linuxcare in 1999, for a period of time OzLabs even provided a Linux CD burning service (in exchange for cookies, which my wife calls biscuits) along with Linux software development and support until the division shut down. Hugh was manager for part of this period along with its resurrection under IBM in 2001 as part of the IBM Linux Technology Center, which Leonard manages now.

The current OzLabs location is in an executive building and isn't purpose-built, as Leonard pointed out somewhat apologetically, but still gets the job done. Today the division concentrates primarily on Linux on Power support (including Skiboot and Petitboot, which actually originated from the Cell-based PlayStation 3) in addition to its hosted projects; in fact, Paul Mackerras, the original maintainer of PowerPC Linux, is today a senior technical staff member working on KVMPPC and very politely reviewed my trivial patches for KVM-PR a few months back.

Leonard greeted us at the entry and took us back where the magic happens.

Now, this is IBM, so there's still the corporate face. (I have a story about this: when I was an AIX sysdweeb back in the antediluvian days, we would regularly get visits from IBM salesdroids. However, a few years ago when I tried to buy my own POWER7 hardware personally, I couldn't get any VARs to take my money probably because I didn't need a service contract. I ended up settling for a lightly used POWER6 from a reseller; that box still runs Floodgap today. Now that I have an executive position in a large municipal department, though in a job unrelated to computing, upon hearing this story the IBM salesdroid servicing the municipal account gave me his card and told me to call him any time.) There are meeting rooms and a decent-sized auditorium space, where my wife was talking academia shop with Leonard while I shutterbugged.

A Thomas Watson-esque THINK mural dominates the wall (I have a THINK notepad as a gift from that salesdroid which I use for late night call notes). There's also a small display case nearby with several items, most notably the head and disk assembly from an IBM 3380 circa 1980. This assembly is a Model J with two actuators each accessing about 630MB; the 3380's frame carried two of these assemblies for a grand total of 2.52GB in the AJ4 configuration, making the 3380 the first gigabyte storage device. The larger but slower Model K had 1890MB per actuator for roughly 7.5GB in a fully loaded AK4 frame, weighing a foot-flattening 250kg and pulling 6.6kW of power. This assembly alone weighs 32kg, so hope you had your hernia belt on while installing it. Due to its incredible rotational inertia the spindle was stopped by the equivalent of an automotive disc brake.
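
The capacity figures multiply out directly from the per-actuator numbers. As a quick check, assuming two actuators per assembly and two assemblies per frame as described above, and decimal megabytes and gigabytes (the function name is just for illustration):

```python
def frame_capacity_mb(mb_per_actuator, actuators_per_hda=2, hdas_per_frame=2):
    """Total capacity in MB of one fully populated 3380 frame."""
    return mb_per_actuator * actuators_per_hda * hdas_per_frame

model_j = frame_capacity_mb(630)    # AJ4 frame
model_k = frame_capacity_mb(1890)   # AK4 frame

# 1 GB taken as 1000 MB here (assumption).
print(f"Model J frame: {model_j / 1000:.2f} GB")
print(f"Model K frame: {model_k / 1000:.2f} GB")
```

That gives 2.52GB for the Model J frame and 7.56GB for the Model K frame, in line with the rounded figures quoted.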

One disappointment: no photography inside the work area. Although I actually do hold a US security clearance, export regulations are such that pictures taken inside are not allowed, so to ease Leonard's heartburn all the pictures you see in this blog post were taken outside in the public space and I took no pictures within the secure area. However, I tapped out some copious notes and those I'll share with you.

In the prototype area Leonard showed us examples of Romulus and Witherspoon. The Romulus development reference design you should know very well by now: the Talos II is strongly based on it, and apparently the OzLabs developers like the T2 so much they're ordering more as workstations. (Maybe that's why it's currently backordered at Raptor, grr.) Witherspoon is described in Skiboot documentation as "a POWER9 system with NVLink2 attached GPUs"; it is the direct ancestor of the Monza-based AC922 used as nodes in the Summit and Sierra supercomputers (more at the end). There was also a small system with an FPGA prototype BMC under testing. Amusingly, the prototype room also had some historical items, including otherwise nondescript tower systems based on the PowerPC 604, 750 (G3) and 405, none of them Power Macs, and some workstation hardware I recognized from the beige days.

In the server room a POWER9 Zaius (a/k/a Barreleye G2) system was sprawled out on a table. This is an OpenCompute device developed jointly by Google and Rackspace as a successor to the original POWER8-based Barreleye. Although just 1U tall, the system we saw was too wide for IBM's racks, though I did rather like the removable drive bay. It takes LaGrange CPUs (more on Monza and LaGrange in a second).

We also saw the POWER8 Palmetto and Stratton prototypes in the racks, each in this SilverStone ATX case. The Palmetto design emerged as the Tyan GN70-BP010, the first customer-available OpenPOWER system; Stratton became the S821LC, with its close relative Briggs (get it?) as the S822.

Although the pamphlet I stole it from is dreadfully out of date (2008), since I couldn't photograph it you can get a small idea of the lab from this page out of IBM Australia's then-official brochure (warning: large PDF; usual disclaimers apply). While only some of the staff were there due to the Easter holidays, which was poor planning on my part, a relaxed and skilled atmosphere in the relatively open floor plan was evident. We also spotted the continuous integration display (all builds green!) and a modified xkcd that said "Petitboot" instead.

Michael Neuling, another IBM staffer, kindly provided some public chip samples to photograph and we took them outside the secure area. One of them was this POWER8 wafer which I took from a couple different angles. The two-ply "white" strip is test logic; the dies between them have six cores on a 22nm process. The Turismo POWER8 has one of these and the Murano POWER8 has two.

The POWER9 Nimbus scale-out family in OpenPOWER systems (on top of the POWER8's wafer carrier for size comparison, more or less), as elegantly hand-lettered by Mikey. Sforza is the chip we know and love in the T2 I'm typing this into; as implemented in Romulus it provides the most similarity to existing commodity designs and prioritizes PCIe, offering the most of all three (48 PCIe 4.0 lanes). LaGrange and Monza are in the larger form factors with double the memory channels of Sforza, with LaGrange also offering the biggest XBus bandwidth between processor sockets (two lanes, twice that of Monza and Sforza) and Monza the greatest OpenCAPI/NVLink throughput. Knowing this, it makes sense why Rackspace and Google went with LaGrange for Zaius, but IBM used Monza for Witherspoon/AC922 where GPU attachment mattered more.

At one point Raptor made an off-hand mention of a future LaGrange system, but so far nothing more has been heard.

Finally, a couple more items: the Top 500 certificates for Summit at Oak Ridge National Laboratory, TN and Sierra at Lawrence Livermore National Laboratory, CA, currently ranked numbers 1 and 2 as of this writing. Summit has 4,608 nodes based on the later 6-GPU AC922, each with dual 22-core Monza POWER9 CPUs and six NVIDIA Tesla V100 GPUs on a Mellanox dual-rail EDR InfiniBand network; Sierra has 4,320 nodes of an earlier revision with the same CPUs and four GPUs. Last is one of the many patents from the team, this one from 2015 honouring Andrew Bentley, a senior technology architect.
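
For a sense of scale, the per-node figures quoted above for Summit and Sierra multiply out as follows. This is only arithmetic on the numbers in this post (compute nodes only; the class and field names are mine), not an official tally.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Cluster:
    name: str
    nodes: int
    cpus_per_node: int
    cores_per_cpu: int
    gpus_per_node: int

    @property
    def cpu_cores(self):
        """Total POWER9 cores across all nodes."""
        return self.nodes * self.cpus_per_node * self.cores_per_cpu

    @property
    def gpus(self):
        """Total GPUs across all nodes."""
        return self.nodes * self.gpus_per_node

# Node counts and per-node configurations as stated in the text.
summit = Cluster("Summit", nodes=4608, cpus_per_node=2, cores_per_cpu=22, gpus_per_node=6)
sierra = Cluster("Sierra", nodes=4320, cpus_per_node=2, cores_per_cpu=22, gpus_per_node=4)

for c in (summit, sierra):
    print(f"{c.name}: {c.cpu_cores:,} POWER9 cores, {c.gpus:,} GPUs")
```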

Although I couldn't show you everything, we're still very grateful we could drop by and see how the magic gets into our wonderful OpenPOWER machines. Thanks in particular to Paul, Mikey and Leonard for tolerating our silly questions and disturbing their quiet workplace, to Hugh for getting this all in motion, and to all of the OzLabs inhabitants. We had a lovely time!

On our way out for the rest of our vacation, admiring the view from the auditorium (left: St Andrews Presbyterian Church; right: Parliament House), disturbed only by some officious stuffed shirt from the Attorney General's office who didn't like us photographing in a public street.

Broadcom BCM5719 libre firmware coming real soon


Posting from the Southern Hemisphere today, kudos to reader Mark J who sent in a heads-up on the progress the Ortega project has made on reverse-engineering the Broadcom BCM5719's firmware. If that number sounds familiar, it's because it's the very same NIC in the Talos II and Blackbird (see a photograph) and one of the few places left in the Talos family where binary blobs are required. Apparently, the reverse engineering effort is now believed sufficient to create a clean-room implementation.

And where is that implementation? Why, on GitHub, of course. As of this writing the current firmware is a work in progress, but now that the chip is much better understood it's very likely work will move much more quickly.

The BCM5719 is an interesting chip internally, implementing MIPS II (!) cores that apparently were once part of the receive and transmit machinery but are now mostly relegated to autoconfiguration-like tasks. A MIPS core exists for each port, but only a single application processor engine (APE) is present per chip. Hugo Landau's work on figuring out how to talk to the APE, the most essential component any open-source driver would need to interface with as it implements the sideband interface, is nothing short of heroic. A particularly noteworthy deficiency he discovered was that even though the firmware image for the APE's ARM Cortex-M3 has an RSA signature, nothing actually checks it! That would seem like a terrible rookie mistake on Broadcom's part but it's great news for us. Hearty congratulations to everyone for their hard work on a very necessary project.

ArchLinux on POWER9


Another option is available for Linux on the Talos: Arch Linux. Although Arch is officially x86_64-only, alpha installation ISOs are now available which apparently "just work" on the Talos II. No word yet on available packages but if you like your Linux lean and mean on your machine that cost you much green, you'll like what we've seen. (I'll be here all week. Try the veal.) Update: The maintainer has a site up.