And now a bird that's not a Raptor: the H3 Falcon II

A post on the OpenPOWER blog caught my eye about the H3 Falcon II. This is not Raptor hardware; in fact, it's technically not even a computer, but rather a big rackmountable box of up to 16 GPUs. The Falcon II is PCIe 4.0 based and supports up to 31.5 GB/s of bandwidth on each of its x16 GPU slots, with four host interconnects that can share them on demand. Deep learning and plenty of other GPU-heavy workloads would obviously be very interested in that much power, especially since it can hopefully be allocated dynamically as processing needs require.
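That 31.5 GB/s figure isn't arbitrary; it falls straight out of the PCIe 4.0 spec. As a quick back-of-the-envelope sketch (using the standard 16 GT/s per-lane rate and 128b/130b line coding):

```python
# Sanity-check the quoted 31.5 GB/s for a PCIe 4.0 x16 slot (one direction).
RAW_RATE = 16e9        # PCIe 4.0 signaling rate: 16 GT/s per lane
ENCODING = 128 / 130   # 128b/130b line-code efficiency
LANES = 16

bytes_per_lane = RAW_RATE * ENCODING / 8      # bits -> bytes per second
slot_bandwidth_gbps = bytes_per_lane * LANES / 1e9

print(f"PCIe 4.0 x16: {slot_bandwidth_gbps:.1f} GB/s each way")
# -> PCIe 4.0 x16: 31.5 GB/s each way
```

Which is exactly the per-slot number H3 quotes.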

POWER9 systems are well positioned to take advantage of this kind of hardware due to their prodigious I/O capacity and full support for PCIe 4.0, but although the machine is shown with Tesla V100 GPUs, the PCIe version of that card currently supports "just" PCIe 3.0. AMD does have PCIe 4.0 GPUs, the data center-oriented Radeon Instinct MI60 and MI50, but let's not forget the Tesla has one other trick up its sleeve: NVLink 2.0, providing up to 150 GB/s of bandwidth each way, which the POWER9 also supports directly. However, the Falcon II doesn't seem to offer NVLink.
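To put those interconnects side by side, here's a rough comparison of per-GPU, one-direction bandwidth using the commonly quoted specs (PCIe at its spec rates with 128b/130b encoding, NVLink 2.0 as six 25 GB/s links):

```python
# Approximate one-direction bandwidth per GPU for each interconnect.
def pcie_x16(gt_per_s):
    """x16 slot bandwidth in GB/s, assuming 128b/130b encoding."""
    return gt_per_s * (128 / 130) / 8 * 16

pcie3 = pcie_x16(8.0)    # PCIe 3.0: 8 GT/s per lane, ~15.8 GB/s
pcie4 = pcie_x16(16.0)   # PCIe 4.0: 16 GT/s per lane, ~31.5 GB/s
nvlink2 = 6 * 25.0       # NVLink 2.0: six links at 25 GB/s = 150 GB/s

print(f"NVLink 2.0 is ~{nvlink2 / pcie3:.1f}x a PCIe 3.0 x16 slot")
print(f"and ~{nvlink2 / pcie4:.1f}x a PCIe 4.0 x16 slot")
```

So even with PCIe 4.0 cards in the slots, NVLink 2.0 would still hold roughly a 5x edge in raw bandwidth, which is why its absence here is worth noting.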

The Falcon II is definitely an interesting unit, and for POWER9-based datacenters tackling really heavily compute-bound learning tasks it could be a more economical way of sharing lots of powerful GPUs between multiple nodes. It's not likely to achieve its fullest potential until PCIe 4.0 GPUs are more common, and it lacks the flat-out crushing bandwidth of NVLink 2.0, but so far NVLink isn't shareable in the way this is, and the Falcon II's NVMe-to-GPU link is truly innovative. That said, if you're in the kind of AI stratosphere where you would actually be cramming 16 $10,000 GPUs into one box, economy probably isn't your most pressing concern.