CXL is going to eat OMI's lunch


The question is whether that's a bad thing. And as it stands right now, maybe it's not.

High I/O throughput has historically been the shiny IBM dangled to keep people in the Power fold, and was a featured part of the POWER9 roadmap even though some of those promised parts never emerged. IBM's solution to the memory throughput problem was the Centaur buffer used in POWER8 and scale-up Cumulus POWER9 systems (as opposed to our scale-out Nimbus POWER9s, which use conventional DDR4 RAM and an on-chip controller), and then, for Power10, the Open Memory Interface, or OMI, a subset of OpenCAPI. In these systems the buffer acts as the memory controller, accepting high-level commands from the CPU(s), abstracting away the details of where the underlying physical memory actually is, and reordering, fusing or splitting those requests as required. Notoriously, the OMI buffer has an on-board controller, and its firmware isn't open-source.
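To make the buffered approach concrete, here's a toy Python sketch of the general idea (purely illustrative, and not IBM's actual Centaur or OMI firmware logic): the CPU issues abstract requests, and the buffer is free to coalesce them before touching physical DRAM.

```python
from dataclasses import dataclass

@dataclass
class ReadRequest:
    addr: int      # abstract address as seen by the CPU
    length: int    # bytes requested

def fuse_requests(requests):
    """Merge adjacent or overlapping reads into fewer, larger DRAM accesses."""
    fused = []
    for req in sorted(requests, key=lambda r: r.addr):
        if fused and req.addr <= fused[-1].addr + fused[-1].length:
            # This request overlaps or abuts the previous one: extend it.
            end = max(fused[-1].addr + fused[-1].length, req.addr + req.length)
            fused[-1].length = end - fused[-1].addr
        else:
            fused.append(ReadRequest(req.addr, req.length))
    return fused

# Three CPU-side requests collapse into two physical accesses.
print(fuse_requests([ReadRequest(0x1000, 64),
                     ReadRequest(0x1040, 64),
                     ReadRequest(0x3000, 64)]))
```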

But why should the interconnect be special-purpose? Compute Express Link (CXL) defines three classes of protocol: CXL.io, a CPU-to-device interconnect based on PCIe 5.0 with enhancements; CXL.cache, allowing peripheral devices to coherently access CPU memory; and CXL.mem, an interface for low-latency access to both volatile and non-volatile memory. CXL.cache and CXL.mem are closely related, and both are transmitted over a standard PCIe 5.0 PHY. Memory would be an instance of a CXL Type 3 device, implementing both the CXL.io and CXL.mem specifications (Type 1 devices implement CXL.io and CXL.cache and rely on access to CPU memory; Type 2 devices, such as GPUs or other accelerators, implement all three protocols). The memory topology is highly flexible. If this sounds familiar, you might be thinking of Gen-Z, which aimed for an open, royalty-free "memory semantic" protocol; Gen-Z started merging into the Intel-led CXL Consortium in January.
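As a quick reference, here's a small Python sketch of that classification (the dictionary and function names are just descriptive labels for this post, not any official CXL API):

```python
# Which CXL protocol classes each device type implements (illustrative
# summary of the classification, not an official API).
CXL_DEVICE_TYPES = {
    "Type 1": {"CXL.io", "CXL.cache"},              # accelerators without local memory
    "Type 2": {"CXL.io", "CXL.cache", "CXL.mem"},   # accelerators with local memory, e.g. GPUs
    "Type 3": {"CXL.io", "CXL.mem"},                # memory expanders like Samsung's DDR5 module
}

def protocols_for(device_type: str) -> set:
    """Return the CXL protocol classes a given device type implements."""
    return CXL_DEVICE_TYPES[device_type]

print(protocols_for("Type 3"))   # {'CXL.io', 'CXL.mem'}
```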

IBM was part of Gen-Z, but eventually let it dangle in favor of OpenCAPI and OMI, and while it is a contributing member of CXL, this seems to have been a consequence of its earlier involvement with Gen-Z. But really, what's OMI's practical future anyway? So far we've seen exactly one chipset implementation from one vendor, and that implementation has directly harmed Power10's adoption outside IBM's own hardware. OMI promises 25 Gbps per lane at a 5 ns latency, but Samsung's new CXL memory module puts 512 GB of DDR5 RAM on the bus at nearly 32 Gbps per lane. It's a cinch that Power11, whenever it lands on the roadmap, will support at least PCIe 5.0 (or whatever the current revision is by then), and CXL would appear to be a better overlay on that baseline. Devices of all sorts, even GPUs, could share a huge memory pool. A lot more companies are on board, too, which would mean more choices and greater staying power, plus a better likelihood of open driver support as more devices emerge.
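For a rough sense of the raw numbers, here's a back-of-the-envelope comparison in Python, assuming x8 links on both sides and ignoring encoding and protocol overhead (so real-world throughput will be lower on both):

```python
# Rough lane-rate comparison; real throughput depends on link width,
# encoding and protocol overhead, so treat these as ceilings.
OMI_LANE_GBPS = 25.0     # per-lane signalling rate quoted for OMI
CXL_LANE_GBPS = 32.0     # PCIe 5.0 per-lane rate used by CXL
LANES = 8                # assume an x8 link on both sides

def raw_gbytes_per_sec(lane_gbps: float, lanes: int) -> float:
    """Raw (pre-overhead) bandwidth in GB/s for a link of the given width."""
    return lane_gbps * lanes / 8

print(f"OMI x{LANES}: ~{raw_gbytes_per_sec(OMI_LANE_GBPS, LANES):.0f} GB/s raw")
print(f"CXL x{LANES}: ~{raw_gbytes_per_sec(CXL_LANE_GBPS, LANES):.0f} GB/s raw")
```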

There are still some aspects of CXL that aren't clear. Although it's advertised as an open industry standard, there's nothing saying it's royalty- or patent-free (Gen-Z explicitly was, or at least the former), and the download for the specification is gated behind an access agreement. The openness in practice may not be much better than OMI's, either: Samsung's memory device has an ASIC controller, but it may still need a blob to drive it, either internally or as part of the CPU firmware (earlier prototypes used an FPGA), and nothing says another manufacturer's device wouldn't require one as well.

Still, OMI has the growing stench of death around it, and it never got the ecosystem support IBM was hoping for; CXL currently looks like everything OMI was meant to be technologically, and more, and at least so far it isn't substantially worse from a policy perspective. One may easily conclude that, other than as a sop to IBM's legacy customers, there's no technological or practical reason to keep OMI in future IBM processors. With nothing on the horizon likely to change for Power10's firmware, that may be cautiously good news for us when it comes to a future Power11 option.

Comments

  1. Side note: I think OMI = Open Memory Interface, not OpenCAPI Memory Interface

    1. D'oh, I used the old acronym. Though in my defense I note that it still says on the site it's a subset of OpenCAPI.

  2. This is very interesting, let's see if it pays off. Any potential ramifications on performance/watt? I remember that memory is one of the reasons Apple Silicon is so good.
    Also, wow, 512GB in a single module. I'm not that fond of serial RAM, but I guess I can't resist the future forever, just take the least odious path through it.
