Prefixed instructions and more in the OpenPOWER 64-bit ELF V2 ABI Specification


This is pretty nerdy, but you read this blog, so you have no room to talk. The ABI specification for 64-bit ELF v2, which is virtually everything running little-endian and many big-endian 64-bit Power systems, is now in draft for version 1.5. Although library interfaces are not in spec, the document defines a linking interface for executables and shared objects so that register usage and calling convenion is consistent. Rarely if ever would such updates include breaking changes and the small version number increment implies no large shifts, but those of us who write JITs and compilers pay attention to these updates since they may yield new opportunities to generate better code, and they also occasionally shed light on new or upcoming instructions.

Besides incorporating errata from the previous version 1.4, and moving vector programming to a separate document, the most interesting change here is related to POWER10. We've mentioned prefixed instructions briefly here before, a new class of load/store operations available in POWER10 and PowerISA 3.1 that effectively create 64-bit instructions: a first half (prefix) containing up to 18 bits of a displacement, and a second half (suffix) containing the lower 16. This enables to you to express as many as 34 bits of displacement for a memory address with an instruction like pld (as opposed to the 32-bit classical instruction ld with a 16-bit displacement); previously the maximum you could indicate was 26 bits, and only for instructions like unconditional branches that allowed it. However, the R bit in the prefix allows you to use the address of the prefix itself as the register against which the displacement is added, rather than having to have a general purpose register do it or (for non-local data) the dedicated TOC register. (Recall that in Power ISA the program counter is not a general purpose register.) This is actually a big deal for things like constant pools (embedding constants directly into many RISC-style instructions is generally unwieldy) because now you can just squirt local data right into whatever function fragment you're generating instead of having to keep track of it separately. This is mentioned briefly in the ISA 3.1 manual but the ABI spec makes it more prominent as a feature.

Unfortunately, various complexities in instruction dispatch make this less useful than it would appear to be. Prefixed instructions cannot be split over 64-byte (not bit!) instruction address boundaries or else an alignment exception occurs, which even if the OS handles it for you would be expensive. On the other hand, the CPU is clearly treating them as two 32-bit pieces because the prefix is always followed by the suffix regardless of the endianness (i.e., in little endian mode, the suffix is not in front of the prefix), and there are also debugging irregularities in that some suffixes are actually regular instructions that mean something different when used as a prefix. These can be detected by looking at bit 6 of the prefix (not the suffix), which if set indicates the suffix is a valid instruction that the prefix changes the behaviour of, but one wonders if more changes are to come and we don't need any more mscdfr (means something completely different for r0)-type situations in the ISA. A great example is pnop (yes, a prefixed no-operation instruction): you'd think the suffix would be ignored, and it mostly is, except if it's a branch instruction, rfebb, any context synchronizer other than isync or a service processor attention instruction. The ISA 3.1 book benignly says, "This restriction eases hardware implementation complexity." Well, thanks a lot! Does your head hurt too?

Again, I'm not a fan of introducing variable length instructions into what was a fairly regular instruction set and there are many important gotchas which to me seemed avoidable, but the displacement features are welcome and it makes certain on-the-metal programming tasks easier. Always watch these deceptively boring documents closely because there are sometimes valuable signals in their changelogs. Unfortunately, until the situation with POWER10 and OMI gets worked out, this is of largely academic interest.

Comments