More cores for Mesa llvmpipe


On our open platforms we've long bemoaned having to deal with closed firmware to get good graphics performance (or indeed any 3D support at all: the built-in ASpeed BMC, though it has open firmware, only provides a 2D framebuffer).

While various alternatives like Libre-SoC continue development, the only 3D solution right now for a system that wants to run entirely open is a software rasterizer like llvmpipe. Even though it supports ppc64le, its performance has historically not been great on our systems (see my poor struggling 4-core Blackbird running Xonotic at 1080p on the right). Fortunately, a modest but noticeable improvement is landing which should help. It turns out llvmpipe had a hard cap of 16 threads, meaning everything but the smallest 4-core Blackbird and T2 Lite machines was going underutilized, so the cap has now been raised to 32.

This doesn't double graphics performance: as a developer notes in the thread, there are other bottlenecks that serialize the output, so the effective improvement going from 16 to 32 on a system with sufficient threadroom is about 10%. Also, on a smaller system the renderer will only use up to the maximum number of threads available no matter what the cap is set to. Still, if you have the cores this gets you another frame per second or two, so that's not nothing. Best guess is this will come out as part of Mesa 22.3; props to Luke Dashjr, who noticed the hard cap, and Jeremy Rand, who got the patch landed to raise it.
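To make the cap arithmetic concrete, here is a minimal sketch of how many threads the renderer could use on a few system sizes. It is not Mesa's actual code: the function names are made up for illustration, and the 4 hardware threads per core is an assumption inferred from the 176-thread figure for a dual-22 system.

```python
# Illustrative sketch only, not taken from Mesa.
# Assumption: 4 hardware threads per core (SMT4), inferred from the post's
# figure of 176 threads on a dual 22-core system.
THREADS_PER_CORE = 4


def hardware_threads(sockets, cores_per_socket):
    """Total hardware threads for a given socket and core count."""
    return sockets * cores_per_socket * THREADS_PER_CORE


def rasterizer_threads(available, cap):
    """Threads the renderer can use: limited by both the cap and the hardware."""
    return min(available, cap)


# Hypothetical example configurations: (sockets, cores per socket).
systems = {
    "single 4-core": (1, 4),
    "dual 8-core": (2, 8),
    "dual 22-core": (2, 22),
}

for name, (sockets, cores) in systems.items():
    hw = hardware_threads(sockets, cores)
    print(
        f"{name}: {hw} hardware threads, "
        f"{rasterizer_threads(hw, 16)} used at cap 16, "
        f"{rasterizer_threads(hw, 32)} used at cap 32"
    )
```

The 4-core case comes out the same under either cap, which is why only the larger machines stand to gain from the change.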

The next logical question is how far to turn up the volume knob (or, alternatively, why there's a hard cap at all, other than to avoid recruiting too many execution units when the improvement is expected to be minor). While I can't answer the second question, Jeremy is looking for someone willing to try ramping the patch up to 176 threads, the maximum number of hardware threads available on a dual-22 system. Such monsters do exist in the hands of enthusiasts, although it would also be good to see how it performs on smaller systems (regrettably my dual-8 is my daily driver or I'd have tried this already, so I'm deferring it until I next have to mess with the guts). If you're able to recompile your own local copy of Mesa with this change, post in the comments what you observe (there's a benchmark script on the Raptor wiki you can use to get the performance delta).
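If you end up with before and after frame rates and want to report them as a percentage, a small helper along these lines would do. This is only a sketch and is unrelated to the benchmark script on the wiki; the function name and the sample numbers are placeholders, not measurements.

```python
def percent_delta(baseline_fps, patched_fps):
    """Relative change from baseline to patched, expressed in percent."""
    if baseline_fps <= 0:
        raise ValueError("baseline_fps must be positive")
    return (patched_fps - baseline_fps) * 100.0 / baseline_fps


# Placeholder values for illustration only; substitute your own results.
baseline = 20.0  # frames per second with the original cap
patched = 22.0   # frames per second with the raised cap

print(f"Change: {percent_delta(baseline, patched):+.1f}%")
```

When posting results, it helps to include the thread count and resolution alongside the percentage so runs on different machines can be compared.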

Comments

  1. Is there a good summary anywhere of how LLVMpipe compares to a GPU architecture? Is the bottleneck in emulating the GPU cores and how they share memory, or is it more in implementing the fixed-function parts of a GPU ASIC that are not simply general-purpose computation run in parallel?
    For example, I think there was a GPU design that had high core count, but limited geometry engines, so GPGPU (like OpenCL) was fine, but more traditional GPU tasks like drawing triangles suffered.

    1. Jeremy here. LLVMpipe is a very high-level implementation of the OpenGL API; it doesn't attempt to emulate the hardware or architecture of any specific GPU. Similarly, Lavapipe is a very high-level implementation of Vulkan (it translates Vulkan into Mesa's internal Gallium calls, which LLVMpipe then executes); it also doesn't resemble real GPU designs. AFAIK trying to emulate a real GPU on a CPU would be much less efficient, just as emulating an x86_64 CPU on POWER9 and running Linux on that is much less efficient than running Linux natively on POWER9.
