Bettering the BMC and bring-up


Pretty much every computer these days has a service processor of some sort for bringing up the system and the main CPUs, just sometimes under different names (the Intel Management Engine could be considered a form of one, albeit with a lot more black boxes tacked on). So do POWER9 systems from the smallest Blackbird to the biggest IBM E980, and for many of these systems that service processor is the BMC, or the baseboard management controller. (Systems capable of PowerVM use one or more PowerPC (405?)-based Flexible Service Processors, or FSPs, which are accessed over ASMI. This includes big Power boxes like the E980 and even some z/Machines but also many pre-OpenPOWER systems like the 8203-E4A POWER6 that runs Floodgap. Since this is not particularly relevant to OpenPOWER, I won't talk about it further in this article.)

BMCs are present on multiple systems from multiple manufacturers, including many POWER8s and, most relevant to us, the POWER9 Romulus reference platform on which the Raptor Talos II family (including the Blackbird) is based. Most of these have AST BMCs, typically either the ASPEED AST2400 or AST2500; all shipping Raptor systems use the AST2500. This is an 800MHz ARM11 (ARM1176JZS) CPU with a secondary 200MHz ColdFire core, a built-in 2D framebuffer, USB and dual GigE MACs. Among its many tasks it controls the power rails, provides firmware stored in the PNOR flash for the main system CPUs, provides services to other autonomous subsystems over IPMI, receives temperature data from the main system CPUs' On-Chip Controller (OCC) and manages fan and environmental controls during normal operation. During bring-up it is the main processing element before the primary CPU(s) are enabled; it is what signals the Power cores to execute from their burned-in OTPROM, which then transfers control to the self-boot engine (SBE) firmware in SEEPROM.

Besides the fact that the BMC has overwhelming ability to affect virtually all system components, including while the system is running, the BMC also directly influences how quickly the system can be brought up. To improve the security and auditability of BMC-based systems, the BMC in just about every OpenPOWER machine runs OpenBMC (not to be confused with the identically-named and similarly functioning Facebook OpenBMC), a small open-source Linux distribution tailored to its unique tasks.

The open availability of both the OpenPOWER firmware and OpenBMC itself is what really makes our systems truly ours from the firmware up. You can download and audit these critical pieces, including Raptor's Talos OpenBMC, and you are encouraged (and actively supported) to build and install your own. The problem, however, is that OpenBMC is relatively slow to come up when power is applied, and the machine can't be started until it does. On a server which is normally up this is generally unimportant; my POWER6 gets rebooted pretty much only when the backup power fails. Similarly, this T2 is usually running all the time. My Blackbird, on the other hand, boots when the projector is turned on and gets shut down when I'm not in the home theatre room. With over two minutes to get from turning on the power strip to a Fedora login, and almost a full minute of that just to get the ability to start main power, this is a major drag and harms the ability to dogfood POWER9 in smaller applications. There is also the small but non-zero risk that a power failure during access to the flash could brick the system. The longer the bring-up time, the longer that potential window of vulnerability.

Fortunately, it looks like further advancements are now finally making a dent in the BMC bring-up delay. Almost 25% of the boot time on a Witherspoon (AC922) system was shaved off by converting the mapper service from Python to C++, and further savings were realized with straightforward wins such as eliminating other older Python components and adjusting the priority of the system service. Another big winner was apparently moving to dbus-broker, which is D-Bus compatible but higher performance. With all of these, the OpenBMC bring-up time on their Witherspoon box has been reduced substantially, and the upcoming AST2600 is reportedly three to four times faster.
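
If you want to see where the time goes on your own BMC, here is a rough sketch in Python. It assumes your BMC image includes `systemd-analyze` and that you have pasted the output of `systemd-analyze blame` into a string. The service names and timings in `SAMPLE` are invented purely for illustration and are not measurements from any real system, and `parse_blame` is my own helper name, not part of OpenBMC.

```python
import re

# Matches one time component such as "1min", "12.345s" or "250ms".
_PART = re.compile(r"(\d+(?:\.\d+)?)(min|ms|s)")
_SECONDS_PER_UNIT = {"min": 60.0, "s": 1.0, "ms": 0.001}

# Hypothetical blame-style output; names and numbers are made up.
SAMPLE = """\
  1min 2.500s hypothetical-mapper.service
      12.345s hypothetical-network.service
        250ms hypothetical-logging.service
"""


def parse_blame(text):
    """Return (seconds, unit_name) pairs from blame-style text, slowest first."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # blank or malformed line
        unit_name = fields[-1]
        seconds = 0.0
        ok = True
        # Everything before the unit name should be time components.
        for token in fields[:-1]:
            match = _PART.fullmatch(token)
            if match is None:
                ok = False
                break
            seconds += float(match.group(1)) * _SECONDS_PER_UNIT[match.group(2)]
        if ok:
            rows.append((seconds, unit_name))
    rows.sort(reverse=True)
    return rows


if __name__ == "__main__":
    for seconds, unit_name in parse_blame(SAMPLE):
        print(f"{seconds:8.3f}s  {unit_name}")
```

Keep in mind that services start in parallel, so these per-unit times overlap and do not add up to wall-clock boot time. The list is only a hint about which units are worth a closer look.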

This is a nice improvement even if it's probably most of the low-hanging fruit, and the OpenBMC team should get a solid thumbs up for the work here. I look forward to this appearing in a future firmware update for the T2 family. However, OpenBMC start time is just one (albeit significant) piece of the startup puzzle: once main power is on, from the time the Blackbird boot screen appears (i.e., when it IPLs Hostboot) through Skiboot to the Petitboot menu, the delay is still a hair over one minute, which compared to other platforms still seems way too long. Much of the time spent seems to be in Hostboot before Skiboot even gets initialized, but even Skiboot adds some overhead. Again, if you're like me and this is your primary computer, you won't deal with this often. But there are lots of Blackbirds and T2 Lites out there which are sidecars, and while this is an obvious first world problem it's still a usability penalty to be paid.
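
To put numbers on your own machine, one approach is to note with a stopwatch when each milestone is reached and turn those readings into per-stage durations. The sketch below does that arithmetic. The milestone names follow the stages described above, but the timestamps are placeholders I made up so the example runs; they are not measurements, and `stage_durations` is just a name I chose.

```python
def stage_durations(milestones):
    """Given (name, seconds_since_power_applied) pairs in boot order,
    return (name, duration_seconds, share_of_total) for each stage."""
    if not milestones:
        return []
    total = milestones[-1][1]
    result = []
    previous = 0.0
    for name, reached_at in milestones:
        if reached_at < previous:
            raise ValueError(f"milestone {name!r} is out of order")
        duration = reached_at - previous
        share = duration / total if total else 0.0
        result.append((name, duration, share))
        previous = reached_at
    return result


# Placeholder stopwatch readings in seconds; substitute your own.
EXAMPLE_MILESTONES = [
    ("BMC ready, main power can be started", 55),
    ("boot screen appears (Hostboot IPL)", 60),
    ("Petitboot menu", 125),
    ("OS login prompt", 150),
]

if __name__ == "__main__":
    for name, duration, share in stage_durations(EXAMPLE_MILESTONES):
        print(f"{duration:6.1f}s  {share:6.1%}  {name}")
```

Taking cumulative timestamps rather than separate durations means a single stopwatch run is enough, and a reading entered out of order is caught instead of quietly producing a negative stage.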

None of this is a crippling fault with the platform, but particularly for the workstation market many of us are in, it's suboptimal. Therefore, continued improvement in basics like these makes the liveability of OpenPOWER on the desktop even better than it already is. And these improvements in OpenBMC hopefully should be just the beginning.

Comments

  1. By the way there exists an alternative (not a replacement) to OpenBMC: !BMC (or "bangBMC"). No dbus, systemd, etc.
    Check it out: https://git.anastas.io/dormito/br-blackbird-external/
    It can bring up the platform and do SSH access, but it's (deliberately) not a full replacement for OpenBMC. Many desktop/workstation users don't need all the features of OpenBMC.
    Might even be worth a short article?

    1. That does sound like an interesting alternative, and would definitely be a thought to put on the Bird (I'd probably keep the Talos stock). I'll contact Shawn and see what's up. Thanks!

  2. How noisy is the 8203-E4A? The service manual seems to show 4x squirrel cage/blower fans. I'd love to get my hands on an IBM system with AIX/LPAR but I don't have a basement to put one in.

    1. It can be noisy, especially at startup (the fans must be in operation or it will fail to enter IPL). It does include a baffle which fits onto the rear pegs, but you may not get this with a used machine. The sound meter in the room measures audible background noise at around 55dB though this also includes other systems running, and I usually have the baffle off since it's in a room by itself. With the baffle on it's noticeably quieter and that's how I ran it when I was an apartment dweller.

    2. Thanks! As far as I can tell there aren't any other hobbyists writing about their POWER6 machines (a shame) so your experience is really valuable. 55dB is much better than my SPARC T4-2, which is the first time I encountered a truly loud server.

    3. Yes, the SPARCs can be howlers, at least the Sun/Snoracle boxes I've run into. Don't misunderstand me: the 8203-E4A is hardly silent, certainly not compared to my Talos II which only moans a bit under load. But it's liveable with the baffle.

      I should also note that my observations apply to the tower, which is a 2U on its side in a case. The rackmount variant may be different.
