Many improvements have occurred in Microwatt, the little VHDL Power ISA softcore, so far the easiest way (particularly for us hobbyists) of getting an OpenPOWER core in hardware you can play with. (The logo is not an official logo for Microwatt, but I figured it would be fun to try my hand at one in Krita.) Even though it still has many known and acknowledged deficiencies, it's actually pretty easy to get it up and running in simulation, and easier still on POWER9 hardware, where the toolchain is already ready to go.
I'm no VHDL genius personally, but this seemed like as good a time as any to learn. ghdl is available for most distros, though Fedora 30 and earlier curiously lack it for ppc64le; fortunately, Dan Horák's builds work fine. So let's get the basics up. If you're on F30 as I am, install ghdl from his repo first. These URLs may vary; they are what was current at the time of this article.
% sudo dnf install https://copr-be.cloud.fedoraproject.org/results/sharkcz/danny/fedora-30-ppc64le/01028671-ghdl/ghdl-grt-0.37dev-1.20190820gitf977ba0.fc30.ppc64le.rpm
[...]
% sudo dnf install https://copr-be.cloud.fedoraproject.org/results/sharkcz/danny/fedora-30-ppc64le/01028671-ghdl/ghdl-0.37dev-1.20190820gitf977ba0.fc30.ppc64le.rpm
[...]
For Debian and other distros, install from your package manager as appropriate.
Next, let's install Microwatt and MicroPython and make sure all that works. This is essentially the same demo Anton showed at the OpenPOWER summit. If you are doing this on an inferior x86_64 system (or at least something that isn't POWER8 or POWER9), you will need to have a Power ISA C cross-compilation toolchain installed to properly build MicroPython. Adjust the make -jXX to your number of threads. This sequence of commands will end up with microwatt/ and micropython/ installed in separate directories at the same filesystem depth (in my case, ~/src). Keep it this way because we will be adding one more project at the end.
% git clone git://github.com/antonblanchard/microwatt.git
Cloning into 'microwatt'...
[...]
Resolving deltas: 100% (818/818), done.
% cd microwatt
% make -j24
ghdl -a --std=08 decode_types.vhdl
[...]
% cd ..
% git clone git://github.com/mikey/micropython.git
Cloning into 'micropython'...
[...]
Resolving deltas: 100% (52248/52248), done.
% cd micropython
% git checkout powerpc
Already on 'powerpc'
Your branch is up to date with 'origin/powerpc'.
% cd ports/powerpc
% make -j24
mkdir -p build/genhdr
[...]
MISC freezing bytecode
CC build/_frozen_mpy.c
LINK build/firmware.elf
[...]
% cd ../../../microwatt
% ln -s ../micropython/ports/powerpc/build/firmware.bin simple_ram_behavioural.bin
% ./core_tb > /dev/null
MicroPython v1.11-320-g7747411e9 on 2019-09-14; bare-metal with POWERPC
Type "help()" for more information.
>>> 1+2
3
The simulation is rather slow, made worse by all the copious debugging output (which here is sent to the bitbucket), but it does work as advertised. To make core_tb stop, you will probably need to kill it from another terminal session, depending on your shell. (I had to.)
Let's now turn to adding instructions to Microwatt. Having to manually kill the simulation is annoying, and it would be nice if the simulation could halt gracefully under program control, so we'll implement a wait instruction as an educational example. This instruction is new in ISA 3.0B; the ISA book explains that it "causes instruction fetching and execution to be suspended. Instruction fetching and execution are resumed when the events specified by the WC field [the wait condition, its sole constant parameter] occur." Strictly speaking, the stop instruction would probably have the most authentic semantics ("The thread is placed into power-saving mode and execution is stopped."), but for obvious reasons it is a privileged instruction, since it completely halts that hardware thread until a system reset or other system-level event. It also doesn't take any parameters, so it's not as nice an illustration.
wait's sole supported WC field code is 0b00; this causes the instruction to "[r]esume instruction fetching and execution when an exception, an event-based branch exception, or a platform notify occurs." In practical circumstances, if you execute wait from a userspace program, these events happen all the time and the instruction seems like a no-op.
% more test.c
#include <stdio.h>
int main(int argc, char** argv) {
__asm__("wait 0\n");
fprintf(stderr, "ok\n");
return 0;
}
% gcc -o test test.c
% ./test
ok
However, on a little core doing nothing else, it might well be a terminal instruction sequence. Since we can run it from userspace anyway, let's go ahead and implement a hal-fassed version of it which will cause the simulation to conclude gracefully. This is the diff that does so, applied against ab34c483. Let's analyze it piece by piece.
First, let us note that there's already code in Microwatt for an ungraceful exit, such as when you execute an undefined instruction; this terminates with an error. We could simply use that, but I'd prefer to do something cleaner, so we'll define a new signal for halting.
Next, we will define the opcode format in the instruction decoder. Conveniently, the instructions td and tdi (trap doubleword and trap doubleword immediate, respectively) have a similar encoding where their common constant argument — the "TO" or trap operation bits — occupies the same bit field. (Note that td et al. allow five bits here but wait only takes the two least significant bits with the other three reserved. We will handwave this away since they are invariably encoded as zero.) To get these bits decoded for us, we specify that the first constant argument is encoded as TOO. You can see other encodings for registers and immediates in the surrounding templates.
Next, we tell Microwatt how to identify the opcode. The bit fields for the opcode pieces are simply cribbed from the ISA book.
Next, we add the actual symbols for the instruction and the operation, thus linking them up with the decoder.
Then, we write the operation's logic. For illustrative purposes, since only 0b00 is allowed and other bit combinations are reserved, we will have the simulation assert and ungracefully terminate on other values using the existing code. Otherwise, we set the halted signal.
Finally, we write the code to actually gracefully halt when the halted signal appears, using the built-in VHDL test bench function stop() (coincidentally named, as it happens).
With this patch applied, rebuild Microwatt with a make. To test it, we'll need something that actually executes this instruction, so let's make a simple "hello world" type example using pieces from MicroPython and Microwatt's own built-in "hello world." A small assembly language stub (in both of these examples, head.S) acts as a trampoline into whatever our main() is, detecting if we are running it within QEMU or from the VHDL test bench. However, we won't have a libc and we'll need routines to actually send and receive data with the "serial console" presented by the core. We also need a couple hints for the linker to make a binary we can actually run in the simulator.
I've compiled all of these pieces into a GitHub project, "Microhello," which you can use as a scaffold for your own programs to run on the core. I've tried to make it a little more modular than the Microwatt "Hello World" example as well. Clone it at the same depth as microwatt/ and micropython/, then do make runrun to replace the symbolic link to the MicroPython binary with Microhello:

% git clone git://github.com/classilla/microhello.git
[...]
% cd microhello
% make runrun
cc -I. -g -Wall -std=c99 -msoft-float -mno-string -mno-multiple -mno-vsx -mno-altivec -mlittle-endian -fno-stack-protector -mstrict-align -ffreestanding -Os -fdata-sections -ffunction-sections -c -o build/main.o main.c
cc -I. -g -Wall -std=c99 -msoft-float -mno-string -mno-multiple -mno-vsx -mno-altivec -mlittle-endian -fno-stack-protector -mstrict-align -ffreestanding -Os -fdata-sections -ffunction-sections -c -o build/uart_core.o uart_core.c
cc -I. -g -Wall -std=c99 -msoft-float -mno-string -mno-multiple -mno-vsx -mno-altivec -mlittle-endian -fno-stack-protector -mstrict-align -ffreestanding -Os -fdata-sections -ffunction-sections -c -o build/string.o string.c
cc head.S -c -o build/head.o
ld -N -T powerpc.lds -o build/firmware.elf build/main.o build/uart_core.o build/string.o build/head.o powerpc.lds
size build/firmware.elf
   text    data     bss     dec     hex filename
   6508       0      24    6532    1984 build/firmware.elf
objcopy -O binary build/firmware.elf build/firmware.bin
( cd ../microwatt && rm -f simple_ram_behavioural.bin )
/usr/bin/make run
make[1]: Entering directory '/home/censored/src/microhello'
( cd ../microwatt && \
ln -s ../microhello/build/firmware.bin simple_ram_behavioural.bin && \
./core_tb > /dev/null )
PowerPC to the People
We neatly came to a halt. Yay!
The serial console library is in uart_core.c and a basic implementation of puts() (and strlen()) is in string.c. The main() is very simple. Minus the comments, here is main.c in its entirety:
#include "uart_core.h"
#include "string.h"
int main(int argc, char** argv) {
uart_init_ppc(argc);
puts("PowerPC to the People");
__asm__("wait 0\n");
return 0;
}
The trampoline uses the start of execution to determine what mode to initialize the serial console in, passing that to main() in r3, which in the Power ABI is the first argument to the function (argc). We then puts() the string and execute a wait 0 to terminate. Easy.
To prove the argument is being evaluated, change the instruction to wait 3 and re-run with make runrun. Notice how it terminates:
PowerPC to the People
make[1]: *** [Makefile:16: run] Error 1
If you run ./core_tb (in the microwatt/ directory) without sending the output to /dev/null, you will see the message from our implementation in the log with the invalid wait condition.
Lastly, if you remove the wait instruction entirely and re-run with make runrun, then the test bench will loop forever echoing our string repeatedly, bouncing in and out of our code on the trampoline, until you kill it.
Microwatt is fun, simple, easy to experiment with and a great way to better understand what Power ISA does under the hood. While its performance is no barnburner, as a pedagogical aid it's a great little proof of concept, and it can certainly be the basis for something bigger. In a future article we'll actually synthesize this core and do a little more with it in actual hardware.