PowerPC in an FPGA
Posted: September 13, 2024

Introduction

Back in the early 1990s, IBM, Motorola, and Apple released the PowerPC CPU. It originally replaced the 68000 in Apple Macintosh computers, but it was also used in systems such as the Nintendo GameCube, Wii, and Wii U, the PlayStation 3, and the Xbox 360.

Here's an FPGA project I'm calling the UnderpoweredPC. This is a minimal implementation of PowerPC in Verilog for an FPGA. It started as a copy of the RISC-V FPGA project, mostly just changing the core to PowerPC. The lcd.asm test program from RISC-V was converted to PowerPC. This program blinks an LED and, when a button is pushed, generates a Mandelbrot on the small LCD display.
Source Code

git clone https://github.com/mikeakohn/powerpc_fpga.git

Explanation

A few years ago I added PowerPC support to naken_asm. Since I've been playing around with implementing some CPUs in Verilog, I figured PowerPC might be an interesting one. I started this on a plane ride (along with a few others, unfortunately). The code is a fork of the RISC-V FPGA project I did earlier. The two are pretty much the same, except that the memory interface was changed to big endian and the RISC-V instruction decoding was replaced with PowerPC decoding.

In some ways I found this architecture pretty slick and straightforward, but in other ways... I wish I could ask IBM's engineers what they were thinking.

First, some positives: the instruction formats are pretty clean. The top 6 bits are always a "main" opcode that tells what the instruction is. The destination register (rd) is always encoded in the same bits [25:21], the ra register is always in bits [20:16], and rb is always in bits [15:11]. Other flags, such as oe (for updating the XER register) and rc (for updating the CR register), are also always in the same spot.

Now some of the ugly. The first thing that comes to mind is the complexity of the condition flags. Two registers hold condition information: the CR register, which has eight 4-bit sets of condition flags (cr0 to cr7, each with LT, GT, EQ, SO), and the XER register, which holds C, OV, and SO. C is carry, OV is overflow, and SO is summary overflow, which gets set to 1 if OV is 1 but doesn't get cleared until the user clears it. These condition flags usually don't get set unless the user sets the OE or RC bits, which is basically the difference between add, add., addo, and addo.: the instructions ending in o let the XER be updated, and the . lets the CR0 flags be updated.
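The fixed field positions and the add/add./addo/addo. flag rules can be sketched in a few lines of Python. This is a minimal model under stated assumptions, not the project's Verilog: the helper names (`field`, `ibm_field`, `execute_add`, `Flags`) are hypothetical, only the add family (primary opcode 31, extended opcode 266 in bits [9:1]) is handled, the carry bit is not modeled, and the flags are plain booleans rather than bits inside the real CR and XER. `ibm_field` performs the same extraction using the manuals' MSB-0 numbering, where bit 0 is the leftmost bit, so rd (bits [25:21] here) is listed there as bits 6-10.

```python
from dataclasses import dataclass

MASK32 = 0xFFFFFFFF


def field(word: int, hi: int, lo: int) -> int:
    """Bits [hi:lo] of a 32-bit word, numbered LSB-0 (bit 0 is the rightmost)."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def ibm_field(word: int, start: int, end: int) -> int:
    """The same extraction using IBM's MSB-0 numbering (bit 0 is the leftmost)."""
    return field(word, 31 - start, 31 - end)


def to_signed32(x: int) -> int:
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


@dataclass
class Flags:
    # Hypothetical flat view of the XER and CR0 bits this sketch touches.
    xer_ov: bool = False
    xer_so: bool = False  # summary overflow: sticky once set
    cr0_lt: bool = False
    cr0_gt: bool = False
    cr0_eq: bool = False
    cr0_so: bool = False


def execute_add(word: int, gpr: list[int], flags: Flags) -> str:
    """Execute add/add./addo/addo. and return the mnemonic that was run."""
    if field(word, 31, 26) != 31 or field(word, 9, 1) != 266:
        raise ValueError("not an add-family instruction")
    # Register fields always sit in the same bit lanes.
    rd, ra, rb = field(word, 25, 21), field(word, 20, 16), field(word, 15, 11)
    oe, rc = field(word, 10, 10), field(word, 0, 0)

    a, b = gpr[ra], gpr[rb]
    result = (a + b) & MASK32
    gpr[rd] = result

    if oe:  # the "o" forms update XER
        # Signed overflow: both operands share a sign and the result differs.
        flags.xer_ov = ((a ^ result) & (b ^ result) & 0x80000000) != 0
        flags.xer_so = flags.xer_so or flags.xer_ov  # never cleared here
    if rc:  # the "." forms update CR0 from the signed result
        s = to_signed32(result)
        flags.cr0_lt, flags.cr0_gt, flags.cr0_eq = s < 0, s > 0, s == 0
        flags.cr0_so = flags.xer_so
    return "add" + ("o" if oe else "") + ("." if rc else "")


ADD_R3_R4_R5 = 0x7C642A14  # add r3, r4, r5
OE_BIT, RC_BIT = 1 << 10, 1

gpr = [0] * 32
flags = Flags()
gpr[4], gpr[5] = 0x7FFFFFFF, 1  # largest positive 32-bit int + 1 overflows
print(execute_add(ADD_R3_R4_R5 | OE_BIT | RC_BIT, gpr, flags), hex(gpr[3]), flags)

gpr[4], gpr[5] = 1, 1  # no overflow this time
print(execute_add(ADD_R3_R4_R5 | OE_BIT | RC_BIT, gpr, flags), hex(gpr[3]), flags)
```

Running addo. twice shows the sticky SO behavior: the first add overflows and sets both OV and SO, while the second add clears OV but leaves SO (and therefore CR0's SO copy) set.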
The cr0 to cr7 parts of the CR register also have special instructions for copying the data around, both to general purpose registers and to each other. Really, though, was this complexity actually needed? Why not just always update the condition flags and be done with it?

For opcode 31, there is a sub-opcode in bits [9:1] or, for instructions that have no OE flag, in bits [10:1]. This one seems really... screwy. [9:1] gives 512 possible sub-opcodes, which really should have covered everything. Instead, instructions like srw/srw. use [10:1] and have a value of 536, so it really requires all 10 bits to decode. Why not just leave OE set to 0 and use some of the unused sub-opcodes in the [9:1] area? I guess it's possible there is a bit encoding pattern to it.

One nice thing over RISC-V is the way immediates are done. RISC-V uses a combination of lui, which loads the upper 20 bits of the register, with addi or ori, whose 12-bit immediates are sign-extended, to load the bottom 12 bits. It's quite awkward. PowerPC (MIPS too) has 16-bit immediates for the lower bits, which can be signed (with addi) or unsigned (with ori). The immediates of all logic instructions in PowerPC / MIPS are unsigned (zero-extended), which makes a lot of things easier.

Another nice thing about PowerPC is that it has load / store instructions with base registers and update forms. That means not only can it do the typical:

lwz r1, 4(r2) - load the word at address r2 + 4 into r1

but it can also do:
lwz r1, 4(r2) - ea = r2 + 4, r1 = [ea]
lwzu r1, 4(r2) - ea = r2 + 4, r1 = [ea], r2 = ea
lwzx r1, r2, r3 - ea = r2 + r3, r1 = [ea]
lwzux r1, r2, r3 - ea = r2 + r3, r1 = [ea], r2 = ea
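To make the effective-address and update semantics in the list above concrete, here is a minimal Python model (not the project's Verilog). The `Cpu` class and its method names are hypothetical; memory is a small big-endian bytearray, and the 16-bit displacement is passed as an already sign-extended Python int. It models two rules from the PowerPC manuals: in the non-update forms, rA = 0 means the value 0 rather than the contents of r0, and in the update forms rA must be nonzero and differ from rD.

```python
import struct

MASK32 = 0xFFFFFFFF


class Cpu:
    """Toy model: 32 GPRs plus a small big-endian, byte-addressed memory."""

    def __init__(self, mem_size: int = 64):
        self.r = [0] * 32
        self.mem = bytearray(mem_size)

    def _base(self, ra: int) -> int:
        # Non-update forms: rA = 0 means the literal value 0, not r0.
        return 0 if ra == 0 else self.r[ra]

    def _load_word(self, ea: int) -> int:
        # Big-endian 32-bit read, matching a big-endian memory interface.
        return struct.unpack_from(">I", self.mem, ea)[0]

    @staticmethod
    def _check_update_form(rd: int, ra: int) -> None:
        if ra == 0 or ra == rd:
            raise ValueError("invalid update form: rA must be nonzero and differ from rD")

    def lwz(self, rd: int, d: int, ra: int) -> None:
        ea = (self._base(ra) + d) & MASK32  # ea = rA + d
        self.r[rd] = self._load_word(ea)

    def lwzu(self, rd: int, d: int, ra: int) -> None:
        self._check_update_form(rd, ra)
        ea = (self.r[ra] + d) & MASK32  # ea = rA + d
        self.r[rd] = self._load_word(ea)
        self.r[ra] = ea  # write ea back to rA

    def lwzx(self, rd: int, ra: int, rb: int) -> None:
        ea = (self._base(ra) + self.r[rb]) & MASK32  # ea = rA + rB
        self.r[rd] = self._load_word(ea)

    def lwzux(self, rd: int, ra: int, rb: int) -> None:
        self._check_update_form(rd, ra)
        ea = (self.r[ra] + self.r[rb]) & MASK32  # ea = rA + rB
        self.r[rd] = self._load_word(ea)
        self.r[ra] = ea  # write ea back to rA


# Usage: sum three words stored at addresses 4, 8 and 12 with lwzu,
# which advances the pointer in r2 as a side effect of each load.
cpu = Cpu()
for i, value in enumerate([10, 20, 30]):
    struct.pack_into(">I", cpu.mem, 4 + 4 * i, value)

cpu.r[2] = 0  # point one word before the array
total = 0
for _ in range(3):
    cpu.lwzu(1, 4, 2)  # lwzu r1, 4(r2)
    total += cpu.r[1]
print(total, cpu.r[2])
```

Walking the array with lwzu needs no separate pointer-increment instruction, which is the kind of saving these update forms offer.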
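For the immediates comparison earlier in this section, a small Python sketch can show why lui/addi is awkward. RISC-V's addi sign-extends its 12-bit immediate, so whenever bit 11 of the constant is set the upper part has to be bumped by one. PowerPC's lis/ori pair just splits the constant into two 16-bit halves because ori zero-extends. The function names are hypothetical, only 32-bit values are modeled, and lis is treated as simply placing its immediate in the upper half (in 64-bit mode it also sign-extends, which is ignored here). Pairing lis with addi instead of ori would need the same kind of adjustment as RISC-V, since addi sign-extends.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def ppc_lis_ori(value: int) -> tuple[int, int]:
    """PowerPC: lis rD, hi ; ori rD, rD, lo  (ori zero-extends lo)."""
    return (value >> 16) & 0xFFFF, value & 0xFFFF


def riscv_lui_addi(value: int) -> tuple[int, int]:
    """RISC-V: lui rd, upper ; addi rd, rd, lower  (addi sign-extends lower)."""
    lower = to_signed(value, 12)
    # Pre-compensate the upper 20 bits for the sign extension of `lower`.
    upper = ((value - lower) >> 12) & 0xFFFFF
    return upper, lower


def run_ppc(hi: int, lo: int) -> int:
    return ((hi << 16) | lo) & MASK32


def run_riscv(upper: int, lower: int) -> int:
    return ((upper << 12) + lower) & MASK32


for v in (0x12345678, 0x12345FFF, 0x1234ABCD):
    hi, lo = ppc_lis_ori(v)
    upper, lower = riscv_lui_addi(v)
    assert run_ppc(hi, lo) == v and run_riscv(upper, lower) == v
    print(f"{v:#010x}: lis {hi:#06x} / ori {lo:#06x}   lui {upper:#07x} / addi {lower}")
```

For 0x12345FFF the RISC-V upper part comes out as 0x12346 paired with addi -1, while the PowerPC halves are just 0x1234 and 0x5FFF.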
ARM64 has similar instructions, but these kinds of instructions, which can improve code density, are missing from RISC-V. I added instructions like these to the RISC-V FPGA project (and named it CISC-V as a joke) to see whether they would give a performance boost.

One other thing that was kind of rough, though it really just slowed down development, was that the documentation numbers the leftmost (most significant) bit as bit 0 instead of the rightmost bit, as most CPUs' documentation does. I guess they did this because the CPU is big endian? Motorola's big endian 68000 doesn't even do this, though. I ended up writing some Python scripts to help translate their bit numbering into something useful (see the reverse_bits.py script in the tools directory). This also made writing the assembler a challenge.

Features

The supported peripherals are IO, button input, a speaker tone generator, and SPI. The memory map has two 4k RAM segments and a 4k ROM segment at the top of the 16-bit addressable memory.
The peripherals bank contains the following locations:
The sample test code can be assembled with naken_asm.

Picture

This is the IceFUN with the PowerPC Verilog code in it running the lcd.asm demo.
Copyright 1997-2024 - Michael Kohn