Mike Kohn!

Apollo Guidance Computer in an FPGA

December 20, 2024

Introduction

Back in the 1960's a computer was developed to manage NASA's Apollo moon missions. The computer was called the Apollo Guidance Computer (AGC) and existed both in the main spaceship and the lunar lander itself. In recent months I had been reading posts on social media from people who don't believe it was possible for NASA to land on the moon. One of the arguments was that the computer technology was too weak. This made me curious, so I started reading up on what kind of computer system was actually used. I ended up finding some decent specs on the instruction set and decided to build it into an FPGA. Although I called this project Apollo 11, the AGC this implements was used in all of the Apollo missions, including a mission to dock with a Soviet Soyuz spacecraft.

Some of the sources used:

The Virtual AGC Project
Ken Shirriff's blog

I also got ahold of a PDF that has the original developer docs on the computer and its instruction set. Along with the further explanation below, the README in the git repo has a fairly detailed summary of the computer and a breakdown of all the opcodes.

The instruction set is fully implemented, along with the timer interrupt system, and such, but there are some notable differences:

Running 6MHz instead of 2.048MHz
Instruction timings don't match.
Hardware multiply instead of addition loop
No IMU (could be added?) and other hardware
Modern SPI, 7seg display, and 1980's joystick instead of the standard rocket hardware.

Before starting the Verilog I needed an assembler, so I added the Apollo Guidance Computer support to naken_asm. The git repo for the project (linked below) has some sample programs that can be assembled with this assembler.

Related Projects @mikekohn.net

FPGA:

FPGA VGA, Nexys2, Glow In The Dark Memory, Intel 8008, F100-L, RISC-V, x86 / 68000, MIPS, MSP430, PowerPC, W65C832, Apollo 11, PDP-11

Video

Below is a video of the AGC running a couple of different programs. The first just demos the joystick (tests/joystick.asm) and the second is a simple Lunar Lander game (tests/lunar_lander.asm).

https://youtu.be/8TdTqtAVF78?si=joy8cn-eZO7jPoWv

Explanation

Like most of the FPGA projects on this website, the Verilog was implemented for an iceFUN FPGA board (Lattice iCE40-HX8K) using the open source tools (yosys, nextpnr, IceStorm). External devices connected to the board include an SPI controlled 7seg display from SparkFun, a joystick, an LED, and a generic SPI bus which has an OLED display connected to it for a demo.

The AGC itself was designed in the 1960's, even before the introduction of the first commercially available single-chip CPU. Intel's 4 bit 4004 was released in 1971 and their second CPU, the 8 bit 8008, in 1972. Rather than the computer's core being baked into a single IC, the AGC was built out of NOR gate logic chips wire wrapped together. It used core rope memory for ROM and magnetic core memory for RAM.

In the documentation for the computer, just about everything was done in octal. For this reason, a lot of the Verilog was done with octal and the disassembler output for naken_asm tends to use octal. It made coding things really awkward at times.

The AGC has a limited number of instructions (all described in the git repo's README file), but they are actually pretty powerful. There are instructions like "index" that make it possible to make jump tables, there's a ccs instruction that can run 1 of 4 instructions based on whether the value being tested is greater than 0, +0, less than 0, or -0, there are typical function calling instructions, interrupts, and even multiply and divide.

Memory in the system is 15 bit plus a parity bit. All math is fixed point 1's complement. That actually threw me off a little bit since modern computers are all 2's complement. In 1's complement, the most significant bit is the sign bit and if the sign bit is 0 (positive number) the rest of the bits are simply the value. If the sign bit is 1 (negative number) then the rest of the bits are the value with all the bits inverted (1 is 0 and 0 is 1).

The fixed point number representation itself is pretty awkward. All numbers fall between -1 and 1 (not including -1 and 1). So it's possible to represent 0.123 and -0.123 but not 1.234 or even 1.0. The most significant bit (indexed from 0, bit 14) represents the sign and the remaining 14 binary digits represent num / 16384.

Some examples:


      +0 is 0_00_0000_0000_0000
      -0 is 1_11_1111_1111_1111
     1/2 is 0_10_0000_0000_0000
    -1/2 is 1_01_1111_1111_1111
     1/4 is 0_01_0000_0000_0000
    -1/4 is 1_10_1111_1111_1111

Yes, there is +0 and -0 and some instructions do different things depending on if the result is +0 or -0.
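
The encoding above can be sketched in a few lines of Python. This is only an illustration of the number format described here, not the code from the repo's tools; the function names (encode, decode, show) are made up for this example, and it assumes fractions are truncated toward zero when they don't fit exactly in 14 bits.

```python
SCALE = 1 << 14        # 14 magnitude bits, so values are num / 16384
WORD_MASK = 0o77777    # 15 bit word (parity bit not modeled)
SIGN_BIT = 0o40000     # bit 14


def encode(value: float) -> int:
    """Convert a fraction in (-1, 1) to a 15 bit 1's complement word."""
    if not -1.0 < value < 1.0:
        raise ValueError("value must be between -1 and 1 (exclusive)")
    magnitude = int(abs(value) * SCALE)  # assumption: truncate toward zero
    if value < 0:
        return ~magnitude & WORD_MASK    # negative: invert every bit
    return magnitude


def decode(word: int) -> float:
    """Convert a 15 bit 1's complement word back to a float."""
    if word & SIGN_BIT:
        return -((~word & WORD_MASK) / SCALE)
    return word / SCALE


def show(word: int) -> str:
    """Format a word using the same bit grouping as the examples above."""
    bits = format(word, "015b")
    return "_".join([bits[0], bits[1:3], bits[3:7], bits[7:11], bits[11:15]])


if __name__ == "__main__":
    for value in (0.5, -0.5, 0.25, -0.25):
        print(f"{value:6} is {show(encode(value))}")
```

Note that decode(0o77777) gives -0.0, which matches the -0 case in the table.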

Adding numbers in 1's complement math is done by taking what would be the carry bit in 2's complement math and adding it back into the result. So:


    1_00_0000_0000_0000  or -16383 / -0.999939
 +  0_00_0000_0000_0001  or      1 /  0.000061
 ----------------------
    1_00_0000_0000_0001
 +                    0   <-- carry is 0
 ---------------------- 
    1_00_0000_0000_0001  or -16382 / -0.999878

Or another example where the addition produces a carry bit:


    1_11_1111_1111_1111  or    -0 / -0.0
 +  0_00_0000_0000_0001  or     1 /  0.000061
 ----------------------
    0_00_0000_0000_0000
 +                    1   <-- carry is 1
 ---------------------- 
    0_00_0000_0000_0001  or     1 /  0.000061
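
Both additions above can be reproduced with a short Python sketch of the end-around carry. The function name is made up for this example, and it assumes plain 15 bit words, ignoring any overflow handling the real accumulator does.

```python
WORD_MASK = 0o77777  # 15 bit word


def add_ones_complement(a: int, b: int) -> int:
    """Add two 15 bit 1's complement words with end-around carry."""
    total = a + b
    carry = total >> 15                 # what would be the carry out of bit 14
    return (total + carry) & WORD_MASK  # add the carry back into the result


if __name__ == "__main__":
    # First example: no carry is produced.
    print(format(add_ones_complement(0o40000, 0o00001), "05o"))
    # Second example: -0 + 1 produces a carry that is added back in.
    print(format(add_ones_complement(0o77777, 0o00001), "05o"))
```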

There are some Python tools in the git repo that were used to verify the results of math operations came out correctly. Mostly they would do the math using Python's float and convert the float back to a binary 1's complement representation.

Multiply and divide (mp and dv) at first seemed... rough. Hardware multiply can be done in 1 cycle with many logic gates and divide is complex. The documentation I have claims that multiply is done in 3 machine cycles (35.1us) and divide is done in 6 machine cycles (70.2us). However, reading elsewhere it seems the computer was only capable of addition and subtraction, so it seems these instructions are just hardware loops of adds and subtracts. In the apollo11_fpga implementation, multiply is done with the Verilog * operator so it takes only 1 CPU cycle to do the multiplication itself, plus a couple of other cycles for load / store and dealing with signed numbers. Divide is done with a hardware subtraction loop. Not sure where 6 MCT came from for the AGC if divide / multiply is looping with subtracts / adds.

Memory Map

The AGC has two types of memory in it: magnetic core memory for RAM (called erasable memory in the docs) and core rope memory for ROM (called fixed memory in the docs). Magnetic core memory kind of reminded me of MSP430's FRAM. On poweroff it retains its state. This implementation of the AGC uses regular FPGA block RAM and does not retain its state on power down.

Both the erasable memory and fixed memory in the AGC are done in banks and use a bank register to pick which chunk of memory is accessible in the limited addressing space. So reading from fixed memory address 02000 will get different results depending on which bank is currently selected.

There are 8 banks of RAM, each 256 words (512 bytes) in size, for a total of 2048 words (4096 bytes). The layout for erasable memory is:


    00000 - 00377 (0x0000 - 0x00ff)  E0 Overlap
    00400 - 00777 (0x0100 - 0x01ff)  E1 Overlap
    01000 - 01377 (0x0200 - 0x02ff)  E2 Overlap
    01400 - 01777 (0x0300 - 0x03ff)  Depends on EB:
      EB = 0: Same as E0
      EB = 1: Same as E1
      EB = 2: Same as E2
      EB = 3: Bank 3 memory
      EB = 4: Bank 4 memory
      EB = 5: Bank 5 memory
      EB = 6: Bank 6 memory
      EB = 7: Bank 7 memory
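
As a rough sketch of the table above, the following Python function maps an erasable address plus the EB value to an index into a flat 2048 word array. The function name and the flat array layout are assumptions for illustration; the Verilog in the repo may organize this differently.

```python
BANK_WORDS = 0o400  # 256 words per erasable bank


def erasable_index(address: int, eb: int) -> int:
    """Map an erasable address (00000 to 01777) and EB to a flat RAM index."""
    if not 0 <= address <= 0o1777:
        raise ValueError("not an erasable memory address")
    if not 0 <= eb <= 7:
        raise ValueError("EB must be 0 to 7")
    if address < 0o1400:
        # 00000 - 01377 always map to banks E0, E1, E2.
        return address
    # 01400 - 01777 is a window into the bank selected by EB.
    return eb * BANK_WORDS + (address - 0o1400)


if __name__ == "__main__":
    print(erasable_index(0o1400, 0))  # same word as address 00000
    print(erasable_index(0o1400, 3))  # first word of bank 3
```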

There are 36 banks of fixed memory, each 1024 words (2048 bytes) in size, for a total of 36864 words (73728 bytes). 32 banks are selected from FB's value and 4 extra banks come from a super bank bit in I/O channel 7 (FEB).


    02000 - 03777 (0x0400 - 0x07ff) Bank 00 to 31 (FB/BB 00 to 31)
    04000 - 05777 (0x0800 - 0x0bff) Common-fixed mem (bank 02 overlap)
    06000 - 07777 (0x0c00 - 0x0fff) Common-fixed mem (bank 03 overlap)

This implementation maps fixed memory to 16 bit * 4096 addresses.
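
A minimal Python sketch of the fixed memory layout, assuming a flat array of 1024 word banks and ignoring the super bank bit entirely. The function name is made up for this example and is not taken from the repo.

```python
BANK_WORDS = 0o2000  # 1024 words per fixed bank


def fixed_index(address: int, fb: int) -> int:
    """Map a fixed memory address (02000 to 07777) and FB to a flat ROM index.

    Assumption: the super bank bit is not modeled, so FB is 0 to 31.
    """
    if not 0 <= fb <= 31:
        raise ValueError("FB must be 0 to 31")
    if 0o2000 <= address <= 0o3777:
        bank = fb   # switched window selected by FB
    elif 0o4000 <= address <= 0o5777:
        bank = 2    # common-fixed, overlaps bank 02
    elif 0o6000 <= address <= 0o7777:
        bank = 3    # common-fixed, overlaps bank 03
    else:
        raise ValueError("not a fixed memory address")
    return bank * BANK_WORDS + (address & 0o1777)


if __name__ == "__main__":
    # With FB = 2 the switched window shows the same words as 04000 - 05777.
    print(fixed_index(0o2000, 2) == fixed_index(0o4000, 0))
```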

Changing memory banks is done with 2 registers (EB and FB), or with a third register (BB) that changes both banks at the same time:


    003 0x03 EB          Erasable bank register 000 0EE E00 000 000
    004 0x04 FB          Fixed bank register    FFF FF0 000 000 000
    006 0x06 BB          Both banks register    FFF FF0 000 000 EEE
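
Going by the bit patterns in the table above (EEE in bits 10 to 8 of EB, FFFFF in bits 14 to 10 of FB, and BB holding FFFFF in bits 14 to 10 with EEE in bits 2 to 0), the register values could be packed like this in Python. The helper names are hypothetical.

```python
def pack_eb(eb: int) -> int:
    """Erasable bank number (0 to 7) placed in bits 10 to 8."""
    return (eb & 0o7) << 8


def pack_fb(fb: int) -> int:
    """Fixed bank number (0 to 31) placed in bits 14 to 10."""
    return (fb & 0o37) << 10


def pack_bb(fb: int, eb: int) -> int:
    """Both banks: fixed bank in bits 14 to 10, erasable bank in bits 2 to 0."""
    return ((fb & 0o37) << 10) | (eb & 0o7)


def unpack_bb(bb: int) -> tuple[int, int]:
    """Split a BB value back into (fixed bank, erasable bank)."""
    return (bb >> 10) & 0o37, bb & 0o7


if __name__ == "__main__":
    print(format(pack_eb(3), "05o"))
    print(format(pack_fb(31), "05o"))
    print(format(pack_bb(31, 7), "05o"))
```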

I/O

The AGC has I/O channels that can only be accessed through the variations of the read / write instructions (read, write, rand, wand, etc). The typical AGC I/O channels (with their addresses) are:

     1: L
     2: Q
     3: hi_scaler
     4: lo_scaler
     5: pyjets
     6: rolljets
     7: superbnk

To make this implementation more useful, some extra I/O ports were added to allow a SparkFun 7seg display, SPI (which is used to drive the OLED screen with some of the samples), 1 bit I/O for an LED, and a simple push-button joystick to work. The extra ports are:


    12: IO data (bit 0: connected to LED)
    13: 4 x 7seg display
    14: display_ctrl - bit 0: display_busy
    15: interrupt flags
    16: interrupt clear
    17: IO data port 1 (3 bits used for SPI control for LCD)
    18: SPI transmit (8 bit)
    19: SPI receive (8 bit)
    20: SPI control - bit 0: SPI ready
    21: JOYSTICK - bits 4 to 0 are fire button and 4 axis of the stick

Registers

There are several registers available that serve various purposes:


    000 0x00 A           Accumulator
    001 0x01 L           Lower product register
    002 0x02 Q           Return address of called procedures
    005 0x05 Z           Program counter
    007 0x07 ZERO        Always zero.

Registers are all memory mapped, which made coding the Verilog pretty awkward.

A is used basically for math and moving data around. The L register is used for multiply and divide. In the multiply case it holds the lower 14 bits of the 29 bit result and in the divide case it's used to make a 29 bit dividend. It can also be used for 30 bit transfers of memory. Z is the program counter and Q is the return address when using the tc (transfer control) instruction. The ZERO register always gives a result of 0.
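
To make the A / L split for multiply more concrete, here is a small Python sketch. It assumes the product is computed on magnitudes, the upper 14 bits land in A and the lower 14 bits in L, and both words carry the sign of the result. Those details (especially how the sign of L and zero results are handled) are assumptions for illustration, not something taken from the Verilog or the AGC documentation.

```python
WORD_MASK = 0o77777   # 15 bit word
SIGN_BIT = 0o40000    # bit 14
MAG_MASK = 0o37777    # 14 magnitude bits


def sign_and_magnitude(word: int) -> tuple[bool, int]:
    """Split a 1's complement word into (is_negative, magnitude)."""
    if word & SIGN_BIT:
        return True, ~word & WORD_MASK
    return False, word


def multiply(x: int, y: int) -> tuple[int, int]:
    """Multiply two 15 bit 1's complement fractions, returning (A, L)."""
    x_negative, x_magnitude = sign_and_magnitude(x)
    y_negative, y_magnitude = sign_and_magnitude(y)
    product = x_magnitude * y_magnitude   # up to 28 bits of magnitude
    high = product >> 14                  # upper 14 bits go to A
    low = product & MAG_MASK              # lower 14 bits go to L
    if x_negative != y_negative:
        # Assumption: a negative result inverts both words.
        high = ~high & WORD_MASK
        low = ~low & WORD_MASK
    return high, low


if __name__ == "__main__":
    a, l = multiply(0o20000, 0o20000)     # 1/2 * 1/2
    print(format(a, "05o"), format(l, "05o"))
```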

Function Calling

Function calling is done through the tc (transfer control) instruction. This instruction puts the return address (the address of the tc instruction + 1) into the Q register, trashing any value that was previously there. For this reason, if a function has any other use of the tc instruction, either for a local jump or a call to another function, Q must be saved and restored before the return instruction is executed.

The return instruction copies whatever is in the Q register back to the Z register. The computer will resume executing code there.
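
The reason Q has to be saved can be shown with a toy Python model of just the Z and Q registers. The class, the addresses, and the nested_call helper are all made up for this illustration and only model the behavior described above.

```python
class Cpu:
    """Toy model holding only the Z (program counter) and Q registers."""

    def __init__(self) -> None:
        self.z = 0
        self.q = 0

    def tc(self, tc_address: int, target: int) -> None:
        """tc: store the return address in Q, then jump to target."""
        self.q = tc_address + 1
        self.z = target

    def ret(self) -> None:
        """return: copy Q back into Z."""
        self.z = self.q


def nested_call(save_q: bool) -> int:
    """Call outer from 04000, which calls inner from 04100; return final Z."""
    cpu = Cpu()
    cpu.tc(0o4000, 0o4100)   # main calls outer, Q = 04001
    saved_q = cpu.q          # outer saves Q (used only when save_q is set)
    cpu.tc(0o4100, 0o4200)   # outer calls inner, Q is overwritten with 04101
    cpu.ret()                # inner returns to 04101
    if save_q:
        cpu.q = saved_q      # outer restores Q before returning
    cpu.ret()                # outer returns
    return cpu.z


if __name__ == "__main__":
    print(format(nested_call(save_q=False), "05o"))  # return to main is lost
    print(format(nested_call(save_q=True), "05o"))   # back in main
```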

Branches

There is a tcf instruction which acts the same as tc but doesn't trash Q, so this can be used for basic jumps. There are also bzf and bzmf, which are branch zero and branch zero or minus. I got bitten by bzf at some point since it branches on both +0 and -0.
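
In Python terms, the branch conditions look roughly like this. It is a sketch assuming a plain 15 bit accumulator value, with function names made up for the example.

```python
POSITIVE_ZERO = 0o00000
NEGATIVE_ZERO = 0o77777
SIGN_BIT = 0o40000


def bzf_taken(a: int) -> bool:
    """bzf branches when the accumulator is +0 or -0."""
    return a in (POSITIVE_ZERO, NEGATIVE_ZERO)


def bzmf_taken(a: int) -> bool:
    """bzmf branches when the accumulator is zero or negative."""
    return a == POSITIVE_ZERO or bool(a & SIGN_BIT)


if __name__ == "__main__":
    for value in (0o00000, 0o77777, 0o00001, 0o40000):
        print(format(value, "05o"), bzf_taken(value), bzmf_taken(value))
```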

Timers

The AGC includes several counters that increment at some interval, usually every 10ms. They are used for various reasons and some of them can trigger interrupts when they overflow. The timers are:


    024 0x14 TIME2       14 bit / Inc on overflow of TIME1
    025 0x15 TIME1       15 bit / Inc every 10ms
    026 0x16 TIME3       15 bit / Inc every 10ms (T3RUPT)
    027 0x17 TIME4       15 bit / Inc every 10ms (T4RUPT / 7.5ms phase of TIME3)
    030 0x18 TIME5       15 bit / Inc every 10ms (T5RUPT / 5ms phase of TIME1)
    031 0x19 TIME6       15 bit / Updated 1/6000s DINC seq (T6RUPT)

The lunar_lander.asm demo uses TIME4 to count out ~200ms by getting the current value of TIME4 and looping until it changes 20 times.
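
The same polling idea, sketched in Python rather than AGC assembly. The read_time4 callable and the fake timer are stand-ins invented for this example; the real code reads the TIME4 register at address 027.

```python
from typing import Callable


def wait_ticks(read_time4: Callable[[], int], ticks: int = 20) -> int:
    """Busy-wait until the timer value has changed `ticks` times."""
    last = read_time4()
    changes = 0
    while changes < ticks:
        now = read_time4()
        if now != last:      # the timer incremented since the last read
            changes += 1
            last = now
    return changes


def make_fake_timer(reads_per_tick: int = 3) -> Callable[[], int]:
    """Hypothetical stand-in for TIME4 that advances every few reads."""
    reads = 0

    def read() -> int:
        nonlocal reads
        value = (reads // reads_per_tick) & 0o77777
        reads += 1
        return value

    return read


if __name__ == "__main__":
    timer = make_fake_timer()
    wait_ticks(timer, 20)    # 20 changes at 10ms each is about 200ms
    print("timer value after waiting:", timer())
```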

Interrupts

There are various parts of the AGC that can trigger interrupts including timer overflows (value goes from 0x7fff back to 0), data transfers, and others. This implementation has interrupts for just the timers:


    T6RUPT - TIME6 decremented to 0.
    T5RUPT - TIME5 overflowed (digital autopilot thrust).
    T3RUPT - TIME3 overflowed (autopilot).
    T4RUPT - TIME4 overflowed (task scheduler).

Interrupts (in this implementation at least) only happen before the start of a full instruction. Some instructions in the AGC take 2 words and other instructions can modify the next instruction being executed either by changing the K value or skipping it altogether. Those instructions must completely finish before an interrupt happens.

At the start of an interrupt, the following happens:


    1. Copy Z to ZRUPT (Z points to the next instruction that should have executed).
    2. The opcode instruction at location Z is copied to BRUPT.
    3. Load Z with interrupt vector address.

On a resume instruction the following happens:


    1. Copy ZRUPT to Z.
    2. Copy BRUPT to next instruction to be decoded register.
    3. Increment Z.
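
The two sequences above can be modeled with a small Python sketch. The Cpu fields, the dict used as memory, and the vector address in the usage example are all hypothetical and only mirror the steps as listed.

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    z: int = 0                 # program counter
    zrupt: int = 0             # saved Z
    brupt: int = 0             # saved copy of the interrupted instruction
    next_instruction: int = 0  # instruction about to be decoded


def enter_interrupt(cpu: Cpu, memory: dict[int, int], vector: int) -> None:
    """Steps performed at the start of an interrupt."""
    cpu.zrupt = cpu.z              # 1. copy Z to ZRUPT
    cpu.brupt = memory[cpu.z]      # 2. copy the instruction at Z to BRUPT
    cpu.z = vector                 # 3. load Z with the interrupt vector


def resume(cpu: Cpu) -> None:
    """Steps performed by the resume instruction."""
    cpu.z = cpu.zrupt                  # 1. copy ZRUPT to Z
    cpu.next_instruction = cpu.brupt   # 2. BRUPT becomes the next instruction
    cpu.z += 1                         # 3. increment Z


if __name__ == "__main__":
    memory = {0o4100: 0o30123}         # made-up instruction word
    cpu = Cpu(z=0o4100)
    enter_interrupt(cpu, memory, vector=0o4014)  # hypothetical vector address
    resume(cpu)
    print(format(cpu.z, "05o"), format(cpu.next_instruction, "05o"))
```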

The user will need to save any registers that could be used in the interrupt routine. Since there is no hardware stack, there are instead special memory locations that are used to save registers:


    010 0x08 ARUPT       Save A during interrupt (not automatic)
    011 0x09 LRUPT       Save L during interrupt (not automatic)
    012 0x0a QRUPT       Save Q during interrupt (not automatic)
    013 0x0b SAMPTIME1   Store copy for TIME1 (automatic?)
    014 0x0c SAMPTIME2   Store copy for TIME2 (automatic?)
    015 0x0d ZRUPT       Return address for interrupt
    016 0x0e BBRUPT      Save BB during interrupt (not automatic)
    017 0x0f BRUPT       Copy of instruction pointed to by ZRUPT (automatic)

Lunar Lander Game

The Lunar Lander game uses the SPI port with the same 96x64 OLED display used on previous FPGA projects. The display uses an SSD1331 driver chip which has the capability of drawing lines and boxes and such just by sending simple commands. I had never used any of those features before so I decided to test it first with the MSP430 FPGA project, since MSP430 assembly is orders of magnitude easier than AGC and the MSP430 core takes only 15 seconds to build while the AGC takes 30. After getting a box drawing on the MSP430, code was added to the lunar_lander.asm program, which ended up having trouble drawing the bigger boxes needed for the horizon and ground area. It turned out that putting a delay between the drawing commands fixed it. It seems like sending a new command to the display while it's busy drawing the previous one caused problems. Since the SPI on the display used here only has an input pin, there appears to be no way to get status from the SSD1331 chip to know if it's busy.

The tools/ directory of the repo has a script, make_gravity_table.py, which builds a velocity table from the force of gravity on the moon (1.62 m/s^2), an assumed 200ms delay between frames, and the height of the real Lunar Module compared to the number of pixels used to draw it.
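
A rough idea of how such a table could be computed is sketched below. This is not the contents of make_gravity_table.py. The Lunar Module height (about 7 meters) and the sprite height in pixels are placeholder assumptions, and the real script may use a different output format or scaling.

```python
MOON_GRAVITY = 1.62       # m/s^2, from the text
FRAME_TIME = 0.2          # seconds between frames, from the text
LM_HEIGHT_METERS = 7.0    # assumption: approximate Lunar Module height
LM_HEIGHT_PIXELS = 8      # hypothetical height of the lander on screen


def make_velocity_table(frames: int) -> list[float]:
    """Pixels moved per frame after n frames of free fall, for n in 0..frames-1."""
    pixels_per_meter = LM_HEIGHT_PIXELS / LM_HEIGHT_METERS
    table = []
    for n in range(frames):
        velocity = MOON_GRAVITY * FRAME_TIME * n   # m/s after n frames
        table.append(velocity * FRAME_TIME * pixels_per_meter)
    return table


if __name__ == "__main__":
    for n, pixels in enumerate(make_velocity_table(8)):
        print(f"frame {n}: {pixels:.4f} pixels per frame")
```

On the AGC side these values would still need to be converted to the fixed point format, since everything has to fit between -1 and 1.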

The game itself is in the test/ directory as lunar_lander.asm. The code exercises quite a bit of the CPU, which is nice. The game is played with the joystick. Moving left or right will cause a thrust equal to the moon's gravity while moving up will apply a thrust double the moon's gravity upward.

Pictures

Above is a picture of the FPGA board and the OLED display showing the lunar_lander.asm game. The 7seg LED display is showing only 2 numbers since it uses PWM... viewed before the picture was taken all 4 segments had numbers. To the right is the space-aged joystick used to play the lunar_lander.asm game.

Testing

Most instructions were tested individually with samples in the test/ directory in the repo. This really needs a lot more testing though before it's safe to fly it into space or land on the moon.

Conclusion

So back to the conspiracy theory that claims the computer technology in the Apollo spacecraft couldn't be powerful enough to get to the moon... for reference here are other technologies of the time:

The first autoland by a commercial airliner was done in 1965. The Minuteman ICBM (a missile that can fly into space, travel to the other side of the planet, and explode a nuclear weapon in another country), with a similar computer, was introduced in 1962. The Soviet Union landed a spacecraft on Venus in 1970. The AGC itself had an IMU (gyro / accelerometer?) and had some kind of autopilot mode, not sure exactly what for... but the astronauts of the Apollo system navigated to the moon with a sextant (not sure how much that method was used but it was there and used) and while landing on the moon the system was piloted by a human. The computer itself had plenty of ROM memory for the software and plenty of RAM for what it was programmed to do. As far as being less powerful than, say, a Commodore 64 with an 8 bit 6502, personally I would have rather coded on a 6502 because to me the assembly language is so much more obvious, but the AGC could do things like 15 bit math. It was a more specialized computer designed to run a spacecraft.

Either way, I don't see why this computer system wouldn't have been possible.

Source Code

git clone https://github.com/mikeakohn/apollo11_fpga.git