W65C832 In An FPGA

Posted: October 16, 2024

Introduction

Back in the 80's, the Western Design Center (WDC) created a 16 bit version of the 6502 CPU called the W65C816. I believe it was created for the Apple IIgs computer, although it was also used in the Super Nintendo. It appears they had a data sheet for a 32 bit version of the chip called the W65C832. I decided to do a Verilog version of it in an FPGA.

I updated naken_asm so it has a .65832 directive, which allows immediate values for instructions like lda to take a .l modifier so they can be 32 bit values.

The instruction set is able to access 16MB of RAM. The FPGA itself has a small amount of block RAM that this W65C832 core uses as 4k of RAM at the bottom of memory (for zero / direct page), 4k as ROM, and 4k as pages for a Winbond W25Q128J flash chip. More on that further down this page.

The Verilog source code is available on GitHub and currently works with an IceFun board, but I plan to do something a little bigger with it later.

Joe Davisson also created a bootloader for this core. It uses XMODEM so programs can be uploaded into the core using a terminal program like MiniCom. I'll post a binary so anyone can buy the IceFun board and just use their simple uploading software to load it. No need to build the Verilog from source code. More info about that on his website: XMODEM Bootloader

Related Projects @mikekohn.net

FPGA: FPGA VGA, Nexys2, Glow In The Dark Memory, Intel 8008, F100-L, RISC-V, x86 / 68000, MSP430, PowerPC, W65C832

65C832

The chip itself has 3 registers still:

  A (8/16/32 bit accumulator)
  X (8/16/32 bit index x)
  Y (8/16/32 bit index y)

Along with some more internal registers of the 6502 and some newer 65C816 registers:

  SP  (16 bit stack pointer)
  PC  (16 bit program counter)
  DR  (Direct Register - 65C816)
  DBR (Data Bank Register - 65C816)
  PBR (Program Bank Register to extend PC to 24 bit - 65C816)

There's a status register (called P for some reason):

  P: 7 6 5 4 3 2 1 0
     N V M X D I Z C

  N negative (set if bit 7 of the result is set)
  V overflow
  M accumulator / memory size
  X index size / break
  D decimal
  I interrupt disable
  Z zero (set if ALU result is 0)
  C carry (set if ALU result requires bit 8)

There are two other 1 bit registers called E16 and E8 that pick what mode the CPU is operating in. They can't directly be accessed, but instead are accessed through an XCE instruction, which acts differently depending on if the CPU is in 8 bit emulation mode or not.

M flag sets the size of A and memory fetches.
X flag sets the size of X and Y.

  // E16 E8  M   X    A  X,Y  Mode
  //  0   0  0   0   16   32  W65C832 Native
  //  0   0  0   1   16    8  W65C832 Native
  //  0   0  1   0    8   32  W65C832 Native
  //  0   0  1   1    8    8  W65C832 Native
  //  0   1  0   0   32   32  W65C832 Native
  //  0   1  0   1   32    8  W65C832 Native
  //  0   1  1   0    8   32  W65C832 Native
  //  0   1  1   1    8    8  W65C832 Native
  //  1   0  0   0   16   16  W65C816 Emulation
  //  1   0  0   1   16    8  W65C816 Emulation
  //  1   0  1   0    8   16  W65C816 Emulation
  //  1   0  1   1    8    8  W65C816 Emulation
  //  1   1  1  BRK   8    8  W65C02 Emulation
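The mode table can also be written as a small lookup function. This is only a Python sketch for checking which widths a given flag combination produces; the function name register_widths is made up here and nothing in it is taken from the Verilog source.

```python
def register_widths(e16, e8, m, x):
    """Return (accumulator bits, index bits, mode name) for the given flags.

    The rules below are read straight off the mode table; this is an
    illustration, not the logic used inside the FPGA core.
    """
    if e16 == 1 and e8 == 1:
        # W65C02 emulation: everything is 8 bit (the X position is BRK).
        return (8, 8, "W65C02 Emulation")
    if e16 == 1:
        # W65C816 emulation: M and X each pick 8 or 16 bit.
        return (8 if m else 16, 8 if x else 16, "W65C816 Emulation")
    # W65C832 native: X picks 8 or 32 bit index registers,
    # M=0 gives a 16 bit accumulator when E8=0 and 32 bit when E8=1.
    index_bits = 8 if x else 32
    if m:
        acc_bits = 8
    else:
        acc_bits = 32 if e8 else 16
    return (acc_bits, index_bits, "W65C832 Native")


print(register_widths(0, 1, 0, 0))  # full 32 bit A, X and Y
```

Writing it this way makes one thing in the table easier to see: in native mode the E8 bit only matters when M is 0.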

E16 and E8 change the emulation mode. At startup, all 3 flags (E16, E8, and M) are 1 while BRK is ignored. To switch from W65C02 emulation to W65C816 emulation:

  clc
  xce

While in W65C816 mode, the xce (Exchange C with E8) instruction becomes the xfe (Exchange C with E8 and Exchange V with E16) instruction. So to change to 65C832 mode while in W65C816 mode:

  clc
  clv
  xce
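To follow what the two sequences do to the mode bits, here is a small Python model of the exchange. It assumes only what is described above (xce swaps C with E8, xfe additionally swaps V with E16); the dictionary layout and function names are invented for the example.

```python
def xce(state):
    """6502 emulation mode behaviour: exchange C with E8."""
    new = dict(state)
    new["C"], new["E8"] = state["E8"], state["C"]
    return new


def xfe(state):
    """Behaviour outside 6502 emulation: also exchange V with E16."""
    new = xce(state)
    new["V"], new["E16"] = state["E16"], state["V"]
    return new


# Startup: E16 and E8 are both 1 (C and V values here are arbitrary).
cpu = {"C": 1, "V": 1, "E16": 1, "E8": 1}

cpu["C"] = 0              # clc
cpu = xce(cpu)            # -> W65C816 emulation (E16=1, E8=0)
mode_816 = (cpu["E16"], cpu["E8"])

cpu["C"] = 0              # clc
cpu["V"] = 0              # clv
cpu = xfe(cpu)            # -> W65C832 native (E16=0, E8=0)
mode_832 = (cpu["E16"], cpu["E8"])

print(mode_816, mode_832)
```

Because it is an exchange, the old E8 and E16 values end up in C and V afterwards, which is the only way software gets to see them.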

Winbond W25Q128

To add a lot more ROM area, this implementation can use a Winbond W25Q128 16MB flash chip. The core uses the flash in a way similar to virtual memory. Accessing any memory address 0xc000 or above will trigger the CPU to pause while 4k of flash is transferred from the flash chip to RAM. At that point that 4k page is immediately accessible. If a memory location is read that isn't in the currently loaded page, the CPU again will pause and the next page will be read in.
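The paging behaviour can be modelled in a few lines of Python. This is a sketch of the idea only: it assumes pages are aligned on 4k boundaries and that a single page is resident at a time, and the FlashPager class is hypothetical rather than a translation of the Verilog.

```python
PAGE_SIZE = 4096
FLASH_BASE = 0xc000


class FlashPager:
    """Hypothetical model of one 4k page of flash cached in block RAM."""

    def __init__(self, flash):
        self.flash = flash      # bytes-like image of the flash chip
        self.page_base = None   # flash address of the resident page
        self.page = b""
        self.loads = 0          # number of times the CPU would have paused

    def read(self, address):
        if address < FLASH_BASE:
            raise ValueError("address is not mapped to flash")
        base = address & ~(PAGE_SIZE - 1)   # assumed 4k alignment
        if base != self.page_base:
            # Page miss: copy 4k from flash into RAM while the CPU waits.
            self.page = self.flash[base:base + PAGE_SIZE]
            self.page_base = base
            self.loads += 1
        return self.page[address - base]


flash_image = bytes((i * 7) & 0xff for i in range(0x12000))
pager = FlashPager(flash_image)
pager.read(0xc000)   # miss, loads the page at 0xc000
pager.read(0xc005)   # hit, same page
pager.read(0xd000)   # miss, replaces the resident page
print(pager.loads)
```

Code that jumps back and forth between two pages would reload on every access in this model, so keeping hot code and data within one 4k page matters.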

To program the flash, a CH341A along with the flashrom software was used to transfer a .bin file to the chip. The commands used to write out.bin to the flash and then read it back into rom.bin to make sure it worked:

  flashrom -p ch341a_spi -c W25Q128.V..M -w out.bin
  flashrom -p ch341a_spi -c W25Q128.V..M -r rom.bin

Here is a picture of the CH341A programmer with the flash ROM in the ZIF socket:

CH341A Winbond Flash

Any memory below 0xc000 in the flash is ignored, but anything 0xc000 and above is mapped directly into the CPU. To access memory location 0x20003 in the flash memory, the following code can be used:

  lda.b #2
  pha
  plb
  lda 0x0003

These instructions set the accumulator to 2, push it onto the stack with pha, pull it from the stack into the dbr (data bank register) with plb, and use lda 0x0003 to read from the effective address 0x20003.
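The address arithmetic is just the data bank register placed above the 16 bit address. A quick Python check of that (the function name is made up for the example):

```python
def effective_address(dbr, address16):
    """Combine the 8 bit data bank register with a 16 bit absolute address."""
    return ((dbr & 0xff) << 16) | (address16 & 0xffff)


print(hex(effective_address(2, 0x0003)))
```

With an 8 bit bank and a 16 bit address this gives the 24 bit (16MB) range mentioned earlier.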

It could be possible to make the CPU write back pages into the flash ROM when they are paged out, but that doesn't seem very practical.

Memory Map

This implementation of the W65C832 has 4 banks of memory. If there is a Winbond W25Q128JV, Bank 3 and all memory above up to 16MB will be paged in (and out) of RAM 4k at a time.

  Bank 0: RAM (4096 bytes)
  Bank 1: ROM (4096 bytes from rom.txt)
  Bank 2: Peripherals
  Bank 3: Winbond W25Q128JV Flash (filling up to 16MB)

On startup the chip will execute code from Bank 1. If the program select button is pushed on reset, code will start from location 0xc000 in Bank 3.
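As a rough picture of how an address picks a bank, here is a Python sketch. It assumes the banks sit 16k apart (0x0000, 0x4000, 0x8000, 0xc000), which matches the 0x8000 peripheral addresses and the 0xc000 flash start used on this page, but the 0x4000 start for Bank 1 is an assumption and not stated here.

```python
def bank_of(address):
    """Map an address to a bank number, assuming 16k spacing between banks."""
    if address >= 0xc000:
        return 3             # flash: everything from 0xc000 up to 16MB
    return address >> 14     # 0: RAM, 1: ROM, 2: peripherals


for address in (0x0010, 0x8000, 0xc000, 0x20003):
    print(hex(address), bank_of(address))
```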

The peripherals area contains the following:

  0x8000: input from push button
  0x8001: SPI TX
  0x8002: SPI RX
  0x8003: SPI CTRL
  0x8008: ioport0 output (in my test case only 1 pin is connected)
  0x8009: MIDI note value (60-96) to play a tone on the speaker or 0 to stop
  0x800a: iport1
  0x800b: UART TX buffer
  0x800c: UART RX buffer (reading clears out rx_ready)
  0x800d: UART CTRL - bit 1: rx_ready, bit 0: tx_busy
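The UART registers suggest the usual polling pattern: wait for tx_busy to clear, then write the byte. Below is a Python sketch of that loop. The register addresses and bit positions come from the list above; the FakeBus class is a hypothetical stand-in so the loop can be tried on a PC, and its "busy for 2 polls" timing is invented.

```python
UART_TX = 0x800b
UART_RX = 0x800c
UART_CTRL = 0x800d
TX_BUSY = 1 << 0     # UART CTRL bit 0
RX_READY = 1 << 1    # UART CTRL bit 1


def uart_send(bus, data):
    """Send bytes one at a time, polling tx_busy before each write."""
    for byte in data:
        while bus.read(UART_CTRL) & TX_BUSY:
            pass
        bus.write(UART_TX, byte)


class FakeBus:
    """Hypothetical memory bus: the transmitter stays busy for 2 polls."""

    def __init__(self):
        self.sent = []
        self.busy_polls = 0

    def read(self, address):
        if address == UART_CTRL and self.busy_polls > 0:
            self.busy_polls -= 1
            return TX_BUSY
        return 0

    def write(self, address, value):
        if address == UART_TX:
            self.sent.append(value)
            self.busy_polls = 2


bus = FakeBus()
uart_send(bus, b"hi")
print(bus.sent)
```

On the real core the same loop would be a read of 0x800d, a mask with 1, and a branch back while the result is non-zero.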

The UART can be accessed through a cheap USB-UART cable. The cable must have logic levels of 3.3v only. There is a picture below showing how it's connected.

Joe Davisson has a working bootloader over the UART (works with EasySXB too) so changing the software doesn't require reprogramming the FPGA or a flash/eeprom chip.

Explanation

Even with just the 65C816, the register modes (8 or 16 bit) can be a challenge to work with. When working on the PANCAKE-ROM project, I got bit by mixing up what mode the CPU was in while writing some memory locations. The memory manipulation was assumed to be in 16 bit mode, but it was 8 bit. The 65C832 makes this even rougher.

While working on test/lcd.asm, which blinks an LED and draws a Mandelbrot on an LCD display when a button is pushed, a few times I forgot to use the proper modifier on an immediate value:

  0x0000: a9 05           lda.b #0x0005
  0x0002: a9 05 00        lda.w #0x0005
  0x0005: a9 05 00 00 00  lda.l #0x0005

If the CPU is in 8 bit mode and the lda.l is used, it will read in only 0xa9 and 0x05 to load A and then execute the 0x00 part (brk) of the immediate that it didn't read in.
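The mismatch is easy to reproduce with a tiny decoder model in Python. This only handles lda immediate (opcode 0xa9) and is meant as an illustration of the operand size problem; the function name is made up.

```python
def decode_lda_immediate(code, offset, acc_bits):
    """Decode one lda #imm, reading as many operand bytes as A is wide."""
    assert code[offset] == 0xa9
    size = acc_bits // 8
    operand = code[offset + 1:offset + 1 + size]
    value = int.from_bytes(operand, "little")
    return value, offset + 1 + size


code = bytes([0xa9, 0x05, 0x00, 0x00, 0x00])   # assembled as lda.l #0x0005

# CPU in 32 bit mode: all 4 operand bytes are consumed.
value32, next32 = decode_lda_immediate(code, 0, 32)

# CPU in 8 bit mode: only 1 operand byte is consumed, so the next
# "instruction" fetched is the leftover 0x00 byte, which is brk.
value8, next8 = decode_lda_immediate(code, 0, 8)
print(value8, next8, hex(code[next8]))
```

The same ambiguity is what trips up a disassembler: the bytes alone don't say how many of them belong to the operand.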

I also got bit when trying to load the P (status register) into A. While in 32 bit mode I did:

  php
  pla

This pushed 1 byte for the P register and popped 4 bytes into A.
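Here is a Python model of what that does to the stack. It assumes the first byte pulled becomes the low byte of A (the 65C816 byte order); the example values on the stack and in P are made up.

```python
def pull(stack, nbytes):
    """Pull nbytes from the stack; the first byte pulled is the low byte."""
    value = 0
    for i in range(nbytes):
        value |= stack.pop() << (8 * i)
    return value


stack = [0x11, 0x22, 0x33]   # bytes the caller already had on the stack
stack.append(0x34)           # php: pushes 1 byte (example P value)
a = pull(stack, 4)           # pla with a 32 bit accumulator pulls 4 bytes

print(hex(a), stack)
```

Only the low byte of A is the status register. The other three bytes belonged to whatever was on the stack before, and the stack pointer is now off by 3.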

When disassembling code, the disassembler never knows what mode the CPU is in (8, 16, or 32 bit), which makes disassembling binary 65C816 and 65C832 code pretty much impossible.

Subroutines also have to be sensitive to what mode the CPU is currently in. In lcd.asm, subroutines push the P (flags) register on entry and pop it when leaving. The bad part is that e16 and e8 are not part of the flags, so saving those off is a bit more complex. With the 65C816 this wasn't as much of a problem: once out of 6502 emulation mode, both A and X/Y can be put in any mode with just sep and rep, with no need to mess with e8. Not so with the 65C832.

Inside the 65C832 Verilog, I got bit by some instructions reusing the opcode space of others. Every 8 bit opcode in the 6502 can be broken up into 3 pieces: aaa bbb cc. The first 3 bits (aaa) are more or less the operation, the bbb part is typically the addressing mode, and cc separates out different groups of opcodes. One instruction I got bit by was the "bit" instruction. If cc is 00 and the operation (aaa) is 001, it's a "bit" test instruction, which supports the addressing modes immediate, zero page, absolute, zero page with x, and absolute with x. What I didn't realize (until I spent too long on a bug in lcd.asm) is that bit #imm is not encoded in that group: it overlaps with cc=01, aaa=100, which is supposed to be the sta instruction. An sta instruction with addressing mode bbb=010 (immediate) would be useless, so bit #imm (100_010_01 or 0x89) was placed in that opcode space.
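The aaa bbb cc split and the 0x89 special case look like this in Python. This is a partial decode for illustration only (it knows about bit and sta and nothing else) and is not how the Verilog is structured; both function names are made up.

```python
def split_opcode(opcode):
    """Split an 8 bit opcode into its (aaa, bbb, cc) fields."""
    aaa = (opcode >> 5) & 0b111
    bbb = (opcode >> 2) & 0b111
    cc = opcode & 0b11
    return aaa, bbb, cc


def classify(opcode):
    """Very partial decode: only tells bit and sta apart from the rest."""
    aaa, bbb, cc = split_opcode(opcode)
    if opcode == 0x89:
        # Sits in the sta group (cc=01, aaa=100) with bbb=010 (immediate),
        # so it has to be checked before the generic sta decode.
        return "bit #imm"
    if cc == 0b01 and aaa == 0b100:
        return "sta"
    if cc == 0b00 and aaa == 0b001 and (bbb & 1):
        return "bit"
    return "other"


print(split_opcode(0x89), classify(0x89))
```

For example, 0x24 (bit with a zero page operand) splits into 001 001 00, while 0x8d (sta absolute) splits into 100 011 01.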

The test/lcd.asm that draws a Mandelbrot on an LCD display was ported from the F100-L FPGA version. It uses fixed point math of 6.10 (6 bits integer and 10 bit fraction) so most of the Mandelbrot code runs in 16 bit accumulator mode. It switches to 32 bit mode when doing the multiply and back to 16 bit after shifting the result right 10 times. The code exercises quite a bit of the instruction set including stz.
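The reason for the switch to 32 bit is that multiplying two 6.10 numbers gives a result with 20 fraction bits, which no longer fits in 16 bits until it is shifted back down. A Python sketch of the arithmetic (helper names are made up, and Python integers don't overflow, so this only shows the scaling):

```python
FRACTION_BITS = 10   # 6.10 fixed point: 6 integer bits, 10 fraction bits


def to_fixed(value):
    """Convert a float to 6.10 fixed point."""
    return int(round(value * (1 << FRACTION_BITS)))


def fixed_mul(a, b):
    """Multiply two 6.10 values.

    The raw product has 20 fraction bits (this is the step that needs the
    32 bit accumulator); shifting right 10 times brings it back to 6.10.
    """
    return (a * b) >> FRACTION_BITS


product = fixed_mul(to_fixed(1.5), to_fixed(2.0))
print(product, product / (1 << FRACTION_BITS))
```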

The 65C832 Mandelbrot takes around 1m8s to generate.

Picture

W65C832 UART

The picture above shows how to connect a UART cable. To prevent damage to the FPGA, this must be a 3.3v logic cable. The 5v red wire is not needed and is left unconnected. The green wire is TX transmit and is connected to G3 on the FPGA (the RX receive pin). The white wire is RX receive on the USB side and is connected to H3 (the TX transmit pin).

LCD W65C832 Mandelbrot

This is the IceFUN with the W65C832 Verilog code in it running the lcd.asm demo.

Differences

As of this writing, there are a few things that need polishing or still need to be implemented. These will probably be done eventually:

  • Instruction timings don't match the spec.
  • Decimal mode isn't implemented.
  • 6502 Emulation mode allows newer instructions to work.

There might be others I'm forgetting.

Source Code

git clone https://github.com/mikeakohn/w65c832.git

Copyright 1997-2024 - Michael Kohn