Transistor CPU Project

January 4th 2020, 4:20 pm

For about a decade and a half, I've wanted to design and build my own CPU from some sort of discrete components. This has become fairly standard in the hobby world and is completely obsolete with the existence of FPGAs, tons of cheap and available processors and even some microcontrollers costing as little as three cents USD. Nonetheless, I wanted to design a CPU myself mostly as an opportunity to learn and give myself a large project as a challenge.

A few weeks ago I was feeling super under the weather so I came home from work to rest up and started binge watching Ben Eater's channel on YouTube. Side note, I love his channel. He does such an amazing job breaking things down into easy, well-paced chunks that so many people can understand. The channel is like junk food to me and I love picking random things to watch. Anyway, six or seven videos into his 8-bit CPU build I started asking myself why I hadn't ever gotten around to designing and building my own CPU. So, I grabbed a set of resistors and some 2N2222 transistors I had lying around and just started playing with BJT logic gate circuits I could find on Google.

Simple circuits that I threw on paper after testing.

I didn't want to just take somebody's word for it online, so when I built the circuits I took a lot of measurements using my oscilloscope to verify the design. For each simple circuit I built, I measured the propagation time when the input went from low to high and from high to low, as well as the current drawn by the circuit when running at 5V in various scenarios. Getting the worst-case propagation delay for each circuit allows me to figure out what the maximum clock speed of any CPU I build will be. I worked my way up from a simple NOT gate with a single transistor and a buffer, to NAND, NOR, AND and OR gates, and finally an SR latch. Once I had those parts built and verified, I sketched up schematics for them with various notes on their propagation delay and current requirements. Ignore the obvious schematic errors below; I haven't done pen and paper logic design in years and completely forgot that I was drawing XOR gates instead of NOR gates.

Additional circuits that I tested and measured.
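
As a rough sketch of how worst-case propagation delay bounds the clock speed: the clock period has to be at least as long as the slowest chain of gates between two registers. The gate names, the example path and the nanosecond figures below are made-up placeholders, not measured values, and `max_clock_hz` is a hypothetical helper.

```python
# Hypothetical worst-case propagation delays in nanoseconds (placeholder numbers,
# not measurements from the circuits described above).
WORST_CASE_DELAY_NS = {"not": 50.0, "nand": 80.0, "nor": 90.0, "sr_latch": 200.0}


def max_clock_hz(critical_path, delays_ns=WORST_CASE_DELAY_NS, margin=1.0):
    """Upper bound on clock frequency for a chain of gates between two registers.

    margin > 1.0 stretches the period to leave headroom for setup time and
    component variation.
    """
    total_ns = sum(delays_ns[gate] for gate in critical_path)
    return 1e9 / (total_ns * margin)


# An assumed critical path: the signal passes through these gates in sequence.
path = ["nand", "nand", "nor", "not", "sr_latch"]
print(f"{max_clock_hz(path) / 1e6:.2f} MHz")              # 500 ns total
print(f"{max_clock_hz(path, margin=2.0) / 1e6:.2f} MHz")  # with 2x safety margin
```

With these placeholder numbers the path adds up to 500 ns, so the bound is 2 MHz, or 1 MHz with a 2x margin.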

With an SR latch you can build all other types of latches and flip-flops. A CPU needs registers, and I want to build the core of the CPU entirely from discrete transistors and resistors, so I needed to build and test a D flip-flop. This meant adding an enable line, tying that enable to an edge detection circuit, and verifying that I could "clock in" a 1 or a 0. This worked as expected, although I had to play around with the resistor and capacitor values for the edge detection circuit. It still doesn't work quite right depending on the speed of the clock, and it's affected by the circuit that drives it, so I think I'll have to buffer it in a future redesign. Later, I added a second un-clocked enable and an asynchronous reset input, both of which will be necessary to use this as a single bit in an upcoming register. The enable will act as a chip select, allowing CPU control logic to dictate whether a particular register should store a value present on its input at the next clock pulse or retain the existing value. The asynchronous reset will allow a power-on reset circuit to reset all registers to zero when the CPU is first powered on.

D flip-flop with a clock and data input, a buffered output with a 10K load and an indicator LED.
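
The intended behavior of that register bit can be sketched in a few lines of Python. This is only a behavioral model under the assumptions that the flip-flop captures on the rising clock edge and that reset wins over everything else; it says nothing about the analog edge detector, and the class and parameter names are made up for illustration.

```python
class DFlipFlop:
    """Behavioral model of one register bit: clock, data, enable, async reset."""

    def __init__(self):
        self.q = 0
        self._last_clk = 0

    def update(self, clk, d, enable=1, reset=0):
        """Apply the current input levels and return the stored bit."""
        if reset:
            # Asynchronous: clears the bit regardless of the clock.
            self.q = 0
        elif clk and not self._last_clk and enable:
            # Rising clock edge with enable high: capture the data input.
            self.q = d
        self._last_clk = clk
        return self.q


ff = DFlipFlop()
ff.update(clk=0, d=1)
print(ff.update(clk=1, d=1))            # rising edge, stores 1
ff.update(clk=0, d=0)
print(ff.update(clk=1, d=0, enable=0))  # enable low, keeps the old value
print(ff.update(clk=1, d=1, reset=1))   # reset forces 0 without a clock edge
```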

If you have a D flip-flop and you have access to the inverted output, you can feed that back into the data input in order to make a T flip-flop. This type of circuit is great for chaining together to make counter circuits or clock dividers. I verified that the theory also worked on my D flip-flop circuit. I have several more circuits that I have to build and verify before I could theoretically put together a CPU of any sort. I need XOR gates. I also need some way of selecting one of multiple inputs to drive a bus. Both of these could be handled by the circuits I've already built at the cost of additional propagation delay as well as higher part count. However, I want to keep the part count low so I need to build simpler circuits. Currently I have plans to lay out several busses for the CPU core and I've decided to go with an open-collector design instead of tri-state output. This is because of the increased complexity involved in producing a tri-state output (multiple transistors, diodes and an inversion required) versus an open-collector output (a single pull-down transistor on an active-high bus).
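
To illustrate the T flip-flop idea, here is a small behavioral sketch of a ripple counter. It assumes rising-edge-triggered stages, with each stage clocked by the inverted output of the one before it; the class and function names are hypothetical and real propagation delays are ignored.

```python
class ToggleFlipFlop:
    """A D flip-flop with its inverted output wired back to D."""

    def __init__(self, initial_clk=0):
        self.q = 0
        self._last_clk = initial_clk

    @property
    def q_bar(self):
        return 1 - self.q

    def update(self, clk):
        if clk and not self._last_clk:
            # Rising edge: D is tied to the inverted output, so the bit toggles.
            self.q = self.q_bar
        self._last_clk = clk


def ripple_count(num_bits, num_pulses):
    """Chain toggle stages and return the counter value after each clock pulse."""
    # Later stages start with their clock input high, because the previous
    # stage's inverted output is 1 while that stage holds 0.
    stages = [ToggleFlipFlop(initial_clk=0 if i == 0 else 1) for i in range(num_bits)]
    values = []
    for _ in range(num_pulses):
        for clk in (1, 0):
            signal = clk
            for stage in stages:
                stage.update(signal)
                signal = stage.q_bar  # this stage's inverted output clocks the next
        values.append(sum(stage.q << i for i, stage in enumerate(stages)))
    return values


print(ripple_count(3, 8))  # [1, 2, 3, 4, 5, 6, 7, 0]
```

Each stage toggles at half the rate of the one before it, which is why the same chain doubles as a clock divider.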

In order to make the CPU core more modular and thus easier to build, I've decided to go with a microcoded architecture. This will let me prototype the CPU using an EEPROM to hold the microcodes and very quickly swap things out if I don't like how it works. The final CPU will use combinatorial logic to decode instructions and a diode matrix board per opcode to store the control signals at each step in the CPU's execution. I'll use a series of D flip-flops as a counter to control which microcode to select given a decoded instruction. This design also allows me to reduce parts in several critical areas of the CPU since I can reuse expensive parts such as an adder circuit to drive both the ALU and the program counter. This comes at the cost of slower instruction throughput as only part of each instruction will be executed every clock cycle. I could have made the trade-off to have more complicated logic circuitry, but when I'm looking at hand-soldering each transistor I would prefer to keep things simple and slow.
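
A minimal sketch of the microcode idea, assuming a shared fetch step followed by per-opcode steps selected by a step counter. The opcode names, control signal names and step layout below are invented for illustration and are not the project's actual microcode.

```python
# Hypothetical control words: each micro-step is the set of control lines
# asserted during one clock cycle.
FETCH_STEPS = [
    {"IP_TO_ADDR", "MEM_TO_DATA", "IR_LOAD"},  # read the opcode at IP into IR
]

# One list of micro-steps per opcode, standing in for one diode matrix board each.
MICROCODE = {
    "LOAD_A": [{"PC_TO_ADDR", "MEM_TO_DATA", "A_LOAD"}],
    "STORE_A": [{"PC_TO_ADDR", "A_TO_DATA", "MEM_WRITE"}],
}


def control_signals(opcode, step):
    """Return the control lines for a decoded opcode at a given step count."""
    steps = FETCH_STEPS + MICROCODE[opcode]
    return steps[step % len(steps)]  # the step counter wraps after the last step


for step in range(2):
    print(step, sorted(control_signals("LOAD_A", step)))
```

Swapping the dictionary contents is the software analogue of reprogramming the prototype EEPROM or replacing a diode board.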

With most of the basic theory out of the way and verified in-circuit, I got to work thinking about an instruction set. I took a lot of inspiration from the PDP-8, another transistor computer, as well as Ben Eater's simple 8-bit computer. Ben Eater's computer is more of a learning CPU since it only has 16 bytes of memory available. While it is Turing complete, it is extremely limited. I want to keep my CPU simple so that it's humanly possible to design, wire up, debug and code for. However, I do want to be able to write "useful" software for it. I'd like it to be capable of interfacing with external devices, possibly through serial or keyboard and VGA. I'd like to be able to code simple games or productivity software for it. And finally, I'd like it to be self-hosting, which means making it powerful enough to code an assembler that runs in a boot ROM. This necessitates a Von Neumann architecture. It also necessitates having access to a decent amount of memory and external hardware registers.

I settled on a hybrid 8-bit CPU design which gives software access to a 16-bit address space of RAM, ROM and external hardware registers. Staying true to its inspiration, I have a simple 8-bit, accumulator-based CPU and software can only interact with memory or this single register. However, several more support registers that aren't directly software-accessible will be 16-bit to enable full access to program and data memory and external hardware. I wanted to keep instruction decoding simple, so all instructions are 8-bit as well with no variable instruction width support. Most instructions deal with loading, storing or manipulating the accumulator in some manner, with a few instructions able to interact with special registers. Instead of conditional jumps, I'm going with a skip next instruction opcode which will allow any supported instruction to be made conditional. I don't currently have absolute jumps, call or return support, or a stack, but I have plenty of space reserved in the opcode space to add these in a future revision. Several of the registers are write-only or cannot be directly read or written from software, which means this CPU cannot be multi-threaded. I think that's okay though, given that the CPU is probably going to run on a clock in the kHz range and would struggle with even the simplest of multi-threaded code.

Snapshot of a Google Sheets document outlining my current instruction support.
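
To show how a skip next instruction opcode makes any instruction conditional, here is a toy interpreter. The mnemonics are made up for this sketch and are not the real instruction set, and the zero flag is simply initialized from the starting accumulator value to keep the example short.

```python
def run(program, a=0):
    """Run a list of hypothetical mnemonics and return the final accumulator."""
    zero_flag = (a == 0)  # simplification: seeded from A instead of a prior ALU op
    skip = False
    for op in program:
        if skip:
            skip = False  # the skipped instruction is fetched but not executed
            continue
        if op == "SKIP_IF_ZERO":
            skip = zero_flag
        elif op == "INC_A":
            a = (a + 1) & 0xFF
            zero_flag = (a == 0)
        elif op == "DEC_A":
            a = (a - 1) & 0xFF
            zero_flag = (a == 0)
        else:
            raise ValueError(f"unknown mnemonic: {op}")
    return a


program = ["SKIP_IF_ZERO", "DEC_A", "INC_A"]
print(run(program, a=0))  # DEC_A is skipped, so A ends at 1
print(run(program, a=5))  # DEC_A runs, so A ends back at 5
```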

The full list of busses and their design is as follows:

  • 16-bit data bus. This is the primary bus used for moving data between registers and memory. It is 16 bits in order to allow the instruction pointer register to interact with the ALU.
  • 16-bit address bus. This is the bus that feeds the address circuitry for main RAM. It is separate from the data bus to remove the need for a dedicated memory address register.
  • 16-bit ALU source bus. This bus feeds one input to the ALU.

The full list of registers and their capabilities is as follows:

  • 16-bit instruction pointer register (IP), holding the address of the current instruction in memory. Its value can be placed on the address bus or ALU input bus and it can read from the data bus. Software cannot directly set this, but a jump relative to immediate instruction allows it to be indirectly updated.
  • 8-bit instruction register (IR), holding the current instruction that was fetched from memory. It is write-only: it can read from the data bus, and it feeds the microcode decoding logic. Software cannot directly set this, but memory is modifiable so self-modifying code is possible.
  • 8-bit accumulator register (A), holding the current accumulated result. It can read from and write to the data bus, and it can output to the ALU input bus. Many instructions available to software can directly manipulate this register.
  • 8-bit ALU temporary register (B), holding a temporary value from the bus. It can read from and write to the data bus. Its output is also hardcoded to the second input of the ALU. Software has no capability to modify this register and it is used by various microcodes to accomplish virtually all CPU operations.
  • 8-bit memory page register (P) and 8-bit memory cell register (C), together holding a 16-bit address. The P and C registers can individually read from the data bus, and the combined PC value can be output to the address bus or the ALU input bus. Software can write to the P and C registers from the A register and can use the combined PC register contents to load from and store to memory, but it cannot directly read from either register.
  • 2-bit flags register, containing a carry flag (CF) and a zero flag (ZF). Software can directly set or clear the carry flag using a pair of instructions, and both carry and zero flags are set appropriately when carrying out any ALU-based operation which sources from and stores to the A register. It is not directly readable by software but there exist skip instructions that allow software to conditionally execute a particular instruction if either CF or ZF is set or cleared.
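
As a compact summary of the register list, the state can be sketched as a Python dataclass. The byte order of the combined PC value is an assumption here (P as the high byte, C as the low byte, going by the "page" and "cell" names), and the field names are only illustrative.

```python
from dataclasses import dataclass


@dataclass
class Registers:
    ip: int = 0  # 16-bit instruction pointer
    ir: int = 0  # 8-bit instruction register
    a: int = 0   # 8-bit accumulator
    b: int = 0   # 8-bit ALU temporary
    p: int = 0   # 8-bit memory page
    c: int = 0   # 8-bit memory cell
    cf: int = 0  # carry flag
    zf: int = 0  # zero flag

    @property
    def pc(self):
        """Combined 16-bit address, assuming P is the high byte and C the low byte."""
        return ((self.p & 0xFF) << 8) | (self.c & 0xFF)


regs = Registers(p=0x12, c=0x34)
print(hex(regs.pc))  # 0x1234 under the assumed byte order
```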

Aside from registers and busses, a few more pieces of hardware will exist to make the CPU core:

  • An ALU, which performs operations against the B register and either the A register (sign-extended from 8 bits to 16 bits), the IP register or the PC virtual register. All operations except for add operate only on the low 8 bits of the ALU bus. Add works on all 16 bits of the ALU bus and a sign-extended version of the B register. This allows software to request that the PC be incremented or decremented, and it allows the CPU to use the ALU to increment the IP as well as perform both conditional and unconditional jumps.
  • A zero generator which outputs all zeros to the data bus. This is for pre-loading the B register for certain ALU operations. We also need a -1 value, but given that we are using an open-collector bus design, we can simply turn off all outputs and the bus will read all 1s, which is equivalent to -1 in two's complement.
  • Some sort of ROM and some sort of RAM. Given that it's infeasible to build an SRAM circuit of any usable size out of discrete transistors, and core memory is long obsolete and difficult to obtain, I'll probably use a standard EEPROM and SRAM chip for this.
  • Combinatorial decoding logic, feeding diode ROM select boards and sourcing from the IR. This is the heart of the control circuitry, generating the control signals which feed the various register enable and bus output inputs.
  • Flags register combinatorial logic, feeding the data bus with either a 0 or 1 value given particular opcodes in the IR and current values in the CF and ZF registers. This allows us to preload a 1 or a 0 into the B register and implement a conditional skip.
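
The 16-bit add with a sign-extended B register can be sketched as follows. It shows how one adder covers advancing the IP, relative jumps and decrementing PC, and why an idle open-collector bus works as -1. The function names are hypothetical, flag handling is left out, and the assumption is that B latches the low 8 bits of the data bus.

```python
def sign_extend_8_to_16(value):
    """Widen an 8-bit two's complement value to 16 bits."""
    value &= 0xFF
    return (value | 0xFF00) if (value & 0x80) else value


def alu_add16(source, b):
    """Add the 16-bit ALU source bus to the sign-extended 8-bit B register."""
    return ((source & 0xFFFF) + sign_extend_8_to_16(b)) & 0xFFFF


# With nothing pulling the open-collector bus low it reads all 1s.
IDLE_BUS = 0xFFFF
b_from_idle_bus = IDLE_BUS & 0xFF  # B latches 0xFF, which is -1 in two's complement

print(hex(alu_add16(0x00FF, 0x01)))             # advance by 1 across a page boundary
print(hex(alu_add16(0x1234, b_from_idle_bus)))  # decrement
print(hex(alu_add16(0x0100, 0xFE)))             # relative jump back by 2
```

The three calls yield 0x100, 0x1233 and 0xfe respectively, all from the same add operation.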

Given that it's going to be rather expensive to prototype in terms of space, time and actual components, I went ahead and wrote a microcode simulator for the CPU design. This was super useful when I was laying out the supported instructions because I was able to test out the actual capabilities of such a CPU by writing miniature programs. Using this simulator I realized that I could do away with several opcodes such as shift right, and could cleverly manipulate the IP register using the ALU to implement standard instruction advancing, relative jumps and conditional execution. During the development of the simulator I also ended up writing a simple assembler and disassembler in Python, which will be super useful for writing code that runs on the real hardware before I get the on-target assembler off the ground. If you're interested in playing with it, I threw it up on my website. It also serves as the master documentation for microcodes since it fully simulates the various busses and registers.

I still have a lot of work to do on virtually all of the CPU pieces before I have anything resembling a real CPU. However, given the layout and simplicity of the busses and control signals, I should be able to assemble and test the CPU piece by piece. The next big thing I need to do is test circuitry that will allow me to assert a value on a bus so that I can start building the bus itself and then send out some boards to be fabricated. I think I'll start with register read/write and a bus and build out the various pieces from there. The most complicated part is going to be the ALU, but even that can be built function by function until I have a fully functioning ALU circuit. And finally, once I get this particular version of the CPU up and running I'll jump in and see if I can't get absolute jump, call and return instructions and a real stack implemented. Stay tuned for updates to this project!