MiniDragon Homebrew CPU Early Progress

March 27th 2020, 2:05 pm

Progress has continued on my MiniDragon Homebrew CPU at a fairly linear pace. I'm well on my way to a very early bring up. Lots of things have been cemented and I am narrowing in on the final physical layout for lots of parts! Since my last blog post a month ago I've made a ton of progress on both the hardware and software side of things, validated a bunch of assumptions and tested a giant chunk of the existing design individually. I have yet to do a full integration test but I am getting very close to having enough of the CPU built in order to start that process!

Physical Assembly and Layout

The current plan for the physical towers of components. The bold text represents finished sections.

At the end of February several major components were out to fab and I had no design for the CPU itself outside of the simulator. In order to get an accurate part count I started putting together a high-level block diagram for the whole thing. Despite a lot of limitations and bugs, I decided to do this in KiCad. The advantage is huge: I have a block diagram where each component can be opened up and inspected, all the way down to an individual transistor. So MiniDragon is about as fully documented as is possible. I have yet to finish all of the diagrams but they are already complete enough for me to have built six components of the CPU as well as write a microcode programming generator. If you are curious, the block diagram is up on GitHub, along with everything else in this blog entry!

I've continued assembling circuits as they come back from fab. Progress has shifted from initial board bring-up on the first board of a design to bulk assembly of boards. I've been building components as fast as I can in order to get enough boards to build out the various high-level components of the CPU itself. Since the last blog post I've assembled nine Rev. 2 D/T flip-flops, almost a dozen simple logic gates, a few 1-to-2 and 2-to-4 decoders and a handful of the 4-bit register circuits. On the completed components I count 24 logic boards, flip-flops and demultiplexer circuits and another 24 breakout boards used to bring connections out to the edge of the 1'x1' component boards.

Assembling the D/T flip-flops to be used in the microcode counter board.

As for the components themselves, I have assembled the B and D registers (seen below on the right), the instruction register, the flags circuitry, the microcode counter (seen below on the left) and the data bus (below in the center). The pieces of each of these that interface with each other have been connected as well. Each board has been verified in isolation to ensure that it performs as specified. However, without the beginnings of an instruction decoder any test to verify that the components play well with each other will be meaningless so I've held off for now. I'm sure I'll find stuff during integration and bring-up but that's how every project goes. I'm extremely excited to be very close to this, however!

Current layout of completed components.

The block diagrams also include the wiring for control signals coming out of the instruction decoder. My current design revolves around a bunch of 32-bit ROM boards which are programmed using jumpers and collected through a pull-down open-collector bus. That means that I have plenty of room to represent the 29 control signals as reprogrammable microcodes per-instruction. It also means that I now have a physical location in ROM for each of the control signals. With that I've been able to write a utility that takes the instruction classes in the current simulator and spits out a microcode programming guide for each instruction, telling me how many microcode entries each instruction needs as well as where to put the jumpers in order to make the instructions work correctly. This also means that any modifications I make to the instruction set in the simulator can be quickly reflected in hardware. This will surely come in handy as I continue to iterate on the instruction set.
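To give a rough idea of what such a generator does, here is a minimal sketch in Python. It is not the actual utility: the signal names, their bit positions and the rule that a jumper means an asserted signal are all assumptions made up for illustration. Only the 32-bit ROM width comes from the design described above.

```python
# Hypothetical control-signal names and ROM bit positions (illustration only).
# The real design has 29 control signals; these five are invented.
SIGNAL_BITS = {
    "PC_OUT": 0,
    "IP_INC": 1,
    "A_LOAD": 2,
    "ALU_ADD": 3,
    "UCODE_RESET": 4,
}

ROM_WIDTH = 32  # bits per ROM board, as described in the post


def encode_step(signals):
    """Return the ROM word with one bit set per asserted control signal."""
    word = 0
    for name in signals:
        word |= 1 << SIGNAL_BITS[name]
    return word


def jumper_positions(word):
    """List the bit positions that need a jumper (assumed: jumper = asserted)."""
    return [bit for bit in range(ROM_WIDTH) if word & (1 << bit)]


def programming_guide(instructions):
    """Map each mnemonic to a list of (microcode step, jumper positions)."""
    guide = {}
    for mnemonic, steps in instructions.items():
        guide[mnemonic] = [
            (index, jumper_positions(encode_step(step)))
            for index, step in enumerate(steps)
        ]
    return guide


if __name__ == "__main__":
    # A made-up two-step instruction, just to show the output shape.
    example = {"ADD": [["PC_OUT", "IP_INC"], ["ALU_ADD", "A_LOAD", "UCODE_RESET"]]}
    for mnemonic, steps in programming_guide(example).items():
        print(f"{mnemonic}: {len(steps)} microcode entries")
        for index, jumpers in steps:
            print(f"  step {index}: jumpers at bits {jumpers}")
```

The length of each instruction's list is the number of microcode entries it needs, and the bit positions say where the jumpers go on that entry's ROM board.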

Software and Opcodes

The instruction set itself has changed little since the last blog post but the changes I have made unlock new processing power. I made some minor adjustments to the ADDPC/SUBPC instructions, renaming them to ADDPCI/SUBPCI (the I standing for immediate, to bring them in line with the other immediate-based instructions). I also added a new 4-bit sign-extended immediate register to complement the 6-bit one that currently exists. This allowed me to optimize the ADDPCI/SUBPCI instructions in terms of clock cycles per execution as well as remove the need for 29 ROM boards! The instruction change doesn't seem like much, but the overall speedup to the standard library was around 7% and a few of the worst algorithms were sped up by over 20%. Also, the ROM boards themselves are massive and thus expensive, so reducing my need by such a large number of boards is huge!
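For anyone unfamiliar with sign extension, this is roughly what the two immediate registers do, sketched in Python. The function name is my own for illustration and the hardware obviously does this with wiring rather than arithmetic.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    mask = (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    value &= mask
    # Flipping the sign bit and subtracting it maps the top half to negatives.
    return (value ^ sign_bit) - sign_bit


if __name__ == "__main__":
    for bits in (4, 6):
        lowest = sign_extend(1 << (bits - 1), bits)
        highest = sign_extend((1 << (bits - 1)) - 1, bits)
        print(f"{bits}-bit immediate covers {lowest} to {highest}")
```

A 4-bit immediate covers -8 to 7 and a 6-bit one covers -32 to 31, so the smaller register is only useful where small adjustments are enough.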

Measurements taken in the simulator before and after the ADDPCI/SUBPCI change.

Renaming the ADDPC/SUBPC instructions to ADDPCI and SUBPCI opened up the ADDPC namespace for a new instruction that can adjust the PC register by a signed offset stored in the A register. I originally added this instruction to rewind string pointers for strcmp/strcat/strlen/strcpy functions in the standard library. However, after finishing the implementation I realized that it also unlocks indirect memory addressing. This means object-oriented operations, array lookups and jump tables all become much, much faster. It was always theoretically possible to increment/decrement the PC in a loop, but this takes operations that could potentially be thousands of clock ticks down to a single instruction and makes them feasible to use in practice. Much like introducing SKIPIF gave me Turing completeness and introducing PUSHIP/POPIP gave me subroutines, this single instruction gives me yet another degree of power to write complex algorithms!
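Here is a small Python sketch of the difference between the new ADDPC and the old increment/decrement loop. The register widths (an 8-bit A register and a 16-bit PC) and the function names are assumptions for illustration, not taken from the simulator.

```python
PC_MASK = 0xFFFF  # assumed 16-bit PC register
A_MASK = 0xFF     # assumed 8-bit A register


def signed_a(a):
    """Interpret the A register as a signed two's complement offset."""
    a &= A_MASK
    return a - 0x100 if a & 0x80 else a


def addpc(pc, a):
    """One-shot adjust: PC plus the signed offset in A, wrapped to PC width."""
    return (pc + signed_a(a)) & PC_MASK


def addpc_by_loop(pc, a):
    """The old way: step the PC one location at a time, counting the steps."""
    offset = signed_a(a)
    step = 1 if offset > 0 else -1
    steps = 0
    while offset != 0:
        pc = (pc + step) & PC_MASK
        offset -= step
        steps += 1
    return pc, steps


if __name__ == "__main__":
    table_base = 0x2000  # made-up address of a table of one-byte entries
    index = 5
    print(hex(addpc(table_base, index)))
    print(addpc_by_loop(table_base, index))
```

Both routes land on the same address, but the loop's cost grows with the size of the offset while the single instruction's cost does not.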

On the software side of things, I've been hard at work fleshing out the standard library for MiniDragon. I've been coding up a host of useful basics that one might expect in a stdlib, such as atoi/itoa, strlen/strcpy/strstr/strcmp, cmp/add/negate/multiply/divide and processor initialization routines all of which are up on github. This has been an enormous amount of fun! I love nothing more than writing a standard library from scratch on a new CPU architecture that doesn't even exist physically yet! It has also been extremely valuable. The changes I've made to instructions that I detailed above came directly out of this exercise. In order to make the standard library as useful and comprehensive as possible I've been making sure that all the functions are fully tested and side-effect free from the perspective of the caller. That meant, in the case of the string functions, adding the ADDPC instruction in order to facilitate this! It has also been a huge relief to see that it is indeed possible to do real-world, useful things with this CPU and that it is not just a physical build of a toy instruction set.

Finally, in the process of writing the standard library I've made a lot of progress refining the toolset that comes with MiniDragon. I've made a boatload of improvements to the assembler, fixed several small bugs in the simulator and added additional utilities such as a function visualizer and the aforementioned microcode ROM programming generator. The processor test suite now includes validation for many classes of errors that the assembler should generate, and the assembler is much more helpful in communicating why something isn't valid. This has been especially valuable in tracking down JRI instructions that reference labels outside of the 31-byte jump boundary. The simulator frontend now allows step-over-function debugging as well as run-until-return debugging in tandem with the existing single-step. And finally, the assembler supports much more powerful constant definitions, including the addition of the ".char" and ".str" directives for data embedding as well as support for using character literals as parameters to instructions. This has allowed me to keep the standard library fairly readable (as readable as a low-level stack-based CPU can be) while also using fewer instructions for referencing constants.
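The kind of check that catches those out-of-range JRI labels looks something like the following Python sketch. The error class, the function name, the message wording and the symmetric -31 to 31 limit are all assumptions for illustration; the real assembler may measure the offset differently.

```python
class AssemblerError(Exception):
    """Raised when source cannot be assembled (hypothetical error type)."""


JRI_LIMIT = 31  # the 31-byte jump boundary; assumed symmetric here


def check_jri_offset(label, label_address, jump_address):
    """Return the relative offset to label, or explain why it cannot be used."""
    offset = label_address - jump_address
    if abs(offset) > JRI_LIMIT:
        raise AssemblerError(
            f"JRI to '{label}' is {offset} bytes away, "
            f"outside the {JRI_LIMIT}-byte jump boundary"
        )
    return offset


if __name__ == "__main__":
    print(check_jri_offset("loop", 0x10, 0x1A))
    try:
        check_jri_offset("done", 0x80, 0x1A)
    except AssemblerError as error:
        print(error)
```

Naming the label and the actual distance in the message is what makes the failure quick to track down.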

Screenshot visualization of strlen, showing stack tracing for PC and SPC registers.

What's Left?

Of course, I'm nowhere near done! I've had a few expected setbacks and for everything I finish two more things magically show up on my TODO list. The AND/OR gates that I put together to make the microcode counter board ended up not working in-circuit so I had to submit redesigns of them to fabrication before assembling the microcode counter board. Some of my seemingly simpler components were revealed to be more complicated once I block diagrammed them which meant that I had to submit more fabrication requests. And, of course, assembling the 4-bit register boards is still a very long process. I have gotten the time down from about 3 hours to an hour and twenty minutes. I have also refined my soldering technique which has resulted in far fewer parts coming up wrong during bring-up, reducing the need for time-intensive debugging sessions. However, I still have 14 more boards to assemble, so at the current rate of assembly that's about 20 hours of soldering!

Stack of register boards awaiting assembly and bring-up.

As if assembling enough 4-bit register boards to complete the PC, IP and A registers weren't enough, I also have hours upon hours of additional things to tackle before I'm on the home stretch. The microcode programming boards and the control signals termination and distributor board are all with OSH Park right now. When they arrive, I'll need to bring them up and ensure they work as expected before I order a ton more ROM boards. I have another 30 bus output boards coming from fab which will be used for everything from the general purpose registers to the immediate register and the ALU. I also need to start the design of both the ALU and the external memory interface circuitry which will talk to the external RAM, bootROM and external hardware. I also need to decide on that external hardware in order to work on IO routines in the standard library. Right now I am leaning towards a simple LCD for debug messages and an 8-bit serial chip to provide standard in/standard out. It might never happen, but I'd love to pair this thing to a VT-100 like some old mainframe. On top of all that, I need to continue working on the standard library. I have yet to code memcpy, memset or memcmp. These will be similar to the existing string functionality. I also need to create 16-bit and 32-bit versions of the math and conversion libraries, and also handle 8-bit, 16-bit and 32-bit signed variants of the math library. And finally, once I standardize on the IO itself, I'll need to create access routines and start working on a basic shell to place in the bootROM.

As I get closer to the physical realm one thing is becoming clear: I need to finish this. If I don't, and get to 80-90% done before calling success, I am going to miss the other 80-90% of the process. I am not interested in software-only theoretical CPUs, I want a real life stack of hardware that executes software I wrote for it, built from the ground up. It is going to be a lot of hand-soldering and a lot of patience, but it is going to be VERY worth it!