MiniDragon Takes its First Steps!

May 1st 2020, 6:19 pm

Last time I posted about the MiniDragon CPU project I had assembled several components, sent out the heart of the instruction decoder to fabrication and was just beginning to assemble the (at the time) 14 additional register boards to complete the PC, IP and A registers. I had done some initial optimization of the existing stdlib and had a pretty good idea of how I wanted to lay out the boards physically. I had not yet tested any of the boards with each other, although I was relatively confident in their operations. A lot has changed since then! First and foremost, I have assembled enough of MiniDragon to successfully execute my first instruction! Last night, after debugging the last known issue with the boards, I was able to step through the microcodes for LOADI and set the A register to the sign-extended immediate value stored in the lower 6 bits of the LOADI opcode. This works both in manual clocking mode where I can single step with a pushbutton and in automatic mode where the clock generation circuit runs the CPU at a predetermined speed.

Problems with the Instruction Decoder

The instruction decoder is essentially a pull-down bus with 32 parallel control signals. There is a distributor which provides 1K pull-ups and a pair of 32-pin connectors designed to plug into the ROM boards as well as feed the various control signals for the CPU. The ROM boards consist of a series of jumpers allowing me to set which control signals should be pulled low when the ROM is active. There are two variants of the ROM board: a 4-position board which has two bits of addressing and an enable input, and a 1-position mini-board which has only an enable input. When a ROM board is enabled, it will select the correct set of jumpers and pull the correct bits of the bus low to set the control signals for a particular microcode. When it is not enabled, the open-collector transistors providing the pull-down effect will be deactivated and thus high impedance. This is the most elegant solution I could find for tying a large number of ROM boards together.

Initial bring-up of the first ROM board connected to the control signal distributor board.

Now, when measuring the control signal outputs all seemed to be acceptable. The control signals illuminated their respective indicator LED and output ~4.7V according to my oscilloscope. When off, they appeared close enough to 0V to cut it. So, I assembled enough ROM boards to code for a LOADI instruction, including a mini-board that stored the "load next instruction from memory, placing it into the instruction register" microcode. In isolation, everything appeared to work fine. When I connected the instruction decoder board to the microcode counter board, I was able to successfully step through the various microcodes, seeing the control signals change for every step. So I hastily assembled and connected the rest of the components necessary for the LOADI to work (instruction register, A register, immediate register).

The boards I assembled in one marathon assembly in order to integration test a single instruction.

You know what comes next. Nothing worked. Nothing. Somehow the microcode counter wasn't counting up, the instruction register wasn't loading the value from the bus, and it seemed like I completely fried something. The only thing that appeared to work was the SRAM emulator circuit which was faithfully outputting an 8-bit binary value onto the data bus that I had entered on a bank of DIP switches. I know better than to just throw everything together and hope for the best but I'll admit that I was very excited. So, it was back to the drawing board and I was quite disappointed. I put the project down for a night and gave it some thought.

The next morning, I started by taking some measurements on the control signals. I found it a bit suspicious that when I connected the instruction register to the control signal distributor the indicator LED for that signal got quite dim. So, I poked at it with an oscilloscope and found that the control signal line was sitting at around 3V. Unplugging one of the two 4-bit registers bumped it up to almost 4V and unplugging both brought it back to the expected 4.7V. However, when buffering through a pair of not gates, the voltage for a logic high only dropped by a few tenths of a volt. So I went back to the schematics and realized that I had a pair of design flaws.

The inputs for all of my gates go through 10K current limiting resistors before hitting their respective transistors. For almost all circuits, there is only one resistor/transistor pair that inputs connect to. However, the 4-bit registers are effectively 4 identical copies of a 1-bit register laid out on a single board. I had connected both the EN and RST lines directly to the respective 4 bits. This meant that while the data inputs and clock input could be seen as costing 1 fanout to a driving circuit, the EN and RST lines cost 4 fanout. So, connecting two registers was a cost of 8 fanout. Basically connecting a single register board was equivalent to trying to drive four NOT gates at once. Most of my circuits included a emitter-follower buffer on the output stage, ensuring that there is no current limiting and effectively making the fanout a function of the input impedance (normally 10K per circuit) and the 2N2222A's maximum current. So this flaw would not be fatal in and of itself. However, the control signal distributor did not include any output conditioning. It included only a 1K pull-up per signal, allowing that to drive the logic level to 1 when a ROM wasn't setting it to 0.

In the control signal distributor the 1K resistor acts as a current limiter and makes the output voltage succeptible to the number of gates connected to that circuit. To understand why, Ohm's law can be used. If we look at the voltage at the control signal pin when only one circuit is connected, we have 5V feeding a 1K resistor, the control signal pin and then a 10K resistor feeding the base of a 2N2222A. While there is a voltage drop across the 2N2222A, it doesn't matter when trying to understand the problem. So, we have a simple voltage divider with the voltage at the control signal pin equivalent to 5V * (10K/(10K + 1K)) or around 4.5V. That should work. Okay, so let's connect two circuits. Ohm's law also allows us to calculate effective resistance when multiple resistors are in parallel, which in this case is 5K. So, the control signal voltage is now equal to 5V * (5K/(5K + 1K)) or around 4.1V. Its easy to see a pattern here, the more inputs you place in line, the lower the effective resistance is on the low side of the 1K resistor and therefore the lower the voltage is. At a certain point, it becomes not enough to be seen as a logic 1.

So, with the existing design of the control signal distributor there was no way to reliably drive more than 1-2 circuits on a single control line, and an 8-bit register was effectively 8 circuits from the perspective of the instruction decoder. Fortunately, I know how to solve this! Most of my circuits already included emitter-follower buffers to drive their outputs. These buffers pull down to GND instead of up to 5V, meaning they can source as much current as downstream circuits demand up until the transistor burns out. For the 2N2222A, you can provide about 800mA of current before the transistor dies. With 10K current limiting resistors on all the inputs of my circuits, that fanout works out to ~1600 circuits. For my purposes, that's effectively infinite. So, I needed to design a buffer circuit that could be plugged into the control signals distributor that provided an emitter-follower buffer per bit. I looked at my schematics and realized the clock and reset circuit also suffered from the same flaw on its outputs. So, I designed an 8-bit buffer for the control signals and a 2-bit buffer for the clock and reset circuit.

Emitter-follower buffer for the clock and reset lines.

It happens that the 2-to-4 decoder also suffers from this particular flaw. I cheaped out on the design, dropping the output buffer as well as the indicator LEDs. So, instead of trying to patch existing circuits I redesigned it to include both the standard buffering and indicator LEDs that are available on all other circuits. Since designing it I've found that the LEDs are an amazing debugging aide so making larger (and more expensive) boards which contain them seems to be always worth it. That wasn't as clear to me when I laid out the original version of the board a few months ago. Also, remember in a previous blog post where the original AND and OR gates I designed didn't work in some cases? That turns out to be the same root cause so if I'd taken a bit more time to understand why they failed the way they did instead of just shrugging and throwing the output stage back on them I might not have had to fix so much. Oh well though, what's done is done and I have a much more solid understanding of just why everything is actually working together.

Additional Issues

Of course, in any engineering project bigger than a small program or circuit there will be unforseen problems. So its no surprise that when I took the time to bring up the components slowly and in a more organized fashion I found several more issues with my overall design. The first was relating to the data inputs of the various registers. The data bus works very similarly to the control signals distributor. There is a central backbone component providing 1K pull-ups and a bunch of circuits hanging off of that which either read from the bus, write to the bus or both. They work under the principle that when a component isn't reading it doesn't consume any current from the bus and thus doesn't affect its voltage level. That turned out to not be the case for the 4-bit registers. This is due to the inverter connected to the reset input of the internal SR-latch for each bit. So, if I connected enough registers, the voltage on the data bus would sag until other registers only saw 0's on the bus. I didn't want to sacrifice on the elegant design of having one 8-bit bus connection per register so I instead sacrificed on the current consumption for the bus circuits. Remembering the formula for voltage dividers above, we can generalize it for X registers connected like so: 5V * ((10K / X)/((10K / X) + 1K)). As X increases, the dominant factor in the equation becomes the 1K resistor. So I swapped out the 1K pull-ups for 100Ohm pull-ups instead. If we assume 5 registers connected, we go from a 3.33V for a 1 up to a 4.76V for a 1 which is more than adequate.

Partially re-connected components as I brought up each piece methodically.

The second problem was somewhat surprising, but easy to account for. There's a large amount of voltage drop across the feeder wires from the bench power supply to the circuits themselves. This is obviously current-dependent since wires have parasitic resistance. So, the more circuits I plugged in, the more the voltage sagged at the inputs for each circuit. Remember that the power on reset circuitry in the clock generator looks for a voltage of around 4.7V and holds the system in reset if it drops below that amount. This allows the system to self-reset when being powered for the first time but also means that we're fairly sensitive to voltage drops. Once I figured this out it was easy enough to put a probe from the oscilloscope on the power inputs for a random circuit and adjust the voltage of the bench power supply until the oscillocope read 5V. Once I did this the circuits that were behaving erratically when everything was connected started behaving correctly.

The third had to do with the way the instruction decoder ROM boards pulled relevant control signals to 0. I noticed that when the ROM boards were active, they would only pull the control signals to ~1V. This was good enough before the emitter-follower buffers, but now that those were in place lots of circuits were seeing all control signals as active all the time. This turned out to be the current limiting resistor feeding the open-collector pull-down transistors. In the case of the 4-position ROM boards, I replaced the resistors with bridges as they were fed by the internal demux circuit which was already current limited. For the mini-ROM boards I first tried a bridge but burned out the enable transistor on the load instruction ROM. Replacing the bridge with a 1K resistor instead of the original 10K resistor did the trick.

Finally, I had a phantom problem with the clock. I noticed that when additional circuits were connected, the clock would start behaving more and more erratically. It got to the point where just connecting the clock to the general purpose registers was enough to break synchronization for the whole CPU. I spent a bunch of time thinking on this one before starting to take a bunch of measurements. However, they all ended up being unnecessary as the problem turned out to be the 2-position DIP switch on the clock circuit itself. This switch allows you to select between manual and automatic clocking and must have been shorting in some cases, leading to erratic clock behavior. After flipping the DIP switches a few times, the problem went away!

Current state of MiniDragon, with A register holding the binary value 00010111 which was set via a LOADI instruction.

New Physical Layout

I wasn't quite happy with the layout from last time as it stretched some control signals 4+ feet and took up a lot of room physically. So, I came up with a revised 3x3 layout instead of a 2x5 layout. This new version puts the busses much closer to their intended components and also takes into account power circuitry. In order for this to work, the circuits will have to be stacked higher but that's no big problem. My partner also pointed out that she thinks it would look cooler with clear acrylic instead of black ABS so I have a few 12"x12" sheets coming to try that out. I agree with her since its a bit disappointing to work hard on LEDs and circuit aesthetics only to have everything obscured by stacks of additional boards. So, we'll see how that looks!

I've done lots of little tweaks to the schematics in order to make sure they stay up-to-date with the actual built MiniDragon. There haven't been many design changes except to the instruction decoder itself which needed updates to reflect a new opcode layout as well as the buffer boards. I also switched the MCODE_RST control signal to being active low instead of active high. I was a bit worried that it was possible for a ROM board to be deselected before another ROM board was activated. If you remember, control signas are default high with pull-down jumpers. That means if there were variable delays in the enable lines for various microcode ROM boards it would be possible for the MCODE_RST line to pulse high briefly. This is tied to the RST inputs of the flip-flops on the microcode counter board which are async. So, in order to head off any potential problems I switched polarity of the signal. That way, its active low and default high meaning it will never be activated by mistake. All other control signals are acted upon during clock change so this isn't a problem for any other circuits. As a bonus, this also saves jumpers as only one is needed on the final microcode.

The new layout, with bold highlighting completed boards.

Software Side

My progress on MiniDragon has not been limited to hardware. When I needed a break from assembly or I was mulling over problems I tackled lots of TODOs on the software side of things. As I've been coding the stdlib I've run into a few annoying patterns. One of them was that you had to stick temporary values in memory in order to perform math against them. The accumulator was just that: an accumulator. So, more complicated algorithms wasted a lot of their time shuffling values around in memory to get their job done. I took at look at the instruction set and realized that the "no variable width instructions" ship had sailed with the introduction of LNGJUMP and decided to make LOADI perform the same. So, I moved it into the stack operations opcode range and updated the microcode so that LOADI loads an 8 bit immediate value from the next position in memory after the opcode. This meant that a lot of operations in the CPU were much faster now because setting an arbitrary value in A didn't involve lots of shifting and adding. It also meant that I freed up 64 opcodes for use with register math!

I realized that I had several spare register boards laying around that I could populate and bring up so I introduced two temporary registers: U and V. I added additional opcodes to work with these registers. They both hold an 8-bit value, and they can be used as the second math source for all ALU operations. This means that you can manipulate the accumulator using the value in memory that PC points to or the value in U or V. Additionally, instructions to load/store the value of memory at PC into U/V have been added, instructions to move values between A, U and V have been added and instructions to swap the value of A and U, A and V or U and V have been added. I took advantage of these instructions in a few functions, most notably the umult function which is both 20% faster and slightly smaller in the bootROM. Similar optimizations have been made to several other functions. Notably, atoi is almost 50% faster than it was when I originally coded it.

Nothing comes for free, however. If you remember in the last blog post, I had three control signals to spare for future additions. In order to have a working U and V register, I need four control signals (read bus to U, read bus to V, write U to bus, write V to bus). So, I had to take a look and find another group of signals that would never be active at once. I settled on the control signals that feed various bus outputs to the top half of the data bus. It shouldn't be possible to have more than one signal for data bus writing active at once because then two circuits would fight over what value to place on the bus. So, I took three signals (ALU_OUT, A_HIGH_OUT and D_HIGH_OUT) and connected them through a 2-to-4 decoder so they could take up only two bits on the ROM boards. As a bonus, the default value (11 in binary) is mapped to nothing, meaning that for these signals, leaving jumpers out defaults to no signals active. Much like the MCODE_RST negation above, this saves a lot of jumpers in practice.

In order to make U/V registers work with math, a new chunk of the opcode space has been carved out for arithmetic that uses two bits as the operand input. This also left room for modifications to SHL/SHR. As of the last blog post, shifting left or right always shifted in a 0 to the unoccupied spot, and always shifted the lost bit into the carry flag. This meant that in order to shift a 16/32-bit number you would have to do a fair amount of juggling numbers. Given that shifts do not take a second operand, I had room in the opcode layout for variations on shifting. So, I've added rotate left/right and rotate with carry left and right. The former, known as ROL/ROR, shifts as expected, but sets the shifted in bit to the bit that was shifted out, allowing a barrel rotate. The latter, known as RCL/RCR does similar, but the bit which is rotated in comes from the carry. Both set carry to the bit that was shifted out. RCL/RCR are especially useful in umult as they allow me to multiply/divide the two operands by two as per the Russian peasant's algorithm far more efficiently.

The udiv function is now more correct than it was before. Previously, it could only divide unsigned integers that were in the range of 0-127. While it was performing unsigned math, it could not function against numbers that appeared signed on account of how it determined that it was finished. Given the speed-ups afforded by the U/V registers, I was able to switch it over to using ucmp and lift this restriction. It is still twice as slow as it used to be, but it no longer has any caveats. There is also a working implementation of umult16 and umult32 meaning that while MiniDragon technically only has an 8-bit adder, it can successfully multiply two 32-bit integers as long as their result fits in 32 bits.

The assembler/disassembler I've written for the project has its limitations which prevents having separate opcodes decode to the same mnemonic. I work around that by having plenty of duplicated instruction names which include the operand in the name, such as LOADA to load the A register and LOADU to load the U register. This looks ugly in practice, so I sugar over it with macros which the assembler has supported from the beginning. So, under the hood there is a LOADA, LOADU and LOADV instruction, and then there is a LOAD macro which takes a single parameter and emits the correct mnemonic. This allows me to type things such as LOAD A instead of LOADA, making the assembly much prettier in my opinion. If I was truly ambitious, I could integrate with an existing table assembler but I don't see the benefit at the moment.

Next Steps

With the successful integration of the currently built boards and the execution of the (outdated) LOADI instruction I now have a lot more confidence in my direction. I still need to design and lay out the ALU and SRAM interface circuits. However, the design of the rest of the components is solidified and tested. All I have to do to complete the general purpose registers and the majority of the instruction decoder is solder about a billion components to way too many circuits, bring them up and then build the various boards with them. I also need to finish up several functions for the standard library and choose an 8-bit serial chip to use for IO routines. A friend of mine has graciously volunteered a serial terminal that I can use to interface with MiniDragon some day when it is complete. So hopefully next time I write an entry I'll have things further along!