8x16-bit Register File

The goal of today’s post is to build an 8x16-bit register file, i.e. a bunch of memory elements packaged as a single component. Figure 1 shows the final component we are going to build. It can read two registers asynchronously and write one register on the rising edge of the supplied clock.

The first input is the clock (clk). The 16-bit value dD will be written to the register pointed to by rD on the rising edge of the clk input. Since the registers are 2 bytes wide, each byte can be written independently via the lb_0 and lb_1 inputs (lb = load byte).

Inputs rA and rB select the two registers we are interested in reading. Their current values will be available on the dA and dB outputs. Note that these two inputs aren’t synchronized to the clk input. They can change at any moment and their respective outputs will be updated (almost) immediately.

Finally, oeA and oeB inputs (oe = output enable) control whether the corresponding output will hold the current value of the selected register, or if it will float. This way, we can use multiple instances of this register file to increase the number of available registers in a circuit, by combining all their outputs on a common bus. More on this at the end of the post.

Positive edge triggered D flip flop

Let’s start small. The basic building block for this component is the “Positive Edge Triggered D Flip Flop” shown in figure 2. It consists of three cross-coupled active LOW SR latches. Every time the clk input goes from LOW to HIGH, Q is updated to mirror the input D.

Qb is the complement of Q. It won’t be used later, but it’s there for completeness. The same flip flop can be used to build other kinds of circuits where Qb might be needed. The latches are connected in such a way that the Q output is updated only on the transition of the clk input from LOW to HIGH (positive edge). At any other time, changes on the D input don’t affect the Q output. For additional details take a look at the Wikipedia article on flip flops (paragraph “Classical positive-edge-triggered D flip-flop”).
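To make the edge-triggered behaviour concrete, here is a minimal behavioural sketch in Python. It doesn’t model the three latches from figure 2, only what the flip flop does as seen from outside. The class name, the step method and the use of None for the simulator’s Undefined state are my own assumptions for illustration.

```python
class DFlipFlop:
    """Behavioural model of a positive-edge-triggered D flip flop."""

    def __init__(self):
        self.q = None        # None stands in for the Undefined state (assumption)
        self._prev_clk = 0   # last clk level seen, used to detect edges

    def step(self, clk, d):
        """Apply the current clk and D levels and return Q."""
        # Q follows D only on a LOW -> HIGH transition of clk.
        if self._prev_clk == 0 and clk == 1:
            self.q = d
        self._prev_clk = clk
        return self.q

    @property
    def qb(self):
        """Complement of Q (Undefined while Q is Undefined)."""
        return None if self.q is None else 1 - self.q


ff = DFlipFlop()
ff.step(0, 1)   # clk is LOW: Q is still Undefined
ff.step(1, 1)   # rising edge: Q captures D
ff.step(1, 0)   # clk stays HIGH: the change on D is ignored
```

Calling step with the same clk level twice never changes Q, which is the property the rest of the post relies on.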

1-bit Register

The flip flop from figure 2 can hold 1 bit of data. The clk input doesn’t need to be an actual clock. Any kind of 1-bit signal can be used as a clock and the flip flop will be updated on that signal’s rising edge. In cases where the clk input is actually a free-running clock, we have a problem if we don’t want to update the flip flop’s contents on the next rising edge, e.g. when the wire connected to the D input changes value but we don’t want to store it in the flip flop on the next clock tick.

Figure 3 shows how this can be accomplished. By placing a 2-input multiplexer in front of the D input, with the first MUX choice being the old value and the second choice being the new value, we can select whether we want to update the flip flop or not, via the new load input.

In order to distinguish this circuit from the flip flop shown above, I’ll call it a 1-bit Register.
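The load mechanism can be sketched in a few lines of Python. This is only an illustration under my own assumptions: mux2 and Register1 are hypothetical names, and the register starts at 0 instead of Undefined to keep the example short.

```python
def mux2(sel, a, b):
    """2-input MUX built from 2 AND gates and 1 OR gate (plus a NOT on sel).

    Returns a when sel is 0 and b when sel is 1.
    """
    return ((1 - sel) & a) | (sel & b)


class Register1:
    """1-bit register: a D flip flop with a MUX in front of its D input."""

    def __init__(self, initial=0):
        self.q = initial     # assumption: start at a known value
        self._prev_clk = 0

    def step(self, clk, load, d):
        # The MUX feeds back the old value unless load is HIGH.
        next_d = mux2(load, self.q, d)
        if self._prev_clk == 0 and clk == 1:   # rising edge
            self.q = next_d
        self._prev_clk = clk
        return self.q


reg = Register1()
reg.step(1, 0, 1)   # rising edge, load LOW: the old value is kept
reg.step(0, 1, 1)   # no edge: nothing happens yet
reg.step(1, 1, 1)   # rising edge, load HIGH: the new value is stored
```

Note that the flip flop is still clocked on every rising edge. With load LOW it simply stores its own output again.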

Side note: I’ve created the multiplexer using 2 AND and 1 OR gates instead of 2 tristate buffers and a 2x1-bit bus, as shown in the ALU post. This is because the tristate MUX has a small glitch which affects the rest of the circuit (at least in DLS). When changing the sel input of the tristate MUX, there’s a simulation timestep where both bus inputs are active at the same time. In such cases the buses in DLS are configured to output an Error value. If the output of the bus is connected to the D input of the flip flop, we might end up in an invalid state from which it’s impossible to recover. In the ALU circuit this wasn’t a problem since all components were combinational and they correctly handled Error inputs.

16-bit Register

Expanding to 16 bits is easy. We just use 16 instances of the 1-bit Register component, wire everything together and we are done. Figure 4 shows the 16-bit Register circuit.

As I mentioned at the beginning of the post, since the register is 16 bits wide (2 bytes), we might want to control (write) each individual byte separately. This is the reason the load input is 2 bits wide. Each bit controls one of the bytes. If it’s not obvious from the figure, load_0 is connected to the load inputs of the first 8 1-bit registers and load_1 is connected to the load input of the other 8 1-bit registers.

Side note: The first time a register is initialized, both bytes should be written. The initial state of the D flip flop produces an Undefined output (because the clk hasn’t ticked yet). Since we don’t mask/split the register output to separate the individual bytes, having only 8 of the 16 bits initialized and the rest in an Undefined state will produce an Undefined value on the Y output. This is because, if at least 1 of the inputs on a wire merger is equal to Undefined or Error, the rest of the bits are ignored and the special state is propagated to the output.
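The byte-wise load and the Undefined behaviour described above can be sketched as follows. This is a simplified model under my own assumptions: Register16 is a hypothetical name, None stands in for Undefined, each call to rising_edge represents one clock tick, and load_0 is assumed to control the low byte (bits 0 to 7).

```python
class Register16:
    """16-bit register with one load input per byte."""

    def __init__(self):
        self.low = None    # bits 0-7, Undefined until first written
        self.high = None   # bits 8-15, Undefined until first written

    def rising_edge(self, d, load_0, load_1):
        """Model one clock tick with the given D value and load bits."""
        if load_0:
            self.low = d & 0xFF
        if load_1:
            self.high = (d >> 8) & 0xFF

    @property
    def y(self):
        # Like the wire merger in the circuit: if any part is Undefined,
        # the whole output is Undefined.
        if self.low is None or self.high is None:
            return None
        return (self.high << 8) | self.low


reg = Register16()
reg.rising_edge(0x1234, load_0=1, load_1=0)   # only the low byte is written
print(reg.y)                                  # still Undefined (None)
reg.rising_edge(0xABCD, load_0=0, load_1=1)   # now the high byte as well
print(hex(reg.y))
```

After the second tick the register holds the high byte of the second value and the low byte of the first one.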

The Register File

With the 16-bit Register component ready, we can now build a small register file. To keep things as simple as possible, we’ll use only 8 registers and (as mentioned in the intro) we’ll add a way to mask the outputs in order to be able to cascade multiple instances of this circuit to build larger files. Figure 5 shows the complete circuit. It might be a bit difficult to read, so I’ll break it up into parts, with zoomed in screenshots.

Figure 5: 8x16-bit Register File circuit

Figure 6 shows the write part of the circuit. clk is the clock and it’s routed to all the clk inputs of the 8 registers. dD is the 16-bit value we want to write to the rD register and it is again connected to the D inputs of all the registers. The 3-bit rD input is decoded using gated 3-to-8 decoders, one for each byte, based on the lb input. A gated decoder (figure 7) works the same way as the decoder we saw in the ALU post, with the only difference being that when its en input is LOW, all outputs are LOW.

Figure 6: The *write* part of the circuit
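As a rough sketch of the decoding logic, the two gated decoders can be written like this in Python. The function names are hypothetical and the outputs are plain lists of 0/1 values.

```python
def gated_decoder_3to8(sel, en):
    """3-to-8 decoder with enable: one output HIGH, or all LOW when en is LOW."""
    return [1 if en and i == sel else 0 for i in range(8)]


def write_enables(rD, lb_0, lb_1):
    """Return a (load_0, load_1) pair for each of the 8 registers.

    One decoder is gated by lb_0 and the other by lb_1, so each byte of
    the selected register gets its own load signal.
    """
    low = gated_decoder_3to8(rD, lb_0)
    high = gated_decoder_3to8(rD, lb_1)
    return list(zip(low, high))


# Write only the low byte of register 2:
for index, loads in enumerate(write_enables(rD=2, lb_0=1, lb_1=0)):
    print(index, loads)
```

Only register 2 gets a HIGH load bit, and only for its first byte. All the other registers see (0, 0) and keep their value on the next clock tick.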

Figure 8 shows the read part of the circuit. All register outputs are routed to two 16-bit 8-input multiplexers (figure 9). rA and rB are used as the sel inputs of the two MUXes. The MUX outputs are connected to 16-bit tristate buffers, with the control pins connected to the oeA and oeB inputs.

Figure 8: The *read* part of the circuit
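The read path is purely combinational, so it can be modelled as a plain function. In this sketch (names are my own, and None is used as a stand-in for a floating output) the list index plays the role of the 8-input MUX and the oe check plays the role of the tristate buffer.

```python
FLOATING = None  # hypothetical stand-in for a floating (high-impedance) output


def read_port(registers, sel, oe):
    """One read port: an 8-input MUX followed by a tristate buffer."""
    value = registers[sel]             # the MUX picks one register output
    return value if oe else FLOATING   # the buffer drives the output or lets it float


def read_ports(registers, rA, rB, oeA, oeB):
    """Both read ports; they share the registers but nothing else."""
    return read_port(registers, rA, oeA), read_port(registers, rB, oeB)


regs = [i * 0x1111 for i in range(8)]   # example contents: 0x0000, 0x1111, ...
dA, dB = read_ports(regs, rA=3, rB=5, oeA=1, oeB=0)
print(dA, dB)   # dA carries register 3, dB floats
```

There is no clock anywhere in this function, which matches the asynchronous reads described at the beginning of the post.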

Note that in this case, since the MUX is after the flip flops, we can use tristate buffers. As long as the output of the register file isn’t connected to another register, there shouldn’t be a problem. If there is, we can always come back and replace the MUX with an AND/OR version.

Larger register files

The component presented above (figure 5) can be used to build both wider and deeper register files. Unfortunately, DLS doesn’t currently support bit widths larger than 16 bits per wire/pin, so building (e.g.) a 32-bit register file will end up in a mess of wires :) It’s possible, but you’ll need double the IO pins and wires.

Instead of building a wider file we’ll build a larger/deeper one, with 16 registers, using two instances of the component. In this case rA, rB and rD should be expanded to 4 bits, with their MSB used to select the correct file, by turning on or off the corresponding load and oe inputs. The dA and dB outputs of the two instances are connected on one bus each and then routed to the final outputs. Figure 10 shows the final 16x16-bit register file circuit.

Figure 10: 16x16-bit Register File circuit
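The cascading idea can be sketched in Python as well. This is a simplified model, not a description of the actual DLS circuit: the class and function names are hypothetical, registers start at 0 instead of Undefined, each write call stands for one rising clock edge, and None represents a floating output.

```python
class RegisterFile8:
    """Simplified 8x16-bit register file (one read port shown)."""

    def __init__(self):
        self.regs = [0] * 8   # assumption: known initial values

    def write(self, rD, dD, lb_0, lb_1):
        value = self.regs[rD]
        if lb_0:
            value = (value & 0xFF00) | (dD & 0x00FF)
        if lb_1:
            value = (value & 0x00FF) | (dD & 0xFF00)
        self.regs[rD] = value

    def read(self, r, oe):
        return self.regs[r] if oe else None   # None = floating


def bus(a, b):
    """Common bus: at most one driver may be active at a time."""
    drivers = [v for v in (a, b) if v is not None]
    if len(drivers) > 1:
        raise ValueError("bus contention: both outputs are enabled")
    return drivers[0] if drivers else None


class RegisterFile16:
    """16 registers built from two 8-register files."""

    def __init__(self):
        self.banks = [RegisterFile8(), RegisterFile8()]

    def write(self, rD, dD, lb_0=1, lb_1=1):
        msb, index = rD >> 3, rD & 0b111
        for i, bank in enumerate(self.banks):
            enable = 1 if i == msb else 0   # the MSB gates the load inputs
            bank.write(index, dD, lb_0 & enable, lb_1 & enable)

    def read(self, r):
        msb, index = r >> 3, r & 0b111
        # The MSB turns on exactly one of the two oe inputs.
        return bus(self.banks[0].read(index, oe=1 - msb),
                   self.banks[1].read(index, oe=msb))


rf = RegisterFile16()
rf.write(3, 0x1111)
rf.write(11, 0x2222)
print(hex(rf.read(3)), hex(rf.read(11)))
```

Registers 3 and 11 share the same lower 3 bits, so they live at the same index in different instances. Only the MSB decides which instance gets loaded and which one drives the bus.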

Note that in this circuit there are no output enable inputs, since I assumed this component won’t be used to build even larger components. If it is, both oe inputs should be exposed to correctly handle cascading.

That’s all for now. Thanks for reading. As always, comments and corrections are welcome.