16-bit adders

Last time I presented 2 different implementations of a 4-bit adder. The Ripple Carry adder (RCA from now on) was a bit slower in terms of gate delays compared to the Carry Lookahead adder (CLA). Today I’ll try to reuse those circuits as components in order to build 2 different versions of a 16-bit adder.

Before turning the circuits into components, and in order to keep the overhead of splitting and merging fat wires using Wire Splitters and Mergers, I replaced the 4-bit single-pin inputs A and B with 4-bit 4-pin inputs. The difference is that the generated component will have 8 individual 1-bit inputs (excluding Cin) for A and B. The same is done to the S output.

Figure 1 shows the simplified 4-bit RCA circuit and figure 2 shows the simplified 4-bit CLA circuit. Also note that the Cout output of the 4-bit CLA has been removed because it won’t be used in the 16-bit circuit.

Figure 1: Simplified 4-bit RCA
Figure 2: Simplified 4-bit CLA

16-bit Ripple Carry Adder

Figure 3 shows the 16-bit RCA circuit. It includes 4 instances of the 4-bit RCA component, connected in series (if it’s not obvious from the image, the order is top left -> top right -> bottom left -> bottom right; just follow the carries). If we take into account that each 4-bit RCA includes 4 1-bit Full Adders, this circuit requires 32 XOR, 32 AND and 16 OR gates. The worst case delay for this circuit is 34T (calculated using the testbench method described in the previous article), including the 2T overhead of the 16-bit merger and splitter.

Figure 3: 16-bit RCA

16-bit Carry Lookahead Adder

The 16-bit CLA is shown in figure 4. It uses 4 instances of the 4-bit CLA component from figure 2 and 1 instance of the same Carry Lookahead Unit (CLU) used in the 4-bit CLA circuit. The CLU takes the 4 (P,G) pairs generated by the 4-bit CLAs, as well as the Cin input, and calculates PG and GG. From those two the final carry output is calculated, using the same subcircuit as in the 4-bit CLAs (Cout = GG + PG*Cin).

The total gates required for this circuit is: 32XOR + 85AND2 + 67AND4 + 51AND3 + 34OR4 + 17OR3 + 18OR2. The worst case delay is 12T, including the 2T overhead of the mergers/splitters.

Figure 4: 16-bit CLA

Comparison

From the above it’s clear that the CLA is a lot faster than the RCA, but it requires many more gates. Looking at the maximum delays when comparing the 2 circuits might not tell the whole truth though. Figure 5 shows a histogram from 10k random changes to the inputs of both circuits. The X axis is the measured circuit delay and the Y axis is the number of occurrences. The worst case delay for the RCA might be 32T but the average is a lot less. The most frequent delays is around 11T for both circuits.

Figure 5: Comparison between the 2 circuits

That’s all for now. Thanks for reading. Comments and corrections are welcome.