Design of an 8x8 Modified Booth Multiplier
Introduction to VLSI Design, EE 103, Tufts University
Robbie D'Angelo & Scott Smith
In this project an 8x8 multiplier was designed and simulated at the gate level and at the transistor level using the Cadence AMS simulator. We optimized the multiplier for speed by implementing fundamental building blocks directly in CMOS with the IBM CMRF7SF 0.18um process. Booth's multiplication algorithm was used to reduce the number of partial products, and thus the number of adders, providing a speed advantage. Furthermore, the adder circuit, which is the primary source of delay, was constructed with two layers of carry lookahead (CLA) logic to decrease propagation delay. A sign extension trick was used to further decrease the number of logic gates between the input and output. By using transistor level implementations for the CLA logic and the full adder, our design also reduces the total area required compared to gate level designs. Layout was constructed for each block and the full architecture. The worst case delay with a 100fF load capacitance was approximately 2.98ns, about 46% shorter than the 5.50ns worst case delay of the reference gate level design. The final design occupies a total area of less than 200um x 200um. Total power consumption at an input signal rate of 200MHz was 2.5mW.
Gate Level Reference Design
A gate level implementation was designed and simulated as a reference design. A 100ps inverter delay was assumed in the Verilog functional description of the logic primitives, and the delays of the remaining primitives were approximated using logical effort. A detailed description of the original gate level design, along with explanations of the Booth algorithm, the architecture, the logic equations used for this design, and other background information, can be found in the following reports: Lab 3 Part II: Gate Level Booth Multiplier and Lab 3 Parts I&III: Logic Primitives.
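The delay estimation step can be illustrated with a minimal sketch. It is not the report's actual delay table: it assumes the textbook logical effort model d = tau * (g * h + p), textbook g and p values for static CMOS gates, and that the 100ps reference corresponds to an inverter driving an identical inverter (h = 1). The function names and the default fan-out are hypothetical.

```python
# Sketch: scale primitive delays from a 100 ps reference inverter using
# logical effort. ASSUMPTIONS (not from the report): the textbook model
# d = tau * (g * h + p), textbook g/p values, and a reference inverter
# that drives an identical inverter (electrical effort h = 1).

INVERTER_DELAY_PS = 100.0  # reference inverter delay stated in the report

# (logical effort g, parasitic delay p) for static CMOS gates
GATES = {
    "INV":   (1.0, 1.0),
    "NAND2": (4.0 / 3.0, 2.0),
    "NAND3": (5.0 / 3.0, 3.0),
    "NAND4": (6.0 / 3.0, 4.0),
}

def tau_ps():
    """Derive the unit delay tau from the reference inverter (h = 1)."""
    g, p = GATES["INV"]
    return INVERTER_DELAY_PS / (g * 1.0 + p)

def gate_delay_ps(name, h=1.0):
    """Estimated delay of a gate with electrical effort h (fan-out ratio)."""
    g, p = GATES[name]
    return tau_ps() * (g * h + p)

if __name__ == "__main__":
    for gate in GATES:
        print(f"{gate}: {gate_delay_ps(gate):.1f} ps")
```

Under these assumptions every primitive delay is a fixed multiple of the inverter delay, so changing the reference value rescales the whole table.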
Gate Level Booth Multiplier
The Booth multiplier functional simulation shows the simulation results of the gate level Booth multiplier. From the test cases performed, this circuit has a worst case delay of about 5.50ns. The test cases with delays are listed in the design summary table in Design Summary.
Gate Level 12-bit Adder
The 12-bit adder functional simulation shows the simulation results of the gate level 12-bit carry lookahead adder. From the test cases performed, this circuit has a worst case delay of about 3.13ns. The test cases with delays are listed in the design summary table in Design Summary.
8x8 Signed Booth Multiplier
The images below summarize the design and implementation of the signed Booth multiplier. All simulations were performed with Cadence AMS. Automated testing was used to verify functionality.
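As a behavioral illustration of why Booth recoding reduces the number of partial products, the sketch below recodes a signed 8-bit multiplier into four radix-4 digits and sums the four resulting partial products. It models the arithmetic only, not the encoder and decoder gates of this design, and the function names are hypothetical.

```python
# Behavioral sketch of radix-4 (modified) Booth multiplication for signed
# 8-bit operands. Function names are hypothetical; this models the
# arithmetic only, not the gate-level encoder/decoder.

def booth_digits(y, bits=8):
    """Recode a signed multiplier into bits/2 digits in {-2,-1,0,1,2}."""
    y &= (1 << bits) - 1            # two's complement bit pattern
    padded = y << 1                 # append the implicit bit y[-1] = 0
    digits = []
    for i in range(0, bits, 2):
        triplet = (padded >> i) & 0b111   # bits y[i+1], y[i], y[i-1]
        b_hi = (triplet >> 2) & 1
        b_mid = (triplet >> 1) & 1
        b_lo = triplet & 1
        digits.append(-2 * b_hi + b_mid + b_lo)
    return digits

def booth_multiply(x, y, bits=8):
    """Sum the bits/2 partial products x * digit * 4**k."""
    return sum(d * x * (4 ** k) for k, d in enumerate(booth_digits(y, bits)))

if __name__ == "__main__":
    print(booth_digits(-3))         # four digits instead of eight bits
    print(booth_multiply(-7, 13))
```

With an 8-bit multiplier the recoding yields four digits, so only four partial products need to be summed instead of eight.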
12-bit Carry Lookahead Adder
The images below summarize the design and implementation of the 12-bit CLA adder, which is used to sum the partial products produced by the Booth encoder/decoder. All simulations were performed with Cadence AMS. Automated testing was used to verify functionality. Techniques used to improve speed over the gate level implementation include a transmission gate full adder, two layers of carry lookahead logic, and a CMOS implementation of the CLA logic. With these techniques, the worst case delay of the test cases was reduced from 3.13ns to 1.18ns. The following sections summarize the hierarchy of the 12-bit CLA adder.
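The two-layer lookahead structure can be sketched behaviorally as three 4-bit groups whose carry-ins are computed by a second lookahead level from the group propagate and generate signals. This is a minimal model under assumed grouping and hypothetical names, not the netlist of this design.

```python
# Behavioral sketch of a 12-bit adder built from three 4-bit groups plus a
# second lookahead level across the groups. Names and grouping are
# illustrative assumptions.

def group_pg(a, b):
    """Group propagate/generate of a 4-bit slice (independent of carry-in)."""
    big_p, big_g = 1, 0
    for i in range(4):
        p = ((a >> i) ^ (b >> i)) & 1
        g = ((a >> i) & (b >> i)) & 1
        big_g = g | (p & big_g)     # carry out of the slice with carry-in 0
        big_p &= p                  # 1 only if every bit propagates
    return big_p, big_g

def sum4(a, b, cin):
    """4-bit sum for a given carry-in. The carry recurrence is written as a
    loop here; lookahead hardware expands it into flat logic."""
    s, c = 0, cin
    for i in range(4):
        p = ((a >> i) ^ (b >> i)) & 1
        g = ((a >> i) & (b >> i)) & 1
        s |= (p ^ c) << i
        c = g | (p & c)
    return s

def cla12(a, b, cin=0):
    """Return (12-bit sum, carry out)."""
    groups = [((a >> (4 * k)) & 0xF, (b >> (4 * k)) & 0xF) for k in range(3)]
    (p0, g0), (p1, g1), (p2, g2) = [group_pg(x, y) for x, y in groups]
    # Second level: group carry-ins depend only on group P/G and cin.
    c4 = g0 | (p0 & cin)
    c8 = g1 | (p1 & g0) | (p1 & p0 & cin)
    c12 = g2 | (p2 & g1) | (p2 & p1 & g0) | (p2 & p1 & p0 & cin)
    total = 0
    for k, ((x, y), c) in enumerate(zip(groups, (cin, c4, c8))):
        total |= sum4(x, y, c) << (4 * k)
    return total, c12
```

Because c4, c8 and c12 are formed directly from the group signals, no group has to wait for the carry to ripple through the group below it.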
Transmission Gate Full Adder
This circuit was implemented to slightly improve speed and to save area by avoiding the use of the large XOR2 gate. Comparative simulations between gate level and pass gate implementations are shown in the links below.
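The idea of replacing XOR gates with selectors can be sketched as follows. The exact transistor topology of this design is not described in the text, so the structure below is an assumption: a common multiplexer-style arrangement in which each 2:1 selector stands in for a pair of transmission gates. All names are hypothetical.

```python
# Behavioral sketch of a multiplexer-style full adder. ASSUMPTION: each
# mux() models a pair of transmission gates; the real topology may differ.

def mux(sel, when0, when1):
    """2:1 selector: returns when1 if sel is 1, otherwise when0."""
    return when1 if sel else when0

def tg_full_adder(a, b, cin):
    p = mux(a, b, 1 - b)            # a XOR b: pass b or its complement
    s = mux(p, cin, 1 - cin)        # p XOR cin
    cout = mux(p, a, cin)           # p = 0 means a == b, so the carry is a
    return s, cout

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            for cin in (0, 1):
                print(a, b, cin, tg_full_adder(a, b, cin))
```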
Carry Lookahead Logic
This circuit consists of four circuits, each of which computes one of the carries in a four-bit full adder chain. The CLA_PG block computes the propagate and generate signals, which are used for the second layer of CLA on the 4-bit CLA cascade. The layout of the CLA_PG block is identical to that of the CLA_C3 block, but with different pin names. These designs are done in CMOS as opposed to a gate level implementation. This reduces the total area and the worst case delay by nearly 75%, although the average cases are roughly the same speed. These results are summarized in the Propagation Delay Summary Table.
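For reference, the standard flattened lookahead equations for a 4-bit group are sketched below. The indexing and the function name are assumptions, and the mapping of these equations onto the named blocks of this design is not confirmed by the text.

```python
# Standard 4-bit carry lookahead equations written as flat AND/OR terms.
# The indexing (c0 = carry-in, bit 0 = LSB) and the name are assumptions.

def cla4_carries(p, g, c0):
    """p, g: four bit-level propagate/generate values, LSB first."""
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c0))
    # Group signals for the second lookahead level.
    group_g = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
               | (p[3] & p[2] & p[1] & g[0]))
    group_p = p[3] & p[2] & p[1] & p[0]
    return (c1, c2, c3), (group_p, group_g)
```

Note that group_g has the same four-term form as c3 with the inputs renamed, which is consistent with the remark that the CLA_PG layout matches the CLA_C3 layout apart from pin names.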
4-bit Carry Lookahead Adder
12-bit CLA Simulation Results
This section contains the simulation results for the 12-bit CLA adder. Test cases are enumerated in the delay summary table. Automated testing was used to verify functionality. 100fF capacitors were attached at each output bit for the analog simulations.
Booth Encoder and Decoder
The functionality and purpose of the Booth encoder is described in Lab 3 Part II, cited above.
The half adder is necessary for the Booth decoder to properly implement two's complement negation.
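Two's complement negation is a bitwise inversion followed by adding 1, and the increment can be built from half adders. The sketch below only illustrates that arithmetic. How the half adder is wired into the decoder of this design is not described here, so the chain structure and the names are assumptions.

```python
# Sketch: two's complement negation as "invert, then add 1", with the
# increment built from a chain of half adders. The chain is an assumption
# made for illustration only.

def half_adder(a, b):
    """Return (sum, carry) of two single bits."""
    return a ^ b, a & b

def negate(x, bits=8):
    """Negate a bits-wide two's complement value, returned as a bit pattern."""
    inverted = ~x & ((1 << bits) - 1)
    result, carry = 0, 1            # the "+1" enters as the initial carry
    for i in range(bits):
        s, carry = half_adder((inverted >> i) & 1, carry)
        result |= s << i
    return result

if __name__ == "__main__":
    print(format(negate(0b00000101), "08b"))
```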
The functionality and purpose of the Booth decoder is described in Lab 3 Part II, cited above.
Logic Primitives
The following logic primitives were used in the design of the Booth multiplier.
NAND2, NAND3, NAND4
A signed 8x8 Booth multiplier was successfully designed and simulated. Layout was completed for all blocks except the full multiplier, the 12-bit adder and the 4-bit adder. The completed blocks passed DRC, LVS and post layout simulations. To characterize the improvements made, the test cases for the reference and final schematics of the 12-bit adder and Booth multiplier are shown with their respective propagation delays in the Summary of Test Cases table. Because of the CLA logic, the speed improvements are most pronounced when there is a high degree of carry propagation through the circuit.
To demonstrate the main improvements in more detail, the 12-bit adder was simulated in four ways: in Verilog with a reference inverter delay of 100ps (remaining primitive delays determined by logical effort), in SPICE with gate level implementations, in SPICE with the transmission gate adder and CMOS CLA logic, and in SPICE with a second layer of CLA. The table shows that the delay is significantly improved in some cases but remains the same or is slightly worse in others. The use of CMOS logic for the CLA required 4um wide transistors in series, which would decrease the speed greatly in some cases. However, this design reduces the area significantly compared to a gate level design. In some cases the propagation delay is reduced by up to 60%, whereas in the cases where it increases, the increase is only about 20%. This demonstrates the inherent trade-off between area and speed that is always present in VLSI design.
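The headline worst case figures quoted in this report can be turned into relative reductions with a short calculation. The helper name is hypothetical; the delay values are the ones stated in the text.

```python
# Relative worst-case delay reduction from the figures quoted in the report.

def percent_reduction(reference_ns, improved_ns):
    """Delay reduction as a percentage of the reference delay."""
    return 100.0 * (reference_ns - improved_ns) / reference_ns

multiplier = percent_reduction(5.50, 2.98)  # gate level vs. final multiplier
adder = percent_reduction(3.13, 1.18)       # gate level vs. final 12-bit adder

if __name__ == "__main__":
    print(f"multiplier: {multiplier:.1f}%  adder: {adder:.1f}%")
```

This gives a reduction of roughly 46% for the multiplier and roughly 62% for the 12-bit adder.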
Summary of Test Cases
Table of Completion