Design of an 8x8 Modified Booth Multiplier

Introduction to VLSI Design, EE 103
Tufts University

Robbie D'Angelo & Scott Smith

Fall 2011

Abstract

In this project, an 8x8 multiplier was designed and simulated at the gate level and at the transistor level using the Cadence AMS simulator. We optimized the multiplier for speed by implementing fundamental building blocks directly in CMOS with the IBM CMRF7SF 0.18um process. Booth's multiplication algorithm was used to reduce the number of partial products, and thus the number of adders, providing a speed advantage. Furthermore, the adder circuit, which is the primary source of delay, was constructed with two layers of carry lookahead (CLA) logic to decrease propagation delay. A sign extension trick was used to further decrease the number of logic gates between the input and output. By using transistor level implementations for the CLA logic and the full adder, our design also reduces the total area required compared to gate level designs. Layout was constructed for each block and the full architecture. The worst case delay with a 100fF load capacitance was approximately 2.98ns, which is about 46% lower than the 5.50ns delay of the reference gate level design. The final design occupies a total area of less than 200um x 200um. Total power consumption at an input signal rate of 200MHz was 2.5mW.

Gate Level Reference Design

A gate level implementation was designed and simulated as a reference design. A 100ps inverter delay was assumed in the Verilog functional description of the logic primitives, and the delays of the remaining primitives were approximated using logical effort. A detailed description of the original gate level design, along with explanations of the Booth algorithm, the architecture, the logic equations used for this design, and other background information, can be found in the following reports: Lab 3 Part II: Gate Level Booth Multiplier and Lab 3 Parts I&III: Logic Primitives.
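To illustrate how a single inverter delay can be scaled to other primitives with logical effort, here is a minimal Python sketch. It is not taken from the lab reports: the textbook values g = (n + 2) / 3 and p = n for an n-input NAND, the electrical effort h = 1, and the reading of 100ps as the inverter delay at that same effort are all assumptions, and the function names are hypothetical.

```python
# Logical-effort delay estimate, normalized to a 100 ps inverter.
# Assumptions (not from the report): g = (n + 2) / 3 and p = n for an
# n-input NAND, electrical effort h = 1, and the 100 ps figure taken as
# the inverter delay at that same electrical effort.

INVERTER_DELAY_PS = 100.0  # inverter delay stated in the report


def stage_delay_units(g: float, h: float, p: float) -> float:
    """Delay of one stage in units of tau: d = g*h + p."""
    return g * h + p


def nand_delay_ps(n_inputs: int, h: float = 1.0) -> float:
    """Scale the inverter delay by the ratio of normalized stage delays."""
    inv = stage_delay_units(g=1.0, h=h, p=1.0)
    nand = stage_delay_units(g=(n_inputs + 2) / 3, h=h, p=float(n_inputs))
    return INVERTER_DELAY_PS * nand / inv


for n in (2, 3, 4):
    print(f"NAND{n}: {nand_delay_ps(n):.0f} ps")
```

Under these assumptions the estimates come out to roughly 167ps, 233ps and 300ps for NAND2, NAND3 and NAND4. The delays actually used in the reference design may differ.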

Gate Level Booth Multiplier

The Booth multiplier functional simulation shows the simulation results of the gate level Booth multiplier. From the test cases performed, this circuit has a worst case delay of about 5.50ns. The test cases with delays are listed in the design summary table in Design Summary.

Gate Level 12-bit Adder

The 12-bit adder functional simulation shows the simulation results of the gate level 12-bit carry lookahead adder. From the test cases performed, this circuit has a worst case delay of about 3.13ns. The test cases with delays are listed in the design summary table in Design Summary.

8x8 Signed Booth Multiplier

The images below summarize the design and implementation of the signed Booth multiplier. All simulations were performed with Cadence AMS. Automated testing was used to verify functionality.

  • Architecture: Block diagram of the proposed architecture.

  • Schematic: Final Schematic of the 8x8 Signed Booth multiplier.

  • Symbol: Symbol view of the multiplier.

  • Layout: Final layout of the multiplier.

  • Simulation Waveforms: Transient analysis of Booth multiplier using test cases in Summary of Test Cases table.

  • Test Bench Schematic: Schematic used for simulating the multiplier.

  • Verilog Test Bench Source Code: Verilog code with automated testing for the Booth multiplier.

  • 12-bit Carry Lookahead Adder

    The images below summarize the design and implementation of the 12-bit CLA adder, which is used to sum the partial products produced by the Booth encoder/decoder. All simulations were performed with Cadence AMS. Automated testing was used to verify functionality. Techniques used to improve speed over the gate level implementation include a transmission gate full adder, two layers of carry lookahead logic, and a CMOS implementation of the CLA logic. With these techniques, the worst case delay of the test cases was reduced from 3.13ns to 1.18ns. The following sections summarize the hierarchy of the 12-bit CLA adder.

    Transmission Gate Full Adder

    This circuit was implemented to slightly improve speed and to save area by avoiding the use of the large XOR2 gate. Comparative simulations between gate level and pass gate implementations are shown in the links below.
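The idea of replacing XOR2 gates with pass gates can be sketched behaviorally: a pair of transmission gates acts as a 2:1 multiplexer, and a full adder can be written entirely in terms of such selections. The Python model below is one common arrangement and is an assumption, not the schematic from this project; `mux` and `tg_full_adder` are hypothetical names.

```python
from itertools import product


def mux(sel: int, when0: int, when1: int) -> int:
    """2:1 multiplexer, the function a pair of transmission gates realizes."""
    return when1 if sel else when0


def tg_full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (sum, cout) using only multiplexer-style selections."""
    p = mux(a, b, 1 - b)        # p = a XOR b, built as a mux instead of an XOR2 gate
    s = mux(p, cin, 1 - cin)    # sum = p XOR cin
    cout = mux(p, a, cin)       # a != b: carry-in passes through; a == b: carry equals a
    return s, cout


# Check all eight input combinations against ordinary addition.
for a, b, cin in product((0, 1), repeat=3):
    s, cout = tg_full_adder(a, b, cin)
    assert 2 * cout + s == a + b + cin
```

The model captures logic only; the speed and area benefits reported here come from the transistor-level implementation.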

  • Schematic: Schematic view of a full adder implemented with pass gates for faster switching time and smaller area than the gate level version.

  • Symbol: Symbol view of a full adder.

  • Layout: Layout of the full adder.

  • DRC: Successful Assura DRC log.

  • LVS: Successful Assura LVS log.

  • Full Adder Comparative Simulation (Zoomed in): Zoomed in comparative simulation of transmission gate and gate level full adder showing slight speed improvement of 100ps.

  • Full Adder Comparative Simulation: Simulation of full adder. Note that the transmission gate version has more reliable signals due to the smaller spikes during transitions.

  • Carry Lookahead Logic

    This block is made up of four circuits, each computing one of the carry signals in a four-bit full adder chain. The CLA_PG block computes the propagate and generate signals, which are used for the second layer of CLA on the 4-bit CLA cascade. The layout of the CLA_PG block is identical to that of the CLA_C3 block, but with different pin names. These designs are implemented directly in CMOS rather than at the gate level. This reduces the total area and the worst case delay by nearly 75%, although the average cases are roughly the same speed. These results are summarized in the Propagation Delay Summary Table.
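For reference, the standard lookahead equations behind such a block can be written out as a short Python model. This is a sketch of the textbook formulation, assuming bitwise generate g = a AND b and propagate p = a XOR b; the exact equations used in this design are in the lab reports, and `cla4` is a hypothetical name.

```python
from itertools import product


def cla4(a: list[int], b: list[int], c0: int):
    """4-bit lookahead with bits given LSB first. Returns (c1, c2, c3, G, P)."""
    g = [x & y for x, y in zip(a, b)]   # bit generate
    p = [x ^ y for x, y in zip(a, b)]   # bit propagate
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    # Group signals for a second lookahead layer.
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    P = p[3] & p[2] & p[1] & p[0]
    return c1, c2, c3, G, P


def ripple_carries(a: list[int], b: list[int], c0: int) -> list[int]:
    """Reference carries c1..c4 computed bit by bit."""
    carries, c = [], c0
    for x, y in zip(a, b):
        c = (x & y) | ((x ^ y) & c)
        carries.append(c)
    return carries


# Exhaustive comparison over all 4-bit operands and both carry-in values.
for bits in product((0, 1), repeat=9):
    a, b, c0 = list(bits[0:4]), list(bits[4:8]), bits[8]
    c1, c2, c3, G, P = cla4(a, b, c0)
    assert [c1, c2, c3, G | (P & c0)] == ripple_carries(a, b, c0)
```

In this formulation the group generate G has the same four-term sum-of-products shape as c3, which is consistent with the note that CLA_PG and CLA_C3 share a layout.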

  • C1 Schematic: CMOS schematic of the logic for the first carry bit.

  • C2 Schematic: CMOS schematic of the logic for the second carry bit.

  • C3 Schematic: CMOS schematic of the logic for the third carry bit.

  • PG Schematic: CMOS schematic of the logic for the propagate and generate bits for a second layer of CLA.

  • Symbol: This link shows a symbol view of a full adder.

  • C1 Layout: Layout of first bit CLA.

  • C2 Layout: Layout of second bit CLA.

  • C3 Layout: Layout of third bit CLA (the CLA propagate and generate block is not shown because it is identical to C3).

  • Overall CLA Layout: Layout of total CLA (2nd layer of CLA not shown because it is identical).

  • DRC: Successful Assura DRC log.

  • LVS: Successful Assura LVS log.

  • 4-bit Carry Lookahead Adder

  • Schematic: Schematic of 4-bit CLA.

  • Symbol: Symbol of 4-bit CLA.

  • Layout: Layout of 4-bit CLA.

  • 12-bit CLA Simulation Results

    This section contains the simulation results for the 12-bit CLA adder. Test cases are enumerated in the delay summary table. Automated testing was used to verify functionality. 100fF capacitors were attached at each output bit for the analog simulations.
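The kind of automated check described above can be sketched in Python as a behavioral 12-bit adder with a second lookahead layer, compared against ordinary integer addition on random vectors. This is not the project's Verilog test bench; the structure (three 4-bit groups), the random test vectors, and all names are assumptions made for illustration.

```python
import random

WIDTH, GROUP = 12, 4


def group_pg(a: int, b: int) -> tuple[int, int]:
    """Group generate and propagate for one 4-bit slice."""
    G, P = 0, 1
    for i in range(GROUP):
        g = (a >> i) & (b >> i) & 1
        p = ((a >> i) ^ (b >> i)) & 1
        G = g | (p & G)   # generate of bits 0..i
        P &= p            # all bits so far propagate
    return G, P


def add12(a: int, b: int, cin: int = 0) -> tuple[int, int]:
    """Return (12-bit sum, carry out) using lookahead across three groups."""
    mask = (1 << GROUP) - 1
    slices = [((a >> s) & mask, (b >> s) & mask) for s in range(0, WIDTH, GROUP)]
    (G0, P0), (G1, P1), (G2, P2) = [group_pg(x, y) for x, y in slices]
    # Second layer: group carries depend only on group G/P signals and cin.
    c4 = G0 | (P0 & cin)
    c8 = G1 | (P1 & G0) | (P1 & P0 & cin)
    c12 = G2 | (P2 & G1) | (P2 & P1 & G0) | (P2 & P1 & P0 & cin)
    total = 0
    for k, ((x, y), c) in enumerate(zip(slices, (cin, c4, c8))):
        # Integer addition stands in for the 4-bit slice's sum logic.
        total |= ((x + y + c) & mask) << (k * GROUP)
    return total, c12


def self_test(trials: int = 2000, seed: int = 0) -> bool:
    """Compare add12 against integer addition on random vectors."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b, cin = rng.randrange(1 << WIDTH), rng.randrange(1 << WIDTH), rng.randrange(2)
        s, cout = add12(a, b, cin)
        if ((cout << WIDTH) | s) != a + b + cin:
            return False
    return True
```

Vectors such as 0xFFF + 1 exercise the longest carry chain, which is the situation where lookahead is expected to help most.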

  • Schematic: Schematic of 12-bit CLA.

  • Symbol: Symbol of 12-bit CLA.

  • Layout: Layout of 12-bit CLA.

  • Simulation Waveforms: Transient analysis of 12-bit CLA using test cases in Summary of Test Cases table.

  • Test Bench Schematic: Schematic used for simulating the 12-bit adder.

  • Verilog Test Bench Source Code: Verilog code with automated testing for the 12-bit adder.

  • Booth Encoder and Decoder

    Booth Encoder

    The functionality and purpose of the Booth encoder is described in Lab 3 Part II, cited above.
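As a brief behavioral illustration of the encoding step (the details are deferred to Lab 3 Part II), the Python sketch below shows radix-4 modified Booth recoding of an 8-bit signed multiplier into four digits, each in {-2, -1, 0, 1, 2}. It models arithmetic only, not the encoder cells of this design; `booth_digits` and `booth_multiply` are hypothetical names.

```python
def booth_digits(multiplier: int, bits: int = 8) -> list[int]:
    """Radix-4 Booth digits (least significant first) of a signed multiplier."""
    y = multiplier & ((1 << bits) - 1)   # two's complement bit pattern
    y <<= 1                              # append the implicit bit y[-1] = 0
    digits = []
    for i in range(0, bits, 2):
        # Overlapping 3-bit window: y[2k-1], y[2k], y[2k+1]
        b0, b1, b2 = (y >> i) & 1, (y >> (i + 1)) & 1, (y >> (i + 2)) & 1
        digits.append(b0 + b1 - 2 * b2)
    return digits


def booth_multiply(multiplicand: int, multiplier: int, bits: int = 8) -> int:
    """Sum the partial products digit * multiplicand * 4**k."""
    return sum(d * multiplicand * 4 ** k
               for k, d in enumerate(booth_digits(multiplier, bits)))


print(booth_digits(127))          # [-1, 0, 0, 2], i.e. -1 + 2 * 64
print(booth_multiply(-128, 127))  # -16256
```

An 8-bit multiplier yields only four partial products, each a multiple of the multiplicand by 0, ±1 or ±2, which is the reduction in adder count mentioned in the abstract.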

  • Single Booth Encoder Schematic

  • Single Booth Encoder Symbol

  • Single Booth Encoder Layout

  • Simulation Waveforms

  • Test Bench Schematic: Schematic used for simulating the encoder.

  • Verilog Test Bench Source Code: Verilog code with automated testing for the encoder.

  • Four Cell Booth Encoder Schematic

  • Four Cell Booth Encoder Symbol

  • Four Cell Booth Encoder Layout

  • DRC: Successful Assura DRC log.

  • LVS: Successful Assura LVS log.

  • Half Adder

    The half adder is necessary for the Booth decoder to properly implement two's complement negation.
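Two's complement negation inverts every bit and then adds 1, and the increment can be built from a chain of half adders. The Python sketch below shows that idea under the assumption of a simple ripple chain; how the half adder is actually wired into the Booth decoder is not described here, and the function names are hypothetical.

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (sum, carry) of two bits."""
    return a ^ b, a & b


def negate(value: int, bits: int = 8) -> int:
    """Two's complement negation: invert each bit, then add 1 via half adders."""
    carry, result = 1, 0                       # the initial carry supplies the +1
    for i in range(bits):
        inverted = 1 - ((value >> i) & 1)      # bitwise inversion
        s, carry = half_adder(inverted, carry)
        result |= s << i
    return result


print(f"{negate(0b00000101):08b}")  # 11111011, the 8-bit pattern for -5
```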

  • Schematic

  • Symbol

  • Layout

  • Booth Decoder

    The functionality and purpose of the Booth decoder is described in Lab 3 Part II, cited above.

  • Single Booth Decoder Schematic

  • Single Booth Decoder Symbol

  • Single Booth Decoder Layout

  • 8-bit Booth Decoder Schematic

  • 8-bit Booth Decoder Symbol

  • 8-bit Booth Decoder Layout

  • DRC: Successful Assura DRC log.

  • LVS: Successful Assura LVS log.

  • 8-bit Booth Decoder Simulation Waveforms

  • 8-bit Booth Decoder Test Bench Schematic

  • 8-bit Booth Decoder Verilog Test Bench Source Code

  • Logic Primitives

    The following logical primitives were used in the design of the Booth multiplier.

    NOT

  • Schematic

  • Symbol

  • Layout

  • NAND2, NAND3, NAND4

  • NAND2 Schematic

  • NAND2 Symbol

  • NAND2 Layout

  • NAND3 Schematic

  • NAND3 Symbol

  • NAND3 Layout

  • NAND4 Schematic

  • NAND4 Symbol

  • NAND4 Layout

  • Pass Gate

  • Schematic

  • Symbol

  • Layout

  • XOR2

  • Schematic

  • Symbol

  • Layout

  • Design Summary

    A signed 8x8 Booth multiplier was successfully designed and simulated. Layout was completed for all blocks except the full multiplier, the 12-bit adder and the 4-bit adder. The completed blocks passed DRC, LVS and post-layout simulation. To characterize the improvements made, the test cases for the reference and final schematics of the 12-bit adder and Booth multiplier are shown with their respective propagation delays in the Summary of Test Cases table. Because of the CLA logic, the speed improvements are most pronounced when there is a high degree of carry propagation through the circuit. To demonstrate the main improvements in more detail, the 12-bit adder was simulated in four configurations: in Verilog with a reference inverter delay of 100ps (remaining primitive delays determined by logical effort), in SPICE with gate level implementations, in SPICE with the transmission gate adder and CMOS CLA logic, and in SPICE with a second layer of CLA. The table shows that the delay is significantly improved in some cases but remains the same or is slightly worse in others. The CMOS implementation of the CLA logic required 4um wide transistors in series, which can reduce speed considerably in some cases. However, this design reduces the area significantly compared to a gate level design; in the best cases the propagation delay is reduced by up to 60%, whereas in the cases where it increases, it does so by only 20%. This demonstrates the trade-off between area and speed that is inherent in VLSI design.
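The headline improvements can be reproduced from the worst case delays quoted in this report. The short Python calculation below uses only those four figures; the helper name is hypothetical.

```python
def delay_reduction_percent(reference_ns: float, improved_ns: float) -> float:
    """Percentage by which the delay dropped relative to the reference."""
    return 100.0 * (reference_ns - improved_ns) / reference_ns


# Worst case delays quoted in the report (gate level reference vs. final design).
print(f"Booth multiplier: {delay_reduction_percent(5.50, 2.98):.1f}%")  # 45.8%
print(f"12-bit adder:     {delay_reduction_percent(3.13, 1.18):.1f}%")  # 62.3%
```

These values match the roughly 46% multiplier improvement stated in the abstract and the "up to 60%" adder improvement cited above.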

    Summary of Test Cases

    Propagation Delay Summary

    Table of Completion
