The chip was designed in several stages, shared between the two team members, Vinay and Jerry. We chose to implement an nth-root machine because we wanted to do something novel for the EE103 class project; many groups work on conventional topics such as ALUs, adders, and multipliers.
When we first heard about the VLSI project, our initial idea was to implement a square-root machine. It is an architecture that rarely appears in coursework or textbooks, so building one would have been a good way to learn something new. However, a literature search showed that square-root hardware has long been studied and has been implemented with various algorithms, such as the Newton-Raphson method and the restoring and non-restoring algorithms. Although the non-restoring algorithm is an accurate method, it computed the square root to only 2 decimal places, and its VLSI implementation for a 16-bit input required a major trade-off between circuit complexity and speed.
During the literature survey on square-root systems, we were also looking for ways to improve the existing algorithms and to add more features to our design. It was then that the idea of using logarithms occurred to one of the team members. We worked on an algorithm to compute the base-2 logarithm of a 16-bit input number. After some mathematical derivation and approximation, we devised a simple way to implement the binary-to-binary logarithmic converter. However, a subsequent search through electronic journals showed that Abed and Siferd had already been working on the same algorithm since 2006. They had also proposed VLSI implementations for the logarithmic and antilogarithmic converters.
This discovery made the situation both easier and harder for us. It was easier because we now had a ready circuit to implement and test. It was harder because we still wanted to achieve something new, which pushed us to look for ways to improve the speed and power consumption of the chip. At that point we decided to divide the work between ourselves and work individually on each part.
16-bit Binary-to-Binary Logarithmic Converter:
The first step in the design was to implement a 16-bit binary-to-binary logarithmic converter. The circuit had already been implemented by Abed and Siferd, so our main focus was to reproduce the design successfully and to improve its speed and power consumption. The major block of this circuit is the Leading One Detector (LOD), which accounts for most of the circuit delay, so we paid special attention to reducing the delay of this block. We changed the combinational logic used in the paper to implement the leading-one detection function: instead of computing the output in 3 stages, as in the paper, we computed it in 2 stages, and simulation showed an increase in speed. The circuit in the literature works at a maximum frequency of 310 MHz and gives a decoded output with a HIGH bit at the position of the leading ‘1’ in the input. Our simulation results show that, even in the worst case, our circuit operates at more than 400 MHz and produces a binary-encoded output corresponding to the position of the leading ‘1’ in the input.
We tried several other techniques for increasing the speed, one of which was a pipelined architecture to increase the throughput of the circuit. However, simulations showed that the delay of the pipeline registers cancelled out the benefit for a circuit of this size. Pipelining may still be helpful if the same circuit is implemented for a larger number of bits.
Our next concern was to reduce the power consumption of the block. To do this, we tried to use gated clocks for the registers and to turn off those blocks of the circuit that were not active for a particular input. This created several issues that we could not solve within the time frame of the project, so we shifted our emphasis from saving power to increasing the speed while maintaining the overall performance of the circuit.
The remaining block of the logarithmic converter was a logarithmic shifter, which is explained very well by Rabaey [1]. The whole logarithmic converter was implemented by Vinay Agarwal, who worked on increasing the speed of the circuit through proper sizing of the gates and then produced a compact layout using standard layout techniques for digital ICs.
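The leading-one detection described above can be illustrated with a small behavioural model. This is only a sketch under assumptions: the split into four 4-bit groups is suggested by the block names in the contribution table, but the actual gate-level logic is not given in the text, and the names `lod4` and `lod16` are hypothetical.

```python
def lod4(nibble: int) -> tuple[bool, int]:
    """Return (any_bit_set, position of the leading 1) for a 4-bit value."""
    for pos in (3, 2, 1, 0):
        if (nibble >> pos) & 1:
            return True, pos
    return False, 0


def lod16(x: int) -> tuple[bool, int]:
    """Return (valid, binary-encoded position 0..15 of the leading 1)."""
    if not 0 <= x <= 0xFFFF:
        raise ValueError("input must fit in 16 bits")
    # Stage 1: four independent 4-bit detectors (parallel in hardware).
    groups = [lod4((x >> (4 * g)) & 0xF) for g in range(4)]
    # Stage 2: select the most significant group that contains a 1.
    for g in (3, 2, 1, 0):
        valid, pos = groups[g]
        if valid:
            return True, 4 * g + pos
    return False, 0  # all-zero input has no leading 1
```

For example, `lod16(0x0123)` returns `(True, 8)`, the binary-encoded index of the leading ‘1’, rather than a one-hot word with bit 8 set.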
The final simulation of the logarithmic converter shows that it generates the characteristic bits correctly at a clock frequency of 200 MHz. However, the clock can be no faster than 150 MHz if the mantissa bits are to be generated correctly.
The layout area of the logarithmic converter block is 675 × 280 μm².
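The way the characteristic and mantissa bits fit together can be sketched in software. The model below assumes the simple linear approximation log2(1 + m) ≈ m and a 16-bit output with a 4-bit characteristic and a 12-bit mantissa; the text does not spell out which approximations were actually used, and `log2_fixed` is a hypothetical name.

```python
def log2_fixed(x: int) -> int:
    """Approximate log2(x) for 1 <= x <= 0xFFFF as a 4.12 fixed-point word."""
    if not 1 <= x <= 0xFFFF:
        raise ValueError("input must be a non-zero 16-bit value")
    k = x.bit_length() - 1          # characteristic: position of the leading 1
    frac = x - (1 << k)             # the k bits below the leading 1
    # Left-align those bits into a 12-bit mantissa (the shifter's job).
    mantissa = frac << (12 - k) if k <= 12 else frac >> (k - 12)
    return (k << 12) | mantissa
```

As a check, `log2_fixed(3)` gives `0x1800`, that is 1.5, whereas the true value of log2(3) is about 1.585. Exact powers of two are converted without error.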
Divider:
The next stage of the design was to implement a divider that takes the 16-bit output of the logarithmic converter and divides it by a second, 4-bit input n. For this block we emphasized functionality over speed or power because, as far as we could find, no one had previously integrated a divider with a logarithmic and an antilogarithmic converter. The success of this block was therefore crucial to our design.
The input to the divider is the 16-bit output of the logarithmic converter, in which the 4 MSBs form the integer part and the remaining 12 bits form the fractional part of the number. The truncation introduced by the integer divider therefore affects only the lowest bits of the fractional part and has a negligible effect on the final output of the circuit.
The divider used in this project is based on the restoring division algorithm. We modified the algorithm to reduce the number of subtraction iterations to 13 while still obtaining a 16-bit quotient. SpectreS simulation results show a typical propagation delay of 30 ns. Divider circuits are usually large and slow, especially when they use a simple division algorithm, but fast division algorithms lead to very complex hardware and were not practical for this project. As a result, our divider is much slower than the logarithmic converter and limits the speed of the entire circuit. We tried alternative algorithms, but within the given time frame it was more important to achieve correct functionality.
The divider block was implemented by Jerry, who carried the complete circuit through to the final layout. The layout area of the divider is 2150 × 935 μm².
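A short numerical sketch makes the size of this truncation concrete. It assumes the 4.12 fixed-point interpretation described above; `to_real` and `divide_log` are hypothetical helper names, not blocks of the chip.

```python
FRAC_BITS = 12  # the 12 LSBs of the log word hold the fractional part


def to_real(word: int) -> float:
    """Interpret a 16-bit 4.12 fixed-point word as a real number."""
    return word / (1 << FRAC_BITS)


def divide_log(word: int, n: int) -> int:
    """Integer-divide the raw 16-bit log word by a 4-bit divisor n."""
    if not 1 <= n <= 15:
        raise ValueError("n must be a non-zero 4-bit value")
    return word // n  # the discarded remainder is worth less than one LSB


# 3.5 in 4.12 format is 0x3800; divide it by n = 3.
quotient = divide_log(0x3800, 3)
error = 3.5 / 3 - to_real(quotient)
```

Here `error` is below 2**-12 (about 0.00024), since integer division can lose at most one least significant bit of the fractional part.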
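For reference, the textbook restoring division algorithm can be written as follows. This is a generic sketch with one subtraction per quotient bit (16 iterations); the 13-iteration modification mentioned above is not described in the text and is not reproduced here, and `restoring_divide` is a hypothetical name.

```python
def restoring_divide(dividend: int, divisor: int, width: int = 16) -> tuple[int, int]:
    """Return (quotient, remainder) using bit-serial restoring division."""
    if divisor <= 0:
        raise ZeroDivisionError("divisor must be positive")
    if not 0 <= dividend < (1 << width):
        raise ValueError("dividend does not fit in the given width")
    remainder = 0
    quotient = 0
    for i in range(width - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        trial = remainder - divisor
        if trial >= 0:
            remainder = trial        # subtraction succeeded: quotient bit is 1
            quotient |= 1 << i
        # Otherwise the partial remainder is "restored" (left unchanged)
        # and the quotient bit stays 0.
    return quotient, remainder
```

With the 16-bit log word `0x3800` and n = 3, this returns a quotient of 4778 and a remainder of 2, matching ordinary integer division.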
Antilogarithmic converter:
The final stage of the design was to implement an antilogarithmic converter. A design for this block had also been published by Abed and Siferd, so we concentrated on improving the speed of the circuit. The 4 MSBs of the divider output are used as control bits for this block. In their implementation, the authors used 4 inverters to invert the control signals in order to get the desired output. We implemented the same function without the inverters, simply by changing the transmission-gate connections in the logarithmic shifter used in this block. Another improvement was to compute the integer and fractional (decimal) bits in parallel by using two logarithmic shifters whose shift operations were modified according to the desired output.
We do not use a clock or registers in this block: the delay of the preceding divider is so high that clocking this block at a correspondingly low frequency would serve no purpose. The total delay of the block was simulated as 2.313 ns, which corresponds to a rate of about 432 MHz (1/2.313 ns). This is a clear improvement over the previous circuits, which operate at 178 MHz.
The complete design, simulation, and layout of this block were carried out by Vinay Agarwal. The final layout area of this block is 980 × 130 μm².
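A behavioural sketch of the antilogarithm step is given below. It assumes the inverse of the linear approximation, 2^(k + m) ≈ 2^k (1 + m), with the 4 MSBs as the shift amount k and the 12 LSBs as the mantissa m; the two expressions stand in for the two shifters, and `antilog2_fixed` is a hypothetical name rather than the circuit's actual structure.

```python
def antilog2_fixed(word: int) -> tuple[int, int]:
    """Return (integer part, 12-bit fraction) of approximately 2**(word / 4096)."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("input must fit in 16 bits")
    k = word >> 12                   # control bits: shift amount
    m = word & 0xFFF                 # 12-bit mantissa
    significand = (1 << 12) | m      # 1.m with the implied leading 1 restored
    # The two results depend only on k and m, so hardware can form them
    # with two shifters working side by side.
    integer_part = (significand << k) >> 12
    fraction_part = (m << k) & 0xFFF
    return integer_part, fraction_part
```

For example, `antilog2_fixed(0x1800)` returns `(3, 0)`, and `antilog2_fixed(0x0800)` returns `(1, 0x800)`, that is 1.5, where the exact value of 2^0.5 is about 1.414.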
Final Integration of the Chip:
The integration of the final chip and the subsequent simulations were carried out by both team members together. The layout and LVS check for the whole chip were done by Vinay Agarwal.
Design of Website:
The design of the website was the next crucial job, since we needed to present the results obtained. The painstaking work of consolidating all the material and putting it on the website was done by Jerry. While Jerry was building the website, Vinay prepared the write-ups for the various pages to speed up the process.
Achievements:
This project was an enriching experience for both of us and taught us a great deal about various areas of digital design. We had set very high targets for the design in terms of both speed and power consumption. Now, as we approach the end of the project, we feel that we have partially achieved the goals set at the beginning of the design. The achievements are listed below:
- Successful implementation of the complete architecture to compute the nth root of any 16-bit input.
- Improvement in the speed of the Leading One Detector Circuit.
- Improvement in the speed of the Logarithmic converter.
- Improvement in the speed of the Antilogarithmic converter.
Future Improvements:
There are several unsolved issues that we need to work on before we send the chip for fabrication. The major ones are listed below:
- The divider block is slow, so we have to improve its speed by employing techniques such as pipelining and by trying alternative division algorithms.
- The mathematical approximations used in computing the binary logarithm of the 16-bit input lead to a large error in the logarithm value. This error is magnified many times when we take the antilogarithm at the output stage, which sometimes produces a badly wrong output. We therefore have to implement a correction algorithm for the output of the logarithmic converter.
- The power-saving techniques that we could not complete within the limited time frame should be implemented to increase the efficiency of the design.
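The effect of the approximation error noted in the list above can be seen in a compact end-to-end model of the log, divide, antilog chain. This is a sketch only: it assumes the uncorrected linear approximations and the 4.12 fixed-point format, it returns a floating-point value instead of the chip's output bits, and `nth_root_approx` is a hypothetical name.

```python
def nth_root_approx(x: int, n: int) -> float:
    """Approximate the nth root of a 16-bit x via log2, divide, antilog2."""
    if not 1 <= x <= 0xFFFF or not 1 <= n <= 15:
        raise ValueError("x must be a non-zero 16-bit value, n a non-zero 4-bit value")
    # Logarithmic converter: characteristic k and 12-bit mantissa.
    k = x.bit_length() - 1
    frac = x - (1 << k)
    mantissa = frac << (12 - k) if k <= 12 else frac >> (k - 12)
    log_word = (k << 12) | mantissa
    # Divider: integer division of the 16-bit log word by n.
    q = log_word // n
    # Antilogarithmic converter: restore the leading 1 and shift.
    kq, mq = q >> 12, q & 0xFFF
    return ((1 << 12) | mq) * 2.0 ** kq / 4096


square_root_of_9 = nth_root_approx(9, 2)      # exact answer is 3
square_root_of_4096 = nth_root_approx(4096, 2)  # exact answer is 64
```

In this model the square root of 4096 comes out exactly as 64.0, but the square root of 9 comes out as 3.125, an error of roughly 4%. Inputs that are not powers of two are the ones a correction stage would need to address.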
| Name of the Block | Schematic View | Layout View | DRC | LVS | Analog Simulation |
| --- | --- | --- | --- | --- | --- |
| NOT | Vinay | Vinay | Pass | Pass | Vinay |
| NAND2 | Vinay | Vinay | Pass | Pass | Vinay |
| NAND3 | Jian | Jian | Pass | Pass | Jian |
| NAND4 | Jian | Jian | Pass | Pass | Jian |
| NAND5 | Jian | Jian | Pass | Pass | Jian |
| AND2 | Jian | Jian | Pass | Pass | Jian |
| AND3 | Jian | Jian | Pass | Pass | Jian |
| AND4 | Jian | Jian | Pass | Pass | Jian |
| AND5 | Jian | Jian | Pass | Pass | Jian |
| NOR2 | Vinay | Vinay | Pass | Pass | Vinay |
| NOR3 | Vinay | Vinay | Pass | Pass | Vinay |
| NOR4 | Vinay | Vinay | Pass | Pass | Vinay |
| NOR5 | Jian | Jian | Pass | Pass | Jian |
| OR2 | Jian | Jian | Pass | Pass | Jian |
| OR3 | Jian | Jian | Pass | Pass | Jian |
| OR4 | Vinay | Vinay | Pass | Pass | Vinay |
| OR5 | Jian | Jian | Pass | Pass | Jian |
| XOR2 | Jian | Jian | Pass | Pass | Jian |
| XNOR2 | Jian | Jian | Pass | Pass | Jian |
| 2-1-MUX-DFF | Vinay | Vinay | Pass | Pass | Vinay |
| 2-1-MUX_Shifter | Vinay | Vinay | Pass | Pass | Vinay |
| 2-1-MUX-TG | Vinay | Vinay | Pass | Pass | Vinay |
| 4-bit Register | Vinay | Vinay | Pass | Pass | Vinay |
| 12-bit Register | Vinay | Vinay | Pass | Pass | Vinay |
| 16-bit Register | Vinay | Vinay | Pass | Pass | Vinay |
| DFF with CLEAR | Vinay | Vinay | Pass | Pass | Vinay |
| DFF without CLEAR | Vinay | Vinay | Pass | Pass | Vinay |
| LOD 4-bit | Vinay | Vinay | Pass | Pass | Vinay |
| LOD 4-bit Group | Vinay | Vinay | Pass | Pass | Vinay |
| LOD 16-bit | Vinay | Vinay | Pass | Pass | Vinay |
| 16-word-by-4-bit-ROM | Vinay | Vinay | Pass | Pass | Vinay |
| Logarithmic Shifter | Vinay | Vinay | Pass | Pass | Vinay |
| 16-bit Log-Converter | Vinay | Vinay | Pass | Pass | Vinay |
| Divisor Shifter | Jian | Jian | Pass | Pass | Jian |
| 4-bit Subtractor | Jian | Jian | Pass | Pass | Jian |
| Remainder Selector | Jian | Jian | Pass | Pass | Jian |
| 4-bit Controlled Subtractor | Jian | Jian | Pass | Pass | Jian |
| 4-bit Comparator | Jian | Jian | Pass | Pass | Jian |
| 16-bit Quotient Shifter | Jian | Jian | Pass | Pass | Jian |
| 16-bit Partial Divider | Jian | Jian | Pass | Pass | Jian |
| Complete Divider | Jian | Jian | Pass | Pass | Jian |
| Anti-log Shifter (integer part) | Vinay | Vinay | Pass | Pass | Vinay |
| Anti-log Shifter (decimal part) | Vinay | Vinay | Pass | Pass | Vinay |
| Anti-log Converter | Vinay | Vinay | Pass | Pass | Vinay |
| 16-bit Nth Root Circuit | Vinay | Vinay | Pass | Pass | Vinay |

| Name of the Block | Total Area | Total Transistors | Power Consumption |
| --- | --- | --- | --- |
| 16-bit Nth Root Circuit | 1001.850 μm by 2599.2 μm | 6480 | 3.462 mW |