Nanoindustry (scientific and technical journal): Features of the schematic synthesis and topological design flow for VLSI at 28 nm technology norms

Issue #9/2018

Bobkov Sergey G., Vlasov Alexander O., Gorelov Andrey A., Emin Evgeniy K.
Features of 28nm TSMC Physical and Logical VLSI Design Flow

This work analyzes the specific features of the TSMC 28HPC+ library cells. Based on the results, the cell-based logic synthesis flow and the topological (physical) design flow are examined.

Tags: Cadence, circuit synthesis, design flow, topological design, TSMC 28nm, VLSI

INTRODUCTION
As feature sizes shrink, existing VLSI development flows have to be optimized and improved. A direct consequence of smaller feature sizes is a smaller library cell area, which allows a higher degree of IC component integration and better performance. However, for process nodes below 65nm the contribution of static power to total power consumption increases because of leakage currents. Static power becomes as critical as performance and has a strong impact on the physical implementation flow.
ANALYZING 28NM TSMC HPC+ LIBRARY CELLS PROPERTIES
Standard cell libraries are characterized using PVT (process / voltage / temperature) modeling, which reflects the performance of cells built on n- and p-channel transistors. Simulation is performed at the extreme corners to predict behavior under various conditions and to make sure that circuits operate correctly in all cases: the ss (slow-slow) corner with the lowest performance, the ff (fast-fast) corner with the highest performance, and the tt (typical) corner.

The starting point of this research was the physical implementation flow developed for the TSMC65 process. Its distinguishing feature was the ability to optimize the static power of the project by using library cells built on transistors with different threshold voltages (Fig. 1).
Fig. 1 shows that the library is divided into subgroups of cells implemented with transistors of different threshold voltage values, namely:
HVT (High Threshold Voltage): cells built on transistors with a high threshold voltage. They have low power consumption and low speed. These cells are used on non-critical paths of the project to reduce static power.
RVT/SVT (Regular/Standard Threshold Voltage): cells built on transistors with a standard threshold voltage. They are a compromise between HVT and LVT cells, balanced in performance and power consumption.
LVT (Low Threshold Voltage): cells built on transistors with a low threshold voltage. They are faster and have smaller delays than HVT and RVT cells, but consume more power. They are used to meet timing on the project's critical paths.
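The division of roles among the three threshold groups can be illustrated with a minimal Python sketch. It assumes that each cell carries a single worst-case slack value; the cell names, the slack values and the 0.05 ns margin are hypothetical and are not taken from the flow described here, which relies on the optimization built into the EDA tool.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    name: str        # hypothetical instance name
    slack_ns: float  # worst timing slack through the cell (illustrative value)


def choose_vt(cell: Cell, margin_ns: float = 0.05) -> str:
    """Pick a threshold-voltage group from the timing slack of a cell.

    Negative slack      -> LVT (fastest, highest leakage)
    Slack above margin  -> HVT (slowest, lowest leakage)
    Anything in between -> RVT (compromise)
    The margin is an assumption made for this sketch.
    """
    if cell.slack_ns < 0.0:
        return "LVT"
    if cell.slack_ns > margin_ns:
        return "HVT"
    return "RVT"


cells = [Cell("u_mul_0", -0.02), Cell("u_div_3", 0.01), Cell("u_ctl_7", 0.20)]
assignment = {c.name: choose_vt(c) for c in cells}
print(assignment)
```

A real flow repeats such swaps iteratively, because replacing a cell changes the slack of every path that passes through it.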
The analysis of the 28nm TSMC HPC+ library cells has shown that, unlike TSMC65, the library structure adds a classification by transistor channel length (L). Three options are available: L30 (channel length of 30nm), L35 (35nm) and L40 (40nm) (Fig. 1). This variation makes it possible to reduce not only the dynamic power consumption but also the leakage power. Cells with a shorter channel achieve the maximum performance and are therefore used to optimize critical paths, while cells with a longer channel can significantly reduce leakage power on paths that are not performance-critical.
An additional analysis of the library cells was performed to improve the results of the physical design flow for the 28nm technology. Within one library of this process there may be additional cell implementations, combined into separate groups and marked by the following suffixes:
P (performance category): cells with smaller delays and a larger area than the basic cells. This set can be used in critical locations.
HP (high performance category): cells with even smaller delays and an even larger area than the basic cells. This set is intended for critical locations.
M (maximum category): all transistors in these cells have the maximum size, which reduces internal delays. The cell area is the same as that of the basic cells.
RE (reversed category): cells with a reversed stack order, which reduces delays at certain inputs. The effect is similar to that of the M category.
REM: a variant of cells with maximum transistor sizes and reversed stack order.
REP: reversed stack order with the performance option (similar to the P category).
REHP: reversed stack order with the high performance option (similar to the HP category).
OPT (option category): cell variant with the minimum possible use of M2 metal for connections inside the cell.
Another feature of 28nm TSMC HPC+ is that the routing of all cells cannot be completed in the first metallization layer (M1) alone. The interconnections that link the cell contacts to the power/ground grid are usually located directly above the corresponding contacts and are made in the M1 layer. According to the documentation, the resistance of M1 is much higher than that of the other metal layers. Therefore, to avoid an excessive IR drop, these connections have to be placed not only in M1 but also in M2, which reduces the number of options for routing interconnections. If full routing cannot be completed at a fixed placement density, it is reasonable to use OPT cells (Figs. 2–3).
CHOOSING THE OPTIMAL SET OF STANDARD CELLS IN THE LOGIC DESIGN FLOW
Optimization methods were developed on the integer operations test block int_mult_div, which is part of the microprocessor core developed by SRISA RAS. This block was chosen because it contains neither macro blocks nor memory blocks, and its structure is rather heterogeneous, which makes it possible to adequately reveal the properties of the library cells. The entire integrated circuit development flow was carried out using Cadence software. Logic synthesis was performed in the Genus EDA tool [2].
The first step in optimizing the existing flow was to define the main quality metrics used for evaluation and further optimization, namely:
clock frequency;
the block area;
static power consumption;
total power consumption.
For more accurate evaluation some constraints were used, namely:
the same set of combinational and sequential functional logic cells in all project variants;
identical constraints used for every project variant.
Optimization was carried out until maximum performance was achieved.
As the basic implementation of the project, we chose a variant containing cells built on transistors with a channel length L of 35nm and a standard threshold voltage (RVT). For further study and clearer presentation of the data, all characteristics obtained for the different variants of the int_mult_div block have been normalized to this reference project.
Table 1 shows that reducing the transistor channel length L increases performance by 12 %. This gain comes at the cost of power efficiency: leakage power increases by 105 %. For the project variant based on cells with a channel length L of 40nm, performance degrades by 9 %, but leakage power is reduced by 42 % relative to the reference case. The following conclusions have been made:
the reduction of the channel length L increases the performance;
with an increase in the length of the channel L, the lowest value of the leakage power is achieved;
considering the application field of the integrated circuits, namely devices that require trade-offs between power and delay, the main advantage of project variants with different channel lengths is the ability to reduce static power consumption. The spread relative to the reference version is significant: +105 % for L30 and -42 % for L40.
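The normalization to the reference project can be reproduced with a short Python sketch. It uses only the percentages quoted above for Table 1; the leakage-per-performance ratio is an illustrative metric introduced here and is not one used in the article.

```python
# Changes relative to the L35 reference, as quoted in the text for Table 1 (percent).
REPORTED_DELTA_PCT = {
    "L30": {"performance": 12.0, "leakage": 105.0},
    "L35": {"performance": 0.0, "leakage": 0.0},  # reference variant
    "L40": {"performance": -9.0, "leakage": -42.0},
}


def to_normalized(delta_pct: float) -> float:
    """Convert a percentage change into a value normalized to the reference (1.0)."""
    return 1.0 + delta_pct / 100.0


def leakage_per_performance(variant: str) -> float:
    """Normalized leakage divided by normalized performance (illustrative metric)."""
    d = REPORTED_DELTA_PCT[variant]
    return to_normalized(d["leakage"]) / to_normalized(d["performance"])


ratios = {name: round(leakage_per_performance(name), 2) for name in REPORTED_DELTA_PCT}
print(ratios)
```

With the quoted numbers the ratio is about 1.83 for L30 and 0.64 for L40, so under this metric each unit of performance costs more leakage at L30 than at L40.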
Further analysis has shown how optimization by cells with different channel length and different threshold voltage values affects the design (see Tables 2 and 3).
When the design is optimized with cells built on transistors with a 30nm channel length and a low threshold voltage, performance increases by 18 %, whereas power grows almost fourfold (396 % relative to the basic implementation of the project). However, optimization with HVT cells of the same channel length degrades performance by 1 % and increases leakage power by 3 %. When the project was optimized with cells built on transistors with a 40nm channel and different threshold voltage values, optimization with LVT cells changed performance and power consumption very little, as did optimization with HVT cells. This may be explained by the small percentage of such cells used during optimization.
Summarizing the data obtained, the following conclusions can be made:
optimization with LVT and HVT cells changes the project parameters most strongly when the share of these cells in the synthesis flow exceeds 20 % of the total. Otherwise, the use of LVT and HVT cells does not provide significant advantages;
optimizing the project with cells that contain only transistors with a standard threshold voltage but differ in channel length is cost-effective. Using HVT and LVT cells requires additional photolithographic masks, which can significantly increase the cost of the whole IC development flow.
For a more precise analysis, it is necessary to obtain three design options that are critical for performance and power consumption. For this purpose, let us analyze the variant built only on L30 LVT cells, the variant built only on L40 HVT cells, and the variant combining L30 LVT and L40 HVT cells (Table 4).
According to the data obtained, the project based on L30 LVT has the highest performance (+30 % compared to L35 RVT), but its power consumption is significantly increased (by 746 %). The project built on L40 HVT has the lowest power consumption: leakage power is reduced by 82 % compared to L35 RVT, but performance degrades by 17 %. The best compromise between performance and power consumption is a design built on both L30 LVT and L40 HVT cells. In this case performance improves by 26 % and power consumption is reduced by 18 %.
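The trade-off described above can be expressed as a simple selection rule, sketched here in Python. The numbers are the percentages quoted in the text for Table 4. Treating every quoted "power" figure as the same metric and the selection rule itself are assumptions of this sketch, not part of the authors' flow.

```python
from typing import Optional

# Changes relative to L35 RVT, as quoted in the text for Table 4 (percent).
# Treating all "power" figures as one comparable metric is an assumption.
VARIANTS = {
    "L35 RVT (reference)": {"performance": 0.0, "power": 0.0},
    "L30 LVT": {"performance": 30.0, "power": 746.0},
    "L40 HVT": {"performance": -17.0, "power": -82.0},
    "L30 LVT + L40 HVT": {"performance": 26.0, "power": -18.0},
}


def pick_variant(min_performance_gain_pct: float) -> Optional[str]:
    """Return the lowest-power variant that meets the performance target, if any."""
    feasible = [
        name
        for name, v in VARIANTS.items()
        if v["performance"] >= min_performance_gain_pct
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda name: VARIANTS[name]["power"])


print(pick_variant(20.0))
print(pick_variant(-20.0))
```

A target of +20 % selects the mixed L30 LVT + L40 HVT variant, while a target that tolerates a 20 % slowdown selects L40 HVT; only a target above +26 % forces the pure L30 LVT variant.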
In summary, to change performance and power consumption radically it is necessary to use LVT cells to reduce delays on critical paths and HVT cells to reduce static power on non-critical paths, since the use of cells built on transistors with different channel lengths does not on its own provide significant advantages for the important parameters of the circuit.
In adapting the existing design flow to the TSMC28HPC+ process, we took into account the need to let the engineer implement three main design variants (Table 5), namely:
reference variant providing trade-off between power consumption and performance;
super-economical variant providing low power consumption with minimal performance loss;
high-performance variant providing the best performance and optimized for static power.
The maximum performance variant has not been considered because of the disproportionate growth of power consumption.
These variants allow the user to choose the required implementation depending on the technical design specification, taking into account the purpose of the integrated circuit and the cost of implementing the project.
STANDARD CELLS PLACEMENT AND ROUTING FLOWS
According to the manufacturer's technical documentation [3], the standard cell library contains several variations of the same cell that differ in technological parameters. The already developed topological design flow therefore offers additional ways of influencing the performance/power consumption ratio. There are also a number of approaches that reduce the number of possible DRC violations and the total physical synthesis runtime. Physical synthesis was performed in the Innovus EDA tool [4]. To test the influence of the above-mentioned process options on the implementation of the int_mult_div block, all standard cell variations were divided into 5 conditional groups:
Reference implementation without special variations of standard cells, which will be used to normalize the obtained results.
Implementation using OPT cells. According to the manufacturer’s description [3], it is possible to assume that application of this group of cells will reduce the number of DRC violations associated with impossibility of routing cells interconnections.
Implementation using P, HP and M cells. According to the technical documentation, these cells improve timing performance by reducing input delays.
Implementation using all available cells (OPT, P, HP, M, all variations of RE). This variation can be expected to provide the best timing performance as well as the minimal cell placement density. In addition, reducing the total number of cells used in the physical synthesis flow can significantly reduce the static power consumption of the entire design. A major disadvantage of this variation is a significant increase in the physical synthesis runtime.
Implementation using all cells except OPT (P, HP, M, all variations of RE). This variation is expected to retain the advantages of variant 4, i.e. improved timing and reduced static power consumption compared to the reference variant. However, the number of DRC violations and the physical synthesis runtime may increase.
For a more complete analysis, additional constraints on the block area should be added in order to observe changes in the cell placement density.
For all five implementations, strict timing constraints (Tclk = 0.4ns) were imposed, so the design flow was focused on timing performance. The basic implementation of the block has the largest number of elements and almost the highest cell placement density within the given block area. It is noteworthy that the block area of the implementation without constraints is 2 % smaller than the reference one. This can be explained by the possibility of using cells with the best timing performance, which reduces the total number of elements, whereas the reference implementation needs additional buffers to meet timing in STA (Static Timing Analysis).
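For orientation, the clock period constraint corresponds to the following target frequency. This is a plain unit conversion and not a figure reported in the article's tables.

```latex
f_{clk} = \frac{1}{T_{clk}} = \frac{1}{0.4\,\text{ns}} = 2.5\,\text{GHz}
```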
The number of DRC violations was also taken into account. The 4th variant shows a significant increase in the number of errors: 79 % more violations than the reference variant. The block variation with OPT cells, as predicted, has 14 % fewer DRC errors than the reference variant. A larger number of DRC violations does not necessarily degrade the design properties, but the errors have to be corrected, which can cost significant time during the design flow. Power consumption for all variants except the second one showed no notable changes. For the second variation, static power consumption increased by 6 %, which leads to a 12 % increase in total power consumption. For the third variant, a slight reduction of static power consumption, by 4 %, can be observed.
In addition, the synthesis runtime of each block was analyzed. A significant increase in execution time should be noted for the fourth variation (by 241 %), as well as for the variation that does not use OPT cells (by 214 %).
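The DRC and runtime figures quoted above can be collected into a small screening sketch in Python. Only values stated in the text are filled in, and unreported values are kept as `None` instead of being guessed. The short variant labels and the 100 % runtime budget are hypothetical.

```python
from typing import Dict, List, Optional

# Changes versus the reference implementation, as quoted in the text (percent).
# None marks a value the text does not report; it is not treated as zero.
REPORTED: Dict[str, Dict[str, Optional[float]]] = {
    "2: OPT": {"drc": -14.0, "runtime": None},
    "3: P, HP, M": {"drc": None, "runtime": None},
    "4: all cells": {"drc": 79.0, "runtime": 241.0},
    "5: all except OPT": {"drc": None, "runtime": 214.0},
}


def exceeds(metric: str, limit_pct: float) -> List[str]:
    """List variants whose reported increase in `metric` is above `limit_pct`."""
    flagged = []
    for name, metrics in REPORTED.items():
        value = metrics[metric]
        if value is not None and value > limit_pct:
            flagged.append(name)
    return flagged


slow = exceeds("runtime", 100.0)  # hypothetical runtime budget of +100 %
drc_heavy = exceeds("drc", 0.0)   # variants with more DRC errors than the reference
print(slow, drc_heavy)
```

Under these assumptions the fourth and fifth variants are flagged for runtime, and only the fourth is flagged for DRC violations, which matches the observations in the text.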
To sum up, the following conclusions can be drawn:
The earlier assumption has been confirmed: the use of OPT cells reduced DRC violations by 14 % owing to wire routing in the first two metal layers. On the other hand, total power consumption increased by 12 %. Whether this variation is worthwhile depends on the requirements for the design flow.
In performance-critical cases, the EDA tool can use different variations of P, HP, M and RE cells, which makes it possible to significantly reduce the placement density and power consumption, including static power. This is explained by the absence of the need to place additional buffers and, consequently, by optimized cell sizes, which allows the specified performance to be met. The main disadvantage of these variations is a significant increase in the synthesis runtime and the number of DRC violations, which can significantly increase the total design flow time for complex blocks.
Thus, compared to the TSMC 65nm design rules, there are even more options for optimizing the design in two directions: power consumption and timing performance.
Using all possible sets of cells leads to a significant increase in the synthesis runtime and the number of DRC violations, and therefore in the total development time. The choice of cell sets used in synthesis should be considered carefully.
CONCLUSIONS
The article highlights the key features of the logic and physical design flow for the 28nm TSMC technology. The analysis of the library cells made it possible to introduce new options into the VLSI design flow aimed at optimizing such critical parameters as performance and power consumption, with a minimum number of DRC violations and no setup or hold time violations at register elements during static timing analysis. The obtained data are especially relevant for future research at SRISA RAS.
REFERENCES
1. Vlasov A. O. “Optimizatsiya potreblyaemoi moshchnosti mikroskhem s ispol'zovaniem tranzistorov s raznym porogovym napryazheniem” 13-ya Rossiiskaya nauchno-tekhnicheskaya konferentsiya “Elektronika, mikro- i nanoelektronika” Sbornik nauchnykh trudov, 2011. P. 65–68.
2. Genus User Guide for Legacy UI. Product Version 16.2. April 2017, Cadence Design Systems, Inc.
3. Dolphin Technology Standard Cell Usage Document, September 2012, Dolphin Technology, Inc.
