Issue #9/2018
Grekov Artem V., Tyurin Sergey F.
Improving FPGA Logic Basing on Increased LUT Bit Capacity and Creating Adaptive Logic Modules
Expressions were obtained for estimating the complexity and speed of decomposing a multi-bit LUT into lower-order LUTs. The complexity and delay, measured in the number of transistors, were compared for decompositions of a multi-bit LUT using the Mathcad computer mathematics system. The features of constructing multi-bit LUTs were determined, and various decomposition variants were evaluated for a further increase in the LUT dimension, with subsequent selection of the optimal variant of the adaptive logic module.
Tags: adaptive logic module, ALM, complexity, decomposition, FPGA, logic element, LUT, speed, transistor
INTRODUCTION
Logic elements (LEs) of FPGA-type (field-programmable gate array) programmable logic integrated circuits [1–4] are read-only memories, usually called LUTs (look-up tables), implemented as a multiplexer whose data inputs are set by configuration constants. To configure a given logic function, the corresponding truth table is loaded into the configuration memory cells (SRAM). When the input variables activate one of the $2^n$ paths in the transistor tree, the value of the logic function is read from the corresponding SRAM cell and passed to the OUT output. Inverters on the variables ensure that all terms of the perfect disjunctive normal form (PDNF) can be realized.
In terms of speed and complexity, a LUT of four variables (Fig. 1) is optimal for representing typical logic functions.
Such a LUT for the input variables x4, x3, x2, x1 (with a 16-bit configuration) is described by the expression:
Eqn001EN.eps(1)
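The behaviour described above can be illustrated with a minimal Python sketch. It is not the authors' expression (1): the function names, the bit ordering (x4 is taken as the most significant address bit) and the XOR example are assumptions made only for illustration. The sketch evaluates a LUT both as a direct table lookup and as an OR over all PDNF terms, each gated by its configuration bit, and the two views agree.

```python
from itertools import product


def lut_lookup(config, inputs):
    """Read the configuration bit addressed by the inputs (x_n first, x_1 last)."""
    n = len(inputs)
    if len(config) != 2 ** n:
        raise ValueError("config must hold 2**n bits")
    index = 0
    for bit in inputs:
        index = (index << 1) | (bit & 1)
    return config[index]


def lut_pdnf(config, inputs):
    """Same function written as an OR of minterms, each ANDed with its config bit."""
    n = len(inputs)
    if len(config) != 2 ** n:
        raise ValueError("config must hold 2**n bits")
    result = 0
    for index, d in enumerate(config):
        minterm = 1
        for pos, x in enumerate(inputs):
            sigma = (index >> (n - 1 - pos)) & 1
            # take the variable directly or through its inverter
            minterm &= x if sigma else 1 - x
        result |= d & minterm
    return result


# Hypothetical 16-bit configuration: x4 XOR x3 XOR x2 XOR x1
XOR4 = [bin(i).count("1") & 1 for i in range(16)]

for x in product((0, 1), repeat=4):
    assert lut_lookup(XOR4, x) == lut_pdnf(XOR4, x)
```

Loading a different 16-bit list into the same structure gives any other function of four variables, which is the sense in which the truth table is the configuration.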
PROBLEM STATEMENT
Given: the adaptive logic modules of the Stratix III FPGA, which handle functions of up to seven variables.
The decomposition of multi-bit LUTs is not fully covered in the literature [3–4, 6–7].
It is required to assess the complexity and speed of decomposing a multi-bit LUT in order to identify the features of constructing adaptive logic modules and the prospects for further increasing the bit capacity.
LITERATURE REVIEW
Stratix III FPGAs have adaptive logic modules (ALMs) that are combined into logic array blocks (LABs) [2, 3] and implement functions of up to seven variables. How such LUTs are implemented is of interest. The point is that, owing to the Mead and Conway restriction on the number of series-connected transistors [5], the pass-transistor tree cannot contain more than four transistors in a chain. A multi-bit LUT must therefore be decomposed into LUTs of lower dimension, that is, the tree must be built from subtrees.
Stratix III FPGAs are described in a number of sources [3, 4, 6, 7]. There are data on the possible production of such FPGAs at the Voronezh Plant of Semiconductor Devices (VZPP-Mikron), JSC KTTS “Electronics” [8]. These FPGAs contain logic array blocks made up of adaptive logic modules (ALMs), which can be configured to implement combinational logic, including arithmetic operations, as well as finite-state machines with memory.
The ALM architecture is compatible with the 4-input LUT architecture, and one ALM can also implement any function of up to six variables and certain functions of seven variables. This architecture is reported to win on speed and efficiency (presumably meaning hardware cost and die area); see Fig. 2.
Fig. 2 shows eight inputs of the adaptive LUT, which may give the impression that an 8-LUT can be implemented. The more detailed structure of ALM0 and ALM1 shown in Fig. 3 does not clarify how the 6-LUT tree is actually implemented.
Even more confusing is the presentation [9], which states that implementing a k-LUT requires $2^k$ bits of SRAM and a $2^k$:1 multiplexer. For k > 4 this is impossible as a single transistor tree, given the restriction on the chain length noted above. The different ALM operating modes do not clarify the details either (Fig. 4).
Let us consider the primary source, the documentation for the Stratix III FPGA [10], where the details of the ALM are given (Fig. 5).
Thus, it appears that the ALM is built not only on two 4-LUTs: there are also four LUTs of three variables (3-LUTs), and two 3-LUTs can be combined into one 4-LUT. There are therefore four 4-LUTs in total, and it becomes clear how the 6-LUT is constructed: the two most significant variables, e and f, select one of the four. Fig. 5 does not show control signals on a number of multiplexers drawn as trapezoids (LUTs 1–6 are also multiplexers, but they are shown with their control signals; the configuration inputs are implied).
METHODS
Let k be the dimension of the basic LUT, k ∈ {1, 2, 3, 4}. In principle, when 1-LUTs are used, no output inverter is needed up to n = 4. Under the restriction stated above, k greater than 4 is currently not used in practice.
Let us estimate the complexity of a LUT without decomposition (the “ideal” complexity, since this is feasible only up to n = 4):
$$L(n) = 2^n \cdot 8 + 2n + 2^{n+1}, \qquad (2)$$
where $2^n \cdot 8$ is the number of transistors in the configuration elements (each configuration input needs six transistors for the SRAM cell and two for the inverter at the input of the transistor tree); $2n$ is the number of transistors in the inverters of the n variables; and $2^{n+1}$ is the number of transistors in the pass-transistor tree together with the output inverter.
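As a quick numerical check of the three terms just listed, the following minimal Python sketch evaluates the undecomposed complexity for n = 1 to 4. The function name `lut_complexity` is an assumption introduced here; the terms themselves are taken from the description above.

```python
def lut_complexity(n):
    """Transistor count of an undecomposed n-LUT, term by term."""
    if n < 1:
        raise ValueError("n must be at least 1")
    config = (2 ** n) * 8       # 6 SRAM transistors + 2 inverter transistors per bit
    input_inverters = 2 * n     # one two-transistor inverter per variable
    tree = 2 ** (n + 1)         # pass-transistor tree plus output inverter
    return config + input_inverters + tree


for n in range(1, 5):
    print(n, lut_complexity(n))
```

For n = 4 this gives 128 + 8 + 32 = 168 transistors; the configuration memory dominates the total.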
When an n-tree is decomposed using k-LUTs, k ∈ {1, 2, 3, 4}, n ≥ k, n ≤ 8:
Eqn005-1.eps
Eqn005-2.eps,(3)
where $2^{k+1}$ is the complexity of the k-LUT tree; $2k$ is the number of transistors in the k inverters; $2^{n-k}$ such trees are needed; and a further LUT with $2^{n-k}$ inputs (which can itself be decomposed) is needed to connect the trees obtained by the decomposition, with complexity Eqn006.eps, where Eqn007.eps is the complexity of its tree with the output inverter and Eqn008.eps is the complexity of its input inverters. The time delay of the decomposition is estimated by the length of the longest path in the logic element from input to output. Without decomposition, for the “ideal” version (Fig. 2), we get:
Eqn009.eps.(4)
The path through the pass transistors in the decomposed version is also estimated by n, but because of the additional inverters at the inputs and outputs of the LUTs in the chain (Figs. 5, 6), it is longer:
Eqn010.eps.(5)
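The verbal description of the decomposition can be turned into a rough counting model. The Python sketch below is one possible reading and not a reproduction of expression (3): it assumes that every k-LUT tree carries its own k input inverters, that the connecting LUT is decomposed recursively with the same k whenever it has more than k variables, and that the configuration memory still costs 8 transistors per bit. The names `logic_cost` and `decomposed_complexity` are hypothetical.

```python
MAX_K = 4  # largest basic LUT allowed by the series-transistor restriction


def logic_cost(n, k):
    """Trees and inverters of an n-LUT built from k-LUTs (assumed recursive model)."""
    if n <= k:
        # fits into a single tree: tree with output inverter + input inverters
        return 2 ** (n + 1) + 2 * n
    first_stage = (2 ** (n - k)) * (2 ** (k + 1) + 2 * k)
    # the connecting LUT has n - k variables and may itself be decomposed
    return first_stage + logic_cost(n - k, k)


def decomposed_complexity(n, k):
    """Total transistor count: configuration memory plus decomposed logic."""
    if not 1 <= k <= MAX_K or n < k:
        raise ValueError("require 1 <= k <= 4 and n >= k")
    return (2 ** n) * 8 + logic_cost(n, k)


for n in range(5, 9):
    print(n, [decomposed_complexity(n, k) for k in range(1, MAX_K + 1)])
```

Under these assumptions a 6-LUT built from four 4-LUTs and one 2-LUT costs 512 + 160 + 12 = 684 transistors, and for every n from 5 to 8 the count falls as k grows. This only shows the trend implied by the description; the exact values depend on the modelling assumptions above.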
MODELING
In the course of the investigation, circuits for various variants of the multi-bit LUT (n > 4) were obtained and modeled. Fig. 6 shows an example of synthesizing a 6-LUT from four 4-LUTs and one 2-LUT.
In Fig. 6 the 2-LUT inputs have inverters; since the number of inverters on the signal path is therefore even, the configuration is written as usual.
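The structure of this example can be checked with a small behavioural model in Python. It is only a sketch: the placement of the inverters (one on every data input and one on the output of each stage) and the assignment of the two most significant variables to the 2-LUT are assumptions based on the description above, and all function names are hypothetical. The point it illustrates is that an even number of inversions leaves the stored truth table unchanged.

```python
def mux_tree(data, select):
    """Ideal pass-transistor tree: passes data[index] without inversion."""
    if len(data) != 2 ** len(select):
        raise ValueError("data must hold 2**len(select) values")
    index = 0
    for s in select:
        index = (index << 1) | (s & 1)
    return data[index]


def inverter(bit):
    return 1 - bit


def lut_stage(data, select):
    """One LUT stage: inverters on the data inputs, the tree, an output inverter."""
    return inverter(mux_tree([inverter(d) for d in data], select))


def lut6(config64, x):
    """6-LUT from four 4-LUTs and one 2-LUT; x = (x6, x5, x4, x3, x2, x1)."""
    if len(config64) != 64 or len(x) != 6:
        raise ValueError("need 64 configuration bits and 6 inputs")
    high, low = x[:2], x[2:]
    first = [lut_stage(config64[16 * i:16 * (i + 1)], low) for i in range(4)]
    return lut_stage(first, high)


# Arbitrary illustrative configuration
CONFIG = [1 if (i * 37 + 11) % 3 == 0 else 0 for i in range(64)]
x = (1, 0, 0, 1, 1, 0)
print(lut6(CONFIG, x) == CONFIG[0b100110])
```

Each signal passes through two inverters per stage, so the 64 configuration bits are simply the truth table of the six-variable function in its natural order.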
RESULTS
We restrict ourselves to n = 8, so that the additional LUT is assumed to fit within the decomposition parameters for k-LUTs, k ∈ {1, 2, 3, 4}. The Mathcad computer mathematics system is used. Fig. 7 shows graphs comparing the complexity of decomposing an n-LUT into k-LUTs according to expression (3).
The result is as expected: the larger the building block, the lower the cost of implementing a complex LUT of 5, 6, 7, or 8 variables. Fig. 8 shows graphs of expression (5) for n = 5–8.
Fig. 9 shows graphs of expression (5) for n = 7–10.
DISCUSSION
Thus, the adaptive logic modules of the Stratix III FPGA contain two 4-LUTs, as indicated in the translated articles. In fact, however, there are also four LUTs of three variables (3-LUTs), from which two additional 4-LUTs can be built. In total, four 4-LUTs are obtained. It is then clear how the 5-LUT and 6-LUT are built from them. There is no difficulty in obtaining two 5-LUTs. The configuration must therefore contain at least 64 bits to specify any function of six variables. In future work it is advisable to analyze the ALM configuration in order to obtain a logical model and to use it to check whether the declared capabilities of the ALM match the variants depicted in the documentation.
CONCLUSION
Analysis of the decomposition of multi-bit LUTs has shown that using 4-LUTs as “building blocks” is the most effective in terms of complexity and speed, as indicated in the available sources. It would be interesting to build LUTs based on so-called 3D transistors [11–14], which are already actively used by leading companies. There are reports that the Mead and Conway restriction is relaxed in such technologies. In addition, it is advisable to investigate the decomposition problem when the fault-tolerance facilities proposed in [15–20] are introduced into the LUT.
REFERENCES
1. Strogonov A., Tsybin S. Programmiruemaya kommutatsiya PLIS: vzglyad iznutri [Electronic resource]. URL: http://www.kit-e.ru/articles/plis/2010_11_56.php. (In Russian).
2. Ugryumov E. P. Tsifrovaya skhemotekhnika: uchebnoe posobie / E. P. Ugryumov. SPb: BKhV-Peterburg, 2004. 518 p. (In Russian).
3. Zolotukha R., Komolov D. Stratix III — novoe semeistvo FPGA firmy Altera [Electronic resource]. URL: http://kit-e.ru/assets/files/pdf/2006_12_30.pdf. (In Russian).
4. Ispol'zovanie resursov PLIS Stratix III firmy Altera pri proektirovanii mikroprotsessornykh yader [Electronic resource]. URL: http://www.kit-e.ru/articles/plis/2010_2_39.php. (In Russian).
5. Ul'man Dzh. D. Vychislitel'nye aspekty SBIS. Per. s angl.: A. V. Neimana. Pod red. P. P. Parkhomenko. M.: Radio i svyaz', 1990. 480 p. (In Russian).
6. Davydov S. I. Proektirovanie funktsional'nykh blokov programmiruemoi logicheskoi integral'noi skhemy, konfiguriruemykh s ispol'zovaniem metoda skanirovaniya puti [Electronic resource]. URL: http://www.dslib.net/tverdoteln-elektronika/proektirovanie-funkcionalnyh-blokov-programmiruemoj-logicheskoj-integralnoj.html. (In Russian).
7. Bystritskii A. V. Proektirovanie struktury mezhsoedinenii programmiruemykh logicheskikh integral'nykh skhem. [Electronic resource]. URL: http://www.dslib.net/tverdoteln-elektronika/proektirovanie-struktury-mezhsoedinenij-programmiruemyh-logicheskih-integralnyh.html. (In Russian).
8. Otkrytoe aktsionernoe obshchestvo “Konstruktorsko-tekhnologicheskii tsentr “Elektronika” [Electronic resource]. URL: http://www.edc-electronics.ru/upload/iblock/1cd/1cd2009ffa52599ff023b0843885fad6.pdf. (In Russian).
9. Presentation on ALTERA’s FPGA Technology. [Electronic resource]. — Access mode: http://www.authorstream.com/Presentation/hsrathore158-1410279-fpga/.
10. Logic Array Blocks and Adaptive Logic Modules in Stratix III Devices [Electronic resource]. — Access mode: https://www.altera.com.cn/content/dam/altera-www/global/zh_CN/pdfs/literature/hb/stx3/stx3_siii51002.pdf.
11. Platforma 22i. [Electronic resource]. URL: http://www.achronix.ru/technology/22i-platform.html. (In Russian).
12. Intel vypustila pervye protsessory semeistva Ivy Bridge. [Electronic resource]. URL: http://www.cybersecurity.ru/pda/149408.html?seccode=pda&ID=149408&last=. (In Russian).
13. 3D-komp'yuternye chipy budut v tysyachu raz proizvoditel'nei obychnykh. [Electronic resource]. URL: http://gearmix.ru/archives/22528. (In Russian).
14. TSMC predstavila marshruty proektirovaniya 16-nm FinFET na osnove yader Cortex-A15. [Electronic resource]. URL: http://www.3dnews.ru/762685. (In Russian).
15. Tyurin S. F. Programmiruemoe logicheskoe ustroistvo: patent RF № 2544750; opubl. 20.03.2015, Byul. № 8. (In Russian).
16. Tyurin S. F., Gorodilov A. Yu., Vikhorev R. V. Programmiruemoe logicheskoe ustroistvo: patent RF № 2547229; opubl. 10.04.2015, Byul. № 10. (In Russian).
17. Grekov A. V., Uspalenko V. B. Perspektivnye programmiruemye logicheskie integral'nye skhemy FPGA firmy Altera. “V mire nauchnykh otkrytii. Estestvennye i tekhnicheskie nauki”. — Krasnoyarsk: Nauchno-innovatsionnyi tsentr, 2014. № 6.1(54). P. 518–534. DOI: 10.12731/wsd-2014-6.1-13. (In Russian).
18. Tyurin S. F., Grekov A. V. The Checked Logic Element ChLUT FPGA. “In the World of Scientific Discoveries”. Krasnoyarsk: Publishing House Science and Innovation Center, 2014, 10 (58). P. 223–231. DOI: 10.12731/wsd-2014-10-17.
19. Tyurin S. F., Grekov A. V. Functionally Complete Tolerant Elements / International Journal of Applied Engineering Research 10 (14): 34433-34442, 2015. ISSN 0973-4562. Research India Publications, 2015.
20. Tyurin S. F., Grekov A. V. The Decoding of LUT FPGA Configuration of the Finite State Machine with Quartus II / International Journal of Applied Engineering Research 11 (20): 10264–10266, 2016. ISSN 0973-4562. Research India Publications, 2016.