Issue #8/2016
O.Brekhov, A.Klimenko, A.Shdanov, A.Yakupov
Implementation of experimental software prototype for control of fault tolerance of IC design
The paper presents a set of software modules that form the core of an experimental prototype of a hardware-software system for controlling the fault tolerance of IC designs.
Development of "system on chip" (SoC) devices involves a whole set of interrelated tasks, from designing the structure and selecting the hardware base to manufacturing experimental samples. Testing efficiency at the design stage is paramount, and for devices based on custom and semicustom chips it is critical, since the quality of testing significantly affects the cost of development. Special-purpose devices (in particular, components of space equipment) must meet additional requirements to remain operable in aggressive external environments. Testing the fault tolerance of SoC devices at the development stage is therefore an urgent task.
Cosmic radiation is widely known to be one of the main sources of failures in electronics for both satellite and terrestrial applications [1]. The operability of chips under radiation exposure is usually tested by fault injection, which can be performed either at late development stages, by testing experimental samples in particle accelerators, or at early stages, by simulating failures in chip designs with hardware-software systems (HSS). The second approach avoids the costly fabrication of experimental samples in every test-and-redesign cycle and is therefore used by many developers. Many hardware-software solutions for controlling SoC operability under cosmic radiation are known (in particular, [2–5]). Solutions based on FPGA prototyping offer the best price/performance ratio for an HSS [6]. However, the known approaches do not allow a detailed study of how individual failure sources affect the chip.
This paper describes the software modules of an HSS developed for controlling the fault tolerance of IC designs using an advanced fault injection method [7]. The method uses a stack of three models (of external influences, of emergence of threats and of fault localization) and makes it possible to study the multi-level impact of faults on the chip.
Fig. 1 shows the structural diagram of the proposed HSS. The complex comprises a workstation running the software component and four specialized expansion boards (Xilinx Virtex-6 FPGA ML605 Evaluation Kit).
The HSS makes it possible to estimate the fault tolerance of SoC devices for arbitrary target hardware by means of step-by-step simulation [8]. The technique first performs functional testing, through FPGA prototyping, of the initial IC design, which is described in a subset of Verilog [9] at the level of the digital functional elements of the target hardware. Fault-injection means are then introduced into the original design, and the modified design is functionally tested to establish its equivalence to the original. If the original and modified designs are equivalent, the methodology requires modeling the operation of the modified design in the presence of failures; the results of this modeling determine the fault tolerance of the initial IC design.
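The gating between these stages can be sketched as follows. This is a minimal illustration, not the HSS code: the three callables are hypothetical stand-ins for runs on the FPGA prototype, and "equivalence" is reduced to equality of the output vector arrays with the reference array.

```python
from typing import Callable, List, Sequence

Vector = Sequence[int]  # one output vector per clock cycle


def run_step_by_step_flow(
    reference: List[Vector],
    simulate_original: Callable[[], List[Vector]],
    simulate_modified: Callable[[], List[Vector]],
    simulate_with_faults: Callable[[], List[Vector]],
) -> str:
    """Hypothetical orchestration of the three simulation stages."""
    # Stage 1: functional testing of the initial design prototyped in the FPGA.
    if simulate_original() != reference:
        return "original design fails functional test"
    # Stage 2: the design with embedded fault-injection means (injection
    # disabled) must reproduce the same responses, i.e. be equivalent.
    if simulate_modified() != reference:
        return "modified design is not equivalent"
    # Stage 3: only an equivalent modified design is run with faults injected.
    if simulate_with_faults() != reference:
        return "failure observed at outputs"
    return "outputs match reference under faults"
```

Stopping at the first failed stage mirrors the text: fault simulation is meaningful only once the modified design has been shown equivalent to the original.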
Let us consider the software component of the developed HSS.
GENERAL DESCRIPTION
OF SOFTWARE SYSTEM
Fig. 2 shows the structure of the software system. The software complex is built around five main components that process the IC design, generate the list of faults and the test inputs, and process the results of simulating the IC design with simulated failures. The HSS also includes CAD tools for the SoC and for the target hardware, as well as Xilinx ISE, which synthesize the netlists of the device in the respective bases.
The software system performs the following tasks:
• functional analysis of the IC design created by SoC CAD tools (represented as a netlist for the target hardware);
• detection of the IC design units in which faults are most likely to occur;
• determination of the consequences of faults for IC operation;
• generation of external influences for controlling the fault tolerance of the chip;
• generation of data for programming the hardware to simulate faults in accordance with the simulation methodology;
• modeling of the IC design operation with simulated failures;
• generation of timing diagrams of internal IC design signals during failure simulation;
• collection, analysis, storage and processing of simulation data.
The software system can simulate the operation of the target SoC implemented on the 5521 and 5529 gate array families under cosmic radiation exposure. During modeling, the time moments, types and localization of failures are determined from the known characteristics of the charged-particle streams and of the target chips. The modeling process includes several steps: testing of the source IC design, embedding fault-injection functionality into the IC design, and simulating the operation of the resulting design in the absence and in the presence of failures. At each step, the signals at the chip outputs and the internal signals specified by the user are monitored and compared with reference values. Analysis of the simulation results makes it possible to evaluate the fault tolerance of the target SoC under a given cosmic radiation exposure.
Below we describe the four components of the software system that implement its basic functionality; they have been submitted for registration as computer programs with the Federal Service for Intellectual Property.
COMPONENT FOR PROCESSING
OF IC DESIGN
The main functions of the Component for processing of IC design are generating the modifiable part of the code of the simulation-support microkernel for failure simulation, integrating it into the chip project, and processing the technology libraries of the 5521 and 5529 gate array families and the Virtex-6 LX240T FPGA.
Source data for the Component for processing of IC design:
• IC design in structural Verilog;
• library of target hardware elements in structural Verilog;
• ordered list of IC design elements for failure simulation;
• library of elements with fault-injection means in the basis of the target hardware, in structural Verilog;
• ordered list of IC design terminals for input actions;
• ordered list of IC design terminals for reading responses;
• ordered list of terminals of internal IC design elements whose state is monitored;
• clock output of the IC design.
The component produces:
• an IC design description file in structural Verilog with embedded fault-injection means and terminals for monitoring internal signals;
• the modified IC design implemented in the FPGA.
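To illustrate the modification step, the sketch below assumes a hypothetical intermediate representation in which each cell instance is a record of cell type, instance name and port-to-net map. Each element from the ordered list is swapped for its counterpart from the fault-injection library and wired to one bit of a control bus. The cell name `DFF_FI`, the extra port `FI` and the bus `fi_ctrl` are invented for the example; the paper does not describe the real library interface.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Instance:
    cell: str                                           # library cell, e.g. "DFF"
    name: str                                           # instance name, e.g. "U1"
    pins: Dict[str, str] = field(default_factory=dict)  # port -> net


def inject_fault_means(instances: List[Instance], targets: List[str],
                       fi_library: Dict[str, str]) -> List[Instance]:
    """Swap each targeted cell for its fault-injection counterpart.

    The i-th element of the ordered target list is wired to bit i of the
    hypothetical control bus 'fi_ctrl'; other instances are kept unchanged.
    """
    order = {name: i for i, name in enumerate(targets)}
    result = []
    for inst in instances:
        if inst.name in order:
            pins = dict(inst.pins)  # copy, so the input netlist is not mutated
            pins["FI"] = f"fi_ctrl[{order[inst.name]}]"
            result.append(Instance(fi_library[inst.cell], inst.name, pins))
        else:
            result.append(inst)
    return result


def to_verilog(inst: Instance) -> str:
    """Emit one structural Verilog instantiation with named port connections."""
    ports = ", ".join(f".{port}({net})" for port, net in inst.pins.items())
    return f"{inst.cell} {inst.name} ({ports});"
```

For example, a flip-flop `DFF U1 (.D(n1), .CK(clk), .Q(n2));` listed first for failure simulation becomes `DFF_FI U1 (.D(n1), .CK(clk), .Q(n2), .FI(fi_ctrl[0]));`.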
The Component for processing of IC design consists of the following modules:
• module of the intermediate representation of IC design structure and code generation;
• reader of library elements;
• module for modification of IC design;
• lexical analyzer;
• syntax analyzer.
The lexical and syntax analyzers are used to analyze the IC design file written in structural Verilog.
The IC design file is analyzed by a bottom-up parser based on LR analysis [10].
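As an illustration of the lexical stage only, here is a regular-expression tokenizer for a small, assumed subset of structural Verilog (module headers, port and wire declarations, cell instances with named port connections). The paper's syntax analyzer is an LR parser, which is not reproduced here; the token kinds and the keyword set are assumptions.

```python
import re
from typing import List, Tuple

# Assumed keyword subset; everything else that looks like a name is an IDENT.
KEYWORDS = {"module", "endmodule", "input", "output", "inout", "wire"}

TOKEN_SPEC = [
    ("COMMENT", r"//[^\n]*|/\*.*?\*/"),                    # line and block comments
    ("NUMBER", r"\d+'[bBhHdD][0-9a-fA-FxXzZ_]+|\d+"),      # sized or plain numbers
    ("IDENT", r"[A-Za-z_][A-Za-z0-9_$]*"),
    ("PUNCT", r"[()\[\];,.:]"),
    ("SKIP", r"\s+"),
    ("ERROR", r"."),                                        # anything outside the subset
]
MASTER = re.compile("|".join(f"(?P<{n}>{p})" for n, p in TOKEN_SPEC), re.DOTALL)


def tokenize(text: str) -> List[Tuple[str, str]]:
    """Split structural Verilog text into (kind, value) tokens."""
    tokens = []
    for m in MASTER.finditer(text):
        kind, value = m.lastgroup, m.group()
        if kind in ("COMMENT", "SKIP"):
            continue
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {value!r} at offset {m.start()}")
        if kind == "IDENT" and value in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, value))
    return tokens
```

The resulting token stream is what a parser, LR or otherwise, would consume; characters outside the subset (such as `=` in continuous assignments) are rejected, since a purely structural netlist should not contain them.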
GENERATOR OF LIST OF FAULTS
The main objective of the generator of list of faults (LFG) is to generate the source data on injected faults for simulating the IC design with simulated failures. To do this, the LFG determines the intervals of model time between successive fault injections, the set of chip elements affected by each injection, and the type of fault for each element at each injection.
The component can operate in two modes: basic and detailed. The detailed mode uses a stack of three models [8]: the model of external influences (MEI), the model of emergence of threats (MET) and the model of fault localization (MFL). In this mode, cosmic radiation is considered the source of failures. In the basic mode, the user specifies the parameters of the failure sources; consequently, any source of faults whose impact on the chip leads to logical failures (e.g., bit flips) can be considered in this mode.
The following data are required for the detailed mode:
• launch date of the spacecraft (SC) carrying the chip, required to calculate solar activity (SA) during the SC operation period (active lifetime, AL);
• AL of the SC (defines the calculation period);
• SC orbit parameters;
• densities of the energy spectra of space particles at different locations in near-Earth space for different phases of solar activity;
• names of the chip elements selected for modeling;
• process data of the chips, including the operating clock frequency and the voltage level;
• data on the device built on the target IC, including its operating period and the netlist of the device.
The following data are required for the basic mode:
• types of particles acting on the chip (selected from the database of external influences or created as a new type);
• names of the chip elements selected for modeling;
• operating clock frequency of the chip;
• operating period of the device;
• netlist of the device.
A particle type is characterized by the distribution law of the time between hits of particles of that type, by the probability of each failure type when the chip is hit by a particle of that type, and by the lesion area, which defines the radius of a circle in the chip plane centered at the point of incidence (all elements within this circle are affected by the particle).
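A minimal sketch of how one particle hit could be turned into faults under this particle-type model. Two choices are assumptions not stated in the paper: the time between hits is drawn from an exponential distribution (a Poisson particle flux), and the point of incidence is uniform over a rectangular chip. Element coordinates and fault-type names are hypothetical, and a fault type is drawn independently for each affected element.

```python
import math
import random
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class ParticleType:
    mean_interval: float           # mean time between hits, in clock cycles
    fault_probs: Dict[str, float]  # fault type -> probability given a hit
    lesion_radius: float           # radius of the affected circle, chip units


def next_hit(p: ParticleType, elements: Dict[str, Tuple[float, float]],
             chip_w: float, chip_h: float,
             rng: random.Random) -> Tuple[int, Dict[str, str]]:
    """Return (interval in cycles, {element: fault type}) for one particle hit."""
    # Assumption: exponential inter-arrival times; at least one cycle apart.
    interval = max(1, round(rng.expovariate(1.0 / p.mean_interval)))
    # Assumption: point of incidence uniform over the chip plane.
    hx, hy = rng.uniform(0, chip_w), rng.uniform(0, chip_h)
    kinds, weights = zip(*p.fault_probs.items())
    faults = {}
    for name, (x, y) in elements.items():
        # Every element inside the lesion circle is affected by this particle.
        if math.hypot(x - hx, y - hy) <= p.lesion_radius:
            faults[name] = rng.choices(kinds, weights=weights)[0]
    return interval, faults
```

Another distribution law can be substituted by replacing the `expovariate` call; the lesion-circle test follows the description above directly.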
The LFG produces a list of faults consisting of a "simulation parameters" header followed by the data of several experiments. The "simulation parameters" header contains the fields "SC orbit parameters" and "start date of SC". The data of each experiment consist of a "title of experiment" header and an array of k simulation packages. The "title of experiment" contains the following fields:
• array of names of the chip elements simulated in this experiment;
• coordinates of the simulated orbit site;
• data on the modulated streams of charged cosmic particles at that orbit site;
• comment describing the features of the experiment.
Each simulation package consists of the fields "offset" and "array of failures". The "offset" value is the time interval between the previous fault injection into the IC design and the injection described in this package; it is measured in cycles of the operating clock of the simulated device.
The size of the array of failures equals the number of elements selected for modeling in this experiment. Each array element contains the code of the fault injected into the corresponding element at the model time given by the "offset" field.
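The structure of the list of faults can be sketched with Python dataclasses, together with a helper that converts absolute injection times (in clock cycles) into the offset encoding described above. The field names and the fault code `NO_FAULT = 0` are hypothetical; the paper does not specify the file format or the fault codes.

```python
from dataclasses import dataclass, field
from typing import Dict, List

NO_FAULT = 0  # hypothetical code: element not affected in this package


@dataclass
class Package:
    offset: int          # cycles since the previous injection (or since start)
    failures: List[int]  # one fault code per element of the experiment


@dataclass
class Experiment:
    elements: List[str]                                    # simulated chip elements
    orbit_site: tuple = ()                                 # orbit site coordinates
    particle_streams: dict = field(default_factory=dict)   # stream data at the site
    comment: str = ""
    packages: List[Package] = field(default_factory=list)


@dataclass
class FaultList:
    orbit_parameters: dict  # "SC orbit parameters"
    start_date: str         # "start date of SC"
    experiments: List[Experiment] = field(default_factory=list)


def build_packages(elements: List[str],
                   injections: Dict[int, Dict[str, int]]) -> List[Package]:
    """Turn {absolute_cycle: {element: fault_code}} into offset-encoded packages."""
    packages, last = [], 0
    for cycle in sorted(injections):
        # The array of failures has one entry per modelled element, in order.
        codes = [injections[cycle].get(e, NO_FAULT) for e in elements]
        packages.append(Package(offset=cycle - last, failures=codes))
        last = cycle
    return packages
```

Storing offsets rather than absolute times lets the simulation side simply count down clock cycles until the next package.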
GENERATOR OF TEST INPUTS
The generator of test inputs (TIG) forms the vectors of input signals used for controlling the fault tolerance of the chip. Test inputs are formed at the functional testing stage from information on inputs and reference responses obtained during development of the target IC design.
The generator of test inputs performs the following functions:
• analysis of the source data on inputs and reference responses obtained at previous development stages;
• generation of arrays of input signal vectors and reference response vectors from this information;
• transfer of the input signal vectors to the simulation support component, which controls the other components of the software system.
Source data for the generator of test inputs:
• source file with data on inputs and reference responses;
• list of names of the IC design terminals to which input signals are applied;
• list of names of the IC design terminals used for reading responses;
• name of the IC design terminal used as the clock;
• active clock edge.
The generator of test inputs outputs arrays of input signal vectors and of reference response vectors.
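How the TIG might turn a sampled trace into vectors can be sketched as below. The trace format (one dictionary of signal values per time sample) is an assumption, as is the choice to record values at the first sample after each active clock edge; the paper does not describe the source file format.

```python
from typing import Dict, List, Tuple


def make_vectors(trace: List[Dict[str, int]], inputs: List[str],
                 outputs: List[str], clock: str,
                 edge: str = "rising") -> Tuple[List[List[int]], List[List[int]]]:
    """Sample input and reference-response vectors at each active clock edge."""
    if edge not in ("rising", "falling"):
        raise ValueError("edge must be 'rising' or 'falling'")
    # Clock transition (previous value, current value) that marks the active edge.
    want = (0, 1) if edge == "rising" else (1, 0)
    in_vecs, ref_vecs = [], []
    for prev, cur in zip(trace, trace[1:]):
        if (prev[clock], cur[clock]) == want:
            in_vecs.append([cur[s] for s in inputs])     # input signal vector
            ref_vecs.append([cur[s] for s in outputs])   # reference response vector
    return in_vecs, ref_vecs
```

The terminal lists, the clock name and the active edge correspond to the source data listed above; each returned vector keeps the order of the given terminal list.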
COMPONENT FOR PROCESSING
OF SIMULATION RESULTS
The Component for processing of simulation results of IC design controls the simulation of the design with simulated failures by comparing the arrays of response vectors obtained by simulation in the absence and in the presence of faults with the array of reference response vectors. The array of reference response vectors can be obtained from the TIG after functional testing of the route for simulating the IC design with simulated failures.
This component solves the following tasks:
• detection of the IC design units in which faults are most likely to occur;
• verification of the functional equivalence of the source IC design in the target hardware and the IC design in the FPGA basis at the functional testing stage, and of the equivalence of the latter to the modified design with embedded fault-injection means at the stage of modeling with simulated faults;
• control of the fault tolerance of the source IC design based on the simulated operability of the modified IC design in the presence of failures;
• determination of the consequences of failures for chip operation;
• generation of a report with the results of simulating the IC design with simulated failures;
• generation of timing diagrams of internal IC design signals during modeling.
Source data for this component:
• type of the simulation stage being performed;
• information about the result of the previous simulation stage;
• input signal vectors and parameters of the injected faults for each simulation cycle;
• names of the elements for fault injection;
• list of IC design control points (outputs of internal chip components whose signal values should be monitored);
• reference response vectors;
• response vectors obtained in simulation of the IC design;
• data from the module of the intermediate representation of IC design structure and code generation;
• information about the hierarchical structure of the IC design;
• information about the areas of the elements of the target hardware library.
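One simple way to combine the hierarchy and cell-area data when detecting the units where faults are most likely: under a uniform particle flux, the chance that a unit is hit is proportional to the total area of its cells. The sketch below ranks units by area share under that assumption; hierarchical paths separated by `/` and the area table are hypothetical inputs, and the paper does not state which criterion the component actually uses.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def rank_units_by_area(instances: List[Tuple[str, str]],
                       cell_area: Dict[str, float],
                       depth: int = 1) -> List[Tuple[str, float]]:
    """Rank hierarchical units by their share of the total cell area.

    instances: (hierarchical instance path, library cell name) pairs.
    depth: how many path levels define a "unit".
    """
    totals: Dict[str, float] = defaultdict(float)
    for path, cell in instances:
        unit = "/".join(path.split("/")[:depth])
        totals[unit] += cell_area[cell]
    grand = sum(totals.values())
    shares = [(unit, area / grand) for unit, area in totals.items()]
    # Largest share first: the most likely hit target under a uniform flux.
    return sorted(shares, key=lambda t: t[1], reverse=True)
```

Area alone ignores differences in the sensitivity of cell types; a per-cell weighting factor could be multiplied into `cell_area` if such data were available.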
The component produces a report file with the results of the stage of modeling the IC design with simulated failures, and a file in VCD format for displaying timing diagrams of the IC design signals. The report file contains the following information:
• total simulation time;
• number of transferred input vectors and reference response vectors;
• number of response vectors obtained from the IC design;
• result of comparing the array of reference response vectors with the responses obtained in simulation;
• list of chip outputs whose signal values do not match the reference;
• statistics on the mismatches found for each IC design output and control point, including the total number of mismatches and the simulation cycles in which they occurred;
• statistics on the faults injected during simulation (for the stage of simulation with fault injection);
• result of fault-tolerance control of the IC design, which determines the impact of failures on its operation (for the stage of simulation with fault injection).
The chip is considered operable under the influence of failure sources if the signal values at the IC design outputs coincide with the corresponding reference values at each simulation stage.
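The comparison and the operability criterion can be sketched as a cycle-by-cycle check that also gathers the per-output mismatch statistics listed for the report. This is an illustration under assumptions: responses are lists of 0/1 values in the order of `outputs`, and a difference in the number of vectors is treated as an error rather than reported.

```python
from typing import Dict, List, Sequence


def compare_responses(reference: Sequence[Sequence[int]],
                      observed: Sequence[Sequence[int]],
                      outputs: Sequence[str]) -> Dict[str, object]:
    """Compare response vectors cycle by cycle and collect mismatch statistics."""
    if len(reference) != len(observed):
        raise ValueError("numbers of reference and obtained vectors differ")
    mismatches: Dict[str, List[int]] = {name: [] for name in outputs}
    for cycle, (ref, obs) in enumerate(zip(reference, observed)):
        for name, r, o in zip(outputs, ref, obs):
            if r != o:
                mismatches[name].append(cycle)  # record the mismatching cycle
    failing = [n for n, cycles in mismatches.items() if cycles]
    return {
        "operable": not failing,                           # all outputs matched
        "failing_outputs": failing,
        "mismatch_cycles": {n: mismatches[n] for n in failing},
        "total_mismatches": sum(len(c) for c in mismatches.values()),
    }
```

Following the criterion above, the chip would be judged operable only if `operable` is true for every simulation stage.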
PROSPECTS
The paper presents a set of software modules of the HSS prototype for controlling the fault tolerance of IC designs. These modules support all stages of IC design simulation used to determine its fault tolerance. The technical solutions applied allow a flexible choice of failure sources in the chip and provide detailed information on the localization of the critical faults that caused a failure of the simulated system. FPGA prototyping speeds up fault-tolerance control compared with software simulators. The advanced fault injection method reduces the cost of determining the fault tolerance of the chip, since it makes it possible to abandon the use of particle accelerators.
The main directions for further development of the HSS, and of its software component in particular, include determining the recovery speed of the device after a critical failure, supporting dynamic generation of external influences based on the current feedback from the chip, and integrating more sophisticated MEI, MET and MFL into the complex. ■
The development was carried out with the support of the Ministry of Education and Science of the Russian Federation within the Federal Targeted Programme for Research and Development in Priority Areas of Development of the Russian Scientific and Technological Complex for 2014–2020. Unique identifier of applied research: RFMEFI57715X0161.