### Approaches to the SoC IP-Blocks' Design With Errors' Mitigation

Valentin Rozanov, Elena Suvorova

Saint-Petersburg State University of Aerospace Instrumentation





#### Errors at different stages of the IP-block lifetime

| Compilation for manufacturing          | Manufacturing                          | Exploitation       |
|----------------------------------------|----------------------------------------|--------------------|
| Compilation errors                     | Manufacturing errors                   | External influence |
| Checked by testing before exploitation | Checked by testing before exploitation |                    |
| Can be fixed if the error is detected  | Sometimes can be fixed                 | Can't be fixed     |





### Types and causes of errors in the exploitation stage of the lifetime

| Soft Errors                              | Hard Errors                      |
|------------------------------------------|----------------------------------|
| Single event upset (SEU)                 | Single event latch-up (SEL)      |
| Multiple cell upset (MCU)                | Single event gate rupture (SEGR) |
| Single event transient (SET)             |                                  |
| Single event functional interrupt (SEFI) |                                  |





#### Construction of an error-resilient SoC







#### Reconfiguration as a fault mitigation method in FPGA




**Cooperation in Telecommunications** 

### Reconfiguration as a fault mitigation method in ASIC

- Switching different elements on and off; this relies on redundancy at the level of components and connections
- Using look-up tables
- Using libraries of logical elements that allow the logic to be reconfigured (a logical element can perform different functions depending on its configuration, for example NAND, NOR, NOT)
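As a rough illustration of the last two points, the sketch below models a configurable logic element as a small look-up table in Python. The class name `ConfigurableElement`, the `LUT_CONFIGS` dictionary and the 2-input, 4-bit encoding are assumptions made for this example only; they are not taken from a specific ASIC library.

```python
# Hypothetical 2-input configurable logic element modelled as a look-up table.
# The 4 configuration bits are the truth-table outputs for the inputs
# (a, b) = (0, 0), (0, 1), (1, 0), (1, 1), in that order.
LUT_CONFIGS = {
    "NAND": (1, 1, 1, 0),
    "NOR": (1, 0, 0, 0),
    "NOT_A": (1, 1, 0, 0),  # inverts input a and ignores input b
}


class ConfigurableElement:
    """Logic element whose function is selected by loading a configuration."""

    def __init__(self, function):
        self.configure(function)

    def configure(self, function):
        # Reconfiguration: replace the stored truth table.
        self.table = LUT_CONFIGS[function]

    def evaluate(self, a, b):
        # The two input bits form the index into the truth table.
        return self.table[(a << 1) | b]


cell = ConfigurableElement("NAND")
nand_outputs = [cell.evaluate(a, b) for a in (0, 1) for b in (0, 1)]

cell.configure("NOR")  # same element, different function
nor_outputs = [cell.evaluate(a, b) for a in (0, 1) for b in (0, 1)]
```

The same physical element yields different truth tables after `configure`, which is the property that makes it usable for reconfiguration.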





#### Methods of failure assessment





# Scheme of transport layer protocol controller without reconfiguration







## Graph of non-reconfigurable controller states



- 1. Everything works correctly
- 2. Receiving branch fails, transmitting branch works
- 3. Transmitting branch fails, receiving branch works
- 4. Both branches fail





Using the Chapman-Kolmogorov equation to calculate the probability of being in each state

For the non-reconfigurable variant considered:

$$P_{n} = \begin{bmatrix} p_{n11} & p_{n12} & p_{n13} & p_{n14} \\ 0 & p_{n22} & 0 & p_{n24} \\ 0 & 0 & p_{n33} & p_{n34} \\ 0 & 0 & 0 & p_{n44} \end{bmatrix} \qquad P_{n}^{*}(0) = [1,0,0,0],$$

 $P_{n1}^* + P_{n2}^* + P_{n3}^* + P_{n4}^* = 1$ 

 $P_n^*(t) = [P_{n1}^* < 0.1, P_{n2}^* < 0.1, P_{n3}^* < 0.1, P_{n4}^* > 0.99]$ 
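The iteration behind this stop condition can be sketched in Python as repeated multiplication of the state-probability row vector by the transition matrix. The slides do not give the individual entries $p_{nij}$, so the values below are an assumption: each branch is taken to fail independently at every step, with the fail probabilities $p_{mr}$ (receiving) and $p_{mt}$ (transmitting) listed elsewhere in the slides. The function names are hypothetical, and the resulting step count depends on these assumed entries, so it need not match the authors' figure.

```python
P_MR = 0.001  # assumed: per-step fail probability of the receiving branch
P_MT = 0.002  # assumed: per-step fail probability of the transmitting branch

# Assumed entries for the 4-state matrix (independent failures per step).
# Only the zero pattern is taken from the slide.
P_N = [
    [(1 - P_MR) * (1 - P_MT), P_MR * (1 - P_MT), (1 - P_MR) * P_MT, P_MR * P_MT],
    [0.0, 1 - P_MT, 0.0, P_MT],  # state 2: only the transmitter can still fail
    [0.0, 0.0, 1 - P_MR, P_MR],  # state 3: only the receiver can still fail
    [0.0, 0.0, 0.0, 1.0],        # state 4: absorbing failure state
]


def step(dist, matrix):
    """One Chapman-Kolmogorov step: row vector times transition matrix."""
    n = len(dist)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def steps_until_failed(matrix, start, threshold=0.99, max_steps=1_000_000):
    """Iterate until the last (failure) state exceeds the threshold.

    Once the last state is above 0.99, every other state is necessarily
    below 0.01, so the '< 0.1' conditions hold automatically.
    """
    dist = list(start)
    for t in range(1, max_steps + 1):
        dist = step(dist, matrix)
        if dist[-1] > threshold:
            return t, dist
    raise RuntimeError("failure threshold not reached within max_steps")


steps, final = steps_until_failed(P_N, [1.0, 0.0, 0.0, 0.0])
print(steps, [round(p, 4) for p in final])
```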





### Probability of being in states 1-4 as a function of time



5009 steps were made to reach  $P_n^*(t) = [P_{n1}^* < 0.1, P_{n2}^* < 0.1, P_{n3}^* < 0.1, P_{n4}^* > 0.99]$ 





## Scheme of transport layer protocol controller with reconfiguration







# Graph of controller states with reconfiguration in states 2 or 3



- 1. Everything works correctly
- 2. Receiving branch fails, transmitting branch works
- 3. Transmitting branch fails, receiving branch works
- 4. Reconfiguration (entered from state 2)
- 5. Reconfiguration (entered from state 3)
- 6. Both branches fail





### Comparison of the non-reconfigurable and reconfigurable graphs







#### Using the Chapman-Kolmogorov equation to calculate the probability of being in each state

For the reconfigurable variant considered:

$$P_{r} = \begin{bmatrix} p_{r11} & p_{r12} & p_{r13} & 0 & 0 & p_{r16} \\ 0 & 0 & 0 & p_{r24} & 0 & p_{r26} \\ 0 & 0 & 0 & 0 & p_{r35} & p_{r36} \\ 0 & 0 & 0 & p_{r44} & 0 & p_{r46} \\ 0 & 0 & 0 & 0 & p_{r55} & p_{r56} \\ 0 & 0 & 0 & 0 & 0 & p_{r66} \end{bmatrix}$$

 $P_r^{*}(0) = [1,0,0,0,0,0].$ 

$$p_{mr}=0.001, p_{mt}=0.002$$

 $P_{r1}^* + P_{r2}^* + P_{r3}^* + P_{r4}^* + P_{r5}^* + P_{r6}^* = 1$ 

 $P_{r}^{*}(t) = [P_{r1}^{*} < 0.1, P_{r2}^{*} < 0.1, P_{r3}^{*} < 0.1, P_{r4}^{*} < 0.1, P_{r5}^{*} < 0.1, P_{r6}^{*} > 0.99]$ 
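A minimal Python sketch of the six-state case follows. It builds the matrix from a list of nonzero transitions that mirrors the zero pattern shown above, checks that every row sums to 1, and iterates until state 6 exceeds 0.99. The numeric entries are assumptions, since the slides give only $p_{mr}$ and $p_{mt}$: failures are taken as independent per step, and after reconfiguration only the surviving branch is assumed able to fail. The names `EDGES`, `build_matrix` and `steps_to_absorption` are hypothetical, and the step count obtained this way need not match the authors' figure.

```python
P_MR, P_MT = 0.001, 0.002  # fail probabilities from the slides

# Nonzero transitions keyed by (from_state, to_state), 1-based as in the graph.
# The pattern follows the slide; the values are assumptions for illustration.
EDGES = {
    (1, 1): (1 - P_MR) * (1 - P_MT),
    (1, 2): P_MR * (1 - P_MT),
    (1, 3): (1 - P_MR) * P_MT,
    (1, 6): P_MR * P_MT,
    (2, 4): 1 - P_MT, (2, 6): P_MT,  # reconfigure, unless the transmitter fails too
    (3, 5): 1 - P_MR, (3, 6): P_MR,  # reconfigure, unless the receiver fails too
    (4, 4): 1 - P_MT, (4, 6): P_MT,  # running on the surviving unit
    (5, 5): 1 - P_MR, (5, 6): P_MR,
    (6, 6): 1.0,                     # absorbing failure state
}


def build_matrix(edges, n):
    """Fill an n x n matrix from the edge list and check row sums."""
    matrix = [[0.0] * n for _ in range(n)]
    for (i, j), p in edges.items():
        matrix[i - 1][j - 1] = p
    for row in matrix:
        if abs(sum(row) - 1.0) > 1e-12:
            raise ValueError("each row of a transition matrix must sum to 1")
    return matrix


def steps_to_absorption(matrix, start, threshold=0.99, max_steps=1_000_000):
    """Apply P*(t+1) = P*(t) P until the last state exceeds the threshold."""
    n = len(start)
    dist = list(start)
    for t in range(1, max_steps + 1):
        dist = [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        if dist[-1] > threshold:
            return t, dist
    raise RuntimeError("failure threshold not reached within max_steps")


P_R = build_matrix(EDGES, 6)
steps_r, final_r = steps_to_absorption(P_R, [1.0, 0.0, 0.0, 0.0, 0.0, 0.0])
print(steps_r, [round(p, 4) for p in final_r])
```

Keeping the transitions as an edge list makes it easy to compare the code against the state graph and to try other assumed probabilities.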





### Probability of being in states 1-6 as a function of time



4551 steps were made to reach  $P_r^*(t) = [P_{r1}^* < 0.1, P_{r2}^* < 0.1, P_{r3}^* < 0.1, P_{r4}^* < 0.1, P_{r5}^* < 0.1, P_{r6}^* > 0.99]$ 





#### Comparison of the two results as graphs

- Non-reconfigurable: 4 states, 5009 steps made
- Reconfigurable: 6 states, 4551 steps made

[Figure: two plots of the probability of being in each state versus discrete time (logarithmic axis, $10^{0}$ to $10^{4}$): states 1-4 for the non-reconfigurable controller and states 1-6 for the reconfigurable controller.]





#### Results of calculation

| Parameter                        | Non-reconfigurable controller       | Reconfigurable controller           | Difference |
|----------------------------------|-------------------------------------|-------------------------------------|------------|
| Number of states                 | 4                                   | 6                                   | 2          |
| Value of fail probability        | $p_{mr} = 0.001, p_{mt} = 0.002$    | $p_{mr} = 0.001, p_{mt} = 0.002$    | -          |
| Starting values of probabilities | $P_n^{*}(0) = [1, 0, 0, 0]$         | $P_r^{*}(0) = [1, 0, 0, 0, 0, 0]$   | =          |
| Ending values of probabilities   | $P_{n4}^{*} > 0.99$, others $< 0.1$ | $P_{r6}^{*} > 0.99$, others $< 0.1$ | =          |
| Number of steps to fail          | 5009                                | 4551                                | 10%        |
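The slides do not say which step count the 10% difference is measured against. As a sketch of the arithmetic for both possible reference values:

$$5009 - 4551 = 458, \qquad \frac{458}{4551} \approx 0.1006, \qquad \frac{458}{5009} \approx 0.0914$$

Relative to the reconfigurable controller's step count the difference is about 10.1%; relative to the non-reconfigurable one it is about 9.1%.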





#### Advantages and Disadvantages

#### Disadvantages

- The speed of receiving and transmitting data may be lower, because one memory unit is used for both directions;
- If the last remaining memory unit breaks down, the controller fails immediately.

#### Advantages

- Ensures full operability of the controller even if one of the memory units fails;
- Maintains the required area occupied by the NoC in terms of memory elements.





### Thank you! Questions?!



