7.4.5 Requirements for quantifying the impact of random hardware failures
Note: Annex A of GB/T 20438.6-2017 provides an overview of the steps necessary to achieve the required hardware safety integrity and explains the relationship between this subclause and the other requirements of GB/T 20438.
7.4.5.1 For each safety function, the safety integrity achieved by the E/E/PE safety-related system with respect to random hardware failures (including soft errors) and random failures of data communication processes shall be estimated in accordance with 7.4.5.2 and 7.4.11, and the result shall be less than or equal to the target failure measure specified in the E/E/PE system safety requirements specification (see 7.10 of GB/T 20438.1-2017).
Note: To demonstrate that this requirement has been met, an appropriate technique (see 7.4.5.2) should be used to predict the reliability of the function concerned, and the result compared with the target failure measure of the relevant safety function (see GB/T 20438.1).
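As a purely illustrative aid (not part of the standard), the comparison of an achieved failure measure against the target failure measure bands can be sketched as follows. The band limits are the low demand mode target failure measures of GB/T 20438.1 (IEC 61508-1); the function name is an assumption of this sketch:

```python
def sil_from_pfd_avg(pfd_avg):
    """Return the highest SIL band (low demand mode) whose target
    failure measure is satisfied by the achieved average probability
    of dangerous failure on demand, per GB/T 20438.1 (IEC 61508-1).
    Returns 0 if even the SIL 1 band is not met."""
    bands = [(4, 1e-5, 1e-4),   # SIL 4: >= 1e-5 to < 1e-4
             (3, 1e-4, 1e-3),   # SIL 3: >= 1e-4 to < 1e-3
             (2, 1e-3, 1e-2),   # SIL 2: >= 1e-3 to < 1e-2
             (1, 1e-2, 1e-1)]   # SIL 1: >= 1e-2 to < 1e-1
    for sil, lower, upper in bands:
        if lower <= pfd_avg < upper:
            return sil
    # Below the SIL 4 lower limit the standard still caps the claim at SIL 4.
    return 4 if pfd_avg < 1e-5 else 0

print(sil_from_pfd_avg(4.0e-3))  # → 2
```

The achieved value must fall within (or below) the band of the target failure measure specified for the safety function.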
7.4.5.2 The estimation, required by 7.4.5.1, of the failure measure achieved for each safety function shall take the following into account:
a) the architecture of the E/E/PE safety-related system and its subsystems, as it relates to the safety function under consideration;
Note 1: It is necessary to determine which failure modes of the components place them in a series configuration (i.e. any single failure causes the safety function concerned to fail) and which place them in a parallel configuration (i.e. the safety function fails only when several failures occur concurrently).
b) the architecture of the components of each subsystem of the E/E/PE safety-related system, as it relates to the safety function under consideration;
c) the estimated failure rate of each subsystem and its components in any mode that could cause a dangerous failure of the E/E/PE safety-related system but which is detected by diagnostic tests (see 7.4.9.4 to 7.4.9.5). The failure rates used shall be justified in terms of the data source and its accuracy or margin; this may include reviewing and comparing data from different sources and selecting the failure rates that best match the combination of systems under consideration. The failure rates used to quantify the effect of random hardware failures, and those used to calculate the safe failure fraction or diagnostic coverage, shall take account of the specified operating conditions;
Note 2: Taking operating conditions into account means that failure rates taken from databases often have to be adjusted, for example for contact loading and temperature effects.
d) the susceptibility of the E/E/PE safety-related system and its subsystems to common cause failures (see Notes 3 and 4); the assumptions made shall be justified;
Note 3: Common cause failures may arise from causes other than actual hardware component failures (for example, electromagnetic interference or decoding errors). Nevertheless, GB/T 20438 requires such failures to be taken into account when calculating the effect of random hardware failures. Diverse testing of the components reduces the likelihood of common cause failures.
Note 4: If common cause failures can occur between the E/E/PE safety-related system and the source of the demand, or other protection layers, it is necessary to demonstrate that this factor was taken into account when the safety integrity level and target failure measure requirements were determined. A method for assessing common cause factors is given in Annex D of GB/T 20438.6-2017.
e) the diagnostic coverage of the diagnostic tests for each subsystem (determined in accordance with Annex C), the associated diagnostic test interval, and the rate of dangerous failures due to random hardware failures that remain undetected. Where relevant, only diagnostic tests that satisfy 7.4.5.3 shall be taken into account. MTTR and MRT shall be considered in the reliability model (see 3.6.21 and 3.6.22 of GB/T 20438.4-2017);
Note 5: When determining the diagnostic test interval, all test intervals related to diagnostic coverage need to be considered.
f) the proof test interval for revealing dangerous faults that are undetected by the diagnostic tests;
g) whether the proof test is 100 % effective;
Note 6: If, because the proof test is imperfect, the safety function cannot be restored to an "as good as new" state, the probability of failure increases accordingly. The assumptions made shall be justified, including in particular the replacement interval of components or the effect on the risk reduction over the lifetime of the safety function. If the test is carried out off-line, the duration of the test should be taken into account.
h) the repair time after detection of a failure;
Note 7: The mean repair time (MRT) forms part of the mean time to restoration (MTTR) (see 3.6.22 and 3.6.21 of GB/T 20438.4-2017).
The MTTR also includes the time taken to detect a failure and any period during which repair cannot be carried out (see Annex B of GB/T 20438.6-2017 for how the probability of failure is calculated using MTTR and MRT). Immediate repair can be assumed only where the EUC is shut down, or is repaired, in a safe state. Where it is not possible to shut down the EUC and carry out the repair in a safe state, it is particularly important to take full account of any period during which repair cannot be carried out, especially when that period is relatively long. All factors relevant to the repair should be considered.
i) if human action is required to implement the safety function, the effect of random human error shall be taken into account;
Note 8: Where a person receives an alarm of an unsafe condition and has to take action, the random nature of human error should be considered and the probability of human error included in the overall calculation.
j) the modelling method. In practice a variety of modelling methods exist, and the most appropriate method is determined by the analyst and depends on the circumstances. Possible methods include:
cause consequence analysis (B.6.6.2 of GB/T 20438.7-2017), fault tree analysis (B.6.6.5 of GB/T 20438.7-2017), Markov models (Annex B of GB/T 20438.6-2017 and B.6.6.6 of GB/T 20438.7-2017), reliability block diagrams (Annex B of GB/T 20438.6-2017 and B.6.6.7 of GB/T 20438.7-2017) and Petri nets (Annex B of GB/T 20438.6-2017 and B.2.3.3 of GB/T 20438.7-2017).
Note 9: Annex B of GB/T 20438.6-2017 gives a simple method for estimating the average probability of dangerous failure on demand of a safety function due to random hardware failures, in order to determine whether the architecture meets the target failure measure.
Note 10: A.2 of GB/T 20438.6-2017 gives the steps necessary to achieve the required hardware safety integrity and explains how this subclause relates to the other requirements of GB/T 20438.
Note 11: The reliability of the safety-related system needs to be calculated separately for each safety function. Because the failure modes of the components differ, the architecture of the E/E/PE safety-related system (in terms of redundancy) may also differ from one safety function to another.
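To illustrate how the factors listed above interact, the following sketch evaluates a single-channel (1oo1) subsystem in low demand mode, following the form of the simplified equations in Annex B of GB/T 20438.6 (IEC 61508-6). The function and parameter names are assumptions of this sketch, not terms defined by the standard:

```python
def pfd_avg_1oo1(lambda_d, dc, proof_test_h, mttr_h, mrt_h):
    """Average probability of dangerous failure on demand of a
    single-channel (1oo1) subsystem, low demand mode, after the
    simplified equations in Annex B of GB/T 20438.6 (IEC 61508-6):
      lambda_DD = DC * lambda_D        detected dangerous failures,
                                       down for the MTTR
      lambda_DU = (1 - DC) * lambda_D  undetected dangerous failures,
                                       revealed only by the proof test,
                                       so down on average for half the
                                       proof test interval plus the MRT
    All times in hours; lambda_d in failures per hour."""
    lambda_dd = dc * lambda_d
    lambda_du = (1.0 - dc) * lambda_d
    return lambda_du * (proof_test_h / 2.0 + mrt_h) + lambda_dd * mttr_h

# Example: lambda_D = 1e-6 /h, 90 % diagnostic coverage,
# annual proof test (8760 h), MTTR = MRT = 8 h.
pfd = pfd_avg_1oo1(1.0e-6, 0.90, 8760.0, 8.0, 8.0)
print(f"{pfd:.2e}")  # about 4.5e-4
```

The example shows why the proof test interval usually dominates: the undetected term contributes far more than the detected term, so improving diagnostic coverage or shortening the proof test interval are the most effective levers.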
7.4.5.3 When quantifying the effect of random hardware failures of a subsystem, if a component is used in a subsystem that has a hardware fault tolerance of 0, and the subsystem implements a safety function, or part of a safety function, operating in high demand or continuous mode of operation, the diagnostics may be claimed only if one of the following conditions is met:
a) the sum of the diagnostic test interval and the time to perform the specified action to achieve or maintain a safe state is less than the process safety time;
b) when operating in high demand mode of operation, the ratio of the diagnostic test rate to the demand rate equals or exceeds 100.
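The two acceptance conditions of 7.4.5.3 reduce to simple checks. A minimal sketch; the function and parameter names are illustrative assumptions:

```python
def condition_a_met(diag_test_interval_s, fault_reaction_s,
                    process_safety_time_s):
    """Condition a): the diagnostic test interval plus the time to
    perform the specified action to achieve or maintain a safe state
    must be less than the process safety time."""
    return diag_test_interval_s + fault_reaction_s < process_safety_time_s

def condition_b_met(diag_test_rate_per_h, demand_rate_per_h):
    """Condition b), high demand mode only: the diagnostic test rate
    must be at least 100 times the demand rate."""
    return diag_test_rate_per_h >= 100.0 * demand_rate_per_h

# Diagnostics may be claimed for the HFT = 0 subsystem if either holds:
print(condition_a_met(0.5, 0.3, 2.0) or condition_b_met(3600.0, 1.0))  # True
```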
7.4.5.4 The diagnostic test interval of any subsystem that:
has a hardware fault tolerance greater than 0 and implements a safety function, or part of a safety function, operating in high demand or continuous mode of operation; or
implements a safety function, or part of a safety function, operating in low demand mode of operation,
shall be such that the sum of the diagnostic test interval and the time to perform the repair of a detected failure is less than the MTTR used in calculating the safety integrity of the safety function.
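The consistency check of 7.4.5.4 can be sketched as follows (names are illustrative assumptions of this sketch):

```python
def mttr_assumption_valid(diag_test_interval_h, repair_time_h, mttr_used_h):
    """7.4.5.4: the diagnostic test interval plus the time to repair a
    detected failure must be less than the MTTR value that was assumed
    when the safety integrity of the safety function was calculated."""
    return diag_test_interval_h + repair_time_h < mttr_used_h

print(mttr_assumption_valid(1.0, 4.0, 8.0))  # True: 1 h + 4 h < 8 h
print(mttr_assumption_valid(6.0, 4.0, 8.0))  # False: the MTTR assumption
                                             # in the model is violated
```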
7.4.5.5 If, for a particular design, the safety integrity requirements of the safety function concerned are not achieved, then:
a) identify the components, subsystems and/or parameters that contribute most to the failure measure;
b) evaluate the effect of feasible improvement measures on the identified critical components, subsystems or parameters (for example, use of more reliable components, additional measures against common cause failures, improved diagnostic coverage, increased redundancy, shorter proof test intervals, diverse testing, etc.);
c) select and implement feasible improvement measures;
d) repeat the steps necessary to determine the new probability of random hardware failure.
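Step a) above amounts to ranking contributions to the overall failure measure. A minimal sketch, assuming the individual contributions are small enough to add approximately (series combination); the subsystem names are hypothetical:

```python
def rank_contributors(failure_measure_by_item):
    """Rank components/subsystems by their contribution to the overall
    failure measure, assuming the contributions approximately add
    (series combination of small probabilities). Returns a list of
    (name, value, share_of_total), largest contributor first."""
    total = sum(failure_measure_by_item.values())
    ranked = sorted(failure_measure_by_item.items(),
                    key=lambda kv: kv[1], reverse=True)
    return [(name, value, value / total) for name, value in ranked]

ranked = rank_contributors({"sensor": 5.0e-3,
                            "logic_solver": 2.0e-4,
                            "final_element": 8.0e-3})
print(ranked[0][0])  # the final element dominates; improve it first
```

Steps b) to d) then iterate: apply a candidate improvement to the dominant contributor, recompute the failure measure, and repeat until the target is met.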