Analysis and Improvement of the Safety of Component-Based Systems

Today's embedded and safety-critical systems incorporate increasing amounts of software. Consequently, the software architecture and its connection to hardware elements have a major impact on the safety of those systems. In this paper, we present an approach, together with its implementation in the Fujaba4Eclipse environment, for the analysis and improvement of component-based systems w.r.t. safety that specifically exploits the software and system structure. The proposed analysis technique uses the system structure and each element's failure propagation to reason about the possibility of a hazard as well as its probability. Additionally, our approach supports the specification and execution of reusable architectural transformations based on the graph transformation formalism to improve the safety of the system.

Introduction

Software has become the driving force in the evolution of many technical systems and in some areas grows at an exponential rate. As a consequence, system engineers face dramatically increasing complexity due to the cooperation of previously isolated functions, e.g. in the domain of automotive software [8, 11]. To counter the effect of growing complexity, systems are often built in a component-based fashion. A concrete system then is a specific composition of reusable components.
Technical systems are often employed in a safety-critical context such as in automobiles or airplanes where many human lives depend on their safe operation. The software components of such a system, in particular their interaction and their distribution throughout the system, have a tremendous impact on its safety. Thus, when reasoning about the safety of a system, both the hardware structure and the software structure as well as the deployment relation between them - basically the whole system architecture - have to be taken into account.
In [1, 2], we presented approaches and their implementation in the Fujaba research prototype for modeling the structure as well as the real-time and hybrid behavior of safety-critical embedded systems. In both approaches, formal verification is used to detect systematic faults in the behavioral models.
We present a complementary approach [7, 6, 22] and its implementation in the Eclipse version of the Fujaba research prototype [17] to tackle random faults. The impact of random faults by their propagation throughout the system architecture is explored using a component-based hazard analysis. The system architecture can then be improved by the application of fault tolerance techniques which are formalized using graph transformations.


Figure 1: Approach overview

Figure 1 shows an overview of the approach. In a first step, a developer creates a model of the system's architecture. Besides the hardware and software components and the connections between them, the developer specifies the abstract behavior of the architectural elements w.r.t. faults, errors, and failures, following the fault pathology proposed by Laprie [15]. In addition, safety-critical system states, called hazards, are defined as certain combinations of failures.
Based on this model, a qualitative and quantitative hazard analysis is performed by the same or a different developer. If the results of the hazard analysis do not satisfy the system's safety requirements, the system can be changed in order to improve its safety, e.g. by the integration of fault tolerance techniques. A manual change of the system, however, can result in additional systematic faults due to the increased complexity. Consequently, we propose to automate the application of fault tolerance techniques by reusable formal model transformation specifications. The developer chooses the appropriate transformation from a transformation library. The transformation is then automatically applied to the system architecture model.

System Modeling

We use UML 2.0 Components to model the structure of the system. As hardware failures have a direct influence on the software, we use a generalized model of components which distinguishes hardware types and software types. Further, we use UML 2.0 Deployment Diagrams to deploy software component instances on a hardware component instance (e.g. an electronic control unit) and to model the network topology which connects the hardware components. We use special deployment ports in addition to standard software ports to describe the possible effects of the hardware on the software components.
We use the fault pathology concept [15] for the specification of a component type's abstract behavior w.r.t. faults, errors, and failures. A manifested fault is an error. An error may lead to a failure, and a failure is an externally visible deviation from the correct behavior. A failure results in a fault for other components which depend on that component. We omit faults from our model, as only their manifestations as errors are relevant for a hazard analysis, not their dormant state. To model the events of the system which can lead to a hazard, a component type is extended by a failure propagation. We use Boolean logic to specify a failure propagation. In general, a failure propagation consists of a set of outgoing failure variables, a set of incoming failure variables, a set of internal error variables, and failure dependencies. The specified failures and errors are typed, and we distinguish the general failure classes crash, timing, and value failure. The approach additionally supports user-defined failure classifications. Figure 2 shows an example of a failure propagation. Component type Comp propagates a value failure on port into a value failure on port out.


Figure 2: Component with Failure Propagation
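The structure of a failure propagation can be sketched in Python as follows. This is only an illustration under stated assumptions, not the Fujaba4Eclipse data model: the class FailurePropagation, the port name "in", and the internal error "e1" are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set, Tuple

# A variable is a pair (port or error name, failure class).
# All concrete names below are hypothetical.
Var = Tuple[str, str]


@dataclass
class FailurePropagation:
    incoming: Set[Var]   # incoming failure variables
    errors: Set[Var]     # internal error variables
    # One Boolean dependency per outgoing failure variable.
    dependencies: Dict[Var, Callable[[Set[Var]], bool]]

    def outgoing_failures(self, active: Set[Var]) -> Set[Var]:
        """Return the outgoing failures caused by the active inputs/errors."""
        unknown = active - (self.incoming | self.errors)
        if unknown:
            raise ValueError(f"unknown variables: {sorted(unknown)}")
        return {out for out, holds in self.dependencies.items() if holds(active)}


IN_VALUE = ("in", "value")    # value failure arriving on the input port
E_VALUE = ("e1", "value")     # assumed internal value error
OUT_VALUE = ("out", "value")  # value failure leaving on port out

# Comp passes an incoming value failure on to port out; the internal
# error is an additional assumption made for this sketch.
comp = FailurePropagation(
    incoming={IN_VALUE},
    errors={E_VALUE},
    dependencies={OUT_VALUE: lambda a: IN_VALUE in a or E_VALUE in a},
)
```

Calling comp.outgoing_failures({IN_VALUE}) yields the set containing only OUT_VALUE, while an empty input set yields no outgoing failure.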

After modeling the component types and their failure propagations, the software components are deployed on hardware components. On this instance view, hazards are specified.
To describe the occurrence of hazards, we use standard fault tree analysis. Hence, the hazardous event is shown as the top of a fault tree which is caused by a combination of outgoing failure variables and AND/OR operators.
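A hazard condition of this kind can be written down as a small AND/OR tree over outgoing failure variables. The following Python sketch assumes a nested-tuple encoding and hypothetical instance and variable names (s1, s2, "wrong_actuation"); it is not taken from the tool.

```python
def holds(tree, failures):
    """Evaluate an AND/OR fault tree against a set of occurred failures.

    A tree is either a failure variable name (leaf) or a tuple
    ("and" | "or", child, child, ...).
    """
    if isinstance(tree, str):
        return tree in failures
    op, *children = tree
    results = [holds(child, failures) for child in children]
    if op == "and":
        return all(results)
    if op == "or":
        return any(results)
    raise ValueError(f"unknown operator: {op}")


# Hypothetical hazard: instance s1 delivers a wrong value AND instance s2
# either delivers a wrong value or has crashed.
wrong_actuation = ("and", "s1.out.value", ("or", "s2.out.value", "s2.out.crash"))
```

With this encoding, holds(wrong_actuation, {"s1.out.value", "s2.out.crash"}) is true, whereas a single failure of s1 alone does not trigger the hazard.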

Hazard Analysis

The structural and behavioral models support different types of hazard analysis. A qualitative analysis is used to answer which hazards result from a set of given basic errors (bottom up) as well as which errors must occur in order for a given hazard to happen (top down).
This qualitative analysis is accompanied by a quantitative analysis which computes the hazard's probability or risk. In contrast to recent works in the field of component-based hazard analysis [4, 18, 13, 10], our hazard analysis approach uses the system architecture and supports both user-defined failure classifications and cyclic models.

1. Top Down

We presented an approach to compute the errors for a given hazard and the hazard's probability and risk in [7, 6]. Like other works [3, 19], our approach employs binary decision diagrams (BDDs) for efficient operation.
The system failure propagation is a combination of the failure propagations of all component instances in the deployment diagram with automatically inferred failure propagations of the connectors between components. The system failure propagation is then combined with the hazard definition for the top-down hazard analysis.
The failure variables are only used to connect the failure propagations of the components and the connectors. They only occur as a result of errors and are not associated with a probability. Consequently, they can be removed from the failure propagation of the whole model using the techniques presented in [19].
The resulting Boolean formula then only consists of those basic errors whose combination results in the hazard. The prime implicants of a Boolean formula are of special interest in a hazard analysis since they denote smallest hazard scenarios. The prime implicants for the hazard are efficiently computable from the BDD-representation.
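The elimination of failure variables and the search for smallest hazard scenarios can be illustrated with a brute-force Python sketch. Note the assumptions: it enumerates error subsets instead of using BDDs, it handles only acyclic, monotone (AND/OR) propagations, and all component and variable names are hypothetical. For monotone formulas, the minimal satisfying error sets coincide with the prime implicants.

```python
from itertools import combinations


def holds(formula, errors, definitions):
    """Evaluate a formula; failure variables are replaced by their definition."""
    if isinstance(formula, str):
        if formula in definitions:            # failure variable: substitute
            return holds(definitions[formula], errors, definitions)
        return formula in errors              # basic error
    op, *children = formula
    values = [holds(child, errors, definitions) for child in children]
    if op == "and":
        return all(values)
    if op == "or":
        return any(values)
    raise ValueError(f"unknown operator: {op}")


def minimal_scenarios(hazard, definitions, basic_errors):
    """Return the minimal sets of basic errors that cause the hazard."""
    found = []
    for size in range(len(basic_errors) + 1):
        for combo in combinations(sorted(basic_errors), size):
            candidate = frozenset(combo)
            if any(smaller <= candidate for smaller in found):
                continue                      # a subset already suffices
            if holds(hazard, candidate, definitions):
                found.append(candidate)
    return found


# Hypothetical system: two redundant sensors s1, s2 feed a controller c.
definitions = {
    "s1.out": "s1.e",
    "s2.out": "s2.e",
    "c.out": ("or", "c.e", ("and", "s1.out", "s2.out")),
}
scenarios = minimal_scenarios("c.out", definitions, {"s1.e", "s2.e", "c.e"})
```

For this example, scenarios contains exactly two sets: the controller error alone, and the combination of both sensor errors.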
In addition to the possibility of a hazard occurrence, its probability is computable if (independent) probabilities are known for the basic errors. This probability is recursively computed on the BDD in time linear in the size of the BDD, as shown in [3, 6].
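The recursive computation follows the Shannon decomposition P(f) = p_x * P(f with x true) + (1 - p_x) * P(f with x false) at each BDD node. The Python sketch below assumes a hand-built BDD encoded as nested tuples and purely illustrative probabilities; the variable names are hypothetical. Caching the result per node means every node is evaluated once.

```python
def probability(node, p, cache=None):
    """Probability that the function represented by a BDD node is true.

    A node is True, False, or a tuple (variable, low_child, high_child).
    p maps each variable to its independent probability of being true.
    """
    if cache is None:
        cache = {}
    if node is True:
        return 1.0
    if node is False:
        return 0.0
    key = id(node)                   # shared nodes are evaluated only once
    if key not in cache:
        var, low, high = node
        cache[key] = (p[var] * probability(high, p, cache)
                      + (1.0 - p[var]) * probability(low, p, cache))
    return cache[key]


# BDD for: c.e OR (s1.e AND s2.e), variable order c.e < s1.e < s2.e.
n_s2 = ("s2.e", False, True)
n_s1 = ("s1.e", False, n_s2)
root = ("c.e", n_s1, True)

# Illustrative error probabilities, not measured data.
p = {"c.e": 0.01, "s1.e": 0.1, "s2.e": 0.1}
hazard_probability = probability(root, p)   # 0.01 + 0.99 * 0.1 * 0.1
```

With these assumed values the hazard probability evaluates to 0.0199, up to floating-point rounding.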

2. Bottom Up

We employ a simulation of the failure propagation to determine which hazards occur for a given set of errors. The simulation additionally enables the developer to see the propagation path of the errors through the system architecture. It starts with the given set of errors and recursively evaluates and executes the failure propagations and hazard conditions. It finishes when no further failure propagations and hazard conditions are executable. The result is a deployment diagram which is annotated with the propagation paths of the failure propagations.
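Such a simulation amounts to a fixed-point iteration. The following Python sketch makes several assumptions: propagations and hazards are given as plain Boolean functions over the set of active variables, the propagations are monotone, and all names are hypothetical. Because every variable is activated at most once, the loop also terminates on cyclic models. For monotone conditions, checking the hazards after the fixed point is reached gives the same result as checking them during the iteration.

```python
def simulate(injected_errors, propagations, hazards):
    """Propagate injected errors until no further failure can be derived.

    Returns the set of hazards that occur and the order in which failure
    variables became active (a simple stand-in for the propagation path).
    """
    active = set(injected_errors)
    trace = []
    changed = True
    while changed:
        changed = False
        for target, condition in propagations.items():
            if target not in active and condition(active):
                active.add(target)
                trace.append(target)
                changed = True
    occurred = {name for name, condition in hazards.items() if condition(active)}
    return occurred, trace


# Hypothetical system: two redundant sensors feed a controller.
propagations = {
    "s1.out": lambda a: "s1.e" in a,
    "s2.out": lambda a: "s2.e" in a,
    "c.out": lambda a: "c.e" in a or ("s1.out" in a and "s2.out" in a),
}
hazards = {"wrong_actuation": lambda a: "c.out" in a}

occurred, trace = simulate({"s1.e", "s2.e"}, propagations, hazards)
```

Injecting both sensor errors activates s1.out, s2.out, and then c.out, so the hypothetical hazard wrong_actuation occurs; injecting only s1.e stops after s1.out and causes no hazard.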

Improvement

The analysis step identifies which errors in which component instances of the modeled system ultimately lead to safety-critical situations. In order to keep the system operating as safely as possible, such situations should be avoided. The triggering errors are inherently unavoidable. Their effect on the system, however, can be minimized using fault tolerance techniques (cf. [21]).
We support the semi-automatic application of fault tolerance techniques to an existing system model. The steps which are necessary to implement such a technique in an existing model are formally specified by transformations. The specification of transformations is based on controlled graph rewriting using an extension of Story diagrams [5] which we call transformation diagrams. Transformation diagrams are special UML activity diagrams in which the activities are defined by graph transformation rules [20]. The rules are specified w.r.t. the metamodel of the system model, i.e. they are typed by the metamodel and thus are representatives of partial system models.


Figure 3: Example activity of a Transformation Diagram

Figure 3 shows an example of a rule inside an activity that copies a given component instance, represented by the instance object. Starting from it, the instances link is traversed backwards to bind the type of the instance. After that, a new component instance of the same type is created by creating the copy object (denoted by the stereotype «create») and linking it to the type object via an instances link.
Like activity diagrams, transformation diagrams allow activities, i.e. graph transformation rules, to be connected in complex control flow consisting of sequences, alternatives, and iterations. Unless a graph transformation rule is explicitly specified to be iterated, it is executed only once when reached by the control flow. The different rules contained in one transformation diagram may build upon each other in such a way that they may directly access subgraphs which were bound or created by previously applied rules. The instance object in Figure 3, for example, must have been bound before by another rule.
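The effect of the copy rule can be mimicked on a plain object graph. The Python sketch below is an assumption-laden illustration, not generated Story diagram code: the classes ComponentType and ComponentInstance stand in for the metamodel, and copy_instance is a hypothetical name.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class ComponentInstance:
    name: str


@dataclass(eq=False)
class ComponentType:
    name: str
    # The 'instances' link from a type to its instances.
    instances: List[ComponentInstance] = field(default_factory=list)


def copy_instance(types: List[ComponentType],
                  instance: ComponentInstance,
                  copy_name: str) -> Optional[ComponentInstance]:
    """Apply the copy rule to an already bound instance object."""
    # Match: traverse the 'instances' link backwards to bind the type.
    ctype = next(
        (t for t in types if any(i is instance for i in t.instances)), None
    )
    if ctype is None:
        return None                       # no match: rule is not applicable
    copy = ComponentInstance(copy_name)   # corresponds to the «create» object
    ctype.instances.append(copy)          # corresponds to the «create» link
    return copy


sensor = ComponentType("Sensor")
s1 = ComponentInstance("s1")
sensor.instances.append(s1)
s1_copy = copy_instance([sensor], s1, "s1_copy")
```

After the call, the Sensor type has two instances, s1 and s1_copy. If the given instance is not linked to any type, the rule does not match and nothing is created.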
Transformation diagrams may call other transformation diagrams to facilitate composition and reuse. Similar to method declarations in programming languages, a transformation diagram has a signature consisting of parameters and result declarations. In order to call a transformation, the caller has to supply all required arguments. Results declared by the called transformation may in turn be bound and used by a calling transformation. Each transformation must declare at least one parameter representing an element of the system model. Consequently, one element of the model is always known and used as a starting point for the transformation.
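The analogy to method declarations can be made concrete with two Python functions, where one transformation calls the other inside an iteration. This is a sketch under assumptions: the model is reduced to a dictionary from component type names to instance names, and copy_instance and replicate are hypothetical transformations, not entries of an actual transformation library.

```python
from typing import Dict, List

# Simplified model: component type name -> names of its instances.
Model = Dict[str, List[str]]


def copy_instance(model: Model, instance: str, copy_name: str) -> str:
    """Transformation with parameter 'instance' and result 'copy'."""
    for instances in model.values():
        if instance in instances:
            instances.append(copy_name)
            return copy_name
    raise ValueError(f"instance {instance!r} is not part of the model")


def replicate(model: Model, instance: str, count: int) -> List[str]:
    """Calling transformation: iterates copy_instance and binds its results."""
    copies = []
    for index in range(1, count + 1):
        copies.append(copy_instance(model, instance, f"{instance}_r{index}"))
    return copies


model = {"Sensor": ["s1"], "Controller": ["c"]}
replicas = replicate(model, "s1", 2)
```

Both functions take an element of the model as a required parameter, mirroring the rule that every transformation has a known starting point. Here replicas is ["s1_r1", "s1_r2"], and the Sensor type ends up with three instances.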
Multiple transformation diagrams can be packed into a library to make them available for application. In order to apply a fault tolerance technique, the engineer chooses the appropriate transformation from the respective transformation library and supplies the arguments for the transformation's parameters. After that, the transformation can be executed automatically resulting in a transformed model.
Our approach for transforming the system structure differs from recent works [9, 12] by the support of control flow in the transformation specifications. The approach in [14] differs from our approach in that it only targets the behavioral but not the structural aspects of the system architecture. This approach might complement ours.

Conclusions and Future Work

We presented an approach for the analysis and improvement of the safety of component-based systems. The approach is prototypically implemented as a set of plugins for the Fujaba4Eclipse environment.
The approach takes advantage of the system architecture. It supports the modeling of the architecture including the architectural elements' abstract failure propagations. Using this system architecture model, a simulation of the failure propagation based on injected errors can be performed. Additionally, the possibility of a hazard's occurrence and its probability can be analyzed. If the results of the hazard analysis do not satisfy the safety requirements, the system architecture can be changed by formal architecture transformations to automate the application of fault tolerance techniques. The transformations are based on the graph transformation formalism. We additionally employ our transformation diagrams formalism in the domain of software reengineering [16].
Our top-down hazard analysis approach additionally supports the analysis of models with structural variants since more and more embedded systems are available in different variants or even reconfigure during runtime. The approach, amongst other things [6], computes the best and worst variant w.r.t. the hazard probability, a hazard risk, or a weighted sum of hazard risks.

Future Work

The approach and the accompanying research prototype can be improved in several ways. It would benefit from checking the correctness of the modeled abstract failure propagation behavior of each component w.r.t. its functional behavior. Additionally, the failure propagation might be (semi-)automatically inferable from the functional behavior. The transformations only adapt the structure of the system architecture. If new components are introduced by the application of a transformation, our approach requires that the functional behavior of the component is provided by the engineer. We achieved early results for automatically synthesizing the real-time behavior of a voting component of a triple modular redundancy setup from the communication protocols of the involved components as discussed in [22]. The correctness of the transformations themselves is important. We are currently working on verification techniques which allow us to prove that transformations do not violate certain structural properties when applied to a model (cf. [16]).

References

[1] S. Burmester, H. Giese, S. Henkler, M. Hirsch, M. Tichy, A. Gambuzza, E. Münch, and H. Vöcking. Tool support for developing advanced mechatronic systems: Integrating the fujaba real-time tool suite with camel-view. In Proc. of the 29th International Conference on Software Engineering (ICSE), Minneapolis, Minnesota, USA, pages 801-804, May 2007.

[2] S. Burmester, H. Giese, M. Hirsch, D. Schilling, and M. Tichy. The Fujaba Real-Time Tool Suite: Model-Driven Development of Safety-Critical, Real-Time Systems. In Proc. of the 27th International Conference on Software Engineering (ICSE), St. Louis, Missouri, USA, pages 670-671, May 2005.

[3] O. Coudert and J. Madre. Fault tree analysis: 10^20 prime implicants and beyond. In Proceedings of the Annual Reliability and Maintainability Symposium, pages 240-245, Atlanta, GA, USA, January 1993. IEEE Press.

[4] P. Fenelon, J. A. McDermid, M. Nicolson, and D. J. Pumfrey. Towards integrated safety analysis and design. ACM SIGAPP Applied Computing Review, 2(1):21-32, 1994.

[5] T. Fischer, J. Niere, L. Torunski, and A. Zündorf. Story diagrams: A new graph rewrite language based on the unified modeling language. In Proc. of the 6th International Workshop on Theory and Application of Graph Transformation (TAGT), Paderborn, Germany, LNCS 1764, pages 296-309. Springer Verlag, November 1998.

[6] H. Giese and M. Tichy. Component-Based Hazard Analysis: Optimal Designs, Product Lines, and Online-Reconfiguration. In Proc. of the 25th International Conference on Computer Safety, Security and Reliability, Gdansk, Poland, LNCS. Springer Verlag, September 2006.

[7] H. Giese, M. Tichy, and D. Schilling. Compositional Hazard Analysis of UML Components and Deployment Models. In Proc. of the 23rd International Conference on Computer Safety, Reliability and Security (SAFECOMP), Potsdam, Germany, volume 3219 of Lecture Notes in Computer Science. Springer Verlag, September 2004.

[8] K. Grimm. Software technology in an automotive company: major challenges. In ICSE '03: Proceedings of the 25th International Conference on Software Engineering, pages 498-503, Washington, DC, USA, 2003. IEEE Computer Society.

[9] L. Grunske. Transformational Patterns for the Improvement of Safety. In Proc. of the Second Nordic Conference on Pattern Languages of Programs (VikingPLoP 03). Microsoft Business Press, 2003.

[10] L. Grunske and R. Neumann. Quality Improvement by Integrating Non-Functional Properties in Software Architecture Specification. In Proc. of the Second Workshop on Evaluating and Architecting System dependabilitY (EASY), 6 October 2002, San Jose, California, USA, October 2002.

[11] B. Hardung, T. Kölzow, and A. Krüger. Reuse of software in distributed embedded automotive systems. In EMSOFT '04: Proceedings of the 4th ACM international conference on Embedded software, pages 203-210, New York, NY, USA, 2004. ACM Press.

[12] M. H. Kacem, A. H. Kacem, M. Jmaiel, and K. Drira. Describing dynamic software architectures using an extended uml model. In SAC '06: Proceedings of the 2006 ACM symposium on Applied computing, pages 1245-1249, New York, NY, USA, 2006. ACM Press.

[13] B. Kaiser, P. Liggesmeyer, and O. Maeckel. A New Component Concept for Fault Trees. In Proceedings of the 8th National Workshop on Safety Critical Systems and Software (SCS 2003), Canberra, Australia. 9-10th October 2003, volume 33 of Research and Practice in Information Technology, 2003.

[14] S. S. Kulkarni and A. Ebnenasir. Enhancing The Fault-Tolerance of Nonmasking Programs. In Proc. of the 23rd International Conference on Distributed Computing Systems, pages 441-449. IEEE Computer Society, 2003.

[15] J. C. Laprie, editor. Dependability : basic concepts and terminology in English, French, German, Italian and Japanese [IFIP WG 10.4, Dependable Computing and Fault Tolerance], volume 5 of Dependable computing and fault tolerant systems. Springer Verlag, Wien, 1992.

[16] M. Meyer. Pattern-based Reengineering of Software Systems. In Proceedings of the 13th Working Conference on Reverse Engineering (WCRE 2006), Benevento, Italy, pages 305-306. IEEE Computer Society, October 2006.

[17] M. Meyer and L. Wendehals. Teaching Object-Oriented Concepts with Eclipse. In Proc. of the Eclipse Technology eXchange Workshop (ETX), Satellite Event of the 19th Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), Vancouver, Canada, pages 1-5. ACM Press, October 2004.

[18] Y. Papadopoulos, J. McDermid, R. Sasse, and G. Heiner. Analysis and synthesis of the behaviour of complex programmable electronic systems in conditions of failure. Reliability Engineering & System Safety, 71:229-247, March 2001.

[19] A. Rauzy. A new methodology to handle Boolean models with loops. IEEE Transactions on Reliability, 52:96-105, March 2003.

[20] G. Rozenberg, editor. Handbook of Graph Grammars and Computing by Graph Transformation : Foundations. World Scientific Pub Co, February 1997. Volume 1.

[21] N. Storey. Safety-Critical Computer Systems. Addison-Wesley, 1996.

[22] M. Tichy. Pattern-based synthesis of fault-tolerant embedded systems. In Proc. of the Doctoral Symposium of the Fourteenth ACM SIGSOFT Symposium on Foundations of Software Engineering (FSE), Portland, Oregon, USA, pages 13-18, Nov. 2006.
