Today's embedded and safety-critical systems incorporate increasing amounts of software. Consequently, the software architecture and its connection to hardware elements have a big impact on the safety of those systems. In this paper, we present an approach, and its implementation in the Fujaba4Eclipse environment, for analyzing and improving the safety of component-based systems which specifically exploits the software and system structure. The proposed analysis technique utilizes the system structure and each element's failure propagation to reason about the possibility of a hazard as well as its probability. Additionally, our approach supports the specification and execution of reusable architectural transformations based on the graph transformation formalism to improve the safety of the system.
Software has become the driving force in the evolution of many
technical systems and in some areas grows at an exponential rate. As a
consequence, system engineers face a dramatically increasing complexity
due to the cooperation of previously isolated functions, e.g. in the
domain of automotive software [8, 11]. To counter the effect of growing
complexity, systems are often built in a component-based fashion. A
concrete system then is a specific composition of reusable components.
Technical systems are often employed in a safety-critical context such
as in automobiles or airplanes where many human lives depend on their
safe operation. The software components of such a system, in particular
their interaction and their distribution throughout the system, have a
tremendous impact on its safety. Thus, when reasoning about the safety
of a system, both the hardware structure and the software structure as
well as the deployment relation between them - basically the whole
system architecture - have to be taken into account.
In [2, 1], we presented approaches and their implementation in the
Fujaba research prototype for modeling the structure as well as the
real-time and hybrid behavior of safety-critical embedded systems. In
both approaches, formal verification is used to detect systematic
faults in the behavioral models.
We present a complementary approach [7, 6, 22] and its implementation
in the Eclipse version of the
Fujaba research prototype [17] to tackle random faults. The impact of
random faults by their propagation throughout the system architecture
is explored using a component-based hazard analysis. The system
architecture can then be improved by the application of fault tolerance
techniques which
are formalized using graph transformations.
Figure 1 shows an overview of the approach. In a first step, a
developer creates a model of the system's architecture. Besides the
hardware and software components and the connections between them, the
developer specifies the abstract behavior of the architectural elements
w.r.t. faults, errors, and failures, following the fault pathology
proposed
by Laprie [15]. In addition, safety-critical system states, called
hazards, are defined as certain combinations of failures.
Based on this model, a qualitative and quantitative hazard analysis is
performed by the same or a different developer. If the results of the
hazard analysis do not satisfy the system's safety requirements, the
system can be changed in order to improve its safety, e.g. by the
integration of fault tolerance techniques. A manual change of the
system, however, can result in additional systematic faults due to the
increased complexity. Consequently, we propose to automate the
application of fault tolerance techniques by reusable formal model
transformation specifications. The developer chooses the appropriate
transformation from a transformation library. The transformation is
then automatically applied to the system architecture model.
We use UML 2.0 Components to model the structure of the system. As
hardware failures have a direct influence on the software we use a
generalized model of components which distinguishes hardware types and
software types. Further, we use UML 2.0 Deployment Diagrams to deploy
software component instances on a hardware component instance (e.g. an
electrical control unit) and to model the network topology which
connects the hardware components. We use special deployment ports in
addition to standard software ports to describe the possible effects of
the hardware on the software components.
We use the fault pathology concept [15] for the specification of a
component type's abstract behavior w.r.t. faults, errors, and failures.
A manifested fault is an error. An error may lead to a failure and a
failure is an externally visible deviation from the correct behavior. A
failure results in a fault for other components which depend on that
component. We omit faults from our model, as only their manifestations
as errors are relevant for a hazard analysis, not their dormant state.
To model the events of the system which can lead to a hazard, a
component type is extended by a failure propagation. We use Boolean
logic to specify a failure propagation. In general, a failure
propagation consists of a set of outgoing failure variables, a set of
incoming failure variables, a set of internal error variables, and
failure dependencies. The specified failures and errors are typed and
we distinguish the general failure classes crash, timing, and value
failure. The approach additionally supports user-defined failure
classifications. Figure 2 shows an example of a failure propagation.
Component type Comp propagates a value failure on port in into a value
failure on port out.
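The structure of a failure propagation can be illustrated with a minimal Python sketch. It is not the Fujaba4Eclipse implementation: the class ComponentType, the method evaluate, the variable naming scheme "<class>@<port>", and the internal error e_value are hypothetical and chosen only for illustration. The dependency shown mirrors the example of Comp, extended by an assumed internal value error.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet

# Hypothetical names throughout; this is an illustration, not the tool's API.
@dataclass
class ComponentType:
    name: str
    incoming: FrozenSet[str]  # incoming failure variables
    errors: FrozenSet[str]    # internal error variables
    # failure dependencies: outgoing failure variable -> Boolean function
    outgoing: Dict[str, Callable[[Dict[str, bool]], bool]]

    def evaluate(self, assignment: Dict[str, bool]) -> Dict[str, bool]:
        """Return the truth value of every outgoing failure variable."""
        # Variables not mentioned in the assignment default to False.
        env = {v: False for v in self.incoming | self.errors}
        env.update(assignment)
        return {out: dep(env) for out, dep in self.outgoing.items()}

# Comp passes a value failure from port "in" to port "out".
# The internal error "e_value" is an assumption added for illustration.
comp = ComponentType(
    name="Comp",
    incoming=frozenset({"value@in"}),
    errors=frozenset({"e_value"}),
    outgoing={"value@out": lambda env: env["value@in"] or env["e_value"]},
)

result = comp.evaluate({"value@in": True})
```

Keeping the dependencies as Boolean functions over named variables makes it straightforward to compose component types later, since connectors only need to equate an outgoing variable of one instance with an incoming variable of another.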
After modeling the component types and their failure propagations,
the software components are deployed on hardware components. On this
instance view, hazards are specified.
To describe the occurrence of hazards, we use standard fault tree
analysis. Hence, the hazardous event is shown as the top of a fault
tree which is caused by a combination of outgoing failure variables and
AND/OR operators.
The structural and behavioral models support different types of
hazard analysis. A qualitative analysis is used to answer which hazards
result from a set of given basic errors (bottom up) as well as which
errors must occur in order for a given hazard to happen (top down).
This qualitative analysis is accompanied by a quantitative analysis
which computes the hazard's probability or risk. In contrast to recent
works in the field of component-based hazard analysis [4, 18, 13, 10],
our hazard analysis approach utilizes the system architecture and
supports both user-defined failure classifications and cyclic models.
We presented an approach to compute the errors for a given hazard
and the hazard's probability and risk in [7, 6]. Like other works
[3, 19], our approach employs binary decision diagrams (BDDs) for
efficient operation.
The system failure propagation is a combination of the failure
propagations of all component instances in the deployment diagram with
automatically inferred failure propagations of the connectors between
components. The system failure propagation is then combined with the
hazard definition for the top-down hazard analysis.
The failure variables are only used to connect the failure propagations
of the components and the connectors. They only occur as a result of
errors and are not associated with a probability. Consequently, they
can be removed from the failure propagation of the whole model using
the techniques presented in [19].
The resulting Boolean formula then only consists of those basic errors
whose combination results in the hazard. The prime implicants of a
Boolean formula are of special interest in a hazard analysis since they
denote smallest hazard scenarios. The prime implicants for the hazard
are efficiently
computable from the BDD-representation.
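To make the notion of smallest hazard scenarios concrete, the following sketch enumerates them by brute force. It deliberately does not use BDDs and is exponential in the number of errors. It also assumes that the hazard formula is monotone (additional errors never remove the hazard), in which case the minimal error sets coincide with the prime implicants. The function name and the example hazard are hypothetical.

```python
from itertools import combinations

def minimal_hazard_scenarios(errors, hazard):
    """Enumerate the minimal sets of basic errors that trigger `hazard`.

    Assumes a monotone hazard formula. Brute force for illustration only;
    this is not the BDD-based computation.
    """
    errors = sorted(errors)
    minimal = []
    # Visit candidate sets by increasing size so that smaller scenarios
    # are found before any of their supersets.
    for size in range(len(errors) + 1):
        for subset in combinations(errors, size):
            active = set(subset)
            if any(m <= active for m in minimal):
                continue  # contains a smaller scenario, hence not minimal
            if hazard(active):
                minimal.append(frozenset(active))
    return minimal

# Hypothetical hazard: both sensors err, or the controller errs.
def hazard(active):
    return ("e_s1" in active and "e_s2" in active) or "e_ctrl" in active

scenarios = minimal_hazard_scenarios({"e_s1", "e_s2", "e_ctrl"}, hazard)
```

For this example, the minimal scenarios are the single controller error and the pair of sensor errors.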
In addition to the possibility of a hazard occurrence, its probability
is computable if (independent) probabilities are known for the basic
errors. This probability is recursively computed on the BDD in time
linear in the size of the BDD, as shown in [3, 6].
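The recursion behind this computation is the Shannon expansion P(f) = p_x * P(f | x) + (1 - p_x) * P(f | not x). The sketch below applies it directly to a hazard given as a Python function. Unlike a BDD, it does not share subresults and is therefore exponential in the number of errors; it only illustrates the recursion. The function name, the example hazard, and the probabilities are assumptions.

```python
def hazard_probability(hazard, probs):
    """Probability that `hazard` holds, assuming independent basic errors.

    Shannon expansion over a fixed variable order. A BDD evaluates the same
    recursion once per shared node; this sketch does not share subresults.
    """
    order = sorted(probs)

    def expand(index, active):
        if index == len(order):
            return 1.0 if hazard(active) else 0.0
        var = order[index]
        p = probs[var]
        return (p * expand(index + 1, active | {var})
                + (1.0 - p) * expand(index + 1, active))

    return expand(0, frozenset())

# Hypothetical hazard and error probabilities.
def hazard(active):
    return ("e_s1" in active and "e_s2" in active) or "e_ctrl" in active

probs = {"e_s1": 0.1, "e_s2": 0.2, "e_ctrl": 0.01}
p_hazard = hazard_probability(hazard, probs)
```

For these values, the result is approximately 0.0298, which matches the closed form 1 - (1 - 0.1 * 0.2) * (1 - 0.01).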
We employ a simulation of the failure propagation to determine which hazards occur for a given set of errors. This simulation additionally enables the developer to see the propagation path of the errors through the system architecture. The simulation starts with the given set of errors and recursively evaluates and executes the failure propagations and hazard conditions. It finishes when no further failure propagations and hazard conditions are executable. The result is a deployment diagram which is annotated with the propagation paths of the failure propagations.
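Such a simulation can be sketched as a fixpoint iteration. The representation of failure propagations as (condition, produced failure) pairs, the function name simulate, and the example system are hypothetical simplifications and do not reflect the actual implementation.

```python
def simulate(initial_errors, rules, hazards):
    """Forward-propagate errors until no further rule fires (a fixpoint).

    rules:   list of (condition, produced_failure); a condition is a function
             over the set of currently active errors and failures.
    hazards: dict mapping a hazard name to a condition over the same set.
    Returns the active set, the firing order, and the hazards that hold.
    """
    active = set(initial_errors)
    trace = []
    changed = True
    while changed:
        changed = False
        for condition, failure in rules:
            if failure not in active and condition(active):
                active.add(failure)
                trace.append(failure)
                changed = True
    occurred = sorted(name for name, cond in hazards.items() if cond(active))
    return active, trace, occurred

# Hypothetical chain: sensor error -> sensor output -> controller input
# (connector) -> controller output. Rules are listed in reverse order on
# purpose to show that several passes may be needed.
rules = [
    (lambda a: "value@ctrl.in" in a, "value@ctrl.out"),
    (lambda a: "value@sensor.out" in a, "value@ctrl.in"),
    (lambda a: "e_sensor" in a, "value@sensor.out"),
]
hazards = {"H_wrong_actuation": lambda a: "value@ctrl.out" in a}

active, trace, occurred = simulate({"e_sensor"}, rules, hazards)
```

Because the active set only grows and each failure is added at most once, the loop terminates even if the propagation rules form a cycle. The recorded trace corresponds to the propagation path that would be annotated in the deployment diagram.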
The analysis step identifies which errors in which component
instances of the modeled system ultimately lead to safety-critical
situations. In order to keep the system operating as safely as
possible, such situations should be avoided. The triggering errors are
inherently unavoidable. Their effect on the system, however, can be
minimized using fault tolerance techniques (cf. [21]).
We support the semi-automatic application of fault tolerance techniques
to an existing system model. The steps which are necessary to implement
such a technique in an existing model are formally specified by
transformations. The specification of transformations is based on
controlled graph
rewriting using an extension of Story diagrams [5] which we call
transformation diagrams. Transformation diagrams are special UML
activity diagrams in which the activities are defined by graph
transformation rules [20]. The rules are specified w.r.t. the metamodel
of the system model, i.e. they are typed by the metamodel and thus are
representatives of partial system models.
Figure 3 shows an example of a rule inside an activity that copies
a given component instance, represented by the instance object.
Starting from it, the instances link is traversed backwards to bind the
type of the instance. After that, a new component instance of the same
type is created by creating the copy object (denoted by the stereotype
«create» ) and linking it to the type object via an instances link.
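The effect of this rule can be approximated in plain Python on a toy object model. The classes and the function copy_instance are hypothetical stand-ins for the metamodel and the rule; an actual graph transformation engine performs the matching generically rather than by hand-written code.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ComponentInstance:
    name: str

@dataclass
class ComponentTypeNode:
    name: str
    # The `instances` link from a type to its instances.
    instances: List[ComponentInstance] = field(default_factory=list)

def copy_instance(model_types: List[ComponentTypeNode],
                  instance: ComponentInstance,
                  copy_name: str) -> Optional[ComponentInstance]:
    """Apply the copy rule once; return the created instance or None."""
    # Match: traverse the `instances` link backwards to bind the type.
    matches = [t for t in model_types
               if any(i is instance for i in t.instances)]
    if not matches:
        return None  # no match, the rule is not applicable
    type_node = matches[0]
    # Create: a new instance object and a new `instances` link to the type.
    copy = ComponentInstance(copy_name)
    type_node.instances.append(copy)
    return copy

sensor_type = ComponentTypeNode("Sensor")
s1 = ComponentInstance("s1")
sensor_type.instances.append(s1)
s2 = copy_instance([sensor_type], s1, "s2")
```

The split into a matching part and a creating part mirrors the structure of a graph transformation rule: the already bound instance object and the type object form the left-hand side, and the copy object with its link is added on the right-hand side.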
Like activity diagrams, transformation diagrams allow activities, i.e.
graph transformation rules, to be connected in complex control flows
consisting of sequences, alternatives, and iterations. Unless a graph
transformation rule is explicitly specified to be iterated, it is
executed only once when reached by the control flow. The different
rules contained in one transformation diagram may build upon each other
in such a way that they directly access subgraphs which were
bound or created by previously applied rules. The instance object in
Figure 3, for example, must have been bound before by another rule.
Transformation diagrams may call other transformation diagrams to
facilitate composition and reuse. Similar to method declarations in
programming languages, a transformation diagram has a signature
consisting of parameters and result declarations. In order to call a
transformation, the caller has to supply all required arguments.
Results declared by the called transformation may in turn be bound and
used by a calling transformation. Each transformation must declare at
least one parameter representing an element of the system model.
Consequently, one element of the model is always known and used as a
starting point for the transformation.
Multiple transformation diagrams can be packed into a library to make
them available for application. In order to apply a fault tolerance
technique, the engineer chooses the appropriate transformation from the
respective transformation library and supplies the arguments for the
transformation's parameters. After that, the transformation can be
executed automatically resulting in a transformed model.
Our approach for transforming the system structure differs from recent
works [9, 12] by the support of control flow in the transformation
specifications. The approach in [14] differs from our approach in that
it only targets the behavioral but not the structural aspects of the
system architecture. This approach might complement ours.
We presented an approach for the analysis and improvement of the
safety of component-based systems. The approach is prototypically
implemented as a set of plugins for the Fujaba4Eclipse environment.
The approach takes advantage of the system architecture. It supports
the modeling of the architecture including the architectural elements'
abstract failure propagations. Using this system architecture model, a
simulation of the failure propagation based on injected errors can be
performed. Additionally, the possibility of a hazard's occurrence and
its probability can be analyzed. If the results of the hazard analysis
do not satisfy the safety requirements, the system
architecture can be changed by formal architecture transformations to
automate the application of fault tolerance techniques. The
transformations are based on the graph transformation formalism. We
additionally employ our transformation diagrams formalism in the domain
of software reengineering [16].
Our top-down hazard analysis approach additionally supports the
analysis of models with structural variants since more and more
embedded systems are available in different variants or even
reconfigure during runtime. The approach, amongst other things [6],
computes the best and worst variant w.r.t. the hazard probability, a
hazard risk, or a weighted sum of hazard risks.
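The selection of the best and worst variant by a weighted sum of hazard risks can be illustrated as follows. The sketch enumerates the variants explicitly and takes the per-variant hazard risks as given inputs. The variant names, risk values, and weights are invented for illustration, and the sketch makes no claim about how the approach in [6] computes them.

```python
def rank_variants(risks, weights):
    """Return (best, worst) variant names by weighted sum of hazard risks.

    risks:   dict mapping a variant name to a dict of hazard -> risk value.
    weights: dict mapping a hazard to its weight.
    """
    def score(variant):
        return sum(weights[h] * r for h, r in risks[variant].items())

    ordered = sorted(risks, key=score)
    return ordered[0], ordered[-1]

# Invented example values.
risks = {
    "single_sensor": {"H1": 0.10, "H2": 0.02},
    "dual_sensor": {"H1": 0.01, "H2": 0.03},
}
weights = {"H1": 1.0, "H2": 2.0}

best, worst = rank_variants(risks, weights)
```

With these values, the dual-sensor variant has the lower weighted sum and is selected as the best variant, although it is slightly worse with respect to hazard H2 alone.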
The approach and the accompanying research prototype can be improved in several ways. It would benefit from checking the correctness of the modeled abstract failure propagation behavior of each component w.r.t. its functional behavior. Additionally, the failure propagation might be (semi-)automatically inferable from the functional behavior. The transformations only adapt the structure of the system architecture. If new components are introduced by the application of a transformation, our approach requires that the functional behavior of the component is provided by the engineer. We achieved early results for automatically synthesizing the real-time behavior of a voting component of a triple modular redundancy setup from the communication protocols of the involved components as discussed in [22]. The correctness of the transformations themselves is important. We are currently working on verification techniques which allow us to prove that transformations do not violate certain structural properties when applied to a model (cf. [16]).
[1] S. Burmester, H. Giese, S. Henkler, M. Hirsch, M. Tichy, A. Gambuzza, E. Münch, and H. Vöcking. Tool support for developing advanced mechatronic systems: Integrating the Fujaba Real-Time Tool Suite with CAMeL-View. In Proc. of the 29th International Conference on Software Engineering (ICSE), Minneapolis, Minnesota, USA, pages 801-804, May 2007.
[2] S. Burmester, H. Giese, M. Hirsch, D. Schilling, and M. Tichy. The Fujaba Real-Time Tool Suite: Model-Driven Development of Safety-Critical, Real-Time Systems. In Proc. of the 27th International Conference on Software Engineering (ICSE), St. Louis, Missouri, USA, pages 670-671, May 2005.
[3] O. Coudert and J. Madre. Fault tree analysis: 10^20 prime implicants and beyond. In Proceedings of the Annual Reliability and Maintainability Symposium, pages 240-245, Atlanta, GA, USA, January 1993. IEEE Press.
[4] P. Fenelon, J. A. McDermid, M. Nicolson, and D. J. Pumfrey. Towards integrated safety analysis and design. ACM SIGAPP Applied Computing Review, 2(1):21-32, 1994.
[5] T. Fischer, J. Niere, L. Torunski, and A. Zündorf. Story diagrams: A new graph rewrite language based on the unified modeling language. In Proc. of the 6th International Workshop on Theory and Application of Graph Transformation (TAGT), Paderborn, Germany, LNCS 1764, pages 296-309. Springer Verlag, November 1998.
[6] H. Giese and M. Tichy. Component-Based Hazard Analysis: Optimal Designs, Product Lines, and Online-Reconfiguration. In Proc. of the 25th International Conference on Computer Safety, Security and Reliability, Gdansk, Poland, LNCS. Springer Verlag, September 2006.
[7] H. Giese, M. Tichy, and D. Schilling. Compositional Hazard Analysis of UML Components and Deployment Models. In Proc. of the 23rd International Conference on Computer Safety, Reliability and Security (SAFECOMP), Potsdam, Germany, volume 3219 of Lecture Notes in Computer Science. Springer Verlag, September 2004.
[8] K. Grimm. Software technology in an automotive company: major challenges. In ICSE '03: Proceedings of the 25th International Conference on Software Engineering, pages 498-503, Washington, DC, USA, 2003. IEEE Computer Society.
[9] L. Grunske. Transformational Patterns for the Improvement of Safety. In Proc. of the Second Nordic Conference on Pattern Languages of Programs (VikingPLoP 03). Microsoft Business Press, 2003.
[10] L. Grunske and R. Neumann. Quality Improvement by Integrating Non-Functional Properties in Software Architecture Specification. In Proc. of the Second Workshop on Evaluating and Architecting System dependabilitY (EASY), 6 October 2002, San Jose, California, USA, October 2002.
[11] B. Hardung, T. Kölzow, and A. Krüger. Reuse of software in distributed embedded automotive systems. In EMSOFT '04: Proceedings of the 4th ACM international conference on Embedded software, pages 203-210, New York, NY, USA, 2004. ACM Press.
[12] M. H. Kacem, A. H. Kacem, M. Jmaiel, and K. Drira. Describing dynamic software architectures using an extended uml model. In SAC '06: Proceedings of the 2006 ACM symposium on Applied computing, pages 1245-1249, New York, NY, USA, 2006. ACM Press.
[13] B. Kaiser, P. Liggesmeyer, and O. Maeckel. A New Component Concept for Fault Trees. In Proceedings of the 8th National Workshop on Safety Critical Systems and Software (SCS 2003), Canberra, Australia. 9-10th October 2003, volume 33 of Research and Practice in Information Technology, 2003.
[14] S. S. Kulkarni and A. Ebnenasir. Enhancing The Fault-Tolerance of Nonmasking Programs. In Proc. of the 23rd International Conference on Distributed Computing Systems, pages 441-449. IEEE Computer Society, 2003.
[15] J. C. Laprie, editor. Dependability : basic concepts and terminology in English, French, German, Italian and Japanese [IFIP WG 10.4, Dependable Computing and Fault Tolerance], volume 5 of Dependable computing and fault tolerant systems. Springer Verlag, Wien, 1992.
[16] M. Meyer. Pattern-based Reengineering of Software Systems. In Proceedings of the 13th Working Conference on Reverse Engineering (WCRE 2006), Benevento, Italy, pages 305-306. IEEE Computer Society, October 2006.
[17] M. Meyer and L. Wendehals. Teaching Object-Oriented Concepts with Eclipse. In Proc. of the Eclipse Technology eXchange Workshop (ETX), Satellite Event of the 19th Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), Vancouver, Canada, pages 1-5. ACM Press, October 2004.
[18] Y. Papadopoulos, J. McDermid, R. Sasse, and G. Heiner. Analysis and synthesis of the behaviour of complex programmable electronic systems in conditions of failure. Reliability Engineering & System Safety, 71:229-247, March 2001.
[19] A. Rauzy. A new methodology to handle Boolean models with loops. IEEE Transactions on Reliability, 52:96-105, March 2003.
[20] G. Rozenberg, editor. Handbook of Graph Grammars and Computing by Graph Transformation : Foundations. World Scientific Pub Co, February 1997. Volume 1.
[21] N. Storey. Safety-Critical Computer Systems. Addison-Wesley, 1996.
[22] M. Tichy. Pattern-based synthesis of fault-tolerant embedded systems. In Proc. of the Doctoral Symposium of the Fourteenth ACM SIGSOFT Symposium on Foundations of Software Engineering (FSE), Portland, Oregon, USA, pages 13-18, Nov. 2006.