Analysis and Improvement of the Safety of Component-Based Systems

This page demonstrates our approach to component-based hazard analysis using an example. For the concept and references, please use this link.

Example

To outline our approach, we employ an example that stems from the RailCab research project at the University of Paderborn. The RailCab project was founded at the University of Paderborn in 1998 in order to develop a new railway system that combines the advantages of public and individual transport in terms of cost and fuel efficiency as well as flexibility and comfort. The system is characterized by autonomous vehicles traveling on demand instead of trains running according to a fixed schedule. For real-life validation of the complex system, a test track at a scale of 1:2.5 was built at the University of Paderborn in 2002.
One particular problem is to reduce the energy consumption due to air resistance by coordinating the autonomously operating RailCabs in such a way that they build convoys whenever possible. Such convoys are built on demand and require a small distance between the RailCabs. In convoy mode, the leader sends position reference values to the following RailCab, which uses them to control its speed. Additionally, the system employs a doubly-fed linear drive using magnetic fields in the RailCab and on the tracks for acceleration. The RailCab sends set values concerning the speed, direction and amplitude of the magnetic field wave to the controller on the track side in order to accelerate and brake.
Figure 1 shows the communication structure between the RailCabs and the inverter modules on the tracks which contain the magnetic field wave controller. We use a simplified version of the system architecture of the 1:2.5 test track within this demonstration.


Figure 1: Communication structure for a RailCab convoy on the test track

System Modeling

1. Component Types and Failure Propagation

We start the demonstration by showing the component type definition for the speed control component of a RailCab (upper part of Figure 2). The speed control component has several ports for input data (e.g. port inPos for position reference values from the leader RailCab and port inSensor for speed sensor values), a port for a connection to the execution hardware (port ex), and an output port out for sending set values to the magnetic field controller.


Figure 2: Specification of component type and failure propagation
The failure propagation of a component type is specified in addition to its structural properties like ports and interfaces. The failure propagation of the value failure is shown in the lower part of the figure. A value failure on the outgoing port out manifests when a value failure on the ex port or a value failure on the inSensor port manifests. On its lower right side, the failure propagation view additionally contains a list of all available failure-type/port combinations for use in the failure propagation formula.

2. Deployment

We present the deployment diagram of Figure 3 in the next step. The upper part contains the simplified system architecture for a single RailCab. In the lower right part, the controller for the magnetic field wave on the track is shown.


Figure 3: Diagram showing the software component structure and its deployment to hardware components
The hardware part of the RailCab's system architecture consists of two ECUs (dSPACE DS1006 rapid prototyping boards), two sensors (a distance sensor and a speed sensor), and one wireless network device (DataEagle1000). The software components Comm, ConvoyOperator, PosCtrl, DistCtrl, and SpeedCtrl are deployed on the same ECU instance. The component for the communication between the RailCab and the magnetic field wave controller is deployed on different execution hardware, mainly due to performance issues experienced by the engineers. The magnetic field wave controller software component is deployed on a different ECU. The communication between the RailCab and the wave controller is realized by the DataEagle1000 component.

3. Hazard Specification

After showing the deployment diagram, we specify the hazard conditions. It is a hazardous situation if a RailCab drives in a convoy with the wrong speed. Therefore, a value failure on the out port of the speed control component instance is specified as a hazard condition. We specify as a second hazard that the magnetic field wave controller computes wrong set values for the actuator since the resulting incorrect acceleration of the RailCab may result in an accident. This hazard condition is modeled as a value failure at the in port of the actuator component instance.

Analysis

1. Top Down

After modeling the system architecture, its failure propagation, and the hazard conditions, we employ the top down analysis in order to determine which errors of the components result in the two mentioned hazards. Figure 4(a) shows the wizard page with the relevant results of the top down analysis.


Figure 4: Relevant errors for the hazards
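
The idea behind the top down analysis can be sketched as a backward search from the hazard to the errors that have no further cause. The Python fragment below is only an illustration under assumptions: it covers nothing but the speed control rule from Figure 2, and the names ECU1, SpeedSensor and value_error as well as the dictionary layout are hypothetical. The actual analysis works on the complete model and therefore reports more errors than this fragment.

```python
# Hypothetical model fragment: failure -> failures that can cause it (OR semantics).
CAUSES = {
    "SpeedCtrl.out.value": ["SpeedCtrl.ex.value", "SpeedCtrl.inSensor.value"],
    "SpeedCtrl.ex.value": ["ECU1.value_error"],
    "SpeedCtrl.inSensor.value": ["SpeedSensor.value_error"],
}


def relevant_errors(hazard, causes):
    """Walk backwards from the hazard and collect failures without a further cause."""
    found, seen, stack = set(), set(), [hazard]
    while stack:
        failure = stack.pop()
        if failure in seen:
            continue  # guard against cycles in the architecture
        seen.add(failure)
        predecessors = causes.get(failure)
        if predecessors:
            stack.extend(predecessors)
        else:
            found.add(failure)  # no rule explains it, so it is a basic error
    return found


errors = relevant_errors("SpeedCtrl.out.value", CAUSES)
print(sorted(errors))  # ['ECU1.value_error', 'SpeedSensor.value_error']
```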

The next step is to compute the probability of the hazards. Figure 5 shows the wizard page for specifying the error probabilities. On the next wizard page (see Figure 6), the computed hazard probability is shown. The hazard concerning the wrong value on the outgoing port of the speed control component has a probability of 0.0100999899.


Figure 5: Input page for the first errors

Figure 6: Results of the quantitative analysis showing the hazard probability
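
The text does not list the error probabilities entered on the wizard page. As a plausibility check, the sketch below assumes that the relevant errors are independent and that any one of them is sufficient for the hazard, so that the hazard probability is 1 minus the product of the (1 - p_i). The three probabilities are assumptions, chosen only because they reproduce the reported value under this model; they are not taken from the tool.

```python
from math import prod


def hazard_probability(error_probabilities):
    """Probability that at least one of several independent errors occurs."""
    return 1.0 - prod(1.0 - p for p in error_probabilities)


# Assumed error probabilities (not given in the text).
assumed = [1e-2, 1e-4, 1e-6]
p_hazard = hazard_probability(assumed)
print(f"{p_hazard:.10f}")  # 0.0100999899
```

The result is dominated by the largest error probability, which suggests looking at the most error-prone component first when searching for improvements.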

2. Bottom Up

The top down analysis provides us with the relevant errors which lead to the hazards. We use the bottom up analysis to determine the propagation path of two of those errors through the system architecture to search for possible improvement points in the system architecture.
We inject an error into the distance sensor and simulate the failure propagation. In the simulation, the injected value error is propagated to the distance control component and finally to the speed control component by the specified and automatically derived failure propagations. There, the failure is not propagated any further and no hazard occurs. We additionally inject a value error into the DataEagle1000 component. Then, a simulation of the failure propagation based on the injected errors leads to a value failure on the out port of the speed control component, which fulfills the hazard condition. This simulated failure propagation is shown in Figure 7.


Figure 7: Simulated failure propagation and hazard shown in deployment diagram
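
The bottom up analysis can be pictured as a forward simulation that applies the propagation rules until no new failure appears. The following Python sketch illustrates this for a small, partly hypothetical fragment of the model: the port name inDist and the error names are assumptions, and the path of the DataEagle1000 error is not modeled because the text does not spell it out. A speed sensor error is injected instead to contrast a masked error with one that reaches the hazard.

```python
# Hypothetical fragment: each failure is the OR of the listed causes.
RULES = {
    "DistCtrl.out.value": ["DistSensor.value_error"],
    "SpeedCtrl.inDist.value": ["DistCtrl.out.value"],
    "SpeedCtrl.inSensor.value": ["SpeedSensor.value_error"],
    # Rule from Figure 2: inDist is not part of it, so that failure stops here.
    "SpeedCtrl.out.value": ["SpeedCtrl.ex.value", "SpeedCtrl.inSensor.value"],
}
HAZARD = "SpeedCtrl.out.value"


def simulate(injected, rules):
    """Propagate injected errors forward until a fixed point is reached."""
    active = set(injected)
    changed = True
    while changed:
        changed = False
        for failure, causes in rules.items():
            if failure not in active and any(c in active for c in causes):
                active.add(failure)
                changed = True
    return active


masked = simulate({"DistSensor.value_error"}, RULES)
hazardous = simulate({"SpeedSensor.value_error"}, RULES)
print(HAZARD in masked, HAZARD in hazardous)  # False True
```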

Transformation

1. Application

The value error in the DataEagle1000 communication hardware ultimately leads to the hazard condition. The DataEagle1000 device already employs checksums to detect wrong values, but it is still possible that the values are changed in such a way that the checksum remains valid.
In order to further reduce the probability that faulty values are received and remain undetected, we employ an acceptance test which checks whether received values are reasonable. We will, therefore, apply the "Insert Acceptance Test" transformation to the deployment model. Figure 8 shows a part of the resulting deployment model.


Figure 8: Deployment after application of the acceptance test transformation
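
What such an acceptance test might check can be sketched in a few lines of Python. The plausibility criteria used here (a fixed value range and a maximum change between consecutive values) and all thresholds are assumptions for illustration; the text only states that the test checks whether received values are reasonable.

```python
def accept(previous, received, max_step=0.5, lower=0.0, upper=100.0):
    """Return True if `received` looks plausible given the last accepted value.

    All thresholds are illustrative placeholders, not RailCab parameters.
    """
    in_range = lower <= received <= upper
    small_step = abs(received - previous) <= max_step
    return in_range and small_step


ok = accept(previous=12.0, received=12.3)        # small change, within range
rejected = accept(previous=12.0, received=97.0)  # implausible jump
print(ok, rejected)  # True False
```

A test of this kind complements the checksum: it can reject a value that arrives with a valid checksum but is physically implausible, while a wrong value that still looks plausible passes undetected.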

The second presented hazard condition is a wrong value sent to the actuator by the wave control component. One source of this hazard is a value error in the hardware which executes the wave control component. In order to reduce the likelihood of this hazard, we will make redundant copies of the wave control component which are executed on different hardware.
This change to the model is accomplished by the application of the "Triple Modular Redundancy" transformation. The transformation works by creating two clones of the wave control component. All incoming connections of the original wave control component are connected to multipliers which distribute the values to each of the three wave controls. The outgoing values are then combined by the voter. Figure 9 shows a part of the resulting deployment diagram. Note that the transformation does not include the deployment of the cloned wave control components as an automatic deployment of software components to execution hardware is not part of the TMR transformation. We, therefore, add new execution hardware and deploy the cloned component instances as the next step in this demonstration.


Figure 9: Deployment after application of the Triple Modular Redundancy transformation showing the redundant wave control components and the generated multiplier and voter components
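
The runtime behavior of the resulting structure can be sketched as follows. The Python fragment assumes a majority voter, which is the usual choice for triple modular redundancy but is not stated explicitly in the text, and it uses placeholder functions in place of the real wave control component.

```python
from collections import Counter


def multiplier(value, copies=3):
    """Distribute one incoming value to all replicas."""
    return [value] * copies


def voter(outputs):
    """Majority vote; returns None when no value has a strict majority."""
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None


def wave_control(set_value):
    # Placeholder for the real controller computation.
    return set_value * 2


def faulty_wave_control(set_value):
    # Replica running on hardware with a value error.
    return -1


inputs = multiplier(5)
replicas = [wave_control, faulty_wave_control, wave_control]
outputs = [replica(x) for replica, x in zip(replicas, inputs)]
voted = voter(outputs)
print(outputs, voted)  # [10, -1, 10] 10
```

A single faulty replica is outvoted by the other two. This only helps if the replicas fail independently, which is why the clones are deployed on separate execution hardware.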

After application of both transformations, we redo the top-down analysis. We assume that the acceptance test does not detect a wrong value with a probability of 0.01. The quantitative analysis computes a significantly lower probability (0.0002009898) for the hazard concerning the wrong value on the outgoing port of the speed control component for the transformed architecture. This is a direct result of using an acceptance test.
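
The effect of the acceptance test on the hazard probability can be checked with a small calculation. The sketch assumes three independent errors, any one of which causes the hazard, and assumes the error probabilities 0.01 for the communication hardware and 0.0001 and 0.000001 for the remaining errors. These values are not given in the text; they are chosen because they reproduce both reported hazard probabilities under this model. Only the probability of 0.01 for an undetected wrong value is taken from the text.

```python
from math import prod


def hazard_probability(error_probabilities):
    """Probability that at least one of several independent errors occurs."""
    return 1.0 - prod(1.0 - p for p in error_probabilities)


P_UNDETECTED = 0.01          # from the text: acceptance test misses a wrong value
p_comm_error = 1e-2          # assumed probability of the communication error
other_errors = [1e-4, 1e-6]  # assumed probabilities of the remaining errors

before = hazard_probability([p_comm_error] + other_errors)
# With the acceptance test, the error only matters if it also goes undetected.
after = hazard_probability([p_comm_error * P_UNDETECTED] + other_errors)
print(f"{before:.10f} -> {after:.10f}")  # 0.0100999899 -> 0.0002009898
```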

2. Specification

After the demonstration of the transformation application, we take a look into the specification of the "Triple Modular Redundancy" (TMR) transformation. Figure 10 shows the transformation diagram for the "Duplicate component instance" transformation which is later on used by the TMR transformation to clone a software component instance.


Figure 10: Transformation for cloning a component instance including all its port instances

The signature at the start node shows that the transformation needs two parameters: the software component instance that is to be duplicated, and a name prefix for the new instance. The transformation returns the duplicate software component instance. The transformation consists of three activities. The first activity begins with the comp object which has been passed as an argument and thus is already bound. Next, the type of the component instance identified by comp and the diagram which comp is deployed in are bound. After that, a new software component instance of the same type is created. The new component's name is composed of the prefix and the original component's name.
The next activity is a "for each" activity. For every match of the specified structure that can be found in the architectural model, its "each time" transition is traversed. In this case, this happens for every discrete port instance attached to the original component instance. The "each time" transition leads to the third activity, which creates a new discrete port instance on the duplicate component instance. Since this is done for all ports of the original component instance, a duplicate of the original component instance with the same number of port instances is returned in the end.
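
The three activities can be mirrored in a short Python sketch. The classes Diagram and ComponentInstance and their fields are hypothetical stand-ins for the architectural model, and port instances are reduced to their names; the real transformation is specified graphically as shown in Figure 10.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Diagram:
    instances: list = field(default_factory=list)


@dataclass(eq=False)
class ComponentInstance:
    name: str
    type_name: str
    diagram: Diagram = field(repr=False)
    ports: list = field(default_factory=list)  # names of discrete port instances


def duplicate_component_instance(comp, prefix):
    # Activity 1: bind type and diagram of `comp`, create an instance of the same type.
    diagram = comp.diagram
    clone = ComponentInstance(prefix + comp.name, comp.type_name, diagram)
    diagram.instances.append(clone)
    # Activities 2 and 3: for each port instance of the original,
    # create a corresponding port instance on the duplicate.
    for port in comp.ports:
        clone.ports.append(port)
    return clone


diagram = Diagram()
wave_ctrl = ComponentInstance("WaveCtrl", "WaveControl", diagram, ["in", "out"])
diagram.instances.append(wave_ctrl)
clone1 = duplicate_component_instance(wave_ctrl, "copy1_")
print(clone1.name, clone1.ports)  # copy1_WaveCtrl ['in', 'out']
```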


Figure 11: Part of the transformation which applies the Triple Modular Redundancy technique to the architecture

Figure 11 shows a section of the "Triple Modular Redundancy" transformation. The first activity creates two copies of a given software component instance by calling the "Duplicate component instance" transformation. The component instance to which the TMR technique is to be applied is used as an argument for both transformation calls (represented by the rounded rectangles). After that, multipliers for the incoming values and voters for the outgoing values of the component instances are created. The port instances are rewired appropriately.

Recent changes: 27.10.2009