Kris Hauman


Mentee Project

Beginning Summary

The Laboratory for Advanced Software Engineering Research (LASER) has two divisions. The analysis group researches ways to verify that software correctly meets its specifications, using finite state verification. The process group researches ways to make the software development process more conducive to producing high-quality software. I'm with the analysis group, and Heather Conboy is my project leader.

My project pertains to FLAVERS, Flow Analysis for Verification of Systems. FLAVERS is a tool for verifying particular properties of software systems written in Ada or Java. The user submits the source code and a property, and FLAVERS determines whether that property holds on every possible execution path of the system. Both sequential and concurrent programs can be analyzed.

The user expresses the property as a Quantified Regular Expression (QRE) or a Finite-State Automaton (FSA); if the property is written as a QRE, FLAVERS translates it to an FSA. FLAVERS models the system with control flow graphs (CFGs), which are combined into a trace flow graph (TFG).
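
To make the FSA idea concrete, here is a minimal sketch in Java (this is not FLAVERS's input format; the Fsa class, the event names, and the implicit-trap convention are all assumptions made for illustration). The automaton reads a trace of events, and the trace satisfies the property if the run ends in an accepting state; an event with no outgoing transition counts as a violation.

    import java.util.*;

    // Minimal property-FSA sketch: states are ints, events are strings.
    // A trace satisfies the property if the run ends in an accepting state.
    public class Fsa {
        private final Map<Integer, Map<String, Integer>> delta = new HashMap<>();
        private final Set<Integer> accepting = new HashSet<>();
        private final int start;

        public Fsa(int start) { this.start = start; }

        public void addTransition(int from, String event, int to) {
            delta.computeIfAbsent(from, s -> new HashMap<>()).put(event, to);
        }

        public void setAccepting(int state) { accepting.add(state); }

        // Events with no outgoing transition go to an implicit violating trap.
        public boolean accepts(List<String> trace) {
            int state = start;
            for (String event : trace) {
                Integer next = delta.getOrDefault(state, Map.of()).get(event);
                if (next == null) return false;
                state = next;
            }
            return accepting.contains(state);
        }

        public static void main(String[] args) {
            // Hypothetical property: "open" must happen before any "use".
            Fsa property = new Fsa(0);
            property.addTransition(0, "open", 1);
            property.addTransition(1, "use", 1);
            property.addTransition(1, "open", 1);
            property.setAccepting(0);
            property.setAccepting(1);
            System.out.println(property.accepts(List.of("open", "use"))); // true
            System.out.println(property.accepts(List.of("use")));         // false
        }
    }

In FLAVERS the property automaton is checked against every path through the TFG rather than against a single trace, but the accept/violate idea is the same.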

After the analysis, FLAVERS gives either conclusive results (the property always holds) or inconclusive results. If the results are inconclusive, FLAVERS produces a counterexample. In that case, either there is a bug in the user's program, or the counterexample is an infeasible path, meaning the system could never actually execute the behavior that violated the property.

These infeasible paths are possible because the TFG that FLAVERS creates is not a precise model of the system: it includes all possible paths, but it also includes infeasible ones. Making the model completely precise is not always necessary to verify the property, so it is better to save time by not doing so right away. Instead, the user can add constraints to the model to weed out some of those infeasible paths. Constraints, like properties, are expressed as QREs or FSAs.

My project involves researching ways to automate the choice of constraints. I'll focus on choosing a type of constraint called a task automaton (TA) when dealing with counterexamples in concurrent systems. (A task is the Ada term for what's called a thread in Java.) A TA ensures that, for that particular task, no steps are taken out of order. During an analysis using a TA, any counterexample given is feasible with respect to that task.
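
To illustrate the feasibility idea a TA captures, here is a simplified sketch with hypothetical event names. The task is modeled as a straight-line sequence of events (a real TA is derived from the task's control flow graph, so it can branch and loop); a global trace is feasible with respect to the task if its projection onto the task's alphabet follows the task's own order.

    import java.util.*;

    // Sketch: decide whether a global counterexample trace is feasible with
    // respect to one task, by projecting the trace onto the task's alphabet
    // and checking it against the task's own event order.
    public class TaskOrderCheck {

        static boolean feasibleForTask(List<String> globalTrace,
                                       List<String> taskOrder) {
            Set<String> alphabet = new HashSet<>(taskOrder);
            int next = 0; // index of the next event the task is allowed to do
            for (String event : globalTrace) {
                if (!alphabet.contains(event)) continue; // other tasks' events
                if (next >= taskOrder.size() || !taskOrder.get(next).equals(event)) {
                    return false; // event out of order for this task
                }
                next++;
            }
            return true;
        }

        public static void main(String[] args) {
            // Hypothetical task that must pick up a chopstick before eating.
            List<String> task = List.of("pickUp", "eat", "putDown");
            System.out.println(feasibleForTask(
                List.of("otherEvent", "pickUp", "eat", "putDown"), task)); // true
            System.out.println(feasibleForTask(
                List.of("eat", "pickUp", "putDown"), task));               // false
        }
    }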

As an example, consider the familiar Dining Philosophers concurrency problem. One property that could be verified is that Philosopher-1 and Philosopher-2, who are seated beside each other, never eat at the same time. If FLAVERS analyzed a correct implementation of Dining Philosophers without using constraints, it would produce inconclusive results, because the TFG includes paths that are not actually executable. By adding a TA for Philosopher-1, a TA for Philosopher-2, and a TA for the chopstick between the two, FLAVERS can create a TFG accurate enough for the analysis to give conclusive results.
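
As a rough sketch of that property (with made-up event names; FLAVERS checks the property against every path through the TFG, not against individual traces like this), the property automaton only needs to track who is currently eating and flag any state where both eat at once:

    import java.util.*;

    // Sketch of the property "Philosopher-1 and Philosopher-2 never eat at
    // the same time" over hypothetical start/stop events. The two booleans
    // encode a four-state automaton; both true at once is the violation.
    public class NeverEatTogether {

        static boolean holdsOn(List<String> trace) {
            boolean eating1 = false, eating2 = false;
            for (String event : trace) {
                switch (event) {
                    case "p1_start": eating1 = true; break;
                    case "p1_stop":  eating1 = false; break;
                    case "p2_start": eating2 = true; break;
                    case "p2_stop":  eating2 = false; break;
                    default: break; // other philosophers' events are ignored
                }
                if (eating1 && eating2) return false; // violation state
            }
            return true;
        }

        public static void main(String[] args) {
            System.out.println(holdsOn(List.of(
                "p1_start", "p1_stop", "p2_start", "p2_stop"))); // true
            System.out.println(holdsOn(List.of(
                "p1_start", "p2_start")));                       // false
        }
    }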

" Finite State Verification has been gaining credibility and attention as an alternative to testing and to formal verification approaches based on theorem proving. "
-- FLAVERS: a Finite State Verification Technique for Software Systems
by Jamieson Cobleigh, Lori Clarke, and Leon Osterweil

For more about FLAVERS, see the tutorial.

Final Report

My overall experience in the mentorship has been beneficial. I've gotten a feel for what graduate school is like, learned some things that will be helpful for my remaining undergraduate semesters, and had fun in the process.

Predicting Task Automata Needed for the FLAVERS Analysis of a Concurrent System
Kris Hauman
August 2002

FLAVERS (Flow Analysis for Verification of Systems) is a Finite State Verification tool. In order to verify certain properties of a system, FLAVERS first creates an imprecise model of the system. The user usually has to make the model more accurate by adding constraints, so to verify a property, the user often has to repeatedly choose constraints and rerun the analysis. In concurrent systems, a type of constraint called a task automaton (TA) is needed. If FLAVERS could predict at least some of the needed task automata, it could save the user time by automatically including those TA or suggesting them to the user.

I have been looking at using alphabet-related and communication-related tasks as possible predictors of which TA must be included to verify a property in an analysis problem. Alphabet-related tasks are tasks that are related to the property's alphabet, or to the alphabet of a variable automaton (VA) or context constraint. An alphabet consists of the events that occur in a task, or that are used in a property's definition. The communication-related tasks are those that communicate with the alphabet-related tasks.

A task is related to a property if the task has one or more events in common with the property; likewise, a task is related to a constraint if it has one or more events in common with the constraint. (By constraint I mean a non-TA constraint, such as a variable automaton or context constraint.) Two tasks are communication-related if they have an event in common; more specifically, if they share a communication node in the trace flow graph that models the system.
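
These definitions boil down to alphabet intersections. The sketch below uses hypothetical task and event names, and it approximates communication relatedness with alphabet overlap, whereas FLAVERS determines it from shared communication nodes in the TFG:

    import java.util.*;

    // Sketch of the related-task definitions, with hypothetical task and
    // event names. Relatedness is non-empty alphabet intersection.
    public class RelatedTasks {

        static boolean related(Set<String> a, Set<String> b) {
            return !Collections.disjoint(a, b);
        }

        public static void main(String[] args) {
            Map<String, Set<String>> taskAlphabets = Map.of(
                "phil1",      Set.of("p1_start", "p1_stop", "pickUp1"),
                "phil2",      Set.of("p2_start", "p2_stop", "pickUp1"),
                "chopstick1", Set.of("pickUp1", "putDown1"),
                "waiter",     Set.of("seat", "serve"));
            Set<String> propertyAlphabet = Set.of("p1_start", "p2_start");

            // Property-related: tasks sharing at least one event with the property.
            Set<String> propertyRelated = new TreeSet<>();
            for (var t : taskAlphabets.entrySet())
                if (related(t.getValue(), propertyAlphabet))
                    propertyRelated.add(t.getKey());

            // Communication-related to those: tasks, beyond the property-related
            // ones, that share an event with a property-related task.
            Set<String> commRelated = new TreeSet<>();
            for (var t : taskAlphabets.entrySet())
                if (!propertyRelated.contains(t.getKey()))
                    for (String p : propertyRelated)
                        if (related(t.getValue(), taskAlphabets.get(p)))
                            commRelated.add(t.getKey());

            System.out.println(propertyRelated); // [phil1, phil2]
            System.out.println(commRelated);     // [chopstick1], via pickUp1
        }
    }

On the Dining Philosophers example above, these sets pick out exactly the TA that made the analysis conclusive: the two philosophers and the chopstick between them.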

It is possible that some combination of property-related, constraint-related, and communication-related tasks closely approximates the group of actually needed TA.

When I looked at 36 test case examples and compared the TA that were needed with the TA that were property-related, constraint-related, and communication-related to the property-related tasks, it seemed that these related tasks could make useful predictions. In every example, at least one property-related TA was needed. In about 58% of the examples, all of the needed TA were property-related, so including the property-related tasks is sufficient to verify the property in many examples. This may make it seem like a good idea to suggest or include all the property-related TA. However, in about 44% of the examples, at least one of the property-related tasks was not a needed TA, so if all the property-related TA were included, there would sometimes be extra, unneeded TA. For this reason, it would be useful to know whether the analysis run-time suffers from having extra TA.

It seemed that most of the time, all the needed TA were either property-related or communication-related to property-related tasks. In about 73% of the examples with needed property-unrelated TA, all of the needed property-unrelated TA were communication-related to the needed property-related TA.

Including the constraint-related TA may also prove useful. In about 67% of examples with needed VA, all of the needed TA were constraint-related to a needed VA.

Some of the example properties can be verified by different combinations of constraints. For the examples mentioned above, I tried to make sure that the combination recorded was the minimum number of constraints needed. The percentages above could change if other combinations of needed constraints were considered for each example. The Breadth-First-Search Find Path algorithm was used when finding the counterexamples for these examples. Though the numbers are likely imprecise, the point is that these criteria for predicting TA may be useful. More experiments are being done to determine how useful the predictions are.

The experiments will help determine whether analyses with the predicted TA give conclusive results and whether, if the predicted TA are more than what's needed, the analysis time suffers from the extra TA. If it does, better predictions may be needed.

The experimentation will compare the analysis run-time, the TA used, and the verification results of analysis problems that include the minimum TA needed for property verification against those of analysis problems that include the TA predicted to be needed. These comparisons will give a better idea of how close the property-related, constraint-related, or communication-related tasks come to the combination needed for property verification.

In the experiments, there are several analysis problems for each test case example, each including the TA that come from a different prediction approach. The analysis problems with the predicted TA include any VA or context constraints that are needed for property verification, but no additional TA. Six approaches were used, as shown below.

Constraints used in each of the prediction approaches:

VA + context-constraints +

  1. all property-related tasks
  2. some property-related tasks
  3. all property-related tasks and communication-related to those
  4. constraint-related tasks
  5. constraint-related tasks and communication-related to those
  6. all property-related tasks and communication-related to those; constraint-related tasks and communication-related to those

One approach is to include some, rather than all, of the property-related tasks. With this approach, the chosen property-related tasks must have at least a given number of events in common with the property; in these experiments, that number is 2. The constraint-related and communication-related tasks could be limited in the same way, though doing so is not examined in these experiments.
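
Under these definitions, assembling the six predicted TA sets is mostly set union, plus the common-event threshold for approach 2. The sketch below uses made-up stand-ins for the related-task sets (which would come from computations like the RelatedTasks sketch above):

    import java.util.*;

    // Sketch of assembling the predicted TA sets for the six approaches.
    // All task names are hypothetical stand-ins.
    public class PredictTa {

        // Approach 2's filter: keep tasks with at least `threshold` events
        // in common with the property's alphabet (2 in these experiments).
        static Set<String> atLeastNCommon(Map<String, Set<String>> taskAlphabets,
                                          Set<String> propertyAlphabet,
                                          int threshold) {
            Set<String> kept = new TreeSet<>();
            for (var t : taskAlphabets.entrySet()) {
                Set<String> common = new HashSet<>(t.getValue());
                common.retainAll(propertyAlphabet);
                if (common.size() >= threshold) kept.add(t.getKey());
            }
            return kept;
        }

        static Set<String> union(Set<String> a, Set<String> b) {
            Set<String> u = new TreeSet<>(a);
            u.addAll(b);
            return u;
        }

        public static void main(String[] args) {
            Set<String> propRelated = Set.of("phil1", "phil2");
            Set<String> commToProp  = Set.of("chopstick1");
            Set<String> consRelated = Set.of("phil2");
            Set<String> commToCons  = Set.of("chopstick2");

            Map<String, Set<String>> taskAlphabets = Map.of(
                "phil1", Set.of("p1_start", "p1_stop"),
                "phil2", Set.of("p2_start"));
            Set<String> propertyAlphabet = Set.of("p1_start", "p1_stop", "p2_start");

            Map<Integer, Set<String>> approaches = new TreeMap<>();
            approaches.put(1, propRelated);
            approaches.put(2, atLeastNCommon(taskAlphabets, propertyAlphabet, 2));
            approaches.put(3, union(propRelated, commToProp));
            approaches.put(4, consRelated);
            approaches.put(5, union(consRelated, commToCons));
            approaches.put(6, union(union(propRelated, commToProp),
                                    union(consRelated, commToCons)));
            approaches.forEach((k, ta) -> System.out.println(k + ": " + ta));
        }
    }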

The following is measured in the data collected for the experiments (a sketch of these computations follows the list):

  • number of predicted TA;
  • number of predicted TA required;
  • number of required TA not predicted;
  • verification results;
  • predicted analysis run-time minus optimal analysis run-time;
  • the percent time difference ((Pred-Opt)/Opt);
  • the number of VA and context-constraints.
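
Here is a small sketch of how the set and timing measures could be computed from a predicted TA set, the estimated optimal TA set, and the two run-times; all inputs are made-up stand-ins, since the real values come from the analysis runs:

    import java.util.*;

    // Sketch of the per-problem set and timing measures, given a predicted
    // TA set, the (estimated) optimal TA set, and the two run-times.
    public class Measures {
        public static void main(String[] args) {
            Set<String> predicted = Set.of("phil1", "phil2", "waiter");
            Set<String> optimal   = Set.of("phil1", "phil2", "chopstick1");
            double predSeconds = 12.0, optSeconds = 10.0;

            Set<String> predictedAndRequired = new TreeSet<>(predicted);
            predictedAndRequired.retainAll(optimal);
            Set<String> requiredNotPredicted = new TreeSet<>(optimal);
            requiredNotPredicted.removeAll(predicted);

            System.out.println("predicted TA: " + predicted.size());                         // 3
            System.out.println("predicted TA required: " + predictedAndRequired.size());     // 2
            System.out.println("required TA not predicted: " + requiredNotPredicted.size()); // 1
            System.out.println("time difference: " + (predSeconds - optSeconds));            // 2.0
            System.out.println("percent difference: "
                + 100 * (predSeconds - optSeconds) / optSeconds + "%");                      // 20.0%
        }
    }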

Results

More analysis needs to be done to determine how useful the predictions are.

Conclusion

There is some potential for inaccuracy in the experiments. The predicted TA are compared to the estimated minimum, or optimal, set of TA needed. The optimal TA set was found manually and may actually contain unneeded TA. If the predicted TA are compared to a supposedly optimal TA set that actually has extra TA, a good prediction might falsely appear to be bad. The comparison also means less if the supposedly optimal TA set actually fails to verify the property. So the measured information will be skewed in any case where the optimal TA set is skewed. Also, the timing may be inaccurate for smaller-scale problems.

There are other possibilities for making predictions. The alphabet-related tasks could be limited by how many times one particular event occurs in a task. Certain events in an alphabet might be more significant than others, so it might be important for a task to have one particular event in common rather than just any of the alphabet's events. Perhaps it would be useful to predict all the tasks that have certain events. The number of events a task must have in common with an alphabet to be included among the some-alphabet-related tasks might be more useful if it were larger than 2. It might also be useful to limit the communication-related tasks by the number of events a task has in common with another task. Another thing to consider is how many TA should be predicted at a time: if too many TA are automatically included, the analysis might take too long or run out of memory.