Loops in Static Symbolic Execution¶
import utils
from minipy import *
from transparent_function_execution import *
Loops are a standard obstacle in symbolic execution. The dynamic / concolic variants of SE use concrete inputs as guides, which ensures the termination of the analysis even in the presence of symbolic loop guards. In static SE, termination is easiest ensured by bounded unwinding (see A Symbolic Interpreter and Correctness of Symbolic Transitions). However, this impedes the construction of an exhaustive symbolic execution procedure (see Correctness of Symbolic Transitions), as generally some loop paths will not be covered if the number of loop executions is not bound by a constant.
The standard way to deal with unbounded loops in exhaustive symbolic execution are loop invariants [Hoa69]. Essentially, a loop invariant is an assertion satisfied by the loop after any number of iterations (i.e., the the assertion is invariant with respect to the number of completed loop iterations). Thus, we can abstract a loop by its invariant and continue symbolic execution.
To prove that a formula \(\mathit{Inv}\) is indeed a loop invariant in the first place, we have to assert that
(Initially Valid) \(\mathit{Inv}\) holds initially, that is, right before the very first loop iteration, and
(Preserved) if \(\mathit{Inv}\) holds at the beginning of an arbitrary iteration, \(\mathit{Inv}\) also holds right before the next iteration.
In rule-based symbolic execution systems, loop invariant-based reasoning is performed by loop invariant rules (see, e.g., [DE82] [ABB+16]). Alternatively, we can encode the above two proof cases as well as the case where we assume the invariant and resume symbolic execution (the “Use Case”) into the program. This approach, which is, for example, followed by the Boogie verifier [BCD+05], requires two additional statement types: (1) havoc statements havoc x
erase the currently assigned value for the variable x
in the symbolic store, assigning it a fresh, symbolic value; and (2) assume statements assume expr
, which add the boolean expression expr
to the path condition.
Assuming these verification-only statements, we transform a loop as follows, where Inv
is a loop invariant and x_1
to x_n
are the variables written in the loop body: A loop statement
while guard:
body
remaining_program
is transformed to
assert Inv # Initially Valid
havoc x_1
havoc x_2
havoc ...
havoc x_n
assume Inv
if guard:
body
assert Inv # Preserved
assume False
remaining_program # Use Case
The assume False
after the “Preserved” assertion indicates that we reached a code path which does not exit the loop and preserves Inv
(since the assertion before was satisfied). Assuming False
leads to a symbolic state with unsatisfiable path condition, such that symbolic execution stops at that point.
Consider, for example, the following program computing the product of all integers in a tuple t
:
i = 0
result = 1
while i < len(t):
elem = t[i]
result = result * elem
i = i + 1
return result
One loop invariant for this program is \(\mathtt{i}\geq{}0\wedge\mathtt{i}\leq\mathtt{len(t)}\land\texttt{result % elem} = 0\), informally “i
ranges between 0 (before the loop) and, inclusively, len(t)
(after loop termination), and the current tuple element is a factor of result
”.
Let Inv(i, t, result)
denote the invariant expression. Then, the product computation program is transformed to
i = 0
result = 1
elem = 1
assert Inv(i, t, elem, result)
havoc i
havoc elem
havoc result
assume Inv(i, t, elem, result)
if i < len(t):
elem = t[i]
result = result * elem
i = i + 1
assert Inv(i, t, elem, result)
assume False
return result
Loop Invariant Reasoning using Program Transformation¶
To enable this program transformation-based approach to loop invariant reasoning, we add havoc
and assume
statements to minipy and the symbolic interpreter. Since these statements work with symbolic expressions and manipulate path constraints, it does not make much sense to integrate them also in the concrete minipy interpreter.
We first extend the minipy grammar.
#% EXPORT
MINIPY_GRAMMAR["<simple_stmt>"].insert(3, "<havoc_stmt>")
MINIPY_GRAMMAR["<simple_stmt>"].insert(6, "<assume_stmt>")
MINIPY_GRAMMAR["<havoc_stmt>"] = ["havoc <NAME>"]
MINIPY_GRAMMAR["<assume_stmt>"] = ["assume <expression>"]
Next, we add AST node classes, extend the AST converter, and re-declare the parse
function to use the new ASTConverter
version.
class Havoc(ASTNode):
def __init__(self, variable: str):
super().__init__("havoc " + variable)
self.variable = variable
def transform(self, transformer: Callable[[ASTNode], ASTNode]) -> ASTNode:
return transformer(self)
class Assume(ASTNode):
def __init__(self, expression: 'Expression'):
super().__init__("assume " + expression.code)
self.expression = expression
def transform(self, transformer: Callable[[ASTNode], ASTNode]) -> ASTNode:
return transformer(Assume(self.expression.transform(transformer)))
class ASTConverter(ASTConverter):
def __init__(self):
super().__init__()
self.stmt_interpretations["<havoc_stmt>"] = self.transform_havoc_stmt
self.stmt_interpretations["<assume_stmt>"] = self.transform_assume_stmt
def transform_havoc_stmt(self, stmt: ParseTree) -> ASTNode:
m_res = match(("havoc ", "<NAME>"), stmt)
assert m_res is not None
return Havoc(tree_to_string(m_res[-1]))
def transform_assume_stmt(self, stmt: ParseTree) -> ASTNode:
m_res = match(("assume ", "<expression>"), stmt)
assert m_res is not None
eval_expr_result = self.transform_expression(m_res[1])
return Assume(eval_expr_result)
def parse(inp: str) -> ASTNode:
inp = strip_comments_and_whitespace(inp)
tree = PythonPEGParser(MINIPY_GRAMMAR).parse(inp)[0]
return ASTConverter().transform_top(tree)
Finally, we extend the symbolic interpreter.
def z3_fresh_sequence(prefix: str, elem_sort: Optional[z3.SortRef] = None, ctx=None):
ctx = z3.get_ctx(ctx)
elem_sort = z3.IntSort(ctx) if elem_sort is None else elem_sort
return z3.SeqRef(
z3.Z3_mk_fresh_const(
ctx.ref(),
prefix,
z3.SeqSortRef(z3.Z3_mk_seq_sort(elem_sort.ctx_ref(), elem_sort.ast)).ast),
ctx)
class SymbolicInterpreter(SymbolicInterpreter):
def __init__(self,
loop_unrolling_threshold: Optional[int] = None,
predicates: Optional[Dict[str, Tuple[Tuple[z3.SortRef, ...], z3.SortRef, Callable]]] = None):
super().__init__(loop_unrolling_threshold, predicates)
self.stmt_interpretations[Havoc] = self.execute_havoc
self.stmt_interpretations[Assume] = self.execute_assume_stmt
def execute_havoc(self, stmt: Havoc, environment: SymbolicEnvironment) -> SET:
variable = environment.store.get_variable(stmt.variable)
if variable is None:
# This variable has not been assigned a value yet, so there is nothing to havoc
return SETNode(environment, stmt.code), [(SETNode(environment, None), [])]
new_env = copy.deepcopy(environment.store.env)
new_env[variable] = (
z3.FreshInt(variable.name) if variable.type == INT_TYPE else
z3.FreshBool(variable.name) if variable.type == BOOL_TYPE else
z3_fresh_sequence(variable.name)
)
new_environment = SymbolicEnvironment(SymbolicStore(new_env),
environment.path_constraints,
environment.abrupt_completion,
environment.functions)
return SETNode(environment, stmt.code), [(SETNode(new_environment, None), [])]
def execute_assume_stmt(self, stmt: Assert, environment: SymbolicEnvironment) -> SET:
eval_expr_result = self.evaluate(stmt.expression, environment)
children: List[SET] = []
result = (SETNode(environment, stmt.code), children)
for value, constraint in eval_expr_result:
new_env = environment.add_constraint(constraint)
if isinstance(value, Exception):
new_env = new_env.set_abrupt_completion(value)
else:
assert isinstance(value, z3.BoolRef)
new_env = new_env.add_constraint(value)
if not self.unsatisfiable(new_env):
children.append((SETNode(new_env, ''), []))
return result
Now, we can symbolically execute our transformed product computation program with invariant reasoning:
program_with_invariants = """
i = 0
result = 1
elem = 1
assert Inv(i, t, elem, result)
havoc i
havoc elem
havoc result
assume Inv(i, t, elem, result)
if i < len(t):
elem = t[i]
result = result * elem
i = i + 1
assert Inv(i, t, elem, result)
assume False
return result
"""
program_with_invariants_ast = parse(program_with_invariants)
display_program(program_with_invariants_ast.code)
i = 0
result = 1
elem = 1
assert Inv(i, t, elem, result)
havoc i
havoc elem
havoc result
assume Inv(i, t, elem, result)
if (i < len(t)):
elem = t[i]
result = (result * elem)
i = (i + 1)
assert Inv(i, t, elem, result)
assume False
return result
Instead of using standard minipy expressions for the assertions (which would also have been possible here), we introduce a predicate Inv
for the invariant. This has two advantages: First, it enables us to store the invariant formula at a central place, which makes it easier to change it in case of problems. And second, that way we can go beyond mere minipy assertions when designing loop invariants. Most importantly, we can use existential and universal quantifiers. An example for an invariant with quantifiers is provided further below.
invariant = (
lambda iv, tv, elemv, resultv:
z3.And(
iv >= z3.IntVal(0),
iv <= z3.Length(tv),
z3.Or(
z3.And(
elemv == z3.IntVal(0),
resultv == z3.IntVal(0)
),
z3.And(
z3.Not(elemv == z3.IntVal(0)),
resultv % elemv == z3.IntVal(0)
))))
interpreter = SymbolicInterpreter(predicates={
"Inv": ((z3.IntSort(), z3.SeqSort(z3.IntSort()), z3.IntSort(), z3.IntSort()), z3.BoolSort(), invariant),
"len": ((z3.SeqSort(z3.IntSort()),), z3.IntSort(), z3.Length)
})
t = Variable("t", TUPLE_TYPE)
environment = SymbolicEnvironment().set(t, t.to_z3())
result = interpreter.execute(program_with_invariants_ast, environment)
display_set(result)
Since there are no red nodes in the SET, all invariant checks were passed, and we also did not miss any other exceptional cases! Let us check what happens if we add elem
to result
instead of computing the product:
faulty_program_with_invariants = """
i = 0
result = 1
elem = 1
assert Inv(i, t, elem, result)
havoc i
havoc elem
havoc result
assume Inv(i, t, elem, result)
if i < len(t):
elem = t[i]
result = result + elem # <-- "+"" instead of *
i = i + 1
assert Inv(i, t, elem, result)
assume False
return result
"""
faulty_program_with_invariants_ast = parse(faulty_program_with_invariants)
display_program(faulty_program_with_invariants_ast.code)
i = 0
result = 1
elem = 1
assert Inv(i, t, elem, result)
havoc i
havoc elem
havoc result
assume Inv(i, t, elem, result)
if (i < len(t)):
elem = t[i]
result = (result + elem)
i = (i + 1)
assert Inv(i, t, elem, result)
assume False
return result
t = Variable("t", TUPLE_TYPE)
environment = SymbolicEnvironment().set(t, t.to_z3())
result = interpreter.execute(faulty_program_with_invariants_ast, environment)
display_set(result)
Now, there is a red node with an AssertionError
: As expected, our buggy program mutant does not satisfy the loop invariant.
Next, we have a look at a slightly more complex example: A program filtering tuple values, retaining only the positive ones.
def filter_nonnegative(t: tuple) -> tuple:
i = 0
result = tuple()
while i < len(t):
if t[i] >= 0:
result = result + (t[i], )
i = i + 1
return result
filter_nonnegative((3, 17, -4, 0, 5, -321))
(3, 17, 0, 5)
We transform the body of the filter_nonnegative
function according to our “loop invariants with program transformation” approach:
filter_program_with_invariants = """
i = 0
result = tuple()
assert Inv(i, t, result)
havoc i
havoc result
assume Inv(i, t, result)
if i < len(t):
if t[i] >= 0:
result = result + (t[i], )
i = i + 1
assert Inv(i, t, result)
assume False
return result
"""
filter_program_with_invariants_ast = parse(filter_program_with_invariants)
display_program(filter_program_with_invariants_ast.code)
i = 0
result = tuple()
assert Inv(i, t, result)
havoc i
havoc result
assume Inv(i, t, result)
if (i < len(t)):
if (t[i] >= 0):
result = (result + (t[i], ))
i = (i + 1)
assert Inv(i, t, result)
assume False
return result
The invariant maintained by this program is
i
stays between 0 and the size oft
the size of
result
does not exceed the size ofiv
all nonnegative elements of
t
up to (exclusively) indexi
are contained inresult
kv, lv = z3.Ints("k l")
invariant = lambda iv, tv, resultv: \
z3.And(
iv >= z3.IntVal(0),
iv <= z3.Length(tv),
z3.Length(resultv) <= iv,
z3.ForAll(
[kv],
z3.Implies(
z3.And(z3.IntVal(0) <= kv, kv < iv),
z3.Or(
tv[kv] < z3.IntVal(0),
z3.Exists(
[lv],
z3.And(
lv >= z3.IntVal(0),
lv < z3.Length(resultv),
resultv[lv] == tv[kv]
)
)))))
interpreter = SymbolicInterpreter(predicates={
"Inv": ((z3.IntSort(), z3.SeqSort(z3.IntSort()), z3.IntSort()), z3.BoolSort(), invariant),
"len": ((z3.SeqSort(z3.IntSort()),), z3.IntSort(), z3.Length)
})
t = Variable("t", TUPLE_TYPE)
environment = SymbolicEnvironment().set(t, t.to_z3())
result = interpreter.execute(filter_program_with_invariants_ast, environment)
display_set(result)
Note that the invariant occurs in the path constraint of the blue SET node (the “use case”). For any code following the loop, we can (and have to) therefore rely on the information encoded in the loop invariant when accessing values changed by the loop: The loop invariant is an abstraction of the loop’s behavior. In Program Proving, we will have a look at the role of loop invariants in program proving. Reasonable loop invariants should at least exclude non-feasible exceptions raised in the code following the loop (e.g., by limiting the range of variables used for indexing tuples). To show stronger properties (e.g., functional postconditions), we also have to come up with stronger loop invariants.
For a final example, we return to the linear search program:
def find(needle: int, haystack: tuple) -> int:
i = 0
while i < len(haystack):
if haystack[i] == needle:
break
i = i + 1
else:
return -1
return i
Applying the code transformation as before yields (we only show the function’s body):
i = 0
assert Inv(i, needle, haystack)
havoc i
assume Inv(i, needle, haystack)
if i < len(haystack):
if haystack[i] == needle:
break
i = i + 1
assert Inv(i, needle, haystack)
assume False
else:
return -1
return i
Obviously, we missed something: What do we do about the break
statement, which now appears outside any while
statement? To preserve its semantics, we have to change the program such that no loop code is executed after hitting the break
. For this simple example, this is not too difficult:
i = 0
assert Inv(i, needle, haystack)
havoc i
assume Inv(i, needle, haystack)
if i < len(haystack):
if haystack[i] == needle:
pass # <-- formerly break
else:
i = i + 1
assert Inv(i, needle, haystack)
assume False
else:
return -1
return i
It suffices to replace the break
by a pass
and to embed the remaining loop code inside the else
branch of the if
. Turning this idea into an automated, general transformation still remains a major challenge with potentially many special cases to consider.
Loop Scopes¶
An alternative approach which does not require any program transformation of loop bodies was proposed in [SteinhofelW17]. This work introduces so-called loop scope statements, which are similar in spirit to the method frames described in Transparent Symbolic Execution of Function Calls. The idea is that we do not need to transform, for example, the break
statement above. Instead, a break
statement may legally occur within a loop scope, which “swallows” it and completes normally.
The original loop scope statements from [SteinhofelW17] are statements of the form
loop-scope(x):
body
where x
is the boolean index variable of the loop scope and body
its body. Their intuitive semantics is that x
is set to False
if, and only if, body
completes abruptly because of a continue
statement, and is set to True
otherwise. That is, x
indicates if the loop has been exited. Loop scopes were implemented for the rule-based symbolic execution engine KeY1. There, the value of x
is then used in an updated postcondition to distinguish whether the loop invariant or the original postcondition has to be shown. It turns out that in our symbolic minipy interpreter, the index x
is not needed, since we can simply “catch” the abrupt completion behavior of body
and proceed accordingly.
In [Steinhofel20], loop scopes are given a precise, non-mechanized semantics. We adapt this definition to minipy, which involves both changes to the syntax and semantics of loop scopes. Syntactically, we drop the loop scope index x
, and instead require a boolean expression as argument: The loop invariant. The syntax of minipy loop scopes is
loop-scope(inv=expr):
body
where expr
is a boolean expression. We also change the semantics; most notably, we check the invariant if the body
completes due to a continue
statement, and either terminate the execution if the invariant is satisfied (cf. the addition of False
to the path constraint in case of the transformation-based approach) or raise an AssertionError
. Subsequently, we provide a more precise description.
A loop scope statement is executed by first executing the body body
. Then there is a choice:
If execution of
body
completes normally, the behavior of the loop scope statement is undefined.If execution of
body
completes abruptly because of abreak
, the loop scope statement completes normally.If execution of
body
completes abruptly because of acontinue
, there is a choice:If
expr
evaluates toTrue
, the execution (of the whole program) completes normally.If
expr
evaluates toFalse
, the loop scopes statement completes abruptly because of anAssertionError
.
If execution of
body
completes abruptly for any other reason, the loop scope statement completes abruptly for the same reason.
This facilitates a different kind of program transformation for loop invariant-based symbolic execution of loops which leaves loop bodies fully intact. A while
loop with guard guard
, body body
, assigned locations x_1
to x_n
and invariant expression Inv
is transformed to
assert Inv
havoc x_1
# ...
havoc x_n
assume Inv
loop-scope(inv=Inv):
if guard:
body
continue
else:
break
If the loop has an else
branch with body else_body
, we retain this branch, adding a trailing else
statement. We obtain
assert Inv
havoc x_1
# ...
havoc x_n
assume Inv
loop-scope(inv=Inv):
if guard:
body
continue
else:
else_body
break
Observe that it is not possible for the body of the loop scope to complete normally. If the original body
completes normally, the newly added continue
statement is reached and the loop invariant checked. If the guard evaluates to False
, we hit the added break
statement, leading to the classic use case of loop invariant reasoning.
Grammar, Parser, and Mechanized Semantics¶
Let us implement loop scopes for minipy. First, as usual, we extend the grammar, add AST node classes and extend the ASTConverter.
MINIPY_GRAMMAR["<compound_stmt>"].insert(0, "<loop_scope>")
MINIPY_GRAMMAR["<loop_scope>"] = ["loop-scope(inv=<expression>):<block>"]
class LoopScope(ASTNode):
def __init__(self, inv: Expression, block: Block):
super().__init__(f"loop-scope(inv={inv.code}):\n{indent(block.code, ' ')}")
self.inv = inv
self.block = block
def transform(self, transformer: Callable[[ASTNode], ASTNode]) -> ASTNode:
return transformer(LoopScope(self.inv.transform(transformer), self.block.transform(transformer)))
class ASTConverter(ASTConverter):
def __init__(self):
super().__init__()
self.stmt_interpretations["<loop_scope>"] = self.transform_loop_scope_stmt
def transform_loop_scope_stmt(self, stmt: ParseTree) -> LoopScope:
m_res = match(("loop-scope(inv=", "<expression>", "):", "<block>"), stmt)
assert m_res is not None
return LoopScope(self.transform_expression(m_res[1]), self.transform_block(m_res[-1]))
def parse(inp: str) -> 'ASTNode':
inp = strip_comments_and_whitespace(inp)
tree = PythonPEGParser(MINIPY_GRAMMAR).parse(inp)[0]
return ASTConverter().transform_top(tree)
find_loop_scopes = """
i = 0
assert Inv(i, needle, haystack)
havoc i
assume Inv(i, needle, haystack)
loop-scope(inv=Inv(i, needle, haystack)):
if i < len(haystack):
if haystack[i] == needle:
break
i = i + 1
continue
else:
return -1
break
return i
"""
find_loop_scopes_ast = parse(find_loop_scopes)
display_program(find_loop_scopes_ast.code)
i = 0
assert Inv(i, needle, haystack)
havoc i
assume Inv(i, needle, haystack)
loop-scope(inv=Inv(i, needle, haystack)):
if (i < len(haystack)):
if (haystack[i] == needle):
break
i = (i + 1)
continue
else:
return -1
break
return i
As we did for method frames in Transparent Symbolic Execution of Function Calls, we can mechanize the semantics of loop scopes by implementing their behavior in the concrete minipy interpreter.
class Interpreter(Interpreter):
def __init__(self):
super().__init__()
self.stmt_interpretations[LoopScope] = self.execute_loop_scope
def execute_loop_scope(self, stmt: LoopScope, environment: Environment) -> None:
try:
self.execute(stmt.block, environment)
assert False
except Exception as exc:
if isinstance(exc, Break):
return
elif isinstance(exc, Continue):
if self.evaluate(stmt.inv, environment):
raise Halt()
else:
raise AssertionError(f"Loop scope invariant {stmt.inv.code} not satisfied.")
else:
raise exc
For experimenting with this implementation, we use a simple loop iterating over a tuple and performing special actions for the magic values 17 and 42:
while i < len(t):
if i == 17:
break
if i == 42:
return i + 1
i = i + 1
Assuming the invariant inv >= 0 and i <= len(t)
, transforming loop as explained above for invariant-based symbolic execution with loop scopes yields the following loop scope statement:
loop_scope_program = """
loop-scope(inv=i >= 0 and i <= len(t)):
if i < len(t):
if i == 17:
break
if i == 42:
return i + 1
i = i + 1
continue
else:
break
"""
loop_scope_program_ast = parse(loop_scope_program)
display_program(loop_scope_program_ast.code)
loop-scope(inv=((i >= 0) and (i <= len(t)))):
if (i < len(t)):
if (i == 17):
break
if (i == 42):
return (i + 1)
i = (i + 1)
continue
else:
break
If the loop is entered and the continue
statement reached, the invariant is verified. If this succeeds, the execution completes due to a raised Halt
exception (which is a special exception type signaling abrupt completion of the execution for a reason other than an error).
i = Variable("i", INT_TYPE)
t = Variable("t", TUPLE_TYPE)
interpreter = Interpreter()
environment = Environment()
environment[i] = 3
environment[t] = (1, 2, 3, 4, 5)
try:
interpreter.execute(loop_scope_program_ast, environment)
except Halt:
print(f"Invariant satisfied (Halt raised). i={environment[i]}")
Invariant satisfied (Halt raised). i=4
If the value of index i
exceeds the tuple size, the loop body is not executed and the break
statement hit instead. The value of i
is not changed.
environment = Environment()
environment[i] = 6 # Out of bounds; loop body not entered
environment[t] = (1, 2, 3, 4, 5)
interpreter.execute(loop_scope_program_ast, environment)
print(f"Program completed normally. i={environment[i]}")
Program completed normally. i=6
The magic value 17 induces an abrupt completion of the body due to a break
statement. The loop scope completes normally, i
remains unchanged.
environment = Environment()
environment[i] = 17 # Magic value
environment[t] = tuple([i for i in range(50)])
interpreter.execute(loop_scope_program_ast, environment)
print(f"Program completed normally. i={environment[i]}")
Program completed normally. i=17
For the magic value 42, a return
is executed.
environment = Environment()
environment[i] = 42 # Magic value
environment[t] = tuple([i for i in range(50)])
try:
interpreter.execute(loop_scope_program_ast, environment)
except Return as ret:
print(f"Program completed because of a return of {ret.value}. i={environment[i]}")
Program completed because of a return of 43. i=42
Since i
is not initialized before the loop and the loop guard only checks if i
is too big, we can assign it a negative initial value leading to a violation of the loop invariant.
environment = Environment()
environment[i] = -3 # out of bounds, but still enters loop body
environment[t] = (1, 2, 3)
try:
interpreter.execute(loop_scope_program_ast, environment)
except AssertionError:
print(f"Loop invariant not satisfied. i={environment[i]}")
Loop invariant not satisfied. i=-2
Symbolic Execution of Loop Scopes¶
Extending the symbolic interpreter with loop scope support is not too difficult since we can copy the code for the most difficult part, the checking of the invariant in the case of a completion due to a continue
statement, from the method handling assert
statements.
class SymbolicInterpreter(SymbolicInterpreter):
def __init__(self,
loop_unrolling_threshold: Optional[int] = None,
predicates: Optional[Dict[str, Tuple[Tuple[z3.SortRef, ...], z3.SortRef, Callable]]] = None):
super().__init__(loop_unrolling_threshold, predicates)
self.stmt_interpretations[LoopScope] = self.execute_loop_scope
def execute_loop_scope(self, stmt: LoopScope, environment: SymbolicEnvironment) -> SET:
tree = self.execute(stmt.block, environment)
result = (SETNode(environment, stmt.code), [tree])
for leaf in get_leaves(tree):
leaf_node, _ = leaf
environment = leaf_node.environment
exc = leaf_node.environment.abrupt_completion
assert exc is not None
if isinstance(exc, Break):
result = replace_in_tree(
result, leaf, (SETNode(leaf_node.environment.set_abrupt_completion(None)), []))
elif isinstance(exc, Continue):
eval_inv_result = self.evaluate(stmt.inv, environment)
children = leaf[1]
for value, constraint in eval_inv_result:
new_env = environment.add_constraint(constraint)
if self.unsatisfiable(new_env):
continue
if isinstance(value, Exception):
exc_node = (SETNode(new_env.set_abrupt_completion(value), ''), [])
children.append(exc_node)
continue
assert isinstance(value, z3.BoolRef)
sat_env = (new_env
.add_constraint(value)
.set_abrupt_completion(Halt()))
viol_env = (new_env
.add_constraint(z3.Not(value))
.set_abrupt_completion(AssertionError()))
sat_env_unsat = self.unsatisfiable(sat_env)
viol_env_unsat = self.unsatisfiable(viol_env)
assert not sat_env_unsat or not viol_env_unsat
if sat_env_unsat:
children.append((SETNode(viol_env, ''), []))
elif viol_env_unsat:
children.append((SETNode(sat_env, ''), []))
else:
children.append((SETNode(sat_env, ''), []))
children.append((SETNode(viol_env, ''), []))
return result
As an example for demonstrating the result of this implementation, we take our linear search program. Its loop invariant is “i stays within 0 and the size of haystack (inclusively), and we did not find needle
at all previous indexes”.
kv = z3.Int("k")
invariant = lambda iv, needlev, haystackv: \
z3.And(
iv >= z3.IntVal(0),
iv <= z3.Length(haystackv),
z3.ForAll(
[kv],
z3.Implies(
z3.And(z3.IntVal(0) <= kv, kv < iv),
z3.Not(haystackv[kv] == needlev)
)
)
)
interpreter = SymbolicInterpreter(predicates={
"Inv": ((z3.IntSort(), z3.IntSort(), z3.SeqSort(z3.IntSort())), z3.BoolSort(), invariant),
"len": ((z3.SeqSort(z3.IntSort()),), z3.IntSort(), z3.Length)
})
needle = Variable("needle", INT_TYPE)
haystack = Variable("haystack", TUPLE_TYPE)
environment = SymbolicEnvironment().set(haystack, haystack.to_z3()).set(needle, needle.to_z3())
result = interpreter.execute(find_loop_scopes_ast, environment)
display_set(result)
What happens if we add a small bug into our implementation? Assume we typed i = i + i
instead i = i + 1
(which is not entirely impossible):
find_loop_scopes_buggy = """
i = 0
assert Inv(i, needle, haystack)
havoc i
assume Inv(i, needle, haystack)
loop-scope(inv=Inv(i, needle, haystack)):
if i < len(haystack):
if haystack[i] == needle:
break
i = i + i
continue
else:
return -1
break
return i
"""
find_loop_scopes_buggy_ast = parse(find_loop_scopes_buggy)
display_program(find_loop_scopes_buggy_ast.code)
i = 0
assert Inv(i, needle, haystack)
havoc i
assume Inv(i, needle, haystack)
loop-scope(inv=Inv(i, needle, haystack)):
if (i < len(haystack)):
if (haystack[i] == needle):
break
i = (i + i)
continue
else:
return -1
break
return i
result = interpreter.execute(find_loop_scopes_buggy_ast, environment)
display_set(result)
Now, we get an AssertionError
node after the SET node for the continue
statement, since the invariant is not preserved by the code.
The only missing puzzle piece is an automatic transformation from loops into loop scopes. Since we do not have to perform transformations inside loop bodies, this is easy to implement. First, however, we define another visitor class to retrieve assigned loop locations for the introduction of the havoc
statements.
class AssignedVariablesVisitor:
def __init__(self):
self.result: Set[str] = set()
def __call__(self, *args, **kwargs):
assert len(args) == 1
assert not kwargs
assert isinstance(args[0], ASTNode)
node: ASTNode = args[0]
if isinstance(node, Assignment):
self.result.add(node.lhs)
return node
@staticmethod
def get_assigned_variables(node: ASTNode) -> Set[str]:
visitor = AssignedVariablesVisitor()
node.transform(visitor)
return visitor.result
The LoopScopeTransformer
class itself is straightforward. It requires a mapping from loop statements to invariant expressions which we insert into the loop scope statements.
class LoopScopeTransformer:
def __init__(self, invariants: Dict[WhileStmt, Expression]):
self.invariants = invariants
def __call__(self, *args, **kwargs):
assert len(args) == 1
assert not kwargs
assert isinstance(args[0], ASTNode)
node: ASTNode = args[0]
if isinstance(node, WhileStmt):
inv = self.invariants[node]
assigned_vars = AssignedVariablesVisitor.get_assigned_variables(node)
result_stmts: List[ASTNode] = [Assert(inv)]
result_stmts.extend([Havoc(variable) for variable in assigned_vars])
result_stmts.append(Assume(inv))
body = IfStmt(
node.guard,
Block(node.body.stmts + [ContinueStmt()]),
Block(([] if node.else_block is None else node.else_block.stmts) + [BreakStmt()])
)
result_stmts.append(LoopScope(inv, Block([body])))
return Stmts(result_stmts)
return node
We transform the body of our linear search program:
find_program = """
i = 0
while i < len(haystack):
if haystack[i] == needle:
break
i = i + 1
else:
return -1
return i
"""
find_program_ast = parse(find_program)
display_program(find_program_ast.code)
i = 0
while (i < len(haystack)):
if (haystack[i] == needle):
break
i = (i + 1)
else:
return -1
return i
The transformation requires that we first extract the loop statement for assigning it its invariant.
loop_stmt = next(iter(
FilterASTVisitor.filter(find_program_ast, lambda node: isinstance(node, WhileStmt))))
display_program(loop_stmt.code)
while (i < len(haystack)):
if (haystack[i] == needle):
break
i = (i + 1)
else:
return -1
loop_scope_transformer = LoopScopeTransformer({loop_stmt: parse_expr("Inv(i, needle, haystack)")})
find_loop_scopes = find_program_ast.transform(loop_scope_transformer)
display_program(find_loop_scopes.code)
i = 0
assert Inv(i, needle, haystack)
havoc i
assume Inv(i, needle, haystack)
loop-scope(inv=Inv(i, needle, haystack)):
if (i < len(haystack)):
if (haystack[i] == needle):
break
i = (i + 1)
continue
else:
return -1
break
return i
This looks as expected! As promised in Transparent Symbolic Execution of Function Calls, we can combine (chain) the transformers for loop scopes and method frames, such that the full version of the linear search program including the method call can be transformed and transparently executed symbolically:
find_program_with_func_decl = """
def find(needle: int, haystack: tuple) -> int:
i = 0
while i < len(haystack):
if haystack[i] == needle:
break
i = i + 1
else:
return -1
return i
result = find(needle, haystack)
"""
find_program_with_func_decl_ast = parse(find_program_with_func_decl)
display_program(find_program_with_func_decl_ast.code)
def find(needle: int, haystack: tuple) -> int:
i = 0
while (i < len(haystack)):
if (haystack[i] == needle):
break
i = (i + 1)
else:
return -1
return i
result = find(needle, haystack)
find_program_with_func_decl_loop_scopes = \
find_program_with_func_decl_ast.transform(loop_scope_transformer)
display_program(find_program_with_func_decl_loop_scopes.code)
def find(needle: int, haystack: tuple) -> int:
i = 0
assert Inv(i, needle, haystack)
havoc i
assume Inv(i, needle, haystack)
loop-scope(inv=Inv(i, needle, haystack)):
if (i < len(haystack)):
if (haystack[i] == needle):
break
i = (i + 1)
continue
else:
return -1
break
return i
result = find(needle, haystack)
method_frame_transformer = MethodFrameTransformer(find_program_with_func_decl_loop_scopes)
find_program_with_func_decl_loop_scopes_method_frames = \
find_program_with_func_decl_loop_scopes.transform(method_frame_transformer)
display_program(find_program_with_func_decl_loop_scopes_method_frames.code)
def find(needle: int, haystack: tuple) -> int:
i = 0
assert Inv(i, needle, haystack)
havoc i
assume Inv(i, needle, haystack)
loop-scope(inv=Inv(i, needle, haystack)):
if (i < len(haystack)):
if (haystack[i] == needle):
break
i = (i + 1)
continue
else:
return -1
break
return i
method-frame(result=var_0):
needle = needle
haystack = haystack
i = 0
assert Inv(i, needle, haystack)
havoc i
assume Inv(i, needle, haystack)
loop-scope(inv=Inv(i, needle, haystack)):
if (i < len(haystack)):
if (haystack[i] == needle):
break
i = (i + 1)
continue
else:
return -1
break
return i
result = var_0
Symbolically executing this code uses sound loop-invariant based loop abstraction thanks to loop scopes, and produces an SET including nodes for all intermediate execution steps thanks to method frames.
result = interpreter.execute(find_program_with_func_decl_loop_scopes_method_frames, environment)
display_set(result)
TODO
Final words, maybe descriptions of related approaches to loop invariant reasoning (loop invariant inference, abstract interpretation, mention that in dynamic SE, pruning/search strategies are used)
References¶
- ABB+16
Wolfgang Ahrendt, Bernhard Beckert, Richard Bubel, Reiner Hähnle, Peter H. Schmitt, and Mattias Ulbrich, editors. Deductive Software Verification – The KeY Book. Volume 10001 of LNCS. Springer, 2016. doi:10.1007/978-3-319-49812-6.
- BCD+05
Mike Barnett, Bor-Yuh Evan Chang, Robert DeLine, Bart Jacobs, and K Rustan M Leino. Boogie: A Modular Reusable Verifier for Object-Oriented Programs. In International Symposium on Formal Methods for Components and Objects, 364–387. Springer, 2005.
- DE82
R.B. Dannenberg and G.W. Ernst. Formal Program Verification Using Symbolic Execution. IEEE Transactions on Software Engineering, SE-8(1):43–52, 1982.
- Hoa69
Charles Antony Richard Hoare. An Axiomatic Basis for Computer Programming. Communications of the ACM, 12(10):576–580, 1969.
- Steinhofel20
Dominic Steinhöfel. Abstract Execution: Automatically Proving Infinitely Many Programs. PhD thesis, TU Darmstadt, Dept. of Computer Science, Darmstadt, Germany, 2020. URL: http://tuprints.ulb.tu-darmstadt.de/8540/, doi:10.25534/tuprints-00008540.
- SteinhofelW17(1,2)
Dominic Steinhöfel and Nathan Wasser. A New Invariant Rule for the Analysis of Loops with Non-standard Control Flows. In Proc. 13th Intern. Conf on Integrated Formal Methods (IFM), 279–294. 2017. doi:10.1007/978-3-319-66845-1\_18.