Loops in Static Symbolic Execution

import utils
from minipy import *
from transparent_function_execution import *

Loops are a standard obstacle in symbolic execution. The dynamic / concolic variants of SE use concrete inputs as guides, which ensures the termination of the analysis even in the presence of symbolic loop guards. In static SE, termination is easiest ensured by bounded unwinding (see A Symbolic Interpreter and Correctness of Symbolic Transitions). However, this impedes the construction of an exhaustive symbolic execution procedure (see Correctness of Symbolic Transitions), as generally some loop paths will not be covered if the number of loop executions is not bound by a constant.

The standard way to deal with unbounded loops in exhaustive symbolic execution are loop invariants [Hoa69]. Essentially, a loop invariant is an assertion satisfied by the loop after any number of iterations (i.e., the the assertion is invariant with respect to the number of completed loop iterations). Thus, we can abstract a loop by its invariant and continue symbolic execution.

To prove that a formula \(\mathit{Inv}\) is indeed a loop invariant in the first place, we have to assert that

  1. (Initially Valid) \(\mathit{Inv}\) holds initially, that is, right before the very first loop iteration, and

  2. (Preserved) if \(\mathit{Inv}\) holds at the beginning of an arbitrary iteration, \(\mathit{Inv}\) also holds right before the next iteration.

In rule-based symbolic execution systems, loop invariant-based reasoning is performed by loop invariant rules (see, e.g., [DE82] [ABB+16]). Alternatively, we can encode the above two proof cases as well as the case where we assume the invariant and resume symbolic execution (the “Use Case”) into the program. This approach, which is, for example, followed by the Boogie verifier [BCD+05], requires two additional statement types: (1) havoc statements havoc x erase the currently assigned value for the variable x in the symbolic store, assigning it a fresh, symbolic value; and (2) assume statements assume expr, which add the boolean expression expr to the path condition.

Assuming these verification-only statements, we transform a loop as follows, where Inv is a loop invariant and x_1 to x_n are the variables written in the loop body: A loop statement

while guard:
    body
    
remaining_program

is transformed to

assert Inv        # Initially Valid

havoc x_1
havoc x_2
havoc ...
havoc x_n
assume Inv

if guard:
    body
    assert Inv    # Preserved
    assume False

remaining_program # Use Case

The assume False after the “Preserved” assertion indicates that we reached a code path which does not exit the loop and preserves Inv (since the assertion before was satisfied). Assuming False leads to a symbolic state with unsatisfiable path condition, such that symbolic execution stops at that point.

Consider, for example, the following program computing the product of all integers in a tuple t:

i = 0
result = 1
while i < len(t):
    elem = t[i]
    result = result * elem
    i = i + 1
    
return result

One loop invariant for this program is \(\mathtt{i}\geq{}0\wedge\mathtt{i}\leq\mathtt{len(t)}\land\texttt{result % elem} = 0\), informally “i ranges between 0 (before the loop) and, inclusively, len(t) (after loop termination), and the current tuple element is a factor of result”.

Let Inv(i, t, result) denote the invariant expression. Then, the product computation program is transformed to

i = 0
result = 1
elem = 1 

assert Inv(i, t, elem, result)

havoc i
havoc elem
havoc result
assume Inv(i, t, elem, result)

if i < len(t):
    elem = t[i]
    result = result * elem
    i = i + 1
    assert Inv(i, t, elem, result)
    assume False

return result

Loop Invariant Reasoning using Program Transformation

To enable this program transformation-based approach to loop invariant reasoning, we add havoc and assume statements to minipy and the symbolic interpreter. Since these statements work with symbolic expressions and manipulate path constraints, it does not make much sense to integrate them also in the concrete minipy interpreter.

We first extend the minipy grammar.

#% EXPORT
MINIPY_GRAMMAR["<simple_stmt>"].insert(3, "<havoc_stmt>")
MINIPY_GRAMMAR["<simple_stmt>"].insert(6, "<assume_stmt>")
MINIPY_GRAMMAR["<havoc_stmt>"] = ["havoc <NAME>"]
MINIPY_GRAMMAR["<assume_stmt>"] = ["assume <expression>"]

Next, we add AST node classes, extend the AST converter, and re-declare the parse function to use the new ASTConverter version.

class Havoc(ASTNode):
    def __init__(self, variable: str):
        super().__init__("havoc " + variable)
        self.variable = variable

    def transform(self, transformer: Callable[[ASTNode], ASTNode]) -> ASTNode:
        return transformer(self)
class Assume(ASTNode):
    def __init__(self, expression: 'Expression'):
        super().__init__("assume " + expression.code)
        self.expression = expression

    def transform(self, transformer: Callable[[ASTNode], ASTNode]) -> ASTNode:
        return transformer(Assume(self.expression.transform(transformer)))
class ASTConverter(ASTConverter):
    def __init__(self):
        super().__init__()
        self.stmt_interpretations["<havoc_stmt>"] = self.transform_havoc_stmt
        self.stmt_interpretations["<assume_stmt>"] = self.transform_assume_stmt
        
    def transform_havoc_stmt(self, stmt: ParseTree) -> ASTNode:
        m_res = match(("havoc ", "<NAME>"), stmt)
        assert m_res is not None
        return Havoc(tree_to_string(m_res[-1]))
    
    def transform_assume_stmt(self, stmt: ParseTree) -> ASTNode:
        m_res = match(("assume ", "<expression>"), stmt)
        assert m_res is not None
        eval_expr_result = self.transform_expression(m_res[1])
        return Assume(eval_expr_result)
def parse(inp: str) -> ASTNode:
    inp = strip_comments_and_whitespace(inp)
    tree = PythonPEGParser(MINIPY_GRAMMAR).parse(inp)[0]
    return ASTConverter().transform_top(tree)

Finally, we extend the symbolic interpreter.

def z3_fresh_sequence(prefix: str, elem_sort: Optional[z3.SortRef] = None, ctx=None):
    ctx = z3.get_ctx(ctx)
    elem_sort = z3.IntSort(ctx) if elem_sort is None else elem_sort
    return z3.SeqRef(
        z3.Z3_mk_fresh_const(
            ctx.ref(),
            prefix,
            z3.SeqSortRef(z3.Z3_mk_seq_sort(elem_sort.ctx_ref(), elem_sort.ast)).ast),
        ctx)
class SymbolicInterpreter(SymbolicInterpreter):
    def __init__(self,
                 loop_unrolling_threshold: Optional[int] = None,
                 predicates: Optional[Dict[str, Tuple[Tuple[z3.SortRef, ...], z3.SortRef, Callable]]] = None):
        super().__init__(loop_unrolling_threshold, predicates)
        self.stmt_interpretations[Havoc] = self.execute_havoc
        self.stmt_interpretations[Assume] = self.execute_assume_stmt
        
    def execute_havoc(self, stmt: Havoc, environment: SymbolicEnvironment) -> SET:
        variable = environment.store.get_variable(stmt.variable)
        
        if variable is None:
            # This variable has not been assigned a value yet, so there is nothing to havoc
            return SETNode(environment, stmt.code), [(SETNode(environment, None), [])]

        new_env = copy.deepcopy(environment.store.env)
        new_env[variable] = (
            z3.FreshInt(variable.name) if variable.type == INT_TYPE else
            z3.FreshBool(variable.name) if variable.type == BOOL_TYPE else
            z3_fresh_sequence(variable.name)
        )

        new_environment = SymbolicEnvironment(SymbolicStore(new_env),
                                              environment.path_constraints,
                                              environment.abrupt_completion,
                                              environment.functions)
        return SETNode(environment, stmt.code), [(SETNode(new_environment, None), [])]

    def execute_assume_stmt(self, stmt: Assert, environment: SymbolicEnvironment) -> SET:
        eval_expr_result = self.evaluate(stmt.expression, environment)

        children: List[SET] = []
        result = (SETNode(environment, stmt.code), children)

        for value, constraint in eval_expr_result:
            new_env = environment.add_constraint(constraint)
            if isinstance(value, Exception):
                new_env = new_env.set_abrupt_completion(value)
            else:
                assert isinstance(value, z3.BoolRef)
                new_env = new_env.add_constraint(value)

            if not self.unsatisfiable(new_env):
                children.append((SETNode(new_env, ''), []))

        return result

Now, we can symbolically execute our transformed product computation program with invariant reasoning:

program_with_invariants = """
i = 0
result = 1
elem = 1 

assert Inv(i, t, elem, result)

havoc i
havoc elem
havoc result
assume Inv(i, t, elem, result)

if i < len(t):
    elem = t[i]
    result = result * elem
    i = i + 1
    assert Inv(i, t, elem, result)
    assume False

return result
"""

program_with_invariants_ast = parse(program_with_invariants)
display_program(program_with_invariants_ast.code)
i = 0
result = 1
elem = 1
assert Inv(i, t, elem, result)
havoc i
havoc elem
havoc result
assume Inv(i, t, elem, result)
if (i < len(t)):
    elem = t[i]
    result = (result * elem)
    i = (i + 1)
    assert Inv(i, t, elem, result)
    assume False
return result

Instead of using standard minipy expressions for the assertions (which would also have been possible here), we introduce a predicate Inv for the invariant. This has two advantages: First, it enables us to store the invariant formula at a central place, which makes it easier to change it in case of problems. And second, that way we can go beyond mere minipy assertions when designing loop invariants. Most importantly, we can use existential and universal quantifiers. An example for an invariant with quantifiers is provided further below.

invariant = (
    lambda iv, tv, elemv, resultv:
    z3.And(
        iv >= z3.IntVal(0),
        iv <= z3.Length(tv),
        z3.Or(
            z3.And(
                elemv == z3.IntVal(0),
                resultv ==  z3.IntVal(0)
            ),
            z3.And(
                z3.Not(elemv == z3.IntVal(0)),
                resultv % elemv == z3.IntVal(0)
            ))))

interpreter = SymbolicInterpreter(predicates={
    "Inv": ((z3.IntSort(), z3.SeqSort(z3.IntSort()), z3.IntSort(), z3.IntSort()), z3.BoolSort(), invariant),
    "len": ((z3.SeqSort(z3.IntSort()),), z3.IntSort(), z3.Length)
})

t = Variable("t", TUPLE_TYPE)
environment = SymbolicEnvironment().set(t, t.to_z3())

result = interpreter.execute(program_with_invariants_ast, environment)
display_set(result)
_images/static_loops_17_0.svg

Since there are no red nodes in the SET, all invariant checks were passed, and we also did not miss any other exceptional cases! Let us check what happens if we add elem to result instead of computing the product:

faulty_program_with_invariants = """
i = 0
result = 1
elem = 1 

assert Inv(i, t, elem, result)

havoc i
havoc elem
havoc result
assume Inv(i, t, elem, result)

if i < len(t):
    elem = t[i]
    result = result + elem  # <-- "+"" instead of *
    i = i + 1
    assert Inv(i, t, elem, result)
    assume False

return result
"""

faulty_program_with_invariants_ast = parse(faulty_program_with_invariants)
display_program(faulty_program_with_invariants_ast.code)
i = 0
result = 1
elem = 1
assert Inv(i, t, elem, result)
havoc i
havoc elem
havoc result
assume Inv(i, t, elem, result)
if (i < len(t)):
    elem = t[i]
    result = (result + elem)
    i = (i + 1)
    assert Inv(i, t, elem, result)
    assume False
return result
t = Variable("t", TUPLE_TYPE)
environment = SymbolicEnvironment().set(t, t.to_z3())

result = interpreter.execute(faulty_program_with_invariants_ast, environment)
display_set(result)
_images/static_loops_20_0.svg

Now, there is a red node with an AssertionError: As expected, our buggy program mutant does not satisfy the loop invariant.

Next, we have a look at a slightly more complex example: A program filtering tuple values, retaining only the positive ones.

def filter_nonnegative(t: tuple) -> tuple:
    i = 0
    result = tuple()

    while i < len(t):
        if t[i] >= 0:
            result = result + (t[i], )

        i = i + 1

    return result
filter_nonnegative((3, 17, -4, 0, 5, -321))
(3, 17, 0, 5)

We transform the body of the filter_nonnegative function according to our “loop invariants with program transformation” approach:

filter_program_with_invariants = """
i = 0
result = tuple() 

assert Inv(i, t, result)

havoc i
havoc result
assume Inv(i, t, result)

if i < len(t):
    if t[i] >= 0:
        result = result + (t[i], )
    i = i + 1
    assert Inv(i, t, result)
    assume False

return result 
"""

filter_program_with_invariants_ast = parse(filter_program_with_invariants)
display_program(filter_program_with_invariants_ast.code)
i = 0
result = tuple()
assert Inv(i, t, result)
havoc i
havoc result
assume Inv(i, t, result)
if (i < len(t)):
    if (t[i] >= 0):
        result = (result + (t[i], ))
    i = (i + 1)
    assert Inv(i, t, result)
    assume False
return result

The invariant maintained by this program is

  1. i stays between 0 and the size of t

  2. the size of result does not exceed the size of iv

  3. all nonnegative elements of t up to (exclusively) index i are contained in result

kv, lv = z3.Ints("k l")
invariant = lambda iv, tv, resultv: \
    z3.And(
        iv >= z3.IntVal(0),
        iv <= z3.Length(tv),
        z3.Length(resultv) <= iv,
        z3.ForAll(
            [kv],
            z3.Implies(
                z3.And(z3.IntVal(0) <= kv, kv < iv),
                z3.Or(
                    tv[kv] < z3.IntVal(0),
                    z3.Exists(
                        [lv],
                        z3.And(
                            lv >= z3.IntVal(0),
                            lv < z3.Length(resultv),
                            resultv[lv] == tv[kv]
                        )
                    )))))

interpreter = SymbolicInterpreter(predicates={
    "Inv": ((z3.IntSort(), z3.SeqSort(z3.IntSort()), z3.IntSort()), z3.BoolSort(), invariant),
    "len": ((z3.SeqSort(z3.IntSort()),), z3.IntSort(), z3.Length)
})

t = Variable("t", TUPLE_TYPE)
environment = SymbolicEnvironment().set(t, t.to_z3())

result = interpreter.execute(filter_program_with_invariants_ast, environment)
display_set(result)
_images/static_loops_27_0.svg

Note that the invariant occurs in the path constraint of the blue SET node (the “use case”). For any code following the loop, we can (and have to) therefore rely on the information encoded in the loop invariant when accessing values changed by the loop: The loop invariant is an abstraction of the loop’s behavior. In Program Proving, we will have a look at the role of loop invariants in program proving. Reasonable loop invariants should at least exclude non-feasible exceptions raised in the code following the loop (e.g., by limiting the range of variables used for indexing tuples). To show stronger properties (e.g., functional postconditions), we also have to come up with stronger loop invariants.

For a final example, we return to the linear search program:

def find(needle: int, haystack: tuple) -> int:
    i = 0
    while i < len(haystack):
        if haystack[i] == needle:
            break
            
        i = i + 1
    else:
        return -1
    
    return i

Applying the code transformation as before yields (we only show the function’s body):

i = 0

assert Inv(i, needle, haystack)

havoc i
assume Inv(i, needle, haystack)

if i < len(haystack):
    if haystack[i] == needle:
        break
            
    i = i + 1
    assert Inv(i, needle, haystack)
    assume False
else:
    return -1

return i 

Obviously, we missed something: What do we do about the break statement, which now appears outside any while statement? To preserve its semantics, we have to change the program such that no loop code is executed after hitting the break. For this simple example, this is not too difficult:

i = 0

assert Inv(i, needle, haystack)

havoc i
assume Inv(i, needle, haystack)

if i < len(haystack):
    if haystack[i] == needle:
        pass  # <-- formerly break
    else:
        i = i + 1
        assert Inv(i, needle, haystack)
        assume False
else:
    return -1

return i 

It suffices to replace the break by a pass and to embed the remaining loop code inside the else branch of the if. Turning this idea into an automated, general transformation still remains a major challenge with potentially many special cases to consider.

Loop Scopes

An alternative approach which does not require any program transformation of loop bodies was proposed in [SteinhofelW17]. This work introduces so-called loop scope statements, which are similar in spirit to the method frames described in Transparent Symbolic Execution of Function Calls. The idea is that we do not need to transform, for example, the break statement above. Instead, a break statement may legally occur within a loop scope, which “swallows” it and completes normally.

The original loop scope statements from [SteinhofelW17] are statements of the form

loop-scope(x):
    body

where x is the boolean index variable of the loop scope and body its body. Their intuitive semantics is that x is set to False if, and only if, body completes abruptly because of a continue statement, and is set to True otherwise. That is, x indicates if the loop has been exited. Loop scopes were implemented for the rule-based symbolic execution engine KeY1. There, the value of x is then used in an updated postcondition to distinguish whether the loop invariant or the original postcondition has to be shown. It turns out that in our symbolic minipy interpreter, the index x is not needed, since we can simply “catch” the abrupt completion behavior of body and proceed accordingly.

In [Steinhofel20], loop scopes are given a precise, non-mechanized semantics. We adapt this definition to minipy, which involves both changes to the syntax and semantics of loop scopes. Syntactically, we drop the loop scope index x, and instead require a boolean expression as argument: The loop invariant. The syntax of minipy loop scopes is

loop-scope(inv=expr):
    body

where expr is a boolean expression. We also change the semantics; most notably, we check the invariant if the body completes due to a continue statement, and either terminate the execution if the invariant is satisfied (cf. the addition of False to the path constraint in case of the transformation-based approach) or raise an AssertionError. Subsequently, we provide a more precise description.

A loop scope statement is executed by first executing the body body. Then there is a choice:

  • If execution of body completes normally, the behavior of the loop scope statement is undefined.

  • If execution of body completes abruptly because of a break, the loop scope statement completes normally.

  • If execution of body completes abruptly because of a continue, there is a choice:

    • If expr evaluates to True, the execution (of the whole program) completes normally.

    • If expr evaluates to False, the loop scopes statement completes abruptly because of an AssertionError.

  • If execution of body completes abruptly for any other reason, the loop scope statement completes abruptly for the same reason.

This facilitates a different kind of program transformation for loop invariant-based symbolic execution of loops which leaves loop bodies fully intact. A while loop with guard guard, body body, assigned locations x_1 to x_n and invariant expression Inv is transformed to

assert Inv

havoc x_1
# ...
havoc x_n
assume Inv

loop-scope(inv=Inv):    
    if guard:
        body
        continue
    else:
        break

If the loop has an else branch with body else_body, we retain this branch, adding a trailing else statement. We obtain

assert Inv

havoc x_1
# ...
havoc x_n
assume Inv

loop-scope(inv=Inv):    
    if guard:
        body
        continue
    else:
        else_body
        break

Observe that it is not possible for the body of the loop scope to complete normally. If the original body completes normally, the newly added continue statement is reached and the loop invariant checked. If the guard evaluates to False, we hit the added break statement, leading to the classic use case of loop invariant reasoning.

Grammar, Parser, and Mechanized Semantics

Let us implement loop scopes for minipy. First, as usual, we extend the grammar, add AST node classes and extend the ASTConverter.

MINIPY_GRAMMAR["<compound_stmt>"].insert(0, "<loop_scope>")
MINIPY_GRAMMAR["<loop_scope>"] = ["loop-scope(inv=<expression>):<block>"]
class LoopScope(ASTNode):
    def __init__(self, inv: Expression, block: Block):
        super().__init__(f"loop-scope(inv={inv.code}):\n{indent(block.code, '    ')}")
        self.inv = inv
        self.block = block

    def transform(self, transformer: Callable[[ASTNode], ASTNode]) -> ASTNode:
        return transformer(LoopScope(self.inv.transform(transformer), self.block.transform(transformer)))
class ASTConverter(ASTConverter):
    def __init__(self):
        super().__init__()
        self.stmt_interpretations["<loop_scope>"] = self.transform_loop_scope_stmt
        
    def transform_loop_scope_stmt(self, stmt: ParseTree) -> LoopScope:
        m_res = match(("loop-scope(inv=", "<expression>", "):", "<block>"), stmt)
        assert m_res is not None

        return LoopScope(self.transform_expression(m_res[1]), self.transform_block(m_res[-1]))
def parse(inp: str) -> 'ASTNode':
    inp = strip_comments_and_whitespace(inp)
    tree = PythonPEGParser(MINIPY_GRAMMAR).parse(inp)[0]
    return ASTConverter().transform_top(tree)
find_loop_scopes = """
i = 0

assert Inv(i, needle, haystack)

havoc i
assume Inv(i, needle, haystack)

loop-scope(inv=Inv(i, needle, haystack)):
    if i < len(haystack):
        if haystack[i] == needle:
            break
        
        i = i + 1
        continue
    else:
        return -1
        break

return i 
"""

find_loop_scopes_ast = parse(find_loop_scopes)
display_program(find_loop_scopes_ast.code)
i = 0
assert Inv(i, needle, haystack)
havoc i
assume Inv(i, needle, haystack)
loop-scope(inv=Inv(i, needle, haystack)):
    if (i < len(haystack)):
        if (haystack[i] == needle):
            break
        i = (i + 1)
        continue
    else:
        return -1
        break
return i

As we did for method frames in Transparent Symbolic Execution of Function Calls, we can mechanize the semantics of loop scopes by implementing their behavior in the concrete minipy interpreter.

class Interpreter(Interpreter):
    def __init__(self):
        super().__init__()
        self.stmt_interpretations[LoopScope] = self.execute_loop_scope
        
    def execute_loop_scope(self, stmt: LoopScope, environment: Environment) -> None:
        try:
            self.execute(stmt.block, environment)
            assert False
        except Exception as exc:
            if isinstance(exc, Break):
                return
            elif isinstance(exc, Continue):
                if self.evaluate(stmt.inv, environment):
                    raise Halt()
                else:
                    raise AssertionError(f"Loop scope invariant {stmt.inv.code} not satisfied.")
            else:
                raise exc

For experimenting with this implementation, we use a simple loop iterating over a tuple and performing special actions for the magic values 17 and 42:

while i < len(t):
    if i == 17:
        break
    if i == 42:
        return i + 1

    i = i + 1

Assuming the invariant inv >= 0 and i <= len(t), transforming loop as explained above for invariant-based symbolic execution with loop scopes yields the following loop scope statement:

loop_scope_program = """
loop-scope(inv=i >= 0 and i <= len(t)):
    if i < len(t):
        if i == 17:
            break
        if i == 42:
            return i + 1
            
        i = i + 1
        continue
    else:
        break
"""

loop_scope_program_ast = parse(loop_scope_program)
display_program(loop_scope_program_ast.code)
loop-scope(inv=((i >= 0) and (i <= len(t)))):
    if (i < len(t)):
        if (i == 17):
            break
        if (i == 42):
            return (i + 1)
        i = (i + 1)
        continue
    else:
        break

If the loop is entered and the continue statement reached, the invariant is verified. If this succeeds, the execution completes due to a raised Halt exception (which is a special exception type signaling abrupt completion of the execution for a reason other than an error).

i = Variable("i", INT_TYPE)
t = Variable("t", TUPLE_TYPE)

interpreter = Interpreter()

environment = Environment()
environment[i] = 3
environment[t] = (1, 2, 3, 4, 5)

try:
    interpreter.execute(loop_scope_program_ast, environment)
except Halt:
    print(f"Invariant satisfied (Halt raised). i={environment[i]}")
Invariant satisfied (Halt raised). i=4

If the value of index i exceeds the tuple size, the loop body is not executed and the break statement hit instead. The value of i is not changed.

environment = Environment()
environment[i] = 6  # Out of bounds; loop body not entered
environment[t] = (1, 2, 3, 4, 5)

interpreter.execute(loop_scope_program_ast, environment)
print(f"Program completed normally. i={environment[i]}")
Program completed normally. i=6

The magic value 17 induces an abrupt completion of the body due to a break statement. The loop scope completes normally, i remains unchanged.

environment = Environment()
environment[i] = 17  # Magic value
environment[t] = tuple([i for i in range(50)])

interpreter.execute(loop_scope_program_ast, environment)
print(f"Program completed normally. i={environment[i]}")
Program completed normally. i=17

For the magic value 42, a return is executed.

environment = Environment()
environment[i] = 42  # Magic value
environment[t] = tuple([i for i in range(50)])

try:
    interpreter.execute(loop_scope_program_ast, environment)
except Return as ret:
    print(f"Program completed because of a return of {ret.value}. i={environment[i]}")
Program completed because of a return of 43. i=42

Since i is not initialized before the loop and the loop guard only checks if i is too big, we can assign it a negative initial value leading to a violation of the loop invariant.

environment = Environment()
environment[i] = -3  # out of bounds, but still enters loop body
environment[t] = (1, 2, 3)

try:
    interpreter.execute(loop_scope_program_ast, environment)
except AssertionError:
    print(f"Loop invariant not satisfied. i={environment[i]}")
Loop invariant not satisfied. i=-2

Symbolic Execution of Loop Scopes

Extending the symbolic interpreter with loop scope support is not too difficult since we can copy the code for the most difficult part, the checking of the invariant in the case of a completion due to a continue statement, from the method handling assert statements.

class SymbolicInterpreter(SymbolicInterpreter):
    def __init__(self,
                 loop_unrolling_threshold: Optional[int] = None,
                 predicates: Optional[Dict[str, Tuple[Tuple[z3.SortRef, ...], z3.SortRef, Callable]]] = None):
        super().__init__(loop_unrolling_threshold, predicates)
        self.stmt_interpretations[LoopScope] = self.execute_loop_scope
        
    def execute_loop_scope(self, stmt: LoopScope, environment: SymbolicEnvironment) -> SET:
        tree = self.execute(stmt.block, environment)
        result = (SETNode(environment, stmt.code), [tree])

        for leaf in get_leaves(tree):
            leaf_node, _ = leaf
            environment = leaf_node.environment
            exc = leaf_node.environment.abrupt_completion

            assert exc is not None

            if isinstance(exc, Break):
                result = replace_in_tree(
                    result, leaf, (SETNode(leaf_node.environment.set_abrupt_completion(None)), []))
            elif isinstance(exc, Continue):
                eval_inv_result = self.evaluate(stmt.inv, environment)
                children = leaf[1]

                for value, constraint in eval_inv_result:
                    new_env = environment.add_constraint(constraint)
                    if self.unsatisfiable(new_env):
                        continue

                    if isinstance(value, Exception):
                        exc_node = (SETNode(new_env.set_abrupt_completion(value), ''), [])
                        children.append(exc_node)
                        continue

                    assert isinstance(value, z3.BoolRef)
                    sat_env = (new_env
                               .add_constraint(value)
                               .set_abrupt_completion(Halt()))
                    viol_env = (new_env
                                .add_constraint(z3.Not(value))
                                .set_abrupt_completion(AssertionError()))

                    sat_env_unsat = self.unsatisfiable(sat_env)
                    viol_env_unsat = self.unsatisfiable(viol_env)
                    
                    assert not sat_env_unsat or not viol_env_unsat

                    if sat_env_unsat:
                        children.append((SETNode(viol_env, ''), []))
                    elif viol_env_unsat:
                        children.append((SETNode(sat_env, ''), []))
                    else:
                        children.append((SETNode(sat_env, ''), []))
                        children.append((SETNode(viol_env, ''), []))

        return result

As an example for demonstrating the result of this implementation, we take our linear search program. Its loop invariant is “i stays within 0 and the size of haystack (inclusively), and we did not find needle at all previous indexes”.

kv = z3.Int("k")
invariant = lambda iv, needlev, haystackv: \
    z3.And(
        iv >= z3.IntVal(0),
        iv <= z3.Length(haystackv),
        z3.ForAll(
            [kv],
            z3.Implies(
                z3.And(z3.IntVal(0) <= kv, kv < iv),
                z3.Not(haystackv[kv] == needlev)
            )
        )
    )

interpreter = SymbolicInterpreter(predicates={
    "Inv": ((z3.IntSort(), z3.IntSort(), z3.SeqSort(z3.IntSort())), z3.BoolSort(), invariant),
    "len": ((z3.SeqSort(z3.IntSort()),), z3.IntSort(), z3.Length)
})

needle = Variable("needle", INT_TYPE)
haystack = Variable("haystack", TUPLE_TYPE)
environment = SymbolicEnvironment().set(haystack, haystack.to_z3()).set(needle, needle.to_z3())

result = interpreter.execute(find_loop_scopes_ast, environment)
display_set(result)
_images/static_loops_56_0.svg

What happens if we add a small bug into our implementation? Assume we typed i = i + i instead i = i + 1 (which is not entirely impossible):

find_loop_scopes_buggy = """
i = 0

assert Inv(i, needle, haystack)

havoc i
assume Inv(i, needle, haystack)

loop-scope(inv=Inv(i, needle, haystack)):
    if i < len(haystack):
        if haystack[i] == needle:
            break

        i = i + i
        continue
    else:
        return -1
        break

return i 
"""

find_loop_scopes_buggy_ast = parse(find_loop_scopes_buggy)
display_program(find_loop_scopes_buggy_ast.code)
i = 0
assert Inv(i, needle, haystack)
havoc i
assume Inv(i, needle, haystack)
loop-scope(inv=Inv(i, needle, haystack)):
    if (i < len(haystack)):
        if (haystack[i] == needle):
            break
        i = (i + i)
        continue
    else:
        return -1
        break
return i
result = interpreter.execute(find_loop_scopes_buggy_ast, environment)
display_set(result)
_images/static_loops_59_0.svg

Now, we get an AssertionError node after the SET node for the continue statement, since the invariant is not preserved by the code.

The only missing puzzle piece is an automatic transformation from loops into loop scopes. Since we do not have to perform transformations inside loop bodies, this is easy to implement. First, however, we define another visitor class to retrieve assigned loop locations for the introduction of the havoc statements.

class AssignedVariablesVisitor:
    def __init__(self):
        self.result: Set[str] = set()

    def __call__(self, *args, **kwargs):
        assert len(args) == 1
        assert not kwargs
        assert isinstance(args[0], ASTNode)
        node: ASTNode = args[0]

        if isinstance(node, Assignment):
            self.result.add(node.lhs)

        return node

    @staticmethod
    def get_assigned_variables(node: ASTNode) -> Set[str]:
        visitor = AssignedVariablesVisitor()
        node.transform(visitor)
        return visitor.result

The LoopScopeTransformer class itself is straightforward. It requires a mapping from loop statements to invariant expressions which we insert into the loop scope statements.

class LoopScopeTransformer:
    def __init__(self, invariants: Dict[WhileStmt, Expression]):
        self.invariants = invariants

    def __call__(self, *args, **kwargs):
        assert len(args) == 1
        assert not kwargs
        assert isinstance(args[0], ASTNode)
        node: ASTNode = args[0]

        if isinstance(node, WhileStmt):
            inv = self.invariants[node]
            assigned_vars = AssignedVariablesVisitor.get_assigned_variables(node)

            result_stmts: List[ASTNode] = [Assert(inv)]
            result_stmts.extend([Havoc(variable) for variable in assigned_vars])
            result_stmts.append(Assume(inv))

            body = IfStmt(
                node.guard,
                Block(node.body.stmts + [ContinueStmt()]),
                Block(([] if node.else_block is None else node.else_block.stmts) + [BreakStmt()])
            )

            result_stmts.append(LoopScope(inv, Block([body])))

            return Stmts(result_stmts)

        return node

We transform the body of our linear search program:

find_program = """
i = 0
while i < len(haystack):
    if haystack[i] == needle:
        break

    i = i + 1
else:
    return -1

return i
"""

find_program_ast = parse(find_program)
display_program(find_program_ast.code)
i = 0
while (i < len(haystack)):
    if (haystack[i] == needle):
        break
    i = (i + 1)
else:
    return -1
return i

The transformation requires that we first extract the loop statement for assigning it its invariant.

loop_stmt = next(iter(
    FilterASTVisitor.filter(find_program_ast, lambda node: isinstance(node, WhileStmt))))
display_program(loop_stmt.code)
while (i < len(haystack)):
    if (haystack[i] == needle):
        break
    i = (i + 1)
else:
    return -1
loop_scope_transformer = LoopScopeTransformer({loop_stmt: parse_expr("Inv(i, needle, haystack)")})
find_loop_scopes = find_program_ast.transform(loop_scope_transformer)
display_program(find_loop_scopes.code)
i = 0
assert Inv(i, needle, haystack)
havoc i
assume Inv(i, needle, haystack)
loop-scope(inv=Inv(i, needle, haystack)):
    if (i < len(haystack)):
        if (haystack[i] == needle):
            break
        i = (i + 1)
        continue
    else:
        return -1
        break
return i

This looks as expected! As promised in Transparent Symbolic Execution of Function Calls, we can combine (chain) the transformers for loop scopes and method frames, such that the full version of the linear search program including the method call can be transformed and transparently executed symbolically:

find_program_with_func_decl = """
def find(needle: int, haystack: tuple) -> int:
    i = 0
    while i < len(haystack):
        if haystack[i] == needle:
            break
            
        i = i + 1
    else:
        return -1
    
    return i

result = find(needle, haystack)
"""

find_program_with_func_decl_ast = parse(find_program_with_func_decl)
display_program(find_program_with_func_decl_ast.code)
def find(needle: int, haystack: tuple) -> int:
    i = 0
    while (i < len(haystack)):
        if (haystack[i] == needle):
            break
        i = (i + 1)
    else:
        return -1
    return i
result = find(needle, haystack)
find_program_with_func_decl_loop_scopes = \
    find_program_with_func_decl_ast.transform(loop_scope_transformer)
display_program(find_program_with_func_decl_loop_scopes.code)
def find(needle: int, haystack: tuple) -> int:
    i = 0
    assert Inv(i, needle, haystack)
    havoc i
    assume Inv(i, needle, haystack)
    loop-scope(inv=Inv(i, needle, haystack)):
        if (i < len(haystack)):
            if (haystack[i] == needle):
                break
            i = (i + 1)
            continue
        else:
            return -1
            break
    return i
result = find(needle, haystack)
method_frame_transformer = MethodFrameTransformer(find_program_with_func_decl_loop_scopes)
find_program_with_func_decl_loop_scopes_method_frames = \
    find_program_with_func_decl_loop_scopes.transform(method_frame_transformer)
display_program(find_program_with_func_decl_loop_scopes_method_frames.code)
def find(needle: int, haystack: tuple) -> int:
    i = 0
    assert Inv(i, needle, haystack)
    havoc i
    assume Inv(i, needle, haystack)
    loop-scope(inv=Inv(i, needle, haystack)):
        if (i < len(haystack)):
            if (haystack[i] == needle):
                break
            i = (i + 1)
            continue
        else:
            return -1
            break
    return i
method-frame(result=var_0):
    needle = needle
    haystack = haystack
    i = 0
    assert Inv(i, needle, haystack)
    havoc i
    assume Inv(i, needle, haystack)
    loop-scope(inv=Inv(i, needle, haystack)):
        if (i < len(haystack)):
            if (haystack[i] == needle):
                break
            i = (i + 1)
            continue
        else:
            return -1
            break
    return i
result = var_0

Symbolically executing this code uses sound loop-invariant based loop abstraction thanks to loop scopes, and produces an SET including nodes for all intermediate execution steps thanks to method frames.

result = interpreter.execute(find_program_with_func_decl_loop_scopes_method_frames, environment)
display_set(result)
_images/static_loops_74_0.svg

TODO

Final words, maybe descriptions of related approaches to loop invariant reasoning (loop invariant inference, abstract interpretation, mention that in dynamic SE, pruning/search strategies are used)

References

ABB+16

Wolfgang Ahrendt, Bernhard Beckert, Richard Bubel, Reiner Hähnle, Peter H. Schmitt, and Mattias Ulbrich, editors. Deductive Software Verification – The KeY Book. Volume 10001 of LNCS. Springer, 2016. doi:10.1007/978-3-319-49812-6.

BCD+05

Mike Barnett, Bor-Yuh Evan Chang, Robert DeLine, Bart Jacobs, and K Rustan M Leino. Boogie: A Modular Reusable Verifier for Object-Oriented Programs. In International Symposium on Formal Methods for Components and Objects, 364–387. Springer, 2005.

DE82

R.B. Dannenberg and G.W. Ernst. Formal Program Verification Using Symbolic Execution. IEEE Transactions on Software Engineering, SE-8(1):43–52, 1982.

Hoa69

Charles Antony Richard Hoare. An Axiomatic Basis for Computer Programming. Communications of the ACM, 12(10):576–580, 1969.

Steinhofel20

Dominic Steinhöfel. Abstract Execution: Automatically Proving Infinitely Many Programs. PhD thesis, TU Darmstadt, Dept. of Computer Science, Darmstadt, Germany, 2020. URL: http://tuprints.ulb.tu-darmstadt.de/8540/, doi:10.25534/tuprints-00008540.

SteinhofelW17(1,2)

Dominic Steinhöfel and Nathan Wasser. A New Invariant Rule for the Analysis of Loops with Non-standard Control Flows. In Proc. 13th Intern. Conf on Integrated Formal Methods (IFM), 279–294. 2017. doi:10.1007/978-3-319-66845-1\_18.


1

https://www.key-project.org/