Syntax and Semantics of Symbolic States¶

import utils
from minipy import *
from constraint_solving import *

Symbolic states are the basic building blocks of symbolic execution. They differ from concrete execution states inasmuch that program variables are assigned symbolic instead of concrete expressions, and that they include an additional path constraint accumulating decisions taken during symbolic execution.

Definition (Symbolic Execution State). A Symbolic (Execution) State (SES) is a triple \((\mathit{Constraint}, \mathit{Store}, \mathit{PC})\) of (1) a path constraint \(\mathit{Constraint}\), formally a set of formulas over program variables, (2) a symbolic store \(\mathit{Store}\), formally a set of assignments of program variables to symbolic expressions over program variables, and (3) a program counter \(\mathit{PC}\), formally a minipy program. We write \(\mathit{SEStates}\) for the set of all symbolic states.

Symbolic values in minipy are arithmetic expressions, boolean expressions, or sequences. Instead of defining our own language for symbolic expressions, we go for a shallow embedding into the language of the z3 SMT solver 1. To represent tuples, we use z3’s SeqRef type for sequences. SeqRef does not permit expressing nested sequences. To add nested sequences as symbolic values, we would have to go for a deeper embedding, which we abstain from for simplicity.

from typing import Union
import z3

SymbolicValueType = Union[z3.ArithRef, z3.BoolRef, z3.SeqRef]

We first need a procedure to produce a symbolic z3 expression for a minipy Variable:

class Variable(Variable):
    def to_z3(self):
        if self.type == INT_TYPE:
            return z3.Int(self.name)
        elif self.type == BOOL_TYPE:
            return z3.Bool(self.name)
        elif self.type == TUPLE_TYPE:
            return z3_sequence(self.name)

Consider, for example, the program x = x + 1, which increments the initial, symbolic value of the integer variable x. The resulting symbolic value is

x = Variable("x", INT_TYPE)
symbolic_increment_expression = x.to_z3() + z3.IntVal(1)
symbolic_increment_expression

x + 1

type(symbolic_increment_expression).__name__

'ArithRef'

A symbolic sequence (which represents a non-nested minipy tuple) is defined the z3.Unit and z3.Concat primitives. We define a helper function transforming Python integer lists to z3 sequences:

def int_list_to_seq(l: Union[List[int], Tuple[int, ...]]) -> z3.SeqRef:
    if not l:
        return z3.Empty(z3.SeqSort(z3.IntSort()))
    elif len(l) == 1:
        return z3.Unit(z3.IntVal(l[0]))
    else:
        return z3.Concat(*[z3.Unit(z3.IntVal(elem)) for elem in l])

int_list_to_seq([1, 2, 3])

Concat(Unit(1), Concat(Unit(2), Unit(3)))

int_list_to_seq([1, 2, 3])[z3.IntVal(1)]

Nth(Concat(Unit(1), Concat(Unit(2), Unit(3))), 1)

z3.simplify(int_list_to_seq([1, 2, 3])[z3.IntVal(1)])

2

We can now implement a symbolic store. In essence, a symbolic store is just a mapping from variables to symbolic values (our implementation contains some “gimmicks” like a suitable equality check and conversion to pretty-printed HTML code):

from IPython.core.display import display, HTML

def display_html(inp: str):
    display(HTML(inp))

class SymbolicStore:
    def __init__(self, env: Optional[Dict[Variable, SymbolicValueType]] = None):
        if env is None:
            self.env: Dict[Variable, SymbolicValueType] = {}
        else:
            self.env = env

    def __eq__(self, other):
        return isinstance(other, SymbolicStore) and self.env == other.env
    
    def get_variable(self, name: str) -> Optional[Variable]:
        maybe_var = [variable for variable in self.env if variable.name == name]
        assert len(maybe_var) <= 1
        return None if not maybe_var else maybe_var[0]

    
    def to_html(self):
        prev_html = z3.in_html_mode()
        z3.set_html_mode(True)

        result = "{" + ", ".join([str(var) + ": " + z3_html_escape(z3.obj_to_string(val))
                                  for var, val in self.env.items()]) + "}"
        z3.set_html_mode(prev_html)

        return result
    
    def display(self):
        display_html(self.to_html())

    def __str__(self):
        return str({str(var): str(val) for var, val in self.env.items()})
    
    def __repr__(self):
        return f"SymbolicStore({repr(self.env)})"


def z3_html_escape(html_input: str) -> str:
    return html_input.replace("<", "&lt;").replace(">", "&gt;")

We overload the Python’s __getitem__ function to access elements of the store using dictionary syntax store[variable]:

class SymbolicStore(SymbolicStore):
    def __getitem__(self, item: Union[str, Variable]):
        if item in self.env:
            return self.env[item]
        else:
            if isinstance(item, str):
                maybe_var = self.get_variable(item)
                assert maybe_var, f"Unknown variable {item}"
                return self.env[maybe_var]

            raise AttributeError(f"Attempt to read uninitialized variable {item}")

We also need a setter function. We design symbolic stores to be immutable, which is why the setter returns a new store:

class SymbolicStore(SymbolicStore):
    def set(self, variable: Variable, value: SymbolicValueType) -> 'SymbolicStore':
        new_env = copy.deepcopy(self.env)
        new_env[variable] = z3.simplify(value)
        return SymbolicStore(new_env)

Back to our x = x + 1 example: Assume we start symbolic execution in a state where variable x is assigned its unknown, initial symbolic value:

store = SymbolicStore({x: x.to_z3()})
store.display()

{x: x}

Now, we update the value of x to x + 1:

store = store.set(x, x.to_z3() + z3.IntVal(1))
store.display()

{x: 1 + x}

What happens when we do that once again?

store = store.set(x, x.to_z3() + z3.IntVal(1))
store.display()

{x: 1 + x}

This could have been expected form our definition of the set method; however, it is not desirable: Since the current value of x was already x + 1, it should be x + 2 after adding 1 another time! We have to consider the current assignments of variables occurring in symbolic expressions. In other words, we have to apply existing assignments to symbolic expressions. Consequently, we define an apply_to method:

def subst(expr: z3.ExprRef, subst_map: Dict[z3.ExprRef, z3.ExprRef]) -> z3.ExprRef:
    assert all(z3.is_expr(lhs) for lhs in subst_map.keys())
    assert all(z3.is_expr(rhs) for rhs in subst_map.values())
    assert all(lhs.sort().eq(rhs.sort()) for lhs, rhs in subst_map.items()), \
           "LHS sorts don't equal RHS sorts: " + \
           str({f"{lhs} ({lhs.sort()})" : f"{rhs} ({rhs.sort()})"
               for lhs, rhs in subst_map.items()
               if not lhs.sort().eq(rhs.sort())})
    
    return z3.substitute(expr, *tuple(subst_map.items()))


class SymbolicStore(SymbolicStore):
    def apply_to(self, onto: SymbolicValueType) -> SymbolicValueType:
        return z3.simplify(
            subst(onto, {variable.to_z3(): self.env[variable]
                         for variable in self.env}))

Let’s try it out:

store = SymbolicStore({x: x.to_z3() + z3.IntVal(1)})
store.apply_to(x.to_z3() + z3.IntVal(1))

2 + x

z3.simplify(store.apply_to(x.to_z3() + z3.IntVal(1)))

2 + x

We use this to correct our set method:

class SymbolicStore(SymbolicStore):
    def set(self, variable: Variable, value: SymbolicValueType) -> 'SymbolicStore':
        new_env = copy.deepcopy(self.env)
        new_env[variable] = self.apply_to(value)
        return SymbolicStore(new_env)

store = SymbolicStore({x: x.to_z3() + z3.IntVal(1)})
store.display()

{x: x + 1}

store = store.set(x, x.to_z3() + z3.IntVal(1))
z3.simplify(store[x])

2 + x

We call our Python representation of symbolic states symbolic environments to align with our concrete minipy interpreter. Analogously to concrete environments, a symbolic environment contains a store and a repository of registered functions. Additionally, it comprises the set of path constraints. Symbolic environments are immutable. We define a function unsatisfiable which returns True if if can be shown that the set of path constraints is unsatisfiable, which is vital for symbolic execution: It enables a better performing analysis since we do not have to follow infeasible branches, and is necessary in verification to judge whether, e.g., assertions are met or failed.

import logging
from typing import Set

class SymbolicEnvironment:
    def __init__(self,
                 store: Optional[SymbolicStore] = None,
                 path_constraints: Optional[Set[z3.BoolRef]] = None,
                 functions: Optional[Dict[str, Tuple[Tuple[Variable], Type, Callable]]] = None):
        self.store: SymbolicStore = SymbolicStore() if store is None else store
        self.path_constraints: Set[z3.BoolRef] = (
            set() if path_constraints is None
            else set([z3.simplify(path_constraint) for path_constraint in path_constraints]))
        self.functions: Dict[str, Tuple[Tuple[Variable], Type, Callable]] = \
            {} if functions is None else copy.deepcopy(functions)

    def __getitem__(self, item: Union[str, Variable]) -> SymbolicValueType:
        return self.store[item]

    def set(self, variable: Variable, value: SymbolicValueType) -> 'SymbolicEnvironment':
        return SymbolicEnvironment(self.store.set(variable, value), self.path_constraints,
                                   self.functions)

    def add_constraints(self, *constraints: z3.BoolRef) -> 'SymbolicEnvironment':
        result = self
        for constraint in constraints:
            result = result.add_constraint(constraint)
        return result

    def add_constraint(self, constraint: z3.BoolRef) -> 'SymbolicEnvironment':
        if z3.is_false(z3.simplify(constraint)):
            new_constraints = {z3.BoolVal(False)}
        elif z3.is_true(z3.simplify(constraint)):
            return self
        else:
            old_constraints = {other_constraint for other_constraint in self.path_constraints
                               if not implies(constraint, other_constraint)}
            new_constraints = old_constraints | {constraint}


        return SymbolicEnvironment(self.store, new_constraints, self.functions)

    def add_function(self, name: str, params: Tuple[Variable], type: Type, impl: Callable) -> 'SymbolicEnvironment':
        new_functions = copy.deepcopy(self.functions)
        new_functions[name] = (params, type, impl)
        return SymbolicEnvironment(self.store, self.path_constraints, new_functions)
    
    def unsatisfiable(self) -> bool:
        return is_unsat(z3_and(*[constraint for constraint in self.path_constraints]))
        
    def to_html(self):
        prev_html = z3.in_html_mode()
        z3.set_html_mode(True)

        result = f"({self.path_constraint_to_html()}, {self.store.to_html()})"

        z3.set_html_mode(prev_html)
        return result

    def path_constraint_to_html(self):
        prev_html = z3.in_html_mode()
        z3.set_html_mode(True)

        path_constraint_string = "{"
        if self.path_constraints:
            path_constraint_string += ", ".join(map(z3_html_escape, map(z3.obj_to_string, self.path_constraints)))
        path_constraint_string += "}"
        path_constraint_string = path_constraint_string.replace("\n", "<BR/>").replace(" ", "&nbsp;")

        z3.set_html_mode(prev_html)
        return path_constraint_string
    
    def display(self):
        display_html(self.to_html())
    
    def __str__(self):
        return "(" + ("{}" if not self.path_constraints else str(self.path_constraints)) + ", " + str(self.store) + ")"

    def __repr__(self):
        result = f"SymbolicEnvironment({repr(self.store)}, " \
                 f"{repr(self.path_constraints)}, "
        result += repr({f_name: f"fun {f_name}{params} -> "
                                f"{type}" for f_name, (params, type, _) in self.functions.items()})
        return f"{result})"

    def __eq__(self, other):
        return isinstance(other, SymbolicEnvironment) and self.store == other.store and \
               self.path_constraints == other.path_constraints and \
               self.functions == other.functions

To illustrate the idea of symbolic environments with path conditions, consider the following minipy program:

x = 2 * y
if x < 0:
    x = -x

SE starts with an empty environment assigning symbolic values to both x and y:

x = Variable("x", INT_TYPE)
y = Variable("y", INT_TYPE)
env = SymbolicEnvironment(SymbolicStore({x: x.to_z3(), y: y.to_z3()}))

Executing the first assignment updates the symbolic store:

env = env.set(x, z3.IntVal(2) * y.to_z3())
env.display()

({}, {x: 2·y, y: y})

Next, we perform a case distinction on the guard of the if statement, which yields two environments:

env_1 = env.add_constraint(env.store.apply_to(x.to_z3() < z3.IntVal(0)))
env_2 = env.add_constraint(env.store.apply_to(z3.Not(x.to_z3() < z3.IntVal(0))))

env_1.display()

({¬(y ≥ 0)}, {x: 2·y, y: y})

env_2.display()

({y ≥ 0}, {x: 2·y, y: y})

Note that we applied the initial assignment to the path constraint. For the env_1 environment, where we entered the then branch of the if statement, we now execute the inversion of x:

env_1_1 = env_1.set(x, z3.IntVal(-1) * x.to_z3())
env_1_1.display()

({¬(y ≥ 0)}, {x: -2·y, y: y})

What is the semantics of the environment env_1_1? Imagine that we did start the example program with a concrete value for y (the initial value of x is irrelevant, since it is overwritten):

y = -2

x = 2 * y
if x < 0:
    x = -x
    
x, y

(4, -2)

Obviously, for an initial negative value of y, the final value of x is twice that inverted value. The astute reader might recognize the similarity to the symbolic environment {¬(y ≥ 0)}, {x: -2·y, y: y}): Assuming that y is initially negative, the final value of y is unchanged, while the final value of x twice the inverted initial value of y. The semantics of symbolic states is defined just like that: For any initial concrete state satisfying the path constraint, the symbolic state represents the assignments resulting from evaluating variables in the symbolic store in that concrete state. This process is called the concretization of the symbolic state. If we concretize a symbolic state from any concrete state, we obtain its full semantics.

We first define the concretization of symbolic stores. Let \(\mathit{ConcrStates}\) be the set of all concrete states (mappings from program variables to concrete values). Then, the concretization of a symbolic store \(\mathit{Store}\) with respect to a concrete store \(\sigma\in\mathit{ConcrStates}\) is a concrete store \(\sigma'\in\mathit{ConcrStates}\) such that (1) for all \(\mathtt{x}\) in the environment of \(\mathit{Store}\), \(\sigma'(\mathtt{x})\) equals the right-hand side of \(\mathtt{x}\) in \(\mathit{Store}\) when evaluating all occurring program variables in \(\sigma\), and (2) \(\sigma(\mathtt{y})=\sigma'(\mathtt{y})\) for all other program variables \(\mathtt{y}\) not occurring in \(\mathit{Store}\). This is easy to implement for our SymbolicStore class:

class SymbolicStore(SymbolicStore):
    def concretize(self, base_store: Store) -> Store:
        subst_map = {variable.to_z3(): python_expr_to_z3_expr(base_store.env[variable]) for variable in base_store.env}
        new_env = {variable: z3_expr_to_python_expr(z3.simplify(subst(self.env[variable], subst_map)))
                   for variable in self.env}
        new_env.update({variable: value for variable, value in base_store.env.items() if variable not in self.env})
        return Store(new_env)

The method concretize depends on two functions converting back and forth between Python / minipy and z3 expressions:

def python_expr_to_z3_expr(expr: Union[int, bool, tuple]) -> Union[z3.ArithRef, z3.BoolRef, z3.SeqRef]:
    if type(expr) is int:
        return z3.IntVal(expr)
    elif type(expr) is bool:
        return z3.BoolVal(expr)
    elif type(expr) is tuple:
        return int_list_to_seq(expr)

    assert False

def z3_expr_to_python_expr(expr: Union[z3.IntNumRef, z3.BoolRef, z3.SeqRef]) -> Union[int, bool, tuple]:
    if isinstance(expr, z3.IntNumRef):
        return expr.as_long()
    elif isinstance(expr, z3.BoolRef):
        if z3.is_true(expr):
            return True
        elif z3.is_false(expr):
            return False

        assert False
    elif isinstance(expr, z3.SeqRef):
        if expr.decl().kind() == z3.Z3_OP_SEQ_EMPTY:
            return tuple()

        units = split_app(expr, z3.Z3_OP_SEQ_CONCAT)
        assert all(u.decl().kind() == z3.Z3_OP_SEQ_UNIT for u in units)
        return tuple([u.children()[0].as_long() for u in units])
    
    assert False
    

def split_app(expr: z3.ExprRef, kind: int) -> List[z3.ExprRef]:
    if z3.is_app(expr) and expr.decl().kind() == kind:
        return [elem for l in [split_app(child, kind) for child in expr.children()] for elem in l]
    else:
        return [expr]

store = SymbolicStore({x: z3.IntVal(-2) * y.to_z3(), y: y.to_z3()})
print(store.concretize(Store({y: -2})))

{x: 4, y: -2}

The following definition of the concretization function is a simplified version of the original definition in [Steinhofel20]: We assume that there is one fixed interpretation for all function symbols occurring in symbolic states (e.g., for arithmetic functions such as addition) and that there do not appear uninterpreted constants. Thus, we do not need to index the concretization function with a first-order structure. The definition furthermore assumes a set \(\mathit{ConcrStates}\) of concrete states (mappings from program variables to concrete values) and a transition relation \(\rho(p)\) for every program \(p\) which consists of all pairs \((\sigma, \sigma')\) such that executing \(p\) in the state \(\sigma\in\mathit{ConcrStates}\) results in the state \(\sigma'\in\mathit{ConcrStates}\).

Definition (Concretization Function). The concretization function \(\mathit{concr}: \mathit{SEStates} \, \times \, \mathit{ConcrStates} \rightarrow 2^{\mathit{ConcrStates}}\) maps an SES \((\mathit{Constraint},\mathit{Store},\mathit{PC})\) and a concrete state \(\sigma\in\mathit{ConcrStates}\) (1) to the empty set \(\emptyset\) if either the the set \(\mathit{Constraint}\) does not hold in \(\sigma\), or, where \(\sigma'\in\mathit{ConcrStates}\) is the concretization of \(\mathit{Store}\) with respect to \(\sigma\), there is no \(\sigma''\) such that \((\sigma', \sigma'')\in\rho(\mathit{PC}))\), or otherwise (2) the singleton set \(\{\sigma''\}\) such that \((\sigma', \sigma'')\in\rho(\mathit{PC}))\), where \(\sigma'\) is as before.

In other words: If the path constraint is not satisfied by the given initial concrete store, the concretization is empty. Otherwise, the concretization is the concrete state resulting from executing the program counter starting in the concretization of the symbolic store starting in the initial concrete store. If the program counter diverges in this situation, the concretization is also empty.

We extend our implementation of SymbolicEnvironment by a function concretize, which implements this definition up to the execution of the remaining program counter. Instead of returning either an empty set of a singleton set, we return an optional result.

class SymbolicEnvironment(SymbolicEnvironment):
    def concretize(self, base_store: Store) -> Optional[Store]:
        subst_map = {variable.to_z3(): python_expr_to_z3_expr(base_store.env[variable]) for variable in base_store.env}

        concretized_path_cond = subst(z3_and(*self.path_constraints), subst_map)

        if not is_unsat(z3.Not(concretized_path_cond)):
            return None

        return self.store.concretize(base_store)

env_1_1.display()

({¬(y ≥ 0)}, {x: -2·y, y: y})

print(env_1_1.concretize(Store({y: -2})))

{x: 4, y: -2}

print(env_1_1.concretize(Store({y: 2})))

None

The full concr function 2, including the execution of the program counter, can be implemented using our minipy interpreter:

def concr(ses: SymbolicEnvironment, pc: str, sigma: Store) -> Optional[Store]:
    sigma_1 = ses.concretize(sigma)
    if sigma_1 is None:
        return None
    
    interpreter = Interpreter()
    environment = Environment(sigma_1)
    interpreter.execute(parse(pc), environment)
    sigma_2 = environment.store
    
    return sigma_2

We compute the concretization for the symbolic state arising after the execution of the initial assignment x = 2 * y of our example program, for the initial state assigning -2 to y. The program counter in this case is the if statement following the initial assignment.

x = Variable("x", INT_TYPE)
y = Variable("y", INT_TYPE)

pc = """
if x < 0:
    x = -x
"""

sigma_2 = concr(
    SymbolicEnvironment(SymbolicStore({x: z3.IntVal(2) * y.to_z3(), y: y.to_z3()})),
    pc,
    Store({y: -2})
)

print(sigma_2)

{x: 4, y: -2}

For a different initial state:

sigma_2 = concr(
    SymbolicEnvironment(SymbolicStore({x: z3.IntVal(2) * y.to_z3(), y: y.to_z3()})),
    pc,
    Store({y: 2})
)

print(sigma_2)

{x: 4, y: 2}

The method concretize is also practically useful, as we will see when implementing a concolic interpreter: We maintain both a concrete and a symbolic state, and concretize the symbolic state with respect to the concrete state to decide which branch of, e.g., an if statement to follow.

The “full semantics” of symbolic states is the union over the concretization for all possible initial concrete states. We can compute a poor approximation by generating a finite amount of initial states randomly (this demo implementation only works for integer variables, only generates values in a pre-defined range, and only produces initial states which assign values to exactly the variables assigned in the symbolic environment):

import random

def ses_semantics(ses: SymbolicEnvironment, pc: str) -> Generator[Store, None, None]:
    while True:
        sigma = Store({var: random.randint(-100, 100) for var in ses.store.env})
        yield concr(ses, pc, sigma)

env = SymbolicEnvironment(SymbolicStore({x: z3.IntVal(2) * y.to_z3(), y: y.to_z3()}))

for _ in range(10):
    print(next(ses_semantics(env, pc)))

{x: 132, y: 66}
{x: 180, y: -90}
{x: 158, y: 79}
{x: 116, y: 58}
{x: 36, y: 18}
{x: 128, y: 64}
{x: 20, y: -10}
{x: 46, y: 23}

{x: 46, y: -23}
{x: 118, y: -59}

In the next section, we extend our concrete minipy interpreter to a baseline symbolic interpreter. In that section, we also derive our correctness properties precision and exhaustiveness for symbolic transition relations.

References¶

Steinhofel20: Dominic Steinhöfel. Abstract Execution: Automatically Proving Infinitely Many Programs. PhD thesis, TU Darmstadt, Dept. of Computer Science, Darmstadt, Germany, 2020. URL: http://tuprints.ulb.tu-darmstadt.de/8540/, doi:10.25534/tuprints-00008540.

1: https://github.com/Z3Prover/z3
2: Technically, it is not the full concr function, since there is, of course, no way to decide whether the program counter will terminate. In the theoretical definition, we assume that in the case of nontermination, there is no corresponding pair in the transition relation \(\rho(\mathit{PC})\). Practically, concr will diverge if, and only if, the program counter diverges in the concretization of the symbolic store with respect to the given initial concrete store.

Symbolic Execution: Foundations, Techniques, Applications and Future Perspectives

Syntax and Semantics of Symbolic States¶

References¶