Symbolic program analysis and model checking as Satisfiability Modulo Theories (invited talk)

Nikolaj Bjørner

Microsoft Research

Satisfiability Modulo Theories (SMT) solvers offer a compelling match for software tools, since several common software constructs map directly into supported theories. This talk describes the state-of-the art SMT solver Z3, developed by Leonardo de Moura, Christoph Wintersteiger and Nikolaj Bjorner at Microsoft Research. The talk gives an overview of Z3 uses in several program analysis and verification tools, including the SAGE and Pex smart white-box fuzzers that use symbolic execution, software model checkers Yogi, SLAM and Poirot, and program verification systems VCC, F*, Dafny. We then revisit a central problem in program analysis, model checking and verif\ ication - synthesizing sufficient inductive invariants - as solving satisfiability of Horn clauses. We examine algorithms used in software model checking as strategies for solving Horn clauses and we present a new algorithm for solving Horn clauses based on a recent method for symbolic model checking known as IC3 or Property Directed Reachability.

The topics covered in the talk are based on joint work with Leonardo de Moura, Krystof Hoder, Ken McMilland and Andrey Rybalchenko.

The site http://rise4fun.com provides interactive tutorials of Z3 and several tools based on Z3.

About speaker: Nikolaj is a senior researcher at Microsoft Research in Redmond where he works on theorem proving and software engineering.

Slides available as .pdf

Towards Abstract Interpretation of Epistemic Logic

John P. Gallagher

Roskilde University

We present an application of abstract interpretation to a multiagent epistemic logic containing operators for common and distributed knowledge. The motivation is to be able to reason about agents in infinite-state worlds. Firstly, we derive a model checking algorithm directly from the semantics. The algorithm does not terminate for infinite-state models; hence, we consider how to construct an abstract model checker. When applied to the problem of whether M ⊨ φ, where M is a model with an infinite number of states, it terminates and returns the set of states in M at which φ might hold. If the set is empty, then M definitely does not satisfy φ, while if the set is non-empty then M possibly satisfies φ. A typical abstraction (but by no means the only kind) is based on a finite set of properties of interest {p1,…,pk}, for example those atomic propositions appearing in the formula to be checked. Suppose we have a model whose set of states S is infinite and in every state in S, exactly one pi holds. Then the finite partition A = {d1,…,dk} is defined such that di = {sSpiL(s)}. This can be used to construct an abstract domain in a standard way. Then in property-based abstract interpretation, the abstract semantic function returns a set of partitions (an element of ℘(A)). The union of this set of partitions is a superset of the set of concrete states that would be returned by the concrete model checker. We discuss how an agent could make use of this kind of information.

(Joint work with Mai Ajspur. Also appeared at the Scandinavian Logic Symposium 2012.)

Slides available as .pdf

Obfuscation by Partial Evaluation of Distorted Interpreters

Neil D. Jones

University of Copenhagen

How to construct a general program obfuscator? We present a novel approach to automatically generating obfuscated code P' from any program P with source clear code. Start with a (program-executing) interpreter interp for the language in which P is written. Then "distort" interp so it is still correct, but its specialization P' with respect to P is transformed code that is equivalent to the original program, but harder to understand or analyze. Potency of the obfuscator is proved with respect to a general model of the attacker, modeled as an approximate (abstract) interpreter. A systematic approach to distortion is to make program P obscure by transforming it to P' on which (abstract) interpretation is incomplete. Interpreter distortion can be done by making residual in the specialization process sufficiently many interpreter operations to defeat an attacker in extracting sensible information from transformed code. Our method is applied to: code flattening, data-type obfuscation, and opaque predicate insertion. The technique is language independent

(Joint work with Roberto Giacobazzi and Isabella Mastroeni. Also appeared at PEPM'12.)

Slides available as .pdf

Remedying the Eval that Men Do

Anders Møller

Aarhus University

A range of static analysis tools and techniques have been developed in recent years with the aim of helping JavaScript web application programmers produce code that is more robust, safe, and efficient. However, as shown in a previous large-scale study, many web applications use the JavaScript eval function to dynamically construct code from text strings in ways that obstruct existing static analyses. As a consequence, the analyses either fail to reason about the web applications or produce unsound or useless results.

We present an approach to soundly and automatically transform many common uses of eval into other language constructs to enable sound static analysis of web applications. By eliminating calls to eval, we expand the applicability of static analysis for JavaScript web applications in general.

The transformation we propose works by incorporating a refactoring technique into a dataflow analyzer. We report on our experimental results with a small collection of programming patterns extracted from popular web sites. Although there are inevitably cases where the transformation must give up, our technique succeeds in eliminating many nontrivial occurrences of eval.

(Joint work with Simon Holm Jensen and Peter A. Jonsson. Also appeared at ISSTA'12.)

Slides available as .pdf

Formalisation and Analysis of Dalvik Bytecode

Erik Ramsgaard Wognsen

Aalborg University

More than 400 million Android devices have been activated since the introduction of the Android mobile operating system. Its open nature allows third party apps to be downloaded and executed on smartphones, and while this feature is popular with end-users, it also paves the way for malware which may steal private information or abuse services that cost the user money, for example, by secretly sending text messages to expensive premium numbers. To counter this problem, automated tools for analysing and verifying apps are essential, and to ensure the quality and effectiveness of such tools, it is essential to formally specify both semantics and analyses.

We have made the first (to the best of our knowledge) formalisation of the Dalvik bytecode language which is the common executable format for all Android apps. To determine which features to include in the formalisation and analysis, 1,700 Android apps from Android Market were downloaded and examined. The analysis is a flow logic based control flow analysis that includes advanced control flow features such as dynamic dispatch, exceptions, and reflection. The analysis is implemented in a prototype that translates each bytecode instruction in an app to a set of Prolog rules based on the flow logic judgements of the instructions. The generated Prolog program can be queried for any information that the analysis specifies. The analysis is demonstrated on a few small apps by analysing the sources of destination phone numbers for sending text messages.

(Joint work with Henrik Søndberg Karlsen, Mads Chr. Olesen, and René Rydhof Hansen. Also appeared at BYTECODE'12.)

Slides available as .pdf

Measuring Enforcement Windows with Symbolic Trace Interpretation:
What Well-Behaved Programs Say

Bor-Yuh Evan Chang

University of Colorado Boulder

A static analysis design is sufficient if it can prove the property of interest with an acceptable number of false alarms. Ultimately, the only way to confirm that an analysis design is sufficient is to implement it and run it on real-world programs. If the evaluation shows that the design is insufficient, the designer must return to the drawing board and repeat the process -- wasting expensive implementation effort over and over again. In this paper, we make the observation that there is a minimal range of code needed to prove a property of interest under an ideal static analysis; we call such a range of code a validation scope. Armed with this observation, we create a dynamic measurement framework that quantifies validation scopes and thus enables designers to rule out insufficient designs at lower cost. A novel attribute of our framework is the ability to model aspects of static reasoning using dynamic execution measurements. To evaluate the flexibility of our framework, we instantiate it on an example property -- null dereference errors -- and measure validation scopes on real-world programs. We use a broad range of metrics that capture the difficulty of analyzing programs along varying dimensions. We also examine how validation scopes evolve as developers fix null dereference errors and as code matures. We find that bug fixes shorten validation scopes, that longer validation scopes are more likely to be buggy, and that overall validation scopes are remarkably stable as programs evolve.

(Joint work with Devin Coughlin, Amer Diwan, and Jeremy G. Siek. Also appeared at ISSTA'12.)

Dataflow Analysis for Software Product Lines

Claus Brabrand

IT University of Copenhagen

Software product lines (SPLs) are commonly developed using annotative approaches such as conditional compilation that come with an inherent risk of constructing erroneous products. For this reason, it is essential to be able to analyze SPLs. However, as dataflow analysis techniques are not able to deal with SPLs, developers must generate and analyze all valid methods individually, which is expensive for non-trivial SPLs.

In this paper, we demonstrate how to take any intraprocedural standard dataflow analysis and automatically turn it into a feature-sensitive dataflow analysis in three different ways. All are capable of analyzing all valid methods of an SPL without having to generate all of them explicitly. We have implemented all analyses as extensions of SOOT's intraprocedural dataflow analysis framework and experimentally evaluated their performance and memory characteristics on four qualitatively different SPLs. The results indicate that the feature-sensitive analyses are respectively on average three, four, and five times faster than the naive brute force approach on our SPLs, and that they have different time and space tradeoffs.

(Joint work with Marcio Ribeiro, Tarsis Toledo, Johnni Winther, and Paulo Borba. Also appeared at AOSD'12.)

Slides available as .pdf

Automated Detection of Client-State Manipulation Vulnerabilities

Mathias Schwarz

Aarhus University

Web application programmers must be aware of a wide range of potential security risks. Although the most common pitfalls are well described and categorized in the literature, it remains a challenging task to ensure that all guidelines are followed. For this reason, it is desirable to construct automated tools that can assist the programmers in the application development process by detecting weaknesses. Many vulnerabilities are related to web application code that stores references to application state in the generated HTML documents to work around the statelessness of the HTTP protocol. In this paper, we show that such client-state manipulation vulnerabilities are amenable to tool supported detection.

We present a static analysis for the widely used frameworks Java Servlets, JSP, and Struts. Given a web application archive as input, the analysis identifies occurrences of client state and infers the information flow between the client state and the shared application state on the server. This makes it possible to check how client-state manipulation performed by malicious users may affect the shared application state and cause leakage or modifications of sensitive information. The warnings produced by the tool help the application programmer identify vulnerabilities. Moreover, the inferred information can be applied to configure a security filter that automatically guards against attacks. Experiments on a collection of open source web applications indicate that the static analysis is able to effectively help the programmer prevent client-state manipulation vulnerabilities.

(Joint work with Anders Møller. Also appeared at ICSE'12.)