Identifying and Analyzing Pointer-based Dynamic Data Structures

Programs that make heavy use of pointers are notoriously difficult to analyse, especially when the programmer is given the freedom allowed by languages such as C. In this research project we aim to simplify the analysis process by automatically identifying the pointer-based dynamic data structures employed by a program. The scope of this identification includes not only the structure (e.g., a singly linked list) but also its behaviour (e.g., a singly linked list used as a queue or a stack). To identify such behaviour it is necessary to also analyse the associated operations, i.e., those that transform the data structure from one state to another.

Our approach to this problem is a dynamic analysis where the program is instrumented such that the sequence of memory states produced during execution can be recovered. By using techniques from machine learning, we search in this sequence for repeating patterns caused by multiple invocations of a code fragment. If any data structure operations (e.g., insertion at the end of, or deletion in, a singly linked list) were executed, then some of these patterns will represent code fragments of the operations. To label these we employ a pattern matching step that examines the memory transformations due to the identified code fragments. Finally, the combination of operations manipulating a particular data structure allows us to determine its behaviour.

Beyond the obvious benefit of program comprehension, we are currently applying our technique to the verification domain. To this end we are collaborating with verification experts at the University of York, UK in the project Advanced Heap Analysis and Verification, which is supported by the German Research Foundation (DFG, grant no. LU 1748/2-1), and with Dr. Jan Tobias Mühlberg (K.U. Leuven, Belgium).

Contact: Dr. David White

Publications:

  • White, D. H. and Lüttgen, G. Identifying dynamic data structures by learning evolving patterns in memory. In N. Piterman and S. A. Smolka, eds., 19th Intl. Conf. on Tools and Algorithms for the Construction and Analysis of Systems (TACAS 2013), vol. 7795 of Lecture Notes in Computer Science, pp. 354-369, Rome, Italy, March 2013. Springer-Verlag.