Heap Reference Analysis for Functional Programs


📝 Original Info

  • Title: Heap Reference Analysis for Functional Programs
  • ArXiv ID: 0710.1482
  • Date: 2007-10-08
  • Authors: Amey Karkare, Amitabha Sanyal, Uday Khedker

📝 Abstract

Current garbage collectors leave a lot of garbage uncollected because they conservatively approximate liveness by reachability from program variables. In this paper, we describe a sequence of static analyses that takes as input a program written in a first-order, eager functional programming language, and finds at each program point the references to objects that are guaranteed not to be used in the future. Such references are made null by a transformation pass. If this makes the object unreachable, it can be collected by the garbage collector. This causes more garbage to be collected, resulting in fewer collections. Additionally, for those garbage collectors which scavenge live objects, it makes each collection faster. The interesting aspects of our method are both in the identification of the analyses required to solve the problem and the way they are carried out. We identify three different analyses: liveness, sharing and accessibility. In liveness and sharing analyses, the function definitions are analyzed independently of the calling context. This is achieved by using a variable to represent the unknown context of the function being analyzed and setting up constraints expressing the effect of the function with respect to the variable. The solution of the constraints is a summary of the function that is parameterized with respect to a calling context and is used to analyze function calls. As a result, we achieve context sensitivity at call sites without analyzing the function multiple times.

📄 Full Content

An object is dead at an execution instant if it is not used in the future. Ideally, garbage collectors should reclaim all objects that are dead at the time of garbage collection. However, even state-of-the-art garbage collectors cannot distinguish between reachable objects that are live and reachable objects that are dead. They therefore conservatively approximate the liveness of an object by its reachability from a set of locations called the root set (stack locations and registers containing program variables). As a consequence, many dead objects are left uncollected. This has been confirmed by empirical studies for Haskell [1], Scheme [2] and Java [3][4][5].

In this paper, we consider a first-order functional language without imperative features and propose a method to release dead objects so that they can be collected by the garbage collector. This is done by detecting unused references to objects and setting them to null. If all references to a dead object are nullified, the object becomes unreachable and may be reclaimed by the garbage collector. We propose three analyses to obtain the information required for nullification: liveness analysis, which computes the live references at each program point (i.e. the references used by the program beyond that point); sharing analysis, which computes alternate ways to access live references; and accessibility analysis, which ensures that the references used by the nullification statement itself exist and do not cause a dereferencing exception. An earlier paper [6] outlined the basic method and provided details of the liveness analysis. This paper brings the theoretical aspects of the method to completion.
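The effect of nullification on a reachability-based collector can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the heap is modeled as a dictionary of cells with `car` and `cdr` fields, the node names `n1` to `n3` and the helper `reachable` are hypothetical, and the decision about which references are dead is simply asserted here rather than computed by the analyses described in the paper.

```python
def reachable(roots, heap):
    """Return the set of heap nodes reachable from the root set."""
    seen = set()
    stack = [node for node in roots.values() if node is not None]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        # Follow the car and cdr fields that are not null.
        stack.extend(t for t in heap[node].values() if t is not None)
    return seen

# Hypothetical heap: n3 is referenced both from n1.cdr and from root y.
heap = {
    "n1": {"car": "n2", "cdr": "n3"},
    "n2": {"car": None, "cdr": None},
    "n3": {"car": None, "cdr": None},
}
roots = {"x": "n1", "y": "n3"}

before = reachable(roots, heap)

# Assume an analysis has shown that n1.cdr is never dereferenced again.
heap["n1"]["cdr"] = None
after_one = reachable(roots, heap)   # n3 is still reachable through y

# Assume root y is also dead from this point on.
roots["y"] = None
after_all = reachable(roots, heap)   # now n3 is unreachable

print(sorted(before - after_all))
```

Nullifying only one of the two references to `n3` frees nothing, which is why every reference to a dead object has to be accounted for before the collector can reclaim it.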

As our analyses are interprocedural in scope, the effect of function calls on the heap must be modeled precisely. Most program analyses are either not scalable, because they analyze the same function more than once, or imprecise, because they make overly conservative worst-case assumptions about the effect of a function on the heap. For a better balance between scalability and precision, one can compute context-independent summaries of the effect of functions on the heap and then use such a summary at each particular calling context of the function [7][8][9]. We do this by using a variable to represent the unknown context of the function being analyzed and setting up constraints expressing the effect of the function with respect to that variable. The set of constraints is viewed as a set of context-free grammars (CFGs), and the solution of these constraints is a set of finite state machines approximating the languages defined by the CFGs. The solution, which is a summary of the function parameterized with respect to a calling context, is used to analyze function calls.
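The idea of a summary that is computed once and then instantiated with each calling context can be sketched as follows. This is a hand-derived illustration, not the paper's constraint system: the notation (`a` for a car selection, `d` for a cdr selection, the empty string for the reference itself), the function names, and the recursive equation for the second argument of `append` are all assumptions made for this sketch. The demand is restricted to a finite set of selector strings so that plain recursion terminates, whereas the paper solves its constraints with grammars and finite state machines.

```python
def strip_cdr(demand):
    """Paths the caller applies below the cdr of a freshly built cons cell."""
    return {path[1:] for path in demand if path.startswith("d")}

def append_lst2_summary(demand):
    """Hypothetical summary: selector paths possibly applied to lst2,
    given the caller's demand on the result of (append lst1 lst2)."""
    if not demand:
        return set()
    # Base case of append: the result is lst2 itself, so the whole demand applies.
    # Recursive case: the result is a cons whose cdr is the recursive result,
    # so only the demand below a leading cdr is passed down.
    return set(demand) | append_lst2_summary(strip_cdr(demand))

# Context 1: the caller evaluates (car (car (cdr w))), i.e. cdr, then car, then car.
demand_deep = {"", "d", "da", "daa"}
# Context 2 (hypothetical): a caller that only evaluates (car w).
demand_shallow = {"", "a"}

print(sorted(append_lst2_summary(demand_deep)))
print(sorted(append_lst2_summary(demand_shallow)))
```

The same summary function serves both call sites; only the demand passed in differs. The result for the first context also contains the paths `d`, `da` and `daa` on `lst2`, because a static analysis must allow for the case where `lst1` is empty and the result is `lst2` itself.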

The main contributions of the paper are as follows. We identify the analyses required to find nullable references at each program point. As part of the analyses, we show how context-independent summaries of functions can be obtained by setting up a set of constraints and solving them by viewing them as a CFG. Finally, we show how the result can be used for safe insertion of nullifying statements in the program.

Figure 1(a) shows an example program. The label π of an expression e denotes the program point just before the evaluation of e.

Figure 1(a): example program

    (define (append lst1 lst2)
      (if (null? lst1)
          lst2
          (cons (car lst1)
                (append (cdr lst1) lst2))))

    (let z ← (cons (cons 4 (cons 5 nil)) (cons 6 nil)) in
      (let y ← (cons 3 nil) in
        πa: (let w ← (append y z) in
          πb: (car (car (cdr w))))))

At any instant during the execution of the program, the heap memory can be viewed as a (possibly unconnected) directed acyclic graph called the memory graph. The elements of the root set are the entry points for the memory graph. The nodes in the memory graph are the cons cells allocated in the heap. There are three kinds of edges in the memory graph: (1) entry edges from an element of the root set to a heap node, (2) edges from the car field of a heap node to another, and (3) edges from the cdr field of a heap node to another. Elements of the basic data types and the 0-ary constructor nil form the leaf nodes of the graph. All data is assumed to be boxed, i.e. stored in heap cells and accessed through references. The edges in the graph are also called links.

The edges shown by thick arrows are those which will be dereferenced beyond πb. These edges are live at πb. Edges that are not live can be nullified by the compiler by inserting suitable statements. These edges are shown with a × in the figure. If an object becomes unreachable due to the nullification of such edges, it can be collected by the garbage collector. Note that an edge need not be nullified if nullifying some other edges makes it unreachable from the root set.
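To make the memory graph of the example concrete, the following Python sketch rebuilds the same lists and records which car and cdr edges are dereferenced from πb onward. It traces a single concrete run and is not the static analysis of the paper. The class `Cons`, the cell names, and the helper functions are hypothetical, integers are stored directly in fields instead of being boxed, and `None` stands in for nil.

```python
class Cons:
    """A heap cell with car and cdr fields; None plays the role of nil."""
    def __init__(self, name, car, cdr):
        self.name, self.car, self.cdr = name, car, cdr

def append(lst1, lst2):
    # Mirrors the example program: copies the spine of lst1 and shares lst2.
    # Field accesses here happen before πb, so they are not recorded.
    if lst1 is None:
        return lst2
    return Cons("copy-of-" + lst1.name, lst1.car, append(lst1.cdr, lst2))

dereferenced = []  # (cell name, field) pairs followed from πb onward

def car(cell):
    dereferenced.append((cell.name, "car"))
    return cell.car

def cdr(cell):
    dereferenced.append((cell.name, "cdr"))
    return cell.cdr

def field_edges(roots):
    """Collect every (cell name, field) edge reachable from the given roots."""
    edges, seen, stack = set(), set(), list(roots)
    while stack:
        cell = stack.pop()
        if not isinstance(cell, Cons) or cell.name in seen:
            continue
        seen.add(cell.name)
        for field in ("car", "cdr"):
            edges.add((cell.name, field))
            stack.append(getattr(cell, field))
    return edges

z = Cons("z1", Cons("c4", 4, Cons("c5", 5, None)), Cons("z2", 6, None))
y = Cons("y1", 3, None)
w = append(y, z)                 # program point πa
result = car(car(cdr(w)))        # program point πb

live = set(dereferenced)
not_live = field_edges([w, y, z]) - live
print(result, dereferenced, len(not_live))
```

In this run only three field edges are followed after πb, while the other field edges reachable from `w`, `y` and `z` are candidates for nullification. The entry edges from `y` and `z` are likewise unused after πb, whereas the entry edge from `w` is live.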

To find out all nullable edges in a memory graph, we need the following analyses:

- For every program point π, liveness analysis finds all the edges in the memory graph that can potentially be dereferenced along some path from π to exit. For the progr

…(Full text truncated)…
