Bezpieczna dealokacja obiektów

Z Lem
Skocz do: nawigacji, wyszukiwania

Explicit dealocation without dangling references

by Andrzej Salwicki, Andrzej Zadrozny

a preliminary version

Abstract. This paper appears 30 years after the article by Gianna Cioni and Antoni Kreczmar under the same title. In the original paper [Kreczmar 1987], Kreczmar introduced a memory management system for dynamic allocation of objects with a few interesing properties. While leaving the programmer in control of object dealloacation (which in Kreczmar’s system is explicit), the system is free of the dangling reference problem, i.e., following the explicit deallocation of an object, all references to the object within the program become automatically nullified. In the present paper, we compare the memory management system described in [Cioni and Kreczmar 1984] with the solutions that appeared later (cf. Table I. below). We also give a formal specification for Kreczmar’s allocator in the form of a formalized algorithmic theory ATHM and sketch the construction of a programming model of the specification. Thus we prove that the specification is consistent, i.e., free of contradictions, and hence it is feasible.

Keywords: programming languages, objects, garbage collection, dangling reference, safety

Introduction

The management of objects, especially the problem of efficient and safe memory allocation, plays an essential role in object-oriented programming. A practical system addressing the problem must be safe (i.e., a deallocated object should be clearly perceived as absent from all places in the program, preventing an unintended interpretation of “bad memory” as the contents of an object that formally no longer exists) and efficient, meaning that the cost of referencing an object (dereferencing an object pointer) cannot increase too significantly beyond the cost of a simple (machine-level) indirect memory reference. Typically, the dynamic instances of a program’s modules reside in two disjoint areas in memory: the stack and the heap. The activation records of functions, methods, contructors, and blocks are allocated on the stack, while the dynamic instances of structures, classes, and, sometimes, more exotic objects, like coroutines (for those languages that admit such objects) are stored in the heap. That latter category of objects is of our primary concern. Owing to the fact that, unlike the former category, the patterns of their creation and destruction do not follow a simple ordered paradigm, but are essentially random and arbitrary, the mechanisms of their allocation and requisite bookkeeping are open for discussion and invention, being thus also susceptible to trends. The first generation of popular programming languages with rudiments of objects: Pascal and C, opted for simplicity (read efficiency) at the cost of safety. Following the (explicit) deallocation of an object, any replicate references to the object within the program would still point to the old chunk of memory originally used to store the object, thus creating a hazard. It was up to the programmer to sidestep that hazard by not touching the old references (pointers) until reset. Note that a mistake consisting in a reference to a deallocatad object would not, in those circumstances, translate into an immediate error: a read access would return an incorrect result (that might trigger an error a potentially long while later), while a write access would potentially corrupt some legitimate object, often with delayed symptoms difficult to diagnose. When C++ appeared later, inheriting and absorbing the legacy of C under its umbrella, it did not revolutionize the paradigm of memory allocation for objects, primarily for two reasons:

  1. C++ still preferred efficiency over safety (it was supposed to replace C without affecting the implementation aspects of its C subset)
  2. C++ wanted to be compatible with C (C functions, using the traditional memory allocator, had to be able to coexist with C++ functions within the same program allowing both types of functions to operate on the same pointers).

Java brought a response to the lack of safety inherent in the traditional approach to memory allocation for objects in the postulate to completely hide the action of object deallocation from the programmer’s view. By abandoning pointers (plus the large bag of pointer trick available to experts in C and C++), and replacing deallocation with the implicit and invisible garbage collection, Java also simplified object-oriented programming, thus making it accessible to a wider population of programmers (who no longer needed to be experts on such low-level and mundane aspects of computing as pointers). C#, introduced by Microsoft as their own “Java”, was basically the same creation when seen from the viewpoint of run-time memory management. That global paradigm shift was compatible with the rapid advancements in CPU and RAM technologies making large and sloppy (less efficient) programs vastly more acceptable than in the early days of C++. These days, few people care whether the efficiency of non-number-crunching program can be improved by the factor of 10 or 20 in terms of execution speed or memory demands for as long as the program appears to work. Following the short initial euphoria, the sealed, one-size-fits-all memory allocation system of Java (and friends) begun to exhibit its own shortcomings. The problem was the inability of the programmer to indicate to the garbage collector that some objects were less needed than others, even though references to them were still present in the program. The standard example of a situation where this kind of indication would be useful is opportunistic caching where, by definition, an object stored in the cache (and thus being referenced by a “pointer”) can (should) be removed when memory becomes scarce. Thus, in 1995 (i.e., three years after the introduction of the first version of Java), the concept of weak reference was added to the language. The idea was to let the garbage collector remove an object, if it was reachable solely via weak references. Notably, that solution didn’t quite solve all problems. As of today, there are four reference types in Java: strong, soft, weak, and phantom (in the decreasing order of strength) whose role is to cater to the various flavors of discarding an object without eliminating its reference first. Judging by the amount of discussion on the web, the four reference types cause more problems to apprentice Java programmers than pointers ever did to newcomers to C or C++. In the latter case the issue was quite simple: one either understood pointers in their entirety or not at all, whereas in Java the situation is exacerbated by the fact that one can go some way without any understanding of the underlying problem at all. The problem, of course, is that there are important circumstances where the cog- nizant programmer would (should) prefer to exercise more or less direct control over memory allocation for objects. That was realised quite promptly, also within the community of (expert) Java developers, despite their initial optimistic attitude towards garbage collection as the sole and ultimate remedy for all problems of memory allocation. In this context, as the pendulum seems to have reached the end of its swing, and is in fact beginning to move in the opposite direction, we believe that the time is ripe to revisit the valuable and somewhat forgotten ideas behind a solution that strikes a compromise between the two extremes, i.e., direct (unsafe) deallocation and (total, unqualified) garbage collection. This solution comes as safe programmed deallocation.

Managing objects

Proper management of objects plays an essential rôle in execution of object-oriented programs. It must ensure safety and efficiency. Ill thought ways to remove objects appear unsafe. On the other hand keeping not needed objects slows down the calculations and may lead to a blockade.

In 1984 Antoni Kreczmar and Gianna Cioni presented an object management system which is free of dangling references and is efficient. The system has been validated by the use in Loglan’s running system (i.e. virtual machine)during more than 30 years.

Most of presently used object oriented programming languages appeared after that date (C++ in 1985, Java in 1995, Python in 1989, Ruby 1993). No one of them has a system similar to the Kreczmar’s one. In C++ deallocation of objects is allowed without any form of control. It is well known that the system has no protection against dangling reference errors.

Java programming language forbids the explicit removal of objects. Its system is safe however inefficient.

We think that it is worthwhile to discover the solutions contained in and to apply them in practice.

Let us expose briefly the problems. Objects are created, used and eventually become not needed anymore.

The designer of an object management system is confronted to three major threats:

  • memory leak problem,
  • memory fragmentation,
  • dangling references problem.

Memory leakage occurs when objects are created and remain unused. Program consumes memory. It leads to the slowdown of computations or even to a complete blockade.

There exists a variety of object management systems. One may classify them with respect to different definitions of garbage.

The first, most natural, definition of garbage reads Object is a garbage whenever the program decides it is no longer needed. This definition is accepted in C++ programming language. A programmer has the instruction delete(x) to his disposal. However, the instruction has a side effect: the dangling reference may appear. Namely, we observe some variables that point to segments of memory where no object resides. A dangling reference provokes another error of contradicting information. This phenomenon occurs when two variables point to the same address and one says: “I am pointing to an object of type A and the other says I am pointing to an object of type B. Errors of both kinds are difficult in diagnosis and very dangerous ones.

Another definition of garbage reads: Object with no live references to it is a garbage. Some programming languages try to keep track of references with reference counters (e.g. Python ). In this way a garbage collector can easily identify objects with reference counters equal zero as garbage. However, by introducing reference counters one creates an overhead in code’s length and also in execution time. The result is a slowdown of execution (A. Appel says “On the whole, the problems with reference counting outweigh its advantages”, p.264).

Subsequent definitions of garbage bases on different types of references. Namely, one differentiates weak references from normal ones. Now, Object [math]o[/math] is an garbage if there is no normal reference to it, even if there exist weak reference to [math]o[/math]. The weak references were introduced with two aims: 1to decrease the cost of reference counting, 2to diminish the risk that some objects will be kept for the programmer forgot to nullify all references to it.

All three types of object management systems have certain deficiencies. The question arises: is it possible to replace operation delete by another operation kill, such that kill has no side effect of dangling reference. In the paper of Cioni and Kreczmar it was shown that there exists an objects management system with kill operation.

Any program that intensively creates objects and deallocates some, may cause fragmentation of the object memory. Sometimes, the situation may be improved by the defragmentation.[1]

The system described by Kreczmar is a hybrid one. For it allows to

  • deallocate no longer needed objects (by means of kill operation),
  • compactify heap of object (a defragmentation operation),
  • garbage collect objects that are not accessible (by a reference).

Below we compare object programming languages we require that any system that manages heap of objects should satisfy the following conditions:

  1. For every type (class) [math]T[/math], for every variable [math]x[/math]. If the variable [math]x[/math] is of type [math]T[/math] then its value is an object of a subclass [math]U[/math] of class [math]T[/math] or [math]x=none[/math]. (This is a global invariant of the system)
  2. For every object [math]s[/math] it is possible to distinguish the fields containing pointers to objects from the fields that hold values of primitive types, like e.g. integer, float, boolean,...

Deallocation in Java, C++, Python et al.

In this section we briefly present the solutions taken in other object oriented languages.

C++, object Pascal, Objective C

We shall limit our considerations to the programs free of malloc instruction. For malloc instructions break the rules r1, r2, listed above.

Object [math]x[/math] can be deleted with [math]delete(x)[/math] statement. Other variables that are pointing to the removed object preserve their value. It leads to the dangling references error. The error can be avoided if all those variables are nullified [math]x1 = null; ... xn=null;[/math].

It is to a developer to remember all variables referring to the object getting deallocated and to nullify all of them prior to instruction delete. Obviously it is an error prone approach. To decrease the number of errors the notion of weak reference was proposed. .

Java, Python

Already the report of language Modula3 drew attention to the risk of hanging references and non-existence of an algorithm that could detect such errors. In Java white paper there are quite a few statements describing this problem and justifying the interdict of [math]delete[/math] instruction. Instead there is a Garbage Collector mechanism, which frees developer from manual removing of references to objects.

Soon, the opinion that garbage collector is an ultimate solution in objects management was verified. Three years after first version of Java (1995), in JDK 1.2 weak references have been introduced. Developer can declare variable as weak reference. Weak references do not change reference counter in cPython , and garbage collection algorithm does not take them into account in Java. If there are no strong references to object it is considered for collection. Even if there are some weak references to it.

Comparison

Loglan’82 C++, Pascal Java, Python
Pre- Certain object [math]o[/math] is referenced by the [math]n[/math] variables, i.e. [math]\ x_1=x_2=\ ...\ =x_n[/math].
Code kill([math]x_i[/math]) delete([math]x_i[/math]) [math] x_1=null;\ x_2=null;\ ...\ x_n=null[/math];

Now, the instruction

[math]gc()[/math] causes the object [math]o[/math] will be deleted.

Post- All the variables took the value none.

Object [math]o[/math] is deleted.

Object [math]o[/math] has been deleted. The variable [math]x_i[/math] has the value null.

Other variables point to the deleted frame – it is a dangerous error – dangling reference.

Object [math]o[/math] has been deleted – under condition that all the references to the object have been earlier assigned the null value.
Cost [math]O(1)[/math] [math]O(1)[/math] [math]O(n+m)[/math] where [math] m[/math] is the global size of the heap of objects.
Risk [math] No\ risk(!) [/math]

For each attempt to read and/or write from the deleted object will raise an error signal {reference to none}.

If n>1 then dangling reference error occurs. Worse, with high probability, the error of contradicting information will occur. Chances that programmer will forget to nullify some reference to the object [math]o[/math] and hence that the object will remain (not deleted).



Observation. Lets consider situation from Java:

[math]x_1:x_1 \rightarrow o[/math]

[math]y_{weak} \stackrel{weak}{:=} x_1[/math]

If weak reference [math]y[/math] refer to some object o, then there exists a strong reference [math]x_i[/math] to same object:

[math]y_{weak}: y_{weak} \rightarrow o \Leftrightarrow exists\ x_i \rightarrow o[/math]

It is only partially true. We could say probably true. Due to non-deterministic behaviour of garbage collector new ambiguity arises. After operation [math]x_1:=null[/math] weak reference [math]y_{weak}[/math] continues to refer to original object [math]o[/math] for some time. Only after run of garbage collection mechanism object is disposed and weak reference is nullified. Due to non-deterministic implementations of most garbage collectors developers cannot predict or effectively enforce collection. Side effect of this property is a necessity to treat normal object references and weak references in different manner.

Second observation. As long as weak reference to the object exists we can provoke a situation when object intended for collection will be restored to life:

123456789= Disposable tg = new Disposable();
/* A new Disposable object o created, tg is a reference to the object o */
WeakReference<Disposable> weak_tg = new WeakReference<Disposable>(tg);
/* creates Weak Reference to the same object o */
tg = null; /* remove last reference to the object o
The object o is ready to be collected */
System.gc(); /* This is a hint to activate GC */
Disposable tg2 = weak_tg.get();
/* It may happen that GC was not activated
And operation get will return reference to the object o */

This will happen when instruction [math]System.gc()[/math] will be ignored for some reason by virtual machine.

Algorithmic specification of Kreczmar’s system

System for the first time presented in original article is part of running-system (virtual machine) Loglan’82 language. Last 30 years provided us with arguments to renew interest in Kreczmar’s system. In this chapter we will provide formal requirements for managing memory of objects. We used here H.Oktaba thesis . After reading these requirements, the reader can better understand the nature of the result .

Informal description

The universe of a heap managing system HM consists of states and objects. A state may be viewed as a finite set of objects. For the purpose of the present work we can abstract from the structure of objects, even from their size.[2]

The universe of HM system consist of two sets [math]U = Frames \cup States[/math] with the following operations:

  • [math]reserve: States \rightarrow Frames[/math]
  • [math]insert: Frames \times States \rightarrow States[/math]
  • [math]delete: Frames \times States \rightarrow States[/math]
  • [math]member: Frames \times States \rightarrow {true, false}[/math]
  • [math]initSt \in States [/math]
  • [math]kill: Frame \times States \rightarrow States[/math]

[math]States[/math] are finite sets of [math]frames[/math]. For each state [math]s[/math] function [math]reserve[/math] returns a frame that does not belong to [math]s[/math]. For each pair [math]\langle e, s \rangle[/math] operation insert return the set-theoretical union of the set [math]s[/math] and and the element [math]e[/math]. Similarly, operation delete returns the set [math]s'[/math] that resulted by the deletion of element [math]e[/math] from the set [math]s[/math]. The element [math]initSt[/math] is the empty set.

Algorithmic specification

Specifications can be expressed in two variants:

  • a metamathematical one, or
  • a quasi-programmed one.

We choose the second notation. Its form is similar to Java’s interface or ADA’s specification. The differentia specifica stems from the presence of algorithmic formulas.[3] They are invariants or axioms of the specified class.

Remember, in this way we define a formalized language in which one may program algorithms and/or write formulas. unit HeapManagement: class;
unit Frame: class ; ...
unit State: class ...
unit reserve: function(s: State): Frame ...
unit member: function(f:Frame, s:State): Frame ...
unit insert: function(f:Frame,s: State): State; ...
unit delete: function(f:Frame, s: State): State ...
unit kill: function(f:Frame, s: State): State ...
unit amb: function(s: State): Frame ...
unit initState: function(): State ...

(* —– Invariants (i.e. axioms) of the Heap Management system —– *)

Below we shall use abbreviations [math]Fr[/math] instead of Frames, [math]St[/math] instead of States,
[math]res[/math] – reserve, [math]mb[/math] – member, [math]ins[/math] – insert, [math]del[/math] – delete, [math]init[/math] – initstate.

  1. [math]\forall_{s \in St}\ \textbf{while}\ s \neq init \ \textbf{do}\ s := del(amb(s),s) \ \textbf{od}\ (s = init)[/math]

    For every state the program terminates, hence, every state is a finite set of frames

  2. [math]\forall_{f \in Fr}\ \lnot mb(f, init)[/math]

    initstate is the empty set of frames

  3. [math]\forall_{s \in St} \ \lnot mb(none, s)[/math]

    constant none does not belong to any state

  4. [math]\forall_{s \in St} \ \lnot mb(res(s),s) \wedge res(s) \neq none [/math]

    for every state [math]s[/math], function [math]res[/math] returns a frame that does not belong to state [math]s[/math]

  5. [math]\forall_{s \in St} \ s\neq init \Rightarrow mb(amb(s),s)[/math]

    For every non-empty state, function [math]amb[/math] returns a member of the set [math]s[/math]

  6. [math]\{s':=ins(f,s)\}(mb(f,s')\wedge \forall_{f' \in Fr}(f'\neq f \implies mb(f',s) \Leftrightarrow mb(f',s')) [/math]

    operation insert adds frame f to the set s


  7. [math]\{s':=del(f,s) \}(\lnot mb(f,s')\wedge \forall_{f' \in Fr}((f'\neq f \implies mb(f',s)\Leftrightarrow mb(f',s'))[/math]

    operation del deletes frame f from the set s

  8. [math]mb(f,s) \Leftrightarrow \left \{ \begin{array}{l} \textbf{begin} \\ \quad s1 := s;\ bool := false; \\ \quad \textbf{while} \ s1 \neq init \wedge \lnot bool \\ \quad \textbf{do} \\ \qquad f1 := amb(s1); \\ \qquad \textbf{if} \ f=f1 \ \textbf{then} \ bool := true \ \textbf{fi}; \\ \qquad s1 :=del(f1,s1); \\ \quad \textbf{od} \\ \textbf{end} \ \end{array} \right \} bool [/math]

    This formula defines the relation member, its implementation may differ however. We require that its cost should be constant.

  9. The operation kill is described by a scheme. The index [math]k[/math] may be any natural number [math]k\gt0[/math], let [math]1 \leq i \leq k[/math]

    [math] ((f_1= ... =f_k) \wedge mb(f_1,s))\Rightarrow [s' :=kill(f_i,s)](f_1= ... =f_k = \textbf{none}) [/math]

    This is a scheme, it tells that operation kill in one move nullifies all the references to the object pointed by the variable [math]f_i[/math]. We require that its cost should be constant.

end HeapManagement;

Properties of the specification

The above set of algorithmic formulas defines the requirements imposed on a class to be implemented. Moreover, it allows to prove some useful facts e.g.
Theorem 1
For every state s,
[math]\forall_{s \in State}\, \left \{ \begin{array}{l} \textbf{begin} \\ \quad s' := initSt; \\ \quad \textbf{while}\ s \neq initSt\ \textbf{do} \\ \qquad r := reserve(s'); \\ \qquad \textbf{if}\ member(r,s)\ \textbf{then} \ s := delete(r,s) \ \textbf{fi}; \\ \qquad s' := insert(r,s') \\ \quad \textbf{od} \\ \textbf{end} \end{array} \right \}\, (s = initSt)[/math]

i.e. the operation reserve new frame may replace the operation amb in the postulate that every state is a finite set of frames.
Theorem 2
The program in the axiom HM[math]_8[/math] never loops.

We would like to attract the attention of the reader to the property HM[math]_4[/math]. One may use it to deduce the important fact.
Theorem 3
Let T be any class, let (a[math]_1[/math], …, a[math]_k[/math]) be a list of actual parametres. [math]\textbf{new }T(a_1, \dots , a_k) \neq \textbf{new }T(a_1, \dots , a_k)[/math]

One can investigate the properties of the specification itself. We are able to state a couple of important metatheorems about the system of axioms HM.
Metatheorem 1
(on consistency)
The system of axioms HM[math]_1[/math] – HM[math]_9[/math] has a model.

Proof. Using the ideas of the paper one can construct a programmable model of axioms HM[math]_1[/math] – HM[math]_9[/math].
This model will be called the standard model.
Metatheorem 2
(representation theorem)
Every model of the axioms HM[math]_1[/math] – HM[math]_9[/math] is isomorphic to the standard one.

Variations of axiom’s system

Are the simplifications we made important? One can easily observe two points:

  • One can consider a slightly different operation reserve - with a parameter appetite defining the size required for an object. It leads to a consistent set of axioms.
  • Another extension of our system HM is defined when one describes the internal structure of object. This extension is also consistent with our initial system.

Till now we needed not to introduce an operation of garbage collection. In our abstract version the set of Frames is isomorphic with the set of natural numbers.

One can ask how to express the property the set [math]Fr[/math] of frames is finite? The answer is easy:
HM[math]_{10}[/math]) [math]\exists_{s_0 \in St} \forall_{f \in Fr} mb(f, s_0)[/math]
Which reads: the set of frames is equal to some state, hence [math]Fr[/math] is a finite set.
Again, the set of the axioms HM[math]_1[/math] – HM[math]_{10}[/math] is consistent.

Final remarks

In the paper the memory management system is treated as a whole. The problems of garbage collection and dangling references were not separated.

Operation kill can be safely implemented with low, fixed cost. The frequency of garbage collection is diminished due to following discipline

  1. kill operation is called whenever an object should be deleted, the freed frame is added to the list of free memory frames,
  2. the list of free memory frames is checked and used during an operation of new,
  3. compactification (i.e. defragmentation) is done when needed,
  4. instruction gc() is called when compactification did not help.

Let us remark that since Java was introduced in 1995 the memory size has grown thousand times.

Acknowledgement

The second author was financed by research fellowship within the Project Information technologies: Research and its interdisciplinary applications, Agreement number UDA POKL.04.01.01-00-051/10-00.

  1. The authors of use the word compactification.
  2. One can add these ingredients of object’s size and its content later.
  3. In Eiffel programming language, one finds invariants of classes. They are formulas of first-order language which limits their expressibility in an essential way.