Bezpieczna dealokacja obiektów

Z Lem
Skocz do: nawigacji, wyszukiwania

This paper appears 30 years after the article by G.Cioni and A. Kreczmar under the same title . We 1) explain the problems of managing objects, 2) compare with existing (partial!) solutions and finally, 3) present a specification of Kreczmar’s system as an axiomatic system.

Keywords: programming languages, objects, garbage collection, dangling reference, safety

Managing objects

Proper management of objects plays an essential rôle in execution of object-oriented programs. It must ensure safety and efficiency. Ill thought ways to remove objects appear unsafe. On the other hand keeping not needed objects slows down the calculations and may lead to a blockade.

In 1984 Antoni Kreczmar and Gianna Cioni presented an object management system which is free of dangling references and is efficient. The system has been validated by the use in Loglan’s running system (i.e. virtual machine)during more than 30 years.

Most of presently used object oriented programming languages appeared after that date (C++ in 1985, Java in 1995, Python in 1989, Ruby 1993). No one of them has a system similar to the Kreczmar’s one. In C++ deallocation of objects is allowed without any form of control. It is well known that the system has no protection against dangling reference errors.

Java programming language forbids the explicit removal of objects. Its system is safe however inefficient.

We think that it is worthwhile to discover the solutions contained in and to apply them in practice.

Let us expose briefly the problems. Objects are created, used and eventually become not needed anymore.

The designer of an object management system is confronted to three major threats:

  • memory leak problem,
  • memory fragmentation,
  • dangling references problem.

Memory leakage occurs when objects are created and remain unused. Program consumes memory. It leads to the slowdown of computations or even to a complete blockade.

There exists a variety of object management systems. One may classify them with respect to different definitions of garbage.

The first, most natural, definition of garbage reads Object is a garbage whenever the program decides it is no longer needed. This definition is accepted in C++ programming language. A programmer has the instruction delete(x) to his disposal. However, the instruction has a side effect: the dangling reference may appear. Namely, we observe some variables that point to segments of memory where no object resides. A dangling reference provokes another error of contradicting information. This phenomenon occurs when two variables point to the same address and one says: “I am pointing to an object of type A and the other says I am pointing to an object of type B. Errors of both kinds are difficult in diagnosis and very dangerous ones.

Another definition of garbage reads: Object with no live references to it is a garbage. Some programming languages try to keep track of references with reference counters (e.g. Python ). In this way a garbage collector can easily identify objects with reference counters equal zero as garbage. However, by introducing reference counters one creates an overhead in code’s length and also in execution time. The result is a slowdown of execution (A. Appel says “On the whole, the problems with reference counting outweigh its advantages”, p.264).

Subsequent definitions of garbage bases on different types of references. Namely, one differentiates weak references from normal ones. Now, Object [math]o[/math] is an garbage if there is no normal reference to it, even if there exist weak reference to [math]o[/math]. The weak references were introduced with two aims: 1to decrease the cost of reference counting, 2to diminish the risk that some objects will be kept for the programmer forgot to nullify all references to it.

All three types of object management systems have certain deficiencies. The question arises: is it possible to replace operation delete by another operation kill, such that kill has no side effect of dangling reference. In the paper of Cioni and Kreczmar it was shown that there exists an objects management system with kill operation.

Any program that intensively creates objects and deallocates some, may cause fragmentation of the object memory. Sometimes, the situation may be improved by the defragmentation.[1]

The system described by Kreczmar is a hybrid one. For it allows to

  • deallocate no longer needed objects (by means of kill operation),
  • compactify heap of object (a defragmentation operation),
  • garbage collect objects that are not accessible (by a reference).

Below we compare object programming languages we require that any system that manages heap of objects should satisfy the following conditions:

  1. For every type (class) [math]T[/math], for every variable [math]x[/math]. If the variable [math]x[/math] is of type [math]T[/math] then its value is an object of a subclass [math]U[/math] of class [math]T[/math] or [math]x=none[/math]. (This is a global invariant of the system)
  2. For every object [math]s[/math] it is possible to distinguish the fields containing pointers to objects from the fields that hold values of primitive types, like e.g. integer, float, boolean,...

Deallocation in Java, C++, Python et al.

In this section we briefly present the solutions taken in other object oriented languages.

C++, object Pascal, Objective C

We shall limit our considerations to the programs free of malloc instruction. For malloc instructions break the rules r1, r2, listed above.

Object [math]x[/math] can be deleted with [math]delete(x)[/math] statement. Other variables that are pointing to the removed object preserve their value. It leads to the dangling references error. The error can be avoided if all those variables are nullified [math]x1 = null; ... xn=null;[/math].

It is to a developer to remember all variables referring to the object getting deallocated and to nullify all of them prior to instruction delete. Obviously it is an error prone approach. To decrease the number of errors the notion of weak reference was proposed. .

Java, Python

Already the report of language Modula3 drew attention to the risk of hanging references and non-existence of an algorithm that could detect such errors. In Java white paper there are quite a few statements describing this problem and justifying the interdict of [math]delete[/math] instruction. Instead there is a Garbage Collector mechanism, which frees developer from manual removing of references to objects.

Soon, the opinion that garbage collector is an ultimate solution in objects management was verified. Three years after first version of Java (1995), in JDK 1.2 weak references have been introduced. Developer can declare variable as weak reference. Weak references do not change reference counter in cPython , and garbage collection algorithm does not take them into account in Java. If there are no strong references to object it is considered for collection. Even if there are some weak references to it.

Comparison

Loglan’82 C++, Pascal Java, Python
Pre- colspan=3 Certain object [math]o[/math] is referenced by the [math]n[/math] variables, i.e. [math]\ x_1=x_2=\ ...\ =x_n[/math].
Code kill([math]x_i[/math]) delete([math]x_i[/math]) [math] x_1=null;\ x_2=null;\ ...\ x_n=null[/math];

Now, the instruction

[math]gc()[/math] causes the object [math]o[/math] will be deleted.

Post- All the variables took the value none.

Object [math]o[/math] is deleted.

Object [math]o[/math] has been deleted. The variable [math]x_i[/math] has the value null.

Other variables point to the deleted frame – it is a dangerous error – dangling reference.

Object [math]o[/math] has been deleted – under condition that all the references to the object have been earlier assigned the null value.
Cost [math]O(1)[/math] [math]O(1)[/math] [math]O(n+m)[/math] where [math] m[/math] is the global size of the heap of objects.
Risk No risk(!)

For each attempt to read and/or write from the deleted object will raise an error signal {reference to none}.

If n>1 then dangling reference error occurs. High probability of the error of contradicting information. Chances that programmer will forget to nullify some reference to the object [math]o[/math] and hence that the object will remain (not deleted).



Observation. Lets consider situation from Java:

[math]x_1:x_1 \rightarrow o[/math]

[math]y_{weak} \stackrel{weak}{:=} x_1[/math]

If weak reference [math]y[/math] refer to some object o, then there exists a strong reference [math]x_i[/math] to same object:

[math]y_{weak}: y_{weak} \rightarrow o \Leftrightarrow exists\ x_i \rightarrow o[/math]

It is only partially true. We could say probably true. Due to non-deterministic behaviour of garbage collector new ambiguity arises. After operation [math]x_1:=null[/math] weak reference [math]y_{weak}[/math] continues to refer to original object [math]o[/math] for some time. Only after run of garbage collection mechanism object is disposed and weak reference is nullified. Due to non-deterministic implementations of most garbage collectors developers cannot predict or effectively enforce collection. Side effect of this property is a necessity to treat normal object references and weak references in different manner.

Second observation. As long as weak reference to the object exists we can provoke a situation when object intended for collection will be restored to life:

123456789= Disposable tg = new Disposable();
/* A new Disposable object o created, tg is a reference to the object o */
WeakReference<Disposable> weak_tg = new WeakReference<Disposable>(tg);
/* creates Weak Reference to the same object o */
tg = null; /* remove last reference to the object o
The object o is ready to be collected */
System.gc(); /* This is a hint to activate GC */
Disposable tg2 = weak_tg.get();
/* It may happen that GC was not activated
And operation get will return reference to the object o */

This will happen when instruction [math]System.gc()[/math] will be ignored for some reason by virtual machine.

Algorithmic specification of Kreczmar’s system

System for the first time presented in original article is part of running-system (virtual machine) Loglan’82 language. Last 30 years provided us with arguments to renew interest in Kreczmar’s system. In this chapter we will provide formal requirements for managing memory of objects. We used here H.Oktaba thesis . After reading these requirements, the reader can better understand the nature of the result .

Informal description

The universe of a heap managing system HM consists of states and objects. A state may be viewed as a finite set of objects. For the purpose of the present work we can abstract from the structure of objects, even from their size.[2]

The universe of HM system consist of two sets [math]U = Frames \cup States[/math] with the following operations:

  • [math]reserve: States \rightarrow Frames[/math]
  • [math]insert: Frames \times States \rightarrow States[/math]
  • [math]delete: Frames \times States \rightarrow States[/math]
  • [math]member: Frames \times States \rightarrow {true, false}[/math]
  • [math]initSt \in States [/math]
  • [math]kill: Frame \times States \rightarrow States[/math]

[math]States[/math] are finite sets of [math]frames[/math]. For each state [math]s[/math] function [math]reserve[/math] returns a frame that does not belong to [math]s[/math]. For each pair [math]\langle e, s \rangle[/math] operation insert return the set-theoretical union of the set [math]s[/math] and and the element [math]e[/math]. Similarly, operation delete returns the set [math]s'[/math] that resulted by the deletion of element [math]e[/math] from the set [math]s[/math]. The element [math]initSt[/math] is the empty set.

Algorithmic specification

Specifications can be expressed in two variants:

  • a metamathematical one, or
  • a quasi-programmed one.

We choose the second notation. Its form is similar to Java’s interface or ADA’s specification. The differentia specifica stems from the presence of algorithmic formulas.[3] They are invariants or axioms of the specified class.

Remember, in this way we define a formalized language in which one may program algorithms and/or write formulas. unit HeapManagement: class;
unit Frame: class ; ...
unit State: class ...
unit reserve: function(s: State): Frame ...
unit member: function(f:Frame, s:State): Frame ...
unit insert: function(f:Frame,s: State): State; ...
unit delete: function(f:Frame, s: State): State ...
unit kill: function(f:Frame, s: State): State ...
unit amb: function(s: State): Frame ...
unit initState: function(): State ...

(* —– Invariants (i.e. axioms) of the Heap Management system —– *)

Below we shall use abbreviations [math]Fr[/math] instead of Frames, [math]St[/math] instead of States,
[math]res[/math] – reserve, [math]mb[/math] – member, [math]ins[/math] – insert, [math]del[/math] – delete, [math]init[/math] – initstate.

  1. [math]\forall_{s \in St}\ \textbf{while}\ s \neq init \ \textbf{do}\ s := del(amb(s),s) \ \textbf{od}\ (s = init)[/math]

    For every state the program terminates, hence, every state is a finite set of frames

  2. [math]\forall_{f \in Fr}\ \lnot mb(f, init)[/math]

    initstate is the empty set of frames

  3. [math]\forall_{s \in St} \ \lnot mb(none, s)[/math]

    constant none does not belong to any state

  4. [math]\forall_{s \in St} \ \lnot mb(res(s),s) \wedge res(s) \neq none [/math]

    for every state [math]s[/math], function [math]res[/math] returns a frame that does not belong to state [math]s[/math]

  5. [math]\forall_{s \in St} \ s\neq init \Rightarrow mb(amb(s),s)[/math]

    For every non-empty state, function [math]amb[/math] returns a member of the set [math]s[/math]

  6. [math]\{s':=ins(f,s)\}(mb(f,s')\wedge \forall_{f' \in Fr}(f'\neq f \implies mb(f',s) \Leftrightarrow mb(f',s')) [/math]

    operation insert adds frame f to the set s


  7. [math]\{s':=del(f,s) \}(\lnot mb(f,s')\wedge \forall_{f' \in Fr}((f'\neq f \implies mb(f',s)\Leftrightarrow mb(f',s'))[/math]

    operation del deletes frame f from the set s

  8. [math]mb(f,s) \Leftrightarrow \left \{ \begin{array}{l} \textbf{begin} \\ \quad s1 := s;\ bool := false; \\ \quad \textbf{while} \ s1 \neq init \wedge \lnot bool \\ \quad \textbf{do} \\ \qquad f1 := amb(s1); \\ \qquad \textbf{if} \ f=f1 \ \textbf{then} \ bool := true \ \textbf{fi}; \\ \qquad s1 :=del(f1,s1); \\ \quad \textbf{od} \\ \textbf{end} \ \end{array} \right \} bool [/math]

    This formula defines the relation member, its implementation may differ however. We require that its cost should be constant.

  9. The operation kill is described by a scheme. The index [math]k[/math] may be any natural number [math]k\gt0[/math], let [math]1 \leq i \leq k[/math]

    [math] ((f_1= ... =f_k) \wedge mb(f_1,s))\Rightarrow [s' :=kill(f_i,s)](f_1= ... =f_k = \textbf{none}) [/math]

    This is a scheme, it tells that operation kill in one move nullifies all the references to the object pointed by the variable [math]f_i[/math]. We require that its cost should be constant.

end HeapManagement;

Properties of the specification

The above set of algorithmic formulas defines the requirements imposed on a class to be implemented. Moreover, it allows to prove some useful facts e.g.
Theorem 1
For every state s,
[math]\forall_{s \in State}\, \left \{ \begin{array}{l} \textbf{begin} \\ \quad s' := initSt; \\ \quad \textbf{while}\ s \neq initSt\ \textbf{do} \\ \qquad r := reserve(s'); \\ \qquad \textbf{if}\ member(r,s)\ \textbf{then} \ s := delete(r,s) \ \textbf{fi}; \\ \qquad s' := insert(r,s') \\ \quad \textbf{od} \\ \textbf{end} \end{array} \right \}\, (s = initSt)[/math]

i.e. the operation reserve new frame may replace the operation amb in the postulate that every state is a finite set of frames.
Theorem 2
The program in the axiom HM[math]_8[/math] never loops.

We would like to attract the attention of the reader to the property HM[math]_4[/math]. One may use it to deduce the important fact.
Theorem 3
Let T be any class, let (a[math]_1[/math], …, a[math]_k[/math]) be a list of actual parametres. [math]\textbf{new }T(a_1, \dots , a_k) \neq \textbf{new }T(a_1, \dots , a_k)[/math]

One can investigate the properties of the specification itself. We are able to state a couple of important metatheorems about the system of axioms HM.
Metatheorem 1
(on consistency)
The system of axioms HM[math]_1[/math] – HM[math]_9[/math] has a model.

Proof. Using the ideas of the paper one can construct a programmable model of axioms HM[math]_1[/math] – HM[math]_9[/math].
This model will be called the standard model.
Metatheorem 2
(representation theorem)
Every model of the axioms HM[math]_1[/math] – HM[math]_9[/math] is isomorphic to the standard one.

Variations of axiom’s system

Are the simplifications we made important? One can easily observe two points:

  • One can consider a slightly different operation reserve - with a parameter appetite defining the size required for an object. It leads to a consistent set of axioms.
  • Another extension of our system HM is defined when one describes the internal structure of object. This extension is also consistent with our initial system.

Till now we needed not to introduce an operation of garbage collection. In our abstract version the set of Frames is isomorphic with the set of natural numbers.

One can ask how to express the property the set [math]Fr[/math] of frames is finite? The answer is easy:
HM[math]_{10}[/math]) [math]\exists_{s_0 \in St} \forall_{f \in Fr} mb(f, s_0)[/math]
Which reads: the set of frames is equal to some state, hence [math]Fr[/math] is a finite set.
Again, the set of the axioms HM[math]_1[/math] – HM[math]_{10}[/math] is consistent.

Final remarks

In the paper the memory management system is treated as a whole. The problems of garbage collection and dangling references were not separated.

Operation kill can be safely implemented with low, fixed cost. The frequency of garbage collection is diminished due to following discipline

  1. kill operation is called whenever an object should be deleted, the freed frame is added to the list of free memory frames,
  2. the list of free memory frames is checked and used during an operation of new,
  3. compactification (i.e. defragmentation) is done when needed,
  4. instruction gc() is called when compactification did not help.

Let us remark that since Java was introduced in 1995 the memory size has grown thousand times.

Acknowledgement

The second author was financed by research fellowship within the Project Information technologies: Research and its interdisciplinary applications, Agreement number UDA POKL.04.01.01-00-051/10-00.

  1. The authors of use the word compactification.
  2. One can add these ingredients of object’s size and its content later.
  3. In Eiffel programming language, one finds invariants of classes. They are formulas of first-order language which limits their expressibility in an essential way.