Bezpieczna dealokacja obiektów: Różnice pomiędzy wersjami

Z Lem
Skocz do: nawigacji, wyszukiwania
(Comparison)
(Bibliografia)
 
(Nie pokazano 32 pośrednich wersji utworzonych przez tego samego użytkownika)
Linia 1: Linia 1:
<small>This paper appears 30 years after the article by G.Cioni and A. Kreczmar under the same title . We 1) explain the problems of managing objects, 2) compare with existing (partial!) solutions and finally, 3) present a specification of Kreczmar’s system as an axiomatic system.</small>
+
<big>Explicit dealocation without dangling references</big>
 +
 
 +
by Andrzej Salwicki, Andrzej Zadrozny
 +
 
 +
''a preliminary version''
 +
 
 +
'''Abstract'''. <small>This paper appears 30 years after the article by Gianna Cioni and Antoni Kreczmar under the same title. In the original paper [Kreczmar 1987], Kreczmar introduced a memory management system for dynamic allocation of objects with a few interesing properties. While leaving the programmer in control of object dealloacation (which in Kreczmar’s system is explicit), the system is free of the dangling reference problem, i.e., following the explicit deallocation of an object, all references to the object within the program become automatically nullified. In the present paper, we compare the memory management system described in [Cioni and Kreczmar 1984] with the solutions that appeared later (cf. Table I. below). We also give a formal specification for Kreczmar’s allocator in the form of a formalized algorithmic theory ATHM and sketch the construction of a programming model of the specification. Thus we prove that the specification is consistent, i.e., free of contradictions, and hence it is feasible.</small>
  
 
Keywords:
 
Keywords:
 
programming languages, objects, garbage collection, dangling reference, safety
 
programming languages, objects, garbage collection, dangling reference, safety
 +
 +
= Introduction =
 +
 +
The management of objects, especially the problem of efficient and safe memory allocation, plays an essential role in object-oriented programming. A practical system addressing the problem must be safe (i.e., a deallocated object should be clearly perceived as absent from all places in the program, preventing an unintended interpretation of “bad memory” as the contents of an object that formally no longer exists) and efficient, meaning that the cost of referencing an object (dereferencing an object pointer) cannot increase too significantly beyond the cost of a simple (machine-level) indirect memory reference. Typically, the dynamic instances of a program’s modules reside in two disjoint areas in memory: the stack and the heap. The activation records of functions, methods, contructors, and blocks are allocated on the stack, while the dynamic instances of structures, classes, and, sometimes, more exotic objects, like coroutines (for those languages that admit such objects) are stored in the heap. That latter category of objects is of our primary concern. Owing to the fact that, unlike the former category, the patterns of their creation and destruction do not follow a simple ordered paradigm, but are essentially random and arbitrary, the mechanisms of their allocation and requisite bookkeeping are open for discussion and invention, being thus also susceptible to trends. The first generation of popular programming languages with rudiments of objects: Pascal and C, opted for simplicity (read efficiency) at the cost of safety. Following the (explicit) deallocation of an object, any replicate references to the object within the program would still point to the old chunk of memory originally used to store the object, thus creating a hazard. It was up to the programmer to sidestep that hazard by not touching the old references (pointers) until reset. Note that a mistake consisting in a reference to a deallocatad object would not, in those circumstances, translate into an immediate error: a read access would return an incorrect result (that might trigger an error a potentially long while later), while a write access would potentially corrupt some legitimate object, often with delayed symptoms difficult to diagnose. When C++ appeared later, inheriting and absorbing the legacy of C under its umbrella, it did not revolutionize the paradigm of memory allocation for objects, primarily for two reasons:
 +
# C++ still preferred efficiency over safety (it was supposed to replace C without affecting the implementation aspects of its C subset)
 +
# C++ wanted to be compatible with C (C functions, using the traditional memory allocator, had to be able to coexist with C++ functions within the same program allowing both types of functions to operate on the same pointers).
 +
Java brought a response to the lack of safety inherent in the traditional approach to memory allocation for objects in the postulate to completely hide the action of object deallocation from the programmer’s view. By abandoning pointers (plus the large bag of pointer trick available to experts in C and C++), and replacing deallocation with the implicit and invisible garbage collection, Java also simplified object-oriented programming, thus making it accessible to a wider population of programmers (who no longer needed to be experts on such low-level and mundane aspects of computing as pointers). C#, introduced by Microsoft as their own “Java”, was basically the same creation when seen from the viewpoint of run-time memory management. That global paradigm shift was compatible with the rapid advancements in CPU and RAM technologies making large and sloppy (less efficient) programs vastly more acceptable than in the early days of C++. These days, few people care whether the efficiency of non-number-crunching program can be improved by the factor of 10 or 20 in terms of execution speed or memory demands for as long as the program appears to work. Following the short initial euphoria, the sealed, one-size-fits-all memory allocation system of Java (and friends) begun to exhibit its
 +
own shortcomings. The problem was the inability of the programmer to indicate to the garbage collector that some objects were less needed than others, even though references to them were still present in the program. The standard example of a situation where this kind of indication would be useful is opportunistic caching where, by definition, an object stored in the cache (and thus being referenced by a “pointer”) can (should) be removed when memory becomes scarce. Thus, in 1995 (i.e., three years after the introduction of the first version of Java), the concept of weak reference was added to the language. The idea was to let the garbage collector remove an object, if it was reachable solely via weak references. Notably, that solution didn’t quite solve all problems. As of today, there are four reference types in Java: strong, soft, weak, and phantom (in the decreasing order of strength) whose role is to cater to the various flavors of discarding an object without eliminating its reference first. Judging by the amount of discussion on the web, the four reference types cause more problems to apprentice Java programmers than pointers ever did to newcomers to C or C++. In the latter case the issue was quite simple: one either understood pointers in their entirety or not at all, whereas in Java the situation is exacerbated by the fact that one can go some way without any understanding of the underlying problem at all. The problem, of course, is that there are important circumstances where the cog- nizant programmer would (should) prefer to exercise more or less direct control over memory allocation for objects. That was realised quite promptly, also within the community of (expert) Java developers, despite their initial optimistic attitude towards garbage collection as the sole and ultimate remedy for all problems of memory allocation. In this context, as the pendulum seems to have reached the end of its swing, and is in fact beginning to move in the opposite direction, we believe that the time is ripe to revisit the valuable and somewhat forgotten ideas behind a solution that strikes a compromise between the two extremes, i.e., direct (unsafe) deallocation and (total, unqualified) garbage collection. This solution comes as safe programmed deallocation.
  
 
= Managing objects =
 
= Managing objects =
  
Proper management of objects plays an essential rôle in execution of object-oriented programs. It must ensure safety and efficiency. Ill thought ways to remove objects appear unsafe. On the other hand keeping not needed objects slows down the calculations and may lead to a blockade.
+
Proper management of objects plays an essential role in execution of object-oriented programs. It must ensure safety and efficiency. Poorly designed ways to remove objects appear unsafe. On the other hand, keeping unnecessary objects slows down the calculations and may lead to a collapse.
  
In 1984 Antoni Kreczmar and Gianna Cioni presented an object management system which is free of dangling references and is efficient. The system has been validated by the use in Loglan’s running system (i.e. virtual machine)during more than 30 years.
+
It is commonly accepted that dynamic instances of program’s modules reside in two non-overlapping fields in memory. The activation records of procedures, functions, constructors, and blocks are alocated on a stack. The dynamic instances of classes (and also of coroutines, if a programming language allows) are alocated on a heap. This paper presents a choice of problems related to the management of heap. In 1980 Antoni Kreczmar conceived a running system (a virtual machine) for Loglan programming language c.f.. Later, in the paper  an the object management system which is free of dangling references and is efficient has been described. Hanna Oktaba studied the system and constructed the algorithmic theory of references and analysed its metamathematical properties . The papers of Cioni and Kreczmar and of Oktaba accomplished the task of ''verification'' of the Kreczmar’s system. The system has been thoroughly ''validated'', for it is in use since more than 30 years as a part of the running system of Loglan’82 object programming language .
  
Most of presently used object oriented programming languages appeared after that date (C++ in 1985, Java in 1995, Python in 1989, Ruby 1993). No one of them has a system similar to the Kreczmar’s one. In C++ deallocation of objects is allowed without any form of control. It is well known that the system has no protection against dangling reference errors.
+
We believe the time has come to compare the system of Kreczmar to other object management systems and to explain it in terms of an algorithmic theory (different from the Oktaba’s theory).
  
Java programming language forbids the explicit removal of objects. Its system is safe however inefficient.
+
Most of presently used object oriented programming languages appeared after 1984 (C++ in 1985, Java in 1995, Python in 1989, Ruby 1993 ). No one of them has a system similar to the Kreczmar’s one.
  
We think that it is worthwhile to discover the solutions contained in  and to apply them in practice.
+
In C++ deallocation of objects is fast. However, it is well known that the system of C++ has no protection against dangling reference errors. The instruction <code>delete()</code> is executed without any form of control. The same may be said about the programming language Pascal and its instruction: <code>dispose()</code>.
  
Let us expose briefly the problems. Objects are created, used and eventually become not needed anymore.
+
Java programming language forbids the explicit removal of objects. Its system is free of dangling reference errors, however it is inefficient.
  
The designer of an object management system is confronted to three major threats:
+
We think that it is worthwhile to discover the inventions contained in  and to apply them in practice, for they offer the safety without any loss on efficiency.
 +
 
 +
Let us briefly expose the problems. Objects are 1created, 2shared, 3used, and, 4eventually become not needed, anymore. We shall abstract from many technical details of object’s creation such as its size, the type of object, the ways the free memory is organized.
 +
 
 +
Creation of objects is done through evaluation of object generator expressions, e.g. in the assignment <code>x := new ClassType(actparams)</code>. An object, once created, can be shared among several variables, inspected, and updated.
 +
 
 +
Sharing is accomplished by execution of assignment instructions e.g. the assignment <code>y:=x</code> causes that now the object pointed by <math>x</math> is shared by the variables <math>x</math> and <math>y</math>.
 +
 
 +
Other forms of object’s usage reduce to one of three cases:<br />''inspection'' – reading the value of an attribute <code>attr</code> of an object e.g. <code>x.attr</code> ,<br />''updating'' – writing the value of an attribute, e.g. <code>x.attr:=8</code>,<br />''servicing'' – done by calling a method of the object e.g. <code>call x.meth(81)</code>.<br />All three forms of usage should begin with the checking whether the variable <code>x</code> points to an alive object.
 +
 
 +
Eventually no further actions on the object will be taken. It is wise to dispose of it. In these circumstances the designer of an object management system is confronted to three major threats:
  
 
* ''memory leak'' problem,
 
* ''memory leak'' problem,
Linia 24: Linia 48:
 
* ''dangling references'' problem.
 
* ''dangling references'' problem.
  
''Memory leakage'' occurs when objects are created and remain unused. Program consumes memory. It leads to the slowdown of computations or even to a complete blockade.
+
Below, we explain these terms:
 +
 
 +
dangling reference
 +
 
 +
occurs when objects are created and remain unused. Program consumes memory. It leads to the slowdown of computations or even to a complete blockade.
 +
 
 +
A situation when in past some variable <math>x</math> pointed to an object <math>o</math>, but at present, the object <math>o</math> no longer exists. One says, the variable <math>x</math> is a dangling pointer if any attempt to use the variable <math>x</math> is treated '''without''' an alarm. The system is free of dangling reference error if any attempt to use the variable <math>x</math> raises an exception, e.g. caused alarm: <code>reference to none</code>.
 +
 
 +
Two variables of different types mutually contradict themselves: variable <math>x</math> says ''I am pointing to an object of type A'', variable <math>y</math> says ''I am pointing to an object of type B'' and both variables point to the same address. This situation may happen when deallocation created dangling reference(s).
 +
 
 +
Suppose the object memory system admits the dangling references. It may happen that, after execution of <code>delete(x)</code>, the dangling pointer <math>y</math> points to the memory frame where a new object <math>z</math> resides. Then the delayed instruction <code>delete(y)</code> will cause the destruction of the object <math>z</math>.
 +
 
 +
There are many slots of free memory, none is large enough to hold a new object.
 +
 
 +
A few words on dangling reference error are in order: one may distinguish between detected and undetected dangling reference error. The second error is a real danger. Detected dangling reference happens when the virtual machine finds that the value of a pointer is <code>none</code> (or <code>null</code>). This is an unpleasant situation, for the programmer must find the cause of it. Undetected dangling reference is much worse for many reasons, see above. One may ask: ''if so then perhaps it is possible to equip the compiler in a tool to detect dangling references errors in advance – before an execution of a program?'' The answer is: NO, such an algorithm does not exist. <ref>Should an algorithm <math>\mathcal{A}</math> for detection of dangling reference errors in source program exist, then we would construct another algorithm <math>\mathcal{B}</math> to detect whether any given program will terminate or not. This is known to be impossible.
 +
</ref>
  
 
There exists a variety of object management systems. One may classify them with respect to different definitions of garbage.
 
There exists a variety of object management systems. One may classify them with respect to different definitions of garbage.
  
The first, most natural, definition of garbage reads ''Object is a garbage whenever the program decides it is no longer needed''. This definition is accepted in C++ programming language. A programmer has the instruction delete(x) to his disposal. However, the instruction has a side effect: the ''dangling reference'' may appear. Namely, we observe some variables that point to segments of memory where no object resides. A dangling reference provokes another error of ''contradicting information''. This phenomenon occurs when two variables point to the same address and one says: ''“I am pointing to an object of type A '' and the other says ''I am pointing to an object of type B''. Errors of both kinds are difficult in diagnosis and very dangerous ones.
+
The first, most natural, definition of garbage reads ''An object <math>o</math> is a garbage, whenever the program instructs ''(the runtime system)'' it is no longer needed.''. This definition is accepted in C++ programming language. A C++ program may contain instruction <code>delete(x)</code>. However, the instruction has a side effect: the ''dangling reference'' may appear. Namely, one can observe some variables that point to segments of memory where no object resides. A dangling reference may lead to another error of ''contradicting information''. This phenomenon occurs when two variables point to the same address and one says: ''“I am pointing to an object of type A”'' and the other says ''“I am pointing to an object of type B”''. Errors of both kinds are difficult in diagnosis and very dangerous ones. There is no algorithm to detect the dangling reference in the text of program. For the problem is reducible to the halting problem.
  
Another definition of garbage reads: ''Object with no live references to it is a garbage''. Some programming languages try to keep track of references with reference counters (e.g. Python ). In this way a garbage collector can easily identify objects with reference counters equal zero as garbage. However, by introducing reference counters one creates an overhead in code’s length and also in execution time. The result is a slowdown of execution (A. Appel says “On the whole, the problems with reference counting outweigh its advantages”, p.264).
+
Another definition of garbage reads: ''Object with no references to it, is a garbage''. Some programming languages try to keep track of references with reference counters (e.g. Python ). In this way a garbage collector can easily identify objects with reference counters equal zero as garbage. However, by introducing reference counters one creates an overhead in code’s length and also in execution time. The result is a slowdown of execution (A. Appel says “On the whole, the problems with reference counting outweigh its advantages”, p.264). The same opinion had O.-J. Dahl, the father of object oriented programming . Let us recall that reference counters do not help in recognition of cycles of no longer needed objects.
  
Subsequent definitions of garbage bases on different types of references. Namely, one differentiates ''weak'' references from normal ones. Now, ''Object <math>o</math> is an garbage if there is no normal reference to it, even if there exist weak reference to <math>o</math>''. The weak references were introduced with two aims: 1to decrease the cost of reference counting, 2to diminish the risk that some objects will be kept for the programmer forgot to nullify all references to it.
+
Subsequent definitions of garbage base on different types of references. Namely, one differentiates ''weak'' references from normal ones. Now, ''Object <math>o</math> is an garbage if there is no normal reference to it, even if there exist weak reference to <math>o</math>''. The weak references were introduced with two aims: 1 to decrease the cost of reference counting, 2 to diminish the risk that some objects will be kept because the programmer forgot to nullify all references to it.
  
All three types of object management systems have certain deficiencies. The question arises: is it possible to replace operation delete by another operation kill, such that kill has no side effect of dangling reference. In the paper of Cioni and Kreczmar it was shown that there exists an objects management system with kill operation.
+
All three types of object management systems have certain deficiencies. The question arises: is it possible to replace operation delete by another operation, say kill, such that kill has no side effect of dangling reference. In  it was shown that there exists an objects management system with kill operation.
  
Any program that intensively creates objects and deallocates some, may cause ''fragmentation'' of the object memory. Sometimes, the situation may be improved by the defragmentation.<ref>The authors of use the word compactification.
+
Any program that intensively creates objects and deallocates some, may cause ''fragmentation'' of the object memory. Sometimes, the situation may be improved by the defragmentation.<ref>In the authors use the word compactification.
 
</ref>
 
</ref>
  
The system described by Kreczmar is a hybrid one. For it allows to
+
The system designed by Kreczmar integrates the features mentioned above. For it allows to:
  
* deallocate no longer needed objects (by means of kill operation),
+
* deallocate no longer needed objects (kill operation),
* compactify heap of object (a defragmentation operation),
+
* compactify heap of object (defragmentation),
* garbage collect objects that are not accessible (by a reference).
+
* collect garbage, i.e. objects that are not accessible.
  
Below we compare object programming languages we require that any system that manages heap of objects should satisfy the following conditions:
+
Below we compare object programming languages. We indicate that it is desirable that the languages and running systems satisfy the following conditions:
  
# For every type (class) <math>T</math>, for every variable <math>x</math>. If the variable <math>x</math> is of type <math>T</math> then its value is an object of a subclass <math>U</math> of class <math>T</math> or <math>x=none</math>. (This is a global invariant of the system)
+
#(r1) For every type (class) <math>T</math>, for every variable <math>x</math>. If the variable <math>x</math> is of type <math>T</math> then its value is an object of a subclass <math>U</math> of class <math>T</math> or <math>x=none</math>. (This is a fundamental invariant (i.e. axiom) of any true object system)
# For every object <math>s</math> it is possible to distinguish the fields containing pointers to objects from the fields that hold values of primitive types, like e.g. integer, float, boolean,...
+
#(r2) For every object <math>s</math> it is possible to distinguish the fields containing pointers to objects from the fields that hold values of primitive types, like e.g. integer, float, boolean,...
  
 
= Deallocation in Java, C++, Python et al. =
 
= Deallocation in Java, C++, Python et al. =
Linia 56: Linia 95:
 
== C++, object Pascal, Objective C ==
 
== C++, object Pascal, Objective C ==
  
We shall limit our considerations to the programs free of ''malloc'' instruction. For malloc instructions break the rules r1, r2, listed above.
+
We shall limit our considerations to the programs free of ''malloc'' instruction. For malloc instructions break the rules (r1), (r2), listed above.
  
 
Object <math>x</math> can be deleted with <math>delete(x)</math> statement. Other variables that are pointing to the removed object preserve their value. It leads to the dangling references error. The error can be avoided if all those variables are nullified <math>x1 = null; ... xn=null;</math>.
 
Object <math>x</math> can be deleted with <math>delete(x)</math> statement. Other variables that are pointing to the removed object preserve their value. It leads to the dangling references error. The error can be avoided if all those variables are nullified <math>x1 = null; ... xn=null;</math>.
Linia 89: Linia 128:
 
|| Object <math>o</math> has been deleted – '''under condition''' that all the references to the object have been earlier assigned the null value.
 
|| Object <math>o</math> has been deleted – '''under condition''' that all the references to the object have been earlier assigned the null value.
 
|-
 
|-
| '''Cost''' || style="text-align: center;" |<math>O(1)</math> || <math>O(1)</math> || <math>O(n+m)</math> <small>where <math> m</math> is the global size of the heap of objects</small>.
+
| '''Cost''' || style="text-align: center;" |<math>O(1)</math> ||style="text-align: center;" | <math>O(1)</math> ||style="text-align: center;" | <math>O(n+m)</math> <small>where <math> m</math> is the global size of the heap of objects</small>.
 
|-
 
|-
| '''Risk''' ||<math>\color{green} No\ risk(!) \color{black}</math>
+
| '''Risk''' || <math> No\ risk(!) </math>
  
 
For each attempt to read and/or write from the deleted object will raise an error signal {reference to none}.  
 
For each attempt to read and/or write from the deleted object will raise an error signal {reference to none}.  
|| If n>1 then dangling reference error occurs. High probability of the error of contradicting information.  
+
|| If n>1 then '''dangling reference error occurs'''. Worse, with high probability, the error of contradicting information will occur.  
 
|| Chances that programmer will forget to nullify some reference to the object <math>o</math> and hence that the object will remain (''not deleted'').  
 
|| Chances that programmer will forget to nullify some reference to the object <math>o</math> and hence that the object will remain (''not deleted'').  
 
|}
 
|}
Linia 139: Linia 178:
 
<math>States</math> are finite sets of <math>frames</math>. For each state <math>s</math> function <math>reserve</math> returns a frame that does not belong to <math>s</math>. For each pair <math>\langle e, s \rangle</math> operation insert return the set-theoretical union of the set <math>s</math> and and the element <math>e</math>. Similarly, operation delete returns the set <math>s'</math> that resulted by the deletion of element <math>e</math> from the set <math>s</math>. The element <math>initSt</math> is the empty set.
 
<math>States</math> are finite sets of <math>frames</math>. For each state <math>s</math> function <math>reserve</math> returns a frame that does not belong to <math>s</math>. For each pair <math>\langle e, s \rangle</math> operation insert return the set-theoretical union of the set <math>s</math> and and the element <math>e</math>. Similarly, operation delete returns the set <math>s'</math> that resulted by the deletion of element <math>e</math> from the set <math>s</math>. The element <math>initSt</math> is the empty set.
  
== Algorithmic specification ==
+
== Algorithmic specification ATHM of heap management ==
  
Specifications can be expressed in two variants:
+
In this subsection we develop a specification of the Kreczmar's system in the form of an algorithmic theory, i.e. a theory based on algorithmic logic instead of first-order logic c.f. \cite{AL}.  Our theory $\mathcal{ATHM}$ differs from the one proposed by Hanna Oktaba \cite{OktabaDr} by the presence of functor kill and the corresponding axiom of kill.
  
* a metamathematical one, or
+
* a quasi-programmed one.
+
 +
-----------------
 +
Each formalized algorithmic theory $\mathcal{T}$ can be identified with a triple $\mathcal{T}=\langle \mathcal{L, C, A}  \rangle$, where $\mathcal{L}$ is a formalized algorithmic language, $\mathcal{C}$ is the syntactical consequence operation defined by the notion of proof. The last element of the triple is the set $\mathcal{A}$ of axioms specific for the theory $\mathcal{T}$. We can assume that the notion of \textit{proof} is defined on the basis of the sets $\mathcal{A}_L$ of axioms  and $\mathcal{R}$ the set of inference rules of algorithmic logic.  
  
We choose the second notation. Its form is similar to Java’s interface or ADA’s specification. The ''differentia specifica'' stems from the presence of algorithmic formulas.<ref>In Eiffel  programming language, one finds invariants of classes. They are formulas of first-order language which limits their expressibility in an essential way.
+
The formalized algorithmic language $\mathcal{L}$ of our theory $\mathcal{ATHM}$ consists of three sets of well formed expressions: terms $T$, formulas $F$, and programs $P$. The alphabet of the language contains the sets of variables, of functors, of logical functors, of program connectives, and auxiliary symbols like parentheses, commas, etc.
</ref> They are invariants or axioms of the specified class.
+
  
Remember, in this way we define a formalized language in which one may program algorithms and/or write formulas.
+
The set of algorithmic formulas is the least set of expressions that contains all first-order formulas over the same alfabet and is closed with respect to the usual formation rules.Moreover, for any program $K$ and any algorithmic formula $\alpha$, the expression $K\alpha$ is an algorithmic formula.
'''unit''' HeapManagement: '''class''';<br />'''unit''' Frame: '''class''' ; ...<br />'''unit''' State: '''class''' ...<br />'''unit''' reserve: '''function'''(s: State): Frame ...<br />'''unit''' member: '''function'''(f:Frame, s:State): Frame ...<br />'''unit''' insert: '''function'''(f:Frame,s: State): State; ...<br />'''unit''' delete: '''function'''(f:Frame, s: State): State ...<br />'''unit''' kill: '''function'''(f:Frame, s: State): State ...<br />'''unit''' amb: '''function'''(s: State): Frame ...<br />'''unit''' initState: '''function'''(): State ...<br />
+
----------------
 +
We shall consider variables of type $F$ - for frames, usually denoted by $f, f', ...$ and of type $S$ - for states,usually denoted by $s,s', ...$.
 +
The set of functors and predicates of the theory's language    consists of:
  
(* —– Invariants (i.e. axioms) of the '''H'''eap '''M'''anagement system —– *)<br />
+
* $res: S \rightarrow F$
 +
* $amb: S \rightarrow F$
 +
* $ins: F \times S \rightarrow S$
 +
* $del: F \times S \rightarrow S$
 +
* $mb: F \times S \rightarrow \{true, false\}$
 +
* $kill: F \times S \rightarrow S $
 +
 +
and two constants '''none''' $\notin \{F \cup S\}$  and $eS \in S$. The value of any variable $f$ of type $F$ is a frame, or '''none'''.
 +
 +
The logical consequence operation $\vdash$  is defined as in \cite{AL}
  
Below we shall use abbreviations  <math>Fr</math> instead of Frames, <math>St</math> instead of States,<br />  <math>res</math> – reserve, <math>mb</math> – member, <math>ins</math> – insert, <math>del</math> – delete, <math>init</math> – initstate.
 
  
<ol>
+
<big>Axioms specific of the theory $\mathcal{ATHM}$ </big> are given below
<li><p><math>\forall_{s \in St}\ \textbf{while}\ s \neq init \ \textbf{do}\ s := del(amb(s),s) \ \textbf{od}\ (s = init)</math><br /></p>
+
<p><small>For every state the program terminates, hence, every state is a finite set of frames</small></p></li>
+
HM$_1$) $\forall_{s \in S}\ \neg \,mb(res(s),s)$ <br />
<li><p><math>\forall_{f \in Fr}\ \lnot mb(f, init)</math><br /></p>
+
 
<p><small>initstate is the empty set of frames</small></p></li>
+
<small> For every state $s$, operation $res(s)$ returns a new frame, not an element of $s$ </small> <br />
<li><p><math>\forall_{s \in St} \ \lnot mb(none, s)</math><br /></p>
+
 
<p><small>constant none does not belong to any state</small></p></li>
+
HM$_2$) $\forall_{f \in F}\ \lnot \, mb(f, eS)<br />
<li><p><math>\forall_{s \in St} \ \lnot mb(res(s),s) \wedge res(s) \neq none </math><br /></p>
+
<small> the initstate $eS$ is the empty set of frames </small><br />
<p><small>for every state <math>s</math>, function <math>res</math> returns a frame that does not belong to state <math>s</math></small></p></li>
+
HM$_3$)  $\forall_{s \in S}\,  
<li><p><math>\forall_{s \in St} \ s\neq init \Rightarrow mb(amb(s),s)</math><br /></p>
+
\left \{ \begin{array}{l} 
<p><small>For every non-empty state, function <math>amb</math> returns a member of the set <math>s</math></small></p></li>
+
    \mbox{while }\ s \neq eS\ \mbox{ do} \\
<li><p><math>\{s':=ins(f,s)\}(mb(f,s')\wedge \forall_{f' \in Fr}(f'\neq f \implies mb(f',s) \Leftrightarrow mb(f',s')) </math><br /></p>
+
  \quad  s := delete(amb(s),s) \   \\
<p><small>operation insert adds frame f to the set s</small></p>
+
  \mbox{od} \\
<p><br/></p></li>
+
\end{array} \right \}\, (s = eS)$ <br />
<li><p><math>\{s':=del(f,s) \}(\lnot mb(f,s')\wedge \forall_{f' \in Fr}((f'\neq f \implies mb(f',s)\Leftrightarrow    mb(f',s'))</math><br /></p>
+
 
<p><small>operation del deletes frame f from the set s</small></p></li>
+
<small> For every state $s$, the program while (above) terminates, hence, every state is a finite set of frames </small> <br />
<li><p><math>mb(f,s)  \Leftrightarrow  
+
HM$_4$) $\forall_{s \in S} \ s\neq eS \Rightarrow mb(amb(s),s)$ <br />
 +
<small> For every non-empty state, function $amb$ returns a member of the state $s$ </small> <br />
 +
HM$_5$) $\forall_{f \in F}\forall_{s \in S}\{s':=ins(f,s)\}(mb(f,s')\wedge \forall_{f' \in F}(f'\neq f \implies mb(f',s) \Leftrightarrow mb(f',s')) $<br />
 +
<small> operation $ins$ adds frame $f$ to the state $s$ </small> <br />
 +
HM$_6$) $\forall_{f \in F}\forall_{s \in S}\{s':=del(f,s) \}(\lnot mb(f,s')\wedge \forall_{f' \in F}(f'\neq f \implies mb(f',s)\Leftrightarrow    mb(f',s'))$ <br />
 +
<small> operation $del$ deletes frame $f$ from the state $s </small> <br />
 +
HM$_7$) $mb(f,s)  \Leftrightarrow  
 
   \left \{ \begin{array}{l}  
 
   \left \{ \begin{array}{l}  
     \textbf{begin} \\
+
     \mbox{begin} \\
 
     \quad s1 := s;\ bool := false; \\
 
     \quad s1 := s;\ bool := false; \\
     \quad \textbf{while} \ s1 \neq init \wedge \lnot bool \\
+
     \quad \mbox{while} \ s1 \neq eS \wedge \lnot bool \\
     \quad \textbf{do} \\
+
     \quad \mbox{do} \\
 
     \qquad f1 := amb(s1); \\
 
     \qquad f1 := amb(s1); \\
     \qquad \textbf{if} \ f=f1 \ \textbf{then} \ bool := true \ \textbf{fi}; \\
+
     \qquad \mbox{if} \ f=f1 \ \mbox{then} \ bool := true \ \mbox{fi}; \\
 
     \qquad s1 :=del(f1,s1); \\
 
     \qquad s1 :=del(f1,s1); \\
     \quad \textbf{od} \\
+
     \quad \mbox{od} \\
     \textbf{end} \  
+
     \mbox{end} \  
 
   \end{array}
 
   \end{array}
   \right \} bool </math><br /></p>
+
   \right \} bool <br />  
<p><small>This formula defines the relation member, its implementation may differ however. We require that its cost should be constant.</small></p>
+
<small> This formula defines the properties of relation member.  It is '''not''' an implementation. We postulate  that the implemented cost should be constant. </small> <br />
 
+
HM$_8$) <small> The operation kill is characterised by the axioms of the following scheme.   The index $k$ may be any natural number $k>0$, let $1 \leq i \leq k$. </small> <br />  
</li>
+
$\underbrace{((f_1=  ... =f_k) \wedge mb(f_1,s))}_{precondition}\Rightarrow \underbrace{[s' :=kill(f_i,s)]}_{\mathrm{statement}}\underbrace{(f_1=  ... =f_k  = \textbf{none})}_{postcondition}  $ <br />
<li><p><small>The operation kill is described by a scheme. The index <math>k</math> may be any natural number <math>k>0</math>, let <math>1 \leq i \leq k</math></small></p>
+
<small> Any formula of this form is an axiom, it tells that operation kill in one move nullifies ''all the references to the object pointed by the variable'' $f_i$. And indeed,  in the system of Kreczmar the cost of the operation kill is constant.   </small> <br />
<p><math>  ((f_1=  ... =f_k) \wedge mb(f_1,s))\Rightarrow [s' :=kill(f_i,s)](f_1=  ... =f_k  = \textbf{none})  </math></p>
+
<p><small>This is a scheme, it tells that operation kill in one move nullifies ''all the references to the object pointed by the variable <math>f_i</math>''. We require that its cost should be constant.</small></p></li></ol>
+
 
+
'''end''' HeapManagement;
+
  
 
== Properties of the specification ==
 
== Properties of the specification ==
Linia 253: Linia 305:
  
 
The second author was financed by research fellowship within the Project ''Information technologies: Research and its interdisciplinary applications'', Agreement number UDA POKL.04.01.01-00-051/10-00.
 
The second author was financed by research fellowship within the Project ''Information technologies: Research and its interdisciplinary applications'', Agreement number UDA POKL.04.01.01-00-051/10-00.
 +
== Zobacz też ==
 +
* [[odśmiecanie pamięci]]
  
<references />
+
{{Uwagi}}
 +
{{Przypisy}}
 +
<references/>
 +
== Bibliografia ==
 +
 +
# [Kreczmar, Cioni 1984] [[ Media:Programmed-deallocation-without-Dangling-Reference-I.pdf|{{Cytuj pismo| odn=tak | imię=Antoni | nazwisko=Kreczmar| imię2=Gianna | nazwisko2=Cioni |tytuł=Programmed deallocation without dangling reference |czasopismo=Information Processing Letters |strony=179-187 |rok=1984}}]]

Aktualna wersja na dzień 20:39, 22 paź 2015

Explicit dealocation without dangling references

by Andrzej Salwicki, Andrzej Zadrozny

a preliminary version

Abstract. This paper appears 30 years after the article by Gianna Cioni and Antoni Kreczmar under the same title. In the original paper [Kreczmar 1987], Kreczmar introduced a memory management system for dynamic allocation of objects with a few interesing properties. While leaving the programmer in control of object dealloacation (which in Kreczmar’s system is explicit), the system is free of the dangling reference problem, i.e., following the explicit deallocation of an object, all references to the object within the program become automatically nullified. In the present paper, we compare the memory management system described in [Cioni and Kreczmar 1984] with the solutions that appeared later (cf. Table I. below). We also give a formal specification for Kreczmar’s allocator in the form of a formalized algorithmic theory ATHM and sketch the construction of a programming model of the specification. Thus we prove that the specification is consistent, i.e., free of contradictions, and hence it is feasible.

Keywords: programming languages, objects, garbage collection, dangling reference, safety

Introduction

The management of objects, especially the problem of efficient and safe memory allocation, plays an essential role in object-oriented programming. A practical system addressing the problem must be safe (i.e., a deallocated object should be clearly perceived as absent from all places in the program, preventing an unintended interpretation of “bad memory” as the contents of an object that formally no longer exists) and efficient, meaning that the cost of referencing an object (dereferencing an object pointer) cannot increase too significantly beyond the cost of a simple (machine-level) indirect memory reference. Typically, the dynamic instances of a program’s modules reside in two disjoint areas in memory: the stack and the heap. The activation records of functions, methods, contructors, and blocks are allocated on the stack, while the dynamic instances of structures, classes, and, sometimes, more exotic objects, like coroutines (for those languages that admit such objects) are stored in the heap. That latter category of objects is of our primary concern. Owing to the fact that, unlike the former category, the patterns of their creation and destruction do not follow a simple ordered paradigm, but are essentially random and arbitrary, the mechanisms of their allocation and requisite bookkeeping are open for discussion and invention, being thus also susceptible to trends. The first generation of popular programming languages with rudiments of objects: Pascal and C, opted for simplicity (read efficiency) at the cost of safety. Following the (explicit) deallocation of an object, any replicate references to the object within the program would still point to the old chunk of memory originally used to store the object, thus creating a hazard. It was up to the programmer to sidestep that hazard by not touching the old references (pointers) until reset. Note that a mistake consisting in a reference to a deallocatad object would not, in those circumstances, translate into an immediate error: a read access would return an incorrect result (that might trigger an error a potentially long while later), while a write access would potentially corrupt some legitimate object, often with delayed symptoms difficult to diagnose. When C++ appeared later, inheriting and absorbing the legacy of C under its umbrella, it did not revolutionize the paradigm of memory allocation for objects, primarily for two reasons:

  1. C++ still preferred efficiency over safety (it was supposed to replace C without affecting the implementation aspects of its C subset)
  2. C++ wanted to be compatible with C (C functions, using the traditional memory allocator, had to be able to coexist with C++ functions within the same program allowing both types of functions to operate on the same pointers).

Java brought a response to the lack of safety inherent in the traditional approach to memory allocation for objects in the postulate to completely hide the action of object deallocation from the programmer’s view. By abandoning pointers (plus the large bag of pointer trick available to experts in C and C++), and replacing deallocation with the implicit and invisible garbage collection, Java also simplified object-oriented programming, thus making it accessible to a wider population of programmers (who no longer needed to be experts on such low-level and mundane aspects of computing as pointers). C#, introduced by Microsoft as their own “Java”, was basically the same creation when seen from the viewpoint of run-time memory management. That global paradigm shift was compatible with the rapid advancements in CPU and RAM technologies making large and sloppy (less efficient) programs vastly more acceptable than in the early days of C++. These days, few people care whether the efficiency of non-number-crunching program can be improved by the factor of 10 or 20 in terms of execution speed or memory demands for as long as the program appears to work. Following the short initial euphoria, the sealed, one-size-fits-all memory allocation system of Java (and friends) begun to exhibit its own shortcomings. The problem was the inability of the programmer to indicate to the garbage collector that some objects were less needed than others, even though references to them were still present in the program. The standard example of a situation where this kind of indication would be useful is opportunistic caching where, by definition, an object stored in the cache (and thus being referenced by a “pointer”) can (should) be removed when memory becomes scarce. Thus, in 1995 (i.e., three years after the introduction of the first version of Java), the concept of weak reference was added to the language. The idea was to let the garbage collector remove an object, if it was reachable solely via weak references. Notably, that solution didn’t quite solve all problems. As of today, there are four reference types in Java: strong, soft, weak, and phantom (in the decreasing order of strength) whose role is to cater to the various flavors of discarding an object without eliminating its reference first. Judging by the amount of discussion on the web, the four reference types cause more problems to apprentice Java programmers than pointers ever did to newcomers to C or C++. In the latter case the issue was quite simple: one either understood pointers in their entirety or not at all, whereas in Java the situation is exacerbated by the fact that one can go some way without any understanding of the underlying problem at all. The problem, of course, is that there are important circumstances where the cog- nizant programmer would (should) prefer to exercise more or less direct control over memory allocation for objects. That was realised quite promptly, also within the community of (expert) Java developers, despite their initial optimistic attitude towards garbage collection as the sole and ultimate remedy for all problems of memory allocation. In this context, as the pendulum seems to have reached the end of its swing, and is in fact beginning to move in the opposite direction, we believe that the time is ripe to revisit the valuable and somewhat forgotten ideas behind a solution that strikes a compromise between the two extremes, i.e., direct (unsafe) deallocation and (total, unqualified) garbage collection. This solution comes as safe programmed deallocation.

Managing objects

Proper management of objects plays an essential role in execution of object-oriented programs. It must ensure safety and efficiency. Poorly designed ways to remove objects appear unsafe. On the other hand, keeping unnecessary objects slows down the calculations and may lead to a collapse.

It is commonly accepted that dynamic instances of program’s modules reside in two non-overlapping fields in memory. The activation records of procedures, functions, constructors, and blocks are alocated on a stack. The dynamic instances of classes (and also of coroutines, if a programming language allows) are alocated on a heap. This paper presents a choice of problems related to the management of heap. In 1980 Antoni Kreczmar conceived a running system (a virtual machine) for Loglan programming language c.f.. Later, in the paper an the object management system which is free of dangling references and is efficient has been described. Hanna Oktaba studied the system and constructed the algorithmic theory of references and analysed its metamathematical properties . The papers of Cioni and Kreczmar and of Oktaba accomplished the task of verification of the Kreczmar’s system. The system has been thoroughly validated, for it is in use since more than 30 years as a part of the running system of Loglan’82 object programming language .

We believe the time has come to compare the system of Kreczmar to other object management systems and to explain it in terms of an algorithmic theory (different from the Oktaba’s theory).

Most of presently used object oriented programming languages appeared after 1984 (C++ in 1985, Java in 1995, Python in 1989, Ruby 1993 ). No one of them has a system similar to the Kreczmar’s one.

In C++ deallocation of objects is fast. However, it is well known that the system of C++ has no protection against dangling reference errors. The instruction delete() is executed without any form of control. The same may be said about the programming language Pascal and its instruction: dispose().

Java programming language forbids the explicit removal of objects. Its system is free of dangling reference errors, however it is inefficient.

We think that it is worthwhile to discover the inventions contained in and to apply them in practice, for they offer the safety without any loss on efficiency.

Let us briefly expose the problems. Objects are 1created, 2shared, 3used, and, 4eventually become not needed, anymore. We shall abstract from many technical details of object’s creation such as its size, the type of object, the ways the free memory is organized.

Creation of objects is done through evaluation of object generator expressions, e.g. in the assignment x := new ClassType(actparams). An object, once created, can be shared among several variables, inspected, and updated.

Sharing is accomplished by execution of assignment instructions e.g. the assignment y:=x causes that now the object pointed by [math]x[/math] is shared by the variables [math]x[/math] and [math]y[/math].

Other forms of object’s usage reduce to one of three cases:
inspection – reading the value of an attribute attr of an object e.g. x.attr ,
updating – writing the value of an attribute, e.g. x.attr:=8,
servicing – done by calling a method of the object e.g. call x.meth(81).
All three forms of usage should begin with the checking whether the variable x points to an alive object.

Eventually no further actions on the object will be taken. It is wise to dispose of it. In these circumstances the designer of an object management system is confronted to three major threats:

  • memory leak problem,
  • memory fragmentation,
  • dangling references problem.

Below, we explain these terms:

dangling reference

occurs when objects are created and remain unused. Program consumes memory. It leads to the slowdown of computations or even to a complete blockade.

A situation when in past some variable [math]x[/math] pointed to an object [math]o[/math], but at present, the object [math]o[/math] no longer exists. One says, the variable [math]x[/math] is a dangling pointer if any attempt to use the variable [math]x[/math] is treated without an alarm. The system is free of dangling reference error if any attempt to use the variable [math]x[/math] raises an exception, e.g. caused alarm: reference to none.

Two variables of different types mutually contradict themselves: variable [math]x[/math] says I am pointing to an object of type A, variable [math]y[/math] says I am pointing to an object of type B and both variables point to the same address. This situation may happen when deallocation created dangling reference(s).

Suppose the object memory system admits the dangling references. It may happen that, after execution of delete(x), the dangling pointer [math]y[/math] points to the memory frame where a new object [math]z[/math] resides. Then the delayed instruction delete(y) will cause the destruction of the object [math]z[/math].

There are many slots of free memory, none is large enough to hold a new object.

A few words on dangling reference error are in order: one may distinguish between detected and undetected dangling reference error. The second error is a real danger. Detected dangling reference happens when the virtual machine finds that the value of a pointer is none (or null). This is an unpleasant situation, for the programmer must find the cause of it. Undetected dangling reference is much worse for many reasons, see above. One may ask: if so then perhaps it is possible to equip the compiler in a tool to detect dangling references errors in advance – before an execution of a program? The answer is: NO, such an algorithm does not exist. [1]

There exists a variety of object management systems. One may classify them with respect to different definitions of garbage.

The first, most natural, definition of garbage reads “An object [math]o[/math] is a garbage, whenever the program instructs (the runtime system) it is no longer needed.”. This definition is accepted in C++ programming language. A C++ program may contain instruction delete(x). However, the instruction has a side effect: the dangling reference may appear. Namely, one can observe some variables that point to segments of memory where no object resides. A dangling reference may lead to another error of contradicting information. This phenomenon occurs when two variables point to the same address and one says: “I am pointing to an object of type A” and the other says “I am pointing to an object of type B”. Errors of both kinds are difficult in diagnosis and very dangerous ones. There is no algorithm to detect the dangling reference in the text of program. For the problem is reducible to the halting problem.

Another definition of garbage reads: “Object with no references to it, is a garbage”. Some programming languages try to keep track of references with reference counters (e.g. Python ). In this way a garbage collector can easily identify objects with reference counters equal zero as garbage. However, by introducing reference counters one creates an overhead in code’s length and also in execution time. The result is a slowdown of execution (A. Appel says “On the whole, the problems with reference counting outweigh its advantages”, p.264). The same opinion had O.-J. Dahl, the father of object oriented programming . Let us recall that reference counters do not help in recognition of cycles of no longer needed objects.

Subsequent definitions of garbage base on different types of references. Namely, one differentiates weak references from normal ones. Now, “Object [math]o[/math] is an garbage if there is no normal reference to it, even if there exist weak reference to [math]o[/math]”. The weak references were introduced with two aims: 1 to decrease the cost of reference counting, 2 to diminish the risk that some objects will be kept because the programmer forgot to nullify all references to it.

All three types of object management systems have certain deficiencies. The question arises: is it possible to replace operation delete by another operation, say kill, such that kill has no side effect of dangling reference. In it was shown that there exists an objects management system with kill operation.

Any program that intensively creates objects and deallocates some, may cause fragmentation of the object memory. Sometimes, the situation may be improved by the defragmentation.[2]

The system designed by Kreczmar integrates the features mentioned above. For it allows to:

  • deallocate no longer needed objects (kill operation),
  • compactify heap of object (defragmentation),
  • collect garbage, i.e. objects that are not accessible.

Below we compare object programming languages. We indicate that it is desirable that the languages and running systems satisfy the following conditions:

  1. (r1) For every type (class) [math]T[/math], for every variable [math]x[/math]. If the variable [math]x[/math] is of type [math]T[/math] then its value is an object of a subclass [math]U[/math] of class [math]T[/math] or [math]x=none[/math]. (This is a fundamental invariant (i.e. axiom) of any true object system)
  2. (r2) For every object [math]s[/math] it is possible to distinguish the fields containing pointers to objects from the fields that hold values of primitive types, like e.g. integer, float, boolean,...

Deallocation in Java, C++, Python et al.

In this section we briefly present the solutions taken in other object oriented languages.

C++, object Pascal, Objective C

We shall limit our considerations to the programs free of malloc instruction. For malloc instructions break the rules (r1), (r2), listed above.

Object [math]x[/math] can be deleted with [math]delete(x)[/math] statement. Other variables that are pointing to the removed object preserve their value. It leads to the dangling references error. The error can be avoided if all those variables are nullified [math]x1 = null; ... xn=null;[/math].

It is to a developer to remember all variables referring to the object getting deallocated and to nullify all of them prior to instruction delete. Obviously it is an error prone approach. To decrease the number of errors the notion of weak reference was proposed. .

Java, Python

Already the report of language Modula3 drew attention to the risk of hanging references and non-existence of an algorithm that could detect such errors. In Java white paper there are quite a few statements describing this problem and justifying the interdict of [math]delete[/math] instruction. Instead there is a Garbage Collector mechanism, which frees developer from manual removing of references to objects.

Soon, the opinion that garbage collector is an ultimate solution in objects management was verified. Three years after first version of Java (1995), in JDK 1.2 weak references have been introduced. Developer can declare variable as weak reference. Weak references do not change reference counter in cPython , and garbage collection algorithm does not take them into account in Java. If there are no strong references to object it is considered for collection. Even if there are some weak references to it.

Comparison

Loglan’82 C++, Pascal Java, Python
Pre- Certain object [math]o[/math] is referenced by the [math]n[/math] variables, i.e. [math]\ x_1=x_2=\ ...\ =x_n[/math].
Code kill([math]x_i[/math]) delete([math]x_i[/math]) [math] x_1=null;\ x_2=null;\ ...\ x_n=null[/math];

Now, the instruction

[math]gc()[/math] causes the object [math]o[/math] will be deleted.

Post- All the variables took the value none.

Object [math]o[/math] is deleted.

Object [math]o[/math] has been deleted. The variable [math]x_i[/math] has the value null.

Other variables point to the deleted frame – it is a dangerous error – dangling reference.

Object [math]o[/math] has been deleted – under condition that all the references to the object have been earlier assigned the null value.
Cost [math]O(1)[/math] [math]O(1)[/math] [math]O(n+m)[/math] where [math] m[/math] is the global size of the heap of objects.
Risk [math] No\ risk(!) [/math]

For each attempt to read and/or write from the deleted object will raise an error signal {reference to none}.

If n>1 then dangling reference error occurs. Worse, with high probability, the error of contradicting information will occur. Chances that programmer will forget to nullify some reference to the object [math]o[/math] and hence that the object will remain (not deleted).



Observation. Lets consider situation from Java:

[math]x_1:x_1 \rightarrow o[/math]

[math]y_{weak} \stackrel{weak}{:=} x_1[/math]

If weak reference [math]y[/math] refer to some object o, then there exists a strong reference [math]x_i[/math] to same object:

[math]y_{weak}: y_{weak} \rightarrow o \Leftrightarrow exists\ x_i \rightarrow o[/math]

It is only partially true. We could say probably true. Due to non-deterministic behaviour of garbage collector new ambiguity arises. After operation [math]x_1:=null[/math] weak reference [math]y_{weak}[/math] continues to refer to original object [math]o[/math] for some time. Only after run of garbage collection mechanism object is disposed and weak reference is nullified. Due to non-deterministic implementations of most garbage collectors developers cannot predict or effectively enforce collection. Side effect of this property is a necessity to treat normal object references and weak references in different manner.

Second observation. As long as weak reference to the object exists we can provoke a situation when object intended for collection will be restored to life:

123456789= Disposable tg = new Disposable();
/* A new Disposable object o created, tg is a reference to the object o */
WeakReference<Disposable> weak_tg = new WeakReference<Disposable>(tg);
/* creates Weak Reference to the same object o */
tg = null; /* remove last reference to the object o
The object o is ready to be collected */
System.gc(); /* This is a hint to activate GC */
Disposable tg2 = weak_tg.get();
/* It may happen that GC was not activated
And operation get will return reference to the object o */

This will happen when instruction [math]System.gc()[/math] will be ignored for some reason by virtual machine.

Algorithmic specification of Kreczmar’s system

System for the first time presented in original article is part of running-system (virtual machine) Loglan’82 language. Last 30 years provided us with arguments to renew interest in Kreczmar’s system. In this chapter we will provide formal requirements for managing memory of objects. We used here H.Oktaba thesis . After reading these requirements, the reader can better understand the nature of the result .

Informal description

The universe of a heap managing system HM consists of states and objects. A state may be viewed as a finite set of objects. For the purpose of the present work we can abstract from the structure of objects, even from their size.[3]

The universe of HM system consist of two sets [math]U = Frames \cup States[/math] with the following operations:

  • [math]reserve: States \rightarrow Frames[/math]
  • [math]insert: Frames \times States \rightarrow States[/math]
  • [math]delete: Frames \times States \rightarrow States[/math]
  • [math]member: Frames \times States \rightarrow {true, false}[/math]
  • [math]initSt \in States [/math]
  • [math]kill: Frame \times States \rightarrow States[/math]

[math]States[/math] are finite sets of [math]frames[/math]. For each state [math]s[/math] function [math]reserve[/math] returns a frame that does not belong to [math]s[/math]. For each pair [math]\langle e, s \rangle[/math] operation insert return the set-theoretical union of the set [math]s[/math] and and the element [math]e[/math]. Similarly, operation delete returns the set [math]s'[/math] that resulted by the deletion of element [math]e[/math] from the set [math]s[/math]. The element [math]initSt[/math] is the empty set.

Algorithmic specification ATHM of heap management

In this subsection we develop a specification of the Kreczmar's system in the form of an algorithmic theory, i.e. a theory based on algorithmic logic instead of first-order logic c.f. \cite{AL}. Our theory $\mathcal{ATHM}$ differs from the one proposed by Hanna Oktaba \cite{OktabaDr} by the presence of functor kill and the corresponding axiom of kill.



Each formalized algorithmic theory $\mathcal{T}$ can be identified with a triple $\mathcal{T}=\langle \mathcal{L, C, A} \rangle$, where $\mathcal{L}$ is a formalized algorithmic language, $\mathcal{C}$ is the syntactical consequence operation defined by the notion of proof. The last element of the triple is the set $\mathcal{A}$ of axioms specific for the theory $\mathcal{T}$. We can assume that the notion of \textit{proof} is defined on the basis of the sets $\mathcal{A}_L$ of axioms and $\mathcal{R}$ the set of inference rules of algorithmic logic.

The formalized algorithmic language $\mathcal{L}$ of our theory $\mathcal{ATHM}$ consists of three sets of well formed expressions: terms $T$, formulas $F$, and programs $P$. The alphabet of the language contains the sets of variables, of functors, of logical functors, of program connectives, and auxiliary symbols like parentheses, commas, etc.

The set of algorithmic formulas is the least set of expressions that contains all first-order formulas over the same alfabet and is closed with respect to the usual formation rules.Moreover, for any program $K$ and any algorithmic formula $\alpha$, the expression $K\alpha$ is an algorithmic formula.


We shall consider variables of type $F$ - for frames, usually denoted by $f, f', ...$ and of type $S$ - for states,usually denoted by $s,s', ...$. The set of functors and predicates of the theory's language consists of:

  • $res: S \rightarrow F$
  • $amb: S \rightarrow F$
  • $ins: F \times S \rightarrow S$
  • $del: F \times S \rightarrow S$
  • $mb: F \times S \rightarrow \{true, false\}$
  • $kill: F \times S \rightarrow S $

and two constants none $\notin \{F \cup S\}$ and $eS \in S$. The value of any variable $f$ of type $F$ is a frame, or none.

The logical consequence operation $\vdash$ is defined as in \cite{AL}


Axioms specific of the theory $\mathcal{ATHM}$ are given below

HM$_1$) $\forall_{s \in S}\ \neg \,mb(res(s),s)$

For every state $s$, operation $res(s)$ returns a new frame, not an element of $s$

HM$_2$) $\forall_{f \in F}\ \lnot \, mb(f, eS)$
the initstate $eS$ is the empty set of frames
HM$_3$) $\forall_{s \in S}\,

\left \{ \begin{array}{l}  
    \mbox{while }\ s \neq eS\ \mbox{ do} \\
  \quad   s := delete(amb(s),s) \   \\
  \mbox{od} \\
\end{array} \right \}\, (s = eS)$ 

For every state $s$, the program while (above) terminates, hence, every state is a finite set of frames
HM$_4$) $\forall_{s \in S} \ s\neq eS \Rightarrow mb(amb(s),s)$
For every non-empty state, function $amb$ returns a member of the state $s$
HM$_5$) $\forall_{f \in F}\forall_{s \in S}\{s':=ins(f,s)\}(mb(f,s')\wedge \forall_{f' \in F}(f'\neq f \implies mb(f',s) \Leftrightarrow mb(f',s')) $
operation $ins$ adds frame $f$ to the state $s$
HM$_6$) $\forall_{f \in F}\forall_{s \in S}\{s':=del(f,s) \}(\lnot mb(f,s')\wedge \forall_{f' \in F}(f'\neq f \implies mb(f',s)\Leftrightarrow mb(f',s'))$
operation $del$ deletes frame $f$ from the state $s $
HM$_7$) $mb(f,s) \Leftrightarrow

 \left \{ \begin{array}{l} 
   \mbox{begin} \\
   \quad s1 := s;\ bool := false; \\
   \quad \mbox{while} \ s1 \neq eS \wedge \lnot bool \\
   \quad \mbox{do} \\
   \qquad f1 := amb(s1); \\
   \qquad \mbox{if} \ f=f1 \ \mbox{then} \ bool := true \ \mbox{fi}; \\
   \qquad s1 :=del(f1,s1); \\
   \quad \mbox{od} \\
   \mbox{end} \ 
 \end{array}
 \right \} bool $  

This formula defines the properties of relation member. It is not an implementation. We postulate that the implemented cost should be constant.
HM$_8$) The operation kill is characterised by the axioms of the following scheme. The index $k$ may be any natural number $k>0$, let $1 \leq i \leq k$.
$\underbrace{((f_1= ... =f_k) \wedge mb(f_1,s))}_{precondition}\Rightarrow \underbrace{[s' :=kill(f_i,s)]}_{\mathrm{statement}}\underbrace{(f_1= ... =f_k = \textbf{none})}_{postcondition} $
Any formula of this form is an axiom, it tells that operation kill in one move nullifies all the references to the object pointed by the variable $f_i$. And indeed, in the system of Kreczmar the cost of the operation kill is constant.

Properties of the specification

The above set of algorithmic formulas defines the requirements imposed on a class to be implemented. Moreover, it allows to prove some useful facts e.g.
Theorem 1
For every state s,
[math]\forall_{s \in State}\, \left \{ \begin{array}{l} \textbf{begin} \\ \quad s' := initSt; \\ \quad \textbf{while}\ s \neq initSt\ \textbf{do} \\ \qquad r := reserve(s'); \\ \qquad \textbf{if}\ member(r,s)\ \textbf{then} \ s := delete(r,s) \ \textbf{fi}; \\ \qquad s' := insert(r,s') \\ \quad \textbf{od} \\ \textbf{end} \end{array} \right \}\, (s = initSt)[/math]

i.e. the operation reserve new frame may replace the operation amb in the postulate that every state is a finite set of frames.
Theorem 2
The program in the axiom HM[math]_8[/math] never loops.

We would like to attract the attention of the reader to the property HM[math]_4[/math]. One may use it to deduce the important fact.
Theorem 3
Let T be any class, let (a[math]_1[/math], …, a[math]_k[/math]) be a list of actual parametres. [math]\textbf{new }T(a_1, \dots , a_k) \neq \textbf{new }T(a_1, \dots , a_k)[/math]

One can investigate the properties of the specification itself. We are able to state a couple of important metatheorems about the system of axioms HM.
Metatheorem 1
(on consistency)
The system of axioms HM[math]_1[/math] – HM[math]_9[/math] has a model.

Proof. Using the ideas of the paper one can construct a programmable model of axioms HM[math]_1[/math] – HM[math]_9[/math].
This model will be called the standard model.
Metatheorem 2
(representation theorem)
Every model of the axioms HM[math]_1[/math] – HM[math]_9[/math] is isomorphic to the standard one.

Variations of axiom’s system

Are the simplifications we made important? One can easily observe two points:

  • One can consider a slightly different operation reserve - with a parameter appetite defining the size required for an object. It leads to a consistent set of axioms.
  • Another extension of our system HM is defined when one describes the internal structure of object. This extension is also consistent with our initial system.

Till now we needed not to introduce an operation of garbage collection. In our abstract version the set of Frames is isomorphic with the set of natural numbers.

One can ask how to express the property the set [math]Fr[/math] of frames is finite? The answer is easy:
HM[math]_{10}[/math]) [math]\exists_{s_0 \in St} \forall_{f \in Fr} mb(f, s_0)[/math]
Which reads: the set of frames is equal to some state, hence [math]Fr[/math] is a finite set.
Again, the set of the axioms HM[math]_1[/math] – HM[math]_{10}[/math] is consistent.

Final remarks

In the paper the memory management system is treated as a whole. The problems of garbage collection and dangling references were not separated.

Operation kill can be safely implemented with low, fixed cost. The frequency of garbage collection is diminished due to following discipline

  1. kill operation is called whenever an object should be deleted, the freed frame is added to the list of free memory frames,
  2. the list of free memory frames is checked and used during an operation of new,
  3. compactification (i.e. defragmentation) is done when needed,
  4. instruction gc() is called when compactification did not help.

Let us remark that since Java was introduced in 1995 the memory size has grown thousand times.

Acknowledgement

The second author was financed by research fellowship within the Project Information technologies: Research and its interdisciplinary applications, Agreement number UDA POKL.04.01.01-00-051/10-00.

Zobacz też

Uwagi

Przypisy

  1. Should an algorithm [math]\mathcal{A}[/math] for detection of dangling reference errors in source program exist, then we would construct another algorithm [math]\mathcal{B}[/math] to detect whether any given program will terminate or not. This is known to be impossible.
  2. In the authors use the word compactification.
  3. One can add these ingredients of object’s size and its content later.

Bibliografia

  1. [Kreczmar, Cioni 1984] Antoni Kreczmar, Gianna Cioni. Programmed deallocation without dangling reference. „Information Processing Letters”, s. 179-187, 1984.