\chapter{Memory Management \label{memory}}
\sectionauthor{Vladimir Marangozov}{[email protected]}


\section{Overview \label{memoryOverview}}

Memory management in Python involves a private heap containing all
Python objects and data structures. This private heap is managed
internally by the \emph{Python memory manager}.  The Python memory
manager has several components that deal with different aspects of
dynamic storage management, such as sharing, segmentation,
preallocation, or caching.

At the lowest level, a raw memory allocator ensures that there is
enough room in the private heap for storing all Python-related data
by interacting with the memory manager of the operating system. On top
of the raw memory allocator, several object-specific allocators
operate on the same heap and implement distinct memory management
policies adapted to the peculiarities of every object type. For
example, integer objects are managed differently within the heap than
strings, tuples or dictionaries because integers imply different
storage requirements and speed/space tradeoffs. The Python memory
manager thus delegates some of the work to the object-specific
allocators, but ensures that the latter operate within the bounds of
the private heap.
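
The layering described above can be pictured with a small sketch.  It
is only an illustration under stated assumptions, not the
interpreter's actual implementation: the names \code{demo_slot},
\cfunction{demo_refill()}, \cfunction{demo_slot_new()} and
\cfunction{demo_slot_del()} are hypothetical, and the C library's
\cfunction{malloc()} stands in for the raw allocator.  An
object-specific allocator for small fixed-size objects keeps a free
list and turns to the raw allocator only when that list is empty:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical fixed-size object.  The union lets a free slot store
   the link to the next free slot in the same memory as the payload. */
typedef union demo_slot {
    long value;                 /* payload while the slot is in use */
    union demo_slot *next_free; /* link while the slot is on the free list */
} demo_slot;

#define DEMO_SLOTS_PER_BLOCK 64 /* arbitrary block size for this sketch */

static demo_slot *demo_free_list = NULL;

/* Ask the raw allocator (malloc() here) for one block and thread its
   slots onto the free list.  Returns 0 on success, -1 on failure.
   Blocks are never handed back to the raw allocator in this sketch. */
static int
demo_refill(void)
{
    demo_slot *block;
    size_t i;

    block = (demo_slot *) malloc(DEMO_SLOTS_PER_BLOCK * sizeof(demo_slot));
    if (block == NULL)
        return -1;
    for (i = 0; i < DEMO_SLOTS_PER_BLOCK; i++) {
        block[i].next_free = demo_free_list;
        demo_free_list = &block[i];
    }
    return 0;
}

/* Object-specific allocation: pop a slot, refilling from the raw
   allocator only when the free list is empty. */
static demo_slot *
demo_slot_new(long value)
{
    demo_slot *slot;

    if (demo_free_list == NULL && demo_refill() < 0)
        return NULL;
    slot = demo_free_list;
    demo_free_list = slot->next_free;
    slot->value = value;
    return slot;
}

/* Object-specific release: push the slot back for reuse instead of
   calling free(). */
static void
demo_slot_del(demo_slot *slot)
{
    assert(slot != NULL);
    slot->next_free = demo_free_list;
    demo_free_list = slot;
}
```

A released slot is the first one handed out again, which is the kind
of speed/space tradeoff a per-type allocator can make and a
general-purpose allocator cannot assume.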

It is important to understand that the management of the Python heap
is performed by the interpreter itself and that the user has no
control over it, even if she regularly manipulates object pointers to
memory blocks inside that heap.  The allocation of heap space for
Python objects and other internal buffers is performed on demand by
the Python memory manager through the Python/C API functions listed in
this document.

To avoid memory corruption, extension writers should never try to
operate on Python objects with the functions exported by the C
library: \cfunction{malloc()}\ttindex{malloc()},
\cfunction{calloc()}\ttindex{calloc()},
\cfunction{realloc()}\ttindex{realloc()} and
\cfunction{free()}\ttindex{free()}.  Doing so mixes calls to the C
allocator with calls to the Python memory manager, with fatal
consequences, because the two implement different algorithms and
operate on different heaps.  However, an extension may safely allocate
and release memory blocks with the C library allocator for its own
private purposes, as shown in the following example:

\begin{verbatim}
    PyObject *res;
    char *buf = (char *) malloc(BUFSIZ); /* for I/O */

    if (buf == NULL)
        return PyErr_NoMemory();
    /* ...Do some I/O operation involving buf... */
    res = PyString_FromString(buf);
    free(buf); /* malloc'ed */
    return res;
\end{verbatim}

In this example, the memory request for the I/O buffer is handled by
the C library allocator. The Python memory manager is involved only
in the allocation of the string object returned as a result.

In most situations, however, it is recommended to allocate memory from
the Python heap specifically because the latter is under control of
the Python memory manager. For example, this is required when the
interpreter is extended with new object types written in C. Another
reason for using the Python heap is the desire to \emph{inform} the
Python memory manager about the memory needs of the extension module.
Even when the requested memory is used exclusively for internal,
highly-specific purposes, delegating all memory requests to the Python
memory manager gives the interpreter a more accurate picture of
its memory footprint as a whole. With that information, the Python
memory manager can decide whether to trigger appropriate actions,
such as garbage collection, memory compaction or other preventive
procedures. Note that when the C library allocator is used as shown
in the previous example, the memory allocated for the I/O buffer
escapes the Python memory manager completely.


\section{Memory Interface \label{memoryInterface}}

The following function sets, modeled after the ANSI C standard but
with defined behavior when zero bytes are requested, are available
for allocating and releasing memory from the Python heap:


\begin{cfuncdesc}{void*}{PyMem_Malloc}{size_t n}
  Allocates \var{n} bytes and returns a pointer of type \ctype{void*}
  to the allocated memory, or \NULL{} if the request fails.
  Requesting zero bytes returns a distinct non-\NULL{} pointer if
  possible, as if \cfunction{PyMem_Malloc(1)} had been called instead.
  The memory will not have been initialized in any way.
\end{cfuncdesc}

\begin{cfuncdesc}{void*}{PyMem_Realloc}{void *p, size_t n}
  Resizes the memory block pointed to by \var{p} to \var{n} bytes.
  The contents will be unchanged up to the minimum of the old and the
  new sizes. If \var{p} is \NULL, the call is equivalent to
  \cfunction{PyMem_Malloc(\var{n})}; else if \var{n} is equal to zero, the
  memory block is resized but is not freed, and the returned pointer
  is non-\NULL.  Unless \var{p} is \NULL, it must have been
  returned by a previous call to \cfunction{PyMem_Malloc()} or
  \cfunction{PyMem_Realloc()}. If the request fails,
  \cfunction{PyMem_Realloc()} returns \NULL{} and \var{p} remains a
  valid pointer to the previous memory area.
\end{cfuncdesc}

\begin{cfuncdesc}{void}{PyMem_Free}{void *p}
  Frees the memory block pointed to by \var{p}, which must have been
  returned by a previous call to \cfunction{PyMem_Malloc()} or
  \cfunction{PyMem_Realloc()}.  Otherwise, or if
  \cfunction{PyMem_Free(\var{p})} has already been called on it,
  undefined behavior occurs. If \var{p} is \NULL, no operation is
  performed.
\end{cfuncdesc}
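
The failure rule of \cfunction{PyMem_Realloc()} suggests a common
calling pattern: assign the result to a temporary, and overwrite the
original pointer only on success.  The sketch below illustrates this
under stated assumptions.  So that it compiles without the Python
headers, it uses hypothetical stand-ins (\cfunction{demo_malloc()},
\cfunction{demo_realloc()}, \cfunction{demo_free()}) built on the C
library, with an arbitrary size cap (\code{DEMO_LIMIT}) so that a
failure can be provoked; \cfunction{demo_grow()} is likewise a
hypothetical helper.  In an extension module the corresponding
\cfunction{PyMem_*()} functions would be called instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_LIMIT 4096   /* arbitrary cap so that failure can be exercised */

/* Stand-in for PyMem_Malloc(): a zero-byte request is treated as a
   one-byte request, so a successful call never returns NULL. */
static void *
demo_malloc(size_t n)
{
    if (n > DEMO_LIMIT)
        return NULL;
    return malloc(n ? n : 1);
}

/* Stand-in for PyMem_Realloc(): on failure the old block is untouched. */
static void *
demo_realloc(void *p, size_t n)
{
    if (n > DEMO_LIMIT)
        return NULL;
    return realloc(p, n ? n : 1);
}

/* Stand-in for PyMem_Free(): NULL is accepted and ignored. */
static void
demo_free(void *p)
{
    free(p);
}

/* Grow *buf to new_size bytes.  Returns 0 on success and -1 on failure;
   on failure *buf still points to the original, still-allocated block. */
static int
demo_grow(char **buf, size_t new_size)
{
    char *tmp;

    assert(buf != NULL);
    tmp = (char *) demo_realloc(*buf, new_size);
    if (tmp == NULL)
        return -1;          /* do not overwrite *buf with NULL */
    *buf = tmp;
    return 0;
}
```

Writing \code{buf = demo_realloc(buf, n)} directly would lose the only
reference to the old block whenever the call fails, which leaks it.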

The following type-oriented macros are provided for convenience.  Note 
that \var{TYPE} refers to any C type.

\begin{cfuncdesc}{\var{TYPE}*}{PyMem_New}{TYPE, size_t n}
  Same as \cfunction{PyMem_Malloc()}, but allocates \code{(\var{n} *
  sizeof(\var{TYPE}))} bytes of memory.  Returns a pointer cast to
  \ctype{\var{TYPE}*}.  The memory will not have been initialized in
  any way.
\end{cfuncdesc}

\begin{cfuncdesc}{\var{TYPE}*}{PyMem_Resize}{void *p, TYPE, size_t n}
  Same as \cfunction{PyMem_Realloc()}, but the memory block is resized
  to \code{(\var{n} * sizeof(\var{TYPE}))} bytes.  Returns a pointer
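  cast to \ctype{\var{TYPE}*}. On return, \var{p} will be a pointer to
  the new memory area, or \NULL{} in the event of failure.  Because
  \var{p} itself is overwritten, save its original value beforehand if
  the old block must still be reachable after a failure.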
\end{cfuncdesc}

\begin{cfuncdesc}{void}{PyMem_Del}{void *p}
  Same as \cfunction{PyMem_Free()}.
\end{cfuncdesc}
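
Because \cfunction{PyMem_Resize()} assigns its result back to \var{p},
a failed call leaves \var{p} set to \NULL{} while the old block is
still allocated.  The following sketch shows one way to cope with
that, under stated assumptions: \code{DEMO_RESIZE} is a hypothetical
stand-in macro with the same assign-back shape, built on a
hypothetical \cfunction{demo_realloc()} with an arbitrary size cap so
that failure can be provoked, and \cfunction{demo_resize_longs()} is a
hypothetical helper.  The sketch does not guard against overflow in
the size multiplication.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_LIMIT 4096   /* arbitrary cap so that failure can be exercised */

/* Stand-in for PyMem_Realloc(): on failure the old block is untouched. */
static void *
demo_realloc(void *p, size_t n)
{
    if (n > DEMO_LIMIT)
        return NULL;
    return realloc(p, n ? n : 1);
}

/* Stand-in for PyMem_Resize(): the result is assigned back to p, so p
   is NULL after a failed call. */
#define DEMO_RESIZE(p, TYPE, n) \
    ((p) = (TYPE *) demo_realloc((p), (n) * sizeof(TYPE)))

/* Resize *items to hold count longs.  Returns 0 on success and -1 on
   failure; on failure the original pointer is restored so the caller
   can still use and release the old block. */
static int
demo_resize_longs(long **items, size_t count)
{
    long *saved;

    assert(items != NULL);
    saved = *items;                 /* keep a second reference */
    DEMO_RESIZE(*items, long, count);
    if (*items == NULL) {
        *items = saved;             /* old block was not freed; restore it */
        return -1;
    }
    return 0;
}
```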

In addition, the following macro sets are provided for calling the
Python memory allocator directly, without involving the C API functions
listed above. However, note that their use does not preserve binary
compatibility across Python versions and is therefore deprecated in
extension modules.

\cfunction{PyMem_MALLOC()}, \cfunction{PyMem_REALLOC()}, \cfunction{PyMem_FREE()}.

\cfunction{PyMem_NEW()}, \cfunction{PyMem_RESIZE()}, \cfunction{PyMem_DEL()}.


\section{Examples \label{memoryExamples}}

Here is the example from section \ref{memoryOverview}, rewritten so
that the I/O buffer is allocated from the Python heap by using the
first function set:

\begin{verbatim}
    PyObject *res;
    char *buf = (char *) PyMem_Malloc(BUFSIZ); /* for I/O */

    if (buf == NULL)
        return PyErr_NoMemory();
    /* ...Do some I/O operation involving buf... */
    res = PyString_FromString(buf);
    PyMem_Free(buf); /* allocated with PyMem_Malloc */
    return res;
\end{verbatim}

The same code using the type-oriented function set:

\begin{verbatim}
    PyObject *res;
    char *buf = PyMem_New(char, BUFSIZ); /* for I/O */

    if (buf == NULL)
        return PyErr_NoMemory();
    /* ...Do some I/O operation involving buf... */
    res = PyString_FromString(buf);
    PyMem_Del(buf); /* allocated with PyMem_New */
    return res;
\end{verbatim}

Note that in the two examples above, the buffer is always
manipulated via functions belonging to the same set. Indeed, it
is required to use the same memory API family for a given
memory block, so that the risk of mixing different allocators is
reduced to a minimum. The following code sequence contains two errors,
one of which is labeled as \emph{fatal} because it mixes two different
allocators operating on different heaps.

\begin{verbatim}
char *buf1 = PyMem_New(char, BUFSIZ);
char *buf2 = (char *) malloc(BUFSIZ);
char *buf3 = (char *) PyMem_Malloc(BUFSIZ);
/* ... */
PyMem_Del(buf3);  /* Wrong -- should be PyMem_Free() */
free(buf2);       /* Right -- allocated via malloc() */
free(buf1);       /* Fatal -- should be PyMem_Del()  */
\end{verbatim}
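
Mismatches like these are easy to make and hard to spot by reading
code.  As an illustration only, the sketch below shows how a debugging
wrapper could catch them at run time by recording the allocating
family in a small header in front of each block.  Everything in it is
hypothetical (\code{demo_family}, \code{demo_header},
\cfunction{demo_tagged_alloc()}, \cfunction{demo_tagged_free()}); it
is built on the C library allocator and is not a description of how
the Python memory manager works.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Two hypothetical API families whose blocks must not be mixed. */
typedef enum { DEMO_FAMILY_A, DEMO_FAMILY_B } demo_family;

/* Header stored in front of each block.  The extra union members pad
   it to a strict alignment so the payload that follows stays aligned. */
typedef union {
    demo_family family;
    long double align_ld;
    long long align_ll;
    void *align_ptr;
} demo_header;

/* Allocate n bytes on behalf of the given family and remember the
   family in the header.  Returns NULL on failure. */
static void *
demo_tagged_alloc(demo_family family, size_t n)
{
    demo_header *h;

    assert(n <= (size_t) -1 - sizeof(demo_header));  /* no size overflow */
    h = (demo_header *) malloc(sizeof(demo_header) + (n ? n : 1));
    if (h == NULL)
        return NULL;
    h->family = family;
    return (void *) (h + 1);
}

/* Release p on behalf of the given family.  Returns 0 after releasing
   (or if p is NULL), and -1 without releasing anything if the block
   was allocated by the other family. */
static int
demo_tagged_free(demo_family family, void *p)
{
    demo_header *h;

    if (p == NULL)
        return 0;
    h = (demo_header *) p - 1;
    if (h->family != family)
        return -1;
    free(h);
    return 0;
}
```

With such a wrapper, releasing a block through the wrong family is
reported instead of silently corrupting a heap.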

In addition to the functions aimed at handling raw memory blocks from
the Python heap, objects in Python are allocated and released with
\cfunction{PyObject_New()}, \cfunction{PyObject_NewVar()} and
\cfunction{PyObject_Del()}.

These will be explained in the next chapter on defining and
implementing new object types in C.