If I had to name one thing that surprised me the most back when I started messing with C and PostgreSQL, I'd probably name memory contexts. I had never met this concept before, so it seemed rather strange, and there's not much documentation introducing it. I recently read an interesting paper summarizing the architecture of a database system (by Hellerstein, Stonebraker and Hamilton), and they actually devote a whole section (7.2 Memory Allocator) to memory contexts (aka allocators). The section explicitly mentions PostgreSQL as having a fairly sophisticated allocator, but sadly it's very short (only ~2 pages) and describes only the general ideas, without discussing the code and challenges - which is understandable, because there are many possible implementations. BTW the paper is very nice, I definitely recommend reading it.
But this blog is a good place to present details of the PostgreSQL memory contexts, including the issues you'll face when using them. If you're a seasoned PostgreSQL hacker, chances are you know all of this (feel free to point out any inaccuracies), but if you're just starting hacking PostgreSQL in C, this blog post might be useful for you.
Now, when I said there's not much documentation about memory contexts, I was lying a bit. There are plenty of comments in memutils.h and aset.c, explaining the internals quite well - but who reads code comments, right? Also, you can only read them once you realize how important memory contexts are (and find the appropriate files). Another issue is that the comments only explain "how it works" and not some of the consequences (like palloc overhead, for example).
Motivation
But why do we even need memory contexts? In C, you simply call malloc whenever you need to allocate memory on the heap, and when you're done with the memory, you call free. It's simple, and for short programs this is sufficient and manageable, but as the program gets more complex (passing allocated pieces between functions) it becomes really difficult to track all those little pieces of memory. Memory allocated at one place may be passed around and then freed in a completely different part of the code, far far away from the malloc that allocated it. If you free the pieces too early, the application will eventually see garbage. If you free them too late (or never), you get excessive memory usage (or memory leaks).
And PostgreSQL is quite complex code - consider for example how tuples flow through execution plans. The tuple is allocated at one place, gets passed through sorting, aggregations, various transformations etc. and is eventually sent to the client.
Memory contexts are a clever way to deal with this - instead of tracking each little piece of memory separately, each piece is registered somewhere (in a context), and then the whole context is released at once. All you have to do is choose the memory context, and call palloc/pfree instead of malloc/free.
In the simplest case palloc simply determines the "current" memory context (more on this later), allocates an appropriate piece of memory (by calling malloc), associates it with the current memory context (by storing a pointer in the memory context and some info in a "header" of the allocated piece) and returns it to the caller. Freeing the memory is done either through pfree (which reverses the palloc logic) or by freeing the whole memory context (you can see it as a pfree loop over all allocated pieces).
This offers multiple optimization options - for example reducing the number of malloc/free calls by keeping a cache of released pieces, etc.
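To make the basic scheme above a bit more tangible, here is a minimal sketch of a context that registers every piece and can release them one by one or all at once. This is a toy model, not PostgreSQL code: ToyContext, ToyChunk, toy_alloc, toy_free and toy_reset are names I made up for illustration, and the sketch does none of the caching mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: all names here are hypothetical, not PostgreSQL's. */
typedef struct ToyChunk
{
    struct ToyChunk *prev;      /* doubly linked, so unlinking is cheap */
    struct ToyChunk *next;
    struct ToyContext *owner;   /* the "header" remembers its context */
} ToyChunk;

typedef struct ToyContext
{
    ToyChunk   *chunks;         /* every live piece is registered here */
    int         nchunks;        /* number of live pieces */
} ToyContext;

/* Allocate a piece and register it in the context. */
void *
toy_alloc(ToyContext *cxt, size_t size)
{
    ToyChunk   *chunk = malloc(sizeof(ToyChunk) + size);

    if (chunk == NULL)
        return NULL;
    chunk->owner = cxt;
    chunk->prev = NULL;
    chunk->next = cxt->chunks;
    if (cxt->chunks != NULL)
        cxt->chunks->prev = chunk;
    cxt->chunks = chunk;
    cxt->nchunks++;
    /* user data starts right after the header (pointer-aligned only) */
    return chunk + 1;
}

/* Free one piece: step back to the header, unlink it, release it. */
void
toy_free(void *ptr)
{
    ToyChunk   *chunk = (ToyChunk *) ptr - 1;
    ToyContext *cxt = chunk->owner;

    assert(cxt != NULL);
    if (chunk->prev != NULL)
        chunk->prev->next = chunk->next;
    else
        cxt->chunks = chunk->next;
    if (chunk->next != NULL)
        chunk->next->prev = chunk->prev;
    cxt->nchunks--;
    free(chunk);
}

/* Release everything still registered: a toy_free loop over all pieces. */
void
toy_reset(ToyContext *cxt)
{
    while (cxt->chunks != NULL)
        toy_free(cxt->chunks + 1);
}
```

The point of the sketch is that the caller may forget about individual pieces entirely: a single toy_reset on the context cleans up whatever was not freed explicitly.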
The other thing is granularity and organization of memory contexts. We certainly don't want a single huge memory context, because that's almost exactly the same as having no contexts at all. So we know we need multiple contexts, but how many? Luckily, there's a quite natural way to split memory contexts, because all queries are evaluated through execution plans - a tree of operators (scans, joins, aggregations, ...).
Most executor nodes have their own memory context, released once that particular node completes. So for example when you have a join or aggregation, once this step finishes (and passes all the results to the downstream operator), it discards the context and frees the memory it allocated (and didn't free explicitly). Sometimes this is not perfectly accurate (e.g. some nodes create multiple separate memory contexts), but you get the idea.
The link to execution plans also gives us a hint on how to organize the memory contexts - the execution plan is a tree of nodes, and with memory contexts attached to nodes, it's natural to keep the memory contexts organized in a tree too.
That being said, it's worth mentioning that memory contexts are not used only when executing queries - pretty much everything in PostgreSQL is allocated within some memory context, including "global" structures like the various caches. That however does not contradict the tree-ish structure and per-node granularity.
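The tree organization described above can be sketched like this. Again, a toy model under my own assumptions: TreeCxt, tree_create, tree_delete and the tree_live counter are hypothetical names, and the only thing the sketch shows is that deleting a context takes its whole subtree with it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: hypothetical names, not PostgreSQL's structures. */
typedef struct TreeCxt
{
    struct TreeCxt *parent;
    struct TreeCxt *firstchild;  /* head of the list of children */
    struct TreeCxt *nextchild;   /* next sibling under the same parent */
    const char *name;
} TreeCxt;

int tree_live = 0;               /* number of contexts currently alive */

/* Create a context and hook it under its parent (NULL for the root). */
TreeCxt *
tree_create(TreeCxt *parent, const char *name)
{
    TreeCxt    *cxt = malloc(sizeof(TreeCxt));

    if (cxt == NULL)
        return NULL;
    cxt->parent = parent;
    cxt->firstchild = NULL;
    cxt->nextchild = NULL;
    cxt->name = name;
    if (parent != NULL)
    {
        cxt->nextchild = parent->firstchild;
        parent->firstchild = cxt;
    }
    tree_live++;
    return cxt;
}

/* Delete a context together with all of its descendants. */
void
tree_delete(TreeCxt *cxt)
{
    assert(cxt != NULL);

    /* each recursive call unlinks the child, so the loop terminates */
    while (cxt->firstchild != NULL)
        tree_delete(cxt->firstchild);

    /* unlink this context from its parent's list of children */
    if (cxt->parent != NULL)
    {
        TreeCxt   **link = &cxt->parent->firstchild;

        while (*link != cxt)
            link = &(*link)->nextchild;
        *link = cxt->nextchild;
    }
    free(cxt);
    tree_live--;
}
```

So if a join node owns a context with two scan contexts below it, one tree_delete on the join context gets rid of all three, without the join having to know what its children allocated.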
MemoryContext API
The first thing you should probably get familiar with is the MemoryContextMethods API, which provides generic infrastructure for various possible implementations. It more or less captures the ideas outlined above. The memory context itself is defined as a simple structure:
    typedef struct MemoryContextData
    {
        NodeTag     type;
        MemoryContextMethods *methods;
        MemoryContext parent;
        MemoryContext firstchild;
        MemoryContext nextchild;
        char       *name;
        bool        isReset;
    } MemoryContextData;
Which allows the tree structure of memory contexts (by parent and first/next child fields). The methods describe what "operations" are available for a context:
    typedef struct MemoryContextMethods
    {
        void       *(*alloc) (MemoryContext context, Size size);
        /* call this free_p in case someone #define's free() */
        void        (*free_p) (MemoryContext context, void *pointer);
        void       *(*realloc) (MemoryContext context, void *pointer, Size size);
        void        (*init) (MemoryContext context);
        void        (*reset) (MemoryContext context);
        void        (*delete_context) (MemoryContext context);
        Size        (*get_chunk_space) (MemoryContext context, void *pointer);
        bool        (*is_empty) (MemoryContext context);
        void        (*stats) (MemoryContext context, int level);
    #ifdef MEMORY_CONTEXT_CHECKING
        void        (*check) (MemoryContext context);
    #endif
    } MemoryContextMethods;
Which pretty much says that each memory context implementation provides methods to allocate, free and reallocate memory (alternatives to malloc, free and realloc) and also methods to manage the contexts (e.g. initialize a new context, destroy it etc.).
There are also several helper methods wrapping this API, forwarding the calls to the proper instance of MemoryContextMethods.
And when I mentioned palloc and pfree before - these are pretty much just additional wrappers on top of these helper methods. palloc grabs the current context and passes it into the method, while pfree looks up the context that owns the piece (recorded in its header) and does the same.
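The layering described in this section (a table of methods, generic helpers forwarding to it, and a palloc-like wrapper on top) can be sketched as follows. This is a toy with hypothetical names (DispCxt, DispMethods, disp_alloc, disp_free, disp_palloc, disp_current), and the only "implementation" plugged in simply forwards to malloc and free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: hypothetical names, not the PostgreSQL API. */
typedef struct DispCxt DispCxt;

/* The table of operations each implementation has to provide. */
typedef struct DispMethods
{
    void       *(*alloc) (DispCxt *cxt, size_t size);
    void        (*free_p) (DispCxt *cxt, void *ptr);
} DispMethods;

struct DispCxt
{
    const DispMethods *methods;  /* which implementation this context uses */
    int         nallocs;         /* live allocations, for illustration */
};

/* A trivial implementation that forwards to malloc/free. */
void *
plain_alloc(DispCxt *cxt, size_t size)
{
    void       *ptr = malloc(size);

    if (ptr != NULL)
        cxt->nallocs++;
    return ptr;
}

void
plain_free(DispCxt *cxt, void *ptr)
{
    cxt->nallocs--;
    free(ptr);
}

const DispMethods plain_methods = {plain_alloc, plain_free};

/* Generic helpers: forward to whatever methods the context carries. */
void *
disp_alloc(DispCxt *cxt, size_t size)
{
    assert(cxt != NULL);
    return cxt->methods->alloc(cxt, size);
}

void
disp_free(DispCxt *cxt, void *ptr)
{
    assert(cxt != NULL);
    cxt->methods->free_p(cxt, ptr);
}

/* palloc-like wrapper: grab the current context, call the helper. */
DispCxt    *disp_current = NULL;

void *
disp_palloc(size_t size)
{
    return disp_alloc(disp_current, size);
}
```

The sketch takes a shortcut on the free side: disp_free needs the context passed in explicitly, because this toy does not store any header in front of the allocated piece.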
Allocation Set (AllocSet) Allocator
Clearly, the MemoryContext API provides just the infrastructure, and was developed in anticipation of multiple allocators with different features. That however never happened, and so far there's a single memory context implementation - Allocation Set.
This often makes the discussion a bit confusing, because people mix the general concept of memory contexts and the (single) implementation available.
The Allocation Set implementation is quite sophisticated (aka complex). Let me quote the first comment in aset.c:
... it manages allocations in a block pool by itself, combining
many small allocations in a few bigger blocks. AllocSetFree() normally
doesn't free() memory really. It just add's the free'd area to some
list for later reuse by AllocSetAlloc(). All memory blocks are free()'d
at once on AllocSetReset(), which happens when the memory context gets
destroyed.
To explain this a bit - AllocSet allocates blocks of memory (multiples of 1kB), and then "splits" this memory into smaller chunks, to satisfy the actual palloc requests. When you free a chunk (by calling pfree), AllocSet can't immediately pass it to free because the memory was allocated as a part of a larger block. So it keeps the chunk for reuse (for similarly-sized palloc requests), which has the nice benefit of lowering the number of malloc calls (and generally malloc-related book-keeping).
This works perfectly as long as the palloc calls come in a mix of different request sizes, but once you break this, the results are pretty bad. Similarly, it's possible to construct requests that interact badly with the logic grouping requests into groups (making it easier to reuse the chunks), resulting in a lot of wasted memory.
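Here is a small sketch of how such grouping and reuse might work, and where the waste comes from. I'm assuming the groups are power-of-two size classes starting at 8 bytes, which is my simplification for the sake of the example. All names (bucket_class, bucket_chunk_size, bucket_alloc, bucket_free) are made up, each chunk is malloc'ed separately instead of being carved out of a block, and the caller has to pass the request size when freeing because the toy keeps no chunk header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy model only: hypothetical names and an assumed grouping rule. */
#define BUCKET_MIN      8       /* assumed smallest chunk size */
#define BUCKET_MAX      8192    /* larger requests are out of scope here */
#define BUCKET_NCLASSES 11      /* 8, 16, 32, ..., 8192 */

typedef struct BucketFree
{
    struct BucketFree *next;
} BucketFree;

BucketFree *bucket_freelist[BUCKET_NCLASSES];   /* one list per class */
int         bucket_malloc_calls = 0;

/* Index of the smallest power-of-two class that fits the request. */
int
bucket_class(size_t request)
{
    int         idx = 0;
    size_t      size = BUCKET_MIN;

    assert(request > 0 && request <= BUCKET_MAX);
    while (size < request)
    {
        size <<= 1;
        idx++;
    }
    return idx;
}

/* Actual chunk size used for a request (request rounded up). */
size_t
bucket_chunk_size(size_t request)
{
    return (size_t) BUCKET_MIN << bucket_class(request);
}

/* Reuse a freed chunk of the same class if there is one, else malloc. */
void *
bucket_alloc(size_t request)
{
    int         idx = bucket_class(request);
    BucketFree *chunk = bucket_freelist[idx];

    if (chunk != NULL)
    {
        bucket_freelist[idx] = chunk->next;
        return chunk;
    }
    bucket_malloc_calls++;
    return malloc(bucket_chunk_size(request));
}

/* "Free" a chunk by putting it on the list for its class. */
void
bucket_free(void *ptr, size_t request)
{
    int         idx = bucket_class(request);
    BucketFree *chunk = ptr;

    chunk->next = bucket_freelist[idx];
    bucket_freelist[idx] = chunk;
}
```

With this rule a 100-byte request gets a 128-byte chunk, and after freeing it a 120-byte request reuses the very same chunk without another malloc. On the other hand, a 1025-byte request occupies a 2048-byte chunk, so almost half of it is wasted, and a freed chunk is useless for requests that fall into a different class.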
There's another optimization for requests over 8kB, which are handled differently - the largest blocks (part of the block pool) are 8kB, and all requests exceeding this are allocated through malloc directly, and freed immediately using free.
The CurrentMemoryContext
Now, let's say you call palloc, which looks almost exactly the same as a malloc call:
    char *x = palloc(128);    // allocate 128B in the context
So how does it know which memory context to use? It's really simple - the memory context implementation defines a few global variables, tracking interesting memory contexts, and one of them is CurrentMemoryContext which means "we're currently allocating memory in this context."
Earlier I mentioned that each execution node has an associated context - the first thing the node may do is set the associated memory context as the current one. This however is a problem, because the child nodes may do the same, and the execution may be "interleaved" (the nodes are passing tuples in an iterative manner).
Thus what we usually see is this idiom:
    MemoryContext oldcontext = MemoryContextSwitchTo(nodecontext);
    char *x = palloc(128);
    char *y = palloc(256);
    MemoryContextSwitchTo(oldcontext);
which keeps the current memory context set to the original value.
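To see why saving and restoring the previous context matters with nested nodes, here is a toy sketch. The names (SwCxt, sw_current, sw_switch_to, sw_alloc, parent_node, child_node) are hypothetical, and sw_alloc merely counts allocations against the current context instead of allocating anything.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not the PostgreSQL API. */
typedef struct SwCxt
{
    const char *name;
    int         nallocs;        /* allocations charged to this context */
} SwCxt;

SwCxt      *sw_current = NULL;  /* stands in for the "current" context */

/* Make cxt the current context and hand back the previous one. */
SwCxt *
sw_switch_to(SwCxt *cxt)
{
    SwCxt      *old = sw_current;

    assert(cxt != NULL);
    sw_current = cxt;
    return old;
}

/* Stands in for palloc: charge one allocation to the current context. */
void
sw_alloc(void)
{
    assert(sw_current != NULL);
    sw_current->nallocs++;
}

/* A child node that works in its own context and then restores. */
void
child_node(SwCxt *child)
{
    SwCxt      *old = sw_switch_to(child);

    sw_alloc();
    sw_switch_to(old);
}

/* A parent node that calls the child in the middle of its own work. */
void
parent_node(SwCxt *parent, SwCxt *child)
{
    SwCxt      *old = sw_switch_to(parent);

    sw_alloc();
    child_node(child);
    sw_alloc();         /* charged to parent, because the child restored it */
    sw_switch_to(old);
}
```

If child_node skipped the final sw_switch_to(old), the second allocation in parent_node would silently end up in the child's context, which is exactly the kind of interleaving problem the idiom prevents.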
Summary
I tried to explain the motivation and basics of memory contexts, and hopefully direct you to the proper source files for more info.
The main points to remember are probably:
- Memory contexts group allocated pieces of memory, making it easier to manage lifecycle.
- Memory contexts are organized in a tree, roughly matching the execution plans.
- There's a generic infrastructure allowing different implementations, but nowadays there's a single implementation - Allocation Set.
- It attempts to minimize malloc calls/book-keeping, maximize memory reuse, and never really frees memory.
In the next post I'll look into the usual problems with palloc overhead.