A collection of thoughts related to the challenges of software engineering

stay connected

Posts Tagged with ‘heap’

Let me tell you why you're here. You're here because you know something. What you know you can't explain, but you feel it. You've felt it your entire life, that there's something wrong with the world. You don't know what it is, but it's there, like a splinter in your mind, driving you mad. It is this feeling that has brought you to me. You want to know what is heap fragmentation.

And now for something completely different : Virtual Memory

Should you wish to know more about virtual memory, I advise you to read this Wikipedia article.

Virtual memory is an abstraction layer maintained by the Virtual Memory Manager (VMM), a component within the operating system kernel. It sits on top of physical memory. Every pointer in your program - running in an OS having virtual memory that is - refers to a virtual memory address. You don't address physical memory directly from an user mode program in a modern OS.

When your pointer says "0x400000", it's 0x400000 relative to the base of your virtual address space. It's totally unrelated to the actual physical memory address (only the VMM knows what is the corresponding physical address).

You don't have to allocate virtual memory directly, you use new or malloc.

It's nonetheless possible. On Windows this is done with the VirtualAlloc function that allows fine grained operation on the ranges you wish to allocate and the rights. You can choose to commit the memory or not.

Sounds useless? In most cases it is.

Here's one use case however: let's say you want to have a huge matrix in memory, but only plan on using few spots. You can allocate the whole matrix, but only commit the spots you need (you can even add a handler to the access violation routine to commit on the fly...). In doing so you're going to minimize the actual amount of memory used.

VirtualAlloc (or mmap in BSD) is however extremely slow. It's not Windows fault. Not only do you have to go through the kernel gate, but you're also requesting the system to fetch resources... The higher the load on your system, the slower it will be. Even when you have huge amounts of memory available, virtualalloc et al. are slow.

Fortunately, you generally don't allocate virtual memory directly. So what happens when you make a call to new or malloc?

You are using a heap!

The heap is a block allocator that reduces the amount of calls to the system. It will request a large block of memory and slice it to your needs. It will try to make your memory request fit in one of its available slots.

What happens when no slot is available? It asks the operating system for another block of virtual memory ,commits it and give you what you need.

What happens when no more virtual memory is available? A null pointer is returned and most of the time the program will cash in its chips unless it's been written to handle low memory conditions (hint : it hasn't been).

When you work with pointers, it's very important to keep your pointer valid (it's always pinned to use a .Net term). That means your heap allocator cannot move blocks around to optimize the space usage.

So you're randomly filling blocks of various sizes and cannot move them around. Here comes the fragmentation train, next stop is you.

Memory fragmentation results in increased memory usage as if you were leaking. It's pretty tough to narrow down the problem to fragmentation as you will generally waste of lot of time looking for a inexistent leak.

Differences between leaking and fragmenting

Fragmentation generally results in huge blocks of memory going away for no reason. Typically you'll suddenly lose 200 MiB of memory after allocating 20 bytes. First time it happens to you, believe me, you're going to get a bottle of rum and a cigar, and pray for Baron Samedi to help you on this case.

Memory leaking is usually triggered by a specific event (a call to the leaking code) and the memory increase is more linear and regular than with fragmentation.

The graph below sums up the differences:

Fragmentation vs Leaking

Please understand that the behavior of memory fragmentation is largely dependent on your heap allocation strategy. On this graph I exhibit a fragmentation where the allocator requests twice more memory than you're already using.

How to reduce heap fragmentation?

Use a better heap

The heap is nothing more than a chunck of code sitting somewhere in your C/C++ runtime. If you search the Internet a little bit, you will hear about lkmalloc, jemalloc, hoard, libumem, etc.

The classic strategy is to group allocations of similar sizes. If you have allocations close to, say, 4 kib, sitting one next to the other, you reduce the odds of fragmentation as it will be easier and faster to find room for new allocations.

Modern heaps are not only designed to fragment as little as possible but also to be efficient on multi CPU machines and to offer security and debugging features. So you might hit two birds with one stone: reduce fragmentation and get better performances!

FreeBSD comes with a very decent allocator by default (jemalloc since FreeBSD 7) and I see very few use cases where you would want to switch to another one.

In Windows, the default allocator is poor. For retrocompatiblity reasons, the more advanced memory allocator has to be explicitly requested. This allocator is called the "Low Fragmentation Heap" (LFH).

Activating the allocator consists in listing all existing heaps in your program and flipping the feature on for each one with the HeapSetInformation function.

Here is the code we use at Bureau 14 to switch all heaps to low fragmentation:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
void get_heaps(vector<handle> & heaps)
{
    HANDLE bogus[1];
    // measure
    DWORD c1 = ::GetProcessHeaps(0, bogus);
    // resize
    heaps.resize(c1);
    // get
    if (::GetProcessHeaps(c1, &heaps[0]) != c1)
    {
        // fail, we discard everything
        heaps.clear();
    }
}

long low_frag_heaps(void)
{

    // getting the heaps
    vector<handle> heaps;
    get_heaps(heaps);

    if (heaps.empty())
    {
        // no heap found => error, there must be at
        // least the process heap
        return 0;
    }

    typedef std::pair<handle, LONG> heap_type;
    typedef std::vector<heap_type> heaps_vector;

    heaps_vector heaps_info(heaps.size());

    struct compat_info
    {
        heap_type operator()(HANDLE h) const
        {
            LONG type = 0;
            SIZE_T returnedLength = 0;

            ::HeapQueryInformation(h,
                HeapCompatibilityInformation,
                &type,
                sizeof(type),
                &returnedLength);

            return heap_type(h, type);
        }
    };

    // we get the type of all heaps
    transform(heaps.begin(),
        heaps.end(),
        heaps_info.begin(),
        compat_info());

    // and we only work on the ones that are of type 0
    // (standard heap)on the left we have all heaps
    // that are low fragmentation or look-aside on
    // the right all heaps that are standard (and will
    // need to be set to lf)
    // note : I don't think look-asides are available
    // in user mode
    heaps_vector::iterator std_start = partition(
        heaps_info.begin(),
        heaps_info.end(),
        lambda::bind(&heap_type::second, lambda::_1) != 0);

    // we want to include the heaps that are already lf in
    // our count (type 2)
    heaps_vector::size_type lf_count = std::count_if(
        heaps_info.begin(),
        std_start,
        lambda::bind(&heap_type::second, lambda::_1) == 2);

    ULONG hf_val = 2;

    // we activate Low Fragmentation on all std heaps
    for_each(std_start,
        heaps_info.end(),
        lambda::var(lf_count) +=
        lambda::bind<bool>(&HeapSetInformation,
        lambda::bind(&heap_type::first, lambda::_1),
        HeapCompatibilityInformation,
        &hf_val,
        sizeof(hf_val)));

    // no need to close heaps handle
    // cast is safe as we won't have more than 2^32
    // heaps...
    return static_cast<long>(lf_count & 0xffffffff);
}

The function returns the count of heaps that are in LF mode. You only need to call this once per program once all the heaps are created. In other words, call it after all your libraries have been initialized.

Note that the Low Fragmentation Heap is not available in debug mode unless you set the global variable _NO_DEBUG_HEAP to 1 . As the variable's name suggest, doing so will strip the heap of all debugging features.

Write better programs

It's easy to blame others for your own deficiencies.

If you have a wide range of allocations of very different sizes, the above approach will only have a limited effect. It is my experience that the programmer has to do some effort to optimize allocations to get the best out of the operating system.

In this post I give you a precise example about how you can hugely improve memory usage in your programs without (too much) hassle.

Life is complicated...

A memory intensive program needs to be written with a broad view. You will need to have a good understanding of memory management, use the right allocator and make efficient use of it.

Heap fragmentation is one of the many enemies that will ambush you on the path to awesome programming.

I know you're reading this and wonder when I'm going to tell you how to cook rice well. Don't worry, I'll keep my word. The trick is to wash the rice. Wash the rice as long as the water is white. Be thorough. Make sure your hands are clean and use them to mix the rice with the water. Here comes the next important step to have a wonderfully cooked rice: take the washed rice, put it in the rice cooker and press on.

June 14th, 2009

C++ memory management

C++ is the world of do-it-yourself memory management. When discussing programming with someone, as I reach the point where I tell that I mainly work with C++ the first thing I hear is "Oh... But you have to deallocate memory yourself... That's so cumbersome and obsolete!"

Although I don't have to wrestle with news and deletes anymore, I find this remark pretty interesting. It's true that for most projects having to bother about memory management and other similar details is a waste of time. But for many projects, tight memory management is the difference between awesome software and bloated software.

Generally these projects imply programs that stay up and running for months, middleware used by developers on crack or video games that really, really, shouldn't go below 60 frames per seconds.

I lied

Actually, the reason why I find the comment about C++ memory management interesting is because it shows a lack of comprehension of the whole memory management thing.

Garbage collection is not a silver bullet against memory leaks, it's a different approach that works very well in many cases but horribly wrong in some others.

The garbage collector memory basically tells you :

You won't have to bother about calling delete (or free, or whatever you want to call it), I'll be able to see when it's appropriate. But please, don't do anything stupid because I'm just a program and I won't call delete if I cannot know for sure that the object must be destroyed. By the way if I find the one who left a half eaten sandwich in the fridge I'm going to beat him to death with linked lists.

Most of C++ programmers use reference counted memory management through the use of smart pointers. It's pretty safe and straightforward but it also has got some catches. You have two main things to keep in mind: you must be careful with circular references and you must make sure that you don't keep unnecessary copies of an object (singletons...).

We also hope to see improvement in a multithreaded context when smart pointer will start to use the new atomic semantics (aka lock free smart pointers). Currently, increasing and decreasing the reference counter can be a bottle neck when a lot of threads access the same variable.

In addition, it's not garbage collected memory. Memory isn't "abstracted out". When you create an object and encapsulate it in a smart pointer, you're making a call to new which in turns call the heap allocator of your operating system and that might result in a extremely expensive request to the kernel. That's the second most frequent performance penalty.

All things being equal, smart pointers make the usage of dynamic objects safe and easy. This is why they're great. No more memory leaks. Safe deletion. I love them and use them a lot.

Wait.

I lied again

Actually, I don't use them a lot. You know what's better than smart pointers ? Not using pointers.
The problem is that, if you're not careful, you can end up with stuff like this:

1
2
3
4
5
6
7
struct object
{
    shared_ptr<object1> _o1;
    shared_ptr<object2> _o2;
    shared_ptr<object3> _o3;
    shared_ptr<object4> _o4;
};

and of course you're going to do

1
shared_ptr<object> o = make_shared<o>();

That's a minimum of 5 allocations for you. Notwithstanding other allocations that sub-objects may do.

And when you have something like:

1
vector<shared_ptr<object>>

The heap allocator looks at you with scared eyes. He knows his face is going to be the landing zone of the baseball bat you have in the right hand.

That kind of pattern is acceptable in a garbage collected language such as Java or C#, especially because the garbage collector is going to reuse unused objects and issue less calls to the kernel.

In C++ we do care about the allocations count, because it's an expensive operation. I can already hear you say "premature optimization is evil!", "measure and then optimize!" and "don't assume anything!".

Sure. Whatever.

How can we improve this?

Don't use dynamic object unless you really need to. With references, in C++, you can avoid using pointers in many situations.

The above object becomes:

1
2
3
4
5
6
7
struct object
{
    object1 _o1;
    object2 _o2;
    object3 _o3;
    object4 _o4;
};

And if you need an accessor:

1
const object4 & get_o4() const { return _o4; }

That gives even more guarantees in terms of safety because the return value can never be a null pointer.

And maybe you can place it in the vector as in

1
vector<object>

Which is more straightforward.

Also don't forget that

1
2
3
4
struct object
{
    object1 & _o1;
}

Is perfectly legal, the only requisite being that o1 must be initialized in the constructor. Another alternative to dynamic allocation.

But I really need a dynamic object!

Of course you do. You're not going to put everything on the stack, are you? You may also want to make use of polymorphism. Last but not least, pointers have the advantage of taking only few bytes which is great when you want to swap data around.

But keeping their number to a minimum is a good way to reduce heap pressure and horrible heap fragmentation issues. Fragmentation is an issue whatever awesome heap manager you believe to use.

It occurs when you allocate and deallocate a lot of object of various sizes. It becomes difficult for the heap manager to find a suitable location for your new memory chunk, resulting in an increase of the heap size. When that happens, performances drop, memory usage raises and users cry.

This is extremely true when having large STL containers with a lot of dynamic objects. Not only the maintenance of your STL container is going to require dynamically allocated memory, but your objects too! You can easily find yourself doing thousands of small allocations when applying algorithms to your collections.

Something as innocent as:

1
2
3
4
5
6
7
8
vector<shared_ptr<object>> v1;
vector<shared_ptr<object>> v2;
vector<shared_ptr<object>> r;
set_intersection(v1.begin(),
    v1.end(),
    v2.begin(),
    v2.end(),
    back_inserter(r));

Can kill your heap for good (I know it lacks a functor to make something useful, but you get the idea).

But Edouard, you n00b! You're using a vector whereas in that case a deque would be more appropriate! lol!

Right. Deque will help. But not enough. You're going to make so many hidden news and deletes that the heap will agonize, crawl to you and beg for mercy.

You need a pool in your backyard

You're not pimping you C++ program enough if you don't put a pool in your backyard. A memory pool is a specialized allocator that creates and destroys objects of identical sizes.

As you can guess, it's pretty easy to fill a hole in your closet when it's designed to receive boxes of identical sizes. That's what a pool is all about. Not trying to solve the difficult general case, but solving well an easy special case.

A well used pool results in increased performances and reduced memory fragmentation.

Guess what, the boost libraries have pools. If you couple one with a shared pointer, you'll be ready to rain down meteors on your containers.

How to use safely and efficiently the boost pool

Let's say we have an object we want to allocate dynamically a lot:

1
2
3
4
5
6
7
8
9
struct rock_it
{
    rock_it(void) : _a(0), _b(1), _c(2), _d(true) {}

    int _a;
    int _b;
    char _c;
    bool _d;
};

I know, this object rocks. It's probably the most awesome object to be ever written in C++ or any other language.
In our example we will use the singleton pool which is safe in a multithreaded environment. You define the pool with a typedef and a tag:

1
2
3
4
5
6
struct a_pool_tag {};
typedef singleton_pool
<
    a_pool_tag,
    sizeof(rock_it)
> elite_allocation;

This pool will create as many blocks as needed. Each block will be large enough to hold a rock_it object. It's got the disadvantage of requiring explicit calls to free. You probably don't want that. Even if the pool can free all allocated objects at once when requested to, throughout the life of your program you probably want to keep in memory no more than what's needed.

You can do that in encapsulating your buffer into a smart pointer. You allocate the necessary memory with the pool and then you use a placement new on the allocated buffer to construct it. Then, you put that into a smart pointer and make sure it's going to be deallocated using the pool's semantics.

We'll design destruction first. Our smart pointer is going to need a bespoke destructor, the default one calls delete on the buffer and makes Mr. Access Violation pay you a visit.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
struct rock_it_deallocator
{
    void operator()(rock_it * yeah)
    {
        // delete *must* work with NULL
        if (yeah)
        {
            // must come from our allocator
            BOOST_ASSERT(elite_allocation::is_from(yeah));
            // explicit destruction
            yeah->~rock_it();
            // now we free the memory
            elite_allocation::free(yeah);
        }
    }
};

The assertion is pretty handy to catch you mixing custom deallocators. Thanks to the tag you specified in the pool definition, it's got the ability to recognize its own people. You want this error to be caught as early as possible in your code.

Other than that, he custom deallocator is all about making an explicit destructor call and then freeing memory. If you did the opposite the result would be undefined.

Destruction after deallocation would be like trying to carefully dismantle a building after a nuclear attack. I did that once. Even if you're careful with the radiations you spend your time figuring out if the pile of sand you have in the hand was the door or the roof.

Now time for the full fledged construction:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
rock_it_ptr make_rock_it()
{
    // let's get a block of memory
    void * p = elite_allocation::malloc();
    // boost allocator doesn't trow, you might want to
    // throw when you cannot allocate
    if (!p)
        throw bad_alloc();

    // we construct on the allocated block,
    // we don't use the
    // strandard new allocator
    return rock_it_ptr(new(p)rock_it(),
        rock_it_deallocator());
}

This weird new call is the infamous placement new. It's a nice C++ feature that enables you to build objects on already allocated memory. Also note the pointer validation after the call to the pool, you might want to have a difference behavior (such as returning a null pointer instead of throwing an exception).

That's all there is to it. You can now pass around your rock_it_ptr without having to bother how they were allocated and how they should be destroyed. The pool allocator will grow as necessary and you can be confident that freed objects will make room for new objects in a blink of eye, without fragmenting memory. Your memory management strategy is carefully delimited into the make_rock_it function. No one needs to know how you created the object to use correctly.

That's what C++ and generic programming is all about: solving problems once.

Conclusions

  • Don't use dynamically allocated objects unless you really need to. References help.
  • If you know you're going to need a flurry of identical objects, use a pool. Combine the pool with reference counting for optimal memory management.
  • Encapsulate your memory management. This decreases maintenance costs and raises reliability.