TLDR version: Your program is not going to walk the heap and free blocks when it exits. There is one primary exception to this rule: a leak detector, which walks the heap to list any blocks that are still allocated. (Even then I doubt they actually free them.)
It's better to say "don't worry about memory your program allocates wasting space after it exits, the OS cleans it up" than to talk about your program walking the list of allocated blocks when it exits and freeing each of them. The former is both a simpler explanation and actually correct (at least to the extent you can say anything regarding such behavior in a C program).
PM 2Ring wrote:Notice that I put the word "system" in quotes. I wasn't (necessarily) talking about the OS, I was talking about the invisible stuff supplied by the compiler / linker (generally in a _main() function) that handles such things as opening stdin/stdout/stderr, and the automatic freeing of allocated memory and closing of open files upon normal program termination.
Again, these things are not generally handled by your program. Why? Because the OS will take care of it. Especially in the case of memory, why have your program do a bunch of extra work for no reason? How your program runs its heap is up to your program (do you even have a heap?), and the OS is going to do exactly the same thing whether or not your program walks over all malloc()'d blocks and frees them, because it doesn't care what the contents of your process's pages are. So why would your program do a bunch of extra work it doesn't need to?
You can see this effect in action.
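#include <stdlib.h>

#define NUM_BLOCKS 10000000

/* Global, so the 10 million pointers don't live on the stack. */
int *blocks[NUM_BLOCKS];

int main(void)
{
    int i;

    /* Allocate lots of tiny blocks... */
    for (i = 0; i < NUM_BLOCKS; ++i) {
        blocks[i] = malloc(10);
    }

    /* ...then free every one of them (this is the loop that gets commented out later). */
    for (i = 0; i < NUM_BLOCKS; ++i) {
        free(blocks[i]);
    }

    return 0;
}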
If I run this on my system (amd64 Linux, compiled with GCC 4.3.2 using gcc -O2 leak.c), it takes about 0.63 seconds to execute:
~/delete : time ./a.out
./a.out 0.48s user 0.15s system 100% cpu 0.631 total
~/delete : time ./a.out
./a.out 0.49s user 0.14s system 99% cpu 0.630 total
~/delete : time ./a.out
./a.out 0.49s user 0.14s system 99% cpu 0.628 total
If I comment out just the call to free, it takes more like 0.46 seconds to execute:
~/delete : time ./a.out
./a.out 0.29s user 0.17s system 99% cpu 0.459 total
~/delete : time ./a.out
./a.out 0.33s user 0.13s system 99% cpu 0.458 total
~/delete : time ./a.out
./a.out 0.31s user 0.18s system 100% cpu 0.491 total
What can we conclude from this? If the runtime is walking the heap, it is doing it in a far more efficient way than anything you have access to as a programmer (i.e. free). It's not completely impossible (e.g. they could provide a different version of free that doesn't do coalescing), but when you put all this information together, Occam's Razor says that the better explanation is that it's not doing it at all.
Edit:
Actually, we can make this even more precise by adding code to measure the timing of malloc and free. I put a call to gettimeofday before and after each loop, then printed out the times for those loops. (As a side note, every time I read the manpage for gettimeofday I feel like the next manpage I read is going to just be the lyrics of Never Gonna Give You Up. There are two parameters, but you are required to pass NULL for the second? It returns a value which is guaranteed to be 0? Whose idea of "good API design" is this?)
Now we get this (just one run of each, since the times are consistent across three runs; this is the median):
With free:
malloc took 0 seconds and 428884 microseconds
free took 0 seconds and 170775 microseconds
./a.out 0.47s user 0.15s system 99% cpu 0.622 total
In other words, about 0.600 seconds (0.428884 + 0.170775 = 0.599659) were spent within main() proper, which leaves about 0.022 seconds outside of it.
Without free:
malloc took 0 seconds and 431134 microseconds
free took 0 seconds and 0 microseconds
./a.out 0.31s user 0.14s system 100% cpu 0.454 total
This leaves 0.023 seconds outside of my main(). That's 0.001 seconds more than the run with free().
In other words, if the runtime is walking over the heap explicitly freeing all still-allocated blocks, it's doing it two orders of magnitude faster than my free loop. Yeah right.
(BTW, I also tried counting down from NUM_BLOCKS-1 to 0, freeing in that order. Came out exactly the same.)