Some of these tools are not yet available in any cut version of Dervish --- but they are all checked into cvs; they should all be in version 6.8.
The hardest class of bug to find in C programs is probably memory problems, either leaks or corruption. Fortunately, dervish has a number of tools available to help you in your task.
In the following, the term heap will sometimes appear; it's the name of all the memory under shMalloc's control; that is, the memory that has been handed out by shMalloc (and maybe shFreed again). You should also know that every piece of memory that dervish hands to the user, whether via shMalloc or shRealloc, has a unique serial number (the same address may be used several times).
Dervish will always:
If you define CHECK_LEAKS to the C pre-processor, you will also be given:
The first class of problems is either a programming error (e.g. calling shFree on a variable that you declared as an array), or else the block really is valid but the heap is corrupted; in the latter case, consult the section on heap corruption.
You can diagnose the second class of problems using the tools discussed under memory callbacks.
The Tcl command memBlocksPrintRange prints out all the blocks of memory that have been allocated but not freed. Because you almost certainly have some blocks that are never going to be freed (e.g. stuff allocated in startup code), it only tells you about blocks in a specified range of memory serial numbers. The photo group uses the following proc as a convenient wrapper:

    proc mortal {args} {
       global startMem
       if {$args == "set"} {
          set startMem [memSerialNumber]; return
       }
       if {![info exists startMem]} {
          set startMem 0
       }
       memBlocksPrintRange [expr $startMem+1] [memSerialNumber] $args
    }

Used as:

    # allocate lots of stuff for startup
    mortal set
    # do lots of work, cleaning up carefully
    if {[mortal] != ""} {
       error "Found a memory leak"
    }

The output looks like:

    0x10d7f1c0 {18 2 64 region.c 875}
    0x10d58a50 {17 44 64 region.c 349}
    0x10d33260 {16 3 64 region.c 340}
    0x10d18fb0 {15 4 64 region.c 934}
    0x10d674d0 {14 72 128 region.c 331 h0 REGION}

where the first column (0x10d7f1c0) is the address that should have been freed, the second (18) is the serial number, and the next two (2 64) are the number of bytes allocated and the size of the internal block that dervish used to satisfy the request for memory. The next two fields (region.c 875) are the file and line number where shMalloc was called.
In the case of block 14, there are two additional fields (h0 REGION), which tell you that the block, of type REGION, is bound to handle h0.
The command memCheck checks the entire heap for corruption; with the option -abort it'll call shFatal if it finds any. It is a good idea to call this at about the time that your tcl framework calls memBlocksPrintRange to check for memory leaks. It is very helpful to track down memory corruption as soon as it's introduced into your program, even before it starts leading to symptoms.
If you compile with -DCHECK_LEAKS, every call to shMalloc, shRealloc, and shFree carries the file and line number where the call is made. This is used in error messages when dervish detects problems, as well as in the output from memBlocksPrintRange.
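One common way to get file and line information into every allocation call is a pre-processor macro that forwards __FILE__ and __LINE__ to the real allocator. The following is a minimal sketch of that general technique, not dervish's actual source; loc_malloc, loc_malloc_at, loc_last and the TOY_CHECK_LEAKS switch are all hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical switch; normally it would be supplied on the compiler
 * command line (-DTOY_CHECK_LEAKS). Defined here to keep the sketch
 * self-contained.
 */
#define TOY_CHECK_LEAKS 1

/* Where the most recent allocation was requested from */
typedef struct {
    const char *file;   /* source file of the caller, or NULL if unknown */
    int line;           /* line number of the caller, or 0 if unknown */
} loc_record;

loc_record loc_last = { NULL, 0 };

/* The "real" allocator: remembers the caller's location, then allocates */
void *loc_malloc_at(size_t nbytes, const char *file, int line)
{
    assert(file != NULL || line == 0);  /* location is all known or all absent */
    loc_last.file = file;
    loc_last.line = line;
    return malloc(nbytes);
}

/* Callers always write loc_malloc(n); the macro decides what gets passed */
#if defined(TOY_CHECK_LEAKS)
#  define loc_malloc(n) loc_malloc_at((n), __FILE__, __LINE__)
#else
#  define loc_malloc(n) loc_malloc_at((n), NULL, 0)
#endif
```

Because the macro is expanded at the call site, the recorded location is that of the caller rather than of the allocator itself, which is what makes a report such as "region.c 875" possible.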
Let's first consider finding a memory leak. Suppose the output of memBlocksPrintRange indicates that the block with serial number 50258 is never freed. Add a function like this to your main program:

    static void
    malloc_trace(unsigned long thresh, const SH_MEMORY *mem)
    {
       printf("Allocated block %lu\n", thresh);
    }

add the line

    shMemSerialCB(50258, malloc_trace);

recompile, and when block 50258 is allocated, a message is printed.
This may not seem very helpful, but when used in conjunction with a debugger things look up. Set a breakpoint in malloc_trace, and the program will stop when your block is allocated, which is usually enough to diagnose the problem.
Once you've decided to use a debugger, the whole procedure can be streamlined. Rather than adding the line

    shMemSerialCB(50258, malloc_trace);

only when block 50258 catches your fancy, leave the line

    shMemSerialCB(0, malloc_trace);

in permanently. Then use the debugger to set the variable shMalloc::g_Serial_threshold to 50258, and proceed as before (that's what gdb likes to call it; with e.g. dbx your mileage may vary).
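To see why registering the callback with a threshold of 0 and later changing the threshold from the debugger works, here is a toy model of a serial-number callback. It is only a sketch under stated assumptions: cb_register, cb_allocate, cb_trace and the cb_* variables are hypothetical names, the callback takes only the threshold (the real one also receives a const SH_MEMORY *), and the assumption that the callback fires when the serial number equals the threshold is mine rather than something taken from the dervish source.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*cb_func)(unsigned long thresh);

/* Globals, so a debugger can change the threshold while the program is stopped */
unsigned long cb_threshold = 0;   /* 0 never matches: serial numbers start at 1 */
unsigned long cb_serial = 0;      /* serial number of the latest allocation */
unsigned long cb_fired_at = 0;    /* set by cb_trace; 0 means "not yet" */
static cb_func cb_callback = NULL;

/* Register a callback and the serial number at which it should fire */
void cb_register(unsigned long thresh, cb_func func)
{
    assert(func != NULL);
    cb_threshold = thresh;
    cb_callback = func;
}

/* Stand-in for an allocation: bump the serial number, test the threshold */
unsigned long cb_allocate(void)
{
    ++cb_serial;
    if (cb_callback != NULL && cb_serial == cb_threshold) {
        cb_callback(cb_threshold);
    }
    return cb_serial;
}

/* The function you would set a breakpoint in */
void cb_trace(unsigned long thresh)
{
    cb_fired_at = thresh;
}
```

In this model, cb_register(0, cb_trace) costs one comparison per allocation and never fires; assigning 5 to cb_threshold (which is what the debugger does on your behalf) makes cb_trace run on the fifth allocation without recompiling.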
If your problem is a doubly-freed pointer, you need to define malloc_free_trace, call

    shMemSerialCB(0, malloc_free_trace);

set a breakpoint in malloc_free_trace, and set shMalloc::g_Serial_free_threshold.
You can call p_shMemCheck to check the heap. I usually run it from a memory callback like that described in the previous section:

    /*
     * This callback can be used to check the heap for corruption at any
     * desired granularity (set by the variable frequency)
     */
    static void
    malloc_check(unsigned long thresh, const SH_MEMORY *mem)
    {
       static int abort_on_error = 1;     /* abort on first error? */
       static int check_allocated = 1;    /* check allocated blocks? */
       static int check_free = 1;         /* check free blocks? */
       static int frequency = 10;         /* frequency of checks */

       shAssert(mem != NULL);             /* use it for something */

       if(frequency > 0) {
          p_shMemCheck(check_allocated, check_free, abort_on_error);
          shMemSerialCB(thresh + frequency, malloc_check);
       }
    }

Follow this with a call to

    shMemSerialCB(0, malloc_check);

and set the variable shMalloc::g_Serial_threshold to whatever value you want to start checking the heap at (set it to 1 to start at the beginning of your program). Using malloc_check will slow things down, so I usually increase the starting threshold as I localise the problem.
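The key idea in malloc_check is that the callback re-arms itself, so a single registration turns into a check every frequency allocations. The toy model below sketches that pattern on a fixed-size pretend heap with one guard byte per block. Everything in it (chk_heap, chk_alloc, chk_callback, the guard-byte scheme) is hypothetical and chosen for illustration; it makes no claim about how p_shMemCheck detects corruption.

```c
#include <assert.h>
#include <stddef.h>

#define CHK_MAX_BLOCKS 64
#define CHK_GUARD 0xA5

/* Toy block: a small payload followed by a guard byte */
typedef struct {
    unsigned char payload[8];
    unsigned char guard;      /* must stay equal to CHK_GUARD */
    int in_use;
} chk_block;

chk_block chk_heap[CHK_MAX_BLOCKS];
unsigned long chk_serial = 0;        /* serial number of the latest block */
unsigned long chk_threshold = 0;     /* 0 disables checking */
unsigned long chk_frequency = 10;    /* allocations between checks */
long chk_first_bad_serial = -1;      /* serial at which damage was first seen */

/* Count in-use blocks whose guard byte has been overwritten */
int chk_check_heap(void)
{
    int bad = 0;
    for (size_t i = 0; i < CHK_MAX_BLOCKS; i++) {
        if (chk_heap[i].in_use && chk_heap[i].guard != CHK_GUARD) {
            bad++;
        }
    }
    return bad;
}

/* Check the heap, then re-arm for chk_frequency allocations later */
void chk_callback(unsigned long thresh)
{
    assert(chk_frequency > 0);
    if (chk_check_heap() > 0 && chk_first_bad_serial < 0) {
        chk_first_bad_serial = (long)chk_serial;
    }
    chk_threshold = thresh + chk_frequency;
}

/* Hand out the next block; fire the callback when the threshold is reached */
unsigned char *chk_alloc(void)
{
    if (chk_serial >= CHK_MAX_BLOCKS) {
        return NULL;
    }
    chk_block *b = &chk_heap[chk_serial++];
    b->in_use = 1;
    b->guard = CHK_GUARD;
    if (chk_threshold != 0 && chk_serial == chk_threshold) {
        chk_callback(chk_threshold);
    }
    return b->payload;
}
```

With the threshold started at 1 and a frequency of 10, checks run at serial numbers 1, 11, 21 and so on, so damage done after block 13 is first noticed at block 21. Lowering the frequency, or raising the starting threshold to just before the last clean check, narrows that window, which is the localisation strategy described above.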
This callback should allocate n bytes and return them, or not return at all. It's called when dervish has failed to allocate the desired memory, so simply calling shMalloc (or malloc) is unlikely to work; you'll have to free something first. Set by shMemEmptyCB.
The argument mem is the offending block, and thresh is the current value of p_Serial_threshold. You needn't do anything in your callback function; simply returning will probably not lead to any trouble, but you should fix the underlying problem immediately. Set by shMemInconsistencyCB.

    b file.c:123 if obj1->id == 123

Some data types, however, have no such luxury, but all is not lost as you can use their memory serial number; you can find this by saying (in gdb)

    p ((SH_MEMORY*)obj1 - 1)->serial_number

after which the preceding breakpoint could have been set as

    b file.c:123 if ((SH_MEMORY*)obj1 - 1)->serial_number == 12695
If I wanted to watch when that object was created, I could have registered a callback for memory block 12695.
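The gdb expressions above step back by one SH_MEMORY from the user's pointer, which implies that the bookkeeping header sits immediately in front of the memory handed to the caller. The sketch below shows that general layout with a toy allocator. It is an assumption-laden illustration, not dervish's implementation: hdr_header, hdr_malloc, hdr_serial and hdr_free are hypothetical names, the header contents are invented, and max_align_t needs a C11 compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bookkeeping header; NOT the real SH_MEMORY layout */
typedef union hdr_header {
    struct {
        unsigned long serial_number;  /* unique for every allocation */
        size_t nbytes;                /* size the caller asked for */
    } info;
    max_align_t force_align;          /* keeps the user memory suitably aligned */
} hdr_header;

static unsigned long hdr_last_serial = 0;

/* Allocate header + payload, and return a pointer to the payload */
void *hdr_malloc(size_t nbytes)
{
    hdr_header *h = malloc(sizeof(hdr_header) + nbytes);
    if (h == NULL) {
        return NULL;
    }
    h->info.serial_number = ++hdr_last_serial;
    h->info.nbytes = nbytes;
    return h + 1;
}

/* Step back one header from the user pointer to find the serial number */
unsigned long hdr_serial(const void *ptr)
{
    assert(ptr != NULL);
    return ((const hdr_header *)ptr - 1)->info.serial_number;
}

/* Free the whole block, starting from the header */
void hdr_free(void *ptr)
{
    if (ptr != NULL) {
        free((hdr_header *)ptr - 1);
    }
}
```

In this model hdr_serial(p) performs the same pointer arithmetic as the gdb expression, and a block that is freed and reallocated gets a fresh serial number even if the address happens to be reused.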