A GDB Primer
Cengiz Akinli
This is a VERY quick and
dirty primer on using GDB to debug the basic C++ projects in CS2604 and similar
courses.
GDB is great because it lets
you see the innards of your program after it has crashed OR while it is still
running. You can stop the program at any point and even under specific
conditions only, and take a look at variable values. Then, you can walk it
forward line by line, watching variable values to see what it's doing, and even
run functions at any time, even though they're not called by your code.
I'm not even gonna try to
write an exhaustive howto even on just the commands we would use in our simple
projects. Instead, I'm just gonna give as concise a description as possible of
those commands, and how we're most likely to use them. The purpose of this
document is to give those of you who are an hour away from the deadline and
have no idea what's wrong with your program a way to get up and running with
GDB as fast as possible to solve what, with GDB at your disposal, could quickly
turn into a trivial problem. GDB is a very
robust debugging tool, and what I'm describing here is a tiny portion of what
it can do. The debugger in MS VS.net is
like a toy compared to GDB. Now, it
takes time to learn, but that's what any powerful system would require. Don't let MS convince you that ease-of-use
equals power. If that were true, we'd
all be learning to program in BASIC right now, wouldn't we?
If you've got time to delve
deeper, the online help is far more useful than what I've written here if
you have a working knowledge of the program, and it does a far better job than
I ever could. Problem is, if you don't have a working knowledge, the online
help may as well be Greek. Well, this document will hopefully give you that working
knowledge so that when you do have time, you can go learn more and actually get
something from the help.
Look at this document as
more of an application-oriented (the application being the debugging of
programs written for class) tutorial.
Alright, here we go.
Note about this document: Both shell commands and GDB commands include prompts
for clarity which you do not type. Shell commands are preceded with a $, and
GDB commands with a (gdb). Shell commands are for csh (I don't know bash well).
To run csh type:
$ csh
1. Rebuilding with debugging symbols
Before you can do any of
this, you have to have debugging symbols in your executable. This is done at compile time by adding the -g switch to the
gcc/g++ compile command. So,
$ g++ -c -o somefile.o somefile.cpp
becomes
$ g++ -c -g -o somefile.o somefile.cpp
Rebuild everything using
commands like that, then link it all together to make your executable as
normal.
2. Start GDB
You can use GDB in one of
two ways:
1. To examine the core file.
This is the innards of a dying program
dumped into a file for later analysis. Very handy. This is often referred to as
post-mortem debugging. To do it, type:
$ limit coredumpsize unlimited
This allows your dying program to write
a core file in case your default environment isn't already set up this way.
(If you stay in bash instead, the equivalent is ulimit -c unlimited.)
Then, run your program, and after it
dies (it should say that it dumped core), do
$ gdb myprog core
Where myprog is your
program's executable filename.
2. To examine a running program. This, in turn, can be accomplished in one of two ways:
a. Attaching to an
already running instance of your program.
This is good if your program HAS to read input from a file or do other stuff that's
not easy to do from within GDB.
To do this, figure out your program's process ID, then start GDB with,
$ gdb myprog
Then, within GDB, do
(gdb) attach PID
where PID is your program instance's process ID.
b. Running the program from within GDB.
Start up gdb with,
$ gdb myprog
Then, if you want it to stop at a particular location within the
program, in GDB, do
(gdb) break somefile.cpp:123
where somefile.cpp:123 is the source code
file and line number where you want GDB to stop the program. Note that it will stop the program just
BEFORE executing that line of code, not after.
Finally, run the program with,
(gdb) run
2.1 Debugging a crashing program
Ok, so now we're in GDB and examining either a running program or a core file. The first thing GDB is gonna tell you is the line number where the program is currently stopped (if you have a live program), or where it crashed, if you're examining a core file.
Unless it's a really simple program, the first thing you will want to do is get a backtrace of the execution stack. This is done with,
(gdb) backtrace
or
(gdb) bt
for short.
The output will look something like this example:
(gdb) bt
#0  0x40084ea8 in chunk_alloc (ar_ptr=0x40119d60, nb=16) at malloc.c:2875
#1  0x400845ce in __libc_malloc (bytes=7) at malloc.c:2696
#2  0x40089a29 in __strdup (s=0xbfffdc4f "(null)") at strdup.c:43
#3  0x804c706 in mycc_stat (ct=0xbffff874, icc_conn=0xbffffb18) at invoice.c:415
#4  0x80554c0 in batchclose (icc_conn=0xbffffb18) at bclose.c:158
#5  0x8054eba in close_batch (icc_conn=0xbffffb18) at bclose.c:33
#6  0x804a97e in main () at icc.c:96
Now, this is a stack. Remember what that means. It's a LIFO. It tells me that my main() function was at line 96 in file icc.c, executing function close_batch(), which got to line 33 in file bclose.c, where it executed function batchclose(), which got to line number 158 in that same file, where it executed function mycc_stat(), which got as far as line 415 in file invoice.c, where it executed function __strdup(), etc.
But __strdup() is not my function. The program continued down into the bowels of system code until the actual operation (a memory write, usually) triggered the signal. But the last function (counting from the bottom of the stack going up) that is in my code is mycc_stat(). I now know that my program died in function mycc_stat() at line 415 of file invoice.c. So that's where I'm looking first. I need to go to that stack frame and poke around at variable values and code and try to figure out what happened. I go to that stack frame with,
(gdb) frame 3
Now, I can start poking around. I can list source code with the list command or what's probably better is to just bring up the code in another window. Immediately after giving the frame command, GDB tells me what line number we're on, and lists that line of code:
#3  0x804c706 in mycc_stat (ct=0xbffff874, icc_conn=0xbffffb18) at invoice.c:415
415         ct->dbtype = strdup(ptr2);
Ok, so my first guess is usually that I've done something bad with a NULL pointer. It's not necessarily the MOST likely thing, but it's the easiest to find, so I check that out first. Either ct is NULL, and my attempting to dereference it died, or perhaps ptr2 is NULL, and strdup() choked on it. Here, I have my first clue. In C/C++, the RHS of an assignment statement is evaluated THEN assigned to the LHS. Well, remember, my program died INSIDE the strdup() call. So my guess is that ptr2 is either NULL, or points off into space somewhere that my program wasn't allowed to read from. So let's see which:
(gdb) print ptr2
$1 = 0xbfffdc4f "(null)"
Aw crap. Well, it isn't NULL. It points to 0xBFFFDC4F (that's the address, in hexadecimal, of the spot in the program's memory to which this guy points). Don't let the "(null)" confuse you. That's an actual string used by the program. If ptr2 had been NULL, I'd have gotten:
$1 = 0x0
But, I'm not that lucky. There's no obvious reason it should be crashing here. Now what?
Well, the news isn't all bad. There is a good place to go for help. The odds are surprisingly good that it's crashing here because at some other point, my program overflowed a buffer and overwrote some bit of data and/or executable instruction code. If it had tried to write to address 0x0 (NULL), then that would've elicited a segfault from the OS immediately. But instead, it may have written some place proper, and just written too much. The problem only cropped up here because we tried to access that data or code just now. But it would seem I have no way of knowing when that happened.
Or do I?
Well, there was a time when people in this situation actually WERE screwed. But now, we have a whole class of tools called malloc() debuggers. The malloc() library function is the basic call used to allocate memory. We used it for the memory manager whether we knew it or not (new and delete typically sit on top of malloc() and free()). It allocates memory, and returns the address (sound familiar?) of the start of the memory block. But unlike a self-contained memory manager, it leaves it to us to do the writing. It allocates however many bytes we ask for and gives us back the starting address, trusting us not to write more bytes than we requested. So what happens if we do? Well, funny things first, then death. But oftentimes, the death comes many dozens or hundreds of lines of code later, leaving us guessing where the misbehaving line of code actually was.
So malloc() debuggers help us out. Instead of giving us blocks of memory that sit right next to each other, which is what the system malloc() does, they replace the system malloc() with one that puts a protected page of memory right up against each block they allocate for us (ElectricFence, for instance, guards the end of each block by default and can be told to guard the front instead). This way, if we write past the end (overflow) or out the front end of a guarded block (underflow), we go into that protected page and elicit the segfault immediately, and presto!, the errant line of code can hide itself no more! So the debugger doesn't stop the program from crashing, but rather makes it crash when it should, rather than later on or not at all (which is even worse). This is great for us, as students. Because if our code does something screwy, we have the added risk that it MAY NOT crash at ALL with our own test data. So we feel confident and hand in a broken program. Instead, malloc() debuggers help us flush out these problems before the TAs do.
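To make the idea concrete, here's a toy sketch. It is NOT how ElectricFence does it (that uses protected memory pages, so the hardware catches the bad write the instant it happens). This is the simpler guard-byte variant of the same idea: pad each block with a known pattern on both sides, then check later whether the pattern got stomped. The names GuardedBlock, kGuardSize, kGuardByte, and overflowIsDetected are all made up for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical names, for illustration only.
const std::size_t kGuardSize = 8;        // guard bytes on each side of the user area
const unsigned char kGuardByte = 0xAB;   // pattern we expect to find untouched

class GuardedBlock {
public:
    explicit GuardedBlock(std::size_t userSize)
        : size_(userSize), storage_(userSize + 2 * kGuardSize, kGuardByte) {
        // Zero the user area; the guards on both sides keep the pattern.
        std::memset(data(), 0, size_);
    }

    // Start of the user area (what a malloc() replacement would hand back).
    unsigned char* data() { return &storage_[0] + kGuardSize; }

    bool frontIntact() const { return intact(0); }
    bool backIntact() const { return intact(kGuardSize + size_); }

    // Die right here if either guard was overwritten.
    void verify() const { assert(frontIntact() && backIntact()); }

private:
    bool intact(std::size_t start) const {
        for (std::size_t i = 0; i < kGuardSize; ++i) {
            if (storage_[start + i] != kGuardByte) return false;
        }
        return true;
    }

    std::size_t size_;
    std::vector<unsigned char> storage_;
};

// Classic bug, on purpose: the block has no room for the string's '\0'.
// The stray byte lands in the back guard, which we own, so this is safe to run.
bool overflowIsDetected() {
    const char* text = "hello";
    GuardedBlock block(std::strlen(text));                    // should be strlen + 1
    std::memcpy(block.data(), text, std::strlen(text) + 1);   // writes 6 bytes into 5
    return block.frontIntact() && !block.backIntact();
}
```

The catch with guard bytes is that you only find out when somebody checks them, which may still be long after the bad write. Guard pages make the write itself blow up, which is why the page-based tools are worth the trouble.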
Ok, so there's no shortage of these malloc() debuggers flying around the net. I like ElectricFence. You need to go get it, build it, and install it. You can install it in your own account on the lab machines, or in /usr/lib on your own machine. In your account, I would recommend making a directory called lib off your home directory and putting it there. That way in the future, you can put other libraries there for reuse, including your own code while developing future programs. Then, you need to link against this library. You need to add -lefence to the end of your g++ link command. If you're doing this in your lab account, you also need to add your ~/lib directory to your library search path with the -L option (if you're on your own machine, and you installed ElectricFence in one of your /lib, /usr/lib, or /usr/local/lib directories, you don't need to do this). So for me, my linking step changes from
$ g++ -o myprog somefile.o someotherfile.o main.o
to
$ g++ -L/home/ugrads/c/cakinli/lib -o myprog somefile.o someotherfile.o main.o -lefence
Note that I can't use the ~ abbreviation here: the path follows -L with no space in between, and the shell only expands ~ at the start of a word.
Alright, so now we have a program that should crash where it went wrong instead of much later.
But what if this doesn't do it for us? What if it's not crashing at all?
2.2 Debugging a misbehaving program
Well,
we know how to load the program into GDB and run it. We know about the break
command. Now we'll learn a few more to
track down those annoying data errors.
What we're gonna do is trace the program line-by-line, or
block-by-block, and examine variables at each step to figure out what's going
on. So we start with the break
command. Decide where we want the program
to stop and kick us out to the GDB command line. Do that now, then run the program. Maybe feed it input, do whatever, and whammo,
GDB stops the program where you told it to, and you're now staring at a (gdb)
prompt. What are some things you can do?
print
Print the value of an expression, be it a primitive type or a pointer to any type. In the case of a pointer, the pointer address is printed and, if possible, the value stored there.
e.g.
(gdb) print someVar
printf
Execute a complex print statement using a format string with conversion specifiers like the ANSI C printf() call.
e.g.
(gdb) printf "x = %d\ny = %d\nf = %5.2f", x, y, z
next
Execute the current line of code and move on to the next. If the line contains a function call, the whole call runs without stopping inside it.
step
Step into the current line of code, if a function call, and stop before the first line of execution within that function. Otherwise, it behaves like next.
continue
Continue execution until the next breakpoint or end of program is reached.
until
Continue execution until a line numbered higher than this one is reached. Typically, you do this at the end of a loop to run the rest of the loop, then stop again.
finish
Continue execution until the current stack frame (current function) returns.
break
Insert a breakpoint at a specified line in the current file or another specified file.
e.g.
(gdb) break 415
(gdb) break somefile.cpp:251
Note also that next, step, and continue take an optional numerical argument. For next and step, it says how many times in a row the command should be executed. For continue, it tells GDB to keep passing through the breakpoint you're currently stopped at and not stop there again until the Nth time it is reached. So let's say you're in a loop, and you've set a breakpoint at the fifth line of the loop, but you know that the line of data that's causing problems is the tenth piece of data. Stopped at the breakpoint on the second pass through the loop, you might do:
(gdb) continue 8
to let the program pass right through the breakpoint on the next 7 iterations and stop on the 8th (the tenth piece of data), where you can start looking at things line-by-line. The until and finish commands don't take a repeat count.
This is pretty handy, and you'll use it a lot. It won't always be the first piece of data that chokes your program. Sometimes, it'll be exactly the 513th iteration of a particular for loop that crashes, but only in the fourth call to the function. You can figure out how many times a particular breakpoint is crossed before you get to that point (info breakpoints shows how many times each one has been hit) and give the appropriate continue command to get there. Then you can keep rerunning the program from the beginning with a new run command until you figure out what's happening.
And
lastly, you only need to type enough of a command to distinguish it from the
others (sort of). So next, step,
continue, and until can all be abbreviated as n, s, c, and u respectively.
3. A few cool tricks
3.1 Function calls
Ok,
so you can use the GDB printf command to print several things out
formatted. But it's a GDB command that
just works like the printf() call. So
what if you could actually call printf()?
Well, you can. You can either use
the call command, or enter a
function as part of an expression whose value you want to print with either the
print or printf GDB commands:
(gdb) call printf("x = %d\ny = %d\nf = %5.2f", x, y, z)
(gdb) print myList->isEmpty()
(gdb) print myList->current
The first command is equivalent to the printf GDB command we used earlier except
that it calls the actual printf() library function to do the job instead. The second
command actually executes the isEmpty() function in myList. The third prints out a
private data member of myList. That's right: private. When running a C++ program
with GDB, you are basically God. You can do anything, call anything, and see
anything you want. You can even jump over lines of execution in a function (see
the jump command).
So then, could you, say, add a debugging function to a class, like a
linked-list class, that would print out the entire list, head to tail, but
would never be called by your program, just so that you could call it from
within GDB?
The
answer is yes. You could do that for ANY
class or complex data structure. The
print and printf GDB commands are only good for primitive types or more complex
objects that are small and can be referenced on a line or two of code. But now that you can call entire functions,
you can write these debugging functions into your code and call them from
within the debugger with any arguments you want.
This is an immensely powerful concept.
Think about that for a minute.
With
this, you can easily see any data structure in your program at any time without
having to type more than a single command from within GDB. You can really see the innards of your
program step-by-step.
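Here's a minimal sketch of what such a debugging function might look like. IntList, toString(), and dump() are names I made up for the example; your own list class will look different. One assumption worth flagging: member functions defined inside the class body are implicitly inline, and the compiler may not emit any code for an inline function that nothing calls, in which case GDB has nothing to call either. Defining dump() outside the class body, as below, is the safer bet.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical singly linked list of ints, for illustration only.
class IntList {
public:
    IntList() : head_(0), tail_(0) {}
    ~IntList();

    void append(int value);
    bool isEmpty() const { return head_ == 0; }

    std::string toString() const;  // builds "[1, 2, 3]"
    void dump() const;             // debugging aid; the program itself never calls it

private:
    struct Node {
        int value;
        Node* next;
    };
    Node* head_;
    Node* tail_;

    IntList(const IntList&);             // copying disabled to keep the sketch short
    IntList& operator=(const IntList&);
};

IntList::~IntList() {
    while (head_ != 0) {
        Node* next = head_->next;
        delete head_;
        head_ = next;
    }
}

void IntList::append(int value) {
    assert((head_ == 0) == (tail_ == 0));  // invariant: both null or both set
    Node* node = new Node;
    node->value = value;
    node->next = 0;
    if (tail_ != 0) {
        tail_->next = node;
    } else {
        head_ = node;
    }
    tail_ = node;
}

std::string IntList::toString() const {
    std::ostringstream out;
    out << "[";
    for (const Node* n = head_; n != 0; n = n->next) {
        out << n->value;
        if (n->next != 0) out << ", ";
    }
    out << "]";
    return out.str();
}

// Defined out-of-line so the compiler emits it even though nothing calls it.
// From GDB, with a variable myList in scope:
//   (gdb) call myList.dump()      if myList is an object
//   (gdb) call myList->dump()     if myList is a pointer
void IntList::dump() const {
    std::cerr << toString() << std::endl;
}
```

Writing to std::cerr with std::endl means the output shows up right away in the terminal instead of sitting in a buffer while you stare at the (gdb) prompt.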
3.2 Conditional execution, tracepoints, etc
That's
about it. The only other thing you need
to know is how to find out more. The
answer is the help command.
By
itself, you'll get a list of categories containing other commands. You can type help followed by a category name for a list of commands
in that category, or by a specific command for the help page for that
command. Here are some specific help
pages you may wanna check out:
You
can stop at a breakpoint only if a certain condition is met. See the condition command.
You
can set some GDB commands to be executed automatically at a breakpoint. See the commands command.
You
can set breakpoints that don't actually stop the program, but just do stuff
when they're reached. These are called
tracepoints. See the tracepoints help page.
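To give you a feel for the first two, here's a small sketch of a loop worth stopping in, with the GDB commands in the comments. The functions processRecord() and runningTotals() are invented for the example, and the breakpoint number 1 assumes it's the first breakpoint you set in the session. Build with -g and without optimization, or the compiler may inline processRecord() out from under you.

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-record work: totals[index] becomes the sum of records[0..index].
//
// Stop only on the record you care about:
//   (gdb) break processRecord if index == 512
//   (gdb) run
//
// Change your mind about the condition on breakpoint 1:
//   (gdb) condition 1 value < 0
//
// Or have breakpoint 1 print and keep going instead of stopping:
//   (gdb) commands 1
//   > silent
//   > printf "index=%d value=%d\n", index, value
//   > continue
//   > end
void processRecord(int index, int value, std::vector<long>& totals) {
    assert(index >= 0 && index < static_cast<int>(totals.size()));
    long previous = (index == 0) ? 0 : totals[index - 1];
    totals[index] = previous + value;
}

// Runs processRecord() once per record, in order.
std::vector<long> runningTotals(const std::vector<int>& records) {
    std::vector<long> totals(records.size(), 0);
    for (int i = 0; i < static_cast<int>(records.size()); ++i) {
        processRecord(i, records[i], totals);
    }
    return totals;
}
```

A condition can be any expression GDB can evaluate in that frame, so anything you could hand to print is fair game.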
Now
you know how to get started debugging your program with GDB.
I
hope you found this useful. If so,
kindly email Prof. McPherson and let him know that he should give me an 'A.'
-Cengiz