The "IT DOESN'T WORK!" thread


EvanED
Posts: 4331
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI

Postby EvanED » Mon Jun 11, 2007 10:14 pm UTC

...and this is why I disagree with Linus for saying that writing kernel code in C++ is a "bloody stupid idea".

Alternately, it shows you why you should run Lint. ;-)

iw
Posts: 150
Joined: Tue Jan 30, 2007 3:58 am UTC

Postby iw » Tue Jun 12, 2007 1:26 am UTC

EvanED wrote:...and this is why I disagree with Linus for saying that writing kernel code in C++ is a "bloody stupid idea".

Er, to be fair, the same problem exists in C++. Also, C++ is complete hell to debug.

I agree with you on Lint, though.

ZoFreX
Posts: 70
Joined: Thu May 10, 2007 11:23 pm UTC
Location: Bristol, UK

Postby ZoFreX » Tue Jun 12, 2007 2:33 am UTC

Rysto wrote:The problem? I forgot to add in breaks for every case in the switch statement. :oops:

If I had a pound for every time I'd done that...

Fixed pircbot, and suggested a patch for it :)

Edit:
iw wrote:the same problem exists in C++

How is it a problem? It's an intentional design decision that increases the power of the switch/case statement, one that I've used on a few occasions...

EvanED
Posts: 4331
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI

Postby EvanED » Tue Jun 12, 2007 2:59 am UTC

iw wrote:
EvanED wrote:...and this is why I disagree with Linus for saying that writing kernel code in C++ is a "bloody stupid idea".

Er, to be fair, the same problem exists in C++.


In C++ he could have gone with the first approach (returning from each case instead of breaking to the unlock) because RAII is possible.

There's an argument to be made for single-entry, single-exit, which would mean that this approach would be frowned upon anyway, but that's a somewhat separate issue.
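
A minimal sketch of the RAII point above, not the kernel code under discussion. The names `g_lock`, `g_counter`, `handle_request` and `lock_is_free` are hypothetical, and a plain `std::mutex` stands in for whatever lock the original code held. The guard's destructor releases the lock on every return path, so each case can return directly instead of breaking to a shared unlock.

```cpp
#include <mutex>

// Hypothetical lock and counter, used only for illustration.
static std::mutex g_lock;
static int g_counter = 0;

// Each case returns directly; std::lock_guard unlocks g_lock in its
// destructor no matter which return statement is taken.
int handle_request(int kind) {
    std::lock_guard<std::mutex> guard(g_lock);
    switch (kind) {
    case 0:
        return ++g_counter;
    case 1:
        return g_counter += 10;
    default:
        return -1;
    }
}

// Returns true if the lock is currently free, i.e. the guard released it.
bool lock_is_free() {
    if (g_lock.try_lock()) {
        g_lock.unlock();
        return true;
    }
    return false;
}
```

Calling `lock_is_free()` after any `handle_request` call is one way to convince yourself that no return path leaks the lock.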

Also, C++ is complete hell to debug.


Eh, I disagree. Not more or less than C anyway.

ZoFreX wrote:
iw wrote:the same problem exists in C++

How is it a problem? It's an intentional design decision that increases the power of the switch/case statement, one that I've used on a few occasions...


Because it's too easy to make that mistake, and (if you ignore empty case blocks) fallthrough is probably an error 98% of the time.

The 'correct' approach IMO is to take the C# method. It is a compiler error to have:

Code: Select all

 case blah1:
     statement;
 case blah2:


However, you can implicitly fall through if there is no statement:

Code: Select all

 case blah1:
 case blah2:
     statement;


And explicitly fall through if there is a statement:

Code: Select all

 case blah1:
    statement;
    continue;
 case blah2:
     statement;


It more or less eliminates the potential for the bug without requiring an external tool, and adding only a minimal additional effort for the edge case where fallthrough is what you want.
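
For comparison, C++ itself later gained an explicit marker for this: the C++17 `[[fallthrough]]` attribute, which compilers such as GCC and Clang can pair with an implicit-fallthrough warning. The sketch below is only an illustration of that idea; the `Color` enum and the `describe` function are hypothetical names, not code from this thread.

```cpp
enum Color { BLACK, RED, BLUE, FIRETRUCK };

// Returns a bitmask recording which steps ran (hypothetical flags).
int describe(Color c) {
    int steps = 0;
    switch (c) {
    case BLACK:
        steps |= 1;          // extra work only for BLACK
        [[fallthrough]];     // explicit: carry on into the shared code
    case RED:                // empty cases may still stack silently
    case BLUE:
        steps |= 2;          // shared work
        break;
    case FIRETRUCK:
        steps |= 4;
        break;
    }
    return steps;
}
```

Unlike the compile-error approach described above, the attribute only silences a warning, so the check is still opt-in.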

iw
Posts: 150
Joined: Tue Jan 30, 2007 3:58 am UTC

Postby iw » Tue Jun 12, 2007 3:50 am UTC

EvanED wrote:
Also, C++ is complete hell to debug.

Eh, I disagree. Not more or less than C anyway.

I was referring specifically to using gdb to debug things. You may still disagree with me, but dealing with mangled functions and vtables is not a good time. Of course, there's the possibility that you use another tool to debug C++.

iw
Posts: 150
Joined: Tue Jan 30, 2007 3:58 am UTC

Postby iw » Tue Jun 12, 2007 3:54 am UTC

Separate post for separate topic:
EvanED wrote:The 'correct' approach IMO is to take the C# method. It is a compiler error to have:

Code: Select all

 case blah1:
     statement;
 case blah2:

But what if you want to do:

Code: Select all

case "BLACK":
  flags |= BLACK;
case "RED":
case "BLUE":
  doColor(color);
  break;
case "FIRETRUCK":
...

or something where you want to do the same thing for multiple cases, but some require some extra stuff to be done?

EvanED
Posts: 4331
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI

Postby EvanED » Tue Jun 12, 2007 3:55 am UTC

iw wrote:I was referring specifically to using gdb to debug things. You may still disagree with me, but dealing with mangled functions and vtables is not a good time. Of course, there's the possibility that you use another tool to debug C++.


GDB demangles names. I debug C++ code with GDB on a fairly regular basis.

When doesn't it? If you don't have debugging information and you're going off of raw symbols or something?

EvanED
Posts: 4331
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI

Postby EvanED » Tue Jun 12, 2007 3:57 am UTC

iw wrote:Separate post for separate topic:
EvanED wrote:The 'correct' approach IMO is to take the C# method. It is a compiler error to have:

Code: Select all

 case blah1:
     statement;
 case blah2:

But what if you want to do:

Code: Select all

case "BLACK":
  flags |= BLACK;
case "RED":
case "BLUE":
  doColor(color);
  break;
case "FIRETRUCK":
...

or something where you want to do the same thing for multiple cases, but some require some extra stuff to be done?


Then you write this:

Code: Select all

case "BLACK":
  flags |= BLACK;
  continue;         <== I would bold this line if I could
case "RED":
case "BLUE":
  doColor(color);
  break;
case "FIRETRUCK":
...


Java does it without the option of continue (I think...), which isn't an improvement.

iw
Posts: 150
Joined: Tue Jan 30, 2007 3:58 am UTC

Postby iw » Tue Jun 12, 2007 7:56 am UTC

EvanED wrote:...continue...


I'm having an illiterate day, apparently...

iw
Posts: 150
Joined: Tue Jan 30, 2007 3:58 am UTC

Postby iw » Tue Jun 12, 2007 8:03 am UTC

EvanED wrote:GDB demangles names. I debug C++ code with GDB on a fairly regular basis.

When doesn't it? If you don't have debugging information and you're going off of raw symbols or something?

I haven't done any C++ coding since 2004 or so, but all I remember is 1) someone claiming this was the case and 2) having to deal with a lot of pain regarding debugging and templates. Looks like I picked up some fallacies somewheres.

Rysto
Posts: 1460
Joined: Wed Mar 21, 2007 4:07 am UTC

Postby Rysto » Tue Jun 12, 2007 1:48 pm UTC

Yeah, I've long thought that using the continue statement like that would catch a lot of potential bugs.

Rysto
Posts: 1460
Joined: Wed Mar 21, 2007 4:07 am UTC

Postby Rysto » Tue Jun 12, 2007 3:15 pm UTC

So I just spent my morning designing a feature. I was very happy with the whole design -- it should offer some nice efficiency enhancements over the current implementation. Then as I was going over it, I noticed something: You can't acquire any reader-writer locks in our system if you've acquired any other mutexes. What the hell is the point of the reader-writer locks, then? Argh!

taggedunion
Posts: 146
Joined: Fri Jul 06, 2007 6:20 am UTC
Location: BEHIND YOU

Postby taggedunion » Fri Jul 06, 2007 6:59 am UTC

Heh, because of definition conflict that occurs in BSD but not GNU/Linux, I can't use an unadorned 'isnumber' macro for an interpreter I'm writing. Redefining the macro is okay, but it also has a function prototype, and so far, I know of no way of undefining a prototype. Yeah, I'm screwed. :P

Hmm, would there be any given situation where redefining a function prototype already in scope would be useful?

No, I can't not include ctype.h. I'm using it. :P ;)
Yo tengo un gato en mis pantelones.

Fieari
Posts: 102
Joined: Mon Jan 29, 2007 2:16 am UTC
Location: Okayama, Japan

Postby Fieari » Fri Jul 06, 2007 2:59 pm UTC

ZoFreX wrote:Struggling with pircbot to get it to bloody well autoreconnect! Here's a sampler:

onDisconnect() // Called when the bot disconnects
{
disconnect(); // Just to make sure...
reconnect(); // Throws a "bot already connected error"

Fun, fun, fun.

Try making it pause for a moment or two before reconnecting?
Surely it is as ridiculous to consider sqrt(-1) "imaginary" because you can't use it to count pieces of chalk as to consider the number 200 imaginary because by itself it cannot express the location of one point with reference to another. -Isaac Asimov

Rysto
Posts: 1460
Joined: Wed Mar 21, 2007 4:07 am UTC

Postby Rysto » Fri Jul 06, 2007 3:58 pm UTC

taggedunion wrote:Heh, because of definition conflict that occurs in BSD but not GNU/Linux, I can't use an unadorned 'isnumber' macro for an interpreter I'm writing. Redefining the macro is okay, but it also has a function prototype, and so far, I know of no way of undefining a prototype. Yeah, I'm screwed. :P

Hmm, would there be any given situation where redefining a function prototype already in scope would be useful?

No, I can't not include ctype.h. I'm using it. :P ;)

Reason #372 why macros are evil...

If I understand you properly, you have something like this:

//this definition might be in a header file somewhere
#define isnumber(x) //some definition

#include <ctype.h>


And the compiler chokes when trying to parse ctype.h? Can you use an inline function instead of a macro? That's one easy way to get around the problem. Another would be to call your macro something other than isnumber(x), if that's possible. If you can't do either, you could do this:

#undef isnumber
#include <ctype.h>
#define isnumber(x) //whatever

Gross, but it will work.

taggedunion
Posts: 146
Joined: Fri Jul 06, 2007 6:20 am UTC
Location: BEHIND YOU

Postby taggedunion » Fri Jul 06, 2007 4:18 pm UTC

Rysto:

That would be all well and good, but in the POSIX standard one can combine function prototypes with macros (which I thought I mentioned), and I have undefined the 'isnumber' macro, but the function prototype is still hanging around.

Code: Select all

#include <ctype.h>


And in ctype is, around lines 170 or so: (at least on BSD 4.4)

Code: Select all

int isnumber(int);
#define isnumber(x) /* some table lookup */


Back to my source file:

Code: Select all

struct object {
    int type;
    void *value;
};

#define NUMBER_TYPE /* whatever */

#ifdef isnumber
#    undef isnumber
#endif

int isnumber(struct object *);
#define isnumber(o) ((o)->type == NUMBER_TYPE)


And while the macro is redefined, the prototypes clash.

I'll probably have to rename and/or put Hungarian notation on the buggers, true; but I was mostly wishing to be able to undefine or redefine function prototypes.
Yo tengo un gato en mis pantelones.

Rysto
Posts: 1460
Joined: Wed Mar 21, 2007 4:07 am UTC

Postby Rysto » Fri Jul 06, 2007 6:03 pm UTC

You can't define a prototype for a macro.

taggedunion
Posts: 146
Joined: Fri Jul 06, 2007 6:20 am UTC
Location: BEHIND YOU

Postby taggedunion » Fri Jul 06, 2007 6:19 pm UTC

Rysto wrote:You can't define a prototype for a macro.


Technically, yes. But I can define a prototype that interacts with a macro, as I wrote in my code above.

As of yet, I have no proof of this except from source code. I've found it in ctype.h and in the headers of the source of a microkernel project whose name escapes me.

However, it compiles under the -ansi -pedantic -Wall flags with no errors or warnings, and works just fine. All the prototype does is type check the incoming expression. I don't need an actual function body for the prototype to be valid; a prototype is an incomplete function declaration anyway.
Yo tengo un gato en mis pantelones.

Rysto
Posts: 1460
Joined: Wed Mar 21, 2007 4:07 am UTC

Postby Rysto » Fri Jul 06, 2007 7:13 pm UTC

I would be very, very surprised if your compiler did any type checks of the macro parameters even if there's a prototype in scope.

taggedunion
Posts: 146
Joined: Fri Jul 06, 2007 6:20 am UTC
Location: BEHIND YOU

Postby taggedunion » Fri Jul 06, 2007 7:24 pm UTC

Okay, I'm trying it out, and I just might be full of shit.

Which raises the question of why the hell these people have put function prototypes with macros at all.

I AM CONFUSED.

Sorry about all the crap about type-checking and stuff.
Yo tengo un gato en mis pantelones.

phlip
Restorer of Worlds
Posts: 7573
Joined: Sat Sep 23, 2006 3:56 am UTC
Location: Australia

Postby phlip » Fri Jul 13, 2007 11:05 am UTC

The ANSI C standard requires that all of the <ctype.h> functions (isalpha, isdigit, etc) be given as functions. Although it's bad practice to do so, it has to be possible (for backwards-compatibility reasons) to call library functions without including the header (if you try it, though, the compiler'll give you a warning).

So an actual int isalpha(int) function has to actually exist in libc somewhere. The standards also say that there has to be a function declaration in <ctype.h>

The standard doesn't say, however, that there can't be a macro as well... and the whole macro/LUT thing is faster than a function call, so they do that.

Anyway, my suggestion: Rename your new macro to something that doesn't clash with stuff in your headers. And don't add an int isnumber(struct object *); line, unless you plan to actually make a function as well. Technically you could still call it "isnumber" if you only have the macro... but having a macro and a function with the same name, that do different things... it's just asking for trouble.

Alternatively, look into the ctype.h header file, and see if there's any #ifdef or something to skip over the isnumber() definition... that function isn't part of any standard I know of (it's not in ANSI C, POSIX or the GNU C headers I have)... I know some GNU headers check a __STRICT_ANSI__ define that removes some extra stuff that's not in ANSI C.
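
A small sketch of the macro-plus-function arrangement described above, using a made-up `is_small` rather than anything from a real `<ctype.h>`. The function and the macro deliberately return different non-zero values so you can tell which one ran; a real library would of course make them agree.

```cpp
// A real function, as a library would provide in libc.
int is_small(int x) { return x < 10 ? 1 : 0; }

// A function-like macro of the same name, standing in for the faster
// table-lookup version. It only expands when the name is followed by '('.
#define is_small(x) ((x) < 10 ? 2 : 0)

int via_macro(int x)    { return is_small(x); }    // expands the macro
int via_function(int x) { return (is_small)(x); }  // parentheses suppress the macro

#undef is_small
int after_undef(int x)  { return is_small(x); }    // macro gone, function remains
```

This is also why the `#undef` alone did not help earlier in the thread: it removes the macro, but the function declaration from the header is still in scope and clashes with a new prototype of the same name.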

Code: Select all

enum ಠ_ಠ {°□°╰=1, °Д°╰, ಠ益ಠ╰};
void ┻━┻︵​╰(ಠ_ಠ ⚠) {exit((int)⚠);}
[he/him/his]

taggedunion
Posts: 146
Joined: Fri Jul 06, 2007 6:20 am UTC
Location: BEHIND YOU

Postby taggedunion » Wed Jul 18, 2007 5:28 am UTC

phlip: thank you. I just feel retarded now. :)

That certainly explains why there are function prototypes, though. Thank you, thank you.

I'm going to keep just to macros.

Any of you guys interested in compilers/interpreters? I could show interested parties to the SVN server for my project. If I didn't mention it before, it's a Lisp dialect.
Yo tengo un gato en mis pantelones.

Rysto
Posts: 1460
Joined: Wed Mar 21, 2007 4:07 am UTC

Postby Rysto » Wed Jul 18, 2007 3:50 pm UTC

I just re-learned a valuable lesson today:

Never, ever, do a cp my_new_libc.so /lib/libc.so

Especially if my_new_libc.so is on an NFS mount.

Qabach
Posts: 11
Joined: Thu Jul 12, 2007 1:28 am UTC

Postby Qabach » Thu Aug 02, 2007 6:03 am UTC

I recently did an assignment for a class where we had to load our files (called prog5_1.cgi) onto the school's server so that they would be accessible from an internet browser (the program interacts with the user via dynamically generated web pages. I'm sure that there's a more technical way of saying that, but I don't know it.)

Anyway, it had worked fine for Perl and PHP, but for this program we were working in Python.

I wrote a simple "hello world" style program and uploaded it as usual, but when I tried to access it from the web, I got error 500 (internal server error). After about an hour of trying to get that to work at home, I gave up. When I went to school the next day, I wrote exactly the same program locally on their Linux machines and it worked fine. My professor looked at my code for over half an hour and couldn't figure out why it wasn't working (permissions are set correctly, and I uploaded the file as ASCII text, not binary data, and those are the 2 most common problems students have in this class).

The most confusing thing to me is that internal server error would normally (I think) imply that my code has a bug in it, but when I run "python prog5_1.cgi" on the command line of my remote connection to the school, the program executes perfectly and outputs valid HTML, which I can then paste into an html file and open.

Has anyone had similar experiences? Do you know what is going on?


edit: It may be useful to mention that the school's server, and backup server both broke recently and had to be rebuilt, so all of the old settings etc. were fubar'd. It's quite possible that the server's current configuration has something to do with my problems.

iw
Posts: 150
Joined: Tue Jan 30, 2007 3:58 am UTC

Postby iw » Thu Aug 02, 2007 12:11 pm UTC

Qabach wrote:My professor looked at my code for over half an hour and couldn't figure out why it wasn't working (permissions are set correctly, and I uploaded the file as ASCII text, not binary data, and those are the 2 most common problems students have in this class).
First off, this is backwards: if you FTP files as ASCII, there is a chance they will become corrupted. Binary transfer will always work. It doesn't sound like it got corrupted, though.

What OS is the server running? If it's Linux, you may be forgetting to add the magic Python shebang at the top:

Code: Select all

#!/usr/bin/env python

If that's not the case, the only other thing could be improper server settings.

Qabach
Posts: 11
Joined: Thu Jul 12, 2007 1:28 am UTC

Postby Qabach » Sat Aug 04, 2007 12:22 am UTC

Actually, uploading them as binary caused some of my programs not to work earlier on. The professor said something to the effect that the problem was that in my Windows development environment, newline characters were used and this didn't agree with the Linux environment, which needs carriage returns.

There's nothing wrong with my shebang. I'm afraid that you may be right about the server settings.

iw
Posts: 150
Joined: Tue Jan 30, 2007 3:58 am UTC

Postby iw » Sat Aug 04, 2007 2:42 am UTC

Qabach wrote:Actually, uploading them as binary caused some of my programs not to work earlier on. The professor said something to the effect that the problem was that in my Windows development environment, newline characters were used and this didn't agree with the Linux environment, which needs carriage returns.

That's not the right explanation, but it's close enough. Using text mode ftp transfer to convert newlines... that's... so old!
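
One plausible mechanism, offered as an assumption rather than a diagnosis of this particular server: a script saved with Windows line endings and uploaded in binary mode keeps its CR characters, so the shebang line ends in `python\r` and the system looks for an interpreter by that name. The sketch below shows the conversion that text-mode FTP does implicitly; `crlf_to_lf` and `first_line_has_cr` are hypothetical helper names.

```cpp
#include <cstddef>
#include <string>

// Replace every CR LF pair with a single LF; lone CRs are left alone.
std::string crlf_to_lf(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] == '\r' && i + 1 < text.size() && text[i + 1] == '\n') {
            continue;  // drop the CR; the LF is copied on the next pass
        }
        out.push_back(text[i]);
    }
    return out;
}

// True if the first line (for example a shebang) ends in a stray CR.
bool first_line_has_cr(const std::string& text) {
    std::size_t eol = text.find('\n');
    return eol != std::string::npos && eol > 0 && text[eol - 1] == '\r';
}
```

Converting explicitly and then uploading in binary mode keeps the transfer itself lossless, which is the safer half of both pieces of advice above.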

EvanED
Posts: 4331
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI

Postby EvanED » Sat Aug 04, 2007 3:23 am UTC

We've had to deal with the CRLF - vs - LF thing a couple times at work.

To me it just seems really retarded. It's 2007 -- why are there compatibility issues with such a stupid thing?

I mean, even Emacs doesn't handle the differences properly all the time!

Here's my proposal: add a new (third) newline character. CR and LF have specific meanings, and my personal feeling is that Unix shouldn't have appropriated one to take the meaning of both. (Most network protocols, such as HTTP, take this view. Newlines are CRLF.) It's too late to get Unix to switch to CRLF or Windows to LF, but if we add a new character to represent a newline (as opposed to two representing half of a newline), maybe we could make some progress as both OSes start using it as standard.

Or maybe that would just make it worse. Who knows.

taggedunion
Posts: 146
Joined: Fri Jul 06, 2007 6:20 am UTC
Location: BEHIND YOU

Postby taggedunion » Sat Aug 04, 2007 9:57 am UTC

EvanED wrote:Here's my proposal: add a new (third) newline character.


Unicode already has one, the Line Separator U+2028.

(Wikipedia on newlines)
Yo tengo un gato en mis pantelones.

sunkistbabe1
Posts: 258
Joined: Tue Jul 31, 2007 11:03 pm UTC
Location: Shuswap, BC, Canada

Postby sunkistbabe1 » Mon Aug 20, 2007 11:09 pm UTC

SpitValve wrote:Post your frustrations and inadequacies at programming...

LE4dGOLEM SAYS: Stickied for being a good idea.


My personal frustration is when the testers come back with "This doesn't Work" (when referencing a form or screen).

Almost as descriptive as "Type Mismatch".

What part of the friggen form doesn't work? Do you get an error? Does the balance of the <whatever> not come out to what is expected? Does the screen show inaccurate data? WHAAAT???

Coming back to me with "This doesn't work" makes me look incompetent. I didn't just spend X Days coding you a new form to have you tell me nothing works.


Phew, there is my rant for the day. :)
- Sunkist -

cathrl
Posts: 427
Joined: Tue Jan 30, 2007 9:58 am UTC

Postby cathrl » Tue Aug 21, 2007 4:32 pm UTC

My OH's answer to this is the silly error message.

People can never remember what the error number they got was. If they get an Out of Cheese Error (with apologies to Pratchett), they remember it.

Sadly, I can't get away with it, because his testers are internal, but mine work for the client :(

ToLazyToThink
Posts: 83
Joined: Thu Jun 14, 2007 1:08 am UTC

Postby ToLazyToThink » Wed Aug 22, 2007 7:16 am UTC

sunkistbabe1 wrote:
SpitValve wrote:Post your frustrations and inadequacies at programming...

LE4dGOLEM SAYS: Stickied for being a good idea.


My personal frustration is when the testers come back with "This doesn't Work" (when referencing a form or screen).

Almost as descriptive as "Type Mismatch".

What part of the friggen form doesn't work? Do you get an error? Does the balance of the <whatever> not come out to what is expected? Does the screen show inaccurate data? WHAAAT???

Coming back to me with "This doesn't work" makes me look incompetent. I didn't just spend X Days coding you a new form to have you tell me nothing works.


Phew, there is my rant for the day. :)

Don't forget the useless screenshots.

I get those types of descriptions accompanied with multiple 10MB+ bitmaps of screens that look perfectly normal.

Unless of course there was an actual error message, then they feel the need to paraphrase it so I don't have a prayer of grepping it out of the source/logs.

Pesto
Posts: 737
Joined: Wed Sep 05, 2007 5:33 pm UTC
Location: Berkeley, CA

Postby Pesto » Thu Sep 06, 2007 9:09 pm UTC

GRR!

I work for a non-profit. We hire outside companies to do "phone outreach", which is basically calling people and asking them for money. They send us a data file of all the people who decided to make a pledge, and I'm supposed to import it into our database.

The only problem is, every time they send us data, it's in a slightly different format, and I never know what it's going to be. I so need a new job.

sunkistbabe1
Posts: 258
Joined: Tue Jul 31, 2007 11:03 pm UTC
Location: Shuswap, BC, Canada

Postby sunkistbabe1 » Thu Sep 06, 2007 9:25 pm UTC

ToLazyToThink wrote:Don't forget the useless screenshots.

I get those types of descriptions accompanied with multiple 10MB+ bitmaps of screens that look perfectly normal.

Unless of course there was an actual error message, then they feel the need to paraphrase it so I don't have a prayer of grepping it out of the source/logs.


I started in Tech Support at the company I currently work at, taking calls from the clients who bought our program and I think I heard it all... including people using the cd tray for their drink. We have had people who would call in and say they got an error the previous day or week and ask if we could fix it... Of course they never wrote it down, it never happened since, and they cannot remember what it said.

<smashes face on keyboard>
- Sunkist -

zenten
Posts: 3799
Joined: Fri Jun 22, 2007 7:42 am UTC
Location: Ottawa, Canada

Postby zenten » Fri Sep 07, 2007 1:00 am UTC

sunkistbabe1 wrote:
ToLazyToThink wrote:Don't forget the useless screenshots.

I get those types of descriptions accompanied with multiple 10MB+ bitmaps of screens that look perfectly normal.

Unless of course there was an actual error message, then they feel the need to paraphrase it so I don't have a prayer of grepping it out of the source/logs.


I started in Tech Support at the company I currently work at, taking calls from the clients who bought our program and I think I heard it all... including people using the cd tray for their drink. We have had people who would call in and say they got an error the previous day or week and ask if we could fix it... Of course they never wrote it down, it never happened since, and they cannot remember what it said.

<smashes face on keyboard>


"Well, I would need to know what that error is to fix the problem. Next time it happens again just write it down, and then call in again with the message."

"What if it doesn't happen again?"

"Well, that's not really a problem then, is it?"

And if they don't accept that then I get to be a broken record.

warhorse
Posts: 203
Joined: Fri Mar 09, 2007 6:42 pm UTC
Location: Möbius Strip

Postby warhorse » Fri Sep 07, 2007 3:45 pm UTC

When I took Operating Systems in college, my lab partner and I once spent 5 hours chasing down a memory corruption issue. It turns out that we were allocating memory for a struct by doing

Code: Select all

malloc(sizeof(sizeof(my_struct)))


We got 4 bytes back every time :oops:
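
Why 4 bytes: `sizeof` yields a `size_t`, so the outer `sizeof` measures that `size_t` and not the struct. A short sketch, assuming a made-up 64-byte `my_struct` (the real one from the lab is not shown in the post); `make_struct` is likewise a hypothetical helper.

```cpp
#include <cstddef>
#include <cstdlib>

struct my_struct {       // hypothetical stand-in for the lab's struct
    char payload[64];
};

// sizeof yields a std::size_t, so sizeof(sizeof(T)) is just the size of
// std::size_t (4 on a typical 32-bit build, 8 on a typical 64-bit one).
static_assert(sizeof(sizeof(my_struct)) == sizeof(std::size_t),
              "inner sizeof produces a size_t");
static_assert(sizeof(my_struct) == 64, "the size that was actually wanted");

// Tying the size to the pointer being assigned avoids naming the type twice.
my_struct* make_struct() {
    my_struct* p = static_cast<my_struct*>(std::malloc(sizeof *p));
    return p;
}
```

The `sizeof *p` idiom is a design choice, not the only fix: it keeps the allocation size correct even if the pointer's type changes later.
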
It's OK to be social, just don't tell anyone about it.

Aperfectring
Posts: 252
Joined: Fri Sep 07, 2007 3:47 am UTC
Location: Oregon (happily)

Postby Aperfectring » Fri Sep 07, 2007 10:58 pm UTC

Bad debugging experience #1:
Shortly after I learned about pointers and linked lists in school, I decided to be the overzealous student and write a program that would automatically play "War", using a linked list of cards for each person's hand.

I slowly worked on many different parts, testing as I went along and forcing myself to step through each "battle" to ensure that the correct card won. Once I was confident it worked, I took out the "waits" I had put in and let it run through unimpeded. The first time through a game, some or all of the cards disappeared. So I went back to stepping through it and got through 5-10 whole games without a misplaced card. I took out the waits again and, first time, cards disappeared. I ran it multiple times without waits, and every time cards disappeared; I went back to stepping through it, and every time no cards were misplaced. To this day I cannot explain why it behaved differently in those two cases.

Bad debugging experience #2:
Working with a Linux driver for a 2 channel MAC chip. The driver had been working and had been stable for about 4 months by the time we hit this bug. Brand new build, very little new code in for the driver, all of a sudden, on initial boot, the driver hung.
WEEK 1:
What the heck is in this scant 100 lines of code that could POSSIBLY be hanging this driver up? printk statements galore are added to the new code, lots of time spent sifting through large log files.
WEEK 2:
The boss is getting very anxious for a fix to this problem. Project manager and I do a deep dive into the new code to find the problem, find nothing.
WEEK 3:
We decide we should go to the root of the problem, and take a look at interrupts (since we hang in a very bad way, and the only way for that to happen is to be in an interrupt handler). Find nothing.
WEEK 4:
Deep dive into ALL of the driver code with the boss looking over the shoulder. And still, we find nothing.
WEEK 5:
Someone notices something that doesn't look quite right about the read/write macros we are using.

The solution:

Code: Select all

Before:
#define MAC_CHIP_WR(base_mac_chip_addr, offset, value) do { vol_int_ptr = ((base_mac_chip_addr) + ((offset) << 2)); *vol_int_ptr = (value); } while (0)
#define MAC_CHIP_WR_CHAN(chan_num, base_mac_chip_addr, offset, value) MAC_CHIP_WR((base_mac_chip_addr) + ((chan_num) * 0x500), (offset), (value))

After:
#define MAC_CHIP_WR(base_mac_chip_addr, offset, value) do { vol_int_ptr = ((base_mac_chip_addr) + ((offset) << 2)); *vol_int_ptr = (value); } while (0)
#define MAC_CHIP_WR_CHAN(chan_num, base_mac_chip_addr, offset, value) MAC_CHIP_WR((base_mac_chip_addr), (offset) + ((chan_num) * 0x500), (value))


In code that was put into the driver before I even worked on it (before I even worked for the company), there was a "tiny" bug that would have been catastrophic if it had made it out to the field. The reason we didn't find it earlier: I computed the offset to include the channel number myself, and only used the first macro. Also, the read macro (which I can't recall exactly) had the same bug, so it would also access the wrong register location.

It turned out that some of the newly added code turned on an interrupt that we weren't handling (we had never touched it before), so it was never serviced and would instantly trigger a new interrupt.
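
To see why moving the channel term matters, here is a sketch that only computes addresses instead of writing to hardware. It assumes the write macro is meant to compute base + (offset << 2), i.e. register offsets are scaled by 4 before being added to the base; `REG_ADDR`, its two channel variants, and `kBase` are hypothetical names and values.

```cpp
#include <cstdint>

// Assumed addressing scheme: registers are 4 bytes wide, so a register
// offset is scaled by 4 (shifted left by 2) before being added to the base.
#define REG_ADDR(base, offset) ((base) + ((offset) << 2))

// "Before": the channel stride is added to the base, so it is NOT scaled.
#define REG_ADDR_CHAN_BEFORE(chan, base, offset) \
    REG_ADDR((base) + ((chan) * 0x500), (offset))

// "After": the channel stride is added to the offset, so it IS scaled.
#define REG_ADDR_CHAN_AFTER(chan, base, offset) \
    REG_ADDR((base), (offset) + ((chan) * 0x500))

// Hypothetical base address, for illustration only.
constexpr std::uint32_t kBase = 0x10000000u;
```

For channel 0 the two variants agree, which fits the story of the bug hiding for months: only accesses to the second channel through the channel macro land on the wrong register.
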
Odds are I did well on my probability exam.

Agentlien
Posts: 31
Joined: Mon Sep 10, 2007 6:06 pm UTC
Location: Göteborg, Sweden

Postby Agentlien » Mon Sep 10, 2007 10:32 pm UTC

Since I've been programming in C/C++ for a few years (half my life), I've naturally had a few nightmare debugging memories. And as always there's the one which seems so obvious once you solve it, as well as the one where you still have no clue what was going on once you've fixed it. The latter being alarming since it proves you still didn't fully understand the problem.

My two favourite examples of things not working are both from the same project: a simple Final Fantasy clone I made for school a few years ago. In the game, every square held a pointer to an object of the generic superclass of all NPCs, enemies, triggers, etc. Now, this bug suddenly presented itself when I had written the code for a brand new subclass: the teleport. A teleport was simply an object which, whenever collided with, unloaded the current map and brought you to a new one.

Here's the bug:
If there was a teleport on the map, the game crashed whenever you pressed the left arrow key.
It did not matter whether you were actually close to the teleport or not, and no other key had any effect. In fact, none of the code to handle keypresses, or directions, or movement called the teleport interaction, ever! And the teleport never checked what keys were pressed either. It didn't even matter if I removed the code which actually handled that key being pressed! After a few days of trying to figure it out I just suddenly came home, turned on my computer, opened the project, and instantly spotted the error. It was in the code which, if a teleport had been activated, triggered the actual teleportation. As an early step of the teleportation I needed to cast the teleport down from the type of its superclass. I did it somewhat like this (Except, these are not the actual variable names, nor class names used in the project.):

Code: Select all

Teleport* T = (Teleport*)&Object;  // bug: Object is already a pointer, so &Object is a SuperClass**


The problem was that Object of course wasn't an object, it was a pointer. I took the address of a pointer to an object. Casting from SuperClass** to SubClass*. That does seem able to mess things up. :lol: But I still never figured out why the left arrow key, of all things, triggered the crash, which still frustrates me a bit.

If you don't recognize me, don't worry. Your memory is not faulty, this just happens to be one of my first posts. And.. Nevermind my sig, I had no idea what to write there anyway.
You C, I aim to Assemble a PERL of knowledge, helpful FOR TRANsit to new languages.
___
When writing a report, always include valid references. You never know when the Garbage Collector may stop by.

Rysto
Posts: 1460
Joined: Wed Mar 21, 2007 4:07 am UTC

Postby Rysto » Mon Sep 10, 2007 10:43 pm UTC

dynamic_cast is your friend.
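
A minimal sketch of what that buys you, using hypothetical class names (`GameObject`, `Teleport`, `Enemy`) rather than the ones from the actual project. `dynamic_cast` checks the object's real type at run time and yields a null pointer on a mismatch, and applying it to the address of a pointer (a `GameObject**`) is rejected at compile time, where the C-style cast compiled silently.

```cpp
struct GameObject {                 // hypothetical superclass
    virtual ~GameObject() = default;
};
struct Teleport : GameObject {      // hypothetical subclass
    int target_map = 7;
};
struct Enemy : GameObject {};

// Returns the target map, or -1 if obj is not actually a Teleport.
int target_of(GameObject* obj) {
    // dynamic_cast inspects the dynamic type; a wrong type or a null
    // pointer gives nullptr instead of a bogus Teleport*.
    if (Teleport* t = dynamic_cast<Teleport*>(obj)) {
        return t->target_map;
    }
    return -1;
}
```

The superclass needs at least one virtual function (here the destructor) for `dynamic_cast` to work on it.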

Agentlien
Posts: 31
Joined: Mon Sep 10, 2007 6:06 pm UTC
Location: Göteborg, Sweden

Postby Agentlien » Tue Sep 11, 2007 7:12 pm UTC

Rysto wrote:dynamic_cast is your friend.


Yes, it is. :) Unfortunately I didn't know of it when I wrote this program.
You C, I aim to Assemble a PERL of knowledge, helpful FOR TRANsit to new languages.

___

When writing a report, always include valid references. You never know when the Garbage Collector may stop by.

