Commenting and Conventions

A place to discuss the implementation and style of computer programs.

Moderators: phlip, Moderators General, Prelates

snow5379
Posts: 247
Joined: Tue Apr 12, 2011 6:06 pm UTC

Commenting and Conventions

Postby snow5379 » Mon Oct 29, 2012 3:43 am UTC

I don't know how to comment code or name my stuff. I've started a new project and I want to do things right in case, god forbid, someone else someday reads my code. Here's my first class:

Code: Select all

package files;

import javax.swing.*;

class frame extends JFrame
{
   frame(int a,int b)
   {
      setSize(a,b);
      setLayout(null);
      setLocationRelativeTo(null);
      setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
      setVisible(true);
   }
}

I've noticed that the vast majority of frames I've created in my life have been centered and made visible with no layout. So... why not create a class to make a frame like that? And here's my problem: I don't know what to call my package, my class, where to comment, and so on. This is a personal project and eventually I want to create a 2D adventure game like the ones I used to play as a kid... so it's not THAT important that I code correctly... but the perfectionist in me wants to do it right. Any tips?

Here's the second class I've written. Eventually it's going to help me align the sprites on the screen (thinking it over now I should probably rename "target" to "screen" though is that too vague? It's the area inside the JFrame that the "view" will be looking at. Is there a name for that?):

Code: Select all

package files;

import java.awt.*;
import java.awt.geom.*;

class view
{
   double zoom=1;
   Point2D.Double viewer;
   Component target;

   view(Point2D.Double a,Component b)
   {
      viewer=a;
      target=b;
   }

   double offset_x()
   {
      if(viewer!=null&&target!=null)return viewer.x-target.getWidth()/2;
      return 0;
   }

   double offset_y()
   {
      if(viewer!=null&&target!=null)return viewer.y-target.getHeight()/2;
      return 0;
   }

   int adjust_x(double a){return(int)Math.rint(a*zoom-offset_x());}
   int adjust_y(double a){return(int)Math.rint(a*zoom-offset_y());}
   int adjust(double a){return(int)Math.rint(a*zoom);}
}

Any tips are welcome! Also I have a general idea of what classes my project will contain and how they will relate to each other. Should I write that up at any point or does it not matter?

snow5379
Posts: 247
Joined: Tue Apr 12, 2011 6:06 pm UTC

Re: Commenting and Conventions

Postby snow5379 » Mon Oct 29, 2012 4:45 am UTC

Apparently changing a frame's content pane has different effects depending on your look and feel. Should I add a comment about that?

Code: Select all

package files;

import java.awt.*;
import javax.swing.*;

class frame extends JFrame
{
   frame(Container a,int b,int c)
   {
      setSize(b,c);
      setLayout(null);
      setLocationRelativeTo(null);
      setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);

      //Making the frame visible before the content pane can mess up some look and feels.

      setContentPane(a);
      setVisible(true);
   }
}

How's that for a comment? Is there anything else I should comment?

DaveInsurgent
Posts: 207
Joined: Thu May 19, 2011 4:28 pm UTC
Location: Waterloo, Ontario

Re: Commenting and Conventions

Postby DaveInsurgent » Mon Oct 29, 2012 3:23 pm UTC

For starters, you should probably stick to Java conventions for naming, meaning your ClassName and your methodName have casing as such.

where to comment


Nowhere. Aim to have zero comments because they lie and bloat the source. Instead focus on better naming (which I appreciate is something you're struggling with). The only place for comments is when there's no other way to explain what is going on, and even then you should at least punch yourself in the kidney each time you write a comment, that way you're really sure it's necessary.

You seem to be attempting to subclass JFrame as a means to achieve specialization. I have to instead ask why not encapsulate a JFrame in a higher level concept within your problem domain? What that concept is, I don't know, because you haven't really said what you're doing. Your Frame is-a JFrame, which may or may not be suitable. If you're writing a 2D game engine... just off the top of my head, maybe you want to support multiple players on the same screen? In which case you might have a PlayerScreen which has-a JFrame. Even if you don't, you might still want to think of it that way. By inheriting from JFrame you're carrying it's baggage. Try to avoid that. One of the biggest mistakes anyone can make in object-oriented programming is to try to use inheritance to model your problem domain. Even senior developers do it and can't see the errors in their ways until it blows up in their face. This is all despite the overwhelming evidence that it can, will and does happen put forth by the field of software engineering.

It's the area inside the JFrame that the "view" will be looking at. Is there a name for that?):


I would guess Frustum though I believe it's usually found in 3D related discussion.

if(viewer!=null&&target!=null)return viewer.y-target.getHeight()/2;


Don't write code like this. Kittens, etc.

The best advice I can give you is you need to think more before you code. You need to define a problem domain and solve it: in the goal of writing a 2D game, that is going to involve writing a 2D game engine (I say this because you've already started, otherwise I'd suggest that you ought to consider using an engine that already exists). This is going to involve solving several problems:

- Assets
- Rendering (rasterization)
- Sound
- Physics/object interaction
- Persistence (save games, load games, loading level data)
- Networking

.. among others

The different ways of approaching those problems are documented in books and on the internet in great abundance. You need to understand how you're going to solve them before you can think about implementing the solutions. The reason you're having a hard time naming things is because you're not certain about what they (should) do.

User avatar
WanderingLinguist
Posts: 237
Joined: Tue May 22, 2012 5:14 pm UTC
Location: Seoul
Contact:

Re: Commenting and Conventions

Postby WanderingLinguist » Mon Oct 29, 2012 3:35 pm UTC

I think you're on the right track with comments (more or less). That is, fewer is better if you can make it obvious what the code does.

Generally, if it's not obvious what something does I try to rewrite it to make it more obvious.

But if it's not obvious why something is needed, that's when I comment it, because I probably won't remember later.

For classes, a brief explanation of how it's intended to be used it often a good idea.

The best way to learn what to comment and what to name things is to look at other people's code. Open source stuff is really good for this. The parts that are confusing to you ("Ah, I wished they'd commented this") ought to give you some clues about what parts of your own code you should be commenting.

For your class names, you probably want to stick with the convention of capitalizing the first letter. Also, classes like "View" are pretty commonly used in GUI toolkits to mean something that displays something on the screen (kind of like a JFrame) so if your view just provides centering and positioning, maybe you want to call it something like ViewPositioner or something along those lines.

Think about what the class does. Is it an object that is visually recognized on the screen like a button, check box or text field? If so, Button, CheckBox, TextField are probably good names (maybe with something added to make them more specific; that depends on how your namespace -- packages, that is -- is laid out). If it holds state and performs a function based on that state, then describe it that way. Actually, looking at your code again, it looks like your "view" class is used to calculate positioning for items within a Component, so maybe you want to call it "ComponentContentPositioner" or something.

But looking at other people's code will help a lot. And check the Java naming guidelines. Though you may hate the verbosity, sticking to the standard (class names start with uppercase letters, etc.) will mean there's less to learn when someone else looks at your code.

Edit: Ninja'd about fewer comments and following Java conventions.

DaveInsurgent
Posts: 207
Joined: Thu May 19, 2011 4:28 pm UTC
Location: Waterloo, Ontario

Re: Commenting and Conventions

Postby DaveInsurgent » Mon Oct 29, 2012 3:46 pm UTC

I'll also add that adding tests is considered a superior way of "commenting" and you will see this more in open-source software than you will see exhaustive comments or documentation nowadays.

You can write comments about what (not) to do all you want, but someone can go and change it and release the software with a bug in it. If you write a unit or interaction test that checks you actually have a degree of enforcement.

Comments are like that parent that stands there looking at their kid saying "Timmy, no Timmy, stop Timmy, put that down, take that out of our mouth, Timmy.. stop.. stop that please."

Unit tests are like the parent that throws a shoe. They might miss, but, at least you tried.

I highly recommend you pick up a copy of Robert C. Martin's "Clean Code".

User avatar
WanderingLinguist
Posts: 237
Joined: Tue May 22, 2012 5:14 pm UTC
Location: Seoul
Contact:

Re: Commenting and Conventions

Postby WanderingLinguist » Mon Oct 29, 2012 4:04 pm UTC

The worst commenting I've ever seen was something like this:

Code: Select all

setColor(Color.RED); // set color to red
moveTo(50,50);  // move to x=50 y=50
circle(25);  // draw a circle with size=25


I'm not joking. Even saying "with a 25 pixel radius" would have at least been something.

And this was on every line of code in the whole damn program.

DaveInsurgent
Posts: 207
Joined: Thu May 19, 2011 4:28 pm UTC
Location: Waterloo, Ontario

Re: Commenting and Conventions

Postby DaveInsurgent » Mon Oct 29, 2012 4:40 pm UTC

Here's my random-sample comments from our code base:

Code: Select all

   /**
    * Get list of files. Default file type is Debug
    * @param foo
    * @param bar
    * @param baz
    * @param type
    * @return
    */


Note that it actually returns an ArrayList despite the comment (oh yeah, everything is an ArrayList in here. Fuck interfaces, man!)

Code: Select all

   /** getUser
    * @brief adds user to database


I for one, definitely associate "adds to database" with "get".

Oh, thank god you told me!

Meem1029
Posts: 379
Joined: Wed Jul 21, 2010 1:11 am UTC

Re: Commenting and Conventions

Postby Meem1029 » Mon Oct 29, 2012 7:53 pm UTC

WanderingLinguist wrote:The worst commenting I've ever seen was something like this:

Code: Select all

setColor(Color.RED); // set color to red
moveTo(50,50);  // move to x=50 y=50
circle(25);  // draw a circle with size=25


I'm not joking. Even saying "with a 25 pixel radius" would have at least been something.

And this was on every line of code in the whole damn program.


There's a good chance that in a few months that program will look like

Code: Select all

setColor(Color.BLUE); // set color to red
moveTo(25,25);  // move to x=50 y=50
circle(50);  // draw a circle with size=25
cjmcjmcjmcjm wrote:If it can't be done in an 80x24 terminal, it's not worth doing

snow5379
Posts: 247
Joined: Tue Apr 12, 2011 6:06 pm UTC

Re: Commenting and Conventions

Postby snow5379 » Mon Oct 29, 2012 9:52 pm UTC

Thanks for your advice! It's very helpful.

The reason why I'm extending a frame is because if I add a player to the frame I can reference the player without passing it everywhere by just going through a getRoot(Component).

Also what's wrong with this?

Code: Select all

   double offset_x()
   {
      if(viewer!=null)return zoom*viewer.x-screen.getWidth()/2;
      return 0;
   }

   double offset_y()
   {
      if(viewer!=null)return zoom*viewer.y-screen.getHeight()/2;
      return 0;
   }

It centers the viewer by subtracting half on each axis. But it can't work if there's no viewer to center so I check first if the viewer exists. Also I account for zoom.

My current code looks like this:

Code: Select all

package files;

import java.awt.*;

class view
{
   double zoom=1;
   actor viewer;
   Component screen;

   view(Component a){screen=a;}

   double offset_x()
   {
      if(viewer!=null)return zoom*viewer.x-screen.getWidth()/2;
      return 0;
   }

   double offset_y()
   {
      if(viewer!=null)return zoom*viewer.y-screen.getHeight()/2;
      return 0;
   }

   int adjust_x(double a){return(int)Math.floor(a*zoom-offset_x());}
   int adjust_y(double a){return(int)Math.floor(a*zoom-offset_y());}
   int adjust(double a){return(int)(a*zoom);}
}

Thought I can't for the life of me remember why I floor adjust_x. It might have been a hack to prevent a rounding bug with negatives... I'm going to try to remove the floor and see if anything weird happens.

EDIT

Yeah it was a hack to fix a bug: there were gaps between tiles at certain zooms because negatives round up while positives round down. Later on I just set the tile width to be whatever the difference between its x and the next tile's x so the hack was no longer needed.

EvanED
Posts: 4330
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI
Contact:

Re: Commenting and Conventions

Postby EvanED » Mon Oct 29, 2012 10:04 pm UTC

DaveInsurgent wrote:
where to comment


Nowhere. Aim to have zero comments because they lie and bloat the source. Instead focus on better naming (which I appreciate is something you're struggling with). The only place for comments is when there's no other way to explain what is going on, and even then you should at least punch yourself in the kidney each time you write a comment, that way you're really sure it's necessary.

I strongly disagree here. I understand the point of what you're saying, which is that too often comments are used as crutches for bad code. This is worth keeping in mind, and it's good to think about "can I turn this comment into code" as you write.

However, there are plenty of times when comments are between useful and essential. For instance:
  • Explaining surprising behavior of an external API
  • Explaining some code which is just inherently tricky
  • Explaining why an alternative (often simpler) approach which seems like it should work actually doesn't. (Dave's point about having a test however is also well-motivated. But a comment still helps.)
  • Providing documentation comments (a la Doxygen/JavaDoc), at least if you keep them current. (Less important for open source stuff; pretty much essential for any closed-source API.)
  • Compensating for a lack of expressibility because of language issues (e.g. I find // for each foo in foos before for (some_container<Foo>::const_iterator foo = foos.begin() ; foo != foos.end(); ++foos) to be useful
  • Giving a quick summary of what the next 5 lines or something do I think is extremely useful, and is often way clearer than splitting it into its own function
  • Some special cases like ASCII art

It is possible to write too many comments -- but it's way easier to write too few. And too few comments is almost way more damaging than too many, even if the comments get out-of-date. And it's way less work to hit C-k or whatever to delete a comment that you see becoming incorrect than it is to write ones.

snow5379
Posts: 247
Joined: Tue Apr 12, 2011 6:06 pm UTC

Re: Commenting and Conventions

Postby snow5379 » Mon Oct 29, 2012 10:23 pm UTC

Would you comment something like this?

Code: Select all

   void draw_tiles(Color[][]a,Graphics b)
   {
      int c=map.tile_size;

      for(int d=0;d<a.length;d++)
         for(int e=0;e<a[d].length;e++)
         {
            b.setColor(a[d][e]);

            b.fillRect
            (
               view.adjust_x(d*c),
               view.adjust_y(e*c),
               view.adjust_x((d+1)*c)-view.adjust_x(d*c),
               view.adjust_y((e+1)*c)-view.adjust_y(e*c)
            );
         }
   }


Eventually I plan on using a tile object instead of a rectangle but for now it works.

I'm not using the tile_size as the width and height because there can sometimes be gaps between when one tile ends and another starts because of rounding. Should I add a comment about that? Also is there a certain guideline for when to split up things like fillRect(int,int,int,int) or do you just go with whatever looks nice?

EvanED
Posts: 4330
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI
Contact:

Re: Commenting and Conventions

Postby EvanED » Mon Oct 29, 2012 10:37 pm UTC

snow5379 wrote:Would you comment something like this?

I would make a number of changes. First, a and b are really bad variable names. It takes way less effort to press a few more keys than it does to understand non-meaningful names. (Some common exceptions are things like i for loop counters, and I think either Java or C# uses g for a graphics object.) Second, learn to use the space bar a bit; you're running pretty much everything together that can be. Finally, Java convention is to use "lowerCamelCase" for function and variable names and "UpperCamelCase" for classes. I'll change the code to that to demonstrate, but I'm not always one to follow a platform's conventions if I don't like them and I think I like "underscore_names" better (like you have) for functions and variables. (Still definitely UpperCamelCase for classes, though. Also, Java convention is to capitalize abbreviations like HTMLReader or something like that, but I think that way lies madness; I much prefer HtmlReader. The function name NWAToWPDS in my own code base was enough to totally sell me on that.)

Code: Select all

   void drawTiles(Color[][] colors, Graphics g)
   {
      int tileSize = map.tileSize;

      for(int row=0; row<colors.length; row++)
         for(int col=0; col<colors[row].length; col++)
         {
            g.setColor(colors[row][col]);

            g.fillRect
            (
               view.adjustX(row * tileSize);
               view.adjustY(col * tileSize);
               view.adjustX((row+1) * tileSize) - view.adjustX(row * tileSize),
               view.adjustY((col+1) * tileSize) - view.adjustY(col * tileSize)
            );
         }
   }


I'm not using the tile_size as the width and height because there can sometimes be gaps between when one tile ends and another starts because of rounding. Should I add a comment about that?

That would be a good thing to comment. I don't think I'd comment anything else unless I expected to want a Javadoc comment for it.

Also is there a certain guideline for when to split up things like fillRect(int,int,int,int) or do you just go with whatever looks nice?

You mean on multiple lines? I go with what looks nice. In your case, that's too long for one line so splitting it up is good. Usually you'd put the ( on the preceding line, but it's not bad as-is, just a bit weird.

Edit: Alternatively, you could use auxiliary variables to hold the intermediate calculations, something like:

Code: Select all

   void drawTiles(Color[][] colors, Graphics g)
   {
      int tileSize = map.tileSize;

      for(int row=0; row<colors.length; row++)
         for(int col=0; col<colors[row].length; col++)
         {
            g.setColor(colors[row][col]);

            double pixelLeft = view.adjustX(row * tileSize);
            double pixelTop = view.adjustY(col * tileSize);
            double pixelRight = view.adjustX((row+1) * tileSize);
            double pixelBottom = view.adjustY((col+1) * tileSize);
            double pixelWidth = pixelRight - pixelLeft;
            double pixelHeight = pixelBottom - pixelTop;

            g.fillRect(pixelLeft, pixelTop, pixelWidth, pixelHeight);
         }
   }

or some compromise between them. (Some people won't like that degree of "useless" variables.)

I may have rows/cols backwards, or Y coordinates backwards, or stuff like that -- which just goes to illustrate how important variable names are. (For something like tiles, you're more likely to see the first coordinate be the row starting from 0 at the top, and the second coordinate be the column starting from 0 at the left, than the cartesian (X,Y) coordinates. Matrix coordinates, in other words.)

DaveInsurgent
Posts: 207
Joined: Thu May 19, 2011 4:28 pm UTC
Location: Waterloo, Ontario

Re: Commenting and Conventions

Postby DaveInsurgent » Tue Oct 30, 2012 4:02 pm UTC

EvanED wrote:Explaining surprising behavior of an external API


This is done via unit tests and integration tests. If you ask me "How does your thing work?" I will tell you that there are a number of tests that function as direct, clean examples of how the thing behaves, all with excellent names. The behavior of an external API is prone to change, and thus comments suffer the same fate as they do any other place. You *already* have to maintain the unit tests around the behavior, why add a third item to be maintained?

Also, if there is surprising behavior in the API, much worse issues are afoot than code style.

Explaining some code which is just inherently tricky


Often times this is just an implementation detail. It's certainly an exception. What is tricky? Can you give an example? Because often it's more like "clever".

Explaining why an alternative (often simpler) approach which seems like it should work actually doesn't. (Dave's point about having a test however is also well-motivated. But a comment still helps.)


The test isn't well-motivated: it outright proves it. The comment is a passive suggestion. Especially if the bug doesn't materialize instantly, but instead comes temporally or only after a significant amount of up-time. The unit test catches the condition in which the approach fails, and voila. You don't allow commits that don't pass their tests, so the developer wasted his time. Now that would be a problem except that in this case the developer was randomly rewriting a part of code for what? What purpose? This goes hand-in-hand with committing early and often.

Providing documentation comments (a la Doxygen/JavaDoc), at least if you keep them current. (Less important for open source stuff; pretty much essential for any closed-source API.)


Again, the "at least if you keep the current" part is a proven - not suggested, but proven by numerous experts in our field - to be not the case. It doesn't happen. Not only that, but it also creates extra work that does nothing - you read the API and guess what? You still have to read the code. The superior approach is to go to the tests. The tests tell you how it would work and prove that it does. This is something that '@returns the customer' will never, ever, ever, compete with.

Compensating for a lack of expressibility because of language issues (e.g. I find // for each foo in foos before for (some_container<Foo>::const_iterator foo = foos.begin() ; foo != foos.end(); ++foos) to be useful


Really? Use a macro then or something. Or use std::for_each because, shit - that at least describes it. I get what you're saying but, I think you need to give a real example. "I don't know the look of looping over a std iterator" seems a bit... weak don't you think? Also, in your specific case, ++foos can do fucking anything, so there's issues way, way beyond the languages' expressiveness.

Giving a quick summary of what the next 5 lines or something do I think is extremely useful, and is often way clearer than splitting it into its own function


But, it's not. It's precisely not. It's exactly the opposite of useful: that summary represents what those 5 lines did at some point in time (maybe, if it was ever accurate). Now? Who knows. You can't say. You can't trust it, and guess what? It's usually wrong. You have to read the summary, then the code. That' s just a waste of time - it's a waste writing it, and it's a waste reading it. Yes, you should put those 5 lines of code in to a separate function that describes what it does.

Some special cases like ASCII art


I've never heard of this before, so I'll give it to you. My advice and the experience I draw from (via published authors) only applies to products that are part of some business or otherwise rely on engineering. I don't see where ASCII art fits in to that.

And too few comments is almost way more damaging than too many


All cases converge on reading and understanding the code. The difference is that if you don't rely on comments, the code is likely to be clean (and if it's not, you get fired). If you do, the comments then have to be additionally kept clean, maintained, read and understood - and then you have to go and read the code anyway because who knows. The comments lead you to make assumptions about how things will work. Then you have bugs and an implementation that might not even be right. If you give me code with tests and no comments, I can easily grok the code, read the tests, and build a solution around your code because you've given me proof about how it works.

EvanED
Posts: 4330
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI
Contact:

Re: Commenting and Conventions

Postby EvanED » Tue Oct 30, 2012 5:00 pm UTC

I'll have more to say about this later. Some of your questions are left unanswered or partially answered for now.

DaveInsurgent wrote:
EvanED wrote:Explaining surprising behavior of an external API

This is done via unit tests and integration tests. If you ask me "How does your thing work?" I will tell you that there are a number of tests that function as direct, clean examples of how the thing behaves, all with excellent names. The behavior of an external API is prone to change, and thus comments suffer the same fate as they do any other place. You *already* have to maintain the unit tests around the behavior, why add a third item to be maintained?

Co-location of the explanation and the thing being explained. (I consider that to be pretty important.)

Also, if there is surprising behavior in the API, much worse issues are afoot than code style.

I'll be sure to tell Microsoft and POSIX to fix their APIs.

Again, the "at least if you keep the current" part is a proven - not suggested, but proven by numerous experts in our field - to be not the case. It doesn't happen.

There are several well-maintained projects with useful Javadocs. Of course the most obvious example is the Java API itself.

Not only that, but it also creates extra work that does nothing - you read the API and guess what? You still have to read the code.

And what if the code won't be available to its users because it's a closed API? And "go read the code" often starts you down a huge rabbit hole -- now instead of just reading the description of foo, you also have to read bar and baz and borg, and because you read bar now you have to go read three more.

Really? Use a macro then or something.

Talk about a cure worse than the disease.

Or use std::for_each because, shit - that at least describes it.

std::for_each is horrendously obnoxious to use before lambda support and can be ugly to use even if you have it.

Also, in your specific case, ++foos can do fucking anything, so there's issues way, way beyond the languages' expressiveness.

So can iter.next(). Operator overloading doesn't introduce anything new to the picture in that respect, regardless of what its detractors say.

Some special cases like ASCII art


I've never heard of this before, so I'll give it to you. My advice and the experience I draw from (via published authors) only applies to products that are part of some business or otherwise rely on engineering. I don't see where ASCII art fits in to that.

I've been doing a lot of work with an automaton library over the last couple years, so I probably get more mileage out of this than many. But I have many places with comments like:

Code: Select all

                // We have:
                //            eps path (w_eps)        sym (w_t)
                //        *q - - - - - - - - - > mid ----------> target
                //
                // and want to make it:
                //
                //            eps path (w_eps)        sym (w_t)
                //        *q - - - - - - - - - > mid ----------> target
                //         |                                      /\    .
                //         +--------------------------------------+
                //                sym (w_eps * w_t)
                //
                // (We will remove the epsilon transitions later, but above
                // we checked to make sure we aren't adding a new one now.)

or in test code

Code: Select all

            //             eps            a            eps
            // --> start -------> middle ----> almost ----->(accept)

Which is easier to understand what's going on: that comment, or

Code: Select all

                wfa.addState(start, zero);
                wfa.addState(middle, zero);
                wfa.addState(almost, zero);
                wfa.addState(accept, zero);

                wfa.setInitialState(start);
                wfa.addFinalState(accept);

                wfa.addTrans(start, WALI_EPSILON, middle, one);
                wfa.addTrans(middle, Letters().a, almost, one);
                wfa.addTrans(almost, WALI_EPSILON, accept, one);

If I say "now I want to write a test for whether the machine accepts 'a'", which is easier to determine what the answer will be? What should the resulting automaton look like if I remove epsilon transitions (keeping the language the same)? If you just gave me that code -- even as simple of an automaton as it is -- I would just pull out a scrap of paper and draw what I put in the freaking comment.

In this case the difference between what a reasonable API can do and what comments can display is so dramatic that IMO it's not even a contest.

Tree algorithms are also a great candidate for ASCII art -- e.g. rotations in a balanced binary tree. Actually what I'd really like is the ability to embed pictures into code directly so you wouldn't have to mess with ASCII art, but in some ways that doesn't change anything fundamental -- you still have a comment. Even if you say "go look in CLRS p.1111" then CLRS p. 1111 is just a comment -- and it's a really inconvenient comment to access.

(Along the same line, but not ASCII art, I did some refactoring of a project a while back that had a in-some-ways similar thing. There was some reasonably big logical expression with a bunch of .equals() and .add() and .and() etc. calls. There was a comment above it that gave the same expression using actual notation -- (x * y = z) and (1 + y' = 7) or whatever. Sure you can break the expression up and stuff like that, but that only goes so far. I changed it to actually use overloaded operators -- and then the code essentially became the comment, and I was able to get rid of the comment. But in a language without operator overloading? There's absolutely no way you can get the code to the point where it's as readable as the comment -- and so the comment should stay.)

DaveInsurgent
Posts: 207
Joined: Thu May 19, 2011 4:28 pm UTC
Location: Waterloo, Ontario

Re: Commenting and Conventions

Postby DaveInsurgent » Tue Oct 30, 2012 8:27 pm UTC

That ASCII art tells me nothing, to be honest. The code is hard to read, yes. You can't defend comments with unreadable code: they're related. That code needs to be rewritten to be more descriptive. I can't tell you what that is, because I have no idea what the code does, and the comment didn't help me at all.

Operator overloading doesn't introduce anything new to the picture in that respect, regardless of what its detractors say


I'm not a detractor, but my point was that you used one of the most common loop constructs as an example of something that needed a comment. It's only slightly less redundant than documenting a setter or getter. If you don't know what is going on there, then there's biggest issues (I'm not saying that's actually the case for you, just... what the hell does the comment do? How can someone that needs a for loop explained to them possibly handle operator overloading or function overloading?)

Co-location of the explanation and the thing being explained. (I consider that to be pretty important.)


You've not once addressed the issue that this trends to be incorrect. You can consider it important all you want, but the experience of the experts in our field says that the cons outweigh the pros.

std::for_each is horrendously obnoxious to use before lambda support and can be ugly to use even if you have it.


Those weren't serious suggestions, because the point that a for loop needs a comment was, in my opinion, ridiculous.

And what if the code won't be available to its users because it's a closed API? And "go read the code" often starts you down a huge rabbit hole -- now instead of just reading the description of foo, you also have to read bar and baz and borg, and because you read bar now you have to go read three more.


If using Foo requires Bar, how would the documentation avoid this if the code does not? That's impossible, and the unit tests would abstract Bar away with a test double letting you focus on Foo.

EvanED
Posts: 4330
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI
Contact:

Re: Commenting and Conventions

Postby EvanED » Tue Oct 30, 2012 9:02 pm UTC

DaveInsurgent wrote:That ASCII art tells me nothing, to be honest. The code is hard to read, yes. You can't defend comments with unreadable code: they're related. That code needs to be rewritten to be more descriptive. I can't tell you what that is, because I have no idea what the code does, and the comment didn't help me at all.

Do you know what a finite automaton is? If not, that's understandable -- but you're not exactly in a position to judge the utility of the comment then, as you won't really be able to do anything with the library. That'd be like trying to go from never having used a GUI or know what it is to understanding what Qt does by reading the code.

The (second) comment I posted is an ASCII depiction of the following graph:
aut.png

Here's the comment again to compare:

Code: Select all

            //             eps            a            eps
            // --> start -------> middle ----> almost ----->(accept)

If you think that there's a large difference in readability between the ASCII art and that rendering, well, I disagree. (In other cases it's more dramatic, of course, though nothing horribly convoluted tends to come up in test fixtures.) If you think that even that diagram and the code are equally unreadable, well, I strongly disagree -- both the ASCII art and rendered diagram are an order of magnitude easier to skim than the code.

And you know why I have to attach a picture? Because there really isn't a good textual depiction of an automaton (or any graph) that's nearly as readable! There just isn't! That's all you see anywhere: hard-to-visualize textual depictions and graphics. And unless you want to say "yes, I'll have a graphical programming language", that means there's no API I could possibly write that would make it as easy to visualize as the pseudo-graphical ASCII art. (I guess I could write a parser which would take my ASCII art as a string and parse it. (That's not a serious suggestion.))

I'm not a detractor, but my point was that you used one of the most common loop constructs as an example of something that needed a comment. It's only slightly less redundant than documenting a setter or getter. If you don't know what is going on there, then there's biggest issues (I'm not saying that's actually the case for you, just... what the hell does the comment do? How can someone that needs a for loop explained to them possibly handle operator overloading or function overloading?)

OK, so maybe that was a bit extreme. I'll give a bit better of an example later.

You've not once addressed the issue that this trends to be incorrect. You can consider it important all you want, but the experience of the experts in our field says that the cons outweigh the pros.

Again I'll try to say more later, but my own experience differs.

If using Foo requires Bar, how would the documentation avoid this if the code does not?

Because Bar is an internal function that the user doesn't even need to know exists and will disappear in the next version as the API stays stable?

DaveInsurgent
Posts: 207
Joined: Thu May 19, 2011 4:28 pm UTC
Location: Waterloo, Ontario

Re: Commenting and Conventions

Postby DaveInsurgent » Tue Oct 30, 2012 10:34 pm UTC

No experience with Automata (though I am not intrigued), however I did some research over the last half hour to try and have a clue, and then took to Stack Exchange just to see:

http://codereview.stackexchange.com/que ... n-in-scala

I can "read" the question as well as the answers with my superficial understanding of what is happening. I don't need comments for that. So your point seems to be that the expressiveness of the language is prohibitive, and I'd agree, in which case I'd ask if you can not use a different language? If not, what about some kind of builder pattern? Your domain model doesn't seem to have a transition as an actual entity, just a method call on some context object (not sure what that is). The API reads a lot like what happens when you wrap, say, OpenGL (which is a state machine, no? similar concepts apply?) in what I consider to be a poor manner.

If you have an open-source code sample (with tests) that I could play with, I'd love to take a stab at refactoring it without breaking it (yet with a limited understanding of what it actually does - that kind of would put my money where my...keyboard is).

As for the Foo->Bar part. I don't get what you're saying. If Foo needs Bar, be it directly, or indirectly (factory, whatever), then it's relevant to my use. If it's about to be deprecated, great, but that doesn't change my use of it right now. The unit test either then contains Bar, or doubles it out. If it doubles it out, that's fine --- but I still need one to make a Foo. You can't give me a Foo-API that somehow uses a Bar without Bar being part of the picture. If Foo creates Bar completely internal then the unit tests (as in: how to use a Foo) would not show it, so you're golden.

DaveInsurgent
Posts: 207
Joined: Thu May 19, 2011 4:28 pm UTC
Location: Waterloo, Ontario

Re: Commenting and Conventions

Postby DaveInsurgent » Tue Oct 30, 2012 10:34 pm UTC

No experience with Automata (though I am not intrigued), however I did some research over the last half hour to try and have a clue, and then took to Stack Exchange just to see:

http://codereview.stackexchange.com/que ... n-in-scala

I can "read" the question as well as the answers with my superficial understanding of what is happening. I don't need comments for that. So your point seems to be that the expressiveness of the language is prohibitive, and I'd agree, in which case I'd ask if you can not use a different language? If not, what about some kind of builder pattern? Your domain model doesn't seem to have a transition as an actual entity, just a method call on some context object (not sure what that is). The API reads a lot like what happens when you wrap, say, OpenGL (which is a state machine, no? similar concepts apply?) in what I consider to be a poor manner.

If you have an open-source code sample (with tests) that I could play with, I'd love to take a stab at refactoring it without breaking it (yet with a limited understanding of what it actually does - that kind of would put my money where my...keyboard is).

As for the Foo->Bar part. I don't get what you're saying. If Foo needs Bar, be it directly, or indirectly (factory, whatever), then it's relevant to my use. If it's about to be deprecated, great, but that doesn't change my use of it right now. The unit test either then contains Bar, or doubles it out. If it doubles it out, that's fine --- but I still need one to make a Foo. You can't give me a Foo-API that somehow uses a Bar without Bar being part of the picture. If Foo creates Bar completely internal then the unit tests (as in: how to use a Foo) would not show it, so you're golden.

EvanED
Posts: 4330
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI
Contact:

Re: Commenting and Conventions

Postby EvanED » Wed Oct 31, 2012 2:51 am UTC

DaveInsurgent wrote:No experience with Automata (though I am not intrigued), however I did some research over the last half hour to try and have a clue, and then took to Stack Exchange just to see:

http://codereview.stackexchange.com/que ... n-in-scala

I can "read" the question as well as the answers with my superficial understanding of what is happening. I don't need comments for that.

Meanwhile, I don't think that code is substantially better than mine; I consider the main difference to be that he can say (after you get rid of boiler plate cruft syntax):

Code: Select all

S0 --E0--> S1
   --E1--> S2
   --E2--> S0

while I'd have to repeat S0:

Code: Select all

S0 --E0--> S1
S0 --E1--> S2
S0 --E2--> S0

and that he may be able to list multiple targets in one line:

Code: Select all

S0  --E0--> { S1, S2}

while I'd have to spell them out:

Code: Select all

S0  --E0--> S1
S0  --E0--> S2


But at least for me, it's still fundamentally a textual representation for a graphical structure. How do I know? Because I drew it when I wanted to see if I could figure out what it does! (That was before I realized that it doesn't have an accepting state.) Even if that's a marginal improvement over my API, it's still way inferior to a picture.

Maybe my other example will be more familiar: rebalancing a tree. Show me either a piece of raw code that either performs the operation or tests its correctness that is as clear at illustrating the transformation as the following:

Code: Select all

        P                   Q
       / \                 / \
      /   \               /   \
     a     Q      =>     P     c
          / \           / \
         /   \         /   \
        b     c       a     b


...in which case I'd ask if you can not use a different language?

I cannot; I would use other languages if I had free choice.

Your domain model doesn't seem to have a transition as an actual entity...

It actually does; wfa.addTrans(a, b, c, d) is a shortcut for wfa.addTrans(new Trans(a, b, c, d)). If by the context object you mean wfa, that's the actual automaton; it's an instance of the WFA class.

The API reads a lot like what happens when you wrap, say, OpenGL (which is a state machine, no? similar concepts apply?) in what I consider to be a poor manner.

Don't think of my functions as being like OpenGL calls: it's more like OpenGL's behavior could be modeled as an instance of my WFA class. (I'd have named it something different, even if only Wfa, but that was before my time.)

If you have an open-source code sample (with tests) that I could play with, I'd love to take a stab at refactoring it without breaking it (yet with a limited understanding of what it actually does - that kind of would put my money where my...keyboard is).

Um, so actually I do. If you give me a couple days, I can tar up the latest version (I haven't gotten the repository on a public site yet, only release tarballs) and point you at some places. Actually, if you would still be interested in 3 weeks or something, I hope to be able to put it up on Github, which would make it way easier for you to work on.

The library is an, uh, interesting mixture of actually pretty clean (if a tiny bit too... Java-ish? for my tastes) code and some pretty nightmarish tangles of super-long functions and global variables. As you might guess, there have been I'd say five primary authors who worked on different parts. I actually probably have the least new code if you discount the tests, which are mostly written by me. (It's also research software and we are or were grad students, so there's less consistency and quality control type of stuff than you'd get at a lot of companies, as well as less polish put into handling corner cases and such where we're fine with it just asserting.)

As for the Foo->Bar part. I don't get what you're saying. If Foo needs Bar, be it directly, or indirectly (factory, whatever), then it's relevant to my use.... If Foo creates Bar completely internal then the unit tests (as in: how to use a Foo) would not show it, so you're golden.

I am talking about when Bar is internal and not part of the public API. You said I should look at the code to figure out how to use Foo. Do you actually mean "look at all of Foo's tests to figure out how to use Foo"?

----

Edit: I also wrote this up on the bus:

With regard to the suggestion I made about providing a comment that covers the next group of lines and you suggested that means the function is too long and should be split up:

OK, so first of all, I don't have a problem with somewhat long functions if they are able to be neatly divided; the real problem comes in if you start having long functions combined with complicated control structures. If I could take a 50-line function, cut it into fifths, wrap a function definition around it, pass in two or three parameters, and not really have to change anything else, then that function wasn't too bad to begin with.

And in some sense, you may not get much out of splitting it up. If the function was written in that way in the first place, then you probably don't need to call its parts from anywhere else. (Though to play Devil's advocate, putting in its own function would make it easier to see that it was available.)

A bigger benefit is that splitting it up actually makes sure that the function meets the criteria above -- that it's easy to split up. (After all, you did it!) It makes sure there aren't complicated dependencies between the different sections that make things hard to reason about in similar ways as global variables do (just on a smaller scale). I don't want to discount this benefit, but at the same time, I don't see it as all that significant of a factor.

By far the biggest benefit is that it makes that section independently testable. And if you actually write the tests for it -- that's great, and that is a good reason to split them up (and lose the comment I'm going to advocate for instead in a second). I consider myself reasonably good about testing, but I'm not quite that good -- for a couple reasons, at least in our code base testing private stuff is really obnoxious to do and I almost never do it. So practically speaking, I'd usually just test the outer function anyway.

So there are benefits to splitting it up, but there are drawbacks too -- particularly in terms of code size. I do most of my programming in either Python or C++. Particularly in C++, that sort of thing can quickly become annoying! Suppose you have a long function with no blank lines or comments. If you add a blank line and a one-line comment every 8 lines, you'll increase the code volume by 25%. If you split them into separate functions, in a language with }s you'll increase it by 50% at least unless you commit true code layout sins1. :-) My predominant layout style would produce at least a 88% increase2, and in a significant amount of the time, more than that3.

Spoiler:
1 1 blank line separating functions + 1 line for the function signature and opening brace + 1 line for the closing brace + 1 line for the call = 4 lines. "True layout sins" means putting the closing brace at the end of the preceeding line or not separating your functions with blank lines.

2 In addition to the above, I put a second blank line between functions, usually put the return type on its own line, and put the opening brace of functions on its own line, for 3 additional lines (total: 7).

3 There are two not-uncommon-in-my-C++ cases where I would need more. The first is if the extra functions want to access private members of a class, then the function needs an additional declaration line in the class definition (either to make it a member or a friend) -- and that's in an entirely different file! The second is that C++ type names can easily become quite long and make it quite realistic to have even a two- or three-argument function require multiple lines for the signature.


In other words, rather than

Code: Select all

int foo()
{
    int x = frob_the_grobnoblets();
    int y = sprinkle_the_grobnoblets_on_the_whizbiz();
    return rotate_whizbiz_into_snorb(x, y);
}

I'm more likely to do

Code: Select all

int foo()
{
    // Step 1: frob the grobnoblets
    ... [code from function]

    // Step 2: sprinkle the grobnoblets on the whizbiz
    ... [code from function]

    // Step 3: rotate the whizbiz into the snorb
    ... [code from function]
}

particularly if there are very few variable declarations in the top levels of the three steps.

If you ask me, my way is no less readable than splitting things up. I'm not saying that my way is good and splitting it up is bad, just that my way is reasonable, especially if you're not going to test the individual steps independently.

Edit again: Finally, you have mentioned a number of times both in this thread and the related one about how easy it is for the comments to get out of sync. But suppose I do what you promote and factor frob_the_grobnoblets() into its own function. How long will it be before that function name gets out of sync? It's pretty much the same thing, after all -- except that the barrier to fix it is actually higher in the function case, as I have to change frob_the_grobnoblets() in two or three places (the call, the definition, and possibly a non-defining declaration) instead of just one place.
Last edited by EvanED on Wed Oct 31, 2012 4:13 am UTC, edited 2 times in total.

jareds
Posts: 436
Joined: Wed Jan 03, 2007 3:56 pm UTC

Re: Commenting and Conventions

Postby jareds » Wed Oct 31, 2012 3:57 am UTC

DaveInsurgent wrote:If you have an open-source code sample (with tests) that I could play with, I'd love to take a stab at refactoring it without breaking it (yet with a limited understanding of what it actually does - that kind of would put my money where my...keyboard is).

I'm not EvanED, but I completely agree with him about this. I'm afraid that your position seems to be based on crappy comments in code that doesn't do anything complex. Fine, "@returns the customer" is useless. That is not an argument that comments are useless in general.

Here is a nice open source library based on automata theory which appears to has a good test suite.

I would imagine that most of, for example, re2/dfa.c would be difficult to understand with neither comments nor access to these articles by the author. (Note that in this case, the diagrams are in the articles rather than ASCII art, but I don't think that's important to the point, as you seem to be claiming that uncommented code together with tests should be self-documenting.

However, I also can't really imagine rewriting this library to be self-documenting without comments and/or articles explaining it.

DaveInsurgent
Posts: 207
Joined: Thu May 19, 2011 4:28 pm UTC
Location: Waterloo, Ontario

Re: Commenting and Conventions

Postby DaveInsurgent » Wed Oct 31, 2012 2:16 pm UTC

I am talking about when Bar is internal and not part of the public API. You said I should look at the code to figure out how to use Foo. Do you actually mean "look at all of Foo's tests to figure out how to use Foo"?


Yes, I do, though which tests will be based on how you want to use Foo (and that obviously depends on the quality of the tests, but for the sake of argument we're assuming excellent test quality).

especially if you're not going to test the individual steps independently.


You shouldn't test internal stuff. That's wrong. You test the class. The canonical example is the Money class. You test various aspects of manipulating Money objects. You're concerning yourself with testing function but tests should be about behavior and that behavior is determined by responsibility - looking very loosely at the dfa.cc from jareds, that class is way too big, and has way too much responsibility. If you see that there's a lot of stuff inside that could "go wrong" and you want to test it, that's a sign the class needs to be broken up. It can also be a sign that object-oriented programming is not the correct approach to the problem. That's why I specifically looked for an example in Scala (though something even more obscure might be good too). It seems to me like Automata is a case for a DSL (though I understand you're bound to C++ for whatever your current use is. I'm just saying).

particularly if there are very few variable declarations in the top levels of the three steps.


Though it does create a variable scoping issue that might prevent having simpler (but still descriptive) names, because if they were in the function they wouldn't collide with names of variables in other (now 'inlined') functions. Not saying that happens a lot, but it's something to consider.

Also, your code sample omits the actual content. You're presenting the change in the 'documentation' size-wise but that's not the only net change in lines. You can't compare the two that way and be fair - if you actually put in the [code from ... ] then your foo() is going to be larger - possibly several screen sizes. Even if it's not, if it's just 3-5 lines, it' still a matter of how it reads. The function names have to be something that means something... that's vague I know, but we're talking about something so abstract now... "frob_the_grobnoblets()" has to be a purposeful thing that makes sense from part of the requirements documentation for the software that way someone from the business side can say "umm, sometimes we notice that a grobnoblet is not frobed".

except that the barrier to fix it is actually higher in the function case, as I have to change frob_the_grobnoblets() in two or three places (the call, the definition, and possibly a non-defining declaration) instead of just one place.


So the argument about the naming being wrong is an interesting one. You're right that it is no different than a comment because it's just a textual label for a set of functionality - but I would argue that it's one you have to do anyway (at some level) meaning that you either have 'all functions" or "functions + comments", the later being more complicated by virtue of being non-homogenous. However, the changing it part is wrong: refactoring that function name is a single step in any modern IDE (and foregoing the use of tools is no argument)

..

I am interested now to take a look at both sources and see what I can come up with (or yield to). Right off the bat it seems wrong to have the Automaton as a thing. Isn't that like making an "Internet" or "WorldWideWeb" class in a piece of networking software? One of the reasons I am becoming less fond of OO in general.

User avatar
Mat
Posts: 414
Joined: Fri Apr 21, 2006 8:19 pm UTC
Location: London

Re: Commenting and Conventions

Postby Mat » Wed Oct 31, 2012 10:06 pm UTC

I guess I lean more towards commenting than most.

I don't subscribe to the idea that comments should be avoided because they will inevitably become inaccurate - I think this kind of attitude leads to very change-averse coding, hurting the long term maintainability of your software for the sake of avoiding mistakes. Poorly documented code should be treated the same as poorly written code: if you make a change that contradicts some comment, you've screwed up; if you regularly encounter crap comments and just ignore them, you're not doing your job properly.

Well structured code with good naming etc. goes a long way to communicating what your code does, but any non trivial code can be made more understandable with actual comments and/or documentation, as EvanED's examples demonstrate.

It annoys me when people say that it's up to to the people using/maintaining the code in the future to dig through their code and figure stuff out on their own etc. etc. I'm currently going through the pain of learning my way around a new (sparsely documented) code base, and seriously, fuck that shit. If future users of your code have to waste more time reverse engineering something than it would take you to document it, then you're being a dick. :evil:

Anyway, going back to the original question, here's my wishlist for code that I work with:

Libraries/packages/classes/etc:
If there is any chance anyone that is not you will need to use your code, write some actual documentation! As in, a guide for someone that someone who doesn't already know how everything works. Examples, things to watch out for, etc.
If you are intentionally writing something to be used once, there should be a basic explanation of what it does in the source. The less assumed knowledge the better.

Methods/functions:
A short comment explaining what its for, what is assumed/guaranteed of the input/output and anything else that aids in the generation of an API reference (e.g. javadoc). If you have to hunt down existing uses of the function to write new code, you fail. The name alone should be descriptive enough to guess its purpose - if you need to go to the definition to even understand existing code, you also fail.

Any smaller block of code that isn't obvious:
A comment explaining what it does and why. If the behaviour itself is complicated, tests for that.

Inline comments:
Not convinced there is ever a need for these.

Bugfixes:
Anything like "changed X to Y on date Z" or "fixed bug X" doesn't belong in the source. This kind of knowledge should be captured by check in comments and tests.

Tests as documentation is a cool idea but I think it depends on what tools you are using. I'm only vaguely familiar with rspec and its basic output format so I'm not sure what's out there, but I'd want tests to be a seamless part of the documentation, rather than an alternative to it.

User avatar
WanderingLinguist
Posts: 237
Joined: Tue May 22, 2012 5:14 pm UTC
Location: Seoul
Contact:

Re: Commenting and Conventions

Postby WanderingLinguist » Wed Oct 31, 2012 10:41 pm UTC

I guess I'd add that with modern IDEs, having good comments about usage on functions/methods can be helpful because many IDEs can display these comments at the point where the function is called (via mouseover or in a separate pane when the cursor is in the function name).

But "comment all functions" can a bad rule too if the comments are no good. Just for example, I've seen things like:

Code: Select all

/**
  * @param  a  Offset at which to delete characters
  * @param  b  Number of characters to delete
  */
void deleteChars( int a, int b );

...which is horrible for obvious reasons, and you could easily do away with the comment:

Code: Select all

void deleteChars( int offset, int length );


Another thing to keep in mind is the language barrier: If people with a variety of different naive languages will be working on the code base, it's very easy for someone to get lazier than usual and not update comments. I admit to being guilty of this myself; if there a comment that's not in English, it takes a lot of motivation for me to update it. Everyone says English is the language of programming, and most programmers I know can certainly get by enough in English to understand variables and functions named in English, but that often ends at comments. Variable and function names are essentially more concise and easy to read if it's not your native language; long comments might get skipped (and not updated).

That doesn't mean comments don't have their place, but I think it's good to keep them short and to the point and leave as much as possible to properly named variables, parameters, functions, etc.

jareds
Posts: 436
Joined: Wed Jan 03, 2007 3:56 pm UTC

Re: Commenting and Conventions

Postby jareds » Thu Nov 01, 2012 9:34 am UTC

DaveInsurgent wrote:
EvanED wrote:I am talking about when Bar is internal and not part of the public API. You said I should look at the code to figure out how to use Foo. Do you actually mean "look at all of Foo's tests to figure out how to use Foo"?


Yes, I do, though which tests will be based on how you want to use Foo (and that obviously depends on the quality of the tests, but for the sake of argument we're assuming excellent test quality).

So, the RE2 thing I linked to is a regular expression library. Here is some (useless?) documentation:
Unlike most automata-based engines, RE2 implements almost all the common Perl and PCRE features and syntactic sugars. It also finds the leftmost-first match, the same match that Perl would, and can return submatch information. The one significant exception is that RE2 drops support for backreferences¹ and generalized zero-width assertions, because they cannot be implemented efficiently. The syntax page gives full details.
So, if I want to use this library, am I better of wading through tests to determine that it's mostly PCRE without backreferences? Rather than looking at the syntax chart, if I'm writing a regular expression should I find each construct I want to use in some lengthy series of tests?
DaveInsurgent wrote:Right off the bat it seems wrong to have the Automaton as a thing. Isn't that like making an "Internet" or "WorldWideWeb" class in a piece of networking software?
No...they're quite finite things.
DaveInsurgent wrote:It can also be a sign that object-oriented programming is not the correct approach to the problem. That's why I specifically looked for an example in Scala (though something even more obscure might be good too). It seems to me like Automata is a case for a DSL (though I understand you're bound to C++ for whatever your current use is. I'm just saying).
That's not what's going on here. Nobody is interested in hard-coding automata like that Scala example. Our examples involving automata as objects to be built and manipulated. The RE2 library isn't a case for a DSL and/or it already implements a DSL and uses one internally. Are regular expressions perhaps a DSL in the first place? Do you have any examples of DSLs where the DSL implementation has no comments?
Let's review what's going on a very high level in RE2:
1. Start with a regular expression as a string.
2. It is parsed into a Regexp, an abstract syntax tree.
3. It is compiled in a Prog, a sequence of instructions for an automaton (this is referred to in the articles as a "virtual machine" for running automata; in this analogy, Prog is bytecode and DFA and NFA are two different VMs that can run it)
4. Try running the Prog with DFA, which works by building a state transition graph on the fly, with a cache to bound the memory use, and aborts if the cache is being flushed too much (because if that happens all the time it is just a slower NFA).
5. Try running the Prog with NFA, which is more straightforward.
Of course, each subpart would deserve it's own sub-overview; as well as some description of how regex submatches are handled. You can find this sort of thing in the articles, and then more detail in the comments. This is an excellent way to understand code, start with a high level overview and dig down into on the parts you're interested in.
Again, I can't even imagine how a reading a series of tests could give anyone the understanding of that short overview at the same speed. It seems like it would not be very much better than reading uncommented code

Here is an interesting question. Can tests have comments? Here is an excerpt from re2/testing/dfa_test.cc:

Code: Select all

// Generates and returns a string over binary alphabet {0,1} that contains
// all possible binary sequences of length n as subsequences.  The obvious
// brute force method would generate a string of length n * 2^n, but this
// generates a string of length n + 2^n - 1 called a De Bruijn cycle.
// See Knuth, The Art of Computer Programming, Vol 2, Exercise 3.2.2 #17.
// Such a string is useful for testing a DFA.  If you have a DFA
// where distinct last n bytes implies distinct states, then running on a
// DeBruijn string causes the DFA to need to create a new state at every
// position in the input, never reusing any states until it gets to the
// end of the string.  This is the worst possible case for DFA execution.
static string DeBruijnString(int n) {
  CHECK_LT(n, 8*sizeof(int));
  CHECK_GT(n, 0);

  vector<bool> did(1<<n);
  for (int i = 0; i < 1<<n; i++)
    did[i] = false;

  string s;
  for (int i = 0; i < n-1; i++)
    s.append("0");
  int bits = 0;
  int mask = (1<<n) - 1;
  for (int i = 0; i < (1<<n); i++) {
    bits <<= 1;
    bits &= mask;
    if (!did[bits|1]) {
      bits |= 1;
      s.append("1");
    } else {
      s.append("0");
    }
    CHECK(!did[bits]);
    did[bits] = true;
  }
  return s;
}

// Test that the DFA gets the right result even if it runs
// out of memory during a search.  The regular expression
// 0[01]{n}$ matches a binary string of 0s and 1s only if
// the (n+1)th-to-last character is a 0.  Matching this in
// a single forward pass (as done by the DFA) requires
// keeping one bit for each of the last n+1 characters
// (whether each was a 0), or 2^(n+1) possible states.
// If we run this regexp to search in a string that contains
// every possible n-character binary string as a substring,
// then it will have to run through at least 2^n states.
// States are big data structures -- certainly more than 1 byte --
// so if the DFA can search correctly while staying within a
// 2^n byte limit, it must be handling out-of-memory conditions
// gracefully.
How would you ever know why there was a test running the DFA on a De Bruijn string without these comments? Sure, you would have put DontRunOutOfMemory in the test name, but would you put "Regex That Needs 1bit Each For Last N Chars Thus Exponentially Many States When Run On De Bruijn String" (pretend that's camel case; I didn't want to get in trouble for with the mods) in the regex name and StringContainingAllDistinctNBitSequences in the De Bruijn function name so readers would understand why the test might be expected to run out of memory?

I think this is the crux of the problem. Tests tell you what code is supposed to do, but they don't tell you why it's doing what it's doing, so they are a poor format for conveying understanding.

snow5379
Posts: 247
Joined: Tue Apr 12, 2011 6:06 pm UTC

Re: Commenting and Conventions

Postby snow5379 » Thu Nov 01, 2012 10:08 am UTC

A lot of interesting thoughts and opinions. Thanks!

If you're applying for a job, and they ask you about comments and such... would you answer differently than you think? I'm not applying for a job or anything but I'm curious because I've read through the Java docs and there's like a 10:1 comment:code ratio. I've also peeked at some open source projects and it seems like every single line has a paragraph long comment.

Also what about spacing? I'm OCD and it bugs the heck out of me when people space their shit inconstantly: many times there's an arbitrary amount of spacing (2 spaces, then 3, then no spacing for a while), after every word/symbol/whatever. Then you have different files written by different people and the spacing looks so different it makes me frustrated.

DaveInsurgent
Posts: 207
Joined: Thu May 19, 2011 4:28 pm UTC
Location: Waterloo, Ontario

Re: Commenting and Conventions

Postby DaveInsurgent » Thu Nov 01, 2012 4:16 pm UTC

So, if I want to use this library, am I better of wading through tests to determine that it's mostly PCRE without backreferences?


Well, "PCRE without backreferenes" is more like a feature to me, so it should be part of the product documentation. I don't oppose that at all. There should also be an acceptance test around this feature, which have more textual documentation around them than just the unit test name. Think of it this way, if you needed a regex library with backreferences, and you had a list of 5 of libraries... would you want to jump in to their source code to start reading about what they can or can not do in comments at the top of whatever source file you can find (because the structure is whatever they want?)

I'd want that on a product page. When it comes time to know how to use the library, yes, I want to look at the unit tests, because if they run and they pass they will tell me in a manner that makes me confident that they work. It's both the documentation and a demonstration, if done correctly.

It's not just unit tests. A unit test shows you how one thing works. That's good if you need to understand how that one thing works. The replies thus far have been, to me, in the spirit of "Well, how the hell would you show how the entire product functions in a single test?" (or a significant feature of the product). That's not what unit tests are for. That's what acceptance tests are for, and they're higher level. You WOULD have an acceptance test for a regex library (as a product) that "does not support backreferences". But that functionality is the composition of a whole bunch of much, much smaller units of functionality that themselves work only on the very specific data pertinent to their sole responsibility.

No...they're quite finite things.


That doesn't mean it's correct to base your object model around them, or that object-oriented is the right approach.

Our examples involving automata as objects


Right, and as I just said (and said in the part you replied to): object oriented might not be the best way to represent it. I don't know. I'm bringing it up, not asserting it to be true.

Are regular expressions perhaps a DSL in the first place?


Sure, in which case their DSL (automata?) is used to create another one. There's nothing wrong with that.

I guess I'd add that with modern IDEs, having good comments about usage on functions/methods can be helpful because many IDEs can display these comments at the point where the function is called (via mouseover or in a separate pane when the cursor is in the function name).


Which inherits the problem of inaccuracy.

I don't subscribe to the idea that comments should be avoided because they will inevitably become inaccurate - I think this kind of attitude leads to very change-averse coding,


No, people who are not confident that the software works leads to change-averse coding. If you have sufficient tests (not just unit, but integration and acceptance) you can be confident enough to rewrite anything on a whim and as long as the tests pass, it works. Getting management to understand this is one thing, but getting developers to understand it is pretty easy. It's just hard to get there.

but any non trivial code can be made more understandable with actual comments and/or documentation


Basically, no. This is not what the People Smarter Than Us In Our Field are saying, namely Robert Martin.

If future users of your code have to waste more time reverse engineering something than it would take you to document it, then you're being a dick.


If they wrote such shitty code to begin with, what makes you think the comments would have been any better?

The other issue with comments is they're dually maintained. I've said this before but no one has acknowledged it. It's not just that they may be inaccurate. It's that in order to do them right, you have to do them right in addition to doing the code right. And in addition to doing the tests right.

If you're applying for a job, and they ask you about comments and such... would you answer differently than you think?


No, that's ridiculous. Assuming not starving or desperate, I'd answer exactly how I feel, with the caveat stated that I will adhere to whatever conventions the company wants. I'd also be asking them, I'd want to see some source or pair up with someone for an afternoon.. it's important to want to work there yourself as much as they want you to work there.

Rather than looking at the syntax chart, if I'm writing a regular expression should I find each construct I want to use in some lengthy series of tests?


I just want to reiterate (in case of skip-wall-of-text): I'm not against product documentation. What you're talking about is specifically that. The kind of source code comments I'm debating about only refer to the behavior of the code that they reside with. Documenting how the product works in the source is absurd and not even close to professional.

User avatar
ElWanderer
Posts: 287
Joined: Mon Dec 12, 2011 5:05 pm UTC

Re: Commenting and Conventions

Postby ElWanderer » Thu Nov 01, 2012 4:21 pm UTC

DaveInsurgent wrote:
snow5379 wrote:if(viewer!=null&&target!=null)return viewer.y-target.getHeight()/2;

Don't write code like this. Kittens, etc.

snow5379 wrote:Also what's wrong with this?

Code: Select all

   double offset_x()
   {
      if(viewer!=null)return zoom*viewer.x-screen.getWidth()/2;
      return 0;
   }

   double offset_y()
   {
      if(viewer!=null)return zoom*viewer.y-screen.getHeight()/2;
      return 0;
   }

It centers the viewer by subtracting half on each axis. But it can't work if there's no viewer to center so I check first if the viewer exists. Also I account for zoom.

What'swrongisthatyouhavecodethatreadslikethis.

Whitespace is your friend (as long as you're not using a language where whitespace matters...) For example, despite reading this thread yesterday, it's only this morning that I realised "y-target" is not a single token. I'd never had made that mistake if you had a space either side of that minus sign. Computers are quite good at parsing, but humans can fail horribly.

Also, the order of precedence in that calculation is not immediately obvious at a glance; I'd recommend adding braces as well as spaces so that you don't mis-interpret it when you return to it later to make another change. i.e.:

Code: Select all

return (zoom * viewer.x) - (screen.getWidth() / 2.0);
I've used 2.0 instead of 2 as you're working with doubles - otherwise if screen.getWidth() returns an integer then you may well find it does an integer division (where 1 / 2 = 0, as opposed to 1 / 2.0 = 0.5) then casts the result to a double when subtracting it from the result of (zoom * viewer.x)

Have you read that recent thread about multiple return statements? This is something I got "told off" for in my first ever code review! I must admit that I'm so used to "single return at the bottom" now, that when I looked at your offset functions, I mis-read them as always returning 0. I did not see the return statements buried in the long run-on lines. There's also a school of thought that each if() statement should have its own {}-block so that you can add and remove lines without accidentally destroying the code's flow.

Coming back to the original question about commenting, in the quote above, below the offset_x and offset_y functions, you said this: "It centers the viewer by subtracting half on each axis." That sounds like something that could possibly go into the code somewhere either as a comment or a function name, to explain those calculations. Your code itself uses the words offset and adjust, but nowhere says "center".

snow5379 wrote:Also what about spacing? I'm OCD and it bugs the heck out of me when people space their shit inconstantly: many times there's an arbitrary amount of spacing (2 spaces, then 3, then no spacing for a while), after every word/symbol/whatever. Then you have different files written by different people and the spacing looks so different it makes me frustrated.

Ah, spacing again. Going without spacing is bad because it can quickly become unreadable. I don't think you need more than one space at a time except where deliberately lining things up, nor do I think that every single token needs separating from the next by a space (e.g. you probably wouldn't want to write "object . memberFunction ( )" instead of "object.memberFunction()" each time). Add enough space so that things are readable, but don't add so many spaces it becomes hard to read again. I doubt there's a hard and fast rule that should be applied mercilessly.

Some people seem to have the habit of typing two spaces after a full-stop in prose, and maybe some carry that into coding. Multiple, inconsistent spacing has probably come from copying and pasting and not tidying up afterwards... though it could also be down to using tabs inconsistently (especially if people have different tab sizes defined in their editors - what you see might not be what they saw when they were writing). I hate tabs.
Now I am become Geoff, the destroyer of worlds

User avatar
Mat
Posts: 414
Joined: Fri Apr 21, 2006 8:19 pm UTC
Location: London

Re: Commenting and Conventions

Postby Mat » Thu Nov 01, 2012 8:46 pm UTC

DaveInsurgent wrote:
I don't subscribe to the idea that comments should be avoided because they will inevitably become inaccurate - I think this kind of attitude leads to very change-averse coding,


No, people who are not confident that the software works leads to change-averse coding. If you have sufficient tests (not just unit, but integration and acceptance) you can be confident enough to rewrite anything on a whim and as long as the tests pass, it works. Getting management to understand this is one thing, but getting developers to understand it is pretty easy. It's just hard to get there.

I totally agree. But even if you had perfect tests, that doesn't say anything about how manageable the code is. All it means is that any change you make has a very low risk of breaking, which shouldn't be your only goal.

but any non trivial code can be made more understandable with actual comments and/or documentation


Basically, no. This is not what the People Smarter Than Us In Our Field are saying, namely Robert Martin.

Well people smarter than us write better code, so that's hardly fair :) Can you elaborate on what his view is? I haven't read any of his books.

If future users of your code have to waste more time reverse engineering something than it would take you to document it, then you're being a dick.


If they wrote such shitty code to begin with, what makes you think the comments would have been any better?

Because the people who have contributed to it in the past have a great deal more knowledge about the product/domain than I do now, and I very much doubt that everyone who has ever worked on it is a worse programmer than I am. There are some structural issues, but on the whole it's decent software that works.

The other issue with comments is they're dually maintained. I've said this before but no one has acknowledged it. It's not just that they may be inaccurate. It's that in order to do them right, you have to do them right in addition to doing the code right. And in addition to doing the tests right.

Yeah that was what I meant by inaccurate, out of sync with the code. As with testing, I think the benefits outweigh the costs.

I just want to reiterate (in case of skip-wall-of-text): I'm not against product documentation. What you're talking about is specifically that. The kind of source code comments I'm debating about only refer to the behavior of the code that they reside with. Documenting how the product works in the source is absurd and not even close to professional.

If you are concerned about dual maintenance, surely external documentation is worse? What exactly is unprofessional about commenting how stuff works?

DaveInsurgent
Posts: 207
Joined: Thu May 19, 2011 4:28 pm UTC
Location: Waterloo, Ontario

Re: Commenting and Conventions

Postby DaveInsurgent » Thu Nov 01, 2012 9:43 pm UTC

"Stuff" is insufficiently descriptive. There are different kinds of documentation that need to be considered.

If you make a product that does [x], and in order to do so, it make use of a Foo, Bar, and a Baz then your [x] is a feature/product level documentation. There hopefully is a high level acceptance test to verify that the product indeed does [x], because [x] is a reason people make a decision to go with your product and if you don't do it, you're in trouble! That acceptance test is itself written in such a way that can be read, but I agree that you don't have to only use that test as a means of showing how your product works. But for any reasonably complex product, there's likely an entire team that handles that. Your engineers aren't the ones that write that.

Now, Foo, Bar, and Baz are "stuff" but they're individual pieces of behavior that by themselves don't make a product or a solution. You still have to test those. The people that read those tests are going to be other developers. You document those types of "stuff" with tests. Developers write them, and developers read them.

Software libraries obvious straddle a unique place where developers can both occupy an internal role (developing the product) and an external role (consuming the product to develop other products) but that doesn't change the nature or the roles that tests play: you can't expect product-level documentation out of a unit test. There's no ideal place to say "this library does not support back-references" in a source file. It belongs on the product site, and sure, it's desirable to generate that information - but I'd do so using the requirements content that the acceptance tests are based on (or the acceptance tests themselves). Not the unit tests and not the source code.

None of the replies, thus far, have successfully differentiated between "API documentation" (how to use it) from "product documentation" (what it can be used to do). I made the argument that the API should be self-documenting and do so via descriptive naming and purposeful, clear tests. The product documentation is a whole other ballgame.

So when one asks "How would I know that it doesn't support back-references when I go to use it?" I say, because you made the decision to use it based on that knowledge. You should be, by the time you're looking at the source, in a position where you just want to know how to use it for it's intended purpose. You get that from product docs. The source code is not a product doc.

I totally agree. But even if you had perfect tests, that doesn't say anything about how manageable the code is. All it means is that any change you make has a very low risk of breaking, which shouldn't be your only goal.


I'm not sure what you mean. The only point of writing software is to solve a problem. If the solution isn't broken, it's working software. In the meta-space of "bad code" which is itself a problem (but not a customer-facing one because the tests verify the software works), then yes, my only goal is to improve the code while keeping the product functional. (The assumption here is that's what I've been tasked or tasked myself to do: the business feels that we're having a hard time delivering new features, the development team says it's because the code is awful to work with. The business then finds it reasonable to start refactoring the code. Now one of two things is true: you're afraid to change it, or you're not.)

Can you elaborate on what his view is? I haven't read any of his books.


Basically what I've been saying, though with more tact and experience and the authority that comes with being a published author. I highly recommend "Clean Code" and "The Clean Coder".

Because the people who have contributed to it in the past have a great deal more knowledge about the product/domain than I do now, and I very much doubt that everyone who has ever worked on it is a worse programmer than I am. There are some structural issues, but on the whole it's decent software that works.


Sure, these are the 'dragons' where I work. They craft up some seriously intense stuff, and it's great that it works, but their comments aren't any less cryptic than their code.

User avatar
WanderingLinguist
Posts: 237
Joined: Tue May 22, 2012 5:14 pm UTC
Location: Seoul
Contact:

Re: Commenting and Conventions

Postby WanderingLinguist » Thu Nov 01, 2012 10:10 pm UTC

snow5379 wrote:A lot of interesting thoughts and opinions. Thanks!

If you're applying for a job, and they ask you about comments and such... would you answer differently than you think? I'm not applying for a job or anything but I'm curious because I've read through the Java docs and there's like a 10:1 comment:code ratio. I've also peeked at some open source projects and it seems like every single line has a paragraph long comment.

Also what about spacing? I'm OCD and it bugs the heck out of me when people space their shit inconstantly: many times there's an arbitrary amount of spacing (2 spaces, then 3, then no spacing for a while), after every word/symbol/whatever. Then you have different files written by different people and the spacing looks so different it makes me frustrated.


Well, as you noticed, everyone does it differently. In general, you want to space things so that it's readable (that is, don't mash it all together with no spaces), but exactly what is "readable" depends somewhat on the individual. If you're working on an existing code base, you should try to match the style of that code base as much as possible. If you're writing your own, use a style that you're comfortable with.

Personally, I tend to use either 0 or 1 spaces, unless I have a table in the code, in which case I'll add more spaces so things line up in columns). Generally, I use spaces in such a way that longer expressions are easier to read. If it gets too long and hard to take in at a glance, breaking it up into separate expressions and store intermediate values in temporary variables sometimes helps (if those intermediate variables have useful names)

For example, I would probably do this:

Code: Select all

(a*b + c*d) / (e*f + g*h)

rather than this:

Code: Select all

(a*b+c*d)/(e*f+g*h)


In this example, it's easy to figure out precedence in both cases, but for a more complex expression with lots of nested parenthesis it might help.

But basically, if your company has a standard, follow it. If the code base you're working on has an existing style, follow it.

And keep in mind that most IDEs provide an option to automatically re-format a code block (spacing, brace placement, etc.). Eclipse in particular has an incredible number of customizable options for code format; definitely worth checking out.

snow5379
Posts: 247
Joined: Tue Apr 12, 2011 6:06 pm UTC

Re: Commenting and Conventions

Postby snow5379 » Fri Nov 02, 2012 11:18 pm UTC

What about ternary operations? How would you split them up?

At the moment I have this:

Code: Select all

Image image(double a)
{
   return zoom==a?scaled:(scaled=tools.scale(image,image.getWidth(null)*(zoom=a),image.getHeight(null)*a));
}

Since I don't want to scale my images every time the screen is refreshed (oh god no) I only scale them when the player zooms in or out. However I'm having a hard time figuring out how to split up what would otherwise be a one line function. :(

EvanED
Posts: 4330
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI
Contact:

Re: Commenting and Conventions

Postby EvanED » Fri Nov 02, 2012 11:53 pm UTC

snow5379 wrote:What about ternary operations? How would you split them up?

At the moment I have this:

Code: Select all

Image image(double a)
{
   return zoom==a?scaled:(scaled=tools.scale(image,image.getWidth(null)*(zoom=a),image.getHeight(null)*a));
}

Since I don't want to scale my images every time the screen is refreshed (oh god no) I only scale them when the player zooms in or out. However I'm having a hard time figuring out how to split up what would otherwise be a one line function. :(

A ternary expression with embedded assignments? Really?

Please don't do side-effecting things (assignments and increment/decrement operators) as subexpressions in a larger expression -- they should be at the top level. (Exception: foo = bar = baz = 7 is fine. Possible exception: something like while((current = next_something()) != sentinel) { ... }.)

I would probably write that

Code: Select all

Image image(double new_zoom)
{
    if (new_zoom != zoom) {
        zoom = new_zoom;
        scaled = tools.scale(image, image.getWidth(null)*zoom, image.getHeight(null)*zoom));
    }
    return scaled;
}


The assignment to scaled could be formatted as

Code: Select all

        scaled = tools.scale(image,
                             image.getWidth(null) * zoom,
                             image.getHeight(null) * zoom));

User avatar
Mat
Posts: 414
Joined: Fri Apr 21, 2006 8:19 pm UTC
Location: London

Re: Commenting and Conventions

Postby Mat » Sat Nov 03, 2012 12:27 pm UTC

DaveInsurgent wrote:"Stuff" is insufficiently descriptive. There are different kinds of documentation that need to be considered.

If you make a product that does [x], and in order to do so, it make use of a Foo, Bar, and a Baz then your [x] is a feature/product level documentation. There hopefully is a high level acceptance test to verify that the product indeed does [x], because [x] is a reason people make a decision to go with your product and if you don't do it, you're in trouble! That acceptance test is itself written in such a way that can be read, but I agree that you don't have to only use that test as a means of showing how your product works. But for any reasonably complex product, there's likely an entire team that handles that. Your engineers aren't the ones that write that.

Now, Foo, Bar, and Baz are "stuff" but they're individual pieces of behavior that by themselves don't make a product or a solution. You still have to test those. The people that read those tests are going to be other developers. You document those types of "stuff" with tests. Developers write them, and developers read them.

Software libraries obvious straddle a unique place where developers can both occupy an internal role (developing the product) and an external role (consuming the product to develop other products) but that doesn't change the nature or the roles that tests play: you can't expect product-level documentation out of a unit test. There's no ideal place to say "this library does not support back-references" in a source file. It belongs on the product site, and sure, it's desirable to generate that information - but I'd do so using the requirements content that the acceptance tests are based on (or the acceptance tests themselves). Not the unit tests and not the source code.

None of the replies, thus far, have successfully differentiated between "API documentation" (how to use it) from "product documentation" (what it can be used to do). I made the argument that the API should be self-documenting and do so via descriptive naming and purposeful, clear tests. The product documentation is a whole other ballgame.

So when one asks "How would I know that it doesn't support back-references when I go to use it?" I say, because you made the decision to use it based on that knowledge. You should be, by the time you're looking at the source, in a position where you just want to know how to use it for it's intended purpose. You get that from product docs. The source code is not a product doc.

I admit I conflated a few things in my original post but I'm really talking about documentation aimed at developers, for components rather than the end product, for example API references like these:
http://ruby-doc.org/stdlib-1.9.3/libdoc/optparse/rdoc/OptionParser.html
http://docs.python.org/2/library/argparse.html
http://docs.oracle.com/javase/1.5.0/docs/api/java/util/Scanner.html

These are publicly available libraries so obviously the standard is higher than something for internal use, but how would you convey the same level of information with just code & tests?

I prefer to have this kind of thing generated from source comments, so you don't lose the connection between the two, although more detailed information could go in a separate document.

I totally agree. But even if you had perfect tests, that doesn't say anything about how manageable the code is. All it means is that any change you make has a very low risk of breaking, which shouldn't be your only goal.


I'm not sure what you mean. The only point of writing software is to solve a problem. If the solution isn't broken, it's working software. In the meta-space of "bad code" which is itself a problem (but not a customer-facing one because the tests verify the software works), then yes, my only goal is to improve the code while keeping the product functional. (The assumption here is that's what I've been tasked or tasked myself to do: the business feels that we're having a hard time delivering new features, the development team says it's because the code is awful to work with. The business then finds it reasonable to start refactoring the code. Now one of two things is true: you're afraid to change it, or you're not.)

Yes, the point of writing software is to solve a problem within some constraints, e.g. dev time. Whether it is reasonable to start refactoring depends on the cost of the refactoring itself, and the risk involved, as well as the benefits. Tests make this safer but not necessarily easier. Documentation/commenting makes it easier but not safer. The less you have of either one, the less likely you are to regularly refactor.

I'd be quite wary of experimental refactorings of something I didn't fully understand, and relying on the tests too much. They're a good safety net, but you're still likely to have gaps in coverage, and they may not capture non functional requirements very well, e.g. performance/security concerns. And you may not even know what it affects until you've invested some time and broken it. Better understanding the authors intent would save you from having to figure it out the hard way.

Because the people who have contributed to it in the past have a great deal more knowledge about the product/domain than I do now, and I very much doubt that everyone who has ever worked on it is a worse programmer than I am. There are some structural issues, but on the whole it's decent software that works.


Sure, these are the 'dragons' where I work. They craft up some seriously intense stuff, and it's great that it works, but their comments aren't any less cryptic than their code.

Actually some of it's pretty well commented (with ascii diagrams and everything!). My main problem is more finding the code I'm interested in. It's a very ajax heavy web application with various reusable ui components that fit together in funny ways, and it's sometimes difficult to track down how data propagates through the system, and how to reuse components in different contexts. Doesn't help that we use django, which is basically magic. :(

DaveInsurgent
Posts: 207
Joined: Thu May 19, 2011 4:28 pm UTC
Location: Waterloo, Ontario

Re: Commenting and Conventions

Postby DaveInsurgent » Sat Nov 03, 2012 2:58 pm UTC

I admit I conflated a few things in my original post but I'm really talking about documentation aimed at developers, for components rather than the end product,


Ah, but I said - developers are both sides of the coin sometimes. But still - for an API, the documentation you're referring to is product documentation. An API is a feature of a product - that library is a product.

So, right off the bad, I opened up that OptionParser page and I saw a wall of ASCII text. What. The. Fuck. ASCII text on an HTML page. I have no beef with a diagrams, but they should be things - they can be generated using code analysis, that's for sure, but if they need to represent something, use a fucking real diagram not characters.

The introduction/features are great examples of product documentation and I wish were accompanied by acceptance tests to prove they work. I don't know if they do, but, obviously we can make the assumption. It's just nice to know for sure.

Then we get in to the usual:

Code: Select all

version[W]
Version


type empty, redundant documentation ... and then of course the page comment has someone pointing out a bug regarding to text wrapping. There's of course no mention in any of the documentation how text wrapping would be handled.

These are publicly available libraries so obviously the standard is higher than something for internal use


I try not to allow that to dictate how I write software. Once you do it the "right" way, it's the same, regardless.

but how would you convey the same level of information with just code & tests?


They don't. They already have product documentation, and that's cool. I'm saying that you shouldn't comment the source code and expect that to drive your product documentation. Source code (files) represent small pieces of functionality that by themselves don't necessarily do something interesting or useful to you, the consumer (and yes, developers are consumers of libraries). Acceptance tests inherently span multiple units of functionality, so they're great. I think you could, if you're clever, generate product docs from acceptance test metadata, but I'm saying that for larger, more commercially viable projects, you likely have a team to do that. Doesn't it make sense that when your technical writer goes to document your system for the people that will consume it, that they go and look at your acceptance tests for the software release you're about to drop to verify the features are working? They shouldn't look at unit tests for that.

All of my advice goes hand in hand with having small classes, small functions, and the principles of SOLID applied to your work. So far the examples that might argue against me have interestingly all been parsing related - which, I doubt most people work on. Those are useful libraries, commonly used, but the actual act of making one is not something most of us would do as a means of gainful employment. They usually make up some small piece of a larger system. The regex library had over two thousand lines in a single source file. That's obviously wrong. It's not a matter of preference at that point.

so you don't lose the connection between the two


The connection is already lost. They (code, comments) are not connected at all except by location. I trust the senior people in the profession that this is more often the case than not. It was aptly pointed out already that function naming can also suffer the drift but it's one you can't avoid. It's a necessity. I don't want to undertake yet another case of non-functional maintenance.

Tests make this safer but not necessarily easier. Documentation/commenting makes it easier but not safer.


I hate to make the appeal so often, but again: the experts in our field disagree. The documentation/commenting offers little-to-no actual value, only perceived value, and it's often misleading. When I write software, I rewrite parts of it as I see fit. It's not something I do because "today is re-factoring day". I was tasked with fetching some additional data from an external system and at that point chose to break one class in to two and start using composition. I had to add some tests, but the unit tests for the class that got broken up (it stayed alive, just with less responsibility) remained. The reason I was able to do it so confidently was because of tests. All the comments in the world could have still yielded code that didn't work after what seemed like reasonable changes.

I'd be quite wary of experimental refactorings of something I didn't fully understand, and relying on the tests too much.


Well, no, tests aren't a substitute for understanding the problem domain you're working in. But if you understand the language and understand the responsibility of the piece of code (which, again, the tests verify... so read them) then you should be fine. You still have to understand things like binary compatibility and all that jazz. You can't randomly make changes to the code that need changes to the test - that's more "re-achitecture" than it is re-factoring.

To say it again: it's not experimenting if you have tests that prove it does the same thing as it did before.

My main problem is more finding the code I'm interested in.


Well, what part of the source code should they have put that documentation in? ;)


Code: Select all

Image image(double new_zoom)
{
    if (new_zoom != zoom) {
        zoom = new_zoom;
        scaled = tools.scale(image, image.getWidth(null)*zoom, image.getHeight(null)*zoom));
    }
    return scaled;
}


It's preferable to use positive checks as it's easier to traverse them when coming back later to the code. I realize you were making a small point - I'm just extending it:

Code: Select all

Image get_scaled_image(double new_zoom)
{
    scale_image_if_needed(new_zoom);

    return scaled_image;
}

scale_image_if_needed(double new_zoom) {
  if (new_zoom - current_zoom < zoom_epsilon) {
    return;
   }

   scale_image(new_zoom)
}

scale_image(double new_zoom) {
   scaled_image = tools.scale(original_image, new_zoom);
   current_zoom = new_zoom;
}


A few things:

1. I explicitly referenced original_image because it doesn't usually make sense to repeatedly scale the same object (lossy)
2. The current_zoom is only updated when the scaling operation is successful. This is a hidden bug that if an exception propagates up from the scale function, and is caught for some reason, the current_zoom
3. Generally not good to compare floating point values for equality like that. More appropriate to use an epislon.

All of those things can be tested/enforced with unit tests.

Another consideration is that, after this refactoring, it's exceedingly obvious that what you really have here is a typical lazy-value initialization. This can sometimes be better represented using aspect-oriented programming or proxying.

Another thought would be encapsulating Image in an Asset interface (such that it is an ImageAsset) - then you can stop whatever ungodly thing requires null value passing to propagate through your system.

snow5379
Posts: 247
Joined: Tue Apr 12, 2011 6:06 pm UTC

Re: Commenting and Conventions

Postby snow5379 » Sun Nov 04, 2012 3:19 am UTC

Are null values bad? I use them all the time.

Basically when the player loads a map all tiles start at null. If a tile is referenced and is null then it's built and the new tile is displayed. I don't like loading things up front because that causes a 0.5 second delay (my maps are 1000 by 1000 tiles) when changing maps which can be annoying as fuck to the player if they're running through a set of maps. If the tiles are loaded as needed on the other hand it takes less than one repaint cycle.

I use the same logic in just about every method: if it's null, load it. Otherwise use the already loaded one. I don't see the issue of "propagating null values" as you put it? o_O

Sometimes the load can fail (if a file is missing for example) and in that case a placeholder is generated: for example if a monster's image isn't loaded instead a black square will be shown.

DaveInsurgent
Posts: 207
Joined: Thu May 19, 2011 4:28 pm UTC
Location: Waterloo, Ontario

Re: Commenting and Conventions

Postby DaveInsurgent » Sun Nov 04, 2012 1:36 pm UTC

Passing null is considered sub optional because it carries with it a null check. That's what you did. Internal null is natural but try not to pass it and it shouldn't affect behavior (eg lazy load). Using null to control program flow is a recipe for problems

snow5379
Posts: 247
Joined: Tue Apr 12, 2011 6:06 pm UTC

Re: Commenting and Conventions

Postby snow5379 » Sun Nov 04, 2012 10:19 pm UTC

So... it's best to throw and exception than pass null?

Is having trys and catches really fundamentally different from having null checks? I think the latter is better because many functions process nulls just fine... and aren't trys and catches a bit expensive?

I don't know... it just seems weird that using nulls is considered "lazy" when it's damn efficient and easy to follow. I guess it's possible that trys and catches are easier to debug because you can print out exactly what is causing the function to not work... but many times it's damn obvious as there's only one thing that can break it.

DaveInsurgent
Posts: 207
Joined: Thu May 19, 2011 4:28 pm UTC
Location: Waterloo, Ontario

Re: Commenting and Conventions

Postby DaveInsurgent » Mon Nov 05, 2012 2:54 am UTC

Think about what you're asking.

Throwing an exception means something exceptional happened. It's a dramatic change in program flow. I would, for example, expect an exception if a string was being parsed in to a number and there was a character that was not understandable. That kind of parsing wouldn't happen in a processing-intensive part of your algorithm, so your concerns about the try-catch overhead are irrelevant.

Checking for null is not the same thing. It's a relatively minor change in program flow that then propagates everywhere. That or it's a mystery to debug. Your program structure will dictate what is best. But passing null as an argument is not the ideal way. You have to ask yourself why things were null to begin with. Like I said, lazy-load is a fine example of a time where an internal null would be expected. But it doesn't leave that location. It's not about the performance of the null check. It's about the effect it has on your system. Everywhere you now have to check if something is null, and what if it is?

C++ gives you references, which are a nice way of assuring that the thing is not null. It's a way of stopping the null check propagation (or ensuing bug from the lack thereof). Languages like Java give you automatic null checking, which only means your program breaks in a consistent way - not consistently. For that reason, passing null is considered risky and not preferable. The same is true for checking null in C++. If you encounter null, what do you do? Do you just not do something? Is that safe? It introduces some serious ambiguity about what the expected state of the system should be after coming across null. It requires thought as two why the program can get in to such a state to begin with, and how it might be altered so as to prevent that from happening or being necessary. Like I said, lazy-load is an example where you'd expect a null. But it is encapsulated. My default position is that nulls should remain internal and never be passed, and only with some critical thinking and a general sense of suspicion would I agree to let them escape.

I'm not saying you have to go hunt down every null check/pass in existence. But you should at least be aware that it's a strong sign that you should re-think what you're doing. As this thread has already said, there's very little in the way of "this is always wrong, never do it" type rules, but it's one of those things. It's like a pedo mustache. You can't go arrest someone for it, but it makes you wonder.


Return to “Coding”

Who is online

Users browsing this forum: No registered users and 5 guests