Unsigned integers


EvanED
Posts: 4331
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI

Unsigned integers

Postby EvanED » Mon Mar 24, 2008 4:34 am UTC

C has them, Java doesn't. Are they a good idea?

I know they are required sometimes; this is not what I'm talking about. I'm more concerned with a particular question. I'm creating classes that wrap a couple from the STL that do things a little differently. One thing that I've always been a little annoyed about is that the size() methods of all the STL classes return a size_t instead of a signed integer type. This means that either I need to turn off or ignore things like signed/unsigned comparison or use unsigned types myself. I've never been a fan of them when they aren't needed.

For instance, take a std::vector. There is almost no way that there could be enough elements in the vector so that it is out of the range of, say, a 'signed long' but in the range of an 'unsigned long'. The only way is if it were a vector of chars or something else one byte and took up half the memory space, or if it was a vector<bool> under an implementation that packed bits. If you make a vector<short>, if there were more than the max value of a 'signed long', it would be bigger than the process memory space. (Okay, you probably could make an implementation of std::vector that uses a file to back its storage. But show me a vendor that actually does and I'll mail you some cookies.)

Meanwhile, using unsigned numbers means that you have to deal with situations like the following otherwise-reasonable loop not working

Code:

for(unsigned int i = N - 1; i>=0 ; --i)
    ...
This seems like a good enough reason to prefer signed types unless you have a good reason the other way.

On the other hand, it's kind of nice to have the type describe that negative numbers aren't legal. However, without actual enforcement of that (I don't count wrapping -1 to 2^32-1 as enforcing that the value is not negative), I don't think that's enough of a reason to prefer it given the possibility for semi-latent bugs like that.
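A minimal sketch of what goes wrong in the loop above. The function name `countdown_values` and the `max_steps` cap are made up for illustration; the cap only exists because the real loop would never terminate.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Records the values `i` takes in `for (unsigned i = n - 1; i >= 0; --i)`,
// stopping after `max_steps` values because the real loop never ends.
std::vector<unsigned int> countdown_values(unsigned int n, std::size_t max_steps) {
    std::vector<unsigned int> seen;
    unsigned int i = n - 1;
    while (seen.size() < max_steps) {   // stands in for the always-true `i >= 0`
        seen.push_back(i);
        --i;                            // at 0 this wraps to the maximum value
    }
    return seen;
}
```

Starting from n = 2, the recorded values are 1, 0, and then the largest `unsigned int`, which is why the `i >= 0` test never fails.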

So I'm really asking two questions here. What are your thoughts on unsigned types in general and what do the C++ people think about making my wrapper classes return signed ints from size()?

ash.gti
Posts: 404
Joined: Thu Feb 07, 2008 1:18 am UTC
Location: Probably a coffee shop.

Re: Unsigned integers

Postby ash.gti » Mon Mar 24, 2008 5:08 am UTC

EvanED wrote:Meanwhile, using unsigned numbers means that you have to deal with situations like the following otherwise-reasonable loop not working
Code:
for(unsigned int i = N - 1; i>=0 ; --i)
    ...
This seems like a good enough reason to prefer signed types unless you have a good reason the other way.


What's wrong with

Code:

for (unsigned int i = N; i > 0; --i)

anyway?

Or even

Code:

for (unsigned int i = N; i != 0; --i)


I think unsigned ints are useful because there are times when you know a value should never be negative, and that narrows your margin of error. Yeah, you might have the case of subtracting below 0, but that should give you a run-time error and/or a warning.

When exactly can something have a negative size anyway? I can understand 0, but negative? I can't think of an instance of a *traditional* size() operation that should return negative unless you're overriding the size() functionality for something.
# drinks WAY to much espresso

EvanED
Posts: 4331
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI

Re: Unsigned integers

Postby EvanED » Mon Mar 24, 2008 5:46 am UTC

ash.gti wrote:What's wrong with

Code:

for (unsigned int i = N; i > 0; --i)

anyway?

Nothing, but then you need to use 'i-1' everywhere within the body of the loop, which could get annoying. Why don't you use 'for(int i=1 ; i<=N ; ++i)' when iterating forward instead of 'for(int i=0 ; i<N ; ++i)'? (Assuming you actually depend on the value and not just the number of iterations; but then I would prefer the counting up version anyway.)

A more compelling argument might be to use reverse iterators, as my motivating example is walking backward over an array.

Yeah, you might have the case of subtracting below 0, but that should give you a run-time error and/or a warning.

In an ideal language, yes. But C won't give you a runtime error in most implementations, and I've never known 'i--;' to produce a warning even though it could overflow.

When exactly can something have a negative size anyway? I can understand 0, but negative? I can't think of an instance of a *traditional* size() operation that should return negative unless you're overriding the size() functionality for something.

size() would never return a negative number. However, it would let you iterate over the array using a signed number without getting a warning.

segmentation fault
Posts: 1770
Joined: Wed Dec 05, 2007 4:10 pm UTC
Location: Nu Jersey

Re: Unsigned integers

Postby segmentation fault » Mon Mar 24, 2008 5:57 pm UTC

in terms of size functions, a size would never be negative, and it's a preventative measure as well as code that makes its purpose clear.
people are like LDL cholesterol for the internet

TomBot
Posts: 228
Joined: Sun Jul 29, 2007 1:17 am UTC
Location: Illinois (UIUC)

Re: Unsigned integers

Postby TomBot » Mon Mar 24, 2008 6:48 pm UTC

I will admit that looping to zero with unsigned integers is a pain. But is that really worth losing the capability to have >2GB vectors? Anyway, you're not supposed to use the indices directly anyway, because then it's a pain if you switch to something besides a vector. Try:

Code:

for(vector<Foo>::reverse_iterator i = v.rbegin(); i != v.rend(); ++i)
        do_something(*i);

EvanED
Posts: 4331
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI

Re: Unsigned integers

Postby EvanED » Mon Mar 24, 2008 7:28 pm UTC

TomBot wrote:I will admit that looping to zero with unsigned integers is a pain. But is that really worth losing the capability to have >2GB vectors?

How often does that arise? By default in Windows, you already can't have > 2GB vectors, because the upper half of the address space is not writable by the process! You have to modify boot.ini with the /3GB flag or however you do it.

Besides, it's not just whether you can have >2GB vectors, it's whether you can have more than 2 billion ELEMENTS. If your element size is even two bytes, your vector would have to be bigger than your address space before size() would return something not representable by signed numbers.

With other containers it's even worse, because there is more overhead; even if you make a list<char> say, the nodes will be at least 5 bytes, which means at least 8 bytes, which means that with just over 500 million entries you have exhausted a 32-bit address space.
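Working the numbers above as a sketch, assuming a 32-bit process with a 4 GiB address space. The constant and function names here are hypothetical, chosen only for this illustration.

```cpp
#include <cstdint>

// Bytes addressable by a 32-bit process (the assumption in the post above).
const std::uint64_t kAddressSpace32 = std::uint64_t(1) << 32;

// Smallest element count that no longer fits in a 32-bit signed integer.
const std::uint64_t kFirstCountBeyondInt32 = std::uint64_t(1) << 31;

// Bytes needed to store `count` elements of `element_size` bytes each.
std::uint64_t bytes_needed(std::uint64_t count, std::uint64_t element_size) {
    return count * element_size;
}

// True if that many elements could fit in a 32-bit address space at all.
bool fits_in_32bit_space(std::uint64_t count, std::uint64_t element_size) {
    return bytes_needed(count, element_size) < kAddressSpace32;
}
```

With one-byte elements, 2^31 of them take half the address space; with two-byte elements they already need all 4 GiB, and 2^29 (about 537 million) eight-byte list nodes do the same.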

So yes, I do think it is worth it to lose that capability to reduce the annoyances and bug potential caused by unsigned numbers. I find the argument that the unsignedness is self-documenting more compelling than that.

(Of course all this holds equally validly in a 64-bit address space as long as you're debating between 64-bit signed and unsigned types.)

Anyway, you're not supposed to use the indices directly anyway, because then it's a pain if you switch to something besides a vector. Try:

Code:

for(vector<Foo>::reverse_iterator i = v.rbegin(); i != v.rend(); ++i)
        do_something(*i);

I did say that was a better counterexample before. ;-)

However, using iterators is (until C++0x is common enough to start depending on it and 'auto') a lot of extra typing and code clutter for what I see as very little benefit. It's not uncommon that the "vector<Foo>::iterator" is long enough that either you have to introduce a local typedef or split the contents of the for loop onto multiple lines to keep it readable, especially if dependent types come into the picture and you have to use typename.

If I decide that I want to change from a vector at a later point, I can change the loop then. This has happened infrequently enough that I think this is a good tradeoff.

d3adf001
Posts: 1000
Joined: Thu Mar 29, 2007 4:27 pm UTC
Location: State College, PA

Re: Unsigned integers

Postby d3adf001 » Tue Mar 25, 2008 3:50 am UTC

couldn't you just use a double?

Rysto
Posts: 1460
Joined: Wed Mar 21, 2007 4:07 am UTC

Re: Unsigned integers

Postby Rysto » Tue Mar 25, 2008 4:23 am UTC

d3adf001 wrote:couldn't you just use a double?

No.

ash.gti
Posts: 404
Joined: Thu Feb 07, 2008 1:18 am UTC
Location: Probably a coffee shop.

Re: Unsigned integers

Postby ash.gti » Tue Mar 25, 2008 4:30 am UTC

d3adf001 wrote:couldn't you just use a double?


For many, many reasons a double would be a bad idea...

Yes, they do decimals, and have large ranges...

But for those exact reasons you don't need that type of variable here. The numbers returned from a size() should never be negative, or a decimal.

You can't have an array with 2 and a half elements... or an array with -4 elements... that just logically doesn't make sense.

segmentation fault
Posts: 1770
Joined: Wed Dec 05, 2007 4:10 pm UTC
Location: Nu Jersey

Re: Unsigned integers

Postby segmentation fault » Tue Mar 25, 2008 3:42 pm UTC

ash.gti wrote:You can't have an array with 2 and a half elements... or an array with -4 elements... that just logically doesn't make sense.


according to YOUR logic...

;)

evilbeanfiend
Posts: 2650
Joined: Tue Mar 13, 2007 7:05 am UTC
Location: the old world

Re: Unsigned integers

Postby evilbeanfiend » Tue Mar 25, 2008 3:57 pm UTC

unsigned shows intent better
signed allows (slightly) better run time error detection

is there any chance of an overflow that you would need to detect?

personally I'd go for unsigned, if only by the principle of least surprise, i.e. anyone using your class who has used the STL will expect size_t anyway.

personally I'm usually suspicious of signed/unsigned comparisons with size_t, as I suspect it means my signed int should be unsigned. but if you really need a signed/unsigned comparison, just cast one of them and quit grumbling
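A small sketch of why such comparisons deserve suspicion, and what "just cast one of them" can look like. It assumes a typical platform where size_t is at least as wide as int; both function names are hypothetical.

```cpp
#include <cstddef>

// What `a < b` does when a is an int and b is a std::size_t: a is converted
// to the unsigned type first. The cast is written out to make that visible.
bool less_as_unsigned(int a, std::size_t b) {
    return static_cast<std::size_t>(a) < b;
}

// Comparison that respects the sign of a: any negative value is smaller
// than every size, and only non-negative values are cast.
bool less_sign_aware(int a, std::size_t b) {
    return a < 0 || static_cast<std::size_t>(a) < b;
}
```

With a = -1 and b = 3, the first function reports that -1 is not less than 3, because -1 has become a huge unsigned value; the second gives the answer most readers expect.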
in ur beanz makin u eveel

Rysto
Posts: 1460
Joined: Wed Mar 21, 2007 4:07 am UTC

Re: Unsigned integers

Postby Rysto » Tue Mar 25, 2008 5:00 pm UTC

EvanED wrote:Nothing, but then you need to use 'i-1' everywhere within the body of the loop, which could get annoying. Why don't you use 'for(int i=1 ; i<=N ; ++i)' when iterating forward instead of 'for(int i=0 ; i<N ; ++i)'? (Assuming you actually depend on the value and not just the number of iterations; but then I would prefer the counting up version anyway.)

You could just do:

Code:

unsigned i = N;
while(i > 0) {
    i--;
    ...
}

segmentation fault
Posts: 1770
Joined: Wed Dec 05, 2007 4:10 pm UTC
Location: Nu Jersey

Re: Unsigned integers

Postby segmentation fault » Tue Mar 25, 2008 5:04 pm UTC

personally I'm completely against having a loop control variable for a while. use a for instead.

Rysto
Posts: 1460
Joined: Wed Mar 21, 2007 4:07 am UTC

Re: Unsigned integers

Postby Rysto » Tue Mar 25, 2008 5:08 pm UTC

That doesn't have the same semantics as a for loop. If you wanted to get clever you could do:

for(unsigned i = N; i-- > 0;)

But that's even worse IMO.
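For reference, a runnable sketch of that "clever" form applied to a vector. The function name `reversed_copy` is made up for this example.

```cpp
#include <cstddef>
#include <vector>

// Returns the elements of v in reverse order, walking backward with an
// unsigned index. The test `i-- > 0` compares first and decrements after,
// so the body sees size-1, size-2, ..., 0. The final decrement does wrap,
// but the loop has already exited by then.
std::vector<int> reversed_copy(const std::vector<int>& v) {
    std::vector<int> out;
    out.reserve(v.size());
    for (std::size_t i = v.size(); i-- > 0;) {
        out.push_back(v[i]);
    }
    return out;
}
```

The same loop is also safe for an empty vector, since the first test is `0 > 0` and the body never runs.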

ash.gti
Posts: 404
Joined: Thu Feb 07, 2008 1:18 am UTC
Location: Probably a coffee shop.

Re: Unsigned integers

Postby ash.gti » Tue Mar 25, 2008 5:27 pm UTC

segmentation fault wrote:
ash.gti wrote:You can't have an array with 2 and a half elements... or an array with -4 elements... that just logically doesn't make sense.


according to YOUR logic...

;)


But... My logic is ir~i~fruitible!!!

Who's ever heard of putting half a pigeon in a pigeon hole?! That's like sadistic or something...

/sarcasm_off

Is there anything in the C++0x about this?

I looked around their website and was impressed by a few of the things they plan on changing (like for loops), and when I glanced through a few documents they always used size_t for array size information.

So... assuming your size() function is for arrays, it would seem to me, in my limited understanding of C++0x, that you would want to return a size_t for consistency's sake.

This makes me want to actually try C++ again >< but I know the first thing that's going to annoy me about it, because it annoys me in every language that isn't designed around a single object root: the way objects work in C++.

coppro
Posts: 117
Joined: Mon Feb 04, 2008 6:04 am UTC

Re: Unsigned integers

Postby coppro » Wed Mar 26, 2008 3:30 pm UTC

What's the big deal with objects that aren't derived from a global root object class? If you are passing around objects of any type, you probably want to use templates, or some sort of clever wrapper class (like Boost.Any).

Yakk
Poster with most posts but no title.
Posts: 11115
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

Re: Unsigned integers

Postby Yakk » Wed Mar 26, 2008 4:03 pm UTC

The problem with unsigned ints is that they model "positive numbers" quite poorly.

They are actually numbers modulo 2^32. This they model very well. (Yes, the number 32 varies by implementation).

There are situations where the space of numbers modulo 2^32 are useful...

But if you are using those to model positive numbers, I would personally have removed all subtraction operations, added in a "distance between" operation, and had them behave more sanely when compared with signed integers.

If you must have subtraction operators, they don't return unsigned values.

As it stands, (3-4) > 0 in the world of unsigned ints. Clearly either ">" or "-" is behaving very strangely if you are viewing "unsigned int" as "positive integer".
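That claim is easy to check with a short sketch; the two function names are hypothetical.

```cpp
#include <limits>

// 3 - 4 computed in unsigned arithmetic: the result is reduced modulo
// 2^N, so it comes out as the largest unsigned value instead of -1.
unsigned int unsigned_difference(unsigned int a, unsigned int b) {
    return a - b;
}

// The same subtraction done in a wider signed type gives the value a
// reader thinking in terms of ordinary integers would expect.
long long signed_difference(unsigned int a, unsigned int b) {
    return static_cast<long long>(a) - static_cast<long long>(b);
}
```

Calling `unsigned_difference(3, 4)` yields a value greater than zero, while `signed_difference(3, 4)` yields -1.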

You can see that sort of thinking leaking into the stl with "pointer diff type" being distinct from size types. The difference between two sizes shouldn't be another size: it needs to be a larger signed type. If that was done,

Now, you can note that signed ints have similar problems at +/- ~2^31. However these tend to be far away from the area we are dealing with -- while zero is right next to the relatively low values that a "positive integer" type will often deal with. And with operator- sitting right there waiting to screw things up...

It is possible that we can generate a non-stupid positive integer type in C++ via template work that has next to no unrequired overhead.

Code:

template<typename T>
struct Positive {
  typedef larger_signed<T> diff_type;
  typedef Positive<T> my_type;
  typedef T base_type;

  diff_type operator-( my_type const& other );
  my_type operator+( my_type const& other );
// ... etc
};

with a bit of work, this will compile away, except when you take unsigned int - unsigned int you get a long long.

It isn't perfect, but at least it doesn't explode when touched on the wrong side.
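One way the outline above could be fleshed out, as a sketch only. The `larger_signed` trait is not a standard facility; it is assumed here and defined by hand for three fixed-width types, and overflow on addition is deliberately left unchecked.

```cpp
#include <cstdint>

// Hypothetical trait mapping an unsigned type to a wider signed type.
template <typename T> struct larger_signed;
template <> struct larger_signed<std::uint8_t>  { typedef std::int16_t type; };
template <> struct larger_signed<std::uint16_t> { typedef std::int32_t type; };
template <> struct larger_signed<std::uint32_t> { typedef std::int64_t type; };

template <typename T>
struct Positive {
    typedef typename larger_signed<T>::type diff_type;

    T value;
    explicit Positive(T v) : value(v) {}

    // Subtraction yields the wider signed type, so 3 - 4 is -1.
    diff_type operator-(const Positive& other) const {
        return static_cast<diff_type>(value) - static_cast<diff_type>(other.value);
    }

    // Addition stays in the unsigned domain (overflow is not checked here).
    Positive operator+(const Positive& other) const {
        return Positive(static_cast<T>(value + other.value));
    }
};
```

Since the wrapper holds a single integer and the operators are trivially inlinable, a compiler can reasonably be expected to reduce this to plain integer arithmetic, though that is not guaranteed.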
One of the painful things about our time is that those who feel certainty are stupid, and those with any imagination and understanding are filled with doubt and indecision - BR

Last edited by JHVH on Fri Oct 23, 4004 BCE 6:17 pm, edited 6 times in total.

ash.gti
Posts: 404
Joined: Thu Feb 07, 2008 1:18 am UTC
Location: Probably a coffee shop.

Re: Unsigned integers

Postby ash.gti » Wed Mar 26, 2008 4:07 pm UTC

*Off topic*
coppro wrote:What's the big deal with objects that aren't derived from a global root object class? If you are passing around objects of any type, you probably want to use templates, or some sort of clever wrapper class (like Boost.Any).


If everything is an object you can chain calls like in Smalltalk or Ruby.

It lets you for instance:

Code:

arrayA, arrayB = ['hi', 'test', 2, 4, [ 'sub array', 5, 7 ]], [1,2,4]
# makes an array with a sub array and a 2nd array
puts arrayA.concat(arrayB).flatten.uniq * ','

# outputs: hi,test,2,4,sub array,5,7,1


I like being able to do things like that.

I suppose if all of the data types returned by functions were wrapped in classes then this could be possible, but not nearly as easy...

I also like closures, which C++ doesn't support. Again, Ruby and Smalltalk do... so... I can say I already know which parts of C++ will bug me if I get back into it. Don't get me wrong, C++ is great and powerful, but just not my favorite.

EvanED
Posts: 4331
Joined: Mon Aug 07, 2006 6:28 am UTC
Location: Madison, WI

Re: Unsigned integers

Postby EvanED » Wed Mar 26, 2008 5:27 pm UTC

ash.gti wrote:*Off topic*
coppro wrote:What's the big deal with objects that aren't derived from a global root object class? If you are passing around objects of any type, you probably want to use templates, or some sort of clever wrapper class (like Boost.Any).


If everything is an object you can chain calls like in Smalltalk or Ruby.

It lets you for instance:

Code:

arrayA, arrayB = ['hi', 'test', 2, 4, [ 'sub array', 5, 7 ]], [1,2,4]
# makes an array with a sub array and a 2nd array
puts arrayA.concat(arrayB).flatten.uniq * ','

# outputs: hi,test,2,4,sub array,5,7,1


I like being able to do things like that.

That seems like a separate problem, because you can do that in C++ too. In fact, that's how things as simple as cout << "Hello world" << endl; work, since that's essentially cout.operator<<("Hello world").operator<<(endl), or even more illustratively (if incorrectly) cout.output("Hello world").output(endl).

The fact that people don't do it very commonly seems to speak more about the conventions of use (and, granted, the standard library) than it does about the language.

I also like closures, which C++ doesn't support. Again, ruby and smalltalk do... so... I can say I already know which parts of C++ will bug me if I get back into it. Don't get me wrong, C++ is great and powerful, but just not my favorite.

C++0x will introduce closures to the language, which is pretty cool.

Boost also adds some library support for them... depending on how complicated the closure is, the syntax seems to range from "simpler and more compact than either Lisp or ML" (I don't know Ruby or Smalltalk syntax) to "a crime against humanity", usually tending toward one of those extremes. ;-)

For instance, if you wanted to print every item in an STL collection, you could do for_each(c.begin(), c.end(), cout << _1 << " "). Contrasted with an ML-like closure syntax: for_each(c.begin(), c.end(), fn arg => cout << arg << " ") or Lisp-like: for_each(c.begin(), c.end(), (lambda (arg) cout << arg << " ")). You don't need to explicitly begin the closure with an fn or a lambda, and you don't need to provide an explicit argument list, because they are always called _1, _2, etc.

At the same time, if you want to start referring to local variables, or delaying computations to the point the closure is evaluated, or write something that should be multiple statements, the lack of language support quickly moves into the "crime against humanity" category.

(As one more point of illustration, the syntax with C++0x closures: for_each(c.begin(), c.end(), <>(int arg)(cout << arg << " ")). I don't know whether you can omit the explicit argument type; I just looked up the syntax quickly on Wikipedia.)

Yakk
Poster with most posts but no title.
Posts: 11115
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

Re: Unsigned integers

Postby Yakk » Wed Mar 26, 2008 6:44 pm UTC

Iteration 1 of closures will require typed (and named) arguments. There are some extensions planned for later iterations that make this requirement useful.

You also have control over which local arguments come in, and if they come in via value or via reference.

Code:

<>(int foo) { cout << foo; };
// vs:
<&>(int foo) { total+= foo; cout << foo; }


The first imports nothing, the second imports everything in local scope by reference. You can import everything by value, or individual variables by name or value as well.
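The `<>` form quoted above is draft C++0x syntax. The version that was standardized in C++11 puts the capture list in square brackets instead. A sketch of the same two ideas in that syntax follows; the function names `join_with_spaces` and `sum_all` are hypothetical, and an `ostringstream` stands in for `cout` so the result can be inspected.

```cpp
#include <algorithm>
#include <sstream>
#include <string>
#include <vector>

// Writes each element followed by a space. The lambda captures only `out`,
// and does so by reference.
std::string join_with_spaces(const std::vector<int>& c) {
    std::ostringstream out;
    std::for_each(c.begin(), c.end(), [&out](int arg) { out << arg << " "; });
    return out.str();
}

// Sums the elements. `[&]` captures every local the body uses by reference,
// which is what lets the lambda update `total`.
int sum_all(const std::vector<int>& c) {
    int total = 0;
    std::for_each(c.begin(), c.end(), [&](int foo) { total += foo; });
    return total;
}
```

An empty capture list `[]` imports nothing, and `[=]` captures by value, matching the options described in the post.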

With concept maps and template mojo, you'll be able to say "this container contains copies of things that have the following operations on them", and allow someone who has a type that doesn't match your requirements set up a remapping from your operation to how to do it on the type (without modifying the type).

With a bit of effort in 0x, you'll be able to create a "has basic message-based methods" concept, and then concept_map the base types into that category, and create a "your private base type" object that uses the concept_map to know how to redirect queries to the right actions.

That would be neat. :)

AdShea
Posts: 14
Joined: Tue Apr 24, 2007 9:11 pm UTC

Re: Unsigned integers

Postby AdShea » Wed May 14, 2008 11:46 am UTC

I think you're missing the point of unsigned values. They're mainly used for interacting with low-level systems. Character codes are unsigned. Memory sizes are unsigned. Disk sectors are unsigned (and there can be well over 2^32 blocks on a disk, as we all know). Memory addresses are unsigned. Also, the "positive integers mod 2^n" implementation is very useful for things like hardware timers and power-of-2 circular buffers. Maybe it's all the microcode I've been writing lately, but there are much faster and more efficient ways to do things if you make use of unsigned ints (usually only 8 bits) rather than doing it in the CS textbook manner with oversized, fully signed integers.
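A sketch of the two uses mentioned, a wrapping timer and a power-of-2 circular buffer. The names and the capacity of 8 are assumptions for illustration, not taken from any particular hardware.

```cpp
#include <cstdint>

// Elapsed ticks on a free-running 32-bit counter. Because unsigned
// subtraction is modulo 2^32, the result is still right if the counter
// wrapped once between `start` and `now`.
std::uint32_t elapsed_ticks(std::uint32_t start, std::uint32_t now) {
    return now - start;
}

// Hypothetical buffer capacity; the masking trick needs a power of two.
const std::uint32_t kCapacity = 8;

// Slot for an ever-increasing position counter. Masking with capacity - 1
// replaces the modulo operation and keeps working when `position` wraps.
std::uint32_t ring_slot(std::uint32_t position) {
    return position & (kCapacity - 1);
}
```

For example, a counter that read 0xFFFFFFF0 at the start and 0x10 now has advanced by 0x20 ticks, and the subtraction gives exactly that.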

Yakk
Poster with most posts but no title.
Posts: 11115
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

Re: Unsigned integers

Postby Yakk » Wed May 14, 2008 6:55 pm UTC

Sure -- and in those isolated cases where you want to deal with that stuff, use the data structure that is numbers modulo 2^8.

Having unsigned integers in the language is a great thing. Using them in every case that you expect your value to be positive is a bad idea, as there are a myriad of small bugs that can leak in.

Sc4Freak
Posts: 673
Joined: Thu Jul 12, 2007 4:50 am UTC
Location: Redmond, Washington

Re: Unsigned integers

Postby Sc4Freak » Wed May 21, 2008 11:09 am UTC

EvanED wrote:C has them, Java doesn't. Are they a good idea?

I know they are required sometimes; this is not what I'm talking about. I'm more concerned with a particular question. I'm creating classes that wrap a couple from the STL that do things a little differently. One thing that I've always been a little annoyed about is that the size() methods of all the STL classes return a size_t instead of a signed integer type. This means that either I need to turn off or ignore things like signed/unsigned comparison or use unsigned types myself. I've never been a fan of them when they aren't needed.

For instance, take a std::vector. There is almost no way that there could be enough elements in the vector so that it is out of the range of, say, a 'signed long' but in the range of an 'unsigned long'. The only way is if it were a vector of chars or something else one byte and took up half the memory space, or if it was a vector<bool> under an implementation that packed bits. If you make a vector<short>, if there were more than the max value of a 'signed long', it would be bigger than the process memory space. (Okay, you probably could make an implementation of std::vector that uses a file to back its storage. But show me a vendor that actually does and I'll mail you some cookies.)

Meanwhile, using unsigned numbers means that you have to deal with situations like the following otherwise-reasonable loop not working

Code:

for(unsigned int i = N - 1; i>=0 ; --i)
    ...
This seems like a good enough reason to prefer signed types unless you have a good reason the other way.

On the other hand, it's kind of nice to have the type describe that negative numbers aren't legal. However, without actual enforcement of that (I don't count wrapping -1 to 2^32-1 as enforcing that the value is not negative), I don't think that's enough of a reason to prefer it given the possibility for semi-latent bugs like that.

So I'm really asking two questions here. What are your thoughts on unsigned types in general and what do the C++ people think about making my wrapper classes return signed ints from size()?

There are reasons for size_t rather than just int or unsigned int. If you're writing a loop counter using size(), the type of your index should be size_t, not unsigned int or int. While what you're saying makes sense today on a 32-bit architecture, you can potentially run into problems in the future when using 64-bit.

In the Microsoft Windows world, size_t is 64 bits when compiled as a 64-bit application, whereas ints and unsigned ints remain 32 bits. On a 64-bit platform, it is entirely reasonable to have more than 4 billion elements in an array (or 2 billion for a signed int). Using an int when you should be using a size_t can cause problems.
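A sketch of the kind of problem meant here. Fixed-width types stand in for a 64-bit size_t and a 32-bit unsigned int so the example behaves the same everywhere; the count of five billion and the names are hypothetical.

```cpp
#include <cstdint>

// An element count that needs more than 32 bits (hypothetical value).
const std::uint64_t kBigCount = 5000000000ULL;

// Storing a 64-bit count in a 32-bit unsigned keeps only the low 32 bits,
// silently producing a much smaller number.
std::uint32_t truncate_to_32(std::uint64_t count) {
    return static_cast<std::uint32_t>(count);
}

// Check that a count survives the narrowing unchanged.
bool fits_in_32_bits(std::uint64_t count) {
    return count <= 0xFFFFFFFFULL;
}
```

Five billion truncates to 705032704, so a loop bounded by the narrowed value would stop long before reaching the end of the container.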

evilbeanfiend
Posts: 2650
Joined: Tue Mar 13, 2007 7:05 am UTC
Location: the old world

Re: Unsigned integers

Postby evilbeanfiend » Wed May 21, 2008 2:00 pm UTC

Yakk wrote:Sure -- and in those isolated cases where you want to deal with that stuff, use the data structure that is numbers modulo 2^8.

Having unsigned integers in the language is a great thing. Using them in every case that you expect your value to be positive is a bad idea, as there are a myriad of small bugs that can leak in.


iirc size_t is guaranteed in the standard to always be big enough to not wrap the size of a container, so I'm not sure there really is a problem here? the only way you would realistically wrap a size_t is when trying to combine two very large containers, at which point you will get exceptions thrown, i think (bad_alloc?)

of course, whether you do signed or unsigned addition doesn't affect whether you can detect if the value has wrapped; signed just makes the check slightly simpler and more intuitive.
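For the unsigned case, a wrap check can be sketched as below; the function names are hypothetical. Note that in C++ signed overflow is undefined behavior, so for signed types a check generally has to be done before the addition, along the lines of the second function.

```cpp
#include <cstddef>
#include <limits>

// Unsigned addition wrapped if and only if the sum is smaller than an operand.
bool add_wraps(std::size_t a, std::size_t b) {
    return a + b < a;
}

// Equivalent check performed before adding: is there room left for b?
bool add_would_overflow(std::size_t a, std::size_t b) {
    return b > std::numeric_limits<std::size_t>::max() - a;
}
```

Both functions agree on every input; the second never performs the overflowing operation at all.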

Yakk
Poster with most posts but no title.
Posts: 11115
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

Re: Unsigned integers

Postby Yakk » Wed May 21, 2008 3:09 pm UTC

The problem is that size_t has an operator- that can generate utter crap in regions near where size_t (ie, small values) is typically used, generating relatively common bugs. What is worse is that these bugs sometimes end up being relatively subtle logic errors.

Using int instead is, of course, also bad -- less due to the fact that an int isn't big enough under standard systems to fit the data in a container, and more because of the incoming 64 bit issue.

I didn't say there was a good, easy solution -- I said unsigned integers suck. :)

Berengal
Superabacus Mystic of the First Rank
Posts: 2707
Joined: Thu May 24, 2007 5:51 am UTC
Location: Bergen, Norway

Re: Unsigned integers

Postby Berengal » Wed May 21, 2008 4:10 pm UTC

Well now, there are times when a number shouldn't be negative, and there are times when a number cannot be negative. For the latter, I use unsigned numbers. For example, need a random positive number? "unsigned int random = rand();" In Java, that's "int random = prng.nextInt() & 0x7fffffff;".

Of course, apart from that example, I've never had use for unsigned numbers. I suspect they could be useful for wrapping around large arrays if you have need for that, but that's no worse than "number %= length;"
It is practically impossible to teach good programming to students who are motivated by money: As potential programmers they are mentally mutilated beyond hope of regeneration.

Sc4Freak
Posts: 673
Joined: Thu Jul 12, 2007 4:50 am UTC
Location: Redmond, Washington

Re: Unsigned integers

Postby Sc4Freak » Thu May 22, 2008 3:49 am UTC

Yakk wrote:The problem is that size_t has an operator- that can generate utter crap in regions near where size_t (ie, small values) is typically used, generating relatively common bugs. What is worse is that these bugs sometimes end up being relatively subtle logic errors.

What? Depending on what your compiler decides to do, size_t should just be a typedef for a regular built-in primitive type.

Yakk
Poster with most posts but no title.
Posts: 11115
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

Re: Unsigned integers

Postby Yakk » Thu May 22, 2008 3:58 am UTC

Sc4Freak wrote:
Yakk wrote:The problem is that size_t has an operator- that can generate utter crap in regions near where size_t (ie, small values) is typically used, generating relatively common bugs. What is worse is that these bugs sometimes end up being relatively subtle logic errors.

What? Depending on what your compiler decides to do, size_t should just be a typedef for a regular built-in primitive type.

Yes. And, as noted, unsigned integers can generate complete crap when operator- is run on them.

size_t isn't an abstraction of "integers modulo 2^K", it is supposed to be an abstraction of size.

The difference between two sizes is not a very large positive value.

There is a serious abstraction leak.

unsigned integers are integers modulo 2^K in C/C++. When you want to treat them as positive integers, the operator- has a tendency to generate crappy results that don't line up with what happens when you subtract two positive integers. As such, operator- has serious flaws if you are attempting to use unsigned integers as positive integers.

(Didn't I say this already?)

This, in theory, could be fixed by removing operator- when you are modeling positive integers, and/or replacing it with an operator that generates errors when a loop around case occurs. The C++ code isn't that tricky.

size_t has the problem that it is often implemented as an unsigned integer, yet it leaves the naked operator- around. Which makes it a poor choice for everyday use, as the operator- generates crappy results in a number of simple cases.

Sc4Freak
Posts: 673
Joined: Thu Jul 12, 2007 4:50 am UTC
Location: Redmond, Washington

Re: Unsigned integers

Postby Sc4Freak » Thu May 22, 2008 5:05 am UTC

Ah, I see what you're saying. Because size_t is unsigned, a - b where a < b will result in a large positive number.

Yakk wrote:This, in theory, could be fixed by removing operator- when you are modeling positive integers, and/or replacing it with an operator that generates errors when a loop around case occurs. The C++ code isn't that tricky.

Unfortunately, that's not possible. Since size_t is defined in terms of built-in primitive types, you can't redefine operators for it. So it'll always have to use the default subtraction operator provided for whatever built-in type it's typedef'd as.

Yakk
Poster with most posts but no title.
Posts: 11115
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

Re: Unsigned integers

Postby Yakk » Thu May 22, 2008 3:17 pm UTC

Code: Select all

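#include <cstddef>

struct sane_size_t {
  std::size_t raw_value;
  // Implicit conversion from size_t; the compiler-generated copy constructor is enough.
  sane_size_t(std::size_t raw_value_) : raw_value(raw_value_) {}
  sane_size_t operator+(sane_size_t other) const { return raw_value + other.raw_value; }
  sane_size_t& operator+=(sane_size_t other) { raw_value += other.raw_value; return *this; }
  sane_size_t operator*(sane_size_t other) const { return raw_value * other.raw_value; }
  sane_size_t& operator*=(sane_size_t other) { raw_value *= other.raw_value; return *this; }
  sane_size_t& operator++() { ++raw_value; return *this; }
  bool operator!=(sane_size_t other) const { return raw_value != other.raw_value; }
  // Deliberately no operator-. This is always larger minus smaller, so it cannot wrap.
  sane_size_t distance_between(sane_size_t other) const {
    return raw_value > other.raw_value ? raw_value - other.raw_value
                                       : other.raw_value - raw_value;
  }
  // ... remaining operators you might want for a positive integer ...
};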

Don't include an implicit conversion from sane_size_t to size_t, but have an implicit conversion from size_t to sane_size_t.

Include every operator you might want for a positive integer. distance_between is sort of like subtraction, but a.distance_between(b) == b.distance_between(a) -- i.e., it is always larger minus smaller.

a.delta(b) might also exist, but it returns a signed integer, and either the difference type is large enough to deal with the large possible values, or it errors out if you try too large a subtraction.
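A rough sketch of the "errors out" flavour of delta, written as a free function on raw size_t values so it stands alone. The name checked_delta, the use of std::ptrdiff_t as the signed result type, and throwing std::overflow_error are all assumptions made for the example.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>

// Hypothetical helper: signed difference a - b, throwing if it does not fit.
std::ptrdiff_t checked_delta(std::size_t a, std::size_t b) {
  const std::size_t max_magnitude =
      static_cast<std::size_t>(std::numeric_limits<std::ptrdiff_t>::max());
  // Work out the magnitude first; larger minus smaller never wraps.
  const std::size_t magnitude = (a >= b) ? a - b : b - a;
  if (magnitude > max_magnitude) {
    throw std::overflow_error("checked_delta: difference does not fit");
  }
  const std::ptrdiff_t result = static_cast<std::ptrdiff_t>(magnitude);
  return (a >= b) ? result : -result;
}
```

This version is slightly conservative: it also rejects the single most negative value of std::ptrdiff_t, which keeps the negation trivially safe.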

Now, while the language provides you with size_ts at various points, you only make variables of type sane_size_t to store them.

Code: Select all

for (sane_size_t item = 0; item != vector_object.size(); ++item) {
  // .. code ..
}

The code that uses item almost certainly ends up inlining the operator methods of item, so it compiles down to plain unsigned arithmetic -- except in the few cases where things are kooky (subtraction), which can no longer occur silently and blow up in your face.

I doubt that the standard gives the compiler implementors sufficient leeway to make a sane operator- on size_t. Implementing it as a signed integer isn't an option either, since the standard requires size_t to be an unsigned integer type.

evilbeanfiend
Posts: 2650
Joined: Tue Mar 13, 2007 7:05 am UTC
Location: the old world

Re: Unsigned integers

Postby evilbeanfiend » Thu May 22, 2008 3:54 pm UTC

Sc4Freak wrote:Ah, I see what you're saying. Because size_t is unsigned, a - b where a < b will result in a large positive number.


yes but that is only more dangerous (in terms of logic errors) if the large positive value is actually a valid index into the container, which while not impossible is probably not a case we really need to overly concern ourselves with. otherwise it will blow up in pretty much the same way as the signed version or can be trapped in pretty much the same way as the signed version.

i.e. you are trading a doubling of your possible container sizes against the risk of logic errors from size_t manipulation, and that risk only applies to the larger container sizes. if you don't use them there is no additional risk, so overall it's a gain. unsigned also shows the intent of the value better.

the only real downside i can see is that the test for wrapping is slightly less intuitive (and potentially slower, but we are in micro optimization territory then) than a test for negativity if you want to trap it (but then you should probably be using .at() et al with an index anyway if you want to trap out of range ones)
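a small sketch of the .at() point, assuming a std::vector<int>. the helper name index_is_trapped is made up for the example.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if indexing with the (possibly wrapped) a - b was rejected by at().
bool index_is_trapped(const std::vector<int>& values, std::size_t a, std::size_t b) {
  try {
    (void)values.at(a - b);  // at() bounds-checks; operator[] would not
    return false;
  } catch (const std::out_of_range&) {
    return true;
  }
}
```

with a three-element vector, index_is_trapped(v, 1, 2) is true because the wrapped index is far past the end, much as a negative signed index would be rejected.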
in ur beanz makin u eveel

Yakk
Poster with most posts but no title.
Posts: 11115
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

Re: Unsigned integers

Postby Yakk » Thu May 22, 2008 5:14 pm UTC

I'm talking about logic errors, not accessing memory beyond the ends of a container errors.

a-b does not mean "a-b" when the types of a and b are unsigned in a rather annoyingly large number of cases. It means "a-b mod 2^K".

This is also true of signed values, but the "wrap around" zones of signed values are far away from the areas where the numbers are typically used.

If you only do + and - operations, things are fine. Do any division, multiplication or comparison mixed in with your subtraction, and you are risking rather annoying logic errors -- ones that are subtle and don't always immediately blow up in your face.
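Two illustrative cases, as a sketch. The function names half_gap and gap_is_positive are invented for this example, and both bodies look reasonable if you read a - b as ordinary integer subtraction.

```cpp
#include <climits>  // UINT_MAX, referenced in the comments below

// Division after subtraction: the wrapped value gets divided, not the "real" one.
unsigned int half_gap(unsigned int a, unsigned int b) {
  return (a - b) / 2u;
}

// Comparison after subtraction: true for every a != b, even when a < b.
bool gap_is_positive(unsigned int a, unsigned int b) {
  return a - b > 0u;
}

// half_gap(2, 4) is UINT_MAX / 2, not -1.
// gap_is_positive(2, 4) is true.
```

Neither call crashes or trips an obvious check, which is what makes this kind of logic error easy to miss.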

On the other hand, if you avoid doing any subtraction or comparison with negative integers, you are reasonably well off.

TomBot
Posts: 228
Joined: Sun Jul 29, 2007 1:17 am UTC
Location: Illinois (UIUC)
Contact:

Re: Unsigned integers

Postby TomBot » Fri May 23, 2008 7:46 am UTC

Actually, there is no guarantee that integers wrap as you expect - the result is simply undefined. (I learned this by submitting a GCC bug about wrapping working strangely. It turns out the optimizer can just do whatever it wants in that case. There are some options to override that, though.)

This is coming up to a language philosophy issue. Yes, you can get stupid bugs when you do a - b, or more subtly, expect abs(a - b) to work as you expect. But this is one of about a million stupid things you can do in C++. My philosophy is, well, to strive not to do these things. If you want sloppily written code to be more likely to work anyway, use Java or something, and unit test it. C++ is for making code work awesomely if it's totally correct, and die horribly if it's not, which will hopefully inspire one to use some discipline.

zenten
Posts: 3799
Joined: Fri Jun 22, 2007 7:42 am UTC
Location: Ottawa, Canada

Re: Unsigned integers

Postby zenten » Fri May 23, 2008 12:59 pm UTC

TomBot wrote:C++ is for making code work awesomely if it's totally correct, and produce errors that you may never be able to track down, which will hopefully inspire one to use some discipline.


Fixed.

evilbeanfiend
Posts: 2650
Joined: Tue Mar 13, 2007 7:05 am UTC
Location: the old world

Re: Unsigned integers

Postby evilbeanfiend » Fri May 23, 2008 2:42 pm UTC

TomBot wrote:Actually, there is no guarantee that integers wrap as you expect - the result is simply undefined. (I learned this by submitting a GCC bug about wrapping working strangely. It turns out the optimizer can just do whatever it wants in that case. There are some options to override that, though.)


i think you need to specify signed in there somewhere. unsigned integers are explicitly required by the standard to wrap as you would expect iirc.
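a quick sketch of the unsigned half of that. the function names are made up, and the signed case is left out on purpose because overflowing a signed int is undefined behaviour.

```cpp
#include <climits>  // UINT_MAX, referenced in the comments below

// Unsigned arithmetic is defined to wrap modulo 2^K.
unsigned int next_value(unsigned int x) {
  return x + 1u;
}

unsigned int previous_value(unsigned int x) {
  return x - 1u;
}

// next_value(UINT_MAX) is 0.
// previous_value(0) is UINT_MAX.
```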

Yakk
Poster with most posts but no title.
Posts: 11115
Joined: Sat Jan 27, 2007 7:27 pm UTC
Location: E pur si muove

Re: Unsigned integers

Postby Yakk » Fri May 23, 2008 4:03 pm UTC

My point was simple: operator - works poorly on unsigned types. If a fellow programmer defined a type in which operator- worked as poorly as it does in unsigned integers, you would consider that programmer to be an idiot.

My fix was simple: wrap an unsigned type in a relatively transparent wrapping that blocks the direct use of operator-.

C++ isn't about fragile code. It is about more than that. It lets you, more than any other language out there, change the language in ways that don't create any inefficiencies.

The sane_size_t I wrote? It is just as fast as a size_t under even really naive optimization. It takes the same space -- it has the exact same binary representation. It just doesn't let you say "a-b" without calling a method or accessing internal state.
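One way to have the compiler check the "same space" claim is a static_assert. This is a sketch that assumes a C++11 compiler and uses a stand-in struct named size_wrapper_sketch, not the full wrapper.

```cpp
#include <cstddef>
#include <type_traits>

// Stand-in for the wrapper: only the single data member matters here.
struct size_wrapper_sketch {
  std::size_t raw_value;
};

// Fails to compile if the wrapper ever grows beyond a bare size_t.
static_assert(sizeof(size_wrapper_sketch) == sizeof(std::size_t),
              "wrapper should add no storage");
static_assert(std::is_standard_layout<size_wrapper_sketch>::value,
              "wrapper should have a plain, C-compatible layout");
```

The size equality is what mainstream compilers produce in practice. The asserts are there so that a platform where it does not hold is caught at compile time.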

Sort of like the STL: operations which are sucky and slow are removed from containers and iterators. Sure, the container could implement them poorly -- a forward iterator can implement +(integer) easily, it just is very slow and it sucks. There is even a means to do this if you really need to -- the algorithm advance (if I remember the name right). But if you take a forward iterator, and say +7 to it, it fails to compile.
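A short sketch of that, assuming C++11's std::forward_list as the forward-iterator container. The helper name value_at is invented for the example.

```cpp
#include <forward_list>
#include <iterator>

// Returns the element n steps from the front; the caller must ensure n is in range.
int value_at(const std::forward_list<int>& items, int n) {
  std::forward_list<int>::const_iterator it = items.begin();
  // it + n;            // does not compile: forward iterators have no operator+
  std::advance(it, n);  // compiles, but walks the list one step at a time
  return *it;
}
```

The slow operation is still available, but you have to ask for it by name.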

This is a good thing. It is a C++ thing.

Yes, unsigned integers are good when you are doing math modulo 2^32 (or whatever max unsigned int+1 is). Having them around is a good thing.

The point of C++ is to allow the programmer to write the powerful leverage code on top of it that makes it as fast as possible, and as safe as reasonable. Examine the extensions in C++0x -- while other languages implement in-language iteration via exceptions (Python), virtual interfaces (Java, C#, Python) and/or new language constructs (C#), C++ produced a solution that makes a for loop over an array as efficient as custom C code, a for loop over a std::vector as efficient as the array, and let you create your own classes that can be iterated as efficiently, inefficiently, or abstractly as you wish them to be. At the same time, it removed a bunch of possible typos in each loop that could cause a myriad of errors.

You could write the C++0x for(auto& i: container) in existing C++ -- it just writes the code for you, and does all of the anal performance speedup tricks that don't look pretty, and wraps them in an easy to use, safe interface.
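Roughly, the two spellings look like this. It is a sketch with made-up function names, and the hand-written loop is only an approximation of what the range-based form expands to.

```cpp
#include <vector>

// Range-based for: no index or iterator to mistype.
int sum_with_range_for(const std::vector<int>& values) {
  int total = 0;
  for (const auto& value : values) {
    total += value;
  }
  return total;
}

// Approximately the loop you would otherwise write by hand.
int sum_with_iterators(const std::vector<int>& values) {
  int total = 0;
  for (auto it = values.begin(), end = values.end(); it != end; ++it) {
    total += *it;
  }
  return total;
}
```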

C++ doesn't mean you have to neglect safety for speed, nor speed for safety. It lets you make a fast thing safer without slowing it down, and it lets you become even more safe by decreasing speed. That first step -- adding some safety for no speed loss -- is what makes C++ fundamentally different than many of its alternatives. Claiming "if you wanted safe, you wouldn't be using C++" is simply ignoring one of the fundamental benefits of writing in C++ instead of ASM.

UchihaJax
Posts: 12
Joined: Thu Sep 06, 2007 7:37 pm UTC

Re: Unsigned integers

Postby UchihaJax » Sun May 25, 2008 2:54 pm UTC

Aren't they just a throwback to the 16 bit days?
I can't even think of a modern use case (well perhaps a very simple and low powered embedded device....)

Rysto
Posts: 1460
Joined: Wed Mar 21, 2007 4:07 am UTC

Re: Unsigned integers

Postby Rysto » Sun May 25, 2008 4:17 pm UTC

UchihaJax wrote:Aren't they just a throwback to the 16 bit days?
I can't even think of a modern use case (well perhaps a very simple and low powered embedded device....)

When you want to do bitfiddling on an int, unsigned makes your life a whole lot easier.
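For instance, right-shifting a negative signed value was implementation-defined before C++20 and typically drags copies of the sign bit in from the left, while an unsigned shift always fills with zeros. A small sketch, with top_byte as a made-up name:

```cpp
#include <cstdint>

// Extracts the most significant byte of a 32-bit word.
std::uint32_t top_byte(std::uint32_t word) {
  return word >> 24;  // unsigned shift: vacated bits are always zero
}

// top_byte(0xF0000000u) is 0xF0.
```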

evilbeanfiend
Posts: 2650
Joined: Tue Mar 13, 2007 7:05 am UTC
Location: the old world

Re: Unsigned integers

Postby evilbeanfiend » Tue May 27, 2008 7:42 am UTC

yes or if you are directly messing with memory you almost certainly want to use arrays of unsigned chars.

i think the other issue is one of intent, you have to make a decision as to what your interface says to other people.

ThomasS
Posts: 585
Joined: Wed Dec 12, 2007 7:46 pm UTC

Re: Unsigned integers

Postby ThomasS » Tue May 27, 2008 6:10 pm UTC

I kind of like to use types to describe what values are permitted. In particular I have some polynomial code which uses unsigned types (in C++) to represent powers. So I just got to hunt down a bug that was (effectively)

Code: Select all

int test=-20;
unsigned int q = 4;
test /= q;


I have no real problem with unsigned types, even if you do end up with a unary "-" operator as a historical accident. But sometimes the semantics that C/C++ comes up with for them strike me as a little odd.

