C++ question about non-integer parts of large doubles

Posted: Sat Sep 13, 2014 11:55 pm UTC
by sgfw
Okay, so I have this problem, where I need to multiply a float (which would always be between 0 and 1) by 4294967295 and get the non-integer part. Generally, this could be done with

Code:

double a = whatever;
double b = a * 4294967295.;
double answer = b - (int)b;


but the number when multiplied by 4294967295 is too large for the information right of the decimal place to be preserved. Because I know one of the numbers being multiplied, I feel like there should be a way to do this without b exceeding a certain magnitude, but I can't figure out how. I'd rather not include any arbitrary precision libraries, if at all possible. Does anyone have a solution to this problem?

Re: C++ question about non-integer parts of large doubles

Posted: Sun Sep 14, 2014 12:35 am UTC
by thoughtfully
A bigger floating point type (long double, for instance) might do the job. Might not be portable, and might not actually be larger, so check the compiler docs.

You probably have something more subtle going on, though, like not displaying the full precision when you convert the float into a string. A double precision float has a lot more significant figures than you are chopping off.

Re: C++ question about non-integer parts of large doubles

Posted: Sun Sep 14, 2014 1:13 am UTC
by Qaanol
You are multiplying by (2^32 - 1). So take advantage of that.

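Code:

#include <cmath>

// Fractional part of x * (2^32 - 1), for x between 0 and 1 by assertion
double frac_part(double x) {
  // Scaling by 2^32 preserves all digits of x because it only affects the exponent, not the mantissa.
  // The constant is written as a double literal because 1L << 32 overflows where long is 32 bits.
  double a = x * 4294967296.0;
  a = std::fmod(a, 1.0);       // The fractional part of x * 2^32
  if (x <= a) {
    return a - x;
  } else {
    return a + (1.0 - x);
  }
}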


Note that I’m also taking advantage of x being between 0 and 1. If that is not guaranteed, you’ll have to make some changes.

Re: C++ question about non-integer parts of large doubles

Posted: Sun Sep 14, 2014 3:02 pm UTC
by KnightExemplar
I think you've got the wrong analysis. 4294967295 is a 32-bit number, so in the worst case you wipe out 32 bits of the 53-bit mantissa. You should still have 21 bits of mantissa after the calculation.

Is 21 bits not sufficient for your needs?

The problem you're facing is cancellation error in the subtraction. The bits "disappear" at this point in the code:

Code:

answer = b - (int)b;


Unless you figure out a way to delay the subtraction (or the addition of a similar-magnitude negative number), you will always wipe out some ~32 bits of information. Qaanol's code guarantees this loss, since the top 32 bits are always set to 0, and it doesn't seem to address the inherent cancellation issue.

One strategy you can employ (if your code can support it) is to do all your multiplies and divides together (these do not affect precision very much; at worst a bit is lost to rounding), THEN perform the information-losing adds and subtracts from smallest to largest to minimize the information loss. For example, 5 - 4 + 3 - 2 + 1 should be evaluated in the order ((((1-2)+3)-4)+5).

Again, it isn't always possible, but it's something you should strive for.

http://dm.ing.unibs.it/gervasio/Nummeth ... errors.pdf

EDIT: Cancellation error is very simple to understand.

Imagine the subtraction of these two 10 digit numbers on a 10-digit machine:

Code:

  1.234567899
- 1.234567890
=============
  0.000000009


After the subtraction, you are left with 1 digit's worth of information. In floating point, the result is then renormalized: the decimal point is moved all the way over. In essence, the 10-digit answer will be stored like this:

Code:

9.000000000 x (10^-9)


The number above may claim to have 10 digits of precision, but in fact it only has 1 digit of precision. When performing floating-point math, it is important to "count your sig-figs" at every step of the calculation if you want to ensure an accurate result.

Re: C++ question about non-integer parts of large doubles

Posted: Sun Sep 14, 2014 6:24 pm UTC
by sgfw
KnightExemplar wrote:I think you've got the wrong analysis. 4294967295 is a 32-bit number, so in the worst case you wipe out 32 bits of the 53-bit mantissa. You should still have 21 bits of mantissa after the calculation.


I don't really know that much about this, but when I run code like this:

Code:

double d = 0.2432;
std::cout << 4294967295*d;


It comes out with 1.04454e+009, which I take to mean that it didn't have enough space to remember the full number (1.044536046144e+009). Maybe it does remember the number, but just doesn't display it?

Re: C++ question about non-integer parts of large doubles

Posted: Sun Sep 14, 2014 6:32 pm UTC
by KnightExemplar
Printing a number doesn't tell you anything about its accuracy. The accuracy of a particular variable is something the programmer has to keep track of; the CPU has no idea how accurate or inaccurate the various variables are... it's just a dumb computer doing your commands.

Try this, to get an idea of what is going on:

Code:

double d = 0.2432;
std::cout.precision(15);
std::cout << 4294967295*d;


Now... the whole process, in particular the subtraction, is what wipes out 32 bits of accuracy. But it is the programmer who knows this. The CPU doesn't keep track of the amount of error a particular variable has... it's, again, something the programmer must learn to keep track of when using doubles.

Again, the key here is this piece of code:

Code:

answer = b - (int)b;


After this point, "answer" only has some 20 or 21 bits of accuracy.

You have some ~53 bits of accuracy after the multiplication. Multiplication and division barely introduce any error in double-precision floating-point arithmetic (maybe 1 bit).

Re: C++ question about non-integer parts of large doubles

Posted: Sun Sep 14, 2014 9:54 pm UTC
by sgfw
Thanks! I had no idea you could set the precision like that. That's helpful.

Re: C++ question about non-integer parts of large doubles

Posted: Sun Sep 14, 2014 10:01 pm UTC
by KnightExemplar
sgfw wrote:Thanks! I had no idea you could set the precision like that. That's helpful.


That is only the precision of the print, however. The double always has 53 bits of mantissa (erm... 52 stored bits plus an implicit leading 1), which roughly translates to 15ish or 16ish decimal digits to print.

Your core problem of cancellation error is still there.

Also, study how the bits are packed in the double. It's useful to know how computers actually do floating-point arithmetic when dealing with these precision problems.