C++ question about non-integer parts of large doubles

sgfw
Posts: 46
Joined: Wed Jun 19, 2013 11:47 pm UTC

C++ question about non-integer parts of large doubles

Postby sgfw » Sat Sep 13, 2014 11:55 pm UTC

Okay, so I have this problem, where I need to multiply a float (which would always be between 0 and 1) by 4294967295 and get the non-integer part. Generally, this could be done with

Code: Select all

double a = whatever;
double b = a*4294967295.;
double answer = b - (int)b;


but the number when multiplied by 4294967295 is too large for the information right of the decimal place to be preserved. Because I know one of the numbers being multiplied, I feel like there should be a way to do this without b exceeding a certain magnitude, but I can't figure out how. I'd rather not include any arbitrary precision libraries, if at all possible. Does anyone have a solution to this problem?

thoughtfully
Posts: 2253
Joined: Thu Nov 01, 2007 12:25 am UTC
Location: Minneapolis, MN

Re: C++ question about non-integer parts of large doubles

Postby thoughtfully » Sun Sep 14, 2014 12:35 am UTC

A bigger floating point type (long double, for instance) might do the job. Might not be portable, and might not actually be larger, so check the compiler docs.
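Whether long double is actually wider can be checked at compile time rather than in the compiler docs. A minimal sketch, assuming a standard C++11 compiler; the names double_bits, long_double_bits and long_double_is_wider are made up for illustration:

```cpp
#include <limits>

// Number of significand (mantissa) bits in each type on the current platform.
constexpr int double_bits = std::numeric_limits<double>::digits;
constexpr int long_double_bits = std::numeric_limits<long double>::digits;

// True only if long double really offers more precision than double.
// On some compilers long double is the same format as double.
constexpr bool long_double_is_wider() {
    return long_double_bits > double_bits;
}
```

If long_double_is_wider() returns false, switching types gains nothing on that platform.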

You probably have something more subtle going on, though, like not displaying the full precision when you convert the float into a string. A double precision float has a lot more significant figures than you are chopping off.
Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away.
-- Antoine de Saint-Exupery

Qaanol
The Cheshirest Catamount
Posts: 3057
Joined: Sat May 09, 2009 11:55 pm UTC

Re: C++ question about non-integer parts of large doubles

Postby Qaanol » Sun Sep 14, 2014 1:13 am UTC

You are multiplying by (2^32 - 1). So take advantage of that.

Code: Select all

x = whatever;                // Between 0 and 1 by assertion
a = x * 4294967296.0;        // 2^32 as a literal (1L << 32 overflows where long is 32 bits). Preserves all digits of x because it only affects the exponent not the mantissa
a = fmod(a, 1.0);            // The fractional part of x * 2^32
if (x <= a) {
  return a - x;
} else {
  return a + (1.0 - x);
}


Note that I’m also taking advantage of x being between 0 and 1. If that is not guaranteed, you’ll have to make some changes.
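For reference, the same idea wrapped up as a complete function. This is only a sketch: it assumes IEEE 754 doubles and 0 <= x < 1, and the function name is made up for illustration. It relies on x * (2^32 - 1) = x * 2^32 - x, where the scaling and the fmod are both exact, so rounding can only happen in the final add or subtract.

```cpp
#include <cmath>

// Fractional part of x * (2^32 - 1), assuming 0 <= x < 1.
// frac_times_2_32_minus_1 is a hypothetical name, not from any library.
double frac_times_2_32_minus_1(double x) {
    const double two32 = 4294967296.0;      // 2^32, exactly representable
    double a = std::fmod(x * two32, 1.0);   // fractional part of x * 2^32 (both steps exact)
    if (x <= a) {
        return a - x;                       // no borrow needed
    } else {
        return a + (1.0 - x);               // borrow 1 so the result stays non-negative
    }
}
```

For example, frac_times_2_32_minus_1(0.25) gives 0.75, matching 0.25 * 4294967295 = 1073741823.75.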
wee free kings

KnightExemplar
Posts: 5494
Joined: Sun Dec 26, 2010 1:58 pm UTC

Re: C++ question about non-integer parts of large doubles

Postby KnightExemplar » Sun Sep 14, 2014 3:02 pm UTC

sgfw wrote:Okay, so I have this problem, where I need to multiply a float (which would always be between 0 and 1) by 4294967295 and get the non-integer part. Generally, this could be done with

Code: Select all

double a = whatever;
double b = a*4294967295.;
double answer = b - (int)b;


but the number when multiplied by 4294967295 is too large for the information right of the decimal place to be preserved. Because I know one of the numbers being multiplied, I feel like there should be a way to do this without b exceeding a certain magnitude, but I can't figure out how. I'd rather not include any arbitrary precision libraries, if at all possible. Does anyone have a solution to this problem?


I think you've got the wrong analysis. 4294967295 is a 32-bit number, so in the worst case you wipe out 32 bits from the 53-bit mantissa. You should still have 21 bits of mantissa after the calculation.

Are 21 bits not sufficient for your needs?

The problem you're facing is cancellation error in the subtract. The bits "disappear" at this point of the code:

Code: Select all

answer = b - (int)b;


Unless you figure out a way to delay the subtraction (or addition of a similar-magnitude negative number), you will always wipe out some ~32 bits of information. Qaanol's code guarantees this loss, as the top 32 bits are always set to 0, and it doesn't seem to address the inherent cancellation issue.

One strategy you can employ (if your code can support it) is to do all your multiplies and divides together (these do not affect precision very much, maybe a bit lost to rounding at worst), THEN perform the information-losing adds and subtracts from smallest to largest to minimize the information loss. For example, 5 - 4 + 3 - 2 + 1 should be evaluated in the order ((((1-2)+3)-4)+5).

Again, it isn't always possible, but it's something you should strive for.
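To see why the order matters, here is a small sketch (assuming IEEE 754 doubles; sum_in_order is a made-up helper name). Adding 1e-16 to 1.0 ten times changes nothing, because each tiny term is below half a unit in the last place of 1.0. Adding the ten tiny terms together first and then adding 1.0 keeps their contribution.

```cpp
#include <vector>

// Sums the values strictly in the order given (plain left-to-right accumulation).
// sum_in_order is a hypothetical helper for illustration.
double sum_in_order(const std::vector<double>& values) {
    double total = 0.0;
    for (double v : values) {
        total += v;   // each addition rounds to the nearest double
    }
    return total;
}
```

Calling it on {1.0, 1e-16, ..., 1e-16} returns exactly 1.0, while the same values with 1.0 moved to the end return a result slightly above 1.0.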

http://dm.ing.unibs.it/gervasio/Nummeth ... errors.pdf

EDIT: Cancellation error is very simple to understand.

Imagine the subtraction of these two 10 digit numbers on a 10-digit machine:

Code: Select all

  1.234567899
- 1.234567890
=============
  0.000000009


After the subtraction, you are left with 1 digit's worth of information. In floating point, the result is then normalized, so the decimal point is moved all the way over. In essence, the 10-digit answer will be stored like this:

Code: Select all

9.000000000 x (10^-9)


The number above may claim to have 10-digits of precision, but in fact it only has 1-digit of precision. When performing floating-point math, it is important to "count your sig-figs" at every step of all calculations if you want to ensure an accurate result.
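The same effect can be measured in binary double precision. A rough sketch, assuming IEEE 754 doubles; the function name is invented for this example. It adds a small value to 1.0, subtracts 1.0 again, and reports how far the recovered value is from the original, relative to the original.

```cpp
#include <cmath>

// Relative error of computing (1 + tiny) - 1 compared with the intended value tiny.
// cancellation_relative_error is a hypothetical name for illustration.
double cancellation_relative_error(double tiny) {
    double big = 1.0 + tiny;         // low-order bits of tiny are rounded away here
    double recovered = big - 1.0;    // this subtraction is exact, but it exposes the earlier rounding
    return std::fabs(recovered - tiny) / tiny;
}
```

With tiny = 1e-15 the relative error is on the order of ten percent, even though every individual operation was correctly rounded. With a power of two such as 0.0009765625 the error is zero, because 1 + tiny is then exactly representable.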
First Strike +1/+1 and Indestructible.

sgfw
Posts: 46
Joined: Wed Jun 19, 2013 11:47 pm UTC

Re: C++ question about non-integer parts of large doubles

Postby sgfw » Sun Sep 14, 2014 6:24 pm UTC

I think you've got the wrong analysis. 4294967295 is a 32-bit number, so in the worst case you wipe out 32-bits from the 53-bit mantissa. You should still have 21-bits of mantissa after the calculation.


I don't really know that much about this, but when I run code like this:

Code: Select all

double d = 0.2432;
std::cout << 4294967295*d;


It comes out with 1.04454e+009, which I take to mean that it didn't have enough space to remember the full number (1.044536046144e+009). Maybe it does remember the number, but just doesn't display it?

KnightExemplar
Posts: 5494
Joined: Sun Dec 26, 2010 1:58 pm UTC

Re: C++ question about non-integer parts of large doubles

Postby KnightExemplar » Sun Sep 14, 2014 6:32 pm UTC

sgfw wrote:
I think you've got the wrong analysis. 4294967295 is a 32-bit number, so in the worst case you wipe out 32-bits from the 53-bit mantissa. You should still have 21-bits of mantissa after the calculation.


I don't really know that much about this, but when I run code like this:

Code: Select all

double d = 0.2432;
std::cout << 4294967295*d;


It comes out with 1.04454e+009, which I take to mean that it didn't have enough space to remember the full number (1.044536046144e+009). Maybe it does remember the number, but just doesn't display it?


Printing a number doesn't tell you anything about its accuracy. The accuracy of a particular variable is something the programmer has to keep track of. The CPU has no idea how accurate or inaccurate the various variables are; it's just a dumb computer following your commands.

Try this, to get an idea of what is going on:

Code: Select all

double d = 0.2432;
cout.precision(15);
std::cout << 4294967295*d;


Now... the whole process, in particular the subtraction, is what wipes out 32 bits of accuracy. But it is the programmer who knows this. The CPU doesn't keep track of the amount of error a particular variable has. Again, it's something the programmer must learn to keep track of when using doubles.

Again, the key here is this piece of code:

Code: Select all

answer = b - (int)b;


After this point, "answer" only has some 20 or 21 bits of accuracy.

You have some ~53 bits of accuracy after the multiplication. Multiplication and division barely introduce any error into double-precision floating-point arithmetic (maybe 1 bit).
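Here is a sketch of the direct approach that makes the remaining bits visible (assuming IEEE 754 doubles; naive_fraction is a made-up name). It uses std::modf instead of the (int) cast, since converting a double larger than INT_MAX (2147483647 with a 32-bit int) to int is undefined behaviour, and the product here can be as large as 4294967295.

```cpp
#include <cmath>

// Fractional part of a * 4294967295 computed the direct way.
// naive_fraction is a hypothetical name for illustration.
double naive_fraction(double a) {
    double b = a * 4294967295.0;    // product is rounded to 53 significant bits
    double whole = 0.0;
    return std::modf(b, &whole);    // exact split of b into whole and fractional parts
}
```

For a = 0.2432 the product lies between 2^29 and 2^30, so it is a multiple of 2^-23 and the returned fraction carries at most 23 bits. Once the product reaches 2^31, that drops to the 21 bits mentioned above.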
First Strike +1/+1 and Indestructible.

sgfw
Posts: 46
Joined: Wed Jun 19, 2013 11:47 pm UTC

Re: C++ question about non-integer parts of large doubles

Postby sgfw » Sun Sep 14, 2014 9:54 pm UTC

Thanks! I had no idea you could set the precision like that. That's helpful.

KnightExemplar
Posts: 5494
Joined: Sun Dec 26, 2010 1:58 pm UTC

Re: C++ question about non-integer parts of large doubles

Postby KnightExemplar » Sun Sep 14, 2014 10:01 pm UTC

sgfw wrote:Thanks! I had no idea you could set the precision like that. That's helpful.


That is only the precision of the print, however. The double always has 53 bits of mantissa (52 stored explicitly plus one implicit leading bit for normal numbers), which roughly translates to 15 or 16 decimal digits when printed.

Your core problem of cancellation error is still there.

Also, study how the bits are packed in the double. It's useful to know how computers actually do floating-point arithmetic when working through precision problems like these.
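As a starting point, something like this pulls the three fields out of a double. It is a sketch that assumes the usual 64-bit IEEE 754 layout (1 sign bit, 11 exponent bits, 52 fraction bits); DoubleFields and unpack are names invented for this example.

```cpp
#include <cstdint>
#include <cstring>

// The three bit fields of an IEEE 754 binary64 value (hypothetical helper type).
struct DoubleFields {
    std::uint64_t sign;      // 1 bit
    std::uint64_t exponent;  // 11 bits, biased by 1023
    std::uint64_t fraction;  // 52 explicitly stored mantissa bits
};

DoubleFields unpack(double d) {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "expects a 64-bit double");
    std::uint64_t bits = 0;
    std::memcpy(&bits, &d, sizeof bits);            // well-defined way to read the raw bits
    DoubleFields f;
    f.sign = bits >> 63;
    f.exponent = (bits >> 52) & 0x7FF;
    f.fraction = bits & ((std::uint64_t(1) << 52) - 1);
    return f;
}
```

For 1.0 this gives sign 0, exponent 1023 and fraction 0. The leading 1 of the mantissa is not stored, which is where the 53rd bit comes from.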
First Strike +1/+1 and Indestructible.

