sgfw wrote: Okay, so I have this problem, where I need to multiply a float (which would always be between 0 and 1) by 4294967295 and get the non-integer part. Generally, this could be done with

```c
double a = whatever;
double b = a*4294967295.;
answer = b - (int)b;
```

but the number when multiplied by 4294967295 is too large for the information right of the decimal place to be preserved. Because I know one of the numbers being multiplied, I feel like there should be a way to do this without b exceeding a certain magnitude, but I can't figure out how. I'd rather not include any arbitrary precision libraries, if at all possible. Does anyone have a solution to this problem?

I think you've got the wrong analysis. 4294967295 is a 32-bit number, so in the worst case you wipe out 32 bits of the 53-bit mantissa. You should still have 21 bits of mantissa left after the calculation.

Are 21 bits not sufficient for your needs?
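To make the 21-bit figure concrete, here is a minimal sketch of the calculation in C. The name `frac_of_scaled` is made up for illustration, and the cast goes through `uint64_t` rather than `int` on the assumption that `int` is 32 bits, in which case `(int)b` would overflow once `b` reaches 2^31.

```c
#include <stdint.h>

/* Hypothetical helper: fractional part of a * 4294967295 for a in [0, 1]. */
double frac_of_scaled(double a)
{
    double b = a * 4294967295.0;   /* rounded once to 53 significant bits */

    /* b is at most 4294967295, so it always fits in a uint64_t. */
    double whole = (double)(uint64_t)b;

    /* The subtraction itself is exact. Whenever b >= 2^31 the result is
       a multiple of 2^-21, so about 21 bits of fraction survive. */
    return b - whole;
}
```

For example, `frac_of_scaled(0.75)` gives exactly 0.25, because 0.75 * 4294967295 = 3221225471.25 fits in a double without rounding.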

The problem you're facing is cancellation error in the subtraction. The bits "disappear" at this point of the code: `answer = b - (int)b`

Unless you figure out a way to delay the subtraction (or the addition of a similar-magnitude negative number), you will always wipe out some ~32 bits of information. Qaanol's code guarantees this loss, as the top 32 bits are always set to 0, and it doesn't seem to address the inherent cancellation issue.

One strategy you can employ (if your code can support it) is to do all your multiplies and divides together (these do not affect precision very much; at worst you lose a bit to rounding), THEN perform the information-losing adds and subtracts from smallest to largest to minimize the information loss. I.e., 5 - 4 + 3 - 2 + 1 should be evaluated in the order ((((1-2)+3)-4)+5).

Again, it isn't always possible, but it's something you should strive for.
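A rough sketch of the smallest-to-largest ordering in C, in case it helps. The names `cmp_magnitude` and `sum_small_to_large` are hypothetical, and sorting the array in place is just one simple way to get the ordering, assuming you are free to reorder the terms.

```c
#include <math.h>
#include <stdlib.h>

/* qsort comparator: orders doubles by increasing absolute value. */
static int cmp_magnitude(const void *p, const void *q)
{
    double x = fabs(*(const double *)p);
    double y = fabs(*(const double *)q);
    return (x > y) - (x < y);
}

/* Hypothetical helper: sorts v in place by magnitude, then adds the
   terms from the smallest to the largest. */
double sum_small_to_large(double *v, size_t n)
{
    qsort(v, n, sizeof v[0], cmp_magnitude);

    double total = 0.0;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}
```

With the terms {1e16, 1.0, 1.0}, adding left to right in IEEE double arithmetic with round-to-nearest loses both 1.0 terms (1e16 + 1.0 rounds back to 1e16), while `sum_small_to_large` adds 1.0 + 1.0 first and returns 1e16 + 2. For {5, -4, 3, -2, 1} it evaluates exactly the order ((((1-2)+3)-4)+5).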

http://dm.ing.unibs.it/gervasio/Nummeth ... errors.pdf

EDIT: Cancellation error is very simple to understand.

Imagine the subtraction of these two 10 digit numbers on a 10-digit machine:

```
  1.234567899
- 1.234567890
=============
  0.000000009
```

After the subtraction, you are left with 1 digit's worth of information. In double precision, the decimal point is then moved all the way over. In essence, the 10-digit answer will be stored like this: `9.000000000e-9`

The number above may claim to have 10 digits of precision, but in fact it only has 1 digit of precision. When performing floating-point math, it is important to "count your sig-figs" at every step of all calculations if you want to ensure an accurate result.
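If you want the machine to do the sig-fig counting for a single subtraction, something along these lines might work. It is only a sketch: `bits_cancelled` is a made-up helper, it counts in bits rather than decimal digits, and it is only meaningful for operands of the same sign.

```c
#include <float.h>
#include <math.h>

/* Hypothetical helper: rough count of leading bits that cancel when
   computing x - y, taken from the binary exponents of the larger
   operand and of the difference. */
int bits_cancelled(double x, double y)
{
    double diff = x - y;
    if (diff == 0.0)
        return DBL_MANT_DIG;       /* every significant bit cancelled */

    double larger = fabs(x) > fabs(y) ? fabs(x) : fabs(y);
    int e_operand, e_diff;
    frexp(larger, &e_operand);     /* larger = m * 2^e_operand, 0.5 <= |m| < 1 */
    frexp(diff, &e_diff);          /* same decomposition for the difference */

    int lost = e_operand - e_diff;
    return lost > 0 ? lost : 0;    /* no cancellation if the result grew */
}
```

For the two numbers in the example, `bits_cancelled(1.234567899, 1.234567890)` returns 27, so of a double's 53 significant bits only about 26 still carry information in the difference.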
