faster function than 1/x?
Moderators: phlip, Moderators General, Prelates
faster function than 1/x?
Hey there! I have a simple program that I am optimizing. Right now one of the bottle necks is taking the inverse of each element a very large vector (1/x). I don't actually need specifically to take the inverse. All I require is a function with the following two properties:
(1) f(x)>0 for all x>0
(2) given x1 > x2, f(x1)<f(x2)
so as you can see, f(x)=1/x is a perfectly valid solution. However, I am trying to find a different function which satisfies these two constraints but evaluates more quickly than 1/x.
I've been looking and can't find anything.
Any ideas?
(1) f(x)>0 for all x>0
(2) given x1 > x2, f(x1)<f(x2)
so as you can see, f(x)=1/x is a perfectly valid solution. However, I am trying to find a different function which satisfies these two constraints but evaluates more quickly than 1/x.
I've been looking and can't find anything.
Any ideas?
Re: faster function than 1/x?
*waves hand*
Oooh! Ooh! How about you swap the values (or the comparison operator) and set the lower bound to ∞? Or optimize to an upper bound of +∞?
f(x) = 2^x may also be a nice contender: subtract x from the maximum exponent and set the result as the exponent of a floating point type of your choice. (though, given x is "a very large vector", you probably don't have enough exponent bits in even a quadruplesized float) Might be cheaper than inverting.
Oooh! Ooh! How about you swap the values (or the comparison operator) and set the lower bound to ∞? Or optimize to an upper bound of +∞?
f(x) = 2^x may also be a nice contender: subtract x from the maximum exponent and set the result as the exponent of a floating point type of your choice. (though, given x is "a very large vector", you probably don't have enough exponent bits in even a quadruplesized float) Might be cheaper than inverting.
Re: faster function than 1/x?
Input and output need to be doubles? Do the comparisons need to be strict? (i.e. would 'x1 > x2' but 'f(x1) <= f(x2)' be a sufficient condition)

 Posts: 39
 Joined: Tue Jan 14, 2014 6:18 am UTC
Re: faster function than 1/x?
f(x) = Max(element)  x ?
or
f(x) = A  x, where A is a constant known to be larger than Max(element)
or
f(x) = A  x, where A is a constant known to be larger than Max(element)

 Posts: 849
 Joined: Thu Dec 10, 2009 6:21 pm UTC
 Location: Placerville, CA
 Contact:
Re: faster function than 1/x?
BedderDanu wrote:f(x) = Max(element)  x ?
or
f(x) = A  x, where A is a constant known to be larger than Max(element)
Yeah, this is probably the simplest solution, except for the gotcha about f(Max(element)) = 0  if that's not a problem, or if you can be certain that x will never equal Max(element), then this is pretty much the simplest solution you're going to find. The workaround version is also good, although it requires you to go up to the next size type, which may affect the performance gain.
"'Legacy code' often differs from its suggested alternative by actually working and scaling."
 Bjarne Stroustrup
www.commodorejohn.com  in case you were wondering, which you probably weren't.
 Bjarne Stroustrup
www.commodorejohn.com  in case you were wondering, which you probably weren't.
Re: faster function than 1/x?
BedderDanu wrote:f(x) = Max(element)  x ?
The perfect solution for integer math, but it can be problematic on floating point values that have vastly different orders of magnitude. For example, the set 1, 2, 3, ..., 100, 2^100 would map all the first hundred numbers to exactly 2^100, which is bad.
1/x seems like the perfect function, if only it could be computed faster. And it can. See for example here or here. If you're only using single precision floats, you can also try the Fast inverse square root.
The fastest way would be to roll your own bit level hacks. Floating points are of the form m * 2^e, where 1 < m < 2, so if you negate both the exponent and the mantissa separately, it might work. Construct a floating point number that reads (2m) * 2^(e), which can be done using nothing but bit operations and addition, as long as you don't have special values like +Inf, NaN, subnormals or even 0.
Re: faster function than 1/x?
Thanks for the excellent advice!
@EvanED:
No, singles are fine. I avoided using f(x1) <= f(x2), because then f(x)=constant is technically a valid solution.
@BedderDanu:
Duh, this is a pretty obvious choice once you put it that way. I'm going to experiment with this and see how it goes. I've got to make sure that the bounds on the values are well behaved enough to give the behavior I am looking for
@Tub:
Thanks. I discovered the fast inverse square root in my searching. I was considering giving it a shot but figured I'd ask this forum to see if an easier implementation exists.
Gah. I also tried doing bitlevel manipulation. I figured out negating the exponent, but I didn't think of m'=(2m) for the mantissa. Would it be necessary to actually use m'=3m, so that 1<m'<2?
@EvanED:
No, singles are fine. I avoided using f(x1) <= f(x2), because then f(x)=constant is technically a valid solution.
@BedderDanu:
Duh, this is a pretty obvious choice once you put it that way. I'm going to experiment with this and see how it goes. I've got to make sure that the bounds on the values are well behaved enough to give the behavior I am looking for
@Tub:
Thanks. I discovered the fast inverse square root in my searching. I was considering giving it a shot but figured I'd ask this forum to see if an easier implementation exists.
Gah. I also tried doing bitlevel manipulation. I figured out negating the exponent, but I didn't think of m'=(2m) for the mantissa. Would it be necessary to actually use m'=3m, so that 1<m'<2?
 phlip
 Restorer of Worlds
 Posts: 7535
 Joined: Sat Sep 23, 2006 3:56 am UTC
 Location: Australia
 Contact:
Re: faster function than 1/x?
Tub wrote:↶
The fastest way would be to roll your own bit level hacks. Floating points are of the form m * 2^e, where 1 < m < 2, so if you negate both the exponent and the mantissa separately, it might work. Construct a floating point number that reads (2m) * 2^(e), which can be done using nothing but bit operations and addition, as long as you don't have special values like +Inf, NaN, subnormals or even 0.
You don't even need several of those operations... just invert the mantissa and the exponent, like:
Code: Select all
float fastinvert(float x) {
uint32_t asint = *(uint32_t *)(&x);
asint ^= 0x7FFFFFFF;
return *(float *)(&asint);
}
Subnormals and zero map to infinity and NaN, and vice versa, in various weird and wonderful combinations. But for normal numbers, this works.
You'd want to test it to make sure it's actually faster than just 1/x in your program, of course. Once you start getting down to this level of microoptimisation, computers are weird, and performance on this scale can be hard to predict. I'd expect that copying some bits from float registers to int registers, doing a single bitmask, and copying them back, would be faster than a float division... but not with any amount of certainty.
Code: Select all
enum ಠ_ಠ {°□°╰=1, °Д°╰, ಠ益ಠ╰};
void ┻━┻︵╰(ಠ_ಠ ⚠) {exit((int)⚠);}
Re: faster function than 1/x?
The simplicity of that solution is beautiful.
I wouldn't assume that the float is in a register; thefargo said that the function is applied to a large vector. In that case, it may benefit from SIMD instructions, or at least from reading two floats at a time, for even n:
Another thing I forgot to mention: working on a huge vector of floats can often be sped up by moving calculations to the GPU via openCL. There's no benefit if all you're doing is 1/x (the overhead of copying the data to the GPU and back is a lot larger than the potential savings), but depending on the other calculations involved, it might be worth investigating.
I wouldn't assume that the float is in a register; thefargo said that the function is applied to a large vector. In that case, it may benefit from SIMD instructions, or at least from reading two floats at a time, for even n:
Code: Select all
void fastinvertvector(float *v, const int n) {
uint64_t *vi = (uint64_t *) v;
for (int i=0;i<n/2;i++)
vi[i] ^= 0x7FFFFFFF7FFFFFFF;
}
Another thing I forgot to mention: working on a huge vector of floats can often be sped up by moving calculations to the GPU via openCL. There's no benefit if all you're doing is 1/x (the overhead of copying the data to the GPU and back is a lot larger than the potential savings), but depending on the other calculations involved, it might be worth investigating.
Re: faster function than 1/x?
I should have said "floating point" rather than "doubles"... point is, you're not working with ints or fixed point.thefargo wrote:Thanks for the excellent advice!
@EvanED:
No, singles are fine. I avoided using f(x1) <= f(x2), because then f(x)=constant is technically a valid solution.
(Both of my questions were aiming toward a 'something big  x' solution, but it sounds like you have other better answers.
phlip wrote:Code: Select all
float fastinvert(float x) {
uint32_t asint = *(uint32_t *)(&x);
asint ^= 0x7FFFFFFF;
return *(float *)(&asint);
}
You really ought to be using a union for type punning like that; my understanding is your solution can lead to nasal demons due to the strictaliasing rule.
Code: Select all
union FloatAndInt {
float as_float;
uint32_t as_int;
};
float fastinvert(float x) {
FloatAndInt u;
u.as_float = x;
u.as_int ^= 0x7FFFFFFF;
return u.as_float;
}
It's possible your version was safe because you never keep around the pointers that violate the strictaliasing rule... but I wouldn't count on it.
Last edited by EvanED on Tue Sep 20, 2016 4:09 pm UTC, edited 1 time in total.
 Xenomortis
 Not actually a special flower.
 Posts: 1373
 Joined: Thu Oct 11, 2012 8:47 am UTC
Re: faster function than 1/x?
I seem to remember reading that the union variant is only safe in GCC, but I never really properly understood the logic around union aliasing rules from the standard.
Re: faster function than 1/x?
The pointer cast is undefined behavior. The compiler is allowed to reorder the int loads before previous float stores. Type pruning though union works on all compilers as far as I remember but I'm not entirely sure about MSVC and it's not in the standard. The standard way to do this is to use memcpy() or reads through char pointer.
 phlip
 Restorer of Worlds
 Posts: 7535
 Joined: Sat Sep 23, 2006 3:56 am UTC
 Location: Australia
 Contact:
Re: faster function than 1/x?
I'm pretty sure using a union is also undefined behaviour? That reading from a field of a union that's not (a compatible type of) the one you wrote to isn't strictly allowed. Or something like that, it's been a while since I read the standard. Point being that if we're talking strict bythestandard UB, I believe a union isn't any better than a pointer cast...
But it's been a while since I read the standard...
But it's been a while since I read the standard...
Code: Select all
enum ಠ_ಠ {°□°╰=1, °Д°╰, ಠ益ಠ╰};
void ┻━┻︵╰(ಠ_ಠ ⚠) {exit((int)⚠);}
Re: faster function than 1/x?
I think it may depend on C vs C++.phlip wrote:I'm pretty sure using a union is also undefined behaviour?
In C, it is not UB  rather, it just produces a perhapspartiallyunspecified result. However, from what I can tell it does guarantee that, if you are converting from type A to B and A's "object representation" is a valid object representation for a B, that's what you'll get out. (6.2.6.1 of this draft. Also see J.1 for a list of unspecified behavior, which includes "The value of a union member other than the last one stored into".) At any rate, there's a pretty big gulf between unspecified and undefined behavior, and violation of the strictaliasing rule is UB. I'm pretty sure that, in practice, pointer casts are also more likely to cause you problems.
For C++, I'm not sure. I went through the standard and didn't see anything about this, though I'm not totally sure what to look for other than "union." I did see one site say that this can be UB in C++; assuming that's true, using memcpy would be better (and that's what that site suggests).
 phlip
 Restorer of Worlds
 Posts: 7535
 Joined: Sat Sep 23, 2006 3:56 am UTC
 Location: Australia
 Contact:
Re: faster function than 1/x?
EvanED wrote:↶
In C, it is not UB  rather, it just produces a perhapspartiallyunspecified result.
Ah, I do sometimes get undefined and unspecified mixed up. I think I remembered seeing it in the list in J.1 but misremembered what the list was.
In which case, sure, this is safer:
Code: Select all
float fastinvert(float x) {
union {float f; uint32_t i;} val = {.f = x};
val.i ^= 0x7FFFFFFF;
return val.f;
}
Code: Select all
enum ಠ_ಠ {°□°╰=1, °Д°╰, ಠ益ಠ╰};
void ┻━┻︵╰(ಠ_ಠ ⚠) {exit((int)⚠);}
Re: faster function than 1/x?
Does GCC compile the memcpy version to optimal assembly? Last time I checked GCC was pretty good at juggling with memcpy.
 phlip
 Restorer of Worlds
 Posts: 7535
 Joined: Sat Sep 23, 2006 3:56 am UTC
 Location: Australia
 Contact:
Re: faster function than 1/x?
The memcpy version:compiles with GCC identically to the pointercasting version, even without optimisation (which makes sense, if you unroll the memcpy call then they're the same thing). And with O1 or higher, all three compile the same. And because I'm now testing from home (instead of work, where I only have 32bit Cygwin GCC to work with) I can say the same is true for both x86 and x86_64.
Code: Select all
float fastinvert3(float x) {
uint32_t asint;
memcpy(&asint, &x, 4);
asint ^= 0x7FFFFFFF;
memcpy(&x, &asint, 4);
return x;
}
Code: Select all
enum ಠ_ಠ {°□°╰=1, °Д°╰, ಠ益ಠ╰};
void ┻━┻︵╰(ಠ_ಠ ⚠) {exit((int)⚠);}

 Posts: 11
 Joined: Tue Oct 18, 2016 10:24 am UTC
Re: faster function than 1/x?
Why are you becoming complex. There is another easier and you only need to do subtraction and a if statement.
Let say, x1 = 4 and x2 = 2
which means f(x1) < f(x2) if f(x) = 1/x.
But
But you can also implement
Let say f(x1,x2) = x1  x2
If f(x1,x2)>f(x2,x1)
then f(x1) = x2
f(x2) = x1
else
f(x1) = x1
f(x2) = x2
In our case ^^ f(x1,x2) > f(x2,x1). So f(x1)<f(x2).
If you want to have a mathematical equation use karnaugh map. It will be in a digital form and convert it into proper mathematical form using simple changing.
Let say, x1 = 4 and x2 = 2
which means f(x1) < f(x2) if f(x) = 1/x.
But
But you can also implement
Let say f(x1,x2) = x1  x2
If f(x1,x2)>f(x2,x1)
then f(x1) = x2
f(x2) = x1
else
f(x1) = x1
f(x2) = x2
In our case ^^ f(x1,x2) > f(x2,x1). So f(x1)<f(x2).
If you want to have a mathematical equation use karnaugh map. It will be in a digital form and convert it into proper mathematical form using simple changing.
 phlip
 Restorer of Worlds
 Posts: 7535
 Joined: Sat Sep 23, 2006 3:56 am UTC
 Location: Australia
 Contact:
Re: faster function than 1/x?
... I don't fully understand what you're trying to say? In some places you're calling f() with two arguments, and in others you're calling it with one, and it seems like you're talking about different functions in different places?
But from what I think it sounds like you're saying, it's something like: say we were looking at the points 3 and 5, we just make f(3) = 5 and f(5) = 3, so that f(3) > f(5). And that's neat and all, but it only defines f at those two points. That's not particularly helpful. We want a function that's defined over a sufficiently large domain, that's positivebutstrictlydecreasing over the whole thing. Not just at two specific points.
But from what I think it sounds like you're saying, it's something like: say we were looking at the points 3 and 5, we just make f(3) = 5 and f(5) = 3, so that f(3) > f(5). And that's neat and all, but it only defines f at those two points. That's not particularly helpful. We want a function that's defined over a sufficiently large domain, that's positivebutstrictlydecreasing over the whole thing. Not just at two specific points.
Code: Select all
enum ಠ_ಠ {°□°╰=1, °Д°╰, ಠ益ಠ╰};
void ┻━┻︵╰(ಠ_ಠ ⚠) {exit((int)⚠);}
Who is online
Users browsing this forum: No registered users and 2 guests