## faster function than 1/x?

A place to discuss the implementation and style of computer programs.

Moderators: phlip, Moderators General, Prelates

thefargo
Posts: 34
Joined: Mon Jan 20, 2014 3:45 pm UTC

### faster function than 1/x?

Hey there! I have a simple program that I am optimizing. Right now one of the bottle necks is taking the inverse of each element a very large vector (1/x). I don't actually need specifically to take the inverse. All I require is a function with the following two properties:

(1) f(x)>0 for all x>0
(2) given x1 > x2, f(x1)<f(x2)

so as you can see, f(x)=1/x is a perfectly valid solution. However, I am trying to find a different function which satisfies these two constraints but evaluates more quickly than 1/x.

I've been looking and can't find anything.

Any ideas?

Flumble
Yes Man
Posts: 2183
Joined: Sun Aug 05, 2012 9:35 pm UTC

### Re: faster function than 1/x?

*waves hand*
Oooh! Ooh! How about you swap the values (or the comparison operator) and set the lower bound to -∞? Or optimize to an upper bound of +∞?

f(x) = 2^-x may also be a nice contender: subtract x from the maximum exponent and set the result as the exponent of a floating point type of your choice. (though, given x is "a very large vector", you probably don't have enough exponent bits in even a quadruple-sized float) Might be cheaper than inverting.

EvanED
Posts: 4331
Joined: Mon Aug 07, 2006 6:28 am UTC
Contact:

### Re: faster function than 1/x?

Input and output need to be doubles? Do the comparisons need to be strict? (i.e. would 'x1 > x2' but 'f(x1) <= f(x2)' be a sufficient condition)

BedderDanu
Posts: 39
Joined: Tue Jan 14, 2014 6:18 am UTC

### Re: faster function than 1/x?

f(x) = Max(element) - x ?

or

f(x) = A - x, where A is a constant known to be larger than Max(element)

commodorejohn
Posts: 1161
Joined: Thu Dec 10, 2009 6:21 pm UTC
Location: Placerville, CA
Contact:

### Re: faster function than 1/x?

BedderDanu wrote:f(x) = Max(element) - x ?

or

f(x) = A - x, where A is a constant known to be larger than Max(element)

Yeah, this is probably the simplest solution, except for the gotcha about f(Max(element)) = 0 - if that's not a problem, or if you can be certain that x will never equal Max(element), then this is pretty much the simplest solution you're going to find. The workaround version is also good, although it requires you to go up to the next size type, which may affect the performance gain.
"'Legacy code' often differs from its suggested alternative by actually working and scaling."
- Bjarne Stroustrup
www.commodorejohn.com - in case you were wondering, which you probably weren't.

Tub
Posts: 446
Joined: Wed Jul 27, 2011 3:13 pm UTC

### Re: faster function than 1/x?

BedderDanu wrote:f(x) = Max(element) - x ?

The perfect solution for integer math, but it can be problematic on floating point values that have vastly different orders of magnitude. For example, the set 1, 2, 3, ..., 100, 2^100 would map all the first hundred numbers to exactly 2^100, which is bad.

1/x seems like the perfect function, if only it could be computed faster. And it can. See for example here or here. If you're only using single precision floats, you can also try the Fast inverse square root.

The fastest way would be to roll your own bit level hacks. Floating points are of the form m * 2^e, where 1 < m < 2, so if you negate both the exponent and the mantissa separately, it might work. Construct a floating point number that reads (2-m) * 2^(-e), which can be done using nothing but bit operations and addition, as long as you don't have special values like +Inf, NaN, subnormals or even 0.

thefargo
Posts: 34
Joined: Mon Jan 20, 2014 3:45 pm UTC

### Re: faster function than 1/x?

@EvanED:
No, singles are fine. I avoided using f(x1) <= f(x2), because then f(x)=constant is technically a valid solution.

@BedderDanu:
Duh, this is a pretty obvious choice once you put it that way. I'm going to experiment with this and see how it goes. I've got to make sure that the bounds on the values are well behaved enough to give the behavior I am looking for

@Tub:
Thanks. I discovered the fast inverse square root in my searching. I was considering giving it a shot but figured I'd ask this forum to see if an easier implementation exists.

Gah. I also tried doing bitlevel manipulation. I figured out negating the exponent, but I didn't think of m'=(2-m) for the mantissa. Would it be necessary to actually use m'=3-m, so that 1<m'<2?

phlip
Restorer of Worlds
Posts: 7565
Joined: Sat Sep 23, 2006 3:56 am UTC
Location: Australia
Contact:

### Re: faster function than 1/x?

Tub wrote:
The fastest way would be to roll your own bit level hacks. Floating points are of the form m * 2^e, where 1 < m < 2, so if you negate both the exponent and the mantissa separately, it might work. Construct a floating point number that reads (2-m) * 2^(-e), which can be done using nothing but bit operations and addition, as long as you don't have special values like +Inf, NaN, subnormals or even 0.

You don't even need several of those operations... just invert the mantissa and the exponent, like:

Code: Select all

`float fastinvert(float x) {  uint32_t asint = *(uint32_t *)(&x);  asint ^= 0x7FFFFFFF;  return *(float *)(&asint);}`
What you end up with is a function that's... almost f(x) = 4/x when x is a power of two (specifically, f(1) is just slightly less than 4, f(2) is just slightly less than 2, f(4) is just slightly less than 1, etc). Then for input values that aren't powers of 2, it's linearly interpolated (so eg f(1.1) is about 3.8, since that's 10% of the way from f(1) to f(2)).

Subnormals and zero map to infinity and NaN, and vice versa, in various weird and wonderful combinations. But for normal numbers, this works.

You'd want to test it to make sure it's actually faster than just 1/x in your program, of course. Once you start getting down to this level of micro-optimisation, computers are weird, and performance on this scale can be hard to predict. I'd expect that copying some bits from float registers to int registers, doing a single bitmask, and copying them back, would be faster than a float division... but not with any amount of certainty.

Code: Select all

`enum ಠ_ಠ {°□°╰=1, °Д°╰, ಠ益ಠ╰};void ┻━┻︵​╰(ಠ_ಠ ⚠) {exit((int)⚠);}`
[he/him/his]

Tub
Posts: 446
Joined: Wed Jul 27, 2011 3:13 pm UTC

### Re: faster function than 1/x?

The simplicity of that solution is beautiful.

I wouldn't assume that the float is in a register; thefargo said that the function is applied to a large vector. In that case, it may benefit from SIMD instructions, or at least from reading two floats at a time, for even n:

Code: Select all

`void fastinvertvector(float *v, const int n) {  uint64_t *vi = (uint64_t *) v;  for (int i=0;i<n/2;i++)    vi[i] ^= 0x7FFFFFFF7FFFFFFF;}`

Another thing I forgot to mention: working on a huge vector of floats can often be sped up by moving calculations to the GPU via openCL. There's no benefit if all you're doing is 1/x (the overhead of copying the data to the GPU and back is a lot larger than the potential savings), but depending on the other calculations involved, it might be worth investigating.

EvanED
Posts: 4331
Joined: Mon Aug 07, 2006 6:28 am UTC
Contact:

### Re: faster function than 1/x?

thefargo wrote:Thanks for the excellent advice!
@EvanED:
No, singles are fine. I avoided using f(x1) <= f(x2), because then f(x)=constant is technically a valid solution.
I should have said "floating point" rather than "doubles"... point is, you're not working with ints or fixed point.

(Both of my questions were aiming toward a 'something big - x' solution, but it sounds like you have other better answers.

phlip wrote:

Code: Select all

`float fastinvert(float x) {  uint32_t asint = *(uint32_t *)(&x);  asint ^= 0x7FFFFFFF;  return *(float *)(&asint);}`

You really ought to be using a union for type punning like that; my understanding is your solution can lead to nasal demons due to the strict-aliasing rule.

Code: Select all

`union FloatAndInt {    float as_float;    uint32_t as_int;};float fastinvert(float x) {    FloatAndInt u;    u.as_float = x;    u.as_int ^= 0x7FFFFFFF;    return u.as_float;}`

It's possible your version was safe because you never keep around the pointers that violate the strict-aliasing rule... but I wouldn't count on it.
Last edited by EvanED on Tue Sep 20, 2016 4:09 pm UTC, edited 1 time in total.

Xenomortis
Not actually a special flower.
Posts: 1443
Joined: Thu Oct 11, 2012 8:47 am UTC

### Re: faster function than 1/x?

I seem to remember reading that the union variant is only safe in GCC, but I never really properly understood the logic around union aliasing rules from the standard.

korona
Posts: 495
Joined: Sun Jul 04, 2010 8:40 pm UTC

### Re: faster function than 1/x?

The pointer cast is undefined behavior. The compiler is allowed to reorder the int loads before previous float stores. Type pruning though union works on all compilers as far as I remember but I'm not entirely sure about MSVC and it's not in the standard. The standard way to do this is to use memcpy() or reads through char pointer.

phlip
Restorer of Worlds
Posts: 7565
Joined: Sat Sep 23, 2006 3:56 am UTC
Location: Australia
Contact:

### Re: faster function than 1/x?

I'm pretty sure using a union is also undefined behaviour? That reading from a field of a union that's not (a compatible type of) the one you wrote to isn't strictly allowed. Or something like that, it's been a while since I read the standard. Point being that if we're talking strict by-the-standard UB, I believe a union isn't any better than a pointer cast...

But it's been a while since I read the standard...

Code: Select all

`enum ಠ_ಠ {°□°╰=1, °Д°╰, ಠ益ಠ╰};void ┻━┻︵​╰(ಠ_ಠ ⚠) {exit((int)⚠);}`
[he/him/his]

EvanED
Posts: 4331
Joined: Mon Aug 07, 2006 6:28 am UTC
Contact:

### Re: faster function than 1/x?

phlip wrote:I'm pretty sure using a union is also undefined behaviour?
I think it may depend on C vs C++.

In C, it is not UB -- rather, it just produces a perhaps-partially-unspecified result. However, from what I can tell it does guarantee that, if you are converting from type A to B and A's "object representation" is a valid object representation for a B, that's what you'll get out. (6.2.6.1 of this draft. Also see J.1 for a list of unspecified behavior, which includes "The value of a union member other than the last one stored into".) At any rate, there's a pretty big gulf between unspecified and undefined behavior, and violation of the strict-aliasing rule is UB. I'm pretty sure that, in practice, pointer casts are also more likely to cause you problems.

For C++, I'm not sure. I went through the standard and didn't see anything about this, though I'm not totally sure what to look for other than "union." I did see one site say that this can be UB in C++; assuming that's true, using memcpy would be better (and that's what that site suggests).

phlip
Restorer of Worlds
Posts: 7565
Joined: Sat Sep 23, 2006 3:56 am UTC
Location: Australia
Contact:

### Re: faster function than 1/x?

EvanED wrote:
In C, it is not UB -- rather, it just produces a perhaps-partially-unspecified result.

Ah, I do sometimes get undefined and unspecified mixed up. I think I remembered seeing it in the list in J.1 but misremembered what the list was.

In which case, sure, this is safer:

Code: Select all

`float fastinvert(float x) {  union {float f; uint32_t i;} val = {.f = x};  val.i ^= 0x7FFFFFFF;  return val.f;}`
gcc (on my machine at least) compiles this into the exact same thing as the pointer-dance version, given -O1 or higher.

Code: Select all

`enum ಠ_ಠ {°□°╰=1, °Д°╰, ಠ益ಠ╰};void ┻━┻︵​╰(ಠ_ಠ ⚠) {exit((int)⚠);}`
[he/him/his]

korona
Posts: 495
Joined: Sun Jul 04, 2010 8:40 pm UTC

### Re: faster function than 1/x?

Does GCC compile the memcpy version to optimal assembly? Last time I checked GCC was pretty good at juggling with memcpy.

phlip
Restorer of Worlds
Posts: 7565
Joined: Sat Sep 23, 2006 3:56 am UTC
Location: Australia
Contact:

### Re: faster function than 1/x?

The memcpy version:

Code: Select all

`float fastinvert3(float x) {  uint32_t asint;  memcpy(&asint, &x, 4);  asint ^= 0x7FFFFFFF;  memcpy(&x, &asint, 4);  return x;}`
compiles with GCC identically to the pointer-casting version, even without optimisation (which makes sense, if you unroll the memcpy call then they're the same thing). And with -O1 or higher, all three compile the same. And because I'm now testing from home (instead of work, where I only have 32-bit Cygwin GCC to work with) I can say the same is true for both x86 and x86_64.

Code: Select all

`enum ಠ_ಠ {°□°╰=1, °Д°╰, ಠ益ಠ╰};void ┻━┻︵​╰(ಠ_ಠ ⚠) {exit((int)⚠);}`
[he/him/his]

EmelinaVollmering
Posts: 11
Joined: Tue Oct 18, 2016 10:24 am UTC

### Re: faster function than 1/x?

Why are you becoming complex. There is another easier and you only need to do subtraction and a if statement.

Let say, x1 = 4 and x2 = 2
which means f(x1) < f(x2) if f(x) = 1/x.

But
But you can also implement
Let say f(x1,x2) = x1 - x2

If f(x1,x2)>f(x2,x1)
then f(x1) = x2
f(x2) = x1
else
f(x1) = x1
f(x2) = x2

In our case ^^ f(x1,x2) > f(x2,x1). So f(x1)<f(x2).

If you want to have a mathematical equation use karnaugh map. It will be in a digital form and convert it into proper mathematical form using simple changing.

phlip
Restorer of Worlds
Posts: 7565
Joined: Sat Sep 23, 2006 3:56 am UTC
Location: Australia
Contact:

### Re: faster function than 1/x?

... I don't fully understand what you're trying to say? In some places you're calling f() with two arguments, and in others you're calling it with one, and it seems like you're talking about different functions in different places?

But from what I think it sounds like you're saying, it's something like: say we were looking at the points 3 and 5, we just make f(3) = 5 and f(5) = 3, so that f(3) > f(5). And that's neat and all, but it only defines f at those two points. That's not particularly helpful. We want a function that's defined over a sufficiently large domain, that's positive-but-strictly-decreasing over the whole thing. Not just at two specific points.

Code: Select all

`enum ಠ_ಠ {°□°╰=1, °Д°╰, ಠ益ಠ╰};void ┻━┻︵​╰(ಠ_ಠ ⚠) {exit((int)⚠);}`
[he/him/his]

### Who is online

Users browsing this forum: No registered users and 9 guests