Optimized floor() for cross-platform software

My work in computer graphics software involves a lot of float-to-int conversions, the majority of which require a floor() operation rather than a simple truncation. The problem is that floor() is a math library function, and quite a hefty one at that. To conform to the IEEE-754 standard, it must correctly deal with overflow and with NaNs, and the function call itself introduces some overhead.

Since the days of the Pentium III, I had been using a fast inlined floor() approximation based on the bit-twiddle trick suggested at stereopsis.com, but that fix was specific to the FPU on the Pentium III and earlier x86 CPUs, and was completely irrelevant to other CPU brands or to newer x86 CPUs with SSE2 instructions. Also, the code was not only ugly but also, as I noted in the first sentence of this paragraph, only an approximation. Float values that were only slightly less than an integer (on the order of 1e-6 less) would sometimes be rounded up instead of down.

So, a few months ago, I decided to get rid of the obsolete bit-twiddle trick in favor of something more general. I came up with the following, which gives exact results as long as the input is not NaN and fits in the range of int, and which does not require any branching or other expensive CPU operations:

inline int FastFloor(double x)
{
  int i = (int)x;        // truncates toward zero
  return i - ( i > x );  // subtract 1 if truncation rounded up
}

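The trick here is very simple. The cast to int truncates toward zero, so for negative non-integer values it rounds up instead of down. If truncation to an int caused the value to increase, then we have to subtract one. We rely on the fact that comparisons such as ( i > x ) evaluate to one or zero when used in integer arithmetic.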
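To make the correction step concrete, here is a small sketch that checks FastFloor() against the math-library floor(). The helper MatchesStdFloor() is a hypothetical name introduced only for this illustration, and it is only meaningful for finite values that fit in an int.

```cpp
#include <cmath>

inline int FastFloor(double x)
{
  int i = (int)x;        // truncates toward zero
  return i - ( i > x );  // subtract 1 only if truncation rounded up
}

// Hypothetical helper for illustration: true if FastFloor() agrees with
// the library floor(). Assumes x is finite and within the range of int.
inline bool MatchesStdFloor(double x)
{
  return FastFloor(x) == (int)std::floor(x);
}
```

For x = -1.5, truncation gives -1, which is greater than x, so the result is -1 - 1 = -2. For x = 1.5, truncation gives 1, which is not greater than x, so the result stays at 1. For an exact integer such as x = -2.0, truncation changes nothing and no correction is applied.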

As I mentioned earlier, this trick does not give correct results when the value lies outside the range of int: on typical x86 builds it will counter-intuitively return 2147483647 for values that are too negative (underflow) and -2147483648 for values that are too large (overflow), though this is hardly worse than static_cast<int>(), which returns -2147483648 in both cases on those implementations. Strictly speaking, C++ leaves an out-of-range float-to-int conversion undefined, so none of these values should be relied upon. A check for over/underflow could be included without adding any conditional branches, but that is left as an exercise for the reader.
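One possible take on that exercise, offered as a sketch rather than as the intended solution, is to clamp the input into the int range before converting, so that out-of-range values saturate. The name FastFloorSaturated() is hypothetical. The sketch assumes the input is not NaN, and it assumes the compiler turns std::min() and std::max() on doubles into branch-free min/max instructions, which is common on SSE2 targets with optimization enabled but is not guaranteed.

```cpp
#include <algorithm>
#include <climits>

// Hypothetical saturating variant: out-of-range inputs yield INT_MIN or
// INT_MAX instead of an undefined conversion. Assumes x is not NaN.
inline int FastFloorSaturated(double x)
{
  // Clamp first so the cast below is always well defined.
  double c = std::min(std::max(x, (double)INT_MIN), (double)INT_MAX);
  int i = (int)c;        // truncates toward zero
  return i - ( i > c );  // same correction as FastFloor()
}
```

Because the clamped value is always within [INT_MIN, INT_MAX], the subtraction can never push the result below INT_MIN: the correction only fires when truncation rounded up, which cannot happen at the lower bound itself.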

Don't worry, I didn't forget about the ceil() operation:

inline int FastCeil(double x)
{
  int i = (int)x;        // truncates toward zero
  return i + ( i < x );  // add 1 if truncation rounded down
}

Like FastFloor(), this code requires as a precondition that the floating-point value is not NaN and lies within the range [-2147483648,2147483647]. If NaNs or out-of-range values are likely to occur, then an isnan() check and/or a range check on the float value can be performed first. The same caveat applies when the result of the math-library floor() is converted to an int.
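A guarded wrapper along these lines might look like the following sketch. CheckedCeil() and its bool-plus-output-parameter interface are assumptions made for illustration. Unlike FastCeil() itself, this version does branch, so it is meant for call sites where bad inputs are a real possibility rather than for inner loops.

```cpp
#include <climits>
#include <cmath>

inline int FastCeil(double x)
{
  int i = (int)x;        // truncates toward zero
  return i + ( i < x );  // add 1 only if truncation rounded down
}

// Hypothetical wrapper: returns false and leaves 'out' untouched if x is
// NaN or outside the range of int; otherwise stores the ceiling in 'out'.
inline bool CheckedCeil(double x, int& out)
{
  // The explicit isnan() test is needed because every ordered comparison
  // with NaN is false, so the range test alone would let NaN through.
  if (std::isnan(x) || x < (double)INT_MIN || x > (double)INT_MAX)
    return false;
  out = FastCeil(x);
  return true;
}
```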