> verified for all 1,056,964,609 finite normal values between 0 and 1, i.e. one full period

> Impressive

No, the right word is useful. It only takes a minute on my ye olde Intel Core i5 7200U laptop, even without vectorization. (And if we count 0 and 1 both, then it is 1,056,964,610 tests.)
IEEE 754 Binary32, aka float, is what microcontrollers with hardware support for floats implement. A float has:
- Two zeroes: -0.0 and +0.0. These compare equal.
- 16,777,214 subnormal values, 8,388,607 positive and 8,388,607 negative. Their magnitude is less than 2^-126 ≃ 1.1754944e-38, so they are all essentially zero. Many math libraries do not support subnormals, and instead treat them as zero.
- Two infinities: -INFINITY and +INFINITY.
- 16,777,214 NaNs, 8,388,607 positive and 8,388,607 negative. These compare unequal to everything, even themselves. If they are used in an arithmetic operation, the result will also be NaN.
- 4,261,412,864 normal values, 2,130,706,432 positive and 2,130,706,432 negative. Zero is not counted as a normal value. 1,056,964,608 of these are less than 1 in magnitude, and 1,073,741,823 are greater than 1 in magnitude.
Because the sign is a separate bit in float (that you can access with signbit(x) in C), you only need to test positive zero and the positive normals. Thus, there are only 2,130,706,433 unique float values to test. Or, if you are only interested in 0.0f <= x <= 1.0f, as is the case here, 1,056,964,610 unique values.
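If you want to confirm these counts yourself, a brute-force sketch along the following lines (standard C, punning the bit patterns through memcpy() and classifying them with fpclassify()) visits every 32-bit pattern once; the last count plus positive zero gives the 1,056,964,610 figure above:

    #include <inttypes.h>
    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        uint64_t zeros = 0, subnormals = 0, infinities = 0, nans = 0, normals = 0;
        uint64_t pos_normals_le_one = 0;   /* positive normals with x <= 1.0f */
        uint32_t u = 0;

        do {                               /* visits every 32-bit pattern exactly once */
            float f;
            memcpy(&f, &u, sizeof f);
            switch (fpclassify(f)) {
            case FP_ZERO:      zeros++;      break;
            case FP_SUBNORMAL: subnormals++; break;
            case FP_INFINITE:  infinities++; break;
            case FP_NAN:       nans++;       break;
            default:           normals++;
                               if (!signbit(f) && f <= 1.0f)
                                   pos_normals_le_one++;
            }
        } while (++u != 0);

        printf("zeros = %" PRIu64 ", subnormals = %" PRIu64 ", infinities = %" PRIu64 "\n",
               zeros, subnormals, infinities);
        printf("NaNs = %" PRIu64 ", normals = %" PRIu64 "\n", nans, normals);
        printf("positive normals <= 1.0f: %" PRIu64 "\n", pos_normals_le_one);
        return 0;
    }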
On architectures that support IEEE 754 Binary32 and store float in the same byte order as uint32_t, the storage representation of positive zero is 0x00000000 (and negative zero is 0x80000000), and the storage representation of positive normals ranges from 0x00800000 to 0x7F7FFFFF. +1.0f is represented as 0x3F800000.
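A minimal sketch to confirm those representations on such an architecture (FLT_MIN and FLT_MAX are the smallest and largest positive normals):

    #include <float.h>
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Return the storage representation of a float, assuming it shares
       its byte order with uint32_t. */
    static uint32_t float_bits(float f)
    {
        uint32_t u;
        memcpy(&u, &f, sizeof u);
        return u;
    }

    int main(void)
    {
        printf("+0.0f   -> 0x%08" PRIX32 "\n", float_bits(+0.0f));   /* 0x00000000 */
        printf("-0.0f   -> 0x%08" PRIX32 "\n", float_bits(-0.0f));   /* 0x80000000 */
        printf("FLT_MIN -> 0x%08" PRIX32 "\n", float_bits(FLT_MIN)); /* 0x00800000 */
        printf("+1.0f   -> 0x%08" PRIX32 "\n", float_bits(+1.0f));   /* 0x3F800000 */
        printf("FLT_MAX -> 0x%08" PRIX32 "\n", float_bits(FLT_MAX)); /* 0x7F7FFFFF */
        return 0;
    }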
If you construct a function that checks your approximation for some float x and records any deviation from the correct value, you can use a very simple loop:
    for (float x = 0.0f; isfinite(x); x = nexttowardf(x, 1e64)) check(x);

or use the storage representation:
    union { uint32_t u32; float f; } x;
    check(0.0f);
    for (x.u32 = 0x00800000; x.u32 < 0x7F800000; x.u32++) check(x.f);

If you are only interested in finite normals up to 1.0f, then use
    for (float x = 0.0f; x <= 1.0f; x = nextafterf(x, +HUGE_VALF)) check(x);

or use the storage representation:
    union { uint32_t u32; float f; } x;
    check(0.0f);
    for (x.u32 = 0x00800000; x.u32 <= 0x3F800000; x.u32++) check(x.f);

While the above loops check the values in increasing order, the order obviously doesn't matter.
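For concreteness, here is a minimal sketch of such a check() function; my_approx() and reference() are hypothetical placeholders (here, multiplication by a reciprocal constant versus the division it is meant to replace, which happens to be the use case discussed next):

    #include <math.h>
    #include <stdio.h>

    /* Hypothetical approximation under test and its reference; substitute your own. */
    static float my_approx(float x)  { return x * (1.0f / 3.0f); }
    static float reference(float x)  { return x / 3.0f; }

    static unsigned long deviations = 0;
    static float worst_err = 0.0f, worst_x = 0.0f;

    static void check(float x)
    {
        const float expected = reference(x);
        const float got = my_approx(x);
        if (got != expected) {
            const float err = fabsf(got - expected);
            deviations++;
            if (err > worst_err) { worst_err = err; worst_x = x; }
        }
    }

    int main(void)
    {
        for (float x = 0.0f; x <= 1.0f; x = nextafterf(x, +HUGE_VALF))
            check(x);
        printf("%lu deviations; worst at x = %.9g (error %.9g)\n",
               deviations, (double)worst_x, (double)worst_err);
        return 0;
    }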
If you are interested in finding out whether you can replace a float division by a constant with a float multiplication by the reciprocal constant, you can use the above to check. If you need it to yield the exact same answer as the division for an integer divisor, you can use the Markstein approach to turn the division into one multiplication and two fused multiply-adds (fmaf); my pseudocode there tells you how often a given integer divisor will yield a value different from the division, across all normal positive float dividends.
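The following is only a sketch of that shape (not the pseudocode referred to above), with 3 as an example divisor; whether the result matches the division for every normal positive dividend depends on the divisor, which is exactly what such a count tells you:

    #include <math.h>

    /* Division by a constant via one multiplication and two fused multiply-adds.
       The divisor 3.0f is only an example; c is its rounded reciprocal. */
    static inline float div_by_3(float x)
    {
        const float d = 3.0f;
        const float c = 1.0f / d;        /* rounded reciprocal, a compile-time constant */
        const float q = x * c;           /* initial estimate of x / d */
        const float r = fmaf(-d, q, x);  /* residual x - d*q, with a single rounding */
        return fmaf(r, c, q);            /* corrected quotient */
    }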
To leverage vector extensions using GCC (or Clang), you only need to declare a suitable type, say

    typedef float float4 __attribute__((vector_size (4 * sizeof (float))));

and the basic arithmetic operators (+, -, *, /, negation, ^ (XOR), | (OR), & (AND), ~ (binary NOT), % (modulus or remainder)) will do the operation element-wise. You can also access such a vector as an array. If you use a scalar as the other operand for +, -, *, /, ^, |, &, or %, GCC will automatically convert the scalar to a vector with all components set to the same value.
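A minimal example of what that looks like in practice (the type name float4 is just a convention):

    #include <stdio.h>

    typedef float float4 __attribute__((vector_size (4 * sizeof (float))));

    int main(void)
    {
        const float4 a = { 1.0f, 2.0f, 3.0f, 4.0f };
        const float4 b = { 0.5f, 0.25f, 0.125f, 0.0625f };
        const float4 c = a * b + 1.0f;   /* element-wise multiply; the scalar 1.0f is broadcast */

        for (int i = 0; i < 4; i++)      /* components can be read like array elements */
            printf("c[%d] = %g\n", i, (double)c[i]);
        return 0;
    }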
If you dynamically allocate vector types, you should use aligned_alloc(_Alignof (vectortype), (size_t)count * sizeof (vectortype)), because malloc()/calloc()/realloc() may not enforce sufficient alignment for hardware-supported vector types. It is annoying, yes, but the compiler developers insist this is according to the C standards. (The same also affects libquadmath.)
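For example, a small allocation helper along those lines (the function name is mine; note that C11 aligned_alloc() wants the size to be a multiple of the alignment, which holds here because sizeof (float4) is itself a multiple of its alignment):

    #include <stdlib.h>

    typedef float float4 __attribute__((vector_size (4 * sizeof (float))));

    /* Allocate room for 'count' vectors with the alignment the vector type requires. */
    static float4 *alloc_float4(size_t count)
    {
        return aligned_alloc(_Alignof (float4), count * sizeof (float4));
    }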
Conversion from a vector type to another vector type newvectortype with the same number of components is done using __builtin_convertvector(vector, newvectortype). It behaves exactly like casting each component separately to the element type of newvectortype.
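For example, converting a float vector to a 32-bit integer vector (the type and function names are mine):

    #include <stdint.h>

    typedef float   float4 __attribute__((vector_size (4 * sizeof (float))));
    typedef int32_t int4   __attribute__((vector_size (4 * sizeof (int32_t))));

    /* Each component is converted exactly as (int32_t)f would be, i.e. truncated toward zero. */
    static inline int4 float4_to_int4(float4 f)
    {
        return __builtin_convertvector(f, int4);
    }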
This works across all architectures. If the target does not support such a vector type, GCC (and Clang) combines it from smaller vectors, or uses the plain scalar type, and duplicates the operations as many times as necessary. It does not always generate optimal code then (especially for sequences of operations that need more than half the available vector registers), but it will work.
Unfortunately, for vector verification of the results, like comparing two vectors component-wise, you do need to use architecture-specific vector extension built-ins. It is best to check what operations are available on the different architectures you use for verification, and hide the details behind static inline functions. For each function, start with a version that does the checking component by component, so you can easily verify your vectorized and component-by-component checking functions always produce the same results. I like to keep a compile-time switch to select between vectorized and non-vectorized checking for that reason.
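As an illustration of that pattern, here is a sketch that uses x86 SSE intrinsics for the vectorized path (on other architectures you would substitute the corresponding built-ins); the macro name USE_VECTOR_CHECK is mine:

    #include <stdbool.h>

    typedef float float4 __attribute__((vector_size (4 * sizeof (float))));

    #if defined(USE_VECTOR_CHECK) && defined(__SSE__)
    #include <xmmintrin.h>
    /* Vectorized path: _mm_cmpeq_ps sets a lane to all ones when the components are
       equal; _mm_movemask_ps gathers the lane sign bits into a 4-bit mask. */
    static inline bool all_equal(float4 a, float4 b)
    {
        return _mm_movemask_ps(_mm_cmpeq_ps((__m128)a, (__m128)b)) == 0xF;
    }
    #else
    /* Component-by-component reference path; both paths must always agree. */
    static inline bool all_equal(float4 a, float4 b)
    {
        for (int i = 0; i < 4; i++)
            if (a[i] != b[i])
                return false;
        return true;
    }
    #endif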
Vectorized checking of piecewise polynomial functions is complicated and funky, so I suggest you don't bother with those until you have a firm grasp of how to handle single-instruction, multiple-data (SIMD) vectors in general.