But adding them up forwards [...] is *still* better (and also makes my_sin_inner() an iteration not a recursion) ...
I assume you verified FLT_EVAL_METHOD==0 (so that float operations are done at float precision and range)?
In that case, the rounding errors in the individual terms happen to cancel "better"!
(If you were to check, you'd find that those 24/28 "incorrect" cases occur for completely different arguments, too. We cannot get rid of the ±1 differences completely without extending precision somehow, because they are due to rounding errors in individual terms.)
The smaller the difference in magnitude between successive terms with alternating signs, the more it helps with domain cancellation errors to sum them in descending order of magnitude. Ascending order of magnitude is usually better when the terms all have the same sign.
I suppose five float terms are too few to really tell which order is truly better here. The 24 or 28 cases out of 65536 are due to rounding errors in individual terms that are not canceled out by rounding errors in other terms.
Now, since the target table sizes are powers of two, I wonder if a simple CORDIC approach, taking phase = x/TWOPI as the argument and using, say, Q.30 fixed-point arithmetic, would yield even better results?
But now I need to test this method against the Horner scheme I usually prefer...
Mine also uses Horner's method. It simply postpones the additions by saving them in temporary variables, and then does the additions from right to left, because that happens to be the order of increasing magnitude.
It is not "mathematically obvious" which order (increasing or decreasing in magnitude) is superior, when the terms have alternating signs. At least, I don't know of any good argument for either.
In case somebody does not recognize the term "domain cancellation" when summing/subtracting floating-point numbers:
Domain cancellation is what happens when you naïvely calculate e.g.
1e100 + 2e-100 - 1e100 - 2e-100. You expect zero, but due to the limited range and precision of the floating-point type, the result you get is
-2e-100: the second term is too small in magnitude to affect the sum (of it and the first term), and this is what we call "domain cancellation". If you use a compensated sum (for a case this extreme, the Kahan–Babuška/Neumaier variant rather than plain Kahan), or sum the terms in descending order of magnitude, you get the expected zero result.
That sum is a contrived example, where the terms cancel out exactly; in a real-life set or series, they only cancel out
somewhat. With descending order of magnitude, at some point a new term becomes too small in magnitude to affect the accumulated sum, and is simply lost. With increasing order of magnitude, the small terms are accumulated first, so they are accounted for in the temporary sum before the large terms dominate.
When all the terms have the same sign, summing in order of increasing magnitude keeps the temporary sum as close in magnitude to the next term to be added as possible.
Summing the negative and positive terms separately would lead to
maximum domain cancellation when adding the two sums together, so that's no good.