But adding them up forwards [...] is *still* better (and also makes my_sin_inner() an iteration not a recursion) ...
I assume you verified FLT_EVAL_METHOD==0 (so that float operations are done at float precision and range)?
In that case, the rounding errors in the individual terms happen to cancel "better"!
(If you were to check, you'd find that those 24/28 "incorrect" cases occur for completely different arguments, too. We cannot get rid of the ±1 differences completely without extending precision somehow, because they are due to rounding errors in individual terms.)
The smaller the difference in magnitude between successive terms with alternating signs, the more it helps with domain cancellation errors to sum them in descending order of magnitude. Ascending order of magnitude is usually better when the terms all have the same sign.
I suppose five float terms are too few to really tell which order is truly better here. The 24 or 28 cases out of 65536 are due to rounding errors in individual terms that are not canceled out by rounding errors in other terms.
Now, since the target table sizes are powers of two, I wonder if a simple CORDIC approach, taking phase = x/TWOPI as the argument and using, say, Q.30 fixed-point arithmetic, would yield even better results?
But now I need to test this method against the Horner scheme I usually prefer...
Mine also uses Horner's method. It simply postpones the additions by saving them in temporary variables, and then does the additions from right to left, because that happens to be the order of increasing magnitude.
It is not "mathematically obvious" which order (increasing or decreasing in magnitude) is superior, when the terms have alternating signs. At least, I don't know of any good argument for either.
In case somebody does not recognize the term "domain cancellation" when summing/subtracting floating-point numbers:
Domain cancellation is what happens when you naïvely calculate e.g.
1e100 + 2e-100 - 1e100 - 2e-100. You expect zero, but due to the limited range and precision of the floating-point type, the result you get is
-2e-100: the second term is too small in magnitude to affect the sum (of it and the first term), and this is what we call "domain cancellation". If you use a compensated sum (for a case this extreme, the Kahan–Babuška/Neumaier variant rather than plain Kahan), or sum the terms in descending order of magnitude, you get the expected zero result.
That sum is a contrived example, where the terms cancel out exactly; in a real-life set or series, they only cancel out
somewhat. With descending order of magnitude, at some point a new term becomes too small in magnitude to affect the accumulated sum, and is simply lost. With increasing order of magnitude, the small terms are accumulated first, so they are accounted for in the temporary sum before the large terms dominate.
When all the terms have the same sign, summing in order of increasing magnitude keeps the temporary sum as close in magnitude to the next term to be added as possible.
Summing the negative and positive terms separately would lead to
maximum domain cancellation when adding the two sums together, so that's no good.