Author Topic: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI  (Read 8601 times)


Online T3sl4co1l

  • Super Contributor
  • ***
  • Posts: 21994
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #75 on: December 26, 2021, 03:54:24 am »
Yeah, note that lg2(x) can be reduced by first counting the maximum set bit position (which is simply the integer part of the logarithm; some ISAs even have an instruction for this), then computing on the range [1.0, 2.0).  Also, presumably this is fixed point, since floating point already gives you the first part, and libraries cover the rest?
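
For instance, a minimal fixed-point sketch of that reduction (my own illustration, assuming a Q16.16 input format and GCC's __builtin_clz; neither is from the thread):

Code: [Select]
#include <stdint.h>

/* Split log2 of a nonzero Q16.16 fixed-point value into its integer part
   (the bit position) and a mantissa normalized to [1.0, 2.0) in Q1.31,
   ready to be fed to a polynomial. */
static int32_t log2_split(uint32_t x_q16, uint32_t *mant_q31)
{
    int top = 31 - __builtin_clz(x_q16);  /* position of the highest set bit */
    int32_t int_part = top - 16;          /* Q16.16 puts 1.0 at bit 16       */

    /* Shift the leading 1 up to bit 31: mantissa in [1.0, 2.0) as Q1.31. */
    *mant_q31 = x_q16 << (31 - top);
    return int_part;
}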

Alternatively, if you only have basic arithmetic handy, just use Feynman's algorithm or the like; it computes the log directly, in not much more time than a few longhand multiplications or divisions.  Polynomials are only attractive when fast hardware multiplication is available, with bonus points if division is also available.

It's interesting that polynomial approximations are notoriously poor here (the terms of the (natural) series expansion go as x^n/n); a rational series is a better fit (i.e. something like 1 - 1/x - 1/x^2 - 1/x^3 + ..., or some combination of p(x)/q(x) as a polynomial ratio, or whatever form you wish to represent it as).  Err, what is such a series even called?  I'm sure I've seen it before but I can't recall...  Not quite Padé approximants, though those generate similar expressions.

Also, offhand, a hybrid approach seems better..?
\[ 1 - \frac{1}{x} + \frac{(x-1)^2}{3} - \frac{(x-1)^3}{7.5} \]
does surprisingly well.  log has an infinite repeated pole at 0 (in effect; I think? But the answer from complex analysis is of course more interesting than a repeated pole, i.e., branch cut stuff) but if you aren't modeling arbitrarily close to 0, that really doesn't mean much to you, and any polynomial will do.

I can probably do a lot more with this.  The above was just poking some numbers into WolframAlpha; perhaps for every positive power added, a complementary negative power gets added, too?  Hmm...

And as usual -- if division is impractical on your platform, disregard rational approaches; a higher-order Taylor series (or better-optimized alternative) will simply perform better.  And for that, we can use whatever transforms and tweaks as above.

Tim
« Last Edit: December 26, 2021, 03:59:25 am by T3sl4co1l »
Seven Transistor Labs, LLC
Electronic design, from concept to prototype.
Bringing a project to life?  Send me a message!
 

Offline emece67

  • Frequent Contributor
  • **
  • !
  • Posts: 614
  • Country: 00
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #76 on: December 26, 2021, 10:35:10 am »
.
« Last Edit: August 19, 2022, 04:55:42 pm by emece67 »
 

Offline Picuino

  • Super Contributor
  • ***
  • Posts: 1022
  • Country: es
    • Picuino web
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #77 on: December 26, 2021, 11:38:40 am »
Here is a real log(x) implementation in C:

Code: [Select]
/*
   log returns the natural logarithm of its floating
   point argument.
   The coefficients are #2705 from Hart & Cheney. (19.38D)
   It calls frexp.
*/
#include <errno.h>
#include "cmath.h"

static const double p0 = -0.240139179559210510e2;
static const double p1 =  0.309572928215376501e2;
static const double p2 = -0.963769093368686593e1;
static const double p3 =  0.421087371217979714e0;
static const double q0 = -0.120069589779605255e2;
static const double q1 =  0.194809660700889731e2;
static const double q2 = -0.891110902798312337e1;

double log(double arg)
{
   double x,z, zz, temp;
   int exp;

   if(arg <= 0.0)
   {
      errno = EDOM;
      return(-HUGE);
   }

   x = frexp(arg, &exp);
 
   if(x < INV_SQRT2)
   {
      x *= 2;
      exp--;
   }

   z = (x-1)/(x+1);
   zz = z*z;
   temp = ((p3*zz + p2)*zz + p1)*zz + p0;
   temp = temp/(((1.0*zz + q2)*zz + q1)*zz + q0);
   temp = temp*z + exp*LN2;
   return(temp);
}

double log10(double arg)
{
   return(log(arg)*INV_LN10);
}

It applies the transformation z = (x-1)/(x+1) before computing the polynomial.

The interval of approximation is [0.7071067, 1.4142135], i.e. [sqrt(1/2), sqrt(2)].
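
For reference, this substitution works because of a standard identity (not from the posted code, but it explains the coefficient values): with z = (x-1)/(x+1),
\[ \ln x = 2\,\operatorname{artanh}(z) = 2\left(z + \frac{z^3}{3} + \frac{z^5}{5} + \cdots\right) \]
so only odd powers of z appear, and fitted coefficients come out near 2, 2/3, 2/5, ... with the minimax adjustment absorbed into the higher terms.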


Edit:
Book Hart & Cheney https://books.google.es/books/about/Computer_Approximations.html?id=jPVQAAAAMAAJ&redir_esc=y&hl=es
« Last Edit: December 26, 2021, 11:55:33 am by Picuino »
 

Offline Picuino

  • Super Contributor
  • ***
  • Posts: 1022
  • Country: es
    • Picuino web
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #78 on: December 26, 2021, 12:38:57 pm »
Approximation of log(x) on the interval [sqrt(2)/2, sqrt(2)]:

Code: [Select]
def log(x):
    a_1 = 2.0000004743492
    a_3 = 0.6664575555535
    a_5 = 0.41516419048219
    z = (x-1)/(x+1)
    zz = z * z
    return z * (a_1 + zz * (a_3 + zz * a_5))

Absolute error < 4e-8


Edit:
Python program to do the calculations:

Code: (Python) [Select]
import matplotlib.pyplot as plt
import numpy as np

def transf(x):
    return (x-1)/(x+1)

num_nodes = 6
interval = [transf(np.sqrt(2)/2), transf(np.sqrt(2))]

def f(x):
    return np.log(-(x+1)/(x-1))

def chebyshev_nodes(n, interval, closed_interval=False):
    """Return n chebyshev nodes over interval [a, b]"""
    i = np.arange(n)
    if closed_interval:
       x = np.cos((2*i)*np.pi/(2*(n-1))) # nodes over closed interval [-1, 1]
    else:
       x = np.cos((2*i+1)*np.pi/(2*n)) # nodes over open interval (-1, 1)
    x = np.flip(x)
    a, b = interval
    return 0.5*(b - a)*x + 0.5*(b + a) # nodes over interval [a, b]

def print_coefs(coefs):
    print("\nCoefficients:")
    for i in range(len(coefs)):
        print(f"    a_{i} = {coefs[i]:.14g}")

def tics(interval, numtics):
    a, b = interval
    return np.linspace(a, b, numtics)

def plot_func(x, y, err_abs, err_rel):
    fig, ax = plt.subplots(3)
    fig.subplots_adjust(left=0.1, bottom=0.05, right=0.95, top=0.95, wspace=None, hspace=0.35)

    ax[0].plot(x, y, linewidth=1)
    ax[0].set_title('Function')
    #ax[0].spines['left'].set_position('zero')
    ax[0].spines['right'].set_color('none')
    ax[0].spines['bottom'].set_position('zero')
    ax[0].spines['top'].set_color('none') 

    ax[1].plot(x, err_abs, linewidth=1)
    ax[1].set_title('Polynomial absolute error')
    #ax[1].spines['left'].set_position('zero')
    ax[1].spines['right'].set_color('none')
    ax[1].spines['bottom'].set_position('zero')
    ax[1].spines['top'].set_color('none')   

    ax[2].plot(x, err_rel, linewidth=1)
    ax[2].set_title('Polynomial relative error')
    #ax[2].spines['left'].set_position('zero')
    ax[2].spines['right'].set_color('none')
    ax[2].spines['bottom'].set_position('zero')
    ax[2].spines['top'].set_color('none')

    plt.show()

def test_errors(interval, poly_coefs, num_dots=1000):
    x_dots = np.linspace(interval[0], interval[1], num_dots)
    y_dots = f(x_dots)
    y_poly_dots = np.polyval(poly_coefs, x_dots)

    # Compute errors
    err_abs = y_dots - y_poly_dots
    err_rel = np.empty(num_dots)  # allocated here, filled element-wise below
    for i in range(len(x_dots)):
        if y_dots[i] == 0:
            if y_poly_dots[i] == 0:
                err_rel[i] = 0.0
            else:
                err_rel[i] = np.inf
        else:
            err_rel[i] = err_abs[i] / y_dots[i]

    # Show errors
    print(f"\nMax absolute error = {max(err_abs):.14g}")
    print(f"Min absolute error = {min(err_abs):.14g}")
    print(f"Max relative error = {max(err_rel):.14g}")
    print(f"Min relative error = {min(err_rel):.14g}")
    print(f"Max polynomial value = {max(y_poly_dots):.14g}")
    print(f"Min polynomial value = {min(y_poly_dots):.14g}")
    plot_func(x_dots, y_dots, err_abs, err_rel)

def main():
    x_dots = chebyshev_nodes(num_nodes, interval, closed_interval=True)
    y_dots = f(x_dots)
    print(f"x nodes = {x_dots}")
    print(f"y values = {y_dots}")
    degrees = np.arange(1, num_nodes, 2)
    poly_coefs = np.polynomial.polynomial.polyfit(x_dots, y_dots, degrees)
    print_coefs(poly_coefs)
    test_errors([interval[0], interval[1]], np.flip(poly_coefs))

main()
« Last Edit: December 26, 2021, 12:47:43 pm by Picuino »
 
The following users thanked this post: emece67

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14911
  • Country: fr
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #79 on: December 26, 2021, 06:27:44 pm »
There is more than one way of doing it, of course, and interestingly, almost the only way shown here so far was some kind of polynomial approximation, which doesn't do all that well for log() unless you use a lot of coefficients.

T3sl4co1l's approach is not polynomial and requires a reciprocal (which may or may not be practical depending on your requirements), but anyway, I tried it and could not make it evaluate to anything close to a log? So could you elaborate?

One typical way of computing log2() is to first extract the integer part of the log2, then work on the fractional part. The integer part can be found using various approaches. But if you're dealing with IEEE 754 FP numbers, and can assume those are normalized, it's actually just a matter of extracting the exponent, which is trivial, and then setting the exponent to 0 to work on the fractional part. Oh yes, manipulating IEEE-754 directly is baddd, but it's actually what most libraries do under the hood.

Then, getting the fractional part can be done using a binary log approach, which allows you to control how many bits of precision you want. The more bits, the longer it takes of course.
Here is a fully functional approach which works on IEEE 754 double-precision numbers; you can of course adapt it to single precision (just adjust the part dealing with the exponent), or use it as a starting point for any other number format (I wrote a variant of this for fixed-point numbers, for instance). Depending on the number of bits of precision, it's not ultra-fast, but it can be made as accurate as you want and only requires FP multiplication, comparison and addition. And it will work for any x.

Code: [Select]
#include <float.h>   /* DBL_MIN */
#include <stdint.h>  /* uint64_t */

double MyLog2(double x)
{
    if (x <= 0.0)
        return DBL_MIN;

    // Assume x is normalized.
    double dX = x;
    uint64_t *px = (uint64_t *) &dX; // Oh yeah, that's not pretty.

    // Integer part.
    int16_t nExp = (int16_t)((*px >> 52) & 0x7FF) - 1023; // IEEE 754 exponent
    double dY = (double) nExp, dB = 0.5;

    // Fractional part.
    *px |= 0x3FF0000000000000; // Set IEEE 754 exponent to 0 (biased 1023)
    *px &= 0xBFFFFFFFFFFFFFFF; // so that dX now lies in [1, 2).

    const unsigned int nFracBits = 32;

    // One fractional bit of the log per squaring: if dX^2 >= 2, the
    // next bit is 1 and dX is halved back into [1, 2).
    for (unsigned int i = 0; i < nFracBits; i++, dB *= 0.5)
    {
        dX = dX * dX;
        if (dX >= 2.0)
        {
            dY += dB;
            dX *= 0.5;
        }
    }

    return dY;
}
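
A quick way to sanity-check it against the C library (a throwaway harness of mine, not part of the original post):

Code: [Select]
#include <math.h>
#include <stdio.h>

/* double MyLog2(double x); -- as defined above */

int main(void)
{
    /* Sample a few decades and compare against libm's log2(). */
    for (double x = 0.001; x < 1.0e6; x *= 17.3)
        printf("x = %12g   MyLog2 = %14.9f   err = %g\n",
               x, MyLog2(x), MyLog2(x) - log2(x));
    return 0;
}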
 

Offline Picuino

  • Super Contributor
  • ***
  • Posts: 1022
  • Country: es
    • Picuino web
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #80 on: December 26, 2021, 07:39:41 pm »
There is more than one way of doing it of course, and interestingly, the almost only way exposed here was some kind of polynomial approximation, which doesn't do all that well for log(), unless you use a lot of coefficients.
It doesn't take many coefficients to get a good approximation to log(x).

With only 3 coefficients you can obtain an approximation with a max relative error of 1.2e-7.

With 6 coefficients you can obtain an approximation with a max relative error of 3e-14:

Code: [Select]
#include <errno.h>
#include <math.h>

#define INV_SQRT2  (0.70710678118654752440084436210)      /* sqrt(1/2) */
#define LN2        (0.69314718055994530941723212146)      /* log(2)    */

static const double a_1 = 1.999999999999945;
static const double a_3 = 0.6666666667970299;
static const double a_5 = 0.3999999486999626;
static const double a_7 = 0.2857216790491805;
static const double a_9 = 0.2217418680272728;
static const double a_11 = 0.1960893014786098;

double logpol(double arg) {
   double x, z, zz, temp;
   int exp;

   if(arg <= 0.0) {
      errno = EDOM;
      return(-HUGE_VAL);   /* HUGE is non-standard; HUGE_VAL comes from <math.h> */
   }

   x = frexp(arg, &exp);
 
   if(x < INV_SQRT2) {
      x *= 2;
      exp--;
   }

   z = (x-1)/(x+1);
   zz = z*z;
   temp = ((((a_11*zz + a_9)*zz + a_7)*zz + a_5)*zz + a_3)*zz + a_1;
   temp = temp*z + exp*LN2;
   return(temp);
}
« Last Edit: December 26, 2021, 07:52:19 pm by Picuino »
 

Online T3sl4co1l

  • Super Contributor
  • ***
  • Posts: 21994
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #81 on: December 26, 2021, 07:49:16 pm »
T3sl4co1l's approach is not polynomial and requires a reciprocal (which may or may not be practical depending on your requirements), but anyway, I tried it and could not make it evaluate to anything close to a log? So could you elaborate?

It wasn't supposed to; well, the display-math one was (fitted to ln(x) from 1 to 2, which... wait, that's not log2 anyway, that should be 1 to e -- well, it was a _very_ basic attempt, you see--).  The other suggested series were merely forms to try and fit coefficients to (emphasis on "something like"!).

I'm not very familiar with the (x-1)/(x+1) substitution, but it sounds like it's doing a much better job, so I'm not too bothered about [not] expanding on my ideas.  It's probably a better form of ultimately the same sort of thing. :-+

Tim
Seven Transistor Labs, LLC
Electronic design, from concept to prototype.
Bringing a project to life?  Send me a message!
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14911
  • Country: fr
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #82 on: December 26, 2021, 08:07:16 pm »
OK no problem!

And yes, what Picuino showed is not polynomial either (in x!).
emece67 tried a polynomial approximation and could see for themselves that it was not the right way of tackling it.

I haven't checked the results myself for Picuino's code - but I do think it's a popular way of doing it (seems to be used in some C lib implementations), so I guess it does the job. But it requires a division, which may cost quite a bit depending on the architecture.

The one I suggested has the benefit of not requiring any division, and being as accurate as you want it to be. It's not as fast obviously. For the 32-bit fractional version, it's on average 8 times slower per log2() than the C lib one (GCC on x86_64). Of course most of the time is spent in the loop. There may be ways to optimize it.

Oh,  and you may find the pointer aliasing for extracting the exponent ugly, but if you're using frexp(), as in Picuino's code, then you're going to need to use the math library. And then you may as well use the log2() function directly. :)
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4238
  • Country: nz
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #83 on: December 26, 2021, 10:24:29 pm »
Oh,  and you may find the pointer aliasing for extracting the exponent ugly, but if you're using frexp(), as in Picuino's code, then you're going to need to use the math library. And then you may as well use the log2() function directly. :)

frexp() is very easy to do yourself, without the math library.

Ignoring zero, infinities, and NaNs it's just:

Code: [Select]
double my_frexp(double d, int *exp) {
  uint64_t d_bits = *(uint64_t*)&d;                 // type-pun the double to its bit pattern
  uint64_t exp_mask = 2047ULL << 52;                // the 11 exponent bits (ULL: 2047L overflows a 32-bit long)
  *exp = (int)((d_bits & exp_mask) >> 52) - 1022;   // unbias so that d = mantissa * 2^exp
  d_bits = (d_bits & ~exp_mask) | (1022ULL << 52);  // force the exponent so the mantissa lands in [0.5, 1)
  return *(double*)&d_bits;
}

On my Mac that's just...

Code: [Select]
0000000100003e94 _my_frexp:
100003e94: 08 00 66 9e                  fmov    x8, d0
100003e98: 09 f9 74 d3                  ubfx    x9, x8, #52, #11
100003e9c: 29 f9 0f 51                  sub     w9, w9, #1022
100003ea0: 09 00 00 b9                  str     w9, [x0]
100003ea4: 08 d1 41 92                  and     x8, x8, #0x800fffffffffffff
100003ea8: 08 21 4b b2                  orr     x8, x8, #0x3fe0000000000000
100003eac: 00 01 67 9e                  fmov    d0, x8
100003eb0: c0 03 5f d6                  ret
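
For what it's worth, the same thing can be written without the aliasing cast; a sketch using memcpy (my_frexp2 is just an illustrative name), which compilers typically boil down to the very same register moves:

Code: [Select]
#include <stdint.h>
#include <string.h>

double my_frexp2(double d, int *exp) {
  uint64_t d_bits;
  memcpy(&d_bits, &d, sizeof d_bits);               // well-defined type punning
  const uint64_t exp_mask = 2047ULL << 52;
  *exp = (int)((d_bits & exp_mask) >> 52) - 1022;
  d_bits = (d_bits & ~exp_mask) | (1022ULL << 52);  // exponent field := 1022, mantissa in [0.5, 1)
  memcpy(&d, &d_bits, sizeof d);
  return d;
}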
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14911
  • Country: fr
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #84 on: December 27, 2021, 12:32:25 am »
Oh,  and you may find the pointer aliasing for extracting the exponent ugly, but if you're using frexp(), as in Picuino's code, then you're going to need to use the math library. And then you may as well use the log2() function directly. :)

frexp() is very easy to do yourself, without the math library.

Yes, have you seen my code? :)
Apparently you did not quite get what I meant above.
 

Offline CatalinaWOW

  • Super Contributor
  • ***
  • Posts: 5354
  • Country: us
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #85 on: December 27, 2021, 01:47:37 am »
Excellent thread with many good suggestions.  For those looking for starting points on these types of problems it is hard to do better than "Handbook of Mathematical Functions" by Abramowitz and Stegun.  It has tables of values for all functions normally encountered in mathematics, and suggests rational approximations for varying levels of accuracy.  There is a Dover reprint of the original US Government Printing Office version.
 

Offline emece67

  • Frequent Contributor
  • **
  • !
  • Posts: 614
  • Country: 00
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #86 on: December 27, 2021, 10:03:15 am »
.
« Last Edit: August 19, 2022, 05:58:00 pm by emece67 »
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6526
  • Country: fi
    • My home page and email address
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #87 on: December 27, 2021, 11:13:58 am »
Which, FWIW, are equivalent to fractions of:
3253427/432964489
2779529/16777216
8386063/8388608
Actually, the smallest term is 8068423/1073741824.
Put another way, they are 8068423*2**-30, 2779529*2**-24, and 8386063*2**-23.

If one uses truncating (and not rounding) math, I fully believe that the exact best solution is very near the Remez coefficients, but just some ULPs away, due to the quantization to single-precision floating-point and rounding differences.



If anyone is wondering:

When approximating logarithms, we are relying on the key observation: logb(x·y) = logb(x) + logb(y).
Although the base b=2 makes the most sense when dealing with binary-radix floating point numbers, it extends to any base via loga(x) = logb(x)/logb(a).  And since log2(e) ≃ 1.4426950408889634073599246810 and log2(10) ≃ 3.32192809488736234787031942948939, any other base is just one division by a constant away (or one multiplication by its reciprocal: ln(2) ≃ 0.6931471805599453 and log10(2) ≃ 0.3010299956639812), so we don't even need to worry about natural or base-10 logarithms; we get them basically for free if we handle base 2.
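
Concretely (a trivial sketch of mine; M_LN2 is the POSIX <math.h> constant for ln(2), not strict ISO C):

Code: [Select]
#include <math.h>

/* Given log2(x), the other common logarithms are one multiply away. */
static inline double ln_from_log2(double log2x)    { return log2x * M_LN2; }              /* ln(2)    ≃ 0.6931 */
static inline double log10_from_log2(double log2x) { return log2x * 0.3010299956639812; } /* log10(2) ≃ 0.3010 */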

Because log2(2) = 1 and log2(1/2) = -1, we can multiply the argument x by any integer power of two 2^y, and use log2(x) = log2(x·2^y) - y.  This is why we only really need to handle a small range of arguments.

The reason 0.7071 ≃ sqrt(1/2) ≤ x ≤ sqrt(2) ≃ 1.4142 is chosen as the range is that 2·sqrt(1/2) = sqrt(2) (and 1/sqrt(2) = sqrt(1/2), and 1/sqrt(1/2) = sqrt(2)).  That is, if you have a positive real number r, you can always multiply it by an integer power of two, 2^y where y is an integer, to get sqrt(1/2) ≤ r·2^y ≤ sqrt(2).  Of course you can pick an even larger range to approximate –– just like you don't need to restrict to |x| ≤ PI/2 for approximating sine –– but the approximation will be more complex than it has to be.

What I am wondering myself, is whether absolute or relative error is the "better" error metric here. log2(sqrt(1/2)) = -1/2, and log2(sqrt(2)) = +1/2, with log2(1) = 0.  Also, log2(x) = -log2(1/x), so we definitely have symmetry here.
 

Offline emece67

  • Frequent Contributor
  • **
  • !
  • Posts: 614
  • Country: 00
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #88 on: December 27, 2021, 01:54:09 pm »
.
« Last Edit: August 19, 2022, 04:56:03 pm by emece67 »
 

Offline Picuino

  • Super Contributor
  • ***
  • Posts: 1022
  • Country: es
    • Picuino web
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #89 on: December 27, 2021, 02:04:43 pm »
Here are some interesting links about the CORDIC method:
Elementary functions for embedded systems
https://www.hpl.hp.com/hpjournal/72jun/jun72a2.pdf
https://www.quinapalus.com/efunc.html

Does anybody know a good CORDIC algorithm for a single-precision log(x) function?
 

Offline Picuino

  • Super Contributor
  • ***
  • Posts: 1022
  • Country: es
    • Picuino web
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #90 on: December 27, 2021, 02:11:43 pm »
A program to generate CORDIC tables for the sine and cosine functions, for several word precision lengths:
http://www.dcs.gla.ac.uk/~jhw/cordic/
 

Offline emece67

  • Frequent Contributor
  • **
  • !
  • Posts: 614
  • Country: 00
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #91 on: December 27, 2021, 02:45:07 pm »
.
« Last Edit: August 19, 2022, 04:56:12 pm by emece67 »
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6526
  • Country: fi
    • My home page and email address
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #92 on: December 27, 2021, 03:13:02 pm »
Another "CORDIC"-like approach to logarithm calculation is Feynman's algorithm, which uses the fact that every real \$x\$, \$1 \lt x \lt 2\$, can be approximated with
$$x \approx \prod_{k=1}^{N} \left(1 + 2^{-k}\right)^{B_k} , \quad B_k \in \{0, 1\}$$
where \$B_k\$ is the \$k\$'th fractional bit in \$x\$, and therefore
$$\log(x) = \sum_{k=1}^{N} B_k \log\left(1 + 2^{-k}\right)$$
Since single-precision floating point (IEEE 754 Binary32) has 24 significant bits in its mantissa, \$N = 24\$ should suffice.
That is, we only need a table of 24 single-precision constants, the values \$\log(1 + 2^{-k})\$ for \$k = 1 \ldots 24\$.

This is particularly well suited for IEEE 754 binary floating point types.
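
A minimal double-precision sketch of that greedy factoring (my own illustration, not from the thread; in fixed point the multiplication by (1 + 2^-k) becomes a shift and an add, which is the whole attraction of the method):

Code: [Select]
#include <math.h>

#define N 24

/* Table of log(1 + 2^-k), k = 1..N; call build_ln_tab() once first.
   In a real implementation these would be precomputed constants in ROM. */
static double ln_tab[N + 1];

static void build_ln_tab(void)
{
    for (int k = 1; k <= N; k++)
        ln_tab[k] = log(1.0 + ldexp(1.0, -k));
}

/* Feynman-style ln(x) for 1 <= x < 2: greedily build up a product of
   (1 + 2^-k) factors that approaches x, summing the matching log terms. */
static double feynman_ln(double x)
{
    double p = 1.0;       /* running product of accepted factors */
    double result = 0.0;  /* running sum of accepted log terms   */

    for (int k = 1; k <= N; k++) {
        double trial = p + ldexp(p, -k);  /* p * (1 + 2^-k): shift and add */
        if (trial <= x) {
            p = trial;
            result += ln_tab[k];
        }
    }
    return result;  /* relative error on the order of 2^-N */
}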
« Last Edit: December 27, 2021, 03:28:42 pm by Nominal Animal »
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14911
  • Country: fr
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #93 on: December 28, 2021, 12:22:51 am »
What I am wondering myself, is whether absolute or relative error is the "better" error metric here.

Yeah. I think that would really depend on what you're using the function for.
That said, as always: before using any kind of "expensive" function or operation, make sure you really need it. Sometimes changing the computation approach lets you get rid of it entirely. Probably obvious for most people here, but I've seen too many people use log() or sqrt() when they could definitely have done without them.
 

Online T3sl4co1l

  • Super Contributor
  • ***
  • Posts: 21994
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #94 on: December 28, 2021, 03:06:05 am »
Or the really low-hanging fruits, like integer-argument pow()... :D

Speaking of, anyone know how often GCC or other compilers are able to simplify those?  I don't know offhand whether it's smart enough to spot an integer (or say within 1 ULP of an integer cast to float) literal and produce alternative code, or if it really matters much/at all (because the argument has to be cast to float and no assumptions can be made), or if it can even end-run around the semantics and detect that an integer literal has been used?  I would assume the latter, not so much, but all are reasonable to varying degrees...

Tim
Seven Transistor Labs, LLC
Electronic design, from concept to prototype.
Bringing a project to life?  Send me a message!
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14911
  • Country: fr
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #95 on: December 28, 2021, 03:25:07 am »
As far as I know, if you pass an integer to a function which has an FP argument, the integer will get converted to FP, and then the function will get called. Due to the conversion rules in C, I don't think a compiler would be allowed to optimize this out - at least not by substituting an integer version of the function, for instance.

Now you're talking about literals - meaning you have in mind calling such functions with literals (constants). So, in general, would the value of the literal matter? Could the compiler even compute the result at compile-time?

Now feel free to experiment. :)

A small test showed that common math functions seem to be evaluated at compile time if they are called with a literal argument. For instance, the simple:
Code: [Select]
#include <math.h>

double Foo(double x)
{
        return sqrt(2)*x;
}

Yields:
Code: [Select]
Foo:
        .seh_endprologue
        mulsd   .LC0(%rip), %xmm0
        ret
        .seh_endproc
        .section .rdata,"dr"
        .align 8
.LC0:
        .long   1719614413
        .long   1073127582

Whether that works with all standard math functions, I have no clue!
 

Offline brucehoult

  • Super Contributor
  • ***
  • Posts: 4238
  • Country: nz
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #96 on: December 28, 2021, 03:38:05 am »
Certainly easy enough to experiment...

https://godbolt.org/z/xcaPnq5YW

I get the same results on this independent of ISA or gcc/clang.
 

Online T3sl4co1l

  • Super Contributor
  • ***
  • Posts: 21994
  • Country: us
  • Expert, Analog Electronics, PCB Layout, EMC
    • Seven Transistor Labs
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #97 on: December 28, 2021, 04:58:27 am »
Certainly easy enough to experiment...

https://godbolt.org/z/xcaPnq5YW

I get the same results on this independent of ISA or gcc/clang.

Ahh, interesting that it simplifies 2, but not 3...

Ah, that makes sense, -funsafe-math-optimizations will generate the [naively] expected code (but it might not distribute errors exactly the way you like, hence the name of the option).  I suppose since squaring is a single binary operation (x * x), order doesn't matter, but for other powers it does.  Also works for negative powers (emitting a division op, and whatever muls it takes to get there).
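
To illustrate the kind of expansion it does (a hand-written sketch of exponentiation by squaring, not any compiler's actual output):

Code: [Select]
/* pow(x, 5) expanded by squaring: three multiplies instead of a libm call.
   Note the roundings differ from a correctly-rounded pow(), which is
   exactly why this lives behind -funsafe-math-optimizations. */
static inline double pow5(double x)
{
    double x2 = x * x;    /* x^2 */
    double x4 = x2 * x2;  /* x^4 */
    return x4 * x;        /* x^5 */
}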

Tim
Seven Transistor Labs, LLC
Electronic design, from concept to prototype.
Bringing a project to life?  Send me a message!
 

Offline Nominal Animal

  • Super Contributor
  • ***
  • Posts: 6526
  • Country: fi
    • My home page and email address
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #98 on: December 28, 2021, 05:19:25 am »
Since binary logarithm requires a large-ish number of squarings, I suppose the alternative is a piecewise cubic approximation:
Code: [Select]
#include <math.h>
#define   LOG2_PARTS  256
#define   LOG2_ERROR  -HUGE_VALF

float log2_poly[LOG2_PARTS][4];

float log2_approx(float x)
{
    if (x <= .0f)
        return LOG2_ERROR;

    int  p, i;

    x = 2 * LOG2_PARTS * (frexpf(x, &p) - 0.5f);
    i = (int)x;
    x -= (float)i;

    float  result = x * log2_poly[i][3];
    result += log2_poly[i][2];
    result *= x;
    result += log2_poly[i][1];
    result *= x;
    result += log2_poly[i][0];

    return result + p;
}
If we use Cn as shorthand for log2_poly[i][n], then i = 0..N-1 is the piece 0.5+i/(2N) ≤ x < 0.5+(i+1)/(2N), C0 is the base-2 logarithm at the lower boundary, and C0+C1+C2+C3 is the base-2 logarithm at the upper boundary, approximately.  The individual polynomial coefficients can be calculated using e.g. the Remez algorithm.  N = LOG2_PARTS is the number of polynomials in the entire interval 0.5 ≤ x < 1, and is one of the parameters we can control to tune the fit.  For obvious reasons, powers of two N are preferable if we need to extract the mantissa bits from the IEEE 754 Binary32 storage representation.

Of course, each polynomial takes 4×4 bytes = 16 bytes of ROM/Flash (4 KiB of tables for N = 256), so this is just another method of trading ROM/Flash for speed/accuracy, up to a limit.
« Last Edit: December 28, 2021, 11:35:05 am by Nominal Animal »
 

Offline hamster_nz

  • Super Contributor
  • ***
  • Posts: 2806
  • Country: nz
Re: Compact sin(x) which gets to within 0.1%, x = 0 to 2xPI
« Reply #99 on: December 28, 2021, 11:07:32 am »
Late to the party, but I knocked up a CORDIC implementation for you. It generates the tables needed; in an actual implementation the tables can be initialized as constants rather than by calling build_table(), avoiding all floating-point operations.

mysin() takes a 32-bit integer representing the phase angle (0 = 0 degrees, 0x40000000 = 90 degrees, and so on) and returns the sine at that point. Adjust the value of "start" to scale the output range - the "CORDIC gain" is about 1.647, so the current value of 19813 gives a full scale of +/-32627.

Here's the output:

Code: [Select]
$ ./cordic
table[0] is 536870912
table[1] is 316933405
table[2] is 167458907
table[3] is 85004756
table[4] is 42667331
table[5] is 21354465
table[6] is 10679838
table[7] is 5340245
table[8] is 2670163
table[9] is 1335086
table[10] is 667544
table[11] is 333772
table[12] is 166886
table[13] is  83443
table[14] is  41721
table[15] is  20860
table[16] is  10430
table[17] is   5215
table[18] is   2607
table[19] is   1303
table[20] is    651
table[21] is    325
table[22] is    162
table[23] is     81

Actual  Calced  Error
     0      0    0
  6365   6365    0
 12485  12486    1
 18126  18126    0
 23070  23070    0
 27128  27128    0
 30143  30143    0
 32000  32000    0
 32627  32627    0
 32000  32000    0
 30143  30143    0
 27128  27128    0
 23070  23071    1
 18126  18126    0
 12485  12486    1
  6365   6365    0
     0      0    0
 -6365  -6365    0
-12485 -12486   -1
-18126 -18126    0
-23070 -23070    0
-27128 -27128    0
-30143 -30143    0
-32000 -32000    0
-32627 -32627    0
-32000 -32000    0
-30143 -30143    0
-27128 -27128    0
-23070 -23070    0
-18126 -18126    0
-12485 -12486   -1
 -6365  -6365    0

Here's the code:

Code: [Select]
#include <stdio.h>
#include <stdint.h>
#include <math.h>


static uint32_t table[24] = {0};

#define TABLE_SIZE (sizeof(table)/sizeof(table[1]))

int build_table(void) {
   double total = 0;
   for(int i = 0; i < TABLE_SIZE; i++) {
      double angle = pow(2.0,-i);
      double v = atan(angle);
      v /= 2*M_PI;
      v *= (1<<30);
      v *= 4;
      table[i] = (int)v;
      printf("table[%d] is %6d\n", i, table[i]);
      total += v;
   }
   return 0;
}

int32_t mysin(int32_t in) {
   int32_t start = 19813*16;
   int32_t out_s = 0;
   int32_t out_c = 0;
   // Get 'in' to within +/- PI/2
   if(in > 0x40000000) {
     out_c = -start;
     in -= 0x80000000;
   } else if(in <= -0x40000000) {
     out_c = -start;
     in += 0x80000000;
   } else {
     out_c = start;
   }
   for(int i = 0; i < TABLE_SIZE; i++) {
      int32_t next_c, next_s;
      if(in > 0) {
        in -= table[i];
        next_c = out_c - (out_s>>i);
        next_s = out_s + (out_c>>i);
      } else {
        in += table[i];
        next_c = out_c + (out_s>>i);
        next_s = out_s - (out_c>>i);
      }
      out_s = next_s;
      out_c = next_c;
   }
   return out_s/16;
}

int main(int argc, char *argv[]) {
   build_table();
   int test_points = 32;
   printf("\nActual  Calced  Error\n");
   for(int32_t i = 0; i < test_points; i++) {
      int actual =  (int)(sin(i*M_PI*2/test_points)*32627);
      int calc   =  mysin(0x10000000/test_points*16*i);
      printf("%6d %6d %4d\n", actual, calc, calc-actual);
   }
}
« Last Edit: December 28, 2021, 11:10:38 am by hamster_nz »
Gaze not into the abyss, lest you become recognized as an abyss domain expert, and they expect you keep gazing into the damn thing.
 
The following users thanked this post: peter-h

