15k LE generated just for that operation. Whoops!
eheheh, I had the same shock when I tried to implement a modified version of "bkm".
I have been researching a good way to calculate the complex exponential of a complex number, since the result is interesting!
ans=cmplxexp(x0.re, x0.im)
With a purely imaginary argument (x0.re = 0), you get a pair of trigonometric functions, { cos(phi), sin(phi) }
With a purely real argument (x0.im = 0), you get the real exponential
Combining them you can get the hyperbolic functions, { cosh(phi), sinh(phi) }
Manipulating them you can get the square root, logarithm, tan, tanh, etc, etc
It's a very powerful block of math which offers a lot of function-implementations!
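To make the list above concrete, here is a small floating-point reference sketch in Python. It is not the BKM hardware algorithm. It uses the standard library's cmath.exp as a stand-in for cmplxexp, and the variable names (phi, cos_phi, etc.) are only for illustration.

```python
import cmath
import math

def cmplxexp(re, im):
    """Reference complex exponential exp(re + j*im), via the standard library."""
    return cmath.exp(complex(re, im))

phi = 0.5

# Purely imaginary argument: exp(j*phi) = cos(phi) + j*sin(phi)
e = cmplxexp(0.0, phi)
cos_phi, sin_phi = e.real, e.imag

# Purely real argument: the ordinary real exponential
exp_pos = cmplxexp(phi, 0.0).real
exp_neg = cmplxexp(-phi, 0.0).real

# Hyperbolic pair, built from exp(phi) and exp(-phi)
cosh_phi = (exp_pos + exp_neg) / 2
sinh_phi = (exp_pos - exp_neg) / 2

# Derived functions obtained by simple manipulation
tan_phi = sin_phi / cos_phi
tanh_phi = sinh_phi / cosh_phi

print(cos_phi, sin_phi, cosh_phi, sinh_phi, tan_phi, tanh_phi)
```

In hardware none of this is free, of course: every division and every extra evaluation costs area or cycles.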
But! It's damn complex,
unstable(1), and even though the whole arithmetic is fixed point, QN8.24, the algorithm requires nine huge LUTs full of pre-calculated function values sampled at specific points, plus a lot of correction values to help the algorithm stay stable.
In short it consumes a lot of BRAM.
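For anyone not used to the fixed-point notation, here is a minimal Python sketch of the number format. It assumes QN8.24 means a signed 32-bit word with 8 integer bits (sign included) and 24 fractional bits. The helper names are hypothetical, and 32-bit overflow/wrap-around is not modelled.

```python
FRAC_BITS = 24            # assumed: 24 fractional bits
ONE = 1 << FRAC_BITS      # fixed-point representation of 1.0

def to_q824(x):
    """Convert a float to the assumed Q8.24 integer representation."""
    return int(round(x * ONE))

def from_q824(q):
    """Convert a Q8.24 integer back to a float."""
    return q / ONE

def mul_q824(a, b):
    """Fixed-point multiply: full-width product, then drop 24 fractional bits."""
    return (a * b) >> FRAC_BITS

a = to_q824(1.5)
b = to_q824(2.0)
product = mul_q824(a, b)
print(from_q824(product))
```

The resolution is 2^-24 (about 6e-8), which is also the granularity of every entry stored in the LUTs.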
(1) This problem has been solved recently (though not formally) with an accelerant series that speeds up the convergence at the cost of introducing distortions. You need a control accelerant series to smooth them out, but it is stable, and it's damn fast!
32-bit data size? It takes 32 clock cycles! And you get the result on the 33rd!
Bad news: it also consumes a lot of multipliers and adders, and a lot of logic. Something like 20K LE (definitely too much area!!!), with a maximum speed of 130MHz on my Spartan3E
~ ~ ~ ~ ~ ~
Reading papers, I discovered that Intel has a similar technology implemented in their CPUs, starting from the 80487. It was BKM-based, but their papers were never published as they are "industrial secrets". So I wonder: how did they solve the convergence problem? And how did they implement it without wasting an area of silicon that, in my case, on FPGA, would take five times the area of the whole CPU_core?
Intel is a commercial company. There will be no answer to my questions, nor will their modified BKM algorithm be published or explained in detail.
A pity, but that's life
Conclusion:
Since I need to implement the whole softcore, and since it eats resources, I have limited resources left for the Math-(fixedpoint)-CoProcessor. Thus I am implementing a soft version of CORDIC, which computes just the two most used trigonometric functions: { COS, SIN }.
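As an illustration of what such a CORDIC block computes, here is a software model of the textbook rotation-mode CORDIC in fixed point. This is a generic sketch, not my actual HDL. It assumes Q8.24 data, 24 iterations, and an input angle in radians within [-pi/2, +pi/2]. All names are hypothetical.

```python
import math

FRAC_BITS = 24                 # assumed Q8.24: 24 fractional bits
ONE = 1 << FRAC_BITS
ITERATIONS = 24                # roughly one result bit per iteration

# Small LUT: atan(2^-i) in Q8.24, one entry per iteration
ATAN_TABLE = [int(round(math.atan(2.0 ** -i) * ONE)) for i in range(ITERATIONS)]

# The rotations stretch the vector by a known gain; pre-scale x by its inverse
K = 1.0
for i in range(ITERATIONS):
    K /= math.sqrt(1.0 + 2.0 ** (-2 * i))
K_FIX = int(round(K * ONE))

def cordic_cos_sin(phi_fix):
    """Return (cos, sin) in Q8.24 for an angle phi_fix in Q8.24 radians."""
    x, y, z = K_FIX, 0, phi_fix
    for i in range(ITERATIONS):
        if z >= 0:
            # rotate counter-clockwise by atan(2^-i): shifts and adds only
            x, y = x - (y >> i), y + (x >> i)
            z -= ATAN_TABLE[i]
        else:
            # rotate clockwise by atan(2^-i)
            x, y = x + (y >> i), y - (x >> i)
            z += ATAN_TABLE[i]
    return x, y

phi = 0.5
c_fix, s_fix = cordic_cos_sin(int(round(phi * ONE)))
cos_phi = c_fix / ONE
sin_phi = s_fix / ONE
print(cos_phi, sin_phi)
```

The point of the exercise: the loop body needs only shifts, adds and one small table, no multipliers, which is why it fits where the modified BKM does not.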
In the future, I will try to put the modified BKM inside a dedicated FPGA, as if it were an "80487" companion chip