Author Topic: Converting to Cortex M0 ASM  (Read 4357 times)


Online DavidAlfa

  • Super Contributor
  • ***
  • Posts: 6207
  • Country: es
Re: Converting to Cortex M0 ASM
« Reply #25 on: June 20, 2022, 12:03:58 pm »
« Last Edit: June 20, 2022, 12:05:49 pm by DavidAlfa »
Hantek DSO2x1x            Drive        FAQ          DON'T BUY HANTEK! (Aka HALF-MADE)
Stm32 Soldering FW      Forum      Github      Donate
 

Offline ali_asadzadehTopic starter

  • Super Contributor
  • ***
  • Posts: 1929
  • Country: ca
Re: Converting to Cortex M0 ASM
« Reply #26 on: June 20, 2022, 12:33:22 pm »
Thanks David, but there is no saturation math at this link.
ASiDesigner, Stands for Application specific intelligent devices
I'm a Digital Expert from 8-bits to 64-bits
 

Online DavidAlfa

  • Super Contributor
  • ***
  • Posts: 6207
  • Country: es
Re: Converting to Cortex M0 ASM
« Reply #27 on: June 20, 2022, 01:12:51 pm »
Weren't you aiming at decoding MP3 on your M0+ STM32G070?
As explained in lief's minimp3 (which seems to be what you're using), it uses floating point, unlike Keij's minimp3.
Forget mp3 decoding in FP on a M0+! You absolutely need a fixed point decoder.

The Helix MP3 decoder also uses fixed point and needs about 32KB of RAM after adjusting the MP3/audio buffers.
It only needs these asm routines, which are very similar to the ones in the Arduino post.
That said, the author posted only half the code, leaving out the required tables, etc. Better to post nothing at all!
Code: [Select]
#elif defined(ARM_TEST)
static __inline__ int MULSHIFT32(int x, int y){
int zlow;
__asm__ volatile ("smull %0,%1,%2,%3" : "=&r" (zlow), "=r" (y) : "r" (x), "1" (y) : "cc");
return y;
}

static __inline int CLZ(int x){
int numZeros;
__asm__ ("clz %0, %1" : "=r" (numZeros) : "r" (x) : "cc");
return numZeros;
}

static __inline Word64 MADD64(Word64 sum64, int x, int y){
U64 u;
u.w64 = sum64;
__asm__ volatile ("smlal %0,%1,%2,%3" : "+&r" (u.r.lo32), "+&r" (u.r.hi32) : "r" (x), "r" (y) : "cc");
return u.w64;
}
« Last Edit: June 20, 2022, 04:30:22 pm by DavidAlfa »
 

Offline ali_asadzadehTopic starter

  • Super Contributor
  • ***
  • Posts: 1929
  • Country: ca
Re: Converting to Cortex M0 ASM
« Reply #28 on: June 20, 2022, 01:55:53 pm »
Quote
Weren't you aiming at decoding MP3 on your M0+ STM32G070?
As explained in lief's minimp3 (which seems to be what you're using), it uses floating point, unlike Keij's minimp3.
Forget mp3 decoding in FP on a M0+! You absolutely need a fixed point decoder.
minimp3 does it with Fixed point too! It's just way simpler than helix, a single header file ;)
 

Offline ali_asadzadehTopic starter

  • Super Contributor
  • ***
  • Posts: 1929
  • Country: ca
Re: Converting to Cortex M0 ASM
« Reply #29 on: June 20, 2022, 03:16:26 pm »
DavidAlfa, you were right. Stupid me, I was just looking at the PCM buffer and the int, and missed the ASM function in minimp3. So please help me port Helix to the M0; maybe we have a chance of using it on this $1 Cortex-M0 part (STM32G070RBT6) ;D
 

Online DavidAlfa

  • Super Contributor
  • ***
  • Posts: 6207
  • Country: es
Re: Converting to Cortex M0 ASM
« Reply #30 on: June 20, 2022, 04:03:07 pm »
Check the CMSIS library; it usually has some math tricks.
Code: [Select]
  /**
   * @brief Clips Q31 to Q15 values.
   */
  static __INLINE q15_t clip_q31_to_q15(
  q31_t x)
  {
    return ((q31_t) (x >> 16) != ((q31_t) x >> 15)) ?
      ((0x7FFF ^ ((q15_t) (x >> 31)))) : (q15_t) x;
  }

Test:
Code: [Select]
int16_t sat16(int32_t x){
  return ((x>>16) != (x>>15)) ?
          (0x7FFF^(x >> 31)) : x;
}

10 instructions plus one literal-pool word (11 entries):
Code: [Select]
8000480: 1402      asrs r2, r0, #16
 8000482: 13c3      asrs r3, r0, #15
 8000484: 429a      cmp r2, r3
 8000486: d003      beq.n 8000490 <saturate+0x10>
 8000488: 4b02      ldr r3, [pc, #8] ; (8000494 <saturate+0x14>)
 800048a: 17c0      asrs r0, r0, #31
 800048c: 4058      eors r0, r3
 800048e: 4770      bx lr
 8000490: b200      sxth r0, r0
 8000492: e7fc      b.n 800048e <saturate+0xe>
 8000494: 00007fff .word 0x00007fff

Output seems correct:
Code: [Select]
IN        OUT
1         1
32767     32767
32768     32767
85000     32767
-1        -1
-32768    -32768
-32769    -32768
-85000    -32768
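
One way to double-check the table above is to compare the shift-and-XOR version against a plain compare-based clamp. This is only a sketch: sat16_clamp is a name made up here for the reference version, and the code assumes that >> on a negative int32_t is an arithmetic shift, as it is with GCC on ARM.

```c
#include <stdint.h>

/* Shift-and-XOR saturation, same logic as the test function above. */
static int16_t sat16(int32_t x) {
    return (int16_t)(((x >> 16) != (x >> 15)) ? (0x7FFF ^ (x >> 31)) : x);
}

/* Hypothetical reference version: a plain clamp with two compares. */
static int16_t sat16_clamp(int32_t x) {
    if (x > INT16_MAX) return INT16_MAX;
    if (x < INT16_MIN) return INT16_MIN;
    return (int16_t)x;
}
```

Looping over a range of inputs and comparing both functions is a cheap sanity test before looking at instruction counts.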

Still slower than Nominal Animal's code :D
« Last Edit: June 20, 2022, 04:09:58 pm by DavidAlfa »
 

Online DavidAlfa

  • Super Contributor
  • ***
  • Posts: 6207
  • Country: es
Re: Converting to Cortex M0 ASM
« Reply #31 on: June 20, 2022, 04:18:31 pm »
BTW, what's your real application? If it works at all, the decoder will likely eat 80-90% of the stm32 resources...
 

Offline ali_asadzadehTopic starter

  • Super Contributor
  • ***
  • Posts: 1929
  • Country: ca
Re: Converting to Cortex M0 ASM
« Reply #32 on: June 21, 2022, 06:12:41 am »
Quote
BTW, what's your real application? If it works at all, the decoder will likely eat 80-90% of the stm32 resources...
It's for a Robot Toy that can talk!

These are the ASM functions in Helix,
Code: [Select]
MULSHIFT32(x, y)    signed multiply of two 32-bit integers (x and y), returns top 32 bits of 64-bit result
 FASTABS(x)          branchless absolute value of signed integer x
 CLZ(x)              count leading zeros in x
 MADD64(sum, x, y)   (Windows only) sum [64-bit] += x [32-bit] * y [32-bit]
 SHL64(sum, x, y)    (Windows only) 64-bit left shift using __int64
 SAR64(sum, x, y)    (Windows only) 64-bit right shift using __int64

So, beginning with the first one, MULSHIFT32: I think it's the most critical one!
What if I just took the top 16 bits of each input and did a 16 x 16 ==> 32-bit multiply?
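
To get a feel for what the 16 x 16 shortcut costs, here is a rough sketch comparing it with the exact result. The function names are made up for illustration, and the code assumes arithmetic right shift on signed values (true for GCC on ARM). Dropping the low halves discards the cross terms, so the shortcut can differ from the exact top 32 bits by up to roughly 2^16 in magnitude.

```c
#include <stdint.h>

/* Exact: top 32 bits of the signed 64-bit product. */
static int32_t mulshift32_exact(int32_t x, int32_t y) {
    return (int32_t)(((int64_t)x * y) >> 32);
}

/* Shortcut from the question: multiply only the top 16 bits of each input.
 * Both factors are in [-32768, 32767], so the product fits in 32 bits. */
static int32_t mulshift32_top16(int32_t x, int32_t y) {
    return (x >> 16) * (y >> 16);
}

/* Absolute difference between the exact result and the shortcut. */
static int32_t mulshift32_error(int32_t x, int32_t y) {
    int32_t d = mulshift32_exact(x, y) - mulshift32_top16(x, y);
    return d < 0 ? -d : d;
}
```

Whether an error of that size is audible depends on how the decoder scales its intermediate values, so it is worth measuring on real data before committing to it.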
 

Offline westfw

  • Super Contributor
  • ***
  • Posts: 4290
  • Country: us
Re: Converting to Cortex M0 ASM
« Reply #33 on: June 21, 2022, 07:07:02 am »
Quote
MULSHIFT32
There's a guy over on the ARM forums that seems pretty obsessed with the idea of getting MP3 decoding working on a CM0, and there have been some interesting optimization discussions on the 32x32->64 bit multiply and CLZ, IIRC. For example: https://community.arm.com/support-forums/f/architectures-and-processors-forum/11313/cortex-m0-m0-m1-32-bit-x-32-bit-----64-bit-signed-multiply/168510#168510
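
For context: the Cortex-M0/M0+ (ARMv6-M) has no SMULL, SMLAL or CLZ instruction, and its MULS only returns the low 32 bits of a product. A minimal sketch of getting the exact top 32 bits using only 32-bit multiplies is the schoolbook split into 16-bit halves shown below. This is not the code from the linked thread; the function name is made up, and arithmetic right shift of signed values is assumed.

```c
#include <stdint.h>

/* Top 32 bits of the signed 64-bit product x*y, using only 32-bit
 * multiplies on 16-bit halves (every partial product fits in 32 bits). */
static int32_t mulshift32_16x16(int32_t x, int32_t y) {
    uint32_t x0 = (uint32_t)x & 0xFFFFu;   /* low halves, unsigned */
    uint32_t y0 = (uint32_t)y & 0xFFFFu;
    int32_t  x1 = x >> 16;                 /* high halves, signed */
    int32_t  y1 = y >> 16;

    uint32_t w0 = x0 * y0;                            /* low x low   */
    int32_t  t  = x1 * (int32_t)y0 + (int32_t)(w0 >> 16); /* high x low + carry */
    int32_t  w1 = t & 0xFFFF;
    int32_t  w2 = t >> 16;
    w1 = (int32_t)x0 * y1 + w1;                       /* low x high  */
    return x1 * y1 + w2 + (w1 >> 16);                 /* high x high + carries */
}
```

A compiler will usually do something similar when asked for (int64_t)x * y on an M0, via a library call, so it is worth comparing the generated code before hand-writing assembly.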
 

Online DavidAlfa

  • Super Contributor
  • ***
  • Posts: 6207
  • Country: es
Re: Converting to Cortex M0 ASM
« Reply #34 on: June 21, 2022, 09:44:19 am »
I forgot: as I said, the decoder already uses 31 of the 36KB of RAM... But you must also get the data from somewhere. SD card, SPI flash, USB...? Those will also take some RAM for buffering.
Why not use a cheaper/simpler IC like
As for a simple test, you can skip asm optimization for now:
Code: [Select]
#elif defined(ARM_TEST)
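static __inline__ int MULSHIFT32(int x, int y){
  /* Top 32 bits of the 64-bit product; a plain x*y would return the low 32 bits */
  return (int)(((int64_t)x * y) >> 32);
}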

static __inline int FASTABS(int x){
int sign;
sign = x >> (sizeof(int) * 8 - 1);
x ^= sign;
x -= sign;
return x;
}

static __inline int CLZ(int x){
  int numZeros;
  if (!x)
    return (sizeof(int) * 8);
  numZeros = 0;
  while (!(x & 0x80000000)){
    numZeros++;
    x <<= 1;
  }
  return numZeros;
}

typedef union _U64 {
Word64 w64;
struct {
/* ARM ADS = little endian */
unsigned int lo32;
signed int hi32;
} r;
} U64;

static __inline Word64 MADD64(Word64 sum64, int x, int y){
  return sum64 + (int64_t)x*y;
}

static __inline Word64 SAR64(Word64 x, int n){
return x >> n;
}


Do a bit of googling; searching for these functions on Cortex-M0/M0+ should turn up some results:

https://community.arm.com/support-forums/f/architectures-and-processors-forum/11313/cortex-m0-m0-m1-32-bit-x-32-bit-----64-bit-signed-multiply
https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/a-fairly-quick-count-leading-zeroes-for-cortex-m0
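
As a rough idea of what a faster software CLZ can look like, here is a binary-search version that needs at most five compare steps instead of up to 31 loop iterations. It is only a sketch with a made-up name, not the routine from the blog post linked above.

```c
#include <stdint.h>

/* Count leading zeros by halving the search range: 16, 8, 4, 2, 1 bits. */
static int clz32(uint32_t x) {
    int n = 0;
    if (x == 0)
        return 32;
    if ((x & 0xFFFF0000u) == 0) { n += 16; x <<= 16; }
    if ((x & 0xFF000000u) == 0) { n += 8;  x <<= 8;  }
    if ((x & 0xF0000000u) == 0) { n += 4;  x <<= 4;  }
    if ((x & 0xC0000000u) == 0) { n += 2;  x <<= 2;  }
    if ((x & 0x80000000u) == 0) { n += 1; }
    return n;
}
```

Taking the argument as unsigned also avoids left-shifting a signed value into the sign bit.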


Consider a cheap external solution like:
- DFplayer (IC YX5200-24SS)
- GPD2856C (IC GPD2846A)
- BT 201 (IC KT1025A)

These boards are pretty inexpensive and usually include a low-impedance ~1W amplifier, so you might get two things done at once.
They work with USB and SD cards, and also with EEPROM or SPI flash (but with a FAT16/32 filesystem)... Some have a small programmable internal memory to store low-quality sounds.

If it's just for a toy, you might get the job done with an ESP-32 module with WiFi disabled. It might even be cheaper than the STM32.
Pretty sure you'll find everything you need for MP3 decoding on Arduino.
« Last Edit: June 21, 2022, 10:30:49 am by DavidAlfa »
 

Offline ali_asadzadehTopic starter

  • Super Contributor
  • ***
  • Posts: 1929
  • Country: ca
Re: Converting to Cortex M0 ASM
« Reply #35 on: June 21, 2022, 01:36:54 pm »
Thanks for the tips, David, though I need some help using the library.
Suppose I have an array containing a minimal MP3 file (so I can use internal flash for storage), something like this:

Code: [Select]
const unsigned char _mp3Data[10000] ={ 0xFF, 0xFB, 0xE2, 0x04, 0x00, 0x01, 0x12, 0xE3 ,etc...};

MP3FrameInfo mp3FrameInfo;
static HMP3Decoder hMP3Decoder;
short pcm[1152*2];
int bytesLeft = 1940;//1940 is the Maximum MP3 frame size
unsigned char* ptr = &_mp3Data;
hMP3Decoder = MP3InitDecoder();

MP3Decode(hMP3Decoder, &ptr, &bytesLeft, pcm, 0);
ptr+=(1940 - bytesLeft);
bytesLeft = 1940;
MP3Decode(hMP3Decoder, &ptr, &bytesLeft, pcm, 0);

The decoder gets the first frame right, but I think it does not get the second frame right. What am I missing!?

 

Online DavidAlfa

  • Super Contributor
  • ***
  • Posts: 6207
  • Country: es
Re: Converting to Cortex M0 ASM
« Reply #36 on: June 21, 2022, 11:34:51 pm »
I have no idea how HelixMP3 works. But a basic reading gives some hints.
First, the decoder will update pointer and bytes left.
Quote
/**************************************************************************************
 * Function:    MP3Decode
 *
 * Description: decode one frame of MP3 data
 *
 * Inputs:      valid MP3 decoder instance pointer (HMP3Decoder)
 *              double pointer to buffer of MP3 data (containing headers + mainData)
 *              number of valid bytes remaining in inbuf
 *              pointer to outbuf, big enough to hold one frame of decoded PCM samples
 *              flag indicating whether MP3 data is normal MPEG format (useSize = 0)
 *                or reformatted as "self-contained" frames (useSize = 1)
 *
 * Outputs:     PCM data in outbuf, interleaved LRLRLR... if stereo
 *                number of output samples = nGrans * nGranSamps * nChans
 *              updated inbuf pointer, updated bytesLeft
 *
 * Return:      error code, defined in mp3dec.h (0 means no error, < 0 means error)
 *
 * Notes:       switching useSize on and off between frames in the same stream
 *                is not supported (bit reservoir is not maintained if useSize on)
 **************************************************************************************/
int MP3Decode(HMP3Decoder hMP3Decoder, unsigned char **inbuf, int *bytesLeft, short *outbuf, int useSize)

You're completely breaking it here:
Quote
MP3Decode(hMP3Decoder, &ptr, &bytesLeft, pcm, 0);
ptr+=(1940 - bytesLeft);    //<- ptr was already updated with the used bytes!
bytesLeft = 1940;              //<- bytesLeft was already updated! Might not have used all the 1940 bytes! A MP3 frame size is variable! (It's compressed)
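
To make the effect of the double advance concrete, here is a small stand-alone sketch. fake_decode is a hypothetical stand-in for MP3Decode that only mimics how it updates the pointer and the byte count; the frame length passed in is an arbitrary example value.

```c
#include <stddef.h>

/* Hypothetical stand-in for MP3Decode: "decodes" one frame of frame_len
 * bytes and, like Helix, advances *inbuf and decrements *bytesLeft itself. */
static int fake_decode(const unsigned char **inbuf, int *bytesLeft, int frame_len) {
    if (*bytesLeft < frame_len)
        return -1;                  /* not enough input */
    *inbuf += frame_len;
    *bytesLeft -= frame_len;
    return 0;
}

/* Offset where the second frame starts when the caller leaves the
 * pointer alone after the first call. */
static ptrdiff_t second_start_correct(const unsigned char *buf, int size, int frame_len) {
    const unsigned char *p = buf;
    int left = size;
    fake_decode(&p, &left, frame_len);
    return p - buf;
}

/* Same, but with the extra manual advance from the question. */
static ptrdiff_t second_start_double(const unsigned char *buf, int size, int frame_len) {
    const unsigned char *p = buf;
    int left = size;
    fake_decode(&p, &left, frame_len);
    p += (size - left);             /* bug: the decoder already moved p */
    return p - buf;
}
```

With a 417-byte first frame, the correct version continues at offset 417, while the double advance lands at 834 and skips a whole frame.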

Try this:
Code: [Select]
const unsigned char mp3Data[10000] ={ 0xFF, 0xFB, 0xE2, 0x04, 0x00, 0x01, 0x12, 0xE3 ,etc...};
unsigned char * mp3Ptr = (unsigned char *)mp3Data;   // MP3Decode takes a non-const pointer; cast away const (the data is only read)
int bytesLeft = sizeof(mp3Data);   // All your mp3 buffer size

MP3FrameInfo mp3FrameInfo;
HMP3Decoder hMP3Decoder = MP3InitDecoder();   // Not static: a static object cannot be initialized with a function call in C

short pcm[1152*2];  // Each MP3 frame outputs 1152 pcm samples per channel
int result;

while (bytesLeft > 0) {
    // Decode one frame. mp3Ptr and bytesLeft are updated by the decoder.
    result = MP3Decode(hMP3Decoder, &mp3Ptr, &bytesLeft, pcm, 0);

    // Check result; stop on error so the loop cannot spin forever on bad data
    if (result != ERR_MP3_NONE) {
        mp3_error(result);
        break;
    }

    // Get last decoded frame info. Output samples will be in mp3FrameInfo.outputSamps.
    // Should be 1152*2? Perhaps this could be omitted if the output size is constant by the standard, but better check for now...
    MP3GetLastFrameInfo(hMP3Decoder, &mp3FrameInfo);

    // Now we have new samples in pcm buffer organized as LRLRLRLR...
    // Normally you would start a DMA to start playing the pcm buffer, decode the next frame while playing, filling a second pcm buffer, and keep switching them...
}

void mp3_error(int res){  // If an error occurs, check which one...
    switch(res){
        case ERR_MP3_INDATA_UNDERFLOW:
            //
            break;
        case ERR_MP3_MAINDATA_UNDERFLOW :
            //
            break;
        case ERR_MP3_FREE_BITRATE_SYNC :
            //
            break;
        case ERR_MP3_OUT_OF_MEMORY :
            //
            break;
        case ERR_MP3_NULL_POINTER :
            //
            break;
        case ERR_MP3_INVALID_FRAMEHEADER :
            //
            break;
        case ERR_MP3_INVALID_SIDEINFO :
            //
            break;
        case ERR_MP3_INVALID_SCALEFACT :
            //
            break;
        case ERR_MP3_INVALID_HUFFCODES :
            //
            break;
        case ERR_MP3_INVALID_DEQUANTIZE :
            //
            break;
        case ERR_MP3_INVALID_IMDCT :
            //
            break;
        case ERR_MP3_INVALID_SUBBAND :
            //
            break;
        case ERR_UNKNOWN :
            //
            break;
    }
}
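
The buffer switching mentioned in the loop comments could be organised roughly as below. Everything here is hypothetical (names, the callback, where the DMA is started); it only shows the bookkeeping for two PCM buffers. Note that two full stereo buffers cost 2 * 1152 * 2 * 2 = 9216 bytes, which is a lot next to the RAM figures mentioned earlier.

```c
#include <stdint.h>

#define PCM_SAMPLES (1152 * 2)          /* one stereo frame, interleaved */

static int16_t pcm_buf[2][PCM_SAMPLES];
static volatile int playing = -1;       /* buffer index being played, -1 if idle */

/* Hypothetical: would be called from the DMA transfer-complete interrupt. */
static void on_playback_done(void) {
    playing = -1;
}

/* Hypothetical: mark a buffer as playing; the real DMA start would go here. */
static void start_playback(int idx) {
    playing = idx;
}

/* Buffer the decoder may write into: always the one not being played. */
static int16_t *decode_target(void) {
    return pcm_buf[playing == 0 ? 1 : 0];
}
```

The main loop would decode into decode_target(), wait until playing is -1, then call start_playback() on the buffer it just filled.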
« Last Edit: June 21, 2022, 11:37:30 pm by DavidAlfa »
 
The following users thanked this post: ali_asadzadeh

