Author Topic: STM32 I2s w/ external master clock (Read 2709 times)

paulca · « **on:** November 21, 2022, 12:09:18 pm »

Hi,

Continuing my adventures with STM32 audio, I connected up an ADC module and a DAC module and tried to copy the stream from one to the other.

First I enabled I2S1 as Half Duplex Master Receive Only. Then I set the clock configuration page to pull in the external master clock on pin PA2. I enabled DMA on SPI1_Rx.

Added a simple HAL_I2S_Receive_DMA() with a 192-byte buffer.

The ADC module was configured with its crystal output (24.576 MHz) connected to the breadboard and looped back to its MCK pin. Additionally that clock line was connected to the STM32 PA2. The ADC jumpers and switches were set to 48K 16bit, Slave.

Fired up the code and ... nothing.

Debugged into it and the DMA transfer never starts. It sits idle. I scoped/probed as best I could with, at best, a 20 MHz scope. There were square waves on all the appropriate pins.

So for something to try I switched the ADC over to Master and made the STM32 Slave. DMA transfer started up fine.

Well, when I say "fine", even though the ADC was set to 16bit and the I2S was set to 16bit on 16bit frame I was seeing the following in the 16bit DMA buffer:

1234, 0, 52332, 0, 53525, 0,n,0,n,0,n,0,n,0

Which is exactly what you would expect to see as a full 32bit frame with left justified I2S. However I asked for a 16 bit frame, not a 32bit one.
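If the DMA buffer really does hold one 16-bit sample followed by one zero half-word per 32-bit slot, as in the dump above, the samples can be recovered by keeping every other half-word. This is only a sketch of that idea: `compact_slots` is a made-up helper name, and it assumes the data half always comes first.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: keep the first half-word of each 32-bit slot and
 * drop the zero padding that follows it. Returns the number of samples
 * written to 'out'. Assumes the layout seen in the dump above. */
size_t compact_slots(const uint16_t *raw, size_t raw_len, int16_t *out)
{
    size_t n = 0;
    for (size_t i = 0; i + 1 < raw_len; i += 2) {
        /* Reinterpret the raw half-word as a signed PCM sample */
        out[n++] = (int16_t)raw[i];
    }
    return n;
}
```

Viewed as signed audio, values such as 52332 are simply negative samples (52332 - 65536 = -13204).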

Anyway. I configured I2S2 as Half Duplex Master, use external clock and connected it to the PCM5102A. Again I called HAL_I2S_Transmit_DMA().

Again the DMA transfer never started. All the registers are set but no advancement of the pointer and no TXE trigger.

As a sanity test, starting to get frustrated, I connected the PCM5102 directly to the same STM32 pins that were fed from the ADC and instantly music started playing. It didn't matter what settings I chose on the ADC; after a slight crash it would readjust to the new data. 16 bit, 24bit, 48k-192k all worked.

I read the module maker's confusing documentation, which hinted that 48k/16bit is only available in slave mode. So I tried reconfiguring it on the STM with 24bit and 96k, but no change.

Here is my suspicion.

The HAL driver code is somehow screwing up the I2S's dependency on that master clock input. That MCK is supposed to be connected to the I2S clock generator block when the external clock is enabled. Thus when the STM32 is in Master mode it will use that clock impetus to generate the BCK/LCK etc. It seems as though that block is not being driven from the external clock, so the I2S is not triggering any data in its shift register and thus no DMA.

It worked with the ADC as master, so my first port of investigation is working out how HAL sets it up. I'm expecting to find a misconfiguration of those registers: when the STM32 is set to master it is not in fact using the external clock.

Anyone have any experience in this area who could throw me some hints? Am I doing something wrong?

paulca · « **Reply #1 on:** November 21, 2022, 05:18:50 pm »

I'm getting fed up with STM32.

Not once has a project actually worked without ****ing around with broken documentation or functionality not as described.

I've gone through the reference manual line by line, I've gone through the code line for line. I've looked at the individual bits in the registers.

It should be working. It's not.

Game over.

paulca · « **Reply #2 on:** November 21, 2022, 07:51:41 pm »

Long story short. HAL bug. Nowhere does it set the I2S Clock to external.

__HAL_RCC_I2S_CONFIG(RCC_I2SCLKSOURCE_EXT);

Fixes it.

Maybe. Maybe I should listen to those telling me not to waste my time with the HAL libraries. Read the RM, set the registers yourself.

It's starting to seem like I spend a lot of time stepping through HAL code trying to figure out (a) what it thinks it's doing and (b) what it's actually doing.

DavidAlfa · « **Reply #3 on:** November 21, 2022, 07:52:16 pm »

What could I be doing wrong in a chain of 200 things?
Please take the magic glass ball, attaching no code

Do you really expect any help this way?

paulca · « **Reply #4 on:** November 21, 2022, 09:22:38 pm »

Quote from: DavidAlfa on November 21, 2022, 07:52:16 pm

What could I be doing wrong in a chain of 200 things?
Please take the magic glass ball, attaching no code
Do you really expect any help this way?

It wouldn't have helped anyway, even if I'd posted the whole project. The bug(s) are in the HAL library. At the moment, there is no way to enable external I2S clock with the HAL libraries. It looks like a merge error in the middle of the file.

As an example:

stm32f4xx_hal_rcc_ex.c:2688

Code: [Select]

  switch (PeriphClk)
  {
  case RCC_PERIPHCLK_I2S:
    {
      /* Get the current I2S source */
      srcclk = __HAL_RCC_GET_I2S_SOURCE();

Code: [Select]

/** @brief  Macro to get the I2S clock source (I2SCLK).
  * @retval The clock source can be one of the following values:
  *            @arg @ref RCC_I2SCLKSOURCE_PLLI2S: PLLI2S clock used as I2S clock source.
  *            @arg @ref RCC_I2SCLKSOURCE_EXT External clock mapped on the I2S_CKIN pin
  *                                        used as I2S clock source
  */
#define __HAL_RCC_GET_I2S_SOURCE() ((uint32_t)(READ_BIT(RCC->CFGR, RCC_CFGR_I2SSRC)))

Code: [Select]

#define RCC_I2SCLKSOURCE_PLLI2S         0x00000000U
#define RCC_I2SCLKSOURCE_EXT            0x00000001U

Code: [Select]

#define RCC_CFGR_I2SSRC_Pos                (23U)                               
#define RCC_CFGR_I2SSRC_Msk                (0x1UL << RCC_CFGR_I2SSRC_Pos)       /*!< 0x00800000 */
#define RCC_CFGR_I2SSRC                    RCC_CFGR_I2SSRC_Msk

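The issue is they do not shift the return from READ_BIT right by 23 bits. With the external clock selected the macro returns 0x00800000, which never compares equal to RCC_I2SCLKSOURCE_EXT (0x00000001).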
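To make the mismatch concrete, here is a host-side sketch (not HAL code). The constants are copied from the excerpts above; `cfgr` is a plain variable standing in for RCC->CFGR, and both function names are invented for illustration.

```c
#include <stdint.h>

/* Values copied from the header excerpts quoted above */
#define RCC_I2SCLKSOURCE_PLLI2S 0x00000000U
#define RCC_I2SCLKSOURCE_EXT    0x00000001U
#define RCC_CFGR_I2SSRC_Pos     (23U)
#define RCC_CFGR_I2SSRC_Msk     (0x1UL << RCC_CFGR_I2SSRC_Pos)

/* Hypothetical stand-in for the macro as quoted: the masked bit is
 * returned still sitting at bit position 23. */
uint32_t i2s_source_unshifted(uint32_t cfgr)
{
    return (uint32_t)(cfgr & RCC_CFGR_I2SSRC_Msk);
}

/* Same read, shifted down so the result is directly comparable with
 * the RCC_I2SCLKSOURCE_* constants. */
uint32_t i2s_source_shifted(uint32_t cfgr)
{
    return (uint32_t)((cfgr & RCC_CFGR_I2SSRC_Msk) >> RCC_CFGR_I2SSRC_Pos);
}
```

With bit 23 set, the unshifted read gives 0x00800000 while the shifted one gives 1, so only the shifted version can ever match RCC_I2SCLKSOURCE_EXT in a switch or comparison.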

That is AFTER you fix what looks like a merge error in the same file which first disables the PLL clocks, then immediately enables them and configures them again without any care for the I2SCLKSource. So the I2S config is completely trashed if you use external clock.

With those fixes made to the library it works perfectly. Even with 24 MHz+ across a breadboard. I just need to work out how to make the changes permanent.

DavidAlfa · « **Reply #5 on:** November 21, 2022, 10:41:07 pm »

Consider submitting a bug fix to ST. I already did some time ago, nothing serious, a flag written inside an ISR not being declared as volatile. It took some weeks and they accepted the PR.
Check the HAL index:
https://github.com/STMicroelectronics/STM32Cube_MCU_Overall_Offer/blob/master/README.md#stm32cube-hal-drivers

paulca · « **Reply #6 on:** November 21, 2022, 10:44:45 pm »

On bug reporting: I put a quick post on their community. There are other, larger issues in there that do look like a merge fart.

Well, prototype works.

ADC -I2S-> STM32 -I2S-> DAC works.

I did try a quick low pass filter, but meh. I need to read up. A simple y(n) = x(n)+x(n-1)/2 ... I obviously couldn't tell the difference and with a 48K sample frequency is that not like a cut off of 14K?

Anyway, my USB input should be workable now, maybe not with perfect sync, but with a proper 48K clock it shouldn't slip and crash anywhere near as bad.

That means I can try down mixing tomorrow

newbrain · « **Reply #7 on:** November 21, 2022, 11:43:02 pm »

Quote from: paulca on November 21, 2022, 10:44:45 pm

I did try a quick low pass filter, but meh. I need to read up. A simple y(n) = x(n)+x(n-1)/2

As a low pass, a moving average is quite terrible, though it has other uses.

14 kHz are so far out of my hearing it could as well be bats.

There are many ways to calculate FIR length and coefficients, and ample literature.
A simple way is to apply a windowing function (Kaiser, or others) to the inverse FFT of a rect function (theoretical low pass), i.e. a sinc function.

Once you have them, CMSIS DSP library (should be part of STM32Cube - the SDK?) provides some decent routines for FIR and FIR decimation (if that's what you mean with downmixing).

As for reporting bugs to ST, maybe they improved the process, when I did that it took them a couple of years to correct what I found, and they didn't accept PRs on GH at the time.

Some of my code, more or less inspired by Lyons DSP book (just a copy paste, so might not be 100% complete):

Code: [Select]

/*  SPDX-License-Identifier: BSD-3-Clause */
/* Bessel Zero Order function approximation */
float32_t I0(float32_t x)
{
    /* i0k(0) = 1 */
    double i0q = 1.0;
    double i0  = 1.0;
    /* Lyons says that 25 is enough */
    for (uint32_t q = 1; q < 25; q++)
    {
        /** Induction calculation of each term
         *  i0[q] =  (x^2q) / (4^q q!^2)
         *        =  [(x^q) / (2^q q!)]^2
         *        =  i0[q-1] * [ x/(2q) ]^2
         *           new term  ------------
         */
        double i0term = x / (2 * q);
        i0q *= i0term * i0term;
        /* Accumulate */
        i0 += i0q;
    }
    return i0;
}

/* Kaiser Beta parameter estimator */
static inline float32_t KaiserBeta(float32_t stopBandAtten)
{
    if (stopBandAtten >= 50.0f)
        return 0.1102f * (stopBandAtten - 8.7f);
    else if (stopBandAtten > 21.0f)
    {
        stopBandAtten -= 21.0f;
        return 0.5842f * powf(stopBandAtten, 0.4f) + 0.07886f * stopBandAtten;
    }
    return 0.0f;
}

/* Find FIR coefficients given the number of taps (*must* be odd) and the BW */
void CalcFIRLowPass(float32_t *h, uint32_t n, float32_t BW, float32_t fsample, float32_t att)
{
    /* Include negative frequencies and normalize */
    BW *= M_2PI;
    BW /= fsample;

    /* Estimate Beta */
    float32_t beta   = KaiserBeta(att);
    float32_t i0bInv = 1.0f/I0(beta);

    /* Use Kaiser window against a theoretical low pass iDFT */
    int32_t centerTap = (n - 1) / 2;

    float32_t gain = 0;
    for (int32_t k = 0; k <= centerTap; k++)
    {
        float32_t x  = (float32_t)(k - centerTap) / centerTap;
        float32_t w  = I0(beta * sqrtf(1.0f - x * x)) * i0bInv; /* Kaiser window calculation */
        float32_t c  = w * Sinc((k - centerTap) * BW);
        h[k]         = c;
        h[n - k - 1] = c; /* Exploit symmetry */
        gain += h[k] * 2;
    }
    /* Normalize filter */
    gain -= h[centerTap]; /* This was counted twice! */
    gain = 1 / gain;
    for (uint32_t k = 0; k < n; k++)
    {
        h[k] *= gain;
    }
}

DavidAlfa · « **Reply #8 on:** November 22, 2022, 12:20:22 am »

Not an expert at dsp, but:

y(n) = x(n)+x(n-1)/2

Won't work at all.
Lacking average/accumulator, you can only filter between consecutive samples...
You can't apply 100% of the new value, you need some sort of coefficients, otherwise you might be amplifying instead!

Consider this buffer:
128,255,800,16500,2500

Output:
?, 319,927, 16900,10750

Doesn't make any sense, it's not even a Moving average.
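For anyone who wants to reproduce the numbers above, here is a small sketch of the filter taken literally as y(n) = x(n) + x(n-1)/2 in integer arithmetic. The function name is made up, and the sample before the start of the buffer is assumed to be 0, which is why the first output (the '?' above) comes out equal to the first input.

```c
#include <stddef.h>
#include <stdint.h>

/* y[n] = x[n] + x[n-1]/2, integer division.
 * Assumption: the sample preceding the buffer is 0. */
void fir_1_half(const int32_t *x, int32_t *y, size_t len)
{
    int32_t prev = 0; /* x[n-1] */
    for (size_t n = 0; n < len; n++) {
        y[n] = x[n] + prev / 2;
        prev = x[n];
    }
}
```

Since the coefficients are 1 and 0.5, the gain is 1.5 at DC and 0.5 at half the sample rate, so a steady input comes out 50% larger.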

Simplest filtering would be something like this EMA filtering (though quite terrible, as newbrain says), but it's the easiest filtering I know (my knowledge is close to nothing):

Code: [Select]

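#include <stdint.h>

#define COEF 0.5f     /* weight of the old value, 0..1 */
#define SZ 32
static uint16_t avg;  /* filter state, kept between calls */
uint16_t  buf[SZ];

/* Exponential moving average, applied in place over buf */
void process(void){
    for (uint16_t i =0; i<SZ; i++){
        /* float result is truncated back to 16 bits */
        avg = (uint16_t)((avg*COEF) + (buf[i]*(1.0f-COEF)));
        buf[i] = avg;
    }
}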

COEF limits are 0-1.
0 = Use 0% of old value, 100% of new value (No filtering at all)
1 = Use 100% of old value, 0% of new value (Value won't change ever)

So by playing with different coefficients, you distribute the weight between old and new values.

Online test, insert the output on the online plot maker:

https://onlinegdb.com/VkO-zPsnd

Sample output:

newbrain · « **Reply #9 on:** November 22, 2022, 09:17:02 am »

Quote

Code: [Select]
avg = (avg*COEF) + (buf[i]*(1.0f-COEF)); buf[i] = avg;

Between paulca filter and yours there's a fundamental difference.

Yours is an IIR, paulca's a FIR (specifically, with coefficients 1, 0.5 so, yes with some gain).
IIR = Infinite Impulse Response, i.e. an input pulse will generate an infinite output (for the filter to be stable, it must decrease in time, of course)
FIR = Finite Impulse Response, i.e. an input pulse will generate an output that only lasts a finite number of samples - hence they are inherently stable, so the gain in paulca's is a red herring.
The main practical difference is that in a FIR the output only depends on the current and previous input values (for a causal filter), in an IIR the output depends on the current value of the input and previous values of the output.

This is why yours is (also) not a moving average, but rather an exponential smoothing.
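The difference shows up directly if a single unit pulse is fed through both filters. A rough sketch, with float samples and hypothetical function names, where `coef` plays the role of COEF in the quoted code:

```c
#include <stddef.h>

/* FIR from earlier in the thread: y[n] = x[n] + 0.5*x[n-1].
 * Only past inputs are remembered. */
void fir_response(const float *x, float *y, size_t len)
{
    float prev_in = 0.0f;
    for (size_t n = 0; n < len; n++) {
        y[n] = x[n] + 0.5f * prev_in;
        prev_in = x[n];
    }
}

/* Exponential smoothing: y[n] = coef*y[n-1] + (1-coef)*x[n].
 * The previous output is fed back in. */
void iir_response(const float *x, float *y, size_t len, float coef)
{
    float prev_out = 0.0f;
    for (size_t n = 0; n < len; n++) {
        y[n] = coef * prev_out + (1.0f - coef) * x[n];
        prev_out = y[n];
    }
}
```

For the input 1, 0, 0, 0, ... the FIR gives 1, 0.5 and then exactly zero, while the IIR with coef = 0.5 gives 0.5, 0.25, 0.125, ... and keeps halving without ever reaching zero in exact arithmetic.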

paulca · « **Reply #10 on:** November 22, 2022, 10:05:29 am »

I just googled a basic FIR filter and picked the first one that gave me code and not greek.

I have watched a dozen videos on the design of IIR and FIR filters, but honestly, I just don't speak Maths. It's a problem, because on the rare occasion I feel I understand what the equation does, I think about it in completely different ways. That in itself is useless as I don't know how to translate between the language of Maths and the language of my head. So I mostly stare in awe and secretly think.... there will be libraries... there will. Another part of me says I could go back to Khan Academy and make a proper effort to learn some advanced maths. At 48, I doubt that will be easy. So I'm still holding out for the libraries.

CMSIS libs are there, I believe I need to import the ARM Math library for the M4 and it comes with a few basic primitives that I should be able to work out how to play with.

I'm not after much. I'm not creating an effects rack or a reverb or anything particularly juicy.

My ideal is having 3, maybe 5 parameter EQ stages. A high pass, 1, 2 or 3 notch and one low pass. To form a standard EQ. Cutoff/gain for the LP and HP. Centre, gain and Q for the notches.

By downmixing I mean taking 2 (or more) I2S inputs and mixing them with a variable weighting. Upmixing would be taking one input stream and duplicating it across multiple outputs (or buses). For the latter I'm actually considering using the highest SPI speed the MCU will do and using the CS line for routing. It's so simple it will either work perfectly or fail miserably.

Downmixing has another challenge I didn't consider. It will be fun, but you have to have buffers you can align. That means feeding out of (at least 2) circular buffers to feed another after 'mixing'. The plan is going to be extending the tail end of the buffers, so they are not just ping-pong, but quadrants. Unless I can synchronise the DMA controllers somehow such that they finish writing a section of buffer together. The idea being the mix process has to wait on ALL Rx(Half)Cplt callbacks before it can lift both the buffers, mix and output. So there has to be enough window to delay processing a buffer for a bunch of microseconds.
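The mixing step itself, once both buffers are aligned, could look roughly like the sketch below. Everything here is an assumption for illustration: 16-bit samples, Q15 weights (32767 is just under 1.0), and made-up function names. The sum is clamped because two loud inputs can exceed the 16-bit range.

```c
#include <stddef.h>
#include <stdint.h>

/* Clamp a wide intermediate value to the int16_t range */
static int16_t sat16(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* out[i] = (a[i]*wa + b[i]*wb) / 32768, with Q15 weights wa and wb.
 * A 64-bit accumulator avoids overflow for any weight combination. */
void mix2_q15(const int16_t *a, const int16_t *b, int16_t *out,
              size_t len, int16_t wa, int16_t wb)
{
    for (size_t i = 0; i < len; i++) {
        int64_t acc = (int64_t)a[i] * wa + (int64_t)b[i] * wb;
        out[i] = sat16(acc / 32768);
    }
}
```

With both weights at 16384 (0.5) the output can never clip; with both near 1.0 the clamp is what stops wrap-around distortion on peaks.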

newbrain · « **Reply #11 on:** November 22, 2022, 01:27:01 pm »

Consider also that you have to keep state to correctly apply a filter:
if your filter is n taps long, you need at least n-1 samples before the first in the current buffer.
The FIR routines in CMSIS will do that for you (on you to provide buffers, but the documentation is reasonably clear).
CMSIS DSP is probably not perfect etc. etc., but it provides most of the building blocks needed, and is reasonably efficient.
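To show what keeping state means in practice, here is a plain C sketch of a block FIR that carries the last n-1 inputs from one buffer to the next. It is not the CMSIS implementation; the type and function names and the 4-tap length are invented for the example.

```c
#include <stddef.h>
#include <string.h>

#define NUM_TAPS 4 /* hypothetical filter length */

typedef struct {
    float coeffs[NUM_TAPS];      /* h[0] .. h[NUM_TAPS-1] */
    float history[NUM_TAPS - 1]; /* previous inputs, oldest first */
} fir_state_t;

void fir_init(fir_state_t *s, const float *coeffs)
{
    memcpy(s->coeffs, coeffs, sizeof s->coeffs);
    memset(s->history, 0, sizeof s->history);
}

/* Filter one block. The history survives between calls, so the first
 * outputs of a block still see the tail of the previous block. */
void fir_process(fir_state_t *s, const float *in, float *out, size_t len)
{
    for (size_t n = 0; n < len; n++) {
        const float x = in[n];
        float acc = s->coeffs[0] * x;
        /* history[NUM_TAPS-1-k] holds x[n-k] for k = 1 .. NUM_TAPS-1 */
        for (size_t k = 1; k < NUM_TAPS; k++) {
            acc += s->coeffs[k] * s->history[NUM_TAPS - 1 - k];
        }
        out[n] = acc;
        /* Slide the history: drop the oldest, append the current input */
        memmove(&s->history[0], &s->history[1],
                (NUM_TAPS - 2) * sizeof(float));
        s->history[NUM_TAPS - 2] = x;
    }
}
```

Filtering eight samples in one call or as two calls of four gives identical output, which is the property that gets lost if the history is reset at every DMA half-buffer.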

Unfortunately, some "greek" is needed to understand what is going on.
The Analog resource I pointed to previously is quite decent.
The ToC is here.

I use the Lyons "Understanding Digital Signal Processing" as a reference, the approach is not overly complicated, with many examples.

Familiarity with the idea of convolution is good, luckily 3Blue1Brown just released a very nice (as always) video:

As for downmixing, Ok I thought of something more complicated (FIR decimation).
Be careful with levels, of course.

I imagine you'll be using integers? If you want to do some more advanced processing, consider an FFT based processing chain: applying filters becomes a multiplication rather than a convolution (watch the video!), and that's (often) a computational net gain if you need to apply more than one.
Generating filter coefficients in the frequency domain is no harder than in the time domain, even simpler sometimes.

For a measure of FFT efficiency, and other DSP on STM32 topics, there's the classic AN4841.
As with time domain, you need to save (at least part of) the previous buffer. The two commonly used methods are overlap-and-save and overlap-and-add; only the latter is treated in the Analog book, Lyons describes both.

paulca · « **Reply #12 on:** November 22, 2022, 03:11:48 pm »

I followed that video right up until he went to polynomials then I got lost.

I could even see how that process of sliding a window of, say, 100 samples along some multipliers, like multiply all 100 samples by 1/100, add them up and make the 50th sample that value.... would result in a low pass filter.

I could even imagine that the length of the window will relate to the filter cut-off or bandwidth.

The multipliers would relate to the gain, with obvious care needed on overflows.

The polynomials bit is explaining the thing to someone who already knows what they are, so I got lost. It's things like that, and if/when it gets described in mathematical formulae language ... I just go blank. Learning the terms, learning the symbols, learning the syntax instead of just going "Nope!" ... I can try. I'll go on a diet of 3b1b and Khan Academy and see if I can't get a little more clued.

For this evening. I'm not up to tackling mathematics

I'm getting a bottle of red wine and I'm writing a level meter on a TFT screen, something with more bread and butter geometric maths.

paulca · « **Reply #13 on:** November 22, 2022, 03:14:23 pm »

Quote from: paulca on November 22, 2022, 03:11:48 pm

For this evening. I'm not up to tackling mathematics I'm getting a bottle of red wine and I'm writing a level meter on a TFT screen, something with more bread and butter geometric maths.

Which... ironically will involve a low pass filter. An averaging buffer and a peak sampler with decays, but I can figure things out that way in my head in code pretty straight forwardly.

paulca · « **Reply #14 on:** November 22, 2022, 05:49:06 pm »

Scrap that. I took the max value for left and right from each buffer.

Every odd buffer I render the left meter on the TFT and every even buffer I render the right meter.

That already requires 4ms buffers.

Doing a decaying peak indicator will probably double that. This is where dual cores or dedicated micros for the screen are a must.
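A rough sketch of the per-buffer peak pick and a simple decaying hold, for reference. It assumes interleaved 16-bit stereo (L, R, L, R, ...) and uses the absolute sample value, which may not be exactly what was done above; all names are made up. At 48 kHz a 4 ms buffer works out to 192 frames.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t left;
    uint16_t right;
} peaks_t;

/* Largest absolute sample per channel in an interleaved stereo buffer.
 * 'frames' is the number of L/R pairs. */
peaks_t buffer_peaks(const int16_t *buf, size_t frames)
{
    peaks_t p = {0, 0};
    for (size_t i = 0; i < frames; i++) {
        int32_t l = buf[2 * i];
        int32_t r = buf[2 * i + 1];
        if (l < 0) l = -l; /* widened first so -32768 is safe */
        if (r < 0) r = -r;
        if ((uint16_t)l > p.left)  p.left  = (uint16_t)l;
        if ((uint16_t)r > p.right) p.right = (uint16_t)r;
    }
    return p;
}

/* Peak hold with decay: jump up at once, fall by at most 'decay'
 * per update, never below the current level. */
uint16_t peak_decay(uint16_t held, uint16_t current, uint16_t decay)
{
    if (current >= held)
        return current;
    return (held - current > decay) ? (uint16_t)(held - decay) : current;
}
```

The decay step is a handful of integer operations per channel per buffer, so the expensive part is more likely the TFT redraw than the peak logic itself.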

