Author Topic: STM32: STM32F4 SPI Issues (Read 5804 times)

Ulfberth · « **on:** August 10, 2022, 07:12:09 am »

Hello everyone

My goal is to get an NRF24L01+ talking to an STM32F411 (STM32F411-Disco board).
To communicate with the NRF24 you have to toggle the CS pin for every command you send; CS can't simply be held low the whole time.

The problem is that SPI isn't working properly. Judging by the logic analyzer data (screenshots below), there seems to be an issue between the SPI clock and CS: they are not synchronized. If I add a small delay right before CS goes high, it all works fine.

Here is the code:

Code: [Select]

#include <stdint.h>
#include "stm32f4xx.h"


    void SPI1_INIT(void)
    {
        RCC->APB2ENR |= RCC_APB2ENR_SPI1EN; // Enable SPI1 Clock
    
        SPI1->CR1 |= SPI_CR1_BR_Msk;        // BR[2:0] = 111: fPCLK/256, the slowest SCK
        SPI1->CR1 |= SPI_CR1_MSTR;          // Bit 2 MSTR: Master selection
        SPI1->CR1 |= SPI_CR1_SSM | SPI_CR1_SSI; // Software NSS management, internal NSS held high
        SPI1->CR1 |= SPI_CR1_SPE;           //Bit 6 SPE: SPI enable
    }
    
    void SPI1_GPIO_INIT(void) // PA4 - SPI1_NSS | PA5-SPI1_SCK | PA6-SPI1_MISO| PA7-SPI1_MOSI
    {
        RCC->AHB1ENR |= RCC_AHB1ENR_GPIOAEN; //Enable GPIOA Clock
    
        GPIOA->MODER |= GPIO_MODER_MODER4_0;  //PA4 GPIO output mode (software-driven CS)
        GPIOA->MODER |= GPIO_MODER_MODER5_1;  //Enable Alternate function mode for GPIOA PA5
        GPIOA->MODER |= GPIO_MODER_MODER6_1;  //Enable Alternate function mode for GPIOA PA6
        GPIOA->MODER |= GPIO_MODER_MODER7_1;  //Enable Alternate function mode for GPIOA PA7
    
        GPIOA->AFR[0] |= 0x5UL<<GPIO_AFRL_AFSEL5_Pos;  //GPIO alternate function set 0101 AF5 SPI PA5
        GPIOA->AFR[0] |= 0x5UL<<GPIO_AFRL_AFSEL6_Pos;  //GPIO alternate function set 0101 AF5 SPI PA6
        GPIOA->AFR[0] |= 0x5UL<<GPIO_AFRL_AFSEL7_Pos;  //GPIO alternate function set 0101 AF5 SPI PA7
    }
    
    void spi_send(uint8_t data)
    {
        while(!(SPI1->SR & SPI_SR_TXE)); // Wait until TX buffer is empty
        SPI1->DR = data;                  // Write data to TX buffer
    }
        
    
    int main(void)
    {
    
        SPI1_GPIO_INIT();
        SPI1_INIT();
    
        for(;;){
    
            GPIOA->BSRR = GPIO_BSRR_BR4;      // CS low
    
            while(!(SPI1->SR & SPI_SR_TXE)); // Wait for TX buffer empty
            SPI1->DR = 0xFF;                  // Send data to TX buffer
    
            while(!(SPI1->SR & SPI_SR_TXE));  // Wait until the byte has moved to the shift register
            while ((SPI1->SR & SPI_SR_BSY));  // Wait until the SPI reports not busy
    
            //for(uint32_t i = 0; i<10; i++); //Delay
    
            GPIOA->BSRR = GPIO_BSRR_BS4;      // CS high
        }
    }

wek · « **Reply #1 on:** August 10, 2022, 07:39:09 am »

BSY is a problematic signal. Try using RXNE instead.

In newer STM32 (starting with 'F0/'F3 IIRC) there's an automatic NSS framing ("NSS pulse") available.

JW

Ulfberth · « **Reply #2 on:** August 10, 2022, 08:43:24 am »

Oh, I didn't know BSY behaved like this...

I tried using RXNE, but it gives me pretty much the same result

Code: [Select]

	for(;;){

		GPIOA->BSRR = GPIO_BSRR_BR4;

		while(!(SPI1->SR & SPI_SR_TXE));       // Wait until TX buffer is empty
		SPI1->DR = 0xFF;				  // Send data to TX buffer

		while(!(SPI1->SR & SPI_SR_RXNE));    //Wait till Receive buffer NOT empty
		uint8_t tmp  = SPI1->DR;                   //Reset RXNE by reading DR

//		while(!(SPI1->SR & SPI_SR_TXE));
//		while ((SPI1->SR & SPI_SR_BSY));

//		for(uint32_t i = 0; i<100; i++);

		GPIOA->BSRR = GPIO_BSRR_BS4;


	}

wek · « **Reply #3 on:** August 10, 2022, 10:47:28 am »

> I tried using RXNE, but it gives me pretty much the same result

Interesting.

And, upon closer inspection of my own article, indeed, the SEV pulse at end of second frame (0xCA) is generated from RXNE and is again half of byte early...

Sorry for misleading you. Maybe with some combination of CPOL/CPHA this works as expected. I don't have time to experiment now - this particular experiment was years ago.

JW

newbrain · « **Reply #4 on:** August 10, 2022, 10:58:25 am »

Quote from: wek on August 10, 2022, 07:39:09 am

BSY is a problematic signal. Try using RXNE instead.

Yes, the RM itself suggests using RXNE instead.

The problem here is that we have continuous transmission: as the Tx buffer is immediately refilled, SCK is unbroken.
When this happens, it is very difficult to get a meaningful interval in which NSS is de-asserted; there is no "good place" to put it.

As SPI is almost-but-not-quite a standard, there are variations in these details and in how each device interprets them.
Depending on the speed you run the SPI at, the NRF24L01 might not cope with this kind of continuous transmission: it needs NSS setup and hold times of 2 ns and a minimum NSS de-asserted duration of 50 ns. I think the only sensible way is to add a small delay to move the NSS edge and/or somehow break the continuous transmission.

More sophisticated peripherals, e.g. NXP FlexSPI, geared towards (Q)SPI memory, can configure minimum NSS pulse width and setup and hold times.

Ulfberth · « **Reply #5 on:** August 10, 2022, 11:43:12 am »

It's weird, but I just changed the SPI speed to 4 MHz (it was 62.5 kHz before). I had chosen the slowest DIV because I thought it would be less painful to debug.

And now at 4 MHz it seems to work fine with the RXNE check before pulling CS high.

And what's even weirder, it works even if I toggle CS after polling only TXE and BSY, not touching RXNE at all.

I am completely lost and don't know what was wrong...
Why does it not work at slow speed but work at high speed, when it should be the other way round?

UPD. At 8 MHz it also works fine.

Code: [Select]

#include <stdint.h>
#include "stm32f4xx.h"


void SPI1_INIT(void)
{
	RCC->APB2ENR |= RCC_APB2ENR_SPI1EN; // Enable SPI1 Clock

	SPI1->CR1 |= SPI_CR1_BR_0;			//Div 4
	SPI1->CR1 |= SPI_CR1_MSTR; 		 	//Bit 2 MSTR: Master selection
	SPI1->CR1 |= SPI_CR1_SSM | SPI_CR1_SSI;
	

	SPI1->CR1 |= SPI_CR1_SPE; 			//Bit 6 SPE: SPI enable

}

void SPI1_GPIO_INIT(void) // PA4 - SPI1_NSS | PA5-SPI1_SCK | PA6-SPI1_MISO| PA7-SPI1_MOSI
{
	RCC->AHB1ENR |= RCC_AHB1ENR_GPIOAEN; //Enable GPIOA Clock

	GPIOA->MODER |= GPIO_MODER_MODER4_0;  //PA4 GPIO output mode (software-driven CS)
	GPIOA->MODER |= GPIO_MODER_MODER5_1;  //Enable Alternate function mode for GPIOA PA5
	GPIOA->MODER |= GPIO_MODER_MODER6_1;  //Enable Alternate function mode for GPIOA PA6
	GPIOA->MODER |= GPIO_MODER_MODER7_1;  //Enable Alternate function mode for GPIOA PA7

	GPIOA->AFR[0] |= 0x5UL<<GPIO_AFRL_AFSEL5_Pos;  //GPIO alternate function set 0101 AF5 SPI PA5
	GPIOA->AFR[0] |= 0x5UL<<GPIO_AFRL_AFSEL6_Pos;  //GPIO alternate function set 0101 AF5 SPI PA6
	GPIOA->AFR[0] |= 0x5UL<<GPIO_AFRL_AFSEL7_Pos;  //GPIO alternate function set 0101 AF5 SPI PA7

}



void spi_send(uint8_t data)
{
	while(!(SPI1->SR & SPI_SR_TXE)); // Wait until TX buffer is empty
	SPI1->DR = data;				  // Write data to TX buffer
}

uint8_t spi_receive(void)
{
	while(!(SPI1->SR & SPI_SR_RXNE));
	return SPI1->DR;
}


int main(void)
{

	SPI1_GPIO_INIT();
	SPI1_INIT();

	for(;;){


		GPIOA->BSRR = GPIO_BSRR_BR4;

		while(!(SPI1->SR & SPI_SR_TXE));  // Wait until TX buffer is empty
		SPI1->DR = 0xFF;				  // Send data to TX buffer


		while(!(SPI1->SR & SPI_SR_RXNE));
		uint8_t tmp  = SPI1->DR;


//		while(!(SPI1->SR & SPI_SR_TXE));
//		while ((SPI1->SR & SPI_SR_BSY));

//		for(uint32_t i = 0; i<100; i++);

		GPIOA->BSRR = GPIO_BSRR_BS4;


	}
}

Siwastaja · « **Reply #6 on:** August 10, 2022, 12:02:55 pm »

Read datasheets carefully, usually all these timing parameters need to be considered, and they may be wildly different with each SPI device because SPI is not any specific standard:

* Minimum required time from nCS->0 to the first clock edge
* Minimum required time from the last clock edge to nCS->1
* Minimum required time between nCS->1 to nCS->0 (idle time between two transactions).

Especially the last one might be very long on some SPI devices, and may be different depending on the data (certain "operations" requiring longer idle time).

If slower clock seemingly solves things, it is often because these other timing parameters get scaled, too.
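To make the three parameters above concrete, here is a minimal C sketch that compares measured nCS timing (e.g. read off a logic analyzer) against a device's minimums. The struct and function names are made up for illustration; the NRF24L01+ numbers are the ones quoted earlier in this thread (2 ns setup, 2 ns hold, 50 ns nCS high) and should be checked against the datasheet.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the three nCS timing figures, in nanoseconds. */
struct cs_timing_ns {
    uint32_t setup;  /* nCS->0 to the first clock edge */
    uint32_t hold;   /* last clock edge to nCS->1      */
    uint32_t idle;   /* nCS->1 to the next nCS->0      */
};

/* NRF24L01+ minimums as quoted in Reply #4 (assumption: verify in the datasheet). */
static const struct cs_timing_ns nrf24_min = { 2u, 2u, 50u };

/* True when every measured interval meets or exceeds the required minimum. */
bool cs_timing_ok(const struct cs_timing_ns *measured,
                  const struct cs_timing_ns *required)
{
    return measured->setup >= required->setup &&
           measured->hold  >= required->hold  &&
           measured->idle  >= required->idle;
}
```

An nCS edge that arrives before the last clock edge, as in the opening post, would be entered as hold = 0 and fail the check.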

Ulfberth · « **Reply #7 on:** August 10, 2022, 12:36:50 pm »

I doubt it was a timing problem, as the SPI communication between the STM32 and the NRF is very clear. The easiest way to check communication is to send any 8 bits to the NRF24, and it will send its status register back.
So the steps are:

1) Pull CS low
2) Check that the TX buffer is empty
3) Write data to DR (can be anything)
4) NRF24 sends its STATUS back
5) Check that the communication has ended
6) Pull CS back high

In my case, step 5 was broken. Checking BSY and RXNE wasn't working at all: CS was pulled high right in the middle of the last bit. And that was at the pretty slow speed of 62.5 kHz, while at higher speeds, even 8 MHz, it works just fine.
So in my opinion (which can be wrong, of course), the problem is caused by the flags (BSY and RXNE), as these flags control when the CS pin is toggled (you can see that in the code I posted above).
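For reference, the six steps can be sketched as one function. This is only an illustration: `struct spi_port` is a made-up stand-in so the sequence can be compiled and stepped through on a PC (on the target, `sr` and `dr` would be SPI1->SR and SPI1->DR, and `cs` would be the BSRR writes), and `settle` is a hypothetical hook for step 5, since what "communication has ended" should mean is exactly what is being discussed here. The flag bit positions follow the STM32F4 SPI_SR layout.

```c
#include <stdint.h>

#define SR_RXNE (1u << 0)   /* receive buffer not empty */
#define SR_TXE  (1u << 1)   /* transmit buffer empty    */

/* Stand-in for the registers the steps touch (assumption, not CMSIS). */
struct spi_port {
    volatile uint32_t sr;   /* status register           */
    volatile uint32_t dr;   /* data register             */
    volatile uint32_t cs;   /* 1 = CS high, 0 = CS low   */
};

uint8_t nrf_read_status(struct spi_port *p, void (*settle)(void))
{
    uint8_t status;

    p->cs = 0;                        /* 1) CS low                        */
    while (!(p->sr & SR_TXE)) { }     /* 2) wait for TX buffer empty      */
    p->dr = 0xFF;                     /* 3) write any byte to DR          */
    while (!(p->sr & SR_RXNE)) { }    /* 4) wait for the STATUS byte      */
    status = (uint8_t)p->dr;          /*    reading DR clears RXNE on HW  */
    if (settle) {
        settle();                     /* 5) optional wait for frame end   */
    }
    p->cs = 1;                        /* 6) CS high                       */
    return status;
}
```

With the stand-in struct, DR simply reads back what was written, so the function returns 0xFF when both flags are preset; on real hardware the read returns whatever the NRF24 shifted out.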

Siwastaja · « **Reply #8 on:** August 10, 2022, 01:02:25 pm »

I usually control nSS with hardware if possible, and if not, with a separate timer. Why? Because:
* SPI master takes predictable and easily calculated time to shift out the bits. You can assume it happens in the calculated time; and it will. Very small jitter can appear due to AHB/APB bus synchronization (as writing to TXDR works as a start trigger), but this is negligible. So you can use a timer to deactivate nCS without considering the SPI peripheral at all.
* As said above, many SPI slaves set strange requirements, like longer-than-usual delay between the last clock edge and nCS deassertion, so that even if the BSY flag did work out as expected (which it of course doesn't, this is STM32 after all), you would still need a timer.

This is what I usually do, nowadays:

isr1():
    activate_ncs();
    possible_tiny_blocking_delay(); // if slave needs significant delay, add one more state
    SPI->TXDR = data_to_send;
    set_handler(isr2);
    set_timer(expected_time+margin);

isr2():
    deactivate_ncs();
    data_to_receive = SPI->RXDR;
    set_handler(isr1);
    set_timer(ncs_idle_time);

In such case, you just need a timer in one-pulse mode which generates interrupts for you. SPI is not configured to generate any interrupts, and SPI flags are not needed for anything, except maybe verifying your assumptions in isr2 (and error out if not as expected).
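The two-handler pattern above can be tried out on a PC with a small simulation. This is a sketch under assumptions, not target code: `struct fake_hw` stands in for the nCS pin, the SPI data register and the one-pulse timer, and all names are invented for illustration. On real hardware each marked line becomes a register access.

```c
#include <stdint.h>

/* Made-up stand-ins for the hardware, so the sequencing can be followed. */
struct fake_hw {
    int      ncs;        /* 1 = deasserted (high), 0 = asserted (low)   */
    uint8_t  spi_dr;     /* data register; loops TX back to RX here     */
    uint32_t timer_ns;   /* last one-shot interval that was armed       */
};

struct spi_job {
    struct fake_hw *hw;
    void (*next)(struct spi_job *);  /* handler for the next timer IRQ  */
    uint8_t  tx, rx;
    uint32_t xfer_ns;                /* expected transfer time + margin */
    uint32_t idle_ns;                /* required nCS idle time          */
};

void job_finish(struct spi_job *j);

void job_start(struct spi_job *j)
{
    j->hw->ncs = 0;                  /* activate nCS                    */
    j->hw->spi_dr = j->tx;           /* writing the data starts the SPI */
    j->next = job_finish;
    j->hw->timer_ns = j->xfer_ns;    /* fire again when the frame is done */
}

void job_finish(struct spi_job *j)
{
    j->hw->ncs = 1;                  /* deactivate nCS                  */
    j->rx = j->hw->spi_dr;           /* collect the received byte       */
    j->next = job_start;
    j->hw->timer_ns = j->idle_ns;    /* keep nCS high long enough       */
}

/* Called from the timer interrupt; SPI flags are never consulted. */
void timer_irq(struct spi_job *j)
{
    j->next(j);
}
```

As an example, one byte at 4 MHz takes 2000 ns, so `xfer_ns` could be 2500 and `idle_ns` 50 for the NRF figures mentioned earlier; calling `timer_irq()` twice then walks through one complete transaction.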

wek · « **Reply #9 on:** August 10, 2022, 02:34:10 pm »

Quote from: Ulfberth on August 10, 2022, 11:43:12 am

It's weird, but I just changed the SPI speed to 4 MHz (it was 62.5 kHz before). I had chosen the slowest DIV because I thought it would be less painful to debug.

And now at 4 MHz it seems to work fine with the RXNE check before pulling CS high.

And what's even weirder, it works even if I toggle CS after polling only TXE and BSY, not touching RXNE at all.

I am completely lost and don't know what was wrong...
Why does it not work at slow speed but work at high speed, when it should be the other way round?

The BSY flag (and maybe RXNE too) still changes half a bit time too soon (i.e. before the complete frame end, on the wrong SCK edge), but at the higher clock your software is too slow to react to it, so CS only moves after the frame has really ended.
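A rough back-of-the-envelope sketch of that, assuming the CPU and APB2 both run at 16 MHz (an inference from 62.5 kHz at the slowest divider, /256, and 4 MHz at /4; the thread does not state the clock). The helper name is made up.

```c
#include <stdint.h>

/* CPU cycles that fit into half an SCK period (integer division, rounds down). */
uint32_t half_bit_cpu_cycles(uint32_t cpu_hz, uint32_t sck_hz)
{
    return cpu_hz / (2u * sck_hz);
}
```

Under that assumption half a bit is 128 CPU cycles at 62.5 kHz, but only 2 cycles at 4 MHz and 1 cycle at 8 MHz. A polling loop plus the BSRR write takes longer than one or two cycles, so at the higher speeds the CS edge lands after the frame end even if the flag fires early, while at 62.5 kHz there is plenty of time to raise CS in the middle of the last bit.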

JW

langwadt · « **Reply #10 on:** August 10, 2022, 03:08:22 pm »

Quote from: Siwastaja on August 10, 2022, 01:02:25 pm

I usually control nSS with hardware if possible, and if not, with a separate timer. Why? Because:
* SPI master takes predictable and easily calculated time to shift out the bits. You can assume it happens in the calculated time; and it will. Very small jitter can appear due to AHB/APB bus synchronization (as writing to TXDR works as a start trigger), but this is negligible. So you can use a timer to deactivate nCS without considering the SPI peripheral at all.
* As said above, many SPI slaves set strange requirements, like longer-than-usual delay between the last clock edge and nCS deassertion, so that even if the BSY flag did work out as expected (which it of course doesn't, this is STM32 after all), you would still need a timer.

This is what I usually do, nowadays:

isr1():
activate_ncs(),
possible_tiny_blocking_delay(); // if slave needs significant delay, add one more state
SPI->TXDR = data_to_send;
set_handler(isr2);
set_timer(expected_time+margin);

isr2():
deactivate_ncs();
data_to_receive = SPI->RXDR;
set_handler(isr1);
set_timer(ncs_idle_time);

In such case, you just need a timer in one-pulse mode which generates interrupts for you. SPI is not configured to generate any interrupts, and SPI flags are not needed for anything, except maybe verifying your assumptions in isr2 (and error out if not as expected).

unless the SPI is running very slow that seems like a lot of overhead

Siwastaja · « **Reply #11 on:** August 10, 2022, 03:18:49 pm »

Quote from: langwadt on August 10, 2022, 03:08:22 pm

unless the SPI is running very slow that seems like a lot of overhead

Why do you think so, and compared to what? DMA, obviously yes, but this code is doing the exact same things as one would do with the SPI peripheral flags, so no extra overhead.

And if and when you do not have an SPI peripheral which could generate the timing and control the nCS pin properly, then there is no other way than to do it in software, so the overhead is inevitable.

I use this pattern in high-performance inertial measurement systems where SPI packets are often around 5-8 bytes, utilizing SPI FIFOs: for example: activate nCS, write 6 bytes to TX FIFO (only 2 memory writes), set timer after the expected time of 6-byte SPI transaction, then deactivate nCS and read out 6 bytes from the FIFO. Two interrupts.

Only DMA can use less CPU time, but then it requires that the SPI peripheral itself can produce the correct timing, and anyway this is limited to a single nCS pin (so no multiple devices per bus). Unless you change the AltFunc configuration in an ISR, in which case you are again using CPU time and could just toggle the nCS pin in software.

newbrain · « **Reply #12 on:** August 10, 2022, 03:31:27 pm »

Quote from: Ulfberth on August 10, 2022, 12:36:50 pm

I doubt it was a timing problem, as the SPI communication between the STM32 and the NRF is very clear.

Yes, it's very clear and described in my post. The NRF needs at least:
2 ns setup time (CS down to first SCK up)
2 ns hold time (last SCK down to CS up)
50 ns CS up time.

And yes, it is, in fact, a timing problem:
- at slow speed the SW managed to keep the Tx buffer filled, so the SPI HW generates an unbroken SCK. There's no physical way to place a CS rising and falling edge that complies with the above, as the falling edge of CS should precede its rising edge.
- when you raised the speed, your SW could not fill DR in time for the SPI to keep clocking continuously, and as SCK was much faster, you had very abundant setup and hold time for CS, as can be seen from the LA trace you posted.

As said, SPI devices come in umpteen flavours - some might cope with CS going up right after the sampling edge of SCK, some (like the NRF) will not.
Unfortunately, the only way to make this work is checking the DS and working around the MCU's SPI limitations in SW.

Siwastaja's advice is - as usual - very good, if you have a spare timer (the STM32F411 has plenty).

langwadt · « **Reply #13 on:** August 10, 2022, 05:23:47 pm »

Quote from: Siwastaja on August 10, 2022, 03:18:49 pm

Why do you think so, and compared to what? DMA, obviously yes, but this code is doing the exact same things as one would do with the SPI peripheral flags, so no extra overhead.

And if and when you do not have an SPI peripheral which could generate the timing and control the nCS pin properly, then there is no other way than to do it in software, so the overhead is inevitable.

I use this pattern in high-performance inertial measurement systems where SPI packets are often around 5-8 bytes, utilizing SPI FIFOs: for example: activate nCS, write 6 bytes to TX FIFO (only 2 memory writes), set timer after the expected time of 6-byte SPI transaction, then deactivate nCS and read out 6 bytes from the FIFO. Two interrupts.

Only DMA can use less CPU time, but then it requires that the SPI peripheral itself can produce the correct timing, and anyway this is limited to a single nCS pin (so no multiple devices per bus). Unless you change the AltFunc configuration in an ISR, in which case you are again using CPU time and could just toggle the nCS pin in software.

With, let's say, 20 MHz SPI it is only a few us for 5-8 bytes. How much do the two extra ISR entries, ISR exits and the timer setup take compared to a busy wait?

Siwastaja · « **Reply #14 on:** August 10, 2022, 06:27:24 pm »

Quote from: langwadt on August 10, 2022, 05:23:47 pm

With, let's say, 20 MHz SPI it is only a few us for 5-8 bytes. How much do the two extra ISR entries, ISR exits and the timer setup take compared to a busy wait?

Busy wait!? But busy wait takes 100% of CPU. You can't do anything on that controller but to communicate on SPI. Maybe in some super simple application you can hand-craft other things (like calculations) to happen during the wait (like games were written in early days, in assembly), but that gets pretty difficult very soon.

For simple performance tests, blocking waits perform better than interrupt-based designs, of course.

If pure DMA is not an option (and it often isn't), and you need CPU to do something else in parallel, you pretty much have to use interrupts and just take the ISR entry / exit penalty.

ISR entry, setting the timer, changing the interrupt vector, ISR exit is maybe 30-40 CPU cycles total.

Timer setup is very comparable to resetting SPI status register flags, a few CPU clock cycles (1-2 register writes). My point was to suggest using a timer instead of SPI peripheral flags to generate the timing. These solutions are comparable in performance. One offers more flexible (adjustable) timing than the other.
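To put rough numbers on the comparison, here is a small sketch that converts an SPI transfer into CPU cycles. The function name is made up, and it assumes the bytes go out back to back with no gaps between them.

```c
#include <stdint.h>

/* CPU cycles that elapse while `bytes` bytes are shifted out at sck_hz. */
uint32_t spi_transfer_cpu_cycles(uint32_t bytes, uint32_t sck_hz, uint32_t cpu_hz)
{
    /* 64-bit intermediate so bytes * 8 * cpu_hz cannot overflow. */
    return (uint32_t)(((uint64_t)bytes * 8u * cpu_hz) / sck_hz);
}
```

For 6 bytes at 20 MHz SPI and an assumed 100 MHz CPU clock this gives 240 cycles of busy waiting per transaction, against two ISRs at the estimated 30-40 cycles each. With a 48 MHz CPU the same transfer is only about 115 cycles, so the margin shrinks; which approach wins depends on the clocks and on what else the CPU has to do.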

peter-h · « **Reply #15 on:** August 10, 2022, 09:52:02 pm »

I have spent much time playing with 32F417 SPI. Now have it working with a wide variety of devices, from ~600kHz to 21MHz. I have Adesto flash on SPI2 and half a dozen different devices (STLED316, ADS1118, MSP3550, HI3593 ARINC429, a funny chinese 8MB RAM, etc) on SPI3 (mutexed, because hardware is obviously not thread-safe).

1) Check the data sheet for CS=0 to first clock, and last clock to CS=1. A lot of chips are only ns but some need a lot of time (1-2us!).

2) Check the clock phase; most SPI devices use the common setting but some do vary. Also most of them don't actually care so can work "by luck".

3) A scope is an absolute must for verifying the timing. Assume nothing, check it all, and you will sometimes be surprised how fast (or slow) things are.

4) Make SPI TX blocking, properly, so you can do CS=1 as soon as the SPI TX function exits. May have to add a delay to meet 1) above. I always receive the transmitted data. When I use DMA I wait for RX DMA transfer count to go to zero (as an exact alternative there is a status bit you can poll). When I poll, I wait for RX byte count to match TX data byte count.

5) If delays are needed, use predictable delay functions e.g.

Code: [Select]

// Hang around for delay in microseconds

__attribute__((noinline))
void hang_around_us(uint32_t delay)
{
  extern uint32_t SystemCoreClock;

  delay *= (SystemCoreClock/4100000);

  asm volatile (
    "1: subs %[delay], %[delay], #1 \n"
    "   nop \n"
    "   bne 1b \n"
    : [delay] "+l"(delay)
    :
    : "cc"   // subs updates the condition flags
  );
}

This one is neater (you have to enable CYCCNT first)

Code: [Select]

// Delay for ms. Uses CPU clock counter (DWT->CYCCNT must be enabled first).
// Counts one millisecond at a time, so delay*SystemCoreClock never has to fit in uint32_t.
// Max delay is uint32_t ms.
// This is a precise delay. The unsigned subtraction stays correct when CYCCNT wraps around.

void hang_around(uint32_t delay)
{
	const uint32_t max_count = SystemCoreClock/1000L;  // cycles per ms (168M cycles = 1 sec)
	uint32_t start_time;
	while (delay-- > 0)	// returns immediately for delay == 0
	{
		start_time = DWT->CYCCNT;
		while((DWT->CYCCNT-start_time) < max_count) ;	// this counts one millisecond
	}
}
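A note on why comparing `DWT->CYCCNT-start_time` works across a counter overflow: unsigned 32-bit subtraction is done modulo 2^32, so the difference is right as long as fewer than 2^32 cycles have passed. A tiny sketch with a made-up helper name:

```c
#include <stdint.h>

/* Cycles elapsed between two CYCCNT samples; correct across one 32-bit wrap. */
uint32_t cycles_elapsed(uint32_t start, uint32_t now)
{
    return now - start;   /* unsigned arithmetic wraps modulo 2^32 */
}
```

For example, a start sample of 0xFFFFFFF0 and a later sample of 0x00000010 give 0x20 cycles. As for enabling the counter, with CMSIS this is typically done by setting CoreDebug_DEMCR_TRCENA_Msk in CoreDebug->DEMCR and then DWT_CTRL_CYCCNTENA_Msk in DWT->CTRL; check this against your core's documentation.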

I had a lot of people here help me with SPI stuff. May be worth looking at those threads. SPI is a lot slower than you might expect, due to the peripheral running at a slow speed (PCLKx) not the CPU clock speed, plus there are multi-clock sync (anti metastability) delays. So e.g. reading some status register is actually a surprisingly slow operation.

If you need some DMA SPI code I can post mine. I gradually moved my SPI comms to DMA, for everything, even the really slow stuff. The only time one can't is for accessing the CCM, which you can't do with DMA.

thm_w · « **Reply #16 on:** August 10, 2022, 11:37:21 pm »

Quote from: Ulfberth on August 10, 2022, 11:43:12 am

It's weird, but I just changed the SPI speed to 4 MHz (it was 62.5 kHz before). I had chosen the slowest DIV because I thought it would be less painful to debug.

Yeah now you've learned. IMO, debug at the highest speed your logic analyzer can reliably read (within part rating of course).
Most should do 5-10MHz or so with SPI.

Ribster · « **Reply #17 on:** August 11, 2022, 06:39:04 am »

Quote from: Ulfberth on August 10, 2022, 07:12:09 am

My goal is to get an NRF24L01+ talking to an STM32F411 (STM32F411-Disco board).
To communicate with the NRF24 you have to toggle the CS pin for every command you send; CS can't simply be held low the whole time.

Why are you not using the HAL library?
This will make your life a lot easier, and your code will be portable across different STM32 families.
In my experience, I toggle the CS pin in software control mode, so no automatic CS toggling by hardware.
Hardware CS had issues in the earlier chips, and I did not use it for the rest of my development on STM32.
It could be that this is fixed in newer hardware.

newbrain · « **Reply #18 on:** August 11, 2022, 07:45:00 am »

Quote from: Ribster on August 11, 2022, 06:39:04 am

Why are you not using the HAL library?

HAL will work, but for all the wrong reasons.

It will work because, being burdened by a lot of extra* code, it will give time to the clock to complete its transition before one has a chance to deassert NSS via SW.
Also, it is internally using BSY, against the RM recommendation and including a 1 ms extra timeout to cater for the BSY bug (where it can stay high at the end of a transaction).

DMA transfers using the HAL are more complicated to set up than using direct register writes (a handle here, a handle there, a function-like macro with questionable semantics).
IRQ transfers are quite slow (AHB/4 max frequency) when doing Tx/Rx, due to the baroque ISR and callback structure (the note applies here too).

Note: I've used it and to rig a Q&D thing I still use it. But one must be aware of its shortcomings.

*Mostly for trying to be as general as possible, sometimes without real justification

Siwastaja · « **Reply #19 on:** August 11, 2022, 11:55:48 am »

Quote from: newbrain on August 11, 2022, 07:45:00 am

It will work because, being burdened by a lot of extra* code, it will give time to the clock to complete its transition before one has a chance to deassert NSS via SW.
Also, it is internally using BSY, against the RM recommendation and including a 1 ms extra timeout to cater for the BSY bug (where it can stay high at the end of a transaction).

This kind of "fail every now and then by unexpectedly hanging around extra 1ms because we write buggy shit" is disastrous to any project that is sensitive to timing.

Obviously the STM32 HAL cannot be used when the timing of SPI communication, or product reliability, is at all critical.

It is quite difficult to completely abstract SPI communications as high-level software calls. This is because,
1) SPI devices have so varying and weird timing requirements,
2) projects are so vastly different; some need cycle-accurate timing to toggle nCS (e.g., ADC sampling (jitter)), some need burst readouts, some need to instantaneously deal with data, others collect them into long buffers, many deal with multiple slaves on single bus,
3) some projects are really tight on performance. For example, having multiple SPI devices on a 20MHz SPI bus, where each device requires their own software nCS, and CPU is running at only say 48MHz, and this data needs to be dealt with and pushed further somewhere else. Maybe you can deal with 20 cycles of ISR latency, but you cannot deal with 20 more cycles wasted on inefficient library call, or worse, 48000 cycles on a failed attempt to work around a bug, when better workarounds exist which are suitable for your project, but not easy to write as generic library calls (see my above "use a timer" approach which sidesteps all SPI peripheral HW bugs).

It is not a sin to NOT abstract a peripheral into its own module in an embedded MCU project, if this ad-hoc strategy enables good performance, relative ease of proving (or at very least testing) worst-case behavior, and reliability. After all, the peripheral register accesses are Just a Few Lines of Code^TM, so there is not much to be gained by abstraction and code reuse, and there is a lot to lose.

Given this thread, OP is already beyond the training wheel HAL level. They already know how the SPI internally works, and how it doesn't work, and also already know why their code does not work, and why the HAL does not work either, because it makes the same mistakes. The only way is up.

peter-h · « **Reply #20 on:** August 11, 2022, 02:59:36 pm »

BTW, the frequently used "1ms" HAL delay is actually anywhere from near-zero to one 1kHz tick. Like osDelay(1) it can be from "very little" to 1ms.

If you actually need a delay of "around 1ms" you need to do osDelay(2)

And same with the "HAL tick" delays.
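The arithmetic behind that, as a simplified model: assume the delay returns on the Nth tick interrupt after the call, and the call happens `phase_us` after the most recent tick (with `phase_us` smaller than the tick period). This is not a statement about how any particular HAL or RTOS version implements its delay; the function name is made up.

```c
#include <stdint.h>

/* Real time (in us) spent in a delay that waits for `ticks` tick interrupts,
   when the call is made phase_us after the most recent tick. */
uint32_t tick_delay_actual_us(uint32_t ticks, uint32_t phase_us, uint32_t tick_us)
{
    return ticks * tick_us - phase_us;
}
```

With a 1 kHz tick, a one-tick delay requested 900 us into the current tick period lasts only 100 us, while a two-tick delay lasts 1100 us, which is why asking for 2 is the way to guarantee at least 1 ms.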

So that code is really crap.

