Author Topic: Is there a way to find STM 32F CPUs which are upwards compatible with 32F417? (Read 6978 times)

peter-h · « **Reply #25 on:** July 09, 2023, 04:29:40 pm »

Maybe I am wrong but would the H not be "redoing everything again"?

The worst parts of the "32F417 job" were ETH and USB, both of which have totally unacceptably crap example source code, which one can get working only by trawling the web. One also needs a deep understanding of the functionality which quite obviously very few users have (myself included), unsurprisingly given the complexity.

Re caches, my "delay x microseconds" asm routine runs 20% faster in flash than in ram

Code:

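// Hang around for delay in microseconds.
// B_SystemCoreClock/4100000 is the number of loop passes per microsecond,
// i.e. the divisor assumes about 4.1 CPU cycles per pass of the loop below.

__attribute__((noinline))
void B_hang_around_us(uint32_t delay)
{
  delay *= (B_SystemCoreClock/4100000L);
  if (delay == 0) return;   // subs on 0 would wrap to 0xFFFFFFFF: ~100 s at 168 MHz

  asm volatile (
    "1: subs %[delay], %[delay], #1 \n"
    "   nop \n"
    "   bne 1b \n"
    : [delay] "+l"(delay)
    :
    : "cc"                  // subs updates the condition flags
  );
}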
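The scaling in B_hang_around_us() is easy to check on a PC. This is a minimal sketch of the same arithmetic; LOOP_DIVISOR and delay_loop_passes() are hypothetical names, and the saturation on overflow is an addition that the original routine does not have. Note that the integer divide truncates: at 168 MHz it gives 40 passes per microsecond rather than 40.98, so delays come out roughly 2.4% short unless you round, e.g. `(core_hz + LOOP_DIVISOR / 2) / LOOP_DIVISOR`.

```c
#include <stdint.h>

/* Divisor used by B_hang_around_us(): about 4.1 CPU cycles per loop pass,
   expressed per microsecond (4.1 cycles x 1 MHz). */
#define LOOP_DIVISOR 4100000UL

/* Loop passes for 'us' microseconds. 0 means "don't enter the asm loop".
   Saturates instead of wrapping when us * passes_per_us would overflow. */
uint32_t delay_loop_passes(uint32_t us, uint32_t core_hz)
{
    uint32_t passes_per_us = core_hz / LOOP_DIVISOR;   /* 168 MHz -> 40 (truncated) */
    if (passes_per_us != 0 && us > UINT32_MAX / passes_per_us)
        return UINT32_MAX;                              /* about 107 s at 168 MHz */
    return us * passes_per_us;
}
```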

Later I did it with CYCCNT (milliseconds)

Code:

// Delay for ms. Uses the CPU cycle counter (DWT->CYCCNT), which must already be
// enabled (CoreDebug->DEMCR TRCENA and DWT->CTRL CYCCNTENA).
// Uses two loops to avoid overflow, since delay*B_SystemCoreClock is too big for uint32_t.
// Max delay is UINT32_MAX ms.
// This is a precise delay. The unsigned subtraction (CYCCNT - start_time) stays
// correct when CYCCNT wraps round through zero.

void B_hang_around(uint32_t delay)
{

	uint32_t max_count = B_SystemCoreClock/1000L;  // cycles per ms, e.g. 168000 at 168 MHz
	uint32_t start_time;

	while (delay > 0)	// a do/while here would turn delay=0 into a ~49-day wait
	{
		start_time = DWT->CYCCNT;
		while((DWT->CYCCNT-start_time) < max_count) ;	// this counts one millisecond
		delay--;
	}

}
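The wraparound handling in B_hang_around() can be exercised on a PC by swapping DWT->CYCCNT for a fake counter that starts just below the wrap point. A sketch under that assumption; hang_around_ms(), elapsed_cycles() and the fake counter are hypothetical test scaffolding, not the post's code.

```c
#include <stdint.h>

/* Hypothetical stand-in for reading DWT->CYCCNT, so the logic runs on a PC. */
typedef uint32_t (*cycle_reader_fn)(void);

/* Unsigned subtraction is modulo 2^32, so this stays correct when the
   counter wraps through zero between the two reads. */
static inline uint32_t elapsed_cycles(uint32_t now, uint32_t start)
{
    return now - start;
}

/* Same structure as B_hang_around(): one inner wait per millisecond. */
void hang_around_ms(uint32_t ms, uint32_t core_hz, cycle_reader_fn read_cycles)
{
    const uint32_t cycles_per_ms = core_hz / 1000u;
    while (ms > 0u) {
        uint32_t start = read_cycles();
        while (elapsed_cycles(read_cycles(), start) < cycles_per_ms) {
            /* spin */
        }
        ms--;
    }
}

/* Fake counter for host testing: starts just below the wrap point and
   advances 1000 cycles per read. */
uint32_t fake_counter = 0xFFFFFF00u;
uint32_t fake_reads = 0u;
uint32_t fake_read(void)
{
    fake_reads++;
    fake_counter += 1000u;
    return fake_counter;
}
```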

Siwastaja · « **Reply #26 on:** July 09, 2023, 05:32:16 pm »

When migrating, it's worth crawling through the reference manual to check whether the most difficult peripherals (say, USB, ETH) are the same. The manuals don't say this directly, but if a peripheral is the same it's obvious, because the whole section in the manual is a copy-paste and looks identical.

But if you managed all the complexity of LWIP + mbedTLS, then surely that work doesn't need to be redone. Even if the ETH MAC is different, similar work has already been done, so the next driver is much quicker. It could still take weeks, of course, if there is some nasty documentation issue etc.

peter-h · « **Reply #27 on:** July 09, 2023, 08:39:46 pm »

The problem is that low_level_input and low_level_output (the glue between LWIP and the ETH MAC/DMA) never worked properly. AFAIK no actual full working code, either polled or interrupt driven, was ever published for the 32F4 for this. Yeah, one particularly rude guy on the ST forum claimed he had some working code, and I am sure he had, but no complete source that one could just use ever surfaced. Only bits here and there. I had a Monday-afternoon guy working for me for about 10 years (doing various jobs mostly unrelated to this) and I think he spent 6 months of Mondays on the ETH glue, LWIP and TLS.

MbedTLS sits on top of LWIP, mostly.

For some of the later chips, sources claimed to be good did/do exist but I wasn't paying attention.

To write that code "by reading the RM" is incredibly complex. The RM doesn't tell you how to do it. ST publish some appnotes with code in them, but you get the same half-baked stuff you get by button-pressing in Cube MX. It's half-baked because of things like handling link up/down status changes and handling the various IP options (fixed, DHCP, etc). Here's just a bit of my code, which is now rock solid and has been running 24/7 on a number of test boards for at least a year. It probably originated from MX. Of course Mr know-it-all "Piranha" on the ST forum tells you it is a load of buggy crap... To understand what it does in detail, or to write it originally, you need to be a total expert on TCP/IP, the 32F4 ETH subsystem, and LWIP's internal operation. This is no good; most people need finished code.

Code:

/**
 * @brief This function should do the actual transmission of the packet. The packet is
 * contained in the pbuf that is passed to the function. This pbuf
 * might be chained.
 *
 * @param netif the lwip network interface structure for this ethernetif
 * @param p the MAC packet to send (e.g. IP packet including MAC addresses and type)
 * @return ERR_OK if the packet could be sent
 *         an err_t value if the packet couldn't be sent
 *
 * @note Returning ERR_MEM here if a DMA queue of your MAC is full can lead to
 *       strange results. You might consider waiting for space in the DMA queue
 *       to become available since the stack doesn't retry to send a packet
 *       dropped because of memory failure (except for the TCP timers).
 */
static err_t low_level_output(struct netif *netif, struct pbuf *p)
{
	err_t errval;
	struct pbuf *q;
	uint8_t *buffer = (uint8_t *)(EthHandle.TxDesc->Buffer1Addr);
	__IO ETH_DMADescTypeDef *DmaTxDesc;
	uint32_t framelength = 0;
	uint32_t bufferoffset = 0;
	uint32_t byteslefttocopy = 0;
	uint32_t payloadoffset = 0;

	DmaTxDesc = EthHandle.TxDesc;
	bufferoffset = 0;

	/* copy frame from pbufs to driver buffers */
	for(q = p; q != NULL; q = q->next)
	{
		/* Is this buffer available? If not, goto error */
		if((DmaTxDesc->Status & ETH_DMATXDESC_OWN) != (uint32_t)RESET)
		{
			errval = ERR_USE;
			goto error;
		}

		/* Get bytes in current lwIP buffer */
		byteslefttocopy = q->len;
		payloadoffset = 0;

		/* Check if the length of data to copy is bigger than Tx buffer size*/
		// This code never runs. See
		// https://www.eevblog.com/forum/microcontrollers/anyone-here-familiar-with-lwip/msg4693118/#msg4693118
		while( (byteslefttocopy + bufferoffset) > ETH_TX_BUF_SIZE )
		{

			//osDelay(2); - was a buffer overwrite issue, not possible to reproduce later
			// see mod at the end of IF_HAL_ETH_TransmitFrame() which is a better fix

			// Copy data to Tx buffer - should use DMA but actually the perf diff is negligible
			#ifdef SPEED_TEST
				TopLED(true);
			#endif
			memcpy_fast( (uint8_t*)((uint8_t*)buffer + bufferoffset), (uint8_t*)((uint8_t*)q->payload + payloadoffset), (ETH_TX_BUF_SIZE - bufferoffset) );
			#ifdef SPEED_TEST
				TopLED(false);
			#endif

			/* Point to next descriptor */
			DmaTxDesc = (ETH_DMADescTypeDef *)(DmaTxDesc->Buffer2NextDescAddr);

			/* Check if the buffer is available */
			if((DmaTxDesc->Status & ETH_DMATXDESC_OWN) != (uint32_t)RESET)
			{
				errval = ERR_USE;
				goto error;
			}

			buffer = (uint8_t *)(DmaTxDesc->Buffer1Addr);

			byteslefttocopy = byteslefttocopy - (ETH_TX_BUF_SIZE - bufferoffset);
			payloadoffset = payloadoffset + (ETH_TX_BUF_SIZE - bufferoffset);
			framelength = framelength + (ETH_TX_BUF_SIZE - bufferoffset);
			bufferoffset = 0;
		}

		/* Copy the remaining bytes */
		#ifdef SPEED_TEST
			TopLED(true);
		#endif
		memcpy_fast( (uint8_t*)((uint8_t*)buffer + bufferoffset), (uint8_t*)((uint8_t*)q->payload + payloadoffset), byteslefttocopy );
		#ifdef SPEED_TEST
			TopLED(false);
		#endif
		bufferoffset = bufferoffset + byteslefttocopy;
		framelength = framelength + byteslefttocopy;
	}

	/* Prepare transmit descriptors to give to DMA */
	IF_HAL_ETH_TransmitFrame(&EthHandle, framelength);

	errval = ERR_OK;

	error:

	/* When Transmit Underflow flag is set, clear it and issue a Transmit Poll Demand to resume transmission */
	if ((EthHandle.Instance->DMASR & ETH_DMASR_TUS) != (uint32_t)RESET)
	{
		/* Clear TUS ETHERNET DMA flag */
		EthHandle.Instance->DMASR = ETH_DMASR_TUS;

		/* Resume DMA transmission*/
		//__DMB();
		EthHandle.Instance->DMATPDR = 0;	// Any value issues a descriptor list poll demand.
	}
	return errval;
}

/**
  * @brief Should allocate a pbuf and transfer the bytes of the incoming
  * packet from the interface into the pbuf.
  *
  * @param netif the lwip network interface structure for this ethernetif
  * @return a pbuf filled with the received packet (including MAC header)
  *         NULL on memory error
  */

static struct pbuf * low_level_input(struct netif *netif)
{
	struct pbuf *p = NULL, *q = NULL;
	uint16_t len = 0;
	uint8_t *buffer;
	__IO ETH_DMADescTypeDef *dmarxdesc;
	uint32_t bufferoffset = 0;
	uint32_t payloadoffset = 0;
	uint32_t byteslefttocopy = 0;
	uint32_t i=0;

	/* get received frame */

	HAL_StatusTypeDef status = IF_HAL_ETH_GetReceivedFrame(&EthHandle);

	if (status != HAL_OK)
	{
		return NULL;		// Return if no RX data
	}
	else
	{
		rxactive=true;	// set "seen rx data" flag
	}

	/* Obtain the size of the packet and put it into the "len" variable. */
	len = EthHandle.RxFrameInfos.length;
	buffer = (uint8_t *)EthHandle.RxFrameInfos.buffer;

	// Dump unwanted multicasts, unless g_eth_multi=true.
	if (should_accept_ethernet_packet(buffer, len))
	{
		/* We allocate a pbuf chain of pbufs from the Lwip buffer pool */
		p = pbuf_alloc(PBUF_RAW, len, PBUF_POOL);
	}

	// Load the packet (if not rejected above) into LWIP's buffer
	if (p != NULL)
	{
		dmarxdesc = EthHandle.RxFrameInfos.FSRxDesc;
		bufferoffset = 0;

		for(q = p; q != NULL; q = q->next)
		{
			byteslefttocopy = q->len;
			payloadoffset = 0;

			/* Check if the length of bytes to copy in current pbuf is bigger than Rx buffer size */
			// This code never runs. See
			// https://www.eevblog.com/forum/microcontrollers/anyone-here-familiar-with-lwip/msg4693118/#msg4693118
			while( (byteslefttocopy + bufferoffset) > ETH_RX_BUF_SIZE )
			{
				/* Copy data to pbuf */
				#ifdef SPEED_TEST
					TopLED(true);
				#endif
				memcpy_fast( (uint8_t*)((uint8_t*)q->payload + payloadoffset), (uint8_t*)((uint8_t*)buffer + bufferoffset), (ETH_RX_BUF_SIZE - bufferoffset));
				#ifdef SPEED_TEST
					TopLED(false);
				#endif

				/* Point to next descriptor */
				dmarxdesc = (ETH_DMADescTypeDef *)(dmarxdesc->Buffer2NextDescAddr);
				buffer = (uint8_t *)(dmarxdesc->Buffer1Addr);

				byteslefttocopy = byteslefttocopy - (ETH_RX_BUF_SIZE - bufferoffset);
				payloadoffset = payloadoffset + (ETH_RX_BUF_SIZE - bufferoffset);
				bufferoffset = 0;
			}

			/* Copy remaining data in pbuf */
			#ifdef SPEED_TEST
				TopLED(true);
			#endif
			memcpy_fast( (uint8_t*)((uint8_t*)q->payload + payloadoffset), (uint8_t*)((uint8_t*)buffer + bufferoffset), byteslefttocopy);
			#ifdef SPEED_TEST
				TopLED(false);
			#endif
			bufferoffset = bufferoffset + byteslefttocopy;
		}
	}

	/* Release descriptors to DMA. This tells the ETH DMA that the packet has been read */
	/* Point to first descriptor */
	dmarxdesc = EthHandle.RxFrameInfos.FSRxDesc;
	/* Set Own bit in Rx descriptors: gives the buffers back to DMA */
	for (i=0; i< EthHandle.RxFrameInfos.SegCount; i++)
	{
		//__DMB();  - fossil code for the 32F417, apparently.
		dmarxdesc->Status |= ETH_DMARXDESC_OWN;
		dmarxdesc = (ETH_DMADescTypeDef *)(dmarxdesc->Buffer2NextDescAddr);
	}

	/* Clear Segment_Count */
	EthHandle.RxFrameInfos.SegCount =0;

	/* When Rx Buffer unavailable flag is set: clear it and resume reception */
	if ((EthHandle.Instance->DMASR & ETH_DMASR_RBUS) != (uint32_t)RESET)
	{
		/* Clear RBUS ETHERNET DMA flag */
		EthHandle.Instance->DMASR = ETH_DMASR_RBUS;
		/* Resume DMA reception */
		EthHandle.Instance->DMARPDR = 0;
	}
	return p;
}
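The core of low_level_output() is copying a chained packet into fixed-size DMA buffers, spilling into the next descriptor when one fills up. Because the real buffers are sized so that path "never runs", here is a host-side sketch with deliberately tiny 8-byte buffers so the split actually happens. struct seg and struct tx_desc are hypothetical stand-ins for lwIP's struct pbuf and the ETH Tx descriptor; this is an illustration of the technique, not a drop-in replacement.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define TX_BUF_SIZE   8u   /* tiny on purpose so the split path runs */
#define TX_DESC_COUNT 4u

/* Hypothetical stand-ins for lwIP's struct pbuf and an ETH DMA Tx descriptor. */
struct seg { const uint8_t *payload; uint16_t len; const struct seg *next; };
struct tx_desc { uint8_t buf[TX_BUF_SIZE]; int owned_by_dma; };

/* Copy a chained packet into consecutive descriptor buffers.
   Returns the frame length, or -1 if a needed descriptor is still owned by
   the DMA (the ERR_USE case) or the ring runs out. */
long copy_chain_to_ring(const struct seg *p, struct tx_desc ring[TX_DESC_COUNT])
{
    size_t d = 0, bufoff = 0;
    long framelen = 0;

    if (ring[0].owned_by_dma)
        return -1;

    for (const struct seg *q = p; q != NULL; q = q->next) {
        size_t left = q->len, payoff = 0;

        /* Fill the current buffer to the end, then move to the next descriptor. */
        while (left + bufoff > TX_BUF_SIZE) {
            size_t chunk = TX_BUF_SIZE - bufoff;
            memcpy(ring[d].buf + bufoff, q->payload + payoff, chunk);
            left -= chunk;
            payoff += chunk;
            framelen += (long)chunk;
            bufoff = 0;
            if (++d >= TX_DESC_COUNT || ring[d].owned_by_dma)
                return -1;
        }

        /* The rest of this segment fits in the current buffer. */
        memcpy(ring[d].buf + bufoff, q->payload + payoff, left);
        bufoff += left;
        framelen += (long)left;
    }
    return framelen;
}
```

With segments of 5, 11 and 2 bytes, the 18-byte frame lands as 8 + 8 + 2 bytes across three descriptors; marking the second descriptor as DMA-owned makes the copy fail, just as the `goto error` path does in the original.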

To port this code, one needs a truly deep understanding of the ETH subsystem and of LWIP's internal structure, which is especially badly documented (it's a solid bit of code but the devs moved on years ago). Obviously there are people here and elsewhere who have this, but AFAICT none of them do consultancy.
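The link up/down handling mentioned earlier is one of the pieces the examples tend to leave half-baked. A minimal sketch of the edge detection, assuming the PHY's standard IEEE 802.3 clause 22 basic status register (register 1, link status in bit 2) is read periodically; link_monitor_t and link_monitor_update() are hypothetical. That bit latches low, so a driver typically reads the register twice to get the current state. What to do on each event (e.g. lwIP's netif_set_link_up()/netif_set_link_down(), then restarting DHCP or applying a fixed IP) is left to the caller here.

```c
#include <stdint.h>
#include <stdbool.h>

#define PHY_BSR_LINK_STATUS (1u << 2)  /* IEEE 802.3 clause 22, register 1, bit 2 */

typedef enum { LINK_EVENT_NONE, LINK_EVENT_UP, LINK_EVENT_DOWN } link_event_t;

typedef struct { bool link_up; } link_monitor_t;

/* Feed in the latest BSR value; returns an event only when the state changes. */
link_event_t link_monitor_update(link_monitor_t *m, uint16_t bsr)
{
    bool now_up = (bsr & PHY_BSR_LINK_STATUS) != 0u;
    if (now_up == m->link_up)
        return LINK_EVENT_NONE;
    m->link_up = now_up;
    return now_up ? LINK_EVENT_UP : LINK_EVENT_DOWN;
}
```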

My finished code transmits out of LWIP whenever LWIP has something to send (no interrupts needed), and this works because ETH is so fast that the data disappears down the cable within a few tens of microseconds. The receive code should use interrupts but doesn't, due to various problems (complex issues with calling the LWIP back end from an ISR, etc.). Instead it is polled in an RTOS task, with the poll interval starting at 10ms and then adaptively shrinking until the data stops. Performance is way short of the 100Mbit/s theoretical limit, but at about 1MB/sec it is plenty good enough for the job. I have zero interest in redoing this for another CPU in my remaining actuarial life expectancy.
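The adaptive RX poll described above can be reduced to one small decision function. This sketch assumes halving on each poll that finds frames and a 1 ms floor, neither of which the post specifies; next_rx_poll_ms() and the constants are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

#define RX_POLL_IDLE_MS 10u  /* slow poll period from the post */
#define RX_POLL_MIN_MS   1u  /* assumption: one tick of a 1 ms RTOS tick */

/* Decide the delay before the next RX poll: halve it while frames keep
   arriving, go back to the idle period as soon as a poll finds nothing. */
uint32_t next_rx_poll_ms(uint32_t current_ms, bool got_frames)
{
    if (!got_frames)
        return RX_POLL_IDLE_MS;
    uint32_t next = current_ms / 2u;
    return (next < RX_POLL_MIN_MS) ? RX_POLL_MIN_MS : next;
}
```

In the RTOS task this would sit in a loop along the lines of `interval = next_rx_poll_ms(interval, frames > 0); osDelay(interval);`, where the loop shape is illustrative. Halving keeps the reaction fast under load while dropping straight back to 10 ms costs almost no CPU when the line is quiet.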

There are also zero-copy versions, which may even work... But I measured the time spent in the memcpy calls (see above code) and they are insignificant.

USB code basically worked (CDC and MSC) except there was no flow control on CDC (I did that, with help here) and MSC was workable only with an ISR taking the whole 15ms FLASH write time. Various past threads. It's OK for my application but most users would regard that as unworkable except with a RAM disk. Again, the code is hugely complex and would take an expert to port to a different USB controller.
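One common way to add CDC flow control (not necessarily how it was done here) is to stop re-arming the OUT endpoint when the receive buffer can't take another full packet, so the host is NAKed until the application catches up. A minimal sketch of the buffer side, assuming a 512-byte power-of-two ring and 64-byte full-speed bulk packets; cdc_rx_store() and cdc_rx_read() are hypothetical. In ST's USB device library the re-arm call would be USBD_CDC_ReceivePacket(), but how it is wired into the callbacks is an assumption here.

```c
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

#define RX_RING_SIZE   512u  /* assumption; must be a power of two */
#define CDC_PACKET_MAX  64u  /* full-speed bulk max packet size */

/* head is written only by the USB ISR, tail only by the application task. */
typedef struct { uint8_t data[RX_RING_SIZE]; uint32_t head, tail; } rx_ring_t;

static uint32_t ring_used(const rx_ring_t *r) { return r->head - r->tail; }  /* wrap-safe */
static uint32_t ring_free(const rx_ring_t *r) { return RX_RING_SIZE - ring_used(r); }

/* Called from the OUT-endpoint callback. Returns true if another full packet
   still fits, i.e. the endpoint may be re-armed now. Bytes beyond capacity
   are dropped, which the gating is meant to prevent. */
bool cdc_rx_store(rx_ring_t *r, const uint8_t *buf, uint32_t len)
{
    for (uint32_t i = 0; i < len && ring_free(r) > 0u; i++) {
        r->data[r->head & (RX_RING_SIZE - 1u)] = buf[i];
        r->head++;
    }
    return ring_free(r) >= CDC_PACKET_MAX;
}

/* Called from the application. After reading, if ring_free() >= CDC_PACKET_MAX
   and the endpoint was left un-armed, re-arm it. */
uint32_t cdc_rx_read(rx_ring_t *r, uint8_t *out, uint32_t max)
{
    uint32_t n = 0;
    while (n < max && ring_used(r) > 0u) {
        out[n++] = r->data[r->tail & (RX_RING_SIZE - 1u)];
        r->tail++;
    }
    return n;
}
```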

ST bought USB from Synopsys. Does anyone know who they got ETH from?

wek · « **Reply #28 on:** July 10, 2023, 10:41:48 am »

Quote from: peter-h on July 09, 2023, 08:39:46 pm

ST bought USB from Synopsys. Does anyone know who they got ETH from?

See picture.

While the OTG USB and ETH modules are of the same provenance in 'F2, 'F4, 'F7 and 'H7 (and OTG USB also in 'F105/107 and higher-end 'L4), they are different versions with subtle but potentially problematic differences.

As I've said above, there are no "comfortably compatible" "better" variants. The 'F405/407/415/417-to-'F427/429/437/439 couple is quite an exception in the higher-end STM32; all other families' members have surprising and annoying differences breaking compatibility in some way (even within the same part number before the suffixes). The 'F2 is downwards compatible, with few but surprising differences beyond processor core and frequency (the 'F2 was in fact a testbed preparing the technology (90nm), peripherals and fabric for the 'F4).

'F7/'H7 may be "better", but the complexities of the slightly-superscalar partially-64-bit-interfaced Cortex-M7 together with the rather complex matrix may lead to subtle but annoying problems in applications written in the spirit of micro*controller* (in contrast to the "oh it's just another computer with some tedious peripherals" approach).

'G4, as has been said, is a "replacement" for 'F3 (as is 'G0 for 'F0). Both 'F3 and 'F0 are built using 180nm (as is 'F1), and this is probably ST's way of coping with potential obsolescence of this technology at the suppliers/fabs. 'G4 is built using super-duper 45nm (similar to 'H7), and it shows.

JW

Siwastaja · « **Reply #29 on:** July 10, 2023, 11:49:11 am »

Yeah, I remember we discussed the "Monday afternoons" guy, and that's really part of the problem. Working with these microcontrollers requires good concentration; any task switching destroys efficiency. I find that if I can take 3-4 days at 10 hours/day I can get a lot done, concentrating on the task at hand only, and nothing more. If I have to multitask between two projects 50-50%, then the efficiency is more like 5%-20%, the 20% being on the easier project. Totally wasted time. Half a day a week is spent entirely on the entry/exit latency of the task switch; no actual hard work gets done at all. You preferably need at least 2-3 full days a week, on adjacent days, so that when you wear out you can rest and continue the next morning without a context-switch penalty (actually the opposite: your brain was working on it unconsciously).

If you can adjust your focus perfectly (and I say again, it doesn't require being a genius; it's more about concentration and attitude than IQ or skills), then getting to grips with mbedtls and adapting it to work took around a week or two for me. I would still be totally fighting with it after 6 months if I had to multitask. I'm now multitasking with two projects and have already wasted a month of everybody's time getting almost nothing done, and it hurts inside so much.

nctnico · « **Reply #30 on:** July 10, 2023, 12:03:53 pm »

I agree. I also work on a project for at least a couple of days. At that point it's typically time to let the project sink in a little bit and work on something different. That actually increases efficiency compared to banging your head against a wall.

peter-h · « **Reply #31 on:** July 10, 2023, 12:04:55 pm »

Quote

While the limitation is not that dramatic as with 'H7 and lifetime even at relatively high temperatures achieves 10 years (contrary to 2 years with 'H7), this is something to be kept in mind when designing with this family.

Bloody hell...

Quote

The 'F405/407/415/417-to-'F427/429/437/439 couple is quite an exception

Interesting that you agree, because you know this stuff. I am documenting all this in the project design document. The 439 is not useful (up from a 437) unless you want to drive a big LCD, AFAICT.

Quote

Yeah, I remember we discussed about the "monday afternoons" guy, and that's really part of the problem. Working with these microcontrollers really requires good concentration, any task switching destroys the efficiency

Yes, but that phase is complete, and now it is just me, 100% (in between running the business etc), at the office and at home.

It wasn't a good situation productivity-wise, but it was all I could afford, plus he was able to do stuff which I didn't know how to do (LWIP/TLS).

Quote

At that point it's typically time to let the project sink in a little bit and work on something different.

I go for a mountain bike ride for 1hr. Excellent for software

I've just had a look at the 437/439 again. It has three more SPI channels (SPI4, 5, 6), which can run 2x faster than my SPI2/3, and that could be handy. 42MHz would double the filesystem read speed (an Adesto FLASH which can run at something like 80MHz). But there is no way to AF-map any of SPI4/5/6 to the SPI2 pins I am using. It looks like lots of people complained about the old SPI2/3 supporting only half speed; daft, since SPI1 can go at full speed. But SPI1 is lost if you want four UARTs. May be wrong, but a lot of time was spent on working out that stuff.
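The 21 MHz versus 42 MHz ceiling comes from the bus clocks. With the usual 168 MHz setup, SPI2/3 sit on APB1 (42 MHz) and SPI1 (plus SPI4/5/6 on the 'F42x/43x) on APB2 (84 MHz), and the SPI clock can only be PCLK divided by a power of two from 2 to 256. A small sketch of picking the BR field; spi_br_for_max() is a hypothetical helper.

```c
#include <stdint.h>

/* STM32F4 SPI: f_SCK = f_PCLK / 2^(BR+1), BR = 0..7 (divide by 2..256).
   Returns the smallest BR whose SCK does not exceed max_sck_hz, or -1. */
int spi_br_for_max(uint32_t pclk_hz, uint32_t max_sck_hz, uint32_t *actual_hz)
{
    for (int br = 0; br <= 7; br++) {
        uint32_t f = pclk_hz >> (br + 1);
        if (f <= max_sck_hz) {
            if (actual_hz != 0)
                *actual_hz = f;
            return br;
        }
    }
    return -1;
}
```

So an 80 MHz-capable serial FLASH on SPI2 tops out at 21 MHz (about 2.6 MB/s raw), while the same part on an APB2 SPI gets 42 MHz (about 5.25 MB/s raw), which is where "double the filesystem read speed" comes from.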

nctnico · « **Reply #32 on:** July 10, 2023, 12:08:41 pm »

Yeah, doing some kind of 'sporty' activity is a super boost to getting things done. It sounds crazy that not working is better than working, but it is absolutely true.

wek · « **Reply #33 on:** July 10, 2023, 02:38:37 pm »

Quote from: peter-h on July 10, 2023, 12:04:55 pm

Quote
While the limitation is not that dramatic as with 'H7 and lifetime even at relatively high temperatures achieves 10 years (contrary to 2 years with 'H7), this is something to be kept in mind when designing with this family.
Bloody hell...

That's if you operate it at some 10 degrees below the absolute maximum (of the internal "junction" temperature, which can be surprisingly higher than ambient). In other words, they require you to pay more attention to thermal design; it's the same issue as with PCs, for exactly the same reasons.
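The junction-versus-ambient point is the usual steady-state estimate below. The example numbers (0.4 W dissipation, θJA = 40 °C/W) are hypothetical; the real θJA depends on the package and heavily on the PCB copper, so it has to come from the datasheet and your own board.

```latex
T_J = T_A + P_D \,\theta_{JA}
\qquad\text{e.g.}\qquad
T_J = 85\,^{\circ}\mathrm{C} + 0.4\,\mathrm{W} \times 40\,^{\circ}\mathrm{C/W} = 101\,^{\circ}\mathrm{C}
```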

Quote

Quote
The 'F405/407/415/417-to-'F427/429/437/439 couple is quite an exception

The 439 is not useful (up from a 437) unless you want to drive a big LCD, AFAICT.

The 'F429 came handy when we could not get 'F427 during the Big Shortage; we gladly left the LCD controller (LTDC) unused...

Btw. you can't really drive a *big* LCD with the STM32 LTDC; the point is to save the external controller with small-to-medium LCDs. Nintendo may have been the primary customer for the STM32H7Ax/Bx.

JW

peter-h · « **Reply #34 on:** July 10, 2023, 03:43:54 pm »

Right.

I plan to drive an LCD via SPI.
https://www.eevblog.com/forum/projects/graphics-on-an-embedded-system-tft-display-need-help-with-some-pointers/msg3861806/#msg3861806
https://www.ramtex.dk/gclcd/glcd0129.htm

Apart from VGA+ displays I think 21MHz SPI is fine for text stuff and non-moving graphics.
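A quick way to put numbers on "fine" for a 21 MHz SPI display is raw full-frame time: pixels x bits per pixel / SCK. This ignores command overhead, gaps between transfers and any partial-update tricks a graphics library may use, so treat it as a lower bound. spi_frame_time_us() is hypothetical and the display sizes are just examples.

```c
#include <stdint.h>

/* Time to clock one full frame out over SPI, in microseconds.
   Ignores command bytes, CS toggling and gaps between transfers. */
uint32_t spi_frame_time_us(uint32_t width, uint32_t height,
                           uint32_t bits_per_pixel, uint32_t sck_hz)
{
    uint64_t bits = (uint64_t)width * height * bits_per_pixel;
    return (uint32_t)(bits * 1000000u / sck_hz);
}
```

At 21 MHz a 320x240 RGB565 frame takes about 58.5 ms (roughly 17 full redraws per second), while a 128x160 panel takes about 15.6 ms. That is plenty for text and static graphics, and marginal for anything that scrolls or animates full-screen.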

wek · « **Reply #35 on:** July 10, 2023, 03:55:39 pm »

Quote from: peter-h on July 10, 2023, 03:43:54 pm

I plan to drive an LCD via SPI. Apart from VGA+ displays I think 21MHz SPI is fine for text stuff and non-moving graphics.

Define "fine". As usual, the devil is...

But you can try, on a cheap 'F429 Disco.

JW

peter-h · « **Reply #36 on:** July 10, 2023, 04:17:18 pm »

Ha

Yes we bought the 407 DISCO board to start this project, a few years ago. Then, like everybody else, I copied the circuit to the maximum extent possible. It comes with a big LCD, which sure enough did work, and after a day or two of messing about we managed to send a printf() to it.

But this is no good unless you have loads of spare GPIO. The original Cube IDE/MX code had the LCD memory mapped at 0x60000000 or some such. Obviously if you want to do serious LCD stuff then you need this parallel drive.

At 2MB/sec SPI I am under no illusions re performance. As I wrote I will use a library e.g.
https://www.ramtex.dk/gclcd/glcd1611.htm
and that has build options for trading perf versus available RAM.

That way, any CPU with SPI will be usable.

wek · « **Reply #37 on:** July 10, 2023, 04:36:18 pm »

It's 'F429 Disco, not 'F407 (the latter does not have display).

Quote

But this is no good unless you have loads of spare GPIO.

Thou shalt have faith. Look into the sources in the zip I linked to in that thread.

JW

peter-h · « **Reply #38 on:** July 10, 2023, 08:48:40 pm »

It does not open for me, and the previous is a .hex file.

This is from an old (modded) linkfile which came with the DISCO board

Code:

MEMORY
{
	FLASH_BOOT (rx)     : ORIGIN = 0x08000000, LENGTH = 32K
	FLASH_APP (rx)      : ORIGIN = 0x08008000, LENGTH = 1024K-32K
	RAM (xrw)           : ORIGIN = 0x20000000, LENGTH = 128K
	MEMORY_B1 (rx)      : ORIGIN = 0x60000000, LENGTH = 0K
	CCMRAM (rw)         : ORIGIN = 0x10000000, LENGTH = 64K
}

and the MEMORY_B1 is the memory mapped LCD which was an optional add-on for our board. The CPU on that board was a 407.
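For context on how a memory-mapped LCD like that is addressed: on the 'F407 the FSMC maps NOR/SRAM bank 1 (chip select NE1) at 0x60000000, and a parallel LCD is usually wired with its RS (data/command) line on one FSMC address pin, so writes to two addresses produce command and data cycles. With a 16-bit bus the FSMC drives FSMC_A[n] from CPU address bit n+1. The choice of A16 in the example is hypothetical (it depends on the board), as is fsmc_lcd_data_addr().

```c
#include <stdint.h>

#define FSMC_BANK1_NE1 0x60000000u  /* FSMC bank 1, sub-bank 1 (NE1) base */

/* Address whose write drives the LCD's RS pin high (data cycle).
   16-bit bus: FSMC_A[n] comes from HADDR[n+1]; 8-bit bus: from HADDR[n]. */
uint32_t fsmc_lcd_data_addr(uint32_t bank_base, unsigned rs_addr_pin,
                            unsigned bus_width_bits)
{
    unsigned shift = (bus_width_bits == 16u) ? rs_addr_pin + 1u : rs_addr_pin;
    return bank_base | (1u << shift);
}
```

Command writes would then go to the bank base (RS low) and pixel data to the returned address, via a `volatile uint16_t *` on the target.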

You can also see it e.g. here
https://github.com/EFeru/hoverboard-sideboard-hack-STM/blob/main/STM32F103C8Tx_FLASH.ld
https://community.st.com/t5/stm32-mcu-products/where-in-cube-ide-are-the-linker-options/td-p/172416

It's a bit sad when one keeps finding one's own posts on google, and practically no other useful replies, but this shows how much of a learning curve I have gone up in last 2 years.

tszaboo · « **Reply #39 on:** July 10, 2023, 09:56:44 pm »

Quote from: hans on July 09, 2023, 03:37:18 pm

Quote from: peter-h on July 09, 2023, 04:26:09 am
Quote
Then at least you don't need different drivers for each series

How is that possible?

Speaking of the H7, this is a way to have lots of fun
https://community.st.com/t5/stm32-mcus/dma-is-not-working-on-stm32h7-devices/ta-p/49498
Standard stuff for the Cortex-M7 with D-caches. You need to clean (flush) or invalidate (discard) cache lines correctly to sync up the CPU's view of SRAM locations with the actual memory that was changed by hardware. If I program on M7 and need to get something working fast, then I keep these caches disabled at first... but it'll be a considerable performance penalty.
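Cache maintenance on the M7 works on whole 32-byte lines, so a DMA buffer's address range has to be widened to line boundaries before being passed to CMSIS SCB_CleanDCache_by_Addr() (before a memory-to-peripheral DMA) or SCB_InvalidateDCache_by_Addr() (after a peripheral-to-memory DMA). A host-side sketch of that arithmetic; cache_align_span() is a hypothetical helper.

```c
#include <stdint.h>

#define DCACHE_LINE 32u  /* Cortex-M7 D-cache line size in bytes */

typedef struct { uintptr_t addr; uint32_t size; } cache_span_t;

/* Widen [addr, addr + size) to whole cache lines. */
cache_span_t cache_align_span(uintptr_t addr, uint32_t size)
{
    uintptr_t mask  = (uintptr_t)(DCACHE_LINE - 1u);
    uintptr_t start = addr & ~mask;
    uintptr_t end   = (addr + size + mask) & ~mask;
    cache_span_t s = { start, (uint32_t)(end - start) };
    return s;
}
```

Widening an invalidate can silently throw away neighbouring variables that share the first or last line, which is why DMA buffers on the H7 are normally 32-byte aligned and sized in multiples of 32.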

I don't think the G series is a replacement for the F4 series. If anything, it is more of a replacement for the F3 series. I think the G series has a similar high-resolution PWM peripheral, and it has a faster M4 CPU for things like motor, DC/DC, etc. control loops. No idea why they call them "mainstream" devices, because it looks pretty niche to me.
Perhaps the price advantage is only temporary: they will surely work as general-purpose devices, but once the part is locked into more designs and also gets older, it will inevitably become more expensive to keep buying.

I don't see a problem with using the F407 if it just works. Well, as long as you can get them, because it's the main workhorse part that got ST big in the ARM space (along with the F103). So it's good to investigate part alternatives, but that's still very different from a complete redesign.

The G4 series is indeed a replacement for the F3; it even says so on their website somewhere. It's motor-control specific. My favorite parts have been the F411 and the F103; I didn't really see a reason in my projects to use anything in between.
Now on my last project, I used a tiny STM32G0 and was very pleasantly surprised by it. It only needed 2 power supply pins (take note of that, Raspberry foundation), and the BOOT0, BOOT1 pins are gone, so with the dedicated programming pins you only lose 5 of the 20 pins on that package. Plus the entire IC used about 5mA at full speed, about 1/4 of the comparable but slower STM32F0. It really has a few nice features for an IC that costs less than a dollar.
I think they are going to make a G1 series, and these can become very popular MCUs.

SiliconWizard · « **Reply #40 on:** July 10, 2023, 10:07:24 pm »

The G0 is pretty good indeed.

nctnico · « **Reply #41 on:** July 10, 2023, 10:13:10 pm »

Until you start reading the documentation... So far I have found the ADC and internal flash documentation to be incorrect. The UART has issues as well when receiving a stream of data back-to-back from a slightly distorted source, as if a bit offset error were accumulating, which shouldn't happen.
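A UART resynchronises on every start bit, so a plain baud-rate mismatch should not accumulate from one byte to the next; it only eats into the timing margin within each frame. Ruling out the divider error itself is still a cheap first step. uart_baud_error_ppm() is a hypothetical helper assuming 16x oversampling, where the BRR value is effectively round(fck / baud).

```c
#include <stdint.h>

/* Baud error for 16x oversampling, where BRR is effectively round(fck / baud)
   (on the 'F4 that is the mantissa/fraction pair; on the 'G0 USART it is BRR itself).
   Returns parts per million; negative means the real rate is below the requested one. */
int32_t uart_baud_error_ppm(uint32_t fck_hz, uint32_t baud)
{
    uint32_t brr = (fck_hz + baud / 2u) / baud;                /* rounded divider */
    int64_t actual_x1e6 = ((int64_t)fck_hz * 1000000) / brr;   /* actual rate x 1e6 */
    return (int32_t)((actual_x1e6 - (int64_t)baud * 1000000) / baud);
}
```

For example, 16 MHz at 115200 baud gives BRR = 139 and an error of about -0.08%, far too small on its own to cause trouble, so a persistent problem would point elsewhere (signal edges, noise filtering, or the sampling configuration).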

peter-h · « **Reply #42 on:** July 11, 2023, 07:30:54 am »

I am not really looking for a compatible downgrade

The most it would save is maybe 2 quid.

tszaboo · « **Reply #43 on:** July 11, 2023, 09:06:42 am »

Quote from: nctnico on July 10, 2023, 10:13:10 pm

Until you start reading the documentation... So far I have found the ADC and internal flash documentation to be incorrect. The UART has issues as well when receiving a stream of data back-to-back from a slightly distorted source, as if a bit offset error were accumulating, which shouldn't happen.

I wouldn't know; I just changed the target MCU and the pin definitions, recompiled, and the code was working. If you see errors, report them, and once they are confirmed they will probably make it into the errata, and maybe even get fixed in the next revision. It doesn't feel like an Infineon MCU or a PIC32MZ riddled with dozens of design-breaking errata to me (famous PIC32MZ erratum for their 18MSPS fast ADC: the ADC doesn't work at all).

AndyC_772 · « **Reply #44 on:** July 11, 2023, 09:15:47 am »

I started a project with a G0 earlier this year. Found it really surprisingly slow to do some simple integer arithmetic and immediately upgraded to a G4 instead, which was definitely time and effort well spent.

I'm still struggling to see anything that's an actual upgrade from the F405 and its immediate siblings, though - although the F765 is now, finally, back in stock and useful for those projects that need a whole new level of performance.

(Now, where do all those DSB / ISB instructions need to go again?)

nctnico · « **Reply #45 on:** July 11, 2023, 09:52:28 am »

Quote from: tszaboo on July 11, 2023, 09:06:42 am

Quote from: nctnico on July 10, 2023, 10:13:10 pm
Until you start reading the documentation... So far I have found the ADC and internal flash documentation to be incorrect. The UART has issues as well when receiving a stream of data back-to-back from a slightly distorted source, as if a bit offset error were accumulating, which shouldn't happen.
I wouldn't know; I just changed the target MCU and the pin definitions, recompiled, and the code was working. If you see errors, report them, and once they are confirmed they will probably make it into the errata, and maybe even get fixed in the next revision. It doesn't feel like an Infineon MCU or a PIC32MZ riddled with dozens of design-breaking errata to me (famous PIC32MZ erratum for their 18MSPS fast ADC: the ADC doesn't work at all).

Then you are using the HAL (which also has compatibility issues BTW). But I want to get rid of the HAL, because the HAL requires you to compile with the compiler option that puts each function in a separate section so the linker can remove unused functions/sections. This gets in the way of using inline functions (speed optimisation) and other functions/symbols that need to be kept. It is a hot mess...
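To make the section garbage-collection question concrete, here is a small sketch assuming GCC-style `-ffunction-sections -fdata-sections` plus `-Wl,--gc-sections`; reg_field() and fw_version_tag are hypothetical. A `static inline` function defined in a header is expanded by the compiler at each call site before the linker ever sees sections. A symbol that nothing references by name is the case that needs explicit protection.

```c
#include <stdint.h>

/* Header-style static inline helper: inlined at compile time, so there is
   no separate function section for --gc-sections to keep or drop. */
static inline uint32_t reg_field(uint32_t reg, uint32_t mask, uint32_t pos)
{
    return (reg & mask) >> pos;
}

/* Hypothetical data that is only ever read from outside the program
   (e.g. by a bootloader at a known offset). 'used' stops the compiler from
   discarding it; with -fdata-sections the linker can still drop its section
   unless the linker script wraps that section in KEEP(). */
__attribute__((used)) const uint32_t fw_version_tag[2] = { 0x46574D56u, 0x00010002u };
```

The linker-side half is a `KEEP(*(...))` around the relevant input section in the linker script; which section name to use depends on how the symbol is placed.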

peter-h · « **Reply #46 on:** July 11, 2023, 10:10:48 am »

Quote

I'm still struggling to see anything that's an actual upgrade from the F405 and its immediate siblings

See above - 417 then 437/439.

Quote

where do all those DSB / ISB instructions need to go again

If you believe the famous guy on the ST forum, this is all BS for the F4 series. Except here, apparently

Code:

static inline void B_reboot(void)
{
	// Ensure all outstanding memory accesses including buffered write are completed before reset
	__ASM volatile ("dsb 0xF":::"memory");
	// Keep priority group unchanged
	SCB->AIRCR  = (uint32_t)((0x5FAUL << SCB_AIRCR_VECTKEY_Pos) |
	                           (SCB->AIRCR & SCB_AIRCR_PRIGROUP_Msk) |
	                            SCB_AIRCR_SYSRESETREQ_Msk    );
	__ASM volatile ("dsb 0xF":::"memory");
	// wait until reset
	for(;;)
	{
		__NOP();
	}
}
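The AIRCR write in B_reboot() only takes effect if it carries the 0x5FA key in the top half-word; reading AIRCR returns 0xFA05 there instead (VECTKEYSTAT), so the key must be inserted rather than read back. A host-side sketch of the value being written; the macro names mirror CMSIS but are redefined here so it compiles on a PC, and aircr_reset_value() is hypothetical.

```c
#include <stdint.h>

/* Field layout of the ARMv7-M AIRCR register. */
#define AIRCR_VECTKEY_POS      16u
#define AIRCR_VECTKEY          0x5FAu
#define AIRCR_PRIGROUP_MSK     (7u << 8)
#define AIRCR_SYSRESETREQ_MSK  (1u << 2)

/* Value to write for a system reset that keeps the current priority grouping. */
uint32_t aircr_reset_value(uint32_t aircr_now)
{
    return (AIRCR_VECTKEY << AIRCR_VECTKEY_POS)
         | (aircr_now & AIRCR_PRIGROUP_MSK)
         | AIRCR_SYSRESETREQ_MSK;
}
```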

Quote

because the HAL requires you to compile with the compiler option that puts each function in a separate section so the linker can remove unused functions/sections. This gets in the way of using inline functions (speed optimisation) and other functions/symbols that need to be kept. It is a hot mess...

I didn't notice this problem; auto unreachable code removal is a great feature. The "HAL" stuff contains a lot of code which, even if you rewrite it all, is still very useful. The alternative is to rewrite everything according to the RM, which will take you years to get anything nontrivial to work. I tend to use HAL code as a guide and if I use a whole function I strip it down to what is actually needed.

nctnico · « **Reply #47 on:** July 11, 2023, 10:26:09 am »

Automatic unused function removal may seem great at the start but if you dig a little deeper, it is not without its problems (like not being able to use inline functions efficiently; I have not found an easy fix for that).

The problem is that the HAL is rather obfuscated in places, where it seems to work more by luck than by design. So even if you rewrite the HAL code, you may end up with code that doesn't work for some undocumented reason. So typically I end up stripping all unneeded parts from the HAL to reduce code size in cases where rewriting is too much work. For some things the way the HAL works doesn't fit my intended use (or it is too bloated), so there is no other option than to write my own hardware driver. It is not uncommon to replace 30+ lines of HAL code with writing/modifying two or three registers.

AndyC_772 · « **Reply #48 on:** July 11, 2023, 10:40:16 am »

@peter-h: you need ISB / DSB instructions in code running on M7 which requires memory accesses to happen in a particular order. Without them you can get into situations where the expected flow of operations (ie. do this, and then do that) doesn't match the order in which things actually do happen. It's important when doing direct hardware access, eg. enable a DMA controller, then read its 'busy' bit to check if it's still running.

@nctnico: I completely agree about the HAL, it's an obfuscation layer, not an abstraction layer. You still need to RTFM, you still need to understand the underlying hardware in detail, and code isn't directly portable between CPUs with differing peripherals.

peter-h · « **Reply #49 on:** July 11, 2023, 11:05:56 am »

IMHO it would be an unusual case where one needed inline code on a 168MHz CPU. In most cases where a higher speed is needed, a new look at the job is better. Many years ago, in the bad old days of IAR Z180 C, I was coding a lot of stuff in asm but really it was an architecture issue (like writing a totally specific version of scanf for parsing xxx.xxxx numbers; you use two atoi() etc). Similarly upgrading a 168MHz chip to a 480MHz one, with a massive R&D cost hit, not to mention the company component stock profile hit, is quite probably not needed, against optimising the key bits.
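The scanf replacement for xxx.xxxx numbers can be sketched as one atoi() for the integer part plus a digit loop for the fraction. That is a variation on the "two atoi()" approach: a plain atoi() on the fraction would read ".05" and ".5" both as 5 unless you also count digits. parse_fixed4() and the x10^4 scaling are assumptions for illustration.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdlib.h>
#include <ctype.h>

/* Parse "[-]xxx.xxxx" into an integer scaled by 10^4, e.g. "12.5" -> 125000.
   Fraction digits beyond the fourth are ignored (truncated). */
bool parse_fixed4(const char *s, int32_t *out)
{
    bool neg = (*s == '-');
    if (neg)
        s++;
    if (!isdigit((unsigned char)*s))
        return false;

    const char *p = s;
    int int_digits = 0;
    while (isdigit((unsigned char)*p)) { p++; int_digits++; }
    if (int_digits > 5)
        return false;                 /* keeps whole * 10000 inside int32_t */
    int32_t whole = atoi(s);          /* atoi() stops at the '.' */

    int32_t frac = 0;
    int frac_digits = 0;
    if (*p == '.') {
        p++;
        while (isdigit((unsigned char)*p) && frac_digits < 4) {
            frac = frac * 10 + (*p - '0');
            p++;
            frac_digits++;
        }
    }
    while (frac_digits < 4) { frac *= 10; frac_digits++; }   /* ".5" means 5000 */

    int32_t v = whole * 10000 + frac;
    *out = neg ? -v : v;
    return true;
}
```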

I've been coding embedded since 1980 and find the 32F4 execution speed really awesome. 10x to 100x faster than anything before. Or 1000x if you use single floats.

The HAL functions are stupidly written but I think it was done to make the MX code generator less granular i.e. less work to code it. So on e.g. a UART function they check the config reg to see if you enabled CRC, rather than have another config option in MX for CRC!!! It is totally stupid but nobody can say it doesn't actually work, because that bit test is well under a microsecond.

Almost every time I use some HAL function I end up stripping it down, but having it saves me days or weeks. You can check it against the reg descriptions and that is useful for learning how it hangs together.

However I believe the ST code has some timing dependencies which are accidentally taken care of by the bloat, and that's nasty. However, no different to finding some code on github. Much of it is dud anyway, and a lot of it worked on some 16MHz Atmel chip and totally does not work on a 168MHz F4. Especially when driving FLASH chips and /CE timing...

Yes I know about H7/M7 etc needing ISB/DSB but on the F4 it was declared to be pointless. Except

Code:

		// Enter CPU standby mode. Some is out of HAL_PWR_EnterSTANDBYMode(). There is some
		// weird stuff around the shutdown code e.g.
		// https://www.eevblog.com/forum/microcontrollers/how-fast-does-st-32f417-enter-standby-mode/msg4062652/#msg4062652
		// Also https://community.st.com/s/question/0D73W000001nnJOSAY/detail
		// That guy is nasty but knows the topic.

		PWR->CSR &= ~PWR_CSR_EWUP;				// disable WKUP pin, just in case
		PWR->CR |= PWR_CR_CWUF | PWR_CR_PDDS;	// select standby mode
		(void)PWR->CR;							// readback to ensure the bit is set before next
		SCB->SCR |= SCB_SCR_SLEEPDEEP_Msk;		// deep sleep
		DBGMCU->CR = 0; 						// Disable debug, trace and IWDG in low-power modes

		// CPU clock stops, DAC outputs float, etc
		// ** Code after this point is not executed (checked by waggling GPIO) **
		// But sometimes __WFI can fail - if e.g. there is an interrupt pending
		// e.g. USB serial FLASH read interrupt can be blocked for 200-300us. Hence the loop.

		for (;;)
		{
			__DSB();
			__WFI();
		}

		// Should never get here, but if we did, we want to start with a freshly
		// initialised unit.

		reboot();

and more in the ETH code (I removed all those).

The HAL stuff dovetails with the Cube MX concept and their -DISCO boards, to get people started quickly, flashing a LED (which would otherwise take a non-expert a week plus) and even funky stuff like getting an HTTP server serving a fixed page. But moving on from these silly "demos for middle management" is hard work, as I know having been doing that the last 2 years.

