Topic: Looking for help to debug 32F4 ethernet


peter-h (topic starter)

  • Super Contributor
  • Posts: 4143
  • Country: gb
  • Doing electronics since the 1960s...
Looking for help to debug 32F4 ethernet
« on: July 25, 2024, 05:52:34 pm »
I posted the background here
https://www.eevblog.com/forum/programming/weird-strcat()-behaviour/

Now I have narrowed it down to the ETH controller in the 32F417 not seeing any incoming data. This happens if a certain #included text string is defined as static const rather than just const. Yes, this is totally bizarre, but I have to start somewhere.

I can step through the ETH initialisation and it all seems to be OK. I haven't read the registers back to check that every bit is correctly set; there are so many. The LAN8742 PHY also gets initialised, via the "64 bit UART" interface (MDIO/MDC).

But is there some sort of low-hanging fruit, e.g. a particular register that controls whether an incoming packet gets notified?

For example, this function never sees any data:

Code:

HAL_StatusTypeDef IF_HAL_ETH_GetReceivedFrame(ETH_HandleTypeDef *heth)
{
  uint32_t framelength = 0U;

  /* Check if segment is not owned by DMA */
  /* if (((heth->RxDesc->Status & ETH_DMARXDESC_OWN) == (uint32_t)RESET) && ((heth->RxDesc->Status & ETH_DMARXDESC_LS) != (uint32_t)RESET)) */
  //__DMB();
  if (((heth->RxDesc->Status & ETH_DMARXDESC_OWN) == (uint32_t)RESET))
  {
    /* Check if last segment */
    if (((heth->RxDesc->Status & ETH_DMARXDESC_LS) != (uint32_t)RESET))
    {
      /* Increment segment count */
      (heth->RxFrameInfos).SegCount++;

      /* Check if last segment is first segment: one segment contains the frame */
      if ((heth->RxFrameInfos).SegCount == 1U)
      {
        (heth->RxFrameInfos).FSRxDesc = heth->RxDesc;
      }

      heth->RxFrameInfos.LSRxDesc = heth->RxDesc;

      /* Get the frame length of the received packet: subtract the 4 CRC bytes */
      framelength = (((heth->RxDesc)->Status & ETH_DMARXDESC_FL) >> ETH_DMARXDESC_FRAMELENGTHSHIFT) - 4U;
      heth->RxFrameInfos.length = framelength;

      /* Get the buffer start address */
      heth->RxFrameInfos.buffer = ((heth->RxFrameInfos).FSRxDesc)->Buffer1Addr;
      /* Point to next descriptor */
      heth->RxDesc = (ETH_DMADescTypeDef*) ((heth->RxDesc)->Buffer2NextDescAddr);

      /* Return function status */
      return HAL_OK;
    }
    /* Check if first segment */
    else if ((heth->RxDesc->Status & ETH_DMARXDESC_FS) != (uint32_t)RESET)
    {
      (heth->RxFrameInfos).FSRxDesc = heth->RxDesc;
      (heth->RxFrameInfos).LSRxDesc = NULL;
      (heth->RxFrameInfos).SegCount = 1U;
      /* Point to next descriptor */
      heth->RxDesc = (ETH_DMADescTypeDef*) (heth->RxDesc->Buffer2NextDescAddr);
    }
    /* Check if intermediate segment */
    else
    {
      (heth->RxFrameInfos).SegCount++;
      /* Point to next descriptor */
      heth->RxDesc = (ETH_DMADescTypeDef*) (heth->RxDesc->Buffer2NextDescAddr);
    }
  }

  /* Return function status */
  return HAL_ERROR;
}

This condition
if(((heth->RxDesc->Status & ETH_DMARXDESC_OWN) == (uint32_t)RESET))
never becomes true, i.e. the DMA never hands a descriptor back to the CPU.
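When OWN never clears on the current descriptor, it can help to look at the whole RX ring rather than just heth->RxDesc. Below is a minimal, host-runnable sketch of such a census, assuming the STM32F4 normal RX descriptor layout in chained mode (status word RDES0 with OWN in bit 31, FS in bit 9, LS in bit 8, frame length in bits 29:16). The rx_desc_t model and rx_ring_census() are hypothetical; on the target the link is the 32-bit Buffer2NextDescAddr field rather than a C pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RDES0 bits, matching the ETH_DMARXDESC_* values in the STM32F4 HAL. */
#define RXDESC_OWN      0x80000000u  /* 1 = descriptor owned by the DMA    */
#define RXDESC_FL_MASK  0x3FFF0000u  /* frame length including 4-byte CRC  */
#define RXDESC_FL_SHIFT 16u
#define RXDESC_FS       0x00000200u  /* first segment of a frame           */
#define RXDESC_LS       0x00000100u  /* last segment of a frame            */

/* Hypothetical host-side model of a chained RX descriptor. */
typedef struct rx_desc {
    volatile uint32_t status;
    uint32_t ctrl;
    uint8_t *buf1;
    struct rx_desc *next;
} rx_desc_t;

typedef struct {
    unsigned total;      /* descriptors visited before the ring wrapped   */
    unsigned dma_owned;  /* still waiting for the DMA to fill them        */
    unsigned cpu_owned;  /* filled, or never handed over, and held by CPU */
    unsigned complete;   /* CPU-owned descriptors holding a last segment  */
} rx_census_t;

/* Walk the ring once from 'head' and count who owns each descriptor.
   Stops on returning to 'head', on a NULL link, or after 'max' steps. */
rx_census_t rx_ring_census(const rx_desc_t *head, unsigned max)
{
    rx_census_t c = {0, 0, 0, 0};
    const rx_desc_t *d = head;
    assert(head != NULL);
    do {
        uint32_t s = d->status;
        c.total++;
        if (s & RXDESC_OWN) {
            c.dma_owned++;
        } else {
            c.cpu_owned++;
            if (s & RXDESC_LS)
                c.complete++;
        }
        d = d->next;
    } while (d != NULL && d != head && c.total < max);
    return c;
}
```

If every descriptor is still DMA-owned and nothing ever completes, the DMA is not receiving at all (MAC receiver off, link down, or DMA stopped); if every descriptor is CPU-owned, the DMA has nothing to receive into and will suspend.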

I suspect that initialised constants are getting corrupted somewhere. Or maybe there is a DMA buffer alignment issue, but I have hopefully checked everything: it is all 4-byte aligned.

The rx is polled from an RTOS task; it is not interrupt-driven.
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

glenenglish

  • Frequent Contributor
  • Posts: 458
  • Country: au
  • RF engineer. AI6UM / VK1XX . Aviation pilot. MTBr
Re: Looking for help to debug 32F4 ethernet
« Reply #1 on: July 25, 2024, 08:46:38 pm »
Do you have link up and link down notifications, and auto-negotiation information from the MDIO interface?
What happens if you try to send a packet?

 

peter-h (topic starter)

  • Super Contributor
  • Posts: 4143
  • Country: gb
  • Doing electronics since the 1960s...
Re: Looking for help to debug 32F4 ethernet
« Reply #2 on: July 25, 2024, 09:31:38 pm »
Yes, I do have link up/down detection.

There is this macro for link detection:

#define netif_is_link_up(netif) (((netif)->flags & 0x04U) ? (u8_t)1 : (u8_t)0)

From stepping through, I never see link up. But doesn't link up need some incoming data? This isn't like USB, where you can sense VBUS for a simple check. You need a cable going to a switch that is sending out some packets, and I have that connection.

Obviously this is pretty basic. I suspect an issue with initialised data (whether const, or copied flash -> RAM), so the init is not happening correctly.

Putting that static const string at the end of another very similar file containing another static const string doesn't bomb it, suggesting that any movement in memory addresses is not the problem.


« Last Edit: July 25, 2024, 09:46:19 pm by peter-h »

glenenglish

  • Frequent Contributor
  • Posts: 458
  • Country: au
  • RF engineer. AI6UM / VK1XX . Aviation pilot. MTBr
Re: Looking for help to debug 32F4 ethernet
« Reply #3 on: July 26, 2024, 04:19:37 am »
If you don't have link up, you won't get any packets.
 

peter-h (topic starter)

  • Super Contributor
  • Posts: 4143
  • Country: gb
  • Doing electronics since the 1960s...
Re: Looking for help to debug 32F4 ethernet
« Reply #4 on: July 26, 2024, 07:05:21 am »
Link up must be the ETH subsystem receiving some packet and setting a register bit:
(netif)->flags & 0x04U
Since the cable etc. is still there, this must be a bad initialisation of the subsystem (the 32F417 or the LAN8742).

I will try to probe the LAN8742 "UART" interface.

glenenglish

  • Frequent Contributor
  • Posts: 458
  • Country: au
  • RF engineer. AI6UM / VK1XX . Aviation pilot. MTBr
Re: Looking for help to debug 32F4 ethernet
« Reply #5 on: July 26, 2024, 10:32:40 am »
Have a read up on how the PHY protocol works over MDIO. Read about auto-negotiation, advertising etc.

Link up is not the ETH subsystem receiving a packet.
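For reference, the "64 bit UART" is the IEEE 802.3 clause 22 management frame carried on MDIO/MDC: a 32-bit preamble of all ones, then a 32-bit frame made of start (01), opcode (10 = read, 01 = write), 5-bit PHY address, 5-bit register address, 2-bit turnaround and 16 data bits. The sketch below only packs those fields to make the layout concrete; the function names are hypothetical, and on the STM32F4 the MAC builds and shifts out this frame itself when ETH->MACMIIAR/MACMIIDR are written (which is what the HAL PHY read/write functions do), so this is not something you would normally code.

```c
#include <assert.h>
#include <stdint.h>

#define MDIO_ST     0x1u  /* start of frame: binary 01          */
#define MDIO_OP_WR  0x1u  /* write opcode: binary 01            */
#define MDIO_OP_RD  0x2u  /* read opcode: binary 10             */
#define MDIO_TA_WR  0x2u  /* write turnaround, driven by MAC: 10 */

/* Pack the 32 bits that follow the all-ones preamble of a clause 22 write:
   ST(31:30) OP(29:28) PHYAD(27:23) REGAD(22:18) TA(17:16) DATA(15:0). */
uint32_t mdio_c22_write_frame(uint8_t phy, uint8_t reg, uint16_t data)
{
    assert(phy < 32u && reg < 32u);  /* both address fields are 5 bits */
    return (MDIO_ST << 30) | (MDIO_OP_WR << 28) |
           ((uint32_t)phy << 23) | ((uint32_t)reg << 18) |
           (MDIO_TA_WR << 16) | data;
}

/* For a read the MAC only drives ST, OP, PHYAD and REGAD (14 bits);
   the PHY drives the second turnaround bit and the 16 data bits. */
uint16_t mdio_c22_read_header(uint8_t phy, uint8_t reg)
{
    assert(phy < 32u && reg < 32u);
    return (uint16_t)((MDIO_ST << 12) | (MDIO_OP_RD << 10) |
                      ((unsigned)phy << 5) | reg);
}
```

For example, mdio_c22_write_frame(0, 0, 0x8000), a soft reset written to register 0 of PHY 0, packs to 0x50028000.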

 

peter-h (topic starter)

  • Super Contributor
  • Posts: 4143
  • Country: gb
  • Doing electronics since the 1960s...
Re: Looking for help to debug 32F4 ethernet
« Reply #6 on: July 26, 2024, 02:33:14 pm »
Fortunately I am changing the direction of this, because I found that the "const" on a string, which breaks both ETH and USB (GCC v11), still breaks USB even if all the ETH code is stripped out.

I am 99% sure that whatever the problem is, it has nothing to do with ETH (unless it is a buffer alignment issue or some such).

Unfortunately USB is also mega complex. I suspect peripheral initialisation is going wrong, in particular some initialised variable like
int fred = 5;
getting messed up.

ajb

  • Super Contributor
  • Posts: 2735
  • Country: us
Re: Looking for help to debug 32F4 ethernet
« Reply #7 on: July 26, 2024, 04:06:24 pm »
Quote
Link Up must be the ETH subsystem receiving some packet and setting a register bit

The link is established at the PHY level, not the MAC level (which is the part that's built into the MCU).  If their pins are strapped correctly, two PHYs can negotiate a link with no MACs/MCUs involved at all.  The host can determine if the link is up (and at what speed, etc) by querying the PHY's status register through the SMI, which is (usually) the MDIO/MDC interface (but can also be an SPI interface, especially if the PHY is actually a whole switch with many ports plus switching behavior to configure).
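A minimal sketch of that PHY-side check, assuming a generic clause 22 PHY such as the LAN8742: register 1 (Basic Status) has link status in bit 2 and auto-negotiation complete in bit 5, and the link bit latches low, so it is read twice and the second value is used. The mdio_read_fn callback and the fake PHY are hypothetical stand-ins for your driver's register read (e.g. HAL_ETH_ReadPHYRegister()). In lwIP 2.x the 0x04 bit tested by netif_is_link_up() is NETIF_FLAG_LINK_UP, which software sets by calling netif_set_link_up(), typically from a task that polls the PHY in this way; the MAC does not set it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PHY_REG_BSR        1u       /* clause 22 register 1: Basic Status */
#define BSR_LINK_STATUS    0x0004u  /* bit 2: link up (latched low)       */
#define BSR_ANEG_COMPLETE  0x0020u  /* bit 5: auto-negotiation complete   */

/* Hypothetical MDIO accessor: returns register 'reg' of PHY address 'phy'. */
typedef uint16_t (*mdio_read_fn)(uint8_t phy, uint8_t reg);

typedef struct {
    bool link_up;
    bool aneg_done;
} phy_link_t;

/* Decode one BSR value. */
phy_link_t phy_decode_bsr(uint16_t bsr)
{
    phy_link_t st = {
        .link_up   = (bsr & BSR_LINK_STATUS) != 0u,
        .aneg_done = (bsr & BSR_ANEG_COMPLETE) != 0u,
    };
    return st;
}

/* Read BSR twice: the first read returns (and clears) a latched link-down
   event, the second reflects the current link state. */
phy_link_t phy_poll_link(mdio_read_fn rd, uint8_t phy)
{
    assert(rd != NULL);
    (void)rd(phy, PHY_REG_BSR);
    return phy_decode_bsr(rd(phy, PHY_REG_BSR));
}

/* Host-side stand-in for a PHY so the sketch runs off-target: the first
   BSR read returns a latched link-down value, later reads the live value. */
static unsigned fake_reads;
static uint16_t fake_mdio_read(uint8_t phy, uint8_t reg)
{
    (void)phy;
    (void)reg;
    return (fake_reads++ == 0u) ? 0x7809u : 0x782Du;
}
```

Only the second read is trusted; acting on the first would report link down once after every real or spurious drop.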

Assuming the STM32F4 has the same MAC hardware as the STM32F7 series, one thing to note is that the MAC must be prompted to scan the descriptor list to see if any are available to receive into.  If it scans the list and finds no descriptors it owns, it goes into idle mode and won't receive anything.  If things get initialized out of order, it's possible for the MAC to scan the list, find no descriptors available, and then go idle before the descriptors are fully initialized.  This also means your polling function (or the one provided by HAL) must prompt the MAC to rescan the descriptor list after the buffers are consumed, handled, and returned to the MAC's ownership.
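On the STM32F4 the pieces involved are the "receive buffer unavailable" flag RBUS in ETH->DMASR (bit 7 per the reference manual, write 1 to clear) and the receive poll demand register ETH->DMARPDR, where any write makes the DMA re-read the current descriptor. ST's lwIP examples for the legacy F4 HAL appear to include a similar check after handing buffers back. The sketch below models only the order of operations on a host; the register struct, rx_desc_t and rx_release_and_resume() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define RXDESC_OWN 0x80000000u  /* RDES0 bit 31: descriptor owned by DMA       */
#define DMASR_RBUS 0x00000080u  /* assumed: DMASR bit 7, RX buffer unavailable */

/* Hypothetical stand-in for the ETH DMA registers used here. On target these
   are ETH->DMASR (write 1 to clear) and ETH->DMARPDR (any write = poll demand). */
typedef struct {
    volatile uint32_t dmasr;
    volatile uint32_t dmarpdr;
    unsigned poll_demands;      /* model only: counts writes to DMARPDR */
} eth_dma_regs_t;

typedef struct { volatile uint32_t status; } rx_desc_t;

static void write_dmasr(eth_dma_regs_t *r, uint32_t v) { r->dmasr &= ~v; } /* W1C */
static void write_dmarpdr(eth_dma_regs_t *r, uint32_t v)
{
    r->dmarpdr = v;
    r->poll_demands++;
}

/* Give a consumed descriptor back to the DMA, then resume reception if the
   DMA suspended because it found no descriptor it owned. */
void rx_release_and_resume(eth_dma_regs_t *r, rx_desc_t *d)
{
    d->status = RXDESC_OWN;          /* hand ownership back first          */
    /* On target, a __DMB() here keeps the OWN write ahead of the kick.    */
    if (r->dmasr & DMASR_RBUS) {
        write_dmasr(r, DMASR_RBUS);  /* clear the sticky status flag       */
        write_dmarpdr(r, 0);         /* make the DMA re-read the descriptor */
    }
}
```

The order matters: kicking the DMA before OWN is set back would just make it find no descriptor again and suspend.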
 

ejeffrey

  • Super Contributor
  • Posts: 3932
  • Country: us
Re: Looking for help to debug 32F4 ethernet
« Reply #8 on: July 27, 2024, 01:50:39 am »
It sure sounds like a buffer overflow.  Probably by one or maybe two bytes.  Changing the size or storage location of specific variables can affect the number of padding bytes between subsequent fields.  If you overflow into a padding byte it's probably harmless, if you overflow into another data structure, bad news.

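Write overflows are the most obvious, but a read overflow can also cause problems: padding bytes are usually initialized to zero, so a read that runs past the end of an object (e.g. a string missing its terminator) may only work because it happens to land on a zero padding byte.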

I would look in the map file and find the variable whose size change triggers the bug, then look at the nearby objects to see if one looks like a good culprit for an overflow. 

One thing you could do is take a "working" build and find the locations of every padding byte in the data section.  Write random data into the padding bytes then run and see if any get overwritten, or if the build now crashes.
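A minimal sketch of that padding test, assuming you have already pulled the gap addresses and lengths out of the .map file (by hand or with a script). The gap_t table, the 0xA5 fill value and both functions are hypothetical: fill the gaps at the start of main(), exercise the failing path, then check which gap, if any, was hit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CANARY 0xA5u   /* arbitrary fill value, unlikely to occur by accident */

/* One padding gap between two objects, as read from the .map file. */
typedef struct {
    uint8_t *start;
    size_t   len;
} gap_t;

/* Fill every gap with the canary. Call before the suspect code runs. */
void canary_fill(const gap_t *gaps, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < gaps[i].len; j++)
            gaps[i].start[j] = CANARY;
}

/* Return the index of the first gap whose canary was overwritten, or -1 if
   all are intact. *offset (if non-NULL) gets the damaged byte's offset. */
long canary_check(const gap_t *gaps, size_t n, size_t *offset)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < gaps[i].len; j++)
            if (gaps[i].start[j] != CANARY) {
                if (offset != NULL)
                    *offset = j;
                return (long)i;
            }
    return -1;
}
```

A hit at offset 0 or 1 of a gap points at the object just below it in the map file as the likely overflow source.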
 

peter-h (topic starter)

  • Super Contributor
  • Posts: 4143
  • Country: gb
  • Doing electronics since the 1960s...
Re: Looking for help to debug 32F4 ethernet
« Reply #9 on: July 27, 2024, 08:12:10 am »
The curious thing is that it affects USB and ETH, while the other 20+ RTOS tasks keep running.

I've just realised that the one thing these two have in common is the generation of special clocks. USB uses 48MHz and ETH uses 50MHz. But the 50MHz comes from the PHY chip...

Both also use DMA, which needs aligned addresses (unless used in byte mode). I have a lot of declarations like this:
static uint8_t cdc_receive_temp_buffer[64]  __attribute__ ((aligned (4)));
and there is another syntax:
__ALIGN_BEGIN static uint8_t USBD_MSC_CfgHSDesc[USB_MSC_CONFIG_DESC_SIZ]  __ALIGN_END =
The latter uses a macro:

Code:
/* In HS mode and when the DMA is used, all variables and data structures dealing
   with the DMA during the transaction process should be 4-bytes aligned */

#if defined   (__GNUC__)        /* GNU Compiler */
  #define __ALIGN_END    __attribute__ ((aligned (4)))
  #define __ALIGN_BEGIN
#else
  #define __ALIGN_END
  #if defined   (__CC_ARM)      /* ARM Compiler */
    #define __ALIGN_BEGIN    __align(4)
  #elif defined (__ICCARM__)    /* IAR Compiler */
    #define __ALIGN_BEGIN
  #elif defined  (__TASKING__)  /* TASKING Compiler */
    #define __ALIGN_BEGIN    __align(4)
  #endif /* __CC_ARM */
#endif /* __GNUC__ */
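Rather than relying on every declaration carrying the right attribute or macro, DMA buffer alignment can also be checked directly. A minimal sketch, assuming a C11 compiler: alignas is the standard spelling of __attribute__ ((aligned (4))), putting the alignment into a type lets _Static_assert check it at compile time, and is_aligned()/CHECK_DMA_ALIGN (hypothetical names) catch any buffer that slipped through at startup.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

/* True if p is a multiple of a bytes; a must be a power of two. */
static inline bool is_aligned(const void *p, uintptr_t a)
{
    return ((uintptr_t)p & (a - 1u)) == 0u;
}

/* Startup check for one DMA buffer; the failure message names the buffer. */
#define CHECK_DMA_ALIGN(buf) assert(is_aligned((buf), 4u) && #buf)

/* C11 equivalent of __attribute__ ((aligned (4))) on the buffer itself. */
static alignas(4) uint8_t cdc_receive_temp_buffer[64];

/* Putting the alignment into the type lets the compiler check it statically. */
typedef struct {
    alignas(4) uint8_t data[1524];   /* example size */
} eth_rx_buf_t;
_Static_assert(alignof(eth_rx_buf_t) >= 4, "DMA buffer type must be 4-byte aligned");
```

Calling CHECK_DMA_ALIGN on each DMA buffer once at startup turns a silent misalignment into an immediate, named assertion failure.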

I can change the failure by moving that code

Code:
// Function returns a pointer to the customer application string.
// This is the customer application name - max 32 bytes

#include "appname.ini"

char * get_appname(void)
{
return (char*) (appnamestring);
}

around within the project, but ETH and USB always fail together.

In fact, just changing the length of the name string also controls it (3 bytes breaks it, anything longer makes it run). So I have a very easy way to break it or not :)
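For what it's worth, this is how a 1 to 3 byte change can matter: whatever follows the string and is placed with ALIGN(4) (or a larger alignment) starts at the next multiple of that alignment, so the padding in between jumps between 0 and 3 bytes as the string grows. The helpers below just do that arithmetic; the names and the example address are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of a (a power of two), as ". = ALIGN(a)" does. */
static inline uint32_t align_up(uint32_t addr, uint32_t a)
{
    assert(a != 0u && (a & (a - 1u)) == 0u);
    return (addr + a - 1u) & ~(a - 1u);
}

/* Padding bytes that ALIGN(a) inserts after an object ending at 'end'. */
static inline uint32_t align_gap(uint32_t end, uint32_t a)
{
    return align_up(end, a) - end;
}
```

For a string starting at a 4-aligned address such as 0x08010000, "abc" plus its terminator (4 bytes) needs no padding before the next 4-aligned item, while "abcd" (5 bytes) needs 3.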

More weird is that this has been in development for years (I have some 400 snapshots backed up, over just 2 years) and has never done anything "funny". One thing I looked at again was the linkfile...
« Last Edit: July 27, 2024, 10:02:30 am by peter-h »

peter-h (topic starter)

  • Super Contributor
  • Posts: 4143
  • Country: gb
  • Doing electronics since the 1960s...
Re: Looking for help to debug 32F4 ethernet
« Reply #10 on: July 27, 2024, 03:24:02 pm »
I've fixed it by stepping back through previous versions. Can anyone see a problem with this linkfile syntax? The purpose is to first collect DATA from the first module (main.o) and then collect DATA from the remaining modules. There is no particular reason for doing that other than it being neater in the .map file.

Code:

/* Initialized data sections for non boot block code. These go into RAM. LMA copy is loaded after code. */
/* This stuff is copied from FLASH to RAM by C code in the main stub */
/* This is stuff like int fred = 1; which is copied flash -> ram (not const) */

. = ALIGN(4);

/* main.c DATA */
  .main_data :
  {
    . = ALIGN(4);
    _s_nonboot_data = .;        /* create a global symbol at data start */
    *main.o (.data .data*)      /* .data sections */
    . = ALIGN(4);
  } >RAM  AT >FLASH_APP

/* Remaining DATA */
  ._other_data :
  {
    . = ALIGN(4);
    *(.data .data*)             /* .data sections */
    . = ALIGN(4);
    _e_nonboot_data = .;        /* define a global symbol at data end */
  } >RAM  AT >FLASH_APP

  /* used by the main stub C code to initialize data */
  _si_nonboot_data = LOADADDR(.main_data);


Previously it was all done in one go, and that works fine:

Code:

  .all_nonboot_data :
  {
    . = ALIGN(4);
    _s_nonboot_data = .;        /* create a global symbol at data start */
    *(.data .data*)             /* .data sections */
    . = ALIGN(4);
    _e_nonboot_data = .;        /* define a global symbol at data end */
  } >RAM  AT >FLASH_APP

  /* used by the main stub C code to initialize data */
  _si_nonboot_data = LOADADDR(.all_nonboot_data);

Obviously it would be good to know what the problem is in the first one, but the linkfile syntax is horrible and there is no useful error reporting; you just get a line # because real men don't need anything else :)

I did always suspect the issue was with DATA not getting initialised correctly in RAM from its FLASH image.

The code in startup.s that sets up the initialised data is the standard ST code:

Code:
/* Copy the boot block initialised data from FLASH to SRAM */
  movs  r1, #0
  b  LoopCopyDataInit
CopyDataInit:
  ldr  r3, =_si_boot_data
  ldr  r3, [r3, r1]
  str  r3, [r0, r1]
  adds  r1, r1, #4
LoopCopyDataInit:
  ldr  r0, =_s_boot_data
  ldr  r3, =_e_boot_data
  adds  r2, r0, r1
  cmp  r2, r3
  bcc  CopyDataInit

Elsewhere I did it in C, with memcpy() etc but in this case I left the old startup code.
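For comparison, a C equivalent of that startup.s loop: it copies 32-bit words from the FLASH image at _si_boot_data into RAM from _s_boot_data up to _e_boot_data. Like the assembly, it assumes the load image is one contiguous block laid out exactly like the RAM range. In this sketch the addresses are parameters so it runs on a host; declaring the linker symbols as extern arrays, as in the comment, is one common convention, not necessarily how this project does it.

```c
#include <assert.h>
#include <stdint.h>

/* C version of CopyDataInit/LoopCopyDataInit: copy 32-bit words from the load
   image in FLASH (src) into RAM from dst up to, but not including, dst_end.
   On target, for example:
     extern const uint32_t _si_boot_data[];
     extern uint32_t _s_boot_data[], _e_boot_data[];
     copy_data_init(_si_boot_data, _s_boot_data, _e_boot_data);          */
void copy_data_init(const uint32_t *src, uint32_t *dst, const uint32_t *dst_end)
{
    assert(dst <= dst_end);
    while (dst < dst_end)
        *dst++ = *src++;
}
```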
« Last Edit: July 28, 2024, 08:17:05 pm by peter-h »

Siwastaja

  • Super Contributor
  • Posts: 8888
  • Country: fi
Re: Looking for help to debug 32F4 ethernet
« Reply #11 on: July 27, 2024, 03:44:17 pm »
Quote
I have narrowed it down

As others have repeatedly said, you have not narrowed anything down. You are probably seeing the effects of memory corruption, and any random change in the code affects what exactly breaks at that time. The problem isn't where it breaks down. Therefore the usual strategy of narrowing down the exact place of breakage and then modifying that place is wasted time, because the cause is somewhere else entirely.

But the place which breaks can still act as a hint.

I suggest things like:
* Look at static code analysis tools
* List symbol addresses and look what is close to the weirdly behaving variable, for over/underindexing
* do text search on memcpy and go through every single one of them, carefully checking the addresses and size parameters
* same for array indexing []

This is all tedious and slow work, but one thing is sure: wasting time in things that definitely are not the problem is 100% waste of time.
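One way to make that memcpy audit stick is to route copies into fixed-size arrays through a wrapper that knows the destination size and refuses to overflow it. A minimal sketch; checked_memcpy() and COPY_INTO() are hypothetical names, and the macro is only valid where the destination is a real array, since sizeof on a pointer gives the pointer size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Copy n bytes into a destination whose capacity is dst_size. Returns false
   and copies nothing if the copy would overflow; a breakpoint on that branch
   stops the debugger at the offending call. */
bool checked_memcpy(void *dst, size_t dst_size, const void *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    if (n > dst_size)
        return false;
    memcpy(dst, src, n);
    return true;
}

/* Only for real arrays: sizeof(dst_array) must be the array size, not a pointer size. */
#define COPY_INTO(dst_array, src, n) \
    checked_memcpy((dst_array), sizeof(dst_array), (src), (n))
```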
 

peter-h (topic starter)

  • Super Contributor
  • Posts: 4143
  • Country: gb
  • Doing electronics since the 1960s...
Re: Looking for help to debug 32F4 ethernet
« Reply #12 on: July 27, 2024, 03:54:27 pm »
What do you think of that linkfile code?

Doctorandus_P

  • Super Contributor
  • Posts: 3901
  • Country: nl
Re: Looking for help to debug 32F4 ethernet
« Reply #13 on: July 29, 2024, 07:15:23 am »
I agree with Siwastaja. Bugs that appear and disappear when things are shifted around in memory (variable declarations, linkfile settings) are often the result of memory corruption: an uninitialized pointer writing to an already occupied location, a write to one element beyond the end of an array, that sort of thing. This means you cannot find such bugs by examining the variables or memory being corrupted; you have to figure out where that memory is being overwritten from.

I once spent 8 work days narrowing down a bug like this. My job was not to find the bug itself, but to delete sections of code while still being able to reproduce the bug. In those 8 days I managed to delete around 90% to 95% of the code. After that I got stuck and a more experienced colleague took over. With the help of a logic analyzer (with a built-in disassembler) he figured out a few days later that a pointer only capable of iterating over a 64k byte memory area was being used on a bigger chunk of memory, and it was fixed by changing a compiler setting for that pointer size.
 

peter-h (topic starter)

  • Super Contributor
  • Posts: 4143
  • Country: gb
  • Doing electronics since the 1960s...
Re: Looking for help to debug 32F4 ethernet
« Reply #14 on: July 29, 2024, 07:48:43 am »
Quote
In those 8 days I managed to delete around 90% to 95% of the code

Yeah - doing the same here. Removed rather more than 95% and can still replicate it (on the old code - I decided to have a closer look). I am still convinced it is a linkfile issue.

peter-h (topic starter)

  • Super Contributor
  • Posts: 4143
  • Country: gb
  • Doing electronics since the 1960s...
Re: Looking for help to debug 32F4 ethernet
« Reply #15 on: July 29, 2024, 03:29:18 pm »
Found it... those ALIGN(4) directives create gaps (obviously), and these put the FLASH-based data image out of sync with the RAM addresses it gets copied to at startup.

That teaches one not to be too clever trying to order modules sensibly! One can still achieve that ordering quite simply, though, as I posted here:
https://www.eevblog.com/forum/programming/a-specific-q-on-gcc-linker-script-syntax/

Align directives are handled correctly within one output section, but if you have multiple output sections in a linkfile (for multiple modules) and the startup code copies them as one block from a single load address, then sync is lost. This is rather weird but probably does not catch out many people, because who would bother to do it that way?
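A small host-side simulation of that failure mode, under stated assumptions: the startup code copies everything from _s_nonboot_data to _e_nonboot_data as one block starting at _si_nonboot_data = LOADADDR(.main_data), and the hypothetical layout below has one word of padding between the two output sections in RAM but none between their images in FLASH. The single-block copy then shifts ._other_data by one word, while copying each output section with its own load address, start and end (which ld can supply via LOADADDR(), ADDR() and SIZEOF()) lands every word in the right place. region_t and both functions are illustrative names, not the project's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One initialised-data output section: where its image sits in FLASH (LMA)
   and the RAM range it must occupy at run time (VMA). */
typedef struct {
    const uint32_t *load;   /* LOADADDR(section)               */
    uint32_t       *start;  /* ADDR(section)                   */
    uint32_t       *end;    /* ADDR(section) + SIZEOF(section) */
} region_t;

/* What the original startup effectively did: one LMA base, one VMA range. */
void copy_as_one_block(const uint32_t *load, uint32_t *start, uint32_t *end)
{
    while (start < end)
        *start++ = *load++;
}

/* Robust alternative: copy each output section from its own load address,
   so differing gaps in FLASH and RAM no longer matter. */
void copy_per_section(const region_t *r, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        const uint32_t *src = r[i].load;
        for (uint32_t *dst = r[i].start; dst < r[i].end; )
            *dst++ = *src++;
    }
}
```

A simpler alternative that keeps main.o first is a single output section that lists *main.o(.data .data*) before *(.data .data*); ld assigns each input section to the first pattern in the script that matches it, so main.o's data is not placed twice.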
« Last Edit: July 29, 2024, 04:02:01 pm by peter-h »
The following users thanked this post: Siwastaja

