Author Topic: 32F4 hard fault trap - how to track this down? (Read 11343 times)

ataradov · « **Reply #25 on:** July 23, 2022, 04:46:19 pm »

It is not always possible to know the exact instruction. Some faults are imprecise. And based on the information in post #13, no valid information was saved; the fault is secondary to something else going on.

It is hard to suggest something useful when a lot of other people are suggesting other stuff.

Jeroen3 · « **Reply #26 on:** July 23, 2022, 04:51:12 pm »

In peter's last post the fault address points to the memory location of an assertion ready to be printed.

ataradov · « **Reply #27 on:** July 23, 2022, 04:57:37 pm »

There is assertion text in the memory, but I don't see any pointers that it is ready to be printed.

But if that is a suspicion in any way, then the assertion routine should be changed to while (1) and a breakpoint set there. But based on all the previous information, I don't think it is.

peter-h · « **Reply #28 on:** July 23, 2022, 05:29:12 pm »

Quote

Have you enabled "halt on exception" in the debug startup settings yet?

That is checked, but greyed-out.

It does halt on exceptions. That is how I got that data.

Quote

There is assertion text in the memory

What is "assertion text"?

Some of the STM libs use "assert..." all over the place. I don't use it.

cv007 · « **Reply #29 on:** July 23, 2022, 07:15:09 pm »

Quote

Some of the STM libs use "assert..." all over the place. I don't use it.

lwip has its own assert settings, under arch settings such as-

https://github.com/STMicroelectronics/stm32_mw_lwip/blob/master/system/arch/cc.h

and if this is the same as what you have, it would explain the assertion text in your mcu. I would assume that define could be changed to use the 'general' assert macro so you can enable/disable all asserts in one place. I didn't dig through all the defines to figure out if there is another way in lwip to disable its assertions, but it's possible this is the only place it can be done.
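A minimal sketch of routing the lwip assert to one project-wide handler, assuming the port's cc.h defines LWIP_PLATFORM_ASSERT(x) the way the linked ST file does. The handler name app_assert_failed and the three static variables are hypothetical. On target the handler would end in an endless loop with a breakpoint set on it, as suggested earlier in the thread.

```c
#include <stddef.h>

/* Last failed assertion, kept where a debugger can read it.
   These names are made up for illustration. */
static const char *volatile assert_msg;
static const char *volatile assert_file;
static volatile int assert_line;

/* Hypothetical project-wide handler. On target, end it with
   `for (;;) {}` and put a breakpoint on that loop. */
void app_assert_failed(const char *msg, const char *file, int line)
{
    assert_msg  = msg;
    assert_file = file;
    assert_line = line;
}

/* Would replace the printf-based definition in lwip's arch/cc.h. */
#define LWIP_PLATFORM_ASSERT(x) app_assert_failed((x), __FILE__, __LINE__)
```

With this in place no printf runs inside the assert path, so a failing lwip assert cannot itself trigger heap activity.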

Quote

I added this to my existing HardFault-handler. In my production code the while(1) becomes replaced by a define, which is either an eternal loop for debugging or an immediate system restart request for production use.

Another option is to save the data in a set-aside section of ram, so you can still keep going (reset) but also have a little data to look at (check out any previous exception).

I have a section of ram set aside in the linker script to store debug data (so it's in a fixed location where address is always known)- in startup code an errorFunc is used to dump the stack registers to 'debugram', then the mcu is reset. The errorFunc address gets populated in all unused vectors (vectors in ram), so any unhandled exception results in a dump of some stack info plus a reset. When I get back to main, I can deal with the data if wanted. I only have a uart as debugger for this mcu (by choice), so when I get back to main I can dump out the data via uart. Not as good as stopping at the exception to take a look around, but it's better than nothing and can always be left 'enabled'. I have only used this a couple times, and it did help, but any exceptions I create are usually simple errors in unaligned access (which are also the type where you already have a good idea where to look- your most recent code).
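A rough sketch of the capture-then-reset idea, written so it also runs on a host. The record layout, FAULT_MAGIC, and the function names are assumptions for illustration and are not taken from the linked startup.cpp. On target the record would sit in a linker section that startup code does not clear, which is why a magic value marks it as valid.

```c
#include <stdint.h>
#include <string.h>

#define FAULT_MAGIC 0xFA17DA7Au   /* arbitrary marker, hypothetical */

/* On a Cortex-M the core stacks 8 words on exception entry:
   r0, r1, r2, r3, r12, lr, pc, xpsr. */
typedef struct {
    uint32_t magic;      /* FAULT_MAGIC when the record is valid */
    uint32_t frame[8];
} fault_record_t;

/* On target: place in a no-init RAM section at a fixed address. */
static fault_record_t fault_record;

/* Called from the fault handler with the stacked frame address;
   the handler would request a system reset afterwards. */
void fault_capture(const uint32_t *stacked)
{
    memcpy(fault_record.frame, stacked, sizeof fault_record.frame);
    fault_record.magic = FAULT_MAGIC;
}

/* Called from main after reset. Returns 1 and copies the record out
   if one was stored, then invalidates it so it is reported once. */
int fault_fetch(fault_record_t *out)
{
    if (fault_record.magic != FAULT_MAGIC) {
        return 0;
    }
    *out = fault_record;
    fault_record.magic = 0;
    return 1;
}
```

After a reset, main would call fault_fetch() and, if it returns 1, print frame[6] (the stacked pc) and frame[5] (the stacked lr) over the uart.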

https://github.com/cv007/NUCLEO32_G031K8_B/blob/main/startup.cpp

nctnico · « **Reply #30 on:** July 23, 2022, 07:43:21 pm »

Quote from: ataradov on July 22, 2022, 10:52:35 pm

Here is an example of an instrumented HF handler:

Another option is to get the stackpointer (for example using: asm("mov %0, sp \n\t" : "=r"(reg_sp) ); ) and print the values from the stackpointer and up using formatting routines that write data to a UART directly. When the hardfault handler is entered, the registers are pushed onto the stack as well. I typically have this method print the last 12 stack entries (32 bit words) and that allows me to get a good idea where the hardfault occurred (RAM and ROM addresses used).
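A small sketch of the formatting side of that approach, assuming nothing beyond plain C. It avoids printf (which matters given where this thread ends up) and writes into a caller-supplied buffer. The function names are hypothetical, and pushing the buffer out of a UART is left to the target code.

```c
#include <stdint.h>
#include <stddef.h>

/* Format one 32-bit word as 8 uppercase hex digits plus a terminator. */
static void word_to_hex(uint32_t w, char out[9])
{
    static const char digits[] = "0123456789ABCDEF";
    for (int i = 7; i >= 0; i--) {
        out[i] = digits[w & 0xFu];
        w >>= 4;
    }
    out[8] = '\0';
}

/* Write `count` words starting at `sp` into `out`, one per line.
   Stops early if the buffer is too small. Returns the number of
   characters written, not counting the terminator. */
size_t format_stack_words(const uint32_t *sp, size_t count,
                          char *out, size_t out_size)
{
    size_t pos = 0;
    for (size_t i = 0; i < count; i++) {
        if (out_size - pos < 10) {      /* 8 digits + '\n' + '\0' */
            break;
        }
        word_to_hex(sp[i], &out[pos]);
        pos += 8;
        out[pos++] = '\n';
    }
    if (out_size > 0) {
        out[pos] = '\0';
    }
    return pos;
}
```

In a fault handler, `sp` would be the value read with the inline asm above and `count` would be 12, matching the description.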

peter-h · « **Reply #31 on:** July 23, 2022, 07:58:53 pm »

Cube can set a watchpoint on a Write to an address but there seems to be no way to specify data=0x00000000. Unless the Condition box does that but I cannot find any documentation on it.

uer166 · « **Reply #32 on:** July 23, 2022, 08:07:26 pm »

Are you by any chance using printf() with float format with FreeRTOS? I've had a similar issue, printf("%f") calls malloc() which happens to be broken on ST's provided FreeRTOS.

peter-h · « **Reply #33 on:** July 23, 2022, 08:19:49 pm »

Yes, but see above. I tested this with a breakpoint on malloc(). Only known code hits it.

I am using newlib-nano (no idea of why this config was chosen; someone else set this project up 2-3 years ago and I started working on it 1-2 years ago)

There is stuff on google about newlib and such e.g. https://nadler.com/embedded/newlibAndFreeRTOS.html but I have found zero evidence of %f using malloc(). Unless it is calling a different malloc()

There is a FreeRTOS malloc() and then there is an LWIP malloc() (called mem_malloc). But why should a GNU C printf be calling those?

I do have one more idea up my sleeve:
https://community.st.com/s/question/0D50X0000BOtfhnSQB/how-to-make-ethernet-and-lwip-working-on-stm32
and the first bold link there (the need for __DMB in various places in the low level ETH DMA code). I am just not sure where to do this and the writer "Piranha" is not generally contactable. He does know absolutely everything though - if you can get him to tell you

I can find the ETH DMA "OWN" bit ops, but not sure about the rest. If that is the cause, it will never be found by debugging... This fix does need to be done.

Finally, the RTOS thread which causes this crash is a simple http server, which has a 1Hz auto refresh on one of the pages. With that page not up, the server is of course dormant, and no crashing happens. The issue could of course be elsewhere (e.g. this task is the only one calling the LWIP netconn API) but this task does have a printf %f in it. I can easily remove that and I will do that tomorrow.

uer166 · « **Reply #34 on:** July 23, 2022, 08:30:29 pm »

Quote from: peter-h on July 23, 2022, 08:19:49 pm

But why should a GNU C printf be calling those?

printf() calls malloc if the format includes a floating point variable. It doesn't normally on ints, strings, hex, etc.

A quick test might be to uncheck that "use float with printf" options and see if you get a different failure, if any.

peter-h · « **Reply #35 on:** July 23, 2022, 08:42:51 pm »

Quote

printf() calls malloc if the format includes a floating point variable

Is that a definite in GCC? I know for a fact that it is not mandatory because I have programmed in C nearly 30 years ago (Z180) and had the printf sources (Hitech Z180 compiler) and they didn't do any of that. One didn't have heaps in those days

My feeling, with due respect, is that this is another "internet myth".

Unless this printf library is somehow integrated with FreeRTOS and uses its malloc? That malloc is supposed to be mutexed. AFAIK nothing in my project, other than FR, uses that heap. That heap lives entirely within the FR memory block, and is used to allocate blocks to the various tasks. This is my graphical viewer for the 64k CCM block used for this:

Yellow=unused Green=task stack
$100 on freelancer.com to get that written

It picks up a 64k file which I generate.

What is the meaning of "use float with printf"? IIRC, if you uncheck these, you get smaller libs but you can't use %f.

SiliconWizard · « **Reply #36 on:** July 23, 2022, 09:20:16 pm »

This is not GCC per se, this is newlib.

You can have a look at the source code: https://sourceware.org/git/?p=newlib-cygwin.git;a=tree;f=newlib/libc/stdio;h=0f5e4dd0dc465029d0b6c0a5d03fc2cc70e8df87;hb=HEAD
(it'll take a while though as it's a cascade of function calls and conditional compilation.)

Unless FreeRTOS redefines printf() and the like, those functions will come from newlib, and *can* call malloc() in some situations. Not the only functions from the std lib that can do this either. IIRC, strtok(), for instance, will also call malloc().

While it's a common conception that newlib in 'nano' mode will call malloc() from printf() calls only if you enable float format, I can't guarantee you for sure this is the only case.
The official doc is here: https://sourceware.org/newlib/libc.html
but I haven't found much info on the above point there.

At the moment, I don't know where the newlib documentation about such details as the exact differences between nano and non-nano, what calls malloc(), and so on, is. I mean, the official doc. There are numerous blog articles, forum threads, and so on, about that, but I can't find the official doc, which is why I kinda had to "reverse-engineer" this reading the source code and looking at what functions are linked in the final object code. If anyone can point you(/us) to any such official doc, that'll probably be helpful. Otherwise, as you said, unless you dig yourself, you never really know if the info you find is true or if it's a myth.

peter-h · « **Reply #37 on:** July 23, 2022, 09:41:52 pm »

Hmmm, you guys are not wrong! I traced through this (got no sources for sprintf)

sprintf(adcString, "ADC1: %d +5V rail: %4.2fV ADC2: %d +3.3V rail: %4.2fV", adc1,adcv1,adc2,adcv2);

But the malloc didn't actually get called. It may be conditional on some config option. I will take another look tomorrow morning.

In the .map file

Code:

 .text._malloc_r
                0x0000000008042728      0x478 c:/st/stm32cubeide_1.10.1/stm32cubeide/plugins/com.st.stm32cube.ide.mcu.externaltools.gnu-tools-for-stm32.10.3-2021.10.win32_1.0.0.202111181127/tools/bin/../lib/gcc/arm-none-eabi/10.3.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard\libc.a(lib_a-mallocr.o)
                0x0000000008042728                _malloc_r
 .text.memcmp   0x0000000008042ba0       0x20 c:/st/stm32cubeide_1.10.1/stm32cubeide/plugins/com.st.stm32cube.ide.mcu.externaltools.gnu-tools-for-stm32.10.3-2021.10.win32_1.0.0.202111181127/tools/bin/../lib/gcc/arm-none-eabi/10.3.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard\libc.a(lib_a-memcmp.o)
                0x0000000008042ba0                memcmp
 .text.memcpy   0x0000000008042bc0       0x1c c:/st/stm32cubeide_1.10.1/stm32cubeide/plugins/com.st.stm32cube.ide.mcu.externaltools.gnu-tools-for-stm32.10.3-2021.10.win32_1.0.0.202111181127/tools/bin/../lib/gcc/arm-none-eabi/10.3.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard\libc.a(lib_a-memcpy-stub.o)
                0x0000000008042bc0                memcpy
 .text.memmove  0x0000000008042bdc       0x34 c:/st/stm32cubeide_1.10.1/stm32cubeide/plugins/com.st.stm32cube.ide.mcu.externaltools.gnu-tools-for-stm32.10.3-2021.10.win32_1.0.0.202111181127/tools/bin/../lib/gcc/arm-none-eabi/10.3.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard\libc.a(lib_a-memmove.o)
                0x0000000008042bdc                memmove
 .text.memset   0x0000000008042c10       0x10 c:/st/stm32cubeide_1.10.1/stm32cubeide/plugins/com.st.stm32cube.ide.mcu.externaltools.gnu-tools-for-stm32.10.3-2021.10.win32_1.0.0.202111181127/tools/bin/../lib/gcc/arm-none-eabi/10.3.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard\libc.a(lib_a-memset.o)
                0x0000000008042c10                memset
 .text.__malloc_lock
                0x0000000008042c20        0xc c:/st/stm32cubeide_1.10.1/stm32cubeide/plugins/com.st.stm32cube.ide.mcu.externaltools.gnu-tools-for-stm32.10.3-2021.10.win32_1.0.0.202111181127/tools/bin/../lib/gcc/arm-none-eabi/10.3.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard\libc.a(lib_a-mlock.o)
                0x0000000008042c20                __malloc_lock
 .text.__malloc_unlock
                0x0000000008042c2c        0xc c:/st/stm32cubeide_1.10.1/stm32cubeide/plugins/com.st.stm32cube.ide.mcu.externaltools.gnu-tools-for-stm32.10.3-2021.10.win32_1.0.0.202111181127/tools/bin/../lib/gcc/arm-none-eabi/10.3.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard\libc.a(lib_a-mlock.o)
                0x0000000008042c2c                __malloc_unlock
 .text.printf   0x0000000008042c38       0x24 c:/st/stm32cubeide_1.10.1/stm32cubeide/plugins/com.st.stm32cube.ide.mcu.externaltools.gnu-tools-for-stm32.10.3-2021.10.win32_1.0.0.202111181127/tools/bin/../lib/gcc/arm-none-eabi/10.3.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard\libc.a(lib_a-printf.o)
                0x0000000008042c38                printf
 .text.putchar  0x0000000008042c5c       0x10 c:/st/stm32cubeide_1.10.1/stm32cubeide/plugins/com.st.stm32cube.ide.mcu.externaltools.gnu-tools-for-stm32.10.3-2021.10.win32_1.0.0.202111181127/tools/bin/../lib/gcc/arm-none-eabi/10.3.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard\libc.a(lib_a-putchar.o)
                0x0000000008042c5c                putchar
 .text.rand     0x0000000008042c6c       0x38 c:/st/stm32cubeide_1.10.1/stm32cubeide/plugins/com.st.stm32cube.ide.mcu.externaltools.gnu-tools-for-stm32.10.3-2021.10.win32_1.0.0.202111181127/tools/bin/../lib/gcc/arm-none-eabi/10.3.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard\libc.a(lib_a-rand.o)
                0x0000000008042c6c                rand
 .text._sbrk_r  0x0000000008042ca4       0x20 c:/st/stm32cubeide_1.10.1/stm32cubeide/plugins/com.st.stm32cube.ide.mcu.externaltools.gnu-tools-for-stm32.10.3-2021.10.win32_1.0.0.202111181127/tools/bin/../lib/gcc/arm-none-eabi/10.3.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard\libc.a(lib_a-sbrkr.o)

and the disassembly is

Code:

          _malloc_r:
08042728:   add.w   r3, r1, #11
0804272c:   cmp     r3, #22
0804272e:   stmdb   sp!, {r0, r1, r2, r4, r5, r6, r7, r8, r9, r10, r11, lr}
08042732:   mov     r5, r0
08042734:   bls.n   0x8042744 <_malloc_r+28>
08042736:   bics.w  r7, r3, #7
0804273a:   bpl.n   0x8042746 <_malloc_r+30>
0804273c:   movs    r3, #12
0804273e:   str     r3, [r5, #0]
08042740:   movs    r4, #0
08042742:   b.n     0x8042a90 <_malloc_r+872>
08042744:   movs    r7, #16
08042746:   cmp     r1, r7
08042748:   bhi.n   0x804273c <_malloc_r+20>
0804274a:   mov     r0, r5
0804274c:   bl      0x8042c20 <__malloc_lock>
08042750:   cmp.w   r7, #504        ; 0x1f8
08042754:   ldr     r6, [pc, #704]  ; (0x8042a18 <_malloc_r+752>)
08042756:   bcs.n   0x80427c8 <_malloc_r+160>
08042758:   add.w   r2, r7, #8
0804275c:   add     r2, r6
0804275e:   sub.w   r1, r2, #8
08042762:   ldr     r4, [r2, #4]
08042764:   cmp     r4, r1
08042766:   mov.w   r3, r7, lsr #3
0804276a:   bne.n   0x8042772 <_malloc_r+74>
0804276c:   ldr     r4, [r2, #12]
0804276e:   cmp     r2, r4
08042770:   beq.n   0x8042794 <_malloc_r+108>
08042772:   ldr     r3, [r4, #4]
08042774:   ldrd    r1, r2, [r4, #8]
08042778:   bic.w   r3, r3, #3
0804277c:   str     r2, [r1, #12]
0804277e:   add     r3, r4
08042780:   str     r1, [r2, #8]
08042782:   ldr     r2, [r3, #4]
08042784:   orr.w   r2, r2, #1
08042788:   str     r2, [r3, #4]
0804278a:   mov     r0, r5
0804278c:   bl      0x8042c2c <__malloc_unlock>
08042790:   adds    r4, #8
08042792:   b.n     0x8042a90 <_malloc_r+872>
08042794:   adds    r3, #2
08042796:   ldr     r4, [r6, #16]
08042798:   ldr     r1, [pc, #640]  ; (0x8042a1c <_malloc_r+756>)
0804279a:   cmp     r4, r1
0804279c:   beq.n   0x804288e <_malloc_r+358>
0804279e:   ldr     r2, [r4, #4]
080427a0:   bic.w   r12, r2, #3
080427a4:   sub.w   r0, r12, r7
080427a8:   cmp     r0, #15
080427aa:   ble.n   0x804283e <_malloc_r+278>
080427ac:   adds    r2, r4, r7
080427ae:   orr.w   r3, r0, #1
080427b2:   orr.w   r7, r7, #1
080427b6:   str     r7, [r4, #4]
080427b8:   strd    r2, r2, [r6, #16]
080427bc:   strd    r1, r1, [r2, #8]
080427c0:   str     r3, [r2, #4]
080427c2:   str.w   r0, [r4, r12]
080427c6:   b.n     0x804278a <_malloc_r+98>
080427c8:   lsrs    r3, r7, #9
080427ca:   beq.n   0x8042822 <_malloc_r+250>
080427cc:   cmp     r3, #4
080427ce:   bhi.n   0x80427f6 <_malloc_r+206>
080427d0:   lsrs    r3, r7, #6
080427d2:   adds    r3, #56 ; 0x38
080427d4:   adds    r2, r3, #1
080427d6:   add.w   r2, r6, r2, lsl #3
080427da:   sub.w   r12, r2, #8
080427de:   ldr     r4, [r2, #4]
080427e0:   cmp     r4, r12
080427e2:   beq.n   0x80427f2 <_malloc_r+202>
080427e4:   ldr     r2, [r4, #4]
080427e6:   bic.w   r2, r2, #3
080427ea:   subs    r0, r2, r7
080427ec:   cmp     r0, #15
080427ee:   ble.n   0x804282a <_malloc_r+258>
080427f0:   subs    r3, #1
080427f2:   adds    r3, #1
080427f4:   b.n     0x8042796 <_malloc_r+110>
080427f6:   cmp     r3, #20
080427f8:   bhi.n   0x80427fe <_malloc_r+214>
080427fa:   adds    r3, #91 ; 0x5b
080427fc:   b.n     0x80427d4 <_malloc_r+172>
080427fe:   cmp     r3, #84 ; 0x54
08042800:   bhi.n   0x8042808 <_malloc_r+224>
08042802:   lsrs    r3, r7, #12
08042804:   adds    r3, #110        ; 0x6e
08042806:   b.n     0x80427d4 <_malloc_r+172>
08042808:   cmp.w   r3, #340        ; 0x154
0804280c:   bhi.n   0x8042814 <_malloc_r+236>
0804280e:   lsrs    r3, r7, #15
08042810:   adds    r3, #119        ; 0x77
08042812:   b.n     0x80427d4 <_malloc_r+172>
08042814:   movw    r2, #1364       ; 0x554
08042818:   cmp     r3, r2
0804281a:   bhi.n   0x8042826 <_malloc_r+254>
0804281c:   lsrs    r3, r7, #18
0804281e:   adds    r3, #124        ; 0x7c
08042820:   b.n     0x80427d4 <_malloc_r+172>
08042822:   movs    r3, #63 ; 0x3f
08042824:   b.n     0x80427d4 <_malloc_r+172>
08042826:   movs    r3, #126        ; 0x7e
08042828:   b.n     0x80427d4 <_malloc_r+172>
0804282a:   cmp     r0, #0
0804282c:   ldr     r1, [r4, #12]
0804282e:   blt.n   0x804283a <_malloc_r+274>
08042830:   ldr     r3, [r4, #8]
08042832:   str     r1, [r3, #12]
08042834:   str     r3, [r1, #8]
08042836:   adds    r3, r4, r2
08042838:   b.n     0x8042782 <_malloc_r+90>
0804283a:   mov     r4, r1
0804283c:   b.n     0x80427e0 <_malloc_r+184>
0804283e:   cmp     r0, #0
08042840:   strd    r1, r1, [r6, #16]
08042844:   blt.n   0x8042856 <_malloc_r+302>
08042846:   add     r12, r4
08042848:   ldr.w   r3, [r12, #4]
0804284c:   orr.w   r3, r3, #1
08042850:   str.w   r3, [r12, #4]
08042854:   b.n     0x804278a <_malloc_r+98>
08042856:   cmp.w   r12, #512       ; 0x200
0804285a:   ldr     r0, [r6, #4]
0804285c:   bcs.w   0x804298c <_malloc_r+612>
08042860:   mov.w   r2, r12, lsr #3
08042864:   mov.w   lr, r12, lsr #5
08042868:   mov.w   r12, #1
0804286c:   adds    r2, #1

This is some sort of mutex protected malloc.

God knows where this one came from.

This is the malloc_lock

Code:

          __malloc_lock:
08042c20:   ldr     r0, [pc, #4]    ; (0x8042c28 <__malloc_lock+8>)
08042c22:   b.w     0x8048358 <__retarget_lock_acquire_recursive>
08042c26:   nop     
08042c28:   stmia   r2!, {r0, r3, r7}
08042c2a:   movs    r0, #0
          __malloc_unlock:
08042c2c:   ldr     r0, [pc, #4]    ; (0x8042c34 <__malloc_unlock+8>)
08042c2e:   b.w     0x804835a <__retarget_lock_release_recursive>
08042c32:   nop     
08042c34:   stmia   r2!, {r0, r3, r7}
08042c36:   movs    r0, #0

Does anyone recognise this stuff??

uer166 · « **Reply #38 on:** July 23, 2022, 11:13:23 pm »

Quote from: peter-h on July 23, 2022, 08:42:51 pm

My feeling, with due respect, is that this is another "internet myth".

That may be your feeling, but like I said, I chased down this specific issue before on STM32G474, and we eventually gave up and just re-implemented a printFloat() on our own. Not sure if it's been posted before or not: https://community.st.com/s/question/0D50X0000BB1eL7SQJ/bug-cubemx-freertos-projects-corrupt-memory.

Of course, this issue may have nothing to do with this, but I can guarantee you that using printf(%f) and FreeRTOS together in ST's ecosystem does not work.

cv007 · « **Reply #39 on:** July 24, 2022, 12:15:01 am »

Quote

But the malloc didn't actually get called. It may be conditional on some config option.

This seems to be what you are showing-
https://sourceware.org/git?p=newlib-cygwin.git;a=blob;f=newlib/libc/stdio/vfprintf.c;h=6a198e2c657e8cf44b720c8bec76b1121921a42d;hb=HEAD#l866

Ending up in _svfprintf_r via a starting point of sprintf, the __SMBF FILE struct flag is not set so malloc is then skipped.

This suggestion may be worthless- but it seems lwip is your latest code addition, and you have lwip assert enabled as seen by the assert strings in memory, so maybe disable the lwip assert to see what changes. Maybe an assert is taking place, and the resulting printf originating within lwip code is causing some problem. Not necessarily a great way to go about it, but a few pokes of the patient to see how they react is sometimes useful. Alternatively- make a lwip assert fail to see if the lwip assert printf actually works correctly.

peter-h · « **Reply #40 on:** July 24, 2022, 08:00:36 am »

Got out of bed and ... it has crashed again. Ran for 5hrs. So it isn't simply that printf. Same trace data as before.

I put a breakpoint on the _malloc_r and found it gets called right away, in this

Code:

ck_1 = HAL_RCC_GetPCLK1Freq();
ck_2 = HAL_RCC_GetPCLK2Freq();
printf("PCLK1=%ld  PCLK2=%ld\n",ck_1,ck_2);

which is nothing to do with %f. That btw uses a printf which has had its out redirected to the SWD console so you can do debugs to the debugger window at high speed. It does sort of work... but I will remove it now

There is a mutex passed through, fwiw.

So this printf uses malloc for just about everything! But whose heap?

Anyway, not sure what I can do about that. Obviously it is not thread safe. I probably need to mutex printf and sprintf, but then mutexes are not accessible until osKernelInitialize().

I did a fixed _sbrk for the general heap malloc but if this _malloc_r is getting a wrong value, that will be broken. Where is it placing the heap? The source - thanks cv007 - says

Supporting OS subroutines required: <<close>>, <<fstat>>, <<isatty>>,
<<lseek>>, <<read>>, <<sbrk>>, <<write>>.

I fixed sbrk (and I see _sbrk is at the same address) a long time ago. But that source does not reference sbrk so where does this printf place its heap? The initial call to _malloc_r has this register content

Code:

General Registers (General Purpose and FPU Register Group), all values hex
	r0	0x20000598
	r1	0x400
	r2	0x1
	r3	0x400
	r4	0x200008ec
	r5	0x800
	r6	0x20000598
	r7	0x2001fd90
	r8	0x80086cc
	r9	0x200008ec
	r10	0x20000598
	r11	0x0
	r12	0x0
	sp	0x2001fb38
	lr	0x8048407
	pc	0x8042728
	xpsr	0x1000000
	d0	0x0
	d1	0x0
	d2	0x0
	d3	0x0
	d4	0x0

and while I don't know which registers are used for the parameters, I would bet it is allocating 0x400 (R1) or 0x800 (R5) which is 1k or 2k!

It calls _sbrk_r

and the source for that is

which calls _sbrk. Funny how arm implements a RET... they pop the PC

The funny thing is that this heap seems to get discarded when printf returns. Or does it? Does the first call to "printf" grab a block? It would need to use some global variables which get preserved. Normally you don't need to call sbrk to do a malloc, but you do need to call it when creating a new heap, on an existing heap.

Then it gets more complicated. I am tracing calls to sbrk, to see where this "printf heap" is going, and I find that calls to the normal malloc end up calling _malloc_r !!!! So we come full circle

and I see no calls to _free_r (well not until TLS is running and I know about that one; it gets a 48k block which is freed when the https session ends, about once a minute).

Clearly the printf family is using the general heap for every call. And it never calls free(). That means it must be storing a pointer to its block somewhere. But it still does a malloc call on every subsequent use.

A key question is whether this heap is really mutex protected. It looks like it is. Does anyone recognise this

A google on __retarget_lock_acquire_recursive shows that this is indeed an empty function, and this
https://gist.github.com/thomask77/65591d78070ace68885d3fd05cdebe3a
describes the right code for that - using one of the FreeRTOS mutexes.

So this newlib printf has not been correctly implemented on this system.

This
https://www.freertos.org/FreeRTOS_Support_Forum_Archive/May_2017/freertos_printf_and_heap_4_f2b0ee0cj.html
suggests there is a solution, with FreeRTOS having a proper printf in printf-stdarg.c. This is referenced in comments in the FR files but it says it is an incomplete version (no float output for example).

I think I need a new printf which does not use the heap except for %f.

I still don't know if this is the cause of the crashing but it is a good candidate.

I unchecked the newlib-nano checkboxes and it still calls malloc for the integer output, and it calls those empty mutexes...

But I found something

The 1st printf calls malloc but the 2nd one doesn't. Same for calling __retarget_lock_init_recursive. So maybe the heap (and its attempted protection) is used for %f and for longs only.

How can one make those empty mutex functions call proper ones, given that a) I don't have the C source and b) they are presumably not defined as weak?

emece67 · « **Reply #41 on:** July 24, 2022, 09:42:03 am »

emece67 · « **Reply #42 on:** July 24, 2022, 09:47:53 am »

peter-h · « **Reply #43 on:** July 24, 2022, 11:08:52 am »

It is more complicated.

Both the default ST Cube printf and the newlib-nano alternative do this:

1) Uses the heap for floats and longs
2) Uses malloc in the initial call to a printf, and uses it again for the initial call to a sprintf, and does not use it after that
3) Calls _sbrk only once (at the very first printf, but not at the very first sprintf)

Now why would a printf call _sbrk? It is not needed for just using the heap. For that you call malloc(). _sbrk is used only by malloc itself to check if the requested block will fit. Possibly the newlib stuff gets a block via malloc and then runs its own heap within that, but I don't really get it.
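For context, a minimal sketch of what a bump-pointer sbrk typically looks like, assuming a fixed heap region. The name demo_sbrk, the HEAP_SIZE value, and the static array standing in for the linker-defined heap are all hypothetical. In newlib the allocator asks sbrk for more memory when its own free lists cannot satisfy a request, which would be consistent with _sbrk being hit only on the very first allocation.

```c
#include <stddef.h>
#include <stdint.h>
#include <errno.h>

/* Stand-in for the linker-provided heap region (size is made up). */
#define HEAP_SIZE 1024u
static uint8_t heap_area[HEAP_SIZE];
static size_t  heap_used;

/* Returns the previous break on success, or (void *)-1 with
   errno = ENOMEM when the request would leave the heap region. */
void *demo_sbrk(ptrdiff_t incr)
{
    if (incr < 0) {
        if ((size_t)(-incr) > heap_used) {
            errno = ENOMEM;
            return (void *)-1;
        }
    } else if ((size_t)incr > HEAP_SIZE - heap_used) {
        errno = ENOMEM;
        return (void *)-1;
    }
    void *prev = &heap_area[heap_used];
    heap_used = (size_t)((ptrdiff_t)heap_used + incr);
    return prev;
}
```

A breakpoint inside the real _sbrk, with the increment and returned address noted, is one way to see where the library heap actually lives.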

Bad coding really; no need for the heap for these little blocks of RAM. They could use the stack and it would be re-entrant, etc.

Does anyone know what is supposed to be the difference in ST Cube between newlib-nano and the default? Nano is supposed to be smaller, but there is stuff like this
https://stackoverflow.com/questions/32948032/newlib-nano-long-long-support
stating that it does not support long-long, which is definitely incorrect.

I am now implementing this lot
https://gist.github.com/thomask77/3a2d54a482c294beec5d87730e163bdd
which is implementing the mutexes used by newlib (basically the printf family AFAICT).

I am getting lots of linker errors
multiple definition of `__retarget_lock_try_acquire'; ./src/heap_locking.o:
and I don't know how to fix this since the original symbols are defined in code to which there are no sources, the functions are not "weak", and the symbols are already in the symbol table. Is there some sort of override which one can put in the .c file where the new functions are, and which will get fed through to the linker? Setting up the linker options in Cube is complicated. I have asked the Q in the Programming forum.

Mutex protection on the heap should take care of the newlib heap usage issue. Well, printf itself may still not be thread-safe in which case I have a bigger problem, because that is needed.

But malloc and free themselves are still unprotected AFAICT, if called directly.

Whether the above stuff is causing the crashes, I don't know, but it needs to be fixed.

Thanks to everyone for your continued help.

peter-h · « **Reply #44 on:** July 24, 2022, 04:12:11 pm »

I was puzzled why both newlib-nano and standard produced the same behaviour. Later I found the code size doesn't change! Turns out those checkboxes in Cube are BS.

You have to also select one of these, and it was always "Standard C". If you do that, the nano option does nothing. Obvious, I suppose, to those who know.

Anyway, this means all the stuff around the net about Newlib-nano being a load of crap doesn't need to worry me. I am dealing with the standard C lib, and that is what uses malloc for floats (promoted to double, per standard C printf), longs, and double longs. That also explains why %llu etc has been working; it isn't supposed to be working on newlib-nano.

Still have to do all those mutexes...

cv007 · « **Reply #45 on:** July 24, 2022, 05:38:29 pm »

Quote

Bad coding really; no need for the heap for these little blocks of RAM. They could use the stack and it would be re-entrant, etc.

As shown in a few of your disassembly screenshots, they also use a big chunk of stack space (316 bytes) in addition to whatever else they are doing. Not sure what freertos has in place for stack checking, but probably would be nice to know how all the stack spaces are doing.

peter-h · « **Reply #46 on:** July 24, 2022, 06:00:51 pm »

I am happy with RTOS stacks; I have that graphical monitoring tool for that and have almost 64k of RTOS stacks to play with for the whole project.

Funny tracing through that printf code. First it calls __retarget_lock_acquire_recursive, then it calls _malloc_r, and that calls __retarget_lock_acquire_recursive.

Normally this would not work but I guess a "recursive" mutex can be nested like that, and still works at each depth. I would have used one mutex for the printf and another one for the heap, but maybe printf calls __retarget_lock_acquire_recursive several times without closing each one...
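A host-side model of why the nesting works, assuming the usual recursive-lock semantics (same owner may re-acquire, the lock is free again when the count returns to zero). The type and function names are hypothetical and there is no blocking here; on target the same behaviour is what a FreeRTOS recursive mutex provides.

```c
/* Model of a recursive lock. `owner` is a made-up task id. */
typedef struct {
    int      owner;   /* -1 when the lock is free */
    unsigned depth;   /* nesting count held by the owner */
} rec_lock_t;

#define REC_LOCK_INIT { -1, 0u }

/* Returns 1 on success, 0 if a different task holds the lock
   (a real implementation would block instead of returning 0). */
int rec_lock_acquire(rec_lock_t *l, int task)
{
    if (l->depth != 0u && l->owner != task) {
        return 0;
    }
    l->owner = task;
    l->depth++;
    return 1;
}

/* Returns 1 on success, 0 if `task` does not hold the lock. */
int rec_lock_release(rec_lock_t *l, int task)
{
    if (l->depth == 0u || l->owner != task) {
        return 0;
    }
    if (--l->depth == 0u) {
        l->owner = -1;
    }
    return 1;
}
```

In this model, printf taking the lock and then _malloc_r taking it again from the same task simply raises the depth to 2, while a second task stays locked out until both releases have happened.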

That function __retarget_lock_acquire_recursive is empty as described and that is what I am trying to fix with my stuff above. Need to override the existing symbols...

On past record, it would not surprise me if there was some build option which, when #defined, magically joins this all up with the FreeRTOS mutexes (like my posts above), but I can't see it. For the newlib-nano stuff there is configUSE_NEWLIB_REENTRANT but that creates other problems and it looked like a huge rabbit hole and anyway I do need uint32_t and 64_t printf and scanf support (%f one can avoid, at a push).

jc101 · « **Reply #47 on:** July 24, 2022, 09:06:20 pm »

I had the printf problem with FreeRTOS but with a PIC32. I would get a crash when using printf to format a float, my fix was to use a define to map malloc to the FreeRTOS pvPortMalloc etc. which are thread safe.
This made all, albeit limited, malloc calls use the FreeRTOS heap, solved the problem for me. I was using heap 4 for my project, I think heap 3 does this for you but you don't get the benefit of heap 4. Also had to ensure the task calling printf had enough memory on its stack or the task would fault if the task swapped when in the printf.

Code:

#define malloc(size) pvPortMalloc(size)
#define free(ptr) vPortFree(ptr)

peter-h · « **Reply #48 on:** July 24, 2022, 09:21:46 pm »

You probably had sources to your pic32 printf. Then you could put those #defines in there.

I could also fix the malloc and free (to which I also don't have sources) by wrapping them in mutexes (just one mutex actually because you cannot have malloc and free occurring concurrently) and calling them say m_malloc and m_free, and then have a file heap.h with similar #defines in it. Or I could replace the malloc code altogether; there is a ton of heap code out there.
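A sketch of that wrapper idea, using the m_malloc and m_free names from the paragraph above. The heap_lock and heap_unlock hooks are hypothetical: here they only count calls so the code runs on a host, whereas on target they would take and give one RTOS mutex (which cannot be used before the kernel is initialised).

```c
#include <stdlib.h>
#include <stddef.h>

/* Hypothetical lock hooks, instrumented for host testing. */
static int      heap_lock_depth;   /* 0 when nobody is inside */
static unsigned heap_lock_count;   /* total number of lock calls */

static void heap_lock(void)   { heap_lock_depth++; heap_lock_count++; }
static void heap_unlock(void) { heap_lock_depth--; }

/* One lock guards both operations, so an allocation and a free
   can never run concurrently. */
void *m_malloc(size_t size)
{
    heap_lock();
    void *p = malloc(size);
    heap_unlock();
    return p;
}

void m_free(void *ptr)
{
    heap_lock();
    free(ptr);
    heap_unlock();
}
```

Note the limit of this approach: it only covers callers that use the wrappers. Allocations made inside the library, such as the _malloc_r call from printf, bypass it and still depend on the library's own lock functions doing something useful.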

Looking at how my printf calls the same recursive mutex as my heap does, they must have come from the same place.

Somebody who knows their way around this development environment would have had this done in less time than it takes me to write this

jc101 · « **Reply #49 on:** July 24, 2022, 09:28:08 pm »

That is probably the case. Have you searched the FreeRTOS forums? They have all sorts buried in there for all kinds of IDEs.

Some of them are archived but still searchable via the FreeRTOS website.

