Well, after asking everywhere, putting this out on freelancer.com, etc, I did more digging around and found the main issue. In ethernetif.c
I noticed those mutex macros (which are not suggested anywhere including in the ST UM1713 implementation notes) were rather similar to the ones protecting the LWIP socket etc API
if LWIP_TCPIP_CORE_LOCKING=1... You can't do that; you must not have overlapping mutex regions. Especially obvious protecting low level ETC calls with the same mutex as is used for the high level TCP API! So I used different mutexes. That fixed it.
Actually I wonder if mutexing these low level ETH calls is even needed. The code runs without it, but that can easily be by luck. Nobody else is calling those functions. But they can be chopped by an RTOS task switch. Disabling interrupts ("CRITICAL" region) would have been another way, but with a significant interrupt latency. This code is polled anyway, not interrupt driven.
I also uses one mutex for tx and one mutex for rx. It is an interesting Q whether the 32F417 ETH can transmit and receive at the same time, given that there is only one RJ45, but I think ETH can be bidirectional in general.
Hopefully this helps somebody one day.
Then I retested MEMP_MEM_MALLOC=1 which was the other thing that broke things. Now, the statically allocated buffer MEM_SIZE (inside which LWIP runs a private heap) gets stuffed with more stuff, notably the PBUF_POOL_SIZE * PBUF_POOL_BUFSIZE packet buffers, so MEM_SIZE has to go up from 6k to 16k in my case, but the static RAM usage remains the same because previously LWIP was statically allocating a load of stuff. That now runs also but not reliably so I won't bother with it. It cannot magic up RAM from nowhere.
If MEMP_MEM_MALLOC=0, I have a very reliable system. MEM_SIZE is statically allocated and, by experimentation, holds the PBUF_POOL_BUFSIZE buffers and (if MEMP_MEM_MALLOC=0) some other odd small bits which seem to be just a few bytes in size and appear mysteriously in different places.