Framebuffer transfers for LCDs?
But you don't need cycle-perfect performance for this, so a couple of minor interruptions to reload the counters and pointers are not a huge deal and do not affect overall performance.
It would, however, make writing and maintaining code easier. I prefer helpful hardware that offloads the CPU over hardware that needs constant babysitting by the CPU, even if said babysitting is not a big performance issue. It still needs to be implemented, tested and verified, and managing peripherals notoriously introduces weird, rare corner cases that are impossible to properly unit test or verify on a PC, because no simulation models of the hardware exist.
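For a sense of what that babysitting involves, here is a minimal sketch of the bookkeeping side of splitting a large buffer into chunks that fit a 16-bit counter (the classic STM32 DMA counter width) and handing out the next chunk from the transfer-complete interrupt. The `chunked_xfer` struct and `xfer_next_chunk()` are hypothetical names, not a vendor API; the actual register writes are left to the caller because they differ between DMA controllers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the channel's item counter is 16 bits wide. */
#define DMA_MAX_ITEMS 65535u

/* Hypothetical bookkeeping for one large transfer. */
typedef struct {
    uintptr_t next_addr;  /* memory address of the next chunk */
    uint32_t items_left;  /* data items still to transfer */
    uint32_t item_size;   /* bytes per data item: 1, 2 or 4 */
} chunked_xfer;

/* Computes the next chunk to program into the DMA channel.
 * Call once to start the transfer, then again from the transfer-complete
 * interrupt. Returns false once the whole buffer has been handed out. */
static bool xfer_next_chunk(chunked_xfer *x, uintptr_t *addr, uint32_t *items)
{
    assert(x->item_size == 1 || x->item_size == 2 || x->item_size == 4);
    if (x->items_left == 0)
        return false;
    uint32_t n = x->items_left > DMA_MAX_ITEMS ? DMA_MAX_ITEMS : x->items_left;
    *addr = x->next_addr;
    *items = n;
    /* Advance past the chunk just handed out. */
    x->next_addr += (uintptr_t)n * x->item_size;
    x->items_left -= n;
    return true;
}
```

In the interrupt handler you would write `addr` and `items` into the channel's address and count registers and re-enable it. The tricky parts live in what this sketch leaves out: the window between chunks, during which a fast SPI or I2S stream keeps producing data and can overrun, plus error and abort handling.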
You don't have to go to the most expensive flagship models to get more than 64K of RAM. UART, SPI, I2S or DCMI transfers can easily be in the range of a few hundred K. The 16-bit counter counts data items rather than bytes, so wider transfers stretch the limit, but you can't always use word-size transfers. Actually, I would think a 20-bit counter would cover almost all use cases, and 24 bits for sure.
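The counter-width arithmetic behind that estimate can be sketched as follows, assuming the counter counts data items of a fixed size. The helper names are hypothetical and only illustrate the numbers.

```c
#include <assert.h>
#include <stdint.h>

/* Largest single transfer, in bytes, for a counter of `counter_bits` bits
 * that counts data items of `item_size` bytes each. */
static uint64_t max_transfer_bytes(unsigned counter_bits, unsigned item_size)
{
    assert(counter_bits >= 1 && counter_bits <= 32);
    return (((uint64_t)1 << counter_bits) - 1) * item_size;
}

/* Number of times the channel must be (re)programmed to move `bytes` bytes.
 * Assumes `bytes` is a multiple of `item_size`. */
static uint64_t chunks_needed(uint64_t bytes, unsigned counter_bits, unsigned item_size)
{
    uint64_t max = max_transfer_bytes(counter_bits, item_size);
    return (bytes + max - 1) / max; /* ceiling division */
}
```

With byte-wide items, a 16-bit counter tops out at 65,535 bytes, so a 300,000-byte UART or SPI buffer needs five programmings. A 20-bit counter reaches 1,048,575 bytes and does it in one, and a 24-bit counter reaches 16,777,215 bytes.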
The mid-range STM32 DMA mapping system is just so ****ed up. They have a gazillion streams available (typically 2 × 8 = 16), and you never use that many simultaneously in a real project. But only a few peripherals can be mapped to each stream, so before committing to a PCB layout, you need to verify whether even just two peripherals can use DMA simultaneously. They waste a lot of silicon on needlessly many counters, comparators and control logic, yet they didn't want to create a simple full request matrix and built sixteen half-assed matrices instead.
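That pre-layout check is a bipartite matching problem: each DMA request needs a stream of its own, and each stream only accepts certain requests. A minimal sketch using a simple augmenting-path matcher follows; the bitmask encoding and function names are hypothetical, and the actual masks have to be copied from the reference manual of the specific part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_STREAMS 16

/* allowed[p]: bitmask of the streams request p can be routed to.
 * owner[s]: request currently assigned to stream s, or -1 if free. */
static bool try_assign(int p, const uint16_t *allowed, int n_streams,
                       int *owner, bool *visited)
{
    for (int s = 0; s < n_streams; s++) {
        if (!(allowed[p] & (1u << s)) || visited[s])
            continue;
        visited[s] = true;
        /* Take a free stream, or move its current owner to another one. */
        if (owner[s] < 0 || try_assign(owner[s], allowed, n_streams, owner, visited)) {
            owner[s] = p;
            return true;
        }
    }
    return false;
}

/* Returns true if all n_req requests can be served at the same time. */
static bool mapping_feasible(const uint16_t *allowed, int n_req, int n_streams)
{
    assert(n_streams > 0 && n_streams <= MAX_STREAMS);
    int owner[MAX_STREAMS];
    for (int s = 0; s < n_streams; s++)
        owner[s] = -1;
    for (int p = 0; p < n_req; p++) {
        bool visited[MAX_STREAMS] = { false };
        if (!try_assign(p, allowed, n_streams, owner, visited))
            return false;
    }
    return true;
}
```

With a full request matrix, every mask is all ones and the check degenerates to comparing the number of requests with the number of streams.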
The H7 series was the first to finally add the DMAMUX. They could have halved the number of channels at the same time if they wanted to save some silicon. Instead, they added even more DMA controllers to the thing, but that's OK because the chip is expensive anyway.