OK, so here is what I did. I used the board from my signature with the Ethernet add-on board, and Vivado/Vitis 2020.2.
1. Created a block design with a DDR controller (DDR2 in my case), MicroBlaze, a Timer, a UART, EthernetLite (for 100 Mbit Ethernet) and Quad SPI (with "Performance Mode" enabled and the SPI mode set to Quad).
2. After generating the bitstream and exporting the hardware, I went into Vitis and created a platform project based on the exported XSA file.
3. Created a bare-metal echo application using the lwIP echo template. It is set to run from DDR (0x8000_0000).
4. Using the "Xilinx -> Program Flash" menu item, converted the executable to SREC and programmed it into flash at offset 0x40_0000. My flash is 128 Mbit (0x100_0000 bytes) and the full bitstream is 0x21_72F7 bytes, so the image at 0x40_0000 sits well past the end of the bitstream.
5. Created another bare-metal application using the "SREC SPI Bootloader" template. It is automatically set to run from BRAM.
6. Opened the blconfig.h file inside the generated project and changed FLASH_IMAGE_BASEADDR to 0x00400000, to match the flash offset used in step 4.
7. Launched the bootloader under the debugger and watched what it does.
I noticed that the bootloader writes some data into BRAM at offset 0 before moving on to DDR memory. It also jumps to address 0, but the data written there apparently redirects execution so that the executable starts from DDR at 0x8000_0000.
It worked with no problems! I haven't tried embedding the bootloader into the bitstream yet, but that is a trivial step and I'm sure it will work; just make sure you associate the bootloader's ELF file with the BRAM before generating the final bitstream. I also had to use 32 KB of local BRAM, because the debug build of the bootloader was over 16 KB.