Don't know about the ESP32 specifically, but as with any digital bus, I2C transfer time is the number of bits divided by the clock rate. The I2C standard, AFAIK, supports 100 kHz (standard), 400 kHz (fast), 1 MHz (fast-mode plus) and 3.4 MHz (high-speed) clocks. Check which clock your ESP32 driver is using. For myself, I usually measure Arduino timing with a spare digital pin and a scope, setting the pin high at the start and low at the end of the time-critical function.
millis() does the same thing in software, but only with 1 ms resolution, so micros() is the better choice for short functions.
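To put a number on the I2C estimate before measuring anything, here is a rough sketch in plain C++. The function names are made up for illustration, and the 2 extra clock periods for START and STOP are my assumption. It also ignores clock stretching and driver overhead, so treat the result as a lower bound.

```cpp
#include <cassert>
#include <cstdint>

// Approximate number of SCL clock periods for one I2C transaction:
// one address byte plus the payload bytes, 9 clocks each (8 bits + ACK),
// plus 2 periods as a rough allowance for START and STOP (assumption).
inline std::uint32_t i2c_clock_periods(std::uint32_t payload_bytes) {
    return 9u * (1u + payload_bytes) + 2u;
}

// Transfer time in microseconds = clock periods / clock rate.
inline double i2c_transfer_us(std::uint32_t payload_bytes, double scl_hz) {
    assert(scl_hz > 0.0);  // a zero or negative clock makes no sense
    return 1e6 * static_cast<double>(i2c_clock_periods(payload_bytes)) / scl_hz;
}
```

For example, `i2c_transfer_us(6, 400000.0)` counts 65 clock periods, which is about 162.5 us at 400 kHz. The same read at 100 kHz takes four times as long.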
Regarding the size of the FFT, it depends on the lowest frequency you care about and on the bandwidth. For example, 10 kHz sampling and a 128-point FFT gives only 10000/128 = 78.125 Hz frequency resolution. That means you can't measure anything below this value and, more importantly, the error is on the order of 100% at 78 Hz and about 10% at 780 Hz (the relative error is roughly the resolution divided by the frequency, so it falls off as 1/f, but do the math to verify it for your case), which may not be acceptable.
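The arithmetic above is easy to script when you want to try other sizes. This is only a sketch with hypothetical helper names. "Relative error" here means bin width divided by signal frequency, which is the same rough estimate as in the text and not a rigorous error bound.

```cpp
#include <cassert>

// Spacing between FFT bins in Hz for sample rate fs and transform length n.
inline double bin_width_hz(double fs_hz, unsigned n_points) {
    assert(n_points > 0);
    return fs_hz / static_cast<double>(n_points);
}

// Rough relative frequency error: bin width divided by the signal frequency.
inline double relative_error(double fs_hz, unsigned n_points, double f_hz) {
    assert(f_hz > 0.0);
    return bin_width_hz(fs_hz, n_points) / f_hz;
}

// Smallest power-of-two transform length whose bin width is <= target_hz.
inline unsigned points_for_resolution(double fs_hz, double target_hz) {
    assert(fs_hz > 0.0 && target_hz > 0.0);
    unsigned n = 1;
    while (fs_hz / static_cast<double>(n) > target_hz) {
        n *= 2;
    }
    return n;
}
```

With 10 kHz sampling, `bin_width_hz(10000.0, 128)` gives 78.125 Hz, and getting down to 10 Hz resolution needs `points_for_resolution(10000.0, 10.0)`, which is 1024 points.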
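Cross-correlating the signal with sine and cosine references at each frequency is the same thing as computing the DFT directly. It has very little or zero practical value when you need the whole spectrum, since direct correlation costs on the order of N^2 operations while the FFT costs on the order of N*log2(N). The speedup is about N/log2(N): roughly 18x for 128 points, 100x for 1024 points, and thousands of times for very long transforms.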
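To make the "correlation is the DFT" point concrete, here is a minimal direct DFT in plain C++, written as a correlation of the input against a complex sinusoid for every bin. The function name is mine, and this is only an illustration of the O(N^2) approach, not something you would run on the ESP32 in place of an FFT library.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Direct DFT: for each bin k, correlate the input with exp(-j*2*pi*k*i/n).
// Two nested loops over n samples, hence O(n^2) work.
std::vector<std::complex<double>> dft_by_correlation(const std::vector<double>& x) {
    const std::size_t n = x.size();
    assert(n > 0);
    const double pi = std::acos(-1.0);
    std::vector<std::complex<double>> spectrum(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> acc(0.0, 0.0);
        for (std::size_t i = 0; i < n; ++i) {
            const double phase =
                -2.0 * pi * static_cast<double>(k * i) / static_cast<double>(n);
            // cos part is the in-phase correlation, sin part the quadrature one
            acc += x[i] * std::complex<double>(std::cos(phase), std::sin(phase));
        }
        spectrum[k] = acc;
    }
    return spectrum;
}
```

Feeding it one full cycle of a cosine over 8 samples puts a magnitude of 4 (that is, N/2) in bin 1 and essentially zero in bins 0 and 2, the same result an 8-point FFT would give.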