It's basically a rather large signal with small changes in amplitude. In power, let's say it could vary by 0.005dBm over a second.
Do you mean 0.005 dB (not dBm)? Is your goal just discriminability? In other words, if you get two readings of 1.0V and 1.000576V (0.005 dB is an amplitude ratio of 10^(0.005/20) = 1.000576), is your aim to have high statistical confidence that the true amplitude of the 2nd one is really larger than the true amplitude of the first one? If you want (say) five sigma confidence on each side, so that the overlap of the two PDFs is really just marginal, the two readings must be ten sigma apart, which implies that the standard error of the measured magnitude must be < 0.000058 times the magnitude. A DFT-based detector with rectangular window and N points requires the input signal to have an SNR of at least about 85dBc - 10*log10(N) to achieve this goal. E.g. for N=100000, the a priori SNR of the signal fed into the detector must exceed 35 dBc.
[ This consideration assumes Gaussian random noise. If your noise PDF is different, or your noise is not white, you'll get different numbers. Any systematic (non-random) interference needs to be analyzed and considered separately. So the point is: you need enough a priori knowledge about the signal you are dealing with, and about the noise and interference it contains, in order to predict the outcome. ]
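To make the arithmetic above reproducible, here is a minimal sketch. It assumes a single sinusoid in white Gaussian noise, a rectangular window, and a tone centred on a DFT bin, in which case the relative standard error of the amplitude estimate is 1/sqrt(N * SNR). The function name and its parameters are only illustrative.

```python
import math

def required_snr_db(step_db, n_points, sigmas_each_side=5.0):
    """Input SNR in dB needed to resolve an amplitude step of step_db.

    Assumption: sinusoid in white Gaussian noise, rectangular window,
    tone on a DFT bin, so std(A_hat) / A = 1 / sqrt(N * SNR).
    """
    # Amplitude ratio that corresponds to the dB step (20*log10 convention).
    ratio = 10 ** (step_db / 20.0)
    rel_step = ratio - 1.0
    # "k sigma on each side" means the two readings are 2*k sigma apart.
    rel_sigma = rel_step / (2.0 * sigmas_each_side)
    # Invert std(A_hat)/A = 1/sqrt(N*SNR) for the linear SNR.
    snr_linear = 1.0 / (n_points * rel_sigma ** 2)
    return 10.0 * math.log10(snr_linear)

amplitude_ratio = 10 ** (0.005 / 20.0)
snr_db = required_snr_db(0.005, 100_000)
print(f"amplitude ratio for 0.005 dB: {amplitude_ratio:.6f}")
print(f"required input SNR for N=100000: {snr_db:.1f} dB")
```

The result should land close to the 35 dB figure quoted above. Changing n_points shows the 10*log10(N) trade-off directly: ten times more points buys 10 dB of tolerable noise.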
Using a "16 bit" ADC with the datasheet saying the SNR makes it about 14.5 bits. Sampling rate let's say 100MSPS.
Could I use a higher bit ADC like a sigma-delta? Normally this would be ideal,
14.5 bits is 89 dBFS. But that's just the contribution of the ADC alone. It becomes negligible if the noise level which is a priori present in your signal happens to be much higher. If the a priori noise level is large, then even a 12-bit ADC may not make a significant difference in the end, compared to 16-bit. Note that random noise dithers the ADC, which also smooths out the effect of its DNL.
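A quick way to check this is to add the noise powers. The sketch below uses the usual 6.02*ENOB + 1.76 dB rule for a full-scale sine and assumes the ADC noise and the signal's own noise are uncorrelated. It also assumes the signal is driven to full scale and carries a hypothetical a priori SNR of 35 dB; both are illustration values, not measurements.

```python
import math

def enob_to_snr_db(enob_bits):
    """SNR in dB relative to a full-scale sine for a given effective bit count."""
    return 6.02 * enob_bits + 1.76

def combined_snr_db(*snr_dbs):
    """Overall SNR when uncorrelated noise sources add in power."""
    # Each SNR is turned into a noise power relative to the signal, then summed.
    noise_rel = sum(10 ** (-s / 10.0) for s in snr_dbs)
    return -10.0 * math.log10(noise_rel)

signal_snr_db = 35.0                     # hypothetical a priori SNR of the input
adc_16 = enob_to_snr_db(14.5)            # "16 bit" ADC with 14.5 effective bits
adc_12 = enob_to_snr_db(12.0)            # ideal 12-bit ADC, for comparison

loss_16 = signal_snr_db - combined_snr_db(signal_snr_db, adc_16)
loss_12 = signal_snr_db - combined_snr_db(signal_snr_db, adc_12)
print(f"ADC SNR at 14.5 bits: {adc_16:.2f} dB")
print(f"SNR loss with 14.5-bit ADC: {loss_16:.6f} dB")
print(f"SNR loss with 12-bit ADC:   {loss_12:.6f} dB")
```

With these assumed numbers both losses come out far below 0.01 dB, so the choice of converter hardly matters until the a priori noise gets close to the ADC's own noise floor.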
but there's impulse noise that I can't simply low-pass filter out.
You really need to qualify and quantify the non-random noise components polluting your signal, in order to find out whether they can be separated from the useful signal at all, and if yes, to get an idea how it could be done.
Can't you feed your signal into a spectrum analyzer, or into a scope with large FFT (if the frequency is not too high) to get at least a first idea?