People who believe they can hear a difference between 16-bit and 24-bit audio, or between 44.1kHz and 96kHz+ sampling rates, fall under these categories:
1. They're wrong and their ABX method is flawed, or they lied about doing ABX because doing a proper ABX is too much work (sorry!)
2. Their hardware/OS is interfering with the audio in a way that makes the output quality vary depending on source parameters (this could be caused by crappy output filters, crappy resamplers, unwanted signal processing somewhere in the chain, interference caused by a poor hardware implementation, etc.)
3. With 96kHz+ content, the ultrasonic information sometimes interferes with amplifiers/speakers, causing additional issues (e.g. intermodulation products that land in the audible band) which people's minds can interpret as "different" and "better"
4. "High resolution" content is often mastered differently, so you're hearing an improvement in the mastering process, not an improvement because of the high resolution itself
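Point 1 mostly comes down to running enough trials and checking the score against chance. A minimal Python sketch of that check, assuming each trial is an independent 50/50 guess when no difference is actually audible (a one-sided binomial test; the function names are just for illustration):

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """Chance of scoring at least `correct` out of `trials` by pure guessing.

    Assumes each trial is an independent 50/50 guess when no difference is
    audible (one-sided binomial test).
    """
    if not 0 <= correct <= trials:
        raise ValueError("need 0 <= correct <= trials")
    hits = sum(comb(trials, k) for k in range(correct, trials + 1))
    return hits / 2**trials


def min_correct(trials: int, alpha: float = 0.05) -> int:
    """Smallest score whose p-value is at or below alpha (trials + 1 if none is)."""
    for k in range(trials + 1):
        if abx_p_value(k, trials) <= alpha:
            return k
    return trials + 1


if __name__ == "__main__":
    print(abx_p_value(12, 16))  # ~0.038: 12/16 is unlikely to be pure luck
    print(min_correct(16))      # 12
```

Decide the number of trials before you start and don't stop early just because you're ahead; peeking and stopping when the score looks good inflates the false-positive rate.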
If you want to be scientific about it, you at least have to look at the output FFT of a test signal at each of those parameters to make sure your hardware/software isn't messing it up. Check the noise floor, check for harmonics, check for unwanted signals and possible intermodulation products, and check the square wave response to make sure nothing is oscillating (particularly relevant to DIY/modded hardware). The checks should be done at the final outputs (i.e., the speaker amp or headphone amp output). Also do a white noise test and look at an averaged FFT to see whether the frequency response flatness changes or funky filters get switched in at different sampling rates; sometimes that's a thing.
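The FFT checks above are easy to script once the output has been captured with a decent ADC/audio interface. A minimal Python/NumPy sketch, assuming the capture is a float array scaled to ±1.0 at a known sample rate (48 kHz here) and that the test tone sits exactly on an FFT bin. The function names are hypothetical, harmonics above Nyquist are simply ignored rather than folded back, and a simulated 16-bit capture stands in for real hardware:

```python
import numpy as np

FS = 48_000  # assumed capture sample rate in Hz


def quantize(x, bits=16, rng=None):
    """Simulate a `bits`-bit capture of a float signal in [-1, 1), with TPDF dither."""
    rng = rng if rng is not None else np.random.default_rng()
    lsb = 2.0 / 2**bits
    dither = (rng.random(len(x)) - rng.random(len(x))) * lsb  # triangular, +/-1 LSB
    return np.round((x + dither) / lsb) * lsb


def _band_energy(power, b, halfwidth):
    """Energy in bins b-halfwidth..b+halfwidth (covers the window's main lobe)."""
    return power[max(b - halfwidth, 0): b + halfwidth + 1].sum()


def tone_metrics(x, f0, fs=FS, n_harmonics=5, halfwidth=3):
    """Return (THD, SNR) in dB for a captured sine whose frequency f0 sits on an FFT bin.

    THD sums harmonics 2..n_harmonics (those below Nyquist); SNR treats every
    other bin, apart from the lowest few DC bins, as noise.
    """
    power = np.abs(np.fft.rfft(x * np.blackman(len(x)))) ** 2
    bin0 = int(round(f0 * len(x) / fs))
    harmonic_bins = [k * bin0 for k in range(2, n_harmonics + 1) if k * bin0 < len(power)]
    fundamental = _band_energy(power, bin0, halfwidth)
    harmonics = sum(_band_energy(power, b, halfwidth) for b in harmonic_bins)
    noise_mask = np.ones(len(power), dtype=bool)
    noise_mask[: halfwidth + 1] = False  # skip DC and its window leakage
    for b in [bin0] + harmonic_bins:
        noise_mask[max(b - halfwidth, 0): b + halfwidth + 1] = False
    noise = power[noise_mask].sum()
    thd = 10 * np.log10(max(harmonics, 1e-300) / fundamental)
    return thd, 10 * np.log10(fundamental / noise)


def averaged_spectrum_db(x, fs=FS, nperseg=4096):
    """Welch-style averaged power spectrum (Hann, 50% overlap), in dB re. the median bin."""
    w = np.hanning(nperseg)
    starts = range(0, len(x) - nperseg + 1, nperseg // 2)
    power = np.mean([np.abs(np.fft.rfft(x[s:s + nperseg] * w)) ** 2 for s in starts], axis=0)
    return np.fft.rfftfreq(nperseg, 1 / fs), 10 * np.log10(power / np.median(power))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 1 << 16
    f0 = round(1000 * n / FS) * FS / n  # ~1 kHz, nudged onto an exact FFT bin
    t = np.arange(n) / FS
    capture = quantize(0.5 * np.sin(2 * np.pi * f0 * t), bits=16, rng=rng)  # -6 dBFS tone
    thd, snr = tone_metrics(capture, f0)
    print(f"THD {thd:.1f} dB, SNR {snr:.1f} dB")

    # White-noise test: compare band averages to spot filters shaping the response.
    freqs, db = averaged_spectrum_db(rng.standard_normal(1 << 20) * 0.1)
    low = db[(freqs > 1e3) & (freqs < 5e3)].mean()
    high = db[(freqs > 15e3) & (freqs < 20e3)].mean()
    print(f"1-5 kHz vs 15-20 kHz: {low - high:+.2f} dB")  # near 0 for a flat chain
```

For the simulated capture (a -6 dBFS tone, 16-bit, TPDF dither) the SNR should land near the textbook 6.02·16 + 1.76 ≈ 98.1 dB, minus about 6 dB for the lower level and about 4.8 dB for the dither, i.e. roughly 87 dB. A real output that measures far worse than its format allows, or whose averaged noise spectrum changes shape when you switch sample rates, is exactly the kind of thing point 2 is about.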
In my experience, here's what contributes to audio quality, sorted from biggest contribution to smallest:
1. Speakers and room size/acoustics (together, this is by far the biggest factor)
2. Actual source quality (recording/mastering quality, format compression if any, bit-accurate output, etc.)
3. Basic issues (ground loops, bad power, interference between components, e.g. from switching PSUs, etc.)
4. Amplifier
5. DAC quality (power filtering, power regulation, output coupling, op-amps/output stages, main clock oscillator phase noise, complexity of the clock tree, DAC chip jitter sensitivity, choice of DAC chip)
6. Source path (ideally you want to avoid things like SPDIF, which needs clock recovery, or cheap USB converters that carry really dirty power into the signal path and/or produce a lot of jitter because of a crappy clock; stick with I2S all the way if you can, since that's what the DAC chips use). Some hardware generates I2S with microcontrollers or FPGAs, asynchronously to and isolated from the data input. Beauty.
1369. Your SD card brand (make sure to only use Sony's low-noise SD cards, or all this stuff will be in vain.)
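Items 5 and 6 lean heavily on jitter. For a rough sense of scale, the commonly quoted bound for random sampling-clock jitter on a full-scale sine is SNR = -20·log10(2π·f·σt). A small Python sketch of that arithmetic; the function names are made up for illustration, and it assumes white, Gaussian jitter while ignoring that DAC chips differ in jitter sensitivity:

```python
import math


def jitter_snr_db(freq_hz: float, jitter_rms_s: float) -> float:
    """Jitter-limited SNR for a full-scale sine: -20*log10(2*pi*f*sigma_t).

    Assumes random (white, Gaussian) clock jitter; real clocks have
    frequency-dependent phase noise, so this is only a ballpark.
    """
    return -20 * math.log10(2 * math.pi * freq_hz * jitter_rms_s)


def max_jitter_s(freq_hz: float, target_snr_db: float) -> float:
    """RMS jitter that would pull the jitter-limited SNR down to target_snr_db."""
    return 10 ** (-target_snr_db / 20) / (2 * math.pi * freq_hz)


if __name__ == "__main__":
    ideal_16bit = 6.02 * 16 + 1.76  # ~98.1 dB for an undithered full-scale sine
    print(f"{jitter_snr_db(20e3, 1e-9):.1f} dB")  # 1 ns RMS at 20 kHz -> 78.0 dB
    print(f"{max_jitter_s(20e3, ideal_16bit) * 1e12:.0f} ps")  # ~99 ps
```

By this estimate, keeping jitter noise under the 16-bit floor at 20 kHz needs roughly 100 ps RMS or better. Real clock problems often show up as phase-noise sidebands rather than white noise, so this is a sanity check, not a substitute for measuring.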
I'm a compulsive ABXer and anti-BS hardware dev person. I love proving myself/being proven wrong. I'm lucky enough to be able to hear up to ~22kHz. :v I went to music school as a kid. I have no friends because I science too much. If you can't trust me, then who can you trust...