If you want to see how fast a CPU can run, you don't have to test every part of the CPU, only the paths that have the worst timing at the various process, voltage and temperature corners (e.g. hot vs. cold or, more easily controlled, high vs. low supply voltage).
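To make the "worst path sets the speed" idea concrete, here is a minimal sketch. The corner names, path names and delays are made up for illustration, and it ignores real-world overheads such as flip-flop setup time and clock skew. It just takes the slowest path at each corner and computes the maximum clock as 1 / t_worst.

```python
# Hypothetical path delays in nanoseconds at three corners.
# All names and numbers are illustrative only, not real silicon data.
path_delays_ns = {
    "slow_lowV_hot": {"alu_carry": 0.92, "cache_tag": 0.88, "bypass_mux": 0.75},
    "typ_nomV_room": {"alu_carry": 0.70, "cache_tag": 0.68, "bypass_mux": 0.58},
    "fast_highV_cold": {"alu_carry": 0.55, "cache_tag": 0.56, "bypass_mux": 0.46},
}


def fmax_ghz(delays):
    """Return (worst path, max clock in GHz), where f = 1 / t_worst.

    Simplification: setup time, clock skew and margins are ignored.
    """
    worst_path = max(delays, key=delays.get)
    return worst_path, 1.0 / delays[worst_path]  # 1 / ns = GHz


if __name__ == "__main__":
    for corner, delays in path_delays_ns.items():
        path, f = fmax_ghz(delays)
        print(f"{corner}: limited by {path}, fmax ~ {f:.2f} GHz")
```

In this made-up data the limiting path changes between corners (`alu_carry` at the slow corner, `cache_tag` at the fast one), which is why you would check the worst paths at each corner rather than at just one.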
I think the easiest approach would be to put a small test structure somewhere on the die (a ring oscillator, for example) that you can probe quickly to estimate the likely performance. That would let you sort the best from the rest without having to package and power up the entire die.
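As a rough sketch of how a ring oscillator reading could be used: an inverter ring with an odd number of stages N, each with delay t_d, oscillates at roughly f = 1 / (2 · N · t_d), so the measured frequency tracks gate speed on that die. The bin thresholds below are hypothetical; in practice they would have to be calibrated against full speed tests of packaged parts, since the oscillator only correlates with the CPU's real critical paths.

```python
def ring_osc_freq_hz(n_stages, stage_delay_s):
    """Approximate frequency of an N-stage inverter ring: f = 1 / (2 * N * t_d)."""
    if n_stages < 3 or n_stages % 2 == 0:
        raise ValueError("a ring oscillator needs an odd number of stages (>= 3)")
    return 1.0 / (2 * n_stages * stage_delay_s)


def stage_delay_s(n_stages, measured_freq_hz):
    """Invert the formula: average per-stage delay from a measured frequency."""
    return 1.0 / (2 * n_stages * measured_freq_hz)


# Hypothetical speed bins: (minimum oscillator frequency in MHz, label),
# checked from fastest to slowest. Real thresholds would need calibration.
BINS = [(600.0, "fast"), (500.0, "standard"), (0.0, "slow")]


def speed_bin(measured_mhz):
    """Return the first bin whose threshold the measurement meets."""
    for threshold, label in BINS:
        if measured_mhz >= threshold:
            return label
    return "slow"


if __name__ == "__main__":
    f = ring_osc_freq_hz(31, 30e-12)  # 31 stages, 30 ps per stage (made up)
    print(f"{f / 1e6:.1f} MHz -> {speed_bin(f / 1e6)}")
```

With 31 stages at 30 ps each this gives about 537.6 MHz, which lands in the hypothetical "standard" bin.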
Or you could intersperse test structures across the wafer to see where it is better or worse than average.
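A minimal sketch of what you might do with readings from structures spread over a wafer: compute each site's deviation from the wafer average and flag the sites that stand out. The (x, y) site coordinates, the frequencies and the one-standard-deviation cutoff are all invented for illustration.

```python
from statistics import mean, stdev

# Made-up readings: (x, y) site on the wafer -> test-structure frequency in MHz.
readings = {
    (0, 0): 540.0, (1, 0): 545.0, (-1, 0): 538.0,
    (0, 1): 542.0, (0, -1): 536.0,
    (3, 0): 510.0, (-3, 0): 512.0, (0, 3): 508.0, (0, -3): 515.0,
}


def wafer_deviation(readings):
    """Percent deviation of each site from the wafer mean."""
    avg = mean(readings.values())
    return {site: 100.0 * (f - avg) / avg for site, f in readings.items()}


def flag_sites(readings, k=1.0):
    """Label sites more than k sample standard deviations from the mean."""
    vals = list(readings.values())
    avg, sd = mean(vals), stdev(vals)
    return {
        site: ("better" if f > avg else "worse")
        for site, f in readings.items()
        if abs(f - avg) > k * sd
    }


if __name__ == "__main__":
    for site, pct in sorted(wafer_deviation(readings).items()):
        print(f"site {site}: {pct:+.1f}% vs. wafer average")
    print("outliers:", flag_sites(readings))
```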