There's really no need for non-portable hand optimizations in the types of systems where cache misses hit you hard (DRAM-based, 400+ cycle losses). In those cases you're only optimizing the 10% that's in the compiler's domain. The rest of it, the way you organize and access data, that's where the real perf gains are to be made.
That's actually what I was referring to WRT "hand-optimizing". Data sets and algorithms, not instructions. Although, if the shoe fits...
The compiler flags issue frightens me. Even if I managed to get it right for my code now, I doubt I could get it right for other people's/companies' code, and I doubt people using my code in 5 years will get it right - because they don't have time to understand how my code works internally.
Here's the way I see it:
Software always has bugs. Optimization problems can be sneaky, for sure, but hopefully a good test suite will catch a lot of this before the code ever sees the light of day. If not, then maybe they will be found in the wild, investigated, and resolved in whatever way is most appropriate. Not ideal, perhaps, but realistically that's just part of the software release cycle.
If the end-user has access to the source, it's wise to set sensible build defaults and mention known issues somewhere in the docs (Makefile, header comments, readme). In some cases, the project's or author's site will say "before filing a bug report, make sure you've compiled the software with these flags..." In general, there seems to be a consensus among developers and users about what is considered "safe", and to go beyond that is The Road Less Traveled.
If the source isn't available, then those things should be controlled internally by the build process anyhow.
I'm not solving this problem for anyone here, obviously -- it's all well understood. I only mean to describe my own perspective on the nature and scale of the problem.
One might say, "But why exacerbate the potential for bugs by continuing to use languages where this is a problem?" That's a fair point, One. However, this has to be considered as one of the many potential drawbacks of any given tool. Java has dependency issues like a mother, for example. Root help you if you have the wrong JRE. That will often lead to the same kind of "WTF -- it works fine on my other computer!" problems as optimization flags. .Net has its share of quirks as well. This dilemma isn't unique to C.
Maybe Fortran runtime environments are perfect though. I dunno, never used it.
Very true. Nowadays in terms of latency, cache = main memory, main memory = disk.
SSDs are not disks.
I think there's some confusion here. I took our consonant friend's remark to say, "cache is the new RAM, and RAM is the new disk", referring to performance expectations rather than the medium itself.
I'm pretty sure most programs run on non-changing hardware.
If by "most", you mean "some ambiguous number I've pulled from my exhaust pipe"...
Sorry, I couldn't resist. No offense intended, I'm just skeptical of anyone's guess as to the nature of "most programs". That seems as nebulous as saying "most webpages are HTML 4 Transitional" or something. Good luck proving a sample is representative.