Basically, external comp allows performance comparable to current-mode (current-feedback) op-amps (which have high BW fairly independent of gain), while keeping the ease of use of the voltage-mode op-amp -- the sole exception being an additional required compensation network (typically C or R+C from the internal gain node to GND, or between two pins).
When it's the internal gain node, having that exposed also opens up opportunities like joining multiple amps together to make precision limiters.
But, it seems history was against that, and so we all got fixed-compensation (largely unity-gain stable) op-amps, and consequently a terrible GBW-power tradeoff when high gain is required.
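To put rough numbers on that tradeoff, here's a minimal Python sketch using the usual first-order estimate that closed-loop bandwidth is about GBW divided by gain. The 1MHz figure is a made-up example, and the assumption that external compensation can be lightened in exact proportion to the gain (so GBW rises by that factor) is an idealization; a real part runs into its second pole eventually.

```python
def closed_loop_bw(gbw_hz, gain):
    """First-order estimate: closed-loop bandwidth ~ GBW / (noise) gain."""
    return gbw_hz / gain

UNITY_GBW_HZ = 1e6  # hypothetical GBW of a unity-gain-compensated part

rows = []
for gain in (1, 10, 100):
    fixed = closed_loop_bw(UNITY_GBW_HZ, gain)
    # Idealized assumption: compensation reduced in proportion to gain,
    # so the effective GBW is 'gain' times higher.
    tailored = closed_loop_bw(UNITY_GBW_HZ * gain, gain)
    rows.append((gain, fixed, tailored))
    print(f"gain {gain:>3}: fixed comp {fixed:>9.0f} Hz, tailored comp {tailored:>9.0f} Hz")
```

Under those assumptions the fixed-compensation bandwidth shrinks with every decade of gain, while the tailored one stays put, which is the "bandwidth fairly independent of gain" behavior mentioned above.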
It's not like it's that much harder to use; it's literally one or two added components per amp. Maybe a bit annoying for duals or quads: no one ever made a 10 or 12-pin DIP, so you'd have duals in DIP14 at least, and so on. (Though if we're talking hypothetical alternate histories, maybe DIP10 would've caught on then, who knows.) It's probably impractical as bandwidth goes up, though -- the physical size of the gain node begins to matter, and taking it out to a whole massive pin starts to cost a lot of performance, plus it makes the device more susceptible to ambient noise and stray feedback paths. Today we have unity-gain compensated voltage-mode op-amps with GBW in the GHz, where this is absolutely a relevant factor; but back in the days when 10-100MHz GBW was fast, it wouldn't have been such an issue (and indeed wasn't, for the devices that did function this way).
"Unity gain stable" is an important term here. It means that, given the amp's dominant 90° phase shift (it looks like an integrator -- which is also to say, it's dominant-pole compensated), the open-loop gain falls below 1 at a frequency lower than that of any additional poles, which add phase shift and would otherwise cause it to oscillate.
If the phase margin were measured at gain = 10 instead, we could accept having poles above that point (i.e., in the 1 < gain < 10 range of frequencies) and still have stable operation, but the price is that we can't reduce gain below 10; in particular, we can't make voltage followers (at least, not without hackery -- we can increase noise gain, because that is actually the figure relevant to stability; but as the name suggests, noise then goes up, and this might be worse overall). Such devices are indeed available; for example, some LT families have unity, 5 and 10 gain stable parts, so you can get the extra bandwidth when you also need the gain. But, LT parts being what they are, and non-unity-gain-stable parts being a niche in general, they're less common and more expensive.
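As a rough illustration of unity-gain versus gain-of-10 stability, here's a small Python sketch. It assumes the open-loop gain is an ideal integrator plus one extra pole, A(jf) ≈ GBW / (jf (1 + jf/f2)), and solves for the frequency where |A| equals the noise gain in closed form. The GBW and f2 values are hypothetical, not taken from any real part.

```python
import math

def phase_margin_deg(gbw_hz, f2_hz, noise_gain):
    """Phase margin for A(jf) ~ gbw / (jf * (1 + jf/f2)) closed at noise_gain."""
    k = gbw_hz / (noise_gain * f2_hz)
    # |A| = noise_gain  ->  x * sqrt(1 + x^2) = k, where x = f_crossover / f2
    x = math.sqrt((math.sqrt(1 + 4 * k * k) - 1) / 2)
    # Integrator contributes -90 deg, the second pole contributes -atan(x)
    return 90.0 - math.degrees(math.atan(x))

GBW_HZ = 1e6  # hypothetical

# Second pole well above GBW: comfortable as a follower
print(f"f2 = 3 MHz,   gain 1:  {phase_margin_deg(GBW_HZ, 3e6, 1):.1f} deg")

# Second pole below GBW (a "decompensated" part): marginal at unity,
# progressively better as noise gain goes up
for g in (1, 5, 10):
    print(f"f2 = 100 kHz, gain {g:>2}: {phase_margin_deg(GBW_HZ, 1e5, g):.1f} deg")
```

With these made-up numbers, the part with its second pole at 100kHz has under 20° of margin as a follower but over 50° at a noise gain of 10, which is the kind of part that would be sold as "gain of 10 stable".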
I should probably explain poles as well. Taking the transfer function of the amp in the frequency domain, H(ω) = Vout(ω) / Vin(ω), we find a normally very high (but finite) gain at DC; then above a low cutoff frequency (typically 10Hz, give or take a few decades), gain drops in inverse proportion to frequency (-20dB/dec, integrator characteristic). This is the dominant pole. Then, somewhere past GBW presumably, it drops faster (-40dB/dec or steeper). Each transition, from flat to -20, or -20 to -40, etc., has a characteristic frequency. These are the intersection points of the asymptotes on the Bode plot (give or take). When we write out (or solve for an approximation of) the rational function (ratio of polynomials) corresponding to this transfer function, the zeroes of the denominator polynomial are the poles of the transfer function. The poles have units of frequency (the transfer function is in the frequency domain), and the magnitude of each is the "break frequency" where a new asymptote takes over in the Bode plot.
So, it's actually rather abstract, having to do with how we model a system as lumped equivalent elements (a passive RLC circuit has a corresponding rational function describing its frequency response) and how we calculate it (polynomials), but at its most basic: a pole is a cutoff frequency.
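To make the "poles are roots of the denominator" idea concrete, here's a minimal Python sketch for a two-pole amp model, H(s) = A0 / ((1 + s/w1)(1 + s/w2)). The DC gain and break frequencies are hypothetical values picked for illustration. It expands the denominator into a quadratic, solves for its roots, and checks the Bode slope over one decade between the poles and one decade above both.

```python
import cmath
import math

A0 = 1e5               # hypothetical DC gain (100 dB)
F1, F2 = 10.0, 3e6     # hypothetical break frequencies, Hz
w1, w2 = 2 * math.pi * F1, 2 * math.pi * F2

# Denominator (1 + s/w1)(1 + s/w2) expanded to a*s^2 + b*s + c
a, b, c = 1 / (w1 * w2), 1 / w1 + 1 / w2, 1.0

# Poles = zeroes of the denominator (quadratic formula)
root = cmath.sqrt(b * b - 4 * a * c)
poles = [(-b + root) / (2 * a), (-b - root) / (2 * a)]
break_freqs_hz = sorted(abs(p) / (2 * math.pi) for p in poles)

def gain_db(f_hz):
    """Magnitude of H(s) in dB, evaluated on the imaginary axis s = j*2*pi*f."""
    s = 1j * 2 * math.pi * f_hz
    return 20 * math.log10(abs(A0 / (a * s * s + b * s + c)))

slope_between = gain_db(1e4) - gain_db(1e3)   # one decade, between the poles
slope_above = gain_db(1e9) - gain_db(1e8)     # one decade, above both poles

print("break frequencies (Hz):", break_freqs_hz)
print(f"slope between poles: {slope_between:.1f} dB/dec")
print(f"slope above both:    {slope_above:.1f} dB/dec")
```

The roots come out as negative real numbers (in rad/s); dividing their magnitudes by 2π recovers the two break frequencies that were put in, and the slopes land close to -20 and -40 dB/dec.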
(And zeroes of the numerator of the transfer function are also zeroes of the whole thing. Sometimes we have those, which can be an advantage, as when their phase shift balances out that from poles, extending phase margin (pole-zero cancellation, or lead-lag compensation); but they also have consequences for stability in a feedback loop, particularly RHP (right half-plane) zeroes, which can be inverted in a feedback loop, becoming poles, and RHP poles mean oscillation.)
(Oh, and in general, poles/zeroes have complex values, so we plot them on the complex plane. A pole p = ξ + jω with positive real part ξ > 0 means the response grows over time, i.e. diverges: the system is unstable or oscillating. So an "RHP" pole is generally a sign that something has gone wrong. Finally, when we plot a transfer function, we evaluate it along the imaginary axis, s = jω = j2πf.)
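A quick way to see why the sign of ξ matters: each pole p contributes a term proportional to e^(pt) to the natural response, and its magnitude is e^(ξt). The Python sketch below compares a left half-plane and a right half-plane pole with the same ω; the specific values (ξ = ±100 1/s, 1kHz ringing) are arbitrary examples.

```python
import cmath
import math

def mode_magnitude(pole, t):
    """Magnitude of the natural-response term e^(pole * t) at time t (seconds)."""
    return abs(cmath.exp(pole * t))

omega = 2 * math.pi * 1000.0          # hypothetical 1 kHz ringing frequency
lhp_pole = complex(-100.0, omega)     # xi < 0: left half-plane, decays
rhp_pole = complex(+100.0, omega)     # xi > 0: right half-plane, grows

for t in (0.0, 0.01, 0.02):
    lhp = mode_magnitude(lhp_pole, t)
    rhp = mode_magnitude(rhp_pole, t)
    print(f"t = {t:.2f} s   LHP: {lhp:.3f}   RHP: {rhp:.3f}")
```

Both terms ring at the same frequency; the only difference is that one envelope shrinks by a factor of e every 10ms and the other grows by the same factor.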
Tim