Simple Fourier analysis of the waveforms gives the answer.
"Morse" code can't really be called such, beyond some data rate where its pulsation isn't even perceptible; probably on the order of say 20ms bit time, or 50 baud. Even then I don't know that you could understand it at that rate, and almost impossible to send. But as human feats go, quite a lot is possible with enough practice, so, I wouldn't rule it out that someone can do this, or maybe even a bit more.
If it's being registered with a tone, then the tone itself becomes meaningless when the tone burst is shorter than a few cycles. The cycles, or lack thereof, overlap into a jumble of lumps. Not that detection is impossible at this point, given a suitable receiver, but it gets a lot harder than with the simple chain of, for example: bandpass filter --> diode detector --> lowpass filter --> discriminator (say, a Schmitt trigger). The bandwidth of such a signal chain is too low to reliably receive bits that short.
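A rough simulation of that chain shows the effect. Everything here is an assumed toy setup rather than a real receiver: a 1kHz tone sampled at 48kHz, an ideal rectifier standing in for the diode, a 5ms moving average as the lowpass, arbitrary Schmitt thresholds, and no bandpass stage at all since there is no noise or interference in the simulation. The function name `ook_receive` is hypothetical.

```python
import numpy as np

def ook_receive(bits, bit_s, fs=48_000, f0=1_000.0, lp_s=0.005, hi=0.4, lo=0.2):
    """Key a tone on/off with `bits`, then run rectifier -> lowpass -> Schmitt trigger."""
    spb = int(round(bit_s * fs))                      # samples per bit
    gate = np.repeat(np.asarray(bits, dtype=float), spb)
    n = np.arange(gate.size)
    rx = gate * np.sin(2 * np.pi * f0 * n / fs)       # keyed tone
    rectified = np.abs(rx)                            # ideal diode detector
    taps = int(round(lp_s * fs))                      # fixed lowpass: moving average
    envelope = np.convolve(rectified, np.ones(taps) / taps, mode="same")
    state = 0
    decided = np.empty(gate.size, dtype=int)
    for i, v in enumerate(envelope):                  # Schmitt trigger with hysteresis
        if v > hi:
            state = 1
        elif v < lo:
            state = 0
        decided[i] = state
    return decided[spb // 2 :: spb].tolist()          # sample in the middle of each bit

bits = [1, 0, 1, 1, 0, 0, 1, 0]
slow = ook_receive(bits, bit_s=0.020)   # 20 tone cycles per bit
fast = ook_receive(bits, bit_s=0.002)   # 2 tone cycles per bit, same detector
print("sent:", bits)
print("20 ms bits:", slow)
print(" 2 ms bits:", fast)
```

With 20 cycles per bit the envelope settles well inside each bit; with 2 cycles per bit the same lowpass smears neighboring bits together and the decisions no longer match what was sent.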
Notice that using an audio carrier is no different than the on-air signal itself. Just at RF rather than AF. So the same argument applies: when the signal is starting and stopping every couple of cycles, it's a jumble and hard to make any sense of at all.
Applying Fourier analysis, we can see that the spectrum of a single rectangular pulse is a sinc envelope with bandwidth inversely proportional to pulse width. This is true at baseband, and it's also true at RF, because multiplication and convolution are exchanged under the Fourier transform. If we multiply a signal rect(xt) by a carrier sin(w0 t), it's equivalent to convolving F(rect(xt)) with F(sin(w0 t)). Convolution isn't always the simplest operation, but we have a special case here: F(sin(w0 t)) = delta(w - w0), up to a constant factor (that is, the Dirac delta function, meaning one spike on the spectrum*), and convolution with a delta (impulse, spike) is identical to shifting the other argument over to that center frequency.
*Well, two, at +/- w0, because the signal is real-valued, so its spectrum is mirrored about zero.
*And yes, it's "not a function". But also we can define it in such a way that it is. It's fine.
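So a rectangular-windowed (gated, tone-burst) sine wave has a spectrum shaped like a sinc function, centered at +/-w0, with a width proportional to x, i.e. inversely proportional to the burst length. Taking rect(xt) as a pulse of length 1/x, F(rect(xt)) is a sinc with its first zeros at w = +/-2 pi x, so around the carrier the main lobe spans 4 pi x in total (2x in Hz), and the -3dB bandwidth is a bit less than half of that, roughly 0.9x Hz.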
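This is easy to check numerically. The sketch below takes the FFT of a single tone burst and looks at where the peak, the first zero and the -3dB points land. The sample rate, carrier and burst length are arbitrary assumed values, picked so that the burst holds a whole number of cycles and the zero-padded FFT has 1Hz bins.

```python
import numpy as np

fs = 100_000.0      # sample rate, Hz (assumed)
f0 = 10_000.0       # carrier, Hz (assumed)
tau = 0.002         # burst length, s: 20 carrier cycles
n_fft = 100_000     # zero-padded FFT length, gives 1 Hz bin spacing

t = np.arange(int(round(tau * fs))) / fs
burst = np.sin(2 * np.pi * f0 * t)                 # rectangular-gated sine
spectrum = np.abs(np.fft.rfft(burst, n=n_fft))
freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)

def relative_level(f_hz):
    """Spectrum magnitude at f_hz, relative to the main-lobe peak."""
    return spectrum[int(round(f_hz * n_fft / fs))] / spectrum.max()

peak_hz = freqs[np.argmax(spectrum)]
above = freqs[spectrum >= spectrum.max() / np.sqrt(2)]   # bins within 3 dB of the peak
bw_3db = above.max() - above.min()

print(f"peak near {peak_hz:.0f} Hz")
print(f"level at f0 + 1/tau: {relative_level(f0 + 1 / tau):.2e} (first zero)")
print(f"level at f0 + 1.5/tau: {relative_level(f0 + 1.5 / tau):.3f} (first sidelobe)")
print(f"-3 dB width: {bw_3db:.0f} Hz, versus 1/tau = {1 / tau:.0f} Hz")
```

Halving `tau` doubles every one of those widths, which is the inverse relation between burst length and bandwidth.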
If we send multiple rects at various times, their spectra superimpose and we get new peaks and valleys. For example, with an alternating pulse train (a square wave of period T), everything but the harmonics cancels out, so we get a peak at w0 (carrier) and secondary peaks at w0 +/- (1, 3, 5, ...) * (2pi)/T: all the odd harmonics, up to whatever bandwidth the square wave happens to have. If we send some other duty cycle, then even harmonics show up as well, and the amplitudes take on a different distribution. And if we send random data (like a PRBS (pseudorandom binary sequence) test, which something like Morse code will generally approximate over time), then we again get a sinc sort of envelope, but with random peaks and dips as the data comes and goes.
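The square-wave case can be checked the same way. This sketch keys a carrier with a 50% duty square wave and reads off the sidebands at the carrier plus k times the keying frequency. All the numbers are assumed for illustration, chosen so that one second of signal gives 1Hz FFT bins and everything lands exactly on a bin.

```python
import numpy as np

fs = 100_000        # sample rate, Hz (assumed)
f0 = 10_000         # carrier, Hz (assumed)
period = 1_000      # keying period in samples: T = 10 ms, so 1/T = 100 Hz
n = fs              # one second of signal, so FFT bin k sits at k Hz
key_hz = fs // period

idx = np.arange(n)
gate = ((idx % period) < period // 2).astype(float)      # 50% duty square wave
signal = gate * np.sin(2 * np.pi * f0 * idx / fs)        # on/off keyed carrier
mag = np.abs(np.fft.rfft(signal)) / n

def sideband(k):
    """Magnitude of the spectral line at f0 + k/T."""
    return mag[f0 + k * key_hz]

for k in range(0, 6):
    print(f"f0 + {k}/T: {sideband(k):.4f}")
```

The even-numbered lines come out at numerical zero and the odd ones fall off roughly as 1/k. Changing the comparison to `period // 4` gives a 25% duty cycle, and the even harmonics that are not multiples of 4 reappear.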
For accurate detection of data, a sinc spectrum can be bandpass filtered to the first zeros, which sit one bit rate either side of the carrier, for a total bandwidth of twice the bit rate (2 / bit time). (If the bits are alternating 1/0, that's a square wave at half the bit rate, and its fundamental sidebands at +/- half the bit rate have to fit inside the filter to be seen at all.)
The term for these extra tones/spectra is sidebands.
Radio channels are allocated with fixed (maximum) bandwidth: Morse code channels, for example, every 500Hz or so, SSB around 3kHz or more (I forget what exactly), AM and NB FM a bit more, WB FM (commercial broadcast FM) 200kHz, television some MHz, etc. Notice these are all quite narrow in relation to their center frequencies: even a code channel at 143kHz has very little fractional bandwidth (0.5/143, about 0.35%), and amateur channels at quite high center frequencies (say 144MHz, or heck, anything up into the GHz with the same modulation) have even less.
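For a feel of the numbers, a small sketch of the fractional bandwidth arithmetic. The 143kHz and 144MHz code channels and the 200kHz FM width are the figures above; the 100MHz FM center frequency is an assumed example, and the function name is made up.

```python
def fractional_bandwidth(center_hz: float, bandwidth_hz: float) -> float:
    """Channel bandwidth as a fraction of its center frequency."""
    return bandwidth_hz / center_hz

channels = {
    "code channel at 143 kHz": (143e3, 500.0),
    "code channel at 144 MHz": (144e6, 500.0),
    "broadcast FM at 100 MHz (assumed center)": (100e6, 200e3),
}

for name, (center_hz, bandwidth_hz) in channels.items():
    frac = fractional_bandwidth(center_hz, bandwidth_hz)
    print(f"{name}: {frac * 100:.5f} % of center frequency")
```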
So from the standpoint of not stomping on neighboring channels, exceeding that bandwidth would be considered interference, and since we know the occupied bandwidth scales with the symbol rate, the channel width puts a cap on the symbol rate.
More data can also be sent by using more levels/states per symbol, but this is only feasible when the symbols can be discriminated from each other reliably, i.e. the signal to noise ratio is adequate. This is how WiFi negotiates anything from BPSK (a couple of megabits per second) with the poorest reception, up to, I forget what, but some dense QAM constellation for >60Mbps, or even more on the 5GHz band I think? (And, notice again, these are fairly narrow channels compared to the 2.45GHz center frequency.)
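A sketch of that trade-off in numbers. The first function is just bits per symbol times symbol rate; the second is the Shannon-Hartley limit, brought in here as a standard way to express "adequate signal to noise ratio". The 1Msym/s, 20MHz and 30dB figures are assumed example values, not anything specific to WiFi rate tables.

```python
import math

def bit_rate(symbol_rate: float, states: int) -> float:
    """Raw bit rate when each symbol picks one of `states` distinguishable states."""
    return symbol_rate * math.log2(states)

def shannon_capacity(bandwidth_hz: float, snr_db: float) -> float:
    """Shannon-Hartley upper bound on error-free bit rate for a given bandwidth and SNR."""
    snr_linear = 10 ** (snr_db / 10)
    return bandwidth_hz * math.log2(1 + snr_linear)

symbol_rate = 1e6   # symbols per second (assumed example)
for name, states in [("BPSK", 2), ("QPSK", 4), ("16-QAM", 16), ("64-QAM", 64)]:
    print(f"{name}: {bit_rate(symbol_rate, states) / 1e6:.0f} Mbit/s")

# Assumed example: 20 MHz of bandwidth at 30 dB SNR
print(f"limit: {shannon_capacity(20e6, 30) / 1e6:.1f} Mbit/s")
```

Each doubling of the number of states only adds one bit per symbol, while the required signal to noise ratio keeps climbing, which is why the dense constellations only get used with good reception.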
Tim