The Wikipedia article is more authoritative and detailed than anything I can write here:
https://en.wikipedia.org/wiki/Frequency-shift_keying
but as a simple outline:
If we're talking about a real FSK system, then the actual time function is not y1(t) + y2(t).
If it were, then your assertion would be right: as Kalvin says, the Fourier transform is linear, so the spectrum would simply be the sum of the two tones' spectra.
But BFSK is a little more complicated than that.
Assume a string of bits x(n) sent over time, where we give each bit T seconds.
We'll use fd = f2 - f1 to simplify the notation.
In the simplest case the output signal is
y(t) = sin(( 2 * pi * (f1 + x(n) * fd)) * t)
where n = int(t / T)
The Fourier transform of this function is decidedly not two spikes in the frequency domain. Switching between the two frequencies in this case is done in "zero time", so there is power between the two spikes: nature doesn't like sudden changes. (In fact, except in special cases, the power is very broadband.)
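To make the difference concrete, here is a minimal Python sketch (standard library only) that compares y1(t) + y2(t) with the switched signal defined above. The sample rate, bit time, tone frequencies and bit string are all values I picked for illustration, chosen so that both tones land exactly on DFT bins. It measures what share of the total energy lies outside the two tone bins.

```python
import cmath
import math

FS = 8000                 # sample rate in Hz (assumed)
SPB = 80                  # samples per bit, so T = 10 ms (assumed)
F1, F2 = 1000.0, 1175.0   # the two tones in Hz (assumed)
FD = F2 - F1
BITS = [0, 1, 1, 0, 1, 0, 0, 1] * 4   # arbitrary example bit string x(n)
N = len(BITS) * SPB       # 2560 samples, so F1 and F2 both fall on DFT bins


def sum_of_tones():
    """y1(t) + y2(t): both tones present all the time."""
    return [math.sin(2 * math.pi * F1 * i / FS) + math.sin(2 * math.pi * F2 * i / FS)
            for i in range(N)]


def switched_fsk():
    """y(t) = sin(2*pi*(f1 + x(n)*fd)*t), with n = int(t/T) done on sample indices."""
    return [math.sin(2 * math.pi * (F1 + BITS[i // SPB] * FD) * i / FS)
            for i in range(N)]


def tone_bin_energy(signal, freq):
    """Energy in the +freq and -freq DFT bins of a real signal."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq * i / FS)
              for i, s in enumerate(signal))
    return 2.0 * abs(acc) ** 2 / len(signal)


def fraction_outside_tones(signal):
    """Share of total energy that is NOT in the F1 and F2 bins (Parseval)."""
    total = sum(s * s for s in signal)
    in_tones = tone_bin_energy(signal, F1) + tone_bin_energy(signal, F2)
    return 1.0 - in_tones / total


if __name__ == "__main__":
    print("sum of tones:", fraction_outside_tones(sum_of_tones()))
    print("switched FSK:", fraction_outside_tones(switched_fsk()))
```

With these assumed values the sum of tones should come out at essentially zero, while the switched signal should leave a substantial share of its energy outside the two spikes.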
Good FSK schemes change that x(n) * fd term to something like
h( x(n) ) * fd
where h() spreads the pulse out a bit; think of it as smoothing the rising and falling edges. Properly chosen, the h() function can band-limit the energy to minimize the channel width for a given data rate. (See for instance "Gaussian frequency-shift keying" in the Wikipedia article.)
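A rough sketch of the smoothing idea in Python (standard library only). Everything numeric here is an assumption for illustration: the sample rate, bit time, tones, bit string, the Gaussian width, and the 200 Hz margin used to define "out of band". Two further choices are mine and not part of the outline above: both signals are built by accumulating phase from the instantaneous frequency, so that h() is the only difference between them, and the smoothing is applied circularly so the test signal wraps cleanly in the DFT.

```python
import cmath
import math

FS = 8000                 # sample rate in Hz (assumed)
SPB = 80                  # samples per bit, so T = 10 ms (assumed)
F1, F2 = 1000.0, 1175.0   # the two tones in Hz (assumed)
FD = F2 - F1
BITS = [0, 1, 1, 0, 1, 0, 0, 1] * 4   # arbitrary example bit string x(n)
N = len(BITS) * SPB


def gaussian_kernel(sigma_samples):
    """Normalised Gaussian taps; the width is a free design choice here."""
    half = int(4 * sigma_samples)
    taps = [math.exp(-0.5 * (k / sigma_samples) ** 2) for k in range(-half, half + 1)]
    total = sum(taps)
    return [w / total for w in taps]


def smooth_circular(x, taps):
    """Circular convolution of x with a symmetric kernel."""
    half = len(taps) // 2
    n = len(x)
    return [sum(w * x[(i + k - half) % n] for k, w in enumerate(taps))
            for i in range(n)]


def synthesize(freq_track):
    """Accumulate instantaneous frequency into phase, then take the sine."""
    phase, out = 0.0, []
    for f in freq_track:
        out.append(math.sin(phase))
        phase += 2.0 * math.pi * f / FS
    return out


def out_of_band_fraction(signal, lo_hz, hi_hz):
    """Share of energy outside [lo_hz, hi_hz], from a direct DFT of the in-band bins."""
    n = len(signal)
    twiddle = [cmath.exp(-2j * math.pi * m / n) for m in range(n)]
    total = sum(s * s for s in signal)
    in_band = 0.0
    for k in range(math.ceil(lo_hz * n / FS), math.floor(hi_hz * n / FS) + 1):
        acc = sum(s * twiddle[(k * i) % n] for i, s in enumerate(signal))
        in_band += 2.0 * abs(acc) ** 2 / n   # factor 2: mirrored negative-frequency bin
    return 1.0 - in_band / total


def compare_out_of_band(sigma_samples=16.0, margin_hz=200.0):
    """Return (hard-switched, Gaussian-smoothed) out-of-band energy fractions."""
    raw = [float(BITS[i // SPB]) for i in range(N)]                 # x(n) per sample
    shaped = smooth_circular(raw, gaussian_kernel(sigma_samples))   # h(x(n))
    hard = synthesize([F1 + x * FD for x in raw])
    soft = synthesize([F1 + h * FD for h in shaped])
    lo, hi = F1 - margin_hz, F2 + margin_hz
    return out_of_band_fraction(hard, lo, hi), out_of_band_fraction(soft, lo, hi)


if __name__ == "__main__":
    oob_hard, oob_soft = compare_out_of_band()
    print("hard-switched out-of-band fraction:", oob_hard)
    print("smoothed out-of-band fraction:     ", oob_soft)
```

The smoothed version should show noticeably less energy outside the band than the hard-switched one; changing sigma_samples trades edge sharpness against occupied bandwidth.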
Typically the spectrum for a "clean" FSK signal will look like two peaks with skirts.
For some example pictures of FSK signals, see
https://www.sigidwiki.com/wiki/Radio_Teletype_(RTTY)