Agreed, Lyons and Smith are both good for n00bs. A less well-known book that I also like a lot is Richard Newbold's Practical Applications in Digital Signal Processing.
If you're a programmer, try picking up Fourier theory this way: first write a simple pair of nested loops that walks through a couple of arrays, multiplying and summing as it slides one past the other. Fill one array with a few dozen cycles of on-off pulses, and put a single pulse in the other. Notice that the resulting function has a peak wherever the two arrays' contents line up, with valleys in between as you 'slide' the arrays relative to each other. Congratulations, now you understand correlation. (Convolution is the same operation with one of the arrays flipped end to end first.)
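For anyone who wants to see the pair of loops spelled out, here is a minimal Python sketch. The names (correlate, PERIOD, single_pulse) and the sizes are my own arbitrary picks for illustration, not anything canonical.

```python
def correlate(signal, template):
    """Slide `template` along `signal`; at each lag, sum the element-wise products."""
    n, m = len(signal), len(template)
    out = []
    for lag in range(n - m + 1):      # outer loop: slide the template along
        total = 0
        for k in range(m):            # inner loop: multiply and add
            total += signal[lag + k] * template[k]
        out.append(total)
    return out

PERIOD = 8  # samples per on-off cycle (arbitrary choice)

# Two dozen cycles of on-off pulses: 4 samples on, 4 samples off.
square = [1 if (i % PERIOD) < PERIOD // 2 else 0 for i in range(PERIOD * 24)]

# A single pulse, as wide as one "on" stretch of the square wave.
single_pulse = [1] * (PERIOD // 2)

result = correlate(square, single_pulse)
print(result[:PERIOD * 2])
```

The printed values ramp down from a peak to a valley and back up once per PERIOD samples: the peaks are the lags where the pulse sits exactly on top of an "on" stretch.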
Now, correlate your square wave with an array of values obtained from a cos() waveform. Do this a few thousand times with cosine waves of various frequencies. Notice how you get peaks at odd multiples of the square wave frequency: 1x, 3x, 5x, 7x, etc.? Congratulations, now you know what a Fourier transform does. How tall each peak comes out depends on how the square wave happens to line up with the cosine, so do the same correlation with a sine function as well. Between them, the two resulting arrays have enough information to reconstruct the original square wave.
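Here is a rough Python sketch of that experiment, under a few assumptions of mine: 256 samples, a square wave 32 samples long per cycle, and test frequencies limited to whole numbers of cycles across the array so the peaks land cleanly. The names (correlate_with, FUNDAMENTAL, magnitude) are made up for the example.

```python
import math

N = 256                     # samples in the array (arbitrary choice)
PERIOD = 32                 # samples per square-wave cycle (arbitrary choice)
FUNDAMENTAL = N // PERIOD   # the square wave completes 8 cycles across the array

# On-off square wave: first half of each cycle on, second half off.
square = [1.0 if (i % PERIOD) < PERIOD // 2 else 0.0 for i in range(N)]

def correlate_with(wave_fn, cycles):
    """Multiply the square wave by a cos or sin wave of `cycles` cycles and sum."""
    return sum(square[i] * wave_fn(2 * math.pi * cycles * i / N) for i in range(N))

# One correlation per test frequency, once with cosine and once with sine.
cos_part = [correlate_with(math.cos, k) for k in range(N // 2)]
sin_part = [correlate_with(math.sin, k) for k in range(N // 2)]

# Combine the two so the result does not depend on how the waves line up.
magnitude = [math.hypot(c, s) for c, s in zip(cos_part, sin_part)]

# Skip k = 0: an on-off (0/1) wave has a nonzero average, which shows up there.
peaks = [k for k in range(1, N // 2) if magnitude[k] > 1.0]
print(peaks)
```

With these numbers the peaks land at 8, 24, 40, 56 and so on, i.e. 1x, 3x, 5x, 7x the square wave's 8 cycles, and they shrink as the frequency goes up. In this particular alignment most of each peak comes from the sine array, which is a decent illustration of why you need both.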
Then you get to meet Prof. Gibbs...