Yes, that would work. I'd actually need two clicks to simulate a clicker: it's one of those things with a sprung diaphragm that flips from one side to the other, so it clicks when you press it and clicks "back" when you release it. I suppose if I put a capacitor across the speaker it would help round the edges off, or I could just slightly low-pass filter it and put it through a small amplifier to a speaker.
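If you want to hear what the press/release pair and the filtering do before building anything, a quick simulation can help. The Python sketch below is a minimal model under assumed values: 8 kHz sample rate, a 5 ms exponentially decaying pulse per click, the release click having the opposite polarity to the press, and a 2 kHz cutoff. None of these numbers come from a real clicker. `low_pass` is the usual discrete form of a first-order RC low-pass, which is roughly what a capacitor across the speaker (together with whatever resistance drives it) gives you.

```python
import math

SAMPLE_RATE = 8000  # assumed sample rate in Hz
CLICK_LEN = 40      # assumed click length: 40 samples = 5 ms at 8 kHz
DECAY = 8.0         # assumed decay constant of each click, in samples


def click(polarity):
    """One short decaying pulse. polarity=-1.0 models the 'back' click (assumption)."""
    return [polarity * math.exp(-i / DECAY) for i in range(CLICK_LEN)]


def press_release(gap_samples):
    """Press click, then gap_samples of silence, then the release click."""
    return click(1.0) + [0.0] * gap_samples + click(-1.0)


def low_pass(samples, cutoff_hz, sample_rate=SAMPLE_RATE):
    """First-order low-pass, the discrete form of a simple RC filter:
    y[n] = y[n-1] + a * (x[n] - y[n-1]), with a = dt / (RC + dt)."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    a = dt / (rc + dt)
    out = []
    y = 0.0
    for x in samples:
        y += a * (x - y)  # move the output part of the way towards the input
        out.append(y)
    return out


if __name__ == "__main__":
    raw = press_release(gap_samples=2000)  # release ~250 ms after the press
    smooth = low_pass(raw, cutoff_hz=2000.0)
    print(f"raw peak {max(raw):.3f}, filtered peak {max(smooth):.3f}")
```

Lowering `cutoff_hz` rounds the edges off more (a duller click); raising it keeps more of the sharp attack. Scaling `smooth` to 16-bit integers and writing it out with Python's standard `wave` module lets you actually listen to the difference.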
As a rule of thumb, if you want a single, simple sound effect that is almost 100% fixed (the customer WON'T change their mind), then a discrete solution (such as the one I linked to previously, or what others have mentioned) may be a viable alternative to an MCU.
But if the specifications might change (especially a lot), if it needs to be of high quality (even if you think it doesn't, all this talk of extensive filtering makes it sound like it needs to be fairly high quality), or if you need multiple different sound effects and/or have other complications, then an MCU-based solution is probably the best way forward.
If there is any kind of quantity involved, PICs/Microchip (although you'd rather NOT use them) offer a pre-programming service at reasonable cost, and without very high volume requirements when I last looked. That can save hassle, especially for surface-mount parts, which are harder to program by hand.
If it is only a ONE-OFF, then the quickest prototype you can come up with is probably best.
EDIT:
Obviously a discrete solution CAN handle multiple sound effects and/or high quality, usually with extra components. But at some point it makes much more sense to use an MCU, which can handle a huge range of options with little or no change in hardware or hardware cost.
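To illustrate why an MCU copes with extra effects without hardware changes: each effect is just a table of samples, and adding one is a data change, not a circuit change. The sketch below is written in Python for readability; on a real MCU it would be C with the tables in flash. `EFFECTS`, `play`, `add_effect`, and the `write_sample` callback are hypothetical names, not any vendor's API, and the sample values are made up for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical 8-bit unsigned sample tables (128 = silence). On a real MCU these
# would be recorded or generated offline and stored in flash.
EFFECTS: Dict[str, List[int]] = {
    "press":   [128, 250, 200, 170, 150, 140, 134, 130, 128],
    "release": [128, 10, 60, 90, 110, 118, 124, 127, 128],
}


def play(name: str, write_sample: Callable[[int], None]) -> None:
    """Send every sample of the named effect to the output, in order.
    write_sample stands in for whatever drives the DAC or PWM pin."""
    for s in EFFECTS[name]:
        write_sample(s)


def add_effect(name: str, samples: List[int]) -> None:
    """A new effect is just new data; the playback code and hardware stay the same."""
    if not all(0 <= s <= 255 for s in samples):
        raise ValueError("samples must be 8-bit unsigned values")
    EFFECTS[name] = list(samples)


if __name__ == "__main__":
    captured: List[int] = []
    play("press", captured.append)    # press click
    play("release", captured.append)  # release ("back") click
    print(len(captured), "samples sent")
```

Passing the output function in keeps the playback logic testable on a PC; on the target it would be replaced by the actual DAC/PWM write.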
PICs are just one type of MCU that could do it. Using the ones you are familiar with is OK, as long as the volumes are not so high, and/or the cost targets not so tight, that it would be a problem.