Why does it need to be smallest/cheapest?
If it doesn't, just get a bigger one; 10-50W parts are very affordable. You don't even need a heatsink if you use an aluminum-body type.
A note on that: aluminum-body resistors tend to look cheaper and smaller for a given rating. Well, they are, until you factor in the heatsink and mounting labor required to use them at their ratings! Without a heatsink, they're only rated for a fraction of the nameplate power, and traditional ceramic-body resistors are the better overall deal.
For low average dissipation (averaged over minutes+), they'd actually be perfect here, though. The extra conductive mass yields lower peak temperatures in operation like this.
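To put a number on "low average dissipation", here is a minimal sketch that averages a rectangular pulse over its repetition period. The function name and the example figures (50 W for 5 s every 10 minutes) are made up for illustration, not taken from any particular part.

```python
def average_power(peak_w, pulse_s, period_s):
    """Average dissipation (W) of a rectangular pulse repeated every period_s seconds."""
    if pulse_s > period_s:
        raise ValueError("pulse cannot be longer than the repetition period")
    return peak_w * pulse_s / period_s

# Hypothetical numbers: a 50 W pulse lasting 5 s, repeated every 10 minutes.
avg = average_power(50.0, 5.0, 600.0)
print(f"{avg:.2f} W average")  # prints "0.42 W average"
```

The average is what the unheatsinked rating has to cover; the peak is what the thermal mass has to absorb.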
Don't forget to put a thermal switch or other limiting/lockout/protection on it! All too often I've seen, e.g. in industrial equipment, mains precharge resistors that didn't get switched out and then had to carry the full load by themselves, which, needless to say, they didn't take kindly to.
Regarding energy -- it's a sliding scale with respect to time. Obviously, for peak power at or below rated power, the maximum pulse width is infinite, and therefore so is the energy. So, as peak power goes up from there, the allowable energy becomes finite, and the curve looks hyperbolic: it blows up as peak power comes back down towards rated.
If the resistor body heated up uniformly, didn't dissipate heat during the pulse, and dissipated it only afterwards, then all the energy of the pulse would be stored as heat and the peak temperature would depend only on the total energy of the pulse. That would give peak power inversely proportional to time -- constant energy.
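A minimal sketch of that idealized (adiabatic, uniformly heated) case, just to show the arithmetic. The function names, the 10 g mass and the 250 J pulse are hypothetical; 900 J/(kg*K) is roughly the specific heat of aluminum.

```python
def adiabatic_rise(energy_j, mass_kg, specific_heat_j_per_kg_k):
    """Temperature rise (K) if all pulse energy stays in a uniformly heated body."""
    return energy_j / (mass_kg * specific_heat_j_per_kg_k)

def constant_energy_peak_power(energy_j, pulse_s):
    """Peak power (W) of a rectangular pulse that delivers a fixed energy."""
    return energy_j / pulse_s

# Hypothetical body: 10 g, specific heat 900 J/(kg*K), absorbing a 250 J pulse.
rise = adiabatic_rise(250.0, 0.010, 900.0)
print(f"{rise:.1f} K rise")  # prints "27.8 K rise"

# Same energy, different pulse widths: peak power goes as 1/t.
for pulse_s in (0.1, 1.0, 10.0):
    print(pulse_s, "s ->", constant_energy_peak_power(250.0, pulse_s), "W")
```

In this model the temperature rise depends only on the energy, so every pulse width in the loop heats the body by the same amount.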
But that's a silly case. Real resistors dissipate power when their outside is hot, and real resistors are made of materials which take time for heat to diffuse through them. So we expect the pulse energy to rise (eventually rising very sharply towards infinity) at lower peak power levels. We should expect modest overloads (2x say, but probably not 5x+) can be tolerated for a long time, presumably not forever (material degradation will kick in), but handling quite a lot of energy in the process, in any case. We should also expect that energy goes down at short time scales, where less of the resistor itself has been heated up (except for some wirewounds, and high-energy bulk/composition resistors).
As it happens, the response tends to be energy proportional to sqrt(t) or thereabouts (at peak powers much higher than rated, or equivalently: pulse durations much shorter than the overall thermal time constant). Which implies a diffusion mechanism, and indeed heat is diffusive, so that makes sense.
Or equivalently, peak power proportional to 1/sqrt(t).
Real datasheets show the energy curve having an exponent of 1/3 (i.e., cube root) to 1/2 (sqrt) in t, but it may be as low as 0 (constant energy).
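As a rough sketch of what those exponents mean in practice: if energy goes as t^n, then peak power goes as t^(n-1), so one known pulse rating can be scaled to another width. The function name and the reference point (1000 W for 1 ms) are hypothetical, and the scaling only makes sense in the short-pulse range, well below the overall thermal time constant.

```python
def scaled_peak_power(p_ref_w, t_ref_s, t_s, energy_exponent):
    """Scale a known pulse rating to another width, assuming E ~ t**energy_exponent.

    Since E = P * t, peak power scales as t**(energy_exponent - 1).
    """
    return p_ref_w * (t_s / t_ref_s) ** (energy_exponent - 1.0)

# Hypothetical reference point: 1000 W for 1 ms, scaled to a 100 ms pulse.
cases = [("constant energy", 0.0), ("cube root", 1.0 / 3.0), ("sqrt", 0.5)]
for name, n in cases:
    p = scaled_peak_power(1000.0, 1e-3, 100e-3, n)
    print(f"{name}: {p:.1f} W")
```

For a pulse 100 times longer, the three assumptions give roughly 10 W, 46 W and 100 W, so the choice of exponent matters a lot; check it against a datasheet curve for a similar part where you can.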
What use is this? For the present case: we're in the hyperbolic range, so it's not very useful. We only have one datapoint (e.g. 5x rating for 5s), which isn't enough information to extrapolate the curve.
For short pulses: we can infer some things about devices of similar construction. Given a datasheet for, say, a pulse-rated wirewound resistor, we should expect similar parts have similar pulse ratings, even if they aren't rated for it.
And that's given that we don't require a guarantee of operation for this part. An example might be mains surge arresting, or crowbarring a fuse, or other very infrequent stuff like that: the equipment might be expected to handle ten cycles in its entire lifetime, say. In that time, it's unlikely that the part will fail (and if it does, it will probably be due to, say, inferior terminal welding, or coatings that can't handle the thermal shock, or..).
Of course, the obvious exception that you'll inevitably encounter, is the one customer with super dirty power that fatigues your protection scheme to failure in a couple of months. Then it's a business decision: so-and-so many unhappy customers, or this-and-that added cost for the larger or pulse-rated part.
Whereas when reliability is required (say, test equipment, a surge generator that's literally doing this its entire life!), definitely opt for the pulse rated part.
Tim