From what I can see, the 1k resistor on the gate is too much. Considering the 430 pF gate capacitance and the fact that you are doing PWM, it is not good at all.
It's there to limit the current from the microcontroller pin controlling the gate. If you look at the datasheet, you'll see that any Vgs north of 2 V is more than sufficient to turn it on. From a 5 V micro, through the 1k resistor, the gate takes about 250 ns to reach that level (confirmed with a scope).
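As a rough sanity check on that figure, here is a minimal sketch of the RC charging estimate. It assumes the gate behaves as a fixed 430 pF lumped capacitance charged from 0 V through the 1k resistor, and it ignores the Miller plateau and the MCU pin's own output resistance, so it is only a ballpark. The function and constant names are made up for illustration.

```python
import math

def time_to_reach(v_target, v_drive, r_ohms, c_farads):
    """Time for an RC-charged node starting at 0 V to reach v_target.

    Uses v(t) = v_drive * (1 - exp(-t / (R * C))), solved for t.
    """
    if not 0 <= v_target < v_drive:
        raise ValueError("v_target must be in [0, v_drive)")
    return -r_ohms * c_farads * math.log(1.0 - v_target / v_drive)

# Values taken from the thread; the simple RC model itself is an assumption.
R_GATE = 1e3        # gate resistor, ohms
C_GATE = 430e-12    # gate capacitance, farads
V_DRIVE = 5.0       # MCU output high level, volts
V_ON = 2.0          # Vgs considered sufficient to turn on, volts

tau = R_GATE * C_GATE
t_on = time_to_reach(V_ON, V_DRIVE, R_GATE, C_GATE)
print(f"tau = {tau * 1e9:.0f} ns, time to reach {V_ON:.0f} V = {t_on * 1e9:.0f} ns")
# prints: tau = 430 ns, time to reach 2 V = 220 ns
```

The simple model gives roughly 220 ns, which is in the same range as the roughly 250 ns seen on the scope.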
I'm running a pseudo-random PWM signal through it at around 1 MHz, and have no issues with heat dissipation whatsoever, verified with a thermal imager. We also don't see failures happening during operation once the two boards (LED carrier and LED controller) are connected, operational, and tested; they are only spotted at the first test immediately after connecting the boards together.
Also what is that current control in series with the source?
It's an opamp/mosfet based constant current source (CCS), pretty similar to this:
You must apply the voltage between gate and source, and sticking something in between them may have all kinds of effects.
Yes, but when the control FET (the one on top) is off, the voltage at the input of the CCS is 0 V. When it's on, the voltage is between 0 and 400 mV, depending on current. Since I'm driving the FET gate at 5 V (ground referenced), there aren't any issues with the gate voltage not being high enough to fully turn on the FET.
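To put numbers on that, a small sketch of the worst-case gate-source voltage. It assumes the source of the control FET sits at the CCS input voltage and takes the 2 V turn-on figure quoted earlier in the thread; the names are illustrative only.

```python
# Values from the thread; treating the FET source as sitting exactly at the
# CCS input voltage is an assumption of this sketch.
V_GATE_DRIVE = 5.0      # gate drive, ground referenced, volts
V_ON_THRESHOLD = 2.0    # Vgs considered sufficient to turn on, volts
CCS_DROP_MAX = 0.4      # highest CCS input voltage while conducting, volts

def vgs(v_gate, v_source):
    """Gate-source voltage when both nodes are measured against ground."""
    return v_gate - v_source

worst_case_vgs = vgs(V_GATE_DRIVE, CCS_DROP_MAX)
margin = worst_case_vgs - V_ON_THRESHOLD
print(f"worst-case Vgs = {worst_case_vgs:.1f} V, margin = {margin:.1f} V")
# prints: worst-case Vgs = 4.6 V, margin = 2.6 V
```

Even at the full 400 mV across the CCS, Vgs stays around 4.6 V.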
Did these boards make it through test and get delivered to the end customer? What is the human interface that turns the LED on and off, or is there no human interaction and it's all processor controlled?
Just an MCU driving the gate. Boards were tested, and those that had issues with this were fixed. Then, after installation (and handling), a few ended up with the same issue again.
Connectors that have exposed pins are a possibility. But if you really believe it's ESD, an operator would have to be able to inject into the gate in some way.
My intuition is that there was some ESD damage during handling, but it's not clear if it was injected at the gate or the drain. The drain is exposed on the connector, making it the most likely place for ESD, but the boards didn't have a conformal coating or anything, so there is definitely a chance that the gate could have been touched at some point during handling of the boards.
The big question for me is how to prevent this from happening in the future, so I guess what I'm asking is: will an ESD event at the drain affect the transistor at all, or is it much more likely that this happened because of ESD on the gate?