Note: I am adding to this list as I think of things, so you may want to re-read it in case I have added more since you last read it.
Questions
- Have you got any units back that a client says don't work? If so, do they work for you or are they dead?
- Is your QC test automated? If so, are you sure it is not faulty and letting dead units out the door? Maybe a manual test is needed so you know for sure that every unit leaving you is 100% working.
- Is there anything unusual about your location or the places you ship to? Some places irradiate all their mail with high-energy X-rays, and this can destroy electronics.
- Does the MCU/software system interface with (talk to) something that differs in different parts of the world? Here are examples of what I mean: maybe Bluetooth to a phone, or maybe comms to a desktop PC app. Maybe some people have their phone/desktop set to a different timezone/country/language and maybe your app is incompatible with this.
- Do you have the source code for your ATmega, or does the freelancer hold that?
A few possibilities I can think of.
- Are users connecting the battery the wrong way around and damaging the device? (9V the wrong way around for a split second, that sort of thing.)
- Do you have any floating input pins on the MCU? Maybe the code only works if an input is read as steadily low or steadily high, but it keeps changing with ambient noise. When you build and test it, it might stay in one state, but in noisy environments maybe it floats high and stops the code running. (Floating inputs should have the MCU pull-ups enabled in software, but maybe they are not set in your code?)
- Have you tried powering the device from 4.5V with, let's say, a 50mA current limit? Not all USB ports are created equal. Maybe your product is quite critical on power and not all USB ports can power it.
- Where are you getting your parts from? Maybe you are getting lots of fake ICs.
- Could be a PCB track routing issue, where tracks run too close to a hole or board edge and sometimes get cut by the drill/router etc. Some PCBs work, some don't, some are intermittent.
- Are you sure you have the ATmega fuse bits set correctly? Maybe the startup delay, brownout detector or crystal settings are wrong and this is making it run intermittently.
- Does the product have protection from ESD or PSU spikes, like a TVS? Does the product get used in a location where it might need this, e.g. automotive/industrial?
- How are you programming the MCUs? I once had a crappy USBASP programmer that would brick 2 out of 5 AVRs it flashed. Not sure why, maybe the clock was out of spec and it kept erasing fuse bits.
- Does your MCU programming system include a verify check?
- There is one AVR MCU, can't remember which, that comes with a fuse bit set to put it into a compatibility mode where it pretends to be a different AVR chip. Some of the IO/peripherals don't work until you get it out of that mode.
(My guess is they have a supply agreement to sell a compatible chip for 25 years for MIL/MED/AERO.)
EDIT: All ATmega128s pretend to be an ATmega103 until you change the M103C fuse bit.
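Two of the points above (floating inputs and the M103C fuse) are easy to show in code. This is only a rough sketch, not your freelancer's code: the function names are made up by me, and the registers are passed in as pointers so the same code can be tried on a PC. It assumes a classic ATmega where DDxn = 0 with PORTxn = 1 means "input with pull-up" (and the global pull-up disable bit, PUD, is not set), and the ATmega128 layout where M103C is bit 1 of the extended fuse byte.

```c
#include <stdint.h>
#include <stdbool.h>

/* Make every pin in `mask` an input with its internal pull-up enabled.
 * Hypothetical helper: on the real chip you would pass e.g. &DDRB and &PORTB. */
static void enable_input_pullups(volatile uint8_t *ddr,
                                 volatile uint8_t *port,
                                 uint8_t mask)
{
    *ddr  &= (uint8_t)~mask;  /* direction bit 0 = input */
    *port |= mask;            /* PORT bit 1 on an input = pull-up on */
}

/* ATmega128 extended fuse byte: bit 1 is M103C.
 * Fuse bits read back as 0 when programmed, so 0 here means the chip
 * is still in ATmega103 compatibility mode. */
#define EFUSE_M103C_BIT 1u

static bool m103c_compat_active(uint8_t efuse)
{
    return (efuse & (1u << EFUSE_M103C_BIT)) == 0;
}
```

On the target you would call something like enable_input_pullups(&DDRB, &PORTB, 0x0F) at startup (port and mask are just examples) for any input that has nothing driving it. For the fuse, read the extended fuse byte back with your programmer: 0xFD (the ATmega128 factory default) means compatibility mode is still on, 0xFF means it is off.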
And once the PCBs arrived, none of them would allow me to burn the boot loader onto the ATMEGA via the programming header.
Had to pull all the ATMEGAs off and flash them in a socket.
Put them back on and 90% of the boards I made just don't work as expected.
This makes me lean towards a PCB/SCH issue.
Are you using the MISO, MOSI and SCK pins for anything other than programming?
You can use them for other things too, but you need to make sure you don't load the lines so much that programming is affected.
It can become intermittent if you load them or have caps on the lines to GND.
Also, grab one of those boards that doesn't program and use a DMM to check the tracks between the programming header pins (GND, VCC, MOSI, MISO, SCK, RESET) and the ATmega pads for those pins. Also check that none of them are shorted together.
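To put a rough number on the point about caps on the programming lines: a cap to GND plus whatever series resistance is in the path forms an RC low-pass, and its 10% to 90% rise time is about 2.2 x R x C. A back-of-envelope sketch follows. The function names and the example values are mine, not from your design, and the idea that an edge should take only a small fraction of an SCK half period is just a rule of thumb I am assuming.

```c
/* 10%-90% rise time of a simple RC low-pass, in seconds. */
static double rc_rise_time_s(double r_ohm, double c_farad)
{
    return 2.2 * r_ohm * c_farad;
}

/* How much of one SCK half period the edge eats up.
 * Well below 1.0 is comfortable; near or above 1.0 means the line
 * never settles before the next clock edge. */
static double edge_fraction(double r_ohm, double c_farad, double sck_hz)
{
    double half_period_s = 0.5 / sck_hz;
    return rc_rise_time_s(r_ohm, c_farad) / half_period_s;
}
```

For example, edge_fraction(1000.0, 100e-9, 125e3) (a hypothetical 1k of series resistance, 100nF to GND and a 125kHz programming clock) comes out far above 1, which is the sort of thing that stops in-circuit programming dead. Slowing the programmer's SCK right down is a quick way to test whether loading is the problem.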
Help
- Are your PCB files in Altium? If so, I'm happy to take a look at your SCH/PCB/CODE and see if I can spot any potential problems.