Author Topic: Pulling my hair out. Circuit boards stop working once shipped to client and more  (Read 11973 times)


Online tggzzz

  • Super Contributor
  • ***
  • Posts: 20026
  • Country: gb
  • Numbers, not adjectives
    • Having fun doing more, with less
If I may, I would invest time in understanding WHAT has failed instead of trying to fix something you don't fully understand.

HantekDSO5102P is fine, you could choose also a Siglent SDS1052DL or a used classic Rigol DS1052E.
I know you will have to learn how to use it properly, but it's mandatory and not that difficult (if I can use it, anyone can). You can't live without it.

Once you have an oscilloscope, you can't (and you wouldn't) go back!

Imagine this: without that instrument you are blind.

IDK what I am looking for though with it.

I agree with your previous inference that you will be getting someone else to redesign it (viz. "Even so, would it even be worth it if I need to hire someone to remake the circuit anyway..."). That seems like a sensible course of action.

If the hardware is poorly designed/implemented, then any software you add would be building castles on sand.
There are lies, damned lies, statistics - and ADC/DAC specs.
Glider pilot's aphorism: "there is no substitute for span". Retort: "There is a substitute: skill+imagination. But you can buy span".
Having fun doing more, with less
 

Offline AndyC_772

  • Super Contributor
  • ***
  • Posts: 4255
  • Country: gb
  • Professional design engineer
    • Cawte Engineering | Reliable Electronics
If you think the FTDI chip might be involved, the next step is to work out possible ways in which it could cause the symptoms you're seeing.

Are there any physical pins on the FTDI chip that are also shared with pins which are needed to burn your boot loader?

Without knowing the details of your design, I'd suggest two possibilities - either:

a) there's a logic signal (or signals) in common. Are your programming (SPI / reset) pins connected to the FTDI, or are they separate? If they're completely separate, then it really shouldn't be able to interfere with boot loading via that route.

b) they share a common power supply, and something bad is happening which is causing the voltage at the MCU to go out of spec during programming. Does anything get warm?

Another option (c) is that the FTDI chip is a complete red herring, and the difference is caused by heating, cooling and flux contamination of your PCB when you remove and replace components. Be sure to thoroughly clean the board after every rework operation, especially in and around the MCU crystal if it has one.

Offline NivagSwerdna

  • Super Contributor
  • ***
  • Posts: 2497
  • Country: gb
Be careful to not conflate problems... looks like the earlier revision boards have a reliability issue but that needs proper post-mortem analysis.

Rev3 boards...
And once the PCBs arrived, none of them would allow me to burn the boot loader onto the ATMEGA via the programming header.
This is probably a different problem. Consider your design/manufacturing to be flawed and work from first principles on one board.  Does your PCB software have a rules check?  Check the uP pins for things that are GND that shouldn't be.
Populate one board with the bare minimum to program the uP onboard and work backwards.

Good Luck!
 

Offline ebastler

  • Super Contributor
  • ***
  • Posts: 6676
  • Country: de
Have you gotten boards back from customer? i.e. customer returns.
If you get a customer return board back on your bench does it work?

I second that question, and don't think the OP has answered it (unless I overlooked some comment). This is an important step in finding the root cause of the failures in the field.

@Jackster: Do you actually know that the boards somehow "broke" in transit? Or are they still in the same shape they were in when you sent them out, but don't work at the customers while they still work when you (re-)test them at home?
 

Offline JacksterTopic starter

  • Frequent Contributor
  • **
  • Posts: 469
  • Country: gb
    • PCBA.UK
I am not saying the FTDI chip is the cause btw. Just that when removed and replaced with a known working one, there are no boot loader issues.
I am pretty much on board with the issue being the board design. Probably just coincidence that the old FTDI is working and the new ones are not?
Only tested a handful.

Possible fake FTDI chips?  Hack a bad board to loop the FTDI's RX and TX pins, open the USB serial port with a terminal program and see if it responds with "NON GENUINE DEVICE FOUND!" character by character as you type, instead of the loopback echoing what you type.  (see: https://www.eevblog.com/forum/microcontrollers/ftdi-gate-2-0/ )

I don't think so. They all have different serial numbers. They were cheap though at $2.78 per chip.
Doing as you said, it just echoes back what I type.


If you think the FTDI chip might be involved, the next step is to work out possible ways in which it could cause the symptoms you're seeing.

Are there any physical pins on the FTDI chip that are also shared with pins which are needed to burn your boot loader?

Without knowing the details of your design, I'd suggest two possibilities - either:

a) there's a logic signal (or signals) in common. Are your programming (SPI / reset) pins connected to the FTDI, or are they separate? If they're completely separate, then it really shouldn't be able to interfere with boot loading via that route.

b) they share a common power supply, and something bad is happening which is causing the voltage at the MCU to go out of spec during programming. Does anything get warm?

Another option (c) is that the FTDI chip is a complete red herring, and the difference is caused by heating, cooling and flux contamination of your PCB when you remove and replace components. Be sure to thoroughly clean the board after every rework operation, especially in and around the MCU crystal if it has one.

No shared pins for boot loading other than RESET.


Boards are pretty clean. Been using acetone as that is all I have :/




Be careful to not conflate problems... looks like the earlier revision boards have a reliability issue but that needs proper post-mortem analysis.

Rev3 boards...
And once the PCBs arrived, none of them would allow me to burn the boot loader onto the ATMEGA via the programming header.
This is probably a different problem. Consider your design/manufacturing to be flawed and work from first principles on one board.  Does your PCB software have a rules check?  Check the uP pins for things that are GND that shouldn't be.
Populate one board with the bare minimum to program the uP onboard and work backwards.

Good Luck!

I tried with just the ATMega328p, crystal and cap on the reset pin.
Pretty sure it burned the boot loader. It was a few days ago and I forgot to write it down. I'll try again.



Have you gotten boards back from customer? i.e. customer returns.
If you get a customer return board back on your bench does it work?

I second that question, and don't think the OP has answered it (unless I overlooked some comment). This is an important step in finding the root cause of the failures in the field.

@Jackster: Do you actually know that the boards somehow "broke" in transit? Or are they still in the same shape they were in when you sent them out, but don't work at the customers while they still work when you (re-)test them at home?

The boards are sent out in an aluminium case.
They are physically fine. They just develop a fault where the software no longer cycles.

This can happen on new boards too. It will go through the code 3-6 times and then hang.


Offline Mr. Scram

  • Super Contributor
  • ***
  • Posts: 9810
  • Country: 00
  • Display aficionado
I remember many years ago, a separate State branch of the organisation I worked for were tasked with making boards that would automatically ring various phone numbers  when required.

We duly received our portion of those devices, but unfortunately they didn't work.
When we complained to the other State, they protested:
 "But we tested them & they all rang up who they were supposed to!"

Yup! They dutifully programmed the whole number needed to call those sites from their State into the PROMs.
Those additional numbers, of course, weren't needed in the State they were intended for, & would "freak the exchange out".
I'd argue that's an error on the exchange end, but you'll still have to deal with it.
 
The following users thanked this post: newbrain

Offline AndyC_772

  • Super Contributor
  • ***
  • Posts: 4255
  • Country: gb
  • Professional design engineer
    • Cawte Engineering | Reliable Electronics
Why have you got reset connected to DTR via a capacitor?

What happens if you remove that capacitor?

Offline ubbut

  • Regular Contributor
  • *
  • Posts: 100
  • Country: de
Did not read the whole thread just leaving this here:


ATMegaxxxPB devices do not support the 'high amplitude crystal oscillator mode' only the 'low power mode'.
If you set the fuse bits like for a PA device, some will work, others won't. But always unreliably. Usually first programming action is fine, but then they might just be unresponsive..
Cost me ~100 faulty atmegas until the problem was discovered.
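A quick way to check a low fuse byte against this pitfall is to decode its CKSEL bits. The sketch below is only an illustration: the CKSEL-to-clock-source mapping is taken from my reading of the ATmega328P datasheet and should be verified against the datasheet for the exact part, and the function names `clock_source` and `uses_full_swing` are made up for this example.

```python
def clock_source(low_fuse: int) -> str:
    """Decode CKSEL3..0 (low nibble of the low fuse byte).

    Assumption: mapping as listed in the ATmega328P datasheet;
    check it against the datasheet of the part actually fitted.
    """
    cksel = low_fuse & 0x0F
    if cksel >= 0b1000:
        return "low-power crystal oscillator"
    if cksel in (0b0110, 0b0111):
        return "full-swing crystal oscillator"
    if cksel in (0b0100, 0b0101):
        return "low-frequency crystal oscillator"
    if cksel == 0b0011:
        return "internal 128 kHz RC oscillator"
    if cksel == 0b0010:
        return "calibrated internal RC oscillator"
    if cksel == 0b0000:
        return "external clock"
    return "reserved"


def uses_full_swing(low_fuse: int) -> bool:
    """True if the fuse byte selects the full-swing ("high amplitude") mode."""
    return clock_source(low_fuse) == "full-swing crystal oscillator"
```

If `uses_full_swing()` returns True for the fuse value being written to a PB part, that would match the unreliable behaviour described above.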
 

Offline NivagSwerdna

  • Super Contributor
  • ***
  • Posts: 2497
  • Country: gb
FWIW I have a high value R across my XTAL... no idea why... think I must have stolen the idea from the Uno reference design.

.... but https://forum.arduino.cc/index.php?topic=176297.0 so I wouldn't lose any sleep over that one  :D
« Last Edit: June 07, 2019, 12:48:12 pm by NivagSwerdna »
 

Offline oPossum

  • Super Contributor
  • ***
  • Posts: 1428
  • Country: us
  • Very dangerous - may attack at any time
Why have you got reset connected to DTR via a capacitor?

Copied from Arduino. Not a good design IMO.
 

Offline mcinque

  • Supporter
  • ****
  • Posts: 1129
  • Country: it
  • I know that I know nothing
I'm puzzled.

We don't know what the device in question does exactly. We know only it uses Wi-Fi and USB.
We have no data on what is wrong on the board or what does not work (hardware? software?). We know only that some boards don't work anymore.
We don't know in which mode the boards fail (what should they do? They work partially? They don't turn on? The microcontroller won't start up? The Wi-Fi does not connect? There is no output? What?)
There is no diagnostic data except "Power is fine, sensor input is fine".

I don't think we can guess the problem because there are so many possibilities, from fake chips to esd to user fault... the list goes on and on.

I think this is not the right way to proceed. We need data.

Can you post here (or privately via PM) a full schematic and at least a working diagram of your sketch and principle of operation, together with detailed symptoms?

There are many people here who want to help you, but try to provide something more to work on.
 
The following users thanked this post: Cubdriver, newbrain, OwO

Offline mcinque

  • Supporter
  • ****
  • Posts: 1129
  • Country: it
  • I know that I know nothing
It will go through the code 3-6 times and then hang.
THIS. You must understand why it hangs. If it runs 3-6 times, the hardware is probably not "broken" (in the sense of burnt or physically damaged). More likely something on the hardware is changing, or running at its limit, and is affecting what the software reads.

Could be a power issue or an input issue.

Assuming the power is perfectly fine and there isn't some glitch, noise or bounce affecting the MCU while the code is running (which you can only be sure of with a DSO), you should use a debugger to understand what's happening inside the MCU.

If you cannot use a debugger, sprinkle your code liberally with status messages and output them over serial as a "poor man's debugger", so you can see at which point it hangs.

Depending on what the code is doing when it happens, you can make a better guess at the cause.

But be sure about the power rail.
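To make the "poor man's debugger" idea concrete, here is a minimal host-side sketch. It assumes the firmware prints one line per checkpoint in the form `CP:<name>` over serial and that the output has been captured to a list of lines; the `CP:` prefix, the checkpoint names and the function `last_checkpoint` are all hypothetical, not something from the actual project.

```python
def last_checkpoint(log_lines, prefix="CP:"):
    """Return (last checkpoint seen, how often each checkpoint was hit).

    Lines that do not start with the (assumed) prefix are ignored,
    so normal program output can be mixed into the same log.
    """
    last = None
    counts = {}
    for line in log_lines:
        line = line.strip()
        if not line.startswith(prefix):
            continue
        last = line[len(prefix):].strip()
        counts[last] = counts.get(last, 0) + 1
    return last, counts


# Example capture: three full cycles, then the unit goes quiet.
cycle = ["CP:loop_start", "CP:read_sensor", "CP:update_display", "CP:wifi_send"]
captured = cycle * 3 + ["CP:loop_start", "CP:read_sensor"]
last, counts = last_checkpoint(captured)
print("last checkpoint before hang:", last)
print("completed cycles:", counts.get("wifi_send", 0))
```

The last checkpoint printed before the log goes quiet narrows the hang down to the code between that marker and the next one, which is where to look first.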
« Last Edit: June 07, 2019, 03:02:25 pm by mcinque »
 

Offline Mr. Scram

  • Super Contributor
  • ***
  • Posts: 9810
  • Country: 00
  • Display aficionado
What a messy debugging process this is, if you can call it that. You really need to describe what you're dealing with in as much detail as possible. Then you systematically start eliminating potential issues, the most likely first. Right now it's just a haphazard scramble with major parts left in the dark.
 

Offline JacksterTopic starter

  • Frequent Contributor
  • **
  • Posts: 469
  • Country: gb
    • PCBA.UK
I'm puzzled.

We don't know what the device in question does exactly. We know only it uses Wi-Fi and USB.
We have no data on what is wrong on the board or what does not work (hardware? software?). We know only that some boards don't work anymore.
We don't know in which mode the boards fail (what should they do? They work partially? They don't turn on? The microcontroller won't start up? The Wi-Fi does not connect? There is no output? What?)
There is no diagnostic data except "Power is fine, sensor input is fine".

I don't think we can guess the problem because there are so many possibilities, from fake chips to esd to user fault... the list goes on and on.

I think this is not the right way to proceed. We need data.

Can you post here (or privately via PM) a full schematic and at least a working diagram of your sketch and principle of operation, together with detailed symptoms?

There are many people here who want to help you, but try to provide something more to work on.


I can't go into any more detail into the debugging as I am not an electronic engineer and can only go as deep as "it is not burning the boot loader" and "it stops after X many cycles".
I know what each part of the circuit does, but not how it works at the level required for debugging.

The device takes a PWM input from a sensor and displays the result on 7 segment displays.
It can transmit this info over the WiFi interface to another that will take the data from the WiFi and display it on its seven segment display.



Happy to PM the files.

Offline ebastler

  • Super Contributor
  • ***
  • Posts: 6676
  • Country: de
The device takes a PWM input from a sensor and displays the result on 7 segment displays.
It can transmit this info over the WiFi interface to another that will take the data from the WiFi and display it on its seven segment display.

Is that the unit conversion device for sonar measurements which you had posted about earlier, by any chance? These are used on boats, I would assume. Are you sure they handle the vibrations and humidity well?

(Is this maybe also the "three boards connected via FFC connectors" design you have also asked questions about in earlier threads? If so, are you sure the connectors are robust enough, and are you sure the signals make it across the connections in good shape?)

May I add a personal comment:  It would not hurt if you had the courtesy to let us know what your product is and does, and give us a link to the website you presumably have. You are selling these for profit, it seems, and are asking for free advice here. At least satisfy our curiosity in return; and the information may help with the troubleshooting as well.
 
The following users thanked this post: Ysjoelfir

Offline DimitriP

  • Super Contributor
  • ***
  • Posts: 1360
  • Country: us
  • "Best practices" are best not practiced.© Dimitri
Quote
I'm puzzled.
The lack of an oscilloscope is also a bit puzzling. 
   If three 100  Ohm resistors are connected in parallel, and in series with a 200 Ohm resistor, how many resistors do you have? 
 
The following users thanked this post: Cubdriver, InductorbackEMF

Offline JacksterTopic starter

  • Frequent Contributor
  • **
  • Posts: 469
  • Country: gb
    • PCBA.UK
The device takes a PWM input from a sensor and displays the result on 7 segment displays.
It can transmit this info over the WiFi interface to another that will take the data from the WiFi and display it on its seven segment display.

Is that the unit conversion device for sonar measurements which you had posted about earlier, by any chance? These are used on boats, I would assume. Are you sure they handle the vibrations and humidity well?

(Is this maybe also the "three boards connected via FFC connectors" design you have also asked questions about in earlier threads? If so, are you sure the connectors are robust enough, and are you sure the signals make it across the connections in good shape?)

May I add a personal comment:  It would not hurt if you had the courtesy to let us know what your product is and does, and give us a link to the website you presumably have. You are selling these for profit, it seems, and are asking for free advice here. At least satisfy our curiosity in return; and the information may help with the troubleshooting as well.


Yea a few years back. Same project but a lot further on. Not used on boats, and the boards are conformally coated.


I am not using the FFC connectors. That design was stupid and over complex. Moved all the processing to a single board and now use 4 pin cables with JST connectors to transmit power and data.
Note that this is for a different board. This topic is about the other board I make.


I very much don't want to link the product. This is obviously not a good look to have.
Yes I do sell for profit, I hope I made that clear in the OP. This topic was more for knowing what to do next rather than actual debugging which I think is above my skill level.

As mentioned previously, quite happy to pay for someone to take a look and re-make the boards.


Quote
I'm puzzled.
The lack of an oscilloscope is also a bit puzzling.

I would not know what to do with it..

Offline GreggD

  • Regular Contributor
  • *
  • Posts: 136
  • Country: us
You might want to inject an external oscillator into the Atmel crystal input pin. Then try to program and set the crystal drive level fuses. Works for me.
 

Offline soldar

  • Super Contributor
  • ***
  • Posts: 3505
  • Country: es
... my guess would be oscillator startup issues - assuming you use a crystal or ceramic resonator. This would be consistent with inability to program and some units working and some not.
...
If you have a dead board in front of you, poke the oscillator pins and see if it starts.
good thinking!
All my posts are made with 100% recycled electrons and bare traces of grey matter.
 

Online langwadt

  • Super Contributor
  • ***
  • Posts: 4545
  • Country: dk
I am not saying the FTDI chip is the cause btw. Just that when removed and replaced with a known working one, there are no boot loader issues.
I am pretty much on board with the issue being the board design. Probably just coincidence that the old FTDI is working and the new ones are not?
Only tested a handful.

Possible fake FTDI chips?  Hack a bad board to loop the FTDI's RX and TX pins, open the USB serial port with a terminal program and see if it responds with "NON GENUINE DEVICE FOUND!" character by character as you type, instead of the loopback echoing what you type.  (see: https://www.eevblog.com/forum/microcontrollers/ftdi-gate-2-0/ )

I don't think so. They all have different serial numbers. They were cheap though at $2.78 per chip.
Doing as you said, it just echoes back what I type.


the FTDI "NON GENUINE DEVICE FOUND!" is a driver thing so it might work just fine with an old driver
and fail if using a newer driver



 

Offline Ian.M

  • Super Contributor
  • ***
  • Posts: 12977
OTOH that particular FTDI Windows driver quirk will definitely cause AVRDUDE to fail to communicate with a serial bootloader in an ATmega, without any other evidence of why it has failed.   If you are even slightly suspicious of the authenticity of an FTDI USB<=>serial chip, you are using a Windows PC, and the FTDI driver is more recent than the 'FTDIgate' event, doing that loopback test is essential.   Don't bother if you are using Linux or a Mac - it's a Windows-driver-only issue.

If you still have FTDI on your 'design in' list in spite of their shenanigans, and you aren't a big enough player to buy direct from FTDI to avoid supply chain contamination, IMHO it's essential to design your board to make it easy to do that loopback test - a loopback jumper would be a reasonable choice.
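If the loopback test is going to be repeated on many boards, the judgement "did it echo or did the driver substitute the warning?" can be scripted. The sketch below only classifies bytes that were already sent and read back; how they are captured (a terminal log, a serial library) is left open. The function name `classify_loopback` and its return labels are made up for illustration, and treating any slice of the repeated warning text as a match is an assumption about how the driver behaves.

```python
WARNING = b"NON GENUINE DEVICE FOUND!"


def classify_loopback(sent: bytes, received: bytes) -> str:
    """Classify what came back over a TX->RX loopback jumper.

    Send several characters; a single byte is weak evidence either way.
    """
    if not sent:
        raise ValueError("send at least one byte")
    if received == sent:
        return "echo"            # normal loopback behaviour
    if not received:
        return "silent"          # nothing came back: check the jumper
    # Assumption: the driver feeds back the warning text one character
    # per byte sent, so the reply is some slice of the repeated text.
    repeated = WARNING * (len(received) // len(WARNING) + 2)
    if received in repeated:
        return "driver-warning"
    return "corrupted"
```

A result of "corrupted" rather than "driver-warning" would point towards a signal or baud rate problem instead of the driver quirk.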

Another possibility with FTDI clones is failure at higher baud rates.  It only takes one lost or corrupted character in a few thousand to totally screw up bootloading, so you may wish to try a lower baud rate.  However the bootloader *MUST* support the baud rate you chose as both ends need to match.
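One related and well-known reason lower baud rates are more forgiving is the baud rate error on the AVR side: the USART divides the CPU clock by an integer, so some rates cannot be hit exactly. The sketch below uses the standard AVR formula (baud = f_cpu / (16 * (UBRR + 1)), or 8 instead of 16 in double-speed mode). The 16 MHz clock in the usage note is an assumption, since the thread does not state the crystal frequency, and `avr_baud_error` is a hypothetical helper name.

```python
def avr_baud_error(f_cpu_hz: float, baud: int, double_speed: bool = False):
    """Return (UBRR value, actual baud rate, error in percent).

    Uses the nearest integer UBRR for the requested baud rate.
    """
    divisor = 8 if double_speed else 16
    ubrr = int(f_cpu_hz / (divisor * baud) + 0.5) - 1
    actual = f_cpu_hz / (divisor * (ubrr + 1))
    error_pct = (actual - baud) / baud * 100.0
    return ubrr, actual, error_pct
```

Assuming a 16 MHz clock, `avr_baud_error(16_000_000, 115200)` gives UBRR 8 and an error of about -3.5 %, double-speed mode improves that to about +2.1 %, and 9600 baud comes out near +0.2 %. An error of a few percent leaves little margin if the USB-serial chip is also slightly off.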
« Last Edit: June 08, 2019, 12:00:21 am by Ian.M »
 

Offline NivagSwerdna

  • Super Contributor
  • ***
  • Posts: 2497
  • Country: gb
This is a fun thread... it has something for everyone and not enough information for any proper conclusions...

.. The schematic showed an ICSP header... I presume that is what is being used to program the uP?  The requirements for ICSP are minimal... if it isn't working on Rev3 boards you either have some extra shorting to ground (due to errant ground pour) or dodgy chips.

Ignore any talk about Flux, ESD etc... until all the obvious has been eliminated.  For now just use a multimeter on continuity to check for any ground shorts on Rev3 boards.
 

Offline JacksterTopic starter

  • Frequent Contributor
  • **
  • Posts: 469
  • Country: gb
    • PCBA.UK
This is a fun thread... it has something for everyone and not enough information for any proper conclusions...

.. The schematic showed an ICSP header... I presume that is what is being used to program the uP?  The requirements for ICSP are minimal... if it isn't working on Rev3 boards you either have some extra shorting to ground (due to errant ground pour) or dodgy chips.

Ignore any talk about Flux, ESD etc... until all the obvious has been eliminated.  For now just use a multimeter on continuity to check for any ground shorts on Rev3 boards.

Yes the ICSP header is for programming.

I can't find any pins on the MCU that are grounded that should not be.

Offline mcinque

  • Supporter
  • ****
  • Posts: 1129
  • Country: it
  • I know that I know nothing
it has something for everyone and not enough information for any proper conclusions...
In my opinion it is lacking something. Not enough data.
I renew my offer: PM complete schematics and at least a flowchart of the principle of operation (not the sketch), together with the failure symptoms (what the board should do and what it does instead).
 

Offline thinkfat

  • Supporter
  • ****
  • Posts: 2154
  • Country: de
  • This is just a hobby I spend too much time on.
    • Matthias' Hackerstübchen
If you look at the top of the package of the ATmega from LCSC, it looks like the chamfer is a lot thinner, close to nonexistent. Were I to venture a guess, I'd say the top face of the chip has been milled or ground down a few thou to get rid of the marking and then a new marking has been stamped on. It's hard to see in the photo, however. To get a better image, maybe wet the chip with IPA and try a different light angle.
Everybody likes gadgets. Until they try to make them.
 

