Topic: Pulling my hair out. Circuit boards stop working once shipped to client and more

Offline JacksterTopic starter

  • Frequent Contributor
  • **
  • Posts: 469
  • Country: gb
    • PCBA.UK
Turns out PCBway messed up a bit.

I was unable to see this with my eyes at first but now I have seen it, it is obvious.



The green PCB is correct. The Black one is one from the bad batch from PCBway.
The spacing between the holes is 1.27mm and the distance between the solder pads should be 0.375mm.


Not saying my problems are all their fault.
We have worked out what was causing the other programming issues that affected other boards in earlier batches.
I am working on improving those issues now with a Tag-Connect connector for programming.

I know there are a few other things that are not correct but we will see if they need fixing.


Thank you all for the support.
Learnt a lot.

Offline bd139

  • Super Contributor
  • ***
  • Posts: 23045
  • Country: gb
Might want to look at JLCPCB. They actually test the boards...
 

Offline JacksterTopic starter

  • Frequent Contributor
  • **
  • Posts: 469
  • Country: gb
    • PCBA.UK
The green board is from them.

I have had hundreds of boards from PCBway and this is the first issue I have had.

Offline mcinque

  • Supporter
  • ****
  • Posts: 1129
  • Country: it
  • I know that I know nothing
Thank you for letting us know about the issue

I have had hundreds of boards from PCBway and this is the first issue I have had.

As a suggestion for the future, remember to always order board testing for production boards.


 

Offline JacksterTopic starter

  • Frequent Contributor
  • **
  • Posts: 469
  • Country: gb
    • PCBA.UK
Thank you for letting us know about the issue

I have had hundreds of boards from PCBway and this is the first issue I have had.

As a suggestion for the future, remember to always order board testing for production boards.

I thought they did do that over x number of boards ordered?

Offline mcinque

  • Supporter
  • ****
  • Posts: 1129
  • Country: it
  • I know that I know nothing
I thought they did do that over x number of boards ordered?
Probably. But it's the "every X boards" that makes the difference.

Of course, class 2 testing (all traces are tested individually) is not included.

 

Offline AndyC_772

  • Super Contributor
  • ***
  • Posts: 4255
  • Country: gb
  • Professional design engineer
    • Cawte Engineering | Reliable Electronics
The shorted pins on the Wifi connector are certainly interesting, but they don't obviously explain your symptoms.

You've described cases where boards work for you but then stop working for your customer, or where swapping an apparently unrelated part can allow your CPU to program correctly (or not). Those symptoms aren't consistent with a bunch of shorted pins.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Some years ago, I was working for a company making fairly complex ISA cards for PCs. On one batch, we were seeing roughly a 25% failure rate during testing, and we traced the fault to a broken PCB trace in the exact same position on every board.

The boards were made locally by a reputable supplier, who invited us in for a meeting to discuss what had happened.

As it turned out, the board was made 4 up on a panel, and the master artwork for one of the layers had a scratch in just the same position as our fault. We pointed out that the boards were supposed to be 100% tested before shipment.

Upon questioning, the operator doing the testing admitted that he'd removed the test of that particular net because it was failing too often. He had neither checked a board manually by himself, nor told anyone else that there was a problem. The defect was on an outer layer, plainly visible and easy to check with a multimeter, so no excuse whatsoever.

If I recall correctly, the operator in question was fired, the artwork was reprinted and we had no more issues until the board was discontinued some years later.

Bare board test: *always* do it, *usually* believe it.
 
The following users thanked this post: thm_w, Siwastaja

Offline free_electron

  • Super Contributor
  • ***
  • Posts: 8520
  • Country: us
    • SiliconValleyGarage
a couple of other things ( apart from the fouled up pcb spacing )

- not enough bulk capacitance in the design
- not enough local capacitance in design
- the crystal you use is a resonator. your processor fuse bits may need to be tuned for that ! check what the load capacitance and bleed resistor is in those things. you can buy those in different variants and the tuning needs to be done.
- aluminum case.... do you connect that electrically to your system ground ?
Professional Electron Wrangler.
Any comments, or points of view expressed, are my own and not endorsed , induced or compensated by my employer(s).
 

Offline mcinque

  • Supporter
  • ****
  • Posts: 1129
  • Country: it
  • I know that I know nothing
Upon questioning, the operator doing the testing admitted that he'd removed the test of that particular net because it was failing too often. He had neither checked a board manually by himself, nor told anyone else that there was a problem.
This is insane.
 

Offline bd139

  • Super Contributor
  • ***
  • Posts: 23045
  • Country: gb
Upon questioning, the operator doing the testing admitted that he'd removed the test of that particular net because it was failing too often. He had neither checked a board manually by himself, nor told anyone else that there was a problem.
This is insane.

If you think that’s insane come to the software industry. Unit test not working? Delete!
 

Offline JacksterTopic starter

  • Frequent Contributor
  • **
  • Posts: 469
  • Country: gb
    • PCBA.UK
a couple of other things ( apart from the fouled up pcb spacing )

- not enough bulk capacitance in the design
- not enough local capacitance in design
- the crystal you use is a resonator. your processor fuse bits may need to be tuned for that ! check what the load capacitance and bleed resistor is in those things. you can buy those in different variants and the tuning needs to be done.
- aluminum case.... do you connect that electrically to your system ground ?

I have copied the Arduino Nano circuit. Is their design wrong, or am I not placing the caps in the correct place?

The case is anodised so is not conductive. So no I do not. Should I?

Offline free_electron

  • Super Contributor
  • ***
  • Posts: 8520
  • Country: us
    • SiliconValleyGarage
Arduino is not exactly a case of 'proper' design. They take too many shortcuts.
That aside , what else is on your board. Anything drawing pulsed currents like muxed displays , rf transmitters etc ?
Professional Electron Wrangler.
Any comments, or points of view expressed, are my own and not endorsed , induced or compensated by my employer(s).
 

Offline JacksterTopic starter

  • Frequent Contributor
  • **
  • Posts: 469
  • Country: gb
    • PCBA.UK
Arduino is not exactly a case of 'proper' design. They take too many shortcuts.
That aside , what else is on your board. Anything drawing pulsed currents like muxed displays , rf transmitters etc ?

Yea I have 4-8 seven-segment displays and an nRF24L01.

Offline Psi

  • Super Contributor
  • ***
  • Posts: 10099
  • Country: nz
Arduino is not exactly a case of 'proper' design. They take too many shortcuts.
That aside , what else is on your board. Anything drawing pulsed currents like muxed displays , rf transmitters etc ?

The main problem with using arduino in a professional product is obscure bugs in the libraries.
They are written by random people who may not be very good at it.
Greek letter 'Psi' (not Pounds per Square Inch)
 

Offline OwO

  • Super Contributor
  • ***
  • Posts: 1250
  • Country: cn
  • RF Engineer.
The green PCB is correct. The Black one is one from the bad batch from PCBway.
The spacing between the holes is 1.27mm and the distance between the solder pads should be 0.375mm.

Check what the annular ring width is. It's possible their software enlarged the pads because the annular ring spec was violated.
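
To make the annular ring point concrete, here is a minimal sketch of the arithmetic. Only the 1.27mm pitch and the 0.375mm pad gap come from this thread; the hole diameter and the fab's minimum ring are assumed values chosen purely for illustration, and `enlarge_for_ring` is a hypothetical stand-in for whatever the CAM software actually does.

```python
def pad_gap(pitch_mm, pad_dia_mm):
    """Copper-to-copper gap between two round pads on a given pitch."""
    return pitch_mm - pad_dia_mm


def annular_ring(pad_dia_mm, hole_dia_mm):
    """Width of the copper ring left around a drilled hole."""
    return (pad_dia_mm - hole_dia_mm) / 2


def enlarge_for_ring(pad_dia_mm, hole_dia_mm, min_ring_mm):
    """Pad diameter after a (hypothetical) CAM step enforcing a minimum ring."""
    return max(pad_dia_mm, hole_dia_mm + 2 * min_ring_mm)


PITCH_MM = 1.27          # from the thread
DESIGN_GAP_MM = 0.375    # from the thread
HOLE_MM = 0.70           # ASSUMED finished hole size
MIN_RING_MM = 0.15       # ASSUMED fab minimum annular ring

design_pad = PITCH_MM - DESIGN_GAP_MM                  # about 0.895 mm
design_ring = annular_ring(design_pad, HOLE_MM)         # about 0.0975 mm
fab_pad = enlarge_for_ring(design_pad, HOLE_MM, MIN_RING_MM)
fab_gap = pad_gap(PITCH_MM, fab_pad)

print(f"design pad {design_pad:.3f} mm, ring {design_ring:.4f} mm")
print(f"after enforcing ring: pad {fab_pad:.3f} mm, gap {fab_gap:.3f} mm")
```

With these assumed numbers the designed ring is below the minimum, so the pad grows to about 1.0mm and the gap shrinks from 0.375mm to about 0.27mm. Every 0.1mm of extra pad diameter removes 0.1mm of gap, which is how a silent pad enlargement can turn into near-shorts on a fine-pitch header.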

If it took this long to triage a fault in production as simple as some shorted pins, then I would say the process needs some work too and not just the design. Were these first few boards soldered manually or did you put the whole batch to automated assembly? The contractor I use always assembles one board from each batch by hand as a sanity check before starting any automated assembly.
Email: OwOwOwOwO123@outlook.com
 

Offline free_electron

  • Super Contributor
  • ***
  • Posts: 8520
  • Country: us
    • SiliconValleyGarage
Arduino is not exactly a case of 'proper' design. They take too many shortcuts.
That aside , what else is on your board. Anything drawing pulsed currents like muxed displays , rf transmitters etc ?

Yea I have 4-8 seven-segment displays and an nRF24L01.

That would be one possibility. Muxed displays draw peak currents. Any kind of noise on your power rail and the CPU may brown out. Same for RF transmitters. It looks like you have those mounted above the CPU ...
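
As a rough illustration of why pulsed loads need bulk capacitance, the sketch below uses the ideal capacitor relation dV = I * dt / C. All the numbers are assumptions for the sake of the example (a 100mA current step, 10us before the regulator responds, 0.1V of allowed droop); none of them were measured on this board, and ESR, ESL and trace inductance are ignored.

```python
def droop_v(i_step_a, duration_s, cap_f):
    """Voltage drop of an ideal capacitor supplying a current step alone."""
    return i_step_a * duration_s / cap_f


def min_cap_f(i_step_a, duration_s, max_droop_v):
    """Smallest ideal capacitance that keeps the droop within max_droop_v."""
    return i_step_a * duration_s / max_droop_v


I_STEP_A = 0.100        # ASSUMED current step (e.g. a transmit burst)
DURATION_S = 10e-6      # ASSUMED time before the regulator catches up
MAX_DROOP_V = 0.1       # ASSUMED acceptable dip on the rail

needed = min_cap_f(I_STEP_A, DURATION_S, MAX_DROOP_V)
print(f"capacitance needed: {needed * 1e6:.1f} uF")

# The same step against a lone 0.1 uF decoupling cap
small_cap_droop = droop_v(I_STEP_A, DURATION_S, 0.1e-6)
print(f"droop with 0.1 uF only: {small_cap_droop:.1f} V")
```

Under these assumptions the answer is on the order of 10uF, while a lone 0.1uF cap would need to drop more than the whole rail voltage, meaning it cannot carry the burst by itself. The 0.1uF part is still needed close to the IC for fast edges; the bulk cap covers the slower, larger steps.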
Professional Electron Wrangler.
Any comments, or points of view expressed, are my own and not endorsed , induced or compensated by my employer(s).
 

Offline Ysjoelfir

  • Frequent Contributor
  • **
  • Posts: 542
  • Country: de
I know there are a few other things that are not correct but we will see if they need fixing.
I have to admit that I am slightly furious after reading this sentence. If there is something wrong with your design and you know it, but decide to go "nah, it isn't that bad, people buying this won't notice!", you are in for a very bad surprise. Reputation is slowly getting more important again, after years of cheap electronics that fail just after the warranty ends because of obvious* design flaws that are not corrected because of cost and a "well, it works NOW, why should I care if it works in 3 years?" mentality.


* or intentionally created
Greetings, Kai \ Ysjoelfir
 

Offline JacksterTopic starter

  • Frequent Contributor
  • **
  • Posts: 469
  • Country: gb
    • PCBA.UK
Email from PCBway:

Quote
Thanks for your information, I checked that your order BATCH3 is with the same pad design
and the production file of it is with smaller pads as your design. The difference is that order BATCH3
and BATCH2 produced at different production line, and it need different way to prepare production file.

Offline bd139

  • Super Contributor
  • ***
  • Posts: 23045
  • Country: gb
So they fucked up basically
 

Offline Rerouter

  • Super Contributor
  • ***
  • Posts: 4694
  • Country: au
  • Question Everything... Except This Statement
Fun to know to avoid them,

Another one agreeing that the default Arduino layout is not always the best, but the software libraries can be used in commercial systems without issue, provided you review the code and test the crap out of it.

There are about 1000 Arduino-based card readers of my creation floating out in the wild in some of the worst electrical and environmental conditions you can imagine, and yet they have not had a single lock-up, and only 2 replacements due to vehicles being submerged in flood waters. Because at the end of the day, Arduino code is just AVR code for most things. If you make sure everything checks out, then you're sweet (verbose output for compilation is a good way to catch potential issues early).
 
The following users thanked this post: Jackster

Offline Dubbie

  • Supporter
  • ****
  • Posts: 1115
  • Country: nz
I can’t really see how this could happen. If the Gerbers have whatever size in them, how is it possible for them to magically change?
 

Offline AndyC_772

  • Super Contributor
  • ***
  • Posts: 4255
  • Country: gb
  • Professional design engineer
    • Cawte Engineering | Reliable Electronics
It's very normal for PCB manufacturers to make changes to artwork in order to adjust for the physical characteristics of their process. If, for example, they know that their etching process will over-etch by <x>, then they'll adjust the artwork to increase track width by <x> to compensate.

The problem comes if the process doesn't actually do what the (possibly modified) artwork was intended for.
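
A minimal sketch of that compensation, with made-up numbers: the fab widens the artwork by the over-etch it expects, and the finished copper is only right if the real process removes exactly that much. The 0.20mm feature and the etch figures are assumptions for illustration, not values from any fab.

```python
def compensated_artwork(design_mm, expected_overetch_mm):
    """Artwork width after the fab adds its expected etch loss."""
    return design_mm + expected_overetch_mm


def finished_copper(artwork_mm, actual_overetch_mm):
    """Copper width that actually remains after etching."""
    return artwork_mm - actual_overetch_mm


DESIGN_MM = 0.20             # ASSUMED designed feature width
EXPECTED_ETCH_MM = 0.03      # ASSUMED compensation applied by the fab

artwork = compensated_artwork(DESIGN_MM, EXPECTED_ETCH_MM)

as_planned = finished_copper(artwork, 0.03)   # process matches expectation
under_etched = finished_copper(artwork, 0.01) # process removes less copper

print(f"artwork {artwork:.2f} mm")
print(f"finished, process as expected: {as_planned:.2f} mm")
print(f"finished, process under-etches: {under_etched:.2f} mm")
```

In the under-etched case the copper ends up 0.02mm wider than designed, and the gap to the neighbouring feature shrinks by the same amount. A different production line with different etch behaviour, fed a file prepared for another line, would produce exactly this kind of mismatch.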
 
The following users thanked this post: cpt.armadillo

Offline Rerouter

  • Super Contributor
  • ***
  • Posts: 4694
  • Country: au
  • Question Everything... Except This Statement
The same goes for how they will usually erase silkscreen from copper pads and treat any hole with 2 touching copper pads as plated. These are simplifications in their process that generally lead to the best customer relations outcome. They just goofed, and somehow missed it in testing (likely because the production file itself was the flaw, so the optical inspection never caught it).

The part that the less cheap PCB suppliers will do is give you feedback on ways to make your PCB more production-ready for the next run. Allpcb, ironically, lets you download their production Gerbers, which let you see what has changed. Generally on the second run of boards I'll see what they shifted and adjust accordingly. It's fun to see just what they let through: oh, you want 0.4/0.3mm vias? Yep, straight on through without modification; it's your own fault if you get a breakout.
« Last Edit: July 02, 2019, 10:32:17 am by Rerouter »
 

Offline JacksterTopic starter

  • Frequent Contributor
  • **
  • Posts: 469
  • Country: gb
    • PCBA.UK
Arduino is not exactly a case of 'proper' design. They take too many shortcuts.
That aside , what else is on your board. Anything drawing pulsed currents like muxed displays , rf transmitters etc ?

Yea I have 4-8 seven-segment displays and an nRF24L01.

That would be one possibility. muxed displays draw peak currents. Any kind of noise on your power rail and the cpu may brown out. Same for RF transmitters. it looks like you have those mounted above the cpu ...

So I added a 0.1uF in the original design for the NRF24L01 board.
Reading up on it, people are recommending 10uF, and some are recommending either a second cap or a tantalum as well.
I gave the NRF24L01 board its own 3.3v power supply (everything else is 5v) and added the 0.1uF between the regulator and the NRF24L01.
The NRF24L01 boards I use are a bit higher spec than the Arduino hobby boards bought on eBay. They have a +10 or +20 dB gain circuit in them too.

The driver for the 7-segment LED displays is pretty close to the main 5v regulator.
There are no caps near it though. I looked at off-the-shelf boards for that IC to see how they did it.
No caps on them other than on the signal lines in.

Any recommendations?
Thank you

Offline Siwastaja

  • Super Contributor
  • ***
  • Posts: 8310
  • Country: fi
Looking at the clues posted in this thread, I'm almost 99.9% positive there's more to your problems than just the PCB mishap. Although it's possible such "almost shorted" pads could give intermittent operation, it's unlikely it happens in multiple units. You have so many unexplained incidents of it failing, working again, then failing again.

If you want to become a professional design engineer, do yourself a favor and as soon as you have a moment of silence, don't go on to design more features, or a more advanced product, but instead, try to do a proper root cause analysis. Instead of just building products, try to build a process/a "factory" where you can robustly build these products without wasting a lot of time.

You seem to have many issues, some are likely correlated, some are not.

In a stressful situation, we tend to fall back into trying to just get things to work by whatever means. Like can't get the MCU flashed? It's not a total showstopper, swap the board and go on. But in the long run, solving the problem once and for all would pay back in time used, and, it could turn out it's connected to your other issues, so they would be solved as well.

When I was looking at this comment of yours:

"They just develop a fault where the software no longer cycles. This can happen on new boards too. It will go through the code 3-6 times and then hang. "

I thought, you are very lucky. You have a lot of specimens on your hands that do fail. And you have consistent failures. You don't need to operate a well-performing product for weeks to see a failure. If I understand correctly, you have at least one (1) unit in your hands with which you can demonstrate a failure within minutes or hours. That's great.

It doesn't matter what the fault is or what you think it might be caused by. Given this particular failure you can demonstrate, go for a full-blown root cause analysis and see what you find.

You just need to make your steps smaller, and lower level. Whenever you hit a wall of not knowing how to do it, Google it, learn it.

I don't personally use debuggers a lot, but this could be a case where one would give you a starting point. If you don't have one, just make your code turn an LED on/off at different points of the code. After a few iterations of moving around where you turn the LED on/off, you will have found the exact place in the code where it hangs.
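
The "move the LED around" search can be done as a bisection, so each reflash halves the suspect region. The sketch below only models the search procedure on a PC; it is not firmware. It assumes a scheme that also works when the code hangs on a later pass through the main loop: turn the LED on at the top of the loop, turn it off at checkpoint i, and after the hang note whether the LED is stuck on. The checkpoint numbering and the `led_stuck_on` callback are hypothetical names for this illustration.

```python
def find_hang(num_checkpoints, led_stuck_on):
    """Locate a hang using as few reflash-and-observe experiments as possible.

    Checkpoints are numbered 1..num_checkpoints in execution order inside
    the main loop. led_stuck_on(i) stands for one experiment: LED on at
    the top of the loop, LED off at checkpoint i, run until the unit
    hangs, and report True if the LED is still on (the hang happened
    before checkpoint i was reached on that pass).

    Returns the first checkpoint that is never reached; the hang sits
    between the previous checkpoint and this one. Assumes the hang is
    inside the loop, so the last checkpoint always reports True.
    """
    lo, hi = 1, num_checkpoints
    while lo < hi:
        mid = (lo + hi) // 2
        if led_stuck_on(mid):
            hi = mid          # hang is at or before this checkpoint
        else:
            lo = mid + 1      # this checkpoint was passed; look later
    return lo


# Simulated unit that hangs just after checkpoint 5 of 10
experiments = []

def simulated_unit(i):
    experiments.append(i)
    return i > 5

first_unreached = find_hang(10, simulated_unit)
print(f"hang is between checkpoint {first_unreached - 1} and {first_unreached}")
print(f"reflashes needed: {len(experiments)}")
```

For 10 checkpoints this takes 4 experiments instead of up to 10. Once the region is small, add finer checkpoints inside it and repeat.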

If your MCU isn't flashing, look at the communication signals with an oscilloscope, decode the contents. It may take several hours, but then you know exactly where it hangs. Chances are, you find some analog signaling issue (stuck logic level, bad rise/fall time)... in two seconds after looking at the scope screen.

Get yourself the basic tools, a 50MHz 2-channel digital storage oscilloscope being a bare minimum to debug such a design. A $400 4-channel Rigol or similar is more than enough, but I'm sure you can get an older generation thing used for maybe $100.
 

