so I think it really should be “don’t use a crap BMS.”
You don't need to look too far back to see where this is coming from. Not even a decade ago, the idea of using a "good BMS" was mostly theoretical for most people: "crap BMS" was the standard. It was nearly impossible for a product design engineer to tell the difference - and it still is quite difficult, even for li-ion experts.
Of the commercial li-ion BMS chips (a market mostly flooded by TI), most were broken-by-design in one way or another. There are still many traps, especially from TI. I have evaluated several totally broken-by-design li-ion ICs from TI. Probably some are very good. But how can I trust them? Looking at TI's li-ion product datasheets, it's utterly clear these parts are designed by inexperienced engineers in a hurry, without any kind of high-reliability design practices, to keep up with management's desire to have the greatest lithium-ion management IC portfolio (50 new chips out every year, even though the problem field is static) - not in co-operation with battery chemistry experts, reliability experts, or safety experts.
Of the commercial li-ion BMS modules, most were outright dangerous, or at least would kill the battery. Of course: they use said ICs, or are home-brew solutions.
Then you have the more expensive hi-tech li-ion management ICs which come with a ton of certifications and paperwork. I looked at one ADI part, for example, with a separate, redundant analog-comparator-based backup system for the cell voltage measurements, to guard against ADC/MCU failures. Sounds great! Then, looking at the typical application in the appnote, both shutdown signal paths were brought out to one microcontroller which "decides" about the system shutdown - a single point of failure that defeats the whole redundancy. Every idiot on the planet can laugh at this ridiculousness. Yet it's a really well certified high-reliability product for automotive.
Now, when it comes to li-ion safety, people tend to make these completely wrong assumptions:
* Li-ion is super unsafe; it instantly blows up or catches fire if you overcharge it in the slightest
* BMS is the magic sauce which prevents all those fires which would happen otherwise.
In reality:
* A poor-quality no-name Alibaba li-ion cell may be fairly dangerous. Otherwise, modern, typical li-ion cells implement cell- and chemistry-level protections such as shutdown separators (which shut down the ionic transfer before the onset of thermal runaway), PTC resettable fuses, and CIDs (interrupters based on cell pressure). I have tried my best to blow up a modern Samsung/Sony/Panasonic/LG cell. I haven't succeeded. I have applied 30V, 10A for 8 hours to a 4.2V cell. The cell becomes a self-regulating hysteretic heater that switches on and off at about 120 degC. The plastic wrapping changes its tint. Worst case, I got a tiny leak of electrolyte. These are, by the way, all tests specified by the manufacturers; they guarantee their cells pass them. External heat above the thermal runaway onset temperature - around 150 degC - would be the best bet, since the modern shutdown separators seem to work so well that even a nail penetration doesn't set these things on fire anymore.
Warning: this is not to say you should abuse the cells in any way. They can catch fire, because the inherent chemistry is still very volatile - it just has safety layers built around it - and abusing them will of course increase the risk, as it puts more burden on safety features that aren't "normally" needed. It's just that fires don't tend to happen, because the safety features are well designed.
* The cell-level, balancing BMS, when working properly, may extend the usable pack lifetime slightly. A non-working, destructive BMS often does not cause safety problems because - see the previous point - the cells are usually OK with the abuse a faulty BMS gives them. So a faulty BMS just destroys the pack, but very seldom in a dramatic or dangerous way. This is 100% thanks to the li-ion chemists and engineers!
So, all this is why I rolled my own BMS, not using any BMS IC, for EVs and energy storage (scalable, for large packs, up to 250S and around 100-200 kWh; something similar to the modules available back then, such as Elithion, just a simplified, minimalistic design). I did semi-commercialize it (producing a few full units, selling at a very low price to selected customers). Even though I tried to address all the issues seen in failing BMSs, I still don't have complete trust in my own, either.
Why don't I trust my design? Because I have seen very experienced professionals fail to achieve this level of reliability. I have balancing! It can get stuck on, and although that doesn't cause fire - because I did the thermal analysis for stuck-on balancing - it could still overdischarge a cell. I have a timeout feature (some TI products don't - they kill your battery automatically if your I2C communication ever gets stuck once in your product's lifetime!). But still.
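The timeout idea is simple enough to sketch. Below is a minimal illustration (Python, all names are my own invention, not from any real product): balancing is only ever armed for a bounded window per command, so if the communication link dies with balancing commanded on, the channel drops out by itself instead of bleeding a cell to 0V.

```python
BALANCE_TIMEOUT_S = 10.0  # balancing must be re-commanded at least this often

class BalanceChannel:
    """One dissipative balancing channel with a communication-loss timeout."""

    def __init__(self):
        self._enabled = False
        self._deadline = 0.0

    def command_on(self, now: float) -> None:
        # Each command only arms balancing for a bounded window;
        # keeping it on requires the master to keep talking.
        self._enabled = True
        self._deadline = now + BALANCE_TIMEOUT_S

    def command_off(self) -> None:
        self._enabled = False

    def output(self, now: float) -> bool:
        # Called from the periodic tick that drives the bleed FET:
        # if the deadline passes with no fresh command, balancing stops.
        if now >= self._deadline:
            self._enabled = False
        return self._enabled

ch = BalanceChannel()
ch.command_on(now=0.0)
assert ch.output(now=5.0) is True    # within the window: bleed FET on
assert ch.output(now=11.0) is False  # comms died -> balancing times out
```

The real thing would of course live in firmware with a hardware watchdog behind it; the point is only that "stuck on forever" must not be a reachable state.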
BMS design is non-trivial. The first issue you face is defining what you need to do: the basic functionality and specifications. This is hard due to information overload. The focus is easily lost to difficult-to-implement but unimportant features, such as:
* high balancing currents
* redistributive balancing
* complex algorithms based not on actual battery science but on "gut feeling". For example, a lot of effort has gone into AC measurements and "state-of-health analysis" bullshit instead of just implementing reliable LVC and HVC basics.
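For contrast, the "reliable LVC and HVC basics" are conceptually tiny. A sketch (thresholds are illustrative placeholders; real values come from the cell datasheet):

```python
# Illustrative per-cell cutoffs for a generic li-ion cell; NOT design values.
LVC_V = 3.0   # low-voltage cutoff
HVC_V = 4.2   # high-voltage cutoff

def pack_action(cell_voltages):
    """Decide which current paths must be inhibited, from per-cell voltages.

    This is the whole primary purpose of a cell-level BMS: if ANY cell
    hits LVC, stop discharging; if ANY cell hits HVC, stop charging.
    """
    lo = min(cell_voltages)
    hi = max(cell_voltages)
    return {
        "inhibit_discharge": lo <= LVC_V,
        "inhibit_charge": hi >= HVC_V,
    }

# One cell at HVC: charging must stop, discharging is still allowed.
print(pack_action([3.7, 3.9, 4.21, 3.8]))
```

Everything difficult about a BMS is in making this trivial decision, and the isolation it commands, keep working for a decade - not in the decision itself.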
Then, when it comes to implementation:
* A BMS needs permanent connections to dozens of cell taps, possibly for over a decade. Powering any electronics continuously, so that it's guaranteed to work within tight specs for a decade, is non-trivial.
* Leakage currents need to be kept to minimal levels, even in corner cases. Any kind of latch-up or increased leakage - such as an MCU or an ASIC FSM exiting sleep and staying awake, or getting stuck in a measurement loop - is a catastrophe which automatically kills the battery.
* If high balancing currents are involved dissipatively, the power dissipation analysis cannot be done on the "typical" basis of assuming a short duty cycle. A balancing resistor can get stuck on for several reasons. I remember at least one reported conversion-EV fire that was likely caused by an overheating balancing resistor.
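To put a number on the leakage point above: a back-of-the-envelope current budget (all values are assumptions for illustration, not from my design) shows why even microamps matter when the drain runs continuously for ten years.

```python
# Leakage-current budget for a decade of permanent cell-tap connection.
# All numbers are illustrative assumptions; substitute your own.
capacity_ah = 2.5        # e.g. a typical 18650-class cell
years = 10.0
budget_fraction = 0.10   # allow the BMS to consume 10% of capacity over its life

hours = years * 365 * 24                      # 87,600 hours
i_budget_a = capacity_ah * budget_fraction / hours

# Roughly 2.9 uA average - a single latched-up MCU drawing milliamps
# blows through this budget in weeks, not years.
print(f"average drain budget: {i_budget_a * 1e6:.1f} uA")
```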
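And for the stuck-on dissipation point, a worked worst-case check (again, all numbers are illustrative assumptions): the resistor must be analyzed at 100% duty cycle, not at the "typical" balancing duty.

```python
# Worst-case dissipation check for a dissipative balancing resistor,
# assuming it sticks on at full cell voltage. Illustrative values only.
v_cell_max = 4.2   # V; balancing happens near the top of charge
r_bleed = 33.0     # ohm bleed resistor (~127 mA balance current)

p_stuck = v_cell_max ** 2 / r_bleed   # continuous power if stuck on

theta_ja = 100.0   # degC/W, resistor hotspot-to-ambient (layout dependent)
t_ambient = 60.0   # degC, a hot enclosure
t_hotspot = t_ambient + p_stuck * theta_ja

# ~0.53 W continuous pushes the hotspot well past 100 degC here -
# survivable if designed for, a fire risk if "typical duty" was assumed.
print(f"stuck-on power: {p_stuck:.2f} W, hotspot: {t_hotspot:.0f} degC")
```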
To make the point: compare the MTBF for a "MCU or FSM gets stuck in a wrong state" event for general consumer (or even industrial or automotive!) electronics, and a BMS.
The general device:
* Runs maybe a few hours a day
* Has a typical lifetime of probably five years; after that, no one's interested.
* Resets every now and then, when power cycled
* May get stuck without causing much problems: the user just resets it and we are good to go again!
Think of any gadget, even a well-designed one. Imagine that every time you needed to boot it for any reason, it died instead. That's the level of reliability we need to think about when designing a cell-level BMS, especially one with balancing.
On a li-ion BMS, a full reset cannot be done. It's permanently powered for a decade, often in a difficult (read: automotive) environment. Worst of all, such a failure event almost guarantees the self-destruction of the pack! If any part of the IC/MCU gets stuck, a reset cannot be done, power cycling cannot be done - it's all hardwired inside the enclosure. It looks dead, is non-responsive, and you just wait for it to kill the cells with it.
The MTBF for a similar event should be at least 4-5 orders of magnitude longer than for general consumer electronics. And since no typical BMS designer - not even at TI; they are making low-cost product series - has access to some super-high-reliability NASA space technology, what do you think? It's all based on the lowest-cost commercial off-the-shelf processes. Especially at TI.
Which is why the only way I could imagine increasing the reliability was to simplify: reduce part count and complexity. But this isn't going to make the 4-5 orders of magnitude of difference required. So I'm not too happy with my design. One cell module has actually failed in the field (but luckily didn't kill the cell). I suspect ESD during manufacturing or calibration.
When I was hired by the university I talked about above, they actually had this super-expensive conversion EV with a super-over-engineered BMS (with redistributive balancing and everything). The total BOM count for the 80-cell system was over 5000 components, tens of meters of wire, around 300 connectors... And the problem was, the BMS was in some peculiar "state" - had been for half a year at that point - didn't let the car boot, didn't enable the charger. When they finally let us start dismantling the car, about 30% of the cells were already completely dead, discharged to 0V. Now, the only reason the BMS exists at all is to isolate the battery pack when any cell hits LVC or HVC, completely preventing overdischarge. This BMS failed at exactly its primary purpose. To the designers, it clearly wasn't primary.
We got to see another similar EV case a year later, with a very different kind of BMS, and it had the exact same story: the BMS consisted of about ten 8-cell modules (so again around 80 cells total), and out of these ten modules, two were latched up, in a way that they had killed all 8 connected cells through the balancing taps. So, 16 cells were completely dead, 64 cells completely OK.
--
Really, the essence of a cell-level BMS is 90% cargo cult. You just design one in, now you have a BMS! And you don't need to think about it. Convenient, huh!
It's likely to be some random product from TI's massive lineup, most likely broken-by-design. I have chosen a TI li-ion management part twice in my life for my own designs (I don't understand why; I usually learn from the mistakes of others), regretted it twice, and redesigned it out twice. It has wrong or non-optimal setpoints; it somehow lets the cells overdischarge anyway, then does "preconditioning" at several times the current that's considered safe and instructed by battery manufacturers (I have seen C/20 in a TI product, versus the typical C/100).
It claims to have "overvoltage protection", but when you look at the block diagram, you'll notice it's connected in the wrong place - hooked up in a hurry by the designers - so it has no chance of protecting anything. It's next to impossible to find all these traps beforehand.
Oh well, I had a pack charge to 4.63V/cell on a very simplistic prototype (with no secondary, redundant protection) through a TI part which was fully functional the whole time. It could simply have shut down the MOSFET it was actively driving "on" - between the charger input and the battery - but the "overvoltage" signal, screaming its lungs out, was internally connected to nothing. But it worked perfectly! My lazily-done linear voltage-based battery gauge showed 127% and everybody was happy, because the extra charge was there and really extended the runtime. No fire. Thanks, Samsung, for a great product. No thanks to TI.
The typical TI BMS mostly works by luck. It might kill a small percentage of products after some years, but not too many - and no one even thinks about the cause. They think: "oh, the batteries are just unreliable; thank God we have a BMS - without it we would surely be seeing higher failure rates!" Then, in some cases, the BMS causes a theoretically dangerous error, such as letting the cells overcharge - but thanks to the robustness of the modern cells, nothing dramatic happens. So everybody's happy, products are reliable enough for most people, and the BMS checkbox is ticked!
"But you at least need voltage protection on a per-cell basis and"
This is a very interesting myth I see recurring - if I had a dollar for every time...
It has some basis in reality, but it certainly isn't the general, "hard" rule people think it is.
Skipping cell-level voltage measurements for cutoffs is not something limited to cheap AliExpress specials, either.
Since you seem to know this, could you explain to me why Robert Bosch does not need "per-cell basis" voltage protection? I mean, they are fairly reputable, I think?
Could you explain the huge number of li-ion charge-management ICs that are meant to be used with two cells connected in series - without center-tap monitoring - as a single 7.2V-nominal cell?
There is no debate about connecting 2 cells in series without cell-level cutoffs. That's the absolute industry standard practice, has been since day 1. The only debate about this is by hobbyists, on forums.
Above 2 cells, or with large packs, things start to get more complicated, as always, taking us into "it depends" territory. But Robert Bosch isn't the only reputable manufacturer with no issues going up to 6S without anything cell-level. There are some industry design traditions: laptops always have cell-level monitoring and balancing (and "killed by BMS" packs were fairly typical at one point about a decade ago - I have dissected several); power tools often don't (and I haven't seen a single incident of an imbalanced pack, or a cell at 0V).
"current protection for the pack overall,"
Safety-wise, the most reliable current protection is a passive fuse, properly sized (not massively oversized). Don't ever forget this backup in case your MOSFET switches fail short. Remember to look at the fuse's DC voltage rating and DC breaking capacity.
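A sanity-check sketch of the fuse points above (the function, limits, and pack numbers are all my own illustrative assumptions, not a selection procedure): the fuse must sit above the continuous load without being grossly oversized, its DC voltage rating must cover the full pack voltage, and its DC breaking capacity must exceed the worst-case short-circuit current.

```python
def fuse_ok(pack_v_max, prospective_short_a, load_a,
            fuse_rated_a, fuse_dc_v, fuse_dc_breaking_a):
    """Rough plausibility check for a pack-level fuse choice.

    - rated current: above continuous load, but not massively oversized
      (here: at most 2x load, an arbitrary illustrative limit)
    - DC voltage rating: many fuses carry only an AC rating!
    - DC breaking capacity: must exceed the prospective short current,
      or the fuse may arc over instead of clearing the fault
    """
    sized_right = load_a < fuse_rated_a <= 2 * load_a
    voltage_ok = fuse_dc_v >= pack_v_max
    breaking_ok = fuse_dc_breaking_a >= prospective_short_a
    return sized_right and voltage_ok and breaking_ok

# Example: 13S pack, ~54.6 V max, 20 A load, ~2 kA prospective short current,
# checked against a hypothetical 30 A / 58 VDC / 10 kA fuse.
print(fuse_ok(54.6, 2000, 20, 30, 58, 10000))
```

Real fuse selection also involves the time-current curve and I²t, but even this crude check catches the two classic mistakes: an AC-only fuse on a DC pack, and a fuse so oversized it never opens.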
Sorry for getting so verbose. Hope this all helps someone.