Author Topic: Is there something to learn for embedded/IOT from the Crowdstrike disaster?  (Read 3733 times)


Offline peter-h (Topic starter)

  • Super Contributor
  • ***
  • Posts: 3847
  • Country: gb
  • Doing electronics since the 1960s...
Obviously nobody should deploy a "critical" system which is subject to 3rd party updates which have not been tested by the implementer on another (non critical) system.

But obviously millions of IT guys didn't know that ;)

And then we have the endless debates about "latest patches" on IOT products. Who wants to destroy their company? Micro$oft and Crowdstrike will survive this because they have contractually excluded all liability and anyway nobody is big enough to sue them.
Z80 Z180 Z280 Z8 S8 8031 8051 H8/300 H8/500 80x86 90S1200 32F417
 

Offline jzx

  • Regular Contributor
  • *
  • Posts: 71
  • Country: es
Learn to do backups  |O
 

Offline mikerj

  • Super Contributor
  • ***
  • Posts: 3294
  • Country: gb
Quote
Learn to do backups  |O

Restoring "Backups" is not a very practical solution when hundreds or thousands of important systems have gone down due to an update and you need to get them running ASAP.
 

Offline peter-h (Topic starter)

  • Super Contributor
  • ***
  • Posts: 3847
  • Country: gb
  • Doing electronics since the 1960s...
Someone will point out that Samsung, Apple, etc, regularly issue updates to their phones. So how do they manage the risk? Probably by

- having a lot more testing resources than Crowdstrike
- doing a rolling deployment, starting with some small country in Africa ;)

Indeed; restoring backups is useless. Firstly, you will lose all new data, and secondly if the system has "done a BSOD" then it needs a site visit which is a disaster in most cases.
 

Offline hans

  • Super Contributor
  • ***
  • Posts: 1661
  • Country: nl
Don't update if you cannot handle any prolonged downtime. But also don't postpone updates forever. Vulnerabilities will be exploited, so schedule maintenance and downtime regularly.

Having a "rolling release" update model has never worked for critical infrastructure. Likewise, "Continuous Delivery" is not a thing in the embedded world: if devices are instantly updated with the last Git commit you make to the production branch of your repository, then, as you can see, it has the potential to brick all your devices in the field. A webserver or computer OS can be rescued, but whatever embedded IoT gadget you have in the field probably cannot, at least not without shipping it back at more cost than a 'throwaway' product would be worth.
 
The following users thanked this post: nctnico, mikerj, Just_another_Dave

Offline mark03

  • Frequent Contributor
  • **
  • Posts: 728
  • Country: us
Informed users are learning software updates are frequently not in their best interest.  Consider Windows for example.  Setting it to "critical updates only" certainly does not screen out the non-security-critical bloat.  There are similar examples in the embedded space, where a vendor decides to disable some popular feature against users' preferences.  Examples in smartphone apps are legion (the BBC news app is a prominent example).

So, this is slowly corroding users' trust in their SW suppliers, which---given the conflicting motives of user and vendor---is unavoidable, but obviously not good for security and reliability.

The Crowdstrike incident is different, mostly.  By definition, what they are trying to do requires frequent (daily?) updates, so the systemic blame (besides what has been mentioned above---which I agree with) needs to fall on the OS which requires mitigations like Crowdstrike to function safely.

Despite this, to me it's another example of how a computing culture of "continuous updates == security == good" can come back to bite you.  The benefits of software updates need to be balanced against the potential harms; this is rarely if ever discussed, probably because it tends to shine a light on user-hostile policies.  Personally, I would avoid software updates on most of my embedded devices if I could, and I make embedded software for a living.
 

Offline madires

  • Super Contributor
  • ***
  • Posts: 7997
  • Country: de
  • A qualified hobbyist ;)
Some customers can blame themselves for not reading the terms & conditions. Crowdstrike explicitly warns against using their products for critical systems.
 
The following users thanked this post: SiliconWizard

Offline agehall

  • Frequent Contributor
  • **
  • Posts: 389
  • Country: se
Yeah, don’t let all your nodes update automatically. Test any updates on a few nodes that you can live without if something bad happens and only roll out the update once the test nodes have proven they can survive a reboot.

Everyone pushes a bad update once in a while. That isn’t the big f-up imho (but it was an f-up). Blindly trusting updates was the major f-up here.
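
As a rough illustration of the "test on a few nodes first" idea, here is a minimal sketch in Python. Everything in it is hypothetical: `staged_rollout`, `apply_update` and `healthy_after_reboot` are made-up names standing in for whatever your fleet tooling actually provides, and a real setup would also want a soak time between the two phases.

```python
from typing import Callable, Dict, Iterable, List, Optional


def staged_rollout(
    nodes: Iterable[str],
    canary_count: int,
    apply_update: Callable[[str], None],
    healthy_after_reboot: Callable[[str], bool],
) -> Dict[str, object]:
    """Update a few expendable nodes first; touch the rest only if all survive a reboot."""
    if canary_count < 1:
        raise ValueError("need at least one canary node")

    node_list: List[str] = list(nodes)
    canaries, rest = node_list[:canary_count], node_list[canary_count:]
    updated: List[str] = []
    failed: Optional[str] = None

    # Phase 1: canaries. Stop at the first node that does not come back healthy.
    for node in canaries:
        apply_update(node)
        updated.append(node)
        if not healthy_after_reboot(node):
            failed = node
            return {"status": "halted", "updated": updated, "failed": failed}

    # Phase 2: every canary survived, so roll out to the remaining nodes.
    for node in rest:
        apply_update(node)
        updated.append(node)

    return {"status": "complete", "updated": updated, "failed": failed}
```

The point of the sketch is only the ordering: a bad update costs you the canaries you can live without, not the whole fleet.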
 

Offline wek

  • Frequent Contributor
  • **
  • Posts: 514
  • Country: sk
Quote
Despite this, to me it's another example of how a computing culture of "continuous updates == security == good" can come back to bite you.  The benefits of software updates need to be balanced against the potential harms; this is rarely if ever discussed, probably because it tends to shine a light on user-hostile policies.

Maybe; but maybe it's just that people want things to be simple. "Discussing or estimating benefit/drawback balance == complicated == bad".

Ideally, the culture should be shifted so that the burden of proof of benefit is on whoever provides the update.

JW

« Last Edit: July 22, 2024, 07:31:56 pm by wek »
 

Online tszaboo

  • Super Contributor
  • ***
  • Posts: 7624
  • Country: nl
  • Current job: ATEX product design
Quote
Seamless rollouts, or rolling upgrades, are critical strategies in software deployment that help minimize downtime, maintain stability and enhance user experience. They involve the gradual deployment of updates or new features across servers or users, ensuring that the system remains operational during the process.
 

Online wraper

  • Supporter
  • ****
  • Posts: 17367
  • Country: lv
Don't deploy to prod on Friday.
 
The following users thanked this post: Warhawk

Online wraper

  • Supporter
  • ****
  • Posts: 17367
  • Country: lv
Quote
Learn to do backups  |O

Backup is not an issue here whatsoever, no data corruption happened. The issue is how you boot many thousands of computers to delete the file that causes the BSOD. Basically it needs physical access and doing it manually, one by one, unless it's running on a virtual machine.
 

Online wraper

  • Supporter
  • ****
  • Posts: 17367
  • Country: lv
Quote
Micro$oft and Crowdstrike will survive this because they have contractually excluded all liability and anyway nobody is big enough to sue them.

MS is not liable whatsoever. They have nothing to do with causing the problem. However MS stepped in and released a tool for simplifying the fix.
 

Offline langwadt

  • Super Contributor
  • ***
  • Posts: 4565
  • Country: dk
Quote
Learn to do backups  |O
Quote
Backup is not an issue here whatsoever, no data corruption happened. The issue is how you boot many thousands of computers to delete the file that causes the BSOD. Basically it needs physical access and doing it manually, one by one, unless it's running on a virtual machine.

and if you are using BitLocker, which is the default now I believe, you need the key; you can't just boot into safe mode
 

Offline peter-h (Topic starter)

  • Super Contributor
  • ***
  • Posts: 3847
  • Country: gb
  • Doing electronics since the 1960s...
Quote
the culture should be shifted so, that the burden of proof of benefit is on whoever provides the update.

It is probably not possible to do that, in the contractual sense. Microsoft dominated that market (windows servers and the applications, basically) and you had to use that Crowdstrike stuff. The smart thing, which probably few did, was to test the updates first. The fact that some or many did a BSOD suggests that the simplest test setup would have shown the issue :)

In embedded/IOT the vendor may sometimes be dominant but often the customer will be dominant and thus be able to sink your company if you push out a damaging update. Customers in particular do that because of internal company politics: a corporate ladder climber needs to show ruthlessness, and of the three options on who to scalp (customer, employee, supplier) the last one is the only safe option. You can screw an employee but only if it is a heterosexual white male ;) Especially if the failed update requires on-site action.

This business should be an eye-opener for IOT vendors. But it's not easy to explain to a customer that you are not doing updates because of liability :)
 

Offline langwadt

  • Super Contributor
  • ***
  • Posts: 4565
  • Country: dk
Quote
Micro$oft and Crowdstrike will survive this because they have contractually excluded all liability and anyway nobody is big enough to sue them.
Quote
MS is not liable whatsoever. They have nothing to do with causing the problem. However MS stepped in and released a tool for simplifying the fix.

afaiu the way crowdstrike gets around having to get every update signed by MS (it's a kernel driver to run in ring0) is that the crowdstrike driver is basically an interpreter that can execute code in kernel mode delivered separately as update files.

MS allowing that sounds to me like it basically makes the whole idea of drivers being vetted and signed pointless

 
 
The following users thanked this post: hans, glenenglish

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14967
  • Country: fr
Quote
Micro$oft and Crowdstrike will survive this because they have contractually excluded all liability and anyway nobody is big enough to sue them.
Quote
MS is not liable whatsoever. They have nothing to do with causing the problem. However MS stepped in and released a tool for simplifying the fix.
Quote
afaiu the way crowdstrike gets around having to get every update signed by MS (it's a kernel driver to run in ring0) is that the crowdstrike driver is basically an interpreter that can execute code in kernel mode delivered separately as update files.

MS allowing that sounds to me like it basically makes the whole idea of drivers being vetted and signed pointless

Yes, this is what Dave Plummer explains in his video.
This is absolutely atrocious.
We can all see how it would make CrowdStrike's life infinitely easier, but that's completely insane in terms of security. Compromising security for more security being a schizophrenic approach.

Besides, no OS should require a third-party kernel driver for adding security to the OS - that's bogus.
Everything is wrong in that story. The fact that it blew up in everyone's face is, IMHO, actually a very good thing. If it could be a lesson. Which, unfortunately...
 
The following users thanked this post: nctnico

Offline peter-h (Topic starter)

  • Super Contributor
  • ***
  • Posts: 3847
  • Country: gb
  • Doing electronics since the 1960s...
Wonderful indeed :)

 

Offline PCB.Wiz

  • Super Contributor
  • ***
  • Posts: 1692
  • Country: au

Quote
Yes, this is what Dave Plummer explains in his video.
This is absolutely atrocious.
We can all see how it would make CrowdStrike's life infinitely easier, but that's completely insane in terms of security. Compromising security for more security being a schizophrenic approach.

Besides, no OS should require a third-party kernel driver for adding security to the OS - that's bogus.
Everything is wrong in that story. The fact that it blew up in everyone's face is, IMHO, actually a very good thing. If it could be a lesson. Which, unfortunately...

Yup, it's nuts. Microsoft side steps security, by giving the keys to someone else.

Even with that, it's still hard to fathom how any release got past sandbox testing?!
Surely they run anything for 24 hours or more, on a slew of candidate Windows versions, before they release?

 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14967
  • Country: fr
I don't know the details, but from what I've heard, the code was correct at the test step and it got messed up at the "packaging" step - so what was actually deployed.

Which would mean that they do not test with the final artefacts as deployed, but with intermediate build artefacts. Which, btw, is not that unusual - I've seen that on several occasions in companies of various sizes.

I can't guarantee that's what happened here, but it's something I've heard.
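
To make the "test the final artefact, not an intermediate one" point concrete, here is a small hedged sketch in Python. It says nothing about how CrowdStrike's pipeline actually works; `ReleaseGate`, `record_test_pass` and `may_publish` are hypothetical names. The idea is simply that the publish step refuses any file whose hash was never seen passing the tests.

```python
import hashlib


def digest(artifact: bytes) -> str:
    """SHA-256 hex digest of the exact bytes that will be shipped."""
    return hashlib.sha256(artifact).hexdigest()


class ReleaseGate:
    """Remembers which exact artefacts passed testing (hypothetical helper)."""

    def __init__(self) -> None:
        self._tested_digests = set()

    def record_test_pass(self, artifact: bytes) -> str:
        # Called by the test stage, on the bytes it actually tested.
        d = digest(artifact)
        self._tested_digests.add(d)
        return d

    def may_publish(self, artifact: bytes) -> bool:
        # Called by the deploy stage, on the bytes it is about to push.
        # Anything altered after testing (repackaging, truncation, ...) fails.
        return digest(artifact) in self._tested_digests
```

If packaging changes even one byte after the test stage, `may_publish` returns False, which forces the repackaged file back through testing instead of straight out to the field.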
 

Offline zilp

  • Frequent Contributor
  • **
  • Posts: 297
  • Country: de
Quote
afaiu the way crowdstrike gets around having to get every update signed by MS (it's a kernel driver to run in ring0) is that the crowdstrike driver is basically an interpreter that can execute code in kernel mode delivered separately as update files.

MS allowing that sounds to me like it basically makes the whole idea of drivers being vetted and signed pointless

You are misunderstanding the purpose of code signing. The purpose of code signing is primarily authentication/attribution. Also, I think it is misleading to say that Microsoft vets drivers. AIUI, they don't even review the code. All their testing program does is try to find obvious reliability/stability problems. The point is to improve the average user experience by weeding out easily discoverable bugs, not to provide anything close to a quality or security guarantee. Essentially, the signature from MS doesn't say "this is high-quality, secure software", it just says "this doesn't fall over quite as easily as shoddy drivers in past Windows versions often did".
 

Offline harerod

  • Frequent Contributor
  • **
  • Posts: 465
  • Country: de
  • ee - digital & analog
In an IOT/MCU environment, which is much simpler than where Crowdstrike is being used, one possible approach is to double everything critical. Provided you have enough FLASH memory, you can have two bootloader sections and two application sections. A failed update can always switch back to the last known good version. If FLASH is not that abundant, make at least the bootloader redundant. When done correctly, you can recover the application after a failed update AND have a fallback for bootloader updates. The exact details need to be hammered out for each use case.
Double BIOS images for PCs have been around for ages. One or two layers above BIOS, Crowdstrike under Windows failed at having a fallback for a foul-up during a critical kernel driver update. In the past I have heard about "security concerns" preventing an automatic roll-back. One can only hope that they will figure out some solution.
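
For anyone who wants to play with the two-slot idea before committing it to a bootloader, here is a minimal model in Python. It is only a sketch under assumptions: `Slot`, `Bootloader`, `MAX_TRIAL_BOOTS` and the "application confirms itself" flag are all made up for illustration, and a real implementation would be C with the flags kept in FLASH and a watchdog forcing the reset.

```python
import zlib
from dataclasses import dataclass

MAX_TRIAL_BOOTS = 3  # assumed limit before giving up on an unconfirmed image


@dataclass
class Slot:
    image: bytes = b""
    crc: int = 0
    confirmed: bool = False  # set by the application once it runs properly
    boot_attempts: int = 0

    def valid(self) -> bool:
        return bool(self.image) and zlib.crc32(self.image) == self.crc


class Bootloader:
    def __init__(self, factory_image: bytes) -> None:
        good = Slot(factory_image, zlib.crc32(factory_image), confirmed=True)
        self.slots = [good, Slot()]
        self.active = 0

    def install_update(self, image: bytes, crc: int) -> None:
        # Always write into the spare slot; the running image is never touched.
        spare = 1 - self.active
        self.slots[spare] = Slot(image, crc)
        if self.slots[spare].valid():
            self.active = spare  # next reset is a trial boot of the new image

    def select_slot(self) -> int:
        """Run at every reset; returns the index of the slot to boot."""
        slot = self.slots[self.active]
        out_of_tries = not slot.confirmed and slot.boot_attempts >= MAX_TRIAL_BOOTS
        if not slot.valid() or out_of_tries:
            self.active = 1 - self.active  # fall back to the last known good image
        elif not slot.confirmed:
            slot.boot_attempts += 1
        return self.active

    def confirm(self) -> None:
        # Called by the application after it has proven itself (e.g. reached the server).
        self.slots[self.active].confirmed = True
```

With this model, an update that never calls `confirm()` gets three trial boots and is then abandoned in favour of the old image, and an update with a bad CRC is never booted at all. The model does not cover the case where the fallback slot is itself corrupt, which is one of the details that has to be hammered out per use case.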
 
 

Offline langwadt

  • Super Contributor
  • ***
  • Posts: 4565
  • Country: dk
Quote
... the signature from MS doesn't say "this is high-quality, secure software", it just says "this doesn't fall over quite as easily as shoddy drivers in past windows versions often did".

which would be better than Crowdstrike causing what is likely the biggest, most expensive outage in history

apparently MS did try to force antivirus vendors to use a special MS API instead of everyone inventing their own stuff running in kernel mode, but were told that was monopolistic so they couldn't do that, but Apple could

 

Online IanB

  • Super Contributor
  • ***
  • Posts: 12056
  • Country: us
It is easy to say that security software can be worse than the problem it is trying to protect against. It is like installing malware to protect against malware. The latest "innovation" is when the security software randomly deletes files it doesn't like, or blocks installers from completing their tasks. Like, hey, isn't that what malware does?
 

Offline SiliconWizard

  • Super Contributor
  • ***
  • Posts: 14967
  • Country: fr
Quote
apparently MS did try to force antivirus vendors to use a special MS API instead of everyone inventing their own stuff running in kernel mode,

Which looks pretty reasonable. Running anything in kernel mode, if you can do otherwise, is insane.

Quote
but were told that was monopolistic so they couldn't do that, but Apple could

We don't know the full details I think, so it's hard to tell what exactly they were told.

If the Windows kernel were a microkernel, this would never happen. But, of course, that's just an "if", and impossible without a complete rewrite of the kernel. So, not a solution. Just a thought.

So in the meantime, MS has no choice but try to offer alternatives to kernel drivers for basic security. What else could they do than provide a corresponding API?

For once, it's definitely not MS's fault (apart from the kernel not being a microkernel, but none of the major OSs currently is a microkernel, so...) If anything, they should change their WHQL policy to prevent kernel drivers from doing what Crowdstrike did, that is, run as an interpreter for code that can be updated without requiring new driver tests. But maybe even just that would be considered monopolistic. Who knows.

Heck, even preventing the mass pushing of updates to critical systems without any local sysadmin action, which should make a bit of sense, may be seen as monopolistic, preventing third-party companies from doing business as they like. "Just stay away from Windows" would be my pragmatic advice at this point.
 

