I would have thought it was obvious why you don't want to have an embedded system as a server on an open port, but e.g.
Yeah, that's what it seemed like. Which is why I am trying to get you to really think this through. This is widely believed, especially among people who don't know all that much about IT security, but really, it's pretty close to cargo cult and often leads to bad security.
- the IOT box will be detected by port sniffers within hours and attacked
How would it be attacked? And how does not having a server prevent that?
- its software will never be up to date security-wise (due to constant R&D cost, and commercial risk of OTA)
And how does not having a server help with that?
- auto update download (client-triggered OTA basically) is possible but risky since the mfg publishing a bad update could brick the entire installed base!
As for accidental bricking, there certainly are things one can do to minimize that risk (like two boot images in flash with some automatic or manual way to recover into an old version if the new version is broken).
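The two-boot-image idea can be sketched in a few lines. This is a minimal simulation, not any particular bootloader's format: the slot names, the attempt limit, and the flag layout are all assumptions. The point is just the logic: a freshly flashed image gets a bounded number of boot attempts to confirm itself, and if it never does, the bootloader falls back to the other slot.

```python
# Minimal simulation of A/B boot-slot selection with automatic rollback.
# A real bootloader would keep this state in flash/NVRAM; here it is a dict.

MAX_ATTEMPTS = 3  # boots a new image may try before it is declared broken

def select_slot(state):
    """Pick the slot to boot. state: {'active': 'A'|'B', 'attempts': int, 'confirmed': bool}"""
    if state["confirmed"]:
        return state["active"]             # image has marked itself healthy
    if state["attempts"] >= MAX_ATTEMPTS:
        # new image never confirmed: roll back to the other slot
        state["active"] = "B" if state["active"] == "A" else "A"
        state["confirmed"] = True          # the old image is known-good
        state["attempts"] = 0
        return state["active"]
    state["attempts"] += 1                 # count this boot attempt
    return state["active"]

def app_confirms_boot(state):
    """Called by the application once it has verified it is running correctly."""
    state["confirmed"] = True
    state["attempts"] = 0

# Example: an update in slot B that never confirms gets rolled back to A.
state = {"active": "B", "attempts": 0, "confirmed": False}
boots = [select_slot(state) for _ in range(4)]
print(boots)  # ['B', 'B', 'B', 'A'] -- three failed tries, then rollback
```

Whether confirmation is automatic (a watchdog-fed health check) or manual (the user presses a button) is a product decision; the fallback logic is the same.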
But also note that automatic updates are a huge security risk, as a compromise of the update deployment system can be used to fully compromise and/or brick all installed devices.
- look at how many versions of say MbedTLS come out, and the recent ones (v3+) are a big job to integrate into a product which has v2
I am assuming that you are talking about versions due to security fixes here? If so: How does not having a server help with that?
- only a proper heavy duty server (e.g. Centos, Linux and Apache, NGINX etc) should be exposed, and the admin and security patching of such is much easier
It seems like you are suggesting that an IoT device that only acts as a client is not exposed? Why do you think that?
Really think about this.
If there is a buffer overflow in the IP layer, say, how does not having a server on the device prevent exploitation of that buffer overflow?
- the server can be protected against DOS with e.g. Cloudflare
Maybe. But be aware that that can also be a huge security risk, especially if this implies TLS termination by Cloudflare (Cloudflare is a very juicy target).
- IOT box having both client and server is a whole load of RAM; MbedTLS client alone is ~60k
Of per-connection state, you mean? Well, yeah, that's a thing to consider, but certainly not a reason to never have both.
- the client mode means the entire "connection" which the IOT box has to support is private and hidden, which enables operation with no software updates for years or decades
No, it most certainly does not mean that. If you want to estimate IT security risks, you have to think in terms of attack surface, i.e., what parts of the code in your system are reachable by an attacker, and how difficult it is to get into a position that would allow that attack to be performed.
Take my example above: The low-level IP stack does not care about "client" or "server" at all, all it is concerned with are IP packets. Just because you happen to only use one TCP socket to establish an outbound connection, doesn't mean that someone who can send IP packets to your device can't exploit a vulnerability in that code.
Now, you might be thinking: "But how would anyone possibly send IP packets to the device?" Well, for one, even from the public internet through a stateful firewall that blocks incoming connections, that is way easier than your wording above would suggest.
A firewall isn't a magic device that prevents packets from all unauthorized sources from reaching your device. All it does is that it matches a bunch of fields in a packet against fields it has seen in other packets to determine whether the packet might belong to an established connection. But the entropy of those fields isn't all that high. With TCP over IPv4, that's 32 bit each for the IP addresses of the two endpoints, 16 bits each for the port numbers of the two endpoints, and the 32 bit sequence number.
So ... 128 bits? That should be enough to prevent someone from just blindly guessing the values to get a packet through the firewall, shouldn't it? Well, if these indeed were purely random 128 bits, sure. But of course, they aren't. In your IoT-device-connects-to-manufacturer's-server scenario, the IP address of the server is public knowledge, so trivial to "guess". The port number of the server is also public knowledge, so it doesn't help either. Now, if the attacker has no idea where the devices that they are trying to attack are on the network, then you could consider the client IP address as sort-of random. But that isn't necessarily true: it might not be a secret that someone has one of your devices installed, and an attacker running a targeted attack against them might easily learn the (public) IP address where the device is to be found (be it a business with static addresses, or by getting the victim to load some page or image from a web server the attacker controls, or by getting them to send an email where the email provider puts their address into the headers ...). So you really have to assume that the client IP address isn't exactly a secret either.
Well, then we at least have 32 bits of random sequence number, right? Well ... for one, you'd have to make sure that the initial sequence number is really random (especially on embedded systems with bad randomness sources, that can be a problem, though most implementations nowadays at least try to make the ISN unpredictable). But also, the firewall cannot do an exact match, because it has to allow for reordering and packet loss within the transmission window. I would be surprised if any firewall let through less than 64 KiB of window, thus reducing the remaining entropy of the sequence number to 16 bits. And I would not be surprised if many firewalls didn't check sequence numbers at all.
So ... well, I guess the only somewhat reliable entropy is the client port number, then? That also depends on the client actually randomizing the port number, of course. And also, if your code establishes multiple connections in short order, chances are that the firewall takes a while to time them all out, so multiple port numbers might actually be let through.
Also, mind you that NAT is not a firewall, and chances are that NAT is less strict than a firewall when matching connections.
So ... you are relying on 16 bits of randomness to prevent an attacker from exploiting a vulnerability in your IP stack. I.e., on average 32768 packets to compromise your device. So, over a Gbit/s fiber link, that is a matter of milliseconds.
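To put numbers on this, here is a back-of-the-envelope sketch. The 16 effective bits, the minimal 40-byte TCP/IPv4 packet, and the 1 Gbit/s link are the assumptions from the scenario above, not measurements:

```python
# Back-of-the-envelope for the scenario in the text: the only entropy left is
# the 16-bit client port (server IP and port are public, client IP has been
# learned, and the sequence number is unchecked or swallowed by the window).

effective_bits = 16
guesses_avg = 2 ** effective_bits // 2        # expected tries: half the space
print(guesses_avg)                            # 32768

pkt_bits = 40 * 8                             # minimal IPv4+TCP header, no payload
link_bps = 10 ** 9                            # 1 Gbit/s
seconds = guesses_avg * pkt_bits / link_bps
print(round(seconds * 1000, 1))               # 10.5 (milliseconds)
```

That is the *average* cost of blindly spraying spoofed packets through the firewall; a determined attacker can simply cover the whole 2^16 space in roughly double that time.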
And all of that assumes that the attacker can't eavesdrop the legitimate connection, which is also a risky assumption to make.
And that is just one of many possible attack vectors. Depending on the vulnerability, an attacker might also be able to use other devices behind the firewall to attack your device. If the user has a smartphone or PC on the same network (which would certainly be common in home setups), an attacker might get the user to load some web page that then uses the numerous JavaScript APIs in the browser to send packets to your device, without those packets ever even passing through a firewall that might exist between the public internet and your device.
- one would use TLS for data encryption, with the server certificate having a 100 year life (one can argue that one may as well just run aes256 with a shared key; there is no difference between a shared key and a long lived certificate, is there?)
That depends on the cipher suites that you are using with TLS. Best practice nowadays would be to only use cipher suites with forward secrecy, i.e., cipher suites that use ephemeral (elliptic-curve) Diffie-Hellman key agreement, so that a compromise of the server's secret key does not allow decryption of past connections.
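With Python's standard `ssl` module, for example, restricting a context to forward-secret suites is a one-liner. A sketch, assuming an OpenSSL-backed build; note that TLS 1.3 suites are all forward-secret anyway and are not affected by the cipher string, which only constrains TLS 1.2 and below:

```python
import ssl

# Client context with sane defaults (certificate verification, hostname check).
ctx = ssl.create_default_context()

# For TLS 1.2, allow only ephemeral (EC)DH key agreement with AEAD ciphers.
ctx.set_ciphers("ECDHE+AESGCM:ECDHE+CHACHA20:DHE+AESGCM")

# Every negotiable TLS 1.2 suite now starts with an ephemeral key exchange.
names = sorted(c["name"] for c in ctx.get_ciphers()
               if c["protocol"] == "TLSv1.2")
print(names)
```

The equivalent knob exists in MbedTLS (`mbedtls_ssl_conf_ciphersuites`) and every other serious TLS library; the point is to configure it deliberately rather than accept whatever the default negotiates.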
Also, a PSK that's used directly for payload encryption is at high risk of IV reuse, and it is difficult to reliably protect against replay attacks.
Or more generally: If you have to ask this question, do not even think about designing your own cryptographic protocol, including "just encrypt the data before transmitting it". You are almost guaranteed to fuck it up in a way that allows trivial compromise of the encryption.
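The IV-reuse pitfall above is easy to demonstrate. This sketch uses a toy hash-based keystream standing in for any stream cipher or AES-CTR (`derive_keystream` is a made-up helper, not a real cipher): encrypt two messages under the same key and nonce, and their XOR leaks; a known plaintext then decrypts the other message outright, with no key recovery needed.

```python
import hashlib

def derive_keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream (stand-in for AES-CTR): SHA-256 in counter mode."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def encrypt(key, nonce, plaintext):
    ks = derive_keystream(key, nonce, len(plaintext))
    return bytes(p ^ k for p, k in zip(plaintext, ks))

key, nonce = b"shared-secret-key", b"fixed-iv"   # the mistake: nonce reused
m1 = b"turn heating off"
m2 = b"turn heating on!"
c1 = encrypt(key, nonce, m1)
c2 = encrypt(key, nonce, m2)

# Attacker XORs the two ciphertexts: the keystream cancels out entirely.
xor = bytes(a ^ b for a, b in zip(c1, c2))
assert xor == bytes(a ^ b for a, b in zip(m1, m2))

# Knowing (or guessing) m1, the attacker recovers m2 without the key.
recovered = bytes(x ^ p for x, p in zip(xor, m1))
print(recovered)  # b'turn heating on!'
```

A proper AEAD mode with unique nonces (and a replay window) avoids this, which is exactly what TLS gives you for free.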
The IOT box would call up the server say every 10 mins to collect any config changes, and since it is behind NAT it would be very hard to hack it. NAT can be penetrated using a couple of methods but AFAIK both involve - at a minimum - the attacker gaining control of the server which the IOT box periodically calls up,
No, that is not necessary. Actually, that wouldn't even count as "hacking the NAT", as the server that the client legitimately connects to obviously can send packets to the client device, that is just how a NAT is expected to be used. Of course, there will also be potential attack surfaces that are reachable only from that server, but that would just be using the NAT the way it is intended to be used, not in any way "hacking" it.
Really, the idea that NAT "prevents unsolicited connections" is just a myth. It is a very commonly believed myth, but it still is just a myth.
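A toy model makes the point concrete. This assumes an endpoint-independent ("full cone") mapping; real NAT behaviour varies, but the sketch shows why a mapping is not a firewall rule: once the client's outbound packet creates it, the NAT forwards inbound packets from *any* external source to the device.

```python
# Toy endpoint-independent ("full cone") NAT: the mapping is keyed only on the
# internal (ip, port), so once it exists, ANY external host can reach the device.

class FullConeNAT:
    def __init__(self):
        self.mappings = {}     # external port -> internal (ip, port)
        self.next_port = 40000

    def outbound(self, src):
        """Device sends a packet out: allocate (or reuse) an external port."""
        for ext, internal in self.mappings.items():
            if internal == src:
                return ext
        ext = self.next_port
        self.next_port += 1
        self.mappings[ext] = src
        return ext

    def inbound(self, ext_port, from_host):
        """A packet from anywhere on the internet arrives at ext_port."""
        return self.mappings.get(ext_port)   # forwarded regardless of sender!

nat = FullConeNAT()
ext = nat.outbound(("192.168.1.50", 5000))     # IoT box calls the mfg server
print(nat.inbound(ext, "mfg-server.example"))  # ('192.168.1.50', 5000)
print(nat.inbound(ext, "attacker.example"))    # ('192.168.1.50', 5000) too
```

Stricter (address- or port-dependent) NATs narrow this down, but which behaviour you get is up to whatever router the customer happens to own, which is exactly why it must not be part of your security model.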
which is obviously possible but with good admin should be hard, otherwise the internet could never exist
Well, in any case, it is strongly recommended to have defense in depth. It is generally a very bad idea to have a central point of (security) failure, because that makes for a juicy target. While it might be true that it is easier to secure one server than thousands of appliances (which doesn't work anyhow, because you still need to secure the appliances, see above), that one server is also a far more interesting target for attackers than all those appliances that might not be that easy to find otherwise.
Then if the IOT box is hard to trash (a simple way would be to have no way to remotely re-flash it) you have pretty good security.
For one, no, that doesn't mean that you have pretty good security. If your heating controller can be made to burn down the house, then it is irrelevant whether it is possible to remotely re-flash it.
But also ... if an attacker can obtain remote code execution through some vulnerability, how do you prevent them from just rewriting the flash contents, assuming the MCU supports self-flashing? Unless you are talking about an MCU with MMU and unprivileged processes and stuff ... which still assumes correctness of the code that's enforcing that security boundary.
It would be highly risky to build this around some commercial service. These have a habit of ditching protocols and needing a big redesign.
Agreed.
Obviously the server will need careful firewalling.
If your server needs firewalling, you have already failed.