The Wikipedia article on OS-level virtualization contains a table comparing some common OS-level virtualization implementations, but doesn't mention their security implications.
chroot() is only a filesystem isolation mechanism, not a security one. (Think of a lab whose door or airlock cannot be locked.) It is very useful for e.g. building native second-stage compilers and base C libraries (see LinuxFromScratch and its general instructions), because builds and installations within the chrooted environment are not affected by changes to the host system outside the chroot. It is particularly nice when the build uses CMake or autotools, which probe the build host to see what it supports. Generally, Linux build tools try to behave very nicely, and usually don't do anything that could break them out of a chroot.
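As a minimal sketch of what entering a chroot involves (the helper name enter_chroot is made up for illustration, and the call needs root or CAP_SYS_CHROOT on Linux): chroot() only changes where path lookups start. It does not change the working directory or close any descriptors, so the caller has to take care of those.

```c
/* Sketch only. On glibc, chroot() is declared in <unistd.h>; building with a
   strict -std=c11 may additionally require defining _DEFAULT_SOURCE. */
#include <unistd.h>

/* Hypothetical helper: make new_root the root directory of this process.
   Returns 0 on success, -1 with errno set on failure. */
int enter_chroot(const char *new_root)
{
    /* Move into the new root first, so the working directory cannot end up
       outside the chroot (a working directory left outside is a classic
       escape route). */
    if (chdir(new_root) == -1)
        return -1;

    /* Only path resolution changes here; already-open descriptors keep
       referring to whatever they referred to before. */
    if (chroot(".") == -1)
        return -1;

    return 0;
}
```

A build wrapper would call enter_chroot() and then exec the build tool. Everything else (descriptors, privileges, syscalls) is left exactly as it was, which is why this counts as isolation rather than security.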
Leaking descriptors that refer to directories to the chrooted process, or passing one via a Unix domain socket (in an SCM_RIGHTS ancillary message), is a common way to escape a chroot. Moving directories from inside the chroot to outside it is another trick, but it can be avoided if the chroot is a separate partition (because you cannot move, rename, or hard-link things across partitions: they have to be copied instead).
I personally think the POSIX O_CLOEXEC flag should be the default for all open() variants, and especially for C library opendir() implementations, because of the risk of leaking descriptors to child processes executing helper programs; it's super-nasty for multithreaded programs where multiple threads open and close files, and at least one of the threads forks child processes and executes helper programs. One can always duplicate the desired descriptors in the child process, and clear the close-on-exec flag (FD_CLOEXEC) on the duplicate using fcntl(descriptor, F_SETFD, 0), thus allowing the descriptor to survive the exec() boundary, safely avoiding fork()+exec() races against other threads in the same process. Setting the flag after opening is always at risk of such races and of leaking the descriptor.
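The open-with-O_CLOEXEC, then duplicate-and-clear pattern could look roughly like this. The helper names open_cloexec and dup_inheritable are invented for the sketch; only open(), dup(), fcntl() and close() are real API.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open read-only with close-on-exec set atomically,
   so no other thread's fork()+exec() can inherit the descriptor. */
int open_cloexec(const char *path)
{
    return open(path, O_RDONLY | O_CLOEXEC);
}

/* Hypothetical helper, meant to run in the child between fork() and exec():
   returns a duplicate of fd that will survive exec(), or -1 on error. */
int dup_inheritable(int fd)
{
    int copy = dup(fd);
    if (copy == -1)
        return -1;

    /* dup() already leaves FD_CLOEXEC clear on the new descriptor; the
       explicit call mirrors the text and also covers duplicates created
       with F_DUPFD_CLOEXEC. */
    if (fcntl(copy, F_SETFD, 0) == -1) {
        close(copy);
        return -1;
    }
    return copy;
}
```

Since the flag is per descriptor, the original stays close-on-exec in the parent and in every other child; only the child that explicitly asks for an inheritable copy passes it on.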
I've always preferred the FreeBSD jail mechanism. Thank you for implementing them, bsdphk!
For similar functionality in Linux, you need to combine namespaces (for the isolation) with cgroups (for resource limits).
If you need to isolate an unprivileged process, you can bolster chroot() with a seccomp filter that disables most syscalls, but that approaches security in exactly the wrong way: starting from an insecure situation and trying to plug up the known holes. The correct approach is to start by locking everything down, and only allow known safe operations. But be very, very suspicious about what you consider safe.
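One way to see the deny-by-default idea in action, assuming Linux with seccomp support, is seccomp strict mode: after the prctl() call only read(), write(), _exit() and sigreturn() are permitted, and anything else gets the process killed. The function name below is made up for the sketch, and a real sandbox would more likely use a filter with an explicit allowlist.

```c
/* Linux-only sketch; assumes the kernel was built with seccomp support. */
#include <linux/seccomp.h>
#include <signal.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if a child in strict mode was killed for
   making a syscall outside the allowlist, 0 if it somehow survived, and
   -1 if fork() failed or strict mode was unavailable. */
int strict_mode_kills_forbidden_syscall(void)
{
    pid_t child = fork();
    if (child == -1)
        return -1;

    if (child == 0) {
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0)
            _exit(2);          /* not restricted yet, so _exit() still works */
        syscall(SYS_getpid);   /* not on the allowlist: kernel sends SIGKILL */
        syscall(SYS_exit, 0);  /* reached only if the call above was allowed */
    }

    int status;
    if (waitpid(child, &status, 0) != child)
        return -1;
    if (WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL)
        return 1;
    if (WIFEXITED(status) && WEXITSTATUS(status) == 2)
        return -1;             /* seccomp strict mode could not be enabled */
    return 0;
}
```

Even a harmless getpid() is fatal here, which is the point: nothing is allowed until someone has decided it is safe.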
Nowadays, I like to use virtual machines for stuff I don't really trust. There is a bit of overhead (mostly memory use), but it is lightweight enough and nearly effortless. It also makes post-crapout cleanup much easier: I checkpoint the VM before testing the stuff, and revert to the checkpoint afterwards.
If I were to suspect something evil like a virus or a worm, then I'd use physically separate dedicated hardware, and afterwards completely scrub the storage used without ever mounting it.