As far as I can tell cp doesn't have any way to check that the initial stat call resolved to the "correct" file.
It definitely resolves to the inode named by the command line argument. Thus, it cannot be "incorrect".
Put another way, if it opened each input file one at a time,
fstat()ed them, and either read their entire contents into memory, or kept that same descriptor open, before writing any of the output files, then a single
fstat() would suffice. Because it can run out of file descriptors, it instead checks each input file first, decides what to do, and then
re-opens each input file when it is ready to do the copy. The second fstat() is necessary to ensure the inode that was opened is the same as it initially
stat()ed/
fstat()ed.
But what checks / decisions does cp make based on the initial stat call?
Instead of me making some assertions, let's examine that. (Besides, current coreutils-9.5 is
quite different to discussed above.)
Github has a mirror of the coreutils sources
here, with the main
cp source file being
src/cp.c. Note that those links default to the development head, so you might wish to select a specific tag. The current newest release version is 9.5, and its
src/cp.c is here. (You'll also want to look in
gl/lib/ for additional files.) Its
main() handles command-line options, with actual argument examination and copying done in
src/cp.c:do_copy().
The first stat done is on the very last argument, using
gl/lib/targetdir.c:target_directory_operand(). This is needed to verify that the last argument specifies a target directory for the source files.
If there is a target directory, each source file name is used to construct the target file name, and
src/copy.c:copy() is called to copy that file. It in turn uses
src/copy.c:copy_internal() to do the actual work. That in turn uses
src/copy.c:follow_fstatat() to stat the file, to make sure it is not a directory. If there are more than two arguments,
x->src_info is set, and the stat info is added to the hash table using
gnulib:lib/file-set.c:record_file(). Here, previously seen stat's are also used to detect the case where the same source file appears more than once via
gnulib:lib/file-set.c:seen_file(). Note that these use
both the file name and the stat info block, not just the stat info, to detect a repeated source file.
Progressing through that file, we find an use of
src/copy.c:same_file_ok(). Its comment explains the reasons for its use: when the target file in the target directory already exists, it ensures it is not the same inode as the source file, because trying to copy the source to the target would destroy the source file. (This can happen with for example a bind mount.) You'll see related code that allows
cp to copy a file over a symlink that points to the original file, for example.
Note, however, that this compares the source file to the tentative destination file.
Later on, we see two additional
gnulib:lib/file-set.c:seen_file() checks. (The third case, first one in that function, was to detect a repeated source file.) The first of these detects the case where a later copy would overwrite an earlier target file, as would happen if you had
rm -rf a b c ; mkdir a b c ; touch a/f b/fand then did
cp a/f b/f cThe result is that
c/f will be a copy of
a/f, and
cp will display an error:
cp: will not overwrite just-created 'c/f' with 'b/f'.
The second case checks that if an earlier copy caused a symlink to be created as a destination, that symlink will not be followed during creating further destination copies. This, too, involves checking a destination file against source files.
When above checks have all passed, and the source file is a regular file, the actual copy is done by the
src/copy.c:copy_reg() function. This is where the current equivalent of the check mentioned in the thread starter post is, lines
1265-1274, via
psame_inode() that is just a wrapper around the
gnulib:lib/same-inode.h:PSAME_INODE() macro.
This is how the
copy_reg() function verifies that the source file, after opening, is the same one the code used
follow_fstatat() on.
If we look at all the decisions above, if all the code did was to support copying regular files, then it could have just opened the source file (some kind of
follow_openat() followed by
fstat() on the open descriptor), yes. But the same code is actually used in
cp,
mv, and
install, ensuring they all use the same logic (well, code!) for resolving source and target file names, and it supports quite a lot of options from creating symlinks or hardlinks to copying entire directory trees.
So, for regular files, the first stat() (filename-based
follow_fstatat()) is to determine if the source file is a file or directory, that it isn't the same inode as its corresponding copy is (in which case copying would destroy the source file), and helps (along with filename) to detect duplicate source files. The second
fstat() after opening the source file "only" checks that it matches the one
follow_fstatat() resolved to. Technically, it
could have opened the file from the get go, but because coreutils shares this part across multiple utilities and there are many options that change the behaviour, it does not.
Is it correct to say the two stats are needed to avoid TOCTOU? I think so, yes.
One can argue that it is because of code complexity instead (because just opening the source file once, and keeping that descriptor open, would mean one
fstat() call would suffice). Or that it is there to ensure
follow_fstatat() resolves to the same file the later
open() does.
To me, the "verify still same" is a pattern I always apply to avoid TOCTOU, even if not security sensitive, when I cannot sensibly keep a descriptor or similar kernel-vetted handle open from the get go; and that not doing so is a
bug. Am I wrong? I could be, but I don't think so. I do have some practical experience in security-sensitive systems programming and oddball exploits and attacks, and consider even Apache SuEXEC unsafe (for reasons I've described
here and
here); if it matters to you, you can start with those and compare to what other security specialists say on such matters.
The check shouldn't hurt of course. If the file changes during execution something is not right. And it can reduce the window of vulnerability for some attacks, but not to zero. It's just sort of intrinsically unsafe for an administrator to manipulate files in a world writeable location while other users are active.
Oh, I definitely agree. However, on the kinds of systems I've worked with, there are many different administrators and levels of "administrator".
The typical TOCTOU the above check detects is not an intentional attack, but an accident when two or more persons work on the same files at the same time. Or when a human user accidentally interferes with a maintenance script copying some files, perhaps.
I prefer to use group-based access policies, because that way the "last owner user" is recorded as the file owner. (I provide some additional tools to change ownership between users belonging to that group for transferring "ownership". It's more like "who admits to being responsible for it", really. In Linux, for local files (not NFS or other shared volumes), one can use file leases to detect concurrent editing attempts, where the file to be modified/edited is copied to a temporary no-access-to-other-users directory. Not so much for "attacks" per se, but to avoid confusion due to concurrent edits.)
Thus, a typical scenario might be in some configuration directory only "site admins" or above can access, but two "site admins" making conflicting operations at the same time. An example that comes to mind is when one admin starts to prep a new vhost or subsite on some web server, pops for refilling their coffee/tea/water, and another one notices in the mean time that the vhost/subsite directory tree has been created but not copied to yet, so starts doing that to be helpful. In the mean time, the original person comes back and notices they forgot to press Enter to copy the contents to it.
(
"Yea, but that's the reason we have ticketing systems!" – except when you're training someone and what you said and what you did is not in perfect sync what they see. And so on. Accidents always happen.)