Seth Woolley's Blog

Occasional Musings

Wed Mar 16 19:40:02 2011 -- avoiding tmp files

avoiding tmp files

In an IRC chat, an expert Linux admin admitted, chagrined, that every now and then they catch themselves making a rookie mistake:

sort /tmp/broken | uniq > /tmp/broken

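Because sort and uniq run in parallel in a pipeline, the shell truncates /tmp/broken when it sets up uniq's > redirection, usually before sort has had a chance to read it. Since both commands act on the same file, sort's input gets clobbered.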
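The clobbering is easiest to see without the pipe at all. In this minimal sketch (POSIX sh assumed; the scratch directory and file contents are illustrative, so the real /tmp/broken is never touched), the shell opens and truncates the target of > before sort even starts, so sort reads an empty file every time. With the pipe, the same truncation races against sort's own open, which is why the piped version's result depends on timing.

```shell
#!/bin/sh
# Demonstrate that "> file" truncates the file before the command reads it.
# The scratch directory is a demo assumption, not part of the original recipe.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
f="$dir/broken"

printf 'b\na\nb\n' > "$f"   # three lines, one duplicate

# The shell truncates "$f" while setting up the redirection,
# before sort ever opens it, so sort sees an empty file.
sort "$f" > "$f"

wc -c < "$f"                # size after the "sort": 0 bytes
```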

Another person asked if there's a way to do it without a tmp file like this one:

sort /tmp/broken | uniq > /tmp/broken2 &&
  mv /tmp/broken2 /tmp/broken
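For comparison, a slightly more careful sketch of the tmp-file approach. mktemp picks an unused name (the broken.XXXXXX template and the scratch directory are illustrative assumptions), and creating the temporary next to the target keeps both on the same filesystem, so mv is a plain rename that swaps the new contents in atomically. As with the && version, a pipeline's exit status is that of its last command, so only a uniq failure is caught here.

```shell
#!/bin/sh
# Sketch of the tmp-file approach with a unique scratch name.
# A demo directory stands in for /tmp so the example is self-contained.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
target="$dir/broken"
printf 'b\na\nb\na\n' > "$target"

# Create the temporary next to the target: same filesystem, so the
# final mv is an atomic rename rather than a copy.
tmp=$(mktemp "$dir/broken.XXXXXX") || exit 1

# Replace the target only if the pipeline (i.e. uniq) succeeds.
if sort "$target" | uniq > "$tmp"; then
  mv "$tmp" "$target"
else
  rm -f "$tmp"
fi

cat "$target"
```

One side effect worth knowing: mktemp creates the file with mode 0600, so after the mv the target has those permissions rather than its original ones.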

Since this is a Unix filesystem, we can abuse the inode's link count (the number of directory entries naming it) and its open-file reference count to avoid a tmp file. I do, however, have to use fuser to avoid a race between the two processes the pipeline spawns in parallel, so that the rm happens only after sort has opened the file:

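sort /tmp/broken |
  (
    # wait until some process (sort) has /tmp/broken open
    until fuser /tmp/broken >/dev/null 2>&1; do sleep .1; done;
    rm /tmp/broken;         # drop the name; sort keeps the inode
    uniq > /tmp/broken;     # write the result to a brand-new inode
  )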

sort and the subshell start in parallel, so there are two cases:

If sort has already opened the file, fuser exits with status 0, the loop ends, and the rm is issued.

If sort has not opened the file yet, fuser exits with status 1, so the loop sleeps for 0.1 seconds and checks again until sort has it open. There's no race at the other end either: sort has to read all of its input before it can emit its first line, and it keeps the file open until it has read it all. Whatever happens after that (including sort blocking on a full pipe while uniq waits for the rm) no longer depends on the file's name. One caveat: with a very small file, sort may open, read, and close it before the first fuser check runs, in which case fuser never sees it open and the loop never exits.

Once the file is removed, its directory entry is gone and the inode's link count drops to zero. Because fuser has confirmed that sort has the file open, the inode doesn't go away: its open-file reference count is still nonzero, so we are left with an orphaned inode that the filesystem will reclaim once the last process using it closes it.
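The inode-lifetime rule is easy to observe from the shell. In this sketch, fd 3 stands in for sort's open file (the scratch directory, the file contents, and the choice of fd 3 are illustrative assumptions): after rm the name is gone, yet the data is still readable through the descriptor until it is closed.

```shell
#!/bin/sh
# An unlinked file stays alive while a descriptor still refers to it.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
f="$dir/broken"
printf 'still here\n' > "$f"

exec 3< "$f"        # open the file on fd 3 (like sort holding its input)
rm "$f"             # link count drops to 0; only fd 3 keeps the inode

[ -e "$f" ] || echo "name is gone"
data=$(cat <&3)     # ...but the contents are still readable
echo "$data"        # prints: still here

exec 3<&-           # last reference closed; the filesystem frees the inode
```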

uniq receives the orphaned inode's data after it has been filtered through sort, and the > redirection creates a new inode with a new directory entry, /tmp/broken, pointing to it. Once sort has read the orphaned inode and closed it, the filesystem frees that inode and its contents. The new inode that uniq wrote contains the expected data.
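The whole sequence can be checked end to end. Note that this sketch swaps in a different way of guaranteeing the open-before-rm ordering, not the fuser loop above: the shell opens the file as standard input of a { ...; } group before running anything inside it, which removes both the race and the polling. Comparing inode numbers with ls -i shows that the rewritten file really is a new inode; the scratch directory and contents are illustrative assumptions.

```shell
#!/bin/sh
# Fuser-free variant: the shell opens the file as the group's stdin
# before any command in the group runs, so rm cannot beat sort to it.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
f="$dir/broken"
printf 'b\na\nb\na\n' > "$f"

old_ino=$(ls -i "$f" | awk '{print $1}')

{
  rm "$f"              # drop the name; the group's stdin still holds the inode
  sort | uniq > "$f"   # sort reads the orphan from stdin; > creates a new inode
} < "$f"

new_ino=$(ls -i "$f" | awk '{print $1}')
echo "old inode $old_ino, new inode $new_ino"
cat "$f"
```

Because the old inode is still open while the new file is created, the two inode numbers cannot coincide.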

It's kind of cheating, because one might think of the orphaned inode as a temporary file that the operating system cleans up later. That leads to the philosophical question: is an orphaned inode with file contents a "file" if it has no file name?

Thu Apr 7 22:23:27 2011 -- Comment not that hard -- by ben

not that hard

sort /tmp/broken | uniq > /tmp/broken.new && mv /tmp/broken.new /tmp/broken

Fri Mar 18 02:24:42 2011 -- Comment more tricks -- by swoolley

more tricks

Also, using shell vars:

VAR="$(sort /tmp/broken)"; printf '%s\n' "$VAR" | uniq > /tmp/broken

Or, using the shell's built-in echo, which isn't subject to the kernel's command-line length limit but insecurely treats a leading -n in the data as an option:

VAR="$(sort /tmp/broken)"; echo "$VAR" | uniq > /tmp/broken
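One detail the variable trick depends on: command substitution strips every trailing newline, so the stored text has to be re-terminated when it is printed back out (an empty input file then comes back as a single empty line, just as with the echo version). A small sketch that runs the variable version end to end; the scratch directory and contents are illustrative assumptions.

```shell
#!/bin/sh
# The shell-variable version: sort has fully finished inside $(...)
# before the > redirection truncates the file, so there is no race.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
f="$dir/broken"
printf 'b\na\nb\n' > "$f"

VAR="$(sort "$f")"                   # the trailing newline is stripped here
printf '%s\n' "$VAR" | uniq > "$f"   # re-add the final newline

cat "$f"
```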
