safecat(1)                                                          safecat(1)



NAME
       safecat - safely write data to a file


SYNOPSIS
       safecat tempdir destdir


INTRODUCTION
       safecat is a program which implements Professor Daniel Bernstein's
       maildir algorithm to copy stdin safely to a file in a specified
       directory.  With safecat, the user is offered two assurances.  First,
       if safecat returns a successful exit status, then all data is
       guaranteed to be saved in the destination directory.  Second, if a
       file exists in the destination directory, placed there by safecat,
       then the file is guaranteed to be complete.

       When saving data with safecat, the user specifies a destination
       directory, but not a file name.  The file name is selected by safecat
       to ensure that no filename collisions occur, even if many safecat
       processes and other programs implementing the maildir algorithm are
       writing to the directory simultaneously.  If particular filenames are
       desired, then the user should rename the file after safecat completes.
       In general, when spooling data with safecat, a single, separate process
       should handle naming, collecting, and deleting these files.  Examples
       of such a process are daemons, cron jobs, and mail readers.


RELIABILITY ISSUES
       A machine may crash while data is being written to disk.  For many
       programs, including many mail delivery agents, this means that the
       data will be silently truncated.  Using Professor Bernstein's maildir
       algorithm, every file is guaranteed complete or nonexistent.

       Many people or programs may write data to a common "spool" directory.
       Systems like mh-mail store files using numeric names in a directory.
       Incautious writing to files can result in a collision, in which one
       write succeeds and the other appears to succeed but fails.  Common
       strategies to resolve this problem involve creation of lock files or
       other synchronizing mechanisms, but such mechanisms are subject to
       failure.  Anyone who has deleted $HOME/.netscape/lock in order to
       start netscape can attest to this.  The maildir algorithm is immune to
       this problem because it uses no locks at all.


THE MAILDIR ALGORITHM
       As described in maildir(5), safecat applies the maildir algorithm by
       writing data in six steps.  First, it stat()s the two directories
       tempdir and destdir, and exits unless both directories exist and are
       writable.  Second, it stat()s the name tempdir/time.pid.host, where
       time is the number of seconds since the beginning of 1970 GMT, pid is
       the program's process ID, and host is the host name.  Third, if stat()
       returned anything other than ENOENT, the program sleeps for two
       seconds, updates time, and tries the stat() again, a limited number of
       times.  Fourth, the program creates tempdir/time.pid.host.  Fifth, the
       program NFS-writes the message to the file.  Sixth, the program link()s
       the file to destdir/time.pid.host.  At that instant the data has been
       successfully written.

       In addition, safecat starts a 24-hour timer before creating
       tempdir/time.pid.host, and aborts the write if the timer expires.
       Upon error, timeout, or normal completion, safecat attempts to
       unlink() tempdir/time.pid.host.
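
       The six steps above can be sketched in shell.  This is only an
       illustration under stated assumptions, not safecat's actual code: the
       function name maildir_write and the retry limit of 5 are hypothetical,
       and the sketch omits the careful "NFS-write" error checking and the
       24-hour timer.

```shell
# maildir_write TEMPDIR DESTDIR < data
# Hypothetical helper: a sketch of the six steps, not safecat's actual code.
maildir_write() {
    tempdir=$1
    destdir=$2

    # Step 1: both directories must exist and be writable.
    for dir in "$tempdir" "$destdir"; do
        [ -d "$dir" ] && [ -w "$dir" ] || return 1
    done

    # Steps 2 and 3: choose time.pid.host; if the name is taken, sleep two
    # seconds and retry. The limit of 5 attempts is an assumption.
    attempts=0
    while :; do
        name="$(date +%s).$$.$(uname -n)"
        [ -e "$tempdir/$name" ] || break
        attempts=$((attempts + 1))
        [ "$attempts" -lt 5 ] || return 1
        sleep 2
    done

    # Step 4: create the temporary file; noclobber (set -C) makes this fail
    # rather than overwrite if another writer created the name first.
    ( set -C; : > "$tempdir/$name" ) 2>/dev/null || return 1

    # Step 5: copy stdin into the temporary file.
    if ! cat >> "$tempdir/$name"; then
        rm -f "$tempdir/$name"
        return 1
    fi

    # Step 6: link the file into destdir, then drop the temporary name.
    if ln "$tempdir/$name" "$destdir/$name"; then
        status=0
    else
        status=1
    fi
    rm -f "$tempdir/$name"
    if [ "$status" -eq 0 ]; then
        printf '%s\n' "$name"
    fi
    return "$status"
}
```

       Because the file only becomes visible in destdir through the final
       link, a reader of destdir never sees a partially written file.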


EXIT STATUS
       An exit status of 0 (success) implies that all data has been safely
       committed to disk.  A non-zero exit status should be considered to
       mean failure, though there is an outside chance that safecat wrote the
       data successfully, but didn't think so.

       Note again that if a file appears in the destination directory, then
       it is guaranteed to be complete.

       If safecat completes successfully, then it will print the name of the
       newly created file (without its path) to standard output.
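
       A caller can combine the exit status and the printed name as in the
       following sketch.  The function name spool is hypothetical; the only
       safecat behavior assumed is what this section states: exit status 0 on
       success, and the bare file name on standard output.

```shell
# spool TEMPDIR DESTDIR < data
# Hypothetical wrapper: prints the full path of the new file on success.
spool() {
    # Any non-zero exit status from safecat is treated as failure.
    name=$(safecat "$1" "$2") || { echo "safecat failed" 1>&2; return 1; }
    # safecat prints the name without its path, so prepend destdir.
    printf '%s/%s\n' "$2" "$name"
}
```

       For example, path=$(some_command | spool tmp new) leaves the location
       of the spooled file in $path, where some_command, tmp, and new are
       placeholders.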


SUGGESTED APPLICATIONS
       Exciting uses for safecat abound, obviously, but a word may be in
       order to suggest what they are.

       If you run Linux and use qmail instead of sendmail, you should
       consider converting your inbox to maildir for its superior
       reliability.  If your home directory is NFS mounted, qmail forces you
       to use maildir.  On the downside, the lovely tool procmail, which
       filters your spam, does not know maildir.  Rather than running the
       patched procmail, you might consider using safecat to deliver to your
       inbox.  That allows you to use the latest procmail without waiting for
       the maildir patches to be applied to it.

       (Note: the previous paragraph was written before procmail started
       handling maildir delivery.  Since maildir delivery has been added, my
       point is made stronger!  Procmail's maildir support does not comply
       with Dan's algorithm, and so does not offer the reliability promised
       by maildir delivery.  Procmail plus safecat has always offered
       reliable maildir delivery.  Another victory for modularity!)

       If you write CGI applications to collect data over the World Wide Web,
       you might find safecat useful.  Web applications suffer from two major
       problems.  Their performance suffers from every stoppage or bottleneck
       in the internet; they cannot afford to introduce performance problems
       of their own.  Additionally, web applications should NEVER leave the
       server and database in an inconsistent state.  This is likely,
       however, if CGI scripts directly frob some database--particularly if
       the database is overloaded or slow.  What happens when users get bored
       and click "Stop" or "Back"?  Maybe the database activity completes.
       Maybe the CGI script is killed, leaving the DB in an inconsistent
       state.

       Consider the following strategy.  Make your CGI script dump its request
       to a spool directory using safecat.  Immediately return  a  receipt  to
       the  browser.  Now the browser has a complete guarantee that their sub-
       mission is received, and the perceived performance of your web applica-
       tion is optimal.

       Meanwhile, a spooler daemon notices the fresh request, snatches it and
       updates the database.  Browsers can be informed that their request
       will be fulfilled in X minutes.  The result is optimal performance
       despite a capricious internet.  In addition, users can be offered
       nearly 100% reliability.
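
       The spooler side of this strategy might look like the sketch below.
       The function name drain_spool and the handler convention are
       assumptions made for illustration.  Since safecat only places complete
       files in the destination directory, the spooler does not need to check
       whether a file is still being written.

```shell
# drain_spool SPOOLDIR HANDLER [ARGS...]
# Hypothetical consumer: runs HANDLER with each spooled file as its last
# argument, and deletes the file only if HANDLER succeeds, so that a failed
# request stays in the spool for the next pass.
drain_spool() {
    spool_dir=$1
    shift
    for request in "$spool_dir"/*; do
        # Skips non-files, and the unexpanded pattern when the spool is empty.
        [ -f "$request" ] || continue
        if "$@" "$request"; then
            rm -f "$request"
        fi
    done
    return 0
}
```

       A daemon could call drain_spool in a loop with a short sleep between
       passes, passing whatever program updates the database as HANDLER.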


EXAMPLES
       To convince sendmail to use maildir for message delivery, add the
       following line to your .forward file:

       |SAFECAT HOME/Maildir/tmp HOME/Maildir/new || exit 75 #USERNAME

       where SAFECAT is the complete path of the safecat program, HOME is the
       complete path to your home directory, and USERNAME is your login name.
       Making this change is likely to pay off; many campuses and companies
       mount user home directories with NFS.  Using maildir to deliver to
       your inbox folder helps ensure that your mail will not be lost due to
       some NFS error.  Of course, if you are a System Administrator, you
       should consider switching to qmail.

       To run a program and catch its output safely into some directory, you
       can use a shell script like the following.

       #!/bin/bash

       MYPROGRAM=cat              # The program you want to run
       TEMPDIR=/tmp               # The name of a temporary directory
       DESTDIR=$HOME/work/data    # The directory for storing information

       # Run a command quietly; print NO on stderr if it fails
       try() { "$@" 2>/dev/null || echo NO 1>&2; }

       set `( try $MYPROGRAM | try safecat $TEMPDIR $DESTDIR ) 2>&1`
       test "$?" = "0"  || exit -1
       test "$1" = "NO" && { rm -f "$DESTDIR/$2"; exit -1; }

       This script illustrates the pitfalls of writing secure programs with
       the shell.  The script assumes that your program might generate some
       output, but then fail to complete.  There is no way for safecat to
       know whether your program completed successfully or not, because of
       the semantics of the shell.  As a result, safecat might create a file
       in the data directory which is "complete" but not useful.  The shell
       script deletes the file in that case.

       More generally, the safest way to use safecat is from within a C
       program which invokes safecat with fork() and execve().  The parent
       process can then simply kill() the safecat process if any problems
       develop, and optionally can try again.  Whether to go to this trouble
       depends upon how serious you are about protecting your data.  Either
       way, safecat will not be the weak link in your data flow.


BUGS
       In order to perform the last step and link() the temporary file into
       the destination directory, both directories must reside in the same
       file system.  If they do not, safecat will quietly fail every time.
       In Professor Bernstein's implementation of maildir, the temporary and
       destination directories are required to belong to the same parent
       directory, which essentially avoids this problem.  We relax this
       requirement to provide some flexibility, at the cost of some risk.
       Caveat emptor.
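
       One way to check a pair of directories before relying on them is to
       attempt the same kind of hard link that safecat needs.  The following
       is a sketch only: the function name can_link and the probe file name
       are hypothetical, and the probe appears briefly in destdir, so run the
       check before any consumer is watching that directory.

```shell
# can_link TEMPDIR DESTDIR
# Hypothetical check: succeeds only if a file created in TEMPDIR can be
# hard-linked into DESTDIR, which requires both to be on one file system.
can_link() {
    probe="linkprobe.$$"
    : > "$1/$probe" || return 1
    if ln "$1/$probe" "$2/$probe" 2>/dev/null; then
        # Remove the destination name only if this call created it.
        rm -f "$2/$probe"
        status=0
    else
        status=1
    fi
    rm -f "$1/$probe"
    return "$status"
}
```

       A wrapper script could refuse to start when can_link fails, turning
       the quiet failure described above into a visible one.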

       Although safecat cleans up after itself, it may sometimes fail to
       delete the temporary file located in tempdir.  Since safecat times out
       after 24 hours, you may freely delete any temporary files older than
       36 hours.  Files newer than 36 hours should be left alone.  A system
       of data flow involving safecat should include a cron job to clean up
       temporary files, or should obligate consumers of the data to do the
       cleanup, or both.  In the case of qmail, mail readers using maildir
       are expected to scan and clean up the temporary directory.
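
       A cleanup job following the 36-hour rule might be sketched as below.
       The function name clean_stale_temp is hypothetical.  The sketch
       assumes a find that supports the -maxdepth and -mmin extensions (GNU
       and BSD find do; POSIX does not require them), and it assumes the
       temporary files follow the time.pid.host naming described in THE
       MAILDIR ALGORITHM, so that unrelated files in a shared directory such
       as /tmp are left alone.

```shell
# clean_stale_temp TEMPDIR
# Hypothetical cleanup: removes time.pid.host style files in TEMPDIR that
# were last modified more than 36 hours (2160 minutes) ago.
clean_stale_temp() {
    find "$1" -maxdepth 1 -type f -name '[0-9]*.[0-9]*.*' \
        -mmin +2160 -exec rm -f {} +
}
```

       Files newer than 36 hours are not matched, so a write that is still
       within its 24-hour timer is never disturbed.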

       The guarantee of safe delivery of data is only "as certain as UNIX will
       allow."  In particular, a disk hardware failure could result in safecat
       concluding that the data was safe, when it was not.  Similarly, a
       successful exit status from safecat is of no value if the computer,
       its disks and backups all explode at some subsequent time.

       In other words, if your data is vital to you, then you won't just use
       safecat.  You'll also invest in good equipment (possibly including a
       RAID disk), a UPS for the server and drives, a regular backup
       schedule, and competent system administration.  For many purposes,
       however, safecat can be considered 100% reliable.

       Also note that safecat was designed for spooling email messages; it is
       not the right tool for spooling large files--files larger than 2GB, for
       example. Some operating systems have a bug which causes safecat to fail
       silently when spooling files larger than 2GB. When building safecat,
       you can take advantage of conditional support for large files on Linux;
       see conf-cc for further information.


CREDITS
       The maildir algorithm was devised by Professor Daniel Bernstein, the
       author of qmail.  Parts of this manpage borrow directly from
       maildir(5) by Professor Bernstein.  In particular, the section "THE
       MAILDIR ALGORITHM" transplants his explanation of the maildir
       algorithm in order to illustrate that safecat complies with it.

       The original code for safecat was written by the present author, but
       was since augmented with heavy borrowings from qmail code.  However,
       under no circumstances should the author of qmail be contacted concern-
       ing safecat bugs; all are the fault, and the responsibility, of the
       present author.

       Copyright (c) 2000, Len Budney. All rights reserved.


SEE ALSO
       mbox(5), qmail-local(8), maildir(5)



                                                                    safecat(1)
