SHORTEN(1)                                                          SHORTEN(1)



NAME
       shorten - fast compression for waveform files

SYNOPSIS
       shorten [-hlu] [-a #bytes] [-b #samples] [-c #channels] [-d #bytes] [-m
       #blocks] [-n #dB] [-p #order] [-q #bits] [-r #bits] [-t  filetype]  [-v
       #version] [waveform-file [shortened-file]]

       shorten -x [-hl] [-a #bytes] [-d #bytes] [shortened-file
       [waveform-file]]

       shorten [ -e | -i | -k | -s | -S<name> ] shortened-file

       shorten [ -s | -S<name> ] < shortened-data

DESCRIPTION
       shorten reduces the size of waveform files (such as audio) using
       Huffman coding of prediction residuals and optional additional
       quantisation.  In lossless mode the amount of compression obtained
       depends on the nature of the waveform.  Waveforms composed of low
       frequencies and low amplitudes give the best compression, which may be
       2:1 or better.  Lossy compression is controlled by specifying a minimum
       acceptable segmental signal to noise ratio or a maximum bit rate.  It
       operates by zeroing the low order bits of the waveform, so retaining
       the waveform shape.

       If both file names are specified then these are used as the input and
       output files.  The first file name can be replaced by "-" to read from
       standard input and likewise the second file name can be replaced by "-"
       to write to standard output.  Under UNIX, if only one file name is
       specified, then that name is used for input and the output file name is
       generated by adding the suffix ".shn" on compression and removing the
       ".shn" suffix on decompression.  In these cases the input file is
       removed on completion.  Automatic file name generation is not currently
       supported under DOS.  If no file names are specified, shorten reads
       from standard input and writes to standard output.  Whenever possible,
       the output file inherits the permissions, owner, group, access and
       modification times of the input file.

       From release 2.3 the RIFF WAVE (Microsoft .wav) file type is the
       default.  These files contain enough information to set most of the
       switches presented below, so effective operation is obtained just by
       setting the desired level of compression (-n or -r switch).

OPTIONS
       -a align bytes
              Specify the number of bytes to be copied  verbatim  before  com-
              pression  begins.   This  option  can  be used to preserve fixed
              length ASCII headers on waveform files, and may be necessary  if
              the header length is an odd number of bytes.

       -b block size
              Specify  the  number  of  samples to be grouped into a block for
              processing.  Within a block the signal elements are expected  to
              have  the  same  spectral  characteristics.   The default option
              works well for a large range of audio files.

       -c channels
              Specify the number of independent interwoven channels.  For  two
              signals, a(t) and b(t) the original data format is assumed to be
              a(0),b(0),a(1),b(1)...
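
              As an illustration only (this is not code from shorten), the
              interleaved layout can be separated into one sequence per
              channel with a short Python sketch.  The function name
              deinterleave is hypothetical.

```python
def deinterleave(samples, channels):
    """Split a(0),b(0),a(1),b(1),... into one list per channel."""
    if channels < 1:
        raise ValueError("channels must be at least 1")
    if len(samples) % channels:
        raise ValueError("sample count is not a whole number of frames")
    # Channel c owns every channels-th sample, starting at offset c.
    return [list(samples[c::channels]) for c in range(channels)]


stereo = [10, -10, 11, -11, 12, -12]
print(deinterleave(stereo, 2))
```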

       -d discard bytes
              Specify the number of bytes to be discarded  before  compression
              or decompression.  This may be used to delete header information
              from a file.  Refer to the -a  option  for  storing  the  header
              information in the compressed file.

       -e     Erase seek information from an existing file.

       -h     Give a short message specifying usage options.

       -i     Inquire  as  to whether the given file is an external seek table
              file, a file with seek tables appended to it,  or  neither.   If
              seek  tables  are  present,  the  seek  table revision number is
              shown.

       -k     Append seek information to an existing file.

       -l     Prints the software license specifying the  conditions  for  the
              distribution and usage of this software.

       -m blocks
              Specify  the  number  of  past blocks to be used to estimate the
              mean and power of the signal.  The value of zero  disables  this
              prediction  and  the mean is assumed to lie in the middle of the
              range of the relevant data type (i.e. at zero for signed quanti-
              ties).    The  default value is non-zero for format versions 2.0
              and above.

       -n noise level
              Specify the minimum acceptable segmental signal to  noise  ratio
              in dB.  The signal power is taken as the variance of the samples
              in the current block.  The noise power is the quantisation noise
              incurred  by  coding the current block assuming that samples are
              uniformly distributed over the quantisation interval.   The  bit
              rate  is  dynamically  changed to maintain the desired signal to
              noise ratio.  The default value represents lossless coding.
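
              The relationship described above can be sketched in Python.
              This is an illustration under stated assumptions, not shorten's
              implementation: it assumes that zeroing b low order bits gives a
              quantisation interval of 2^b, that an error uniformly
              distributed over an interval of width q has variance q*q/12,
              and that the signal power is the population variance of the
              block.  The names snr_db and max_dropped_bits are hypothetical.

```python
import math
from statistics import pvariance


def snr_db(block, dropped_bits):
    """Estimated SNR in dB if dropped_bits low order bits are zeroed."""
    if dropped_bits == 0:
        return math.inf                  # nothing discarded, so no noise
    signal_power = pvariance(block)      # variance of the samples in the block
    if signal_power == 0:
        return -math.inf                 # a constant block carries no signal power
    step = 1 << dropped_bits             # assumed quantisation interval
    noise_power = step * step / 12.0     # variance of a uniform error
    return 10.0 * math.log10(signal_power / noise_power)


def max_dropped_bits(block, min_snr_db, sample_bits=16):
    """Largest number of low order bits that keeps the SNR above the target."""
    best = 0
    for bits in range(1, sample_bits):
        if snr_db(block, bits) < min_snr_db:
            break
        best = bits
    return best


block = [-1000, 1000] * 64
print(max_dropped_bits(block, 40.0))
```

              Under these assumptions each additional discarded bit lowers
              the estimated ratio by about 6 dB, which is why the bit rate
              can be adjusted block by block to hold a target ratio.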

       -p prediction order
              Specify the maximum order of the linear predictive filter.   The
              default  value of zero disables the use of linear prediction and
              a polynomial interpolation method is used instead.  The  use  of
              the linear predictive filter generally results in a small
              improvement in compression ratio at the expense of execution
              time.  This is the only option to use a significant amount of
              floating point processing during compression.  Decompression
              still uses a minimal number of floating point operations.

              With linear prediction, decompression time is normally about
              twice that of the default polynomial interpolation.  For
              versions 0 and 1, compression time is linear in the specified
              maximum order, as all lower orders are searched for the
              greatest expected compression (the number of bits required to
              transmit the prediction residual is monotonically decreasing
              with prediction order, but transmitting each filter coefficient
              requires about 7 bits).  For version 2 and above, the search is
              started at zero order and terminated when the last two
              prediction orders give a larger expected bit rate than the
              minimum found to date.  This is a reasonable strategy for many
              real world signals; you may revert to the exhaustive algorithm
              by setting -v1 to check that this works for your signal type.
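
              The two search strategies can be compared with a small Python
              sketch.  It is a hedged illustration rather than shorten's
              code: the cost argument is a hypothetical callable returning
              the expected bit rate for a given prediction order, and the
              function names are invented for this example.

```python
def exhaustive_search(cost, max_order):
    """Evaluate every order from 0 to max_order and keep the cheapest."""
    return min(range(max_order + 1), key=cost)


def early_stop_search(cost, max_order):
    """Start at order 0 and stop once two orders in a row are worse
    than the best expected bit rate found so far."""
    best_order, best_cost = 0, cost(0)
    worse_in_a_row = 0
    for order in range(1, max_order + 1):
        current = cost(order)
        if current > best_cost:
            worse_in_a_row += 1
            if worse_in_a_row == 2:
                break
        else:
            worse_in_a_row = 0
            if current < best_cost:
                best_order, best_cost = order, current
    return best_order


# Made-up expected bit rates indexed by prediction order.
typical = [100, 90, 95, 80, 85, 86]
unusual = [100, 90, 95, 96, 50]
print(early_stop_search(typical.__getitem__, 5),
      exhaustive_search(typical.__getitem__, 5))
print(early_stop_search(unusual.__getitem__, 4),
      exhaustive_search(unusual.__getitem__, 4))
```

              For the first list both searches agree.  For the second, the
              early termination misses the late minimum at order 4, which is
              the kind of signal the exhaustive check is meant to reveal.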

       -q quantisation level
              Specify the number of low order bits in each sample which can be
              discarded (set to zero).  This is useful if these bits carry  no
              information,  for example when the signal is corrupted by noise.
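
              A minimal Python sketch of discarding low order bits follows.
              It assumes plain truncation with an arithmetic shift, so
              negative samples move towards minus infinity; whether shorten
              rounds differently is not stated here.  The name zero_low_bits
              is hypothetical.

```python
def zero_low_bits(sample, bits):
    """Set the lowest `bits` bits of an integer sample to zero."""
    if bits < 0:
        raise ValueError("bits must be non-negative")
    # Shift the low bits out, then shift back so they return as zeros.
    return (sample >> bits) << bits


print([zero_low_bits(s, 2) for s in (23, 4, 3, -5)])
```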

       -r bit rate
              Specify the expected maximum number of  bits  per  sample.   The
              upper bound on the bit rate is achieved by setting the low order
              bits of the sample to zero, hence maximising the segmental
              signal to noise ratio.

       -s     Write  seek table information to a separate file (uses shortened
              file name with '.skt' extension).  If the shortened data is read
              from  standard  input,  then  the seek table information will be
              saved in 'stdin.skt'.

       -S<name>
              Write seek  table  information  to  a  separate  file  given  by
              "<name>".

       -t file type
              Gives the type of the sound sample file as one of aiff, wav, s8,
              u8, s16, u16, s16x, u16x, s16hl, u16hl, s16lh, u16lh,  ulaw,  or
              alaw.

              The simple types are listed first and have an initial s or u for
              signed or unsigned data, followed by 8 or 16 as  the  number  of
              bits  per sample.  No further extension means the data is in the
              natural byte order, a trailing x specifies byte swapped data, hl
              explicitly  states  the  byte order as high byte followed by low
              byte and lh the converse.  Hence s16 means signed 16  bit  inte-
              gers in the natural byte order (like C would fwrite() shorts).

              ulaw is the natural file type of ulaw encoded files (such as the
              default Sun .au files) and alaw is a similar byte-packed scheme.
              Specific  optimisations  are applied to ulaw and alaw files.  If
              lossless compression is specified with ulaw files then  a  check
              is  made  that the whole dynamic range is used (useful for files
              recorded on a SparcStation with the volume set too high).
              Lossless coding of both file types uses an internal format with
              a monotonic mapping to linear.  If lossy compression is
              specified then the data is internally converted to linear.  The
              lossy option "-r4" has been observed to give little degradation
              and provides 2:1 compression.

              With the types listed above you should explicitly set the number
              of channels (if not mono) with -c and, if the file contains a
              header, the size should be specified with -a.  This is most
              important for lossy compression, which will lead to data
              corruption if a file header is inadvertently lossy coded.

              Finally, as of version 2.3, the file type may be specified as
              wav (the default).  In this case the file to be compressed is
              interrogated for the specific data type (chosen from the above)
              and the number of channels to be used.  The header length
              alignment (-a flag) is also automatic, so lossless compression
              requires no switches to be set and lossy compression requires
              only that the compression level be set with -n or -r.
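
              The naming rule for the simple 8 and 16 bit types (s8, u16,
              s16x, u16hl and so on) can be made concrete with a Python
              sketch that maps a type name to a format string for the
              standard struct module.  This is an illustration only, not part
              of shorten; the helper names struct_format and read_samples are
              hypothetical, and aiff, wav, ulaw and alaw are deliberately not
              handled.

```python
import struct
import sys


def struct_format(filetype):
    """Map a simple type name such as 's16hl' to a struct format string."""
    sign, rest = filetype[:1], filetype[1:]
    if sign not in ("s", "u"):
        raise ValueError("unsupported type: " + filetype)
    if rest == "8":
        return "b" if sign == "s" else "B"   # single bytes have no byte order
    if not rest.startswith("16") or rest[2:] not in ("", "x", "hl", "lh"):
        raise ValueError("unsupported type: " + filetype)
    native_big = sys.byteorder == "big"
    big_endian = {
        "": native_big,        # natural byte order of this machine
        "x": not native_big,   # byte swapped relative to natural order
        "hl": True,            # high byte first
        "lh": False,           # low byte first
    }[rest[2:]]
    code = "h" if sign == "s" else "H"
    return (">" if big_endian else "<") + code


def read_samples(data, filetype):
    """Decode raw bytes into a list of integer samples."""
    return [s[0] for s in struct.iter_unpack(struct_format(filetype), data)]


print(read_samples(b"\xff\xfe\x00\x01", "s16hl"))
print(read_samples(b"\xff\xfe\x00\x01", "u16lh"))
```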

       -u     The  ulaw  standard (ITU G711) has two codes which both map onto
              the zero value on a linear scale.   The "-u" flag maps the nega-
              tive zero onto the positive zero and so yields marginally better
              compression for format version 2 (the gain  is  significant  for
              older format versions).

       -v version
              Specify  the  binary  format version number of compressed files.
              Legal values are currently 1, 2 and 3, with higher numbers  gen-
              erally  giving  better compression.  2 and 3 are identical, with
              the exception that 2 does not  generate  seek  tables,  while  3
              does.  Detection of format version on decode is automatic.

       -x extract
              Reconstruct  the  original file.  All other command line options
              except -a and -d are ignored.


METHODOLOGY
       shorten works by blocking the signal, making a model of each  block  in
       order  to remove temporal redundancy, then Huffman coding the quantised
       prediction residual.


   Blocking
       The signal is read in blocks of about 128 or 256 samples and converted
       to integers with an expected mean of zero.  Sample-wise-interleaved
       data is converted to separate channels, which are assumed independent.


   Decorrelation
       Four functions are computed, corresponding to the signal, the
       difference signal, and the second and third order differences.  The one
       with the lowest variance is coded.  The variance is measured by summing
       absolute values for speed and to avoid overflow.
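
       The selection step can be sketched in Python as follows.  This is an
       illustration of the idea rather than shorten's code: it assumes that
       the samples immediately before the block are available as history, so
       that every difference order produces the same number of residuals.
       The names residuals and choose_order are hypothetical.

```python
def residuals(samples, start, length, order):
    """Order-th difference of samples[start:start+length].

    The `order` samples just before `start` are used as history, so the
    result always has `length` values.  Requires start >= order.
    """
    if start < order:
        raise ValueError("not enough history for this order")
    seq = list(samples[start - order:start + length])
    for _ in range(order):
        seq = [b - a for a, b in zip(seq, seq[1:])]
    return seq


def choose_order(samples, start, length, max_order=3):
    """Pick the difference order whose residuals have the smallest sum of
    absolute values, used here as a cheap stand-in for the variance."""
    costs = [sum(abs(r) for r in residuals(samples, start, length, k))
             for k in range(max_order + 1)]
    best = costs.index(min(costs))       # the lowest order wins ties
    return best, residuals(samples, start, length, best)


ramp = list(range(0, 110, 10))           # a steadily rising signal
print(choose_order(ramp, start=3, length=8))
```

       For the ramp, the second difference is zero everywhere, so order 2 is
       selected and the residuals to be coded are all zero.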


   Compression
       It is assumed the signal has the Laplacian probability density function
       of  exp(-abs(x)).   There is a computationally efficient way of mapping
       this density to Huffman codes.  The code is in four parts:  a  run  of
       zeros;  a  bounding  one; a fixed number of bits mantissa; and the sign
       bit.  The number of leading zeros gives the  offset  from  zero.   Some
       examples for a 2 bit mantissa:

              Value  zeros  stopbit  mantissa  signbit  total code
              0             1        00        0        1000
              1             1        01        0        1010
              2             1        10        0        1100
              4      0      1        00        0        01000
              7      0      1        11        0        01110
              8      00     1        00        0        001000
              -1            1        00        1        1001
              -2            1        01        1        1011
              -7     0      1        10        1        01101

       Note  that  negative  numbers  are offset by one as there is no need to
       have  two  zero  codes.   The  technical  report   CUED/F-INFENG/TR.156
       included  with the shorten distribution as files tr154.tex and tr154.ps
       contains bugs in this format description and is superseded by this man
       page.
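
       The four-part layout (run of zeros, stop bit, mantissa, sign bit) and
       the offset of negative numbers by one can be sketched in Python.  This
       is an illustration written from the description above, not code taken
       from shorten; the function names encode and decode are hypothetical
       and code words are shown as strings of '0' and '1' for readability.

```python
def encode(value, mantissa_bits=2):
    """Return the code word for a signed integer as a bit string."""
    sign = "1" if value < 0 else "0"
    # Negative numbers are offset by one, so -1 shares the magnitude of 0.
    magnitude = -value - 1 if value < 0 else value
    zeros = magnitude >> mantissa_bits                 # length of the zero run
    low = magnitude & ((1 << mantissa_bits) - 1)       # low order bits
    mantissa = format(low, "0{}b".format(mantissa_bits)) if mantissa_bits else ""
    return "0" * zeros + "1" + mantissa + sign


def decode(code, mantissa_bits=2):
    """Invert encode() for a single complete code word."""
    zeros = code.index("1")                            # position of the stop bit
    field = code[zeros + 1:zeros + 1 + mantissa_bits]
    low = int(field, 2) if field else 0
    magnitude = (zeros << mantissa_bits) | low
    negative = code[zeros + 1 + mantissa_bits] == "1"
    return -magnitude - 1 if negative else magnitude


for v in (0, 1, 2, 4, 7, 8, -1, -2, -7):
    print(v, encode(v))
```

       With a 2 bit mantissa, encode(2) yields stop bit 1, mantissa 10 and
       sign 0, and encode(-7) yields 01101.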


EMBEDDED OPERATION
       Shorten may be used embedded within other programs.  shorten is a func-
       tion call implemented in the file shorten.c.  The file main.c  provides
       a wrapper for stand alone operation.  A simple example of embedded
       operation can be found in the file embedded.c.  Full Windows DLL
       operation is provided in the windll subdirectory.


SEE ALSO
       compress(1), pack(1).


DIAGNOSTICS
       Exit  status  is  normally  0.   A warning is issued if the file is not
       properly aligned, i.e. a whole number of records could not be  read  at
       the end of the file.

BUGS
       An easy way to test shorten for your system is to use "make check".  If
       this fails, for whatever reason, please report it to
       <shnutils@freeshell.org>.

       No check is made for increasing file size, but valid waveform files
       generally achieve some compression.  Even compressing a file of random
       bytes (which represents the worst case waveform file) only results in a
       small increase in the file length (about 6% for 8 bit data and 3% for
       16 bit data).  There is one condition that is known to be problematic:
       the lossy compression of unsigned data without mean estimation.  Large
       file sizes may result if the mean is far from the middle range value.
       For these files the value of the -m switch should be non-zero, as it is
       by default in format version 2.

       There  is no provision for different channels containing different data
       types.  Normally, this is not a restriction, but it does mean  that  if
       lossy coding is selected for the ulaw type, then all channels use lossy
       coding.

       The technical report CUED/F-INFENG/TR.156 (included in the shorten
       distribution) contains errors in the bitfield format description and is
       superseded by this document.

       See the file "ChangeLog" for a history of bug fixes and feature
       additions.

       Please  mail  Jason  Jordan  at  the address below if you find a bug in
       shorten involving seek tables.

       Please mail Brian Willoughby at the address below if you find a bug  in
       the AIFF implementation.

       Please  mail Tony Robinson immediately at the address below if you find
       a bug in shorten that is NOT related to seek tables  or  AIFF  support.
       Make  sure you can reproduce your bug using version 2.3a, the last ver-
       sion known to be released by him.


AVAILABILITY
       The latest 2.x and 3.x versions can be obtained from
       <http://www.etree.org/shnutils/shorten/> or
       <http://shnutils.freeshell.org/shorten/>.


AUTHORS
       Copyright (C) 1992-1999 by Tony Robinson and SoftSound Ltd
       (ajr@softsound.com)

       Unix maintenance of 3.x versions by Jason Jordan
       <shnutils@freeshell.org>.

       AIFF    support     and     maintenance     by     Brian     Willoughby
       <shorten@sounds.wa.com> of Sound Consulting <http://sounds.wa.com/>.

       Shorten  is  available  for  non-commercial  use  without fee.  See the
       LICENSE file for the formal copying and usage restrictions.  For
       supported versions please see http://www.softsound.com/Shorten.html and
       for commercial use please contact shorten@softsound.com



                                12 August 2001                      SHORTEN(1)
