EXTRACT(1)                                                          EXTRACT(1)



NAME
       extract - determine meta-information about a file

SYNOPSIS
       extract  [  -abdfhLnrsvV  ]  [ -B language ] [ -H hash-algorithm ] [ -l
       library ] [ -p type ] [ -x type ] file ...

DESCRIPTION
       This manual page documents version 0.4.0 of the extract command.

       extract tests each file specified in the argument list in an attempt to
       infer meta-information from it. Each file is subjected to the meta-data
       extraction libraries from libextractor.

       libextractor classifies meta-information (also referred to as keywords)
       into types. A list of all types can be obtained with the -L option.


OPTIONS
       -a      Do not remove any duplicates, even if the keywords match
               exactly and have the same type (for example, because the same
               keyword was found by different extractor libraries).

       -b      Display the output in BibTeX format. This implies the -d option.

       -B LANG Use the generic plaintext extractor for the language  with  the
               2-letter  language code LANG.  Supported languages are DA (Dan-
               ish), DE (German), EN (English), ES (Spanish), IT (Italian) and
               NO (Norwegian).

       -d      Remove duplicates only if the types match exactly. By default,
               duplicates are removed if the types match or if one of the
               types is unknown (in this case, the duplicate of unknown type
               is removed).

       -f      Add the filename(s) (without directory) to the list of
               keywords.

       -h      Print a brief summary of the options.

       -H ALGORITHM
               Use the ALGORITHM to compute a hash of each file (possible
               algorithms are sha1 and md5).

       -L      Print a list of all known keyword types.

       -n      Do not use the default set of extractors (typically all
               standard extractors, currently mp3, ogg, jpg, gif, png, tiff,
               real, html, pdf and mime-types); use only the extractors
               specified with the -l option.

       -r      Remove all duplicates disregarding differences in the keyword
               type.

       -s      Split keywords at delimiters (space, comma, colon, etc.) and
               list the split keywords as being of unknown type. This can
               also be done by loading the split-library. Using this option
               guarantees that the splitting is performed after all other
               libraries have been run. It is always performed before
               duplicate elimination.

       -v      Print the version number and exit.

       -V      Be verbose.

       -B      Run  the  printable  extractor  (costly,  generic extractor for
               binaries)

       -l libraries
               Use the specified libraries to extract keywords. The general
               format of libraries is [[-]LIBRARYNAME[:[-]LIBRARYNAME]*]
               where LIBRARYNAME is a libextractor compatible library,
               typically of the form libextractor_jpeg.so. A minus before
               the library name indicates that this library should be run
               after all the libraries that were specified so far. If the
               minus is missing, the library is run before all previously
               specified libraries.

       -p type Print  only  the  keywords  matching  the  specified  type.  By
               default, all keywords that are found and not removed as  dupli-
               cates are printed.

       -x type Exclude  keywords  of  the  specified  type from the output. By
               default, all keywords that are found and not removed as  dupli-
               cates are printed.
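
       The duplicate-elimination rules described for -a, -d, -r and the
       default mode can be modelled outside of extract. The following is a
       minimal sketch, not the tool's implementation: dedupe_keywords is a
       hypothetical helper that reads lines in the "type - keyword" form
       shown in EXAMPLES, and it assumes that the first occurrence of a
       duplicate is the one that is kept, which this page does not specify.

```shell
# Hypothetical model of the duplicate rules; NOT part of extract itself.
# Usage: dedupe_keywords MODE < lines-of-"type - keyword"
#   a        keep everything (like -a)
#   d        drop a keyword only if type and keyword both match (like -d)
#   r        drop a keyword if the keyword matches, ignoring type (like -r)
#   default  like d, and also drop an unknown-type keyword when the same
#            keyword exists with a known type
# Assumption: the first occurrence wins.
dedupe_keywords() {
    awk -v mode="${1:-default}" '
    {
        sep = index($0, " - ")
        type[NR] = substr($0, 1, sep - 1)
        word[NR] = substr($0, sep + 3)
        # Remember which keywords appear with a known type.
        if (type[NR] != "unknown") typed[word[NR]] = 1
    }
    END {
        for (i = 1; i <= NR; i++) {
            if (mode == "a") { print type[i] " - " word[i]; continue }
            if (mode == "r") key = word[i]
            else             key = type[i] SUBSEP word[i]
            if (key in seen) continue
            if (mode == "default" && type[i] == "unknown" && (word[i] in typed))
                continue
            seen[key] = 1
            print type[i] " - " word[i]
        }
    }'
}
```

       The helper only illustrates the rules as worded above; the actual
       output of extract may order or select duplicates differently.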

SEE ALSO
       libextractor(3) - description of the libextractor library

EXAMPLES
       $ extract test/test.jpg
       comment - (C) 2001 by Christian Grothoff, using gimp 1.2 1
       mimetype - image/jpeg

       $ extract -Vf -x comment test/test.jpg
       Keywords for file test/test.jpg:
       mimetype - image/jpeg
       filename - test.jpg

       $ extract -p comment test/test.jpg
       comment - (C) 2001 by Christian Grothoff, using gimp 1.2 1

       $ extract -nV -l libextractor_png.so -p comment test/test.jpg test/test.png
       Keywords for file test/test.jpg:
       Keywords for file test/test.png:
       comment - Testing keyword extraction
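
       The ordering rule for the -l argument (a leading minus runs the
       library after those specified so far, no minus runs it before them)
       can be easier to follow when traced by hand. The sketch below is a
       hypothetical helper, library_run_order, that applies only that rule to
       a colon-separated list and prints the resulting order. It ignores the
       default extractor set, so it models the situation under -n, and the
       library names used with it are illustrative.

```shell
# Hypothetical helper; models only the ordering rule stated for -l.
# Usage: library_run_order '[-]NAME[:[-]NAME]*'
# Runs in a subshell (note the parentheses) so IFS and globbing changes
# do not leak into the caller.
library_run_order() (
    order=""
    IFS=:
    set -f                      # do not glob-expand library names
    for lib in $1; do
        case $lib in
            -*) order="${order:+$order }${lib#-}" ;;  # minus: run after
            *)  order="$lib${order:+ $order}" ;;      # no minus: run before
        esac
    done
    printf '%s\n' "$order"
)
```

       For instance, library_run_order 'A:-B:C' places C first, then A, then
       B: A is specified first, B is appended because of its minus, and C is
       put in front of both.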


LEGAL NOTICE
       libextractor  and  the extract tool are released under the GPL.  libex-
       tractor is a GNU project.


BUGS
       A couple of file-formats (on the order of 10^3) are not recognized...


AUTHORS
       extract was originally written by Christian Grothoff
       <christian@grothoff.org> and Vidyut Samanta <vids@cs.ucla.edu>. Use
       <libextractor@gnu.org> to contact the current maintainer(s).


AVAILABILITY
       You can obtain the original author's latest version from
       http://gnunet.org/libextractor/



libextractor 0.4.2              April 28, 2005                      EXTRACT(1)
