EXTRACT(1) EXTRACT(1)
NAME
extract - determine meta-information about a file
SYNOPSIS
extract [ -abdfhLnrsvV ] [ -B language ] [ -H hash-algorithm ]
[ -l library ] [ -p type ] [ -x type ] file ...
DESCRIPTION
This manual page documents version 0.4.0 of the extract command.
extract tests each file specified in the argument list in an attempt to
infer meta-information from it. Each file is subjected to the
meta-data extraction libraries from libextractor.
libextractor classifies meta-information (also referred to as keywords)
into types. A list of all types can be obtained with the -L option.
OPTIONS
-a Do not remove any duplicates, even if the keywords match
exactly and have the same type (e.g. because the same keyword
was found by different extractor libraries).
-b Display the output in BibTeX format. This implies the -d option.
-B LANG Use the generic plaintext extractor for the language with the
2-letter language code LANG. Supported languages are DA (Danish),
DE (German), EN (English), ES (Spanish), IT (Italian) and
NO (Norwegian).
-d Remove duplicates only if the types match exactly. By default,
duplicates are removed if the types match or if one of the
types is unknown (in this case, the duplicate of unknown type
is removed).
-f Add the filename(s) (without directory) to the list of
keywords.
-h Print a brief summary of the options.
-H ALGORITHM
Use the ALGORITHM to compute a hash of each file (possible
algorithms are sha1 and md5).
-L Print a list of all known keyword types.
-n Do not use the default set of extractors (typically all standard
extractors, currently mp3, ogg, jpg, gif, png, tiff, real,
html, pdf and mime-types); use only the extractors specified
with the -l option.
-r Remove all duplicates, disregarding differences in the keyword
type.
-s Split keywords at delimiters (space, comma, colon, etc.) and
list the split keywords with type unknown. This can also be
done by loading the split-library. Using this option guarantees
that the splitting is performed after all other libraries have
been run. It is always performed before duplicate elimination.
-v Print the version number and exit.
-V Be verbose.
-B Run the printable extractor (costly, generic extractor for
binaries).
-l libraries
Use the specified libraries to extract keywords. The general
format of libraries is [[-]LIBRARYNAME[:[-]LIBRARYNAME]*],
where LIBRARYNAME is a libextractor-compatible library,
typically of the form libextractor_jpeg.so. A minus before
the library name indicates that this library should be run after
all the libraries that were specified so far. If the minus is
missing, the library is run before all previously specified
libraries.
-p type Print only the keywords matching the specified type. By
default, all keywords that are found and not removed as
duplicates are printed.
-x type Exclude keywords of the specified type from the output. By
default, all keywords that are found and not removed as
duplicates are printed.
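The interaction between -a, -d, -r and the default duplicate handling
may be easier to follow as code. The following Python sketch models the
rules as they are worded above; it is not libextractor code. The
function name remove_duplicates, the mode labels, the (type, keyword)
tuple representation, and the choice to keep the first occurrence when
a duplicate is dropped are all assumptions made for illustration.

```python
UNKNOWN = "unknown"  # assumed label for keywords of unknown type


def remove_duplicates(keywords, mode="default"):
    """Model of the duplicate-handling options described in OPTIONS.

    keywords: list of (type, keyword) tuples, in extraction order.
    mode: "a" (keep everything), "d" (drop only exact type matches),
          "default", or "r" (ignore types entirely).
    """
    if mode not in ("a", "d", "default", "r"):
        raise ValueError("unknown mode: %r" % (mode,))
    if mode == "a":
        return list(keywords)

    kept = []
    for ktype, value in keywords:
        # Indices of already kept entries that carry the same keyword text.
        same_value = [i for i, (_, v) in enumerate(kept) if v == value]

        # Same keyword and same type: a duplicate in modes d, default and r.
        if any(kept[i][0] == ktype for i in same_value):
            continue

        # -r: any entry with the same keyword text counts as a duplicate.
        if mode == "r" and same_value:
            continue

        if mode == "default" and same_value:
            # The new keyword is the unknown-typed duplicate: drop it.
            if ktype == UNKNOWN:
                continue
            # An earlier unknown-typed entry is the duplicate: the typed
            # keyword takes its place.
            unknown_idx = [i for i in same_value if kept[i][0] == UNKNOWN]
            if unknown_idx:
                kept[unknown_idx[0]] = (ktype, value)
                continue

        kept.append((ktype, value))
    return kept
```

Under these assumptions, a keyword that is reported once as comment and
once as unknown survives twice in mode "d", but only as comment in the
default mode and in mode "r".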
SEE ALSO
libextractor(3) - description of the libextractor library
EXAMPLES
$ extract test/test.jpg
comment - (C) 2001 by Christian Grothoff, using gimp 1.2 1
mimetype - image/jpeg
$ extract -Vf -x comment test/test.jpg
Keywords for file test/test.jpg:
mimetype - image/jpeg
filename - test.jpg
$ extract -p comment test/test.jpg
comment - (C) 2001 by Christian Grothoff, using gimp 1.2 1
$ extract -nV -l libextractor_png.so -p comment test/test.jpg test/test.png
Keywords for file test/test.jpg:
Keywords for file test/test.png:
comment - Testing keyword extraction
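The ordering rule for the -l option (described under OPTIONS) is easy
to misread: a name without a leading minus is placed before everything
specified so far, while a name with a leading minus is placed after.
The Python sketch below models that rule only; it is not libextractor
code. The function name library_run_order and the already_loaded
parameter are hypothetical, as is the library name
libextractor_example.so. Whether the default extractor set counts as
"specified so far" is not stated on this page, so the caller passes any
such libraries explicitly.

```python
def library_run_order(spec, already_loaded=()):
    """Return the run order implied by a -l specification string.

    spec: string of the form [[-]LIBRARYNAME[:[-]LIBRARYNAME]*].
    already_loaded: libraries assumed to be in place before spec is read
                    (an assumption of this sketch, empty by default).
    """
    order = list(already_loaded)
    for entry in spec.split(":"):
        if not entry:
            continue  # an empty specification adds nothing
        if entry.startswith("-"):
            # Leading minus: run after all libraries specified so far.
            order.append(entry[1:])
        else:
            # No minus: run before all previously specified libraries.
            order.insert(0, entry)
    return order


if __name__ == "__main__":
    spec = "libextractor_jpeg.so:-libextractor_png.so:libextractor_example.so"
    for position, name in enumerate(library_run_order(spec), start=1):
        print(position, name)
```

With the specification used in the sketch, libextractor_example.so
would run first, then libextractor_jpeg.so, then libextractor_png.so.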
LEGAL NOTICE
libextractor and the extract tool are released under the GPL.
libextractor is a GNU project.
BUGS
A couple of file-formats (on the order of 10^3) are not recognized...
AUTHORS
extract was originally written by Christian Grothoff
<christian@grothoff.org> and Vidyut Samanta <vids@cs.ucla.edu>. Use
<libextractor@gnu.org> to contact the current maintainer(s).
AVAILABILITY
You can obtain the original author's latest version from
http://gnunet.org/libextractor/
libextractor 0.4.2 April 28, 2005 EXTRACT(1)