LIBEXTRACTOR(3)                                                LIBEXTRACTOR(3)



NAME
       libextractor - meta-information extraction library 0.5.0

SYNOPSIS
       #include <extractor.h>

        typedef struct EXTRACTOR_Keywords {
          char * keyword;
          EXTRACTOR_KeywordType keywordType;
          struct EXTRACTOR_Keywords * next;
        } EXTRACTOR_KeywordList;


        EXTRACTOR_ExtractorList * EXTRACTOR_loadDefaultLibraries ();

        const char * EXTRACTOR_getKeywordTypeAsString
          (const EXTRACTOR_KeywordType type);

        EXTRACTOR_ExtractorList * EXTRACTOR_loadConfigLibraries
          (EXTRACTOR_ExtractorList * prev, const char * config);

        EXTRACTOR_ExtractorList * EXTRACTOR_addLibrary
          (EXTRACTOR_ExtractorList * prev, const char * library);

        EXTRACTOR_ExtractorList * EXTRACTOR_addLibraryLast
          (EXTRACTOR_ExtractorList * prev, const char * library);

        EXTRACTOR_ExtractorList * EXTRACTOR_removeLibrary
          (EXTRACTOR_ExtractorList * prev, const char * library);

        void EXTRACTOR_removeAll (EXTRACTOR_ExtractorList * prev);

        EXTRACTOR_KeywordList * EXTRACTOR_getKeywords
          (EXTRACTOR_ExtractorList * extractor, const char * filename);

        EXTRACTOR_KeywordList * EXTRACTOR_removeEmptyKeywords
          (EXTRACTOR_KeywordList * list);

        EXTRACTOR_KeywordList * EXTRACTOR_removeDuplicateKeywords
          (EXTRACTOR_KeywordList * list, const unsigned int options);

        void EXTRACTOR_printKeywords
          (FILE * handle, EXTRACTOR_KeywordList * keywords);

        void EXTRACTOR_freeKeywords (EXTRACTOR_KeywordList * keywords);

        const char * EXTRACTOR_extractLast
          (const EXTRACTOR_KeywordType * type, EXTRACTOR_KeywordList * keywords);

        const char * EXTRACTOR_extractLastByString
          (const char * type, EXTRACTOR_KeywordList * keywords);

        unsigned int EXTRACTOR_countKeywords
          (EXTRACTOR_KeywordList * keywords);

        EXTRACTOR_DEFAULT_LIBRARIES

        EXTRACTOR_VERSION


DESCRIPTION
       libextractor is a simple library for keyword extraction.  It does not
       support all formats, but it provides a simple plugin mechanism so that
       you can quickly add extractors for additional formats, even without
       recompiling libextractor.  libextractor typically ships with one or
       more helper libraries that can be used to obtain keywords from common
       file types.  To write your own extractor for some file type, all you
       need to do is write a small library that implements a single function
       with this signature:

        EXTRACTOR_KeywordList *
        LIBRARYNAME_extract(const char * filename,
                            char * data,
                            size_t size,
                            EXTRACTOR_KeywordList * prev);


       Here filename is the name of the file, data is a pointer to the
       contents of the file, and size is the size of the file in bytes.  The
       extract function must prepend the keywords that it finds to the linked
       list 'prev' and return the new head.  The library must allocate (with
       malloc) both the entry in the keyword list and the memory for the
       keyword string, since both will be freed by libextractor once the
       application calls EXTRACTOR_freeKeywords.  An example implementation
       can be found in mp3extractor.c.  The extract application gives an
       example of how to use libextractor.
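
       The plugin contract described above can be sketched roughly as
       follows.  This is a minimal illustration, not code from the library:
       the plugin name "demo", the helper add_keyword, the constant DEMO_TYPE
       and the "DEMO" magic bytes are all hypothetical, and the type
       definitions are stand-ins so that the sketch compiles on its own.  A
       real plugin would include <extractor.h> instead.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-ins so this sketch is self-contained; a real plugin would
   #include <extractor.h> and drop these definitions. */
typedef int EXTRACTOR_KeywordType;             /* stand-in for the library's type */
#define DEMO_TYPE ((EXTRACTOR_KeywordType) 0)  /* hypothetical placeholder value */

typedef struct EXTRACTOR_Keywords {
  char * keyword;
  EXTRACTOR_KeywordType keywordType;
  struct EXTRACTOR_Keywords * next;
} EXTRACTOR_KeywordList;

/* Hypothetical helper: prepend a malloc'ed copy of `text` to `prev` and
   return the new head.  On allocation failure the list is returned
   unchanged. */
static EXTRACTOR_KeywordList *
add_keyword(EXTRACTOR_KeywordType type, const char *text,
            EXTRACTOR_KeywordList *prev)
{
  EXTRACTOR_KeywordList *node = malloc(sizeof *node);
  if (node == NULL)
    return prev;
  node->keyword = malloc(strlen(text) + 1);
  if (node->keyword == NULL) {
    free(node);
    return prev;
  }
  strcpy(node->keyword, text);
  node->keywordType = type;
  node->next = prev;
  return node;
}

/* Plays the role of LIBRARYNAME_extract for a hypothetical "demo" format
   whose files begin with the four bytes "DEMO". */
EXTRACTOR_KeywordList *
demo_extract(const char *filename, char *data, size_t size,
             EXTRACTOR_KeywordList *prev)
{
  (void) filename;                 /* unused in this sketch */
  if (size < 4 || memcmp(data, "DEMO", 4) != 0)
    return prev;                   /* not our format: leave the list untouched */
  return add_keyword(DEMO_TYPE, "demo-format", prev);
}
```

       Both the list node and the keyword string come from malloc, so the
       caller can release them later without knowing which plugin created
       them.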


       The basic use of libextractor is to load the plugins (for example with
       EXTRACTOR_loadDefaultLibraries), extract the keyword list using
       EXTRACTOR_getKeywords, process the list (using application-specific
       code and possibly some of the postprocessing convenience functions
       such as EXTRACTOR_removeDuplicateKeywords), free the keyword list
       (using EXTRACTOR_freeKeywords), and finally unload the plugins (with
       EXTRACTOR_removeAll).
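
       The sequence above might look roughly like the following sketch.  It
       uses only functions listed in the SYNOPSIS.  Passing 0 as the options
       argument of EXTRACTOR_removeDuplicateKeywords is an assumption (taken
       here to mean "no special options"), and the program presumably needs
       to be linked against the library, for example with -lextractor.

```c
#include <stdio.h>
#include <extractor.h>

int main(int argc, char *argv[])
{
  EXTRACTOR_ExtractorList *plugins;
  EXTRACTOR_KeywordList *keywords;

  if (argc != 2) {
    fprintf(stderr, "usage: %s FILE\n", argv[0]);
    return 1;
  }

  /* 1. Load the default set of plugins. */
  plugins = EXTRACTOR_loadDefaultLibraries();

  /* 2. Run every loaded plugin over the file. */
  keywords = EXTRACTOR_getKeywords(plugins, argv[1]);

  /* 3. Post-process: drop duplicates (options value 0 is an assumption). */
  keywords = EXTRACTOR_removeDuplicateKeywords(keywords, 0);

  /* 4. Application-specific processing; here the list is just printed. */
  printf("%u keyword(s) found\n", EXTRACTOR_countKeywords(keywords));
  EXTRACTOR_printKeywords(stdout, keywords);

  /* 5. Free the keyword list, then unload the plugins. */
  EXTRACTOR_freeKeywords(keywords);
  EXTRACTOR_removeAll(plugins);
  return 0;
}
```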

       The keywords obtained from libextractor are supposed to be UTF-8
       encoded.  The EXTRACTOR_printKeywords function converts the UTF-8
       keywords to the character set of the current locale before printing
       them.  Plugins are supposed to convert meta-data to UTF-8 if necessary.
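
       As an illustration of the conversion a plugin may need, the sketch
       below turns an ISO-8859-1 (Latin-1) string into UTF-8.  The function
       name latin1_to_utf8 is hypothetical and not part of libextractor.  A
       plugin that handles other source encodings would need a more general
       converter.

```c
#include <stdlib.h>
#include <string.h>

/* Convert a NUL-terminated ISO-8859-1 string to a newly malloc'ed,
   NUL-terminated UTF-8 string.  Returns NULL if allocation fails.
   Each input byte expands to at most two output bytes. */
char *latin1_to_utf8(const char *in)
{
  char *out = malloc(2 * strlen(in) + 1);
  char *p = out;

  if (out == NULL)
    return NULL;
  for (; *in != '\0'; in++) {
    unsigned char c = (unsigned char) *in;
    if (c < 0x80) {
      *p++ = (char) c;                    /* ASCII is unchanged */
    } else {
      *p++ = (char) (0xC0 | (c >> 6));    /* lead byte: 110000xx */
      *p++ = (char) (0x80 | (c & 0x3F));  /* continuation byte: 10xxxxxx */
    }
  }
  *p = '\0';
  return out;
}
```

       Because the result is allocated with malloc, it can be stored directly
       in the keyword field of a list entry and released later together with
       the rest of the list.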


SEE ALSO
       extract(1)


LEGAL NOTICE
       libextractor is released under the GPL and is a GNU project
       (http://www.gnu.org/).


BUGS
       A couple of file-formats (on the order of 10^3) are not recognized...


AUTHORS
       extract was originally written by Christian Grothoff
       <christian@grothoff.org> and Vidyut Samanta <vids@cs.ucla.edu>.  Use
       <libextractor@gnu.org> to contact the current maintainer(s).


AVAILABILITY
       You can obtain the original author's latest version from
       http://gnunet.org/libextractor/.



                                  Apr 5, 2005                  LIBEXTRACTOR(3)
