*** empty log message ***

2008-10-13 08:35:34 +00:00 · 2008-10-13 08:35:34 +00:00 · d910d2bebe
commit d910d2bebe
parent 34cd8293ac
2 changed files with 575 additions and 158 deletions
--- a/src/INSTALL
+++ b/src/INSTALL
@ -11,23 +11,21 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

   --------------------------------------------------------------------------

-                            Chapter 4. Installation
+                            Chapter 5. Installation

   Table of Contents

-   4.1. Installing a prebuilt copy
+   5.1. Installing a prebuilt copy

-   4.2. Supporting packages
+   5.2. Supporting packages

-   4.3. Building from source
+   5.3. Building from source

-   4.4. Configuration overview
+   5.4. Configuration overview

-   4.5. The KDE Kicker Recoll applet
+   5.5. The KDE Kicker Recoll applet

-   4.6. Extending Recoll
-
-                        4.1. Installing a prebuilt copy
+                        5.1. Installing a prebuilt copy

   Recoll binary packages from the Recoll web site are always linked
   statically to the Xapian libraries, and have no other dependencies. You
@ -36,12 +34,12 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
   have a look at the configuration section (but this may not be necessary
   for a quick test with default parameters).

-4.1.1. Installing through a package system
+5.1.1. Installing through a package system

   If you use a BSD-type port system or a prebuilt package (RPM or other),
   just follow the usual procedure for your system.

-4.1.2. Installing a prebuilt Recoll
+5.1.2. Installing a prebuilt Recoll

   The unpackaged binary versions on the Recoll web site are just compressed
   tar files of a build tree, where only the useful parts were kept
@ -56,23 +54,29 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

   --------------------------------------------------------------------------

-   Prev                                   Home                           Next 
-   Customizing the search interface                       Supporting packages 
+   Prev                               Home                               Next 
+   API                                                    Supporting packages 
   Link: HOME
   Link: UP
   Link: PREVIOUS
   Link: NEXT

                               Recoll user manual
-   Prev                     Chapter 4. Installation                      Next 
+   Prev                     Chapter 5. Installation                      Next 

   --------------------------------------------------------------------------

-                            4.2. Supporting packages
+                            5.2. Supporting packages

   Recoll uses external applications to index some file types. You need to
   install them for the file types that you wish to have indexed (these are
-   run-time dependencies. None is needed for building Recoll):
+   run-time dependencies. None is needed for building Recoll).
+
+   After an indexing pass, the commands that were found missing can be
+   displayed from the recoll File menu. The list is stored in the missing
+   text file inside the configuration directory.
+
+   A list of common file types which need external commands:

     * Openoffice: supported natively, but needs the unzip command to be
       installed.
@ -118,13 +122,13 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
   Link: NEXT

                               Recoll user manual
-   Prev                     Chapter 4. Installation                      Next 
+   Prev                     Chapter 5. Installation                      Next 

   --------------------------------------------------------------------------

-                           4.3. Building from source
+                           5.3. Building from source

-4.3.1. Prerequisites
+5.3.1. Prerequisites

   At the very least, you will need to download and install the xapian core
   package (Recoll 1.9 normally uses version 1.0.2, but any 0.9 or 1.0.x
@ -140,7 +144,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
   not be critical). On Linux systems, the iconv interface is part of libc
   and you should not need to do anything special.

-4.3.2. Building
+5.3.2. Building

   Recoll has been built on Linux (redhat7.3, mandriva 2005/6, Fedora Core
   3/4/5/6), FreeBSD 5/6, macosx, and Solaris 8. If you build on another
@ -178,7 +182,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
   manually copy and modify one of the existing files (the new file name
   should be the output of uname -s).

-4.3.3. Installation
+5.3.3. Installation

   Either type make install or execute recollinstall prefix, in the root of
   the source tree. This will copy the commands to prefix/bin and the sample
@ -201,11 +205,11 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
   Link: NEXT

                               Recoll user manual
-   Prev                     Chapter 4. Installation                      Next 
+   Prev                     Chapter 5. Installation                      Next 

   --------------------------------------------------------------------------

-                          4.4. Configuration overview
+                          5.4. Configuration overview

   Most of the parameters specific to the recoll GUI are set through the
   Preferences menu and stored in the standard QT place ($HOME/.qt/recollrc).
@ -263,7 +267,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
   White space is used for separation inside lists. List elements with
   embedded spaces can be quoted using double-quotes.

-4.4.1. Main configuration file
+5.4.1. Main configuration file

   recoll.conf is the main configuration file. It defines things like what to
   index (top directories and things to ignore), and the default character
@ -467,7 +471,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
           cases. A value of 3 would allow more precision and efficiency on
           longer words, but the index will be approximately twice as large.

-4.4.2. The mimemap file
+5.4.2. The mimemap file

   mimemap specifies the file name extension to mime type mappings.

@ -491,7 +495,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
   there avoids cluttering the more user-oriented and locally customized
   skippedNames.

-4.4.3. The mimeconf file
+5.4.3. The mimeconf file

   mimeconf specifies how the different mime types are handled for indexing,
   and which icons are displayed in the recoll result lists.
@ -503,7 +507,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
   recoll in the result lists (the values are the basenames of the png images
   inside the iconsdir directory, which is specified in recoll.conf).

-4.4.4. The mimeview file
+5.4.4. The mimeview file

   mimeview specifies which programs are started when you click on an Edit
   link in a result list. Ie: HTML is normally displayed using firefox, but
@ -524,9 +528,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
   user preferences, all mimeview entries will be ignored except the one
   labelled application/x-all (which is set to use xdg-open by default).

-4.4.5. Examples of configuration adjustments
+5.4.5. Examples of configuration adjustments

-  4.4.5.1. Adding an external viewer for an non-indexed type
+  5.4.5.1. Adding an external viewer for a non-indexed type

   Imagine that you have some kind of file which does not have indexable
   content, but for which you would like to have a functional Edit link in
@ -557,7 +561,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
   The entries you add in your personal file override those in the central
   configuration, which you do not need to alter.

-  4.4.5.2. Adding indexing support for a new file type
+  5.4.5.2. Adding indexing support for a new file type

   Let us now imagine that the above .blob files actually contain indexable
   text and that you know how to extract it with a command line program.
@ -581,11 +585,10 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

   The rclblob filter should be an executable program or script which exists
   inside /usr/[local/]share/recoll/filters. It will be given a file name as
-   argument and should output the text contents in html format on the
-   standard output.
+   argument and should output the text contents on the standard output.

-   You can find more details about writing a Recoll filter in the section
-   about writing filters
+   The filter programming section describes in more detail how to write a
+   filter.

   --------------------------------------------------------------------------

--- a/src/README
+++ b/src/README
@ -78,41 +78,51 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

                3.12. Customizing the search interface

-   4. Installation
+   4. Programming interface

-                4.1. Installing a prebuilt copy
+                4.1. Writing a document filter

-                             4.1.1. Installing through a package system
+                             4.1.1. Filter HTML output

-                             4.1.2. Installing a prebuilt Recoll
+                4.2. Field data processing configuration

-                4.2. Supporting packages
+                4.3. API

-                4.3. Building from source
+                             4.3.1. Interface elements

-                             4.3.1. Prerequisites
+                             4.3.2. Python interface

-                             4.3.2. Building
+   5. Installation

-                             4.3.3. Installation
+                5.1. Installing a prebuilt copy

-                4.4. Configuration overview
+                             5.1.1. Installing through a package system

-                             4.4.1. Main configuration file
+                             5.1.2. Installing a prebuilt Recoll

-                             4.4.2. The mimemap file
+                5.2. Supporting packages

-                             4.4.3. The mimeconf file
+                5.3. Building from source

-                             4.4.4. The mimeview file
+                             5.3.1. Prerequisites

-                             4.4.5. Examples of configuration adjustments
+                             5.3.2. Building

-                4.5. The KDE Kicker Recoll applet
+                             5.3.3. Installation

-                4.6. Extending Recoll
+                5.4. Configuration overview

-                             4.6.1. Writing a document filter
+                             5.4.1. Main configuration file
+
+                             5.4.2. The mimemap file
+
+                             5.4.3. The mimeconf file
+
+                             5.4.4. The mimeview file
+
+                             5.4.5. Examples of configuration adjustments
+
+                5.5. The KDE Kicker Recoll applet

     ----------------------------------------------------------------------

@ -256,8 +266,14 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
   individually indexed documents.

   Recoll indexing processes plain text, HTML, openoffice and e-mail files
-   internally. Other types (ie: postscript, pdf, ms-word, rtf) need external
+   internally.
+
+   Other file types (e.g. postscript, pdf, ms-word, rtf ...) need external
   applications for preprocessing. The list is in the installation section.
+   After every indexing operation, Recoll updates a list of commands that
+   would be needed for indexing existing file types. This list can be
+   displayed from the recoll File menu. It is stored in the missing text file
+   inside the configuration directory.

   Without further configuration, Recoll will index all appropriate files
   from your home directory, with a reasonable set of defaults.
@ -717,6 +733,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
   The query language processor is activated on the simple search entry when
   the search mode selector is set to Query Language.

+   The language is roughly based on the Xesam user search language
+   specification.
+
   Here follows a sample request that we are going to explain:

           author:"john doe" Beatles OR Lennon Live OR Unplugged -potatoes
@ -728,6 +747,12 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
   or lennon and either live or unplugged but not potatoes (in any part of
   the document).

+   An element is composed of an optional field specification, and a value,
+   separated by a colon. Example: Beatles, author:balzac, dc:title:grandet
+
+   The colon, if present, means "contains". Xesam defines other relations,
+   which are not supported for now.
+
   All elements in the search entry are normally combined with an implicit
   AND. It is possible to specify that elements be OR'ed instead, as in
   Beatles OR Lennon. The OR must be entered literally (capitals), and it has
@ -735,51 +760,69 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
   (word2 OR word3) not (word1 AND word2) OR word3. Do not enter explicit
   parentheses; they are not supported for now.
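
   The OR priority rule above can be illustrated with a small sketch. It is
   not Recoll's parser: the function name, the tokenizing regular expression
   and the list-of-lists result are assumptions made for illustration. Each
   inner list holds elements that are OR'ed together, and the inner lists
   are AND'ed.

```python
import re

# One token is a run of non-space characters in which double-quoted
# stretches may contain spaces, e.g. author:"john doe".
TOKEN = re.compile(r'(?:[^\s"]+|"[^"]*")+')


def group_elements(query):
    """Split a query into AND'ed groups of OR'ed elements (illustrative only)."""
    groups = []
    join_next = False
    for token in TOKEN.findall(query):
        if token == "OR":
            # OR binds the element before it to the element after it.
            join_next = bool(groups)
        elif join_next:
            groups[-1].append(token)
            join_next = False
        else:
            groups.append([token])
    return groups
```

   With this sketch, word1 word2 OR word3 yields two groups, [word1] and
   [word2, word3], which matches the word1 AND (word2 OR word3) reading.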

-   An entry preceded by a - specifies a term that should not appear.
+   An element preceded by a - specifies a term that should not appear. Pure
+   negative queries are forbidden.

-   The first element in the above exemple, author:"john doe" is a phrase
-   search limited to a specific field. Phrase searches are specified as usual
-   by enclosing the words in double quotes. The field specification appears
-   before the colon (of course this is not limited to phrases, author:Balzac
-   would be ok too). Recoll currently manages the following fields:
+   As usual, words inside quotes define a phrase (the order of words is
+   significant), so that title:"prejudice pride" is not the same as
+   title:prejudice title:pride, and is unlikely to find a result.
+
+   Recoll currently manages the following default fields:

     * title, subject or caption are synonyms which specify data to be
       searched for in the document title or subject.

     * author or from for searching the document originators.

-     * keyword for searching the document specified keywords (few documents
+     * recipient or to for searching the document recipients.
+
+     * keyword for searching the document-specified keywords (few documents
       actually have any).

-   As of release 1.9, the filters have the possibility to create other fields
-   with arbitrary names. No standard filters use this possibility yet.
+     * filename for the document's file name.

-   There are two other elements which may be specified through the field
-   syntax, but are somewhat special:
+     * ext specifies the file name extension (Ex: ext:html)

-     * ext for specifying the file name extension (Ex: ext:html)
+   The field syntax also supports a few field-like, but special, criteria:

-     * dir for specifying the file location (Ex: dir:/home/me/somedir).
-       Please note that this is quite inefficient, that it may produce very
-       slow searches, and that it may be worth in some cases to set up
-       separate databases instead.
+     * dir for filtering the results on file location (Ex:
+       dir:/home/me/somedir). Please note that this is quite inefficient,
+       that it may produce very slow searches, and that it may be
+       worthwhile in some cases to set up separate databases instead.

-     * mime for specifying the mime type. This one is quite special because
-       you can specify several values which will be OR'ed (the normal default
-       for the language is AND). Ex: mime:text/plain mime:text/html.
+     * mime or format for specifying the mime type. This one is quite special
+       because you can specify several values which will be OR'ed (the normal
+       default for the language is AND). Ex: mime:text/plain mime:text/html.
       Specifying an explicit boolean operator or negation (-) before a mime
       specification is not supported and will produce strange results.

+     * type or rclcat for specifying the category (as in
+       text/media/presentation/etc.). The classification of mime types in
+       categories is defined in the Recoll configuration (mimeconf), and can
+       be modified or extended. The default category names are those which
+       permit filtering results in the main GUI screen. Categories are OR'ed
+       like mime types above.
+
+   The document filters used while indexing have the possibility to create
+   other fields with arbitrary names, and aliases may be defined in the
+   configuration, so that the exact field search possibilities may be
+   different for you if someone took care of the customisation.
+
   The query language is currently the only way to use the Recoll field
   search capability.

   Words inside phrases and capitalized words are not stem-expanded.
   Wildcards may be used anywhere inside a term. Specifying a wild-card on
-   the left of a term can produce a very slow search.
+   the left of a term can produce a very slow search (or even an incorrect
+   one if the expansion is truncated because of excessive size).

   You can use the show query link at the top of the result list to check the
   exact query which was finally executed by Xapian.

+   Most Xesam phrase modifiers are unsupported, except for l (small ell) to
+   disable stemming, and p to turn a phrase into a NEAR (unordered) search.
+   Example: "prejudice pride"p
+
     ----------------------------------------------------------------------

 3.5. Complex/advanced search
@ -1194,13 +1237,432 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

   Your main database (the one the current configuration indexes to), is
   always implicitly active. If this is not desirable, you can set up your
-   configuration so that it indexes, for example, an empty directory.
+   configuration so that it indexes, for example, an empty directory. An
+   alternative indexer may also need to implement a way of purging the index
+   from stale data.

     ----------------------------------------------------------------------

-                            Chapter 4. Installation
+                        Chapter 4. Programming interface

-4.1. Installing a prebuilt copy
+   Recoll has an application programming interface, usable both for indexing
+   and searching, currently accessible from the Python language.
+
+   Another less radical way to extend the application is to write filters for
+   new types of documents.
+
+   The processing of metadata attributes for documents (fields) is highly
+   configurable.
+
+     ----------------------------------------------------------------------
+
+4.1. Writing a document filter
+
+   Recoll filters are executable programs which translate from a specific
+   format (e.g. openoffice, acrobat) to the Recoll indexing input
+   format, which may be text/plain or text/html.
+
+   Recoll filters are usually shell-scripts, but this is in no way necessary.
+   These programs are extremely simple and most of the difficulty lies in
+   extracting the text from the native format, not outputting what is
+   expected by Recoll. Happily enough, most document formats already have
+   translators or text extractors which handle the difficult part and can be
+   called from the filter. In some cases the output of the translating program
+   is appropriate, and no intermediate shell-script is needed.
+
+   Filters are called with a single argument which is the source file name.
+   They should output the result to stdout.
+
+   The RECOLL_FILTER_FORPREVIEW environment variable (values yes, no) tells
+   the filter if the operation is for indexing or previewing. Some filters
+   use this to output a slightly different format. This is not essential.
+
+   The association of file types to filters is performed in the mimeconf
+   file. A sample:
+
+ [index]
[index]
+ application/msword = exec antiword -t -i 1 -m UTF-8;\
+      mimetype=text/plain;charset=utf-8
+
+ application/ogg = exec rclogg
+
+ text/rtf = exec unrtf --nopict --html; charset=iso-8859-1; mimetype=text/html
+
+   The fragment specifies that:
+
+     * application/msword files are processed by executing the antiword
+       program, which outputs text/plain encoded in utf-8.
+
+     * application/ogg files are processed by the rclogg script, with default
+       output type (text/html, with encoding specified in the header, or
+       utf-8 by default).
+
+     * text/rtf is processed by unrtf, which outputs text/html. The
+       iso-8859-1 encoding is specified because it is not the utf-8 default,
+       and not output by unrtf in the HTML header section.
+
+   The easiest way to write a new filter is probably to start from an
+   existing one.
+
+   Filters which output text/plain text are generally simpler, but they
+   cannot specify the character set and other metadata, so they are limited
+   to cases where these elements are not needed.
+
+     ----------------------------------------------------------------------
+
+  4.1.1. Filter HTML output
+
+   The output HTML could be very minimal like the following example:
+
+ <html><head>
+ <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
+ </head>
+ <body>some text content</body></html>
+         
+
+   You should take care to escape some characters inside the text by
+   transforming them into appropriate entities. "&" should be transformed
+   into "&amp;", "<" should be transformed into "&lt;". This is not always
+   properly done by translating programs which output HTML, and of course
+   never by those which output plain text.
+
+   The character set needs to be specified in the header. It does not need to
+   be UTF-8 (Recoll will take care of translating it), but it must be
+   accurate for good results.
+
+   Recoll will also make use of other header fields if they are present:
+   title, description, keywords.
+
+   Filters also have the possibility to "invent" field names. This should be
+   output as meta tags:
+
+ <meta name="somefield" content="Some textual data" />
+
+   See the following section for details about configuring how field data is
+   processed by the indexer.
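
   As an illustration of the output rules in this section, here is a minimal
   sketch of a filter written in Python. It assumes the source file is
   already plain text encoded in UTF-8, which a real filter would replace
   with an actual format conversion. The function name and the fields
   argument are hypothetical, chosen for this example only.

```python
import html
import sys


def render_filter_html(text, fields=None):
    """Return a minimal HTML document for text, in the layout shown above."""
    # Optional extra fields become <meta name=... content=...> header tags.
    meta = "".join(
        '<meta name="%s" content="%s">\n'
        % (html.escape(name, quote=True), html.escape(value, quote=True))
        for name, value in (fields or {}).items()
    )
    return (
        "<html><head>\n"
        '<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">\n'
        + meta
        + "</head>\n<body>"
        + html.escape(text, quote=False)  # turns & and < into entities
        + "</body></html>\n"
    )


# Filters receive the source file name as their single argument and
# write the result to stdout.
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as source:
        document = render_filter_html(source.read())
    sys.stdout.buffer.write(document.encode("utf-8"))
```

   The output is written as bytes so that it matches the UTF-8 character set
   declared in the header, whatever the locale of the calling process.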
+
+     ----------------------------------------------------------------------
+
+4.2. Field data processing configuration
+
+   Fields are named pieces of information in or about documents, like title,
+   author, abstract.
+
+   The field values for documents can appear in several ways during indexing:
+   either output by filters as meta fields in the HTML header section, or
+   added as attributes of the Doc object when using the API, or
+   synthesized internally by Recoll.
+
+   The Recoll query language allows searching for text in a specific field.
+
+   Recoll defines a number of default fields. Additional ones can be output
+   by filters, and described in the fields configuration file.
+
+   Fields can be:
+
+     * indexed, meaning that their terms are separately stored in inverted
+       lists (with a specific prefix), and that a field-specific search is
+       possible.
+
+     * stored, meaning that their value is recorded in the index data record
+       for the document, and can be returned and displayed with search
+       results.
+
+   A field can be either or both indexed and stored.
+
+   A field becomes indexed by having a prefix defined in the [prefixes]
+   section of the fields file. See the comments in that file for details.
+
+   A field becomes stored by appearing in the [stored] section of the fields
+   file.
+
+     ----------------------------------------------------------------------
+
+4.3. API
+
+  4.3.1. Interface elements
+
+   A few elements in the interface are specific and need an explanation.
+
+   udi
+
+           A udi (unique document identifier) identifies a document. Because
+           of limitations inside the index engine, it is restricted in length
+           (to 200 bytes), which is why a regular URI cannot be used. The
+           structure and contents of the udi are defined by the application
+           and are opaque to the index engine. For example, the internal file
+           system indexer uses the complete document path (file path +
+           internal path), truncated to length, the suppressed part being
+           replaced by a hash value.
+
+   ipath
+
+           This data value (set as a field in the Doc object) is stored,
+           along with the URL, but not indexed by Recoll. Its contents are
+           not interpreted, and its use is up to the application. For
+           example, the Recoll internal file system indexer stores the part
+           of the document access path internal to the container file (ipath
+           in this case is a list of subdocument sequential numbers). url and
+           ipath are returned in every search result and permit access to the
+           original document.
+
+   Stored and indexed fields
+
+           The fields file inside the Recoll configuration defines which
+           document fields are either "indexed" (searchable), "stored"
+           (retrievable with search results), or both.
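
   The udi length limit described above can be handled along the following
   lines. This is only a sketch of the "truncate and replace the suppressed
   part by a hash" idea, not the code used by the Recoll file system
   indexer: the function name, the "|" separator between file path and
   internal path, and the choice of SHA-256 are all assumptions.

```python
import hashlib

UDI_MAX_BYTES = 200  # length limit quoted in the text above
DIGEST_LEN = 64      # a SHA-256 hex digest is 64 ASCII characters


def make_udi(path, ipath="", max_bytes=UDI_MAX_BYTES):
    """Build a udi (as bytes) from a file path and an internal path."""
    full = (path + "|" + ipath).encode("utf-8")
    if len(full) <= max_bytes:
        return full
    # Keep the head of the identifier and replace the suppressed tail
    # by its hash, so that distinct long paths stay distinct.
    keep = max_bytes - DIGEST_LEN
    digest = hashlib.sha256(full[keep:]).hexdigest().encode("ascii")
    return full[:keep] + digest
```

   Short identifiers pass through unchanged. Long ones always come out at
   exactly max_bytes bytes, provided max_bytes is larger than the digest
   length.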
+
+   Data for an external indexer should be stored in a separate index, not
+   the one for the Recoll internal file system indexer (except if the latter
+   is not used at all). The reason is that the main document indexer purge
+   pass would remove all the other indexer's documents, as they were not seen
+   during indexing. The main indexer documents would also probably be a
+   problem for the external indexer purge operation.
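
   An external indexer pass over its own, separate index could be organised
   as in the sketch below. It uses only the Db method names (needUpdate,
   addOrUpdate, purge) and Doc attributes (url, text, sig) listed in the
   interface manual further down. The function name, the shape of the items
   argument and the new_doc factory are assumptions made for the example.

```python
def run_index_pass(db, items, new_doc):
    """Index items into db, then purge documents that were not seen.

    db      -- a writable Db-like object
    items   -- iterable of (udi, sig, url, text) tuples, one per document
               that still exists in the primary storage
    new_doc -- callable returning a blank Doc-like object
    """
    updated = 0
    for udi, sig, url, text in items:
        # needUpdate() is called for every existing document, including
        # unchanged ones, so that purge() does not remove them.
        if not db.needUpdate(udi, sig):
            continue
        doc = new_doc()
        doc.url = url
        doc.text = text
        doc.sig = sig
        db.addOrUpdate(udi, doc)
        updated += 1
    db.purge()
    return updated
```

   Because purge() removes everything that was not checked during the pass,
   running such a loop against the index of the internal file system indexer
   would delete that indexer's documents, which is the reason for keeping
   the two indexes separate.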
+
+     ----------------------------------------------------------------------
+
+  4.3.2. Python interface
+
+    4.3.2.1. Introduction
+
+   Recoll versions after 1.11 define a Python programming interface, both for
+   searching and indexing.
+
+   The python interface is not built by default and can be found in the
+   source package, under python/recoll. The directory contains the usual
+   setup.py script which you can use to build and install the module:
+
+         cd recoll-xxx/python/recoll
+         python setup.py build
+         python setup.py install
+     
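
   Once the module is installed, a first search might look like the sketch
   below. It relies only on names that appear in the interface manual in the
   next section (connect(), Db.query(), Query.execute(), Query.fetchone(),
   and the Doc attributes url and title). The manual does not say how
   fetchone() reports the end of the results, so the sketch reads a single
   hit and assumes that there is one. The helper name first_hit is
   hypothetical.

```python
def first_hit(db, query_string):
    """Run query_string on an open Db and return (url, title) of the first hit."""
    query = db.query()           # Db.query() returns a new, blank Query
    query.execute(query_string)  # a Recoll query language string
    doc = query.fetchone()       # next Doc in the current results
    return doc.url, doc.title


# Usage, assuming the recoll module built above can be imported:
#   import recoll
#   db = recoll.connect()
#   print(first_hit(db, 'author:"john doe" Beatles OR Lennon'))
```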
+
+     ----------------------------------------------------------------------
+
+    4.3.2.2. Interface manual
+
+   NAME
+       recoll - This is an interface to the Recoll full text indexer.
+
+   FILE
+       /usr/local/lib/python2.5/site-packages/recoll.so
+
+   CLASSES
+           Db
+           Doc
+           Query
+           SearchData
+       
+       class Db(__builtin__.object)
+        |  Db([confdir=None], [extra_dbs=None], [writable = False])
+        |  
+        |  A Db object holds a connection to a Recoll index. Use the connect()
+        |  function to create one.
+        |  confdir specifies a Recoll configuration directory (default: 
+        |   $RECOLL_CONFDIR or ~/.recoll).
+        |  extra_dbs is a list of external databases (xapian directories)
+        |  writable decides if we can index new data through this connection
+        |  
+        |  Methods defined here:
+        |  
+        |  
+        |  addOrUpdate(...)
+        |      addOrUpdate(udi, doc, parent_udi=None) -> None
+        |      Add or update index data for a given document
+        |      The udi string must define a unique id for the document. It is not
+        |      interpreted inside Recoll
+        |      doc is a Doc object
+        |      if parent_udi is set, this is a unique identifier for the
+        |      top-level container (ie mbox file)
+        |  
+        |  delete(...)
+        |      delete(udi) -> Bool.
+        |      Purge index from all data for udi. If udi matches a container
+        |      document, purge all subdocs (docs with a parent_udi matching udi).
+        |  
+        |  makeDocAbstract(...)
+        |      makeDocAbstract(Doc, Query) -> string
+        |      Build and return 'keyword-in-context' abstract for document
+        |      and query.
+        |  
+        |  needUpdate(...)
+        |      needUpdate(udi, sig) -> Bool.
+        |      Check if the index is up to date for the document defined by udi,
+        |      having the current signature sig.
+        |  
+        |  purge(...)
+        |      purge() -> Bool.
+        |      Delete all documents that were not touched during the just finished
+        |      indexing pass (since open-for-write). These are the documents for
+        |      which needUpdate() was not called, indicating that they no
+        |      longer exist in the primary storage system.
+        |  
+        |  query(...)
+        |      query() -> Query. Return a new, blank query object for this index.
+        |  
+        |  setAbstractParams(...)
+        |      setAbstractParams(maxchars, contextwords).
+        |      Set the parameters used to build 'keyword-in-context' abstracts
+        |  
+        |  ----------------------------------------------------------------------
+        |  Data and other attributes defined here:
+        |  
+       
+       class Doc(__builtin__.object)
+        |  Doc()
+        |  
+        |  A Doc object contains index data for a given document.
+        |  The data is extracted from the index when searching, or set by the
+        |  indexer program when updating. The Doc object has no useful methods but
+        |  many attributes to be read or set by its user. It matches exactly the
+        |  Rcl::Doc c++ object. Some of the attributes are predefined, but, 
+        |  especially when indexing, others can be set, the name of which will be
+        |  processed as field names by the indexing configuration.
+        |  Inputs can be specified as unicode or strings.
+        |  Outputs are unicode objects.
+        |  All dates are specified as unix timestamps, printed as strings
+        |  Predefined attributes (index/query/both):
+        |   text (index): document plain text
+        |   url (both)
+        |   fbytes (both) optional file size in bytes
+        |   filename (both)
+        |   fmtime (both) optional file modification date. Unix time printed 
+        |      as string
+        |   dbytes (both) document text bytes
+        |   dmtime (both) document creation/modification date
+        |   ipath (both) value private to the app.: internal access path
+        |      inside file
+        |   mtype (both) mime type for original document
+        |   mtime (query) dmtime if set else fmtime
+        |   origcharset (both) charset the text was converted from
+        |   size (query) dbytes if set, else fbytes
+        |   sig (both) app-defined file modification signature. 
+        |      For up to date checks
+        |   relevancyrating (query)
+        |   abstract (both)
+        |   author (both)
+        |   title (both)
+        |   keywords (both)
+        |  
+        |  Methods defined here:
+        |  
+        |  
+        |  ----------------------------------------------------------------------
+        |  Data and other attributes defined here:
+        |  
+       
+       class Query(__builtin__.object)
+        |  Recoll Query objects are used to execute index searches. 
+        |  They must be created by the Db.query() method.
+        |  
+        |  Methods defined here:
+        |  
+        |  
+        |  execute(...)
+        |      execute(query_string, stemming=1|0)
+        |      
+        |      Starts a search for query_string, a Recoll search language string
+        |      (mostly Xesam-compatible).
+        |      The query can be a simple list of terms (and'ed by default), or more
+        |      complicated with field specs etc. See the Recoll manual.
+        |  
+        |  executesd(...)
+        |      executesd(SearchData)
+        |      
+        |      Starts a search for the query defined by the SearchData object.
+        |  
+        |  fetchone(...)
+        |      fetchone(None) -> Doc
+        |      
+        |      Fetches the next Doc object in the current search results.
+        |  
+        |  sortby(...)
+        |      sortby(field=fieldname, ascending=true)
+        |      Sorts results by 'fieldname', in ascending or descending order.
+        |      Only one field can be used, no subsorts for now.
+        |      Must be called before executing the search.
+        |  
+        |  ----------------------------------------------------------------------
+        |  Data descriptors defined here:
+        |  
+        |  next
+        |      Next index to be fetched from results. Normally increments after
+        |      each fetchone() call, but can be set/reset before the call to
+        |      effect seeking. Starts at 0.
+        |  
+        |  ----------------------------------------------------------------------
+        |  Data and other attributes defined here:
+        |  
+       
+       class SearchData(__builtin__.object)
+        |  SearchData()
+        |  
+        |  A SearchData object describes a query. It has a number of global
+        |  parameters and a chain of search clauses.
+        |  
+        |  Methods defined here:
+        |  
+        |  
+        |  addclause(...)
+        |      addclause(type='and'|'or'|'excl'|'phrase'|'near'|'sub',
+        |                qstring=string, slack=int, field=string, stemming=1|0,
+        |                subSearch=SearchData)
+        |      Adds a simple clause to the SearchData And/Or chain, or a subquery
+        |      defined by another SearchData object
+        |  
+        |  ----------------------------------------------------------------------
+        |  Data and other attributes defined here:
+        |  
+
+   FUNCTIONS
+       connect(...)
+           connect([confdir=None], [extra_dbs=None], [writable = False])
+                    -> Db.
+           
+           Connects to a Recoll database and returns a Db object.
+           confdir specifies a Recoll configuration directory
+           (the default is determined as for any other Recoll program).
+           extra_dbs is a list of external databases (xapian directories).
+           writable decides if we can index new data through this connection.
+
+   
+
+     ----------------------------------------------------------------------
+
+    4.3.2.3. Example code
+
+   The following sample queries the index with a query language string.
+   See the python/samples directory inside the Recoll source for other
+   examples.
+
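+ #!/usr/bin/env python
+
+ import recoll
+
+ # Open the default index and tune the abstracts built for each result.
+ db = recoll.connect()
+ db.setAbstractParams(maxchars=80, contextwords=2)
+
+ query = db.query()
+ nres = query.execute("some user question")
+ print "Result count: ", nres
+ # Show at most the first five results.
+ if nres > 5:
+     nres = 5
+ # query.next is the index of the next result that fetchone() will return.
+ while query.next >= 0 and query.next < nres:
+     doc = query.fetchone()
+     print query.next
+     for k in ("title", "size"):
+         print k, ":", getattr(doc, k).encode('utf-8')
+     # Named 'abstract' to avoid shadowing the abs() builtin.
+     abstract = db.makeDocAbstract(doc, query).encode('utf-8')
+     print abstract
+     print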
+
+ 
+
+     ----------------------------------------------------------------------
+
+                            Chapter 5. Installation
+
+5.1. Installing a prebuilt copy

   Recoll binary packages from the Recoll web site are always linked
   statically to the Xapian libraries, and have no other dependencies. You
@ -1211,14 +1673,14 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

     ----------------------------------------------------------------------

-  4.1.1. Installing through a package system
+  5.1.1. Installing through a package system

   If you use a BSD-type port system or a prebuilt package (RPM or other),
   just follow the usual procedure for your system.

     ----------------------------------------------------------------------

-  4.1.2. Installing a prebuilt Recoll
+  5.1.2. Installing a prebuilt Recoll

   The unpackaged binary versions on the Recoll web site are just compressed
   tar files of a build tree, where only the useful parts were kept
@ -1233,11 +1695,17 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

     ----------------------------------------------------------------------

-4.2. Supporting packages
+5.2. Supporting packages

   Recoll uses external applications to index some file types. You need to
   install them for the file types that you wish to have indexed (these are
-   run-time dependencies. None is needed for building Recoll):
+   run-time dependencies. None is needed for building Recoll).
+
+   After an indexing pass, the commands that were found missing can be
+   displayed from the recoll File menu. The list is stored in the missing
+   text file inside the configuration directory.
+
+   A list of common file types which need external commands:

     * Openoffice: supported natively, but needs the unzip command to be
       installed.
@ -1275,9 +1743,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

     ----------------------------------------------------------------------

-4.3. Building from source
+5.3. Building from source

-  4.3.1. Prerequisites
+  5.3.1. Prerequisites

   At the very least, you will need to download and install the xapian core
   package (Recoll 1.9 normally uses version 1.0.2, but any 0.9 or 1.0.x
@ -1295,7 +1763,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

     ----------------------------------------------------------------------

-  4.3.2. Building
+  5.3.2. Building

   Recoll has been built on Linux (redhat7.3, mandriva 2005/6, Fedora Core
   3/4/5/6), FreeBSD 5/6, macosx, and Solaris 8. If you build on another
@ -1335,7 +1803,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

     ----------------------------------------------------------------------

-  4.3.3. Installation
+  5.3.3. Installation

   Either type make install or execute recollinstall prefix, in the root of
   the source tree. This will copy the commands to prefix/bin and the sample
@ -1350,7 +1818,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

     ----------------------------------------------------------------------

-4.4. Configuration overview
+5.4. Configuration overview

   Most of the parameters specific to the recoll GUI are set through the
   Preferences menu and stored in the standard QT place ($HOME/.qt/recollrc).
@ -1410,7 +1878,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

     ----------------------------------------------------------------------

-  4.4.1. Main configuration file
+  5.4.1. Main configuration file

   recoll.conf is the main configuration file. It defines things like what to
   index (top directories and things to ignore), and the default character
@ -1616,7 +2084,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

     ----------------------------------------------------------------------

-  4.4.2. The mimemap file
+  5.4.2. The mimemap file

   mimemap specifies the file name extension to mime type mappings.

@ -1642,7 +2110,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

     ----------------------------------------------------------------------

-  4.4.3. The mimeconf file
+  5.4.3. The mimeconf file

   mimeconf specifies how the different mime types are handled for indexing,
   and which icons are displayed in the recoll result lists.
@ -1656,7 +2124,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

     ----------------------------------------------------------------------

-  4.4.4. The mimeview file
+  5.4.4. The mimeview file

   mimeview specifies which programs are started when you click on an Edit
   link in a result list. Ie: HTML is normally displayed using firefox, but
@ -1679,9 +2147,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

     ----------------------------------------------------------------------

-  4.4.5. Examples of configuration adjustments
+  5.4.5. Examples of configuration adjustments

-    4.4.5.1. Adding an external viewer for an non-indexed type
+    5.4.5.1. Adding an external viewer for a non-indexed type

   Imagine that you have some kind of file which does not have indexable
   content, but for which you would like to have a functional Edit link in
@ -1714,7 +2182,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

     ----------------------------------------------------------------------

-    4.4.5.2. Adding indexing support for a new file type
+    5.4.5.2. Adding indexing support for a new file type

   Let us now imagine that the above .blob files actually contain indexable
   text and that you know how to extract it with a command line program.
@ -1738,86 +2206,32 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

   The rclblob filter should be an executable program or script which exists
   inside /usr/[local/]share/recoll/filters. It will be given a file name as
-   argument and should output the text contents in html format on the
-   standard output.
+   argument and should output the text contents on the standard output.

-   You can find more details about writing a Recoll filter in the section
-   about writing filters
+   The filter programming section describes in more detail how to write a
+   filter.
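
   As a rough illustration of the filter contract described above (one file
   name argument, text contents written to standard output), the sketch
   below shows what a minimal rclblob could look like in Python. It assumes,
   purely for illustration, that a .blob file is UTF-8 text which can be
   passed through unchanged. extract_text is a hypothetical helper, and a
   real filter would call whatever extractor understands the actual format.

```python
#!/usr/bin/env python
"""Hypothetical rclblob filter sketch: write the text of a .blob file to stdout."""
import io
import sys


def extract_text(path):
    # Assumption: the .blob format is plain UTF-8 text. A real filter would
    # run a format-specific extractor here instead of reading the file as-is.
    with io.open(path, "r", encoding="utf-8", errors="replace") as handle:
        return handle.read()


def main(path):
    # The text contents go to standard output, as the filter contract requires.
    sys.stdout.write(extract_text(path))


# The filter is given the file name as its single argument.
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

   Saved as an executable named rclblob in the filters directory, the script
   would be run with the file to index as its only argument.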

     ----------------------------------------------------------------------

-4.5. The KDE Kicker Recoll applet
+5.5. The KDE Kicker Recoll applet

   The Recoll source tree contains the source code to the recoll_applet, a
   small application derived from the find_applet. This can be used to add a
   small Recoll launcher to the KDE panel.

-   The applet is not automatically built with the main Recoll programs. To
-   build it, you need to unpack the Recoll source code, then go to the
-   kde/recoll_applet/ directory, and type the usual configure;make;make
-   install.
+   The applet is not automatically built with the main Recoll programs, nor
+   is it included with the main source distribution (because the KDE build
+   boilerplate makes it relatively big). You can download its source from the
+   recoll.org download page. Use the omnipotent configure;make;make install
+   incantation to build and install.

   You can then add the applet to the panel by right-clicking the panel and
   choosing the Add applet entry.

   The recoll_applet has a small text window where you can type a Recoll
   query (in query language form), and an icon which can be used to restrict
-   the search to certain types of files.
-
-     ----------------------------------------------------------------------
-
-4.6. Extending Recoll
-
-  4.6.1. Writing a document filter
-
-   Recoll filters are executable programs which translate from a specific
-   format (ie: openoffice, acrobat, etc.) to the Recoll indexing input
-   format, which was chosen to be HTML.
-
-   Recoll filters are usually shell-scripts, but this is in no way necessary.
-   These programs are extremely simple and most of the difficulty lies in
-   extracting the text from the native format, not outputting what is
-   expected by Recoll. Happily enough, most document formats already have
-   translators or text extractors which handle the difficult part and can be
-   called from the filter.
-
-   Filters are called with a single argument which is the source file name.
-   They should output the result to stdout.
-
-   The RECOLL_FILTER_FORPREVIEW environment variable (values yes, no) tells
-   the filter if the operation is for indexing or previewing. Some filters
-   use this to output a slightly different format. This is not essential.
-
-   The output HTML could be very minimal like the following example:
-
- <html><head>
- <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
- </head>
- <body>some text content</body></html>
-         
-
-   You should take care to escape some characters inside the text by
-   transforming them into appropriate entities. "&" should be transformed
-   into "&amp;", "<" should be transformed into "&lt;".
-
-   The character set needs to be specified in the header. It does not need to
-   be UTF-8 (Recoll will take care of translating it), but it must be
-   accurate for good results.
-
-   Recoll will also make use of other header fields if they are present:
-   title, description, keywords.
-
-   As of Recoll release 1.9, filters also have the possibility to "invent"
-   field names. This should be output as meta tags:
-
- <meta name="somefield" content="Some textual data" />
-
-   In this case, a correspondance between field name and Xapian prefix should
-   also be added to the mimeconf file. See the existing entries for
-   inspiration. The field can then be used inside the query language to
-   narrow searches.
-
-   The easiest way to write a new filter is probably to start from an
-   existing one.
+   the search to certain types of files. It is quite primitive, and launches
+   a new recoll GUI instance every time (even if it is already running). You
+   may find it useful anyway.

     ----------------------------------------------------------------------