*** empty log message ***

2006-04-26 11:51:32 +00:00 · 2006-04-26 11:51:32 +00:00 · 1bcdf8515e
commit 1bcdf8515e
parent 4718c4016d
2 changed files with 212 additions and 83 deletions
--- a/src/INSTALL
+++ b/src/INSTALL
@@ -28,9 +28,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 4.1.1. Prerequisites

   At the very least, you will need to download and install the xapian core
-   package (Recoll currently uses version 0.9.2), and the qt runtime and
-   development packages (Recoll development currently uses version 3.3.5, but
-   any 3.3 version is probably ok).
+   package (Recoll development currently uses version 0.9.5), and the qt
+   runtime and development packages (Recoll development currently uses
+   version 3.3.5, but any 3.3 version is probably ok).

   You will most probably be able to find a binary package for qt for your
   system. You may have to compile Xapian but this is not difficult (if you
--- a/src/README
+++ b/src/README
@@ -27,15 +27,19 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

                1.3. Recoll overview

-   2. Indexation
+   2. Indexing

                2.1. Introduction

-                2.2. The indexation configuration
+                2.2. Index storage

-                2.3. Starting indexation
+                             2.2.1. Security aspects

-                2.4. Using cron to automate indexation
+                2.3. The indexing configuration
+
+                2.4. Starting indexing
+
+                2.5. Using cron to automate indexing

   3. Search

@@ -43,13 +47,17 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

                3.2. Complex/advanced search

-                3.3. Document history
+                3.3. Multiple databases

-                3.4. Result list sorting
+                3.4. Document history

-                3.5. Search tips, shortcuts
+                3.5. Result list sorting

-                3.6. Customising the search interface
+                3.6. Additional result list functionality
+
+                3.7. Search tips, shortcuts
+
+                3.8. Customising the search interface

   4. Installation

@@ -136,27 +144,27 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
   Recoll uses the Xapian information retrieval library as its storage and
   retrieval engine. Xapian is a very mature package using a sophisticated
   probabilistic ranking model. Recoll provides the interface to get data
-   into (indexation) and out (searching) of the system.
+   into (indexing) and out (searching) of the system.

   In practice, Xapian works by remembering where terms appear in your
-   document files. The acquisition process is called indexation.
+   document files. The acquisition process is called indexing.

-   The resulting database can be big (roughly the size of the original
-   document set), but it is not a document archive. Recoll can only display
-   documents that still exist at the place from which they were indexed.
-   (Actually, there is a way to reconstruct a document from the information
-   in the database, but the result is not nice, as all formatting,
-   punctuation and capitalisation are lost).
+   The resulting index can be big (roughly the size of the original document
+   set), but it is not a document archive. Recoll can only display documents
+   that still exist at the place from which they were indexed. (Actually,
+   there is a way to reconstruct a document from the information in the
+   index, but the result is not nice, as all formatting, punctuation and
+   capitalisation are lost).

   Recoll stores all internal data in Unicode UTF-8 format, and it can index
   files with different character sets, encodings, and languages into the
-   same database. It has input filters for many document types.
+   same index. It has input filters for many document types.

   Stemming depends on the document language. Recoll stores the unstemmed
   versions of terms and uses auxiliary databases for term expansion. It can
   switch stemming languages, or add a language, without reindexing. Storing
-   documents in different languages in the same database is possible, and
-   useful in practice, but does introduce possibilities of confusion. Recoll
+   documents in different languages in the same index is possible, and useful
+   in practice, but does introduce possibilities of confusion. Recoll
   currently makes no attempt at automatic language recognition.

   Recoll has many parameters which define exactly what to index, and how to
@@ -170,7 +178,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
   should be sufficient for giving Recoll a try, but you may want to adjust
   it later.

-   Indexation is started automatically the first time you execute the recoll
+   Indexing is started automatically the first time you execute the recoll
   search graphical user interface, or by executing the recollindex command.

   Searches are performed inside the recoll program, which has many options
@@ -178,20 +186,20 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

     ----------------------------------------------------------------------

-                             Chapter 2. Indexation
+                              Chapter 2. Indexing

 2.1. Introduction

-   Indexation is the process by which the set of documents is analyzed and
-   the data entered into the database. Recoll indexation is normally
-   incremental: documents will only be processed if they have been modified.
-   On the first execution, of course, all documents will need processing. A
-   full index build can be forced later on by specifying an option to the
-   indexation command (recollindex -z).
+   Indexing is the process by which the set of documents is analyzed and the
+   data entered into the database. Recoll indexing is normally incremental:
+   documents will only be processed if they have been modified. On the first
+   execution, of course, all documents will need processing. A full index
+   build can be forced later on by specifying an option to the indexing
+   command (recollindex -z).

-   Recoll indexation takes place at discrete times. There is currently no
+   Recoll indexing takes place at discrete times. There is currently no
   interface to real time file modification monitors. The typical usage is to
-   have a nightly indexation run programmed into your cron file.
+   have a nightly indexing run programmed into your cron file.

   +------------------------------------------------------------------------+
   | Side note: there is nothing in Recoll and Xapian that would prevent    |
@@ -208,7 +216,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
   document. Some file types, like mail folder files can hold many
   individually indexed documents.

-   Recoll indexation processes plain text, HTML, openoffice and e-mail files
+   Recoll indexing processes plain text, HTML, openoffice and e-mail files
   internally. Other types (ie: postscript, pdf, ms-word, rtf) need external
   applications for preprocessing. The list is in the installation section.

@@ -217,7 +225,48 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

     ----------------------------------------------------------------------

-2.2. The indexation configuration
+2.2. Index storage
+
+   The default location for the index data is the $HOME/.recoll/xapiandb/
+   directory. This can be changed by setting the RECOLL_CONFDIR environment
+   variable, or by specifying the dbdir parameter in the configuration file
+   (see the configuration section).
+
+   The size of the index is determined by the size of the set of documents,
+   but the ratio can vary a lot. For a typical mixed set of documents, the
+   index size will often be close to the data set size. In specific cases (a
+   set of compressed mbox files for example), the index can become much
+   bigger than the documents. It may also be much smaller if the documents
+   contain a lot of images or other non-indexed data (an extreme example
+   being a set of mp3 files where only the tags would be indexed).
+
+   Of course, images, sound and video do not increase the index size, which
+   means that nowadays (2006) even a big index will typically be negligible
+   compared to the total amount of data on the computer.
+
+   The index data directory only contains data that will be rebuilt by an
+   index run, so that it can be destroyed safely.
+
+     ----------------------------------------------------------------------
+
+  2.2.1. Security aspects
+
+   The Recoll index does not hold copies of the indexed documents. But it
+   does hold enough data to allow for an almost complete reconstruction. If
+   confidential data is indexed, access to the database directory should be
+   restricted.
+
+   As of version 1.4, Recoll will create the configuration directory with a
+   mode of 0700 (access by owner only). As the index directory is by default
+   a subdirectory of the configuration directory, this should result in
+   appropriate protection.
+
+   If you use another setup, you should think of the kind of protection you
+   need for your index, and set the directory access modes appropriately.
+
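For setups where the index lives outside the configuration directory, the owner-only mode mentioned above can be applied by hand. The following is a minimal sketch, not something Recoll itself provides: the helper name `restrict_to_owner` is hypothetical, and a temporary directory stands in for a real index directory. It assumes a POSIX system.

```python
import os
import stat
import tempfile


def restrict_to_owner(path):
    """Set mode 0700 on a directory and return the resulting permission bits."""
    os.chmod(path, 0o700)
    return stat.S_IMODE(os.stat(path).st_mode)


# A throwaway directory stands in for the index directory (assumption).
demo_dir = tempfile.mkdtemp()
mode = restrict_to_owner(demo_dir)
print(oct(mode))
```

The same effect can be obtained from a shell with chmod; the point is only that the directory holding the index should not be readable by other users.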
+     ----------------------------------------------------------------------
+
+2.3. The indexing configuration

   Values set in the system-wide configuration file (named like
   /usr/[local/]share/recoll/examples/recoll.conf) can be overridden by those
@@ -226,8 +275,8 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

   The most accurate documentation for editing the file is given by comments
   inside the central one. If you want to adjust the configuration before
-   indexation, just click Cancel when the program asks if it should start
-   initial indexation. This will have created a .recoll directory containing
+   indexing, just click Cancel when the program asks if it should start
+   initial indexing. This will have created a .recoll directory containing
   empty configuration files.

   The configuration is also documented inside the installation chapter of
@@ -235,27 +284,27 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

     ----------------------------------------------------------------------

-2.3. Starting indexation
+2.4. Starting indexing

-   Indexation is performed either by the recollindex program, or by the
-   indexation thread inside the recoll program (use the File menu).
+   Indexing is performed either by the recollindex program, or by the
+   indexing thread inside the recoll program (use the File menu).

-   If the recoll program finds no database when it starts, it will
-   automatically start indexation (except if cancelled).
+   If the recoll program finds no index when it starts, it will automatically
+   start indexing (except if cancelled).

-   It is best to avoid interrupting the indexation process, as this may
+   It is best to avoid interrupting the indexing process, as this may
   sometimes leave the database in a bad state. This is not a serious
   problem, as you then just need to clear everything and restart the
-   indexation: the database files are normally stored in the
+   indexing: the index files are normally stored in the
   $HOME/.recoll/xapiandb directory, which you can just delete if needed.
   Alternatively, you can start recollindex -z, which will reset the database
-   before indexation.
+   before indexing.

     ----------------------------------------------------------------------

-2.4. Using cron to automate indexation
+2.5. Using cron to automate indexing

-   The most common way to set up indexation is to have a cron task execute it
+   The most common way to set up indexing is to have a cron task execute it
   every night. For example the following crontab entry would do it every day
   at 3:30AM (supposing recollindex is in your PATH):

@@ -335,7 +384,30 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

     ----------------------------------------------------------------------

-3.3. Document history
+3.3. Multiple databases
+
+   Your Recoll configuration always defines a main index. This is what gets
+   updated, for example, when you execute recollindex.
+
+   You can use the search configuration tool to define additional databases
+   to be searched. These databases can be made active or inactive at any
+   moment.
+
+   The typical use of this feature is for a system administrator to set up a
+   central index, that you may choose to search, or not, in addition to your
+   personal data. Of course, there are other possibilities.
+
+   The main index (defined by your personal configuration) is always active.
+
+   The list of searchable databases may also be defined by the
+   RECOLL_EXTRA_DBS environment variable. This should hold a colon-separated
+   list of index directories, ie:
+
+ export RECOLL_EXTRA_DBS=/some/place/xapiandb:/some/other/db
+
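To illustrate the colon-separated format, here is a minimal sketch of how such a value could be split into a list of directories. The function name `extra_db_dirs` is hypothetical, and dropping empty entries is an assumption of this sketch; the text does not say how Recoll itself treats them.

```python
import os


def extra_db_dirs(environ=None):
    """Split RECOLL_EXTRA_DBS into a list of directories, skipping empty entries."""
    if environ is None:
        environ = os.environ
    value = environ.get("RECOLL_EXTRA_DBS", "")
    # Skipping empty entries is an assumption made for this illustration.
    return [d for d in value.split(":") if d]


# Same value as in the export example above.
example = {"RECOLL_EXTRA_DBS": "/some/place/xapiandb:/some/other/db"}
dirs = extra_db_dirs(example)
print(dirs)
```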
+     ----------------------------------------------------------------------
+
+3.4. Document history

   Documents that you actually view (with the internal preview or an external
   tool) are entered into the document history, which is remembered. You can
@@ -343,7 +415,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

     ----------------------------------------------------------------------

-3.4. Result list sorting
+3.5. Result list sorting

   The documents in a result list are normally sorted in order of relevance.
   It is possible to specify different sort parameters by using the Sort
@@ -359,7 +431,34 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

     ----------------------------------------------------------------------

-3.5. Search tips, shortcuts
+3.6. Additional result list functionality
+
+   Apart from the preview and edit links, you can display a popup menu by
+   right-clicking over a paragraph in the result list. This menu has the
+   following entries:
+
+     * Preview
+
+     * Edit
+
+     * Copy File Name
+
+     * Copy Url
+
+     * More like this
+
+   The Preview and Edit entries do the same thing as the corresponding links.
+   The two following entries will copy either the file path or a URL to the
+   clipboard, for pasting into another application.
+
+   The More like this entry will select a number of relevant terms from the
+   current document and enter them into the simple search field. You can then
+   start a simple search, with a good chance of finding documents related to
+   the current result.
+
+     ----------------------------------------------------------------------
+
+3.7. Search tips, shortcuts

   Disabling stem expansion. Entering a capitalized word in any search field
   will prevent stem expansion (no search for gardening if you enter Garden
@@ -371,14 +470,31 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
   followed by manual. You can use the This exact phrase field of the
   advanced search dialog to the same effect.

+   Term completion. Typing ^TAB (Control+Tab) in the simple search entry
+   field while entering a word will either complete the current word if its
+   beginning matches a unique term in the index, or open a window to propose
+   a list of completions.
+
+   Picking up new terms for search from displayed documents. Double-clicking
+   on a word in the result list or in a preview window will copy it to the
+   simple search entry field.
+
+   Finding related documents. Selecting the More like this entry in the
+   result list paragraph right-click menu will select a set of "interesting"
+   terms from the current result, and insert them into the simple search
+   entry field. You can then possibly edit the list and start a search to
+   find documents which may be related to the current result.
+
   Query explanation. You can get an exact description of what the query
   looked for, including stem expansion, and boolean operators used, by
   clicking on the result list header.

-   File names. All file name elements (the broken up file path) are entered
-   as terms during indexation, and you can specify them as ordinary terms in
-   normal search fields. Alternatively, you can use specific file name search
-   which will only look for file names and can use wildcard expansion.
+   File names. File names are added as terms during indexing, and you can
+   specify them as ordinary terms in normal search fields (Recoll used to
+   index all directories in the file path as terms. This has been abandoned
+   as it did not seem really useful). Alternatively, you can use specific
+   file name search which will only look for file names and can use wildcard
+   expansion.

   Quitting. Entering ^Q almost anywhere will close the application.

@@ -387,7 +503,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

     ----------------------------------------------------------------------

-3.6. Customising the search interface
+3.8. Customising the search interface

   It is possible to customise some aspects of the search interface by using
   Query configuration entry in the Preferences menu.
@@ -404,7 +520,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
       The rest of the fonts used by Recoll are determined by your generic QT
       config (try the qtconfig command).

-     * Html help browser: this will let you chose your the preferred browser
+     * Html help browser: this will let you choose your preferred browser
       which will be started from the Help menu to read the user manual. You
       can enter a simple name if the command is in your PATH, or browse for
       a full pathname.
@@ -413,6 +529,11 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
       be turned off. They take quite a lot of space and convey relatively
       little useful information.

+     * Auto-start simple search on whitespace entry: if this is checked, a
+       search will be executed each time you enter a space in the simple
+       search input field. This lets you look at the result list as you enter
+       new terms. This is off by default; you may or may not like it.
+
   Search parameters:

     * Stemming language: stemming obviously depends on the document's
@@ -420,7 +541,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
       which were built during indexing (this is set in the main
       configuration file), or later added with recollindex -s (See the
       recollindex manual). Stemming languages which are dynamically added
-       will be deleted at the next indexation pass unless they are also added
+       will be deleted at the next indexing pass unless they are also added
       in the configuration file.

     * Dynamically build abstracts: this decides if Recoll tries to build
@@ -433,6 +554,20 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
       and display an abstract in place of an explicit abstract found within
       the document itself.

+   Extra databases:
+
+   This panel will let you browse for additional databases that you may want
+   to search. Extra databases are designated by their database directory (ie:
+   /home/someothergui/.recoll/xapiandb, /usr/local/recollglobal/xapiandb).
+
+   Once entered, the databases will appear in the All extra databases list,
+   and you can choose which ones you want to use at any moment by
+   transferring them to/from the Active extra databases list.
+
+   Your main database (the one the current configuration indexes to), is
+   always implicitly active. If this is not desirable, you can set up your
+   configuration so that it indexes, for example, an empty directory.
+
     ----------------------------------------------------------------------

                            Chapter 4. Installation
@@ -442,9 +577,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
  4.1.1. Prerequisites

   At the very least, you will need to download and install the xapian core
-   package (Recoll currently uses version 0.9.2), and the qt runtime and
-   development packages (Recoll development currently uses version 3.3.5, but
-   any 3.3 version is probably ok).
+   package (Recoll development currently uses version 0.9.5), and the qt
+   runtime and development packages (Recoll development currently uses
+   version 3.3.5, but any 3.3 version is probably ok).

   You will most probably be able to find a binary package for qt for your
   system. You may have to compile Xapian but this is not difficult (if you
@@ -563,13 +698,12 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
   in a directory named like /usr/[local/]share/recoll/examples, they define
   default values for the system. A parallel set of files exists in the
   .recoll directory in your home (this can be changed with the
-   RECOLL_CONFDIR environment variable. The database is also kept in .recoll
-   by default, (this can be changed by a configuration parameter).
+   RECOLL_CONFDIR environment variable).

   If the .recoll directory does not exist when recoll or recollindex are
   started, it will be created with a set of empty configuration files.
   recoll will give you a chance to edit the configuration file before
-   starting indexation. recollindex will proceed immediately.
+   starting indexing. recollindex will proceed immediately.

   Most of the parameters specific to the recoll GUI are set through the
   Preferences menu and stored in the standard QT place ($HOME/.qt/recollrc).
@@ -600,8 +734,8 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
     * Section definition ([somedirname]).

   Section lines allow redefining some parameters for a directory subtree.
-   Some of the parameters used for indexation are looked up hierarchically
-   from the more to the less specific. Not all parameters can be meaningfully
+   Some of the parameters used for indexing are looked up hierarchically from
+   the more to the less specific. Not all parameters can be meaningfully
   redefined; this is specified for each in the next section.
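
The "more to less specific" lookup can be pictured as walking up from a file's directory until a section defines the parameter, then falling back to the top-level value. The sketch below is only an illustration under stated assumptions: the dictionary layout, the `lookup` function, and the example paths and values are hypothetical and do not reflect Recoll's actual implementation. Only the parameter name defaultcharset comes from the documentation.

```python
import posixpath


def lookup(config, name, path):
    """Return the value of `name` for `path`, preferring the deepest matching section.

    `config` maps a section directory to its parameters; the "" key holds
    the values defined outside of any section (hypothetical layout).
    """
    current = path
    while True:
        section = config.get(current, {})
        if name in section:
            return section[name]
        if current in ("/", ""):
            break
        # Move one directory level up and try again.
        current = posixpath.dirname(current)
    return config.get("", {}).get(name)


# Hypothetical configuration: a global default and one subtree override.
config = {
    "": {"defaultcharset": "iso-8859-1"},
    "/home/me/mail": {"defaultcharset": "utf-8"},
}
in_subtree = lookup(config, "defaultcharset", "/home/me/mail/inbox")
elsewhere = lookup(config, "defaultcharset", "/home/me/docs")
print(in_subtree, elsewhere)
```

Paths with a trailing slash are not normalised here; a real implementation would need to handle that, along with tilde expansion.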

   The tilde character (~) is expanded in file names to the name of the
@@ -619,9 +753,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
   set to use for document types which do not specify it internally.

   The default configuration will index your home directory. If this is not
-   appropriate, use recoll to copy the sample configuration, click Cancel,
+   appropriate, start recoll to create a blank configuration, click Cancel,
   and edit the configuration file before restarting the command. This will
-   start the initial indexation, which may take some time.
+   start the initial indexing, which may take some time.

   Parameters:

@@ -630,8 +764,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
           Specifies the list of directories or files to index (recursively
           for directories). The indexer will not follow symbolic links
           inside the indexed trees. If an entry in the topdirs list is a
-           symbolic link, indexation will not start and will generate an
-           error.
+           symbolic link, indexing will not start and will generate an error.

   skippedNames

@@ -662,8 +795,8 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

   logfilename

-           Where should the messages go. 'stderr' can be used as a special
-           value.
+           Where the messages should go. 'stderr' can be used as a special
+           value, and is the default.

   filtersdir

@@ -677,7 +810,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
           A list of languages for which the stem expansion databases will be
           built. See recollindex(1) for possible values. You can add a stem
           expansion database for a different language by using recollindex
-           -s, but it will be deleted during the next indexation. Only
+           -s, but it will be deleted during the next indexing. Only
           languages listed in the configuration file are permanent.

   iconsdir
@@ -687,8 +820,8 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

   dbdir

-           The name of the Xapian database directory. It will be created if
-           needed when the database is initialized.
+           The name of the Xapian data directory. It will be created if
+           needed when the index is initialized.

   defaultcharset

@@ -710,7 +843,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
           determining the mime type for a file (the main procedure uses
           suffix associations as defined in the mimemap file). This can be
           useful for files with suffixless names, but it will also cause the
-           indexation of many bogus "text" files.
+           indexing of many bogus "text" files.

   indexallfilenames

@@ -718,7 +851,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
           allow specific file names searches using wild cards. This
           parameter decides if file name indexing is performed only for
           files with mime types that would qualify them for full text
-           indexation, or for all files inside the selected subtrees,
+           indexing, or for all files inside the selected subtrees,
           independent of mime type.

     ----------------------------------------------------------------------
@@ -731,10 +864,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
   file -i command will be executed to determine the mime type (this can be
   switched off inside the main configuration file).

-   mimemap also has a list of extensions which should be ignored totally (to
-   avoid losing time by executing file for things that certainly should not
-   be indexed).
-
   The mappings can be specified on a per-subtree basis, which may be useful
   in some cases. Example: gaim logs have a .txt extension but should be
   handled specially, which is possible because they are usually all located
@@ -750,11 +879,11 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

  4.4.3. The mimeconf file

-   mimeconf specifies how the different mime types are handled for
-   indexation, and for display.
+   mimeconf specifies how the different mime types are handled for indexing,
+   and for display.

-   Changing the indexation parameters is probably not a good idea except if
-   you are a Recoll developper.
+   Changing the indexing parameters is probably not a good idea except if you
+   are a Recoll developer.

   You may want to adjust the external viewers defined in (ie: html is either
   previewed internally or displayed using firefox, but you may prefer