diff --git a/src/doc/user/usermanual.sgml b/src/doc/user/usermanual.sgml
index d6e4a84a..0ad4f6fc 100644
--- a/src/doc/user/usermanual.sgml
+++ b/src/doc/user/usermanual.sgml
@@ -20,7 +20,7 @@
- 2005
+ 2005-2011
Jean-Francois
Dockes
@@ -197,18 +197,18 @@
Periodic indexing: indexing takes place at discrete
- times, by executing the recollindex
- command. The typical usage is to have a nightly indexing run
- programmed into your
- cron file.
+ times, by executing the recollindex
+ command. The typical usage is to have a nightly indexing run
+ programmed
+ into your cron file.
Real time indexing: indexing takes place as soon as a file is created or
- changed. recollindex runs as a daemon
- and uses a file system alteration monitor such as
+ changed. recollindex runs as a daemon
+ and uses a file system alteration monitor such as
inotify,
Fam or
Gamin
@@ -218,17 +218,16 @@
The choice between the two methods is mostly a matter of
- preference, and they can be combined by setting up multiple
- indexes (ie: use periodic indexing on a big documentation
- directory, and real time indexing on a small home
- directory). Monitoring a big file system tree can consume
- significant system resources.
+ preference, and they can be combined by setting up multiple
+ indexes (e.g. use periodic indexing on a big documentation
+ directory, and real time indexing on a small home
+ directory). Monitoring a big file system tree can consume
+ significant system resources.
&RCL; knows about quite a few different document
- types. The parameters for document types recognition and
- processing are set in
- configuration files.
-
+ types. The parameters for document types recognition and
+ processing are set in
+ configuration files.
Most file types, like HTML or word processing files, only hold
one document. Some file types, like mail folder files or zip
@@ -236,25 +235,24 @@
in turn be themselves compound ones. Such hierarchies can go quite
deep, and &RCL; has no problem processing, for example, an ms-word
document which would be an attachment to an email message part of
- a folder file archived inside a zip file...
-
+ a folder file archived inside a zip file...
&RCL; indexing processes plain text, HTML, openoffice
- and e-mail files internally (a few more actually).
+ and e-mail files internally (a few more actually).
Other file types (ie: postscript, pdf, ms-word, rtf ...)
- need external applications for preprocessing. The list is in the
- installation
- section. After every indexing operation, &RCL; updates a list of
- commands that would be needed for indexing existing files
- types. This list can be displayed from the
- recoll File menu. It is
- stored in the missing text file
- inside the configuration directory.
+ need external applications for preprocessing. The list is in the
+ installation
+ section. After every indexing operation, &RCL; updates a list of
+ commands that would be needed for indexing existing file
+ types. This list can be displayed from the
+ recoll File menu. It is
+ stored in the missing text file
+ inside the configuration directory.
Without further configuration, &RCL; will index all
- appropriate files from your home directory, with a reasonable
- set of defaults.
+ appropriate files from your home directory, with a reasonable
+ set of defaults.
In some cases, it may be interesting to index different
areas of the file system to separate databases. You can do this
@@ -323,19 +321,19 @@ recoll
The size of the index is determined by the document set size,
- but the ratio can vary a lot. For a typical mixed
- set of documents, the index size will often be close to
- the data set size. In specific cases (a set of compressed
- mbox files for example), the index can become much bigger than
- the documents. It may also be much smaller if the documents
- contain a lot of images or other non-indexed data (an extreme
- example being a set of mp3 files where only the tags would be
- indexed).
+ but the ratio can vary a lot. For a typical mixed
+ set of documents, the index size will often be close to
+ the data set size. In specific cases (a set of compressed
+ mbox files for example), the index can become much bigger than
+ the documents. It may also be much smaller if the documents
+ contain a lot of images or other non-indexed data (an extreme
+ example being a set of mp3 files where only the tags would be
+ indexed).
Of course, images, sound and video do not increase the
- index size, which means that it will be quite typical nowadays
- (2006), that even a big index will be negligible against the
- total amount of data on the computer.
+ index size, which means that it will be quite typical nowadays
+ (2006), that even a big index will be negligible against the
+ total amount of data on the computer.
The index data directory (xapiandb)
only contains data that can be completely rebuilt by an index
@@ -385,20 +383,20 @@ recoll
Security aspects
The &RCL; index does not hold copies of the indexed
- documents. But it does hold enough data to allow for an almost
- complete reconstruction. If confidential data is indexed,
- access to the database directory should be restricted.
+ documents. But it does hold enough data to allow for an almost
+ complete reconstruction. If confidential data is indexed,
+ access to the database directory should be restricted.
As of version 1.4, &RCL; will create the configuration
- directory with a mode of 0700 (access by owner only). As the
- index data directory is by default a sub-directory of the
- configuration directory, this should result in appropriate
- protection.
+ directory with a mode of 0700 (access by owner only). As the
+ index data directory is by default a sub-directory of the
+ configuration directory, this should result in appropriate
+ protection.
If you use another setup, you should think of the kind
- of protection you need for your index, set the directory
- and files access modes appropriately, and also maybe adjust
- the umask used during index updates.
+ of protection you need for your index, set the directory
+ and files access modes appropriately, and also maybe adjust
+ the umask used during index updates.
@@ -409,38 +407,38 @@ recoll
Indexing configuration
Variables set inside the
- &RCL; configuration files
- control which areas of the file system are indexed, and how
- files are processed. These variables can be set either by
- editing the text files or using the dialogs in the
- recoll GUI.
+ &RCL; configuration files
+ control which areas of the file system are indexed, and how
+ files are processed. These variables can be set either by
+ editing the text files or using the dialogs in the
+ recoll GUI.
You can also use multiple
- indexes defined by separate configurations, typically to
- separate personal and shared indexes, or to take advantage of
- the organization of your data to improve search precision.
+ indexes defined by separate configurations, typically to
+ separate personal and shared indexes, or to take advantage of
+ the organization of your data to improve search precision.
The first time you start recoll, you
- will be asked whether or not you would like it to build the
- index. If you want to adjust the configuration before indexing,
- just click Cancel at this point, which will get
- you into the configuration interface. If you exit,
- recoll will have created a ~/.recoll directory
- containing empty configuration files, which you can edit by hand.
+ will be asked whether or not you would like it to build the
+ index. If you want to adjust the configuration before indexing,
+ just click Cancel at this point, which will get
+ you into the configuration interface. If you exit,
+ recoll will have created a ~/.recoll directory
+ containing empty configuration files, which you can edit by hand.
- The configuration is documented inside the installation chapter of this
- document, or in the recoll.conf(5) man page, but the most
- current information will most likely be the comments inside the
- sample file. The most immediately useful variable you may
- interested in is probably topdirs,
- which determines what subtrees get indexed.
+ The configuration is documented inside the
+ installation chapter
+ of this document, or in the recoll.conf(5) man page, but the most
+ current information will most likely be the comments inside the
+ sample file. The most immediately useful variable you may
+ be interested in is probably
+ topdirs,
+ which determines what subtrees get indexed.
The applications needed to index file types other than
- text, HTML or email (ie: pdf, postscript, ms-word...) are
- described in the external
- packages section
+ text, HTML or email (e.g. pdf, postscript, ms-word...) are
+ described in the external
+ packages section
The indexing configuration GUI
@@ -510,7 +508,7 @@ recoll
Periodic indexing
- Starting indexing
+ Running indexing
Indexing is performed either by the
recollindex program, or by the
@@ -525,22 +523,22 @@ recoll
recollindex command:
Starting the indexing thread is more convenient,
- being just one click away.
+ being just one click away.
The recollindex command has
- more options, especially the one to reset the index
- (-z).
+ more options, especially the one to reset the index
+ (-z).
The recollindex command will
- not take down your GUI if it crashes (a rare occurrence, but who
- knows...)
+ not take down your GUI if it crashes (a rare occurrence,
+ but who knows...)
The recollindex command uses
- setpriority/nice to lower its priority while
- indexing
- (it will also use ionice when this becomes
- more widely available), the thread can't do it, else it would
- also slow down the user/search interface.
+ setpriority/nice to lower its priority while
+ indexing
+ (it will also use ionice when this becomes
+ more widely available). The thread can't do it, or it would
+ also slow down the user/search interface.
I'll let the reader decide where my heart belongs...
@@ -567,7 +565,24 @@ recoll
up to date will not need to be reindexed).
recollindex has a number of other options
- which are described in its man page.
+ which are described in its man page.
+
+ Of special interest, perhaps, are the -i and
+ -f options. -i allows
+ indexing an explicit list of files (given as command line
+ parameters or read on stdin). -f tells
+ recollindex to ignore file selection
+ parameters from the configuration. Together, these options allow
+ building a custom file selection process for some area of the
+ file system, by adding the top directory to the
+ skippedPaths list and using an appropriate
+ file selection method to build the file list to be fed to
+ recollindex -if.
+
+ recollindex -i will not descend into
+ directory parameters, but just add them as index entries. It is
+ up to the external file selection method to build the complete
+ file list.
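
The custom selection process described in the added paragraphs could be sketched roughly as follows. This is only an illustration, not part of the patch: the function names `select_files` and `feed_recollindex`, the suffix-based selection rule, and the one-path-per-line stdin format are all assumptions. Only the `-i` and `-f` options and the fact that the list can be read on stdin come from the text above.

```python
import os
import shutil
import subprocess


def select_files(top, suffixes=(".txt", ".html")):
    """Hypothetical selection rule: regular files under `top` with a wanted suffix.

    Only files are returned, never directories, because `recollindex -i`
    does not descend into directory parameters.
    """
    selected = []
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if name.lower().endswith(suffixes) and os.path.isfile(path):
                selected.append(path)
    return sorted(selected)


def feed_recollindex(paths):
    """Pass the list to `recollindex -if` on stdin.

    Assumption: one path per line. Returns the exit status, or None when
    recollindex is not found on PATH.
    """
    if shutil.which("recollindex") is None:
        return None
    data = "".join(p + "\n" for p in paths)
    return subprocess.run(["recollindex", "-if"], input=data, text=True).returncode
```

A call such as `feed_recollindex(select_files("/some/top/dir"))` would then index only the selected files, assuming the top directory has been added to `skippedPaths` so that the normal indexing pass leaves it alone.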