From e41216aa9d052ba12fa596e8845eba4d9c45e709 Mon Sep 17 00:00:00 2001 From: Jean-Francois Dockes Date: Mon, 20 Jun 2011 13:53:36 +0200 Subject: [PATCH] doc --- src/doc/user/usermanual.sgml | 189 +++++++++++++++++++---------------- 1 file changed, 102 insertions(+), 87 deletions(-) diff --git a/src/doc/user/usermanual.sgml b/src/doc/user/usermanual.sgml index d6e4a84a..0ad4f6fc 100644 --- a/src/doc/user/usermanual.sgml +++ b/src/doc/user/usermanual.sgml @@ -20,7 +20,7 @@ - 2005 + 2005-2011 Jean-Francois Dockes @@ -197,18 +197,18 @@ Periodic indexing: indexing takes place at discrete - times, by executing the recollindex - command. The typical usage is to have a nightly indexing run - programmed into your - cron file. + times, by executing the recollindex + command. The typical usage is to have a nightly indexing run + programmed + into your cron file. Real time indexing: indexing takes place as soon as a file is created or - changed. recollindex runs as a daemon - and uses a file system alteration monitor such as + changed. recollindex runs as a daemon + and uses a file system alteration monitor such as inotify, Fam or Gamin @@ -218,17 +218,16 @@ The choice between the two methods is mostly a matter of - preference, and they can be combined by setting up multiple - indexes (ie: use periodic indexing on a big documentation - directory, and real time indexing on a small home - directory). Monitoring a big file system tree can consume - significant system resources. + preference, and they can be combined by setting up multiple + indexes (ie: use periodic indexing on a big documentation + directory, and real time indexing on a small home + directory). Monitoring a big file system tree can consume + significant system resources. &RCL; knows about quite a few different document - types. The parameters for document types recognition and - processing are set in - configuration files. - + types. 
The parameters for document type recognition and + processing are set in + configuration files. Most file types, like HTML or word processing files, only hold one document. Some file types, like mail folder files or zip @@ -236,25 +235,24 @@ in turn be themselves compound ones. Such hierarchies can go quite deep, and &RCL; has no problem processing, for example, an ms-word document which would be an attachment to an email message part of - a folder file archived inside a zip file... - + a folder file archived inside a zip file... &RCL; indexing processes plain text, HTML, openoffice - and e-mail files internally (a few more actually). + and e-mail files internally (a few more actually). Other file types (ie: postscript, pdf, ms-word, rtf ...) - need external applications for preprocessing. The list is in the - installation - section. After every indexing operation, &RCL; updates a list of - commands that would be needed for indexing existing files - types. This list can be displayed from the - recoll File menu. It is - stored in the missing text file - inside the configuration directory. + need external applications for preprocessing. The list is in the + installation + section. After every indexing operation, &RCL; updates a list of + commands that would be needed for indexing existing file + types. This list can be displayed from the + recoll File menu. It is + stored in the missing text file + inside the configuration directory. Without further configuration, &RCL; will index all - appropriate files from your home directory, with a reasonable - set of defaults. + appropriate files from your home directory, with a reasonable + set of defaults. In some cases, it may be interesting to index different areas of the file system to separate databases. You can do this @@ -323,19 +321,19 @@ recoll The size of the index is determined by the document set size, - but the ratio can vary a lot.
For a typical mixed - set of documents, the index size will often be close to - the data set size. In specific cases (a set of compressed - mbox files for example), the index can become much bigger than - the documents. It may also be much smaller if the documents - contain a lot of images or other non-indexed data (an extreme - example being a set of mp3 files where only the tags would be - indexed). + but the ratio can vary a lot. For a typical mixed + set of documents, the index size will often be close to + the data set size. In specific cases (a set of compressed + mbox files for example), the index can become much bigger than + the documents. It may also be much smaller if the documents + contain a lot of images or other non-indexed data (an extreme + example being a set of mp3 files where only the tags would be + indexed). Of course, images, sound and video do not increase the - index size, which means that it will be quite typical nowadays - (2006), that even a big index will be negligible against the - total amount of data on the computer. + index size, which means that nowadays (2006) it will be quite + typical for even a big index to be negligible against the + total amount of data on the computer. The index data directory (xapiandb) only contains data that can be completely rebuilt by an index @@ -383,20 +383,20 @@ recoll Security aspects The &RCL; index does not hold copies of the indexed - documents. But it does hold enough data to allow for an almost - complete reconstruction. If confidential data is indexed, - access to the database directory should be restricted. + documents. But it does hold enough data to allow for an almost + complete reconstruction. If confidential data is indexed, + access to the database directory should be restricted. As of version 1.4, &RCL; will create the configuration - directory with a mode of 0700 (access by owner only).
As the - index data directory is by default a sub-directory of the - configuration directory, this should result in appropriate - protection. + directory with a mode of 0700 (access by owner only). As the + index data directory is by default a sub-directory of the + configuration directory, this should result in appropriate + protection. If you use another setup, you should think of the kind - of protection you need for your index, set the directory - and files access modes appropriately, and also maybe adjust - the umask used during index updates. + of protection you need for your index, set the directory + and file access modes appropriately, and possibly also adjust + the umask used during index updates. @@ -409,38 +407,38 @@ recoll Indexing configuration Variables set inside the - &RCL; configuration files - control which areas of the file system are indexed, and how - files are processed. These variables can be set either by - editing the text files or using the dialogs in the - recoll GUI. + &RCL; configuration files + control which areas of the file system are indexed, and how + files are processed. These variables can be set either by + editing the text files or using the dialogs in the + recoll GUI. You can also use multiple - indexes defined by separate configurations, typically to - separate personal and shared indexes, or to take advantage of - the organization of your data to improve search precision. + indexes defined by separate configurations, typically to + separate personal and shared indexes, or to take advantage of + the organization of your data to improve search precision. The first time you start recoll, you - will be asked whether or not you would like it to build the - index. If you want to adjust the configuration before indexing, - just click Cancel at this point, which will get - you into the configuration interface.
If you exit, - recoll will have created a ~/.recoll directory - containing empty configuration files, which you can edit by hand. + will be asked whether or not you would like it to build the + index. If you want to adjust the configuration before indexing, + just click Cancel at this point, which will get + you into the configuration interface. If you exit, + recoll will have created a ~/.recoll directory + containing empty configuration files, which you can edit by hand. - The configuration is documented inside the installation chapter of this - document, or in the recoll.conf(5) man page, but the most - current information will most likely be the comments inside the - sample file. The most immediately useful variable you may - interested in is probably topdirs, - which determines what subtrees get indexed. + The configuration is documented inside the + installation chapter + of this document, or in the recoll.conf(5) man page, but the most + current information will most likely be the comments inside the + sample file. The most immediately useful variable you may be + interested in is probably + topdirs, + which determines what subtrees get indexed. The applications needed to index file types other than - text, HTML or email (ie: pdf, postscript, ms-word...) are - described in the external - packages section + text, HTML or email (e.g., pdf, postscript, ms-word...) are + described in the external + packages section. The indexing configuration GUI @@ -510,7 +508,7 @@ recoll Periodic indexing - Starting indexing + Running indexing Indexing is performed either by the recollindex program, or by the @@ -525,22 +523,22 @@ recoll recollindex command: Starting the indexing thread is more convenient, - being just one click away. + being just one click away. The recollindex command has - more options, especially the one to reset the index - (-z). + more options, especially the one to reset the index + (-z).
The recollindex command will - not take down your GUI if it crashes (a rare occurrence, but who - knows...) + not take down your GUI if it crashes (a rare occurrence, + but who knows...) The recollindex command uses - setpriority/nice to lower its priority while - indexing - (it will also use ionice when this becomes - more widely available), the thread can't do it, else it would - also slow down the user/search interface. + setpriority/nice to lower its priority while + indexing + (it will also use ionice when this becomes + more widely available). The indexing thread cannot do this, as it would + also slow down the user/search interface. I'll let the reader decide where my heart belongs... @@ -567,7 +565,24 @@ recoll up to date will not need to be reindexed). recollindex has a number of other options - which are described in its man page. + which are described in its man page. + + Of special interest may be the -i and + -f options. -i allows + indexing an explicit list of files (given as command line + parameters or read on stdin). -f tells + recollindex to ignore file selection + parameters from the configuration. Together, these options make it + possible to build a custom file selection process for some area of the + file system, by adding the top directory to the + skippedPaths list and using an appropriate + file selection method to build the file list to be fed to + recollindex -if. + + recollindex -i will not descend into + directory parameters, but just add them as index entries. It is + up to the external file selection method to build the complete + file list.
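Here is a minimal POSIX shell sketch of such an external file selection method. It assumes three things that the text above does not state. First, the directory in the hypothetical `TOPDIR` variable has already been added to `skippedPaths`. Second, `recollindex` reads one file path per line on stdin. Third, the selection rule (PDF and text files modified in the last 7 days) is only an arbitrary illustration. `recollindex -i` does not descend into directories, so the list contains regular files only.

```shell
#!/bin/sh
# Sketch: custom file selection fed to "recollindex -if".
# Assumptions (not from the manual): TOPDIR is a hypothetical directory
# listed in skippedPaths, and recollindex reads one path per line on stdin.
TOPDIR="${TOPDIR:-$HOME/projects}"

# Print the files to index, one per line. Only regular files are listed,
# because "recollindex -i" does not descend into directory parameters.
# The name/age criteria are arbitrary examples. File names containing
# newlines would break this line-oriented list.
select_files() {
    find "$1" -type f \( -name '*.pdf' -o -name '*.txt' \) -mtime -7 -print
}

# Run the indexer only if it is installed and the directory exists.
if command -v recollindex >/dev/null 2>&1 && [ -d "$TOPDIR" ]; then
    # -i: index the listed files; -f: ignore the file selection
    # parameters from the configuration (presumably including skippedPaths).
    select_files "$TOPDIR" | recollindex -if
fi
```

A script like this could be scheduled from cron alongside the regular periodic indexing run.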