diff --git a/src/doc/user/usermanual.sgml b/src/doc/user/usermanual.sgml index 17ef2ef5..36b0548a 100644 --- a/src/doc/user/usermanual.sgml +++ b/src/doc/user/usermanual.sgml @@ -140,20 +140,20 @@ currently makes no attempt at automatic language recognition. &RCL; has many parameters which define exactly what to - index, and how to classify and decode the source - documents. These are kept in configuration files. A - default configuration is copied into a standard location - (usually something like - /usr/[local/]share/recoll/examples) - during installation. The default parameters from this file may - be overridden by values that you set inside your personal - configuration, found by default in the - .recoll sub-directory of your home - directory. The default configuration will index your home - directory with default parameters and should be sufficient for - giving &RCL; a try, but you may want to adjust it - later. + index, and how to classify and decode the source documents. These + are kept in configuration + files. A default configuration is copied into a standard + location (usually something like + /usr/[local/]share/recoll/examples) during + installation. The default parameters from this file may be + overridden by values that you set inside your personal + configuration, found by default in the .recoll + sub-directory of your home directory. The default configuration + will index your home directory with default parameters and should + be sufficient for giving &RCL; a try, but you may want to adjust it + later, which can be done either by editing the text files or by + using configuration menus in the recoll + GUI Indexing is started automatically the first time you execute the @@ -184,7 +184,7 @@ Indexing is the process by which the set of documents is analyzed and the data entered into the database. &RCL; indexing is normally incremental: documents will only be processed if - they have been modified. 
On the first execution, of course, all + they have been modified. On the first execution, all documents will need processing. A full index build can be forced later by specifying an option to the indexing command (recollindex -z). @@ -238,7 +238,7 @@ a folder file archived inside a zip file... &RCL; indexing processes plain text, HTML, openoffice - and e-mail files internally (a few more actually). + and e-mail files, and a few others internally. Other file types (ie: postscript, pdf, ms-word, rtf ...) need external applications for preprocessing. The list is in the @@ -342,40 +342,23 @@ recoll Xapian index formats - If your first installation of &RCL; was 1.9.0 or more - recent, you can skip this section. - - &XAP; has had two possible index formats for quite some - time. The "old" one named Quartz, and the - new one named Flint. &XAP; 0.9 used - Quartz by default, but could use - Flint if a specific environment variable - (XAPIAN_PREFER_FLINT) was set. &XAP; 1.0 - still supports Quartz but will use - Flint by default for new index - creations. - - The number of disk accesses performed during indexing - has been much optimized in the new Flint - engine and you may see indexing times improved by 50% in some - cases (compared to Quartz), typically for - big indexes where disk accesses dominate the indexing - time. There is also a more modest improvement of index - size. + &XAP; versions usually support several formats for index + storage. A given major &XAP; version will have a current format, + used to create new indexes, and will also support the format from + the previous major version. &XAP; will not convert automatically an existing index - from the Quartz to the - Flint format. If you have an older index - and want to take advantage of the new format (which can be - done without setting the environment variable as of &RCL; - 1.8.2 and &XAP; 1.0.0), you will have to explicitly delete - the old index, then run a normal indexing process. 
+ from the older format to the newer one. If you want to upgrade to + the new format, or if a very old index needs to be converted + because its format is not supported any more, you will have to + explicitly delete the old index, then run a normal indexing + process. Unfortunately, using the -z option to recollindex is not sufficient to change the - format, you have to delete all files inside the index + format, you will have to delete all files inside the index directory (typically ~/.recoll/xapiandb) - before starting indexing. + before starting the indexing. @@ -387,7 +370,7 @@ recoll complete reconstruction. If confidential data is indexed, access to the database directory should be restricted. - As of version 1.4, &RCL; will create the configuration + &RCL; (since version 1.4) will create the configuration directory with a mode of 0700 (access by owner only). As the index data directory is by default a sub-directory of the configuration directory, this should result in appropriate @@ -511,16 +494,16 @@ recoll Running indexing Indexing is performed either by the - recollindex program, or by the - indexing thread inside the recoll - program (use the File menu). Both programs - will use the RECOLL_CONFDIR - variable or accept a -c - confdir option to specify a non-default - configuration directory. + recollindex program, or by the indexing thread + inside the recoll program (start it from the + File menu). Both programs will use the + RECOLL_CONFDIR variable or accept a + -c confdir option + to specify a non-default configuration directory. - Reasons to use either the indexing thread or the - recollindex command: + There are reasons to use either the indexing thread or the + recollindex command, but it is also a matter of + personal preferences: Starting the indexing thread is more convenient, being just one click away. @@ -534,14 +517,15 @@ recoll but who knows...) 
The recollindex command uses - setpriority/nice to lower its priority while - indexing - (it will also use ionice when this becomes - more widely available), the thread can't do it, else it would - also slow down the user/search interface. + setpriority/nice to lower its priority + while indexing. When available (and for &RCL; version + 1.16.2 and newer), it also uses the + ionice command to lower its IO + priority. The thread can't do it, else it would also slow + down the user/search interface. - I'll let the reader decide where my heart belongs... + If the recoll program finds no index when it starts, it will automatically start indexing (except @@ -631,7 +615,7 @@ recoll with the --with[out]-fam or --with[out]-inotify options. The default is currently to include inotify monitoring on systems that support - it. + it, and, as of recoll 1.17, gamin support on FreeBSD. The rclmon.sh script can be used to easily start and stop the daemon. It can be found in the @@ -1311,19 +1295,13 @@ fvwm Sorting search results and collapsing duplicates The documents in a result list are normally sorted in - order of relevance. It is possible to specify different sort - parameters by using the Sort parameters - dialog (located in the Tools menu). - - The tool sorts a specified number of the most - relevant documents in the result list, according to specified - criteria. The currently available criteria are - date and mime - type. - - The sort parameters stay in effect until they are - explicitly reset, or the program exits. An activated sort is - indicated in the result list header. + order of relevance. It is possible to specify a different sort + order, either by using the vertical arrows in the GUI toolbox to + sort by date, or switching to the result table display and clicking + on any header. 
The sort order chosen inside the result table + remains active if you switch back to the result list, until you + click the vertical arrows so that both are unchecked (you are + then back to sorting by relevance). Sort parameters are remembered between program invocations, but result sorting is normally always inactive @@ -1427,15 +1405,34 @@ fvwm AutoPhrases This option can be set in the preferences dialog. If it is - set, a phrase will be automatically built and added to simple - searches when looking for Any terms. This - will not change radically the results, but will give a relevance - boost to the results where the search terms appear as a - phrase. Ie: searching for virtual reality - will still find all documents where either - virtual or reality or - both appear, but those which contain virtual - reality should appear sooner in the list. + set, a phrase will be automatically built and added to simple + searches when looking for Any terms. This + will not change radically the results, but will give a relevance + boost to the results where the search terms appear as a + phrase. Ie: searching for virtual reality + will still find all documents where either + virtual or reality or + both appear, but those which contain virtual + reality should appear sooner in the list. + + Phrase searches can strongly slow down a query if most of the + terms in the phrase are common. This is why the + autophrase option is off by default for &RCL; + versions before 1.17. As of version 1.17, + autophrase is on by default, but very common + terms will be removed from the constructed phrase. The removal + threshold can be adjusted from the search preferences. + + Phrases and abbreviations As of + &RCL; version 1.17, dotted abbreviations like + I.B.M. are also automatically indexed as a word + without the dots: IBM. 
Searching for the word + inside a phrase (ie: "the IBM company") will only + match the dotted abbreviation if you increase the phrase slack (using the + advanced search panel control, or the o query + language modifier). Literal occurrences of the word will be matched + normally. + @@ -3406,6 +3403,13 @@ skippedNames = #* bin CVS Cache cache* caughtspam tmp .thumbnails .svn \ skippedPaths = ~/somedir/∗.txt + The values in the *skippedPaths + variables are currently matched with + fnmatch(3), with the FNM_PATHNAME and + FNM_LEADING_DIR flags. This means that '/' characters must + be matched explicitly, which is probably + unfortunate. +