From b5013b41e139378bbb9fba7e02a25e7d9fc820c3 Mon Sep 17 00:00:00 2001 From: Jean-Francois Dockes Date: Tue, 5 Oct 2021 18:36:04 +0200 Subject: [PATCH] doc --- src/doc/user/recoll.conf.xml | 57 +++++++++++++++-------- src/doc/user/usermanual.html | 89 ++++++++++++++++++++++++++++-------- 2 files changed, 107 insertions(+), 39 deletions(-) diff --git a/src/doc/user/recoll.conf.xml b/src/doc/user/recoll.conf.xml index 4cd4d43d..aa7d4a79 100644 --- a/src/doc/user/recoll.conf.xml +++ b/src/doc/user/recoll.conf.xml @@ -17,19 +17,17 @@ subset of the whole indexed area. The elements must be included in the tree defined by the 'topdirs' members. skippedNames -Files and directories which should be ignored. -White space separated list of wildcard patterns (simple ones, not paths, -must contain no / ), which will be tested against file and directory -names. The list in the default configuration does not exclude hidden -directories (names beginning with a dot), which means that it may index -quite a few things that you do not want. On the other hand, email user -agents like Thunderbird usually store messages in hidden directories, and -you probably want this indexed. One possible solution is to have ".*" in -"skippedNames", and add things like "~/.thunderbird" "~/.evolution" to -"topdirs". Not even the file names are indexed for patterns in this -list, see the "noContentSuffixes" variable for an alternative approach -which indexes the file names. Can be redefined for any -subtree. +Files and directories which should be ignored. White space separated list of wildcard patterns (simple ones, not paths, must contain no +'/' characters), which will be tested against file and directory names. Have a look at the default +configuration for the initial value, some entries may not suit your situation. The easiest way to +see it is through the GUI Index configuration "local parameters" panel. 
The list in the default +configuration does not exclude hidden directories (names beginning with a dot), which means that +it may index quite a few things that you do not want. On the other hand, email user agents like +Thunderbird usually store messages in hidden directories, and you probably want this indexed. One +possible solution is to have ".*" in "skippedNames", and add things like "~/.thunderbird" +"~/.evolution" to "topdirs". Not even the file names are indexed for patterns in this list, see +the "noContentSuffixes" variable for an alternative approach which indexes the file names. Can be +redefined for any subtree. skippedNames- List of name endings to remove from the default skippedNames @@ -313,7 +311,7 @@ Swedish: unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ffff fifi flfl åå Åå . German: unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ffff fifi flfl -In French, you probably want to decompose oe and ae and nobody would type +. French: you probably want to decompose oe and ae and nobody would type a German ß unac_except_trans = ßss œoe Œoe æae Æae ffff fifi flfl . The default for all until someone protests follows. These decompositions @@ -442,6 +440,13 @@ possible to change it. The path to browser downloads directory. This is where the new browser add-on extension has to create the files. They are then moved by a script to webqueuedir. + +webcachekeepinterval +Page recycle interval. By default, only one instance of a URL is kept in the cache. This +can be changed by setting this to a value determining at what frequency +we keep multiple instances ('day', 'week', 'month', +'year'). Note that increasing the interval will not erase existing +entries. aspellDicDir Aspell dictionary storage directory location. The @@ -486,10 +491,11 @@ is mainly to avoid infinite loops in postscript files filtermaxmbytes Maximum virtual memory space for filter processes -(setrlimit(RLIMIT_AS)), in megabytes. 
Note that this -includes any mapped libs (there is no reliable Linux way to limit the -data space only), so we need to be a bit generous here. Anything over -2000 will be ignored on 32 bits machines. +(setrlimit(RLIMIT_AS)), in megabytes. Note that this includes any mapped libs (there is no reliable +Linux way to limit the data space only), so we need to be a bit generous +here. Anything over 2000 will be ignored on 32 bits machines. The +previous default value of 2000 would prevent java pdftk from working when +executed from Python rclpdf.py. thrQSizes Stage input queues configuration. There are three @@ -530,6 +536,12 @@ console. idxlogfilename Override logfilename for the indexer. + +helperlogfilename +Destination file for external helpers' standard error output. The external program error output is left alone by default, +e.g. going to the terminal when the recoll[index] program is executed +from the command line. Use /dev/null or a file inside a non-existent +directory to completely suppress the output. daemloglevel Override loglevel for the indexer in real time @@ -583,7 +595,9 @@ be looked up in the filters dirs, then in the path. Use an absolute path to do otherwise. recollhelperpath -Additional places to search for helper executables. This is only used on Windows for now. +Additional places to search for helper executables. This is used, e.g., on Windows by the Python code, and on Mac OS by the bundled recoll.app +(because I could find no reliable way to tell launchd to set the PATH). The example below is for +Windows. Use ':' as entry separator for Mac and Unix-like systems; ';' is for Windows only. idxabsmlen Length of abstracts we store while indexing. Recoll stores an abstract for each indexed file. @@ -609,6 +623,11 @@ may be too low if you have custom fields. the beginning of documents. This is not recommended except if you are sure that the interesting keywords are at the top and have severe disk space issues. 
+ +idxsynonyms +Name of the index-time synonyms file. This is used for indexing multiword synonyms as single terms, +which in turn is only useful if you want to perform proximity searches +with such terms. aspellLanguage Language definitions to use when creating the aspell diff --git a/src/doc/user/usermanual.html b/src/doc/user/usermanual.html index f298584d..582d2b2a 100644 --- a/src/doc/user/usermanual.html +++ b/src/doc/user/usermanual.html @@ -8927,22 +8927,26 @@ hasextract = False

Files and directories which should be ignored. White space separated list of wildcard patterns - (simple ones, not paths, must contain no / ), - which will be tested against file and directory - names. The list in the default configuration does - not exclude hidden directories (names beginning - with a dot), which means that it may index quite - a few things that you do not want. On the other - hand, email user agents like Thunderbird usually - store messages in hidden directories, and you - probably want this indexed. One possible solution - is to have ".*" in "skippedNames", and add things - like "~/.thunderbird" "~/.evolution" to - "topdirs". Not even the file names are indexed - for patterns in this list, see the - "noContentSuffixes" variable for an alternative - approach which indexes the file names. Can be - redefined for any subtree.

+ (simple ones, not paths, must contain no '/' + characters), which will be tested against file + and directory names. Have a look at the default + configuration for the initial value, some entries + may not suit your situation. The easiest way to + see it is through the GUI Index configuration + "local parameters" panel. The list in the default + configuration does not exclude hidden directories + (names beginning with a dot), which means that it + may index quite a few things that you do not + want. On the other hand, email user agents like + Thunderbird usually store messages in hidden + directories, and you probably want this indexed. + One possible solution is to have ".*" in + "skippedNames", and add things like + "~/.thunderbird" "~/.evolution" to "topdirs". Not + even the file names are indexed for patterns in + this list, see the "noContentSuffixes" variable + for an alternative approach which indexes the + file names. Can be redefined for any subtree.

+ webcachekeepinterval
+
+

Page recycle interval. By default, only one + instance of a URL is kept in the cache. This can + be changed by setting this to a value determining + at what frequency we keep multiple instances + ('day', 'week', 'month', 'year'). Note that + increasing the interval will not erase existing + entries.

+
+
aspellDicDir
@@ -9734,7 +9753,9 @@ hasextract = False no reliable Linux way to limit the data space only), so we need to be a bit generous here. Anything over 2000 will be ignored on 32 bits - machines.

+ machines. The previous default value of 2000 + would prevent java pdftk from working when executed + from Python rclpdf.py.

Override logfilename for the indexer.

helperlogfilename
+
+

Destination file for external helpers' standard + error output. The external program error output + is left alone by default, e.g. going to the + terminal when the recoll[index] program is + executed from the command line. Use /dev/null or + a file inside a non-existent directory to + completely suppress the output.

+
+
daemloglevel
@@ -9915,8 +9949,13 @@ hasextract = False "varname">recollhelperpath

Additional places to search for helper - executables. This is only used on Windows for - now.

+ executables. This is used, e.g., on Windows by + the Python code, and on Mac OS by the bundled + recoll.app (because I could find no reliable way + to tell launchd to set the PATH). The example + below is for Windows. Use ':' as entry separator + for Mac and Unix-like systems; ';' is for Windows + only.

idxsynonyms
+
+

Name of the index-time synonyms file. This is + used for indexing multiword synonyms as single + terms, which in turn is only useful if you want + to perform proximity searches with such + terms.

+
+
aspellLanguage