comments and doc

Jean-Francois Dockes 2021-11-11 18:45:37 +01:00
parent a1e98c1bdc
commit 9e2e73a995
4 changed files with 358 additions and 225 deletions

@ -49,7 +49,7 @@ usermanual.pdf: usermanual.xml recoll.conf.xml
dblatex --xslt-opts="--xinclude" -tpdf $< dblatex --xslt-opts="--xinclude" -tpdf $<
UTILBUILDS=/home/dockes/tmp/builds/medocutils/ UTILBUILDS=/home/dockes/tmp/builds/medocutils/
recoll-conf-xml: recoll.conf.xml: ../../sampleconf/recoll.conf
$(UTILBUILDS)/confxml --docbook \ $(UTILBUILDS)/confxml --docbook \
--idprefix=RCL.INSTALL.CONFIG.RECOLLCONF \ --idprefix=RCL.INSTALL.CONFIG.RECOLLCONF \
../../sampleconf/recoll.conf > recoll.conf.xml ../../sampleconf/recoll.conf > recoll.conf.xml
@ -65,7 +65,7 @@ recoll-conf-xml:
# script. # script.
# Also could not get readthedocs to generate the left pane TOC? could # Also could not get readthedocs to generate the left pane TOC? could
# probably be fixed... # probably be fixed...
#usermanual-rst: recoll-conf-xml #usermanual-rst: recoll.conf.xml
# tail -n +2 recoll.conf.xml > rcl-conf-tail.xml # tail -n +2 recoll.conf.xml > rcl-conf-tail.xml
# sed -e '/xi:include/r rcl-conf-tail.xml' \ # sed -e '/xi:include/r rcl-conf-tail.xml' \
# < usermanual.xml > full-man.xml # < usermanual.xml > full-man.xml

@ -8,26 +8,34 @@
<listitem><para>Space-separated list of files or <listitem><para>Space-separated list of files or
directories to recursively index. Default to ~ (indexes directories to recursively index. Default to ~ (indexes
$HOME). You can use symbolic links in the list, they will be followed, $HOME). You can use symbolic links in the list, they will be followed,
independently of the value of the followLinks variable.</para></listitem></varlistentry> independently of the value of the followLinks variable.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MONITORDIRS"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MONITORDIRS">
<term><varname>monitordirs</varname></term> <term><varname>monitordirs</varname></term>
<listitem><para>Space-separated list of files or directories to monitor for <listitem><para>Space-separated list of files or directories to monitor for
updates. When running the real-time indexer, this allows monitoring only a updates. When running the real-time indexer, this allows monitoring only a
subset of the whole indexed area. The elements must be included in the subset of the whole indexed area. The elements must be included in the
tree defined by the 'topdirs' members.</para></listitem></varlistentry> tree defined by the 'topdirs' members.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDNAMES"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDNAMES">
<term><varname>skippedNames</varname></term> <term><varname>skippedNames</varname></term>
<listitem><para>Files and directories which should be ignored. White space separated list of wildcard patterns (simple ones, not paths, must contain no <listitem><para>Files and directories which should be ignored. White space separated list of wildcard patterns (simple ones, not paths, must contain no
'/' characters), which will be tested against file and directory names. Have a look at the default '/' characters), which will be tested against file and directory names.
configuration for the initial value, some entries may not suit your situation. The easiest way to </para><para>
see it is through the GUI Index configuration "local parameters" panel. The list in the default Have a look at the default configuration for the initial value, some entries may not suit your
configuration does not exclude hidden directories (names beginning with a dot), which means that situation. The easiest way to see it is through the GUI Index configuration "local parameters"
it may index quite a few things that you do not want. On the other hand, email user agents like panel.
Thunderbird usually store messages in hidden directories, and you probably want this indexed. One </para><para>
possible solution is to have ".*" in "skippedNames", and add things like "~/.thunderbird" The list in the default configuration does not exclude hidden directories (names beginning with a
"~/.evolution" to "topdirs". Not even the file names are indexed for patterns in this list, see dot), which means that it may index quite a few things that you do not want. On the other hand,
the "noContentSuffixes" variable for an alternative approach which indexes the file names. Can be email user agents like Thunderbird usually store messages in hidden directories, and you probably
redefined for any subtree.</para></listitem></varlistentry> want this indexed. One possible solution is to have ".*" in "skippedNames", and add things like
"~/.thunderbird" "~/.evolution" to "topdirs".
</para><para>
Not even the file names are indexed for patterns in this list, see the "noContentSuffixes"
variable for an alternative approach which indexes the file names. Can be redefined for any
subtree.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDNAMES-"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDNAMES-">
<term><varname>skippedNames-</varname></term> <term><varname>skippedNames-</varname></term>
<listitem><para>List of name endings to remove from the default skippedNames <listitem><para>List of name endings to remove from the default skippedNames
@ -40,7 +48,8 @@ list. </para></listitem></varlistentry>
<term><varname>onlyNames</varname></term> <term><varname>onlyNames</varname></term>
<listitem><para>Regular file name filter patterns. If this is set, only the file names not in skippedNames and <listitem><para>Regular file name filter patterns. If this is set, only the file names not in skippedNames and
matching one of the patterns will be considered for indexing. Can be matching one of the patterns will be considered for indexing. Can be
redefined per subtree. Does not apply to directories.</para></listitem></varlistentry> redefined per subtree. Does not apply to directories.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOCONTENTSUFFIXES"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOCONTENTSUFFIXES">
<term><varname>noContentSuffixes</varname></term> <term><varname>noContentSuffixes</varname></term>
<listitem><para>List of name endings (not necessarily dot-separated suffixes) for <listitem><para>List of name endings (not necessarily dot-separated suffixes) for
@ -51,7 +60,8 @@ which will go away in a future release (the move from mimemap to
recoll.conf allows editing the list through the GUI). This is different recoll.conf allows editing the list through the GUI). This is different
from skippedNames because these are name ending matches only (not from skippedNames because these are name ending matches only (not
wildcard patterns), and the file name itself gets indexed normally. This wildcard patterns), and the file name itself gets indexed normally. This
can be redefined for subdirectories.</para></listitem></varlistentry> can be redefined for subdirectories.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOCONTENTSUFFIXES-"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOCONTENTSUFFIXES-">
<term><varname>noContentSuffixes-</varname></term> <term><varname>noContentSuffixes-</varname></term>
<listitem><para>List of name endings to remove from the default noContentSuffixes <listitem><para>List of name endings to remove from the default noContentSuffixes
@ -62,19 +72,26 @@ list. </para></listitem></varlistentry>
list. </para></listitem></varlistentry> list. </para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDPATHS"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDPATHS">
<term><varname>skippedPaths</varname></term> <term><varname>skippedPaths</varname></term>
<listitem><para>Absolute paths we should not go into. Space-separated list of wildcard expressions for absolute <listitem><para>Absolute paths we should not go into. Space-separated list of wildcard expressions for absolute filesystem paths (for files or
filesystem paths. Must be defined at the top level of the configuration directories). The variable must be defined at the top level of the configuration file, not in a
file, not in a subsection. Can contain files and directories. The database and subsection.
configuration directories will automatically be added. The expressions </para><para>
are matched using 'fnmatch(3)' with the FNM_PATHNAME flag set by Any value in the list must be textually consistent with the values in topdirs, no attempts are
default. This means that '/' characters must be matched explicitly. You made to resolve symbolic links. In practice, if, as is frequently the case, /home is a link to
can set 'skippedPathsFnmPathname' to 0 to disable the use of FNM_PATHNAME /usr/home, your default topdirs will have a single entry '~' which will be translated to
(meaning that '/*/dir3' will match '/dir1/dir2/dir3'). The default value '/home/yourlogin'. In this case, any skippedPaths entry should start with '/home/yourlogin' *not*
contains the usual mount point for removable media to remind you that it with '/usr/home/yourlogin'.
is a bad idea to have Recoll work on these (esp. with the monitor: media </para><para>
gets indexed on mount, all data gets erased on unmount). Explicitly The index and configuration directories will automatically be added to the list.
adding '/media/xxx' to the 'topdirs' variable will override </para><para>
this.</para></listitem></varlistentry> The expressions are matched using 'fnmatch(3)' with the FNM_PATHNAME flag set by default. This
means that '/' characters must be matched explicitly. You can set 'skippedPathsFnmPathname' to 0
to disable the use of FNM_PATHNAME (meaning that '/*/dir3' will match '/dir1/dir2/dir3').
</para><para>
The default value contains the usual mount point for removable media to remind you that it is in
most cases a bad idea to have Recoll work on these. Explicitly adding '/media/xxx' to the 'topdirs'
variable will override this.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDPATHSFNMPATHNAME"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDPATHSFNMPATHNAME">
<term><varname>skippedPathsFnmPathname</varname></term> <term><varname>skippedPathsFnmPathname</varname></term>
<listitem><para>Set to 0 to <listitem><para>Set to 0 to
@ -83,13 +100,15 @@ paths. </para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOWALKFN"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOWALKFN">
<term><varname>nowalkfn</varname></term> <term><varname>nowalkfn</varname></term>
<listitem><para>File name which will cause its parent directory to be skipped. Any directory containing a file with this name will be skipped as <listitem><para>File name which will cause its parent directory to be skipped. Any directory containing a file with this name will be skipped as
if it was part of the skippedPaths list. Ex: .recoll-noindex</para></listitem></varlistentry> if it was part of the skippedPaths list. Ex: .recoll-noindex
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.DAEMSKIPPEDPATHS"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.DAEMSKIPPEDPATHS">
<term><varname>daemSkippedPaths</varname></term> <term><varname>daemSkippedPaths</varname></term>
<listitem><para>skippedPaths equivalent specific to <listitem><para>skippedPaths equivalent specific to
real time indexing. This enables having parts of the tree real time indexing. This enables having parts of the tree
which are initially indexed but not monitored. If daemSkippedPaths is which are initially indexed but not monitored. If daemSkippedPaths is
not set, the daemon uses skippedPaths.</para></listitem></varlistentry> not set, the daemon uses skippedPaths.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.ZIPUSESKIPPEDNAMES"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.ZIPUSESKIPPEDNAMES">
<term><varname>zipUseSkippedNames</varname></term> <term><varname>zipUseSkippedNames</varname></term>
<listitem><para>Use skippedNames inside Zip archives. Fetched <listitem><para>Use skippedNames inside Zip archives. Fetched
@ -115,7 +134,8 @@ multiple indexing of linked files. No effort is made to avoid duplication
when this option is set to true. This option can be set individually for when this option is set to true. This option can be set individually for
each of the 'topdirs' members by using sections. It can not be changed each of the 'topdirs' members by using sections. It can not be changed
below the 'topdirs' level. Links in the 'topdirs' list itself are always below the 'topdirs' level. Links in the 'topdirs' list itself are always
followed.</para></listitem></varlistentry> followed.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.INDEXEDMIMETYPES"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.INDEXEDMIMETYPES">
<term><varname>indexedmimetypes</varname></term> <term><varname>indexedmimetypes</varname></term>
<listitem><para>Restrictive list of <listitem><para>Restrictive list of
@ -124,14 +144,16 @@ supported types are indexed). If it is set, only the types from the list
will have their contents indexed. The names will be indexed anyway if will have their contents indexed. The names will be indexed anyway if
indexallfilenames is set (default). MIME type names should be taken from indexallfilenames is set (default). MIME type names should be taken from
the mimemap file (the values may be different from xdg-mime or file -i the mimemap file (the values may be different from xdg-mime or file -i
output in some cases). Can be redefined for subtrees.</para></listitem></varlistentry> output in some cases). Can be redefined for subtrees.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.EXCLUDEDMIMETYPES"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.EXCLUDEDMIMETYPES">
<term><varname>excludedmimetypes</varname></term> <term><varname>excludedmimetypes</varname></term>
<listitem><para>List of excluded MIME <listitem><para>List of excluded MIME
types. Lets you exclude some types from indexing. MIME type types. Lets you exclude some types from indexing. MIME type
names should be taken from the mimemap file (the values may be different names should be taken from the mimemap file (the values may be different
from xdg-mime or file -i output in some cases). Can be redefined for from xdg-mime or file -i output in some cases). Can be redefined for
subtrees.</para></listitem></varlistentry> subtrees.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOMD5TYPES"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOMD5TYPES">
<term><varname>nomd5types</varname></term> <term><varname>nomd5types</varname></term>
<listitem><para>Don't compute md5 for these types. md5 checksums are used only for deduplicating results, and can be <listitem><para>Don't compute md5 for these types. md5 checksums are used only for deduplicating results, and can be
@ -140,32 +162,37 @@ lets you turn off md5 computation for selected types. It is global (no
redefinition for subtrees). At the moment, it only has an effect for redefinition for subtrees). At the moment, it only has an effect for
external handlers (exec and execm). The file types can be specified by external handlers (exec and execm). The file types can be specified by
listing either MIME types (e.g. audio/mpeg) or handler names listing either MIME types (e.g. audio/mpeg) or handler names
(e.g. rclaudio).</para></listitem></varlistentry> (e.g. rclaudio).
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.COMPRESSEDFILEMAXKBS"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.COMPRESSEDFILEMAXKBS">
<term><varname>compressedfilemaxkbs</varname></term> <term><varname>compressedfilemaxkbs</varname></term>
<listitem><para>Size limit for compressed <listitem><para>Size limit for compressed
files. We need to decompress these in a files. We need to decompress these in a
temporary directory for identification, which can be wasteful in some temporary directory for identification, which can be wasteful in some
cases. Limit the waste. Negative means no limit. 0 results in no cases. Limit the waste. Negative means no limit. 0 results in no
processing of any compressed file. Default 50 MB.</para></listitem></varlistentry> processing of any compressed file. Default 50 MB.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.TEXTFILEMAXMBS"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.TEXTFILEMAXMBS">
<term><varname>textfilemaxmbs</varname></term> <term><varname>textfilemaxmbs</varname></term>
<listitem><para>Size limit for text <listitem><para>Size limit for text
files. Mostly for skipping monster files. Mostly for skipping monster
logs. Default 20 MB.</para></listitem></varlistentry> logs. Default 20 MB.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.INDEXALLFILENAMES"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.INDEXALLFILENAMES">
<term><varname>indexallfilenames</varname></term> <term><varname>indexallfilenames</varname></term>
<listitem><para>Index the file names of <listitem><para>Index the file names of
unprocessed files. Index the names of files the contents of unprocessed files. Index the names of files the contents of
which we don't index because of an excluded or unsupported MIME which we don't index because of an excluded or unsupported MIME
type.</para></listitem></varlistentry> type.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.USESYSTEMFILECOMMAND"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.USESYSTEMFILECOMMAND">
<term><varname>usesystemfilecommand</varname></term> <term><varname>usesystemfilecommand</varname></term>
<listitem><para>Use a system command <listitem><para>Use a system command
for file MIME type guessing as a final step in file type for file MIME type guessing as a final step in file type
identification. This is generally useful, but will usually identification. This is generally useful, but will usually
cause the indexing of many bogus 'text' files. See 'systemfilecommand' cause the indexing of many bogus 'text' files. See 'systemfilecommand'
for the command used.</para></listitem></varlistentry> for the command used.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.SYSTEMFILECOMMAND"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.SYSTEMFILECOMMAND">
<term><varname>systemfilecommand</varname></term> <term><varname>systemfilecommand</varname></term>
<listitem><para>Command used to guess <listitem><para>Command used to guess
@ -173,12 +200,14 @@ MIME types if the internal methods fails This should be a
"file -i" workalike. The file path will be added as a last parameter to "file -i" workalike. The file path will be added as a last parameter to
the command line. "xdg-mime" works better than the traditional "file" the command line. "xdg-mime" works better than the traditional "file"
command, and is now the configured default (with a hard-coded fallback to command, and is now the configured default (with a hard-coded fallback to
"file")</para></listitem></varlistentry> "file")
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.PROCESSWEBQUEUE"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.PROCESSWEBQUEUE">
<term><varname>processwebqueue</varname></term> <term><varname>processwebqueue</varname></term>
<listitem><para>Decide if we process the <listitem><para>Decide if we process the
Web queue. The queue is a directory where the Recoll Web Web queue. The queue is a directory where the Recoll Web
browser plugins create the copies of visited pages.</para></listitem></varlistentry> browser plugins create the copies of visited pages.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.TEXTFILEPAGEKBS"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.TEXTFILEPAGEKBS">
<term><varname>textfilepagekbs</varname></term> <term><varname>textfilepagekbs</varname></term>
<listitem><para>Page size for text <listitem><para>Page size for text
@ -187,12 +216,14 @@ into documents of approximately this size. Will reduce memory usage at
index time and help with loading data in the preview window at query index time and help with loading data in the preview window at query
time. Particularly useful with very big files, such as application or time. Particularly useful with very big files, such as application or
system logs. Also see textfilemaxmbs and system logs. Also see textfilemaxmbs and
compressedfilemaxkbs.</para></listitem></varlistentry> compressedfilemaxkbs.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MEMBERMAXKBS"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MEMBERMAXKBS">
<term><varname>membermaxkbs</varname></term> <term><varname>membermaxkbs</varname></term>
<listitem><para>Size limit for archive <listitem><para>Size limit for archive
members. This is passed to the filters in the environment members. This is passed to the filters in the environment
as RECOLL_FILTER_MAXMEMBERKB.</para></listitem></varlistentry> as RECOLL_FILTER_MAXMEMBERKB.
</para></listitem></varlistentry>
</variablelist></sect3> </variablelist></sect3>
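The FNM_PATHNAME behaviour described for skippedPaths above can be illustrated with a small Python sketch. This is not Recoll's code: Recoll relies on the C library fnmatch(3), whereas the hypothetical helper `match_skipped_path` below only emulates the '*' and '?' wildcards (bracket expressions are not handled). It is meant to show why '/*/dir3' matches '/dir1/dir2/dir3' only when skippedPathsFnmPathname is set to 0.

```python
import re


def match_skipped_path(path: str, pattern: str, fnm_pathname: bool = True) -> bool:
    """Emulate fnmatch(3) for '*' and '?' only (illustrative helper, not Recoll code).

    With fnm_pathname=True (the FNM_PATHNAME flag), wildcards never match '/',
    so every '/' in the path must appear explicitly in the pattern.
    """
    # Character class that a wildcard is allowed to consume.
    wildcard_char = "[^/]" if fnm_pathname else "."
    regex_parts = []
    for ch in pattern:
        if ch == "*":
            regex_parts.append(wildcard_char + "*")
        elif ch == "?":
            regex_parts.append(wildcard_char)
        else:
            # Everything else is matched literally.
            regex_parts.append(re.escape(ch))
    regex = "".join(regex_parts)
    return re.fullmatch(regex, path, flags=re.DOTALL) is not None


if __name__ == "__main__":
    # Default behaviour (FNM_PATHNAME set): '*' cannot span 'dir1/dir2'.
    print(match_skipped_path("/dir1/dir2/dir3", "/*/dir3"))
    # Emulates skippedPathsFnmPathname = 0: '*' may cross '/' characters.
    print(match_skipped_path("/dir1/dir2/dir3", "/*/dir3", fnm_pathname=False))
```

With the flag set, a pattern such as '/media/*' matches '/media/usb0' as a string but not '/media/usb0/photos'. The sketch only covers the string comparison, not how the indexer walks the tree.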
<sect3 id="RCL.INSTALL.CONFIG.RECOLLCONF.TERMS"> <sect3 id="RCL.INSTALL.CONFIG.RECOLLCONF.TERMS">
<title>Parameters affecting how we generate terms and organize the index </title><variablelist> <title>Parameters affecting how we generate terms and organize the index </title><variablelist>
@ -204,28 +235,34 @@ searches sensitive to case and diacritics can be performed, but the index
will be bigger, and some marginal weirdness may sometimes occur. The will be bigger, and some marginal weirdness may sometimes occur. The
default is a stripped index. When using multiple indexes for a search, default is a stripped index. When using multiple indexes for a search,
this parameter must be defined identically for all. Changing the value this parameter must be defined identically for all. Changing the value
implies an index reset.</para></listitem></varlistentry> implies an index reset.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.INDEXSTOREDOCTEXT"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.INDEXSTOREDOCTEXT">
<term><varname>indexStoreDocText</varname></term> <term><varname>indexStoreDocText</varname></term>
<listitem><para>Decide if we store the <listitem><para>Decide if we store the
documents' text content in the index. Storing the text documents' text content in the index. Storing the text
allows extracting snippets from it at query time, instead of building allows extracting snippets from it at query time, instead of building
them from index position data. them from index position data.
</para><para>
Newer Xapian index formats have rendered our use of positions list Newer Xapian index formats have rendered our use of positions list
unacceptably slow in some cases. The last Xapian index format with good unacceptably slow in some cases. The last Xapian index format with good
performance for the old method is Chert, which is default for 1.2, still performance for the old method is Chert, which is default for 1.2, still
supported but not default in 1.4 and will be dropped in 1.6. supported but not default in 1.4 and will be dropped in 1.6.
</para><para>
The stored document text is translated from its original format to UTF-8 The stored document text is translated from its original format to UTF-8
plain text, but not stripped of upper-case, diacritics, or punctuation plain text, but not stripped of upper-case, diacritics, or punctuation
signs. Storing it increases the index size by 10-20% typically, but also signs. Storing it increases the index size by 10-20% typically, but also
allows for nicer snippets, so it may be worth enabling it even if not allows for nicer snippets, so it may be worth enabling it even if not
strictly needed for performance if you can afford the space. strictly needed for performance if you can afford the space.
</para><para>
The variable only has an effect when creating an index, meaning that the The variable only has an effect when creating an index, meaning that the
xapiandb directory must not exist yet. Its exact effect depends on the xapiandb directory must not exist yet. Its exact effect depends on the
Xapian version. Xapian version.
</para><para>
For Xapian 1.4, if the variable is set to 0, the Chert format will be For Xapian 1.4, if the variable is set to 0, the Chert format will be
used, and the text will not be stored. If the variable is 1, Glass will used, and the text will not be stored. If the variable is 1, Glass will
be used, and the text stored. be used, and the text stored.
</para><para>
For Xapian 1.2, and for versions 1.5 and newer, the index format is For Xapian 1.2, and for versions 1.5 and newer, the index format is
always the default, but the variable controls if the text is stored or always the default, but the variable controls if the text is stored or
not, and the abstract generation method. With Xapian 1.5 and later, and not, and the abstract generation method. With Xapian 1.5 and later, and
@ -242,26 +279,31 @@ still be). Numbers are often quite interesting to search for, and this
should probably not be set except for special situations, i.e., scientific should probably not be set except for special situations, i.e., scientific
documents with huge amounts of numbers in them, where setting nonumbers documents with huge amounts of numbers in them, where setting nonumbers
will reduce the index size. This can only be set for a whole index, not will reduce the index size. This can only be set for a whole index, not
for a subtree.</para></listitem></varlistentry> for a subtree.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.DEHYPHENATE"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.DEHYPHENATE">
<term><varname>dehyphenate</varname></term> <term><varname>dehyphenate</varname></term>
<listitem><para>Determines if we index 'coworker' <listitem><para>Determines if we index 'coworker'
also when the input is 'co-worker'. This is new also when the input is 'co-worker'. This is new
in version 1.22, and on by default. Setting the variable to off allows in version 1.22, and on by default. Setting the variable to off allows
restoring the previous behaviour.</para></listitem></varlistentry> restoring the previous behaviour.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.BACKSLASHASLETTER"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.BACKSLASHASLETTER">
<term><varname>backslashasletter</varname></term> <term><varname>backslashasletter</varname></term>
<listitem><para>Process backslash as normal letter. This may make sense for people wanting to index TeX commands as <listitem><para>Process backslash as normal letter. This may make sense for people wanting to index TeX commands as
such but is not of much general use.</para></listitem></varlistentry> such but is not of much general use.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.UNDERSCOREASLETTER"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.UNDERSCOREASLETTER">
<term><varname>underscoreasletter</varname></term> <term><varname>underscoreasletter</varname></term>
<listitem><para>Process underscore as normal letter. This makes sense in so many cases that one wonders if it should <listitem><para>Process underscore as normal letter. This makes sense in so many cases that one wonders if it should
not be the default.</para></listitem></varlistentry> not be the default.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MAXTERMLENGTH"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MAXTERMLENGTH">
<term><varname>maxtermlength</varname></term> <term><varname>maxtermlength</varname></term>
<listitem><para>Maximum term length. Words longer than this will be discarded. <listitem><para>Maximum term length. Words longer than this will be discarded.
The default is 40 and used to be hard-coded, but it can now be The default is 40 and used to be hard-coded, but it can now be
adjusted. You need an index reset if you change the value.</para></listitem></varlistentry> adjusted. You need an index reset if you change the value.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOCJK"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOCJK">
<term><varname>nocjk</varname></term> <term><varname>nocjk</varname></term>
<listitem><para>Decides if specific East Asian <listitem><para>Decides if specific East Asian
@ -269,20 +311,23 @@ adjusted. You need an index reset if you change the value.</para></listitem></va
off. This will save a small amount of CPU if you have no CJK off. This will save a small amount of CPU if you have no CJK
documents. If your document base does include such text but you are not documents. If your document base does include such text but you are not
interested in searching it, setting nocjk may be a interested in searching it, setting nocjk may be a
significant time and space saver.</para></listitem></varlistentry> significant time and space saver.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.CJKNGRAMLEN"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.CJKNGRAMLEN">
<term><varname>cjkngramlen</varname></term> <term><varname>cjkngramlen</varname></term>
<listitem><para>This lets you adjust the size of <listitem><para>This lets you adjust the size of
n-grams used for indexing CJK text. The default value of 2 is n-grams used for indexing CJK text. The default value of 2 is
probably appropriate in most cases. A value of 3 would allow more precision probably appropriate in most cases. A value of 3 would allow more precision
and efficiency on longer words, but the index will be approximately twice and efficiency on longer words, but the index will be approximately twice
as large.</para></listitem></varlistentry> as large.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.INDEXSTEMMINGLANGUAGES"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.INDEXSTEMMINGLANGUAGES">
<term><varname>indexstemminglanguages</varname></term> <term><varname>indexstemminglanguages</varname></term>
<listitem><para>Languages for which to create stemming expansion <listitem><para>Languages for which to create stemming expansion
data. Stemmer names can be found by executing 'recollindex data. Stemmer names can be found by executing 'recollindex
-l', or this can also be set from a list in the GUI. The values are full -l', or this can also be set from a list in the GUI. The values are full
language names, e.g. english, french...</para></listitem></varlistentry> language names, e.g. english, french...
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.DEFAULTCHARSET"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.DEFAULTCHARSET">
<term><varname>defaultcharset</varname></term> <term><varname>defaultcharset</varname></term>
<listitem><para>Default character <listitem><para>Default character
@ -293,37 +338,39 @@ set, the default character set is the one defined by the NLS environment
($LC_ALL, $LC_CTYPE, $LANG), or ultimately iso-8859-1 (cp-1252 in fact). ($LC_ALL, $LC_CTYPE, $LANG), or ultimately iso-8859-1 (cp-1252 in fact).
If for some reason you want a general default which does not match your If for some reason you want a general default which does not match your
LANG and is not 8859-1, use this variable. This can be redefined for any LANG and is not 8859-1, use this variable. This can be redefined for any
sub-directory.</para></listitem></varlistentry> sub-directory.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.UNAC_EXCEPT_TRANS"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.UNAC_EXCEPT_TRANS">
<term><varname>unac_except_trans</varname></term> <term><varname>unac_except_trans</varname></term>
<listitem><para>A list of characters, <listitem><para>A list of characters, encoded in UTF-8, which should be handled specially when converting
encoded in UTF-8, which should be handled specially text to unaccented lowercase. For example, in Swedish, the letter a with diaeresis has full alphabet citizenship and
when converting text to unaccented lowercase. For should not be turned into an a. Each element in the space-separated list has the special
example, in Swedish, the letter a with diaeresis has full alphabet character as first element and the translation following. The handling of both the lowercase and
citizenship and should not be turned into an a. upper-case versions of a character should be specified, as membership in the list will turn off
Each element in the space-separated list has the special character as both standard accent and case processing. The value is global and affects both indexing and
first element and the translation following. The handling of both the querying. We also convert a few confusing Unicode characters (quotes, hyphen) to their ASCII
lowercase and upper-case versions of a character should be specified, as equivalent to avoid "invisible" search failures.
membership in the list will turn off both standard accent and case </para><para>
processing. The value is global and affects both indexing and querying.
Examples: Examples:
Swedish: Swedish:
unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ffff fifi flfl åå Åå unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ffff fifi flfl åå Åå ' ❜' ʼ' -
. German: . German:
unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ffff fifi flfl unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ffff fifi flfl ' ❜' ʼ' -
. French: you probably want to decompose oe and ae and nobody would type . French: you probably want to decompose oe and ae and nobody would type
a German ß a German ß
unac_except_trans = ßss œoe Œoe æae Æae ffff fifi flfl unac_except_trans = ßss œoe Œoe æae Æae ffff fifi flfl ' ❜' ʼ' -
. The default for all until someone protests follows. These decompositions . The default for all until someone protests follows. These decompositions
are not performed by unac, but it is unlikely that someone would type the are not performed by unac, but it is unlikely that someone would type the
composed forms in a search. composed forms in a search.
unac_except_trans = ßss œoe Œoe æae Æae ffff fifi flfl</para></listitem></varlistentry> unac_except_trans = ßss œoe Œoe æae Æae ffff fifi flfl ' ❜' ʼ' -
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MAILDEFCHARSET"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MAILDEFCHARSET">
<term><varname>maildefcharset</varname></term> <term><varname>maildefcharset</varname></term>
<listitem><para>Overrides the default <listitem><para>Overrides the default
character set for email messages which don't specify character set for email messages which don't specify
one. This is mainly useful for readpst (libpst) dumps, one. This is mainly useful for readpst (libpst) dumps,
which are utf-8 but do not say so.</para></listitem></varlistentry> which are utf-8 but do not say so.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.LOCALFIELDS"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.LOCALFIELDS">
<term><varname>localfields</varname></term> <term><varname>localfields</varname></term>
<listitem><para>Set fields on all files <listitem><para>Set fields on all files
@ -331,7 +378,8 @@ which are utf-8 but do not say so.</para></listitem></varlistentry>
name = value ; attr1 = val1 ; [...] name = value ; attr1 = val1 ; [...]
value is empty so this needs an initial semi-colon. This is useful, e.g., value is empty so this needs an initial semi-colon. This is useful, e.g.,
for setting the rclaptg field for application selection inside for setting the rclaptg field for application selection inside
mimeview.</para></listitem></varlistentry> mimeview.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.TESTMODIFUSEMTIME"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.TESTMODIFUSEMTIME">
<term><varname>testmodifusemtime</varname></term> <term><varname>testmodifusemtime</varname></term>
<listitem><para>Use mtime instead of <listitem><para>Use mtime instead of
@ -353,12 +401,12 @@ undetected). Perform a full index reset after changing this.
<term><varname>noxattrfields</varname></term> <term><varname>noxattrfields</varname></term>
<listitem><para>Disable extended attributes <listitem><para>Disable extended attributes
conversion to metadata fields. This probably needs to be conversion to metadata fields. This probably needs to be
set if testmodifusemtime is set.</para></listitem></varlistentry> set if testmodifusemtime is set.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.METADATACMDS"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.METADATACMDS">
<term><varname>metadatacmds</varname></term> <term><varname>metadatacmds</varname></term>
<listitem><para>Define commands to <listitem><para>Define commands to
gather external metadata, e.g. tmsu tags. gather external metadata, e.g. tmsu tags. There can be several entries, separated by semi-colons, each defining
There can be several entries, separated by semi-colons, each defining
which field name the data goes into and the command to use. Don't forget the which field name the data goes into and the command to use. Don't forget the
initial semi-colon. All the field names must be different. You can use initial semi-colon. All the field names must be different. You can use
aliases in the "field" file if necessary. aliases in the "field" file if necessary.
@ -383,13 +431,15 @@ cachedir is ~/.cache/recoll, the default dbdir would be
mboxcachedir, aspellDicDir, which can still be individually specified to mboxcachedir, aspellDicDir, which can still be individually specified to
override cachedir. Note that if you have multiple configurations, each override cachedir. Note that if you have multiple configurations, each
must have a different cachedir, there is no automatic computation of a must have a different cachedir, there is no automatic computation of a
subpath under cachedir.</para></listitem></varlistentry> subpath under cachedir.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MAXFSOCCUPPC"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MAXFSOCCUPPC">
<term><varname>maxfsoccuppc</varname></term> <term><varname>maxfsoccuppc</varname></term>
<listitem><para>Maximum file system occupation <listitem><para>Maximum file system occupation
over which we stop indexing. The value is a percentage, over which we stop indexing. The value is a percentage,
corresponding to what the "Capacity" df output column shows. The default corresponding to what the "Capacity" df output column shows. The default
value is 0, meaning no checking.</para></listitem></varlistentry> value is 0, meaning no checking.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.DBDIR"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.DBDIR">
<term><varname>dbdir</varname></term> <term><varname>dbdir</varname></term>
<listitem><para>Xapian database directory <listitem><para>Xapian database directory
@ -397,36 +447,43 @@ location. This will be created on first indexing. If the
value is not an absolute path, it will be interpreted as relative to value is not an absolute path, it will be interpreted as relative to
cachedir if set, or the configuration directory (-c argument or cachedir if set, or the configuration directory (-c argument or
$RECOLL_CONFDIR). If nothing is specified, the default is then $RECOLL_CONFDIR). If nothing is specified, the default is then
~/.recoll/xapiandb/</para></listitem></varlistentry> ~/.recoll/xapiandb/
</para></listitem></varlistentry>
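The dbdir lookup rule above can be sketched as follows. This is only an illustration of the documented behaviour, not Recoll's code: the function name and arguments are hypothetical, paths are treated as POSIX paths, and tilde expansion is left out.

```python
import posixpath


def resolve_dbdir(dbdir, confdir, cachedir=None):
    """Return the index directory following the rule described above."""
    # Relative values are anchored to cachedir if set, else to the
    # configuration directory.
    base = cachedir if cachedir else confdir
    if not dbdir:
        # Nothing specified: the default directory name is xapiandb.
        dbdir = "xapiandb"
    if posixpath.isabs(dbdir):
        return dbdir
    return posixpath.join(base, dbdir)
```

For example, with a configuration directory of /home/me/.recoll and no cachedir, an unset dbdir resolves to /home/me/.recoll/xapiandb.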
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.IDXSTATUSFILE"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.IDXSTATUSFILE">
<term><varname>idxstatusfile</varname></term> <term><varname>idxstatusfile</varname></term>
<listitem><para>Name of the scratch file where the indexer process updates its <listitem><para>Name of the scratch file where the indexer process updates its
status. Default: idxstatus.txt inside the configuration status. Default: idxstatus.txt inside the configuration
directory.</para></listitem></varlistentry> directory.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MBOXCACHEDIR"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MBOXCACHEDIR">
<term><varname>mboxcachedir</varname></term> <term><varname>mboxcachedir</varname></term>
<listitem><para>Directory location for storing mbox message offsets cache <listitem><para>Directory location for storing mbox message offsets cache
files. This is normally 'mboxcache' under cachedir if set, files. This is normally 'mboxcache' under cachedir if set,
or else under the configuration directory, but it may be useful to share or else under the configuration directory, but it may be useful to share
a directory between different configurations.</para></listitem></varlistentry> a directory between different configurations.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MBOXCACHEMINMBS"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MBOXCACHEMINMBS">
<term><varname>mboxcacheminmbs</varname></term> <term><varname>mboxcacheminmbs</varname></term>
<listitem><para>Minimum mbox file size over which we cache the offsets. There is really no sense in caching offsets for small files. The <listitem><para>Minimum mbox file size over which we cache the offsets. There is really no sense in caching offsets for small files. The
default is 5 MB.</para></listitem></varlistentry> default is 5 MB.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MBOXMAXMSGMBS"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MBOXMAXMSGMBS">
<term><varname>mboxmaxmsgmbs</varname></term> <term><varname>mboxmaxmsgmbs</varname></term>
<listitem><para>Maximum mbox member message size in megabytes. Size over which we assume that the mbox format is bad or we <listitem><para>Maximum mbox member message size in megabytes. Size over which we assume that the mbox format is bad or we
misinterpreted it, at which point we just stop processing the file.</para></listitem></varlistentry> misinterpreted it, at which point we just stop processing the file.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.WEBCACHEDIR"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.WEBCACHEDIR">
<term><varname>webcachedir</varname></term> <term><varname>webcachedir</varname></term>
<listitem><para>Directory where we store the archived web pages. This is only used by the web history indexing code. <listitem><para>Directory where we store the archived web pages. This is only used by the web history indexing code.
Default: cachedir/webcache if cachedir is set, else Default: cachedir/webcache if cachedir is set, else
$RECOLL_CONFDIR/webcache</para></listitem></varlistentry> $RECOLL_CONFDIR/webcache
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.WEBCACHEMAXMBS"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.WEBCACHEMAXMBS">
<term><varname>webcachemaxmbs</varname></term> <term><varname>webcachemaxmbs</varname></term>
<listitem><para>Maximum size in MB of the Web archive. This is only used by the web history indexing code. <listitem><para>Maximum size in MB of the Web archive. This is only used by the web history indexing code.
Default: 40 MB. Default: 40 MB.
Reducing the size will not physically truncate the file.</para></listitem></varlistentry> Reducing the size will not physically truncate the file.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.WEBQUEUEDIR"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.WEBQUEUEDIR">
<term><varname>webqueuedir</varname></term> <term><varname>webqueuedir</varname></term>
<listitem><para>The path to the Web indexing queue. This used to be <listitem><para>The path to the Web indexing queue. This used to be
@ -434,36 +491,42 @@ hard-coded in the old plugin as ~/.recollweb/ToIndex so there would be no
need or possibility to change it, but the WebExtensions plugin now downloads need or possibility to change it, but the WebExtensions plugin now downloads
the files to the user Downloads directory, and a script moves them to the files to the user Downloads directory, and a script moves them to
webqueuedir. The script reads this value from the config so it has become webqueuedir. The script reads this value from the config so it has become
possible to change it.</para></listitem></varlistentry> possible to change it.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.WEBDOWNLOADSDIR"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.WEBDOWNLOADSDIR">
<term><varname>webdownloadsdir</varname></term> <term><varname>webdownloadsdir</varname></term>
<listitem><para>The path to browser downloads directory. This is <listitem><para>The path to browser downloads directory. This is
where the new browser add-on extension has to create the files. They are where the new browser add-on extension has to create the files. They are
then moved by a script to webqueuedir.</para></listitem></varlistentry> then moved by a script to webqueuedir.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.WEBCACHEKEEPINTERVAL"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.WEBCACHEKEEPINTERVAL">
<term><varname>webcachekeepinterval</varname></term> <term><varname>webcachekeepinterval</varname></term>
<listitem><para>Page recycle interval. By default, only one instance of a URL is kept in the cache. This <listitem><para>Page recycle interval. By default, only one instance of a URL is kept in the cache. This
can be changed by setting this to a value determining at what frequency can be changed by setting this to a value determining at what frequency
we keep multiple instances ('day', 'week', 'month', we keep multiple instances ('day', 'week', 'month',
'year'). Note that increasing the interval will not erase existing 'year'). Note that increasing the interval will not erase existing
entries.</para></listitem></varlistentry> entries.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.ASPELLDICDIR"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.ASPELLDICDIR">
<term><varname>aspellDicDir</varname></term> <term><varname>aspellDicDir</varname></term>
<listitem><para>Aspell dictionary storage directory location. The <listitem><para>Aspell dictionary storage directory location. The
aspell dictionary (aspdict.(lang).rws) is normally stored in the aspell dictionary (aspdict.(lang).rws) is normally stored in the
directory specified by cachedir if set, or under the configuration directory specified by cachedir if set, or under the configuration
directory.</para></listitem></varlistentry> directory.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.FILTERSDIR"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.FILTERSDIR">
<term><varname>filtersdir</varname></term> <term><varname>filtersdir</varname></term>
<listitem><para>Directory location for executable input handlers. If <listitem><para>Directory location for executable input handlers. If
RECOLL_FILTERSDIR is set in the environment, we use it instead. Defaults RECOLL_FILTERSDIR is set in the environment, we use it instead. Defaults
to $prefix/share/recoll/filters. Can be redefined for to $prefix/share/recoll/filters. Can be redefined for
subdirectories.</para></listitem></varlistentry> subdirectories.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.ICONSDIR"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.ICONSDIR">
<term><varname>iconsdir</varname></term> <term><varname>iconsdir</varname></term>
<listitem><para>Directory location for icons. The only reason to <listitem><para>Directory location for icons. The only reason to
change this would be if you want to change the icons displayed in the change this would be if you want to change the icons displayed in the
result list. Defaults to $prefix/share/recoll/images</para></listitem></varlistentry> result list. Defaults to $prefix/share/recoll/images
</para></listitem></varlistentry>
</variablelist></sect3> </variablelist></sect3>
<sect3 id="RCL.INSTALL.CONFIG.RECOLLCONF.PERFS"> <sect3 id="RCL.INSTALL.CONFIG.RECOLLCONF.PERFS">
<title>Parameters affecting indexing performance and resource usage </title><variablelist> <title>Parameters affecting indexing performance and resource usage </title><variablelist>
@ -481,13 +544,15 @@ value (from this file) is now 50 MB, and should be ok in many cases.
You can set it as low as 10 to conserve memory, but if you are looking You can set it as low as 10 to conserve memory, but if you are looking
for maximum speed, you may want to experiment with values between 20 and for maximum speed, you may want to experiment with values between 20 and
200. In my experience, values beyond this are always counterproductive. If 200. In my experience, values beyond this are always counterproductive. If
you find otherwise, please drop me a note.</para></listitem></varlistentry> you find otherwise, please drop me a note.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.FILTERMAXSECONDS"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.FILTERMAXSECONDS">
<term><varname>filtermaxseconds</varname></term> <term><varname>filtermaxseconds</varname></term>
<listitem><para>Maximum external filter execution time in <listitem><para>Maximum external filter execution time in
seconds. Default 1200 (20 min). Set to 0 for no limit. This seconds. Default 1200 (20 min). Set to 0 for no limit. This
is mainly to avoid infinite loops in postscript files is mainly to avoid infinite loops in postscript files
(loop.ps)</para></listitem></varlistentry> (loop.ps)
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.FILTERMAXMBYTES"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.FILTERMAXMBYTES">
<term><varname>filtermaxmbytes</varname></term> <term><varname>filtermaxmbytes</varname></term>
<listitem><para>Maximum virtual memory space for filter processes <listitem><para>Maximum virtual memory space for filter processes
@ -495,7 +560,8 @@ is mainly to avoid infinite loops in postscript files
Linux way to limit the data space only), so we need to be a bit generous Linux way to limit the data space only), so we need to be a bit generous
here. Anything over 2000 will be ignored on 32-bit machines. The here. Anything over 2000 will be ignored on 32-bit machines. The
previous default value of 2000 would prevent java pdftk from working when previous default value of 2000 would prevent java pdftk from working when
executed from Python rclpdf.py.</para></listitem></varlistentry> executed from Python rclpdf.py.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.THRQSIZES"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.THRQSIZES">
<term><varname>thrQSizes</varname></term> <term><varname>thrQSizes</varname></term>
<listitem><para>Stage input queues configuration. There are three <listitem><para>Stage input queues configuration. There are three
@ -507,7 +573,8 @@ next stage. In practise, deep queues have not been shown to increase
performance. Default: a value of 0 for the first queue tells Recoll to performance. Default: a value of 0 for the first queue tells Recoll to
perform autoconfiguration based on the detected number of CPUs (no need perform autoconfiguration based on the detected number of CPUs (no need
for the two other values in this case). Use thrQSizes = -1 -1 -1 to for the two other values in this case). Use thrQSizes = -1 -1 -1 to
disable multithreading entirely.</para></listitem></varlistentry> disable multithreading entirely.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.THRTCOUNTS"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.THRTCOUNTS">
<term><varname>thrTCounts</varname></term> <term><varname>thrTCounts</varname></term>
<listitem><para>Number of threads used for each indexing stage. The <listitem><para>Number of threads used for each indexing stage. The
@ -517,7 +584,8 @@ in thrQSizes: if the first queue depth is 0, all counts are ignored
(autoconfigured); if a value of -1 is used for a queue depth, the (autoconfigured); if a value of -1 is used for a queue depth, the
corresponding thread count is ignored. It makes no sense to use a value corresponding thread count is ignored. It makes no sense to use a value
other than 1 for the last stage because updating the Xapian index is other than 1 for the last stage because updating the Xapian index is
necessarily single-threaded (and protected by a mutex).</para></listitem></varlistentry> necessarily single-threaded (and protected by a mutex).
</para></listitem></varlistentry>
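The interaction between thrQSizes and thrTCounts described above can be sketched as follows. The function name is hypothetical and the thread counts used in the usage note are made-up example values; only the rules (0 in the first queue means autoconfiguration, -1 for a queue depth means the matching thread count is ignored) come from the text.

```python
def effective_thread_counts(thr_q_sizes: str, thr_t_counts: str):
    """Return the per-stage thread counts that would actually be used.

    None as the overall result means autoconfiguration; None for a single
    stage means that stage's thread count is ignored.
    """
    qsizes = [int(v) for v in thr_q_sizes.split()]
    if qsizes and qsizes[0] == 0:
        # First queue depth 0: all counts are ignored (autoconfigured).
        return None
    counts = [int(v) for v in thr_t_counts.split()]
    # A queue depth of -1 makes the corresponding thread count irrelevant.
    return [None if q == -1 else c for q, c in zip(qsizes, counts)]
```

With thrQSizes = -1 -1 -1 every stage comes back as None, which matches the "disable multithreading entirely" case.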
</variablelist></sect3> </variablelist></sect3>
<sect3 id="RCL.INSTALL.CONFIG.RECOLLCONF.MISC"> <sect3 id="RCL.INSTALL.CONFIG.RECOLLCONF.MISC">
<title>Miscellaneous parameters </title><variablelist> <title>Miscellaneous parameters </title><variablelist>
@ -525,7 +593,8 @@ necessarily single-threaded (and protected by a mutex).</para></listitem></varli
<term><varname>loglevel</varname></term> <term><varname>loglevel</varname></term>
<listitem><para>Log file verbosity 1-6. A value of 2 will print <listitem><para>Log file verbosity 1-6. A value of 2 will print
only errors and warnings. 3 will print information like document updates, only errors and warnings. 3 will print information like document updates,
4 is quite verbose and 6 very verbose.</para></listitem></varlistentry> 4 is quite verbose and 6 very verbose.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.LOGFILENAME"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.LOGFILENAME">
<term><varname>logfilename</varname></term> <term><varname>logfilename</varname></term>
<listitem><para>Log file destination. Use 'stderr' (default) to write to the <listitem><para>Log file destination. Use 'stderr' (default) to write to the
@ -541,17 +610,20 @@ console. </para></listitem></varlistentry>
<listitem><para>Destination file for external helpers standard error output. The external program error output is left alone by default, <listitem><para>Destination file for external helpers standard error output. The external program error output is left alone by default,
e.g. going to the terminal when the recoll[index] program is executed e.g. going to the terminal when the recoll[index] program is executed
from the command line. Use /dev/null or a file inside a non-existent from the command line. Use /dev/null or a file inside a non-existent
directory to completely suppress the output.</para></listitem></varlistentry> directory to completely suppress the output.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.DAEMLOGLEVEL"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.DAEMLOGLEVEL">
<term><varname>daemloglevel</varname></term> <term><varname>daemloglevel</varname></term>
<listitem><para>Override loglevel for the indexer in real time <listitem><para>Override loglevel for the indexer in real time
mode. The default is to use the idx... values if set, else mode. The default is to use the idx... values if set, else
the log... values.</para></listitem></varlistentry> the log... values.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.DAEMLOGFILENAME"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.DAEMLOGFILENAME">
<term><varname>daemlogfilename</varname></term> <term><varname>daemlogfilename</varname></term>
<listitem><para>Override logfilename for the indexer in real time <listitem><para>Override logfilename for the indexer in real time
mode. The default is to use the idx... values if set, else mode. The default is to use the idx... values if set, else
the log... values.</para></listitem></varlistentry> the log... values.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.PYLOGLEVEL"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.PYLOGLEVEL">
<term><varname>pyloglevel</varname></term> <term><varname>pyloglevel</varname></term>
<listitem><para>Override loglevel for the python module. </para></listitem></varlistentry> <listitem><para>Override loglevel for the python module. </para></listitem></varlistentry>
@ -564,7 +636,8 @@ the log... values.</para></listitem></varlistentry>
configuration directory inside the directory tree makes it possible to configuration directory inside the directory tree makes it possible to
provide automatic query time path translations once the data set has provide automatic query time path translations once the data set has
moved (for example, because it has been mounted on another moved (for example, because it has been mounted on another
location).</para></listitem></varlistentry> location).
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.CURIDXCONFDIR"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.CURIDXCONFDIR">
<term><varname>curidxconfdir</varname></term> <term><varname>curidxconfdir</varname></term>
<listitem><para>Current location of the configuration directory. Complement orgidxconfdir for movable datasets. This should be used <listitem><para>Current location of the configuration directory. Complement orgidxconfdir for movable datasets. This should be used
@ -576,7 +649,8 @@ example if a dataset originally indexed as '/home/me/mydata/config' has
been mounted to '/media/me/mydata', and the GUI is running from a copied been mounted to '/media/me/mydata', and the GUI is running from a copied
configuration, orgidxconfdir would be '/home/me/mydata/config', and configuration, orgidxconfdir would be '/home/me/mydata/config', and
curidxconfdir (as set in the copied configuration) would be curidxconfdir (as set in the copied configuration) would be
'/media/me/mydata/config'.</para></listitem></varlistentry> '/media/me/mydata/config'.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.IDXRUNDIR"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.IDXRUNDIR">
<term><varname>idxrundir</varname></term> <term><varname>idxrundir</varname></term>
<listitem><para>Indexing process current directory. The input <listitem><para>Indexing process current directory. The input
@ -585,19 +659,22 @@ makes sense to have recollindex chdir to some temporary directory. If the
value is empty, the current directory is not changed. If the value is empty, the current directory is not changed. If the
value is (literal) tmp, we use the temporary directory as set by the value is (literal) tmp, we use the temporary directory as set by the
environment (RECOLL_TMPDIR else TMPDIR else /tmp). If the value is an environment (RECOLL_TMPDIR else TMPDIR else /tmp). If the value is an
absolute path to a directory, we go there.</para></listitem></varlistentry> absolute path to a directory, we go there.
</para></listitem></varlistentry>
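The three documented cases for idxrundir can be sketched as follows. The function name is hypothetical, and raising an error for any other value is an assumption of this sketch, since the text above does not describe that case.

```python
import os
import posixpath


def resolve_idxrundir(value, environ=None):
    """Return the directory to chdir to, or None to stay where we are."""
    env = os.environ if environ is None else environ
    if not value:
        # Empty value: the current directory is not changed.
        return None
    if value == "tmp":
        # Literal tmp: RECOLL_TMPDIR, else TMPDIR, else /tmp.
        return env.get("RECOLL_TMPDIR") or env.get("TMPDIR") or "/tmp"
    if posixpath.isabs(value):
        # Absolute path to a directory: go there.
        return value
    # Assumption: other values are not covered by the description above.
    raise ValueError("idxrundir: expected empty, 'tmp' or an absolute path")
```

Passing the environment as a dictionary keeps the lookup order easy to check without touching the real process environment.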
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.CHECKNEEDRETRYINDEXSCRIPT"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.CHECKNEEDRETRYINDEXSCRIPT">
<term><varname>checkneedretryindexscript</varname></term> <term><varname>checkneedretryindexscript</varname></term>
<listitem><para>Script used to heuristically check if we need to retry indexing <listitem><para>Script used to heuristically check if we need to retry indexing
files which previously failed. The default script checks files which previously failed. The default script checks
the modified dates on /usr/bin and /usr/local/bin. A relative path will the modified dates on /usr/bin and /usr/local/bin. A relative path will
be looked up in the filters dirs, then in the path. Use an absolute path be looked up in the filters dirs, then in the path. Use an absolute path
to do otherwise.</para></listitem></varlistentry> to do otherwise.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.RECOLLHELPERPATH"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.RECOLLHELPERPATH">
<term><varname>recollhelperpath</varname></term> <term><varname>recollhelperpath</varname></term>
<listitem><para>Additional places to search for helper executables. This is used, e.g., on Windows by the Python code, and on Mac OS by the bundled recoll.app <listitem><para>Additional places to search for helper executables. This is used, e.g., on Windows by the Python code, and on Mac OS by the bundled recoll.app
(because I could find no reliable way to tell launchd to set the PATH). The example below is for (because I could find no reliable way to tell launchd to set the PATH). The example below is for
Windows. Use ':' as entry separator for Mac and Ux-like systems, ';' is for Windows only.</para></listitem></varlistentry> Windows. Use ':' as entry separator for Mac and Ux-like systems, ';' is for Windows only.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.IDXABSMLEN"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.IDXABSMLEN">
<term><varname>idxabsmlen</varname></term> <term><varname>idxabsmlen</varname></term>
<listitem><para>Length of abstracts we store while indexing. Recoll stores an abstract for each indexed file. <listitem><para>Length of abstracts we store while indexing. Recoll stores an abstract for each indexed file.
@ -609,62 +686,72 @@ defines the size of the stored abstract. The default value is 250
bytes. The search interface gives you the choice to display this stored bytes. The search interface gives you the choice to display this stored
text or a synthetic abstract built by extracting text around the search text or a synthetic abstract built by extracting text around the search
terms. If you always prefer the synthetic abstract, you can reduce this terms. If you always prefer the synthetic abstract, you can reduce this
value and save a little space.</para></listitem></varlistentry> value and save a little space.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.IDXMETASTOREDLEN"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.IDXMETASTOREDLEN">
<term><varname>idxmetastoredlen</varname></term> <term><varname>idxmetastoredlen</varname></term>
<listitem><para>Truncation length of stored metadata fields. This <listitem><para>Truncation length of stored metadata fields. This
does not affect indexing (the whole field is processed anyway), just the does not affect indexing (the whole field is processed anyway), just the
amount of data stored in the index for the purpose of displaying fields amount of data stored in the index for the purpose of displaying fields
inside result lists or previews. The default value is 150 bytes which inside result lists or previews. The default value is 150 bytes which
may be too low if you have custom fields.</para></listitem></varlistentry> may be too low if you have custom fields.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.IDXTEXTTRUNCATELEN"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.IDXTEXTTRUNCATELEN">
<term><varname>idxtexttruncatelen</varname></term> <term><varname>idxtexttruncatelen</varname></term>
<listitem><para>Truncation length for all document texts. Only index <listitem><para>Truncation length for all document texts. Only index
the beginning of documents. This is not recommended except if you are the beginning of documents. This is not recommended except if you are
sure that the interesting keywords are at the top and have severe disk sure that the interesting keywords are at the top and have severe disk
space issues.</para></listitem></varlistentry> space issues.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.IDXSYNONYMS"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.IDXSYNONYMS">
<term><varname>idxsynonyms</varname></term> <term><varname>idxsynonyms</varname></term>
<listitem><para>Name of the index-time synonyms file. This is used for indexing multiword synonyms as single terms, <listitem><para>Name of the index-time synonyms file. This is used for indexing multiword synonyms as single terms,
which in turn is only useful if you want to perform proximity searches which in turn is only useful if you want to perform proximity searches
with such terms.</para></listitem></varlistentry> with such terms.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.ASPELLLANGUAGE"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.ASPELLLANGUAGE">
<term><varname>aspellLanguage</varname></term> <term><varname>aspellLanguage</varname></term>
<listitem><para>Language definitions to use when creating the aspell <listitem><para>Language definitions to use when creating the aspell
dictionary. The value must match a set of aspell language dictionary. The value must match a set of aspell language
definition files. You can type "aspell dicts" to see a list The default definition files. You can type "aspell dicts" to see a list The default
if this is not set is to use the NLS environment to guess the value. The if this is not set is to use the NLS environment to guess the value. The
values are the 2-letter language codes (e.g. 'en', 'fr'...)</para></listitem></varlistentry> values are the 2-letter language codes (e.g. 'en', 'fr'...)
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.ASPELLADDCREATEPARAM"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.ASPELLADDCREATEPARAM">
<term><varname>aspellAddCreateParam</varname></term> <term><varname>aspellAddCreateParam</varname></term>
<listitem><para>Additional option and parameter to aspell dictionary creation <listitem><para>Additional option and parameter to aspell dictionary creation
command. Some aspell packages may need an additional option command. Some aspell packages may need an additional option
(e.g. on Debian Jessie: --local-data-dir=/usr/lib/aspell). See Debian bug (e.g. on Debian Jessie: --local-data-dir=/usr/lib/aspell). See Debian bug
772415.</para></listitem></varlistentry> 772415.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.ASPELLKEEPSTDERR"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.ASPELLKEEPSTDERR">
<term><varname>aspellKeepStderr</varname></term> <term><varname>aspellKeepStderr</varname></term>
<listitem><para>Set this to have a look at aspell dictionary creation <listitem><para>Set this to have a look at aspell dictionary creation
errors. There are always many, so this is mostly for errors. There are always many, so this is mostly for
debugging.</para></listitem></varlistentry> debugging.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOASPELL"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOASPELL">
<term><varname>noaspell</varname></term> <term><varname>noaspell</varname></term>
<listitem><para>Disable aspell use. The aspell dictionary generation <listitem><para>Disable aspell use. The aspell dictionary generation
takes time, and some combinations of aspell version, language, and local takes time, and some combinations of aspell version, language, and local
terms, result in aspell crashing, so it sometimes makes sense to just terms, result in aspell crashing, so it sometimes makes sense to just
disable the thing.</para></listitem></varlistentry> disable the thing.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MONAUXINTERVAL"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MONAUXINTERVAL">
<term><varname>monauxinterval</varname></term> <term><varname>monauxinterval</varname></term>
<listitem><para>Auxiliary database update interval. The real time <listitem><para>Auxiliary database update interval. The real time
indexer only updates the auxiliary databases (stemdb, aspell) indexer only updates the auxiliary databases (stemdb, aspell)
periodically, because it would be too costly to do it for every document periodically, because it would be too costly to do it for every document
change. The default period is one hour.</para></listitem></varlistentry> change. The default period is one hour.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MONIXINTERVAL"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MONIXINTERVAL">
<term><varname>monixinterval</varname></term> <term><varname>monixinterval</varname></term>
<listitem><para>Minimum interval (seconds) between processings of the indexing <listitem><para>Minimum interval (seconds) between processings of the indexing
queue. The real time indexer does not process each event queue. The real time indexer does not process each event
when it comes in, but lets the queue accumulate, to diminish overhead and when it comes in, but lets the queue accumulate, to diminish overhead and
to aggregate multiple events affecting the same file. Default 30 to aggregate multiple events affecting the same file. Default 30
S.</para></listitem></varlistentry> S.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MONDELAYPATTERNS"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MONDELAYPATTERNS">
<term><varname>mondelaypatterns</varname></term> <term><varname>mondelaypatterns</varname></term>
<listitem><para>Timing parameters for the real time indexing. Definitions for files which get a longer delay before reindexing <listitem><para>Timing parameters for the real time indexing. Definitions for files which get a longer delay before reindexing
@ -673,21 +760,25 @@ reindexed once in a while. A list of wildcardPattern:seconds pairs. The
patterns are matched with fnmatch(pattern, path, 0) You can quote entries patterns are matched with fnmatch(pattern, path, 0) You can quote entries
containing white space with double quotes (quote the whole entry, not the containing white space with double quotes (quote the whole entry, not the
pattern). The default is empty. pattern). The default is empty.
Example: mondelaypatterns = *.log:20 "*with spaces.*:30"</para></listitem></varlistentry> Example: mondelaypatterns = *.log:20 "*with spaces.*:30"
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.IDXNICEPRIO"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.IDXNICEPRIO">
<term><varname>idxniceprio</varname></term> <term><varname>idxniceprio</varname></term>
<listitem><para>"nice" process priority for the indexing processes. Default: 19 <listitem><para>"nice" process priority for the indexing processes. Default: 19
(lowest) Appeared with 1.26.5. Prior versions were fixed at 19.</para></listitem></varlistentry> (lowest) Appeared with 1.26.5. Prior versions were fixed at 19.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MONIONICECLASS"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MONIONICECLASS">
<term><varname>monioniceclass</varname></term> <term><varname>monioniceclass</varname></term>
<listitem><para>ionice class for the indexing process. Despite the misleading name, and on platforms where this is <listitem><para>ionice class for the indexing process. Despite the misleading name, and on platforms where this is
supported, this affects all indexing processes, supported, this affects all indexing processes,
not only the real time/monitoring ones. The default value is 3 (use not only the real time/monitoring ones. The default value is 3 (use
lowest "Idle" priority).</para></listitem></varlistentry> lowest "Idle" priority).
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MONIONICECLASSDATA"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MONIONICECLASSDATA">
<term><varname>monioniceclassdata</varname></term> <term><varname>monioniceclassdata</varname></term>
<listitem><para>ionice class level parameter if the class supports it. The default is empty, as the default "Idle" class has no <listitem><para>ionice class level parameter if the class supports it. The default is empty, as the default "Idle" class has no
levels.</para></listitem></varlistentry> levels.
</para></listitem></varlistentry>
</variablelist></sect3> </variablelist></sect3>
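As a rough illustration of the mondelaypatterns value format described above (a list of wildcardPattern:seconds pairs, with double quotes around whole entries containing white space), the following Python sketch splits a value into pairs and looks up the delay for a path. This is not Recoll's implementation: the function names are hypothetical, shlex stands in for Recoll's own quote handling, and fnmatch.fnmatchcase only approximates the C call fnmatch(pattern, path, 0).

```python
import fnmatch
import shlex


def parse_delay_patterns(value):
    """Split a mondelaypatterns-style value into (pattern, seconds) pairs.

    Hypothetical helper: shlex.split() is assumed to be close enough to
    the configuration quoting rules for this illustration.
    """
    pairs = []
    for entry in shlex.split(value):
        # The delay follows the last ':' of the entry.
        pattern, _, seconds = entry.rpartition(":")
        pairs.append((pattern, int(seconds)))
    return pairs


def delay_for(path, pairs, default=0):
    """Return the delay of the first pattern matching path, else default."""
    for pattern, seconds in pairs:
        # Like fnmatch(3) with flags 0, '*' here also matches '/'.
        if fnmatch.fnmatchcase(path, pattern):
            return seconds
    return default


pairs = parse_delay_patterns('*.log:20 "*with spaces.*:30"')
print(pairs)
print(delay_for("/var/log/app.log", pairs))
```

The example value is the one given in the description; the quoted entry is kept as a single pattern because the quotes enclose the whole entry, delay included.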
<sect3 id="RCL.INSTALL.CONFIG.RECOLLCONF.QUERY"> <sect3 id="RCL.INSTALL.CONFIG.RECOLLCONF.QUERY">
<title>Query-time parameters (no impact on the index) </title><variablelist> <title>Query-time parameters (no impact on the index) </title><variablelist>
@ -696,7 +787,8 @@ levels.</para></listitem></varlistentry>
<listitem><para>auto-trigger diacritics sensitivity (raw index only). IF the index is not stripped, decide if we automatically trigger <listitem><para>auto-trigger diacritics sensitivity (raw index only). IF the index is not stripped, decide if we automatically trigger
diacritics sensitivity if the search term has accented characters (not in diacritics sensitivity if the search term has accented characters (not in
unac_except_trans). Else you need to use the query language and the "D" unac_except_trans). Else you need to use the query language and the "D"
modifier to specify diacritics sensitivity. Default is no.</para></listitem></varlistentry> modifier to specify diacritics sensitivity. Default is no.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.AUTOCASESENS"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.AUTOCASESENS">
<term><varname>autocasesens</varname></term> <term><varname>autocasesens</varname></term>
<listitem><para>auto-trigger case sensitivity (raw index only). IF <listitem><para>auto-trigger case sensitivity (raw index only). IF
@ -704,40 +796,46 @@ the index is not stripped (see indexStripChars), decide if we
automatically trigger character case sensitivity if the search term has automatically trigger character case sensitivity if the search term has
upper-case characters in any but the first position. Else you need to use upper-case characters in any but the first position. Else you need to use
the query language and the "C" modifier to specify character-case the query language and the "C" modifier to specify character-case
sensitivity. Default is yes.</para></listitem></varlistentry> sensitivity. Default is yes.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MAXTERMEXPAND"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MAXTERMEXPAND">
<term><varname>maxTermExpand</varname></term> <term><varname>maxTermExpand</varname></term>
<listitem><para>Maximum query expansion count <listitem><para>Maximum query expansion count
for a single term (e.g.: when using wildcards). This only for a single term (e.g.: when using wildcards). This only
affects queries, not indexing. We used to not limit this at all (except affects queries, not indexing. We used to not limit this at all (except
for filenames where the limit was too low at 1000), but it is for filenames where the limit was too low at 1000), but it is
unreasonable with a big index. Default 10000.</para></listitem></varlistentry> unreasonable with a big index. Default 10000.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MAXXAPIANCLAUSES"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MAXXAPIANCLAUSES">
<term><varname>maxXapianClauses</varname></term> <term><varname>maxXapianClauses</varname></term>
<listitem><para>Maximum number of clauses <listitem><para>Maximum number of clauses
we add to a single Xapian query. This only affects queries, we add to a single Xapian query. This only affects queries,
not indexing. In some cases, the result of term expansion can be not indexing. In some cases, the result of term expansion can be
multiplicative, and we want to avoid eating all the memory. Default multiplicative, and we want to avoid eating all the memory. Default
50000.</para></listitem></varlistentry> 50000.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.SNIPPETMAXPOSWALK"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.SNIPPETMAXPOSWALK">
<term><varname>snippetMaxPosWalk</varname></term> <term><varname>snippetMaxPosWalk</varname></term>
<listitem><para>Maximum number of positions we walk while populating a snippet for <listitem><para>Maximum number of positions we walk while populating a snippet for
the result list. The default of 1,000,000 may be the result list. The default of 1,000,000 may be
insufficient for very big documents, the consequence would be snippets insufficient for very big documents, the consequence would be snippets
with possibly meaning-altering missing words.</para></listitem></varlistentry> with possibly meaning-altering missing words.
</para></listitem></varlistentry>
</variablelist></sect3> </variablelist></sect3>
<sect3 id="RCL.INSTALL.CONFIG.RECOLLCONF.PDF"> <sect3 id="RCL.INSTALL.CONFIG.RECOLLCONF.PDF">
<title>Parameters for the PDF input script </title><variablelist> <title>Parameters for the PDF input script </title><variablelist>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.PDFOCR"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.PDFOCR">
<term><varname>pdfocr</varname></term> <term><varname>pdfocr</varname></term>
<listitem><para>Attempt OCR of PDF files with no text content. This can be defined in subdirectories. The default is off because <listitem><para>Attempt OCR of PDF files with no text content. This can be defined in subdirectories. The default is off because
OCR is so very slow.</para></listitem></varlistentry> OCR is so very slow.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.PDFATTACH"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.PDFATTACH">
<term><varname>pdfattach</varname></term> <term><varname>pdfattach</varname></term>
<listitem><para>Enable PDF attachment extraction by executing pdftk (if <listitem><para>Enable PDF attachment extraction by executing pdftk (if
available). This is available). This is
normally disabled, because it does slow down PDF indexing a bit even if normally disabled, because it does slow down PDF indexing a bit even if
not one attachment is ever found.</para></listitem></varlistentry> not one attachment is ever found.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.PDFEXTRAMETA"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.PDFEXTRAMETA">
<term><varname>pdfextrameta</varname></term> <term><varname>pdfextrameta</varname></term>
<listitem><para>Extract text from selected XMP metadata tags. This <listitem><para>Extract text from selected XMP metadata tags. This
@ -745,7 +843,8 @@ is a space-separated list of qualified XMP tag names. Each element can also
include a translation to a Recoll field name, separated by a '|' include a translation to a Recoll field name, separated by a '|'
character. If the second element is absent, the tag name is used as the character. If the second element is absent, the tag name is used as the
Recoll field names. You will also need to add specifications to the Recoll field names. You will also need to add specifications to the
"fields" file to direct processing of the extracted data.</para></listitem></varlistentry> "fields" file to direct processing of the extracted data.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.PDFEXTRAMETAFIX"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.PDFEXTRAMETAFIX">
<term><varname>pdfextrametafix</varname></term> <term><varname>pdfextrametafix</varname></term>
<listitem><para>Define name of XMP field editing script. This <listitem><para>Define name of XMP field editing script. This
@ -754,7 +853,8 @@ values. The script should define a 'MetaFixer' class with a metafix()
method which will be called with the qualified tag name and value of each method which will be called with the qualified tag name and value of each
selected field, for editing or erasing. A new instance is created for selected field, for editing or erasing. A new instance is created for
each document, so that the object can keep state for, e.g. eliminating each document, so that the object can keep state for, e.g. eliminating
duplicate values.</para></listitem></varlistentry> duplicate values.
</para></listitem></varlistentry>
</variablelist></sect3> </variablelist></sect3>
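The pdfextrametafix description above can be sketched as a small Python script. Only the class name MetaFixer and the metafix() method come from the text; the argument names, the return convention (return the edited value, or an empty string to erase the field) and the 'dc:subject' tag used in the usage lines are assumptions made for illustration.

```python
class MetaFixer(object):
    """Hypothetical XMP field editor.

    A new instance is created for each document, so per-document state
    (here, the set of values already seen) can live on the object.
    """

    def __init__(self):
        self.seen = set()

    def metafix(self, name, value):
        # name: qualified XMP tag name, value: the extracted text.
        value = value.strip()
        key = (name, value)
        if key in self.seen:
            # Assumed convention: an empty result erases a duplicate value.
            return ""
        self.seen.add(key)
        return value


fixer = MetaFixer()
print(repr(fixer.metafix("dc:subject", " indexing ")))
print(repr(fixer.metafix("dc:subject", "indexing")))
```

Keeping the duplicate tracking on the instance relies on the one-instance-per-document behaviour stated in the description, so values seen in one document do not suppress the same values in the next.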
<sect3 id="RCL.INSTALL.CONFIG.RECOLLCONF.OCR"> <sect3 id="RCL.INSTALL.CONFIG.RECOLLCONF.OCR">
<title>Parameters for OCR processing </title><variablelist> <title>Parameters for OCR processing </title><variablelist>
@ -766,17 +866,20 @@ the input file. Modules for tesseract (tesseract) and ABBYY FineReader
(abbyy) are present in the standard distribution. For compatibility with (abbyy) are present in the standard distribution. For compatibility with
the previous version, if this is not defined at all, the default value is the previous version, if this is not defined at all, the default value is
"tesseract". Use an explicit empty value if needed. A value of "abbyy "tesseract". Use an explicit empty value if needed. A value of "abbyy
tesseract" will try everything.</para></listitem></varlistentry> tesseract" will try everything.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.OCRCACHEDIR"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.OCRCACHEDIR">
<term><varname>ocrcachedir</varname></term> <term><varname>ocrcachedir</varname></term>
<listitem><para>Location for caching OCR data. The default if this is empty or undefined is to store the cached <listitem><para>Location for caching OCR data. The default if this is empty or undefined is to store the cached
OCR data under $RECOLL_CONFDIR/ocrcache.</para></listitem></varlistentry> OCR data under $RECOLL_CONFDIR/ocrcache.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.TESSERACTLANG"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.TESSERACTLANG">
<term><varname>tesseractlang</varname></term> <term><varname>tesseractlang</varname></term>
<listitem><para>Language to assume for tesseract OCR. Important for improving the OCR accuracy. This can also be set <listitem><para>Language to assume for tesseract OCR. Important for improving the OCR accuracy. This can also be set
through the contents of a file in through the contents of a file in
the currently processed directory. See the rclocrtesseract.py the currently processed directory. See the rclocrtesseract.py
script. Example values: eng, fra... See the tesseract documentation.</para></listitem></varlistentry> script. Example values: eng, fra... See the tesseract documentation.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.TESSERACTCMD"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.TESSERACTCMD">
<term><varname>tesseractcmd</varname></term> <term><varname>tesseractcmd</varname></term>
<listitem><para>Path for the tesseract command. Do not quote. This is mostly useful on Windows, or for specifying a non-default <listitem><para>Path for the tesseract command. Do not quote. This is mostly useful on Windows, or for specifying a non-default
@ -800,6 +903,7 @@ script. Typical values: English, French... See the ABBYY documentation.
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MHMBOXQUIRKS"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MHMBOXQUIRKS">
<term><varname>mhmboxquirks</varname></term> <term><varname>mhmboxquirks</varname></term>
<listitem><para>Enable thunderbird/mozilla-seamonkey mbox format quirks Set this for the directory where the email mbox files are <listitem><para>Enable thunderbird/mozilla-seamonkey mbox format quirks Set this for the directory where the email mbox files are
stored.</para></listitem></varlistentry> stored.
</para></listitem></varlistentry>
</variablelist></sect3> </variablelist></sect3>
</sect2> </sect2>


@ -8929,24 +8929,26 @@ hasextract = False
White space separated list of wildcard patterns White space separated list of wildcard patterns
(simple ones, not paths, must contain no '/' (simple ones, not paths, must contain no '/'
characters), which will be tested against file characters), which will be tested against file
and directory names. Have a look at the default and directory names.</p>
configuration for the initial value, some entries <p>Have a look at the default configuration for
may not suit your situation. The easiest way to the initial value, some entries may not suit your
see it is through the GUI Index configuration situation. The easiest way to see it is through
"local parameters" panel. The list in the default the GUI Index configuration "local parameters"
configuration does not exclude hidden directories panel.</p>
(names beginning with a dot), which means that it <p>The list in the default configuration does not
may index quite a few things that you do not exclude hidden directories (names beginning with
want. On the other hand, email user agents like a dot), which means that it may index quite a few
Thunderbird usually store messages in hidden things that you do not want. On the other hand,
directories, and you probably want this indexed. email user agents like Thunderbird usually store
One possible solution is to have ".*" in messages in hidden directories, and you probably
"skippedNames", and add things like want this indexed. One possible solution is to
"~/.thunderbird" "~/.evolution" to "topdirs". Not have ".*" in "skippedNames", and add things like
even the file names are indexed for patterns in "~/.thunderbird" "~/.evolution" to "topdirs".</p>
this list, see the "noContentSuffixes" variable <p>Not even the file names are indexed for
for an alternative approach which indexes the patterns in this list, see the
file names. Can be redefined for any subtree.</p> "noContentSuffixes" variable for an alternative
approach which indexes the file names. Can be
redefined for any subtree.</p>
</dd> </dd>
<dt><a name= <dt><a name=
"RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDNAMES-" id= "RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDNAMES-" id=
@ -9013,23 +9015,33 @@ hasextract = False
<dd> <dd>
<p>Absolute paths we should not go into. <p>Absolute paths we should not go into.
Space-separated list of wildcard expressions for Space-separated list of wildcard expressions for
absolute filesystem paths. Must be defined at the absolute filesystem paths (for files or
directories). The variable must be defined at the
top level of the configuration file, not in a top level of the configuration file, not in a
subsection. Can contain files and directories. subsection.</p>
The database and configuration directories will <p>Any value in the list must be textually
automatically be added. The expressions are consistent with the values in topdirs, no
matched using 'fnmatch(3)' with the FNM_PATHNAME attempts are made to resolve symbolic links. In
flag set by default. This means that '/' practise, if, as is frequently the case, /home is
characters must be matched explicitly. You can a link to /usr/home, your default topdirs will
set 'skippedPathsFnmPathname' to 0 to disable the have a single entry '~' which will be translated
use of FNM_PATHNAME (meaning that '/*/dir3' will to '/home/yourlogin'. In this case, any
match '/dir1/dir2/dir3'). The default value skippedPaths entry should start with
contains the usual mount point for removable '/home/yourlogin' *not* with
media to remind you that it is a bad idea to have '/usr/home/yourlogin'.</p>
Recoll work on these (esp. with the monitor: <p>The index and configuration directories will
media gets indexed on mount, all data gets erased automatically be added to the list.</p>
on unmount). Explicitly adding '/media/xxx' to <p>The expressions are matched using 'fnmatch(3)'
the 'topdirs' variable will override this.</p> with the FNM_PATHNAME flag set by default. This
means that '/' characters must be matched
explicitly. You can set 'skippedPathsFnmPathname'
to 0 to disable the use of FNM_PATHNAME (meaning
that '/*/dir3' will match '/dir1/dir2/dir3').</p>
<p>The default value contains the usual mount
point for removable media to remind you that it
is in most cases a bad idea to have Recoll work
on these Explicitly adding '/media/xxx' to the
'topdirs' variable will override this.</p>
</dd> </dd>
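The effect of FNM_PATHNAME on skippedPaths matching can be illustrated with a short Python sketch. Python's fnmatch module has no FNM_PATHNAME flag, so the sketch emulates it by matching path components one by one; path_matches is a hypothetical helper, not part of Recoll, and it ignores finer points of fnmatch(3) such as bracket expressions containing '/'.

```python
import fnmatch


def path_matches(pattern, path, fnm_pathname=True):
    """Approximate fnmatch(3) with or without FNM_PATHNAME."""
    if not fnm_pathname:
        # Without FNM_PATHNAME, '*' is free to match '/' characters.
        return fnmatch.fnmatchcase(path, pattern)
    # With FNM_PATHNAME, '/' must be matched explicitly, so pattern and
    # path need the same number of components, each matching in turn.
    pat_parts = pattern.split("/")
    path_parts = path.split("/")
    if len(pat_parts) != len(path_parts):
        return False
    return all(
        fnmatch.fnmatchcase(part, pat)
        for part, pat in zip(path_parts, pat_parts)
    )


print(path_matches("/*/dir3", "/dir1/dir2/dir3"))
print(path_matches("/*/dir3", "/dir1/dir2/dir3", fnm_pathname=False))
```

The second call corresponds to setting skippedPathsFnmPathname to 0, the case where '/*/dir3' matches '/dir1/dir2/dir3' as stated in the description.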
<dt><a name= <dt><a name=
"RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDPATHSFNMPATHNAME" "RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDPATHSFNMPATHNAME"
@ -9271,36 +9283,37 @@ hasextract = False
<p>Decide if we store the documents' text content <p>Decide if we store the documents' text content
in the index. Storing the text allows extracting in the index. Storing the text allows extracting
snippets from it at query time, instead of snippets from it at query time, instead of
building them from index position data. Newer building them from index position data.</p>
Xapian index formats have rendered our use of <p>Newer Xapian index formats have rendered our
positions list unacceptably slow in some cases. use of positions list unacceptably slow in some
The last Xapian index format with good cases. The last Xapian index format with good
performance for the old method is Chert, which is performance for the old method is Chert, which is
default for 1.2, still supported but not default default for 1.2, still supported but not default
in 1.4 and will be dropped in 1.6. The stored in 1.4 and will be dropped in 1.6.</p>
document text is translated from its original <p>The stored document text is translated from
format to UTF-8 plain text, but not stripped of its original format to UTF-8 plain text, but not
upper-case, diacritics, or punctuation signs. stripped of upper-case, diacritics, or
Storing it increases the index size by 10-20% punctuation signs. Storing it increases the index
typically, but also allows for nicer snippets, so size by 10-20% typically, but also allows for
it may be worth enabling it even if not strictly nicer snippets, so it may be worth enabling it
needed for performance if you can afford the even if not strictly needed for performance if
space. The variable only has an effect when you can afford the space.</p>
creating an index, meaning that the xapiandb <p>The variable only has an effect when creating
directory must not exist yet. Its exact effect an index, meaning that the xapiandb directory
depends on the Xapian version. For Xapian 1.4, if must not exist yet. Its exact effect depends on
the variable is set to 0, the Chert format will the Xapian version.</p>
be used, and the text will not be stored. If the <p>For Xapian 1.4, if the variable is set to 0,
variable is 1, Glass will be used, and the text the Chert format will be used, and the text will
stored. For Xapian 1.2, and for versions after not be stored. If the variable is 1, Glass will
1.5 and newer, the index format is always the be used, and the text stored.</p>
default, but the variable controls if the text is <p>For Xapian 1.2, and for versions after 1.5 and
stored or not, and the abstract generation newer, the index format is always the default,
method. With Xapian 1.5 and later, and the but the variable controls if the text is stored
variable set to 0, abstract generation may be or not, and the abstract generation method. With
very slow, but this setting may still be useful Xapian 1.5 and later, and the variable set to 0,
to save space if you do not use abstract abstract generation may be very slow, but this
generation at all.</p> setting may still be useful to save space if you
do not use abstract generation at all.</p>
</dd> </dd>
<dt><a name= <dt><a name=
"RCL.INSTALL.CONFIG.RECOLLCONF.NONUMBERS" id= "RCL.INSTALL.CONFIG.RECOLLCONF.NONUMBERS" id=
@ -9425,19 +9438,23 @@ hasextract = False
should be specified, as appartenance to the list should be specified, as appartenance to the list
will turn-off both standard accent and case will turn-off both standard accent and case
processing. The value is global and affects both processing. The value is global and affects both
indexing and querying. Examples: Swedish: indexing and querying. We also convert a few
unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe confusing Unicode characters (quotes, hyphen) to
æae Æae ffff fifi flfl åå Åå . German: their ASCII equivalent to avoid "invisible"
unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe search failures.</p>
æae Æae ffff fifi flfl . French: you probably want <p>Examples: Swedish: unac_except_trans = ää Ää
to decompose oe and ae and nobody would type a öö Öö üü Üü ßss œoe Œoe æae Æae ffff fifi flfl åå Åå
German ß unac_except_trans = ßss œoe Œoe æae Æae ' ❜' ʼ' - . German: unac_except_trans = ää Ää
ffff fifi flfl . The default for all until someone öö Öö üü Üü ßss œoe Œoe æae Æae ffff fifi flfl ' ❜'
protests follows. These decompositions are not ʼ' - . French: you probably want to decompose oe
performed by unac, but it is unlikely that and ae and nobody would type a German ß
someone would type the composed forms in a unac_except_trans = ßss œoe Œoe æae Æae ffff fifi
flfl ' ❜' ʼ' - . The default for all until
someone protests follows. These decompositions
are not performed by unac, but it is unlikely
that someone would type the composed forms in a
search. unac_except_trans = ßss œoe Œoe æae Æae search. unac_except_trans = ßss œoe Œoe æae Æae
ffff fifi flfl</p> ffff fifi flfl ' ❜' ʼ' -</p>
</dd> </dd>
<dt><a name= <dt><a name=
"RCL.INSTALL.CONFIG.RECOLLCONF.MAILDEFCHARSET" id= "RCL.INSTALL.CONFIG.RECOLLCONF.MAILDEFCHARSET" id=

View File

@ -33,16 +33,21 @@ topdirs = ~
# <brief>Files and directories which should be ignored.</brief> # <brief>Files and directories which should be ignored.</brief>
# #
# <descr> White space separated list of wildcard patterns (simple ones, not paths, must contain no # <descr> White space separated list of wildcard patterns (simple ones, not paths, must contain no
# '/' characters), which will be tested against file and directory names. Have a look at the default # '/' characters), which will be tested against file and directory names.
# configuration for the initial value, some entries may not suit your situation. The easiest way to #
# see it is through the GUI Index configuration "local parameters" panel. The list in the default # Have a look at the default configuration for the initial value, some entries may not suit your
# configuration does not exclude hidden directories (names beginning with a dot), which means that # situation. The easiest way to see it is through the GUI Index configuration "local parameters"
# it may index quite a few things that you do not want. On the other hand, email user agents like # panel.
# Thunderbird usually store messages in hidden directories, and you probably want this indexed. One #
# possible solution is to have ".*" in "skippedNames", and add things like "~/.thunderbird" # The list in the default configuration does not exclude hidden directories (names beginning with a
# "~/.evolution" to "topdirs". Not even the file names are indexed for patterns in this list, see # dot), which means that it may index quite a few things that you do not want. On the other hand,
# the "noContentSuffixes" variable for an alternative approach which indexes the file names. Can be # email user agents like Thunderbird usually store messages in hidden directories, and you probably
# redefined for any subtree.</descr> # want this indexed. One possible solution is to have ".*" in "skippedNames", and add things like
# "~/.thunderbird" "~/.evolution" to "topdirs".
#
# Not even the file names are indexed for patterns in this list, see the "noContentSuffixes"
# variable for an alternative approach which indexes the file names. Can be redefined for any
# subtree.</descr>
# #
#</var> #</var>
skippedNames = #* CVS Cache cache* .cache caughtspam tmp \ skippedNames = #* CVS Cache cache* .cache caughtspam tmp \
@@ -104,19 +109,26 @@ noContentSuffixes+ =
# <var name="skippedPaths" type="string"> # <var name="skippedPaths" type="string">
# #
# <brief>Absolute paths we should not go into.</brief> # <brief>Absolute paths we should not go into.</brief>
# <descr>Space-separated list of wildcard expressions for absolute #
# filesystem paths. Must be defined at the top level of the configuration # <descr>Space-separated list of wildcard expressions for absolute filesystem paths (for files or
# file, not in a subsection. Can contain files and directories. The database and # directories). The variable must be defined at the top level of the configuration file, not in a
# configuration directories will automatically be added. The expressions # subsection.
# are matched using 'fnmatch(3)' with the FNM_PATHNAME flag set by #
# default. This means that '/' characters must be matched explicitly. You # Any value in the list must be textually consistent with the values in topdirs: no attempt is
# can set 'skippedPathsFnmPathname' to 0 to disable the use of FNM_PATHNAME # made to resolve symbolic links. In practice, if, as is frequently the case, /home is a link to
# (meaning that '/*/dir3' will match '/dir1/dir2/dir3'). The default value # /usr/home, your default topdirs will have a single entry '~' which will be translated to
# contains the usual mount point for removable media to remind you that it # '/home/yourlogin'. In this case, any skippedPaths entry should start with '/home/yourlogin' *not*
# is a bad idea to have Recoll work on these (esp. with the monitor: media # with '/usr/home/yourlogin'.
# gets indexed on mount, all data gets erased on unmount). Explicitly #
# adding '/media/xxx' to the 'topdirs' variable will override # The index and configuration directories will automatically be added to the list.
# this.</descr></var> #
# The expressions are matched using 'fnmatch(3)' with the FNM_PATHNAME flag set by default. This
# means that '/' characters must be matched explicitly. You can set 'skippedPathsFnmPathname' to 0
# to disable the use of FNM_PATHNAME (meaning that '/*/dir3' will match '/dir1/dir2/dir3').
#
# The default value contains the usual mount point for removable media to remind you that it is in
# most cases a bad idea to have Recoll work on these Explicitly adding '/media/xxx' to the 'topdirs'
# variable will override this.</descr></var>
skippedPaths = /media skippedPaths = /media
# <var name="skippedPathsFnmPathname" type="bool"><brief>Set to 0 to # <var name="skippedPathsFnmPathname" type="bool"><brief>Set to 0 to