updated recoll.conf doc

Jean-Francois Dockes 2019-02-04 15:47:14 +01:00
parent 541c407033
commit c7b2587f40
3 changed files with 101 additions and 34 deletions

@ -126,15 +126,14 @@ types. Lets you exclude some types from indexing. MIME type
names should be taken from the mimemap file (the values may be different
from xdg-mime or file -i output in some cases). Can be redefined for
subtrees.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOMD5MIMETYPES">
<term><varname>nomd5mimetypes</varname></term>
<listitem><para>Don't compute md5 for
these types. md5 checksums are used only for deduplicating
results, and can be very expensive to compute on multimedia or other big
files. This list lets you turn off md5 computation for selected types. It
is global (no redefinition for subtrees). At the moment, it only has an
effect for external handlers (exec and execm). The file types can be
specified by listing either MIME types (e.g. audio/mpeg) or handler names
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOMD5TYPES">
<term><varname>nomd5types</varname></term>
<listitem><para>Don't compute md5 for these types. md5 checksums are used only for deduplicating results, and can be
very expensive to compute on multimedia or other big files. This list
lets you turn off md5 computation for selected types. It is global (no
redefinition for subtrees). At the moment, it only has an effect for
external handlers (exec and execm). The file types can be specified by
listing either MIME types (e.g. audio/mpeg) or handler names
(e.g. rclaudio).</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.COMPRESSEDFILEMAXKBS">
<term><varname>compressedfilemaxkbs</varname></term>
@ -244,6 +243,15 @@ for a subtree.</para></listitem></varlistentry>
'coworker' also when the input is 'co-worker'. This is new
in version 1.22, and on by default. Setting the variable to off allows
restoring the previous behaviour.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.BACKSLASHASLETTER">
<term><varname>backslashasletter</varname></term>
<listitem><para>Process backslash as a normal letter. This may make sense for people wanting to index TeX commands as
such but is not of much general use.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MAXTERMLENGTH">
<term><varname>maxtermlength</varname></term>
<listitem><para>Maximum term length. Words longer than this will be discarded.
The default is 40 and used to be hard-coded, but it can now be
adjusted. You need an index reset if you change the value.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOCJK">
<term><varname>nocjk</varname></term>
<listitem><para>Decides if specific East Asian
@ -371,8 +379,8 @@ subpath under cachedir.</para></listitem></varlistentry>
over which we stop indexing. The value is a percentage,
corresponding to what the "Capacity" df output column shows. The default
value is 0, meaning no checking.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.XAPIANDB">
<term><varname>xapiandb</varname></term>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.DBDIR">
<term><varname>dbdir</varname></term>
<listitem><para>Xapian database directory
location. This will be created on first indexing. If the
value is not an absolute path, it will be interpreted as relative to
@ -447,8 +455,8 @@ $XAPIAN_FLUSH_THRESHOLD documents created, modified or deleted: as memory
usage depends on average document size, not only document count, the
Xapian approach is not very useful, and you should let Recoll manage
the flushes. The program compiled value is 0. The configured default
value (from this file) is 10 MB, and will be too low in many cases (it is
chosen to conserve memory). If you are looking
value (from this file) is now 50 MB, and should be ok in many cases.
You can set it as low as 10 to conserve memory, but if you are looking
for maximum speed, you may want to experiment with values between 20 and
200. In my experience, values beyond this are always counterproductive. If
you find otherwise, please drop me a note.</para></listitem></varlistentry>
@ -677,6 +685,11 @@ with possibly meaning-altering missing words.</para></listitem></varlistentry>
<listitem><para>Attempt OCR of PDF files with no text content if both tesseract and
pdftoppm are installed. The default is off because OCR is so
very slow.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.PDFOCRLANG">
<term><varname>pdfocrlang</varname></term>
<listitem><para>Language to assume for PDF OCR. This is very important for having a reasonable rate of errors
with tesseract. This can also be set through a configuration variable
or directory-local parameters. See the rclpdf.py script.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.PDFATTACH">
<term><varname>pdfattach</varname></term>
<listitem><para>Enable PDF attachment extraction by executing pdftk (if

@ -8300,8 +8300,8 @@ for i in range(nres):
cases). Can be redefined for subtrees.</p>
</dd>
<dt><a name=
"RCL.INSTALL.CONFIG.RECOLLCONF.NOMD5MIMETYPES" id=
"RCL.INSTALL.CONFIG.RECOLLCONF.NOMD5MIMETYPES"></a><span class="term"><code class="varname">nomd5mimetypes</code></span></dt>
"RCL.INSTALL.CONFIG.RECOLLCONF.NOMD5TYPES" id=
"RCL.INSTALL.CONFIG.RECOLLCONF.NOMD5TYPES"></a><span class="term"><code class="varname">nomd5types</code></span></dt>
<dd>
<p>Don't compute md5 for these types. md5
checksums are used only for deduplicating
@ -8496,6 +8496,25 @@ for i in range(nres):
1.22, and on by default. Setting the variable to
off allows restoring the previous behaviour.</p>
</dd>
<dt><a name=
"RCL.INSTALL.CONFIG.RECOLLCONF.BACKSLASHASLETTER"
id=
"RCL.INSTALL.CONFIG.RECOLLCONF.BACKSLASHASLETTER"></a><span class="term"><code class="varname">backslashasletter</code></span></dt>
<dd>
<p>Process backslash as a normal letter. This may
make sense for people wanting to index TeX
commands as such but is not of much general
use.</p>
</dd>
<dt><a name=
"RCL.INSTALL.CONFIG.RECOLLCONF.MAXTERMLENGTH" id=
"RCL.INSTALL.CONFIG.RECOLLCONF.MAXTERMLENGTH"></a><span class="term"><code class="varname">maxtermlength</code></span></dt>
<dd>
<p>Maximum term length. Words longer than this
will be discarded. The default is 40 and used to
be hard-coded, but it can now be adjusted. You
need an index reset if you change the value.</p>
</dd>
<dt><a name="RCL.INSTALL.CONFIG.RECOLLCONF.NOCJK"
id=
"RCL.INSTALL.CONFIG.RECOLLCONF.NOCJK"></a><span class="term"><code class="varname">nocjk</code></span></dt>
@ -8696,9 +8715,9 @@ for i in range(nres):
column shows. The default value is 0, meaning no
checking.</p>
</dd>
<dt><a name=
"RCL.INSTALL.CONFIG.RECOLLCONF.XAPIANDB" id=
"RCL.INSTALL.CONFIG.RECOLLCONF.XAPIANDB"></a><span class="term"><code class="varname">xapiandb</code></span></dt>
<dt><a name="RCL.INSTALL.CONFIG.RECOLLCONF.DBDIR"
id=
"RCL.INSTALL.CONFIG.RECOLLCONF.DBDIR"></a><span class="term"><code class="varname">dbdir</code></span></dt>
<dd>
<p>Xapian database directory location. This will
be created on first indexing. If the value is not
@ -8840,13 +8859,13 @@ for i in range(nres):
the Xapian approach is not very useful, and
you should let Recoll manage the flushes. The
program compiled value is 0. The configured
default value (from this file) is 10 MB, and will
be too low in many cases (it is chosen to
conserve memory). If you are looking for maximum
speed, you may want to experiment with values
between 20 and 200. In my experience, values
beyond this are always counterproductive. If you
find otherwise, please drop me a note.</p>
default value (from this file) is now 50 MB, and
should be ok in many cases. You can set it as low
as 10 to conserve memory, but if you are looking
for maximum speed, you may want to experiment
with values between 20 and 200. In my experience,
values beyond this are always counterproductive.
If you find otherwise, please drop me a note.</p>
</dd>
<dt><a name=
"RCL.INSTALL.CONFIG.RECOLLCONF.FILTERMAXSECONDS"
@ -9285,6 +9304,16 @@ for i in range(nres):
default is off because OCR is so very slow.</p>
</dd>
<dt><a name=
"RCL.INSTALL.CONFIG.RECOLLCONF.PDFOCRLANG" id=
"RCL.INSTALL.CONFIG.RECOLLCONF.PDFOCRLANG"></a><span class="term"><code class="varname">pdfocrlang</code></span></dt>
<dd>
<p>Language to assume for PDF OCR. This is very
important for having a reasonable rate of errors
with tesseract. This can also be set through a
configuration variable or directory-local
parameters. See the rclpdf.py script.</p>
</dd>
<dt><a name=
"RCL.INSTALL.CONFIG.RECOLLCONF.PDFATTACH" id=
"RCL.INSTALL.CONFIG.RECOLLCONF.PDFATTACH"></a><span class="term"><code class="varname">pdfattach</code></span></dt>
<dd>

@ -168,14 +168,16 @@ skippedPaths = /media
# subtrees.</descr></var>
#excludedmimetypes =
# <var name="nomd5mimetypes" type="string"><brief>Don't compute md5 for
# these types.</brief><descr>md5 checksums are used only for deduplicating
# results, and can be very expensive to compute on multimedia or other big
# files. This list lets you turn off md5 computation for selected types. It
# is global (no redefinition for subtrees). At the moment, it only has an
# effect for external handlers (exec and execm). The file types can be
# specified by listing either MIME types (e.g. audio/mpeg) or handler names
# (e.g. rclaudio).</descr></var>
# <var name="nomd5types" type="string">
# <brief>Don't compute md5 for these types.</brief>
# <descr>md5 checksums are used only for deduplicating results, and can be
# very expensive to compute on multimedia or other big files. This list
# lets you turn off md5 computation for selected types. It is global (no
# redefinition for subtrees). At the moment, it only has an effect for
# external handlers (exec and execm). The file types can be specified by
# listing either MIME types (e.g. audio/mpeg) or handler names
# (e.g. rclaudio).</descr>
# </var>
nomd5types = rclaudio
# <var name="compressedfilemaxkbs" type="int"><brief>Size limit for compressed
@ -299,6 +301,21 @@ indexStoreDocText = 1
# restoring the previous behaviour.</descr></var>
#dehyphenate = 1
# <var name="backslashasletter" type="bool">
# <brief>Process backslash as a normal letter.</brief>
# <descr>This may make sense for people wanting to index TeX commands as
# such but is not of much general use.</descr>
# </var>
#backslashasletter = 0
# <var name="maxtermlength" type="int" values="10 200 40">
# <brief>Maximum term length.</brief>
# <descr>Words longer than this will be discarded.
# The default is 40 and used to be hard-coded, but it can now be
# adjusted. You need an index reset if you change the value.</descr>
# </var>
#maxtermlength = 40
# <var name="nocjk" type="bool"><brief>Decides if specific East Asian
# (Chinese, Korean, Japanese) characters/word splitting is turned
# off.</brief><descr>This will save a small amount of CPU if you have no CJK
@ -435,7 +452,7 @@ noxattrfields = 0
# value is 0, meaning no checking.</descr></var>
maxfsoccuppc = 0
# <var name="xapiandb" type="dfn"><brief>Xapian database directory
# <var name="dbdir" type="dfn"><brief>Xapian database directory
# location.</brief><descr>This will be created on first indexing. If the
# value is not an absolute path, it will be interpreted as relative to
# cachedir if set, or the configuration directory (-c argument or
@ -837,6 +854,14 @@ snippetMaxPosWalk = 1000000
# very slow.</descr></var>
#pdfocr = 0
# <var name="pdfocrlang" type="string">
# <brief>Language to assume for PDF OCR.</brief>
# <descr>This is very important for having a reasonable rate of errors
# with tesseract. This can also be set through a configuration variable
# or directory-local parameters. See the rclpdf.py script.</descr>
# </var>
#pdfocrlang = eng
# <var name="pdfattach" type="bool">
#
# <brief>Enable PDF attachment extraction by executing pdftk (if