updated recoll.conf doc

Jean-Francois Dockes 2019-02-04 15:47:14 +01:00
parent 541c407033
commit c7b2587f40
3 changed files with 101 additions and 34 deletions
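
As a rough illustration of what the documentation changes below describe, a personal recoll.conf could set the new and renamed variables as follows. This is only a sketch: the variable names and the commented defaults come from the diff itself, while turning on `pdfocr` is an illustrative choice and not a recommendation.

```
# Skip md5 computation for files processed by the rclaudio handler
# (this is the value shipped in the sample file).
nomd5types = rclaudio

# Keep the default: backslash is not treated as a normal letter.
backslashasletter = 0

# Discard words longer than 40 characters (the documented default).
# Changing this value requires an index reset.
maxtermlength = 40

# Illustrative: attempt OCR on PDFs with no text content (needs tesseract
# and pdftoppm), and tell tesseract to assume English.
pdfocr = 1
pdfocrlang = eng
```

Comments are kept on their own lines here, as in the sample file shown in the diff.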

@ -126,15 +126,14 @@ types. Lets you exclude some types from indexing. MIME type
names should be taken from the mimemap file (the values may be different names should be taken from the mimemap file (the values may be different
from xdg-mime or file -i output in some cases) Can be redefined for from xdg-mime or file -i output in some cases) Can be redefined for
subtrees.</para></listitem></varlistentry> subtrees.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOMD5MIMETYPES"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOMD5TYPES">
<term><varname>nomd5mimetypes</varname></term> <term><varname>nomd5types</varname></term>
<listitem><para>Don't compute md5 for <listitem><para>Don't compute md5 for these types. md5 checksums are used only for deduplicating results, and can be
these types. md5 checksums are used only for deduplicating very expensive to compute on multimedia or other big files. This list
results, and can be very expensive to compute on multimedia or other big lets you turn off md5 computation for selected types. It is global (no
files. This list lets you turn off md5 computation for selected types. It redefinition for subtrees). At the moment, it only has an effect for
is global (no redefinition for subtrees). At the moment, it only has an external handlers (exec and execm). The file types can be specified by
effect for external handlers (exec and execm). The file types can be listing either MIME types (e.g. audio/mpeg) or handler names
specified by listing either MIME types (e.g. audio/mpeg) or handler names
(e.g. rclaudio).</para></listitem></varlistentry> (e.g. rclaudio).</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.COMPRESSEDFILEMAXKBS"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.COMPRESSEDFILEMAXKBS">
<term><varname>compressedfilemaxkbs</varname></term> <term><varname>compressedfilemaxkbs</varname></term>
@ -244,6 +243,15 @@ for a subtree.</para></listitem></varlistentry>
'coworker' also when the input is 'co-worker'. This is new 'coworker' also when the input is 'co-worker'. This is new
in version 1.22, and on by default. Setting the variable to off allows in version 1.22, and on by default. Setting the variable to off allows
restoring the previous behaviour.</para></listitem></varlistentry> restoring the previous behaviour.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.BACKSLASHASLETTER">
<term><varname>backslashasletter</varname></term>
<listitem><para>Process backslash as normal letter. This may make sense for people wanting to index TeX commands as
such but is not of much general use.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MAXTERMLENGTH">
<term><varname>maxtermlength</varname></term>
<listitem><para>Maximum term length. Words longer than this will be discarded.
The default is 40 and used to be hard-coded, but it can now be
adjusted. You need an index reset if you change the value.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOCJK"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOCJK">
<term><varname>nocjk</varname></term> <term><varname>nocjk</varname></term>
<listitem><para>Decides if specific East Asian <listitem><para>Decides if specific East Asian
@ -371,8 +379,8 @@ subpath under cachedir.</para></listitem></varlistentry>
over which we stop indexing. The value is a percentage, over which we stop indexing. The value is a percentage,
corresponding to what the "Capacity" df output column shows. The default corresponding to what the "Capacity" df output column shows. The default
value is 0, meaning no checking.</para></listitem></varlistentry> value is 0, meaning no checking.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.XAPIANDB"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.DBDIR">
<term><varname>xapiandb</varname></term> <term><varname>dbdir</varname></term>
<listitem><para>Xapian database directory <listitem><para>Xapian database directory
location. This will be created on first indexing. If the location. This will be created on first indexing. If the
value is not an absolute path, it will be interpreted as relative to value is not an absolute path, it will be interpreted as relative to
@ -447,8 +455,8 @@ $XAPIAN_FLUSH_THRESHOLD documents created, modified or deleted: as memory
usage depends on average document size, not only document count, the usage depends on average document size, not only document count, the
Xapian approach is not very useful, and you should let Recoll manage Xapian approach is not very useful, and you should let Recoll manage
the flushes. The program compiled value is 0. The configured default the flushes. The program compiled value is 0. The configured default
value (from this file) is 10 MB, and will be too low in many cases (it is value (from this file) is now 50 MB, and should be ok in many cases.
chosen to conserve memory). If you are looking You can set it as low as 10 to conserve memory, but if you are looking
for maximum speed, you may want to experiment with values between 20 and for maximum speed, you may want to experiment with values between 20 and
200. In my experience, values beyond this are always counterproductive. If 200. In my experience, values beyond this are always counterproductive. If
you find otherwise, please drop me a note.</para></listitem></varlistentry> you find otherwise, please drop me a note.</para></listitem></varlistentry>
@ -677,6 +685,11 @@ with possibly meaning-altering missing words.</para></listitem></varlistentry>
<listitem><para>Attempt OCR of PDF files with no text content if both tesseract and <listitem><para>Attempt OCR of PDF files with no text content if both tesseract and
pdftoppm are installed. The default is off because OCR is so pdftoppm are installed. The default is off because OCR is so
very slow.</para></listitem></varlistentry> very slow.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.PDFOCRLANG">
<term><varname>pdfocrlang</varname></term>
<listitem><para>Language to assume for PDF OCR. This is very important for having a reasonable rate of errors
with tesseract. This can also be set through a configuration variable
or directory-local parameters. See the rclpdf.py script.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.PDFATTACH"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.PDFATTACH">
<term><varname>pdfattach</varname></term> <term><varname>pdfattach</varname></term>
<listitem><para>Enable PDF attachment extraction by executing pdftk (if <listitem><para>Enable PDF attachment extraction by executing pdftk (if

@ -8300,8 +8300,8 @@ for i in range(nres):
cases) Can be redefined for subtrees.</p> cases) Can be redefined for subtrees.</p>
</dd> </dd>
<dt><a name= <dt><a name=
"RCL.INSTALL.CONFIG.RECOLLCONF.NOMD5MIMETYPES" id= "RCL.INSTALL.CONFIG.RECOLLCONF.NOMD5TYPES" id=
"RCL.INSTALL.CONFIG.RECOLLCONF.NOMD5MIMETYPES"></a><span class="term"><code class="varname">nomd5mimetypes</code></span></dt> "RCL.INSTALL.CONFIG.RECOLLCONF.NOMD5TYPES"></a><span class="term"><code class="varname">nomd5types</code></span></dt>
<dd> <dd>
<p>Don't compute md5 for these types. md5 <p>Don't compute md5 for these types. md5
checksums are used only for deduplicating checksums are used only for deduplicating
@ -8496,6 +8496,25 @@ for i in range(nres):
1.22, and on by default. Setting the variable to 1.22, and on by default. Setting the variable to
off allows restoring the previous behaviour.</p> off allows restoring the previous behaviour.</p>
</dd> </dd>
<dt><a name=
"RCL.INSTALL.CONFIG.RECOLLCONF.BACKSLASHASLETTER"
id=
"RCL.INSTALL.CONFIG.RECOLLCONF.BACKSLASHASLETTER"></a><span class="term"><code class="varname">backslashasletter</code></span></dt>
<dd>
<p>Process backslash as normal letter. This may
make sense for people wanting to index TeX
commands as such but is not of much general
use.</p>
</dd>
<dt><a name=
"RCL.INSTALL.CONFIG.RECOLLCONF.MAXTERMLENGTH" id=
"RCL.INSTALL.CONFIG.RECOLLCONF.MAXTERMLENGTH"></a><span class="term"><code class="varname">maxtermlength</code></span></dt>
<dd>
<p>Maximum term length. Words longer than this
will be discarded. The default is 40 and used to
be hard-coded, but it can now be adjusted. You
need an index reset if you change the value.</p>
</dd>
<dt><a name="RCL.INSTALL.CONFIG.RECOLLCONF.NOCJK" <dt><a name="RCL.INSTALL.CONFIG.RECOLLCONF.NOCJK"
id= id=
"RCL.INSTALL.CONFIG.RECOLLCONF.NOCJK"></a><span class="term"><code class="varname">nocjk</code></span></dt> "RCL.INSTALL.CONFIG.RECOLLCONF.NOCJK"></a><span class="term"><code class="varname">nocjk</code></span></dt>
@ -8696,9 +8715,9 @@ for i in range(nres):
column shows. The default value is 0, meaning no column shows. The default value is 0, meaning no
checking.</p> checking.</p>
</dd> </dd>
<dt><a name= <dt><a name="RCL.INSTALL.CONFIG.RECOLLCONF.DBDIR"
"RCL.INSTALL.CONFIG.RECOLLCONF.XAPIANDB" id= id=
"RCL.INSTALL.CONFIG.RECOLLCONF.XAPIANDB"></a><span class="term"><code class="varname">xapiandb</code></span></dt> "RCL.INSTALL.CONFIG.RECOLLCONF.DBDIR"></a><span class="term"><code class="varname">dbdir</code></span></dt>
<dd> <dd>
<p>Xapian database directory location. This will <p>Xapian database directory location. This will
be created on first indexing. If the value is not be created on first indexing. If the value is not
@ -8840,13 +8859,13 @@ for i in range(nres):
the Xapian approach is not very useful, and the Xapian approach is not very useful, and
you should let Recoll manage the flushes. The you should let Recoll manage the flushes. The
program compiled value is 0. The configured program compiled value is 0. The configured
default value (from this file) is 10 MB, and will default value (from this file) is now 50 MB, and
be too low in many cases (it is chosen to should be ok in many cases. You can set it as low
conserve memory). If you are looking for maximum as 10 to conserve memory, but if you are looking
speed, you may want to experiment with values for maximum speed, you may want to experiment
between 20 and 200. In my experience, values with values between 20 and 200. In my experience,
beyond this are always counterproductive. If you values beyond this are always counterproductive.
find otherwise, please drop me a note.</p> If you find otherwise, please drop me a note.</p>
</dd> </dd>
<dt><a name= <dt><a name=
"RCL.INSTALL.CONFIG.RECOLLCONF.FILTERMAXSECONDS" "RCL.INSTALL.CONFIG.RECOLLCONF.FILTERMAXSECONDS"
@ -9285,6 +9304,16 @@ for i in range(nres):
default is off because OCR is so very slow.</p> default is off because OCR is so very slow.</p>
</dd> </dd>
<dt><a name= <dt><a name=
"RCL.INSTALL.CONFIG.RECOLLCONF.PDFOCRLANG" id=
"RCL.INSTALL.CONFIG.RECOLLCONF.PDFOCRLANG"></a><span class="term"><code class="varname">pdfocrlang</code></span></dt>
<dd>
<p>Language to assume for PDF OCR. This is very
important for having a reasonable rate of errors
with tesseract. This can also be set through a
configuration variable or directory-local
parameters. See the rclpdf.py script.</p>
</dd>
<dt><a name=
"RCL.INSTALL.CONFIG.RECOLLCONF.PDFATTACH" id= "RCL.INSTALL.CONFIG.RECOLLCONF.PDFATTACH" id=
"RCL.INSTALL.CONFIG.RECOLLCONF.PDFATTACH"></a><span class="term"><code class="varname">pdfattach</code></span></dt> "RCL.INSTALL.CONFIG.RECOLLCONF.PDFATTACH"></a><span class="term"><code class="varname">pdfattach</code></span></dt>
<dd> <dd>

@ -168,14 +168,16 @@ skippedPaths = /media
# subtrees.</descr></var> # subtrees.</descr></var>
#excludedmimetypes = #excludedmimetypes =
# <var name="nomd5mimetypes" type="string"><brief>Don't compute md5 for # <var name="nomd5types" type="string">
# these types.</brief><descr>md5 checksums are used only for deduplicating # <brief>Don't compute md5 for these types.</brief>
# results, and can be very expensive to compute on multimedia or other big # <descr>md5 checksums are used only for deduplicating results, and can be
# files. This list lets you turn off md5 computation for selected types. It # very expensive to compute on multimedia or other big files. This list
# is global (no redefinition for subtrees). At the moment, it only has an # lets you turn off md5 computation for selected types. It is global (no
# effect for external handlers (exec and execm). The file types can be # redefinition for subtrees). At the moment, it only has an effect for
# specified by listing either MIME types (e.g. audio/mpeg) or handler names # external handlers (exec and execm). The file types can be specified by
# (e.g. rclaudio).</descr></var> # listing either MIME types (e.g. audio/mpeg) or handler names
# (e.g. rclaudio).</descr>
# </var>
nomd5types = rclaudio nomd5types = rclaudio
# <var name="compressedfilemaxkbs" type="int"><brief>Size limit for compressed # <var name="compressedfilemaxkbs" type="int"><brief>Size limit for compressed
@ -299,6 +301,21 @@ indexStoreDocText = 1
# restoring the previous behaviour.</descr></var> # restoring the previous behaviour.</descr></var>
#dehyphenate = 1 #dehyphenate = 1
# <var name="backslashasletter" type="bool">
# <brief>Process backslash as normal letter</brief>
# <descr>This may make sense for people wanting to index TeX commands as
# such but is not of much general use.</descr>
# </var>
#backslashasletter = 0
# <var name="maxtermlength" type="int" values="10 200 40">
# <brief>Maximum term length.</brief>
# <descr>Words longer than this will be discarded.
# The default is 40 and used to be hard-coded, but it can now be
# adjusted. You need an index reset if you change the value.</descr>
# </var>
#maxtermlength = 40
# <var name="nocjk" type="bool"><brief>Decides if specific East Asian # <var name="nocjk" type="bool"><brief>Decides if specific East Asian
# (Chinese Korean Japanese) characters/word splitting is turned # (Chinese Korean Japanese) characters/word splitting is turned
# off.</brief><descr>This will save a small amount of CPU if you have no CJK # off.</brief><descr>This will save a small amount of CPU if you have no CJK
@ -435,7 +452,7 @@ noxattrfields = 0
# value is 0, meaning no checking.</descr></var> # value is 0, meaning no checking.</descr></var>
maxfsoccuppc = 0 maxfsoccuppc = 0
# <var name="xapiandb" type="dfn"><brief>Xapian database directory # <var name="dbdir" type="dfn"><brief>Xapian database directory
# location.</brief><descr>This will be created on first indexing. If the # location.</brief><descr>This will be created on first indexing. If the
# value is not an absolute path, it will be interpreted as relative to # value is not an absolute path, it will be interpreted as relative to
# cachedir if set, or the configuration directory (-c argument or # cachedir if set, or the configuration directory (-c argument or
@ -837,6 +854,14 @@ snippetMaxPosWalk = 1000000
# very slow.</descr></var> # very slow.</descr></var>
#pdfocr = 0 #pdfocr = 0
# <var name="pdfocrlang" type="string">
# <brief>Language to assume for PDF OCR.</brief>
# <descr>This is very important for having a reasonable rate of errors
# with tesseract. This can also be set through a configuration variable
# or directory-local parameters. See the rclpdf.py script.</descr>
# </var>
#pdfocrlang = eng
# <var name="pdfattach" type="bool"> # <var name="pdfattach" type="bool">
# #
# <brief>Enable PDF attachment extraction by executing pdftk (if # <brief>Enable PDF attachment extraction by executing pdftk (if