updated recoll.conf doc
parent 541c407033
commit c7b2587f40
@ -126,15 +126,14 @@ types. Lets you exclude some types from indexing. MIME type
|
||||
names should be taken from the mimemap file (the values may be different
|
||||
from xdg-mime or file -i output in some cases). Can be redefined for
|
||||
subtrees.</para></listitem></varlistentry>
|
||||
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOMD5MIMETYPES">
|
||||
<term><varname>nomd5mimetypes</varname></term>
|
||||
<listitem><para>Don't compute md5 for
|
||||
these types. md5 checksums are used only for deduplicating
|
||||
results, and can be very expensive to compute on multimedia or other big
|
||||
files. This list lets you turn off md5 computation for selected types. It
|
||||
is global (no redefinition for subtrees). At the moment, it only has an
|
||||
effect for external handlers (exec and execm). The file types can be
|
||||
specified by listing either MIME types (e.g. audio/mpeg) or handler names
|
||||
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOMD5TYPES">
|
||||
<term><varname>nomd5types</varname></term>
|
||||
<listitem><para>Don't compute md5 for these types. md5 checksums are used only for deduplicating results, and can be
|
||||
very expensive to compute on multimedia or other big files. This list
|
||||
lets you turn off md5 computation for selected types. It is global (no
|
||||
redefinition for subtrees). At the moment, it only has an effect for
|
||||
external handlers (exec and execm). The file types can be specified by
|
||||
listing either MIME types (e.g. audio/mpeg) or handler names
|
||||
(e.g. rclaudio).</para></listitem></varlistentry>
|
||||
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.COMPRESSEDFILEMAXKBS">
|
||||
<term><varname>compressedfilemaxkbs</varname></term>
|
||||
@ -244,6 +243,15 @@ for a subtree.</para></listitem></varlistentry>
|
||||
'coworker' also when the input is 'co-worker'. This is new
|
||||
in version 1.22, and on by default. Setting the variable to off allows
|
||||
restoring the previous behaviour.</para></listitem></varlistentry>
|
||||
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.BACKSLASHASLETTER">
|
||||
<term><varname>backslashasletter</varname></term>
|
||||
<listitem><para>Process backslash as normal letter. This may make sense for people wanting to index TeX commands as
|
||||
such but is not of much general use.</para></listitem></varlistentry>
|
||||
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MAXTERMLENGTH">
|
||||
<term><varname>maxtermlength</varname></term>
|
||||
<listitem><para>Maximum term length. Words longer than this will be discarded.
|
||||
The default is 40 and used to be hard-coded, but it can now be
|
||||
adjusted. You need an index reset if you change the value.</para></listitem></varlistentry>
|
||||
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOCJK">
|
||||
<term><varname>nocjk</varname></term>
|
||||
<listitem><para>Decides if specific East Asian
|
||||
@ -371,8 +379,8 @@ subpath under cachedir.</para></listitem></varlistentry>
|
||||
over which we stop indexing. The value is a percentage,
|
||||
corresponding to what the "Capacity" df output column shows. The default
|
||||
value is 0, meaning no checking.</para></listitem></varlistentry>
|
||||
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.XAPIANDB">
|
||||
<term><varname>xapiandb</varname></term>
|
||||
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.DBDIR">
|
||||
<term><varname>dbdir</varname></term>
|
||||
<listitem><para>Xapian database directory
|
||||
location. This will be created on first indexing. If the
|
||||
value is not an absolute path, it will be interpreted as relative to
|
||||
@ -447,8 +455,8 @@ $XAPIAN_FLUSH_THRESHOLD documents created, modified or deleted: as memory
|
||||
usage depends on average document size, not only document count, the
|
||||
Xapian approach is not very useful, and you should let Recoll manage
|
||||
the flushes. The program compiled value is 0. The configured default
|
||||
value (from this file) is 10 MB, and will be too low in many cases (it is
|
||||
chosen to conserve memory). If you are looking
|
||||
value (from this file) is now 50 MB, and should be ok in many cases.
|
||||
You can set it as low as 10 to conserve memory, but if you are looking
|
||||
for maximum speed, you may want to experiment with values between 20 and
|
||||
200. In my experience, values beyond this are always counterproductive. If
|
||||
you find otherwise, please drop me a note.</para></listitem></varlistentry>
|
||||
@ -677,6 +685,11 @@ with possibly meaning-altering missing words.</para></listitem></varlistentry>
|
||||
<listitem><para>Attempt OCR of PDF files with no text content if both tesseract and
|
||||
pdftoppm are installed. The default is off because OCR is so
|
||||
very slow.</para></listitem></varlistentry>
|
||||
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.PDFOCRLANG">
|
||||
<term><varname>pdfocrlang</varname></term>
|
||||
<listitem><para>Language to assume for PDF OCR. This is very important for having a reasonable rate of errors
|
||||
with tesseract. This can also be set through a configuration variable
|
||||
or directory-local parameters. See the rclpdf.py script.</para></listitem></varlistentry>
|
||||
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.PDFATTACH">
|
||||
<term><varname>pdfattach</varname></term>
|
||||
<listitem><para>Enable PDF attachment extraction by executing pdftk (if
|
||||
|
||||
@ -8300,8 +8300,8 @@ for i in range(nres):
|
||||
cases). Can be redefined for subtrees.</p>
|
||||
</dd>
|
||||
<dt><a name=
|
||||
"RCL.INSTALL.CONFIG.RECOLLCONF.NOMD5MIMETYPES" id=
|
||||
"RCL.INSTALL.CONFIG.RECOLLCONF.NOMD5MIMETYPES"></a><span class="term"><code class="varname">nomd5mimetypes</code></span></dt>
|
||||
"RCL.INSTALL.CONFIG.RECOLLCONF.NOMD5TYPES" id=
|
||||
"RCL.INSTALL.CONFIG.RECOLLCONF.NOMD5TYPES"></a><span class="term"><code class="varname">nomd5types</code></span></dt>
|
||||
<dd>
|
||||
<p>Don't compute md5 for these types. md5
|
||||
checksums are used only for deduplicating
|
||||
@ -8496,6 +8496,25 @@ for i in range(nres):
|
||||
1.22, and on by default. Setting the variable to
|
||||
off allows restoring the previous behaviour.</p>
|
||||
</dd>
|
||||
<dt><a name=
|
||||
"RCL.INSTALL.CONFIG.RECOLLCONF.BACKSLASHASLETTER"
|
||||
id=
|
||||
"RCL.INSTALL.CONFIG.RECOLLCONF.BACKSLASHASLETTER"></a><span class="term"><code class="varname">backslashasletter</code></span></dt>
|
||||
<dd>
|
||||
<p>Process backslash as normal letter. This may
|
||||
make sense for people wanting to index TeX
|
||||
commands as such but is not of much general
|
||||
use.</p>
|
||||
</dd>
|
||||
<dt><a name=
|
||||
"RCL.INSTALL.CONFIG.RECOLLCONF.MAXTERMLENGTH" id=
|
||||
"RCL.INSTALL.CONFIG.RECOLLCONF.MAXTERMLENGTH"></a><span class="term"><code class="varname">maxtermlength</code></span></dt>
|
||||
<dd>
|
||||
<p>Maximum term length. Words longer than this
|
||||
will be discarded. The default is 40 and used to
|
||||
be hard-coded, but it can now be adjusted. You
|
||||
need an index reset if you change the value.</p>
|
||||
</dd>
|
||||
<dt><a name="RCL.INSTALL.CONFIG.RECOLLCONF.NOCJK"
|
||||
id=
|
||||
"RCL.INSTALL.CONFIG.RECOLLCONF.NOCJK"></a><span class="term"><code class="varname">nocjk</code></span></dt>
|
||||
@ -8696,9 +8715,9 @@ for i in range(nres):
|
||||
column shows. The default value is 0, meaning no
|
||||
checking.</p>
|
||||
</dd>
|
||||
<dt><a name=
|
||||
"RCL.INSTALL.CONFIG.RECOLLCONF.XAPIANDB" id=
|
||||
"RCL.INSTALL.CONFIG.RECOLLCONF.XAPIANDB"></a><span class="term"><code class="varname">xapiandb</code></span></dt>
|
||||
<dt><a name="RCL.INSTALL.CONFIG.RECOLLCONF.DBDIR"
|
||||
id=
|
||||
"RCL.INSTALL.CONFIG.RECOLLCONF.DBDIR"></a><span class="term"><code class="varname">dbdir</code></span></dt>
|
||||
<dd>
|
||||
<p>Xapian database directory location. This will
|
||||
be created on first indexing. If the value is not
|
||||
@ -8840,13 +8859,13 @@ for i in range(nres):
|
||||
the Xapian approach is not very useful, and
|
||||
you should let Recoll manage the flushes. The
|
||||
program compiled value is 0. The configured
|
||||
default value (from this file) is 10 MB, and will
|
||||
be too low in many cases (it is chosen to
|
||||
conserve memory). If you are looking for maximum
|
||||
speed, you may want to experiment with values
|
||||
between 20 and 200. In my experience, values
|
||||
beyond this are always counterproductive. If you
|
||||
find otherwise, please drop me a note.</p>
|
||||
default value (from this file) is now 50 MB, and
|
||||
should be ok in many cases. You can set it as low
|
||||
as 10 to conserve memory, but if you are looking
|
||||
for maximum speed, you may want to experiment
|
||||
with values between 20 and 200. In my experience,
|
||||
values beyond this are always counterproductive.
|
||||
If you find otherwise, please drop me a note.</p>
|
||||
</dd>
|
||||
<dt><a name=
|
||||
"RCL.INSTALL.CONFIG.RECOLLCONF.FILTERMAXSECONDS"
|
||||
@ -9285,6 +9304,16 @@ for i in range(nres):
|
||||
default is off because OCR is so very slow.</p>
|
||||
</dd>
|
||||
<dt><a name=
|
||||
"RCL.INSTALL.CONFIG.RECOLLCONF.PDFOCRLANG" id=
|
||||
"RCL.INSTALL.CONFIG.RECOLLCONF.PDFOCRLANG"></a><span class="term"><code class="varname">pdfocrlang</code></span></dt>
|
||||
<dd>
|
||||
<p>Language to assume for PDF OCR. This is very
|
||||
important for having a reasonable rate of errors
|
||||
with tesseract. This can also be set through a
|
||||
configuration variable or directory-local
|
||||
parameters. See the rclpdf.py script.</p>
|
||||
</dd>
|
||||
<dt><a name=
|
||||
"RCL.INSTALL.CONFIG.RECOLLCONF.PDFATTACH" id=
|
||||
"RCL.INSTALL.CONFIG.RECOLLCONF.PDFATTACH"></a><span class="term"><code class="varname">pdfattach</code></span></dt>
|
||||
<dd>
|
||||
|
||||
@ -168,14 +168,16 @@ skippedPaths = /media
|
||||
# subtrees.</descr></var>
|
||||
#excludedmimetypes =
|
||||
|
||||
# <var name="nomd5mimetypes" type="string"><brief>Don't compute md5 for
|
||||
# these types.</brief><descr>md5 checksums are used only for deduplicating
|
||||
# results, and can be very expensive to compute on multimedia or other big
|
||||
# files. This list lets you turn off md5 computation for selected types. It
|
||||
# is global (no redefinition for subtrees). At the moment, it only has an
|
||||
# effect for external handlers (exec and execm). The file types can be
|
||||
# specified by listing either MIME types (e.g. audio/mpeg) or handler names
|
||||
# (e.g. rclaudio).</descr></var>
|
||||
# <var name="nomd5types" type="string">
|
||||
# <brief>Don't compute md5 for these types.</brief>
|
||||
# <descr>md5 checksums are used only for deduplicating results, and can be
|
||||
# very expensive to compute on multimedia or other big files. This list
|
||||
# lets you turn off md5 computation for selected types. It is global (no
|
||||
# redefinition for subtrees). At the moment, it only has an effect for
|
||||
# external handlers (exec and execm). The file types can be specified by
|
||||
# listing either MIME types (e.g. audio/mpeg) or handler names
|
||||
# (e.g. rclaudio).</descr>
|
||||
# </var>
|
||||
nomd5types = rclaudio
|
||||
|
||||
# <var name="compressedfilemaxkbs" type="int"><brief>Size limit for compressed
|
||||
@ -299,6 +301,21 @@ indexStoreDocText = 1
|
||||
# restoring the previous behaviour.</descr></var>
|
||||
#dehyphenate = 1
|
||||
|
||||
# <var name="backslashasletter" type="bool">
|
||||
# <brief>Process backslash as normal letter</brief>
|
||||
# <descr>This may make sense for people wanting to index TeX commands as
|
||||
# such but is not of much general use.</descr>
|
||||
# </var>
|
||||
#backslashasletter = 0
|
||||
|
||||
# <var name="maxtermlength" type="int" values="10 200 40">
|
||||
# <brief>Maximum term length.</brief>
|
||||
# <descr>Words longer than this will be discarded.
|
||||
# The default is 40 and used to be hard-coded, but it can now be
|
||||
# adjusted. You need an index reset if you change the value.</descr>
|
||||
# </var>
|
||||
#maxtermlength = 40
|
||||
|
||||
# <var name="nocjk" type="bool"><brief>Decides if specific East Asian
|
||||
# (Chinese Korean Japanese) characters/word splitting is turned
|
||||
# off.</brief><descr>This will save a small amount of CPU if you have no CJK
|
||||
@ -435,7 +452,7 @@ noxattrfields = 0
|
||||
# value is 0, meaning no checking.</descr></var>
|
||||
maxfsoccuppc = 0
|
||||
|
||||
# <var name="xapiandb" type="dfn"><brief>Xapian database directory
|
||||
# <var name="dbdir" type="dfn"><brief>Xapian database directory
|
||||
# location.</brief><descr>This will be created on first indexing. If the
|
||||
# value is not an absolute path, it will be interpreted as relative to
|
||||
# cachedir if set, or the configuration directory (-c argument or
|
||||
@ -837,6 +854,14 @@ snippetMaxPosWalk = 1000000
|
||||
# very slow.</descr></var>
|
||||
#pdfocr = 0
|
||||
|
||||
# <var name="pdfocrlang" type="string">
|
||||
# <brief>Language to assume for PDF OCR.</brief>
|
||||
# <descr>This is very important for having a reasonable rate of errors
|
||||
# with tesseract. This can also be set through a configuration variable
|
||||
# or directory-local parameters. See the rclpdf.py script.</descr>
|
||||
# </var>
|
||||
#pdfocrlang = eng
|
||||
|
||||
# <var name="pdfattach" type="bool">
|
||||
#
|
||||
# <brief>Enable PDF attachment extraction by executing pdftk (if
|
||||
|
||||
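For reference, the variables this commit documents could be set in a personal recoll.conf roughly as follows. This is only a sketch: the values are illustrative, and it assumes that list values are whitespace-separated and that handler names and MIME types can be mixed in one nomd5types list, neither of which is stated in the diff.

```
# Illustrative recoll.conf fragment, not taken from the commit.

# Skip md5 computation for the audio handler and for MPEG audio.
# Assumption: entries are separated by whitespace and the two kinds of
# names can be mixed. The setting is global (no per-subtree redefinition).
nomd5types = rclaudio audio/mpeg

# Leave backslash handling at its default; set to 1 only if you want to
# index TeX commands as such.
backslashasletter = 0

# Words longer than this are discarded. Changing the value requires an
# index reset.
maxtermlength = 40

# Enable OCR for PDF files with no text content (needs tesseract and
# pdftoppm, and is very slow), and name the language for tesseract.
pdfocr = 1
pdfocrlang = eng
```

The sample file in the diff ships all of these commented out at their defaults, except nomd5types, which it sets to rclaudio.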