updated recoll.conf doc
parent 541c407033
commit c7b2587f40
@ -126,15 +126,14 @@ types. Lets you exclude some types from indexing. MIME type
|
|||||||
names should be taken from the mimemap file (the values may be different
|
names should be taken from the mimemap file (the values may be different
|
||||||
from xdg-mime or file -i output in some cases) Can be redefined for
|
from xdg-mime or file -i output in some cases) Can be redefined for
|
||||||
subtrees.</para></listitem></varlistentry>
|
subtrees.</para></listitem></varlistentry>
|
||||||
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOMD5MIMETYPES">
|
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOMD5TYPES">
|
||||||
<term><varname>nomd5mimetypes</varname></term>
|
<term><varname>nomd5types</varname></term>
|
||||||
<listitem><para>Don't compute md5 for
|
<listitem><para>Don't compute md5 for these types. md5 checksums are used only for deduplicating results, and can be
|
||||||
these types. md5 checksums are used only for deduplicating
|
very expensive to compute on multimedia or other big files. This list
|
||||||
results, and can be very expensive to compute on multimedia or other big
|
lets you turn off md5 computation for selected types. It is global (no
|
||||||
files. This list lets you turn off md5 computation for selected types. It
|
redefinition for subtrees). At the moment, it only has an effect for
|
||||||
is global (no redefinition for subtrees). At the moment, it only has an
|
external handlers (exec and execm). The file types can be specified by
|
||||||
effect for external handlers (exec and execm). The file types can be
|
listing either MIME types (e.g. audio/mpeg) or handler names
|
||||||
specified by listing either MIME types (e.g. audio/mpeg) or handler names
|
|
||||||
(e.g. rclaudio).</para></listitem></varlistentry>
|
(e.g. rclaudio).</para></listitem></varlistentry>
|
||||||
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.COMPRESSEDFILEMAXKBS">
|
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.COMPRESSEDFILEMAXKBS">
|
||||||
<term><varname>compressedfilemaxkbs</varname></term>
|
<term><varname>compressedfilemaxkbs</varname></term>
|
||||||
@ -244,6 +243,15 @@ for a subtree.</para></listitem></varlistentry>
|
|||||||
'coworker' also when the input is 'co-worker'. This is new
|
'coworker' also when the input is 'co-worker'. This is new
|
||||||
in version 1.22, and on by default. Setting the variable to off allows
|
in version 1.22, and on by default. Setting the variable to off allows
|
||||||
restoring the previous behaviour.</para></listitem></varlistentry>
|
restoring the previous behaviour.</para></listitem></varlistentry>
|
||||||
|
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.BACKSLASHASLETTER">
|
||||||
|
<term><varname>backslashasletter</varname></term>
|
||||||
|
<listitem><para>Process backslash as normal letter. This may make sense for people wanting to index TeX commands as
|
||||||
|
such but is not of much general use.</para></listitem></varlistentry>
|
||||||
|
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MAXTERMLENGTH">
|
||||||
|
<term><varname>maxtermlength</varname></term>
|
||||||
|
<listitem><para>Maximum term length. Words longer than this will be discarded.
|
||||||
|
The default is 40 and used to be hard-coded, but it can now be
|
||||||
|
adjusted. You need an index reset if you change the value.</para></listitem></varlistentry>
|
||||||
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOCJK">
|
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOCJK">
|
||||||
<term><varname>nocjk</varname></term>
|
<term><varname>nocjk</varname></term>
|
||||||
<listitem><para>Decides if specific East Asian
|
<listitem><para>Decides if specific East Asian
|
||||||
@ -371,8 +379,8 @@ subpath under cachedir.</para></listitem></varlistentry>
|
|||||||
over which we stop indexing. The value is a percentage,
|
over which we stop indexing. The value is a percentage,
|
||||||
corresponding to what the "Capacity" df output column shows. The default
|
corresponding to what the "Capacity" df output column shows. The default
|
||||||
value is 0, meaning no checking.</para></listitem></varlistentry>
|
value is 0, meaning no checking.</para></listitem></varlistentry>
|
||||||
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.XAPIANDB">
|
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.DBDIR">
|
||||||
<term><varname>xapiandb</varname></term>
|
<term><varname>dbdir</varname></term>
|
||||||
<listitem><para>Xapian database directory
|
<listitem><para>Xapian database directory
|
||||||
location. This will be created on first indexing. If the
|
location. This will be created on first indexing. If the
|
||||||
value is not an absolute path, it will be interpreted as relative to
|
value is not an absolute path, it will be interpreted as relative to
|
||||||
@ -447,8 +455,8 @@ $XAPIAN_FLUSH_THRESHOLD documents created, modified or deleted: as memory
|
|||||||
usage depends on average document size, not only document count, the
|
usage depends on average document size, not only document count, the
|
||||||
Xapian approach is not very useful, and you should let Recoll manage
|
Xapian approach is not very useful, and you should let Recoll manage
|
||||||
the flushes. The program compiled value is 0. The configured default
|
the flushes. The program compiled value is 0. The configured default
|
||||||
value (from this file) is 10 MB, and will be too low in many cases (it is
|
value (from this file) is now 50 MB, and should be ok in many cases.
|
||||||
chosen to conserve memory). If you are looking
|
You can set it as low as 10 to conserve memory, but if you are looking
|
||||||
for maximum speed, you may want to experiment with values between 20 and
|
for maximum speed, you may want to experiment with values between 20 and
|
||||||
200. In my experience, values beyond this are always counterproductive. If
|
200. In my experience, values beyond this are always counterproductive. If
|
||||||
you find otherwise, please drop me a note.</para></listitem></varlistentry>
|
you find otherwise, please drop me a note.</para></listitem></varlistentry>
|
||||||
@ -677,6 +685,11 @@ with possibly meaning-altering missing words.</para></listitem></varlistentry>
|
|||||||
<listitem><para>Attempt OCR of PDF files with no text content if both tesseract and
|
<listitem><para>Attempt OCR of PDF files with no text content if both tesseract and
|
||||||
pdftoppm are installed. The default is off because OCR is so
|
pdftoppm are installed. The default is off because OCR is so
|
||||||
very slow.</para></listitem></varlistentry>
|
very slow.</para></listitem></varlistentry>
|
||||||
|
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.PDFOCRLANG">
|
||||||
|
<term><varname>pdfocrlang</varname></term>
|
||||||
|
<listitem><para>Language to assume for PDF OCR. This is very important for having a reasonable rate of errors
|
||||||
|
with tesseract. This can also be set through a configuration variable
|
||||||
|
or directory-local parameters. See the rclpdf.py script.</para></listitem></varlistentry>
|
||||||
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.PDFATTACH">
|
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.PDFATTACH">
|
||||||
<term><varname>pdfattach</varname></term>
|
<term><varname>pdfattach</varname></term>
|
||||||
<listitem><para>Enable PDF attachment extraction by executing pdftk (if
|
<listitem><para>Enable PDF attachment extraction by executing pdftk (if
|
||||||
|
|||||||
@ -8300,8 +8300,8 @@ for i in range(nres):
|
|||||||
cases) Can be redefined for subtrees.</p>
|
cases) Can be redefined for subtrees.</p>
|
||||||
</dd>
|
</dd>
|
||||||
<dt><a name=
|
<dt><a name=
|
||||||
"RCL.INSTALL.CONFIG.RECOLLCONF.NOMD5MIMETYPES" id=
|
"RCL.INSTALL.CONFIG.RECOLLCONF.NOMD5TYPES" id=
|
||||||
"RCL.INSTALL.CONFIG.RECOLLCONF.NOMD5MIMETYPES"></a><span class="term"><code class="varname">nomd5mimetypes</code></span></dt>
|
"RCL.INSTALL.CONFIG.RECOLLCONF.NOMD5TYPES"></a><span class="term"><code class="varname">nomd5types</code></span></dt>
|
||||||
<dd>
|
<dd>
|
||||||
<p>Don't compute md5 for these types. md5
|
<p>Don't compute md5 for these types. md5
|
||||||
checksums are used only for deduplicating
|
checksums are used only for deduplicating
|
||||||
@ -8496,6 +8496,25 @@ for i in range(nres):
|
|||||||
1.22, and on by default. Setting the variable to
|
1.22, and on by default. Setting the variable to
|
||||||
off allows restoring the previous behaviour.</p>
|
off allows restoring the previous behaviour.</p>
|
||||||
</dd>
|
</dd>
|
||||||
|
<dt><a name=
|
||||||
|
"RCL.INSTALL.CONFIG.RECOLLCONF.BACKSLASHASLETTER"
|
||||||
|
id=
|
||||||
|
"RCL.INSTALL.CONFIG.RECOLLCONF.BACKSLASHASLETTER"></a><span class="term"><code class="varname">backslashasletter</code></span></dt>
|
||||||
|
<dd>
|
||||||
|
<p>Process backslash as normal letter. This may
|
||||||
|
make sense for people wanting to index TeX
|
||||||
|
commands as such but is not of much general
|
||||||
|
use.</p>
|
||||||
|
</dd>
|
||||||
|
<dt><a name=
|
||||||
|
"RCL.INSTALL.CONFIG.RECOLLCONF.MAXTERMLENGTH" id=
|
||||||
|
"RCL.INSTALL.CONFIG.RECOLLCONF.MAXTERMLENGTH"></a><span class="term"><code class="varname">maxtermlength</code></span></dt>
|
||||||
|
<dd>
|
||||||
|
<p>Maximum term length. Words longer than this
|
||||||
|
will be discarded. The default is 40 and used to
|
||||||
|
be hard-coded, but it can now be adjusted. You
|
||||||
|
need an index reset if you change the value.</p>
|
||||||
|
</dd>
|
||||||
<dt><a name="RCL.INSTALL.CONFIG.RECOLLCONF.NOCJK"
|
<dt><a name="RCL.INSTALL.CONFIG.RECOLLCONF.NOCJK"
|
||||||
id=
|
id=
|
||||||
"RCL.INSTALL.CONFIG.RECOLLCONF.NOCJK"></a><span class="term"><code class="varname">nocjk</code></span></dt>
|
"RCL.INSTALL.CONFIG.RECOLLCONF.NOCJK"></a><span class="term"><code class="varname">nocjk</code></span></dt>
|
||||||
@ -8696,9 +8715,9 @@ for i in range(nres):
|
|||||||
column shows. The default value is 0, meaning no
|
column shows. The default value is 0, meaning no
|
||||||
checking.</p>
|
checking.</p>
|
||||||
</dd>
|
</dd>
|
||||||
<dt><a name=
|
<dt><a name="RCL.INSTALL.CONFIG.RECOLLCONF.DBDIR"
|
||||||
"RCL.INSTALL.CONFIG.RECOLLCONF.XAPIANDB" id=
|
id=
|
||||||
"RCL.INSTALL.CONFIG.RECOLLCONF.XAPIANDB"></a><span class="term"><code class="varname">xapiandb</code></span></dt>
|
"RCL.INSTALL.CONFIG.RECOLLCONF.DBDIR"></a><span class="term"><code class="varname">dbdir</code></span></dt>
|
||||||
<dd>
|
<dd>
|
||||||
<p>Xapian database directory location. This will
|
<p>Xapian database directory location. This will
|
||||||
be created on first indexing. If the value is not
|
be created on first indexing. If the value is not
|
||||||
@ -8840,13 +8859,13 @@ for i in range(nres):
|
|||||||
the Xapian approach is not very useful, and
|
the Xapian approach is not very useful, and
|
||||||
you should let Recoll manage the flushes. The
|
you should let Recoll manage the flushes. The
|
||||||
program compiled value is 0. The configured
|
program compiled value is 0. The configured
|
||||||
default value (from this file) is 10 MB, and will
|
default value (from this file) is now 50 MB, and
|
||||||
be too low in many cases (it is chosen to
|
should be ok in many cases. You can set it as low
|
||||||
conserve memory). If you are looking for maximum
|
as 10 to conserve memory, but if you are looking
|
||||||
speed, you may want to experiment with values
|
for maximum speed, you may want to experiment
|
||||||
between 20 and 200. In my experience, values
|
with values between 20 and 200. In my experience,
|
||||||
beyond this are always counterproductive. If you
|
values beyond this are always counterproductive.
|
||||||
find otherwise, please drop me a note.</p>
|
If you find otherwise, please drop me a note.</p>
|
||||||
</dd>
|
</dd>
|
||||||
<dt><a name=
|
<dt><a name=
|
||||||
"RCL.INSTALL.CONFIG.RECOLLCONF.FILTERMAXSECONDS"
|
"RCL.INSTALL.CONFIG.RECOLLCONF.FILTERMAXSECONDS"
|
||||||
@ -9285,6 +9304,16 @@ for i in range(nres):
|
|||||||
default is off because OCR is so very slow.</p>
|
default is off because OCR is so very slow.</p>
|
||||||
</dd>
|
</dd>
|
||||||
<dt><a name=
|
<dt><a name=
|
||||||
|
"RCL.INSTALL.CONFIG.RECOLLCONF.PDFOCRLANG" id=
|
||||||
|
"RCL.INSTALL.CONFIG.RECOLLCONF.PDFOCRLANG"></a><span class="term"><code class="varname">pdfocrlang</code></span></dt>
|
||||||
|
<dd>
|
||||||
|
<p>Language to assume for PDF OCR. This is very
|
||||||
|
important for having a reasonable rate of errors
|
||||||
|
with tesseract. This can also be set through a
|
||||||
|
configuration variable or directory-local
|
||||||
|
parameters. See the rclpdf.py script.</p>
|
||||||
|
</dd>
|
||||||
|
<dt><a name=
|
||||||
"RCL.INSTALL.CONFIG.RECOLLCONF.PDFATTACH" id=
|
"RCL.INSTALL.CONFIG.RECOLLCONF.PDFATTACH" id=
|
||||||
"RCL.INSTALL.CONFIG.RECOLLCONF.PDFATTACH"></a><span class="term"><code class="varname">pdfattach</code></span></dt>
|
"RCL.INSTALL.CONFIG.RECOLLCONF.PDFATTACH"></a><span class="term"><code class="varname">pdfattach</code></span></dt>
|
||||||
<dd>
|
<dd>
|
||||||
|
|||||||
@ -168,14 +168,16 @@ skippedPaths = /media
|
|||||||
# subtrees.</descr></var>
|
# subtrees.</descr></var>
|
||||||
#excludedmimetypes =
|
#excludedmimetypes =
|
||||||
|
|
||||||
# <var name="nomd5mimetypes" type="string"><brief>Don't compute md5 for
|
# <var name="nomd5types" type="string">
|
||||||
# these types.</brief><descr>md5 checksums are used only for deduplicating
|
# <brief>Don't compute md5 for these types.</brief>
|
||||||
# results, and can be very expensive to compute on multimedia or other big
|
# <descr>md5 checksums are used only for deduplicating results, and can be
|
||||||
# files. This list lets you turn off md5 computation for selected types. It
|
# very expensive to compute on multimedia or other big files. This list
|
||||||
# is global (no redefinition for subtrees). At the moment, it only has an
|
# lets you turn off md5 computation for selected types. It is global (no
|
||||||
# effect for external handlers (exec and execm). The file types can be
|
# redefinition for subtrees). At the moment, it only has an effect for
|
||||||
# specified by listing either MIME types (e.g. audio/mpeg) or handler names
|
# external handlers (exec and execm). The file types can be specified by
|
||||||
# (e.g. rclaudio).</descr></var>
|
# listing either MIME types (e.g. audio/mpeg) or handler names
|
||||||
|
# (e.g. rclaudio).</descr>
|
||||||
|
# </var>
|
||||||
nomd5types = rclaudio
|
nomd5types = rclaudio
|
||||||
|
|
||||||
# <var name="compressedfilemaxkbs" type="int"><brief>Size limit for compressed
|
# <var name="compressedfilemaxkbs" type="int"><brief>Size limit for compressed
|
||||||
@ -299,6 +301,21 @@ indexStoreDocText = 1
|
|||||||
# restoring the previous behaviour.</descr></var>
|
# restoring the previous behaviour.</descr></var>
|
||||||
#dehyphenate = 1
|
#dehyphenate = 1
|
||||||
|
|
||||||
|
# <var name="backslashasletter" type="bool">
|
||||||
|
# <brief>Process backslash as normal letter</brief>
|
||||||
|
# <descr>This may make sense for people wanting to index TeX commands as
|
||||||
|
# such but is not of much general use.</descr>
|
||||||
|
# </var>
|
||||||
|
#backslashasletter = 0
|
||||||
|
|
||||||
|
# <var name="maxtermlength" type="int" values="10 200 40">
|
||||||
|
# <brief>Maximum term length.</brief>
|
||||||
|
# <descr>Words longer than this will be discarded.
|
||||||
|
# The default is 40 and used to be hard-coded, but it can now be
|
||||||
|
# adjusted. You need an index reset if you change the value.</descr>
|
||||||
|
# </var>
|
||||||
|
#maxtermlength = 40
|
||||||
|
|
||||||
# <var name="nocjk" type="bool"><brief>Decides if specific East Asian
|
# <var name="nocjk" type="bool"><brief>Decides if specific East Asian
|
||||||
# (Chinese Korean Japanese) characters/word splitting is turned
|
# (Chinese Korean Japanese) characters/word splitting is turned
|
||||||
# off.</brief><descr>This will save a small amount of CPU if you have no CJK
|
# off.</brief><descr>This will save a small amount of CPU if you have no CJK
|
||||||
@ -435,7 +452,7 @@ noxattrfields = 0
|
|||||||
# value is 0, meaning no checking.</descr></var>
|
# value is 0, meaning no checking.</descr></var>
|
||||||
maxfsoccuppc = 0
|
maxfsoccuppc = 0
|
||||||
|
|
||||||
# <var name="xapiandb" type="dfn"><brief>Xapian database directory
|
# <var name="dbdir" type="dfn"><brief>Xapian database directory
|
||||||
# location.</brief><descr>This will be created on first indexing. If the
|
# location.</brief><descr>This will be created on first indexing. If the
|
||||||
# value is not an absolute path, it will be interpreted as relative to
|
# value is not an absolute path, it will be interpreted as relative to
|
||||||
# cachedir if set, or the configuration directory (-c argument or
|
# cachedir if set, or the configuration directory (-c argument or
|
||||||
@ -837,6 +854,14 @@ snippetMaxPosWalk = 1000000
|
|||||||
# very slow.</descr></var>
|
# very slow.</descr></var>
|
||||||
#pdfocr = 0
|
#pdfocr = 0
|
||||||
|
|
||||||
|
# <var name="pdfocrlang" type="string">
|
||||||
|
# <brief>Language to assume for PDF OCR.</brief>
|
||||||
|
# <descr>This is very important for having a reasonable rate of errors
|
||||||
|
# with tesseract. This can also be set through a configuration variable
|
||||||
|
# or directory-local parameters. See the rclpdf.py script.</descr>
|
||||||
|
# </var>
|
||||||
|
#pdfocrlang = eng
|
||||||
|
|
||||||
# <var name="pdfattach" type="bool">
|
# <var name="pdfattach" type="bool">
|
||||||
#
|
#
|
||||||
# <brief>Enable PDF attachment extraction by executing pdftk (if
|
# <brief>Enable PDF attachment extraction by executing pdftk (if
|
||||||
|
|||||||
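For reference, the variables documented in this commit could be set in a recoll.conf roughly as in the fragment below. This is only a sketch and not part of the commit: the variable names, the nomd5types value, and the pdfocrlang value are taken from the diff above, while turning backslashasletter and pdfocr on and the maxtermlength value of 60 are illustrative assumptions.

```
# Skip md5 computation for documents processed by the rclaudio handler
# (same value as in the sample file).
nomd5types = rclaudio

# Assumption for illustration: treat the backslash as a letter, for
# example to index TeX commands. The sample file shows 0.
backslashasletter = 1

# Assumed illustrative value. The documented default is 40, and the
# values attribute in the sample file suggests a range of 10 to 200.
# An index reset is needed after changing this.
maxtermlength = 60

# OCR is off by default. Assumption for illustration: enable it and
# state the language for tesseract.
pdfocr = 1
pdfocrlang = eng
```

Per the descriptions above, nomd5types is global and cannot be redefined for subtrees, whereas the OCR language can also be set through directory-local parameters.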