updated recoll.conf doc

2019-02-04 15:47:14 +01:00
commit c7b2587f40
parent 541c407033
3 changed files with 101 additions and 34 deletions
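For orientation, here is a minimal sketch of how the variables touched by this commit might appear in a personal recoll.conf. The values for nomd5types, backslashasletter, maxtermlength and pdfocrlang are the ones shown in the sample file further down; listing a handler name and a MIME type together assumes the space-separated list syntax, and the dbdir path is a purely hypothetical placeholder.

```
# Skip md5 computation for a handler name and for a MIME type
# (combining both on one line is an assumption about list syntax)
nomd5types = rclaudio audio/mpeg
# Treat backslash as a normal letter; 0 is the value shown commented out in the sample file
backslashasletter = 0
# Discard words longer than this; changing it needs an index reset
maxtermlength = 40
# Language assumed by tesseract for PDF OCR
pdfocrlang = eng
# Xapian database directory (hypothetical path; previously documented as xapiandb)
dbdir = /path/to/xapiandb
```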
--- a/src/doc/user/recoll.conf.xml
+++ b/src/doc/user/recoll.conf.xml
@@ -126,15 +126,14 @@ types. Lets you exclude some types from indexing. MIME type
 names should be taken from the mimemap file (the values may be different
 from xdg-mime or file -i output in some cases) Can be redefined for
 subtrees.</para></listitem></varlistentry>
-<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOMD5MIMETYPES">
-<term><varname>nomd5mimetypes</varname></term>
-<listitem><para>Don't compute md5 for
-these types. md5 checksums are used only for deduplicating
-results, and can be very expensive to compute on multimedia or other big
-files. This list lets you turn off md5 computation for selected types. It
-is global (no redefinition for subtrees). At the moment, it only has an
-effect for external handlers (exec and execm). The file types can be
-specified by listing either MIME types (e.g. audio/mpeg) or handler names
+<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOMD5TYPES">
+<term><varname>nomd5types</varname></term>
+<listitem><para>Don't compute md5 for these types. md5 checksums are used only for deduplicating results, and can be
+very expensive to compute on multimedia or other big files. This list
+lets you turn off md5 computation for selected types. It is global (no
+redefinition for subtrees). At the moment, it only has an effect for
+external handlers (exec and execm). The file types can be specified by
+listing either MIME types (e.g. audio/mpeg) or handler names
 (e.g. rclaudio).</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.COMPRESSEDFILEMAXKBS">
 <term><varname>compressedfilemaxkbs</varname></term>
@@ -244,6 +243,15 @@ for a subtree.</para></listitem></varlistentry>
 'coworker' also when the input is 'co-worker'. This is new
 in version 1.22, and on by default. Setting the variable to off allows
 restoring the previous behaviour.</para></listitem></varlistentry>
+<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.BACKSLASHASLETTER">
+<term><varname>backslashasletter</varname></term>
+<listitem><para>Process backslash as normal letter. This may make sense for people wanting to index TeX commands as
+such but is not of much general use.</para></listitem></varlistentry>
+<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MAXTERMLENGTH">
+<term><varname>maxtermlength</varname></term>
+<listitem><para>Maximum term length. Words longer than this will be discarded.
+The default is 40 and used to be hard-coded, but it can now be
+adjusted. You need an index reset if you change the value.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOCJK">
 <term><varname>nocjk</varname></term>
 <listitem><para>Decides if specific East Asian
@@ -371,8 +379,8 @@ subpath under cachedir.</para></listitem></varlistentry>
 over which we stop indexing. The value is a percentage,
 corresponding to what the "Capacity" df output column shows. The default
 value is 0, meaning no checking.</para></listitem></varlistentry>
-<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.XAPIANDB">
-<term><varname>xapiandb</varname></term>
+<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.DBDIR">
+<term><varname>dbdir</varname></term>
 <listitem><para>Xapian database directory
 location. This will be created on first indexing. If the
 value is not an absolute path, it will be interpreted as relative to
@@ -447,8 +455,8 @@ $XAPIAN_FLUSH_THRESHOLD documents created, modified or deleted: as memory
 usage depends on average document size, not only document count, the
 Xapian approach is not very useful, and you should let Recoll manage
 the flushes. The program compiled value is 0. The configured default
-value (from this file) is 10 MB, and will be too low in many cases (it is
-chosen to conserve memory). If you are looking
+value (from this file) is now 50 MB, and should be ok in many cases.
+You can set it as low as 10 to conserve memory, but if you are looking
 for maximum speed, you may want to experiment with values between 20 and
 200. In my experience, values beyond this are always counterproductive. If
 you find otherwise, please drop me a note.</para></listitem></varlistentry>
@@ -677,6 +685,11 @@ with possibly meaning-altering missing words.</para></listitem></varlistentry>
 <listitem><para>Attempt OCR of PDF files with no text content if both tesseract and
 pdftoppm are installed. The default is off because OCR is so
 very slow.</para></listitem></varlistentry>
+<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.PDFOCRLANG">
+<term><varname>pdfocrlang</varname></term>
+<listitem><para>Language to assume for PDF OCR. This is very important for having a reasonable rate of errors
+with tesseract. This can also be set through a configuration variable
+or directory-local parameters. See the rclpdf.py script.</para></listitem></varlistentry>
 <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.PDFATTACH">
 <term><varname>pdfattach</varname></term>
 <listitem><para>Enable PDF attachment extraction by executing pdftk (if
--- a/src/doc/user/usermanual.html
+++ b/src/doc/user/usermanual.html
@@ -8300,8 +8300,8 @@ for i in range(nres):
                  cases) Can be redefined for subtrees.</p>
                </dd>
                <dt><a name=
-                "RCL.INSTALL.CONFIG.RECOLLCONF.NOMD5MIMETYPES" id=
-                "RCL.INSTALL.CONFIG.RECOLLCONF.NOMD5MIMETYPES"></a><span class="term"><code class="varname">nomd5mimetypes</code></span></dt>
+                "RCL.INSTALL.CONFIG.RECOLLCONF.NOMD5TYPES" id=
+                "RCL.INSTALL.CONFIG.RECOLLCONF.NOMD5TYPES"></a><span class="term"><code class="varname">nomd5types</code></span></dt>
                <dd>
                  <p>Don't compute md5 for these types. md5
                  checksums are used only for deduplicating
@@ -8496,6 +8496,25 @@ for i in range(nres):
                  1.22, and on by default. Setting the variable to
                  off allows restoring the previous behaviour.</p>
                </dd>
+                <dt><a name=
+                "RCL.INSTALL.CONFIG.RECOLLCONF.BACKSLASHASLETTER"
+                id=
+                "RCL.INSTALL.CONFIG.RECOLLCONF.BACKSLASHASLETTER"></a><span class="term"><code class="varname">backslashasletter</code></span></dt>
+                <dd>
+                  <p>Process backslash as normal letter. This may
+                  make sense for people wanting to index TeX
+                  commands as such but is not of much general
+                  use.</p>
+                </dd>
+                <dt><a name=
+                "RCL.INSTALL.CONFIG.RECOLLCONF.MAXTERMLENGTH" id=
+                "RCL.INSTALL.CONFIG.RECOLLCONF.MAXTERMLENGTH"></a><span class="term"><code class="varname">maxtermlength</code></span></dt>
+                <dd>
+                  <p>Maximum term length. Words longer than this
+                  will be discarded. The default is 40 and used to
+                  be hard-coded, but it can now be adjusted. You
+                  need an index reset if you change the value.</p>
+                </dd>
                <dt><a name="RCL.INSTALL.CONFIG.RECOLLCONF.NOCJK"
                id=
                "RCL.INSTALL.CONFIG.RECOLLCONF.NOCJK"></a><span class="term"><code class="varname">nocjk</code></span></dt>
@@ -8696,9 +8715,9 @@ for i in range(nres):
                  column shows. The default value is 0, meaning no
                  checking.</p>
                </dd>
-                <dt><a name=
-                "RCL.INSTALL.CONFIG.RECOLLCONF.XAPIANDB" id=
-                "RCL.INSTALL.CONFIG.RECOLLCONF.XAPIANDB"></a><span class="term"><code class="varname">xapiandb</code></span></dt>
+                <dt><a name="RCL.INSTALL.CONFIG.RECOLLCONF.DBDIR"
+                id=
+                "RCL.INSTALL.CONFIG.RECOLLCONF.DBDIR"></a><span class="term"><code class="varname">dbdir</code></span></dt>
                <dd>
                  <p>Xapian database directory location. This will
                  be created on first indexing. If the value is not
@@ -8840,13 +8859,13 @@ for i in range(nres):
                   the Xapian approach is not very useful, and
                  you should let Recoll manage the flushes. The
                  program compiled value is 0. The configured
-                  default value (from this file) is 10 MB, and will
-                  be too low in many cases (it is chosen to
-                  conserve memory). If you are looking for maximum
-                  speed, you may want to experiment with values
-                  between 20 and 200. In my experience, values
-                  beyond this are always counterproductive. If you
-                  find otherwise, please drop me a note.</p>
+                  default value (from this file) is now 50 MB, and
+                  should be ok in many cases. You can set it as low
+                  as 10 to conserve memory, but if you are looking
+                  for maximum speed, you may want to experiment
+                  with values between 20 and 200. In my experience,
+                  values beyond this are always counterproductive.
+                  If you find otherwise, please drop me a note.</p>
                </dd>
                <dt><a name=
                "RCL.INSTALL.CONFIG.RECOLLCONF.FILTERMAXSECONDS"
@@ -9285,6 +9304,16 @@ for i in range(nres):
                  default is off because OCR is so very slow.</p>
                </dd>
+                <dt><a name=
+                "RCL.INSTALL.CONFIG.RECOLLCONF.PDFOCRLANG" id=
+                "RCL.INSTALL.CONFIG.RECOLLCONF.PDFOCRLANG"></a><span class="term"><code class="varname">pdfocrlang</code></span></dt>
+                <dd>
+                  <p>Language to assume for PDF OCR. This is very
+                  important for having a reasonable rate of errors
+                  with tesseract. This can also be set through a
+                  configuration variable or directory-local
+                  parameters. See the rclpdf.py script.</p>
+                </dd>
                <dt><a name=
                "RCL.INSTALL.CONFIG.RECOLLCONF.PDFATTACH" id=
                "RCL.INSTALL.CONFIG.RECOLLCONF.PDFATTACH"></a><span class="term"><code class="varname">pdfattach</code></span></dt>
                <dd>
--- a/src/sampleconf/recoll.conf
+++ b/src/sampleconf/recoll.conf
@@ -168,14 +168,16 @@ skippedPaths = /media
 # subtrees.</descr></var>
 #excludedmimetypes = 
-# <var name="nomd5mimetypes" type="string"><brief>Don't compute md5 for
-# these types.</brief><descr>md5 checksums are used only for deduplicating
-# results, and can be very expensive to compute on multimedia or other big
-# files. This list lets you turn off md5 computation for selected types. It
-# is global (no redefinition for subtrees). At the moment, it only has an
-# effect for external handlers (exec and execm). The file types can be
-# specified by listing either MIME types (e.g. audio/mpeg) or handler names
-# (e.g. rclaudio).</descr></var>
+# <var name="nomd5types" type="string">
+#  <brief>Don't compute md5 for these types.</brief>
+#  <descr>md5 checksums are used only for deduplicating results, and can be
+#  very expensive to compute on multimedia or other big files. This list
+#  lets you turn off md5 computation for selected types. It is global (no
+#  redefinition for subtrees). At the moment, it only has an effect for
+#  external handlers (exec and execm). The file types can be specified by
+#  listing either MIME types (e.g. audio/mpeg) or handler names
+#  (e.g. rclaudio).</descr>
+# </var>
 nomd5types = rclaudio
 # <var name="compressedfilemaxkbs" type="int"><brief>Size limit for compressed
@@ -299,6 +301,21 @@ indexStoreDocText = 1
 # restoring the previous behaviour.</descr></var>
 #dehyphenate = 1
+# <var name="backslashasletter" type="bool">
+#  <brief>Process backslash as normal letter.</brief>
+#  <descr>This may make sense for people wanting to index TeX commands as
+#  such but is not of much general use.</descr>
+# </var>
+#backslashasletter = 0
+# <var name="maxtermlength" type="int" values="10 200 40">
+#  <brief>Maximum term length.</brief>
+#  <descr>Words longer than this will be discarded.
+#  The default is 40 and used to be hard-coded, but it can now be
+#  adjusted. You need an index reset if you change the value.</descr>
+# </var>
+#maxtermlength = 40
 # <var name="nocjk" type="bool"><brief>Decides if specific East Asian
 # (Chinese Korean Japanese) characters/word splitting is turned
 # off.</brief><descr>This will save a small amount of CPU if you have no CJK
@@ -435,7 +452,7 @@ noxattrfields = 0
 # value is 0, meaning no checking.</descr></var>
 maxfsoccuppc = 0
-# <var name="xapiandb" type="dfn"><brief>Xapian database directory
+# <var name="dbdir" type="dfn"><brief>Xapian database directory
 # location.</brief><descr>This will be created on first indexing. If the
 # value is not an absolute path, it will be interpreted as relative to
 # cachedir if set, or the configuration directory (-c argument or
@@ -837,6 +854,14 @@ snippetMaxPosWalk = 1000000
 # very slow.</descr></var>
 #pdfocr = 0
+# <var name="pdfocrlang" type="string">
+#  <brief>Language to assume for PDF OCR.</brief>
+#  <descr>This is very important for having a reasonable rate of errors
+#   with tesseract. This can also be set through a configuration variable
+#   or directory-local parameters. See the rclpdf.py script.</descr>
+# </var>
+#pdfocrlang = eng
 # <var name="pdfattach" type="bool">
 #
 # <brief>Enable PDF attachment extraction by executing pdftk (if