doc for partial incremental indexing

Jean-Francois Dockes 2018-04-12 10:35:22 +02:00
parent 272a63104e
commit 1e22966bd3
4 changed files with 120 additions and 50 deletions


@ -9,6 +9,13 @@
directories to recursively index. Default to ~ (indexes directories to recursively index. Default to ~ (indexes
$HOME). You can use symbolic links in the list, they will be followed, $HOME). You can use symbolic links in the list, they will be followed,
independently of the value of the followLinks variable.</para></listitem></varlistentry> independently of the value of the followLinks variable.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.MONITORDIRS">
<term><varname>monitordirs</varname></term>
<listitem><para>(1.25) Space-separated list of
files or directories to monitor for updates. When running
the real-time indexer, this allows monitoring only a subset of the whole
indexed area. The elements must be included in the tree defined by the
'topdirs' members.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDNAMES"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDNAMES">
<term><varname>skippedNames</varname></term> <term><varname>skippedNames</varname></term>
<listitem><para>Files and directories which should be ignored. <listitem><para>Files and directories which should be ignored.


@ -92,11 +92,11 @@ alink="#0000FF">
"#RCL.INDEXING.INTRODUCTION.CONFIG">Configurations, "#RCL.INDEXING.INTRODUCTION.CONFIG">Configurations,
multiple indexes</a></span></dt> multiple indexes</a></span></dt>
<dt><span class="sect2">2.1.3. <a href= <dt><span class="sect2">2.1.3. <a href=
"#idm222">Document types</a></span></dt> "#idm223">Document types</a></span></dt>
<dt><span class="sect2">2.1.4. <a href= <dt><span class="sect2">2.1.4. <a href=
"#idm263">Indexing failures</a></span></dt> "#idm264">Indexing failures</a></span></dt>
<dt><span class="sect2">2.1.5. <a href= <dt><span class="sect2">2.1.5. <a href=
"#idm275">Recovery</a></span></dt> "#idm276">Recovery</a></span></dt>
</dl> </dl>
</dd> </dd>
<dt><span class="sect1">2.2. <a href= <dt><span class="sect1">2.2. <a href=
@ -637,12 +637,12 @@ alink="#0000FF">
<code class="literal">saké</code>, <code class= <code class="literal">saké</code>, <code class=
"literal">mate</code> / <code class= "literal">mate</code> / <code class=
"literal">maté</code>).</p> "literal">maté</code>).</p>
<p><span class="application">Recoll</span> versions 1.18 <p><span class="application">Recoll</span> can optionally
and newer can optionally store the raw terms, without store the raw terms, without accent stripping or case
accent stripping or case conversion. In this configuration, conversion. In this configuration, default searches will
default searches will behave as before, but it is possible behave as before, but it is possible to perform searches
to perform searches sensitive to case and diacritics. This sensitive to case and diacritics. This is described in more
is described in more detail in the <a class="link" href= detail in the <a class="link" href=
"#RCL.INDEXING.CONFIG.SENS" title= "#RCL.INDEXING.CONFIG.SENS" title=
"2.3.2.&nbsp;Index case and diacritics sensitivity">section "2.3.2.&nbsp;Index case and diacritics sensitivity">section
about index case and diacritics sensitivity</a>.</p> about index case and diacritics sensitivity</a>.</p>
@ -783,7 +783,7 @@ alink="#0000FF">
</div> </div>
</div> </div>
<p><span class="application">Recoll</span> indexing can <p><span class="application">Recoll</span> indexing can
be performed along two different modes:</p> be performed along two main modes:</p>
<div class="itemizedlist"> <div class="itemizedlist">
<ul class="itemizedlist" style= <ul class="itemizedlist" style=
"list-style-type: disc;"> "list-style-type: disc;">
@ -807,10 +807,8 @@ alink="#0000FF">
as a file is created or changed. <span class= as a file is created or changed. <span class=
"command"><strong>recollindex</strong></span> runs "command"><strong>recollindex</strong></span> runs
as a daemon and uses a file system alteration as a daemon and uses a file system alteration
monitor such as <span class= monitor (e.g. <span class=
"application">inotify</span>, <span class= "application">inotify</span>) to detect file
"application">Fam</span> or <span class=
"application">Gamin</span> to detect file
changes.</p> changes.</p>
</li> </li>
</ul> </ul>
@ -821,6 +819,14 @@ alink="#0000FF">
documentation directory, and real time indexing on a documentation directory, and real time indexing on a
small home directory). Monitoring a big file system tree small home directory). Monitoring a big file system tree
can consume significant system resources.</p> can consume significant system resources.</p>
<p>With <span class="application">Recoll</span> 1.25 and
newer, it is also possible to set up an index so that
only a subset of the tree will be monitored and the rest
will be covered by batch/incremental indexing. (See the
details in the <a class="link" href=
"#RCL.INDEXING.MONITOR" title=
"2.9.&nbsp;Real time indexing">Real time indexing</a>
section.)</p>
<p>The choice of method and the parameters used can be <p>The choice of method and the parameters used can be
configured from the <span class= configured from the <span class=
"command"><strong>recoll</strong></span> GUI: "command"><strong>recoll</strong></span> GUI:
@ -834,12 +840,13 @@ alink="#0000FF">
later restart of indexing will mostly resume from where later restart of indexing will mostly resume from where
things stopped (the file tree walk has to be restarted things stopped (the file tree walk has to be restarted
from the beginning).</p> from the beginning).</p>
<p>When the real time indexer is running, only a stop <p>When the real time indexer is running, two operations
operation is available from the menu. When no indexing is are available from the menu: 'Stop' and 'Trigger
running, you have a choice of updating the index or incremental pass'. When no indexing is running, you have
rebuilding it (the first choice only processes changed a choice of updating the index or rebuilding it (the
files, the second one zeroes the index before starting so first choice only processes changed files, the second one
that all files are processed).</p> zeroes the index before starting so that all files are
processed).</p>
</div> </div>
<div class="sect2"> <div class="sect2">
<div class="titlepage"> <div class="titlepage">
@ -910,8 +917,8 @@ alink="#0000FF">
<div class="titlepage"> <div class="titlepage">
<div> <div>
<div> <div>
<h3 class="title"><a name="idm222" id= <h3 class="title"><a name="idm223" id=
"idm222"></a>2.1.3.&nbsp;Document types</h3> "idm223"></a>2.1.3.&nbsp;Document types</h3>
</div> </div>
</div> </div>
</div> </div>
@ -1008,8 +1015,8 @@ alink="#0000FF">
<div class="titlepage"> <div class="titlepage">
<div> <div>
<div> <div>
<h3 class="title"><a name="idm263" id= <h3 class="title"><a name="idm264" id=
"idm263"></a>2.1.4.&nbsp;Indexing failures</h3> "idm264"></a>2.1.4.&nbsp;Indexing failures</h3>
</div> </div>
</div> </div>
</div> </div>
@ -1044,8 +1051,8 @@ alink="#0000FF">
<div class="titlepage"> <div class="titlepage">
<div> <div>
<div> <div>
<h3 class="title"><a name="idm275" id= <h3 class="title"><a name="idm276" id=
"idm275"></a>2.1.5.&nbsp;Recovery</h3> "idm276"></a>2.1.5.&nbsp;Recovery</h3>
</div> </div>
</div> </div>
</div> </div>
@ -2111,7 +2118,7 @@ alink="#0000FF">
"application">X11</span> session monitoring (else the "application">X11</span> session monitoring (else the
daemon will not start).</p> daemon will not start).</p>
<p>By default, the messages from the indexing daemon will <p>By default, the messages from the indexing daemon will
be setn to the same file as those from the interactive be sent to the same file as those from the interactive
commands (<code class="literal">logfilename</code>). You commands (<code class="literal">logfilename</code>). You
may want to change this by setting the <code class= may want to change this by setting the <code class=
"varname">daemlogfilename</code> and <code class= "varname">daemlogfilename</code> and <code class=
@ -2138,6 +2145,18 @@ alink="#0000FF">
system resources. You probably do not want to enable it if system resources. You probably do not want to enable it if
your system is short on resources. Periodic indexing is your system is short on resources. Periodic indexing is
adequate in most cases.</p> adequate in most cases.</p>
<p>As of <span class="application">Recoll</span> 1.25, you
can set the <a class="link" href=
"#RCL.INSTALL.CONFIG.RECOLLCONF.MONITORDIRS">monitordirs</a>
configuration variable to specify that only a subset of
your indexed files will be monitored for instant indexing.
In this situation, an incremental pass on the full tree can
be triggered by either restarting the indexer, or just
running the <span class=
"command"><strong>recollindex</strong></span> command, which will
notify the running process. The <span class=
"command"><strong>recoll</strong></span> GUI also has a
menu entry for this.</p>
<div class="note" style= <div class="note" style=
"margin-left: 0.5in; margin-right: 0.5in;"> "margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">Increasing resources for inotify</h3> <h3 class="title">Increasing resources for inotify</h3>
@ -7985,6 +8004,17 @@ for i in range(nres):
of the followLinks variable.</p> of the followLinks variable.</p>
</dd> </dd>
<dt><a name= <dt><a name=
"RCL.INSTALL.CONFIG.RECOLLCONF.MONITORDIRS" id=
"RCL.INSTALL.CONFIG.RECOLLCONF.MONITORDIRS"></a><span class="term"><code class="varname">monitordirs</code></span></dt>
<dd>
<p>(1.25) Space-separated list of files or
directories to monitor for updates. When running
the real-time indexer, this allows monitoring
only a subset of the whole indexed area. The
elements must be included in the tree defined by
the 'topdirs' members.</p>
</dd>
<dt><a name=
"RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDNAMES" id= "RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDNAMES" id=
"RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDNAMES"></a><span class="term"><code class="varname">skippedNames</code></span></dt> "RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDNAMES"></a><span class="term"><code class="varname">skippedNames</code></span></dt>
<dd> <dd>
@ -8931,6 +8961,17 @@ for i in range(nres):
have custom fields.</p> have custom fields.</p>
</dd> </dd>
<dt><a name= <dt><a name=
"RCL.INSTALL.CONFIG.RECOLLCONF.IDXTEXTTRUNCATELEN"
id=
"RCL.INSTALL.CONFIG.RECOLLCONF.IDXTEXTTRUNCATELEN"></a><span class="term"><code class="varname">idxtexttruncatelen</code></span></dt>
<dd>
<p>Truncation length for all document texts. Only
index the beginning of documents. This is not
recommended except if you are sure that the
interesting keywords are at the top and have
severe disk space issues.</p>
</dd>
<dt><a name=
"RCL.INSTALL.CONFIG.RECOLLCONF.ASPELLLANGUAGE" id= "RCL.INSTALL.CONFIG.RECOLLCONF.ASPELLLANGUAGE" id=
"RCL.INSTALL.CONFIG.RECOLLCONF.ASPELLLANGUAGE"></a><span class="term"><code class="varname">aspellLanguage</code></span></dt> "RCL.INSTALL.CONFIG.RECOLLCONF.ASPELLLANGUAGE"></a><span class="term"><code class="varname">aspellLanguage</code></span></dt>
<dd> <dd>


@ -226,13 +226,12 @@
diacritics (<literal>sake</literal> / <literal>saké</literal>, diacritics (<literal>sake</literal> / <literal>saké</literal>,
<literal>mate</literal> / <literal>maté</literal>).</para> <literal>mate</literal> / <literal>maté</literal>).</para>
<para>&RCL; versions 1.18 and newer can optionally store the raw <para>&RCL; can optionally store the raw terms, without accent
terms, without accent stripping or case conversion. In this stripping or case conversion. In this configuration, default searches
configuration, default searches will behave as before, but it is will behave as before, but it is possible to perform searches
possible to perform searches sensitive to case and sensitive to case and diacritics. This is described in more detail in
diacritics. This is described in more detail the <link linkend="RCL.INDEXING.CONFIG.SENS">section about index case
in the <link linkend="RCL.INDEXING.CONFIG.SENS">section about index and diacritics sensitivity</link>.</para>
case and diacritics sensitivity</link>.</para>
<para>&RCL; has many parameters which define exactly what to <para>&RCL; has many parameters which define exactly what to
index, and how to classify and decode the source index, and how to classify and decode the source
@ -327,7 +326,7 @@
<sect2 id="RCL.INDEXING.INTRODUCTION.MODES"> <sect2 id="RCL.INDEXING.INTRODUCTION.MODES">
<title>Indexing modes</title> <title>Indexing modes</title>
<para>&RCL; indexing can be performed along two different modes: <para>&RCL; indexing can be performed along two main modes:
<itemizedlist> <itemizedlist>
<listitem> <listitem>
<formalpara> <formalpara>
@ -343,18 +342,16 @@
</listitem> </listitem>
<listitem> <listitem>
<formalpara><title><link linkend="RCL.INDEXING.MONITOR">Real <formalpara><title><link linkend="RCL.INDEXING.MONITOR">Real
time indexing:</link></title> time indexing:</link></title> <para>indexing takes place as
<para>indexing takes place as soon as a file is created or soon as a file is created or
changed. <command>recollindex</command> runs as a daemon changed. <command>recollindex</command> runs as a daemon and
and uses a file system alteration monitor such as uses a file system alteration monitor
<application>inotify</application>, (e.g. <application>inotify</application>) to detect file
<application>Fam</application> or changes.</para> </formalpara>
<application>Gamin</application>
to detect file changes.</para>
</formalpara>
</listitem> </listitem>
</itemizedlist> </itemizedlist>
</para> </para>
<para>The choice between the two methods is mostly a matter of <para>The choice between the two methods is mostly a matter of
preference, and they can be combined by setting up multiple preference, and they can be combined by setting up multiple
indexes (ie: use periodic indexing on a big documentation indexes (ie: use periodic indexing on a big documentation
@ -362,6 +359,12 @@
directory). Monitoring a big file system tree can consume directory). Monitoring a big file system tree can consume
significant system resources.</para> significant system resources.</para>
<para>With &RCL; 1.25 and newer, it is also possible to set up an
index so that only a subset of the tree will be monitored and the
rest will be covered by batch/incremental indexing. (See the
details in the <link linkend="RCL.INDEXING.MONITOR">Real time
indexing</link> section.)</para>
<para>The choice of method and the parameters used can be <para>The choice of method and the parameters used can be
configured from the <command>recoll</command> GUI: configured from the <command>recoll</command> GUI:
<menuchoice> <menuchoice>
@ -378,11 +381,12 @@
mostly resume from where things stopped (the file tree walk has to mostly resume from where things stopped (the file tree walk has to
be restarted from the beginning).</para> be restarted from the beginning).</para>
<para>When the real time indexer is running, only a stop operation <para>When the real time indexer is running, two operations are
is available from the menu. When no indexing is running, you have available from the menu: 'Stop' and 'Trigger incremental pass'.
a choice of updating the index or rebuilding it (the first choice When no indexing is running, you have a choice of updating the
only processes changed files, the second one zeroes the index index or rebuilding it (the first choice only processes changed
before starting so that all files are processed).</para> files, the second one zeroes the index before starting so that all
files are processed).</para>
</sect2> </sect2>
@ -1456,7 +1460,7 @@
session monitoring (else the daemon will not start).</para> session monitoring (else the daemon will not start).</para>
<para>By default, the messages from the indexing daemon will be <para>By default, the messages from the indexing daemon will be
setn to the same file as those from the interactive commands sent to the same file as those from the interactive commands
(<literal>logfilename</literal>). You may want to change this (<literal>logfilename</literal>). You may want to change this
by setting the <varname>daemlogfilename</varname> and by setting the <varname>daemlogfilename</varname> and
<varname>daemloglevel</varname> configuration parameters. Also <varname>daemloglevel</varname> configuration parameters. Also
@ -1482,6 +1486,17 @@
your system is short on resources. Periodic indexing is your system is short on resources. Periodic indexing is
adequate in most cases.</para> adequate in most cases.</para>
<para>As of &RCL; 1.25, you can set the <link
linkend="RCL.INSTALL.CONFIG.RECOLLCONF.MONITORDIRS">monitordirs</link>
configuration variable to specify that only a subset of your indexed
files will be monitored for instant indexing. In this situation, an
incremental pass on the full tree can be triggered by either
restarting the indexer, or just running the
<command>recollindex</command> command, which will notify the running
process. The <command>recoll</command> GUI also has a menu entry for
this.</para>
<note><title>Increasing resources for inotify</title> <note><title>Increasing resources for inotify</title>
<para>On Linux systems, monitoring a big tree may need <para>On Linux systems, monitoring a big tree may need
increasing the resources available to inotify, which are increasing the resources available to inotify, which are


@ -20,6 +20,13 @@
# independently of the value of the followLinks variable.</descr></var> # independently of the value of the followLinks variable.</descr></var>
topdirs = ~ topdirs = ~
# <var name="monitordirs" type="string"><brief>(1.25) Space-separated list of
# files or directories to monitor for updates.</brief><descr>When running
# the real-time indexer, this allows monitoring only a subset of the whole
# indexed area. The elements must be included in the tree defined by the
# 'topdirs' members.</descr></var>
#monitordirs=
# <var name="skippedNames" type="string"> # <var name="skippedNames" type="string">
# #
# <brief>Files and directories which should be ignored.</brief> <descr> # <brief>Files and directories which should be ignored.</brief> <descr>
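
Note on the monitordirs constraint: the new documentation says that each monitordirs element must lie inside the tree defined by the topdirs members. The following is a minimal sketch of how a configuration could be checked against that rule before starting the real-time indexer. It is not part of Recoll. The function name and the sample paths are hypothetical, the values are assumed to be plain space-separated absolute POSIX paths without embedded spaces or quoting, and the comparison is purely lexical (symbolic links and ".." components are not resolved).

```python
from pathlib import PurePosixPath


def entries_outside_topdirs(topdirs, monitordirs):
    """Return the monitordirs entries that are not inside any topdirs tree.

    Both arguments are space-separated path lists, as in the values of the
    'topdirs' and 'monitordirs' variables. Hypothetical helper: the check is
    lexical only and assumes absolute paths without embedded spaces.
    """
    tops = [PurePosixPath(t) for t in topdirs.split()]
    outside = []
    for entry in monitordirs.split():
        path = PurePosixPath(entry)
        # An entry is acceptable if it is a top directory itself
        # or if one of its ancestors is a top directory.
        if not any(path == top or top in path.parents for top in tops):
            outside.append(entry)
    return outside


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    bad = entries_outside_topdirs("/home/me /srv/docs", "/home/me/work /etc")
    print(bad)
```

An empty result means every monitored element is covered by the indexed area. Anything returned would be monitored for changes without belonging to the tree that the index is built from, which is what the documentation rules out.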