doc
parent
55e2fe5d27
commit
6f1be83251
@ -35,7 +35,7 @@ alink="#0000FF">
|
|||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
<div>
|
<div>
|
||||||
<p class="copyright">Copyright © 2005-2015 Jean-Francois
|
<p class="copyright">Copyright © 2005-2018 Jean-Francois
|
||||||
Dockes</p>
|
Dockes</p>
|
||||||
</div>
|
</div>
|
||||||
<div>
|
<div>
|
||||||
@ -92,11 +92,11 @@ alink="#0000FF">
|
|||||||
"#RCL.INDEXING.INTRODUCTION.CONFIG">Configurations,
|
"#RCL.INDEXING.INTRODUCTION.CONFIG">Configurations,
|
||||||
multiple indexes</a></span></dt>
|
multiple indexes</a></span></dt>
|
||||||
<dt><span class="sect2">2.1.3. <a href=
|
<dt><span class="sect2">2.1.3. <a href=
|
||||||
"#idm223">Document types</a></span></dt>
|
"#idm224">Document types</a></span></dt>
|
||||||
<dt><span class="sect2">2.1.4. <a href=
|
<dt><span class="sect2">2.1.4. <a href=
|
||||||
"#idm264">Indexing failures</a></span></dt>
|
"#idm265">Indexing failures</a></span></dt>
|
||||||
<dt><span class="sect2">2.1.5. <a href=
|
<dt><span class="sect2">2.1.5. <a href=
|
||||||
"#idm276">Recovery</a></span></dt>
|
"#idm277">Recovery</a></span></dt>
|
||||||
</dl>
|
</dl>
|
||||||
</dd>
|
</dd>
|
||||||
<dt><span class="sect1">2.2. <a href=
|
<dt><span class="sect1">2.2. <a href=
|
||||||
@ -176,9 +176,11 @@ alink="#0000FF">
|
|||||||
<dd>
|
<dd>
|
||||||
<dl>
|
<dl>
|
||||||
<dt><span class="sect2">2.9.1. <a href=
|
<dt><span class="sect2">2.9.1. <a href=
|
||||||
"#RCL.INDEXING.MONITOR.FASTFILES">Slowing down the
|
"#RCL.INDEXING.MONITOR.START">Real time indexing:
|
||||||
reindexing rate for fast changing
|
automatic daemon start</a></span></dt>
|
||||||
files</a></span></dt>
|
<dt><span class="sect2">2.9.2. <a href=
|
||||||
|
"#RCL.INDEXING.MONITOR.DETAILS">Real time indexing:
|
||||||
|
miscellaneous details</a></span></dt>
|
||||||
</dl>
|
</dl>
|
||||||
</dd>
|
</dd>
|
||||||
</dl>
|
</dl>
|
||||||
@ -481,9 +483,8 @@ alink="#0000FF">
|
|||||||
"guimenuitem">Indexing configuration</span>, then adjust
|
"guimenuitem">Indexing configuration</span>, then adjust
|
||||||
the <span class="guilabel">Top directories</span>
|
the <span class="guilabel">Top directories</span>
|
||||||
section).</p>
|
section).</p>
|
||||||
<p>Also be aware that, on Unix/Linux, you may need to
|
<p>On Unix/Linux, you may need to install the appropriate
|
||||||
install the appropriate <a class="link" href=
|
<a class="link" href="#RCL.INSTALL.EXTERNAL" title=
|
||||||
"#RCL.INSTALL.EXTERNAL" title=
|
|
||||||
"6.2. Supporting packages">supporting applications</a>
|
"6.2. Supporting packages">supporting applications</a>
|
||||||
for document types that need them (for example <span class=
|
for document types that need them (for example <span class=
|
||||||
"application">antiword</span> for <span class=
|
"application">antiword</span> for <span class=
|
||||||
@ -594,9 +595,10 @@ alink="#0000FF">
|
|||||||
"application">Recoll</span> can only display documents that
|
"application">Recoll</span> can only display documents that
|
||||||
still exist at the place from which they were indexed.
|
still exist at the place from which they were indexed.
|
||||||
(Actually, there is a way to reconstruct a document from
|
(Actually, there is a way to reconstruct a document from
|
||||||
the information in the index, but the result is not nice,
|
the information in the index, but only the pure text is
|
||||||
as all formatting, punctuation and capitalization are
|
saved, possibly without punctuation and capitalization,
|
||||||
lost).</p>
|
depending on <span class="application">Recoll</span>
|
||||||
|
version).</p>
|
||||||
<p><span class="application">Recoll</span> stores all
|
<p><span class="application">Recoll</span> stores all
|
||||||
internal data in <span class="application">Unicode
|
internal data in <span class="application">Unicode
|
||||||
UTF-8</span> format, and it can index files of many types
|
UTF-8</span> format, and it can index files of many types
|
||||||
@ -796,11 +798,10 @@ alink="#0000FF">
|
|||||||
<li class="listitem">
|
<li class="listitem">
|
||||||
<p><b><a class="link" href="#RCL.INDEXING.PERIODIC"
|
<p><b><a class="link" href="#RCL.INDEXING.PERIODIC"
|
||||||
title="2.8. Periodic indexing">Periodic (or
|
title="2.8. Periodic indexing">Periodic (or
|
||||||
batch) indexing:</a> </b>indexing takes place
|
batch) indexing:</a> </b><span class=
|
||||||
at discrete times, by executing the <span class=
|
"command"><strong>recollindex</strong></span> is
|
||||||
"command"><strong>recollindex</strong></span>
|
executed at discrete times. The typical usage is to
|
||||||
command. The typical usage is to have a nightly
|
have a nightly run <a class="link" href=
|
||||||
indexing run <a class="link" href=
|
|
||||||
"#RCL.INDEXING.PERIODIC.AUTOMAT" title=
|
"#RCL.INDEXING.PERIODIC.AUTOMAT" title=
|
||||||
"2.8.2. Using cron to automate indexing">programmed</a>
|
"2.8.2. Using cron to automate indexing">programmed</a>
|
||||||
into your <span class=
|
into your <span class=
|
||||||
@ -809,13 +810,13 @@ alink="#0000FF">
|
|||||||
<li class="listitem">
|
<li class="listitem">
|
||||||
<p><b><a class="link" href="#RCL.INDEXING.MONITOR"
|
<p><b><a class="link" href="#RCL.INDEXING.MONITOR"
|
||||||
title="2.9. Real time indexing">Real time
|
title="2.9. Real time indexing">Real time
|
||||||
indexing:</a> </b>indexing takes place as soon
|
indexing:</a> </b><span class=
|
||||||
as a file is created or changed. <span class=
|
|
||||||
"command"><strong>recollindex</strong></span> runs
|
"command"><strong>recollindex</strong></span> runs
|
||||||
as a daemon and uses a file system alteration
|
permanently as a daemon and uses a file system
|
||||||
monitor (e.g. <span class=
|
alteration monitor (e.g. <span class=
|
||||||
"application">inotify</span>) to detect file
|
"application">inotify</span>) to detect file
|
||||||
changes.</p>
|
changes. New or updated files are indexed at
|
||||||
|
once.</p>
|
||||||
</li>
|
</li>
|
||||||
</ul>
|
</ul>
|
||||||
</div>
|
</div>
|
||||||
@ -825,7 +826,7 @@ alink="#0000FF">
|
|||||||
documentation directory, and real time indexing on a
|
documentation directory, and real time indexing on a
|
||||||
small home directory). Monitoring a big file system tree
|
small home directory). Monitoring a big file system tree
|
||||||
can consume significant system resources.</p>
|
can consume significant system resources.</p>
|
||||||
<p>With <span class="application">Recoll</span> 1.25 and
|
<p>With <span class="application">Recoll</span> 1.24 and
|
||||||
newer, it is also possible to set up an index so that
|
newer, it is also possible to set up an index so that
|
||||||
only a subset of the tree will be monitored and the rest
|
only a subset of the tree will be monitored and the rest
|
||||||
will be covered by batch/incremental indexing. (See the
|
will be covered by batch/incremental indexing. (See the
|
||||||
@ -838,9 +839,9 @@ alink="#0000FF">
|
|||||||
"command"><strong>recoll</strong></span> GUI:
|
"command"><strong>recoll</strong></span> GUI:
|
||||||
<span class="guimenu">Preferences</span> → <span class=
|
<span class="guimenu">Preferences</span> → <span class=
|
||||||
"guimenuitem">Indexing schedule</span></p>
|
"guimenuitem">Indexing schedule</span></p>
|
||||||
<p>The <span class="guimenu">File</span> menu also has
|
<p>The GUI <span class="guimenu">File</span> menu also
|
||||||
entries to start or stop the current indexing operation.
|
has entries to start or stop the current indexing
|
||||||
Stopping indexing is performed by killing the
|
operation. Stopping indexing is performed by killing the
|
||||||
<span class="command"><strong>recollindex</strong></span>
|
<span class="command"><strong>recollindex</strong></span>
|
||||||
process, which will checkpoint its state and exit. A
|
process, which will checkpoint its state and exit. A
|
||||||
later restart of indexing will mostly resume from where
|
later restart of indexing will mostly resume from where
|
||||||
@ -900,7 +901,7 @@ alink="#0000FF">
|
|||||||
<p>When generating indexes, the different configurations
|
<p>When generating indexes, the different configurations
|
||||||
are entirely independent (no parameters are ever shared
|
are entirely independent (no parameters are ever shared
|
||||||
between configurations when indexing).</p>
|
between configurations when indexing).</p>
|
||||||
<p>Multiple indexes can queryied concurrently, either
|
<p>Multiple indexes can be queried concurrently, either
|
||||||
from the GUI or the command line. When doing this, there
|
from the GUI or the command line. When doing this, there
|
||||||
is always a main configuration, from which both
|
is always a main configuration, from which both
|
||||||
configuration and index data are used. Only the index
|
configuration and index data are used. Only the index
|
||||||
@ -923,8 +924,8 @@ alink="#0000FF">
|
|||||||
<div class="titlepage">
|
<div class="titlepage">
|
||||||
<div>
|
<div>
|
||||||
<div>
|
<div>
|
||||||
<h3 class="title"><a name="idm223" id=
|
<h3 class="title"><a name="idm224" id=
|
||||||
"idm223"></a>2.1.3. Document types</h3>
|
"idm224"></a>2.1.3. Document types</h3>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
@ -943,10 +944,10 @@ alink="#0000FF">
|
|||||||
<span class="application">LibreOffice</span> document
|
<span class="application">LibreOffice</span> document
|
||||||
stored as an attachment to an email message inside an
|
stored as an attachment to an email message inside an
|
||||||
email folder archived in a zip file...</p>
|
email folder archived in a zip file...</p>
|
||||||
<p><span class="application">Recoll</span> indexing
|
<p><span class=
|
||||||
processes plain text, HTML, OpenDocument
|
"command"><strong>recollindex</strong></span> processes
|
||||||
(Open/LibreOffice), email formats, and a few others
|
plain text, HTML, OpenDocument (Open/LibreOffice), email
|
||||||
internally.</p>
|
formats, and a few others internally.</p>
|
||||||
<p>Other file types (e.g. postscript, pdf, ms-word, rtf
|
<p>Other file types (e.g. postscript, pdf, ms-word, rtf
|
||||||
...) need external applications for preprocessing. The
|
...) need external applications for preprocessing. The
|
||||||
list is in the <a class="link" href=
|
list is in the <a class="link" href=
|
||||||
@ -967,15 +968,15 @@ alink="#0000FF">
|
|||||||
to either exclude some types, or on the contrary define a
|
to either exclude some types, or on the contrary define a
|
||||||
positive list of types to be indexed. In the latter case,
|
positive list of types to be indexed. In the latter case,
|
||||||
any type not in the list will be ignored.</p>
|
any type not in the list will be ignored.</p>
|
||||||
<p>Excluding file types can be done by adding wildcard
|
<p>Excluding files by name can be done by adding wildcard
|
||||||
name patterns to the <a class="link" href=
|
name patterns to the <a class="link" href=
|
||||||
"#RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDNAMES">skippedNames</a>
|
"#RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDNAMES">skippedNames</a>
|
||||||
list, which can be done from the GUI Index configuration
|
list, which can be done from the GUI Index configuration
|
||||||
menu. For versions 1.20 and later, you can alternatively
|
menu. Excluding by type can be done by setting the
|
||||||
set the <a class="link" href=
|
<a class="link" href=
|
||||||
"#RCL.INSTALL.CONFIG.RECOLLCONF.EXCLUDEDMIMETYPES">excludedmimetypes</a>
|
"#RCL.INSTALL.CONFIG.RECOLLCONF.EXCLUDEDMIMETYPES">excludedmimetypes</a>
|
||||||
list in the configuration file. This can be redefined for
|
list in the configuration file (1.20 and later). This can
|
||||||
subdirectories.</p>
|
be redefined for subdirectories.</p>
|
||||||
<p>You can also define an exclusive list of MIME types to
|
<p>You can also define an exclusive list of MIME types to
|
||||||
be indexed (no others will be indexed), by setting the
|
be indexed (no others will be indexed), by setting the
|
||||||
<a class="link" href=
|
<a class="link" href=
|
||||||
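The hunk above describes excluding files by adding wildcard name patterns to the skippedNames list. As a rough illustration of that idea only, the sketch below filters a list of file names against a few wildcard patterns using Python's standard `fnmatch` module. The patterns, the file names and the `is_skipped` helper are all hypothetical, and this is not Recoll's own matching code, whose exact rules may differ.

```python
from fnmatch import fnmatch

# Hypothetical patterns, written in the style of a skippedNames list.
SKIPPED_NAMES = ["*.tmp", "*~", ".git", "cache*"]


def is_skipped(filename, patterns=SKIPPED_NAMES):
    """Return True if the bare file name matches any wildcard pattern."""
    return any(fnmatch(filename, pattern) for pattern in patterns)


# Hypothetical directory listing used to exercise the filter.
names = ["notes.txt", "notes.txt~", "build.tmp", ".git", "cache-01", "report.pdf"]
kept = [name for name in names if not is_skipped(name)]
print(kept)
```

Matching here is done on the file name alone, not on the full path, which is the usual meaning of a name pattern; excluding by MIME type (excludedmimetypes) is a separate mechanism that this sketch does not cover.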
@ -1021,8 +1022,8 @@ alink="#0000FF">
|
|||||||
<div class="titlepage">
|
<div class="titlepage">
|
||||||
<div>
|
<div>
|
||||||
<div>
|
<div>
|
||||||
<h3 class="title"><a name="idm264" id=
|
<h3 class="title"><a name="idm265" id=
|
||||||
"idm264"></a>2.1.4. Indexing failures</h3>
|
"idm265"></a>2.1.4. Indexing failures</h3>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
@ -1039,7 +1040,7 @@ alink="#0000FF">
|
|||||||
may be quite costly (for example failing to uncompress a
|
may be quite costly (for example failing to uncompress a
|
||||||
big file because of insufficient disk space).</p>
|
big file because of insufficient disk space).</p>
|
||||||
<p>The indexer in <span class="application">Recoll</span>
|
<p>The indexer in <span class="application">Recoll</span>
|
||||||
versions 1.21 and later does not retry failed file by
|
versions 1.21 and later does not retry failed files by
|
||||||
default. Retrying will only occur if an explicit option
|
default. Retrying will only occur if an explicit option
|
||||||
(<code class="option">-k</code>) is set on the
|
(<code class="option">-k</code>) is set on the
|
||||||
<span class="command"><strong>recollindex</strong></span>
|
<span class="command"><strong>recollindex</strong></span>
|
||||||
@ -1057,8 +1058,8 @@ alink="#0000FF">
|
|||||||
<div class="titlepage">
|
<div class="titlepage">
|
||||||
<div>
|
<div>
|
||||||
<div>
|
<div>
|
||||||
<h3 class="title"><a name="idm276" id=
|
<h3 class="title"><a name="idm277" id=
|
||||||
"idm276"></a>2.1.5. Recovery</h3>
|
"idm277"></a>2.1.5. Recovery</h3>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
@ -1153,9 +1154,9 @@ alink="#0000FF">
|
|||||||
non-indexed data (an extreme example being a set of mp3
|
non-indexed data (an extreme example being a set of mp3
|
||||||
files where only the tags would be indexed).</p>
|
files where only the tags would be indexed).</p>
|
||||||
<p>Of course, images, sound and video do not increase the
|
<p>Of course, images, sound and video do not increase the
|
||||||
index size, which means that nowadays, typically, even a
|
index size, which means that typically, even a big index
|
||||||
big index will be negligible against the total amount of
|
will be negligible against the total amount of data on the
|
||||||
data on the computer.</p>
|
computer.</p>
|
||||||
<p>The index data directory (<code class=
|
<p>The index data directory (<code class=
|
||||||
"filename">xapiandb</code>) only contains data that can be
|
"filename">xapiandb</code>) only contains data that can be
|
||||||
completely rebuilt by an index run (as long as the original
|
completely rebuilt by an index run (as long as the original
|
||||||
@ -1200,10 +1201,11 @@ alink="#0000FF">
|
|||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
<p>The <span class="application">Recoll</span> index does
|
<p>The <span class="application">Recoll</span> index does
|
||||||
not hold copies of the indexed documents. But it does
|
not hold complete copies of the indexed documents (it
|
||||||
hold enough data to allow for an almost complete
|
almost does after version 1.24). But it does hold enough
|
||||||
reconstruction. If confidential data is indexed, access
|
data to allow for an almost complete reconstruction. If
|
||||||
to the database directory should be restricted.</p>
|
confidential data is indexed, access to the database
|
||||||
|
directory should be restricted.</p>
|
||||||
<p><span class="application">Recoll</span> will create
|
<p><span class="application">Recoll</span> will create
|
||||||
the configuration directory with a mode of 0700 (access
|
the configuration directory with a mode of 0700 (access
|
||||||
by owner only). As the index data directory is by default
|
by owner only). As the index data directory is by default
|
||||||
@ -1256,8 +1258,7 @@ alink="#0000FF">
|
|||||||
"refentrytitle">recoll.conf</span>(5)</span> man page, but
|
"refentrytitle">recoll.conf</span>(5)</span> man page, but
|
||||||
the most current information will most likely be the
|
the most current information will most likely be the
|
||||||
comments inside the sample file. The most immediately
|
comments inside the sample file. The most immediately
|
||||||
useful variable you may interested in is probably <a class=
|
useful variable is probably <a class="link" href=
|
||||||
"link" href=
|
|
||||||
"#RCL.INSTALL.CONFIG.RECOLLCONF.TOPDIRS"><code class=
|
"#RCL.INSTALL.CONFIG.RECOLLCONF.TOPDIRS"><code class=
|
||||||
"varname">topdirs</code></a>, which determines what
|
"varname">topdirs</code></a>, which determines what
|
||||||
subtrees and files get indexed.</p>
|
subtrees and files get indexed.</p>
|
||||||
@ -1271,9 +1272,8 @@ alink="#0000FF">
|
|||||||
Recoll indexes, depending on the treatment of character
|
Recoll indexes, depending on the treatment of character
|
||||||
case and diacritics. A <a class="link" href=
|
case and diacritics. A <a class="link" href=
|
||||||
"#RCL.INDEXING.CONFIG.SENS" title=
|
"#RCL.INDEXING.CONFIG.SENS" title=
|
||||||
"2.3.2. Index case and diacritics sensitivity">a
|
"2.3.2. Index case and diacritics sensitivity">further
|
||||||
further section</a> describes the two types in more
|
section</a> describes the two types in more detail.</p>
|
||||||
detail.</p>
|
|
||||||
<div class="sect2">
|
<div class="sect2">
|
||||||
<div class="titlepage">
|
<div class="titlepage">
|
||||||
<div>
|
<div>
|
||||||
@ -1317,7 +1317,7 @@ alink="#0000FF">
|
|||||||
where narrowing the search can improve the results. You
|
where narrowing the search can improve the results. You
|
||||||
can achieve approximately the same effect with the
|
can achieve approximately the same effect with the
|
||||||
directory filter in advanced search, but multiple indexes
|
directory filter in advanced search, but multiple indexes
|
||||||
will have much better performance and may be worth the
|
will have better performance and may be worth the
|
||||||
trouble.</p>
|
trouble.</p>
|
||||||
<p>A <span class=
|
<p>A <span class=
|
||||||
"command"><strong>recollindex</strong></span> program
|
"command"><strong>recollindex</strong></span> program
|
||||||
@ -1325,7 +1325,7 @@ alink="#0000FF">
|
|||||||
only use parameters from a single configuration (no
|
only use parameters from a single configuration (no
|
||||||
parameters are ever shared between configurations when
|
parameters are ever shared between configurations when
|
||||||
indexing).</p>
|
indexing).</p>
|
||||||
<p>Multiple indexes can queryied concurrently, either
|
<p>Multiple indexes can be queried concurrently, either
|
||||||
from the GUI or the command line. When doing this, there
|
from the GUI or the command line. When doing this, there
|
||||||
is always a main configuration, from which both
|
is always a main configuration, from which both
|
||||||
configuration and index data are used. Only the index
|
configuration and index data are used. Only the index
|
||||||
@ -2082,68 +2082,6 @@ alink="#0000FF">
|
|||||||
"command"><strong>recollindex</strong></span> will detach
|
"command"><strong>recollindex</strong></span> will detach
|
||||||
from the terminal and become a daemon, permanently
|
from the terminal and become a daemon, permanently
|
||||||
monitoring file changes and updating the index.</p>
|
monitoring file changes and updating the index.</p>
|
||||||
<p>Under <span class="application">KDE</span>, <span class=
|
|
||||||
"application">Gnome</span> and some other desktop
|
|
||||||
environments, the daemon can automatically started when you
|
|
||||||
log in, by creating a desktop file inside the <code class=
|
|
||||||
"filename">~/.config/autostart</code> directory. This can
|
|
||||||
be done for you by the <span class=
|
|
||||||
"application">Recoll</span> GUI. Use the <span class=
|
|
||||||
"guimenu">Preferences->Indexing Schedule</span>
|
|
||||||
menu.</p>
|
|
||||||
<p>With older <span class="application">X11</span> setups,
|
|
||||||
starting the daemon is normally performed as part of the
|
|
||||||
user session script.</p>
|
|
||||||
<p>The <code class="filename">rclmon.sh</code> script can
|
|
||||||
be used to easily start and stop the daemon. It can be
|
|
||||||
found in the <code class="filename">examples</code>
|
|
||||||
directory (typically <code class=
|
|
||||||
"filename">/usr/local/[share/]recoll/examples</code>).</p>
|
|
||||||
<p>For example, my out of fashion <span class=
|
|
||||||
"application">xdm</span>-based session has a <code class=
|
|
||||||
"filename">.xsession</code> script with the following lines
|
|
||||||
at the end:</p>
|
|
||||||
<pre class="programlisting">recollconf=$HOME/.recoll-home
|
|
||||||
recolldata=/usr/local/share/recoll
|
|
||||||
RECOLL_CONFDIR=$recollconf $recolldata/examples/rclmon.sh start
|
|
||||||
|
|
||||||
fvwm
|
|
||||||
|
|
||||||
</pre>
|
|
||||||
<p>The indexing daemon gets started, then the window
|
|
||||||
manager, for which the session waits.</p>
|
|
||||||
<p>By default the indexing daemon will monitor the state of
|
|
||||||
the X11 session, and exit when it finishes, it is not
|
|
||||||
necessary to kill it explicitly. (The <span class=
|
|
||||||
"application">X11</span> server monitoring can be disabled
|
|
||||||
with option <code class="option">-x</code> to <span class=
|
|
||||||
"command"><strong>recollindex</strong></span>).</p>
|
|
||||||
<p>If you use the daemon completely out of an <span class=
|
|
||||||
"application">X11</span> session, you need to add option
|
|
||||||
<code class="option">-x</code> to disable <span class=
|
|
||||||
"application">X11</span> session monitoring (else the
|
|
||||||
daemon will not start).</p>
|
|
||||||
<p>By default, the messages from the indexing daemon will
|
|
||||||
be sent to the same file as those from the interactive
|
|
||||||
commands (<code class="literal">logfilename</code>). You
|
|
||||||
may want to change this by setting the <code class=
|
|
||||||
"varname">daemlogfilename</code> and <code class=
|
|
||||||
"varname">daemloglevel</code> configuration parameters.
|
|
||||||
Also the log file will only be truncated when the daemon
|
|
||||||
starts. If the daemon runs permanently, the log file may
|
|
||||||
grow quite big, depending on the log level.</p>
|
|
||||||
<p>When building <span class="application">Recoll</span>,
|
|
||||||
the real time indexing support can be customised during
|
|
||||||
package <a class="link" href="#RCL.INSTALL.BUILDING" title=
|
|
||||||
"6.3. Building from source">configuration</a> with the
|
|
||||||
<code class="option">--with[out]-fam</code> or <code class=
|
|
||||||
"option">--with[out]-inotify</code> options. The default is
|
|
||||||
currently to include <span class=
|
|
||||||
"application">inotify</span> monitoring on systems that
|
|
||||||
support it, and, as of <span class=
|
|
||||||
"application">Recoll</span> 1.17, <span class=
|
|
||||||
"application">gamin</span> support on <span class=
|
|
||||||
"application">FreeBSD</span>.</p>
|
|
||||||
<p>While it is convenient that data is indexed in real
|
<p>While it is convenient that data is indexed in real
|
||||||
time, repeated indexing can generate a significant load on
|
time, repeated indexing can generate a significant load on
|
||||||
the system when files such as email folders change. Also,
|
the system when files such as email folders change. Also,
|
||||||
@ -2151,68 +2089,149 @@ alink="#0000FF">
|
|||||||
system resources. You probably do not want to enable it if
|
system resources. You probably do not want to enable it if
|
||||||
your system is short on resources. Periodic indexing is
|
your system is short on resources. Periodic indexing is
|
||||||
adequate in most cases.</p>
|
adequate in most cases.</p>
|
||||||
<p>As of <span class="application">Recoll</span> 1.25, you
|
<p>As of <span class="application">Recoll</span> 1.24, you
|
||||||
can set the <a class="link" href=
|
can set the <a class="link" href=
|
||||||
"#RCL.INSTALL.CONFIG.RECOLLCONF.MONITORDIRS">monitordirs</a>
|
"#RCL.INSTALL.CONFIG.RECOLLCONF.MONITORDIRS">monitordirs</a>
|
||||||
configuration variable to specify that only a subset of
|
configuration variable to specify that only a subset of
|
||||||
your indexed files will be monitored for instant indexing.
|
your indexed files will be monitored for instant indexing.
|
||||||
In this situation, an incremental pass on the full tree can
|
In this situation, an incremental pass on the full tree can
|
||||||
be triggered by either restarting the indexer, or just
|
be triggered by either restarting the indexer, or just
|
||||||
running the <span class=
|
running <span class=
|
||||||
"command"><strong>recollindex</strong></span>, which will
|
"command"><strong>recollindex</strong></span>, which will
|
||||||
notify the running process. The <span class=
|
notify the running process. The <span class=
|
||||||
"command"><strong>recoll</strong></span> GUI also has a
|
"command"><strong>recoll</strong></span> GUI also has a
|
||||||
menu entry for this.</p>
|
menu entry for this.</p>
|
||||||
<div class="note" style=
|
<div class="sect2">
|
||||||
"margin-left: 0.5in; margin-right: 0.5in;">
|
<div class="titlepage">
|
||||||
<h3 class="title">Increasing resources for inotify</h3>
|
<div>
|
||||||
<p>On Linux systems, monitoring a big tree may need
|
<div>
|
||||||
increasing the resources available to inotify, which are
|
<h3 class="title"><a name=
|
||||||
normally defined in <code class=
|
"RCL.INDEXING.MONITOR.START" id=
|
||||||
"filename">/etc/sysctl.conf</code>.</p>
|
"RCL.INDEXING.MONITOR.START"></a>2.9.1. Real
|
||||||
<pre class="programlisting">
|
time indexing: automatic daemon start</h3>
|
||||||
### inotify
|
</div>
|
||||||
#
|
</div>
|
||||||
# cat /proc/sys/fs/inotify/max_queued_events - 16384
|
</div>
|
||||||
# cat /proc/sys/fs/inotify/max_user_instances - 128
|
<p>Under <span class="application">KDE</span>,
|
||||||
# cat /proc/sys/fs/inotify/max_user_watches - 16384
|
<span class="application">Gnome</span> and some other
|
||||||
#
|
desktop environments, the daemon can be automatically
|
||||||
# -- Change to:
|
started when you log in, by creating a desktop file
|
||||||
#
|
inside the <code class=
|
||||||
fs.inotify.max_queued_events=32768
|
"filename">~/.config/autostart</code> directory. This can
|
||||||
fs.inotify.max_user_instances=256
|
be done for you by the <span class=
|
||||||
fs.inotify.max_user_watches=32768
|
"application">Recoll</span> GUI. Use the <span class=
|
||||||
</pre>
|
"guimenu">Preferences->Indexing Schedule</span>
|
||||||
<p>Especially, you will need to trim your tree or adjust
|
menu.</p>
|
||||||
the <code class="literal">max_user_watches</code> value
|
<p>With older <span class="application">X11</span>
|
||||||
if indexing exits with a message about errno <code class=
|
setups, starting the daemon is normally performed as part
|
||||||
"literal">ENOSPC</code> (28) from <code class=
|
of the user session script.</p>
|
||||||
"function">inotify_add_watch</code>.</p>
|
<p>The <code class="filename">rclmon.sh</code> script can
|
||||||
|
be used to easily start and stop the daemon. It can be
|
||||||
|
found in the <code class="filename">examples</code>
|
||||||
|
directory (typically <code class=
|
||||||
|
"filename">/usr/local/[share/]recoll/examples</code>).</p>
|
||||||
|
<p>For example, my out of fashion <span class=
|
||||||
|
"application">xdm</span>-based session has a <code class=
|
||||||
|
"filename">.xsession</code> script with the following
|
||||||
|
lines at the end:</p>
|
||||||
|
<pre class="programlisting">recollconf=$HOME/.recoll-home
|
||||||
|
recolldata=/usr/local/share/recoll
|
||||||
|
RECOLL_CONFDIR=$recollconf $recolldata/examples/rclmon.sh start
|
||||||
|
|
||||||
|
fvwm
|
||||||
|
|
||||||
|
</pre>
|
||||||
|
<p>The indexing daemon gets started, then the window
|
||||||
|
manager, for which the session waits.</p>
|
||||||
|
<p>By default the indexing daemon will monitor the state
|
||||||
|
of the X11 session, and exit when it finishes; it is not
|
||||||
|
necessary to kill it explicitly. (The <span class=
|
||||||
|
"application">X11</span> server monitoring can be
|
||||||
|
disabled with option <code class="option">-x</code> to
|
||||||
|
<span class=
|
||||||
|
"command"><strong>recollindex</strong></span>).</p>
|
||||||
|
<p>If you use the daemon completely out of an
|
||||||
|
<span class="application">X11</span> session, you need to
|
||||||
|
add option <code class="option">-x</code> to disable
|
||||||
|
<span class="application">X11</span> session monitoring
|
||||||
|
(else the daemon will not start).</p>
|
||||||
</div>
|
</div>
|
||||||
<div class="sect2">
|
<div class="sect2">
|
||||||
<div class="titlepage">
|
<div class="titlepage">
|
||||||
<div>
|
<div>
|
||||||
<div>
|
<div>
|
||||||
<h3 class="title"><a name=
|
<h3 class="title"><a name=
|
||||||
"RCL.INDEXING.MONITOR.FASTFILES" id=
|
"RCL.INDEXING.MONITOR.DETAILS" id=
|
||||||
"RCL.INDEXING.MONITOR.FASTFILES"></a>2.9.1. Slowing
|
"RCL.INDEXING.MONITOR.DETAILS"></a>2.9.2. Real
|
||||||
down the reindexing rate for fast changing
|
time indexing: miscellaneous details</h3>
|
||||||
files</h3>
|
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
<p>When using the real time monitor, it may happen that
|
<p>By default, the messages from the indexing daemon will
|
||||||
some files need to be indexed, but change so often that
|
be sent to the same file as those from the interactive
|
||||||
they impose an excessive load for the system.</p>
|
commands (<code class="literal">logfilename</code>). You
|
||||||
<p><span class="application">Recoll</span> provides a
|
may want to change this by setting the <code class=
|
||||||
configuration option to specify the minimum time before
|
"varname">daemlogfilename</code> and <code class=
|
||||||
which a file, specified by a wildcard pattern, cannot be
|
"varname">daemloglevel</code> configuration parameters.
|
||||||
reindexed. See the <code class=
|
Also the log file will only be truncated when the daemon
|
||||||
"varname">mondelaypatterns</code> parameter in the
|
starts. If the daemon runs permanently, the log file may
|
||||||
<a class="link" href=
|
grow quite big, depending on the log level.</p>
|
||||||
"#RCL.INSTALL.CONFIG.RECOLLCONF.MISC" title=
|
<p>When building <span class="application">Recoll</span>,
|
||||||
"6.4.2.5. Miscellaneous parameters">configuration
|
the real time indexing support can be customised during
|
||||||
section</a>.</p>
|
package <a class="link" href="#RCL.INSTALL.BUILDING"
|
||||||
|
title="6.3. Building from source">configuration</a>
|
||||||
|
with the <code class="option">--with[out]-fam</code> or
|
||||||
|
<code class="option">--with[out]-inotify</code> options.
|
||||||
|
The default is currently to include <span class=
|
||||||
|
"application">inotify</span> monitoring on systems that
|
||||||
|
support it, and, as of <span class=
|
||||||
|
"application">Recoll</span> 1.17, <span class=
|
||||||
|
"application">gamin</span> support on <span class=
|
||||||
|
"application">FreeBSD</span>.</p>
|
||||||
|
<div class="note" style=
|
||||||
|
"margin-left: 0.5in; margin-right: 0.5in;">
|
||||||
|
<h3 class="title">Increasing resources for inotify</h3>
|
||||||
|
<p>On Linux systems, monitoring a big tree may need
|
||||||
|
increasing the resources available to inotify, which
|
||||||
|
are normally defined in <code class=
|
||||||
|
"filename">/etc/sysctl.conf</code>.</p>
|
||||||
|
<pre class="programlisting">
|
||||||
|
### inotify
|
||||||
|
#
|
||||||
|
# cat /proc/sys/fs/inotify/max_queued_events - 16384
|
||||||
|
# cat /proc/sys/fs/inotify/max_user_instances - 128
|
||||||
|
# cat /proc/sys/fs/inotify/max_user_watches - 16384
|
||||||
|
#
|
||||||
|
# -- Change to:
|
||||||
|
#
|
||||||
|
fs.inotify.max_queued_events=32768
|
||||||
|
fs.inotify.max_user_instances=256
|
||||||
|
fs.inotify.max_user_watches=32768
|
||||||
|
</pre>
|
||||||
|
<p>In particular, you will need to trim your tree or
|
||||||
|
adjust the <code class=
|
||||||
|
"literal">max_user_watches</code> value if indexing
|
||||||
|
exits with a message about errno <code class=
|
||||||
|
"literal">ENOSPC</code> (28) from <code class=
|
||||||
|
"function">inotify_add_watch</code>.</p>
|
||||||
|
</div>
|
||||||
|
<div class="note" style=
|
||||||
|
"margin-left: 0.5in; margin-right: 0.5in;">
|
||||||
|
<h3 class="title">Slowing down the reindexing rate for
|
||||||
|
fast changing files</h3>
|
||||||
|
<p>When using the real time monitor, it may happen that
|
||||||
|
some files need to be indexed, but change so often that
|
||||||
|
they impose an excessive load for the system.</p>
|
||||||
|
<p><span class="application">Recoll</span> provides a
|
||||||
|
configuration option to specify the minimum time before
|
||||||
|
which a file, specified by a wildcard pattern, cannot
|
||||||
|
be reindexed. See the <code class=
|
||||||
|
"varname">mondelaypatterns</code> parameter in the
|
||||||
|
<a class="link" href=
|
||||||
|
"#RCL.INSTALL.CONFIG.RECOLLCONF.MISC" title=
|
||||||
|
"6.4.2.5. Miscellaneous parameters">configuration
|
||||||
|
section</a>.</p>
|
||||||
|
</div>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
|
|||||||
@ -25,7 +25,7 @@
|
|||||||
</author>
|
</author>
|
||||||
|
|
||||||
<copyright>
|
<copyright>
|
||||||
<year>2005-2015</year>
|
<year>2005-2018</year>
|
||||||
<holder role="mailto:jfd@recoll.org">Jean-Francois Dockes</holder>
|
<holder role="mailto:jfd@recoll.org">Jean-Francois Dockes</holder>
|
||||||
</copyright>
|
</copyright>
|
||||||
|
|
||||||
@ -89,7 +89,7 @@
|
|||||||
</menuchoice>, then adjust the <guilabel>Top
|
</menuchoice>, then adjust the <guilabel>Top
|
||||||
directories</guilabel> section).</para>
|
directories</guilabel> section).</para>
|
||||||
|
|
||||||
<para>Also be aware that, on Unix/Linux, you may need to install the
|
<para>On Unix/Linux, you may need to install the
|
||||||
appropriate <link linkend="RCL.INSTALL.EXTERNAL"> supporting
|
appropriate <link linkend="RCL.INSTALL.EXTERNAL"> supporting
|
||||||
applications</link> for document types that need them (for
|
applications</link> for document types that need them (for
|
||||||
example <application>antiword</application> for
|
example <application>antiword</application> for
|
||||||
@ -175,13 +175,13 @@
|
|||||||
<para>In a shorter way, &RCL; does the dirty footwork, &XAP;
|
<para>In a shorter way, &RCL; does the dirty footwork, &XAP;
|
||||||
deals with the intelligent parts of the process.</para>
|
deals with the intelligent parts of the process.</para>
|
||||||
|
|
||||||
<para>The &XAP; index can be big (roughly the size of the
|
<para>The &XAP; index can be big (roughly the size of the original
|
||||||
original document set), but it is not a document
|
document set), but it is not a document archive. &RCL; can only
|
||||||
archive. &RCL; can only display documents that still exist at
|
display documents that still exist at the place from which they were
|
||||||
the place from which they were indexed. (Actually, there is a
|
indexed. (Actually, there is a way to reconstruct a document from the
|
||||||
way to reconstruct a document from the information in the
|
information in the index, but only the pure text is saved, possibly
|
||||||
index, but the result is not nice, as all formatting,
|
without punctuation and capitalization, depending on &RCL;
|
||||||
punctuation and capitalization are lost).</para>
|
version).</para>
|
||||||
|
|
||||||
<para>&RCL; stores all internal data in <application>Unicode
|
<para>&RCL; stores all internal data in <application>Unicode
|
||||||
UTF-8</application> format, and it can index files of many types
|
UTF-8</application> format, and it can index files of many types
|
||||||
@ -332,9 +332,8 @@
|
|||||||
<formalpara>
|
<formalpara>
|
||||||
<title><link linkend="RCL.INDEXING.PERIODIC">
|
<title><link linkend="RCL.INDEXING.PERIODIC">
|
||||||
Periodic (or batch) indexing:</link></title>
|
Periodic (or batch) indexing:</link></title>
|
||||||
<para>indexing takes place at discrete
|
<para><command>recollindex</command> is executed
|
||||||
times, by executing the <command>recollindex</command>
|
at discrete times. The typical usage is to have a nightly run
|
||||||
command. The typical usage is to have a nightly indexing run
|
|
||||||
<link linkend="RCL.INDEXING.PERIODIC.AUTOMAT">
|
<link linkend="RCL.INDEXING.PERIODIC.AUTOMAT">
|
||||||
programmed</link> into
|
programmed</link> into
|
||||||
your <command>cron</command> file.</para>
|
your <command>cron</command> file.</para>
|
||||||
@ -342,12 +341,12 @@
|
|||||||
</listitem>
|
</listitem>
|
||||||
<listitem>
|
<listitem>
|
||||||
<formalpara><title><link linkend="RCL.INDEXING.MONITOR">Real
|
<formalpara><title><link linkend="RCL.INDEXING.MONITOR">Real
|
||||||
time indexing:</link></title> <para>indexing takes place as
|
time indexing:</link></title>
|
||||||
soon as a file is created or
|
<para><command>recollindex</command> runs permanently as a
|
||||||
changed. <command>recollindex</command> runs as a daemon and
|
daemon and uses a file system alteration monitor
|
||||||
uses a file system alteration monitor
|
|
||||||
(e.g. <application>inotify</application>) to detect file
|
(e.g. <application>inotify</application>) to detect file
|
||||||
changes.</para> </formalpara>
|
changes. New or updated files are indexed at once.</para>
|
||||||
|
</formalpara>
|
||||||
</listitem>
|
</listitem>
|
||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
</para>
|
</para>
|
||||||
@ -359,7 +358,7 @@
|
|||||||
directory). Monitoring a big file system tree can consume
|
directory). Monitoring a big file system tree can consume
|
||||||
significant system resources.</para>
|
significant system resources.</para>
|
||||||
|
|
||||||
<para>With &RCL; 1.25 and newer, it is also possible to set up an
|
<para>With &RCL; 1.24 and newer, it is also possible to set up an
|
||||||
index so that only a subset of the tree will be monitored and the
|
index so that only a subset of the tree will be monitored and the
|
||||||
rest will be covered by batch/incremental indexing. (See the
|
rest will be covered by batch/incremental indexing. (See the
|
||||||
details in the <link linkend="RCL.INDEXING.MONITOR">Real time
|
details in the <link linkend="RCL.INDEXING.MONITOR">Real time
|
||||||
@ -373,7 +372,7 @@
|
|||||||
</menuchoice>
|
</menuchoice>
|
||||||
</para>
|
</para>
|
||||||
|
|
||||||
<para>The <menuchoice><guimenu>File</guimenu>
|
<para>The GUI <menuchoice><guimenu>File</guimenu>
|
||||||
</menuchoice> menu also has entries to start or stop
|
</menuchoice> menu also has entries to start or stop
|
||||||
the current indexing operation. Stopping indexing is performed by
|
the current indexing operation. Stopping indexing is performed by
|
||||||
killing the <command>recollindex</command> process, which will
|
killing the <command>recollindex</command> process, which will
|
||||||
@ -430,10 +429,10 @@
|
|||||||
entirely independent (no parameters are ever shared between
|
entirely independent (no parameters are ever shared between
|
||||||
configurations when indexing).</para>
|
configurations when indexing).</para>
|
||||||
|
|
||||||
<para>Multiple indexes can queryied concurrently, either from the
|
<para>Multiple indexes can be queried concurrently, either from
|
||||||
GUI or the command line. When doing this, there is always a main
|
the GUI or the command line. When doing this, there is always a
|
||||||
configuration, from which both configuration and index data are
|
main configuration, from which both configuration and index data
|
||||||
used. Only the index data from the additional indexes is used
|
are used. Only the index data from the additional indexes is used
|
||||||
(their configuration parameters are ignored).</para>
|
(their configuration parameters are ignored).</para>
|
||||||
|
|
||||||
<para>This is important and sometimes confusing, so it will be
|
<para>This is important and sometimes confusing, so it will be
|
||||||
@ -464,8 +463,9 @@
|
|||||||
document stored as an attachment to an email message inside an
|
document stored as an attachment to an email message inside an
|
||||||
email folder archived in a zip file...</para>
|
email folder archived in a zip file...</para>
|
||||||
|
|
||||||
<para>&RCL; indexing processes plain text, HTML, OpenDocument
|
<para><command>recollindex</command> processes plain text, HTML,
|
||||||
(Open/LibreOffice), email formats, and a few others internally.</para>
|
OpenDocument (Open/LibreOffice), email formats, and a few others
|
||||||
|
internally.</para>
|
||||||
|
|
||||||
<para>Other file types (ie: postscript, pdf, ms-word, rtf ...)
|
<para>Other file types (ie: postscript, pdf, ms-word, rtf ...)
|
||||||
need external applications for preprocessing. The list is in the
|
need external applications for preprocessing. The list is in the
|
||||||
@ -488,14 +488,15 @@
|
|||||||
indexed. In the latter case, any type not in the list will
|
indexed. In the latter case, any type not in the list will
|
||||||
be ignored.</para>
|
be ignored.</para>
|
||||||
|
|
||||||
<para>Excluding file types can be done by adding wildcard name
|
<para>Excluding files by name can be done by adding wildcard name
|
||||||
patterns to the
|
patterns to the
|
||||||
<link linkend="RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDNAMES">
|
<link linkend="RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDNAMES">
|
||||||
skippedNames</link> list, which
|
skippedNames</link> list, which
|
||||||
can be done from the GUI Index configuration menu. For
|
can be done from the GUI Index configuration menu. Excluding by
|
||||||
versions 1.20 and later, you can alternatively set the
|
type can be done by setting the
|
||||||
<link linkend="RCL.INSTALL.CONFIG.RECOLLCONF.EXCLUDEDMIMETYPES">
|
<link linkend="RCL.INSTALL.CONFIG.RECOLLCONF.EXCLUDEDMIMETYPES">
|
||||||
excludedmimetypes</link> list in the configuration file. This
|
excludedmimetypes</link> list in the configuration file (1.20
|
||||||
|
and later). This
|
||||||
can be redefined for subdirectories.</para>
|
can be redefined for subdirectories.</para>
|
||||||
|
|
||||||
<para>You can also define an exclusive list of MIME types to be
|
<para>You can also define an exclusive list of MIME types to be
|
||||||
@ -550,7 +551,7 @@
|
|||||||
file because of insufficient disk space).</para>
|
file because of insufficient disk space).</para>
|
||||||
|
|
||||||
<para>The indexer in &RCL; versions 1.21 and later does not
|
<para>The indexer in &RCL; versions 1.21 and later does not
|
||||||
retry failed file by default. Retrying will only occur if an
|
retry failed files by default. Retrying will only occur if an
|
||||||
explicit option (<option>-k</option>) is set on the
|
explicit option (<option>-k</option>) is set on the
|
||||||
<command>recollindex</command> command line, or if a script
|
<command>recollindex</command> command line, or if a script
|
||||||
executed when <command>recollindex</command> starts up says
|
executed when <command>recollindex</command> starts up says
|
||||||
@ -636,10 +637,9 @@
|
|||||||
example being a set of mp3 files where only the tags would be
|
example being a set of mp3 files where only the tags would be
|
||||||
indexed).</para>
|
indexed).</para>
|
||||||
|
|
||||||
<para>Of course, images, sound and video do not increase the
|
<para>Of course, images, sound and video do not increase the index
|
||||||
index size, which means that nowadays, typically, even a big
|
size, which means that typically, even a big index will be negligible
|
||||||
index will be negligible against the total amount of data on the
|
against the total amount of data on the computer.</para>
|
||||||
computer.</para>
|
|
||||||
|
|
||||||
<para>The index data directory (<filename>xapiandb</filename>)
|
<para>The index data directory (<filename>xapiandb</filename>)
|
||||||
only contains data that can be completely rebuilt by an index run
|
only contains data that can be completely rebuilt by an index run
|
||||||
@ -669,10 +669,11 @@
|
|||||||
<sect2 id="RCL.INDEXING.STORAGE.SECURITY">
|
<sect2 id="RCL.INDEXING.STORAGE.SECURITY">
|
||||||
<title>Security aspects</title>
|
<title>Security aspects</title>
|
||||||
|
|
||||||
<para>The &RCL; index does not hold copies of the indexed
|
<para>The &RCL; index does not hold complete copies of the indexed
|
||||||
documents. But it does hold enough data to allow for an almost
|
documents (it almost does after version 1.24). But it does
|
||||||
complete reconstruction. If confidential data is indexed,
|
hold enough data to allow for an almost complete reconstruction. If
|
||||||
access to the database directory should be restricted. </para>
|
confidential data is indexed, access to the database directory
|
||||||
|
should be restricted. </para>
|
||||||
|
|
||||||
<para>&RCL; will create the configuration directory with a mode of
|
<para>&RCL; will create the configuration directory with a mode of
|
||||||
0700 (access by owner only). As the index data directory is by
|
0700 (access by owner only). As the index data directory is by
|
||||||
@ -716,10 +717,9 @@
|
|||||||
<refentrytitle>recoll.conf</refentrytitle>
|
<refentrytitle>recoll.conf</refentrytitle>
|
||||||
<manvolnum>5</manvolnum>
|
<manvolnum>5</manvolnum>
|
||||||
</citerefentry>
|
</citerefentry>
|
||||||
man page, but the most
|
man page, but the most current information will most likely be the
|
||||||
current information will most likely be the comments inside the
|
comments inside the sample file. The most immediately useful variable
|
||||||
sample file. The most immediately useful variable you may
|
is probably
|
||||||
interested in is probably
|
|
||||||
<link linkend="RCL.INSTALL.CONFIG.RECOLLCONF.TOPDIRS">
|
<link linkend="RCL.INSTALL.CONFIG.RECOLLCONF.TOPDIRS">
|
||||||
<varname>topdirs</varname></link>,
|
<varname>topdirs</varname></link>,
|
||||||
which determines what subtrees and files get indexed.</para>
|
which determines what subtrees and files get indexed.</para>
|
||||||
@ -731,7 +731,7 @@
|
|||||||
|
|
||||||
<para>As of Recoll 1.18 there are two incompatible types of Recoll
|
<para>As of Recoll 1.18 there are two incompatible types of Recoll
|
||||||
indexes, depending on the treatment of character case and
|
indexes, depending on the treatment of character case and
|
||||||
diacritics. A <link linkend="RCL.INDEXING.CONFIG.SENS">a further
|
diacritics. A <link linkend="RCL.INDEXING.CONFIG.SENS">further
|
||||||
section</link> describes the two types in more detail.</para>
|
section</link> describes the two types in more detail.</para>
|
||||||
|
|
||||||
<sect2 id="RCL.INDEXING.CONFIG.MULTIPLE">
|
<sect2 id="RCL.INDEXING.CONFIG.MULTIPLE">
|
||||||
@ -757,26 +757,25 @@
|
|||||||
to avoid mistakenly creating additional directories when an
|
to avoid mistakenly creating additional directories when an
|
||||||
argument is mistyped.</para>
|
argument is mistyped.</para>
|
||||||
|
|
||||||
<para>A typical usage scenario for the multiple index feature
|
<para>A typical usage scenario for the multiple index feature would
|
||||||
would be for a system administrator to set up a central index
|
be for a system administrator to set up a central index for shared
|
||||||
for shared data, that you choose to search or not in addition to
|
data, that you choose to search or not in addition to your personal
|
||||||
your personal data. Of course, there are other
|
data. Of course, there are other possibilities. There are many
|
||||||
possibilities. There are many cases where you know the subset of
|
cases where you know the subset of files that should be searched,
|
||||||
files that should be searched, and where narrowing the search
|
and where narrowing the search can improve the results. You can
|
||||||
can improve the results. You can achieve approximately the same
|
achieve approximately the same effect with the directory filter in
|
||||||
effect with the directory filter in advanced search, but
|
advanced search, but multiple indexes will have better performance
|
||||||
multiple indexes will have much better performance and may be
|
and may be worth the trouble.</para>
|
||||||
worth the trouble.</para>
|
|
||||||
|
|
||||||
<para>A <command>recollindex</command> program instance can only
|
<para>A <command>recollindex</command> program instance can only
|
||||||
update one specific index, and it will only use parameters from a
|
update one specific index, and it will only use parameters from a
|
||||||
single configuration (no parameters are ever shared between
|
single configuration (no parameters are ever shared between
|
||||||
configurations when indexing).</para>
|
configurations when indexing).</para>
|
||||||
|
|
||||||
<para>Multiple indexes can queryied concurrently, either from the
|
<para>Multiple indexes can be queried concurrently, either from
|
||||||
GUI or the command line. When doing this, there is always a main
|
the GUI or the command line. When doing this, there is always a
|
||||||
configuration, from which both configuration and index data are
|
main configuration, from which both configuration and index data
|
||||||
used. Only the index data from the additional indexes is used
|
are used. Only the index data from the additional indexes is used
|
||||||
(their configuration parameters are ignored).</para>
|
(their configuration parameters are ignored).</para>
|
||||||
|
|
||||||
<para>When searching, the current main index (defined by
|
<para>When searching, the current main index (defined by
|
||||||
@ -1416,68 +1415,6 @@
|
|||||||
from the terminal and become a daemon, permanently monitoring
|
from the terminal and become a daemon, permanently monitoring
|
||||||
file changes and updating the index.</para>
|
file changes and updating the index.</para>
|
||||||
|
|
||||||
<para>Under <application>KDE</application>,
|
|
||||||
<application>Gnome</application> and some other desktop
|
|
||||||
environments, the daemon can automatically started when you log
|
|
||||||
in, by creating a desktop file inside the
|
|
||||||
<filename>~/.config/autostart</filename> directory. This can be
|
|
||||||
done for you by the &RCL; GUI. Use the
|
|
||||||
<guimenu>Preferences->Indexing Schedule</guimenu> menu.</para>
|
|
||||||
|
|
||||||
<para>With older <application>X11</application> setups, starting
|
|
||||||
the daemon is normally performed as part of the user session
|
|
||||||
script.</para>
|
|
||||||
|
|
||||||
<para>The <filename>rclmon.sh</filename> script can be used to
|
|
||||||
easily start and stop the daemon. It can be found in the
|
|
||||||
<filename>examples</filename> directory (typically
|
|
||||||
<filename>/usr/local/[share/]recoll/examples</filename>).</para>
|
|
||||||
|
|
||||||
<para>For example, my out of fashion
|
|
||||||
<application>xdm</application>-based session has a
|
|
||||||
<filename>.xsession</filename> script with the following lines
|
|
||||||
at the end:</para>
|
|
||||||
|
|
||||||
<programlisting>recollconf=$HOME/.recoll-home
|
|
||||||
recolldata=/usr/local/share/recoll
|
|
||||||
RECOLL_CONFDIR=$recollconf $recolldata/examples/rclmon.sh start
|
|
||||||
|
|
||||||
fvwm
|
|
||||||
|
|
||||||
</programlisting>
|
|
||||||
|
|
||||||
<para>The indexing daemon gets started, then the window manager,
|
|
||||||
for which the session waits.</para> <para>By default the
|
|
||||||
indexing daemon will monitor the state of the X11 session, and
|
|
||||||
exit when it finishes, it is not necessary to kill it
|
|
||||||
explicitly. (The <application>X11</application> server
|
|
||||||
monitoring can be disabled with option <option>-x</option> to
|
|
||||||
<command>recollindex</command>).</para>
|
|
||||||
|
|
||||||
<para>If you use the daemon completely out of an
|
|
||||||
<application>X11</application> session, you need to add option
|
|
||||||
<option>-x</option> to disable <application>X11</application>
|
|
||||||
session monitoring (else the daemon will not start).</para>
|
|
||||||
|
|
||||||
<para>By default, the messages from the indexing daemon will be
|
|
||||||
sent to the same file as those from the interactive commands
|
|
||||||
(<literal>logfilename</literal>). You may want to change this
|
|
||||||
by setting the <varname>daemlogfilename</varname> and
|
|
||||||
<varname>daemloglevel</varname> configuration parameters. Also
|
|
||||||
the log file will only be truncated when the daemon starts. If
|
|
||||||
the daemon runs permanently, the log file may grow quite big,
|
|
||||||
depending on the log level.</para>
|
|
||||||
|
|
||||||
<para>When building &RCL;, the real time indexing support can be
|
|
||||||
customised during package <link
|
|
||||||
linkend="RCL.INSTALL.BUILDING">configuration</link> with
|
|
||||||
the <option>--with[out]-fam</option> or
|
|
||||||
<option>--with[out]-inotify</option> options. The default is
|
|
||||||
currently to include <application>inotify</application>
|
|
||||||
monitoring on systems that support it, and, as of &RCL; 1.17,
|
|
||||||
<application>gamin</application> support on
|
|
||||||
<application>FreeBSD</application>.</para>
|
|
||||||
|
|
||||||
<para>While it is convenient that data is indexed in real time,
|
<para>While it is convenient that data is indexed in real time,
|
||||||
repeated indexing can generate a significant load on the
|
repeated indexing can generate a significant load on the
|
||||||
system when files such as email folders change. Also,
|
system when files such as email folders change. Also,
|
||||||
@ -1486,44 +1423,112 @@
|
|||||||
your system is short on resources. Periodic indexing is
|
your system is short on resources. Periodic indexing is
|
||||||
adequate in most cases.</para>
|
adequate in most cases.</para>
|
||||||
|
|
||||||
<para>As of &RCL; 1.25, you can set the <link
|
<para>As of &RCL; 1.24, you can set the <link
|
||||||
linkend="RCL.INSTALL.CONFIG.RECOLLCONF.MONITORDIRS">monitordirs</link>
|
linkend="RCL.INSTALL.CONFIG.RECOLLCONF.MONITORDIRS">monitordirs</link>
|
||||||
configuration variable to specify that only a subset of your indexed
|
configuration variable to specify that only a subset of your indexed
|
||||||
files will be monitored for instant indexing. In this situation, an
|
files will be monitored for instant indexing. In this situation, an
|
||||||
incremental pass on the full tree can be triggered by either
|
incremental pass on the full tree can be triggered by either
|
||||||
restarting the indexer, or just running the
|
restarting the indexer, or just running
|
||||||
<command>recollindex</command>, which will notify the running
|
<command>recollindex</command>, which will notify the running
|
||||||
process. The <command>recoll</command> GUI also has a menu entry for
|
process. The <command>recoll</command> GUI also has a menu entry for
|
||||||
this.</para>
|
this.</para>
|
||||||
|
|
||||||
|
<sect2 id="RCL.INDEXING.MONITOR.START">
|
||||||
|
<title>Real time indexing: automatic daemon start</title>
|
||||||
|
|
||||||
<note><title>Increasing resources for inotify</title>
|
<para>Under <application>KDE</application>,
|
||||||
<para>On Linux systems, monitoring a big tree may need
|
<application>Gnome</application> and some other desktop
|
||||||
increasing the resources available to inotify, which are
|
environments, the daemon can be automatically started when you log
|
||||||
normally defined in <filename>/etc/sysctl.conf</filename>.
|
in, by creating a desktop file inside the
|
||||||
<programlisting>
|
<filename>~/.config/autostart</filename> directory. This can be
|
||||||
### inotify
|
done for you by the &RCL; GUI. Use the
|
||||||
#
|
<guimenu>Preferences->Indexing Schedule</guimenu> menu.</para>
|
||||||
# cat /proc/sys/fs/inotify/max_queued_events - 16384
|
|
||||||
# cat /proc/sys/fs/inotify/max_user_instances - 128
|
|
||||||
# cat /proc/sys/fs/inotify/max_user_watches - 16384
|
|
||||||
#
|
|
||||||
# -- Change to:
|
|
||||||
#
|
|
||||||
fs.inotify.max_queued_events=32768
|
|
||||||
fs.inotify.max_user_instances=256
|
|
||||||
fs.inotify.max_user_watches=32768
|
|
||||||
</programlisting>
|
|
||||||
|
|
||||||
</para>
|
<para>With older <application>X11</application> setups, starting
|
||||||
<para>Especially, you will need to trim your tree or adjust
|
the daemon is normally performed as part of the user session
|
||||||
the <literal>max_user_watches</literal> value if indexing exits with
|
script.</para>
|
||||||
a message about errno <literal>ENOSPC</literal> (28) from
|
|
||||||
<function>inotify_add_watch</function>.</para>
|
|
||||||
</note>
|
|
||||||
|
|
||||||
<sect2 id="RCL.INDEXING.MONITOR.FASTFILES">
|
<para>The <filename>rclmon.sh</filename> script can be used to
|
||||||
<title>Slowing down the reindexing rate for fast changing
|
easily start and stop the daemon. It can be found in the
|
||||||
|
<filename>examples</filename> directory (typically
|
||||||
|
<filename>/usr/local/[share/]recoll/examples</filename>).</para>
|
||||||
|
|
||||||
|
<para>For example, my out of fashion
|
||||||
|
<application>xdm</application>-based session has a
|
||||||
|
<filename>.xsession</filename> script with the following lines
|
||||||
|
at the end:</para>
|
||||||
|
|
||||||
|
<programlisting>recollconf=$HOME/.recoll-home
|
||||||
|
recolldata=/usr/local/share/recoll
|
||||||
|
RECOLL_CONFDIR=$recollconf $recolldata/examples/rclmon.sh start
|
||||||
|
|
||||||
|
fvwm
|
||||||
|
|
||||||
|
</programlisting>
|
||||||
|
|
||||||
|
<para>The indexing daemon gets started, then the window manager,
|
||||||
|
for which the session waits.</para> <para>By default the
|
||||||
|
indexing daemon will monitor the state of the X11 session, and
|
||||||
|
exit when it finishes, so it is not necessary to kill it
|
||||||
|
explicitly. (The <application>X11</application> server
|
||||||
|
monitoring can be disabled with option <option>-x</option> to
|
||||||
|
<command>recollindex</command>).</para>
|
||||||
|
|
||||||
|
<para>If you use the daemon completely out of an
|
||||||
|
<application>X11</application> session, you need to add option
|
||||||
|
<option>-x</option> to disable <application>X11</application>
|
||||||
|
session monitoring (else the daemon will not start).</para>
|
||||||
|
</sect2>
|
||||||
|
|
||||||
|
<sect2 id="RCL.INDEXING.MONITOR.DETAILS">
|
||||||
|
<title>Real time indexing: miscellaneous details</title>
|
||||||
|
|
||||||
|
<para>By default, the messages from the indexing daemon will be
|
||||||
|
sent to the same file as those from the interactive commands
|
||||||
|
(<literal>logfilename</literal>). You may want to change this
|
||||||
|
by setting the <varname>daemlogfilename</varname> and
|
||||||
|
<varname>daemloglevel</varname> configuration parameters. Also
|
||||||
|
the log file will only be truncated when the daemon starts. If
|
||||||
|
the daemon runs permanently, the log file may grow quite big,
|
||||||
|
depending on the log level.</para>
|
||||||
|
|
||||||
|
<para>When building &RCL;, the real time indexing support can be
|
||||||
|
customised during package <link
|
||||||
|
linkend="RCL.INSTALL.BUILDING">configuration</link> with
|
||||||
|
the <option>--with[out]-fam</option> or
|
||||||
|
<option>--with[out]-inotify</option> options. The default is
|
||||||
|
currently to include <application>inotify</application>
|
||||||
|
monitoring on systems that support it, and, as of &RCL; 1.17,
|
||||||
|
<application>gamin</application> support on
|
||||||
|
<application>FreeBSD</application>.</para>
|
||||||
|
|
||||||
|
<note><title>Increasing resources for inotify</title>
|
||||||
|
<para>On Linux systems, monitoring a big tree may require
|
||||||
|
increasing the resources available to inotify, which are
|
||||||
|
normally defined in <filename>/etc/sysctl.conf</filename>.
|
||||||
|
<programlisting>
|
||||||
|
### inotify
|
||||||
|
#
|
||||||
|
# cat /proc/sys/fs/inotify/max_queued_events - 16384
|
||||||
|
# cat /proc/sys/fs/inotify/max_user_instances - 128
|
||||||
|
# cat /proc/sys/fs/inotify/max_user_watches - 16384
|
||||||
|
#
|
||||||
|
# -- Change to:
|
||||||
|
#
|
||||||
|
fs.inotify.max_queued_events=32768
|
||||||
|
fs.inotify.max_user_instances=256
|
||||||
|
fs.inotify.max_user_watches=32768
|
||||||
|
</programlisting>
|
||||||
|
|
||||||
|
</para>
|
||||||
|
<para>In particular, you will need to trim your tree or adjust
|
||||||
|
the <literal>max_user_watches</literal> value if indexing exits with
|
||||||
|
a message about errno <literal>ENOSPC</literal> (28) from
|
||||||
|
<function>inotify_add_watch</function>.</para>
|
||||||
|
</note>
|
||||||
|
|
||||||
|
|
||||||
|
<note><title>Slowing down the reindexing rate for fast changing
|
||||||
files</title>
|
files</title>
|
||||||
|
|
||||||
<para>When using the real time monitor, it may happen that some
|
<para>When using the real time monitor, it may happen that some
|
||||||
@ -1535,8 +1540,10 @@
|
|||||||
reindexed. See the <varname>mondelaypatterns</varname> parameter in
|
reindexed. See the <varname>mondelaypatterns</varname> parameter in
|
||||||
the <link linkend="RCL.INSTALL.CONFIG.RECOLLCONF.MISC">
|
the <link linkend="RCL.INSTALL.CONFIG.RECOLLCONF.MISC">
|
||||||
configuration section</link>.</para>
|
configuration section</link>.</para>
|
||||||
|
</note>
|
||||||
|
|
||||||
</sect2>
|
</sect2>
|
||||||
|
|
||||||
</sect1>
|
</sect1>
|
||||||
|
|
||||||
</chapter>
|
</chapter>
|
||||||
|
|||||||