doc: clarify web cache management and "trigger incremental pass"

Jean-Francois Dockes 2021-03-21 10:05:15 +01:00
parent 92f9852942
commit 1b61861ab3
2 changed files with 72 additions and 45 deletions

@ -423,7 +423,7 @@ alink="#0000FF">
<div class="list-of-tables">
<p><b>List of Tables</b></p>
<dl>
<dt>3.1. <a href="#idm1438">Keyboard shortcuts</a></dt>
<dt>3.1. <a href="#idm1444">Keyboard shortcuts</a></dt>
</dl>
</div>
<div class="chapter">
@ -1976,6 +1976,25 @@ recollindex -c "$confdir"
"application">Recoll</span> then processes, storing the
data into a local cache, then indexing it, then removing
the file from the queue.</p>
<div class="note" style=
"margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">The local cache is not an archive</h3>
<p>As mentioned above, a copy of the indexed Web pages is
retained by Recoll in a local cache (from which data is
fetched for previews, or when resetting the index). The
cache is not changed by an index reset, just read for
indexing. The cache has a maximum size, which can be
adjusted from the <span class="guilabel">Index
configuration</span> / <span class="guilabel">Web
history</span> panel (<code class=
"literal">webcachemaxmbs</code> parameter in <code class=
"filename">recoll.conf</code>). Once the maximum size is
reached, old pages are erased to make room for new ones.
The pages which you want to keep indefinitely need to be
explicitly archived elsewhere. Using a very high value
for the cache size can avoid data erasure, but see the
above 'Howto' page for more details and gotchas.</p>
</div>
<p>The visited Web pages indexing feature can be enabled on
the <span class="application">Recoll</span> side from the
GUI <span class="guilabel">Index configuration</span>
@ -1989,23 +2008,6 @@ recollindex -c "$confdir"
configuration in a <a class="ulink" href=
"https://www.lesbonscomptes.com/recoll/faqsandhowtos/IndexWebHistory"
target="_top">Recoll 'Howto' entry</a>.</p>
<div class="note" style=
"margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">The cache is not an archive</h3>
<p>A copy of the indexed Web pages is retained by Recoll
in a local cache (from which data is fetched for
previews, or when resetting the index). The cache has a
maximum size, which can be adjusted from the <span class=
"guilabel">Index configuration</span> / <span class=
"guilabel">Web history</span> panel (<code class=
"literal">webcachemaxmbs</code> parameter in <code class=
"filename">recoll.conf</code>). Once the maximum size is
reached, old pages are erased to make room for new ones.
The pages which you want to keep indefinitely need to be
explicitly archived elsewhere. Using a very high value
for the cache size can avoid data erasure, but see the
above 'Howto' page for more details and gotchas.</p>
</div>
</div>
<div class="sect1">
<div class="titlepage">
@ -2357,10 +2359,11 @@ metadatacmds = ; <em class=
<p>The GUI <span class="guimenu">File</span> menu has
entries to start or stop the current indexing operation.
When indexing is not currently running, you have a choice
of updating the index or rebuilding it (the first choice
only processes changed files, the second one zeroes the
index before starting so that all files are
processed).</p>
between <span class="guimenuitem">Update Index</span> and
<span class="guimenuitem">Rebuild Index</span>. The first
choice only processes changed files, the second one
erases the index before starting so that all files are
processed.</p>
<p>On Linux and Windows, the GUI can be used to manage
the indexing operation. Stopping the indexer can be done
from the <span class=
@ -2526,7 +2529,17 @@ metadatacmds = ; <em class=
<p>In this situation, the <span class=
"command"><strong>recoll</strong></span> GUI <span class=
"guimenu">File</span> menu makes two operations available:
'Stop' and 'Trigger incremental pass'.</p>
<span class="guimenuitem">Stop</span> and <span class=
"guimenuitem">Trigger incremental pass</span>.</p>
<p><span class="guimenuitem">Trigger incremental
pass</span> has the same effect as restarting the indexer,
and will cause a complete walk of the indexed area,
processing the changed files, then switch to monitoring.
This is only marginally useful, maybe in cases where the
indexer is configured to delay updates, or to force an
immediate rebuild of the stemming and phonetic data, which
are only processed at intervals by the real time
indexer.</p>
<p>While it is convenient that data is indexed in real
time, repeated indexing can generate a significant load on
the system when files such as email folders change. Also,
@ -3987,7 +4000,7 @@ fs.inotify.max_user_watches=32768
given context (e.g. within a preview window, within the
result table).</p>
<div class="table">
<a name="idm1438" id="idm1438"></a>
<a name="idm1444" id="idm1444"></a>
<p class="title"><b>Table&nbsp;3.1.&nbsp;Keyboard
shortcuts</b></p>
<div class="table-contents">

@ -1277,31 +1277,34 @@ recollindex -c "$confdir"
local cache, then indexing it, then removing the file from the
queue.</para>
<note><title>The local cache is not an archive</title><para>As
mentioned above, a copy of the indexed Web pages is retained by
Recoll in a local cache (from which data is fetched for previews,
or when resetting the index). The cache is not changed by an
index reset, just read for indexing. The cache has a maximum
size, which can be adjusted from the <guilabel>Index
configuration</guilabel> / <guilabel>Web history</guilabel> panel
(<literal>webcachemaxmbs</literal> parameter
in <filename>recoll.conf</filename>). Once the maximum size is
reached, old pages are erased to make room for new ones. The
pages which you want to keep indefinitely need to be explicitly
archived elsewhere. Using a very high value for the cache size
can avoid data erasure, but see the above 'Howto' page for more
details and gotchas.</para></note>
<para>The visited Web pages indexing feature can be enabled on the
&RCL; side from the GUI <guilabel>Index configuration</guilabel>
panel, or by editing the configuration file (set
<varname>processwebqueue</varname> to 1).</para>
<para>The &RCL; GUI has a tool to list and edit the contents of the
Web
cache. (<menuchoice><guimenu>Tools</guimenu><guimenuitem>Webcache
Web cache. (<menuchoice><guimenu>Tools</guimenu><guimenuitem>Webcache
editor</guimenuitem></menuchoice>)</para>
<para>You can find more details on Web indexing, its usage and configuration
in a <ulink url="&FAQS;IndexWebHistory">Recoll 'Howto' entry</ulink>.</para>
in a <ulink url="&FAQS;IndexWebHistory">Recoll 'Howto'
entry</ulink>.</para>
<note><title>The cache is not an archive</title><para>A copy of
the indexed Web pages is retained by Recoll in a local cache
(from which data is fetched for previews, or when resetting the
index). The cache has a maximum size, which can be adjusted from
the <guilabel>Index configuration</guilabel> / <guilabel>Web
history</guilabel> panel (<literal>webcachemaxmbs</literal>
parameter in <filename>recoll.conf</filename>). Once the maximum
size is reached, old pages are erased to make room for new ones.
The pages which you want to keep indefinitely need to be
explicitly archived elsewhere. Using a very high value for
the cache size can avoid data erasure, but see the above 'Howto'
page for more details and gotchas.</para></note>
</sect1>
@ -1576,9 +1579,11 @@ metadatacmds = ; <replaceable>tags</replaceable> = tmsu tags %f
<para>The GUI <menuchoice><guimenu>File</guimenu> </menuchoice>
menu has entries to start or stop the current indexing
operation. When indexing is not currently running, you have a
choice of updating the index or rebuilding it (the first choice
only processes changed files, the second one zeroes the index
before starting so that all files are processed).</para>
choice between <guimenuitem>Update
Index</guimenuitem> and <guimenuitem>Rebuild Index</guimenuitem>.
The first choice only processes changed files, the second one
erases the index before starting so that all files are
processed.</para>
<para>On Linux and Windows, the GUI can be used to manage the indexing
operation. Stopping the indexer can be done
@ -1721,11 +1726,20 @@ metadatacmds = ; <replaceable>tags</replaceable> = tmsu tags %f
from the terminal and become a daemon, permanently monitoring
file changes and updating the index.</para>
<para>In this situation, the <command>recoll</command> GUI
<menuchoice><guimenu>File</guimenu></menuchoice> menu
makes two operations available: 'Stop' and 'Trigger incremental pass'.
<para>In this situation, the <command>recoll</command>
GUI <menuchoice><guimenu>File</guimenu></menuchoice> menu makes two
operations available: <guimenuitem>Stop</guimenuitem>
and <guimenuitem>Trigger incremental pass</guimenuitem>.
</para>
<para><guimenuitem>Trigger incremental pass</guimenuitem> has the
same effect as restarting the indexer, and will cause a complete
walk of the indexed area, processing the changed files, then switch
to monitoring. This is only marginally useful, maybe in cases where
the indexer is configured to delay updates, or to force an
immediate rebuild of the stemming and phonetic data, which are only
processed at intervals by the real time indexer.</para>
<para>While it is convenient that data is indexed in real time,
repeated indexing can generate a significant load on the
system when files such as email folders change. Also,