doc: clarify web cache management and "trigger incremental pass"

Jean-Francois Dockes 2021-03-21 10:05:15 +01:00
parent 92f9852942
commit 1b61861ab3
2 changed files with 72 additions and 45 deletions
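
For reference, the two recoll.conf parameters named in the documentation text of this change can be set together as in the fragment below. This is a minimal sketch and is not part of the commit: the parameter names and the value 1 for processwebqueue come from the documentation text, while the cache size is only an illustrative assumption.

```ini
# Enable indexing of the visited Web pages queue (value given in the doc text).
processwebqueue = 1
# Maximum size of the local Web cache, in megabytes.
# 100 is an illustrative value only; once the limit is reached,
# old pages are erased to make room for new ones.
webcachemaxmbs = 100
```

As the note in the diff says, raising the cache size only postpones erasure; pages that must be kept indefinitely still need to be archived elsewhere.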

@ -423,7 +423,7 @@ alink="#0000FF">
<div class="list-of-tables"> <div class="list-of-tables">
<p><b>List of Tables</b></p> <p><b>List of Tables</b></p>
<dl> <dl>
<dt>3.1. <a href="#idm1438">Keyboard shortcuts</a></dt> <dt>3.1. <a href="#idm1444">Keyboard shortcuts</a></dt>
</dl> </dl>
</div> </div>
<div class="chapter"> <div class="chapter">
@ -1976,6 +1976,25 @@ recollindex -c "$confdir"
"application">Recoll</span> then processes, storing the "application">Recoll</span> then processes, storing the
data into a local cache, then indexing it, then removing data into a local cache, then indexing it, then removing
the file from the queue.</p> the file from the queue.</p>
<div class="note" style=
"margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">The local cache is not an archive</h3>
<p>As mentioned above, a copy of the indexed Web pages is
retained by Recoll in a local cache (from which data is
fetched for previews, or when resetting the index). The
cache is not changed by an index reset, just read for
indexing. The cache has a maximum size, which can be
adjusted from the <span class="guilabel">Index
configuration</span> / <span class="guilabel">Web
history</span> panel (<code class=
"literal">webcachemaxmbs</code> parameter in <code class=
"filename">recoll.conf</code>). Once the maximum size is
reached, old pages are erased to make room for new ones.
The pages which you want to keep indefinitely need to be
explicitly archived elsewhere. Using a very high value
for the cache size can avoid data erasure, but see the
above 'Howto' page for more details and gotchas.</p>
</div>
<p>The visited Web pages indexing feature can be enabled on <p>The visited Web pages indexing feature can be enabled on
the <span class="application">Recoll</span> side from the the <span class="application">Recoll</span> side from the
GUI <span class="guilabel">Index configuration</span> GUI <span class="guilabel">Index configuration</span>
@ -1989,23 +2008,6 @@ recollindex -c "$confdir"
configuration in a <a class="ulink" href= configuration in a <a class="ulink" href=
"https://www.lesbonscomptes.com/recoll/faqsandhowtos/IndexWebHistory" "https://www.lesbonscomptes.com/recoll/faqsandhowtos/IndexWebHistory"
target="_top">Recoll 'Howto' entry</a>.</p> target="_top">Recoll 'Howto' entry</a>.</p>
<div class="note" style=
"margin-left: 0.5in; margin-right: 0.5in;">
<h3 class="title">The cache is not an archive</h3>
<p>A copy of the indexed Web pages is retained by Recoll
in a local cache (from which data is fetched for
previews, or when resetting the index). The cache has a
maximum size, which can be adjusted from the <span class=
"guilabel">Index configuration</span> / <span class=
"guilabel">Web history</span> panel (<code class=
"literal">webcachemaxmbs</code> parameter in <code class=
"filename">recoll.conf</code>). Once the maximum size is
reached, old pages are erased to make room for new ones.
The pages which you want to keep indefinitely need to be
explicitly archived elsewhere. Using a very high value
for the cache size can avoid data erasure, but see the
above 'Howto' page for more details and gotchas.</p>
</div>
</div> </div>
<div class="sect1"> <div class="sect1">
<div class="titlepage"> <div class="titlepage">
@ -2357,10 +2359,11 @@ metadatacmds = ; <em class=
<p>The GUI <span class="guimenu">File</span> menu has <p>The GUI <span class="guimenu">File</span> menu has
entries to start or stop the current indexing operation. entries to start or stop the current indexing operation.
When indexing is not currently running, you have a choice When indexing is not currently running, you have a choice
of updating the index or rebuilding it (the first choice between <span class="guimenuitem">Update Index</span> or
only processes changed files, the second one zeroes the <span class="guimenuitem">Rebuild Index</span>. The first
index before starting so that all files are choice only processes changed files, the second one
processed).</p> erases the index before starting so that all files are
processed.</p>
<p>On Linux and Windows, the GUI can be used to manage <p>On Linux and Windows, the GUI can be used to manage
the indexing operation. Stopping the indexer can be done the indexing operation. Stopping the indexer can be done
from the <span class= from the <span class=
@ -2526,7 +2529,17 @@ metadatacmds = ; <em class=
<p>In this situation, the <span class= <p>In this situation, the <span class=
"command"><strong>recoll</strong></span> GUI <span class= "command"><strong>recoll</strong></span> GUI <span class=
"guimenu">File</span> menu makes two operations available: "guimenu">File</span> menu makes two operations available:
'Stop' and 'Trigger incremental pass'.</p> <span class="guimenuitem">Stop</span> and <span class=
"guimenuitem">Trigger incremental pass</span>.</p>
<p><span class="guimenuitem">Trigger incremental
pass</span> has the same effect as restarting the indexer,
and will cause a complete walk of the indexed area,
processing the changed files, then switch to monitoring.
This is only marginally useful, maybe in cases where the
indexer is configured to delay updates, or to force an
immediate rebuild of the stemming and phonetic data, which
are only processed at intervals by the real time
indexer.</p>
<p>While it is convenient that data is indexed in real <p>While it is convenient that data is indexed in real
time, repeated indexing can generate a significant load on time, repeated indexing can generate a significant load on
the system when files such as email folders change. Also, the system when files such as email folders change. Also,
@ -3987,7 +4000,7 @@ fs.inotify.max_user_watches=32768
given context (e.g. within a preview window, within the given context (e.g. within a preview window, within the
result table).</p> result table).</p>
<div class="table"> <div class="table">
<a name="idm1438" id="idm1438"></a> <a name="idm1444" id="idm1444"></a>
<p class="title"><b>Table&nbsp;3.1.&nbsp;Keyboard <p class="title"><b>Table&nbsp;3.1.&nbsp;Keyboard
shortcuts</b></p> shortcuts</b></p>
<div class="table-contents"> <div class="table-contents">

@ -1277,31 +1277,34 @@ recollindex -c "$confdir"
local cache, then indexing it, then removing the file from the local cache, then indexing it, then removing the file from the
queue.</para> queue.</para>
<note><title>The local cache is not an archive</title><para>As
mentioned above, a copy of the indexed Web pages is retained by
Recoll in a local cache (from which data is fetched for previews,
or when resetting the index). The cache is not changed by an
index reset, just read for indexing. The cache has a maximum
size, which can be adjusted from the <guilabel>Index
configuration</guilabel> / <guilabel>Web history</guilabel> panel
(<literal>webcachemaxmbs</literal> parameter
in <filename>recoll.conf</filename>). Once the maximum size is
reached, old pages are erased to make room for new ones. The
pages which you want to keep indefinitely need to be explicitly
archived elsewhere. Using a very high value for the cache size
can avoid data erasure, but see the above 'Howto' page for more
details and gotchas.</para></note>
<para>The visited Web pages indexing feature can be enabled on the <para>The visited Web pages indexing feature can be enabled on the
&RCL; side from the GUI <guilabel>Index configuration</guilabel> &RCL; side from the GUI <guilabel>Index configuration</guilabel>
panel, or by editing the configuration file (set panel, or by editing the configuration file (set
<varname>processwebqueue</varname> to 1).</para> <varname>processwebqueue</varname> to 1).</para>
<para>The &RCL; GUI has a tool to list and edit the contents of the <para>The &RCL; GUI has a tool to list and edit the contents of the
Web Web cache. (<menuchoice><guimenu>Tools</guimenu><guimenuitem>Webcache
cache. (<menuchoice><guimenu>Tools</guimenu><guimenuitem>Webcache
editor</guimenuitem></menuchoice>)</para> editor</guimenuitem></menuchoice>)</para>
<para>You can find more details on Web indexing, its usage and configuration <para>You can find more details on Web indexing, its usage and configuration
in a <ulink url="&FAQS;IndexWebHistory">Recoll 'Howto' entry</ulink>.</para> in a <ulink url="&FAQS;IndexWebHistory">Recoll 'Howto'
entry</ulink>.</para>
<note><title>The cache is not an archive</title><para>A copy of
the indexed Web pages is retained by Recoll in a local cache
(from which data is fetched for previews, or when resetting the
index). The cache has a maximum size, which can be adjusted from
the <guilabel>Index configuration</guilabel> / <guilabel>Web
history</guilabel> panel (<literal>webcachemaxmbs</literal>
parameter in <filename>recoll.conf</filename>). Once the maximum
size is reached, old pages are erased to make room for new ones.
The pages which you want to keep indefinitely need to be
explicitly archived elsewhere. Using a very high value for
the cache size can avoid data erasure, but see the above 'Howto'
page for more details and gotchas.</para></note>
</sect1> </sect1>
@ -1576,9 +1579,11 @@ metadatacmds = ; <replaceable>tags</replaceable> = tmsu tags %f
<para>The GUI <menuchoice><guimenu>File</guimenu> </menuchoice> <para>The GUI <menuchoice><guimenu>File</guimenu> </menuchoice>
menu has entries to start or stop the current indexing menu has entries to start or stop the current indexing
operation. When indexing is not currently running, you have a operation. When indexing is not currently running, you have a
choice of updating the index or rebuilding it (the first choice choice between <guimenuitem>Update
only processes changed files, the second one zeroes the index Index</guimenuitem> or <guimenuitem>Rebuild Index</guimenuitem>.
before starting so that all files are processed).</para> The first choice only processes changed files, the second one
erases the index before starting so that all files are
processed.</para>
<para>On Linux and Windows, the GUI can be used to manage the indexing <para>On Linux and Windows, the GUI can be used to manage the indexing
operation. Stopping the indexer can be done operation. Stopping the indexer can be done
@ -1721,11 +1726,20 @@ metadatacmds = ; <replaceable>tags</replaceable> = tmsu tags %f
from the terminal and become a daemon, permanently monitoring from the terminal and become a daemon, permanently monitoring
file changes and updating the index.</para> file changes and updating the index.</para>
<para>In this situation, the <command>recoll</command> GUI <para>In this situation, the <command>recoll</command>
<menuchoice><guimenu>File</guimenu></menuchoice> menu GUI <menuchoice><guimenu>File</guimenu></menuchoice> menu makes two
makes two operations available: 'Stop' and 'Trigger incremental pass'. operations available: <guimenuitem>Stop</guimenuitem>
and <guimenuitem>Trigger incremental pass</guimenuitem>.
</para> </para>
<para><guimenuitem>Trigger incremental pass</guimenuitem> has the
same effect as restarting the indexer, and will cause a complete
walk of the indexed area, processing the changed files, then switch
to monitoring. This is only marginally useful, maybe in cases where
the indexer is configured to delay updates, or to force an
immediate rebuild of the stemming and phonetic data, which are only
processed at intervals by the real time indexer.</para>
<para>While it is convenient that data is indexed in real time, <para>While it is convenient that data is indexed in real time,
repeated indexing can generate a significant load on the repeated indexing can generate a significant load on the
system when files such as email folders change. Also, system when files such as email folders change. Also,