Jean-Francois Dockes 2017-02-27 17:15:28 +01:00
parent d35c2a557a
commit c8cc64366d
4 changed files with 172 additions and 25 deletions


@@ -39,5 +39,11 @@ index.html: usermanual.xml
usermanual.pdf: usermanual.xml usermanual.pdf: usermanual.xml
dblatex $< dblatex $<
UTILBUILDS=/home/dockes/projets/builds/medocutils/
recoll-conf-xml:
$(UTILBUILDS)/confxml --docbook \
--idprefix=RCL.INSTALL.CONFIG.RECOLLCONF \
../../sampleconf/recoll.conf > recoll.conf.xml
clean: clean:
rm -f RCL.*.html usermanual.pdf usermanual.html index.html tmpfile.html rm -f RCL.*.html usermanual.pdf usermanual.html index.html tmpfile.html


@@ -24,6 +24,14 @@ you probably want this indexed. One possible solution is to have ".*" in
list, see the "noContentSuffixes" variable for an alternative approach list, see the "noContentSuffixes" variable for an alternative approach
which indexes the file names. Can be redefined for any which indexes the file names. Can be redefined for any
subtree.</para></listitem></varlistentry> subtree.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDNAMES-">
<term><varname>skippedNames-</varname></term>
<listitem><para>List of name endings to remove from the default skippedNames
list. </para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDNAMES+">
<term><varname>skippedNames+</varname></term>
<listitem><para>List of name endings to add to the default skippedNames
list. </para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOCONTENTSUFFIXES"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOCONTENTSUFFIXES">
<term><varname>noContentSuffixes</varname></term> <term><varname>noContentSuffixes</varname></term>
<listitem><para>List of name endings (not necessarily dot-separated suffixes) for <listitem><para>List of name endings (not necessarily dot-separated suffixes) for
@@ -35,6 +43,14 @@ recoll.conf allows editing the list through the GUI). This is different
from skippedNames because these are name ending matches only (not from skippedNames because these are name ending matches only (not
wildcard patterns), and the file name itself gets indexed normally. This wildcard patterns), and the file name itself gets indexed normally. This
can be redefined for subdirectories.</para></listitem></varlistentry> can be redefined for subdirectories.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOCONTENTSUFFIXES-">
<term><varname>noContentSuffixes-</varname></term>
<listitem><para>List of name endings to remove from the default noContentSuffixes
list. </para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOCONTENTSUFFIXES+">
<term><varname>noContentSuffixes+</varname></term>
<listitem><para>List of name endings to add to the default noContentSuffixes
list. </para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDPATHS"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDPATHS">
<term><varname>skippedPaths</varname></term> <term><varname>skippedPaths</varname></term>
<listitem><para>Paths we should not go into. Space-separated list of <listitem><para>Paths we should not go into. Space-separated list of
@@ -92,6 +108,16 @@ subtrees.</para></listitem></varlistentry>
<listitem><para>List of excluded MIME <listitem><para>List of excluded MIME
types. Lets you exclude some types from indexing. Can be types. Lets you exclude some types from indexing. Can be
redefined for subtrees.</para></listitem></varlistentry> redefined for subtrees.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NOMD5MIMETYPES">
<term><varname>nomd5mimetypes</varname></term>
<listitem><para>Don't compute md5 for
these types. md5 checksums are used only for deduplicating
results, and can be very expensive to compute on multimedia or other big
files. This list lets you turn off md5 computation for selected types. It
is global (no redefinition for subtrees). At the moment, it only has an
effect for external handlers (exec and execm). The file types can be
specified by listing either MIME types (e.g. audio/mpeg) or handler names
(e.g. rclaudio).</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.COMPRESSEDFILEMAXKBS"> <varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.COMPRESSEDFILEMAXKBS">
<term><varname>compressedfilemaxkbs</varname></term> <term><varname>compressedfilemaxkbs</varname></term>
<listitem><para>Size limit for compressed <listitem><para>Size limit for compressed
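
The `skippedNames+` / `skippedNames-` and `noContentSuffixes+` / `noContentSuffixes-` entries documented in this file adjust a default list instead of replacing it. A minimal Python sketch of that idea follows. The function name `effective_list`, the sample default values, and the order of operations (removals first, then additions) are assumptions made for illustration; this is not Recoll's actual implementation.

```python
def effective_list(defaults, plus=(), minus=()):
    """Return `defaults` without the `minus` entries, followed by new `plus` entries."""
    removed = set(minus)
    # Keep the default entries that were not explicitly removed.
    result = [name for name in defaults if name not in removed]
    # Append additions, skipping entries that are already present.
    for name in plus:
        if name not in result:
            result.append(name)
    return result


# Hypothetical default list, for illustration only.
defaults = ["*~", ".*", "CVS"]
print(effective_list(defaults, plus=["*.bak"], minus=[".*"]))
```

With this model, a user who only wants hidden files indexed lists `.*` in the `-` variable and does not have to copy the rest of the default list.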


@@ -20,8 +20,8 @@ alink="#0000FF">
<div class="titlepage"> <div class="titlepage">
<div> <div>
<div> <div>
<h1 class="title"><a name="idp55872064" id= <h1 class="title"><a name="idp7068352" id=
"idp55872064"></a>Recoll user manual</h1> "idp7068352"></a>Recoll user manual</h1>
</div> </div>
<div> <div>
@@ -109,13 +109,13 @@ alink="#0000FF">
multiple indexes</a></span></dt> multiple indexes</a></span></dt>
<dt><span class="sect2">2.1.3. <a href= <dt><span class="sect2">2.1.3. <a href=
"#idp58917632">Document types</a></span></dt> "#idp39073168">Document types</a></span></dt>
<dt><span class="sect2">2.1.4. <a href= <dt><span class="sect2">2.1.4. <a href=
"#idp59454368">Indexing failures</a></span></dt> "#idp39095632">Indexing failures</a></span></dt>
<dt><span class="sect2">2.1.5. <a href= <dt><span class="sect2">2.1.5. <a href=
"#idp60727408">Recovery</a></span></dt> "#idp39102640">Recovery</a></span></dt>
</dl> </dl>
</dd> </dd>
@@ -376,7 +376,7 @@ alink="#0000FF">
handler</a></span></dt> handler</a></span></dt>
<dt><span class="sect2">4.1.4. <a href= <dt><span class="sect2">4.1.4. <a href=
"#RCL.PROGRAM.FILTERS.HTML">Input handler HTML "#RCL.PROGRAM.FILTERS.HTML">Input handler
output</a></span></dt> output</a></span></dt>
<dt><span class="sect2">4.1.5. <a href= <dt><span class="sect2">4.1.5. <a href=
@@ -997,8 +997,8 @@ alink="#0000FF">
<div class="titlepage"> <div class="titlepage">
<div> <div>
<div> <div>
<h3 class="title"><a name="idp58917632" id= <h3 class="title"><a name="idp39073168" id=
"idp58917632"></a>2.1.3.&nbsp;Document types</h3> "idp39073168"></a>2.1.3.&nbsp;Document types</h3>
</div> </div>
</div> </div>
</div> </div>
@@ -1105,8 +1105,8 @@ indexedmimetypes = application/pdf
<div class="titlepage"> <div class="titlepage">
<div> <div>
<div> <div>
<h3 class="title"><a name="idp59454368" id= <h3 class="title"><a name="idp39095632" id=
"idp59454368"></a>2.1.4.&nbsp;Indexing "idp39095632"></a>2.1.4.&nbsp;Indexing
failures</h3> failures</h3>
</div> </div>
</div> </div>
@@ -1146,8 +1146,8 @@ indexedmimetypes = application/pdf
<div class="titlepage"> <div class="titlepage">
<div> <div>
<div> <div>
<h3 class="title"><a name="idp60727408" id= <h3 class="title"><a name="idp39102640" id=
"idp60727408"></a>2.1.5.&nbsp;Recovery</h3> "idp39102640"></a>2.1.5.&nbsp;Recovery</h3>
</div> </div>
</div> </div>
</div> </div>
@@ -5987,9 +5987,8 @@ dir:recoll dir:src -dir:utils -dir:common
deciding factor is metadata: <span class= deciding factor is metadata: <span class=
"application">Recoll</span> has a way to <a class="link" "application">Recoll</span> has a way to <a class="link"
href="#RCL.PROGRAM.FILTERS.HTML" title= href="#RCL.PROGRAM.FILTERS.HTML" title=
"4.1.4.&nbsp;Input handler HTML output">extract metadata "4.1.4.&nbsp;Input handler output">extract metadata from
from the HTML header and use it for field the HTML header and use it for field searches.</a>.</p>
searches.</a>.</p>
<p>The <code class= <p>The <code class=
"envar">RECOLL_FILTER_FORPREVIEW</code> environment "envar">RECOLL_FILTER_FORPREVIEW</code> environment
@@ -6196,13 +6195,32 @@ application/x-chm = execm rclchm
<h3 class="title"><a name= <h3 class="title"><a name=
"RCL.PROGRAM.FILTERS.HTML" id= "RCL.PROGRAM.FILTERS.HTML" id=
"RCL.PROGRAM.FILTERS.HTML"></a>4.1.4.&nbsp;Input "RCL.PROGRAM.FILTERS.HTML"></a>4.1.4.&nbsp;Input
handler HTML output</h3> handler output</h3>
</div> </div>
</div> </div>
</div> </div>
<p>The output HTML could be very minimal like the <p>Both the simple and persistent input handlers can
following example:</p> return any MIME type to Recoll, which will further
process the data according to the MIME configuration.</p>
<p>Most input filters produce either <code class=
"literal">text/plain</code> or <code class=
"literal">text/html</code> data. There are exceptions,
for example, filters which process archive files
(<code class="literal">zip</code>, <code class=
"literal">tar</code>, etc.) will usually return the
documents as they are found, without processing them
further.</p>
<p>There is nothing to say about <code class=
"literal">text/plain</code> output, except that its
character encoding should be consistent with what is
specified in the <code class="filename">mimeconf</code>
file.</p>
<p>For filters producing HTML, the output could be very
minimal like the following example:</p>
<pre class="programlisting"> <pre class="programlisting">
&lt;html&gt; &lt;html&gt;
&lt;head&gt; &lt;head&gt;
@@ -6222,9 +6240,9 @@ application/x-chm = execm rclchm
"literal">&amp;amp;</code>", "<code class= "literal">&amp;amp;</code>", "<code class=
"literal">&lt;</code>" should be transformed into "literal">&lt;</code>" should be transformed into
"<code class="literal">&amp;lt;</code>". This is not "<code class="literal">&amp;lt;</code>". This is not
always properly done by translating programs which output always properly done by external helper programs which
HTML, and of course never by those which output plain output HTML, and of course never by those which output
text.</p> plain text.</p>
<p>When encapsulating plain text in an HTML body, the <p>When encapsulating plain text in an HTML body, the
display of a preview may be improved by enclosing the display of a preview may be improved by enclosing the
@@ -6293,6 +6311,17 @@ or
described in a <a class="link" href="#RCL.PROGRAM.FIELDS" described in a <a class="link" href="#RCL.PROGRAM.FIELDS"
title="4.2.&nbsp;Field data processing">further title="4.2.&nbsp;Field data processing">further
section</a>.</p> section</a>.</p>
<p>Persistent filters can use another, probably simpler,
method to produce metadata, by calling the <code class=
"literal">setfield()</code> helper method. This avoids
the necessity to produce HTML, and any issue with HTML
quoting. See, for example, <code class=
"filename">rclaudio</code> in <span class=
"application">Recoll</span> 1.23 and later for an example
of a handler which outputs <code class=
"literal">text/plain</code> and uses <code class=
"literal">setfield()</code> to produce metadata.</p>
</div> </div>
<div class="sect2"> <div class="sect2">
@@ -8676,6 +8705,23 @@ thesame = "some string with spaces"
names. Can be redefined for any subtree.</p> names. Can be redefined for any subtree.</p>
</dd> </dd>
<dt><a name=
"RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDNAMES-" id=
"RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDNAMES-"></a><span class="term"><code class="varname">skippedNames-</code></span></dt>
<dd>
<p>List of name endings to remove from the default
skippedNames list.</p>
</dd>
<dt><a name=
"RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDNAMES+"></a><span class="term"><code class="varname">skippedNames+</code></span></dt>
<dd>
<p>List of name endings to add to the default
skippedNames list.</p>
</dd>
<dt><a name= <dt><a name=
"RCL.INSTALL.CONFIG.RECOLLCONF.NOCONTENTSUFFIXES" id= "RCL.INSTALL.CONFIG.RECOLLCONF.NOCONTENTSUFFIXES" id=
"RCL.INSTALL.CONFIG.RECOLLCONF.NOCONTENTSUFFIXES"></a><span class="term"><code class="varname">noContentSuffixes</code></span></dt> "RCL.INSTALL.CONFIG.RECOLLCONF.NOCONTENTSUFFIXES"></a><span class="term"><code class="varname">noContentSuffixes</code></span></dt>
@@ -8696,6 +8742,25 @@ thesame = "some string with spaces"
subdirectories.</p> subdirectories.</p>
</dd> </dd>
<dt><a name=
"RCL.INSTALL.CONFIG.RECOLLCONF.NOCONTENTSUFFIXES-"
id="RCL.INSTALL.CONFIG.RECOLLCONF.NOCONTENTSUFFIXES-">
</a><span class="term"><code class=
"varname">noContentSuffixes-</code></span></dt>
<dd>
<p>List of name endings to remove from the default
noContentSuffixes list.</p>
</dd>
<dt><a name=
"RCL.INSTALL.CONFIG.RECOLLCONF.NOCONTENTSUFFIXES+"></a><span class="term"><code class="varname">noContentSuffixes+</code></span></dt>
<dd>
<p>List of name endings to add to the default
noContentSuffixes list.</p>
</dd>
<dt><a name= <dt><a name=
"RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDPATHS" id= "RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDPATHS" id=
"RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDPATHS"></a><span class="term"><code class="varname">skippedPaths</code></span></dt> "RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDPATHS"></a><span class="term"><code class="varname">skippedPaths</code></span></dt>
@@ -8798,6 +8863,23 @@ thesame = "some string with spaces"
subtrees.</p> subtrees.</p>
</dd> </dd>
<dt><a name=
"RCL.INSTALL.CONFIG.RECOLLCONF.NOMD5MIMETYPES" id=
"RCL.INSTALL.CONFIG.RECOLLCONF.NOMD5MIMETYPES"></a><span class="term"><code class="varname">nomd5mimetypes</code></span></dt>
<dd>
<p>Don't compute md5 for these types. md5 checksums
are used only for deduplicating results, and can be
very expensive to compute on multimedia or other
big files. This list lets you turn off md5
computation for selected types. It is global (no
redefinition for subtrees). At the moment, it only
has an effect for external handlers (exec and
execm). The file types can be specified by listing
either MIME types (e.g. audio/mpeg) or handler
names (e.g. rclaudio).</p>
</dd>
<dt><a name= <dt><a name=
"RCL.INSTALL.CONFIG.RECOLLCONF.COMPRESSEDFILEMAXKBS" "RCL.INSTALL.CONFIG.RECOLLCONF.COMPRESSEDFILEMAXKBS"
id= id=
@@ -9967,6 +10049,13 @@ x-my-tag = mailmytag
off, or the command changed inside the main configuration off, or the command changed inside the main configuration
file).</p> file).</p>
<p>All extension values in <code class=
"filename">mimemap</code> must be entered in lower case.
File name extensions are lower-cased for comparison
during indexing, meaning that an upper case <code class=
"filename">mimemap</code> entry will never be
matched.</p>
<p>The mappings can be specified on a per-subtree basis, <p>The mappings can be specified on a per-subtree basis,
which may be useful in some cases. Example: <span class= which may be useful in some cases. Example: <span class=
"application">okular</span> notes have a <code class= "application">okular</span> notes have a <code class=
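
The `mimemap` note added in this file (extensions must be entered in lower case) can be illustrated with a short Python sketch. The dictionary model of `mimemap`, the leading-dot keys, and the function name `lookup_mime` are assumptions for illustration only; they mirror the described behaviour, where the file name extension is lower-cased before comparison, and do not reproduce Recoll's code.

```python
import os


def lookup_mime(filename, mimemap):
    """Look up a MIME type using the lower-cased file name extension."""
    ext = os.path.splitext(filename)[1].lower()
    return mimemap.get(ext)


# Hypothetical mappings: the second key is upper case on purpose.
mimemap = {".pdf": "application/pdf", ".TXT": "text/plain"}

print(lookup_mime("REPORT.PDF", mimemap))  # matched through the lower-case key
print(lookup_mime("notes.TXT", mimemap))   # None: the upper-case key is never matched
```

Because the extension is always lower-cased before the lookup, the `.TXT` entry can never be reached, which is why such entries have no effect.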


@@ -4211,10 +4211,26 @@ application/x-chm = execm rclchm
</sect2> </sect2>
<sect2 id="RCL.PROGRAM.FILTERS.HTML"> <sect2 id="RCL.PROGRAM.FILTERS.HTML">
<title>Input handler HTML output</title> <title>Input handler output</title>
<para>The output HTML could be very minimal like the following <para>Both the simple and persistent input handlers can return any
example: MIME type to Recoll, which will further process the data according
to the MIME configuration.</para>
<para>Most input filters produce either
<literal>text/plain</literal> or <literal>text/html</literal>
data. There are exceptions, for example, filters which process
archive files (<literal>zip</literal>, <literal>tar</literal>, etc.)
will usually return the documents as they are found, without
processing them further.</para>
<para>There is nothing to say about <literal>text/plain</literal>
output, except that its character encoding should be consistent
with what is specified in the <filename>mimeconf</filename>
file.</para>
<para>For filters producing HTML, the output could be very minimal
like the following example:
<programlisting> <programlisting>
&lt;html> &lt;html>
&lt;head> &lt;head>
@@ -4234,7 +4250,7 @@ application/x-chm = execm rclchm
"<literal>&amp;amp;</literal>", "<literal>&lt;</literal>" "<literal>&amp;amp;</literal>", "<literal>&lt;</literal>"
should be transformed into should be transformed into
"<literal>&amp;lt;</literal>". This is not always properly "<literal>&amp;lt;</literal>". This is not always properly
done by translating programs which output HTML, and of done by external helper programs which output HTML, and of
course never by those which output plain text. </para> course never by those which output plain text. </para>
<para>When encapsulating plain text in an HTML body, <para>When encapsulating plain text in an HTML body,
@@ -4298,6 +4314,16 @@ or
in a <link linkend="RCL.PROGRAM.FIELDS">further in a <link linkend="RCL.PROGRAM.FIELDS">further
section</link>.</para> section</link>.</para>
<para>Persistent filters can use another, probably simpler,
method to produce metadata, by calling the
<literal>setfield()</literal> helper method. This avoids the
necessity to produce HTML, and any issue with HTML quoting. See,
for example, <filename>rclaudio</filename> in &RCL; 1.23 and
later for an example of a handler which outputs
<literal>text/plain</literal> and uses
<literal>setfield()</literal> to produce metadata.</para>
</sect2> </sect2>
<sect2 id="RCL.PROGRAM.FILTERS.PAGES"> <sect2 id="RCL.PROGRAM.FILTERS.PAGES">