User doc: small improvements

Jean-Francois Dockes 2017-12-03 10:19:40 +01:00
parent 572eb5b57d
commit 944076da54
4 changed files with 1166 additions and 1227 deletions


@ -1,4 +1,12 @@
# Wherever docbook.xsl and chunk.xsl live
# Wherever docbook.xsl and chunk.xsl live.
# NOTE: THIS IS HARDCODED inside custom.xsl (for changing the output
# charset), which needs to change if the stylesheet location changes.
# Necessity of custom.xsl:
# http://www.sagehill.net/docbookxsl/OutputEncoding.html
# Fbsd # Fbsd
#XSLDIR="/usr/local/share/xsl/docbook/" #XSLDIR="/usr/local/share/xsl/docbook/"
# Mac # Mac
@ -26,7 +34,7 @@ webh:
usermanual.html: usermanual.xml usermanual.html: usermanual.xml
xsltproc --xinclude ${commonoptions} \ xsltproc --xinclude ${commonoptions} \
-o tmpfile.html "${XSLDIR}/html/docbook.xsl" $< -o tmpfile.html custom.xsl $<
-tidy -indent tmpfile.html > usermanual.html -tidy -indent tmpfile.html > usermanual.html
rm -f tmpfile.html rm -f tmpfile.html

14
src/doc/user/custom.xsl Normal file

@ -0,0 +1,14 @@
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:import
href="/usr/share/xml/docbook/stylesheet/docbook-xsl/html/docbook.xsl"/>
<xsl:output method="html"
doctype-public="-//W3C//DTD HTML 4.01//EN"
doctype-system="http://www.w3.org/TR/html4/strict.dtd"
encoding="UTF-8"
indent="no"/>
</xsl:stylesheet>

File diff suppressed because it is too large


@ -1,9 +1,11 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" <!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [ "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
<!ENTITY RCL "<application>Recoll</application>"> <!ENTITY RCL "<application>Recoll</application>">
<!ENTITY RCLAPPS "<ulink url='http://www.recoll.org/features.html#doctypes'>http://www.recoll.org/features.html</ulink>"> <!ENTITY RCLAPPS "<ulink url='http://www.recoll.org/features.html#doctypes'>http://www.recoll.org/features.html</ulink>">
<!ENTITY RCLVERSION "1.22"> <!ENTITY RCLVERSION "1.23">
<!ENTITY XAP "<application>Xapian</application>"> <!ENTITY XAP "<application>Xapian</application>">
<!ENTITY WIN "<application>Windows</application>"> <!ENTITY WIN "<application>Windows</application>">
<!ENTITY FAQS "https://www.lesbonscomptes.com/recoll/faqsandhowtos/"> <!ENTITY FAQS "https://www.lesbonscomptes.com/recoll/faqsandhowtos/">
@ -50,16 +52,16 @@
<para>This document introduces full text search notions <para>This document introduces full text search notions
and describes the installation and use of the &RCL; and describes the installation and use of the &RCL;
application. This version describes &RCL; &RCLVERSION;.</para> application. It is updated for &RCL; &RCLVERSION;.</para>
<para>&RCL; was for a long time dedicated to Unix-like systems. It <para>&RCL; was for a long time dedicated to Unix-like systems. It
was only lately (2015) ported to was only lately (2015) ported to
<application>MS-Windows</application>. Many references in this <application>MS-Windows</application>. Many references in this
manual, especially file locations, are specific to Unix, and not manual, especially file locations, are specific to Unix, and not
valid on &WIN;. Some described features are also not available on valid on &WIN;, where some described features are also not available.
&WIN;. The manual will be progressively updated. Until this happens, The manual will be progressively updated. Until this happens, on
most references to shared files can be translated by looking under &WIN;, most references to shared files can be translated by looking
the Recoll installation directory (esp. the under the Recoll installation directory (esp. the
<filename>Share</filename> subdirectory). The user configuration is <filename>Share</filename> subdirectory). The user configuration is
stored by default under <filename>AppData/Local/Recoll</filename> stored by default under <filename>AppData/Local/Recoll</filename>
inside the user directory, along with the index itself.</para> inside the user directory, along with the index itself.</para>
@ -79,7 +81,9 @@
number of documents and you do not want to wait or are very number of documents and you do not want to wait or are very
short on disk space. In this case, you may first want to customize short on disk space. In this case, you may first want to customize
the <link linkend="RCL.INDEXING.CONFIG">configuration</link> the <link linkend="RCL.INDEXING.CONFIG">configuration</link>
to restrict the indexed area (for the very impatient with a completed package install, from the <command>recoll</command> GUI: <menuchoice> to restrict the indexed area (for the very impatient with a
completed package install, from the <command>recoll</command> GUI:
<menuchoice>
<guimenu>Preferences</guimenu> <guimenu>Preferences</guimenu>
<guimenuitem>Indexing configuration</guimenuitem> <guimenuitem>Indexing configuration</guimenuitem>
</menuchoice>, then adjust the <guilabel>Top </menuchoice>, then adjust the <guilabel>Top
@ -91,9 +95,9 @@
example <application>antiword</application> for example <application>antiword</application> for
<application>Microsoft Word</application> files).</para> <application>Microsoft Word</application> files).</para>
<para>The &RCL; installation for &WIN; is self-contained and includes <para>The &RCL; for &WIN; package is self-contained and includes
most useful auxiliary programs. You will just need to install Python most useful auxiliary programs. You will just need to install
2.7.</para> <application>Python</application> 2.7.</para>
</sect1> </sect1>
@ -121,7 +125,9 @@
very complex, and in general are inferior to the power of the very complex, and in general are inferior to the power of the
human mind to rapidly determine relevance. The quality of human mind to rapidly determine relevance. The quality of
relevance guessing is probably the most important aspect when relevance guessing is probably the most important aspect when
evaluating a search application.</para> evaluating a search application. &RCL; relies on the &XAP;
probabilistic information retrieval library to determine
relevance.</para>
<para>In many cases, you are looking for all the forms of a <para>In many cases, you are looking for all the forms of a
word, including plurals, different tenses for a verb, or terms word, including plurals, different tenses for a verb, or terms
@ -132,13 +138,14 @@
same stem). This can be prevented for searching for a specific same stem). This can be prevented for searching for a specific
form.</para> form.</para>
<para>Stemming, by itself, does not accommodate misspellings <para>Stemming, by itself, does not accommodate misspellings or
or phonetic searches. A full text search application may also phonetic searches. A full text search application may also support
support this form of approximation. For example, a search for this form of approximation. For example, a search for
<replaceable>aliterattion</replaceable> returning no result may <replaceable>aliterattion</replaceable> returning no result might
propose, depending on index contents, <replaceable>alliteration propose <replaceable>alliteration, alteration, alterations, or
alteration alterations altercation</replaceable> as possible altercation</replaceable> as possible replacement terms. &RCL; bases
replacement terms. </para> its suggestions on the actual index contents, so that suggestions may
be made for words which would not appear in a standard dictionary.</para>
</sect1> </sect1>
@ -248,28 +255,35 @@
location defined by <application>Qt</application>.</para> location defined by <application>Qt</application>.</para>
<para>The <link linkend="RCL.INDEXING.PERIODIC.EXEC">indexing <para>The <link linkend="RCL.INDEXING.PERIODIC.EXEC">indexing
process</link> is started automatically the first time you process</link> is started automatically (after asking permission) the
execute the <command>recoll</command> GUI. Indexing can also first time you execute the <command>recoll</command> GUI. Indexing
be performed by executing the <command>recollindex</command> can also be performed by executing the <command>recollindex</command>
command. &RCL; indexing is multithreaded by default when command. &RCL; indexing is multithreaded by default when appropriate
appropriate hardware resources are available, and can perform hardware resources are available, and can perform in parallel
in parallel multiple tasks among text extraction, segmentation multiple tasks for text extraction, segmentation and index
and index updates.</para> updates.</para>
<para><link linkend="RCL.SEARCH">Searches</link> are usually <para><link linkend="RCL.SEARCH">Searches</link> are usually
performed inside the <command>recoll</command> GUI, which has many performed inside the <command>recoll</command> GUI, which has many
options to help you find what you are looking for. However, there options to help you find what you are looking for. However, there
are other ways to perform &RCL; searches: mostly a <link are other ways to perform &RCL; searches:
linkend="RCL.SEARCH.COMMANDLINE"> <itemizedlist>
command line interface</link>, a <listitem><para>A <link linkend="RCL.SEARCH.COMMANDLINE">
<link linkend="RCL.PROGRAM.PYTHONAPI"> command line interface</link>.</para></listitem>
<listitem><para>A <link linkend="RCL.PROGRAM.PYTHONAPI">
<application>Python</application> <application>Python</application>
programming interface</link>, a <link linkend="RCL.SEARCH.KIO"> programming interface</link>.</para></listitem>
<application>KDE</application> KIO slave module</link>, and <listitem><para>A <link linkend="RCL.SEARCH.KIO">
Ubuntu Unity <ulink url="https://bitbucket.org/medoc/unity-lens-recoll"> <application>KDE</application> KIO slave
Lens</ulink> (for older versions) or module</link>.</para></listitem>
<ulink url="https://bitbucket.org/medoc/unity-scope-recoll"> <listitem><para>An Ubuntu Unity <ulink
Scope</ulink> (for current versions) modules. url="https://bitbucket.org/medoc/unity-scope-recoll">Scope</ulink>
module.</para></listitem>
<listitem><para>A <ulink
url="https://github.com/koniu/recoll-webui">Web
interface</ulink>.
</para></listitem>
</itemizedlist>
</para> </para>
</sect1> </sect1>
@ -296,7 +310,7 @@
optimization, and a new behaviour in version 1.21 (failed files optimization, and a new behaviour in version 1.21 (failed files
were always retried by previous versions). The command line were always retried by previous versions). The command line
option <option>-k</option> can be set to retry failed files, for option <option>-k</option> can be set to retry failed files, for
example after updating a filter.</para> example after updating an input handler.</para>
<para>The following sections give an overview of different <para>The following sections give an overview of different
aspects of the indexing processes and configuration, with links aspects of the indexing processes and configuration, with links
@ -375,42 +389,58 @@
<sect2 id="RCL.INDEXING.INTRODUCTION.CONFIG"> <sect2 id="RCL.INDEXING.INTRODUCTION.CONFIG">
<title>Configurations, multiple indexes</title> <title>Configurations, multiple indexes</title>
<para>The parameters describing what is to be indexed and <para>&RCL; supports defining multiple indexes.</para>
local preferences are defined in text files contained in a
<link linkend="RCL.INDEXING.CONFIG">configuration
directory</link>.</para>
<para>All parameters have defaults, defined in system-wide <para>Each index is defined by its own <link
files.</para> linkend="RCL.INDEXING.CONFIG">configuration directory</link>, in
which several configuration files describe what should be indexed
<para>Without further configuration, &RCL; will index all and how.</para>
appropriate files from your home directory, with a reasonable
set of defaults.</para>
<para>A default personal configuration directory <para>A default personal configuration directory
(<filename>$HOME/.recoll/</filename>) is created (<filename>$HOME/.recoll/</filename>) is created
when a &RCL; program is first executed. It is possible to when a &RCL; program is first executed. This configuration is
create other configuration directories, and use them by the one used for indexing and querying when no specific
setting the <envar>RECOLL_CONFDIR</envar> environment configuration is specified.</para>
variable, or giving the <option>-c</option> option to any of
the &RCL; commands.</para>
<para>In some cases, it may be interesting to index different <para>All configuration parameters have defaults, defined in
areas of the file system to separate databases. You can do this system-wide files. Without further customisation, the default
by using multiple configuration directories, each indexing a configuration will process your complete home directory, with a
file system area to a specific database. Typically, this reasonable set of defaults. It can be changed to process a
would be done to separate personal and shared different area of the file system, select files in different ways,
and many other things.</para>
<para>In some cases, it may be interesting, for example, to index
different areas of the file system into separate indexes, or use
different options. You can do this by creating additional
configuration directories.</para>
<para>Examples of usage would be to separate personal and shared
indexes, or to take advantage of the organization of your data indexes, or to take advantage of the organization of your data
to improve search precision.</para> to improve search precision.</para>
<para>The generated indexes can <para>A specific configuration can be selected by setting the
be queried concurrently in a transparent manner.</para> <envar>RECOLL_CONFDIR</envar> environment variable, or giving the
<option>-c</option> option to any of the &RCL; commands.</para>
<para>For index generation, multiple configurations are <para>When generating indexes, the different configurations are
totally independent from each other. When multiple indexes need entirely independent (no parameters are ever shared between
to be used for a single search, configurations when indexing).</para>
<link linkend="RCL.INDEXING.CONFIG.MULTIPLE">some parameters
should be consistent among the configurations</link>.</para> <para>Multiple indexes can be queried concurrently, either from the
GUI or the command line. When doing this, there is always a main
configuration, from which both configuration and index data are
used. Only the index data from the additional indexes is used
(their configuration parameters are ignored).</para>
<para>This is important and sometimes confusing, so it will be
rephrased here: for index generation, multiple configurations are
totally independent from each other. When querying, configuration
and data are used from the main index (the one designated by
<literal>-c</literal> or <envar>RECOLL_CONFDIR</envar>), and only
the data from the additional indexes is used. This also implies
that <link linkend="RCL.INDEXING.CONFIG.MULTIPLE">some parameters
should be consistent among the configurations</link> for indexes
which are to be used together.</para>
</sect2> </sect2>
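
The two ways of selecting a configuration described in this hunk (the `RECOLL_CONFDIR` environment variable and the `-c` option) can be sketched as a small helper that only builds the command line and environment. This is an illustration, not part of Recoll: the function name `recoll_invocation` and the example directory are hypothetical.

```python
import os


def recoll_invocation(program, confdir=None, use_env=False, args=()):
    """Return (argv, env) for running a Recoll command with one configuration.

    Hypothetical helper for illustration only. When confdir is None, the
    command falls back to whatever the environment or the default
    configuration directory selects.
    """
    argv = [program]
    env = dict(os.environ)
    if confdir is not None:
        if use_env:
            # Select the configuration through the environment variable.
            env["RECOLL_CONFDIR"] = confdir
        else:
            # Select the configuration through the -c option.
            argv += ["-c", confdir]
    argv += list(args)
    return argv, env
```

The result could be passed to `subprocess.run(argv, env=env, check=True)`, for example with `recoll_invocation("recollindex", "/home/me/.recoll-shared")`, where the directory name is an assumption made up for the example.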
@ -450,21 +480,11 @@
<para>By default, &RCL; will try to index any file type that <para>By default, &RCL; will try to index any file type that
it has a way to read. This is sometimes not desirable, and it has a way to read. This is sometimes not desirable, and
there are ways to either exclude some types, or on the there are ways to either exclude some types, or on the
contrary to define a positive list of types to be contrary define a positive list of types to be
indexed. In the latter case, any type not in the list will indexed. In the latter case, any type not in the list will
be ignored.</para> be ignored.</para>
<note><title>Note about MIME types</title> <para>Excluding file types can be done by adding wildcard name
<para>When editing the <literal>indexedmimetypes</literal>
or <literal>excludedmimetypes</literal> lists, you should use the
MIME values listed in the <filename>mimemap</filename> file
or in Recoll result lists in preference to <literal>file -i</literal>
output: there are a number of differences. The
<literal>file -i</literal> output should only be used for files
without extensions, or for which the extension is not listed in
<filename>mimemap</filename></para></note>
<para>Excluding types can be done by adding wildcard name
patterns to the patterns to the
<link linkend="RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDNAMES"> <link linkend="RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDNAMES">
skippedNames</link> list, which skippedNames</link> list, which
@ -491,15 +511,24 @@ indexedmimetypes = application/pdf
</para> </para>
<para><literal>excludedmimetypes</literal> or <para><literal>excludedmimetypes</literal> or
<literal>indexedmimetypes</literal>, can be set either by <literal>indexedmimetypes</literal>, can be set either by editing
editing the <link linkend="RCL.INSTALL.CONFIG.RECOLLCONF"> the <link linkend="RCL.INSTALL.CONFIG.RECOLLCONF">configuration
main configuration file file (<filename>recoll.conf</filename>)</link> for
(<filename>recoll.conf</filename>)</link>, or from the GUI the index, or by using the GUI index configuration tool.</para>
index configuration tool.</para>
<note><title>Note about MIME types</title>
<para>When editing the <literal>indexedmimetypes</literal>
or <literal>excludedmimetypes</literal> lists, you should use the
MIME values listed in the <filename>mimemap</filename> file
or in Recoll result lists in preference to <literal>file -i</literal>
output: there are a number of differences. The
<literal>file -i</literal> output should only be used for files
without extensions, or for which the extension is not listed in
<filename>mimemap</filename></para></note>
</sect2> </sect2>
<sect2> <sect2>
<title>Indexing failures</title> <title>Indexing failures</title>
@ -531,6 +560,7 @@ indexedmimetypes = application/pdf
<sect2> <sect2>
<title>Recovery</title> <title>Recovery</title>
<para>In the rare case where the index becomes corrupted (which can <para>In the rare case where the index becomes corrupted (which can
signal itself by weird search results or crashes), the index files signal itself by weird search results or crashes), the index files
need to be erased before restarting a clean indexing pass. Just delete need to be erased before restarting a clean indexing pass. Just delete
@ -538,7 +568,11 @@ indexedmimetypes = application/pdf
<link linkend="RCL.INDEXING.STORAGE">next section</link>), or, <link linkend="RCL.INDEXING.STORAGE">next section</link>), or,
alternatively, start the next <command>recollindex</command> with the alternatively, start the next <command>recollindex</command> with the
<option>-z</option> option, which will reset the database before <option>-z</option> option, which will reset the database before
indexing.</para> indexing. The difference between the two methods is that the
second will not change the current index format, which may be
undesirable if a newer format is supported by the &XAP;
version.</para>
</sect2> </sect2>
</sect1> </sect1>
@ -585,7 +619,6 @@ indexedmimetypes = application/pdf
desired another location for the index, typically out of disk desired another location for the index, typically out of disk
occupation concerns.</para> occupation concerns.</para>
</listitem> </listitem>
</itemizedlist> </itemizedlist>
</para> </para>
@ -600,7 +633,7 @@ indexedmimetypes = application/pdf
indexed).</para> indexed).</para>
<para>Of course, images, sound and video do not increase the <para>Of course, images, sound and video do not increase the
index size, which means that nowadays (2012), typically, even a big index size, which means that nowadays, typically, even a big
index will be negligible against the total amount of data on the index will be negligible against the total amount of data on the
computer.</para> computer.</para>
@ -617,18 +650,15 @@ indexedmimetypes = application/pdf
used to create new indexes, and will also support the format from used to create new indexes, and will also support the format from
the previous major version.</para> the previous major version.</para>
<para>&XAP; will not automatically convert an existing index <para>&XAP; will not automatically convert an existing index from
from the older format to the newer one. If you want to upgrade to the older format to the newer one. If you want to upgrade to the
the new format, or if a very old index needs to be converted new format, or if a very old index needs to be converted because
because its format is not supported any more, you will have to its format is not supported any more, you will have to explicitly
explicitly delete the old index, then run a normal indexing delete the old index (typically
process.</para> <filename>~/.recoll/xapiandb</filename>), then run a normal
indexing command. Using option <option>-z</option> would not work
in this situation.</para>
<para>Using the <option>-z</option> option to
<command>recollindex</command> is not sufficient to change the
format, you will have to delete all files inside the index
directory (typically <filename>~/.recoll/xapiandb</filename>)
before starting the indexing.</para>
</sect2> </sect2>
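
The difference between the two reset methods (the `-z` option keeps the current index format, deleting the index directory does not) can be sketched as follows. This is a minimal illustration under stated assumptions: the helper name `prepare_clean_reindex` is hypothetical, and the index is assumed to live in the `xapiandb` subdirectory of the configuration directory, which the text gives only as the typical location.

```python
import shutil
from pathlib import Path


def prepare_clean_reindex(confdir, keep_format=False):
    """Return the recollindex argv for a clean indexing pass.

    keep_format=True: rely on -z, which resets the database before
    indexing but does not change the index format.
    keep_format=False: delete the index directory first, so that the next
    indexing pass recreates it from scratch.
    Assumption: the index is stored in <confdir>/xapiandb.
    """
    confdir = Path(confdir)
    argv = ["recollindex", "-c", str(confdir)]
    if keep_format:
        argv.append("-z")
    else:
        # Remove the whole index directory; a missing directory is not an error.
        shutil.rmtree(confdir / "xapiandb", ignore_errors=True)
    return argv
```

The function only prepares the command; running it (for example with `subprocess.run`) is left to the caller, so that the destructive step and the indexing step can be checked separately.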
@ -697,16 +727,16 @@ indexedmimetypes = application/pdf
<para>As of Recoll 1.18 there are two incompatible types of Recoll <para>As of Recoll 1.18 there are two incompatible types of Recoll
indexes, depending on the treatment of character case and indexes, depending on the treatment of character case and
diacritics. The next section describes the two types in more diacritics. A <link linkend="RCL.INDEXING.CONFIG.SENS">further
detail.</para> section</link> describes the two types in more detail.</para>
<sect2 id="RCL.INDEXING.CONFIG.MULTIPLE"> <sect2 id="RCL.INDEXING.CONFIG.MULTIPLE">
<title>Multiple indexes</title> <title>Multiple indexes</title>
<para>Multiple &RCL; indexes can be created by <para>Multiple &RCL; indexes can be created by using several
using several configuration directories which are usually set to configuration directories which are typically set to index
index different areas of the file system. A specific index can different areas of the file system. A specific index can be
be selected for updating or searching, using the selected for updating or searching, using the
<envar>RECOLL_CONFDIR</envar> environment variable or the <envar>RECOLL_CONFDIR</envar> environment variable or the
<option>-c</option> option to <command>recoll</command> and <option>-c</option> option to <command>recoll</command> and
<command>recollindex</command>.</para> <command>recollindex</command>.</para>
@ -717,7 +747,7 @@ indexedmimetypes = application/pdf
<envar>RECOLL_CONFDIR</envar> or the <option>-c</option> parameter, <envar>RECOLL_CONFDIR</envar> or the <option>-c</option> parameter,
and there is no way to switch configurations within the GUI.</para> and there is no way to switch configurations within the GUI.</para>
<para>Additional configuration directory (beyond <para>Additional configuration directories (beyond
<filename>~/.recoll</filename>) must be created by hand <filename>~/.recoll</filename>) must be created by hand
(<command>mkdir</command> or such), the GUI will not do it. This is (<command>mkdir</command> or such), the GUI will not do it. This is
to avoid mistakenly creating additional directories when an to avoid mistakenly creating additional directories when an
@ -735,16 +765,20 @@ indexedmimetypes = application/pdf
worth the trouble.</para> worth the trouble.</para>
<para>A <command>recollindex</command> program instance can only <para>A <command>recollindex</command> program instance can only
update one specific index.</para> update one specific index, and it will only use parameters from a
single configuration (no parameters are ever shared between
configurations when indexing).</para>
<para>The main index (defined by <para>Multiple indexes can be queried concurrently, either from the
<envar>RECOLL_CONFDIR</envar> or <option>-c</option>) is GUI or the command line. When doing this, there is always a main
always active. If this is undesirable, you can set up your configuration, from which both configuration and index data are
base configuration to index an empty directory.</para> used. Only the index data from the additional indexes is used
(their configuration parameters are ignored).</para>
<para>The different search interfaces (GUI, command line, ...) <para>When searching, the current main index (defined by
have different methods to define the set of indexes to be <envar>RECOLL_CONFDIR</envar> or <option>-c</option>) is always
used, see the appropriate section.</para> active. If this is undesirable, you can set up your base
configuration to index an empty directory.</para>
<para>If a set of multiple indexes are to be used together for <para>If a set of multiple indexes are to be used together for
searches, some configuration parameters must be consistent searches, some configuration parameters must be consistent
@ -761,6 +795,11 @@ indexedmimetypes = application/pdf
<link linkend="RCL.INSTALL.CONFIG.RECOLLCONF.TERMS">linked <link linkend="RCL.INSTALL.CONFIG.RECOLLCONF.TERMS">linked
section</link>.</para> section</link>.</para>
<para>The different search interfaces (GUI, command line, ...)
have different methods to define the set of indexes to be
used, see the appropriate section.</para>
</sect2> </sect2>
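
Since additional configuration directories must be created by hand, a short sketch of that step may help. The helper name `create_config` is hypothetical, and the `topdirs` parameter name is an assumption that does not appear in this diff; it should be checked against the configuration reference before use.

```python
from pathlib import Path


def create_config(confdir, topdirs):
    """Create a configuration directory and a minimal recoll.conf.

    Assumption: 'topdirs' is the recoll.conf parameter listing the
    directories to index. Paths containing spaces would need quoting,
    which this sketch does not handle.
    """
    confdir = Path(confdir)
    # Equivalent of creating the directory by hand with mkdir.
    confdir.mkdir(parents=True, exist_ok=True)
    conf = confdir / "recoll.conf"
    if not conf.exists():
        # Never overwrite an existing configuration file.
        conf.write_text("topdirs = " + " ".join(str(d) for d in topdirs) + "\n")
    return conf
```

An existing `recoll.conf` is left untouched, so calling the function twice on the same directory is harmless.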
@ -2357,8 +2396,8 @@ MimeType=*/*
<para>See the <link linkend="RCL.INDEXING.CONFIG.MULTIPLE">section <para>See the <link linkend="RCL.INDEXING.CONFIG.MULTIPLE">section
describing the use of multiple indexes</link> for describing the use of multiple indexes</link> for
generalities. Only the aspects concerning generalities. Only the aspects concerning the
the <command>recoll</command> GUI are described here.</para> <command>recoll</command> GUI are described here.</para>
<para>A <command>recoll</command> program instance is always <para>A <command>recoll</command> program instance is always
associated with a specific index, which is the one to be updated associated with a specific index, which is the one to be updated
@ -2367,13 +2406,12 @@ MimeType=*/*
indexes can be selected through the <guilabel>external indexes can be selected through the <guilabel>external
indexes</guilabel> tab in the preferences dialog.</para> indexes</guilabel> tab in the preferences dialog.</para>
<para>Index selection is performed in two phases. A set of all <para>Index selection is performed in two phases. A set of all usable
usable indexes must first be defined, and then the subset of indexes must first be defined, and then the subset of indexes to be
indexes to be used for searching. These parameters used for searching. These parameters are retained across program
are retained across program executions (they are kept executions (they are kept separately for each &RCL;
separately for each &RCL; configuration). The set of all indexes configuration). The set of all indexes is usually quite stable, while
is usually quite stable, while the active ones might typically the active ones might typically be adjusted quite frequently.</para>
be adjusted quite frequently.</para>
<para>The main index (defined by <para>The main index (defined by
<envar>RECOLL_CONFDIR</envar>) is always active. If this is <envar>RECOLL_CONFDIR</envar>) is always active. If this is