User doc: small improvements

Jean-Francois Dockes 2017-12-03 10:19:40 +01:00
parent 572eb5b57d
commit 944076da54
4 changed files with 1166 additions and 1227 deletions


@ -1,4 +1,12 @@
# Wherever docbook.xsl and chunk.xsl live
# Wherever docbook.xsl and chunk.xsl live.
# NOTE: THIS IS HARDCODED inside custom.xsl (for changing the output
# charset), which needs to change if the stylesheet location changes.
# Necessity of custom.xsl:
# http://www.sagehill.net/docbookxsl/OutputEncoding.html
# Fbsd
#XSLDIR="/usr/local/share/xsl/docbook/"
# Mac
@ -26,7 +34,7 @@ webh:
usermanual.html: usermanual.xml
xsltproc --xinclude ${commonoptions} \
-o tmpfile.html "${XSLDIR}/html/docbook.xsl" $<
-o tmpfile.html custom.xsl $<
-tidy -indent tmpfile.html > usermanual.html
rm -f tmpfile.html

src/doc/user/custom.xsl Normal file

@ -0,0 +1,14 @@
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:import
href="/usr/share/xml/docbook/stylesheet/docbook-xsl/html/docbook.xsl"/>
<xsl:output method="html"
doctype-public="-//W3C//DTD HTML 4.01//EN"
doctype-system="http://www.w3.org/TR/html4/strict.dtd"
encoding="UTF-8"
indent="no"/>
</xsl:stylesheet>
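The Makefile comment warns that the stylesheet location is hardcoded inside custom.xsl. One way to keep the two in sync would be to generate custom.xsl from the same directory value. The following is only a sketch: `render_custom_xsl` and `imported_href` are hypothetical helpers, not part of this commit, and the directory passed in stands in for whatever XSLDIR is set to.

```python
from xml.sax.saxutils import quoteattr
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# Same content as custom.xsl above, with the import path left as a placeholder.
TEMPLATE = """<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:import href={href}/>
  <xsl:output method="html"
              doctype-public="-//W3C//DTD HTML 4.01//EN"
              doctype-system="http://www.w3.org/TR/html4/strict.dtd"
              encoding="UTF-8"
              indent="no"/>
</xsl:stylesheet>
"""


def render_custom_xsl(xsldir):
    """Return the text of custom.xsl importing docbook.xsl from xsldir."""
    href = xsldir.rstrip("/") + "/html/docbook.xsl"
    # quoteattr() escapes the value and adds the surrounding quotes.
    return TEMPLATE.format(href=quoteattr(href))


def imported_href(xsl_text):
    """Parse a stylesheet and return the href of its xsl:import element."""
    root = ET.fromstring(xsl_text)
    return root.find(f"{{{XSL_NS}}}import").get("href")
```

Writing the result of `render_custom_xsl(...)` to custom.xsl before calling xsltproc would remove the need to edit the file by hand when the stylesheets move.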

File diff suppressed because it is too large


@ -1,9 +1,11 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd" [
<!ENTITY RCL "<application>Recoll</application>">
<!ENTITY RCLAPPS "<ulink url='http://www.recoll.org/features.html#doctypes'>http://www.recoll.org/features.html</ulink>">
<!ENTITY RCLVERSION "1.22">
<!ENTITY RCLVERSION "1.23">
<!ENTITY XAP "<application>Xapian</application>">
<!ENTITY WIN "<application>Windows</application>">
<!ENTITY FAQS "https://www.lesbonscomptes.com/recoll/faqsandhowtos/">
@ -50,16 +52,16 @@
<para>This document introduces full text search notions
and describes the installation and use of the &RCL;
application. This version describes &RCL; &RCLVERSION;.</para>
application. It is updated for &RCL; &RCLVERSION;.</para>
<para>&RCL; was for a long time dedicated to Unix-like systems. It
was only lately (2015) ported to
<application>MS-Windows</application>. Many references in this
manual, especially file locations, are specific to Unix, and not
valid on &WIN;. Some described features are also not available on
&WIN;. The manual will be progressively updated. Until this happens,
most references to shared files can be translated by looking under
the Recoll installation directory (esp. the
valid on &WIN;, where some described features are also not available.
The manual will be progressively updated. Until this happens, on
&WIN;, most references to shared files can be translated by looking
under the Recoll installation directory (esp. the
<filename>Share</filename> subdirectory). The user configuration is
stored by default under <filename>AppData/Local/Recoll</filename>
inside the user directory, along with the index itself.</para>
@ -79,7 +81,9 @@
number of documents and you do not want to wait or are very
short on disk space. In this case, you may first want to customize
the <link linkend="RCL.INDEXING.CONFIG">configuration</link>
to restrict the indexed area (for the very impatient with a completed package install, from the <command>recoll</command> GUI: <menuchoice>
to restrict the indexed area (for the very impatient with a
completed package install, from the <command>recoll</command> GUI:
<menuchoice>
<guimenu>Preferences</guimenu>
<guimenuitem>Indexing configuration</guimenuitem>
</menuchoice>, then adjust the <guilabel>Top
@ -91,9 +95,9 @@
example <application>antiword</application> for
<application>Microsoft Word</application> files).</para>
<para>The &RCL; installation for &WIN; is self-contained and includes
most useful auxiliary programs. You will just need to install Python
2.7.</para>
<para>The &RCL; for &WIN; package is self-contained and includes
most useful auxiliary programs. You will just need to install
<application>Python</application> 2.7.</para>
</sect1>
@ -121,7 +125,9 @@
very complex, and in general are inferior to the power of the
human mind to rapidly determine relevance. The quality of
relevance guessing is probably the most important aspect when
evaluating a search application.</para>
evaluating a search application. &RCL; relies on the &XAP;
probabilistic information retrieval library to determine
relevance.</para>
<para>In many cases, you are looking for all the forms of a
word, including plurals, different tenses for a verb, or terms
@ -132,13 +138,14 @@
same stem). This can be prevented when searching for a specific
form.</para>
<para>Stemming, by itself, does not accommodate for misspellings
or phonetic searches. A full text search application may also
support this form of approximation. For example, a search for
<replaceable>aliterattion</replaceable> returning no result may
propose, depending on index contents, <replaceable>alliteration
alteration alterations altercation</replaceable> as possible
replacement terms. </para>
<para>Stemming, by itself, does not accommodate misspellings or
phonetic searches. A full text search application may also support
this form of approximation. For example, a search for
<replaceable>aliterattion</replaceable> returning no result might
propose <replaceable>alliteration, alteration, alterations, or
altercation</replaceable> as possible replacement terms. &RCL; bases
its suggestions on the actual index contents, so that suggestions may
be made for words which would not appear in a standard dictionary.</para>
</sect1>
@ -248,28 +255,35 @@
location defined by <application>Qt</application>.</para>
<para>The <link linkend="RCL.INDEXING.PERIODIC.EXEC">indexing
process</link> is started automatically the first time you
execute the <command>recoll</command> GUI. Indexing can also
be performed by executing the <command>recollindex</command>
command. &RCL; indexing is multithreaded by default when
appropriate hardware resources are available, and can perform
in parallel multiple tasks among text extraction, segmentation
and index updates.</para>
process</link> is started automatically (after asking permission) the
first time you execute the <command>recoll</command> GUI. Indexing
can also be performed by executing the <command>recollindex</command>
command. &RCL; indexing is multithreaded by default when appropriate
hardware resources are available, and can perform in parallel
multiple tasks for text extraction, segmentation and index
updates.</para>
<para><link linkend="RCL.SEARCH">Searches</link> are usually
performed inside the <command>recoll</command> GUI, which has many
options to help you find what you are looking for. However, there
are other ways to perform &RCL; searches: mostly a <link
linkend="RCL.SEARCH.COMMANDLINE">
command line interface</link>, a
<link linkend="RCL.PROGRAM.PYTHONAPI">
are other ways to perform &RCL; searches:
<itemizedlist>
<listitem><para>A <link linkend="RCL.SEARCH.COMMANDLINE">
command line interface</link>.</para></listitem>
<listitem><para>A <link linkend="RCL.PROGRAM.PYTHONAPI">
<application>Python</application>
programming interface</link>, a <link linkend="RCL.SEARCH.KIO">
<application>KDE</application> KIO slave module</link>, and
Ubuntu Unity <ulink url="https://bitbucket.org/medoc/unity-lens-recoll">
Lens</ulink> (for older versions) or
<ulink url="https://bitbucket.org/medoc/unity-scope-recoll">
Scope</ulink> (for current versions) modules.
programming interface</link>.</para></listitem>
<listitem><para>A <link linkend="RCL.SEARCH.KIO">
<application>KDE</application> KIO slave
module</link>.</para></listitem>
<listitem><para>An Ubuntu Unity <ulink
url="https://bitbucket.org/medoc/unity-scope-recoll">Scope</ulink>
module.</para></listitem>
<listitem><para>A <ulink
url="https://github.com/koniu/recoll-webui">Web
interface</ulink>.
</para></listitem>
</itemizedlist>
</para>
</sect1>
@ -296,7 +310,7 @@
optimization, and a new behaviour in version 1.21 (failed files
were always retried by previous versions). The command line
option <option>-k</option> can be set to retry failed files, for
example after updating a filter.</para>
example after updating an input handler.</para>
<para>The following sections give an overview of different
aspects of the indexing processes and configuration, with links
@ -375,42 +389,58 @@
<sect2 id="RCL.INDEXING.INTRODUCTION.CONFIG">
<title>Configurations, multiple indexes</title>
<para>The parameters describing what is to be indexed and
local preferences are defined in text files contained in a
<link linkend="RCL.INDEXING.CONFIG">configuration
directory</link>.</para>
<para>&RCL; supports defining multiple indexes.</para>
<para>All parameters have defaults, defined in system-wide
files.</para>
<para>Without further configuration, &RCL; will index all
appropriate files from your home directory, with a reasonable
set of defaults.</para>
<para>Each index is defined by its own <link
linkend="RCL.INDEXING.CONFIG">configuration directory</link>, in
which several configuration files describe what should be indexed
and how.</para>
<para>A default personal configuration directory
(<filename>$HOME/.recoll/</filename>) is created
when a &RCL; program is first executed. It is possible to
create other configuration directories, and use them by
setting the <envar>RECOLL_CONFDIR</envar> environment
variable, or giving the <option>-c</option> option to any of
the &RCL; commands.</para>
when a &RCL; program is first executed. This configuration is
the one used for indexing and querying when no specific
configuration is specified.</para>
<para>In some cases, it may be interesting to index different
areas of the file system to separate databases. You can do this
by using multiple configuration directories, each indexing a
file system area to a specific database. Typically, this
would be done to separate personal and shared
<para>All configuration parameters have defaults, defined in
system-wide files. Without further customisation, the default
configuration will process your complete home directory, with a
reasonable set of defaults. It can be changed to process a
different area of the file system, select files in different ways,
and many other things.</para>
<para>In some cases, it may be useful, for example, to index
different areas of the file system into separate indexes, or use
different options. You can do this by creating additional
configuration directories.</para>
<para>Examples of usage would be to separate personal and shared
indexes, or to take advantage of the organization of your data
to improve search precision.</para>
<para>The generated indexes can
be queried concurrently in a transparent manner.</para>
<para>A specific configuration can be selected by setting the
<envar>RECOLL_CONFDIR</envar> environment variable, or giving the
<option>-c</option> option to any of the &RCL; commands.</para>
<para>For index generation, multiple configurations are
totally independant from each other. When multiple indexes need
to be used for a single search,
<link linkend="RCL.INDEXING.CONFIG.MULTIPLE">some parameters
should be consistent among the configurations</link>.</para>
<para>When generating indexes, the different configurations are
entirely independent (no parameters are ever shared between
configurations when indexing).</para>
<para>Multiple indexes can be queried concurrently, either from the
GUI or the command line. When doing this, there is always a main
configuration, from which both configuration and index data are
used. Only the index data from the additional indexes is used
(their configuration parameters are ignored).</para>
<para>This is important and sometimes confusing, so it will be
rephrased here: for index generation, multiple configurations are
totally independent from each other. When querying, configuration
and data are used from the main index (the one designated by
<literal>-c</literal> or <envar>RECOLL_CONFDIR</envar>), and only
the data from the additional indexes is used. This also implies
that <link linkend="RCL.INDEXING.CONFIG.MULTIPLE">some parameters
should be consistent among the configurations</link> for indexes
which are to be used together.</para>
</sect2>
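The steps described in this section (create an additional configuration directory by hand, then select it with `-c` or `RECOLL_CONFDIR`) can be sketched as follows. This is an illustration under stated assumptions, not the manual's own procedure: the helper names are hypothetical, `topdirs` is assumed to be the recoll.conf parameter listing the directories to index, and the example paths are made up. The code only prepares the files and the command line; it does not run `recollindex` itself.

```python
import os


def create_config(confdir, topdirs):
    """Create confdir and write a minimal recoll.conf restricting the indexed area.

    Assumption: 'topdirs' is a space-separated list, so the paths given
    here must not contain spaces.
    """
    os.makedirs(confdir, exist_ok=True)
    conf_path = os.path.join(confdir, "recoll.conf")
    with open(conf_path, "w", encoding="utf-8") as f:
        f.write("topdirs = " + " ".join(topdirs) + "\n")
    return conf_path


def index_command(confdir):
    """Command line selecting the configuration with the -c option."""
    return ["recollindex", "-c", confdir]


def index_environment(confdir, base_env=None):
    """Environment selecting the same configuration through RECOLL_CONFDIR."""
    env = dict(os.environ if base_env is None else base_env)
    env["RECOLL_CONFDIR"] = confdir
    return env
```

Either `subprocess.run(index_command(confdir))` or `subprocess.run(["recollindex"], env=index_environment(confdir))` would then update only the index belonging to that configuration, in line with the statement that configurations are independent when indexing.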
@ -450,21 +480,11 @@
<para>By default, &RCL; will try to index any file type that
it has a way to read. This is sometimes not desirable, and
there are ways to either exclude some types, or on the
contrary to define a positive list of types to be
contrary define a positive list of types to be
indexed. In the latter case, any type not in the list will
be ignored.</para>
<note><title>Note about MIME types</title>
<para>When editing the <literal>indexedmimetypes</literal>
or <literal>excludedmimetypes</literal> lists, you should use the
MIME values listed in the <filename>mimemap</filename> file
or in Recoll result lists in preference to <literal>file -i</literal>
output: there are a number of differences. The
<literal>file -i</literal> output should only be used for files
without extensions, or for which the extension is not listed in
<filename>mimemap</filename></para></note>
<para>Excluding types can be done by adding wildcard name
<para>Excluding file types can be done by adding wildcard name
patterns to the
<link linkend="RCL.INSTALL.CONFIG.RECOLLCONF.SKIPPEDNAMES">
skippedNames</link> list, which
@ -491,15 +511,24 @@ indexedmimetypes = application/pdf
</para>
<para><literal>excludedmimetypes</literal> or
<literal>indexedmimetypes</literal>, can be set either by
editing the <link linkend="RCL.INSTALL.CONFIG.RECOLLCONF">
main configuration file
(<filename>recoll.conf</filename>)</link>, or from the GUI
index configuration tool.</para>
<literal>indexedmimetypes</literal>, can be set either by editing
the <link linkend="RCL.INSTALL.CONFIG.RECOLLCONF">configuration
file (<filename>recoll.conf</filename>)</link> for
the index, or by using the GUI index configuration tool.</para>
<note><title>Note about MIME types</title>
<para>When editing the <literal>indexedmimetypes</literal>
or <literal>excludedmimetypes</literal> lists, you should use the
MIME values listed in the <filename>mimemap</filename> file
or in Recoll result lists in preference to <literal>file -i</literal>
output: there are a number of differences. The
<literal>file -i</literal> output should only be used for files
without extensions, or for which the extension is not listed in
<filename>mimemap</filename>.</para></note>
</sect2>
<sect2>
<title>Indexing failures</title>
@ -531,6 +560,7 @@ indexedmimetypes = application/pdf
<sect2>
<title>Recovery</title>
<para>In the rare case where the index becomes corrupted (which can
signal itself by weird search results or crashes), the index files
need to be erased before restarting a clean indexing pass. Just delete
@ -538,7 +568,11 @@ indexedmimetypes = application/pdf
<link linkend="RCL.INDEXING.STORAGE">next section</link>), or,
alternatively, start the next <command>recollindex</command> with the
<option>-z</option> option, which will reset the database before
indexing.</para>
indexing. The difference between the two methods is that the
second will not change the current index format, which may be
undesirable if a newer format is supported by the &XAP;
version.</para>
</sect2>
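The two recovery methods described above differ only in who erases the index. A minimal sketch, assuming the index lives in a directory such as `~/.recoll/xapiandb` (the typical location given elsewhere in this manual) and using hypothetical helper names:

```python
import os
import shutil


def reset_by_deleting(index_dir):
    """Method 1: erase the index files, then run a normal indexing pass.

    The index is recreated from scratch, so it can use a newer format
    if the installed Xapian version supports one.
    """
    if os.path.isdir(index_dir):
        shutil.rmtree(index_dir)
    return ["recollindex"]


def reset_in_place():
    """Method 2: let recollindex reset the database with -z.

    Per the text above, this keeps the current index format.
    """
    return ["recollindex", "-z"]
```

Both functions return the command to run next rather than executing it, so the sketch can be tried without touching a real index. Passing the returned list to `subprocess.run` would start the indexing pass.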
</sect1>
@ -585,7 +619,6 @@ indexedmimetypes = application/pdf
desired another location for the index, typically out of disk
occupation concerns.</para>
</listitem>
</itemizedlist>
</para>
@ -600,7 +633,7 @@ indexedmimetypes = application/pdf
indexed).</para>
<para>Of course, images, sound and video do not increase the
index size, which means that nowadays (2012), typically, even a big
index size, which means that nowadays, typically, even a big
index will be negligible against the total amount of data on the
computer.</para>
@ -617,18 +650,15 @@ indexedmimetypes = application/pdf
used to create new indexes, and will also support the format from
the previous major version.</para>
<para>&XAP; will not convert automatically an existing index
from the older format to the newer one. If you want to upgrade to
the new format, or if a very old index needs to be converted
because its format is not supported any more, you will have to
explicitly delete the old index, then run a normal indexing
process.</para>
<para>&XAP; will not convert automatically an existing index from
the older format to the newer one. If you want to upgrade to the
new format, or if a very old index needs to be converted because
its format is not supported any more, you will have to explicitly
delete the old index (typically
<filename>~/.recoll/xapiandb</filename>), then run a normal
indexing command. Using option <option>-z</option> would not work
in this situation.</para>
<para>Using the <option>-z</option> option to
<command>recollindex</command> is not sufficient to change the
format, you will have to delete all files inside the index
directory (typically <filename>~/.recoll/xapiandb</filename>)
before starting the indexing.</para>
</sect2>
@ -697,16 +727,16 @@ indexedmimetypes = application/pdf
<para>As of Recoll 1.18 there are two incompatible types of Recoll
indexes, depending on the treatment of character case and
diacritics. The next section describes the two types in more
detail.</para>
diacritics. A <link linkend="RCL.INDEXING.CONFIG.SENS">further
section</link> describes the two types in more detail.</para>
<sect2 id="RCL.INDEXING.CONFIG.MULTIPLE">
<title>Multiple indexes</title>
<para>Multiple &RCL; indexes can be created by
using several configuration directories which are usually set to
index different areas of the file system. A specific index can
be selected for updating or searching, using the
<para>Multiple &RCL; indexes can be created by using several
configuration directories which are typically set to index
different areas of the file system. A specific index can be
selected for updating or searching, using the
<envar>RECOLL_CONFDIR</envar> environment variable or the
<option>-c</option> option to <command>recoll</command> and
<command>recollindex</command>.</para>
@ -717,7 +747,7 @@ indexedmimetypes = application/pdf
<envar>RECOLL_CONFDIR</envar> or the <option>-c</option> parameter,
and there is no way to switch configurations within the GUI.</para>
<para>Additional configuration directory (beyond
<para>Additional configuration directories (beyond
<filename>~/.recoll</filename>) must be created by hand
(<command>mkdir</command> or such), the GUI will not do it. This is
to avoid mistakenly creating additional directories when an
@ -735,16 +765,20 @@ indexedmimetypes = application/pdf
worth the trouble.</para>
<para>A <command>recollindex</command> program instance can only
update one specific index.</para>
update one specific index, and it will only use parameters from a
single configuration (no parameters are ever shared between
configurations when indexing).</para>
<para>The main index (defined by
<envar>RECOLL_CONFDIR</envar> or <option>-c</option>) is
always active. If this is undesirable, you can set up your
base configuration to index an empty directory.</para>
<para>Multiple indexes can be queried concurrently, either from the
GUI or the command line. When doing this, there is always a main
configuration, from which both configuration and index data are
used. Only the index data from the additional indexes is used
(their configuration parameters are ignored).</para>
<para>The different search interfaces (GUI, command line, ...)
have different methods to define the set of indexes to be
used, see the appropriate section.</para>
<para>When searching, the current main index (defined by
<envar>RECOLL_CONFDIR</envar> or <option>-c</option>) is always
active. If this is undesirable, you can set up your base
configuration to index an empty directory.</para>
<para>If a set of multiple indexes are to be used together for
searches, some configuration parameters must be consistent
@ -761,6 +795,11 @@ indexedmimetypes = application/pdf
<link linkend="RCL.INSTALL.CONFIG.RECOLLCONF.TERMS">linked
section</link>.</para>
<para>The different search interfaces (GUI, command line, ...)
have different methods to define the set of indexes to be
used, see the appropriate section.</para>
</sect2>
@ -2357,8 +2396,8 @@ MimeType=*/*
<para>See the <link linkend="RCL.INDEXING.CONFIG.MULTIPLE">section
describing the use of multiple indexes</link> for
generalities. Only the aspects concerning
the <command>recoll</command> GUI are described here.</para>
generalities. Only the aspects concerning the
<command>recoll</command> GUI are described here.</para>
<para>A <command>recoll</command> program instance is always
associated with a specific index, which is the one to be updated
@ -2367,13 +2406,12 @@ MimeType=*/*
indexes can be selected through the <guilabel>external
indexes</guilabel> tab in the preferences dialog.</para>
<para>Index selection is performed in two phases. A set of all
usable indexes must first be defined, and then the subset of
indexes to be used for searching. These parameters
are retained across program executions (there are kept
separately for each &RCL; configuration). The set of all indexes
is usually quite stable, while the active ones might typically
be adjusted quite frequently.</para>
<para>Index selection is performed in two phases. A set of all usable
indexes must first be defined, and then the subset of indexes to be
used for searching. These parameters are retained across program
executions (they are kept separately for each &RCL;
configuration). The set of all indexes is usually quite stable, while
the active ones might typically be adjusted quite frequently.</para>
<para>The main index (defined by
<envar>RECOLL_CONFDIR</envar>) is always active. If this is