Jean-Francois Dockes 2012-04-02 08:10:04 +02:00
parent 50f3883e35
commit a1f7c9fc1c

@@ -135,29 +135,46 @@
different character sets, encodings, and languages into the same
index. It has input filters for many document types.</para>
<para>Stemming depends on the document language. &RCL; stores
the unstemmed versions of terms and uses auxiliary databases for
term expansion. It can switch stemming languages, or add a
language, without re-indexing. Storing documents in different
languages in the same index is possible, and useful in
practice, but does introduce possibilities of confusion. &RCL;
currently makes no attempt at automatic language recognition.</para>
<para>Stemming is the process by which &RCL; reduces words to
their radicals so that searching does not depend, for example,
on a word being singular or plural (floor, floors), or on a verb
tense (flooring, floored). Because the mechanisms used for
stemming depend on the specific grammatical rules for each
language, there is a separate stemmer module for most common
languages where stemming makes sense. Storing documents written
in different languages in the same index is possible, and
commonly done. In this situation, you can specify several
stemming languages for the index. &RCL; stores the unstemmed
versions of terms in the main index and uses auxiliary databases
for term expansion (one for each stemming language), which means
that you can switch stemming languages between searches, or add
a language without needing a full reindex. &RCL; currently
makes no attempt at automatic language recognition, which means
that the stemmer will sometimes be applied to terms from other
languages with potentially strange results. In practice, even if
this introduces possibilities of confusion, this approach has
proven quite useful, and, awaiting the addition of an
automatic language recognition module to &RCL;, it is much less
cumbersome than separating your documents according to what
language they are written in.</para>
<para>&RCL; has many parameters which define exactly what to
index, and how to classify and decode the source documents. These
are kept in <link linkend="rcl.indexing.config">configuration
files</link>. A default configuration is copied into a standard
location (usually something like
<filename>/usr/[local/]share/recoll/examples</filename>) during
installation. The default parameters from this file may be
overridden by values that you set inside your personal
configuration, found by default in the <filename>.recoll</filename>
sub-directory of your home directory. The default configuration
will index your home directory with default parameters and should
be sufficient for giving &RCL; a try, but you may want to adjust it
later, which can be done either by editing the text files or by
using configuration menus in the <command>recoll</command>
GUI.</para>
index, and how to classify and decode the source
documents. These are kept in <link
linkend="rcl.indexing.config">configuration files</link>. A
default configuration is copied into a standard location
(usually something like
<filename>/usr/[local/]share/recoll/examples</filename>)
during installation. The default values set by the
configuration files in this directory may be overridden by
values that you set inside your personal configuration, found
by default in the <filename>.recoll</filename> sub-directory
of your home directory. The default configuration will index
your home directory with default parameters and should be
sufficient for giving &RCL; a try, but you may want to adjust
it later, which can be done either by editing the text files
or by using configuration menus in the
<command>recoll</command> GUI.</para>
<para><link linkend="rcl.indexing.periodic.exec">Indexing</link>
is started automatically the first time you execute the