Jean-Francois Dockes 2012-04-02 08:10:04 +02:00
parent 50f3883e35
commit a1f7c9fc1c

@@ -135,29 +135,46 @@
 different character sets, encodings, and languages into the same
 index. It has input filters for many document types.</para>
-<para>Stemming depends on the document language. &RCL; stores
-the unstemmed versions of terms and uses auxiliary databases for
-term expansion. It can switch stemming languages, or add a
-language, without re-indexing. Storing documents in different
-languages in the same index is possible, and useful in
-practice, but does introduce possibilities of confusion. &RCL;
-currently makes no attempt at automatic language recognition.</para>
+<para>Stemming is the process by which &RCL; reduces words to
+their radicals so that searching does not depend, for example,
+on a word being singular or plural (floor, floors), or on a verb
+tense (flooring, floored). Because the mechanisms used for
+stemming depend on the specific grammatical rules for each
+language, there is a separate stemmer module for most common
+languages where stemming makes sense. Storing documents written
+in different languages in the same index is possible, and
+commonly done. In this situation, you can specify several
+stemming languages for the index. &RCL; stores the unstemmed
+versions of terms in the main index and uses auxiliary databases
+for term expansion (one for each stemming language), which means
+that you can switch stemming languages between searches, or add
+a language without needing a full reindex. &RCL; currently
+makes no attempt at automatic language recognition, which means
+that the stemmer will sometimes be applied to terms from other
+languages with potentially strange results. In practise, even if
+this introduces possibilities of confusion, this approach has
+been proven quite useful, and, awaiting the addition of an
+automatic language recognition module to &RCL;, it is much less
+cumbersome than separating your documents according to what
+language they are written in.</para>
 <para>&RCL; has many parameters which define exactly what to
-index, and how to classify and decode the source documents. These
-are kept in <link linkend="rcl.indexing.config">configuration
-files</link>. A default configuration is copied into a standard
-location (usually something like
-<filename>/usr/[local/]share/recoll/examples</filename>) during
-installation. The default parameters from this file may be
-overridden by values that you set inside your personal
-configuration, found by default in the <filename>.recoll</filename>
-sub-directory of your home directory. The default configuration
-will index your home directory with default parameters and should
-be sufficient for giving &RCL; a try, but you may want to adjust it
-later, which can be done either by editing the text files or by
-using configuration menus in the <command>recoll</command>
-GUI</para>
+index, and how to classify and decode the source
+documents. These are kept in <link
+linkend="rcl.indexing.config">configuration files</link>. A
+default configuration is copied into a standard location
+(usually something like
+<filename>/usr/[local/]share/recoll/examples</filename>)
+during installation. The default values set by the
+configuration files in this directory may be overridden by
+values that you set inside your personal configuration, found
+by default in the <filename>.recoll</filename> sub-directory
+of your home directory. The default configuration will index
+your home directory with default parameters and should be
+sufficient for giving &RCL; a try, but you may want to adjust
+it later, which can be done either by editing the text files
+or by using configuration menus in the
+<command>recoll</command> GUI</para>
 <para><link linkend="rcl.indexing.periodic.exec">Indexing</link>
 is started automatically the first time you execute the