use indexing instead of indexation

dockes 2006-04-07 13:07:34 +00:00
parent c23b7e452b
commit 678e661190

@ -24,7 +24,7 @@
Dockes</holder>
</copyright>
<releaseinfo>$Id: usermanual.sgml,v 1.10 2006-04-05 13:30:00 dockes Exp $</releaseinfo>
<releaseinfo>$Id: usermanual.sgml,v 1.11 2006-04-07 13:07:34 dockes Exp $</releaseinfo>
<abstract>
<para>This document introduces full text search notions
@ -108,11 +108,11 @@
mature package using <ulink
url="http://www.xapian.org/docs/intro_ir.html">a sophisticated
probabilistic ranking model</ulink>. &RCL; provides the interface
to get data into (indexation) and out (searching) of the system.</para>
to get data into (indexing) and out (searching) of the system.</para>
<para>In practice, &XAP; works by remembering where terms appear
in your document files. The acquisition process is called
indexation. </para>
indexing. </para>
<para>The resulting database can be big (roughly the size of the
original document set), but it is not a document
@ -151,7 +151,7 @@
giving &RCL; a try, but you may want to adjust it
later.</para>
<para><link linkend="rcl.indexing.exec">Indexation</link> is started
<para><link linkend="rcl.indexing.exec">Indexing</link> is started
automatically the first time you execute the
<command>recoll</command> search graphical user interface, or by
executing the <command>recollindex</command> command.</para>
@ -166,22 +166,22 @@
<chapter id="rcl.indexing">
<title>Indexation</title>
<title>Indexing</title>
<sect1 id="rcl.indexing.introduction">
<title>Introduction</title>
<para>Indexation is the process by which the set of documents is
analyzed and the data entered into the database. &RCL; indexation
<para>Indexing is the process by which the set of documents is
analyzed and the data entered into the database. &RCL; indexing
is normally incremental: documents will only be processed if
they have been modified. On the first execution, of course, all
documents will need processing. A full index build can be forced
later on by specifying an option to the indexation command
later on by specifying an option to the indexing command
(<command>recollindex -z</command>).</para>
<para>&RCL; indexation takes place at discrete times. There is
<para>&RCL; indexing takes place at discrete times. There is
currently no interface to real time file modification
monitors. The typical usage is to have a nightly indexation run
monitors. The typical usage is to have a nightly indexing run
<link linkend="rcl.indexing.automat">programmed</link> into your
<command>cron</command> file.</para>
@ -205,7 +205,7 @@
many individually indexed documents.
</para>
<para>&RCL; indexation processes plain text, HTML, openoffice
<para>&RCL; indexing processes plain text, HTML, openoffice
and e-mail files internally. Other types (e.g. postscript, pdf,
ms-word, rtf) need external applications for preprocessing. The
list is in the <link
@ -219,7 +219,7 @@
</sect1>
<sect1 id="rcl.indexing.config">
<title>The indexation configuration</title>
<title>The indexing configuration</title>
<para>Values set in the system-wide configuration file (named
like
@ -231,9 +231,9 @@
<para>The most accurate documentation for editing the file is
given by comments inside the central one. If you want to adjust
the configuration before indexation, just click
the configuration before indexing, just click
<guilabel>Cancel</guilabel> when the program asks if it should
start initial indexation. This will have created a
start initial indexing. This will have created a
<filename>.recoll</filename> directory containing empty
configuration files.</para>
@ -244,34 +244,34 @@
</sect1>
<sect1 id="rcl.indexing.exec">
<title>Starting indexation</title>
<title>Starting indexing</title>
<para>Indexation is performed either by the
<para>Indexing is performed either by the
<command>recollindex</command> program, or by the
indexation thread inside the <command>recoll</command>
indexing thread inside the <command>recoll</command>
program (use the <guimenu>File</guimenu> menu).</para>
<para>If the <command>recoll</command> program finds no database
when it starts, it will automatically start indexation (except
when it starts, it will automatically start indexing (except
if cancelled).</para>
<para>It is best to avoid interrupting the indexation process, as
<para>It is best to avoid interrupting the indexing process, as
this may sometimes leave the database in a bad state. This is
not a serious problem, as you then just need to clear
everything and restart the indexation: the database files are
everything and restart the indexing: the database files are
normally stored in the <filename>$HOME/.recoll/xapiandb</filename>
directory,
which you can just delete if needed. Alternatively, you can
start <command>recollindex -z</command>, which will
reset the database before indexation.</para>
reset the database before indexing.</para>
</sect1>
<sect1 id="rcl.indexing.automat">
<title>Using <command>cron</command> to automate
indexation</title>
indexing</title>
<para>The most common way to set up indexation is to have a cron
<para>The most common way to set up indexing is to have a cron
task execute it every night. For example the following
<filename>crontab</filename> entry would do it every day at
3:30AM (supposing <command>recollindex</command> is in your PATH):</para>
@ -443,7 +443,7 @@
<formalpara><title>File names</title>
<para>All file name elements (the broken up file path) are
entered as terms during indexation, and you can specify them
entered as terms during indexing, and you can specify them
as ordinary terms in normal search fields. Alternatively, you
can use specific file name search which will
<emphasis>only</emphasis> look for file names and can use
@ -510,7 +510,7 @@
file</link>), or later added with
<command>recollindex -s</command> (See the recollindex
manual). Stemming languages which are dynamically added will be
deleted at the next indexation pass unless they are also added in
deleted at the next indexing pass unless they are also added in
the configuration file.</para>
</listitem>
@ -745,7 +745,7 @@
will be created with a set of empty configuration files.
<command>recoll</command> will give you a
chance to edit the configuration file before starting
indexation. <command>recollindex</command> will
indexing. <command>recollindex</command> will
proceed immediately.</para>
<para>Most of the parameters specific to the
@ -787,7 +787,7 @@
</itemizedlist>
<para>Section lines allow redefining some parameters for a
directory subtree. Some of the parameters used for indexation
directory subtree. Some of the parameters used for indexing
are looked up hierarchically from the more to the less
specific. Not all parameters can be meaningfully redefined,
this is specified for each in the next section. </para>
@ -813,7 +813,7 @@
<command>recoll</command> to copy the sample
configuration, click <guimenu>Cancel</guimenu>, and edit
the configuration file before restarting the command. This
will start the initial indexation, which may take some time.</para>
will start the initial indexing, which may take some time.</para>
<para>Parameters:</para>
@ -824,7 +824,7 @@
index (recursively for directories). The indexer will not
follow symbolic links inside the indexed trees. If an entry in
the <literal>topdirs</literal> list is a symbolic link,
indexation will not start and will generate an error.</para>
indexing will not start and will generate an error.</para>
</listitem>
</varlistentry>
@ -885,7 +885,7 @@
possible values. You can add a stem expansion database for
a different language by using <command>recollindex
-s</command>, but it will be deleted during the next
indexation. Only languages listed in the configuration
indexing. Only languages listed in the configuration
file are permanent.</para>
</listitem>
</varlistentry>
@ -927,7 +927,7 @@
type for a file (the main procedure uses suffix
associations as defined in the <filename>mimemap</filename>
file). This can be useful for files with suffixless names,
but it will also cause the indexation of many bogus "text"
but it will also cause the indexing of many bogus "text"
files.</para>
</listitem>
</varlistentry>
@ -937,7 +937,7 @@
section of the database to allow specific file names
searches using wild cards. This parameter decides if
file name indexing is performed only for files with mime
types that would qualify them for full text indexation, or
types that would qualify them for full text indexing, or
for all files inside the selected subtrees, independent of
mime type.</para>
</listitem>
@ -985,10 +985,10 @@
<title>The mimeconf file</title>
<para><filename>mimeconf</filename> specifies how the
different mime types are handled for indexation, and for
different mime types are handled for indexing, and for
display.</para>
<para>Changing the indexation parameters is probably not a
<para>Changing the indexing parameters is probably not a
good idea unless you are a &RCL; developer.</para>
<para>You may want to adjust the external viewers defined in