doc
parent
d41561638a
commit
e41216aa9d
@ -20,7 +20,7 @@
|
|||||||
</author>
|
</author>
|
||||||
|
|
||||||
<copyright>
|
<copyright>
|
||||||
<year>2005</year>
|
<year>2005-2011</year>
|
||||||
<holder role="mailto:jfd@recoll.org">Jean-Francois
|
<holder role="mailto:jfd@recoll.org">Jean-Francois
|
||||||
Dockes</holder>
|
Dockes</holder>
|
||||||
</copyright>
|
</copyright>
|
||||||
@ -197,18 +197,18 @@
|
|||||||
<listitem>
|
<listitem>
|
||||||
<formalpara><title>Periodic indexing:</title>
|
<formalpara><title>Periodic indexing:</title>
|
||||||
<para>indexing takes place at discrete
|
<para>indexing takes place at discrete
|
||||||
times, by executing the <command>recollindex</command>
|
times, by executing the <command>recollindex</command>
|
||||||
command. The typical usage is to have a nightly indexing run
|
command. The typical usage is to have a nightly indexing run
|
||||||
<link linkend="rcl.indexing.periodic.automat">programmed</link> into your
|
<link linkend="rcl.indexing.periodic.automat">programmed</link>
|
||||||
<command>cron</command> file.</para>
|
into your <command>cron</command> file.</para>
|
||||||
</formalpara>
|
</formalpara>
|
||||||
</listitem>
|
</listitem>
|
||||||
|
|
||||||
<listitem>
|
<listitem>
|
||||||
<formalpara><title>Real time indexing:</title>
|
<formalpara><title>Real time indexing:</title>
|
||||||
<para>indexing takes place as soon as a file is created or
|
<para>indexing takes place as soon as a file is created or
|
||||||
changed. <command>recollindex</command> runs as a daemon
|
changed. <command>recollindex</command> runs as a daemon
|
||||||
and uses a file system alteration monitor such as
|
and uses a file system alteration monitor such as
|
||||||
<application>inotify</application>,
|
<application>inotify</application>,
|
||||||
<application>Fam</application> or
|
<application>Fam</application> or
|
||||||
<application>Gamin</application>
|
<application>Gamin</application>
|
||||||
@ -218,17 +218,16 @@
|
|||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
|
|
||||||
<para>The choice between the two methods is mostly a matter of
|
<para>The choice between the two methods is mostly a matter of
|
||||||
preference, and they can be combined by setting up multiple
|
preference, and they can be combined by setting up multiple
|
||||||
indexes (e.g. use periodic indexing on a big documentation
|
indexes (e.g. use periodic indexing on a big documentation
|
||||||
directory, and real time indexing on a small home
|
directory, and real time indexing on a small home
|
||||||
directory). Monitoring a big file system tree can consume
|
directory). Monitoring a big file system tree can consume
|
||||||
significant system resources.</para>
|
significant system resources.</para>
|
||||||
|
|
||||||
<para>&RCL; knows about quite a few different document
|
<para>&RCL; knows about quite a few different document
|
||||||
types. The parameters for document types recognition and
|
types. The parameters for document types recognition and
|
||||||
processing are set in
|
processing are set in
|
||||||
<link linkend="rcl.indexing.config">configuration files</link>.
|
<link linkend="rcl.indexing.config">configuration files</link>.</para>
|
||||||
</para>
|
|
||||||
|
|
||||||
<para>Most file types, like HTML or word processing files, only hold
|
<para>Most file types, like HTML or word processing files, only hold
|
||||||
one document. Some file types, like mail folder files or zip
|
one document. Some file types, like mail folder files or zip
|
||||||
@ -236,25 +235,24 @@
|
|||||||
in turn be themselves compound ones. Such hierarchies can go quite
|
in turn be themselves compound ones. Such hierarchies can go quite
|
||||||
deep, and &RCL; has no problem processing, for example, an ms-word
|
deep, and &RCL; has no problem processing, for example, an ms-word
|
||||||
document which would be an attachment to an email message part of
|
document which would be an attachment to an email message part of
|
||||||
a folder file archived inside a zip file...
|
a folder file archived inside a zip file...</para>
|
||||||
</para>
|
|
||||||
|
|
||||||
<para>&RCL; indexing processes plain text, HTML, openoffice
|
<para>&RCL; indexing processes plain text, HTML, openoffice
|
||||||
and e-mail files internally (a few more actually).</para>
|
and e-mail files internally (a few more actually).</para>
|
||||||
|
|
||||||
<para>Other file types (e.g. postscript, pdf, ms-word, rtf ...)
|
<para>Other file types (e.g. postscript, pdf, ms-word, rtf ...)
|
||||||
need external applications for preprocessing. The list is in the
|
need external applications for preprocessing. The list is in the
|
||||||
<link linkend="rcl.install.external"> installation</link>
|
<link linkend="rcl.install.external"> installation</link>
|
||||||
section. After every indexing operation, &RCL; updates a list of
|
section. After every indexing operation, &RCL; updates a list of
|
||||||
commands that would be needed for indexing existing files
|
commands that would be needed for indexing existing files
|
||||||
types. This list can be displayed from the
|
types. This list can be displayed from the
|
||||||
<command>recoll</command> <guilabel>File</guilabel> menu. It is
|
<command>recoll</command> <guilabel>File</guilabel> menu. It is
|
||||||
stored in the <filename>missing</filename> text file
|
stored in the <filename>missing</filename> text file
|
||||||
inside the configuration directory.</para>
|
inside the configuration directory.</para>
|
||||||
|
|
||||||
<para>Without further configuration, &RCL; will index all
|
<para>Without further configuration, &RCL; will index all
|
||||||
appropriate files from your home directory, with a reasonable
|
appropriate files from your home directory, with a reasonable
|
||||||
set of defaults.</para>
|
set of defaults.</para>
|
||||||
|
|
||||||
<para>In some cases, it may be interesting to index different
|
<para>In some cases, it may be interesting to index different
|
||||||
areas of the file system to separate databases. You can do this
|
areas of the file system to separate databases. You can do this
|
||||||
@ -323,19 +321,19 @@ recoll
|
|||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
|
|
||||||
<para>The size of the index is determined by the document set size,
|
<para>The size of the index is determined by the document set size,
|
||||||
but the ratio can vary a lot. For a typical mixed
|
but the ratio can vary a lot. For a typical mixed
|
||||||
set of documents, the index size will often be close to
|
set of documents, the index size will often be close to
|
||||||
the data set size. In specific cases (a set of compressed
|
the data set size. In specific cases (a set of compressed
|
||||||
mbox files for example), the index can become much bigger than
|
mbox files for example), the index can become much bigger than
|
||||||
the documents. It may also be much smaller if the documents
|
the documents. It may also be much smaller if the documents
|
||||||
contain a lot of images or other non-indexed data (an extreme
|
contain a lot of images or other non-indexed data (an extreme
|
||||||
example being a set of mp3 files where only the tags would be
|
example being a set of mp3 files where only the tags would be
|
||||||
indexed).</para>
|
indexed).</para>
|
||||||
|
|
||||||
<para>Of course, images, sound and video do not increase the
|
<para>Of course, images, sound and video do not increase the
|
||||||
index size, which means that it will be quite typical nowadays
|
index size, which means that it will be quite typical nowadays
|
||||||
(2006), that even a big index will be negligible against the
|
(2006), that even a big index will be negligible against the
|
||||||
total amount of data on the computer.</para>
|
total amount of data on the computer.</para>
|
||||||
|
|
||||||
<para>The index data directory (<filename>xapiandb</filename>)
|
<para>The index data directory (<filename>xapiandb</filename>)
|
||||||
only contains data that can be completely rebuilt by an index
|
only contains data that can be completely rebuilt by an index
|
||||||
@ -385,20 +383,20 @@ recoll
|
|||||||
<title>Security aspects</title>
|
<title>Security aspects</title>
|
||||||
|
|
||||||
<para>The &RCL; index does not hold copies of the indexed
|
<para>The &RCL; index does not hold copies of the indexed
|
||||||
documents. But it does hold enough data to allow for an almost
|
documents. But it does hold enough data to allow for an almost
|
||||||
complete reconstruction. If confidential data is indexed,
|
complete reconstruction. If confidential data is indexed,
|
||||||
access to the database directory should be restricted. </para>
|
access to the database directory should be restricted. </para>
|
||||||
|
|
||||||
<para>As of version 1.4, &RCL; will create the configuration
|
<para>As of version 1.4, &RCL; will create the configuration
|
||||||
directory with a mode of 0700 (access by owner only). As the
|
directory with a mode of 0700 (access by owner only). As the
|
||||||
index data directory is by default a sub-directory of the
|
index data directory is by default a sub-directory of the
|
||||||
configuration directory, this should result in appropriate
|
configuration directory, this should result in appropriate
|
||||||
protection.</para>
|
protection.</para>
|
||||||
|
|
||||||
<para>If you use another setup, you should think of the kind
|
<para>If you use another setup, you should think of the kind
|
||||||
of protection you need for your index, set the directory
|
of protection you need for your index, set the directory
|
||||||
and files access modes appropriately, and also maybe adjust
|
and files access modes appropriately, and also maybe adjust
|
||||||
the <literal>umask</literal> used during index updates.</para>
|
the <literal>umask</literal> used during index updates.</para>
|
||||||
|
|
||||||
|
|
||||||
</sect2>
|
</sect2>
|
||||||
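The access-mode advice in the "Security aspects" paragraphs above can be checked with a short script. This is only a sketch: the helper names `is_owner_only` and `restrict_to_owner` are hypothetical, and the path you pass in (for example the `~/.recoll` directory mentioned later in this manual) depends on your own setup.

```python
import os
import stat


def is_owner_only(path):
    """Return True if neither group nor others have any access to `path`."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # 0o077 masks the group and other permission bits.
    return mode & 0o077 == 0


def restrict_to_owner(path):
    """Set `path` to mode 0700, the mode the manual says is used by default."""
    os.chmod(path, 0o700)
```

A typical use would be to call `is_owner_only()` on the directory that holds the index data and, if it returns `False`, decide whether `restrict_to_owner()` is appropriate for that location.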
@ -409,38 +407,38 @@ recoll
|
|||||||
<title>Indexing configuration</title>
|
<title>Indexing configuration</title>
|
||||||
|
|
||||||
<para>Variables set inside the
|
<para>Variables set inside the
|
||||||
<link linkend="rcl.install.config">&RCL; configuration files</link>
|
<link linkend="rcl.install.config">&RCL; configuration files</link>
|
||||||
control which areas of the file system are indexed, and how
|
control which areas of the file system are indexed, and how
|
||||||
files are processed. These variables can be set either by
|
files are processed. These variables can be set either by
|
||||||
editing the text files or using the dialogs in the
|
editing the text files or using the dialogs in the
|
||||||
<command>recoll</command> GUI.</para>
|
<command>recoll</command> GUI.</para>
|
||||||
|
|
||||||
<para>You can also use <link linkend="rcl.search.multidb">multiple
|
<para>You can also use <link linkend="rcl.search.multidb">multiple
|
||||||
indexes</link> defined by separate configurations, typically to
|
indexes</link> defined by separate configurations, typically to
|
||||||
separate personal and shared indexes, or to take advantage of
|
separate personal and shared indexes, or to take advantage of
|
||||||
the organization of your data to improve search precision.</para>
|
the organization of your data to improve search precision.</para>
|
||||||
|
|
||||||
<para>The first time you start <command>recoll</command>, you
|
<para>The first time you start <command>recoll</command>, you
|
||||||
will be asked whether or not you would like it to build the
|
will be asked whether or not you would like it to build the
|
||||||
index. If you want to adjust the configuration before indexing,
|
index. If you want to adjust the configuration before indexing,
|
||||||
just click <guilabel>Cancel</guilabel> at this point, which will get
|
just click <guilabel>Cancel</guilabel> at this point, which will get
|
||||||
you into the configuration interface. If you exit,
|
you into the configuration interface. If you exit,
|
||||||
<filename>recoll</filename> will have created a ~/.recoll directory
|
<filename>recoll</filename> will have created a ~/.recoll directory
|
||||||
containing empty configuration files, which you can edit by hand.</para>
|
containing empty configuration files, which you can edit by hand.</para>
|
||||||
|
|
||||||
<para>The configuration is documented inside the <link
|
<para>The configuration is documented inside the
|
||||||
linkend="rcl.install.config">installation chapter</link> of this
|
<link linkend="rcl.install.config">installation chapter</link>
|
||||||
document, or in the recoll.conf(5) man page, but the most
|
of this document, or in the recoll.conf(5) man page, but the most
|
||||||
current information will most likely be the comments inside the
|
current information will most likely be the comments inside the
|
||||||
sample file. The most immediately useful variable you may
|
sample file. The most immediately useful variable you may
|
||||||
be interested in is probably <link
|
be interested in is probably
|
||||||
linkend="rcl.install.config.recollconf.topdirs">topdirs</link>,
|
<link linkend="rcl.install.config.recollconf.topdirs">topdirs</link>,
|
||||||
which determines what subtrees get indexed.</para>
|
which determines what subtrees get indexed.</para>
|
||||||
|
|
||||||
<para>The applications needed to index file types other than
|
<para>The applications needed to index file types other than
|
||||||
text, HTML or email (e.g. pdf, postscript, ms-word...) are
|
text, HTML or email (e.g. pdf, postscript, ms-word...) are
|
||||||
described in the <link linkend="rcl.install.external">external
|
described in the <link linkend="rcl.install.external">external
|
||||||
packages section</link>.</para>
|
packages section</link>.</para>
|
||||||
|
|
||||||
<sect2 id="rcl.indexing.config.gui">
|
<sect2 id="rcl.indexing.config.gui">
|
||||||
<title>The indexing configuration GUI</title>
|
<title>The indexing configuration GUI</title>
|
||||||
@ -510,7 +508,7 @@ recoll
|
|||||||
<title>Periodic indexing</title>
|
<title>Periodic indexing</title>
|
||||||
|
|
||||||
<sect2 id="rcl.indexing.periodic.exec">
|
<sect2 id="rcl.indexing.periodic.exec">
|
||||||
<title>Starting indexing</title>
|
<title>Running indexing</title>
|
||||||
|
|
||||||
<para>Indexing is performed either by the
|
<para>Indexing is performed either by the
|
||||||
<command>recollindex</command> program, or by the
|
<command>recollindex</command> program, or by the
|
||||||
@ -525,22 +523,22 @@ recoll
|
|||||||
<command>recollindex</command> command:
|
<command>recollindex</command> command:
|
||||||
<itemizedlist>
|
<itemizedlist>
|
||||||
<listitem><para>Starting the indexing thread is more convenient,
|
<listitem><para>Starting the indexing thread is more convenient,
|
||||||
being just one click away.</para>
|
being just one click away.</para>
|
||||||
</listitem>
|
</listitem>
|
||||||
<listitem><para>The <command>recollindex</command> command has
|
<listitem><para>The <command>recollindex</command> command has
|
||||||
more options, especially the one to reset the index
|
more options, especially the one to reset the index
|
||||||
(<literal>-z</literal>).</para>
|
(<literal>-z</literal>).</para>
|
||||||
</listitem>
|
</listitem>
|
||||||
<listitem><para>The <command>recollindex</command> command will
|
<listitem><para>The <command>recollindex</command> command will
|
||||||
not take down your GUI if it crashes (a rare occurrence, but who
|
not take down your GUI if it crashes (a rare occurrence,
|
||||||
knows...)</para>
|
but who knows...)</para>
|
||||||
</listitem>
|
</listitem>
|
||||||
<listitem><para>The <command>recollindex</command> command uses
|
<listitem><para>The <command>recollindex</command> command uses
|
||||||
<command>setpriority/nice</command> to lower its priority while
|
<command>setpriority/nice</command> to lower its priority while
|
||||||
indexing
|
indexing
|
||||||
(it will also use <command>ionice</command> when this becomes
|
(it will also use <command>ionice</command> when this becomes
|
||||||
more widely available), the thread can't do it, else it would
|
more widely available), the thread can't do it, else it would
|
||||||
also slow down the user/search interface.</para>
|
also slow down the user/search interface.</para>
|
||||||
</listitem>
|
</listitem>
|
||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
I'll let the reader decide where my heart belongs...</para>
|
I'll let the reader decide where my heart belongs...</para>
|
||||||
@ -567,7 +565,24 @@ recoll
|
|||||||
up to date will not need to be reindexed).</para>
|
up to date will not need to be reindexed).</para>
|
||||||
|
|
||||||
<para><command>recollindex</command> has a number of other options
|
<para><command>recollindex</command> has a number of other options
|
||||||
which are described in its man page.</para>
|
which are described in its man page.</para>
|
||||||
|
|
||||||
|
<para>Of special interest, perhaps, are the <literal>-i</literal> and
|
||||||
|
<literal>-f</literal> options. <literal>-i</literal> allows
|
||||||
|
indexing an explicit list of files (given as command line
|
||||||
|
parameters or read on stdin). <literal>-f</literal> tells
|
||||||
|
<command>recollindex</command> to ignore file selection
|
||||||
|
parameters from the configuration. Together, these options allow
|
||||||
|
building a custom file selection process for some area of the
|
||||||
|
file system, by adding the top directory to the
|
||||||
|
<literal>skippedPaths</literal> list and using an appropriate
|
||||||
|
file selection method to build the file list to be fed to
|
||||||
|
<literal>recollindex -if</literal>.</para>
|
||||||
|
|
||||||
|
<para><literal>recollindex -i</literal> will not descend into
|
||||||
|
directory parameters, but just add them as index entries. It is
|
||||||
|
up to the external file selection method to build the complete
|
||||||
|
file list.</para>
|
||||||
</sect2>
|
</sect2>
|
||||||
|
|
||||||
<sect2 id="rcl.indexing.periodic.automat">
|
<sect2 id="rcl.indexing.periodic.automat">
|
||||||
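The paragraphs added in the last hunk describe building a custom file list and feeding it to `recollindex -if`. A minimal sketch of that idea follows. It assumes that `recollindex` is on the `PATH` and that it accepts one path per line on stdin. The suffix filter and the function names `select_files` and `feed_recollindex` are hypothetical placeholders for your own selection logic.

```python
import os
import subprocess


def select_files(top, wanted_suffixes=(".txt", ".html")):
    """Walk `top` and return, sorted, the files chosen by a custom rule.

    The suffix test is only a stand-in for a real selection method.
    Directories themselves are never returned, because `recollindex -i`
    does not descend into directory parameters.
    """
    selected = []
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            if name.lower().endswith(tuple(wanted_suffixes)):
                selected.append(os.path.join(dirpath, name))
    return sorted(selected)


def feed_recollindex(paths):
    """Pipe the list to `recollindex -if` on stdin (assumed: one path per line)."""
    data = "".join(path + "\n" for path in paths)
    result = subprocess.run(["recollindex", "-if"], input=data, text=True)
    return result.returncode
```

With the top directory listed in `skippedPaths`, a run could then be as simple as `feed_recollindex(select_files("/some/tree"))`, where `/some/tree` is a placeholder for the area you want to handle yourself.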
|
|||||||