<!DOCTYPE BOOK PUBLIC "-//FreeBSD//DTD DocBook V4.1-Based Extension//EN" [
|
|
<!ENTITY RCL "<application>Recoll</application>">
|
|
<!ENTITY XAP "<application>Xapian</application>">
|
|
|
|
]>
|
|
|
|
<book lang="en">
|
|
|
|
<bookinfo>
|
|
<title>Recoll user manual</title>
|
|
|
|
|
|
<author>
|
|
<firstname>Jean-Francois</firstname>
|
|
<surname>Dockes</surname>
|
|
<affiliation>
|
|
<address><email>jean-francois.dockes@wanadoo.fr</email></address>
|
|
</affiliation>
|
|
</author>
|
|
|
|
<copyright>
|
|
<year>2005</year>
|
|
<holder role="mailto:jean-francois.dockes@wanadoo.fr">Jean-Francois
|
|
Dockes</holder>
|
|
</copyright>
|
|
|
|
<releaseinfo>$Id: usermanual.sgml,v 1.22 2006-10-12 08:39:55 dockes Exp $</releaseinfo>
|
|
|
|
<abstract>
|
|
<para>This document introduces full text search notions
|
|
and describes the installation and use of the &RCL; application.</para>
|
|
</abstract>
|
|
|
|
|
|
</bookinfo>
|
|
|
|
<chapter id="rcl.introduction">
|
|
<title>Introduction</title>
|
|
|
|
<sect1 id="rcl.introduction.tryit">
|
|
<title>Giving it a try</title>
|
|
|
|
<para>If you do not like reading manuals (who does?) and would
|
|
like to give &RCL; a try, just perform <link
|
|
linkend="rcl.install.binary">installation</link> and start the
|
|
<command>recoll</command> user interface, which will index your
|
|
home directory by default, allowing you to search immediately after
|
|
indexing completes.</para>
|
|
|
|
<para>Do not do this if your home directory contains a huge
|
|
number of documents and you do not want to wait or are very
|
|
short on disk space. In this case, you may want to edit the <link
|
|
linkend="rcl.indexing.config">configuration file</link> first to
|
|
restrict the indexed area.</para>
|
|
|
|
<para>Also be aware that you may need to install the
|
|
appropriate <link linkend="rcl.install.external">
|
|
supporting applications</link> for document types that need
|
|
them (for example <application>antiword</application> for
|
|
MS Word files).</para>
</sect1>
|
|
|
|
<sect1 id="rcl.introduction.search">
|
|
<title>Full text search</title>
|
|
|
|
<para>&RCL; is a full text search application. Full text search
|
|
applications let you find your data by content rather
|
|
than by external attributes (like a file name). More
|
|
specifically, they will let you specify words (terms) that
|
|
should or should not appear in the text you are looking for,
|
|
and return a list of matching documents, ordered so that the
|
|
most <emphasis>relevant</emphasis> documents will appear
|
|
first.</para>
|
|
|
|
<para>You do not need to remember in what file or email message you
|
|
stored a given piece of information. You just ask for related
|
|
terms, and the tool will return a list of documents where
|
|
those terms are prominent, in a similar way to Internet search
|
|
engines.</para>
|
|
|
|
<para>&RCL; tries to determine which documents are most relevant to
|
|
the search terms you provide. Computer algorithms for determining
|
|
relevance can be very complex, and in general are inferior to the
|
|
power of the human mind to rapidly determine relevance. The quality
|
|
of relevance guessing by the search tool is probably the most
|
|
important element for a search application.</para>
|
|
|
|
<para>In many cases, you are looking for all the forms of a
|
|
word, not for a specific form or spelling. These different
|
|
forms may include plurals, different tenses for a verb, or
|
|
terms derived from the same root or <emphasis>stem</emphasis>
|
|
(example: floor, floors, floored, flooring...). &RCL; will by
|
|
default expand queries to all such related terms (words that
|
|
reduce to the same stem). This expansion can be disabled at
|
|
search time.</para>
|
|
|
|
<para>Stemming, by itself, does not accommodate misspellings or
|
|
phonetic searches. &RCL; currently does not support these
|
|
features.</para>
|
|
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="rcl.introduction.recoll">
|
|
<title>Recoll overview</title>
|
|
|
|
<para>&RCL; uses the
|
|
<ulink url="http://www.xapian.org">&XAP;</ulink> information retrieval
|
|
library as its storage and retrieval engine. &XAP; is a very
|
|
mature package using <ulink
|
|
url="http://www.xapian.org/docs/intro_ir.html">a sophisticated
|
|
probabilistic ranking model</ulink>. &RCL; provides the interface
|
|
to get data into (indexing) and out (searching) of the system.</para>
|
|
|
|
<para>In practice, &XAP; works by remembering where terms appear
|
|
in your document files. The acquisition process is called
|
|
indexing. </para>
|
|
|
|
<para>The resulting index can be big (roughly the size of the
|
|
original document set), but it is not a document
|
|
archive. &RCL; can only display documents that still exist at
|
|
the place from which they were indexed. (Actually, there is a
|
|
way to reconstruct a document from the information in the
|
|
index, but the result is not nice, as all formatting,
|
|
punctuation and capitalization are lost).</para>
|
|
|
|
<para>&RCL; stores all internal data in <application>Unicode
|
|
UTF-8</application> format, and it can index files with
|
|
different character sets, encodings, and languages into the same
|
|
index. It has input filters for many document types.</para>
|
|
|
|
<para>Stemming depends on the document language. &RCL; stores
|
|
the unstemmed versions of terms and uses auxiliary databases for
|
|
term expansion. It can switch stemming languages, or add a
|
|
language, without re-indexing. Storing documents in different
|
|
languages in the same index is possible, and useful in
|
|
practice, but does introduce possibilities of confusion. &RCL;
|
|
currently makes no attempt at automatic language recognition.</para>
|
|
|
|
<para>&RCL; has many parameters which define exactly what to
|
|
index, and how to classify and decode the source
|
|
documents. These are kept in a <link
|
|
linkend="rcl.indexing.config">configuration file</link>. A
|
|
default configuration is copied into a standard location
|
|
(usually something like
|
|
<filename>/usr/[local/]share/recoll/examples</filename>)
|
|
during installation. The default parameters from this file may
|
|
be overridden by values that you set inside your personal
|
|
configuration, found by default in the
|
|
<filename>.recoll</filename> sub-directory of your home
|
|
directory. The default configuration will index your home
|
|
directory with default parameters and should be sufficient for
|
|
giving &RCL; a try, but you may want to adjust it
|
|
later.</para>
|
|
|
|
<para><link linkend="rcl.indexing.exec">Indexing</link> is started
|
|
automatically the first time you execute the
|
|
<command>recoll</command> search graphical user interface, or by
|
|
executing the <command>recollindex</command> command.</para>
|
|
|
|
<para><link linkend="rcl.search">Searches</link> are
|
|
performed inside the <command>recoll</command>
|
|
program, which has many options to help you find what you are
|
|
looking for.</para>
|
|
|
|
</sect1>
|
|
</chapter>
|
|
|
|
|
|
<chapter id="rcl.indexing">
|
|
<title>Indexing</title>
|
|
|
|
<sect1 id="rcl.indexing.introduction">
|
|
<title>Introduction</title>
|
|
|
|
<para>Indexing is the process by which the set of documents is
|
|
analyzed and the data entered into the database. &RCL; indexing
|
|
is normally incremental: documents will only be processed if
|
|
they have been modified. On the first execution, of course, all
|
|
documents will need processing. A full index build can be forced
|
|
later on by specifying an option to the indexing command
|
|
(<command>recollindex -z</command>).</para>
|
|
|
|
<para>&RCL; indexing takes place at discrete times. There is
|
|
currently no interface to real time file modification
|
|
monitors. The typical usage is to have a nightly indexing run
|
|
<link linkend="rcl.indexing.automat">programmed</link> into your
|
|
<command>cron</command> file.</para>
|
|
|
|
<sidebar><para>There is nothing in &RCL; and &XAP;
|
|
that would prevent interfacing with a real time file
|
|
modification monitor, but this would tend to consume significant
|
|
system resources for dubious gain, because you rarely need a
|
|
full text search to find documents you just
|
|
modified. If you want to experiment with this,
<command>recollindex -i</command> can be used to add individual
files to the index; see the manual page.</para>
|
|
</sidebar>
|
|
|
|
<para>&RCL; knows about quite a few different document
types. The parameters for document type recognition and
processing are set in
<link linkend="rcl.indexing.config">configuration files</link>.
Most file types, like HTML or word processing files, only hold
one document. Some file types, like mail folder files, can hold
many individually indexed documents.
</para>
|
|
|
|
<para>&RCL; indexing processes plain text, HTML, OpenOffice
and e-mail files internally. Other types (for example PostScript, PDF,
MS Word, RTF) need external applications for preprocessing. The
|
|
list is in the <link linkend="rcl.install.external">
|
|
installation</link> section.</para>
|
|
|
|
<para>Without further configuration, &RCL; will index all
|
|
appropriate files from your home directory, with a reasonable
|
|
set of defaults.</para>
|
|
|
|
<para>In some cases, it may be useful to index different
|
|
areas of the file system to separate databases. You can do this
|
|
by using multiple configuration directories, each indexing a
|
|
file system area to a specific database. See the <link
|
|
linkend="rcl.search.multidb">section about using multiple
|
|
databases</link> for more information on multiple configurations
|
|
and indexes. </para>
|
|
</sect1>
|
|
|
|
<sect1 id="rcl.indexing.storage">
|
|
<title>Index storage</title>
|
|
|
|
<para>The default location for the index data is the
|
|
<filename>xapiandb</filename> subdirectory of the &RCL;
|
|
configuration directory, typically
|
|
<filename>$HOME/.recoll/xapiandb/</filename>. This can be
|
|
changed via two different methods (with different purposes):</para>
|
|
|
|
<itemizedlist>
|
|
|
|
<listitem><para>You can specify a different configuration
|
|
directory by setting the <literal>RECOLL_CONFDIR</literal>
|
|
environment variable, or using the <literal>-c</literal>
|
|
option to the &RCL; commands. This method would typically be
|
|
used to index different areas of the file system to
|
|
different indexes. For example, if you were to issue the
|
|
following commands:
|
|
<programlisting>
|
|
export RECOLL_CONFDIR=~/.indexes-email
|
|
recoll
|
|
</programlisting> then &RCL; would use configuration files
stored in <filename>~/.indexes-email/</filename> and
(unless specified otherwise in
<filename>recoll.conf</filename>) would look for
the index in <filename>~/.indexes-email/xapiandb/</filename>.</para>
|
|
|
|
<para>Using multiple configuration directories and
|
|
<link linkend="rcl.install.config.recollconf">configuration
|
|
options</link> allows you to tailor multiple configurations
and indexes to handle whatever subset of the available data
you wish to make searchable.</para>
|
|
|
|
</listitem>
|
|
<listitem><para>You can also specify a different storage
|
|
location for the index by setting the <literal>dbdir</literal>
|
|
parameter in the configuration file
|
|
(see the <link linkend="rcl.install.config.recollconf">configuration
|
|
section</link>). This method would mainly be of use if you
|
|
wanted to keep the configuration directory in its default location,
|
|
but wanted another location for the index, typically because of
disk space concerns.</para>
|
|
</listitem>
|
|
|
|
</itemizedlist>
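<para>As a minimal sketch of the second method, the following
<filename>recoll.conf</filename> fragment would keep the configuration
in its default location while storing the index elsewhere. The target
path is a hypothetical example, and the <literal>name = value</literal>
layout is assumed from the usual form of the configuration file; check
the configuration section for the exact syntax.</para>
<programlisting>
# $HOME/.recoll/recoll.conf
# Store the index data outside the configuration directory
# (hypothetical path, adjust to a volume with enough free space)
dbdir = /some/big/disk/recoll-xapiandb
</programlisting>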
|
|
|
|
<para>The size of the index is determined by the size of the set
|
|
of documents, but the ratio can vary a lot. For a typical mixed
|
|
set of documents, the index size will often be close to
|
|
the data set size. In specific cases (a set of compressed
|
|
mbox files for example), the index can become much bigger than
|
|
the documents. It may also be much smaller if the documents
|
|
contain a lot of images or other non-indexed data (an extreme
|
|
example being a set of mp3 files where only the tags would be
|
|
indexed).</para>
|
|
|
|
<para>Images, sound and video do not increase the
index size, so nowadays (2006) even a big index will
typically be negligible compared to the
total amount of data on the computer.</para>
|
|
|
|
<para>The index data directory (<filename>xapiandb</filename>)
|
|
only contains data that can be completely rebuilt by an index
|
|
run, and it can always be destroyed safely.</para>
|
|
|
|
<sect2 id="rcl.indexing.storage.security">
|
|
<title>Security aspects</title>
|
|
|
|
<para>The &RCL; index does not hold copies of the indexed
|
|
documents. But it does hold enough data to allow for an almost
|
|
complete reconstruction. If confidential data is indexed,
|
|
access to the database directory should be restricted. </para>
|
|
|
|
<para>As of version 1.4, &RCL; will create the configuration
|
|
directory with a mode of 0700 (access by owner only). As the
|
|
index data directory is by default a sub-directory of the
|
|
configuration directory, this should result in appropriate
|
|
protection.</para>
|
|
|
|
<para>If you use another setup, you should think of the kind
|
|
of protection you need for your index, and set the directory
|
|
and files access modes appropriately.</para>
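<para>For instance, if the index had been moved outside the
configuration directory, standard commands such as the following could
be used to restrict access to the owner. This is only a sketch: the
directory name is a hypothetical placeholder for your actual index
location.</para>
<programlisting>
# Remove all group and other permissions from the index directory
# and everything below it (hypothetical path)
chmod -R go-rwx /some/place/xapiandb
# Check the resulting mode of the directory itself
ls -ld /some/place/xapiandb
</programlisting>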
|
|
|
|
</sect2>
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="rcl.indexing.config">
|
|
<title>The indexing configuration</title>
|
|
|
|
<para>You can control which areas of the file system are
|
|
indexed, and how files are processed, by setting variables inside
|
|
the <link linkend="rcl.install.config">&RCL; configuration
|
|
files</link>.</para>
|
|
|
|
<para>You can also use <link linkend="rcl.search.multidb">multiple
|
|
indexes</link> defined by separate configurations, typically to
|
|
separate personal and shared indexes, or to take advantage of
|
|
the organization of your data to improve search precision.</para>
|
|
|
|
<para>The first time you start <command>recoll</command>, you
will be asked whether or not you would like it to build the
index. If you want to adjust the configuration before indexing,
just click <guilabel>Cancel</guilabel> at this point. By then,
<command>recoll</command> will already have created a
<filename>~/.recoll</filename> directory containing empty
configuration files.</para>
|
|
|
|
<para>The configuration is documented inside the <link
|
|
linkend="rcl.install.config">installation chapter</link> of
|
|
this document, or in the recoll.conf(5) man page. The most
|
|
immediately useful variable you may be interested in is probably <link
|
|
linkend="rcl.install.config.recollconf.topdirs">topdirs</link>,
|
|
which determines what subtrees get indexed.</para>
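<para>As an illustration, a personal <filename>recoll.conf</filename>
restricting indexing to two subtrees might look like the following
sketch. The directory names are hypothetical, and the space-separated
list and the <literal>~</literal> notation are assumptions to be
checked against the configuration documentation.</para>
<programlisting>
# $HOME/.recoll/recoll.conf
# Only index these two subtrees instead of the whole home directory
# (hypothetical directory names)
topdirs = ~/Documents ~/Mail
</programlisting>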
|
|
|
|
<para>The applications needed to index file types other than
text, HTML or email (for example PDF, PostScript, MS Word...) are
described in the <link linkend="rcl.install.external">external
packages section</link>.</para>
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="rcl.indexing.exec">
|
|
<title>Starting indexing</title>
|
|
|
|
<para>Indexing is performed either by the
|
|
<command>recollindex</command> program, or by the
|
|
indexing thread inside the <command>recoll</command>
|
|
program (use the <guimenu>File</guimenu> menu). Both programs
|
|
will use the <literal>RECOLL_CONFDIR</literal>
|
|
variable or accept a <literal>-c</literal>
|
|
<replaceable>confdir</replaceable> option to specify the
|
|
configuration directory to be used.</para>
|
|
|
|
<para>If the <command>recoll</command> program finds no index
|
|
when it starts, it will automatically start indexing (except
|
|
if canceled).</para>
|
|
|
|
<para>It is best to avoid interrupting the indexing process, as
|
|
this may sometimes leave the index in a bad state. This is
|
|
not a serious problem, as you then just need to delete
|
|
the index files and restart the indexing. The index files are
|
|
normally stored in the <filename>$HOME/.recoll/xapiandb</filename>
|
|
directory, which you can just delete if needed. Alternatively,
|
|
you can start <command>recollindex</command> with option
|
|
<literal>-z</literal>, which will reset the database before
|
|
indexing.</para>
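<para>The following command lines sketch the cases described in
this section. The <filename>~/.indexes-email</filename> directory is a
hypothetical alternate configuration directory, and
<command>recollindex</command> is assumed to be in your PATH.</para>
<programlisting>
# Incremental update of the default index ($HOME/.recoll/xapiandb)
recollindex

# Incremental update of the index belonging to another configuration
recollindex -c ~/.indexes-email

# Reset the default index and rebuild it from scratch,
# for example after an interrupted indexing run
recollindex -z
</programlisting>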
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="rcl.indexing.automat">
|
|
<title>Using <command>cron</command> to automate
|
|
indexing</title>
|
|
|
|
<para>The most common way to set up indexing is to have a cron
|
|
task execute it every night. For example the following
|
|
<filename>crontab</filename> entry would do it every day at
|
|
3:30AM (supposing <command>recollindex</command> is in your PATH):</para>
|
|
|
|
<programlisting>30 3 * * * recollindex > /tmp/recolltrace 2>&1</programlisting>
|
|
|
|
<para>The usual command to edit your
|
|
<filename>crontab</filename> is
|
|
<userinput>crontab -e</userinput> (which will usually start the
|
|
<command>vi</command> editor to edit the file). You may have
|
|
more sophisticated tools available on your system.</para>
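<para>If you maintain several indexes, one entry per configuration
directory can be used. The following sketch assumes a second,
hypothetical configuration directory named
<filename>.indexes-email</filename> in your home directory, and
staggers the two runs so that they do not compete for resources:</para>
<programlisting>
# Default index at 3:30AM
30 3 * * * recollindex > /tmp/recolltrace 2>&1
# Index for the hypothetical e-mail configuration at 4:00AM
0 4 * * * recollindex -c $HOME/.indexes-email > /tmp/recolltrace-email 2>&1
</programlisting>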
|
|
|
|
</sect1>
|
|
|
|
</chapter>
|
|
|
|
<chapter id="rcl.search">
|
|
<title>Search</title>
|
|
|
|
<para>The <command>recoll</command> program provides the user
|
|
interface for searching. It is based on the
|
|
<application>Qt</application> library.</para>
|
|
|
|
<sect1 id="rcl.search.simple">
|
|
<title>Simple search</title>
|
|
|
|
<procedure>
|
|
<step><para>Start the <command>recoll</command> program.</para>
|
|
</step>
|
|
<step><para>Possibly choose a search mode: <guilabel>Any
|
|
term</guilabel> or <guilabel>All terms</guilabel> or
|
|
<guilabel>File name</guilabel>.</para>
|
|
</step>
|
|
<step><para>Enter search term(s) in the text field at the top of the
|
|
window.</para>
|
|
</step>
|
|
<step><para>Click the <guilabel>Search</guilabel> button or
|
|
hit the <keycap>Enter</keycap> key to start the search.</para>
|
|
</step>
|
|
</procedure>
|
|
|
|
<para>The initial default search mode is <guilabel>Any
|
|
term</guilabel>. This will look for documents with any of the
|
|
search terms (the ones with more terms will get better scores).
|
|
<guilabel>All terms</guilabel> will ensure
|
|
that only documents with all the terms will be
|
|
returned. <guilabel>File name</guilabel> will specifically
|
|
look for file names, and allows using wildcards
|
|
(<literal>*</literal>, <literal>?</literal>,
|
|
<literal>[]</literal>). </para>
|
|
|
|
<para>You can search for exact phrases (adjacent words in a
|
|
given order) by enclosing the input inside double quotes. For example:
|
|
<literal>"virtual reality"</literal>.</para>
|
|
<para>Character case has no influence on search, except that you
|
|
can disable stem expansion for any term by capitalizing it. For example,
|
|
a search for <literal>floor</literal> will also normally look for
|
|
<literal>flooring</literal>, <literal>floored</literal>, etc., but
|
|
a search for <literal>Floor</literal> will only look for
|
|
<literal>floor</literal>, in any character case (stemming can
|
|
also be disabled globally in the preferences). </para>
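<para>To illustrate the rules above, here are a few simple search
entries and the way they would be interpreted. The terms themselves
are arbitrary examples taken from this manual.</para>
<screen>
floor              stem expansion: also finds floors, floored, flooring
Floor              no stem expansion: finds floor only, in any character case
"virtual reality"  phrase: both words, adjacent and in this order
</screen>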
|
|
|
|
<para>&RCL; remembers the last few searches that you
|
|
performed. You can use the simple search text entry widget (a
|
|
combobox) to recall them (click on the drop-down arrow at the right of the
|
|
text field). Please note, however, that only the search texts
|
|
are remembered, not the mode (all/any/file name).</para>
|
|
|
|
<para>Hitting <keycap>^Tab</keycap> (<keycap>Ctrl</keycap> +
|
|
<keycap>Tab</keycap>) while entering a word in the
|
|
simple search entry will open a window with possible completions
|
|
for the word. The completions are extracted from the
|
|
database.</para>
|
|
|
|
<para>Double-clicking on a word in the result list or a preview
|
|
window will insert it into the simple search entry field.</para>
|
|
|
|
<para>You can use the <guilabel>Tools</guilabel> / <guilabel>Advanced
|
|
search</guilabel> dialog for more complex searches.</para>
|
|
</sect1>
|
|
|
|
<sect1 id="rcl.search.reslist">
|
|
<title>The result list</title>
|
|
|
|
<para>After starting a search, a list of results will instantly
|
|
be displayed in the main list window.</para>
|
|
|
|
<para>By default, the document list is presented in order of
|
|
relevance (how well the system estimates that the document
|
|
matches the query). You can specify a different ordering by
|
|
using the <link linkend="rcl.search.sort"><guilabel>Tools</guilabel>
|
|
/ <guilabel>Sort parameters</guilabel></link> dialog.</para>
|
|
|
|
<para>Clicking on the
|
|
<literal>Preview</literal> link for an entry will open an
|
|
internal preview window for the document. Clicking the
|
|
<literal>Edit</literal> link will attempt to start an external
|
|
viewer (have a look at the <filename>mimeconf</filename>
|
|
configuration file to see how these are configured).</para>
|
|
|
|
<para>The <literal>Preview</literal> and <literal>Edit</literal>
|
|
links may not be present for all entries, meaning that
|
|
&RCL; has no configured way to preview a given file type (which
|
|
was indexed by name only), or no configured external viewer for
|
|
the file type. This can sometimes be adjusted simply by tweaking
|
|
the <link linkend="rclinstall.config.mimemap">
|
|
<filename>mimemap</filename></link> and
|
|
<link linkend="rclinstall.config.mimeconf">
|
|
<filename>mimeconf</filename></link> configuration files.</para>
|
|
|
|
<para>You can click on the <literal>Query details</literal> link
|
|
at the top of the results page to see the query actually
|
|
performed, after stem expansion and other processing.</para>
|
|
|
|
<para>Double-clicking on any word inside the result list or a
|
|
preview window will insert it into the simple search text.</para>
|
|
|
|
<para>The result list is divided into pages (the size of which
|
|
you can change in the preferences). Use the arrow buttons in the
|
|
toolbar or the links at the bottom of the page to browse the
|
|
results.</para>
|
|
|
|
|
|
<sect2 id="rcl.search.resultlist.menu">
|
|
<title>The result list right-click menu</title>
|
|
|
|
<para>Apart from the preview and edit links, you can display a
|
|
pop-up menu by right-clicking over a paragraph in the result
|
|
list. This menu has the following entries:</para>
|
|
|
|
<itemizedlist>
|
|
<listitem><para><guilabel>Preview</guilabel></para></listitem>
|
|
<listitem><para><guilabel>Edit</guilabel></para></listitem>
|
|
<listitem><para><guilabel>Copy File Name</guilabel></para></listitem>
|
|
<listitem><para><guilabel>Copy Url</guilabel></para></listitem>
|
|
<listitem><para><guilabel>Find similar</guilabel></para></listitem>
|
|
</itemizedlist>
|
|
|
|
<para>The <guilabel>Preview</guilabel> and
|
|
<guilabel>Edit</guilabel> entries do the same thing as the
|
|
corresponding links. The next two entries copy either
the file path or the URL to the clipboard, for pasting into
|
|
another application.</para>
|
|
|
|
<para>The <guilabel>Find similar</guilabel> entry will select
|
|
a number of relevant terms from the current document and enter
|
|
them into the simple search field. You can then start a simple
|
|
search, with a good chance of finding documents related to the
|
|
current result.</para>
|
|
|
|
</sect2>
|
|
</sect1>
|
|
|
|
<sect1 id="rcl.search.preview">
|
|
<title>The preview window</title>
|
|
|
|
<para>The preview window opens when you first click a
|
|
<literal>Preview</literal> link inside the result list.</para>
|
|
|
|
<para>Subsequent preview requests for a given search open new
|
|
tabs in the existing window.</para>
|
|
|
|
<para>Starting another search and requesting a preview will
|
|
create a new preview window. The old one stays open until you
|
|
close it.</para>
|
|
|
|
<para>You can close a preview tab by typing <keycap>^W</keycap>
|
|
(<keycap>Ctrl</keycap> + <keycap>W</keycap>) in the
|
|
window. Closing the last tab for a window will also close the
|
|
window.</para>
|
|
|
|
<para>Of course you can also close a preview window by using the
|
|
window manager button at the top of the frame.</para>
|
|
|
|
<para>You can display successive or previous documents from the
|
|
result list inside a preview tab by typing
|
|
<keycap>Shift</keycap>+<keycap>Down</keycap> or
|
|
<keycap>Shift</keycap>+<keycap>Up</keycap> (<keycap>Down</keycap>
|
|
and <keycap>Up</keycap> are the arrow keys).</para>
|
|
|
|
<para>The preview tabs have an internal incremental search
|
|
function. You initiate the search either by typing a
|
|
<keycap>/</keycap> (slash) inside the text area or by clicking
|
|
into the <guilabel>Search for:</guilabel> text field and
|
|
entering the search string. You can then use the
|
|
<guilabel>Next</guilabel> and <guilabel>Previous</guilabel>
|
|
buttons to find the next/previous occurrence. You can also type
|
|
<keycap>F3</keycap> inside the text area to get to the next
|
|
occurrence.</para>
|
|
|
|
<para>If you have a search string entered and you use <keycap>Shift</keycap>+<keycap>Up</keycap> or <keycap>Shift</keycap>+<keycap>Down</keycap>
|
|
to browse the results, the search is initiated for each successive
|
|
document. If the string is found, the cursor will be positioned
|
|
at the first occurrence of the search string.</para>
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="rcl.search.complex">
|
|
<title>Complex/advanced search</title>
|
|
|
|
<para>The advanced search dialog has fields that will allow a more
|
|
refined search, looking for documents with all given elements, a
|
|
given exact phrase, none of the given elements, or a given file
|
|
name (with wildcard expansion). All relevant fields will be
|
|
combined by an implicit AND clause. All fields except "Exact
|
|
phrase" can accept a mix of single words and phrases enclosed
|
|
in double quotes.</para>
|
|
|
|
<para>Advanced search will let you search for documents of specific MIME
types (for example, only <literal>text/plain</literal>,
<literal>text/html</literal> or
<literal>application/pdf</literal>). The state of the
|
|
file type selection can be saved as the default (the file type
|
|
filter will not be activated at program start-up, but the lists
|
|
will be in the restored state).</para>
|
|
|
|
<para>You can also restrict the search results
|
|
to a sub-tree of the indexed area. If you need to do this often,
|
|
you may want to set up multiple indexes instead, as the
|
|
performance will be much better.</para>
|
|
|
|
<para>Click on the <guilabel>Start Search</guilabel> button in
|
|
the advanced search dialog, or type <keycap>Enter</keycap> in
|
|
any text field to start the search. The button in
|
|
the main window always performs a simple search.</para>
|
|
|
|
<para>Click on the <literal>Show query details</literal> link at
|
|
the top of the result page to see the query expansion.</para>
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="rcl.search.multidb">
|
|
<title>Multiple databases</title>
|
|
|
|
<para>Multiple &RCL; databases or indexes can be created by
|
|
using several configuration directories which are usually set to
|
|
index different areas of the file system. A specific index can
|
|
be selected for updating or searching, using the
|
|
<literal>RECOLL_CONFDIR</literal> environment variable or the
|
|
<literal>-c</literal> option to <command>recoll</command> and
|
|
<command>recollindex</command>.</para>
|
|
|
|
<para>A <command>recollindex</command> program instance can only
|
|
update one specific index.</para>
|
|
|
|
<para>A <command>recoll</command> program instance is also
|
|
associated with a specific index, which is the one to be
|
|
updated by its indexing thread, but it can use any
|
|
number of &RCL; indexes for searching. The external indexes
|
|
can be selected through the <guilabel>external
|
|
indexes</guilabel> tab in the preferences dialog.</para>
|
|
|
|
<para>Index selection is performed in two phases. A set of all
|
|
usable indexes must first be defined, and then the subset of
|
|
indexes to be used for searching. Of course, these parameters
|
|
are retained across program executions (they are kept
|
|
separately for each &RCL; configuration). The set of all indexes
|
|
is usually quite stable, while the active ones might typically
|
|
be adjusted quite frequently.</para>
|
|
|
|
<para>The main index (defined by
|
|
<literal>RECOLL_CONFDIR</literal>) is always active. If this is
|
|
undesirable, you can set up your base configuration to index
|
|
an empty directory.</para>
|
|
|
|
<para>As building the set of all indexes can be a little tedious
|
|
when done through the user interface, you can use the
|
|
<literal>RECOLL_EXTRA_DBS</literal> environment
|
|
variable to provide an initial set. This might typically be
|
|
set up by a system administrator so that every user does not
|
|
have to do it. The variable should define a colon-separated list
|
|
of index directories, for example:
|
|
</para>
|
|
<screen>export RECOLL_EXTRA_DBS=/some/place/xapiandb:/some/other/db</screen>
|
|
|
|
<para>A typical usage scenario for the multiple index feature
|
|
would be for a system administrator to set up a central index
|
|
for shared data, that you may choose to search, or not, in
|
|
addition to your personal data. Of course, there are other
|
|
possibilities. There are many cases where you know the subset of
|
|
files that you want to be searched for a given query, and where
|
|
restricting the query will greatly improve the precision of the
|
|
results. This can also be performed with the directory filter in
|
|
advanced search, but multiple indexes will have much better
|
|
performance and may be worth the trouble.</para>
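<para>The shared index scenario could be sketched as follows. The
<filename>/some/place/shared-conf</filename> configuration directory is
hypothetical, and the index is assumed to live in its default
<filename>xapiandb</filename> subdirectory. The variable only provides
the initial set of usable indexes: the ones actually searched are still
chosen in the preferences dialog.</para>
<programlisting>
# Administrator: build or update the shared index
recollindex -c /some/place/shared-conf

# User: make the shared index available to the search interface,
# in addition to the personal index, then start searching
export RECOLL_EXTRA_DBS=/some/place/shared-conf/xapiandb
recoll
</programlisting>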
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="rcl.search.history">
|
|
<title>Document history</title>
|
|
|
|
<para>Documents that you actually view (with the internal preview
|
|
or an external tool) are entered into the document history,
|
|
which is remembered. You can display the history list by using
|
|
the <guilabel>Tools/</guilabel><guilabel>Doc History</guilabel> menu
|
|
entry.</para>
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="rcl.search.sort">
|
|
<title>Sorting search results</title>
|
|
|
|
<para>The documents in a result list are normally sorted in
|
|
order of relevance. It is possible to specify different sort
|
|
parameters by using the <guimenu>Sort parameters</guimenu>
|
|
dialog (located in the <guimenu>Tools</guimenu>
|
|
menu).</para>
|
|
|
|
<para>The tool sorts a specified number of the most
|
|
relevant documents in the result list, according to
|
|
specified criteria. The currently available criteria are
|
|
<emphasis>date</emphasis> and <emphasis>mime type</emphasis>.</para>
|
|
|
|
<para>The sort parameters stay in effect until they are explicitly
|
|
reset, or the program exits. An activated sort is indicated in
|
|
the result list header.</para>
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="rcl.search.tips">
|
|
<title>Search tips, shortcuts</title>
|
|
|
|
<formalpara><title>Term completion</title>
|
|
<para>Typing <keycap>^TAB</keycap> (<keycap>Control</keycap> +
|
|
<keycap>Tab</keycap>) in the simple
|
|
search entry field while entering a word will either complete
|
|
the current word if its beginning matches a unique term in the
|
|
index, or open a window to propose a list of completions.</para>
|
|
</formalpara>
|
|
|
|
<formalpara><title>Picking up new terms from result or preview
|
|
text</title>
|
|
<para>Double-clicking on a word in the result list or in a
|
|
preview window will copy it to the simple search entry field.</para>
|
|
</formalpara>
|
|
|
|
<formalpara><title>Disabling stem expansion</title>
|
|
<para>Entering a capitalized word in any search field will prevent
|
|
stem expansion (no search for
|
|
<literal>gardening</literal> if you enter
|
|
<literal>Garden</literal> instead of
|
|
<literal>garden</literal>). This is the only case where
|
|
character case should make a difference for a &RCL;
|
|
search. You can also disable stem expansion or change the
|
|
stemming language in the preferences.</para>
|
|
</formalpara>
|
|
|
|
<formalpara><title>Phrases</title>
|
|
<para>A phrase can be looked for by enclosing it in double
|
|
quotes. Example: <literal>"user manual"</literal> will look
|
|
only for occurrences of <literal>user</literal> immediately
|
|
followed by <literal>manual</literal>. You can use the
|
|
<guilabel>This exact phrase</guilabel> field of the advanced
|
|
search dialog to the same effect. Phrases can be entered along with
|
|
simple terms in all simple or advanced search entry fields
|
|
(except <guilabel>This exact phrase</guilabel>).</para>
|
|
</formalpara>
|
|
|
|
<formalpara><title>Browsing the result list inside a preview
|
|
window (1.5)</title>
|
|
<para>Entering <keycap>Shift-Down</keycap> or <keycap>Shift-Up</keycap>
|
|
(<keycap>Shift</keycap> + an arrow key) in a preview window will
|
|
display the next or the previous document from the result
|
|
list. Any secondary search currently active will be executed on
|
|
the new document.</para>
|
|
</formalpara>
|
|
|
|
<formalpara><title>AutoPhrases (1.5)</title>
|
|
<para>This option can be set in the preferences dialog. If it is
|
|
set, a phrase will be automatically built and added to simple
|
|
searches when looking for <literal>Any terms</literal>. This
|
|
will not radically change the results, but will give a relevance
|
|
boost to the results where the search terms appear as a
|
|
phrase. For example, searching for <literal>virtual reality</literal>
|
|
will still find all documents where either
|
|
<literal>virtual</literal> or <literal>reality</literal> or
|
|
both appear, but those which contain <literal>virtual
|
|
reality</literal> should appear sooner in the list.</para>
</formalpara>
|
|
|
|
<formalpara><title>Finding related documents</title>
|
|
<para>Selecting the <guilabel>Find similar documents</guilabel> entry
|
|
in the result list paragraph right-click menu will select a
|
|
set of "interesting" terms from the current result, and insert
|
|
them into the simple search entry field. You can then possibly
|
|
edit the list and start a search to find documents which may
|
|
be related to the current result.</para>
|
|
</formalpara>
|
|
|
|
<formalpara><title>File names</title>
|
|
<para>File names are added as terms during indexing, and you can
|
|
specify them as ordinary terms in normal search fields (&RCL; used
|
|
to index all directories in the file path as terms. This has been
|
|
abandoned as it did not seem really useful). Alternatively, you
|
|
can use the specific file name search which will
|
|
<emphasis>only</emphasis> look for file names and can use wildcard
|
|
expansion.</para>
|
|
</formalpara>
|
|
|
|
<formalpara><title>Query explanation</title>
|
|
<para>You can get an exact description of what the query
|
|
looked for, including stem expansion, and Boolean operators
|
|
used, by clicking on the result list header.</para>
|
|
</formalpara>
|
|
|
|
<formalpara><title>Closing previews</title>
|
|
<para>Entering <keycap>^W</keycap> in a tab will
|
|
close it (and, for the last tab, close the preview
|
|
window). Entering <keycap>Esc</keycap> will close the preview
|
|
window and all its tabs.</para>
|
|
</formalpara>
|
|
|
|
<formalpara><title>Quitting</title>
|
|
<para>Entering <keycap>^Q</keycap> almost anywhere will
|
|
close the application.</para>
|
|
</formalpara>
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="rcl.search.custom">
|
|
<title>Customizing the search interface</title>
|
|
|
|
<para>It is possible to customize some aspects of the search
|
|
interface by using the <guimenu>Query configuration</guimenu> entry
|
|
in the <guimenu>Preferences</guimenu> menu.</para>
|
|
|
|
<para>There are two tabs in the dialog, dealing with the
|
|
interface itself, and with the parameters used for searching and
|
|
returning results.</para>
|
|
|
|
<formalpara><title>User interface parameters:</title>
|
|
<para>
|
|
<itemizedlist>
|
|
|
|
<listitem><para><guilabel>Number of results in a result
|
|
page</guilabel></para>
|
|
</listitem>
|
|
|
|
<listitem><para><guilabel>Result list font</guilabel>: There
|
|
is quite a lot of information shown in the result list, and
|
|
you may want to customize the font and/or font size. The rest
|
|
of the fonts used by &RCL; are determined by your generic Qt
config (try the <command>qtconfig</command> command).</para>
|
|
</listitem>
|
|
|
|
<listitem><para><guilabel>HTML help browser</guilabel>: this
|
|
will let you choose your preferred browser which will be
|
|
started from the <guimenu>Help</guimenu> menu to read the user
|
|
manual. You can enter a simple name if the command is in your
|
|
PATH, or browse for a full pathname.</para>
|
|
</listitem>
|
|
|
|
<listitem><para><guilabel>Show document type icons in result
|
|
list</guilabel>: icons in the result list can be turned
|
|
off. They take quite a lot of space and convey relatively
|
|
little useful information.</para>
|
|
</listitem>
|
|
|
|
<listitem><para><guilabel>Auto-start simple search on
|
|
white space entry</guilabel>: if this is checked, a search will
|
|
be executed each time you enter a space in the simple search
|
|
input field. This lets you look at the result list as you
|
|
enter new terms. This is off by default; you may like it or
|
|
not...</para>
|
|
</listitem>
|
|
|
|
</itemizedlist>
|
|
</para>
|
|
</formalpara>
|
|
|
|
|
|
<formalpara><title>Search parameters:</title>
|
|
<para>
|
|
<itemizedlist>
|
|
|
|
<listitem><para><guilabel>Stemming language</guilabel>:
|
|
stemming obviously depends on the document's language. This
|
|
listbox will let you choose among the stemming databases which
|
|
were built during indexing (this is set in the <link
|
|
linkend="rcl.install.config.recollconf">main configuration
|
|
file</link>), or later added with
|
|
<command>recollindex -s</command> (see the recollindex
|
|
manual). Stemming languages which are dynamically added will be
|
|
deleted at the next indexing pass unless they are also added in
|
|
the configuration file.</para>
|
|
</listitem>
|
|
|
|
<listitem><para><guilabel>Dynamically build
|
|
abstracts</guilabel>: this decides if &RCL; tries to build
|
|
document abstracts when displaying the result list. Abstracts
|
|
are constructed by taking context from the document
|
|
information, around the search terms. This can slow down
|
|
result list display significantly for big documents, and you
|
|
may want to turn it off.</para>
|
|
</listitem>
|
|
|
|
<listitem><para><guilabel>Replace abstracts from
|
|
documents</guilabel>: this decides if we should synthesize and
|
|
display an abstract in place of an explicit abstract found
|
|
within the document itself.</para>
|
|
</listitem>
|
|
|
|
<listitem><para><guilabel>Synthetic abstract size</guilabel>:
|
|
adjust to taste...</para>
|
|
</listitem>
|
|
|
|
<listitem><para><guilabel>Synthetic abstract context
|
|
words</guilabel>: how many words should be displayed around
|
|
each term occurrence.</para>
|
|
</listitem>
|
|
|
|
</itemizedlist>
|
|
</para>
|
|
</formalpara>
|
|
|
|
<formalpara id="rcl.search.custom.extradb">
|
|
<title>External indexes:</title>
|
|
<para>This panel will let you browse for additional indexes
|
|
that you may want to search. External indexes are designated by
|
|
their database directory (for example:
|
|
<filename>/home/someothergui/.recoll/xapiandb</filename>,
|
|
<filename>/usr/local/recollglobal/xapiandb</filename>).</para>
</formalpara>
|
|
|
|
<para>Once entered, the indexes will appear in the
|
|
<guilabel>All indexes</guilabel> list, and you can
|
|
choose which ones you want to use at any moment by transferring
|
|
them to/from the <guilabel>Active indexes</guilabel>
|
|
list.</para>
|
|
<para>Your main database (the one the current configuration
|
|
indexes to), is always implicitly active. If this is not
|
|
desirable, you can set up your configuration so that it indexes,
|
|
for example, an empty directory.</para>
|
|
|
|
</sect1>
|
|
|
|
</chapter>
|
|
|
|
|
|
<chapter id="rcl.install">
|
|
<title>Installation</title>
|
|
|
|
<sect1 id="rcl.install.binary">
|
|
<title>Installing a prebuilt copy</title>
|
|
|
|
<para>&RCL; binary installations are always linked statically
to the &XAP; libraries, and have no other dependencies. You
|
|
will only have to check or install
|
|
<link linkend="rcl.install.external">supporting
|
|
applications</link> for the file types that you want to index
|
|
beyond text, HTML and mail files.</para>
|
|
|
|
<sect2 id="rcl.install.binary.package">
|
|
<title>Installing through a package system</title>
|
|
|
|
<para>If you use a BSD-type port system or a
|
|
prebuilt package (RPM or other), just follow the usual
|
|
procedure, and maybe have a look at the <link
|
|
linkend="rcl.install.config">configuration
|
|
section</link> (but this may not be necessary for a quick
|
|
test with default parameters).</para>
|
|
|
|
</sect2>
|
|
|
|
<sect2 id="rcl.install.binary.rcl">
|
|
<title>Installing a prebuilt &RCL;</title>
|
|
|
|
<para>The unpackaged binary versions are just compressed tar
|
|
files of a build tree, where only the useful parts were kept
|
|
(executables and sample configuration).</para>
|
|
|
|
<para>The executable binary files are built with a static link to
|
|
libxapian and libiconv, to make installation easier (no
|
|
dependencies). However, this also means that you cannot change
|
|
the versions which are used.</para>
|
|
|
|
<para>After extracting the tar file, you can proceed with
|
|
<link linkend="rcl.install.building.install">installation</link> as
|
|
if you had built the package from source (that is, just type
|
|
<literal>make install</literal>). The binary trees are built for
|
|
installation to <filename>/usr/local</filename>.</para>
|
|
|
|
<para>You may then need to install external applications to process
|
|
some file types that you want indexed (for example PDF or
PostScript). See the next section.</para>
|
|
|
|
<para>Finally, you may want to have a look at the <link
|
|
linkend="rcl.indexing.config">configuration section</link>.</para>
|
|
|
|
</sect2>
|
|
</sect1>
|
|
|
|
<sect1 id="rcl.install.external">
|
|
<title>Packages needed for external file types</title>
|
|
|
|
<para>&RCL; uses external applications
|
|
to index some file types. You need to install them for the
|
|
file types that you wish to have indexed (these are run-time
|
|
dependencies. None is needed for building &RCL;):</para>
|
|
|
|
<itemizedlist>
|
|
|
|
<listitem><para>PDF: pdftotext is part of the <ulink
|
|
url="http://www.foolabs.com/xpdf/">Xpdf</ulink> package.</para>
|
|
</listitem>
|
|
|
|
<listitem><para>Postscript: <ulink
|
|
url="http://www.cs.wisc.edu/~ghost/doc/pstotext.htm">
|
|
pstotext</ulink>.</para>
|
|
</listitem>
|
|
|
|
<listitem><para>MS Word: <ulink url="http://www.winfield.demon.nl">
|
|
antiword</ulink>.</para>
|
|
</listitem>
|
|
|
|
<listitem><para>MS Excel and PowerPoint:
|
|
<ulink url="http://www.45.free.net/~vitus/software/catdoc/">
|
|
catdoc</ulink>.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>RTF: <ulink
|
|
url="http://www.gnu.org/software/unrtf/unrtf.html">unrtf</ulink>
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>dvi: <ulink
|
|
url="http://www.radicaleye.com/dvips.html">dvips</ulink></para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>djvu:
|
|
<ulink
|
|
url="http://djvulibre.djvuzone.org/doc/index.html">DjVuLibre
|
|
</ulink></para>
|
|
</listitem>
|
|
|
|
<listitem><para>MP3: &RCL; will use the
|
|
<command>id3info</command> command from the <ulink
|
|
url="http://id3lib.sourceforge.net/">id3lib</ulink> package to
|
|
extract tag information. Without it, only the file names will
|
|
be indexed.</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<para>Text, HTML, mail folders and OpenOffice files are
|
|
processed internally.</para>
|
|
</sect1>
|
|
|
|
|
|
<sect1 id="rcl.install.building">
|
|
<title>Building from source</title>
|
|
|
|
<sect2 id="rcl.install.building.prereqs">
|
|
<title>Prerequisites</title>
|
|
|
|
<para>At the very least, you will need to download and install the
|
|
<ulink url="http://www.xapian.org">xapian core package</ulink>
|
|
(&RCL; development currently uses version 0.9.5), and the <ulink
|
|
url="http://www.trolltech.com/products/qt/index.html">qt
|
|
run-time and development packages</ulink> (&RCL; development
|
|
currently uses version 3.3.5, but any 3.3 version is
|
|
probably OK).</para>
|
|
|
|
<para>You will most probably be able to find a binary package for
|
|
<application>qt</application> for your system. You may have to
|
|
compile &XAP; but this is not difficult (if you are using
|
|
<application>FreeBSD</application>, there is a port).</para>
|
|
|
|
<para>You may also need
|
|
<ulink
|
|
url="http://www.gnu.org/software/libiconv/">libiconv</ulink>. &RCL;
|
|
currently uses version 1.9 (this should not be critical). On
|
|
<application>Linux</application> systems, the iconv interface
|
|
is part of libc and you should not need to do anything
|
|
special.</para>
</sect2>
|
|
|
|
<sect2 id="rcl.install.building.build">
|
|
<title>Building</title>
|
|
|
|
<para>&RCL; has been built on
|
|
Linux (Red Hat 7.3, Mandriva 2005, Fedora Core 3), FreeBSD and
|
|
Solaris 8. If you build on another system, <ulink
|
|
url="mailto:jean-francois.dockes@wanadoo.fr">I would very much
|
|
welcome patches</ulink>.</para>
|
|
|
|
<para>Depending on the <application>qt</application>
|
|
configuration on your system, you may have to set the
|
|
<literal>QTDIR</literal> and <literal>QMAKESPECS</literal>
|
|
variables in your environment:</para>
|
|
<itemizedlist>
|
|
<listitem><para><literal>QTDIR</literal> should point to the
|
|
directory above the one that holds the qt include files (such as
|
|
qt.h).</para>
|
|
</listitem>
|
|
<listitem><para><literal>QMAKESPECS</literal> should
|
|
be set to the name of one of the
|
|
<application>qt</application> mkspecs sub-directories (for example:
|
|
linux-g++).</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<para>On many Linux systems, <literal>QTDIR</literal> is set
|
|
by the login scripts, and <literal>QMAKESPECS</literal> is not
|
|
needed because there is a <filename>default</filename> link in
|
|
<filename>mkspecs/</filename>.</para>
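<para>If <literal>QTDIR</literal> is not set on your system, it can
be exported from a Bourne-type shell before building. This is only a
sketch: the Qt location shown is an assumption and must be replaced by
the directory where Qt 3 is actually installed:</para>
<screen>
<userinput># Assumed Qt 3 location; adjust to your system.</userinput>
<userinput>QTDIR=/usr/lib/qt3; export QTDIR</userinput>
<userinput># Make qmake reachable through the PATH.</userinput>
<userinput>PATH=$QTDIR/bin:$PATH; export PATH</userinput>
</screen>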
|
|
|
|
<para>The &RCL; <command>configure</command> script does a
|
|
better job of checking these variables after release
|
|
1.1.1. Before this, unexplained errors will occur during
|
|
compilation if the environment is not set up. Also, for 1.1.0 the
|
|
<command>qmake</command> command should be in your PATH (later
|
|
releases can also find it in
|
|
<filename>$QTDIR/bin</filename>).</para>
|
|
|
|
<para>Normal procedure:</para>
|
|
<screen>
|
|
<userinput>cd recoll-xxx</userinput>
|
|
<userinput>./configure</userinput>
|
|
<userinput>make</userinput>
|
|
<userinput>(practices usual hardship-repelling invocations)</userinput>
|
|
</screen>
|
|
|
|
|
|
<para>There is little auto-configuration. The
|
|
<command>configure</command> script will mainly link one of
|
|
the system-specific files in the <filename>mk</filename>
|
|
directory to <filename>mk/sysconf</filename>. If your system
|
|
is not known yet, it will tell you as much, and you may want
|
|
to manually copy and modify one of the existing files (the new
|
|
file name should be the output of <command>uname -s</command>).</para>
|
|
</sect2>
|
|
|
|
<sect2 id="rcl.install.building.install">
|
|
<title>Installation</title>
|
|
|
|
<para>Either type <userinput>make install</userinput> or execute
|
|
<userinput>recollinstall
|
|
<replaceable>prefix</replaceable></userinput>, in the root
|
|
of the source tree. This will copy the commands to
|
|
<filename><replaceable>prefix</replaceable>/bin</filename>
|
|
and the sample configuration files, scripts and other shared
|
|
data to
|
|
<filename><replaceable>prefix</replaceable>/share/recoll</filename>.</para>
|
|
<para>If the installation prefix given to
|
|
<command>recollinstall</command> is different from what was
|
|
specified when executing <command>configure</command>, you
|
|
will have to set the <literal>RECOLL_DATADIR</literal>
|
|
environment variable to indicate where the shared data is to
|
|
be found.</para>
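<para>As a sketch, assuming a hypothetical installation prefix of
<filename>/opt/recoll</filename> and assuming that the variable should
name the shared data directory itself, the environment could be set as
follows in a Bourne-type shell:</para>
<screen>
<userinput># Hypothetical prefix; the shared data is in prefix/share/recoll.</userinput>
<userinput>RECOLL_DATADIR=/opt/recoll/share/recoll; export RECOLL_DATADIR</userinput>
<userinput>/opt/recoll/bin/recoll</userinput>
</screen>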
|
|
|
|
<para>You can then proceed to <link
|
|
linkend="rcl.install.config">configuration</link>. </para>
|
|
|
|
</sect2>
|
|
</sect1>
|
|
|
|
<sect1 id="rcl.install.config">
|
|
<title>Configuration overview</title>
|
|
|
|
<para>Most of the parameters specific to the
|
|
<command>recoll</command> GUI are set through the
|
|
<guilabel>Preferences</guilabel> menu and stored in the
|
|
standard Qt place
|
|
(<filename>$HOME/.qt/recollrc</filename>). You probably do not
|
|
want to edit this by hand.</para>
|
|
|
|
<para>For other options, &RCL; uses text configuration
|
|
files. You will have to edit them by hand for
|
|
now (there is still some hope for a GUI configuration tool
|
|
in the future). The most accurate documentation for the
|
|
configuration parameters is given by comments inside the default
|
|
files, and we will just give a general overview here.</para>
|
|
|
|
<para>There are two sets of configuration files. The system-wide
|
|
files are kept in a directory named like
|
|
<filename>/usr/[local/]share/recoll/examples</filename>
and define default values for the system. A parallel set of
|
|
files exists by default in the <filename>.recoll</filename> directory
|
|
in your home. This directory can be changed with the
|
|
<literal>RECOLL_CONFDIR</literal> environment variable or the
<option>-c</option> option to <command>recoll</command> and
|
|
<command>recollindex</command>.</para>
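<para>Both ways of selecting another configuration directory are
sketched below. The directory name
<filename>$HOME/.recoll-work</filename> is hypothetical:</para>
<screen>
<userinput># Use the option for a single command.</userinput>
<userinput>recollindex -c $HOME/.recoll-work</userinput>
<userinput># Or set the variable for all following commands.</userinput>
<userinput>RECOLL_CONFDIR=$HOME/.recoll-work; export RECOLL_CONFDIR</userinput>
<userinput>recoll</userinput>
</screen>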
|
|
|
|
<para>If the <filename>.recoll</filename> directory does not
|
|
exist when <command>recoll</command> or
|
|
<command>recollindex</command> is started, it
|
|
will be created with a set of empty configuration files.
|
|
<command>recoll</command> will give you a
|
|
chance to edit the configuration file before starting
|
|
indexing. <command>recollindex</command> will
|
|
proceed immediately.</para>
|
|
|
|
|
|
<para>All configuration files share the same format. For
|
|
example, a short extract of the main configuration file might
|
|
look as follows:</para>
|
|
<programlisting>
|
|
# Space-separated list of directories to index.
|
|
topdirs = ~/docs /usr/share/doc
|
|
|
|
[~/somedirectory-with-utf8-txt-files]
|
|
defaultcharset = utf-8
|
|
</programlisting>
|
|
|
|
<para>There are three kinds of lines: </para>
|
|
<itemizedlist>
|
|
<listitem><para>Comment (starts with
|
|
<emphasis>#</emphasis>) or empty.</para>
|
|
</listitem>
|
|
<listitem><para>Parameter assignment (<emphasis>name =
|
|
value</emphasis>).</para>
|
|
</listitem>
|
|
<listitem><para>Section definition
|
|
([<emphasis>somedirname</emphasis>]).</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<para>Section definitions allow redefining some parameters for
|
|
a directory sub-tree. They stay in effect until another
|
|
section definition, or the end of file, is encountered. Some
|
|
of the parameters used for indexing are looked up
|
|
hierarchically from the current directory location
|
|
upwards. Not all parameters can be meaningfully redefined,
|
|
this is specified for each in the next section. </para>
|
|
|
|
<para>The tilde character (~) is expanded in file names to the
|
|
name of the user's home directory.</para>
|
|
|
|
<para>White space is used for separation inside lists.
|
|
Elements with embedded spaces can be quoted using
|
|
double-quotes.</para>
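<para>As an illustration of these rules, the following sketch
combines a quoted list element, tilde expansion and a section which
redefines a parameter for one sub-tree. All directory names are
hypothetical:</para>
<programlisting>
# The second element contains a space, so it is quoted.
topdirs = ~/docs "/home/me/My Documents" /usr/share/doc

# Default for files which do not specify their character set.
defaultcharset = utf-8

# Hypothetical sub-tree holding older files: the value set here
# only applies below this directory.
[~/docs/archive]
defaultcharset = iso8859-1
</programlisting>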
|
|
|
|
<sect2 id="rcl.install.config.recollconf">
|
|
<title>Main configuration file</title>
|
|
|
|
<para><filename>recoll.conf</filename> is the main
|
|
configuration file. It defines things like
|
|
what to index (top directories and things to ignore), and the
|
|
default character set to use for document types which do not
|
|
specify it internally.</para>
|
|
|
|
<para>The default configuration will index your home
|
|
directory. If this is not appropriate, start
|
|
<command>recoll</command> to create a blank
|
|
configuration, click <guimenu>Cancel</guimenu>, and edit
|
|
the configuration file before restarting the command. This
|
|
will start the initial indexing, which may take some time.</para>
|
|
|
|
<para>Parameters:</para>
|
|
|
|
<variablelist>
|
|
|
|
<varlistentry id="rcl.install.config.recollconf.topdirs">
|
|
<term><literal>topdirs</literal></term>
|
|
<listitem><para>Specifies the list of directories or files to
|
|
index (recursively for directories). The indexer will not
|
|
follow symbolic links inside the indexed trees. If an entry in
|
|
the <literal>topdirs</literal> list is a symbolic link,
|
|
indexing will not start and will generate an error.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry><term><literal>dbdir</literal></term>
|
|
<listitem><para>The name of the Xapian data directory. It
|
|
will be created if needed when the index is
|
|
initialized. If this is not an absolute path, it will be
|
|
interpreted relative to the configuration directory.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry><term><literal>skippedNames</literal></term>
|
|
<listitem>
|
|
<para>A space-separated list of patterns for
|
|
names of files or directories that should be completely
|
|
ignored. The list defined in the default file is: </para>
|
|
<programlisting>
|
|
*~ #* bin CVS Cache caughtspam tmp
|
|
</programlisting>
|
|
<para>The list can be redefined for sub-directories, but is only
|
|
actually changed for the top level ones in
|
|
<literal>topdirs</literal>.</para>
|
|
<para>The top-level directories are not affected by this
|
|
list (that is, a directory in <literal>topdirs</literal>
|
|
might match and would still be indexed).</para>
|
|
<para>The list in the default configuration does not
|
|
exclude hidden directories (names beginning with a
|
|
dot), which means that it may index quite a few things
|
|
that you do not want. On the other hand, mail user
|
|
agents like <application>thunderbird</application>
|
|
usually store messages in hidden directories, and you
|
|
probably want this indexed. One possible solution is to
|
|
have <userinput>.*</userinput> in
|
|
<literal>skippedNames</literal>, and add things like
|
|
<filename>~/.thunderbird</filename> or
|
|
<filename>~/.evolution</filename> in
|
|
<literal>topdirs</literal>.</para>
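<para>A minimal sketch of this approach, assuming the default
pattern list shown above and a mail store in
<filename>~/.thunderbird</filename> (adjust the paths to your own
setup):</para>
<programlisting>
# Index the home directory, plus one explicitly listed hidden directory.
topdirs = ~ ~/.thunderbird

# Default patterns, with .* added to skip hidden files and directories.
skippedNames = *~ #* bin CVS Cache caughtspam tmp .*
</programlisting>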
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry><term><literal>loglevel</literal></term>
|
|
<listitem><para>Verbosity level for recoll and
|
|
recollindex. A value of 4 lists quite a lot of
|
|
debug/information messages. 2 only lists errors. </para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry><term><literal>logfilename</literal></term>
|
|
<listitem><para>Where the messages should go. 'stderr' can
|
|
be used as a special value, and is the default. </para>
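<para>For example, a verbose setup writing to a file could look as
follows. The log file path is hypothetical:</para>
<programlisting>
# 4 lists debug/information messages, 2 only lists errors.
loglevel = 4
# Hypothetical location; the default is stderr.
logfilename = /tmp/recoll.log
</programlisting>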
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry><term><literal>filtersdir</literal></term>
|
|
<listitem><para>A directory to search for the external
|
|
filter scripts used to index some types of files. The
|
|
value should not be changed, except if you want to modify
|
|
one of the default scripts. The value can be redefined for
|
|
any sub-directory. </para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry><term><literal>indexstemminglanguages</literal></term>
|
|
<listitem><para>A list of languages for which the stem
|
|
expansion databases will be built. See recollindex(1) for
|
|
possible values. You can add a stem expansion database for
|
|
a different language by using <command>recollindex
|
|
-s</command>, but it will be deleted during the next
|
|
indexing. Only languages listed in the configuration
|
|
file are permanent.</para>
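<para>For instance, the following line would keep stem expansion
databases for two languages across indexing passes. The language
names are assumptions; see recollindex(1) for the accepted
values:</para>
<programlisting>
# Assumed language names, separated by white space.
indexstemminglanguages = english french
</programlisting>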
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
<varlistentry><term><literal>defaultcharset</literal></term>
|
|
<listitem><para>The name of the character set used for
|
|
files that do not contain a character set definition (for example:
|
|
plain text files). This can be redefined for any
|
|
sub-directory. If it is not set at all, the character set
|
|
used is the one defined by the nls environment (LC_ALL,
|
|
LC_CTYPE, LANG), or iso8859-1 if nothing is set.</para>
</listitem>
</varlistentry>
|
|
|
|
<varlistentry><term><literal>guesscharset</literal></term>
|
|
<listitem><para>Decide if we try to guess the character
|
|
set of files if no internal value is available (for example, for
|
|
plain text files). This does not work well in general, and
|
|
should probably not be used. </para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry><term><literal>usesystemfilecommand</literal></term>
|
|
<listitem><para>Decide if we use the <command>file -i</command>
|
|
system command as a final step for determining the mime
|
|
type for a file (the main procedure uses suffix
|
|
associations as defined in the <filename>mimemap</filename>
|
|
file). This can be useful for files with suffix-less names,
|
|
but it will also cause the indexing of many bogus "text"
|
|
files.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry><term><literal>indexallfilenames</literal></term>
|
|
<listitem><para>&RCL; indexes file names in a special
|
|
section of the database to allow specific file name
searches using wildcards. This parameter decides if
|
|
file name indexing is performed only for files with mime
|
|
types that would qualify them for full text indexing, or
|
|
for all files inside the selected subtrees, independently of
|
|
mime type.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry><term><literal>idxabsmlen</literal></term>
|
|
<listitem><para>&RCL; stores an abstract for each indexed
|
|
file inside the database. This is so that they can be
|
|
displayed inside the result lists without decoding the
|
|
original file. This parameter defines the size of the
|
|
stored abstract (which can come from an actual section or
|
|
just be the beginning of the text). The default value is 250.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry><term><literal>iconsdir</literal></term>
|
|
<listitem><para>The name of the directory where
|
|
<command>recoll</command> result list icons are
|
|
stored. You can change this if you want different
|
|
images.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
</variablelist>
|
|
|
|
</sect2>
|
|
|
|
<sect2 id="rclinstall.config.mimemap">
|
|
<title>The mimemap file</title>
|
|
|
|
<para><filename>mimemap</filename> specifies the
|
|
file name extension to mime type mappings.</para>
|
|
|
|
<para>For file names without an extension, or with an unknown
|
|
one, the system's <command>file -i</command> command will be
|
|
executed to determine the mime type (this can be switched off
|
|
inside the main configuration file).</para>
|
|
|
|
<para>The mappings can be specified on a per-subtree basis,
|
|
which may be useful in some cases. Example:
|
|
<application>gaim</application> logs have a
|
|
<filename>.txt</filename> extension but
|
|
should be handled specially, which is possible because they
|
|
are usually all located in one place.</para>
|
|
|
|
<para><filename>mimemap</filename> also has a
|
|
<literal>recoll_noindex</literal> variable which is a list of
|
|
suffixes. Matching files will be skipped (avoids unnecessary
|
|
decompressions or <command>file</command> executions). This is
|
|
partially redundant with <literal>skippedNames</literal> in
|
|
the main configuration file, with two differences: it will not
|
|
affect directories, and it can be changed for any
|
|
sub-directory.</para>
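<para>A short extract in the spirit of the default file may help to
picture this. The exact form of the suffix entries, the section path
and the <literal>text/x-gaim-log</literal> type name are assumptions
used for illustration; check the installed
<filename>mimemap</filename> for the actual values:</para>
<programlisting>
# Suffix to mime type associations (assumed syntax).
.txt = text/plain
.html = text/html
.pdf = application/pdf

# Files with these suffixes are skipped.
recoll_noindex = .tar.gz .tgz .o

# Hypothetical per-subtree redefinition for gaim logs.
[~/.gaim]
.txt = text/x-gaim-log
</programlisting>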
|
|
|
|
</sect2>
|
|
|
|
<sect2 id="rclinstall.config.mimeconf">
|
|
<title>The mimeconf file</title>
|
|
|
|
<para><filename>mimeconf</filename> specifies how the
|
|
different mime types are handled for indexing, and for
|
|
display.</para>
|
|
|
|
<para>Changing the indexing parameters is probably not a
|
|
good idea unless you are a &RCL; developer.</para>
|
|
|
|
<para>You may want to adjust the external viewers defined in
this file (for example, HTML is either previewed internally or
displayed using <application>firefox</application>, but you may prefer
<application>mozilla</application>; your
<application>openoffice.org</application>
program might be named <command>ooffice</command> instead of
<command>openoffice</command> ...). Look
for the <literal>[view]</literal> section.</para>
|
|
|
|
<para>You can also change the icons which are displayed by
|
|
<command>recoll</command> in the result lists (the values are
|
|
the basenames of the png images inside the
|
|
<filename>iconsdir</filename> directory (specified in
|
|
<filename>recoll.conf</filename>)).</para>
|
|
|
|
</sect2>
|
|
|
|
</sect1>
|
|
</chapter>
|
|
|
|
</book>
|
|
|