<!DOCTYPE BOOK PUBLIC "-//FreeBSD//DTD DocBook V4.1-Based Extension//EN" [
|
|
<!ENTITY RCL "<application>Recoll</application>">
|
|
<!ENTITY XAP "<application>Xapian</application>">
|
|
|
|
]>
|
|
|
|
<book lang="en">
|
|
|
|
<bookinfo>
|
|
<title>Recoll user manual</title>
|
|
|
|
|
|
<author>
|
|
<firstname>Jean-Francois</firstname>
|
|
<surname>Dockes</surname>
|
|
<affiliation>
|
|
<address><email>jean-francois.dockes@wanadoo.fr</email></address>
|
|
</affiliation>
|
|
</author>
|
|
|
|
<copyright>
|
|
<year>2005</year>
|
|
<holder role="mailto:jean-francois.dockes@wanadoo.fr">Jean-Francois
|
|
Dockes</holder>
|
|
</copyright>
|
|
|
|
<releaseinfo>$Id: usermanual.sgml,v 1.40 2007-02-14 10:10:42 dockes Exp $</releaseinfo>
|
|
|
|
<abstract>
|
|
<para>This document introduces full text search notions
|
|
and describes the installation and use of the &RCL; application.</para>
|
|
</abstract>
|
|
|
|
|
|
</bookinfo>
|
|
|
|
<chapter id="rcl.introduction">
|
|
<title>Introduction</title>
|
|
|
|
<sect1 id="rcl.introduction.tryit">
|
|
<title>Giving it a try</title>
|
|
|
|
<para>If you do not like reading manuals (who does?) and would
|
|
like to give &RCL; a try, just perform <link
|
|
linkend="rcl.install.binary">installation</link> and start the
|
|
<command>recoll</command> user interface, which will index your
|
|
home directory by default, allowing you to search immediately after
|
|
indexing completes.</para>
|
|
|
|
<para>Do not do this if your home directory contains a huge
|
|
number of documents and you do not want to wait or are very
|
|
short on disk space. In this case, you may want to edit the <link
|
|
linkend="rcl.indexing.config">configuration file</link> first to
|
|
restrict the indexed area.</para>
|
|
|
|
<para>Also be aware that you may need to install the
|
|
appropriate <link linkend="rcl.install.external">
|
|
supporting applications</link> for document types that need
|
|
them (for example <application>antiword</application> for
|
|
ms-word files).</para>

</sect1>
|
|
|
|
<sect1 id="rcl.introduction.search">
|
|
<title>Full text search</title>
|
|
|
|
<para>&RCL; is a full text search application. Full text search
|
|
applications let you find your data by content rather
|
|
than by external attributes (like a file name). More
|
|
specifically, they will let you specify words (terms) that
|
|
should or should not appear in the text you are looking for,
|
|
and return a list of matching documents, ordered so that the
|
|
most <emphasis>relevant</emphasis> documents will appear
|
|
first.</para>
|
|
|
|
<para>You do not need to remember in what file or email message you
|
|
stored a given piece of information. You just ask for related
|
|
terms, and the tool will return a list of documents where
|
|
those terms are prominent, in a similar way to Internet search
|
|
engines.</para>
|
|
|
|
<para>&RCL; tries to determine which documents are most relevant to
|
|
the search terms you provide. Computer algorithms for determining
|
|
relevance can be very complex, and in general are inferior to the
|
|
power of the human mind to rapidly determine relevance. The quality
|
|
of relevance guessing by the search tool is probably the most
|
|
important element for a search application.</para>
|
|
|
|
<para>In many cases, you are looking for all the forms of a
|
|
word, not for a specific form or spelling. These different
|
|
forms may include plurals, different tenses for a verb, or
|
|
terms derived from the same root or <emphasis>stem</emphasis>
|
|
(example: floor, floors, floored, flooring...). &RCL; will by
|
|
default expand queries to all such related terms (words that
|
|
reduce to the same stem). This expansion can be disabled at
|
|
search time.</para>
|
|
|
|
<para>Stemming, by itself, does not accommodate misspellings or
|
|
phonetic searches. &RCL; supports these features through a specific
|
|
tool (the <literal>term explorer</literal>) which will let you
|
|
explore the set of index terms along different modes.</para>
|
|
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="rcl.introduction.recoll">
|
|
<title>Recoll overview</title>
|
|
|
|
<para>&RCL; uses the
|
|
<ulink url="http://www.xapian.org">&XAP;</ulink> information retrieval
|
|
library as its storage and retrieval engine. &XAP; is a very
|
|
mature package using <ulink
|
|
url="http://www.xapian.org/docs/intro_ir.html">a sophisticated
|
|
probabilistic ranking model</ulink>. &RCL; provides the interface
|
|
to get data into (indexing) and out (searching) of the system.</para>
|
|
|
|
<para>In practice, &XAP; works by remembering where terms appear
|
|
in your document files. The acquisition process is called
|
|
indexing. </para>
|
|
|
|
<para>The resulting index can be big (roughly the size of the
|
|
original document set), but it is not a document
|
|
archive. &RCL; can only display documents that still exist at
|
|
the place from which they were indexed. (Actually, there is a
|
|
way to reconstruct a document from the information in the
|
|
index, but the result is not nice, as all formatting,
|
|
punctuation and capitalization are lost).</para>
|
|
|
|
<para>&RCL; stores all internal data in <application>Unicode
|
|
UTF-8</application> format, and it can index files with
|
|
different character sets, encodings, and languages into the same
|
|
index. It has input filters for many document types.</para>
|
|
|
|
<para>Stemming depends on the document language. &RCL; stores
|
|
the unstemmed versions of terms and uses auxiliary databases for
|
|
term expansion. It can switch stemming languages, or add a
|
|
language, without re-indexing. Storing documents in different
|
|
languages in the same index is possible, and useful in
|
|
practice, but does introduce possibilities of confusion. &RCL;
|
|
currently makes no attempt at automatic language recognition.</para>
|
|
|
|
<para>&RCL; has many parameters which define exactly what to
|
|
index, and how to classify and decode the source
|
|
documents. These are kept in a <link
|
|
linkend="rcl.indexing.config">configuration file</link>. A
|
|
default configuration is copied into a standard location
|
|
(usually something like
|
|
<filename>/usr/[local/]share/recoll/examples</filename>)
|
|
during installation. The default parameters from this file may
|
|
be overridden by values that you set inside your personal
|
|
configuration, found by default in the
|
|
<filename>.recoll</filename> sub-directory of your home
|
|
directory. The default configuration will index your home
|
|
directory with default parameters and should be sufficient for
|
|
giving &RCL; a try, but you may want to adjust it
|
|
later.</para>
|
|
|
|
<para><link linkend="rcl.indexing.periodic.exec">Indexing</link>
|
|
is started automatically the first time you execute the
|
|
<command>recoll</command> search graphical user interface, or by
|
|
executing the <command>recollindex</command> command.</para>
|
|
|
|
<para><link linkend="rcl.search">Searches</link> are
|
|
performed inside the <command>recoll</command>
|
|
program, which has many options to help you find what you are
|
|
looking for.</para>
|
|
|
|
</sect1>
|
|
</chapter>
|
|
|
|
|
|
<chapter id="rcl.indexing">
|
|
<title>Indexing</title>
|
|
|
|
<sect1 id="rcl.indexing.introduction">
|
|
<title>Introduction</title>
|
|
|
|
<para>Indexing is the process by which the set of documents is
|
|
analyzed and the data entered into the database. &RCL; indexing
|
|
is normally incremental: documents will only be processed if
|
|
they have been modified. On the first execution, of course, all
|
|
documents will need processing. A full index build can be forced
|
|
later by specifying an option to the indexing command
|
|
(<command>recollindex -z</command>).</para>
|
|
|
|
<para>&RCL; indexing can be performed with two different
|
|
methods:</para>
|
|
|
|
<itemizedlist>
|
|
|
|
<listitem>
|
|
<formalpara><title>Periodic indexing:</title>
|
|
<para>indexing takes place at discrete
|
|
times, by executing the <command>recollindex</command>
|
|
command. The typical usage is to have a nightly indexing run
|
|
<link linkend="rcl.indexing.periodic.automat">programmed</link> into your
|
|
<command>cron</command> file.</para>
|
|
</formalpara>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<formalpara><title>Real time indexing:</title>
|
|
<para>indexing takes place as soon as a file is created or
|
|
changed. <command>recollindex</command> runs as a daemon
|
|
and uses a file system alteration monitor such as
|
|
<application>Fam</application>,
|
|
<application>Gamin</application> or
|
|
<application>inotify</application> to detect file changes.
|
|
Monitoring a big directory tree can consume significant
|
|
system resources.</para>
|
|
</formalpara>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<para>The choice between the two methods is mostly a matter of
|
|
preference, and they can be combined by setting up multiple
|
|
indexes (for example, periodic indexing on a big documentation
|
|
directory, and real time indexing on a small home
|
|
directory). Monitoring a big file system tree can consume
|
|
significant system resources, for dubious gains.</para>
|
|
|
|
<para>&RCL; knows about quite a few different document
|
|
types. The parameters for document types recognition and
|
|
processing are set in
|
|
<link linkend="rcl.indexing.config">configuration files</link>.
|
|
Most file types, like HTML or word processing files, only hold
|
|
one document. Some file types, like mail folder files, can hold
|
|
many individually indexed documents.
|
|
</para>
|
|
|
|
<para>&RCL; indexing processes plain text, HTML, openoffice
|
|
and e-mail files internally. Other types (e.g. postscript, pdf,
|
|
ms-word, rtf) need external applications for preprocessing. The
|
|
list is in the <link linkend="rcl.install.external">
|
|
installation</link> section.</para>
|
|
|
|
<para>Without further configuration, &RCL; will index all
|
|
appropriate files from your home directory, with a reasonable
|
|
set of defaults.</para>
|
|
|
|
<para>In some cases, it may be interesting to index different
|
|
areas of the file system to separate databases. You can do this
|
|
by using multiple configuration directories, each indexing a
|
|
file system area to a specific database. See the <link
|
|
linkend="rcl.search.multidb">section about using multiple
|
|
databases</link> for more information on multiple configurations
|
|
and indexes. </para>
|
|
</sect1>
|
|
|
|
<sect1 id="rcl.indexing.storage">
|
|
<title>Index storage</title>
|
|
|
|
<para>The default location for the index data is the
|
|
<filename>xapiandb</filename> subdirectory of the &RCL;
|
|
configuration directory, typically
|
|
<filename>$HOME/.recoll/xapiandb/</filename>. This can be
|
|
changed via two different methods (with different purposes):</para>
|
|
|
|
<itemizedlist>
|
|
|
|
<listitem><para>You can specify a different configuration
|
|
directory by setting the <literal>RECOLL_CONFDIR</literal>
|
|
environment variable, or using the <literal>-c</literal>
|
|
option to the &RCL; commands. This method would typically be
|
|
used to index different areas of the file system to
|
|
different indexes. For example, if you were to issue the
|
|
following commands:
|
|
<programlisting>
export RECOLL_CONFDIR=~/.indexes-email
recoll
</programlisting> then &RCL; would use configuration files
|
|
stored in <filename>~/.indexes-email/</filename> and,
|
|
(unless specified otherwise in
|
|
<filename>recoll.conf</filename>) would look for
|
|
the index in <filename>~/.indexes-email/xapiandb/</filename>.</para>
|
|
|
|
<para>Using multiple configuration directories and
|
|
<link linkend="rcl.install.config.recollconf">configuration
|
|
options</link> allows you to tailor multiple configurations
|
|
and indexes to handle whatever subset of the available data
|
|
that you wish to make searchable.</para>
|
|
|
|
</listitem>
|
|
<listitem><para>You can also specify a different storage
|
|
location for the index by setting the <literal>dbdir</literal>
|
|
parameter in the configuration file
|
|
(see the <link linkend="rcl.install.config.recollconf">configuration
|
|
section</link>). This method would mainly be of use if you
|
|
wanted to keep the configuration directory in its default location,
|
|
but desired another location for the index, typically because of
disk space concerns.</para>
|
|
</listitem>
|
|
|
|
</itemizedlist>
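<para>The two methods could look as follows in practice. This is only a
sketch: <filename>~/.indexes-email</filename> is the example directory
name used in this section, and
<filename>/data/recoll/xapiandb</filename> is a hypothetical path.</para>

<programlisting>
# Method 1: build and search a separate index through its own
# configuration directory, using the -c option.
recollindex -c ~/.indexes-email
recoll -c ~/.indexes-email
</programlisting>

<programlisting>
# Method 2: in recoll.conf, keep the configuration directory in its
# default location but store the index elsewhere (hypothetical path).
dbdir = /data/recoll/xapiandb
</programlisting>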
|
|
|
|
<para>The size of the index is determined by the size of the set
|
|
of documents, but the ratio can vary a lot. For a typical mixed
|
|
set of documents, the index size will often be close to
|
|
the data set size. In specific cases (a set of compressed
|
|
mbox files for example), the index can become much bigger than
|
|
the documents. It may also be much smaller if the documents
|
|
contain a lot of images or other non-indexed data (an extreme
|
|
example being a set of mp3 files where only the tags would be
|
|
indexed).</para>
|
|
|
|
<para>Of course, images, sound and video do not increase the
|
|
index size, which means that it will be quite typical nowadays
|
|
(2006), that even a big index will be negligible against the
|
|
total amount of data on the computer.</para>
|
|
|
|
<para>The index data directory (<filename>xapiandb</filename>)
|
|
only contains data that can be completely rebuilt by an index
|
|
run, and it can always be destroyed safely.</para>
|
|
|
|
<sect2 id="rcl.indexing.storage.security">
|
|
<title>Security aspects</title>
|
|
|
|
<para>The &RCL; index does not hold copies of the indexed
|
|
documents. But it does hold enough data to allow for an almost
|
|
complete reconstruction. If confidential data is indexed,
|
|
access to the database directory should be restricted. </para>
|
|
|
|
<para>As of version 1.4, &RCL; will create the configuration
|
|
directory with a mode of 0700 (access by owner only). As the
|
|
index data directory is by default a sub-directory of the
|
|
configuration directory, this should result in appropriate
|
|
protection.</para>
|
|
|
|
<para>If you use another setup, you should think of the kind
|
|
of protection you need for your index, and set the directory
|
|
and files access modes appropriately.</para>
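<para>For instance, owner-only access could be set with standard
commands such as the following. The second path is a hypothetical
placeholder for an index stored outside the configuration
directory:</para>

<programlisting>
# Restrict the default configuration directory to its owner.
chmod 700 "$HOME/.recoll"
# Remove group and other access from an index stored elsewhere
# (hypothetical location).
chmod -R go-rwx /data/recoll/xapiandb
</programlisting>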
|
|
|
|
</sect2>
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="rcl.indexing.config">
|
|
<title>The indexing configuration</title>
|
|
|
|
<para>You can control which areas of the file system are
|
|
indexed, and how files are processed, by setting variables inside
|
|
the <link linkend="rcl.install.config">&RCL; configuration
|
|
files</link>.</para>
|
|
|
|
<para>You can also use <link linkend="rcl.search.multidb">multiple
|
|
indexes</link> defined by separate configurations, typically to
|
|
separate personal and shared indexes, or to take advantage of
|
|
the organization of your data to improve search precision.</para>
|
|
|
|
<para>The first time you start <command>recoll</command>, you
|
|
will be asked whether or not you would like recoll to build the
|
|
index. If you want to adjust the configuration before indexing,
|
|
just click <guilabel>Cancel</guilabel> at this point. That way,
|
|
recoll will have created a ~/.recoll directory containing empty
|
|
configuration files.</para>
|
|
|
|
<para>The configuration is documented inside the <link
|
|
linkend="rcl.install.config">installation chapter</link> of
|
|
this document, or in the recoll.conf(5) man page. The most
|
|
immediately useful variable you may be interested in is probably <link
|
|
linkend="rcl.install.config.recollconf.topdirs">topdirs</link>,
|
|
which determines what subtrees get indexed.</para>
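<para>As a minimal sketch of restricting the indexed area, a personal
<filename>~/.recoll/recoll.conf</filename> could set
<literal>topdirs</literal> as shown here. The directory names are
hypothetical placeholders, and the exact value syntax should be checked
against the recoll.conf(5) man page:</para>

<programlisting>
# Hypothetical example: index only two subtrees instead of the whole
# home directory. Replace the paths with your own.
topdirs = ~/Documents ~/Mail
</programlisting>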
|
|
|
|
<para>The applications needed to index file types other than
|
|
text, HTML or email (e.g. pdf, postscript, ms-word...) are
|
|
described in the <link linkend="rcl.install.external">external
|
|
packages section</link>.</para>
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="rcl.indexing.periodic">
|
|
<title>Periodic indexing</title>
|
|
|
|
<sect2 id="rcl.indexing.periodic.exec">
|
|
<title>Starting indexing</title>
|
|
|
|
<para>Indexing is performed either by the
|
|
<command>recollindex</command> program, or by the
|
|
indexing thread inside the <command>recoll</command>
|
|
program (use the <guimenu>File</guimenu> menu). Both programs
|
|
will use the <literal>RECOLL_CONFDIR</literal>
|
|
variable or accept a <literal>-c</literal>
|
|
<replaceable>confdir</replaceable> option to specify the
|
|
configuration directory to be used.</para>
|
|
|
|
<para>If the <command>recoll</command> program finds no index
|
|
when it starts, it will automatically start indexing (except
|
|
if canceled).</para>
|
|
|
|
<para>It is best to avoid interrupting the indexing process, as
|
|
this may sometimes leave the index in a bad state. This is
|
|
not a serious problem, as you then just need to delete
|
|
the index files and restart the indexing. The index files are
|
|
normally stored in the <filename>$HOME/.recoll/xapiandb</filename>
|
|
directory, which you can just delete if needed. Alternatively,
|
|
you can start <command>recollindex</command> with option
|
|
<literal>-z</literal>, which will reset the database before
|
|
indexing.</para>
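<para>A recovery after an interrupted run might look like this sketch,
which assumes the default index location:</para>

<programlisting>
# Either delete the index data and run a normal indexing pass...
rm -rf "$HOME/.recoll/xapiandb"
recollindex

# ...or let recollindex reset the database itself before indexing.
recollindex -z
</programlisting>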
|
|
|
|
</sect2>
|
|
|
|
<sect2 id="rcl.indexing.periodic.automat">
|
|
<title>Using <command>cron</command> to automate
|
|
indexing</title>
|
|
|
|
<para>The most common way to set up indexing is to have a cron
|
|
task execute it every night. For example the following
|
|
<filename>crontab</filename> entry would do it every day at
|
|
3:30AM (supposing <command>recollindex</command> is in your PATH):</para>
|
|
|
|
<programlisting>30 3 * * * recollindex > /tmp/recolltrace 2>&1</programlisting>
|
|
|
|
<para>The usual command to edit your
|
|
<filename>crontab</filename> is
|
|
<userinput>crontab -e</userinput> (which will usually start the
|
|
<command>vi</command> editor to edit the file). You may have
|
|
more sophisticated tools available on your system.</para>
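<para>If you maintain several indexes, each one needs its own entry.
The following sketch assumes a second configuration directory named
<filename>.indexes-email</filename> in your home directory (a
hypothetical name) and schedules its run half an hour after the main
one:</para>

<programlisting>
30 3 * * * recollindex &gt; /tmp/recolltrace 2&gt;&amp;1
0 4 * * * recollindex -c $HOME/.indexes-email &gt; /tmp/recolltrace-email 2&gt;&amp;1
</programlisting>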
|
|
|
|
</sect2>
|
|
</sect1>
|
|
|
|
<sect1 id="rcl.indexing.monitor">
|
|
<title>Real time indexing</title>
|
|
|
|
<para>Real time monitoring/indexing is performed by starting the
|
|
<command>recollindex -m</command> command. With this option,
|
|
<command>recollindex</command> will detach from the terminal and
|
|
become a daemon, permanently monitoring file changes and updating
|
|
the index.</para>
|
|
|
|
<para>The real time indexing support can be customised during package
|
|
<link linkend="rcl.install.building.build">configuration</link>
|
|
with the <literal>--with[out]-fam</literal> or
|
|
<literal>--with[out]-inotify</literal> options. The default is
|
|
currently to include inotify monitoring on systems that support
|
|
it.</para>
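<para>For example, a build without <application>Fam</application>
support but with inotify monitoring might be configured as in this
sketch, which assumes that you run the <command>configure</command>
script from the top of the source tree:</para>

<programlisting>
./configure --without-fam --with-inotify
</programlisting>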
|
|
|
|
<para>The <filename>rclmon.sh</filename> script can be used to
|
|
easily start and stop the daemon. It can be found in the
|
|
<filename>examples</filename> directory (typically
|
|
<filename>/usr/[local/]share/recoll/examples</filename>).</para>
|
|
|
|
<para>Starting the daemon is normally performed as part
|
|
of the user session script. For example, my out of fashion
|
|
xdm-based session has a <filename>.xsession</filename> script
|
|
with the following lines at the end:</para>
|
|
|
|
<programlisting>recollconf=$HOME/.recoll-home
recolldata=/usr/local/share/recoll
RECOLL_CONFDIR=$recollconf $recolldata/examples/rclmon.sh start

fvwm

</programlisting>
|
|
|
|
<para>The indexing daemon gets started, then the window manager,
|
|
for which the session waits.</para> <para>By default the
|
|
indexing daemon will monitor the state of the X11 session, and
|
|
exit when it finishes, so it is not necessary to kill it
|
|
explicitly. (The X11 server monitoring can be disabled with option
|
|
<literal>-x</literal> to <command>recollindex</command>).
|
|
</para>
|
|
|
|
<para>Under KDE, you can place a small script to start
|
|
<command>recollindex -m</command> under
|
|
<filename>$HOME/.kde/Autostart</filename>. This will be executed
|
|
when the session begins.</para>
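<para>Such a script can be very short. In this sketch, the file name
<filename>recollindex-monitor.sh</filename> is an arbitrary choice, and
the script must be made executable:</para>

<programlisting>#!/bin/sh
# Hypothetical file: $HOME/.kde/Autostart/recollindex-monitor.sh
# Start the real time indexing daemon for the default configuration.
recollindex -m
</programlisting>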
|
|
|
|
<para>There is a similar mechanism under Gnome (find the session
|
|
control tool in the menus and use the "Startup programs" tab).</para>
|
|
|
|
<para>By default, the indexing daemon will write its messages to
|
|
a file inside the configuration directory (this is controlled by
|
|
the <literal>daemlogfilename</literal> and
|
|
<literal>daemloglevel</literal> configuration parameters). You
|
|
may want to change this. Also the log file will only be truncated
|
|
when the daemon starts. If the daemon runs permanently, the log
|
|
file may grow quite big, depending on the log level.</para>
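<para>A configuration fragment along the following lines would move the
daemon log and set its verbosity. The file name is a hypothetical
placeholder, and the meaning of the numeric level is an assumption that
should be checked against the recoll.conf(5) man page:</para>

<programlisting>
# In recoll.conf (hypothetical values)
daemlogfilename = /tmp/recollindex-daemon.log
daemloglevel = 2
</programlisting>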
|
|
|
|
<para>While it is convenient that data is indexed in real time,
|
|
repeated indexing can generate a significant load on the system
|
|
when files such as email folders change. You probably do not
|
|
want to enable it if your system is short on resources. Periodic
|
|
indexing is adequate in most cases.</para>
|
|
|
|
</sect1>
|
|
|
|
</chapter>
|
|
|
|
<chapter id="rcl.search">
|
|
<title>Searching</title>
|
|
|
|
<para>The <command>recoll</command> program provides the user
|
|
interface for searching. It is based on the
|
|
<application>Qt</application> library.</para>
|
|
|
|
<sect1 id="rcl.search.simple">
|
|
<title>Simple search</title>
|
|
|
|
<procedure>
|
|
<step><para>Start the <command>recoll</command> program.</para>
|
|
</step>
|
|
<step><para>Possibly choose a search mode: <guilabel>Any
|
|
term</guilabel> or <guilabel>All terms</guilabel> or
|
|
<guilabel>File name</guilabel>.</para>
|
|
</step>
|
|
<step><para>Enter search term(s) in the text field at the top of the
|
|
window.</para>
|
|
</step>
|
|
<step><para>Click the <guilabel>Search</guilabel> button or
|
|
hit the <keycap>Enter</keycap> key to start the search.</para>
|
|
</step>
|
|
</procedure>
|
|
|
|
<para>The initial default search mode is <guilabel>All
|
|
terms</guilabel>. This will look for documents containing all
|
|
of the search terms (the ones with more terms will get better
|
|
scores). <guilabel>Any term</guilabel> will search for
|
|
documents where at least one of the terms appears. <guilabel>File
|
|
name</guilabel> will specifically look for file names.</para>
|
|
|
|
<para>The fourth entry (<guilabel>Query Language</guilabel>) is
|
|
described in <link linkend="rcl.search.lang">its own
|
|
section</link>.</para>
|
|
|
|
<para>All search modes allow wildcards inside terms
|
|
(<literal>*</literal>, <literal>?</literal>,
|
|
<literal>[]</literal>). You may want to have a look at the
|
|
<link linkend="rcl.search.wildcards">section about wildcards</link>
|
|
for more information about this.</para>
|
|
|
|
<para>You can search for exact phrases (adjacent words in a
|
|
given order) by enclosing the input inside double quotes. Ex:
|
|
<literal>"virtual reality"</literal>.</para>
|
|
|
|
<para>Character case has no influence on search, except that you
|
|
can disable stem expansion for any term by capitalizing it. For example,
|
|
a search for <literal>floor</literal> will also normally look for
|
|
<literal>flooring</literal>, <literal>floored</literal>, etc., but
|
|
a search for <literal>Floor</literal> will only look for
|
|
<literal>floor</literal>, in any character case (stemming can
|
|
also be disabled globally in the preferences). </para>
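<para>The following inputs illustrate these rules. The terms are only
examples, and each line is a separate search:</para>

<programlisting>
floor               also matches floors, floored, flooring...
Floor               matches only floor, in any character case
"virtual reality"   matches the two words as an adjacent phrase
flo*                wildcard: matches index terms beginning with flo
</programlisting>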
|
|
|
|
<para>&RCL; remembers the last few searches that you
|
|
performed. You can use the simple search text entry widget (a
|
|
combobox) to recall them (click on the drop-down arrow at the right of the
|
|
text field). Please note, however, that only the search texts
|
|
are remembered, not the mode (all/any/file name).</para>
|
|
|
|
<para>Typing <keycap>Esc</keycap> <keycap>Space</keycap> while
|
|
entering a word in the simple search entry will open a window
|
|
with possible completions for the word. The completions are
|
|
extracted from the database.</para>
|
|
|
|
<para>Double-clicking on a word in the result list or a preview
|
|
window will insert it into the simple search entry field.</para>
|
|
|
|
<para>Note that, apart from wildcard characters (single
|
|
<literal>?</literal> characters are ok), you can cut and paste
|
|
any text into an <guilabel>All terms</guilabel> or
|
|
<guilabel>Any term</guilabel> search field, punctuation,
|
|
newlines and all. &RCL; will process it and produce a meaningful
|
|
search. This is what most differentiates this mode from the
|
|
<guilabel>Query Language</guilabel> mode, where you have to care
|
|
about the syntax.</para>
|
|
|
|
<para>You can use the <guilabel>Tools</guilabel> / <guilabel>Advanced
|
|
search</guilabel> dialog for more complex searches.</para>
|
|
</sect1>
|
|
|
|
<sect1 id="rcl.search.reslist">
|
|
<title>The result list</title>
|
|
|
|
<para>After starting a search, a list of results will instantly
|
|
be displayed in the main list window.</para>
|
|
|
|
<para>By default, the document list is presented in order of
|
|
relevance (how well the system estimates that the document
|
|
matches the query). You can specify a different ordering by
|
|
using the <link linkend="rcl.search.sort"><guilabel>Tools</guilabel>
|
|
/ <guilabel>Sort parameters</guilabel></link> dialog.</para>
|
|
|
|
<para>Clicking on the
|
|
<literal>Preview</literal> link for an entry will open an
|
|
internal preview window for the document. Further
|
|
<literal>Preview</literal> clicks for the same search will open
|
|
tabs in the existing preview window. You can use
|
|
<keycap>Shift</keycap>+Click to force the creation of another
|
|
preview window, which may be useful to view the documents side
|
|
by side. (You can also browse successive results in a single
|
|
preview window by typing
|
|
<keycap>Shift</keycap>+<keycap>ArrowUp/Down</keycap> in the
|
|
window).</para>
|
|
|
|
<para>Clicking the <literal>Edit</literal> link will attempt to
|
|
start an external viewer. The viewers can be configured through the
|
|
user preferences dialog, or by editing the
|
|
<filename>mimeview</filename> configuration file.</para>
|
|
|
|
<para>The <literal>Preview</literal> and <literal>Edit</literal>
|
|
links may not be present for all entries, meaning that
|
|
&RCL; has no configured way to preview a given file type (which
|
|
was indexed by name only), or no configured external viewer for
|
|
the file type. This can sometimes be adjusted simply by tweaking
|
|
the <link linkend="rcl.install.config.mimemap">
|
|
<filename>mimemap</filename></link> and
|
|
<link linkend="rcl.install.config.mimeview">
|
|
<filename>mimeview</filename></link> configuration files (the latter
|
|
can be modified with the user preferences dialog).</para>
|
|
|
|
<para>You can click on the <literal>Query details</literal> link
|
|
at the top of the results page to see the query actually
|
|
performed, after stem expansion and other processing.</para>
|
|
|
|
<para>Double-clicking on any word inside the result list or a
|
|
preview window will insert it into the simple search text.</para>
|
|
|
|
<para>The result list is divided into pages (the size of which
|
|
you can change in the preferences). Use the arrow buttons in the
|
|
toolbar or the links at the bottom of the page to browse the
|
|
results.</para>
|
|
|
|
|
|
<sect2 id="rcl.search.resultlist.menu">
|
|
<title>The result list right-click menu</title>
|
|
|
|
<para>Apart from the preview and edit links, you can display a
|
|
pop-up menu by right-clicking over a paragraph in the result
|
|
list. This menu has the following entries:</para>
|
|
|
|
<itemizedlist>
|
|
<listitem><para><guilabel>Preview</guilabel></para></listitem>
|
|
<listitem><para><guilabel>Edit</guilabel></para></listitem>
|
|
<listitem><para><guilabel>Copy File Name</guilabel></para></listitem>
|
|
<listitem><para><guilabel>Copy Url</guilabel></para></listitem>
|
|
<listitem><para><guilabel>Find similar</guilabel></para></listitem>
|
|
<listitem><para><guilabel>Parent document</guilabel></para></listitem>
|
|
</itemizedlist>
|
|
|
|
<para>The <guilabel>Preview</guilabel> and
|
|
<guilabel>Edit</guilabel> entries do the same thing as the
|
|
corresponding links.</para>
|
|
|
|
<para>The <guilabel>Copy File Name</guilabel> and
|
|
<guilabel>Copy Url</guilabel> entries copy the relevant data to the
|
|
clipboard, for later pasting.</para>
|
|
|
|
<para>The <guilabel>Find similar</guilabel> entry will select
|
|
a number of relevant terms from the current document and enter
|
|
them into the simple search field. You can then start a simple
|
|
search, with a good chance of finding documents related to the
|
|
current result.</para>
|
|
|
|
<para>The <guilabel>Parent document</guilabel> entry will
|
|
appear for documents which are not actually files but are
|
|
part of, or attached to, a higher level document. This entry
|
|
is mainly useful for email attachments and permits viewing
|
|
the message to which the document is attached. Note that the
|
|
entry will also appear for an email which is part of an mbox
|
|
folder file, but that you can't actually visualize the
|
|
folder (there will be an error dialog if you try). &RCL; is
|
|
unfortunately not yet smart enough to disable the entry in
|
|
this case.</para>
|
|
|
|
</sect2>
|
|
</sect1>
|
|
|
|
<sect1 id="rcl.search.preview">
|
|
<title>The preview window</title>
|
|
|
|
<para>The preview window opens when you first click a
|
|
<literal>Preview</literal> link inside the result list.</para>
|
|
|
|
<para>Subsequent preview requests for a given search open new
|
|
tabs in the existing window (except if you hold the
|
|
<keycap>Shift</keycap> key while clicking, which will open a new
|
|
window for side by side viewing).</para>
|
|
|
|
<para>Starting another search and requesting a preview will
|
|
create a new preview window. The old one stays open until you
|
|
close it.</para>
|
|
|
|
<para>You can close a preview tab by typing <keycap>^W</keycap>
|
|
(<keycap>Ctrl</keycap> + <keycap>W</keycap>) in the
|
|
window. Closing the last tab for a window will also close the
|
|
window.</para>
|
|
|
|
<para>Of course you can also close a preview window by using the
|
|
window manager button in the top of the frame.</para>
|
|
|
|
<para>You can display successive or previous documents from the
|
|
result list inside a preview tab by typing
|
|
<keycap>Shift</keycap>+<keycap>Down</keycap> or
|
|
<keycap>Shift</keycap>+<keycap>Up</keycap> (<keycap>Down</keycap>
|
|
and <keycap>Up</keycap> are the arrow keys).</para>
|
|
|
|
<para>The preview tabs have an internal incremental search
|
|
function. You initiate the search either by typing a
|
|
<keycap>/</keycap> (slash) inside the text area or by clicking
|
|
into the <guilabel>Search for:</guilabel> text field and
|
|
entering the search string. You can then use the
|
|
<guilabel>Next</guilabel> and <guilabel>Previous</guilabel>
|
|
buttons to find the next/previous occurrence. You can also type
|
|
<keycap>F3</keycap> inside the text area to get to the next
|
|
occurrence.</para>
|
|
|
|
<para>If you have a search string entered and you use <keycap>Shift</keycap>+<keycap>Up</keycap>/<keycap>Down</keycap>
|
|
to browse the results, the search is initiated for each successive
|
|
document. If the string is found, the cursor will be positioned
|
|
at the first occurrence of the search string.</para>
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="rcl.search.lang">
|
|
<title>The query language</title>
|
|
|
|
<para>The query language processor is activated on the
|
|
simple search entry when the search mode selector is set to
|
|
<guilabel>Query Language</guilabel>.</para>
|
|
|
|
<para>Here follows a sample request that we are going to
|
|
explain:</para>
|
|
<programlisting>
mime:message/rfc822 author:"john doe" Beatles OR Lennon Live OR Unplugged -potatoes
</programlisting>
|
|
|
|
<para>This would search for all email messages with
|
|
<replaceable>John Doe</replaceable>
|
|
appearing as a phrase in the <literal>From:</literal> header,
|
|
and containing either <replaceable>beatles</replaceable> or
|
|
<replaceable>lennon</replaceable> and either
|
|
<replaceable>live</replaceable> or
|
|
<replaceable>unplugged</replaceable> but not
|
|
<replaceable>potatoes</replaceable>.</para>
|
|
|
|
<para>The first element, <literal>mime:message/rfc822</literal>
|
|
is a special switch that restricts the results to email
|
|
messages. There could be several such switches, which would form
|
|
a list of allowed types.</para>
|
|
|
|
<para>The second element <literal>author:"john doe"</literal> is
|
|
a phrase search limited to a specific field. Phrase searches are
|
|
specified as usual by enclosing the words in double quotes. The
|
|
field specification appears before the colon. &RCL; currently
|
|
manages the following fields:</para>
|
|
<itemizedlist>
|
|
<listitem><para><literal>title</literal>,
|
|
<literal>subject</literal> or <literal>caption</literal> are
|
|
synonyms which specify data to be searched for in the
|
|
document title or subject.</para>
|
|
</listitem>
|
|
<listitem><para><literal>author</literal> or
|
|
<literal>from</literal> for searching the document originators.</para>
|
|
</listitem>
|
|
<listitem><para><literal>keyword</literal> for searching the
|
|
document-specified keywords (few documents actually have any).</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<para>The query language is currently the only way to use the
|
|
&RCL; field search capability.</para>
|
|
|
|
<para>All elements in the search entry are normally combined
|
|
with an implicit AND. It is possible to specify that elements be
|
|
OR'ed instead, as in <replaceable>Beatles</replaceable>
|
|
<literal>OR</literal> <replaceable>Lennon</replaceable>. The
|
|
<literal>OR</literal> must be entered literally (capitals), and
|
|
it has priority over the AND associations:
|
|
<replaceable>word1</replaceable>
|
|
<replaceable>word2</replaceable> <literal>OR</literal>
|
|
<replaceable>word3</replaceable>
|
|
means
|
|
<replaceable>word1</replaceable> AND
|
|
(<replaceable>word2</replaceable> <literal>OR</literal>
|
|
<replaceable>word3</replaceable>)
|
|
not
|
|
(<replaceable>word1</replaceable> AND
|
|
<replaceable>word2</replaceable>) <literal>OR</literal>
|
|
<replaceable>word3</replaceable>. Do not enter explicit
|
|
parentheses, as they are not supported for now.</para>
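<para>The grouping rule can be illustrated with a small shell
sketch. It is only a model of the precedence described above, not
the actual &RCL; parser: <literal>explain_query</literal> is a
hypothetical helper, and it ignores field prefixes, phrases and
negation.</para>
<programlisting>
# Hypothetical helper: print a query with its implicit grouping made
# explicit. OR binds tighter than the implicit AND.
explain_query() {
    printf '%s\n' "$1" | awk '{
        n = 0; after_or = 0; out = ""
        for (i = 1; NF >= i; i++) {
            if ($i == "OR") { after_or = (n > 0); continue }
            if (after_or) { grp[n] = grp[n] " OR " $i; cnt[n]++ }
            else          { n++; grp[n] = $i; cnt[n] = 1 }
            after_or = 0
        }
        for (j = 1; n >= j; j++) {
            g = (cnt[j] > 1) ? "(" grp[j] ")" : grp[j]
            out = (j == 1) ? g : out " AND " g
        }
        print out
    }'
}

explain_query 'word1 word2 OR word3'
explain_query 'Beatles OR Lennon Live OR Unplugged'
</programlisting>
<para>The first call prints <literal>word1 AND (word2 OR
word3)</literal>, which is the interpretation given above.</para>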
|
|
|
|
<para>An entry preceded by a <literal>-</literal> specifies a
|
|
term that should <emphasis>not</emphasis> appear.</para>
|
|
|
|
<para>Words inside phrases and capitalized words are not
|
|
stem-expanded. Wildcards may be used anywhere.</para>
|
|
|
|
<para>You can use the <literal>show query</literal> link at the
|
|
top of the result list to check the exact query which was
|
|
finally executed by &XAP;.</para>
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="rcl.search.complex">
|
|
<title>Complex/advanced search</title>
|
|
|
|
<para>The advanced search dialog has a number of fields that
|
|
will allow a more refined search. Each entry field is
|
|
configurable for the following modes:</para>
|
|
|
|
<itemizedlist>
|
|
<listitem><para>All terms.</para>
|
|
</listitem>
|
|
<listitem><para>Any term.</para>
|
|
</listitem>
|
|
<listitem><para>None of the terms.</para>
|
|
</listitem>
|
|
<listitem><para>Phrase (exact terms in order within an
|
|
adjustable window).</para>
|
|
</listitem>
|
|
<listitem><para>Proximity (terms in any order within an
|
|
adjustable window).</para>
|
|
</listitem>
|
|
<listitem><para>Filename search with wildcards.</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<para>Additional entry fields can be created by clicking the
|
|
<guilabel>Add clause</guilabel> button.</para>
|
|
|
|
<para>You can choose that all relevant fields will be combined
|
|
by either an AND or an OR conjunction. All types of clauses
|
|
except "phrase" and "near" can accept a mix of single words and
|
|
phrases enclosed in double quotes. Stemming expansion will be
|
|
performed for all terms not beginning with a capital letter,
|
|
except for terms inside "phrase" clauses. Wildcards will be
|
|
processed everywhere.</para>
|
|
|
|
<para>Advanced search will also let you search for documents of
|
|
specific mime types (ie: only <literal>text/plain</literal>, or
|
|
<literal>text/html</literal> or
|
|
<literal>application/pdf</literal> etc...). The state of the
|
|
file type selection can be saved as the default (the file type
|
|
filter will not be activated at program start-up, but the lists
|
|
will be in the restored state).</para>
|
|
|
|
<para>You can also restrict the search results
|
|
to a sub-tree of the indexed area. If you need to do this often,
|
|
you may think of setting up multiple indexes instead, as the
|
|
performance will be much better.</para>
|
|
|
|
<para>Click on the <guilabel>Start Search</guilabel> button in
|
|
the advanced search dialog, or type <keycap>Enter</keycap> in
|
|
any text field to start the search. The button in
|
|
the main window always performs a simple search.</para>
|
|
|
|
<para>Click on the <literal>Show query details</literal> link at
|
|
the top of the result page to see the query expansion.</para>
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="rcl.search.termexplorer">
|
|
<title>The term explorer tool</title>
|
|
|
|
<para>&RCL; automatically manages the expansion of search terms
|
|
to their derivatives (ie: plural/singular, verb
|
|
inflections). But there are other cases where the exact search
|
|
term is not known. For example, you may not remember the exact
|
|
spelling, or only know the beginning of the name.</para>
|
|
|
|
<para>The term explorer tool (started from the toolbar icon or
|
|
from the <guilabel>Term explorer</guilabel> entry of the
|
|
<guilabel>Tools</guilabel> menu) can be used to search the full index
|
|
terms list. It has three modes of operations:</para>
|
|
<variablelist>
|
|
|
|
<varlistentry>
|
|
<term>Wildcard</term>
|
|
<listitem><para>In this mode of operation, you can enter a
|
|
search string with shell-like wildcards (*, ?, []). ie:
|
|
<replaceable>xapi*</replaceable> would display all index terms
|
|
beginning with <replaceable>xapi</replaceable>. (More
|
|
about wildcards <link
|
|
linkend="rcl.search.wildcards">here</link>).</para></listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>Regular expression</term>
|
|
<listitem><para>This mode will accept a regular expression
|
|
as input. Example:
|
|
<replaceable>word[0-9]+</replaceable>. The expression is
|
|
implicitly anchored at the beginning. Ie:
|
|
<replaceable>press</replaceable> will match
|
|
<replaceable>pression</replaceable> but not
|
|
<replaceable>expression</replaceable>. You can use
|
|
<replaceable>.*press</replaceable> to match the latter,
|
|
but be aware that this will cause a full index term list
|
|
scan, which can be quite long.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
|
|
<term>Stem expansion</term>
|
|
<listitem><para>This mode will perform the usual stem expansion
|
|
normally done as part of user input processing. As such it is
|
|
probably mostly useful to demonstrate the process.
|
|
</para></listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term>Spelling/Phonetic</term> <listitem><para>In this
|
|
mode, you enter the term as you think it is spelled, and
|
|
&RCL; will do its best to find index terms that sound like
|
|
your entry. This mode uses the
|
|
<application>Aspell</application> spelling application,
|
|
which must be installed on your system for things to
|
|
work. The language which is used to build the dictionary
|
|
out of the index terms (which is done at the end of an
|
|
indexing pass) is the one defined by your NLS
|
|
environment. Weird things will probably happen if
|
|
languages are mixed up.</para></listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
|
|
<para>Note that in cases where &RCL; does not know the beginning
|
|
of the string to search for (ie a wildcard expression like
|
|
<replaceable>*coll</replaceable>), the expansion can take quite
|
|
a long time because the full index term list will have to be
|
|
processed. The expansion is currently limited to 200 results for
|
|
wildcards and regular expressions.</para>
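<para>The implicit anchoring of regular expressions can be
reproduced with <command>grep</command> on a plain list of
terms. This is only a sketch: the term list is made up, and
<literal>regexp_terms</literal> is a hypothetical helper, not the
way &RCL; walks its index.</para>
<programlisting>
# Made-up index terms, one per line.
terms='expression
press
pression
xapian'

# Anchor the expression at the beginning of each term, as the term
# explorer is described to do.
regexp_terms() {
    printf '%s\n' "$terms" | grep -E "^($1)"
}

regexp_terms 'press'     # press, pression
regexp_terms '.*press'   # expression, press, pression
</programlisting>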
|
|
|
|
<para>Double-clicking on a term in the result list will insert
|
|
it into the simple search entry field. You can also cut/paste
|
|
between the result list and any entry field (the end of lines
|
|
will be taken care of).</para>
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="rcl.search.wildcards">
|
|
<title>More about wildcards</title>
|
|
<para>All words entered in &RCL; search fields will be processed
|
|
for wildcard expansion before the request is finally
|
|
executed.</para>
|
|
|
|
<para>The wildcard characters are:</para>
|
|
|
|
<itemizedlist>
|
|
<listitem><para><literal>*</literal> which matches 0 or more
|
|
characters.</para>
|
|
</listitem>
|
|
<listitem><para><literal>?</literal> which matches
|
|
a single character.</para>
|
|
</listitem>
|
|
<listitem><para><literal>[]</literal> which allows
|
|
defining sets of characters to be matched (ex:
|
|
<literal>[</literal><userinput>abc</userinput><literal>]</literal>
|
|
matches a single character which may be 'a' or 'b' or 'c',
|
|
<literal>[</literal><userinput>0-9</userinput><literal>]</literal>
|
|
matches any digit).</para>
|
|
</listitem>
|
|
</itemizedlist>
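<para>The shell uses the same three wildcard characters in
<literal>case</literal> patterns, so their behaviour can be tried
out without an index. The sketch below assumes a made-up term list;
<literal>wildcard_terms</literal> is a hypothetical helper and says
nothing about how &RCL; performs the expansion internally.</para>
<programlisting>
# Made-up index terms, one per line.
terms='cat
cot
cut
coat
doc1
docx'

# Print the terms which match the shell-like pattern given as $1.
wildcard_terms() {
    printf '%s\n' "$terms" | while read -r t; do
        case $t in
            $1) printf '%s\n' "$t" ;;
        esac
    done
}

wildcard_terms 'c?t'        # cat, cot, cut
wildcard_terms 'c*t'        # cat, cot, cut, coat
wildcard_terms 'c[ao]t'     # cat, cot
wildcard_terms 'doc[0-9]'   # doc1
</programlisting>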
|
|
|
|
<para>You should be aware of a few things before using
|
|
wildcards.</para>
|
|
|
|
<itemizedlist>
|
|
<listitem><para>Using a wildcard character at the beginning of
|
|
a word can make for a slow search because &RCL; will have to
|
|
scan the whole index term list to find the matches.</para>
|
|
</listitem>
|
|
<listitem><para>Using a <literal>*</literal> at the end of a
|
|
word can produce more matches than you would think, and
|
|
strange search results. You can use the <link
|
|
linkend="rcl.search.termexplorer">term explorer</link> tool to
|
|
check what completions exist for a given term. You can also
|
|
see exactly what search was performed by clicking on the link
|
|
at the top of the result list. In general, for natural
|
|
language terms, stem expansion will produce better results
|
|
than an ending <literal>*</literal> (stem expansion is turned
|
|
off when any wildcard character appears in the term).</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="rcl.search.multidb">
|
|
<title>Multiple databases</title>
|
|
|
|
<para>Multiple &RCL; databases or indexes can be created by
|
|
using several configuration directories which are usually set to
|
|
index different areas of the file system. A specific index can
|
|
be selected for updating or searching, using the
|
|
<literal>RECOLL_CONFDIR</literal> environment variable or the
|
|
<literal>-c</literal> option to <command>recoll</command> and
|
|
<command>recollindex</command>.</para>
|
|
|
|
<para>A <command>recollindex</command> program instance can only
|
|
update one specific index.</para>
|
|
|
|
<para>A <command>recoll</command> program instance is also
|
|
associated with a specific index, which is the one to be
|
|
updated by its indexing thread, but it can use any
|
|
number of &RCL; indexes for searching. The external indexes
|
|
can be selected through the <guilabel>external
|
|
indexes</guilabel> tab in the preferences dialog.</para>
|
|
|
|
<para>Index selection is performed in two phases. A set of all
|
|
usable indexes must first be defined, and then the subset of
|
|
indexes to be used for searching. Of course, these parameters
|
|
are retained across program executions (they are kept
|
|
separately for each &RCL; configuration). The set of all indexes
|
|
is usually quite stable, while the active ones might typically
|
|
be adjusted quite frequently.</para>
|
|
|
|
<para>The main index (defined by
|
|
<literal>RECOLL_CONFDIR</literal>) is always active. If this is
|
|
undesirable, you can set up your base configuration to index
|
|
an empty directory.</para>
|
|
|
|
<para>As building the set of all indexes can be a little tedious
|
|
when done through the user interface, you can use the
|
|
<literal>RECOLL_EXTRA_DBS</literal> environment
|
|
variable to provide an initial set. This might typically be
|
|
set up by a system administrator so that every user does not
|
|
have to do it. The variable should define a colon-separated list
|
|
of index directories, ie:
|
|
</para>
|
|
<screen>export RECOLL_EXTRA_DBS=/some/place/xapiandb:/some/other/db</screen>
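<para>To check what such a value contains, the list can be split on
colons in the shell. This is a minimal sketch using the example
value above; <literal>list_extra_dbs</literal> is a hypothetical
helper which only illustrates the colon-separated format and does
not check that the directories exist.</para>
<programlisting>
RECOLL_EXTRA_DBS=/some/place/xapiandb:/some/other/db

# Print one index directory per line.
list_extra_dbs() {
    printf '%s\n' "$RECOLL_EXTRA_DBS" | tr ':' '\n'
}

list_extra_dbs
</programlisting>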
|
|
|
|
<para>A typical usage scenario for the multiple index feature
|
|
would be for a system administrator to set up a central index
|
|
for shared data, that you choose to search or not in addition to
|
|
your personal data. Of course, there are other
|
|
possibilities. There are many cases where you know the subset of
|
|
files that should be searched, and where narrowing the search
|
|
can improve the results. You can achieve approximately the same
|
|
effect with the directory filter in advanced search, but
|
|
multiple indexes will have much better performance and may be
|
|
worth the trouble.</para>
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="rcl.search.history">
|
|
<title>Document history</title>
|
|
|
|
<para>Documents that you actually view (with the internal preview
|
|
or an external tool) are entered into the document history,
|
|
which is remembered. You can display the history list by using
|
|
the <guilabel>Tools/</guilabel><guilabel>Doc History</guilabel> menu
|
|
entry.</para>
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="rcl.search.sort">
|
|
<title>Sorting search results</title>
|
|
|
|
<para>The documents in a result list are normally sorted in
|
|
order of relevance. It is possible to specify different sort
|
|
parameters by using the <guimenu>Sort parameters</guimenu>
|
|
dialog (located in the <guimenu>Tools</guimenu>
|
|
menu).</para>
|
|
|
|
<para>The tool sorts a specified number of the most
|
|
relevant documents in the result list, according to
|
|
specified criteria. The currently available criteria are
|
|
<emphasis>date</emphasis> and <emphasis>mime type</emphasis>.</para>
|
|
|
|
<para>The sort parameters stay in effect until they are explicitly
|
|
reset, or the program exits. An activated sort is indicated in
|
|
the result list header.</para>
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="rcl.search.tips">
|
|
<title>Search tips, shortcuts</title>
|
|
|
|
<formalpara><title>Term completion</title>
|
|
<para>Typing <keycap>Esc</keycap> <keycap>Space</keycap> in
|
|
the simple search entry field while entering a word will
|
|
either complete the current word if its beginning matches a
|
|
unique term in the index, or open a window to propose a list
|
|
of completions.</para>
|
|
</formalpara>
|
|
|
|
<formalpara><title>Picking up new terms from result or preview
|
|
text</title>
|
|
<para>Double-clicking on a word in the result list or in a
|
|
preview window will copy it to the simple search entry field.</para>
|
|
</formalpara>
|
|
|
|
<formalpara><title>Disabling stem expansion</title>
|
|
<para>Entering a capitalized word in any search field will prevent
|
|
stem expansion (no search for
|
|
<literal>gardening</literal> if you enter
|
|
<literal>Garden</literal> instead of
|
|
<literal>garden</literal>). This is the only case where
|
|
character case should make a difference for a &RCL;
|
|
search. You can also disable stem expansion or change the
|
|
stemming language in the preferences.</para>
|
|
</formalpara>
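<para>The rules which decide whether a term is stem-expanded can be
summarized by the following shell sketch. It is based only on the
behaviour described in this manual (capitalized terms and terms
containing wildcard characters are not expanded);
<literal>will_stem_expand</literal> is a hypothetical helper, and
phrases and non-ASCII capitals are not considered.</para>
<programlisting>
will_stem_expand() {
    case $1 in
        [[:upper:]]*) echo no ;;    # begins with a capital letter
        *[][*?]*)     echo no ;;    # contains *, ?, [ or ]
        *)            echo yes ;;
    esac
}

will_stem_expand garden     # yes
will_stem_expand Garden     # no
will_stem_expand 'gard*'    # no
</programlisting>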
|
|
|
|
<formalpara><title>Phrases</title>
|
|
<para>A phrase can be looked for by enclosing it in double
|
|
quotes. Example: <literal>"user manual"</literal> will look
|
|
only for occurrences of <literal>user</literal> immediately
|
|
followed by <literal>manual</literal>. You can use the
|
|
<guilabel>This exact phrase</guilabel> field of the advanced
|
|
search dialog to the same effect. Phrases can be entered along with
|
|
simple terms in all simple or advanced search entry fields
|
|
(except <guilabel>This exact phrase</guilabel>).</para>
|
|
</formalpara>
|
|
|
|
<formalpara><title>Browsing the result list inside a preview
|
|
window (1.5)</title>
|
|
<para>Entering <keycap>Shift-Down</keycap> or <keycap>Shift-Up</keycap>
|
|
(<keycap>Shift</keycap> + an arrow key) in a preview window will
|
|
display the next or the previous document from the result
|
|
list. Any secondary search currently active will be executed on
|
|
the new document.</para>
|
|
</formalpara>
|
|
|
|
<formalpara><title>Forced opening of a preview window (1.6)</title>
|
|
<para>You can use <keycap>Shift</keycap>+Click on a result list
|
|
<literal>Preview</literal> link to force the creation of a
|
|
preview window instead of a new tab in the existing one.</para>
|
|
</formalpara>
|
|
|
|
<formalpara><title>AutoPhrases (1.5)</title>
|
|
<para>This option can be set in the preferences dialog. If it is
|
|
set, a phrase will be automatically built and added to simple
|
|
searches when looking for <literal>Any terms</literal>. This
|
|
will not radically change the results, but will give a relevance
|
|
boost to the results where the search terms appear as a
|
|
phrase. Ie: searching for <literal>virtual reality</literal>
|
|
will still find all documents where either
|
|
<literal>virtual</literal> or <literal>reality</literal> or
|
|
both appear, but those which contain <literal>virtual
|
|
reality</literal> should appear sooner in the list.</para>
</formalpara>
|
|
|
|
<formalpara><title>Finding related documents</title>
|
|
<para>Selecting the <guilabel>Find similar documents</guilabel> entry
|
|
in the result list paragraph right-click menu will select a
|
|
set of "interesting" terms from the current result, and insert
|
|
them into the simple search entry field. You can then possibly
|
|
edit the list and start a search to find documents which may
|
|
be related to the current result.</para>
|
|
</formalpara>
|
|
|
|
<formalpara><title>File names</title>
|
|
<para>File names are added as terms during indexing, and you can
|
|
specify them as ordinary terms in normal search fields (&RCL; used
|
|
to index all directories in the file path as terms. This has been
|
|
abandoned as it did not seem really useful). Alternatively, you
|
|
can use the specific file name search which will
|
|
<emphasis>only</emphasis> look for file names and can use wildcard
|
|
expansion.</para>
|
|
</formalpara>
|
|
|
|
<formalpara><title>Query explanation</title>
|
|
<para>You can get an exact description of what the query
|
|
looked for, including stem expansion, and Boolean operators
|
|
used, by clicking on the result list header.</para>
|
|
</formalpara>
|
|
|
|
<formalpara><title>Closing previews</title>
|
|
<para>Entering <keycap>^W</keycap> in a tab will
|
|
close it (and, for the last tab, close the preview
|
|
window). Entering <keycap>Esc</keycap> will close the preview
|
|
window and all its tabs.</para>
|
|
</formalpara>
|
|
|
|
<formalpara><title>Quitting</title>
|
|
<para>Entering <keycap>^Q</keycap> almost anywhere will
|
|
close the application.</para>
|
|
</formalpara>
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="rcl.search.custom">
|
|
<title>Customizing the search interface</title>
|
|
|
|
<para>It is possible to customize some aspects of the search
|
|
interface by using the <guimenu>Query configuration</guimenu> entry
|
|
in the <guimenu>Preferences</guimenu> menu.</para>
|
|
|
|
<para>There are two tabs in the dialog, dealing with the
|
|
interface itself, and with the parameters used for searching and
|
|
returning results.</para>
|
|
|
|
<formalpara><title>User interface parameters:</title>
|
|
<para>
|
|
<itemizedlist>
|
|
|
|
<listitem><para><guilabel>Number of results in a result
|
|
page</guilabel></para>
|
|
</listitem>
|
|
|
|
<listitem><para><guilabel>Result list font</guilabel>: There
|
|
is quite a lot of information shown in the result list, and
|
|
you may want to customize the font and/or font size. The rest
|
|
of the fonts used by &RCL; are determined by your generic QT
|
|
config (try the <command>qtconfig</command> command).</para>
|
|
</listitem>
|
|
|
|
<listitem><para><guilabel>Result paragraph format
|
|
string</guilabel>: allows you to change the presentation of
|
|
each result list entry. This is a qt-html string where the
|
|
following printf-like <literal>%</literal> substitutions will
|
|
be performed:
|
|
<itemizedlist>
|
|
<listitem>
|
|
<formalpara><title>%A</title><para>Abstract</para></formalpara>
|
|
</listitem>
|
|
<listitem><formalpara><title>%D</title><para>Date</para></formalpara>
|
|
</listitem>
|
|
<listitem><formalpara><title>%K</title><para>Keywords (if
|
|
any)</para></formalpara>
|
|
</listitem>
|
|
<listitem><formalpara><title>%L</title><para>Preview and
|
|
Edit links</para></formalpara>
|
|
</listitem>
|
|
<listitem><formalpara><title>%M</title><para>Mime
|
|
type</para></formalpara>
|
|
</listitem>
|
|
<listitem><formalpara><title>%N</title><para>result Number
|
|
</para></formalpara>
|
|
</listitem>
|
|
<listitem><formalpara><title>%R</title><para>Relevance
|
|
percentage</para></formalpara>
|
|
</listitem>
|
|
<listitem><formalpara><title>%S</title><para>Size
|
|
information</para></formalpara>
|
|
</listitem>
|
|
<listitem><formalpara><title>%T</title><para>Title</para>
|
|
</formalpara>
|
|
</listitem>
|
|
<listitem><formalpara><title>%U</title><para>Url</para></formalpara>
|
|
</listitem>
|
|
</itemizedlist>
|
|
The default value for the string is:
|
|
<programlisting>%R %S %L &amp;nbsp;&amp;nbsp;&lt;b&gt;%T&lt;/b&gt;&lt;br&gt;
%M&amp;nbsp;%D&amp;nbsp;&amp;nbsp;&amp;nbsp;&lt;i&gt;%U&lt;/i&gt;&lt;br&gt;
|
|
%A %K
|
|
</programlisting>
|
|
You may, for example, try the following for a more web-like
|
|
experience:
|
|
<programlisting>&lt;u&gt;&lt;b&gt;&lt;a href="P%N"&gt;%T&lt;/a&gt;&lt;/b&gt;&lt;/u&gt;&lt;br&gt;
%A&lt;font color=#008000&gt;%U - %S&lt;/font&gt; - %L
|
|
</programlisting>
|
|
The format of the Preview and Edit links is
|
|
<literal>&lt;a href="P<replaceable>docnum</replaceable>"&gt;</literal>
and
<literal>&lt;a href="E<replaceable>docnum</replaceable>"&gt;</literal>
|
|
where <replaceable>docnum</replaceable> is what %N would
|
|
print. This makes the title a preview link in the above format.
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem><para><guilabel>HTML help browser</guilabel>: this
|
|
will let you choose your preferred browser which will be
|
|
started from the <guimenu>Help</guimenu> menu to read the user
|
|
manual. You can enter a simple name if the command is in your
|
|
PATH, or browse for a full pathname.</para>
|
|
</listitem>
|
|
|
|
<listitem><para><guilabel>Show document type icons in result
|
|
list</guilabel>: icons in the result list can be turned
|
|
off. They take quite a lot of space and convey relatively
|
|
little useful information.</para>
|
|
</listitem>
|
|
|
|
<listitem><para><guilabel>Auto-start simple search on
|
|
white space entry</guilabel>: if this is checked, a search will
|
|
be executed each time you enter a space in the simple search
|
|
input field. This lets you look at the result list as you
|
|
enter new terms. This is off by default, you may like it or
|
|
not...</para>
|
|
</listitem>
|
|
|
|
<listitem><para><guilabel>Start with advanced search dialog open
|
|
</guilabel> and <guilabel>Start with sort dialog open</guilabel>:
|
|
If you use these dialogs all the time, checking these
|
|
entries will get them to open when recoll starts.</para>
|
|
</listitem>
|
|
|
|
<listitem><para><guilabel>Use desktop preferences to choose
|
|
document editor</guilabel>: if this is checked, the
|
|
<command>xdg-open</command>
|
|
utility will be used to open files when you click the
|
|
<guilabel>Edit</guilabel> link in the result list, instead of
|
|
the application defined in
|
|
<filename>mimeview</filename>. <command>xdg-open</command>
|
|
will in turn use your desktop preferences to choose an
|
|
appropriate application.</para>
|
|
</listitem>
|
|
|
|
|
|
</itemizedlist>
|
|
</para>
|
|
</formalpara>
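<para>The <literal>%</literal> substitutions of the result
paragraph format string work like a simple template expansion. The
following shell sketch mimics this for three of the fields; the
document values and the <literal>format_result</literal> helper are
made up for the illustration, and a plain-text format is used
instead of qt-html to keep the example short.</para>
<programlisting>
# Made-up values for one result list entry. They must not contain
# the characters used by sed here (the | delimiter and the ampersand).
num=1
title='Recoll user manual'
url='file:///home/me/usermanual.html'

# Replace %N, %T and %U in the format string given as $1.
format_result() {
    printf '%s\n' "$1" | sed -e "s|%N|$num|g" -e "s|%T|$title|g" -e "s|%U|$url|g"
}

format_result '%N. %T - %U'
</programlisting>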
|
|
|
|
|
|
<formalpara><title>Search parameters:</title>
|
|
<para>
|
|
<itemizedlist>
|
|
|
|
<listitem><para><guilabel>Stemming language</guilabel>:
|
|
stemming obviously depends on the document's language. This
|
|
listbox will let you choose among the stemming databases which
|
|
were built during indexing (this is set in the <link
|
|
linkend="rcl.install.config.recollconf">main configuration
|
|
file</link>), or later added with
|
|
<command>recollindex -s</command> (See the recollindex
|
|
manual). Stemming languages which are dynamically added will be
|
|
deleted at the next indexing pass unless they are also added in
|
|
the configuration file.</para>
|
|
</listitem>
|
|
|
|
<listitem><para><guilabel>Dynamically build
|
|
abstracts</guilabel>: this decides if &RCL; tries to build
|
|
document abstracts when displaying the result list. Abstracts
|
|
are constructed by taking context from the document
|
|
information, around the search terms. This can slow down
|
|
result list display significantly for big documents, and you
|
|
may want to turn it off.</para>
|
|
</listitem>
|
|
|
|
<listitem><para><guilabel>Replace abstracts from
|
|
documents</guilabel>: this decides if we should synthesize and
|
|
display an abstract in place of an explicit abstract found
|
|
within the document itself.</para>
|
|
</listitem>
|
|
|
|
<listitem><para><guilabel>Synthetic abstract size</guilabel>:
|
|
adjust to taste...</para>
|
|
</listitem>
|
|
|
|
<listitem><para><guilabel>Synthetic abstract context
|
|
words</guilabel>: how many words should be displayed around
|
|
each term occurrence.</para>
|
|
</listitem>
|
|
|
|
</itemizedlist>
|
|
</para>
|
|
</formalpara>
|
|
|
|
<formalpara id="rcl.search.custom.extradb">
|
|
<title>External indexes:</title>
|
|
<para>This panel will let you browse for additional indexes
|
|
that you may want to search. External indexes are designated by
|
|
their database directory (ie:
|
|
<filename>/home/someotherguy/.recoll/xapiandb</filename>,
|
|
<filename>/usr/local/recollglobal/xapiandb</filename>).</para>
</formalpara>
|
|
|
|
<para>Once entered, the indexes will appear in the
|
|
<guilabel>External indexes</guilabel> list, and you can
|
|
choose which ones you want to use at any moment by checking or
|
|
unchecking their entries.</para>
|
|
|
|
<para>Your main database (the one the current configuration
|
|
indexes to), is always implicitly active. If this is not
|
|
desirable, you can set up your configuration so that it indexes,
|
|
for example, an empty directory.</para>
|
|
|
|
</sect1>
|
|
|
|
</chapter>
|
|
|
|
|
|
<chapter id="rcl.install">
|
|
<title>Installation</title>
|
|
|
|
<sect1 id="rcl.install.binary">
|
|
<title>Installing a prebuilt copy</title>
|
|
|
|
<para>Recoll binary installations are always linked statically
|
|
to the xapian libraries, and have no other dependencies. You
|
|
will only have to check or install
|
|
<link linkend="rcl.install.external">supporting
|
|
applications</link> for the file types that you want to index
|
|
beyond text, HTML and mail files.</para>
|
|
|
|
<sect2 id="rcl.install.binary.package">
|
|
<title>Installing through a package system</title>
|
|
|
|
<para>If you use a BSD-type port system or a
|
|
prebuilt package (RPM or other), just follow the usual
|
|
procedure, and maybe have a look at the <link
|
|
linkend="rcl.install.config">configuration
|
|
section</link> (but this may not be necessary for a quick
|
|
test with default parameters).</para>
|
|
|
|
</sect2>
|
|
|
|
<sect2 id="rcl.install.binary.rcl">
|
|
<title>Installing a prebuilt &RCL;</title>
|
|
|
|
<para>The unpackaged binary versions are just compressed tar
|
|
files of a build tree, where only the useful parts were kept
|
|
(executables and sample configuration).</para>
|
|
|
|
<para>The executable binary files are built with a static link to
|
|
libxapian and libiconv, to make installation easier (no
|
|
dependencies). However, this also means that you cannot change
|
|
the versions which are used.</para>
|
|
|
|
<para>After extracting the tar file, you can proceed with
|
|
<link linkend="rcl.install.building.install">installation</link> as
|
|
if you had built the package from source (that is, just type
|
|
<literal>make install</literal>). The binary trees are built for
|
|
installation to <filename>/usr/local</filename>.</para>
|
|
|
|
<para>You may then need to install external applications to process
|
|
some file types that you want indexed (ie: acrobat,
|
|
postscript ...). See next section.</para>
|
|
|
|
<para>Finally, you may want to have a look at the <link
|
|
linkend="rcl.indexing.config">configuration section</link>.</para>
|
|
|
|
</sect2>
|
|
</sect1>
|
|
|
|
<sect1 id="rcl.install.external">
|
|
<title>Supporting packages</title>
|
|
|
|
<para>&RCL; uses external applications to index some file
|
|
types. You need to install them for the file types that you wish to
|
|
have indexed (these are run-time dependencies. None is needed for
|
|
building &RCL;):</para>
|
|
|
|
<itemizedlist>
|
|
|
|
<listitem><para>Openoffice: supported natively, but needs the
|
|
<command>unzip</command> command to be installed.</para>
|
|
</listitem>
|
|
|
|
<listitem><para>PDF: pdftotext is part of the <ulink
|
|
url="http://www.foolabs.com/xpdf/">Xpdf</ulink> package.</para>
|
|
</listitem>
|
|
|
|
<listitem><para>Postscript: <ulink
|
|
url="http://www.cs.wisc.edu/~ghost/doc/pstotext.htm">
|
|
pstotext</ulink>.</para>
|
|
</listitem>
|
|
|
|
<listitem><para>MS Word: <ulink url="http://www.winfield.demon.nl">
|
|
antiword</ulink>.</para>
|
|
</listitem>
|
|
|
|
<listitem><para>MS Excel and PowerPoint:
|
|
<ulink url="http://www.45.free.net/~vitus/software/catdoc/">
|
|
catdoc</ulink>.</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>RTF: <ulink
|
|
url="http://www.gnu.org/software/unrtf/unrtf.html">unrtf</ulink>
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>dvi: <ulink
|
|
url="http://www.radicaleye.com/dvips.html">dvips</ulink></para>
|
|
</listitem>
|
|
|
|
<listitem>
|
|
<para>djvu:
|
|
<ulink
|
|
url="http://djvulibre.djvuzone.org/doc/index.html">DjVuLibre
|
|
</ulink></para>
|
|
</listitem>
|
|
|
|
<listitem><para>MP3: &RCL; will use the
|
|
<command>id3info</command> command from the <ulink
|
|
url="http://id3lib.sourceforge.net/">id3lib</ulink> package to
|
|
extract tag information. Without it, only the file names will
|
|
be indexed.</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<para>Text, HTML, mail folders, Openoffice and Scribus files
|
|
are processed internally. Lyx is used to index Lyx files. Many
|
|
filters need <command>sed</command> and <command>awk</command>.
|
|
</para>
|
|
|
|
</sect1>
|
|
|
|
|
|
<sect1 id="rcl.install.building">
|
|
<title>Building from source</title>
|
|
|
|
<sect2 id="rcl.install.building.prereqs">
|
|
<title>Prerequisites</title>
|
|
|
|
<para>At the very least, you will need to download and install the
|
|
<ulink url="http://www.xapian.org">xapian core package</ulink>
|
|
(&RCL; development currently uses version 0.9.5), and the <ulink
|
|
url="http://www.trolltech.com/products/qt/index.html">qt
|
|
run-time and development packages</ulink> (&RCL; development
|
|
currently uses version 3.3.5, but any 3.3 version is
|
|
probably OK).</para>
|
|
|
|
<para>You will most probably be able to find a binary package for
|
|
<application>qt</application> for your system. You may have to
|
|
compile &XAP; but this is not difficult (if you are using
|
|
<application>FreeBSD</application>, there is a port).</para>
|
|
|
|
<para>You may also need
|
|
<ulink
|
|
url="http://www.gnu.org/software/libiconv/">libiconv</ulink>. &RCL;
|
|
currently uses version 1.9 (this should not be critical). On
|
|
<application>Linux</application> systems, the iconv interface
|
|
is part of libc and you should not need to do anything
|
|
special.</para>

</sect2>
|
|
|
|
<sect2 id="rcl.install.building.build">
|
|
<title>Building</title>
|
|
|
|
<para>&RCL; has been built on
|
|
Linux (redhat7.3, mandriva 2005/6, Fedora Core 3/4/5), FreeBSD and
|
|
Solaris 8. If you build on another system, <ulink
|
|
url="mailto:jean-francois.dockes@wanadoo.fr">I would very much
|
|
welcome patches</ulink>.</para>
|
|
|
|
<para>Depending on the <application>qt</application>
|
|
configuration on your system, you may have to set the
|
|
<literal>QTDIR</literal> and <literal>QMAKESPECS</literal>
|
|
variables in your environment:</para>
|
|
<itemizedlist>
|
|
<listitem><para><literal>QTDIR</literal> should point to the
|
|
directory above the one that holds the qt include files (for example,
|
|
if <filename>qt.h</filename> is
|
|
<filename>/usr/local/qt/include/qt.h</filename>, QTDIR
|
|
should be <filename>/usr/local/qt</filename>).</para>
|
|
</listitem>
|
|
<listitem><para><literal>QMAKESPECS</literal> should
|
|
be set to the name of one of the
|
|
<application>qt</application> mkspecs sub-directories (for example,
|
|
linux-g++).</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<para>On many Linux systems, <literal>QTDIR</literal> is set
|
|
by the login scripts, and <literal>QMAKESPECS</literal> is not
|
|
needed because there is a <filename>default</filename> link in
|
|
<filename>mkspecs/</filename>.</para>
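<para>As an illustration only, using the example values given above
(the paths are assumptions and will differ on your system), the
variables could be set from a Bourne-type shell before running
<command>configure</command>:</para>
<screen>
<userinput>QTDIR=/usr/local/qt; export QTDIR</userinput>
<userinput>QMAKESPECS=linux-g++; export QMAKESPECS</userinput>
</screen>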
|
|
|
|
<formalpara><title>Configure
|
|
options:</title><para><literal>--without-aspell</literal>
|
|
will disable the code for phonetic matching of search
|
|
terms. <literal>--with-fam</literal> or
|
|
<literal>--with-inotify</literal> will enable the code for
|
|
real time indexing. Inotify support is enabled by default on
|
|
recent Linux systems.</para>
</formalpara>
|
|
|
|
<para>Normal procedure:</para>
|
|
<screen>
|
|
<userinput>cd recoll-xxx</userinput>
|
|
<userinput>./configure</userinput>
|
|
<userinput>make</userinput>
|
|
<userinput>(practices usual hardship-repelling invocations)</userinput>
|
|
</screen>
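<para>The configure options are passed on the
<command>configure</command> command line. For instance, a build
without aspell support and with inotify-based real time indexing might
be prepared roughly as follows (a sketch: only the option names come
from this manual):</para>
<screen>
<userinput>cd recoll-xxx</userinput>
<userinput>./configure --without-aspell --with-inotify</userinput>
<userinput>make</userinput>
</screen>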
|
|
|
|
|
|
<para>There is little auto-configuration. The
|
|
<command>configure</command> script will mainly link one of
|
|
the system-specific files in the <filename>mk</filename>
|
|
directory to <filename>mk/sysconf</filename>. If your system
|
|
is not known yet, it will tell you as much, and you may want
|
|
to manually copy and modify one of the existing files (the new
|
|
file name should be the output of <command>uname -s</command>).</para>
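<para>In practice, this could go as follows, where
<replaceable>existing-file</replaceable> stands for whichever file in
<filename>mk</filename> is closest to your system, and
<replaceable>name</replaceable> for what <command>uname -s</command>
printed (both are placeholders, not actual file names):</para>
<screen>
<userinput>uname -s</userinput>
<userinput>cp mk/<replaceable>existing-file</replaceable> mk/<replaceable>name</replaceable></userinput>
</screen>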
|
|
</sect2>
|
|
|
|
<sect2 id="rcl.install.building.install">
|
|
<title>Installation</title>
|
|
|
|
<para>Either type <userinput>make install</userinput> or execute
|
|
<userinput>recollinstall
|
|
<replaceable>prefix</replaceable></userinput>, in the root
|
|
of the source tree. This will copy the commands to
|
|
<filename><replaceable>prefix</replaceable>/bin</filename>
|
|
and the sample configuration files, scripts and other shared
|
|
data to
|
|
<filename><replaceable>prefix</replaceable>/share/recoll</filename>.</para>
|
|
<para>If the installation prefix given to
|
|
<command>recollinstall</command> is different from what was
|
|
specified when executing <command>configure</command>, you
|
|
will have to set the <literal>RECOLL_DATADIR</literal>
|
|
environment variable to indicate where the shared data is to
|
|
be found.</para>
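<para>As a sketch, supposing an installation under
<filename>/opt/recoll</filename> (an arbitrary example prefix) while
<command>configure</command> was run with another one, and assuming
that the variable should name the shared data directory described
above:</para>
<screen>
<userinput>./recollinstall /opt/recoll</userinput>
<userinput>RECOLL_DATADIR=/opt/recoll/share/recoll; export RECOLL_DATADIR</userinput>
</screen>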
|
|
|
|
<para>You can then proceed to <link
|
|
linkend="rcl.install.config">configuration</link>. </para>
|
|
|
|
</sect2>
|
|
</sect1>
|
|
|
|
<sect1 id="rcl.install.config">
|
|
<title>Configuration overview</title>
|
|
|
|
<para>Most of the parameters specific to the
|
|
<command>recoll</command> GUI are set through the
|
|
<guilabel>Preferences</guilabel> menu and stored in the
|
|
standard QT place
|
|
(<filename>$HOME/.qt/recollrc</filename>). You probably do not
|
|
want to edit this by hand.</para>
|
|
|
|
<para>For other options, &RCL; uses text configuration
|
|
files. You will have to edit them by hand for
|
|
now (there is still some hope for a GUI configuration tool
|
|
in the future). The most accurate documentation for the
|
|
configuration parameters is given by comments inside the default
|
|
files, and we will just give a general overview here.</para>
|
|
|
|
<para>There are two sets of configuration files. The system-wide
|
|
files are kept in a directory named like
|
|
<filename>/usr/[local/]share/recoll/examples</filename>;
|
|
they define default values for the system. A parallel set of
|
|
files exists by default in the <filename>.recoll</filename> directory
|
|
in your home. This directory can be changed with the
|
|
<literal>RECOLL_CONFDIR</literal> environment variable or the <literal>-c</literal>
|
|
option parameter to <command>recoll</command> and
|
|
<command>recollindex</command>.</para>
|
|
|
|
<para>If the <filename>.recoll</filename> directory does not
|
|
exist when <command>recoll</command> or
|
|
<command>recollindex</command> is started, it will be created
|
|
with a set of empty configuration files.
|
|
<command>recoll</command> will give you a chance to edit the
|
|
configuration file before starting
|
|
indexing. <command>recollindex</command> will proceed
|
|
immediately. To avoid mistakes, the automatic directory
|
|
creation will only occur for the
|
|
default location, not if <literal>-c</literal> or
|
|
<literal>RECOLL_CONFDIR</literal> were used (in the latter
|
|
cases, you will have to create the directory).</para>
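<para>As a sketch of how an alternate configuration directory might
be used (the <filename>~/.recoll-work</filename> name is only an
example, and you would put your own copies of the configuration
files inside it):</para>
<screen>
<userinput>mkdir ~/.recoll-work</userinput>
<userinput>recollindex -c ~/.recoll-work</userinput>
<userinput>recoll -c ~/.recoll-work</userinput>
</screen>
<para>Setting <literal>RECOLL_CONFDIR</literal> to the same path in
the environment should have the same effect as the
<literal>-c</literal> option.</para>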
|
|
|
|
|
|
<para>All configuration files share the same format. For
|
|
example, a short extract of the main configuration file might
|
|
look as follows:</para>
|
|
<programlisting>
|
|
# Space-separated list of directories to index.
|
|
topdirs = ~/docs /usr/share/doc
|
|
|
|
[~/somedirectory-with-utf8-txt-files]
|
|
defaultcharset = utf-8
|
|
</programlisting>
|
|
|
|
<para>There are three kinds of lines: </para>
|
|
<itemizedlist>
|
|
<listitem><para>Comment (starts with
|
|
<emphasis>#</emphasis>) or empty.</para>
|
|
</listitem>
|
|
<listitem><para>Parameter assignment (<emphasis>name =
|
|
value</emphasis>).</para>
|
|
</listitem>
|
|
<listitem><para>Section definition
|
|
([<emphasis>somedirname</emphasis>]).</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<para>Section definitions allow redefining some parameters for
|
|
a directory sub-tree. They stay in effect until another
|
|
section definition, or the end of file, is encountered. Some
|
|
of the parameters used for indexing are looked up
|
|
hierarchically from the current directory location
|
|
upwards. Not all parameters can be meaningfully redefined;
|
|
this is specified for each in the next section. </para>
|
|
|
|
<para>The tilde character (~) is expanded in file names to the
|
|
name of the user's home directory.</para>
|
|
|
|
<para>White space is used for separation inside lists.
|
|
List elements with embedded spaces can be quoted using
|
|
double-quotes.</para>
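<para>To illustrate these rules, here is a made-up fragment (the
directory names are assumptions chosen for the example). The first
line uses double quotes around a list element containing a space, and
the two sections each redefine a parameter for one sub-tree:</para>
<programlisting>
topdirs = ~/docs "~/My Documents" /usr/share/doc

[~/docs/latin1-notes]
defaultcharset = iso8859-1

[~/docs/utf8-notes]
defaultcharset = utf-8
</programlisting>
<para>The first section ends where the second one begins, so each
<literal>defaultcharset</literal> value only applies to its own
sub-tree.</para>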
|
|
|
|
<sect2 id="rcl.install.config.recollconf">
|
|
<title>Main configuration file</title>
|
|
|
|
<para><filename>recoll.conf</filename> is the main
|
|
configuration file. It defines things like
|
|
what to index (top directories and things to ignore), and the
|
|
default character set to use for document types which do not
|
|
specify it internally.</para>
|
|
|
|
<para>The default configuration will index your home
|
|
directory. If this is not appropriate, start
|
|
<command>recoll</command> to create a blank
|
|
configuration, click <guimenu>Cancel</guimenu>, and edit
|
|
the configuration file before restarting the command. This
|
|
will start the initial indexing, which may take some time.</para>
|
|
|
|
<para>Parameters:</para>
|
|
|
|
<variablelist>
|
|
|
|
<varlistentry id="rcl.install.config.recollconf.topdirs">
|
|
<term><literal>topdirs</literal></term>
|
|
<listitem><para>Specifies the list of directories or files to
|
|
index (recursively for directories). The indexer will not
|
|
follow symbolic links inside the indexed trees. If an entry in
|
|
the <literal>topdirs</literal> list is a symbolic link,
|
|
indexing will not start and will generate an error.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry><term><literal>dbdir</literal></term>
|
|
<listitem><para>The name of the Xapian data directory. It
|
|
will be created if needed when the index is
|
|
initialized. If this is not an absolute path, it will be
|
|
interpreted relative to the configuration directory. The
|
|
value can have embedded spaces but starting or trailing
|
|
spaces will be trimmed. You cannot use quotes here.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry><term><literal>skippedNames</literal></term>
|
|
<listitem>
|
|
<para>A space-separated list of patterns for
|
|
names of files or directories that should be completely
|
|
ignored. The list defined in the default file is: </para>
|
|
<programlisting>
|
|
skippedNames = #* bin CVS Cache cache* caughtspam tmp .thumbnails .svn \
|
|
*~ recollrc
|
|
</programlisting>
|
|
<para>The list can be redefined for sub-directories, but is only
|
|
actually changed for the top level ones in
|
|
<literal>topdirs</literal>.</para>
|
|
<para>The top-level directories are not affected by this
|
|
list (that is, a directory in <literal>topdirs</literal>
|
|
might match and would still be indexed).</para>
|
|
<para>The list in the default configuration does not
|
|
exclude hidden directories (names beginning with a
|
|
dot), which means that it may index quite a few things
|
|
that you do not want. On the other hand, mail user
|
|
agents like <application>thunderbird</application>
|
|
usually store messages in hidden directories, and you
|
|
probably want this indexed. One possible solution is to
|
|
have <filename>.*</filename> in
|
|
<literal>skippedNames</literal>, and add things like
|
|
<filename>~/.thunderbird</filename> or
|
|
<filename>~/.evolution</filename> in
|
|
<literal>topdirs</literal>.</para>
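<para>A minimal sketch of this approach, assuming the default list
shown above and a <application>thunderbird</application> user, could
look like this:</para>
<programlisting>
topdirs = ~ ~/.thunderbird
skippedNames = #* bin CVS Cache cache* caughtspam tmp .thumbnails .svn \
 *~ recollrc .*
</programlisting>
<para>This relies on the fact that the entries in
<literal>topdirs</literal> are not themselves affected by
<literal>skippedNames</literal>.</para>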
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry><term><literal>skippedPaths</literal> and
|
|
<literal>daemSkippedPaths</literal> </term>
|
|
<listitem>
|
|
<para>A space-separated list of patterns for
|
|
<emphasis>paths</emphasis> of files or directories that should be skipped.
|
|
There is no default in the sample configuration file,
|
|
but the code always adds the configuration and database
|
|
directories in there.</para>
|
|
<para><literal>skippedPaths</literal> is used both by
|
|
batch and real time
|
|
indexing. <literal>daemSkippedPaths</literal> can be
|
|
used to specify things that should be indexed at
|
|
startup, but not monitored.</para>
|
|
<para>Example of use for skipping text files only in a
|
|
specific directory:</para>
|
|
<programlisting>
|
|
skippedPaths = ~/somedir/*.txt
|
|
</programlisting>
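<para>As a further illustration (the directory name is hypothetical),
a large tree which seldom changes could be indexed at startup but
left out of real time monitoring:</para>
<programlisting>
daemSkippedPaths = ~/archives
</programlisting>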
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry><term><literal>loglevel, daemloglevel</literal></term>
|
|
<listitem><para>Verbosity level for recoll and
|
|
recollindex. A value of 4 lists quite a lot of
|
|
debug/information messages. 2 only lists errors. The
|
|
<literal>daem</literal> version is specific to the indexing monitor
|
|
daemon.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry><term><literal>logfilename,
|
|
daemlogfilename</literal></term>
|
|
<listitem><para>Where the messages should go. 'stderr' can
|
|
be used as a special value, and is the default. The
|
|
<literal>daem</literal> version is specific to the indexing monitor
|
|
daemon.</para>
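<para>For example, the following settings (the log file path is an
arbitrary choice) would keep the interactive programs quiet while
getting more detail from the indexing monitor daemon:</para>
<programlisting>
loglevel = 2
logfilename = stderr
daemloglevel = 4
daemlogfilename = /tmp/recollmonitor.log
</programlisting>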
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry><term><literal>filtersdir</literal></term>
|
|
<listitem><para>A directory to search for the external
|
|
filter scripts used to index some types of files. The
|
|
value should not be changed, except if you want to modify
|
|
one of the default scripts. The value can be redefined for
|
|
any sub-directory. </para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry><term><literal>indexstemminglanguages</literal></term>
|
|
<listitem><para>A list of languages for which the stem
|
|
expansion databases will be built. See recollindex(1) for
|
|
possible values. You can add a stem expansion database for
|
|
a different language by using <command>recollindex
|
|
-s</command>, but it will be deleted during the next
|
|
indexing. Only languages listed in the configuration
|
|
file are permanent.</para>
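<para>As an example, assuming that <literal>english</literal> and
<literal>french</literal> are among the values listed in
recollindex(1), stem expansion could be made permanent for both
languages with:</para>
<programlisting>
indexstemminglanguages = english french
</programlisting>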
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
<varlistentry><term><literal>defaultcharset</literal></term>
|
|
<listitem><para>The name of the character set used for
|
|
files that do not contain a character set definition (for example,
|
|
plain text files). This can be redefined for any
|
|
sub-directory. If it is not set at all, the character set
|
|
used is the one defined by the nls environment (LC_ALL,
|
|
LC_CTYPE, LANG), or iso8859-1 if nothing is set.</para>
</listitem>
</varlistentry>
|
|
|
|
<varlistentry><term><literal>guesscharset</literal></term>
|
|
<listitem><para>Decide if we try to guess the character
|
|
set of files if no internal value is available (for example, for
|
|
plain text files). This does not work well in general, and
|
|
should probably not be used. </para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry><term><literal>usesystemfilecommand</literal></term>
|
|
<listitem><para>Decide if we use the <command>file -i</command>
|
|
system command as a final step for determining the mime
|
|
type for a file (the main procedure uses suffix
|
|
associations as defined in the <filename>mimemap</filename>
|
|
file). This can be useful for files with suffix-less names,
|
|
but it will also cause the indexing of many bogus "text"
|
|
files.</para>
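<para>A possible setting to switch this off is shown below. We are
assuming here that the parameter takes a 0/1 value; check the
comments in the default <filename>recoll.conf</filename> for the
exact syntax.</para>
<programlisting>
usesystemfilecommand = 0
</programlisting>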
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry><term><literal>indexallfilenames</literal></term>
|
|
<listitem><para>&RCL; indexes file names in a special
|
|
section of the database to allow specific file name
|
|
searches using wild cards. This parameter decides if
|
|
file name indexing is performed only for files with mime
|
|
types that would qualify them for full text indexing, or
|
|
for all files inside the selected subtrees, independently of
|
|
mime type.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry><term><literal>idxabsmlen</literal></term>
|
|
<listitem><para>&RCL; stores an abstract for each indexed
|
|
file inside the database. This is so that abstracts can be
|
|
displayed inside the result lists without decoding the
|
|
original file. This parameter defines the size of the
|
|
stored abstract (which can come from an actual section or
|
|
just be the beginning of the text). The default value is 250.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry><term><literal>iconsdir</literal></term>
|
|
<listitem><para>The name of the directory where
|
|
<command>recoll</command> result list icons are
|
|
stored. You can change this if you want different
|
|
images.</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
</variablelist>
|
|
|
|
</sect2>
|
|
|
|
<sect2 id="rclinstall.config.mimemap">
|
|
<title>The mimemap file</title>
|
|
|
|
<para><filename>mimemap</filename> specifies the
|
|
file name extension to mime type mappings.</para>
|
|
|
|
<para>For file names without an extension, or with an unknown
|
|
one, the system's <command>file -i</command> command will be
|
|
executed to determine the mime type (this can be switched off
|
|
inside the main configuration file).</para>
|
|
|
|
<para>The mappings can be specified on a per-subtree basis,
|
|
which may be useful in some cases. Example:
|
|
<application>gaim</application> logs have a
|
|
<filename>.txt</filename> extension but
|
|
should be handled specially, which is possible because they
|
|
are usually all located in one place.</para>
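<para>Such a per-subtree mapping might be written as follows. This is
only a sketch: the directory and the
<literal>text/x-gaim-log</literal> type name are assumptions, so
check the default <filename>mimemap</filename> for the values
actually used.</para>
<programlisting>
.txt = text/plain

[~/.gaim]
.txt = text/x-gaim-log
</programlisting>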
|
|
|
|
<para><filename>mimemap</filename> also has a
|
|
<literal>recoll_noindex</literal> variable which is a list of
|
|
suffixes. Matching files will be skipped (which avoids
|
|
unnecessary decompressions or <command>file</command>
|
|
executions). This is partially redundant with
|
|
<literal>skippedNames</literal> in the main configuration
|
|
file, with two differences: it will not affect directories,
|
|
and it cannot be made dependent on the file-system location
|
|
(it is a configuration-wide parameter). You could accomplish
|
|
with <literal>skippedNames</literal> anything that
|
|
<literal>recoll_noindex</literal> does. The latter is used
|
|
mostly for things known to be unindexable by a given &RCL;
|
|
version. Having it there avoids cluttering the more
|
|
user-oriented and locally customized
|
|
<literal>skippedNames</literal>.</para>
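<para>For illustration, an entry of this kind might look like the
following (the suffixes are arbitrary examples, not the default
list):</para>
<programlisting>
recoll_noindex = .tar.gz .tgz .iso
</programlisting>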
|
|
|
|
</sect2>
|
|
|
|
<sect2 id="rclinstall.config.mimeconf">
|
|
<title>The mimeconf file</title>
|
|
|
|
<para><filename>mimeconf</filename> specifies how the
|
|
different mime types are handled for indexing, and which icons
|
|
are displayed in the <command>recoll</command> result lists.</para>
|
|
|
|
<para>Changing the parameters in the [index] section is
|
|
probably not a good idea except if you are a &RCL;
|
|
developer.</para>
|
|
|
|
<para>The [icons] section allows you to change the icons which
|
|
are displayed by <command>recoll</command> in the result
|
|
lists (the values are the basenames of the png images inside
|
|
the <filename>iconsdir</filename> directory (specified in
|
|
<filename>recoll.conf</filename>)).</para>
|
|
|
|
</sect2>
|
|
<sect2 id="rclinstall.config.mimeview">
|
|
<title>The mimeview file</title>
|
|
|
|
<para><filename>mimeview</filename> specifies which programs
|
|
are started when you click on an <guilabel>Edit</guilabel>
|
|
link in a result list. For example, HTML is normally displayed using
|
|
<application>firefox</application>, but you may prefer
|
|
<application>Konqueror</application>, your
|
|
<application>openoffice.org</application>
|
|
program might be named <command>ooffice</command> instead of
|
|
<command>openoffice</command> etc.
|
|
</para>
|
|
|
|
<para>Changes to this file can be made by direct editing, or
|
|
through the <command>recoll</command> user preferences dialog.</para>
|
|
|
|
<para>As for the other configuration files, the normal usage
|
|
is to have a <filename>mimeview</filename> inside your own
|
|
configuration directory, with just the non-default entries,
|
|
which will override those from the central configuration
|
|
file.</para>
|
|
<para>Please note that these entries must be placed under a
|
|
<literal>[view]</literal> section.</para>
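<para>For instance, a personal <filename>mimeview</filename> which
only changes the HTML viewer could be as short as this (assuming that
the <command>konqueror</command> command is in your path and accepts
a URL argument):</para>
<programlisting>
[view]
text/html = konqueror %u
</programlisting>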
|
|
|
|
<para>If <guilabel>Use desktop preferences to choose
|
|
document editor</guilabel> is checked in the user preferences,
|
|
all <filename>mimeview</filename> entries will be ignored
|
|
except the one labelled <literal>application/x-all</literal>
|
|
(which is set to use <command>xdg-open</command> by default).</para>
|
|
</sect2>
|
|
|
|
<sect2 id="rclinstall.config.examples">
|
|
<title>Examples of configuration adjustments</title>
|
|
|
|
<sect3 id="rclinstall.config.examples.addview">
|
|
<title>Adding an external viewer for a non-indexed type</title>
|
|
|
|
<para>Imagine that you have some kind of file which does not
|
|
have indexable content, but for which you would like to have a
|
|
functional <guilabel>Edit</guilabel> link in the result list
|
|
(when found by file name). The file names end in
|
|
<replaceable>.blob</replaceable> and can be displayed by
|
|
application <replaceable>blobviewer</replaceable>.</para>
|
|
|
|
<para>You need two entries in the configuration files for this
|
|
to work:</para>
|
|
<itemizedlist>
|
|
<listitem><para>In <filename>$RECOLL_CONFDIR/mimemap</filename>
|
|
(typically <filename>~/.recoll/mimemap</filename>), add the
|
|
following line:</para>
|
|
<programlisting>
|
|
.blob = application/x-blobapp
|
|
</programlisting>
|
|
<para>Note that the mime type is made up here, and you could
|
|
call it <replaceable>diesel/oil</replaceable> just the
|
|
same.</para>
|
|
</listitem>
|
|
<listitem><para>In
|
|
<filename>$RECOLL_CONFDIR/mimeview</filename> under the
|
|
<literal>[view]</literal> section:</para>
|
|
<programlisting>
|
|
application/x-blobapp = blobviewer %f
|
|
</programlisting>
|
|
|
|
<para>We are supposing that
|
|
<replaceable>blobviewer</replaceable> wants a file name
|
|
parameter here; you would use <literal>%u</literal> if
|
|
it liked URLs better.</para>
|
|
</listitem>
|
|
</itemizedlist>
|
|
|
|
<para>If you just wanted to change the application used by
|
|
&RCL; to display a mime type which it already knows, you
|
|
would just need to edit <filename>mimeview</filename>. The
|
|
entries you add in your personal file override those in the
|
|
central configuration, which you do not need to alter.</para>
|
|
|
|
</sect3>
|
|
|
|
<sect3 id="rclinstall.config.examples.addindex">
|
|
<title>Adding indexing support for a new file type</title>
|
|
|
|
<para>Let us now imagine that the above
|
|
<replaceable>.blob</replaceable> files actually contain
|
|
indexable text and that you know how to extract it with a
|
|
command line program. Getting &RCL; to index the files is
|
|
easy. You need to perform the above alteration, and also to
|
|
add data to the <filename>mimeconf</filename> file
|
|
(typically in <filename>~/.recoll/mimeconf</filename>):</para>
|
|
|
|
<itemizedlist>
|
|
<listitem><para>Under the <literal>[index]</literal>
|
|
section, add the following line (more about the
|
|
<replaceable>rclblob</replaceable> indexing script later):</para>
|
|
<programlisting>
|
|
application/x-blobapp = exec rclblob
|
|
</programlisting>
|
|
|
|
</listitem>
|
|
|
|
<listitem><para>Under the <literal>[icons]</literal>
|
|
section, you should choose an icon to be displayed for the
|
|
files inside the result lists. Icons are normally 64x64
|
|
pixels PNG files which live in
|
|
<filename>/usr/[local/]share/recoll/images</filename>.
|
|
</para>
|
|
</listitem>
|
|
|
|
<listitem><para>Under the <literal>[categories]</literal>
|
|
section, you should add the mime type where it makes sense
|
|
(you can also create a category). Categories may be used
|
|
for filtering in advanced search.</para>
|
|
</listitem>
|
|
|
|
</itemizedlist>
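<para>Taken together, the <filename>mimeconf</filename> additions for
the first two items of the list might look like the fragment below.
The <literal>blob</literal> icon name is hypothetical and supposes
that a <filename>blob.png</filename> image exists in the icons
directory. No <literal>[categories]</literal> line is shown: copy
the format from the default file.</para>
<programlisting>
[index]
application/x-blobapp = exec rclblob

[icons]
application/x-blobapp = blob
</programlisting>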
|
|
|
|
<para>The <replaceable>rclblob</replaceable> filter should
|
|
be an executable program or script which exists inside
|
|
<filename>/usr/[local/]share/recoll/filters</filename>. It
|
|
will be given a file name as argument and should output the
|
|
text contents in html format on the standard output.</para>
|
|
|
|
<para>The html could be very minimal like the following
|
|
example:</para>
|
|
<programlisting>&lt;html&gt;&lt;head&gt;
&lt;meta http-equiv="Content-Type" content="text/html;charset=UTF-8"&gt;
&lt;/head&gt;
&lt;body&gt;some text content&lt;/body&gt;&lt;/html&gt;
</programlisting>
|
|
|
|
<para>You should take care to escape some characters inside
|
|
the text by transforming them into appropriate
|
|
entities. "<literal>&amp;</literal>" should be transformed into
"<literal>&amp;amp;</literal>", "<literal>&lt;</literal>"
should be transformed into "<literal>&amp;lt;</literal>".</para>
|
|
|
|
<para>The character set needs to be specified in the
|
|
header. It does not need to be UTF-8 (&RCL; will take care
|
|
of translating it), but it must be accurate for good
|
|
results.</para>
|
|
|
|
<para>&RCL; will also make use of other header fields if
|
|
they are present: <literal>title</literal>,
|
|
<literal>description</literal>, <literal>keywords</literal>.
|
|
</para>
|
|
<para>The easiest way to write a new filter is probably to start
|
|
from an existing one.</para>
|
|
</sect3>
|
|
|
|
</sect2>
|
|
|
|
</sect1>
|
|
</chapter>
|
|
|
|
</book>
|
|
|