More documentation can be found in the doc/ directory or at http://www.recoll.org
Recoll user manual
--------------------------------------------------------------------------

Chapter 5. Installation and configuration

Table of Contents

    5.1. Installing a binary copy
    5.2. Supporting packages
    5.3. Building from source
    5.4. Configuration overview

5.1. Installing a binary copy

There are three types of binary Recoll installations:

  * Through your system's normal software distribution framework (e.g.
    Debian/Ubuntu apt, FreeBSD ports, etc.).

  * From a package downloaded from the Recoll web site.

  * From a prebuilt tree downloaded from the Recoll web site.

In all cases, the strict software dependencies (e.g. on Xapian or iconv)
will be satisfied automatically, so you should not have to worry about them.

You will only have to check or install supporting applications for the
file types that you want to index beyond those that are natively processed
by Recoll (text, HTML, mail files, and a few others).

You may also want to have a look at the configuration section (but this
may not be necessary for a quick test with default parameters). Most
parameters can be set more conveniently from the GUI.

5.1.1. Installing through a package system

If you use a BSD-type port system or a prebuilt package (DEB, RPM,
installed manually or through the system software configuration utility),
just follow the usual procedure for your system.

5.1.2. Installing a prebuilt Recoll

The unpackaged binary versions on the Recoll web site are just compressed
tar files of a build tree, where only the useful parts were kept
(executables and sample configuration).

The executable binary files are statically linked with libxapian and
libiconv, to make installation easier (no dependencies).

After extracting the tar file, you can proceed with installation as if you
had built the package from source (that is, just type make install). The
binary trees are built for installation to /usr/local.

--------------------------------------------------------------------------
5.2. Supporting packages

Recoll uses external applications to index some file types. You need to
install them for the file types that you wish to have indexed. These are
optional run-time dependencies: none is needed for building or running
Recoll except for indexing its specific file type.

After an indexing pass, the commands that were found missing can be
displayed from the recoll File menu. The list is stored in the missing
text file inside the configuration directory.

A list of common file types which need external commands follows. Many of
the filters need the iconv command, which is not always listed as a
dependency.

Please note that, because this information changes relatively often, the
most up-to-date version is now kept on the Recoll helper applications
page, along with links to the home pages or best source/patches pages, and
miscellaneous tips. The list below is not updated often and may be quite
stale.

For many Linux distributions, most of the commands listed can be installed
from the package repositories. However, the packages are sometimes
outdated, or not the best version for Recoll, so you should take a look at
the Recoll helper applications page if a file type is important to you.

As of Recoll release 1.14, a number of XML-based formats that were handled
by ad hoc filter code now use the xsltproc command, which usually comes
with libxslt. These are: abiword, fb2 (ebooks), kword, openoffice, svg.

Now for the list:

  * OpenOffice files need unzip and xsltproc.

  * PDF files need pdftotext, which is part of the Xpdf or Poppler
    packages.

  * PostScript files need pstotext. The original version has an issue with
    shell metacharacters in file names, which is corrected in recent
    packages. See the Recoll helper applications page for more detail.

  * MS Word needs antiword. It is also useful to have wvWare installed as
    it may be used as a fallback for some files which antiword does not
    handle.

  * MS Excel and PowerPoint need catdoc.

  * MS Open XML (docx) needs xsltproc.

  * WordPerfect files need wpd2html from the libwpd (or libwpd-tools on
    Ubuntu) package.

  * RTF files need unrtf, which, in its standard version, has much trouble
    with non-western character sets. Check the Recoll helper applications
    page.

  * TeX files need untex or detex. Check the Recoll helper applications
    page for sources if it is not packaged for your distribution.

  * DVI files need dvips.

  * DjVu files need djvutxt and djvused from the DjVuLibre package.

  * Audio files: Recoll releases before 1.13 used the id3info command from
    the id3lib package to extract mp3 tag information, metaflac (standard
    flac tools) for flac files, and ogginfo (vorbis tools) for ogg files.
    Releases 1.14 and later use a single Python filter based on mutagen
    for all audio file types.

  * Pictures: Recoll uses the Exiftool Perl package to extract tag
    information. Most image file formats are supported. Note that there
    may not be much interest in indexing the technical tags (image size,
    aperture, etc.). This is only of interest if you store personal tags
    or textual descriptions inside the image files.

  * CHM: files in Microsoft help format need Python and the pychm module
    (which needs chmlib).

  * ICS: up to Recoll 1.13, iCalendar files need Python and the icalendar
    module. icalendar is not needed for newer versions, which use internal
    code.

  * Zip archives need Python (and the standard zipfile module).

  * Rar archives need Python, the rarfile Python module and the unrar
    utility.

  * MIDI karaoke files need Python and the Midi module.

  * Konqueror webarchive files need Python (the filter uses the tarfile
    module).

  * mimehtml web archive format: support is based on the mail filter,
    which introduces some mild weirdness, but is still usable.

Text, HTML, mail folders, and Scribus files are processed internally. Lyx
is used to index Lyx files. Many filters need iconv and the standard sed
and awk.

--------------------------------------------------------------------------
5.3. Building from source

5.3.1. Prerequisites

  * A C++ compiler. Up to Recoll version 1.13.04, its absence can manifest
    itself by strange messages about a missing iconv_open.

  * Development files for Xapian core.

    Important: If you are building Xapian for an older CPU (before Pentium 4
    or Athlon 64), you need to add the --disable-sse flag to the configure
    command. Otherwise all Xapian applications will crash with an illegal
    instruction error.

  * Development files for Qt.

  * Development files for X11 and zlib.

Check the Recoll download page for up-to-date version information.

You will most probably be able to find a binary package for Qt for your
system. You may have to compile Xapian but this is not difficult (if you
are using FreeBSD, there is a port).

You may also need libiconv. Recoll currently uses version 1.9 (this should
not be critical). On Linux systems, the iconv interface is part of libc
and you should not need to do anything special.

5.3.2. Building

Recoll has been built on Linux, FreeBSD, Mac OS X, and Solaris. Most
versions after 2005 should be OK, and maybe some older ones too (Solaris 8
is OK). If you build on another system and need to modify things, I would
very much welcome patches.

Depending on the Qt 3 configuration on your system, you may have to set
the QTDIR and QMAKESPECS variables in your environment:

  * QTDIR should point to the directory above the one that holds the Qt
    include files (e.g. if qt.h is /usr/local/qt/include/qt.h, QTDIR should
    be /usr/local/qt).

  * QMAKESPECS should be set to the name of one of the Qt mkspecs
    sub-directories (e.g. linux-g++).

On many Linux systems, QTDIR is set by the login scripts, and QMAKESPECS
is not needed because there is a default link in mkspecs/.

Neither QTDIR nor QMAKESPECS should be needed with Qt 4: configuration
details are entirely determined by qmake (which is quite often installed
as qmake-qt4).

Configure options:

  * --without-aspell will disable the code for phonetic matching of search
    terms.

  * --with-fam or --with-inotify will enable the code for real time
    indexing. Inotify support is enabled by default on recent Linux
    systems.

  * --enable-xattr will enable code to fetch data from file extended
    attributes. This is only useful if some application stores data in
    there, and it also needs some simple configuration (see comments in
    the fields configuration file).

  * --enable-camelcase will enable splitting camelCase words. This is not
    enabled by default as it has the unfortunate side-effect of making
    some phrase searches quite confusing: e.g. "MySQL manual" would be
    matched by "MySQL manual" and "my sql manual" but not "mysql manual"
    (only inside phrase searches).

  * --with-file-command specifies the version of the 'file' command to use
    (e.g. --with-file-command=/usr/local/bin/file). This can be useful to
    enable the GNU version on systems where the native one is bad.

  * --without-gui disables the Qt interface and auxiliary uses of X11, and
    compiles only the command line version.

  * Of course the usual autoconf configure options, like --prefix, apply.

Normal procedure:

        cd recoll-xxx
        ./configure
        make
        (practices usual hardship-repelling invocations)

There is little auto-configuration. The configure script will mainly link
one of the system-specific files in the mk directory to mk/sysconf. If
your system is not known yet, it will tell you as much, and you may want
to manually copy and modify one of the existing files (the new file name
should be the output of uname -s).

5.3.3. Installation

Either type make install or execute recollinstall prefix in the root of
the source tree. This will copy the commands to prefix/bin and the sample
configuration files, scripts and other shared data to prefix/share/recoll.

If the installation prefix given to recollinstall is different from either
the system default or the value which was specified when executing
configure (as in configure --prefix /some/path), you will have to set the
RECOLL_DATADIR environment variable to indicate where the shared data is
to be found (e.g. for (ba)sh: export
RECOLL_DATADIR=/some/path/share/recoll).

You can then proceed to configuration.

--------------------------------------------------------------------------
5.4. Configuration overview

Most of the parameters specific to the recoll GUI are set through the
Preferences menu and stored in the standard Qt place ($HOME/.qt/recollrc).
You probably do not want to edit this by hand.

Recoll indexing options are set inside text configuration files located in
a configuration directory. There can be several such directories, each of
which defines the parameters for one index.

The configuration files can be edited by hand or through the Indexing
configuration dialog (Preferences menu). The GUI tool will try to respect
your formatting and comments as much as possible, so it is quite possible
to use both ways.

The most accurate documentation for the configuration parameters is given
by comments inside the default files, and we will just give a general
overview here.

For each index, there are two sets of configuration files. System-wide
configuration files are kept in a directory named like
/usr/[local/]share/recoll/examples, and define default values, shared by
all indexes. For each index, a parallel set of files defines the
customized parameters.

The default location of the configuration is the .recoll directory in your
home directory. Most people will only use this directory.

This location can be changed, or others can be added, with the
RECOLL_CONFDIR environment variable or the -c option to recoll and
recollindex.

If the .recoll directory does not exist when recoll or recollindex is
started, it will be created with a set of empty configuration files.
recoll will give you a chance to edit the configuration file before
starting indexing. recollindex will proceed immediately. To avoid
mistakes, the automatic directory creation will only occur for the default
location, not if -c or RECOLL_CONFDIR was used (in the latter cases, you
will have to create the directory).
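
The selection of the configuration directory described above can be
summarized with a small Python sketch. This is only an illustration, not
Recoll code: the function name resolve_confdir is hypothetical, and the
sketch assumes that a -c value takes precedence over RECOLL_CONFDIR, which
the text above does not state.

```python
import os


def resolve_confdir(cli_option=None, environ=None):
    """Return (directory, auto_create) for a Recoll-style configuration lookup.

    Assumption: a -c value wins over RECOLL_CONFDIR. The manual does not
    spell out the precedence between the two.
    """
    environ = os.environ if environ is None else environ
    if cli_option:
        # Explicit location: the directory must already exist.
        return os.path.expanduser(cli_option), False
    env_value = environ.get("RECOLL_CONFDIR")
    if env_value:
        # Explicit location: the directory must already exist.
        return os.path.expanduser(env_value), False
    # Only the default location is created automatically.
    home = environ.get("HOME", os.path.expanduser("~"))
    return os.path.join(home, ".recoll"), True
```

The second element of the result tells the caller whether a missing
directory may be created automatically, which is only the case for the
default location.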

All configuration files share the same format. For example, a short
extract of the main configuration file might look as follows:

        # Space-separated list of directories to index.
        topdirs = ~/docs /usr/share/doc

        [~/somedirectory-with-utf8-txt-files]
        defaultcharset = utf-8

There are three kinds of lines:

  * Comment (starts with #) or empty.

  * Parameter assignment (name = value).

  * Section definition ([somedirname]).

Depending on the type of configuration file, section definitions either
separate groups of parameters or allow redefining some parameters for a
directory sub-tree. They stay in effect until another section definition,
or the end of file, is encountered. Some of the parameters used for
indexing are looked up hierarchically from the current directory location
upwards. Not all parameters can be meaningfully redefined; this is
specified for each in the next section.
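
To make the three kinds of lines and the upward lookup more concrete, here
is a minimal Python sketch of a reader for this format. It is not the
Recoll parser: the names parse_conf, lookup and SAMPLE are hypothetical,
the sample uses absolute paths instead of ~, and the handling of backslash
continuation lines is inferred from the skippedNames example in the
default file.

```python
import os

SAMPLE = """
# Space-separated list of directories to index.
topdirs = /home/me/docs /usr/share/doc

[/home/me/docs/utf8]
defaultcharset = utf-8
"""


def parse_conf(text):
    """Parse configuration text into {section: {name: value}}.

    The empty section name "" holds parameters set before any [section].
    """
    sections = {"": {}}
    current = ""
    pending = ""
    for raw in text.splitlines():
        line = pending + raw.strip()
        pending = ""
        if line.endswith("\\"):
            # Backslash continuation: join with the next line.
            pending = line[:-1]
            continue
        if not line or line.startswith("#"):
            continue  # comment or empty line
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()  # section definition
            sections.setdefault(current, {})
            continue
        name, sep, value = line.partition("=")
        if sep:
            sections[current][name.strip()] = value.strip()  # assignment
    return sections


def lookup(sections, name, path):
    """Look a parameter up from `path` upwards, then in the global section."""
    by_dir = {os.path.expanduser(s): p for s, p in sections.items() if s}
    path = os.path.expanduser(path)
    while True:
        if name in by_dir.get(path, {}):
            return by_dir[path][name]
        parent = os.path.dirname(path)
        if parent == path:
            break  # reached the top of the tree
        path = parent
    return sections[""].get(name)
```

With the sample above, lookup(parse_conf(SAMPLE), "defaultcharset",
"/home/me/docs/utf8/sub/dir") finds the value set in the
[/home/me/docs/utf8] section, while the same lookup for /home/me/docs falls
back to the global section, where the parameter is not set.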

When found at the beginning of a file path, the tilde character (~) is
expanded to the name of the user's home directory, as a shell would do.

White space is used for separation inside lists. List elements with
embedded spaces can be quoted using double quotes.
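
As an illustration of the list and tilde rules, the following Python
sketch splits a list value and expands a leading ~. It is an
approximation under stated assumptions: the function name split_list is
hypothetical, and shlex also treats single quotes and backslashes
specially, which the Recoll format may not do.

```python
import os
import shlex


def split_list(value, home=None):
    """Split a list value on white space, honouring double quotes,
    and expand a leading ~ in each element."""
    home = home or os.path.expanduser("~")
    items = shlex.split(value)
    return [
        home + item[1:] if item == "~" or item.startswith("~/") else item
        for item in items
    ]
```

For example, split_list('~/docs "/mnt/my files" /usr/share/doc',
home="/home/me") yields three elements, the second of which keeps its
embedded space.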

5.4.1. Main configuration file

recoll.conf is the main configuration file. It defines things like what to
index (top directories and things to ignore), and the default character
set to use for document types which do not specify it internally.

The default configuration will index your home directory. If this is not
appropriate, start recoll to create a blank configuration, click Cancel,
and edit the configuration file before restarting the command. This will
start the initial indexing, which may take some time.

Most of the following parameters can be changed from the Index
Configuration menu in the recoll interface. Some can only be set by
editing the configuration file.

5.4.1.1. Parameters affecting what documents we index:

topdirs

        Specifies the list of directories or files to index (recursively
        for directories). You can use symbolic links as elements of this
        list. See the followLinks option about following symbolic links
        found under the top elements (not followed by default).

skippedNames

        A space-separated list of patterns for names of files or
        directories that should be completely ignored. The list defined in
        the default file is:

                skippedNames = #* bin CVS Cache cache* caughtspam tmp .thumbnails .svn \
                               *~ .beagle .git .hg .bzr loop.ps .xsession-errors \
                               .recoll* xapiandb recollrc recoll.conf

        The list can be redefined at any sub-directory in the indexed
        area.

        The top-level directories are not affected by this list (that is,
        a directory in topdirs might match and would still be indexed).

        The list in the default configuration does not exclude hidden
        directories (names beginning with a dot), which means that it may
        index quite a few things that you do not want. On the other hand,
        mail user agents like thunderbird usually store messages in hidden
        directories, and you probably want this indexed. One possible
        solution is to have .* in skippedNames, and add things like
        ~/.thunderbird or ~/.evolution in topdirs.

        Not even the file names are indexed for patterns in this list. See
        the recoll_noindex variable in mimemap for an alternative approach
        which indexes the file names.
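
The effect of a skippedNames list can be tried out with a few lines of
Python. This is only a sketch: Recoll performs the matching in its own
code, and Python's fnmatch module is used here as a stand-in with similar
wildcard semantics. The function name is_skipped is hypothetical, and the
pattern list is a subset of the default one.

```python
from fnmatch import fnmatchcase

# A subset of the default skippedNames list quoted above.
SKIPPED_NAMES = ["#*", "bin", "CVS", "Cache", "cache*", "tmp", "*~", ".recoll*"]


def is_skipped(name, patterns=SKIPPED_NAMES):
    """True if a single file or directory name matches one of the patterns."""
    return any(fnmatchcase(name, pattern) for pattern in patterns)
```

With these patterns, names such as notes.txt~ or cache-old are skipped,
while .thunderbird is not. Adding .* to the list, as suggested above, makes
is_skipped(".thunderbird", SKIPPED_NAMES + [".*"]) return True, which is
why such directories then have to be listed in topdirs.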

skippedPaths and daemSkippedPaths

        A space-separated list of patterns for paths of files or
        directories that should be skipped. There is no default in the
        sample configuration file, but the code always adds the
        configuration and database directories in there.

        skippedPaths is used both by batch and real time indexing.
        daemSkippedPaths can be used to specify things that should be
        indexed at startup, but not monitored.

        Example of use for skipping text files only in a specific
        directory:

                skippedPaths = ~/somedir/*.txt

followLinks

        Specifies if the indexer should follow symbolic links while
        walking the file tree. The default is to ignore symbolic links to
        avoid multiple indexing of linked files. No effort is made to
        avoid duplication when this option is set to true. This option can
        be set individually for each of the topdirs members by using
        sections. It cannot be changed below the topdirs level.

indexedmimetypes

        Recoll normally indexes any file which it knows how to read. This
        list lets you restrict the indexed mime types to what you specify.
        If the variable is unspecified or the list empty (the default),
        all supported types are processed.

compressedfilemaxkbs

        Size limit for compressed (.gz or .bz2) files. These need to be
        decompressed in a temporary directory for identification, which
        can be very wasteful if 'uninteresting' big compressed files are
        present. Negative means no limit, 0 means no processing of any
        compressed file. Defaults to -1.

textfilemaxmbs

        Maximum size for text files. Very big text files are often
        uninteresting logs. Set to -1 to disable the limit (default:
        20 MB).

textfilepagekbs

        If set to other than -1, text files will be indexed as multiple
        documents of the given page size. This may be useful if you do
        want to index very big text files as it will both reduce memory
        usage at index time and help with loading data to the preview
        window. A size of a few megabytes would seem reasonable (default:
        1 MB).

indexallfilenames

        Recoll indexes file names in a special section of the database to
        allow file name searches using wildcards. This parameter decides
        if file name indexing is performed only for files with mime types
        that would qualify them for full text indexing, or for all files
        inside the selected subtrees, independently of mime type.

usesystemfilecommand

        Decide if we use the file -i system command as a final step for
        determining the mime type for a file (the main procedure uses
        suffix associations as defined in the mimemap file). This can be
        useful for files with suffix-less names, but it will also cause
        the indexing of many bogus "text" files.

processbeaglequeue

        If this is set, process the directory where Beagle Web browser
        plugins copy visited pages for indexing. Of course, Beagle MUST
        NOT be running, else things will behave strangely.

beaglequeuedir

        The path to the Beagle indexing queue. This is hard-coded in the
        Beagle plugin as ~/.beagle/ToIndex so there should be no need to
        change it.

5.4.1.2. Parameters affecting how we generate terms:

Changing some of these parameters will imply a full reindex. Also, when
using multiple indexes, it may not make sense to search indexes that don't
share the values for these parameters, because they usually affect both
search and index operations.

nonumbers

        If this is set to true, no terms will be generated for numbers.
        For example "123", "1.5e6", and "192.168.1.4" would not be indexed
        ("value123" would still be). Numbers are often quite interesting
        to search for, and this should probably not be set except for
        special situations, e.g. scientific documents with huge amounts of
        numbers in them. This can only be set for a whole index, not for a
        subtree.

nocjk

        If this is set to true, specific East Asian (Chinese, Japanese,
        Korean) character/word splitting is turned off. This will save a
        small amount of CPU time if you have no CJK documents. If your
        document base does include such text but you are not interested in
        searching it, setting nocjk may be a significant time and space
        saver.

cjkngramlen

        This lets you adjust the size of n-grams used for indexing CJK
        text. The default value of 2 is probably appropriate in most
        cases. A value of 3 would allow more precision and efficiency on
        longer words, but the index will be approximately twice as large.
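
To show what an n-gram of length 2 or 3 means for CJK text, here is a
short Python sketch that produces overlapping character n-grams. It only
illustrates the general technique: the function name cjk_ngrams is
hypothetical, and the actual Recoll splitter also has to deal with mixed
scripts, punctuation and term positions, which are ignored here.

```python
def cjk_ngrams(text, n=2):
    """Return the overlapping character n-grams of `text`.

    A text shorter than n is returned as a single term.
    """
    if len(text) < n:
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]
```

A run of seven characters gives six bigrams but only five trigrams. Each
trigram is more specific than a bigram, which is where the gain in
precision on longer words comes from.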

indexstemminglanguages

        A list of languages for which the stem expansion databases will be
        built. See recollindex(1) or use the recollindex -l command for
        possible values. You can add a stem expansion database for a
        different language by using recollindex -s, but it will be deleted
        during the next indexing. Only languages listed in the
        configuration file are permanent.

defaultcharset

        The name of the character set used for files that do not contain a
        character set definition (e.g. plain text files). This can be
        redefined for any sub-directory. If it is not set at all, the
        character set used is the one defined by the NLS environment
        (LC_ALL, LC_CTYPE, LANG), or iso8859-1 if nothing is set.

maildefcharset

        This can be used to define the default character set specifically
        for mail messages which don't specify it. This is mainly useful
        for readpst (libpst) dumps, which are utf-8 but do not say so.

localfields

        This allows setting fields for all documents under a given
        directory. Typical usage would be to set an "rclaptg" field, to be
        used in mimeview to select a specific viewer. If several fields
        are to be set, they should be separated with a colon (':')
        character (which there is currently no way to escape). For
        example: localfields = rclaptg=gnus:other = val, then select a
        specific viewer with mimetype|tag=... in mimeview.
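
The structure of a localfields value can be illustrated with a short
Python sketch that splits it into name/value pairs. The function name
parse_localfields is hypothetical, and trimming the white space around
names and values is an assumption based on the example above.

```python
def parse_localfields(value):
    """Split a localfields value into a {field name: field value} dict.

    Fields are separated by ':' (which cannot be escaped), and each field
    is a name=value pair.
    """
    fields = {}
    for item in value.split(":"):
        name, sep, field_value = item.partition("=")
        if sep and name.strip():
            fields[name.strip()] = field_value.strip()
    return fields
```

Applied to the example value rclaptg=gnus:other = val, this yields the two
fields rclaptg (value gnus) and other (value val). Because the colon
cannot be escaped, a value that itself contains a colon cannot be
expressed.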

5.4.1.3. Parameters affecting where and how we store things:

dbdir

        The name of the Xapian data directory. It will be created if
        needed when the index is initialized. If this is not an absolute
        path, it will be interpreted relative to the configuration
        directory. The value can have embedded spaces but starting or
        trailing spaces will be trimmed. You cannot use quotes here.

maxfsoccuppc

        Maximum file system occupation before we stop indexing. The value
        is a percentage, corresponding to what the "Capacity" df output
        column shows. The default value is 0, meaning no checking.
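
The percentage that maxfsoccuppc is compared against can be estimated in
Python as shown below. This is a sketch under assumptions, not Recoll
code: the function names are hypothetical, the formula follows the way df
derives its Capacity column (used space over used plus available space,
rounded up), and stopping when the value reaches the limit (rather than
when it exceeds it) is a guess.

```python
import math
import os


def occupation_percent(total_blocks, free_blocks, avail_blocks):
    """Percentage of used space, computed like the df Capacity column."""
    used = total_blocks - free_blocks
    usable = used + avail_blocks
    if usable == 0:
        return 0
    return math.ceil(100 * used / usable)


def should_stop_indexing(path, maxfsoccuppc):
    """True if the file system holding `path` has reached the limit."""
    if maxfsoccuppc <= 0:
        return False  # 0 means no checking
    st = os.statvfs(path)  # Unix only
    return occupation_percent(st.f_blocks, st.f_bfree, st.f_bavail) >= maxfsoccuppc
```

For a file system with 1000 blocks, 300 of them free and 250 available to
unprivileged users, occupation_percent returns 74, so a maxfsoccuppc value
of 70 would stop indexing while a value of 80 would not.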

mboxcachedir

        The directory where mbox message offsets cache files are held.
        This is normally $RECOLL_CONFDIR/mboxcache, but it may be useful
        to share a directory between different configurations.

mboxcacheminmbs

        The minimum mbox file size over which we cache the offsets. There
        is really no sense in caching offsets for small files. The default
        is 5 MB.

webcachedir

        This is only used by the Beagle web browser plugin indexing code,
        and defines where the cache for visited pages will live. Default:
        $RECOLL_CONFDIR/webcache

webcachemaxmbs

        This is only used by the Beagle web browser plugin indexing code,
        and defines the maximum size for the web page cache. Default: 40
        MB.

idxflushmb

        Threshold (megabytes of new text data) where we flush from memory
        to disk index. Setting this can help control memory usage. A value
        of 0 means no explicit flushing, letting Xapian use its own
        default, which is flushing every 10000 documents (memory usage
        depends on average document size). The default value is 10.

5.4.1.4. Miscellaneous parameters:

loglevel, daemloglevel

        Verbosity level for recoll and recollindex. A value of 4 lists
        quite a lot of debug/information messages. 2 only lists errors.
        The daem version (daemloglevel) is specific to the indexing
        monitor daemon.

logfilename, daemlogfilename

        Where the messages should go. 'stderr' can be used as a special
        value, and is the default. The daem version (daemlogfilename) is
        specific to the indexing monitor daemon.

mondelaypatterns

        This allows specifying wildcard path patterns (processed with
        fnmatch(3) with flags set to 0), to match files which change too
        often and for which a delay should be observed before re-indexing.
        This is a space-separated list, each entry being a pattern and a
        time in seconds, separated by a colon. You can use double quotes
        if a path entry contains white space. Example:

                mondelaypatterns = *.log:20 "this one has spaces*:10"
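
The structure of a mondelaypatterns value can be checked with the
following Python sketch, which splits the list and looks up the delay for
a path. It is an approximation, not the monitor code: the function names
are hypothetical, shlex stands in for Recoll's own quoting rules, and
Python's fnmatchcase stands in for fnmatch(3) called with flags set to 0.

```python
import shlex
from fnmatch import fnmatchcase


def parse_mondelaypatterns(value):
    """Split a mondelaypatterns value into (pattern, seconds) pairs."""
    entries = []
    for item in shlex.split(value):
        # The delay follows the last colon of each entry.
        pattern, _, seconds = item.rpartition(":")
        entries.append((pattern, int(seconds)))
    return entries


def delay_for(path, entries):
    """Return the delay of the first matching pattern, or 0 if none matches."""
    for pattern, seconds in entries:
        if fnmatchcase(path, pattern):
            return seconds
    return 0
```

Parsing the example value gives the pairs ("*.log", 20) and ("this one has
spaces*", 10). With flags set to 0, the * wildcard also matches the /
character, so *.log matches /var/log/app.log.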

monixinterval

        Minimum interval (seconds) for processing the indexing queue. The
        real time monitor does not process each event when it comes in,
        but will wait this time for the queue to accumulate, to diminish
        overhead and to aggregate multiple events to the same file.
        Default: 30 seconds.

monauxinterval

        Period (in seconds) at which the real time monitor will regenerate
        the auxiliary databases (spelling, stemming) if needed. The
        default is one hour.

filtermaxseconds

        Maximum filter execution time, after which it is aborted. Some
        postscript programs just loop...

filtersdir

        A directory to search for the external filter scripts used to
        index some types of files. The value should not be changed, except
        if you want to modify one of the default scripts. The value can be
        redefined for any sub-directory.

iconsdir

        The name of the directory where recoll result list icons are
        stored. You can change this if you want different images.

idxabsmlen

        Recoll stores an abstract for each indexed file inside the
        database. The text can come from an actual 'abstract' section in
        the document or will just be the beginning of the document. It is
        stored in the index so that it can be displayed inside the result
        lists without decoding the original file. The idxabsmlen parameter
        defines the size of the stored abstract. The default value is 250
        bytes. The search interface gives you the choice to display this
        stored text or a synthetic abstract built by extracting text
        around the search terms. If you always prefer the synthetic
        abstract, you can reduce this value and save a little space.

aspellLanguage

        Language definitions to use when creating the aspell dictionary.
        The value must match a set of aspell language definition files.
        You can type "aspell config" to see where these are installed
        (look for data-dir). The default if the variable is not set is to
        use your desktop national language environment to guess the value.

noaspell

        If this is set, the aspell dictionary generation is turned off.
        Useful for cases where you don't need the functionality or when it
        is unusable because aspell crashes during dictionary generation.

5.4.2. The fields file

This file contains information about dynamic fields handling in Recoll.
Some very basic fields have hard-wired behaviour, and, mostly, you should
not change the original data inside the fields file. But you can create
custom fields fitting your data and handle them just as if they were
native ones.

The fields file has several sections, each of which defines an aspect of
fields processing. Quite often, you'll have to modify several sections to
obtain the desired behaviour.

We will only give a short description here. You should refer to the
comments inside the file for more detailed information.

Field names should be lowercase alphabetic ASCII.

[prefixes]

        A field becomes indexed (searchable) by having a prefix defined in
        this section.

[stored]

        A field becomes stored (displayable inside results) by having its
        name listed in this section (typically with an empty value).

[aliases]

        This section defines lists of synonyms for the canonical names
        used inside the [prefixes] and [stored] sections.

filter-specific sections

        Some filters may need specific configuration for handling fields.
        Only the mail message filter currently has such a section (named
        [mail]). It allows indexing arbitrary mail headers in addition to
        the ones indexed by default. Other such sections may appear in the
        future.

Here follows a small example of a personal fields file. This would extract
a specific mail header and use it as a searchable field, with data
displayable inside result lists. (Side note: as the mail filter does no
decoding on the values, only plain ASCII headers can be indexed, and only
the first occurrence will be used for headers that occur several times.)

        [prefixes]
        # Index mailmytag contents (with the given prefix)
        mailmytag = XMTAG

        [stored]
        # Store mailmytag inside the document data record (so that it can be
        # displayed - as %(mailmytag) - in result lists).
        mailmytag =

        [mail]
        # Extract the X-My-Tag mail header, and use it internally with the
        # mailmytag field name
        x-my-tag = mailmytag
|
|
|
|
5.4.3. The mimemap file
|
|
|
|
mimemap specifies the file name extension to mime type mappings.
|
|
|
|
For file names without an extension, or with an unknown one, the system's
|
|
file -i command will be executed to determine the mime type (this can be
|
|
switched off inside the main configuration file).
|
|
|
|
The mappings can be specified on a per-subtree basis, which may be useful
|
|
in some cases. Example: gaim logs have a .txt extension but should be
|
|
handled specially, which is possible because they are usually all located
|
|
in one place.
|
|
|
|
mimemap also has a recoll_noindex variable which is a list of suffixes.
|
|
Matching files will be skipped (which avoids unnecessary decompressions or
|
|
file executions). This is partially redundant with skippedNames in the
|
|
main configuration file, with a few differences: it will not affect
|
|
directories, it cannot be made dependent on the file-system location (it
|
|
is a configuration-wide parameter), and the file names will still be
|
|
indexed (for patterns in skippedNames, not even the file names are indexed).
|
|
recoll_noindex is used mostly for things known to be unindexable by a
|
|
given Recoll version. Having it there avoids cluttering the more
|
|
user-oriented and locally customized skippedNames.
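The difference between the two mechanisms can be summarized with a small Python sketch. It only restates the rules given above and is not how Recoll is implemented. The pattern and suffix lists are hypothetical sample values, and the per-directory aspect of skippedNames is left out.

```python
import fnmatch
import os

# Hypothetical sample values, not Recoll defaults.
SKIPPED_NAMES = ["*~", "#*", ".git"]    # as in skippedNames (recoll.conf)
NOINDEX_SUFFIXES = [".tar.gz", ".iso"]  # as in recoll_noindex (mimemap)


def index_decision(path, is_dir=False):
    """Return 'skipped', 'name only' or 'full' for a file system entry."""
    name = os.path.basename(path)
    # skippedNames applies to files and directories alike, and nothing
    # at all is indexed for a match, not even the name.
    if any(fnmatch.fnmatch(name, pattern) for pattern in SKIPPED_NAMES):
        return "skipped"
    # recoll_noindex never affects directories; the contents of a
    # matching file are ignored but its name is still indexed.
    if not is_dir and any(name.endswith(s) for s in NOINDEX_SUFFIXES):
        return "name only"
    return "full"
```

Here backup.tar.gz would still be found by a file name search, whereas notes.txt~ would not appear in the index at all.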
|
|
|
|
5.4.4. The mimeconf file
|
|
|
|
mimeconf specifies how the different mime types are handled for indexing,
|
|
and which icons are displayed in the recoll result lists.
|
|
|
|
Changing the parameters in the [index] section is probably not a good idea
|
|
unless you are a Recoll developer.
|
|
|
|
The [icons] section allows you to change the icons which are displayed by
|
|
recoll in the result lists. The values are the basenames of the PNG images
|
|
inside the iconsdir directory (specified in recoll.conf).
|
|
|
|
5.4.5. The mimeview file
|
|
|
|
mimeview specifies which programs are started when you click on an Open
|
|
link in a result list. For example, HTML is normally displayed using firefox, but
|
|
you may prefer Konqueror, or your openoffice.org program might be named
|
|
ooffice instead of openoffice, etc.
|
|
|
|
Changes to this file can be made by direct editing, or through the recoll
|
|
user preferences dialog.
|
|
|
|
If Use desktop preferences to choose document editor is checked in the
|
|
Recoll GUI user preferences, all mimeview entries will be ignored except
|
|
the one labelled application/x-all (which is set to use xdg-open by
|
|
default).
|
|
|
|
As for the other configuration files, the normal usage is to have a
|
|
mimeview inside your own configuration directory, with just the
|
|
non-default entries, which will override those from the central
|
|
configuration file.
|
|
|
|
Please note that these entries must be placed under a [view] section.
|
|
|
|
The keys in the file are normally mime types. You can add an application
|
|
tag to specialize the choice for an area of the filesystem (using a
|
|
localfields specification in mimeconf). The syntax for the key is
|
|
mimetype|tag
|
|
|
|
The nouncompforviewmts entry (placed at the top level, outside of the
[view] section) holds a list of mime types that should not be
|
|
uncompressed before starting the viewer (if they are found compressed, for example:
|
|
mydoc.doc.gz).
|
|
|
|
The right side of each assignment holds a command to be executed for
|
|
opening the file. The following substitutions are performed:
|
|
|
|
* %D. Document date
|
|
|
|
* %f. File name. This may be the name of a temporary file if it was
|
|
necessary to create one (for example, to extract a subdocument from a
|
|
container).
|
|
|
|
* %F. Original file name. Same as %f except if a temporary file is used.
|
|
|
|
* %i. Internal path, for subdocuments of containers. The format depends
|
|
on the container type. If this appears in the command line, Recoll
|
|
will not create a temporary file to extract the subdocument, expecting
|
|
the called application (possibly a script) to be able to handle it.
|
|
|
|
* %M. Mime type
|
|
|
|
* %U, %u. URL.
|
|
|
|
In addition to the predefined values above, all strings like %(fieldname)
|
|
will be replaced by the value of the field named fieldname for the
|
|
document. This could be used in combination with field customisation to
|
|
help with opening the document.
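To make the substitution rules more concrete, here is a rough Python sketch of how such a command line could be expanded. It is an illustration under assumptions, not Recoll's implementation: the doc dictionary and its key names are invented for the example, and the special effect of %i on temporary file creation is not modelled.

```python
import re
import shlex

# Matches %(fieldname) or one of the predefined single-letter codes.
PATTERN = re.compile(r"%\((?P<field>[^)]+)\)|%(?P<letter>[DfFiMUu])")


def expand_viewer_command(template, doc):
    """Expand %-substitutions in a mimeview-style command line.

    `doc` is a hypothetical dict describing one result; its keys are
    assumptions made for this sketch.
    """
    simple = {
        "D": doc.get("date", ""),
        "f": doc.get("file", ""),
        # Same as %f unless a distinct original name is known.
        "F": doc.get("original_file", doc.get("file", "")),
        "i": doc.get("ipath", ""),
        "M": doc.get("mimetype", ""),
        "U": doc.get("url", ""),
        "u": doc.get("url", ""),
    }
    fields = doc.get("fields", {})

    def replace(match):
        if match.group("field") is not None:
            return fields.get(match.group("field"), "")
        return simple[match.group("letter")]

    # Split into arguments first, then substitute inside each one, so
    # that a file name containing spaces remains a single argument.
    return [PATTERN.sub(replace, arg) for arg in shlex.split(template)]
```

For a document stored in /tmp/my doc.blob, the template blobviewer %f expands to the two arguments blobviewer and /tmp/my doc.blob. Substituting after splitting is a design choice of the sketch that avoids quoting problems.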
|
|
|
|
5.4.6. Examples of configuration adjustments
|
|
|
|
5.4.6.1. Adding an external viewer for a non-indexed type
|
|
|
|
Imagine that you have some kind of file which does not have indexable
|
|
content, but for which you would like to have a functional Open link in
|
|
the result list (when found by file name). The file names end in .blob and
|
|
can be displayed by the blobviewer application.
|
|
|
|
You need two entries in the configuration files for this to work:
|
|
|
|
* In $RECOLL_CONFDIR/mimemap (typically ~/.recoll/mimemap), add the
|
|
following line:
|
|
|
|
.blob = application/x-blobapp
|
|
|
|
Note that the mime type is made up here, and you could call it
|
|
diesel/oil just the same.
|
|
|
|
* In $RECOLL_CONFDIR/mimeview under the [view] section, add:
|
|
|
|
application/x-blobapp = blobviewer %f
|
|
|
|
We are supposing that blobviewer wants a file name parameter here; you
|
|
would use %u if it liked URLs better.
|
|
|
|
If you just wanted to change the application used by Recoll to display a
|
|
mime type which it already knows, you would just need to edit mimeview.
|
|
The entries you add in your personal file override those in the central
|
|
configuration, which you do not need to alter. mimeview can also be
|
|
modified from the GUI.
|
|
|
|
5.4.6.2. Adding indexing support for a new file type
|
|
|
|
Let us now imagine that the above .blob files actually contain indexable
|
|
text and that you know how to extract it with a command line program.
|
|
Getting Recoll to index the files is easy. You need to perform the above
|
|
alteration, and also to add data to the mimeconf file (typically in
|
|
~/.recoll/mimeconf):
|
|
|
|
* Under the [index] section, add the following line (more about the
|
|
rclblob indexing script later):
|
|
|
|
application/x-blobapp = exec rclblob
|
|
|
|
* Under the [icons] section, you should choose an icon to be displayed
|
|
for the files inside the result lists. Icons are normally 64x64-pixel
|
|
PNG files which live in /usr/[local/]share/recoll/images.
|
|
|
|
* Under the [categories] section, you should add the mime type where it
|
|
makes sense (you can also create a category). Categories may be used
|
|
for filtering in advanced search.
|
|
|
|
The rclblob filter should be an executable program or script which exists
|
|
inside /usr/[local/]share/recoll/filters. It will be given a file name as
|
|
argument and should output the text or HTML contents on the standard
|
|
output.
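As a rough idea of what such a program could look like, here is a minimal rclblob written in Python. The .blob format is imaginary, so the extraction logic (NUL-separated chunks, of which the UTF-8 decodable ones are text) is purely an assumption for the sake of the example. Only the calling convention, a file name as argument and text on the standard output, comes from the description above.

```python
#!/usr/bin/env python3
"""Hypothetical rclblob filter: print the text found in a .blob file."""
import sys


def extract_text(data):
    """Return the indexable text contained in the raw bytes `data`."""
    # Assumption for this sketch: a .blob file is a sequence of
    # NUL-separated chunks, and the UTF-8 decodable ones hold the text.
    chunks = []
    for raw in data.split(b"\0"):
        try:
            text = raw.decode("utf-8").strip()
        except UnicodeDecodeError:
            continue  # binary chunk, nothing to index
        if text:
            chunks.append(text)
    return "\n".join(chunks)


# The file name is expected as the only argument; do nothing otherwise.
if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], "rb") as stream:
        sys.stdout.write(extract_text(stream.read()) + "\n")
```

The script would have to be made executable and stored under the name rclblob. A real filter would replace extract_text with whatever command or library actually understands the format.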
|
|
|
|
The filter programming section describes in more detail how to write a
|
|
filter.
|
|
|
|