More documentation can be found in the doc/ directory or at http://www.recoll.org
Chapter 5. Installation and configuration
|
|
|
|
5.1. Installing a binary copy
|
|
|
|
Recoll binary copies are always distributed as regular packages for your
|
|
system. They can be obtained either through the system's normal software
|
|
distribution framework (e.g. Debian/Ubuntu apt, FreeBSD ports, etc.), or
|
|
from some type of "backports" repository providing versions newer than the
|
|
standard ones, or found on the Recoll web site in some cases.
|
|
|
|
There used to exist another form of binary install, as pre-compiled source
|
|
trees, but these are just less convenient than the packages and don't
|
|
exist any more.
|
|
|
|
The package management tools will usually automatically deal with hard
|
|
dependencies for packages obtained from a proper package repository. You
|
|
will have to deal with them by hand for downloaded packages (for example,
|
|
when dpkg complains about missing dependencies).
|
|
|
|
In all cases, you will have to check or install supporting applications
|
|
for the file types that you want to index beyond those that are natively
|
|
processed by Recoll (text, HTML, email files, and a few others).
|
|
|
|
You may also want to have a look at the configuration section (but this
|
|
may not be necessary for a quick test with default parameters). Most
|
|
parameters can be more conveniently set from the GUI interface.
|
|
|
|
5.2. Supporting packages
|
|
|
|
Recoll uses external applications to index some file types. You need to
|
|
install them for the file types that you wish to have indexed (these are
|
|
run-time optional dependencies. None is needed for building or running
|
|
Recoll except for indexing their specific file type).
|
|
|
|
After an indexing pass, the commands that were found missing can be
|
|
displayed from the recoll File menu. The list is stored in the missing
|
|
text file inside the configuration directory.
|
|
|
|
A list of common file types which need external commands follows. Many of
|
|
the handlers need the iconv command, which is not always listed as a
|
|
dependency.
|
|
|
|
Please note that, due to the relatively dynamic nature of this
|
|
information, the most up to date version is now kept on
|
|
http://www.recoll.org/features.html along with links to the home pages or
|
|
best source/patches pages, and misc tips. The list below is not updated
|
|
often and may be quite stale.
|
|
|
|
For many Linux distributions, most of the commands listed can be installed
|
|
from the package repositories. However, the packages are sometimes
|
|
outdated, or not the best version for Recoll, so you should take a look at
|
|
http://www.recoll.org/features.html if a file type is important to you.
|
|
|
|
As of Recoll release 1.14, a number of XML-based formats that were handled
|
|
by ad hoc handler code now use the xsltproc command, which usually comes
|
|
with libxslt. These are: abiword, fb2 (ebooks), kword, openoffice, svg.
|
|
|
|
Now for the list:
|
|
|
|
o Openoffice files need unzip and xsltproc.
|
|
|
|
o PDF files need pdftotext which is part of Poppler (usually comes with
|
|
the poppler-utils package). Avoid the original one from Xpdf.
|
|
|
|
o Postscript files need pstotext. The original version has an issue with
|
|
shell characters in file names, which is corrected in recent packages.
|
|
See http://www.recoll.org/features.html for more detail.
|
|
|
|
o MS Word needs antiword. It is also useful to have wvWare installed as
|
|
it may be used as a fallback for some files which antiword does not
|
|
handle.
|
|
|
|
o MS Excel and PowerPoint are processed by internal Python handlers.
|
|
|
|
o MS Open XML (docx) needs xsltproc.
|
|
|
|
o Wordperfect files need wpd2html from the libwpd (or libwpd-tools on
|
|
Ubuntu) package.
|
|
|
|
o RTF files need unrtf, which, in its older versions, has much trouble
|
|
with non-western character sets. Many Linux distributions carry
|
|
outdated unrtf versions. Check http://www.recoll.org/features.html for
|
|
details.
|
|
|
|
o TeX files need untex or detex. Check
|
|
http://www.recoll.org/features.html for sources if it's not packaged
|
|
for your distribution.
|
|
|
|
o dvi files need dvips.
|
|
|
|
o djvu files need djvutxt and djvused from the DjVuLibre package.
|
|
|
|
o Audio files: Recoll releases 1.14 and later use a single Python
|
|
handler based on mutagen for all audio file types.
|
|
|
|
o Pictures: Recoll uses the Exiftool Perl package to extract tag
|
|
information. Most image file formats are supported. Note that there
|
|
may not be much interest in indexing the technical tags (image size,
|
|
aperture, etc.). This is only of interest if you store personal tags
|
|
or textual descriptions inside the image files.
|
|
|
|
o chm: files in Microsoft help format need Python and the pychm module
|
|
(which needs chmlib).
|
|
|
|
o ICS: up to Recoll 1.13, iCalendar files need Python and the icalendar
|
|
module. icalendar is not needed for newer versions, which use internal
|
|
code.
|
|
|
|
o Zip archives need Python (and the standard zipfile module).
|
|
|
|
o Rar archives need Python, the rarfile Python module and the unrar
|
|
utility.
|
|
|
|
o Midi karaoke files need Python and the Midi module.
|
|
|
|
o Konqueror webarchive files need Python (uses the tarfile module).
|
|
|
|
o Mimehtml web archive format (support based on the email handler, which
|
|
introduces some mild weirdness, but still usable).
|
|
|
|
Text, HTML, email folders, and Scribus files are processed internally. Lyx
|
|
is used to index Lyx files. Many handlers need iconv and the standard sed
|
|
and awk.
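Before running the indexer, it can be convenient to check which of the helper commands named above are actually present on the PATH. The following Python sketch does this with shutil.which. The HELPERS table and the missing_helpers function are illustrative names, not part of Recoll, and the table only copies a subset of the list above.

```python
import shutil

# File type -> external commands, copied from the list above (not exhaustive).
HELPERS = {
    "OpenOffice": ["unzip", "xsltproc"],
    "PDF": ["pdftotext"],
    "PostScript": ["pstotext"],
    "MS Word": ["antiword"],
    "WordPerfect": ["wpd2html"],
    "RTF": ["unrtf"],
    "DVI": ["dvips"],
    "DjVu": ["djvutxt", "djvused"],
    "Rar": ["unrar"],
    "common": ["iconv", "sed", "awk"],
}


def missing_helpers(helpers=HELPERS, which=shutil.which):
    """Return {file type: [commands not found on PATH]} for incomplete types."""
    report = {}
    for filetype, commands in helpers.items():
        missing = [cmd for cmd in commands if which(cmd) is None]
        if missing:
            report[filetype] = missing
    return report


for filetype, commands in missing_helpers().items():
    print(f"{filetype}: missing {', '.join(commands)}")
```

The which parameter is only there so that the lookup can be replaced in tests. After a real indexing pass, the list displayed from the recoll File menu remains the authoritative source.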
|
|
|
|
5.3. Building from source
|
|
|
|
5.3.1. Prerequisites
|
|
|
|
If you can install any or all of the following through the package manager
|
|
for your system, all the better. Qt in particular is a very big piece of
|
|
software, but you will most probably be able to find a binary package.
|
|
|
|
You may have to compile Xapian but this is easy.
|
|
|
|
The shopping list:
|
|
|
|
o C++ compiler. Up to Recoll version 1.13.04, its absence can manifest
|
|
itself by strange messages about a missing iconv_open.
|
|
|
|
o Development files for Xapian core.
|
|
|
|
Important
|
|
|
|
If you are building Xapian for an older CPU (before Pentium 4 or
|
|
Athlon 64), you need to add the --disable-sse flag to the configure
|
|
command. Otherwise all Xapian applications will crash with an illegal
|
|
instruction error.
|
|
|
|
o Development files for Qt 4. Recoll has not been tested with Qt 5 yet.
|
|
Recoll 1.15.9 was the last version to support Qt 3. If you do not want
|
|
to install or build the Qt Webkit module, Recoll has a configuration
|
|
option to disable its use (see further).
|
|
|
|
o Development files for X11 and zlib.
|
|
|
|
o You may also need libiconv. On Linux systems, the iconv interface is
|
|
part of libc and you should not need to do anything special.
|
|
|
|
Check the Recoll download page for up to date version information.
|
|
|
|
5.3.2. Building
|
|
|
|
Recoll has been built on Linux, FreeBSD, Mac OS X, and Solaris; most
|
|
versions after 2005 should be ok, maybe some older ones too (Solaris 8 is
|
|
ok). If you build on another system, and need to modify things, I would
|
|
very much welcome patches.
|
|
|
|
Configure options:
|
|
|
|
o --without-aspell will disable the code for phonetic matching of search
|
|
terms.
|
|
|
|
o --with-fam or --with-inotify will enable the code for real time
|
|
indexing. Inotify support is enabled by default on recent Linux
|
|
systems.
|
|
|
|
o --with-qzeitgeist will enable sending Zeitgeist events about the
|
|
visited search results, and needs the qzeitgeist package.
|
|
|
|
o --disable-webkit is available from version 1.17 to implement the
|
|
result list with a Qt QTextBrowser instead of a WebKit widget if you
|
|
do not or can't depend on the latter.
|
|
|
|
o --disable-idxthreads is available from version 1.19 to suppress
|
|
multithreading inside the indexing process. You can also use the
|
|
run-time configuration to restrict recollindex to using a single
|
|
thread, but the compile-time option may disable a few more unused
|
|
locks. This only applies to the use of multithreading for the core
|
|
index processing (data input). The Recoll monitor mode always uses at
|
|
least two threads of execution.
|
|
|
|
o --disable-python-module will avoid building the Python module.
|
|
|
|
o --disable-xattr will prevent fetching data from file extended
|
|
attributes. Beyond a few standard attributes, fetching extended
|
|
attributes data can only be useful if some application stores data in
|
|
there, and also needs some simple configuration (see comments in the
|
|
fields configuration file).
|
|
|
|
o --enable-camelcase will enable splitting camelCase words. This is not
|
|
enabled by default as it has the unfortunate side-effect of making
|
|
some phrase searches quite confusing: for example, "MySQL manual" would be
|
|
matched by "MySQL manual" and "my sql manual" but not "mysql manual"
|
|
(only inside phrase searches).
|
|
|
|
o --with-file-command Specify the version of the 'file' command to use
|
|
(e.g. --with-file-command=/usr/local/bin/file). Can be useful to enable
|
|
the gnu version on systems where the native one is bad.
|
|
|
|
o --disable-qtgui Disable the Qt interface. Will allow building the
|
|
indexer and the command line search program in absence of a Qt
|
|
environment.
|
|
|
|
o --disable-x11mon Disable X11 connection monitoring inside recollindex.
|
|
Together with --disable-qtgui, this allows building recoll without Qt
|
|
and X11.
|
|
|
|
o --disable-pic will compile Recoll with position-dependent code. This
|
|
is incompatible with building the KIO or the Python or PHP extensions,
|
|
but might yield very marginally faster code.
|
|
|
|
o Of course the usual autoconf configure options, like --prefix apply.
|
|
|
|
Normal procedure:
|
|
|
|
cd recoll-xxx
|
|
./configure
|
|
make
|
|
(practices usual hardship-repelling invocations)
|
|
|
|
|
|
There is little auto-configuration. The configure script will mainly link
|
|
one of the system-specific files in the mk directory to mk/sysconf. If
|
|
your system is not known yet, it will tell you as much, and you may want
|
|
to manually copy and modify one of the existing files (the new file name
|
|
should be the output of uname -s).
|
|
|
|
5.3.2.1. Building on Solaris
|
|
|
|
We did not test building the GUI on Solaris for recent versions. You will
|
|
need at least Qt 4.4. There are some hints on an old web site page; they
|
|
may still be valid.
|
|
|
|
Someone did test the 1.19 indexer and Python module build, they do work,
|
|
with a few minor glitches. Be sure to use GNU make and install.
|
|
|
|
5.3.3. Installation
|
|
|
|
Either type make install or execute recollinstall prefix, in the root of
|
|
the source tree. This will copy the commands to prefix/bin and the sample
|
|
configuration files, scripts and other shared data to prefix/share/recoll.
|
|
|
|
If the installation prefix given to recollinstall is different from either
|
|
the system default or the value which was specified when executing
|
|
configure (as in configure --prefix /some/path), you will have to set the
|
|
RECOLL_DATADIR environment variable to indicate where the shared data is
|
|
to be found (e.g. for (ba)sh: export
|
|
RECOLL_DATADIR=/some/path/share/recoll).
|
|
|
|
You can then proceed to configuration.
|
|
|
|
5.4. Configuration overview
|
|
|
|
Most of the parameters specific to the recoll GUI are set through the
|
|
Preferences menu and stored in the standard Qt place
|
|
($HOME/.config/Recoll.org/recoll.conf). You probably do not want to edit
|
|
this by hand.
|
|
|
|
Recoll indexing options are set inside text configuration files located in
|
|
a configuration directory. There can be several such directories, each of
|
|
which defines the parameters for one index.
|
|
|
|
The configuration files can be edited by hand or through the Index
|
|
configuration dialog (Preferences menu). The GUI tool will try to respect
|
|
your formatting and comments as much as possible, so it is quite possible
|
|
to use both ways.
|
|
|
|
The most accurate documentation for the configuration parameters is given
|
|
by comments inside the default files, and we will just give a general
|
|
overview here.
|
|
|
|
By default, for each index, there are two sets of configuration files.
|
|
System-wide configuration files are kept in a directory named like
|
|
/usr/[local/]share/recoll/examples, and define default values, shared by
|
|
all indexes. For each index, a parallel set of files defines the
|
|
customized parameters.
|
|
|
|
In addition (as of Recoll version 1.19.7), it is possible to specify two
|
|
additional configuration directories which will be stacked before and
|
|
after the user configuration directory. These are defined by the
|
|
RECOLL_CONFTOP and RECOLL_CONFMID environment variables. Values from
|
|
configuration files inside the top directory will override user ones,
|
|
values from configuration files inside the middle directory will override
|
|
system ones and be overridden by user ones. These two variables may be of
|
|
use to applications which augment Recoll functionality, and need to add
|
|
configuration data without disturbing the user's files. Please note that
|
|
the two, currently single, values will probably be interpreted as
|
|
colon-separated lists in the future: do not use colon characters inside
|
|
the directory paths.
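The resulting precedence can be pictured as a chain of lookups. The sketch below models it with Python's collections.ChainMap. The effective_config function and the sample values are hypothetical, and the real program reads files from directories rather than dictionaries. Only the ordering is taken from the text above.

```python
from collections import ChainMap


def effective_config(system, user, top=None, mid=None):
    """Stack configuration layers; earlier maps in a ChainMap win.

    Order modelled on the text: RECOLL_CONFTOP > user > RECOLL_CONFMID > system.
    """
    return ChainMap(top or {}, user, mid or {}, system)


# Sample values, for illustration only.
system = {"followLinks": "0", "nocjk": "0"}
mid = {"nocjk": "1", "indexallfilenames": "1"}
user = {"indexallfilenames": "0"}
top = {"followLinks": "1"}

conf = effective_config(system, user, top=top, mid=mid)
print(conf["followLinks"])        # value from the top directory wins over system
print(conf["nocjk"])              # value from the middle directory wins over system
print(conf["indexallfilenames"])  # user value wins over the middle directory
```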
|
|
|
|
The default location of the configuration is the .recoll directory in your
|
|
home. Most people will only use this directory.
|
|
|
|
This location can be changed, or others can be added with the
|
|
RECOLL_CONFDIR environment variable or the -c option parameter to recoll
|
|
and recollindex.
|
|
|
|
If the .recoll directory does not exist when recoll or recollindex are
|
|
started, it will be created with a set of empty configuration files.
|
|
recoll will give you a chance to edit the configuration file before
|
|
starting indexing. recollindex will proceed immediately. To avoid
|
|
mistakes, the automatic directory creation will only occur for the default
|
|
location, not if -c or RECOLL_CONFDIR were used (in the latter cases, you
|
|
will have to create the directory).
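The selection of the configuration directory described above can be summarized by the following Python sketch. The config_dir function is a hypothetical helper, not a Recoll interface. That -c takes precedence over RECOLL_CONFDIR when both are given is an assumption of the sketch; the text above does not state the order.

```python
import os
import posixpath


def config_dir(cli_option=None, environ=None, home=None):
    """Return (directory, may_autocreate) for a recoll or recollindex run."""
    environ = os.environ if environ is None else environ
    home = os.path.expanduser("~") if home is None else home
    if cli_option:                       # -c <dir> on the command line
        return cli_option, False         # assumption: -c wins over the variable
    if environ.get("RECOLL_CONFDIR"):    # environment variable
        return environ["RECOLL_CONFDIR"], False
    # Default location: the only one which is created automatically.
    return posixpath.join(home, ".recoll"), True


print(config_dir(None, {}, "/home/me"))
print(config_dir(None, {"RECOLL_CONFDIR": "/idx/work"}, "/home/me"))
```

The second element of the result records the rule given above: only the default directory is created on demand, so a directory named with -c or RECOLL_CONFDIR has to exist beforehand.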
|
|
|
|
All configuration files share the same format. For example, a short
|
|
extract of the main configuration file might look as follows:
|
|
|
|
# Space-separated list of directories to index.
|
|
topdirs = ~/docs /usr/share/doc
|
|
|
|
[~/somedirectory-with-utf8-txt-files]
|
|
defaultcharset = utf-8
|
|
|
|
|
|
There are three kinds of lines:
|
|
|
|
o Comment (starts with #) or empty.
|
|
|
|
o Parameter assignment (name = value).
|
|
|
|
o Section definition ([somedirname]).
|
|
|
|
Depending on the type of configuration file, section definitions either
|
|
separate groups of parameters or allow redefining some parameters for a
|
|
directory sub-tree. They stay in effect until another section definition,
|
|
or the end of file, is encountered. Some of the parameters used for
|
|
indexing are looked up hierarchically from the current directory location
|
|
upwards. Not all parameters can be meaningfully redefined; this is
|
|
specified for each in the next section.
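To make the file format and the upwards lookup concrete, here is a minimal Python sketch of a reader for such files. It is not Recoll's parser: parse_conf and lookup are hypothetical names, tilde expansion inside section names is left out, and the handling of the backslash line continuation is modelled on the multi-line values shown in the default files.

```python
import posixpath


def parse_conf(text):
    """Parse configuration text into {section: {name: value}}.

    The empty section name "" holds the global parameters.
    """
    sections = {"": {}}
    current = ""
    pending = ""
    for raw in text.splitlines():
        line = pending + raw.strip()
        pending = ""
        if line.endswith("\\"):           # continuation: join with the next line
            pending = line[:-1]
            continue
        if not line or line.startswith("#"):
            continue                       # comment or empty line
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1].strip()   # section definition
            sections.setdefault(current, {})
        elif "=" in line:
            name, value = line.split("=", 1)
            sections[current][name.strip()] = value.strip()
    return sections


def lookup(sections, name, directory):
    """Look a parameter up from `directory` upwards, then globally."""
    d = directory.rstrip("/") or "/"
    while True:
        if name in sections.get(d, {}):
            return sections[d][name]
        parent = posixpath.dirname(d)
        if parent == d or not parent:
            break
        d = parent
    return sections[""].get(name)


sample = """# Space-separated list of directories to index.
topdirs = /docs /usr/share/doc

[/docs/utf8]
defaultcharset = utf-8
"""
conf = parse_conf(sample)
print(lookup(conf, "defaultcharset", "/docs/utf8/sub/dir"))
print(lookup(conf, "defaultcharset", "/docs/other"))
```

A file below /docs/utf8 picks up the section value, while a file elsewhere falls back to the global value (here unset, so the sketch returns None).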
|
|
|
|
When found at the beginning of a file path, the tilde character (~) is
|
|
expanded to the name of the user's home directory, as a shell would do.
|
|
|
|
White space is used for separation inside lists. List elements with
|
|
embedded spaces can be quoted using double-quotes.
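The two rules above (tilde expansion and quoted list elements) behave much like their shell counterparts, so they can be approximated in Python with shlex.split. This is only an approximation: shlex also interprets single quotes and backslashes, which Recoll's own parser may treat differently. split_list and expand_tilde are hypothetical helper names.

```python
import shlex


def split_list(value):
    """Split a list value on white space, honouring double quotes."""
    return shlex.split(value)


def expand_tilde(path, home):
    """Expand a leading ~ to the home directory, as a shell would."""
    if path == "~" or path.startswith("~/"):
        return home + path[1:]
    return path


value = '~/docs "/mnt/My Documents" /usr/share/doc'
print([expand_tilde(p, "/home/me") for p in split_list(value)])
# ['/home/me/docs', '/mnt/My Documents', '/usr/share/doc']
```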
|
|
|
|
Encoding issues. Most of the configuration parameters are plain ASCII. Two
|
|
particular sets of values may cause encoding issues:
|
|
|
|
o File path parameters may contain non-ascii characters and should use
|
|
the exact same byte values as found in the file system directory.
|
|
Usually, this means that the configuration file should use the system
|
|
default locale encoding.
|
|
|
|
o The unac_except_trans parameter should be encoded in UTF-8. If your
|
|
system locale is not UTF-8, and you need to also specify non-ascii
|
|
file paths, this poses a difficulty because common text editors cannot
|
|
handle multiple encodings in a single file. In this relatively
|
|
unlikely case, you can edit the configuration file as two separate
|
|
text files with appropriate encodings, and concatenate them to create
|
|
the complete configuration.
|
|
|
|
5.4.1. Environment variables
|
|
|
|
RECOLL_CONFDIR
|
|
|
|
Defines the main configuration directory.
|
|
|
|
RECOLL_TMPDIR, TMPDIR
|
|
|
|
Locations for temporary files, in this order of priority. The
|
|
default if none of these is set is to use /tmp. Big temporary
|
|
files may be created during indexing, mostly for decompressing,
|
|
and also for processing, e.g. email attachments.
|
|
|
|
RECOLL_CONFTOP, RECOLL_CONFMID
|
|
|
|
Allow adding configuration directories with priorities above and
below the user directory, respectively (see the Configuration overview
section above for details).
|
|
|
|
RECOLL_EXTRA_DBS, RECOLL_ACTIVE_EXTRA_DBS
|
|
|
|
Help for setting up external indexes. See this paragraph for
|
|
explanations.
|
|
|
|
RECOLL_DATADIR
|
|
|
|
Defines replacement for the default location of Recoll data files,
|
|
normally found in, e.g., /usr/share/recoll.
|
|
|
|
RECOLL_FILTERSDIR
|
|
|
|
Defines replacement for the default location of Recoll filters,
|
|
normally found in, e.g., /usr/share/recoll/filters.
|
|
|
|
ASPELL_PROG
|
|
|
|
aspell program to use for creating the spelling dictionary. The
|
|
result has to be compatible with the libaspell which Recoll is
|
|
using.
|
|
|
|
|
|
5.4.2. The main configuration file, recoll.conf
|
|
|
|
recoll.conf is the main configuration file. It defines things like what to
|
|
index (top directories and things to ignore), and the default character
|
|
set to use for document types which do not specify it internally.
|
|
|
|
The default configuration will index your home directory. If this is not
|
|
appropriate, start recoll to create a blank configuration, click Cancel,
|
|
and edit the configuration file before restarting the command. This will
|
|
start the initial indexing, which may take some time.
|
|
|
|
Most of the following parameters can be changed from the Index
|
|
Configuration menu in the recoll interface. Some can only be set by
|
|
editing the configuration file.
|
|
|
|
5.4.2.1. Parameters affecting what documents we index:
|
|
|
|
topdirs
|
|
|
|
Specifies the list of directories or files to index (recursively
|
|
for directories). You can use symbolic links as elements of this
|
|
list. See the followLinks option about following symbolic links
|
|
found under the top elements (not followed by default).
|
|
|
|
skippedNames
|
|
|
|
A space-separated list of wildcard patterns for names of files or
|
|
directories that should be completely ignored. The list defined in
|
|
the default file is:
|
|
|
|
skippedNames = #* bin CVS Cache cache* caughtspam tmp .thumbnails .svn \
|
|
*~ .beagle .git .hg .bzr loop.ps .xsession-errors \
|
|
.recoll* xapiandb recollrc recoll.conf
|
|
|
|
The list can be redefined at any sub-directory in the indexed
|
|
area.
|
|
|
|
The top-level directories are not affected by this list (that is,
|
|
a directory in topdirs might match and would still be indexed).
|
|
|
|
The list in the default configuration does not exclude hidden
|
|
directories (names beginning with a dot), which means that it may
|
|
index quite a few things that you do not want. On the other hand,
|
|
email user agents like thunderbird usually store messages in
|
|
hidden directories, and you probably want this indexed. One
|
|
possible solution is to have .* in skippedNames, and add things
|
|
like ~/.thunderbird or ~/.evolution in topdirs.
|
|
|
|
Not even the file names are indexed for patterns in this list. See
|
|
the noContentSuffixes variable for an alternative approach which
|
|
indexes the file names.
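The effect of skippedNames can be tried out with Python's fnmatch module, used here as a stand-in for the wildcard matching done by the indexer. The pattern list is a subset of the default one shown above. is_skipped and should_index are hypothetical names.

```python
import fnmatch

# Subset of the default skippedNames value.
SKIPPED_NAMES = ["#*", "bin", "CVS", "Cache", "cache*", "*~", ".git", ".svn"]


def is_skipped(name, patterns=SKIPPED_NAMES):
    """True if a file or directory name matches one of the patterns."""
    return any(fnmatch.fnmatchcase(name, pat) for pat in patterns)


def should_index(name, is_topdir=False, patterns=SKIPPED_NAMES):
    """Entries listed in topdirs are indexed even if their name matches."""
    return is_topdir or not is_skipped(name, patterns)


for name in ["notes.txt", "notes.txt~", "cache2", ".git", "bin"]:
    print(name, should_index(name))
print("bin (as a topdirs entry)", should_index("bin", is_topdir=True))
```

Note that the patterns are applied to simple names, not to full paths. Path patterns belong in skippedPaths.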
|
|
|
|
noContentSuffixes
|
|
|
|
This is a list of file name endings (not wildcard expressions, nor
|
|
dot-delimited suffixes). Only the names of matching files will be
|
|
indexed (no attempt at MIME type identification, no decompression,
|
|
no content indexing). This can be redefined for subdirectories,
|
|
and edited from the GUI. The default value is:
|
|
|
|
noContentSuffixes = .md5 .map \
|
|
.o .lib .dll .a .sys .exe .com \
|
|
.mpp .mpt .vsd \
|
|
.img .img.gz .img.bz2 .img.xz .image .image.gz .image.bz2 .image.xz \
|
|
.dat .bak .rdf .log.gz .log .db .msf .pid \
|
|
,v ~ #
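Because the elements are plain name endings rather than patterns, the test amounts to a string suffix comparison. A minimal Python sketch, with a subset of the default list and a hypothetical names_only function:

```python
# Subset of the default noContentSuffixes value.
NO_CONTENT_SUFFIXES = (".md5", ".map", ".o", ".img.gz", ".log", ",v", "~", "#")


def names_only(filename, suffixes=NO_CONTENT_SUFFIXES):
    """True if only the file name (not the content) should be indexed."""
    return filename.endswith(tuple(suffixes))


for name in ["disk.img.gz", "main.c,v", "draft.txt~", "syslog", "report.pdf"]:
    print(name, names_only(name))
```

The ",v" and "~" entries show why the endings are not required to start with a dot, and "syslog" shows that ".log" only matches when the dot is present.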
|
|
|
|
skippedPaths and daemSkippedPaths
|
|
|
|
A space-separated list of patterns for paths of files or
|
|
directories that should be skipped. There is no default in the
|
|
sample configuration file, but the code always adds the
|
|
configuration and database directories in there.
|
|
|
|
skippedPaths is used both by batch and real time indexing.
|
|
daemSkippedPaths can be used to specify things that should be
|
|
indexed at startup, but not monitored.
|
|
|
|
Example of use for skipping text files only in a specific
|
|
directory:
|
|
|
|
skippedPaths = ~/somedir/*.txt
|
|
|
|
|
|
skippedPathsFnmPathname
|
|
|
|
The values in the *skippedPaths variables are matched by default
|
|
with fnmatch(3), with the FNM_PATHNAME flag. This means that '/'
|
|
characters must be matched explicitly. You can set
|
|
skippedPathsFnmPathname to 0 to disable the use of FNM_PATHNAME
|
|
(meaning that /*/dir3 will match /dir1/dir2/dir3).
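The difference made by FNM_PATHNAME can be illustrated in Python. The standard fnmatch module has no such flag, so the sketch below translates a pattern to a regular expression itself. It only handles * and ? (no bracket expressions) and is meant to show the semantics of the flag, not to reproduce the indexer's code. translate and path_matches are hypothetical names.

```python
import re


def translate(pattern, pathname=True):
    """Turn a glob pattern into a compiled regular expression.

    With pathname=True, wildcards never match a '/' (FNM_PATHNAME behaviour).
    """
    any_char = "[^/]" if pathname else "."
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(any_char + "*")
        elif ch == "?":
            parts.append(any_char)
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def path_matches(pattern, path, pathname=True):
    """True if the whole path matches the pattern."""
    return translate(pattern, pathname).fullmatch(path) is not None


print(path_matches("/*/dir3", "/dir1/dir3"))                       # one level: match
print(path_matches("/*/dir3", "/dir1/dir2/dir3"))                  # '*' stops at '/'
print(path_matches("/*/dir3", "/dir1/dir2/dir3", pathname=False))  # flag disabled
```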
|
|
|
|
zipSkippedNames
|
|
|
|
A space-separated list of patterns for names of files or
|
|
directories that should be ignored inside zip archives. This is
|
|
used directly by the zip handler, and has a function similar to
|
|
skippedNames, but works independently. Can be redefined for
|
|
filesystem subdirectories. For versions up to 1.19, you will need
|
|
to update the Zip handler and install a supplementary Python
|
|
module. The details are described on the Recoll wiki.
|
|
|
|
followLinks
|
|
|
|
Specifies if the indexer should follow symbolic links while
|
|
walking the file tree. The default is to ignore symbolic links to
|
|
avoid multiple indexing of linked files. No effort is made to
|
|
avoid duplication when this option is set to true. This option can
|
|
be set individually for each of the topdirs members by using
|
|
sections. It can not be changed below the topdirs level.
|
|
|
|
indexedmimetypes
|
|
|
|
Recoll normally indexes any file which it knows how to read. This
|
|
list lets you restrict the indexed MIME types to what you specify.
|
|
If the variable is unspecified or the list empty (the default),
|
|
all supported types are processed. Can be redefined for
|
|
subdirectories.
|
|
|
|
excludedmimetypes
|
|
|
|
This list lets you exclude some MIME types from indexing. Can be
|
|
redefined for subdirectories.
|
|
|
|
compressedfilemaxkbs
|
|
|
|
Size limit for compressed (.gz or .bz2) files. These need to be
|
|
decompressed in a temporary directory for identification, which
|
|
can be very wasteful if 'uninteresting' big compressed files are
|
|
present. Negative means no limit, 0 means no processing of any
|
|
compressed file. Defaults to -1.
|
|
|
|
textfilemaxmbs
|
|
|
|
Maximum size for text files. Very big text files are often
|
|
uninteresting logs. Set to -1 to disable (default 20MB).
|
|
|
|
textfilepagekbs
|
|
|
|
If set to other than -1, text files will be indexed as multiple
|
|
documents of the given page size. This may be useful if you do
|
|
want to index very big text files as it will both reduce memory
|
|
usage at index time and help with loading data to the preview
|
|
window. A size of a few megabytes would seem reasonable (default:
|
|
1MB).
|
|
|
|
membermaxkbs
|
|
|
|
This defines the maximum size in kilobytes for an archive member
|
|
(zip, tar or rar at the moment). Bigger entries will be skipped.
|
|
|
|
indexallfilenames
|
|
|
|
Recoll indexes file names in a special section of the database to
|
|
allow specific file name searches using wild cards. This
|
|
parameter decides if file name indexing is performed only for
|
|
files with MIME types that would qualify them for full text
|
|
indexing, or for all files inside the selected subtrees,
|
|
independently of MIME type.
|
|
|
|
usesystemfilecommand
|
|
|
|
Decide if we execute a system command (file -i by default) as a
|
|
final step for determining the MIME type for a file (the main
|
|
procedure uses suffix associations as defined in the mimemap
|
|
file). This can be useful for files with suffix-less names, but it
|
|
will also cause the indexing of many bogus "text" files.
|
|
|
|
systemfilecommand
|
|
|
|
Command to use for MIME type determination if
usesystemfilecommand is set. Recent versions of xdg-mime sometimes
|
|
work better than file.
|
|
|
|
processwebqueue
|
|
|
|
If this is set, process the directory where Web browser plugins
|
|
copy visited pages for indexing.
|
|
|
|
webqueuedir
|
|
|
|
The path to the web indexing queue. This is hard-coded in the
|
|
Firefox plugin as ~/.recollweb/ToIndex so there should be no need
|
|
to change it.
|
|
|
|
5.4.2.2. Parameters affecting how we generate terms:
|
|
|
|
Changing some of these parameters will imply a full reindex. Also, when
|
|
using multiple indexes, it may not make sense to search indexes that don't
|
|
share the values for these parameters, because they usually affect both
|
|
search and index operations.
|
|
|
|
indexStripChars
|
|
|
|
Decide if we strip characters of diacritics and convert them to
|
|
lower-case before terms are indexed. If we don't, searches
|
|
sensitive to case and diacritics can be performed, but the index
|
|
will be bigger, and some marginal weirdness may sometimes occur.
|
|
The default is a stripped index (indexStripChars = 1) for now.
|
|
When using multiple indexes for a search, this parameter must be
|
|
defined identically for all. Changing the value implies an index
|
|
reset.
|
|
|
|
maxTermExpand
|
|
|
|
Maximum expansion count for a single term (e.g.: when using
|
|
wildcards). The default of 10000 is reasonable and will avoid
|
|
queries that appear frozen while the engine is walking the term
|
|
list.
|
|
|
|
maxXapianClauses
|
|
|
|
Maximum number of elementary clauses we can add to a single Xapian
|
|
query. In some cases, the result of term expansion can be
|
|
multiplicative, and we want to avoid using excessive memory. The
|
|
default of 100 000 should be both high enough in most cases and
|
|
compatible with current typical hardware configurations.
|
|
|
|
nonumbers
|
|
|
|
If this is set to true, no terms will be generated for numbers. For
|
|
example "123", "1.5e6", 192.168.1.4, would not be indexed
|
|
("value123" would still be). Numbers are often quite interesting
|
|
to search for, and this should probably not be set except for
|
|
special situations, e.g. scientific documents with huge amounts of
|
|
numbers in them. This can only be set for a whole index, not for a
|
|
subtree.
|
|
|
|
nocjk
|
|
|
|
If this is set to true, specific East Asian (Chinese, Japanese, Korean)
|
|
characters/word splitting is turned off. This will save a small
|
|
amount of cpu if you have no CJK documents. If your document base
|
|
does include such text but you are not interested in searching it,
|
|
setting nocjk may be a significant time and space saver.
|
|
|
|
cjkngramlen
|
|
|
|
This lets you adjust the size of n-grams used for indexing CJK
|
|
text. The default value of 2 is probably appropriate in most
|
|
cases. A value of 3 would allow more precision and efficiency on
|
|
longer words, but the index will be approximately twice as large.
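For readers unfamiliar with n-gram indexing, the following Python sketch shows what overlapping n-grams of a run of CJK characters look like for the two sizes discussed. The ngrams function is illustrative only. How the indexer treats runs shorter than n, or whether it emits additional terms, is not stated here and is not modelled.

```python
def ngrams(text, n=2):
    """Return the overlapping n-grams of a run of CJK characters."""
    if len(text) < n:
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


print(ngrams("東京都庁", 2))  # ['東京', '京都', '都庁']
print(ngrams("東京都庁", 3))  # ['東京都', '京都庁']
```

With n = 2, a search for 京都 matches the text above although the text is about 東京都庁, which illustrates the loss of precision that a larger n reduces.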
|
|
|
|
indexstemminglanguages
|
|
|
|
A list of languages for which the stem expansion databases will be
|
|
built. See recollindex(1) or use the recollindex -l command for
|
|
possible values. You can add a stem expansion database for a
|
|
different language by using recollindex -s, but it will be deleted
|
|
during the next indexing. Only languages listed in the
|
|
configuration file are permanent.
|
|
|
|
defaultcharset
|
|
|
|
The name of the character set used for files that do not contain a
|
|
character set definition (e.g. plain text files). This can be
|
|
redefined for any sub-directory. If it is not set at all, the
|
|
character set used is the one defined by the nls environment (
|
|
LC_ALL, LC_CTYPE, LANG), or iso8859-1 if nothing is set.
|
|
|
|
unac_except_trans
|
|
|
|
This is a list of characters, encoded in UTF-8, which should be
|
|
handled specially when converting text to unaccented lowercase.
|
|
For example, in Swedish, the letter a with diaeresis has full
|
|
alphabet citizenship and should not be turned into an a. Each
|
|
element in the space-separated list has the special character as
|
|
first element and the translation following. The handling of both
|
|
the lowercase and upper-case versions of a character should be
|
|
specified, as membership in the list will turn off both standard
|
|
accent and case processing. Example for Swedish:
|
|
|
|
unac_except_trans = åå Åå ää Ää öö Öö
|
|
|
|
|
|
Note that the translation is not limited to a single character,
|
|
you could very well have something like üue in the list.
|
|
|
|
The default value set for unac_except_trans can't be listed here
|
|
because I have trouble with SGML and UTF-8, but it only contains
|
|
ligature decompositions: german ss, oe, ae, fi, fl.
|
|
|
|
This parameter can't be defined for subdirectories, it is global,
|
|
because there is no way to do otherwise when querying. If you have
|
|
document sets which would need different values, you will have to
|
|
index and query them separately.
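The effect of the exception list can be sketched in Python with the unicodedata module. This is a model of the behaviour described above, not the unac code used by the indexer. parse_exceptions and fold are hypothetical names, and the Swedish list is written with the actual letters (a with ring, a and o with diaeresis).

```python
import unicodedata


def parse_exceptions(value):
    """Map each special character to its translation.

    Each space-separated element is the character followed by its translation.
    """
    items = unicodedata.normalize("NFC", value).split()
    return {item[0]: item[1:] for item in items}


def fold(text, exceptions):
    """Lowercase and strip accents, except for the listed characters."""
    out = []
    for ch in unicodedata.normalize("NFC", text):
        if ch in exceptions:
            out.append(exceptions[ch])   # used as is, no further processing
            continue
        decomposed = unicodedata.normalize("NFD", ch)
        stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
        out.append(stripped.lower())
    return "".join(out)


swedish = parse_exceptions("åå Åå ää Ää öö Öö")
print(fold("Älg Café", swedish))  # älg cafe
print(fold("Älg Café", {}))       # alg cafe
```

The upper-case entries are needed because a listed character bypasses case folding as well as accent removal, which is why the list gives a lower-case translation for both forms.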
|
|
|
|
maildefcharset
|
|
|
|
This can be used to define the default character set specifically
|
|
for email messages which don't specify it. This is mainly useful
|
|
for readpst (libpst) dumps, which are utf-8 but do not say so.
|
|
|
|
localfields
|
|
|
|
This allows setting fields for all documents under a given
|
|
directory. Typical usage would be to set an "rclaptg" field, to be
|
|
used in mimeview to select a specific viewer. If several fields
|
|
are to be set, they should be separated with a semi-colon (';')
|
|
character, which there is currently no way to escape. Also note
|
|
the initial semi-colon. Example: localfields= ;rclaptg=gnus;other
|
|
= val, then select the specific viewer with mimetype|tag=... in
|
|
mimeview.
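The value syntax can be illustrated with a small Python sketch, which splits a localfields value into name/value pairs. This is only an illustration of the format described above; parse_localfields is a hypothetical helper, not a Recoll function.

```python
def parse_localfields(value):
    """Split a localfields value into a dict of field names and values."""
    fields = {}
    for item in value.split(";"):
        if not item.strip():
            # The initial semi-colon produces an empty first item.
            continue
        name, _, val = item.partition("=")
        fields[name.strip()] = val.strip()
    return fields


print(parse_localfields(" ;rclaptg=gnus;other = val"))
```

Since the semi-colon cannot be escaped, a value containing one would be cut at that point by a parser of this kind.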
|
|
|
|
testmodifusemtime
|
|
|
|
If true, use mtime instead of default ctime to determine if a file
|
|
has been modified (in addition to size, which is always used).
|
|
Setting this can reduce re-indexing on systems where extended
|
|
attributes are modified (by some other application), but not
|
|
indexed (changing extended attributes only affects ctime). Notes:
|
|
|
|
o This may prevent detection of change in some marginal file
|
|
rename cases (the target would need to have the same size and
|
|
mtime).
|
|
|
|
o You should probably also set noxattrfields to 1 in this case,
|
|
except if you still prefer to perform xattr indexing, for
|
|
example if the local file update pattern makes it of value
|
|
(as in general, there is a risk for pure extended attributes
|
|
updates without file modification to go undetected).
|
|
|
|
Perform a full index reset after changing the value of this
|
|
parameter.
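A rough Python sketch of the idea follows. It is not Recoll's actual code: file_signature and needs_reindex are hypothetical names, and the tuple layout of the signature is an assumption made for illustration.

```python
import os


def file_signature(path, use_mtime=False):
    """Return (size, timestamp) for path.

    The size is always part of the signature. The timestamp is ctime by
    default, or mtime when use_mtime is set (like testmodifusemtime).
    """
    st = os.stat(path)
    stamp = st.st_mtime if use_mtime else st.st_ctime
    return (st.st_size, int(stamp))


def needs_reindex(stored_signature, current_signature):
    """A file is considered modified when its signature changed."""
    return stored_signature != current_signature
```

Changing an extended attribute updates ctime but not mtime, so only the ctime-based signature would change in that case. Signatures computed with one setting cannot be compared with those computed with the other, which is why a full index reset is needed after changing the parameter.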
|
|
|
|
noxattrfields
|
|
|
|
Recoll versions 1.19 and later automatically translate file
|
|
extended attributes into document fields (to be processed
|
|
according to the parameters from the fields file). Setting this
|
|
variable to 1 will disable the behaviour.
|
|
|
|
metadatacmds
|
|
|
|
This allows executing external commands for each file and storing
|
|
the output in Recoll document fields. This could be used for
|
|
example to index external tag data. The value is a list of field
|
|
names and commands; don't forget the initial semi-colon. Example:
|
|
|
|
[/some/area/of/the/fs]
|
|
metadatacmds = ; tags = tmsu tags %f; otherfield = somecmd -xx %f
|
|
|
|
|
|
As a specially disgusting hack brought by Recoll 1.19.7, if a
|
|
"field name" begins with rclmulti, the data returned by the
|
|
command is expected to contain multiple field values, in
|
|
configuration file format. This allows setting several fields by
|
|
executing a single command. Example:
|
|
|
|
metadatacmds = ; rclmulti1 = somecmd %f
|
|
|
|
|
|
If somecmd returns data in the form of:
|
|
|
|
field1 = value1
|
|
field2 = value for field2
|
|
|
|
|
|
field1 and field2 will be set inside the document metadata.
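As an illustration, a command used with an rclmulti entry could look like the following Python sketch. Everything here is made up for the example: the script stands in for somecmd, and the two fields are derived from the file name only to have something to print.

```python
import os
import sys


def multi_field_lines(path):
    """Return 'name = value' lines, in configuration file format."""
    name = os.path.basename(path)
    stem, ext = os.path.splitext(name)
    fields = {
        "field1": stem,
        "field2": ext.lstrip(".") or "none",
    }
    return ["%s = %s" % (key, value) for key, value in fields.items()]


def main(argv):
    # argv[1] is the file path substituted for %f in metadatacmds.
    for line in multi_field_lines(argv[1]):
        print(line)
    return 0


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv)
```

A real command would of course extract its values from the file or from an external tag store rather than from the name.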
|
|
|
|
5.4.2.3. Parameters affecting where and how we store things:
|
|
|
|
dbdir
|
|
|
|
The name of the Xapian data directory. It will be created if
|
|
needed when the index is initialized. If this is not an absolute
|
|
path, it will be interpreted relative to the configuration
|
|
directory. The value can have embedded spaces but starting or
|
|
trailing spaces will be trimmed. You cannot use quotes here.
|
|
|
|
idxstatusfile
|
|
|
|
The name of the scratch file where the indexer process updates its
|
|
status. Default: idxstatus.txt inside the configuration directory.
|
|
|
|
maxfsoccuppc
|
|
|
|
Maximum file system occupation before we stop indexing. The value
|
|
is a percentage, corresponding to what the "Capacity" df output
|
|
column shows. The default value is 0, meaning no checking.
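The percentage can be approximated from Python as sketched below. This is an illustration under assumptions: occupation_percent and should_stop are hypothetical names, and comparing with "greater or equal" is a guess at the exact test, not something the description above states.

```python
import shutil


def occupation_percent(used, available):
    """Used space as a percentage of used plus available space."""
    if used + available == 0:
        return 0.0
    return 100.0 * used / (used + available)


def should_stop(path, maxfsoccuppc):
    """Return True if the file system holding path is too full."""
    if maxfsoccuppc <= 0:
        # 0 means no checking.
        return False
    usage = shutil.disk_usage(path)
    return occupation_percent(usage.used, usage.free) >= maxfsoccuppc
```

For example, should_stop("/home", 90) would report whether the file system holding /home is at least 90% full.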
|
|
|
|
mboxcachedir
|
|
|
|
The directory where mbox message offsets cache files are held.
|
|
This is normally $RECOLL_CONFDIR/mboxcache, but it may be useful
|
|
to share a directory between different configurations.
|
|
|
|
mboxcacheminmbs
|
|
|
|
The minimum mbox file size over which we cache the offsets. There
|
|
is really no sense in caching offsets for small files. The default
|
|
is 5 MB.
|
|
|
|
webcachedir
|
|
|
|
This is only used by the web browser plugin indexing code, and
|
|
defines where the cache for visited pages will live. Default:
|
|
$RECOLL_CONFDIR/webcache
|
|
|
|
webcachemaxmbs
|
|
|
|
This is only used by the web browser plugin indexing code, and
|
|
defines the maximum size for the web page cache. Default: 40 MB.
|
|
Quite unfortunately, this is only taken into account when creating
|
|
the cache file. You need to delete the file for a change to be
|
|
taken into account.
|
|
|
|
idxflushmb
|
|
|
|
Threshold (megabytes of new text data) where we flush from memory
|
|
to disk index. Setting this can help control memory usage. A value
|
|
of 0 means no explicit flushing, letting Xapian use its own
|
|
default, which is flushing every 10000 (or XAPIAN_FLUSH_THRESHOLD)
|
|
documents, which gives little memory usage control, as memory
|
|
usage also depends on average document size. The default value is
|
|
10, and it is probably a bit low. If your system usually has free
|
|
memory, you can try higher values between 20 and 80. In my
|
|
experience, values beyond 100 are always counterproductive.
|
|
|
|
5.4.2.4. Parameters affecting multithread processing
|
|
|
|
The Recoll indexing process recollindex can use multiple threads to speed
|
|
up indexing on multiprocessor systems. The work done to index files is
|
|
divided in several stages and some of the stages can be executed by
|
|
multiple threads. The stages are:
|
|
|
|
1. File system walking: this is always performed by the main thread.
|
|
2. File conversion and data extraction.
|
|
3. Text processing (splitting, stemming, etc.)
|
|
4. Xapian index update.
|
|
|
|
You can also read a longer document about the transformation of Recoll
|
|
indexing to multithreading.
|
|
|
|
The threads configuration is controlled by two configuration file
|
|
parameters.
|
|
|
|
thrQSizes
|
|
|
|
This variable defines the job input queues configuration. There
|
|
are three possible queues for stages 2, 3 and 4, and this
|
|
parameter should give the queue depth for each stage (three
|
|
integer values). If a value of -1 is used for a given stage, no
|
|
queue is used, and the thread will go on performing the next
|
|
stage. In practice, deep queues have not been shown to increase
|
|
performance. A value of 0 for the first queue tells Recoll to
|
|
perform autoconfiguration (no need for the two other values in
|
|
this case) - this is the default configuration.
|
|
|
|
thrTCounts
|
|
|
|
This defines the number of threads used for each stage. If a value
|
|
of -1 is used for one of the queue depths, the corresponding
|
|
thread count is ignored. It makes no sense to use a value other
|
|
than 1 for the last stage because updating the Xapian index is
|
|
necessarily single-threaded (and protected by a mutex).
|
|
|
|
The following example would use three queues (of depth 2), and 4 threads
|
|
for converting source documents, 2 for processing their text, and one to
|
|
update the index. This was tested to be the best configuration on the test
|
|
system (a quad-processor machine with multiple disks).
|
|
|
|
thrQSizes = 2 2 2
|
|
thrTCounts = 4 2 1
|
|
|
|
The following example would use a single queue, and the complete
|
|
processing for each document would be performed by a single thread
|
|
(several documents will still be processed in parallel in most cases). The
|
|
threads will use mutual exclusion when entering the index update stage. In
|
|
practice the performance would be close to the previous case in general,
|
|
but worse in certain cases (e.g. a Zip archive would be processed purely
|
|
sequentially), so the previous approach is preferred. YMMV... The last 2
|
|
values for thrTCounts are ignored.
|
|
|
|
thrQSizes = 2 -1 -1
|
|
thrTCounts = 6 1 1
|
|
|
|
The following example would disable multithreading. Indexing will be
|
|
performed by a single thread.
|
|
|
|
thrQSizes = -1 -1 -1
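The three examples can be summarized with a small Python sketch which computes, for a given pair of values, which stages end up being run by the same threads. This is one reading of the rules above, not the indexer's code: stage_groups is a hypothetical helper, and a stage without a queue is assumed to be merged with the stage before it.

```python
def stage_groups(thr_q_sizes, thr_t_counts=""):
    """Return a list of (stages, thread_count) groups.

    Stage 1 (file system walking) always belongs to the main thread.
    Returns None when the first queue depth is 0 (autoconfiguration).
    """
    qsizes = [int(v) for v in thr_q_sizes.split()]
    if qsizes and qsizes[0] == 0:
        return None
    if len(qsizes) != 3:
        raise ValueError("expected three queue depths")
    tcounts = [int(v) for v in thr_t_counts.split()]
    tcounts += [1] * (3 - len(tcounts))

    groups = [([1], 1)]
    for stage, depth, count in zip((2, 3, 4), qsizes, tcounts):
        if depth == -1:
            # No queue: the previous group's threads go on with this stage.
            groups[-1][0].append(stage)
        else:
            groups.append(([stage], count))
    return groups


print(stage_groups("2 2 2", "4 2 1"))
print(stage_groups("2 -1 -1", "6 1 1"))
print(stage_groups("-1 -1 -1"))
```

With this reading, the second example yields one group of 6 threads performing stages 2 to 4, and the third one a single group where the main thread does everything.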
|
|
|
|
5.4.2.5. Miscellaneous parameters:
|
|
|
|
autodiacsens
|
|
|
|
If the index is not stripped, decide if we automatically trigger
|
|
diacritics sensitivity if the search term has accented characters
|
|
(not in unac_except_trans). Else you need to use the query
|
|
language and the D modifier to specify diacritics sensitivity.
|
|
Default is no.
|
|
|
|
autocasesens
|
|
|
|
If the index is not stripped, decide if we automatically trigger
|
|
character case sensitivity if the search term has upper-case
|
|
characters in any but the first position. Else you need to use the
|
|
query language and the C modifier to specify character-case
|
|
sensitivity. Default is yes.
|
|
|
|
loglevel,daemloglevel
|
|
|
|
Verbosity level for recoll and recollindex. A value of 4 lists
|
|
quite a lot of debug/information messages. 2 only lists errors.
|
|
The daem-prefixed version is specific to the indexing monitor daemon.
|
|
|
|
logfilename, daemlogfilename
|
|
|
|
Where the messages should go. 'stderr' can be used as a special
|
|
value, and is the default. The daem-prefixed version is specific to the
|
|
indexing monitor daemon.
|
|
|
|
checkneedretryindexscript
|
|
|
|
This defines the name for a command executed by recollindex when
|
|
starting indexing. If the exit status of the command is 0,
|
|
recollindex retries to index all files which previously could not
|
|
be indexed because of data extraction errors. The default value is
|
|
a script which checks if any of the common bin directories have
|
|
changed (indicating that a helper program may have been
|
|
installed).
|
|
|
|
mondelaypatterns
|
|
|
|
This allows specifying wildcard path patterns (processed with
|
|
fnmatch(3) with no flags), to match files which change too often and
|
|
for which a delay should be observed before re-indexing. This is a
|
|
space-separated list, each entry being a pattern and a time in
|
|
seconds, separated by a colon. You can use double quotes if a path
|
|
entry contains white space. Example:
|
|
|
|
mondelaypatterns = *.log:20 "this one has spaces*:10"
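The following Python sketch shows how such a value could be split and matched. It is an approximation for illustration only: parse_mondelaypatterns and delay_for are hypothetical names, shlex is used to handle the double quotes, and Python's fnmatchcase stands in for fnmatch(3).

```python
import fnmatch
import shlex


def parse_mondelaypatterns(value):
    """Return a list of (pattern, seconds) from a mondelaypatterns value."""
    entries = []
    for item in shlex.split(value):
        # The delay follows the last colon of each entry.
        pattern, _, seconds = item.rpartition(":")
        entries.append((pattern, int(seconds)))
    return entries


def delay_for(path, entries):
    """Return the delay for the first matching pattern, or 0 if none match."""
    for pattern, seconds in entries:
        if fnmatch.fnmatchcase(path, pattern):
            return seconds
    return 0


entries = parse_mondelaypatterns('*.log:20 "this one has spaces*:10"')
print(delay_for("/var/log/app.log", entries))
```

Note that the patterns apply to the whole path, so a "*" also matches across directory separators here.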
|
|
|
|
|
|
monixinterval
|
|
|
|
Minimum interval (seconds) for processing the indexing queue. The
|
|
real time monitor does not process each event when it comes in,
|
|
but will wait this time for the queue to accumulate to diminish
|
|
overhead and in order to aggregate multiple events to the same
|
|
file. Default: 30 seconds.
|
|
|
|
monauxinterval
|
|
|
|
Period (in seconds) at which the real time monitor will regenerate
|
|
the auxiliary databases (spelling, stemming) if needed. The
|
|
default is one hour.
|
|
|
|
monioniceclass, monioniceclassdata
|
|
|
|
These allow defining the ionice class and data used by the indexer
|
|
(default class 3, no data).
|
|
|
|
filtermaxseconds
|
|
|
|
Maximum handler execution time, after which it is aborted. Some
|
|
PostScript programs just loop...
|
|
|
|
filtermaxmbytes
|
|
|
|
Recoll 1.20.7 and later. Maximum handler memory utilisation. This
|
|
uses setrlimit(RLIMIT_AS) on most systems (total virtual memory
|
|
space size limit). Some programs may start with 500 MBytes of
|
|
mapped shared libraries, so take this into account when choosing a
|
|
value. The default is a liberal 2000MB.
|
|
|
|
filtersdir
|
|
|
|
A directory to search for the external input handler scripts used
|
|
to index some types of files. The value should not be changed,
|
|
except if you want to modify one of the default scripts. The value
|
|
can be redefined for any sub-directory.
|
|
|
|
iconsdir
|
|
|
|
The name of the directory where recoll result list icons are
|
|
stored. You can change this if you want different images.
|
|
|
|
idxabsmlen
|
|
|
|
Recoll stores an abstract for each indexed file inside the
|
|
database. The text can come from an actual 'abstract' section in
|
|
the document or will just be the beginning of the document. It is
|
|
stored in the index so that it can be displayed inside the result
|
|
lists without decoding the original file. The idxabsmlen parameter
|
|
defines the size of the stored abstract. The default value is 250
|
|
bytes. The search interface gives you the choice to display this
|
|
stored text or a synthetic abstract built by extracting text
|
|
around the search terms. If you always prefer the synthetic
|
|
abstract, you can reduce this value and save a little space.
|
|
|
|
idxmetastoredlen
|
|
|
|
Maximum stored length for metadata fields. This does not affect
|
|
indexing (the whole field is processed anyway), just the amount of
|
|
data stored in the index for the purpose of displaying fields
|
|
inside result lists or previews. The default value is 150 bytes
|
|
which may be too low if you have custom fields.
|
|
|
|
aspellLanguage
|
|
|
|
Language definitions to use when creating the aspell dictionary.
|
|
The value must match a set of aspell language definition files.
|
|
You can type "aspell config" to see where these are installed
|
|
(look for data-dir). The default if the variable is not set is to
|
|
use your desktop national language environment to guess the value.
|
|
|
|
noaspell
|
|
|
|
If this is set, the aspell dictionary generation is turned off.
|
|
Useful for cases where you don't need the functionality or when it
|
|
is unusable because aspell crashes during dictionary generation.
|
|
|
|
mhmboxquirks
|
|
|
|
This allows defining location-related quirks for the mailbox
|
|
handler. Currently only the tbird flag is defined, and it should
|
|
be set for directories which hold Thunderbird data, as their
|
|
folder format is weird.
|
|
|
|
5.4.3. The fields file
|
|
|
|
This file contains information about dynamic fields handling in Recoll.
|
|
Some very basic fields have hard-wired behaviour, and, mostly, you should
|
|
not change the original data inside the fields file. But you can create
|
|
custom fields fitting your data and handle them just like they were native
|
|
ones.
|
|
|
|
The fields file has several sections, which each define an aspect of
|
|
fields processing. Quite often, you'll have to modify several sections to
|
|
obtain the desired behaviour.
|
|
|
|
We will only give a short description here; you should refer to the
|
|
comments inside the default file for more detailed information.
|
|
|
|
Field names should be lowercase alphabetic ASCII.
|
|
|
|
[prefixes]
|
|
|
|
A field becomes indexed (searchable) by having a prefix defined in
|
|
this section.
|
|
|
|
[stored]
|
|
|
|
A field becomes stored (displayable inside results) by having its
|
|
name listed in this section (typically with an empty value).
|
|
|
|
[aliases]
|
|
|
|
This section defines lists of synonyms for the canonical names
|
|
used inside the [prefixes] and [stored] sections.
|
|
|
|
[queryaliases]
|
|
|
|
This section also defines aliases for the canonical field names,
|
|
with the difference that the substitution will only be used at
|
|
query time, avoiding any possibility that the value would pick-up
|
|
random metadata from documents.
|
|
|
|
handler-specific sections
|
|
|
|
Some input handlers may need specific configuration for handling
|
|
fields. Only the email message handler currently has such a
|
|
section (named [mail]). It allows indexing arbitrary email headers
|
|
in addition to the ones indexed by default. Other such sections
|
|
may appear in the future.
|
|
|
|
Here follows a small example of a personal fields file. This would extract
|
|
a specific email header and use it as a searchable field, with data
|
|
displayable inside result lists. (Side note: as the email handler does no
|
|
decoding on the values, only plain ASCII headers can be indexed, and only
|
|
the first occurrence will be used for headers that occur several times).
|
|
|
|
[prefixes]
|
|
# Index mailmytag contents (with the given prefix)
|
|
mailmytag = XMTAG
|
|
|
|
[stored]
|
|
# Store mailmytag inside the document data record (so that it can be
|
|
# displayed - as %(mailmytag) - in result lists).
|
|
mailmytag =
|
|
|
|
[queryaliases]
|
|
filename = fn
|
|
containerfilename = cfn
|
|
|
|
[mail]
|
|
# Extract the X-My-Tag mail header, and use it internally with the
|
|
# mailmytag field name
|
|
x-my-tag = mailmytag
|
|
|
|
5.4.3.1. Extended attributes in the fields file
|
|
|
|
Recoll versions 1.19 and later process user extended file attributes as
|
|
document fields by default.
|
|
|
|
Attributes are processed as fields of the same name, after removing the
|
|
user prefix on Linux.
|
|
|
|
The [xattrtofields] section of the fields file allows specifying
|
|
translations from extended attributes names to Recoll field names. An
|
|
empty translation disables use of the corresponding attribute data.
|
|
|
|
5.4.4. The mimemap file
|
|
|
|
mimemap specifies the file name extension to MIME type mappings.
|
|
|
|
For file names without an extension, or with an unknown one, the system's
|
|
file -i command will be executed to determine the MIME type (this can be
|
|
switched off inside the main configuration file).
|
|
|
|
The mappings can be specified on a per-subtree basis, which may be useful
|
|
in some cases. Example: gaim logs have a .txt extension but should be
|
|
handled specially, which is possible because they are usually all located
|
|
in one place.
|
|
|
|
The recoll_noindex mimemap variable has been moved to recoll.conf and
|
|
renamed to noContentSuffixes, while keeping the same function, as of
|
|
Recoll version 1.21. For older Recoll versions, see the documentation for
|
|
noContentSuffixes but use recoll_noindex in mimemap.
|
|
|
|
5.4.5. The mimeconf file
|
|
|
|
mimeconf specifies how the different MIME types are handled for indexing,
|
|
and which icons are displayed in the recoll result lists.
|
|
|
|
Changing the parameters in the [index] section is probably not a good idea
|
|
except if you are a Recoll developer.
|
|
|
|
The [icons] section allows you to change the icons which are displayed by
|
|
recoll in the result lists (the values are the basenames of the PNG images
|
|
inside the iconsdir directory, specified in recoll.conf).
|
|
|
|
5.4.6. The mimeview file
|
|
|
|
mimeview specifies which programs are started when you click on an Open
|
|
link in a result list. For example, HTML is normally displayed using firefox, but
|
|
you may prefer Konqueror, or your openoffice.org program might be named
|
|
ooffice instead of openoffice, etc.
|
|
|
|
Changes to this file can be done by direct editing, or through the recoll
|
|
GUI preferences dialog.
|
|
|
|
If Use desktop preferences to choose document editor is checked in the
|
|
Recoll GUI preferences, all mimeview entries will be ignored except the
|
|
one labelled application/x-all (which is set to use xdg-open by default).
|
|
|
|
In this case, the xallexcepts top level variable defines a list of MIME
|
|
type exceptions which will be processed according to the local entries
|
|
instead of being passed to the desktop. This is so that specific Recoll
|
|
options such as a page number or a search string can be passed to
|
|
applications that support them, such as the evince viewer.
|
|
|
|
As for the other configuration files, the normal usage is to have a
|
|
mimeview inside your own configuration directory, with just the
|
|
non-default entries, which will override those from the central
|
|
configuration file.
|
|
|
|
All viewer definition entries must be placed under a [view] section.
|
|
|
|
The keys in the file are normally MIME types. You can add an application
|
|
tag to specialize the choice for an area of the filesystem (using a
|
|
localfields specification in recoll.conf). The syntax for the key is
|
|
mimetype|tag
|
|
|
|
The nouncompforviewmts entry (placed at the top level, outside of the
|
|
[view] section), holds a list of MIME types that should not be
|
|
uncompressed before starting the viewer (if they are found compressed, e.g.
|
|
mydoc.doc.gz).
|
|
|
|
The right side of each assignment holds a command to be executed for
|
|
opening the file. The following substitutions are performed:
|
|
|
|
o %D. Document date
|
|
|
|
o %f. File name. This may be the name of a temporary file if it was
|
|
necessary to create one (e.g. to extract a subdocument from a
|
|
container).
|
|
|
|
o %i. Internal path, for subdocuments of containers. The format depends
|
|
on the container type. If this appears in the command line, Recoll
|
|
will not create a temporary file to extract the subdocument, expecting
|
|
the called application (possibly a script) to be able to handle it.
|
|
|
|
o %M. MIME type
|
|
|
|
o %p. Page index. Only significant for a subset of document types,
|
|
currently only PDF, PostScript and DVI files. Can be used to start the
|
|
editor at the right page for a match or snippet.
|
|
|
|
o %s. Search term. The value will only be set for documents with indexed
|
|
page numbers (e.g. PDF). The value will be one of the matched search
|
|
terms. It would allow pre-setting the value in the "Find" entry inside
|
|
Evince for example, for easy highlighting of the term.
|
|
|
|
o %u. URL.
|
|
|
|
In addition to the predefined values above, all strings like %(fieldname)
|
|
will be replaced by the value of the field named fieldname for the
|
|
document. This could be used in combination with field customisation to
|
|
help with opening the document.
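The substitution rules can be sketched in Python as follows. This is a simplified illustration, not what recoll does internally: expand_command is a hypothetical helper, the --title option of blobviewer is invented, and no quoting of the substituted values is attempted.

```python
import re


def expand_command(template, values, fields):
    """Expand %X codes from values and %(fieldname) from fields.

    values maps the single-letter codes (D, f, i, M, p, s, u) to strings.
    Codes with no value are left untouched; unknown fields expand to
    an empty string.
    """
    def replace(match):
        if match.group(1) is not None:
            return fields.get(match.group(1), "")
        return values.get(match.group(2), match.group(0))

    return re.sub(r"%\((\w+)\)|%([DfiMpsu])", replace, template)


print(expand_command("blobviewer --title %(title) %f",
                     {"f": "/tmp/x.blob"}, {"title": "Report"}))
```

A real implementation also has to deal with file names and field values containing spaces or quotes, which this sketch ignores.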
|
|
|
|
5.4.7. The ptrans file
|
|
|
|
ptrans specifies query-time path translations. These can be useful in
|
|
multiple cases.
|
|
|
|
The file has a section for any index which needs translations, either the
|
|
main one or additional query indexes. The sections are named with the
|
|
Xapian index directory names. No slash character should exist at the end
|
|
of the paths (all comparisons are textual). An example should make things
|
|
sufficiently clear:
|
|
|
|
[/home/me/.recoll/xapiandb]
|
|
/this/directory/moved = /to/this/place
|
|
|
|
[/path/to/additional/xapiandb]
|
|
/server/volume1/docdir = /net/server/volume1/docdir
|
|
/server/volume2/docdir = /net/server/volume2/docdir
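The effect of one section can be sketched in Python as a textual prefix replacement. This is only a guess at the logic for illustration: translate_path is a hypothetical name, and matching only at a directory boundary is a choice made for this sketch.

```python
def translate_path(path, translations):
    """Replace the first matching path prefix, comparing text only."""
    for old, new in translations.items():
        if path == old or path.startswith(old + "/"):
            return new + path[len(old):]
    return path


# Translations for one index, as in the second section above.
section = {
    "/server/volume1/docdir": "/net/server/volume1/docdir",
    "/server/volume2/docdir": "/net/server/volume2/docdir",
}
print(translate_path("/server/volume1/docdir/a/b.txt", section))
```

Because the comparison is textual, a key written with a trailing slash would never match in this sketch, which is consistent with the advice above.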
|
|
|
|
|
|
5.4.8. Examples of configuration adjustments
|
|
|
|
5.4.8.1. Adding an external viewer for a non-indexed type
|
|
|
|
Imagine that you have some kind of file which does not have indexable
|
|
content, but for which you would like to have a functional Open link in
|
|
the result list (when found by file name). The file names end in .blob and
|
|
can be displayed by application blobviewer.
|
|
|
|
You need two entries in the configuration files for this to work:
|
|
|
|
o In $RECOLL_CONFDIR/mimemap (typically ~/.recoll/mimemap), add the
|
|
following line:
|
|
|
|
.blob = application/x-blobapp
|
|
|
|
Note that the MIME type is made up here, and you could call it
|
|
diesel/oil just the same.
|
|
|
|
o In $RECOLL_CONFDIR/mimeview under the [view] section, add:
|
|
|
|
application/x-blobapp = blobviewer %f
|
|
|
|
We are assuming that blobviewer wants a file name parameter here; you
|
|
would use %u if it liked URLs better.
|
|
|
|
If you just wanted to change the application used by Recoll to display a
|
|
MIME type which it already knows, you would just need to edit mimeview.
|
|
The entries you add in your personal file override those in the central
|
|
configuration, which you do not need to alter. mimeview can also be
|
|
modified from the GUI.
|
|
|
|
5.4.8.2. Adding indexing support for a new file type
|
|
|
|
Let us now imagine that the above .blob files actually contain indexable
|
|
text and that you know how to extract it with a command line program.
|
|
Getting Recoll to index the files is easy. You need to perform the above
|
|
alteration, and also to add data to the mimeconf file (typically in
|
|
~/.recoll/mimeconf):
|
|
|
|
o Under the [index] section, add the following line (more about the
|
|
rclblob indexing script later):
|
|
|
|
application/x-blobapp = exec rclblob
|
|
|
|
o Under the [icons] section, you should choose an icon to be displayed
|
|
for the files inside the result lists. Icons are normally 64x64 pixels
|
|
PNG files which live in /usr/[local/]share/recoll/images.
|
|
|
|
o Under the [categories] section, you should add the MIME type where it
|
|
makes sense (you can also create a category). Categories may be used
|
|
for filtering in advanced search.
|
|
|
|
The rclblob handler should be an executable program or script which exists
|
|
inside /usr/[local/]share/recoll/filters. It will be given a file name as
|
|
argument and should output the text or HTML contents on the standard
|
|
output.
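A very small handler of this kind could look like the Python sketch below. It rests on assumptions: the .blob format is imaginary, so the "extraction" simply decodes the bytes as UTF-8, and blob_to_html is a hypothetical name. Only the general contract (file name as argument, HTML on standard output) comes from the description above.

```python
import html
import sys


def blob_to_html(data):
    """Wrap the text extracted from the blob bytes in a minimal HTML page."""
    # Assumption: the blob contents can be read as UTF-8 text.
    text = data.decode("utf-8", errors="replace")
    return (
        "<html><head>"
        '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'
        "</head><body><pre>%s</pre></body></html>" % html.escape(text)
    )


def main(argv):
    # argv[1] is the name of the file to be indexed.
    with open(argv[1], "rb") as blob:
        sys.stdout.write(blob_to_html(blob.read()))
    return 0


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv)
```

Escaping the text matters when producing HTML, as characters like < or & in the data would otherwise be taken for markup.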
|
|
|
|
The filter programming section describes in more detail how to write an
|
|
input handler.
|
|
|
|