More documentation can be found in the doc/ directory or at
http://www.recoll.org

Recoll user manual

--------------------------------------------------------------------------

Chapter 4. Installation

Table of Contents

4.1. Installing a prebuilt copy
4.2. Supporting packages
4.3. Building from source
4.4. Configuration overview

4.1. Installing a prebuilt copy

Recoll binary installations are always linked statically to the Xapian
libraries, and have no other dependencies. You will only have to check or
install supporting applications for the file types that you want to index
beyond text, HTML and mail files.

4.1.1. Installing through a package system

If you use a BSD-type port system or a prebuilt package (RPM or other),
just follow the usual procedure, and maybe have a look at the
configuration section (this may not be necessary for a quick test with
default parameters).

4.1.2. Installing a prebuilt Recoll

The unpackaged binary versions are just compressed tar files of a build
tree, where only the useful parts were kept (executables and sample
configuration).

The executable binary files are built with a static link to libxapian and
libiconv, to make installation easier (no dependencies). However, this
also means that you cannot change the versions which are used.

After extracting the tar file, you can proceed with installation as if you
had built the package from source (that is, just type make install). The
binary trees are built for installation to /usr/local.

You may then need to install external applications to process some file
types that you want indexed (e.g. Acrobat, PostScript ...). See the next
section.

Finally, you may want to have a look at the configuration section.

4.2. Supporting packages

Recoll uses external applications to index some file types. You need to
install them for the file types that you wish to have indexed (these are
run-time dependencies; none is needed for building Recoll):

* OpenOffice: supported natively, but needs the unzip command to be
  installed.

* PDF: pdftotext, which is part of the Xpdf package.

* PostScript: pstotext.

* MS Word: antiword.

* MS Excel and PowerPoint: catdoc.

* RTF: unrtf.

* DVI: dvips.

* DjVu: DjVuLibre.

* MP3: Recoll will use the id3info command from the id3lib package to
  extract tag information. Without it, only the file names will be
  indexed.

Text, HTML, mail folders, OpenOffice and Scribus files are processed
internally. LyX is used to index LyX files. Many filters need sed and awk.
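
A quick way to see which of these helpers are still missing is to look
for each command in PATH. The following is only a sketch: the function
name missing_helpers is made up for this example, and the list holds only
the command names quoted above (DjVuLibre and LyX are left out because
the text does not name their commands).

```shell
#!/bin/sh
# Print the names of the given commands that are not found in PATH.
missing_helpers() {
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            printf '%s\n' "$cmd"
        fi
    done
}

# Helper commands named in the list above, plus sed and awk.
missing_helpers unzip pdftotext pstotext antiword catdoc unrtf dvips \
    id3info sed awk
```

Any name that is printed corresponds to a file type that will not be
fully indexed until the matching package is installed.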

4.3. Building from source

4.3.1. Prerequisites

At the very least, you will need to download and install the Xapian core
package (Recoll development currently uses version 0.9.5), and the Qt
run-time and development packages (Recoll development currently uses
version 3.3.5, but any 3.3 version is probably OK).

You will most probably be able to find a binary Qt package for your
system. You may have to compile Xapian, but this is not difficult (if you
are using FreeBSD, there is a port).

You may also need libiconv. Recoll currently uses version 1.9 (this should
not be critical). On Linux systems, the iconv interface is part of libc
and you should not need to do anything special.

4.3.2. Building

Recoll has been built on Linux (Red Hat 7.3, Mandriva 2005/6, Fedora Core
3/4/5), FreeBSD and Solaris 8. If you build on another system, I would
very much welcome patches.

Depending on the Qt configuration on your system, you may have to set the
QTDIR and QMAKESPECS variables in your environment:

* QTDIR should point to the directory above the one that holds the Qt
  include files (e.g. if qt.h is /usr/local/qt/include/qt.h, QTDIR should
  be /usr/local/qt).

* QMAKESPECS should be set to the name of one of the Qt mkspecs
  sub-directories (e.g. linux-g++).

On many Linux systems, QTDIR is set by the login scripts, and QMAKESPECS
is not needed because there is a default link in mkspecs/.
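
If the variables do have to be set by hand, something like the following
could be used before running configure. This is a sketch, not part of the
Recoll distribution: the is_qtdir helper is invented for the example, and
/usr/local/qt and linux-g++ are simply the sample values from the text.

```shell
#!/bin/sh
# Return success if the directory looks like a valid QTDIR, that is, if it
# holds include/qt.h as described above.
is_qtdir() {
    [ -f "$1/include/qt.h" ]
}

# /usr/local/qt is the example location from the text; adjust as needed.
candidate=/usr/local/qt
if is_qtdir "$candidate"; then
    QTDIR=$candidate
    QMAKESPECS=linux-g++   # name of one of the mkspecs sub-directories
    export QTDIR QMAKESPECS
    echo "QTDIR=$QTDIR QMAKESPECS=$QMAKESPECS"
else
    echo "qt.h not found under $candidate/include" >&2
fi
```

Checking for qt.h first avoids exporting a QTDIR that points to the
include directory itself rather than to the directory above it.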

Configure options: --without-aspell will disable the code for phonetic
matching of search terms. --with-fam or --with-inotify will enable the
code for real time indexing. Inotify support is enabled by default on
recent Linux systems.

Normal procedure:

    cd recoll-xxx
    ./configure
    make
    (practices usual hardship-repelling invocations)

There is little auto-configuration. The configure script will mainly link
one of the system-specific files in the mk directory to mk/sysconf. If
your system is not known yet, it will tell you as much, and you may want
to manually copy and modify one of the existing files (the new file name
should be the output of uname -s).

4.3.3. Installation

Either type make install or execute recollinstall prefix, in the root of
the source tree. This will copy the commands to prefix/bin and the sample
configuration files, scripts and other shared data to prefix/share/recoll.

If the installation prefix given to recollinstall is different from what
was specified when executing configure, you will have to set the
RECOLL_DATADIR environment variable to indicate where the shared data is
to be found.
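
For instance, assuming that recollinstall was given the hypothetical
prefix /opt/recoll, and given that the shared data is copied to
prefix/share/recoll, the variable could be set as sketched here:

```shell
#!/bin/sh
# Hypothetical prefix passed to recollinstall, different from the one
# given to configure.
prefix=/opt/recoll

# recollinstall copies the shared data to prefix/share/recoll.
RECOLL_DATADIR="$prefix/share/recoll"
export RECOLL_DATADIR
echo "$RECOLL_DATADIR"
```

The assignment would normally go into a login script so that both recoll
and recollindex see it.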

You can then proceed to configuration.

4.4. Configuration overview

Most of the parameters specific to the recoll GUI are set through the
Preferences menu and stored in the standard Qt place ($HOME/.qt/recollrc).
You probably do not want to edit this by hand.

For other options, Recoll uses text configuration files. You will have to
edit them by hand for now (there is still some hope for a GUI
configuration tool in the future). The most accurate documentation for the
configuration parameters is given by comments inside the default files,
and we will just give a general overview here.

There are two sets of configuration files. The system-wide files are kept
in a directory named like /usr/[local/]share/recoll/examples; they define
default values for the system. A parallel set of files exists by default
in the .recoll directory in your home. This directory can be changed with
the RECOLL_CONFDIR environment variable or the -c option parameter to
recoll and recollindex.

If the .recoll directory does not exist when recoll or recollindex is
started, it will be created with a set of empty configuration files.
recoll will give you a chance to edit the configuration file before
starting indexing. recollindex will proceed immediately. To avoid
mistakes, the automatic directory creation will only occur for the default
location, not if -c or RECOLL_CONFDIR were used (in the latter cases, you
will have to create the directory).
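
Preparing a non-default configuration directory could look like the
following sketch. The directory is a throwaway temporary location chosen
for illustration, and the recollindex command is only displayed, not
executed.

```shell
#!/bin/sh
# Create a non-default configuration directory by hand, since Recoll only
# creates the default ~/.recoll location automatically.
# The location is a scratch directory, used here for illustration only.
confdir=$(mktemp -d)/recoll-test
mkdir -p "$confdir"

# Either export the variable for recoll and recollindex to pick up...
RECOLL_CONFDIR=$confdir
export RECOLL_CONFDIR

# ...or pass the directory with -c. The command is only shown here.
if command -v recollindex >/dev/null 2>&1; then
    echo "would run: recollindex -c $confdir"
fi
```

Configuration files placed in this directory would then be used in place
of those in ~/.recoll.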

All configuration files share the same format. For example, a short
extract of the main configuration file might look as follows:

    # Space-separated list of directories to index.
    topdirs = ~/docs /usr/share/doc

    [~/somedirectory-with-utf8-txt-files]
    defaultcharset = utf-8

There are three kinds of lines:

* Comment (starts with #) or empty.

* Parameter assignment (name = value).

* Section definition ([somedirname]).
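
The three kinds of lines can be told apart with a few patterns. The
sketch below is not Recoll's own parser: classify_lines is a name made up
for this example, and it ignores details such as continuation lines
ending with a backslash.

```shell
#!/bin/sh
# Classify each line of a configuration file read on standard input as
# "comment" (comment or empty), "section" or "parameter". Anything else
# is reported as "unknown".
classify_lines() {
    awk '
        /^[ \t]*#/ || /^[ \t]*$/   { print "comment";   next }
        /^[ \t]*\[.*\][ \t]*$/     { print "section";   next }
        /^[ \t]*[^= \t][^=]*=/     { print "parameter"; next }
                                   { print "unknown" }
    '
}

classify_lines <<'EOF'
# Space-separated list of directories to index.
topdirs = ~/docs /usr/share/doc

[~/somedirectory-with-utf8-txt-files]
defaultcharset = utf-8
EOF
```

Run on the extract shown earlier, this reports one kind per input line,
the empty line being counted with the comments.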

Section definitions allow redefining some parameters for a directory
sub-tree. They stay in effect until another section definition, or the end
of file, is encountered. Some of the parameters used for indexing are
looked up hierarchically from the current directory location upwards. Not
all parameters can be meaningfully redefined; this is specified for each
parameter in the next section.

The tilde character (~) is expanded in file names to the name of the
user's home directory.

White space is used for separation inside lists. List elements with
embedded spaces can be quoted using double-quotes.

4.4.1. Main configuration file

recoll.conf is the main configuration file. It defines things like what to
index (top directories and things to ignore), and the default character
set to use for document types which do not specify it internally.

The default configuration will index your home directory. If this is not
appropriate, start recoll to create a blank configuration, click Cancel,
and edit the configuration file before restarting the command. This will
start the initial indexing, which may take some time.

Parameters:

topdirs

    Specifies the list of directories or files to index (recursively
    for directories). The indexer will not follow symbolic links
    inside the indexed trees. If an entry in the topdirs list is a
    symbolic link, indexing will not start and will generate an error.

dbdir

    The name of the Xapian data directory. It will be created if
    needed when the index is initialized. If this is not an absolute
    path, it will be interpreted relative to the configuration
    directory. The value can have embedded spaces but leading or
    trailing spaces will be trimmed. You cannot use quotes here.

skippedNames

    A space-separated list of patterns for names of files or
    directories that should be completely ignored. The list defined in
    the default file is:

        skippedNames = #* bin CVS Cache cache* caughtspam tmp .thumbnails .svn \
                       *~ recollrc

    The list can be redefined for sub-directories, but is only
    actually changed for the top level ones in topdirs.

    The top-level directories are not affected by this list (that is,
    a directory in topdirs might match and would still be indexed).

    The list in the default configuration does not exclude hidden
    directories (names beginning with a dot), which means that it may
    index quite a few things that you do not want. On the other hand,
    mail user agents like thunderbird usually store messages in hidden
    directories, and you probably want this indexed. One possible
    solution is to have .* in skippedNames, and add things like
    ~/.thunderbird or ~/.evolution in topdirs.
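
The suggestion about hidden directories could translate into a
recoll.conf fragment like the one written by the sketch below. The file
goes to a scratch directory rather than to ~/.recoll, and the choice of
the home directory plus two mail directories as topdirs is only an
assumption made for the example.

```shell
#!/bin/sh
# Write a sample recoll.conf fragment into a scratch directory.
confdir=$(mktemp -d)
cat > "$confdir/recoll.conf" <<'EOF'
# Index the home directory, plus selected hidden mail directories.
topdirs = ~ ~/.thunderbird ~/.evolution

# Default list from the sample file, with .* added to skip hidden names.
skippedNames = #* bin CVS Cache cache* caughtspam tmp .thumbnails .svn \
 *~ recollrc .*
EOF
cat "$confdir/recoll.conf"
```

Because entries in topdirs are not affected by skippedNames, the two mail
directories are still indexed even though their names match the .*
pattern.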

skippedPaths and daemSkippedPaths

    A space-separated list of patterns for paths of files or
    directories that should be skipped. There is no default in the
    sample configuration file, but the code always adds the
    configuration and database directories in there.

    skippedPaths is used both by batch and real time indexing.
    daemSkippedPaths can be used to specify things that should be
    indexed at startup, but not monitored.

    Example of use for skipping text files only in a specific
    directory:

        skippedPaths = ~/somedir/*.txt

loglevel, daemloglevel

    Verbosity level for recoll and recollindex. A value of 4 lists
    quite a lot of debug/information messages. 2 only lists errors.
    The daem-prefixed variant is specific to the indexing monitor
    daemon.

logfilename, daemlogfilename

    Where the messages should go. 'stderr' can be used as a special
    value, and is the default. The daem-prefixed variant is specific to
    the indexing monitor daemon.

filtersdir

    A directory to search for the external filter scripts used to
    index some types of files. The value should not be changed, except
    if you want to modify one of the default scripts. The value can be
    redefined for any sub-directory.

indexstemminglanguages

    A list of languages for which the stem expansion databases will be
    built. See recollindex(1) for possible values. You can add a stem
    expansion database for a different language by using recollindex
    -s, but it will be deleted during the next indexing. Only
    languages listed in the configuration file are permanent.

defaultcharset

    The name of the character set used for files that do not contain a
    character set definition (e.g. plain text files). This can be
    redefined for any sub-directory. If it is not set at all, the
    character set used is the one defined by the NLS environment
    (LC_ALL, LC_CTYPE, LANG), or iso8859-1 if nothing is set.

guesscharset

    Decides whether Recoll tries to guess the character set of files
    if no internal value is available (e.g. for plain text files).
    This does not work well in general, and should probably not be
    used.

usesystemfilecommand

    Decides whether the file -i system command is used as a final step
    for determining the mime type of a file (the main procedure uses
    suffix associations as defined in the mimemap file). This can be
    useful for files with suffix-less names, but it will also cause
    the indexing of many bogus "text" files.

indexallfilenames

    Recoll indexes file names in a special section of the database to
    allow specific file name searches using wild cards. This
    parameter decides whether file name indexing is performed only for
    files with mime types that would qualify them for full text
    indexing, or for all files inside the selected subtrees,
    independently of mime type.

idxabsmlen

    Recoll stores an abstract for each indexed file inside the
    database, so that abstracts can be displayed inside the result
    lists without decoding the original file. This parameter defines
    the size of the stored abstract (which can come from an actual
    abstract section in the document or just be the beginning of the
    text). The default value is 250.

iconsdir

    The name of the directory where recoll result list icons are
    stored. You can change this if you want different images.

4.4.2. The mimemap file

mimemap specifies the file name extension to mime type mappings.

For file names without an extension, or with an unknown one, the system's
file -i command will be executed to determine the mime type (this can be
switched off inside the main configuration file).

The mappings can be specified on a per-subtree basis, which may be useful
in some cases. Example: gaim logs have a .txt extension but should be
handled specially, which is possible because they are usually all located
in one place.

mimemap also has a recoll_noindex variable which is a list of suffixes.
Matching files will be skipped (which avoids unnecessary decompressions or
file executions). This is partially redundant with skippedNames in the
main configuration file, with two differences: it will not affect
directories, and it cannot be made dependent on the file-system location
(it is a configuration-wide parameter). You could accomplish with
skippedNames anything that recoll_noindex does. The latter is used mostly
for things known to be unindexable by a given Recoll version. Having it
there avoids cluttering the more user-oriented and locally customized
skippedNames.

4.4.3. The mimeconf file

mimeconf specifies how the different mime types are handled for indexing,
and which icons are displayed in the recoll result lists.

Changing the parameters in the [index] section is probably not a good idea
unless you are a Recoll developer.

The [icons] section allows you to change the icons which are displayed by
recoll in the result lists. The values are the basenames of the PNG images
inside the iconsdir directory (specified in recoll.conf).

4.4.4. The mimeview file

mimeview specifies which programs are started when you click on an Edit
link in a result list. For example, HTML is normally displayed using
firefox, but you may prefer Konqueror, or your openoffice.org program
might be named ooffice instead of openoffice.

Changes to this file can be done by direct editing, or through the recoll
user preferences dialog.

As for the other configuration files, the normal usage is to have a
mimeview inside your own configuration directory, with just the
non-default entries, which will override those from the central
configuration file.

Please note that these entries must be placed under a [view] section.

If "Use desktop preferences to choose document editor" is checked in the
user preferences, all mimeview entries will be ignored except the one
labelled application/x-all (which is set to use xdg-open by default).
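
A personal mimeview holding a single override might be created as in the
following sketch. The file is written to a scratch directory standing in
for ~/.recoll, and the entry itself (opening text/html with konqueror and
a %f file name parameter) is an assumption made for illustration, not a
value taken from the default file.

```shell
#!/bin/sh
# Write a personal mimeview with one non-default entry.
confdir=$(mktemp -d)   # stand-in for ~/.recoll
cat > "$confdir/mimeview" <<'EOF'
[view]
# Hypothetical override: open HTML results with Konqueror.
text/html = konqueror %f
EOF
cat "$confdir/mimeview"
```

Only the overridden entry needs to appear; every other mime type keeps
the viewer defined in the central configuration file.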

4.4.5. Examples of configuration adjustments

4.4.5.1. Adding an external viewer for a non-indexed type

Imagine that you have some kind of file which does not have indexable
content, but for which you would like to have a functional Edit link in
the result list (when found by file name). The file names end in .blob and
can be displayed by application blobviewer.

You need two entries in the configuration files for this to work:

* In $RECOLL_CONFDIR/mimemap (typically ~/.recoll/mimemap), add the
  following line:

      .blob = application/x-blobapp

  Note that the mime type is made up here, and you could call it
  diesel/oil just the same.

* In $RECOLL_CONFDIR/mimeview under the [view] section:

      application/x-blobapp = blobviewer %f

  We are supposing that blobviewer wants a file name parameter here; you
  would use %u if it liked URLs better.

If you just wanted to change the application used by Recoll to display a
mime type which it already knows, you would just need to edit mimeview.
The entries you add in your personal file override those in the central
configuration, which you do not need to alter.

4.4.5.2. Adding indexing support for a new file type

Let us now imagine that the above .blob files actually contain indexable
text and that you know how to extract it with a command line program.
Getting Recoll to index the files is easy. You need to perform the above
alteration, and also to add data to the mimeconf file (typically in
~/.recoll/mimeconf):

* Under the [index] section, add the following line (more about the
  rclblob indexing script later):

      application/x-blobapp = exec rclblob

* Under the [icons] section, you should choose an icon to be displayed
  for the files inside the result lists. Icons are normally 64x64 pixel
  PNG files which live in /usr/[local/]share/recoll/images.

* Under the [categories] section, you should add the mime type where it
  makes sense (you can also create a category). Categories may be used
  for filtering in advanced search.

The rclblob filter should be an executable program or script which exists
inside /usr/[local/]share/recoll/filters. It will be given a file name as
argument and should output the text contents in HTML format on the
standard output.

The HTML could be very minimal like the following example:

    <html><head>
    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
    </head>
    <body>some text content</body></html>

You should take care to escape some characters inside the text by
transforming them into appropriate entities. "&" should be transformed
into "&amp;", "<" should be transformed into "&lt;".

The character set needs to be specified in the header. It does not need to
be UTF-8 (Recoll will take care of translating it), but it must be
accurate for good results.

Recoll will also make use of other header fields if they are present:
title, description, keywords.
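
To make this concrete, here is a minimal sketch of what an rclblob script
could look like. It rests on assumptions that are not part of Recoll: the
.blob format is imagined to be plain UTF-8 text, and blob_to_html is a
name chosen for this example. A real filter would replace the sed step
with whatever command extracts the text.

```shell
#!/bin/sh
# Sketch of an rclblob-style filter: reads the file named by the first
# argument and writes minimal HTML to standard output.
# Assumption: .blob files hold plain UTF-8 text (hypothetical format).
blob_to_html() {
    printf '%s\n' '<html><head>'
    printf '%s\n' '<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">'
    printf '%s\n' '</head>'
    printf '%s\n' '<body>'
    # Escape "&" first, so that the "&" of "&lt;" is not escaped again.
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' < "$1"
    printf '%s\n' '</body></html>'
}

# Act as a filter when a file name is given on the command line.
if [ $# -ge 1 ]; then
    blob_to_html "$1"
fi
```

Called as rclblob somefile.blob, the script prints the header declaring
the character set, then the escaped text, then the closing tags.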

The easiest way to write a new filter is probably to start from an
existing one.