.\" $Id: recoll.conf.5,v 1.5 2007-07-13 10:18:49 dockes Exp $ (C) 2005 J.F.Dockes\$
.TH RECOLL.CONF 5 "8 January 2006"
.SH NAME
recoll.conf \- main personal configuration file for Recoll
.SH DESCRIPTION
This file defines the indexing configuration for the Recoll full-text search
system.
.LP
The system-wide configuration file is normally located inside
/usr/[local]/share/recoll/examples. Any parameter set in the system-wide file
may be overridden by setting it in the personal configuration file, which is
by default
.IR $HOME/.recoll/recoll.conf .
.LP
Please note that while we try to keep this manual page reasonably up to date,
it will frequently lag behind the current state of the software. The best
source of information about the configuration is the comments in the
configuration file.
.LP
A short extract of the file might look as follows:
.IP
.nf

# Space-separated list of directories to index.
topdirs = ~/docs /usr/share/doc

[~/somedirectory-with-utf8-txt-files]
defaultcharset = utf-8

.fi
.LP
There are three kinds of lines:
.RS
.IP \(bu
Comment or empty
.IP \(bu
Parameter assignment
.IP \(bu
Section definition
.RE
.LP
Empty lines or lines beginning with # are ignored.
.LP
Assignment lines are in the form 'name = value'.
.LP
Section lines, which hold a directory path enclosed in square brackets, allow
redefining a parameter for a directory subtree. Some of the parameters used
for indexing are looked up hierarchically, from the most specific section to
the least specific. Not all parameters can be meaningfully redefined; this is
specified for each parameter in the next section.
.LP
The tilde character (~) is expanded in file names to the name of the user's
home directory.
.LP
Where values are lists, white space is used for separation, and elements with
embedded spaces can be quoted with double quotes.
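.LP
As an illustration of list quoting, the following sketch indexes two
directories, one of which has a space in its name. Both directory names are
hypothetical:
.IP
.nf

# Made-up paths; the quotes keep the second one a single list element.
topdirs = /home/me/docs "/home/me/My Documents"

.fi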
.SH OPTIONS
.TP
.BI "topdirs = " directories
Specifies the list of directories to index (recursively).
.TP
.BI "dbdir = " directory
The name of the Xapian database directory. It will be created if needed
when the database is initialized. If this is not an absolute pathname, it
will be taken relative to the configuration directory.
.TP
.BI "skippedNames = " patterns
A space-separated list of patterns for names of files or directories that
should be completely ignored. The list defined in the default file is:
.sp
.nf
*~ #* bin CVS Cache caughtspam tmp

.fi
The list can be redefined for subdirectories, but the change only takes
effect for the top-level directories listed in
.IR topdirs .
.TP
.BI "skippedPaths = " patterns
A space-separated list of patterns for paths the indexer should not descend
into. Together with topdirs, this allows pruning the indexed tree to your
heart's content.
.B daemSkippedPaths
can be used to define a specific value for the real-time indexing monitor.
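.sp
For instance, assuming hypothetical directory names, the following sketch
would index one directory tree except for a single subtree:
.sp
.nf
topdirs = /home/me/docs
# Made-up path, used only for illustration.
skippedPaths = /home/me/docs/archive
.fi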
.TP
.BI "followLinks = " boolean
Specifies whether the indexer should follow symbolic links while walking the
file tree. The default is to ignore symbolic links, to avoid indexing linked
files several times. No effort is made to avoid duplication when this option
is set to true. This option can be set individually for each of the
.I topdirs
members by using sections. It cannot be changed below the
.I topdirs
level.
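.sp
As a sketch, assuming two hypothetical top-level directories, the following
would follow symbolic links under one of them only:
.sp
.nf
topdirs = /home/me/docs /home/me/projects

# Section for one topdirs member (made-up path).
[/home/me/projects]
followLinks = true
.fi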
.TP
.BI "loglevel = " value
Verbosity level for recoll and recollindex. A value of 4 prints quite a lot
of debug and information messages; a value of 3 prints only errors.
.B daemloglevel
can be used to specify a different value for the real-time indexing daemon.
.TP
.BI "logfilename = " file
Where the messages should go. The special value 'stderr' can be used to send
them to standard error.
.B daemlogfilename
can be used to specify a different value for the real-time indexing daemon.
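.sp
For example, the following sketch limits recoll and recollindex to error
messages on standard error, while the real-time indexing daemon logs more
verbosely to its own file. The file name is hypothetical:
.sp
.nf
loglevel = 3
logfilename = stderr
daemloglevel = 4
# Made-up location for the daemon log.
daemlogfilename = /tmp/recollindex-daemon.log
.fi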
.TP
.BI "indexstemminglanguages = " languages
A list of languages for which the stem expansion databases will be
built. See recollindex(1) for possible values.
.TP
.BI "defaultcharset = " charset
The name of the character set used for files that do not contain a
character set definition (e.g. plain text files). This can be redefined for
any subdirectory.
.TP
.BI "maxfsoccuppc = " percentnumber
Maximum file system occupancy at which indexing stops. The value is a
percentage, corresponding to what the "Capacity" df output column shows. The
default value is 0, meaning no checking.
.TP
.BI "idxflushmb = " megabytes
Threshold (in megabytes of new text data) at which the in-memory index data
is flushed to disk. Setting this can help control memory usage. A value of 0
means no explicit flushing, letting Xapian use its own default, which is
flushing every 10000 documents (memory usage then depends on the average
document size). The default value is 10.
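.sp
As a sketch,
.I maxfsoccuppc
and
.I idxflushmb
can be set together to bound disk and memory usage. The values below are
arbitrary illustrations, not recommendations:
.sp
.nf
# Stop indexing once the file system is 90% full.
maxfsoccuppc = 90
# Flush to disk after every 50 megabytes of new text data.
idxflushmb = 50
.fi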
.TP
.BI "filtersdir = " directory
A directory to search for the external filter scripts used to index some
types of files. The value should not be changed unless you want to
modify one of the default scripts. The value can be redefined for any
subdirectory.
.TP
.BI "iconsdir = " directory
The name of the directory where
.B recoll
result list icons are stored. You can change this if you want different
images.
.TP
.BI "guesscharset = " boolean
Try to guess the character set of files if no internal value is available
(e.g. for plain text files). This does not work well in general, and should
probably not be used.
.TP
.BI "usesystemfilecommand = " boolean
Decides whether the
.B "file -i"
system command is used as a final step for determining the MIME type of a
file (the main procedure uses suffix associations as defined in the
.B mimemap
file). This can be useful for files with suffixless names, but it will
also cause many bogus "text" files to be indexed.
.TP
.BI "indexedmimetypes = " list
Recoll normally indexes any file which it knows how to read. This list lets
you restrict the indexed MIME types to what you specify. If the variable is
unspecified or the list is empty (the default), all supported types are
processed.
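.sp
For instance, to index only HTML and PDF documents (assuming the list uses
the usual white space separation):
.sp
.nf
indexedmimetypes = text/html application/pdf
.fi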
.TP
.BI "compressedfilemaxkbs = " value
Size limit for compressed (.gz or .bz2) files. These need to be
decompressed in a temporary directory for identification, which can be very
wasteful if 'uninteresting' big compressed files are present. A negative
value means no limit; 0 means that no compressed file is processed. The
default is -1.
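.sp
As an illustration, and assuming from the parameter name that the value is
expressed in kilobytes, the following would skip compressed files larger than
about 20 megabytes:
.sp
.nf
compressedfilemaxkbs = 20000
.fi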
.TP
.BI "indexallfilenames = " boolean
Recoll indexes file names into a special section of the database to allow
file name searches using wildcards. This parameter decides whether file name
indexing is performed only for files with MIME types that would qualify them
for full-text indexing, or for all files inside the selected subtrees,
independent of MIME type.
.TP
.BI "idxabsmlen = " value
Recoll stores an abstract for each indexed file inside the database. The
text can come from an actual 'abstract' section in the document or will
just be the beginning of the document. It is stored in the index so that it
can be displayed inside the result lists without decoding the original
file. The
.I idxabsmlen
parameter defines the size of the stored abstract. The default value is 250
bytes. The search interface gives you the choice to display this stored
text or a synthetic abstract built by extracting text around the search
terms. If you always prefer the synthetic abstract, you can reduce this
value and save a little space.
.TP
.BI "aspellLanguage = " lang
Language definitions to use when creating the aspell dictionary. The value
must match a set of aspell language definition files. You can type "aspell
config" to see where these are installed (look for data-dir). If the
variable is not set, the default is to guess the value from your desktop
national language environment.
.TP
.BI "noaspell = " boolean
If this is set, the aspell dictionary generation is turned off. Useful for
cases where you don't need the functionality or when it is unusable because
aspell crashes during dictionary generation.
.TP
.BI "nocjk = " boolean
If this is set to true, the specific character and word splitting for East
Asian (Chinese, Korean, Japanese) text is turned off. This will save a small
amount of CPU time if you have no CJK documents. If your document base does
include such text but you are not interested in searching it, setting
.I nocjk
may be a significant time and space saver.
.TP
.BI "cjkngramlen = " value
This lets you adjust the size of n-grams used for indexing CJK text. The
default value of 2 is probably appropriate in most cases. A value of 3
would allow more precision and efficiency on longer words, but the index
will be approximately twice as large.
.SH SEE ALSO
.PP
.BR recollindex (1),
.BR recoll (1)