recoll.conf man page

This commit is contained in:
Jean-Francois Dockes 2021-01-13 09:22:25 +01:00
parent 51761b7aa6
commit ce3088e635
2 changed files with 92 additions and 26 deletions

View File

@ -92,6 +92,11 @@ list.
List of name endings to add to the default skippedNames List of name endings to add to the default skippedNames
list. list.
.TP .TP
.BI "onlyNames = "string
Regular file name filter patterns If this is set, only the file names not in skippedNames and
matching one of the patterns will be considered for indexing. Can be
redefined per subtree. Does not apply to directories.
.TP
.BI "noContentSuffixes = "string .BI "noContentSuffixes = "string
List of name endings (not necessarily dot-separated suffixes) for List of name endings (not necessarily dot-separated suffixes) for
which we don't try MIME type identification, and don't uncompress or which we don't try MIME type identification, and don't uncompress or
@ -292,14 +297,19 @@ will reduce the index size. This can only be set for a whole index, not
for a subtree. for a subtree.
.TP .TP
.BI "dehyphenate = "bool .BI "dehyphenate = "bool
Determines if we index 'coworker' also when the input is 'co-worker'. This is new Determines if we index 'coworker'
also when the input is 'co-worker'. This is new
in version 1.22, and on by default. Setting the variable to off allows in version 1.22, and on by default. Setting the variable to off allows
restoring the previous behaviour. restoring the previous behaviour.
.TP .TP
.BI "backslashasletter = "bool .BI "backslashasletter = "bool
Process backslash as normal letter This may make sense for people wanting to index TeX commands as Process backslash as normal letter. This may make sense for people wanting to index TeX commands as
such but is not of much general use. such but is not of much general use.
.TP .TP
.BI "underscoreasletter = "bool
Process underscore as normal letter. This makes sense in so many cases that one wonders if it should
not be the default.
.TP
.BI "maxtermlength = "int .BI "maxtermlength = "int
Maximum term length. Words longer than this will be discarded. Maximum term length. Words longer than this will be discarded.
The default is 40 and used to be hard-coded, but it can now be The default is 40 and used to be hard-coded, but it can now be
@ -323,7 +333,8 @@ as large.
.BI "indexstemminglanguages = "string .BI "indexstemminglanguages = "string
Languages for which to create stemming expansion Languages for which to create stemming expansion
data. Stemmer names can be found by executing 'recollindex data. Stemmer names can be found by executing 'recollindex
-l', or this can also be set from a list in the GUI. -l', or this can also be set from a list in the GUI. The values are full
language names, e.g. english, french...
.TP .TP
.BI "defaultcharset = "string .BI "defaultcharset = "string
Default character Default character
@ -348,16 +359,24 @@ lowercase and upper-case versions of a character should be specified, as
appartenance to the list will turn-off both standard accent and case appartenance to the list will turn-off both standard accent and case
processing. The value is global and affects both indexing and querying. processing. The value is global and affects both indexing and querying.
Examples: Examples:
.br
Swedish: Swedish:
.br
unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ffff fifi flfl åå Åå unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ffff fifi flfl åå Åå
. German: .br
German:
.br
unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ffff fifi flfl unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ffff fifi flfl
In French, you probably want to decompose oe and ae and nobody would type .br
French: you probably want to decompose oe and ae and nobody would type
a German ß a German ß
.br
unac_except_trans = ßss œoe Œoe æae Æae ffff fifi flfl unac_except_trans = ßss œoe Œoe æae Æae ffff fifi flfl
. The default for all until someone protests follows. These decompositions .br
The default for all until someone protests follows. These decompositions
are not performed by unac, but it is unlikely that someone would type the are not performed by unac, but it is unlikely that someone would type the
composed forms in a search. composed forms in a search.
.br
unac_except_trans = ßss œoe Œoe æae Æae ffff fifi flfl unac_except_trans = ßss œoe Œoe æae Æae ffff fifi flfl
.TP .TP
.BI "maildefcharset = "string .BI "maildefcharset = "string
@ -452,6 +471,10 @@ a directory between different configurations.
Minimum mbox file size over which we cache the offsets. There is really no sense in caching offsets for small files. The Minimum mbox file size over which we cache the offsets. There is really no sense in caching offsets for small files. The
default is 5 MB. default is 5 MB.
.TP .TP
.BI "mboxmaxmsgmbs = "int
Maximum mbox member message size in megabytes. Size over which we assume that the mbox format is bad or we
misinterpreted it, at which point we just stop processing the file.
.TP
.BI "webcachedir = "dfn .BI "webcachedir = "dfn
Directory where we store the archived web pages. This is only used by the web history indexing code Directory where we store the archived web pages. This is only used by the web history indexing code
Default: cachedir/webcache if cachedir is set, else Default: cachedir/webcache if cachedir is set, else
@ -515,10 +538,11 @@ is mainly to avoid infinite loops in postscript files
.TP .TP
.BI "filtermaxmbytes = "int .BI "filtermaxmbytes = "int
Maximum virtual memory space for filter processes Maximum virtual memory space for filter processes
(setrlimit(RLIMIT_AS)), in megabytes. Note that this (setrlimit(RLIMIT_AS)), in megabytes. Note that this includes any mapped libs (there is no reliable
includes any mapped libs (there is no reliable Linux way to limit the Linux way to limit the data space only), so we need to be a bit generous
data space only), so we need to be a bit generous here. Anything over here. Anything over 2000 will be ignored on 32 bits machines. The
2000 will be ignored on 32 bits machines. previous default value of 2000 would prevent java pdftk to work when
executed from Python rclpdf.py.
.TP .TP
.BI "thrQSizes = "string .BI "thrQSizes = "string
Stage input queues configuration. There are three Stage input queues configuration. There are three
@ -567,6 +591,12 @@ Override logfilename for the indexer in real time
mode. The default is to use the idx... values if set, else mode. The default is to use the idx... values if set, else
the log... values. the log... values.
.TP .TP
.BI "pyloglevel = "int
Override loglevel for the python module.
.TP
.BI "pylogfilename = "fn
Override logfilename for the python module.
.TP
.BI "orgidxconfdir = "dfn .BI "orgidxconfdir = "dfn
Original location of the configuration directory. This is used exclusively for movable datasets. Locating the Original location of the configuration directory. This is used exclusively for movable datasets. Locating the
configuration directory inside the directory tree makes it possible to configuration directory inside the directory tree makes it possible to
@ -633,9 +663,9 @@ space issues.
.BI "aspellLanguage = "string .BI "aspellLanguage = "string
Language definitions to use when creating the aspell Language definitions to use when creating the aspell
dictionary. The value must match a set of aspell language dictionary. The value must match a set of aspell language
definition files. You can type "aspell dicts" to see a list The default definition files. You can type "aspell dicts" to see a list The default
if this is not set is to use the NLS environment to guess the if this is not set is to use the NLS environment to guess the value. The
value. values are the 2-letter language codes (e.g. 'en', 'fr'...)
.TP .TP
.BI "aspellAddCreateParam = "string .BI "aspellAddCreateParam = "string
Additional option and parameter to aspell dictionary creation Additional option and parameter to aspell dictionary creation
@ -676,13 +706,19 @@ containing white space with double quotes (quote the whole entry, not the
pattern). The default is empty. pattern). The default is empty.
Example: mondelaypatterns = *.log:20 "*with spaces.*:30" Example: mondelaypatterns = *.log:20 "*with spaces.*:30"
.TP .TP
.BI "idxniceprio = "int
"nice" process priority for the indexing processes. Default: 19
(lowest) Appeared with 1.26.5. Prior versions were fixed at 19.
.TP
.BI "monioniceclass = "int .BI "monioniceclass = "int
ionice class for the real time indexing process On platforms where this is supported. The default value is ionice class for the indexing process. Despite the misleading name, and on platforms where this is
3. supported, this affects all indexing processes,
not only the real time/monitoring ones. The default value is 3 (use
lowest "Idle" priority).
.TP .TP
.BI "monioniceclassdata = "string .BI "monioniceclassdata = "string
ionice class parameter for the real time indexing process. On platforms where this is supported. The default is ionice class level parameter if the class supports it. The default is empty, as the default "Idle" class has no
empty. levels.
.TP .TP
.BI "autodiacsens = "bool .BI "autodiacsens = "bool
auto-trigger diacritics sensitivity (raw index only). IF the index is not stripped, decide if we automatically trigger auto-trigger diacritics sensitivity (raw index only). IF the index is not stripped, decide if we automatically trigger
@ -719,14 +755,8 @@ insufficient for very big documents, the consequence would be snippets
with possibly meaning-altering missing words. with possibly meaning-altering missing words.
.TP .TP
.BI "pdfocr = "bool .BI "pdfocr = "bool
Attempt OCR of PDF files with no text content if both tesseract and Attempt OCR of PDF files with no text content. This can be defined in subdirectories. The default is off because
pdftoppm are installed. The default is off because OCR is so OCR is so very slow.
very slow.
.TP
.BI "pdfocrlang = "string
Language to assume for PDF OCR. This is very important for having a reasonable rate of errors
with tesseract. This can also be set through a configuration variable
or directory-local parameters. See the rclpdf.py script.
.TP .TP
.BI "pdfattach = "bool .BI "pdfattach = "bool
Enable PDF attachment extraction by executing pdftk (if Enable PDF attachment extraction by executing pdftk (if
@ -750,6 +780,42 @@ method which will be called with the qualified tag name and value of each
selected field, for editing or erasing. A new instance is created for selected field, for editing or erasing. A new instance is created for
each document, so that the object can keep state for, e.g. eliminating each document, so that the object can keep state for, e.g. eliminating
duplicate values. duplicate values.
.TP
.BI "ocrprogs = "string
OCR modules to try. The top OCR script will try to load the corresponding modules in
order and use the first which reports being capable of performing OCR on
the input file. Modules for tesseract (tesseract) and ABBYY FineReader
(abbyy) are present in the standard distribution. For compatibility with
the previous version, if this is not defined at all, the default value is
"tesseract". Use an explicit empty value if needed. A value of "abbyy
tesseract" will try everything.
.TP
.BI "ocrcachedir = "dfn
Location for caching OCR data. The default if this is empty or undefined is to store the cached
OCR data under $RECOLL_CONFDIR/ocrcache.
.TP
.BI "tesseractlang = "string
Language to assume for tesseract OCR. Important for improving the OCR accuracy. This can also be set
through the contents of a file in
the currently processed directory. See the rclocrtesseract.py
script. Example values: eng, fra... See the tesseract documentation.
.TP
.BI "tesseractcmd = "fn
Path for the tesseract command. Do not quote. This is mostly useful on Windows, or for specifying a non-default
tesseract command. E.g. on Windows.
tesseractcmd = C:/Program Files (x86)/Tesseract-OCR/tesseract.exe
.TP
.BI "abbyylang = "string
Language to assume for abbyy OCR. Important for improving the OCR accuracy. This can also be set
through the contents of a file in
the currently processed directory. See the rclocrabbyy.py
script. Typical values: English, French... See the ABBYY documentation.
.TP
.BI "abbyycmd = "fn
Path for the abbyy command The ABBY directory is usually not in the path, so you should set this.
.TP .TP
.BI "mhmboxquirks = "string .BI "mhmboxquirks = "string
Enable thunderbird/mozilla-seamonkey mbox format quirks Set this for the directory where the email mbox files are Enable thunderbird/mozilla-seamonkey mbox format quirks Set this for the directory where the email mbox files are

View File

@ -387,7 +387,7 @@ indexstemminglanguages = english
# unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ffff fifi flfl åå Åå # unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ffff fifi flfl åå Åå
# . German: # . German:
# unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ffff fifi flfl # unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ffff fifi flfl
# In French, you probably want to decompose oe and ae and nobody would type # . French: you probably want to decompose oe and ae and nobody would type
# a German ß # a German ß
# unac_except_trans = ßss œoe Œoe æae Æae ffff fifi flfl # unac_except_trans = ßss œoe Œoe æae Æae ffff fifi flfl
# . The default for all until someone protests follows. These decompositions # . The default for all until someone protests follows. These decompositions