doc
parent 48bc71da70
commit 567aaa2035
@@ -54,12 +54,20 @@ home directory.
 Where values are lists, white space is used for separation, and elements with
 embedded spaces can be quoted with double-quotes.
 .SH OPTIONS
+
+
 .TP
 .BI "topdirs = "string
 Space-separated list of files or
 directories to recursively index. Default to ~ (indexes
 $HOME). You can use symbolic links in the list, they will be followed,
-independently of the value of the followLinks variable.
+independantly of the value of the followLinks variable.
+.TP
+.BI "monitordirs = "string
+Space-separated list of files or directories to monitor for
+updates. When running the real-time indexer, this allows monitoring only a
+subset of the whole indexed area. The elements must be included in the
+tree defined by the 'topdirs' members.
 .TP
 .BI "skippedNames = "string
 Files and directories which should be ignored.
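The context lines at the top of this hunk state the list syntax: white space separates elements, and an element containing spaces can be wrapped in double quotes. The minimal Python sketch below illustrates that rule. It uses the standard `shlex.split` as a stand-in, which is an assumption: Recoll has its own parser, and `shlex` also honours single quotes and backslash escapes. `split_list_value` and the sample `topdirs` value are hypothetical.

```python
import shlex
from typing import List


def split_list_value(value: str) -> List[str]:
    """Split a list-valued setting on white space, keeping double-quoted
    elements (which may contain spaces) together.

    Approximation only: shlex in POSIX mode also accepts single quotes and
    backslash escapes, which the Recoll syntax described here does not mention.
    """
    return shlex.split(value, posix=True)


if __name__ == "__main__":
    # Hypothetical topdirs value mixing plain and quoted elements.
    print(split_list_value('~ "/home/me/My Documents" /data/papers'))
```

With the sample value, the quoted path comes back as a single element.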
@@ -69,13 +77,21 @@ names. The list in the default configuration does not exclude hidden
 directories (names beginning with a dot), which means that it may index
 quite a few things that you do not want. On the other hand, email user
 agents like Thunderbird usually store messages in hidden directories, and
-you probably want this indexed. One possible solution is to have '.*'
-in 'skippedNames', and add things like '~/.thunderbird' '~/.evolution'
-to 'topdirs'. Not even the file names are indexed for patterns in this
-list, see the 'noContentSuffixes' variable for an alternative approach
+you probably want this indexed. One possible solution is to have ".*" in
+"skippedNames", and add things like "~/.thunderbird" "~/.evolution" to
+"topdirs". Not even the file names are indexed for patterns in this
+list, see the "noContentSuffixes" variable for an alternative approach
 which indexes the file names. Can be redefined for any
 subtree.
 .TP
+.BI "skippedNames- = "string
+List of name endings to remove from the default skippedNames
+list.
+.TP
+.BI "skippedNames+ = "string
+List of name endings to add to the default skippedNames
+list.
+.TP
 .BI "noContentSuffixes = "string
 List of name endings (not necessarily dot-separated suffixes) for
 which we don't try MIME type identification, and don't uncompress or
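The new `skippedNames-` and `skippedNames+` entries above edit the default list instead of replacing it. The Python sketch below shows one way to compute the resulting list. The merge rules are assumptions, since the text does not specify them: removals are applied first, then additions, and an entry is never added twice. `effective_list` and the sample patterns are hypothetical.

```python
from typing import Iterable, List


def effective_list(default: Iterable[str], minus: Iterable[str] = (),
                   plus: Iterable[str] = ()) -> List[str]:
    """Apply "name-" / "name+" style edits to a default list.

    Assumed order: remove the "minus" entries, then append the "plus"
    entries that are not already present.
    """
    removed = set(minus)
    result = [item for item in default if item not in removed]
    for item in plus:
        if item not in result:
            result.append(item)
    return result


if __name__ == "__main__":
    # Hypothetical default patterns, not Recoll's shipped list.
    print(effective_list(["#*", "*~", ".*"], minus=[".*"], plus=["*.bak"]))
```

Under these assumptions, an entry such as the `.*` pattern discussed in the hunk could be added or dropped without restating the whole default list.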
@@ -87,38 +103,59 @@ from skippedNames because these are name ending matches only (not
 wildcard patterns), and the file name itself gets indexed normally. This
 can be redefined for subdirectories.
 .TP
+.BI "noContentSuffixes- = "string
+List of name endings to remove from the default noContentSuffixes
+list.
+.TP
+.BI "noContentSuffixes+ = "string
+List of name endings to add to the default noContentSuffixes
+list.
+.TP
 .BI "skippedPaths = "string
-Paths we should not go into. Space-separated list of
-wildcard expressions for filesystem paths. Can contain files and
-directories. The database and configuration directories will
-automatically be added. The expressions are matched using 'fnmatch(3)'
-with the FNM_PATHNAME flag set by default. This means that '/' characters
-must be matched explicitly. You can set 'skippedPathsFnmPathname' to 0
-to disable the use of FNM_PATHNAME (meaning that '/*/dir3' will
-match '/dir1/dir2/dir3'). The default value contains the usual mount point
-for removable media to remind you that it is a bad idea to have Recoll work
-on these (esp. with the monitor: media gets indexed on mount, all data
-gets erased on unmount). Explicitly adding '/media/xxx' to the topdirs
-will override this.
+Absolute paths we should not go into. Space-separated list of wildcard expressions for absolute
+filesystem paths. Must be defined at the top level of the configuration
+file, not in a subsection. Can contain files and directories. The database and
+configuration directories will automatically be added. The expressions
+are matched using 'fnmatch(3)' with the FNM_PATHNAME flag set by
+default. This means that '/' characters must be matched explicitely. You
+can set 'skippedPathsFnmPathname' to 0 to disable the use of FNM_PATHNAME
+(meaning that '/*/dir3' will match '/dir1/dir2/dir3'). The default value
+contains the usual mount point for removable media to remind you that it
+is a bad idea to have Recoll work on these (esp. with the monitor: media
+gets indexed on mount, all data gets erased on unmount). Explicitely
+adding '/media/xxx' to the 'topdirs' variable will override
+this.
 .TP
 .BI "skippedPathsFnmPathname = "bool
 Set to 0 to
 override use of FNM_PATHNAME for matching skipped
 paths.
 .TP
+.BI "nowalkfn = "string
+File name which will cause its parent directory to be skipped. Any directory containing a file with this name will be skipped as
+if it was part of the skippedPaths list. Ex: .recoll-noindex
+.TP
 .BI "daemSkippedPaths = "string
 skippedPaths equivalent specific to
 real time indexing. This enables having parts of the tree
 which are initially indexed but not monitored. If daemSkippedPaths is
 not set, the daemon uses skippedPaths.
+.TP
+.BI "zipUseSkippedNames = "bool
+Use skippedNames inside Zip archives. Fetched
+directly by the rclzip handler. Skip the patterns defined by skippedNames
+inside Zip archives. Can be redefined for subdirectories.
+See https://www.lesbonscomptes.com/recoll/faqsandhowtos/FilteringOutZipArchiveMembers.html
+
 .TP
 .BI "zipSkippedNames = "string
 Space-separated list of wildcard expressions for names that should
 be ignored inside zip archives. This is used directly by
-the zip handler, and has a function similar to skippedNames, but works
-independently. Can be redefined for subdirectories. Supported by recoll
-1.20 and newer. See
-https://bitbucket.org/medoc/recoll/wiki/Filtering%20out%20Zip%20archive%20members
+the zip handler. If zipUseSkippedNames is not set, zipSkippedNames
+defines the patterns to be skipped inside archives. If zipUseSkippedNames
+is set, the two lists are concatenated and used. Can be redefined for
+subdirectories.
+See https://www.lesbonscomptes.com/recoll/faqsandhowtos/FilteringOutZipArchiveMembers.html
 
 .TP
 .BI "followLinks = "bool
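The `skippedPaths` text above matches absolute paths with fnmatch(3) and FNM_PATHNAME, and `nowalkfn` adds a marker-file rule. The Python sketch below illustrates both rules on POSIX paths. `fnmatch_like` is a hypothetical, simplified stand-in for fnmatch(3) that handles only `*` and `?`. Python's standard `fnmatch` module is not used because it has no FNM_PATHNAME equivalent. `walk_indexable` is a hypothetical walker, not Recoll's indexer code, and its `.recoll-noindex` default is simply the example given for `nowalkfn`.

```python
import os
import re
from typing import Iterable, Iterator


def fnmatch_like(pattern: str, path: str, pathname: bool = True) -> bool:
    """Rough stand-in for fnmatch(3): only '*' and '?' are wildcards.

    With pathname=True (like FNM_PATHNAME) the wildcards never match '/',
    so every '/' must appear literally in the pattern.
    """
    regex = []
    for ch in pattern:
        if ch == "*":
            regex.append("[^/]*" if pathname else ".*")
        elif ch == "?":
            regex.append("[^/]" if pathname else ".")
        else:
            regex.append(re.escape(ch))
    return re.fullmatch("".join(regex), path, flags=re.DOTALL) is not None


def walk_indexable(top: str, skipped_paths: Iterable[str] = (),
                   nowalkfn: str = ".recoll-noindex",
                   fnm_pathname: bool = True) -> Iterator[str]:
    """Yield file paths under top, pruning skipped directories.

    A subdirectory is pruned if its absolute path matches a skipped_paths
    pattern, or if it contains an entry named nowalkfn.
    """
    patterns = list(skipped_paths)
    for dirpath, dirnames, filenames in os.walk(os.path.abspath(top)):
        kept = []
        for d in dirnames:
            full = os.path.join(dirpath, d)
            if any(fnmatch_like(p, full, fnm_pathname) for p in patterns):
                continue
            if os.path.exists(os.path.join(full, nowalkfn)):
                continue
            kept.append(d)
        # Pruning in place stops os.walk from descending into skipped dirs.
        dirnames[:] = kept
        for f in filenames:
            full = os.path.join(dirpath, f)
            if not any(fnmatch_like(p, full, fnm_pathname) for p in patterns):
                yield full


if __name__ == "__main__":
    print(fnmatch_like("/*/dir3", "/dir1/dir2/dir3"))                  # FNM_PATHNAME on
    print(fnmatch_like("/*/dir3", "/dir1/dir2/dir3", pathname=False))  # FNM_PATHNAME off
```

The two calls at the end reproduce the `'/*/dir3'` example from the text. With FNM_PATHNAME the pattern does not match `/dir1/dir2/dir3`. With `skippedPathsFnmPathname` set to 0 it does.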
@@ -133,16 +170,27 @@ followed.
 .BI "indexedmimetypes = "string
 Restrictive list of
 indexed mime types. Normally not set (in which case all
-supported types are indexed). If it is set,
-only the types from the list will have their contents indexed. The names
-will be indexed anyway if indexallfilenames is set (default). MIME
-type names should be taken from the mimemap file. Can be redefined for
-subtrees.
+supported types are indexed). If it is set, only the types from the list
+will have their contents indexed. The names will be indexed anyway if
+indexallfilenames is set (default). MIME type names should be taken from
+the mimemap file (the values may be different from xdg-mime or file -i
+output in some cases). Can be redefined for subtrees.
 .TP
 .BI "excludedmimetypes = "string
 List of excluded MIME
-types. Lets you exclude some types from indexing. Can be
-redefined for subtrees.
+types. Lets you exclude some types from indexing. MIME type
+names should be taken from the mimemap file (the values may be different
+from xdg-mime or file -i output in some cases) Can be redefined for
+subtrees.
+.TP
+.BI "nomd5types = "string
+Don't compute md5 for these types. md5 checksums are used only for deduplicating results, and can be
+very expensive to compute on multimedia or other big files. This list
+lets you turn off md5 computation for selected types. It is global (no
+redefinition for subtrees). At the moment, it only has an effect for
+external handlers (exec and execm). The file types can be specified by
+listing either MIME types (e.g. audio/mpeg) or handler names
+(e.g. rclaudio).
 .TP
 .BI "compressedfilemaxkbs = "int
 Size limit for compressed
@@ -173,9 +221,9 @@ for the command used.
 Command used to guess
 MIME types if the internal methods fails This should be a
 "file -i" workalike. The file path will be added as a last parameter to
-the command line. 'xdg-mime' works better than the traditional 'file'
-command, and is now the configured default (with a hard-coded fallback
-to 'file')
+the command line. "xdg-mime" works better than the traditional "file"
+command, and is now the configured default (with a hard-coded fallback to
+"file")
 .TP
 .BI "processwebqueue = "bool
 Decide if we process the
@@ -204,6 +252,34 @@ will be bigger, and some marginal weirdness may sometimes occur. The
 default is a stripped index. When using multiple indexes for a search,
 this parameter must be defined identically for all. Changing the value
 implies an index reset.
+.TP
+.BI "indexStoreDocText = "bool
+Decide if we store the
+documents' text content in the index. Storing the text
+allows extracting snippets from it at query time, instead of building
+them from index position data.
+Newer Xapian index formats have rendered our use of positions list
+unacceptably slow in some cases. The last Xapian index format with good
+performance for the old method is Chert, which is default for 1.2, still
+supported but not default in 1.4 and will be dropped in 1.6.
+The stored document text is translated from its original format to UTF-8
+plain text, but not stripped of upper-case, diacritics, or punctuation
+signs. Storing it increases the index size by 10-20% typically, but also
+allows for nicer snippets, so it may be worth enabling it even if not
+strictly needed for performance if you can afford the space.
+The variable only has an effect when creating an index, meaning that the
+xapiandb directory must not exist yet. Its exact effect depends on the
+Xapian version.
+For Xapian 1.4, if the variable is set to 0, the Chert format will be
+used, and the text will not be stored. If the variable is 1, Glass will
+be used, and the text stored.
+For Xapian 1.2, and for versions after 1.5 and newer, the index format is
+always the default, but the variable controls if the text is stored or
+not, and the abstract generation method. With Xapian 1.5 and later, and
+the variable set to 0, abstract generation may be very slow, but this
+setting may still be useful to save space if you do not use abstract
+generation at all.
+
 .TP
 .BI "nonumbers = "bool
 Decides if terms will be
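The `indexStoreDocText` entry above ties its effect to the Xapian version. The Python sketch below encodes that rule as a small lookup so the cases can be read at a glance. `index_layout` is hypothetical. The text only describes 1.2, 1.4 and 1.5 or later, so treating every version other than 1.4 as "default format" is an assumption. The rule also applies only when the index directory is being created.

```python
from typing import Tuple


def index_layout(xapian_version: Tuple[int, ...],
                 store_doc_text: bool) -> Tuple[str, bool]:
    """Return (index format, text stored) for a newly created index.

    Encodes the rules stated for indexStoreDocText: only Xapian 1.4 picks
    the format from the variable (Glass if set, Chert if not); other
    versions (assumed: anything but 1.4) keep their default format, and
    the variable only decides whether the text is stored.
    """
    if tuple(xapian_version[:2]) == (1, 4):
        return ("glass", True) if store_doc_text else ("chert", False)
    return ("default", store_doc_text)


if __name__ == "__main__":
    for version in [(1, 2), (1, 4), (1, 5)]:
        for flag in (False, True):
            print(version, flag, index_layout(version, flag))
```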
@@ -216,9 +292,19 @@ will reduce the index size. This can only be set for a whole index, not
 for a subtree.
 .TP
 .BI "dehyphenate = "bool
-Determines if we index 'coworker' also when the input is 'co-worker'.
-This is new in version 1.22, and on by default. Setting the variable to off
-allows restoring the previous behaviour.
+Determines if we index
+'coworker' also when the input is 'co-worker'. This is new
+in version 1.22, and on by default. Setting the variable to off allows
+restoring the previous behaviour.
+.TP
+.BI "backslashasletter = "bool
+Process backslash as normal letter This may make sense for people wanting to index TeX commands as
+such but is not of much general use.
+.TP
+.BI "maxtermlength = "int
+Maximum term length. Words longer than this will be discarded.
+The default is 40 and used to be hard-coded, but it can now be
+adjusted. You need an index reset if you change the value.
 .TP
 .BI "nocjk = "bool
 Decides if specific East Asian
@@ -263,24 +349,16 @@ lowercase and upper-case versions of a character should be specified, as
 appartenance to the list will turn-off both standard accent and case
 processing. The value is global and affects both indexing and querying.
 Examples:
-
 Swedish:
-
 unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ffff fifi flfl åå Åå
-
-German:
-
+. German:
 unac_except_trans = ää Ää öö Öö üü Üü ßss œoe Œoe æae Æae ffff fifi flfl
-
 In French, you probably want to decompose oe and ae and nobody would type
 a German ß
-
 unac_except_trans = ßss œoe Œoe æae Æae ffff fifi flfl
-
-The default for all until someone protests follows. These decompositions
+. The default for all until someone protests follows. These decompositions
 are not performed by unac, but it is unlikely that someone would type the
 composed forms in a search.
-
 unac_except_trans = ßss œoe Œoe æae Æae ffff fifi flfl
 .TP
 .BI "maildefcharset = "string
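The `unac_except_trans` examples above list elements such as `ßss` and `œoe`. Reading those examples, each element appears to be one source character followed by its replacement: `ää` keeps ä unaccented, and `Ää` maps Ä to ä. The Python sketch below parses a value that way and applies it. This interpretation is inferred from the examples. `parse_unac_except_trans` and `apply_exceptions` are hypothetical helpers, and they do not model the regular unac accent and case processing of the other characters.

```python
from typing import Dict


def parse_unac_except_trans(value: str) -> Dict[str, str]:
    """Map each exception character to its replacement text.

    Assumed element format (read from the examples): first character is
    the source, the rest of the element is its translation.
    """
    table = {}
    for element in value.split():
        table[element[0]] = element[1:]
    return table


def apply_exceptions(text: str, table: Dict[str, str]) -> str:
    """Replace exception characters; all other characters pass through
    unchanged here (normal unac processing is not modelled)."""
    return "".join(table.get(ch, ch) for ch in text)


if __name__ == "__main__":
    table = parse_unac_except_trans("ää Ää ßss œoe Œoe æae Æae")
    print(apply_exceptions("Straße", table), apply_exceptions("cœur", table))
```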
@@ -352,7 +430,7 @@ over which we stop indexing. The value is a percentage,
 corresponding to what the "Capacity" df output column shows. The default
 value is 0, meaning no checking.
 .TP
-.BI "xapiandb = "dfn
+.BI "dbdir = "dfn
 Xapian database directory
 location. This will be created on first indexing. If the
 value is not an absolute path, it will be interpreted as relative to
@@ -386,9 +464,17 @@ Default: 40 MB.
 Reducing the size will not physically truncate the file.
 .TP
 .BI "webqueuedir = "fn
-The path to the Web indexing queue. This is
-hard-coded in the plugin as ~/.recollweb/ToIndex so there should be no
-need or possibility to change it.
+The path to the Web indexing queue. This used to be
+hard-coded in the old plugin as ~/.recollweb/ToIndex so there would be no
+need or possibility to change it, but the WebExtensions plugin now downloads
+the files to the user Downloads directory, and a script moves them to
+webqueuedir. The script reads this value from the config so it has become
+possible to change it.
+.TP
+.BI "webdownloadsdir = "fn
+The path to browser downloads directory. This is
+where the new browser add-on extension has to create the files. They are
+then moved by a script to webqueuedir.
 .TP
 .BI "aspellDicDir = "dfn
 Aspell dictionary storage directory location. The
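The `webqueuedir` and `webdownloadsdir` entries above describe a script that moves the browser extension's files from the downloads directory to the queue. The Python sketch below shows that step. It is not Recoll's actual script. The `recoll-we-` file-name prefix used to recognise extension output is a hypothetical placeholder, and reading the two directory values from the configuration is not shown.

```python
import os
import shutil
from typing import List


def move_web_downloads(downloads_dir: str, queue_dir: str,
                       prefix: str = "recoll-we-") -> List[str]:
    """Move browser-extension output files from downloads_dir (the
    webdownloadsdir value) into queue_dir (the webqueuedir value).

    The prefix is a hypothetical placeholder; files without it are left
    where they are.
    """
    os.makedirs(queue_dir, exist_ok=True)
    moved = []
    for name in sorted(os.listdir(downloads_dir)):
        src = os.path.join(downloads_dir, name)
        if name.startswith(prefix) and os.path.isfile(src):
            shutil.move(src, os.path.join(queue_dir, name))
            moved.append(name)
    return moved
```

Filtering on a name pattern keeps the script from sweeping unrelated downloads into the index queue.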
@@ -415,10 +501,11 @@ which lets Xapian perform its own thing, meaning flushing every
 $XAPIAN_FLUSH_THRESHOLD documents created, modified or deleted: as memory
 usage depends on average document size, not only document count, the
 Xapian approach is is not very useful, and you should let Recoll manage
-the flushes. The default value of idxflushmb is 10 MB, and may be a bit
-low. If you are looking for maximum speed, you may want to experiment
-with values between 20 and
-80. In my experience, values beyond 100 are always counterproductive. If
+the flushes. The program compiled value is 0. The configured default
+value (from this file) is now 50 MB, and should be ok in many cases.
+You can set it as low as 10 to conserve memory, but if you are looking
+for maximum speed, you may want to experiment with values between 20 and
+200. In my experience, values beyond this are always counterproductive. If
 you find otherwise, please drop me a note.
 .TP
 .BI "filtermaxseconds = "int
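The `idxflushmb` text above contrasts Xapian's document-count threshold with flushing based on data volume. The Python sketch below is a hypothetical model of volume-based flushing, not Recoll's code. It assumes that 1 MB is 1024 × 1024 bytes, that the counter resets after each flush, and that a value of 0 leaves flushing to Xapian.

```python
class FlushTracker:
    """Hypothetical model of volume-based flushing driven by idxflushmb.

    A flush is requested when the text indexed since the last flush reaches
    the threshold. A value of 0 means no Recoll-side flushing (Xapian's own
    $XAPIAN_FLUSH_THRESHOLD document count applies instead).
    """

    def __init__(self, idxflushmb: int):
        self.threshold = idxflushmb * 1024 * 1024  # assumed MB definition
        self.pending = 0
        self.flushes = 0

    def add_document(self, text_bytes: int) -> bool:
        """Account for one document; return True if it triggered a flush."""
        if self.threshold == 0:
            return False
        self.pending += text_bytes
        if self.pending >= self.threshold:
            self.flushes += 1
            self.pending = 0
            return True
        return False
```

Because the trigger counts bytes rather than documents, a few very large documents flush as early as many small ones. This is the property the text argues for.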
@@ -481,6 +568,25 @@ Override logfilename for the indexer in real time
 mode. The default is to use the idx... values if set, else
 the log... values.
 .TP
+.BI "orgidxconfdir = "dfn
+Original location of the configuration directory. This is used exclusively for movable datasets. Locating the
+configuration directory inside the directory tree makes it possible to
+provide automatic query time path translations once the data set has
+moved (for example, because it has been mounted on another
+location).
+.TP
+.BI "curidxconfdir = "dfn
+Current location of the configuration directory. Complement orgidxconfdir for movable datasets. This should be used
+if the configuration directory has been copied from the dataset to
+another location, either because the dataset is readonly and an r/w copy
+is desired, or for performance reasons. This records the original moved
+location before copy, to allow path translation computations. For
+example if a dataset originally indexed as '/home/me/mydata/config' has
+been mounted to '/media/me/mydata', and the GUI is running from a copied
+configuration, orgidxconfdir would be '/home/me/mydata/config', and
+curidxconfdir (as set in the copied configuration) would be
+'/media/me/mydata/config'.
+.TP
 .BI "idxrundir = "dfn
 Indexing process current directory. The input
 handlers sometimes leave temporary files in the current directory, so it
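The `curidxconfdir` example above has a dataset indexed under `/home/me/mydata` and later mounted on `/media/me/mydata`. The Python sketch below shows the kind of query-time path translation this enables. It assumes the configuration directory sits directly at the dataset root, so that the roots are the parents of `orgidxconfdir` and `curidxconfdir`. `translate_path` is a hypothetical helper, not Recoll's implementation.

```python
import posixpath


def translate_path(doc_path: str, orgidxconfdir: str, curidxconfdir: str) -> str:
    """Map a path recorded at indexing time to the dataset's current location.

    Assumption: each configuration directory is a direct child of its
    dataset root. Paths outside the original root are returned unchanged.
    """
    old_root = posixpath.dirname(orgidxconfdir.rstrip("/"))
    new_root = posixpath.dirname(curidxconfdir.rstrip("/"))
    rel = posixpath.relpath(doc_path, old_root)
    if rel == "..":
        return doc_path  # outside the moved dataset
    if rel.startswith("../"):
        return doc_path  # outside the moved dataset
    if rel == ".":
        return new_root
    return posixpath.join(new_root, rel)


if __name__ == "__main__":
    print(translate_path("/home/me/mydata/docs/report.pdf",
                         "/home/me/mydata/config", "/media/me/mydata/config"))
```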
@@ -519,6 +625,12 @@ amount of data stored in the index for the purpose of displaying fields
 inside result lists or previews. The default value is 150 bytes which
 may be too low if you have custom fields.
 .TP
+.BI "idxtexttruncatelen = "int
+Truncation length for all document texts. Only index
+the beginning of documents. This is not recommended except if you are
+sure that the interesting keywords are at the top and have severe disk
+space issues.
+.TP
 .BI "aspellLanguage = "string
 Language definitions to use when creating the aspell
 dictionary. The value must match a set of aspell language
@@ -612,16 +724,39 @@ Attempt OCR of PDF files with no text content if both tesseract and
 pdftoppm are installed. The default is off because OCR is so
 very slow.
 .TP
+.BI "pdfocrlang = "string
+Language to assume for PDF OCR. This is very important for having a reasonable rate of errors
+with tesseract. This can also be set through a configuration variable
+or directory-local parameters. See the rclpdf.py script.
+.TP
 .BI "pdfattach = "bool
 Enable PDF attachment extraction by executing pdftk (if
 available). This is
 normally disabled, because it does slow down PDF indexing a bit even if
 not one attachment is ever found.
 .TP
+.BI "pdfextrameta = "string
+Extract text from selected XMP metadata tags. This
+is a space-separated list of qualified XMP tag names. Each element can also
+include a translation to a Recoll field name, separated by a '|'
+character. If the second element is absent, the tag name is used as the
+Recoll field names. You will also need to add specifications to the
+"fields" file to direct processing of the extracted data.
+.TP
+.BI "pdfextrametafix = "fn
+Define name of XMP field editing script. This
+defines the name of a script to be loaded for editing XMP field
+values. The script should define a 'MetaFixer' class with a metafix()
+method which will be called with the qualified tag name and value of each
+selected field, for editing or erasing. A new instance is created for
+each document, so that the object can keep state for, e.g. eliminating
+duplicate values.
+.TP
 .BI "mhmboxquirks = "string
 Enable thunderbird/mozilla-seamonkey mbox format quirks Set this for the directory where the email mbox files are
 stored.
 
+
 .SH SEE ALSO
 .PP
 recollindex(1) recoll(1)
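The `pdfextrameta` and `pdfextrametafix` entries above describe two things: a `tag|field` list syntax, and a `MetaFixer` class with a `metafix()` method that is instantiated once per document. The Python sketch below illustrates both. Several details are assumptions that the actual calling convention in the rclpdf.py script should be checked against:

- the argument order (tag name, value);
- that `metafix()` returns the edited value;
- that an empty string means "erase".

`parse_pdfextrameta` and the sample tags are hypothetical.

```python
from typing import Dict, Set


def parse_pdfextrameta(value: str) -> Dict[str, str]:
    """Map qualified XMP tag names to Recoll field names.

    "tag|field" renames the tag; a bare "tag" keeps the tag name as the
    field name.
    """
    mapping = {}
    for element in value.split():
        tag, _, field = element.partition("|")
        mapping[tag] = field or tag
    return mapping


class MetaFixer:
    """Sketch of an XMP field editing class for pdfextrametafix.

    Assumed contract: metafix(tag, value) returns the edited value, and an
    empty string erases the field. One instance exists per document, so
    self.seen only tracks values from the current document.
    """

    def __init__(self):
        self.seen: Set[str] = set()

    def metafix(self, nm: str, txt: str) -> str:
        value = " ".join(txt.split())  # normalise white space
        key = nm + "\0" + value.lower()
        if key in self.seen:
            return ""  # duplicate within this document: erase it
        self.seen.add(key)
        return value
```

Keeping the duplicate set on the instance relies on the per-document instantiation stated in the text. State therefore never leaks from one PDF to the next.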
File diff suppressed because it is too large
@@ -8,6 +8,7 @@
 <!ENTITY RCLVERSION "1.25">
 <!ENTITY XAP "<application>Xapian</application>">
 <!ENTITY WIN "<application>Windows</application>">
+<!ENTITY LIN "<application>Unix</application>-like systems">
 <!ENTITY FAQS "https://www.lesbonscomptes.com/recoll/faqsandhowtos/">
 ]>
 
@@ -89,7 +90,7 @@
 </menuchoice>, then adjust the <guilabel>Top
 directories</guilabel> section).</para>
 
-<para>On Unix/Linux, you may need to install the
+<para>On &LIN;, you may need to install the
 appropriate
 <link linkend="RCL.INSTALL.EXTERNAL">supporting applications</link>
 for document types that need them (for
@@ -177,16 +178,13 @@
 <para>The &XAP; index can be big (roughly the size of the original
 document set), but it is not a document archive. &RCL; can only
 display documents that still exist at the place from which they were
-indexed. (Actually, there is a way to reconstruct a document from the
-information in the index, but only the pure text is saved, possibly
-without punctuation and capitalization, depending on &RCL;
-version).</para>
+indexed.</para>
 
 <para>&RCL; stores all internal data in <application>Unicode
-UTF-8</application> format, and it can index files of many types
+UTF-8</application> format, and it can index many types of files
 with different character sets, encodings, and languages into the
 same index. It can process documents embedded inside other
-documents (for example a pdf document stored inside a Zip
+documents (for example a PDF document stored inside a Zip
 archive sent as an email attachment...), down to an arbitrary
 depth.</para>
 
@@ -233,25 +231,17 @@
 <link linkend="RCL.INDEXING.CONFIG.SENS">index case and diacritics sensitivity</link>.
 </para>
 
-<para>&RCL; has many parameters which define exactly what to
-index, and how to classify and decode the source
-documents. These are kept in
-<link linkend="RCL.INDEXING.CONFIG">configuration files</link>.
-A default configuration is copied into a standard location
-(usually something like
-<filename>/usr/share/recoll/examples</filename>)
-during installation. The default values set by the
-configuration files in this directory may be overridden by
-values set inside your personal configuration, found
-by default in the <filename>.recoll</filename> sub-directory
-of your home directory. The default configuration will index
-your home directory with default parameters and should be
-sufficient for giving &RCL; a try, but you may want to adjust
-it later, which can be done either by editing the text files
-or by using configuration menus in the
-<command>recoll</command> GUI. Some other parameters affecting only
-the <command>recoll</command> GUI are stored in the standard
-location defined by <application>Qt</application>.</para>
+<para>&RCL; uses many parameters to define exactly what to index,
+and how to classify and decode the source documents. These are kept
+in <link linkend="RCL.INDEXING.CONFIG">configuration files</link>. A
+default configuration is copied into a standard location (usually
+something like <filename>/usr/share/recoll/examples</filename>)
+during installation. The default values set by the configuration
+files in this directory may be overridden by values set inside your
+personal configuration. With the default configuration, &RCL; will
+index your home directory with generic parameters. The configuration
+can be customized either by editing the text files or by using
+configuration menus in the <command>recoll</command> GUI.</para>
 
 <para>The <link linkend="RCL.INDEXING.PERIODIC.EXEC">indexing process</link>
 is started automatically (after asking permission), the
@@ -265,7 +255,7 @@
 <para><link linkend="RCL.SEARCH">Searches</link> are usually
 performed inside the <command>recoll</command> GUI, which has many
 options to help you find what you are looking for. However, there
-are other ways to perform &RCL; searches:
+are other ways to query the index:
 <itemizedlist>
 <listitem><para>A
 <link linkend="RCL.SEARCH.COMMANDLINE">command line interface</link>.
@@ -328,41 +318,44 @@
 <sect2 id="RCL.INDEXING.INTRODUCTION.MODES">
 <title>Indexing modes</title>
 
-<para>&RCL; indexing can be performed along two main modes:
+<para>&RCL; indexing can be performed along two main modes:</para>
 <itemizedlist>
 <listitem>
-<formalpara>
-<title><link linkend="RCL.INDEXING.PERIODIC">Periodic (or batch) indexing:</link></title>
+<formalpara><title>
+<link linkend="RCL.INDEXING.PERIODIC">Periodic (or batch) indexing</link>
+</title>
 <para><command>recollindex</command> is executed
-at discrete times. The typical usage is to have a nightly run
-<link linkend="RCL.INDEXING.PERIODIC.AUTOMAT">programmed</link> into
-your <command>cron</command> file.</para>
+at discrete times. On &LIN;, the typical usage is to have a
+nightly run
+<link linkend="RCL.INDEXING.PERIODIC.AUTOMAT">programmed</link>
+into your <command>cron</command> file. On &WIN;, this is
+the only mode available, and the indexer is usually started
+from the GUI (but there is nothing to prevent starting it
+from a command script).</para>
 </formalpara>
 </listitem>
 <listitem>
-<formalpara><title><link linkend="RCL.INDEXING.MONITOR">Real time indexing:</link></title>
-<para><command>recollindex</command> runs permanently as a
-daemon and uses a file system alteration monitor
+<formalpara><title>
+<link linkend="RCL.INDEXING.MONITOR">Real time indexing</link>
+</title>
+<para>(Only available on &LIN;). <command>recollindex</command> runs
+permanently as a daemon and uses a file system alteration monitor
 (e.g. <application>inotify</application>) to detect file
-changes. New or updated files are indexed at once.</para>
+changes. New or updated files are indexed at once. Monitoring a
+big file system tree can consume
+significant system resources. </para>
 </formalpara>
 </listitem>
 </itemizedlist>
-</para>
 
+<simplesect><title>&LIN;: choosing an indexing mode</title>
 <para>The choice between the two methods is mostly a matter of
 preference, and they can be combined by setting up multiple
 indexes (ie: use periodic indexing on a big documentation
 directory, and real time indexing on a small home
-directory). Monitoring a big file system tree can consume
-significant system resources.</para>
-
-<para>With &RCL; 1.24 and newer, it is also possible to set up an
-index so that only a subset of the tree will be monitored and the
-rest will be covered by batch/incremental indexing. (See the
-details in the <link linkend="RCL.INDEXING.MONITOR">Real time indexing</link>
-section.</para>
-
+directory), or, with &RCL; 1.24 and newer, by
+<link linkend="RCL.INDEXING.MONITOR">configuring the index so that only a subset of the tree will be monitored.</link>
+</para>
 <para>The choice of method and the parameters used can be
 configured from the <command>recoll</command> GUI:
 <menuchoice>
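The mixed setup described in this hunk (batch indexing of the whole tree, real-time monitoring of only part of it) relies on the `monitordirs` parameter that this same commit documents in the `recoll.conf` manual page, whose elements must lie inside the trees listed in `topdirs`. The following is a minimal POSIX shell sketch of a check for that constraint; `monitordirs_ok` is a hypothetical helper, not a Recoll command, and it assumes both lists are space-separated absolute paths with no embedded spaces, quotes, `~`, or trailing slashes.

```shell
#!/bin/sh
# Hypothetical helper, not part of Recoll: succeed only if every element
# of a monitordirs list lies inside one of the topdirs trees.
# Assumption: both lists are space-separated absolute paths with no
# embedded spaces, quotes, "~" or trailing slashes.
monitordirs_ok() {
    topdirs=$1
    monitordirs=$2
    for m in $monitordirs; do
        found=no
        for t in $topdirs; do
            # Match the top directory itself or anything below it; the "/"
            # keeps /home/me/docs2 from matching /home/me/docs.
            case $m in
                "$t" | "$t"/*) found=yes; break ;;
            esac
        done
        if [ "$found" = no ]; then
            echo "not inside topdirs: $m" >&2
            return 1
        fi
    done
    return 0
}

# Example: index all of /home/me in batch mode, monitor only Documents.
if monitordirs_ok "/home/me" "/home/me/Documents"; then
    echo "monitordirs is consistent with topdirs"
fi
```

Matching on a `/` boundary rather than a plain string prefix is the main design point: a naive prefix test would wrongly accept sibling directories that share a name prefix.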
@@ -370,21 +363,7 @@
 <guimenuitem>Indexing schedule</guimenuitem>
 </menuchoice>
 </para>
-
-<para>The GUI <menuchoice><guimenu>File</guimenu>
-</menuchoice> menu also has entries to start or stop
-the current indexing operation. Stopping indexing is performed by
-killing the <command>recollindex</command> process, which will
-checkpoint its state and exit. A later restart of indexing will
-mostly resume from where things stopped (the file tree walk has to
-be restarted from the beginning).</para>
-
-<para>When the real time indexer is running, two operations are
-available from the menu: 'Stop' and 'Trigger incremental pass'.
-When no indexing is running, you have a choice of updating the
-index or rebuilding it (the first choice only processes changed
-files, the second one zeroes the index before starting so that all
-files are processed).</para>
+</simplesect>
 
 </sect2>
 
@@ -396,11 +375,13 @@
 in which several configuration files describe
 what should be indexed and how.</para>
 
-<para>A default personal configuration directory
-(<filename>$HOME/.recoll/</filename>) is created
-when a &RCL; program is first executed. This configuration is
-the one used for indexing and querying when no specific
-configuration is specified.</para>
+<para>When <command>recoll</command> or
+<command>recollindex</command> is first executed, it creates a
+default configuration directory. This configuration is the one used
+for indexing and querying when no specific configuration is
+specified. It is located in <filename>$HOME/.recoll/</filename> for
+&LIN; and <filename>%LOCALAPPDATA%</filename> on &WIN;
+(typically <filename>C:\Users\[me]\Appdata\Local</filename>).</para>
 
 <para>All configuration parameters have defaults, defined in
 system-wide files. Without further customisation, the default
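The default-configuration lookup described in this hunk can be summarised by a small POSIX shell sketch. `recoll_confdir` is a hypothetical helper, not a Recoll command. It only models the &LIN; case as described in this manual (a non-empty `RECOLL_CONFDIR` selects a specific configuration, otherwise `$HOME/.recoll` is used) and deliberately ignores the `-c` option and the Windows location.

```shell
#!/bin/sh
# Hypothetical helper, not part of Recoll: print the configuration
# directory a Recoll program would use on Linux when no -c option is
# given. A non-empty RECOLL_CONFDIR selects a specific configuration;
# otherwise the default personal configuration directory is used.
recoll_confdir() {
    if [ -n "${RECOLL_CONFDIR:-}" ]; then
        printf '%s\n' "$RECOLL_CONFDIR"
    else
        printf '%s\n' "$HOME/.recoll"
    fi
}

# Default lookup (prints $HOME/.recoll unless RECOLL_CONFDIR is set).
recoll_confdir
# Explicit selection, done in a subshell so the setting does not leak.
(RECOLL_CONFDIR=/path/to/my/new/config; recoll_confdir)
```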
@@ -431,33 +412,6 @@
 machines), and then merging them, or querying them in
 parallel.</para>
 
-<para>A specific configuration can be selected by setting the
-<envar>RECOLL_CONFDIR</envar> environment variable, or giving the
-<option>-c</option> option to any of the &RCL; commands.</para>
-
-<para>When creating or updating indexes, the different
-configurations are entirely independant (no parameters are ever
-shared between configurations when indexing). The
-<command>recollindex</command> program always works on a single
-index.</para>
-
-<para>When querying, multiple indexes can be accessed concurrently,
-either from the GUI or the command line. When doing this, there is
-always one main configuration, from which both configuration and
-index data are used. Only the index data from the additional
-indexes is used (their configuration parameters are
-ignored).</para>
-
-<para>The behaviour of index update and query regarding multiple
-configurations is important and sometimes confusing, so it will be
-rephrased here: for index generation, multiple configurations are
-totally independant from each other. When querying, configuration
-and data are used from the main index (the one designated by
-<literal>-c</literal> or <envar>RECOLL_CONFDIR</envar>), and only
-the data from the additional indexes is used. This implies
-that some parameters should be consistent among the configurations
-for indexes which are to be used together.</para>
-
 <para>See the section about
 <link linkend="RCL.INDEXING.CONFIG.MULTIPLE">configuring multiple indexes</link>
 for more detail.</para>
@@ -751,27 +705,26 @@
 <link linkend="RCL.INDEXING.CONFIG.GUI">dialogs in the <command>recoll</command> GUI</link>.
 </para>
 
-<para>The first time you start <command>recoll</command>, you
-will be asked whether or not you would like it to build the
-index. If you want to adjust the configuration before
-indexing, just click <guilabel>Cancel</guilabel> at this
-point, which will get you into the configuration interface. If
-you exit at this point, <filename>recoll</filename> will have
-created a <filename>~/.recoll</filename> directory containing
-empty configuration files, which you can edit by hand.</para>
+<para>The first time you start <command>recoll</command>, you will be
+asked whether or not you would like it to build the index. If you
+want to adjust the configuration before indexing, just click
+<guilabel>Cancel</guilabel> at this point, which will get you into
+the configuration interface. If you exit at this point,
+<filename>recoll</filename> will have created a default configuration
+directory with empty configuration files, which you can then
+edit.</para>
 
 <para>The configuration is documented inside the
 <link linkend="RCL.INSTALL.CONFIG">installation chapter</link>
 of this document, or in the
-<citerefentry>
-<refentrytitle>recoll.conf</refentrytitle>
-<manvolnum>5</manvolnum>
-</citerefentry>
-man page, but the most current information will most likely be the
-comments inside the sample file. The most immediately useful variable
+<ulink url="https://www.lesbonscomptes.com/recoll/manpages/recoll.conf.5.html"><citerefentry><refentrytitle>recoll.conf</refentrytitle><manvolnum>5</manvolnum></citerefentry></ulink>
+manual page. Both documents are automatically generated from
+the comments inside the configuration file.</para>
+
+<para>The most immediately useful variable
 is probably
 <link linkend="RCL.INSTALL.CONFIG.RECOLLCONF.TOPDIRS"><varname>topdirs</varname></link>,
-which determines what subtrees and files get indexed.</para>
+which lists the subtrees and files to be indexed.</para>
 
 <para>The applications needed to index file types other than
 text, HTML or email (ie: pdf, postscript, ms-word...) are
@@ -789,67 +742,62 @@
 
 <para>Multiple &RCL; indexes can be created by using several
 configuration directories which are typically set to index
-different areas of the file system. A specific index can be
-selected for updating or searching, using the
-<envar>RECOLL_CONFDIR</envar> environment variable or the
+different areas of the file system.</para>
+
+<para>A specific index can be selected by setting the
+<envar>RECOLL_CONFDIR</envar> environment variable or giving the
 <option>-c</option> option to <command>recoll</command> and
 <command>recollindex</command>.</para>
 
-<para>Index configuration parameters can be set either by using a
-text editor on the files, or, for most parameters, by using the
-<command>recoll</command> index configuration GUI. In the latter
-case, the configuration directory for which parameters are modified
-is the one which was selected by <envar>RECOLL_CONFDIR</envar> or
-the <option>-c</option> parameter, and there is no way to switch
-configurations within the GUI.</para>
+<para>The <command>recollindex</command> program, used for creating
+or updating indexes, always works on a single index. The different
+configurations are entirely independent (no parameters are ever
+shared between configurations when indexing). </para>
 
-<para>As a remainder from a previous section, a
-<command>recollindex</command> program instance can only update one
-specific index, and it will only use parameters from a single
-configuration (no parameters are ever shared between configurations
-when indexing). All the query methods (<command>recoll</command>,
+<para>All the search interfaces (<command>recoll</command>,
 <command>recollq</command>, the Python API, etc.) operate with a
 main configuration, from which both configuration and index data
-are used, but can also query data from multiple additional
+are used, and can also query data from multiple additional
 indexes. Only the index data from the latter is used, their
-configuration parameters are ignored.</para>
+configuration parameters are ignored. This implies that some
+parameters should be consistent among index configurations which
+are to be used together.</para>
 
 <para>When searching, the current main index (defined by
 <envar>RECOLL_CONFDIR</envar> or <option>-c</option>) is always
 active. If this is undesirable, you can set up your base
 configuration to index an empty directory.</para>
 
-<para>If a set of multiple indexes are to be used together for
-searches, some configuration parameters must be consistent
-among the set. These are parameters which need to be the same
-when indexing and searching. As the parameters come from the
-main configuration when searching, they need to be compatible
-with what was set when creating the other indexes (which came
-from their respective configuration directories).</para>
+<para>Index configuration parameters can be set either by using a
+text editor on the files, or, for most parameters, by using the
+<link linkend="RCL.INDEXING.CONFIG.GUI"><command>recoll</command> index configuration GUI</link>.
+In the latter case, the configuration directory for which
+parameters are modified is the one which was selected by
+<envar>RECOLL_CONFDIR</envar> or the <option>-c</option> parameter,
+and there is no way to switch configurations within the GUI.</para>
 
-<para>Most importantly, all indexes to be queried concurrently must
-have the same option concerning character case and diacritics
-stripping, but there are other constraints. Most of the
-relevant parameters are described in the
-<link linkend="RCL.INSTALL.CONFIG.RECOLLCONF.TERMS">linked section</link>.
+<para>See the <link linkend="RCL.INSTALL.CONFIG.RECOLLCONF">configuration section</link>
+for a detailed description of the parameters.</para>
+
+<para>Some configuration parameters must be consistent among a set
+of multiple indexes used together for searches. Most importantly,
+all indexes to be queried concurrently must have the same option
+concerning character case and diacritics stripping, but there are
+other constraints. Most of the relevant parameters affect the
+<link linkend="RCL.INSTALL.CONFIG.RECOLLCONF.TERMS">term generation</link>.
 </para>
 
-<para>The different search interfaces (GUI, command line, ...)
-have different methods to define the set of indexes to be
-used, see the appropriate section.</para>
+<para>Using multiple configurations implies a small
+level of command line or file manager usage. The user must
+explicitly create additional configuration directories; the GUI
+will not do it. This is to avoid mistakenly creating additional
+directories when an argument is mistyped. Also, the GUI or the
+indexer must be launched with a specific option or environment to
+work on the right configuration.</para>
 
-<para>At the moment, using multiple configurations implies a small
-level of command line usage. Additional configuration directories
-(beyond <filename>~/.recoll</filename>) must be created by hand
-(<command>mkdir</command> or such), the GUI will not do it. This is
-to avoid mistakenly creating additional directories when an
-argument is mistyped. Also, the GUI or the indexer must be launched
-with a specific option or environment to work on the right
-configuration.</para>
+<simplesect>
+<title>In practice: creating and using an additional index</title>
 
-<para>To be more practical, here follows a few examples of the
-commands need to create, configure, update, and query an additional
-index.</para>
 
 <para>Initially creating the configuration and index:<programlisting>
 mkdir <replaceable>/path/to/my/new/config</replaceable></programlisting></para>
@@ -858,15 +806,19 @@ mkdir <replaceable>/path/to/my/new/config</replaceable></programlisting></para>
 <command>recoll</command> GUI, launched from the
 command line to pass the <literal>-c</literal> option
 (you could create a desktop file to do it for you), and then using the
-GUI index configuration tool to set up the index.
+<link linkend="RCL.INDEXING.CONFIG.GUI">GUI index configuration tool</link>
+to set up the index.
 <programlisting>
 recoll -c <replaceable>/path/to/my/new/config</replaceable></programlisting>
 </para>
 
 
 <para>Alternatively, you can just start a text editor on the main
-configuration file
-<link linkend="RCL.INSTALL.CONFIG.RECOLLCONF"><filename>recoll.conf</filename></link>.</para>
+configuration file:
+<programlisting>
+<replaceable>someEditor</replaceable> <replaceable>/path/to/my/new/config</replaceable>/<link linkend="RCL.INSTALL.CONFIG.RECOLLCONF"><filename>recoll.conf</filename></link>
+</programlisting>
+</para>
 
 
 <para>Creating and updating the index can be done from the command line:
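As a non-interactive alternative to the editor step in this hunk, the configuration file can also be written from a script. This is a minimal sketch under stated assumptions: `CONFDIR` defaults to a throw-away directory, the indexed tree (`/usr/share/doc`) is an arbitrary example, only the `topdirs` parameter is set, and the Recoll commands are shown as comments rather than run.

```shell
#!/bin/sh
# Sketch: create an additional configuration directory and its recoll.conf
# without an editor. CONFDIR and the indexed tree are example values.
CONFDIR=${CONFDIR:-$(mktemp -d)/recoll-docs}
mkdir -p "$CONFDIR"

# topdirs: space-separated list of the files/directories to index.
cat > "$CONFDIR/recoll.conf" <<'EOF'
topdirs = /usr/share/doc
EOF

echo "wrote $CONFDIR/recoll.conf"
# Next steps (not run here):
#   recollindex -c "$CONFDIR"   # create or update the index
#   recoll -c "$CONFDIR"        # query it from the GUI
```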
@@ -891,7 +843,7 @@ recoll -c <replaceable>/path/to/my/new/config</replaceable></programlisting>
 <guimenu>Preferences</guimenu>
 <guimenuitem>External Index Dialog</guimenuitem>
 </menuchoice> menu.</para>
-
+</simplesect>
 </sect2>
 
 
@@ -911,9 +863,8 @@ recoll -c <replaceable>/path/to/my/new/config</replaceable></programlisting>
 the index. With a stripped index, the search term will be stripped
 before searching.</para>
 
-<para>A raw index allows for another possibility which a stripped
-index cannot offer: using case and diacritics to discriminate
-between terms, returning different results when searching for
+<para>A raw index allows using case and diacritics to discriminate
+between terms, e.g., returning different results when searching for
 <literal>US</literal> and <literal>us</literal> or
 <literal>resume</literal> and <literal>résumé</literal>.
 Read the
@@ -927,15 +878,14 @@ recoll -c <replaceable>/path/to/my/new/config</replaceable></programlisting>
 automated by &RCL;), and all indexes in a search must be set
 in the same way (again, not checked by &RCL;). </para>
 
-<para>If the <literal>indexStripChars</literal> is not set, &RCL;
-1.18 creates a stripped index by default, for
-compatibility with previous versions.</para>
+<para>&RCL; creates a stripped index by default if
+<literal>indexStripChars</literal> is not set.</para>
 
 <para>As a cost for added capability, a raw index will be slightly
 bigger than a stripped one (around 10%). Also, searches will be
 more complex, so probably slightly slower, and the feature is
-still young, so that a certain amount of weirdness cannot be
-excluded.</para>
+relatively little used, so that a certain amount of weirdness
+cannot be excluded.</para>
 
 <para>One of the most adverse consequences of using a raw index
 is that some phrase and proximity searches may become
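The text above notes that all indexes searched together must be set up the same way regarding stripping, and that Recoll does not check this. A minimal shell sketch of such a check follows; `strip_chars` and `same_strip_chars` are hypothetical helpers. Assumptions: parameters appear as simple `name = value` lines in each directory's `recoll.conf`, the last occurrence wins, and an absent `indexStripChars` means the default stripped index, represented here as `1` (with `0` selecting a raw index).

```shell
#!/bin/sh
# Hypothetical helpers, not part of Recoll.

# strip_chars DIR: print the indexStripChars value from DIR/recoll.conf,
# or 1 (assumed default: stripped index) when the parameter is absent.
strip_chars() {
    awk -F'=' '
        { key = $1; gsub(/[ \t]/, "", key) }
        key == "indexStripChars" {
            v = $2; gsub(/^[ \t]+|[ \t]+$/, "", v); found = 1
        }
        END { print (found ? v : 1) }
    ' "$1/recoll.conf"
}

# same_strip_chars DIR...: succeed only if all configurations agree.
same_strip_chars() {
    first=$(strip_chars "$1")
    for dir in "$@"; do
        [ "$(strip_chars "$dir")" = "$first" ] || return 1
    done
    return 0
}

# Example: a raw index next to one that relies on the default.
base=$(mktemp -d)
mkdir "$base/raw" "$base/default"
echo 'indexStripChars = 0' > "$base/raw/recoll.conf"
echo 'topdirs = ~' > "$base/default/recoll.conf"
if same_strip_chars "$base/raw" "$base/default"; then
    echo "consistent"
else
    echo "inconsistent: do not search these indexes together"
fi
```

Only the personal `recoll.conf` of each directory is read here; values inherited from the system-wide defaults are approximated by the assumed default.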
@@ -950,7 +900,7 @@ recoll -c <replaceable>/path/to/my/new/config</replaceable></programlisting>
 
 
 <sect2 id="RCL.INDEXING.CONFIG.THREADS">
-<title>Indexing threads configuration</title>
+<title>Indexing threads configuration (&LIN;)</title>
 
 <para>The &RCL; indexing process
 <command>recollindex</command> can use multiple threads to
@@ -1363,7 +1313,7 @@ recoll -c <replaceable>/path/to/my/new/config</replaceable></programlisting>
 <sect1 id="RCL.INDEXING.PERIODIC">
 <title>Periodic indexing</title>
 
-<sect2 id="RCL.INDEXING.PERIODIC.EXEC">
+<simplesect id="RCL.INDEXING.PERIODIC.EXEC">
 <title>Running indexing</title>
 
 <para>Indexing is always performed by the
@@ -1381,19 +1331,36 @@ recoll -c <replaceable>/path/to/my/new/config</replaceable></programlisting>
 when it starts, it will automatically start indexing (except
 if canceled).</para>
 
-<para>The <command>recollindex</command> indexing process can be
-interrupted by sending an interrupt (<keysym>Ctrl-C</keysym>,
-SIGINT) or terminate
-(SIGTERM) signal. Some time may elapse before the process exits,
-because it needs to properly flush and close the index. This can
-also be done from the <command>recoll</command> GUI
+<para>The GUI <menuchoice><guimenu>File</guimenu> </menuchoice>
+menu has entries to start or stop the current indexing
+operation.</para>
+
+<para>When no indexing is running, you have a choice of updating the
+index or rebuilding it (the first choice only processes changed
+files, the second one zeroes the index before starting so that all
+files are processed).</para>
+
+<para>On Linux, the <command>recollindex</command> indexing process
+can be interrupted by sending an interrupt
+(<keysym>Ctrl-C</keysym>, SIGINT) or terminate (SIGTERM)
+signal.
+</para>
+
+<para>On Linux and Windows, the GUI can be used to manage the indexing
+operation. Stopping the indexer can be done
+from the <command>recoll</command> GUI
 <menuchoice>
 <guimenu>File</guimenu>
 <guimenuitem>Stop Indexing</guimenuitem>
 </menuchoice>
-menu entry.</para>
+menu entry.
+</para>
 
-<para>After such an interruption, the index will be somewhat
+<para>When stopped, some time may elapse before
+<command>recollindex</command> exits, because it needs to properly
+flush and close the index.</para>
+
+<para>After an interruption, the index will be somewhat
 inconsistent because some operations which are normally
 performed at the end of the indexing pass will have been
 skipped (for example, the stemming and spelling databases
@@ -1404,9 +1371,11 @@ recoll -c <replaceable>/path/to/my/new/config</replaceable></programlisting>
 to the interruption and for which the index is still up to
 date will not need to be reindexed).</para>
 
-<para><command>recollindex</command> has a number of other options
-which are described in its man page. Only a few will be
-described here.</para>
+<para><command>recollindex</command> has many options
+which are listed in its
+<ulink url="https://www.lesbonscomptes.com/recoll/manpages/recollindex.1.html">manual page</ulink>.
+Only a few will be described here.</para>
+
 <para>Option <option>-z</option> will reset the index when
 starting. This is almost the same as destroying the index
 files (the nuance is that the &XAP; format version will not
@@ -1446,11 +1415,10 @@ recoll -c <replaceable>/path/to/my/new/config</replaceable></programlisting>
 but just add them as index entries. It is
 up to the external file selection method to build the complete
 file list.</para>
-</sect2>
+</simplesect>
 
-<sect2 id="RCL.INDEXING.PERIODIC.AUTOMAT">
-<title>Using <command>cron</command> to automate
-indexing</title>
+<simplesect id="RCL.INDEXING.PERIODIC.AUTOMAT">
+<title>Linux: using <command>cron</command> to automate indexing</title>
 
 <para>The most common way to set up indexing is to have a cron
 task execute it every night. For example the following
@@ -1468,7 +1436,7 @@ recoll -c <replaceable>/path/to/my/new/config</replaceable></programlisting>
 ]]></screen>
 </para>
 
-<para>As of version 1.17 the &RCL; GUI has dialogs to manage
+<para>The &RCL; GUI has dialogs to manage
 <filename>crontab</filename> entries for
 <command>recollindex</command>. You can reach them from the
 <menuchoice>
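For an additional index, the nightly `cron` job discussed in this section needs the `-c` option. The sketch below only builds the crontab line. `cron_line` is a hypothetical helper; the 03:30 schedule, the example directory name and the output redirection are arbitrary choices; paths containing spaces are not handled; and the full path to `recollindex` may be needed if it is not on cron's `PATH`. Installing the line is left as a comment because it modifies your real crontab.

```shell
#!/bin/sh
# Hypothetical helper, not part of Recoll: print a crontab line that
# updates the index for configuration directory $1 every night at 03:30.
cron_line() {
    printf '30 3 * * * recollindex -c %s > /dev/null 2>&1\n' "$1"
}

line=$(cron_line "$HOME/.recoll-docs")
printf '%s\n' "$line"
# To install (edits your real crontab, so not run here):
#   (crontab -l 2>/dev/null; printf '%s\n' "$line") | crontab -
```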
@@ -1492,11 +1460,11 @@ recoll -c <replaceable>/path/to/my/new/config</replaceable></programlisting>
 issues.</para>
 
 
-</sect2>
+</simplesect>
 </sect1>
 
 <sect1 id="RCL.INDEXING.MONITOR">
-<title>Real time indexing</title>
+<title>&LIN;: real time indexing</title>
 
 <para>Real time monitoring/indexing is performed by starting the
 <command>recollindex</command> <option>-m</option> command.
@@ -1504,6 +1472,11 @@ recoll -c <replaceable>/path/to/my/new/config</replaceable></programlisting>
 from the terminal and become a daemon, permanently monitoring
 file changes and updating the index.</para>
 
+<para>In this situation, the <command>recoll</command> GUI
+<menuchoice><guimenu>File</guimenu></menuchoice> menu
+makes two operations available: 'Stop' and 'Trigger incremental pass'.
+</para>
+
 <para>While it is convenient that data is indexed in real time,
 repeated indexing can generate a significant load on the
 system when files such as email folders change. Also,
@@ -1522,8 +1495,8 @@ recoll -c <replaceable>/path/to/my/new/config</replaceable></programlisting>
 process. The <command>recoll</command> GUI also has a menu entry for
 this.</para>
 
-<sect2 id="RCL.INDEXING.MONITOR.START">
-<title>Real time indexing: automatic daemon start</title>
+<simplesect id="RCL.INDEXING.MONITOR.START">
+<title>Automatic daemon start</title>
 
 <para>Under <application>KDE</application>,
 <application>Gnome</application> and some other desktop
@ -1542,17 +1515,15 @@ recoll -c <replaceable>/path/to/my/new/config</replaceable></programlisting>
|
|||||||
<filename>examples</filename> directory (typically
|
<filename>examples</filename> directory (typically
|
||||||
<filename>/usr/local/[share/]recoll/examples</filename>).</para>
|
<filename>/usr/local/[share/]recoll/examples</filename>).</para>
|
||||||
|
|
||||||
<para>For example, my out of fashion
|
<para>For example, a good old <application>xdm</application>-based
|
||||||
<application>xdm</application>-based session has a
|
session could have a <filename>.xsession</filename> script with the
|
||||||
<filename>.xsession</filename> script with the following lines
|
following lines at the end:</para>
|
||||||
at the end:</para>
|
|
||||||
|
|
||||||
<programlisting>recollconf=$HOME/.recoll-home
|
<programlisting>recollconf=$HOME/.recoll-home
|
||||||
recolldata=/usr/local/share/recoll
|
recolldata=/usr/local/share/recoll
|
||||||
RECOLL_CONFDIR=$recollconf $recolldata/examples/rclmon.sh start
|
RECOLL_CONFDIR=$recollconf $recolldata/examples/rclmon.sh start
|
||||||
|
|
||||||
fvwm
|
fvwm
|
||||||
|
|
||||||
</programlisting>
|
</programlisting>
|
||||||
|
|
||||||
<para>The indexing daemon gets started, then the window manager,
|
<para>The indexing daemon gets started, then the window manager,
|
||||||
@ -1567,10 +1538,10 @@ recoll -c <replaceable>/path/to/my/new/config</replaceable></programlisting>
|
|||||||
<application>X11</application> session, you need to add option
|
<application>X11</application> session, you need to add option
|
||||||
<option>-x</option> to disable <application>X11</application>
|
<option>-x</option> to disable <application>X11</application>
|
||||||
session monitoring (else the daemon will not start).</para>
|
session monitoring (else the daemon will not start).</para>
|
||||||
</sect2>
|
</simplesect>
|
||||||
|
|
||||||
<sect2 id="RCL.INDEXING.MONITOR.DETAILS">
|
<simplesect id="RCL.INDEXING.MONITOR.DETAILS">
|
||||||
<title>Real time indexing: miscellaneous details</title>
|
<title>Miscellaneous details</title>
|
||||||
|
|
||||||
<para>By default, the messages from the indexing daemon will be
|
<para>By default, the messages from the indexing daemon will be
|
||||||
sent to the same file as those from the interactive commands
|
sent to the same file as those from the interactive commands
|
||||||
@ -1581,17 +1552,7 @@ recoll -c <replaceable>/path/to/my/new/config</replaceable></programlisting>
|
|||||||
the daemon runs permanently, the log file may grow quite big,
|
the daemon runs permanently, the log file may grow quite big,
|
||||||
depending on the log level.</para>
|
depending on the log level.</para>
|
||||||
|
|
||||||
<para>When building &RCL;, the real time indexing support can be
|
<formalpara><title>Increasing resources for inotify</title>
|
||||||
customised during package
|
|
||||||
<link linkend="RCL.INSTALL.BUILDING">configuration</link>
|
|
||||||
with the <option>--with[out]-fam</option> or
|
|
||||||
<option>--with[out]-inotify</option> options. The default is
|
|
||||||
currently to include <application>inotify</application>
|
|
||||||
monitoring on systems that support it, and, as of &RCL; 1.17,
|
|
||||||
<application>gamin</application> support on
|
|
||||||
<application>FreeBSD</application>.</para>
|
|
||||||
|
|
||||||
<note><title>Increasing resources for inotify</title>
|
|
||||||
<para>On Linux systems, monitoring a big tree may need
|
<para>On Linux systems, monitoring a big tree may need
|
||||||
increasing the resources available to inotify, which are
|
increasing the resources available to inotify, which are
|
||||||
normally defined in <filename>/etc/sysctl.conf</filename>.
|
normally defined in <filename>/etc/sysctl.conf</filename>.
|
||||||
@ -1609,29 +1570,28 @@ recoll -c <replaceable>/path/to/my/new/config</replaceable></programlisting>
|
|||||||
fs.inotify.max_user_watches=32768
|
fs.inotify.max_user_watches=32768
|
||||||
</programlisting>
|
</programlisting>
|
||||||
|
|
||||||
</para>
|
Especially, you will need to trim your tree or adjust
|
||||||
<para>Especially, you will need to trim your tree or adjust
|
|
||||||
the <literal>max_user_watches</literal> value if indexing exits with
|
the <literal>max_user_watches</literal> value if indexing exits with
|
||||||
a message about errno <literal>ENOSPC</literal> (28) from
|
a message about errno <literal>ENOSPC</literal> (28) from
|
||||||
<function>inotify_add_watch</function>.</para>
|
<function>inotify_add_watch</function>.
|
||||||
</note>
|
</para>
|
||||||
|
</formalpara>
|
||||||
|
|
||||||
|
|
||||||
<note><title>Slowing down the reindexing rate for fast changing
|
<formalpara><title>Slowing down the reindexing rate for fast changing
|
||||||
files</title>
|
files</title>
|
||||||
|
|
||||||
<para>When using the real time monitor, it may happen that some
|
<para>When using the real time monitor, it may happen that some
|
||||||
files need to be indexed, but change so often that they impose an
|
files need to be indexed, but change so often that they impose an
|
||||||
excessive load for the system.</para>
|
excessive load for the system.
|
||||||
|
|
||||||
<para>&RCL; provides a configuration option to specify the minimum
|
&RCL; provides a configuration option to specify the minimum
|
||||||
time before which a file, specified by a wildcard pattern, cannot be
|
time before which a file, specified by a wildcard pattern, cannot be
|
||||||
reindexed. See the <varname>mondelaypatterns</varname> parameter in
|
reindexed. See the <varname>mondelaypatterns</varname> parameter in
|
||||||
the <link linkend="RCL.INSTALL.CONFIG.RECOLLCONF.MISC">configuration section</link>.
|
the <link linkend="RCL.INSTALL.CONFIG.RECOLLCONF.MISC">configuration section</link>.
|
||||||
</para>
|
</para>
|
||||||
</note>
|
</formalpara>
|
||||||
|
|
||||||
</sect2>
|
</simplesect>
|
||||||
|
|
||||||
</sect1>
|
</sect1>
|
||||||
|
|
||||||
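The mondelaypatterns text above only says that a minimum reindexing delay is attached to files selected by a wildcard pattern. The Python sketch below illustrates the idea. It assumes the value is a white-space separated list of `pattern:seconds` entries, double-quoted when they contain spaces, matched against the whole file path with fnmatch-style wildcards. This follows the recoll.conf reference description of the parameter, but the function names are hypothetical and this is not Recoll's own code.

```python
import fnmatch
import shlex
import time


def parse_delay_patterns(value):
    """Split a mondelaypatterns-style value into (pattern, seconds) pairs.

    Assumed syntax: white-space separated "pattern:seconds" entries,
    double-quoted when the pattern contains spaces.
    """
    pairs = []
    for entry in shlex.split(value):
        # Split on the last colon, so that the pattern may itself contain one.
        pattern, _, seconds = entry.rpartition(":")
        pairs.append((pattern, int(seconds)))
    return pairs


def min_delay(path, pairs):
    """Return the delay of the first pattern matching the path, else 0."""
    for pattern, seconds in pairs:
        # fnmatchcase lets "*" match across "/" characters too.
        if fnmatch.fnmatchcase(path, pattern):
            return seconds
    return 0


def may_reindex(path, last_indexed, pairs, now=None):
    """True if enough time has passed since the file was last indexed."""
    if now is None:
        now = time.time()
    return now - last_indexed >= min_delay(path, pairs)


if __name__ == "__main__":
    pairs = parse_delay_patterns('*.log:20 "*with spaces.*:30"')
    print(pairs)
    print(may_reindex("/var/tmp/app.log", last_indexed=100, pairs=pairs, now=110))
```

With this example value, a `.log` file indexed 10 seconds earlier is not yet eligible. Files that match no pattern can be reindexed immediately.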
@ -1660,12 +1620,9 @@ recoll -c <replaceable>/path/to/my/new/config</replaceable></programlisting>
|
|||||||
</listitem>
|
</listitem>
|
||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
|
|
||||||
<para>In most cases, you can enter the terms as you
|
<para>In most cases, you can enter the terms as you think them, even
|
||||||
think them, even if they contain embedded punctuation or other
|
if they contain embedded punctuation or other non-textual characters
|
||||||
non-textual characters. For
|
(e.g. &RCL; can handle things like email addresses).</para>
|
||||||
example, &RCL; can handle things like email addresses, or
|
|
||||||
arbitrary cut and paste from another text window, punctation
|
|
||||||
and all.</para>
|
|
||||||
|
|
||||||
<para>The main case where you should enter text differently from
|
<para>The main case where you should enter text differently from
|
||||||
how it is printed is for east-asian languages (Chinese,
|
how it is printed is for east-asian languages (Chinese,
|
||||||
@ -1674,10 +1631,10 @@ recoll -c <replaceable>/path/to/my/new/config</replaceable></programlisting>
|
|||||||
case (they would typically be printed without white
|
case (they would typically be printed without white
|
||||||
space).</para>
|
space).</para>
|
||||||
|
|
||||||
<para>Some searches can be quite complex, and you may want to
|
<para>Some searches can be quite complex, and you may want to re-use
|
||||||
re-use them later, perhaps with some tweaking. &RCL; versions
|
them later, perhaps with some tweaking. &RCL; can save and restore
|
||||||
1.21 and later can save and restore searches, using XML files. See
|
searches. See <link linkend="RCL.SEARCH.SAVING">Saving and restoring
|
||||||
<link linkend="RCL.SEARCH.SAVING">Saving and restoring queries</link>.
|
queries</link>.
|
||||||
</para>
|
</para>
|
||||||
|
|
||||||
<sect2 id="RCL.SEARCH.GUI.SIMPLE">
|
<sect2 id="RCL.SEARCH.GUI.SIMPLE">
|
||||||
@ -1704,12 +1661,9 @@ recoll -c <replaceable>/path/to/my/new/config</replaceable></programlisting>
|
|||||||
documents containing all of the search terms (the ones with more
|
documents containing all of the search terms (the ones with more
|
||||||
terms will get better scores), just like the <guilabel>All
|
terms will get better scores), just like the <guilabel>All
|
||||||
terms</guilabel> mode. <guilabel>Any term</guilabel> will search
|
terms</guilabel> mode. <guilabel>Any term</guilabel> will search
|
||||||
for documents where at least one of the terms appear.</para>
|
for documents where at least one of the terms
|
||||||
|
appear. <guilabel>File name</guilabel> will exclusively look for
|
||||||
<para>The <guilabel>Query Language</guilabel> features are
|
file names, not contents.</para>
|
||||||
described in
|
|
||||||
<link linkend="RCL.SEARCH.LANG">a separate section</link>.
|
|
||||||
</para>
|
|
||||||
|
|
||||||
<para>All search modes allow terms to be expanded with wildcards
|
<para>All search modes allow terms to be expanded with wildcards
|
||||||
characters (<literal>*</literal>, <literal>?</literal>,
|
characters (<literal>*</literal>, <literal>?</literal>,
|
||||||
@ -1717,11 +1671,21 @@ recoll -c <replaceable>/path/to/my/new/config</replaceable></programlisting>
|
|||||||
<link linkend="RCL.SEARCH.WILDCARDS">section about wildcards</link> for
|
<link linkend="RCL.SEARCH.WILDCARDS">section about wildcards</link> for
|
||||||
more details.</para>
|
more details.</para>
|
||||||
|
|
||||||
|
<para>In all modes except <guilabel>File name</guilabel>, you can
|
||||||
|
search for exact phrases (adjacent words in a given order) by
|
||||||
|
enclosing the input inside double quotes. Ex:
|
||||||
|
<literal>"virtual reality"</literal>.</para>
|
||||||
|
|
||||||
|
<para>The <guilabel>Query Language</guilabel> features are
|
||||||
|
described in
|
||||||
|
<link linkend="RCL.SEARCH.LANG">a separate section</link>.
|
||||||
|
</para>
|
||||||
|
|
||||||
<para>The <guilabel>File name</guilabel> search mode will
|
<para>The <guilabel>File name</guilabel> search mode will
|
||||||
specifically look for file names. The point of having a separate
|
specifically look for file names. The point of having a separate
|
||||||
file name search is that wild card expansion can be performed more
|
file name search is that wild card expansion can be performed more
|
||||||
efficiently on a small subset of the index (allowing wild cards on
|
efficiently on a small subset of the index (allowing wild cards on
|
||||||
the left of terms without excessive penality). Things to know:
|
the left of terms without excessive cost). Things to know:
|
||||||
<itemizedlist>
|
<itemizedlist>
|
||||||
<listitem><para>White space in the entry should match white
|
<listitem><para>White space in the entry should match white
|
||||||
space in the file name, and is not treated specially.</para>
|
space in the file name, and is not treated specially.</para>
|
||||||
@ -1743,11 +1707,6 @@ recoll -c <replaceable>/path/to/my/new/config</replaceable></programlisting>
|
|||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
</para>
|
</para>
|
||||||
|
|
||||||
<para>In all modes except <guilabel>File name</guilabel>, you can
|
|
||||||
search for exact phrases (adjacent words in a given order) by
|
|
||||||
enclosing the input inside double quotes. Ex:
|
|
||||||
<literal>"virtual reality"</literal>.</para>
|
|
||||||
|
|
||||||
<para>When using a stripped index (the default), character case has
|
<para>When using a stripped index (the default), character case has
|
||||||
no influence on search, except that you can disable stem expansion
|
no influence on search, except that you can disable stem expansion
|
||||||
for any term by capitalizing it. Ie: a search for
|
for any term by capitalizing it. Ie: a search for
|
||||||
@ -3403,20 +3362,19 @@ recoll -c <replaceable>/path/to/my/new/config</replaceable></programlisting>
|
|||||||
<command>recoll</command>). The query to be executed is specified
|
<command>recoll</command>). The query to be executed is specified
|
||||||
as command line arguments.</para>
|
as command line arguments.</para>
|
||||||
|
|
||||||
<para><command>recollq</command> is not built by default. You can
|
<para><command>recollq</command> is not always built by default. You
|
||||||
use the <filename>Makefile</filename> in the
|
can use the <filename>Makefile</filename> in the
|
||||||
<filename>query</filename> directory to build it. This is a very
|
<filename>query</filename> directory to build it. This is a very
|
||||||
simple program, and if you can program a little c++, you may find it
|
simple program, and if you can program a little c++, you may find it
|
||||||
useful to tailor its output format to your needs. Note that recollq is
|
useful to tailor its output format to your needs. Apart from being
|
||||||
only really useful on systems where the Qt libraries (or even the X11
|
easily customised, <command>recollq</command> is only really useful
|
||||||
ones) are not available. Otherwise, just use
|
on systems where the Qt libraries are not available, else it is
|
||||||
<literal>recoll -t</literal>, which takes the exact same
|
redundant with <literal>recoll -t</literal>.</para>
|
||||||
parameters and options which
|
|
||||||
are described for <command>recollq</command></para>
|
|
||||||
|
|
||||||
<para><command>recollq</command> has a man page (not installed by
|
<para><command>recollq</command> has a
|
||||||
default, look in the <filename>doc/man</filename> directory). The
|
<ulink url="https://www.lesbonscomptes.com/recoll/manpages/recollq.1.html">man page</ulink>.
|
||||||
Usage string is as follows:</para>
|
|
||||||
|
The Usage string is as follows:</para>
|
||||||
<programlisting>
|
<programlisting>
|
||||||
recollq: usage:
|
recollq: usage:
|
||||||
-P: Show the date span for all the documents present in the index
|
-P: Show the date span for all the documents present in the index
|
||||||
@ -3455,9 +3413,9 @@ recoll -c <replaceable>/path/to/my/new/config</replaceable></programlisting>
|
|||||||
</programlisting>
|
</programlisting>
|
||||||
|
|
||||||
<para>Sample execution:</para>
|
<para>Sample execution:</para>
|
||||||
<programlisting>recollq 'ilur -nautique mime:text/html'
|
<programlisting>
|
||||||
Recoll query: ((((ilur:(wqf=11) OR ilurs) AND_NOT (nautique:(wqf=11)
|
recollq 'ilur -nautique mime:text/html'
|
||||||
OR nautiques OR nautiqu OR nautiquement)) FILTER Ttext/html))
|
Recoll query: ((((ilur:(wqf=11) OR ilurs) AND_NOT (nautique:(wqf=11) OR nautiques OR nautiqu OR nautiquement)) FILTER Ttext/html))
|
||||||
4 results
|
4 results
|
||||||
text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/comptes.html] [comptes.html] 18593 bytes
|
text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/comptes.html] [comptes.html] 18593 bytes
|
||||||
text/html [file:///Users/uncrypted-dockes/projets/nautique/webnautique/articles/ilur1/index.html] [Constructio...
|
text/html [file:///Users/uncrypted-dockes/projets/nautique/webnautique/articles/ilur1/index.html] [Constructio...
|
||||||
@ -5835,9 +5793,8 @@ for i in range(nres):
|
|||||||
<sect1 id="RCL.INSTALL.EXTERNAL">
|
<sect1 id="RCL.INSTALL.EXTERNAL">
|
||||||
<title>Supporting packages</title>
|
<title>Supporting packages</title>
|
||||||
|
|
||||||
<note><para>The &WIN; installation of &RCL; is self-contained, and
|
<note><para>The &WIN; installation of &RCL; is self-contained.
|
||||||
only needs Python 2.7 to be externally installed. &WIN; users can
|
&WIN; users can skip this section.</para></note>
|
||||||
skip this section.</para></note>
|
|
||||||
|
|
||||||
<para>&RCL; uses external applications to index some file
|
<para>&RCL; uses external applications to index some file
|
||||||
types. You need to install them for the file types that you wish to
|
types. You need to install them for the file types that you wish to
|
||||||
@ -5851,134 +5808,46 @@ for i in range(nres):
|
|||||||
<filename>missing</filename> text file inside the configuration
|
<filename>missing</filename> text file inside the configuration
|
||||||
directory.</para>
|
directory.</para>
|
||||||
|
|
||||||
<para>A list of common file types which need external
|
<para>The past has proven that I was unable to maintain an up to date
|
||||||
commands follows. Many of the handlers need the
|
application list in this manual. Please check &RCLAPPS; for a
|
||||||
<command>iconv</command> command, which is not always listed as a
|
complete list along with links to the home pages or best
|
||||||
dependancy.</para>
|
source/patches pages, and misc tips. What follows is only a
|
||||||
|
very short extract of the stable essentials.</para>
|
||||||
|
|
||||||
<para>Please note that, due to the relatively dynamic nature of this
|
|
||||||
information, the most up to date version is now kept on &RCLAPPS;
|
|
||||||
along with links to the home pages or best source/patches pages,
|
|
||||||
and misc tips. The list below is not updated often and may be quite
|
|
||||||
stale.</para>
|
|
||||||
|
|
||||||
<para>For many Linux distributions, most of the commands listed can
|
|
||||||
be installed from the package repositories. However, the packages
|
|
||||||
are sometimes outdated, or not the best version for &RCL;, so you
|
|
||||||
should take a look at &RCLAPPS; if a file
|
|
||||||
type is important to you.</para>
|
|
||||||
|
|
||||||
<para>As of &RCL; release 1.14, a number of XML-based formats that
|
|
||||||
were handled by ad hoc handler code now use the
|
|
||||||
<command>xsltproc</command> command, which usually comes with
|
|
||||||
<application>libxslt</application>. These are: abiword, fb2
|
|
||||||
(ebooks), kword, openoffice, svg.</para>
|
|
||||||
|
|
||||||
<para>Now for the list:</para>
|
|
||||||
<itemizedlist>
|
<itemizedlist>
|
||||||
|
|
||||||
<listitem><para>Openoffice files need <command>unzip</command> and
|
|
||||||
<command>xsltproc</command>.</para></listitem>
|
|
||||||
|
|
||||||
<listitem><para>PDF files need <command>pdftotext</command>
|
<listitem><para>PDF files need <command>pdftotext</command>
|
||||||
which is part of <application>Poppler</application> (usually
|
which is part of <application>Poppler</application> (usually
|
||||||
comes with the <literal>poppler-utils</literal>
|
comes with the <literal>poppler-utils</literal>
|
||||||
package). Avoid the original one from
|
package). Avoid the original one from
|
||||||
<application>Xpdf</application>.</para></listitem>
|
<application>Xpdf</application>.</para></listitem>
|
||||||
|
|
||||||
<listitem><para>Postscript files need <command>pstotext</command>.
|
<listitem><para>MS Word documents need
|
||||||
The original version has an issue with shell
|
|
||||||
character in file names, which is corrected in recent
|
|
||||||
packages. See &RCLAPPS; for more detail.</para>
|
|
||||||
</listitem>
|
|
||||||
|
|
||||||
<listitem><para>MS Word needs
|
|
||||||
<command>antiword</command>. It is also useful to have
|
<command>antiword</command>. It is also useful to have
|
||||||
<command>wvWare</command> installed as it may be
|
<command>wvWare</command> installed as it may be
|
||||||
be used as a fallback for some files which
|
be used as a fallback for some files which
|
||||||
<command>antiword</command> does not handle.</para></listitem>
|
<command>antiword</command> does not handle.</para></listitem>
|
||||||
|
|
||||||
<listitem><para>MS Excel and PowerPoint are processed by
|
|
||||||
internal <command>Python</command> handlers.</para></listitem>
|
|
||||||
|
|
||||||
<listitem><para>MS Open XML (docx) needs <command>
|
|
||||||
xsltproc</command>.</para></listitem>
|
|
||||||
|
|
||||||
<listitem><para>Wordperfect files need <command>wpd2html</command>
|
|
||||||
from the <application>libwpd</application> (or
|
|
||||||
<application>libwpd-tools</application> on Ubuntu)
|
|
||||||
package.</para></listitem>
|
|
||||||
|
|
||||||
<listitem><para>RTF files need <command>unrtf</command>,
|
<listitem><para>RTF files need <command>unrtf</command>,
|
||||||
which, in its older versions, has much trouble with
|
which, in its older versions, has much trouble with
|
||||||
non-western character sets. Many Linux distributions carry
|
non-western character sets. Many Linux distributions carry
|
||||||
outdated <command>unrtf</command> versions. Check
|
outdated <command>unrtf</command> versions. Check
|
||||||
&RCLAPPS; for details.</para></listitem>
|
&RCLAPPS; for details.</para></listitem>
|
||||||
|
|
||||||
<listitem><para>TeX files need <command>untex</command> or
|
|
||||||
<command>detex</command>. Check &RCLAPPS; for sources if it's not
|
|
||||||
packaged for your distribution.</para></listitem>
|
|
||||||
|
|
||||||
<listitem><para>dvi files need <command>dvips</command>.</para>
|
|
||||||
</listitem>
|
|
||||||
|
|
||||||
<listitem><para>djvu files need <command>djvutxt</command> and
|
|
||||||
<command>djvused</command> from the
|
|
||||||
<application>DjVuLibre</application> package.</para></listitem>
|
|
||||||
|
|
||||||
<listitem><para>Audio files: &RCL; releases 1.14 and later use
|
|
||||||
a single <application>Python</application> handler based
|
|
||||||
on <application>mutagen</application> for all audio file
|
|
||||||
types.</para>
|
|
||||||
</listitem>
|
|
||||||
|
|
||||||
<listitem><para>Pictures: &RCL; uses the
|
<listitem><para>Pictures: &RCL; uses the
|
||||||
<application>Exiftool</application>
|
<application>Exiftool</application>
|
||||||
<application>Perl</application> package to extract tag
|
<application>Perl</application> package to extract tag
|
||||||
information. Most image file formats are supported. Note that
|
information. Most image file formats are
|
||||||
there may not be much interest in indexing the technical tags
|
supported.</para></listitem>
|
||||||
(image size, aperture, etc.). This is only of interest if you
|
|
||||||
store personal tags or textual descriptions inside the image
|
|
||||||
files.</para></listitem>
|
|
||||||
|
|
||||||
<listitem><para>chm: files in Microsoft help format need Python and
|
<listitem><para>Up to &RCL; 1.24, many XML-based formats need the
|
||||||
the <application>pychm</application> module (which needs
|
<command>xsltproc</command> command, which usually comes with
|
||||||
<application>chmlib</application>).</para></listitem>
|
<application>libxslt</application>. These are: abiword, fb2
|
||||||
|
ebooks, kword, openoffice, opendocument, svg. &RCL; 1.25 and later
|
||||||
<listitem><para>ICS: up to &RCL; 1.13, iCalendar files need
|
process them internally (using libxslt).</para>
|
||||||
<application>Python</application>
|
|
||||||
and the <application>icalendar</application>
|
|
||||||
module. <application>icalendar</application> is not needed for newer
|
|
||||||
versions, which use internal code.</para></listitem>
|
|
||||||
|
|
||||||
<listitem><para>Zip archives need <application>Python</application>
|
|
||||||
(and the standard zipfile module). </para></listitem>
|
|
||||||
|
|
||||||
<listitem><para>Rar archives need
|
|
||||||
<application>Python</application>, the
|
|
||||||
<application>rarfile</application> Python module and the
|
|
||||||
<command>unrar</command> utility.</para></listitem>
|
|
||||||
|
|
||||||
<listitem><para>Midi karaoke files need
|
|
||||||
<application>Python</application> and the
|
|
||||||
<ulink url="http://pypi.python.org/pypi/midi/0.2.1">
|
|
||||||
<application>Midi module</application></ulink></para>
|
|
||||||
</listitem>
|
</listitem>
|
||||||
|
|
||||||
<listitem><para>Konqueror webarchive format with Python (uses the
|
|
||||||
Tarfile module).</para></listitem>
|
|
||||||
|
|
||||||
<listitem><para>Mimehtml web archive format (support based on
|
|
||||||
the email handler, which introduces some mild weirdness, but
|
|
||||||
still usable).</para></listitem>
|
|
||||||
|
|
||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
|
|
||||||
<para>Text, HTML, email folders, and Scribus files are
|
|
||||||
processed internally. <application>Lyx</application> is used to
|
|
||||||
index Lyx files. Many handlers need <command>iconv</command> and the
|
|
||||||
standard <command>sed</command> and <command>awk</command>.
|
|
||||||
</para>
|
|
||||||
|
|
||||||
</sect1>
|
</sect1>
|
||||||
|
|
||||||
@ -6089,9 +5958,10 @@ for i in range(nres):
|
|||||||
terms. </para></listitem>
|
terms. </para></listitem>
|
||||||
|
|
||||||
<listitem><para><option>--with-fam</option> or
|
<listitem><para><option>--with-fam</option> or
|
||||||
<option>--with-inotify</option> will enable the code for
|
<option>--with-inotify</option> will enable the code for real
|
||||||
real time indexing. Inotify support is enabled by default on
|
time indexing. Inotify support is enabled by default on Linux
|
||||||
recent Linux systems.</para></listitem>
|
systems.</para></listitem>
|
||||||
|
|
||||||
|
|
||||||
<listitem><para><option>--with-qzeitgeist</option> will
|
<listitem><para><option>--with-qzeitgeist</option> will
|
||||||
enable sending <application>Zeitgeist</application>
|
enable sending <application>Zeitgeist</application>
|
||||||
|
|||||||
@ -216,9 +216,9 @@ usesystemfilecommand = 1
|
|||||||
# <var name="systemfilecommand" type="string"><brief>Command used to guess
|
# <var name="systemfilecommand" type="string"><brief>Command used to guess
|
||||||
# MIME types if the internal methods fail</brief><descr>This should be a
|
# MIME types if the internal methods fail</brief><descr>This should be a
|
||||||
# "file -i" workalike. The file path will be added as a last parameter to
|
# "file -i" workalike. The file path will be added as a last parameter to
|
||||||
# the command line. 'xdg-mime' works better than the traditional 'file'
|
# the command line. "xdg-mime" works better than the traditional "file"
|
||||||
# command, and is now the configured default (with a hard-coded fallback to
|
# command, and is now the configured default (with a hard-coded fallback to
|
||||||
# 'file')</descr></var>
|
# "file")</descr></var>
|
||||||
systemfilecommand = xdg-mime query filetype
|
systemfilecommand = xdg-mime query filetype
|
||||||
|
|
||||||
# <var name="processwebqueue" type="bool"><brief>Decide if we process the
|
# <var name="processwebqueue" type="bool"><brief>Decide if we process the
|
||||||
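The systemfilecommand comment states that the file path is appended as the last parameter of the configured command. The following is a minimal Python sketch of that invocation. Splitting the configured string with shlex is an assumption standing in for Recoll's own word splitting, and both function names are hypothetical. Passing the path as a separate argument instead of going through a shell keeps file names that contain spaces or shell metacharacters intact.

```python
import shlex
import subprocess


def build_mime_command(systemfilecommand, path):
    """Return the argument vector: configured command words, then the path."""
    return shlex.split(systemfilecommand) + [path]


def guess_mime(systemfilecommand, path):
    """Run the configured command and return its raw standard output.

    The output format depends on the tool: "xdg-mime query filetype" prints
    only the MIME type, while "file -i" prefixes it with the file name.
    """
    argv = build_mime_command(systemfilecommand, path)
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    print(build_mime_command("xdg-mime query filetype", "/tmp/my report.pdf"))
```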
@ -885,7 +885,7 @@ snippetMaxPosWalk = 1000000
|
|||||||
# include a translation to a Recoll field name, separated by a '|'
|
# include a translation to a Recoll field name, separated by a '|'
|
||||||
# character. If the second element is absent, the tag name is used as the
|
# character. If the second element is absent, the tag name is used as the
|
||||||
# Recoll field name. You will also need to add specifications to the
|
# Recoll field name. You will also need to add specifications to the
|
||||||
# 'fields' file to direct processing of the extracted data.</descr></var>
|
# "fields" file to direct processing of the extracted data.</descr></var>
|
||||||
#pdfextrameta = bibtex:location|location bibtex:booktitle bibtex:pages
|
#pdfextrameta = bibtex:location|location bibtex:booktitle bibtex:pages
|
||||||
|
|
||||||
# <var name="pdfextrametafix" type="fn">
|
# <var name="pdfextrametafix" type="fn">
|
||||||
|
|||||||
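The pdfextrameta comment describes each list element as a metadata tag name. Each name can optionally be followed by `|` and the Recoll field name that should receive the value. When no translation is given, the tag name itself is used. The Python sketch below shows that mapping for the example value. The function name is hypothetical. Splitting on white space with double-quote support is an assumption based on the general rule for list values in the configuration file.

```python
import shlex


def parse_pdfextrameta(value):
    """Map each metadata tag name to the Recoll field that receives it."""
    mapping = {}
    for entry in shlex.split(value):
        tag, _, field = entry.partition("|")
        # No "|field" part (or an empty one): the tag name is the field name.
        mapping[tag] = field or tag
    return mapping


if __name__ == "__main__":
    value = "bibtex:location|location bibtex:booktitle bibtex:pages"
    for tag, field in parse_pdfextrameta(value).items():
        print(tag, "->", field)
```

As the comment notes, the resulting field names still need matching specifications in the "fields" file before the extracted data is processed.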