release 1.20.0

Jean-Francois Dockes 2013-11-09 14:40:06 +01:00
parent bb3fc54400
commit 3e6cbe755f
2 changed files with 228 additions and 47 deletions

@ -294,6 +294,15 @@ Chapter 5. Installation and configuration
to manually copy and modify one of the existing files (the new file name to manually copy and modify one of the existing files (the new file name
should be the output of uname -s). should be the output of uname -s).
5.3.2.1. Building on Solaris
We did not test building the GUI on Solaris for recent versions. You will
need at least Qt 4.4. There are some hints on an old web site page, they
may still be valid.
Someone did test the 1.19 indexer and Python module build, they do work,
with a few minor glitches. Be sure to use GNU make and install.
5.3.3. Installation 5.3.3. Installation
Either type make install or execute recollinstall prefix, in the root of Either type make install or execute recollinstall prefix, in the root of
@ -342,12 +351,25 @@ Chapter 5. Installation and configuration
by comments inside the default files, and we will just give a general by comments inside the default files, and we will just give a general
overview here. overview here.
For each index, there are two sets of configuration files. System-wide By default, for each index, there are two sets of configuration files.
configuration files are kept in a directory named like System-wide configuration files are kept in a directory named like
/usr/[local/]share/recoll/examples, and define default values, shared by /usr/[local/]share/recoll/examples, and define default values, shared by
all indexes. For each index, a parallel set of files defines the all indexes. For each index, a parallel set of files defines the
customized parameters. customized parameters.
In addition (as of Recoll version 1.19.7), it is possible to specify two
additional configuration directories which will be stacked before and
after the user configuration directory. These are defined by the
RECOLL_CONFTOP and RECOLL_CONFMID environment variables. Values from
configuration files inside the top directory will override user ones,
values from configuration files inside the middle directory will override
system ones and be overridden by user ones. These two variables may be of
use to applications which augment Recoll functionality, and need to add
configuration data without disturbing the user's files. Please note that
the two, currently single, values will probably be interpreted as
colon-separated lists in the future: do not use colon characters inside
the directory paths.
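
The override order among the four configuration layers (top, user, middle, system) can be sketched as follows. This is only an illustration of the precedence rules, not Recoll's actual code: the resolve function is hypothetical, the dictionaries stand in for the parsed files of each directory, and the values are labels chosen to show which layer wins.

```python
# Illustration of the override order: top > user > middle > system.
# Each dict stands in for the parsed files of one configuration directory.
# The function name and the sample values are assumptions for this sketch.

def resolve(name, top=None, user=None, mid=None, system=None):
    """Return the value of name from the highest-priority layer defining it."""
    for layer in (top, user, mid, system):
        if layer and name in layer:
            return layer[name]
    return None

system = {"followLinks": "system", "noxattrfields": "system", "autodiacsens": "system"}
mid = {"noxattrfields": "middle", "autodiacsens": "middle"}   # RECOLL_CONFMID
user = {"autodiacsens": "user"}                               # e.g. ~/.recoll
top = {"followLinks": "top"}                                  # RECOLL_CONFTOP

for name in ("followLinks", "noxattrfields", "autodiacsens"):
    print(name, "comes from", resolve(name, top, user, mid, system))
```

An application adding its own defaults would populate the middle layer, so that anything the user sets explicitly still takes effect.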
The default location of the configuration is the .recoll directory in your The default location of the configuration is the .recoll directory in your
home. Most people will only use this directory. home. Most people will only use this directory.
@ -411,7 +433,7 @@ Chapter 5. Installation and configuration
text files with appropriate encodings, and concatenate them to create text files with appropriate encodings, and concatenate them to create
the complete configuration. the complete configuration.
5.4.1. Main configuration file 5.4.1. The main configuration file, recoll.conf
recoll.conf is the main configuration file. It defines things like what to recoll.conf is the main configuration file. It defines things like what to
index (top directories and things to ignore), and the default character index (top directories and things to ignore), and the default character
@ -437,7 +459,7 @@ Chapter 5. Installation and configuration
skippedNames skippedNames
A space-separated list of patterns for names of files or A space-separated list of wildcard patterns for names of files or
directories that should be completely ignored. The list defined in directories that should be completely ignored. The list defined in
the default file is: the default file is:
@ -488,6 +510,16 @@ Chapter 5. Installation and configuration
can set skippedPathsFnmPathname to 0 to disable the use of can set skippedPathsFnmPathname to 0 to disable the use of
FNM_PATHNAME (meaning that /*/dir3 will match /dir1/dir2/dir3). FNM_PATHNAME (meaning that /*/dir3 will match /dir1/dir2/dir3).
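
The effect of FNM_PATHNAME on the /*/dir3 example can be tried out with a short sketch. Python's fnmatch module has no FNM_PATHNAME flag and lets a wildcard match '/' characters, which corresponds to the disabled setting. The match_pathname helper below is a hypothetical, simplified emulation of the enabled setting (it compares the path component by component and does not handle '/' inside bracket expressions); it is not how Recoll itself performs the match.

```python
import fnmatch

def match_plain(path, pattern):
    """Wildcards may match '/' (behaviour without FNM_PATHNAME)."""
    return fnmatch.fnmatchcase(path, pattern)

def match_pathname(path, pattern):
    """Simplified FNM_PATHNAME emulation: a wildcard never crosses a '/'."""
    path_parts = path.split("/")
    pattern_parts = pattern.split("/")
    if len(path_parts) != len(pattern_parts):
        return False
    return all(fnmatch.fnmatchcase(part, pat)
               for part, pat in zip(path_parts, pattern_parts))

for path in ("/dir1/dir3", "/dir1/dir2/dir3"):
    print(path,
          "plain:", match_plain(path, "/*/dir3"),
          "pathname:", match_pathname(path, "/*/dir3"))
```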
zipSkippedNames
A space-separated list of patterns for names of files or
directories that should be ignored inside zip archives. This is
used directly by the zip filter, and has a function similar to
skippedNames, but works independently. Can be redefined for
filesystem subdirectories. For versions up to 1.19, you will need
to update the Zip filter and install a supplementary Python
module. The details are described on the Recoll wiki.
followLinks followLinks
Specifies if the indexer should follow symbolic links while Specifies if the indexer should follow symbolic links while
@ -679,17 +711,41 @@ Chapter 5. Installation and configuration
= val, then select specifier viewer with mimetype|tag=... in = val, then select specifier viewer with mimetype|tag=... in
mimeview. mimeview.
noxattrfields
Recoll versions 1.19 and later automatically translate file
extended attributes into document fields (to be processed
according to the parameters from the fields file). Setting this
variable to 1 will disable the behaviour.
metadatacmds metadatacmds
This allows executing external commands for each file and storing This allows executing external commands for each file and storing
the output in a Recoll field. This could be used for example to the output in Recoll document fields. This could be used for
index external tag data. The value is a list of field names and example to index external tag data. The value is a list of field
commands, don't forget an initial semi-colon. Example: names and commands, don't forget an initial semi-colon. Example:
[/some/area/of/the/fs] [/some/area/of/the/fs]
metadatacmds = ; tags = tmsu tags %f; otherfield = somecmd -xx %f metadatacmds = ; tags = tmsu tags %f; otherfield = somecmd -xx %f
As a specially disgusting hack brought by Recoll 1.19.7, if a
"field name" begins with rclmulti, the data returned by the
command is expected to contain multiple field values, in
configuration file format. This allows setting several fields by
executing a single command. Example:
metadatacmds = ; rclmulti1 = somecmd %f
If somecmd returns data in the form of:
field1 = value1
field2 = value for field2
field1 and field2 will be set inside the document metadata.
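
A command used with an rclmulti entry only has to print one "name = value" line per field. A minimal sketch of such a command follows, assuming it is written in Python; the format_fields helper and the field values are hypothetical, and a real script would compute the values from the file path it receives in place of %f.

```python
def format_fields(fields):
    """Render a dict as 'name = value' lines, one field per line."""
    return "\n".join(f"{name} = {value}" for name, value in fields.items())

# Stand-in values; a real command would derive them from the indexed file.
sample = {"field1": "value1", "field2": "value for field2"}
print(format_fields(sample))
```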
5.4.1.3. Parameters affecting where and how we store things: 5.4.1.3. Parameters affecting where and how we store things:
dbdir dbdir
@ -746,7 +802,7 @@ Chapter 5. Installation and configuration
memory, you can try higher values between 20 and 80. In my memory, you can try higher values between 20 and 80. In my
experience, values beyond 100 are always counterproductive. experience, values beyond 100 are always counterproductive.
5.4.1.4. Indexing parallelism configuration 5.4.1.4. Parameters affecting multithread processing
The Recoll indexing process recollindex can use multiple threads to speed The Recoll indexing process recollindex can use multiple threads to speed
up indexing on multiprocessor systems. The work done to index files is up indexing on multiprocessor systems. The work done to index files is
@ -774,7 +830,7 @@ Chapter 5. Installation and configuration
stage. In practise, deep queues have not been shown to increase stage. In practise, deep queues have not been shown to increase
performance. A value of 0 for the first queue tells Recoll to performance. A value of 0 for the first queue tells Recoll to
perform autoconfiguration (no need for the two other values in perform autoconfiguration (no need for the two other values in
this case)- this is the default configuration. this case) - this is the default configuration.
thrTCounts thrTCounts
@ -804,6 +860,11 @@ Chapter 5. Installation and configuration
thrQSizes = 2 -1 -1 thrQSizes = 2 -1 -1
thrTCounts = 6 1 1 thrTCounts = 6 1 1
The following example would disable multithreading. Indexing will be
performed by a single thread.
thrQSizes = -1 -1 -1
5.4.1.5. Miscellaneous parameters: 5.4.1.5. Miscellaneous parameters:
autodiacsens autodiacsens

@ -174,7 +174,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
5.4. Configuration overview 5.4. Configuration overview
5.4.1. Main configuration file 5.4.1. The main configuration file, recoll.conf
5.4.2. The fields file 5.4.2. The fields file
@ -416,11 +416,11 @@ Chapter 2. Indexing
to be indexed. In the latter case, any type not in the list will be to be indexed. In the latter case, any type not in the list will be
ignored. ignored.
Excluding types can be done by adding name patterns to the skippedNames Excluding types can be done by adding wildcard name patterns to the
list, which can be done from the GUI Index configuration menu. It is also skippedNames list, which can be done from the GUI Index configuration
possible to exclude a mime type independently of the file name by menu. It is also possible to exclude a mime type independently of the file
associating it with the rclnull filter. This can be done by editing the name by associating it with the rclnull filter. This can be done by
mimeconf configuration file. editing the mimeconf configuration file.
In order to define a positive list, You need to edit the main In order to define a positive list, You need to edit the main
configuration file (recoll.conf) and set the indexedmimetypes configuration file (recoll.conf) and set the indexedmimetypes
@ -627,6 +627,11 @@ Chapter 2. Indexing
probably slightly slower, and the feature is still young, so that a probably slightly slower, and the feature is still young, so that a
certain amount of weirdness cannot be excluded. certain amount of weirdness cannot be excluded.
One of the most adverse consequences of using a raw index is that some
phrase and proximity searches may become impossible: because each term
needs to be expanded, and all combinations searched for, the
multiplicative expansion may become unmanageable.
2.3.3. The index configuration GUI 2.3.3. The index configuration GUI
Most parameters for a given index configuration can be set from a recoll Most parameters for a given index configuration can be set from a recoll
@ -860,6 +865,24 @@ Chapter 2. Indexing
it if your system is short on resources. Periodic indexing is adequate in it if your system is short on resources. Periodic indexing is adequate in
most cases. most cases.
Increasing resources for inotify
On Linux systems, monitoring a big tree may imply increasing the resources
available to inotify, which are normally defined in /etc/sysctl.conf.
### inotify
#
# cat /proc/sys/fs/inotify/max_queued_events - 16384
# cat /proc/sys/fs/inotify/max_user_instances - 128
# cat /proc/sys/fs/inotify/max_user_watches - 16384
#
# -- Change to:
#
fs.inotify.max_queued_events=32768
fs.inotify.max_user_instances=256
fs.inotify.max_user_watches=32768
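
Before changing these values, it can help to check what the running kernel currently uses. The following sketch reads the three limits from /proc/sys/fs/inotify, the location shown in the comments above. The read_inotify_limits function is a hypothetical helper; it returns None for any value it cannot read, for example on a system that is not Linux.

```python
import os

INOTIFY_DIR = "/proc/sys/fs/inotify"
NAMES = ("max_queued_events", "max_user_instances", "max_user_watches")

def read_inotify_limits(base=INOTIFY_DIR):
    """Return the current inotify limits as a dict (None when unreadable)."""
    limits = {}
    for name in NAMES:
        try:
            with open(os.path.join(base, name)) as f:
                limits[name] = int(f.read().strip())
        except (OSError, ValueError):
            limits[name] = None
    return limits

# Print the values in the same key=value form as /etc/sysctl.conf.
for name, value in read_inotify_limits().items():
    print(f"fs.inotify.{name}={value}")
```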
2.8.1. Slowing down the reindexing rate for fast changing files 2.8.1. Slowing down the reindexing rate for fast changing files
When using the real time monitor, it may happen that some files need to be When using the real time monitor, it may happen that some files need to be
@ -2702,14 +2725,22 @@ Chapter 4. Programming interface
4.3.2.1. Introduction 4.3.2.1. Introduction
Recoll versions after 1.11 define a Python programming interface, both for Recoll versions after 1.11 define a Python programming interface, both for
searching and indexing. searching and indexing. The indexing portion has seen little use, but the
searching one is used in the Recoll Ubuntu Unity Lens and Recoll Web UI.
The API is inspired by the Python database API specification, version 1.0 The API is inspired by the Python database API specification. There were
for Recoll versions up to 1.18, version 2.0 for Recoll versions 1.19 and two major changes in recent Recoll versions:
later. The package structure changed with Recoll 1.19 too. We will mostly
describe the new API and package structure here. A paragraph at the end of o The basis for the Recoll API changed from Python database API version
this section will explain a few differences and ways to write code 1.0 (Recoll versions up to 1.18.1), to version 2.0 (Recoll 1.18.2 and
compatible with both versions. later).
o The recoll module became a package (with an internal recoll module) as
of Recoll version 1.19, in order to add more functions. For existing
code, this only changes the way the interface must be imported.
We will mostly describe the new API and package structure here. A
paragraph at the end of this section will explain a few differences and
ways to write code compatible with both versions.
The Python interface can be found in the source package, under The Python interface can be found in the source package, under
python/recoll. python/recoll.
@ -2723,6 +2754,12 @@ Chapter 4. Programming interface
python setup.py install python setup.py install
The normal Recoll installer installs the Python API along with the main
code.
When installing from a repository, and depending on the distribution, the
Python API can sometimes be found in a separate package.
4.3.2.2. Recoll package 4.3.2.2. Recoll package
The recoll package contains two modules: The recoll package contains two modules:
@ -2766,7 +2803,17 @@ Chapter 4. Programming interface
These aliases return a blank Query object for this index. These aliases return a blank Query object for this index.
Db.setAbstractParams(maxchars, contextwords) Db.setAbstractParams(maxchars, contextwords)
Set the parameters used to build snippets. Set the parameters used to build snippets (sets of keywords in
context text fragments). maxchars defines the maximum total size
of the abstract. contextwords defines how many terms are shown
around the keyword.
Db.termMatch(match_type, expr, field='', maxlen=-1, casesens=False,
diacsens=False, lang='english')
Expand an expression against the index term list. Performs the
basic function from the GUI term explorer tool. match_type can be
either of wildcard, regexp or stem. Returns a list of terms
expanded from the input expression.
The Query class The Query class
@ -2794,7 +2841,7 @@ Chapter 4. Programming interface
Fetches the next Doc object from the current search results. Fetches the next Doc object from the current search results.
Query.close() Query.close()
Closes the connection. The object is unusable after the call. Closes the query. The object is unusable after the call.
Query.scroll(value, mode='relative') Query.scroll(value, mode='relative')
Adjusts the position in the current result set. mode can be Adjusts the position in the current result set. mode can be
@ -2803,9 +2850,9 @@ Chapter 4. Programming interface
Query.getgroups() Query.getgroups()
Retrieves the expanded query terms as a list of pairs. Meaningful Retrieves the expanded query terms as a list of pairs. Meaningful
only after executexx In each pair, the first entry is a list of only after executexx In each pair, the first entry is a list of
user terms, the second a list of query terms as derived from the user terms (of size one for simple terms, or more for group and
user terms and used in the Xapian Query. The size of each list is phrase clauses), the second a list of query terms as derived from
one for simple terms, or more for group and phrase clauses. the user terms and used in the Xapian Query.
Query.getxquery() Query.getxquery()
Return the Xapian query description as a Unicode string. Return the Xapian query description as a Unicode string.
@ -2837,8 +2884,8 @@ Chapter 4. Programming interface
Query.rownumber Query.rownumber
Next index to be fetched from results. Normally increments after Next index to be fetched from results. Normally increments after
each fetchone() call, but can be set/reset before the call effect each fetchone() call, but can be set/reset before the call to
seeking. Starts at 0. effect seeking (equivalent to using scroll()). Starts at 0.
The Doc class The Doc class
@ -2887,11 +2934,13 @@ Chapter 4. Programming interface
4.3.2.4. The rclextract module 4.3.2.4. The rclextract module
Document content is not provided by an index query. To access it, the data Index queries do not provide document content (only a partial and
extraction part of the indexing process must be performed (subdocument imprecise reconstruction is performed to show the snippets text). In order
access and format translation). This is not trivial in general. The to access the actual document data, the data extraction part of the
rclextract module currently provides a single class which can be used to indexing process must be performed (subdocument access and format
access the data content for result documents. translation). This is not trivial in general. The rclextract module
currently provides a single class which can be used to access the data
content for result documents.
Classes Classes
@ -2905,13 +2954,23 @@ Chapter 4. Programming interface
Extractor.textextract(ipath) Extractor.textextract(ipath)
Extract document defined by ipath and return a Doc object. The Extract document defined by ipath and return a Doc object. The
doc.text field has the document text as either text/plain or doc.text field has the document text converted to either
text/html according to doc.mimetype. text/plain or text/html according to doc.mimetype. The typical use
would be as follows:
Extractor.idoctofile() qdoc = query.fetchone()
extractor = recoll.Extractor(qdoc)
doc = extractor.textextract(qdoc.ipath)
# use doc.text, e.g. for previewing
Extractor.idoctofile(ipath, targetmtype, outfile='')
Extracts document into an output file, which can be given Extracts document into an output file, which can be given
explicitly or will be created as a temporary file to be deleted by explicitly or will be created as a temporary file to be deleted by
the caller. the caller. Typical use:
qdoc = query.fetchone()
extractor = recoll.Extractor(qdoc)
filename = extractor.idoctofile(qdoc.ipath, qdoc.mimetype)
4.3.2.5. Example code 4.3.2.5. Example code
@ -3224,6 +3283,15 @@ Chapter 5. Installation and configuration
to manually copy and modify one of the existing files (the new file name to manually copy and modify one of the existing files (the new file name
should be the output of uname -s). should be the output of uname -s).
5.3.2.1. Building on Solaris
We did not test building the GUI on Solaris for recent versions. You will
need at least Qt 4.4. There are some hints on an old web site page, they
may still be valid.
Someone did test the 1.19 indexer and Python module build, they do work,
with a few minor glitches. Be sure to use GNU make and install.
5.3.3. Installation 5.3.3. Installation
Either type make install or execute recollinstall prefix, in the root of Either type make install or execute recollinstall prefix, in the root of
@ -3259,12 +3327,25 @@ Chapter 5. Installation and configuration
by comments inside the default files, and we will just give a general by comments inside the default files, and we will just give a general
overview here. overview here.
For each index, there are two sets of configuration files. System-wide By default, for each index, there are two sets of configuration files.
configuration files are kept in a directory named like System-wide configuration files are kept in a directory named like
/usr/[local/]share/recoll/examples, and define default values, shared by /usr/[local/]share/recoll/examples, and define default values, shared by
all indexes. For each index, a parallel set of files defines the all indexes. For each index, a parallel set of files defines the
customized parameters. customized parameters.
In addition (as of Recoll version 1.19.7), it is possible to specify two
additional configuration directories which will be stacked before and
after the user configuration directory. These are defined by the
RECOLL_CONFTOP and RECOLL_CONFMID environment variables. Values from
configuration files inside the top directory will override user ones,
values from configuration files inside the middle directory will override
system ones and be overridden by user ones. These two variables may be of
use to applications which augment Recoll functionality, and need to add
configuration data without disturbing the user's files. Please note that
the two, currently single, values will probably be interpreted as
colon-separated lists in the future: do not use colon characters inside
the directory paths.
The default location of the configuration is the .recoll directory in your The default location of the configuration is the .recoll directory in your
home. Most people will only use this directory. home. Most people will only use this directory.
@ -3328,7 +3409,7 @@ Chapter 5. Installation and configuration
text files with appropriate encodings, and concatenate them to create text files with appropriate encodings, and concatenate them to create
the complete configuration. the complete configuration.
5.4.1. Main configuration file 5.4.1. The main configuration file, recoll.conf
recoll.conf is the main configuration file. It defines things like what to recoll.conf is the main configuration file. It defines things like what to
index (top directories and things to ignore), and the default character index (top directories and things to ignore), and the default character
@ -3354,7 +3435,7 @@ Chapter 5. Installation and configuration
skippedNames skippedNames
A space-separated list of patterns for names of files or A space-separated list of wildcard patterns for names of files or
directories that should be completely ignored. The list defined in directories that should be completely ignored. The list defined in
the default file is: the default file is:
@ -3405,6 +3486,16 @@ Chapter 5. Installation and configuration
can set skippedPathsFnmPathname to 0 to disable the use of can set skippedPathsFnmPathname to 0 to disable the use of
FNM_PATHNAME (meaning that /*/dir3 will match /dir1/dir2/dir3). FNM_PATHNAME (meaning that /*/dir3 will match /dir1/dir2/dir3).
zipSkippedNames
A space-separated list of patterns for names of files or
directories that should be ignored inside zip archives. This is
used directly by the zip filter, and has a function similar to
skippedNames, but works independently. Can be redefined for
filesystem subdirectories. For versions up to 1.19, you will need
to update the Zip filter and install a supplementary Python
module. The details are described on the Recoll wiki.
followLinks followLinks
Specifies if the indexer should follow symbolic links while Specifies if the indexer should follow symbolic links while
@ -3596,17 +3687,41 @@ Chapter 5. Installation and configuration
= val, then select specifier viewer with mimetype|tag=... in = val, then select specifier viewer with mimetype|tag=... in
mimeview. mimeview.
noxattrfields
Recoll versions 1.19 and later automatically translate file
extended attributes into document fields (to be processed
according to the parameters from the fields file). Setting this
variable to 1 will disable the behaviour.
metadatacmds metadatacmds
This allows executing external commands for each file and storing This allows executing external commands for each file and storing
the output in a Recoll field. This could be used for example to the output in Recoll document fields. This could be used for
index external tag data. The value is a list of field names and example to index external tag data. The value is a list of field
commands, don't forget an initial semi-colon. Example: names and commands, don't forget an initial semi-colon. Example:
[/some/area/of/the/fs] [/some/area/of/the/fs]
metadatacmds = ; tags = tmsu tags %f; otherfield = somecmd -xx %f metadatacmds = ; tags = tmsu tags %f; otherfield = somecmd -xx %f
As a specially disgusting hack brought by Recoll 1.19.7, if a
"field name" begins with rclmulti, the data returned by the
command is expected to contain multiple field values, in
configuration file format. This allows setting several fields by
executing a single command. Example:
metadatacmds = ; rclmulti1 = somecmd %f
If somecmd returns data in the form of:
field1 = value1
field2 = value for field2
field1 and field2 will be set inside the document metadata.
5.4.1.3. Parameters affecting where and how we store things: 5.4.1.3. Parameters affecting where and how we store things:
dbdir dbdir
@ -3663,7 +3778,7 @@ Chapter 5. Installation and configuration
memory, you can try higher values between 20 and 80. In my memory, you can try higher values between 20 and 80. In my
experience, values beyond 100 are always counterproductive. experience, values beyond 100 are always counterproductive.
5.4.1.4. Indexing parallelism configuration 5.4.1.4. Parameters affecting multithread processing
The Recoll indexing process recollindex can use multiple threads to speed The Recoll indexing process recollindex can use multiple threads to speed
up indexing on multiprocessor systems. The work done to index files is up indexing on multiprocessor systems. The work done to index files is
@ -3691,7 +3806,7 @@ Chapter 5. Installation and configuration
stage. In practise, deep queues have not been shown to increase stage. In practise, deep queues have not been shown to increase
performance. A value of 0 for the first queue tells Recoll to performance. A value of 0 for the first queue tells Recoll to
perform autoconfiguration (no need for the two other values in perform autoconfiguration (no need for the two other values in
this case)- this is the default configuration. this case) - this is the default configuration.
thrTCounts thrTCounts
@ -3721,6 +3836,11 @@ Chapter 5. Installation and configuration
thrQSizes = 2 -1 -1 thrQSizes = 2 -1 -1
thrTCounts = 6 1 1 thrTCounts = 6 1 1
The following example would disable multithreading. Indexing will be
performed by a single thread.
thrQSizes = -1 -1 -1
5.4.1.5. Miscellaneous parameters: 5.4.1.5. Miscellaneous parameters:
autodiacsens autodiacsens