diff --git a/src/doc/user/usermanual.sgml b/src/doc/user/usermanual.sgml
index 0260e65f..4f0ca928 100644
--- a/src/doc/user/usermanual.sgml
+++ b/src/doc/user/usermanual.sgml
@@ -26,8 +26,7 @@
2005-2012
- Jean-Francois
- Dockes
+ Jean-Francois Dockes
@@ -193,7 +192,7 @@
command line interface, a
Python
- programming interface, a
+ programming interface, a
KDE KIO slave module, and
a Ubuntu Unity Lens module.
@@ -209,100 +208,143 @@
IntroductionIndexing is the process by which the set of documents is
- analyzed and the data entered into the database. &RCL; indexing
- is normally incremental: documents will only be processed if
- they have been modified. On the first execution, all
- documents will need processing. A full index build can be forced
- later by specifying an option to the indexing command
- (recollindex ).
+ analyzed and the data entered into the database. &RCL;
+ indexing is normally incremental: documents will only be
+ processed if they have been modified. On the first execution,
+ all documents will need processing. A full index build can be
+ forced later by specifying an option to the indexing command
+ (recollindex
+ or ).
- &RCL; indexing can be performed with two different
- methods:
+ The following sections give an overview of different
+ aspects of the indexing processes and configuration, with links
+ to detailed sections.
-
+
+ Indexing modes
-
- Periodic (or Batch) indexing:
- indexing takes place at discrete
- times, by executing the recollindex
- command. The typical usage is to have a nightly indexing run
- programmed
- into your cron file.
-
-
+ &RCL; indexing can be performed in two different modes:
+
+
+
+
+ Periodic (or batch) indexing:
+ indexing takes place at discrete
+ times, by executing the recollindex
+ command. The typical usage is to have a nightly indexing run
+
+ programmed into
+ your cron file.
+
+
+
+ Real
+ time indexing:
+ indexing takes place as soon as a file is created or
+ changed. recollindex runs as a daemon
+ and uses a file system alteration monitor such as
+ inotify,
+ Fam or
+ Gamin
+ to detect file changes.
+
+
+
+
+ The choice between the two methods is mostly a matter of
+ preference, and they can be combined by setting up multiple
+ indexes (e.g. use periodic indexing on a big documentation
+ directory, and real time indexing on a small home
+ directory). Monitoring a big file system tree can consume
+ significant system resources.
-
- Real time indexing:
- indexing takes place as soon as a file is created or
- changed. recollindex runs as a daemon
- and uses a file system alteration monitor such as
- inotify,
- Fam or
- Gamin
- to detect file changes.
-
-
-
+
- The choice between the two methods is mostly a matter of
- preference, and they can be combined by setting up multiple
- indexes (ie: use periodic indexing on a big documentation
- directory, and real time indexing on a small home
- directory). Monitoring a big file system tree can consume
- significant system resources.
+
+ Configurations, multiple indexes
+
+ The parameters describing what is to be indexed and
+ local preferences are defined in text files contained in a
+ configuration
+ directory.
+ All parameters have defaults, defined in system-wide
+ files.
+ Without further configuration, &RCL; will index all
+ appropriate files from your home directory, with a reasonable
+ set of defaults.
+ A default personal configuration directory
+ ($HOME/.recoll/) is created
+ when a &RCL; program is first executed. It is possible to
+ create other configuration directories, and use them by
+ setting the RECOLL_CONFDIR environment
+ variable, or giving the option to any of
+ the &RCL; commands.
- &RCL; knows about quite a few different document
- types. The parameters for document types recognition and
- processing are set in
- configuration files.
+ In some cases, it may be useful to index different
+ areas of the file system to separate databases. You can do this
+ by using multiple configuration directories, each indexing a
+ file system area to a specific database. Typically, this
+ would be done to separate personal and shared
+ indexes, or to take advantage of the organization of your data
+ to improve search precision.
+ The generated indexes can
+ be queried
+ concurrently in a transparent manner.
- Most file types, like HTML or word processing files, only hold
- one document. Some file types, like email folders or zip
- archives, can hold many individually indexed documents, which may
- in turn be themselves compound ones. Such hierarchies can go quite
- deep, and &RCL; can process, for example, an
- ms-word
- document stored as an attachment to an email message inside an
- email folder archived in a zip file...
+ For index generation, multiple configurations are
+ totally independent of each other. When multiple indexes
+ are used for searches,
+ some parameters
+ should be consistent among the configurations.
- &RCL; indexing processes plain text, HTML, OpenDocument
- (Open/LibreOffice), email formats, and a few others internally.
+
- Other file types (ie: postscript, pdf, ms-word, rtf ...)
- need external applications for preprocessing. The list is in the
- installation
- section. After every indexing operation, &RCL; updates a list of
- commands that would be needed for indexing existing files
- types. This list can be displayed by selecting the menu option
-
+
+ Document types
+ &RCL; knows about quite a few different document
+ types. The parameters for document types recognition and
+ processing are set in
+ configuration files.
+
+ Most file types, like HTML or word processing files, only hold
+ one document. Some file types, like email folders or zip
+ archives, can hold many individually indexed documents, which may
+ themselves be compound ones. Such hierarchies can go quite
+ deep, and &RCL; can process, for example, an
+ ms-word
+ document stored as an attachment to an email message inside an
+ email folder archived in a zip file...
+
+ &RCL; indexing processes plain text, HTML, OpenDocument
+ (Open/LibreOffice), email formats, and a few others internally.
+
+ Other file types (e.g. postscript, pdf, ms-word, rtf ...)
+ need external applications for preprocessing. The list is in the
+ installation
+ section. After every indexing operation, &RCL; updates a list of
+ commands that would be needed for indexing existing file
+ types. This list can be displayed by selecting the menu option
+ FileShow Missing Helpers
-
- in the recoll GUI. It is stored in the
- missing text file inside the configuration
- directory.
+
+ in the recoll GUI. It is stored in the
+ missing text file inside the configuration
+ directory.
+
- Without further configuration, &RCL; will index all
- appropriate files from your home directory, with a reasonable
- set of defaults.
-
- In some cases, it may be interesting to index different
- areas of the file system to separate databases. You can do this
- by using multiple configuration directories, each indexing a
- file system area to a specific database. See the
- section about using multiple
- databases for more information on multiple configurations
- and indexes.
-
- In the rare case where the index becomes corrupted (which can
- signal itself by weird search results or crashes), the index files
- need to be erased before restarting a clean indexing pass. Just delete
- the xapiandb directory (see
- next section), or,
- alternatively, start the next recollindex with the
- option, which will reset the database before
- indexing.
+
+ Recovery
+ In the rare case where the index becomes corrupted (which can
+ show up as weird search results or crashes), the index files
+ need to be erased before restarting a clean indexing pass. Just delete
+ the xapiandb directory (see
+ next section), or,
+ alternatively, start the next recollindex with the
+ option, which will reset the database before
+ indexing.
+
@@ -313,10 +355,8 @@
xapiandb subdirectory of the &RCL;
configuration directory, typically
$HOME/.recoll/xapiandb/. This can be
- changed via two different methods (with different purposes):
-
+ changed via two different methods (with different purposes):
-
You can specify a different configuration
directory by setting the RECOLL_CONFDIR
environment variable, or using the
@@ -341,6 +381,7 @@ recoll
that you wish to make searchable.
+
You can also specify a different storage
location for the index by setting the dbdir
parameter in the configuration file
@@ -352,13 +393,14 @@ recoll
+
- The size of the index is determined by the document set size,
- but the ratio can vary a lot. For a typical mixed
- set of documents, the index size will often be close to
- the data set size. In specific cases (a set of compressed
- mbox files for example), the index can become much bigger than
- the documents. It may also be much smaller if the documents
+ The size of the index is determined by the size of the set
+ of documents, but the ratio can vary a lot. For a typical
+ mixed set of documents, the index size will often be close to
+ the data set size. In specific cases (a set of compressed mbox
+ files for example), the index can become much bigger than the
+ documents. It may also be much smaller if the documents
contain a lot of images or other non-indexed data (an extreme
example being a set of mp3 files where only the tags would be
indexed).
@@ -388,7 +430,7 @@ recoll
explicitly delete the old index, then run a normal indexing
process.
- Unfortunately, using the option to
+ Using the option to
recollindex is not sufficient to change the
format, you will have to delete all files inside the index
directory (typically ~/.recoll/xapiandb)
@@ -430,11 +472,6 @@ recoll
editing the text files or using the dialogs in the
recoll GUI.
- You can also use multiple
- indexes defined by separate configurations, typically to
- separate personal and shared indexes, or to take advantage of
- the organization of your data to improve search precision.
-
The first time you start recoll, you
will be asked whether or not you would like it to build the
index. If you want to adjust the configuration before
@@ -582,19 +619,32 @@ recoll
menu entry.After such an interruption, the index will be somewhat
- inconsistent because some operations which are normally performed
- at the end of the indexing pass will have been skipped (for
- example, the stemming and spelling databases will be inexistant
- or out of date). You just need to restart indexing at a later
- time to restore consistency. The indexing will restart at the
- interruption point (the full file tree will be traversed,
- but files that were indexed up to the interruption and are still
- up to date will not need to be reindexed).
+ inconsistent because some operations which are normally
+ performed at the end of the indexing pass will have been
+ skipped (for example, the stemming and spelling databases
+ will be nonexistent or out of date). You just need to restart
+ indexing at a later time to restore consistency. The
+ indexing will restart at the interruption point (the full
+ file tree will be traversed, but files that were indexed up
+ to the interruption and for which the index is still up to
+ date will not need to be reindexed).recollindex has a number of other options
- which are described in its man page.
-
- Of special interest maybe are the and
+ which are described in its man page. Only a few will be
+ described here.
+ Option will reset the index when
+ starting. This is almost the same as destroying the index
+ files (the nuance is that the Xapian format version will not
+ be changed).
+ Option will force the update of all
+ documents without resetting the index first. This will not
+ have the "clean start" aspect of , but
+ the advantage is that the index will remain available for
+ querying while it is rebuilt, which can be a significant
+ advantage if it is very big (some installations need days
+ for a full index rebuild).
+ Also of possible interest are
+ the and
options. allows
indexing an explicit list of files (given as command line
parameters or read on stdin).
@@ -799,7 +849,7 @@ fvwm
case (they would typically be printed without white
space).
-
+ Simple search
@@ -907,7 +957,7 @@ fvwm
this mode from the Query Language mode, where
you have to care about the syntax.
- You can use the
+ You can use the
ToolsAdvanced search
@@ -916,7 +966,7 @@ fvwm
-
+ The default result listAfter starting a search, a list of results will instantly
@@ -927,7 +977,7 @@ fvwm
matches the query). You can sort the result by ascending or
descending date by using the vertical arrows in the toolbar (the old
sort tool is gone after release 1.15, because the new result table has much better
+ linkend="rcl.search.gui.restable">result table has much better
capability).Clicking on the
@@ -965,7 +1015,7 @@ fvwm
The format of the result list entries is entirely
configurable by using the preference dialog to
- edit an HTML
+ edit an HTML
fragment.You can click on the Query details link
@@ -981,7 +1031,7 @@ fvwm
results.
-
+ The result list right-click menuApart from the preview and edit links, you can display a
@@ -1038,7 +1088,7 @@ fvwm
-
+ The result tableIn &RCL; 1.15 and newer, the results can be displayed in
@@ -1072,7 +1122,7 @@ fvwm
-
+ The preview windowThe preview window opens when you first click a
@@ -1093,7 +1143,7 @@ fvwm
window.Of course you can also close a preview window by using the
- window manager button in the top of the frame.
+ window manager button in the top of the frame.
You can display successive or previous documents from the
result list inside a preview tab by typing
@@ -1101,34 +1151,77 @@ fvwm
Shift+Up (Down
and Up are the arrow keys).
- The preview tabs have an internal incremental search
- function. You initiate the search either by typing a
- / (slash) or CTL-F inside the text
- area or by clicking into the Search for: text
- field and entering the search string. You can then use the
- Next and Previous buttons
- to find the next/previous occurrence. You can also type
- F3 inside the text area to get to the next
- occurrence.
-
- If you have a search string entered and you use Ctrl-Up/Ctrl-Down
- to browse the results, the search is initiated for each successive
- document. If the string is found, the cursor will be positioned
- at the first occurrence of the search string.
-
A right-click menu in the text area allows switching
- between displaying the main text or the contents of fields
- associated to the document (ie: author, abtract, etc.). This is
- especially useful in cases where the term match did not occur in
- the main text but in one of the fields.
-
+ between displaying the main text or the contents of fields
+ associated with the document (e.g. author, abstract, etc.). This is
+ especially useful in cases where the term match did not occur in
+ the main text but in one of the fields. In the case of
+ images, you can switch between three displays: the image
+ itself, the image metadata as extracted
+ by exiftool and the fields, which is the
+ metadata stored in the index.
+
+
You can print the current preview window contents by typing
Ctrl-P (Ctrl +
P) in the window text.
+
+
+
+ Searching inside the preview
+
+ The preview window has an internal search capability,
+ mostly controlled by the panel at the bottom of the window,
+ which works in two modes: as a classical editor incremental
+ search, where we look for the text entered in the entry
+ zone, or as a way to walk the matches between the document
+ and the &RCL; query that found it.
+
+
+
+ Incremental text search
+ The preview tabs have an internal incremental search
+ function. You initiate the search either by typing a
+ / (slash) or Ctrl-F
+ inside the text area or by clicking into
+ the Search for: text field and
+ entering the search string. You can then use the
+ Next
+ and Previous buttons
+ to find the next/previous occurrence. You can also type
+ F3 inside the text area to get to the next
+ occurrence.
+ If you have a search string entered and you use
+ Ctrl-Up/Ctrl-Down to browse the results, the search is
+ initiated for each successive document. If the string is
+ found, the cursor will be positioned at the first
+ occurrence of the search string.
+
+
+
+
+ Walking the match lists
+ If the entry area is empty when you click
+ the Next
+ or Previous buttons, the editor will
+ be scrolled to show the next match to any search term
+ (the next highlighted zone). If you select a search group
+ from the dropdown list and click Next
+ or Previous, the match list for this
+ group will be walked. This is not the same as a text
+ search, because the occurrences will include non-exact
+ matches (as caused by stemming or wildcards). The search
+ will revert to the text mode as soon as you edit the
+ entry area.
+
+
+
+
+
-
+ Complex/advanced searchThe advanced search dialog helps you build more complex queries
@@ -1159,7 +1252,7 @@ fvwm
Click on the Show query details link at
the top of the result page to see the query expansion.
-
+ Advanced search: the "find" tabThis part of the dialog lets you construct a query by
@@ -1216,7 +1309,7 @@ fvwm
-
+ Advanced search: the "filter" tabThis part of the dialog has several sections which allow
@@ -1272,7 +1365,7 @@ fvwm
-
+ The term explorer tool&RCL; automatically manages the expansion of search terms
@@ -1351,62 +1444,45 @@ fvwm
-
+ Multiple databases
- Multiple &RCL; databases or indexes can be created by
- using several configuration directories which are usually set to
- index different areas of the file system. A specific index can
- be selected for updating or searching, using the
- RECOLL_CONFDIR environment variable or the
- option to recoll and
- recollindex.
+ See the section
+ describing the use of multiple indexes for
+ general information. Only the aspects concerning
+ the recoll GUI are described here.
- A recollindex program instance can only
- update one specific index.
-
- A recoll program instance is also
- associated with a specific index, which is the one to be
- updated by its indexing thread, but it can use any
- number of &RCL; indexes for searching. The external indexes
- can be selected through the external
- indexes tab in the preferences dialog.
+ A recoll program instance is always
+ associated with a specific index, which is the one to be updated
+ when requested from the File menu, but it can
+ use any number of &RCL; indexes for searching. The external
+ indexes can be selected through the external
+ indexes tab in the preferences dialog.Index selection is performed in two phases. A set of all
- usable indexes must first be defined, and then the subset of
- indexes to be used for searching. Of course, these parameters
- are retained across program executions (there are kept
- separately for each &RCL; configuration). The set of all indexes
- is usually quite stable, while the active ones might typically
- be adjusted quite frequently.
+ usable indexes must first be defined, and then the subset of
+ indexes to be used for searching. Of course, these parameters
+ are retained across program executions (they are kept
+ separately for each &RCL; configuration). The set of all indexes
+ is usually quite stable, while the active ones might typically
+ be adjusted quite frequently.
The main index (defined by
- RECOLL_CONFDIR) is always active. If this is
- undesirable, you can set up your base configuration to index
- an empty directory.
+ RECOLL_CONFDIR) is always active. If this is
+ undesirable, you can set up your base configuration to index
+ an empty directory.
As building the set of all indexes can be a little tedious
- when done through the user interface, you can use the
- RECOLL_EXTRA_DBS environment
- variable to provide an initial set. This might typically be
- set up by a system administrator so that every user does not
- have to do it. The variable should define a colon-separated list
- of index directories, ie:
-
- export RECOLL_EXTRA_DBS=/some/place/xapiandb:/some/other/db
+ when done through the user interface, you can use the
+ RECOLL_EXTRA_DBS environment
+ variable to provide an initial set. This might typically be
+ set up by a system administrator so that every user does not
+ have to do it. The variable should define a colon-separated list
+ of index directories, e.g.:
+
+ export RECOLL_EXTRA_DBS=/some/place/xapiandb:/some/other/db
- A typical usage scenario for the multiple index feature
- would be for a system administrator to set up a central index
- for shared data, that you choose to search or not in addition to
- your personal data. Of course, there are other
- possibilities. There are many cases where you know the subset of
- files that should be searched, and where narrowing the search
- can improve the results. You can achieve approximately the same
- effect with the directory filter in advanced search, but
- multiple indexes will have much better performance and may be
- worth the trouble.
-
- Another environment variable,
+ Another environment variable,
RECOLL_ACTIVE_EXTRA_DBS allows adding to the active
list of indexes. This variable was suggested and implemented by a
&RCL; user. It is mostly useful if you use scripts to mount
@@ -1415,18 +1491,17 @@ fvwm
RECOLL_ACTIVE_EXTRA_DBS, you can add and activate
the index for the mounted volume when starting
recoll.
-
-
- RECOLL_ACTIVE_EXTRA_DBS is available for
+
+
+ RECOLL_ACTIVE_EXTRA_DBS is available for
&RCL; versions 1.17.2 and later. A change was made in the same
update so that recoll will
automatically deactivate unreachable indexes when starting
up.
-
-
+ Document historyDocuments that you actually view (with the internal preview
@@ -1441,7 +1516,7 @@ fvwm
-
+ Sorting search results and collapsing duplicatesThe documents in a result list are normally sorted in
@@ -1471,10 +1546,10 @@ fvwm
-
+ Search tips, shortcuts
-
+ Terms and search expansionTerm completion
@@ -1539,7 +1614,7 @@ fvwm
-
+ Working with phrases and proximityPhrases and Proximity searches
@@ -1587,7 +1662,7 @@ fvwm
-
+ OthersUsing fields
@@ -1656,7 +1731,7 @@ fvwm
-
+ Customizing the search interfaceYou can customize some aspects of the search interface by using
@@ -1668,7 +1743,7 @@ fvwm
returning results, and what indexes are searched.
-
+ User interface parameters:
@@ -1764,7 +1839,7 @@ fvwm
-
+ Result list parameters:
@@ -1780,18 +1855,18 @@ fvwm
config (try the qtconfig command).
-
+ Edit result list paragraph format string:
allows you to change the presentation of each result list
- entry. See the
+ entry. See the
result list customisation section.
-
+ Edit result page html header insert:
allows you to define text inserted at the end of the result
page html header.
- More detail in the
+ More detail in the
result list customisation section.
@@ -1801,7 +1876,7 @@ fvwm
should be specified as an strftime() string (man strftime).
-
+ Abstract snippet separator:
for synthetic abstracts built from index data, which are
usually made of several snippets from different parts of the
@@ -1812,7 +1887,7 @@ fvwm
-
+ Search parameters:
@@ -1884,7 +1959,7 @@ fvwm
-
+ External indexes:This panel will let you browse for additional indexes
that you may want to search. External indexes are designated by
@@ -1905,7 +1980,7 @@ fvwm
need to implement a way of purging the index from stale data,
-
+ The result list formatThe result list presentation can be exhaustively customized
@@ -1934,7 +2009,7 @@ fvwm
page about
customising the result list on the &RCL; web site.
-
+ The paragraph formatThis is an arbitrary HTML string where the following printf-like
@@ -2039,7 +2114,7 @@ fvwm
site, with pictures to show how they look.It is also possible to
-
+
define the value of the snippet separator inside the abstract
section.
@@ -2048,10 +2123,10 @@ fvwm
-
+ Searching with the KDE KIO slave
-
+ What's thisThe &RCL; KIO slave allows performing a &RCL; search
@@ -2086,7 +2161,7 @@ fvwm
-
+ Searchable documentsAs a sample application, the &RCL; KIO slave could allow
@@ -2488,7 +2563,7 @@ text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r
Using a * at the end of a
word can produce more matches than you would think, and
strange search results. You can use the term explorer tool to
+ linkend="rcl.search.gui.termexplorer">term explorer tool to
check what completions exist for a given term. You can also
see exactly what search was performed by clicking on the link
at the top of the result list. In general, for natural
@@ -2578,8 +2653,57 @@ text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r
+
+
+ Multiple databases
+
+ Multiple &RCL; databases or indexes can be created by
+ using several configuration directories which are usually set to
+ index different areas of the file system. A specific index can
+ be selected for updating or searching, using the
+ RECOLL_CONFDIR environment variable or the
+ option to recoll and
+ recollindex.
+
+ A typical usage scenario for the multiple index feature
+ would be for a system administrator to set up a central index
+ for shared data, which you can choose to search or not in addition to
+ your personal data. Of course, there are other
+ possibilities. There are many cases where you know the subset of
+ files that should be searched, and where narrowing the search
+ can improve the results. You can achieve approximately the same
+ effect with the directory filter in advanced search, but
+ multiple indexes will have much better performance and may be
+ worth the trouble.
+
+ A recollindex program instance can only
+ update one specific index.
+
+ The main index (defined by
+ RECOLL_CONFDIR or ) is
+ always active. If this is undesirable, you can set up your
+ base configuration to index an empty directory.
+
+ The different search interfaces (GUI, command line, ...)
+ have different methods to define the set of indexes to be
+ used; see the appropriate section.
+
+ If a set of multiple indexes is to be used together for
+ searches, some configuration parameters must be consistent
+ among the set. These are parameters which need to be the same
+ when indexing and searching. As the parameters come from the
+ main configuration when searching, they need to be compatible
+ with what was set when creating the other indexes (which came
+ from their respective configuration directories). Most of the
+ relevant parameters are described in the following
+ linked
+ section.
+
+
+
+
Programming interface
@@ -3892,9 +4016,9 @@ skippedPaths = ~/somedir/∗.txt
Parameters affecting how we generate terms:Changing some of these parameters will imply a full
- reindex. Also, when using multiple indexes, it may not make sense
- to search indexes that don't share the values for these parameters,
- because they usually affect both search and index operations.
+ reindex. Also, when using multiple indexes, it may not make sense
+ to search indexes that don't share the values for these parameters,
+ because they usually affect both search and index operations.