release 2586

Jean-Francois Dockes 2012-03-07 18:29:57 +01:00
parent 420157d998
commit 3e607580f5
2 changed files with 290 additions and 150 deletions

@ -266,8 +266,13 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
(ie: --with-file-command=/usr/local/bin/file). Can be useful to enable (ie: --with-file-command=/usr/local/bin/file). Can be useful to enable
the gnu version on systems where the native one is bad. the gnu version on systems where the native one is bad.
* --without-gui Disable the Qt interface, and auxiliary uses of X11, and * --disable-qtgui Disable the Qt interface. Will allow building the
compile the command line version. indexer and the command line search program in absence of a Qt
environment.
* --disable-x11mon Disable X11 connection monitoring inside recollindex.
Together with --disable-qtgui, this allows building recoll without Qt
and X11.
* Of course the usual autoconf configure options, like --prefix apply. * Of course the usual autoconf configure options, like --prefix apply.
@ -277,7 +282,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
configure configure
make make
(practices usual hardship-repelling invocations) (practices usual hardship-repelling invocations)
There is little auto-configuration. The configure script will mainly link There is little auto-configuration. The configure script will mainly link
one of the system-specific files in the mk directory to mk/sysconf. If one of the system-specific files in the mk directory to mk/sysconf. If
@ -316,8 +321,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
5.4. Configuration overview 5.4. Configuration overview
Most of the parameters specific to the recoll GUI are set through the Most of the parameters specific to the recoll GUI are set through the
Preferences menu and stored in the standard Qt place ($HOME/.qt/recollrc). Preferences menu and stored in the standard Qt place
You probably do not want to edit this by hand. ($HOME/.config/Recoll.org/recoll.conf). You probably do not want to edit
this by hand.
Recoll indexing options are set inside text configuration files located in Recoll indexing options are set inside text configuration files located in
a configuration directory. There can be several such directories, each of a configuration directory. There can be several such directories, each of
@ -361,7 +367,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
[~/somedirectory-with-utf8-txt-files] [~/somedirectory-with-utf8-txt-files]
defaultcharset = utf-8 defaultcharset = utf-8
There are three kinds of lines: There are three kinds of lines:
@ -416,8 +422,8 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
the default file is: the default file is:
skippedNames = #* bin CVS Cache cache* caughtspam tmp .thumbnails .svn \ skippedNames = #* bin CVS Cache cache* caughtspam tmp .thumbnails .svn \
*~ .beagle .git .hg .bzr loop.ps .xsession-errors \ *~ .beagle .git .hg .bzr loop.ps .xsession-errors \
.recoll* xapiandb recollrc recoll.conf .recoll* xapiandb recollrc recoll.conf
The list can be redefined at any sub-directory in the indexed The list can be redefined at any sub-directory in the indexed
area. area.
@ -451,8 +457,16 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
Example of use for skipping text files only in a specific Example of use for skipping text files only in a specific
directory: directory:
skippedPaths = ~/somedir/*.txt skippedPaths = ~/somedir/*.txt
skippedPathsFnmPathname
The values in the *skippedPaths variables are matched by default
with fnmatch(3), with the FNM_PATHNAME and FNM_LEADING_DIR flags.
This means that '/' characters must be matched explicitly. You
can set skippedPathsFnmPathname to 0 to disable the use of
FNM_PATHNAME (meaning that /*/dir3 will match /dir1/dir2/dir3).
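The effect of FNM_PATHNAME can be sketched in a few lines of Python. Python's fnmatch module has no flags, so the component-wise matcher below (match_pathname, a hypothetical helper) only emulates the rule that '/' must be matched explicitly. It does not call fnmatch(3), it ignores FNM_LEADING_DIR, and it is not how Recoll itself does the matching.

```python
import fnmatch


def match_pathname(pattern, path):
    """Emulate FNM_PATHNAME: wildcards never match a '/' character."""
    pat_parts = pattern.split("/")
    path_parts = path.split("/")
    # Each pattern component must match exactly one path component.
    if len(pat_parts) != len(path_parts):
        return False
    return all(fnmatch.fnmatchcase(name, pat)
               for name, pat in zip(path_parts, pat_parts))


# With the FNM_PATHNAME rule, '*' stays within one path component.
strict = match_pathname("/*/dir3", "/dir1/dir2/dir3")
# Without it (plain Python fnmatch), '*' may span several components.
loose = fnmatch.fnmatchcase("/dir1/dir2/dir3", "/*/dir3")
```

Here strict is False and loose is True, which mirrors the /*/dir3 example given for skippedPathsFnmPathname = 0.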
followLinks followLinks
@ -596,6 +610,11 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
directory. The value can have embedded spaces but starting or directory. The value can have embedded spaces but starting or
trailing spaces will be trimmed. You cannot use quotes here. trailing spaces will be trimmed. You cannot use quotes here.
idxstatusfile
The name of the scratch file where the indexer process updates its
status. Default: idxstatus.txt inside the configuration directory.
maxfsoccuppc maxfsoccuppc
Maximum file system occupation before we stop indexing. The value Maximum file system occupation before we stop indexing. The value
@ -659,7 +678,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
entry contains white space. Example: entry contains white space. Example:
mondelaypatterns = *.log:20 "this one has spaces*:10" mondelaypatterns = *.log:20 "this one has spaces*:10"
monixinterval monixinterval
@ -890,7 +909,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
Note that the mime type is made up here, and you could call it Note that the mime type is made up here, and you could call it
diesel/oil just the same. diesel/oil just the same.
* In $RECOLL_CONFDIR/mimeview under the [view] section, add: * In $RECOLL_CONFDIR/mimeview under the [view] section, add:
application/x-blobapp = blobviewer %f application/x-blobapp = blobviewer %f

@ -8,11 +8,11 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
<jfd@recoll.org> <jfd@recoll.org>
Copyright (c) 2005-2011 Jean-Francois Dockes Copyright (c) 2005-2012 Jean-Francois Dockes
This document introduces full text search notions and describes the This document introduces full text search notions and describes the
installation and use of the Recoll application. It currently describes installation and use of the Recoll application. It currently describes
Recoll 1.16. Recoll 1.17.
[ Split HTML / Single HTML ] [ Split HTML / Single HTML ]
@ -110,7 +110,11 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
4.1. Writing a document filter 4.1. Writing a document filter
4.1.1. Filter HTML output 4.1.1. Simple filters
4.1.2. Telling Recoll about the filter
4.1.3. Filter HTML output
4.2. Field data processing 4.2. Field data processing
@ -246,7 +250,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
set inside your personal configuration, found by default in the .recoll set inside your personal configuration, found by default in the .recoll
sub-directory of your home directory. The default configuration will index sub-directory of your home directory. The default configuration will index
your home directory with default parameters and should be sufficient for your home directory with default parameters and should be sufficient for
giving Recoll a try, but you may want to adjust it later. giving Recoll a try, but you may want to adjust it later, which can be
done either by editing the text files or by using configuration menus in
the recoll GUI.
Indexing is started automatically the first time you execute the recoll Indexing is started automatically the first time you execute the recoll
search graphical user interface, or by executing the recollindex command. search graphical user interface, or by executing the recollindex command.
@ -266,9 +272,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
Indexing is the process by which the set of documents is analyzed and the Indexing is the process by which the set of documents is analyzed and the
data entered into the database. Recoll indexing is normally incremental: data entered into the database. Recoll indexing is normally incremental:
documents will only be processed if they have been modified. On the first documents will only be processed if they have been modified. On the first
execution, of course, all documents will need processing. A full index execution, all documents will need processing. A full index build can be
build can be forced later by specifying an option to the indexing command forced later by specifying an option to the indexing command (recollindex
(recollindex -z). -z).
Recoll indexing can be performed with two different methods: Recoll indexing can be performed with two different methods:
@ -287,8 +293,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
small home directory). Monitoring a big file system tree can consume small home directory). Monitoring a big file system tree can consume
significant system resources. significant system resources.
Recoll knows about quite a few different document types. The parameters Recoll knows about quite a few different document types. The parameters
for document types recognition and processing are set in configuration for document types recognition and processing are set in configuration
files. files.
@ -301,8 +305,8 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
attachment to an email message part of a folder file archived inside a zip attachment to an email message part of a folder file archived inside a zip
file... file...
Recoll indexing processes plain text, HTML, openoffice and e-mail files Recoll indexing processes plain text, HTML, openoffice and e-mail files,
internally (a few more actually). and a few others internally.
Other file types (ie: postscript, pdf, ms-word, rtf ...) need external Other file types (ie: postscript, pdf, ms-word, rtf ...) need external
applications for preprocessing. The list is in the installation section. applications for preprocessing. The list is in the installation section.
@ -343,7 +347,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
export RECOLL_CONFDIR=~/.indexes-email export RECOLL_CONFDIR=~/.indexes-email
recoll recoll
Then Recoll would use configuration files stored in ~/.indexes-email/ Then Recoll would use configuration files stored in ~/.indexes-email/
and, (unless specified otherwise in recoll.conf) would look for the and, (unless specified otherwise in recoll.conf) would look for the
@ -380,30 +384,19 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
2.2.1. Xapian index formats 2.2.1. Xapian index formats
If your first installation of Recoll was 1.9.0 or more recent, you can Xapian versions usually support several formats for index storage. A given
skip this section. major Xapian version will have a current format, used to create new
indexes, and will also support the format from the previous major version.
Xapian has had two possible index formats for quite some time. The "old" Xapian will not convert automatically an existing index from the older
one named Quartz, and the new one named Flint. Xapian 0.9 used Quartz by format to the newer one. If you want to upgrade to the new format, or if a
default, but could use Flint if a specific environment variable very old index needs to be converted because its format is not supported
(XAPIAN_PREFER_FLINT) was set. Xapian 1.0 still supports Quartz but will any more, you will have to explicitly delete the old index, then run a
use Flint by default for new index creations. normal indexing process.
The number of disk accesses performed during indexing has been much
optimized in the new Flint engine and you may see indexing times improved
by 50% in some cases (compared to Quartz), typically for big indexes where
disk accesses dominate the indexing time. There is also a more modest
improvement of index size.
Xapian will not convert automatically an existing index from the Quartz to
the Flint format. If you have an older index and want to take advantage of
the new format (which can be done without setting the environment variable
as of Recoll 1.8.2 and Xapian 1.0.0), you will have to explicitly delete
the old index, then run a normal indexing process.
Unfortunately, using the -z option to recollindex is not sufficient to Unfortunately, using the -z option to recollindex is not sufficient to
change the format, you have to delete all files inside the index directory change the format, you will have to delete all files inside the index
(typically ~/.recoll/xapiandb) before starting indexing. directory (typically ~/.recoll/xapiandb) before starting the indexing.
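As a rough illustration of the manual step described above, the following Python sketch empties an index directory while keeping the directory itself. The helper name clear_index_dir is hypothetical, and the default path is only the typical location named in the text; check your own configuration before deleting anything.

```python
import os
import shutil


def clear_index_dir(index_dir):
    """Delete everything inside index_dir, keeping the directory itself."""
    for entry in os.listdir(index_dir):
        full = os.path.join(index_dir, entry)
        if os.path.isdir(full) and not os.path.islink(full):
            shutil.rmtree(full)
        else:
            os.remove(full)


# Typical location named in the text (an assumption for your setup).
default_index_dir = os.path.expanduser("~/.recoll/xapiandb")
```

After calling clear_index_dir(default_index_dir), a normal indexing run would recreate the index in the current Xapian format.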
---------------------------------------------------------------------- ----------------------------------------------------------------------
@ -414,7 +407,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
confidential data is indexed, access to the database directory should be confidential data is indexed, access to the database directory should be
restricted. restricted.
As of version 1.4, Recoll will create the configuration directory with a Recoll (since version 1.4) will create the configuration directory with a
mode of 0700 (access by owner only). As the index data directory is by mode of 0700 (access by owner only). As the index data directory is by
default a sub-directory of the configuration directory, this should result default a sub-directory of the configuration directory, this should result
in appropriate protection. in appropriate protection.
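If you want to verify the protection yourself, a check along the following lines may help. This is a sketch for POSIX systems only; is_owner_only and make_private_dir are hypothetical helper names, not part of Recoll.

```python
import os
import stat


def is_owner_only(path):
    """Return True if path grants no permissions to group or others."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


def make_private_dir(path):
    """Create path if needed and force its mode to 0700."""
    os.makedirs(path, exist_ok=True)
    # chmod after creation so the result does not depend on the umask.
    os.chmod(path, 0o700)
```

For example, is_owner_only(os.path.expanduser("~/.recoll")) should return True for a configuration directory created with mode 0700.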
@ -507,11 +500,12 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
2.5.1. Running indexing 2.5.1. Running indexing
Indexing is performed either by the recollindex program, or by the Indexing is performed either by the recollindex program, or by the
indexing thread inside the recoll program (use the File menu). Both indexing thread inside the recoll program (start it from the File menu).
programs will use the RECOLL_CONFDIR variable or accept a -c confdir Both programs will use the RECOLL_CONFDIR variable or accept a -c confdir
option to specify a non-default configuration directory. option to specify a non-default configuration directory.
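The two ways of selecting a configuration directory can be sketched from a script as follows. The helper names recollindex_command and recollindex_env are hypothetical; only the -c and -z options and the RECOLL_CONFDIR variable come from this document.

```python
import os


def recollindex_command(confdir=None, full_rebuild=False):
    """Build an argument list for recollindex."""
    argv = ["recollindex"]
    if confdir is not None:
        # -c selects a non-default configuration directory.
        argv += ["-c", confdir]
    if full_rebuild:
        # -z forces a full index build.
        argv.append("-z")
    return argv


def recollindex_env(confdir):
    """Return a copy of the environment with RECOLL_CONFDIR set."""
    env = dict(os.environ)
    env["RECOLL_CONFDIR"] = confdir
    return env
```

Either result can then be handed to subprocess.run, for instance subprocess.run(recollindex_command("/path/to/confdir")) or subprocess.run(["recollindex"], env=recollindex_env("/path/to/confdir")).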
Reasons to use either the indexing thread or the recollindex command: There are reasons to use either the indexing thread or the recollindex
command, but it is also a matter of personal preferences:
* Starting the indexing thread is more convenient, being just one click * Starting the indexing thread is more convenient, being just one click
away. away.
@ -523,11 +517,10 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
rare occurrence, but who knows...) rare occurrence, but who knows...)
* The recollindex command uses setpriority/nice to lower its priority * The recollindex command uses setpriority/nice to lower its priority
while indexing (it will also use ionice when this becomes more widely while indexing. When available (and for Recoll version 1.16.2 and
available), the thread can't do it, else it would also slow down the newer), it also uses the ionice command to lower its IO priority. The
user/search interface. thread can't do it, else it would also slow down the user/search
interface.
I'll let the reader decide where my heart belongs...
If the recoll program finds no index when it starts, it will automatically If the recoll program finds no index when it starts, it will automatically
start indexing (except if canceled). start indexing (except if canceled).
@ -596,7 +589,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
The real time indexing support can be customised during package The real time indexing support can be customised during package
configuration with the --with[out]-fam or --with[out]-inotify options. The configuration with the --with[out]-fam or --with[out]-inotify options. The
default is currently to include inotify monitoring on systems that support default is currently to include inotify monitoring on systems that support
it. it, and, as of recoll 1.17, gamin support on FreeBSD.
The rclmon.sh script can be used to easily start and stop the daemon. It The rclmon.sh script can be used to easily start and stop the daemon. It
can be found in the examples directory (typically can be found in the examples directory (typically
@ -610,7 +603,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
recolldata=/usr/local/share/recoll recolldata=/usr/local/share/recoll
RECOLL_CONFDIR=$recollconf $recolldata/examples/rclmon.sh start RECOLL_CONFDIR=$recollconf $recolldata/examples/rclmon.sh start
fvwm fvwm
The indexing daemon gets started, then the window manager, for which the The indexing daemon gets started, then the window manager, for which the
session waits. session waits.
@ -625,6 +618,10 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
There is a similar mechanism under Gnome (find the session control tool in There is a similar mechanism under Gnome (find the session control tool in
the menus and use the "Startup programs" tab). the menus and use the "Startup programs" tab).
If you use the daemon completely out of an X11 session, you need to add
option -x to disable X11 session monitoring (else the daemon will not
start).
By default, the messages from the indexing daemon will be discarded. You By default, the messages from the indexing daemon will be discarded. You
may want to change this by setting the daemlogfilename and daemloglevel may want to change this by setting the daemlogfilename and daemloglevel
configuration parameters. Also the log file will only be truncated when configuration parameters. Also the log file will only be truncated when
@ -882,10 +879,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
Hovering over a table row will update the detail area at the bottom of the Hovering over a table row will update the detail area at the bottom of the
window with the corresponding values. You can click the row to freeze the window with the corresponding values. You can click the row to freeze the
display. The bottom area is equivalent to a classical result list display. The bottom area is equivalent to a result list paragraph, with
paragraph, with links for starting a preview or a native application, and links for starting a preview or a native application, and an equivalent
an equivalent right-click menu. Typing Esc (the Escape key) will unfreeze right-click menu. Typing Esc (the Escape key) will unfreeze the display.
the display.
---------------------------------------------------------------------- ----------------------------------------------------------------------
@ -1117,15 +1113,12 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
3.1.9. Sorting search results and collapsing duplicates 3.1.9. Sorting search results and collapsing duplicates
The documents in a result list are normally sorted in order of relevance. The documents in a result list are normally sorted in order of relevance.
It is possible to specify different sort parameters by using the Sort It is possible to specify a different sort order, either by using the
parameters dialog (located in the Tools menu). vertical arrows in the GUI toolbox to sort by date, or switching to the
result table display and clicking on any header. The sort order chosen
The tool sorts a specified number of the most relevant documents in the inside the result table remains active if you switch back to the result
result list, according to specified criteria. The currently available list, until you click one of the vertical arrows, until both are unchecked
criteria are date and mime type. (you are back to sort by relevance).
The sort parameters stay in effect until they are explicitly reset, or the
program exits. An activated sort is indicated in the result list header.
Sort parameters are remembered between program invocations, but result Sort parameters are remembered between program invocations, but result
sorting is normally always inactive when the program starts. It is sorting is normally always inactive when the program starts. It is
@ -1199,6 +1192,19 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
documents where either virtual or reality or both appear, but those which documents where either virtual or reality or both appear, but those which
contain virtual reality should appear sooner in the list. contain virtual reality should appear sooner in the list.
Phrase searches can strongly slow down a query if most of the terms in the
phrase are common. This is why the autophrase option is off by default for
Recoll versions before 1.17. As of version 1.17, autophrase is on by
default, but very common terms will be removed from the constructed
phrase. The removal threshold can be adjusted from the search preferences.
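The removal of very common terms can be pictured with the sketch below. It is only an illustration under stated assumptions: the function name, the per-term document counts and the "at most threshold" comparison are invented for the example and are not taken from the Recoll source.

```python
def autophrase_terms(terms, doc_counts, total_docs, threshold_pct):
    """Keep the terms whose document frequency is at most threshold_pct."""
    kept = []
    for term in terms:
        # Percentage of the documents in which the term appears.
        pct = 100.0 * doc_counts.get(term, 0) / total_docs
        if pct <= threshold_pct:
            kept.append(term)
    return kept


# Made-up counts for a 1000-document index.
counts = {"the": 950, "virtual": 40, "reality": 25}
phrase = autophrase_terms(["the", "virtual", "reality"], counts, 1000, 10.0)
```

With these made-up numbers, "the" appears in 95% of the documents and is dropped, so the automatic phrase is built from "virtual" and "reality" only.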
Phrases and abbreviations. As of Recoll version 1.17, dotted abbreviations
like I.B.M. are also automatically indexed as a word without the dots:
IBM. Searching for the word inside a phrase (ie: "the IBM company") will
only match the dotted abbreviation if you increase the phrase slack (using
the advanced search panel control, or the o query language modifier).
Literal occurrences of the word will be matched normally.
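As a small illustration of the extra term, the sketch below derives the undotted form of a dotted abbreviation. The pattern used to recognise an abbreviation (two or more single letters, each followed by a dot) is an assumption made for the example; the indexer's actual rule may differ.

```python
import re

# Assumed shape of a dotted abbreviation: "I.B.M.", "U.S.", ...
_DOTTED = re.compile(r"(?:[A-Za-z]\.){2,}")


def undotted(word):
    """Return the word without dots if it looks like a dotted abbreviation."""
    if _DOTTED.fullmatch(word):
        return word.replace(".", "")
    return None
```

Under this assumption, undotted("I.B.M.") gives "IBM", while an ordinary word such as "company" gives None.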
---------------------------------------------------------------------- ----------------------------------------------------------------------
3.1.10.3. Others 3.1.10.3. Others
@ -1247,34 +1253,37 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
User interface parameters: User interface parameters:
* Number of results in a result page:
* Hide duplicate results: decides if result list entries are shown for
identical documents found in different places.
* Highlight color for query terms: Terms from the user query are * Highlight color for query terms: Terms from the user query are
highlighted in the result list samples and the preview window. The highlighted in the result list samples and the preview window. The
color can be chosen here. Any Qt color string should work (ie red, color can be chosen here. Any Qt color string should work (ie red,
#ff0000). The default is blue. #ff0000). The default is blue.
* Result list font: There is quite a lot of information shown in the * Style sheet: The name of a Qt style sheet text file which is applied
result list, and you may want to customize the font and/or font size. to the whole Recoll application on startup. The default value is
The rest of the fonts used by Recoll are determined by your generic Qt empty, but there is a skeleton style sheet (recoll.qss) inside the
config (try the qtconfig command). /usr/share/recoll/examples directory. Using a style sheet, you can
change most Recoll graphical parameters: colors, fonts, etc. See the
* Result paragraph format string: allows you to change the presentation sample file for a few simple examples.
of each result list entry. This is described in its own section.
* Abstract snippet separator: for synthetic abstracts built from index
data, which are usually made of several snippets from different parts
of the document, this defines the snippet separator, an ellipsis by
default.
* Maximum text size highlighted for preview Inserting highlights on * Maximum text size highlighted for preview Inserting highlights on
search term inside the text before inserting it in the preview window search term inside the text before inserting it in the preview window
involves quite a lot of processing, and can be disabled over the given involves quite a lot of processing, and can be disabled over the given
text size to speed up loading. text size to speed up loading.
* Prefer HTML to plain text for preview if set, Recoll will display HTML
as such inside the preview window. If this causes problems with the Qt
HTML display, you can uncheck it to display the plain text version
instead.
* Use <PRE> tags instead of <BR> to display plain text as HTML in
preview: when displaying plain text inside the preview window, Recoll
tries to preserve some of the original text line breaks and
indentation. It can either use PRE HTML tags, which will well preserve
the indentation but will force horizontal scrolling for long lines, or
use BR tags to break at the original line breaks, which will let the
editor introduce other line breaks according to the window width, but
will lose some of the original indentation.
* Use desktop preferences to choose document editor: if this is checked, * Use desktop preferences to choose document editor: if this is checked,
the xdg-open utility will be used to open files when you click the the xdg-open utility will be used to open files when you click the
Open link in the result list, instead of the application defined in Open link in the result list, instead of the application defined in
@ -1301,13 +1310,37 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
tool state between invocations. It normally starts with sorting tool state between invocations. It normally starts with sorting
disabled. disabled.
* Prefer HTML to plain text for preview if set, Recoll will display HTML Result list parameters:
as such inside the preview window. If this causes problems with the Qt
HTML display, you can uncheck it to display the plain text version * Number of results in a result page
instead.
* Result list font: There is quite a lot of information shown in the
result list, and you may want to customize the font and/or font size.
The rest of the fonts used by Recoll are determined by your generic Qt
config (try the qtconfig command).
* Edit result list paragraph format string: allows you to change the
presentation of each result list entry. See the result list
customisation section.
* Edit result page html header insert: allows you to define text
inserted at the end of the result page html header. More detail in the
result list customisation section.
* Date format: allows specifying the format used for displaying dates
inside the result list. This should be specified as an strftime()
string (man strftime).
* Abstract snippet separator: for synthetic abstracts built from index
data, which are usually made of several snippets from different parts
of the document, this defines the snippet separator, an ellipsis by
default.
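To get a feel for the Date format parameter, strftime() strings can be tried out in Python, whose datetime.strftime accepts the same standard conversion codes. The helper name format_result_date is hypothetical and the timestamp is just an example value.

```python
from datetime import datetime


def format_result_date(when, date_format):
    """Format a datetime with an strftime()-style format string."""
    return when.strftime(date_format)


stamp = datetime(2012, 3, 7, 18, 29, 57)
iso_like = format_result_date(stamp, "%Y-%m-%d %H:%M")
```

Here iso_like is "2012-03-07 18:29". Other codes are listed in the strftime manual page.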
Search parameters: Search parameters:
* Hide duplicate results: decides if result list entries are shown for
identical documents found in different places.
* Stemming language: stemming obviously depends on the document's * Stemming language: stemming obviously depends on the document's
language. This listbox will let you choose among the stemming databases language. This listbox will let you choose among the stemming databases
which were built during indexing (this is set in the main which were built during indexing (this is set in the main
@ -1316,11 +1349,16 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
will be deleted at the next indexing pass unless they are also added will be deleted at the next indexing pass unless they are also added
in the configuration file. in the configuration file.
* Dynamically add phrase to simple searches: a phrase will be * Automatically add phrase to simple searches: a phrase will be
automatically built and added to simple searches when looking for Any automatically built and added to simple searches when looking for Any
terms. This will give a relevance boost to the results where the terms. This will give a relevance boost to the results where the
search terms appear as a phrase (consecutive and in order). search terms appear as a phrase (consecutive and in order).
* Autophrase term frequency threshold percentage: very frequent terms
should not be included in automatic phrase searches for performance
reasons. The parameter defines the cutoff percentage (percentage of
the documents where the term appears).
* Replace abstracts from documents: this decides if we should synthesize * Replace abstracts from documents: this decides if we should synthesize
and display an abstract in place of an explicit abstract found within and display an abstract in place of an explicit abstract found within
the document itself. the document itself.
@ -1358,28 +1396,51 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
---------------------------------------------------------------------- ----------------------------------------------------------------------
3.1.11.1. The result list paragraph format 3.1.11.1. The result list format
The presentation of each result inside the result list can be customized The result list presentation can be exhaustively customized by adjusting
by setting the result list paragraph format inside the User Interface tab two elements:
of the Query configuration.
This is a Qt HTML string where the following printf-like % substitutions * The paragraph format
will be performed:
* Html code inside the header section
These can be edited from the Result list tab of the Query configuration.
Newer versions of Recoll (from 1.17) use a WebKit HTML object by default
(this may be disabled at build time), and total customisation is possible
with full support for CSS and Javascript. Conversely, there are limits to
what you can do with the older Qt QTextBrowser, but still, it is possible
to decide what data each result will contain, and how it will be
displayed.
No more detail will be given about the header part (only useful with the
WebKit build), if there are restrictions to what you can do, they are
beyond this author's HTML/CSS/Javascript abilities...
----------------------------------------------------------------------
3.1.11.1.1. The paragraph format
This is an arbitrary HTML string where the following printf-like %
substitutions will be performed:
* %A. Abstract * %A. Abstract
* %D. Date * %D. Date
* %I. Icon image name * %I. Icon image name. This is normally determined from the mime type.
The associations are defined inside the mimeconf configuration file.
If a thumbnail for the file is found at the standard Freedesktop
location, this will be displayed instead.
* %K. Keywords (if any) * %K. Keywords (if any)
* %L. Preview and Edit links * %L. Precooked Preview and Edit links
* %M. Mime type * %M. Mime type
* %N. result Number * %N. result Number inside the result page
* %R. Relevance percentage * %R. Relevance percentage
@ -1390,8 +1451,8 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
* %U. Url * %U. Url
The format of the Preview and Edit links is <a href="P%N"> and <a The format of the Preview and Edit links is <a href="P%N"> and <a
href="E%N"> where docnum (%N expands to the document number inside the href="E%N"> where docnum (%N) expands to the document number inside the
result list). result page.
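The printf-like substitution can be sketched as follows. This is not Recoll's implementation: expand_paragraph_format is a hypothetical helper that handles only the single-letter codes, leaves unknown codes untouched, and uses made-up sample values.

```python
import re


def expand_paragraph_format(fmt, values):
    """Expand single-letter %X codes; unknown codes are left as they are."""
    def repl(match):
        return values.get(match.group(1), match.group(0))
    return re.sub(r"%([A-Za-z])", repl, fmt)


# Made-up values for one result entry.
values = {"N": "3", "T": "Some title", "D": "2012-03-07"}
html = expand_paragraph_format('<a href="P%N">%T</a> %D', values)
```

With these sample values, html is '<a href="P3">Some title</a> 2012-03-07', so the title becomes the preview link for result number 3.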
In addition to the predefined values above, all strings like %(fieldname) In addition to the predefined values above, all strings like %(fieldname)
will be replaced by the value of the field named fieldname for this will be replaced by the value of the field named fieldname for this
@ -1410,27 +1471,30 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
<img src="%I" align="left">%R %S %L &nbsp;&nbsp;<b>%T</b><br> <img src="%I" align="left">%R %S %L &nbsp;&nbsp;<b>%T</b><br>
%M&nbsp;%D&nbsp;&nbsp;&nbsp;<i>%U</i>&nbsp;%i<br> %M&nbsp;%D&nbsp;&nbsp;&nbsp;<i>%U</i>&nbsp;%i<br>
%A %K %A %K
You may, for example, try the following for a more web-like experience: You may, for example, try the following for a more web-like experience:
<u><b><a href="P%N">%T</a></b></u><br> <u><b><a href="P%N">%T</a></b></u><br>
%A<font color=#008000>%U - %S</font> - %L %A<font color=#008000>%U - %S</font> - %L
Or the clean looking: Or the clean looking:
<img src="%I" align="left">%L <font color="#900000">%R</font> <img src="%I" align="left">%L <font color="#900000">%R</font>
<b>%T</b><br>%S <b>%T</b><br>%S
<font color="#808080"><i>%U</i></font> <font color="#808080"><i>%U</i></font>
<table bgcolor="#e0e0e0"> <table bgcolor="#e0e0e0">
<tr><td><div>%A</div></td></tr> <tr><td><div>%A</div></td></tr>
</table>%K </table>%K
Note that the P%N link in the above paragraph makes the title a preview Note that the P%N link in the above paragraph makes the title a preview
link. link.
These samples, and some others are on the web site, with pictures to show
how they look.
It is also possible to define the value of the snippet separator inside It is also possible to define the value of the snippet separator inside
the abstract section. the abstract section.
@ -1484,7 +1548,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
} }
</script> </script>
.... ....
<body ondblclick="recollsearch()"> <body ondblclick="recollsearch()">
---------------------------------------------------------------------- ----------------------------------------------------------------------
@ -1546,8 +1610,8 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
used with the KIO slave or the command line search. It broadly has the used with the KIO slave or the command line search. It broadly has the
same capabilities as the complex search interface in the GUI. same capabilities as the complex search interface in the GUI.
The language is roughly based on the Xesam user search language The language is roughly based on the (seemingly defunct) Xesam user search
specification. language specification.
If the results of a query language search puzzle you and you doubt what If the results of a query language search puzzle you and you doubt what
has been actually searched for, you can use the GUI show query link at the has been actually searched for, you can use the GUI show query link at the
@ -1557,7 +1621,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
Here follows a sample request that we are going to explain: Here follows a sample request that we are going to explain:
author:"john doe" Beatles OR Lennon Live OR Unplugged -potatoes author:"john doe" Beatles OR Lennon Live OR Unplugged -potatoes
This would search for all documents with John Doe appearing as a phrase in This would search for all documents with John Doe appearing as a phrase in
the author field (exactly what this is would depend on the document type, the author field (exactly what this is would depend on the document type,
@ -1585,9 +1649,8 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
significant), so that title:"prejudice pride" is not the same as significant), so that title:"prejudice pride" is not the same as
title:prejudice title:pride, and is unlikely to find a result. title:prejudice title:pride, and is unlikely to find a result.
Most Xesam phrase modifiers are unsupported, except for l (small ell) to Modifiers can be set on a phrase clause, for example to specify a
disable stemming, and p to turn a phrase into a NEAR (unordered proximity) proximity search (unordered). See the modifier section.
search. Exemple: "prejudice pride"p
Recoll currently manages the following default fields: Recoll currently manages the following default fields:
@ -1609,7 +1672,18 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
* dir for filtering the results on file location (Ex: * dir for filtering the results on file location (Ex:
dir:/home/me/somedir). -dir also works to find results out of the dir:/home/me/somedir). -dir also works to find results out of the
specified directory, only after release 1.15.8. specified directory, only after release 1.15.8. A tilde inside the
value will be expanded to the home directory. dir is not a regular
field and only one value makes sense in a query (you can't use
dir:dir1 OR dir:dir2). Relative paths make sense, for example,
dir:share/doc would match either /usr/share/doc or
/usr/local/share/doc
* size for filtering the results on file size. Example: size<10000. You
can use <, > or = as operators. You can specify a range like the
following: size>100 size<1000. The usual k/K, m/M, g/G, t/T can be
used as (decimal) multipliers. Ex: size>1k to search for files bigger
than 1000 bytes.
* date for searching or filtering on dates. The syntax for the argument * date for searching or filtering on dates. The syntax for the argument
is based on the ISO8601 standard for dates and time intervals. Only is based on the ISO8601 standard for dates and time intervals. Only
@ -1828,29 +1902,68 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
complicated than the older kind. Most of these new filters are written complicated than the older kind. Most of these new filters are written
in Python, using a common module to handle the protocol. in Python, using a common module to handle the protocol.
The following will just describe the simple filters, if you are programmer The following will just describe the simple filters. If you can program
enough to write one of the other kind, it shouldn't be too difficult to and want to write one of the other kind, it shouldn't be too difficult to
make sense of one of the existing modules (ie: rclzip). make sense of one of the existing modules. For example, look at rclzip
which uses Zip file paths as internal identifiers (ipath), and rclinfo,
which uses an integer index.
----------------------------------------------------------------------
4.1.1. Simple filters
Recoll simple filters are usually shell-scripts, but this is in no way Recoll simple filters are usually shell-scripts, but this is in no way
necessary. These programs are extremely simple and most of the difficulty necessary. Extracting the text from the native format is the difficult
lies in extracting the text from the native format, not outputting what is part. Outputting the format expected by Recoll is trivial. Happily enough,
expected by Recoll. Happily enough, most document formats already have most document formats have translators or text extractors which can be
translators or text extractors which handle the difficult part and can be called from the filter. In some cases the output of the translating
called from the filter. In some case the output of the translating program program is completely appropriate, and no intermediate shell-script is
is appropriate, and no intermediate shell-script is needed. needed.
Filters are called with a single argument which is the source file name. Filters are called with a single argument which is the source file name.
They should output the result to stdout. They should output the result to stdout.
When writing a filter, you should decide if it will output plain text or
html. Plain text is simpler, but you will not be able to add metadata or
vary the output character encoding (this will be defined in a
configuration file). Additionally, some formatting may be easier to preserve
when previewing html. Actually the deciding factor is metadata: Recoll has
a way to extract metadata from the html header and use it for field
searches.
The RECOLL_FILTER_FORPREVIEW environment variable (values yes, no) tells The RECOLL_FILTER_FORPREVIEW environment variable (values yes, no) tells
the filter if the operation is for indexing or previewing. Some filters the filter if the operation is for indexing or previewing. Some filters
use this to output a slightly different format. This is not essential. use this to output a slightly different format, for example stripping
uninteresting repeated keywords (e.g. Subject: for email) when indexing.
This is not essential.
You should look at one of the simple filters, for example rclps, for a
starting point.
Don't forget to make your filter executable before testing!
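As an illustration of the calling convention described above, here is a
minimal sketch of a simple filter written in Python. It is not one of the
filters shipped with Recoll: the function names are hypothetical, the
input is assumed to be UTF-8 text, and wrapping the preview output in
<pre> is only an arbitrary way to show the RECOLL_FILTER_FORPREVIEW
variable being used.

```python
#!/usr/bin/env python3
"""Hypothetical minimal simple filter (illustration only)."""
import html
import os
import sys


def render_html(text, for_preview=False):
    """Wrap plain text in a minimal HTML document for the indexer."""
    # Escape "&", "<" and ">" so the text cannot be mistaken for markup.
    body = html.escape(text, quote=False)
    if for_preview:
        # Assumption for this sketch: keep line breaks when previewing.
        body = "<pre>" + body + "</pre>"
    return (
        "<html><head>\n"
        '<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">\n'
        "</head>\n"
        "<body>" + body + "</body></html>\n"
    )


def main(argv):
    """Read the file named by the single argument, write HTML to stdout."""
    with open(argv[1], encoding="utf-8", errors="replace") as f:
        text = f.read()
    for_preview = os.environ.get("RECOLL_FILTER_FORPREVIEW") == "yes"
    sys.stdout.write(render_html(text, for_preview))


# Only run as a filter when called with exactly one file name argument.
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv)
```

A real filter would replace the open()/read() step with a call to the
translator or text extractor for the document format.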
----------------------------------------------------------------------
4.1.2. Telling Recoll about the filter
There are two elements that link a file to the filter which should process
it: the association of file to mime type and the association of a mime
type with a filter.
The association of files to mime types is mostly based on name suffixes.
The types are defined inside the mimemap file. Example:
.doc = application/msword
If no suffix association is found for the file name, Recoll will try to
execute the file -i command to determine a mime type.
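The two-step lookup described above (suffix first, then the file -i
command) can be sketched as follows. This is not Recoll's code: the
MIMEMAP dictionary is a hypothetical stand-in for a parsed mimemap file,
and the parsing of the file -i output assumes the usual
"name: type; charset=..." form.

```python
import os
import subprocess

# Hypothetical stand-in for the contents of a parsed mimemap file.
MIMEMAP = {".doc": "application/msword", ".txt": "text/plain"}


def guess_mime_type(path, mimemap=MIMEMAP):
    """Return a mime type from the suffix, else from `file -i`, else None."""
    suffix = os.path.splitext(path)[1]
    if suffix in mimemap:
        return mimemap[suffix]
    try:
        out = subprocess.run(
            ["file", "-i", path], capture_output=True, text=True, check=True
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        # The file command is missing or failed: no type can be determined.
        return None
    # Assumed output form: "name: text/plain; charset=us-ascii"
    value = out.rsplit(": ", 1)[-1]
    return value.split(";", 1)[0].strip() or None
```

For example, guess_mime_type("report.doc") is resolved from the suffix
table without running any external command.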
The association of file types to filters is performed in the mimeconf The association of file types to filters is performed in the mimeconf
file. A sample: file. A sample will probably be of better help than a long explanation:
[index]
[index]
application/msword = exec antiword -t -i 1 -m UTF-8;\ application/msword = exec antiword -t -i 1 -m UTF-8;\
mimetype = text/plain ; charset=utf-8 mimetype = text/plain ; charset=utf-8
@ -1876,16 +1989,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
* application/x-chm is processed by a persistent filter. This is * application/x-chm is processed by a persistent filter. This is
determined by the execm keyword. determined by the execm keyword.
The easiest way to write a new filter is probably to start from an
existing one.
Filters which output text/plain text are generally simpler, but they
cannot specify the character set and other metadata, so they are limited
to cases where these elements are not needed.
---------------------------------------------------------------------- ----------------------------------------------------------------------
4.1.1. Filter HTML output 4.1.3. Filter HTML output
The output HTML could be very minimal like the following example: The output HTML could be very minimal like the following example:
@ -1893,7 +1999,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
</head> </head>
<body>some text content</body></html> <body>some text content</body></html>
You should take care to escape some characters inside the text by You should take care to escape some characters inside the text by
transforming them into appropriate entities. "&" should be transformed transforming them into appropriate entities. "&" should be transformed
@ -2210,8 +2316,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
extra_dbs is a list of external databases (xapian directories) extra_dbs is a list of external databases (xapian directories)
writable decides if we can index new data through this connection writable decides if we can index new data through this connection
---------------------------------------------------------------------- ----------------------------------------------------------------------
4.3.2.3. Example code 4.3.2.3. Example code
@ -2241,7 +2345,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
print abs print abs
print print
---------------------------------------------------------------------- ----------------------------------------------------------------------
@ -2472,8 +2576,13 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
(ie: --with-file-command=/usr/local/bin/file). Can be useful to enable (ie: --with-file-command=/usr/local/bin/file). Can be useful to enable
the gnu version on systems where the native one is bad. the gnu version on systems where the native one is bad.
* --without-gui Disable the Qt interface, and auxiliary uses of X11, and * --disable-qtgui Disable the Qt interface. Will allow building the
compile the command line version. indexer and the command line search program in absence of a Qt
environment.
* --disable-x11mon Disable X11 connection monitoring inside recollindex.
Together with --disable-qtgui, this allows building recoll without Qt
and X11.
* Of course the usual autoconf configure options, like --prefix apply. * Of course the usual autoconf configure options, like --prefix apply.
@ -2483,7 +2592,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
configure configure
make make
(practices usual hardship-repelling invocations) (practices usual hardship-repelling invocations)
There is little auto-configuration. The configure script will mainly link There is little auto-configuration. The configure script will mainly link
one of the system-specific files in the mk directory to mk/sysconf. If one of the system-specific files in the mk directory to mk/sysconf. If
@ -2513,8 +2622,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
5.4. Configuration overview 5.4. Configuration overview
Most of the parameters specific to the recoll GUI are set through the Most of the parameters specific to the recoll GUI are set through the
Preferences menu and stored in the standard Qt place ($HOME/.qt/recollrc). Preferences menu and stored in the standard Qt place
You probably do not want to edit this by hand. ($HOME/.config/Recoll.org/recoll.conf). You probably do not want to edit
this by hand.
Recoll indexing options are set inside text configuration files located in Recoll indexing options are set inside text configuration files located in
a configuration directory. There can be several such directories, each of a configuration directory. There can be several such directories, each of
@ -2558,7 +2668,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
[~/somedirectory-with-utf8-txt-files] [~/somedirectory-with-utf8-txt-files]
defaultcharset = utf-8 defaultcharset = utf-8
There are three kinds of lines: There are three kinds of lines:
@ -2617,8 +2727,8 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
the default file is: the default file is:
skippedNames = #* bin CVS Cache cache* caughtspam tmp .thumbnails .svn \ skippedNames = #* bin CVS Cache cache* caughtspam tmp .thumbnails .svn \
*~ .beagle .git .hg .bzr loop.ps .xsession-errors \ *~ .beagle .git .hg .bzr loop.ps .xsession-errors \
.recoll* xapiandb recollrc recoll.conf .recoll* xapiandb recollrc recoll.conf
The list can be redefined at any sub-directory in the indexed The list can be redefined at any sub-directory in the indexed
area. area.
@ -2652,8 +2762,16 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
Example of use for skipping text files only in a specific Example of use for skipping text files only in a specific
directory: directory:
skippedPaths = ~/somedir/*.txt skippedPaths = ~/somedir/..txt
skippedPathsFnmPathname
The values in the *skippedPaths variables are matched by default
with fnmatch(3), with the FNM_PATHNAME and FNM_LEADING_DIR flags.
This means that '/' characters must be matched explicitly. You
can set skippedPathsFnmPathname to 0 to disable the use of
FNM_PATHNAME (meaning that /*/dir3 will match /dir1/dir2/dir3).
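The effect of the two flags can be illustrated with a small Python
sketch. Python's own fnmatch module does not expose these flags, so the
code below imitates them with a regular expression. It only handles the
'*' and '?' wildcards (no bracket expressions), the function names are
hypothetical, and it is not how Recoll performs the match.

```python
import re


def glob_to_regex(pattern, pathname=True):
    """Translate '*' and '?' wildcards into a regular expression.

    With pathname=True the wildcards never match '/', like FNM_PATHNAME.
    """
    any_char = "[^/]" if pathname else "."
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(any_char + "*")
        elif ch == "?":
            parts.append(any_char)
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def path_matches(pattern, path, pathname=True):
    """Match the whole path or a leading directory (like FNM_LEADING_DIR)."""
    regex = glob_to_regex(pattern, pathname) + r"(/.*)?\Z"
    return re.match(regex, path, re.DOTALL) is not None


# With the default, '*' cannot cross a '/' so this does not match.
print(path_matches("/*/dir3", "/dir1/dir2/dir3"))
# Equivalent of skippedPathsFnmPathname = 0: '*' may match '/'.
print(path_matches("/*/dir3", "/dir1/dir2/dir3", pathname=False))
```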
followLinks followLinks
@ -2801,6 +2919,11 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
directory. The value can have embedded spaces but starting or directory. The value can have embedded spaces but starting or
trailing spaces will be trimmed. You cannot use quotes here. trailing spaces will be trimmed. You cannot use quotes here.
idxstatusfile
The name of the scratch file where the indexer process updates its
status. Default: idxstatus.txt inside the configuration directory.
maxfsoccuppc maxfsoccuppc
Maximum file system occupation before we stop indexing. The value Maximum file system occupation before we stop indexing. The value
@ -2866,7 +2989,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
entry contains white space. Example: entry contains white space. Example:
mondelaypatterns = *.log:20 "this one has spaces*:10" mondelaypatterns = *.log:20 "this one has spaces*:10"
monixinterval monixinterval
@ -3107,7 +3230,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
Note that the mime type is made up here, and you could call it Note that the mime type is made up here, and you could call it
diesel/oil just the same. diesel/oil just the same.
* In $RECOLL_CONFDIR/mimeview under the [view] section, add: * In $RECOLL_CONFDIR/mimeview under the [view] section, add:
application/x-blobapp = blobviewer %f application/x-blobapp = blobviewer %f