release 2586
parent 420157d998
commit 3e607580f5
34 src/INSTALL
@@ -266,8 +266,13 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 (ie: --with-file-command=/usr/local/bin/file). Can be useful to enable
 the gnu version on systems where the native one is bad.
 
-* --without-gui Disable the Qt interface, and auxiliary uses of X11, and
-compile the command line version.
+* --disable-qtgui Disable the Qt interface. Will allow building the
+indexer and the command line search program in the absence of a Qt
+environment.
+
+* --disable-x11mon Disable X11 connection monitoring inside recollindex.
+Together with --disable-qtgui, this allows building recoll without Qt
+and X11.
 
 * Of course the usual autoconf configure options, like --prefix apply.
 
@@ -316,8 +321,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 5.4. Configuration overview
 
 Most of the parameters specific to the recoll GUI are set through the
-Preferences menu and stored in the standard Qt place ($HOME/.qt/recollrc).
-You probably do not want to edit this by hand.
+Preferences menu and stored in the standard Qt place
+($HOME/.config/Recoll.org/recoll.conf). You probably do not want to edit
+this by hand.
 
 Recoll indexing options are set inside text configuration files located in
 a configuration directory. There can be several such directories, each of
@@ -416,8 +422,8 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 the default file is:
 
 skippedNames = #* bin CVS Cache cache* caughtspam tmp .thumbnails .svn \
-*~ .beagle .git .hg .bzr loop.ps .xsession-errors \
-.recoll* xapiandb recollrc recoll.conf
+*~ .beagle .git .hg .bzr loop.ps .xsession-errors \
+.recoll* xapiandb recollrc recoll.conf
 
 The list can be redefined at any sub-directory in the indexed
 area.
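The default skippedNames value is a whitespace-separated list of shell wildcard patterns applied to file and directory names. The Python sketch below illustrates that matching with `fnmatch.fnmatchcase`; it is not Recoll's code, it assumes the value has already been read with its backslash continuations joined, and `is_skipped_name` is a hypothetical name.

```python
from fnmatch import fnmatchcase

# Default skippedNames value from the hunk above, continuations joined.
SKIPPED_NAMES = (
    "#* bin CVS Cache cache* caughtspam tmp .thumbnails .svn "
    "*~ .beagle .git .hg .bzr loop.ps .xsession-errors "
    ".recoll* xapiandb recollrc recoll.conf"
)


def is_skipped_name(name: str, patterns: str = SKIPPED_NAMES) -> bool:
    """True if a bare file or directory name matches any pattern.

    fnmatchcase keeps matching case-sensitive on every platform.
    """
    return any(fnmatchcase(name, pattern) for pattern in patterns.split())


if __name__ == "__main__":
    for name in [".git", "notes.txt", "draft.txt~", "cache42", "#autosave#"]:
        print(name, is_skipped_name(name))
```

Because `fnmatchcase` is used, `CVS` is skipped but `cvs` is not.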
@@ -451,9 +457,17 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 Example of use for skipping text files only in a specific
 directory:
 
-skippedPaths = ~/somedir/*.txt
+skippedPaths = ~/somedir/..txt
 
 
+skippedPathsFnmPathname
+
+The values in the *skippedPaths variables are matched by default
+with fnmatch(3), with the FNM_PATHNAME and FNM_LEADING_DIR flags.
+This means that '/' characters must be matched explicitly. You
+can set skippedPathsFnmPathname to 0 to disable the use of
+FNM_PATHNAME (meaning that /*/dir3 will match /dir1/dir2/dir3).
+
 followLinks
 
 Specifies if the indexer should follow symbolic links while
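Python's `fnmatch` module has no equivalent of the FNM_PATHNAME or FNM_LEADING_DIR flags of fnmatch(3). The sketch below therefore emulates them with a regular expression to show the effect described in this hunk. It handles only `*`, `?` and literal characters (no `[...]` classes or escapes), and the function names are hypothetical. Recoll itself calls the C library function.

```python
import re


def glob_to_regex(pattern: str, pathname: bool = True, leading_dir: bool = True) -> str:
    """Translate a simple glob into a regular expression.

    pathname=True: wildcards never match '/', like FNM_PATHNAME.
    leading_dir=True: the pattern may match a leading directory prefix
    of the path (followed by '/'), like FNM_LEADING_DIR.
    """
    any_char = "[^/]" if pathname else "."
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(any_char + "*")
        elif ch == "?":
            parts.append(any_char)
        else:
            parts.append(re.escape(ch))
    regex = "".join(parts)
    if leading_dir:
        regex += "(?:/.*)?"
    return regex


def path_matches(pattern: str, path: str, pathname: bool = True, leading_dir: bool = True) -> bool:
    """Whole-string match of path against the translated pattern."""
    return re.fullmatch(glob_to_regex(pattern, pathname, leading_dir), path, re.DOTALL) is not None


if __name__ == "__main__":
    print(path_matches("/*/dir3", "/dir1/dir2/dir3"))                  # False
    print(path_matches("/*/dir3", "/dir1/dir2/dir3", pathname=False))  # True
```

With the default flags, `*` stops at `/`, which is why `/*/dir3` only matches `/dir1/dir2/dir3` once the FNM_PATHNAME behaviour is turned off.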
@@ -596,6 +610,11 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 directory. The value can have embedded spaces but starting or
 trailing spaces will be trimmed. You cannot use quotes here.
 
+idxstatusfile
+
+The name of the scratch file where the indexer process updates its
+status. Default: idxstatus.txt inside the configuration directory.
+
 maxfsoccuppc
 
 Maximum file system occupation before we stop indexing. The value
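maxfsoccuppc compares file system occupation against a percentage. The sketch below shows one way to compute that figure from `os.statvfs`, using the used / (used + available) ratio that df-style tools commonly report. The helper names are hypothetical, and treating a value of 0 or less as "no limit" is an assumption of this sketch, not something stated in the text above.

```python
import os


def fs_occupation_pct(blocks: int, bfree: int, bavail: int) -> float:
    """Used space as a percentage of (used + available to unprivileged users).

    Root-reserved blocks (bfree - bavail) are left out of the denominator,
    as df-style tools usually do.
    """
    used = blocks - bfree
    denom = used + bavail
    return 100.0 * used / denom if denom else 0.0


def should_stop_indexing(path: str, max_pc: int) -> bool:
    """Hypothetical check run before indexing more data under path.

    Assumption of this sketch: max_pc <= 0 disables the check.
    """
    if max_pc <= 0:
        return False
    st = os.statvfs(path)
    return fs_occupation_pct(st.f_blocks, st.f_bfree, st.f_bavail) >= max_pc


if __name__ == "__main__":
    st = os.statvfs("/")
    print(round(fs_occupation_pct(st.f_blocks, st.f_bfree, st.f_bavail), 1))
```

`os.statvfs` is only available on Unix-like systems.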
@@ -890,7 +909,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 
 Note that the mime type is made up here, and you could call it
 diesel/oil just the same.
-
 * In $RECOLL_CONFDIR/mimeview under the [view] section, add:
 
 application/x-blobapp = blobviewer %f
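In the mimeview entry above, %f stands for the path of the file to open. A minimal Python sketch of that substitution, assuming only %f is handled, splits the command into an argument list first so that a path containing spaces stays a single argument and no shell is involved. `viewer_argv` is a hypothetical name, and Recoll's own substitution rules may differ.

```python
import shlex
from typing import List


def viewer_argv(command: str, path: str) -> List[str]:
    """Split a viewer command line and substitute %f with the file path.

    Splitting happens before substitution, so a path containing spaces
    stays one argument and no shell quoting of the path is needed.
    """
    return [token.replace("%f", path) for token in shlex.split(command)]


if __name__ == "__main__":
    print(viewer_argv("blobviewer %f", "/home/me/my file.blob"))
    # ['blobviewer', '/home/me/my file.blob']
```

The resulting list can be passed directly to `subprocess.Popen`.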
370 src/README
@@ -8,11 +8,11 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 
 <jfd@recoll.org>
 
-Copyright (c) 2005-2011 Jean-Francois Dockes
+Copyright (c) 2005-2012 Jean-Francois Dockes
 
 This document introduces full text search notions and describes the
 installation and use of the Recoll application. It currently describes
-Recoll 1.16.
+Recoll 1.17.
 
 [ Split HTML / Single HTML ]
 
@@ -110,7 +110,11 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 
 4.1. Writing a document filter
 
-4.1.1. Filter HTML output
+4.1.1. Simple filters
+
+4.1.2. Telling Recoll about the filter
+
+4.1.3. Filter HTML output
 
 4.2. Field data processing
 
@@ -246,7 +250,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 set inside your personal configuration, found by default in the .recoll
 sub-directory of your home directory. The default configuration will index
 your home directory with default parameters and should be sufficient for
-giving Recoll a try, but you may want to adjust it later.
+giving Recoll a try, but you may want to adjust it later, which can be
+done either by editing the text files or by using configuration menus in
+the recoll GUI.
 
 Indexing is started automatically the first time you execute the recoll
 search graphical user interface, or by executing the recollindex command.
@@ -266,9 +272,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 Indexing is the process by which the set of documents is analyzed and the
 data entered into the database. Recoll indexing is normally incremental:
 documents will only be processed if they have been modified. On the first
-execution, of course, all documents will need processing. A full index
-build can be forced later by specifying an option to the indexing command
-(recollindex -z).
+execution, all documents will need processing. A full index build can be
+forced later by specifying an option to the indexing command (recollindex
+-z).
 
 Recoll indexing can be performed with two different methods:
 
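Incremental indexing means a document is only processed again when it has changed. The change test Recoll actually uses is not described in this hunk; the sketch below assumes a signature made of the file's modification time and size, kept in an in-memory dict, with a `force` flag standing in for `recollindex -z`. All names are hypothetical.

```python
import os
import tempfile
from typing import Dict, Tuple

# (modification time in nanoseconds, size in bytes)
Signature = Tuple[int, int]


def file_signature(path: str) -> Signature:
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size)


def needs_indexing(path: str, seen: Dict[str, Signature], force: bool = False) -> bool:
    """Return True if path must be (re)processed, and record its signature.

    force=True plays the role of a full rebuild (recollindex -z) here.
    """
    sig = file_signature(path)
    if force or seen.get(path) != sig:
        seen[path] = sig
        return True
    return False


if __name__ == "__main__":
    seen: Dict[str, Signature] = {}
    with tempfile.TemporaryDirectory() as tmp:
        doc = os.path.join(tmp, "notes.txt")
        with open(doc, "w") as f:
            f.write("first version")
        print(needs_indexing(doc, seen))  # True: never seen
        print(needs_indexing(doc, seen))  # False: unchanged
        with open(doc, "a") as f:
            f.write(", edited")
        print(needs_indexing(doc, seen))  # True: size changed
```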
@@ -287,8 +293,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 small home directory). Monitoring a big file system tree can consume
 significant system resources.
 
-
-
 Recoll knows about quite a few different document types. The parameters
 for document types recognition and processing are set in configuration
 files.
@@ -301,8 +305,8 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 attachment to an email message part of a folder file archived inside a zip
 file...
 
-Recoll indexing processes plain text, HTML, openoffice and e-mail files
-internally (a few more actually).
+Recoll indexing processes plain text, HTML, openoffice and e-mail files,
+and a few others internally.
 
 Other file types (ie: postscript, pdf, ms-word, rtf ...) need external
 applications for preprocessing. The list is in the installation section.
@@ -380,30 +384,19 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 
 2.2.1. Xapian index formats
 
-If your first installation of Recoll was 1.9.0 or more recent, you can
-skip this section.
+Xapian versions usually support several formats for index storage. A given
+major Xapian version will have a current format, used to create new
+indexes, and will also support the format from the previous major version.
 
-Xapian has had two possible index formats for quite some time. The "old"
-one named Quartz, and the new one named Flint. Xapian 0.9 used Quartz by
-default, but could use Flint if a specific environment variable
-(XAPIAN_PREFER_FLINT) was set. Xapian 1.0 still supports Quartz but will
-use Flint by default for new index creations.
+Xapian will not automatically convert an existing index from the older
+format to the newer one. If you want to upgrade to the new format, or if a
+very old index needs to be converted because its format is not supported
+any more, you will have to explicitly delete the old index, then run a
+normal indexing process.
 
-The number of disk accesses performed during indexing has been much
-optimized in the new Flint engine and you may see indexing times improved
-by 50% in some cases (compared to Quartz), typically for big indexes where
-disk accesses dominate the indexing time. There is also a more modest
-improvement of index size.
-
-Xapian will not convert automatically an existing index from the Quartz to
-the Flint format. If you have an older index and want to take advantage of
-the new format (which can be done without setting the environment variable
-as of Recoll 1.8.2 and Xapian 1.0.0), you will have to explicitly delete
-the old index, then run a normal indexing process.
-
 Unfortunately, using the -z option to recollindex is not sufficient to
-change the format, you have to delete all files inside the index directory
-(typically ~/.recoll/xapiandb) before starting indexing.
+change the format: you will have to delete all files inside the index
+directory (typically ~/.recoll/xapiandb) before starting the indexing.
 
 ----------------------------------------------------------------------
 
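The hunk above asks for all files inside the index directory to be deleted before re-indexing into the new format. A small Python helper doing that could look like the following, assuming the index directory path is passed in explicitly (for example the typical ~/.recoll/xapiandb). It keeps the directory itself and removes everything inside it; `clear_index_dir` is a hypothetical name, and the demo only touches a temporary directory.

```python
import shutil
import tempfile
from pathlib import Path


def clear_index_dir(index_dir: Path) -> int:
    """Delete every entry inside index_dir, keeping the directory itself.

    Returns the number of top-level entries removed. Symbolic links are
    removed as links; their targets are left alone.
    """
    removed = 0
    for entry in list(index_dir.iterdir()):
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
        removed += 1
    return removed


if __name__ == "__main__":
    # Demo on a scratch directory rather than a real index directory.
    with tempfile.TemporaryDirectory() as tmp:
        idx = Path(tmp)
        (idx / "dummy.DB").write_bytes(b"x")
        (idx / "sub").mkdir()
        print(clear_index_dir(idx), list(idx.iterdir()))  # 2 []
```

After the directory has been emptied, a normal indexing run rebuilds the index in the current format.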
@@ -414,7 +407,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 confidential data is indexed, access to the database directory should be
 restricted.
 
-As of version 1.4, Recoll will create the configuration directory with a
+Recoll (since version 1.4) will create the configuration directory with a
 mode of 0700 (access by owner only). As the index data directory is by
 default a sub-directory of the configuration directory, this should result
 in appropriate protection.
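Creating a directory with mode 0700 is slightly subtle: the mode passed to mkdir is filtered by the process umask, and it is ignored when the directory already exists. The sketch below (a hypothetical helper, not Recoll's code) therefore sets the permissions explicitly afterwards.

```python
import os
import stat
import tempfile


def make_private_dir(path: str) -> None:
    """Create path (if needed) and make it accessible by its owner only."""
    os.makedirs(path, mode=0o700, exist_ok=True)
    # The mkdir mode is masked by the umask, and an existing directory keeps
    # its old mode, so set the permissions explicitly.
    os.chmod(path, 0o700)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conf = os.path.join(tmp, "recoll-conf")
        make_private_dir(conf)
        print(oct(stat.S_IMODE(os.stat(conf).st_mode)))  # 0o700
```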
@@ -507,11 +500,12 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 2.5.1. Running indexing
 
 Indexing is performed either by the recollindex program, or by the
-indexing thread inside the recoll program (use the File menu). Both
-programs will use the RECOLL_CONFDIR variable or accept a -c confdir
+indexing thread inside the recoll program (start it from the File menu).
+Both programs will use the RECOLL_CONFDIR variable or accept a -c confdir
 option to specify a non-default configuration directory.
 
-Reasons to use either the indexing thread or the recollindex command:
+There are reasons to use either the indexing thread or the recollindex
+command, but it is also a matter of personal preference:
 
 * Starting the indexing thread is more convenient, being just one click
 away.
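Both programs can take their configuration directory from a -c confdir option or from RECOLL_CONFDIR, with the .recoll sub-directory of the home directory as the default. The sketch below resolves the directory under the assumption that the command line option wins over the environment variable; that precedence and the function name `resolve_confdir` are assumptions, not documented behaviour.

```python
import os
from typing import Mapping, Optional


def resolve_confdir(cli_confdir: Optional[str] = None,
                    environ: Mapping[str, str] = os.environ) -> str:
    """Pick the configuration directory for recoll/recollindex.

    Assumed precedence for this sketch: an explicit -c confdir value,
    then RECOLL_CONFDIR, then the default ~/.recoll.
    """
    if cli_confdir:
        return os.path.expanduser(cli_confdir)
    env_dir = environ.get("RECOLL_CONFDIR")
    if env_dir:
        return os.path.expanduser(env_dir)
    return os.path.expanduser("~/.recoll")


if __name__ == "__main__":
    print(resolve_confdir())
```

Passing the environment as a parameter keeps the function easy to test without touching the real process environment.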
@@ -523,11 +517,10 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 rare occurrence, but who knows...)
 
 * The recollindex command uses setpriority/nice to lower its priority
-while indexing (it will also use ionice when this becomes more widely
-available), the thread can't do it, else it would also slow down the
-user/search interface.
-
-I'll let the reader decide where my heart belongs...
+while indexing. When available (and for Recoll version 1.16.2 and
+newer), it also uses the ionice command to lower its IO priority. The
+thread can't do it, else it would also slow down the user/search
+interface.
 
 If the recoll program finds no index when it starts, it will automatically
 start indexing (except if canceled).
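recollindex lowers its own CPU priority and, where available, its I/O priority with ionice. The Python sketch below does the same for the current process with `os.setpriority` and the util-linux `ionice` command. The niceness value and the idle I/O class (-c 3) are assumptions for illustration, not the values recollindex uses, and the function name is hypothetical. It is Unix-only.

```python
import os
import shutil
import subprocess


def lower_priority(niceness: int = 19, use_ionice: bool = True) -> int:
    """Lower the CPU (and optionally I/O) priority of the current process.

    Returns the resulting niceness.
    """
    current = os.getpriority(os.PRIO_PROCESS, 0)
    if current < niceness:
        # Increasing niceness (lowering priority) needs no privileges.
        os.setpriority(os.PRIO_PROCESS, 0, niceness)
    if use_ionice and shutil.which("ionice"):
        # Class 3 ("idle"): disk time only when no other process needs it.
        # Failure is ignored, since ionice support is optional.
        subprocess.run(["ionice", "-c", "3", "-p", str(os.getpid())], check=False)
    return os.getpriority(os.PRIO_PROCESS, 0)


if __name__ == "__main__":
    print(lower_priority())
```

As the hunk explains, this only makes sense in a separate indexing process: applied inside the GUI, it would slow down the search interface too.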
@@ -596,7 +589,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 The real time indexing support can be customised during package
 configuration with the --with[out]-fam or --with[out]-inotify options. The
 default is currently to include inotify monitoring on systems that support
-it.
+it, and, as of recoll 1.17, gamin support on FreeBSD.
 
 The rclmon.sh script can be used to easily start and stop the daemon. It
 can be found in the examples directory (typically
@@ -625,6 +618,10 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 There is a similar mechanism under Gnome (find the session control tool in
 the menus and use the "Startup programs" tab).
 
+If you use the daemon completely outside of an X11 session, you need to
+add option -x to disable X11 session monitoring (otherwise the daemon will
+not start).
+
 By default, the messages from the indexing daemon will be discarded. You
 may want to change this by setting the daemlogfilename and daemloglevel
 configuration parameters. Also the log file will only be truncated when
@@ -882,10 +879,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 
 Hovering over a table row will update the detail area at the bottom of the
 window with the corresponding values. You can click the row to freeze the
-display. The bottom area is equivalent to a classical result list
-paragraph, with links for starting a preview or a native application, and
-an equivalent right-click menu. Typing Esc (the Escape key) will unfreeze
-the display.
+display. The bottom area is equivalent to a result list paragraph, with
+links for starting a preview or a native application, and an equivalent
+right-click menu. Typing Esc (the Escape key) will unfreeze the display.
 
 ----------------------------------------------------------------------
 
@@ -1117,15 +1113,12 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 3.1.9. Sorting search results and collapsing duplicates
 
 The documents in a result list are normally sorted in order of relevance.
-It is possible to specify different sort parameters by using the Sort
-parameters dialog (located in the Tools menu).
-
-The tool sorts a specified number of the most relevant documents in the
-result list, according to specified criteria. The currently available
-criteria are date and mime type.
-
-The sort parameters stay in effect until they are explicitly reset, or the
-program exits. An activated sort is indicated in the result list header.
+It is possible to specify a different sort order, either by using the
+vertical arrows in the GUI toolbox to sort by date, or by switching to the
+result table display and clicking on any header. The sort order chosen
+inside the result table remains active if you switch back to the result
+list, until you click one of the vertical arrows. When both arrows are
+unchecked, results are sorted by relevance again.
 
 Sort parameters are remembered between program invocations, but result
 sorting is normally always inactive when the program starts. It is
@@ -1199,6 +1192,19 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 documents where either virtual or reality or both appear, but those which
 contain virtual reality should appear sooner in the list.
 
+Phrase searches can strongly slow down a query if most of the terms in the
+phrase are common. This is why the autophrase option is off by default for
+Recoll versions before 1.17. As of version 1.17, autophrase is on by
+default, but very common terms will be removed from the constructed
+phrase. The removal threshold can be adjusted from the search preferences.
+
+Phrases and abbreviations. As of Recoll version 1.17, dotted abbreviations
+like I.B.M. are also automatically indexed as a word without the dots:
+IBM. Searching for the word inside a phrase (ie: "the IBM company") will
+only match the dotted abbreviation if you increase the phrase slack (using
+the advanced search panel control, or the o query language modifier).
+Literal occurrences of the word will be matched normally.
+
 ----------------------------------------------------------------------
 
 3.1.10.3. Others
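The abbreviation handling described above stores an extra, dot-free term for dotted abbreviations. A hypothetical tokenizer step in Python might look like the following, assuming an abbreviation is two or more single letters each followed by a dot. Recoll's real rules may be broader or narrower, and `terms_for_token` is an illustrative name.

```python
import re
from typing import List

# Assumed definition: two or more single letters, each followed by a dot.
_DOTTED_ABBREV = re.compile(r"(?:[A-Za-z]\.){2,}")


def terms_for_token(token: str) -> List[str]:
    """Return the terms a hypothetical indexer would store for one token."""
    terms = [token]
    if _DOTTED_ABBREV.fullmatch(token):
        terms.append(token.replace(".", ""))
    return terms


if __name__ == "__main__":
    print(terms_for_token("I.B.M."))   # ['I.B.M.', 'IBM']
    print(terms_for_token("company"))  # ['company']
```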
@@ -1247,34 +1253,37 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 
 User interface parameters:
 
-* Number of results in a result page:
-
-* Hide duplicate results: decides if result list entries are shown for
-identical documents found in different places.
-
 * Highlight color for query terms: Terms from the user query are
 highlighted in the result list samples and the preview window. The
 color can be chosen here. Any Qt color string should work (ie red,
 #ff0000). The default is blue.
 
-* Result list font: There is quite a lot of information shown in the
-result list, and you may want to customize the font and/or font size.
-The rest of the fonts used by Recoll are determined by your generic Qt
-config (try the qtconfig command).
-
-* Result paragraph format string: allows you to change the presentation
-of each result list entry. This is described in its own section.
+* Style sheet: The name of a Qt style sheet text file which is applied
+to the whole Recoll application on startup. The default value is
+empty, but there is a skeleton style sheet (recoll.qss) inside the
+/usr/share/recoll/examples directory. Using a style sheet, you can
+change most Recoll graphical parameters: colors, fonts, etc. See the
+sample file for a few simple examples.
 
-* Abstract snippet separator: for synthetic abstracts built from index
-data, which are usually made of several snippets from different parts
-of the document, this defines the snippet separator, an ellipsis by
-default.
-
 * Maximum text size highlighted for preview Inserting highlights on
 search term inside the text before inserting it in the preview window
 involves quite a lot of processing, and can be disabled over the given
 text size to speed up loading.
 
+* Prefer HTML to plain text for preview: if set, Recoll will display HTML
+as such inside the preview window. If this causes problems with the Qt
+HTML display, you can uncheck it to display the plain text version
+instead.
+
+* Use <PRE> tags instead of <BR> to display plain text as HTML in
+preview: when displaying plain text inside the preview window, Recoll
+tries to preserve some of the original text line breaks and
+indentation. It can either use PRE HTML tags, which preserve the
+indentation well but force horizontal scrolling for long lines, or
+use BR tags to break at the original line breaks, which will let the
+editor introduce other line breaks according to the window width, but
+will lose some of the original indentation.
+
 * Use desktop preferences to choose document editor: if this is checked,
 the xdg-open utility will be used to open files when you click the
 Open link in the result list, instead of the application defined in
@@ -1301,13 +1310,37 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 tool stat between invocations. It normally starts with sorting
 disabled.
 
-* Prefer HTML to plain text for preview if set, Recoll will display HTML
-as such inside the preview window. If this causes problems with the Qt
-HTML display, you can uncheck it to display the plain text version
-instead.
+Result list parameters:
+
+* Number of results in a result page
+
+* Result list font: There is quite a lot of information shown in the
+result list, and you may want to customize the font and/or font size.
+The rest of the fonts used by Recoll are determined by your generic Qt
+config (try the qtconfig command).
+
+* Edit result list paragraph format string: allows you to change the
+presentation of each result list entry. See the result list
+customisation section.
+
+* Edit result page html header insert: allows you to define text
+inserted at the end of the result page html header. More detail in the
+result list customisation section.
+
+* Date format: allows specifying the format used for displaying dates
+inside the result list. This should be specified as an strftime()
+string (man strftime).
+
+* Abstract snippet separator: for synthetic abstracts built from index
+data, which are usually made of several snippets from different parts
+of the document, this defines the snippet separator, an ellipsis by
+default.
 
 Search parameters:
 
+* Hide duplicate results: decides if result list entries are shown for
+identical documents found in different places.
+
 * Stemming language: stemming obviously depends on the document's
 language. This listbox will let you chose among the stemming databases
 which were built during indexing (this is set in the main
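The Date format preference takes an strftime() string. The sketch below formats a document timestamp with Python's `datetime.strftime`, which accepts the same common directives (%Y, %m, %d, %H, %M). It uses UTC so that the output is reproducible; the GUI presumably displays local time, and `format_result_date` is a hypothetical name.

```python
from datetime import datetime, timezone


def format_result_date(timestamp: float, fmt: str = "%Y-%m-%d") -> str:
    """Format seconds since the epoch with an strftime() format string (UTC)."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime(fmt)


if __name__ == "__main__":
    print(format_result_date(1326585600))                    # 2012-01-15
    print(format_result_date(1326585600, "%d/%m/%Y %H:%M"))  # 15/01/2012 00:00
```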
@@ -1316,11 +1349,16 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 will be deleted at the next indexing pass unless they are also added
 in the configuration file.
 
-* Dynamically add phrase to simple searches: a phrase will be
+* Automatically add phrase to simple searches: a phrase will be
 automatically built and added to simple searches when looking for Any
 terms. This will give a relevance boost to the results where the
 search terms appear as a phrase (consecutive and in order).
 
+* Autophrase term frequency threshold percentage: very frequent terms
+should not be included in automatic phrase searches for performance
+reasons. The parameter defines the cutoff percentage (percentage of
+the documents where the term appears).
+
 * Replace abstracts from documents: this decides if we should synthesize
 and display an abstract in place of an explicit abstract found within
 the document itself.
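The autophrase threshold drops very frequent terms before the automatic phrase is built. Below is a sketch of that filter, assuming a document frequency is available for each term and that a phrase is only worth building when at least two terms remain. The function names and the two-term rule are assumptions for illustration.

```python
from typing import Dict, List, Optional


def autophrase_terms(terms: List[str], doc_freq: Dict[str, int],
                     total_docs: int, cutoff_pct: float) -> List[str]:
    """Keep only the terms appearing in at most cutoff_pct % of documents."""
    kept = []
    for term in terms:
        pct = 100.0 * doc_freq.get(term, 0) / total_docs if total_docs else 0.0
        if pct <= cutoff_pct:
            kept.append(term)
    return kept


def build_autophrase(terms: List[str], doc_freq: Dict[str, int],
                     total_docs: int, cutoff_pct: float) -> Optional[str]:
    """Return a quoted phrase, or None when fewer than two terms survive."""
    kept = autophrase_terms(terms, doc_freq, total_docs, cutoff_pct)
    return '"' + " ".join(kept) + '"' if len(kept) >= 2 else None


if __name__ == "__main__":
    df = {"the": 950, "ibm": 12, "company": 300}
    print(build_autophrase(["the", "ibm", "company"], df, 1000, 40))  # "ibm company"
```

Lowering the cutoff removes more terms: with a 10% cutoff only `ibm` survives in this example, so no phrase is built.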
@@ -1358,28 +1396,51 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

 ----------------------------------------------------------------------

-3.1.11.1. The result list paragraph format
+3.1.11.1. The result list format

-The presentation of each result inside the result list can be customized
-by setting the result list paragraph format inside the User Interface tab
-of the Query configuration.
+The result list presentation can be exhaustively customized by adjusting
+two elements:

-This is a Qt HTML string where the following printf-like % substitutions
-will be performed:
+* The paragraph format
+
+* Html code inside the header section

+These can be edited from the Result list tab of the Query configuration.
+
+Newer versions of Recoll (from 1.17) use a WebKit HTML object by default
+(this may be disabled at build time), and total customisation is possible
+with full support for CSS and Javascript. Conversely, there are limits to
+what you can do with the older Qt QTextBrowser, but still, it is possible
+to decide what data each result will contain, and how it will be
+displayed.
+
+No more detail will be given about the header part (only useful with the
+WebKit build), if there are restrictions to what you can do, they are
+beyond this author's HTML/CSS/Javascript abilities...
+
+----------------------------------------------------------------------
+
+3.1.11.1.1. The paragraph format
+
+This is an arbitrary HTML string where the following printf-like %
+substitutions will be performed:
+
 * %A. Abstract

 * %D. Date

-* %I. Icon image name
+* %I. Icon image name. This is normally determined from the mime type.
+The associations are defined inside the mimeconf configuration file.
+If a thumbnail for the file is found at the standard Freedesktop
+location, this will be displayed instead.

 * %K. Keywords (if any)

-* %L. Preview and Edit links
+* %L. Precooked Preview and Edit links

 * %M. Mime type

-* %N. result Number
+* %N. result Number inside the result page

 * %R. Relevance percentage

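The %-substitution of the paragraph format can be illustrated with a minimal Python sketch. This is not Recoll's code. The `expand_paragraph` function and the per-result dictionary are hypothetical. The sketch handles only the single-letter escapes (%A, %D, %M, %N, %R, %U, ...) and, as an assumption, leaves unknown letters untouched.

```python
import re

# Hypothetical model of the paragraph-format expansion (not Recoll code).
# Each single-letter escape (%A, %D, %M, %N, %R, %U, ...) is looked up in
# a dictionary holding the values for one result.


def expand_paragraph(fmt: str, values: dict) -> str:
    """Replace %X (X an uppercase letter) with values[X].

    Escapes with no entry in `values` are left as-is (an assumption).
    The %(fieldname) form does not match this pattern, since '(' is
    not an uppercase letter.
    """

    def repl(match: re.Match) -> str:
        return values.get(match.group(1), match.group(0))

    return re.sub(r"%([A-Z])", repl, fmt)


if __name__ == "__main__":
    result = {
        "N": "1",
        "R": "87%",
        "M": "text/plain",
        "U": "file:///home/me/notes.txt",
        "A": "an abstract of the document",
    }
    fmt = '<p>%N. <a href="P%N">%U</a> [%M] %R<br>%A</p>'
    print(expand_paragraph(fmt, result))
```

All escapes are matched against the original format string in a single pass. Because of this, a substituted value that itself contains a % sign (such as "87%") is never expanded a second time.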
@@ -1390,8 +1451,8 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 * %U. Url

 The format of the Preview and Edit links is <a href="P%N"> and <a
-href="E%N"> where docnum (%N expands to the document number inside the
-result list).
+href="E%N"> where docnum (%N) expands to the document number inside the
+result page).

 In addition to the predefined values above, all strings like %(fieldname)
 will be replaced by the value of the field named fieldname for this
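The %(fieldname) expansion and the P%N / E%N link convention can be sketched in the same hypothetical way. This is not Recoll's implementation. As an assumption, `expand_fields` substitutes an empty string for a field the document lacks. `decode_link` shows that a link target such as "P3" carries both the action and the document number inside the result page.

```python
import re
from typing import Optional, Tuple


def expand_fields(fmt: str, fields: dict) -> str:
    """Replace every %(fieldname) with fields[fieldname].

    Missing fields expand to an empty string (assumption, not
    documented Recoll behaviour).
    """
    return re.sub(r"%\(([^)]+)\)", lambda m: fields.get(m.group(1), ""), fmt)


def decode_link(href: str) -> Optional[Tuple[str, int]]:
    """Split a 'P<n>' or 'E<n>' link target into (action, number)."""
    m = re.fullmatch(r"([PE])(\d+)", href)
    if m is None:
        return None
    action = "preview" if m.group(1) == "P" else "edit"
    return action, int(m.group(2))


if __name__ == "__main__":
    print(expand_fields("by %(author)", {"author": "Jane"}))
    print(decode_link("P3"), decode_link("E12"), decode_link("X1"))
```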
@@ -1431,6 +1492,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 Note that the P%N link in the above paragraph makes the title a preview
 link.

+These samples, and some others are on the web site, with pictures to show
+how they look.
+
 It is also possible to define the value of the snippet separator inside
 the abstract section.

@@ -1546,8 +1610,8 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 used with the KIO slave or the command line search. It broadly has the
 same capabilities as the complex search interface in the GUI.

-The language is roughly based on the Xesam user search language
-specification.
+The language is roughly based on the (seemingly defunct) Xesam user search
+language specification.

 If the results of a query language search puzzle you and you doubt what
 has been actually searched for, you can use the GUI show query link at the
@@ -1585,9 +1649,8 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 significant), so that title:"prejudice pride" is not the same as
 title:prejudice title:pride, and is unlikely to find a result.

-Most Xesam phrase modifiers are unsupported, except for l (small ell) to
-disable stemming, and p to turn a phrase into a NEAR (unordered proximity)
-search. Exemple: "prejudice pride"p
+Modifiers can be set on a phrase clause, for example to specify a
+proximity search (unordered). See the modifier section.

 Recoll currently manages the following default fields:

@@ -1609,7 +1672,18 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

 * dir for filtering the results on file location (Ex:
 dir:/home/me/somedir). -dir also works to find results out of the
-specified directory, only after release 1.15.8.
+specified directory, only after release 1.15.8. A tilde inside the
+value will be expanded to the home directory. dir is not a regular
+field and only one value makes sense in a query (you can't use
+dir:dir1 OR dir:dir2). Relative paths make sense, for example,
+dir:share/doc would match either /usr/share/doc or
+/usr/local/share/doc.
+
+* size for filtering the results on file size. Example: size<10000. You
+can use <, > or = as operators. You can specify a range like the
+following: size>100 size<1000. The usual k/K, m/M, g/G, t/T can be
+used as (decimal) multipliers. Ex: size>1k to search for files bigger
+than 1000 bytes.

 * date for searching or filtering on dates. The syntax for the argument
 is based on the ISO8601 standard for dates and time intervals. Only
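The size and dir rules can be expressed as a small Python sketch. It is a rough model written for illustration, not Recoll's query parser. The names `parse_size_clause`, `size_matches` and `dir_matches` are hypothetical. k/m/g/t are treated as the decimal multipliers the text describes. As an assumption, a relative dir value matches when its path components appear as a contiguous run anywhere in the file's directory path.

```python
import os
import re

# Decimal multipliers for the size field: k = 10**3, m = 10**6, ...
_MULTIPLIERS = {"k": 10**3, "m": 10**6, "g": 10**9, "t": 10**12}


def parse_size_clause(clause: str) -> tuple:
    """Parse a clause such as 'size>1k' into ('>', 1000)."""
    m = re.fullmatch(r"size([<>=])(\d+)([kKmMgGtT]?)", clause)
    if m is None:
        raise ValueError(f"not a size clause: {clause!r}")
    op, number, suffix = m.groups()
    return op, int(number) * _MULTIPLIERS.get(suffix.lower(), 1)


def size_matches(size: int, clauses: list) -> bool:
    """True if `size` satisfies every clause (several clauses form a range)."""
    for clause in clauses:
        op, limit = parse_size_clause(clause)
        if op == "<" and not size < limit:
            return False
        if op == ">" and not size > limit:
            return False
        if op == "=" and size != limit:
            return False
    return True


def dir_matches(filepath: str, value: str) -> bool:
    """Rough model of the dir: filter (hypothetical, not Recoll's code)."""
    value = os.path.expanduser(value)  # a leading ~ becomes the home directory
    wanted = [p for p in value.split("/") if p]
    parts = [p for p in os.path.dirname(filepath).split("/") if p]
    if value.startswith("/"):
        # Absolute value: the file must be at or below that directory.
        return parts[:len(wanted)] == wanted
    # Relative value: the components may appear anywhere in the path.
    n = len(wanted)
    return any(parts[i:i + n] == wanted for i in range(len(parts) - n + 1))
```

Comparing whole path components, instead of raw string prefixes, keeps dir:/home/me from matching files under /home/mex.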
@@ -1828,29 +1902,68 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 complicated than the older kind. Most of these new filters are written
 in Python, using a common module to handle the protocol.

-The following will just describe the simple filters, if you are programmer
-enough to write one of the other kind, it shouldn't be too difficult to
-make sense of one of the existing modules (ie: rclzip).
+The following will just describe the simple filters. If you can program
+and want to write one of the other kind, it shouldn't be too difficult to
+make sense of one of the existing modules. For example, look at rclzip
+which uses Zip file paths as internal identifiers (ipath), and rclinfo,
+which uses an integer index.

+----------------------------------------------------------------------
+
+4.1.1. Simple filters
+
 Recoll simple filters are usually shell-scripts, but this is in no way
-necessary. These programs are extremely simple and most of the difficulty
-lies in extracting the text from the native format, not outputting what is
-expected by Recoll. Happily enough, most document formats already have
-translators or text extractors which handle the difficult part and can be
-called from the filter. In some case the output of the translating program
-is appropriate, and no intermediate shell-script is needed.
+necessary. Extracting the text from the native format is the difficult
+part. Outputting the format expected by Recoll is trivial. Happily enough,
+most document formats have translators or text extractors which can be
+called from the filter. In some cases the output of the translating
+program is completely appropriate, and no intermediate shell-script is
+needed.

 Filters are called with a single argument which is the source file name.
 They should output the result to stdout.

+When writing a filter, you should decide if it will output plain text or
+html. Plain text is simpler, but you will not be able to add metadata or
+vary the output character encoding (this will be defined in a
+configuration file). Additionally, some formatting may be easier to preserve
+when previewing html. Actually the deciding factor is metadata: Recoll has
+a way to extract metadata from the html header and use it for field
+searches.
+
 The RECOLL_FILTER_FORPREVIEW environment variable (values yes, no) tells
 the filter if the operation is for indexing or previewing. Some filters
-use this to output a slightly different format. This is not essential.
+use this to output a slightly different format, for example stripping
+uninteresting repeated keywords (ie: Subject: for email) when indexing.
+This is not essential.

+You should look at one of the simple filters, for example rclps, for a
+starting point.
+
+Don't forget to make your filter executable before testing!
+
+----------------------------------------------------------------------
+
+4.1.2. Telling Recoll about the filter
+
+There are two elements that link a file to the filter which should process
+it: the association of file to mime type and the association of a mime
+type with a filter.
+
+The association of files to mime types is mostly based on name suffixes.
+The types are defined inside the mimemap file. Example:
+
+
+.doc = application/msword
+
+If no suffix association is found for the file name, Recoll will try to
+execute the file -i command to determine a mime type.
+
 The association of file types to filters is performed in the mimeconf
-file. A sample:
+file. A sample will probably be of better help than a long explanation:

-[index]
+
+[index]
 application/msword = exec antiword -t -i 1 -m UTF-8;\
 mimetype = text/plain ; charset=utf-8

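Simple filters are usually shell scripts, but any executable will do. Here is a minimal Python sketch that follows the rules above. It takes one argument (the source file name) and writes HTML with an explicit charset to stdout. It also checks RECOLL_FILTER_FORPREVIEW so that, when indexing, the repeated "Subject:" keyword is stripped. The script is hypothetical, and it assumes the input is UTF-8 plain text. A real filter would usually call an external converter instead.

```python
#!/usr/bin/env python3
"""Minimal sketch of a Recoll simple filter (hypothetical example).

Called with a single argument, the source file name, it writes an HTML
document to stdout. The input is assumed to be UTF-8 plain text.
"""
import html
import os
import sys


def make_html(text: str, title: str, for_preview: bool) -> str:
    """Wrap `text` in a minimal HTML page with an explicit charset."""
    if not for_preview:
        # When indexing, drop the repeated "Subject:" keyword but keep the
        # subject itself (mirrors the e-mail example in the manual).
        lines = []
        for line in text.splitlines():
            if line.startswith("Subject:"):
                line = line[len("Subject:"):].lstrip()
            lines.append(line)
        text = "\n".join(lines)
    return (
        "<html><head>\n"
        '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head><body><pre>\n"
        f"{html.escape(text)}\n"
        "</pre></body></html>\n"
    )


def main(argv: list) -> int:
    if len(argv) != 2:
        sys.stderr.write(f"usage: {argv[0]} <file>\n")
        return 1
    path = argv[1]
    with open(path, encoding="utf-8", errors="replace") as f:
        text = f.read()
    # RECOLL_FILTER_FORPREVIEW is "yes" when the output is for previewing.
    for_preview = os.environ.get("RECOLL_FILTER_FORPREVIEW", "no") == "yes"
    sys.stdout.write(make_html(text, os.path.basename(path), for_preview))
    return 0


# Recoll always passes the file name; without arguments the script does nothing.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Once it is made executable (chmod +x) and saved under some made-up name, the script would be tied to a (likewise made-up) mime type through the mimemap and mimeconf files.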
@@ -1876,16 +1989,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 * application/x-chm is processed by a persistant filter. This is
 determined by the execm keyword.

-The easiest way to write a new filter is probably to start from an
-existing one.
-
-Filters which output text/plain text are generally simpler, but they
-cannot specify the character set and other metadata, so they are limited
-to cases where these elements are not needed.
-
 ----------------------------------------------------------------------

-4.1.1. Filter HTML output
+4.1.3. Filter HTML output

 The output HTML could be very minimal like the following example:

@@ -2210,8 +2316,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 extra_dbs is a list of external databases (xapian directories)
 writable decides if we can index new data through this connection

-
-
 ----------------------------------------------------------------------

 4.3.2.3. Example code
@@ -2472,8 +2576,13 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 (ie: --with-file-command=/usr/local/bin/file). Can be useful to enable
 the gnu version on systems where the native one is bad.

-* --without-gui Disable the Qt interface, and auxiliary uses of X11, and
-compile the command line version.
+* --disable-qtgui Disable the Qt interface. Will allow building the
+indexer and the command line search program in absence of a Qt
+environment.
+
+* --disable-x11mon Disable X11 connection monitoring inside recollindex.
+Together with --disable-qtgui, this allows building recoll without Qt
+and X11.

 * Of course the usual autoconf configure options, like --prefix apply.

@@ -2513,8 +2622,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 5.4. Configuration overview

 Most of the parameters specific to the recoll GUI are set through the
-Preferences menu and stored in the standard Qt place ($HOME/.qt/recollrc).
-You probably do not want to edit this by hand.
+Preferences menu and stored in the standard Qt place
+($HOME/.config/Recoll.org/recoll.conf). You probably do not want to edit
+this by hand.

 Recoll indexing options are set inside text configuration files located in
 a configuration directory. There can be several such directories, each of
@@ -2617,8 +2727,8 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 the default file is:

 skippedNames = #* bin CVS Cache cache* caughtspam tmp .thumbnails .svn \
 *~ .beagle .git .hg .bzr loop.ps .xsession-errors \
 .recoll* xapiandb recollrc recoll.conf

 The list can be redefined at any sub-directory in the indexed
 area.
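skippedNames is a white-space separated list of shell-style patterns. The hypothetical Python sketch below checks a path against the default list quoted above, using `fnmatch.fnmatchcase`. Two assumptions apply: the patterns are matched against the simple file name rather than the full path, and the trailing backslashes are only line continuations. Recoll's own matching is done in its C++ code and is only approximated here.

```python
import fnmatch
import os

# Default value as quoted in the manual (backslashes are line continuations).
DEFAULT_SKIPPED_NAMES = r"""#* bin CVS Cache cache* caughtspam tmp .thumbnails .svn \
*~ .beagle .git .hg .bzr loop.ps .xsession-errors \
.recoll* xapiandb recollrc recoll.conf"""


def parse_pattern_list(value: str) -> list:
    """Split a white-space separated list, dropping continuation backslashes."""
    return [tok for tok in value.split() if tok != "\\"]


def is_skipped(path: str, patterns: list) -> bool:
    """True if the last path component matches one of the patterns."""
    name = os.path.basename(path.rstrip("/"))
    return any(fnmatch.fnmatchcase(name, pat) for pat in patterns)
```

`fnmatchcase` is used so that matching stays case-sensitive, as on a Unix file system, whatever the host platform.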
@@ -2652,9 +2762,17 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 Example of use for skipping text files only in a specific
 directory:

-skippedPaths = ~/somedir/*.txt
+skippedPaths = ~/somedir/..txt


+skippedPathsFnmPathname
+
+The values in the *skippedPaths variables are matched by default
+with fnmatch(3), with the FNM_PATHNAME and FNM_LEADING_DIR flags.
+This means that '/' characters must be matched explicitly. You
+can set skippedPathsFnmPathname to 0 to disable the use of
+FNM_PATHNAME (meaning that /*/dir3 will match /dir1/dir2/dir3).
+
 followLinks

 Specifies if the indexer should follow symbolic links while
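Python's fnmatch module has no equivalent of the FNM_PATHNAME and FNM_LEADING_DIR flags of fnmatch(3). The hypothetical sketch below therefore translates a pattern into a regular expression to show what the two flags change. With FNM_PATHNAME, * and ? stop at '/'. With FNM_LEADING_DIR, anything below a matching directory also matches. Bracket expressions are deliberately not supported. A leading ~ is expanded, as the ~/somedir example implies (an assumption).

```python
import os
import re


def _translate(pattern: str, pathname: bool) -> str:
    """Turn a glob using only * and ? into a regex (no [...] support)."""
    one = "[^/]" if pathname else "."  # FNM_PATHNAME: wildcards stop at '/'
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(one + "*")
        elif ch == "?":
            parts.append(one)
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def path_matches(path: str, pattern: str,
                 pathname: bool = True, leading_dir: bool = True) -> bool:
    """Approximate fnmatch(3) with FNM_PATHNAME / FNM_LEADING_DIR."""
    regex = _translate(os.path.expanduser(pattern), pathname)
    if leading_dir:
        # FNM_LEADING_DIR: ignore a trailing "/..." after the matched part.
        regex += "(?:/.*)?"
    return re.fullmatch(regex, path, flags=re.DOTALL) is not None
```

With the defaults, /*/dir3 does not match /dir1/dir2/dir3, because * cannot cross a '/'. Passing pathname=False models skippedPathsFnmPathname = 0, and the same pattern then matches.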
@@ -2801,6 +2919,11 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 directory. The value can have embedded spaces but starting or
 trailing spaces will be trimmed. You cannot use quotes here.

+idxstatusfile
+
+The name of the scratch file where the indexer process updates its
+status. Default: idxstatus.txt inside the configuration directory.
+
 maxfsoccuppc

 Maximum file system occupation before we stop indexing. The value
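The description of maxfsoccuppc is cut off at this point in the diff. The hypothetical Python sketch below assumes two things: the value is a percentage of file system occupation, and 0 disables the check. It computes the occupation as used / (used + available), the same ratio GNU df uses for its Use% column. That ratio can differ from used / total when blocks are reserved for root.

```python
import shutil


def occupation_percent(used: int, free: int) -> float:
    """Percentage of occupied space, computed as used / (used + free)."""
    total = used + free
    return 100.0 * used / total if total else 0.0


def should_stop_indexing(path: str, maxfsoccuppc: float) -> bool:
    """Hypothetical check: stop once occupation exceeds maxfsoccuppc.

    A value of 0 (or less) is assumed to disable the check.
    """
    if maxfsoccuppc <= 0:
        return False
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return occupation_percent(usage.used, usage.free) > maxfsoccuppc
```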
@@ -3107,7 +3230,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

 Note that the mime type is made up here, and you could call it
 diesel/oil just the same.
-
 * In $RECOLL_CONFDIR/mimeview under the [view] section, add:

 application/x-blobapp = blobviewer %f
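The %f in the mimeview line stands for the file to open. The hypothetical Python sketch below turns such a line into an argument vector: it splits the line with shell-like rules, then substitutes %f inside each word. Recoll's own command parsing, and any other % escapes it may support, are not modelled here. blobviewer is the made-up viewer from the example.

```python
import shlex


def viewer_argv(command: str, path: str) -> list:
    """Split a mimeview command line and replace %f with `path`.

    Substitution happens after splitting, so a path containing spaces
    stays a single argument.
    """
    return [word.replace("%f", path) for word in shlex.split(command)]


if __name__ == "__main__":
    argv = viewer_argv("blobviewer %f", "/home/me/my file.blob")
    print(argv)
    # subprocess.run(argv) would then start the viewer.
```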