diff --git a/src/doc/user/usermanual.sgml b/src/doc/user/usermanual.sgml
index 756f988a..bcf15a5b 100644
--- a/src/doc/user/usermanual.sgml
+++ b/src/doc/user/usermanual.sgml
@@ -139,27 +139,47 @@
index. It has input filters for many document types.
Stemming is the process by which &RCL; reduces words to
- their radicals so that searching does not depend, for example,
- on a word being singular or plural (floor, floors), or on a verb
- tense (flooring, floored). Because the mechanisms used for
- stemming depend on the specific grammatical rules for each
- language, there is a separate stemmer module for most common
- languages where stemming makes sense. Storing documents written
- in different languages in the same index is possible, and
- commonly done. In this situation, you can specify several
- stemming languages for the index. &RCL; stores the unstemmed
- versions of terms in the main index and uses auxiliary databases
- for term expansion (one for each stemming language), which means
- that you can switch stemming languages between searches, or add
- a language without needing a full reindex. &RCL; currently
- makes no attempt at automatic language recognition, which means
- that the stemmer will sometimes be applied to terms from other
- languages with potentially strange results. In practise, even if
- this introduces possibilities of confusion, this approach has
- been proven quite useful, and, awaiting the addition of an
- automatic language recognition module to &RCL;, it is much less
- cumbersome than separating your documents according to what
- language they are written in.
+ their radicals so that searching does not depend, for example, on a
+ word being singular or plural (floor, floors), or on a verb tense
+ (flooring, floored). Because the mechanisms used for stemming
+ depend on the specific grammatical rules for each language, there
+ is a separate stemmer module for most common languages where
+ stemming makes sense.
+
+ &RCL; stores the unstemmed versions of terms in the main index
+ and uses auxiliary databases for term expansion (one for each
+ stemming language), which means that you can switch stemming
+ languages between searches, or add a language without needing a
+ full reindex.
+
+ Storing documents written in different languages in the same
+ index is possible, and commonly done. In this situation, you can
+ specify several stemming languages for the index.
+
+ &RCL; currently makes no attempt at automatic language
+ recognition, which means that the stemmer will sometimes be applied
+ to terms from other languages with potentially strange results. In
+ practice, even if this introduces possibilities of confusion, this
+ approach has proven quite useful, and, awaiting the addition
+ of an automatic language recognition module to &RCL;, it is much
+ less cumbersome than separating your documents according to what
+ language they are written in.
+
+ Before version 1.18, &RCL; always stripped most accents and
+ diacritics from terms, and converted them to lower case before
+ storing them in the index. As a consequence, it was impossible to
+ search for a particular capitalization of a term
+ (US / us), or to
+ discriminate between two terms based on diacritics (sake
+ / saké, mate /
+ maté).
+
+ As of version 1.18, &RCL; can optionally store the raw terms,
+ without accent stripping or case conversion. Expansions necessary
+ for searches insensitive to case and/or diacritics are then
+ performed when searching. This is described in more detail in the
+ section about index case
+ and diacritics sensitivity.
&RCL; has many parameters which define exactly what to
index, and how to classify and decode the source
@@ -507,13 +527,45 @@ recoll
Index case and diacritics sensitivity
- Index case sensitivity
- is controlled by the indexStripChars configuration
+ As of &RCL; version 1.18 you have a choice of building an
+ index with terms stripped of character case and diacritics, or
+ one with raw terms. For a source term of
+ Résumé, the former will store
+ resume, the latter
+ Résumé.
+
+ Each type of index allows performing searches insensitive to
+ case and diacritics: with a raw index, the user entry will be
+ expanded to match all case and diacritics variations present in
+ the index. With a stripped index, the search term will be stripped
+ before searching.
+
+ A raw index allows for another possibility which a stripped
+ index cannot offer: using case and diacritics to discriminate
+ between terms, returning different results when searching for
+ US and us or
+ resume and résumé.
+ Read the section about search
+ case and diacritics sensitivity for more details.
+
+ The type of index to be created is controlled by the
+ indexStripChars configuration
variable which can only be changed by editing the
configuration file. Any change implies an index reset (not
- automated by recoll), and all indexes in a search must be set
- in the same way (again, not checked by recoll).
+ automated by &RCL;), and all indexes in a search must be set
+ in the same way (again, not checked by &RCL;).
+ If indexStripChars is not set, &RCL;
+ 1.18 creates a stripped index by default, for
+ compatibility with previous versions.
+
+ As a cost for the added capability, a raw index will be slightly
+ bigger than a stripped one (around 10%). Also, searches will be
+ more complex, so probably slightly slower. The feature is
+ still young, so a certain amount of weirdness cannot be
+ excluded.
+
+
@@ -1011,7 +1063,7 @@ fvwm
start an external viewer. The viewer for each document type can be
configured through the user preferences dialog, or by editing the
mimeview configuration file. You can also check
- the Use desktop preferences option in the user
+ the Use desktop preferences option in the GUI
preferences dialog to use the desktop defaults for all
documents. This is probably the best option if you are using a well
configured Gnome or
@@ -1819,6 +1871,14 @@ fvwm
application.
+ Exceptions: when using the
+ desktop preferences for opening documents, these are mime types
+ that will still be opened according to &RCL; preferences. This
+ is useful for passing parameters like page numbers or search
+ strings to applications that support them
+ (e.g. evince).
+
+
Choose editor applications
this will let you choose the command started by the
Open links inside the result list, for
@@ -2369,144 +2429,160 @@ text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r
section.&RCL; currently manages the following default fields:
+
+
title,
- subject or caption are
- synonyms which specify data to be searched for in the
- document title or subject.
-
+ subject or caption are
+ synonyms which specify data to be searched for in the
+ document title or subject.
+
+
author or
- from for searching the documents originators.
-
+ from for searching the documents
+ originators.
+
+
recipient or
- to for searching the documents recipients.
-
+ to for searching the documents
+ recipients.
+
+
keyword for searching the
- document-specified keywords (few documents actually have any).
-
+ document-specified keywords (few documents actually have
+ any).
+
+
filename for the document's
- file name.
+ file name.
+
ext specifies the file
- name extension (Ex: ext:html)
-
-
+ name extension (Ex: ext:html)
+
+
+
The field syntax also supports a few field-like, but
- special, criteria:
+ special, criteria:
+
+
dir for filtering the
- results on file location (Ex:
- dir:/home/me/somedir). -dir
- also works to find results out of the specified directory, only
- after release 1.15.8. A tilde inside the value will be expanded to
- the home directory. dir is not a regular field
- and only one value makes sense in a query (you can't use
- dir:dir1 OR dir:dir2). Relative paths make
- sense, for example,
- dir:share/doc would match either
- /usr/share/doc or
- /usr/local/share/doc
-
+ results on file location (Ex:
+ dir:/home/me/somedir). -dir
+ also works to find results out of the specified directory, only
+ after release 1.15.8. A tilde inside the value will be expanded to
+ the home directory. dir is not a regular field
+ and only one value makes sense in a query (you can't use
+ dir:dir1 OR dir:dir2). Relative paths make
+ sense, for example,
+ dir:share/doc would match either
+ /usr/share/doc or
+ /usr/local/share/doc
+
size for filtering the
- results on file size. Example:
- size<10000. You can use
- <, > or
- = as operators. You can specify a range like the
- following: size>100 size<1000. The usual
- k/K, m/M, g/G, t/T can be used as (decimal)
- multipliers. Ex: size>1k to search for files
- bigger than 1000 bytes.
-
+ results on file size. Example:
+ size<10000. You can use
+ <, > or
+ = as operators. You can specify a range like the
+ following: size>100 size<1000. The usual
+ k/K, m/M, g/G, t/T can be used as (decimal)
+ multipliers. Ex: size>1k to search for files
+ bigger than 1000 bytes.
+
date for searching or filtering
- on dates. The syntax for the argument is based on the ISO8601
- standard for dates and time intervals. Only dates are supported, no
- times. The general syntax is 2 elements separated by a
- / character. Each element can be a date or a
- period of time. Periods are specified as
-PnYnMnD.
- The n numbers are the respective numbers
- of years, months or days, any of which may be missing. Dates are
- specified as
-YYYY-MM-DD.
- The days and months parts may be missing. If the
- / is present but an element is missing, the
- missing element is interpreted as the lowest or highest date in the
- index. Examples:
+ on dates. The syntax for the argument is based on the ISO8601
+ standard for dates and time intervals. Only dates are supported, no
+ times. The general syntax is 2 elements separated by a
+ / character. Each element can be a date or a
+ period of time. Periods are specified as
+ PnYnMnD.
+ The n numbers are the respective numbers
+ of years, months or days, any of which may be missing. Dates are
+ specified as
+ YYYY-MM-DD.
+ The days and months parts may be missing. If the
+ / is present but an element is missing, the
+ missing element is interpreted as the lowest or highest date in the
+ index. Examples:
+
2001-03-01/2002-05-01 the
- basic syntax for an interval of dates.
-
+ basic syntax for an interval of dates.
+ 2001-03-01/P1Y2M the
- same specified with a period.
-
+ same specified with a period.
+
2001/ from the beginning of
- 2001 to the latest date in the index.
-
+ 2001 to the latest date in the index.
+
2001 the whole year of
- 2001
+ 2001
P2D/ means 2 days ago up to
- now if there are no documents with dates in the future.
-
+ now if there are no documents with dates in the future.
+
/2003 all documents from
- 2003 or older.
-
-
+ 2003 or older.
+
+
Periods can also be specified with small letters (ie:
- p2y).
-
+ p2y).
+
mime or
- format for specifying the
- mime type. This one is quite special because you can specify
- several values which will be OR'ed (the normal default for the
- language is AND). Ex: mime:text/plain
- mime:text/html. Specifying an explicit boolean
- operator before a
- mime specification is not supported and
- will produce strange results. You can filter out certain types
- by using negation (-mime:some/type), and you can
- use wildcards in the value (mime:text/*).
- Note that mime is
- the ONLY field with an OR default. You do need to use
- OR with ext terms for
- example.
-
+ format for specifying the
+ mime type. This one is quite special because you can specify
+ several values which will be OR'ed (the normal default for the
+ language is AND). Ex: mime:text/plain
+ mime:text/html. Specifying an explicit boolean
+ operator before a
+ mime specification is not supported and
+ will produce strange results. You can filter out certain types
+ by using negation (-mime:some/type), and you can
+ use wildcards in the value (mime:text/*).
+ Note that mime is
+ the ONLY field with an OR default. You do need to use
+ OR with ext terms for
+ example.
+
type or
- rclcat for specifying the category (as in
- text/media/presentation/etc.). The classification of mime
- types in categories is defined in the &RCL; configuration
- (mimeconf), and can be modified or
- extended. The default category names are those which permit
- filtering results in the main GUI screen. Categories are OR'ed
- like mime types above. This can't be negated with
- - either.
-
+ rclcat for specifying the category (as in
+ text/media/presentation/etc.). The classification of mime
+ types in categories is defined in the &RCL; configuration
+ (mimeconf), and can be modified or
+ extended. The default category names are those which permit
+ filtering results in the main GUI screen. Categories are OR'ed
+ like mime types above. This can't be negated with
+ - either.
+
-
+
Words inside phrases and capitalized words are not
- stem-expanded. Wildcards may be used anywhere inside a term.
- Specifying a wild-card on the left of a term can produce a very
- slow search (or even an incorrect one if the expansion is
- truncated because of excessive size). Also see More about wildcards.
+ stem-expanded. Wildcards may be used anywhere inside a term.
+ Specifying a wild-card on the left of a term can produce a very
+ slow search (or even an incorrect one if the expansion is
+ truncated because of excessive size). Also see
+
+ More about wildcards.
The document filters used while indexing have the
- possibility to create other fields with arbitrary names, and
- aliases may be defined in the configuration, so that the exact
- field search possibilities may be different for you if someone
- took care of the customisation.
+ possibility to create other fields with arbitrary names, and
+ aliases may be defined in the configuration, so that the exact
+ field search possibilities may be different for you if someone
+ took care of the customisation.
ModifiersSome characters are recognized as search modifiers when found
- immediately after the closing double quote of a phrase, as in
- "some term"modifierchars. The actual "phrase"
- can be a single term of course. Supported modifiers:
+ immediately after the closing double quote of a phrase, as in
+ "some term"modifierchars. The actual "phrase"
+ can be a single term of course. Supported modifiers:
+
l can be used to turn off
stemming (mostly makes sense with p because
@@ -2525,6 +2601,12 @@ text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r
(unordered). Example:"order any in"p
+ C will turn on case
+ sensitivity (if the index supports it).
+
+ D will turn on diacritics
+ sensitivity (if the index supports it).
+
A weight can be specified for a query element
by specifying a decimal value at the start of the
modifiers. Example: "Important"2.5.
@@ -2537,6 +2619,78 @@ text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r
+
+
+ Search case and diacritics sensitivity
+
+ For &RCL; versions 1.18 and later, and when working
+ with a raw index (not the default), searches can be
+ made sensitive
+ to character case and diacritics. How this happens is controlled by
+ configuration variables and by what search data is entered.
+
+ The general default is that searches are insensitive to case
+ and diacritics. An entry of resume will match any
+ of Resume, RESUME,
+ résumé, Résumé etc.
+
+ Two configuration variables can automate switching on
+ sensitivity:
+
+
+
+
+ autodiacsens: If this is set, search
+ sensitivity to diacritics will be turned on as soon as an
+ accented character exists in a search term. When the variable
+ is set to true, resume will start a
+ diacritics-insensitive search, but résumé
+ will be matched exactly. The default value is
+ false.
+
+
+
+ autocasesens: If this is set, search
+ sensitivity to character case will be turned on as soon as an
+ upper-case character exists in a search term except
+ for the first one. When the variable is set to
+ true, us or Us will
+ start a case-insensitive search, but
+ US will be matched exactly. The default
+ value is true (contrary to
+ autodiacsens).
+
+
+
+
+ As in the past, capitalizing the first letter of a word will
+ turn off its stem expansion and have no effect on
+ case-sensitivity.
+
+ You can also explicitly activate case and diacritics
+ sensitivity by using modifiers with the query
+ language. C will make the term case-sensitive, and
+ D will make it
+ diacritics-sensitive. Examples:
+
+ "us"C
+
+
+ will search for the term us exactly
+ (Us will not be a match).
+
+
+ "resume"D
+
+ will search for the term resume exactly
+ (résumé will not be a match).
+
+
+ When either case or diacritics sensitivity is activated, stem
+ expansion is turned off. Combining the two does not make much sense.
+
+
+
Anchored searches and wildcards
@@ -2929,11 +3083,11 @@ application/x-chm = execm rclchm
Page numbersThe indexer will interpret ^L characters
- in the filter output as indicating page breaks, and will record
- them. At query time, this allows starting a viewer on the right
- page for a hit or a snippet. Currently, only the PDF filter
- generates page breaks (thanks to
- pdftotext).
+ in the filter output as indicating page breaks, and will record
+ them. At query time, this allows starting a viewer on the right
+ page for a hit or a snippet. Currently, only the PDF, PostScript
+ and DVI filters generate page breaks.
+
@@ -4529,30 +4683,38 @@ x-my-tag = mailmytag
The mimeview filemimeview specifies which programs
- are started when you click on an Open
- link in a result list. Ie: HTML is normally displayed using
+ are started when you click on an Open link
+ in a result list. For example, HTML is normally displayed using
firefox, but you may prefer
Konqueror, your
openoffice.org
program might be named oofice instead of
- openoffice etc.
-
+ openoffice etc.
Changes to this file can be done by direct editing, or
- through the recoll user preferences dialog.
+ through the recoll GUI preferences dialog.
If Use desktop preferences to choose document
- editor is checked in the &RCL; GUI user preferences, all
+ editor is checked in the &RCL; GUI preferences, all
mimeview entries will be ignored except the
one labelled application/x-all (which is set to
use xdg-open by default).
+ In this case, the xallexcepts top level
+ variable defines a list of mime type exceptions which
+ will be processed according to the local entries instead of being
+ passed to the desktop. This is so that specific &RCL; options
+ such as a page number or a search string can be passed to
+ applications that support them, such as the
+ evince viewer.
+
As for the other configuration files, the normal usage
- is to have a mimeview inside your own
- configuration directory, with just the non-default entries,
- which will override those from the central configuration
- file.
- Please note that these entries must be placed under a
+ is to have a mimeview inside your own
+ configuration directory, with just the non-default entries,
+ which will override those from the central configuration
+ file.
+
+ All viewer definition entries must be placed under a
[view] section.The keys in the file are normally mime types. You can add an
@@ -4602,9 +4764,9 @@ x-my-tag = mailmytag
%pPage index. Only significant for a subset of document
- types, currently only PDF files. Can be used to start the
- editor at the right page for a match or
- snippet.
+ types, currently only PDF, PostScript and DVI files. Can be
+ used to start the editor at the right page for a match or
+ snippet.%s
diff --git a/src/qtgui/uiprefs.ui b/src/qtgui/uiprefs.ui
index 45c5faaa..a0820485 100644
--- a/src/qtgui/uiprefs.ui
+++ b/src/qtgui/uiprefs.ui
@@ -184,6 +184,9 @@
Exceptions
+
+ Mime types that should not be passed to xdg-open even when "Use desktop preferences" is set.<br> Useful to pass page number and search string options to, e.g. evince.
+
diff --git a/src/utils/conftree.cpp b/src/utils/conftree.cpp
index cc557f9b..6bf2b057 100644
--- a/src/utils/conftree.cpp
+++ b/src/utils/conftree.cpp
@@ -39,10 +39,6 @@
using namespace std;
#endif // NO_NAMESPACES
-#ifndef MIN
-#define MIN(A,B) ((A)<(B) ? (A) : (B))
-#endif
-
#undef DEBUG
#ifdef DEBUG
#define LOGDEB(X) fprintf X
@@ -276,7 +272,7 @@ int ConfSimple::set(const std::string &nm, const std::string &value,
{
if (status != STATUS_RW)
return 0;
- LOGDEB2(("ConfSimple::set [%s]:[%s] -> [%s]\n", sk.c_str(),
+ LOGDEB((stderr, "ConfSimple::set [%s]:[%s] -> [%s]\n", sk.c_str(),
nm.c_str(), value.c_str()));
if (!i_set(nm, value, sk))
return 0;