From 770e3844fa77ff6c5bef1d53234a59e78dd4ff02 Mon Sep 17 00:00:00 2001 From: Jean-Francois Dockes Date: Thu, 4 Oct 2012 17:03:46 +0200 Subject: [PATCH] doc and messages --- src/doc/user/usermanual.sgml | 454 ++++++++++++++++++++++++----------- src/qtgui/uiprefs.ui | 3 + src/utils/conftree.cpp | 6 +- 3 files changed, 312 insertions(+), 151 deletions(-) diff --git a/src/doc/user/usermanual.sgml b/src/doc/user/usermanual.sgml index 756f988a..bcf15a5b 100644 --- a/src/doc/user/usermanual.sgml +++ b/src/doc/user/usermanual.sgml @@ -139,27 +139,47 @@ index. It has input filters for many document types. Stemming is the process by which &RCL; reduces words to - their radicals so that searching does not depend, for example, - on a word being singular or plural (floor, floors), or on a verb - tense (flooring, floored). Because the mechanisms used for - stemming depend on the specific grammatical rules for each - language, there is a separate stemmer module for most common - languages where stemming makes sense. Storing documents written - in different languages in the same index is possible, and - commonly done. In this situation, you can specify several - stemming languages for the index. &RCL; stores the unstemmed - versions of terms in the main index and uses auxiliary databases - for term expansion (one for each stemming language), which means - that you can switch stemming languages between searches, or add - a language without needing a full reindex. &RCL; currently - makes no attempt at automatic language recognition, which means - that the stemmer will sometimes be applied to terms from other - languages with potentially strange results. In practise, even if - this introduces possibilities of confusion, this approach has - been proven quite useful, and, awaiting the addition of an - automatic language recognition module to &RCL;, it is much less - cumbersome than separating your documents according to what - language they are written in. 
+ their radicals so that searching does not depend, for example, on a + word being singular or plural (floor, floors), or on a verb tense + (flooring, floored). Because the mechanisms used for stemming + depend on the specific grammatical rules for each language, there + is a separate stemmer module for most common languages where + stemming makes sense. + + &RCL; stores the unstemmed versions of terms in the main index + and uses auxiliary databases for term expansion (one for each + stemming language), which means that you can switch stemming + languages between searches, or add a language without needing a + full reindex. + + Storing documents written in different languages in the same + index is possible, and commonly done. In this situation, you can + specify several stemming languages for the index. + + &RCL; currently makes no attempt at automatic language + recognition, which means that the stemmer will sometimes be applied + to terms from other languages with potentially strange results. In + practice, even if this introduces possibilities of confusion, this + approach has proven quite useful, and, awaiting the addition + of an automatic language recognition module to &RCL;, it is much + less cumbersome than separating your documents according to what + language they are written in. + + Before version 1.18, &RCL; always stripped most accents and + diacritics from terms, and converted them to lower case before + storing them in the index. As a consequence, it was impossible to + search for a particular capitalization of a term + (US / us), or to + discriminate between two terms based on diacritics (sake + / saké, mate / + maté). + + As of version 1.18, &RCL; can optionally store the raw terms, + without accent stripping or case conversion. Expansions necessary + for searches insensitive to case and/or diacritics are then + performed when searching. This is described in more detail in the + section about index case + and diacritics sensitivity. 
&RCL; has many parameters which define exactly what to index, and how to classify and decode the source @@ -507,13 +527,45 @@ recoll Index case and diacritics sensitivity - Index case sensitivity - is controlled by the indexStripChars configuration + As of &RCL; version 1.18 you have a choice of building an + index with terms stripped of character case and diacritics, or + one with raw terms. For a source term of + Résumé, the former will store + resume, the latter + Résumé. + + Each type of index allows performing searches insensitive to + case and diacritics: with a raw index, the user entry will be + expanded to match all case and diacritics variations present in + the index. With a stripped index, the search term will be stripped + before searching. + + A raw index allows for another possibility which a stripped + index cannot offer: using case and diacritics to discriminate + between terms, returning different results when searching for + US and us or + resume and résumé. + Read the section about search + case and diacritics sensitivity for more details. + + The type of index to be created is controlled by the + indexStripChars configuration variable which can only be changed by editing the configuration file. Any change implies an index reset (not - automated by recoll), and all indexes in a search must be set - in the same way (again, not checked by recoll). + automated by &RCL;), and all indexes in a search must be set + in the same way (again, not checked by &RCL;). + If indexStripChars is not set, &RCL; + 1.18 creates a stripped index by default, for + compatibility with previous versions. + + As a cost for added capability, a raw index will be slightly + bigger than a stripped one (around 10%). Also, searches will be + more complex, so probably slightly slower. The feature is also + still young, so a certain amount of weirdness cannot be + excluded. + + @@ -1011,7 +1063,7 @@ fvwm start an external viewer. 
The viewer for each document type can be configured through the user preferences dialog, or by editing the mimeview configuration file. You can also check - the Use desktop preferences option in the user + the Use desktop preferences option in the GUI preferences dialog to use the desktop defaults for all documents. This is probably the best option if you are using a well configured Gnome or @@ -1819,6 +1871,14 @@ fvwm application. + Exceptions: when using the + desktop preferences for opening documents, these are mime types + that will still be opened according to &RCL; preferences. This + is useful for passing parameters like page numbers or search + strings to applications that support them + (e.g. evince). + + Choose editor applications this will let you choose the command started by the Open links inside the result list, for @@ -2369,144 +2429,160 @@ text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r section. &RCL; currently manages the following default fields: + + title, - subject or caption are - synonyms which specify data to be searched for in the - document title or subject. - + subject or caption are + synonyms which specify data to be searched for in the + document title or subject. + + author or - from for searching the documents originators. - + from for searching the documents + originators. + + recipient or - to for searching the documents recipients. - + to for searching the documents + recipients. + + keyword for searching the - document-specified keywords (few documents actually have any). - + document-specified keywords (few documents actually have + any). + + filename for the document's - file name. + file name. + ext specifies the file - name extension (Ex: ext:html) - - + name extension (Ex: ext:html) + + + The field syntax also supports a few field-like, but - special, criteria: + special, criteria: + + dir for filtering the - results on file location (Ex: - dir:/home/me/somedir). 
-dir - also works to find results out of the specified directory, only - after release 1.15.8. A tilde inside the value will be expanded to - the home directory. dir is not a regular field - and only one value makes sense in a query (you can't use - dir:dir1 OR dir:dir2). Relative paths make - sense, for example, - dir:share/doc would match either - /usr/share/doc or - /usr/local/share/doc - + results on file location (Ex: + dir:/home/me/somedir). -dir + also works to find results out of the specified directory, only + after release 1.15.8. A tilde inside the value will be expanded to + the home directory. dir is not a regular field + and only one value makes sense in a query (you can't use + dir:dir1 OR dir:dir2). Relative paths make + sense, for example, + dir:share/doc would match either + /usr/share/doc or + /usr/local/share/doc + size for filtering the - results on file size. Example: - size<10000. You can use - <, > or - = as operators. You can specify a range like the - following: size>100 size<1000. The usual - k/K, m/M, g/G, t/T can be used as (decimal) - multipliers. Ex: size>1k to search for files - bigger than 1000 bytes. - + results on file size. Example: + size<10000. You can use + <, > or + = as operators. You can specify a range like the + following: size>100 size<1000. The usual + k/K, m/M, g/G, t/T can be used as (decimal) + multipliers. Ex: size>1k to search for files + bigger than 1000 bytes. + date for searching or filtering - on dates. The syntax for the argument is based on the ISO8601 - standard for dates and time intervals. Only dates are supported, no - times. The general syntax is 2 elements separated by a - / character. Each element can be a date or a - period of time. Periods are specified as -PnYnMnD. - The n numbers are the respective numbers - of years, months or days, any of which may be missing. Dates are - specified as -YYYY-MM-DD. - The days and months parts may be missing. 
If the - / is present but an element is missing, the - missing element is interpreted as the lowest or highest date in the - index. Examples: + on dates. The syntax for the argument is based on the ISO8601 + standard for dates and time intervals. Only dates are supported, no + times. The general syntax is 2 elements separated by a + / character. Each element can be a date or a + period of time. Periods are specified as + PnYnMnD. + The n numbers are the respective numbers + of years, months or days, any of which may be missing. Dates are + specified as + YYYY-MM-DD. + The days and months parts may be missing. If the + / is present but an element is missing, the + missing element is interpreted as the lowest or highest date in the + index. Examples: + 2001-03-01/2002-05-01 the - basic syntax for an interval of dates. - + basic syntax for an interval of dates. + 2001-03-01/P1Y2M the - same specified with a period. - + same specified with a period. + 2001/ from the beginning of - 2001 to the latest date in the index. - + 2001 to the latest date in the index. + 2001 the whole year of - 2001 + 2001 P2D/ means 2 days ago up to - now if there are no documents with dates in the future. - + now if there are no documents with dates in the future. + /2003 all documents from - 2003 or older. - - + 2003 or older. + + Periods can also be specified with small letters (ie: - p2y). - + p2y). + mime or - format for specifying the - mime type. This one is quite special because you can specify - several values which will be OR'ed (the normal default for the - language is AND). Ex: mime:text/plain - mime:text/html. Specifying an explicit boolean - operator before a - mime specification is not supported and - will produce strange results. You can filter out certain types - by using negation (-mime:some/type), and you can - use wildcards in the value (mime:text/*). - Note that mime is - the ONLY field with an OR default. You do need to use - OR with ext terms for - example. 
- + format for specifying the + mime type. This one is quite special because you can specify + several values which will be OR'ed (the normal default for the + language is AND). Ex: mime:text/plain + mime:text/html. Specifying an explicit boolean + operator before a + mime specification is not supported and + will produce strange results. You can filter out certain types + by using negation (-mime:some/type), and you can + use wildcards in the value (mime:text/*). + Note that mime is + the ONLY field with an OR default. You do need to use + OR with ext terms for + example. + type or - rclcat for specifying the category (as in - text/media/presentation/etc.). The classification of mime - types in categories is defined in the &RCL; configuration - (mimeconf), and can be modified or - extended. The default category names are those which permit - filtering results in the main GUI screen. Categories are OR'ed - like mime types above. This can't be negated with - - either. - + rclcat for specifying the category (as in + text/media/presentation/etc.). The classification of mime + types in categories is defined in the &RCL; configuration + (mimeconf), and can be modified or + extended. The default category names are those which permit + filtering results in the main GUI screen. Categories are OR'ed + like mime types above. This can't be negated with + - either. + - + Words inside phrases and capitalized words are not - stem-expanded. Wildcards may be used anywhere inside a term. - Specifying a wild-card on the left of a term can produce a very - slow search (or even an incorrect one if the expansion is - truncated because of excessive size). Also see More about wildcards. + stem-expanded. Wildcards may be used anywhere inside a term. + Specifying a wild-card on the left of a term can produce a very + slow search (or even an incorrect one if the expansion is + truncated because of excessive size). Also see + + More about wildcards. 
The document filters used while indexing have the - possibility to create other fields with arbitrary names, and - aliases may be defined in the configuration, so that the exact - field search possibilities may be different for you if someone - took care of the customisation. + possibility to create other fields with arbitrary names, and + aliases may be defined in the configuration, so that the exact + field search possibilities may be different for you if someone + took care of the customisation. Modifiers Some characters are recognized as search modifiers when found - immediately after the closing double quote of a phrase, as in - "some term"modifierchars. The actual "phrase" - can be a single term of course. Supported modifiers: + immediately after the closing double quote of a phrase, as in + "some term"modifierchars. The actual "phrase" + can be a single term of course. Supported modifiers: + l can be used to turn off stemming (mostly makes sense with p because @@ -2525,6 +2601,12 @@ text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r (unordered). Example:"order any in"p + C will turn on case + sensitivity (if the index supports it). + + D will turn on diacritics + sensitivity (if the index supports it). + A weight can be specified for a query element by specifying a decimal value at the start of the modifiers. Example: "Important"2.5. @@ -2537,6 +2619,78 @@ text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r + + + Search case and diacritics sensitivity + + For &RCL; versions 1.18 and later, and when working + with a raw index (not the default), searches can be + made sensitive + to character case and diacritics. How this happens is controlled by + configuration variables and what search data is entered. + + The general default is that searches are insensitive to case + and diacritics. An entry of resume will match any + of Resume, RESUME, + résumé, Résumé etc. 
+ + Two configuration variables can automate switching on + sensitivity: + + + + + autodiacsens If this is set, search + sensitivity to diacritics will be turned on as soon as an + accented character exists in a search term. When the variable + is set to true, resume will start a + diacritics-insensitive search, but résumé + will be matched exactly. The default value is + false. + + + + autocasesens If this is set, search + sensitivity to character case will be turned on as soon as an + upper-case character exists in a search term except + for the first one. When the variable is set to + true, us or Us will + start a case-insensitive search, but + US will be matched exactly. The default + value is true (contrary to + autodiacsens). + + + + + As in the past, capitalizing the first letter of a word will + turn off its stem expansion and have no effect on + case-sensitivity. + + You can also explicitly activate case and diacritics + sensitivity by using modifiers with the query + language. C will make the term case-sensitive, and + D will make it + diacritics-sensitive. Examples: + + "us"C + + + will search for the term us exactly + (Us will not be a match). + + + "resume"D + + will search for the term resume exactly + (résumé will not be a match). + + + When either case or diacritics sensitivity is activated, stem + expansion is turned off, as combining the two would not make much sense. + + + Anchored searches and wildcards @@ -2929,11 +3083,11 @@ application/x-chm = execm rclchm Page numbers The indexer will interpret ^L characters - in the filter output as indicating page breaks, and will record - them. At query time, this allows starting a viewer on the right - page for a hit or a snippet. 
Currently, only the PDF, Postscript + and DVI filters generate page breaks. + @@ -4529,30 +4683,38 @@ x-my-tag = mailmytag The mimeview file mimeview specifies which programs - are started when you click on an Open - link in a result list. Ie: HTML is normally displayed using + are started when you click on an Open link + in a result list. Ie: HTML is normally displayed using firefox, but you may prefer Konqueror, your openoffice.org program might be named oofice instead of - openoffice etc. - + openoffice etc. Changes to this file can be done by direct editing, or - through the recoll user preferences dialog. + through the recoll GUI preferences dialog. If Use desktop preferences to choose document - editor is checked in the &RCL; GUI user preferences, all + editor is checked in the &RCL; GUI preferences, all mimeview entries will be ignored except the one labelled application/x-all (which is set to use xdg-open by default). + In this case, the xallexcepts top level + variable defines a list of mime type exceptions which + will be processed according to the local entries instead of being + passed to the desktop. This is so that specific &RCL; options + such as a page number or a search string can be passed to + applications that support them, such as the + evince viewer. + As for the other configuration files, the normal usage - is to have a mimeview inside your own - configuration directory, with just the non-default entries, - which will override those from the central configuration - file. - Please note that these entries must be placed under a + is to have a mimeview inside your own + configuration directory, with just the non-default entries, + which will override those from the central configuration + file. + + All viewer definition entries must be placed under a [view] section. The keys in the file are normally mime types. You can add an @@ -4602,9 +4764,9 @@ x-my-tag = mailmytag %p Page index. 
Only significant for a subset of document - types, currently only PDF files. Can be used to start the - editor at the right page for a match or - snippet. + types, currently only PDF, Postscript and DVI files. Can be + used to start the editor at the right page for a match or + snippet. %s diff --git a/src/qtgui/uiprefs.ui b/src/qtgui/uiprefs.ui index 45c5faaa..a0820485 100644 --- a/src/qtgui/uiprefs.ui +++ b/src/qtgui/uiprefs.ui @@ -184,6 +184,9 @@ Exceptions + + Mime types that should not be passed to xdg-open even when "Use desktop preferences" is set.<br> Useful to pass page number and search string options to, e.g. evince. + diff --git a/src/utils/conftree.cpp b/src/utils/conftree.cpp index cc557f9b..6bf2b057 100644 --- a/src/utils/conftree.cpp +++ b/src/utils/conftree.cpp @@ -39,10 +39,6 @@ using namespace std; #endif // NO_NAMESPACES -#ifndef MIN -#define MIN(A,B) ((A)<(B) ? (A) : (B)) -#endif - #undef DEBUG #ifdef DEBUG #define LOGDEB(X) fprintf X @@ -276,7 +272,7 @@ int ConfSimple::set(const std::string &nm, const std::string &value, { if (status != STATUS_RW) return 0; - LOGDEB2(("ConfSimple::set [%s]:[%s] -> [%s]\n", sk.c_str(), + LOGDEB((stderr, "ConfSimple::set [%s]:[%s] -> [%s]\n", sk.c_str(), nm.c_str(), value.c_str())); if (!i_set(nm, value, sk)) return 0;
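Note on the documented search behaviour: the case and diacritics rules that this patch adds to the manual (stripped vs. raw terms, autocasesens, autodiacsens, and the C and D modifiers) can be sketched as below. This is only an illustration of the rules as the documentation states them, not the actual Recoll implementation; every function name here (strip_diacritics, stripped_term, search_sensitivity, matches) is hypothetical, and it assumes that diacritics stripping can be approximated by Unicode NFD decomposition followed by removal of combining marks.

```python
import unicodedata


def strip_diacritics(term: str) -> str:
    """Remove combining marks after NFD decomposition (assumed approximation)."""
    decomposed = unicodedata.normalize("NFD", term)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def has_diacritics(term: str) -> bool:
    """True if the term contains at least one accented character."""
    decomposed = unicodedata.normalize("NFD", term)
    return any(unicodedata.combining(ch) for ch in decomposed)


def stripped_term(term: str) -> str:
    """What a stripped index would store for a source term."""
    return strip_diacritics(term).lower()


def search_sensitivity(term, autocasesens=True, autodiacsens=False, modifiers=""):
    """Return (case_sensitive, diacritics_sensitive) for one search term.

    Defaults mirror the documented ones: autocasesens true, autodiacsens false.
    """
    case_sens = "C" in modifiers
    diac_sens = "D" in modifiers
    # An upper-case character anywhere except the first position turns on
    # case sensitivity when autocasesens is set.
    if autocasesens and any(ch.isupper() for ch in term[1:]):
        case_sens = True
    # Any accented character turns on diacritics sensitivity when
    # autodiacsens is set.
    if autodiacsens and has_diacritics(term):
        diac_sens = True
    return case_sens, diac_sens


def matches(query, indexed, case_sens, diac_sens):
    """Compare a query term with a raw index term under the chosen sensitivity."""
    q, t = query, indexed
    if not diac_sens:
        q, t = strip_diacritics(q), strip_diacritics(t)
    if not case_sens:
        q, t = q.lower(), t.lower()
    return q == t
```

With these assumptions, search_sensitivity("US") requests a case-sensitive search while search_sensitivity("Us") does not, which corresponds to the autocasesens description in the manual text.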