From 5a9b90d26cddae1200981b0b63323f5ef9e42386 Mon Sep 17 00:00:00 2001 From: dockes Date: Thu, 25 Jan 2007 15:47:45 +0000 Subject: [PATCH] *** empty log message *** --- src/doc/user/usermanual.sgml | 253 ++++++++++++++++++++++++++++------- 1 file changed, 202 insertions(+), 51 deletions(-) diff --git a/src/doc/user/usermanual.sgml b/src/doc/user/usermanual.sgml index 1574c975..00194db6 100644 --- a/src/doc/user/usermanual.sgml +++ b/src/doc/user/usermanual.sgml @@ -24,7 +24,7 @@ Dockes - $Id: usermanual.sgml,v 1.35 2007-01-15 13:03:35 dockes Exp $ + $Id: usermanual.sgml,v 1.36 2007-01-25 15:47:45 dockes Exp $ This document introduces full text search notions @@ -178,7 +178,7 @@ is normally incremental: documents will only be processed if they have been modified. On the first execution, of course, all documents will need processing. A full index build can be forced - later on by specifying an option to the indexing command + later by specifying an option to the indexing command (recollindex -z). &RCL; indexing can be performed with two different @@ -486,7 +486,7 @@ fvwm - Search + Searching The recoll program provides the user interface for searching. It is based on the @@ -510,19 +510,27 @@ fvwm - The initial default search mode is Any - term. This will look for documents with any of the - search terms (the ones with more terms will get better scores). - All terms will ensure - that only documents with all the terms will be - returned. File name will specifically - look for file names, and allows using wildcards - (*, ? , - []). + The initial default search mode is All + terms. This will look for documents containing all + of the search terms (the ones with more terms will get better + scores). Any term will search for + documents where at least one of the terms appears. File + name will specifically look for file names. + + The fourth entry (Query Language) is + described in its own + section. + + All search modes allow wildcards inside terms + (*, ?, + []). 
You may want to have a look at the + section about wildcards + for more information about this. You can search for exact phrases (adjacent words in a given order) by enclosing the input inside double quotes. Ex: "virtual reality". + Character case has no influence on search, except that you can disable stem expansion for any term by capitalizing it. Ie: a search for floor will also normally look for @@ -537,7 +545,7 @@ fvwm text field). Please note, however, that only the search texts are remembered, not the mode (all/any/file name). - Typing Esc Space) while + Typing Esc Space while entering a word in the simple search entry will open a window with possible completions for the word. The completions are extracted from the database. @@ -568,7 +576,10 @@ fvwm tabs in the existing preview window. You can use Shift+Click to force the creation of another preview window, which may be useful to view the documents side - by side. + by side. (You can also browse successive results in a single + preview window by typing + Shift+ArrowUp/Down in the + window). Clicking the Edit link will attempt to start an external viewer. The viewers can be configured through the @@ -618,9 +629,11 @@ fvwm The Preview and Edit entries do the same thing as the - corresponding links. The two following entries will copy either - an URL or the file path to the clipboard, for pasting into - another application. + corresponding links. + + The Copy File Name and + Copy Url entries copy the relevant data to the + clipboard, for later pasting. The Find similar entry will select a number of relevant term from the current document and enter @@ -628,10 +641,6 @@ fvwm search, with a good chance of finding documents related to the current result. - The Copy File Name and - Copy Url copy the relevant data to the - clipboard, for later pasting. - The Parent document entry will appear for documents which are not actually files but are part of, or attached to, a higher level document. 
This entry @@ -653,7 +662,9 @@ fvwm Preview link inside the result list. Subsequent preview requests for a given search open new - tabs in the existing window. + tabs in the existing window (except if you hold the + Shift key while clicking, which will open a new + window for side by side viewing). Starting another search and requesting a preview will create a new preview window. The old one stays open until you @@ -690,12 +701,93 @@ fvwm + + The query language + + The query language processor is activated on the + simple search entry when the search mode selector is set to + Query Language. + + Here follows a sample request that we are going to + explain: + + mime:message/rfc822 author:"john doe" Beatles OR Lennon Live OR Unplugged -potatoes + + + This would search for all email messages with + John Doe + appearing as a phrase in the From: header, + and containing either beatles or + lennon and either + live or + unplugged but not + potatoes. + + The first element, mime:message/rfc822 + is a special switch that restricts the results to email + messages. There could be several such switches, which would form + a list of allowed types. + + The second element author:"john doe" is + a phrase search limited to a specific field. Phrase searches are + specified as usual by enclosing the words in double quotes. The + field specification appears before the colon. &RCL; currently + manages the following fields: + + title, + subject or caption are + synonyms which specify data to be searched for in the + document title or subject. + + author or + from for searching the document's originators. + + keyword for searching the + document-specified keywords (few documents actually have any). + + + + The query language is currently the only way to use the + &RCL; field search capability. + + All elements in the search entry are normally combined + with an implicit AND. It is possible to specify that elements be + OR'ed instead, as in Beatles + OR Lennon. 
The + OR must be entered literally (in capitals), and + it has priority over the AND associations: + word1 + word2 OR + word3 + means + word1 AND + (word2 OR + word3) + not + (word1 AND + word2) OR + word3. Do not enter explicit + parentheses; they are not supported for now. + + An entry preceded by a - specifies a + term that should not appear. + + Words inside phrases and capitalized words are not + stem-expanded. Wildcards may be used anywhere. + + You can use the show query link at the + top of the result list to check the exact query which was + finally executed by Xapian. + + + Complex/advanced search - The advanced search dialog has fields that will allow a more - refined search. It has a number of entry fields, each of which - is configurable for the following modes: + The advanced search dialog has a number of fields that + will allow a more refined search. Each entry field is + configurable for the following modes: + All terms. @@ -712,16 +804,17 @@ fvwm Filename search with wildcards. - + Additional entry fields can be created by clicking the Add clause button. - All relevant fields will be combined by an implicit AND - or OR conjunction. All types of clauses except "phrase" and - "near" can accept a mix of single words and phrases enclosed - in double quotes. Stemming expansion will be performed for all - terms not beginning with a capital letter, except for "phrase" - clauses. + You can choose whether all relevant fields are combined + by an AND or an OR conjunction. All types of clauses + except "phrase" and "near" can accept a mix of single words and + phrases enclosed in double quotes. Stemming expansion will be + performed for all terms not beginning with a capital letter, + except for terms inside "phrase" clauses. Wildcards will be + processed everywhere. 
Advanced search will also let you search for documents of specific mime types (ie: only text/plain, or @@ -764,18 +857,26 @@ fvwm Wildcard In this mode of operation, you can enter a - search string with shell-like wildcards (*, ?). ie: - xapi* . + search string with shell-like wildcards (*, ?, []). ie: + xapi* would display all index terms + beginning with xapi. (More + about wildcards here). Regular expression This mode will accept a regular expression as input. Example: - word[0-9]+ . The regular - expression is anchored by enclosing in - ^ and $ before - execution. + word[0-9]+. The expression is + implicitly anchored at the beginning. Ie: + press will match + pression but not + expression. You can use + .*press to match the latter, + but be aware that this will cause a full index term list + scan, which can take quite a long time. + @@ -815,6 +916,53 @@ fvwm + + More about wildcards + All words entered in &RCL; search fields will be processed + for wildcard expansion before the request is finally + executed. + + The wildcard characters are: + + + * which matches 0 or more + characters. + + ? which matches + a single character. + + [] which allow + defining sets of characters to be matched (ex: + [abc] + matches a single character which may be 'a' or 'b' or 'c', + [0-9] + matches any single digit). + + + + You should be aware of a few things before using + wildcards. + + + Using a wildcard character at the beginning of + a word can make for a slow search because &RCL; will have to + scan the whole index term list to find the matches. + + Using a * at the end of a + word can produce more matches than you would think, and + strange search results. You can use the term explorer tool to + check what completions exist for a given term. You can also + see exactly what search was performed by clicking on the link + at the top of the result list. 
In general, for natural + language terms, stem expansion will produce better results + than an ending * (stem expansion is turned + off when any wildcard character appears in the term). + + + + + Multiple databases @@ -861,14 +1009,14 @@ fvwm A typical usage scenario for the multiple index feature would be for a system administrator to set up a central index - for shared data, that you may choose to search, or not, in - addition to your personal data. Of course, there are other + for shared data, that you choose to search or not in addition to + your personal data. Of course, there are other possibilities. There are many cases where you know the subset of - files that you want to be searched for a given query, and where - restricting the query will much improve the precision of the - results. This can also be performed with the directory filter in - advanced search, but multiple indexes will have much better - performance and may be worth the trouble. + files that should be searched, and where narrowing the search + can improve the results. You can achieve approximately the same + effect with the directory filter in advanced search, but + multiple indexes will have much better performance and may be + worth the trouble. @@ -1167,10 +1315,10 @@ fvwm /usr/local/recollglobal/xapiandb). Once entered, the indexes will appear in the - All indexes list, and you can - chose which ones you want to use at any moment by transferring - them to/from the Active indexes - list. + External indexes list, and you can + choose which ones you want to use at any moment by checking or + unchecking their entries. + Your main database (the one the current configuration indexes to), is always implicitly active. If this is not desirable, you can set up your configuration so that it indexes, @@ -1292,8 +1440,11 @@ fvwm - Text, HTML, mail folders and Openoffice files are - processed internally. + Text, HTML, mail folders, Openoffice and Scribus files + are processed internally. 
Lyx is used to index Lyx files. Many + filters need sed and awk. + +