From d0a8a3729813e80ee7d3ec49af9c2b29d1e022ef Mon Sep 17 00:00:00 2001 From: dockes Date: Sat, 17 Jan 2009 14:57:12 +0000 Subject: [PATCH] added compressedfilemaxkbs --- src/doc/man/recoll.conf.5 | 92 +++++++++++++++++--- src/doc/user/usermanual.sgml | 162 ++++++++++++++++++++++++++++++++++- 2 files changed, 237 insertions(+), 17 deletions(-) diff --git a/src/doc/man/recoll.conf.5 b/src/doc/man/recoll.conf.5 index deeb876d..ea9717db 100644 --- a/src/doc/man/recoll.conf.5 +++ b/src/doc/man/recoll.conf.5 @@ -3,8 +3,8 @@ .SH NAME recoll.conf \- main personal configuration file for Recoll .SH DESCRIPTION -This file defines the indexation configuration for the full-text search -system Recoll. +This file defines the indexation configuration for the Recoll full-text search +system. .LP The system-wide configuration file is normally located inside /usr/[local]/share/recoll/examples. Any parameter set in the common file @@ -58,6 +58,11 @@ embedded spaces can be quoted with double-quotes. .BI "topdirs = " directories Specifies the list of directories to index (recursively). .TP +.BI "dbdir = " directory +The name of the Xapian database directory. It will be created if needed +when the database is initialized. If this is not an absolute pathname, it +will be taken relative to the configuration directory. +.TP .BI "skippedNames = " patterns A space-separated list of patterns for names of files or directories that should be completely ignored. The list defined in the default file is: @@ -76,6 +81,18 @@ into. Together with topdirs, this allows pruning the indexed tree to one's content. daemSkippedPaths can be used to define a specific value for the real time indexing monitor. .TP +.BI "followLinks = " boolean +Specifies if the indexer should follow +symbolic links while walking the file tree. The default is +to ignore symbolic links to avoid multiple indexing of +linked files. No effort is made to avoid duplication when +this option is set to true. 
This option can be set +individually for each of the +.I topdirs +members by using sections. It can not be changed below the +.I topdirs +level. +.TP .BI "loglevel = " value Verbosity level for recoll and recollindex. A value of 4 lists quite a lot of debug/information messages. 3 lists only errors. @@ -87,11 +104,6 @@ Where should the messages go. 'stderr' can be used as a special value. .B daemlogfilename can be used to specify a different value for the real-time indexing daemon. .TP -.BI "dbdir = " directory -The name of the Xapian database directory. It will be created if needed -when the database is initialized. If this is not an absolute pathname, it -will be taken relative to the configuration directory. -.TP .BI "indexstemminglanguages = " languages A list of languages for which the stem expansion databases will be built. See recollindex(1) for possible values. @@ -132,13 +144,6 @@ Try to guess the character set of files if no internal value is available (ie: for plain text files). This does not work well in general, and should probably not be used. .TP -.BI "indexallfilenames = " boolean -Recoll indexes file names into a special section of the database to allow -specific file names searches using wild cards. This parameter decides if -file name indexing is performed only for files with mime types that would -qualify them for full text indexation, or for all files inside -the selected subtrees, independant of mime type. -.TP .BI "usesystemfilecommand = " boolean Decide if we use the .B "file -i" @@ -147,6 +152,65 @@ system command as a final step for determining the mime type for a file .B mimemap file). This can be useful for files with suffixless names, but it will also cause the indexation of many bogus "text" files. +.TP +.BI "indexedmimetypes = " list +Recoll normally indexes any file which it knows how to read. This list lets +you restrict the indexed mime types to what you specify. 
If the variable is +unspecified or the list empty (the default), all supported types are +processed. +.TP +.BI "compressedfilemaxkbs = " value +Size limit for compressed (.gz or .bz2) files. These need to be +decompressed in a temporary directory for identification, which can be very +wasteful if 'uninteresting' big compressed files are present. Negative +means no limit, 0 means no processing of any compressed file. Defaults +to -1. +.TP +.BI "indexallfilenames = " boolean +Recoll indexes file names into a special section of the database to allow +specific file names searches using wild cards. This parameter decides if +file name indexing is performed only for files with mime types that would +qualify them for full text indexation, or for all files inside +the selected subtrees, independent of mime type. +.TP +.BI "idxabsmlen = " value +Recoll stores an abstract for each indexed file inside the database. The +text can come from an actual 'abstract' section in the document or will +just be the beginning of the document. It is stored in the index so that it +can be displayed inside the result lists without decoding the original +file. The +.I idxabsmlen +parameter defines the size of the stored abstract. The default value is 250 +bytes. The search interface gives you the choice to display this stored +text or a synthetic abstract built by extracting text around the search +terms. If you always prefer the synthetic abstract, you can reduce this +value and save a little space. +.TP +.BI "aspellLanguage = " lang +Language definitions to use when creating the aspell dictionary. The value +must match a set of aspell language definition files. You can type "aspell +config" to see where these are installed (look for data-dir). The default +if the variable is not set is to use your desktop national language +environment to guess the value. +.TP +.BI "noaspell = " boolean +If this is set, the aspell dictionary generation is turned off. 
Useful for +cases where you don't need the functionality or when it is unusable because +aspell crashes during dictionary generation. +.TP +.BI "nocjk = " boolean +If this is set to true, specific East Asian (Chinese, Korean, Japanese) +characters/word splitting is turned off. This will save a small amount of +cpu if you have no CJK documents. If your document base does include such +text but you are not interested in searching it, setting +.I nocjk +may be a significant time and space saver. +.TP +.BI "cjkngramlen = " value +This lets you adjust the size of n-grams used for indexing CJK text. The +default value of 2 is probably appropriate in most cases. A value of 3 +would allow more precision and efficiency on longer words, but the index +will be approximately twice as large. .SH SEE ALSO .PP recollindex(1) recoll(1) diff --git a/src/doc/user/usermanual.sgml b/src/doc/user/usermanual.sgml index e662846b..d7669455 100644 --- a/src/doc/user/usermanual.sgml +++ b/src/doc/user/usermanual.sgml @@ -578,9 +578,9 @@ fvwm - Searching + Searching with the Qt graphical user interface - The recoll program provides the user + The recoll program provides the main user interface for searching. It is based on the QT library. @@ -1048,6 +1048,23 @@ fvwm + Phrases and Proximity searches + These two clauses work in similar ways, with the + difference that proximity searches do not impose an order on the + words. In both cases, an adjustable number (slack) of non-matched words + may be accepted between the searched ones (use the counter on + the left to adjust this count). For phrases, the default count + is zero (exact match). For proximity it is ten (meaning that two search + terms would be matched if found within a window of twelve + words). Examples: a phrase search for quick + fox with a slack of 0 will match quick + fox but not quick brown fox. With + a slack of 1 it will match the latter, but not fox + quick. 
A proximity search for quick + fox with the default slack will match the + latter, and also a fox is a cunning and quick animal. + + Click on the Start Search button in the advanced search dialog, or type Enter in any text field to start the search. The button in @@ -1361,7 +1378,7 @@ fvwm quotes. Example: "user manual" will look only for occurrences of user immediately followed by manual. You can use the - This exact phrase field of the advanced + This phrase field of the advanced search dialog to the same effect. Phrases can be entered along simple terms in all simple or advanced search entry fields (except This exact phrase). @@ -1646,6 +1663,135 @@ fvwm + + Searching with the KDE KIO slave + + + What's this + + The &RCL; KIO slave allows performing a &RCL; search + by entering an appropriate URL in a KDE open dialog, or with an + HTML-based interface displayed in + Konqueror. + + The HTML-based interface is similar to the QT-based + interface, but slightly less powerful for now. Its advantage is + that you can perform your search while staying fully within the + KDE framework: drag and drop from the result list works normally + and you have your normal choice of applications for opening + files. + + The alternative interface uses a directory view of search + results. Due to limitations in the current KIO slave interface, + it is currently not obviously useful (to me). + + The interface is described in more detail inside a help + file which you can access by entering + recoll:/ inside the + konqueror URL line (this works only if the + recoll KIO slave has been previously installed). + + + The instructions for building this module are located in + the source tree. See: + kde/kio/recoll/00README.txt + + + + + + Searchable documents + + As a sample application, the &RCL; KIO slave could allow + preparing a set of HTML documents (for example a manual) so that + they become their own search interface inside + konqueror. 
+ + This can be done by either explicitly inserting + <a href="recoll:/..."> links + around some document areas, or automatically by adding a + very small javascript program to the + documents, like the following example, which would initiate a search by + double-clicking any term: + + <script language="JavaScript"> + function recollsearch() { + var t = document.getSelection(); + window.location.href = 'recoll://search/query?qtp=a&p=0&q=' + + encodeURIComponent(t); + } +</script> + .... +<body ondblclick="recollsearch()"> + + + + + + + + + + + Searching on the command line + + There are several ways to obtain search results as a text + stream, without a graphical interface: + + By passing option -t to the + recoll program. + + By using the recollq program. + + By writing a custom + Python program, using the + Recoll Python API. + + + + The first two methods work in the same way and accept/need the same + arguments (except for the additional -t to + recoll). The query to be executed is specified + as command line arguments. + + recollq is not built by default. You can + use the Makefile in the + query directory to build it. This is a very + simple program, and it will often be useful to tailor its output format + to your needs. + + recollq has a man page (not installed by + default, look in the doc/man directory). The + Usage string is as follows: +recollq [-o|-a|-f] <query string> + Runs a recoll query and displays result lines. + Default: will interpret the argument(s) as a query language string + -o Emulate the gui simple search in ANY TERM mode + -a Emulate the gui simple search in ALL TERMS mode + -f Emulate the gui simple search in filename mode +Common options: + -c <configdir> : specify config directory, overriding $RECOLL_CONFDIR + -d also dump file contents + -n <cnt> limit the maximum number of results (0->no limit, default 2000) + -b : basic. 
Just output urls, no mime types or titles + -m : dump the whole document meta[] array + -S fld : sort by field name + -D : sort descending + + + Sample execution: +recollq 'ilur -nautique mime:text/html' +Recoll query: ((((ilur:(wqf=11) OR ilurs) AND_NOT (nautique:(wqf=11) + OR nautiques OR nautiqu OR nautiquement)) FILTER Ttext/html)) +4 results +text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/comptes.html] [comptes.html] 18593 bytes +text/html [file:///Users/uncrypted-dockes/projets/nautique/webnautique/articles/ilur1/index.html] [Constructio... +text/html [file:///Users/uncrypted-dockes/projets/pagepers/index.html] [psxtcl/writemime/recoll]... +text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/recu-chasse-maree.... + + + + Programming interface @@ -2713,6 +2859,16 @@ skippedPaths = ~/somedir/∗.txt + compressedfilemaxkbs + Size limit for compressed (.gz or .bz2) + files. These need to be decompressed in a temporary + directory for identification, which can be very wasteful + if 'uninteresting' big compressed files are present. + Negative means no limit, 0 means no processing of any + compressed file. Defaults to -1. + + + indexallfilenames &RCL; indexes file names in a special section of the database to allow specific file names