added compressedfilemaxkbs

dockes 2009-01-17 14:57:12 +00:00
parent bf16706d50
commit d0a8a37298
2 changed files with 237 additions and 17 deletions

View File

@ -3,8 +3,8 @@
.SH NAME
recoll.conf \- main personal configuration file for Recoll
.SH DESCRIPTION
This file defines the indexation configuration for the full-text search
system Recoll.
This file defines the indexation configuration for the Recoll full-text search
system.
.LP
The system-wide configuration file is normally located inside
/usr/[local]/share/recoll/examples. Any parameter set in the common file
@ -58,6 +58,11 @@ embedded spaces can be quoted with double-quotes.
.BI "topdirs = " directories
Specifies the list of directories to index (recursively).
.TP
.BI "dbdir = " directory
The name of the Xapian database directory. It will be created if needed
when the database is initialized. If this is not an absolute pathname, it
will be taken relative to the configuration directory.
.TP
.BI "skippedNames = " patterns
A space-separated list of patterns for names of files or directories that
should be completely ignored. The list defined in the default file is:
@ -76,6 +81,18 @@ into. Together with topdirs, this allows pruning the indexed tree to one's
content. daemSkippedPaths can be used to define a specific value for the
real time indexing monitor.
.TP
.BI "followLinks = " boolean
Specifies if the indexer should follow
symbolic links while walking the file tree. The default is
to ignore symbolic links to avoid multiple indexing of
linked files. No effort is made to avoid duplication when
this option is set to true. This option can be set
individually for each of the
.I topdirs
members by using sections. It cannot be changed below the
.I topdirs
level.
.TP
.BI "loglevel = " value
Verbosity level for recoll and recollindex. A value of 4 lists quite a lot of
debug/information messages. 3 lists only errors.
@ -87,11 +104,6 @@ Where should the messages go. 'stderr' can be used as a special value.
.B daemlogfilename
can be used to specify a different value for the real-time indexing daemon.
.TP
.BI "dbdir = " directory
The name of the Xapian database directory. It will be created if needed
when the database is initialized. If this is not an absolute pathname, it
will be taken relative to the configuration directory.
.TP
.BI "indexstemminglanguages = " languages
A list of languages for which the stem expansion databases will be
built. See recollindex(1) for possible values.
@ -132,13 +144,6 @@ Try to guess the character set of files if no internal value is available
(ie: for plain text files). This does not work well in general, and should
probably not be used.
.TP
.BI "indexallfilenames = " boolean
Recoll indexes file names into a special section of the database to allow
specific file names searches using wild cards. This parameter decides if
file name indexing is performed only for files with mime types that would
qualify them for full text indexation, or for all files inside
the selected subtrees, independent of mime type.
.TP
.BI "usesystemfilecommand = " boolean
Decide if we use the
.B "file -i"
@ -147,6 +152,65 @@ system command as a final step for determining the mime type for a file
.B mimemap
file). This can be useful for files with suffixless names, but it will
also cause the indexation of many bogus "text" files.
.TP
.BI "indexedmimetypes = " list
Recoll normally indexes any file which it knows how to read. This list lets
you restrict the indexed mime types to what you specify. If the variable is
unspecified or the list empty (the default), all supported types are
processed.
.TP
.BI "compressedfilemaxkbs = " value
Size limit for compressed (.gz or .bz2) files. These need to be
decompressed in a temporary directory for identification, which can be very
wasteful if 'uninteresting' big compressed files are present. Negative
means no limit, 0 means no processing of any compressed file. Defaults
to -1.
.TP
.BI "indexallfilenames = " boolean
Recoll indexes file names into a special section of the database to allow
specific file names searches using wild cards. This parameter decides if
file name indexing is performed only for files with mime types that would
qualify them for full text indexation, or for all files inside
the selected subtrees, independent of mime type.
.TP
.BI "idxabsmlen = " value
Recoll stores an abstract for each indexed file inside the database. The
text can come from an actual 'abstract' section in the document or will
just be the beginning of the document. It is stored in the index so that it
can be displayed inside the result lists without decoding the original
file. The
.I idxabsmlen
parameter defines the size of the stored abstract. The default value is 250
bytes. The search interface gives you the choice to display this stored
text or a synthetic abstract built by extracting text around the search
terms. If you always prefer the synthetic abstract, you can reduce this
value and save a little space.
.TP
.BI "aspellLanguage = " lang
Language definitions to use when creating the aspell dictionary. The value
must match a set of aspell language definition files. You can type "aspell
config" to see where these are installed (look for data-dir). The default
if the variable is not set is to use your desktop national language
environment to guess the value.
.TP
.BI "noaspell = " boolean
If this is set, the aspell dictionary generation is turned off. Useful for
cases where you don't need the functionality or when it is unusable because
aspell crashes during dictionary generation.
.TP
.BI "nocjk = " boolean
If this is set to true, specific East Asian (Chinese, Korean, Japanese)
character/word splitting is turned off. This will save a small amount of
cpu if you have no CJK documents. If your document base does include such
text but you are not interested in searching it, setting
.I nocjk
may be a significant time and space saver.
.TP
.BI "cjkngramlen = " value
This lets you adjust the size of n-grams used for indexing CJK text. The
default value of 2 is probably appropriate in most cases. A value of 3
would allow more precision and efficiency on longer words, but the index
will be approximately twice as large.
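
To make the n-gram length concrete, here is a minimal Python sketch of overlapping character n-grams. It only illustrates the general technique. The function name `char_ngrams` is hypothetical, and whether Recoll splits CJK text in exactly this way is an assumption.

```python
def char_ngrams(text, n=2):
    """Return the overlapping character n-grams of text.

    Illustration only: assumes plain overlapping n-grams, which may not
    match the exact splitting performed by the indexer.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Slide a window of n characters over the text, one character at a time.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


if __name__ == "__main__":
    print(char_ngrams("全文検索", 2))
    print(char_ngrams("全文検索", 3))
```

A larger n produces longer, more specific terms, which is the precision gain that the cjkngramlen description mentions.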
.SH SEE ALSO
.PP
recollindex(1) recoll(1)
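
The three cases described for compressedfilemaxkbs (negative, zero, positive) can be sketched in Python as follows. This is an illustration of the documented behaviour, not the indexer's code. The function name `should_decompress` is hypothetical, and two details are assumptions: that the value counts units of 1024 bytes, and that a file exactly at the limit is still processed.

```python
def should_decompress(size_bytes, compressedfilemaxkbs=-1):
    """Decide whether a compressed file should be decompressed for indexing.

    Assumptions (not confirmed by the documentation): the limit counts
    units of 1024 bytes, and a file exactly at the limit is accepted.
    """
    if compressedfilemaxkbs < 0:
        return True   # negative value: no limit (the default is -1)
    if compressedfilemaxkbs == 0:
        return False  # zero: no compressed file is processed
    return size_bytes <= compressedfilemaxkbs * 1024


if __name__ == "__main__":
    print(should_decompress(5_000_000))        # default, no limit
    print(should_decompress(5_000_000, 1000))  # above a 1000 KB limit
```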

View File

@ -578,9 +578,9 @@ fvwm
</chapter>
<chapter id="rcl.search">
<title>Searching</title>
<title>Searching with the Qt graphical user interface</title>
<para>The <command>recoll</command> program provides the user
<para>The <command>recoll</command> program provides the main user
interface for searching. It is based on the
<application>QT</application> library.</para>
@ -1048,6 +1048,23 @@ fvwm
</itemizedlist>
<formalpara><title>Phrases and Proximity searches</title>
<para>These two clauses work in similar ways, with the
difference that proximity searches do not impose an order on the
words. In both cases, an adjustable number (slack) of non-matched words
may be accepted between the searched ones (use the counter on
the left to adjust this count). For phrases, the default count
is zero (exact match). For proximity it is ten (meaning that two search
terms would be matched if found within a window of twelve
words). Examples: a phrase search for <literal>quick
fox</literal> with a slack of 0 will match <literal>quick
fox</literal> but not <literal>quick brown fox</literal>. With
a slack of 1 it will match the latter, but not <literal>fox
quick</literal>. A proximity search for <literal>quick
fox</literal> with the default slack will match the
latter, and also <literal>a fox is a cunning and quick animal</literal>.</para>
</formalpara>
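
The slack rule described above can be sketched in a few lines of Python. This is a toy model of the semantics as stated in the text (a window of "number of terms plus slack" words, ordered for phrases, unordered for proximity), not the matching code used by Recoll or Xapian. The function name `slack_match` is hypothetical.

```python
from itertools import product


def slack_match(text, query, slack, ordered):
    """Return True if all query terms fit in a window of len(terms) + slack words.

    ordered=True models a phrase search, ordered=False a proximity search.
    Toy model: whitespace tokenization, case folding, no stemming.
    """
    words = text.lower().split()
    terms = query.lower().split()
    if not terms:
        return False
    # For each term, the list of word positions where it occurs.
    positions = [[i for i, w in enumerate(words) if w == t] for t in terms]
    window = len(terms) + slack
    for combo in product(*positions):
        if len(set(combo)) != len(terms):
            continue  # each term needs its own position
        if ordered and list(combo) != sorted(combo):
            continue  # a phrase imposes the query order
        if max(combo) - min(combo) + 1 <= window:
            return True
    return False


if __name__ == "__main__":
    print(slack_match("quick brown fox", "quick fox", 0, ordered=True))
    print(slack_match("quick brown fox", "quick fox", 1, ordered=True))
    print(slack_match("fox quick", "quick fox", 10, ordered=False))
```

The brute-force search over position combinations is fine for short illustrative inputs and keeps the window rule easy to read.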
<para>Click on the <guilabel>Start Search</guilabel> button in
the advanced search dialog, or type <keycap>Enter</keycap> in
any text field to start the search. The button in
@ -1361,7 +1378,7 @@ fvwm
quotes. Example: <literal>"user manual"</literal> will look
only for occurrences of <literal>user</literal> immediately
followed by <literal>manual</literal>. You can use the
<guilabel>This exact phrase</guilabel> field of the advanced
<guilabel>This phrase</guilabel> field of the advanced
search dialog to the same effect. Phrases can be entered along with
simple terms in all simple or advanced search entry fields
(except <guilabel>This exact phrase</guilabel>).</para>
@ -1646,6 +1663,135 @@ fvwm
</chapter>
<chapter id="rcl.searchkio">
<title>Searching with the KDE KIO slave</title>
<sect1 id="rcl.searchkio.intro">
<title>What's this</title>
<para>The &RCL; KIO slave allows performing a &RCL; search
by entering an appropriate URL in a KDE open dialog, or with an
HTML-based interface displayed in
<command>Konqueror</command>.</para>
<para>The HTML-based interface is similar to the QT-based
interface, but slightly less powerful for now. Its advantage is
that you can perform your search while staying fully within the
KDE framework: drag and drop from the result list works normally
and you have your normal choice of applications for opening
files.</para>
<para>The alternative interface uses a directory view of search
results. Due to limitations in the current KIO slave interface,
it is currently not obviously useful (to me).</para>
<para>The interface is described in more detail inside a help
file which you can access by entering
<filename>recoll:/</filename> inside the
<command>konqueror</command> URL line (this works only if the
recoll KIO slave has been previously installed).</para>
<para>The instructions for building this module are located in
the source tree. See:
<filename>kde/kio/recoll/00README.txt</filename></para>
</sect1>
<sect1 id="rcl.searchkio.searchabledocs">
<title>Searchable documents</title>
<para>As a sample application, the &RCL; KIO slave could allow
preparing a set of HTML documents (for example a manual) so that
they become their own search interface inside
<command>konqueror</command>.</para>
<para>This can be done by either explicitly inserting
<literal>&lt;a&nbsp;href="recoll:/..."&gt;</literal> links
around some document areas, or automatically by adding a
very small <application>javascript</application> program to the
documents, like the following example, which would initiate a search by
double-clicking any term:</para>
<programlisting>&lt;script language="JavaScript">
function recollsearch() {
var t = document.getSelection();
window.location.href = 'recoll://search/query?qtp=a&amp;p=0&amp;q=' +
encodeURIComponent(t);
}
&lt;/script>
....
&lt;body ondblclick="recollsearch()">
</programlisting>
</sect1>
</chapter>
<chapter id="rcl.searchkcl">
<title>Searching on the command line</title>
<para>There are several ways to obtain search results as a text
stream, without a graphical interface:</para>
<itemizedlist>
<listitem><para>By passing option <literal>-t</literal> to the
<command>recoll</command> program.</para>
</listitem>
<listitem><para>By using the <command>recollq</command> program.</para>
</listitem>
<listitem><para>By writing a custom
<application>Python</application> program, using the
<link linkend="rcl.program.api.python">Recoll Python API</link>.</para>
</listitem>
</itemizedlist>
<para>The first two methods work in the same way and accept/need the same
arguments (except for the additional <literal>-t</literal> to
<command>recoll</command>). The query to be executed is specified
as command line arguments.</para>
<para><command>recollq</command> is not built by default. You can
use the <filename>Makefile</filename> in the
<filename>query</filename> directory to build it. This is a very
simple program, and it will often be useful to tailor its output format
to your needs.</para>
<para><command>recollq</command> has a man page (not installed by
default, look in the <filename>doc/man</filename> directory). The
Usage string is as follows:</para>
<programlisting>recollq [-o|-a|-f] &lt;query string>
Runs a recoll query and displays result lines.
Default: will interpret the argument(s) as a query language string
-o Emulate the gui simple search in ANY TERM mode
-a Emulate the gui simple search in ALL TERMS mode
-f Emulate the gui simple search in filename mode
Common options:
-c &lt;configdir> : specify config directory, overriding $RECOLL_CONFDIR
-d also dump file contents
-n &lt;cnt> limit the maximum number of results (0->no limit, default 2000)
-b : basic. Just output urls, no mime types or titles
-m : dump the whole document meta[] array
-S fld : sort by field name
-D : sort descending
</programlisting>
<para>Sample execution:</para>
<programlisting>recollq 'ilur -nautique mime:text/html'
Recoll query: ((((ilur:(wqf=11) OR ilurs) AND_NOT (nautique:(wqf=11)
OR nautiques OR nautiqu OR nautiquement)) FILTER Ttext/html))
4 results
text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/comptes.html] [comptes.html] 18593 bytes
text/html [file:///Users/uncrypted-dockes/projets/nautique/webnautique/articles/ilur1/index.html] [Constructio...
text/html [file:///Users/uncrypted-dockes/projets/pagepers/index.html] [psxtcl/writemime/recoll]...
text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/recu-chasse-maree....
</programlisting>
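
As an alternative to reading the text stream by hand, a script can drive recollq. The Python sketch below uses only the options listed in the usage string above (-c, -n, -b). It assumes that a recollq binary has been built and is on the PATH. The helper names `recollq_command` and `run_recollq` are hypothetical and not part of Recoll.

```python
import subprocess


def recollq_command(query, max_results=None, urls_only=False, configdir=None):
    """Build the recollq argument list from options in the usage string."""
    cmd = ["recollq"]
    if configdir is not None:
        cmd += ["-c", configdir]        # overrides $RECOLL_CONFDIR
    if max_results is not None:
        cmd += ["-n", str(max_results)]
    if urls_only:
        cmd.append("-b")                # just output urls
    cmd.append(query)                   # the query string is one argument
    return cmd


def run_recollq(query, **options):
    """Run recollq and return its output lines (requires recollq on PATH)."""
    result = subprocess.run(
        recollq_command(query, **options),
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    print(recollq_command("ilur -nautique mime:text/html", max_results=10))
```

Passing the arguments as a list avoids shell quoting problems with query strings that contain spaces or a leading minus sign inside the query.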
</chapter>
<chapter id="rcl.program">
<title>Programming interface</title>
@ -2713,6 +2859,16 @@ skippedPaths = ~/somedir/&lowast;.txt
</listitem>
</varlistentry>
<varlistentry><term><literal>compressedfilemaxkbs</literal></term>
<listitem><para>Size limit for compressed (.gz or .bz2)
files. These need to be decompressed in a temporary
directory for identification, which can be very wasteful
if 'uninteresting' big compressed files are present.
Negative means no limit, 0 means no processing of any
compressed file. Defaults to -1.</para>
</listitem>
</varlistentry>
<varlistentry><term><literal>indexallfilenames</literal></term>
<listitem><para>&RCL; indexes file names in a special
section of the database to allow specific file names