added compressedfilemaxkbs
parent
bf16706d50
commit
d0a8a37298
@ -3,8 +3,8 @@
|
||||
.SH NAME
|
||||
recoll.conf \- main personal configuration file for Recoll
|
||||
.SH DESCRIPTION
|
||||
This file defines the indexation configuration for the full-text search
|
||||
system Recoll.
|
||||
This file defines the indexation configuration for the Recoll full-text search
|
||||
system.
|
||||
.LP
|
||||
The system-wide configuration file is normally located inside
|
||||
/usr/[local]/share/recoll/examples. Any parameter set in the common file
|
||||
@ -58,6 +58,11 @@ embedded spaces can be quoted with double-quotes.
|
||||
.BI "topdirs = " directories
|
||||
Specifies the list of directories to index (recursively).
|
||||
.TP
|
||||
.BI "dbdir = " directory
|
||||
The name of the Xapian database directory. It will be created if needed
|
||||
when the database is initialized. If this is not an absolute pathname, it
|
||||
will be taken relative to the configuration directory.
|
||||
.TP
|
||||
.BI "skippedNames = " patterns
|
||||
A space-separated list of patterns for names of files or directories that
|
||||
should be completely ignored. The list defined in the default file is:
|
||||
@ -76,6 +81,18 @@ into. Together with topdirs, this allows pruning the indexed tree to one's
|
||||
content. daemSkippedPaths can be used to define a specific value for the
|
||||
real time indexing monitor.
|
||||
.TP
|
||||
.BI "followLinks = " boolean
|
||||
Specifies if the indexer should follow
|
||||
symbolic links while walking the file tree. The default is
|
||||
to ignore symbolic links to avoid multiple indexing of
|
||||
linked files. No effort is made to avoid duplication when
|
||||
this option is set to true. This option can be set
|
||||
individually for each of the
|
||||
.I topdirs
|
||||
members by using sections. It cannot be changed below the
|
||||
.I topdirs
|
||||
level.
|
||||
.TP
|
||||
.BI "loglevel = " value
|
||||
Verbosity level for recoll and recollindex. A value of 4 lists quite a lot of
|
||||
debug/information messages. 3 lists only errors.
|
||||
@ -87,11 +104,6 @@ Where should the messages go. 'stderr' can be used as a special value.
|
||||
.B daemlogfilename
|
||||
can be used to specify a different value for the real-time indexing daemon.
|
||||
.TP
|
||||
.BI "dbdir = " directory
|
||||
The name of the Xapian database directory. It will be created if needed
|
||||
when the database is initialized. If this is not an absolute pathname, it
|
||||
will be taken relative to the configuration directory.
|
||||
.TP
|
||||
.BI "indexstemminglanguages = " languages
|
||||
A list of languages for which the stem expansion databases will be
|
||||
built. See recollindex(1) for possible values.
|
||||
@ -132,13 +144,6 @@ Try to guess the character set of files if no internal value is available
|
||||
(i.e., for plain text files). This does not work well in general, and should
|
||||
probably not be used.
|
||||
.TP
|
||||
.BI "indexallfilenames = " boolean
|
||||
Recoll indexes file names into a special section of the database to allow
|
||||
searches for specific file names using wildcards. This parameter decides if
|
||||
file name indexing is performed only for files with mime types that would
|
||||
qualify them for full text indexation, or for all files inside
|
||||
the selected subtrees, independent of mime type.
|
||||
.TP
|
||||
.BI "usesystemfilecommand = " boolean
|
||||
Decide if we use the
|
||||
.B "file -i"
|
||||
@ -147,6 +152,65 @@ system command as a final step for determining the mime type for a file
|
||||
.B mimemap
|
||||
file). This can be useful for files with suffixless names, but it will
|
||||
also cause the indexation of many bogus "text" files.
|
||||
.TP
|
||||
.BI "indexedmimetypes = " list
|
||||
Recoll normally indexes any file which it knows how to read. This list lets
|
||||
you restrict the indexed mime types to what you specify. If the variable is
|
||||
unspecified or the list empty (the default), all supported types are
|
||||
processed.
|
||||
.TP
|
||||
.BI "compressedfilemaxkbs = " value
|
||||
Size limit for compressed (.gz or .bz2) files. These need to be
|
||||
decompressed in a temporary directory for identification, which can be very
|
||||
wasteful if 'uninteresting' big compressed files are present. Negative
|
||||
means no limit, 0 means no processing of any compressed file. Defaults
|
||||
to -1.
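The limit semantics described above can be sketched as follows. This is a minimal Python illustration, not Recoll's actual code: the function name is hypothetical, and treating the value as a count of 1024-byte units with an inclusive upper bound is an assumption.

```python
def should_process_compressed(size_bytes, compressedfilemaxkbs=-1):
    """Decide whether a compressed file of the given size would be processed.

    Hypothetical helper: it only models the documented meaning of the
    parameter (negative: no limit, 0: never process, positive: size limit).
    """
    if compressedfilemaxkbs < 0:
        # Negative value (the default, -1): no size limit.
        return True
    if compressedfilemaxkbs == 0:
        # Zero: no compressed file is processed at all.
        return False
    # Positive value: assumed to be an inclusive limit in units of 1024 bytes.
    return size_bytes <= compressedfilemaxkbs * 1024
```

With a value of 2000, for example, a 1 MB archive would still be decompressed, while a 3 MB one would be skipped.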
|
||||
.TP
|
||||
.BI "indexallfilenames = " boolean
|
||||
Recoll indexes file names into a special section of the database to allow
|
||||
searches for specific file names using wildcards. This parameter decides if
|
||||
file name indexing is performed only for files with mime types that would
|
||||
qualify them for full text indexation, or for all files inside
|
||||
the selected subtrees, independent of mime type.
|
||||
.TP
|
||||
.BI "idxabsmlen = " value
|
||||
Recoll stores an abstract for each indexed file inside the database. The
|
||||
text can come from an actual 'abstract' section in the document or will
|
||||
just be the beginning of the document. It is stored in the index so that it
|
||||
can be displayed inside the result lists without decoding the original
|
||||
file. The
|
||||
.I idxabsmlen
|
||||
parameter defines the size of the stored abstract. The default value is 250
|
||||
bytes. The search interface gives you the choice to display this stored
|
||||
text or a synthetic abstract built by extracting text around the search
|
||||
terms. If you always prefer the synthetic abstract, you can reduce this
|
||||
value and save a little space.
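As a rough illustration of the behaviour described above, the following Python sketch picks the abstract text and cuts it to the configured size. It is not Recoll's implementation: the function name is hypothetical, and both the UTF-8 encoding and the handling of a cut that falls inside a multi-byte character are assumptions.

```python
def stored_abstract(body, abstract=None, idxabsmlen=250):
    """Return the text that would be kept as the stored abstract.

    Hypothetical helper: uses the document's own abstract when there is
    one, else the beginning of the body, limited to idxabsmlen bytes.
    """
    source = abstract if abstract else body
    raw = source.encode("utf-8")[:idxabsmlen]
    # The cut may land inside a multi-byte character; drop the partial tail.
    return raw.decode("utf-8", errors="ignore")
```

Lowering the value shortens every stored abstract, which is where the small space saving mentioned above comes from.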
|
||||
.TP
|
||||
.BI "aspellLanguage = " lang
|
||||
Language definitions to use when creating the aspell dictionary. The value
|
||||
must match a set of aspell language definition files. You can type "aspell
|
||||
config" to see where these are installed (look for data-dir). The default
|
||||
if the variable is not set is to use your desktop national language
|
||||
environment to guess the value.
|
||||
.TP
|
||||
.BI "noaspell = " boolean
|
||||
If this is set, the aspell dictionary generation is turned off. Useful for
|
||||
cases where you don't need the functionality or when it is unusable because
|
||||
aspell crashes during dictionary generation.
|
||||
.TP
|
||||
.BI "nocjk = " boolean
|
||||
If this is set to true, specific East Asian (Chinese, Korean, Japanese)
|
||||
characters/word splitting is turned off. This will save a small amount of
|
||||
cpu if you have no CJK documents. If your document base does include such
|
||||
text but you are not interested in searching it, setting
|
||||
.I nocjk
|
||||
may be a significant time and space saver.
|
||||
.TP
|
||||
.BI "cjkngramlen = " value
|
||||
This lets you adjust the size of n-grams used for indexing CJK text. The
|
||||
default value of 2 is probably appropriate in most cases. A value of 3
|
||||
would allow more precision and efficiency on longer words, but the index
|
||||
will be approximately twice as large.
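To make the n-gram idea concrete, here is a small Python sketch that splits a run of CJK characters into overlapping n-grams. It assumes plain overlapping character n-grams; whether Recoll also emits other terms for the same text is not stated here, and the function name is hypothetical.

```python
def cjk_ngrams(text, n=2):
    """Split a run of CJK characters into overlapping n-grams.

    Hypothetical helper: a run shorter than n is returned as a single term.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if len(text) < n:
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]
```

For a five-character run, a length of 2 yields four terms and a length of 3 yields three longer, more selective ones.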
|
||||
.SH SEE ALSO
|
||||
.PP
|
||||
recollindex(1) recoll(1)
|
||||
|
||||
@ -578,9 +578,9 @@ fvwm
|
||||
</chapter>
|
||||
|
||||
<chapter id="rcl.search">
|
||||
<title>Searching</title>
|
||||
<title>Searching with the Qt graphical user interface</title>
|
||||
|
||||
<para>The <command>recoll</command> program provides the user
|
||||
<para>The <command>recoll</command> program provides the main user
|
||||
interface for searching. It is based on the
|
||||
<application>Qt</application> library.</para>
|
||||
|
||||
@ -1048,6 +1048,23 @@ fvwm
|
||||
</itemizedlist>
|
||||
|
||||
|
||||
<formalpara><title>Phrases and Proximity searches</title>
|
||||
<para>These two clauses work in similar ways, with the
|
||||
difference that proximity searches do not impose an order on the
|
||||
words. In both cases, an adjustable number (slack) of non-matched words
|
||||
may be accepted between the searched ones (use the counter on
|
||||
the left to adjust this count). For phrases, the default count
|
||||
is zero (exact match). For proximity it is ten (meaning that two search
|
||||
terms would be matched if found within a window of twelve
|
||||
words). Examples: a phrase search for <literal>quick
|
||||
fox</literal> with a slack of 0 will match <literal>quick
|
||||
fox</literal> but not <literal>quick brown fox</literal>. With
|
||||
a slack of 1 it will match the latter, but not <literal>fox
|
||||
quick</literal>. A proximity search for <literal>quick
|
||||
fox</literal> with the default slack will match the
|
||||
latter, and also <literal>a fox is a cunning and quick animal</literal>.</para>
|
||||
</formalpara>
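<para>The slack rule described above can be modelled with a short Python sketch. This is a simplified illustration, not the matching code used by the index engine: the function name is hypothetical, and words are compared after lowercasing and splitting on white space only.</para>

```python
from itertools import product


def slack_match(text, terms, slack, ordered):
    """Return True if all terms occur within a window of len(terms) + slack words.

    ordered=True models a phrase search, ordered=False a proximity search.
    """
    words = text.lower().split()
    terms = [t.lower() for t in terms]
    if not terms:
        return False
    window = len(terms) + slack
    # Candidate word positions for each search term.
    candidates = [[i for i, w in enumerate(words) if w == t] for t in terms]
    for combo in product(*candidates):
        if len(set(combo)) != len(combo):
            continue  # the same word occurrence cannot serve two terms
        if ordered and list(combo) != sorted(combo):
            continue  # phrase search: terms must appear in the given order
        if window >= max(combo) - min(combo) + 1:
            return True
    return False
```

<para>With the default proximity slack of 10 and two terms, the window is twelve words, which matches the figure given above.</para>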
|
||||
|
||||
<para>Click on the <guilabel>Start Search</guilabel> button in
|
||||
the advanced search dialog, or type <keycap>Enter</keycap> in
|
||||
any text field to start the search. The button in
|
||||
@ -1361,7 +1378,7 @@ fvwm
|
||||
quotes. Example: <literal>"user manual"</literal> will look
|
||||
only for occurrences of <literal>user</literal> immediately
|
||||
followed by <literal>manual</literal>. You can use the
|
||||
<guilabel>This exact phrase</guilabel> field of the advanced
|
||||
<guilabel>This phrase</guilabel> field of the advanced
|
||||
search dialog to the same effect. Phrases can be entered along with
|
||||
simple terms in all simple or advanced search entry fields
|
||||
(except <guilabel>This phrase</guilabel>).</para>
|
||||
@ -1646,6 +1663,135 @@ fvwm
|
||||
|
||||
</chapter>
|
||||
|
||||
<chapter id="rcl.searchkio">
|
||||
<title>Searching with the KDE KIO slave</title>
|
||||
|
||||
<sect1 id="rcl.searchkio.intro">
|
||||
<title>What's this</title>
|
||||
|
||||
<para>The &RCL; KIO slave allows performing a &RCL; search
|
||||
by entering an appropriate URL in a KDE open dialog, or with an
|
||||
HTML-based interface displayed in
|
||||
<command>Konqueror</command>.</para>
|
||||
|
||||
<para>The HTML-based interface is similar to the Qt-based
|
||||
interface, but slightly less powerful for now. Its advantage is
|
||||
that you can perform your search while staying fully within the
|
||||
KDE framework: drag and drop from the result list works normally
|
||||
and you have your normal choice of applications for opening
|
||||
files.</para>
|
||||
|
||||
<para>The alternative interface uses a directory view of search
|
||||
results. Due to limitations in the current KIO slave interface,
|
||||
it is currently not obviously useful (to me).</para>
|
||||
|
||||
<para>The interface is described in more detail inside a help
|
||||
file which you can access by entering
|
||||
<filename>recoll:/</filename> inside the
|
||||
<command>konqueror</command> URL line (this works only if the
|
||||
recoll KIO slave has been previously installed).</para>
|
||||
|
||||
|
||||
<para>The instructions for building this module are located in
|
||||
the source tree. See:
|
||||
<filename>kde/kio/recoll/00README.txt</filename></para>
|
||||
</sect1>
|
||||
|
||||
|
||||
|
||||
<sect1 id="rcl.searchkio.searchabledocs">
|
||||
<title>Searchable documents</title>
|
||||
|
||||
<para>As a sample application, the &RCL; KIO slave could allow
|
||||
preparing a set of HTML documents (for example a manual) so that
|
||||
they become their own search interface inside
|
||||
<command>konqueror</command>.</para>
|
||||
|
||||
<para>This can be done either by explicitly inserting
|
||||
<literal><a href="recoll:/..."></literal> links
|
||||
around some document areas, or automatically by adding a
|
||||
very small <application>javascript</application> program to the
|
||||
documents, like the following example, which would initiate a search by
|
||||
double-clicking any term:</para>
|
||||
|
||||
<programlisting><script language="JavaScript">
|
||||
function recollsearch() {
|
||||
var t = document.getSelection();
|
||||
window.location.href = 'recoll://search/query?qtp=a&p=0&q=' +
|
||||
encodeURIComponent(t);
|
||||
}
|
||||
</script>
|
||||
....
|
||||
<body ondblclick="recollsearch()">
|
||||
|
||||
</programlisting>
|
||||
|
||||
</sect1>
|
||||
|
||||
</chapter>
|
||||
|
||||
|
||||
|
||||
<chapter id="rcl.searchkcl">
|
||||
<title>Searching on the command line</title>
|
||||
|
||||
<para>There are several ways to obtain search results as a text
|
||||
stream, without a graphical interface:</para>
|
||||
<itemizedlist>
|
||||
<listitem><para>By passing option <literal>-t</literal> to the
|
||||
<command>recoll</command> program.</para>
|
||||
</listitem>
|
||||
<listitem><para>By using the <command>recollq</command> program.</para>
|
||||
</listitem>
|
||||
<listitem><para>By writing a custom
|
||||
<application>Python</application> program, using the
|
||||
<link linkend="rcl.program.api.python">Recoll Python API</link>.</para>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
|
||||
<para>The first two methods work in the same way and accept/need the same
|
||||
arguments (except for the additional <literal>-t</literal> to
|
||||
<command>recoll</command>). The query to be executed is specified
|
||||
as command line arguments.</para>
|
||||
|
||||
<para><command>recollq</command> is not built by default. You can
|
||||
use the <filename>Makefile</filename> in the
|
||||
<filename>query</filename> directory to build it. This is a very
|
||||
simple program, and it will often be useful to tailor its output format
|
||||
to your needs.</para>
|
||||
|
||||
<para><command>recollq</command> has a man page (not installed by
|
||||
default, look in the <filename>doc/man</filename> directory). The
|
||||
Usage string is as follows:</para>
|
||||
<programlisting>recollq [-o|-a|-f] <query string>
|
||||
Runs a recoll query and displays result lines.
|
||||
Default: will interpret the argument(s) as a query language string
|
||||
-o Emulate the gui simple search in ANY TERM mode
|
||||
-a Emulate the gui simple search in ALL TERMS mode
|
||||
-f Emulate the gui simple search in filename mode
|
||||
Common options:
|
||||
-c <configdir> : specify config directory, overriding $RECOLL_CONFDIR
|
||||
-d also dump file contents
|
||||
-n <cnt> limit the maximum number of results (0->no limit, default 2000)
|
||||
-b : basic. Just output urls, no mime types or titles
|
||||
-m : dump the whole document meta[] array
|
||||
-S fld : sort by field name
|
||||
-D : sort descending
|
||||
</programlisting>
|
||||
|
||||
<para>Sample execution:</para>
|
||||
<programlisting>recollq 'ilur -nautique mime:text/html'
|
||||
Recoll query: ((((ilur:(wqf=11) OR ilurs) AND_NOT (nautique:(wqf=11)
|
||||
OR nautiques OR nautiqu OR nautiquement)) FILTER Ttext/html))
|
||||
4 results
|
||||
text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/comptes.html] [comptes.html] 18593 bytes
|
||||
text/html [file:///Users/uncrypted-dockes/projets/nautique/webnautique/articles/ilur1/index.html] [Constructio...
|
||||
text/html [file:///Users/uncrypted-dockes/projets/pagepers/index.html] [psxtcl/writemime/recoll]...
|
||||
text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/recu-chasse-maree....
|
||||
</programlisting>
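<para>For scripted use, the command line can be assembled and run from Python. The sketch below is only an illustration: the function names are hypothetical, it uses only options listed in the usage string above, and it assumes that options are given before the query string and that <command>recollq</command> is installed on the search path.</para>

```python
import subprocess


def recollq_argv(query, mode=None, confdir=None, max_results=None, basic=False):
    """Build the argument list for a recollq call (hypothetical helper)."""
    # Simple-search emulation modes from the usage string.
    modes = {"any": "-o", "all": "-a", "filename": "-f"}
    argv = ["recollq"]
    if mode is not None:
        argv.append(modes[mode])
    if confdir is not None:
        argv += ["-c", confdir]
    if max_results is not None:
        argv += ["-n", str(max_results)]
    if basic:
        argv.append("-b")  # urls only, no mime types or titles
    argv.append(query)
    return argv


def run_recollq(query, **options):
    """Run recollq and return its output lines. Needs recollq on the PATH."""
    result = subprocess.run(recollq_argv(query, **options),
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()
```

<para>Passing the query as a single list element avoids any shell quoting issues with strings such as the one in the sample execution.</para>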
|
||||
|
||||
</chapter>
|
||||
|
||||
<chapter id="rcl.program">
|
||||
<title>Programming interface</title>
|
||||
|
||||
@ -2713,6 +2859,16 @@ skippedPaths = ~/somedir/∗.txt
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry><term><literal>compressedfilemaxkbs</literal></term>
|
||||
<listitem><para>Size limit for compressed (.gz or .bz2)
|
||||
files. These need to be decompressed in a temporary
|
||||
directory for identification, which can be very wasteful
|
||||
if 'uninteresting' big compressed files are present.
|
||||
Negative means no limit, 0 means no processing of any
|
||||
compressed file. Defaults to -1.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry><term><literal>indexallfilenames</literal></term>
|
||||
<listitem><para>&RCL; indexes file names in a special
|
||||
section of the database to allow specific file names
|
||||
|
||||