doc
parent
268e3824dc
commit
6bd88ca32f
@ -64,8 +64,8 @@
|
|||||||
<para>Also be aware that you may need to install the
|
<para>Also be aware that you may need to install the
|
||||||
appropriate <link linkend="rcl.install.external"> supporting
|
appropriate <link linkend="rcl.install.external"> supporting
|
||||||
applications</link> for document types that need them (for
|
applications</link> for document types that need them (for
|
||||||
example <application>antiword</application> for ms-word
|
example <application>antiword</application> for
|
||||||
files).</para>
|
<application>Microsoft Word</application> files).</para>
|
||||||
</sect1>
|
</sect1>
|
||||||
|
|
||||||
<sect1 id="rcl.introduction.search">
|
<sect1 id="rcl.introduction.search">
|
||||||
@ -83,7 +83,7 @@
|
|||||||
<para>You do not need to remember in what file or email message you
|
<para>You do not need to remember in what file or email message you
|
||||||
stored a given piece of information. You just ask for related
|
stored a given piece of information. You just ask for related
|
||||||
terms, and the tool will return a list of documents where
|
terms, and the tool will return a list of documents where
|
||||||
those terms are prominent, in a similar way to Internet search
|
these terms are prominent, in a similar way to Internet search
|
||||||
engines.</para>
|
engines.</para>
|
||||||
|
|
||||||
<para>A search application tries to determine which documents are
|
<para>A search application tries to determine which documents are
|
||||||
@ -143,7 +143,7 @@
|
|||||||
word being singular or plural (floor, floors), or on a verb tense
|
word being singular or plural (floor, floors), or on a verb tense
|
||||||
(flooring, floored). Because the mechanisms used for stemming
|
(flooring, floored). Because the mechanisms used for stemming
|
||||||
depend on the specific grammatical rules for each language, there
|
depend on the specific grammatical rules for each language, there
|
||||||
is a separate stemmer module for most common languages where
|
is a separate &XAP; stemmer module for most common languages where
|
||||||
stemming makes sense.</para>
|
stemming makes sense.</para>
|
||||||
|
|
||||||
<para>&RCL; stores the unstemmed versions of terms in the main index
|
<para>&RCL; stores the unstemmed versions of terms in the main index
|
||||||
@ -160,26 +160,27 @@
|
|||||||
recognition, which means that the stemmer will sometimes be applied
|
recognition, which means that the stemmer will sometimes be applied
|
||||||
to terms from other languages with potentially strange results. In
|
to terms from other languages with potentially strange results. In
|
||||||
practice, even if this introduces possibilities of confusion, this
|
practice, even if this introduces possibilities of confusion, this
|
||||||
approach has proven quite useful, and, awaiting the addition
|
approach has proven quite useful, and it is much less
|
||||||
of an automatic language recognition module to &RCL;, it is much
|
cumbersome than separating your documents according to what
|
||||||
less cumbersome than separating your documents according to what
|
|
||||||
language they are written in.</para>
|
language they are written in.</para>
|
||||||
|
|
||||||
<para>Before version 1.18, &RCL; always stripped most accents and
|
<para>Before version 1.18, &RCL; stripped most accents and
|
||||||
diacritics from terms, and converted them to lower case before
|
diacritics from terms, and converted them to lower case before
|
||||||
storing them in the index. As a consequence, it was impossible to
|
either storing them in the index or searching for them. As a
|
||||||
search for a particular capitalization of a term
|
consequence, it was impossible to search for a particular
|
||||||
(<literal>US</literal> / <literal>us</literal>), or to
|
capitalization of a term (<literal>US</literal> /
|
||||||
discriminate two terms based on diacritics (<literal>sake</literal>
|
<literal>us</literal>), or to discriminate two terms based on
|
||||||
/ <literal>saké</literal>, <literal>mate</literal> /
|
diacritics (<literal>sake</literal> / <literal>saké</literal>,
|
||||||
<literal>maté</literal>).</para>
|
<literal>mate</literal> / <literal>maté</literal>).</para>
|
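The effect of the pre-1.18 stripping described in this hunk can be illustrated with a short Python sketch. This is not the code used by Recoll: `strip_term` is a hypothetical helper built on the standard `unicodedata` module, and it assumes that "stripping" simply means removing combining marks and converting to lower case.

```python
import unicodedata


def strip_term(term):
    """Remove diacritics and fold case (illustrative only, not Recoll's code)."""
    # NFD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", term)
    # Drop the combining marks and keep the base letters.
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return base.lower()


# Distinct raw terms collapse to the same entry once stripped, which is
# why they could not be told apart in a stripped index.
for raw in ("US", "us", "sake", "saké", "mate", "maté"):
    print(raw, "->", strip_term(raw))
```

With such a transformation applied both at indexing and at query time, each pair ends up as a single term, which matches the limitation the paragraph describes.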
||||||
|
|
||||||
<para>As of version 1.18, &RCL; can optionally store the raw terms,
|
<para>As of version 1.18, &RCL; can optionally store the raw terms,
|
||||||
without accent stripping or case conversion. Expansions necessary
|
without accent stripping or case conversion. In this configuration,
|
||||||
for searches insensitive to case and/or diacritics are then
|
it is still possible (and most common) for a query to be
|
||||||
performed when searching. This is described in more detail in the
|
insensitive to case and/or diacritics. Appropriate term expansions
|
||||||
<link linkend="RCL.INDEXING.CONFIG.SENS">section about index case
|
are performed before actually accessing the main index. This is
|
||||||
and diacritics sensitivity</link>.</para>
|
described in more detail in the <link
|
||||||
|
linkend="RCL.INDEXING.CONFIG.SENS">section about index case and
|
||||||
|
diacritics sensitivity</link>.</para>
|
||||||
|
|
||||||
<para>&RCL; has many parameters which define exactly what to
|
<para>&RCL; has many parameters which define exactly what to
|
||||||
index, and how to classify and decode the source
|
index, and how to classify and decode the source
|
||||||
@ -197,7 +198,9 @@
|
|||||||
sufficient for giving &RCL; a try, but you may want to adjust
|
sufficient for giving &RCL; a try, but you may want to adjust
|
||||||
it later, which can be done either by editing the text files
|
it later, which can be done either by editing the text files
|
||||||
or by using configuration menus in the
|
or by using configuration menus in the
|
||||||
<command>recoll</command> GUI</para>
|
<command>recoll</command> GUI. Some other parameters affecting only
|
||||||
|
the <command>recoll</command> GUI are stored in the standard
|
||||||
|
location defined by <application>Qt</application>.</para>
|
||||||
|
|
||||||
<para>The <link linkend="rcl.indexing.periodic.exec">indexing
|
<para>The <link linkend="rcl.indexing.periodic.exec">indexing
|
||||||
process</link> is started automatically the first time you
|
process</link> is started automatically the first time you
|
||||||
@ -241,7 +244,7 @@
|
|||||||
aspects of the indexing processes and configuration, with links
|
aspects of the indexing processes and configuration, with links
|
||||||
to detailed sections.</para>
|
to detailed sections.</para>
|
||||||
|
|
||||||
<sect2>
|
<sect2 id="rcl.indexing.introduction.modes">
|
||||||
<title>Indexing modes</title>
|
<title>Indexing modes</title>
|
||||||
|
|
||||||
<para>&RCL; indexing can be performed in two different modes:
|
<para>&RCL; indexing can be performed in two different modes:
|
||||||
@ -279,20 +282,30 @@
|
|||||||
directory). Monitoring a big file system tree can consume
|
directory). Monitoring a big file system tree can consume
|
||||||
significant system resources.</para>
|
significant system resources.</para>
|
||||||
|
|
||||||
|
<para>The choice of method and the parameters used can be
|
||||||
|
configured from the <command>recoll</command> GUI:
|
||||||
|
<menuchoice>
|
||||||
|
<guimenu>Preferences</guimenu>
|
||||||
|
<guimenuitem>Indexing schedule</guimenuitem>
|
||||||
|
</menuchoice>
|
||||||
|
</para>
|
||||||
</sect2>
|
</sect2>
|
||||||
|
|
||||||
<sect2>
|
<sect2 id="rcl.indexing.introduction.config">
|
||||||
<title>Configurations, multiple indexes</title>
|
<title>Configurations, multiple indexes</title>
|
||||||
|
|
||||||
<para>The parameters describing what is to be indexed and
|
<para>The parameters describing what is to be indexed and
|
||||||
local preferences are defined in text files contained in a
|
local preferences are defined in text files contained in a
|
||||||
<link linkend="rcl.indexing.config">configuration
|
<link linkend="rcl.indexing.config">configuration
|
||||||
directory</link>.</para>
|
directory</link>.</para>
|
||||||
|
|
||||||
<para>All parameters have defaults, defined in system-wide
|
<para>All parameters have defaults, defined in system-wide
|
||||||
files.</para>
|
files.</para>
|
||||||
|
|
||||||
<para>Without further configuration, &RCL; will index all
|
<para>Without further configuration, &RCL; will index all
|
||||||
appropriate files from your home directory, with a reasonable
|
appropriate files from your home directory, with a reasonable
|
||||||
set of defaults.</para>
|
set of defaults.</para>
|
||||||
|
|
||||||
<para>A default personal configuration directory
|
<para>A default personal configuration directory
|
||||||
(<filename>$HOME/.recoll/</filename>) is created
|
(<filename>$HOME/.recoll/</filename>) is created
|
||||||
when a &RCL; program is first executed. It is possible to
|
when a &RCL; program is first executed. It is possible to
|
||||||
@ -308,14 +321,14 @@
|
|||||||
would be done to separate personal and shared
|
would be done to separate personal and shared
|
||||||
indexes, or to take advantage of the organization of your data
|
indexes, or to take advantage of the organization of your data
|
||||||
to improve search precision.</para>
|
to improve search precision.</para>
|
||||||
|
|
||||||
<para>The generated indexes can
|
<para>The generated indexes can
|
||||||
be <link linkend="rcl.search.multidb">queried
|
be queried concurrently in a transparent manner.</para>
|
||||||
concurrently</link> in a transparent manner.</para>
|
|
||||||
|
|
||||||
<para>For index generation, multiple configurations are
|
<para>For index generation, multiple configurations are
|
||||||
totally independent from each other. When multiple indexes need
|
totally independent from each other. When multiple indexes need
|
||||||
to be used for a single search,
|
to be used for a single search,
|
||||||
<link linkend="rcl.search.multidb">some parameters
|
<link linkend="rcl.indexing.config.multiple">some parameters
|
||||||
should be consistent among the configurations</link>.</para>
|
should be consistent among the configurations</link>.</para>
|
||||||
|
|
||||||
</sect2>
|
</sect2>
|
||||||
@ -331,8 +344,8 @@
|
|||||||
one document. Some file types, like email folders or zip
|
one document. Some file types, like email folders or zip
|
||||||
archives, can hold many individually indexed documents, which may
|
archives, can hold many individually indexed documents, which may
|
||||||
themselves be compound ones. Such hierarchies can go quite
|
themselves be compound ones. Such hierarchies can go quite
|
||||||
deep, and &RCL; can process, for example, an
|
deep, and &RCL; can process, for example, a
|
||||||
<application>ms-word</application>
|
<application>LibreOffice</application>
|
||||||
document stored as an attachment to an email message inside an
|
document stored as an attachment to an email message inside an
|
||||||
email folder archived in a zip file...</para>
|
email folder archived in a zip file...</para>
|
||||||
|
|
||||||
@ -395,22 +408,23 @@ recoll
|
|||||||
the index in
|
the index in
|
||||||
<filename>~/.indexes-email/xapiandb/</filename>.</para>
|
<filename>~/.indexes-email/xapiandb/</filename>.</para>
|
||||||
|
|
||||||
<para>Using multiple configuration directories and
|
<para>Using multiple configuration directories and <link
|
||||||
<link linkend="rcl.install.config.recollconf">configuration
|
linkend="rcl.install.config.recollconf">configuration
|
||||||
options</link> allows you to tailor multiple configurations
|
options</link> allows you to tailor multiple configurations and
|
||||||
and indexes to handle whatever subset of the available data
|
indexes to handle whatever subset of the available data you wish
|
||||||
that you wish to make searchable.</para>
|
to make searchable.</para>
|
||||||
|
|
||||||
</listitem>
|
</listitem>
|
||||||
|
|
||||||
<listitem><para>You can also specify a different storage
|
<listitem><para>For a given configuration directory, you can
|
||||||
location for the index by setting the <varname>dbdir</varname>
|
specify a non-default storage location for the index by setting
|
||||||
parameter in the configuration file
|
the <varname>dbdir</varname> parameter in the configuration file
|
||||||
(see the <link linkend="rcl.install.config.recollconf">configuration
|
(see the <link
|
||||||
section</link>). This method would mainly be of use if you
|
linkend="rcl.install.config.recollconf">configuration
|
||||||
wanted to keep the configuration directory in its default location,
|
section</link>). This method would mainly be of use if you wanted
|
||||||
but desired another location for the index, typically out of
|
to keep the configuration directory in its default location, but
|
||||||
disk occupation concerns.</para>
|
desired another location for the index, typically out of disk
|
||||||
|
occupation concerns.</para>
|
||||||
</listitem>
|
</listitem>
|
||||||
|
|
||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
@ -437,7 +451,7 @@ recoll
|
|||||||
destroyed safely.</para>
|
destroyed safely.</para>
|
||||||
|
|
||||||
<sect2 id="rcl.indexing.storage.format">
|
<sect2 id="rcl.indexing.storage.format">
|
||||||
<title>Xapian index formats</title>
|
<title>&XAP; index formats</title>
|
||||||
|
|
||||||
<para>&XAP; versions usually support several formats for index
|
<para>&XAP; versions usually support several formats for index
|
||||||
storage. A given major &XAP; version will have a current format,
|
storage. A given major &XAP; version will have a current format,
|
||||||
@ -490,8 +504,9 @@ recoll
|
|||||||
<link linkend="rcl.install.config">&RCL; configuration files</link>
|
<link linkend="rcl.install.config">&RCL; configuration files</link>
|
||||||
control which areas of the file system are indexed, and how
|
control which areas of the file system are indexed, and how
|
||||||
files are processed. These variables can be set either by
|
files are processed. These variables can be set either by
|
||||||
editing the text files or using the dialogs in the
|
editing the text files or by using the
|
||||||
<command>recoll</command> GUI.</para>
|
<link linkend="rcl.indexing.config.gui"> dialogs in the
|
||||||
|
<command>recoll</command> GUI</link>.</para>
|
||||||
|
|
||||||
<para>The first time you start <command>recoll</command>, you
|
<para>The first time you start <command>recoll</command>, you
|
||||||
will be asked whether or not you would like it to build the
|
will be asked whether or not you would like it to build the
|
||||||
@ -522,6 +537,61 @@ recoll
|
|||||||
described in the <link linkend="rcl.install.external">external
|
described in the <link linkend="rcl.install.external">external
|
||||||
packages section.</link></para>
|
packages section.</link></para>
|
||||||
|
|
||||||
|
<para>As of Recoll 1.18 there are two incompatible types of Recoll
|
||||||
|
indexes, depending on the treatment of character case and
|
||||||
|
diacritics. The next section describes the two types in more
|
||||||
|
detail.</para>
|
||||||
|
|
||||||
|
<sect2 id="rcl.indexing.config.multiple">
|
||||||
|
<title>Multiple indexes</title>
|
||||||
|
|
||||||
|
<para>Multiple &RCL; indexes can be created by
|
||||||
|
using several configuration directories which are usually set to
|
||||||
|
index different areas of the file system. A specific index can
|
||||||
|
be selected for updating or searching, using the
|
||||||
|
<envar>RECOLL_CONFDIR</envar> environment variable or the
|
||||||
|
<option>-c</option> option to <command>recoll</command> and
|
||||||
|
<command>recollindex</command>.</para>
|
||||||
|
|
||||||
|
<para>A typical usage scenario for the multiple index feature
|
||||||
|
would be for a system administrator to set up a central index
|
||||||
|
for shared data, that you choose to search or not in addition to
|
||||||
|
your personal data. Of course, there are other
|
||||||
|
possibilities. There are many cases where you know the subset of
|
||||||
|
files that should be searched, and where narrowing the search
|
||||||
|
can improve the results. You can achieve approximately the same
|
||||||
|
effect with the directory filter in advanced search, but
|
||||||
|
multiple indexes will have much better performance and may be
|
||||||
|
worth the trouble.</para>
|
||||||
|
|
||||||
|
<para>A <command>recollindex</command> program instance can only
|
||||||
|
update one specific index.</para>
|
||||||
|
|
||||||
|
<para>The main index (defined by
|
||||||
|
<envar>RECOLL_CONFDIR</envar> or <option>-c</option>) is
|
||||||
|
always active. If this is undesirable, you can set up your
|
||||||
|
base configuration to index an empty directory.</para>
|
||||||
|
|
||||||
|
<para>The different search interfaces (GUI, command line, ...)
|
||||||
|
have different methods to define the set of indexes to be
|
||||||
|
used, see the appropriate section.</para>
|
||||||
|
|
||||||
|
<para>If a set of multiple indexes is to be used together for
|
||||||
|
searches, some configuration parameters must be consistent
|
||||||
|
among the set. These are parameters which need to be the same
|
||||||
|
when indexing and searching. As the parameters come from the
|
||||||
|
main configuration when searching, they need to be compatible
|
||||||
|
with what was set when creating the other indexes (which came
|
||||||
|
from their respective configuration directories).</para>
|
||||||
|
|
||||||
|
<para>Most importantly, all indexes to be queried concurrently must
|
||||||
|
have the same option concerning character case and diacritics
|
||||||
|
stripping, but there are other constraints. Most of the
|
||||||
|
relevant parameters are described in the
|
||||||
|
<link linkend="rcl.install.config.recollconf.terms">linked
|
||||||
|
section</link>.</para>
|
||||||
|
|
||||||
|
</sect2>
|
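As a rough illustration of the two selection mechanisms named in the new section (the `-c` option and the `RECOLL_CONFDIR` environment variable), the following Python sketch builds one indexer invocation per configuration directory. The helper names are hypothetical, the directory names are only examples, and nothing is executed here.

```python
import os


def index_command(confdir):
    """Build an argv that selects one index with the -c option."""
    return ["recollindex", "-c", os.path.expanduser(confdir)]


def index_environment(confdir):
    """Build an environment selecting the same index via RECOLL_CONFDIR."""
    env = dict(os.environ)
    env["RECOLL_CONFDIR"] = os.path.expanduser(confdir)
    return env


# Example directories (assumed); each recollindex instance updates
# exactly one index, so there is one command per directory.
for confdir in ("~/.recoll", "~/.indexes-email"):
    print(" ".join(index_command(confdir)))
```

Either result could be handed to `subprocess.run`, as `subprocess.run(index_command(d))` or `subprocess.run(["recollindex"], env=index_environment(d))`. Both forms are meant to be equivalent ways of naming the configuration directory.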
||||||
|
|
||||||
|
|
||||||
<sect2 id="rcl.indexing.config.sens">
|
<sect2 id="rcl.indexing.config.sens">
|
||||||
@ -562,7 +632,7 @@ recoll
|
|||||||
<para>As a cost for added capability, a raw index will be slightly
|
<para>As a cost for added capability, a raw index will be slightly
|
||||||
bigger than a stripped one (around 10%). Also, searches will be
|
bigger than a stripped one (around 10%). Also, searches will be
|
||||||
more complex, so probably slightly slower, and the feature is
|
more complex, so probably slightly slower, and the feature is
|
||||||
still young, and a certain amount of weirdness cannot be
|
still young, so that a certain amount of weirdness cannot be
|
||||||
excluded.</para>
|
excluded.</para>
|
||||||
|
|
||||||
</sect2>
|
</sect2>
|
||||||
@ -709,7 +779,7 @@ recoll
|
|||||||
described here.</para>
|
described here.</para>
|
||||||
<para>Option <option>-z</option> will reset the index when
|
<para>Option <option>-z</option> will reset the index when
|
||||||
starting. This is almost the same as destroying the index
|
starting. This is almost the same as destroying the index
|
||||||
files (the nuance is that the Xapian format version will not
|
files (the nuance is that the &XAP; format version will not
|
||||||
be changed).</para>
|
be changed).</para>
|
||||||
<para>Option <option>-Z</option> will force the update of all
|
<para>Option <option>-Z</option> will force the update of all
|
||||||
documents without resetting the index first. This will not
|
documents without resetting the index first. This will not
|
||||||
@ -905,8 +975,8 @@ fvwm
|
|||||||
<listitem><para>Advanced search (a panel accessed through the
|
<listitem><para>Advanced search (a panel accessed through the
|
||||||
<guilabel>Tools</guilabel> menu or the toolbox bar icon) has
|
<guilabel>Tools</guilabel> menu or the toolbox bar icon) has
|
||||||
multiple entry fields, which you may use to build a logical
|
multiple entry fields, which you may use to build a logical
|
||||||
condition, with additional filtering on file type and location
|
condition, with additional filtering on file type, location
|
||||||
in the file system.</para>
|
in the file system, modification date, and size.</para>
|
||||||
</listitem>
|
</listitem>
|
||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
|
|
||||||
@ -955,60 +1025,53 @@ fvwm
|
|||||||
described in <link linkend="rcl.search.lang">a separate
|
described in <link linkend="rcl.search.lang">a separate
|
||||||
section</link>.</para>
|
section</link>.</para>
|
||||||
|
|
||||||
<para><guilabel>File name</guilabel> will specifically look for file
|
|
||||||
names. The entry will be split at white space characters,
|
|
||||||
and each fragment will be separately expanded, then the search will
|
|
||||||
be for file names matching all fragments (this is new in 1.15,
|
|
||||||
older releases did an OR of the whole thing which did not make
|
|
||||||
sense). Things to know:
|
|
||||||
<itemizedlist>
|
|
||||||
<listitem><para>The search is case- and accent-insensitive.</para>
|
|
||||||
</listitem>
|
|
||||||
<listitem><para>Fragments without any wild card
|
|
||||||
character and not capitalized will be prepended and appended
|
|
||||||
with '*' (ie: <replaceable>etc</replaceable> ->
|
|
||||||
<replaceable>*etc*</replaceable>, but
|
|
||||||
<replaceable>Etc</replaceable> ->
|
|
||||||
<replaceable>etc</replaceable>). Of course it does not make
|
|
||||||
sense to have multiple fragments if one of them is capitalized
|
|
||||||
(as this one will require an exact match).</para>
|
|
||||||
</listitem>
|
|
||||||
<listitem><para>If you want to search for a pattern including
|
|
||||||
white space, use double quotes (ie: <replaceable>"admin
|
|
||||||
note*"</replaceable>).</para>
|
|
||||||
</listitem>
|
|
||||||
<listitem><para>If you have a big index (many files),
|
|
||||||
excessively generic fragments may result in inefficient
|
|
||||||
searches.</para>
|
|
||||||
</listitem>
|
|
||||||
<listitem><para>As an example, <replaceable>inst
|
|
||||||
recoll</replaceable> would match
|
|
||||||
<replaceable>recollinstall.in</replaceable> (and quite a few
|
|
||||||
others...).</para>
|
|
||||||
</listitem>
|
|
||||||
</itemizedlist>
|
|
||||||
The point of having a separate file name
|
|
||||||
search is that wild card expansion can be performed more
|
|
||||||
efficiently on a relatively small subset of the index (allowing
|
|
||||||
wild cards on the left of terms without excessive penalty).</para>
|
|
||||||
|
|
||||||
<para>All search modes allow wildcards inside terms
|
<para>All search modes allow wildcards inside terms
|
||||||
(<literal>*</literal>, <literal>?</literal>,
|
(<literal>*</literal>, <literal>?</literal>,
|
||||||
<literal>[]</literal>). You may want to have a look at the
|
<literal>[]</literal>). You may want to have a look at the
|
||||||
<link linkend="rcl.search.wildcards">section about wildcards</link>
|
<link linkend="rcl.search.wildcards">section about wildcards</link>
|
||||||
for more information about this.</para>
|
for more information about this.</para>
|
||||||
|
|
||||||
|
<para><guilabel>File name</guilabel> will specifically look for file
|
||||||
|
names. The point of having a separate file name
|
||||||
|
search is that wild card expansion can be performed more
|
||||||
|
efficiently on a small subset of the index (allowing
|
||||||
|
wild cards on the left of terms without excessive penalty).
|
||||||
|
Things to know:
|
||||||
|
<itemizedlist>
|
||||||
|
<listitem><para>White space in the entry should match white
|
||||||
|
space in the file name, and is not treated specially.</para>
|
||||||
|
</listitem>
|
||||||
|
<listitem><para>The search is insensitive to character case and
|
||||||
|
accents, independently of the type of index.</para>
|
||||||
|
</listitem>
|
||||||
|
<listitem><para>An entry without any wild card
|
||||||
|
character and not capitalized will be prepended and appended
|
||||||
|
with '*' (ie: <replaceable>etc</replaceable> ->
|
||||||
|
<replaceable>*etc*</replaceable>, but
|
||||||
|
<replaceable>Etc</replaceable> ->
|
||||||
|
<replaceable>etc</replaceable>).</para>
|
||||||
|
</listitem>
|
||||||
|
<listitem><para>If you have a big index (many files),
|
||||||
|
excessively generic fragments may result in inefficient
|
||||||
|
searches.</para>
|
||||||
|
</listitem>
|
||||||
|
</itemizedlist>
|
||||||
|
</para>
|
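The rewritten file name rules can be sketched in Python as follows. This is an illustration under stated assumptions, not Recoll's implementation: `expand_filename_entry` and `filename_matches` are hypothetical names, "capitalized" is taken to mean that the first character is upper case, wildcard characters are taken to be `*`, `?` and `[`, and accent folding is left out for brevity.

```python
import fnmatch

WILDCARD_CHARS = set("*?[")


def expand_filename_entry(entry):
    """Turn a File name entry into a lower-case glob pattern."""
    has_wildcard = any(c in WILDCARD_CHARS for c in entry)
    capitalized = entry[:1].isupper()
    pattern = entry.lower()
    # Only plain, non-capitalized entries are wrapped in '*'.
    if not has_wildcard and not capitalized:
        pattern = "*" + pattern + "*"
    return pattern


def filename_matches(entry, filename):
    """Case-insensitive match of a file name against the expanded entry."""
    return fnmatch.fnmatchcase(filename.lower(), expand_filename_entry(entry))


print(expand_filename_entry("etc"))  # plain entry, wrapped
print(expand_filename_entry("Etc"))  # capitalized entry, exact match only
```

White space gets no special handling in this sketch, in line with the first list item: an entry containing a space is matched as a single pattern.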
||||||
|
|
||||||
<para>You can search for exact phrases (adjacent words in a
|
<para>You can search for exact phrases (adjacent words in a
|
||||||
given order) by enclosing the input inside double quotes. Ex:
|
given order) by enclosing the input inside double quotes. Ex:
|
||||||
<literal>"virtual reality"</literal>.</para>
|
<literal>"virtual reality"</literal>.</para>
|
||||||
|
|
||||||
<para>Character case has no influence on search, except that you
|
<para>When using a stripped index, character case has no influence on
|
||||||
can disable stem expansion for any term by capitalizing it. Ie:
|
search, except that you can disable stem expansion for any term by
|
||||||
a search for <literal>floor</literal> will also normally look for
|
capitalizing it. Ie: a search for <literal>floor</literal> will also
|
||||||
<literal>flooring</literal>, <literal>floored</literal>, etc., but
|
normally look for <literal>flooring</literal>,
|
||||||
a search for <literal>Floor</literal> will only look for
|
<literal>floored</literal>, etc., but a search for
|
||||||
<literal>floor</literal>, in any character case. Stemming can
|
<literal>Floor</literal> will only look for <literal>floor</literal>,
|
||||||
also be disabled globally in the preferences. </para>
|
in any character case. Stemming can also be disabled globally in the
|
||||||
|
preferences. When using a raw index, <link
|
||||||
|
linkend="rcl.search.casediac">the rules are a bit more
|
||||||
|
complicated</link>.</para>
|
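For a stripped index, the capitalization rule above can be pictured with a toy expansion function. The word family below is hypothetical data standing in for a real stemmer database, and `expand_term` is an illustrative name, not part of Recoll.

```python
# Toy word families standing in for a stemmer database (assumed data).
FAMILIES = [["floor", "floors", "flooring", "floored"]]


def expand_term(term, stemming_enabled=True):
    """Return the terms actually searched for a user-entered term."""
    lowered = term.lower()
    # A capitalized term, or globally disabled stemming, means no expansion.
    if not stemming_enabled or term[:1].isupper():
        return [lowered]
    for family in FAMILIES:
        if lowered in family:
            return list(family)
    return [lowered]


print(expand_term("floor"))
print(expand_term("Floor"))
```

In both cases the returned terms are lower case, reflecting that character case itself has no influence on matching in a stripped index.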
||||||
|
|
||||||
<para>&RCL; remembers the last few searches that you
|
<para>&RCL; remembers the last few searches that you
|
||||||
performed. You can use the simple search text entry widget (a
|
performed. You can use the simple search text entry widget (a
|
||||||
@ -1050,10 +1113,7 @@ fvwm
|
|||||||
<para>By default, the document list is presented in order of
|
<para>By default, the document list is presented in order of
|
||||||
relevance (how well the system estimates that the document
|
relevance (how well the system estimates that the document
|
||||||
matches the query). You can sort the result by ascending or
|
matches the query). You can sort the result by ascending or
|
||||||
descending date by using the vertical arrows in the toolbar (the old
|
descending date by using the vertical arrows in the toolbar.</para>
|
||||||
sort tool is gone after release 1.15, because the new <link
|
|
||||||
linkend="rcl.search.gui.restable">result table</link> has much better
|
|
||||||
capability).</para>
|
|
||||||
|
|
||||||
<para>Clicking on the
|
<para>Clicking on the
|
||||||
<literal>Preview</literal> link for an entry will open an
|
<literal>Preview</literal> link for an entry will open an
|
||||||
@ -1520,7 +1580,7 @@ fvwm
|
|||||||
of the string to search for (ie a wildcard expression like
|
of the string to search for (ie a wildcard expression like
|
||||||
<replaceable>*coll</replaceable>), the expansion can take quite
|
<replaceable>*coll</replaceable>), the expansion can take quite
|
||||||
a long time because the full index term list will have to be
|
a long time because the full index term list will have to be
|
||||||
processed. The expansion is currently limited to 200 results for
|
processed. The expansion is currently limited to 10000 results for
|
||||||
wildcards and regular expressions.</para>
|
wildcards and regular expressions.</para>
|
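The cost of a leading wildcard and the role of the result cap can be sketched as below. This is only a model: `expand_wildcard` is a hypothetical function, the term list is plain Python data, and the default limit simply mirrors the figure quoted in the new text.

```python
import fnmatch
from itertools import islice


def expand_wildcard(pattern, index_terms, limit=10000):
    """Expand a wildcard pattern against a term list, capped at `limit`."""
    # A pattern such as '*coll' has no fixed prefix, so every term in the
    # list has to be tested; a prefix pattern could skip most of them.
    matches = (t for t in index_terms if fnmatch.fnmatchcase(t, pattern))
    return list(islice(matches, limit))


terms = ["atoll", "collar", "protocoll", "recoll"]
print(expand_wildcard("*coll", terms))
```

Because the matches are produced lazily, the scan stops as soon as the cap is reached, but a rare pattern still forces a pass over the whole term list.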
||||||
|
|
||||||
<para>Double-clicking on a term in the result list will insert
|
<para>Double-clicking on a term in the result list will insert
|
||||||
@ -1531,9 +1591,9 @@ fvwm
|
|||||||
</sect2>
|
</sect2>
|
||||||
|
|
||||||
<sect2 id="rcl.search.gui.multidb">
|
<sect2 id="rcl.search.gui.multidb">
|
||||||
<title>Multiple databases</title>
|
<title>Multiple indexes</title>
|
||||||
|
|
||||||
<para>See the <link linkend="rcl.search.multidb">section
|
<para>See the <link linkend="rcl.indexing.config.multiple">section
|
||||||
describing the use of multiple indexes</link> for
|
describing the use of multiple indexes</link> for
|
||||||
generalities. Only the aspects concerning
|
generalities. Only the aspects concerning
|
||||||
the <command>recoll</command> GUI are described here.</para>
|
the <command>recoll</command> GUI are described here.</para>
|
||||||
@ -1627,7 +1687,7 @@ fvwm
|
|||||||
of the document container, not only of the text contents (so
|
of the document container, not only of the text contents (so
|
||||||
that ie, a text document with an image added will not be a
|
that ie, a text document with an image added will not be a
|
||||||
duplicate of the text only). Duplicates hiding is controlled
|
duplicate of the text only). Duplicates hiding is controlled
|
||||||
by an entry in the <guilabel>Query configuration</guilabel>
|
by an entry in the <guilabel>GUI configuration</guilabel>
|
||||||
dialog, and is off by default.</para>
|
dialog, and is off by default.</para>
|
||||||
|
|
||||||
</sect2>
|
</sect2>
|
||||||
@ -1821,7 +1881,7 @@ fvwm
|
|||||||
<title>Customizing the search interface</title>
|
<title>Customizing the search interface</title>
|
||||||
|
|
||||||
<para>You can customize some aspects of the search interface by using
|
<para>You can customize some aspects of the search interface by using
|
||||||
the <guimenu>Query configuration</guimenu> entry in the
|
the <guimenu>GUI configuration</guimenu> entry in the
|
||||||
<guimenu>Preferences</guimenu> menu.</para>
|
<guimenu>Preferences</guimenu> menu.</para>
|
||||||
|
|
||||||
<para>There are several tabs in the dialog, dealing with the
|
<para>There are several tabs in the dialog, dealing with the
|
||||||
@ -1868,8 +1928,7 @@ fvwm
|
|||||||
version instead. </para>
|
version instead. </para>
|
||||||
</listitem>
|
</listitem>
|
||||||
|
|
||||||
<listitem><para><guilabel>Use &lt;PRE&gt; tags instead of
|
<listitem><para><guilabel>Plain text to HTML line style</guilabel>:
|
||||||
&lt;BR&gt; to display plain text as HTML in preview</guilabel>:
|
|
||||||
when displaying plain text inside the preview window, &RCL;
|
when displaying plain text inside the preview window, &RCL;
|
||||||
tries to preserve some of the original text line breaks and
|
tries to preserve some of the original text line breaks and
|
||||||
indentation. It can either use PRE HTML tags, which will
|
indentation. It can either use PRE HTML tags, which will
|
||||||
@ -1877,7 +1936,9 @@ fvwm
|
|||||||
scrolling for long lines, or use BR tags to break at the
|
scrolling for long lines, or use BR tags to break at the
|
||||||
original line breaks, which will let the editor introduce
|
original line breaks, which will let the editor introduce
|
||||||
other line breaks according to the window width, but will
|
other line breaks according to the window width, but will
|
||||||
lose some of the original indentation.</para>
|
lose some of the original indentation. The third option has
|
||||||
|
been available in recent releases and is probably now the best
|
||||||
|
one: use PRE tags with line wrapping.</para>
|
||||||
</listitem>
|
</listitem>
|
||||||
|
|
||||||
<listitem><para><guilabel>Use desktop preferences to choose
|
<listitem><para><guilabel>Use desktop preferences to choose
|
||||||
@ -1895,7 +1956,9 @@ fvwm
|
|||||||
that will still be opened according to &RCL; preferences. This
|
that will still be opened according to &RCL; preferences. This
|
||||||
is useful for passing parameters like page numbers or search
|
is useful for passing parameters like page numbers or search
|
||||||
strings to applications that support them
|
strings to applications that support them
|
||||||
(e.g. <application>evince</application>).</para>
|
(e.g. <application>evince</application>). This cannot be done
|
||||||
|
with <command>xdg-open</command> which only supports passing
|
||||||
|
one parameter.</para>
|
||||||
</listitem>
|
</listitem>
|
||||||
|
|
||||||
<listitem><para><guilabel>Choose editor applications</guilabel>
|
<listitem><para><guilabel>Choose editor applications</guilabel>
|
||||||
@ -1917,9 +1980,8 @@ fvwm
|
|||||||
</listitem>
|
</listitem>
|
||||||
|
|
||||||
<listitem><para><guilabel>Start with advanced search dialog open
|
<listitem><para><guilabel>Start with advanced search dialog open
|
||||||
</guilabel> and <guilabel>Start with sort dialog
|
</guilabel>: If you use this dialog frequently, checking
|
||||||
open</guilabel>: If you use these dialogs all the time, checking
|
this entry will make it open when recoll starts.</para>
|
||||||
these entries will get them to open when recoll starts.</para>
|
|
||||||
</listitem>
|
</listitem>
|
||||||
|
|
||||||
<listitem><para><guilabel>Remember sort activation
|
<listitem><para><guilabel>Remember sort activation
|
||||||
@ -1957,9 +2019,9 @@ fvwm
|
|||||||
</listitem>
|
</listitem>
|
||||||
|
|
||||||
<listitem id="rcl.search.gui.custom.resulthead">
|
<listitem id="rcl.search.gui.custom.resulthead">
|
||||||
<para><guilabel>Edit result page html header insert</guilabel>:
|
<para><guilabel>Edit result page HTML header insert</guilabel>:
|
||||||
allows you to define text inserted at the end of the result
|
allows you to define text inserted at the end of the result
|
||||||
page html header.
|
page HTML header.
|
||||||
More detail in the <link linkend="rcl.search.gui.custom.reslist">
|
More detail in the <link linkend="rcl.search.gui.custom.reslist">
|
||||||
result list customisation section.</link></para>
|
result list customisation section.</link></para>
|
||||||
</listitem>
|
</listitem>
|
||||||
@ -2026,11 +2088,10 @@ fvwm
|
|||||||
|
|
||||||
<listitem><para><guilabel>Dynamically build
|
<listitem><para><guilabel>Dynamically build
|
||||||
abstracts</guilabel>: this decides if &RCL; tries to build
|
abstracts</guilabel>: this decides if &RCL; tries to build
|
||||||
document abstracts when displaying the result list. Abstracts
|
document abstracts (lists of <emphasis>snippets</emphasis>)
|
||||||
are constructed by taking context from the document
|
when displaying the result list. Abstracts are constructed by
|
||||||
information, around the search terms. This can slow down
|
taking context from the document information, around the search
|
||||||
result list display significantly for big documents, and you
|
terms.</para>
|
||||||
may want to turn it off.</para>
|
|
||||||
</listitem>
|
</listitem>
|
||||||
|
|
||||||
<listitem><para><guilabel>Synthetic abstract size</guilabel>:
|
<listitem><para><guilabel>Synthetic abstract size</guilabel>:
|
||||||
@ -2081,12 +2142,12 @@ fvwm
|
|||||||
by adjusting two elements:</para>
|
by adjusting two elements:</para>
|
||||||
<itemizedlist>
|
<itemizedlist>
|
||||||
<listitem><para>The paragraph format</para></listitem>
|
<listitem><para>The paragraph format</para></listitem>
|
||||||
<listitem><para>Html code inside the header
|
<listitem><para>HTML code inside the header
|
||||||
section</para></listitem>
|
section</para></listitem>
|
||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
|
|
||||||
<para>These can be edited from the <guilabel>Result list</guilabel>
|
<para>These can be edited from the <guilabel>Result list</guilabel>
|
||||||
tab of the <guilabel>Query configuration</guilabel>.</para>
|
tab of the <guilabel>GUI configuration</guilabel>.</para>
|
||||||
|
|
||||||
<para>Newer versions of Recoll (from 1.17) use a WebKit HTML
|
<para>Newer versions of Recoll (from 1.17) use a WebKit HTML
|
||||||
object by default (this may be disabled at build time), and
|
object by default (this may be disabled at build time), and
|
||||||
@ -2115,10 +2176,6 @@ fvwm
|
|||||||
</listitem>
|
</listitem>
|
||||||
<listitem><formalpara><title>%D</title><para>Date</para></formalpara>
|
<listitem><formalpara><title>%D</title><para>Date</para></formalpara>
|
||||||
</listitem>
|
</listitem>
|
||||||
<listitem><formalpara><title>%E</title><para>Precooked Snippets
|
|
||||||
link (will only appear for documents indexed with page
|
|
||||||
numbers)</para></formalpara>
|
|
||||||
</listitem>
|
|
||||||
<listitem><formalpara><title>%I</title><para>Icon image
|
<listitem><formalpara><title>%I</title><para>Icon image
|
||||||
name. This is normally determined from the mime type. The
|
name. This is normally determined from the mime type. The
|
||||||
associations are defined inside the
|
associations are defined inside the
|
||||||
@ -2131,8 +2188,8 @@ fvwm
|
|||||||
<listitem><formalpara><title>%K</title><para>Keywords (if
|
<listitem><formalpara><title>%K</title><para>Keywords (if
|
||||||
any)</para></formalpara>
|
any)</para></formalpara>
|
||||||
</listitem>
|
</listitem>
|
||||||
<listitem><formalpara><title>%L</title><para>Precooked Preview and
|
<listitem><formalpara><title>%L</title><para>Precooked Preview,
|
||||||
Edit links</para></formalpara>
|
Edit, and possibly Snippets links</para></formalpara>
|
||||||
</listitem>
|
</listitem>
|
||||||
<listitem><formalpara><title>%M</title><para>Mime
|
<listitem><formalpara><title>%M</title><para>Mime
|
||||||
type</para></formalpara>
|
type</para></formalpara>
|
||||||
@ -2156,10 +2213,11 @@ fvwm
|
|||||||
</listitem>
|
</listitem>
|
||||||
</itemizedlist>
|
</itemizedlist>
|
||||||
|
|
||||||
The format of the Preview and Edit links is
|
The format of the Preview, Edit, and Snippets links is
|
||||||
<literal><a href="P%N"></literal>
|
<literal><a href="P%N"></literal>,
|
||||||
and
|
|
||||||
<literal><a href="E%N"></literal>
|
<literal><a href="E%N"></literal>
|
||||||
|
and
|
||||||
|
<literal><a href="A%N"></literal>
|
||||||
where <replaceable>docnum</replaceable> (%N) expands to the document
|
where <replaceable>docnum</replaceable> (%N) expands to the document
|
||||||
number inside the result page).</para>
|
number inside the result page).</para>
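As an illustration of how such percent directives behave, here is a minimal Python sketch of the substitution step. It is not the code Recoll uses: the function name `expand_format` and the choice to leave unknown directives untouched are assumptions made for the example.

```python
import re


def expand_format(fmt: str, values: dict) -> str:
    """Replace each %X directive in fmt with values["X"].

    Directives with no entry in values are left as they are
    (an assumption of this sketch, not documented Recoll behaviour).
    """
    def repl(match):
        key = match.group(1)
        return values.get(key, match.group(0))

    return re.sub(r"%(.)", repl, fmt)


# Build the Preview, Edit and Snippets links for document number 3.
links = expand_format(
    '<a href="P%N">Preview</a> <a href="E%N">Edit</a> <a href="A%N">Snippets</a>',
    {"N": "3"},
)
print(links)
```

The same function can expand a whole paragraph format if the dictionary also carries the other values, for example `D` for the date or `M` for the MIME type.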
|
||||||
|
|
||||||
@ -2377,7 +2435,7 @@ text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r
|
|||||||
capabilities as the complex search interface in the
|
capabilities as the complex search interface in the
|
||||||
GUI.</para>
|
GUI.</para>
|
||||||
|
|
||||||
<para>The language is roughly based on the (seemingly defunct)
|
<para>The language is based on the (seemingly defunct)
|
||||||
<ulink url="http://www.xesam.org/main/XesamUserSearchLanguage95">
|
<ulink url="http://www.xesam.org/main/XesamUserSearchLanguage95">
|
||||||
Xesam</ulink> user search language specification.</para>
|
Xesam</ulink> user search language specification.</para>
|
||||||
|
|
||||||
@ -2405,13 +2463,15 @@ text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r
|
|||||||
<replaceable>potatoes</replaceable> (in any part of the document).</para>
|
<replaceable>potatoes</replaceable> (in any part of the document).</para>
|
||||||
|
|
||||||
<para>An element is composed of an optional field specification,
|
<para>An element is composed of an optional field specification,
|
||||||
and a value, separated by a colon. Example:
|
and a value, separated by a colon (the field separator is the last
|
||||||
<replaceable>Beatles</replaceable>,
|
colon in the element). Example:
|
||||||
|
<replaceable>Eugenie</replaceable>,
|
||||||
<replaceable>author:balzac</replaceable>,
|
<replaceable>author:balzac</replaceable>,
|
||||||
<replaceable>dc:title:grandet</replaceable> </para>
|
<replaceable>dc:title:grandet</replaceable> </para>
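The "last colon" rule can be sketched in a few lines of Python. This is only an illustration of the rule as stated above, not Recoll's parser: the function name `split_element` is hypothetical, and quoted phrases and negation are ignored.

```python
def split_element(element: str):
    """Split a query element into (field, value) at its LAST colon.

    Returns (None, value) when the element has no field specification.
    """
    field, sep, value = element.rpartition(":")
    return (field if sep else None), value


for element in ("Eugenie", "author:balzac", "dc:title:grandet"):
    print(element, "->", split_element(element))
```

With this rule, `dc:title:grandet` is read as the field `dc:title` containing the value `grandet`.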
|
||||||
|
|
||||||
<para>The colon, if present, means "contains". Xesam defines other
|
<para>The colon, if present, means "contains". Xesam defines other
|
||||||
relations, which are not supported for now.</para>
|
relations, which are mostly unsupported for now (except in special
|
||||||
|
cases, described further down).</para>
|
||||||
|
|
||||||
<para>All elements in the search entry are normally combined
|
<para>All elements in the search entry are normally combined
|
||||||
with an implicit AND. It is possible to specify that elements be
|
with an implicit AND. It is possible to specify that elements be
|
||||||
@ -2429,8 +2489,8 @@ text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r
|
|||||||
not
|
not
|
||||||
(<replaceable>word1</replaceable> AND
|
(<replaceable>word1</replaceable> AND
|
||||||
<replaceable>word2</replaceable>) <literal>OR</literal>
|
<replaceable>word2</replaceable>) <literal>OR</literal>
|
||||||
<replaceable>word3</replaceable>. Do not enter explicit
|
<replaceable>word3</replaceable>. Explicit
|
||||||
parenthesis, they are not supported for now.</para>
|
parentheses are <emphasis>not</emphasis> supported.</para>
|
||||||
|
|
||||||
<para>An element preceded by a <literal>-</literal> specifies a
|
<para>An element preceded by a <literal>-</literal> specifies a
|
||||||
term that should <emphasis>not</emphasis> appear. Pure negative
|
term that should <emphasis>not</emphasis> appear. Pure negative
|
||||||
@ -2777,6 +2837,11 @@ dir:recoll dir:src -dir:utils -dir:common
|
|||||||
a word can make for a slow search because &RCL; will have to
|
a word can make for a slow search because &RCL; will have to
|
||||||
scan the whole index term list to find the matches.</para>
|
scan the whole index term list to find the matches.</para>
|
||||||
</listitem>
|
</listitem>
|
||||||
|
<listitem><para>When working with a raw index (preserving
|
||||||
|
character case and diacritics), the literal part of a wildcard
|
||||||
|
expression will be matched exactly for case and
|
||||||
|
diacritics.</para>
|
||||||
|
</listitem>
|
||||||
<listitem><para>Using a <literal>*</literal> at the end of a
|
<listitem><para>Using a <literal>*</literal> at the end of a
|
||||||
word can produce more matches than you would think, and
|
word can produce more matches than you would think, and
|
||||||
strange search results. You can use the <link
|
strange search results. You can use the <link
|
||||||
@ -2817,7 +2882,14 @@ dir:recoll dir:src -dir:utils -dir:common
|
|||||||
term</literal> at the beginning of the text would be a match for
|
term</literal> at the beginning of the text would be a match for
|
||||||
<literal>"^my term"o5</literal>.</para>
|
<literal>"^my term"o5</literal>.</para>
|
||||||
|
|
||||||
</sect2>
|
<para>Anchored searches can be very useful for searches inside
|
||||||
|
somewhat structured documents like scientific articles, in case
|
||||||
|
explicit metadata has not been supplied (a most frequent case), for
|
||||||
|
example for looking for matches inside the abstract or the list of
|
||||||
|
authors (which occur at the top of the document).</para>
|
||||||
|
|
||||||
|
|
||||||
|
</sect2>
|
||||||
|
|
||||||
</sect1> <!-- wildchars and anchors -->
|
</sect1> <!-- wildchars and anchors -->
|
||||||
|
|
||||||
@ -2892,61 +2964,13 @@ dir:recoll dir:src -dir:utils -dir:common
|
|||||||
|
|
||||||
</sect1> <!-- rcl.search.desktop -->
|
</sect1> <!-- rcl.search.desktop -->
|
||||||
|
|
||||||
|
|
||||||
<sect1 id="rcl.search.multidb">
|
|
||||||
<title>Multiple databases</title>
|
|
||||||
|
|
||||||
<para>Multiple &RCL; databases or indexes can be created by
|
|
||||||
using several configuration directories which are usually set to
|
|
||||||
index different areas of the file system. A specific index can
|
|
||||||
be selected for updating or searching, using the
|
|
||||||
<envar>RECOLL_CONFDIR</envar> environment variable or the
|
|
||||||
<option>-c</option> option to <command>recoll</command> and
|
|
||||||
<command>recollindex</command>.</para>
|
|
||||||
|
|
||||||
<para>A typical usage scenario for the multiple index feature
|
|
||||||
would be for a system administrator to set up a central index
|
|
||||||
for shared data, that you choose to search or not in addition to
|
|
||||||
your personal data. Of course, there are other
|
|
||||||
possibilities. There are many cases where you know the subset of
|
|
||||||
files that should be searched, and where narrowing the search
|
|
||||||
can improve the results. You can achieve approximately the same
|
|
||||||
effect with the directory filter in advanced search, but
|
|
||||||
multiple indexes will have much better performance and may be
|
|
||||||
worth the trouble.</para>
|
|
||||||
|
|
||||||
<para>A <command>recollindex</command> program instance can only
|
|
||||||
update one specific index.</para>
|
|
||||||
|
|
||||||
<para>The main index (defined by
|
|
||||||
<envar>RECOLL_CONFDIR</envar> or <option>-c</option>) is
|
|
||||||
always active. If this is undesirable, you can set up your
|
|
||||||
base configuration to index an empty directory.</para>
|
|
||||||
|
|
||||||
<para>The different search interfaces (GUI, command line, ...)
|
|
||||||
have different methods to define the set of indexes to be
|
|
||||||
used, see the appropriate section.</para>
|
|
||||||
|
|
||||||
<para>If a set of multiple indexes are to be used together for
|
|
||||||
searches, some configuration parameters must be consistent
|
|
||||||
among the set. These are parameters which need to be the same
|
|
||||||
when indexing and searching. As the parameters come from the
|
|
||||||
main configuration when searching, they need to be compatible
|
|
||||||
with what was set when creating the other indexes (which came
|
|
||||||
from their respective configuration directories. Most of the
|
|
||||||
relevant parameters are described in the following
|
|
||||||
<link linkend="rcl.install.config.recollconf.terms">linked
|
|
||||||
section</link>.</para>
|
|
||||||
|
|
||||||
</sect1> <!-- multiple databases -->
|
|
||||||
|
|
||||||
</chapter> <!-- Search -->
|
</chapter> <!-- Search -->
|
||||||
|
|
||||||
|
|
||||||
<chapter id="rcl.program">
|
<chapter id="rcl.program">
|
||||||
<title>Programming interface</title>
|
<title>Programming interface</title>
|
||||||
|
|
||||||
<para>&RCL; has an Application programming Interface, usable both
|
<para>&RCL; has an Application Programming Interface, usable both
|
||||||
for indexing and searching, currently accessible from the
|
for indexing and searching, currently accessible from the
|
||||||
<application>Python</application> language.</para>
|
<application>Python</application> language.</para>
|
||||||
|
|
||||||
@ -2972,8 +2996,8 @@ dir:recoll dir:src -dir:utils -dir:common
|
|||||||
<listitem><para>Simple filters (the old ones) run once and
|
<listitem><para>Simple filters (the old ones) run once and
|
||||||
exit. They can be bare programs like
|
exit. They can be bare programs like
|
||||||
<application>antiword</application>, or shell-scripts using other
|
<application>antiword</application>, or shell-scripts using other
|
||||||
programs. They are very simple to write, just having to write the
|
programs. They are very simple to write, because they just need
|
||||||
text to the standard output.</para>
|
to output the converted text to the standard output.</para>
|
||||||
</listitem>
|
</listitem>
|
||||||
<listitem><para>Multiple filters, new in 1.13, run as long as
|
<listitem><para>Multiple filters, new in 1.13, run as long as
|
||||||
their master process (ie: recollindex) is active. They can
|
their master process (ie: recollindex) is active. They can
|
||||||
@ -3008,12 +3032,12 @@ dir:recoll dir:src -dir:utils -dir:common
|
|||||||
source file name. They should output the result to stdout.</para>
|
source file name. They should output the result to stdout.</para>
|
||||||
|
|
||||||
<para>When writing a filter, you should decide if it will output
|
<para>When writing a filter, you should decide if it will output
|
||||||
plain text or html. Plain text is simpler, but you will not be able
|
plain text or HTML. Plain text is simpler, but you will not be able
|
||||||
to add metadata or vary the output character encoding (this will be
|
to add metadata or vary the output character encoding (this will be
|
||||||
defined in a configuration file). Additionally, some formatting may
|
defined in a configuration file). Additionally, some formatting may
|
||||||
easier to preserve when previewing html. Actually the deciding factor
|
be easier to preserve when previewing HTML. Actually the deciding factor
|
||||||
is metadata: &RCL; has a way to <link linkend="rcl.program.filters.html">
|
is metadata: &RCL; has a way to <link linkend="rcl.program.filters.html">
|
||||||
extract metadata from the html header and use it for field
|
extract metadata from the HTML header and use it for field
|
||||||
searches.</link>.</para>
|
searches.</link>.</para>
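To make the HTML option more concrete, here is a minimal sketch of a simple filter written in Python. It assumes that the source file name arrives as the first command-line argument and that the input is plain text. The function name `text_to_html` and the use of the file name as the title are choices made for this example, not something Recoll requires.

```python
import html
import sys


def text_to_html(text: str, title: str = "", charset: str = "utf-8") -> str:
    """Wrap plain text in a small HTML document.

    The character set is declared in the header, and the characters
    '&', '<' and '>' are escaped so they cannot be read as markup.
    """
    body = html.escape(text, quote=False)
    return (
        "<html><head>\n"
        f'<meta http-equiv="Content-Type" content="text/html; charset={charset}">\n'
        f"<title>{html.escape(title, quote=False)}</title>\n"
        "</head><body>\n"
        f"<pre>{body}</pre>\n"
        "</body></html>\n"
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumption: the file to convert is named by the first argument.
    with open(sys.argv[1], encoding="utf-8", errors="replace") as source:
        document = text_to_html(source.read(), title=sys.argv[1])
    # Write bytes so the output really uses the declared character set.
    sys.stdout.buffer.write(document.encode("utf-8"))
```

Further `meta` elements could be added to the header in the same way to carry metadata for field searches.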
|
||||||
|
|
||||||
<para>The <envar>RECOLL_FILTER_FORPREVIEW</envar> environment
|
<para>The <envar>RECOLL_FILTER_FORPREVIEW</envar> environment
|
||||||
@ -3121,7 +3145,7 @@ application/x-chm = execm rclchm
|
|||||||
should be transformed into
|
should be transformed into
|
||||||
"<literal>&lt;</literal>". This is not always properly
|
"<literal>&lt;</literal>". This is not always properly
|
||||||
done by translating programs which output HTML, and of
|
done by translating programs which output HTML, and of
|
||||||
course nerver by those which output plain text.</para>
|
course never by those which output plain text.</para>
|
||||||
|
|
||||||
<para>The character set needs to be specified in the
|
<para>The character set needs to be specified in the
|
||||||
header. It does not need to be UTF-8 (&RCL; will take care
|
header. It does not need to be UTF-8 (&RCL; will take care
|
||||||
@ -3197,11 +3221,51 @@ application/x-chm = execm rclchm
|
|||||||
other aspects of fields handling is defined inside the
|
other aspects of fields handling is defined inside the
|
||||||
<filename>fields</filename> configuration file.</para>
|
<filename>fields</filename> configuration file.</para>
|
||||||
|
|
||||||
|
<para>The sequence of events for field processing is as follows:
|
||||||
|
<itemizedlist>
|
||||||
|
<listitem><para>During indexing,
|
||||||
|
<command>recollindex</command> scans all <literal>meta</literal>
|
||||||
|
fields in HTML documents (most document types are transformed
|
||||||
|
into HTML at some point). It compares the name for each element
|
||||||
|
to the configuration defining what should be done with fields
|
||||||
|
(the <filename>fields</filename> file)</para>
|
||||||
|
</listitem>
|
||||||
|
<listitem><para>If the name for the <literal>meta</literal>
|
||||||
|
element matches one for a field that should be indexed, the
|
||||||
|
contents are processed and the terms are entered into the index
|
||||||
|
with the prefix defined in the <filename>fields</filename>
|
||||||
|
file.</para>
|
||||||
|
</listitem>
|
||||||
|
<listitem><para>If the name for the <literal>meta</literal> element
|
||||||
|
matches one for a field that should be stored, the content of the
|
||||||
|
element is stored with the document data record, from which it
|
||||||
|
can be extracted and displayed at query time.</para>
|
||||||
|
</listitem>
|
||||||
|
<listitem><para>At query time, if a field search is performed, the
|
||||||
|
index prefix is computed and the match is only performed against
|
||||||
|
appropriately prefixed terms in the index.</para>
|
||||||
|
</listitem>
|
||||||
|
<listitem><para>At query time, the field can be displayed inside
|
||||||
|
the result list by using the appropriate directive in the
|
||||||
|
definition of the <link
|
||||||
|
linkend="rcl.search.gui.custom.reslist">result list paragraph
|
||||||
|
format</link>. All fields are displayed on the fields screen of
|
||||||
|
the preview window (which you can reach through the right-click
|
||||||
|
menu). This is independent of whether the search which
|
||||||
|
produced the results used the field or not.</para>
|
||||||
|
</listitem>
|
||||||
|
</itemizedlist>
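The sequence above can be sketched with a toy in-memory index. Everything in this Python example is hypothetical: the prefix table stands in for the fields file, the class `TinyIndex` is not part of Recoll, and real term processing is much more involved than splitting on whitespace.

```python
from collections import defaultdict

# Hypothetical stand-ins for what the fields file would define.
FIELD_PREFIXES = {"author": "A", "title": "S"}  # indexed fields and their prefixes
STORED_FIELDS = {"author"}                      # fields kept with the document record


class TinyIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # prefixed term -> set of document ids
        self.stored = defaultdict(dict)   # document id -> stored field values

    def add_document(self, doc_id, meta):
        """Indexing time: process each meta element found in the document."""
        for name, content in meta.items():
            prefix = FIELD_PREFIXES.get(name)
            if prefix is not None:
                # Indexed field: enter its terms with the field prefix.
                for term in content.lower().split():
                    self.postings[prefix + term].add(doc_id)
            if name in STORED_FIELDS:
                # Stored field: keep the raw content for display at query time.
                self.stored[doc_id][name] = content

    def search_field(self, name, term):
        """Query time: match only against terms carrying the field prefix."""
        prefix = FIELD_PREFIXES[name]
        return self.postings.get(prefix + term.lower(), set())


index = TinyIndex()
index.add_document(1, {"author": "Honore de Balzac", "title": "Eugenie Grandet"})
print(index.search_field("author", "balzac"))  # matches document 1
print(index.search_field("title", "balzac"))   # no match: wrong prefix
print(index.stored[1])                         # stored values for display
```

The point of the prefix is visible in the two searches: the same word matches in one field and not in the other, because the prefixed terms are distinct entries in the index.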
|
||||||
|
|
||||||
<para>You can find more information in the
|
<para>You can find more information in the
|
||||||
<link linkend="rcl.install.config.fields">section about the
|
<link linkend="rcl.install.config.fields">section about the
|
||||||
<filename>fields</filename> file</link>, or in comments inside the
|
<filename>fields</filename> file</link>, or in comments inside the
|
||||||
file.</para>
|
file.</para>
|
||||||
|
|
||||||
|
<para>You can also have a look at the <ulink
|
||||||
|
url="https://bitbucket.org/medoc/recoll/wiki/HandleCustomField">example
|
||||||
|
on the Wiki</ulink>, detailing
|
||||||
|
how one could add a <emphasis>page count</emphasis> field to PDF
|
||||||
|
documents for displaying inside result lists.</para>
|
||||||
|
|
||||||
</sect1>
|
</sect1>
|
||||||
|
|
||||||
@ -3276,8 +3340,7 @@ application/x-chm = execm rclchm
|
|||||||
<para>&RCL; versions after 1.11 define a Python programming
|
<para>&RCL; versions after 1.11 define a Python programming
|
||||||
interface, both for searching and indexing.</para>
|
interface, both for searching and indexing.</para>
|
||||||
|
|
||||||
<para>The Python interface is not built by default and can be
|
<para>The Python interface can be found in the source package,
|
||||||
found in the source package,
|
|
||||||
under <filename>python/recoll</filename>.</para>
|
under <filename>python/recoll</filename>.</para>
|
||||||
<para>In order to build the module, you should first build
|
<para>In order to build the module, you should first build
|
||||||
or re-build the Recoll library using position-independent
|
or re-build the Recoll library using position-independent
|
||||||
@ -4389,6 +4452,12 @@ unac_except_trans =
|
|||||||
character, you could very well have something like
|
character, you could very well have something like
|
||||||
<literal>üue</literal> in the list.</para>
|
<literal>üue</literal> in the list.</para>
|
||||||
|
|
||||||
|
<para>The default value set for
|
||||||
|
<literal>unac_except_trans</literal> can't be listed here
|
||||||
|
because I have trouble with SGML and UTF-8, but it only
|
||||||
|
contains ligature decompositions: German ss, oe, ae, fi,
|
||||||
|
fl.</para>
|
||||||
|
|
||||||
<para>This parameter can't be defined for subdirectories, it
|
<para>This parameter can't be defined for subdirectories, it
|
||||||
is global, because there is no way to do otherwise when
|
is global, because there is no way to do otherwise when
|
||||||
querying. If you have document sets which would need different
|
querying. If you have document sets which would need different
|
||||||
|
|||||||