doc for indexStoreDocText

Jean-Francois Dockes 2018-01-25 13:41:19 +01:00
parent 3d4fd3c62e
commit d14ecc4ff3
4 changed files with 115 additions and 22 deletions

@ -174,7 +174,7 @@ members. This is passed to the filters in the environment
as RECOLL_FILTER_MAXMEMBERKB.</para></listitem></varlistentry>
</variablelist></sect3>
<sect3 id="RCL.INSTALL.CONFIG.RECOLLCONF.TERMS">
<title>Parameters affecting how we generate terms </title><variablelist>
<title>Parameters affecting how we generate terms and organize the index </title><variablelist>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.INDEXSTRIPCHARS">
<term><varname>indexStripChars</varname></term>
<listitem><para>Decide if we store
@ -184,6 +184,34 @@ will be bigger, and some marginal weirdness may sometimes occur. The
default is a stripped index. When using multiple indexes for a search,
this parameter must be defined identically for all. Changing the value
implies an index reset.</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.INDEXSTOREDOCTEXT">
<term><varname>indexStoreDocText</varname></term>
<listitem><para>Decide if we store the
documents' text content in the index. Storing the text
allows extracting snippets from it at query time, instead of building
them from index position data.
Newer Xapian index formats have rendered our use of position lists
unacceptably slow in some cases. The last Xapian index format with good
performance for the old method is Chert, which is the default for Xapian 1.2,
still supported but not the default in 1.4, and will be dropped in 1.6.
The stored document text is translated from its original format to UTF-8
plain text, but not stripped of upper-case, diacritics, or punctuation
signs. Storing it increases the index size by 10-20% typically, but also
allows for nicer snippets, so it may be worth enabling it even if not
strictly needed for performance if you can afford the space.
The variable only has an effect when creating an index, meaning that the
xapiandb directory must not exist yet. Its exact effect depends on the
Xapian version.
For Xapian 1.4, if the variable is set to 0, the Chert format will be
used, and the text will not be stored. If the variable is 1, Glass will
be used, and the text stored.
For Xapian 1.2, and for versions 1.5 and newer, the index format is
always the default, but the variable controls whether the text is stored
and which abstract generation method is used. With Xapian 1.5 and later, and
the variable set to 0, abstract generation may be very slow, but this
setting may still be useful to save space if you do not use abstract
generation at all.
</para></listitem></varlistentry>
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NONUMBERS">
<term><varname>nonumbers</varname></term>
<listitem><para>Decides if terms will be

@ -1337,7 +1337,7 @@ alink="#0000FF">
other constraints. Most of the relevant parameters are
described in the <a class="link" href=
"#RCL.INSTALL.CONFIG.RECOLLCONF.TERMS" title=
"6.4.2.2.&nbsp;Parameters affecting how we generate terms">
"6.4.2.2.&nbsp;Parameters affecting how we generate terms and organize the index">
linked section</a>.</p>
<p>The different search interfaces (GUI, command line,
...) have different methods to define the set of indexes
@ -6462,18 +6462,28 @@ alink="#0000FF">
</dd>
<dt><span class=
"term">Query.execute(query_string, stemming=1,
stemlang="english")</span></dt>
stemlang="english",
fetchtext=False)</span></dt>
<dd>
<p>Starts a search for <em class=
"replaceable"><code>query_string</code></em>,
a <span class="application">Recoll</span>
search language string.</p>
search language string. If the index stores
the document texts and <code class=
"literal">fetchtext</code> is True, the extracted
document text is stored in <code class=
"literal">doc.text</code>.</p>
</dd>
<dt><span class=
"term">Query.executesd(SearchData)</span></dt>
"term">Query.executesd(SearchData,
fetchtext=False)</span></dt>
<dd>
<p>Starts a search for the query defined by
the SearchData object.</p>
the SearchData object. If the index stores
the document texts and <code class=
"literal">fetchtext</code> is True, the extracted
document text is stored in <code class=
"literal">doc.text</code>.</p>
</dd>
<dt><span class=
"term">Query.fetchmany(size=query.arraysize)</span></dt>
@ -8256,7 +8266,8 @@ for i in range(nres):
<h4 class="title"><a name=
"RCL.INSTALL.CONFIG.RECOLLCONF.TERMS" id=
"RCL.INSTALL.CONFIG.RECOLLCONF.TERMS"></a>6.4.2.2.&nbsp;Parameters
affecting how we generate terms</h4>
affecting how we generate terms and organize the
index</h4>
</div>
</div>
</div>
@ -8277,6 +8288,45 @@ for i in range(nres):
implies an index reset.</p>
</dd>
<dt><a name=
"RCL.INSTALL.CONFIG.RECOLLCONF.INDEXSTOREDOCTEXT"
id=
"RCL.INSTALL.CONFIG.RECOLLCONF.INDEXSTOREDOCTEXT"></a><span class="term"><code class="varname">indexStoreDocText</code></span></dt>
<dd>
<p>Decide if we store the documents' text content
in the index. Storing the text allows extracting
snippets from it at query time, instead of
building them from index position data. Newer
Xapian index formats have rendered our use of
position lists unacceptably slow in some cases.
The last Xapian index format with good
performance for the old method is Chert, which is
the default for Xapian 1.2, still supported but
not the default in 1.4, and will be dropped in 1.6. The stored
document text is translated from its original
format to UTF-8 plain text, but not stripped of
upper-case, diacritics, or punctuation signs.
Storing it increases the index size by 10-20%
typically, but also allows for nicer snippets, so
it may be worth enabling it even if not strictly
needed for performance if you can afford the
space. The variable only has an effect when
creating an index, meaning that the xapiandb
directory must not exist yet. Its exact effect
depends on the Xapian version. For Xapian 1.4, if
the variable is set to 0, the Chert format will
be used, and the text will not be stored. If the
variable is 1, Glass will be used, and the text
stored. For Xapian 1.2, and for versions 1.5 and
newer, the index format is always the default,
but the variable controls whether the text is
stored and which abstract generation method is
used. With Xapian 1.5 and later, and the
variable set to 0, abstract generation may be
very slow, but this setting may still be useful
to save space if you do not use abstract
generation at all.</p>
</dd>
<dt><a name=
"RCL.INSTALL.CONFIG.RECOLLCONF.NONUMBERS" id=
"RCL.INSTALL.CONFIG.RECOLLCONF.NONUMBERS"></a><span class="term"><code class="varname">nonumbers</code></span></dt>
<dd>

@ -1847,7 +1847,8 @@
current result. I can't remember a single instance where this
function was actually useful to me...</para>
<para id="RCL.SEARCH.GUI.RESULTLIST.MENU.SNIPPETS">The <guilabel>Open Snippets Window</guilabel> entry will only
<para id="RCL.SEARCH.GUI.RESULTLIST.MENU.SNIPPETS">The
<guilabel>Open Snippets Window</guilabel> entry will only
appear for documents which support page breaks (typically
PDF, Postscript, DVI). The snippets window lists extracts from
the document, taken around search terms occurrences, along with the
@ -5013,16 +5014,22 @@
<varlistentry>
<term>Query.execute(query_string, stemming=1,
stemlang="english")</term>
stemlang="english", fetchtext=False)</term>
<listitem><para>Starts a search
for <replaceable>query_string</replaceable>, a &RCL;
search language string.</para></listitem>
search language string. If the index stores the document
texts and <literal>fetchtext</literal> is True, the
extracted document text is stored in
<literal>doc.text</literal>.</para></listitem>
</varlistentry>
<varlistentry>
<term>Query.executesd(SearchData)</term>
<listitem><para>Starts a search for the query defined by the
SearchData object.</para></listitem>
<term>Query.executesd(SearchData, fetchtext=False)</term>
<listitem><para>Starts a search for the query defined by
the SearchData object. If the index stores the document
texts and <literal>fetchtext</literal> is True, the
extracted document text is stored in
<literal>doc.text</literal>.</para></listitem>
</varlistentry>
<varlistentry>

@ -241,19 +241,27 @@ indexStripChars = 1
# performance for the old method is Chert, which is the default for Xapian 1.2,
# still supported but not the default in 1.4, and will be dropped in 1.6.
#
# The document text is translated from its original format to UTF-8 plain
# text, but not stripped of upper-case, diacritics, or punctuation
# The stored document text is translated from its original format to UTF-8
# plain text, but not stripped of upper-case, diacritics, or punctuation
# signs. Storing it increases the index size by 10-20% typically, but also
# allows for nicer snippets, so it may be worth enabling it even if not
# strictly needed for performance if you can afford the space.
#
# The variable only has an effect when creating an index, tested as
# xapiandb directory not existing. Its exact effect depends on the Xapian
# version. For Xapian 1.2, you can force the new method by setting the
# variable to 1. For Xapian 1.4, the Chert format will be used, and the text
# will not be stored if the variable is not set or set to 0. For later
# Xapian versions, the variable does nothing, the text is always stored.
# </desc></var>
# The variable only has an effect when creating an index, meaning that the
# xapiandb directory must not exist yet. Its exact effect depends on the
# Xapian version.
#
# For Xapian 1.4, if the variable is set to 0, the Chert format will be
# used, and the text will not be stored. If the variable is 1, Glass will
# be used, and the text stored.
#
# For Xapian 1.2, and for versions 1.5 and newer, the index format is
# always the default, but the variable controls whether the text is stored
# and which abstract generation method is used. With Xapian 1.5 and later, and
# the variable set to 0, abstract generation may be very slow, but this
# setting may still be useful to save space if you do not use abstract
# generation at all.
# </descr></var>
indexStoreDocText = 1
# <var name="nonumbers" type="bool"><brief>Decides if terms will be