doc for indexStoreDocText
parent 3d4fd3c62e
commit d14ecc4ff3
@ -174,7 +174,7 @@ members. This is passed to the filters in the environment
|
|||||||
as RECOLL_FILTER_MAXMEMBERKB.</para></listitem></varlistentry>
|
as RECOLL_FILTER_MAXMEMBERKB.</para></listitem></varlistentry>
|
||||||
</variablelist></sect3>
|
</variablelist></sect3>
|
||||||
<sect3 id="RCL.INSTALL.CONFIG.RECOLLCONF.TERMS">
|
<sect3 id="RCL.INSTALL.CONFIG.RECOLLCONF.TERMS">
|
||||||
<title>Parameters affecting how we generate terms </title><variablelist>
|
<title>Parameters affecting how we generate terms and organize the index </title><variablelist>
|
||||||
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.INDEXSTRIPCHARS">
|
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.INDEXSTRIPCHARS">
|
||||||
<term><varname>indexStripChars</varname></term>
|
<term><varname>indexStripChars</varname></term>
|
||||||
<listitem><para>Decide if we store
|
<listitem><para>Decide if we store
|
||||||
@ -184,6 +184,34 @@ will be bigger, and some marginal weirdness may sometimes occur. The
|
|||||||
default is a stripped index. When using multiple indexes for a search,
|
default is a stripped index. When using multiple indexes for a search,
|
||||||
this parameter must be defined identically for all. Changing the value
|
this parameter must be defined identically for all. Changing the value
|
||||||
implies an index reset.</para></listitem></varlistentry>
|
implies an index reset.</para></listitem></varlistentry>
|
||||||
|
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.INDEXSTOREDOCTEXT">
|
||||||
|
<term><varname>indexStoreDocText</varname></term>
|
||||||
|
<listitem><para>Decide if we store the
|
||||||
|
documents' text content in the index. Storing the text
|
||||||
|
allows extracting snippets from it at query time, instead of building
|
||||||
|
them from index position data.
|
||||||
|
Newer Xapian index formats have rendered our use of positions list
|
||||||
|
unacceptably slow in some cases. The last Xapian index format with good
|
||||||
|
performance for the old method is Chert, which is default for 1.2, still
|
||||||
|
supported but not default in 1.4 and will be dropped in 1.6.
|
||||||
|
The stored document text is translated from its original format to UTF-8
|
||||||
|
plain text, but not stripped of upper-case, diacritics, or punctuation
|
||||||
|
signs. Storing it increases the index size by 10-20% typically, but also
|
||||||
|
allows for nicer snippets, so it may be worth enabling it even if not
|
||||||
|
strictly needed for performance if you can afford the space.
|
||||||
|
The variable only has an effect when creating an index, meaning that the
|
||||||
|
xapiandb directory must not exist yet. Its exact effect depends on the
|
||||||
|
Xapian version.
|
||||||
|
For Xapian 1.4, if the variable is set to 0, the Chert format will be
|
||||||
|
used, and the text will not be stored. If the variable is 1, Glass will
|
||||||
|
be used, and the text stored.
|
||||||
|
For Xapian 1.2, and for versions 1.5 and newer, the index format is
|
||||||
|
always the default, but the variable controls if the text is stored or
|
||||||
|
not, and the abstract generation method. With Xapian 1.5 and later, and
|
||||||
|
the variable set to 0, abstract generation may be very slow, but this
|
||||||
|
setting may still be useful to save space if you do not use abstract
|
||||||
|
generation at all.
|
||||||
|
</para></listitem></varlistentry>
|
||||||
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NONUMBERS">
|
<varlistentry id="RCL.INSTALL.CONFIG.RECOLLCONF.NONUMBERS">
|
||||||
<term><varname>nonumbers</varname></term>
|
<term><varname>nonumbers</varname></term>
|
||||||
<listitem><para>Decides if terms will be
|
<listitem><para>Decides if terms will be
|
||||||
|
|||||||
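Note on the hunk above: the version-dependent behaviour of indexStoreDocText is easy to misread in prose, so here is a minimal Python sketch of the rules as the new text states them. The helper `plan_new_index` and the `IndexPlan` tuple are hypothetical names made up for illustration; they are not part of Recoll or Xapian. The value "default" stands for whatever format the Xapian build picks by default, which the text does not name for versions 1.5 and newer.

```python
from typing import NamedTuple, Tuple


class IndexPlan(NamedTuple):
    index_format: str   # "chert", "glass", or "default" (the Xapian build's default)
    text_stored: bool   # whether the document text is kept in the index
    snippets_from: str  # "stored text" or "position lists"


def plan_new_index(xapian_version: Tuple[int, int],
                   index_store_doc_text: bool) -> IndexPlan:
    """Hypothetical helper mirroring the documented rules; not Recoll code.

    Only meaningful when the index is being created, since the variable
    is ignored once the xapiandb directory exists.
    """
    # The variable always decides text storage and the snippet method.
    snippets_from = "stored text" if index_store_doc_text else "position lists"
    if xapian_version == (1, 4):
        # Xapian 1.4: the variable also selects the index format.
        index_format = "glass" if index_store_doc_text else "chert"
    else:
        # Xapian 1.2, and 1.5 and newer: the format is always the default.
        index_format = "default"
    return IndexPlan(index_format, index_store_doc_text, snippets_from)


# Print the whole decision table for the versions the text mentions.
for version in [(1, 2), (1, 4), (1, 5)]:
    for value in (False, True):
        print(version, int(value), plan_new_index(version, value))
```

The combination to watch is Xapian 1.5 or later with the variable set to 0: snippets then come from position lists, which the text warns may be very slow.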
@ -1337,7 +1337,7 @@ alink="#0000FF">
|
|||||||
other constraints. Most of the relevant parameters are
|
other constraints. Most of the relevant parameters are
|
||||||
described in the <a class="link" href=
|
described in the <a class="link" href=
|
||||||
"#RCL.INSTALL.CONFIG.RECOLLCONF.TERMS" title=
|
"#RCL.INSTALL.CONFIG.RECOLLCONF.TERMS" title=
|
||||||
"6.4.2.2. Parameters affecting how we generate terms">
|
"6.4.2.2. Parameters affecting how we generate terms and organize the index">
|
||||||
linked section</a>.</p>
|
linked section</a>.</p>
|
||||||
<p>The different search interfaces (GUI, command line,
|
<p>The different search interfaces (GUI, command line,
|
||||||
...) have different methods to define the set of indexes
|
...) have different methods to define the set of indexes
|
||||||
@ -6462,18 +6462,28 @@ alink="#0000FF">
|
|||||||
</dd>
|
</dd>
|
||||||
<dt><span class=
|
<dt><span class=
|
||||||
"term">Query.execute(query_string, stemming=1,
|
"term">Query.execute(query_string, stemming=1,
|
||||||
stemlang="english")</span></dt>
|
stemlang="english",
|
||||||
|
fetchtext=False)</span></dt>
|
||||||
<dd>
|
<dd>
|
||||||
<p>Starts a search for <em class=
|
<p>Starts a search for <em class=
|
||||||
"replaceable"><code>query_string</code></em>,
|
"replaceable"><code>query_string</code></em>,
|
||||||
a <span class="application">Recoll</span>
|
a <span class="application">Recoll</span>
|
||||||
search language string.</p>
|
search language string. If the index stores
|
||||||
|
the document texts and <code class=
|
||||||
|
"literal">fetchtext</code> is True, store the
|
||||||
|
document extracted text in <code class=
|
||||||
|
"literal">doc.text</code>.</p>
|
||||||
</dd>
|
</dd>
|
||||||
<dt><span class=
|
<dt><span class=
|
||||||
"term">Query.executesd(SearchData)</span></dt>
|
"term">Query.executesd(SearchData,
|
||||||
|
fetchtext=False)</span></dt>
|
||||||
<dd>
|
<dd>
|
||||||
<p>Starts a search for the query defined by
|
<p>Starts a search for the query defined by
|
||||||
the SearchData object.</p>
|
the SearchData object. If the index stores
|
||||||
|
the document texts and <code class=
|
||||||
|
"literal">fetchtext</code> is True, store the
|
||||||
|
document extracted text in <code class=
|
||||||
|
"literal">doc.text</code>.</p>
|
||||||
</dd>
|
</dd>
|
||||||
<dt><span class=
|
<dt><span class=
|
||||||
"term">Query.fetchmany(size=query.arraysize)</span></dt>
|
"term">Query.fetchmany(size=query.arraysize)</span></dt>
|
||||||
@ -8256,7 +8266,8 @@ for i in range(nres):
|
|||||||
<h4 class="title"><a name=
|
<h4 class="title"><a name=
|
||||||
"RCL.INSTALL.CONFIG.RECOLLCONF.TERMS" id=
|
"RCL.INSTALL.CONFIG.RECOLLCONF.TERMS" id=
|
||||||
"RCL.INSTALL.CONFIG.RECOLLCONF.TERMS"></a>6.4.2.2. Parameters
|
"RCL.INSTALL.CONFIG.RECOLLCONF.TERMS"></a>6.4.2.2. Parameters
|
||||||
affecting how we generate terms</h4>
|
affecting how we generate terms and organize the
|
||||||
|
index</h4>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
@ -8277,6 +8288,45 @@ for i in range(nres):
|
|||||||
implies an index reset.</p>
|
implies an index reset.</p>
|
||||||
</dd>
|
</dd>
|
||||||
<dt><a name=
|
<dt><a name=
|
||||||
|
"RCL.INSTALL.CONFIG.RECOLLCONF.INDEXSTOREDOCTEXT"
|
||||||
|
id=
|
||||||
|
"RCL.INSTALL.CONFIG.RECOLLCONF.INDEXSTOREDOCTEXT"></a><span class="term"><code class="varname">indexStoreDocText</code></span></dt>
|
||||||
|
<dd>
|
||||||
|
<p>Decide if we store the documents' text content
|
||||||
|
in the index. Storing the text allows extracting
|
||||||
|
snippets from it at query time, instead of
|
||||||
|
building them from index position data. Newer
|
||||||
|
Xapian index formats have rendered our use of
|
||||||
|
positions list unacceptably slow in some cases.
|
||||||
|
The last Xapian index format with good
|
||||||
|
performance for the old method is Chert, which is
|
||||||
|
default for 1.2, still supported but not default
|
||||||
|
in 1.4 and will be dropped in 1.6. The stored
|
||||||
|
document text is translated from its original
|
||||||
|
format to UTF-8 plain text, but not stripped of
|
||||||
|
upper-case, diacritics, or punctuation signs.
|
||||||
|
Storing it increases the index size by 10-20%
|
||||||
|
typically, but also allows for nicer snippets, so
|
||||||
|
it may be worth enabling it even if not strictly
|
||||||
|
needed for performance if you can afford the
|
||||||
|
space. The variable only has an effect when
|
||||||
|
creating an index, meaning that the xapiandb
|
||||||
|
directory must not exist yet. Its exact effect
|
||||||
|
depends on the Xapian version. For Xapian 1.4, if
|
||||||
|
the variable is set to 0, the Chert format will
|
||||||
|
be used, and the text will not be stored. If the
|
||||||
|
variable is 1, Glass will be used, and the text
|
||||||
|
stored. For Xapian 1.2, and for versions
|
||||||
|
1.5 and newer, the index format is always the
|
||||||
|
default, but the variable controls if the text is
|
||||||
|
stored or not, and the abstract generation
|
||||||
|
method. With Xapian 1.5 and later, and the
|
||||||
|
variable set to 0, abstract generation may be
|
||||||
|
very slow, but this setting may still be useful
|
||||||
|
to save space if you do not use abstract
|
||||||
|
generation at all.</p>
|
||||||
|
</dd>
|
||||||
|
<dt><a name=
|
||||||
"RCL.INSTALL.CONFIG.RECOLLCONF.NONUMBERS" id=
|
"RCL.INSTALL.CONFIG.RECOLLCONF.NONUMBERS" id=
|
||||||
"RCL.INSTALL.CONFIG.RECOLLCONF.NONUMBERS"></a><span class="term"><code class="varname">nonumbers</code></span></dt>
|
"RCL.INSTALL.CONFIG.RECOLLCONF.NONUMBERS"></a><span class="term"><code class="varname">nonumbers</code></span></dt>
|
||||||
<dd>
|
<dd>
|
||||||
|
|||||||
@ -1847,7 +1847,8 @@
|
|||||||
current result. I can't remember a single instance where this
|
current result. I can't remember a single instance where this
|
||||||
function was actually useful to me...</para>
|
function was actually useful to me...</para>
|
||||||
|
|
||||||
<para id="RCL.SEARCH.GUI.RESULTLIST.MENU.SNIPPETS">The <guilabel>Open Snippets Window</guilabel> entry will only
|
<para id="RCL.SEARCH.GUI.RESULTLIST.MENU.SNIPPETS">The
|
||||||
|
<guilabel>Open Snippets Window</guilabel> entry will only
|
||||||
appear for documents which support page breaks (typically
|
appear for documents which support page breaks (typically
|
||||||
PDF, Postscript, DVI). The snippets window lists extracts from
|
PDF, Postscript, DVI). The snippets window lists extracts from
|
||||||
the document, taken around search terms occurrences, along with the
|
the document, taken around search terms occurrences, along with the
|
||||||
@ -5013,16 +5014,22 @@
|
|||||||
|
|
||||||
<varlistentry>
|
<varlistentry>
|
||||||
<term>Query.execute(query_string, stemming=1,
|
<term>Query.execute(query_string, stemming=1,
|
||||||
stemlang="english")</term>
|
stemlang="english", fetchtext=False)</term>
|
||||||
<listitem><para>Starts a search
|
<listitem><para>Starts a search
|
||||||
for <replaceable>query_string</replaceable>, a &RCL;
|
for <replaceable>query_string</replaceable>, a &RCL;
|
||||||
search language string.</para></listitem>
|
search language string. If the index stores the document
|
||||||
|
texts and <literal>fetchtext</literal> is True, store the
|
||||||
|
document extracted text in
|
||||||
|
<literal>doc.text</literal>.</para></listitem>
|
||||||
</varlistentry>
|
</varlistentry>
|
||||||
|
|
||||||
<varlistentry>
|
<varlistentry>
|
||||||
<term>Query.executesd(SearchData)</term>
|
<term>Query.executesd(SearchData, fetchtext=False)</term>
|
||||||
<listitem><para>Starts a search for the query defined by the
|
<listitem><para>Starts a search for the query defined by
|
||||||
SearchData object.</para></listitem>
|
the SearchData object. If the index stores the document
|
||||||
|
texts and <literal>fetchtext</literal> is True, store the
|
||||||
|
document extracted text in
|
||||||
|
<literal>doc.text</literal>.</para></listitem>
|
||||||
</varlistentry>
|
</varlistentry>
|
||||||
|
|
||||||
<varlistentry>
|
<varlistentry>
|
||||||
|
|||||||
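Note on the hunk above: a short Python sketch of how the new fetchtext argument might be used. It relies only on what the documentation in this diff shows (Query.execute with fetchtext, Query.fetchmany with size, and doc.text). The helper `fetch_texts` is a hypothetical name, and the connection calls in the trailing comment (`recoll.connect()`, `db.query()`) are assumptions about the installed Recoll Python module, since this diff does not show them.

```python
def fetch_texts(query, query_string, max_docs=10):
    """Run a search and return the extracted text of up to max_docs results.

    `query` is expected to behave like a Recoll Query object.
    """
    # fetchtext=True asks for doc.text to be filled in. Per the documentation,
    # this only has an effect if the index stores the document texts
    # (indexStoreDocText = 1 at index creation time).
    query.execute(query_string, fetchtext=True)

    texts = []
    for doc in query.fetchmany(size=max_docs):
        # Be defensive: treat a missing or empty text attribute as "".
        texts.append(getattr(doc, "text", "") or "")
    return texts


# Typical wiring, assuming the recoll Python module is installed:
#   from recoll import recoll
#   db = recoll.connect()
#   for text in fetch_texts(db.query(), "some search terms"):
#       print(text[:200])
```

Taking the query object as a parameter keeps the helper independent of how the index is opened, so it can be tried against a stand-in object before pointing it at a real index.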
@ -241,19 +241,27 @@ indexStripChars = 1
|
|||||||
# performance for the old method is Chert, which is default for 1.2, still
|
# performance for the old method is Chert, which is default for 1.2, still
|
||||||
# supported but not default in 1.4 and will be dropped in 1.6.
|
# supported but not default in 1.4 and will be dropped in 1.6.
|
||||||
#
|
#
|
||||||
# The document text is translated from its original format to UTF-8 plain
|
# The stored document text is translated from its original format to UTF-8
|
||||||
# text, but not stripped of upper-case, diacritics, or punctuation
|
# plain text, but not stripped of upper-case, diacritics, or punctuation
|
||||||
# signs. Storing it increases the index size by 10-20% typically, but also
|
# signs. Storing it increases the index size by 10-20% typically, but also
|
||||||
# allows for nicer snippets, so it may be worth enabling it even if not
|
# allows for nicer snippets, so it may be worth enabling it even if not
|
||||||
# strictly needed for performance if you can afford the space.
|
# strictly needed for performance if you can afford the space.
|
||||||
#
|
#
|
||||||
# The variable only has an effect when creating an index, tested as
|
# The variable only has an effect when creating an index, meaning that the
|
||||||
# xapiandb directory not existing. Its exact effect depends on the Xapian
|
# xapiandb directory must not exist yet. Its exact effect depends on the
|
||||||
# version. For Xapian 1.2, you can force the new method by setting the
|
# Xapian version.
|
||||||
# variable to 1. For Xapian 1.4, the Chert format will be used, and the text
|
#
|
||||||
# will not be stored if the variable is not set or set to 0. For later
|
# For Xapian 1.4, if the variable is set to 0, the Chert format will be
|
||||||
# Xapian versions, the variable does nothing, the text is always stored.
|
# used, and the text will not be stored. If the variable is 1, Glass will
|
||||||
# </desc></var>
|
# be used, and the text stored.
|
||||||
|
#
|
||||||
|
# For Xapian 1.2, and for versions 1.5 and newer, the index format is
|
||||||
|
# always the default, but the variable controls if the text is stored or
|
||||||
|
# not, and the abstract generation method. With Xapian 1.5 and later, and
|
||||||
|
# the variable set to 0, abstract generation may be very slow, but this
|
||||||
|
# setting may still be useful to save space if you do not use abstract
|
||||||
|
# generation at all.
|
||||||
|
# </descr></var>
|
||||||
indexStoreDocText = 1
|
indexStoreDocText = 1
|
||||||
|
|
||||||
# <var name="nonumbers" type="bool"><brief>Decides if terms will be
|
# <var name="nonumbers" type="bool"><brief>Decides if terms will be
|
||||||
|
|||||||