diff --git a/src/doc/user/recoll.conf.xml b/src/doc/user/recoll.conf.xml index 51bbae4d..1637c7b1 100644 --- a/src/doc/user/recoll.conf.xml +++ b/src/doc/user/recoll.conf.xml @@ -174,7 +174,7 @@ members. This is passed to the filters in the environment as RECOLL_FILTER_MAXMEMBERKB. -Parameters affecting how we generate terms +Parameters affecting how we generate terms and organize the index indexStripChars Decide if we store @@ -184,6 +184,34 @@ will be bigger, and some marginal weirdness may sometimes occur. The default is a stripped index. When using multiple indexes for a search, this parameter must be defined identically for all. Changing the value implies an index reset. + +indexStoreDocText +Decide if we store the +documents' text content in the index. Storing the text +allows extracting snippets from it at query time, instead of building +them from index position data. +Newer Xapian index formats have rendered our use of positions list +unacceptably slow in some cases. The last Xapian index format with good +performance for the old method is Chert, which is default for 1.2, still +supported but not default in 1.4 and will be dropped in 1.6. +The stored document text is translated from its original format to UTF-8 +plain text, but not stripped of upper-case, diacritics, or punctuation +signs. Storing it increases the index size by 10-20% typically, but also +allows for nicer snippets, so it may be worth enabling it even if not +strictly needed for performance if you can afford the space. +The variable only has an effect when creating an index, meaning that the +xapiandb directory must not exist yet. Its exact effect depends on the +Xapian version. +For Xapian 1.4, if the variable is set to 0, the Chert format will be +used, and the text will not be stored. If the variable is 1, Glass will +be used, and the text stored. 
+For Xapian 1.2, and for versions 1.5 and newer, the index format is +always the default, but the variable controls if the text is stored or +not, and the abstract generation method. With Xapian 1.5 and later, and +the variable set to 0, abstract generation may be very slow, but this +setting may still be useful to save space if you do not use abstract +generation at all. + nonumbers Decides if terms will be diff --git a/src/doc/user/usermanual.html b/src/doc/user/usermanual.html index b9ba51da..bd7fd4f5 100644 --- a/src/doc/user/usermanual.html +++ b/src/doc/user/usermanual.html @@ -1337,7 +1337,7 @@ alink="#0000FF"> other constraints. Most of the relevant parameters are described in the + "6.4.2.2. Parameters affecting how we generate terms and organize the index"> linked section.

The different search interfaces (GUI, command line, ...) have different methods to define the set of indexes @@ -6462,18 +6462,28 @@ alink="#0000FF">

Query.execute(query_string, stemming=1, - stemlang="english")
+ stemlang="english", + fetchtext=False)

Starts a search for query_string, a Recoll - search language string.

+ search language string. If the index stores + the document texts and fetchtext is True, store the + document extracted text in doc.text.

Query.executesd(SearchData)
+ "term">Query.executesd(SearchData, + fetchtext=False)

Starts a search for the query defined by - the SearchData object.

+ the SearchData object. If the index stores + the document texts and fetchtext is True, store the + document extracted text in doc.text.

Query.fetchmany(size=query.arraysize)
@@ -8256,7 +8266,8 @@ for i in range(nres):

6.4.2.2. Parameters - affecting how we generate terms

+ affecting how we generate terms and organize the + index @@ -8277,6 +8288,45 @@ for i in range(nres): implies an index reset.

indexStoreDocText
+
+

Decide if we store the documents' text content + in the index. Storing the text allows extracting + snippets from it at query time, instead of + building them from index position data. Newer + Xapian index formats have rendered our use of + positions list unacceptably slow in some cases. + The last Xapian index format with good + performance for the old method is Chert, which is + default for 1.2, still supported but not default + in 1.4 and will be dropped in 1.6. The stored + document text is translated from its original + format to UTF-8 plain text, but not stripped of + upper-case, diacritics, or punctuation signs. + Storing it increases the index size by 10-20% + typically, but also allows for nicer snippets, so + it may be worth enabling it even if not strictly + needed for performance if you can afford the + space. The variable only has an effect when + creating an index, meaning that the xapiandb + directory must not exist yet. Its exact effect + depends on the Xapian version. For Xapian 1.4, if + the variable is set to 0, the Chert format will + be used, and the text will not be stored. If the + variable is 1, Glass will be used, and the text + stored. For Xapian 1.2, and for versions + 1.5 and newer, the index format is always the + default, but the variable controls if the text is + stored or not, and the abstract generation + method. With Xapian 1.5 and later, and the + variable set to 0, abstract generation may be + very slow, but this setting may still be useful + to save space if you do not use abstract + generation at all.

+
+
nonumbers
diff --git a/src/doc/user/usermanual.xml b/src/doc/user/usermanual.xml index b6192e07..711cf3c6 100644 --- a/src/doc/user/usermanual.xml +++ b/src/doc/user/usermanual.xml @@ -1847,7 +1847,8 @@ current result. I can't remember a single instance where this function was actually useful to me... - The Open Snippets Window entry will only + The + Open Snippets Window entry will only appear for documents which support page breaks (typically PDF, Postscript, DVI). The snippets window lists extracts from the document, taken around search terms occurrences, along with the @@ -5013,16 +5014,22 @@ Query.execute(query_string, stemming=1, - stemlang="english") + stemlang="english", fetchtext=False) Starts a search for query_string, a &RCL; - search language string. + search language string. If the index stores the document + texts and fetchtext is True, store the + document extracted text in + doc.text. - Query.executesd(SearchData) - Starts a search for the query defined by the - SearchData object. + Query.executesd(SearchData, fetchtext=False) + Starts a search for the query defined by + the SearchData object. If the index stores the document + texts and fetchtext is True, store the + document extracted text in + doc.text. diff --git a/src/sampleconf/recoll.conf b/src/sampleconf/recoll.conf index d14c7204..3088dade 100644 --- a/src/sampleconf/recoll.conf +++ b/src/sampleconf/recoll.conf @@ -241,19 +241,27 @@ indexStripChars = 1 # performance for the old method is Chert, which is default for 1.2, still # supported but not default in 1.4 and will be dropped in 1.6. # -# The document text is translated from its original format to UTF-8 plain -# text, but not stripped of upper-case, diacritics, or punctuation +# The stored document text is translated from its original format to UTF-8 +# plain text, but not stripped of upper-case, diacritics, or punctuation # signs. 
Storing it increases the index size by 10-20% typically, but also # allows for nicer snippets, so it may be worth enabling it even if not # strictly needed for performance if you can afford the space. # -# The variable only has an effect when creating an index, tested as -# xapiandb directory not existing. Its exact effect depends on the Xapian -# version. For Xapian 1.2, you can force the new method by setting the -# variable to 1. For Xapian 1.4, the Chert format will be used, and the text -# will not be stored if the variable is not set or set to 0. For later -# Xapian versions, the variable does nothing, the text is always stored. -# +# The variable only has an effect when creating an index, meaning that the +# xapiandb directory must not exist yet. Its exact effect depends on the +# Xapian version. +# +# For Xapian 1.4, if the variable is set to 0, the Chert format will be +# used, and the text will not be stored. If the variable is 1, Glass will +# be used, and the text stored. +# +# For Xapian 1.2, and for versions 1.5 and newer, the index format is +# always the default, but the variable controls if the text is stored or +# not, and the abstract generation method. With Xapian 1.5 and later, and +# the variable set to 0, abstract generation may be very slow, but this +# setting may still be useful to save space if you do not use abstract +# generation at all. +# indexStoreDocText = 1 # Decides if terms will be
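
Review note: the version-dependent behavior described in the new indexStoreDocText text can be read as a small decision table. The Python sketch below only restates that table for illustration. `IndexLayout` and `index_layout` are hypothetical names and are not part of Recoll or Xapian. Versions not mentioned in the documentation (for example 1.3) are assumed here to follow the "default format" branch.

```python
from typing import NamedTuple, Tuple


class IndexLayout(NamedTuple):
    """Hypothetical summary of what a freshly created index looks like."""
    index_format: str   # "chert", "glass", or "default" (whatever Xapian picks)
    text_stored: bool   # True if the document text is kept in the index


def index_layout(xapian_version: Tuple[int, int], index_store_doc_text: int) -> IndexLayout:
    """Restate the documented effect of indexStoreDocText at index creation.

    Illustrative only: this mirrors the documentation text, not Recoll's code.
    """
    store = bool(index_store_doc_text)
    if tuple(xapian_version[:2]) == (1, 4):
        # Xapian 1.4: the variable selects both the format and text storage.
        return IndexLayout("glass" if store else "chert", store)
    # Xapian 1.2, and 1.5 and newer: the format is always the default one,
    # and the variable only controls whether the text is stored.
    # Assumption: unlisted versions are treated the same way.
    return IndexLayout("default", store)


if __name__ == "__main__":
    for version in [(1, 2), (1, 4), (1, 5)]:
        for value in (0, 1):
            print(version, value, index_layout(version, value))
```

With Xapian 1.5 and later and a value of 0, the sketch returns a layout without stored text, which is the case the documentation warns may make abstract generation very slow.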