diff --git a/src/doc/user/recoll.conf.xml b/src/doc/user/recoll.conf.xml
index 521d84d3..faddfa97 100644
--- a/src/doc/user/recoll.conf.xml
+++ b/src/doc/user/recoll.conf.xml
@@ -126,15 +126,14 @@ types. Lets you exclude some types from indexing. MIME type
names should be taken from the mimemap file (the values may be different
from xdg-mime or file -i output in some cases) Can be redefined for
subtrees.
-
-nomd5mimetypes
-Don't compute md5 for
-these types. md5 checksums are used only for deduplicating
-results, and can be very expensive to compute on multimedia or other big
-files. This list lets you turn off md5 computation for selected types. It
-is global (no redefinition for subtrees). At the moment, it only has an
-effect for external handlers (exec and execm). The file types can be
-specified by listing either MIME types (e.g. audio/mpeg) or handler names
+
+nomd5types
+Don't compute md5 for these types. md5 checksums are used only for deduplicating results, and can be
+very expensive to compute on multimedia or other big files. This list
+lets you turn off md5 computation for selected types. It is global (no
+redefinition for subtrees). At the moment, it only has an effect for
+external handlers (exec and execm). The file types can be specified by
+listing either MIME types (e.g. audio/mpeg) or handler names
(e.g. rclaudio).
compressedfilemaxkbs
@@ -244,6 +243,15 @@ for a subtree.
'coworker' also when the input is 'co-worker'. This is new
in version 1.22, and on by default. Setting the variable to off allows
restoring the previous behaviour.
+
+backslashasletter
+Process backslash as a normal letter. This may make sense for people wanting to index TeX commands as
+such but is not of much general use.
+
+maxtermlength
+Maximum term length. Words longer than this will be discarded.
+The default is 40 and used to be hard-coded, but it can now be
+adjusted. You need an index reset if you change the value.
nocjk
Decides if specific East Asian
@@ -371,8 +379,8 @@ subpath under cachedir.
over which we stop indexing. The value is a percentage,
corresponding to what the "Capacity" df output column shows. The default
value is 0, meaning no checking.
-
-xapiandb
+
+dbdir
Xapian database directory
location. This will be created on first indexing. If the
value is not an absolute path, it will be interpreted as relative to
@@ -447,8 +455,8 @@ $XAPIAN_FLUSH_THRESHOLD documents created, modified or deleted: as memory
usage depends on average document size, not only document count, the
Xapian approach is is not very useful, and you should let Recoll manage
the flushes. The program compiled value is 0. The configured default
-value (from this file) is 10 MB, and will be too low in many cases (it is
-chosen to conserve memory). If you are looking
+value (from this file) is now 50 MB, which should be fine in many cases.
+You can set it as low as 10 to conserve memory, but if you are looking
for maximum speed, you may want to experiment with values between 20 and
200. In my experience, values beyond this are always counterproductive. If
you find otherwise, please drop me a note.
@@ -677,6 +685,11 @@ with possibly meaning-altering missing words.
Attempt OCR of PDF files with no text content if both tesseract and
pdftoppm are installed. The default is off because OCR is so
very slow.
+
+pdfocrlang
+Language to assume for PDF OCR. Setting it correctly is very important for getting a reasonable error rate
+with tesseract. This can also be set through a configuration variable
+or directory-local parameters. See the rclpdf.py script.
pdfattach
Enable PDF attachment extraction by executing pdftk (if
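The nomd5types entry in this file says that the list may mix MIME types and handler names, and that it only affects external (exec and execm) handlers. The Python sketch below shows one way such a check could be written. It is not Recoll's implementation. The function name `skip_md5`, the handler-kind labels, the whitespace-separated list format, and the rule that a handler name matches without its directory or extension are all assumptions made for illustration.

```python
import os


def skip_md5(nomd5types, mimetype, handler_cmd, handler_kind):
    """Return True if md5 computation should be skipped (hypothetical sketch).

    nomd5types   -- the configured value, e.g. "rclaudio audio/mpeg"
    mimetype     -- the document MIME type, e.g. "audio/mpeg"
    handler_cmd  -- the handler command, e.g. "rclaudio.py"
    handler_kind -- "exec", "execm", or anything else for internal handlers
    """
    # The setting only has an effect for external handlers.
    if handler_kind not in ("exec", "execm"):
        return False
    # Assumption: the list is whitespace-separated.
    entries = set(nomd5types.split())
    if mimetype in entries:
        return True
    # Assumption: a handler name matches without its directory or extension,
    # so "rclaudio" matches "/some/dir/rclaudio.py".
    name = os.path.splitext(os.path.basename(handler_cmd))[0]
    return name in entries
```

With the sample configuration value, `skip_md5("rclaudio", "audio/mpeg", "rclaudio.py", "execm")` returns `True`. The same call with `handler_kind="internal"` returns `False`, because the option is documented as having no effect on internal handlers.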
diff --git a/src/doc/user/usermanual.html b/src/doc/user/usermanual.html
index 113d576f..26d99aa8 100644
--- a/src/doc/user/usermanual.html
+++ b/src/doc/user/usermanual.html
@@ -8300,8 +8300,8 @@ for i in range(nres):
cases) Can be redefined for subtrees.
nomd5mimetypes
+ "RCL.INSTALL.CONFIG.RECOLLCONF.NOMD5TYPES" id=
+ "RCL.INSTALL.CONFIG.RECOLLCONF.NOMD5TYPES">nomd5types
Don't compute md5 for these types. md5
checksums are used only for deduplicating
@@ -8496,6 +8496,25 @@ for i in range(nres):
1.22, and on by default. Setting the variable to
off allows restoring the previous behaviour.
+ backslashasletter
+
+ Process backslash as a normal letter. This may
+ make sense for people wanting to index TeX
+ commands as such but is not of much general
+ use.
+
+ maxtermlength
+
+ Maximum term length. Words longer than this
+ will be discarded. The default is 40 and used to
+ be hard-coded, but it can now be adjusted. You
+ need an index reset if you change the value.
+
nocjk
@@ -8696,9 +8715,9 @@ for i in range(nres):
column shows. The default value is 0, meaning no
checking.
- xapiandb
+ dbdir
Xapian database directory location. This will
be created on first indexing. If the value is not
@@ -8840,13 +8859,13 @@ for i in range(nres):
the Xapian approach is is not very useful, and
you should let Recoll manage the flushes. The
program compiled value is 0. The configured
- default value (from this file) is 10 MB, and will
- be too low in many cases (it is chosen to
- conserve memory). If you are looking for maximum
- speed, you may want to experiment with values
- between 20 and 200. In my experience, values
- beyond this are always counterproductive. If you
- find otherwise, please drop me a note.
+ default value (from this file) is now 50 MB, and
+ should be fine in many cases. You can set it as low
+ as 10 to conserve memory, but if you are looking
+ for maximum speed, you may want to experiment
+ with values between 20 and 200. In my experience,
+ values beyond this are always counterproductive.
+ If you find otherwise, please drop me a note.
pdfocrlang
+
+ Language to assume for PDF OCR. Setting it
+ correctly is very important for getting a reasonable error rate
+ with tesseract. This can also be set through a
+ configuration variable or directory-local
+ parameters. See the rclpdf.py script.
+
+ pdfattach
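The following small Python word splitter illustrates the effect of the backslashasletter and maxtermlength options described in the manual. It is hypothetical and much simplified. Recoll's real splitter is written in C++ and also handles CJK text, hyphenation, numbers and more. This sketch measures term length in characters, which may not match how Recoll counts it.

```python
import re


def split_terms(text, maxtermlength=40, backslashasletter=False):
    """Split text into index terms (hypothetical, much-simplified sketch).

    Letters, digits and underscore count as word characters. With
    backslashasletter=True the backslash does too, so TeX commands such as
    \\section stay in one piece.
    """
    word_chars = r"\w\\" if backslashasletter else r"\w"
    terms = re.findall("[" + word_chars + "]+", text)
    # Terms longer than maxtermlength are discarded, not truncated.
    return [t for t in terms if len(t) <= maxtermlength]
```

For example, `split_terms(r"\section{Intro}")` gives `['section', 'Intro']`, while passing `backslashasletter=True` keeps the command intact as `'\\section'`. A 41-character word disappears entirely with the default limit of 40. Because the terms stored in the index depend on both settings, this fits the manual's note that changing maxtermlength needs an index reset.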
diff --git a/src/sampleconf/recoll.conf b/src/sampleconf/recoll.conf
index d5accddb..904233b2 100644
--- a/src/sampleconf/recoll.conf
+++ b/src/sampleconf/recoll.conf
@@ -168,14 +168,16 @@ skippedPaths = /media
# subtrees.
#excludedmimetypes =
-# Don't compute md5 for
-# these types.md5 checksums are used only for deduplicating
-# results, and can be very expensive to compute on multimedia or other big
-# files. This list lets you turn off md5 computation for selected types. It
-# is global (no redefinition for subtrees). At the moment, it only has an
-# effect for external handlers (exec and execm). The file types can be
-# specified by listing either MIME types (e.g. audio/mpeg) or handler names
-# (e.g. rclaudio).
+#
+# Don't compute md5 for these types.
+# md5 checksums are used only for deduplicating results, and can be
+# very expensive to compute on multimedia or other big files. This list
+# lets you turn off md5 computation for selected types. It is global (no
+# redefinition for subtrees). At the moment, it only has an effect for
+# external handlers (exec and execm). The file types can be specified by
+# listing either MIME types (e.g. audio/mpeg) or handler names
+# (e.g. rclaudio).
+#
nomd5types = rclaudio
# Size limit for compressed
@@ -299,6 +301,21 @@ indexStoreDocText = 1
# restoring the previous behaviour.
#dehyphenate = 1
+#
+# Process backslash as a normal letter.
+# This may make sense for people wanting to index TeX commands as
+# such but is not of much general use.
+#
+#backslashasletter = 0
+
+#
+# Maximum term length.
+# Words longer than this will be discarded.
+# The default is 40 and used to be hard-coded, but it can now be
+# adjusted. You need an index reset if you change the value.
+#
+#maxtermlength = 40
+
# Decides if specific East Asian
# (Chinese Korean Japanese) characters/word splitting is turned
# off.This will save a small amount of CPU if you have no CJK
@@ -435,7 +452,7 @@ noxattrfields = 0
# value is 0, meaning no checking.
maxfsoccuppc = 0
-# Xapian database directory
+# Xapian database directory
# location.This will be created on first indexing. If the
# value is not an absolute path, it will be interpreted as relative to
# cachedir if set, or the configuration directory (-c argument or
@@ -837,6 +854,14 @@ snippetMaxPosWalk = 1000000
# very slow.
#pdfocr = 0
+#
+# Language to assume for PDF OCR.
+# Setting it correctly is very important for getting a reasonable error rate
+# with tesseract. This can also be set through a configuration variable
+# or directory-local parameters. See the rclpdf.py script.
+#
+#pdfocrlang = eng
+
#
#
# Enable PDF attachment extraction by executing pdftk (if
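Because the pdfocrlang value ends up as the language given to tesseract, it helps to catch a wrong or uninstalled language early. The sketch below is hypothetical and is not the code in rclpdf.py. It parses the listing printed by `tesseract --list-langs`, assuming the usual layout of one header line followed by one language code per line. It then builds a tesseract command line with the `-l` option, where several languages can be joined with `+` (e.g. `eng+fra`). The helper names are illustrative only.

```python
def available_langs(listing):
    """Return the language codes from `tesseract --list-langs` output.

    Assumes the usual layout: one header line, then one code per line.
    """
    lines = [line.strip() for line in listing.splitlines()]
    return {line for line in lines[1:] if line}


def tesseract_command(image_path, output_base, lang, available):
    """Build a tesseract invocation for one page image (hypothetical helper).

    lang follows tesseract's -l syntax: one code, or several joined by '+'.
    Raises ValueError if any requested language data is not installed.
    """
    missing = [code for code in lang.split("+") if code not in available]
    if missing:
        raise ValueError("no tesseract data for: " + ", ".join(missing))
    # tesseract writes its text output to <output_base>.txt
    return ["tesseract", image_path, output_base, "-l", lang]
```

Once a page image exists, the returned list can be passed to `subprocess.run`. Rendering the PDF page to an image (for example with pdftoppm) is left out of this sketch.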