diff --git a/src/doc/user/usermanual.html b/src/doc/user/usermanual.html
index 26d99aa8..fc22e43e 100644
--- a/src/doc/user/usermanual.html
+++ b/src/doc/user/usermanual.html
@@ -110,6 +110,9 @@ alink="#0000FF">
For a given configuration directory, you can
+ specify a non-default storage location for the index
+ by setting the dbdir
+ parameter in the configuration file (see the
+
+ configuration section). This method would mainly
+ be of use if you wanted to keep the configuration
+ directory in its default location, but desired
+ another location for the index, typically because of disk
+ space or performance concerns.
You can specify a different configuration
directory by setting the
whatever subset of the available data you wish to
make searchable.
For a given configuration directory, you can
- specify a non-default storage location for the index
- by setting the dbdir
- parameter in the configuration file (see the
-
- configuration section). This method would mainly
- be of use if you wanted to keep the configuration
- directory in its default location, but desired
- another location for the index, typically out of disk
- occupation concerns.
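As an illustration of the dbdir setting described above, a minimal recoll.conf fragment might look like the following. The path is a hypothetical example, and the note about relative paths reflects our understanding of the parameter; check the configuration section of the manual for the behavior of your Recoll version.

```
# recoll.conf, in the (default) configuration directory.
# Hypothetical path: keep the index on a separate, larger
# or faster disk while the configuration stays in place.
# A relative value would be interpreted relative to the
# configuration directory.
dbdir = /data/recoll/xapiandb
```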
The size of the index is determined by the size of the set of documents, but the ratio can vary a lot. For a
@@ -1154,9 +1157,9 @@ alink="#0000FF">
non-indexed data (an extreme example being a set of mp3 files where only the tags would be indexed).
Of course, images, sound and video do not increase the
- index size, which means that typically, even a big index
- will be negligible against the total amount of data on the
- computer.
+ index size, which means that in most cases, the space used
+ by the index will be negligible compared to the total amount
+ of data on the computer.
The index data directory (xapiandb) only contains data that can be
completely rebuilt by an index run (as long as the original
@@ -1186,8 +1189,10 @@ alink="#0000FF">
because its format is not supported any more, you will
have to explicitly delete the old index (typically
~/.recoll/xapiandb), then
- run a normal indexing command. Using option -z would not work in this situation.
-z would not work in this
+ situation.
umask used during
index updates.
This only needs to concern you if your index is going to
+ be bigger than around 5 GBytes. Beyond 10 GBytes, it
+ becomes a serious issue. Most people have much smaller
+ indexes. For reference, 5 GBytes would be around 2000
+ Bibles, a lot of text. If you have a huge text dataset
+ (remember: images don't count, and the text content of PDFs
+ is typically less than 5% of the file size), read on.
+The amount of writing performed by Xapian during index
+ creation does not grow linearly with the index size (it is
+ somewhere between linear and quadratic). For big indexes,
+ this becomes a performance issue, and may even become an SSD
+ wear issue.
+The problem can be mitigated by observing the
+ following rules:
+Partition the data set and create several
+ indexes of reasonable size rather than a huge one.
+ These indexes can then be queried in parallel
+ (using the Recoll
+ external indexes facility), or merged using
+ xapian-compact.
+Have a lot of RAM available and set the
+ idxflushmb
+ Recoll
+ configuration parameter as high as you can without
swapping (some experimentation will be needed). A value of 200
would be a minimum in this context.
Use Xapian 1.4.10 or newer, as this version
+ significantly reduced the amount of writes.
+
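A minimal recoll.conf sketch of the idxflushmb rule above, assuming the parameter's usual meaning (the amount of new data, in megabytes, accumulated in memory before the indexer flushes it to the Xapian index). The value 200 is only the minimum suggested in the text, not a tuned recommendation for any particular machine.

```
# recoll.conf: flush buffered index data to disk only after
# roughly 200 MB of new data (example value; raise it further
# if RAM allows, but stay below the point where the system
# starts swapping).
idxflushmb = 200
```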