From 9001129bf40b88a6fb080936e0e811d8f3310986 Mon Sep 17 00:00:00 2001
From: dockes
Date: Sat, 8 Apr 2006 14:00:14 +0000
Subject: [PATCH] *** empty log message ***

---
 src/VERSION                  |  2 +-
 src/doc/user/usermanual.sgml | 85 +++++++++++++++++++++++-------------
 2 files changed, 56 insertions(+), 31 deletions(-)

diff --git a/src/VERSION b/src/VERSION
index 88c5fb89..347f5833 100644
--- a/src/VERSION
+++ b/src/VERSION
@@ -1 +1 @@
-1.4.0
+1.4.1
diff --git a/src/doc/user/usermanual.sgml b/src/doc/user/usermanual.sgml
index a2f495d7..4cd2771a 100644
--- a/src/doc/user/usermanual.sgml
+++ b/src/doc/user/usermanual.sgml
@@ -24,7 +24,7 @@ Dockes - $Id: usermanual.sgml,v 1.11 2006-04-07 13:07:34 dockes Exp $ + $Id: usermanual.sgml,v 1.12 2006-04-08 14:00:14 dockes Exp $ This document introduces full text search notions
@@ -114,24 +114,24 @@ in your document files. The acquisition process is called indexing. - The resulting database can be big (roughly the size of the + The resulting index can be big (roughly the size of the original document set), but it is not a document archive. &RCL; can only display documents that still exist at the place from which they were indexed. (Actually, there is a way to reconstruct a document from the information in the - database, but the result is not nice, as all formatting, + index, but the result is not nice, as all formatting, punctuation and capitalisation are lost). &RCL; stores all internal data in Unicode UTF-8 format, and it can index files with different character sets, encodings, and languages into the same - database. It has input filters for many document types. + index. It has input filters for many document types. Stemming depends on the document language. &RCL; stores the unstemmed versions of terms and uses auxiliary databases for term expansion. It can switch stemming languages, or add a language, without reindexing.
Storing documents in different - languages in the same database is possible, and useful in + languages in the same index is possible, and useful in practice, but does introduce possibilities of confusion. &RCL; currently makes no attempt at automatic language recognition. @@ -218,6 +218,37 @@ + + Index storage + + The default location for the index data is the + $HOME/.recoll/xapiandb/ directory. This can + be changed by setting the RECOLL_CONFDIR + environment variable, or by specifying the + dbdir parameter in the configuration file + (see the configuration + section). + + The size of the index is determined by the size of the set + of documents, but the ratio can vary a lot. For a typical mixed + set of documents, the index size will often be close to + the data set size. In specific cases (a set of compressed + mbox files for example), the index can become much bigger than + the documents. It may also be much smaller if the documents + contain a lot of images or other non-indexed data (an extreme + example being a set of mp3 files where only the tags would be + indexed). + + Of course, images, sound and video do not increase the + index size, which means that it will be quite typical nowadays + (2006), that even a big index will be negligible against the + total amount of data on the computer. + + The index data directory only contains data that will be + rebuilt by an index run, so that it can be destroyed safely. + + + The indexing configuration @@ -251,14 +282,14 @@ indexing thread inside the recoll program (use the File menu). - If the recoll program finds no database + If the recoll program finds no index when it starts, it will automatically start indexing (except if cancelled). It is best to avoid interrupting the indexing process, as this may sometimes leave the database in a bad state. 
This is not a serious problem, as you then just need to clear - everything and restart the indexing: the database files are + everything and restart the indexing: the index files are normally stored in the $HOME/.recoll/xapiandb directory, which you can just delete if needed. Alternatively, you can @@ -442,12 +473,13 @@ File names - All file name elements (the broken up file path) are - entered as terms during indexing, and you can specify them - as ordinary terms in normal search fields. Alternatively, you + File names are added as terms during indexing, and you can + specify them as ordinary terms in normal search fields (&RCL; used + to index all directories in the file path as terms. This has been + abandonned as it did not seem really useful). Alternatively, you can use specific file name search which will - only look for file names and can use - wildcard expansion. + only look for file names and can use wildcard + expansion. Quitting @@ -487,7 +519,7 @@ Html help browser: this - will let you chose your the preferred browser which will be + will let you chose your preferred browser which will be started from the Help menu to read the user manual. You can enter a simple name if the command is in your PATH, or browse for a full pathname. @@ -735,10 +767,8 @@ they define default values for the system. A parallel set of files exists in the .recoll directory in your home (this can be changed with the - RECOLL_CONFDIR environment variable. - The database is also kept in .recoll by - default, (this can be changed by a configuration - parameter). + RECOLL_CONFDIR environment variable. + If the .recoll directory does not exist when recoll or recollindex are started, it @@ -806,11 +836,11 @@ configuration file. It defines things like what to index (top directories and things to ignore), and the default character set to use for document types which do not - specify it internally. + specify it internally. The default configuration will index your home - directory. 
If this is not appropriate, use - recoll to copy the sample + directory. If this is not appropriate, start + recoll to create a blank configuration, click Cancel, and edit the configuration file before restarting the command. This will start the initial indexing, which may take some time. @@ -865,8 +895,8 @@ logfilename - Where should the messages go. 'stderr' can - be used as a special value. + Where the messages should go. 'stderr' can + be used as a special value, and is the default. @@ -899,9 +929,9 @@ dbdir - The name of the Xapian database - directory. It will be created if needed when the database - is initialized. + The name of the Xapian data directory. It + will be created if needed when the index is + initialized. @@ -958,11 +988,6 @@ executed to determine the mime type (this can be switched off inside the main configuration file). - mimemap also has a list of - extensions which should be ignored totally (to avoid losing - time by executing file - for things that certainly should not be indexed). - The mappings can be specified on a per-subtree basis, which may be useful in some cases. Example: gaim logs have a