diff --git a/src/doc/user/usermanual.sgml b/src/doc/user/usermanual.sgml
index 36b0548a..c11e625e 100644
--- a/src/doc/user/usermanual.sgml
+++ b/src/doc/user/usermanual.sgml
@@ -2324,32 +2324,75 @@ text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r
handle the protocol.
- The following will just describe the simple filters, if you are
- programmer enough to write one of the other kind, it shouldn't be too
- difficult to make sense of one of the existing modules (ie:
- rclzip).
+ The following describes only the simple filters. If you can
+ program and want to write one of the other kinds, it shouldn't be too
+ difficult to make sense of one of the existing modules. For example,
+ look at rclzip, which uses Zip file paths as
+ internal identifiers (ipath), and
+ rclinfo, which uses an integer index.
+
+
+ Simple filters
&RCL; simple filters are usually shell-scripts, but this is in
- no way necessary. These programs are extremely simple and most
- of the difficulty lies in extracting the text from the native
- format, not outputting what is expected by &RCL;. Happily
- enough, most document formats already have translators or text
- extractors which handle the difficult part and can be called
- from the filter. In some case the output of the translating
- program is appropriate, and no intermediate shell-script is
- needed.
+ no way necessary. Extracting the text from the native format is the
+ difficult part. Outputting the format expected by &RCL; is
+ trivial. Happily enough, most document formats have translators or
+ text extractors which can be called from the filter. In some cases
+ the output of the translating program is completely appropriate,
+ and no intermediate shell-script is needed.
Filters are called with a single argument which is the
source file name. They should output the result to stdout.
- The RECOLL_FILTER_FORPREVIEW
- environment variable (values yes,
- no) tells the filter if the operation is
- for indexing or previewing. Some filters use this to output a
- slightly different format. This is not essential.
+ When writing a filter, you should decide if it will output
+ plain text or HTML. Plain text is simpler, but you will not be able
+ to add metadata or vary the output character encoding (this will be
+ defined in a configuration file). Additionally, some formatting may
+ be easier to preserve when previewing HTML. Actually, the deciding factor
+ is metadata: &RCL; has a way to
+ extract metadata from the HTML header and use it for field
+ searches.
+
+ The RECOLL_FILTER_FORPREVIEW environment
+ variable (values yes, no)
+ tells the filter if the operation is for indexing or
+ previewing. Some filters use this to output a slightly different
+ format, for example stripping uninteresting repeated keywords (e.g.
+ Subject: for email) when indexing. This is not
+ essential.
+
+ You should look at one of the simple filters, for example
+ rclps, for a starting point.
+
+ Don't forget to make your filter executable before
+ testing!
+
+
+
+
+ Telling &RCL; about the filter
+
+ There are two elements that link a file to the filter which
+ should process it: the association of a file with a mime type and the
+ association of a mime type with a filter.
+
+ The association of files to mime types is mostly based on
+ name suffixes. The types are defined inside the
+
+ mimemap file. Example:
+
+
+.doc = application/msword
+
+ If no suffix association is found for the file name, &RCL; will try
+ to execute the file -i command to determine a
+ mime type.
The association of file types to filters is performed in
- the mimeconf file. A sample:
+ the
+ mimeconf file. A sample will probably be
+ more helpful than a long explanation:
[index]
@@ -2392,14 +2435,9 @@ application/x-chm = execm rclchm
execm keyword.
- The easiest way to write a new filter is probably to start from an
- existing one.
-
- Filters which output text/plain text
- are generally simpler, but they cannot specify the character set
- and other metadata, so they are limited to cases where these
- elements are not needed.
+
+ Filter HTML output