From c8cc64366dac8f15635ab9c8d774ca4f33caf1f2 Mon Sep 17 00:00:00 2001 From: Jean-Francois Dockes Date: Mon, 27 Feb 2017 17:15:28 +0100 Subject: [PATCH] doc --- src/doc/user/Makefile | 6 ++ src/doc/user/recoll.conf.xml | 26 +++++++ src/doc/user/usermanual.html | 131 +++++++++++++++++++++++++++++------ src/doc/user/usermanual.xml | 34 +++++++-- 4 files changed, 172 insertions(+), 25 deletions(-) diff --git a/src/doc/user/Makefile b/src/doc/user/Makefile index 14b6c9c2..a4ce4db3 100644 --- a/src/doc/user/Makefile +++ b/src/doc/user/Makefile @@ -39,5 +39,11 @@ index.html: usermanual.xml usermanual.pdf: usermanual.xml dblatex $< +UTILBUILDS=/home/dockes/projets/builds/medocutils/ +recoll-conf-xml: + $(UTILBUILDS)/confxml --docbook \ + --idprefix=RCL.INSTALL.CONFIG.RECOLLCONF \ + ../../sampleconf/recoll.conf > recoll.conf.xml + clean: rm -f RCL.*.html usermanual.pdf usermanual.html index.html tmpfile.html diff --git a/src/doc/user/recoll.conf.xml b/src/doc/user/recoll.conf.xml index 81bd8b02..e8d379ed 100644 --- a/src/doc/user/recoll.conf.xml +++ b/src/doc/user/recoll.conf.xml @@ -24,6 +24,14 @@ you probably want this indexed. One possible solution is to have ".*" in list, see the "noContentSuffixes" variable for an alternative approach which indexes the file names. Can be redefined for any subtree. + +skippedNames- +List of name endings to remove from the default skippedNames +list. + +skippedNames+ +List of name endings to add to the default skippedNames +list. noContentSuffixes List of name endings (not necessarily dot-separated suffixes) for @@ -35,6 +43,14 @@ recoll.conf allows editing the list through the GUI). This is different from skippedNames because these are name ending matches only (not wildcard patterns), and the file name itself gets indexed normally. This can be redefined for subdirectories. + +noContentSuffixes- +List of name endings to remove from the default noContentSuffixes +list. 
+ +noContentSuffixes+ +List of name endings to add to the default noContentSuffixes +list. skippedPaths Paths we should not go into. Space-separated list of @@ -92,6 +108,16 @@ subtrees. List of excluded MIME types. Lets you exclude some types from indexing. Can be redefined for subtrees. + +nomd5mimetypes +Don't compute md5 for +these types. md5 checksums are used only for deduplicating +results, and can be very expensive to compute on multimedia or other big +files. This list lets you turn off md5 computation for selected types. It +is global (no redefinition for subtrees). At the moment, it only has an +effect for external handlers (exec and execm). The file types can be +specified by listing either MIME types (e.g. audio/mpeg) or handler names +(e.g. rclaudio). compressedfilemaxkbs Size limit for compressed diff --git a/src/doc/user/usermanual.html b/src/doc/user/usermanual.html index c0c11462..64d39e38 100644 --- a/src/doc/user/usermanual.html +++ b/src/doc/user/usermanual.html @@ -20,8 +20,8 @@ alink="#0000FF">
-

Recoll user manual

+

Recoll user manual

@@ -109,13 +109,13 @@ alink="#0000FF"> multiple indexes
2.1.3. Document types
+ "#idp39073168">Document types
2.1.4. Indexing failures
+ "#idp39095632">Indexing failures
2.1.5. Recovery
+ "#idp39102640">Recovery @@ -376,7 +376,7 @@ alink="#0000FF"> handler
4.1.4. Input handler HTML + "#RCL.PROGRAM.FILTERS.HTML">Input handler output
4.1.5.
-

2.1.3. Document types

+

2.1.3. Document types

@@ -1105,8 +1105,8 @@ indexedmimetypes = application/pdf
-

2.1.4. Indexing +

2.1.4. Indexing failures

@@ -1146,8 +1146,8 @@ indexedmimetypes = application/pdf
-

2.1.5. Recovery

+

2.1.5. Recovery

@@ -5987,9 +5987,8 @@ dir:recoll dir:src -dir:utils -dir:common deciding factor is metadata: Recoll has a way to extract metadata - from the HTML header and use it for field - searches..

+ "4.1.4. Input handler output">extract metadata from + the HTML header and use it for field searches.

The RECOLL_FILTER_FORPREVIEW environment @@ -6196,13 +6195,32 @@ application/x-chm = execm rclchm

4.1.4. Input - handler HTML output

+ handler output
-

The output HTML could be very minimal like the - following example:

+

Both the simple and persistent input handlers can + return any MIME type to Recoll, which will further + process the data according to the MIME configuration.

+ +

Most input filters produce either text/plain or text/html data. There are exceptions: + for example, filters which process archive files + (zip, tar, etc.) will usually return the + documents as they are found, without processing them + further.

+ +

There is nothing to say about text/plain output, except that its + character encoding should be consistent with what is + specified in the mimeconf + file.

+ +

For filters producing HTML, the output could be very + minimal like the following example:

 <html>
   <head>
@@ -6222,9 +6240,9 @@ application/x-chm = execm rclchm
           "literal">&amp;", "<" should be transformed into
           "&lt;". This is not
-          always properly done by translating programs which output
-          HTML, and of course never by those which output plain
-          text.

+ always properly done by external helper programs which + output HTML, and of course never by those which output + plain text.

When encapsulating plain text in an HTML body, the display of a preview may be improved by enclosing the @@ -6293,6 +6311,17 @@ or described in a further section.

+ +

Persistent filters can use another, probably simpler, + method to produce metadata, by calling the setfield() helper method. This avoids + the need to produce HTML, and any issue with HTML + quoting. See rclaudio in Recoll 1.23 and later for an example + of a handler which outputs text/plain and uses setfield() to produce metadata.

@@ -8676,6 +8705,23 @@ thesame = "some string with spaces" names. Can be redefined for any subtree.

+
skippedNames-
+ +
+

List of name endings to remove from the default + skippedNames list.

+
+ +
skippedNames+
+ +
+

List of name endings to add to the default + skippedNames list.

+
+
noContentSuffixes
@@ -8696,6 +8742,25 @@ thesame = "some string with spaces" subdirectories.

+
+ noContentSuffixes-
+ +
+

List of name endings to remove from the default + noContentSuffixes list.

+
+ +
noContentSuffixes+
+ +
+

List of name endings to add to the default + noContentSuffixes list.

+
+
skippedPaths
@@ -8798,6 +8863,23 @@ thesame = "some string with spaces" subtrees.

+
nomd5mimetypes
+ +
+

Don't compute md5 for these types. md5 checksums + are used only for deduplicating results, and can be + very expensive to compute on multimedia or other + big files. This list lets you turn off md5 + computation for selected types. It is global (no + redefinition for subtrees). At the moment, it only + has an effect for external handlers (exec and + execm). The file types can be specified by listing + either MIME types (e.g. audio/mpeg) or handler + names (e.g. rclaudio).

+
+
+

All extension values in mimemap must be entered in lower case. + File name extensions are lower-cased for comparison + during indexing, meaning that an upper-case mimemap entry will never be + matched.

+

The mappings can be specified on a per-subtree basis, which may be useful in some cases. Example: okular notes have a - Input handler HTML output + Input handler output - The output HTML could be very minimal like the following - example: + Both the simple and persistent input handlers can return any + MIME type to Recoll, which will further process the data according + to the MIME configuration. + + Most input filters produce either + text/plain or text/html + data. There are exceptions: for example, filters which process + archive files (zip, tar, etc.) + will usually return the documents as they are found, without + processing them further. + + There is nothing to say about text/plain + output, except that its character encoding should be consistent + with what is specified in the mimeconf + file. + + For filters producing HTML, the output could be very minimal + like the following example: <html> <head> @@ -4234,7 +4250,7 @@ application/x-chm = execm rclchm "&amp;", "<" should be transformed into "&lt;". This is not always properly - done by translating programs which output HTML, and of + done by external helper programs which output HTML, and of course never by those which output plain text. When encapsulating plain text in an HTML body, @@ -4298,6 +4314,16 @@ or in a further section. + + Persistent filters can use another, probably simpler, + method to produce metadata, by calling the + setfield() helper method. This avoids the + need to produce HTML, and any issue with HTML quoting. See + rclaudio in &RCL; 1.23 and + later for an example of a handler which outputs + text/plain and uses + setfield() to produce metadata. +