Jean-Francois Dockes 2014-04-28 17:09:00 +02:00
parent 3de5b5af3c
commit 60110e8b54
3 changed files with 144 additions and 172 deletions


@ -1,5 +1,37 @@
# Wherever docbook.xsl and chunk.xsl live
# Fbsd
#XSLDIR="/usr/local/share/xsl/docbook/"
# Mac
#XSLDIR="/opt/local/share/xsl/docbook-xsl/"
#Linux
XSLDIR="/usr/share/xml/docbook/stylesheet/docbook-xsl/"
# Options common to the single-file and chunked versions
commonoptions=--stringparam section.autolabel 1 \
--stringparam section.autolabel.max.depth 3 \
--stringparam section.label.includes.component.label 1 \
--stringparam autotoc.label.in.hyperlink 0 \
--stringparam abstract.notitle.enabled 1 \
--stringparam html.stylesheet docbook-xsl.css \
--stringparam generate.toc "book toc,title,figure,table,example,equation"
all: usermanual.html index.html usermanual.pdf
usermanual.html: usermanual.xml usermanual.html: usermanual.xml
sh xmlmake.sh xsltproc ${commonoptions} \
-o tmpfile.html "${XSLDIR}/html/docbook.xsl" usermanual.xml
-tidy -indent tmpfile.html > usermanual.html
index.html: usermanual.xml
xsltproc ${commonoptions} \
--stringparam use.id.as.filename 1 \
--stringparam root.filename index \
"${XSLDIR}/html/chunk.xsl" usermanual.xml
usermanual.pdf: usermanual.xml
dblatex usermanual.xml
clean: clean:
rm -f RCL.*.html usermanual.pdf usermanual.html index.html rm -f RCL.*.html usermanual.pdf usermanual.html index.html tmpfile.html


@ -39,9 +39,6 @@
<para>This document introduces full text search notions <para>This document introduces full text search notions
and describes the installation and use of the &RCL; and describes the installation and use of the &RCL;
application. It currently describes &RCL; &RCLVERSION;.</para> application. It currently describes &RCL; &RCLVERSION;.</para>
<!-- <para>[ <ulink url="index.html">Split HTML</ulink> /
<ulink url="usermanual-xml.html">Single HTML</ulink> ]</para>
-->
</abstract> </abstract>
@ -141,7 +138,7 @@
<para>&RCL; stores all internal data in <application>Unicode <para>&RCL; stores all internal data in <application>Unicode
UTF-8</application> format, and it can index files with UTF-8</application> format, and it can index files with
different character sets, encodings, and languages into the same different character sets, encodings, and languages into the same
index. It has input filters for many document types.</para> index. It can process many document types.</para>
<para>Stemming is the process by which &RCL; reduces words to <para>Stemming is the process by which &RCL; reduces words to
their radicals so that searching does not depend, for example, on a their radicals so that searching does not depend, for example, on a
@ -381,9 +378,9 @@
patterns to the <literal>skippedNames</literal> list, which patterns to the <literal>skippedNames</literal> list, which
can be done from the GUI Index configuration menu. It is can be done from the GUI Index configuration menu. It is
also possible to exclude a mime type independently of the also possible to exclude a mime type independently of the
file name by associating it with file name by associating it with the
the <filename>rclnull</filename> filter. This can be done by <filename>rclnull</filename> input handler. This can be done
editing the <link linkend="RCL.INSTALL.CONFIG.MIMECONF"> by editing the <link linkend="RCL.INSTALL.CONFIG.MIMECONF">
<filename>mimeconf</filename> configuration <filename>mimeconf</filename> configuration
file</link>.</para> file</link>.</para>
@ -2463,7 +2460,7 @@ fs.inotify.max_user_watches=32768
and <literal>filename</literal>), so this feature will need and <literal>filename</literal>), so this feature will need
some custom local configuration to be useful. An example some custom local configuration to be useful. An example
candidate would be the <literal>recipient</literal> field candidate would be the <literal>recipient</literal> field
which is generated by the message filters.</para> which is generated by the message input handlers.</para>
<para>The default value for the paragraph format string is: <para>The default value for the paragraph format string is:
<screen><![CDATA[ <screen><![CDATA[
@ -2961,7 +2958,7 @@ dir:recoll dir:src -dir:utils -dir:common
<link linkend="RCL.SEARCH.WILDCARDS"> <link linkend="RCL.SEARCH.WILDCARDS">
More about wildcards</link>.</para> More about wildcards</link>.</para>
<para>The document filters used while indexing have the <para>The document input handlers used while indexing have the
possibility to create other fields with arbitrary names, and possibility to create other fields with arbitrary names, and
aliases may be defined in the configuration, so that the exact aliases may be defined in the configuration, so that the exact
field search possibilities may be different for you if someone field search possibilities may be different for you if someone
@ -3293,7 +3290,7 @@ dir:recoll dir:src -dir:utils -dir:common
<application>Python</application> language.</para> <application>Python</application> language.</para>
<para>Another less radical way to extend the application is to <para>Another less radical way to extend the application is to
write filters for new types of documents.</para> write input handlers for new types of documents.</para>
<para>The processing of metadata attributes for documents <para>The processing of metadata attributes for documents
(<literal>fields</literal>) is highly configurable.</para> (<literal>fields</literal>) is highly configurable.</para>
@ -3301,69 +3298,77 @@ dir:recoll dir:src -dir:utils -dir:common
<sect1 id="RCL.PROGRAM.FILTERS"> <sect1 id="RCL.PROGRAM.FILTERS">
<title>Writing a document filter</title> <title>Writing a document input handler</title>
<para>&RCL; filters cooperate to translate from the multitude <note><title>Terminology</title>The small programs or pieces
of code which handle the processing of the different document
types for &RCL; used to be called <literal>filters</literal>,
which is still reflected in the name of the directory which
holds them and many configuration variables. They were named
this way because one of their primary functions is to filter
out the formatting directives and keep the text
content. However these modules may have other behaviours, and
the term <literal>input handler</literal> is now progressively
substituted in the documentation. <literal>filter</literal> is
still used in many places though.</note>
<para>&RCL; input handlers cooperate to translate from the multitude
of input document formats, simple ones of input document formats, simple ones
(such as <application>opendocument</application>, (such as <application>opendocument</application>,
<application>acrobat</application>), or compound ones such <application>acrobat</application>), or compound ones such
as <application>Zip</application> as <application>Zip</application>
or <application>Email</application>, into the final &RCL; or <application>Email</application>, into the final &RCL;
indexing input format, which may indexing input format, which is plain text.
be <literal>text/plain</literal> Most input handlers are executable
or <literal>text/html</literal>. Most filters are executable programs or scripts. A few handlers are coded in C++ and live
programs or scripts. A few filters are coded in C++ and live
inside <command>recollindex</command>. This latter kind will not inside <command>recollindex</command>. This latter kind will not
be described here.</para> be described here.</para>
<para>There are currently (1.18 and since 1.13) two kinds of <para>There are currently (1.18 and since 1.13) two kinds of
external executable filters: external executable input handlers:
<itemizedlist> <itemizedlist>
<listitem><para>Simple filters (<literal>exec</literal> <listitem><para>Simple <literal>exec</literal> handlers
filters) run once and run once and exit. They can be bare programs like
exit. They can be bare programs <command>antiword</command>, or scripts using other
like <application>antiword</application>, or scripts programs. They are very simple to write, because they just
using other programs. They are very simple to write, need to print the converted document to the standard
because they just need to print the converted document output. Their output can be plain text or HTML. HTML is
to the standard output. Their output can usually preferred because it can store metadata fields and
be <literal>text/plain</literal> it allows preserving some of the formatting for the GUI
or <literal>text/html</literal>.</para> preview.</para>
</listitem> </listitem>
<listitem><para>Multiple filters (<literal>execm</literal> <listitem><para>Multiple <literal>execm</literal> handlers
filters), run as long as can process multiple files (sparing the process startup
their master process (<command>recollindex</command>) is time which can be very significant), or multiple documents
active. They can process multiple files (sparing the per file (e.g.: for <application>zip</application> or
process startup time which can be very significant), <application>chm</application> files). They communicate
or multiple documents per file (e.g.: for zip or chm with the indexer through a simple protocol, but are
files). They communicate with the indexer through a nevertheless a bit more complicated than the older
simple protocol, but are nevertheless a bit more kind. Most new handlers are written in
complicated than the older kind. Most of new <application>Python</application>, using a common module
filters are written to handle the protocol. There is an exception,
in <application>Python</application>, using a common <command>rclimg</command> which is written in Perl. The
module to handle the protocol. There is an subdocuments output by these handlers can be directly
exception, <command>rclimg</command> which is written indexable (text or HTML), or they can be other simple or
in Perl. The subdocuments output by these filters can compound documents that will need to be processed by
be directly indexable (text or HTML), or they can be another handler.</para>
other simple or compound documents that will need to
be processed by another filter.</para>
</listitem> </listitem>
</itemizedlist> </itemizedlist>
</para> </para>
<para>In both cases, filters deal with regular file system <para>In both cases, handlers deal with regular file system
files, and can process either a single document, or a files, and can process either a single document, or a
linear list of documents in each file. &RCL; is responsible linear list of documents in each file. &RCL; is responsible
for performing up-to-date checks, dealing with more complex for performing up-to-date checks, dealing with more complex
embedding and other upper level issues.</para> embedding and other upper level issues.</para>
<para>In the extreme case of a simple filter returning a <para>A simple handler returning a
document in <literal>text/plain</literal> format, no document in <literal>text/plain</literal> format can transfer
metadata can be transferred from the filter to the no metadata to the indexer. Generic metadata, like document
indexer. Generic metadata, like document size or size or modification date, will be gathered and stored by
modification date, will be gathered and stored by the the indexer.</para>
indexer.</para>
<para>Filters that produce <literal>text/html</literal> <para>Handlers that produce <literal>text/html</literal>
format can return an arbitrary amount of metadata inside HTML format can return an arbitrary amount of metadata inside HTML
<literal>meta</literal> tags. These will be processed <literal>meta</literal> tags. These will be processed
according to the directives found in according to the directives found in
@ -3371,7 +3376,7 @@ dir:recoll dir:src -dir:utils -dir:common
<filename>fields</filename> configuration <filename>fields</filename> configuration
file</link>.</para> file</link>.</para>
<para>The filters that can handle multiple documents per file <para>The handlers that can handle multiple documents per file
return a single piece of data to identify each document inside return a single piece of data to identify each document inside
the file. This piece of data, called the file. This piece of data, called
an <literal>ipath element</literal> will be sent back by an <literal>ipath element</literal> will be sent back by
@ -3380,27 +3385,27 @@ dir:recoll dir:src -dir:utils -dir:common
viewer.</para> viewer.</para>
<para>The following section describes the simple <para>The following section describes the simple
filters, and the next one gives a few explanations about handlers, and the next one gives a few explanations about
the <literal>execm</literal> ones. You could conceivably the <literal>execm</literal> ones. You could conceivably
write a simple filter with only the elements in the write a simple handler with only the elements in the
manual. This will not be the case for the other ones, for manual. This will not be the case for the other ones, for
which you will have to look at the code.</para> which you will have to look at the code.</para>
<sect2 id="RCL.PROGRAM.FILTERS.SIMPLE"> <sect2 id="RCL.PROGRAM.FILTERS.SIMPLE">
<title>Simple filters</title> <title>Simple input handlers</title>
<para>&RCL; simple filters are usually shell-scripts, but this is in <para>&RCL; simple handlers are usually shell-scripts, but this is in
no way necessary. Extracting the text from the native format is the no way necessary. Extracting the text from the native format is the
difficult part. Outputting the format expected by &RCL; is difficult part. Outputting the format expected by &RCL; is
trivial. Happily enough, most document formats have translators or trivial. Happily enough, most document formats have translators or
text extractors which can be called from the filter. In some cases text extractors which can be called from the handler. In some cases
the output of the translating program is completely appropriate, the output of the translating program is completely appropriate,
and no intermediate shell-script is needed.</para> and no intermediate shell-script is needed.</para>
<para>Filters are called with a single argument which is the <para>Input handlers are called with a single argument which is the
source file name. They should output the result to stdout.</para> source file name. They should output the result to stdout.</para>
<para>When writing a filter, you should decide if it will output <para>When writing a handler, you should decide if it will output
plain text or HTML. Plain text is simpler, but you will not be able plain text or HTML. Plain text is simpler, but you will not be able
to add metadata or vary the output character encoding (this will be to add metadata or vary the output character encoding (this will be
defined in a configuration file). Additionally, some formatting may defined in a configuration file). Additionally, some formatting may
@ -3411,25 +3416,25 @@ dir:recoll dir:src -dir:utils -dir:common
<para>The <envar>RECOLL_FILTER_FORPREVIEW</envar> environment <para>The <envar>RECOLL_FILTER_FORPREVIEW</envar> environment
variable (values <literal>yes</literal>, <literal>no</literal>) variable (values <literal>yes</literal>, <literal>no</literal>)
tells the filter if the operation is for indexing or tells the handler if the operation is for indexing or
previewing. Some filters use this to output a slightly different previewing. Some handlers use this to output a slightly different
format, for example stripping uninteresting repeated keywords (ie: format, for example stripping uninteresting repeated keywords (ie:
<literal>Subject:</literal> for email) when indexing. This is not <literal>Subject:</literal> for email) when indexing. This is not
essential.</para> essential.</para>
<para>You should look at one of the simple filters, for example <para>You should look at one of the simple handlers, for example
<command>rclps</command> for a starting point.</para> <command>rclps</command> for a starting point.</para>
<para>Don't forget to make your filter executable before <para>Don't forget to make your handler executable before
testing!</para> testing!</para>
</sect2> </sect2>
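The simple handler contract described in this hunk (one file name argument, converted document on stdout, optional use of <envar>RECOLL_FILTER_FORPREVIEW</envar>) can be sketched in Python as follows. This is a minimal illustration and not one of the shipped handlers: the function names <literal>convert</literal> and <literal>main</literal>, the choice of a plain-text input file, and the whitespace collapsing done for indexing are all assumptions made for the example. The exact HTML header Recoll expects should be checked against an existing handler such as <command>rclps</command>.

```python
"""Sketch of a simple (exec) input handler: one file name in, HTML on stdout."""
import html
import os
import sys


def convert(path, for_preview=False):
    """Return a minimal HTML document for the text file at `path`."""
    with open(path, encoding="utf-8", errors="replace") as f:
        text = f.read()
    if not for_preview:
        # Illustrative assumption: layout does not matter for indexing,
        # so runs of whitespace are collapsed. Previews keep the layout.
        text = " ".join(text.split())
    title = html.escape(os.path.basename(path))
    return (
        "<html><head>\n"
        f"<title>{title}</title>\n"
        '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\n'
        "</head><body>\n"
        f"<pre>{html.escape(text)}</pre>\n"
        "</body></html>\n"
    )


def main(argv, environ):
    """Entry point: argv[1] is the source file name, result goes to stdout."""
    if len(argv) != 2:
        sys.stderr.write("usage: handler <filename>\n")
        return 1
    for_preview = environ.get("RECOLL_FILTER_FORPREVIEW", "no") == "yes"
    sys.stdout.write(convert(argv[1], for_preview))
    return 0

# A real script would end with: sys.exit(main(sys.argv, os.environ))
```

Producing HTML rather than plain text is what leaves room for a title and further <literal>meta</literal> fields. The script still has to be made executable and associated with a mime type before it is used.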
<sect2 id="RCL.PROGRAM.FILTERS.MULTIPLE"> <sect2 id="RCL.PROGRAM.FILTERS.MULTIPLE">
<title>"Multiple" filters</title> <title>"Multiple" handlers</title>
<para>If you can program and want to write <para>If you can program and want to write
an <literal>execm</literal> filter, it should not be too an <literal>execm</literal> handler, it should not be too
difficult to make sense of one of the existing modules. For difficult to make sense of one of the existing modules. For
example, look at <command>rclzip</command> which uses Zip example, look at <command>rclzip</command> which uses Zip
file paths as identifiers (<literal>ipath</literal>), file paths as identifiers (<literal>ipath</literal>),
@ -3438,7 +3443,7 @@ dir:recoll dir:src -dir:utils -dir:common
the <filename>internfile/mh_execm.h</filename> file and the <filename>internfile/mh_execm.h</filename> file and
possibly at the corresponding module.</para> possibly at the corresponding module.</para>
<para><literal>execm</literal> filters sometimes need to make <para><literal>execm</literal> handlers sometimes need to make
a choice for the nature of the <literal>ipath</literal> a choice for the nature of the <literal>ipath</literal>
elements that they use in communication with the elements that they use in communication with the
indexer. Here are a few guidelines: indexer. Here are a few guidelines:
@ -3453,16 +3458,16 @@ dir:recoll dir:src -dir:utils -dir:common
separator to store a complex path internally (for separator to store a complex path internally (for
deeper embedding). Colons inside deeper embedding). Colons inside
the <literal>ipath</literal> elements output by a the <literal>ipath</literal> elements output by a
filter will be escaped, but would be a bad choice as a handler will be escaped, but would be a bad choice as a
filter-specific separator (mostly, again, for handler-specific separator (mostly, again, for
debugging issues).</para></listitem> debugging issues).</para></listitem>
</itemizedlist> </itemizedlist>
In any case, the main goal is that it should In any case, the main goal is that it should
be easy for the filter to extract the target document, given be easy for the handler to extract the target document, given
the file name and the <literal>ipath</literal> the file name and the <literal>ipath</literal>
element.</para> element.</para>
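As an illustration of the main goal stated above (easy extraction given the file name and the <literal>ipath</literal> element), here is a rough Python sketch that uses Zip member paths as identifiers, in the spirit of what the text says about <command>rclzip</command>. It is not the code of <command>rclzip</command> and it does not implement the <literal>execm</literal> protocol. The helper names <literal>list_ipaths</literal> and <literal>extract</literal> are hypothetical.

```python
import io
import zipfile


def list_ipaths(archive):
    """Return one identifier per document: here, the member path inside the Zip."""
    with zipfile.ZipFile(archive) as zf:
        # Directory entries end with "/" and carry no document data.
        return [name for name in zf.namelist() if not name.endswith("/")]


def extract(archive, ipath):
    """Return the raw bytes of the member designated by `ipath`."""
    with zipfile.ZipFile(archive) as zf:
        return zf.read(ipath)


# Usage: build a small archive in memory, then fetch one member back.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as zf:
    zf.writestr("docs/readme.txt", "hello")
print(list_ipaths(demo))
print(extract(demo, "docs/readme.txt"))
```

Because the identifier is the member path itself, extraction is a single lookup, and the value stays readable when it shows up in debugging output.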
<para><literal>execm</literal> filters will also produce <para><literal>execm</literal> handlers will also produce
a document with a null <literal>ipath</literal> a document with a null <literal>ipath</literal>
element. Depending on the type of document, this may have element. Depending on the type of document, this may have
some associated data (e.g. the body of an email message), or some associated data (e.g. the body of an email message), or
@ -3472,11 +3477,11 @@ dir:recoll dir:src -dir:utils -dir:common
</sect2> </sect2>
<sect2 id="RCL.PROGRAM.FILTERS.ASSOCIATION"> <sect2 id="RCL.PROGRAM.FILTERS.ASSOCIATION">
<title>Telling &RCL; about the filter</title> <title>Telling &RCL; about the handler</title>
<para>There are two elements that link a file to the filter which <para>There are two elements that link a file to the handler which
should process it: the association of file to mime type and the should process it: the association of file to mime type and the
association of a mime type with a filter.</para> association of a mime type with a handler.</para>
<para>The association of files to mime types is mostly based on <para>The association of files to mime types is mostly based on
name suffixes. The types are defined inside the name suffixes. The types are defined inside the
@ -3490,7 +3495,7 @@ dir:recoll dir:src -dir:utils -dir:common
to execute the <command>file -i</command> command to determine a to execute the <command>file -i</command> command to determine a
mime type.</para> mime type.</para>
<para>The association of file types to filters is performed in <para>The association of file types to handlers is performed in
the <link linkend="RCL.INSTALL.CONFIG.MIMECONF"> the <link linkend="RCL.INSTALL.CONFIG.MIMECONF">
<filename>mimeconf</filename> file</link>. A sample will probably be <filename>mimeconf</filename> file</link>. A sample will probably be
of better help than a long explanation:</para> of better help than a long explanation:</para>
@ -3532,7 +3537,7 @@ application/x-chm = execm rclchm
<command>unrtf</command> in the HTML header section.</para> <command>unrtf</command> in the HTML header section.</para>
</listitem> </listitem>
<listitem><para><literal>application/x-chm</literal> is processed <listitem><para><literal>application/x-chm</literal> is processed
by a persistent filter. This is determined by the by a persistent handler. This is determined by the
<literal>execm</literal> keyword.</para> <literal>execm</literal> keyword.</para>
</listitem> </listitem>
</itemizedlist> </itemizedlist>
@ -3541,7 +3546,7 @@ application/x-chm = execm rclchm
</sect2> </sect2>
<sect2 id="RCL.PROGRAM.FILTERS.HTML"> <sect2 id="RCL.PROGRAM.FILTERS.HTML">
<title>Filter HTML output</title> <title>Input handler HTML output</title>
<para>The output HTML could be very minimal like the following <para>The output HTML could be very minimal like the following
example: example:
@ -3607,7 +3612,7 @@ or
</programlisting> </programlisting>
</para> </para>
<para>Filters also have the possibility to "invent" field <para>Input handlers also have the possibility to "invent" field
names. This should also be output as meta tags:</para> names. This should also be output as meta tags:</para>
<programlisting> <programlisting>
@ -3634,10 +3639,10 @@ or
<title>Page numbers</title> <title>Page numbers</title>
<para>The indexer will interpret <literal>^L</literal> characters <para>The indexer will interpret <literal>^L</literal> characters
in the filter output as indicating page breaks, and will record in the handler output as indicating page breaks, and will record
them. At query time, this allows starting a viewer on the right them. At query time, this allows starting a viewer on the right
page for a hit or a snippet. Currently, only the PDF, Postscript page for a hit or a snippet. Currently, only the PDF, Postscript
and DVI filters generate page breaks.</para> and DVI handlers generate page breaks.</para>
</sect2> </sect2>
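The page break convention above amounts to separating the text of consecutive pages with a form feed character (<literal>^L</literal>, ASCII 0x0C) in the handler output. A minimal Python sketch, assuming the handler already holds the text of each page in a list (the <literal>join_pages</literal> name is hypothetical):

```python
FORM_FEED = "\f"  # the ^L character, ASCII 0x0C


def join_pages(pages):
    """Concatenate per-page text, marking each page boundary with ^L."""
    return FORM_FEED.join(pages)


# Usage: three pages produce two page-break markers.
body = join_pages(["first page", "second page", "third page"])
print(body.count(FORM_FEED))
```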
@ -3651,7 +3656,7 @@ or
<literal>author</literal>, <literal>abstract</literal>.</para> <literal>author</literal>, <literal>abstract</literal>.</para>
<para>The field values for documents can appear in several ways <para>The field values for documents can appear in several ways
during indexing: either output by filters during indexing: either output by input handlers
as <literal>meta</literal> fields in the HTML header section, or as <literal>meta</literal> fields in the HTML header section, or
extracted from file extended attributes, or added as attributes extracted from file extended attributes, or added as attributes
of the <literal>Doc</literal> object when using the API, or of the <literal>Doc</literal> object when using the API, or
@ -3661,7 +3666,7 @@ or
specific field.</para> specific field.</para>
<para>&RCL; defines a number of default fields. Additional <para>&RCL; defines a number of default fields. Additional
ones can be output by filters, and described in the ones can be output by handlers, and described in the
<filename>fields</filename> configuration file.</para> <filename>fields</filename> configuration file.</para>
<para>Fields can be:</para> <para>Fields can be:</para>
@ -3903,7 +3908,7 @@ or
<title>The Db class</title> <title>The Db class</title>
<para>A Db object is created by <para>A Db object is created by
a <literal>connect()</literal> function and holds a a <literal>connect()</literal> call and holds a
connection to a Recoll index.</para> connection to a Recoll index.</para>
<variablelist> <variablelist>
<title>Methods</title> <title>Methods</title>
@ -4381,7 +4386,7 @@ except:
directory.</para> directory.</para>
<para>A list of common file types which need external <para>A list of common file types which need external
commands follows. Many of the filters need the commands follows. Many of the handlers need the
<command>iconv</command> command, which is not always listed as a <command>iconv</command> command, which is not always listed as a
dependency.</para> dependency.</para>
@ -4398,7 +4403,7 @@ except:
type is important to you.</para> type is important to you.</para>
<para>As of &RCL; release 1.14, a number of XML-based formats that <para>As of &RCL; release 1.14, a number of XML-based formats that
were handled by ad hoc filter code now use the were handled by ad hoc handler code now use the
<command>xsltproc</command> command, which usually comes with <command>xsltproc</command> command, which usually comes with
<application>libxslt</application>. These are: abiword, fb2 <application>libxslt</application>. These are: abiword, fb2
(ebooks), kword, openoffice, svg.</para> (ebooks), kword, openoffice, svg.</para>
@ -4425,8 +4430,8 @@ except:
be used as a fallback for some files which be used as a fallback for some files which
<command>antiword</command> does not handle.</para></listitem> <command>antiword</command> does not handle.</para></listitem>
<listitem><para>MS Excel and PowerPoint need <command> <listitem><para>MS Excel and PowerPoint are processed by
catdoc</command>.</para></listitem> internal <command>Python</command> handlers.</para></listitem>
<listitem><para>MS Open XML (docx) needs <command> <listitem><para>MS Open XML (docx) needs <command>
xsltproc</command>.</para></listitem> xsltproc</command>.</para></listitem>
@ -4451,15 +4456,10 @@ except:
<command>djvused</command> from the <command>djvused</command> from the
<application>DjVuLibre</application> package.</para></listitem> <application>DjVuLibre</application> package.</para></listitem>
<listitem><para>Audio files: &RCL; releases before 1.13 <listitem><para>Audio files: &RCL; releases 1.14 and later use
used the <command>id3info</command> command from the <application> a single <application>Python</application> handler based
id3lib</application> package to extract mp3 tag information, on <application>mutagen</application> for all audio file
<command>metaflac</command> (standard flac tools) for flac files, types.</para>
and <command>ogginfo</command> (vorbis tools) for ogg
files. Releases 1.14 and later use a single
<application>Python</application> filter based
on <application>mutagen</application> for all audio file
types.</para>
</listitem> </listitem>
<listitem><para>Pictures: &RCL; uses the <listitem><para>Pictures: &RCL; uses the
@ -4471,7 +4471,7 @@ except:
store personal tags or textual descriptions inside the image store personal tags or textual descriptions inside the image
files.</para></listitem> files.</para></listitem>
<listitem><para>chm: files in microsoft help format need Python and <listitem><para>chm: files in Microsoft help format need Python and
the <application>pychm</application> module (which needs the <application>pychm</application> module (which needs
<application>chmlib</application>).</para></listitem> <application>chmlib</application>).</para></listitem>
@ -4498,15 +4498,15 @@ except:
<listitem><para>Konqueror webarchive format with Python (uses the <listitem><para>Konqueror webarchive format with Python (uses the
Tarfile module).</para></listitem> Tarfile module).</para></listitem>
<listitem><para>mimehtml web archive format (support based on the email <listitem><para>Mimehtml web archive format (support based on
filter, which introduces some mild weirdness, but still the email handler, which introduces some mild weirdness, but
usable).</para></listitem> still usable).</para></listitem>
</itemizedlist> </itemizedlist>
<para>Text, HTML, email folders, and Scribus files are <para>Text, HTML, email folders, and Scribus files are
processed internally. <application>Lyx</application> is used to processed internally. <application>Lyx</application> is used to
index Lyx files. Many filters need <command>iconv</command> and the index Lyx files. Many handlers need <command>iconv</command> and the
standard <command>sed</command> and <command>awk</command>. standard <command>sed</command> and <command>awk</command>.
</para> </para>
@ -4994,10 +4994,10 @@ skippedPaths = ~/somedir/*.txt
<listitem><para>A space-separated list of patterns for <listitem><para>A space-separated list of patterns for
names of files or directories that should be ignored names of files or directories that should be ignored
inside zip archives. This is used directly by the zip inside zip archives. This is used directly by the zip
filter, and has a function similar to skippedNames, but handler, and has a function similar to skippedNames, but
works independently. Can be redefined for filesystem works independently. Can be redefined for filesystem
subdirectories. For versions up to 1.19, you will need subdirectories. For versions up to 1.19, you will need
to update the Zip filter and install a supplementary to update the Zip handler and install a supplementary
Python module. The details are Python module. The details are
described <ulink url="https://bitbucket.org/medoc/recoll/wiki/Filtering%20out%20Zip%20archive%20members">on described <ulink url="https://bitbucket.org/medoc/recoll/wiki/Filtering%20out%20Zip%20archive%20members">on
the &RCL; wiki</ulink>. the &RCL; wiki</ulink>.
@ -5552,13 +5552,13 @@ mondelaypatterns = *.log:20 "this one has spaces*:10"
</varlistentry> </varlistentry>
<varlistentry><term><varname>filtermaxseconds</varname></term> <varlistentry><term><varname>filtermaxseconds</varname></term>
<listitem><para>Maximum filter execution time, after which it <listitem><para>Maximum handler execution time, after which it
is aborted. Some postscript programs just loop...</para> is aborted. Some postscript programs just loop...</para>
</listitem> </listitem>
</varlistentry> </varlistentry>
<varlistentry><term><varname>filtersdir</varname></term> <varlistentry><term><varname>filtersdir</varname></term>
<listitem><para>A directory to search for the external <listitem><para>A directory to search for the external
filter scripts used to index some types of files. The input handler scripts used to index some types of files. The
value should not be changed, except if you want to modify value should not be changed, except if you want to modify
one of the default scripts. The value can be redefined for one of the default scripts. The value can be redefined for
any sub-directory. </para> any sub-directory. </para>
@ -5678,9 +5678,9 @@ mondelaypatterns = *.log:20 "this one has spaces*:10"
</listitem> </listitem>
</varlistentry> </varlistentry>
<varlistentry> <varlistentry>
<term>filter-specific sections</term> <term>handler-specific sections</term>
<listitem><para>Some filters may need specific <listitem><para>Some input handlers may need specific
configuration for handling fields. Only the email message filter configuration for handling fields. Only the email message handler
currently has such a section (named currently has such a section (named
<literal>[mail]</literal>). It allows indexing arbitrary email <literal>[mail]</literal>). It allows indexing arbitrary email
headers in addition to the ones indexed by default. Other such headers in addition to the ones indexed by default. Other such
@ -5694,7 +5694,7 @@ mondelaypatterns = *.log:20 "this one has spaces*:10"
<filename>fields</filename> <filename>fields</filename>
file. This would extract a specific email header and file. This would extract a specific email header and
use it as a searchable field, with data displayable inside result use it as a searchable field, with data displayable inside result
lists. (Side note: as the email filter does no decoding on the values, lists. (Side note: as the email handler does no decoding on the values,
only plain ascii headers can be indexed, and only the only plain ascii headers can be indexed, and only the
first occurrence will be used for headers that occur several times). first occurrence will be used for headers that occur several times).
@ -6007,7 +6007,7 @@ application/x-blobapp = exec rclblob
</listitem> </listitem>
</itemizedlist> </itemizedlist>
<para>The <replaceable>rclblob</replaceable> filter should <para>The <replaceable>rclblob</replaceable> handler should
be an executable program or script which exists inside be an executable program or script which exists inside
<filename>/usr/[local/]share/recoll/filters</filename>. It <filename>/usr/[local/]share/recoll/filters</filename>. It
will be given a file name as argument and should output the will be given a file name as argument and should output the
@ -6015,7 +6015,7 @@ application/x-blobapp = exec rclblob
<para>The <link linkend="RCL.PROGRAM.FILTERS">filter <para>The <link linkend="RCL.PROGRAM.FILTERS">filter
programming</link> section describes in more detail how programming</link> section describes in more detail how
to write a filter.</para> to write an input handler.</para>
</sect3> </sect3>


@ -1,60 +0,0 @@
#!/bin/sh
# A script to produce the Recoll manual with an xml toolchain.
# Tools used:
# - xsltproc
# - The docbook-xsl stylesheets
# - dblatex for producing the PDF.
#
# Limitations:
# - Does not produce the links to the whole/chunked versions at the top
# of the document
# - The anchor names from the source text are converted to uppercase
# by the sgml toolchain. This does not happen with the xml
# toolchain, which means that external links like
# usermanual.html#RCL.CONFIG.INDEXING won't work because fragments
# are case-sensitive. This has been solved by converting all ids
# inside the source file to upper-case. DON'T REINTRODUCE
# lower-case IDS
# Wherever docbook.xsl and chunk.xsl live
# Fbsd
#XSLDIR="/usr/local/share/xsl/docbook/"
# Mac
#XSLDIR="/opt/local/share/xsl/docbook-xsl/"
#Linux
XSLDIR="/usr/share/xml/docbook/stylesheet/docbook-xsl/"
dochunky=1
test $# -eq 1 && dochunky=0
# Options common to the single-file and chunked versions
commonoptions="--stringparam section.autolabel 1 \
--stringparam section.autolabel.max.depth 3 \
--stringparam section.label.includes.component.label 1 \
--stringparam autotoc.label.in.hyperlink 0 \
--stringparam abstract.notitle.enabled 1 \
--stringparam html.stylesheet docbook-xsl.css \
--stringparam generate.toc \"book toc,title,figure,table,example,equation\" \
"
# Do the chunky thing
if test $dochunky -ne 0 ; then
eval xsltproc $commonoptions \
--stringparam use.id.as.filename 1 \
--stringparam root.filename index \
"$XSLDIR/html/chunk.xsl" \
usermanual.xml
fi
# Produce the single file version
eval xsltproc $commonoptions \
-o usermanual.html \
"$XSLDIR/html/docbook.xsl" \
usermanual.xml
tidy -indent usermanual.html > tmpfile
mv -f tmpfile usermanual.html
# And the pdf with dblatex
dblatex usermanual.xml