doc

parent 3de5b5af3c
commit 60110e8b54
@@ -1,5 +1,37 @@
+# Wherever docbook.xsl and chunk.xsl live
+# Fbsd
+#XSLDIR="/usr/local/share/xsl/docbook/"
+# Mac
+#XSLDIR="/opt/local/share/xsl/docbook-xsl/"
+#Linux
+XSLDIR="/usr/share/xml/docbook/stylesheet/docbook-xsl/"
+
+
+# Options common to the single-file and chunked versions
+commonoptions=--stringparam section.autolabel 1 \
+--stringparam section.autolabel.max.depth 3 \
+--stringparam section.label.includes.component.label 1 \
+--stringparam autotoc.label.in.hyperlink 0 \
+--stringparam abstract.notitle.enabled 1 \
+--stringparam html.stylesheet docbook-xsl.css \
+--stringparam generate.toc "book toc,title,figure,table,example,equation"
+
+
+all: usermanual.html index.html usermanual.pdf
+
 usermanual.html: usermanual.xml
-	sh xmlmake.sh
+	xsltproc ${commonoptions} \
+	-o tmpfile.html "${XSLDIR}/html/docbook.xsl" usermanual.xml
+	-tidy -indent tmpfile.html > usermanual.html
 
+index.html: usermanual.xml
+	xsltproc ${commonoptions} \
+	--stringparam use.id.as.filename 1 \
+	--stringparam root.filename index \
+	"${XSLDIR}/html/chunk.xsl" usermanual.xml
+
+usermanual.pdf: usermanual.xml
+	dblatex usermanual.xml
+
 clean:
-	rm -f RCL.*.html usermanual.pdf usermanual.html index.html
+	rm -f RCL.*.html usermanual.pdf usermanual.html index.html tmpfile.html
@@ -39,9 +39,6 @@
 <para>This document introduces full text search notions
 and describes the installation and use of the &RCL;
 application. It currently describes &RCL; &RCLVERSION;.</para>
-<!-- <para>[ <ulink url="index.html">Split HTML</ulink> /
-<ulink url="usermanual-xml.html">Single HTML</ulink> ]</para>
--->
 </abstract>
 
 
@@ -141,7 +138,7 @@
 <para>&RCL; stores all internal data in <application>Unicode
 UTF-8</application> format, and it can index files with
 different character sets, encodings, and languages into the same
-index. It has input filters for many document types.</para>
+index. It has can process many document types.</para>
 
 <para>Stemming is the process by which &RCL; reduces words to
 their radicals so that searching does not depend, for example, on a
@@ -381,9 +378,9 @@
 patterns to the <literal>skippedNames</literal> list, which
 can be done from the GUI Index configuration menu. It is
 also possible to exclude a mime type independantly of the
-file name by associating it with
-the <filename>rclnull</filename> filter. This can be done by
-editing the <link linkend="RCL.INSTALL.CONFIG.MIMECONF">
+file name by associating it with the
+<filename>rclnull</filename> input handler. This can be done
+by editing the <link linkend="RCL.INSTALL.CONFIG.MIMECONF">
 <filename>mimeconf</filename> configuration
 file</link>.</para>
 
@@ -2463,7 +2460,7 @@ fs.inotify.max_user_watches=32768
 and <literal>filename</literal>), so this feature will need
 some custom local configuration to be useful. An example
 candidate would be the <literal>recipient</literal> field
-which is generated by the message filters.</para>
+which is generated by the message input handlers.</para>
 
 <para>The default value for the paragraph format string is:
 <screen><![CDATA[
@@ -2961,7 +2958,7 @@ dir:recoll dir:src -dir:utils -dir:common
 <link linkend="RCL.SEARCH.WILDCARDS">
 More about wildcards</link>.</para>
 
-<para>The document filters used while indexing have the
+<para>The document input handlers used while indexing have the
 possibility to create other fields with arbitrary names, and
 aliases may be defined in the configuration, so that the exact
 field search possibilities may be different for you if someone
@@ -3293,7 +3290,7 @@ dir:recoll dir:src -dir:utils -dir:common
 <application>Python</application> language.</para>
 
 <para>Another less radical way to extend the application is to
-write filters for new types of documents.</para>
+write input handlers for new types of documents.</para>
 
 <para>The processing of metadata attributes for documents
 (<literal>fields</literal>) is highly configurable.</para>
@@ -3301,69 +3298,77 @@ dir:recoll dir:src -dir:utils -dir:common
 
 
 <sect1 id="RCL.PROGRAM.FILTERS">
-<title>Writing a document filter</title>
+<title>Writing a document input handler</title>
 
+<note><title>Terminology</title>The small programs or pieces
+of code which handle the processing of the different document
+types for &RCL; used to be called <literal>filters</literal>,
+which is still reflected in the name of the directory which
+holds them and many configuration variables. They were named
+this way because one of their primary functions is to filter
+out the formatting directives and keep the text
+content. However these modules may have other behaviours, and
+the term <literal>input handler</literal> is now progressively
+substituted in the documentation. <literal>filter</literal> is
+still used in many places though.</note>
+
-<para>&RCL; filters cooperate to translate from the multitude
+<para>&RCL; input handlers cooperate to translate from the multitude
 of input document formats, simple ones
 as <application>opendocument</application>,
 <application>acrobat</application>), or compound ones such
 as <application>Zip</application>
 or <application>Email</application>, into the final &RCL;
-indexing input format, which may
-be <literal>text/plain</literal>
-or <literal>text/html</literal>. Most filters are executable
-programs or scripts. A few filters are coded in C++ and live
+indexing input format, which is plain text.
+Most input handlers are executable
+programs or scripts. A few handlers are coded in C++ and live
 inside <command>recollindex</command>. This latter kind will not
 be described here.</para>
 
 <para>There are currently (1.18 and since 1.13) two kinds of
-external executable filters:
+external executable input handlers:
 <itemizedlist>
-<listitem><para>Simple filters (<literal>exec</literal>
-filters) run once and
-exit. They can be bare programs
-like <application>antiword</application>, or scripts
-using other programs. They are very simple to write,
-because they just need to print the converted document
-to the standard output. Their output can
-be <literal>text/plain</literal>
-or <literal>text/html</literal>.</para>
+<listitem><para>Simple <literal>exec</literal> handlers
+run once and exit. They can be bare programs like
+<command>antiword</command>, or scripts using other
+programs. They are very simple to write, because they just
+need to print the converted document to the standard
+output. Their output can be plain text or HTML. HTML is
+usually preferred because it can store metadata fields and
+it allows preserving some of the formatting for the GUI
+preview.</para>
 </listitem>
-<listitem><para>Multiple filters (<literal>execm</literal>
-filters), run as long as
-their master process (<command>recollindex</command>) is
-active. They can process multiple files (sparing the
-process startup time which can be very significant),
-or multiple documents per file (e.g.: for zip or chm
-files). They communicate with the indexer through a
-simple protocol, but are nevertheless a bit more
-complicated than the older kind. Most of new
-filters are written
-in <application>Python</application>, using a common
-module to handle the protocol. There is an
-exception, <command>rclimg</command> which is written
-in Perl. The subdocuments output by these filters can
-be directly indexable (text or HTML), or they can be
-other simple or compound documents that will need to
-be processed by another filter.</para>
+<listitem><para>Multiple <literal>execm</literal> handlers
+can process multiple files (sparing the process startup
+time which can be very significant), or multiple documents
+per file (e.g.: for <application>zip</application> or
+<application>chm</application> files). They communicate
+with the indexer through a simple protocol, but are
+nevertheless a bit more complicated than the older
+kind. Most of new handlers are written in
+<application>Python</application>, using a common module
+to handle the protocol. There is an exception,
+<command>rclimg</command> which is written in Perl. The
+subdocuments output by these handlers can be directly
+indexable (text or HTML), or they can be other simple or
+compound documents that will need to be processed by
+another handler.</para>
 </listitem>
 </itemizedlist>
 </para>
 
-<para>In both cases, filters deal with regular file system
+<para>In both cases, handlers deal with regular file system
 files, and can process either a single document, or a
 linear list of documents in each file. &RCL; is responsible
 for performing up to date checks, deal with more complex
 embedding and other upper level issues.</para>
 
-<para>In the extreme case of a simple filter returning a
-document in <literal>text/plain</literal> format, no
-metadata can be transferred from the filter to the
-indexer. Generic metadata, like document size or
-modification date, will be gathered and stored by the
-indexer.</para>
+<para>A simple handler returning a
+document in <literal>text/plain</literal> format, can transfer
+no metadata to the indexer. Generic metadata, like document
+size or modification date, will be gathered and stored by
+the indexer.</para>
 
-<para>Filters that produce <literal>text/html</literal>
+<para>Handlers that produce <literal>text/html</literal>
 format can return an arbitrary amount of metadata inside HTML
 <literal>meta</literal> tags. These will be processed
 according to the directives found in
@@ -3371,7 +3376,7 @@ dir:recoll dir:src -dir:utils -dir:common
 <filename>fields</filename> configuration
 file</link>.</para>
 
-<para>The filters that can handle multiple documents per file
+<para>The handlers that can handle multiple documents per file
 return a single piece of data to identify each document inside
 the file. This piece of data, called
 an <literal>ipath element</literal> will be sent back by
@@ -3380,27 +3385,27 @@ dir:recoll dir:src -dir:utils -dir:common
 viewer.</para>
 
 <para>The following section describes the simple
-filters, and the next one gives a few explanations about
+handlers, and the next one gives a few explanations about
 the <literal>execm</literal> ones. You could conceivably
-write a simple filter with only the elements in the
+write a simple handler with only the elements in the
 manual. This will not be the case for the other ones, for
 which you will have to look at the code.</para>
 
 <sect2 id="RCL.PROGRAM.FILTERS.SIMPLE">
-<title>Simple filters</title>
+<title>Simple input handlers</title>
 
-<para>&RCL; simple filters are usually shell-scripts, but this is in
+<para>&RCL; simple handlers are usually shell-scripts, but this is in
 no way necessary. Extracting the text from the native format is the
 difficult part. Outputting the format expected by &RCL; is
 trivial. Happily enough, most document formats have translators or
-text extractors which can be called from the filter. In some cases
+text extractors which can be called from the handler. In some cases
 the output of the translating program is completely appropriate,
 and no intermediate shell-script is needed.</para>
 
-<para>Filters are called with a single argument which is the
+<para>Input handlers are called with a single argument which is the
 source file name. They should output the result to stdout.</para>
 
-<para>When writing a filter, you should decide if it will output
+<para>When writing a handler, you should decide if it will output
 plain text or HTML. Plain text is simpler, but you will not be able
 to add metadata or vary the output character encoding (this will be
 defined in a configuration file). Additionally, some formatting may
@@ -3411,25 +3416,25 @@ dir:recoll dir:src -dir:utils -dir:common
 
 <para>The <envar>RECOLL_FILTER_FORPREVIEW</envar> environment
 variable (values <literal>yes</literal>, <literal>no</literal>)
-tells the filter if the operation is for indexing or
-previewing. Some filters use this to output a slightly different
+tells the handler if the operation is for indexing or
+previewing. Some handlers use this to output a slightly different
 format, for example stripping uninteresting repeated keywords (ie:
 <literal>Subject:</literal> for email) when indexing. This is not
 essential.</para>
 
-<para>You should look at one of the simple filters, for example
+<para>You should look at one of the simple handlers, for example
 <command>rclps</command> for a starting point.</para>
 
-<para>Don't forget to make your filter executable before
+<para>Don't forget to make your handler executable before
 testing !</para>
 
 </sect2>
 
 <sect2 id="RCL.PROGRAM.FILTERS.MULTIPLE">
-<title>"Multiple" filters</title>
+<title>"Multiple" handlers</title>
 
 <para>If you can program and want to write
-an <literal>execm</literal> filter, it should not be too
+an <literal>execm</literal> handler, it should not be too
 difficult to make sense of one of the existing modules. For
 example, look at <command>rclzip</command> which uses Zip
 file paths as identifiers (<literal>ipath</literal>),
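The simple handler contract described in the manual text above (one argument, the source file name; converted output printed on stdout as plain text or HTML; an optional `RECOLL_FILTER_FORPREVIEW` hint of `yes` or `no`) can be illustrated with a short Python sketch. It is hypothetical and not one of the handlers shipped with Recoll. It assumes UTF-8 text input whose first line can serve as a title, treats an unset `RECOLL_FILTER_FORPREVIEW` as `no`, and its indexing-time tweak (not repeating the title line in the body) is only an example of the kind of preview/indexing difference the manual mentions.

```python
#!/usr/bin/env python3
"""Hypothetical simple (exec) input handler sketch for UTF-8 text files.

Assumptions: the input is UTF-8 text and its first line is a usable title.
"""
import html
import os
import sys


def render(text, for_preview):
    """Return a minimal HTML document for the extracted text."""
    lines = text.split("\n")
    title = lines[0].strip()
    if not for_preview and title:
        # Illustrative indexing-only tweak: do not repeat the title line
        # in the body, since it is already emitted in <title>.
        lines = lines[1:]
    body = html.escape("\n".join(lines))
    return (
        "<html>\n<head>\n"
        '<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">\n'
        "<title>" + html.escape(title) + "</title>\n"
        "</head>\n<body>\n<pre>\n" + body + "\n</pre>\n</body>\n</html>\n"
    )


def main(argv):
    # Simple handlers receive a single argument: the source file name.
    if len(argv) != 2:
        sys.stderr.write("usage: %s <file>\n" % argv[0])
        return 1
    # "yes" when previewing, "no" when indexing; defaulting to "no" when
    # the variable is unset is an assumption of this sketch.
    for_preview = os.environ.get("RECOLL_FILTER_FORPREVIEW", "no") == "yes"
    with open(argv[1], encoding="utf-8", errors="replace") as f:
        text = f.read()
    sys.stdout.write(render(text, for_preview))
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

As the manual says, make the script executable before testing it.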
@@ -3438,7 +3443,7 @@ dir:recoll dir:src -dir:utils -dir:common
 the <filename>internfile/mh_execm.h</filename> file and
 possibly at the corresponding module.</para>
 
-<para><literal>execm</literal> filters sometimes need to make
+<para><literal>execm</literal> handlers sometimes need to make
 a choice for the nature of the <literal>ipath</literal>
 elements that they use in communication with the
 indexer. Here are a few guidelines:
@@ -3453,16 +3458,16 @@ dir:recoll dir:src -dir:utils -dir:common
 separator to store a complex path internally (for
 deeper embedding). Colons inside
 the <literal>ipath</literal> elements output by a
-filter will be escaped, but would be a bad choice as a
-filter-specific separator (mostly, again, for
+handler will be escaped, but would be a bad choice as a
+handler-specific separator (mostly, again, for
 debugging issues).</para></listitem>
 </itemizedlist>
 In any case, the main goal is that it should
-be easy for the filter to extract the target document, given
+be easy for the handler to extract the target document, given
 the file name and the <literal>ipath</literal>
 element.</para>
 
-<para><literal>execm</literal> filters will also produce
+<para><literal>execm</literal> handlers will also produce
 a document with a null <literal>ipath</literal>
 element. Depending on the type of document, this may have
 some associated data (e.g. the body of an email message), or
@@ -3472,11 +3477,11 @@ dir:recoll dir:src -dir:utils -dir:common
 </sect2>
 
 <sect2 id="RCL.PROGRAM.FILTERS.ASSOCIATION">
-<title>Telling &RCL; about the filter</title>
+<title>Telling &RCL; about the handler</title>
 
-<para>There are two elements that link a file to the filter which
+<para>There are two elements that link a file to the handler which
 should process it: the association of file to mime type and the
-association of a mime type with a filter.</para>
+association of a mime type with a handler.</para>
 
 <para>The association of files to mime types is mostly based on
 name suffixes. The types are defined inside the
@@ -3490,7 +3495,7 @@ dir:recoll dir:src -dir:utils -dir:common
 to execute the <command>file -i</command> command to determine a
 mime type.</para>
 
-<para>The association of file types to filters is performed in
+<para>The association of file types to handlers is performed in
 the <link linkend="RCL.INSTALL.CONFIG.MIMECONF">
 <filename>mimeconf</filename> file</link>. A sample will probably be
 of better help than a long explanation:</para>
@@ -3532,7 +3537,7 @@ application/x-chm = execm rclchm
 <command>unrtf</command> in the HTML header section.</para>
 </listitem>
 <listitem><para><literal>application/x-chm</literal> is processed
-by a persistant filter. This is determined by the
+by a persistant handler. This is determined by the
 <literal>execm</literal> keyword.</para>
 </listitem>
 </itemizedlist>
@@ -3541,7 +3546,7 @@ application/x-chm = execm rclchm
 </sect2>
 
 <sect2 id="RCL.PROGRAM.FILTERS.HTML">
-<title>Filter HTML output</title>
+<title>Input handler HTML output</title>
 
 <para>The output HTML could be very minimal like the following
 example:
@@ -3607,7 +3612,7 @@ or
 </programlisting>
 </para>
 
-<para>Filters also have the possibility to "invent" field
+<para>Input handlers also have the possibility to "invent" field
 names. This should also be output as meta tags:</para>
 
 <programlisting>
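Handlers producing HTML pass fields, including invented ones, as `meta` tags in the header section. The helper below is a minimal Python sketch of generating those tags with proper escaping. The field name `myfield` and the sample values are hypothetical; how any field is then processed depends on the `fields` configuration file, which this sketch does not model.

```python
import html


def meta_tags(fields):
    """Return one HTML <meta> line per (name, value) pair in `fields`.

    Names and values are HTML-escaped so that characters such as '&' or
    '"' cannot break the attribute syntax.
    """
    lines = []
    for name, value in fields.items():
        lines.append('<meta name="%s" content="%s">'
                     % (html.escape(name), html.escape(value)))
    return "\n".join(lines)


if __name__ == "__main__":
    # "myfield" is an invented, hypothetical field name.
    print(meta_tags({"author": "J. Doe", "myfield": "a & b"}))
```

The resulting lines belong inside the `<head>` element of the handler output, next to any standard fields.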
@@ -3634,10 +3639,10 @@ or
 <title>Page numbers</title>
 
 <para>The indexer will interpret <literal>^L</literal> characters
-in the filter output as indicating page breaks, and will record
+in the handler output as indicating page breaks, and will record
 them. At query time, this allows starting a viewer on the right
 page for a hit or a snippet. Currently, only the PDF, Postscript
-and DVI filters generate page breaks.</para>
+and DVI handlers generate page breaks.</para>
 
 </sect2>
 
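To make the page-break convention concrete, this sketch shows how a position in handler output maps to a page number when every `^L` (form feed, U+000C) starts a new page. It only illustrates the counting rule stated in the manual; it is not how `recollindex` records pages internally, and the function names are hypothetical.

```python
def page_of_offset(text, offset):
    """Return the 1-based page containing character position `offset`.

    Each form feed (U+000C) located before `offset` counts as one page break.
    """
    return text.count("\f", 0, offset) + 1


def pages_containing(text, term):
    """Return the 1-based numbers of the pages on which `term` occurs."""
    return [number for number, page in enumerate(text.split("\f"), start=1)
            if term in page]


if __name__ == "__main__":
    sample = "alpha\fbeta\fgamma alpha"
    print(pages_containing(sample, "alpha"))             # [1, 3]
    print(page_of_offset(sample, sample.index("beta")))  # 2
```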
@@ -3651,7 +3656,7 @@ or
 <literal>author</literal>, <literal>abstract</literal>.</para>
 
 <para>The field values for documents can appear in several ways
-during indexing: either output by filters
+during indexing: either output by input handlers
 as <literal>meta</literal> fields in the HTML header section, or
 extracted from file extended attributes, or added as attributes
 of the <literal>Doc</literal> object when using the API, or
@@ -3661,7 +3666,7 @@ or
 specific field.</para>
 
 <para>&RCL; defines a number of default fields. Additional
-ones can be output by filters, and described in the
+ones can be output by handlers, and described in the
 <filename>fields</filename> configuration file.</para>
 
 <para>Fields can be:</para>
@@ -3903,7 +3908,7 @@ or
 <title>The Db class</title>
 
 <para>A Db object is created by
-a <literal>connect()</literal> function and holds a
+a <literal>connect()</literal> call and holds a
 connection to a Recoll index.</para>
 <variablelist>
 <title>Methods</title>
@@ -4381,7 +4386,7 @@ except:
 directory.</para>
 
 <para>A list of common file types which need external
-commands follows. Many of the filters need the
+commands follows. Many of the handlers need the
 <command>iconv</command> command, which is not always listed as a
 dependancy.</para>
 
@@ -4398,7 +4403,7 @@ except:
 type is important to you.</para>
 
 <para>As of &RCL; release 1.14, a number of XML-based formats that
-were handled by ad hoc filter code now use the
+were handled by ad hoc handler code now use the
 <command>xsltproc</command> command, which usually comes with
 <application>libxslt</application>. These are: abiword, fb2
 (ebooks), kword, openoffice, svg.</para>
@@ -4425,8 +4430,8 @@ except:
 be used as a fallback for some files which
 <command>antiword</command> does not handle.</para></listitem>
 
-<listitem><para>MS Excel and PowerPoint need <command>
-catdoc</command>.</para></listitem>
+<listitem><para>MS Excel and PowerPoint are processed by
+internal <command>Python</command> handlers.</para></listitem>
 
 <listitem><para>MS Open XML (docx) needs <command>
 xsltproc</command>.</para></listitem>
@@ -4451,15 +4456,10 @@ except:
 <command>djvused</command> from the
 <application>DjVuLibre</application> package.</para></listitem>
 
-<listitem><para>Audio files: &RCL; releases before 1.13
-used the <command>id3info</command> command from the <application>
-id3lib</application> package to extract mp3 tag information,
-<command>metaflac</command> (standard flac tools) for flac files,
-and <command>ogginfo</command> (vorbis tools) for ogg
-files. Releases 1.14 and later use a single
-<application>Python</application> filter based
-on <application>mutagen</application> for all audio file
-types.</para>
+<listitem><para>Audio files: &RCL; releases 1.14 and later use
+a single <application>Python</application> handler based
+on <application>mutagen</application> for all audio file
+types.</para>
 </listitem>
 
 <listitem><para>Pictures: &RCL; uses the
@@ -4471,7 +4471,7 @@ except:
 store personal tags or textual descriptions inside the image
 files.</para></listitem>
 
-<listitem><para>chm: files in microsoft help format need Python and
+<listitem><para>chm: files in Microsoft help format need Python and
 the <application>pychm</application> module (which needs
 <application>chmlib</application>).</para></listitem>
 
@@ -4498,15 +4498,15 @@ except:
 <listitem><para>Konqueror webarchive format with Python (uses the
 Tarfile module).</para></listitem>
 
-<listitem><para>mimehtml web archive format (support based on the email
-filter, which introduces some mild weirdness, but still
-usable).</para></listitem>
+<listitem><para>Mimehtml web archive format (support based on
+the email handler, which introduces some mild weirdness, but
+still usable).</para></listitem>
 
 </itemizedlist>
 
 <para>Text, HTML, email folders, and Scribus files are
 processed internally. <application>Lyx</application> is used to
-index Lyx files. Many filters need <command>iconv</command> and the
+index Lyx files. Many handlers need <command>iconv</command> and the
 standard <command>sed</command> and <command>awk</command>.
 </para>
 
@@ -4994,10 +4994,10 @@ skippedPaths = ~/somedir/*.txt
 <listitem><para>A space-separated list of patterns for
 names of files or directories that should be ignored
 inside zip archives. This is used directly by the zip
-filter, and has a function similar to skippedNames, but
+handler, and has a function similar to skippedNames, but
 works independantly. Can be redefined for filesystem
 subdirectories. For versions up to 1.19, you will need
-to update the Zip filter and install a supplementary
+to update the Zip handler and install a supplementary
 Python module. The details are
 described <ulink url="https://bitbucket.org/medoc/recoll/wiki/Filtering%20out%20Zip%20archive%20members">on
 the &RCL; wiki</ulink>.
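The zip pattern list described in this hunk can be illustrated with a short Python sketch that reports which members of an archive such patterns would hide. It is not the zip handler's code: the function name `skipped_members`, the use of `fnmatch.fnmatchcase` for shell-style globs, and the rule that every path component is tested (so a directory pattern also hides its contents) are assumptions made for the example.

```python
import fnmatch
import io
import zipfile


def skipped_members(archive, patterns):
    """Return names of zip members that match any of the given patterns.

    `patterns` is a space-separated string of shell-style globs. Testing
    every path component (so that a directory pattern also hides what it
    contains) is an assumption of this sketch.
    """
    globs = patterns.split()
    skipped = []
    for name in archive.namelist():
        parts = [p for p in name.split("/") if p]
        if any(fnmatch.fnmatchcase(part, g) for part in parts for g in globs):
            skipped.append(name)
    return skipped


# Build a small in-memory archive to try the function on.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("docs/readme.txt", "hello")
    zf.writestr("docs/build.log", "noise")
    zf.writestr("src/.git/config", "noise")

with zipfile.ZipFile(buf) as zf:
    print(skipped_members(zf, "*.log .git"))  # ['docs/build.log', 'src/.git/config']
```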
@@ -5552,13 +5552,13 @@ mondelaypatterns = *.log:20 "this one has spaces*:10"
 </varlistentry>
 
 <varlistentry><term><varname>filtermaxseconds</varname></term>
-<listitem><para>Maximum filter execution time, after which it
+<listitem><para>Maximum handler execution time, after which it
 is aborted. Some postscript programs just loop...</para>
 </listitem>
 </varlistentry>
 <varlistentry><term><varname>filtersdir</varname></term>
 <listitem><para>A directory to search for the external
-filter scripts used to index some types of files. The
+input handler scripts used to index some types of files. The
 value should not be changed, except if you want to modify
 one of the default scripts. The value can be redefined for
 any sub-directory. </para>
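The `filtermaxseconds` idea, aborting a handler that runs too long, can be sketched with Python's `subprocess.run()` and its `timeout` argument. This only illustrates the concept: `run_handler` is a hypothetical name, and the indexer's actual mechanism is not shown in this diff.

```python
import subprocess
import sys


def run_handler(argv, max_seconds):
    """Run an external handler; return its stdout, or None if it overran.

    `max_seconds` stands in for the filtermaxseconds setting.
    """
    try:
        result = subprocess.run(argv, capture_output=True, timeout=max_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed the child at this point.
        return None
    return result.stdout


# A handler that finishes in time returns its output...
print(run_handler([sys.executable, "-c", "print('ok')"], 5))
# ...while one that never finishes is aborted after the limit.
print(run_handler([sys.executable, "-c", "while True: pass"], 1))
```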
@@ -5678,9 +5678,9 @@ mondelaypatterns = *.log:20 "this one has spaces*:10"
 </listitem>
 </varlistentry>
 <varlistentry>
-<term>filter-specific sections</term>
-<listitem><para>Some filters may need specific
-configuration for handling fields. Only the email message filter
+<term>handler-specific sections</term>
+<listitem><para>Some input handlers may need specific
+configuration for handling fields. Only the email message handler
 currently has such a section (named
 <literal>[mail]</literal>). It allows indexing arbitrary email
 headers in addition to the ones indexed by default. Other such
@@ -5694,7 +5694,7 @@ mondelaypatterns = *.log:20 "this one has spaces*:10"
 <filename>fields</filename>
 file. This would extract a specific email header and
 use it as a searchable field, with data displayable inside result
-lists. (Side note: as the email filter does no decoding on the values,
+lists. (Side note: as the email handler does no decoding on the values,
 only plain ascii headers can be indexed, and only the
 first occurrence will be used for headers that occur several times).
 
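The two limitations in the side note (header values are not decoded, and only the first occurrence of a repeated header is kept) can be reproduced with Python's standard `email` package. `header_field` and the sample message are hypothetical; this is not the email handler's code and it does not show the `[mail]` configuration syntax.

```python
from email import message_from_string

RAW = (
    "From: alice@example.com\n"
    "X-Project: first value\n"
    "X-Project: second value\n"
    "Subject: =?utf-8?q?caf=C3=A9?=\n"
    "\n"
    "body\n"
)


def header_field(raw_message, name):
    """Return the first occurrence of header `name`, undecoded, or None."""
    msg = message_from_string(raw_message)
    # get_all() returns every occurrence in order; keep only the first.
    values = msg.get_all(name)
    return values[0] if values else None


print(header_field(RAW, "X-Project"))  # first value
print(header_field(RAW, "Subject"))    # =?utf-8?q?caf=C3=A9?=
```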
@@ -6007,7 +6007,7 @@ application/x-blobapp = exec rclblob
 </listitem>
 </itemizedlist>
 
-<para>The <replaceable>rclblob</replaceable> filter should
+<para>The <replaceable>rclblob</replaceable> handler should
 be an executable program or script which exists inside
 <filename>/usr/[local/]share/recoll/filters</filename>. It
 will be given a file name as argument and should output the
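A minimal sketch of what an `rclblob` executable might look like, written in Python. The diff cuts off the description of what the handler should output, so this sketch assumes an HTML document printed on standard output. The function names and the plain-text treatment of the blob data are hypothetical.

```python
#!/usr/bin/env python3
# Hypothetical sketch of an "rclblob" exec input handler. It assumes the
# handler must print an HTML document for the file named on its command
# line; a real handler would parse the application/x-blobapp format.
import html
import sys


def text_to_html(text):
    """Wrap plain text in a minimal UTF-8 HTML document."""
    return (
        "<html><head>"
        '<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">'
        "</head><body><pre>"
        + html.escape(text)
        + "</pre></body></html>\n"
    )


def main(path):
    # Read raw bytes; undecodable bytes are replaced rather than fatal.
    with open(path, "rb") as f:
        text = f.read().decode("utf-8", errors="replace")
    sys.stdout.write(text_to_html(text))
    return 0


if __name__ == "__main__" and len(sys.argv) == 2:
    # The handler is given the file name as its single argument.
    sys.exit(main(sys.argv[1]))
```

Saved as `rclblob` inside the filters directory and made executable, it could be tried by hand with `./rclblob some-file`.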
@@ -6015,7 +6015,7 @@ application/x-blobapp = exec rclblob
 
 <para>The <link linkend="RCL.PROGRAM.FILTERS">filter
 programming</link> section describes in more detail how
-to write a filter.</para>
+to write an input handler.</para>
 
 </sect3>
 
@@ -1,60 +0,0 @@
-#!/bin/sh
-
-# A script to produce the Recoll manual with an xml toolchain.
-# Tools used:
-# - xsltproc
-# - The docbook-xsl styleets
-# - dblatex for producing the PDF.
-#
-# Limitations:
-# - Does not produce the links to the whole/chunked versions at the top
-# of the document
-# - The anchor names from the source text are converted to uppercase
-# by the sgml toolchain. This does not happen with the xml
-# toolchain, which means that external links like
-# usermanual.html#RCL.CONFIG.INDEXING won't work because fragments
-# are case-sensitive. This has been solved by converting all ids
-# inside the source file to upper-case. DON'T REINTRODUCE
-# lower-case IDS
-
-# Wherever docbook.xsl and chunk.xsl live
-# Fbsd
-#XSLDIR="/usr/local/share/xsl/docbook/"
-# Mac
-#XSLDIR="/opt/local/share/xsl/docbook-xsl/"
-#Linux
-XSLDIR="/usr/share/xml/docbook/stylesheet/docbook-xsl/"
-
-dochunky=1
-test $# -eq 1 && dochunky=0
-
-# Options common to the single-file and chunked versions
-commonoptions="--stringparam section.autolabel 1 \
---stringparam section.autolabel.max.depth 3 \
---stringparam section.label.includes.component.label 1 \
---stringparam autotoc.label.in.hyperlink 0 \
---stringparam abstract.notitle.enabled 1 \
---stringparam html.stylesheet docbook-xsl.css \
---stringparam generate.toc \"book toc,title,figure,table,example,equation\" \
-"
-
-# Do the chunky thing
-if test $dochunky -ne 0 ; then
-eval xsltproc $commonoptions \
---stringparam use.id.as.filename 1 \
---stringparam root.filename index \
-"$XSLDIR/html/chunk.xsl" \
-usermanual.xml
-fi
-
-# Produce the single file version
-eval xsltproc $commonoptions \
--o usermanual.html \
-"$XSLDIR/html/docbook.xsl" \
-usermanual.xml
-
-tidy -indent usermanual.html > tmpfile
-mv -f tmpfile usermanual.html
-
-# And the pdf with dblatex
-dblatex usermanual.xml
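The deleted script's note about case-sensitive fragments suggests a simple safeguard. The Python sketch below (a hypothetical helper, not part of this commit) reports `id` attributes that contain lower-case letters. It uses a regular expression instead of an XML parser, so unusual attribute layouts, such as an `id` split across lines, would be missed.

```python
import re

# id="..." or id='...' (also matches xml:id=...); group 2 is the value.
ID_ATTR = re.compile(r"""\bid\s*=\s*(["'])(.*?)\1""")


def lowercase_ids(xml_text):
    """Return (line number, id) pairs for ids that contain lower-case letters."""
    found = []
    for lineno, line in enumerate(xml_text.splitlines(), start=1):
        for match in ID_ATTR.finditer(line):
            value = match.group(2)
            if value != value.upper():
                found.append((lineno, value))
    return found


sample = '<sect1 id="RCL.CONFIG.INDEXING">\n<sect2 id="rcl.config.misc">\n'
print(lowercase_ids(sample))  # [(2, 'rcl.config.misc')]
```

Running `lowercase_ids(open("usermanual.xml", encoding="utf-8").read())` from the manual's directory would list any offending ids with their line numbers.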