Jean-Francois Dockes 2020-03-03 18:53:31 +01:00
parent ad466ee42d
commit 8c816f50cf
8 changed files with 205 additions and 861 deletions

@ -3446,43 +3446,48 @@ fs.inotify.max_user_watches=32768
WEB history.</p>
<p>Here follows an example:</p>
<pre class="programlisting">
&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;fragbuts version="1.0"&gt;
&lt;fragbuts version="1.0"&gt;
&lt;radiobuttons&gt;
&lt;!-- Actually useful: toggle WEB queue results inclusion --&gt;
&lt;fragbut&gt;
&lt;label&gt;Include Web Results&lt;/label&gt;
&lt;frag&gt;&lt;/frag&gt;
&lt;/fragbut&gt;
&lt;radiobuttons&gt;
&lt;fragbut&gt;
&lt;label&gt;Exclude Web Results&lt;/label&gt;
&lt;frag&gt;-rclbes:BGL&lt;/frag&gt;
&lt;/fragbut&gt;
&lt;fragbut&gt;
&lt;label&gt;Include Web Results&lt;/label&gt;
&lt;frag&gt;&lt;/frag&gt;
&lt;/fragbut&gt;
&lt;fragbut&gt;
&lt;label&gt;Only Web Results&lt;/label&gt;
&lt;frag&gt;rclbes:BGL&lt;/frag&gt;
&lt;/fragbut&gt;
&lt;fragbut&gt;
&lt;label&gt;Exclude Web Results&lt;/label&gt;
&lt;frag&gt;-rclbes:BGL&lt;/frag&gt;
&lt;/fragbut&gt;
&lt;/radiobuttons&gt;
&lt;fragbut&gt;
&lt;label&gt;Only Web Results&lt;/label&gt;
&lt;frag&gt;rclbes:BGL&lt;/frag&gt;
&lt;/fragbut&gt;
&lt;buttons&gt;
&lt;/radiobuttons&gt;
&lt;fragbut&gt;
&lt;label&gt;Example: Year 2010&lt;/label&gt;
&lt;frag&gt;date:2010-01-01/2010-12-31&lt;/frag&gt;
&lt;/fragbut&gt;
&lt;buttons&gt;
&lt;fragbut&gt;
&lt;label&gt;Example: c++ files&lt;/label&gt;
&lt;frag&gt;ext:cpp OR ext:cxx&lt;/frag&gt;
&lt;/fragbut&gt;
&lt;fragbut&gt;
&lt;label&gt;Year 2010&lt;/label&gt;
&lt;frag&gt;date:2010-01-01/2010-12-31&lt;/frag&gt;
&lt;/fragbut&gt;
&lt;fragbut&gt;
&lt;label&gt;Example: My Great Directory&lt;/label&gt;
&lt;frag&gt;dir:/my/great/directory&lt;/frag&gt;
&lt;/fragbut&gt;
&lt;fragbut&gt;
&lt;label&gt;My Great Directory Only&lt;/label&gt;
&lt;frag&gt;dir:/my/great/directory&lt;/frag&gt;
&lt;/fragbut&gt;
&lt;/buttons&gt;
&lt;/buttons&gt;
&lt;/fragbuts&gt;
&lt;/fragbuts&gt;
</pre>
<p>Each <code class="literal">radiobuttons</code> or
<code class="literal">buttons</code> section defines a
@ -3781,6 +3786,20 @@ fs.inotify.max_user_watches=32768
your NLS environment. Weird things will probably
happen if languages are mixed up.</p>
</dd>
<dt><span class="term">Show index
statistics</span></dt>
<dd>
<p>This will print a long list of boring numbers
about the index.</p>
</dd>
<dt><span class="term">List files which could not be
indexed</span></dt>
<dd>
<p>This will show the files which caused errors,
usually because <span class=
"command"><strong>recollindex</strong></span> could
not translate their format into text.</p>
</dd>
</dl>
</div>
<p>Note that in cases where <span class=
@ -3862,14 +3881,9 @@ fs.inotify.max_user_watches=32768
<code class="envar">RECOLL_ACTIVE_EXTRA_DBS</code>, you
can add and activate the index for the mounted volume
when starting <span class=
"command"><strong>recoll</strong></span>.</p>
<p><code class="envar">RECOLL_ACTIVE_EXTRA_DBS</code> is
available for <span class="application">Recoll</span>
versions 1.17.2 and later. A change was made in the same
update so that <span class=
"command"><strong>recoll</strong></span> will
automatically deactivate unreachable indexes when
starting up.</p>
"command"><strong>recoll</strong></span>. Unreachable
indexes will automatically be deactivated when starting
up.</p>
</div>
<div class="sect2">
<div class="titlepage">
@ -5579,8 +5593,9 @@ text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r
</div>
</div>
</div>
<p><b>Term synonyms:&nbsp;</b>there are a number of ways to
use term synonyms for searching text:</p>
<p><b>Term synonyms and text search:&nbsp;</b>in general,
there are two main ways to use term synonyms for searching
text:</p>
<div class="itemizedlist">
<ul class="itemizedlist" style="list-style-type: disc;">
<li class="listitem">
@ -5829,15 +5844,25 @@ text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r
is minimal. However there are a few tools available:</p>
<div class="itemizedlist">
<ul class="itemizedlist" style="list-style-type: disc;">
<li class="listitem">
<p>Users of recent Ubuntu-derived distributions, or
any other Gnome desktop system (e.g. Fedora) can
install the <a class="ulink" href=
"https://www.lesbonscomptes.com/recoll/download.html#gssp"
target="_top">Recoll GSSP</a> (Gnome Shell Search
Provider).</p>
</li>
<li class="listitem">
<p>The <span class="application">KDE</span> KIO Slave
was described in a <a class="link" href=
"#RCL.SEARCH.KIO" title=
"3.3.&nbsp;Searching with the KDE KIO slave">previous
section</a>.</p>
section</a>. It can provide search results inside
<span class=
"command"><strong>Dolphin</strong></span>.</p>
</li>
<li class="listitem">
<p>If you use a recent version of Ubuntu Linux, you
<p>If you use an older version of Ubuntu Linux, you
may find the <a class="ulink" href=
"https://www.lesbonscomptes.com/recoll/faqsandhowtos/UnityLens"
target="_top">Ubuntu Unity Lens</a> module
@ -5975,8 +6000,8 @@ text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r
C++ and live inside <span class=
"command"><strong>recollindex</strong></span>. This latter
kind will not be described here.</p>
<p>There are currently (since version 1.13) two kinds of
external executable input handlers:</p>
<p>There are two kinds of external executable input
handlers:</p>
<div class="itemizedlist">
<ul class="itemizedlist" style="list-style-type: disc;">
<li class="listitem">
@ -6180,10 +6205,11 @@ text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r
have a look. Before the C++ import, the xsl-based
handlers used a common module <code class=
"filename">rclgenxslt.py</code>; it is still around
but unused. The handler for OpenXML presentations
is still the Python version because the format did
not fit with what the C++ code does. It would be a
good base for another similar issue.</p>
but unused at the moment. The handler for OpenXML
presentations is still the Python version because
the format did not fit with what the C++ code does.
It would be a good base for another similar
issue.</p>
</li>
</ul>
</div>
@ -6366,14 +6392,14 @@ text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r
minimal like the following example:</p>
<pre class="programlisting">
&lt;html&gt;
&lt;head&gt;
&lt;meta http-equiv="Content-Type" content="text/html;charset=UTF-8"&gt;
&lt;/head&gt;
&lt;body&gt;
Some text content
&lt;/body&gt;
&lt;head&gt;
&lt;meta http-equiv="Content-Type" content="text/html;charset=UTF-8"/&gt;
&lt;/head&gt;
&lt;body&gt;
Some text content
&lt;/body&gt;
&lt;/html&gt;
</pre>
</pre>
<p>You should take care to escape some characters inside
the text by transforming them into appropriate entities.
At the very minimum, "<code class="literal">&amp;</code>"
@ -6613,11 +6639,17 @@ text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r
for creating/updating an index. Bindings exist for
Python2 and Python3.</p>
<p>The search interface is used in a number of active
projects: the <span class="application">Recoll</span>
projects: the <a class="ulink" href=
"https://www.lesbonscomptes.com/recoll/download.html#gssp"
target="_top"><span class="application">Recoll</span>
<span class="application">Gnome Shell Search
Provider</span>, the <span class=
"application">Recoll</span> Web UI, and the upmpdcli UPnP
Media Server, in addition to many small scripts.</p>
Provider</span></a>, the <a class="ulink" href=
"https://opensourceprojects.eu/p/recollwebui/code/"
target="_top"><span class="application">Recoll</span> Web
UI</a>, and the <a class="ulink" href=
"https://www.lesbonscomptes.com/upmpdcli/upmpdcli-manual.html#UPRCL"
target="_top">upmpdcli UPnP Media Server</a>, in addition
to many small scripts.</p>
<p>The index update section of the API may be used to
create and update <span class="application">Recoll</span>
indexes on specific configurations (separate from the

@ -2454,46 +2454,52 @@ fs.inotify.max_user_watches=32768
contains an example which filters the results from the WEB
history.</para>
<para>Here follows an example:
<programlisting>
&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt;
<programlisting><![CDATA[
<?xml version="1.0" encoding="UTF-8"?>
<fragbuts version="1.0">
&lt;fragbuts version=&quot;1.0&quot;&gt;
<radiobuttons>
<!-- Actually useful: toggle WEB queue results inclusion -->
<fragbut>
<label>Include Web Results</label>
<frag></frag>
</fragbut>
&lt;radiobuttons&gt;
<fragbut>
<label>Exclude Web Results</label>
<frag>-rclbes:BGL</frag>
</fragbut>
&lt;fragbut&gt;
&lt;label&gt;Include Web Results&lt;/label&gt;
&lt;frag&gt;&lt;/frag&gt;
&lt;/fragbut&gt;
<fragbut>
<label>Only Web Results</label>
<frag>rclbes:BGL</frag>
</fragbut>
&lt;fragbut&gt;
&lt;label&gt;Exclude Web Results&lt;/label&gt;
&lt;frag&gt;-rclbes:BGL&lt;/frag&gt;
&lt;/fragbut&gt;
</radiobuttons>
&lt;fragbut&gt;
&lt;label&gt;Only Web Results&lt;/label&gt;
&lt;frag&gt;rclbes:BGL&lt;/frag&gt;
&lt;/fragbut&gt;
<buttons>
&lt;/radiobuttons&gt;
<fragbut>
<label>Example: Year 2010</label>
<frag>date:2010-01-01/2010-12-31</frag>
</fragbut>
&lt;buttons&gt;
<fragbut>
<label>Example: c++ files</label>
<frag>ext:cpp OR ext:cxx</frag>
</fragbut>
&lt;fragbut&gt;
&lt;label&gt;Year 2010&lt;/label&gt;
&lt;frag&gt;date:2010-01-01/2010-12-31&lt;/frag&gt;
&lt;/fragbut&gt;
<fragbut>
<label>Example: My Great Directory</label>
<frag>dir:/my/great/directory</frag>
</fragbut>
&lt;fragbut&gt;
&lt;label&gt;My Great Directory Only&lt;/label&gt;
&lt;frag&gt;dir:/my/great/directory&lt;/frag&gt;
&lt;/fragbut&gt;
</buttons>
&lt;/buttons&gt;
&lt;/fragbuts&gt;
</programlisting>
</fragbuts>
]]></programlisting>
</para>
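A quick way to check a fragment buttons file before loading it is to parse it and list what each button would add to the query. The following is only a sketch using the Python standard library: the element names come from the example above, while the function name `list_fragments` and the shortened `SAMPLE` text are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example above, used as sample input.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<fragbuts version="1.0">
  <radiobuttons>
    <fragbut>
      <label>Include Web Results</label>
      <frag></frag>
    </fragbut>
    <fragbut>
      <label>Exclude Web Results</label>
      <frag>-rclbes:BGL</frag>
    </fragbut>
  </radiobuttons>
  <buttons>
    <fragbut>
      <label>Example: c++ files</label>
      <frag>ext:cpp OR ext:cxx</frag>
    </fragbut>
  </buttons>
</fragbuts>
"""


def list_fragments(xml_text):
    """Return (section, label, fragment) tuples in document order.

    Hypothetical helper: 'section' is the tag of the enclosing
    radiobuttons or buttons element.
    """
    root = ET.fromstring(xml_text.encode("utf-8"))
    result = []
    for section in root:
        for fragbut in section.findall("fragbut"):
            # findtext returns "" for an element that is present but empty
            label = fragbut.findtext("label", default="")
            frag = fragbut.findtext("frag", default="")
            result.append((section.tag, label, frag))
    return result


for section, label, frag in list_fragments(SAMPLE):
    print(section, "|", label, "|", frag)
```

An empty fragment, as in the "Include Web Results" button, simply adds nothing to the query.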
<para>Each <literal>radiobuttons</literal> or
@ -2745,6 +2751,16 @@ fs.inotify.max_user_watches=32768
environment. Weird things will probably happen if
languages are mixed up.</para></listitem>
</varlistentry>
<varlistentry>
<term>Show index statistics</term> <listitem><para>This will
print a long list of boring numbers about the index.</para>
</listitem></varlistentry>
<varlistentry>
<term>List files which could not be indexed</term>
<listitem><para>This will show the files which caused errors,
usually because <command>recollindex</command> could not
translate their format into text.</para>
</listitem></varlistentry>
</variablelist>
<para>Note that in cases where &RCL; does not know the beginning
@ -2804,22 +2820,16 @@ fs.inotify.max_user_watches=32768
</para>
<screen>export RECOLL_EXTRA_DBS=/some/place/xapiandb:/some/other/db</screen>
<para>Another environment variable,
<envar>RECOLL_ACTIVE_EXTRA_DBS</envar> allows adding to the active
list of indexes. This variable was suggested and implemented by a
&RCL; user. It is mostly useful if you use scripts to mount
external volumes with &RCL; indexes. By using
<envar>RECOLL_EXTRA_DBS</envar> and
<envar>RECOLL_ACTIVE_EXTRA_DBS</envar>, you can add and activate
the index for the mounted volume when starting
<command>recoll</command>.
</para>
<para><envar>RECOLL_ACTIVE_EXTRA_DBS</envar> is available for
&RCL; versions 1.17.2 and later. A change was made in the same
update so that <command>recoll</command> will
automatically deactivate unreachable indexes when starting
up.</para>
<para>Another environment
variable, <envar>RECOLL_ACTIVE_EXTRA_DBS</envar> allows adding to
the active list of indexes. This variable was suggested and
implemented by a &RCL; user. It is mostly useful if you use scripts
to mount external volumes with &RCL; indexes. By
using <envar>RECOLL_EXTRA_DBS</envar>
and <envar>RECOLL_ACTIVE_EXTRA_DBS</envar>, you can add and
activate the index for the mounted volume when
starting <command>recoll</command>. Unreachable indexes will
automatically be deactivated when starting up.</para>
</sect2>
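A mount script could prepare both variables before starting the GUI. The sketch below only builds the environment and does not start anything. The helper name `recoll_env` is hypothetical, and joining the RECOLL_ACTIVE_EXTRA_DBS paths with ':' is an assumption carried over from the RECOLL_EXTRA_DBS example above.

```python
import os


def recoll_env(extra_dbs, active_dbs, base_env=None):
    """Return a copy of the environment with both variables set.

    Hypothetical helper. Paths are joined with ':' as in the
    RECOLL_EXTRA_DBS example; using the same separator for
    RECOLL_ACTIVE_EXTRA_DBS is an assumption of this sketch.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["RECOLL_EXTRA_DBS"] = ":".join(extra_dbs)
    env["RECOLL_ACTIVE_EXTRA_DBS"] = ":".join(active_dbs)
    return env


env = recoll_env(
    ["/some/place/xapiandb", "/some/other/db"],  # known external indexes
    ["/some/other/db"],                          # index to activate
    base_env={},
)
print(env["RECOLL_EXTRA_DBS"])
print(env["RECOLL_ACTIVE_EXTRA_DBS"])
```

The resulting dictionary could then be passed as the `env` argument of `subprocess.Popen(["recoll"], env=env)`; in that case leave `base_env` unset so that the rest of the current environment is inherited.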
@ -4261,8 +4271,9 @@ text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r
<sect1 id="RCL.SEARCH.SYNONYMS">
<title>Using Synonyms (1.22)</title>
<formalpara><title>Term synonyms:</title>
<para>there are a number of ways to use term synonyms for searching text:
<formalpara><title>Term synonyms and text search:</title> <para>in
general, there are two main ways to use term synonyms for
searching text:
<itemizedlist>
<listitem><para>At index creation time, they can be used to alter the
indexed terms, either increasing or decreasing their number, by
@ -4478,11 +4489,20 @@ text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r
available:
<itemizedlist>
<listitem>
<para>The <application>KDE</application> KIO Slave was
described in a <link linkend="RCL.SEARCH.KIO">previous section</link>.</para>
<para>Users of recent Ubuntu-derived distributions, or
any other Gnome desktop system (e.g. Fedora) can install the
<ulink
url="https://www.lesbonscomptes.com/recoll/download.html#gssp">
Recoll GSSP</ulink> (Gnome Shell Search Provider).</para>
</listitem>
<listitem>
<para>If you use a recent version of Ubuntu Linux, you may
<para>The <application>KDE</application> KIO Slave was described
in a <link linkend="RCL.SEARCH.KIO">previous
section</link>. It can provide search results
inside <command>Dolphin</command>. </para>
</listitem>
<listitem>
<para>If you use an older version of Ubuntu Linux, you may
find the <ulink url="&FAQS;UnityLens">Ubuntu Unity
Lens</ulink> module useful.</para>
</listitem>
@ -4583,8 +4603,7 @@ text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r
in C++ and live inside <command>recollindex</command>. This latter
kind will not be described here.</para>
<para>There are currently (since version 1.13) two kinds of
external executable input handlers:
<para>There are two kinds of external executable input handlers:
<itemizedlist>
<listitem><para>Simple <literal>exec</literal> handlers
run once and exit. They can be bare programs like
@ -4711,34 +4730,32 @@ text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r
<itemizedlist>
<listitem><para><literal>rclimg</literal> is written in Perl and
handles the execm protocol all by itself (showing how trivial it
is).</para></listitem>
<listitem><para>All the Python handlers share at least the
<filename>rclexecm.py</filename> module, which handles the
communication. Have a look at, for example,
<filename>rclzip</filename> for a handler which uses
<filename>rclexecm.py</filename> directly.</para></listitem>
<listitem><para>Most Python handlers which process
single-document files by executing another command are further
abstracted by using the <filename>rclexec1.py</filename>
module. See for example <filename>rclrtf.py</filename> for a
simple one, or <filename>rcldoc.py</filename> for a slightly more
complicated one (possibly executing several
commands).</para></listitem>
<listitem><para>Handlers which extract text from an XML document
by using an XSLT style sheet are now executed inside
<command>recollindex</command>, with only the style sheet stored
in the <filename>filters/</filename> directory. These can
use a single style sheet (e.g. <filename>abiword.xsl</filename>),
or two sheets for the data and metadata
(e.g. <filename>opendoc-body.xsl</filename> and
<filename>opendoc-meta.xsl</filename>). The
<filename>mimeconf</filename> configuration file defines how the
sheets are used, have a look. Before the C++ import, the
xsl-based handlers used a common module
<filename>rclgenxslt.py</filename>, it is still around but
unused. The handler for OpenXML presentations is still the Python
version because the format did not fit with what the C++ code
does. It would be a good base for another similar
is).</para></listitem> <listitem><para>All the Python handlers
share at least the <filename>rclexecm.py</filename> module, which
handles the communication. Have a look at, for
example, <filename>rclzip</filename> for a handler which
uses <filename>rclexecm.py</filename>
directly.</para></listitem> <listitem><para>Most Python handlers
which process single-document files by executing another command
are further abstracted by using
the <filename>rclexec1.py</filename> module. See for
example <filename>rclrtf.py</filename> for a simple one,
or <filename>rcldoc.py</filename> for a slightly more complicated
one (possibly executing several
commands).</para></listitem> <listitem><para>Handlers which
extract text from an XML document by using an XSLT style sheet
are now executed inside <command>recollindex</command>, with only
the style sheet stored in the <filename>filters/</filename>
directory. These can use a single style sheet
(e.g. <filename>abiword.xsl</filename>), or two sheets for the
data and metadata (e.g. <filename>opendoc-body.xsl</filename>
and <filename>opendoc-meta.xsl</filename>). The <filename>mimeconf</filename>
configuration file defines how the sheets are used, have a
look. Before the C++ import, the xsl-based handlers used a common
module <filename>rclgenxslt.py</filename>; it is still around but
unused at the moment. The handler for OpenXML presentations is
still the Python version because the format did not fit with what
the C++ code does. It would be a good base for another similar
issue.</para></listitem>
</itemizedlist>
</para>
@ -4878,16 +4895,16 @@ text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r
<para>For filters producing HTML, the output could be very minimal
like the following example:
<programlisting>
&lt;html>
&lt;head>
&lt;meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
&lt;/head>
&lt;body>
Some text content
&lt;/body>
&lt;/html>
</programlisting>
<programlisting><![CDATA[
<html>
<head>
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8"/>
</head>
<body>
Some text content
</body>
</html>
]]></programlisting>
</para>
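A filter written in Python could produce this minimal output along the following lines. This is a sketch, not code taken from the Recoll handlers: the function name `minimal_html` is hypothetical, and relying on the standard library's `html.escape` for the entity conversion is a choice made here for illustration.

```python
import html


def minimal_html(text):
    """Wrap plain text in the minimal HTML document shown above.

    Hypothetical helper. html.escape() converts '&', '<' and '>'
    (and, by default, quote characters) into entities.
    """
    body = html.escape(text)
    return (
        "<html>\n"
        "<head>\n"
        '<meta http-equiv="Content-Type" content="text/html;charset=UTF-8"/>\n'
        "</head>\n"
        "<body>\n"
        + body + "\n"
        "</body>\n"
        "</html>\n"
    )


print(minimal_html("Fish & chips cost < 10"))
```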
<para>You should take care to escape some
@ -5087,9 +5104,16 @@ text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r
Python2 and Python3.</para>
<para>The search interface is used in a number of active projects:
the &RCL; <application>Gnome Shell Search Provider</application>,
the &RCL; Web UI, and the upmpdcli UPnP Media Server, in addition
to many small scripts.</para>
the <ulink
url="https://www.lesbonscomptes.com/recoll/download.html#gssp">
&RCL; <application>Gnome Shell Search Provider</application></ulink>,
the <ulink url="https://opensourceprojects.eu/p/recollwebui/code/">
&RCL; Web UI</ulink>, and the
<ulink
url="https://www.lesbonscomptes.com/upmpdcli/upmpdcli-manual.html#UPRCL">
upmpdcli UPnP Media Server</ulink>, in addition
to many small scripts.</para>
<para>The index update section of the API may be used to create and
update &RCL; indexes on specific configurations (separate from the

@ -1,118 +0,0 @@
#!/usr/bin/env python3
# Copyright (C) 2014 J.F.Dockes
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the
# Free Software Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
######################################
from __future__ import print_function
import sys
import rclexecm
import rclgenxslt
stylesheet_all = '''<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:ab="http://www.abisource.com/awml.dtd"
exclude-result-prefixes="ab"
>
<xsl:output method="html" encoding="UTF-8"/>
<xsl:template match="/">
<html>
<head>
<xsl:apply-templates select="ab:abiword/ab:metadata"/>
</head>
<body>
<!-- This is for the older abiword format with no namespaces -->
<xsl:for-each select="abiword/section">
<xsl:apply-templates select="p"/>
</xsl:for-each>
<!-- Newer namespaced format -->
<xsl:for-each select="ab:abiword/ab:section">
<xsl:for-each select="ab:p">
<p><xsl:value-of select="."/></p><xsl:text>
</xsl:text>
</xsl:for-each>
</xsl:for-each>
</body>
</html>
</xsl:template>
<xsl:template match="p">
<p><xsl:value-of select="."/></p><xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="ab:metadata">
<xsl:for-each select="ab:m">
<xsl:choose>
<xsl:when test="@key = 'dc.creator'">
<meta>
<xsl:attribute name="name">author</xsl:attribute>
<xsl:attribute name="content">
<xsl:value-of select="."/>
</xsl:attribute>
</meta><xsl:text>
</xsl:text>
</xsl:when>
<xsl:when test="@key = 'abiword.keywords'">
<meta>
<xsl:attribute name="name">keywords</xsl:attribute>
<xsl:attribute name="content">
<xsl:value-of select="."/>
</xsl:attribute>
</meta><xsl:text>
</xsl:text>
</xsl:when>
<xsl:when test="@key = 'dc.subject'">
<meta>
<xsl:attribute name="name">keywords</xsl:attribute>
<xsl:attribute name="content">
<xsl:value-of select="."/>
</xsl:attribute>
</meta><xsl:text>
</xsl:text>
</xsl:when>
<xsl:when test="@key = 'dc.description'">
<meta>
<xsl:attribute name="name">abstract</xsl:attribute>
<xsl:attribute name="content">
<xsl:value-of select="."/>
</xsl:attribute>
</meta><xsl:text>
</xsl:text>
</xsl:when>
<xsl:when test="@key = 'dc.title'">
<title><xsl:value-of select="."/></title><xsl:text>
</xsl:text>
</xsl:when>
<xsl:otherwise>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
'''
if __name__ == '__main__':
proto = rclexecm.RclExecM()
extract = rclgenxslt.XSLTExtractor(proto, stylesheet_all)
rclexecm.main(proto, extract)

@ -1,112 +0,0 @@
#!/usr/bin/env python3
# Copyright (C) 2014 J.F.Dockes
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the
# Free Software Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
######################################
from __future__ import print_function
import sys
import rclexecm
import rclgenxslt
stylesheet_all = '''<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
xmlns:ooo="http://openoffice.org/2004/office"
xmlns:gnm="http://www.gnumeric.org/v10.dtd"
exclude-result-prefixes="office xlink meta ooo dc"
>
<xsl:output method="html" encoding="UTF-8"/>
<xsl:template match="/">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<xsl:apply-templates select="//office:document-meta/office:meta"/>
</head>
<body>
<xsl:apply-templates select="//gnm:Cells"/>
<xsl:apply-templates select="//gnm:Objects"/>
</body>
</html>
</xsl:template>
<xsl:template match="//dc:date">
<meta>
<xsl:attribute name="name">date</xsl:attribute>
<xsl:attribute name="content"><xsl:value-of select="."/></xsl:attribute>
</meta>
</xsl:template>
<xsl:template match="//dc:description">
<meta>
<xsl:attribute name="name">abstract</xsl:attribute>
<xsl:attribute name="content"><xsl:value-of select="."/></xsl:attribute>
</meta>
</xsl:template>
<xsl:template match="//meta:keyword">
<meta>
<xsl:attribute name="name">keywords</xsl:attribute>
<xsl:attribute name="content"><xsl:value-of select="."/></xsl:attribute>
</meta>
</xsl:template>
<xsl:template match="//dc:subject">
<meta>
<xsl:attribute name="name">keywords</xsl:attribute>
<xsl:attribute name="content"><xsl:value-of select="."/></xsl:attribute>
</meta>
</xsl:template>
<xsl:template match="//dc:title">
<title> <xsl:value-of select="."/> </title>
</xsl:template>
<xsl:template match="//meta:initial-creator">
<meta>
<xsl:attribute name="name">author</xsl:attribute>
<xsl:attribute name="content"><xsl:value-of select="."/></xsl:attribute>
</meta>
</xsl:template>
<xsl:template match="office:meta/*"/>
<xsl:template match="gnm:Cell">
<p><xsl:value-of select="."/></p>
</xsl:template>
<xsl:template match="gnm:CellComment">
<blockquote><xsl:value-of select="@Text"/></blockquote>
</xsl:template>
</xsl:stylesheet>
'''
if __name__ == '__main__':
proto = rclexecm.RclExecM()
extract = rclgenxslt.XSLTExtractor(proto, stylesheet_all, gzip=True)
rclexecm.main(proto, extract)

@ -1,70 +0,0 @@
#!/usr/bin/env python3
# Copyright (C) 2014 J.F.Dockes
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the
# Free Software Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
######################################
from __future__ import print_function
import sys
import rclexecm
import rclgenxslt
stylesheet_all = '''<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" encoding="UTF-8"/>
<xsl:strip-space elements="*" />
<xsl:template match="/">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>
Okular notes about: <xsl:value-of select="/documentInfo/@url" />
</title>
</head>
<body>
<xsl:apply-templates />
</body>
</html>
</xsl:template>
<xsl:template match="node()">
<xsl:apply-templates select="@* | node() "/>
</xsl:template>
<xsl:template match="text()">
<p><xsl:value-of select="."/></p>
<xsl:text >
</xsl:text>
</xsl:template>
<xsl:template match="@contents|@author">
<p><xsl:value-of select="." /></p>
<xsl:text >
</xsl:text>
</xsl:template>
<xsl:template match="@*"/>
</xsl:stylesheet>
'''
if __name__ == '__main__':
proto = rclexecm.RclExecM()
extract = rclgenxslt.XSLTExtractor(proto, stylesheet_all)
rclexecm.main(proto, extract)

@ -1,137 +0,0 @@
#!/usr/bin/env python3
# Copyright (C) 2014-2018 J.F.Dockes
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the
# Free Software Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
######################################
import sys
import rclexecm
import rclgenxslt
stylesheet = '''<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
xmlns:ooo="http://openoffice.org/2004/office"
xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
exclude-result-prefixes="office xlink meta ooo dc text"
>
<xsl:output method="html" encoding="UTF-8"/>
<xsl:template match="/">
<html>
<head>
<xsl:apply-templates select="/office:document/office:meta" />
</head>
<body>
<xsl:apply-templates select="/office:document/office:body" />
</body></html>
</xsl:template>
<xsl:template match="/office:document/office:meta">
<xsl:apply-templates select="dc:title"/>
<xsl:apply-templates select="dc:description"/>
<xsl:apply-templates select="dc:subject"/>
<xsl:apply-templates select="meta:keyword"/>
<xsl:apply-templates select="dc:creator"/>
</xsl:template>
<xsl:template match="/office:document/office:body">
<xsl:apply-templates select=".//text:p" />
<xsl:apply-templates select=".//text:h" />
<xsl:apply-templates select=".//text:s" />
<xsl:apply-templates select=".//text:line-break" />
<xsl:apply-templates select=".//text:tab" />
</xsl:template>
<xsl:template match="dc:title">
<title> <xsl:value-of select="."/> </title><xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="dc:description">
<meta>
<xsl:attribute name="name">abstract</xsl:attribute>
<xsl:attribute name="content">
<xsl:value-of select="."/>
</xsl:attribute>
</meta><xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="dc:subject">
<meta>
<xsl:attribute name="name">keywords</xsl:attribute>
<xsl:attribute name="content">
<xsl:value-of select="."/>
</xsl:attribute>
</meta><xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="dc:creator">
<meta>
<xsl:attribute name="name">author</xsl:attribute>
<xsl:attribute name="content">
<xsl:value-of select="."/>
</xsl:attribute>
</meta><xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="meta:keyword">
<meta>
<xsl:attribute name="name">keywords</xsl:attribute>
<xsl:attribute name="content">
<xsl:value-of select="."/>
</xsl:attribute>
</meta><xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="office:body//text:p">
<p><xsl:apply-templates/></p><xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="office:body//text:h">
<p><xsl:apply-templates/></p><xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="office:body//text:s">
<xsl:text> </xsl:text>
</xsl:template>
<xsl:template match="office:body//text:line-break">
<br />
</xsl:template>
<xsl:template match="office:body//text:tab">
<xsl:text> </xsl:text>
</xsl:template>
</xsl:stylesheet>
'''
if __name__ == '__main__':
proto = rclexecm.RclExecM()
extract = rclgenxslt.XSLTExtractor(proto, stylesheet)
rclexecm.main(proto, extract)

@ -1,170 +0,0 @@
#!/usr/bin/env python3
# Copyright (C) 2014 J.F.Dockes
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the
# Free Software Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
######################################
from __future__ import print_function
import sys
import rclexecm
import rclxslt
from rclbasehandler import RclBaseHandler
from zipfile import ZipFile
stylesheet_meta = '''<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
xmlns:ooo="http://openoffice.org/2004/office"
exclude-result-prefixes="office xlink meta ooo dc"
>
<xsl:output method="html" encoding="UTF-8"/>
<xsl:template match="/office:document-meta">
<xsl:apply-templates select="office:meta/dc:description"/>
<xsl:apply-templates select="office:meta/dc:subject"/>
<xsl:apply-templates select="office:meta/dc:title"/>
<xsl:apply-templates select="office:meta/meta:keyword"/>
<xsl:apply-templates select="office:meta/dc:creator"/>
</xsl:template>
<xsl:template match="dc:title">
<title> <xsl:value-of select="."/> </title><xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="dc:description">
<meta>
<xsl:attribute name="name">abstract</xsl:attribute>
<xsl:attribute name="content">
<xsl:value-of select="."/>
</xsl:attribute>
</meta><xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="dc:subject">
<meta>
<xsl:attribute name="name">keywords</xsl:attribute>
<xsl:attribute name="content">
<xsl:value-of select="."/>
</xsl:attribute>
</meta><xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="dc:creator">
<meta>
<xsl:attribute name="name">author</xsl:attribute>
<xsl:attribute name="content">
<xsl:value-of select="."/>
</xsl:attribute>
</meta><xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="meta:keyword">
<meta>
<xsl:attribute name="name">keywords</xsl:attribute>
<xsl:attribute name="content">
<xsl:value-of select="."/>
</xsl:attribute>
</meta><xsl:text>
</xsl:text>
</xsl:template>
</xsl:stylesheet>
'''
stylesheet_content = '''<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
exclude-result-prefixes="text"
>
<xsl:output method="html" encoding="UTF-8"/>
<xsl:template match="text:p">
<p><xsl:apply-templates/></p><xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="text:h">
<p><xsl:apply-templates/></p><xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="text:s">
<xsl:text> </xsl:text>
</xsl:template>
<xsl:template match="text:line-break">
<br />
</xsl:template>
<xsl:template match="text:tab">
<xsl:text> </xsl:text>
</xsl:template>
</xsl:stylesheet>
'''
class OOExtractor(RclBaseHandler):
    def __init__(self, em):
        super(OOExtractor, self).__init__(em)

    def html_text(self, fn):
        # Open the OpenDocument archive; both handles are closed on exit.
        with open(fn, 'rb') as f, ZipFile(f) as zf:
            # Note the trailing space separating the two meta attributes.
            docdata = b'<html>\n<head>\n<meta http-equiv="Content-Type" ' \
                      b'content="text/html; charset=UTF-8">'
            # Wrap metadata extraction because it can sometimes throw
            # while the main text will be valid
            try:
                metadata = zf.read("meta.xml")
                if metadata:
                    res = rclxslt.apply_sheet_data(stylesheet_meta, metadata)
                    docdata += res
            except Exception:
                # To be checked. I'm under the impression that I get this when
                # nothing matches?
                #self.em.rclog("No/bad metadata in %s" % fn)
                pass
            docdata += b'</head>\n<body>\n'
            content = zf.read("content.xml")
            if content:
                res = rclxslt.apply_sheet_data(stylesheet_content, content)
                docdata += res
            docdata += b'</body></html>'
        return docdata
if __name__ == '__main__':
    proto = rclexecm.RclExecM()
    extract = OOExtractor(proto)
    rclexecm.main(proto, extract)

View File

@ -1,105 +0,0 @@
#!/usr/bin/env python3
# Copyright (C) 2014 J.F.Dockes
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the
# Free Software Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
######################################
from __future__ import print_function
import sys
import rclexecm
import rclgenxslt
stylesheet_all = '''<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns:dc="http://purl.org/dc/elements/1.1/"
exclude-result-prefixes="svg dc"
>
<xsl:output method="html" encoding="UTF-8"/>
<xsl:template match="/">
<html>
<head>
<xsl:apply-templates select="svg:svg/svg:title"/>
<xsl:apply-templates select="svg:svg/svg:desc"/>
<xsl:apply-templates select="svg:svg/svg:metadata/descendant::dc:creator"/>
<xsl:apply-templates select="svg:svg/svg:metadata/descendant::dc:subject"/>
<xsl:apply-templates select="svg:svg/svg:metadata/descendant::dc:description"/>
</head>
<body>
<xsl:apply-templates select="//svg:text"/>
</body>
</html>
</xsl:template>
<xsl:template match="svg:desc">
<meta>
<xsl:attribute name="name">keywords</xsl:attribute>
<xsl:attribute name="content">
<xsl:value-of select="."/>
</xsl:attribute>
</meta><xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="dc:creator">
<meta>
<xsl:attribute name="name">author</xsl:attribute>
<xsl:attribute name="content">
<xsl:value-of select="."/>
</xsl:attribute>
</meta><xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="dc:subject">
<meta>
<xsl:attribute name="name">keywords</xsl:attribute>
<xsl:attribute name="content">
<xsl:value-of select="."/>
</xsl:attribute>
</meta><xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="dc:description">
<meta>
<xsl:attribute name="name">description</xsl:attribute>
<xsl:attribute name="content">
<xsl:value-of select="."/>
</xsl:attribute>
</meta><xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="svg:title">
<title><xsl:value-of select="."/></title><xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="svg:text">
<p><xsl:value-of select="."/></p><xsl:text>
</xsl:text>
</xsl:template>
</xsl:stylesheet>
'''
if __name__ == '__main__':
    proto = rclexecm.RclExecM()
    extract = rclgenxslt.XSLTExtractor(proto, stylesheet_all)
    rclexecm.main(proto, extract)