Jean-Francois Dockes 2020-03-03 18:53:31 +01:00
parent ad466ee42d
commit 8c816f50cf
8 changed files with 205 additions and 861 deletions


@ -3446,43 +3446,48 @@ fs.inotify.max_user_watches=32768
WEB history.</p> WEB history.</p>
<p>Here follows an example:</p> <p>Here follows an example:</p>
<pre class="programlisting"> <pre class="programlisting">
&lt;?xml version="1.0" encoding="UTF-8"?&gt; &lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;fragbuts version="1.0"&gt;
&lt;fragbuts version="1.0"&gt; &lt;radiobuttons&gt;
&lt;!-- Actually useful: toggle WEB queue results inclusion --&gt;
&lt;fragbut&gt;
&lt;label&gt;Include Web Results&lt;/label&gt;
&lt;frag&gt;&lt;/frag&gt;
&lt;/fragbut&gt;
&lt;radiobuttons&gt; &lt;fragbut&gt;
&lt;label&gt;Exclude Web Results&lt;/label&gt;
&lt;frag&gt;-rclbes:BGL&lt;/frag&gt;
&lt;/fragbut&gt;
&lt;fragbut&gt; &lt;fragbut&gt;
&lt;label&gt;Include Web Results&lt;/label&gt; &lt;label&gt;Only Web Results&lt;/label&gt;
&lt;frag&gt;&lt;/frag&gt; &lt;frag&gt;rclbes:BGL&lt;/frag&gt;
&lt;/fragbut&gt; &lt;/fragbut&gt;
&lt;fragbut&gt; &lt;/radiobuttons&gt;
&lt;label&gt;Exclude Web Results&lt;/label&gt;
&lt;frag&gt;-rclbes:BGL&lt;/frag&gt;
&lt;/fragbut&gt;
&lt;fragbut&gt; &lt;buttons&gt;
&lt;label&gt;Only Web Results&lt;/label&gt;
&lt;frag&gt;rclbes:BGL&lt;/frag&gt;
&lt;/fragbut&gt;
&lt;/radiobuttons&gt; &lt;fragbut&gt;
&lt;label&gt;Example: Year 2010&lt;/label&gt;
&lt;frag&gt;date:2010-01-01/2010-12-31&lt;/frag&gt;
&lt;/fragbut&gt;
&lt;buttons&gt; &lt;fragbut&gt;
&lt;label&gt;Example: c++ files&lt;/label&gt;
&lt;frag&gt;ext:cpp OR ext:cxx&lt;/frag&gt;
&lt;/fragbut&gt;
&lt;fragbut&gt; &lt;fragbut&gt;
&lt;label&gt;Year 2010&lt;/label&gt; &lt;label&gt;Example: My Great Directory&lt;/label&gt;
&lt;frag&gt;date:2010-01-01/2010-12-31&lt;/frag&gt; &lt;frag&gt;dir:/my/great/directory&lt;/frag&gt;
&lt;/fragbut&gt; &lt;/fragbut&gt;
&lt;fragbut&gt; &lt;/buttons&gt;
&lt;label&gt;My Great Directory Only&lt;/label&gt;
&lt;frag&gt;dir:/my/great/directory&lt;/frag&gt;
&lt;/fragbut&gt;
&lt;/buttons&gt; &lt;/fragbuts&gt;
&lt;/fragbuts&gt;
</pre> </pre>
<p>Each <code class="literal">radiobuttons</code> or <p>Each <code class="literal">radiobuttons</code> or
<code class="literal">buttons</code> section defines a <code class="literal">buttons</code> section defines a
@ -3781,6 +3786,20 @@ fs.inotify.max_user_watches=32768
your NLS environment. Weird things will probably your NLS environment. Weird things will probably
happen if languages are mixed up.</p> happen if languages are mixed up.</p>
</dd> </dd>
<dt><span class="term">Show index
statistics</span></dt>
<dd>
<p>This will print a long list of boring numbers
about the index</p>
</dd>
<dt><span class="term">List files which could not be
indexed</span></dt>
<dd>
<p>This will show the files which caused errors,
usually because <span class=
"command"><strong>recollindex</strong></span> could
not translate their format into text.</p>
</dd>
</dl> </dl>
</div> </div>
<p>Note that in cases where <span class= <p>Note that in cases where <span class=
@ -3862,14 +3881,9 @@ fs.inotify.max_user_watches=32768
<code class="envar">RECOLL_ACTIVE_EXTRA_DBS</code>, you <code class="envar">RECOLL_ACTIVE_EXTRA_DBS</code>, you
can add and activate the index for the mounted volume can add and activate the index for the mounted volume
when starting <span class= when starting <span class=
"command"><strong>recoll</strong></span>.</p> "command"><strong>recoll</strong></span>. Unreachable
<p><code class="envar">RECOLL_ACTIVE_EXTRA_DBS</code> is indexes will automatically be deactivated when starting
available for <span class="application">Recoll</span> up.</p>
versions 1.17.2 and later. A change was made in the same
update so that <span class=
"command"><strong>recoll</strong></span> will
automatically deactivate unreachable indexes when
starting up.</p>
</div> </div>
<div class="sect2"> <div class="sect2">
<div class="titlepage"> <div class="titlepage">
@ -5579,8 +5593,9 @@ text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r
</div> </div>
</div> </div>
</div> </div>
<p><b>Term synonyms:&nbsp;</b>there are a number of ways to <p><b>Term synonyms and text search:&nbsp;</b>in general,
use term synonyms for searching text:</p> there are two main ways to use term synonyms for searching
text:</p>
<div class="itemizedlist"> <div class="itemizedlist">
<ul class="itemizedlist" style="list-style-type: disc;"> <ul class="itemizedlist" style="list-style-type: disc;">
<li class="listitem"> <li class="listitem">
@ -5829,15 +5844,25 @@ text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r
is minimal. However there are a few tools available:</p> is minimal. However there are a few tools available:</p>
<div class="itemizedlist"> <div class="itemizedlist">
<ul class="itemizedlist" style="list-style-type: disc;"> <ul class="itemizedlist" style="list-style-type: disc;">
<li class="listitem">
<p>Users of recent Ubuntu-derived distributions, or
any other Gnome desktop systems (e.g. Fedora) can
install the <a class="ulink" href=
"https://www.lesbonscomptes.com/recoll/download.html#gssp"
target="_top">Recoll GSSP</a> (Gnome Shell Search
Provider).</p>
</li>
<li class="listitem"> <li class="listitem">
<p>The <span class="application">KDE</span> KIO Slave <p>The <span class="application">KDE</span> KIO Slave
was described in a <a class="link" href= was described in a <a class="link" href=
"#RCL.SEARCH.KIO" title= "#RCL.SEARCH.KIO" title=
"3.3.&nbsp;Searching with the KDE KIO slave">previous "3.3.&nbsp;Searching with the KDE KIO slave">previous
section</a>.</p> section</a>. It can provide search results inside
<span class=
"command"><strong>Dolphin</strong></span>.</p>
</li> </li>
<li class="listitem"> <li class="listitem">
<p>If you use a recent version of Ubuntu Linux, you <p>If you use an oldish version of Ubuntu Linux, you
may find the <a class="ulink" href= may find the <a class="ulink" href=
"https://www.lesbonscomptes.com/recoll/faqsandhowtos/UnityLens" "https://www.lesbonscomptes.com/recoll/faqsandhowtos/UnityLens"
target="_top">Ubuntu Unity Lens</a> module target="_top">Ubuntu Unity Lens</a> module
@ -5975,8 +6000,8 @@ text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r
C++ and live inside <span class= C++ and live inside <span class=
"command"><strong>recollindex</strong></span>. This latter "command"><strong>recollindex</strong></span>. This latter
kind will not be described here.</p> kind will not be described here.</p>
<p>There are currently (since version 1.13) two kinds of <p>There are two kinds of external executable input
external executable input handlers:</p> handlers:</p>
<div class="itemizedlist"> <div class="itemizedlist">
<ul class="itemizedlist" style="list-style-type: disc;"> <ul class="itemizedlist" style="list-style-type: disc;">
<li class="listitem"> <li class="listitem">
@ -6180,10 +6205,11 @@ text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r
have a look. Before the C++ import, the xsl-based have a look. Before the C++ import, the xsl-based
handlers used a common module <code class= handlers used a common module <code class=
"filename">rclgenxslt.py</code>, it is still around "filename">rclgenxslt.py</code>, it is still around
but unused. The handler for OpenXML presentations but unused at the moment. The handler for OpenXML
is still the Python version because the format did presentations is still the Python version because
not fit with what the C++ code does. It would be a the format did not fit with what the C++ code does.
good base for another similar issue.</p> It would be a good base for another similar
issue.</p>
</li> </li>
</ul> </ul>
</div> </div>
@ -6366,14 +6392,14 @@ text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r
minimal like the following example:</p> minimal like the following example:</p>
<pre class="programlisting"> <pre class="programlisting">
&lt;html&gt; &lt;html&gt;
&lt;head&gt; &lt;head&gt;
&lt;meta http-equiv="Content-Type" content="text/html;charset=UTF-8"&gt; &lt;meta http-equiv="Content-Type" content="text/html;charset=UTF-8"/&gt;
&lt;/head&gt; &lt;/head&gt;
&lt;body&gt; &lt;body&gt;
Some text content Some text content
&lt;/body&gt; &lt;/body&gt;
&lt;/html&gt; &lt;/html&gt;
</pre> </pre>
<p>You should take care to escape some characters inside <p>You should take care to escape some characters inside
the text by transforming them into appropriate entities. the text by transforming them into appropriate entities.
At the very minimum, "<code class="literal">&amp;</code>" At the very minimum, "<code class="literal">&amp;</code>"
@ -6613,11 +6639,17 @@ text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r
for creating/updating an index. Bindings exist for for creating/updating an index. Bindings exist for
Python2 and Python3.</p> Python2 and Python3.</p>
<p>The search interface is used in a number of active <p>The search interface is used in a number of active
projects: the <span class="application">Recoll</span> projects: the <a class="ulink" href=
"https://www.lesbonscomptes.com/recoll/download.html#gssp"
target="_top"><span class="application">Recoll</span>
<span class="application">Gnome Shell Search <span class="application">Gnome Shell Search
Provider</span>, the <span class= Provider</span></a> , the <a class="ulink" href=
"application">Recoll</span> Web UI, and the upmpdcli UPnP "https://opensourceprojects.eu/p/recollwebui/code/"
Media Server, in addition to many small scripts.</p> target="_top"><span class="application">Recoll</span> Web
UI</a>, and the <a class="ulink" href=
"https://www.lesbonscomptes.com/upmpdcli/upmpdcli-manual.html#UPRCL"
target="_top">upmpdcli UPnP Media Server</a>, in addition
to many small scripts.</p>
<p>The index update section of the API may be used to <p>The index update section of the API may be used to
create and update <span class="application">Recoll</span> create and update <span class="application">Recoll</span>
indexes on specific configurations (separate from the indexes on specific configurations (separate from the
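
Reviewer-side illustration, not part of this commit: the fragment-button file shown in the first hunk of the file above is plain XML, so its structure can be checked with a few lines of Python. The sketch below is only an assumption about how one might inspect such a file; the function name list_fragments and the shortened sample are hypothetical, and nothing here describes how Recoll itself reads the file.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example from the hunk above (sample data only).
FRAGBUTS_XML = """<?xml version="1.0" encoding="UTF-8"?>
<fragbuts version="1.0">
  <radiobuttons>
    <fragbut><label>Include Web Results</label><frag></frag></fragbut>
    <fragbut><label>Exclude Web Results</label><frag>-rclbes:BGL</frag></fragbut>
    <fragbut><label>Only Web Results</label><frag>rclbes:BGL</frag></fragbut>
  </radiobuttons>
  <buttons>
    <fragbut><label>Example: Year 2010</label><frag>date:2010-01-01/2010-12-31</frag></fragbut>
    <fragbut><label>Example: c++ files</label><frag>ext:cpp OR ext:cxx</frag></fragbut>
  </buttons>
</fragbuts>
"""


def list_fragments(xml_text):
    """Return (group, label, fragment) tuples in document order.

    Hypothetical helper: 'group' is the enclosing element name,
    either 'radiobuttons' or 'buttons'.
    """
    # Parse from bytes so the encoding declaration is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    result = []
    for group in root:
        for fragbut in group.findall("fragbut"):
            label = fragbut.findtext("label", default="")
            # An empty <frag></frag> yields an empty string.
            frag = fragbut.findtext("frag", default="")
            result.append((group.tag, label, frag))
    return result


if __name__ == "__main__":
    for group, label, frag in list_fragments(FRAGBUTS_XML):
        print(f"{group:12} {label!r:28} {frag!r}")
```

Running it on a real fragment-button file is a quick way to confirm that the file is well-formed and that each label is paired with the intended query fragment.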


@ -2454,46 +2454,52 @@ fs.inotify.max_user_watches=32768
contains an example which filters the results from the WEB contains an example which filters the results from the WEB
history.</para> history.</para>
<para>Here follows an example: <para>Here follows an example:
<programlisting> <programlisting><![CDATA[
&lt;?xml version=&quot;1.0&quot; encoding=&quot;UTF-8&quot;?&gt; <?xml version="1.0" encoding="UTF-8"?>
<fragbuts version="1.0">
&lt;fragbuts version=&quot;1.0&quot;&gt; <radiobuttons>
<!-- Actually useful: toggle WEB queue results inclusion -->
<fragbut>
<label>Include Web Results</label>
<frag></frag>
</fragbut>
&lt;radiobuttons&gt; <fragbut>
<label>Exclude Web Results</label>
<frag>-rclbes:BGL</frag>
</fragbut>
&lt;fragbut&gt; <fragbut>
&lt;label&gt;Include Web Results&lt;/label&gt; <label>Only Web Results</label>
&lt;frag&gt;&lt;/frag&gt; <frag>rclbes:BGL</frag>
&lt;/fragbut&gt; </fragbut>
&lt;fragbut&gt; </radiobuttons>
&lt;label&gt;Exclude Web Results&lt;/label&gt;
&lt;frag&gt;-rclbes:BGL&lt;/frag&gt;
&lt;/fragbut&gt;
&lt;fragbut&gt; <buttons>
&lt;label&gt;Only Web Results&lt;/label&gt;
&lt;frag&gt;rclbes:BGL&lt;/frag&gt;
&lt;/fragbut&gt;
&lt;/radiobuttons&gt; <fragbut>
<label>Example: Year 2010</label>
<frag>date:2010-01-01/2010-12-31</frag>
</fragbut>
&lt;buttons&gt; <fragbut>
<label>Example: c++ files</label>
<frag>ext:cpp OR ext:cxx</frag>
</fragbut>
&lt;fragbut&gt; <fragbut>
&lt;label&gt;Year 2010&lt;/label&gt; <label>Example: My Great Directory</label>
&lt;frag&gt;date:2010-01-01/2010-12-31&lt;/frag&gt; <frag>dir:/my/great/directory</frag>
&lt;/fragbut&gt; </fragbut>
&lt;fragbut&gt; </buttons>
&lt;label&gt;My Great Directory Only&lt;/label&gt;
&lt;frag&gt;dir:/my/great/directory&lt;/frag&gt;
&lt;/fragbut&gt;
&lt;/buttons&gt; </fragbuts>
&lt;/fragbuts&gt; ]]></programlisting>
</programlisting>
</para> </para>
<para>Each <literal>radiobuttons</literal> or <para>Each <literal>radiobuttons</literal> or
@ -2745,6 +2751,16 @@ fs.inotify.max_user_watches=32768
environment. Weird things will probably happen if environment. Weird things will probably happen if
languages are mixed up.</para></listitem> languages are mixed up.</para></listitem>
</varlistentry> </varlistentry>
<varlistentry>
<term>Show index statistics</term> <listitem><para>This will
print a long list of boring numbers about the index</para>
</listitem></varlistentry>
<varlistentry>
<term>List files which could not be indexed</term>
<listitem><para>This will show the files which caused errors,
usually because <command>recollindex</command> could not
translate their format into text.</para>
</listitem></varlistentry>
</variablelist> </variablelist>
<para>Note that in cases where &RCL; does not know the beginning <para>Note that in cases where &RCL; does not know the beginning
@ -2804,22 +2820,16 @@ fs.inotify.max_user_watches=32768
</para> </para>
<screen>export RECOLL_EXTRA_DBS=/some/place/xapiandb:/some/other/db</screen> <screen>export RECOLL_EXTRA_DBS=/some/place/xapiandb:/some/other/db</screen>
<para>Another environment variable, <para>Another environment
<envar>RECOLL_ACTIVE_EXTRA_DBS</envar> allows adding to the active variable, <envar>RECOLL_ACTIVE_EXTRA_DBS</envar> allows adding to
list of indexes. This variable was suggested and implemented by a the active list of indexes. This variable was suggested and
&RCL; user. It is mostly useful if you use scripts to mount implemented by a &RCL; user. It is mostly useful if you use scripts
external volumes with &RCL; indexes. By using to mount external volumes with &RCL; indexes. By
<envar>RECOLL_EXTRA_DBS</envar> and using <envar>RECOLL_EXTRA_DBS</envar>
<envar>RECOLL_ACTIVE_EXTRA_DBS</envar>, you can add and activate and <envar>RECOLL_ACTIVE_EXTRA_DBS</envar>, you can add and
the index for the mounted volume when starting activate the index for the mounted volume when
<command>recoll</command>. starting <command>recoll</command>. Unreachable indexes will
</para> automatically be deactivated when starting up.</para>
<para><envar>RECOLL_ACTIVE_EXTRA_DBS</envar> is available for
&RCL; versions 1.17.2 and later. A change was made in the same
update so that <command>recoll</command> will
automatically deactivate unreachable indexes when starting
up.</para>
</sect2> </sect2>
@ -4261,8 +4271,9 @@ text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r
<sect1 id="RCL.SEARCH.SYNONYMS"> <sect1 id="RCL.SEARCH.SYNONYMS">
<title>Using Synonyms (1.22)</title> <title>Using Synonyms (1.22)</title>
<formalpara><title>Term synonyms:</title> <formalpara><title>Term synonyms and text search:</title> <para>in
<para>there are a number of ways to use term synonyms for searching text: general, there are two main ways to use term synonyms for
searching text:
<itemizedlist> <itemizedlist>
<listitem><para>At index creation time, they can be used to alter the <listitem><para>At index creation time, they can be used to alter the
indexed terms, either increasing or decreasing their number, by indexed terms, either increasing or decreasing their number, by
@ -4478,11 +4489,20 @@ text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r
available: available:
<itemizedlist> <itemizedlist>
<listitem> <listitem>
<para>The <application>KDE</application> KIO Slave was <para>Users of recent Ubuntu-derived distributions, or
described in a <link linkend="RCL.SEARCH.KIO">previous section</link>.</para> any other Gnome desktop systems (e.g. Fedora) can install the
<ulink
url="https://www.lesbonscomptes.com/recoll/download.html#gssp">
Recoll GSSP</ulink> (Gnome Shell Search Provider).</para>
</listitem> </listitem>
<listitem> <listitem>
<para>If you use a recent version of Ubuntu Linux, you may <para>The <application>KDE</application> KIO Slave was described
in a <link linkend="RCL.SEARCH.KIO">previous
section</link>. It can provide search results
inside <command>Dolphin</command>. </para>
</listitem>
<listitem>
<para>If you use an oldish version of Ubuntu Linux, you may
find the <ulink url="&FAQS;UnityLens">Ubuntu Unity find the <ulink url="&FAQS;UnityLens">Ubuntu Unity
Lens</ulink> module useful.</para> Lens</ulink> module useful.</para>
</listitem> </listitem>
@ -4583,8 +4603,7 @@ text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r
in C++ and live inside <command>recollindex</command>. This latter in C++ and live inside <command>recollindex</command>. This latter
kind will not be described here.</para> kind will not be described here.</para>
<para>There are currently (since version 1.13) two kinds of <para>There are two kinds of external executable input handlers:
external executable input handlers:
<itemizedlist> <itemizedlist>
<listitem><para>Simple <literal>exec</literal> handlers <listitem><para>Simple <literal>exec</literal> handlers
run once and exit. They can be bare programs like run once and exit. They can be bare programs like
@ -4711,34 +4730,32 @@ text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r
<itemizedlist> <itemizedlist>
<listitem><para><literal>rclimg</literal> is written in Perl and <listitem><para><literal>rclimg</literal> is written in Perl and
handles the execm protocol all by itself (showing how trivial it handles the execm protocol all by itself (showing how trivial it
is).</para></listitem> is).</para></listitem> <listitem><para>All the Python handlers
<listitem><para>All the Python handlers share at least the share at least the <filename>rclexecm.py</filename> module, which
<filename>rclexecm.py</filename> module, which handles the handles the communication. Have a look at, for
communication. Have a look at, for example, example, <filename>rclzip</filename> for a handler which
<filename>rclzip</filename> for a handler which uses uses <filename>rclexecm.py</filename>
<filename>rclexecm.py</filename> directly.</para></listitem> directly.</para></listitem> <listitem><para>Most Python handlers
<listitem><para>Most Python handlers which process which process single-document files by executing another command
single-document files by executing another command are further are further abstracted by using
abstracted by using the <filename>rclexec1.py</filename> the <filename>rclexec1.py</filename> module. See for
module. See for example <filename>rclrtf.py</filename> for a example <filename>rclrtf.py</filename> for a simple one,
simple one, or <filename>rcldoc.py</filename> for a slightly more or <filename>rcldoc.py</filename> for a slightly more complicated
complicated one (possibly executing several one (possibly executing several
commands).</para></listitem> commands).</para></listitem> <listitem><para>Handlers which
<listitem><para>Handlers which extract text from an XML document extract text from an XML document by using an XSLT style sheet
by using an XSLT style sheet are now executed inside are now executed inside <command>recollindex</command>, with only
<command>recollindex</command>, with only the style sheet stored the style sheet stored in the <filename>filters/</filename>
in the <filename>filters/</filename> directory. These can directory. These can use a single style sheet
use a single style sheet (e.g. <filename>abiword.xsl</filename>), (e.g. <filename>abiword.xsl</filename>), or two sheets for the
or two sheets for the data and metadata data and metadata (e.g. <filename>opendoc-body.xsl</filename>
(e.g. <filename>opendoc-body.xsl</filename> and and <filename>opendoc-meta.xsl</filename>). The <filename>mimeconf</filename>
<filename>opendoc-meta.xsl</filename>). The configuration file defines how the sheets are used, have a
<filename>mimeconf</filename> configuration file defines how the look. Before the C++ import, the xsl-based handlers used a common
sheets are used, have a look. Before the C++ import, the module <filename>rclgenxslt.py</filename>, it is still around but
xsl-based handlers used a common module unused at the moment. The handler for OpenXML presentations is
<filename>rclgenxslt.py</filename>, it is still around but still the Python version because the format did not fit with what
unused. The handler for OpenXML presentations is still the Python the C++ code does. It would be a good base for another similar
version because the format did not fit with what the C++ code
does. It would be a good base for another similar
issue.</para></listitem> issue.</para></listitem>
</itemizedlist> </itemizedlist>
</para> </para>
@ -4878,16 +4895,16 @@ text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r
<para>For filters producing HTML, the output could be very minimal <para>For filters producing HTML, the output could be very minimal
like the following example: like the following example:
<programlisting> <programlisting><![CDATA[
&lt;html> <html>
&lt;head> <head>
&lt;meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> <meta http-equiv="Content-Type" content="text/html;charset=UTF-8"/>
&lt;/head> </head>
&lt;body> <body>
Some text content Some text content
&lt;/body> </body>
&lt;/html> </html>
</programlisting> ]]></programlisting>
</para> </para>
<para>You should take care to escape some <para>You should take care to escape some
@ -5087,9 +5104,16 @@ text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r
Python2 and Python3.</para> Python2 and Python3.</para>
<para>The search interface is used in a number of active projects: <para>The search interface is used in a number of active projects:
the &RCL; <application>Gnome Shell Search Provider</application>, the <ulink
the &RCL; Web UI, and the upmpdcli UPnP Media Server, in addition url="https://www.lesbonscomptes.com/recoll/download.html#gssp">
to many small scripts.</para> &RCL; <application>Gnome Shell Search Provider</application>
</ulink>,
the <ulink url="https://opensourceprojects.eu/p/recollwebui/code/">
&RCL; Web UI</ulink>, and the
<ulink
url="https://www.lesbonscomptes.com/upmpdcli/upmpdcli-manual.html#UPRCL">
upmpdcli UPnP Media Server</ulink>, in addition
to many small scripts.</para>
<para>The index update section of the API may be used to create and <para>The index update section of the API may be used to create and
update &RCL; indexes on specific configurations (separate from the update &RCL; indexes on specific configurations (separate from the
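
Reviewer-side illustration, not part of this commit: the hunk above that changes the minimal HTML example is followed in the manual by advice to escape some characters in the text. A minimal sketch of that idea in Python follows; the function name minimal_html is hypothetical, and the choice of html.escape from the standard library is an assumption, not something the Recoll handlers are shown to use in this diff.

```python
import html


def minimal_html(text):
    """Wrap plain text in the minimal HTML document shown in the manual.

    Hypothetical helper. html.escape(..., quote=False) converts
    '&', '<' and '>' into entities so they cannot be mistaken for markup.
    """
    body = html.escape(text, quote=False)
    return (
        "<html>\n"
        "<head>\n"
        '<meta http-equiv="Content-Type" content="text/html;charset=UTF-8"/>\n'
        "</head>\n"
        "<body>\n"
        + body + "\n"
        "</body>\n"
        "</html>\n"
    )


if __name__ == "__main__":
    print(minimal_html("AT&T <rocks>"))
```

The escaping is done before the text is placed inside the body, so the surrounding tags are left untouched.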


@ -1,118 +0,0 @@
#!/usr/bin/env python3
# Copyright (C) 2014 J.F.Dockes
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the
# Free Software Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
######################################
from __future__ import print_function
import sys
import rclexecm
import rclgenxslt
stylesheet_all = '''<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:ab="http://www.abisource.com/awml.dtd"
exclude-result-prefixes="ab"
>
<xsl:output method="html" encoding="UTF-8"/>
<xsl:template match="/">
<html>
<head>
<xsl:apply-templates select="ab:abiword/ab:metadata"/>
</head>
<body>
<!-- This is for the older abiword format with no namespaces -->
<xsl:for-each select="abiword/section">
<xsl:apply-templates select="p"/>
</xsl:for-each>
<!-- Newer namespaced format -->
<xsl:for-each select="ab:abiword/ab:section">
<xsl:for-each select="ab:p">
<p><xsl:value-of select="."/></p><xsl:text>
</xsl:text>
</xsl:for-each>
</xsl:for-each>
</body>
</html>
</xsl:template>
<xsl:template match="p">
<p><xsl:value-of select="."/></p><xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="ab:metadata">
<xsl:for-each select="ab:m">
<xsl:choose>
<xsl:when test="@key = 'dc.creator'">
<meta>
<xsl:attribute name="name">author</xsl:attribute>
<xsl:attribute name="content">
<xsl:value-of select="."/>
</xsl:attribute>
</meta><xsl:text>
</xsl:text>
</xsl:when>
<xsl:when test="@key = 'abiword.keywords'">
<meta>
<xsl:attribute name="name">keywords</xsl:attribute>
<xsl:attribute name="content">
<xsl:value-of select="."/>
</xsl:attribute>
</meta><xsl:text>
</xsl:text>
</xsl:when>
<xsl:when test="@key = 'dc.subject'">
<meta>
<xsl:attribute name="name">keywords</xsl:attribute>
<xsl:attribute name="content">
<xsl:value-of select="."/>
</xsl:attribute>
</meta><xsl:text>
</xsl:text>
</xsl:when>
<xsl:when test="@key = 'dc.description'">
<meta>
<xsl:attribute name="name">abstract</xsl:attribute>
<xsl:attribute name="content">
<xsl:value-of select="."/>
</xsl:attribute>
</meta><xsl:text>
</xsl:text>
</xsl:when>
<xsl:when test="@key = 'dc.title'">
<title><xsl:value-of select="."/></title><xsl:text>
</xsl:text>
</xsl:when>
<xsl:otherwise>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
'''
if __name__ == '__main__':
proto = rclexecm.RclExecM()
extract = rclgenxslt.XSLTExtractor(proto, stylesheet_all)
rclexecm.main(proto, extract)


@ -1,112 +0,0 @@
#!/usr/bin/env python3
# Copyright (C) 2014 J.F.Dockes
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the
# Free Software Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
######################################
from __future__ import print_function
import sys
import rclexecm
import rclgenxslt
stylesheet_all = '''<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
xmlns:ooo="http://openoffice.org/2004/office"
xmlns:gnm="http://www.gnumeric.org/v10.dtd"
exclude-result-prefixes="office xlink meta ooo dc"
>
<xsl:output method="html" encoding="UTF-8"/>
<xsl:template match="/">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
<xsl:apply-templates select="//office:document-meta/office:meta"/>
</head>
<body>
<xsl:apply-templates select="//gnm:Cells"/>
<xsl:apply-templates select="//gnm:Objects"/>
</body>
</html>
</xsl:template>
<xsl:template match="//dc:date">
<meta>
<xsl:attribute name="name">date</xsl:attribute>
<xsl:attribute name="content"><xsl:value-of select="."/></xsl:attribute>
</meta>
</xsl:template>
<xsl:template match="//dc:description">
<meta>
<xsl:attribute name="name">abstract</xsl:attribute>
<xsl:attribute name="content"><xsl:value-of select="."/></xsl:attribute>
</meta>
</xsl:template>
<xsl:template match="//meta:keyword">
<meta>
<xsl:attribute name="name">keywords</xsl:attribute>
<xsl:attribute name="content"><xsl:value-of select="."/></xsl:attribute>
</meta>
</xsl:template>
<xsl:template match="//dc:subject">
<meta>
<xsl:attribute name="name">keywords</xsl:attribute>
<xsl:attribute name="content"><xsl:value-of select="."/></xsl:attribute>
</meta>
</xsl:template>
<xsl:template match="//dc:title">
<title> <xsl:value-of select="."/> </title>
</xsl:template>
<xsl:template match="//meta:initial-creator">
<meta>
<xsl:attribute name="name">author</xsl:attribute>
<xsl:attribute name="content"><xsl:value-of select="."/></xsl:attribute>
</meta>
</xsl:template>
<xsl:template match="office:meta/*"/>
<xsl:template match="gnm:Cell">
<p><xsl:value-of select="."/></p>
</xsl:template>
<xsl:template match="gnm:CellComment">
<blockquote><xsl:value-of select="@Text"/></blockquote>
</xsl:template>
</xsl:stylesheet>
'''
if __name__ == '__main__':
proto = rclexecm.RclExecM()
extract = rclgenxslt.XSLTExtractor(proto, stylesheet_all, gzip=True)
rclexecm.main(proto, extract)


@ -1,70 +0,0 @@
#!/usr/bin/env python3
# Copyright (C) 2014 J.F.Dockes
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the
# Free Software Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
######################################
from __future__ import print_function
import sys
import rclexecm
import rclgenxslt
stylesheet_all = '''<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="html" encoding="UTF-8"/>
<xsl:strip-space elements="*" />
<xsl:template match="/">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>
Okular notes about: <xsl:value-of select="/documentInfo/@url" />
</title>
</head>
<body>
<xsl:apply-templates />
</body>
</html>
</xsl:template>
<xsl:template match="node()">
<xsl:apply-templates select="@* | node() "/>
</xsl:template>
<xsl:template match="text()">
<p><xsl:value-of select="."/></p>
<xsl:text >
</xsl:text>
</xsl:template>
<xsl:template match="@contents|@author">
<p><xsl:value-of select="." /></p>
<xsl:text >
</xsl:text>
</xsl:template>
<xsl:template match="@*"/>
</xsl:stylesheet>
'''
if __name__ == '__main__':
proto = rclexecm.RclExecM()
extract = rclgenxslt.XSLTExtractor(proto, stylesheet_all)
rclexecm.main(proto, extract)


@ -1,137 +0,0 @@
#!/usr/bin/env python3
# Copyright (C) 2014-2018 J.F.Dockes
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the
# Free Software Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
######################################
import sys
import rclexecm
import rclgenxslt
stylesheet = '''<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
xmlns:ooo="http://openoffice.org/2004/office"
xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
exclude-result-prefixes="office xlink meta ooo dc text"
>
<xsl:output method="html" encoding="UTF-8"/>
<xsl:template match="/">
<html>
<head>
<xsl:apply-templates select="/office:document/office:meta" />
</head>
<body>
<xsl:apply-templates select="/office:document/office:body" />
</body></html>
</xsl:template>
<xsl:template match="/office:document/office:meta">
<xsl:apply-templates select="dc:title"/>
<xsl:apply-templates select="dc:description"/>
<xsl:apply-templates select="dc:subject"/>
<xsl:apply-templates select="meta:keyword"/>
<xsl:apply-templates select="dc:creator"/>
</xsl:template>
<xsl:template match="/office:document/office:body">
<xsl:apply-templates select=".//text:p" />
<xsl:apply-templates select=".//text:h" />
<xsl:apply-templates select=".//text:s" />
<xsl:apply-templates select=".//text:line-break" />
<xsl:apply-templates select=".//text:tab" />
</xsl:template>
<xsl:template match="dc:title">
<title> <xsl:value-of select="."/> </title><xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="dc:description">
<meta>
<xsl:attribute name="name">abstract</xsl:attribute>
<xsl:attribute name="content">
<xsl:value-of select="."/>
</xsl:attribute>
</meta><xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="dc:subject">
<meta>
<xsl:attribute name="name">keywords</xsl:attribute>
<xsl:attribute name="content">
<xsl:value-of select="."/>
</xsl:attribute>
</meta><xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="dc:creator">
<meta>
<xsl:attribute name="name">author</xsl:attribute>
<xsl:attribute name="content">
<xsl:value-of select="."/>
</xsl:attribute>
</meta><xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="meta:keyword">
<meta>
<xsl:attribute name="name">keywords</xsl:attribute>
<xsl:attribute name="content">
<xsl:value-of select="."/>
</xsl:attribute>
</meta><xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="office:body//text:p">
<p><xsl:apply-templates/></p><xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="office:body//text:h">
<p><xsl:apply-templates/></p><xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="office:body//text:s">
<xsl:text> </xsl:text>
</xsl:template>
<xsl:template match="office:body//text:line-break">
<br />
</xsl:template>
<xsl:template match="office:body//text:tab">
<xsl:text> </xsl:text>
</xsl:template>
</xsl:stylesheet>
'''
if __name__ == '__main__':
    proto = rclexecm.RclExecM()
    extract = rclgenxslt.XSLTExtractor(proto, stylesheet)
    rclexecm.main(proto, extract)

@ -1,170 +0,0 @@
#!/usr/bin/env python3
# Copyright (C) 2014 J.F.Dockes
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the
# Free Software Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
######################################
from __future__ import print_function
import sys
import rclexecm
import rclxslt
from rclbasehandler import RclBaseHandler
from zipfile import ZipFile
stylesheet_meta = '''<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
xmlns:xlink="http://www.w3.org/1999/xlink"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
xmlns:ooo="http://openoffice.org/2004/office"
exclude-result-prefixes="office xlink meta ooo dc"
>
<xsl:output method="html" encoding="UTF-8"/>
<xsl:template match="/office:document-meta">
<xsl:apply-templates select="office:meta/dc:description"/>
<xsl:apply-templates select="office:meta/dc:subject"/>
<xsl:apply-templates select="office:meta/dc:title"/>
<xsl:apply-templates select="office:meta/meta:keyword"/>
<xsl:apply-templates select="office:meta/dc:creator"/>
</xsl:template>
<xsl:template match="dc:title">
<title> <xsl:value-of select="."/> </title><xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="dc:description">
<meta>
<xsl:attribute name="name">abstract</xsl:attribute>
<xsl:attribute name="content">
<xsl:value-of select="."/>
</xsl:attribute>
</meta><xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="dc:subject">
<meta>
<xsl:attribute name="name">keywords</xsl:attribute>
<xsl:attribute name="content">
<xsl:value-of select="."/>
</xsl:attribute>
</meta><xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="dc:creator">
<meta>
<xsl:attribute name="name">author</xsl:attribute>
<xsl:attribute name="content">
<xsl:value-of select="."/>
</xsl:attribute>
</meta><xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="meta:keyword">
<meta>
<xsl:attribute name="name">keywords</xsl:attribute>
<xsl:attribute name="content">
<xsl:value-of select="."/>
</xsl:attribute>
</meta><xsl:text>
</xsl:text>
</xsl:template>
</xsl:stylesheet>
'''
stylesheet_content = '''<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
exclude-result-prefixes="text"
>
<xsl:output method="html" encoding="UTF-8"/>
<xsl:template match="text:p">
<p><xsl:apply-templates/></p><xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="text:h">
<p><xsl:apply-templates/></p><xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="text:s">
<xsl:text> </xsl:text>
</xsl:template>
<xsl:template match="text:line-break">
<br />
</xsl:template>
<xsl:template match="text:tab">
<xsl:text> </xsl:text>
</xsl:template>
</xsl:stylesheet>
'''
class OOExtractor(RclBaseHandler):
    def __init__(self, em):
        super(OOExtractor, self).__init__(em)
    def html_text(self, fn):
        # Context managers make sure the file and the archive get closed.
        with open(fn, 'rb') as f, ZipFile(f) as zip:
            # Note the space after "Content-Type": without it the two
            # attributes of the meta element would run together.
            docdata = b'<html>\n<head>\n<meta http-equiv="Content-Type" ' \
                      b'content="text/html; charset=UTF-8">'
            # Wrap metadata extraction because it can sometimes throw
            # while the main text will be valid
            try:
                metadata = zip.read("meta.xml")
                if metadata:
                    res = rclxslt.apply_sheet_data(stylesheet_meta, metadata)
                    docdata += res
            except Exception:
                # To be checked. I'm under the impression that I get this when
                # nothing matches?
                #self.em.rclog("No/bad metadata in %s" % fn)
                pass
            docdata += b'</head>\n<body>\n'
            content = zip.read("content.xml")
            if content:
                res = rclxslt.apply_sheet_data(stylesheet_content, content)
                docdata += res
            docdata += b'</body></html>'
            return docdata
if __name__ == '__main__':
    proto = rclexecm.RclExecM()
    extract = OOExtractor(proto)
    rclexecm.main(proto, extract)

@ -1,105 +0,0 @@
#!/usr/bin/env python3
# Copyright (C) 2014 J.F.Dockes
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the
# Free Software Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
######################################
from __future__ import print_function
import sys
import rclexecm
import rclgenxslt
stylesheet_all = '''<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns:dc="http://purl.org/dc/elements/1.1/"
exclude-result-prefixes="svg"
>
<xsl:output method="html" encoding="UTF-8"/>
<xsl:template match="/">
<html>
<head>
<xsl:apply-templates select="svg:svg/svg:title"/>
<xsl:apply-templates select="svg:svg/svg:desc"/>
<xsl:apply-templates select="svg:svg/svg:metadata/descendant::dc:creator"/>
<xsl:apply-templates select="svg:svg/svg:metadata/descendant::dc:subject"/>
<xsl:apply-templates select="svg:svg/svg:metadata/descendant::dc:description"/>
</head>
<body>
<xsl:apply-templates select="//svg:text"/>
</body>
</html>
</xsl:template>
<xsl:template match="svg:desc">
<meta>
<xsl:attribute name="name">keywords</xsl:attribute>
<xsl:attribute name="content">
<xsl:value-of select="."/>
</xsl:attribute>
</meta><xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="dc:creator">
<meta>
<xsl:attribute name="name">author</xsl:attribute>
<xsl:attribute name="content">
<xsl:value-of select="."/>
</xsl:attribute>
</meta><xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="dc:subject">
<meta>
<xsl:attribute name="name">keywords</xsl:attribute>
<xsl:attribute name="content">
<xsl:value-of select="."/>
</xsl:attribute>
</meta><xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="dc:description">
<meta>
<xsl:attribute name="name">description</xsl:attribute>
<xsl:attribute name="content">
<xsl:value-of select="."/>
</xsl:attribute>
</meta><xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="svg:title">
<title><xsl:value-of select="."/></title><xsl:text>
</xsl:text>
</xsl:template>
<xsl:template match="svg:text">
<p><xsl:value-of select="."/></p><xsl:text>
</xsl:text>
</xsl:template>
</xsl:stylesheet>
'''
if __name__ == '__main__':
    proto = rclexecm.RclExecM()
    extract = rclgenxslt.XSLTExtractor(proto, stylesheet_all)
    rclexecm.main(proto, extract)