doc
parent
ed45e5f00e
commit
3b55d03b39
@ -6920,96 +6920,94 @@ recollindex -c "$confdir"
|
|||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
<p>Index queries do not provide document content (only
|
<p>Prior to <span class="application">Recoll</span>
|
||||||
a partial and unprecise reconstruction is performed to
|
1.25, index queries never provide document content
|
||||||
show the snippets text). In order to access the actual
|
because it is not stored. More recent versions usually
|
||||||
document data, the data extraction part of the indexing
|
store the document text, which can be optionally
|
||||||
process must be performed (subdocument access and
|
retrieved when running a query (see <code class=
|
||||||
format translation). This is not trivial in the case of
|
"literal">query.execute()</code> above - the result is
|
||||||
embedded documents. The <code class=
|
always plain text).</p>
|
||||||
"literal">rclextract</code> module provides a single
|
<p>The <code class="literal">rclextract</code> module
|
||||||
class which can be used to access the data content for
|
can give access to the original document and to the
|
||||||
result documents.</p>
|
document text content (if not stored by the index, or
|
||||||
|
to access an HTML version of the text). Accessing the
|
||||||
|
original document is particularly useful if it is
|
||||||
|
embedded (e.g. an email attachment).</p>
|
||||||
|
<p>You need to import the <code class=
|
||||||
|
"literal">recoll</code> module before the <code class=
|
||||||
|
"literal">rclextract</code> module.</p>
|
||||||
<div class="sect4">
|
<div class="sect4">
|
||||||
<div class="titlepage">
|
<div class="titlepage">
|
||||||
<div>
|
<div>
|
||||||
<div>
|
<div>
|
||||||
<h5 class="title"><a name=
|
<h5 class="title"><a name=
|
||||||
"RCL.PROGRAM.PYTHONAPI.RCLEXTRACT.CLASSES" id=
|
"RCL.PROGRAM.PYTHONAPI.RCLEXTRACT.CLASSES.EXTRACTOR"
|
||||||
"RCL.PROGRAM.PYTHONAPI.RCLEXTRACT.CLASSES"></a>Classes</h5>
|
id=
|
||||||
|
"RCL.PROGRAM.PYTHONAPI.RCLEXTRACT.CLASSES.EXTRACTOR">
|
||||||
|
</a>The Extractor class</h5>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
<div class="sect5">
|
<div class="variablelist">
|
||||||
<div class="titlepage">
|
<dl class="variablelist">
|
||||||
<div>
|
<dt><span class="term">Extractor(doc)</span></dt>
|
||||||
<div>
|
<dd>
|
||||||
<h6 class="title"><a name=
|
<p>An <code class="literal">Extractor</code>
|
||||||
"RCL.PROGRAM.PYTHONAPI.RCLEXTRACT.CLASSES.EXTRACTOR"
|
object is built from a <code class=
|
||||||
id=
|
"literal">Doc</code> object, output from a
|
||||||
"RCL.PROGRAM.PYTHONAPI.RCLEXTRACT.CLASSES.EXTRACTOR">
|
query.</p>
|
||||||
</a>The Extractor class</h6>
|
</dd>
|
||||||
</div>
|
<dt><span class=
|
||||||
</div>
|
"term">Extractor.textextract(ipath)</span></dt>
|
||||||
</div>
|
<dd>
|
||||||
<div class="variablelist">
|
<p>Extract document defined by <em class=
|
||||||
<dl class="variablelist">
|
"replaceable"><code>ipath</code></em> and
|
||||||
<dt><span class=
|
return a <code class="literal">Doc</code>
|
||||||
"term">Extractor(doc)</span></dt>
|
object. The <code class=
|
||||||
<dd>
|
"literal">doc.text</code> field has the
|
||||||
<p>An <code class="literal">Extractor</code>
|
document text converted to either text/plain or
|
||||||
object is built from a <code class=
|
text/html according to <code class=
|
||||||
"literal">Doc</code> object, output from a
|
"literal">doc.mimetype</code>. The typical use
|
||||||
query.</p>
|
would be as follows:</p>
|
||||||
</dd>
|
<pre class="programlisting">
|
||||||
<dt><span class=
|
from recoll import recoll, rclextract
|
||||||
"term">Extractor.textextract(ipath)</span></dt>
|
|
||||||
<dd>
|
|
||||||
<p>Extract document defined by <em class=
|
|
||||||
"replaceable"><code>ipath</code></em> and
|
|
||||||
return a <code class="literal">Doc</code>
|
|
||||||
object. The <code class=
|
|
||||||
"literal">doc.text</code> field has the
|
|
||||||
document text converted to either text/plain
|
|
||||||
or text/html according to <code class=
|
|
||||||
"literal">doc.mimetype</code>. The typical
|
|
||||||
use would be as follows:</p>
|
|
||||||
<pre class="programlisting">
|
|
||||||
qdoc = query.fetchone()
|
qdoc = query.fetchone()
|
||||||
extractor = recoll.Extractor(qdoc)
|
extractor = recoll.Extractor(qdoc)
|
||||||
doc = extractor.textextract(qdoc.ipath)
|
doc = extractor.textextract(qdoc.ipath)
|
||||||
# use doc.text, e.g. for previewing</pre>
|
# use doc.text, e.g. for previewing</pre>
|
||||||
<p>Passing <code class=
|
<p>Passing <code class=
|
||||||
"literal">qdoc.ipath</code> to <code class=
|
"literal">qdoc.ipath</code> to <code class=
|
||||||
"literal">textextract()</code> is redundant,
|
"literal">textextract()</code> is redundant,
|
||||||
but reflects the fact that the <code class=
|
but reflects the fact that the <code class=
|
||||||
"literal">Extractor</code> object actually
|
"literal">Extractor</code> object actually has
|
||||||
has the capability to access the other
|
the capability to access the other entries in a
|
||||||
entries in a compound document.</p>
|
compound document.</p>
|
||||||
</dd>
|
</dd>
|
||||||
<dt><span class=
|
<dt><span class=
|
||||||
"term">Extractor.idoctofile(ipath, targetmtype,
|
"term">Extractor.idoctofile(ipath, targetmtype,
|
||||||
outfile='')</span></dt>
|
outfile='')</span></dt>
|
||||||
<dd>
|
<dd>
|
||||||
<p>Extracts document into an output file,
|
<p>Extracts document into an output file, which
|
||||||
which can be given explicitly or will be
|
can be given explicitly or will be created as a
|
||||||
created as a temporary file to be deleted by
|
temporary file to be deleted by the caller.
|
||||||
the caller. Typical use:</p>
|
Typical use:</p>
|
||||||
<pre class="programlisting">
|
<pre class="programlisting">
|
||||||
|
from recoll import recoll, rclextract
|
||||||
|
|
||||||
qdoc = query.fetchone()
|
qdoc = query.fetchone()
|
||||||
extractor = recoll.Extractor(qdoc)
|
extractor = recoll.Extractor(qdoc)
|
||||||
filename = extractor.idoctofile(qdoc.ipath, qdoc.mimetype)</pre>
|
filename = extractor.idoctofile(qdoc.ipath, qdoc.mimetype)</pre>
|
||||||
<p>In all cases the output is a copy, even if
|
<p>In all cases the output is a copy, even if
|
||||||
the requested document is a regular system
|
the requested document is a regular system
|
||||||
file, which may be wasteful in some cases. If
|
file, which may be wasteful in some cases. If
|
||||||
you want to avoid this, you can test for a
|
you want to avoid this, you can test for a
|
||||||
simple file document as follows:</p>
|
simple file document as follows:</p>
|
||||||
<pre class="programlisting">
|
<pre class="programlisting">
|
||||||
not doc.ipath and (not "rclbes" in doc.keys() or doc["rclbes"] == "FS")
|
not doc.ipath and (not "rclbes" in doc.keys() or doc["rclbes"] == "FS")
|
||||||
</pre>
|
</pre>
|
||||||
</dd>
|
</dd>
|
||||||
</dl>
|
</dl>
|
||||||
</div>
|
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
</div>
|
</div>
|
||||||
|
|||||||
@ -5349,40 +5349,45 @@ recollindex -c "$confdir"
|
|||||||
<sect3 id="RCL.PROGRAM.PYTHONAPI.RCLEXTRACT">
|
<sect3 id="RCL.PROGRAM.PYTHONAPI.RCLEXTRACT">
|
||||||
<title>The rclextract module</title>
|
<title>The rclextract module</title>
|
||||||
|
|
||||||
<para>Index queries do not provide document content (only a
|
|
||||||
partial and unprecise reconstruction is performed to show the
|
<para>Prior to &RCL; 1.25, index queries never provide document
|
||||||
snippets text). In order to access the actual document data, the
|
content because it is not stored. More recent versions usually
|
||||||
data extraction part of the indexing process must be performed
|
store the document text, which can be optionally retrieved when
|
||||||
(subdocument access and format translation). This is not trivial
|
running a query (see <literal>query.execute()</literal>
|
||||||
in the case of embedded documents. The
|
above - the result is always plain text).</para>
|
||||||
<literal>rclextract</literal> module provides a single class
|
|
||||||
which can be used to access the data content for result
|
|
||||||
documents.</para>
|
|
||||||
|
|
||||||
<sect4 id="RCL.PROGRAM.PYTHONAPI.RCLEXTRACT.CLASSES">
|
<para>The <literal>rclextract</literal> module can give access to
|
||||||
<title>Classes</title>
|
the original document and to the document text content (if not
|
||||||
|
stored by the index, or to access an HTML version of the text).
|
||||||
<sect5 id="RCL.PROGRAM.PYTHONAPI.RCLEXTRACT.CLASSES.EXTRACTOR">
|
Accessing the original document is particularly useful if it is
|
||||||
<title>The Extractor class</title>
|
embedded (e.g. an email attachment).</para>
|
||||||
|
|
||||||
<variablelist>
|
<para>You need to import the <literal>recoll</literal> module
|
||||||
|
before the <literal>rclextract</literal> module.</para>
|
||||||
|
|
||||||
|
<sect4 id="RCL.PROGRAM.PYTHONAPI.RCLEXTRACT.CLASSES.EXTRACTOR">
|
||||||
|
<title>The Extractor class</title>
|
||||||
|
|
||||||
<varlistentry>
|
<variablelist>
|
||||||
<term>Extractor(doc)</term>
|
|
||||||
<listitem><para>An <literal>Extractor</literal> object is
|
<varlistentry>
|
||||||
built from a <literal>Doc</literal> object, output
|
<term>Extractor(doc)</term>
|
||||||
from a query.</para></listitem>
|
<listitem><para>An <literal>Extractor</literal> object is
|
||||||
</varlistentry>
|
built from a <literal>Doc</literal> object, output
|
||||||
<varlistentry>
|
from a query.</para></listitem>
|
||||||
<term>Extractor.textextract(ipath)</term>
|
</varlistentry>
|
||||||
<listitem><para>Extract document defined by
|
<varlistentry>
|
||||||
<replaceable>ipath</replaceable> and return a
|
<term>Extractor.textextract(ipath)</term>
|
||||||
<literal>Doc</literal> object. The
|
<listitem><para>Extract document defined by
|
||||||
<literal>doc.text</literal> field has the document text
|
<replaceable>ipath</replaceable> and return a
|
||||||
converted to either text/plain or text/html according to
|
<literal>Doc</literal> object. The
|
||||||
<literal>doc.mimetype</literal>. The typical use would be
|
<literal>doc.text</literal> field has the document text
|
||||||
as follows:</para>
|
converted to either text/plain or text/html according to
|
||||||
|
<literal>doc.mimetype</literal>. The typical use would be
|
||||||
|
as follows:</para>
|
||||||
<programlisting>
|
<programlisting>
|
||||||
|
from recoll import recoll, rclextract
|
||||||
|
|
||||||
qdoc = query.fetchone()
|
qdoc = query.fetchone()
|
||||||
extractor = recoll.Extractor(qdoc)
|
extractor = recoll.Extractor(qdoc)
|
||||||
doc = extractor.textextract(qdoc.ipath)
|
doc = extractor.textextract(qdoc.ipath)
|
||||||
@ -5401,6 +5406,8 @@ doc = extractor.textextract(qdoc.ipath)
|
|||||||
temporary file to be deleted by the caller. Typical
|
temporary file to be deleted by the caller. Typical
|
||||||
use:</para>
|
use:</para>
|
||||||
<programlisting>
|
<programlisting>
|
||||||
|
from recoll import recoll, rclextract
|
||||||
|
|
||||||
qdoc = query.fetchone()
|
qdoc = query.fetchone()
|
||||||
extractor = recoll.Extractor(qdoc)
|
extractor = recoll.Extractor(qdoc)
|
||||||
filename = extractor.idoctofile(qdoc.ipath, qdoc.mimetype)</programlisting>
|
filename = extractor.idoctofile(qdoc.ipath, qdoc.mimetype)</programlisting>
|
||||||
@ -5417,8 +5424,7 @@ not doc.ipath and (not "rclbes" in doc.keys() or doc["rclbes"] == "FS")
|
|||||||
|
|
||||||
</variablelist>
|
</variablelist>
|
||||||
|
|
||||||
</sect5> <!-- Extractor class -->
|
</sect4>
|
||||||
</sect4> <!-- rclextract classes -->
|
|
||||||
</sect3> <!-- rclextract module -->
|
</sect3> <!-- rclextract module -->
|
||||||
|
|
||||||
|
|
||||||
|
|||||||
@ -1,6 +1,6 @@
|
|||||||
# Configuration
|
# Configuration
|
||||||
# The name of the source DocBook xml file
|
# The name of the source DocBook xml file
|
||||||
INPUT_XML = ../usermanual.xml ../recoll.conf.xml
|
INPUT_XML = ../usermanual.xml
|
||||||
|
|
||||||
# The makefile assumes that you have a
|
# The makefile assumes that you have a
|
||||||
# directory named images that contains
|
# directory named images that contains
|
||||||
|
|||||||
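The documentation touched by this commit ends with a one-line test for a "simple file document", that is, a result that `idoctofile()` would only copy. As a minimal sketch, that test can be wrapped in a helper and exercised without an index. `FakeDoc` and `is_simple_file` are hypothetical names introduced here for illustration; they are not part of the `recoll` or `rclextract` modules, and `FakeDoc` only imitates the three things the expression needs from a `Doc` (an `ipath` attribute, `keys()`, and item access).

```python
class FakeDoc:
    """Hypothetical stand-in for a Recoll Doc object (illustration only)."""

    def __init__(self, ipath="", **fields):
        self.ipath = ipath           # empty for a top-level document
        self._fields = dict(fields)  # extra fields such as "rclbes"

    def keys(self):
        return list(self._fields.keys())

    def __getitem__(self, name):
        return self._fields[name]


def is_simple_file(doc):
    """Same test as in the manual: no ipath, and either no "rclbes"
    field or an "rclbes" field equal to "FS"."""
    return bool(
        not doc.ipath
        and ("rclbes" not in doc.keys() or doc["rclbes"] == "FS")
    )


if __name__ == "__main__":
    print(is_simple_file(FakeDoc()))                # no ipath, no rclbes
    print(is_simple_file(FakeDoc(ipath="3")))       # embedded entry
    print(is_simple_file(FakeDoc(rclbes="FS")))     # explicit FS value
    print(is_simple_file(FakeDoc(rclbes="OTHER")))  # any non-FS value
```

With a real query result, the same helper could presumably be called before `idoctofile()` to skip the copy when the original file can be used directly.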