doc

Date: 2019-03-10 14:47:05 +01:00
commit 3b55d03b39
parent ed45e5f00e
3 changed files with 110 additions and 106 deletions
--- a/src/doc/user/usermanual.html
+++ b/src/doc/user/usermanual.html
@@ -6920,96 +6920,94 @@ recollindex -c "$confdir"
                </div>
              </div>
            </div>
-            <p>Index queries do not provide document content (only
-            a partial and unprecise reconstruction is performed to
-            show the snippets text). In order to access the actual
-            document data, the data extraction part of the indexing
-            process must be performed (subdocument access and
-            format translation). This is not trivial in the case of
-            embedded documents. The <code class=
-            "literal">rclextract</code> module provides a single
-            class which can be used to access the data content for
-            result documents.</p>
+            <p>Prior to <span class="application">Recoll</span>
+            1.25, index queries never provide document content
+            because it is not stored. More recent versions usually
+            store the document text, which can be optionally
+            retrieved when running a query (see <code class=
+            "literal">query.execute()</code> above - the result is
+            always plain text).</p>
+            <p>The <code class="literal">rclextract</code> module
+            can give access to the original document and to the
+            document text content (if not stored by the index, or
            to access an HTML version of the text). Accessing the
            original document is particularly useful if it is
            embedded (e.g. an email attachment).</p>
            <p>You need to import the <code class=
            "literal">recoll</code> module before the <code class=
            "literal">rclextract</code> module.</p>
            <div class="sect4">
              <div class="titlepage">
                <div>
                  <div>
                    <h5 class="title"><a name=
-                    "RCL.PROGRAM.PYTHONAPI.RCLEXTRACT.CLASSES" id=
-                    "RCL.PROGRAM.PYTHONAPI.RCLEXTRACT.CLASSES"></a>Classes</h5>
+                    "RCL.PROGRAM.PYTHONAPI.RCLEXTRACT.CLASSES.EXTRACTOR"
+                    id=
                    "RCL.PROGRAM.PYTHONAPI.RCLEXTRACT.CLASSES.EXTRACTOR">
                    </a>The Extractor class</h5>
                  </div>
                </div>
              </div>
-              <div class="sect5">
-                <div class="titlepage">
-                  <div>
-                    <div>
-                      <h6 class="title"><a name=
-                      "RCL.PROGRAM.PYTHONAPI.RCLEXTRACT.CLASSES.EXTRACTOR"
-                      id=
-                      "RCL.PROGRAM.PYTHONAPI.RCLEXTRACT.CLASSES.EXTRACTOR">
-                      </a>The Extractor class</h6>
-                    </div>
-                  </div>
-                </div>
-                <div class="variablelist">
-                  <dl class="variablelist">
-                    <dt><span class=
-                    "term">Extractor(doc)</span></dt>
-                    <dd>
-                      <p>An <code class="literal">Extractor</code>
-                      object is built from a <code class=
-                      "literal">Doc</code> object, output from a
-                      query.</p>
-                    </dd>
-                    <dt><span class=
-                    "term">Extractor.textextract(ipath)</span></dt>
+              <div class="variablelist">
+                <dl class="variablelist">
+                  <dt><span class="term">Extractor(doc)</span></dt>
+                  <dd>
+                    <p>An <code class="literal">Extractor</code>
+                    object is built from a <code class=
+                    "literal">Doc</code> object, output from a
+                    query.</p>
+                  </dd>
+                  <dt><span class=
+                  "term">Extractor.textextract(ipath)</span></dt>
+                  <dd>
+                    <p>Extract document defined by <em class=
+                    "replaceable"><code>ipath</code></em> and
+                    return a <code class="literal">Doc</code>
+                    object. The <code class=
+                    "literal">doc.text</code> field has the
+                    document text converted to either text/plain or
+                    text/html according to <code class=
+                    "literal">doc.mimetype</code>. The typical use
+                    would be as follows:</p>
+                    <pre class="programlisting">
+from recoll import recoll, rclextract
+
                    <dd>
                      <p>Extract document defined by <em class=
                      "replaceable"><code>ipath</code></em> and
                      return a <code class="literal">Doc</code>
                      object. The <code class=
                      "literal">doc.text</code> field has the
                      document text converted to either text/plain
                      or text/html according to <code class=
                      "literal">doc.mimetype</code>. The typical
                      use would be as follows:</p>
                      <pre class="programlisting">
 qdoc = query.fetchone()
 extractor = recoll.Extractor(qdoc)
 doc = extractor.textextract(qdoc.ipath)
 # use doc.text, e.g. for previewing</pre>
-                      <p>Passing <code class=
-                      "literal">qdoc.ipath</code> to <code class=
-                      "literal">textextract()</code> is redundant,
-                      but reflects the fact that the <code class=
-                      "literal">Extractor</code> object actually
-                      has the capability to access the other
-                      entries in a compound document.</p>
-                    </dd>
-                    <dt><span class=
-                    "term">Extractor.idoctofile(ipath, targetmtype,
-                    outfile='')</span></dt>
-                    <dd>
-                      <p>Extracts document into an output file,
-                      which can be given explicitly or will be
-                      created as a temporary file to be deleted by
-                      the caller. Typical use:</p>
-                      <pre class="programlisting">
+                    <p>Passing <code class=
+                    "literal">qdoc.ipath</code> to <code class=
+                    "literal">textextract()</code> is redundant,
+                    but reflects the fact that the <code class=
+                    "literal">Extractor</code> object actually has
+                    the capability to access the other entries in a
+                    compound document.</p>
+                  </dd>
+                  <dt><span class=
+                  "term">Extractor.idoctofile(ipath, targetmtype,
+                  outfile='')</span></dt>
+                  <dd>
+                    <p>Extracts document into an output file, which
+                    can be given explicitly or will be created as a
+                    temporary file to be deleted by the caller.
+                    Typical use:</p>
+                    <pre class="programlisting">
 from recoll import recoll, rclextract
 qdoc = query.fetchone()
 extractor = recoll.Extractor(qdoc)
 filename = extractor.idoctofile(qdoc.ipath, qdoc.mimetype)</pre>
-                      <p>In all cases the output is a copy, even if
-                      the requested document is a regular system
-                      file, which may be wasteful in some cases. If
-                      you want to avoid this, you can test for a
-                      simple file document as follows:</p>
-                      <pre class="programlisting">
+                    <p>In all cases the output is a copy, even if
+                    the requested document is a regular system
+                    file, which may be wasteful in some cases. If
+                    you want to avoid this, you can test for a
+                    simple file document as follows:</p>
+                    <pre class="programlisting">
 not doc.ipath and (not "rclbes" in doc.keys() or doc["rclbes"] == "FS")
 </pre>
-                    </dd>
-                  </dl>
+                  </dd>
+                </dl>
                </div>
              </div>
            </div>
          </div>
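The `textextract()` entry above says that `doc.text` holds the document text converted to either text/plain or text/html according to `doc.mimetype`, so a caller that only wants a plain-text preview has to handle both cases. The following is a minimal sketch using only the Python standard library. `preview_text()` and `_TextCollector` are hypothetical helper names, and the `SimpleNamespace` objects are stand-ins for real `Doc` objects, assuming only the `text` and `mimetype` attributes described in the manual.

```python
from html.parser import HTMLParser
from types import SimpleNamespace


class _TextCollector(HTMLParser):
    """Collect character data from HTML, skipping <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Join chunks with spaces, then collapse all whitespace runs.
        return " ".join(" ".join(self._chunks).split())


def preview_text(doc):
    """Return plain text from a Doc produced by Extractor.textextract()."""
    if doc.mimetype == "text/html":
        parser = _TextCollector()
        parser.feed(doc.text)
        parser.close()
        return parser.text()
    # Already text/plain: return unchanged.
    return doc.text


if __name__ == "__main__":
    # Hypothetical stand-ins for Doc objects returned by textextract().
    html_doc = SimpleNamespace(mimetype="text/html",
                               text="<p>Hello <b>world</b></p>")
    plain_doc = SimpleNamespace(mimetype="text/plain", text="Just text")
    print(preview_text(html_doc))   # Hello world
    print(preview_text(plain_doc))  # Just text
```

Joining the text chunks with spaces keeps adjacent block elements apart, at the cost of splitting a word whose letters are wrapped in separate inline tags; for a preview this is usually acceptable.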
--- a/src/doc/user/usermanual.xml
+++ b/src/doc/user/usermanual.xml
@@ -5349,40 +5349,45 @@ recollindex -c "$confdir"
        <sect3 id="RCL.PROGRAM.PYTHONAPI.RCLEXTRACT">
          <title>The rclextract module</title>
-          <para>Index queries do not provide document content (only a
-          partial and unprecise reconstruction is performed to show the
-          snippets text). In order to access the actual document data, the
-          data extraction part of the indexing process must be performed
-          (subdocument access and format translation). This is not trivial
-          in the case of embedded documents. The
+
+          <para>Prior to &RCL; 1.25, index queries never provide document
+          content because it is not stored. More recent versions usually
+          store the document text, which can be optionally retrieved when
+          running a query (see <literal>query.execute()</literal>
+          above - the result is always plain text).</para>
          <literal>rclextract</literal> module provides a single class
          which can be used to access the data content for result
          documents.</para>
-          <sect4 id="RCL.PROGRAM.PYTHONAPI.RCLEXTRACT.CLASSES">
-            <title>Classes</title>
-
-            <sect5 id="RCL.PROGRAM.PYTHONAPI.RCLEXTRACT.CLASSES.EXTRACTOR">
-              <title>The Extractor class</title>
-              <variablelist>
+          <para>The <literal>rclextract</literal> module can give access to
+          the original document and to the document text content (if not
+          stored by the index, or to access an HTML version of the text).
+          Accessing the original document is particularly useful if it is
+          embedded (e.g. an email attachment).</para>
+          <para>You need to import the <literal>recoll</literal> module
          before the <literal>rclextract</literal> module.</para>
          <sect4 id="RCL.PROGRAM.PYTHONAPI.RCLEXTRACT.CLASSES.EXTRACTOR">
            <title>The Extractor class</title>
-                <varlistentry>
-                  <term>Extractor(doc)</term>
-                  <listitem><para>An <literal>Extractor</literal> object is
-                  built from a <literal>Doc</literal> object, output
-                  from a query.</para></listitem>
-                </varlistentry>
-                <varlistentry>
-                  <term>Extractor.textextract(ipath)</term>
-                  <listitem><para>Extract document defined by
-                  <replaceable>ipath</replaceable> and return a
-                  <literal>Doc</literal> object. The
-                  <literal>doc.text</literal> field has the document text
-                  converted to either text/plain or text/html according to
-                  <literal>doc.mimetype</literal>. The typical use would be
-                  as follows:</para>
+            <variablelist>
+
+              <varlistentry>
+                <term>Extractor(doc)</term>
+                <listitem><para>An <literal>Extractor</literal> object is
+                built from a <literal>Doc</literal> object, output
+                from a query.</para></listitem>
+              </varlistentry>
+              <varlistentry>
+                <term>Extractor.textextract(ipath)</term>
+                <listitem><para>Extract document defined by
+                <replaceable>ipath</replaceable> and return a
+                <literal>Doc</literal> object. The
+                <literal>doc.text</literal> field has the document text
+                converted to either text/plain or text/html according to
                <literal>doc.mimetype</literal>. The typical use would be
                as follows:</para>
 <programlisting>
 from recoll import recoll, rclextract
 qdoc = query.fetchone()
 extractor = recoll.Extractor(qdoc)
 doc = extractor.textextract(qdoc.ipath)
@@ -5401,6 +5406,8 @@ doc = extractor.textextract(qdoc.ipath)
                  temporary file to be deleted by the caller. Typical
                  use:</para> 
 <programlisting>
 from recoll import recoll, rclextract
 qdoc = query.fetchone()
 extractor = recoll.Extractor(qdoc)
 filename = extractor.idoctofile(qdoc.ipath, qdoc.mimetype)</programlisting>
@@ -5417,8 +5424,7 @@ not doc.ipath and (not "rclbes" in doc.keys() or doc["rclbes"] == "FS")
              </variablelist>
-            </sect5> <!-- Extractor class -->
+          </sect4>
          </sect4> <!-- rclextract classes -->
        </sect3> <!-- rclextract module -->
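The `idoctofile()` entry states that the output is always a copy and that, when no output file is given, the temporary file has to be deleted by the caller. A context manager is one way to make that cleanup automatic, and the manual's simple-file test can be kept next to it for callers who would rather avoid the copy. This is a sketch under stated assumptions: `extracted_copy()`, `is_simple_file_doc()`, `FakeDoc` and `FakeExtractor` are hypothetical names, and the two fakes only mimic the behaviour described in the manual (`idoctofile()` returning a file name, a `Doc` supporting `ipath`, `keys()` and item access). A real extractor is built from a query result as shown in the examples above.

```python
import contextlib
import os
import tempfile


def is_simple_file_doc(doc):
    """Return True for a top-level document stored as a plain file.

    Mirrors the manual's test: no ipath (not embedded in a container),
    and either no "rclbes" field or the filesystem backend "FS".
    """
    return not doc.ipath and ("rclbes" not in doc.keys() or doc["rclbes"] == "FS")


@contextlib.contextmanager
def extracted_copy(extractor, qdoc):
    """Yield the name of a temporary copy of qdoc and delete it afterwards."""
    # No outfile argument: idoctofile() creates a temporary file,
    # which the caller is responsible for deleting.
    filename = extractor.idoctofile(qdoc.ipath, qdoc.mimetype)
    try:
        yield filename
    finally:
        # Runs even if the with-block raised an exception.
        if os.path.exists(filename):
            os.remove(filename)


class FakeDoc(dict):
    """Hypothetical stand-in for a recoll Doc: attributes plus item access."""

    def __init__(self, ipath="", mimetype="text/plain", **fields):
        super().__init__(fields)
        self.ipath = ipath
        self.mimetype = mimetype


class FakeExtractor:
    """Hypothetical stand-in whose idoctofile() writes a temporary file."""

    def idoctofile(self, ipath, targetmtype, outfile=""):
        fd, path = tempfile.mkstemp()
        with os.fdopen(fd, "wb") as f:
            f.write(b"document data")
        return path


if __name__ == "__main__":
    qdoc = FakeDoc(ipath="1")            # an embedded document
    print(is_simple_file_doc(qdoc))      # False
    with extracted_copy(FakeExtractor(), qdoc) as path:
        print(os.path.exists(path))      # True while the copy is in use
    print(os.path.exists(path))          # False after cleanup
```

With a real query result the pattern would be `with extracted_copy(extractor, qdoc) as path:`, checking `is_simple_file_doc(qdoc)` first when making a copy of a regular file is not wanted.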
--- a/src/doc/user/webhelp/Makefile
+++ b/src/doc/user/webhelp/Makefile
@@ -1,6 +1,6 @@
 # Configuration
 # The name of the source DocBook xml file
-INPUT_XML = ../usermanual.xml ../recoll.conf.xml
+INPUT_XML = ../usermanual.xml 
 # The makefile assumes that you have a 
 # directory named images that contains