doc

2019-04-12 12:01:12 +02:00 · 2019-04-12 12:01:12 +02:00 · 3ebf1a7db2
commit 3ebf1a7db2
parent ad89225b24
3 changed files with 819 additions and 863 deletions
--- a/src/doc/user/Makefile
+++ b/src/doc/user/Makefile
@@ -17,8 +17,9 @@ XSLDIR="/usr/share/xml/docbook/stylesheet/docbook-xsl/"

 # Options common to the single-file and chunked versions
 commonoptions=--stringparam section.autolabel 1 \
-  --stringparam section.autolabel.max.depth 3 \
+  --stringparam section.autolabel.max.depth 2 \
  --stringparam section.label.includes.component.label 1 \
+  --stringparam toc.max.depth 3 \
  --stringparam autotoc.label.in.hyperlink 0 \
  --stringparam abstract.notitle.enabled 1 \
  --stringparam html.stylesheet docbook-xsl.css \
--- a/src/doc/user/usermanual.html
+++ b/src/doc/user/usermanual.html
--- a/src/doc/user/usermanual.xml
+++ b/src/doc/user/usermanual.xml
@@ -4966,13 +4966,14 @@ recollindex -c "$confdir"
      <sect2 id="RCL.PROGRAM.PYTHONAPI.INTRO">
        <title>Introduction</title>

-        <para>&RCL; versions after 1.11 define a Python programming
-        interface, both for searching and creating/updating an
-        index.</para>
+        <para>The &RCL; Python programming interface can be used both for
+        searching and for creating/updating an index. Bindings exist for
+        Python 2 and Python 3.</para>

-        <para>The search interface is used in the &RCL; Ubuntu Unity Lens
-        and the &RCL; Web UI. It can run queries on any &RCL;
-        configuration.</para>
+        <para>The search interface is used in a number of active projects:
+        the &RCL; <application>Gnome Shell Search Provider</application>,
+        the &RCL; Web UI, and the upmpdcli UPnP Media Server, in addition
+        to many small scripts.</para>

        <para>The index update section of the API may be used to create and
        update &RCL; indexes on specific configurations (separate from the
@@ -4998,6 +4999,19 @@ recollindex -c "$confdir"
        paragraph at the end of this section will explain a few differences
        and ways to write code compatible with both versions.</para>

+        <para>The <literal>recoll</literal> package now contains two
+        modules:</para>
+        <itemizedlist>
+          <listitem><para>The <literal>recoll</literal> module contains
+          functions and classes used to query (or update) the
+          index.</para></listitem>
+
+          <listitem><para>The <literal>rclextract</literal> module contains
+          functions and classes used at query time to access document
+          data.</para>
+          </listitem>
+        </itemizedlist>
+
        <para>There is a good chance that your system repository has
        packages for the Recoll Python API, sometimes in a package separate
        from the main one (maybe named something like python-recoll).  Else
@@ -5022,13 +5036,17 @@ recollindex -c "$confdir"
        nres = query.execute("some query")
        results = query.fetchmany(20)
        for doc in results:
-        print(doc.url, doc.title)
+            print("%s %s" % (doc.url, doc.title))
        ]]></programlisting>

-        <para>You can also take a look at the source for the <ulink
-        url="https://github.com/koniu/recoll-webui">Recoll
-        WebUI</ulink>, or the <ulink url="https://opensourceprojects.eu/p/upmpdcli/code/ci/c8c8e75bd181ad9db2df14da05934e53ca867a06/tree/src/mediaserver/cdplugins/uprcl/uprclfolders.py">upmpdcli local media server</ulink>, which are both
-        based on the Python API.</para>
+        <para>You can also take a look at the source for the
+        <ulink  url="https://opensourceprojects.eu/p/recollwebui/code/ci/78ddb20787b2a894b5e4661a8d5502c4511cf71e/tree/">Recoll
+        WebUI</ulink>, the
+        <ulink url="https://opensourceprojects.eu/p/upmpdcli/code/ci/c8c8e75bd181ad9db2df14da05934e53ca867a06/tree/src/mediaserver/cdplugins/uprcl/uprclfolders.py">upmpdcli 
+        local media server</ulink>, or the
+        <ulink
+            url="https://opensourceprojects.eu/p/recollgssp/code/ci/3f120108e099f9d687306c0be61593994326d52d/tree/gssp-recoll.py">Gnome
+        Shell Search Provider</ulink>.</para>
        
      </sect2>
      
@@ -5104,10 +5122,14 @@ recollindex -c "$confdir"

          <varlistentry> 
            <term>Stored and indexed fields</term> 
-            <listitem><para>The <filename>fields</filename> file inside
-            the &RCL; configuration defines which document fields are
-            either "indexed" (searchable), "stored" (retrievable with
-            search results), or both.</para>
+            <listitem><para>The <link
+            linkend="RCL.INSTALL.CONFIG.FIELDS"><filename>fields</filename>
+            file</link> inside the &RCL; configuration defines which
+            document fields are either <literal>indexed</literal>
+            (searchable), <literal>stored</literal> (retrievable with
+            search results), or both. Apart from a few standard/internal
+            fields, only the <literal>stored</literal> fields are
+            retrievable through the Python search interface.</para>
            </listitem>
          </varlistentry>
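
The effect of the `stored` setting on Python code can be illustrated with a small helper. This is only a sketch: `pick_fields` and its `default` parameter are hypothetical names, and the only assumption about the result object is that it exposes fields as attributes, as the `Doc` class described in this manual does.

```python
def pick_fields(doc, names, default=""):
    """Return {name: value} for the requested field names.

    `doc` is any object exposing fields as attributes, as a Recoll Doc does.
    Fields that are missing or empty (for example because they are not
    `stored` in the index) are reported as `default`.
    """
    picked = {}
    for name in names:
        # getattr() with a fallback avoids failing on unknown field names
        value = getattr(doc, name, None)
        picked[name] = value if value else default
    return picked
```

Called as `pick_fields(doc, ["url", "title", "author"])` on a search result, this would show which of the requested fields actually came back with a value.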

@@ -5118,381 +5140,347 @@ recollindex -c "$confdir"
      <sect2 id="RCL.PROGRAM.PYTHONAPI.SEARCH">
        <title>Python search interface</title>

-        <sect3 id="RCL.PROGRAM.PYTHONAPI.PACKAGE">
-          <title>Recoll package</title>
-          
-          <para>The <literal>recoll</literal> package contains two
-          modules:
-          <itemizedlist>
-            <listitem><para>The <literal>recoll</literal> module contains
-            functions and classes used to query (or update) the
-            index. This section will only describe the query part, see
-            further for the update part.</para></listitem> 
-            <listitem><para>The <literal>rclextract</literal> module contains
-            functions and classes used to access document
-            data.</para></listitem> 
-          </itemizedlist>
-          </para>            
-        </sect3>
-
        <sect3 id="RCL.PROGRAM.PYTHONAPI.RECOLL">
          <title>The recoll module</title>

-          <sect4 id="RCL.PROGRAM.PYTHONAPI.RECOLL.FUNCTIONS">
-            <title>Functions</title>
+        <simplesect id="RCL.PROGRAM.PYTHONAPI.RECOLL.CONNECT">
+          <title>connect(confdir=None, extra_dbs=None, writable = False)</title>

-            <variablelist>
-              <varlistentry>
-                <term>connect(confdir=None, extra_dbs=None,
-                writable = False)</term>
-                <listitem>
-                  <para>The <literal>connect()</literal> function connects to
-                  one or several &RCL; index(es) and returns
-                  a <literal>Db</literal> object.</para>
-                  <itemizedlist>
-                    <listitem><para><literal>confdir</literal> may specify
-                    a configuration directory. The usual defaults
-                    apply.</para></listitem> 
-                    <listitem><para><literal>extra_dbs</literal> is a list of
-                    additional indexes (Xapian directories).</para></listitem>
-                    <listitem><para><literal>writable</literal> decides if
-                    we can index new data through this
-                    connection.</para></listitem>
-                  </itemizedlist> 
-                  <para>This call initializes the recoll module, and it should
-                  always be performed before any other call or object
-                  creation.</para> 
-                </listitem>
-              </varlistentry>
-            </variablelist>
-          </sect4>
+          <para>The <literal>connect()</literal> function connects to
+          one or several &RCL; index(es) and returns
+          a <literal>Db</literal> object.</para>
+          <para>This call initializes the recoll module, and it should
+          always be performed before any other call or object
+          creation.</para> 
+          <itemizedlist>
+            <listitem><para><literal>confdir</literal> may specify
+            a configuration directory. The usual defaults
+            apply.</para></listitem> 
+            <listitem><para><literal>extra_dbs</literal> is a list of
+            additional indexes (Xapian directories).</para></listitem>
+            <listitem><para><literal>writable</literal> decides if
+            we can index new data through this
+            connection.</para></listitem>
+          </itemizedlist> 
+        </simplesect>

+        <simplesect id="RCL.PROGRAM.PYTHONAPI.RECOLL.DB">
+          <title>The Db class</title>

-          <sect4 id="RCL.PROGRAM.PYTHONAPI.RECOLL.CLASSES">
-            <title>Classes</title>
+          <para>A Db object is created by a <literal>connect()</literal>
+          call and holds a  connection to a Recoll index.</para>
+          <variablelist>
+            <varlistentry>
+              <term>Db.close()</term>
+              <listitem><para>Closes the connection. You can't do anything
+              with the <literal>Db</literal> object after
+              this.</para></listitem>
+            </varlistentry>
+            <varlistentry>
+              <term>Db.query(), Db.cursor()</term> <listitem><para>These
+              aliases return a blank <literal>Query</literal> object
+              for this index.</para></listitem>
+            </varlistentry>
+
+            <varlistentry>
+              <term>Db.setAbstractParams(maxchars,
+              contextwords)</term> <listitem><para>Set the parameters used
+              to build snippets (sets of keywords in context text
+              fragments). <literal>maxchars</literal> defines the
+              maximum total size of the abstract. 
+              <literal>contextwords</literal> defines how many
+              terms are shown around the keyword.</para></listitem>
+            </varlistentry>
+
+            <varlistentry>
+              <term>Db.termMatch(match_type, expr, field='',
+              maxlen=-1, casesens=False, diacsens=False, lang='english')
+              </term> 
+              <listitem><para>Expand an expression against the
+              index term list. Performs the basic function from the
+              GUI term explorer tool. <literal>match_type</literal>
+              can be either
+              of <literal>wildcard</literal>, <literal>regexp</literal>
+              or <literal>stem</literal>. Returns a list of terms
+              expanded from the input expression.
+              </para></listitem>
+            </varlistentry>
+
+          </variablelist>
+
+        </simplesect>
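
As an illustration of the `Db` methods listed above, the following sketch expands an expression against the term list and makes sure the connection is closed afterwards. `expand_terms` is a hypothetical helper; the `connect` argument is assumed to behave like `recoll.connect()` and is passed in so that the sketch does not depend on the package being installed.

```python
from contextlib import closing


def expand_terms(connect, expr, match_type="wildcard", confdir=None):
    """Open an index, expand `expr` against its term list, and close it."""
    # `connect` is expected to behave like recoll.connect(); closing()
    # calls Db.close() on exit, even if termMatch() raises.
    with closing(connect(confdir=confdir)) as db:
        return list(db.termMatch(match_type, expr))
```

With the real module this could be called as `expand_terms(recoll.connect, "recol*")` after `from recoll import recoll`.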
+        <simplesect id="RCL.PROGRAM.PYTHONAPI.RECOLL.CLASSES.QUERY">
+          <title>The Query class</title>
+
+          <para>A <literal>Query</literal> object (equivalent to a
+          cursor in the Python DB API) is created by
+          a <literal>Db.query()</literal> call. It is used to
+          execute index searches.</para>
+
+          <variablelist>
+
+            <varlistentry>
+              <term>Query.sortby(fieldname, ascending=True)</term>
+              <listitem><para>Sort results
+              by <replaceable>fieldname</replaceable>, in ascending
+              or descending order. Must be called before executing
+              the search.</para></listitem>
+            </varlistentry>
            
-            <sect5 id="RCL.PROGRAM.PYTHONAPI.RECOLL.CLASSES.DB">
-              <title>The Db class</title>
+            <varlistentry>
+              <term>Query.execute(query_string, stemming=1, 
+              stemlang="english", fetchtext=False)</term>
+              <listitem><para>Starts a search
+              for <replaceable>query_string</replaceable>, a &RCL;
+              search language string. If the index stores the document
+              texts and <literal>fetchtext</literal> is True, store the
+              document extracted text in
+              <literal>doc.text</literal>.</para></listitem> 
+            </varlistentry>

-              <para>A Db object is created by
-              a <literal>connect()</literal> call and holds a 
-              connection to a Recoll index.</para>
-              <variablelist>
-                <varlistentry>
-                  <term>Db.close()</term>
-                  <listitem><para>Closes the connection. You can't do anything
-                  with the <literal>Db</literal> object after
-                  this.</para></listitem>
-                </varlistentry>
-                <varlistentry>
-                  <term>Db.query(), Db.cursor()</term> <listitem><para>These
-                  aliases return a blank <literal>Query</literal> object
-                  for this index.</para></listitem>
-                </varlistentry>
+            <varlistentry>
+              <term>Query.executesd(SearchData, fetchtext=False)</term>
+              <listitem><para>Starts a search for the query defined by
+              the SearchData object. If the index stores the document
+              texts and <literal>fetchtext</literal> is True, store the
+              document extracted text in
+              <literal>doc.text</literal>.</para></listitem>
+            </varlistentry>

-                <varlistentry>
-                  <term>Db.setAbstractParams(maxchars,
-                  contextwords)</term> <listitem><para>Set the parameters used
-                  to build snippets (sets of keywords in context text
-                  fragments). <literal>maxchars</literal> defines the
-                  maximum total size of the abstract. 
-                  <literal>contextwords</literal> defines how many
-                  terms are shown around the keyword.</para></listitem>
-                </varlistentry>
-
-                <varlistentry>
-                  <term>Db.termMatch(match_type, expr, field='',
-                  maxlen=-1, casesens=False, diacsens=False, lang='english')
-                  </term> 
-                  <listitem><para>Expand an expression against the
-                  index term list. Performs the basic function from the
-                  GUI term explorer tool. <literal>match_type</literal>
-                  can be either
-                  of <literal>wildcard</literal>, <literal>regexp</literal>
-                  or <literal>stem</literal>. Returns a list of terms
-                  expanded from the input expression.
-                  </para></listitem>
-                </varlistentry>
-
-              </variablelist>
-
-            </sect5>
-
-
-            <sect5 id="RCL.PROGRAM.PYTHONAPI.RECOLL.CLASSES.QUERY">
-              <title>The Query class</title>
-
-              <para>A <literal>Query</literal> object (equivalent to a
-              cursor in the Python DB API) is created by
-              a <literal>Db.query()</literal> call. It is used to
-              execute index searches.</para>
-
-              <variablelist>
-
-                <varlistentry>
-                  <term>Query.sortby(fieldname, ascending=True)</term>
-                  <listitem><para>Sort results
-                  by <replaceable>fieldname</replaceable>, in ascending
-                  or descending order. Must be called before executing
-                  the search.</para></listitem>
-                </varlistentry>
-                
-                <varlistentry>
-                  <term>Query.execute(query_string, stemming=1, 
-                  stemlang="english", fetchtext=False)</term>
-                  <listitem><para>Starts a search
-                  for <replaceable>query_string</replaceable>, a &RCL;
-                  search language string. If the index stores the document
-                  texts and <literal>fetchtext</literal> is True, store the
-                  document extracted text in
-                  <literal>doc.text</literal>.</para></listitem> 
-                </varlistentry>
-
-                <varlistentry>
-                  <term>Query.executesd(SearchData, fetchtext=False)</term>
-                  <listitem><para>Starts a search for the query defined by
-                  the SearchData object. If the index stores the document
-                  texts and <literal>fetchtext</literal> is True, store the
-                  document extracted text in
-                  <literal>doc.text</literal>.</para></listitem>
-                </varlistentry>
-
-                <varlistentry>
-                  <term>Query.fetchmany(size=query.arraysize)</term> 
-                  
-                  <listitem><para>Fetches
-                  the next <literal>Doc</literal> objects in the current
-                  search results, and returns them as an array of the
-                  required size, which is by default the value of
-                  the <literal>arraysize</literal> data member.</para></listitem>
-                </varlistentry>
-
-                <varlistentry>
-                  <term>Query.fetchone()</term> <listitem><para>Fetches the
-                  next <literal>Doc</literal> object from the current
-                  search results. Generates a StopIteration exception if
-                  there are no results left.</para></listitem>
-                </varlistentry>
-
-                <varlistentry>
-                  <term>Query.close()</term>
-                  <listitem><para>Closes the query. The object is unusable
-                  after the call.</para></listitem>
-                </varlistentry>
-
-                <varlistentry>
-                  <term>Query.scroll(value, mode='relative')</term>
-                  <listitem><para>Adjusts the position in the current result
-                  set. <literal>mode</literal> can
-                  be <literal>relative</literal>
-                  or <literal>absolute</literal>. </para></listitem>
-                </varlistentry>
-
-                <varlistentry>
-                  <term>Query.getgroups()</term>
-                  <listitem><para>Retrieves the expanded query terms as a list
-                  of pairs. Meaningful only after executexx In each
-                  pair, the first entry is a list of user terms (of size
-                  one for simple terms, or more for group and phrase
-                  clauses), the second a list of query terms as derived
-                  from the user terms and used in the Xapian
-                  Query.</para></listitem>
-                </varlistentry>
-                
-                <varlistentry>
-                  <term>Query.getxquery()</term>
-                  <listitem><para>Return the Xapian query description as a
-                  Unicode string. 
-                  Meaningful only after executexx.</para></listitem>
-                </varlistentry>
-
-                <varlistentry>
-                  <term>Query.highlight(text, ishtml = 0, methods = object)</term>
-                  <listitem><para>Will insert &lt;span "class=rclmatch">,
-                  &lt;/span> tags around the match areas in the input text
-                  and return the modified text.  <literal>ishtml</literal>
-                  can be set to indicate that the input text is HTML and
-                  that HTML special characters should not be escaped.
-                  <literal>methods</literal> if set should be an object
-                  with methods startMatch(i) and endMatch() which will be
-                  called for each match and should return a begin and end
-                  tag</para></listitem>
-                </varlistentry>
-
-                <varlistentry>
-                  <term>Query.makedocabstract(doc, methods = object))</term>
-                  <listitem><para>Create a snippets abstract
-                  for <literal>doc</literal> (a <literal>Doc</literal>
-                  object) by selecting text around the match terms.
-                  If methods is set, will also perform highlighting. See
-                  the highlight method.
-                  </para></listitem>
-                </varlistentry>
-                
-                <varlistentry>
-                  <term>Query.__iter__() and Query.next()</term>
-                  <listitem><para>So that things like <literal>for doc in
-                  query:</literal> will work.</para></listitem>
-                </varlistentry>
-              </variablelist>
-
-              <variablelist>
-
-                <varlistentry><term>Query.arraysize</term>
-                <listitem><para>Default number of records processed by fetchmany
-                (r/w).</para></listitem>  
-                </varlistentry>
-                <varlistentry><term>Query.rowcount</term><listitem><para>Number
-                of records returned by the last
-                execute.</para></listitem></varlistentry>
-                <varlistentry><term>Query.rownumber</term><listitem><para>Next index
-                to be fetched from results. Normally increments after
-                each fetchone() call, but can be set/reset before the
-                call to effect seeking (equivalent to
-                using <literal>scroll()</literal>). Starts at
-                0.</para></listitem> 
-                </varlistentry>
-
-              </variablelist>
-
-            </sect5>
-
-
-            <sect5 id="RCL.PROGRAM.PYTHONAPI.RECOLL.CLASSES.DOC">
-              <title>The Doc class</title>
-
-              <para>A <literal>Doc</literal> object contains index data
-              for a given document. The data is extracted from the
-              index when searching, or set by the indexer program when
-              updating. The Doc object has many attributes to be read or
-              set by its user. It matches exactly the Rcl::Doc C++
-              object. Some of the attributes are predefined, but,
-              especially when indexing, others can be set, the name of
-              which will be processed as field names by the indexing
-              configuration.  Inputs can be specified as Unicode or
-              strings. Outputs are Unicode objects. All dates are
-              specified as Unix timestamps, printed as strings. Please
-              refer to the <filename>rcldb/rcldoc.cpp</filename> C++ file
-              for a full description of the predefined attributes. Here
-              follows a short list.</para>
-
-              <para><itemizedlist>
-                <listitem><para><literal>url</literal> the document URL but
-                see also <literal>getbinurl()</literal></para></listitem>
+            <varlistentry>
+              <term>Query.fetchmany(size=query.arraysize)</term> 
              
-                <listitem><para><literal>ipath</literal> the document
-                <literal>ipath</literal> for embedded
-                documents.</para></listitem> 
+              <listitem><para>Fetches
+              the next <literal>Doc</literal> objects in the current
+              search results, and returns them as an array of the
+              required size, which is by default the value of
+              the <literal>arraysize</literal> data member.</para></listitem>
+            </varlistentry>

-                <listitem><para><literal>fbytes, dbytes</literal> the document
-                file and text sizes.</para></listitem> 
-                <listitem><para><literal>fmtime, dmtime</literal> the document
-                file and document times.</para></listitem> 
-              
-                <listitem><para><literal>xdocid</literal> the document
-                Xapian document ID. This is useful if you want to access
-                the document through a direct Xapian
-                operation.</para></listitem>
+            <varlistentry>
+              <term>Query.fetchone()</term> <listitem><para>Fetches the
+              next <literal>Doc</literal> object from the current
+              search results. Generates a StopIteration exception if
+              there are no results left.</para></listitem>
+            </varlistentry>

-                <listitem><para><literal>mtype</literal> the document
-                MIME type.</para></listitem>
+            <varlistentry>
+              <term>Query.close()</term>
+              <listitem><para>Closes the query. The object is unusable
+              after the call.</para></listitem>
+            </varlistentry>

-                <listitem><para>Fields stored by default:
-                <literal>author</literal>, <literal>filename</literal>,
-                <literal>keywords</literal>,
-                <literal>recipient</literal></para></listitem>
+            <varlistentry>
+              <term>Query.scroll(value, mode='relative')</term>
+              <listitem><para>Adjusts the position in the current result
+              set. <literal>mode</literal> can
+              be <literal>relative</literal>
+              or <literal>absolute</literal>. </para></listitem>
+            </varlistentry>

-              </itemizedlist>                
-              </para>
-              
-              <para>At query time, only the fields that are defined
-              as <literal>stored</literal> either by default or in
-              the <filename>fields</filename> configuration file will be
-              meaningful in the <literal>Doc</literal>
-              object. Especially this will not be the case for the
-              document text. See the <literal>rclextract</literal>
-              module for accessing document contents.</para> 
+            <varlistentry>
+              <term>Query.getgroups()</term>
+              <listitem><para>Retrieves the expanded query terms as a list
+              of pairs. Meaningful only after <literal>execute()</literal>
+              or <literal>executesd()</literal>. In each pair, the first
+              entry is a list of user terms (of size one for simple terms,
+              or more for group and phrase clauses), the second a list of
+              query terms derived from the user terms and used in the
+              Xapian Query.</para></listitem>
+            </varlistentry>
+            
+            <varlistentry>
+              <term>Query.getxquery()</term>
+              <listitem><para>Return the Xapian query description as a
+              Unicode string. Meaningful only after
+              <literal>execute()</literal> or <literal>executesd()</literal>.</para></listitem>
+            </varlistentry>

-              <variablelist>
+            <varlistentry>
+              <term>Query.highlight(text, ishtml = 0, methods = object)</term>
+              <listitem><para>Inserts &lt;span class="rclmatch"&gt; and
+              &lt;/span&gt; tags around the match areas in the input text
+              and returns the modified text.  <literal>ishtml</literal>
+              can be set to indicate that the input text is HTML and
+              that HTML special characters should not be escaped.
+              <literal>methods</literal>, if set, should be an object
+              with methods startMatch(i) and endMatch(), which will be
+              called for each match and should return the begin and end
+              tag, respectively.</para></listitem>
+            </varlistentry>

-                <varlistentry>
-                  <term>get(key), [] operator</term>
+            <varlistentry>
+              <term>Query.makedocabstract(doc, methods = object)</term>
+              <listitem><para>Create a snippets abstract
+              for <literal>doc</literal> (a <literal>Doc</literal>
+              object) by selecting text around the match terms.
+              If methods is set, will also perform highlighting. See
+              the highlight method.
+              </para></listitem>
+            </varlistentry>
+            
+            <varlistentry>
+              <term>Query.__iter__() and Query.next()</term>
+              <listitem><para>So that things like <literal>for doc in
+              query:</literal> will work.</para></listitem>
+            </varlistentry>
+          </variablelist>
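
The `methods` argument of `Query.highlight()` can be sketched as follows. `BoldMarkers` and `highlight_text` are hypothetical names, and passing `ishtml` and `methods` as keyword arguments is an assumption based on the signature shown above.

```python
class BoldMarkers(object):
    """Hypothetical tag provider for Query.highlight()."""

    def startMatch(self, idx):
        # idx is the match index passed in by highlight(); unused here
        return "<b>"

    def endMatch(self):
        return "</b>"


def highlight_text(query, text):
    """Return plain `text` with match areas wrapped in <b>...</b>."""
    return query.highlight(text, ishtml=0, methods=BoldMarkers())
```

The same object could presumably be passed to `Query.makedocabstract()` to get highlighted snippets.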

-                  <listitem><para>Retrieve the named document
-                  attribute. You can also use <literal>getattr(doc,
-                  key)</literal> or
-                  <literal>doc.key</literal>.</para></listitem>
-                </varlistentry>
+          <variablelist>

-                <varlistentry>
-                  <term>doc.key = value</term>
+            <varlistentry><term>Query.arraysize</term>
+            <listitem><para>Default number of records processed by fetchmany
+            (r/w).</para></listitem>  
+            </varlistentry>
+            <varlistentry><term>Query.rowcount</term><listitem><para>Number
+            of records returned by the last
+            execute.</para></listitem></varlistentry>
+            <varlistentry><term>Query.rownumber</term><listitem><para>Next index
+            to be fetched from results. Normally increments after
+            each fetchone() call, but can be set/reset before the
+            call to effect seeking (equivalent to
+            using <literal>scroll()</literal>). Starts at
+            0.</para></listitem> 
+            </varlistentry>

-                  <listitem><para>Set the the named document attribute. You
-                  can also use <literal>setattr(doc, key,
-                  value)</literal>.</para></listitem>
-                </varlistentry>
+          </variablelist>

-                <varlistentry>
-                  <term>getbinurl()</term>
+        </simplesect>
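
Putting the `Query` methods together, a search that sorts and then walks the results in batches might look like the sketch below. `search` is a hypothetical helper, `db` is assumed to be the object returned by `connect()`, and the loop assumes that `fetchmany()` returns an empty sequence once the results are exhausted.

```python
def search(db, query_string, sort_field=None, batch_size=20):
    """Yield Doc objects for `query_string`, fetching them in batches."""
    query = db.query()
    if sort_field is not None:
        # sortby() must be called before the search is executed
        query.sortby(sort_field)
    query.execute(query_string)
    while True:
        batch = query.fetchmany(batch_size)
        if not batch:  # assumed: an empty batch means no results are left
            break
        for doc in batch:
            yield doc
```

A caller would then write something like `for doc in search(db, "some query"): print(doc.url)`.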
+        <simplesect id="RCL.PROGRAM.PYTHONAPI.RECOLL.CLASSES.DOC">
+          <title>The Doc class</title>

-                  <listitem><para>Retrieve the URL in byte array format (no
-                  transcoding), for use as parameter to a system
-                  call.</para></listitem>
-                </varlistentry>
+          <para>A <literal>Doc</literal> object contains index data
+          for a given document. The data is extracted from the
+          index when searching, or set by the indexer program when
+          updating. The Doc object has many attributes to be read or
+          set by its user. It mostly matches the Rcl::Doc C++
+          object. Some of the attributes are predefined, but,
+          especially when indexing, others can be set, and their names
+          will be processed as field names by the indexing
+          configuration.  Inputs can be specified as Unicode or
+          strings. Outputs are Unicode objects. All dates are
+          specified as Unix timestamps, printed as strings. Please
+          refer to the <filename>rcldb/rcldoc.cpp</filename> C++ file
+          for a full description of the predefined attributes. Here
+          follows a short list.</para>

-                <varlistentry>
-                  <term>setbinurl(url)</term>
+          <para><itemizedlist>
+            <listitem><para><literal>url</literal> the document URL but
+            see also <literal>getbinurl()</literal></para></listitem>
+            
+            <listitem><para><literal>ipath</literal> the document
+            <literal>ipath</literal> for embedded
+            documents.</para></listitem> 

-                  <listitem><para>Set the URL in byte array format (no
-                  transcoding).</para></listitem>
-                </varlistentry>
+            <listitem><para><literal>fbytes, dbytes</literal> the document
+            file and text sizes.</para></listitem> 
+            <listitem><para><literal>fmtime, dmtime</literal> the document
+            file and document times.</para></listitem> 
+            
+            <listitem><para><literal>xdocid</literal> the document
+            Xapian document ID. This is useful if you want to access
+            the document through a direct Xapian
+            operation.</para></listitem>

-                <varlistentry>
-                  <term>items()</term>
-                  <listitem><para>Return a dictionary of doc object
-                  keys/values</para></listitem> 
-                </varlistentry>
+            <listitem><para><literal>mtype</literal> the document
+            MIME type.</para></listitem>

-                <varlistentry>
-                  <term>keys()</term>
-                  <listitem><para>list of doc object keys (attribute
-                  names).</para></listitem>
-                </varlistentry>
-              </variablelist>
+            <listitem><para>Fields stored by default:
+            <literal>author</literal>, <literal>filename</literal>,
+            <literal>keywords</literal>,
+            <literal>recipient</literal></para></listitem>

-            </sect5> <!-- Doc -->
+          </itemizedlist>                
+          </para>
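Since dates are described above as Unix timestamps printed as strings, a caller will usually want to convert them before display. The following is a minimal sketch using only the Python standard library. The helper name parse_recoll_time and the sample value are hypothetical and are not part of the recoll module.

```python
from datetime import datetime, timezone


def parse_recoll_time(value):
    """Convert a Unix timestamp given as a string to an aware UTC datetime.

    Hypothetical helper: it only assumes that the attribute value is a
    string of decimal digits, as described for fmtime and dmtime.
    """
    return datetime.fromtimestamp(int(value), tz=timezone.utc)


# Sample value standing in for something like doc.fmtime (assumption).
sample_fmtime = "1555027200"
print(parse_recoll_time(sample_fmtime).isoformat())
```

Using an explicit UTC time zone keeps the result independent of the local settings of the machine running the script.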
+          
+          <para>At query time, only the fields that are defined as
+          <literal>stored</literal> either by default or in the
+          <filename>fields</filename> configuration file will be meaningful
+          in the <literal>Doc</literal> object. The processed document text
+          may or may not be present, depending on whether the index stores
+          the text at all and, if it does, on the <literal>fetchtext</literal>
+          query execute option. See also the <literal>rclextract</literal> module
+          for accessing document contents.</para>

-            <sect5 id="RCL.PROGRAM.PYTHONAPI.RECOLL.CLASSES.SEARCHDATA">
-              <title>The SearchData class</title>
+          <variablelist>

-              <para>A <literal>SearchData</literal> object allows building
-              a query by combining clauses, for execution
-              by <literal>Query.executesd()</literal>. It can be used
-              in replacement of the query language approach. The
-              interface is going to change a little, so no detailed doc
-              for now...</para>
+            <varlistentry>
+              <term>get(key), [] operator</term>

-              <variablelist>
+              <listitem><para>Retrieve the named document
+              attribute. You can also use <literal>getattr(doc,
+              key)</literal> or
+              <literal>doc.key</literal>.</para></listitem>
+            </varlistentry>

-                <varlistentry>
-                  <term>addclause(type='and'|'or'|'excl'|'phrase'|'near'|'sub',
-                  qstring=string, slack=0, field='', stemming=1,
-                  subSearch=SearchData)</term>
-                  <listitem><para></para></listitem>
-                </varlistentry>
-              </variablelist>
+            <varlistentry>
+              <term>doc.key = value</term>

-            </sect5> <!-- SearchData -->
+              <listitem><para>Set the named document attribute. You
+              can also use <literal>setattr(doc, key,
+              value)</literal>.</para></listitem>
+            </varlistentry>

-          </sect4> <!-- recoll.classes -->
-        </sect3> <!-- Recoll module -->
+            <varlistentry>
+              <term>getbinurl()</term>
+
+              <listitem><para>Retrieve the URL in byte array format (no
+              transcoding), for use as parameter to a system
+              call.</para></listitem>
+            </varlistentry>
+
+            <varlistentry>
+              <term>setbinurl(url)</term>
+
+              <listitem><para>Set the URL in byte array format (no
+              transcoding).</para></listitem>
+            </varlistentry>
+
+            <varlistentry>
+              <term>items()</term>
+              <listitem><para>Return a dictionary of doc object
+              keys/values.</para></listitem>
+            </varlistentry>
+
+            <varlistentry>
+              <term>keys()</term>
+              <listitem><para>Return the list of doc object keys
+              (attribute names).</para></listitem>
+            </varlistentry>
+          </variablelist>
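The access styles listed above (get(), the [] operator, plain attribute access, getattr()/setattr(), keys() and items()) are meant to be interchangeable. The sketch below illustrates that equivalence with a small stand-in class, so that it can run without an index. StandInDoc, its constructor and the sample field values are hypothetical. This is not the recoll implementation, and the behaviour of the real Doc for missing keys is not shown here.

```python
class StandInDoc:
    """Hypothetical stand-in that mimics the documented Doc access styles."""

    def __init__(self, **fields):
        # Bypass our own __setattr__ so that the storage dict is a real
        # instance attribute rather than a document field.
        object.__setattr__(self, "_fields", dict(fields))

    def get(self, key):
        # Named lookup, same result as doc[key] for an existing key.
        return self._fields[key]

    def __getitem__(self, key):
        return self._fields[key]

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails.
        if key == "_fields":
            raise AttributeError(key)
        try:
            return self._fields[key]
        except KeyError:
            raise AttributeError(key)

    def __setattr__(self, key, value):
        # doc.key = value and setattr(doc, key, value) both land here.
        self._fields[key] = value

    def keys(self):
        return list(self._fields)

    def items(self):
        return dict(self._fields)


doc = StandInDoc(url="file:///tmp/example.txt", mtype="text/plain")
doc.author = "someone"
setattr(doc, "filename", "example.txt")

# All four read styles return the same value.
print(doc.get("url"), doc["url"], doc.url, getattr(doc, "url"))
print(sorted(doc.keys()))
```

With the real module, the same read patterns would apply to the Doc objects returned by a query, subject to the stored-fields restriction described earlier.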
+
+        </simplesect> <!-- Doc -->
+
+        <simplesect id="RCL.PROGRAM.PYTHONAPI.RECOLL.CLASSES.SEARCHDATA">
+          <title>The SearchData class</title>
+
+          <para>A <literal>SearchData</literal> object allows building
+          a query by combining clauses, for execution
+          by <literal>Query.executesd()</literal>. It can be used
+          instead of the query language approach. The
+          interface is expected to change somewhat, so it is not
+          documented in detail for now.</para>
+
+          <variablelist>
+
+            <varlistentry>
+              <term>addclause(type='and'|'or'|'excl'|'phrase'|'near'|'sub',
+              qstring=string, slack=0, field='', stemming=1,
+              subSearch=SearchData)</term>
+              <listitem><para></para></listitem>
+            </varlistentry>
+          </variablelist>
+
+        </simplesect> <!-- SearchData -->
+
+      </sect3> <!-- Recoll module -->

        <sect3 id="RCL.PROGRAM.PYTHONAPI.RCLEXTRACT">
          <title>The rclextract module</title>

          
-          <para>Prior to &RCL; 1.25, index queries never provide document
-          content because it is not stored. More recent versions usually
+          <para>Prior to &RCL; 1.25, index queries could not provide document
+          content because it was never stored. &RCL; 1.25 and later usually
          store the document text, which can be optionally retrieved when
          running a query (see <literal>query.execute()</literal>
          above - the result is always plain text).</para>
@ -5506,7 +5494,7 @@ recollindex -c "$confdir"
          <para>You need to import the <literal>recoll</literal> module
          before the <literal>rclextract</literal> module.</para>
          
-          <sect4 id="RCL.PROGRAM.PYTHONAPI.RCLEXTRACT.CLASSES.EXTRACTOR">
+          <simplesect id="RCL.PROGRAM.PYTHONAPI.RCLEXTRACT.CLASSES.EXTRACTOR">
            <title>The Extractor class</title>

            <variablelist>
@ -5565,7 +5553,7 @@ not doc.ipath and (not "rclbes" in doc.keys() or doc["rclbes"] == "FS")

              </variablelist>

-          </sect4>
+          </simplesect>
        </sect3> <!-- rclextract module -->