doc
parent ad89225b24
commit 3ebf1a7db2
@@ -17,8 +17,9 @@ XSLDIR="/usr/share/xml/docbook/stylesheet/docbook-xsl/"
 
 # Options common to the single-file and chunked versions
 commonoptions=--stringparam section.autolabel 1 \
---stringparam section.autolabel.max.depth 3 \
+--stringparam section.autolabel.max.depth 2 \
 --stringparam section.label.includes.component.label 1 \
+--stringparam toc.max.depth 3 \
 --stringparam autotoc.label.in.hyperlink 0 \
 --stringparam abstract.notitle.enabled 1 \
 --stringparam html.stylesheet docbook-xsl.css \
File diff suppressed because it is too large
@@ -4966,13 +4966,14 @@ recollindex -c "$confdir"
 <sect2 id="RCL.PROGRAM.PYTHONAPI.INTRO">
 <title>Introduction</title>
 
-<para>&RCL; versions after 1.11 define a Python programming
-interface, both for searching and creating/updating an
-index.</para>
+<para>The &RCL; Python programming interface can be used both for
+searching and for creating/updating an index. Bindings exist for
+Python2 and Python3.</para>
 
-<para>The search interface is used in the &RCL; Ubuntu Unity Lens
-and the &RCL; Web UI. It can run queries on any &RCL;
-configuration.</para>
+<para>The search interface is used in a number of active projects:
+the &RCL; <application>Gnome Shell Search Provider</application>,
+the &RCL; Web UI, and the upmpdcli UPnP Media Server, in addition
+to many small scripts.</para>
 
 <para>The index update section of the API may be used to create and
 update &RCL; indexes on specific configurations (separate from the
@@ -4998,6 +4999,19 @@ recollindex -c "$confdir"
 paragraph at the end of this section will explain a few differences
 and ways to write code compatible with both versions.</para>
 
+<para>The <literal>recoll</literal> package now contains two
+modules:</para>
+<itemizedlist>
+<listitem><para>The <literal>recoll</literal> module contains
+functions and classes used to query (or update) the
+index.</para></listitem>
+
+<listitem><para>The <literal>rclextract</literal> module contains
+functions and classes used at query time to access document
+data.</para>
+</listitem>
+</itemizedlist>
+
 <para>There is a good chance that your system repository has
 packages for the Recoll Python API, sometimes in a package separate
 from the main one (maybe named something like python-recoll). Else
@@ -5022,13 +5036,17 @@ recollindex -c "$confdir"
 nres = query.execute("some query")
 results = query.fetchmany(20)
 for doc in results:
-    print(doc.url, doc.title)
+    print("%s %s" % (doc.url, doc.title))
 ]]></programlisting>
 
-<para>You can also take a look at the source for the <ulink
-url="https://github.com/koniu/recoll-webui">Recoll
-WebUI</ulink>, or the <ulink url="https://opensourceprojects.eu/p/upmpdcli/code/ci/c8c8e75bd181ad9db2df14da05934e53ca867a06/tree/src/mediaserver/cdplugins/uprcl/uprclfolders.py">upmpdcli local media server</ulink>, which are both
-based on the Python API.</para>
+<para>You can also take a look at the source for the
+<ulink url="https://opensourceprojects.eu/p/recollwebui/code/ci/78ddb20787b2a894b5e4661a8d5502c4511cf71e/tree/">Recoll
+WebUI</ulink>, the
+<ulink url="https://opensourceprojects.eu/p/upmpdcli/code/ci/c8c8e75bd181ad9db2df14da05934e53ca867a06/tree/src/mediaserver/cdplugins/uprcl/uprclfolders.py">upmpdcli
+local media server</ulink>, or the
+<ulink
+url="https://opensourceprojects.eu/p/recollgssp/code/ci/3f120108e099f9d687306c0be61593994326d52d/tree/gssp-recoll.py">Gnome
+Shell Search Provider</ulink>.</para>
 
 </sect2>
 
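The hunk above only retrieves the first 20 results with `query.fetchmany(20)`. A minimal sketch of walking the whole result set page by page follows, relying only on the `fetchmany()`, `rowcount` and `rownumber` members documented in the Query class section of this commit. `iter_pages` and `FakeQuery` are hypothetical names; `FakeQuery` merely imitates the documented behaviour so that the sketch runs without Recoll installed.

```python
def iter_pages(query, pagesize=20):
    """Yield successive lists of Doc objects from an executed query.

    Assumes the documented Query semantics: rowcount is the number of
    results returned by the last execute(), and rownumber is the index
    of the next result to be fetched (starting at 0).
    """
    while query.rownumber < query.rowcount:
        page = query.fetchmany(pagesize)
        if not page:  # defensive: stop if nothing more is returned
            break
        yield page


class FakeQuery(object):
    """Hypothetical stand-in for recoll's Query object (illustration only)."""

    def __init__(self, docs):
        self._docs = list(docs)
        self.rowcount = len(self._docs)  # as if execute() had just run
        self.rownumber = 0

    def fetchmany(self, size):
        page = self._docs[self.rownumber:self.rownumber + size]
        self.rownumber += len(page)
        return page


if __name__ == "__main__":
    query = FakeQuery("doc%d" % i for i in range(45))
    for page in iter_pages(query, 20):
        print(len(page))  # prints 20, 20, then 5
```

With a real index, the `query` object from the manual example (obtained through `db.query()` and followed by `query.execute(...)`) would be passed to `iter_pages()` in place of `FakeQuery`. Comparing `rownumber` with `rowcount` avoids depending on exactly what `fetchmany()` returns once the results are exhausted.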
@@ -5104,10 +5122,14 @@ recollindex -c "$confdir"
 
 <varlistentry>
 <term>Stored and indexed fields</term>
-<listitem><para>The <filename>fields</filename> file inside
-the &RCL; configuration defines which document fields are
-either "indexed" (searchable), "stored" (retrievable with
-search results), or both.</para>
+<listitem><para>The <link
+linkend="RCL.INSTALL.CONFIG.FIELDS"><filename>fields</filename>
+file</link> inside the &RCL; configuration defines which
+document fields are either <literal>indexed</literal>
+(searchable), <literal>stored</literal> (retrievable with
+search results), or both. Apart from a few standard/internal
+fields, only the <literal>stored</literal> fields are
+retrievable through the Python search interface.</para>
 </listitem>
 </varlistentry>
 
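The updated text states that, apart from a few standard fields, only `stored` fields are retrievable through the Python search interface. A minimal sketch of reading optional fields defensively follows. It uses the `keys()` and `get(key)` Doc methods documented further down in this commit, and assumes that fields which are not stored do not appear in `doc.keys()`. `pick_fields` and `FakeDoc` are hypothetical; `FakeDoc` only stands in for a real `Doc` so that the sketch runs without Recoll.

```python
def pick_fields(doc, wanted=("title", "author", "mtype")):
    """Return a {name: value} dict for the wanted fields that doc carries.

    Assumption: fields that are not stored in the index do not show up in
    doc.keys(), so they are skipped here instead of being looked up.
    """
    available = set(doc.keys())
    return {name: doc.get(name) for name in wanted if name in available}


class FakeDoc(object):
    """Hypothetical stand-in for recoll's Doc object (illustration only)."""

    def __init__(self, **fields):
        self._fields = dict(fields)

    def keys(self):
        return list(self._fields)

    def get(self, key):
        return self._fields.get(key)


if __name__ == "__main__":
    doc = FakeDoc(url="file:///tmp/notes.txt", mtype="text/plain",
                  author="A. Writer")
    print(pick_fields(doc))  # no "title" key here, so it is left out
```

Which fields are stored is decided by the `fields` configuration file, so the `wanted` tuple would need to match that configuration.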
@ -5118,381 +5140,347 @@ recollindex -c "$confdir"
|
|||||||
<sect2 id="RCL.PROGRAM.PYTHONAPI.SEARCH">
|
<sect2 id="RCL.PROGRAM.PYTHONAPI.SEARCH">
|
||||||
<title>Python search interface</title>
|
<title>Python search interface</title>
|
||||||
|
|
||||||
<sect3 id="RCL.PROGRAM.PYTHONAPI.PACKAGE">
|
|
||||||
<title>Recoll package</title>
|
|
||||||
|
|
||||||
<para>The <literal>recoll</literal> package contains two
|
|
||||||
modules:
|
|
||||||
<itemizedlist>
|
|
||||||
<listitem><para>The <literal>recoll</literal> module contains
|
|
||||||
functions and classes used to query (or update) the
|
|
||||||
index. This section will only describe the query part, see
|
|
||||||
further for the update part.</para></listitem>
|
|
||||||
<listitem><para>The <literal>rclextract</literal> module contains
|
|
||||||
functions and classes used to access document
|
|
||||||
data.</para></listitem>
|
|
||||||
</itemizedlist>
|
|
||||||
</para>
|
|
||||||
</sect3>
|
|
||||||
|
|
||||||
<sect3 id="RCL.PROGRAM.PYTHONAPI.RECOLL">
|
<sect3 id="RCL.PROGRAM.PYTHONAPI.RECOLL">
|
||||||
<title>The recoll module</title>
|
<title>The recoll module</title>
|
||||||
|
|
||||||
<sect4 id="RCL.PROGRAM.PYTHONAPI.RECOLL.FUNCTIONS">
|
<simplesect id="RCL.PROGRAM.PYTHONAPI.RECOLL.CONNECT">
|
||||||
<title>Functions</title>
|
<title>connect(confdir=None, extra_dbs=None, writable = False)</title>
|
||||||
|
|
||||||
<variablelist>
|
<para>The <literal>connect()</literal> function connects to
|
||||||
<varlistentry>
|
one or several &RCL; index(es) and returns
|
||||||
<term>connect(confdir=None, extra_dbs=None,
|
a <literal>Db</literal> object.</para>
|
||||||
writable = False)</term>
|
<para>This call initializes the recoll module, and it should
|
||||||
<listitem>
|
always be performed before any other call or object
|
||||||
<para>The <literal>connect()</literal> function connects to
|
creation.</para>
|
||||||
one or several &RCL; index(es) and returns
|
<itemizedlist>
|
||||||
a <literal>Db</literal> object.</para>
|
<listitem><para><literal>confdir</literal> may specify
|
||||||
<itemizedlist>
|
a configuration directory. The usual defaults
|
||||||
<listitem><para><literal>confdir</literal> may specify
|
apply.</para></listitem>
|
||||||
a configuration directory. The usual defaults
|
<listitem><para><literal>extra_dbs</literal> is a list of
|
||||||
apply.</para></listitem>
|
additional indexes (Xapian directories).</para></listitem>
|
||||||
<listitem><para><literal>extra_dbs</literal> is a list of
|
<listitem><para><literal>writable</literal> decides if
|
||||||
additional indexes (Xapian directories).</para></listitem>
|
we can index new data through this
|
||||||
<listitem><para><literal>writable</literal> decides if
|
connection.</para></listitem>
|
||||||
we can index new data through this
|
</itemizedlist>
|
||||||
connection.</para></listitem>
|
</simplesect>
|
||||||
</itemizedlist>
|
|
||||||
<para>This call initializes the recoll module, and it should
|
|
||||||
always be performed before any other call or object
|
|
||||||
creation.</para>
|
|
||||||
</listitem>
|
|
||||||
</varlistentry>
|
|
||||||
</variablelist>
|
|
||||||
</sect4>
|
|
||||||
|
|
||||||
|
<simplesect id="RCL.PROGRAM.PYTHONAPI.RECOLL.DB">
|
||||||
|
<title>The Db class</title>
|
||||||
|
|
||||||
<sect4 id="RCL.PROGRAM.PYTHONAPI.RECOLL.CLASSES">
|
<para>A Db object is created by a <literal>connect()</literal>
|
||||||
<title>Classes</title>
|
call and holds a connection to a Recoll index.</para>
|
||||||
|
<variablelist>
|
||||||
|
<varlistentry>
|
||||||
|
<term>Db.close()</term>
|
||||||
|
<listitem><para>Closes the connection. You can't do anything
|
||||||
|
with the <literal>Db</literal> object after
|
||||||
|
this.</para></listitem>
|
||||||
|
</varlistentry>
|
||||||
|
<varlistentry>
|
||||||
|
<term>Db.query(), Db.cursor()</term> <listitem><para>These
|
||||||
|
aliases return a blank <literal>Query</literal> object
|
||||||
|
for this index.</para></listitem>
|
||||||
|
</varlistentry>
|
||||||
|
|
||||||
|
<varlistentry>
|
||||||
|
<term>Db.setAbstractParams(maxchars,
|
||||||
|
contextwords)</term> <listitem><para>Set the parameters used
|
||||||
|
to build snippets (sets of keywords in context text
|
||||||
|
fragments). <literal>maxchars</literal> defines the
|
||||||
|
maximum total size of the abstract.
|
||||||
|
<literal>contextwords</literal> defines how many
|
||||||
|
terms are shown around the keyword.</para></listitem>
|
||||||
|
</varlistentry>
|
||||||
|
|
||||||
|
<varlistentry>
|
||||||
|
<term>Db.termMatch(match_type, expr, field='',
|
||||||
|
maxlen=-1, casesens=False, diacsens=False, lang='english')
|
||||||
|
</term>
|
||||||
|
<listitem><para>Expand an expression against the
|
||||||
|
index term list. Performs the basic function from the
|
||||||
|
GUI term explorer tool. <literal>match_type</literal>
|
||||||
|
can be either
|
||||||
|
of <literal>wildcard</literal>, <literal>regexp</literal>
|
||||||
|
or <literal>stem</literal>. Returns a list of terms
|
||||||
|
expanded from the input expression.
|
||||||
|
</para></listitem>
|
||||||
|
</varlistentry>
|
||||||
|
|
||||||
|
</variablelist>
|
||||||
|
|
||||||
|
</simplesect>
|
||||||
|
<simplesect id="RCL.PROGRAM.PYTHONAPI.RECOLL.CLASSES.QUERY">
|
||||||
|
<title>The Query class</title>
|
||||||
|
|
||||||
|
<para>A <literal>Query</literal> object (equivalent to a
|
||||||
|
cursor in the Python DB API) is created by
|
||||||
|
a <literal>Db.query()</literal> call. It is used to
|
||||||
|
execute index searches.</para>
|
||||||
|
|
||||||
|
<variablelist>
|
||||||
|
|
||||||
|
<varlistentry>
|
||||||
|
<term>Query.sortby(fieldname, ascending=True)</term>
|
||||||
|
<listitem><para>Sort results
|
||||||
|
by <replaceable>fieldname</replaceable>, in ascending
|
||||||
|
or descending order. Must be called before executing
|
||||||
|
the search.</para></listitem>
|
||||||
|
</varlistentry>
|
||||||
|
|
||||||
<sect5 id="RCL.PROGRAM.PYTHONAPI.RECOLL.CLASSES.DB">
|
<varlistentry>
|
||||||
<title>The Db class</title>
|
<term>Query.execute(query_string, stemming=1,
|
||||||
|
stemlang="english", fetchtext=False)</term>
|
||||||
|
<listitem><para>Starts a search
|
||||||
|
for <replaceable>query_string</replaceable>, a &RCL;
|
||||||
|
search language string. If the index stores the document
|
||||||
|
texts and <literal>fetchtext</literal> is True, store the
|
||||||
|
document extracted text in
|
||||||
|
<literal>doc.text</literal>.</para></listitem>
|
||||||
|
</varlistentry>
|
||||||
|
|
||||||
<para>A Db object is created by
|
<varlistentry>
|
||||||
a <literal>connect()</literal> call and holds a
|
<term>Query.executesd(SearchData, fetchtext=False)</term>
|
||||||
connection to a Recoll index.</para>
|
<listitem><para>Starts a search for the query defined by
|
||||||
<variablelist>
|
the SearchData object. If the index stores the document
|
||||||
<varlistentry>
|
texts and <literal>fetchtext</literal> is True, store the
|
||||||
<term>Db.close()</term>
|
document extracted text in
|
||||||
<listitem><para>Closes the connection. You can't do anything
|
<literal>doc.text</literal>.</para></listitem>
|
||||||
with the <literal>Db</literal> object after
|
</varlistentry>
|
||||||
this.</para></listitem>
|
|
||||||
</varlistentry>
|
|
||||||
<varlistentry>
|
|
||||||
<term>Db.query(), Db.cursor()</term> <listitem><para>These
|
|
||||||
aliases return a blank <literal>Query</literal> object
|
|
||||||
for this index.</para></listitem>
|
|
||||||
</varlistentry>
|
|
||||||
|
|
||||||
<varlistentry>
|
<varlistentry>
|
||||||
<term>Db.setAbstractParams(maxchars,
|
<term>Query.fetchmany(size=query.arraysize)</term>
|
||||||
contextwords)</term> <listitem><para>Set the parameters used
|
|
||||||
to build snippets (sets of keywords in context text
|
|
||||||
fragments). <literal>maxchars</literal> defines the
|
|
||||||
maximum total size of the abstract.
|
|
||||||
<literal>contextwords</literal> defines how many
|
|
||||||
terms are shown around the keyword.</para></listitem>
|
|
||||||
</varlistentry>
|
|
||||||
|
|
||||||
<varlistentry>
|
|
||||||
<term>Db.termMatch(match_type, expr, field='',
|
|
||||||
maxlen=-1, casesens=False, diacsens=False, lang='english')
|
|
||||||
</term>
|
|
||||||
<listitem><para>Expand an expression against the
|
|
||||||
index term list. Performs the basic function from the
|
|
||||||
GUI term explorer tool. <literal>match_type</literal>
|
|
||||||
can be either
|
|
||||||
of <literal>wildcard</literal>, <literal>regexp</literal>
|
|
||||||
or <literal>stem</literal>. Returns a list of terms
|
|
||||||
expanded from the input expression.
|
|
||||||
</para></listitem>
|
|
||||||
</varlistentry>
|
|
||||||
|
|
||||||
</variablelist>
|
|
||||||
|
|
||||||
</sect5>
|
|
||||||
|
|
||||||
|
|
||||||
<sect5 id="RCL.PROGRAM.PYTHONAPI.RECOLL.CLASSES.QUERY">
|
|
||||||
<title>The Query class</title>
|
|
||||||
|
|
||||||
<para>A <literal>Query</literal> object (equivalent to a
|
|
||||||
cursor in the Python DB API) is created by
|
|
||||||
a <literal>Db.query()</literal> call. It is used to
|
|
||||||
execute index searches.</para>
|
|
||||||
|
|
||||||
<variablelist>
|
|
||||||
|
|
||||||
<varlistentry>
|
|
||||||
<term>Query.sortby(fieldname, ascending=True)</term>
|
|
||||||
<listitem><para>Sort results
|
|
||||||
by <replaceable>fieldname</replaceable>, in ascending
|
|
||||||
or descending order. Must be called before executing
|
|
||||||
the search.</para></listitem>
|
|
||||||
</varlistentry>
|
|
||||||
|
|
||||||
<varlistentry>
|
|
||||||
<term>Query.execute(query_string, stemming=1,
|
|
||||||
stemlang="english", fetchtext=False)</term>
|
|
||||||
<listitem><para>Starts a search
|
|
||||||
for <replaceable>query_string</replaceable>, a &RCL;
|
|
||||||
search language string. If the index stores the document
|
|
||||||
texts and <literal>fetchtext</literal> is True, store the
|
|
||||||
document extracted text in
|
|
||||||
<literal>doc.text</literal>.</para></listitem>
|
|
||||||
</varlistentry>
|
|
||||||
|
|
||||||
<varlistentry>
|
|
||||||
<term>Query.executesd(SearchData, fetchtext=False)</term>
|
|
||||||
<listitem><para>Starts a search for the query defined by
|
|
||||||
the SearchData object. If the index stores the document
|
|
||||||
texts and <literal>fetchtext</literal> is True, store the
|
|
||||||
document extracted text in
|
|
||||||
<literal>doc.text</literal>.</para></listitem>
|
|
||||||
</varlistentry>
|
|
||||||
|
|
||||||
<varlistentry>
|
|
||||||
<term>Query.fetchmany(size=query.arraysize)</term>
|
|
||||||
|
|
||||||
<listitem><para>Fetches
|
|
||||||
the next <literal>Doc</literal> objects in the current
|
|
||||||
search results, and returns them as an array of the
|
|
||||||
required size, which is by default the value of
|
|
||||||
the <literal>arraysize</literal> data member.</para></listitem>
|
|
||||||
</varlistentry>
|
|
||||||
|
|
||||||
<varlistentry>
|
|
||||||
<term>Query.fetchone()</term> <listitem><para>Fetches the
|
|
||||||
next <literal>Doc</literal> object from the current
|
|
||||||
search results. Generates a StopIteration exception if
|
|
||||||
there are no results left.</para></listitem>
|
|
||||||
</varlistentry>
|
|
||||||
|
|
||||||
<varlistentry>
|
|
||||||
<term>Query.close()</term>
|
|
||||||
<listitem><para>Closes the query. The object is unusable
|
|
||||||
after the call.</para></listitem>
|
|
||||||
</varlistentry>
|
|
||||||
|
|
||||||
<varlistentry>
|
|
||||||
<term>Query.scroll(value, mode='relative')</term>
|
|
||||||
<listitem><para>Adjusts the position in the current result
|
|
||||||
set. <literal>mode</literal> can
|
|
||||||
be <literal>relative</literal>
|
|
||||||
or <literal>absolute</literal>. </para></listitem>
|
|
||||||
</varlistentry>
|
|
||||||
|
|
||||||
<varlistentry>
|
|
||||||
<term>Query.getgroups()</term>
|
|
||||||
<listitem><para>Retrieves the expanded query terms as a list
|
|
||||||
of pairs. Meaningful only after executexx In each
|
|
||||||
pair, the first entry is a list of user terms (of size
|
|
||||||
one for simple terms, or more for group and phrase
|
|
||||||
clauses), the second a list of query terms as derived
|
|
||||||
from the user terms and used in the Xapian
|
|
||||||
Query.</para></listitem>
|
|
||||||
</varlistentry>
|
|
||||||
|
|
||||||
<varlistentry>
|
|
||||||
<term>Query.getxquery()</term>
|
|
||||||
<listitem><para>Return the Xapian query description as a
|
|
||||||
Unicode string.
|
|
||||||
Meaningful only after executexx.</para></listitem>
|
|
||||||
</varlistentry>
|
|
||||||
|
|
||||||
<varlistentry>
|
|
||||||
<term>Query.highlight(text, ishtml = 0, methods = object)</term>
|
|
||||||
<listitem><para>Will insert <span "class=rclmatch">,
|
|
||||||
</span> tags around the match areas in the input text
|
|
||||||
and return the modified text. <literal>ishtml</literal>
|
|
||||||
can be set to indicate that the input text is HTML and
|
|
||||||
that HTML special characters should not be escaped.
|
|
||||||
<literal>methods</literal> if set should be an object
|
|
||||||
with methods startMatch(i) and endMatch() which will be
|
|
||||||
called for each match and should return a begin and end
|
|
||||||
tag</para></listitem>
|
|
||||||
</varlistentry>
|
|
||||||
|
|
||||||
<varlistentry>
|
|
||||||
<term>Query.makedocabstract(doc, methods = object))</term>
|
|
||||||
<listitem><para>Create a snippets abstract
|
|
||||||
for <literal>doc</literal> (a <literal>Doc</literal>
|
|
||||||
object) by selecting text around the match terms.
|
|
||||||
If methods is set, will also perform highlighting. See
|
|
||||||
the highlight method.
|
|
||||||
</para></listitem>
|
|
||||||
</varlistentry>
|
|
||||||
|
|
||||||
<varlistentry>
|
|
||||||
<term>Query.__iter__() and Query.next()</term>
|
|
||||||
<listitem><para>So that things like <literal>for doc in
|
|
||||||
query:</literal> will work.</para></listitem>
|
|
||||||
</varlistentry>
|
|
||||||
</variablelist>
|
|
||||||
|
|
||||||
<variablelist>
|
|
||||||
|
|
||||||
<varlistentry><term>Query.arraysize</term>
|
|
||||||
<listitem><para>Default number of records processed by fetchmany
|
|
||||||
(r/w).</para></listitem>
|
|
||||||
</varlistentry>
|
|
||||||
<varlistentry><term>Query.rowcount</term><listitem><para>Number
|
|
||||||
of records returned by the last
|
|
||||||
execute.</para></listitem></varlistentry>
|
|
||||||
<varlistentry><term>Query.rownumber</term><listitem><para>Next index
|
|
||||||
to be fetched from results. Normally increments after
|
|
||||||
each fetchone() call, but can be set/reset before the
|
|
||||||
call to effect seeking (equivalent to
|
|
||||||
using <literal>scroll()</literal>). Starts at
|
|
||||||
0.</para></listitem>
|
|
||||||
</varlistentry>
|
|
||||||
|
|
||||||
</variablelist>
|
|
||||||
|
|
||||||
</sect5>
|
|
||||||
|
|
||||||
|
|
||||||
<sect5 id="RCL.PROGRAM.PYTHONAPI.RECOLL.CLASSES.DOC">
|
|
||||||
<title>The Doc class</title>
|
|
||||||
|
|
||||||
<para>A <literal>Doc</literal> object contains index data
|
|
||||||
for a given document. The data is extracted from the
|
|
||||||
index when searching, or set by the indexer program when
|
|
||||||
updating. The Doc object has many attributes to be read or
|
|
||||||
set by its user. It matches exactly the Rcl::Doc C++
|
|
||||||
object. Some of the attributes are predefined, but,
|
|
||||||
especially when indexing, others can be set, the name of
|
|
||||||
which will be processed as field names by the indexing
|
|
||||||
configuration. Inputs can be specified as Unicode or
|
|
||||||
strings. Outputs are Unicode objects. All dates are
|
|
||||||
specified as Unix timestamps, printed as strings. Please
|
|
||||||
refer to the <filename>rcldb/rcldoc.cpp</filename> C++ file
|
|
||||||
for a full description of the predefined attributes. Here
|
|
||||||
follows a short list.</para>
|
|
||||||
|
|
||||||
<para><itemizedlist>
|
|
||||||
<listitem><para><literal>url</literal> the document URL but
|
|
||||||
see also <literal>getbinurl()</literal></para></listitem>
|
|
||||||
|
|
||||||
<listitem><para><literal>ipath</literal> the document
|
<listitem><para>Fetches
|
||||||
<literal>ipath</literal> for embedded
|
the next <literal>Doc</literal> objects in the current
|
||||||
documents.</para></listitem>
|
search results, and returns them as an array of the
|
||||||
|
required size, which is by default the value of
|
||||||
|
the <literal>arraysize</literal> data member.</para></listitem>
|
||||||
|
</varlistentry>
|
||||||
|
|
||||||
<listitem><para><literal>fbytes, dbytes</literal> the document
|
<varlistentry>
|
||||||
file and text sizes.</para></listitem>
|
<term>Query.fetchone()</term> <listitem><para>Fetches the
|
||||||
<listitem><para><literal>fmtime, dmtime</literal> the document
|
next <literal>Doc</literal> object from the current
|
||||||
file and document times.</para></listitem>
|
search results. Generates a StopIteration exception if
|
||||||
|
there are no results left.</para></listitem>
|
||||||
<listitem><para><literal>xdocid</literal> the document
|
</varlistentry>
|
||||||
Xapian document ID. This is useful if you want to access
|
|
||||||
the document through a direct Xapian
|
|
||||||
operation.</para></listitem>
|
|
||||||
|
|
||||||
<listitem><para><literal>mtype</literal> the document
|
<varlistentry>
|
||||||
MIME type.</para></listitem>
|
<term>Query.close()</term>
|
||||||
|
<listitem><para>Closes the query. The object is unusable
|
||||||
|
after the call.</para></listitem>
|
||||||
|
</varlistentry>
|
||||||
|
|
||||||
<listitem><para>Fields stored by default:
|
<varlistentry>
|
||||||
<literal>author</literal>, <literal>filename</literal>,
|
<term>Query.scroll(value, mode='relative')</term>
|
||||||
<literal>keywords</literal>,
|
<listitem><para>Adjusts the position in the current result
|
||||||
<literal>recipient</literal></para></listitem>
|
set. <literal>mode</literal> can
|
||||||
|
be <literal>relative</literal>
|
||||||
|
or <literal>absolute</literal>. </para></listitem>
|
||||||
|
</varlistentry>
|
||||||
|
|
||||||
</itemizedlist>
|
<varlistentry>
|
||||||
</para>
|
<term>Query.getgroups()</term>
|
||||||
|
<listitem><para>Retrieves the expanded query terms as a list
|
||||||
<para>At query time, only the fields that are defined
|
of pairs. Meaningful only after executexx In each
|
||||||
as <literal>stored</literal> either by default or in
|
pair, the first entry is a list of user terms (of size
|
||||||
the <filename>fields</filename> configuration file will be
|
one for simple terms, or more for group and phrase
|
||||||
meaningful in the <literal>Doc</literal>
|
clauses), the second a list of query terms as derived
|
||||||
object. Especially this will not be the case for the
|
from the user terms and used in the Xapian
|
||||||
document text. See the <literal>rclextract</literal>
|
Query.</para></listitem>
|
||||||
module for accessing document contents.</para>
|
</varlistentry>
|
||||||
|
|
||||||
|
<varlistentry>
|
||||||
|
<term>Query.getxquery()</term>
|
||||||
|
<listitem><para>Return the Xapian query description as a
|
||||||
|
Unicode string.
|
||||||
|
Meaningful only after executexx.</para></listitem>
|
||||||
|
</varlistentry>
|
||||||
|
|
||||||
<variablelist>
|
<varlistentry>
|
||||||
|
<term>Query.highlight(text, ishtml = 0, methods = object)</term>
|
||||||
|
<listitem><para>Will insert <span "class=rclmatch">,
|
||||||
|
</span> tags around the match areas in the input text
|
||||||
|
and return the modified text. <literal>ishtml</literal>
|
||||||
|
can be set to indicate that the input text is HTML and
|
||||||
|
that HTML special characters should not be escaped.
|
||||||
|
<literal>methods</literal> if set should be an object
|
||||||
|
with methods startMatch(i) and endMatch() which will be
|
||||||
|
called for each match and should return a begin and end
|
||||||
|
tag</para></listitem>
|
||||||
|
</varlistentry>
|
||||||
|
|
||||||
<varlistentry>
|
<varlistentry>
|
||||||
<term>get(key), [] operator</term>
|
<term>Query.makedocabstract(doc, methods = object))</term>
|
||||||
|
<listitem><para>Create a snippets abstract
|
||||||
|
for <literal>doc</literal> (a <literal>Doc</literal>
|
||||||
|
object) by selecting text around the match terms.
|
||||||
|
If methods is set, will also perform highlighting. See
|
||||||
|
the highlight method.
|
||||||
|
</para></listitem>
|
||||||
|
</varlistentry>
|
||||||
|
|
||||||
|
<varlistentry>
|
||||||
|
<term>Query.__iter__() and Query.next()</term>
|
||||||
|
<listitem><para>So that things like <literal>for doc in
|
||||||
|
query:</literal> will work.</para></listitem>
|
||||||
|
</varlistentry>
|
||||||
|
</variablelist>
|
||||||
|
|
||||||
<listitem><para>Retrieve the named document
|
<variablelist>
|
||||||
attribute. You can also use <literal>getattr(doc,
|
|
||||||
key)</literal> or
|
|
||||||
<literal>doc.key</literal>.</para></listitem>
|
|
||||||
</varlistentry>
|
|
||||||
|
|
||||||
<varlistentry>
|
<varlistentry><term>Query.arraysize</term>
|
||||||
<term>doc.key = value</term>
|
<listitem><para>Default number of records processed by fetchmany
|
||||||
|
(r/w).</para></listitem>
|
||||||
|
</varlistentry>
|
||||||
|
<varlistentry><term>Query.rowcount</term><listitem><para>Number
|
||||||
|
of records returned by the last
|
||||||
|
execute.</para></listitem></varlistentry>
|
||||||
|
<varlistentry><term>Query.rownumber</term><listitem><para>Next index
|
||||||
|
to be fetched from results. Normally increments after
|
||||||
|
each fetchone() call, but can be set/reset before the
|
||||||
|
call to effect seeking (equivalent to
|
||||||
|
using <literal>scroll()</literal>). Starts at
|
||||||
|
0.</para></listitem>
|
||||||
|
</varlistentry>
|
||||||
|
|
||||||
<listitem><para>Set the the named document attribute. You
|
</variablelist>
|
||||||
can also use <literal>setattr(doc, key,
|
|
||||||
value)</literal>.</para></listitem>
|
|
||||||
</varlistentry>
|
|
||||||
|
|
||||||
<varlistentry>
|
</simplesect>
|
||||||
<term>getbinurl()</term>
|
<simplesect id="RCL.PROGRAM.PYTHONAPI.RECOLL.CLASSES.DOC">
|
||||||
|
<title>The Doc class</title>
|
||||||
|
|
||||||
<listitem><para>Retrieve the URL in byte array format (no
|
<para>A <literal>Doc</literal> object contains index data
|
||||||
transcoding), for use as parameter to a system
|
for a given document. The data is extracted from the
|
||||||
call.</para></listitem>
|
index when searching, or set by the indexer program when
|
||||||
</varlistentry>
|
updating. The Doc object has many attributes to be read or
|
||||||
|
set by its user. It mostly matches the Rcl::Doc C++
|
||||||
|
object. Some of the attributes are predefined, but,
|
||||||
|
especially when indexing, others can be set, the name of
|
||||||
|
which will be processed as field names by the indexing
|
||||||
|
configuration. Inputs can be specified as Unicode or
|
||||||
|
strings. Outputs are Unicode objects. All dates are
|
||||||
|
specified as Unix timestamps, printed as strings. Please
|
||||||
|
refer to the <filename>rcldb/rcldoc.cpp</filename> C++ file
|
||||||
|
for a full description of the predefined attributes. Here
|
||||||
|
follows a short list.</para>
|
||||||
|
|
||||||
<varlistentry>
|
<para><itemizedlist>
|
||||||
<term>setbinurl(url)</term>
|
<listitem><para><literal>url</literal> the document URL but
|
||||||
|
see also <literal>getbinurl()</literal></para></listitem>
|
||||||
|
|
||||||
|
<listitem><para><literal>ipath</literal> the document
|
||||||
|
<literal>ipath</literal> for embedded
|
||||||
|
documents.</para></listitem>
|
||||||
|
|
||||||
<listitem><para>Set the URL in byte array format (no
|
<listitem><para><literal>fbytes, dbytes</literal> the document
|
||||||
transcoding).</para></listitem>
|
file and text sizes.</para></listitem>
|
||||||
</varlistentry>
|
<listitem><para><literal>fmtime, dmtime</literal> the document
|
||||||
|
file and document times.</para></listitem>
|
||||||
|
|
||||||
|
<listitem><para><literal>xdocid</literal> the document
|
||||||
|
Xapian document ID. This is useful if you want to access
|
||||||
|
the document through a direct Xapian
|
||||||
|
operation.</para></listitem>
|
||||||
|
|
||||||
<varlistentry>
|
<listitem><para><literal>mtype</literal> the document
|
||||||
<term>items()</term>
|
MIME type.</para></listitem>
|
||||||
<listitem><para>Return a dictionary of doc object
|
|
||||||
keys/values</para></listitem>
|
|
||||||
</varlistentry>
|
|
||||||
|
|
||||||
<varlistentry>
|
<listitem><para>Fields stored by default:
|
||||||
<term>keys()</term>
|
<literal>author</literal>, <literal>filename</literal>,
|
||||||
<listitem><para>list of doc object keys (attribute
|
<literal>keywords</literal>,
|
||||||
names).</para></listitem>
|
<literal>recipient</literal></para></listitem>
|
||||||
</varlistentry>
|
|
||||||
</variablelist>
|
|
||||||
|
|
||||||
</sect5> <!-- Doc -->
|
</itemizedlist>
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<para>At query time, only the fields that are defined as
|
||||||
|
<literal>stored</literal> either by default or in the
|
||||||
|
<filename>fields</filename> configuration file will be meaningful
|
||||||
|
in the <literal>Doc</literal> object. The document processed text
|
||||||
|
may be present or not, depending if the index stores the text at
|
||||||
|
all, and if it does, on the <literal>fetchtext</literal> query
|
||||||
|
execute option. See also the <literal>rclextract</literal> module
|
||||||
|
for accessing document contents.</para>
<sect5 id="RCL.PROGRAM.PYTHONAPI.RECOLL.CLASSES.SEARCHDATA">
<variablelist>
<title>The SearchData class</title>
<para>A <literal>SearchData</literal> object allows building
<varlistentry>
a query by combining clauses, for execution
<term>get(key), [] operator</term>
by <literal>Query.executesd()</literal>. It can be used
as an alternative to the query language approach. The
interface may still change a little, so it is not documented
in detail yet.</para>
<variablelist>
<listitem><para>Retrieve the named document
attribute. You can also use <literal>getattr(doc,
key)</literal> or
<literal>doc.key</literal>.</para></listitem>
</varlistentry>
<varlistentry>
<varlistentry>
<term>addclause(type='and'|'or'|'excl'|'phrase'|'near'|'sub',
<term>doc.key = value</term>
qstring=string, slack=0, field='', stemming=1,
subSearch=SearchData)</term>
<listitem><para></para></listitem>
</varlistentry>
</variablelist>
</sect5> <!-- SearchData -->
<listitem><para>Set the named document attribute. You
can also use <literal>setattr(doc, key,
value)</literal>.</para></listitem>
</varlistentry>
</sect4> <!-- recoll.classes -->
<varlistentry>
</sect3> <!-- Recoll module -->
<term>getbinurl()</term>
<listitem><para>Retrieve the URL in byte array format (no
transcoding), for use as a parameter to a system
call.</para></listitem>
</varlistentry>
<varlistentry>
<term>setbinurl(url)</term>
<listitem><para>Set the URL in byte array format (no
transcoding).</para></listitem>
</varlistentry>
<varlistentry>
<term>items()</term>
<listitem><para>Return a dictionary of doc object
keys/values.</para></listitem>
</varlistentry>
<varlistentry>
<term>keys()</term>
<listitem><para>Return a list of doc object keys (attribute
names).</para></listitem>
</varlistentry>
</variablelist>
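<para>The sketch below combines several of these methods. It uses attribute access (<literal>doc.ipath</literal>), <literal>keys()</literal>, the <literal>[]</literal> operator and <literal>getbinurl()</literal> to turn a result into a local file path without transcoding the file name. The plain-file test follows the condition <literal>not doc.ipath and (not "rclbes" in doc.keys() or doc["rclbes"] == "FS")</literal> that appears in this commit. The helper name is hypothetical, and it assumes that file system documents have <literal>file://</literal> URLs.</para>

```python
def local_file_path(doc):
    """Return the file system path of a result as bytes, or None.

    Hypothetical helper (sketch only). A result is treated as a plain file
    when it has no ipath (it is not embedded in a container) and it comes
    from the file system backend ('rclbes' absent or equal to "FS").
    """
    if doc.ipath:
        return None  # embedded document: the URL names its container
    if "rclbes" in doc.keys() and doc["rclbes"] != "FS":
        return None  # indexed by another backend
    # getbinurl() returns the URL without transcoding, so file names that
    # are not valid UTF-8 are preserved byte for byte.
    url = bytes(doc.getbinurl())
    prefix = b"file://"  # assumption: file system documents use file:// URLs
    if not url.startswith(prefix):
        return None
    return url[len(prefix):]  # bytes path, accepted directly by os functions
```

<para>For example, when the helper does not return <literal>None</literal>, <literal>open(local_file_path(doc), "rb")</literal> opens the original file, because Python accepts bytes paths.</para>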
</simplesect> <!-- Doc -->
<simplesect id="RCL.PROGRAM.PYTHONAPI.RECOLL.CLASSES.SEARCHDATA">
<title>The SearchData class</title>
<para>A <literal>SearchData</literal> object allows building
a query by combining clauses, for execution
by <literal>Query.executesd()</literal>. It can be used
as an alternative to the query language approach. The
interface may still change a little, so it is not documented
in detail yet.</para>
<variablelist>
<varlistentry>
<term>addclause(type='and'|'or'|'excl'|'phrase'|'near'|'sub',
qstring=string, slack=0, field='', stemming=1,
subSearch=SearchData)</term>
<listitem><para></para></listitem>
</varlistentry>
</variablelist>
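<para>Because the interface is still in flux, the following is only a sketch. It adds clauses through the <literal>addclause()</literal> keyword arguments listed above. The helper name and the (type, qstring, options) tuple layout are hypothetical, and so is the assumption that <literal>recoll.SearchData()</literal> can be created without arguments.</para>

```python
def build_searchdata(clauses, searchdata=None):
    """Add (type, qstring, options) clauses to a SearchData object.

    Hypothetical helper (sketch only). 'options' is a dict of extra
    addclause() keyword arguments such as slack, field or stemming.
    """
    if searchdata is None:
        from recoll import recoll  # needs the Recoll Python bindings
        searchdata = recoll.SearchData()  # assumption: no-argument constructor
    for ctype, qstring, options in clauses:
        searchdata.addclause(type=ctype, qstring=qstring, **options)
    return searchdata
```

<para>A query object could then run the result with <literal>query.executesd(build_searchdata([("and", "recoll python", {}), ("excl", "windows", {})]))</literal>.</para>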
</simplesect> <!-- SearchData -->
</sect3> <!-- Recoll module -->
<sect3 id="RCL.PROGRAM.PYTHONAPI.RCLEXTRACT">
<sect3 id="RCL.PROGRAM.PYTHONAPI.RCLEXTRACT">
<title>The rclextract module</title>
<title>The rclextract module</title>
<para>Prior to &RCL; 1.25, index queries never provide document
<para>Prior to &RCL; 1.25, index queries could not provide document
content because it is not stored. More recent versions usually
content because it was never stored. &RCL; 1.25 and later usually
store the document text, which can be optionally retrieved when
store the document text, which can be optionally retrieved when
running a query (see <literal>query.execute()</literal>
running a query (see <literal>query.execute()</literal>
above - the result is always plain text).</para>
above - the result is always plain text).</para>
@ -5506,7 +5494,7 @@ recollindex -c "$confdir"
<para>You need to import the <literal>recoll</literal> module
<para>You need to import the <literal>recoll</literal> module
before the <literal>rclextract</literal> module.</para>
before the <literal>rclextract</literal> module.</para>
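<para>The sketch below shows how the two ways of getting document text fit together. It first uses the text stored in the index (present in <literal>doc.text</literal> when the query ran with <literal>fetchtext=True</literal>). Only if that is empty does it fall back to an <literal>rclextract.Extractor</literal> and its <literal>textextract(ipath)</literal> method, importing <literal>recoll</literal> first as required. The function name and the <literal>make_extractor</literal> parameter are hypothetical; the parameter exists only so the logic can be exercised without the bindings.</para>

```python
def document_text(doc, make_extractor=None):
    """Return the text of a query result, extracting it only if needed.

    Hypothetical helper (sketch only). 'make_extractor' defaults to
    rclextract.Extractor.
    """
    # Recoll 1.25+ usually stores the text; it is in doc.text when the query
    # was run with fetchtext=True.
    text = getattr(doc, "text", None)
    if text:
        return text
    if make_extractor is None:
        from recoll import recoll  # noqa: F401  recoll must be imported first
        from recoll import rclextract
        make_extractor = rclextract.Extractor
    extractor = make_extractor(doc)
    # textextract() returns a new Doc whose text field holds the converted
    # content (which may be HTML rather than plain text for some documents).
    return extractor.textextract(doc.ipath).text
```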
<sect4 id="RCL.PROGRAM.PYTHONAPI.RCLEXTRACT.CLASSES.EXTRACTOR">
<simplesect id="RCL.PROGRAM.PYTHONAPI.RCLEXTRACT.CLASSES.EXTRACTOR">
<title>The Extractor class</title>
<title>The Extractor class</title>
<variablelist>
<variablelist>
@ -5565,7 +5553,7 @@ not doc.ipath and (not "rclbes" in doc.keys() or doc["rclbes"] == "FS")
</variablelist>
</variablelist>
</sect4>
</simplesect>
</sect3> <!-- rclextract module -->
</sect3> <!-- rclextract module -->