doc
parent
ad89225b24
commit
3ebf1a7db2
@ -17,8 +17,9 @@ XSLDIR="/usr/share/xml/docbook/stylesheet/docbook-xsl/"
# Options common to the single-file and chunked versions
commonoptions=--stringparam section.autolabel 1 \
--stringparam section.autolabel.max.depth 3 \
--stringparam section.autolabel.max.depth 2 \
--stringparam section.label.includes.component.label 1 \
--stringparam toc.max.depth 3 \
--stringparam autotoc.label.in.hyperlink 0 \
--stringparam abstract.notitle.enabled 1 \
--stringparam html.stylesheet docbook-xsl.css \

File diff suppressed because it is too large
@ -4966,13 +4966,14 @@ recollindex -c "$confdir"
<sect2 id="RCL.PROGRAM.PYTHONAPI.INTRO">
<title>Introduction</title>

<para>&RCL; versions after 1.11 define a Python programming
interface, both for searching and creating/updating an
index.</para>
<para>The &RCL; Python programming interface can be used both for
searching and for creating/updating an index. Bindings exist for
Python2 and Python3.</para>

<para>The search interface is used in the &RCL; Ubuntu Unity Lens
and the &RCL; Web UI. It can run queries on any &RCL;
configuration.</para>
<para>The search interface is used in a number of active projects:
the &RCL; <application>Gnome Shell Search Provider</application>,
the &RCL; Web UI, and the upmpdcli UPnP Media Server, in addition
to many small scripts.</para>

<para>The index update section of the API may be used to create and
update &RCL; indexes on specific configurations (separate from the
@ -4998,6 +4999,19 @@ recollindex -c "$confdir"
paragraph at the end of this section will explain a few differences
and ways to write code compatible with both versions.</para>

<para>The <literal>recoll</literal> package now contains two
modules:</para>
<itemizedlist>
<listitem><para>The <literal>recoll</literal> module contains
functions and classes used to query (or update) the
index.</para></listitem>

<listitem><para>The <literal>rclextract</literal> module contains
functions and classes used at query time to access document
data.</para>
</listitem>
</itemizedlist>

<para>There is a good chance that your system repository has
packages for the Recoll Python API, sometimes in a package separate
from the main one (maybe named something like python-recoll). Else
@ -5022,13 +5036,17 @@ recollindex -c "$confdir"
nres = query.execute("some query")
results = query.fetchmany(20)
for doc in results:
    print(doc.url, doc.title)
    print("%s %s" % (doc.url, doc.title))
]]></programlisting>

<para>You can also take a look at the source for the <ulink
url="https://github.com/koniu/recoll-webui">Recoll
WebUI</ulink>, or the <ulink url="https://opensourceprojects.eu/p/upmpdcli/code/ci/c8c8e75bd181ad9db2df14da05934e53ca867a06/tree/src/mediaserver/cdplugins/uprcl/uprclfolders.py">upmpdcli local media server</ulink>, which are both
based on the Python API.</para>
<para>You can also take a look at the source for the
<ulink url="https://opensourceprojects.eu/p/recollwebui/code/ci/78ddb20787b2a894b5e4661a8d5502c4511cf71e/tree/">Recoll
WebUI</ulink>, the
<ulink url="https://opensourceprojects.eu/p/upmpdcli/code/ci/c8c8e75bd181ad9db2df14da05934e53ca867a06/tree/src/mediaserver/cdplugins/uprcl/uprclfolders.py">upmpdcli
local media server</ulink>, or the
<ulink
url="https://opensourceprojects.eu/p/recollgssp/code/ci/3f120108e099f9d687306c0be61593994326d52d/tree/gssp-recoll.py">Gnome
Shell Search Provider</ulink>.</para>

</sect2>
@ -5104,10 +5122,14 @@ recollindex -c "$confdir"
<varlistentry>
<term>Stored and indexed fields</term>
<listitem><para>The <filename>fields</filename> file inside
the &RCL; configuration defines which document fields are
either "indexed" (searchable), "stored" (retrievable with
search results), or both.</para>
<listitem><para>The <link
linkend="RCL.INSTALL.CONFIG.FIELDS"><filename>fields</filename>
file</link> inside the &RCL; configuration defines which
document fields are either <literal>indexed</literal>
(searchable), <literal>stored</literal> (retrievable with
search results), or both. Apart from a few standard/internal
fields, only the <literal>stored</literal> fields are
retrievable through the Python search interface.</para>
</listitem>
</varlistentry>
@ -5118,381 +5140,347 @@ recollindex -c "$confdir"
<sect2 id="RCL.PROGRAM.PYTHONAPI.SEARCH">
<title>Python search interface</title>

<sect3 id="RCL.PROGRAM.PYTHONAPI.PACKAGE">
<title>Recoll package</title>

<para>The <literal>recoll</literal> package contains two
modules:
<itemizedlist>
<listitem><para>The <literal>recoll</literal> module contains
functions and classes used to query (or update) the
index. This section only describes the query part. The update
part is described further below.</para></listitem>
<listitem><para>The <literal>rclextract</literal> module contains
functions and classes used to access document
data.</para></listitem>
</itemizedlist>
</para>
</sect3>

<sect3 id="RCL.PROGRAM.PYTHONAPI.RECOLL">
<title>The recoll module</title>

<sect4 id="RCL.PROGRAM.PYTHONAPI.RECOLL.FUNCTIONS">
<title>Functions</title>
<simplesect id="RCL.PROGRAM.PYTHONAPI.RECOLL.CONNECT">
<title>connect(confdir=None, extra_dbs=None, writable=False)</title>

<variablelist>
<varlistentry>
<term>connect(confdir=None, extra_dbs=None,
writable=False)</term>
<listitem>
<para>The <literal>connect()</literal> function connects to
one or several &RCL; indexes and returns
a <literal>Db</literal> object.</para>
<itemizedlist>
<listitem><para><literal>confdir</literal> may specify
a configuration directory. If it is not set, the usual defaults
apply (the <envar>RECOLL_CONFDIR</envar> environment variable if it
is set, else <filename>~/.recoll</filename>).</para></listitem>
<listitem><para><literal>extra_dbs</literal> is a list of
additional indexes (Xapian directories).</para></listitem>
<listitem><para><literal>writable</literal> determines whether
the connection can be used to index new
data.</para></listitem>
</itemizedlist>
<para>This call initializes the recoll module, and it should
always be performed before any other call or object
creation.</para>
</listitem>
</varlistentry>
</variablelist>
</sect4>
<para>The <literal>connect()</literal> function connects to
one or several &RCL; indexes and returns
a <literal>Db</literal> object.</para>
<para>This call initializes the recoll module, and it should
always be performed before any other call or object
creation.</para>
<itemizedlist>
<listitem><para><literal>confdir</literal> may specify
a configuration directory. If it is not set, the usual defaults
apply (the <envar>RECOLL_CONFDIR</envar> environment variable if it
is set, else <filename>~/.recoll</filename>).</para></listitem>
<listitem><para><literal>extra_dbs</literal> is a list of
additional indexes (Xapian directories).</para></listitem>
<listitem><para><literal>writable</literal> determines whether
the connection can be used to index new
data.</para></listitem>
</itemizedlist>
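<para>The following Python sketch shows one way to make sure that <literal>Db.close()</literal> is always called on a connection returned by <literal>connect()</literal>. The <literal>open_index()</literal> helper is hypothetical and is not part of the recoll package. It assumes the usual <literal>from recoll import recoll</literal> import path, and its <literal>connect</literal> parameter exists only so that it can be exercised without an index.</para>

```python
from contextlib import contextmanager


@contextmanager
def open_index(confdir=None, extra_dbs=None, connect=None):
    """Yield a Db object and close it when the with-block exits.

    Hypothetical helper, not part of the recoll package. `connect`
    defaults to recoll.connect and can be replaced for testing.
    """
    if connect is None:
        # Assumed import path for the recoll Python binding.
        from recoll import recoll
        connect = recoll.connect
    db = connect(confdir=confdir, extra_dbs=extra_dbs)
    try:
        yield db
    finally:
        # Db.close(): the object cannot be used after this call.
        db.close()


# Usage, with the recoll package installed and an existing index:
# with open_index() as db:
#     query = db.query()
#     nres = query.execute("some query")
```

The connection is closed even if the body of the <literal>with</literal> block raises an exception.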
</simplesect>

<simplesect id="RCL.PROGRAM.PYTHONAPI.RECOLL.DB">
<title>The Db class</title>

<sect4 id="RCL.PROGRAM.PYTHONAPI.RECOLL.CLASSES">
<title>Classes</title>
<para>A <literal>Db</literal> object is created by a <literal>connect()</literal>
call and holds a connection to a Recoll index.</para>
<variablelist>
<varlistentry>
<term>Db.close()</term>
<listitem><para>Closes the connection. The <literal>Db</literal>
object cannot be used after
this.</para></listitem>
</varlistentry>
<varlistentry>
<term>Db.query(), Db.cursor()</term> <listitem><para>These
aliases return a blank <literal>Query</literal> object
for this index.</para></listitem>
</varlistentry>

<varlistentry>
<term>Db.setAbstractParams(maxchars,
contextwords)</term> <listitem><para>Set the parameters used
to build snippets (sets of keywords in context text
fragments). <literal>maxchars</literal> defines the
maximum total size of the abstract.
<literal>contextwords</literal> defines how many
terms are shown around the keyword.</para></listitem>
</varlistentry>

<varlistentry>
<term>Db.termMatch(match_type, expr, field='',
maxlen=-1, casesens=False, diacsens=False, lang='english')
</term>
<listitem><para>Expand an expression against the
index term list. This performs the basic function of the
GUI term explorer tool. <literal>match_type</literal>
can be one
of <literal>wildcard</literal>, <literal>regexp</literal>
or <literal>stem</literal>. Returns a list of terms
expanded from the input expression.
</para></listitem>
</varlistentry>

</variablelist>
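<para>As an illustration of <literal>Db.termMatch()</literal>, this sketch expands a wildcard expression against the index term list and joins the resulting terms into a query language string with <literal>OR</literal>. The <literal>expand_to_or_query()</literal> helper and its <literal>max_terms</literal> limit are hypothetical. The terms are used unquoted, which assumes that they are plain single words.</para>

```python
def expand_to_or_query(db, expr, match_type="wildcard", max_terms=20):
    """Build an OR query from the index terms matching `expr`.

    Hypothetical helper: `db` is a recoll Db object, or anything with a
    compatible termMatch() method. Returns an empty string if nothing
    matches.
    """
    # termMatch() returns the list of terms expanded from the expression.
    terms = list(db.termMatch(match_type, expr))[:max_terms]
    return " OR ".join(terms)


# Usage:
# qs = expand_to_or_query(db, "recoll*")
# if qs:
#     nres = db.query().execute(qs)
```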

</simplesect>
<simplesect id="RCL.PROGRAM.PYTHONAPI.RECOLL.CLASSES.QUERY">
<title>The Query class</title>

<para>A <literal>Query</literal> object (equivalent to a
cursor in the Python DB API) is created by
a <literal>Db.query()</literal> call. It is used to
execute index searches.</para>

<variablelist>

<varlistentry>
<term>Query.sortby(fieldname, ascending=True)</term>
<listitem><para>Sort results
by <replaceable>fieldname</replaceable>, in ascending
or descending order. Must be called before executing
the search.</para></listitem>
</varlistentry>

<sect5 id="RCL.PROGRAM.PYTHONAPI.RECOLL.CLASSES.DB">
<title>The Db class</title>
<varlistentry>
<term>Query.execute(query_string, stemming=1,
stemlang="english", fetchtext=False)</term>
<listitem><para>Starts a search
for <replaceable>query_string</replaceable>, a &RCL;
search language string. If the index stores the document
texts and <literal>fetchtext</literal> is True, the
extracted document text is stored in
<literal>doc.text</literal>.</para></listitem>
</varlistentry>

<para>A <literal>Db</literal> object is created by
a <literal>connect()</literal> call and holds a
connection to a Recoll index.</para>
<variablelist>
<varlistentry>
<term>Db.close()</term>
<listitem><para>Closes the connection. The <literal>Db</literal>
object cannot be used after
this.</para></listitem>
</varlistentry>
<varlistentry>
<term>Db.query(), Db.cursor()</term> <listitem><para>These
aliases return a blank <literal>Query</literal> object
for this index.</para></listitem>
</varlistentry>
<varlistentry>
<term>Query.executesd(SearchData, fetchtext=False)</term>
<listitem><para>Starts a search for the query defined by
the <literal>SearchData</literal> object. If the index stores the document
texts and <literal>fetchtext</literal> is True, the
extracted document text is stored in
<literal>doc.text</literal>.</para></listitem>
</varlistentry>

<varlistentry>
<term>Db.setAbstractParams(maxchars,
contextwords)</term> <listitem><para>Set the parameters used
to build snippets (sets of keywords in context text
fragments). <literal>maxchars</literal> defines the
maximum total size of the abstract.
<literal>contextwords</literal> defines how many
terms are shown around the keyword.</para></listitem>
</varlistentry>

<varlistentry>
<term>Db.termMatch(match_type, expr, field='',
maxlen=-1, casesens=False, diacsens=False, lang='english')
</term>
<listitem><para>Expand an expression against the
index term list. This performs the basic function of the
GUI term explorer tool. <literal>match_type</literal>
can be one
of <literal>wildcard</literal>, <literal>regexp</literal>
or <literal>stem</literal>. Returns a list of terms
expanded from the input expression.
</para></listitem>
</varlistentry>

</variablelist>

</sect5>


<sect5 id="RCL.PROGRAM.PYTHONAPI.RECOLL.CLASSES.QUERY">
<title>The Query class</title>

<para>A <literal>Query</literal> object (equivalent to a
cursor in the Python DB API) is created by
a <literal>Db.query()</literal> call. It is used to
execute index searches.</para>

<variablelist>

<varlistentry>
<term>Query.sortby(fieldname, ascending=True)</term>
<listitem><para>Sort results
by <replaceable>fieldname</replaceable>, in ascending
or descending order. Must be called before executing
the search.</para></listitem>
</varlistentry>

<varlistentry>
<term>Query.execute(query_string, stemming=1,
stemlang="english", fetchtext=False)</term>
<listitem><para>Starts a search
for <replaceable>query_string</replaceable>, a &RCL;
search language string. If the index stores the document
texts and <literal>fetchtext</literal> is True, the
extracted document text is stored in
<literal>doc.text</literal>.</para></listitem>
</varlistentry>

<varlistentry>
<term>Query.executesd(SearchData, fetchtext=False)</term>
<listitem><para>Starts a search for the query defined by
the <literal>SearchData</literal> object. If the index stores the document
texts and <literal>fetchtext</literal> is True, the
extracted document text is stored in
<literal>doc.text</literal>.</para></listitem>
</varlistentry>

<varlistentry>
<term>Query.fetchmany(size=query.arraysize)</term>

<listitem><para>Fetches
the next <literal>Doc</literal> objects in the current
search results, and returns them as an array of the
required size, which is by default the value of
the <literal>arraysize</literal> data member.</para></listitem>
</varlistentry>

<varlistentry>
<term>Query.fetchone()</term> <listitem><para>Fetches the
next <literal>Doc</literal> object from the current
search results. Raises a <literal>StopIteration</literal> exception if
there are no results left.</para></listitem>
</varlistentry>

<varlistentry>
<term>Query.close()</term>
<listitem><para>Closes the query. The object is unusable
after the call.</para></listitem>
</varlistentry>

<varlistentry>
<term>Query.scroll(value, mode='relative')</term>
<listitem><para>Adjusts the position in the current result
set. <literal>mode</literal> can
be <literal>relative</literal>
or <literal>absolute</literal>.</para></listitem>
</varlistentry>

<varlistentry>
<term>Query.getgroups()</term>
<listitem><para>Retrieves the expanded query terms as a list
of pairs. Meaningful only after <literal>execute()</literal> or
<literal>executesd()</literal>. In each
pair, the first entry is a list of user terms (of size
one for simple terms, or more for group and phrase
clauses), the second a list of query terms as derived
from the user terms and used in the Xapian
Query.</para></listitem>
</varlistentry>

<varlistentry>
<term>Query.getxquery()</term>
<listitem><para>Return the Xapian query description as a
Unicode string.
Meaningful only after <literal>execute()</literal> or
<literal>executesd()</literal>.</para></listitem>
</varlistentry>

<varlistentry>
<term>Query.highlight(text, ishtml=0, methods=object)</term>
<listitem><para>Inserts &lt;span class="rclmatch"&gt; and
&lt;/span&gt; tags around the match areas in the input text
and returns the modified text. <literal>ishtml</literal>
can be set to indicate that the input text is HTML and
that HTML special characters should not be escaped.
If set, <literal>methods</literal> should be an object
with methods startMatch(i) and endMatch(), which will be
called for each match and should return the begin and end
tags.</para></listitem>
</varlistentry>

<varlistentry>
<term>Query.makedocabstract(doc, methods=object)</term>
<listitem><para>Create a snippets abstract
for <literal>doc</literal> (a <literal>Doc</literal>
object) by selecting text around the match terms.
If <literal>methods</literal> is set, highlighting is also performed. See
the <literal>highlight()</literal> method.
</para></listitem>
</varlistentry>

<varlistentry>
<term>Query.__iter__() and Query.next()</term>
<listitem><para>These make the <literal>Query</literal> object
iterable, so that constructs like <literal>for doc in
query:</literal> work.</para></listitem>
</varlistentry>
</variablelist>
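<para>The order of calls described above can be sketched as follows: <literal>sortby()</literal> before <literal>execute()</literal>, then iteration over the <literal>Query</literal> object. The <literal>search_urls()</literal> helper is hypothetical. As in the introductory example, it assumes that <literal>execute()</literal> returns the number of results.</para>

```python
from itertools import islice


def search_urls(db, query_string, sort_field=None, descending=False, limit=10):
    """Return (result count, URLs of the first `limit` results).

    Hypothetical helper built on Db.query(), Query.sortby(),
    Query.execute() and iteration over the Query object.
    """
    query = db.query()
    if sort_field:
        # Sorting must be requested before the search is executed.
        query.sortby(sort_field, ascending=not descending)
    nres = query.execute(query_string)
    # Iteration relies on Query.__iter__()/next(); islice stops early.
    urls = [doc.url for doc in islice(query, limit)]
    return nres, urls


# Usage:
# nres, urls = search_urls(db, "some query", sort_field="filename")
```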

<variablelist>

<varlistentry><term>Query.arraysize</term>
<listitem><para>Default number of records processed by fetchmany
(r/w).</para></listitem>
</varlistentry>
<varlistentry><term>Query.rowcount</term><listitem><para>Number
of records returned by the last
execute.</para></listitem></varlistentry>
<varlistentry><term>Query.rownumber</term><listitem><para>Next index
to be fetched from results. Normally increments after
each fetchone() call, but can be set/reset before the
call to effect seeking (equivalent to
using <literal>scroll()</literal>). Starts at
0.</para></listitem>
</varlistentry>

</variablelist>
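<para>The <literal>rowcount</literal> attribute and the <literal>scroll()</literal> and <literal>fetchmany()</literal> methods are enough to display results one page at a time. The sketch below assumes that <literal>execute()</literal> has already been called. <literal>fetch_page()</literal> and its 0-based page numbering are hypothetical.</para>

```python
def fetch_page(query, page, page_size=20):
    """Return the Doc objects for 0-based result page `page`.

    Hypothetical helper: `query` is a recoll Query object on which
    execute() has already been called.
    """
    start = page * page_size
    if page < 0 or start >= query.rowcount:
        return []
    # Position the cursor on the first result of the page, then read it.
    query.scroll(start, mode="absolute")
    return query.fetchmany(size=min(page_size, query.rowcount - start))


# Usage:
# nres = query.execute("some query")
# for doc in fetch_page(query, 1):
#     print(doc.url)
```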

</sect5>


<sect5 id="RCL.PROGRAM.PYTHONAPI.RECOLL.CLASSES.DOC">
<title>The Doc class</title>

<para>A <literal>Doc</literal> object contains index data
for a given document. The data is extracted from the
index when searching, or set by the indexer program when
updating. The Doc object has many attributes to be read or
set by its user. It matches exactly the Rcl::Doc C++
object. Some of the attributes are predefined, but,
especially when indexing, others can be set, the name of
which will be processed as field names by the indexing
configuration. Inputs can be specified as Unicode or
strings. Outputs are Unicode objects. All dates are
specified as Unix timestamps, printed as strings. Please
refer to the <filename>rcldb/rcldoc.cpp</filename> C++ file
for a full description of the predefined attributes. Here
follows a short list.</para>

<para><itemizedlist>
<listitem><para><literal>url</literal> the document URL (see
also <literal>getbinurl()</literal>).</para></listitem>
<varlistentry>
<term>Query.fetchmany(size=query.arraysize)</term>

<listitem><para><literal>ipath</literal> the document
<literal>ipath</literal> for embedded
documents.</para></listitem>
<listitem><para>Fetches
the next <literal>Doc</literal> objects in the current
search results, and returns them as an array of the
required size, which is by default the value of
the <literal>arraysize</literal> data member.</para></listitem>
</varlistentry>

<listitem><para><literal>fbytes, dbytes</literal> the document
file size and text size.</para></listitem>
<listitem><para><literal>fmtime, dmtime</literal> the file
modification time and the document date (which may come from
the document data, for example an email date).</para></listitem>

<listitem><para><literal>xdocid</literal> the
Xapian document ID. This is useful if you want to access
the document through a direct Xapian
operation.</para></listitem>
<varlistentry>
<term>Query.fetchone()</term> <listitem><para>Fetches the
next <literal>Doc</literal> object from the current
search results. Raises a <literal>StopIteration</literal> exception if
there are no results left.</para></listitem>
</varlistentry>

<listitem><para><literal>mtype</literal> the document
MIME type.</para></listitem>
<varlistentry>
<term>Query.close()</term>
<listitem><para>Closes the query. The object is unusable
after the call.</para></listitem>
</varlistentry>

<listitem><para>Fields stored by default:
<literal>author</literal>, <literal>filename</literal>,
<literal>keywords</literal>,
<literal>recipient</literal>.</para></listitem>
<varlistentry>
<term>Query.scroll(value, mode='relative')</term>
<listitem><para>Adjusts the position in the current result
set. <literal>mode</literal> can
be <literal>relative</literal>
or <literal>absolute</literal>.</para></listitem>
</varlistentry>

</itemizedlist>
</para>

<para>At query time, only the fields that are defined
as <literal>stored</literal> either by default or in
the <filename>fields</filename> configuration file will be
meaningful in the <literal>Doc</literal>
object. In particular, the document text will not be
available. See the <literal>rclextract</literal>
module for accessing document contents.</para>
<varlistentry>
<term>Query.getgroups()</term>
<listitem><para>Retrieves the expanded query terms as a list
of pairs. Meaningful only after <literal>execute()</literal> or
<literal>executesd()</literal>. In each
pair, the first entry is a list of user terms (of size
one for simple terms, or more for group and phrase
clauses), the second a list of query terms as derived
from the user terms and used in the Xapian
Query.</para></listitem>
</varlistentry>

<varlistentry>
<term>Query.getxquery()</term>
<listitem><para>Return the Xapian query description as a
Unicode string.
Meaningful only after <literal>execute()</literal> or
<literal>executesd()</literal>.</para></listitem>
</varlistentry>

<variablelist>
<varlistentry>
<term>Query.highlight(text, ishtml=0, methods=object)</term>
<listitem><para>Inserts &lt;span class="rclmatch"&gt; and
&lt;/span&gt; tags around the match areas in the input text
and returns the modified text. <literal>ishtml</literal>
can be set to indicate that the input text is HTML and
that HTML special characters should not be escaped.
If set, <literal>methods</literal> should be an object
with methods startMatch(i) and endMatch(), which will be
called for each match and should return the begin and end
tags.</para></listitem>
</varlistentry>

<varlistentry>
<term>get(key), [] operator</term>
<varlistentry>
<term>Query.makedocabstract(doc, methods=object)</term>
<listitem><para>Create a snippets abstract
for <literal>doc</literal> (a <literal>Doc</literal>
object) by selecting text around the match terms.
If <literal>methods</literal> is set, highlighting is also performed. See
the <literal>highlight()</literal> method.
</para></listitem>
</varlistentry>

<varlistentry>
<term>Query.__iter__() and Query.next()</term>
<listitem><para>These make the <literal>Query</literal> object
iterable, so that constructs like <literal>for doc in
query:</literal> work.</para></listitem>
</varlistentry>
</variablelist>
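<para>The <literal>methods</literal> argument of <literal>highlight()</literal> and <literal>makedocabstract()</literal> is an object providing <literal>startMatch(i)</literal> and <literal>endMatch()</literal>. A minimal sketch of such an object follows. The class name, the <literal>&lt;b&gt;</literal> markup and the CSS class naming are arbitrary choices. The match index is converted with <literal>str()</literal> because its exact type is not specified here.</para>

```python
import html


class BoldHighlighter:
    """Highlighting callbacks for Query.highlight()/makedocabstract().

    startMatch(i) returns the opening tag for a match, endMatch()
    returns the closing tag. The markup used here is an arbitrary choice.
    """

    def startMatch(self, i):
        # Use the match index to build a per-term CSS class name.
        return '<b class="term%s">' % html.escape(str(i), quote=True)

    def endMatch(self):
        return "</b>"


# Usage, with a Query on which execute() has been called:
# for doc in query:
#     print(query.makedocabstract(doc, methods=BoldHighlighter()))
```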

<listitem><para>Retrieve the named document
attribute. You can also use <literal>getattr(doc,
key)</literal> or
<literal>doc.key</literal>.</para></listitem>
</varlistentry>
<variablelist>

<varlistentry>
<term>doc.key = value</term>
<varlistentry><term>Query.arraysize</term>
<listitem><para>Default number of records processed by fetchmany
(r/w).</para></listitem>
</varlistentry>
<varlistentry><term>Query.rowcount</term><listitem><para>Number
of records returned by the last
execute.</para></listitem></varlistentry>
<varlistentry><term>Query.rownumber</term><listitem><para>Next index
to be fetched from results. Normally increments after
each fetchone() call, but can be set/reset before the
call to effect seeking (equivalent to
using <literal>scroll()</literal>). Starts at
0.</para></listitem>
</varlistentry>

<listitem><para>Set the named document attribute. You
can also use <literal>setattr(doc, key,
value)</literal>.</para></listitem>
</varlistentry>
</variablelist>

<varlistentry>
<term>getbinurl()</term>
</simplesect>
<simplesect id="RCL.PROGRAM.PYTHONAPI.RECOLL.CLASSES.DOC">
<title>The Doc class</title>

<listitem><para>Retrieve the URL in byte array format (no
transcoding), for use as a parameter to a system
call.</para></listitem>
</varlistentry>
<para>A <literal>Doc</literal> object contains index data
for a given document. The data is extracted from the
index when searching, or set by the indexer program when
updating. The Doc object has many attributes to be read or
set by its user. It mostly matches the Rcl::Doc C++
object. Some of the attributes are predefined, but,
especially when indexing, others can be set, the name of
which will be processed as field names by the indexing
configuration. Inputs can be specified as Unicode or
strings. Outputs are Unicode objects. All dates are
specified as Unix timestamps, printed as strings. Please
refer to the <filename>rcldb/rcldoc.cpp</filename> C++ file
for a full description of the predefined attributes. Here
follows a short list.</para>

<varlistentry>
<term>setbinurl(url)</term>
<para><itemizedlist>
<listitem><para><literal>url</literal> the document URL (see
also <literal>getbinurl()</literal>).</para></listitem>

<listitem><para><literal>ipath</literal> the document
<literal>ipath</literal> for embedded
documents.</para></listitem>

<listitem><para>Set the URL in byte array format (no
transcoding).</para></listitem>
</varlistentry>
<listitem><para><literal>fbytes, dbytes</literal> the document
file size and text size.</para></listitem>
<listitem><para><literal>fmtime, dmtime</literal> the file
modification time and the document date (which may come from
the document data, for example an email date).</para></listitem>

<listitem><para><literal>xdocid</literal> the
Xapian document ID. This is useful if you want to access
the document through a direct Xapian
operation.</para></listitem>

<varlistentry>
<term>items()</term>
<listitem><para>Return a dictionary of doc object
keys/values.</para></listitem>
</varlistentry>
<listitem><para><literal>mtype</literal> the document
MIME type.</para></listitem>

<varlistentry>
<term>keys()</term>
<listitem><para>Return a list of doc object keys (attribute
names).</para></listitem>
</varlistentry>
</variablelist>
<listitem><para>Fields stored by default:
<literal>author</literal>, <literal>filename</literal>,
<literal>keywords</literal>,
<literal>recipient</literal>.</para></listitem>

</sect5> <!-- Doc -->
</itemizedlist>
</para>

<para>At query time, only the fields that are defined as
<literal>stored</literal> either by default or in the
<filename>fields</filename> configuration file will be meaningful
in the <literal>Doc</literal> object. The processed document text
may or may not be present, depending on whether the index stores
the text at all and, if it does, on the <literal>fetchtext</literal>
option of the query execute methods. See also the <literal>rclextract</literal> module
for accessing document contents.</para>

<sect5 id="RCL.PROGRAM.PYTHONAPI.RECOLL.CLASSES.SEARCHDATA">
<title>The SearchData class</title>
<variablelist>

<para>A <literal>SearchData</literal> object allows building
a query by combining clauses, for execution
by <literal>Query.executesd()</literal>. It can be used
instead of the query language. The
interface is expected to change somewhat, so it is not documented
in detail for now.</para>
<varlistentry>
<term>get(key), [] operator</term>

<variablelist>
<listitem><para>Retrieve the named document
attribute. You can also use <literal>getattr(doc,
key)</literal> or
<literal>doc.key</literal>.</para></listitem>
</varlistentry>

<varlistentry>
<term>addclause(type='and'|'or'|'excl'|'phrase'|'near'|'sub',
qstring=string, slack=0, field='', stemming=1,
subSearch=SearchData)</term>
<listitem><para></para></listitem>
</varlistentry>
</variablelist>
<varlistentry>
<term>doc.key = value</term>

</sect5> <!-- SearchData -->
<listitem><para>Set the named document attribute. You
can also use <literal>setattr(doc, key,
value)</literal>.</para></listitem>
</varlistentry>

</sect4> <!-- recoll.classes -->
</sect3> <!-- Recoll module -->
<varlistentry>
<term>getbinurl()</term>

<listitem><para>Retrieve the URL in byte array format (no
transcoding), for use as a parameter to a system
call.</para></listitem>
</varlistentry>

<varlistentry>
<term>setbinurl(url)</term>

<listitem><para>Set the URL in byte array format (no
transcoding).</para></listitem>
</varlistentry>

<varlistentry>
<term>items()</term>
<listitem><para>Return a dictionary of doc object
keys/values.</para></listitem>
</varlistentry>

<varlistentry>
<term>keys()</term>
<listitem><para>Return a list of doc object keys (attribute
names).</para></listitem>
</varlistentry>
</variablelist>
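<para>Because <literal>getbinurl()</literal> performs no transcoding, it is the safer way to obtain a file name to pass to a system call. The sketch below handles only top-level documents with a <literal>file://</literal> URL (empty <literal>ipath</literal>). <literal>doc_local_path()</literal> is a hypothetical helper that returns <literal>None</literal> in all other cases.</para>

```python
FILE_PREFIX = b"file://"


def doc_local_path(doc):
    """Return the file system path of `doc` as bytes, or None.

    Hypothetical helper. Embedded documents (non-empty ipath) and
    non-file URLs have no directly usable path, so None is returned.
    """
    url = doc.getbinurl()  # bytes, no transcoding
    if doc.ipath or not url.startswith(FILE_PREFIX):
        return None
    return url[len(FILE_PREFIX):]


# Usage:
# path = doc_local_path(doc)
# if path is not None:
#     subprocess.run([b"xdg-open", path])
```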

</simplesect> <!-- Doc -->

<simplesect id="RCL.PROGRAM.PYTHONAPI.RECOLL.CLASSES.SEARCHDATA">
<title>The SearchData class</title>

<para>A <literal>SearchData</literal> object allows building
a query by combining clauses, for execution
by <literal>Query.executesd()</literal>. It can be used
instead of the query language. The
interface is expected to change somewhat, so it is not documented
in detail for now.</para>

<variablelist>

<varlistentry>
<term>addclause(type='and'|'or'|'excl'|'phrase'|'near'|'sub',
qstring=string, slack=0, field='', stemming=1,
subSearch=SearchData)</term>
<listitem><para></para></listitem>
</varlistentry>
</variablelist>

</simplesect> <!-- SearchData -->

</sect3> <!-- Recoll module -->

<sect3 id="RCL.PROGRAM.PYTHONAPI.RCLEXTRACT">
<title>The rclextract module</title>


<para>Prior to &RCL; 1.25, index queries never provide document
content because it is not stored. More recent versions usually
<para>Prior to &RCL; 1.25, index queries could not provide document
content because it was never stored. &RCL; 1.25 and later usually
store the document text, which can be optionally retrieved when
running a query (see <literal>query.execute()</literal>
above; the result is always plain text).</para>
@ -5506,7 +5494,7 @@ recollindex -c "$confdir"
<para>You need to import the <literal>recoll</literal> module
before the <literal>rclextract</literal> module.</para>

<sect4 id="RCL.PROGRAM.PYTHONAPI.RCLEXTRACT.CLASSES.EXTRACTOR">
<simplesect id="RCL.PROGRAM.PYTHONAPI.RCLEXTRACT.CLASSES.EXTRACTOR">
<title>The Extractor class</title>

<variablelist>
@ -5565,7 +5553,7 @@ not doc.ipath and (not "rclbes" in doc.keys() or doc["rclbes"] == "FS")

</variablelist>

</sect4>
</simplesect>
</sect3> <!-- rclextract module -->