added python api doc
parent
d282f8a838
commit
583877757a
@ -24,7 +24,7 @@
|
||||
Dockes</holder>
|
||||
</copyright>
|
||||
|
||||
<releaseinfo>$Id: usermanual.sgml,v 1.66 2008-10-08 16:12:36 dockes Exp $</releaseinfo>
|
||||
<releaseinfo>$Id: usermanual.sgml,v 1.67 2008-10-10 08:19:12 dockes Exp $</releaseinfo>
|
||||
|
||||
<abstract>
|
||||
<para>This document introduces full text search notions
|
||||
@ -1575,12 +1575,329 @@ fvwm
|
||||
<para>Your main database (the one the current configuration
|
||||
indexes to), is always implicitly active. If this is not
|
||||
desirable, you can set up your configuration so that it indexes,
|
||||
for example, an empty directory.</para>
|
||||
for example, an empty directory. An alternative indexer may also
|
||||
need to implement a way of purging the index from stale data.
|
||||
</para>
|
||||
|
||||
</sect1>
|
||||
|
||||
</chapter>
|
||||
|
||||
<chapter id="rcl.program">
|
||||
<title>Programming interface</title>
|
||||
|
||||
<sect1 id="rcl.program.elements">
|
||||
<title>Interface elements</title>
|
||||
|
||||
<para>A few elements in the interface are specific and need
|
||||
an explanation.</para>
|
||||
|
||||
<variablelist>
|
||||
|
||||
<varlistentry>
|
||||
<term>udi</term> <listitem><para>A udi (unique document
|
||||
identifier) identifies a document. Because of limitations
|
||||
inside the index engine, it is restricted in length (to
|
||||
200 bytes), which is why a regular URI cannot be used. The
|
||||
structure and contents of the udi are defined by the
|
||||
application and opaque to the index engine. For example,
|
||||
the internal file system indexer uses the complete
|
||||
document path (file path + internal path), truncated to
|
||||
length, the suppressed part being replaced by a hash
|
||||
value.</para> </listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term>ipath</term>
|
||||
|
||||
<listitem><para>This data value (set as a field in the Doc
|
||||
object) is stored, along with the URL, but not indexed by
|
||||
&RCL;. Its contents are not interpreted, and its use is up
|
||||
to the application. For example, the &RCL; internal file
|
||||
system indexer stores the part of the document access path
|
||||
internal to the container file (<literal>ipath</literal> in
|
||||
this case is a list of subdocument sequential numbers). The url
|
||||
and ipath are returned in every search result and permit
|
||||
access to the original document.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term>Stored and indexed fields</term>
|
||||
|
||||
<listitem><para>The <filename>fields</filename> file inside
|
||||
the &RCL; configuration defines which document fields are
|
||||
either "indexed" (searchable), "stored" (retrievable with
|
||||
search results), or both.</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
</variablelist>
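<para>The udi length limit can be illustrated with a minimal sketch.
The helper below is hypothetical: the name <literal>make_udi</literal>,
the <literal>|</literal> separator and the use of SHA-1 are illustrative
assumptions, not the scheme actually used by the &RCL; file system
indexer. It only shows the general idea of truncating a long path and
replacing the suppressed part with a hash value.</para>

```python
import hashlib

UDI_MAX_BYTES = 200  # length limit stated in the udi description


def make_udi(file_path, ipath="", max_bytes=UDI_MAX_BYTES):
    """Build a bounded-length identifier from a file path and an internal path.

    Hypothetical helper: separator, hash function and layout are
    illustrative choices, not the ones used by the Recoll indexer.
    """
    full = (file_path + "|" + ipath).encode("utf-8")
    if len(full) <= max_bytes:
        return full
    # Replace the suppressed tail with a fixed-size digest of the whole
    # path, so that two long paths sharing a prefix still get distinct ids.
    digest = hashlib.sha1(full).hexdigest().encode("ascii")  # 40 bytes
    return full[: max_bytes - len(digest)] + digest
```

<para>Because the identifier is opaque to the index engine, any scheme
works as long as it is stable for a given document and unique across
documents.</para>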
|
||||
|
||||
<para>Data for an external indexer should be stored in a
|
||||
separate index, not the one for the &RCL; internal file system
|
||||
indexer (except if the latter is not used at all). The reason
|
||||
is that the main document indexer purge pass would remove all
|
||||
the other indexer's documents, as they were not seen during
|
||||
indexing. The main indexer documents would also probably be a
|
||||
problem for the external indexer purge operation.</para>
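<para>The purge conflict can be reproduced with a small in-memory
model. <literal>ToyIndex</literal> and <literal>run_pass</literal> are
stand-ins written for this illustration, not part of the recoll module;
they only mimic the rule that a purge removes every document that was
not checked during the current pass.</para>

```python
class ToyIndex:
    """In-memory stand-in for an index with a purge pass (not the recoll module)."""

    def __init__(self):
        self.docs = {}      # udi -> signature
        self.seen = set()   # udis checked during the current pass

    def need_update(self, udi, sig):
        # Checking a document marks it as still present in primary storage.
        self.seen.add(udi)
        return self.docs.get(udi) != sig

    def add_or_update(self, udi, sig):
        self.docs[udi] = sig
        self.seen.add(udi)

    def purge(self):
        # Drop every document that was not checked during this pass.
        stale = set(self.docs) - self.seen
        for udi in stale:
            del self.docs[udi]
        self.seen = set()
        return sorted(stale)


def run_pass(index, source):
    """Index every (udi, sig) pair from one source, then purge."""
    for udi, sig in source.items():
        if index.need_update(udi, sig):
            index.add_or_update(udi, sig)
    return index.purge()


shared = ToyIndex()
files = {"/home/me/a.txt": "sig1"}   # seen by the file system indexer
mails = {"imap:inbox/42": "sig9"}    # seen by a hypothetical external indexer

run_pass(shared, files)
removed_by_mail_pass = run_pass(shared, mails)   # purges the file entry
removed_by_file_pass = run_pass(shared, files)   # purges the mail entry
```

<para>Each indexer removes the other's documents on every pass, which
is why each one needs its own index.</para>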
|
||||
|
||||
</sect1>
|
||||
|
||||
<sect1 id="rcl.program.python">
|
||||
<title>Python interface</title>
|
||||
|
||||
<sect2 id="rcl.program.python.intro">
|
||||
<title>Introduction</title>
|
||||
|
||||
<para>&RCL; versions after 1.11 define a Python programming
|
||||
interface, both for searching and indexing.</para>
|
||||
|
||||
<para>The Python interface is not built by default and can be
|
||||
found in the source package, under python/recoll. The
|
||||
directory contains the usual <filename>setup.py</filename>
|
||||
script which you can use to build and install the
|
||||
module:
|
||||
|
||||
<screen>
|
||||
<userinput>cd recoll-xxx/python/recoll</userinput>
|
||||
<userinput>python setup.py build</userinput>
|
||||
<userinput>python setup.py install</userinput>
|
||||
</screen>
|
||||
</para>
|
||||
|
||||
</sect2>
|
||||
|
||||
|
||||
<sect2 id="rcl.program.python.manual">
|
||||
<title>Interface manual</title>
|
||||
|
||||
<literalLayout>
|
||||
NAME
|
||||
recoll - This is an interface to the Recoll full text indexer.
|
||||
|
||||
FILE
|
||||
/usr/local/lib/python2.5/site-packages/recoll.so
|
||||
|
||||
CLASSES
|
||||
Db
|
||||
Doc
|
||||
Query
|
||||
SearchData
|
||||
|
||||
class Db(__builtin__.object)
|
||||
| Db([confdir=None], [extra_dbs=None], [writable = False])
|
||||
|
|
||||
| A Db object holds a connection to a Recoll index. Use the connect()
|
||||
| function to create one.
|
||||
| confdir specifies a Recoll configuration directory (default:
|
||||
| $RECOLL_CONFDIR or ~/.recoll).
|
||||
| extra_dbs is a list of external databases (xapian directories)
|
||||
| writable decides if we can index new data through this connection
|
||||
|
|
||||
| Methods defined here:
|
||||
|
|
||||
|
|
||||
| addOrUpdate(...)
|
||||
| addOrUpdate(udi, doc, parent_udi=None) -> None
|
||||
| Add or update index data for a given document
|
||||
| The udi string must define a unique id for the document. It is not
|
||||
| interpreted inside Recoll
|
||||
| doc is a Doc object
|
||||
| if parent_udi is set, this is a unique identifier for the
|
||||
| top-level container (e.g. an mbox file)
|
||||
|
|
||||
| delete(...)
|
||||
| delete(udi) -> Bool.
|
||||
| Purge index from all data for udi. If udi matches a container
|
||||
| document, purge all subdocs (docs with a parent_udi matching udi).
|
||||
|
|
||||
| makeDocAbstract(...)
|
||||
| makeDocAbstract(Doc, Query) -> string
|
||||
| Build and return 'keyword-in-context' abstract for document
|
||||
| and query.
|
||||
|
|
||||
| needUpdate(...)
|
||||
| needUpdate(udi, sig) -> Bool.
|
||||
| Check if the index is up to date for the document defined by udi,
|
||||
| having the current signature sig.
|
||||
|
|
||||
| purge(...)
|
||||
| purge() -> Bool.
|
||||
| Delete all documents that were not touched during the just finished
|
||||
| indexing pass (since open-for-write). These are the documents for
|
||||
| which the needUpdate() call was not performed, indicating that they no
|
||||
| longer exist in the primary storage system.
|
||||
|
|
||||
| query(...)
|
||||
| query() -> Query. Return a new, blank query object for this index.
|
||||
|
|
||||
| setAbstractParams(...)
|
||||
| setAbstractParams(maxchars, contextwords).
|
||||
| Set the parameters used to build 'keyword-in-context' abstracts
|
||||
|
|
||||
| ----------------------------------------------------------------------
|
||||
| Data and other attributes defined here:
|
||||
|
|
||||
|
||||
class Doc(__builtin__.object)
|
||||
| Doc()
|
||||
|
|
||||
| A Doc object contains index data for a given document.
|
||||
| The data is extracted from the index when searching, or set by the
|
||||
| indexer program when updating. The Doc object has no useful methods but
|
||||
| many attributes to be read or set by its user. It matches exactly the
|
||||
| Rcl::Doc C++ object. Some of the attributes are predefined, but,
|
||||
| especially when indexing, others can be set, the name of which will be
|
||||
| processed as field names by the indexing configuration.
|
||||
| Inputs can be specified as unicode or strings.
|
||||
| Outputs are unicode objects.
|
||||
| All dates are specified as unix timestamps, printed as strings
|
||||
| Predefined attributes (index/query/both):
|
||||
| text (index): document plain text
|
||||
| url (both)
|
||||
| fbytes (both) optional file size in bytes
|
||||
| filename (both)
|
||||
| fmtime (both) optional file modification date. Unix time printed
|
||||
| as string
|
||||
| dbytes (both) document text bytes
|
||||
| dmtime (both) document creation/modification date
|
||||
| ipath (both) value private to the app.: internal access path
|
||||
| inside file
|
||||
| mtype (both) mime type for original document
|
||||
| mtime (query) dmtime if set else fmtime
|
||||
| origcharset (both) charset the text was converted from
|
||||
| size (query) dbytes if set, else fbytes
|
||||
| sig (both) app-defined file modification signature.
|
||||
| For up to date checks
|
||||
| relevancyrating (query)
|
||||
| abstract (both)
|
||||
| author (both)
|
||||
| title (both)
|
||||
| keywords (both)
|
||||
|
|
||||
| Methods defined here:
|
||||
|
|
||||
|
|
||||
| ----------------------------------------------------------------------
|
||||
| Data and other attributes defined here:
|
||||
|
|
||||
|
||||
class Query(__builtin__.object)
|
||||
| Recoll Query objects are used to execute index searches.
|
||||
| They must be created by the Db.query() method.
|
||||
|
|
||||
| Methods defined here:
|
||||
|
|
||||
|
|
||||
| execute(...)
|
||||
| execute(query_string, stemming=1|0)
|
||||
|
|
||||
| Starts a search for query_string, a Recoll search language string
|
||||
| (mostly Xesam-compatible).
|
||||
| The query can be a simple list of terms (and'ed by default), or more
|
||||
| complicated with field specs etc. See the Recoll manual.
|
||||
|
|
||||
| executesd(...)
|
||||
| executesd(SearchData)
|
||||
|
|
||||
| Starts a search for the query defined by the SearchData object.
|
||||
|
|
||||
| fetchone(...)
|
||||
| fetchone(None) -> Doc
|
||||
|
|
||||
| Fetches the next Doc object in the current search results.
|
||||
|
|
||||
| sortby(...)
|
||||
| sortby(field=fieldname, ascending=True)
|
||||
| Sort results by 'fieldname', in ascending or descending order.
|
||||
| Only one field can be used, no subsorts for now.
|
||||
| Must be called before executing the search
|
||||
|
|
||||
| ----------------------------------------------------------------------
|
||||
| Data descriptors defined here:
|
||||
|
|
||||
| next
|
||||
| Next index to be fetched from results. Normally increments after
|
||||
| each fetchone() call, but can be set/reset before the call to effect
|
||||
| seeking. Starts at 0
|
||||
|
|
||||
| ----------------------------------------------------------------------
|
||||
| Data and other attributes defined here:
|
||||
|
|
||||
|
||||
class SearchData(__builtin__.object)
|
||||
| SearchData()
|
||||
|
|
||||
| A SearchData object describes a query. It has a number of global
|
||||
| parameters and a chain of search clauses.
|
||||
|
|
||||
| Methods defined here:
|
||||
|
|
||||
|
|
||||
| addclause(...)
|
||||
| addclause(type='and'|'or'|'excl'|'phrase'|'near'|'sub',
|
||||
| qstring=string, slack=int, field=string, stemming=1|0,
|
||||
| subSearch=SearchData)
|
||||
| Adds a simple clause to the SearchData And/Or chain, or a subquery
|
||||
| defined by another SearchData object
|
||||
|
|
||||
| ----------------------------------------------------------------------
|
||||
| Data and other attributes defined here:
|
||||
|
|
||||
|
||||
FUNCTIONS
|
||||
connect(...)
|
||||
connect([confdir=None], [extra_dbs=None], [writable = False])
|
||||
-> Db.
|
||||
|
||||
Connects to a Recoll database and returns a Db object.
|
||||
confdir specifies a Recoll configuration directory
|
||||
(the default is determined as for any Recoll program).
|
||||
extra_dbs is a list of external databases (xapian directories)
|
||||
writable decides if we can index new data through this connection
|
||||
|
||||
|
||||
</literalLayout>
</sect2>
|
||||
|
||||
|
||||
<sect2 id="rcl.program.python.examples">
|
||||
<title>Example code</title>
|
||||
|
||||
<para>The following sample would query the index with a user
|
||||
language string. See the <filename>python/samples</filename>
|
||||
directory inside the &RCL; source for other examples.</para>
|
||||
|
||||
<programlisting>
|
||||
#!/usr/bin/env python

import recoll

# Open the default index read-only and tune abstract generation.
db = recoll.connect()
db.setAbstractParams(maxchars=80, contextwords=2)

query = db.query()
nres = query.execute("some user question")
print "Result count: ", nres

# Show at most the first 5 results.
if nres > 5:
    nres = 5
while query.next >= 0 and query.next < nres:
    doc = query.fetchone()
    print query.next
    for k in ("title", "size"):
        print k, ":", getattr(doc, k).encode('utf-8')
    # Keyword-in-context abstract for this document and query.
    abstract = db.makeDocAbstract(doc, query).encode('utf-8')
    print abstract
    print
|
||||
|
||||
|
||||
|
||||
</programlisting>
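<para>For the indexing side, the sketch below shows how the
<literal>needUpdate()</literal>, <literal>addOrUpdate()</literal> and
<literal>purge()</literal> methods listed in the interface manual might
fit together in one indexing pass. The function name
<literal>index_documents</literal> and its parameters are hypothetical,
and the code assumes that <literal>needUpdate()</literal> returns true
when the index entry is out of date, which the manual does not spell
out. It has not been checked against a real build of the module.</para>

```python
def index_documents(db, make_doc, items):
    """Run one indexing pass over items, an iterable of (udi, sig, fields).

    db is expected to offer needUpdate(), addOrUpdate() and purge() as
    listed in the interface manual; make_doc returns an empty Doc-like
    object whose attributes can be set freely.
    """
    updated = 0
    for udi, sig, fields in items:
        # Assumption: a true return value means the index entry is stale.
        if not db.needUpdate(udi, sig):
            continue
        doc = make_doc()
        doc.sig = sig
        for name, value in fields.items():
            setattr(doc, name, value)
        db.addOrUpdate(udi, doc)
        updated += 1
    # Remove documents that were not checked during this pass.
    db.purge()
    return updated
```

<para>With the real module, the call would presumably look like
<literal>index_documents(recoll.connect(writable=True), recoll.Doc, items)</literal>,
using a configuration whose index is separate from the main one.</para>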
|
||||
|
||||
</sect2>
|
||||
|
||||
</sect1>
|
||||
</chapter>
|
||||
|
||||
<chapter id="rcl.install">
|
||||
<title>Installation</title>
|
||||
|
||||