diff --git a/src/doc/user/Makefile b/src/doc/user/Makefile index b560abfa..31f337b1 100644 --- a/src/doc/user/Makefile +++ b/src/doc/user/Makefile @@ -17,8 +17,9 @@ XSLDIR="/usr/share/xml/docbook/stylesheet/docbook-xsl/" # Options common to the single-file and chunked versions commonoptions=--stringparam section.autolabel 1 \ - --stringparam section.autolabel.max.depth 3 \ + --stringparam section.autolabel.max.depth 2 \ --stringparam section.label.includes.component.label 1 \ + --stringparam toc.max.depth 3 \ --stringparam autotoc.label.in.hyperlink 0 \ --stringparam abstract.notitle.enabled 1 \ --stringparam html.stylesheet docbook-xsl.css \ diff --git a/src/doc/user/usermanual.html b/src/doc/user/usermanual.html index 7b6de588..95bc9522 100644 --- a/src/doc/user/usermanual.html +++ b/src/doc/user/usermanual.html @@ -1429,7 +1429,7 @@ alink="#0000FF"> other constraints. Most of the relevant parameters are described in the + "Parameters affecting how we generate terms and organize the index"> linked section.

The different search interfaces (GUI, command line, ...) have different methods to define the set of indexes @@ -2362,7 +2362,7 @@ recoll -c mondelaypatterns parameter in the configuration + "Miscellaneous parameters">configuration section.

@@ -2655,8 +2655,7 @@ recoll -c The format of the result list entries is entirely configurable by using the preference dialog to edit an HTML - fragment.

+ "The result list format">edit an HTML fragment.

You can click on the Query details link at the top of the results page to see the query actually performed, after stem expansion and @@ -2674,8 +2673,8 @@ recoll -c

3.1.2.1. No - results: the spelling suggestions

+ "RCL.SEARCH.GUI.RESLIST.SUGGS">No results: + the spelling suggestions @@ -2696,8 +2695,8 @@ recoll -c

3.1.2.2. The - result list right-click menu

+ "RCL.SEARCH.GUI.RESULTLIST.MENU">The result + list right-click menu @@ -2992,7 +2991,7 @@ recoll -c

3.1.6.1. Searching + "RCL.SEARCH.GUI.PREVIEW.SEARCH">Searching inside the preview

@@ -3153,8 +3152,7 @@ recoll -c Recoll keeps a history of searches. See Advanced search - history.

+ "Avanced search history">Advanced search history.

The dialog has two tabs:

    @@ -3184,7 +3182,7 @@ recoll -c

    3.1.8.1. Avanced + "RCL.SEARCH.GUI.COMPLEX.TERMS">Avanced search: the "find" tab

@@ -3256,7 +3254,7 @@ recoll -c

3.1.8.2. Avanced + "RCL.SEARCH.GUI.COMPLEX.FILTER">Avanced search: the "filter" tab

@@ -3324,7 +3322,7 @@ recoll -c

3.1.8.3. Avanced + "RCL.SEARCH.GUI.COMPLEX.HISTORY">Avanced search history

@@ -3590,8 +3588,8 @@ recoll -c

3.1.13.1. Terms - and search expansion

+ "RCL.SEARCH.GUI.TIPS.TERMS">Terms and search + expansion @@ -3654,8 +3652,8 @@ recoll -c

3.1.13.2. Working - with phrases and proximity

+ "RCL.SEARCH.GUI.TIPS.PHRASES">Working with + phrases and proximity @@ -3711,7 +3709,7 @@ recoll -c

3.1.13.3. Others

+ "RCL.SEARCH.GUI.TIPS.MISC">Others @@ -4019,8 +4017,8 @@ recoll -c result list - customisation section.

+ "The result list format">result list customisation + section.

  • result list - customisation section.

    + "The result list format">result list customisation + section.

  • Date format: @@ -4158,8 +4156,8 @@ recoll -c

    3.1.15.1. The - result list format

    + "RCL.SEARCH.GUI.CUSTOM.RESLIST">The result + list format @@ -4915,9 +4913,9 @@ recoll -c have a - look at an important limitation of wildcards in - path filters.

    + "Wildcards and path filtering">have a look at an + important limitation of wildcards in path + filters.

    Relative paths also make sense, for example, dir:share/doc would match either

    3.8.1.1. Wildcards - and path filtering

    + "RCL.SEARCH.WILDCARDS.PATH">Wildcards and + path filtering @@ -6382,12 +6380,12 @@ recollindex -c "$confdir" the result list by using the appropriate directive in the definition of the result list - paragraph format. All fields are displayed on the - fields screen of the preview window (which you can - reach through the right-click menu). This is - independant of the fact that the search which - produced the results used the field or not.

    + "The result list format">result list paragraph + format. All fields are displayed on the fields + screen of the preview window (which you can reach + through the right-click menu). This is independent of + whether the search which produced the results + used the field.

  • @@ -6423,14 +6421,16 @@ recollindex -c "$confdir" -

    Recoll versions after - 1.11 define a Python programming interface, both for - searching and creating/updating an index.

    -

    The search interface is used in the Recoll Ubuntu Unity Lens and the - Recoll Web UI. It can - run queries on any Recoll configuration.

    +

    The Recoll Python + programming interface can be used both for searching and + for creating/updating an index. Bindings exist for + Python2 and Python3.

    +

    The search interface is used in a number of active + projects: the Recoll + Gnome Shell Search + Provider, the Recoll Web UI, and the upmpdcli UPnP + Media Server, in addition to many small scripts.

    The index update section of the API may be used to create and update Recoll indexes on specific configurations (separate from the @@ -6467,6 +6467,23 @@ recollindex -c "$confdir" here. A paragraph at the end of this section will explain a few differences and ways to write code compatible with both versions.

    +

    The recoll package now + contains two modules:

    +
    +
      +
    • +

      The recoll module + contains functions and classes used to query (or + update) the index.

      +
    • +
    • +

      The rclextract + module contains functions and classes used at query + time to access document data.

      +
    • +
    +

    There is a good chance that your system repository has packages for the Recoll Python API, sometimes in a package separate from the main one (maybe named something @@ -6493,15 +6510,17 @@ recollindex -c "$confdir" nres = query.execute("some query") results = query.fetchmany(20) for doc in results: - print(doc.url, doc.title) + print("%s %s" % (doc.url, doc.title))

    You can also take a look at the source for the Recoll WebUI, or the upmpdcli local media server, which are - both based on the Python API.

    + "https://opensourceprojects.eu/p/recollwebui/code/ci/78ddb20787b2a894b5e4661a8d5502c4511cf71e/tree/" + target="_top">Recoll WebUI, the upmpdcli local media server, or the + Gnome Shell Search Provider.

    @@ -6604,11 +6623,19 @@ recollindex -c "$confdir"
    Stored and indexed fields
    -

    The fields file - inside the Recoll +

    The fields file inside the + Recoll configuration defines which document fields are - either "indexed" (searchable), "stored" - (retrievable with search results), or both.

    + either indexed + (searchable), stored + (retrievable with search results), or both. Apart + from a few standard/internal fields, only the + stored fields are + retrievable through the Python search + interface.

    @@ -6624,478 +6651,417 @@ recollindex -c "$confdir"
    -
    -
    -
    -
    -

    5.3.3.1. Recoll - package

    -
    -
    -
    -

    The recoll package - contains two modules:

    -
    -
      -
    • -

      The recoll module - contains functions and classes used to query (or - update) the index. This section will only - describe the query part, see further for the - update part.

      -
    • -
    • -

      The rclextract - module contains functions and classes used to - access document data.

      -
    • -
    -
    -

    5.3.3.2. The - recoll module

    + "RCL.PROGRAM.PYTHONAPI.RECOLL">The recoll + module
    -
    +
    Functions
    + "RCL.PROGRAM.PYTHONAPI.RECOLL.CONNECT" id= + "RCL.PROGRAM.PYTHONAPI.RECOLL.CONNECT">connect(confdir=None, + extra_dbs=None, writable = False)
    +

    The connect() + function connects to one or several Recoll index(es) and returns a + Db object.

    +

    This call initializes the recoll module, and it + should always be performed before any other call or + object creation.

    +
    +
      +
    • +

      confdir may + specify a configuration directory. The usual + defaults apply.

      +
    • +
    • +

      extra_dbs is a + list of additional indexes (Xapian + directories).

      +
    • +
    • +

      writable + decides if we can index new data through this + connection.

      +
    • +
    +
    +
    +
    +
    +
    +
    +
    The Db + class
    +
    +
    +
    +

    A Db object is created by a connect() call and holds a + connection to a Recoll index.

    -
    connect(confdir=None, - extra_dbs=None, writable = False)
    +
    Db.close()
    -

    The connect() - function connects to one or several - Recoll - index(es) and returns a Db object.

    -
    -
      -
    • -

      confdir - may specify a configuration directory. - The usual defaults apply.

      -
    • -
    • -

      extra_dbs - is a list of additional indexes (Xapian - directories).

      -
    • -
    • -

      writable - decides if we can index new data through - this connection.

      -
    • -
    -
    -

    This call initializes the recoll module, and - it should always be performed before any other - call or object creation.

    +

    Closes the connection. You can't do anything + with the Db object + after this.

    +
    +
    Db.query(), + Db.cursor()
    +
    +

    These aliases return a blank Query object for this + index.

    +
    +
    Db.setAbstractParams(maxchars, + contextwords)
    +
    +

    Set the parameters used to build snippets + (sets of keywords in context text fragments). + maxchars defines + the maximum total size of the abstract. + contextwords + defines how many terms are shown around the + keyword.

    +
    +
    Db.termMatch(match_type, + expr, field='', maxlen=-1, casesens=False, + diacsens=False, lang='english')
    +
    +

    Expand an expression against the index term + list. Performs the basic function from the GUI + term explorer tool. match_type can be either of + wildcard, + regexp or + stem. Returns a + list of terms expanded from the input + expression.

    -
    +
    Classes
    + "RCL.PROGRAM.PYTHONAPI.RECOLL.CLASSES.QUERY" + id="RCL.PROGRAM.PYTHONAPI.RECOLL.CLASSES.QUERY"> + The Query class
    -
    -
    +

    A Query object + (equivalent to a cursor in the Python DB API) is + created by a Db.query() + call. It is used to execute index searches.

    +
    +
    +
    Query.sortby(fieldname, + ascending=True)
    +
    +

    Sort results by fieldname, in + ascending or descending order. Must be called + before executing the search.

    +
    +
    Query.execute(query_string, stemming=1, + stemlang="english", fetchtext=False)
    +
    +

    Starts a search for query_string, a + Recoll search + language string. If the index stores the + document texts and fetchtext is True, store the + document extracted text in doc.text.

    +
    +
    Query.executesd(SearchData, + fetchtext=False)
    +
    +

    Starts a search for the query defined by the + SearchData object. If the index stores the + document texts and fetchtext is True, store the + document extracted text in doc.text.

    +
    +
    Query.fetchmany(size=query.arraysize)
    +
    +

    Fetches the next Doc objects in the current + search results, and returns them as an array of + the required size, which is by default the + value of the arraysize data member.

    +
    +
    Query.fetchone()
    +
    +

    Fetches the next Doc object from the current + search results. Generates a StopIteration + exception if there are no results left.

    +
    +
    Query.close()
    +
    +

    Closes the query. The object is unusable + after the call.

    +
    +
    Query.scroll(value, + mode='relative')
    +
    +

    Adjusts the position in the current result + set. mode can be + relative or + absolute.

    +
    +
    Query.getgroups()
    +
    +

    Retrieves the expanded query terms as a list + of pairs. Meaningful only after executexx. In + each pair, the first entry is a list of user + terms (of size one for simple terms, or more + for group and phrase clauses), the second a + list of query terms as derived from the user + terms and used in the Xapian Query.

    +
    +
    Query.getxquery()
    +
    +

    Return the Xapian query description as a + Unicode string. Meaningful only after + executexx.

    +
    +
    Query.highlight(text, + ishtml = 0, methods = object)
    +
    +

    Will insert <span class="rclmatch">, + </span> tags around the match areas in + the input text and return the modified text. + ishtml can be set + to indicate that the input text is HTML and + that HTML special characters should not be + escaped. methods + if set should be an object with methods + startMatch(i) and endMatch() which will be + called for each match and should return a begin + and end tag.

    +
    +
    Query.makedocabstract(doc, + methods = object)
    +
    +

    Create a snippets abstract for doc (a Doc object) by selecting text + around the match terms. If methods is set, will + also perform highlighting. See the highlight + method.

    +
    +
    Query.__iter__() and + Query.next()
    +
    +

    So that things like for doc in query: will + work.

    +
    +
    +
    +
    +
    +
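As an illustration of the methods argument described for Query.highlight(), the sketch below defines an object with startMatch(i) and endMatch() methods that return custom begin and end tags. The class name MarkTags and the reading of i as an integer identifying the match are assumptions made for this example, not something the manual states. The last lines call the methods directly so that the snippet runs without a Recoll index.

```python
class MarkTags(object):
    """Hypothetical 'methods' object for Query.highlight()."""

    def startMatch(self, i):
        # Return the begin tag inserted before a match area. The meaning of
        # 'i' is assumed here to be an integer identifying the match.
        return '<mark data-match="%d">' % i

    def endMatch(self):
        # Return the end tag inserted after a match area.
        return '</mark>'


# Stand-alone check of the tags, without calling into Recoll.
tags = MarkTags()
wrapped = tags.startMatch(0) + "term" + tags.endMatch()
print(wrapped)
```

With a live query, such an object would presumably be passed as query.highlight(text, methods=MarkTags()), following the signature listed above.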
    Query.arraysize
    +
    +

    Default number of records processed by + fetchmany (r/w).

    +
    +
    Query.rowcount
    +
    +

    Number of records returned by the last + execute.

    +
    +
    Query.rownumber
    +
    +

    Next index to be fetched from results. + Normally increments after each fetchone() call, + but can be set/reset before the call to effect + seeking (equivalent to using scroll()). Starts at 0.

    +
    +
    +
    +
    +
    +
    +
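The fetchone() contract described above (a StopIteration exception once the results are exhausted) suggests a simple collection loop. The sketch below is not part of the Recoll API: collect_docs and the FakeQuery stand-in are hypothetical names, used only so that the example runs without an index. With a real Query object, only collect_docs would be needed.

```python
def collect_docs(query, limit=None):
    """Fetch results one by one until they run out or 'limit' is reached."""
    docs = []
    while limit is None or len(docs) < limit:
        try:
            docs.append(query.fetchone())
        except StopIteration:
            # Documented behaviour of fetchone() when no results are left.
            break
    return docs


class FakeQuery(object):
    """Hypothetical stand-in mimicking fetchone() and rownumber for this demo."""

    def __init__(self, items):
        self._items = list(items)
        self.rownumber = 0  # next index to be fetched, starts at 0

    def fetchone(self):
        if self.rownumber >= len(self._items):
            raise StopIteration
        item = self._items[self.rownumber]
        self.rownumber += 1
        return item


query = FakeQuery(["doc-a", "doc-b", "doc-c"])
first_two = collect_docs(query, limit=2)
rest = collect_docs(query)
print(first_two, rest, query.rownumber)
```

Because the loop relies only on fetchone() and StopIteration, it makes no assumption about how fetchmany() behaves near the end of the result set.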
    -
    -
    The - Db class
    -
    +
    The + Doc class
    -

    A Db object is created by a connect() call and holds a - connection to a Recoll index.

    -
    -
    -
    Db.close()
    -
    -

    Closes the connection. You can't do - anything with the Db object after this.

    -
    -
    Db.query(), - Db.cursor()
    -
    -

    These aliases return a blank Query object for this - index.

    -
    -
    Db.setAbstractParams(maxchars, - contextwords)
    -
    -

    Set the parameters used to build snippets - (sets of keywords in context text fragments). - maxchars defines - the maximum total size of the abstract. - contextwords - defines how many terms are shown around the - keyword.

    -
    -
    Db.termMatch(match_type, - expr, field='', maxlen=-1, casesens=False, - diacsens=False, lang='english')
    -
    -

    Expand an expression against the index - term list. Performs the basic function from - the GUI term explorer tool. match_type can be either of - wildcard, - regexp or - stem. Returns a - list of terms expanded from the input - expression.

    -
    -
    -
    -
    -
    +

    A Doc object contains + index data for a given document. The data is + extracted from the index when searching, or set by + the indexer program when updating. The Doc object has + many attributes to be read or set by its user. It + mostly matches the Rcl::Doc C++ object. Some of the + attributes are predefined, but, especially when + indexing, others can be set, the name of which will + be processed as field names by the indexing + configuration. Inputs can be specified as Unicode or + strings. Outputs are Unicode objects. All dates are + specified as Unix timestamps, printed as strings. + Please refer to the rcldb/rcldoc.cpp C++ file for a + full description of the predefined attributes. Here + follows a short list.

    +
    +
      +
    • +

      url the + document URL but see also getbinurl()

      +
    • +
    • +

      ipath the + document ipath for + embedded documents.

      +
    • +
    • +

      fbytes, dbytes + the document file and text sizes.

      +
    • +
    • +

      fmtime, dmtime + the document file and document times.

      +
    • +
    • +

      xdocid the + document Xapian document ID. This is useful if + you want to access the document through a + direct Xapian operation.

      +
    • +
    • +

      mtype the + document MIME type.

      +
    • +
    • +

      Fields stored by default: author, filename, keywords, recipient

      +
    • +
    +
    +

    At query time, only the fields that are defined as + stored either by default + or in the fields + configuration file will be meaningful in the + Doc object. The document + processed text may be present or not, depending on whether + the index stores the text at all, and if it does, on + the fetchtext query + execute option. See also the rclextract module for accessing + document contents.

    +
    +
    +
    get(key), [] + operator
    +
    +

    Retrieve the named document attribute. You + can also use getattr(doc, + key) or doc.key.

    +
    +
    doc.key = + value
    +
    +

    Set the named document attribute. You + can also use setattr(doc, + key, value).

    +
    +
    getbinurl()
    +
    +

    Retrieve the URL in byte array format (no + transcoding), for use as parameter to a system + call.

    +
    +
    setbinurl(url)
    +
    +

    Set the URL in byte array format (no + transcoding).

    +
    +
    items()
    +
    +

    Return a dictionary of doc object + keys/values.

    +
    +
    keys()
    +
    +

    Return the list of doc object keys (attribute + names).

    +
    +
    +
    +
    +
    +
    +
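Since dates such as fmtime and dmtime are Unix timestamps printed as strings, they need converting before being displayed or compared. A minimal Python 3 sketch using only the standard library follows. The helper name parse_doc_time is hypothetical, and returning the result in UTC is an assumption of this example.

```python
from datetime import datetime, timezone


def parse_doc_time(value):
    """Convert a Unix timestamp given as a string to a UTC datetime."""
    return datetime.fromtimestamp(int(value), tz=timezone.utc)


# Example with a literal value standing in for an attribute such as doc.fmtime.
stamp = parse_doc_time("86400")
print(stamp.isoformat())
```

Converting to an aware datetime keeps comparisons between documents independent of the local time zone of the machine running the script.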
    -
    -
    The - Query class
    -
    +
    + The SearchData class
    -

    A Query object - (equivalent to a cursor in the Python DB API) is - created by a Db.query() call. It is used to - execute index searches.

    -
    -
    -
    Query.sortby(fieldname, - ascending=True)
    -
    -

    Sort results by fieldname, in - ascending or descending order. Must be called - before executing the search.

    -
    -
    Query.execute(query_string, stemming=1, - stemlang="english", - fetchtext=False)
    -
    -

    Starts a search for query_string, - a Recoll - search language string. If the index stores - the document texts and fetchtext is True, store the - document extracted text in doc.text.

    -
    -
    Query.executesd(SearchData, - fetchtext=False)
    -
    -

    Starts a search for the query defined by - the SearchData object. If the index stores - the document texts and fetchtext is True, store the - document extracted text in doc.text.

    -
    -
    Query.fetchmany(size=query.arraysize)
    -
    -

    Fetches the next Doc objects in the current - search results, and returns them as an array - of the required size, which is by default the - value of the arraysize data member.

    -
    -
    Query.fetchone()
    -
    -

    Fetches the next Doc object from the current - search results. Generates a StopIteration - exception if there are no results left.

    -
    -
    Query.close()
    -
    -

    Closes the query. The object is unusable - after the call.

    -
    -
    Query.scroll(value, - mode='relative')
    -
    -

    Adjusts the position in the current result - set. mode can be - relative or - absolute.

    -
    -
    Query.getgroups()
    -
    -

    Retrieves the expanded query terms as a - list of pairs. Meaningful only after - executexx In each pair, the first entry is a - list of user terms (of size one for simple - terms, or more for group and phrase clauses), - the second a list of query terms as derived - from the user terms and used in the Xapian - Query.

    -
    -
    Query.getxquery()
    -
    -

    Return the Xapian query description as a - Unicode string. Meaningful only after - executexx.

    -
    -
    Query.highlight(text, - ishtml = 0, methods = object)
    -
    -

    Will insert <span "class=rclmatch">, - </span> tags around the match areas in - the input text and return the modified text. - ishtml can be - set to indicate that the input text is HTML - and that HTML special characters should not - be escaped. methods if set should be an - object with methods startMatch(i) and - endMatch() which will be called for each - match and should return a begin and end - tag

    -
    -
    Query.makedocabstract(doc, methods = - object))
    -
    -

    Create a snippets abstract for - doc (a - Doc object) by - selecting text around the match terms. If - methods is set, will also perform - highlighting. See the highlight method.

    -
    -
    Query.__iter__() and - Query.next()
    -
    -

    So that things like for doc in query: will - work.

    -
    -
    -
    -
    -
    -
    Query.arraysize
    -
    -

    Default number of records processed by - fetchmany (r/w).

    -
    -
    Query.rowcount
    -
    -

    Number of records returned by the last - execute.

    -
    -
    Query.rownumber
    -
    -

    Next index to be fetched from results. - Normally increments after each fetchone() - call, but can be set/reset before the call to - effect seeking (equivalent to using - scroll()). - Starts at 0.

    -
    -
    -
    -
    -
    -
    -
    -
    - The Doc class
    -
    -
    -
    -

    A Doc object - contains index data for a given document. The data - is extracted from the index when searching, or set - by the indexer program when updating. The Doc - object has many attributes to be read or set by its - user. It matches exactly the Rcl::Doc C++ object. - Some of the attributes are predefined, but, - especially when indexing, others can be set, the - name of which will be processed as field names by - the indexing configuration. Inputs can be specified - as Unicode or strings. Outputs are Unicode objects. - All dates are specified as Unix timestamps, printed - as strings. Please refer to the rcldb/rcldoc.cpp C++ file for a - full description of the predefined attributes. Here - follows a short list.

    -
    -
      -
    • -

      url the - document URL but see also getbinurl()

      -
    • -
    • -

      ipath the - document ipath - for embedded documents.

      -
    • -
    • -

      fbytes, - dbytes the document file and text - sizes.

      -
    • -
    • -

      fmtime, - dmtime the document file and document - times.

      -
    • -
    • -

      xdocid the - document Xapian document ID. This is useful - if you want to access the document through a - direct Xapian operation.

      -
    • -
    • -

      mtype the - document MIME type.

      -
    • -
    • -

      Fields stored by default: author, filename, keywords, recipient

      -
    • -
    -
    -

    At query time, only the fields that are defined - as stored either by - default or in the fields configuration file will be - meaningful in the Doc - object. Especially this will not be the case for - the document text. See the rclextract module for accessing - document contents.

    -
    -
    -
    get(key), [] - operator
    -
    -

    Retrieve the named document attribute. You - can also use getattr(doc, key) or - doc.key.

    -
    -
    doc.key = - value
    -
    -

    Set the the named document attribute. You - can also use setattr(doc, key, - value).

    -
    -
    getbinurl()
    -
    -

    Retrieve the URL in byte array format (no - transcoding), for use as parameter to a - system call.

    -
    -
    setbinurl(url)
    -
    -

    Set the URL in byte array format (no - transcoding).

    -
    -
    items()
    -
    -

    Return a dictionary of doc object - keys/values

    -
    -
    keys()
    -
    -

    list of doc object keys (attribute - names).

    -
    -
    -
    -
    -
    -
    -
    -
    -
    - The SearchData class
    -
    -
    -
    -

    A SearchData object - allows building a query by combining clauses, for - execution by Query.executesd(). It can be used - in replacement of the query language approach. The - interface is going to change a little, so no - detailed doc for now...

    -
    -
    -
    addclause(type='and'|'or'|'excl'|'phrase'|'near'|'sub', - qstring=string, slack=0, field='', stemming=1, - subSearch=SearchData)
    -
    -
    -
    +

    A SearchData object + allows building a query by combining clauses, for + execution by Query.executesd(). It can be used in + replacement of the query language approach. The + interface is going to change a little, so no detailed + doc for now...

    +
    +
    +
    addclause(type='and'|'or'|'excl'|'phrase'|'near'|'sub', + qstring=string, slack=0, field='', stemming=1, + subSearch=SearchData)
    +
    +
    @@ -7105,14 +7071,15 @@ recollindex -c "$confdir"

    5.3.3.3. The + "RCL.PROGRAM.PYTHONAPI.RCLEXTRACT">The rclextract module

    Prior to Recoll - 1.25, index queries never provide document content - because it is not stored. More recent versions usually + 1.25, index queries could not provide document content + because it was never stored. Recoll 1.25 and later usually store the document text, which can be optionally retrieved when running a query (see query.execute() above - the result is @@ -7126,7 +7093,7 @@ recollindex -c "$confdir"

    You need to import the recoll module before the rclextract module.

    -
    +
    @@ -7207,7 +7174,7 @@ not doc.ipath and (not "rclbes" in doc.keys() or doc["rclbes"] == "FS")

    5.3.3.4. Search + "RCL.PROGRAM.PYTHONAPI.SEARCH.EXAMPLE">Search API usage example

    @@ -7305,7 +7272,7 @@ for i in range(nres):

    5.3.4.1. Python + "RCL.PROGRAM.PYTHONAPI.UPDATE.UPDATE">Python update interface

    @@ -7399,7 +7366,7 @@ for i in range(nres):

    5.3.4.2. Query + "RCL.PROGRAM.PYTHONAPI.UPDATE.ACCESS">Query data access for external indexers (1.23)

    @@ -7449,7 +7416,7 @@ for i in range(nres):

    5.3.4.3. External + "RCL.PROGRAM.PYTHONAPI.UPDATE.SAMPLES">External indexer samples

    @@ -8404,7 +8371,7 @@ for i in range(nres):

    6.4.2.1. Parameters + "RCL.INSTALL.CONFIG.RECOLLCONF.WHATDOCS">Parameters affecting what documents we index

    @@ -8738,7 +8705,7 @@ for i in range(nres):

    6.4.2.2. Parameters + "RCL.INSTALL.CONFIG.RECOLLCONF.TERMS">Parameters affecting how we generate terms and organize the index

    @@ -9008,7 +8975,7 @@ for i in range(nres):

    6.4.2.3. Parameters + "RCL.INSTALL.CONFIG.RECOLLCONF.STORE">Parameters affecting where and how we store things

    @@ -9163,7 +9130,7 @@ for i in range(nres):

    6.4.2.4. Parameters + "RCL.INSTALL.CONFIG.RECOLLCONF.PERFS">Parameters affecting indexing performance and resource usage

    @@ -9264,7 +9231,7 @@ for i in range(nres):

    6.4.2.5. Miscellaneous + "RCL.INSTALL.CONFIG.RECOLLCONF.MISC">Miscellaneous parameters

    @@ -9541,7 +9508,7 @@ for i in range(nres):

    6.4.2.6. Query-time + "RCL.INSTALL.CONFIG.RECOLLCONF.QUERY">Query-time parameters (no impact on the index)

    @@ -9616,7 +9583,7 @@ for i in range(nres):

    6.4.2.7. Parameters + "RCL.INSTALL.CONFIG.RECOLLCONF.PDF">Parameters for the PDF input script

    @@ -9687,7 +9654,7 @@ for i in range(nres):

    6.4.2.8. Parameters + "RCL.INSTALL.CONFIG.RECOLLCONF.SPECLOCATIONS">Parameters set for specific locations

    @@ -9820,7 +9787,7 @@ for i in range(nres):

    6.4.3.1. Extended + "RCL.INSTALL.CONFIG.FIELDS.XATTR">Extended attributes in the fields file

    @@ -10150,7 +10117,7 @@ other = rclcat:other

    6.4.8.1. Adding + "RCL.INSTALL.CONFIG.EXAMPLES.ADDVIEW">Adding an external viewer for an non-indexed type

    @@ -10213,7 +10180,7 @@ other = rclcat:other

    6.4.8.2. Adding + "RCL.INSTALL.CONFIG.EXAMPLES.ADDINDEX">Adding indexing support for a new file type

    diff --git a/src/doc/user/usermanual.xml b/src/doc/user/usermanual.xml index a10bcbaa..41fb1664 100644 --- a/src/doc/user/usermanual.xml +++ b/src/doc/user/usermanual.xml @@ -4966,13 +4966,14 @@ recollindex -c "$confdir" Introduction - &RCL; versions after 1.11 define a Python programming - interface, both for searching and creating/updating an - index. + The &RCL; Python programming interface can be used both for + searching and for creating/updating an index. Bindings exist for + Python2 and Python3. - The search interface is used in the &RCL; Ubuntu Unity Lens - and the &RCL; Web UI. It can run queries on any &RCL; - configuration. + The search interface is used in a number of active projects: + the &RCL; Gnome Shell Search Provider, + the &RCL; Web UI, and the upmpdcli UPnP Media Server, in addition + to many small scripts. The index update section of the API may be used to create and update &RCL; indexes on specific configurations (separate from the @@ -4998,6 +4999,19 @@ recollindex -c "$confdir" paragraph at the end of this section will explain a few differences and ways to write code compatible with both versions. + The recoll package now contains two + modules: + + The recoll module contains + functions and classes used to query (or update) the + index. + + The rclextract module contains + functions and classes used at query time to access document + data. + + + There is a good chance that your system repository has packages for the Recoll Python API, sometimes in a package separate from the main one (maybe named something like python-recoll). Else @@ -5022,13 +5036,17 @@ recollindex -c "$confdir" nres = query.execute("some query") results = query.fetchmany(20) for doc in results: - print(doc.url, doc.title) + print("%s %s" % (doc.url, doc.title)) ]]> - You can also take a look at the source for the Recoll - WebUI, or the upmpdcli local media server, which are both - based on the Python API. 
+ You can also take a look at the source for the + Recoll + WebUI, the + upmpdcli + local media server, or the + Gnome + Shell Search Provider. @@ -5104,10 +5122,14 @@ recollindex -c "$confdir" Stored and indexed fields - The fields file inside - the &RCL; configuration defines which document fields are - either "indexed" (searchable), "stored" (retrievable with - search results), or both. + The fields + file inside the &RCL; configuration defines which + document fields are either indexed + (searchable), stored (retrievable with + search results), or both. Apart from a few standard/internal + fields, only the stored fields are + retrievable through the Python search interface. @@ -5118,381 +5140,347 @@ recollindex -c "$confdir" Python search interface - - Recoll package - - The recoll package contains two - modules: - - The recoll module contains - functions and classes used to query (or update) the - index. This section will only describe the query part, see - further for the update part. - The rclextract module contains - functions and classes used to access document - data. - - - - The recoll module - - Functions + + connect(confdir=None, extra_dbs=None, writable = False) - - - connect(confdir=None, extra_dbs=None, - writable = False) - - The connect() function connects to - one or several &RCL; index(es) and returns - a Db object. - - confdir may specify - a configuration directory. The usual defaults - apply. - extra_dbs is a list of - additional indexes (Xapian directories). - writable decides if - we can index new data through this - connection. - - This call initializes the recoll module, and it should - always be performed before any other call or object - creation. - - - - + The connect() function connects to + one or several &RCL; index(es) and returns + a Db object. + This call initializes the recoll module, and it should + always be performed before any other call or object + creation. + + confdir may specify + a configuration directory. 
The usual defaults + apply. + extra_dbs is a list of + additional indexes (Xapian directories). + writable decides if + we can index new data through this + connection. + + + + The Db class - - Classes + A Db object is created by a connect() + call and holds a connection to a Recoll index. + + + Db.close() + Closes the connection. You can't do anything + with the Db object after + this. + + + Db.query(), Db.cursor() These + aliases return a blank Query object + for this index. + + + + Db.setAbstractParams(maxchars, + contextwords) Set the parameters used + to build snippets (sets of keywords in context text + fragments). maxchars defines the + maximum total size of the abstract. + contextwords defines how many + terms are shown around the keyword. + + + + Db.termMatch(match_type, expr, field='', + maxlen=-1, casesens=False, diacsens=False, lang='english') + + Expand an expression against the + index term list. Performs the basic function from the + GUI term explorer tool. match_type + can be either + of wildcard, regexp + or stem. Returns a list of terms + expanded from the input expression. + + + + + + + + The Query class + + A Query object (equivalent to a + cursor in the Python DB API) is created by + a Db.query() call. It is used to + execute index searches. + + + + + Query.sortby(fieldname, ascending=True) + Sort results + by fieldname, in ascending + or descending order. Must be called before executing + the search. + - - The Db class + + Query.execute(query_string, stemming=1, + stemlang="english", fetchtext=False) + Starts a search + for query_string, a &RCL; + search language string. If the index stores the document + texts and fetchtext is True, store the + document extracted text in + doc.text. + - A Db object is created by - a connect() call and holds a - connection to a Recoll index. - - - Db.close() - Closes the connection. You can't do anything - with the Db object after - this. 
- - - Db.query(), Db.cursor() These - aliases return a blank Query object - for this index. - + + Query.executesd(SearchData, fetchtext=False) + Starts a search for the query defined by + the SearchData object. If the index stores the document + texts and fetchtext is True, store the + document extracted text in + doc.text. + - - Db.setAbstractParams(maxchars, - contextwords) Set the parameters used - to build snippets (sets of keywords in context text - fragments). maxchars defines the - maximum total size of the abstract. - contextwords defines how many - terms are shown around the keyword. - - - - Db.termMatch(match_type, expr, field='', - maxlen=-1, casesens=False, diacsens=False, lang='english') - - Expand an expression against the - index term list. Performs the basic function from the - GUI term explorer tool. match_type - can be either - of wildcard, regexp - or stem. Returns a list of terms - expanded from the input expression. - - - - - - - - - - The Query class - - A Query object (equivalent to a - cursor in the Python DB API) is created by - a Db.query() call. It is used to - execute index searches. - - - - - Query.sortby(fieldname, ascending=True) - Sort results - by fieldname, in ascending - or descending order. Must be called before executing - the search. - - - - Query.execute(query_string, stemming=1, - stemlang="english", fetchtext=False) - Starts a search - for query_string, a &RCL; - search language string. If the index stores the document - texts and fetchtext is True, store the - document extracted text in - doc.text. - - - - Query.executesd(SearchData, fetchtext=False) - Starts a search for the query defined by - the SearchData object. If the index stores the document - texts and fetchtext is True, store the - document extracted text in - doc.text. 
- - - - Query.fetchmany(size=query.arraysize) - - Fetches - the next Doc objects in the current - search results, and returns them as an array of the - required size, which is by default the value of - the arraysize data member. - - - - Query.fetchone() Fetches the - next Doc object from the current - search results. Generates a StopIteration exception if - there are no results left. - - - - Query.close() - Closes the query. The object is unusable - after the call. - - - - Query.scroll(value, mode='relative') - Adjusts the position in the current result - set. mode can - be relative - or absolute. - - - - Query.getgroups() - Retrieves the expanded query terms as a list - of pairs. Meaningful only after executexx In each - pair, the first entry is a list of user terms (of size - one for simple terms, or more for group and phrase - clauses), the second a list of query terms as derived - from the user terms and used in the Xapian - Query. - - - - Query.getxquery() - Return the Xapian query description as a - Unicode string. - Meaningful only after executexx. - - - - Query.highlight(text, ishtml = 0, methods = object) - Will insert <span "class=rclmatch">, - </span> tags around the match areas in the input text - and return the modified text. ishtml - can be set to indicate that the input text is HTML and - that HTML special characters should not be escaped. - methods if set should be an object - with methods startMatch(i) and endMatch() which will be - called for each match and should return a begin and end - tag - - - - Query.makedocabstract(doc, methods = object)) - Create a snippets abstract - for doc (a Doc - object) by selecting text around the match terms. - If methods is set, will also perform highlighting. See - the highlight method. - - - - - Query.__iter__() and Query.next() - So that things like for doc in - query: will work. - - - - - - Query.arraysize - Default number of records processed by fetchmany - (r/w). 
- - Query.rowcountNumber - of records returned by the last - execute. - Query.rownumberNext index - to be fetched from results. Normally increments after - each fetchone() call, but can be set/reset before the - call to effect seeking (equivalent to - using scroll()). Starts at - 0. - - - - - - - - - The Doc class - - A Doc object contains index data - for a given document. The data is extracted from the - index when searching, or set by the indexer program when - updating. The Doc object has many attributes to be read or - set by its user. It matches exactly the Rcl::Doc C++ - object. Some of the attributes are predefined, but, - especially when indexing, others can be set, the name of - which will be processed as field names by the indexing - configuration. Inputs can be specified as Unicode or - strings. Outputs are Unicode objects. All dates are - specified as Unix timestamps, printed as strings. Please - refer to the rcldb/rcldoc.cpp C++ file - for a full description of the predefined attributes. Here - follows a short list. - - - url the document URL but - see also getbinurl() + + Query.fetchmany(size=query.arraysize) - ipath the document - ipath for embedded - documents. + Fetches + the next Doc objects in the current + search results, and returns them as an array of the + required size, which is by default the value of + the arraysize data member. + - fbytes, dbytes the document - file and text sizes. - fmtime, dmtime the document - file and document times. - - xdocid the document - Xapian document ID. This is useful if you want to access - the document through a direct Xapian - operation. + + Query.fetchone() Fetches the + next Doc object from the current + search results. Generates a StopIteration exception if + there are no results left. + - mtype the document - MIME type. + + Query.close() + Closes the query. The object is unusable + after the call. 
+ - Fields stored by default: - author, filename, - keywords, - recipient + + Query.scroll(value, mode='relative') + Adjusts the position in the current result + set. mode can + be relative + or absolute. + - - - - At query time, only the fields that are defined - as stored either by default or in - the fields configuration file will be - meaningful in the Doc - object. Especially this will not be the case for the - document text. See the rclextract - module for accessing document contents. + + Query.getgroups() + Retrieves the expanded query terms as a list + of pairs. Meaningful only after executexx. In each + pair, the first entry is a list of user terms (of size + one for simple terms, or more for group and phrase + clauses), the second a list of query terms as derived + from the user terms and used in the Xapian + Query. + + + + Query.getxquery() + Return the Xapian query description as a + Unicode string. + Meaningful only after executexx. + - + + Query.highlight(text, ishtml = 0, methods = object) + Will insert <span class="rclmatch">, + </span> tags around the match areas in the input text + and return the modified text. ishtml + can be set to indicate that the input text is HTML and + that HTML special characters should not be escaped. + methods, if set, should be an object + with methods startMatch(i) and endMatch() which will be + called for each match and should return a begin and end + tag. + - - get(key), [] operator + + Query.makedocabstract(doc, methods = object) + Create a snippets abstract + for doc (a Doc + object) by selecting text around the match terms. + If methods is set, will also perform highlighting. See + the highlight method. + + + + + Query.__iter__() and Query.next() + So that things like for doc in + query: will work. + + - Retrieve the named document - attribute. You can also use getattr(doc, - key) or - doc.key. - + - - doc.key = value + Query.arraysize + Default number of records processed by fetchmany + (r/w). 
+ + Query.rowcount Number + of records returned by the last + execute. + Query.rownumber Next index + to be fetched from results. Normally increments after + each fetchone() call, but can be set/reset before the + call to effect seeking (equivalent to + using scroll()). Starts at + 0. + - Set the the named document attribute. You - can also use setattr(doc, key, - value). - + - - getbinurl() + + + The Doc class - Retrieve the URL in byte array format (no - transcoding), for use as parameter to a system - call. - + A Doc object contains index data + for a given document. The data is extracted from the + index when searching, or set by the indexer program when + updating. The Doc object has many attributes to be read or + set by its user. It mostly matches the Rcl::Doc C++ + object. Some of the attributes are predefined, but, + especially when indexing, others can be set, the name of + which will be processed as field names by the indexing + configuration. Inputs can be specified as Unicode or + strings. Outputs are Unicode objects. All dates are + specified as Unix timestamps, printed as strings. Please + refer to the rcldb/rcldoc.cpp C++ file + for a full description of the predefined attributes. Here + follows a short list. - - setbinurl(url) + + url the document URL but + see also getbinurl() + + ipath the document + ipath for embedded + documents. - Set the URL in byte array format (no - transcoding). - + fbytes, dbytes the document + file and text sizes. + fmtime, dmtime the document + file and document times. + + xdocid the document + Xapian document ID. This is useful if you want to access + the document through a direct Xapian + operation. - - items() - Return a dictionary of doc object - keys/values - + mtype the document + MIME type. - - keys() - list of doc object keys (attribute - names). 
- - + Fields stored by default: + author, filename, + keywords, + recipient - + + + + At query time, only the fields that are defined as + stored either by default or in the + fields configuration file will be meaningful + in the Doc object. The processed document text + may or may not be present, depending on whether the index stores the text at + all and, if it does, on the fetchtext query + execute option. See also the rclextract module + for accessing document contents. - - The SearchData class + - A SearchData object allows building - a query by combining clauses, for execution - by Query.executesd(). It can be used - in replacement of the query language approach. The - interface is going to change a little, so no detailed doc - for now... + + get(key), [] operator - + Retrieve the named document + attribute. You can also use getattr(doc, + key) or + doc.key. + - - addclause(type='and'|'or'|'excl'|'phrase'|'near'|'sub', - qstring=string, slack=0, field='', stemming=1, - subSearch=SearchData) - - - + + doc.key = value - + Set the named document attribute. You + can also use setattr(doc, key, + value). + - - + + getbinurl() + + Retrieve the URL in byte array format (no + transcoding), for use as parameter to a system + call. + + + + setbinurl(url) + + Set the URL in byte array format (no + transcoding). + + + + items() + Return a dictionary of doc object + keys/values. + + + + keys() + Return a list of doc object keys (attribute + names). + + + + + + + The SearchData class + + A SearchData object allows building + a query by combining clauses, for execution + by Query.executesd(). It can be used + as a replacement for the query language approach. The + interface is going to change a little, so no detailed doc + for now... 
+ + + + + addclause(type='and'|'or'|'excl'|'phrase'|'near'|'sub', + qstring=string, slack=0, field='', stemming=1, + subSearch=SearchData) + + + + + + + The rclextract module - Prior to &RCL; 1.25, index queries never provide document - content because it is not stored. More recent versions usually + Prior to &RCL; 1.25, index queries could not provide document + content because it was never stored. &RCL; 1.25 and later usually store the document text, which can be optionally retrieved when running a query (see query.execute() above - the result is always plain text). @@ -5506,7 +5494,7 @@ recollindex -c "$confdir" You need to import the recoll module before the rclextract module. - + The Extractor class @@ -5565,7 +5553,7 @@ not doc.ipath and (not "rclbes" in doc.keys() or doc["rclbes"] == "FS") - +
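The Python search interface documented in the hunks above (connect(), Db.query(), Query.execute(), Query.fetchmany(), Query.close(), Doc.get()) might be used roughly as follows. This is a minimal sketch, not code from the Recoll sources: the helper names search_index and demo are hypothetical, the import form "from recoll import recoll" is an assumption based on the package/module description in the removed text, and fetchmany() is assumed to return fewer entries when the result set is smaller than the requested size.

```python
def search_index(db, query_string, fields=("url", "mtype", "filename"), max_results=10):
    """Run one query-language search on an already opened Db object.

    Returns a list of dicts, one per result, holding the requested fields.
    Only methods named in the manual text are used: Db.query(),
    Query.execute(), Query.fetchmany(), Query.close() and Doc.get().
    """
    query = db.query()
    try:
        query.execute(query_string)
        docs = query.fetchmany(max_results)
        # Only stored fields are expected to carry meaningful values here.
        return [{name: doc.get(name) for name in fields} for doc in docs]
    finally:
        # The Query object is unusable after close().
        query.close()


def demo(query_string="some query"):
    """Hypothetical driver; needs a machine with Recoll and its Python module."""
    # Assumption: the package exposes the module as recoll.recoll.
    from recoll import recoll

    db = recoll.connect()  # default configuration directory
    try:
        for row in search_index(db, query_string):
            print(row)
    finally:
        db.close()
```

Calling demo() only makes sense where the recoll module is installed and an index exists; search_index() itself accepts any object offering the same methods, which keeps it easy to exercise with a stub.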