diff --git a/src/doc/user/Makefile b/src/doc/user/Makefile
index b560abfa..31f337b1 100644
--- a/src/doc/user/Makefile
+++ b/src/doc/user/Makefile
@@ -17,8 +17,9 @@ XSLDIR="/usr/share/xml/docbook/stylesheet/docbook-xsl/"
 # Options common to the single-file and chunked versions
 commonoptions=--stringparam section.autolabel 1 \
-  --stringparam section.autolabel.max.depth 3 \
+  --stringparam section.autolabel.max.depth 2 \
   --stringparam section.label.includes.component.label 1 \
+  --stringparam toc.max.depth 3 \
   --stringparam autotoc.label.in.hyperlink 0 \
   --stringparam abstract.notitle.enabled 1 \
   --stringparam html.stylesheet docbook-xsl.css \
diff --git a/src/doc/user/usermanual.html b/src/doc/user/usermanual.html
index 7b6de588..95bc9522 100644
--- a/src/doc/user/usermanual.html
+++ b/src/doc/user/usermanual.html
@@ -1429,7 +1429,7 @@ alink="#0000FF">
other constraints. Most of the relevant parameters are described in the
+ "Parameters affecting how we generate terms and organize the index"> linked section.
The different search interfaces (GUI, command line, ...) have different methods to define the set of indexes
@@ -2362,7 +2362,7 @@ recoll -c
mondelaypatterns parameter in the configuration + "Miscellaneous parameters">configuration section.
@@ -2655,8 +2655,7 @@ recoll -c
The format of the result list entries is entirely configurable by using the preference dialog to edit an HTML
- fragment.
+ "The result list format">edit an HTML fragment.
You can click on the Query
details link at the top of the results page to see
the query actually performed, after stem expansion and
@@ -2674,8 +2673,8 @@ recoll -c
3.1.2.1. No
- results: the spelling suggestions
+ "RCL.SEARCH.GUI.RESLIST.SUGGS">No results:
+ the spelling suggestions
@@ -2696,8 +2695,8 @@ recoll -c
3.1.2.2. The
- result list right-click menu
+ "RCL.SEARCH.GUI.RESULTLIST.MENU">The result
+ list right-click menu
@@ -2992,7 +2991,7 @@ recoll -c
3.1.6.1. Searching
+ "RCL.SEARCH.GUI.PREVIEW.SEARCH">Searching
inside the preview
@@ -3153,8 +3152,7 @@ recoll -c
Recoll keeps a
history of searches. See Advanced search
- history.
The dialog has two tabs:
Date format:
@@ -4158,8 +4156,8 @@ recoll -c
3.1.15.1. The
- result list format
+ "RCL.SEARCH.GUI.CUSTOM.RESLIST">The result
+ list format
@@ -4915,9 +4913,9 @@ recoll -c
have a
- look at an important limitation of wildcards in
- path filters.
Relative paths also make sense, for example,
dir:share/doc would
match either
3.8.1.1. Wildcards
- and path filtering
+ "RCL.SEARCH.WILDCARDS.PATH">Wildcards and
+ path filtering
@@ -6382,12 +6380,12 @@ recollindex -c "$confdir"
the result list by using the appropriate directive in
the definition of the result list
- paragraph format. All fields are displayed on the
- fields screen of the preview window (which you can
- reach through the right-click menu). This is
- independant of the fact that the search which
- produced the results used the field or not.
Recoll versions after - 1.11 define a Python programming interface, both for - searching and creating/updating an index.
-The search interface is used in the Recoll Ubuntu Unity Lens and the - Recoll Web UI. It can - run queries on any Recoll configuration.
+The Recoll Python + programming interface can be used both for searching and + for creating/updating an index. Bindings exist for + Python2 and Python3.
+The search interface is used in a number of active + projects: the Recoll + Gnome Shell Search + Provider, the Recoll Web UI, and the upmpdcli UPnP + Media Server, in addition to many small scripts.
The index update section of the API may be used to create and update Recoll indexes on specific configurations (separate from the
@@ -6467,6 +6467,23 @@ recollindex -c "$confdir"
here. A paragraph at the end of this section will explain a few differences and ways to write code compatible with both versions.
+The recoll package now
+ contains two modules:
The recoll module
+ contains functions and classes used to query (or
+ update) the index.
The rclextract
+ module contains functions and classes used at query
+ time to access document data.
There is a good chance that your system repository has packages for the Recoll Python API, sometimes in a package separate from the main one (maybe named something
@@ -6493,15 +6510,17 @@ recollindex -c "$confdir"
 nres = query.execute("some query")
 results = query.fetchmany(20)
 for doc in results:
-    print(doc.url, doc.title)
+    print("%s %s" % (doc.url, doc.title))
You can also take a look at the source for the Recoll WebUI, or the upmpdcli local media server, which are
- both based on the Python API.
+ "https://opensourceprojects.eu/p/recollwebui/code/ci/78ddb20787b2a894b5e4661a8d5502c4511cf71e/tree/"
+ target="_top">Recoll WebUI, the upmpdcli local media server, or the
+ Gnome Shell Search Provider.
The fields file
- inside the Recoll
+
The fields file inside the
+ Recoll
configuration defines which document fields are
- either "indexed" (searchable), "stored"
- (retrievable with search results), or both.
indexed
+ (searchable), stored
+ (retrievable with search results), or both. Apart
+ from a few standard/internal fields, only the
+ stored fields are
+ retrievable through the Python search
+ interface.
The recoll package
- contains two modules:
The recoll module
- contains functions and classes used to query (or
- update) the index. This section will only
- describe the query part, see further for the
- update part.
The rclextract
- module contains functions and classes used to
- access document data.
The connect()
+ function connects to one or several Recoll index(es) and returns a
+ Db object.
This call initializes the recoll module, and it + should always be performed before any other call or + object creation.
+confdir may
+ specify a configuration directory. The usual
+ defaults apply.
extra_dbs is a
+ list of additional indexes (Xapian
+ directories).
writable
+ decides if we can index new data through this
+ connection.
A Db object is created by a connect() call and holds a
+ connection to a Recoll index.
The connect()
- function connects to one or several
- Recoll
- index(es) and returns a Db object.
confdir
- may specify a configuration directory.
- The usual defaults apply.
extra_dbs
- is a list of additional indexes (Xapian
- directories).
writable
- decides if we can index new data through
- this connection.
This call initializes the recoll module, and - it should always be performed before any other - call or object creation.
+Closes the connection. You can't do anything
+ with the Db object
+ after this.
These aliases return a blank Query object for this
+ index.
Set the parameters used to build snippets
+ (sets of keywords in context text fragments).
+ maxchars defines
+ the maximum total size of the abstract.
+ contextwords
+ defines how many terms are shown around the
+ keyword.
Expand an expression against the index term
+ list. Performs the basic function from the GUI
+ term explorer tool. match_type can be one of
+ wildcard,
+ regexp or
+ stem. Returns a
+ list of terms expanded from the input
+ expression.
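Since termMatch() returns a plain list of terms, its output can be turned back into a query string. A minimal sketch of that step, assuming the expanded terms are single words that need no quoting, and relying on the OR operator of the Recoll query language; `or_query` is a hypothetical helper, not part of the recoll module:

```python
def or_query(terms):
    """Join index terms into one Recoll query-language OR expression.

    Hypothetical helper: assumes each term is a single word, as produced
    by a term list expansion, so no quoting is needed.
    """
    # Drop empty entries and duplicates while keeping the original order.
    unique = list(dict.fromkeys(t for t in terms if t))
    return " OR ".join(unique)
```

It might then be used as `query.execute(or_query(db.termMatch("wildcard", "flo*")))`; the argument order shown for termMatch() is an assumption and should be checked against the method signature.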
A Query object
+ (equivalent to a cursor in the Python DB API) is
+ created by a Db.query()
+ call. It is used to execute index searches.
Sort results by fieldname, in
+ ascending or descending order. Must be called
+ before executing the search.
Starts a search for query_string, a
+ Recoll search
+ language string. If the index stores the
+ document texts and fetchtext is True, store the
+ document extracted text in doc.text.
Starts a search for the query defined by the
+ SearchData object. If the index stores the
+ document texts and fetchtext is True, store the
+ document extracted text in doc.text.
Fetches the next Doc objects in the current
+ search results, and returns them as an array of
+ the required size, which is by default the
+ value of the arraysize data member.
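The arraysize default makes it easy to walk a whole result set in fixed-size chunks. A sketch, assuming fetchmany() returns an empty list once the results are exhausted; `iter_batches` is a hypothetical helper that receives the bound method, so it does not depend on the recoll module itself:

```python
def iter_batches(fetch, batch_size):
    """Yield successive non-empty lists of results.

    `fetch` is expected to behave like Query.fetchmany: called with a size,
    it returns up to that many results, or an empty list when none are left.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    while True:
        batch = fetch(batch_size)
        if not batch:
            # Nothing left in the current result set.
            return
        yield batch
```

Typical use would be `for batch in iter_batches(query.fetchmany, query.arraysize): ...` after calling `query.execute()`.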
Fetches the next Doc object from the current
+ search results. Generates a StopIteration
+ exception if there are no results left.
Closes the query. The object is unusable + after the call.
+Adjusts the position in the current result
+ set. mode can be
+ relative or
+ absolute.
Retrieves the expanded query terms as a list + of pairs. Meaningful only after executexx. In + each pair, the first entry is a list of user + terms (of size one for simple terms, or more + for group and phrase clauses), the second a + list of query terms as derived from the user + terms and used in the Xapian Query.
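The pairs returned by getgroups() can be flattened into the list of Xapian terms actually searched, for example to show the user which stem expansions were applied. A minimal sketch, assuming each pair unpacks as (user_terms, query_terms) as described above; `expanded_terms` is a hypothetical helper:

```python
def expanded_terms(groups):
    """Collect the distinct Xapian query terms from Query.getgroups() output.

    Each entry is assumed to be a (user_terms, query_terms) pair of lists.
    Terms are returned in first-seen order.
    """
    seen = []
    for _user_terms, query_terms in groups:
        for term in query_terms:
            if term not in seen:
                seen.append(term)
    return seen
```

After `query.execute("floor red car")`, `expanded_terms(query.getgroups())` would list every term derived from the user input.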
+Return the Xapian query description as a + Unicode string. Meaningful only after + executexx.
+Will insert <span class="rclmatch">,
+ </span> tags around the match areas in
+ the input text and return the modified text.
+ ishtml can be set
+ to indicate that the input text is HTML and
+ that HTML special characters should not be
+ escaped. methods
+ if set should be an object with methods
+ startMatch(i) and endMatch() which will be
+ called for each match and should return a begin
+ and end tag.
Create a snippets abstract for doc (a Doc object) by selecting text
+ around the match terms. If methods is set, will
+ also perform highlighting. See the highlight
+ method.
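Both highlight() and makedocabstract() accept a `methods` object. A minimal sketch of one that cycles background colors by match index; the class name and palette are illustrative, and since the text above does not say what the index passed to startMatch() identifies, the sketch only assumes it is a non-negative integer:

```python
class ColorHighlighter:
    """Candidate `methods` object for Query.highlight() / makedocabstract().

    startMatch(i) returns the tag inserted before each match and
    endMatch() the tag inserted after it. Hypothetical class.
    """

    def __init__(self, palette=("yellow", "lightgreen", "lightblue")):
        self.palette = palette

    def startMatch(self, i):
        # Pick a color by cycling through the palette with the match index.
        color = self.palette[i % len(self.palette)]
        return '<span style="background-color: %s">' % color

    def endMatch(self):
        return "</span>"
```

Something like `query.makedocabstract(doc, methods=ColorHighlighter())` would then return an abstract with colored match areas; passing `methods` by keyword is an assumption.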
So that things like for doc in query: will
+ work.
Default number of records processed by + fetchmany (r/w).
+Number of records returned by the last + execute.
+Next index to be fetched from results.
+ Normally increments after each fetchone() call,
+ but can be set/reset before the call to effect
+ seeking (equivalent to using scroll()). Starts at 0.
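rowcount and a settable rownumber are enough to serve results one page at a time. A sketch of the arithmetic, using a hypothetical `page_window` helper that computes where a zero-based page starts and how many rows it holds:

```python
def page_window(page, page_size, rowcount):
    """Return (start, count) for a zero-based page of query results.

    `rowcount` is the number of results from the last execute(); `start`
    can be assigned to Query.rownumber before calling fetchmany(count).
    Hypothetical helper, not part of the recoll module.
    """
    if page < 0 or page_size < 1:
        raise ValueError("page must be >= 0 and page_size >= 1")
    # Clamp so that pages past the end are simply empty.
    start = min(page * page_size, rowcount)
    count = min(page_size, rowcount - start)
    return start, count
```

After `query.execute(...)`, setting `query.rownumber = start` and then calling `query.fetchmany(count)` would fetch the page, assuming rowcount holds the full number of matches.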
A Db object is created by a connect() call and holds a
- connection to a Recoll index.
Closes the connection. You can't do
- anything with the Db object after this.
These aliases return a blank Query object for this
- index.
Set the parameters used to build snippets
- (sets of keywords in context text fragments).
- maxchars defines
- the maximum total size of the abstract.
- contextwords
- defines how many terms are shown around the
- keyword.
Expand an expression against the index
- term list. Performs the basic function from
- the GUI term explorer tool. match_type can be either of
- wildcard,
- regexp or
- stem. Returns a
- list of terms expanded from the input
- expression.
A Doc object contains
+ index data for a given document. The data is
+ extracted from the index when searching, or set by
+ the indexer program when updating. The Doc object has
+ many attributes to be read or set by its user. It
+ mostly matches the Rcl::Doc C++ object. Some of the
+ attributes are predefined, but, especially when
+ indexing, others can be set, the name of which will
+ be processed as field names by the indexing
+ configuration. Inputs can be specified as Unicode or
+ strings. Outputs are Unicode objects. All dates are
+ specified as Unix timestamps, printed as strings.
+ Please refer to the rcldb/rcldoc.cpp C++ file for a
+ full description of the predefined attributes. Here
+ follows a short list.
url the
+ document URL but see also getbinurl()
ipath the
+ document ipath for
+ embedded documents.
fbytes, dbytes
+ the document file and text sizes.
fmtime, dmtime
+ the document file and document times.
xdocid the
+ document Xapian document ID. This is useful if
+ you want to access the document through a
+ direct Xapian operation.
mtype the
+ document MIME type.
Fields stored by default: author, filename, keywords, recipient
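Because Doc dates are Unix timestamps printed as strings, they need converting before display or comparison. A minimal sketch; the helper name is hypothetical, and treating an empty value as "no date" is an assumption:

```python
from datetime import datetime, timezone


def doc_time(value):
    """Convert a Doc date attribute (e.g. fmtime, dmtime) to a datetime.

    The value is a Unix timestamp as a string. Empty or missing values
    yield None. The result is timezone-aware, in UTC.
    """
    if not value:
        return None
    return datetime.fromtimestamp(float(value), tz=timezone.utc)
```

For example `doc_time(doc.fmtime)` or `doc_time(doc.dmtime)`.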
At query time, only the fields that are defined as
+ stored either by default
+ or in the fields
+ configuration file will be meaningful in the
+ Doc object. The document
+ processed text may or may not be present, depending on whether
+ the index stores the text at all, and if it does, on
+ the fetchtext query
+ execute option. See also the rclextract module for accessing
+ document contents.
Retrieve the named document attribute. You
+ can also use getattr(doc,
+ key) or doc.key.
Set the named document attribute. You
+ can also use setattr(doc,
+ key, value).
Retrieve the URL in byte array format (no + transcoding), for use as parameter to a system + call.
+Set the URL in byte array format (no + transcoding).
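getbinurl() exists so that a URL can be handed to system calls without transcoding. A sketch of stripping the scheme to obtain a path, assuming a local `file://` URL whose path part is stored raw rather than percent-encoded; `binurl_to_path` is a hypothetical helper:

```python
def binurl_to_path(binurl):
    """Turn the byte-string URL from Doc.getbinurl() into a path.

    Assumes a "file://" URL whose path bytes can be used unchanged,
    so no decoding or unquoting is performed.
    """
    prefix = b"file://"
    if not binurl.startswith(prefix):
        raise ValueError("not a file:// URL: %r" % (binurl,))
    return binurl[len(prefix):]
```

For instance `os.stat(binurl_to_path(doc.getbinurl()))`. For an embedded document (non-empty ipath), the path would presumably be that of the containing file.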
+Return a dictionary of doc object + keys/values.
+List of doc object keys (attribute + names).
+A Query object
- (equivalent to a cursor in the Python DB API) is
- created by a Db.query() call. It is used to
- execute index searches.
Sort results by fieldname, in
- ascending or descending order. Must be called
- before executing the search.
Starts a search for query_string,
- a Recoll
- search language string. If the index stores
- the document texts and fetchtext is True, store the
- document extracted text in doc.text.
Starts a search for the query defined by
- the SearchData object. If the index stores
- the document texts and fetchtext is True, store the
- document extracted text in doc.text.
Fetches the next Doc objects in the current
- search results, and returns them as an array
- of the required size, which is by default the
- value of the arraysize data member.
Fetches the next Doc object from the current
- search results. Generates a StopIteration
- exception if there are no results left.
Closes the query. The object is unusable - after the call.
-Adjusts the position in the current result
- set. mode can be
- relative or
- absolute.
Retrieves the expanded query terms as a - list of pairs. Meaningful only after - executexx In each pair, the first entry is a - list of user terms (of size one for simple - terms, or more for group and phrase clauses), - the second a list of query terms as derived - from the user terms and used in the Xapian - Query.
-Return the Xapian query description as a - Unicode string. Meaningful only after - executexx.
-Will insert <span "class=rclmatch">,
- </span> tags around the match areas in
- the input text and return the modified text.
- ishtml can be
- set to indicate that the input text is HTML
- and that HTML special characters should not
- be escaped. methods if set should be an
- object with methods startMatch(i) and
- endMatch() which will be called for each
- match and should return a begin and end
- tag
Create a snippets abstract for
- doc (a
- Doc object) by
- selecting text around the match terms. If
- methods is set, will also perform
- highlighting. See the highlight method.
So that things like for doc in query: will
- work.
Default number of records processed by - fetchmany (r/w).
-Number of records returned by the last - execute.
-Next index to be fetched from results.
- Normally increments after each fetchone()
- call, but can be set/reset before the call to
- effect seeking (equivalent to using
- scroll()).
- Starts at 0.
A Doc object
- contains index data for a given document. The data
- is extracted from the index when searching, or set
- by the indexer program when updating. The Doc
- object has many attributes to be read or set by its
- user. It matches exactly the Rcl::Doc C++ object.
- Some of the attributes are predefined, but,
- especially when indexing, others can be set, the
- name of which will be processed as field names by
- the indexing configuration. Inputs can be specified
- as Unicode or strings. Outputs are Unicode objects.
- All dates are specified as Unix timestamps, printed
- as strings. Please refer to the rcldb/rcldoc.cpp C++ file for a
- full description of the predefined attributes. Here
- follows a short list.
url the
- document URL but see also getbinurl()
ipath the
- document ipath
- for embedded documents.
fbytes,
- dbytes the document file and text
- sizes.
fmtime,
- dmtime the document file and document
- times.
xdocid the
- document Xapian document ID. This is useful
- if you want to access the document through a
- direct Xapian operation.
mtype the
- document MIME type.
Fields stored by default: author, filename, keywords, recipient
At query time, only the fields that are defined
- as stored either by
- default or in the fields configuration file will be
- meaningful in the Doc
- object. Especially this will not be the case for
- the document text. See the rclextract module for accessing
- document contents.
Retrieve the named document attribute. You
- can also use getattr(doc, key) or
- doc.key.
Set the the named document attribute. You
- can also use setattr(doc, key,
- value).
Retrieve the URL in byte array format (no - transcoding), for use as parameter to a - system call.
-Set the URL in byte array format (no - transcoding).
-Return a dictionary of doc object - keys/values
-list of doc object keys (attribute - names).
-A SearchData object
- allows building a query by combining clauses, for
- execution by Query.executesd(). It can be used
- in replacement of the query language approach. The
- interface is going to change a little, so no
- detailed doc for now...
A SearchData object
+ allows building a query by combining clauses, for
+ execution by Query.executesd(). It can be used in
+ replacement of the query language approach. The
+ interface is going to change a little, so no detailed
+ doc for now...
Prior to Recoll
- 1.25, index queries never provide document content
- because it is not stored. More recent versions usually
+ 1.25, index queries could not provide document content
+ because it was never stored. Recoll 1.25 and later usually
store the document text, which can be optionally
retrieved when running a query (see query.execute() above - the result is
@@ -7126,7 +7093,7 @@ recollindex -c "$confdir"
You need to import the recoll module before the rclextract module.