links around some document areas, or automatically by adding a very small
javascript program to the documents, like the following example, which
would initiate a search by double-clicking any term:
@@ -1842,7 +1930,44 @@ Chapter 3. Searching
text/html [file:///Users/uncrypted-dockes/projets/pagepers/index.html] [psxtcl/writemime/recoll]...
text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/recu-chasse-maree....
-3.4. The query language
+3.4. Path translations
+
+ In some cases, the document paths stored inside the index do not match the
+ actual ones, so that document previews and accesses will fail. This can
+ occur in a number of circumstances:
+
+ o When using multiple indexes it is a relatively common occurrence that
+      some will actually reside on a remote volume, for example mounted via
+ NFS. In this case, the paths used to access the documents on the local
+      machine are not necessarily the same as the ones used while indexing
+ on the remote machine. For example, /home/me may have been used as a
+      topdirs element while indexing, but the directory might be mounted as
+ /net/server/home/me on the local machine.
+
+ o The case may also occur with removable disks. It is perfectly possible
+ to configure an index to live with the documents on the removable
+ disk, but it may happen that the disk is not mounted at the same place
+      so that the document paths from the index are invalid.
+
+    o As a last example, one could imagine that a big directory has been
+ moved, but that it is currently inconvenient to run the indexer.
+
+ More generally, the path translation facility may be useful whenever the
+    document paths seen by the indexer are not the same as the ones which
+ should be used at query time.
+
+ Recoll has a facility for rewriting access paths when extracting the data
+ from the index. The translations can be defined for the main index and for
+ any additional query index.
+
+ In the above NFS example, Recoll could be instructed to rewrite any
+ file:///home/me URL from the index to file:///net/server/home/me, allowing
+    access from the client.
+
+ The translations are defined in the ptrans configuration file, which can
+ be edited by hand or from the GUI external indexes configuration dialog.
+
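    As a sketch, a ptrans entry implementing the NFS example above could look
    like the following (the index directory path shown is a hypothetical
    default; the section name is the Xapian index directory to which the
    translations apply):

```
[/home/me/.recoll/xapiandb]
/home/me = /net/server/home/me
```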
+3.5. The query language
The query language processor is activated in the GUI simple search entry
when the search mode selector is set to Query Language. It can also be
@@ -1914,9 +2039,9 @@ Chapter 3. Searching
o dir for filtering the results on file location (Ex:
dir:/home/me/somedir). -dir also works to find results not in the
specified directory (release >= 1.15.8). A tilde inside the value will
- be expanded to the home directory. Wildcards will not be expanded. You
- cannot use OR with dir clauses (this restriction may go away in the
- future).
+ be expanded to the home directory. Wildcards will be expanded, but
+ please have a look at an important limitation of wildcards in path
+ filters.
Relative paths also make sense, for example, dir:share/doc would match
either /usr/share/doc or /usr/local/share/doc
@@ -1930,8 +2055,10 @@ Chapter 3. Searching
This would select results which have both recoll and src in the path
(in any order), and which have not either utils or common.
- Another special aspect of dir clauses is that the values in the index
- are not transcoded to UTF-8, and never lower-cased or unaccented, but
+ You can also use OR conjunctions with dir: clauses.
+
+ A special aspect of dir clauses is that the values in the index are
+ not transcoded to UTF-8, and never lower-cased or unaccented, but
stored as binary. This means that you need to enter the values in the
exact lower or upper case, and that searches for names with diacritics
may sometimes be impossible because of character set conversion
@@ -2000,7 +2127,7 @@ Chapter 3. Searching
configuration, so that the exact field search possibilities may be
different for you if someone took care of the customisation.
- 3.4.1. Modifiers
+ 3.5.1. Modifiers
Some characters are recognized as search modifiers when found immediately
after the closing double quote of a phrase, as in "some
@@ -2025,7 +2152,7 @@ Chapter 3. Searching
o A weight can be specified for a query element by specifying a decimal
value at the start of the modifiers. Example: "Important"2.5.
-3.5. Search case and diacritics sensitivity
+3.6. Search case and diacritics sensitivity
For Recoll versions 1.18 and later, and when working with a raw index (not
the default), searches can be made sensitive to character case and
@@ -2075,7 +2202,7 @@ Chapter 3. Searching
When either case or diacritics sensitivity is activated, stem expansion is
turned off. Having both does not make much sense.
-3.6. Anchored searches and wildcards
+3.7. Anchored searches and wildcards
Some special characters are interpreted by Recoll in search strings to
expand or specialize the search. Wildcards expand a root term in
@@ -2083,7 +2210,7 @@ Chapter 3. Searching
if the match is found at or near the beginning of the document or one of
its fields.
- 3.6.1. More about wildcards
+ 3.7.1. More about wildcards
All words entered in Recoll search fields will be processed for wildcard
expansion before the request is finally executed.
@@ -2098,15 +2225,18 @@ Chapter 3. Searching
matches a single character which may be 'a' or 'b' or 'c', [0-9]
matches any number.
- You should be aware of a few things before using wildcards.
+ You should be aware of a few things when using wildcards.
o Using a wildcard character at the beginning of a word can make for a
slow search because Recoll will have to scan the whole index term list
- to find the matches.
+      to find the matches. However, this is much less of a problem for
+      field searches, and queries like author:*@domain.com can sometimes be
+      very useful.
- o When working with a raw index (preserving character case and
- diacritics), the literal part of a wildcard expression will be matched
- exactly for case and diacritics.
+    o For Recoll version 1.18 only, when working with a raw index (preserving
+ character case and diacritics), the literal part of a wildcard
+ expression will be matched exactly for case and diacritics. This is
+      not true any more for versions 1.19 and later.
o Using a * at the end of a word can produce more matches than you would
think, and strange search results. You can use the term explorer tool
@@ -2116,7 +2246,22 @@ Chapter 3. Searching
expansion will produce better results than an ending * (stem expansion
is turned off when any wildcard character appears in the term).
- 3.6.2. Anchored searches
+ 3.7.1.1. Wildcards and path filtering
+
+ Due to the way that Recoll processes wildcards inside dir path filtering
+ clauses, they will have a multiplicative effect on the query size. A
+    clause containing wildcards in several path elements, like, for example,
+ dir:/home/me/*/*/docdir, will almost certainly fail if your indexed tree
+ is of any realistic size.
+
+ Depending on the case, you may be able to work around the issue by
+    specifying the path elements more narrowly, with a constant prefix, or by
+ using 2 separate dir: clauses instead of multiple wildcards, as in
+ dir:/home/me dir:docdir. The latter query is not equivalent to the initial
+ one because it does not specify a number of directory levels, but that's
+    the best we can do (and it may actually be more useful in some cases).
+
+ 3.7.2. Anchored searches
Two characters are used to specify that a search hit should occur at the
beginning or at the end of the text. ^ at the beginning of a term or
@@ -2145,7 +2290,7 @@ Chapter 3. Searching
matches inside the abstract or the list of authors (which occur at the top
of the document).
-3.7. Desktop integration
+3.8. Desktop integration
Being independant of the desktop type has its drawbacks: Recoll desktop
integration is minimal. However there are a few tools available:
@@ -2159,14 +2304,14 @@ Chapter 3. Searching
Here follow a few other things that may help.
- 3.7.1. Hotkeying recoll
+ 3.8.1. Hotkeying recoll
It is surprisingly convenient to be able to show or hide the Recoll GUI
with a single keystroke. Recoll comes with a small Python script, based on
the libwnck window manager interface library, which will allow you to do
just this. The detailed instructions are on this wiki page.
- 3.7.2. The KDE Kicker Recoll applet
+ 3.8.2. The KDE Kicker Recoll applet
This is probably obsolete now. Anyway:
@@ -2368,32 +2513,68 @@ Chapter 4. Programming interface
The output HTML could be very minimal like the following example:
-
-
-
- some text content
+
+    <html><head>
+    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
+    </head><body>
+    Some text content
+    </body></html>
+
+
You should take care to escape some characters inside the text by
- transforming them into appropriate entities. "&" should be transformed
- into "&", "<" should be transformed into "<". This is not always
- properly done by translating programs which output HTML, and of course
- never by those which output plain text.
+    transforming them into appropriate entities. At the very minimum, "&"
+    should be transformed into "&amp;", "<" should be transformed into "&lt;".
+ This is not always properly done by translating programs which output
+ HTML, and of course never by those which output plain text.
+
+ When encapsulating plain text in an HTML body, the display of a preview
+    may be improved by enclosing the text inside <pre> tags.
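    Since the escaping rules are easy to get wrong, here is a minimal sketch
    of a helper (not part of Recoll) that a filter written in Python could
    use; note that '&' must be processed first:

```python
def escape_for_html(text):
    # '&' must be replaced first, otherwise the '&' characters introduced
    # by the '&lt;' replacement would themselves be escaped a second time.
    return text.replace("&", "&amp;").replace("<", "&lt;")
```

    For example, escape_for_html('AT&T <head>') returns 'AT&amp;T &lt;head>'.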
The character set needs to be specified in the header. It does not need to
be UTF-8 (Recoll will take care of translating it), but it must be
accurate for good results.
- Recoll will also make use of other header fields if they are present:
- title, description, keywords.
+ Recoll will process meta tags inside the header as possible document
+    field candidates. Document fields can be processed by the indexer in
+ different ways, for searching or displaying inside query results. This is
+ described in a following section.
- Filters also have the possibility to "invent" field names. This should be
- output as meta tags:
+ By default, the indexer will process the standard header fields if they
+ are present: title, meta/description, and meta/keywords are both indexed
+ and stored for query-time display.
+
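    For instance, a filter could output a header like the following sketch
    (the values are invented) to set the standard fields:

```
<head>
<title>Document title</title>
<meta name="description" content="A short abstract of the document">
<meta name="keywords" content="some, keywords">
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
</head>
```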
+ A predefined non-standard meta tag will also be processed by Recoll
+ without further configuration: if a date tag is present and has the right
+ format, it will be used as the document date (for display and sorting), in
+ preference to the file modification date. The date format should be as
+ follows:
+
+      <meta name="date" content="YYYY-mm-dd HH:MM:SS">
+
+    or
+
+      <meta name="date" content="YYYY-mm-ddTHH:MM:SS">
+
+    Example:
+
+      <meta name="date" content="2013-02-24 17:50:00">
+
+
+ Filters also have the possibility to "invent" field names. This should
+ also be output as meta tags:
- See the following section for details about configuring how field data is
- processed by the indexer.
+ You can embed HTML markup inside the content of custom fields, for
+ improving the display inside result lists. In this case, add a (wildly
+ non-standard) markup attribute to tell Recoll that the value is HTML and
+ should not be escaped for display.
+
+
+
+ As written above, the processing of fields is described in a further
+ section.
4.1.5. Page numbers
@@ -2409,8 +2590,8 @@ Chapter 4. Programming interface
The field values for documents can appear in several ways during indexing:
either output by filters as meta fields in the HTML header section, or
- added as attributes of the Doc object when using the API, or again
- synthetized internally by Recoll.
+ extracted from file extended attributes, or added as attributes of the Doc
+    object when using the API, or again synthesized internally by Recoll.
The Recoll query language allows searching for text in a specific field.
@@ -2511,234 +2692,237 @@ Chapter 4. Programming interface
Recoll versions after 1.11 define a Python programming interface, both for
searching and indexing.
+ The API is inspired by the Python database API specification, version 1.0
+ for Recoll versions up to 1.18, version 2.0 for Recoll versions 1.19 and
+ later. The package structure changed with Recoll 1.19 too. We will mostly
+ describe the new API and package structure here. A paragraph at the end of
+ this section will explain a few differences and ways to write code
+ compatible with both versions.
+
The Python interface can be found in the source package, under
python/recoll.
- In order to build the module, you should first build or re-build the
- Recoll library using position-independant objects:
+ The python/recoll/ directory contains the usual setup.py. After
+ configuring the main Recoll code, you can use the script to build and
+ install the Python module:
- cd recoll-xxx/
- configure --enable-pic
- make
+ cd recoll-xxx/python/recoll
+ python setup.py build
+ python setup.py install
+
- There is no significant disadvantage in using PIC objects for the main
- Recoll executables, so you can use the --enable-pic option for the main
- build too.
+ 4.3.2.2. Recoll package
- The python/recoll/ directory contains the usual setup.py script which you
- can then use to build and install the module:
+ The recoll package contains two modules:
- cd recoll-xxx/python/recoll
- python setup.py build
- python setup.py install
+ o The recoll module contains functions and classes used to query (or
+ update) the index.
- 4.3.2.2. Interface manual
+ o The rclextract module contains functions and classes used to access
+ document data.
- NAME
- recoll - This is an interface to the Recoll full text indexer.
+ 4.3.2.3. The recoll module
- FILE
- /usr/local/lib/python2.5/site-packages/recoll.so
+ Functions
- CLASSES
- Db
- Doc
- Query
- SearchData
-
- class Db(__builtin__.object)
- | Db([confdir=None], [extra_dbs=None], [writable = False])
- |
- | A Db object holds a connection to a Recoll index. Use the connect()
- | function to create one.
- | confdir specifies a Recoll configuration directory (default:
- | $RECOLL_CONFDIR or ~/.recoll).
- | extra_dbs is a list of external databases (xapian directories)
- | writable decides if we can index new data through this connection
- |
- | Methods defined here:
- |
- |
- | addOrUpdate(...)
- | addOrUpdate(udi, doc, parent_udi=None) -> None
- | Add or update index data for a given document
- | The udi string must define a unique id for the document. It is not
- | interpreted inside Recoll
- | doc is a Doc object
- | if parent_udi is set, this is a unique identifier for the
- | top-level container (ie mbox file)
- |
- | delete(...)
- | delete(udi) -> Bool.
- | Purge index from all data for udi. If udi matches a container
- | document, purge all subdocs (docs with a parent_udi matching udi).
- |
- | makeDocAbstract(...)
- | makeDocAbstract(Doc, Query) -> string
- | Build and return 'keyword-in-context' abstract for document
- | and query.
- |
- | needUpdate(...)
- | needUpdate(udi, sig) -> Bool.
- | Check if the index is up to date for the document defined by udi,
- | having the current signature sig.
- |
- | purge(...)
- | purge() -> Bool.
- | Delete all documents that were not touched during the just finished
- | indexing pass (since open-for-write). These are the documents for
- | the needUpdate() call was not performed, indicating that they no
- | longer exist in the primary storage system.
- |
- | query(...)
- | query() -> Query. Return a new, blank query object for this index.
- |
- | setAbstractParams(...)
- | setAbstractParams(maxchars, contextwords).
- | Set the parameters used to build 'keyword-in-context' abstracts
- |
- | ----------------------------------------------------------------------
- | Data and other attributes defined here:
- |
-
- class Doc(__builtin__.object)
- | Doc()
- |
- | A Doc object contains index data for a given document.
- | The data is extracted from the index when searching, or set by the
- | indexer program when updating. The Doc object has no useful methods but
- | many attributes to be read or set by its user. It matches exactly the
- | Rcl::Doc c++ object. Some of the attributes are predefined, but,
- | especially when indexing, others can be set, the name of which will be
- | processed as field names by the indexing configuration.
- | Inputs can be specified as unicode or strings.
- | Outputs are unicode objects.
- | All dates are specified as unix timestamps, printed as strings
- | Predefined attributes (index/query/both):
- | text (index): document plain text
- | url (both)
- | fbytes (both) optional) file size in bytes
- | filename (both)
- | fmtime (both) optional file modification date. Unix time printed
- | as string
- | dbytes (both) document text bytes
- | dmtime (both) document creation/modification date
- | ipath (both) value private to the app.: internal access path
- | inside file
- | mtype (both) mime type for original document
- | mtime (query) dmtime if set else fmtime
- | origcharset (both) charset the text was converted from
- | size (query) dbytes if set, else fbytes
- | sig (both) app-defined file modification signature.
- | For up to date checks
- | relevancyrating (query)
- | abstract (both)
- | author (both)
- | title (both)
- | keywords (both)
- |
- | Methods defined here:
- |
- |
- | ----------------------------------------------------------------------
- | Data and other attributes defined here:
- |
-
- class Query(__builtin__.object)
- | Recoll Query objects are used to execute index searches.
- | They must be created by the Db.query() method.
- |
- | Methods defined here:
- |
- |
- | execute(...)
- | execute(query_string, stemming=1|0, stemlang="stemming language")
- |
- | Starts a search for query_string, a Recoll search language string
- | (mostly Xesam-compatible).
- | The query can be a simple list of terms (and'ed by default), or more
- | complicated with field specs etc. See the Recoll manual.
- |
- | executesd(...)
- | executesd(SearchData)
- |
- | Starts a search for the query defined by the SearchData object.
- |
- | fetchone(...)
- | fetchone(None) -> Doc
- |
- | Fetches the next Doc object in the current search results.
- |
- | sortby(...)
- | sortby(field=fieldname, ascending=true)
- | Sort results by 'fieldname', in ascending or descending order.
- | Only one field can be used, no subsorts for now.
- | Must be called before executing the search
- |
- | ----------------------------------------------------------------------
- | Data descriptors defined here:
- |
- | next
- | Next index to be fetched from results. Normally increments after
- | each fetchone() call, but can be set/reset before the call effect
- | seeking. Starts at 0
- |
- | ----------------------------------------------------------------------
- | Data and other attributes defined here:
- |
-
- class SearchData(__builtin__.object)
- | SearchData()
- |
- | A SearchData object describes a query. It has a number of global
- | parameters and a chain of search clauses.
- |
- | Methods defined here:
- |
- |
- | addclause(...)
- | addclause(type='and'|'or'|'excl'|'phrase'|'near'|'sub',
- | qstring=string, slack=int, field=string, stemming=1|0,
- | subSearch=SearchData)
- | Adds a simple clause to the SearchData And/Or chain, or a subquery
- | defined by another SearchData object
- |
- | ----------------------------------------------------------------------
- | Data and other attributes defined here:
- |
+ connect(confdir=None, extra_dbs=None, writable = False)
+ The connect() function connects to one or several Recoll index(es)
+ and returns a Db object.
+ o confdir may specify a configuration directory. The usual
+ defaults apply.
+ o extra_dbs is a list of additional indexes (Xapian
+ directories).
+ o writable decides if we can index new data through this
+ connection.
+ This call initializes the recoll module, and it should always be
+ performed before any other call or object creation.
- FUNCTIONS
- connect(...)
- connect([confdir=None], [extra_dbs=None], [writable = False])
- -> Db.
-
- Connects to a Recoll database and returns a Db object.
- confdir specifies a Recoll configuration directory
- (the default is built like for any Recoll program).
- extra_dbs is a list of external databases (xapian directories)
- writable decides if we can index new data through this connection
+ Classes
- 4.3.2.3. Example code
+ The Db class
+
+    A Db object is created by the connect() function and holds a connection to a
+ Recoll index.
+
+ Methods
+
+ Db.close()
+ Closes the connection. You can't do anything with the Db object
+ after this.
+
+ Db.query(), Db.cursor()
+ These aliases return a blank Query object for this index.
+
+ Db.setAbstractParams(maxchars, contextwords)
+ Set the parameters used to build snippets.
+
+ The Query class
+
+ A Query object (equivalent to a cursor in the Python DB API) is created by
+ a Db.query() call. It is used to execute index searches.
+
+ Methods
+
+ Query.sortby(fieldname, ascending=True)
+ Sort results by fieldname, in ascending or descending order. Must
+ be called before executing the search.
+
+ Query.execute(query_string, stemming=1, stemlang="english")
+ Starts a search for query_string, a Recoll search language string.
+
+ Query.executesd(SearchData)
+ Starts a search for the query defined by the SearchData object.
+
+ Query.fetchmany(size=query.arraysize)
+ Fetches the next Doc objects in the current search results, and
+ returns them as an array of the required size, which is by default
+ the value of the arraysize data member.
+
+ Query.fetchone()
+ Fetches the next Doc object from the current search results.
+
+ Query.close()
+          Closes the query. The object is unusable after the call.
+
+ Query.scroll(value, mode='relative')
+ Adjusts the position in the current result set. mode can be
+ relative or absolute.
+
+ Query.getgroups()
+          Retrieves the expanded query terms as a list of pairs. Meaningful
+          only after execute() or executesd(). In each pair, the first
+          entry is a list of user terms, the second a list of query terms
+          as derived from the user terms and used in the Xapian Query. The
+          size of each list is one for simple terms, or more for group and
+          phrase clauses.
+
+ Query.getxquery()
+ Return the Xapian query description as a Unicode string.
+          Meaningful only after execute() or executesd().
+
+ Query.highlight(text, ishtml = 0, methods = object)
+          Will insert begin and end match tags around the match
+ areas in the input text and return the modified text. ishtml can
+ be set to indicate that the input text is HTML and that HTML
+          special characters should not be escaped. methods, if set, should
+          be an object with methods startMatch(i) and endMatch(), which
+          will be called for each match and should return a begin and end
+          tag.
+
+      Query.makedocabstract(doc, methods = object)
+ Create a snippets abstract for doc (a Doc object) by selecting
+ text around the match terms. If methods is set, will also perform
+ highlighting. See the highlight method.
+
+ Query.__iter__() and Query.next()
+ So that things like for doc in query: will work.
+
+ Data descriptors
+
+ Query.arraysize
+ Default number of records processed by fetchmany (r/w).
+
+ Query.rowcount
+ Number of records returned by the last execute.
+
+ Query.rownumber
+ Next index to be fetched from results. Normally increments after
+          each fetchone() call, but can be set/reset before the call to
+          effect seeking. Starts at 0.
+
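    Putting the query pieces together, a session could look like the
    following sketch (the search string is an assumption, and an existing
    index is needed for this to return anything):

```python
from recoll import recoll

db = recoll.connect()     # use the default configuration directory
query = db.query()        # db.cursor() would do the same
nres = query.execute("some search terms")

# Fetch the first few results; rownumber tracks the position.
for doc in query.fetchmany(5):
    print query.rownumber, doc.url
query.close()
```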
+ The Doc class
+
+ A Doc object contains index data for a given document. The data is
+ extracted from the index when searching, or set by the indexer program
+ when updating. The Doc object has many attributes to be read or set by its
+ user. It matches exactly the Rcl::Doc C++ object. Some of the attributes
+ are predefined, but, especially when indexing, others can be set, the name
+ of which will be processed as field names by the indexing configuration.
+ Inputs can be specified as Unicode or strings. Outputs are Unicode
+ objects. All dates are specified as Unix timestamps, printed as strings.
+ Please refer to the rcldb/rcldoc.h C++ file for a description of the
+ predefined attributes.
+
+ At query time, only the fields that are defined as stored either by
+ default or in the fields configuration file will be meaningful in the Doc
+    object. In particular, this will not be the case for the document text. See
+ the rclextract module for accessing document contents.
+
+ Methods
+
+ get(key), [] operator
+ Retrieve the named doc attribute
+
+ getbinurl()
+ Retrieve the URL in byte array format (no transcoding), for use as
+ parameter to a system call.
+
+      items()
+          Return a dictionary of doc object keys/values.
+
+      keys()
+          Return a list of doc object keys (attribute names).
+
+ The SearchData class
+
+ A SearchData object allows building a query by combining clauses, for
+    execution by Query.executesd(). It can be used in place of the query
+ language approach. The interface is going to change a little, so no
+ detailed doc for now...
+
+ Methods
+
+ addclause(type='and'|'or'|'excl'|'phrase'|'near'|'sub', qstring=string,
+ slack=0, field='', stemming=1, subSearch=SearchData)
+
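    As an illustrative sketch only, given the above caveat about interface
    stability (the clause terms are invented):

```python
from recoll import recoll

db = recoll.connect()
sd = recoll.SearchData()
sd.addclause(type='and', qstring='recoll')   # results must contain this
sd.addclause(type='excl', qstring='obsolete')  # and must not contain this
query = db.query()
query.executesd(sd)
doc = query.fetchone()
```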
+ 4.3.2.4. The rclextract module
+
+ Document content is not provided by an index query. To access it, the data
+ extraction part of the indexing process must be performed (subdocument
+ access and format translation). This is not trivial in general. The
+ rclextract module currently provides a single class which can be used to
+ access the data content for result documents.
+
+ Classes
+
+ The Extractor class
+
+ Methods
+
+ Extractor(doc)
+ An Extractor object is built from a Doc object, output from a
+ query.
+
+ Extractor.textextract(ipath)
+ Extract document defined by ipath and return a Doc object. The
+ doc.text field has the document text as either text/plain or
+ text/html according to doc.mimetype.
+
+ Extractor.idoctofile()
+ Extracts document into an output file, which can be given
+ explicitly or will be created as a temporary file to be deleted by
+ the caller.
+
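    For example, the text content for a query result could be retrieved as in
    the following sketch (the search terms are invented):

```python
from recoll import recoll, rclextract

db = recoll.connect()
query = db.query()
query.execute("some search terms")
doc = query.fetchone()

extractor = rclextract.Extractor(doc)
fulldoc = extractor.textextract(doc.ipath)
# fulldoc.text now holds text/plain or text/html data,
# according to fulldoc.mimetype.
```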
+ 4.3.2.5. Example code
The following sample would query the index with a user language string.
See the python/samples directory inside the Recoll source for other
- examples.
+ examples. The recollgui subdirectory has a very embryonic GUI which
+ demonstrates the highlighting and data extraction functions.
#!/usr/bin/env python
- import recoll
+ from recoll import recoll
db = recoll.connect()
- db.setAbstractParams(maxchars=80, contextwords=2)
+ db.setAbstractParams(maxchars=80, contextwords=4)
query = db.query()
nres = query.execute("some user question")
print "Result count: ", nres
if nres > 5:
nres = 5
- while query.next >= 0 and query.next < nres:
+ for i in range(nres):
doc = query.fetchone()
- print query.next
+ print "Result #%d" % (query.rownumber,)
for k in ("title", "size"):
print k, ":", getattr(doc, k).encode('utf-8')
abs = db.makeDocAbstract(doc, query).encode('utf-8')
@@ -2747,6 +2931,32 @@ Chapter 4. Programming interface
+ 4.3.2.6. Compatibility with the previous version
+
+ The following code fragments can be used to ensure that code can run with
+    both the old and the new API (as long as it does not use features
+    specific to the new API, of course).
+
+ Adapting to the new package structure:
+
+
+ try:
+ from recoll import recoll
+ from recoll import rclextract
+ hasextract = True
+ except:
+ import recoll
+ hasextract = False
+
+
+    Adapting to the change of nature of the Query next member. The same
+    test can be used to choose between using the scroll() method (new) or
+    setting the next value (old).
+
+
+ rownum = query.next if type(query.next) == int else \
+ query.rownumber
+
Chapter 5. Installation and configuration
@@ -3359,10 +3569,22 @@ Chapter 5. Installation and configuration
This allows setting fields for all documents under a given
directory. Typical usage would be to set an "rclaptg" field, to be
used in mimeview to select a specific viewer. If several fields
- are to be set, they should be separated with a colon (':')
- character (which there is currently no way to escape). Ie:
- localfields= rclaptg=gnus:other = val, then select specifier
- viewer with mimetype|tag=... in mimeview.
+ are to be set, they should be separated with a semi-colon (';')
+ character, which there is currently no way to escape. Also note
+ the initial semi-colon. Example: localfields= ;rclaptg=gnus;other
+          = val, then select a specific viewer with mimetype|tag=... in
+ mimeview.
+
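    For instance, a complete sketch of such a section could be (the directory
    and field values are invented):

```
[/path/to/some/docs]
localfields = ;rclaptg=gnus;other = val
```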
+ metadatacmds
+
+ This allows executing external commands for each file and storing
+ the output in a Recoll field. This could be used for example to
+ index external tag data. The value is a list of field names and
+          commands; don't forget the initial semi-colon. Example:
+
+ [/some/area/of/the/fs]
+ metadatacmds = ; tags = tmsu tags %f; otherfield = somecmd -xx %f
+
5.4.1.3. Parameters affecting where and how we store things:
@@ -3592,6 +3814,18 @@ Chapter 5. Installation and configuration
# mailmytag field name
x-my-tag = mailmytag
+ 5.4.2.1. Extended attributes in the fields file
+
+ Recoll versions 1.19 and later process user extended file attributes as
+    document fields by default.
+
+ Attributes are processed as fields of the same name, after removing the
+ user prefix on Linux.
+
+ The [xattrtofields] section of the fields file allows specifying
+    translations from extended attribute names to Recoll field names. An
+ empty translation disables use of the corresponding attribute data.
+
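    As a sketch (the attribute and field names are invented; the empty value
    on the second line disables use of that attribute's data):

```
[xattrtofields]
some.attribute = somefield
ignored.attribute =
```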
5.4.3. The mimemap file
mimemap specifies the file name extension to mime type mappings.
@@ -3699,9 +3933,28 @@ Chapter 5. Installation and configuration
document. This could be used in combination with field customisation to
help with opening the document.
- 5.4.6. Examples of configuration adjustments
+ 5.4.6. The ptrans file
- 5.4.6.1. Adding an external viewer for an non-indexed type
+ ptrans specifies query-time path translations. These can be useful in
+ multiple cases.
+
+ The file has a section for any index which needs translations, either the
+ main one or additional query indexes. The sections are named with the
+ Xapian index directory names. No slash character should exist at the end
+    of the paths (all comparisons are textual). An example should make things
+    sufficiently clear:
+
+ [/home/me/.recoll/xapiandb]
+ /this/directory/moved = /to/this/place
+
+ [/path/to/additional/xapiandb]
+ /server/volume1/docdir = /net/server/volume1/docdir
+ /server/volume2/docdir = /net/server/volume2/docdir
+
+
+ 5.4.7. Examples of configuration adjustments
+
+ 5.4.7.1. Adding an external viewer for a non-indexed type
Imagine that you have some kind of file which does not have indexable
content, but for which you would like to have a functional Open link in
@@ -3731,7 +3984,7 @@ Chapter 5. Installation and configuration
configuration, which you do not need to alter. mimeview can also be
modified from the Gui.
- 5.4.6.2. Adding indexing support for a new file type
+ 5.4.7.2. Adding indexing support for a new file type
Let us now imagine that the above .blob files actually contain indexable
text and that you know how to extract it with a command line program.