*** empty log message ***
parent
34cd8293ac
commit
d910d2bebe
71
src/INSTALL
@ -11,23 +11,21 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
|||||||
|
|
||||||
--------------------------------------------------------------------------
|
--------------------------------------------------------------------------
|
||||||
|
|
||||||
Chapter 4. Installation
|
Chapter 5. Installation
|
||||||
|
|
||||||
Table of Contents
|
Table of Contents
|
||||||
|
|
||||||
4.1. Installing a prebuilt copy
|
5.1. Installing a prebuilt copy
|
||||||
|
|
||||||
4.2. Supporting packages
|
5.2. Supporting packages
|
||||||
|
|
||||||
4.3. Building from source
|
5.3. Building from source
|
||||||
|
|
||||||
4.4. Configuration overview
|
5.4. Configuration overview
|
||||||
|
|
||||||
4.5. The KDE Kicker Recoll applet
|
5.5. The KDE Kicker Recoll applet
|
||||||
|
|
||||||
4.6. Extending Recoll
|
5.1. Installing a prebuilt copy
|
||||||
|
|
||||||
4.1. Installing a prebuilt copy
|
|
||||||
|
|
||||||
Recoll binary packages from the Recoll web site are always linked
|
Recoll binary packages from the Recoll web site are always linked
|
||||||
statically to the Xapian libraries, and have no other dependencies. You
|
statically to the Xapian libraries, and have no other dependencies. You
|
||||||
@ -36,12 +34,12 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
|||||||
have a look at the configuration section (but this may not be necessary
|
have a look at the configuration section (but this may not be necessary
|
||||||
for a quick test with default parameters).
|
for a quick test with default parameters).
|
||||||
|
|
||||||
4.1.1. Installing through a package system
|
5.1.1. Installing through a package system
|
||||||
|
|
||||||
If you use a BSD-type port system or a prebuilt package (RPM or other),
|
If you use a BSD-type port system or a prebuilt package (RPM or other),
|
||||||
just follow the usual procedure for your system.
|
just follow the usual procedure for your system.
|
||||||
|
|
||||||
4.1.2. Installing a prebuilt Recoll
|
5.1.2. Installing a prebuilt Recoll
|
||||||
|
|
||||||
The unpackaged binary versions on the Recoll web site are just compressed
|
The unpackaged binary versions on the Recoll web site are just compressed
|
||||||
tar files of a build tree, where only the useful parts were kept
|
tar files of a build tree, where only the useful parts were kept
|
||||||
@ -56,23 +54,29 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
|||||||
|
|
||||||
--------------------------------------------------------------------------
|
--------------------------------------------------------------------------
|
||||||
|
|
||||||
Prev Home Next
|
Prev Home Next
|
||||||
Customizing the search interface Supporting packages
|
API Supporting packages
|
||||||
Link: HOME
|
Link: HOME
|
||||||
Link: UP
|
Link: UP
|
||||||
Link: PREVIOUS
|
Link: PREVIOUS
|
||||||
Link: NEXT
|
Link: NEXT
|
||||||
|
|
||||||
Recoll user manual
|
Recoll user manual
|
||||||
Prev Chapter 4. Installation Next
|
Prev Chapter 5. Installation Next
|
||||||
|
|
||||||
--------------------------------------------------------------------------
|
--------------------------------------------------------------------------
|
||||||
|
|
||||||
4.2. Supporting packages
|
5.2. Supporting packages
|
||||||
|
|
||||||
Recoll uses external applications to index some file types. You need to
|
Recoll uses external applications to index some file types. You need to
|
||||||
install them for the file types that you wish to have indexed (these are
|
install them for the file types that you wish to have indexed (these are
|
||||||
run-time dependencies. None is needed for building Recoll):
|
run-time dependencies. None is needed for building Recoll).
|
||||||
|
|
||||||
|
After an indexing pass, the commands that were found missing can be
|
||||||
|
displayed from the recoll File menu. The list is stored in the missing
|
||||||
|
text file inside the configuration directory.
|
||||||
|
|
||||||
|
A list of common file types which need external commands:
|
||||||
|
|
||||||
* Openoffice: supported natively, but needs the unzip command to be
|
* Openoffice: supported natively, but needs the unzip command to be
|
||||||
installed.
|
installed.
|
||||||
@ -118,13 +122,13 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
|||||||
Link: NEXT
|
Link: NEXT
|
||||||
|
|
||||||
Recoll user manual
|
Recoll user manual
|
||||||
Prev Chapter 4. Installation Next
|
Prev Chapter 5. Installation Next
|
||||||
|
|
||||||
--------------------------------------------------------------------------
|
--------------------------------------------------------------------------
|
||||||
|
|
||||||
4.3. Building from source
|
5.3. Building from source
|
||||||
|
|
||||||
4.3.1. Prerequisites
|
5.3.1. Prerequisites
|
||||||
|
|
||||||
At the very least, you will need to download and install the xapian core
|
At the very least, you will need to download and install the xapian core
|
||||||
package (Recoll 1.9 normally uses version 1.0.2, but any 0.9 or 1.0.x
|
package (Recoll 1.9 normally uses version 1.0.2, but any 0.9 or 1.0.x
|
||||||
@ -140,7 +144,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
|||||||
not be critical). On Linux systems, the iconv interface is part of libc
|
not be critical). On Linux systems, the iconv interface is part of libc
|
||||||
and you should not need to do anything special.
|
and you should not need to do anything special.
|
||||||
|
|
||||||
4.3.2. Building
|
5.3.2. Building
|
||||||
|
|
||||||
Recoll has been built on Linux (redhat7.3, mandriva 2005/6, Fedora Core
|
Recoll has been built on Linux (redhat7.3, mandriva 2005/6, Fedora Core
|
||||||
3/4/5/6), FreeBSD 5/6, macosx, and Solaris 8. If you build on another
|
3/4/5/6), FreeBSD 5/6, macosx, and Solaris 8. If you build on another
|
||||||
@ -178,7 +182,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
|||||||
manually copy and modify one of the existing files (the new file name
|
manually copy and modify one of the existing files (the new file name
|
||||||
should be the output of uname -s).
|
should be the output of uname -s).
|
||||||
|
|
||||||
4.3.3. Installation
|
5.3.3. Installation
|
||||||
|
|
||||||
Either type make install or execute recollinstall prefix, in the root of
|
Either type make install or execute recollinstall prefix, in the root of
|
||||||
the source tree. This will copy the commands to prefix/bin and the sample
|
the source tree. This will copy the commands to prefix/bin and the sample
|
||||||
@ -201,11 +205,11 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
|||||||
Link: NEXT
|
Link: NEXT
|
||||||
|
|
||||||
Recoll user manual
|
Recoll user manual
|
||||||
Prev Chapter 4. Installation Next
|
Prev Chapter 5. Installation Next
|
||||||
|
|
||||||
--------------------------------------------------------------------------
|
--------------------------------------------------------------------------
|
||||||
|
|
||||||
4.4. Configuration overview
|
5.4. Configuration overview
|
||||||
|
|
||||||
Most of the parameters specific to the recoll GUI are set through the
|
Most of the parameters specific to the recoll GUI are set through the
|
||||||
Preferences menu and stored in the standard QT place ($HOME/.qt/recollrc).
|
Preferences menu and stored in the standard QT place ($HOME/.qt/recollrc).
|
||||||
@ -263,7 +267,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
|||||||
White space is used for separation inside lists. List elements with
|
White space is used for separation inside lists. List elements with
|
||||||
embedded spaces can be quoted using double-quotes.
|
embedded spaces can be quoted using double-quotes.
|
||||||
|
|
||||||
4.4.1. Main configuration file
|
5.4.1. Main configuration file
|
||||||
|
|
||||||
recoll.conf is the main configuration file. It defines things like what to
|
recoll.conf is the main configuration file. It defines things like what to
|
||||||
index (top directories and things to ignore), and the default character
|
index (top directories and things to ignore), and the default character
|
||||||
@ -467,7 +471,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
|||||||
cases. A value of 3 would allow more precision and efficiency on
|
cases. A value of 3 would allow more precision and efficiency on
|
||||||
longer words, but the index will be approximately twice as large.
|
longer words, but the index will be approximately twice as large.
|
||||||
|
|
||||||
4.4.2. The mimemap file
|
5.4.2. The mimemap file
|
||||||
|
|
||||||
mimemap specifies the file name extension to mime type mappings.
|
mimemap specifies the file name extension to mime type mappings.
|
||||||
|
|
||||||
@ -491,7 +495,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
|||||||
there avoids cluttering the more user-oriented and locally customized
|
there avoids cluttering the more user-oriented and locally customized
|
||||||
skippedNames.
|
skippedNames.
|
||||||
|
|
||||||
4.4.3. The mimeconf file
|
5.4.3. The mimeconf file
|
||||||
|
|
||||||
mimeconf specifies how the different mime types are handled for indexing,
|
mimeconf specifies how the different mime types are handled for indexing,
|
||||||
and which icons are displayed in the recoll result lists.
|
and which icons are displayed in the recoll result lists.
|
||||||
@ -503,7 +507,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
|||||||
recoll in the result lists (the values are the basenames of the png images
|
recoll in the result lists (the values are the basenames of the png images
|
||||||
inside the iconsdir directory (specified in recoll.conf).
|
inside the iconsdir directory (specified in recoll.conf).
|
||||||
|
|
||||||
4.4.4. The mimeview file
|
5.4.4. The mimeview file
|
||||||
|
|
||||||
mimeview specifies which programs are started when you click on an Edit
|
mimeview specifies which programs are started when you click on an Edit
|
||||||
link in a result list. Ie: HTML is normally displayed using firefox, but
|
link in a result list. Ie: HTML is normally displayed using firefox, but
|
||||||
@ -524,9 +528,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
|||||||
user preferences, all mimeview entries will be ignored except the one
|
user preferences, all mimeview entries will be ignored except the one
|
||||||
labelled application/x-all (which is set to use xdg-open by default).
|
labelled application/x-all (which is set to use xdg-open by default).
|
||||||
|
|
||||||
4.4.5. Examples of configuration adjustments
|
5.4.5. Examples of configuration adjustments
|
||||||
|
|
||||||
4.4.5.1. Adding an external viewer for an non-indexed type
|
5.4.5.1. Adding an external viewer for a non-indexed type
|
||||||
|
|
||||||
Imagine that you have some kind of file which does not have indexable
|
Imagine that you have some kind of file which does not have indexable
|
||||||
content, but for which you would like to have a functional Edit link in
|
content, but for which you would like to have a functional Edit link in
|
||||||
@ -557,7 +561,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
|||||||
The entries you add in your personal file override those in the central
|
The entries you add in your personal file override those in the central
|
||||||
configuration, which you do not need to alter.
|
configuration, which you do not need to alter.
|
||||||
|
|
||||||
4.4.5.2. Adding indexing support for a new file type
|
5.4.5.2. Adding indexing support for a new file type
|
||||||
|
|
||||||
Let us now imagine that the above .blob files actually contain indexable
|
Let us now imagine that the above .blob files actually contain indexable
|
||||||
text and that you know how to extract it with a command line program.
|
text and that you know how to extract it with a command line program.
|
||||||
@ -581,11 +585,10 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
|||||||
|
|
||||||
The rclblob filter should be an executable program or script which exists
|
The rclblob filter should be an executable program or script which exists
|
||||||
inside /usr/[local/]share/recoll/filters. It will be given a file name as
|
inside /usr/[local/]share/recoll/filters. It will be given a file name as
|
||||||
argument and should output the text contents in html format on the
|
argument and should output the text contents on the standard output.
|
||||||
standard output.
|
|
||||||
|
|
||||||
You can find more details about writing a Recoll filter in the section
|
The filter programming section describes in more detail how to write a
|
||||||
about writing filters
|
filter.
|
||||||
|
|
||||||
--------------------------------------------------------------------------
|
--------------------------------------------------------------------------
|
||||||
|
|
||||||
|
|||||||
662
src/README
@ -78,41 +78,51 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
|||||||
|
|
||||||
3.12. Customizing the search interface
|
3.12. Customizing the search interface
|
||||||
|
|
||||||
4. Installation
|
4. Programming interface
|
||||||
|
|
||||||
4.1. Installing a prebuilt copy
|
4.1. Writing a document filter
|
||||||
|
|
||||||
4.1.1. Installing through a package system
|
4.1.1. Filter HTML output
|
||||||
|
|
||||||
4.1.2. Installing a prebuilt Recoll
|
4.2. Field data processing configuration
|
||||||
|
|
||||||
4.2. Supporting packages
|
4.3. API
|
||||||
|
|
||||||
4.3. Building from source
|
4.3.1. Interface elements
|
||||||
|
|
||||||
4.3.1. Prerequisites
|
4.3.2. Python interface
|
||||||
|
|
||||||
4.3.2. Building
|
5. Installation
|
||||||
|
|
||||||
4.3.3. Installation
|
5.1. Installing a prebuilt copy
|
||||||
|
|
||||||
4.4. Configuration overview
|
5.1.1. Installing through a package system
|
||||||
|
|
||||||
4.4.1. Main configuration file
|
5.1.2. Installing a prebuilt Recoll
|
||||||
|
|
||||||
4.4.2. The mimemap file
|
5.2. Supporting packages
|
||||||
|
|
||||||
4.4.3. The mimeconf file
|
5.3. Building from source
|
||||||
|
|
||||||
4.4.4. The mimeview file
|
5.3.1. Prerequisites
|
||||||
|
|
||||||
4.4.5. Examples of configuration adjustments
|
5.3.2. Building
|
||||||
|
|
||||||
4.5. The KDE Kicker Recoll applet
|
5.3.3. Installation
|
||||||
|
|
||||||
4.6. Extending Recoll
|
5.4. Configuration overview
|
||||||
|
|
||||||
4.6.1. Writing a document filter
|
5.4.1. Main configuration file
|
||||||
|
|
||||||
|
5.4.2. The mimemap file
|
||||||
|
|
||||||
|
5.4.3. The mimeconf file
|
||||||
|
|
||||||
|
5.4.4. The mimeview file
|
||||||
|
|
||||||
|
5.4.5. Examples of configuration adjustments
|
||||||
|
|
||||||
|
5.5. The KDE Kicker Recoll applet
|
||||||
|
|
||||||
----------------------------------------------------------------------
|
----------------------------------------------------------------------
|
||||||
|
|
||||||
@ -256,8 +266,14 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
|||||||
individually indexed documents.
|
individually indexed documents.
|
||||||
|
|
||||||
Recoll indexing processes plain text, HTML, openoffice and e-mail files
|
Recoll indexing processes plain text, HTML, openoffice and e-mail files
|
||||||
internally. Other types (ie: postscript, pdf, ms-word, rtf) need external
|
internally.
|
||||||
|
|
||||||
|
Other file types (e.g. postscript, pdf, ms-word, rtf ...) need external
|
||||||
applications for preprocessing. The list is in the installation section.
|
applications for preprocessing. The list is in the installation section.
|
||||||
|
After every indexing operation, Recoll updates a list of commands that
|
||||||
|
would be needed for indexing existing file types. This list can be
|
||||||
|
displayed from the recoll File menu. It is stored in the missing text file
|
||||||
|
inside the configuration directory.
|
||||||
|
|
||||||
Without further configuration, Recoll will index all appropriate files
|
Without further configuration, Recoll will index all appropriate files
|
||||||
from your home directory, with a reasonable set of defaults.
|
from your home directory, with a reasonable set of defaults.
|
||||||
@ -717,6 +733,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
|||||||
The query language processor is activated on the simple search entry when
|
The query language processor is activated on the simple search entry when
|
||||||
the search mode selector is set to Query Language.
|
the search mode selector is set to Query Language.
|
||||||
|
|
||||||
|
The language is roughly based on the Xesam user search language
|
||||||
|
specification.
|
||||||
|
|
||||||
Here follows a sample request that we are going to explain:
|
Here follows a sample request that we are going to explain:
|
||||||
|
|
||||||
author:"john doe" Beatles OR Lennon Live OR Unplugged -potatoes
|
author:"john doe" Beatles OR Lennon Live OR Unplugged -potatoes
|
||||||
@ -728,6 +747,12 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
|||||||
or lennon and either live or unplugged but not potatoes (in any part of
|
or lennon and either live or unplugged but not potatoes (in any part of
|
||||||
the document).
|
the document).
|
||||||
|
|
||||||
|
An element is composed of an optional field specification, and a value,
|
||||||
|
separated by a colon. Example: Beatles, author:balzac, dc:title:grandet
|
||||||
|
|
||||||
|
The colon, if present, means "contains". Xesam defines other relations,
|
||||||
|
which are not supported for now.
|
||||||
|
|
||||||
All elements in the search entry are normally combined with an implicit
|
All elements in the search entry are normally combined with an implicit
|
||||||
AND. It is possible to specify that elements be OR'ed instead, as in
|
AND. It is possible to specify that elements be OR'ed instead, as in
|
||||||
Beatles OR Lennon. The OR must be entered literally (capitals), and it has
|
Beatles OR Lennon. The OR must be entered literally (capitals), and it has
|
||||||
@ -735,51 +760,69 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
|||||||
(word2 OR word3) not (word1 AND word2) OR word3. Do not enter explicit
|
(word2 OR word3) not (word1 AND word2) OR word3. Do not enter explicit
|
||||||
parentheses; they are not supported for now.
|
parentheses; they are not supported for now.
|
||||||
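The precedence rule above (OR binds more tightly than the implicit AND) can be illustrated with a small Python sketch. This is not Recoll's parser: the function name group_elements is hypothetical, and the sketch only groups elements, without interpreting fields or negation.

```python
import shlex


def group_elements(query):
    """Group query elements: the result is an AND of OR-groups.

    Hypothetical illustration only; Recoll's real parser is not shown here.
    """
    groups = []
    join_next = False
    # shlex.split keeps a quoted phrase together as a single element
    for token in shlex.split(query):
        if token == "OR":
            # The literal, capitalized OR attaches the next element
            # to the previous group
            join_next = True
            continue
        if join_next and groups:
            groups[-1].append(token)
        else:
            groups.append([token])
        join_next = False
    return groups


if __name__ == "__main__":
    print(group_elements("word1 word2 OR word3"))
```

Each inner list holds alternatives (OR), and the outer list is the implicit AND, so word1 word2 OR word3 is grouped as word1 AND (word2 OR word3).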
|
|
||||||
An entry preceded by a - specifies a term that should not appear.
|
An element preceded by a - specifies a term that should not appear. Pure
|
||||||
|
negative queries are forbidden.
|
||||||
|
|
||||||
The first element in the above exemple, author:"john doe" is a phrase
|
As usual, words inside quotes define a phrase (the order of words is
|
||||||
search limited to a specific field. Phrase searches are specified as usual
|
significant), so that title:"prejudice pride" is not the same as
|
||||||
by enclosing the words in double quotes. The field specification appears
|
title:prejudice title:pride, and is unlikely to find a result.
|
||||||
before the colon (of course this is not limited to phrases, author:Balzac
|
|
||||||
would be ok too). Recoll currently manages the following fields:
|
Recoll currently manages the following default fields:
|
||||||
|
|
||||||
* title, subject or caption are synonyms which specify data to be
|
* title, subject or caption are synonyms which specify data to be
|
||||||
searched for in the document title or subject.
|
searched for in the document title or subject.
|
||||||
|
|
||||||
* author or from for searching the documents originators.
|
* author or from for searching the documents originators.
|
||||||
|
|
||||||
* keyword for searching the document specified keywords (few documents
|
* recipient or to for searching the document's recipients.
|
||||||
|
|
||||||
|
* keyword for searching the document-specified keywords (few documents
|
||||||
actually have any).
|
actually have any).
|
||||||
|
|
||||||
As of release 1.9, the filters have the possibility to create other fields
|
* filename for the document's file name.
|
||||||
with arbitrary names. No standard filters use this possibility yet.
|
|
||||||
|
|
||||||
There are two other elements which may be specified through the field
|
* ext specifies the file name extension (Ex: ext:html)
|
||||||
syntax, but are somewhat special:
|
|
||||||
|
|
||||||
* ext for specifying the file name extension (Ex: ext:html)
|
The field syntax also supports a few field-like, but special, criteria:
|
||||||
|
|
||||||
* dir for specifying the file location (Ex: dir:/home/me/somedir).
|
* dir for filtering the results on file location (Ex:
|
||||||
Please note that this is quite inefficient, that it may produce very
|
dir:/home/me/somedir). Please note that this is quite inefficient,
|
||||||
slow searches, and that it may be worth in some cases to set up
|
that it may produce very slow searches, and that it may be worthwhile in
|
||||||
separate databases instead.
|
some cases to set up separate databases instead.
|
||||||
|
|
||||||
* mime for specifying the mime type. This one is quite special because
|
* mime or format for specifying the mime type. This one is quite special
|
||||||
you can specify several values which will be OR'ed (the normal default
|
because you can specify several values which will be OR'ed (the normal
|
||||||
for the language is AND). Ex: mime:text/plain mime:text/html.
|
default for the language is AND). Ex: mime:text/plain mime:text/html.
|
||||||
Specifying an explicit boolean operator or negation (-) before a mime
|
Specifying an explicit boolean operator or negation (-) before a mime
|
||||||
specification is not supported and will produce strange results.
|
specification is not supported and will produce strange results.
|
||||||
|
|
||||||
|
* type or rclcat for specifying the category (as in
|
||||||
|
text/media/presentation/etc.). The classification of mime types in
|
||||||
|
categories is defined in the Recoll configuration (mimeconf), and can
|
||||||
|
be modified or extended. The default category names are those which
|
||||||
|
permit filtering results in the main GUI screen. Categories are OR'ed
|
||||||
|
like mime types above.
|
||||||
|
|
||||||
|
The document filters used while indexing have the possibility to create
|
||||||
|
other fields with arbitrary names, and aliases may be defined in the
|
||||||
|
configuration, so that the exact field search possibilities may be
|
||||||
|
different for you if someone took care of the customisation.
|
||||||
|
|
||||||
The query language is currently the only way to use the Recoll field
|
The query language is currently the only way to use the Recoll field
|
||||||
search capability.
|
search capability.
|
||||||
|
|
||||||
Words inside phrases and capitalized words are not stem-expanded.
|
Words inside phrases and capitalized words are not stem-expanded.
|
||||||
Wildcards may be used anywhere inside a term. Specifying a wild-card on
|
Wildcards may be used anywhere inside a term. Specifying a wild-card on
|
||||||
the left of a term can produce a very slow search.
|
the left of a term can produce a very slow search (or even an incorrect
|
||||||
|
one if the expansion is truncated because of excessive size).
|
||||||
|
|
||||||
You can use the show query link at the top of the result list to check the
|
You can use the show query link at the top of the result list to check the
|
||||||
exact query which was finally executed by Xapian.
|
exact query which was finally executed by Xapian.
|
||||||
|
|
||||||
|
Most Xesam phrase modifiers are unsupported, except for l (small ell) to
|
||||||
|
disable stemming, and p to turn a phrase into a NEAR (unordered) search.
|
||||||
|
Example: "prejudice pride"p
|
||||||
|
|
||||||
----------------------------------------------------------------------
|
----------------------------------------------------------------------
|
||||||
|
|
||||||
3.5. Complex/advanced search
|
3.5. Complex/advanced search
|
||||||
@ -1194,13 +1237,432 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
|||||||
|
|
||||||
Your main database (the one the current configuration indexes to), is
|
Your main database (the one the current configuration indexes to), is
|
||||||
always implicitly active. If this is not desirable, you can set up your
|
always implicitly active. If this is not desirable, you can set up your
|
||||||
configuration so that it indexes, for example, an empty directory.
|
configuration so that it indexes, for example, an empty directory. An
|
||||||
|
alternative indexer may also need to implement a way of purging the index
|
||||||
|
from stale data.
|
||||||
|
|
||||||
----------------------------------------------------------------------
|
----------------------------------------------------------------------
|
||||||
|
|
||||||
Chapter 4. Installation
|
Chapter 4. Programming interface
|
||||||
|
|
||||||
4.1. Installing a prebuilt copy
|
Recoll has an Application Programming Interface, usable both for indexing
|
||||||
|
and searching, currently accessible from the Python language.
|
||||||
|
|
||||||
|
Another less radical way to extend the application is to write filters for
|
||||||
|
new types of documents.
|
||||||
|
|
||||||
|
The processing of metadata attributes for documents (fields) is highly
|
||||||
|
configurable.
|
||||||
|
|
||||||
|
----------------------------------------------------------------------
|
||||||
|
|
||||||
|
4.1. Writing a document filter
|
||||||
|
|
||||||
|
Recoll filters are executable programs which translate from a specific
|
||||||
|
format (openoffice, acrobat, etc.) to the Recoll indexing input
|
||||||
|
format, which may be text/plain or text/html.
|
||||||
|
|
||||||
|
Recoll filters are usually shell-scripts, but this is in no way necessary.
|
||||||
|
These programs are extremely simple and most of the difficulty lies in
|
||||||
|
extracting the text from the native format, not outputting what is
|
||||||
|
expected by Recoll. Happily enough, most document formats already have
|
||||||
|
translators or text extractors which handle the difficult part and can be
|
||||||
|
called from the filter. In some cases the output of the translating program
|
||||||
|
is appropriate, and no intermediate shell-script is needed.
|
||||||
|
|
||||||
|
Filters are called with a single argument which is the source file name.
|
||||||
|
They should output the result to stdout.
|
||||||
|
|
||||||
|
The RECOLL_FILTER_FORPREVIEW environment variable (values yes, no) tells
|
||||||
|
the filter if the operation is for indexing or previewing. Some filters
|
||||||
|
use this to output a slightly different format. This is not essential.
|
||||||
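As an illustration of the calling convention described above, here is a minimal Python sketch of a filter that outputs text/plain. The treatment of the input as UTF-8 text and the whitespace handling are assumptions made for the example; only the single file name argument, the output on stdout and the RECOLL_FILTER_FORPREVIEW variable come from the text. Because it outputs plain text, the matching mimeconf entry would presumably need mimetype=text/plain.

```python
import os
import sys


def render(path, for_preview):
    """Return the text to hand to the indexer (hypothetical helper)."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        text = handle.read()
    if for_preview:
        # Keep the original line structure when previewing
        return text
    # For indexing, layout does not matter: collapse runs of white space
    return " ".join(text.split())


if __name__ == "__main__" and len(sys.argv) == 2:
    # The single argument is the source file name
    preview = os.environ.get("RECOLL_FILTER_FORPREVIEW", "no") == "yes"
    # The result goes to standard output
    sys.stdout.write(render(sys.argv[1], preview))
```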
|
|
||||||
|
The association of file types to filters is performed in the mimeconf
|
||||||
|
file. A sample:
|
||||||
|
|
||||||
|
[index]
|
||||||
|
application/msword = exec antiword -t -i 1 -m UTF-8;\
|
||||||
|
mimetype=text/plain;charset=utf-8
|
||||||
|
|
||||||
|
application/ogg = exec rclogg
|
||||||
|
|
||||||
|
text/rtf = exec unrtf --nopict --html; charset=iso-8859-1; mimetype=text/html
|
||||||
|
|
||||||
|
The fragment specifies that:
|
||||||
|
|
||||||
|
* application/msword files are processed by executing the antiword
|
||||||
|
program, which outputs text/plain encoded in utf-8.
|
||||||
|
|
||||||
|
* application/ogg files are processed by the rclogg script, with default
|
||||||
|
output type (text/html, with encoding specified in the header, or
|
||||||
|
utf-8 by default).
|
||||||
|
|
||||||
|
* text/rtf is processed by unrtf, which outputs text/html. The
|
||||||
|
iso-8859-1 encoding is specified because it is not the utf-8 default,
|
||||||
|
and not output by unrtf in the HTML header section.
|
||||||
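To make the structure of these entries concrete, the following Python sketch splits one value from the [index] section into its command and its attributes. It is an illustration of the syntax shown in the sample, not Recoll's configuration code; the function name parse_handler is hypothetical, and the text/html default is taken from the description of the rclogg entry.

```python
def parse_handler(value):
    """Split a mimeconf [index] value into (command words, attributes).

    Hypothetical helper for illustration only.
    """
    # Join continuation lines (trailing backslash), then split on ';'
    parts = [part.strip() for part in value.replace("\\\n", "").split(";")]
    command = parts[0].split()
    # Default output type when none is given
    attributes = {"mimetype": "text/html"}
    for part in parts[1:]:
        if part:
            key, _, val = part.partition("=")
            attributes[key.strip()] = val.strip()
    return command, attributes


if __name__ == "__main__":
    print(parse_handler(
        "exec unrtf --nopict --html; charset=iso-8859-1; mimetype=text/html"))
```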
|
|
||||||
|
The easiest way to write a new filter is probably to start from an
|
||||||
|
existing one.
|
||||||
|
|
||||||
|
Filters which output text/plain text are generally simpler, but they
|
||||||
|
cannot specify the character set and other metadata, so they are limited
|
||||||
|
to cases where these elements are not needed.
|
||||||
|
|
||||||
|
----------------------------------------------------------------------
|
||||||
|
|
||||||
|
4.1.1. Filter HTML output
|
||||||
|
|
||||||
|
The output HTML could be very minimal like the following example:
|
||||||
|
|
||||||
|
<html><head>
|
||||||
|
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
|
||||||
|
</head>
|
||||||
|
<body>some text content</body></html>
|
||||||
|
|
||||||
|
|
||||||
|
You should take care to escape some characters inside the text by
|
||||||
|
transforming them into appropriate entities. "&" should be transformed
|
||||||
|
into "&amp;", "<" should be transformed into "&lt;". This is not always
|
||||||
|
properly done by translating programs which output HTML, and of course
|
||||||
|
never by those which output plain text.
|
||||||
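A filter written in Python could produce the minimal HTML and the entity escaping described above with the standard html module. This is only a sketch: the function name to_html is hypothetical, and the charset is declared as UTF-8 on the assumption that the filter writes its output in that encoding.

```python
import html


def to_html(text, charset="UTF-8"):
    """Wrap extracted text in the minimal HTML expected from a filter."""
    # html.escape turns "&" into "&amp;", "<" into "&lt;" and ">" into "&gt;"
    body = html.escape(text, quote=False)
    return (
        "<html><head>\n"
        '<meta http-equiv="Content-Type" '
        'content="text/html;charset=' + charset + '">\n'
        "</head>\n"
        "<body>" + body + "</body></html>\n"
    )


if __name__ == "__main__":
    print(to_html("R&D < 5"))
```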
|
|
||||||
|
The character set needs to be specified in the header. It does not need to
|
||||||
|
be UTF-8 (Recoll will take care of translating it), but it must be
|
||||||
|
accurate for good results.
|
||||||
|
|
||||||
|
Recoll will also make use of other header fields if they are present:
|
||||||
|
title, description, keywords.
|
||||||
|
|
||||||
|
Filters also have the possibility to "invent" field names. These should be
|
||||||
|
output as meta tags:
|
||||||
|
|
||||||
|
<meta name="somefield" content="Some textual data" />
|
||||||
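When a filter generates such tags from extracted data, the value has to be escaped so that quotes do not break the attribute. A possible Python sketch follows; the helper name meta_tag is hypothetical, and somefield is the example name used above.

```python
import html


def meta_tag(name, value):
    """Build a meta tag carrying a filter-defined field."""
    # quote=True also escapes double quotes, which would otherwise
    # end the content attribute too early
    return '<meta name="{}" content="{}" />'.format(
        html.escape(name, quote=True), html.escape(value, quote=True))


if __name__ == "__main__":
    print(meta_tag("somefield", "Some textual data"))
```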
|
|
||||||
|
See the following section for details about configuring how field data is
|
||||||
|
processed by the indexer.
|
||||||
|
|
||||||
|
----------------------------------------------------------------------
|
||||||
|
|
||||||
|
4.2. Field data processing configuration
|
||||||
|
|
||||||
|
Fields are named pieces of information in or about documents, like title,
|
||||||
|
author, abstract.
|
||||||
|
|
||||||
|
The field values for documents can appear in several ways during indexing:
|
||||||
|
either output by filters as meta fields in the HTML header section, or
|
||||||
|
added as attributes of the Doc object when using the API, or again
|
||||||
|
synthesized internally by Recoll.
|
||||||
|
|
||||||
|
The Recoll query language allows searching for text in a specific field.
|
||||||
|
|
||||||
|
Recoll defines a number of default fields. Additional ones can be output
|
||||||
|
by filters, and described in the fields configuration file.
|
||||||
|
|
||||||
|
Fields can be:
|
||||||
|
|
||||||
|
* indexed, meaning that their terms are separately stored in inverted
|
||||||
|
lists (with a specific prefix), and that a field-specific search is
|
||||||
|
possible.
|
||||||
|
|
||||||
|
* stored, meaning that their value is recorded in the index data record
|
||||||
|
for the document, and can be returned and displayed with search
|
||||||
|
results.
|
||||||
|
|
||||||
|
A field can be indexed, stored, or both.
|
||||||
|
|
||||||
|
A field becomes indexed by having a prefix defined in the [prefixes]
|
||||||
|
section of the fields file. See the comments in there for details.
|
||||||
|
|
||||||
|
A field becomes stored by appearing in the [stored] section of the fields
|
||||||
|
file.
|
||||||
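The indexed/stored distinction can be checked mechanically. The Python sketch below reads a fields-like fragment and reports the status of each field. The sample content is invented for illustration (the field names and the XSOMEFIELD prefix are assumptions, and the real file may use syntax that configparser does not accept); refer to the comments in the distributed fields file for the actual format.

```python
import configparser

# Hypothetical fragment: names and prefix value are made up
SAMPLE = """
[prefixes]
somefield = XSOMEFIELD

[stored]
somefield =
otherfield =
"""


def field_status(text):
    """Map each field name to an (indexed, stored) pair of booleans."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    indexed = set(parser["prefixes"]) if parser.has_section("prefixes") else set()
    stored = set(parser["stored"]) if parser.has_section("stored") else set()
    return {name: (name in indexed, name in stored)
            for name in sorted(indexed | stored)}


if __name__ == "__main__":
    print(field_status(SAMPLE))
```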
|
|
||||||
|
----------------------------------------------------------------------
|
||||||
|
|
||||||
|
4.3. API
|
||||||
|
|
||||||
|
4.3.1. Interface elements
|
||||||
|
|
||||||
|
A few elements in the interface are specific and need an explanation.
|
||||||
|
|
||||||
|
udi
|
||||||
|
|
||||||
|
A udi (unique document identifier) identifies a document. Because
|
||||||
|
of limitations inside the index engine, it is restricted in length
|
||||||
|
(to 200 bytes), which is why a regular URI cannot be used. The
|
||||||
|
structure and contents of the udi are defined by the application
|
||||||
|
and opaque to the index engine. For example, the internal file
|
||||||
|
system indexer uses the complete document path (file path +
|
||||||
|
internal path), truncated to length, the suppressed part being
|
||||||
|
replaced by a hash value.
|
||||||
|
|
||||||
|
ipath
|
||||||
|
|
||||||
|
This data value (set as a field in the Doc object) is stored,
|
||||||
|
along with the URL, but not indexed by Recoll. Its contents are
|
||||||
|
not interpreted, and its use is up to the application. For
|
||||||
|
example, the Recoll internal file system indexer stores the part
|
||||||
|
of the document access path internal to the container file (ipath
|
||||||
|
in this case is a list of subdocument sequential numbers). url and
|
||||||
|
ipath are returned in every search result and permit access to the
|
||||||
|
original document.
|
||||||
|
|
||||||
|
Stored and indexed fields
|
||||||
|
|
||||||
|
The fields file inside the Recoll configuration defines which
|
||||||
|
document fields are either "indexed" (searchable), "stored"
|
||||||
|
(retrievable with search results), or both.
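
The udi construction described above (full document path, truncated, with
the suppressed part replaced by a hash) could be sketched as follows. The
separator character, the choice of MD5 and the position of the hash are
assumptions made for the illustration; the text does not say what the
Recoll file system indexer actually uses.

```python
import hashlib

MAX_UDI_BYTES = 200  # length limit stated in the udi description


def make_udi(file_path, ipath=""):
    """Build a udi of at most MAX_UDI_BYTES bytes from file path + ipath.

    Assumptions (not taken from Recoll): '|' separates the two parts, and
    the suppressed tail is replaced by its MD5 hex digest (32 bytes).
    """
    full = (file_path + "|" + ipath).encode("utf-8")
    if len(full) <= MAX_UDI_BYTES:
        return full
    digest_len = hashlib.md5().digest_size * 2  # hex digest length
    keep = MAX_UDI_BYTES - digest_len
    # Hash only the part that does not fit, then append the digest.
    tail_hash = hashlib.md5(full[keep:]).hexdigest().encode("ascii")
    return full[:keep] + tail_hash
```

The result is a byte string, because truncation may cut a multi-byte
character. Two long paths that differ only in their suppressed part still
get different identifiers, as the hash covers that part.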
|
||||||
|
|
||||||
|
Data for an external indexer should be stored in a separate index, not
|
||||||
|
the one for the Recoll internal file system indexer, except if the latter
|
||||||
|
is not used at all. The reason is that the main document indexer purge
|
||||||
|
pass would remove all the other indexer's documents, as they were not seen
|
||||||
|
during indexing. The main indexer documents would also probably be a
|
||||||
|
problem for the external indexer purge operation.
|
||||||
|
|
||||||
|
----------------------------------------------------------------------
|
||||||
|
|
||||||
|
4.3.2. Python interface
|
||||||
|
|
||||||
|
4.3.2.1. Introduction
|
||||||
|
|
||||||
|
Recoll versions after 1.11 define a Python programming interface, both for
|
||||||
|
searching and indexing.
|
||||||
|
|
||||||
|
The Python interface is not built by default and can be found in the
|
||||||
|
source package, under python/recoll. The directory contains the usual
|
||||||
|
setup.py script which you can use to build and install the module:
|
||||||
|
|
||||||
|
cd recoll-xxx/python/recoll
|
||||||
|
python setup.py build
|
||||||
|
python setup.py install
|
||||||
|
|
||||||
|
|
||||||
|
----------------------------------------------------------------------
|
||||||
|
|
||||||
|
4.3.2.2. Interface manual
|
||||||
|
|
||||||
|
NAME
|
||||||
|
recoll - This is an interface to the Recoll full text indexer.
|
||||||
|
|
||||||
|
FILE
|
||||||
|
/usr/local/lib/python2.5/site-packages/recoll.so
|
||||||
|
|
||||||
|
CLASSES
|
||||||
|
Db
|
||||||
|
Doc
|
||||||
|
Query
|
||||||
|
SearchData
|
||||||
|
|
||||||
|
class Db(__builtin__.object)
|
||||||
|
| Db([confdir=None], [extra_dbs=None], [writable = False])
|
||||||
|
|
|
||||||
|
| A Db object holds a connection to a Recoll index. Use the connect()
|
||||||
|
| function to create one.
|
||||||
|
| confdir specifies a Recoll configuration directory (default:
|
||||||
|
| $RECOLL_CONFDIR or ~/.recoll).
|
||||||
|
| extra_dbs is a list of external databases (xapian directories)
|
||||||
|
| writable decides if we can index new data through this connection
|
||||||
|
|
|
||||||
|
| Methods defined here:
|
||||||
|
|
|
||||||
|
|
|
||||||
|
| addOrUpdate(...)
|
||||||
|
| addOrUpdate(udi, doc, parent_udi=None) -> None
|
||||||
|
| Add or update index data for a given document
|
||||||
|
| The udi string must define a unique id for the document. It is not
|
||||||
|
| interpreted inside Recoll
|
||||||
|
| doc is a Doc object
|
||||||
|
| if parent_udi is set, this is a unique identifier for the
|
||||||
|
| top-level container (e.g. an mbox file)
|
||||||
|
|
|
||||||
|
| delete(...)
|
||||||
|
| delete(udi) -> Bool.
|
||||||
|
| Purge index from all data for udi. If udi matches a container
|
||||||
|
| document, purge all subdocs (docs with a parent_udi matching udi).
|
||||||
|
|
|
||||||
|
| makeDocAbstract(...)
|
||||||
|
| makeDocAbstract(Doc, Query) -> string
|
||||||
|
| Build and return 'keyword-in-context' abstract for document
|
||||||
|
| and query.
|
||||||
|
|
|
||||||
|
| needUpdate(...)
|
||||||
|
| needUpdate(udi, sig) -> Bool.
|
||||||
|
| Check if the index is up to date for the document defined by udi,
|
||||||
|
| having the current signature sig.
|
||||||
|
|
|
||||||
|
| purge(...)
|
||||||
|
| purge() -> Bool.
|
||||||
|
| Delete all documents that were not touched during the just finished
|
||||||
|
| indexing pass (since open-for-write). These are the documents for
|
||||||
|
| which the needUpdate() call was not performed, indicating that they no
|
||||||
|
| longer exist in the primary storage system.
|
||||||
|
|
|
||||||
|
| query(...)
|
||||||
|
| query() -> Query. Return a new, blank query object for this index.
|
||||||
|
|
|
||||||
|
| setAbstractParams(...)
|
||||||
|
| setAbstractParams(maxchars, contextwords).
|
||||||
|
| Set the parameters used to build 'keyword-in-context' abstracts
|
||||||
|
|
|
||||||
|
| ----------------------------------------------------------------------
|
||||||
|
| Data and other attributes defined here:
|
||||||
|
|
|
||||||
|
|
||||||
|
class Doc(__builtin__.object)
|
||||||
|
| Doc()
|
||||||
|
|
|
||||||
|
| A Doc object contains index data for a given document.
|
||||||
|
| The data is extracted from the index when searching, or set by the
|
||||||
|
| indexer program when updating. The Doc object has no useful methods but
|
||||||
|
| many attributes to be read or set by its user. It matches exactly the
|
||||||
|
| Rcl::Doc c++ object. Some of the attributes are predefined, but,
|
||||||
|
| especially when indexing, others can be set, the name of which will be
|
||||||
|
| processed as field names by the indexing configuration.
|
||||||
|
| Inputs can be specified as unicode or strings.
|
||||||
|
| Outputs are unicode objects.
|
||||||
|
| All dates are specified as unix timestamps, printed as strings
|
||||||
|
| Predefined attributes (index/query/both):
|
||||||
|
| text (index): document plain text
|
||||||
|
| url (both)
|
||||||
|
| fbytes (both) optional file size in bytes
|
||||||
|
| filename (both)
|
||||||
|
| fmtime (both) optional file modification date. Unix time printed
|
||||||
|
| as string
|
||||||
|
| dbytes (both) document text bytes
|
||||||
|
| dmtime (both) document creation/modification date
|
||||||
|
| ipath (both) value private to the app.: internal access path
|
||||||
|
| inside file
|
||||||
|
| mtype (both) mime type for original document
|
||||||
|
| mtime (query) dmtime if set else fmtime
|
||||||
|
| origcharset (both) charset the text was converted from
|
||||||
|
| size (query) dbytes if set, else fbytes
|
||||||
|
| sig (both) app-defined file modification signature.
|
||||||
|
| For up to date checks
|
||||||
|
| relevancyrating (query)
|
||||||
|
| abstract (both)
|
||||||
|
| author (both)
|
||||||
|
| title (both)
|
||||||
|
| keywords (both)
|
||||||
|
|
|
||||||
|
| Methods defined here:
|
||||||
|
|
|
||||||
|
|
|
||||||
|
| ----------------------------------------------------------------------
|
||||||
|
| Data and other attributes defined here:
|
||||||
|
|
|
||||||
|
|
||||||
|
class Query(__builtin__.object)
|
||||||
|
| Recoll Query objects are used to execute index searches.
|
||||||
|
| They must be created by the Db.query() method.
|
||||||
|
|
|
||||||
|
| Methods defined here:
|
||||||
|
|
|
||||||
|
|
|
||||||
|
| execute(...)
|
||||||
|
| execute(query_string, stemming=1|0)
|
||||||
|
|
|
||||||
|
| Starts a search for query_string, a Recoll search language string
|
||||||
|
| (mostly Xesam-compatible).
|
||||||
|
| The query can be a simple list of terms (and'ed by default), or more
|
||||||
|
| complicated with field specs etc. See the Recoll manual.
|
||||||
|
|
|
||||||
|
| executesd(...)
|
||||||
|
| executesd(SearchData)
|
||||||
|
|
|
||||||
|
| Starts a search for the query defined by the SearchData object.
|
||||||
|
|
|
||||||
|
| fetchone(...)
|
||||||
|
| fetchone(None) -> Doc
|
||||||
|
|
|
||||||
|
| Fetches the next Doc object in the current search results.
|
||||||
|
|
|
||||||
|
| sortby(...)
|
||||||
|
| sortby(field=fieldname, ascending=true)
|
||||||
|
| Sort results by 'fieldname', in ascending or descending order.
|
||||||
|
| Only one field can be used, no subsorts for now.
|
||||||
|
| Must be called before executing the search
|
||||||
|
|
|
||||||
|
| ----------------------------------------------------------------------
|
||||||
|
| Data descriptors defined here:
|
||||||
|
|
|
||||||
|
| next
|
||||||
|
| Next index to be fetched from results. Normally increments after
|
||||||
|
| each fetchone() call, but can be set/reset before the call to effect
|
||||||
|
| seeking. Starts at 0
|
||||||
|
|
|
||||||
|
| ----------------------------------------------------------------------
|
||||||
|
| Data and other attributes defined here:
|
||||||
|
|
|
||||||
|
|
||||||
|
class SearchData(__builtin__.object)
|
||||||
|
| SearchData()
|
||||||
|
|
|
||||||
|
| A SearchData object describes a query. It has a number of global
|
||||||
|
| parameters and a chain of search clauses.
|
||||||
|
|
|
||||||
|
| Methods defined here:
|
||||||
|
|
|
||||||
|
|
|
||||||
|
| addclause(...)
|
||||||
|
| addclause(type='and'|'or'|'excl'|'phrase'|'near'|'sub',
|
||||||
|
| qstring=string, slack=int, field=string, stemming=1|0,
|
||||||
|
| subSearch=SearchData)
|
||||||
|
| Adds a simple clause to the SearchData And/Or chain, or a subquery
|
||||||
|
| defined by another SearchData object
|
||||||
|
|
|
||||||
|
| ----------------------------------------------------------------------
|
||||||
|
| Data and other attributes defined here:
|
||||||
|
|
|
||||||
|
|
||||||
|
FUNCTIONS
|
||||||
|
connect(...)
|
||||||
|
connect([confdir=None], [extra_dbs=None], [writable = False])
|
||||||
|
-> Db.
|
||||||
|
|
||||||
|
Connects to a Recoll database and returns a Db object.
|
||||||
|
confdir specifies a Recoll configuration directory
|
||||||
|
(the default is built like for any Recoll program).
|
||||||
|
extra_dbs is a list of external databases (xapian directories)
|
||||||
|
writable decides if we can index new data through this connection
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
----------------------------------------------------------------------
|
||||||
|
|
||||||
|
4.3.2.3. Example code
|
||||||
|
|
||||||
|
The following sample would query the index with a user language string.
|
||||||
|
See the python/samples directory inside the Recoll source for other
|
||||||
|
examples.
|
||||||
|
|
||||||
|
#!/usr/bin/env python

import recoll

# Connect to the default index and set the abstract-building parameters
db = recoll.connect()
db.setAbstractParams(maxchars=80, contextwords=2)

query = db.query()
nres = query.execute("some user question")
print "Result count: ", nres

# Show at most the first 5 results
if nres > 5:
    nres = 5
while query.next >= 0 and query.next < nres:
    doc = query.fetchone()
    print query.next
    for k in ("title", "size"):
        print k, ":", getattr(doc, k).encode('utf-8')
    abs = db.makeDocAbstract(doc, query).encode('utf-8')
    print abs
    print
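
An indexing pass built on needUpdate(), addOrUpdate() and purge() might be
organised as in the sketch below. It is only an outline under stated
assumptions: the db and make_doc parameters stand for a writable Db and the
Doc class of the recoll module, the (udi, sig, url, text) tuple layout is
invented for the example, and needUpdate() is assumed to return true when
the document has to be reindexed.

```python
def index_pass(db, make_doc, items):
    """Run one indexing pass over items, then purge vanished documents.

    db       -- object with needUpdate(), addOrUpdate() and purge(), such
                as a Db opened for writing
    make_doc -- callable returning a blank document object, such as the
                Doc class
    items    -- iterable of (udi, sig, url, text) tuples (layout assumed
                for this sketch)
    Returns the number of documents added or updated.
    """
    updated = 0
    for udi, sig, url, text in items:
        # needUpdate() must be called for every document that still exists,
        # changed or not: purge() removes those it was not called for.
        if not db.needUpdate(udi, sig):
            continue
        doc = make_doc()
        doc.url = url
        doc.sig = sig
        doc.text = text
        db.addOrUpdate(udi, doc)
        updated += 1
    db.purge()
    return updated
```

With the real module, the call would presumably look like
index_pass(recoll.connect(writable=True), recoll.Doc, items), run against an
index separate from the one used by the Recoll file system indexer.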
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
----------------------------------------------------------------------
|
||||||
|
|
||||||
|
Chapter 5. Installation
|
||||||
|
|
||||||
|
5.1. Installing a prebuilt copy
|
||||||
|
|
||||||
Recoll binary packages from the Recoll web site are always linked
|
Recoll binary packages from the Recoll web site are always linked
|
||||||
statically to the Xapian libraries, and have no other dependencies. You
|
statically to the Xapian libraries, and have no other dependencies. You
|
||||||
@ -1211,14 +1673,14 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
|||||||
|
|
||||||
----------------------------------------------------------------------
|
----------------------------------------------------------------------
|
||||||
|
|
||||||
4.1.1. Installing through a package system
|
5.1.1. Installing through a package system
|
||||||
|
|
||||||
If you use a BSD-type port system or a prebuilt package (RPM or other),
|
If you use a BSD-type port system or a prebuilt package (RPM or other),
|
||||||
just follow the usual procedure for your system.
|
just follow the usual procedure for your system.
|
||||||
|
|
||||||
----------------------------------------------------------------------
|
----------------------------------------------------------------------
|
||||||
|
|
||||||
4.1.2. Installing a prebuilt Recoll
|
5.1.2. Installing a prebuilt Recoll
|
||||||
|
|
||||||
The unpackaged binary versions on the Recoll web site are just compressed
|
The unpackaged binary versions on the Recoll web site are just compressed
|
||||||
tar files of a build tree, where only the useful parts were kept
|
tar files of a build tree, where only the useful parts were kept
|
||||||
@ -1233,11 +1695,17 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
|||||||
|
|
||||||
----------------------------------------------------------------------
|
----------------------------------------------------------------------
|
||||||
|
|
||||||
4.2. Supporting packages
|
5.2. Supporting packages
|
||||||
|
|
||||||
Recoll uses external applications to index some file types. You need to
|
Recoll uses external applications to index some file types. You need to
|
||||||
install them for the file types that you wish to have indexed (these are
|
install them for the file types that you wish to have indexed (these are
|
||||||
run-time dependencies. None is needed for building Recoll):
|
run-time dependencies. None is needed for building Recoll).
|
||||||
|
|
||||||
|
After an indexing pass, the commands that were found missing can be
|
||||||
|
displayed from the recoll File menu. The list is stored in the missing
|
||||||
|
text file inside the configuration directory.
|
||||||
|
|
||||||
|
A list of common file types which need external commands:
|
||||||
|
|
||||||
* Openoffice: supported natively, but needs the unzip command to be
|
* Openoffice: supported natively, but needs the unzip command to be
|
||||||
installed.
|
installed.
|
||||||
@ -1275,9 +1743,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
|||||||
|
|
||||||
----------------------------------------------------------------------
|
----------------------------------------------------------------------
|
||||||
|
|
||||||
4.3. Building from source
|
5.3. Building from source
|
||||||
|
|
||||||
4.3.1. Prerequisites
|
5.3.1. Prerequisites
|
||||||
|
|
||||||
At the very least, you will need to download and install the xapian core
|
At the very least, you will need to download and install the xapian core
|
||||||
package (Recoll 1.9 normally uses version 1.0.2, but any 0.9 or 1.0.x
|
package (Recoll 1.9 normally uses version 1.0.2, but any 0.9 or 1.0.x
|
||||||
@ -1295,7 +1763,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
|||||||
|
|
||||||
----------------------------------------------------------------------
|
----------------------------------------------------------------------
|
||||||
|
|
||||||
4.3.2. Building
|
5.3.2. Building
|
||||||
|
|
||||||
Recoll has been built on Linux (redhat7.3, mandriva 2005/6, Fedora Core
|
Recoll has been built on Linux (redhat7.3, mandriva 2005/6, Fedora Core
|
||||||
3/4/5/6), FreeBSD 5/6, macosx, and Solaris 8. If you build on another
|
3/4/5/6), FreeBSD 5/6, macosx, and Solaris 8. If you build on another
|
||||||
@ -1335,7 +1803,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
|||||||
|
|
||||||
----------------------------------------------------------------------
|
----------------------------------------------------------------------
|
||||||
|
|
||||||
4.3.3. Installation
|
5.3.3. Installation
|
||||||
|
|
||||||
Either type make install or execute recollinstall prefix, in the root of
|
Either type make install or execute recollinstall prefix, in the root of
|
||||||
the source tree. This will copy the commands to prefix/bin and the sample
|
the source tree. This will copy the commands to prefix/bin and the sample
|
||||||
@ -1350,7 +1818,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
|||||||
|
|
||||||
----------------------------------------------------------------------
|
----------------------------------------------------------------------
|
||||||
|
|
||||||
4.4. Configuration overview
|
5.4. Configuration overview
|
||||||
|
|
||||||
Most of the parameters specific to the recoll GUI are set through the
|
Most of the parameters specific to the recoll GUI are set through the
|
||||||
Preferences menu and stored in the standard QT place ($HOME/.qt/recollrc).
|
Preferences menu and stored in the standard QT place ($HOME/.qt/recollrc).
|
||||||
@ -1410,7 +1878,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
|||||||
|
|
||||||
----------------------------------------------------------------------
|
----------------------------------------------------------------------
|
||||||
|
|
||||||
4.4.1. Main configuration file
|
5.4.1. Main configuration file
|
||||||
|
|
||||||
recoll.conf is the main configuration file. It defines things like what to
|
recoll.conf is the main configuration file. It defines things like what to
|
||||||
index (top directories and things to ignore), and the default character
|
index (top directories and things to ignore), and the default character
|
||||||
@ -1616,7 +2084,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
|||||||
|
|
||||||
----------------------------------------------------------------------
|
----------------------------------------------------------------------
|
||||||
|
|
||||||
4.4.2. The mimemap file
|
5.4.2. The mimemap file
|
||||||
|
|
||||||
mimemap specifies the file name extension to mime type mappings.
|
mimemap specifies the file name extension to mime type mappings.
|
||||||
|
|
||||||
@ -1642,7 +2110,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
|||||||
|
|
||||||
----------------------------------------------------------------------
|
----------------------------------------------------------------------
|
||||||
|
|
||||||
4.4.3. The mimeconf file
|
5.4.3. The mimeconf file
|
||||||
|
|
||||||
mimeconf specifies how the different mime types are handled for indexing,
|
mimeconf specifies how the different mime types are handled for indexing,
|
||||||
and which icons are displayed in the recoll result lists.
|
and which icons are displayed in the recoll result lists.
|
||||||
@ -1656,7 +2124,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
|||||||
|
|
||||||
----------------------------------------------------------------------
|
----------------------------------------------------------------------
|
||||||
|
|
||||||
4.4.4. The mimeview file
|
5.4.4. The mimeview file
|
||||||
|
|
||||||
mimeview specifies which programs are started when you click on an Edit
|
mimeview specifies which programs are started when you click on an Edit
|
||||||
link in a result list. Ie: HTML is normally displayed using firefox, but
|
link in a result list. Ie: HTML is normally displayed using firefox, but
|
||||||
@ -1679,9 +2147,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
|||||||
|
|
||||||
----------------------------------------------------------------------
|
----------------------------------------------------------------------
|
||||||
|
|
||||||
4.4.5. Examples of configuration adjustments
|
5.4.5. Examples of configuration adjustments
|
||||||
|
|
||||||
4.4.5.1. Adding an external viewer for a non-indexed type
|
5.4.5.1. Adding an external viewer for a non-indexed type
|
||||||
|
|
||||||
Imagine that you have some kind of file which does not have indexable
|
Imagine that you have some kind of file which does not have indexable
|
||||||
content, but for which you would like to have a functional Edit link in
|
content, but for which you would like to have a functional Edit link in
|
||||||
@ -1714,7 +2182,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
|||||||
|
|
||||||
----------------------------------------------------------------------
|
----------------------------------------------------------------------
|
||||||
|
|
||||||
4.4.5.2. Adding indexing support for a new file type
|
5.4.5.2. Adding indexing support for a new file type
|
||||||
|
|
||||||
Let us now imagine that the above .blob files actually contain indexable
|
Let us now imagine that the above .blob files actually contain indexable
|
||||||
text and that you know how to extract it with a command line program.
|
text and that you know how to extract it with a command line program.
|
||||||
@ -1738,86 +2206,32 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
|||||||
|
|
||||||
The rclblob filter should be an executable program or script which exists
|
The rclblob filter should be an executable program or script which exists
|
||||||
inside /usr/[local/]share/recoll/filters. It will be given a file name as
|
inside /usr/[local/]share/recoll/filters. It will be given a file name as
|
||||||
argument and should output the text contents in html format on the
|
argument and should output the text contents on the standard output.
|
||||||
standard output.
|
|
||||||
|
|
||||||
You can find more details about writing a Recoll filter in the section
|
The filter programming section describes in more detail how to write a
|
||||||
about writing filters
|
filter.
|
||||||
|
|
||||||
----------------------------------------------------------------------
|
----------------------------------------------------------------------
|
||||||
|
|
||||||
4.5. The KDE Kicker Recoll applet
|
5.5. The KDE Kicker Recoll applet
|
||||||
|
|
||||||
The Recoll source tree contains the source code to the recoll_applet, a
|
The Recoll source tree contains the source code to the recoll_applet, a
|
||||||
small application derived from the find_applet. This can be used to add a
|
small application derived from the find_applet. This can be used to add a
|
||||||
small Recoll launcher to the KDE panel.
|
small Recoll launcher to the KDE panel.
|
||||||
|
|
||||||
The applet is not automatically built with the main Recoll programs. To
|
The applet is not automatically built with the main Recoll programs, nor
|
||||||
build it, you need to unpack the Recoll source code, then go to the
|
is it included with the main source distribution (because the KDE build
|
||||||
kde/recoll_applet/ directory, and type the usual configure;make;make
|
boilerplate makes it relatively big). You can download its source from the
|
||||||
install.
|
recoll.org download page. Use the omnipotent configure;make;make install
|
||||||
|
incantation to build and install.
|
||||||
|
|
||||||
You can then add the applet to the panel by right-clicking the panel and
|
You can then add the applet to the panel by right-clicking the panel and
|
||||||
choosing the Add applet entry.
|
choosing the Add applet entry.
|
||||||
|
|
||||||
The recoll_applet has a small text window where you can type a Recoll
|
The recoll_applet has a small text window where you can type a Recoll
|
||||||
query (in query language form), and an icon which can be used to restrict
|
query (in query language form), and an icon which can be used to restrict
|
||||||
the search to certain types of files.
|
the search to certain types of files. It is quite primitive, and launches
|
||||||
|
a new recoll GUI instance every time (even if it is already running). You
|
||||||
----------------------------------------------------------------------
|
may find it useful anyway.
|
||||||
|
|
||||||
4.6. Extending Recoll
|
|
||||||
|
|
||||||
4.6.1. Writing a document filter
|
|
||||||
|
|
||||||
Recoll filters are executable programs which translate from a specific
|
|
||||||
format (e.g. OpenOffice, Acrobat, etc.) to the Recoll indexing input
|
|
||||||
format, which was chosen to be HTML.
|
|
||||||
|
|
||||||
Recoll filters are usually shell-scripts, but this is in no way necessary.
|
|
||||||
These programs are extremely simple and most of the difficulty lies in
|
|
||||||
extracting the text from the native format, not outputting what is
|
|
||||||
expected by Recoll. Happily enough, most document formats already have
|
|
||||||
translators or text extractors which handle the difficult part and can be
|
|
||||||
called from the filter.
|
|
||||||
|
|
||||||
Filters are called with a single argument which is the source file name.
|
|
||||||
They should output the result to stdout.
|
|
||||||
|
|
||||||
The RECOLL_FILTER_FORPREVIEW environment variable (values yes, no) tells
|
|
||||||
the filter if the operation is for indexing or previewing. Some filters
|
|
||||||
use this to output a slightly different format. This is not essential.
|
|
||||||
|
|
||||||
The output HTML could be very minimal like the following example:
|
|
||||||
|
|
||||||
<html><head>
|
|
||||||
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
|
|
||||||
</head>
|
|
||||||
<body>some text content</body></html>
|
|
||||||
|
|
||||||
|
|
||||||
You should take care to escape some characters inside the text by
|
|
||||||
transforming them into appropriate entities. "&" should be transformed
|
|
||||||
into "&amp;", "<" should be transformed into "&lt;".
|
|
||||||
|
|
||||||
The character set needs to be specified in the header. It does not need to
|
|
||||||
be UTF-8 (Recoll will take care of translating it), but it must be
|
|
||||||
accurate for good results.
|
|
||||||
|
|
||||||
Recoll will also make use of other header fields if they are present:
|
|
||||||
title, description, keywords.
|
|
||||||
|
|
||||||
As of Recoll release 1.9, filters also have the possibility to "invent"
|
|
||||||
field names. These should be output as meta tags:
|
|
||||||
|
|
||||||
<meta name="somefield" content="Some textual data" />
|
|
||||||
|
|
||||||
In this case, a correspondence between field name and Xapian prefix should
|
|
||||||
also be added to the mimeconf file. See the existing entries for
|
|
||||||
inspiration. The field can then be used inside the query language to
|
|
||||||
narrow searches.
|
|
||||||
|
|
||||||
The easiest way to write a new filter is probably to start from an
|
|
||||||
existing one.
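
The output side of a filter, as described above, can be sketched in a few
lines. This is an illustration only: the function names to_recoll_html and
extract_text are hypothetical, the extraction step is a placeholder that
just reads the file as text, and the script is written for Python 3 rather
than for any interpreter version Recoll itself requires.

```python
import html
import sys


def to_recoll_html(text, title="", fields=None, charset="UTF-8"):
    """Wrap extracted text in a minimal HTML document for indexing."""
    head = ['<meta http-equiv="Content-Type" '
            'content="text/html;charset=%s">' % charset]
    if title:
        head.append("<title>%s</title>" % html.escape(title))
    # Extra fields are output as meta tags, one per field.
    for name, value in sorted((fields or {}).items()):
        head.append('<meta name="%s" content="%s" />'
                    % (html.escape(name), html.escape(value)))
    # "&" and "<" in the text must be turned into entities.
    body = html.escape(text, quote=False)
    return ("<html><head>\n%s\n</head>\n<body>%s</body></html>\n"
            % ("\n".join(head), body))


def extract_text(path):
    """Placeholder for the format-specific extraction (assumed UTF-8 text)."""
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        return f.read()


if __name__ == "__main__" and len(sys.argv) == 2:
    # A filter receives the source file name and writes to stdout.
    sys.stdout.write(to_recoll_html(extract_text(sys.argv[1])))
```

A real filter would replace extract_text with a call to the translator or
text extractor for the format at hand.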
|
|
||||||
|
|
||||||
----------------------------------------------------------------------
|
----------------------------------------------------------------------
|
||||||
|
|||||||