*** empty log message ***
parent
34cd8293ac
commit
d910d2bebe
71
src/INSTALL
@ -11,23 +11,21 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
|
||||
--------------------------------------------------------------------------
|
||||
|
||||
Chapter 4. Installation
|
||||
Chapter 5. Installation
|
||||
|
||||
Table of Contents
|
||||
|
||||
4.1. Installing a prebuilt copy
|
||||
5.1. Installing a prebuilt copy
|
||||
|
||||
4.2. Supporting packages
|
||||
5.2. Supporting packages
|
||||
|
||||
4.3. Building from source
|
||||
5.3. Building from source
|
||||
|
||||
4.4. Configuration overview
|
||||
5.4. Configuration overview
|
||||
|
||||
4.5. The KDE Kicker Recoll applet
|
||||
5.5. The KDE Kicker Recoll applet
|
||||
|
||||
4.6. Extending Recoll
|
||||
|
||||
4.1. Installing a prebuilt copy
|
||||
5.1. Installing a prebuilt copy
|
||||
|
||||
Recoll binary packages from the Recoll web site are always linked
|
||||
statically to the Xapian libraries, and have no other dependencies. You
|
||||
@ -36,12 +34,12 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
have a look at the configuration section (but this may not be necessary
|
||||
for a quick test with default parameters).
|
||||
|
||||
4.1.1. Installing through a package system
|
||||
5.1.1. Installing through a package system
|
||||
|
||||
If you use a BSD-type port system or a prebuilt package (RPM or other),
|
||||
just follow the usual procedure for your system.
|
||||
|
||||
4.1.2. Installing a prebuilt Recoll
|
||||
5.1.2. Installing a prebuilt Recoll
|
||||
|
||||
The unpackaged binary versions on the Recoll web site are just compressed
|
||||
tar files of a build tree, where only the useful parts were kept
|
||||
@ -56,23 +54,29 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
|
||||
--------------------------------------------------------------------------
|
||||
|
||||
Prev Home Next
|
||||
Customizing the search interface Supporting packages
|
||||
Prev Home Next
|
||||
API Supporting packages
|
||||
Link: HOME
|
||||
Link: UP
|
||||
Link: PREVIOUS
|
||||
Link: NEXT
|
||||
|
||||
Recoll user manual
|
||||
Prev Chapter 4. Installation Next
|
||||
Prev Chapter 5. Installation Next
|
||||
|
||||
--------------------------------------------------------------------------
|
||||
|
||||
4.2. Supporting packages
|
||||
5.2. Supporting packages
|
||||
|
||||
Recoll uses external applications to index some file types. You need to
|
||||
install them for the file types that you wish to have indexed (these are
|
||||
run-time dependencies. None is needed for building Recoll):
|
||||
run-time dependencies. None is needed for building Recoll).
|
||||
|
||||
After an indexing pass, the commands that were found missing can be
|
||||
displayed from the recoll File menu. The list is stored in the missing
|
||||
text file inside the configuration directory.
|
||||
|
||||
A list of common file types which need external commands:
|
||||
|
||||
* Openoffice: supported natively, but needs the unzip command to be
|
||||
installed.
|
||||
@ -118,13 +122,13 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
Link: NEXT
|
||||
|
||||
Recoll user manual
|
||||
Prev Chapter 4. Installation Next
|
||||
Prev Chapter 5. Installation Next
|
||||
|
||||
--------------------------------------------------------------------------
|
||||
|
||||
4.3. Building from source
|
||||
5.3. Building from source
|
||||
|
||||
4.3.1. Prerequisites
|
||||
5.3.1. Prerequisites
|
||||
|
||||
At the very least, you will need to download and install the xapian core
|
||||
package (Recoll 1.9 normally uses version 1.0.2, but any 0.9 or 1.0.x
|
||||
@ -140,7 +144,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
not be critical). On Linux systems, the iconv interface is part of libc
|
||||
and you should not need to do anything special.
|
||||
|
||||
4.3.2. Building
|
||||
5.3.2. Building
|
||||
|
||||
Recoll has been built on Linux (redhat7.3, mandriva 2005/6, Fedora Core
|
||||
3/4/5/6), FreeBSD 5/6, macosx, and Solaris 8. If you build on another
|
||||
@ -178,7 +182,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
manually copy and modify one of the existing files (the new file name
|
||||
should be the output of uname -s).
|
||||
|
||||
4.3.3. Installation
|
||||
5.3.3. Installation
|
||||
|
||||
Either type make install or execute recollinstall prefix, in the root of
|
||||
the source tree. This will copy the commands to prefix/bin and the sample
|
||||
@ -201,11 +205,11 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
Link: NEXT
|
||||
|
||||
Recoll user manual
|
||||
Prev Chapter 4. Installation Next
|
||||
Prev Chapter 5. Installation Next
|
||||
|
||||
--------------------------------------------------------------------------
|
||||
|
||||
4.4. Configuration overview
|
||||
5.4. Configuration overview
|
||||
|
||||
Most of the parameters specific to the recoll GUI are set through the
|
||||
Preferences menu and stored in the standard QT place ($HOME/.qt/recollrc).
|
||||
@ -263,7 +267,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
White space is used for separation inside lists. List elements with
|
||||
embedded spaces can be quoted using double-quotes.
|
||||
|
||||
4.4.1. Main configuration file
|
||||
5.4.1. Main configuration file
|
||||
|
||||
recoll.conf is the main configuration file. It defines things like what to
|
||||
index (top directories and things to ignore), and the default character
|
||||
@ -467,7 +471,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
cases. A value of 3 would allow more precision and efficiency on
|
||||
longer words, but the index will be approximately twice as large.
|
||||
|
||||
4.4.2. The mimemap file
|
||||
5.4.2. The mimemap file
|
||||
|
||||
mimemap specifies the file name extension to mime type mappings.
|
||||
|
||||
@ -491,7 +495,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
there avoids cluttering the more user-oriented and locally customized
|
||||
skippedNames.
|
||||
|
||||
4.4.3. The mimeconf file
|
||||
5.4.3. The mimeconf file
|
||||
|
||||
mimeconf specifies how the different mime types are handled for indexing,
|
||||
and which icons are displayed in the recoll result lists.
|
||||
@ -503,7 +507,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
recoll in the result lists (the values are the basenames of the png images
|
||||
inside the iconsdir directory specified in recoll.conf).
|
||||
|
||||
4.4.4. The mimeview file
|
||||
5.4.4. The mimeview file
|
||||
|
||||
mimeview specifies which programs are started when you click on an Edit
|
||||
link in a result list. For example, HTML is normally displayed using firefox, but
|
||||
@ -524,9 +528,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
user preferences, all mimeview entries will be ignored except the one
|
||||
labelled application/x-all (which is set to use xdg-open by default).
|
||||
|
||||
4.4.5. Examples of configuration adjustments
|
||||
5.4.5. Examples of configuration adjustments
|
||||
|
||||
4.4.5.1. Adding an external viewer for a non-indexed type
|
||||
5.4.5.1. Adding an external viewer for a non-indexed type
|
||||
|
||||
Imagine that you have some kind of file which does not have indexable
|
||||
content, but for which you would like to have a functional Edit link in
|
||||
@ -557,7 +561,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
The entries you add in your personal file override those in the central
|
||||
configuration, which you do not need to alter.
|
||||
|
||||
4.4.5.2. Adding indexing support for a new file type
|
||||
5.4.5.2. Adding indexing support for a new file type
|
||||
|
||||
Let us now imagine that the above .blob files actually contain indexable
|
||||
text and that you know how to extract it with a command line program.
|
||||
@ -581,11 +585,10 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
|
||||
The rclblob filter should be an executable program or script which exists
|
||||
inside /usr/[local/]share/recoll/filters. It will be given a file name as
|
||||
argument and should output the text contents in html format on the
|
||||
standard output.
|
||||
argument and should output the text contents on the standard output.
|
||||
|
||||
You can find more details about writing a Recoll filter in the section
|
||||
about writing filters
|
||||
The filter programming section describes in more detail how to write a
|
||||
filter.
|
||||
|
||||
--------------------------------------------------------------------------
|
||||
|
||||
|
||||
662
src/README
@ -78,41 +78,51 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
|
||||
3.12. Customizing the search interface
|
||||
|
||||
4. Installation
|
||||
4. Programming interface
|
||||
|
||||
4.1. Installing a prebuilt copy
|
||||
4.1. Writing a document filter
|
||||
|
||||
4.1.1. Installing through a package system
|
||||
4.1.1. Filter HTML output
|
||||
|
||||
4.1.2. Installing a prebuilt Recoll
|
||||
4.2. Field data processing configuration
|
||||
|
||||
4.2. Supporting packages
|
||||
4.3. API
|
||||
|
||||
4.3. Building from source
|
||||
4.3.1. Interface elements
|
||||
|
||||
4.3.1. Prerequisites
|
||||
4.3.2. Python interface
|
||||
|
||||
4.3.2. Building
|
||||
5. Installation
|
||||
|
||||
4.3.3. Installation
|
||||
5.1. Installing a prebuilt copy
|
||||
|
||||
4.4. Configuration overview
|
||||
5.1.1. Installing through a package system
|
||||
|
||||
4.4.1. Main configuration file
|
||||
5.1.2. Installing a prebuilt Recoll
|
||||
|
||||
4.4.2. The mimemap file
|
||||
5.2. Supporting packages
|
||||
|
||||
4.4.3. The mimeconf file
|
||||
5.3. Building from source
|
||||
|
||||
4.4.4. The mimeview file
|
||||
5.3.1. Prerequisites
|
||||
|
||||
4.4.5. Examples of configuration adjustments
|
||||
5.3.2. Building
|
||||
|
||||
4.5. The KDE Kicker Recoll applet
|
||||
5.3.3. Installation
|
||||
|
||||
4.6. Extending Recoll
|
||||
5.4. Configuration overview
|
||||
|
||||
4.6.1. Writing a document filter
|
||||
5.4.1. Main configuration file
|
||||
|
||||
5.4.2. The mimemap file
|
||||
|
||||
5.4.3. The mimeconf file
|
||||
|
||||
5.4.4. The mimeview file
|
||||
|
||||
5.4.5. Examples of configuration adjustments
|
||||
|
||||
5.5. The KDE Kicker Recoll applet
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||
@ -256,8 +266,14 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
individually indexed documents.
|
||||
|
||||
Recoll indexing processes plain text, HTML, openoffice and e-mail files
|
||||
internally. Other types (ie: postscript, pdf, ms-word, rtf) need external
|
||||
internally.
|
||||
|
||||
Other file types (ie: postscript, pdf, ms-word, rtf ...) need external
|
||||
applications for preprocessing. The list is in the installation section.
|
||||
After every indexing operation, Recoll updates a list of commands that
|
||||
would be needed for indexing existing file types. This list can be
|
||||
displayed from the recoll File menu. It is stored in the missing text file
|
||||
inside the configuration directory.
|
||||
|
||||
Without further configuration, Recoll will index all appropriate files
|
||||
from your home directory, with a reasonable set of defaults.
|
||||
@ -717,6 +733,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
The query language processor is activated on the simple search entry when
|
||||
the search mode selector is set to Query Language.
|
||||
|
||||
The language is roughly based on the Xesam user search language
|
||||
specification.
|
||||
|
||||
Here follows a sample request that we are going to explain:
|
||||
|
||||
author:"john doe" Beatles OR Lennon Live OR Unplugged -potatoes
|
||||
@ -728,6 +747,12 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
or lennon and either live or unplugged but not potatoes (in any part of
|
||||
the document).
|
||||
|
||||
An element is composed of an optional field specification, and a value,
|
||||
separated by a colon. Example: Beatles, author:balzac, dc:title:grandet
|
||||
|
||||
The colon, if present, means "contains". Xesam defines other relations,
|
||||
which are not supported for now.
|
||||
|
||||
All elements in the search entry are normally combined with an implicit
|
||||
AND. It is possible to specify that elements be OR'ed instead, as in
|
||||
Beatles OR Lennon. The OR must be entered literally (capitals), and it has
|
||||
@ -735,51 +760,69 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
(word2 OR word3) not (word1 AND word2) OR word3. Do not enter explicit
|
||||
parentheses; they are not supported for now.
|
||||
|
||||
An entry preceded by a - specifies a term that should not appear.
|
||||
An element preceded by a - specifies a term that should not appear. Pure
|
||||
negative queries are forbidden.
|
||||
|
||||
The first element in the above example, author:"john doe" is a phrase
|
||||
search limited to a specific field. Phrase searches are specified as usual
|
||||
by enclosing the words in double quotes. The field specification appears
|
||||
before the colon (of course this is not limited to phrases, author:Balzac
|
||||
would be ok too). Recoll currently manages the following fields:
|
||||
As usual, words inside quotes define a phrase (the order of words is
|
||||
significant), so that title:"prejudice pride" is not the same as
|
||||
title:prejudice title:pride, and is unlikely to find a result.
|
||||
|
||||
Recoll currently manages the following default fields:
|
||||
|
||||
* title, subject or caption are synonyms which specify data to be
|
||||
searched for in the document title or subject.
|
||||
|
||||
* author or from for searching the document's originators.
|
||||
|
||||
* keyword for searching the document specified keywords (few documents
|
||||
* recipient or to for searching the document's recipients.
|
||||
|
||||
* keyword for searching the document-specified keywords (few documents
|
||||
actually have any).
|
||||
|
||||
As of release 1.9, the filters have the possibility to create other fields
|
||||
with arbitrary names. No standard filters use this possibility yet.
|
||||
* filename for the document's file name.
|
||||
|
||||
There are two other elements which may be specified through the field
|
||||
syntax, but are somewhat special:
|
||||
* ext specifies the file name extension (Ex: ext:html)
|
||||
|
||||
* ext for specifying the file name extension (Ex: ext:html)
|
||||
The field syntax also supports a few field-like, but special, criteria:
|
||||
|
||||
* dir for specifying the file location (Ex: dir:/home/me/somedir).
|
||||
Please note that this is quite inefficient, that it may produce very
|
||||
slow searches, and that it may be worthwhile in some cases to set up
|
||||
separate databases instead.
|
||||
* dir for filtering the results on file location (Ex:
|
||||
dir:/home/me/somedir). Please note that this is quite inefficient,
|
||||
that it may produce very slow searches, and that it may be worthwhile in
|
||||
some cases to set up separate databases instead.
|
||||
|
||||
* mime for specifying the mime type. This one is quite special because
|
||||
you can specify several values which will be OR'ed (the normal default
|
||||
for the language is AND). Ex: mime:text/plain mime:text/html.
|
||||
* mime or format for specifying the mime type. This one is quite special
|
||||
because you can specify several values which will be OR'ed (the normal
|
||||
default for the language is AND). Ex: mime:text/plain mime:text/html.
|
||||
Specifying an explicit boolean operator or negation (-) before a mime
|
||||
specification is not supported and will produce strange results.
|
||||
|
||||
* type or rclcat for specifying the category (as in
|
||||
text/media/presentation/etc.). The classification of mime types in
|
||||
categories is defined in the Recoll configuration (mimeconf), and can
|
||||
be modified or extended. The default category names are those which
|
||||
permit filtering results in the main GUI screen. Categories are OR'ed
|
||||
like mime types above.
|
||||
|
||||
The document filters used while indexing have the possibility to create
|
||||
other fields with arbitrary names, and aliases may be defined in the
|
||||
configuration, so that the exact field search possibilities may be
|
||||
different for you if someone took care of the customisation.
|
||||
|
||||
The query language is currently the only way to use the Recoll field
|
||||
search capability.
|
||||
|
||||
Words inside phrases and capitalized words are not stem-expanded.
|
||||
Wildcards may be used anywhere inside a term. Specifying a wild-card on
|
||||
the left of a term can produce a very slow search.
|
||||
the left of a term can produce a very slow search (or even an incorrect
|
||||
one if the expansion is truncated because of excessive size).
|
||||
|
||||
You can use the show query link at the top of the result list to check the
|
||||
exact query which was finally executed by Xapian.
|
||||
|
||||
Most Xesam phrase modifiers are unsupported, except for l (small ell) to
|
||||
disable stemming, and p to turn a phrase into a NEAR (unordered) search.
|
||||
Example: "prejudice pride"p
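
The element syntax described above can be assembled mechanically. The Python sketch below rebuilds the sample request from its parts. The helper names (element, any_of, build_query) are hypothetical and not part of Recoll, and the sketch only produces the query string; it does not run a search.

```python
def element(value, field=None, exclude=False):
    """Format one query element: optional field, a colon, then the value."""
    # Values with embedded white space are phrases and need double quotes.
    text = '"%s"' % value if " " in value else value
    if field:
        text = "%s:%s" % (field, text)
    # A leading '-' marks a term that must not appear.
    return "-" + text if exclude else text


def any_of(*elements):
    """Join elements with a literal, capitalized OR."""
    return " OR ".join(elements)


def build_query(*parts):
    """Parts separated by white space are combined with an implicit AND."""
    return " ".join(parts)


sample = build_query(
    element("john doe", field="author"),
    any_of(element("Beatles"), element("Lennon")),
    any_of(element("Live"), element("Unplugged")),
    element("potatoes", exclude=True),
)
print(sample)
```

Because pure negative queries are forbidden, a caller of such a helper would need to make sure that at least one element is built without exclude=True.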
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||
3.5. Complex/advanced search
|
||||
@ -1194,13 +1237,432 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
|
||||
Your main database (the one the current configuration indexes to), is
|
||||
always implicitly active. If this is not desirable, you can set up your
|
||||
configuration so that it indexes, for example, an empty directory.
|
||||
configuration so that it indexes, for example, an empty directory. An
|
||||
alternative indexer may also need to implement a way of purging the index
|
||||
from stale data.
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||
Chapter 4. Installation
|
||||
Chapter 4. Programming interface
|
||||
|
||||
4.1. Installing a prebuilt copy
|
||||
Recoll has an application programming interface (API), usable both for indexing
|
||||
and searching, currently accessible from the Python language.
|
||||
|
||||
Another less radical way to extend the application is to write filters for
|
||||
new types of documents.
|
||||
|
||||
The processing of metadata attributes for documents (fields) is highly
|
||||
configurable.
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||
4.1. Writing a document filter
|
||||
|
||||
Recoll filters are executable programs which translate from a specific
|
||||
format (ie: openoffice, acrobat, etc.) to the Recoll indexing input
|
||||
format, which may be text/plain or text/html.
|
||||
|
||||
Recoll filters are usually shell-scripts, but this is in no way necessary.
|
||||
These programs are extremely simple and most of the difficulty lies in
|
||||
extracting the text from the native format, not outputting what is
|
||||
expected by Recoll. Happily enough, most document formats already have
|
||||
translators or text extractors which handle the difficult part and can be
|
||||
called from the filter. In some cases the output of the translating program
|
||||
is appropriate, and no intermediate shell-script is needed.
|
||||
|
||||
Filters are called with a single argument which is the source file name.
|
||||
They should output the result to stdout.
|
||||
|
||||
The RECOLL_FILTER_FORPREVIEW environment variable (values yes, no) tells
|
||||
the filter if the operation is for indexing or previewing. Some filters
|
||||
use this to output a slightly different format. This is not essential.
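
The calling convention above (one file name argument, result on stdout, optional check of RECOLL_FILTER_FORPREVIEW) can be sketched in Python as follows. This is only an illustration under stated assumptions: extract_text is a hypothetical placeholder that reads the file as UTF-8 text, where a real filter would call a format-specific extractor, and the white-space collapsing for indexing is an arbitrary choice, not something Recoll requires.

```python
import os
import sys


def is_preview(environ=os.environ):
    """Return True when the filter is run to build a preview."""
    return environ.get("RECOLL_FILTER_FORPREVIEW", "no") == "yes"


def extract_text(path):
    """Placeholder extraction (assumption): read the file as UTF-8 text."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return handle.read()


def main(argv, out=sys.stdout, environ=os.environ):
    """Filter entry point: argv[1] is the source file name."""
    if len(argv) != 2:
        sys.stderr.write("usage: %s <file>\n" % argv[0])
        return 1
    text = extract_text(argv[1])
    if not is_preview(environ):
        # For indexing, layout does not matter: collapse runs of white space.
        text = " ".join(text.split())
    out.write(text)
    return 0

# A real filter script would end with: sys.exit(main(sys.argv))
```

A filter like this outputs plain text, so its mimeconf entry would have to declare mimetype=text/plain and a charset, as in the antiword entry of the sample.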
|
||||
|
||||
The association of file types to filters is performed in the mimeconf
|
||||
file. A sample:
|
||||
|
||||
    [index]
    application/msword = exec antiword -t -i 1 -m UTF-8;\
        mimetype=text/plain;charset=utf-8

    application/ogg = exec rclogg

    text/rtf = exec unrtf --nopict --html; charset=iso-8859-1; mimetype=text/html
|
||||
|
||||
The fragment specifies that:
|
||||
|
||||
* application/msword files are processed by executing the antiword
|
||||
program, which outputs text/plain encoded in utf-8.
|
||||
|
||||
* application/ogg files are processed by the rclogg script, with default
|
||||
output type (text/html, with encoding specified in the header, or
|
||||
utf-8 by default).
|
||||
|
||||
* text/rtf is processed by unrtf, which outputs text/html. The
|
||||
iso-8859-1 encoding is specified because it is not the utf-8 default,
|
||||
and not output by unrtf in the HTML header section.
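
To make the structure of these entries concrete, here is a small Python sketch that splits an [index] value into the command to execute and its attributes. It is not Recoll's own parser. It assumes that semicolons only appear as separators and that a trailing backslash continues the value on the next line; parse_index_entry is a hypothetical name.

```python
def parse_index_entry(value):
    """Split a mimeconf [index] value into (command, attributes)."""
    # A backslash at the end of a line continues the value on the next line.
    joined = value.replace("\\\n", "")
    parts = [part.strip() for part in joined.split(";")]
    command = parts[0].split()
    # Drop the leading 'exec' keyword, keeping the program and its arguments.
    if command and command[0] == "exec":
        command = command[1:]
    attributes = {}
    for part in parts[1:]:
        if part:
            name, _, setting = part.partition("=")
            attributes[name.strip()] = setting.strip()
    return command, attributes


entry = "exec unrtf --nopict --html; charset=iso-8859-1; mimetype=text/html"
command, attributes = parse_index_entry(entry)
print(command)
print(attributes)
```

An entry without attributes, such as the rclogg one, yields an empty attribute dictionary, which corresponds to the default output type described above.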
|
||||
|
||||
The easiest way to write a new filter is probably to start from an
|
||||
existing one.
|
||||
|
||||
Filters which output text/plain text are generally simpler, but they
|
||||
cannot specify the character set and other metadata, so they are limited
|
||||
to cases where these elements are not needed.
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||
4.1.1. Filter HTML output
|
||||
|
||||
The output HTML could be very minimal like the following example:
|
||||
|
    <html><head>
    <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
    </head>
    <body>some text content</body></html>
|
||||
|
||||
|
||||
You should take care to escape some characters inside the text by
|
||||
transforming them into appropriate entities. "&" should be transformed
|
||||
into "&amp;", "<" should be transformed into "&lt;". This is not always
|
||||
properly done by translating programs which output HTML, and of course
|
||||
never by those which output plain text.
|
||||
|
||||
The character set needs to be specified in the header. It does not need to
|
||||
be UTF-8 (Recoll will take care of translating it), but it must be
|
||||
accurate for good results.
|
||||
|
||||
Recoll will also make use of other header fields if they are present:
|
||||
title, description, keywords.
|
||||
|
||||
Filters also have the possibility to "invent" field names. This should be
|
||||
output as meta tags:
|
||||
|
||||
<meta name="somefield" content="Some textual data" />
|
||||
|
||||
See the following section for details about configuring how field data is
|
||||
processed by the indexer.
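
As an illustration of the points in this section, the following Python sketch produces a minimal HTML document with a charset declaration, escaped body text and extra meta fields. The function name render_html and the field values are hypothetical; the escaping relies on the standard library html.escape function.

```python
from html import escape


def render_html(text, fields=None, charset="UTF-8"):
    """Return a minimal HTML document suitable as filter output."""
    lines = [
        "<html><head>",
        '<meta http-equiv="Content-Type" content="text/html;charset=%s">'
        % charset,
    ]
    for name, content in (fields or {}).items():
        # quote=True also escapes double quotes inside attribute values.
        lines.append('<meta name="%s" content="%s" />'
                     % (escape(name, quote=True), escape(content, quote=True)))
    lines.append("</head>")
    # escape() turns & into &amp;, < into &lt; and > into &gt;.
    lines.append("<body>%s</body></html>" % escape(text, quote=False))
    return "\n".join(lines)


page = render_html("AT&T < IBM", {"somefield": "Some textual data"})
print(page)
```

The charset argument must match the encoding actually used when the string is written to stdout, since the indexer relies on the declared value.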
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||
4.2. Field data processing configuration
|
||||
|
||||
Fields are named pieces of information in or about documents, like title,
|
||||
author, abstract.
|
||||
|
||||
The field values for documents can appear in several ways during indexing:
|
||||
either output by filters as meta fields in the HTML header section, or
|
||||
added as attributes of the Doc object when using the API, or again
|
||||
synthesized internally by Recoll.
|
||||
|
||||
The Recoll query language allows searching for text in a specific field.
|
||||
|
||||
Recoll defines a number of default fields. Additional ones can be output
|
||||
by filters, and described in the fields configuration file.
|
||||
|
||||
Fields can be:
|
||||
|
||||
* indexed, meaning that their terms are separately stored in inverted
|
||||
lists (with a specific prefix), and that a field-specific search is
|
||||
possible.
|
||||
|
||||
* stored, meaning that their value is recorded in the index data record
|
||||
for the document, and can be returned and displayed with search
|
||||
results.
|
||||
|
||||
A field can be indexed, stored, or both.
|
||||
|
||||
A field becomes indexed by having a prefix defined in the [prefixes]
|
||||
section of the fields file. See the comments in there for details.
|
||||
|
||||
A field becomes stored by appearing in the [stored] section of the fields
|
||||
file.
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||
4.3. API
|
||||
|
||||
4.3.1. Interface elements
|
||||
|
||||
A few elements in the interface are specific and need an explanation.
|
||||
|
||||
udi
|
||||
|
||||
A udi (unique document identifier) identifies a document. Because
|
||||
of limitations inside the index engine, it is restricted in length
|
||||
(to 200 bytes), which is why a regular URI cannot be used. The
|
||||
structure and contents of the udi are defined by the application
|
||||
and opaque to the index engine. For example, the internal file
|
||||
system indexer uses the complete document path (file path +
|
||||
internal path), truncated to length, the suppressed part being
|
||||
replaced by a hash value.
|
||||
|
||||
ipath
|
||||
|
||||
This data value (set as a field in the Doc object) is stored,
|
||||
along with the URL, but not indexed by Recoll. Its contents are
|
||||
not interpreted, and its use is up to the application. For
|
||||
example, the Recoll internal file system indexer stores the part
|
||||
of the document access path internal to the container file (ipath
|
||||
in this case is a list of subdocument sequential numbers). url and
|
||||
ipath are returned in every search result and permit access to the
|
||||
original document.
|
||||
|
||||
Stored and indexed fields
|
||||
|
||||
The fields file inside the Recoll configuration defines which
|
||||
document fields are either "indexed" (searchable), "stored"
|
||||
(retrievable with search results), or both.
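
The udi construction described above (full document path, truncated to the length limit, with the suppressed part replaced by a hash) could look like the Python sketch below. This is not Recoll's actual algorithm: the "|" separator, the choice of SHA-256 and the name make_udi are assumptions made for the illustration. Only the 200-byte limit comes from the text.

```python
import hashlib

UDI_MAX_BYTES = 200  # length limit quoted in the text
HASH_LEN = 64        # length of a SHA-256 hex digest


def make_udi(file_path, internal_path="", max_bytes=UDI_MAX_BYTES):
    """Build an identifier from file path + internal path, within max_bytes."""
    full = (file_path + "|" + internal_path).encode("utf-8")
    if len(full) <= max_bytes:
        return full
    keep = max_bytes - HASH_LEN
    # Replace the part that does not fit by a hash of that part.
    digest = hashlib.sha256(full[keep:]).hexdigest().encode("ascii")
    return full[:keep] + digest


print(make_udi("/home/me/doc.txt"))
print(len(make_udi("/" + "a" * 300, "1")))
```

Two long paths that share a prefix but differ in the truncated part still receive different identifiers, because the difference ends up in the hash.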
|
||||
|
||||
Data for an external indexer should be stored in a separate index, not
the one for the Recoll internal file system indexer (except if the latter
is not used at all). The reason is that the main document indexer's purge
pass would remove all the other indexer's documents, as they were not seen
during indexing. The main indexer's documents would also probably be a
problem for the external indexer's purge operation.
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||
4.3.2. Python interface
|
||||
|
||||
4.3.2.1. Introduction
|
||||
|
||||
Recoll versions after 1.11 define a Python programming interface, both for
|
||||
searching and indexing.
|
||||
|
||||
The python interface is not built by default and can be found in the
|
||||
source package, under python/recoll. The directory contains the usual
|
||||
setup.py script which you can use to build and install the module:
|
||||
|
||||
    cd recoll-xxx/python/recoll
    python setup.py build
    python setup.py install
|
||||
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||
4.3.2.2. Interface manual
|
||||
|
||||
NAME
|
||||
recoll - This is an interface to the Recoll full text indexer.
|
||||
|
||||
FILE
|
||||
/usr/local/lib/python2.5/site-packages/recoll.so
|
||||
|
||||
CLASSES
|
||||
Db
|
||||
Doc
|
||||
Query
|
||||
SearchData
|
||||
|
||||
class Db(__builtin__.object)
|
||||
| Db([confdir=None], [extra_dbs=None], [writable = False])
|
||||
|
|
||||
| A Db object holds a connection to a Recoll index. Use the connect()
|
||||
| function to create one.
|
||||
| confdir specifies a Recoll configuration directory (default:
|
||||
| $RECOLL_CONFDIR or ~/.recoll).
|
||||
| extra_dbs is a list of external databases (xapian directories)
|
||||
| writable decides if we can index new data through this connection
|
||||
|
|
||||
| Methods defined here:
|
||||
|
|
||||
|
|
||||
| addOrUpdate(...)
|
||||
| addOrUpdate(udi, doc, parent_udi=None) -> None
|
||||
| Add or update index data for a given document
|
||||
| The udi string must define a unique id for the document. It is not
|
||||
| interpreted inside Recoll
|
||||
| doc is a Doc object
|
||||
| if parent_udi is set, this is a unique identifier for the
|
||||
| top-level container (ie mbox file)
|
||||
|
|
||||
| delete(...)
|
||||
| delete(udi) -> Bool.
|
||||
| Purge index from all data for udi. If udi matches a container
|
||||
| document, purge all subdocs (docs with a parent_udi matching udi).
|
||||
|
|
||||
| makeDocAbstract(...)
|
||||
| makeDocAbstract(Doc, Query) -> string
|
||||
| Build and return 'keyword-in-context' abstract for document
|
||||
| and query.
|
||||
|
|
||||
| needUpdate(...)
|
||||
| needUpdate(udi, sig) -> Bool.
|
||||
| Check if the index is up to date for the document defined by udi,
|
||||
| having the current signature sig.
|
||||
|
|
||||
| purge(...)
|
||||
| purge() -> Bool.
|
||||
| Delete all documents that were not touched during the just finished
|
||||
| indexing pass (since open-for-write). These are the documents for
|
||||
| which the needUpdate() call was not performed, indicating that they no
|
||||
| longer exist in the primary storage system.
|
||||
|
|
||||
| query(...)
|
||||
| query() -> Query. Return a new, blank query object for this index.
|
||||
|
|
||||
| setAbstractParams(...)
|
||||
| setAbstractParams(maxchars, contextwords).
|
||||
| Set the parameters used to build 'keyword-in-context' abstracts
|
||||
|
|
||||
| ----------------------------------------------------------------------
|
||||
| Data and other attributes defined here:
|
||||
|
|
||||
|
||||
class Doc(__builtin__.object)
|
||||
| Doc()
|
||||
|
|
||||
| A Doc object contains index data for a given document.
|
||||
| The data is extracted from the index when searching, or set by the
|
||||
| indexer program when updating. The Doc object has no useful methods but
|
||||
| many attributes to be read or set by its user. It matches exactly the
|
||||
| Rcl::Doc c++ object. Some of the attributes are predefined, but,
|
||||
| especially when indexing, others can be set, the name of which will be
|
||||
| processed as field names by the indexing configuration.
|
||||
| Inputs can be specified as unicode or strings.
|
||||
| Outputs are unicode objects.
|
||||
| All dates are specified as unix timestamps, printed as strings
|
||||
| Predefined attributes (index/query/both):
|
||||
| text (index): document plain text
|
||||
| url (both)
|
||||
| fbytes (both) optional file size in bytes
|
||||
| filename (both)
|
||||
| fmtime (both) optional file modification date. Unix time printed
|
||||
| as string
|
||||
| dbytes (both) document text bytes
|
||||
| dmtime (both) document creation/modification date
|
||||
| ipath (both) value private to the app.: internal access path
|
||||
| inside file
|
||||
| mtype (both) mime type for original document
|
||||
| mtime (query) dmtime if set else fmtime
|
||||
| origcharset (both) charset the text was converted from
|
||||
| size (query) dbytes if set, else fbytes
|
||||
| sig (both) app-defined file modification signature.
|
||||
| For up to date checks
|
||||
| relevancyrating (query)
|
||||
| abstract (both)
|
||||
| author (both)
|
||||
| title (both)
|
||||
| keywords (both)
|
||||
|
|
||||
| Methods defined here:
|
||||
|
|
||||
|
|
||||
| ----------------------------------------------------------------------
|
||||
| Data and other attributes defined here:
|
||||
|
|
||||
|
||||
class Query(__builtin__.object)
|
||||
| Recoll Query objects are used to execute index searches.
|
||||
| They must be created by the Db.query() method.
|
||||
|
|
||||
| Methods defined here:
|
||||
|
|
||||
|
|
||||
| execute(...)
|
||||
| execute(query_string, stemming=1|0)
|
||||
|
|
||||
| Starts a search for query_string, a Recoll search language string
|
||||
| (mostly Xesam-compatible).
|
||||
| The query can be a simple list of terms (and'ed by default), or more
|
||||
| complicated with field specs etc. See the Recoll manual.
|
||||
|
|
||||
| executesd(...)
|
||||
| executesd(SearchData)
|
||||
|
|
||||
| Starts a search for the query defined by the SearchData object.
|
||||
|
|
||||
| fetchone(...)
|
||||
| fetchone(None) -> Doc
|
||||
|
|
||||
| Fetches the next Doc object in the current search results.
|
||||
|
|
||||
| sortby(...)
|
||||
| sortby(field=fieldname, ascending=true)
|
||||
| Sort results by 'fieldname', in ascending or descending order.
|
||||
| Only one field can be used, no subsorts for now.
|
||||
| Must be called before executing the search.
|
||||
|
|
||||
| ----------------------------------------------------------------------
|
||||
| Data descriptors defined here:
|
||||
|
|
||||
| next
|
||||
| Next index to be fetched from results. Normally increments after
|
||||
| each fetchone() call, but can be set/reset before the call to effect
|
||||
| seeking. Starts at 0.
|
||||
|
|
||||
| ----------------------------------------------------------------------
|
||||
| Data and other attributes defined here:
|
||||
|
|
||||
|
||||
class SearchData(__builtin__.object)
|
||||
| SearchData()
|
||||
|
|
||||
| A SearchData object describes a query. It has a number of global
|
||||
| parameters and a chain of search clauses.
|
||||
|
|
||||
| Methods defined here:
|
||||
|
|
||||
|
|
||||
| addclause(...)
|
||||
| addclause(type='and'|'or'|'excl'|'phrase'|'near'|'sub',
|
||||
| qstring=string, slack=int, field=string, stemming=1|0,
|
||||
| subSearch=SearchData)
|
||||
| Adds a simple clause to the SearchData And/Or chain, or a subquery
|
||||
| defined by another SearchData object
|
||||
|
|
||||
| ----------------------------------------------------------------------
|
||||
| Data and other attributes defined here:
|
||||
|
|
||||
|
||||
FUNCTIONS
|
||||
connect(...)
|
||||
connect([confdir=None], [extra_dbs=None], [writable = False])
|
||||
-> Db.
|
||||
|
||||
Connects to a Recoll database and returns a Db object.
|
||||
confdir specifies a Recoll configuration directory
|
||||
(the default is determined as for any other Recoll program).
|
||||
extra_dbs is a list of external databases (xapian directories)
|
||||
writable decides if we can index new data through this connection
|
||||
|
||||
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||
4.3.2.3. Example code
|
||||
|
||||
The following sample would query the index with a user language string.
|
||||
See the python/samples directory inside the Recoll source for other
|
||||
examples.
|
||||
|
||||
#!/usr/bin/env python

import recoll

# Connect to the default index and set the abstract generation parameters
db = recoll.connect()
db.setAbstractParams(maxchars=80, contextwords=2)

query = db.query()
nres = query.execute("some user question")
print "Result count: ", nres
# Only display the first 5 results
if nres > 5:
    nres = 5
while query.next >= 0 and query.next < nres:
    doc = query.fetchone()
    print query.next
    for k in ("title", "size"):
        print k, ":", getattr(doc, k).encode('utf-8')
    abs = db.makeDocAbstract(doc, query).encode('utf-8')
    print abs
    print
|
||||
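The Doc attribute list above gives two fallback rules (mtime is dmtime if set, else fmtime; size is dbytes if set, else fbytes) and says that dates are Unix timestamps printed as strings. The following is only a sketch of those rules applied to values that have already been copied out of a result into a plain dictionary. It does not use the recoll module, and the helper names (effective_mtime, effective_size, format_timestamp) are hypothetical, not part of the Recoll API.

```python
import time


def _first_set(fields, *names):
    """Return the first non-empty value among the named fields, or None."""
    for name in names:
        value = fields.get(name)
        if value:
            return value
    return None


def effective_mtime(fields):
    """dmtime if set, else fmtime, converted from string to int seconds."""
    raw = _first_set(fields, "dmtime", "fmtime")
    return int(raw) if raw is not None else None


def effective_size(fields):
    """dbytes if set, else fbytes, converted from string to int bytes."""
    raw = _first_set(fields, "dbytes", "fbytes")
    return int(raw) if raw is not None else None


def format_timestamp(seconds):
    """Format Unix seconds as a UTC date string."""
    return time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(seconds))
```

With a dictionary such as {"fmtime": "1200000000", "fbytes": "42"}, effective_mtime() returns the file modification time because no document date is set, and the result can be passed to format_timestamp() for display.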
|
||||
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||
Chapter 5. Installation
|
||||
|
||||
5.1. Installing a prebuilt copy
|
||||
|
||||
Recoll binary packages from the Recoll web site are always linked
|
||||
statically to the Xapian libraries, and have no other dependencies. You
|
||||
@ -1211,14 +1673,14 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||
4.1.1. Installing through a package system
|
||||
5.1.1. Installing through a package system
|
||||
|
||||
If you use a BSD-type port system or a prebuilt package (RPM or other),
|
||||
just follow the usual procedure for your system.
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||
4.1.2. Installing a prebuilt Recoll
|
||||
5.1.2. Installing a prebuilt Recoll
|
||||
|
||||
The unpackaged binary versions on the Recoll web site are just compressed
|
||||
tar files of a build tree, where only the useful parts were kept
|
||||
@ -1233,11 +1695,17 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||
4.2. Supporting packages
|
||||
5.2. Supporting packages
|
||||
|
||||
Recoll uses external applications to index some file types. You need to
|
||||
install them for the file types that you wish to have indexed (these are
|
||||
run-time dependencies. None is needed for building Recoll):
|
||||
run-time dependencies. None is needed for building Recoll).
|
||||
|
||||
After an indexing pass, the commands that were found missing can be
|
||||
displayed from the recoll File menu. The list is stored in the missing
|
||||
text file inside the configuration directory.
|
||||
|
||||
A list of common file types which need external commands:
|
||||
|
||||
* Openoffice: supported natively, but needs the unzip command to be
|
||||
installed.
|
||||
@ -1275,9 +1743,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||
4.3. Building from source
|
||||
5.3. Building from source
|
||||
|
||||
4.3.1. Prerequisites
|
||||
5.3.1. Prerequisites
|
||||
|
||||
At the very least, you will need to download and install the xapian core
|
||||
package (Recoll 1.9 normally uses version 1.0.2, but any 0.9 or 1.0.x
|
||||
@ -1295,7 +1763,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||
4.3.2. Building
|
||||
5.3.2. Building
|
||||
|
||||
Recoll has been built on Linux (redhat7.3, mandriva 2005/6, Fedora Core
|
||||
3/4/5/6), FreeBSD 5/6, macosx, and Solaris 8. If you build on another
|
||||
@ -1335,7 +1803,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||
4.3.3. Installation
|
||||
5.3.3. Installation
|
||||
|
||||
Either type make install or execute recollinstall prefix, in the root of
|
||||
the source tree. This will copy the commands to prefix/bin and the sample
|
||||
@ -1350,7 +1818,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||
4.4. Configuration overview
|
||||
5.4. Configuration overview
|
||||
|
||||
Most of the parameters specific to the recoll GUI are set through the
|
||||
Preferences menu and stored in the standard QT place ($HOME/.qt/recollrc).
|
||||
@ -1410,7 +1878,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||
4.4.1. Main configuration file
|
||||
5.4.1. Main configuration file
|
||||
|
||||
recoll.conf is the main configuration file. It defines things like what to
|
||||
index (top directories and things to ignore), and the default character
|
||||
@ -1616,7 +2084,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||
4.4.2. The mimemap file
|
||||
5.4.2. The mimemap file
|
||||
|
||||
mimemap specifies the file name extension to mime type mappings.
|
||||
|
||||
@ -1642,7 +2110,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||
4.4.3. The mimeconf file
|
||||
5.4.3. The mimeconf file
|
||||
|
||||
mimeconf specifies how the different mime types are handled for indexing,
|
||||
and which icons are displayed in the recoll result lists.
|
||||
@ -1656,7 +2124,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||
4.4.4. The mimeview file
|
||||
5.4.4. The mimeview file
|
||||
|
||||
mimeview specifies which programs are started when you click on an Edit
|
||||
link in a result list. For example, HTML is normally displayed using firefox, but
|
||||
@ -1679,9 +2147,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||
4.4.5. Examples of configuration adjustments
|
||||
5.4.5. Examples of configuration adjustments
|
||||
|
||||
4.4.5.1. Adding an external viewer for a non-indexed type
|
||||
5.4.5.1. Adding an external viewer for a non-indexed type
|
||||
|
||||
Imagine that you have some kind of file which does not have indexable
|
||||
content, but for which you would like to have a functional Edit link in
|
||||
@ -1714,7 +2182,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||
4.4.5.2. Adding indexing support for a new file type
|
||||
5.4.5.2. Adding indexing support for a new file type
|
||||
|
||||
Let us now imagine that the above .blob files actually contain indexable
|
||||
text and that you know how to extract it with a command line program.
|
||||
@ -1738,86 +2206,32 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
|
||||
The rclblob filter should be an executable program or script which exists
|
||||
inside /usr/[local/]share/recoll/filters. It will be given a file name as
|
||||
argument and should output the text contents in html format on the
|
||||
standard output.
|
||||
argument and should output the text contents on the standard output.
|
||||
|
||||
You can find more details about writing a Recoll filter in the section
|
||||
about writing filters
|
||||
The filter programming section describes in more detail how to write a
|
||||
filter.
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||
4.5. The KDE Kicker Recoll applet
|
||||
5.5. The KDE Kicker Recoll applet
|
||||
|
||||
The Recoll source tree contains the source code to the recoll_applet, a
|
||||
small application derived from the find_applet. This can be used to add a
|
||||
small Recoll launcher to the KDE panel.
|
||||
|
||||
The applet is not automatically built with the main Recoll programs. To
|
||||
build it, you need to unpack the Recoll source code, then go to the
|
||||
kde/recoll_applet/ directory, and type the usual configure;make;make
|
||||
install.
|
||||
The applet is not automatically built with the main Recoll programs, nor
|
||||
is it included with the main source distribution (because the KDE build
|
||||
boilerplate makes it relatively big). You can download its source from the
|
||||
recoll.org download page. Use the omnipotent configure;make;make install
|
||||
incantation to build and install.
|
||||
|
||||
You can then add the applet to the panel by right-clicking the panel and
|
||||
choosing the Add applet entry.
|
||||
|
||||
The recoll_applet has a small text window where you can type a Recoll
|
||||
query (in query language form), and an icon which can be used to restrict
|
||||
the search to certain types of files.
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||
4.6. Extending Recoll
|
||||
|
||||
4.6.1. Writing a document filter
|
||||
|
||||
Recoll filters are executable programs which translate from a specific
|
||||
format (e.g. openoffice, acrobat, etc.) to the Recoll indexing input
|
||||
format, which was chosen to be HTML.
|
||||
|
||||
Recoll filters are usually shell-scripts, but this is in no way necessary.
|
||||
These programs are extremely simple and most of the difficulty lies in
|
||||
extracting the text from the native format, not outputting what is
|
||||
expected by Recoll. Happily enough, most document formats already have
|
||||
translators or text extractors which handle the difficult part and can be
|
||||
called from the filter.
|
||||
|
||||
Filters are called with a single argument which is the source file name.
|
||||
They should output the result to stdout.
|
||||
|
||||
The RECOLL_FILTER_FORPREVIEW environment variable (values yes, no) tells
|
||||
the filter if the operation is for indexing or previewing. Some filters
|
||||
use this to output a slightly different format. This is not essential.
|
||||
|
||||
The output HTML could be very minimal like the following example:
|
||||
|
||||
<html><head>
|
||||
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
|
||||
</head>
|
||||
<body>some text content</body></html>
|
||||
|
||||
|
||||
You should take care to escape some characters inside the text by
|
||||
transforming them into appropriate entities. "&" should be transformed
|
||||
into "&amp;", "<" should be transformed into "&lt;".
|
||||
|
||||
The character set needs to be specified in the header. It does not need to
|
||||
be UTF-8 (Recoll will take care of translating it), but it must be
|
||||
accurate for good results.
|
||||
|
||||
Recoll will also make use of other header fields if they are present:
|
||||
title, description, keywords.
|
||||
|
||||
As of Recoll release 1.9, filters also have the possibility to "invent"
|
||||
field names. This should be output as meta tags:
|
||||
|
||||
<meta name="somefield" content="Some textual data" />
|
||||
|
||||
In this case, a correspondence between field name and Xapian prefix should
|
||||
also be added to the mimeconf file. See the existing entries for
|
||||
inspiration. The field can then be used inside the query language to
|
||||
narrow searches.
|
||||
|
||||
The easiest way to write a new filter is probably to start from an
|
||||
existing one.
|
||||
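The filter conventions described above (one file name argument, HTML on standard output, a charset declared in the header, "&" and "<" turned into entities) can be sketched in Python rather than as a shell script. This is an illustration only, not one of the filters shipped with Recoll: the function names are hypothetical, and the input is assumed to be a plain text file encoded in UTF-8.

```python
import io
import sys


def escape_text(text):
    """Turn '&' and '<' into HTML entities ('&' first, to avoid double escaping)."""
    return text.replace("&", "&amp;").replace("<", "&lt;")


def to_filter_html(text, charset="UTF-8", title=None):
    """Wrap extracted text in the minimal HTML form shown above."""
    head = ['<meta http-equiv="Content-Type" content="text/html;charset=%s">'
            % charset]
    if title:
        # Optional header field; title, description and keywords are used
        # by the indexer when present.
        head.append("<title>%s</title>" % escape_text(title))
    return ("<html><head>\n" + "\n".join(head) + "\n</head>\n<body>"
            + escape_text(text) + "</body></html>\n")


def main(argv):
    """Filter entry point: argv[1] is the source file name."""
    if len(argv) != 2:
        sys.stderr.write("usage: %s filename\n" % argv[0])
        return 1
    # Assumption: the source file is UTF-8 text; undecodable bytes are replaced.
    with io.open(argv[1], encoding="utf-8", errors="replace") as source:
        sys.stdout.write(to_filter_html(source.read()))
    return 0
```

To use this as a filter script, main(sys.argv) would be called from an `if __name__ == "__main__":` guard. A real filter would replace the file reading step with a call to whatever extractor handles the native format.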
the search to certain types of files. It is quite primitive, and launches
|
||||
a new recoll GUI instance every time (even if it is already running). You
|
||||
may find it useful anyway.
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||