release 1.14.0

parent 023b0205d8
commit f5974f5133

247 src/INSTALL
@@ -91,7 +91,13 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 displayed from the recoll File menu. The list is stored in the missing
 text file inside the configuration directory.

-A list of common file types which need external commands:
+A list of common file types which need external commands follows. Many of
+the filters need the iconv command, which is not always listed as a
+dependency.
+
+As of Recoll release 1.14, a number of XML-based formats that were handled
+by ad hoc filter code now use xsltproc, which usually comes with libxslt.
+These are: abiword, fb2 (ebooks), kword, openoffice, svg.

 * Openoffice: supported natively, but needs the unzip command to be
 installed.
@@ -104,6 +110,8 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

 * MS Excel and PowerPoint: catdoc.

+* MS Open XML (docx): needs xsltproc.
+
 * Wordperfect files: libwpd.

 * RTF: unrtf
@@ -117,13 +125,12 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

 * djvu: DjVuLibre

-* mp3: Recoll will use the id3info command from the id3lib package to
-extract tag information. Without it, only the file names will be
-indexed.
-
-* flac files need metaflac.
-
-* ogg files need ogginfo.
+* mp3, flac, ogg vorbis: Recoll releases before 1.13 use the id3info
+command from the id3lib package to extract mp3 tag information. (Some
+gcc versions after 4.4 may have trouble compiling id3lib. You can find
+a workaround here), metaflac (standard flac tools) for flac files, and
+ogginfo (vorbis tools) for ogg files. Releases 1.14 and later use a
+single Python filter based on mutagen for all audio file types.

 * Pictures: Recoll uses the Exiftool Perl package to extract tag
 information. Most image file formats are supported. Note that there
@@ -134,12 +141,14 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 * chm: files in microsoft help format need Python and the pychm module
 (which needs chmlib).

-* ics: iCalendar files need Python and the icalendar module.
+* ics: up to Recoll 1.13, iCalendar files need Python and the icalendar
+module. For newer versions, icalendar is not needed.

 * zip: Zip archives need Python (and the standard zipfile module).

 Text, HTML, mail folders, Openoffice and Scribus files are processed
-internally. Lyx is used to index Lyx files. Many filters need sed and awk.
+internally. Lyx is used to index Lyx files. Many filters need iconv and
+the standard sed and awk.

 --------------------------------------------------------------------------

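The dependency list above names several helper commands. A quick way to see which of them are missing on a given machine is to look each one up on the PATH. The Python sketch below is not part of Recoll: the HELPERS tuple is a hand-picked assumption drawn from this list (plus the iconv, sed and awk requirements it mentions, and the exiftool command from the Exiftool package) and is not exhaustive, and missing_helpers is a hypothetical name.

```python
import shutil

# Helper commands named in the dependency list above. This selection is an
# assumption for illustration and is not a complete list of Recoll's needs.
HELPERS = ("iconv", "sed", "awk", "xsltproc", "unzip", "catdoc", "unrtf", "exiftool")


def missing_helpers(commands=HELPERS):
    """Return the commands from `commands` that are not found on PATH."""
    # shutil.which() returns None when no executable of that name is found.
    return [cmd for cmd in commands if shutil.which(cmd) is None]


if __name__ == "__main__":
    missing = missing_helpers()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All listed helper commands were found.")
```

A command being present on the PATH does not guarantee that the matching Recoll filter works (versions and Python modules also matter), so this is only a first check.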
@@ -159,11 +168,18 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

 7.3.1. Prerequisites

-At the very least, you will need to download and install the xapian core
-package and the qt run-time and development packages. Check the Recoll
-download page for up to date version information.
+C++ compiler. Up to Recoll version 1.13.04, its absence can manifest
+itself by strange messages about a missing iconv_open.

-You will most probably be able to find a binary package for qt for your
+Development files for Xapian core
+
+Development files for Qt .
+
+Development files for X11 and zlib.
+
+Check the Recoll download page for up to date version information.
+
+You will most probably be able to find a binary package for Qt for your
 system. You may have to compile Xapian but this is not difficult (if you
 are using FreeBSD, there is a port).

@@ -173,7 +189,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

 7.3.2. Building

-Recoll has been built on Linux, FreeBSD, macosx, and Solaris, most
+Recoll has been built on Linux, FreeBSD, Mac OS X, and Solaris, most
 versions after 2005 should be ok, maybe some older ones too (Solaris 8 is
 ok). If you build on another system, and need to modify things, I would
 very much welcome patches.
@@ -350,14 +366,18 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 and edit the configuration file before restarting the command. This will
 start the initial indexing, which may take some time.

-Paramers affecting what we index:
+Most of the following parameters can be changed from the Index
+Configuration menu in the recoll interface. Some can only be set by
+editing the configuration file.
+
+7.4.1.1. Parameters affecting what documents we index:

 topdirs

 Specifies the list of directories or files to index (recursively
-for directories). The indexer will not follow symbolic links
-inside the indexed trees by default (see the followLinks options
-though).
+for directories). You can use symbolic links as elements of this
+list. See the followLinks option about following symbolic links
+found under the top elements (not followed by default).

 skippedNames

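The new topdirs wording separates two cases: a symbolic link used as an element of topdirs is indexed, while symbolic links found under it are not followed by default. Python's os.walk behaves the same way (it always lists the top directory it is given, and its followlinks argument defaults to False), so it makes a convenient model. This is a sketch of the documented behaviour, not Recoll's indexer; files_to_index and its follow_links flag (standing in for the followLinks option) are hypothetical names.

```python
import os
import tempfile


def files_to_index(topdirs, follow_links=False):
    """Yield file paths under each topdirs element.

    An element may itself be a symbolic link; os.walk lists its target.
    Symbolic links to directories found *below* it are only descended
    into when follow_links is true.
    """
    for top in topdirs:
        if os.path.isfile(top):
            yield top  # plain files may also appear in topdirs
            continue
        for dirpath, _dirnames, filenames in os.walk(top, followlinks=follow_links):
            for name in filenames:
                yield os.path.join(dirpath, name)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        real = os.path.join(tmp, "real")
        other = os.path.join(tmp, "other")
        os.mkdir(real)
        os.mkdir(other)
        open(os.path.join(real, "a.txt"), "w").close()
        open(os.path.join(other, "b.txt"), "w").close()
        os.symlink(other, os.path.join(real, "link"))  # link found under the top
        top = os.path.join(tmp, "top")
        os.symlink(real, top)  # a topdirs element that is itself a link
        print(sorted(os.path.basename(p) for p in files_to_index([top])))
        # prints ['a.txt']
        print(sorted(os.path.basename(p) for p in files_to_index([top], True)))
        # prints ['a.txt', 'b.txt']
```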
@@ -471,7 +491,72 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 Beagle plugin as ~/.beagle/ToIndex so there should be no need to
 change it.

-Parameters affecting where and how we store things:
+7.4.1.2. Parameters affecting how we generate terms:
+
+Changing some of these parameters will imply a full reindex. Also, when
+using multiple indexes, it may not make sense to search indexes that don't
+share the values for these parameters, because they usually affect both
+search and index operations.
+
+nonumbers
+
+If this is set to true, no terms will be generated for numbers. For
+example "123", "1.5e6", "192.168.1.4" would not be indexed
+("value123" would still be). Numbers are often quite interesting
+to search for, and this should probably not be set except for
+special situations, e.g. scientific documents with huge amounts of
+numbers in them. This can only be set for a whole index, not for a
+subtree.
+
+nocjk
+
+If this is set to true, specific East Asian (Chinese, Korean, Japanese)
+character/word splitting is turned off. This will save a small
+amount of CPU if you have no CJK documents. If your document base
+does include such text but you are not interested in searching it,
+setting nocjk may be a significant time and space saver.
+
+cjkngramlen
+
+This lets you adjust the size of n-grams used for indexing CJK
+text. The default value of 2 is probably appropriate in most
+cases. A value of 3 would allow more precision and efficiency on
+longer words, but the index will be approximately twice as large.
+
+indexstemminglanguages
+
+A list of languages for which the stem expansion databases will be
+built. See recollindex(1) or use the recollindex -l command for
+possible values. You can add a stem expansion database for a
+different language by using recollindex -s, but it will be deleted
+during the next indexing. Only languages listed in the
+configuration file are permanent.
+
+defaultcharset
+
+The name of the character set used for files that do not contain a
+character set definition (e.g. plain text files). This can be
+redefined for any sub-directory. If it is not set at all, the
+character set used is the one defined by the nls environment
+(LC_ALL, LC_CTYPE, LANG), or iso8859-1 if nothing is set.
+
+maildefcharset
+
+This can be used to define the default character set specifically
+for mail messages which don't specify it. This is mainly useful
+for readpst (libpst) dumps, which are utf-8 but do not say so.
+
+localfields
+
+This allows setting fields for all documents under a given
+directory. Typical usage would be to set an "rclaptg" field, to be
+used in mimeview to select a specific viewer. If several fields
+are to be set, they should be separated with a colon (':')
+character (which there is currently no way to escape). E.g.:
+localfields= rclaptg=gnus:other = val, then select a specific
+viewer with mimetype|tag=... in mimeview.
+
+7.4.1.3. Parameters affecting where and how we store things:

 dbdir

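To make the nonumbers rule concrete, here is a small Python sketch that drops purely numeric tokens and keeps mixed ones such as "value123". The regular expression is an assumption chosen to cover the examples given for nonumbers; it is not Recoll's actual word-splitting rule, and filter_terms is a hypothetical name.

```python
import re

# A rough approximation of a "number" token, assumed for this sketch only:
# digit groups joined by dots, with an optional sign and exponent. It covers
# the examples above ("123", "1.5e6", "192.168.1.4") but is not Recoll's
# actual word-splitting rule.
NUMBER_RE = re.compile(r"[+-]?\d+(?:\.\d+)*(?:[eE][+-]?\d+)?")


def filter_terms(tokens, nonumbers=False):
    """Return the tokens that would become index terms."""
    if not nonumbers:
        return list(tokens)
    # fullmatch: "value123" only partly matches, so it is kept.
    return [tok for tok in tokens if NUMBER_RE.fullmatch(tok) is None]


if __name__ == "__main__":
    words = ["123", "1.5e6", "192.168.1.4", "value123", "garden"]
    print(filter_terms(words, nonumbers=True))  # ['value123', 'garden']
```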
@@ -519,7 +604,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 default, which is flushing every 10000 documents (memory usage
 depends on average document size). The default value is 10.

-Miscellani:
+7.4.1.4. Miscellaneous parameters:

 loglevel,daemloglevel

@@ -533,44 +618,11 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 value, and is the default. The daemversion is specific to the
 indexing monitor daemon.

-indexstemminglanguages
-
-A list of languages for which the stem expansion databases will be
-built. See recollindex(1) or use the recollindex -l command for
-possible values. You can add a stem expansion database for a
-different language by using recollindex -s, but it will be deleted
-during the next indexing. Only languages listed in the
-configuration file are permanent.
-
-defaultcharset
-
-The name of the character set used for files that do not contain a
-character set definition (ie: plain text files). This can be
-redefined for any sub-directory. If it is not set at all, the
-character set used is the one defined by the nls environment
-(LC_ALL, LC_CTYPE, LANG), or iso8859-1 if nothing is set.
-
 filtermaxseconds

 Maximum filter execution time, after which it is aborted. Some
 postscript programs just loop...

-maildefcharset
-
-This can be used to define the default character set specifically
-for mail messages which don't specify it. This is mainly useful
-for readpst (libpst) dumps, which are utf-8 but do not say so.
-
-localfields
-
-This allows setting fields for all documents under a given
-directory. Typical usage would be to set an "rclaptg" field, to be
-used in mimeview to select a specific viewer. If several fields
-are to be set, they should be separated with a ':' character
-(which there is currently no way to escape). Ie: localfields=
-rclaptg=gnus:other = val, then select specifier viewer with
-mimetype|tag=... in mimeview.
-
 filtersdir

 A directory to search for the external filter scripts used to
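The filtermaxseconds parameter kept above bounds how long an external filter may run. The idea can be sketched with Python's subprocess.run and its timeout argument; this only illustrates the mechanism and is not how Recoll launches its filters. run_filter is a hypothetical name, and the looping "filter" is a stand-in built from the Python interpreter itself.

```python
import subprocess
import sys


def run_filter(cmd, max_seconds):
    """Run an external filter and return its standard output (bytes),
    or None if it is still running after `max_seconds` seconds."""
    try:
        completed = subprocess.run(cmd, capture_output=True, timeout=max_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed and reaped the child here.
        return None
    return completed.stdout


if __name__ == "__main__":
    # Stand-in for a filter that "just loops": sleep much longer than allowed.
    looping = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(run_filter(looping, max_seconds=1))  # prints None after about 1 second
```

Returning None rather than raising keeps the caller simple: a document whose filter times out can just be skipped or indexed by name only.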
@@ -610,28 +662,73 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 Useful for cases where you don't need the functionality or when it
 is unusable because aspell crashes during dictionary generation.

-nocjk
-
-If this set to true, specific east asian (Chinese Korean Japanese)
-characters/word splitting is turned off. This will save a small
-amount of cpu if you have no CJK documents. If your document base
-does include such text but you are not interested in searching it,
-setting nocjk may be a significant time and space saver.
-
-cjkngramlen
-
-This lets you adjust the size of n-grams used for indexing CJK
-text. The default value of 2 is probably appropriate in most
-cases. A value of 3 would allow more precision and efficiency on
-longer words, but the index will be approximately twice as large.
-
 guesscharset

 Decide if we try to guess the character set of files if no
 internal value is available (ie: for plain text files). This does
 not work well in general, and should probably not be used.

-7.4.2. The mimemap file
+7.4.2. The fields file
+
+This file contains information about dynamic fields handling in Recoll.
+Some very basic fields have hard-wired behaviour, and, mostly, you should
+not change the original data inside the fields file. But you can create
+custom fields fitting your data and handle them just like they were native
+ones.
+
+The fields file has several sections, which each define an aspect of
+fields processing. Quite often, you'll have to modify several sections to
+obtain the desired behaviour.
+
+We will only give a short description here; you should refer to the
+comments inside the file for more detailed information.
+
+Field names should be lowercase alphabetic ASCII.
+
+[prefixes]
+
+A field becomes indexed (searchable) by having a prefix defined in
+this section.
+
+[stored]
+
+A field becomes stored (displayable inside results) by having its
+name listed in this section (typically with an empty value).
+
+[aliases]
+
+This section defines lists of synonyms for the canonical names
+used inside the [prefixes] and [stored] sections.
+
+filter-specific sections
+
+Some filters may need specific configuration for handling fields.
+Only the mail message filter currently has such a section (named
+[mail]). It allows indexing arbitrary mail headers in addition to
+the ones indexed by default. Other such sections may appear in the
+future.
+
+Here follows a small example of a personal fields file. This would extract
+a specific mail header and use it as a searchable field, with data
+displayable inside result lists. (Side note: as the mail filter does no
+decoding on the values, only plain ASCII headers can be indexed, and only
+the first occurrence will be used for headers that occur several times).
+
+[prefixes]
+# Index mailmytag contents (with the given prefix)
+mailmytag = XMTAG
+
+[stored]
+# Store mailmytag inside the document data record (so that it can be
+# displayed - as %(mailmytag) - in result lists).
+mailmytag =
+
+[mail]
+# Extract the X-My-Tag mail header, and use it internally with the
+# mailmytag field name
+x-my-tag = mailmytag
+
+7.4.3. The mimemap file

 mimemap specifies the file name extension to mime type mappings.

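Since the fields file example uses an INI-like layout (bracketed sections, name = value lines, # comments), a personal fields file can be sanity-checked with Python's configparser to see which fields end up indexed and which stored. This assumes the file parses as ordinary INI syntax; Recoll reads it with its own configuration code, so not every construct it accepts is guaranteed to be valid for configparser (which also lowercases names, harmless here because field names are lowercase). summarize_fields and EXAMPLE_FIELDS are hypothetical names.

```python
import configparser

# The personal fields file from the example above (comments shortened).
EXAMPLE_FIELDS = """
[prefixes]
# Index mailmytag contents (with the given prefix)
mailmytag = XMTAG

[stored]
# Store mailmytag inside the document data record
mailmytag =

[mail]
# Extract the X-My-Tag mail header as the mailmytag field
x-my-tag = mailmytag
"""


def summarize_fields(text):
    """Map each field named in [prefixes] or [stored] to its indexed/stored flags."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    prefixes = set(parser["prefixes"]) if parser.has_section("prefixes") else set()
    stored = set(parser["stored"]) if parser.has_section("stored") else set()
    return {
        name: {"indexed": name in prefixes, "stored": name in stored}
        for name in sorted(prefixes | stored)
    }


if __name__ == "__main__":
    print(summarize_fields(EXAMPLE_FIELDS))
    # {'mailmytag': {'indexed': True, 'stored': True}}
```

A field that shows up with only one of the two flags set is the usual sign that one of the sections was forgotten, which the text above warns is a common need when adding custom fields.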
@@ -655,7 +752,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 given Recoll version. Having it there avoids cluttering the more
 user-oriented and locally customized skippedNames.

-7.4.3. The mimeconf file
+7.4.4. The mimeconf file

 mimeconf specifies how the different mime types are handled for indexing,
 and which icons are displayed in the recoll result lists.
@@ -667,7 +764,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 recoll in the result lists (the values are the basenames of the png images
 inside the iconsdir directory (specified in recoll.conf).

-7.4.4. The mimeview file
+7.4.5. The mimeview file

 mimeview specifies which programs are started when you click on an Edit
 link in a result list. Ie: HTML is normally displayed using firefox, but
@@ -693,9 +790,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 user preferences, all mimeview entries will be ignored except the one
 labelled application/x-all (which is set to use xdg-open by default).

-7.4.5. Examples of configuration adjustments
+7.4.6. Examples of configuration adjustments

-7.4.5.1. Adding an external viewer for an non-indexed type
+7.4.6.1. Adding an external viewer for a non-indexed type

 Imagine that you have some kind of file which does not have indexable
 content, but for which you would like to have a functional Edit link in
@@ -725,7 +822,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 configuration, which you do not need to alter. mimeview can also be
 modified from the Gui.

-7.4.5.2. Adding indexing support for a new file type
+7.4.6.2. Adding indexing support for a new file type

 Let us now imagine that the above .blob files actually contain indexable
 text and that you know how to extract it with a command line program.

322 src/README
@@ -102,7 +102,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

 6.1.1. Filter HTML output

-6.2. Field data processing configuration
+6.2. Field data processing

 6.3. API

@@ -132,13 +132,15 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

 7.4.1. Main configuration file

-7.4.2. The mimemap file
+7.4.2. The fields file

-7.4.3. The mimeconf file
+7.4.3. The mimemap file

-7.4.4. The mimeview file
+7.4.4. The mimeconf file

-7.4.5. Examples of configuration adjustments
+7.4.5. The mimeview file

+7.4.6. Examples of configuration adjustments
+
 7.5. The KDE Kicker Recoll applet

@@ -868,6 +870,32 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 that it may produce very slow searches, and that it may be worth in
 some cases to set up separate databases instead.

+* date for searching or filtering on dates. The syntax for the argument
+is based on the ISO 8601 standard for dates and time intervals. Only
+dates are supported, no times. The general syntax is 2 elements
+separated by a / character. Each element can be a date or a period of
+time. Periods are specified as PnYnMnD. The n numbers are the
+respective numbers of years, months or days, any of which may be
+missing. Dates are specified as YYYY-MM-DD. The days and months parts
+may be missing. If the / is present but an element is missing, the
+missing element is interpreted as the lowest or highest date in the
+index. Examples:
+
+* 2001-03-01/2002-05-01 the basic syntax for an interval of dates.
+
+* 2001-03-01/P1Y2M the same specified with a period.
+
+* 2001/ from the beginning of 2001 to the latest date in the index.
+
+* 2001 the whole year of 2001.
+
+* P2D/ means 2 days ago up to now if there are no documents with
+dates in the future.
+
+* /2003 all documents from 2003 or older.
+
+Periods can also be specified with small letters (e.g. p2y).
+
 * mime or format for specifying the mime type. This one is quite special
 because you can specify several values which will be OR'ed (the normal
 default for the language is AND). Ex: mime:text/plain mime:text/html.
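The date interval rules added above can be checked against a small reference implementation. The Python sketch below is an ISO 8601-style reading written for this description, not Recoll's own parser. Two behaviours are assumptions chosen to agree with the listed examples: a period before the / counts back from the end date (or from today when the end is open), and a partial date expands to its first day when used as a start and its last day when used as an end. Open ends are returned as None, standing for the lowest or highest date in the index. parse_interval and its helpers are hypothetical names.

```python
import calendar
import re
from datetime import date, timedelta

PERIOD_RE = re.compile(r"P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?", re.IGNORECASE)
DATE_RE = re.compile(r"(\d{4})(?:-(\d{1,2})(?:-(\d{1,2}))?)?")


def shift(d, years, months, days, sign=1):
    """Move date d by a period; the day is clamped to the target month."""
    index = d.year * 12 + d.month - 1 + sign * (years * 12 + months)
    y, m = divmod(index, 12)
    day = min(d.day, calendar.monthrange(y, m + 1)[1])
    return date(y, m + 1, day) + timedelta(days=sign * days)


def parse_period(text):
    """Return (years, months, days) for PnYnMnD text, or None."""
    m = PERIOD_RE.fullmatch(text)
    if m is None or len(text) < 2:
        return None
    return tuple(int(g or 0) for g in m.groups())


def parse_date(text):
    """Return the (first, last) day covered by YYYY, YYYY-MM or YYYY-MM-DD."""
    m = DATE_RE.fullmatch(text)
    if m is None:
        raise ValueError(f"not a date: {text!r}")
    y, mo, d = m.groups()
    y = int(y)
    if d:
        day = date(y, int(mo), int(d))
        return day, day
    if mo:
        mo = int(mo)
        return date(y, mo, 1), date(y, mo, calendar.monthrange(y, mo)[1])
    return date(y, 1, 1), date(y, 12, 31)


def parse_interval(spec, today=None):
    """Return (start, end); None stands for the lowest/highest index date."""
    if "/" not in spec:
        return parse_date(spec)  # e.g. "2001": the whole year
    left, right = spec.split("/", 1)
    left_period = parse_period(left) if left else None
    right_period = parse_period(right) if right else None
    start = parse_date(left)[0] if left and not left_period else None
    end = parse_date(right)[1] if right and not right_period else None
    if left_period:
        # A leading period counts back from the end date, or from today.
        start = shift(end or today or date.today(), *left_period, sign=-1)
    if right_period and start:
        end = shift(start, *right_period)
    return start, end


if __name__ == "__main__":
    print(parse_interval("2001-03-01/P1Y2M"))
    # (datetime.date(2001, 3, 1), datetime.date(2002, 5, 1))
```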
@@ -1156,6 +1184,10 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 Wildcards. Wildcards can be used inside search terms in all forms of
 searches. More about wildcards.

+Automatic suffixes. Words like odt or ods can be automatically turned into
+query language ext:xxx clauses. This can be enabled in the Search
+preferences panel in the GUI.
+
 Disabling stem expansion. Entering a capitalized word in any search field
 will prevent stem expansion (no search for gardening if you enter Garden
 instead of garden). This is the only case where character case should make
@@ -1321,15 +1353,16 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 the search terms. This can slow down result list display significantly
 for big documents, and you may want to turn it off.

 * Replace abstracts from documents: this decides if we should synthesize
 and display an abstract in place of an explicit abstract found within
 the document itself.

 * Synthetic abstract size: adjust to taste...

 * Synthetic abstract context words: how many words should be displayed
 around each term occurrence.

+* Query language magic file name suffixes: a list of words which
+automatically get turned into ext:xxx file name suffix clauses when
+starting a query language query (e.g. doc xls xlsx...). This will save
+some typing for people who use file types a lot when querying.
+
 External indexes: This panel will let you browse for additional indexes
 that you may want to search. External indexes are designated by their
 database directory (ie: /home/someothergui/.recoll/xapiandb,
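The "magic file name suffixes" preference described above rewrites certain bare words of a query language query into ext: clauses. The Python sketch below shows that rewriting in its simplest form. It is a hypothetical illustration, not Recoll's code: expand_magic_suffixes is an invented name, and case handling, quoted phrases and the way several ext: clauses are combined are not modelled.

```python
def expand_magic_suffixes(query, magic_suffixes):
    """Replace each bare word found in `magic_suffixes` with an ext:word clause."""
    magic = set(magic_suffixes)
    # Words that already contain a field prefix (e.g. "ext:doc") never match,
    # since the suffix list holds plain words only.
    return " ".join("ext:" + word if word in magic else word for word in query.split())


if __name__ == "__main__":
    print(expand_magic_suffixes("budget xls", ["doc", "xls", "xlsx"]))
    # prints: budget ext:xls
```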
@@ -1650,7 +1683,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

 ----------------------------------------------------------------------

-6.2. Field data processing configuration
+6.2. Field data processing

 Fields are named pieces of information in or about documents, like title,
 author, abstract.
@@ -1675,15 +1708,11 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 for the document, and can be returned and displayed with search
 results.

-A field can be either or both indexed and stored.
+A field can be either or both indexed and stored. This and other aspects
+of field handling are defined inside the fields configuration file.

-A field becomes indexed by having a prefix defined in the [prefixes]
-section of the fields file. See the comments in there for details
-
-A field becomes stored by appearing in the [stored] section of the fields
-file.
-
-See the comments inside the fields for more details.
+You can find more information in the section about the fields file, or in
+comments inside the file.

 ----------------------------------------------------------------------

@@ -2041,7 +2070,13 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 displayed from the recoll File menu. The list is stored in the missing
 text file inside the configuration directory.

-A list of common file types which need external commands:
+A list of common file types which need external commands follows. Many of
+the filters need the iconv command, which is not always listed as a
+dependency.
+
+As of Recoll release 1.14, a number of XML-based formats that were handled
+by ad hoc filter code now use xsltproc, which usually comes with libxslt.
+These are: abiword, fb2 (ebooks), kword, openoffice, svg.

 * Openoffice: supported natively, but needs the unzip command to be
 installed.
@@ -2054,6 +2089,8 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

 * MS Excel and PowerPoint: catdoc.

+* MS Open XML (docx): needs xsltproc.
+
 * Wordperfect files: libwpd.

 * RTF: unrtf
@@ -2067,13 +2104,12 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

 * djvu: DjVuLibre

-* mp3: Recoll will use the id3info command from the id3lib package to
-extract tag information. Without it, only the file names will be
-indexed.
-
-* flac files need metaflac.
-
-* ogg files need ogginfo.
+* mp3, flac, ogg vorbis: Recoll releases before 1.13 use the id3info
+command from the id3lib package to extract mp3 tag information. (Some
+gcc versions after 4.4 may have trouble compiling id3lib. You can find
+a workaround here), metaflac (standard flac tools) for flac files, and
+ogginfo (vorbis tools) for ogg files. Releases 1.14 and later use a
+single Python filter based on mutagen for all audio file types.

 * Pictures: Recoll uses the Exiftool Perl package to extract tag
 information. Most image file formats are supported. Note that there
@@ -2084,12 +2120,14 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 * chm: files in microsoft help format need Python and the pychm module
 (which needs chmlib).

-* ics: iCalendar files need Python and the icalendar module.
+* ics: up to Recoll 1.13, iCalendar files need Python and the icalendar
+module. For newer versions, icalendar is not needed.

 * zip: Zip archives need Python (and the standard zipfile module).

 Text, HTML, mail folders, Openoffice and Scribus files are processed
-internally. Lyx is used to index Lyx files. Many filters need sed and awk.
+internally. Lyx is used to index Lyx files. Many filters need iconv and
+the standard sed and awk.

 ----------------------------------------------------------------------

@@ -2097,11 +2135,18 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

 7.3.1. Prerequisites

-At the very least, you will need to download and install the xapian core
-package and the qt run-time and development packages. Check the Recoll
-download page for up to date version information.
+C++ compiler. Up to Recoll version 1.13.04, its absence can manifest
+itself by strange messages about a missing iconv_open.

-You will most probably be able to find a binary package for qt for your
+Development files for Xapian core
+
+Development files for Qt .
+
+Development files for X11 and zlib.
+
+Check the Recoll download page for up to date version information.
+
+You will most probably be able to find a binary package for Qt for your
 system. You may have to compile Xapian but this is not difficult (if you
 are using FreeBSD, there is a port).

@@ -2113,7 +2158,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or

 7.3.2. Building

-Recoll has been built on Linux, FreeBSD, macosx, and Solaris, most
+Recoll has been built on Linux, FreeBSD, Mac OS X, and Solaris, most
 versions after 2005 should be ok, maybe some older ones too (Solaris 8 is
 ok). If you build on another system, and need to modify things, I would
 very much welcome patches.
@@ -2282,14 +2327,20 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
 and edit the configuration file before restarting the command. This will
 start the initial indexing, which may take some time.

-Paramers affecting what we index:
+Most of the following parameters can be changed from the Index
+Configuration menu in the recoll interface. Some can only be set by
+editing the configuration file.
+
+----------------------------------------------------------------------
+
+7.4.1.1. Parameters affecting what documents we index:

 topdirs

 Specifies the list of directories or files to index (recursively
-for directories). The indexer will not follow symbolic links
-inside the indexed trees by default (see the followLinks options
-though).
+for directories). You can use symbolic links as elements of this
+list. See the followLinks option about following symbolic links
+found under the top elements (not followed by default).

 skippedNames

@ -2403,7 +2454,76 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
Beagle plugin as ~/.beagle/ToIndex so there should be no need to
|
||||
change it.
|
||||
|
||||
Parameters affecting where and how we store things:
|
||||
----------------------------------------------------------------------
|
||||
|
||||
7.4.1.2. Parameters affecting how we generate terms:
|
||||
|
||||
Changing some of these parameters will imply a full reindex. Also, when
|
||||
using multiple indexes, it may not make sense to search indexes that don't
|
||||
share the values for these parameters, because they usually affect both
|
||||
search and index operations.
|
||||
|
||||
nonumbers
|
||||
|
||||
If this set to true, no terms will be generated for numbers. For
|
||||
example "123", "1.5e6", 192.168.1.4, would not be indexed
|
||||
("value123" would still be). Numbers are often quite interesting
|
||||
to search for, and this should probably not be set except for
|
||||
special situations, ie, scientific documents with huge amounts of
|
||||
numbers in them. This can only be set for a whole index, not for a
|
||||
subtree.
|
||||
|
||||
nocjk
|
||||
|
||||
If this set to true, specific east asian (Chinese Korean Japanese)
|
||||
characters/word splitting is turned off. This will save a small
|
||||
amount of cpu if you have no CJK documents. If your document base
|
||||
does include such text but you are not interested in searching it,
|
||||
setting nocjk may be a significant time and space saver.
|
||||
|
||||
cjkngramlen
|
||||
|
||||
This lets you adjust the size of n-grams used for indexing CJK
|
||||
text. The default value of 2 is probably appropriate in most
|
||||
cases. A value of 3 would allow more precision and efficiency on
|
||||
longer words, but the index will be approximately twice as large.
|
||||
|
||||
indexstemminglanguages
|
||||
|
||||
A list of languages for which the stem expansion databases will be
|
||||
built. See recollindex(1) or use the recollindex -l command for
|
||||
possible values. You can add a stem expansion database for a
|
||||
different language by using recollindex -s, but it will be deleted
|
||||
during the next indexing. Only languages listed in the
|
||||
configuration file are permanent.
|
||||
|
||||
defaultcharset
|
||||
|
||||
The name of the character set used for files that do not contain a
|
||||
character set definition (ie: plain text files). This can be
|
||||
redefined for any sub-directory. If it is not set at all, the
|
||||
character set used is the one defined by the nls environment
|
||||
(LC_ALL, LC_CTYPE, LANG), or iso8859-1 if nothing is set.
|
||||
|
||||
maildefcharset
|
||||
|
||||
This can be used to define the default character set specifically
|
||||
for mail messages which don't specify it. This is mainly useful
|
||||
for readpst (libpst) dumps, which are utf-8 but do not say so.
|
||||
|
||||
localfields

This allows setting fields for all documents under a given
directory. Typical usage would be to set an "rclaptg" field, to be
used in mimeview to select a specific viewer. If several fields
are to be set, they should be separated with a colon (':')
character (which there is currently no way to escape). For example:
localfields= rclaptg=gnus:other = val, then select a specific
viewer with mimetype|tag=... in mimeview.

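A localfields value such as the one in the example can be parsed with a few lines of Python. This is a hypothetical sketch of the syntax described above, not Recoll's parser. Stripping the whitespace around names and values is an assumption, suggested by the "other = val" part of the example.

```python
from __future__ import annotations


def parse_localfields(value: str) -> dict[str, str]:
    """Split a localfields value into a {name: value} dictionary.

    Fields are separated by ':' (with no escape mechanism) and each one
    has the form name=value.
    """
    fields: dict[str, str] = {}
    for item in value.split(":"):
        name, sep, val = item.partition("=")
        # Ignore pieces without '=' or with an empty name.
        if sep and name.strip():
            fields[name.strip()] = val.strip()
    return fields
```

Because the sketch splits on every ':', a value such as a URL gets cut at its first colon. This is the practical consequence of having no escape mechanism.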
----------------------------------------------------------------------

7.4.1.3. Parameters affecting where and how we store things:

dbdir

@ -2451,7 +2571,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
default, which is flushing every 10000 documents (memory usage
depends on average document size). The default value is 10.

----------------------------------------------------------------------

7.4.1.4. Miscellaneous parameters:

loglevel,daemloglevel

@ -2465,44 +2587,11 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
value, and is the default. The daemloglevel variant is specific to
the indexing monitor daemon.

filtermaxseconds

Maximum filter execution time, in seconds, after which the filter is
aborted. Some PostScript programs just loop forever.

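Enforcing a maximum execution time on an external filter can be sketched in Python with the timeout argument of subprocess.run. When the limit is exceeded, subprocess.run kills the child process and raises TimeoutExpired. This is a hypothetical illustration of the behaviour described above, not how Recoll itself implements it. The function name run_filter and the choice to return None on timeout are assumptions.

```python
from __future__ import annotations

import subprocess
import sys


def run_filter(cmd: list[str], max_seconds: float) -> str | None:
    """Run an external filter, giving up after max_seconds.

    Returns the filter's standard output, or None if it timed out.
    """
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=max_seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed the child at this point.
        return None
    return result.stdout


if __name__ == "__main__":
    # A quick command succeeds; a looping one is cut off after one second.
    print(run_filter([sys.executable, "-c", "print('hello')"], 10))
    print(run_filter([sys.executable, "-c", "while True: pass"], 1))
```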
filtersdir

A directory to search for the external filter scripts used to
@ -2542,21 +2631,6 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
Useful for cases where you don't need the functionality or when it
is unusable because aspell crashes during dictionary generation.

guesscharset

Decide if we try to guess the character set of files if no
@ -2565,7 +2639,69 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
----------------------------------------------------------------------

7.4.2. The fields file

This file contains information about dynamic fields handling in Recoll.
Some very basic fields have hard-wired behaviour and, for the most part,
you should not change the original data inside the fields file. But you
can create custom fields fitting your data and handle them just as if
they were native ones.

The fields file has several sections, each of which defines an aspect of
fields processing. Quite often, you'll have to modify several sections to
obtain the desired behaviour.

We only give a short description here; refer to the comments inside the
file for more detailed information.

Field names should be lowercase alphabetic ASCII.

[prefixes]

A field becomes indexed (searchable) by having a prefix defined in
this section.

[stored]

A field becomes stored (displayable inside results) by having its
name listed in this section (typically with an empty value).

[aliases]

This section defines lists of synonyms for the canonical names
used inside the [prefixes] and [stored] sections.

filter-specific sections

Some filters may need specific configuration for handling fields.
Only the mail message filter currently has such a section (named
[mail]). It allows indexing arbitrary mail headers in addition to
the ones indexed by default. Other such sections may appear in the
future.

Here is a small example of a personal fields file. It extracts a
specific mail header and uses it as a searchable field, with data
displayable inside result lists. (Side note: as the mail filter does no
decoding on the values, only plain ASCII headers can be indexed, and only
the first occurrence will be used for headers that occur several times.)

[prefixes]
# Index mailmytag contents (with the given prefix)
mailmytag = XMTAG

[stored]
# Store mailmytag inside the document data record (so that it can be
# displayed - as %(mailmytag) - in result lists).
mailmytag =

[mail]
# Extract the X-My-Tag mail header, and use it internally with the
# mailmytag field name
x-my-tag = mailmytag

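The way the three sections of the example cooperate can be traced with a short Python sketch. It follows a mail header through [mail] to a field name, then through [prefixes] to an index prefix, and finally checks [stored]. The sketch uses Python's configparser as a stand-in and assumes the example file is INI-compatible. Recoll reads the fields file with its own configuration code, which may behave differently. The describe_header function is hypothetical and does not model the decoding and first-occurrence limitations mentioned above.

```python
import configparser

# The example fields file from the text (comments trimmed).
FIELDS = """
[prefixes]
mailmytag = XMTAG

[stored]
mailmytag =

[mail]
x-my-tag = mailmytag
"""


def describe_header(header: str, text: str = FIELDS):
    """Return (field, prefix, stored) for a mail header, or None."""
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    # configparser lower-cases option names, so look up the header in lower case.
    field = cfg.get("mail", header.lower(), fallback=None)
    if field is None:
        return None
    # Indexed (searchable) if the field has a prefix.
    prefix = cfg.get("prefixes", field, fallback=None)
    # Stored (displayable) if the field is listed, even with an empty value.
    stored = cfg.has_option("stored", field)
    return field, prefix, stored
```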
----------------------------------------------------------------------

7.4.3. The mimemap file

mimemap specifies the file name extension to mime type mappings.

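An extension to mime type table of this kind can be sketched in Python as a dictionary loaded from "extension = type" lines. The .blob entry and its application/x-blobapp type are hypothetical examples. The sketch performs an exact, case-sensitive match on the extension, and it does not attempt to reproduce how Recoll handles letter case or other details of the real mimemap file.

```python
from __future__ import annotations

import os

# Hypothetical mimemap-style lines: "extension = mime type".
MIMEMAP = """
.txt = text/plain
.html = text/html
.blob = application/x-blobapp
"""


def load_mimemap(text: str) -> dict[str, str]:
    """Parse 'ext = type' lines, skipping blanks and '#' comments."""
    table: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        ext, sep, mtype = line.partition("=")
        if sep:
            table[ext.strip()] = mtype.strip()
    return table


def mime_type_for(path: str, table: dict[str, str]) -> str | None:
    """Return the mime type for path's extension, or None if unmapped."""
    return table.get(os.path.splitext(path)[1])
```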
@ -2591,7 +2727,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
----------------------------------------------------------------------

7.4.4. The mimeconf file

mimeconf specifies how the different mime types are handled for indexing,
and which icons are displayed in the recoll result lists.
@ -2605,7 +2741,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
----------------------------------------------------------------------

7.4.5. The mimeview file

mimeview specifies which programs are started when you click on an Edit
link in a result list. For example, HTML is normally displayed using firefox, but
@ -2633,9 +2769,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
----------------------------------------------------------------------

7.4.6. Examples of configuration adjustments

7.4.6.1. Adding an external viewer for a non-indexed type

Imagine that you have some kind of file which does not have indexable
content, but for which you would like to have a functional Edit link in
@ -2667,7 +2803,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
----------------------------------------------------------------------

7.4.6.2. Adding indexing support for a new file type

Let us now imagine that the above .blob files actually contain indexable
text and that you know how to extract it with a command line program.

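One possible shape for such a filter is sketched below in Python. It assumes that the filter receives the file name as its single argument and writes an HTML document on standard output. blob2text is a hypothetical stand-in for your extraction program, and rclblob is a hypothetical script name. How the script is then declared in mimeconf is not shown here.

```python
from __future__ import annotations

import html
import subprocess
import sys

# Hypothetical extraction command; replace with the real program for .blob files.
EXTRACT_CMD = ["blob2text"]


def text_to_html(text: str, charset: str = "UTF-8") -> str:
    """Wrap plain text in a minimal HTML document, escaping markup characters."""
    return (
        "<html><head>\n"
        f'<meta http-equiv="Content-Type" content="text/html;charset={charset}">\n'
        "</head><body><pre>\n"
        f"{html.escape(text)}\n"
        "</pre></body></html>\n"
    )


def main(argv: list[str]) -> int:
    if len(argv) != 2:
        print(f"usage: {argv[0]} <file.blob>", file=sys.stderr)
        return 1
    try:
        # Run the extractor on the input file and capture its text output.
        result = subprocess.run(EXTRACT_CMD + [argv[1]], capture_output=True,
                                text=True, encoding="utf-8", errors="replace")
    except FileNotFoundError:
        print(f"{EXTRACT_CMD[0]}: command not found", file=sys.stderr)
        return 127
    if result.returncode != 0:
        return result.returncode
    sys.stdout.write(text_to_html(result.stdout))
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```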