*** empty log message ***
parent
686b891f21
commit
9238d656b6
121
src/INSTALL
@ -98,7 +98,8 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
extract tag information. Without it, only the file names will be
|
||||
indexed.
|
||||
|
||||
Text, HTML, mail folders and Openoffice files are processed internally.
|
||||
Text, HTML, mail folders, Openoffice and Scribus files are processed
|
||||
internally. Lyx is used to index Lyx files. Many filters need sed and awk.
|
||||
|
||||
--------------------------------------------------------------------------
|
||||
|
||||
@ -217,7 +218,10 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
If the .recoll directory does not exist when recoll or recollindex are
|
||||
started, it will be created with a set of empty configuration files.
|
||||
recoll will give you a chance to edit the configuration file before
|
||||
starting indexing. recollindex will proceed immediately.
|
||||
starting indexing. recollindex will proceed immediately. To avoid
|
||||
mistakes, the automatic directory creation will only occur for the default
|
||||
location, not if -c or RECOLL_CONFDIR were used (in the latter cases, you
|
||||
will have to create the directory).
|
||||
|
||||
All configuration files share the same format. For example, a short
|
||||
extract of the main configuration file might look as follows:
|
||||
@ -247,8 +251,8 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
The tilde character (~) is expanded in file names to the name of the
|
||||
user's home directory.
|
||||
|
||||
White space is used for separation inside lists. Elements with embedded
|
||||
spaces can be quoted using double-quotes.
|
||||
White space is used for separation inside lists. List elements with
|
||||
embedded spaces can be quoted using double-quotes.
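As a rough illustration of the list quoting rule, the sketch below splits a list value with Python's shlex module. It only approximates the behaviour described above: Recoll's own parser is not shown here, shlex also accepts single quotes (which the text does not mention), and the sample value is made up.

```python
import shlex

def split_list_value(value):
    """Split a whitespace-separated list, honouring double-quoted elements."""
    return shlex.split(value)

# Hypothetical list value with one element containing an embedded space.
print(split_list_value('~/docs "~/My Documents" /usr/share/doc'))
```

Note that the tilde is left untouched by this sketch; expanding it to the home directory would be a separate step.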
|
||||
|
||||
4.4.1. Main configuration file
|
||||
|
||||
@ -275,7 +279,8 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
The name of the Xapian data directory. It will be created if
|
||||
needed when the index is initialized. If this is not an absolute
|
||||
path, it will be interpreted relative to the configuration
|
||||
directory.
|
||||
directory. The value can have embedded spaces but starting or
|
||||
trailing spaces will be trimmed. You cannot use quotes here.
|
||||
|
||||
skippedNames
|
||||
|
||||
@ -283,7 +288,8 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
directories that should be completely ignored. The list defined in
|
||||
the default file is:
|
||||
|
||||
*~ #* bin CVS Cache caughtspam tmp
|
||||
skippedNames = #* bin CVS Cache cache* caughtspam tmp .thumbnails .svn \
|
||||
*~ recollrc
|
||||
|
||||
The list can be redefined for sub-directories, but is only
|
||||
actually changed for the top level ones in topdirs.
|
||||
@ -299,6 +305,23 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
solution is to have .* in skippedNames, and add things like
|
||||
~/.thunderbird or ~/.evolution in topdirs.
|
||||
|
||||
skippedPaths and daemSkippedPaths
|
||||
|
||||
A space-separated list of patterns for paths of files or
|
||||
directories that should be skipped. There is no default in the
|
||||
sample configuration file, but the code always adds the
|
||||
configuration and database directories in there.
|
||||
|
||||
skippedPaths is used both by batch and real time indexing.
|
||||
daemSkippedPaths can be used to specify things that should be
|
||||
indexed at startup, but not monitored.
|
||||
|
||||
Example of use for skipping text files only in a specific
|
||||
directory:
|
||||
|
||||
skippedPaths = ~/somedir/*.txt
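The effect of such a pattern can be sketched with Python's fnmatch module. This is only an approximation under stated assumptions: the function name and the home directory are hypothetical, and fnmatch lets * match across "/" characters, which may differ from what Recoll actually does.

```python
import fnmatch

def is_skipped(path, patterns, home):
    """Return True if path matches any skippedPaths-style pattern."""
    for pattern in patterns:
        # Expand a leading tilde to the given home directory.
        if pattern == "~" or pattern.startswith("~/"):
            pattern = home + pattern[1:]
        if fnmatch.fnmatchcase(path, pattern):
            return True
    return False

patterns = ["~/somedir/*.txt"]
print(is_skipped("/home/me/somedir/notes.txt", patterns, "/home/me"))
print(is_skipped("/home/me/somedir/notes.pdf", patterns, "/home/me"))
```

With this pattern, only the .txt file inside ~/somedir is reported as skipped; other files in the same directory are still candidates for indexing.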
|
||||
|
||||
|
||||
loglevel,daemloglevel
|
||||
|
||||
Verbosity level for recoll and recollindex. A value of 4 lists
|
||||
@ -424,6 +447,92 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
|
||||
Please note that these entries must be placed under a [view] section.
|
||||
|
||||
If Use desktop preferences to choose document editor is checked in the
|
||||
user preferences, all mimeview entries will be ignored except the one
|
||||
labelled application/x-all (which is set to use xdg-open by default).
|
||||
|
||||
4.4.5. Examples of configuration adjustments
|
||||
|
||||
4.4.5.1. Adding an external viewer for a non-indexed type
|
||||
|
||||
Imagine that you have some kind of file which does not have indexable
|
||||
content, but for which you would like to have a functional Edit link in
|
||||
the result list (when found by file name). The file names end in .blob and
|
||||
can be displayed by application blobviewer.
|
||||
|
||||
You need two entries in the configuration files for this to work:
|
||||
|
||||
* In $RECOLL_CONFDIR/mimemap (typically ~/.recoll/mimemap), add the
|
||||
following line:
|
||||
|
||||
application/x-blobapp = .blob
|
||||
|
||||
|
||||
Note that the mime type is made up here, and you could call it
|
||||
diesel/oil just the same.
|
||||
|
||||
* In $RECOLL_CONFDIR/mimeview under the [view] section:
|
||||
|
||||
application/x-blobapp = blobviewer %f
|
||||
|
||||
|
||||
We are supposing that blobviewer wants a file name parameter here; you
|
||||
would use %u if it liked URLs better.
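To make the difference between %f and %u concrete, here is a minimal sketch of how a mimeview command line could be expanded. How Recoll performs the substitution internally is not described in this text, so the function name and the exact URL form (a file:// URL with percent-encoding) are assumptions.

```python
import re
import shlex
from urllib.parse import quote

def build_viewer_command(template, path):
    """Expand %f (file name) and %u (URL) in a mimeview-style command."""
    values = {"%f": path, "%u": "file://" + quote(path)}
    # Split first, then substitute, so a path with spaces stays one argument.
    return [re.sub(r"%[fu]", lambda m: values[m.group(0)], arg)
            for arg in shlex.split(template)]

print(build_viewer_command("blobviewer %f", "/tmp/my file.blob"))
print(build_viewer_command("blobviewer %u", "/tmp/my file.blob"))
```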
|
||||
|
||||
If you just wanted to change the application used by Recoll to display a
|
||||
mime type which it already knows, you would just need to edit mimeview.
|
||||
The entries you add in your personal file override those in the central
|
||||
configuration, which you do not need to alter.
|
||||
|
||||
4.4.5.2. Adding indexing support for a new file type
|
||||
|
||||
Let us now imagine that the above .blob files actually contain indexable
|
||||
text and that you know how to extract it with a command line program.
|
||||
Getting Recoll to index the files is easy. You need to perform the above
|
||||
alteration, and also to add data to the mimeconf file (typically in
|
||||
~/.recoll/mimeconf):
|
||||
|
||||
* Under the [index] section, add the following line (more about the
|
||||
rclblob indexing script later):
|
||||
|
||||
application/x-blobapp = exec rclblob
|
||||
|
||||
|
||||
* Under the [icons] section, you should choose an icon to be displayed
|
||||
for the files inside the result lists. Icons are normally 64x64 pixels
|
||||
PNG files which live in /usr/[local/]share/recoll/images.
|
||||
|
||||
* Under the [categories] section, you should add the mime type where it
|
||||
makes sense (you can also create a category). Categories may be used
|
||||
for filtering in advanced search.
|
||||
|
||||
The rclblob filter should be an executable program or script which exists
|
||||
inside /usr/[local/]share/recoll/filters. It will be given a file name as
|
||||
argument and should output the text contents in html format on the
|
||||
standard output.
|
||||
|
||||
The html could be very minimal like the following example:
|
||||
|
||||
<html><head>
|
||||
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
|
||||
</head>
|
||||
<body>some text content</body></html>
|
||||
|
||||
|
||||
You should take care to escape some characters inside the text by
|
||||
transforming them into appropriate entities. "&" should be transformed
|
||||
into "&amp;", "<" should be transformed into "&lt;".
|
||||
|
||||
The character set needs to be specified in the header. It does not need to
|
||||
be UTF-8 (Recoll will take care of translating it), but it must be
|
||||
accurate for good results.
|
||||
|
||||
Recoll will also make use of other header fields if they are present:
|
||||
title, description, keywords.
|
||||
|
||||
The easiest way to write a new filter is probably to start from an
|
||||
existing one.
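For orientation, a filter along the lines described above might look like the following sketch. It is not one of the filters shipped with Recoll: the extraction step is a placeholder that simply reads the file as text, and all function names are hypothetical. What it does show is the output contract from this section: a file name argument, minimal HTML on the standard output, a declared character set, and escaped "&" and "<".

```python
import html
import sys

def to_filter_html(text, charset="UTF-8"):
    """Wrap extracted text in the minimal HTML a filter is expected to emit."""
    return (
        "<html><head>\n"
        '<meta http-equiv="Content-Type" content="text/html;charset=%s">\n'
        "</head>\n"
        "<body>%s</body></html>\n"
    ) % (charset, html.escape(text, quote=False))

def extract_text(path):
    """Placeholder extraction: a real filter would decode the .blob format."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return handle.read()

if __name__ == "__main__" and len(sys.argv) == 2:
    # Write bytes so the output really is in the declared character set.
    sys.stdout.buffer.write(to_filter_html(extract_text(sys.argv[1])).encode("utf-8"))
```

Escaping is done with html.escape, which also converts ">"; that is harmless for indexing purposes.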
|
||||
|
||||
--------------------------------------------------------------------------
|
||||
|
||||
Prev Home
|
||||
|
||||
340
src/README
@ -45,7 +45,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
|
||||
2.5. Real time indexing
|
||||
|
||||
3. Search
|
||||
3. Searching
|
||||
|
||||
3.1. Simple search
|
||||
|
||||
@ -55,19 +55,23 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
|
||||
3.3. The preview window
|
||||
|
||||
3.4. Complex/advanced search
|
||||
3.4. The query language
|
||||
|
||||
3.5. The term explorer tool
|
||||
3.5. Complex/advanced search
|
||||
|
||||
3.6. Multiple databases
|
||||
3.6. The term explorer tool
|
||||
|
||||
3.7. Document history
|
||||
3.7. More about wildcards
|
||||
|
||||
3.8. Sorting search results
|
||||
3.8. Multiple databases
|
||||
|
||||
3.9. Search tips, shortcuts
|
||||
3.9. Document history
|
||||
|
||||
3.10. Customizing the search interface
|
||||
3.10. Sorting search results
|
||||
|
||||
3.11. Search tips, shortcuts
|
||||
|
||||
3.12. Customizing the search interface
|
||||
|
||||
4. Installation
|
||||
|
||||
@ -97,6 +101,8 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
|
||||
4.4.4. The mimeview file
|
||||
|
||||
4.4.5. Examples of configuration adjustments
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||
Chapter 1. Introduction
|
||||
@ -209,8 +215,8 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
data entered into the database. Recoll indexing is normally incremental:
|
||||
documents will only be processed if they have been modified. On the first
|
||||
execution, of course, all documents will need processing. A full index
|
||||
build can be forced later on by specifying an option to the indexing
|
||||
command (recollindex -z).
|
||||
build can be forced later by specifying an option to the indexing command
|
||||
(recollindex -z).
|
||||
|
||||
Recoll indexing can be performed with two different methods:
|
||||
|
||||
@ -435,7 +441,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||
Chapter 3. Search
|
||||
Chapter 3. Searching
|
||||
|
||||
The recoll program provides the user interface for searching. It is based
|
||||
on the Qt library.
|
||||
@ -452,11 +458,16 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
|
||||
4. Click the Search button or hit the Enter key to start the search.
|
||||
|
||||
The initial default search mode is Any term. This will look for documents
|
||||
with any of the search terms (the ones with more terms will get better
|
||||
scores). All terms will ensure that only documents with all the terms will
|
||||
be returned. File name will specifically look for file names, and allows
|
||||
using wildcards (*, ? , []).
|
||||
The initial default search mode is All terms. This will look for documents
|
||||
containing all of the search terms (the ones with more terms will get
|
||||
better scores). Any term will search for documents where at least one of
|
||||
the terms appears. File name will specifically look for file names.
|
||||
|
||||
The fourth entry (Query Language) is described in its own section.
|
||||
|
||||
All search modes allow wildcards inside terms (*, ?, []). You may want to
|
||||
have a look at the section about wildcards for more information about
|
||||
this.
|
||||
|
||||
You can search for exact phrases (adjacent words in a given order) by
|
||||
enclosing the input inside double quotes. Ex: "virtual reality".
|
||||
@ -472,13 +483,19 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
thing at the right of the text field). Please note, however, that only the
|
||||
search texts are remembered, not the mode (all/any/file name).
|
||||
|
||||
Typing Esc Space) while entering a word in the simple search entry will
|
||||
Typing Esc Space while entering a word in the simple search entry will
|
||||
open a window with possible completions for the word. The completions are
|
||||
extracted from the database.
|
||||
|
||||
Double-clicking on a word in the result list or a preview window will
|
||||
insert it into the simple search entry field.
|
||||
|
||||
Note that, apart from wildcard characters (single ? characters are ok),
|
||||
you can cut and paste any text into an All terms or Any term search field,
|
||||
punctuation, newlines and all. Recoll will process it and produce a
|
||||
meaningful search. This is what most differentiates this mode from the
|
||||
Query Language mode, where you have to care about the syntax.
|
||||
|
||||
You can use the Tools / Advanced search dialog for more complex searches.
|
||||
|
||||
----------------------------------------------------------------------
|
||||
@ -496,7 +513,8 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
window for the document. Further Preview clicks for the same search will
|
||||
open tabs in the existing preview window. You can use Shift+Click to force
|
||||
the creation of another preview window, which may be useful to view the
|
||||
documents side by side.
|
||||
documents side by side. (You can also browse successive results in a
|
||||
single preview window by typing Shift+ArrowUp/Down in the window).
|
||||
|
||||
Clicking the Edit link will attempt to start an external viewer. The
|
||||
viewers can be configured through the user preferences dialog, or by
|
||||
@ -543,17 +561,15 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
* Parent document
|
||||
|
||||
The Preview and Edit entries do the same thing as the corresponding links.
|
||||
The two following entries will copy either an URL or the file path to the
|
||||
clipboard, for pasting into another application.
|
||||
|
||||
The Copy File Name and Copy Url copy the relevant data to the clipboard,
|
||||
for later pasting.
|
||||
|
||||
The Find similar entry will select a number of relevant terms from the
|
||||
current document and enter them into the simple search field. You can then
|
||||
start a simple search, with a good chance of finding documents related to
|
||||
the current result.
|
||||
|
||||
The Copy File Name and Copy Url copy the relevant data to the clipboard,
|
||||
for later pasting.
|
||||
|
||||
The Parent document entry will appear for documents which are not actually
|
||||
files but are part of, or attached to, a higher level document. This entry
|
||||
is mainly useful for email attachments and permits viewing the message to
|
||||
@ -570,7 +586,8 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
result list.
|
||||
|
||||
Subsequent preview requests for a given search open new tabs in the
|
||||
existing window.
|
||||
existing window (except if you hold the Shift key while clicking, which
|
||||
will open a new window for side by side viewing).
|
||||
|
||||
Starting another search and requesting a preview will create a new preview
|
||||
window. The old one stays open until you close it.
|
||||
@ -599,11 +616,61 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||
3.4. Complex/advanced search
|
||||
3.4. The query language
|
||||
|
||||
The advanced search dialog has fields that will allow a more refined
|
||||
search. It has a number of entry fields, each of which is configurable for
|
||||
the following modes:
|
||||
The query language processor is activated on the simple search entry when
|
||||
the search mode selector is set to Query Language.
|
||||
|
||||
Here follows a sample request that we are going to explain:
|
||||
|
||||
mime:message/rfc822 author:"john doe" Beatles OR Lennon Live OR Unplugged -potatoes
|
||||
|
||||
|
||||
This would search for all email messages with John Doe appearing as a
|
||||
phrase in the From: header, and containing either beatles or lennon and
|
||||
either live or unplugged but not potatoes.
|
||||
|
||||
The first element, mime:message/rfc822 is a special switch that restricts
|
||||
the results to be email messages. There could be several such switches,
|
||||
which would form a list of allowed types.
|
||||
|
||||
The second element author:"john doe" is a phrase search limited to a
|
||||
specific field. Phrase searches are specified as usual by enclosing the
|
||||
words in double quotes. The field specification appears before the colon.
|
||||
Recoll currently manages the following fields:
|
||||
|
||||
* title, subject or caption are synonyms which specify data to be
|
||||
searched for in the document title or subject.
|
||||
|
||||
* author or from for searching the document's originators.
|
||||
|
||||
* keyword for searching the keywords specified in the document (few documents
|
||||
actually have any).
|
||||
|
||||
The query language is currently the only way to use the Recoll field
|
||||
search capability.
|
||||
|
||||
All elements in the search entry are normally combined with an implicit
|
||||
AND. It is possible to specify that elements be OR'ed instead, as in
|
||||
Beatles OR Lennon. The OR must be entered literally (capitals), and it has
|
||||
priority over the AND associations: word1 word2 OR word3 means word1 AND
|
||||
(word2 OR word3) not (word1 AND word2) OR word3. Do not enter explicit
|
||||
parentheses; they are not supported for now.
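The precedence rule can be sketched in a few lines of Python. This is only an illustration of the grouping described above, not Recoll's parser: the function name is made up, and the sketch splits on white space, so it ignores quoted phrases, field prefixes and the - negation.

```python
def group_terms(query):
    """Split a query into AND-ed groups, each a list of OR-ed alternatives."""
    groups = []
    join_next = False  # True right after a literal OR
    for token in query.split():
        if token == "OR":
            # OR binds the next word to the previous group, if there is one.
            join_next = bool(groups)
            continue
        if join_next:
            groups[-1].append(token)
        else:
            groups.append([token])
        join_next = False
    return groups

print(group_terms("word1 word2 OR word3"))
print(group_terms("Beatles OR Lennon Live OR Unplugged"))
```

Each inner list is an OR group, and the outer list is the implicit AND between groups, so word1 word2 OR word3 comes out as word1 AND (word2 OR word3).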
|
||||
|
||||
An entry preceded by a - specifies a term that should not appear.
|
||||
|
||||
Words inside phrases and capitalized words are not stem-expanded.
|
||||
Wildcards may be used anywhere.
|
||||
|
||||
You can use the show query link at the top of the result list to check the
|
||||
exact query which was finally executed by Xapian.
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||
3.5. Complex/advanced search
|
||||
|
||||
The advanced search dialog has a number of fields that will allow a more
|
||||
refined search. Each entry field is configurable for the following modes:
|
||||
|
||||
* All terms.
|
||||
|
||||
@ -619,11 +686,12 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
|
||||
Additional entry fields can be created by clicking the Add clause button.
|
||||
|
||||
All relevant fields will be combined by an implicit AND or OR conjunction.
|
||||
All types of clauses except "phrase" and "near" can accept a mix of single
|
||||
words and phrases enclosed in double quotes. Stemming expansion will be
|
||||
performed for all terms not beginning with a capital letter, except for
|
||||
"phrase" clauses.
|
||||
You can choose that all relevant fields will be combined by either an AND
|
||||
or an OR conjunction. All types of clauses except "phrase" and "near" can
|
||||
accept a mix of single words and phrases enclosed in double quotes.
|
||||
Stemming expansion will be performed for all terms not beginning with a
|
||||
capital letter, except for terms inside "phrase" clauses. Wildcards will
|
||||
be processed everywhere.
|
||||
|
||||
Advanced search will also let you search for documents of specific mime
|
||||
types (ie: only text/plain, or text/HTML or application/pdf etc...). The
|
||||
@ -644,7 +712,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||
3.5. The term explorer tool
|
||||
3.6. The term explorer tool
|
||||
|
||||
Recoll automatically manages the expansion of search terms to their
|
||||
derivatives (ie: plural/singular, verb inflections). But there are other
|
||||
@ -658,13 +726,16 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
Wildcard
|
||||
|
||||
In this mode of operation, you can enter a search string with
|
||||
shell-like wildcards (*, ?). ie: xapi* .
|
||||
shell-like wildcards (*, ?, []). ie: xapi* would display all index
|
||||
terms beginning with xapi. (More about wildcards here).
|
||||
|
||||
Regular expression
|
||||
|
||||
This mode will accept a regular expression as input. Example:
|
||||
word[0-9]+ . The regular expression is anchored by enclosing in ^
|
||||
and $ before execution.
|
||||
word[0-9]+. The expression is implicitly anchored at the
|
||||
beginning. Ie: press will match pression but not expression. You
|
||||
can use .*press to match the latter, but be aware that this will
|
||||
cause a full index term list scan, which can be quite long.
|
||||
|
||||
Stem expansion
|
||||
|
||||
@ -695,7 +766,38 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||
3.6. Multiple databases
|
||||
3.7. More about wildcards
|
||||
|
||||
All words entered in Recoll search fields will be processed for wildcard
|
||||
expansion before the request is finally executed.
|
||||
|
||||
The wildcard characters are:
|
||||
|
||||
* * which matches 0 or more characters.
|
||||
|
||||
* ? which matches a single character.
|
||||
|
||||
* [] which allow defining sets of characters to be matched (ex: [abc]
|
||||
matches a single character which may be 'a' or 'b' or 'c', [0-9]
|
||||
matches any single digit).
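The three wildcard forms follow the usual shell conventions, which Python's fnmatch module also implements. The sketch below expands a pattern against a small, made-up list of index terms; it illustrates the matching semantics only and says nothing about how Recoll walks its real index.

```python
import fnmatch

def expand_wildcard(pattern, index_terms):
    """Return the index terms matched by a shell-like wildcard pattern."""
    return sorted(t for t in index_terms if fnmatch.fnmatchcase(t, pattern))

# Hypothetical index term list.
terms = ["xapian", "xapiand", "xap", "doc1", "doc2", "docx"]
print(expand_wildcard("xapi*", terms))     # terms beginning with xapi
print(expand_wildcard("doc[0-9]", terms))  # doc followed by one digit
print(expand_wildcard("do??", terms))      # do followed by any two characters
```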
|
||||
|
||||
You should be aware of a few things before using wildcards.
|
||||
|
||||
* Using a wildcard character at the beginning of a word can make for a
|
||||
slow search because Recoll will have to scan the whole index term list
|
||||
to find the matches.
|
||||
|
||||
* Using a * at the end of a word can produce more matches than you would
|
||||
think, and strange search results. You can use the term explorer tool
|
||||
to check what completions exist for a given term. You can also see
|
||||
exactly what search was performed by clicking on the link at the top
|
||||
of the result list. In general, for natural language terms, stem
|
||||
expansion will produce better results than an ending * (stem expansion
|
||||
is turned off when any wildcard character appears in the term).
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||
3.8. Multiple databases
|
||||
|
||||
Multiple Recoll databases or indexes can be created by using several
|
||||
configuration directories which are usually set to index different areas
|
||||
@ -731,17 +833,16 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
|
||||
A typical usage scenario for the multiple index feature would be for a
|
||||
system administrator to set up a central index for shared data, that you
|
||||
may choose to search, or not, in addition to your personal data. Of
|
||||
course, there are other possibilities. There are many cases where you know
|
||||
the subset of files that you want to be searched for a given query, and
|
||||
where restricting the query will much improve the precision of the
|
||||
results. This can also be performed with the directory filter in advanced
|
||||
search, but multiple indexes will have much better performance and may be
|
||||
worth the trouble.
|
||||
choose to search or not in addition to your personal data. Of course,
|
||||
there are other possibilities. There are many cases where you know the
|
||||
subset of files that should be searched, and where narrowing the search
|
||||
can improve the results. You can achieve approximately the same effect
|
||||
with the directory filter in advanced search, but multiple indexes will
|
||||
have much better performance and may be worth the trouble.
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||
3.7. Document history
|
||||
3.9. Document history
|
||||
|
||||
Documents that you actually view (with the internal preview or an external
|
||||
tool) are entered into the document history, which is remembered. You can
|
||||
@ -749,7 +850,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||
3.8. Sorting search results
|
||||
3.10. Sorting search results
|
||||
|
||||
The documents in a result list are normally sorted in order of relevance.
|
||||
It is possible to specify different sort parameters by using the Sort
|
||||
@ -764,7 +865,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||
3.9. Search tips, shortcuts
|
||||
3.11. Search tips, shortcuts
|
||||
|
||||
Term completion. Typing Esc Space in the simple search entry field while
|
||||
entering a word will either complete the current word if its beginning
|
||||
@ -830,7 +931,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||
3.10. Customizing the search interface
|
||||
3.12. Customizing the search interface
|
||||
|
||||
It is possible to customize some aspects of the search interface by using
|
||||
Query configuration entry in the Preferences menu.
|
||||
@ -903,6 +1004,16 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
search input field. This lets you look at the result list as you enter
|
||||
new terms. This is off by default, you may like it or not...
|
||||
|
||||
* Start with advanced search dialog open and Start with sort dialog
|
||||
open: If you use these dialogs all the time, checking these entries
|
||||
will get them to open when recoll starts.
|
||||
|
||||
* Use desktop preferences to choose document editor: if this is checked,
|
||||
the xdg-open utility will be used to open files when you click the
|
||||
Edit link in the result list, instead of the application defined in
|
||||
mimeview. xdg-open will in turn use your desktop preferences to choose
|
||||
an appropriate application.
|
||||
|
||||
Search parameters:
|
||||
|
||||
* Stemming language: stemming obviously depends on the document's
|
||||
@ -933,9 +1044,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
database directory (ie: /home/someothergui/.recoll/xapiandb,
|
||||
/usr/local/recollglobal/xapiandb).
|
||||
|
||||
Once entered, the indexes will appear in the All indexes list, and you can
|
||||
chose which ones you want to use at any moment by transferring them
|
||||
to/from the Active indexes list.
|
||||
Once entered, the indexes will appear in the External indexes list, and
|
||||
you can choose which ones you want to use at any moment by checking or
|
||||
unchecking their entries.
|
||||
|
||||
Your main database (the one the current configuration indexes to), is
|
||||
always implicitly active. If this is not desirable, you can set up your
|
||||
@ -1012,7 +1123,8 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
extract tag information. Without it, only the file names will be
|
||||
indexed.
|
||||
|
||||
Text, HTML, mail folders and Openoffice files are processed internally.
|
||||
Text, HTML, mail folders, Openoffice and Scribus files are processed
|
||||
internally. Lyx is used to index Lyx files. Many filters need sed and awk.
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||
@ -1112,7 +1224,10 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
If the .recoll directory does not exist when recoll or recollindex are
|
||||
started, it will be created with a set of empty configuration files.
|
||||
recoll will give you a chance to edit the configuration file before
|
||||
starting indexing. recollindex will proceed immediately.
|
||||
starting indexing. recollindex will proceed immediately. To avoid
|
||||
mistakes, the automatic directory creation will only occur for the default
|
||||
location, not if -c or RECOLL_CONFDIR were used (in the latter cases, you
|
||||
will have to create the directory).
|
||||
|
||||
All configuration files share the same format. For example, a short
|
||||
extract of the main configuration file might look as follows:
|
||||
@ -1142,8 +1257,8 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
The tilde character (~) is expanded in file names to the name of the
|
||||
user's home directory.
|
||||
|
||||
White space is used for separation inside lists. Elements with embedded
|
||||
spaces can be quoted using double-quotes.
|
||||
White space is used for separation inside lists. List elements with
|
||||
embedded spaces can be quoted using double-quotes.
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||
@ -1172,7 +1287,8 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
The name of the Xapian data directory. It will be created if
|
||||
needed when the index is initialized. If this is not an absolute
|
||||
path, it will be interpreted relative to the configuration
|
||||
directory.
|
||||
directory. The value can have embedded spaces but starting or
|
||||
trailing spaces will be trimmed. You cannot use quotes here.
|
||||
|
||||
skippedNames
|
||||
|
||||
@ -1180,7 +1296,8 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
directories that should be completely ignored. The list defined in
|
||||
the default file is:
|
||||
|
||||
*~ #* bin CVS Cache caughtspam tmp
|
||||
skippedNames = #* bin CVS Cache cache* caughtspam tmp .thumbnails .svn \
|
||||
*~ recollrc
|
||||
|
||||
The list can be redefined for sub-directories, but is only
|
||||
actually changed for the top level ones in topdirs.
|
||||
@ -1196,6 +1313,23 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
solution is to have .* in skippedNames, and add things like
|
||||
~/.thunderbird or ~/.evolution in topdirs.
|
||||
|
||||
skippedPaths and daemSkippedPaths
|
||||
|
||||
A space-separated list of patterns for paths of files or
|
||||
directories that should be skipped. There is no default in the
|
||||
sample configuration file, but the code always adds the
|
||||
configuration and database directories in there.
|
||||
|
||||
skippedPaths is used both by batch and real time indexing.
|
||||
daemSkippedPaths can be used to specify things that should be
|
||||
indexed at startup, but not monitored.
|
||||
|
||||
Example of use for skipping text files only in a specific
|
||||
directory:
|
||||
|
||||
skippedPaths = ~/somedir/*.txt
|
||||
|
||||
|
||||
loglevel,daemloglevel
|
||||
|
||||
Verbosity level for recoll and recollindex. A value of 4 lists
|
||||
@ -1327,4 +1461,94 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or
|
||||
|
||||
Please note that these entries must be placed under a [view] section.
|
||||
|
||||
If Use desktop preferences to choose document editor is checked in the
|
||||
user preferences, all mimeview entries will be ignored except the one
|
||||
labelled application/x-all (which is set to use xdg-open by default).
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||
4.4.5. Examples of configuration adjustments
|
||||
|
||||
4.4.5.1. Adding an external viewer for a non-indexed type
|
||||
|
||||
Imagine that you have some kind of file which does not have indexable
|
||||
content, but for which you would like to have a functional Edit link in
|
||||
the result list (when found by file name). The file names end in .blob and
|
||||
can be displayed by application blobviewer.
|
||||
|
||||
You need two entries in the configuration files for this to work:
|
||||
|
||||
* In $RECOLL_CONFDIR/mimemap (typically ~/.recoll/mimemap), add the
|
||||
following line:
|
||||
|
||||
application/x-blobapp = .blob
|
||||
|
||||
|
||||
Note that the mime type is made up here, and you could call it
|
||||
diesel/oil just the same.
|
||||
|
||||
* In $RECOLL_CONFDIR/mimeview under the [view] section:
|
||||
|
||||
application/x-blobapp = blobviewer %f
|
||||
|
||||
|
||||
We are supposing that blobviewer wants a file name parameter here; you
|
||||
would use %u if it liked URLs better.
|
||||
|
||||
If you just wanted to change the application used by Recoll to display a
|
||||
mime type which it already knows, you would just need to edit mimeview.
|
||||
The entries you add in your personal file override those in the central
|
||||
configuration, which you do not need to alter.
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||
4.4.5.2. Adding indexing support for a new file type
|
||||
|
||||
Let us now imagine that the above .blob files actually contain indexable
|
||||
text and that you know how to extract it with a command line program.
|
||||
Getting Recoll to index the files is easy. You need to perform the above
|
||||
alteration, and also to add data to the mimeconf file (typically in
|
||||
~/.recoll/mimeconf):
|
||||
|
||||
* Under the [index] section, add the following line (more about the
|
||||
rclblob indexing script later):
|
||||
|
||||
application/x-blobapp = exec rclblob
|
||||
|
||||
|
||||
* Under the [icons] section, you should choose an icon to be displayed
|
||||
for the files inside the result lists. Icons are normally 64x64 pixels
|
||||
PNG files which live in /usr/[local/]share/recoll/images.
|
||||
|
||||
* Under the [categories] section, you should add the mime type where it
|
||||
makes sense (you can also create a category). Categories may be used
|
||||
for filtering in advanced search.
|
||||
|
||||
The rclblob filter should be an executable program or script which exists
|
||||
inside /usr/[local/]share/recoll/filters. It will be given a file name as
|
||||
argument and should output the text contents in html format on the
|
||||
standard output.
|
||||
|
||||
The html could be very minimal like the following example:
|
||||
|
||||
<html><head>
|
||||
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
|
||||
</head>
|
||||
<body>some text content</body></html>
|
||||
|
||||
|
||||
You should take care to escape some characters inside the text by
|
||||
transforming them into appropriate entities. "&" should be transformed
|
||||
into "&amp;", "<" should be transformed into "&lt;".
|
||||
|
||||
The character set needs to be specified in the header. It does not need to
|
||||
be UTF-8 (Recoll will take care of translating it), but it must be
|
||||
accurate for good results.
|
||||
|
||||
Recoll will also make use of other header fields if they are present:
|
||||
title, description, keywords.
|
||||
|
||||
The easiest way to write a new filter is probably to start from an
|
||||
existing one.
|
||||
|
||||
----------------------------------------------------------------------
|
||||
|
||||