From d2fa1befc1a3152536c433afe9293d4ad012748e Mon Sep 17 00:00:00 2001 From: dockes Date: Fri, 30 Jan 2009 11:43:54 +0000 Subject: [PATCH] --- src/INSTALL | 93 +++++++----- src/README | 420 +++++++++++++++++++++++++++++++++++++--------------- 2 files changed, 354 insertions(+), 159 deletions(-) diff --git a/src/INSTALL b/src/INSTALL index f61bcbaf..5d7386df 100644 --- a/src/INSTALL +++ b/src/INSTALL @@ -11,21 +11,21 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or -------------------------------------------------------------------------- - Chapter 5. Installation + Chapter 7. Installation Table of Contents - 5.1. Installing a prebuilt copy + 7.1. Installing a prebuilt copy - 5.2. Supporting packages + 7.2. Supporting packages - 5.3. Building from source + 7.3. Building from source - 5.4. Configuration overview + 7.4. Configuration overview - 5.5. The KDE Kicker Recoll applet + 7.5. The KDE Kicker Recoll applet - 5.1. Installing a prebuilt copy + 7.1. Installing a prebuilt copy Recoll binary packages from the Recoll web site are always linked statically to the Xapian libraries, and have no other dependencies. You @@ -34,12 +34,12 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or have a look at the configuration section (but this may not be necessary for a quick test with default parameters). -5.1.1. Installing through a package system +7.1.1. Installing through a package system If you use a BSD-type port system or a prebuilt package (RPM or other), just follow the usual procedure for your system. -5.1.2. Installing a prebuilt Recoll +7.1.2. Installing a prebuilt Recoll The unpackaged binary versions on the Recoll web site are just compressed tar files of a build tree, where only the useful parts were kept @@ -62,11 +62,11 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or Link: NEXT Recoll user manual - Prev Chapter 5. Installation Next + Prev Chapter 7. 
Installation Next -------------------------------------------------------------------------- - 5.2. Supporting packages + 7.2. Supporting packages Recoll uses external applications to index some file types. You need to install them for the file types that you wish to have indexed (these are @@ -122,13 +122,13 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or Link: NEXT Recoll user manual - Prev Chapter 5. Installation Next + Prev Chapter 7. Installation Next -------------------------------------------------------------------------- - 5.3. Building from source + 7.3. Building from source -5.3.1. Prerequisites +7.3.1. Prerequisites At the very least, you will need to download and install the xapian core package (Recoll 1.9 normally uses version 1.0.2, but any 0.9 or 1.0.x @@ -144,7 +144,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or not be critical). On Linux systems, the iconv interface is part of libc and you should not need to do anything special. -5.3.2. Building +7.3.2. Building Recoll has been built on Linux (redhat7.3, mandriva 2005/6, Fedora Core 3/4/5/6), FreeBSD 5/6, macosx, and Solaris 8. If you build on another @@ -182,7 +182,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or manually copy and modify one of the existing files (the new file name should be the output of uname -s). -5.3.3. Installation +7.3.3. Installation Either type make install or execute recollinstall prefix, in the root of the source tree. This will copy the commands to prefix/bin and the sample @@ -205,28 +205,41 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or Link: NEXT Recoll user manual - Prev Chapter 5. Installation Next + Prev Chapter 7. Installation Next -------------------------------------------------------------------------- - 5.4. Configuration overview + 7.4. 
Configuration overview Most of the parameters specific to the recoll GUI are set through the Preferences menu and stored in the standard QT place ($HOME/.qt/recollrc). You probably do not want to edit this by hand. - For other options, Recoll uses text configuration files. You will have to - edit them by hand for now (there is still some hope for a GUI - configuration tool in the future). The most accurate documentation for the - configuration parameters is given by comments inside the default files, - and we will just give a general overview here. + Recoll indexing options are set inside text configuration files located in + a configuration directory. There can be several such directories, each of + which define the parameters for one index. - There are two sets of configuration files. The system-wide files are kept - in a directory named like /usr/[local/]share/recoll/examples, they define - default values for the system. A parallel set of files exists by default - in the .recoll directory in your home. This directory can be changed with - the RECOLL_CONFDIR environment variable or the -c option parameter to - recoll and recollindex. + The configuration files can be edited by hand or through the Indexing + configuration dialog (Preferences menu). The GUI tool will try to respect + your formatting and comments as much as possible, so it is quite possible + to use both ways. + + The most accurate documentation for the configuration parameters is given + by comments inside the default files, and we will just give a general + overview here. + + For each index, there are two sets of configuration files. System-wide + configuration files are kept in a directory named like + /usr/[local/]share/recoll/examples, and define default values, shared by + all indexes. For each index, a parallel set of files defines the + customized parameters. + + The default location of the configuration is the .recoll directory in your + home. Most people will only use this directory. 
+ + This location can be changed, or others can be added with the + RECOLL_CONFDIR environment variable or the -c option parameter to recoll + and recollindex. If the .recoll directory does not exist when recoll or recollindex are started, it will be created with a set of empty configuration files. @@ -267,7 +280,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or White space is used for separation inside lists. List elements with embedded spaces can be quoted using double-quotes. -5.4.1. Main configuration file +7.4.1. Main configuration file recoll.conf is the main configuration file. It defines things like what to index (top directories and things to ignore), and the default character @@ -424,6 +437,14 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or If the variable is unspecified or the list empty (the default), all supported types are processed. + compressedfilemaxkbs + + Size limit for compressed (.gz or .bz2) files. These need to be + decompressed in a temporary directory for identification, which + can be very wasteful if 'uninteresting' big compressed files are + present. Negative means no limit, 0 means no processing of any + compressed file. Defaults to -1. + indexallfilenames Recoll indexes file names in a special section of the database to @@ -475,7 +496,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or cases. A value of 3 would allow more precision and efficiency on longer words, but the index will be approximately twice as large. -5.4.2. The mimemap file +7.4.2. The mimemap file mimemap specifies the file name extension to mime type mappings. @@ -499,7 +520,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or given Recoll version. Having it there avoids cluttering the more user-oriented and locally customized skippedNames. -5.4.3. The mimeconf file +7.4.3. 
The mimeconf file mimeconf specifies how the different mime types are handled for indexing, and which icons are displayed in the recoll result lists. @@ -511,7 +532,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or recoll in the result lists (the values are the basenames of the png images inside the iconsdir directory (specified in recoll.conf). -5.4.4. The mimeview file +7.4.4. The mimeview file mimeview specifies which programs are started when you click on an Edit link in a result list. Ie: HTML is normally displayed using firefox, but @@ -532,9 +553,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or user preferences, all mimeview entries will be ignored except the one labelled application/x-all (which is set to use xdg-open by default). -5.4.5. Examples of configuration adjustments +7.4.5. Examples of configuration adjustments - 5.4.5.1. Adding an external viewer for an non-indexed type + 7.4.5.1. Adding an external viewer for an non-indexed type Imagine that you have some kind of file which does not have indexable content, but for which you would like to have a functional Edit link in @@ -565,7 +586,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or The entries you add in your personal file override those in the central configuration, which you do not need to alter - 5.4.5.2. Adding indexing support for a new file type + 7.4.5.2. Adding indexing support for a new file type Let us now imagine that the above .blob files actually contain indexable text and that you know how to extract it with a command line program. diff --git a/src/README b/src/README index 347386cb..6b902280 100644 --- a/src/README +++ b/src/README @@ -12,9 +12,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or This document introduces full text search notions and describes the installation and use of the Recoll application. 
It currently describes - Recoll 1.9. - - [ Split HTML / Single HTML ] + Recoll 1.12. ---------------------------------------------------------------------- @@ -50,7 +48,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or 2.5. Real time indexing - 3. Searching + 3. Searching with the Qt graphical user interface 3.1. Simple search @@ -72,7 +70,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or 3.9. Document history - 3.10. Sorting search results + 3.10. Sorting search results and collapsing duplicates 3.11. Search tips, shortcuts @@ -84,51 +82,59 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or 3.12. Customizing the search interface - 4. Programming interface + 4. Searching with the KDE KIO slave - 4.1. Writing a document filter + 4.1. What's this - 4.1.1. Filter HTML output + 4.2. Searchable documents - 4.2. Field data processing configuration + 5. Searching on the command line - 4.3. API + 6. Programming interface - 4.3.1. Interface elements + 6.1. Writing a document filter - 4.3.2. Python interface + 6.1.1. Filter HTML output - 5. Installation + 6.2. Field data processing configuration - 5.1. Installing a prebuilt copy + 6.3. API - 5.1.1. Installing through a package system + 6.3.1. Interface elements - 5.1.2. Installing a prebuilt Recoll + 6.3.2. Python interface - 5.2. Supporting packages + 7. Installation - 5.3. Building from source + 7.1. Installing a prebuilt copy - 5.3.1. Prerequisites + 7.1.1. Installing through a package system - 5.3.2. Building + 7.1.2. Installing a prebuilt Recoll - 5.3.3. Installation + 7.2. Supporting packages - 5.4. Configuration overview + 7.3. Building from source - 5.4.1. Main configuration file + 7.3.1. Prerequisites - 5.4.2. The mimemap file + 7.3.2. Building - 5.4.3. The mimeconf file + 7.3.3. Installation - 5.4.4. The mimeview file + 7.4. Configuration overview - 5.4.5. Examples of configuration adjustments + 7.4.1. 
Main configuration file - 5.5. The KDE Kicker Recoll applet + 7.4.2. The mimemap file + + 7.4.3. The mimeconf file + + 7.4.4. The mimeview file + + 7.4.5. Examples of configuration adjustments + + 7.5. The KDE Kicker Recoll applet ---------------------------------------------------------------------- @@ -143,7 +149,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or Do not do this if your home directory contains a huge number of documents and you do not want to wait or are very short on disk space. In this case, - you may want to edit the configuration file first to restrict the indexed + you may first want to customize the configuration to restrict the indexed area. Also be aware that you may need to install the appropriate supporting @@ -216,15 +222,14 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or currently makes no attempt at automatic language recognition. Recoll has many parameters which define exactly what to index, and how to - classify and decode the source documents. These are kept in a - configuration file. A default configuration is copied into a standard - location (usually something like /usr/[local/]share/recoll/examples) - during installation. The default parameters from this file may be - overridden by values that you set inside your personal configuration, - found by default in the .recoll sub-directory of your home directory. The - default configuration will index your home directory with default - parameters and should be sufficient for giving Recoll a try, but you may - want to adjust it later. + classify and decode the source documents. These are kept in configuration + files. A default configuration is copied into a standard location (usually + something like /usr/[local/]share/recoll/examples) during installation. 
+ The default parameters from this file may be overridden by values that you + set inside your personal configuration, found by default in the .recoll + sub-directory of your home directory. The default configuration will index + your home directory with default parameters and should be sufficient for + giving Recoll a try, but you may want to adjust it later. Indexing is started automatically the first time you execute the recoll search graphical user interface, or by executing the recollindex command. @@ -419,9 +424,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or 2.3.1. The indexing configuration GUI - As of Recoll 1.10, most parameters for a given indexing configuration can - be set from a recoll GUI running on this configuration (either as default, - or by setting RECOLL_CONFDIR or the -c option.) + Most parameters for a given indexing configuration can be set from a + recoll GUI running on this configuration (either as default, or by setting + RECOLL_CONFDIR or the -c option.) The interface is started from the Preferences menu. It has two main panels. The first panel allows setting global variables, like the list of @@ -533,10 +538,10 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or ---------------------------------------------------------------------- - Chapter 3. Searching + Chapter 3. Searching with the Qt graphical user interface - The recoll program provides the user interface for searching. It is based - on the QT library. + The recoll program provides the main user interface for searching. It is + based on the QT library. recoll has two search modes: @@ -554,10 +559,10 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or from another text window, punctation and all. The main case where you should enter text differently from how it is - printed is for east-oriental languages written with Chinese characters. 
- Words composed of single or multiple characters should be entered - separated by white space in this case (they would typically be printed - without white space). + printed is for east-asian languages (Chinese, Japanese, Korean). Words + composed of single or multiple characters should be entered separated by + white space in this case (they would typically be printed without white + space). ---------------------------------------------------------------------- @@ -565,7 +570,8 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or 1. Start the recoll program. - 2. Possibly choose a search mode: Any term or All terms or File name. + 2. Possibly choose a search mode: Any term, All terms, File name or Query + language. 3. Enter search term(s) in the text field at the top of the window. @@ -579,7 +585,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or File name will specifically look for file names. The entry will be split at white space characters, and each pattern will be separately expanded. If you want to search for a pattern including white space, you need to use - double quotes. + double quotes. The point of having a separate file name search is that + wild card expansion can be performed more efficiently on a relatively + small subset of the index. The fourth entry (Query Language) is described in its own section. @@ -593,8 +601,8 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or Character case has no influence on search, except that you can disable stem expansion for any term by capitalizing it. Ie: a search for floor will also normally look for flooring, floored, etc., but a search for - Floor will only look for floor, in any character case (stemming can also - be disabled globally in the preferences). + Floor will only look for floor, in any character case. Sstemming can also + be disabled globally in the preferences. 
Recoll remembers the last few searches that you performed. You can use the simple search text entry widget (a combobox) to recall them (click on the @@ -634,17 +642,20 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or documents side by side. (You can also browse successive results in a single preview window by typing Shift+ArrowUp/Down in the window). - Clicking the Edit link will attempt to start an external viewer. The - viewers can be configured through the user preferences dialog, or by + Clicking the Edit link will attempt to start an external editor. The + editors can be configured through the user preferences dialog, or by editing the mimeview configuration file. The Preview and Edit edit links may not be present for all entries, meaning that Recoll has no configured way to preview a given file type - (which was indexed by name only), or no configured external viewer for the + (which was indexed by name only), or no configured external editor for the file type. This can sometimes be adjusted simply by tweaking the mimemap and mimeview configuration files (the latter can be modified with the user preferences dialog). + The format of the result list entries is entirely configurable by using + the preference dialog to edit an HTML fragment. + You can click on the Query details link at the top of the results page to see the query actually performed, after stem expansion and other processing. @@ -672,7 +683,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or * Copy Url - * Find similar + * Save to File * Find similar @@ -683,6 +694,12 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or The Copy File Name and Copy Url copy the relevant data to the clipboard, for later pasting. + Save to File allows saving the contents of a result document to a chosen + file. 
This entry will only appear if the document does not correspond to + an existing file, but is a subdocument inside such a file (ie: an email + attachment). It is especially useful to extract attachments with no + associated editor. + The Find similar entry will select a number of relevant term from the current document and enter them into the simple search field. You can then start a simple search, with a good chance of finding documents related to @@ -732,6 +749,11 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or string is found, the cursor will be positioned at the first occurrence of the search string. + A right-click menu in the text area allows switching between displaying + the main text or the contents of fields associated to the document (ie: + author, abtract, etc.). This is especially useful in cases where the term + match did not occur in the main text but in one of the fields. + ---------------------------------------------------------------------- 3.4. The query language @@ -833,39 +855,60 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or 3.5. Complex/advanced search - The advanced search dialog has a number of fields that will allow a more - refined search. Each entry field is configurable for the following modes: + The advanced search dialog helps you build more complex queries. It can be + opened through the Tools menu or through the main toolbar. - * All terms. + The dialog has three parts: - * Any term. + * The top part allows constructing a query by combining multiple clauses + of different types. Each entry field is configurable for the following + modes: - * None of the terms. + * All terms. - * Phrase (exact terms in order within an adjustable window). + * Any term. - * Proximity (terms in any order within an adjustable window). + * None of the terms. - * Filename search with wildcards. + * Phrase (exact terms in order within an adjustable window). 
- Additional entry fields can be created by clicking the Add clause button. + * Proximity (terms in any order within an adjustable window). - You can choose that all relevant fields will be combined by either an AND - or an OR conjunction. All types of clauses except "phrase" and "near" can - accept a mix of single words and phrases enclosed in double quotes. - Stemming expansion will be performed for all terms not beginning with a - capital letter, except for terms inside "phrase" clauses. Wildcards will - be processed everywhere. + * Filename search. - Advanced search will also let you search for documents of specific mime - types (ie: only text/plain, or text/HTML or application/pdf etc...). The - state of the file type selection can be saved as the default (the file - type filter will not be activated at program start-up, but the lists will - be in the restored state). + Additional entry fields can be created by clicking the Add clause + button. - You can also restrict the search results to a sub-tree of the indexed - area. If you need to do this often, you may think of setting up multiple - indexes instead, as the performance will be much better. + When searching, the non-empty clauses will be combined either with an + AND or an OR conjunction, depending on the choice made on the left + (All clauses or Any clause). + + Entries of all types except "Phrase" and "Near" accept a mix of single + words and phrases enclosed in double quotes. Stemming and wildcard + expansion will be performed as for simple search. + + * The next part allows filtering the results by their mime types. + + The state of the file type selection can be saved as the default (the + file type filter will not be activated at program start-up, but the + lists will be in the restored state). + + * The bottom part allows restricting the search results to a sub-tree of + the indexed area. 
If you need to do this often, you may think of + setting up multiple indexes instead, as the performance will be much + better. + + Phrases and Proximity searches. These two clauses work in similar ways, + with the difference that proximity searches do not impose an order on the + words. In both cases, an adjustable number (slack) of non-matched words + may be accepted between the searched ones (use the counter on the left to + adjust this count). For phrases, the default count is zero (exact match). + For proximity it is ten (meaning that two search terms, would be matched + if found within a window of twelve words). Examples: a phrase search for + quick fox with a slack of 0 will match quick fox but not quick brown fox. + With a slack of 1 it will match the latter, but not fox quick. A proximity + search for quick fox with the default slack will match the latter, and + also a fox is a cunning and quick animal. Click on the Start Search button in the advanced search dialog, or type Enter in any text field to start the search. The button in the main window @@ -1020,7 +1063,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or ---------------------------------------------------------------------- -3.10. Sorting search results +3.10. Sorting search results and collapsing duplicates The documents in a result list are normally sorted in order of relevance. It is possible to specify different sort parameters by using the Sort @@ -1038,6 +1081,13 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or possible to keep the sorting activation state between program invocations by checking the Remember sort activation state option in the preferences. + It is also possible to hide duplicate entries inside the result list + (documents with the exact same contents as the displayed one). 
The test of + identity is based on an MD5 hash of the document container, not only of + the text contents (so that ie, a text document with an image added will + not be a duplicate of the text only). Duplicates hiding is controlled by + an entry in the Query configuration dialog, and is off by default. + ---------------------------------------------------------------------- 3.11. Search tips, shortcuts @@ -1081,10 +1131,10 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or Phrases and Proximity searches. A phrase can be looked for by enclosing it in double quotes. Example: "user manual" will look only for occurrences of - user immediately followed by manual. You can use the This exact phrase - field of the advanced search dialog to the same effect. Phrases can be - entered along simple terms in all simple or advanced search entry fields - (except This exact phrase). + user immediately followed by manual. You can use the This phrase field of + the advanced search dialog to the same effect. Phrases can be entered + along simple terms in all simple or advanced search entry fields (except + This exact phrase). AutoPhrases. This option can be set in the preferences dialog. If it is set, a phrase will be automatically built and added to simple searches @@ -1136,6 +1186,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or * Number of results in a result page: + * Hide duplicate results: decides if result list entries are shown for + identical documents found in different places. + * Highlight color for query terms: Terms from the user query are highlighted in the result list samples and the preview window. The color can be chosen here. Any QT color string should work (ie red, @@ -1267,7 +1320,107 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or ---------------------------------------------------------------------- - Chapter 4. Programming interface + Chapter 4. 
Searching with the KDE KIO slave + +4.1. What's this + + The Recoll KIO slave allows performing a Recoll search by entering an + appropriate URL in a KDE open dialog, or with an HTML-based interface + displayed in Konqueror. + + The HTML-based interface is similar to the QT-based interface, but + slightly less powerful for now. Its advantage is that you can perform your + search while staying fully within the KDE framework: drag and drop from + the result list works normally and you have your normal choice of + applications for opening files. + + The alternative interface uses a directory view of search results. Due to + limitations in the current KIO slave interface, it is currently not + obviously useful (to me). + + The interface is described in more detail inside a help file which you can + access by entering recoll:/ inside the konqueror URL line (this works only + if the recoll KIO slave has been previously installed). + + The instructions for building this module are located in the source tree. + See: kde/kio/recoll/00README.txt + + ---------------------------------------------------------------------- + +4.2. Searchable documents + + As a sample application, the Recoll KIO slave could allow preparing a set + of HTML documents (for example a manual) so that they become their own + search interface inside konqueror. + + This can be done by either explicitely inserting + links around some document areas, or automatically by adding a very small + javascript program to the documents, like the following example, which + would initiate a search by double-clicking any term: + + + .... + + + ---------------------------------------------------------------------- + + Chapter 5. Searching on the command line + + There are several ways to obtain search results as a text stream, without + a graphical interface: + + * By passing option -t to the recoll program. + + * By using the recollq program. + + * By writing a custom Python program, using the Recoll Python API. 
+ + The first two methods work in the same way and accept/need the same + arguments (except for the additional -t to recoll). The query to be + executed is specified as command line arguments. + + recollq is not built by default. You can use the Makefile in the query + directory to build it. This is a very simple program, and it will often be + useful to taylor its output format to your needs. + + recollq has a man page (not installed by default, look in the doc/man + directory). The Usage string is as follows: + + recollq [-o|-a|-f] + Runs a recoll query and displays result lines. + Default: will interpret the argument(s) as a query language string + -o Emulate the gui simple search in ANY TERM mode + -a Emulate the gui simple search in ALL TERMS mode + -f Emulate the gui simple search in filename mode + Common options: + -c : specify config directory, overriding $RECOLL_CONFDIR + -d also dump file contents + -n limit the maximum number of results (0->no limit, default 2000) + -b : basic. Just output urls, no mime types or titles + -m : dump the whole document meta[] array + -S fld : sort by field name + -D : sort descending + + Sample execution: + + recollq 'ilur -nautique mime:text/html' + Recoll query: ((((ilur:(wqf=11) OR ilurs) AND_NOT (nautique:(wqf=11) + OR nautiques OR nautiqu OR nautiquement)) FILTER Ttext/html)) + 4 results + text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/comptes.html] [comptes.html] 18593 bytes + text/html [file:///Users/uncrypted-dockes/projets/nautique/webnautique/articles/ilur1/index.html] [Constructio... + text/html [file:///Users/uncrypted-dockes/projets/pagepers/index.html] [psxtcl/writemime/recoll]... + text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/recu-chasse-maree.... + + ---------------------------------------------------------------------- + + Chapter 6. 
Programming interface Recoll has an Application programming Interface, usable both for indexing and searching, currently accessible from the Python language. @@ -1280,7 +1433,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or ---------------------------------------------------------------------- -4.1. Writing a document filter +6.1. Writing a document filter Recoll filters are executable programs which translate from a specific format (ie: openoffice, acrobat, etc.) to the Recoll indexing input @@ -1334,7 +1487,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or ---------------------------------------------------------------------- - 4.1.1. Filter HTML output + 6.1.1. Filter HTML output The output HTML could be very minimal like the following example: @@ -1367,7 +1520,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or ---------------------------------------------------------------------- -4.2. Field data processing configuration +6.2. Field data processing configuration Fields are named pieces of information in or about documents, like title, author, abstract. @@ -1402,9 +1555,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or ---------------------------------------------------------------------- -4.3. API +6.3. API - 4.3.1. Interface elements + 6.3.1. Interface elements A few elements in the interface are specific and and need an explanation. @@ -1445,9 +1598,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or ---------------------------------------------------------------------- - 4.3.2. Python interface + 6.3.2. Python interface - 4.3.2.1. Introduction + 6.3.2.1. Introduction Recoll versions after 1.11 define a Python programming interface, both for searching and indexing. 
@@ -1463,7 +1616,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or ---------------------------------------------------------------------- - 4.3.2.2. Interface manual + 6.3.2.2. Interface manual NAME recoll - This is an interface to the Recoll full text indexer. @@ -1653,7 +1806,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or ---------------------------------------------------------------------- - 4.3.2.3. Example code + 6.3.2.3. Example code The following sample would query the index with a user language string. See the python/samples directory inside the Recoll source for other @@ -1684,9 +1837,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or ---------------------------------------------------------------------- - Chapter 5. Installation + Chapter 7. Installation -5.1. Installing a prebuilt copy +7.1. Installing a prebuilt copy Recoll binary packages from the Recoll web site are always linked statically to the Xapian libraries, and have no other dependencies. You @@ -1697,14 +1850,14 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or ---------------------------------------------------------------------- - 5.1.1. Installing through a package system + 7.1.1. Installing through a package system If you use a BSD-type port system or a prebuilt package (RPM or other), just follow the usual procedure for your system. ---------------------------------------------------------------------- - 5.1.2. Installing a prebuilt Recoll + 7.1.2. Installing a prebuilt Recoll The unpackaged binary versions on the Recoll web site are just compressed tar files of a build tree, where only the useful parts were kept @@ -1719,7 +1872,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or ---------------------------------------------------------------------- -5.2. Supporting packages +7.2. 
Supporting packages Recoll uses external applications to index some file types. You need to install them for the file types that you wish to have indexed (these are @@ -1767,9 +1920,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or ---------------------------------------------------------------------- -5.3. Building from source +7.3. Building from source - 5.3.1. Prerequisites + 7.3.1. Prerequisites At the very least, you will need to download and install the xapian core package (Recoll 1.9 normally uses version 1.0.2, but any 0.9 or 1.0.x @@ -1787,7 +1940,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or ---------------------------------------------------------------------- - 5.3.2. Building + 7.3.2. Building Recoll has been built on Linux (redhat7.3, mandriva 2005/6, Fedora Core 3/4/5/6), FreeBSD 5/6, macosx, and Solaris 8. If you build on another @@ -1827,7 +1980,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or ---------------------------------------------------------------------- - 5.3.3. Installation + 7.3.3. Installation Either type make install or execute recollinstall prefix, in the root of the source tree. This will copy the commands to prefix/bin and the sample @@ -1842,24 +1995,37 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or ---------------------------------------------------------------------- -5.4. Configuration overview +7.4. Configuration overview Most of the parameters specific to the recoll GUI are set through the Preferences menu and stored in the standard QT place ($HOME/.qt/recollrc). You probably do not want to edit this by hand. - For other options, Recoll uses text configuration files. You will have to - edit them by hand for now (there is still some hope for a GUI - configuration tool in the future). 
The most accurate documentation for the - configuration parameters is given by comments inside the default files, - and we will just give a general overview here. + Recoll indexing options are set inside text configuration files located in + a configuration directory. There can be several such directories, each of + which defines the parameters for one index. - There are two sets of configuration files. The system-wide files are kept - in a directory named like /usr/[local/]share/recoll/examples, they define - default values for the system. A parallel set of files exists by default - in the .recoll directory in your home. This directory can be changed with - the RECOLL_CONFDIR environment variable or the -c option parameter to - recoll and recollindex. + The configuration files can be edited by hand or through the Indexing + configuration dialog (Preferences menu). The GUI tool will try to respect + your formatting and comments as much as possible, so it is quite possible + to use both ways. + + The most accurate documentation for the configuration parameters is given + by comments inside the default files, and we will just give a general + overview here. + + For each index, there are two sets of configuration files. System-wide + configuration files are kept in a directory named like + /usr/[local/]share/recoll/examples, and define default values, shared by + all indexes. For each index, a parallel set of files defines the + customized parameters. + + The default location of the configuration is the .recoll directory in your + home. Most people will only use this directory. + + This location can be changed, or others can be added with the + RECOLL_CONFDIR environment variable or the -c option parameter to recoll + and recollindex. If the .recoll directory does not exist when recoll or recollindex are started, it will be created with a set of empty configuration files.
@@ -1902,7 +2068,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or ---------------------------------------------------------------------- - 5.4.1. Main configuration file + 7.4.1. Main configuration file recoll.conf is the main configuration file. It defines things like what to index (top directories and things to ignore), and the default character @@ -2059,6 +2225,14 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or If the variable is unspecified or the list empty (the default), all supported types are processed. + compressedfilemaxkbs + + Size limit for compressed (.gz or .bz2) files. These need to be + decompressed in a temporary directory for identification, which + can be very wasteful if 'uninteresting' big compressed files are + present. Negative means no limit, 0 means no processing of any + compressed file. Defaults to -1. + indexallfilenames Recoll indexes file names in a special section of the database to @@ -2112,7 +2286,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or ---------------------------------------------------------------------- - 5.4.2. The mimemap file + 7.4.2. The mimemap file mimemap specifies the file name extension to mime type mappings. @@ -2138,7 +2312,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or ---------------------------------------------------------------------- - 5.4.3. The mimeconf file + 7.4.3. The mimeconf file mimeconf specifies how the different mime types are handled for indexing, and which icons are displayed in the recoll result lists. @@ -2152,7 +2326,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or ---------------------------------------------------------------------- - 5.4.4. The mimeview file + 7.4.4. The mimeview file mimeview specifies which programs are started when you click on an Edit link in a result list. 
Ie: HTML is normally displayed using firefox, but @@ -2175,9 +2349,9 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or ---------------------------------------------------------------------- - 5.4.5. Examples of configuration adjustments + 7.4.5. Examples of configuration adjustments - 5.4.5.1. Adding an external viewer for an non-indexed type + 7.4.5.1. Adding an external viewer for an non-indexed type Imagine that you have some kind of file which does not have indexable content, but for which you would like to have a functional Edit link in @@ -2210,7 +2384,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or ---------------------------------------------------------------------- - 5.4.5.2. Adding indexing support for a new file type + 7.4.5.2. Adding indexing support for a new file type Let us now imagine that the above .blob files actually contain indexable text and that you know how to extract it with a command line program. @@ -2241,7 +2415,7 @@ More documentation can be found in the doc/ directory or at http://www.recoll.or ---------------------------------------------------------------------- -5.5. The KDE Kicker Recoll applet +7.5. The KDE Kicker Recoll applet The Recoll source tree contains the source code to the recoll_applet, a small application derived from the find_applet. This can be used to add a