diff --git a/src/doc/user/usermanual.sgml b/src/doc/user/usermanual.sgml index 0a58707b..3a160a4e 100644 --- a/src/doc/user/usermanual.sgml +++ b/src/doc/user/usermanual.sgml @@ -24,7 +24,7 @@ Dockes - $Id: usermanual.sgml,v 1.67 2008-10-10 08:19:12 dockes Exp $ + $Id: usermanual.sgml,v 1.68 2008-10-13 07:57:12 dockes Exp $ This document introduces full text search notions @@ -1586,7 +1586,194 @@ fvwm Programming interface - + &RCL; has an application programming interface, usable both + for indexing and searching, currently accessible from the + Python language. + + Another less radical way to extend the application is to + write filters for new types of documents. + + The processing of metadata attributes for documents + (fields) is highly configurable. + + + Writing a document filter + + &RCL; filters are executable programs which + translate from a specific format (e.g. + openoffice, + acrobat, etc.) to the &RCL; + indexing input format, which may be + text/plain or + text/html. + + &RCL; filters are usually shell scripts, but this is in + no way necessary. These programs are extremely simple and most + of the difficulty lies in extracting the text from the native + format, not outputting what is expected by &RCL;. Happily + enough, most document formats already have translators or text + extractors which handle the difficult part and can be called + from the filter. In some cases the output of the translating + program is appropriate, and no intermediate shell script is + needed. + + Filters are called with a single argument which is the + source file name. They should output the result to stdout. + + The RECOLL_FILTER_FORPREVIEW + environment variable (values yes, + no) tells the filter whether the operation is + for indexing or previewing. Some filters use this to output a + slightly different format. This is not essential. + + The association of file types to filters is performed in + the mimeconf file. 
A sample: + + +[index] +application/msword = exec antiword -t -i 1 -m UTF-8;\ + mimetype=text/plain;charset=utf-8 + +application/ogg = exec rclogg + +text/rtf = exec unrtf --nopict --html; charset=iso-8859-1; mimetype=text/html + + + The fragment specifies that: + + + + application/msword files + are processed by executing the antiword + program, which outputs + text/plain encoded in + utf-8. + + + application/ogg files are + processed by the rclogg script, with + default output type (text/html, with + encoding specified in the header, or utf-8 + by default). + + + text/rtf is processed by + unrtf, which outputs + text/html. The + iso-8859-1 encoding is specified because it + is not the utf-8 default, and not output by + unrtf in the HTML header section. + + + + The easiest way to write a new filter is probably to start + from an existing one. + + Filters which output text/plain text + are generally simpler, but they cannot specify the character set + and other metadata, so they are limited to cases where these + elements are not needed. + + + + Filter HTML output + + The output HTML could be very minimal like the following + example: + + <html><head> +<meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> +</head> +<body>some text content</body></html> + + + You should take care to escape some + characters inside + the text by transforming them into appropriate + entities. "&" should be transformed into + "&amp;", "<" + should be transformed into + "&lt;". This is not always properly + done by translating programs which output HTML, and of + course never by those which output plain text. + + The character set needs to be specified in the + header. It does not need to be UTF-8 (&RCL; will take care + of translating it), but it must be accurate for good + results. + + &RCL; will also make use of other header fields if + they are present: title, + description, + keywords. + + Filters also have the possibility to "invent" field + names. 
This should be output as meta tags: + + +<meta name="somefield" content="Some textual data" /> + + + See the following section for details about configuring + how field data is processed by the indexer. + + + + + + + Field data processing configuration + + Fields are named pieces of information + in or about documents, like title, + author, abstract. + + The field values for documents can appear in several ways + during indexing: either output by filters as + meta fields in the HTML header section, or + added as attributes of the Doc object when + using the API, or again synthesized internally by &RCL;. + + The &RCL; query language allows searching for text in a + specific field. + + &RCL; defines a number of default fields. Additional + ones can be output by filters, and described in the + fields configuration file. + + Fields can be: + + + indexed, meaning that their + terms are separately stored in inverted lists (with a specific + prefix), and that a field-specific search is possible. + + + stored, meaning that their + value is recorded in the index data record for the document, + and can be returned and displayed with search results. + + + + + A field can be indexed, stored, or both. + + A field becomes indexed by having a prefix defined in + the [prefixes] section of the + fields file. See the comments in that file for + details. + + A field becomes stored by appearing in + the [stored] section of the + fields file. + + + + + + API + + Interface elements A few elements in the interface are specific and and need @@ -1642,12 +1829,12 @@ fvwm indexing. The main indexer documents would also probably be a problem for the external indexer purge operation. 
- + - + Python interface - + Introduction &RCL; versions after 1.11 define a Python programming @@ -1666,10 +1853,10 @@ fvwm - + - + Interface manual @@ -1859,9 +2046,9 @@ FUNCTIONS + - - + Example code The following sample would query the index with a user @@ -1894,11 +2081,13 @@ while query.next >= 0 and query.next < nres: - + + + - + Installation @@ -2686,11 +2875,11 @@ skippedPaths = ~/somedir/∗.txt be an executable program or script which exists inside /usr/[local/]share/recoll/filters. It will be given a file name as argument and should output the - text contents in html format on the standard output. + text contents on the standard output. - You can find more details about writing a &RCL; filter - in the section about - writing filters + The filter + programming section describes in more detail how to + write a filter. @@ -2724,83 +2913,6 @@ skippedPaths = ~/somedir/∗.txt running). You may find it useful anyway. - - - Extending &RCL; - - - Writing a document filter - - &RCL; filters are executable programs which - translate from a specific format (ie: - openoffice, - acrobat, etc.) to the &RCL; - indexing input format, which was chosen to be HTML. - - &RCL; filters are usually shell-scripts, but this is in - no way necessary. These programs are extremely simple and most - of the difficulty lies in extracting the text from the native - format, not outputting what is expected by &RCL;. Happily - enough, most document formats already have translators or text - extractors which handle the difficult part and can be called - from the filter. - - Filters are called with a single argument which is the - source file name. They should output the result to stdout. - - The RECOLL_FILTER_FORPREVIEW - environment variable (values yes, - no) tells the filter if the operation is - for indexing or previewing. Some filters use this to output a - slightly different format. This is not essential. 
- - The output HTML could be very minimal like the following - example: - - <html><head> -<meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> -</head> -<body>some text content</body></html> - - - You should take care to escape some characters inside - the text by transforming them into appropriate - entities. "&" should be transformed into - "&amp;", "<" - should be transformed into "&lt;". - - The character set needs to be specified in the - header. It does not need to be UTF-8 (&RCL; will take care - of translating it), but it must be accurate for good - results. - - &RCL; will also make use of other header fields if - they are present: title, - description, - keywords. - - As of &RCL; release 1.9, filters also have the - possibility to "invent" field names. This should be output as - meta tags: - - -<meta name="somefield" content="Some textual data" /> - - - In this case, a correspondance between field name and - &XAP; prefix should also be added to the - mimeconf file. See the existing entries - for inspiration. The field can then be used inside the query - language to narrow searches. - - The easiest way to write a new filter is probably to start - from an existing one. - - - - - - diff --git a/src/qtgui/i18n/recoll_de.ts b/src/qtgui/i18n/recoll_de.ts index 9eb5db22..2503e739 100644 --- a/src/qtgui/i18n/recoll_de.ts +++ b/src/qtgui/i18n/recoll_de.ts @@ -438,7 +438,7 @@ Drücken Sie Abbrechen, um die Konfigurationsdatei vor dem Start der Indizierung Query results (sorted) - Ergebnisse (sortiert) + Ergebnisse (sortiert) Document history @@ -540,6 +540,56 @@ Soll der Voreinstellungsdialog geöffnet werden? 
Stop &Indexing + + All + + + + media + Medien + + + message + + + + other + andere + + + presentation + + + + spreadsheet + + + + text + + + + sorted + + + + filtered + + + + External applications/commands needed and not found for indexing your file types: + + + + + + No helpers found missing + + + + Missing helper programs + + RclMainBase @@ -731,6 +781,14 @@ Soll der Voreinstellungsdialog geöffnet werden? &Indexing configuration &Indizierungskonfiguration + + All + + + + &Show missing helpers + + RclResList @@ -861,6 +919,18 @@ Soll der Voreinstellungsdialog geöffnet werden? Documents <b>%1-%2</b> for Dokumente <b>%1-%2</b> für + + filtered + + + + sorted + + + + Document history + Dokumenthistorie + ResListBase @@ -1606,6 +1676,10 @@ Dadurch sollten Ergebnisse, die exakte Übereinstimmungen der Suchworte enthalte Highlight color for query terms + + Prefer Html to plain text for preview. + + ViewAction diff --git a/src/qtgui/i18n/recoll_fr.ts b/src/qtgui/i18n/recoll_fr.ts index 6f60e6fa..b88cdb6d 100644 --- a/src/qtgui/i18n/recoll_fr.ts +++ b/src/qtgui/i18n/recoll_fr.ts @@ -436,7 +436,7 @@ Click Cancel if you want to edit the configuration file before indexation starts Query results (sorted) - Résultats de la recherche (triés) + Résultats de la recherche (triés) Document history @@ -460,7 +460,7 @@ Click Cancel if you want to edit the configuration file before indexation starts Starting help browser - Demarrage de l'outil de consultation de l'aide + Démarrage de l'outil de consultation de l'aide Indexing in progress: @@ -538,6 +538,58 @@ Voulez vous ouvrir le dialogue de paramétrage ? 
Stop &Indexing Arrèter l'&Indexation + + All + Tout + + + media + multimédia + + + message + message + + + other + autres + + + presentation + présentation + + + spreadsheet + feuille de calcul + + + text + texte + + + sorted + trié + + + filtered + filtré + + + External applications/commands needed and not found for indexing your file types: + + + Applications externes non trouvées pour indexer vos types de fichiers: + + + + + No helpers found missing + Pas d'applications manquantes + + + Missing helper programs + Applications manquantes + RclMainBase @@ -555,7 +607,7 @@ Voulez vous ouvrir le dialogue de paramétrage ? Previous page - Page précedente + Page précédente Next page @@ -615,7 +667,7 @@ Voulez vous ouvrir le dialogue de paramétrage ? &Preferences - &Preferences + &Préférences Search tools @@ -729,6 +781,14 @@ Voulez vous ouvrir le dialogue de paramétrage ? &Indexing configuration Configuration &Indexation + + All + Tout + + + &Show missing helpers + Afficher les application&s manquantes + RclResList @@ -801,7 +861,7 @@ Voulez vous ouvrir le dialogue de paramétrage ? Previous - + Précédent Next @@ -859,6 +919,18 @@ Voulez vous ouvrir le dialogue de paramétrage ? Documents <b>%1-%2</b> for Résultats <b>%1-%2</b> pour + + filtered + filtré + + + sorted + trié + + + Document history + Historique des documents consultés + ResListBase @@ -1601,6 +1673,10 @@ Ceci devrait donner une meilleure pertinence aux résultats où les termes reche Highlight color for query terms Couleur de mise en relief des termes recherchés + + Prefer Html to plain text for preview. + Utiliser le format Html pour la prévisualisation. 
+ ViewAction diff --git a/src/qtgui/i18n/recoll_it.ts b/src/qtgui/i18n/recoll_it.ts index 8fb153d0..4241e613 100644 --- a/src/qtgui/i18n/recoll_it.ts +++ b/src/qtgui/i18n/recoll_it.ts @@ -437,7 +437,7 @@ Click Cancel if you want to edit the configuration file before indexation starts Query results (sorted) - Risultati ricerca (ordinati) + Risultati ricerca (ordinati) Document history @@ -539,6 +539,56 @@ Aprire la finestra delle preferenze ? Stop &Indexing + + All + + + + media + multimediali + + + message + + + + other + altri + + + presentation + + + + spreadsheet + + + + text + + + + sorted + + + + filtered + + + + External applications/commands needed and not found for indexing your file types: + + + + + + No helpers found missing + + + + Missing helper programs + + RclMainBase @@ -730,6 +780,14 @@ Aprire la finestra delle preferenze ? &Indexing configuration Conf&igurazione indicizzazione + + All + + + + &Show missing helpers + + RclResList @@ -860,6 +918,18 @@ Aprire la finestra delle preferenze ? Documents <b>%1-%2</b> for Documenti <b>%1-%2</b> per + + filtered + + + + sorted + + + + Document history + Cronologia dei documenti + ResListBase @@ -1599,6 +1669,10 @@ Questo dovrebbe dare la precedenza ai risultati che contengono i termini esattam Highlight color for query terms + + Prefer Html to plain text for preview. + + ViewAction diff --git a/src/qtgui/i18n/recoll_ru.ts b/src/qtgui/i18n/recoll_ru.ts index 00b9bf14..c42f93b2 100644 --- a/src/qtgui/i18n/recoll_ru.ts +++ b/src/qtgui/i18n/recoll_ru.ts @@ -392,7 +392,7 @@ Click Cancel if you want to edit the configuration file before indexation starts Query results (sorted) - Результаты поиска (сортированные) + Результаты поиска (сортированные) Advanced search @@ -534,6 +534,56 @@ Do you want to start the preferences dialog ? 
Can't start query: + + All + + + + media + мультимедиа + + + message + + + + other + иное + + + presentation + + + + spreadsheet + + + + text + + + + sorted + + + + filtered + + + + External applications/commands needed and not found for indexing your file types: + + + + + + No helpers found missing + + + + Missing helper programs + + RclMainBase @@ -685,6 +735,14 @@ Do you want to start the preferences dialog ? &Indexing configuration Настройки ин&дексирования + + All + + + + &Show missing helpers + + RclResList @@ -948,6 +1006,18 @@ Do you want to start the preferences dialog ? Copy &URL + + filtered + + + + sorted + + + + Document history + История документов + ResListBase @@ -1674,6 +1744,10 @@ This should give higher precedence to the results where the search terms appear Deactivate All + + Prefer Html to plain text for preview. + + ViewAction diff --git a/src/qtgui/i18n/recoll_tr.ts b/src/qtgui/i18n/recoll_tr.ts index 592c6460..a29a14d0 100644 --- a/src/qtgui/i18n/recoll_tr.ts +++ b/src/qtgui/i18n/recoll_tr.ts @@ -352,7 +352,7 @@ Click Cancel if you want to edit the configuration file before indexation starts Query results (sorted) - Arama sonuçları (sıralanmış) + Arama sonuçları (sıralanmış) Cannot retrieve document info from database @@ -426,6 +426,56 @@ Tercihler penceresini açmak ister misiniz? Stop &Indexing + + All + + + + media + ortamlar + + + message + + + + other + diğer + + + presentation + + + + spreadsheet + + + + text + + + + sorted + + + + filtered + + + + External applications/commands needed and not found for indexing your file types: + + + + + + No helpers found missing + + + + Missing helper programs + + RclMainBase @@ -549,6 +599,14 @@ Tercihler penceresini açmak ister misiniz? &Indexing configuration İ&ndeksleme yapılandırması + + All + + + + &Show missing helpers + + ResList @@ -616,6 +674,18 @@ Tercihler penceresini açmak ister misiniz? 
Query details Sorgu detayları + + filtered + + + + sorted + + + + Document history + Belge geçmişi + SSearch @@ -1047,6 +1117,10 @@ Büyük boyutlu belgelerde yavaş olabilir. Highlight color for query terms + + Prefer Html to plain text for preview. + + ViewAction diff --git a/src/qtgui/i18n/recoll_uk.ts b/src/qtgui/i18n/recoll_uk.ts index 7f1b1985..53dfbc00 100644 --- a/src/qtgui/i18n/recoll_uk.ts +++ b/src/qtgui/i18n/recoll_uk.ts @@ -389,7 +389,7 @@ Click Cancel if you want to edit the configuration file before indexation starts Query results (sorted) - Результати запиту (сортовано) + Результати запиту (сортовано) Advanced search @@ -527,6 +527,56 @@ Do you want to start the preferences dialog ? Can't start query: + + All + + + + media + мультимедіа + + + message + + + + other + інше + + + presentation + + + + spreadsheet + + + + text + + + + sorted + + + + filtered + + + + External applications/commands needed and not found for indexing your file types: + + + + + + No helpers found missing + + + + Missing helper programs + + RclMainBase @@ -678,6 +728,14 @@ Do you want to start the preferences dialog ? &Indexing configuration &Конфіґурація індексування + + All + + + + &Show missing helpers + + RclResList @@ -808,6 +866,18 @@ Do you want to start the preferences dialog ? Copy &URL + + filtered + + + + sorted + + + + Document history + Історія документів + SSearch @@ -1505,6 +1575,10 @@ This should give higher precedence to the results where the search terms appear Deactivate All + + Prefer Html to plain text for preview. + + ViewAction diff --git a/src/qtgui/i18n/recoll_xx.ts b/src/qtgui/i18n/recoll_xx.ts index 4c1c7f0b..13f56092 100644 --- a/src/qtgui/i18n/recoll_xx.ts +++ b/src/qtgui/i18n/recoll_xx.ts @@ -305,10 +305,6 @@ Click Cancel if you want to edit the configuration file before indexation starts Query results - - Query results (sorted) - - Cannot retrieve document info from database @@ -379,6 +375,56 @@ Do you want to start the preferences dialog ? 
Stop &Indexing + + All + + + + media + + + + message + + + + other + + + + presentation + + + + spreadsheet + + + + text + + + + sorted + + + + filtered + + + + External applications/commands needed and not found for indexing your file types: + + + + + + No helpers found missing + + + + Missing helper programs + + RclMainBase @@ -502,6 +548,14 @@ Do you want to start the preferences dialog ? &Indexing configuration + + All + + + + &Show missing helpers + + ResList @@ -569,6 +623,18 @@ Do you want to start the preferences dialog ? Query details + + filtered + + + + sorted + + + + Document history + + SSearch @@ -998,6 +1064,10 @@ May be slow for big documents. Highlight color for query terms + + Prefer Html to plain text for preview. + + ViewAction diff --git a/src/qtgui/rclmain.ui b/src/qtgui/rclmain.ui index 60fef3e2..2625f176 100644 --- a/src/qtgui/rclmain.ui +++ b/src/qtgui/rclmain.ui @@ -368,7 +368,7 @@ - + ssearch_w.h reslist.h diff --git a/src/qtgui/rclmain_w.cpp b/src/qtgui/rclmain_w.cpp index 5594855d..41df30e2 100644 --- a/src/qtgui/rclmain_w.cpp +++ b/src/qtgui/rclmain_w.cpp @@ -1,5 +1,5 @@ #ifndef lint -static char rcsid[] = "@(#$Id: rclmain_w.cpp,v 1.56 2008-10-08 16:15:22 dockes Exp $ (C) 2005 J.F.Dockes"; +static char rcsid[] = "@(#$Id: rclmain_w.cpp,v 1.57 2008-10-13 07:57:12 dockes Exp $ (C) 2005 J.F.Dockes"; #endif /* * This program is free software; you can redistribute it and/or modify @@ -36,6 +36,8 @@ using std::pair; #if (QT_VERSION < 0x040000) #include #include +#include +#include #endif #include #include