From 19aa3cf60794292f76dfa4cf0adea72767fc2a9b Mon Sep 17 00:00:00 2001 From: Jean-Francois Dockes Date: Sat, 29 Jan 2011 18:21:58 +0100 Subject: [PATCH] described the new table result display --- src/doc/man/recollindex.1 | 24 ++++- src/doc/user/usermanual.sgml | 193 ++++++++++++++++++++++------------- website/features.html | 7 +- 3 files changed, 151 insertions(+), 73 deletions(-) diff --git a/src/doc/man/recollindex.1 b/src/doc/man/recollindex.1 index 1b08c006..aad76d7a 100644 --- a/src/doc/man/recollindex.1 +++ b/src/doc/man/recollindex.1 @@ -33,7 +33,7 @@ recollindex \- indexing command for the Recoll full text search system ] .B -i - +[] .br .B recollindex [ @@ -41,7 +41,7 @@ recollindex \- indexing command for the Recoll full text search system ] .B -e - +[] .br .B recollindex [ @@ -115,12 +115,30 @@ The other modes are useful mainly for testing. .PP .B recollindex -i will index individual files into the database. The stem expansion databases -will not be updated. +will not be updated. .PP .B recollindex -e will erase data for individual files from the database. The stem expansion databases will not be updated. .PP +With options +.B -i +or +.B -e +, if no file names are given on the command line, they +will be read from stdin, so that you could for example run: +.PP +find /path/to/dir -print | recollindex -e +.PP +followed by +.PP +find /path/to/dir -print | recollindex -i +.PP +to force the reindexing of a directory tree (which has to exist inside the +file system area defined by +.I topdirs +in recoll.conf). +.PP .B recollindex -s will build the stem expansion database for a given language, which may or may not be part of the list in the configuration file. If the language is diff --git a/src/doc/user/usermanual.sgml b/src/doc/user/usermanual.sgml index 0540763a..631f1d26 100644 --- a/src/doc/user/usermanual.sgml +++ b/src/doc/user/usermanual.sgml @@ -79,26 +79,26 @@ those terms are prominent, in a similar way to Internet search engines. 
- &RCL; tries to determine which documents are most relevant to - the search terms you provide. Computer algorithms for determining - relevance can be very complex, and in general are inferior to the - power of the human mind to rapidly determine relevance. The quality - of relevance guessing by the search tool is probably the most - important element for a search application. + A search application tries to determine which documents are + most relevant to the search terms you provide. Computer algorithms + for determining relevance can be very complex, and in general are + inferior to the power of the human mind to rapidly determine + relevance. The quality of relevance guessing is probably the most + important aspect when evaluating a search application. In many cases, you are looking for all the forms of a - word, not for a specific form or spelling. These different - forms may include plurals, different tenses for a verb, or - terms derived from the same root or stem - (example: floor, floors, floored, flooring...). &RCL; will by - default expand queries to all such related terms (words that - reduce to the same stem). This expansion can be disabled at - search time. + word, not for a specific form or spelling. These different forms + may include plurals, different tenses for a verb, or terms derived + from the same root or stem (example: floor, + floors, floored, flooring...). Search applications usually expand + queries to all such related terms (words that reduce to the same + stem) and also provide a way to disable this expansion if you are + actually searching for a specific form. - Stemming, by itself, does not accommodate for misspellings or + Stemming, by itself, does not accommodate for misspellings or phonetic searches. &RCL; supports these features through a specific tool (the term explorer) which will let you - explore the set of index terms along different modes. + explore the set of index terms along different modes. 
@@ -111,8 +111,8 @@ library as its storage and retrieval engine. &XAP; is a very mature package using a sophisticated - probabilistic ranking model. &RCL; provides the interface - to get data into (indexing) and out (searching) of the system. + probabilistic ranking model. &RCL; provides the mechanisms + and interface to get data into and out of the system. In practice, &XAP; works by remembering where terms appear in your document files. The acquisition process is called @@ -160,10 +160,16 @@ recoll search graphical user interface, or by executing the recollindex command. - Searches are - performed inside the recoll - program, which has many options to help you find what you are - looking for. + Searches are usually + performed inside the recoll graphical user + interface (GUI) program, which has many options to help you find + what you are looking for. However, there are other ways to perform + &RCL; searches: mostly a + command line tool, a + + Python + programming interface, and a + KDE KIO slave module. @@ -202,12 +208,11 @@ Real time indexing: indexing takes place as soon as a file is created or changed. recollindex runs as a daemon - and uses a file system alteration monitor such as - Fam, - Gamin or - inotify do detect file changes. - Monitoring a big directory tree can consume significant - system resources. + and uses a file system alteration monitor such as + inotify, + Fam or + Gamin + to detect file changes. @@ -217,15 +222,21 @@ indexes (ie: use periodic indexing on a big documentation directory, and real time indexing on a small home directory). Monitoring a big file system tree can consume - significant system resources, for dubious gains. + significant system resources. &RCL; knows about quite a few different document types. The parameters for document types recognition and processing are set in - configuration files - Most file types, like HTML or word processing files, only hold - one document. 
Some file types, like mail folder files, can hold - many individually indexed documents. + configuration files. + + + Most file types, like HTML or word processing files, only hold + one document. Some file types, like mail folder files or zip + archives, can hold many individually indexed documents, which may + in turn be themselves compound ones. Such hierarchies can go quite + deep, and &RCL; has no problem processing, for example, an ms-word + document which would be an attachment to an email message part of + a folder file archived inside a zip file... &RCL; indexing processes plain text, HTML, openoffice @@ -509,18 +520,20 @@ recoll The indexing process can be interrupted by sending an interrupt (^C, SIGINT) or terminate (SIGTERM) signal. Some time may elapse before the process exits, because it needs to properly flush - and close the index. The indexing will restart at the - interruption point the next time (the full file tree will still be - traversed, but files that were indexed up to the interruption and - are still up to date will not need to be reindexed). + and close the index. After such an interruption, the index will be somewhat inconsistent because some operations which are normally performed at the end of the indexing pass will have been skipped (for exemple, the stemming and spelling databases will be inexistant or out of date). You just need to restart indexing at a later - time to restore consistency. + time to restore consistency. The indexing will restart at the + interruption point (the full file tree will be traversed, + but files that were indexed up to the interruption and are still + up to date will not need to be reindexed). + recollindex has a number of other options + which are described in its man page. @@ -635,7 +648,7 @@ fvwm a single entry field where you can enter multiple words. 
Advanced search (a panel accessed through the - Tools menu or the toolbox bar icon) shas + Tools menu or the toolbox bar icon) has multiple entry fields, which you may use to build a logical condition, with additional filtering on file type and location in the file system. @@ -675,11 +688,17 @@ fvwm - The initial default search mode is All - terms. This will look for documents containing all - of the search terms (the ones with more terms will get better - scores). Any term will search for - documents where at least one of the terms appear. + The initial default search mode is Query + language. Without special directives, this will look for + documents containing all of the search terms (the ones with more + terms will get better scores), just like the All + terms mode which will ignore such + directives. Any term will search for documents + where at least one of the terms appear. + + The Query Language features are + described in a separate + section. File name will specifically look for file names. The entry will be split at white space characters, @@ -718,10 +737,6 @@ fvwm efficiently on a relatively small subset of the index (allowing wild cards on the left of terms without excessive penality). - The fourth entry (Query Language) is - described in its own - section. - All search modes allow wildcards inside terms (*, ?, []). You may want to have a look at the @@ -768,16 +783,18 @@ fvwm - The result list + The default result list After starting a search, a list of results will instantly be displayed in the main list window. By default, the document list is presented in order of relevance (how well the system estimates that the document - matches the query). You can specify a different ordering by - using the Tools - / Sort parameters dialog. + matches the query). 
You can sort the result by ascending or + descending date by using the vertical arrows in the toolbar (the old + sort tool is gone after release 1.15, because the new result table has much better + capability). Clicking on the Preview link for an entry will open an @@ -871,21 +888,53 @@ fvwm current result. The Parent document entries will - appear for documents which are not actually files but are - part of, or attached to, a higher level document. This entry - is mainly useful for email attachments and permits viewing - the message to which the document is attached. Note that the - entry will also appear for an email which is part of an mbox - folder file, but that you can't actually visualize the - folder (there will be an error dialog if you try). &RCL; is - unfortunately not yet smart enough to disable the entry in - this case. In other cases, the Open option makes sense, for - exemple to start a chm viewer on the parent document for a help - page. + appear for documents which are not actually files but are part + of, or attached to, a higher level document. This entry is mainly + useful for email attachments and permits viewing the message to + which the document is attached. Note that the entry will also + appear for an email which is part of an mbox folder file, but + that you can't actually visualize the folder (there will be an + error dialog if you try). &RCL; is unfortunately not yet smart + enough to disable the entry in this case. In other cases, the + Open option makes sense, for exemple to + start a chm viewer on the parent + document for a help page. + + The alternate result table + + In &RCL; 1.15 and newer, the results can now be shown in a + spreadsheet-like display. You can switch to this presentation by + clicking the table-like icon in the toolbar (this is a toggle, + click again to restore the list). + + Clicking on the column headers will allow sorting by the + values in the column. 
You can click again to invert the order, and + use the header right-click menu to reset sorting to the default + relevance order. + + Both the list and the table display the same underlying + results. The sort order set from the table is still active if you + switch back to the list mode. You can click twice on a date sort + arrow to reset it from there. + + The header right-click menu allows adding or deleting + columns. The columns can be resized, and their order can be changed + (by dragging). All the changes are recorded when you quit + recoll + + Hovering over a table row will update the detail area at the + bottom of the window with the corresponding values. You can click + the row to freeze the display. The bottom area is equivalent to a + classical result list paragraph, with links for + starting a preview or a native application, and an equivalent + right-click menu. + + + The preview window @@ -2041,12 +2090,12 @@ text/html [file:///Users/uncrypted-dockes/projets/bateaux/ilur/factEtCie/r Hotkeying recoll It is surprisingly convenient to be able to show or hide the - &RCL; GUI with a single keystroke. Recoll comes with a small - python script, based on the libwnck window manager - interface library, which will allow you to do just this. The detailed - instructions are on - - this wiki page. + &RCL; GUI with a single keystroke. Recoll comes with a small + Python script, based on the libwnck window + manager interface library, which will allow you to do just + this. The detailed instructions are on + + this wiki page. @@ -2811,7 +2860,13 @@ while query.next >= 0 and query.next < nres: Zip archives need Python (and the standard zipfile module). - + + Midi karaoke files need + Python and the + + Midi module + + Text, HTML, mail folders, and Scribus files are diff --git a/website/features.html b/website/features.html index 7f23a03f..5d9fcebe 100644 --- a/website/features.html +++ b/website/features.html @@ -198,11 +198,16 @@ on mutagen for all audio types. -
  • Image file tags support with Image file tags with exiftool. This is a perl program, so you also need perl on the system. This works with about any possible image file and tag format (jpg, png, tiff, gif etc.).
+  • Midi karaoke files with Python and the midi module.

    Other features