From bc83e2981e247b6c65343fb9b3328e0f189de35a Mon Sep 17 00:00:00 2001 From: Jean-Francois Dockes Date: Mon, 2 Mar 2020 14:08:19 +0100 Subject: [PATCH] doc --- src/doc/user/usermanual.html | 507 ++++++++++++++++------------- src/doc/user/usermanual.xml | 615 ++++++++++++++++++----------------- src/windows/mimeview | 2 +- 3 files changed, 600 insertions(+), 524 deletions(-) diff --git a/src/doc/user/usermanual.html b/src/doc/user/usermanual.html index cb4b293a..43fad0ee 100644 --- a/src/doc/user/usermanual.html +++ b/src/doc/user/usermanual.html @@ -10,7 +10,7 @@ + "Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license can be found at the following location: GNU web site. This document introduces full text search notions and describes the installation and use of the Recoll application. This version describes Recoll 1.26."> @@ -35,7 +35,7 @@ alink="#0000FF">
-
@@ -53,7 +53,7 @@ alink="#0000FF"> and describes the installation and use of the Recoll application. This version describes Recoll 1.25.

+ "application">Recoll 1.26.

@@ -92,9 +92,9 @@ alink="#0000FF"> "#RCL.INDEXING.INTRODUCTION.CONFIG">Configurations, multiple indexes
2.1.3. Document types
+ "#idm235">Document types
2.1.4. Indexing failures
+ "#idm276">Indexing failures
2.1.5. Recovery
@@ -190,17 +190,18 @@ alink="#0000FF"> "#RCL.SEARCH.GUI.SIMPLE">Simple search
3.2.2. The default result + "#RCL.SEARCH.GUI.RESLIST">The result list
3.2.3. The result table
3.2.4. Running arbitrary - commands on result files (1.20 and - later)
+ "#RCL.SEARCH.GUI.RUNSCRIPT">Unix-like systems: running + arbitrary commands on result files
3.2.5. Displaying + "#RCL.SEARCH.GUI.THUMBNAILS">Unix-like systems: displaying thumbnails
3.2.6. The preview @@ -429,7 +430,7 @@ alink="#0000FF">

This document introduces full text search notions and describes the installation and use of the Recoll application. It is updated for - Recoll 1.25.

+ Recoll 1.26.

Recoll was for a long time dedicated to Unix-like systems. It was only lately (2015) ported to MS-Windows. @@ -440,10 +441,13 @@ alink="#0000FF"> updated. Until this happens, on Windows, most references to shared files can be translated by looking under the Recoll installation - directory (esp. the Share - subdirectory). The user configuration is stored by default - under AppData/Local/Recoll - inside the user directory, along with the index itself.

+ directory (typically C:/Program Files + (x86)/Recoll; in particular, anything referenced in /usr/share in this document will be found + in the Share subdirectory). + The user configuration is stored by default under + AppData/Local/Recoll inside the + user directory, along with the index itself.

@@ -652,10 +656,12 @@ alink="#0000FF"> files in this directory may be overridden by values set inside your personal configuration. With the default configuration, Recoll will - index your home directory with generic parameters. The - configuration can be customized either by editing the text - files or by using configuration menus in the recoll GUI.

+ index your home directory with generic parameters. Most + common parameters can be set by using configuration menus + in the recoll + GUI. Some less common parameters can only be set by editing + the text files (the new values will be preserved by the + GUI).

The indexing process is started automatically (after asking permission), the first time you @@ -744,11 +750,9 @@ alink="#0000FF">

recollindex skips files which caused an error during a previous pass. This is a - performance optimization, and a new behaviour in version - 1.21 (failed files were always retried by previous - versions). The command line option -k can be set to retry failed files, for - example after updating an input handler.

+ performance optimization, and the command line option + -k can be set to retry failed + files, for example after updating an input handler.

The following sections give an overview of different aspects of the indexing processes and configuration, with links to detailed sections.

@@ -791,9 +795,10 @@ alink="#0000FF"> into your cron file. On Windows, this is - the only mode available, and the indexer is usually - started from the GUI (but there is nothing to - prevent starting it from a command script).

+ the only mode available, and the Windows Task + Scheduler can be used to run indexing. In both + cases, the GUI includes an easy interface to the + system batch scheduler.

  • -

    Unix-like systems: choosing an indexing mode

    @@ -837,7 +842,7 @@ alink="#0000FF"> configured from the recoll GUI: PreferencesIndexing schedule

    + "guimenuitem">Indexing schedule dialog.

    @@ -935,8 +940,8 @@ alink="#0000FF">
    -

    2.1.3. Document types

    +

    2.1.3. Document types

    @@ -1033,8 +1038,8 @@ alink="#0000FF">
    -

    2.1.4. Indexing failures

    +

    2.1.4. Indexing failures

    @@ -1042,20 +1047,15 @@ alink="#0000FF"> reasons: a helper program may be missing, the document may be corrupt, we may fail to uncompress a file because no file system space is available, etc.

    -

    Recoll versions prior - to 1.21 always retried to index files which had - previously caused an error. This guaranteed that anything - that may have become indexable (for example because a - helper had been installed) would be indexed. However this - was bad for performance because some indexing failures - may be quite costly (for example failing to uncompress a - big file because of insufficient disk space).

    -

    The indexer in Recoll +

    The Recoll indexer in versions 1.21 and later does not retry failed files by - default. Retrying will only occur if an explicit option - (-k) is set on the - recollindex - command line, or if a script executed when -k) is set on the recollindex command + line, or if a script executed when recollindex starts up says so. The script is defined by a configuration variable ( files where only the tags would be indexed).

    Of course, images, sound and video do not increase the index size, which means that in most cases, the space used - by the index will be negligible against the total amount of - data on the computer.

    + by the index will be negligible compared to the total + amount of data on the computer.

    The index data directory (xapiandb) only contains data that can be completely rebuilt by an index run (as long as the original @@ -1295,13 +1295,14 @@ alink="#0000FF">

  • -

    Variables set inside the Variables stored inside the Recoll configuration files control which areas of the file system are indexed, and how files - are processed. These variables can be set either by editing - the text files or by using the dialogs in the recoll @@ -1322,7 +1323,7 @@ alink="#0000FF"> "https://www.lesbonscomptes.com/recoll/manpages/recoll.conf.5.html" target="_top">recoll.conf(5) manual - page.Both documents are automatically generated from the + page. Both documents are automatically generated from the comments inside the configuration file.

    The most immediately useful variable is probably "#RCL.INSTALL.EXTERNAL" title= "5.2. Supporting packages">external packages section.

    -

    As of Recoll 1.18 there are two incompatible types of - Recoll indexes, depending on the treatment of character - case and diacritics. A There are two incompatible types of Recoll indexes, + depending on the treatment of character case and + diacritics. A further - section describes the two types in more detail.

    + section describes the two types in more detail. The + default type is appropriate in most cases.

    @@ -1721,12 +1723,13 @@ recoll -c

    By indexing the volume in the main, fixed, index, and ensuring that the volume data is not purged if - the indexing runs while the volume is mounted. - (Recoll 1.25.2).

    + the indexing runs while the volume is mounted. (since + Recoll 1.25.2).

  • By storing a volume index on the volume itself - (Recoll 1.24).

    + (since Recoll + 1.24).

  • @@ -2337,23 +2340,23 @@ metadatacmds = ;

    The GUI File menu has - entries to start or stop the current indexing - operation.

    -

    When no indexing is running, you have a choice of - updating the index or rebuilding it (the first choice + entries to start or stop the current indexing operation. + When indexing is not currently running, you have a choice + of updating the index or rebuilding it (the first choice only processes changed files, the second one zeroes the index before starting so that all files are processed).

    +

    On Linux and Windows, the GUI can be used to manage + the indexing operation. Stopping the indexer can be done + from the recoll GUI FileStop + Indexing menu entry.

    On Linux, the recollindex indexing process can be interrupted by sending an interrupt (Ctrl-C, SIGINT) or terminate (SIGTERM) signal.

    -

    On Linux and Windows, the GUI can used to manage the - indexing operation. Stopping the indexer can be done from - the recoll - GUI FileStop Indexing menu entry.

    When stopped, some time may elapse before recollindex exits, because it needs to properly flush and close the @@ -2368,6 +2371,18 @@ metadatacmds = ; +

    +
    +
    +
    +
    +

    recollindex + command line

    +
    +
    +

    recollindex has many options which are listed in its Qt library.

    recoll has - two search modes:

    + two search interfaces:

    • @@ -2804,40 +2819,6 @@ fs.inotify.max_user_watches=32768 features are described in a separate section.

      -

      The File name search - mode will specifically look for file names. The point of - having a separate file name search is that wild card - expansion can be performed more efficiently on a small - subset of the index (allowing wild cards on the left of - terms without excessive cost). Things to know:

      -
      -
        -
      • -

        White space in the entry should match white - space in the file name, and is not treated - specially.

        -
      • -
      • -

        The search is insensitive to character case and - accents, independently of the type of index.

        -
      • -
      • -

        An entry without any wild card character and not - capitalized will be prepended and appended with '*' - (ie: etc - -> *etc*, but - Etc -> - etc).

        -
      • -
      • -

        If you have a big index (many files), - excessively generic fragments may result in - inefficient searches.

        -
      • -
      -

      When using a stripped index (the default), character case has no influence on search, except that you can disable stem expansion for any term by capitalizing it. @@ -2883,6 +2864,40 @@ fs.inotify.max_user_watches=32768 "guimenu">Tools → Advanced search dialog for more complex searches.

      +

      The File name search + mode will specifically look for file names. The point of + having a separate file name search is that wild card + expansion can be performed more efficiently on a small + subset of the index (allowing wild cards on the left of + terms without excessive cost). Things to know:

      +
      +
        +
      • +

        White space in the entry should match white + space in the file name, and is not treated + specially.

        +
      • +
      • +

        The search is insensitive to character case and + accents, independently of the type of index.

        +
      • +
      • +

        An entry without any wild card character and not + capitalized will be prepended and appended with '*' + (ie: etc + -> *etc*, but + Etc -> + etc).

        +
      • +
      • +

        If you have a big index (many files), + excessively generic fragments may result in + inefficient searches.

        +
      • +
      +
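The file name search rules listed above can be illustrated with a small Python sketch. This is not Recoll code: the function name, the set of wild card characters, and the use of lowercasing to stand in for case insensitivity are all assumptions made for the example (accent folding is not modelled).

```python
# Illustrative sketch only; not taken from the Recoll sources.
WILDCARD_CHARS = "*?["  # assumption: characters treated as wild cards


def expand_file_name_entry(entry: str) -> str:
    """Apply the implicit '*' rule described for file name searches."""
    has_wildcard = any(ch in WILDCARD_CHARS for ch in entry)
    is_capitalized = entry[:1].isupper()
    # Assumption: case insensitivity is modelled by lowercasing the entry.
    pattern = entry.lower()
    if has_wildcard or is_capitalized:
        # Explicit wild cards or a capitalized entry: no implicit '*'.
        return pattern
    # Plain lowercase entry: match it anywhere inside the file name.
    return "*" + pattern + "*"
```

With this sketch, `etc` becomes `*etc*`, while `Etc` is searched as `etc`, matching the two cases given in the list.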
    @@ -2890,27 +2905,27 @@ fs.inotify.max_user_watches=32768

    3.2.2. The - default result list

    + result list

    After starting a search, a list of results will - instantly be displayed in the main list window.

    + instantly be displayed in the main window.

    By default, the document list is presented in order of relevance (how well the system estimates that the document matches the query). You can sort the result by ascending or descending date by using the vertical arrows in the toolbar.

    -

    Clicking on the Preview - link for an entry will open an internal preview window - for the document. Further Preview clicks for the same search will - open tabs in the existing preview window. You can use - Shift+Click - to force the creation of another preview window, which - may be useful to view the documents side by side. (You - can also browse successive results in a single preview - window by typing Clicking the Preview link + for an entry will open an internal preview window for the + document. Further Preview + clicks for the same search will open tabs in the existing + preview window. You can use Shift+Click to force the + creation of another preview window, which may be useful + to view the documents side by side. (You can also browse + successive results in a single preview window by typing + Shift+ArrowUp/Down in the window).

    @@ -2918,39 +2933,23 @@ fs.inotify.max_user_watches=32768 will start an external viewer for the document. By default, Recoll lets the desktop choose the appropriate application for most - document types (there is a short list of exceptions, see - further). If you prefer to completely customize the - choice of applications, you can uncheck the Use desktop preferences option in the - GUI preferences dialog, and click the Choose editor applications button to - adjust the predefined Recoll choices. The tool accepts - multiple selections of MIME types (e.g. to set up the - editor for the dozens of office file types).

    -

    Even when Use desktop - preferences is checked, there is a small list of - exceptions, for MIME types where the Recoll choice should override the - desktop one. These are applications which are well - integrated with Recoll, - especially evince for - viewing PDF and Postscript files because of its support - for opening the document at a specific page and passing a - search string as an argument. Of course, you can edit the - list (in the GUI preferences) if you would prefer to lose - the functionality and use the standard desktop tool.

    -

    You may also change the choice of applications by - editing the mimeview configuration file if you - find this more convenient.

    -

    Each result entry also has a right-click menu with an - Open With entry. This lets - you choose an application from the list of those which - registered with the desktop for the document MIME - type.

    + document types. This is currently not customisable on + Windows. See further + for customizing the applications on Unix-like systems.

    +

    You can click on the Query + details link at the top of the results page to see + the query actually performed, after stem expansion and + other processing.

    +

    Double-clicking on any word inside the result list or + a preview window will insert it into the simple search + text.

    +

    The result list is divided into pages (the size of + which you can change in the preferences). Use the arrow + buttons in the toolbar or the links at the bottom of the + page to browse the results.

    The Preview and Open edit links may not be present for all entries, meaning that edit an HTML fragment.

    -

    You can click on the Query - details link at the top of the results page to see - the query actually performed, after stem expansion and - other processing.

    -

    Double-clicking on any word inside the result list or - a preview window will insert it into the simple search - text.

    -

    The result list is divided into pages (the size of - which you can change in the preferences). Use the arrow - buttons in the toolbar or the links at the bottom of the - page to browse the results.

    +
    +
    +
    +
    +

    Unix-like + systems: customising the applications

    +
    +
    +
    +

    By default Recoll + lets the desktop choose what application should be used + to open a given document, with exceptions.

    +

    The details of this behaviour can be customized with + the Preferences → + GUI configuration → + User interface → + Choose editor + applications dialog or by editing the mimeview configuration file.

    +

    When Use desktop + preferences, at the top of the dialog, is + checked, there is a small list of exceptions, for MIME + types where the Recoll + choice should override the desktop one. These are + applications which are well integrated with + Recoll, for example, + on Linux, evince for + viewing PDF and Postscript files because of its support + for opening the document at a specific page and passing + a search string as an argument. You can add or remove + document types to the exceptions by using the + dialog.

    +

    If you prefer to completely customize the choice of + applications, you can uncheck Use desktop preferences, in which + case the Recoll + predefined applications will be used, and can be + changed for each document type. This is probably not + the most convenient approach in most cases.

    +

    In all cases, the applications choice dialog accepts + multiple selections of MIME types in the top section, + and lets you define how they are processed in the + bottom one.

    +

    You may also change the choice of applications by + editing the mimeview configuration file if + you find this more convenient.

    +

    Under Unix-like + systems, each result list entry also has a right-click + menu with an Open With + entry. This lets you choose an application from the + list of those which registered with the desktop for the + document MIME type, on a case by case basis.

    +
    @@ -3063,18 +3111,20 @@ fs.inotify.max_user_watches=32768

    The Preview and Open entries do the same thing as the corresponding links.

    -

    Open With lets you - open the document with one of the applications claiming - to be able to handle its MIME type (the information - comes from the .desktop - files in Open With + (Unix-like systems) + lets you open the document with one of the applications + claiming to be able to handle its MIME type (the + information comes from the .desktop files in /usr/share/applications).

    -

    Run Script allows - starting an arbitrary command on the result file. It - will only appear for results which are top-level files. - See +

    Run Script + (Unix-like systems) + allows starting an arbitrary command on the result + file. It will only appear for results which are + top-level files. See further for a more detailed description.

    The Copy File Name and Copy Url copy the @@ -3129,11 +3179,11 @@ fs.inotify.max_user_watches=32768

    -

    In Recoll 1.15 and - newer, the results can be displayed in spreadsheet-like - fashion. You can switch to this presentation by clicking - the table-like icon in the toolbar (this is a toggle, - click again to restore the list).

    +

    As an alternative to the result list, the results can + also be displayed in spreadsheet-like fashion. You can + switch to this presentation by clicking the table-like + icon in the toolbar (this is a toggle, click again to + restore the list).

    Clicking on the column headers will allow sorting by the values in the column. You can click again to invert the order, and use the header right-click menu to reset @@ -3164,9 +3214,9 @@ fs.inotify.max_user_watches=32768

    3.2.4. Running - arbitrary commands on result files (1.20 and - later)

    + "RCL.SEARCH.GUI.RUNSCRIPT">3.2.4. Unix-like + systems: running arbitrary commands on result + files
    @@ -3217,8 +3267,8 @@ fs.inotify.max_user_watches=32768

    3.2.5. Displaying - thumbnails

    + "RCL.SEARCH.GUI.THUMBNAILS">3.2.5. Unix-like + systems: displaying thumbnails
    @@ -3239,10 +3289,10 @@ fs.inotify.max_user_watches=32768 settings). Restarting the search should then display the thumbnails.

    There are also some pointers about thumbnail - generation on the Recoll wiki.

    + generation in the Recoll FAQ.

    @@ -3269,10 +3319,8 @@ fs.inotify.max_user_watches=32768 "keycap">Ctrl-W (Ctrl + W) in the window. - Closing the last tab for a window will also close the - window.

    -

    Of course you can also close a preview window by using - the window manager button in the top of the frame.

    + Closing the last tab, or using the window manager button + in the top of the frame will also close the window.

    You can display successive or previous documents from the result list inside a preview tab by typing -

    This feature is new in Recoll 1.20, and will probably be - refined depending on user feedback.

    @@ -3879,11 +3924,11 @@ fs.inotify.max_user_watches=32768 Duplicates hiding is controlled by an entry in the GUI configuration dialog, and is off by default.

    -

    As of release 1.19, when a result document does have - undisplayed duplicates, a Dups link will be shown with the result - list entry. Clicking the link will display the paths - (URLs + ipaths) for the duplicate entries.

    +

    When a result document does have undisplayed + duplicates, a Dups link will + be shown with the result list entry. Clicking the link + will display the paths (URLs + ipaths) for the duplicate + entries.

    @@ -3994,27 +4039,23 @@ fs.inotify.max_user_watches=32768 "literal">reality or both appear, but those which contain virtual reality should appear sooner in the list.

    -

    Phrase searches can strongly slow down a query if - most of the terms in the phrase are common. This is why - the autophrase option is - off by default for Recoll versions before 1.17. As of - version 1.17, autophrase - is on by default, but very common terms will be removed - from the constructed phrase. The removal threshold can - be adjusted from the search preferences.

    -

    Phrases and abbreviations. As of - Recoll version 1.17, - dotted abbreviations like I.B.M. are also automatically indexed - as a word without the dots: IBM. Searching for the word inside a - phrase (ie: "the IBM - company") will only match the dotted - abrreviation if you increase the phrase slack (using - the advanced search panel control, or the o query language modifier). Literal - occurences of the word will be matched normally.

    +

    Phrase searches can slow down a query if most of the + terms in the phrase are common. If the autophrase option is on, very common + terms will be removed from the automatically + constructed phrase. The removal threshold can be + adjusted from the search preferences.
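As a rough illustration of the common term removal described above, the following Python sketch drops terms whose frequency exceeds a threshold before building the automatic phrase. The function name, the frequency table, the default threshold value and the behaviour when fewer than two terms remain are hypothetical; the actual Recoll implementation may differ.

```python
# Illustrative sketch only; names and the default threshold are hypothetical.
def build_autophrase(terms, term_freq_percent, max_freq_percent=2.0):
    """Return a quoted phrase built from the less common terms, or None.

    term_freq_percent maps a term to the percentage of documents which
    contain it (made-up input for this example).
    """
    kept = [
        term for term in terms
        if term_freq_percent.get(term, 0.0) <= max_freq_percent
    ]
    # Assumption: a phrase needs at least two terms to be worth adding.
    if len(kept) < 2:
        return None
    return '"' + " ".join(kept) + '"'
```

For example, with "the" present in 90% of the documents, the entry `the virtual reality` would yield the phrase `"virtual reality"` under these assumptions.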

    +

    Phrases and abbreviations. Dotted + abbreviations like I.B.M. + are also automatically indexed as a word without the + dots: IBM. Searching for + the word inside a phrase (i.e. "the IBM company") will only match the + dotted abbreviation if you increase the phrase slack + (using the advanced search panel control, or the + o query language + modifier). Literal occurrences of the word will be + matched normally.
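The double indexing of dotted abbreviations can be pictured with the following Python sketch. The regular expression and the function name are assumptions made for the example (single letters, each followed by a dot); the real indexer's detection rule is not described here.

```python
import re

# Hypothetical pattern: two or more single letters, each followed by a dot.
DOTTED_ABBREVIATION = re.compile(r"^(?:[A-Za-z]\.){2,}$")


def index_terms(word: str) -> list:
    """Return the terms stored for a word: the word itself, plus the
    undotted form when the word looks like a dotted abbreviation."""
    terms = [word]
    if DOTTED_ABBREVIATION.match(word):
        terms.append(word.replace(".", ""))
    return terms
```

Under these assumptions `I.B.M.` is stored both as `I.B.M.` and as `IBM`, while an ordinary word followed by a dot is left alone.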

    @@ -4476,18 +4517,24 @@ fs.inotify.max_user_watches=32768
    -

    Newer versions of Recoll (from 1.17) normally use - WebKit HTML widgets for the result list and the - snippets - window (this may be disabled at build time). Total - customisation is possible with full support for CSS and - Javascript. Conversely, there are limits to what you - can do with the older Qt QTextBrowser, but still, it is - possible to decide what data each result will contain, - and how it will be displayed.

    -

    The result list presentation can be exhaustively - customized by adjusting two elements:

    +

    Recoll normally uses a full function HTML processor + to display the result list and the snippets + window. Depending on the version, this may be based + on either Qt WebKit or Qt WebEngine. It is then + possible to completely customise the result list with + full support for CSS and Javascript.

    +

    It is also possible to build Recoll to use a simpler Qt + QTextBrowser widget to display the HTML, which may be + necessary if the ones above are not ported on the + system, or to reduce the application size and + dependencies. There are limits to what you can do in + this case, but it is still possible to decide what data + each result will contain, and how it will be + displayed.

    +

    The result list presentation can be customized by + adjusting two elements:

      @@ -4617,7 +4664,7 @@ fs.inotify.max_user_watches=32768 (if the document is embedded, the script will be started on the top-level parent). See the + "3.2.4. Unix-like systems: running arbitrary commands on result files"> section about defining scripts.

      In addition to the predefined values above, all strings like Recoll"> http://www.recoll.org/features.html"> - + Xapian"> Windows"> Unix-like systems"> @@ -32,7 +32,7 @@ - 2005-2019 + 2005-2020 Jean-Francois Dockes @@ -62,16 +62,18 @@ application. It is updated for &RCL; &RCLVERSION;. &RCL; was for a long time dedicated to Unix-like systems. It - was only lately (2015) ported to - MS-Windows. Many references in this - manual, especially file locations, are specific to Unix, and not - valid on &WIN;, where some described features are also not available. - The manual will be progressively updated. Until this happens, on - &WIN;, most references to shared files can be translated by looking - under the Recoll installation directory (esp. the - Share subdirectory). The user configuration is - stored by default under AppData/Local/Recoll - inside the user directory, along with the index itself. + was only lately (2015) ported to + MS-Windows. Many references in this + manual, especially file locations, are specific to Unix, and not + valid on &WIN;, where some described features are also not available. + The manual will be progressively updated. Until this happens, on + &WIN;, most references to shared files can be translated by looking + under the Recoll installation directory (Typically C:/Program + Files (x86)/Recoll, esp. anything referenced + in /usr/share in this document will be found int + the Share subdirectory). The user configuration is + stored by default under AppData/Local/Recoll + inside the user directory, along with the index itself. Giving it a try @@ -238,16 +240,18 @@ &RCL; uses many parameters to define exactly what to index, - and how to classify and decode the source documents. These are kept - in configuration files. A - default configuration is copied into a standard location (usually - something like /usr/share/recoll/examples) - during installation. 
The default values set by the configuration - files in this directory may be overridden by values set inside your - personal configuration. With the default configuration, &RCL; will - index your home directory with generic parameters. The configuration - can be customized either by editing the text files or by using - configuration menus in the recoll GUI. + and how to classify and decode the source documents. These are kept + in configuration files. A + default configuration is copied into a standard location (usually + something like /usr/share/recoll/examples) + during installation. The default values set by the configuration + files in this directory may be overridden by values set inside your + personal configuration. With the default configuration, &RCL; will + index your home directory with generic parameters. Most common + parameters can be set by using + configuration menus in the recoll GUI. Some less + common parameters can only be set by editing the text files (the + new values will be preserved by the GUI). The indexing process is started automatically (after asking permission), the @@ -303,11 +307,9 @@ or ). recollindex skips files which caused an - error during a previous pass. This is a performance - optimization, and a new behaviour in version 1.21 (failed files - were always retried by previous versions). The command line - option can be set to retry failed files, for - example after updating an input handler. + error during a previous pass. This is a performance optimization, and + the command line option can be set to retry + failed files, for example after updating an input handler. The following sections give an overview of different aspects of the indexing processes and configuration, with links @@ -329,15 +331,15 @@ <link linkend="RCL.INDEXING.PERIODIC">Periodic (or batch) indexing</link> - - recollindex is executed - at discrete times. On &LIN;, the typical usage is to have a - nightly run - programmed - into your cron file. 
On &WIN;, this is - the only mode available, and the indexer is usually started - from the GUI (but there is nothing to prevent starting it - from a command script). + recollindex is executed at + discrete times. On &LIN;, the typical usage is to have a + nightly run + + programmed + into your cron file. On &WIN;, this is + the only mode available, and the Windows Task Scheduler can + be used to run indexing. In both cases, the GUI includes an + easy interface to the system batch scheduler. @@ -367,7 +369,7 @@ Preferences Indexing schedule - + dialog. @@ -540,24 +542,19 @@ corrupt, we may fail to uncompress a file because no file system space is available, etc. - &RCL; versions prior to 1.21 always retried to index - files which had previously caused an error. This guaranteed - that anything that may have become indexable (for example - because a helper had been installed) would be indexed. However - this was bad for performance because some indexing failures - may be quite costly (for example failing to uncompress a big - file because of insufficient disk space). - - The indexer in &RCL; versions 1.21 and later does not - retry failed files by default. Retrying will only occur if an - explicit option () is set on the - recollindex command line, or if a script - executed when recollindex starts up says - so. The script is defined by a configuration variable - (checkneedretryindexscript), and makes a - rather lame attempt at deciding if a helper command may have - been installed, by checking if any of the common - bin directories have changed. + The &RCL; indexer in versions 1.21 and later does not + retry failed files by default, because some indexing failures + can be quite costly (for example failing to uncompress a big + file because of insufficient disk space). + Retrying will only occur if an explicit option + () is set on + the recollindex command line, or if a script + executed when recollindex starts up says + so. 
The script is defined by a configuration variable + (checkneedretryindexscript), and makes a + rather lame attempt at deciding if a helper command may have been + installed, by checking if any of the + common bin directories have changed. @@ -638,7 +635,7 @@ Of course, images, sound and video do not increase the index size, which means that in most cases, the space used by the index - will be negligible against the total amount of data on the + will be negligible compared to the total amount of data on the computer. The index data directory (xapiandb) @@ -727,13 +724,13 @@ Index configuration - Variables set inside the - &RCL; configuration files - control which areas of the file system are indexed, and how - files are processed. These variables can be set either by - editing the text files or by using the - dialogs in the recoll GUI. - + Variables stored inside the + &RCL; configuration files + control which areas of the file system are indexed, and how files + are processed. The values can be set by editing the text + files. Most of the more commonly used ones can also be adjusted by + using the + dialogs in the recoll GUI. The first time you start recoll, you will be asked whether or not you would like it to build the index. If you @@ -748,7 +745,7 @@ installation chapter of this document, or in the recoll.conf5 - manual page.Both documents are automatically generated from + manual page. Both documents are automatically generated from the comments inside the configuration file. The most immediately useful variable @@ -761,11 +758,11 @@ described in the external packages section. - As of Recoll 1.18 there are two incompatible types of Recoll - indexes, depending on the treatment of character case and - diacritics. A - further section - describes the two types in more detail. + There are two incompatible types of Recoll + indexes, depending on the treatment of character case and + diacritics. A further + section describes the two types in more detail. 
The default + type is appropriate in most cases. Multiple indexes @@ -1088,9 +1085,9 @@ recoll -c /path/to/my/new/config By indexing the volume in the main, fixed, index, and ensuring that the volume data is not purged if the indexing runs - while the volume is mounted. (&RCL; 1.25.2). + while the volume is mounted. (since &RCL; 1.25.2). By storing a volume index on the volume - itself (&RCL; 1.24). + itself (since &RCL; 1.24). @@ -1402,27 +1399,27 @@ metadatacmds = ; tags = tmsu tags %f The PDF input handler The PDF format is very important for scientific and technical - documentation, and document archival. It has extensive - facilities for storing metadata along with the document, and these - facilities are actually used in the real world. + documentation, and document archival. It has extensive + facilities for storing metadata along with the document, and these + facilities are actually used in the real world. In consequence, the rclpdf.py PDF input - handler has more complex capabilities than most others, and it is - also more configurable. Specifically, rclpdf.py - has the following features: - - It can be configured to extract - specific metadata tags from an XMP packet. - It can extract PDF - attachments. - It can automatically perform - OCR if the document text is empty. This is done by - executing an external program and is now described in a - separate - section, because the OCR framework can also be used - with non-PDF image files. - - + handler has more complex capabilities than most others, and it is + also more configurable. Specifically, rclpdf.py + has the following features: + + It can be configured to extract + specific metadata tags from an XMP packet. + It can extract PDF + attachments. + It can automatically perform + OCR if the document text is empty. This is done by + executing an external program and is now described in a + separate + section, because the OCR framework can also be used + with non-PDF image files. 
+ + XMP fields extraction @@ -1496,48 +1493,48 @@ metadatacmds = ; tags = tmsu tags %f - + Recoll and OCR - This is new in &RCL; 1.26.5. Older versions had a more limited, - non-caching capability to execute an external OCR program in the PDF - handler. The new function has the following features: + This is new in &RCL; 1.26.5. Older versions had a more limited, + non-caching capability to execute an external OCR program in the PDF + handler. The new function has the following features: - - The OCR output is cached, stored as separate - files. The caching is ultimately based on a hash value of the - original file contents, so that it is immune to file renames. A - first path-based layer ensures fast operation for unchanged - (unmoved files), and the data hash (which is still orders of - magnitude faster than OCR) is only re-computed if the file has - moved. OCR is only performed if the file was not previously - processed or if it changed. - The support for a specific program is implemented - in a simple Python module. It should be straightforward to add - support for any OCR engine with a capability to run from the - command line. - Modules initially exist for - tesseract (Linux and Windows), and - ABBYY FineReader (Linux, tested with - version 11). ABBYY FineReader is a commercial closed source - program, but it sometimes perform better than - tesseract. - The OCR is currently only called from the PDF - handler, but there should be no problem using it for other image - types. - - + + The OCR output is cached, stored as separate + files. The caching is ultimately based on a hash value of the + original file contents, so that it is immune to file renames. A + first path-based layer ensures fast operation for unchanged + (unmoved files), and the data hash (which is still orders of + magnitude faster than OCR) is only re-computed if the file has + moved. OCR is only performed if the file was not previously + processed or if it changed. 
+ The support for a specific program is implemented + in a simple Python module. It should be straightforward to add + support for any OCR engine with a capability to run from the + command line. + Modules initially exist for + tesseract (Linux and Windows), and + ABBYY FineReader (Linux, tested with + version 11). ABBYY FineReader is a commercial closed source + program, but it sometimes performs better than + tesseract. + The OCR is currently only called from the PDF + handler, but there should be no problem using it for other image + types. + + - To enable this feature, you need to install one of - the supported OCR applications - (tesseract - or ABBYY), enable OCR in the PDF - handler, and tell &RCL; where the appropriate command resides. The - last parts are done by setting configuration variables. See the - - relevant section. All parameters can be localized in - subdirectories through the usual main configuration mechanism (path - sections). + To enable this feature, you need to install one of + the supported OCR applications + (tesseract + or ABBYY), enable OCR in the PDF + handler, and tell &RCL; where the appropriate command resides. The + last parts are done by setting configuration variables. See the + + relevant section. All parameters can be localized in + subdirectories through the usual main configuration mechanism (path + sections). @@ -1564,20 +1561,12 @@ metadatacmds = ; tags = tmsu tags %f The GUI File menu has entries to start or stop the current indexing - operation. + operation. When indexing is not currently running, you have a + choice of updating the index or rebuilding it (the first choice + only processes changed files, the second one zeroes the index + before starting so that all files are processed). - When no indexing is running, you have a choice of updating the - index or rebuilding it (the first choice only processes changed - files, the second one zeroes the index before starting so that all - files are processed). 
- - On Linux, the recollindex indexing process - can be interrupted by sending an interrupt - (Ctrl-C, SIGINT) or terminate (SIGTERM) - signal. - - - On Linux and Windows, the GUI can used to manage the indexing + On Linux and Windows, the GUI can be used to manage the indexing operation. Stopping the indexer can be done from the recoll GUI @@ -1587,6 +1576,12 @@ metadatacmds = ; tags = tmsu tags %f menu entry. + On Linux, the recollindex indexing process + can be interrupted by sending an interrupt + (Ctrl-C, SIGINT) or terminate (SIGTERM) + signal. + + When stopped, some time may elapse before recollindex exits, because it needs to properly flush and close the index. @@ -1601,6 +1596,10 @@ metadatacmds = ; tags = tmsu tags %f file tree will be traversed, but files that were indexed up to the interruption and for which the index is still up to date will not need to be reindexed). + + + + recollindex command line recollindex has many options which are listed in its @@ -1879,19 +1878,19 @@ fs.inotify.max_user_watches=32768 Searching with the Qt graphical user interface The recoll program provides the main user - interface for searching. It is based on the - Qt library. + interface for searching. It is based on the + Qt library. - recoll has two search modes: + recoll has two search interfaces: Simple search (the default, on the main screen) has - a single entry field where you can enter multiple words. + a single entry field where you can enter multiple words. Advanced search (a panel accessed through the - Tools menu or the toolbox bar icon) has - multiple entry fields, which you may use to build a logical - condition, with additional filtering on file type, location - in the file system, modification date, and size. + Tools menu or the toolbox bar icon) has + multiple entry fields, which you may use to build a logical + condition, with additional filtering on file type, location + in the file system, modification date, and size. 
@@ -1956,32 +1955,6 @@ fs.inotify.max_user_watches=32768 a separate section. - The File name search mode will - specifically look for file names. The point of having a separate - file name search is that wild card expansion can be performed more - efficiently on a small subset of the index (allowing wild cards on - the left of terms without excessive cost). Things to know: - - White space in the entry should match white - space in the file name, and is not treated specially. - - The search is insensitive to character case and - accents, independently of the type of index. - - An entry without any wild card - character and not capitalized will be prepended and appended - with '*' (ie: etc -> - *etc*, but - Etc -> - etc). - - If you have a big index (many files), - excessively generic fragments may result in inefficient - searches. - - - - When using a stripped index (the default), character case has no influence on search, except that you can disable stem expansion for any term by capitalizing it. Ie: a search for @@ -2018,75 +1991,62 @@ fs.inotify.max_user_watches=32768 You can use the ToolsAdvanced search dialog for more complex searches. + The File name search mode will + specifically look for file names. The point of having a separate + file name search is that wild card expansion can be performed more + efficiently on a small subset of the index (allowing wild cards on + the left of terms without excessive cost). Things to know: + + White space in the entry should match white + space in the file name, and is not treated specially. + + The search is insensitive to character case and + accents, independently of the type of index. + + An entry without any wild card + character and not capitalized will be prepended and appended + with '*' (ie: etc -> + *etc*, but + Etc -> + etc). + + If you have a big index (many files), + excessively generic fragments may result in inefficient + searches. 
+ + + + - The default result list + The result list After starting a search, a list of results will instantly - be displayed in the main list window. + be displayed in the main window. By default, the document list is presented in order of relevance (how well the system estimates that the document matches the query). You can sort the result by ascending or descending date by using the vertical arrows in the toolbar. - Clicking on the - Preview link for an entry will open an - internal preview window for the document. Further - Preview clicks for the same search will open - tabs in the existing preview window. You can use - Shift+Click to force the creation of another - preview window, which may be useful to view the documents side - by side. (You can also browse successive results in a single - preview window by typing - Shift+ArrowUp/Down in the - window). + Clicking the Preview link for an entry + will open an internal preview window for the document. Further + Preview clicks for the same search will open + tabs in the existing preview window. You can use + Shift+Click to force the creation of another + preview window, which may be useful to view the documents side + by side. (You can also browse successive results in a single + preview window by typing + Shift+ArrowUp/Down in the + window). Clicking the Open link will - start an external viewer for the document. By default, &RCL; lets - the desktop choose the appropriate application for most document - types (there is a short list of exceptions, see further). If you - prefer to completely customize the choice of applications, you can - uncheck the Use desktop preferences option in - the GUI preferences dialog, and click the Choose editor - applications button to adjust the predefined &RCL; - choices. The tool accepts multiple selections of MIME types (e.g. to - set up the editor for the dozens of office file types). 
- - Even when Use desktop preferences is - checked, there is a small list of exceptions, for MIME types where - the &RCL; choice should override the desktop one. These are - applications which are well integrated with &RCL;, especially - evince for viewing PDF and Postscript - files because of its support for opening the document at a specific - page and passing a search string as an argument. Of course, you can - edit the list (in the GUI preferences) if you would prefer to lose - the functionality and use the standard desktop tool. - - You may also change the choice of applications by editing the - mimeview - configuration file if you find this more convenient. - - Each result entry also has a right-click menu with an - Open With entry. This lets you choose an - application from the list of those which registered with the desktop - for the document MIME type. - - The Preview and Open - edit links may not be present for all entries, meaning that - &RCL; has no configured way to preview a given file type (which - was indexed by name only), or no configured external editor for - the file type. This can sometimes be adjusted simply by tweaking - the mimemap - and mimeview - configuration files (the latter can be modified with the user - preferences dialog). - - The format of the result list entries is entirely - configurable by using the preference dialog to - edit an HTML fragment. - + start an external viewer for the document. By default, &RCL; lets + the desktop choose the appropriate application for most document + types. This is currently not customisable on &WIN;. See + further + for customizing the applications on &LIN;. You can click on the Query details link at the top of the results page to see the query actually @@ -2100,6 +2060,76 @@ fs.inotify.max_user_watches=32768 toolbar or the links at the bottom of the page to browse the results. 
+ The Preview and Open + edit links may not be present for all entries, meaning that + &RCL; has no configured way to preview a given file type (which + was indexed by name only), or no configured external editor for + the file type. This can sometimes be adjusted simply by tweaking + the + mimemap + and + mimeview + configuration files (the latter can be modified with the user + preferences dialog). + + The format of the result list entries is entirely + configurable by using the preference dialog to + + edit an HTML fragment. + + + &LIN;: customising the applications + + By default &RCL; lets the desktop choose what + application should be used to open a given document, with + exceptions. + + The details of this behaviour can be customized with the + + Preferences + GUI configuration + User interface + Choose editor applications + dialog or by editing + the + mimeview configuration file. + + When Use desktop preferences, at the + top of the dialog, is checked, there is a small list of + exceptions, for MIME types where the &RCL; choice should + override the desktop one. These are applications which are well + integrated with &RCL;, for example, on + Linux, evince for viewing PDF and + Postscript files because of its support for opening the + document at a specific page and passing a search string as an + argument. You can add or remove document types to the + exceptions by using the dialog. + + If you prefer to completely customize the choice of + applications, you can uncheck Use desktop + preferences, in which case the &RCL; predefined + applications will be used, and can be changed for each document + type. This is probably not the most convenient approach in most + cases. + + In all cases, the applications choice dialog accepts + multiple selections of MIME types in the top section, and lets + you define how they are processed in the bottom one. 
+ + You may also change the choice of applications by editing + the + + mimeview + configuration file if you find this more convenient. + + Under &LIN;, each result list entry also has a right-click + menu with an + Open With entry. This lets you choose an + application from the list of those which registered with the desktop + for the document MIME type, on a case by case basis. + + + No results: the spelling suggestions @@ -2143,17 +2173,17 @@ fs.inotify.max_user_watches=32768 Open entries do the same thing as the corresponding links. - Open With lets you open the document - with one of the applications claiming to be able to handle its MIME - type (the information comes from the .desktop - files in - /usr/share/applications). + Open With (&LIN;) lets you open the + document with one of the applications claiming to be able to + handle its MIME type (the information comes from + the .desktop files + in /usr/share/applications). - Run Script allows starting an arbitrary - command on the result file. It will only appear for results which - are top-level files. See - further for a more - detailed description. + Run Script (&LIN;) allows starting an + arbitrary command on the result file. It will only appear for + results which are top-level + files. See further + for a more detailed description. The Copy File Name and Copy Url copy the relevant data to the @@ -2203,10 +2233,10 @@ fs.inotify.max_user_watches=32768 The result table - In &RCL; 1.15 and newer, the results can be displayed in - spreadsheet-like fashion. You can switch to this presentation by - clicking the table-like icon in the toolbar (this is a toggle, - click again to restore the list). + As an alternative to the result list, the results can also be + displayed in spreadsheet-like fashion. You can switch to this + presentation by clicking the table-like icon in the toolbar (this + is a toggle, click again to restore the list). 
Clicking on the column headers will allow sorting by the values in the column. You can click again to invert the order, and @@ -2235,7 +2265,7 @@ fs.inotify.max_user_watches=32768 - Running arbitrary commands on result files (1.20 and later) + &LIN;: running arbitrary commands on result files Apart from the Open and Open With operations, which allow starting an application on a @@ -2280,7 +2310,7 @@ fs.inotify.max_user_watches=32768 - Displaying thumbnails + &LIN;: displaying thumbnails The default format for the result list entries and the detail area of the result table display an icon for each result @@ -2298,9 +2328,9 @@ fs.inotify.max_user_watches=32768 your settings). Restarting the search should then display the thumbnails. - There are also some - pointers about thumbnail generation on the &RCL; wiki. - + There are also some + pointers about thumbnail generation in the &RCL; + FAQ. @@ -2319,13 +2349,10 @@ fs.inotify.max_user_watches=32768 create a new preview window. The old one stays open until you close it. - You can close a preview tab by typing Ctrl-W - (Ctrl + W) in the - window. Closing the last tab for a window will also close the - window. - - Of course you can also close a preview window by using the - window manager button in the top of the frame. + You can close a preview tab by typing Ctrl-W + (Ctrl + W) in the window. Closing + the last tab, or using the window manager button in the top of the + frame will also close the window. You can display successive or previous documents from the result list inside a preview tab by typing @@ -2477,9 +2504,6 @@ fs.inotify.max_user_watches=32768 added (as an AND filter) before performing the query if the button is active. - This feature is new in &RCL; 1.20, and will probably be - refined depending on user feedback. - @@ -2839,11 +2863,10 @@ fs.inotify.max_user_watches=32768 by an entry in the GUI configuration dialog, and is off by default. 
- As of release 1.19, when a result document does have - undisplayed duplicates, a Dups - link will be shown with the result list entry. Clicking the - link will display the paths (URLs + ipaths) for the duplicate - entries. + When a result document does have undisplayed duplicates, + a Dups link will be shown with the result list + entry. Clicking the link will display the paths (URLs + ipaths) + for the duplicate entries. @@ -2942,24 +2965,24 @@ fs.inotify.max_user_watches=32768 list. - Phrase searches can strongly slow down a query if most of the - terms in the phrase are common. This is why the - autophrase option is off by default for &RCL; - versions before 1.17. As of version 1.17, - autophrase is on by default, but very common - terms will be removed from the constructed phrase. The removal - threshold can be adjusted from the search preferences. - - Phrases and abbreviations As of - &RCL; version 1.17, dotted abbreviations like - I.B.M. are also automatically indexed as a word - without the dots: IBM. Searching for the word - inside a phrase (ie: "the IBM company") will only - match the dotted abrreviation if you increase the phrase slack (using the - advanced search panel control, or the o query - language modifier). Literal occurences of the word will be matched - normally. + Phrase searches can slow down a query if most of the + terms in the phrase are common. If + the autophrase option is on, very common + terms will be removed from the automatically constructed + phrase. The removal threshold can be adjusted from the search + preferences. + Phrases and abbreviations + Dotted abbreviations like + I.B.M. are also automatically indexed as a + word without the dots: IBM. Searching for + the word inside a phrase (ie: "the IBM + company") will only match the dotted abbreviation + if you increase the phrase slack (using the advanced search + panel control, or the o query language + modifier). Literal occurrences of the word will be matched + normally. 
+ @@ -3377,18 +3400,24 @@ fs.inotify.max_user_watches=32768 The result list format - Newer versions of Recoll (from 1.17) normally use WebKit HTML - widgets for the result list and the - snippets window - (this may be disabled at build time). - Total customisation is possible with full support for CSS and - Javascript. Conversely, there are limits to what you can do with - the older Qt QTextBrowser, but still, it is possible to decide - what data each result will contain, and how it will be - displayed. + Recoll normally uses a full function HTML processor to + display the result list and the + + snippets window. Depending on the version, this may be + based on either Qt WebKit or Qt WebEngine. + It is then possible to completely customise the result list with full + support for CSS and Javascript. - The result list presentation can be exhaustively customized - by adjusting two elements: + It is also possible to build &RCL; to use a simpler Qt + QTextBrowser widget to display the HTML, which may be necessary + if the ones above are not ported on the system, or to reduce + the application size and dependencies. There are limits to what + you can do in this case, but it is still possible to decide + what data each result will contain, and how it will be + displayed. + + The result list presentation can be customized + by adjusting two elements: The paragraph format diff --git a/src/windows/mimeview b/src/windows/mimeview index 63159d40..4eadae25 100644 --- a/src/windows/mimeview +++ b/src/windows/mimeview @@ -35,7 +35,7 @@ text/html|epub = rclstartw %F;ignoreipath=1 application/x-fsdirectory|parentopen = rclstartw %f inode/directory|parentopen = rclstartw %f -###### The following are not used at all on windows, but the types need to +###### THE FOLLOWING ARE NOT USED AT ALL ON WINDOWS, but the types need to ###### be listed for an "Open" link to appear in the result list application/epub+zip = ebook-viewer %f