web

2017-06-05 11:57:26 +02:00 · 2017-06-05 11:57:26 +02:00 · 821fb780d2
commit 821fb780d2
parent 06b414cfc6
35 changed files with 2078 additions and 0 deletions
--- a/website/faqsandhowtos/ElinksWeb.txt
+++ b/website/faqsandhowtos/ElinksWeb.txt
@ -0,0 +1,35 @@
 == Extending the Recoll Firefox visited web page indexing mechanism to other browsers
 The *Recoll* _Web Queue_ function allows using WEB browser plug-ins
 originally designed for indexing visited WEB pages with *Beagle* (rip). The
 browser plug-ins work very simply by creating copies of the visited pages
 in a designated directory. Two files are created for each page, one for the
 contents, the other for the metadata. 
 When activated, *Recoll* will visit the queue directory and index each HTML
 page and its associated metadata. There is more detail about the mechanism
 on the link:IndexWebHistory.html[page about the Recoll Web queue], but mostly, you
 just need to go to the _Indexing Preferences_ in the *recoll* GUI, open the
 _Web history_ panel and check the top button. 
 Franck, a *Recoll* and *Elinks* user from New Zealand, designed a method
 and wrote a script to index the *Elinks* WEB history in this fashion.  
 The script works by using *wget* to fetch the visited page into the queue
 directory. This means that it would be reusable to index arbitrary WEB
 pages in contexts other than *Elinks* visits. 
 Recipe for *Elinks* and Recoll 1.18 and later:
 * Retrieve the 
  link:https://www.recoll.org/files/elinks_recoll.sh[elinks_recoll.sh] shell
  script and make it executable (`chmod a+x elinks_recoll.sh`).
 * In the Elinks Keyboard shortcut manager (k)/Main, add a shortcut to pass
  the current URL to an external command, e.g. _Ctrl-P_.
 * In the Options manager (o) /Document/Uri Passing, add an action named for
  example _ToIndex_
 * Modify the ToIndex action to execute `/path/to/the/script/elinks_recoll.sh %c`
 * Save, and you are done.
 For Recoll 1.17, the method is analogous, but the script is named
 link:https://www.recoll.org/files/elinks_recoll.sh[elinks_beagle.sh].
--- a/website/faqsandhowtos/FaqsAndHowTos.txt
+++ b/website/faqsandhowtos/FaqsAndHowTos.txt
@ -0,0 +1,37 @@
 == Faqs and Howtos
 === Indexing
 * link:WhyIsMyFileNotIndexed.html[Why is this file not indexed ? Investigating indexing issues]
 * link:PreventIndexingDir.html[Preventing the indexing of a directory]
 * link:IndexOnAc.html[Starting/stopping the indexer depending on power/battery status]
 * link:IndexMozillaCalendari.html[Indexing Mozilla Sunbird / Lightning calendar data]
 * link:MultipleIndexes.html[Creating and using multiple indexes]
 * link:IndexWebHistory.html[Indexing Web history with the Firefox browser extension]
 * link:ElinksWeb.html[Extending the Web queue mechanism to other browsers and general WEB indexing]
 * link:IndexMailHeader.html[Indexing arbitrary mail headers]
 * link:IndexOutlook.html[Indexing Outlook archives]
 * link:HandleCustomField.html[Generating a custom field and using it to sort results]
 * link:http://www.recoll.org/recoll_XMP/index.html.html[An example of filter/field customisation, using XMP metadata with PDFs]
 * link:FilteringOutZipArchiveMembers.html[Filtering out Zip archive members]
 === Searching
 * link:GUIKeyboard.html[Recoll GUI keyboard navigation]
 * link:HotRecoll.html[On the desktop: using a keyboard shortcut for starting/hiding recoll]
 * link:OpenHelperScript.html[Handling issues for starting native apps, esp. email clients - getting Thunderbird to open message files]
 * link:QpdfviewHelperScript.html[Another example open helper script - using qpdfview to open pdf and postscript files, with support for page and search options]
 * link:UsingOpenWith.html[Using the new Open With menu in recoll 1.20 with a custom app]
 * link:ReplaceCategories.html[Replacing the document category filters]
 * link:ResultsThumbnails.html[Result list thumbnails and how to create them]
 * link:MuttAndRecoll.html[Interfacing Recoll and Mutt]
 * link:QueryFromC.html[Querying from a C program]
 === Administration and miscellaneous
 * link:http://www.recoll.org/pages/recoll-webui-install-wsgi.html.html[Installation of the Recoll WebUI with Apache]
 * link:FilterRetrofit.wiki.html[Installing a filter for a new document type]
 * link:UnityLens.html[Building and Installing the Ubuntu Unity Recoll Lens]
 * link:SavingConfig.wiki.html[Recoll configuration backup]
 * link:XDGBase.wiki.html[Tidying Recoll data storage]
 * link:ProblemSolvingData.html[Collecting diagnostic information]
 * link:NonAsciiFileNames.html[Unix and non-ascii file names]
 * link:FilterArch.html[Recoll filters]
--- a/website/faqsandhowtos/FilterArch.txt
+++ b/website/faqsandhowtos/FilterArch.txt
@ -0,0 +1,82 @@
 == Recoll input handlers
 In the end, Recoll indexes plain UTF-8 text, remembering where it came
 from.
 But of course, this is not what the source data looks like.
 The text content of the original documents is encoded in many formats
 (e.g. pdf, ms-word, html), and it can also be stored in quite
 involved ways (inside archives, email attachments ...).
 To get to the data and convert it to plain text, Recoll uses a set
 of modules which it calls input handlers (or filters), which operate either
 on the storage structure (e.g. a zip handler), or on the storage format (e.g. a
 pdf to text translator), or both. In addition, there is a tentative notion
 of a higher level storage backend which we will ignore for now (for
 reference there are currently two of those: the file system and the web
 history cache).
 The basic task of filters is to take a document as input and produce a
 series of subdocuments as output. The subdocument's format is defined
 either dynamically (as part of the output data), or statically, in the
 filter definition. 
 === Simple filters
 These are executed by the *mh_exec* recoll module. They are the vast
 majority.
 These filters are very simple. They are designed to perform a simple task
 with minimal interface, they mostly don't know anything about each other,
 and they don't know much about their context. This makes writing a filter
 quite easy as there is not much to learn about their environment.
 Only one output document is produced and the format is fixed. 
 In practice, the filter, which is most often a shell script (but could
 be any executable program), takes a file name on the command line and
 outputs an html or plain text document on standard output, then exits.
 For example, the pdf filter takes one pdf file name as input on the command
 line and produces one html document on stdout. The fact that the output is
 html is statically defined in a configuration file. 
 For filters which produce plain text, the output character set
 is in general defined in the configuration file. Otherwise it is obtained
 from the locale (hoping that it makes sense).
 Filters that output html can produce metadata in the html
 header (e.g. author). Filters that output plain text can only output
 main text data, no metadata fields. 
 Besides the file name, there is one other piece of input information, which
 is in the form of an environment variable, and can be safely ignored:
 +RECOLL_FILTER_FORPREVIEW+. This indicates if the filter is being used
 for previewing or for indexing data. Some filters will elect to suppress
 repetitive parts of the output text when indexing to avoid distorting the
 term statistics. For example, the man filter suppresses the section
 headers (NAME, SYNOPSIS...) when indexing.
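 As an illustration only, a simple filter could be sketched in Python as
 follows. This is not one of the filters shipped with Recoll: the function
 names are hypothetical, and the sketch assumes that
 +RECOLL_FILTER_FORPREVIEW+ is set to `yes` when previewing, which should be
 checked against your Recoll version.

```python
#!/usr/bin/env python3
"""Hypothetical simple filter: plain text file name in, one HTML document out."""
import html
import os
import sys


def to_html(text, title):
    """Wrap plain text in a minimal HTML document carrying a title field."""
    return (
        "<html><head>"
        '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'
        f"<title>{html.escape(title)}</title>"
        "</head><body>"
        f"<pre>{html.escape(text)}</pre>"
        "</body></html>\n"
    )


def for_preview(environ=os.environ):
    """Tell whether the filter runs for a preview rather than for indexing."""
    # Assumption: the variable holds "yes" for previews.
    return environ.get("RECOLL_FILTER_FORPREVIEW", "no") == "yes"


def run(path):
    """Convert one file and write the single output document to stdout."""
    with open(path, encoding="utf-8", errors="replace") as f:
        text = f.read()
    # A real filter could drop repetitive boilerplate here when
    # for_preview() is false, to avoid distorting the term statistics.
    doc = to_html(text, os.path.basename(path))
    sys.stdout.buffer.write(doc.encode("utf-8"))


# The file name is the only command line argument.
if __name__ == "__main__" and len(sys.argv) == 2:
    run(sys.argv[1])
```

 Because the output is html, the title ends up in the html header, where
 the indexer can pick it up as metadata.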
 === Multiple input filters
 These filters are more complex, but still quite easy to write, especially
 if you can use Python, because they can then use a common module which
 manages the communication with the indexer.
 Newer Recoll versions have converted many previously 'simple' filters to
 this kind as part of the port to Windows.
 These filters are executed by the *mh_execm* Recoll module.
 They are persistent (one instance will persist through a whole indexing
 pass), and will process multiple input files in succession (the point being to
 avoid the startup performance penalty), and possibly multiple documents per
 input file if this makes sense for their input format (e.g. zip archive, chm
 help file). 
 They use a simple communication protocol over a pipe with the main recoll
 or recollindex process, with file names and a few other parameters being
 sent as input, and decoded data and attributes being sent in return.
 The shared Python module is 'filters/rclexecm.py'. You can look at 'rclzip'
 or 'rclaudio' for reasonably straightforward examples.
--- a/website/faqsandhowtos/FilterRetrofit.txt
+++ b/website/faqsandhowtos/FilterRetrofit.txt
@ -0,0 +1,62 @@
 == Installing a filter for a new document type
 It will sometimes happen that a newer Recoll release has support for a
 document type which would be useful to you, but which your older release
 does not support.
 It is in general easy to import support from the newer to the older
 release: the Recoll input handler interface is very stable, so things should just
 work.
 Input Handler updates are generally described on the Recoll web site
 link:https://www.recoll.org/filters/filters.html[new filters pages]. They
 may include notes about which versions need the new input handler, or specifics
 about installing it.
 An up to date copy of input handlers and configuration files is also kept
 link:https://www.recoll.org/filters/[at the same location].
 We will take an example to make things more concrete: Tomboy and Gnote
 files are directly supported by Recoll 1.19, but not in older Recoll
 releases. The *rclxml* handler is needed to process them.
 The following procedure will allow you to retrofit support:
 - Retrieve the *rclxml* input handler from:
  link:https://www.lesbonscomptes.com/recoll/filters/rclxml[]
 - Copy it to '/usr/share/recoll/filters' and make it executable: 
  `chmod +x rclxml`
  The input handler needs *xsltproc*, but this is probably already on your
  system (else get it with the package manager).
 - Edit '~/.recoll/mimemap', add the following line:
 `.note = application/x-gnote`
 - Edit '~/.recoll/mimeconf', add the following lines:
 +
 ----
 [index]
 application/x-gnote = exec rclxml
 ----
 - Edit '~/.recoll/mimeview', add the following lines:
 +
 ----
 [view]
 application/x-gnote = tomboy %f
 ----
 - The easiest way to make sure the files are indexed with the new input
  handlers may then be to just run a full indexing pass (`recollindex -z`). 
 Notes:
 - The MIME type which is used is not crucial, you could prefer to use,
  e.g., +application/x-tomboy+ instead, it just has to be consistent. To
  avoid future trouble, it's better to use the type used by newer Recoll
  releases though.
 - The 'mimeview' entry is necessary even if you are using the desktop
  preferences to open files. The value will not be used, but it has to be
  there.
--- a/website/faqsandhowtos/FilteringOutZipArchiveMembers.txt
+++ b/website/faqsandhowtos/FilteringOutZipArchiveMembers.txt
@ -0,0 +1,34 @@
 == Filtering out Zip archive members ==
 The *rclzip* Zip archive extraction input handler does not use the general
 configuration variables which define what file system objects should be
 skipped, but it has an equivalent internal function. 
 The name-skipping code depends on a recent member of the Recoll Python
 package. This will become standard for release 1.20, but for earlier
 releases, you need to do two things to use this function: 
 - Fetch 'python/recoll/recoll/rclconfig.py' and 'filters/rclzip' from the
  source repository. 
 - Copy both to '/usr/share/recoll/filters' and make 'rclzip' executable.
 You can then set a variable named +zipSkippedNames+ inside
 'recoll.conf'. +zipSkippedNames+ should be a space-separated list of
 patterns which will be passed to the Python fnmatch() function. The +/+
 characters are not special (they are matched like any other character). 
 You can't use embedded spaces in patterns (no double-quote quoting for now).
 This can be redefined for file system directories using the usual section
 indicators (Zip archives in different file-system directories can have
 different skip lists). 
 Example:
 ----
 zipSkippedNames = *.txt
 [/path/to/the/dir]
 zipSkippedNames = somedir/*/*.html
 ----
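 The matching rules can be tried out directly with Python's standard
 `fnmatch` module. The sketch below only illustrates how the patterns behave.
 The `is_skipped` helper and the member names are made up for the example
 and are not taken from *rclzip*.

```python
from fnmatch import fnmatch


def is_skipped(member_name, patterns):
    """Return True if the archive member name matches any skip pattern."""
    return any(fnmatch(member_name, pat) for pat in patterns)


# Patterns as they would appear in a space-separated zipSkippedNames value.
patterns = "*.txt somedir/*/*.html".split()

print(is_skipped("notes.txt", patterns))              # matches *.txt
print(is_skipped("docs/readme.txt", patterns))        # '*' also matches '/'
print(is_skipped("somedir/a/b/page.html", patterns))  # '*' can span "a/b"
print(is_skipped("other/page.html", patterns))        # no pattern matches
```

 Note that, because `/` is matched like any other character, `*.txt` also
 skips text files located in subdirectories of the archive.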
--- a/website/faqsandhowtos/GUIKeyboard.txt
+++ b/website/faqsandhowtos/GUIKeyboard.txt
@ -0,0 +1,60 @@
 == Recoll GUI keyboard navigation
 Using Recoll without the mouse is not completely straightforward, but it is
 mostly feasible. Here follows a description of the usable shortcuts. 
 === Anywhere
 `Ctrl+q` should exit Recoll from anywhere.
 === Main window and result list ===
 When Recoll starts up, the focus is in the simple search entry. The main
 window tab order is as follows: 
 * Clear
 * Search
 * Search type combo
 * Search entry  (Initial focus)
 * Result list (scrolling etc)
 * Result list 1st link
 * Result list next links...
 * Back to Clear
 Each result list entry has 3 links: the icon link is not active, but its
 value is the URL, so that it can be dragged and dropped to another
 application. The 2 other links are _Preview_ and _Open_ and can be
 activated by typing _Enter_. 
 Typing _Ctrl+Shift+s_ anywhere in the main window should return the focus to the search entry. So will _Ctrl+l_ in future versions (for compatibility with WEB browser usage).
 For pure keyboard usage, you can improve this by:
 - Disabling the icon link: use _Preferences->GUI configuration->Result
  List->Edit result paragraph_ and remove the `<a href='%U'>` and `</a>`
  around the `<img...>` tag. 
 - Making the active link more visible by adding the following code to the
  result page HTML header insert (same preferences tab). Feel free to
  adjust the color :=) : 
 ----
 <style type="text/css">
 a:focus {background-color: red;}
 </style>
 ----
 === Result table
 The same _Ctrl+Shift+s_ will return the focus to the search entry when
 working with the result table. 
 _Ctrl+r_ will move the focus from the entry to the result table. Once
 there, the arrow keys will navigate the lines.  
 When a line is selected:
 * _Ctrl+o_ will _Open_ the document.
 * _Ctrl+Shift+o_ will _Open_ the document and exit Recoll.
 * _Ctrl+d_ (detail) will start a _Preview_.
 _Esc_ will deselect the current line so that mouse hovering will work again.
--- a/website/faqsandhowtos/HandleCustomField.txt
+++ b/website/faqsandhowtos/HandleCustomField.txt
@ -0,0 +1,69 @@
 == Generating a custom field and using it to sort results
 We are going to show how to generate a custom field from a Recoll filter,
 and use it for sorting results. The example chosen comes from an actual
 user request: sorting results on pdf page counts. 
 The details here are obsolete, as the +pdf+ input handler is now a quite
 different python program, but the general idea is still relevant.
 The page count from a pdf file can be displayed by the pdfinfo command
 (xpdf or poppler tools). 
 We first modify a copy of the rclpdf filter
 ('/usr/[local/]share/recoll/filters/rclpdf'), to compute the pdf page count,
 and output the value as an html meta field. This is a not very interesting
 bit of shell/awk magic. Another approach would be to just rewrite the
 rclpdf filter in your favorite scripting language (e.g. perl, python...), as
 all it does is execute pdftotext and pdfinfo and output html, nothing
 complicated. Here follows the rclpdf modification as a pseudo patch: 
 ----
 # compute the page count and format it so that it's alphabetically sortable
 +set `pdfinfo "$infile" | egrep ^Pages:`
 +pages=`printf "%04d" $2`
 [skip...]
 # Pass the page count value to awk
 -awk 'BEGIN'\
 +awk -v Pages="$pages" 'BEGIN'\
 [skip...]
 # Inside the awk program startup section: compute the "meta" field line
 +  pagemeta = "<meta name=\"pdfpages\" content=\"" Pages "\">\n"
 [skip...]
 # Then print it as part of the header:
 +    $0 =  part1 charsetmeta pagemeta part2
 [skip...]
 ----
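 The reason for the `%04d` formatting is that the stored value is compared
 as a string. The following Python sketch (an illustration, not part of the
 rclpdf filter; `page_meta` is a hypothetical helper) shows the difference
 and builds the same meta line as the patch above:

```python
def page_meta(page_count):
    """Build the HTML meta line for a page count, zero-padded to 4 digits."""
    return '<meta name="pdfpages" content="%04d">' % page_count


counts = [120, 7, 35]

# Compared as strings, unpadded values sort in the wrong order...
print(sorted(str(n) for n in counts))
# ...while zero-padded values sort like the numbers themselves.
print(sorted("%04d" % n for n in counts))
print(page_meta(7))
```

 Four digits are enough as long as no document has more than 9999 pages.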
 You can execute your own version of rclpdf by modifying '~/.recoll/mimeconf':
 ----
 [index]
 application/pdf = exec /path/to/my/own/rclpdf
 ----
 At this point, recollindex would receive and extract a +pdfpages+ field,
 but it would not know what to do with it. We are going to tell it to store
 the value inside the document data record so that it can be displayed in
 the results, and sorted on. For this we modify the '~/.recoll/fields' file: 
 ----
 [stored]
 pdfpages=
 ----
 That's it ! After reindexing, you can now display +pdfpages+ inside the
 result list (add a +%(pdfpages)+ value to the paragraph format), and display
 +pdfpages+ inside the result table (right-click the table header), and sort
 the results on page count (click the column header). 
 Note that +pdfpages+ has not been defined as searchable (this would not make
 much sense). To make it searchable, you would have to define a prefix and add it to the
 [prefixes] section of the 'fields' file: 
 ----
 [prefixes]
 pdfpages = XYPDFP
 ----
 Have a look at the comments inside the 'fields' file for more information.
--- a/website/faqsandhowtos/Home.txt
+++ b/website/faqsandhowtos/Home.txt
@ -0,0 +1,13 @@
 == Welcome to the Recoll Faqs and Recipes
 link:FaqsAndHowTos.html[FAQs and Howtos] are stored here, but 
 the main source for Recoll user documentation is 
 link:https://www.recoll.org/doc.html[the _Recoll user manual_] on the
 link:https://www.recoll.org/[Recoll Web site] where you will also find a
 lot of other Recoll information, source code tarballs and contact
 information.
 If you want to make your problem report as useful as possible, you may want
 to take a look at link:ProblemSolvingData.html[this page]. 
 link:WikiIndex.html[Full file index]
--- a/website/faqsandhowtos/HotRecoll.txt
+++ b/website/faqsandhowtos/HotRecoll.txt
@ -0,0 +1,79 @@
 == Recoll hotkey: starting / hiding recoll with a keyboard shortcut
 Type a key (e.g. F12) and have recoll appear or disappear. On the first
 occurrence, recoll is started if it's not already running. Further
 occurrences toggle recoll between visible and minimized states. Never
 thought this would be useful until someone asked for it. Can't do without
 it anymore :) 
 This works well with both Gnome and KDE, but is implemented using a gnome
 library (*libwnck*) and its python interface, which you may have to install
 on your system if you are a pure KDE user. The library most probably exists
 in the package repositories for your distribution, so this should not be
 too complicated. 
 This should also work with other window managers, because it is based on a
 standard window manager interface extension (EWMH) that most modern window
 managers implement. 
 === Installing the script (all desktops):
 - You will need the libwnck library and its python interface. These are
  usually part of a gnome installation, otherwise check and possibly
  install them. For OpenSuse, the library should already be there but you
  need to install gnome-python-desktop. 
 - Download the
 link:https://www.recoll.org/files/hotrecoll.py[hotrecoll.py script].
  If you have a recent recoll installation (1.14.3 or
  later), it's already in the recoll filters directory
  ('/usr/[local/]share/recoll/filters') 
 - Copy the script to some permanent place (e.g. '~/bin') and make it
  executable (you can leave it in the filters directory if it's there). In a
  shell window: `chmod +x hotrecoll.py`.
 - You can check that the script works (or not) by executing it on the
  command line. It does not need an argument. Recoll should appear or
  disappear every time you execute the script. A few warning messages may
  be considered normal. If the script says that it does not find the wnck
  library or some other module, you'll have to install them. 
 === Installing the keyboard shortcut (Gnome):
 - _System->Preferences->Keyboard shortcuts_, or execute
  *gnome-keybinding-properties* 
 - Click _Add_, then set _Name_ (e.g. StartRecoll) and _Action_ (/path/to/hotrecoll.py).
 - This will add the shortcut to the "Custom shortcuts" section. You can
  then click in the "Shortcut" column for "StartRecoll", and type any key
  combination (e.g. press F12) to assign a key shortcut. 
 === Installing the keyboard shortcut (KDE):
 Under KDE, the setting for a global custom keyboard shortcut like the one we
 need is, most helpfully, not under "Keyboard Shortcuts" but under "Input Actions". 
 - _Kmenu -> Configure Desktop -> Input Actions -> Edit -> New -> Global
  Shortcut -> Command/Url_ 
 - A new Action appears, named _New Action_. You can rename it something
  like +hotrecoll+ for clarity. 
 - Click the _Trigger_ tab, click the input area and press your preferred
  key combination (e.g. F12) 
 - Click the _Action_ tab, and enter +hotrecoll.py+ (if it's in your PATH),
  or else the full path to the command (e.g.:
  '/usr/share/recoll/filters/hotrecoll.py').
 - Click _Apply_.
 === Installing the keyboard shortcut (XFCE):
 Open the settings manager, and add the shortcut in the 
 _Application Shortcuts_ panel inside the _Keyboard_ tool.
 === Other environments
 Many window managers have a way to set up a keyboard shortcut for running
 an arbitrary command. You'll need to look at the documentation for yours,
 or search the web for a solution.  
 An alternative independent of the environment would be to use the XBindKeys
 utility. See this link:http://www.linux.com/archive/feed/59494[linux.com
 article] for helpful instructions. 
--- a/website/faqsandhowtos/IndexMailHeader.txt
+++ b/website/faqsandhowtos/IndexMailHeader.txt
@ -0,0 +1,33 @@
 == Indexing arbitrary mail headers
 By default the Recoll mail handler only processes a subset of email headers
 (+From+, +To+, +Cc+, +Date+, +Subject+). It is possible to index additional
 headers by specifying them inside the 'fields' configuration file, inside
 the configuration directory (typically '~/.recoll/').
 Lengthy explanations are not really needed here, and I'll just show an
 example (duplicated from the configuration section of the manual):
 ----
 [prefixes]
 # Index mailmytag contents (with the given prefix)
 mailmytag = XMTAG
 [stored]
 # Store mailmytag inside the document data record (so that it can be
 # displayed - as %(mailmytag) - in result lists).
 mailmytag = 
 [mail]
 # Extract the X-My-Tag mail header, and use it internally with the
 # mailmytag field name
 x-my-tag = mailmytag
 ----
 Limitations:
 - The mail filter will only process the first instance for a header
  occurring several times.
 - No decoding will take place (e.g. for non-ASCII headers which would use
  some kind of encoding). 
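 The first limitation can be pictured with Python's standard `email`
 package. This is only an illustration of "first instance wins", not the
 code of the Recoll mail handler. The sample message and the `first_header`
 helper are made up for the example.

```python
from email import message_from_string

RAW_MESSAGE = """\
From: someone@example.com
To: me@example.com
Subject: test
X-My-Tag: first
X-My-Tag: second

Body text.
"""


def first_header(raw, name):
    """Return the first value of a header, or None if it is absent."""
    values = message_from_string(raw).get_all(name)
    return values[0] if values else None


# Header names are matched case-insensitively.
print(first_header(RAW_MESSAGE, "x-my-tag"))
print(first_header(RAW_MESSAGE, "X-Other"))
```

 With the configuration above, only the value `first` would end up in the
 +mailmytag+ field for such a message.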
--- a/website/faqsandhowtos/IndexMozillaCalendari.txt
+++ b/website/faqsandhowtos/IndexMozillaCalendari.txt
@ -0,0 +1,32 @@
 == Indexing Mozilla calendar data
 Mozilla calendar programs (*Sunbird*, *Lightning*) do not store their
 data in +ics+ files natively. They use an *SQLite* database (the
 'storage.sdb' file inside the profile). This means that calendar data
 cannot be indexed directly.  
 To get Recoll to index calendar data, you need to export it to an +ics+
 file. This can be done manually, from the application menus, or by
 installing the
 link:https://addons.mozilla.org/en-US/sunbird/addon/3740[Automatic Export
 extension]. 
 The extension can be configured to export the data when exiting the
 program, or at regular time intervals.  You can even set up a command to be
 executed after the export. If you are not using real time indexing, this
 can usefully be *recollindex*.
 In _Tools->Add Ons->Automatic Export preferences_, in the _Start an
 application after export_ subpanel, set _Path of application_ to
 '/usr/[local/]bin/recollindex' and _Parameters of application_ to
 something like _-i;/home/me/path/to/nameofexportedcal.ics_ 
 This will ensure that the calendar is indexed every time it is exported
 (this is not necessary though, you can let the next batch indexing pass
 take care of it). 
 It may happen that the exported data has some syntax errors which will
 prevent indexing with the *rclics* filter which was distributed up to
 and including Recoll 1.13.04. You may get an updated filter from the
 link:https://www.recoll.org/download.html[Recoll download page].
--- a/website/faqsandhowtos/IndexOnAc.txt
+++ b/website/faqsandhowtos/IndexOnAc.txt
@ -0,0 +1,24 @@
 == Laptops: starting or stopping indexing according to AC power status
 For people using real time indexing on a laptop, kind user "The Doctor"
 contributed a script to automatically start and stop indexing according to
 power status. The script can be found here:
 link:https://bitbucket.org/medoc/recoll/src/tip/src/desktop/recoll_index_on_ac.sh[recoll_index_on_ac.sh]
 To use it, you need to copy it somewhere (e.g.: '/usr/bin', but any place
 will do), make it executable (`chmod a+x recoll_index_on_ac.sh`), and edit
 '~/.config/autostart/recollindex.desktop'
 Change the following line:
    Exec=recollindex -w 60 -m
 to something like the following (depending where you copied the script):
    Exec=/usr/bin/recoll_index_on_ac.sh
 You may also want to change
 '/usr/share/recoll/examples/recollindex.desktop', otherwise your change
 will be reverted the next time you toggle real time indexing through the
 GUI. And, yes, sorry about it, _this_ change will be lost on the next
 Recoll update, so save a copy.
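 If you need to redo the edit after each update, it can be scripted. The
 following Python sketch rewrites the `Exec=` line of a desktop file's text.
 The `set_exec` helper is hypothetical, and the sample file content is a
 made-up minimal example, not the actual content of 'recollindex.desktop'.

```python
def set_exec(desktop_text, command):
    """Return the desktop file text with every Exec= line set to command."""
    lines = []
    for line in desktop_text.splitlines():
        if line.startswith("Exec="):
            line = "Exec=" + command
        lines.append(line)
    return "\n".join(lines) + "\n"


# Made-up sample content, for illustration only.
original = (
    "[Desktop Entry]\n"
    "Name=Recoll real time indexer\n"
    "Exec=recollindex -w 60 -m\n"
)
print(set_exec(original, "/usr/bin/recoll_index_on_ac.sh"), end="")
```

 To apply it for real, read the file, pass its content through the
 function, and write the result back, keeping a copy of the original.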
--- a/website/faqsandhowtos/IndexOutlook.txt
+++ b/website/faqsandhowtos/IndexOutlook.txt
@ -0,0 +1,11 @@
 == Indexing Outlook archives ==
 Recoll has no direct support for indexing Microsoft Outlook data, because,
 if you are a Windows user, you probably are not a good customer for Linux
 desktop indexing...
 However, if you have a need to index Outlook data at some point, I can
 recommend the excellent link:http://www.five-ten-sg.com/libpst/[libpst]
 library and its link:http://www.five-ten-sg.com/libpst/rn01re01.html[readpst]
 utility. Using this you can very easily convert the Outlook data into MH or
 mbox format, and then index the result with Recoll.
--- a/website/faqsandhowtos/IndexWebHistory.txt
+++ b/website/faqsandhowtos/IndexWebHistory.txt
@ -0,0 +1,29 @@
 == Indexing Web history with the Firefox extension ==
 Note: this document is valid for Recoll versions 1.18 and later.
 The link:http://sourceforge.net/projects/recollfirefox/[Recoll Firefox
 extension] 
 works together with Recoll to index the Web pages that you visit. The
 extension is based on an older one which was initially written for the
 Beagle indexer.
 The extension works by copying the data for the visited pages to a queue
 directory ('~/.recollweb/ToIndex' by default), from which they are
 indexed and removed by Recoll, and then stored in a local cache.
 The extension is now hosted on the Mozilla add-ons site, so you can install
 it very simply in Firefox: link:https://addons.mozilla.org/fr/firefox/addon/recoll-indexer-1/[Recoll Firefox add-on page].
 This feature can be enabled in the Recoll GUI index configuration panel
 (Web history section), or by editing the configuration file (set
 +processwebqueue+ to 1).
 Please remember that Recoll only stores a limited amount of cached web data
 (adjustable from the GUI Index Configuration section), and that old pages
 will be purged from the index. Pages that you want to archive permanently
 need to be saved elsewhere, as they will otherwise eventually disappear
 from the Recoll results.
 Recoll will index +.maff+ files, which may be a better choice for archival
 usage. 
--- a/website/faqsandhowtos/Makefile
+++ b/website/faqsandhowtos/Makefile
@ -0,0 +1,9 @@
 .SUFFIXES: .txt .html
 .txt.html:
 	asciidoc $<
 all: $(addsuffix .html,$(basename $(wildcard *.txt)))
 clean:
 	rm -f *.html
--- a/website/faqsandhowtos/MultipleIndexes.txt
+++ b/website/faqsandhowtos/MultipleIndexes.txt
@ -0,0 +1,96 @@
 == Creating and using multiple indexes
 === Why would you want to do this ?
 - Easy adjustment of search areas: you can filter results by using the
  directory filter in the advanced search panel, but, if you have
  separate well defined places where you store different kinds of data,
  it is easier to maintain separate indexes and use the External indexes
  dialog to switch them on or off, and it will also yield much better
  search performance. 
 - Shared indexes: it may be useful to maintain one or several indexes
  for shared data, and separate personal indexes for each user. Indexes
  can be shared over the network.
 - Creating separate indexes for removable volumes.
 === How to do it
 As an example we'll suppose that you have Recoll installed and indexing
 your home directory, and that you would like to have a separate index for
 /usr/share/doc. 
 You need to create a separate configuration for the new index, then add it
 to the external indexes list in the user interface, and activate it as
 needed. 
 . Create a directory for the new index, and create an empty configuration
  file
 +
 ----
 cd
 mkdir .recoll-sharedoc
 touch .recoll-sharedoc/recoll.conf
 ----
 . Either edit the new configuration by hand or start recoll to use the GUI
   configuration editor.
 +
 ----
 cd .recoll-sharedoc
 echo "topdirs = /usr/share/doc" > recoll.conf
 # OR
 recoll -c ~/.recoll-sharedoc
 ----
 +
 If using the GUI, click _Cancel_ when asked, to start the configuration
 editor.
 . Perform initial indexing. If you chose the GUI route, indexing will
  start as soon as you leave the configuration editor. Else, on the
  command line: 
 +
 ----
 recollindex -c ~/.recoll-sharedoc
 ----
 . Optionally set up *cron* to perform nightly indexing, use +crontab -e+
  and insert a line like the following:
 +
 ----
 45 20 * * * recollindex -c ~/.recoll-sharedoc
 ----
 +
 This would start the indexing at 20:45. `crontab -e` will use the *vi*
 editor by default; you can change this by using the EDITOR
 environment variable. Example: `EDITOR=kate crontab -e`
 Your favorite desktop may also have a dedicated tool to add crontab entries.
 . Start recoll and choose the _Preferences->External index dialog_ menu
  entry, then click the Browse button (near the bottom), and select the
  new index Xapian database directory '~/.recoll-sharedoc/xapiandb'.
  Then click _Add index_.
 . You can then activate or deactivate the new index by clicking the box
  in front of the directory name in the list. 
 When adding an index shared by multiple users, it may be helpful to use the
 RECOLL_EXTRA_DBS environment variable instead of editing individual
 configurations, see the manual for more details.
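 If you create such configurations often, the first two steps of the
 procedure above (creating the directory and writing a minimal
 'recoll.conf') can be scripted. The following Python sketch is one way to
 do it. The `create_index_config` helper is hypothetical, and it assumes
 that the directory names contain no spaces.

```python
from pathlib import Path


def create_index_config(confdir, topdirs):
    """Create confdir if needed and write a recoll.conf listing topdirs."""
    confdir = Path(confdir).expanduser()
    confdir.mkdir(parents=True, exist_ok=True)
    conf = confdir / "recoll.conf"
    # Assumption: no spaces inside the directory names (no quoting is done).
    conf.write_text("topdirs = " + " ".join(topdirs) + "\n", encoding="utf-8")
    return conf


# Example call, equivalent to the mkdir and echo commands above:
# create_index_config("~/.recoll-sharedoc", ["/usr/share/doc"])
```

 The initial indexing is then started as before, with
 `recollindex -c ~/.recoll-sharedoc`.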
 === Path adjustments
 When sharing indexes over a network, in most cases, the indexed data will
 be accessible through different paths on the different hosts. This will
 prevent the Preview and Open functions from working because the paths they get
 from the index do not match the ones which are usable from the local
 host.
 For example my home directory is accessed as '/home/me' on my home
 machine, and as '/net/myhost/home/me' on other hosts. By default, trying
 to access a result from a remote host would use the first path, when the
 second is the one that would work.
 As of release 1.19 *Recoll* has a facility to perform index-dependent
 path translations. This facility is accessible from the _external index
 dialog_ in the GUI preferences. Path translations can be set for the main
 index if no index is selected (rarely useful), or for the selected
 additional index.
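 Conceptually, a path translation replaces a leading part of the path
 stored in the index with the one that works locally. The following Python
 sketch illustrates the idea with the example paths used above. It is not
 Recoll's implementation, and the `translate_path` helper is hypothetical.

```python
def translate_path(path, translations):
    """Replace the longest matching directory prefix of path, if any."""
    by_length = sorted(translations.items(), key=lambda kv: len(kv[0]), reverse=True)
    for old, new in by_length:
        old = old.rstrip("/")
        # Only match whole path components: /home/me must not match /home/meme.
        if path == old or path.startswith(old + "/"):
            return new.rstrip("/") + path[len(old):]
    return path


translations = {"/home/me": "/net/myhost/home/me"}
print(translate_path("/home/me/docs/report.pdf", translations))
print(translate_path("/home/meme/file.txt", translations))
```

 Paths which match no translation are returned unchanged.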
--- a/website/faqsandhowtos/MuttAndRecoll.txt
+++ b/website/faqsandhowtos/MuttAndRecoll.txt
@ -0,0 +1,77 @@
 == Interfacing Recoll and Mutt
 It is possible to either use Mutt as a Recoll search result viewer, or
 start Recoll from the Mutt search.
 === Starting Mutt to view Recoll search results
 This method and the associated 
 link:http://www.recoll.org/files/recoll2mutt[recoll2mutt script] were kindly
 contributed by Morten Langlo.
 This allows finding mail messages in recoll and then calling *mutt*
 or *mutt-kz* to read or process the mail. 
 Installation:
 - Copy the link:http://www.recoll.org/files/recoll2mutt[recoll2mutt script]
  somewhere in your PATH, and make it executable.
 - In the **recoll** GUI menus: 
 _Preferences->GUI configuration->User interface->Choose editor applications_
 change the entry for "message/rfc822" to: +recoll2mutt %f+
 The script has options for setting a number of parameters. You may not need
 to set any of them. The defaults are:
 - -c mutt
 - -F .muttrc
 - -m Mail
 - -x "-fn 10*20 -geometry 115x40"
 Example:
 ----
 recoll2mutt -c mutt-kz -F .mutt_kzrc -m Mail -x "-fn 10*20 -geometry 115x40"  %f
 ----
 The option +-x+ is passed to *xterm*, which is used to call *mutt* or
 *mutt-kz*.
 The script works for both _mbox_ and _maildir_ mail boxes, and it
 expects the configuration file for mutt and the mail directory to reside in
 your $HOME and the spool file to be '/var/spool/mail/$USER' if it is
 not in your mail directory. But it is easy to change the values in the
 script if you need to.
 *mutt* is opened with the right mailbox and the limit set to _Date_ and
 _Sender_. In theory you could set the limit to _Message-Id_, but *mutt*
 very often reports invalid patterns in _Message-Id_, so the script plays it
 safe, even though this shows all emails in the opened mailbox with the same
 date from the same sender.
 === Starting Recoll from the Mutt search
 This will work only when using maildir storage (messages in individual
 files). It will not work with mailbox files. The latter would probably be
 possible by extracting the individual result messages using the Python
 interface, but I did not try.
 The classic way to interface Mutt and a search application is to create a
 shortcut to an external command which creates a temporary Maildir
 containing the search results.
 There is such a script for Recoll, you will find it link:https://bitbucket.org/medoc/recoll/raw/41d41799dbac4c69a34db985b3ab9f1597c9c742/src/python/samples/mutt-recoll.py[here].
 Copy the script somewhere in your PATH, and make it executable, then add
 the following line to your '.muttrc':
 ----
 macro index S "<enter-command>unset wait_key<enter><shell-escape>mutt-recoll.py -G<enter><change-folder-readonly>~/.cache/mutt_results<enter>" \
          "search mail (using recoll)"
 ----
 Obviously, you can replace the 'S' letter with whatever suits you (e.g.: '/').
--- a/website/faqsandhowtos/NonAsciiFileNames.txt
+++ b/website/faqsandhowtos/NonAsciiFileNames.txt
@ -0,0 +1,85 @@
 == Unix and non-ASCII file names, a summary of issues
 Unix/Linux file and directory names are binary byte C strings. Only the
 null byte and the slash character (/) are forbidden inside a name,
 nowhere does the kernel interpret the strings as meaningful or
 printable.  
 In the old times, all utilities that would display to the user were
 ASCII-based, and people would use pure printable ASCII file names (even
 using space characters inside names was a cause for trouble). Non
 alphanumeric characters were exclusively used for playing tricks on
 colleagues. And all was well. 
 Then the devil came under the guise of accented 8 bit characters. The
 system has no problem with them, file names are still binary C strings, but
 the utilities have to display them or take them as input, and, because
 there is no encoding specification stored with the file names, they can
 only do this according to the character encoding taken from the user's
 current locale.
 For example fr_FR.UTF-8, and fr_FR.ISO8859-1 could be used simultaneously
 on the same system (by different users), but they are completely
 incompatible: ISO-8859-1 strings are illegal when viewed in a UTF-8 locale
 (they will display as question marks or some other conventional error
 marker). UTF-8 strings will display as gibberish in an ISO-8859-1 locale.
 This means that the file names created by a UTF-8 user are displayed as
 garbage to the ISO-8859 one...
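 The mismatch is easy to see at the byte level. The following sketch assumes
 that the *iconv* and *od* commands are available, and prints the bytes used
 for the single letter "é" in each encoding:

 ```shell
 # The letter "é" as raw bytes in each encoding (octal escapes keep this
 # snippet independent of the encoding of the file it is stored in).
 utf8_bytes=$(printf '\303\251' | od -An -tx1 | tr -d ' \n')
 latin1_bytes=$(printf '\303\251' | iconv -f UTF-8 -t ISO-8859-1 | od -An -tx1 | tr -d ' \n')
 echo "UTF-8:      $utf8_bytes"    # two bytes: c3a9
 echo "ISO-8859-1: $latin1_bytes"  # one byte: e9
 ```

 A lone +e9+ byte is not a valid UTF-8 sequence, and the +c3 a9+ pair reads
 as two separate characters in ISO-8859-1, which is why each user sees the
 other's file names as garbage.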
 If you ever change your locale, your old files are still there and named
 the same (in the binary sense), but the names display badly and you have
 great trouble inputting them. If you add distributed (NFS) file system
 issues, things become totally unmanageable. Also think about archives sent
 from another system with a different encoding.
 For what concerns Recoll:
 - The file names inside recoll.conf are not transcoded, they are taken as
  binary strings (mostly, only +\n+ and +space+ are a bit special), and
  passed as is to the system. So if you edit 'recoll.conf' with a text
  editor, inside the same locale that is or has been used for file names,
  you'll be fine.
 - There was a bug in the GUI configuration tool up to version 1.12: it
  should have transcoded between the internal Qt format and
  locale-dependent strings, but it did not, or did it badly.  
 - There is also an exception for the +unac_except_trans+ variable, this
  *has* to be UTF-8, so if the rest of the file uses another encoding,
  you'll need to edit two separate files and concatenate them.
 As of version 1.13, Recoll uses local8Bit()/fromLocal8Bit() to convert
 recoll.conf file names from/to QStrings (it uses UTF-8 for all string
 values which are not file names).
 The Qt file dialog is broken (at least was, I have not checked this on
 recent versions). It should consider file paths as almost-binary data, not
 QStrings, but doesn't. In consequence, things are even more broken than
 necessary as seen from there:
 With LANG="C", non-ASCII paths can't be used at all:
 - Strings read from recoll.conf are stripped of 8bit characters before display.
 - Directory entries with 8bit characters are not displayed at all in the
  selection dialog.
 With LANG="fr_FR.UTF-8", only UTF-8 paths can be used:
 - Strings read from recoll.conf are damaged when converted to QString
  (except those that were actually UTF-8) 
 - Only the UTF-8 directory entries are displayed in the selection dialog.
 With LANG="fr_FR.iso8859-1", everything works ok.
 - Strings read from recoll.conf are displayed with weird characters if
  they use another encoding such as UTF-8, but are correctly maintained
  and can be read back from the dialogs and rewritten without damage. 
 - Directory entries with 8 bit characters are displayed weirdly (normal),
  but can be manipulated without trouble (this includes utf-8 names of
  course). 
 In conclusion, only the iso-8859 locales can be used for handling mixed
 encoding situations. This is a possible workaround for people who need it. 
 More data about path encoding issues:
 http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html 
--- a/website/faqsandhowtos/OpenHelperScript.txt
+++ b/website/faqsandhowtos/OpenHelperScript.txt
@ -0,0 +1,71 @@
 == Starting native applications
 It is sometimes difficult to start a native application on a result
 document, especially when the result comes from a container file (ie: email
 folder file, chm file).  
 The problem is that native applications usually expect at most a file name
 on the command line, and sometimes not even that (emailers). 
 The _Open parent documents_ link in the result list right click menu is
 sometimes useful in this situation (e.g.: +chm+ files). 
 In some other cases it may help that Recoll does make a lot of data
 available to the application. This data may have to be pre-processed in a
 script before calling the actual application. 
 Details about configuring how the native application or script is called
 are given with the 
 link:http://www.recoll.org/usermanual/usermanual.html#RCL.INSTALL.CONFIG.MIMEVIEW[description of the mimeview configuration file].
 Information about
 link:http://www.recoll.org/usermanual/usermanual.html#RCL.INSTALL.CONFIG.FIELDS[configuring
 customised fields] may also be useful in combination. 
 === Example
 This is a simple example, because it does not need to use special
 fields. It just shows how to solve a simple issue by using an intermediary
 script. The problem is due to the fact that thunderbird's +-file+ option
 won't open a file if the extension is not '.eml'. Jorge, the kind Recoll
 user who supplied the example, stores his email in Maildir++ format. The
 file names have no extension, so an intermediary script is necessary to get
 thunderbird to open them. 
 Note that this only works with messages stored in Maildir or MH format (one
 message per file). As far as I know, there is no way to get Thunderbird to
 open an arbitrary mbox file. 
 The 'recoll-thunderbird-open-file' script:
 ----
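 #!/bin/sh
 # Copy the message to a temporary file with the .eml extension that
 # thunderbird requires, then open the copy. The quotes protect paths
 # which contain spaces.
 cp "$1" "/tmp/$$.eml"
 thunderbird -file "/tmp/$$.eml"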
 ----
 Create the file in an editor, save it somewhere, and make it executable
 (`chmod +x recoll-thunderbird-open-file`).
 The mail line in the '~/.recoll/mimeview' file:
 ----
 [view]
 message/rfc822  = recoll-thunderbird-open-file  %f
 ----
 If the place where you saved the script is not in your PATH, you will need
 to use the full path instead of just the script name, as in  
 ----
 [view]
 message/rfc822 = /home/me/somewhere/recoll-thunderbird-open-file  %f
 ----
 You should then be able to open the messages in Thunderbird, which is
 useful, for example, to handle the attachments. 
 With recent Recoll versions, if using the normal option of letting the
 Desktop choose the _Open_ application to use (_Use Desktop default_),
 you should also add +message/rfc822+ to the exceptions, and the whole
 thing is probably more easily done from the Recoll GUI. 
--- a/website/faqsandhowtos/PreventIndexingDir.txt
+++ b/website/faqsandhowtos/PreventIndexingDir.txt
@ -0,0 +1,27 @@
 == Preventing indexing in a directory
 === Why would you want to do this ?
 By default, recollindex (or the indexing thread inside the recoll Qt user
 interface) will process your home directory and most of its subdirectories,
 with the exception of some well known places (thumbnails, beagle and web
 browser caches, etc.). 
 You may want to prevent indexing in some directories where you don't expect
 interesting search results. This will avoid polluting the search result
 lists, speed up indexing times and make the index smaller. 
 === How to do it
 There are two ways to block indexing at certain points: either by listing
 specific paths, or by directory name pattern matches. 
 - Blocking specific paths: this is controlled by the skippedPaths variable
  in the main configuration file. You can adjust the value either by
  editing the file or by using the indexing configuration dialog:
  _Preferences->Indexing configuration->Global parameters->Skipped paths_
 - Using pattern matches: these are listed in the skippedNames variable in
  the main configuration file. You can adjust the value either by editing
  the file or by using the GUI: _Preferences->Indexing configuration->Local
  parameters->Skipped names_
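 As a minimal sketch of the file-editing approach, the following writes both
 variables to a scratch configuration directory, so that the real
 '~/.recoll/recoll.conf' is not touched. The paths and patterns are
 hypothetical examples, and the space-separated value format is an
 assumption to check against the manual:

 ```shell
 # Scratch configuration directory: nothing is written to ~/.recoll here.
 confdir=$(mktemp -d)
 {
     # Hypothetical directories which should never be indexed.
     printf '%s\n' 'skippedPaths = /home/me/tmp /home/me/Downloads'
     # Hypothetical name patterns to skip anywhere in the indexed tree.
     printf '%s\n' 'skippedNames = *.bak *~ .git'
 } > "$confdir/recoll.conf"
 cat "$confdir/recoll.conf"
 ```

 Note that setting +skippedNames+ in this way most likely replaces the
 default pattern list rather than extending it, so it may be worth copying
 the default value first.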
--- a/website/faqsandhowtos/ProblemSolvingData.txt
+++ b/website/faqsandhowtos/ProblemSolvingData.txt
@ -0,0 +1,157 @@
 == Gathering useful data for asking help about or reporting a Recoll issue
 Once in a while it will happen that a Recoll program will either signal an
 error, or even crash (either the *recoll* graphical interface or the
 *recollindex* command line indexing command). 
 Reporting errors and crashes is very useful. It can help others, and it can
 get your own problem solved. 
 Any problem report should include the exact Recoll and system versions.
 If at all possible, reading the following and performing part of the
 suggested steps will be useful. This is not a condition for obtaining help
 though ! If you have any problem and have a difficulty with the following,
 just contact the mailing list or the developers (see contacts on
 link:https://www.recoll.org/support.html[the Recoll site support page]).
 If the problem concerns indexing, and was initially found using the
 *recoll* GUI, you should try to reproduce it using the
 *recollindex* command-line indexer, which is much simpler and easier to
 debug. 
 There are then two sources of useful information to diagnose the issue: the
 debug log file and, possibly, in case of a crash, a stack trace. 
 Crash and other problem reports are of very high value to me, and I am
 willing to help you with any of the steps described below if it is not
 familiar to you. I do realize that not everybody is a programmer or a
 system administrator. 
 === Obtaining information from the log file
 All Recoll commands write a varying amount of information to a common log file.
 _All commands use the same log, and the file is reset every time a command
 is started: so it is important to make a copy right after the problem
 occurs (for example, do not start *recoll* after a *recollindex*
 crash, this would reset the log). A workaround for this issue is to let the
 messages go to the default +stderr+, and redirect this._
 By default, the messages are output to +stderr+, and you probably don't even
 see them if Recoll is started from the desktop. In this case, you need to
 set the parameters so that output goes to a file, and the appropriate
 verbosity level is set. When using the command-line, you may actually
 prefer to redirect stderr to avoid the log-truncating issue described
 above. 
 You can set the log parameters from the GUI _Indexing parameters_
 section or by editing the '~/.recoll/recoll.conf' file: set the
 +loglevel+ and +logfilename+ parameters. E.g.: 
 ----
 loglevel = 6
 logfilename = /tmp/recolltrace
 ----
 The log file can become very big if you need a big indexing run to
 reproduce the problem. Choose a file system with enough space available
 (possibly a few gigabytes). 
 Then run the sequence that leads to the problem, and make a copy of the log
 file just after. If the log is too big, it will usually be sufficient to
 use the last 500 lines or so (tail -500). 
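 Because the log is reset by the next Recoll command, it can help to script
 the copy. The following is only a sketch: +save_log_tail+ is a hypothetical
 helper, and the default file names are taken from the example
 configuration above:

 ```shell
 # save_log_tail is a hypothetical helper: copy the last 500 lines of the
 # log to a separate file before another Recoll command resets it.
 save_log_tail() {
     log=${1:-/tmp/recolltrace}
     out=${2:-$log.tail}
     tail -n 500 "$log" > "$out"
 }
 
 # Usage, right after the problem occurs:
 #   save_log_tail /tmp/recolltrace /tmp/recolltrace-report.txt
 ```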
 ==== Single file indexing issues
 When the problem concerns, or can be reproduced with, a single file it is
 very cumbersome to have to run a full indexing pass to reproduce it. There
 are two ways around this: 
 - Set up an ad hoc configuration with only the file of interest, or its
  parent directory: 
 ----
 cd
 mkdir recoll-test
 cd recoll-test
 echo /path/to/my/file/or/its/parent/dir > recoll.conf
 echo 'loglevel = 6' >> recoll.conf
 echo 'logfilename = /tmp/recolltrace' >> recoll.conf
 recollindex -z -c .
 ----
 - Use the -e and -i options to recollindex to erase/reindex a single
  file. Set up the log, then: 
 ----
 recollindex -e /path/to/my/file
 recollindex -i /path/to/my/file
 ----
 When using the second approach, you must take care that the path used is
 consistent with the paths listed/used in the configuration. For example, if
 '/home' is a link to '/usr/home', and '/usr/home/me' is used in the
 configuration +topdirs+, `recollindex -i /home/me/myfile` will not work: you
 need to use `recollindex -i /usr/home/me/myfile`.
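 When the configuration uses the physical path (as '/usr/home/me' does in
 the example), the shell can compute it. This is only a sketch:
 +canonical_path+ is a hypothetical helper, and it does not help if
 +topdirs+ itself goes through a symbolic link:

 ```shell
 # canonical_path is a hypothetical helper: print the path of a file with
 # the symbolic links in its directory part resolved (pwd -P).
 canonical_path() {
     dir=$(cd "$(dirname "$1")" && pwd -P) || return 1
     printf '%s/%s\n' "$dir" "$(basename "$1")"
 }
 
 # Usage:
 #   recollindex -i "$(canonical_path /home/me/myfile)"
 ```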
 === Obtaining a stack trace
 If the program actually crashes, and in order to maximize usefulness, a
 crash report should also include a so-called stack trace, something that
 indicates what the program was doing when it crashed. Getting a useful
 stack trace is not very difficult, but it may need a little work on your
 part (which will then enable me to do my part of the work). 
 If your distribution includes a separate package for Recoll debugging
 symbols, it probably also has a page on its web site explaining how to use
 them to get a stack trace. You should follow these instructions. If there
 is no debugging package, you should follow the instructions below. A little
 familiarity with the command line will be necessary. 
 ==== Compiling and installing a debugging version
 - Obtain the recoll source for the version you are using (www.recoll.org),
  and extract the source tree. 
 - Follow the
  link:http://www.lesbonscomptes.com/recoll/usermanual/rcl.install.building.html[instructions
  for building Recoll from source] with the following modifications:
 - Before running configure, edit the mk/localdefs.in file and remove the
  -O2 option(s). 
 - When running configure, specify the standard installation location for
  your system as a prefix (to avoid ending up with two installed versions,
  which would almost certainly end in confusion). On Linux this would
  typically be: `configure --prefix=/usr`
 - When installing, arrange for the installed executables not to be stripped
  of debugging symbols by specifying a value for the STRIP environment
  variable (e.g.: *echo* or *ls*): `sudo make install STRIP=ls`
 ==== Getting a core dump
 You will need to run the operation that caused the crash inside a writable
 directory, and tell the system that you accept core dumps. The commands
 need to be run in a shell inside a terminal window. E.g.: 
 ----
 cd
 ulimit -c unlimited
 recoll  #(or recollindex or whatever you want to run).
 ----
 Hopefully, you will succeed in getting the command to crash, and you will
 get a core file. A possible approach then would be to make both the
 executable and the core file available to me by uploading them to a file
 sharing site (the core file may be quite big). You should be aware though
 that the core file may contain some of the data that was being indexed,
 which may be a privacy issue. Another approach is to generate the stack
 trace yourself. 
 === Using gdb to get a stack trace
 - Install gdb if it is not already on the system.
 - Run gdb on the command that crashed and the core file (depending on the
  system, the core file may be named "core" or something else, like
  recollindex.core, or core.pid), e.g.: `gdb /usr/bin/recollindex core`
 - Inside gdb, you need to use different commands to get a stack trace for
  recoll and recollindex. For recollindex you can use the bt command. For
  recoll use `thread apply all bt full`
 - Copy/paste the output to your report email :), and quit gdb ("q").
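 The steps above can also be run non-interactively with the gdb batch mode.
 This is a sketch rather than a tested recipe: +recoll_backtrace+ is a
 hypothetical helper, the default output file name is arbitrary, and it
 uses `thread apply all bt full` for both programs, which also covers the
 plain `bt` case:

 ```shell
 # recoll_backtrace is a hypothetical helper: run gdb in batch mode on a
 # program and its core file, and save all thread backtraces to a file.
 recoll_backtrace() {
     prog=$1 core=$2 out=${3:-/tmp/recoll-backtrace.txt}
     [ -f "$core" ] || { echo "no core file: $core" >&2; return 1; }
     gdb -batch -ex 'thread apply all bt full' "$prog" "$core" > "$out" 2>&1
 }
 
 # Usage:
 #   recoll_backtrace /usr/bin/recollindex core
 ```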
--- a/website/faqsandhowtos/QpdfviewHelperScript.txt
+++ b/website/faqsandhowtos/QpdfviewHelperScript.txt
@ -0,0 +1,61 @@
 == Starting native applications ==
 Another example of using an intermediary script for an application with a
 command line syntax which can't be directly defined in mimeview. 
 We use a script to preprocess and adapt the options before calling the
 actual command. 
 Details about configuring how the native application or script is called
 are given with the
 link:http://www.recoll.org/usermanual/usermanual.html#RCL.INSTALL.CONFIG.MIMEVIEW[description
 of the mimeview configuration file].
 *qpdfview* (link:http://launchpad.net/qpdfview[web site]) is a very
 lightweight tabbed PDF viewer with great search performance and result
 highlighting.
 It does support parsing the search term and page number from the command
 line with the following syntax:
 ----
 qpdfview --unique "%f"#%p --search "%s"
 ----
 However, qpdfview will not launch if either %p or %s is empty in the
 command above. To work around this, Recoll user Florian has written a
 small wrapper shell script:
 ----
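 #!/bin/bash
 # Wrapper for qpdfview: $1 = file, $2 = page number (may be empty),
 # $3 = search terms (may be empty).
 qpdfviewpath=qpdfview
 # Only append "#page" when a page number was supplied.
 if [ -z "$2" ]
 then
    page=""
 else
    page="#$2"
 fi
 # Only pass --search when search terms were supplied. The array keeps
 # multi-word search terms together as a single argument.
 search=()
 if [ -n "$3" ]
 then
    search=(--search "$3")
 fi
 "$qpdfviewpath" --unique "$1$page" "${search[@]}" >&0 2>&0 &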
 ----
 The corresponding handler line for Recoll would be (depending on how you
 name the script and where you store it):
 ----
      qpdfviewwrapper %f %p %s
 ----
--- a/website/faqsandhowtos/QueryFromC.txt
+++ b/website/faqsandhowtos/QueryFromC.txt
@ -0,0 +1,18 @@
 == Querying Recoll from a C program
 The easiest way to query Recoll from a C or C++ program is to execute an
 external search command (`recollq` or `recoll -t`).
 I have written a simple C module which deals with the related housekeeping
 and presents an easy to use API to the rest of the code. You will find it
 here:
    https://bitbucket.org/medoc/recoll-capi
 It is a bit experimental and will only work with recoll 1.20 for now
 (because it uses a new option for recollq). However it would be trivial to
 modify for working with 1.19, get in touch with me if you need this.
 The other approach is to link with the Recoll library. This has no official
 API, but in practice, the internal one is fairly stable, and if you want to
 choose this approach, you should start from the code in recollq.cpp.
--- a/website/faqsandhowtos/ReplaceCategories.txt
+++ b/website/faqsandhowtos/ReplaceCategories.txt
@ -0,0 +1,58 @@
 == Replacing the Category filter controls
 The document category filter controls normally appear at the top of the
 *recoll* GUI, either as checkboxes just above the result list, or as a
 drop-down list in the tool area.
 By default, they are labeled _Media_, _Message_, _Spreadsheet_, _Text_,
 etc. and each map to a document category.
 The mapping used to be fixed. You could change the number and composition
 of categories by redefining them inside the 'mimeconf' configuration
 file (you still can), but the filters always used document categories.
 Categories can also be selected from the query language by using an
 +rclcat:+ selector. E.g.: _rclcat:message_.
 As of Recoll release 1.17, the filters are not hard-wired any more. They
 map to query language fragments. This means that you can freely redefine
 what they do. 
 The associations are configured inside the 'mimeconf' file, in the
 +[guifilters]+ section. Most GUI parameters are stored in the *Qt*
 configuration file, so this is not entirely consistent, and you will have
 to bear with my laziness here.
 A simple example will hopefully make things clearer. If you add the 
 following to your '~/.recoll/mimeconf' file:
 ----
 [guifilters]
 Big Books = dir:"~/My Books" size>10K
 My Docs = dir:"~/My Documents"
 Small Books = dir:"~/My Books" size<10K
 System Docs = dir:/usr/share/doc
 ----
 You will have four filter checkboxes, labelled _Big Books_, _My Docs_, etc.
 The text after the equal sign must be a valid query language fragment, and
 will be translated to a *Recoll* query and combined with the rest of the
 query with an AND conjunction.
 Any name text before a colon character will be erased in the display, but
 used for sorting. You can use this to display the checkboxes in any order
 you like. For example, the following would do exactly the same as above,
 but ordering the checkboxes in the reverse order.
 ----
 [guifilters]
 d:Big Books = dir:"~/My Books" size>10K
 c:My Docs = dir:"~/My Documents"
 b:Small Books = dir:"~/My Books" size<10K
 a:System Docs = dir:/usr/share/doc
 ----
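 The effect of the prefixes can be illustrated outside Recoll with a small
 sketch. It assumes that the GUI sorts the full labels in plain lexical
 order, then drops everything up to the first colon. +display_order+ is a
 hypothetical helper, not a Recoll command:

 ```shell
 # display_order is a hypothetical helper: sort the labels on their full
 # text (prefix included), then strip the "prefix:" part for display.
 display_order() {
     LC_ALL=C sort | sed 's/^[^:]*://'
 }
 
 printf '%s\n' 'd:Big Books' 'c:My Docs' 'b:Small Books' 'a:System Docs' |
     display_order
 ```

 This prints _System Docs_, _Small Books_, _My Docs_ and _Big Books_, in
 this order, which is the reverse of the order obtained without prefixes.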
--- a/website/faqsandhowtos/ResultsThumbnails.txt
+++ b/website/faqsandhowtos/ResultsThumbnails.txt
@ -0,0 +1,23 @@
 == Result list thumbnails and how to create them
 Recoll will display thumbnails for the results if the images exist in the 
 standard location ('$HOME/.thumbnails' or '$HOME/.cache/thumbnails' depending
 on the xdg version). 
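 To check which of the two locations is in use on a given system, a sketch
 like the following may help (+thumbnail_dir+ is a hypothetical helper
 written for this example):

 ```shell
 # thumbnail_dir is a hypothetical helper: print the first standard
 # thumbnail directory which exists for the current user.
 thumbnail_dir() {
     for d in "$HOME/.cache/thumbnails" "$HOME/.thumbnails"; do
         if [ -d "$d" ]; then
             printf '%s\n' "$d"
             return 0
         fi
     done
     return 1
 }
 
 thumbnail_dir || echo "no thumbnail directory found"
 ```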
 But it will not create thumbnails, mainly because it is very hard to do
 portably.
 Thumbnails are most commonly created when you visit a directory with your
 file manager, but visiting the whole file tree just to create thumbnails is
 a bit tedious.
 One simple trick to create thumbnails from the recoll GUI is to visit the
 parent directory for a result by using the _Open parent document/folder_
 entry in the right-click menu.
 You can also find tools for the systematic creation of thumbnails for a
 directory tree. Three such tools are discussed in this 
 link:http://askubuntu.com/questions/199110/how-can-i-instruct-nautilus-to-pre-generate-pdf-thumbnails[askubuntu.com discussion].
 Also please note that no thumbnails can currently be generated or displayed
 for embedded documents (attachments, archive members, etc.).
--- a/website/faqsandhowtos/SavingConfig.txt
+++ b/website/faqsandhowtos/SavingConfig.txt
@ -0,0 +1,61 @@
 == User configuration backup
 === Why you would want to do this
 If you are going to reinstall your system, and have some custom
 configuration, you may save some time by making a backup of your
 configuration and restoring it on the new system, rather than going through
 the menus to recreate it.
 === How to do it
 ==== Index/search configuration
 The main recoll configuration data is normally kept inside '~/.recoll' or
 whatever *$RECOLL_CONFDIR* is set to.
 This directory contains both configuration files and generated index
 data. In a standard configuration, the following files and directories
 contain generated data: 
 - 'xapiandb' contains the Xapian index, which normally consumes most of the
  total space. 
 - 'aspdict.en.rws' contains the aspell dictionary used for spelling
  corrections. 
 - 'mboxcache' contains cached offset data for email messages inside mbox
  folders. 
 - 'webcache' contains saved web pages. This is more than a cache as
  destroying it will purge the corresponding data during the next
  indexing. 
 The other files are either very small or contain configuration data.
 If you want to only save configuration, using minimum space, you can
 destroy the above files and directories (with the possible exception of
 'webcache'). Then taking a copy of the '.recoll' directory and adding the
 GUI configuration data described in the next section will get you a full
 configuration data backup. 
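 A less destructive alternative is to leave the generated data in place and
 exclude it from the archive. The following sketch assumes a *tar* which
 supports +--exclude+ (GNU tar and bsdtar do). +backup_recoll_config+ and
 the default archive name are hypothetical, and 'webcache' is deliberately
 kept:

 ```shell
 # backup_recoll_config is a hypothetical helper: archive the configuration
 # directory without the generated index, dictionary and mbox cache data.
 backup_recoll_config() {
     confdir=${1:-$HOME/.recoll}
     out=${2:-$HOME/recoll-config-backup.tar.gz}
     tar -czf "$out" \
         --exclude=xapiandb --exclude='aspdict.*.rws' --exclude=mboxcache \
         -C "$(dirname "$confdir")" "$(basename "$confdir")"
 }
 
 # Usage:
 #   backup_recoll_config ~/.recoll ~/recoll-config-backup.tar.gz
 ```

 The GUI configuration file is stored elsewhere and has to be copied
 separately.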
 ==== GUI configuration
 The parameters set from the _Query configuration_ Qt menus are stored in
 Qt standard places:
 - '~/.qt/recollrc' for Qt 3.x
 - '~/.config/Recoll.org/recoll.conf' for Qt 4 and later
 ==== Other data
 If you wish to save index data in addition to the customisation files,
 which only makes sense if the document access paths do not change after
 reinstallation, you can just take a backup of the full '.recoll'
 directory, taking care that the storage locations for some data elements
 can be changed (not be inside '.recoll'): 
 - The index data is normally kept inside '~/.recoll/xapiandb', but the
  location of this directory can be modified by the +dbdir+
  configuration parameter if it is set (check 'recoll.conf'). 
 - If you use the Firefox Recoll plugin, the WEB history cache is normally
  kept inside '~/.recoll/webcache', but the location can be modified by
  the +webcachedir+ configuration parameter. 
--- a/website/faqsandhowtos/UnityLens.txt
+++ b/website/faqsandhowtos/UnityLens.txt
@ -0,0 +1,109 @@
 == Building and Installing the Ubuntu Unity Recoll Lens
 Important preliminary notes:
 - This only makes sense for Ubuntu versions using the Unity environment:
  Natty (11.04), Oneiric (11.10), Precise (12.04), and later. 
 - _Remember that you still need to use the recoll GUI (or the recollindex
  command) to get the indexing going!_
 - The Lens is artificially limited to showing at most 20 results. Use the
  recoll GUI for more complete capabilities (or edit rclsearch.py, change
  the "if actual_results >= 20:" line). 
 === The Lens with Recoll 1.17 and later
 If you are willing to install or upgrade to Recoll version 1.17, all
 necessary packages are on the Recoll PPA, you just need to add the
 repository to your system sources and add or upgrade the packages. *_This
 is the recommended approach!_*
 ----
 sudo add-apt-repository ppa:recoll-backports/recoll-1.15-on
 sudo apt-get update
 sudo apt-get install recoll-lens recoll
 ----
 This document may still be useful if you want to modify the lens source
 code.
 === The Lens with older Recoll versions
 If, for some reason, you wish to test the Lens with an older Recoll
 version, read the following. 
 Please note that such an installation is somewhat crippled: you will not be
 able to display results for embedded documents (emails inside an mbox,
 attachments etc.). This requires a recoll command line option which is only
 available in 1.17. 
 The Lens is based on the Recoll Python module which is not built by default
 for versions prior to 1.17, so you will first need to pull the Recoll
 source code (for your version), then untar and proceed with the
 configure/build instructions below. 
 The following uses --prefix=/usr. I have no real reason to believe 
 that this would not work with /usr/local (lenses are also searched there by
 default). If you confirm that things work with another prefix, please drop
 me a line.
 When doing this over a previous Recoll compilation, run a "make clean" to
 get rid of the non-PIC objects. 
 Note that the following instructions do not change your existing Recoll
 installation: they only install the Python module and the Unity Lens.
 recoll, recollindex etc. are unaffected. 
 '/TOP/OF/RECOLL/SRC' designates the top of the recoll source tree.
 === Configure and build the recoll library and python module, install the module
 The following needs the development packages for Xapian, Python and zlib.
 ----
 cd /TOP/OF/RECOLL/SRC 
 # May fail if no previous build was performed
 make clean
 # the gui/x11 disabling is just here to avoid having to install the
 # development libraries for Qt.
 ./configure --prefix=/usr --enable-pic --without-x --disable-qtgui
 make
 cd python/recoll
 python setup.py build
 sudo python setup.py install
 ----
 === Build and install the Unity Lens
 ----
 cd /TOP/OF/RECOLL/SRC
 cd desktop/unity-lens-recoll
 ./configure --prefix=/usr --sysconfdir=/etc 
 sudo make install
 ----
 Voilà, it should work...
 Try to start the Dash, you should see the Recoll checkerboard (or
 whatever...) in the Lens list. 
 The Recoll Lens expects a Recoll query language string, so you can use
 field searches, directory, size, and date filtering (see the
 link:http://www.lesbonscomptes.com/recoll/usermanual/rcl.search.lang.html[Recoll
 manual] for a description of the query language).  
 If you want to disable the Lens, I think that you just have to delete
 '/usr/share/unity/lenses/recoll'
 Other installed files:
 ----
 /usr/libexec/unity-recoll-daemon
 /usr/share/dbus-1/services/unity-lens-recoll.service
 /usr/share/doc/unity-lens-recoll
 /usr/share/unity-lens-recoll
 ----
--- a/website/faqsandhowtos/UsingOpenWith.txt
+++ b/website/faqsandhowtos/UsingOpenWith.txt
@ -0,0 +1,68 @@
 == Using the _Open With_ context menu in recoll 1.20 and newer
 Recoll versions 1.20 and newer have an _Open With_ entry in the result list
 context menu (the thing which pops up on a right click).
 This allows choosing the application used to edit the document, instead of
 using the default one.
 The list of applications is built from the desktop files found inside
 '/usr/share/applications'. For each application on the system, these
 files list the mime types that the application can process.
 If the application which you would want listed does not appear, the most
 probable cause is that it has no desktop file, which could happen due to a
 number of reasons.
 This can be fixed very easily: just add a +.desktop+ file to
 '/usr/share/applications', starting from an existing one as a template.
 As an example, based on an original idea from Recoll user +florianbw+,
 the following describes setting up a script for editing a PDF document
 title found in the recoll result list.
 The script uses the *zenity* shell script dialog box tool to let you
 enter the new title, and then executes *exiftool* to actually change
 the document.
 ----
 #!/bin/sh
 PDF=$1
 TITLE=`exiftool -Title -s3 "$PDF"`
 RES=`zenity --entry \
  --title="Change PDF Title" \
  --text="Enter the Title:" \
  --entry-text "$TITLE"`
 if [ "$RES" != "" ]; then 
    echo -n "Changing title to $RES ... " && \
        exiftool -Title="$RES" "$PDF" && \
        recollindex -i "$PDF" && echo "Done!"
 else 
     echo "No title entered"
 fi
 ----
 Name it, for example, 'pdf-edit-title.sh', and make it executable 
 (`chmod a+x pdf-edit-title.sh`).
 Then create a file named 'pdf-edit-title.desktop' inside
 '/usr/share/applications'. The file name does not need to be the same as the
 script's, this is just to make things clearer:
 ----
 [Desktop Entry]
 Name=PDF Title Editor
 Comment=Small script based on exiftool used to edit a pdf document title
 Exec=/home/dockes/bin/pdf-edit-title.sh %F
 Type=Application
 MimeType=application/pdf;
 ----
 You're done ! Restart Recoll, perform a search and right-click on a PDF
 result: you should see an entry named _PDF Title Editor_ in the _Open
 With_ list. Click on it, and you will be able to edit the title.
--- a/website/faqsandhowtos/WhyIsMyFileNotIndexed.txt
+++ b/website/faqsandhowtos/WhyIsMyFileNotIndexed.txt
@ -0,0 +1,99 @@
 == Using the log file to investigate indexing issues
 All *Recoll* processes print trace messages. By default these go to the
 standard error output, and you may not ever see them (in the case, for
 example, of the *recoll* GUI started from the desktop interface). 
 There are a number of potential issues with indexing that may need
 investigation, such as: 
 - A file can't be found by searching even if it appears that it should have
  been indexed (this could happen because the file is not selected at all or
  because a filter program crashes). 
 - The indexing process gets stuck and never finishes.
 - The indexing process ends up with an error.
 - The indexing process seems to be using too much system capacity.
 The right way to approach these problems is to use the *recollindex*
 command line tool (instead of the *recoll* GUI), and to set up the
 trace log to provide information about what indexing is actually doing. 
 Trace log parameters can be set either from the GUI _Preferences->Indexing
 Configuration->Global Parameters_ panel, or by editing the configuration
 file '~/.recoll/recoll.conf'. You should set the following parameters: 
 ----
 loglevel = 6
 logfilename = stderr
 thrQSizes = -1 -1 -1
 ----
 We use _stderr_ instead of an actual file in order to capture direct filter
 messages (such as a *python* stack trace) along with normal
 *recollindex* messages. 
 The last line sets recollindex for single-threaded operation, which will
 make the log much more readable. 
 You should then check that no *recoll* or *recollindex* process is
 currently running, and kill any you find. 
 Then, if this is an issue about an identified file, try indexing it only:
 ----
 recollindex -i myunfindablefile.xxx > /tmp/myindexlog 2>&1
 ----
 If this is a general issue with indexing (process not finishing properly),
 just start it: 
 ----
 recollindex > /tmp/myindexlog 2>&1
 ----
 Usually, having a look at the trace will let you see what is wrong (e.g.:
 a configuration issue or a missing filter), and solve the problem.
 In case of indexer misbehaviour (e.g. using too much memory), you should run
 _tail -f_ on the log to see what is going on. 
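 A level 6 log can be very long. The following is a minimal sketch of a
 filter for it, not a Recoll tool: it assumes that every *recollindex*
 message starts with +:level:+ as in the sample log line shown further
 down, and it keeps unprefixed lines because they may come directly from a
 filter (e.g. a *python* stack trace). The +log_up_to_level+ name is
 hypothetical.
 ```shell
 # Read a log on stdin and print the messages whose level is at most $1.
 # Lines without a ":level:" prefix (assumed to be direct filter output)
 # are always printed.
 log_up_to_level() {
     awk -F: -v max="$1" \
         '!($1 == "" && $2 ~ /^[0-9]+$/) || $2 + 0 <= max + 0'
 }
 # Example use: log_up_to_level 2 < /tmp/myindexlog
 ```
 With a level of +2+, this leaves only the error messages and the raw
 filter output.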
 If this is not enough, please
 link:http://bitbucket.org/medoc/recoll/issues/new[open a tracker issue] and
 attach or link to the log data, or just email me (jfd at recoll.org). 
 *recollindex* and *recollindex -i* usually have the same criteria to
 include a file or not (but see the _Path gotcha_ note below). It may
 happen that they behave differently, so it may sometimes be useful to run a
 full *recollindex* even for a specific file, but this will produce a
 big log file. 
 When you are done, it is better to reset the verbosity to a reasonable
 level (e.g.: +2+ : just errors, +4+ : basic traces). 
 === Note: the path gotcha
 *recollindex -i* will only index files under the directories defined by the
 +topdirs+ configuration variable (your home directory by
 default). Unfortunately, the test is done on the file path text, ignoring
 possible symbolic links. If you give a simple file name as a parameter to
 *recollindex -i* and there are symbolic links inside the +topdirs+
 entries, the comparison may fail. For example, if your home directory is
 '/home/me/' and '/home/' is a link to '/usr/home/', *recollindex -i
 somefilename* will actually try to index '/usr/home/me/somefilename', and
 fail (because '/usr/home/me/' is not a subdirectory of '/home/me/'). This
 will manifest itself in the log by a message like the following:
 ----
 :4:../index/fsindexer.cpp:149:FsIndexer::indexFiles: skipping [/usr/home/me/somefile] (ntd)
 ----
 If this happens, give a full path consistent with what is found in the
 configuration file (e.g.: _recollindex -i /home/me/somefile_). 
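 The test can be reproduced by hand before running the indexer. This is
 only a sketch of a purely textual comparison, assuming that it behaves
 like the check described above; +under_topdir+ is a hypothetical helper,
 not part of Recoll.
 ```shell
 # Succeed if path $1 is textually located under the top directory $2.
 # No symbolic link is resolved: only the path text is compared.
 under_topdir() {
     case "$1/" in
         "${2%/}"/*) return 0 ;;
         *) return 1 ;;
     esac
 }
 # The resolved path from the example above fails the test.
 under_topdir /usr/home/me/somefile /home/me || echo "would be skipped"
 # The path written as in the configuration passes it.
 under_topdir /home/me/somefile /home/me && echo "would be indexed"
 ```
 Comparing the output of +pwd -P+ (links resolved) with your +topdirs+
 entries is a quick way to see whether you are exposed to the problem.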
 === File system occupation
 One of the possible reasons for failed indexing is a +maxfsoccup+
 parameter set too low. This is the value of file system occupation, not
 free space, where indexing will stop. It is set from the GUI indexing
 configuration or by editing 'recoll.conf'. A value of 0 implies no
 checking, but a very low, non-zero, value will just prevent indexing. 
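 To compare the parameter with the actual state of the disk, something like
 the following sketch can be used. Both function names are hypothetical,
 and the exact comparison performed by the indexer (here: stop when the
 occupation reaches the threshold) is an assumption.
 ```shell
 # Print the occupation (used space, in percent) of the file system
 # holding directory $1, taken from the Capacity column of "df -P".
 fs_occupation() {
     df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
 }
 # Succeed if indexing would stop for an occupation of $1 percent and a
 # maxfsoccup value of $2. A value of 0 disables the check.
 indexing_would_stop() {
     [ "$2" -ne 0 ] && [ "$1" -ge "$2" ]
 }
 # Example use: indexing_would_stop "$(fs_occupation "$HOME")" 90
 ```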
--- a/website/faqsandhowtos/WikiIndex.txt
+++ b/website/faqsandhowtos/WikiIndex.txt
@ -0,0 +1,65 @@
 == Recoll Wiki file index
 link:ElinksWeb.html[Extending the Recoll Firefox visited web page indexing mechanism to other browsers]
 link:FaqsAndHowTos.html[Faqs and Howtos]
 link:FilterArch.html[Recoll input filters ]
 link:FilterRetrofit.html[Installing a filter for a new document type]
 link:FilteringOutZipArchiveMembers.html[Filtering out Zip archive members]
 link:GUIKeyboard.html[Recoll GUI keyboard navigation]
 link:HandleCustomField.html[Generating a custom field and using it to sort results]
 link:Home.html[Welcome to the Recoll Wiki]
 link:HotRecoll.html[Recoll hotkey: starting / hiding recoll with a keyboard shortcut]
 link:IndexMailHeader.html[Indexing arbitrary mail headers ]
 link:IndexMozillaCalendari.html[Indexing Mozilla calendar data ]
 link:IndexOnAc.html[Laptops: automatically starting or stopping indexing according to AC power status]
 link:IndexOutlook.html[Indexing Outlook archives]
 link:IndexWebHistory.html[Indexing Web history with the Firefox extension ]
 link:MultipleIndexes.html[Creating and using multiple indexes]
 link:MuttAndRecoll.html[Interfacing Recoll and Mutt]
 link:NonAsciiFileNames.html[Unix and non-ASCII file names, a summary of issues]
 link:OpenHelperScript.html[Starting native applications ]
 link:PreventIndexingDir.html[Preventing indexing in a directory]
 link:ProblemSolvingData.html[Gathering useful data for asking help about or reporting a Recoll issue]
 link:QpdfviewHelperScript.html[Starting native applications ]
 link:QueryFromC.html[Querying Recoll from a C program]
 link:ReplaceCategories.html[Replacing the Category filter controls]
 link:ResultsThumbnails.html[Result list thumbnails and how to create them]
 link:SavingConfig.html[User configuration backup]
 link:UnityLens.html[Building and Installing the Ubuntu Unity Recoll Lens]
 link:UsingOpenWith.html[Using the Open With context menu in recoll 1.20 and newer]
 link:WhyIsMyFileNotIndexed.html[Using the log file to investigate indexing issues]
 link:XDGBase.html[XDG: Tidying Recoll data storage]
 link:ZDevCaseAndDiacritics1.html[Character case and diacritic marks (1), issues with stemming]
 link:ZDevCaseAndDiacritics2.html[Character case and diacritic marks (2), user interface]
 link:ZDevCaseAndDiacritics3.html[Character case and diacritic marks (3), implementation]
--- a/website/faqsandhowtos/XDGBase.txt
+++ b/website/faqsandhowtos/XDGBase.txt
@ -0,0 +1,42 @@
 == XDG: Tidying Recoll data storage ==
 The default storage structure of Recoll configuration and index data is
 quite at odds with the recommendations of the
 link:http://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html[XDG
 Base Directory Specification], the reason being that it predates said spec.
 By default, Recoll stores all its data in a single directory: '$HOME/.recoll'.
 This is not going to change, because it would be quite disruptive for
 current users.
 However, the location of this directory can be modified using the
 +$RECOLL_CONFDIR+ environment variable.
 Furthermore all significant Recoll data categories can be moved away from
 the configuration directory (maybe to '$HOME/.cache'), by setting
 configuration variables:
 * _dbdir_ defines the location for storing the Xapian
  index. This could be set to, e.g., '$HOME/.cache/recoll/xapiandb'. It is
  strongly recommended that
  this directory be dedicated to Xapian (don't store other things in
  there).
 * _mboxcachedir_ defines the location for caching access speedup information
  about mail folders in mbox format, e.g. '$HOME/.cache/recoll/mboxcache'.
 * New in 1.22: you can use _aspellDictDir_ to define the storage
  location for the aspell spelling approximation
  dictionary. E.g. '$HOME/.cache/recoll'
 * _webcachedir_ may be used to define where the visited web pages
  archive is stored. E.g. '$HOME/.cache/recoll/webcache'. This is only used
  if you activate the Firefox plugin and web history indexing. You may
  want to think a bit more about where to store it, because, contrary to
  the above, this is not discardable data: your Recoll Web history goes
  away if you delete it.
 If you use multiple Recoll configurations, each will have to be customized.
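 As an illustration, the following sketch prints the corresponding
 'recoll.conf' lines for a given cache directory, so that they can be
 reviewed and pasted into each configuration. The +xdg_settings+ name and
 the default location are assumptions, paths are written out in full, and
 _webcachedir_ is deliberately left out because it does not hold
 discardable data.
 ```shell
 # Print recoll.conf settings moving the discardable data under $1
 # (default: $HOME/.cache/recoll).
 xdg_settings() {
     cache=${1:-$HOME/.cache/recoll}
     printf 'dbdir = %s/xapiandb\n' "$cache"
     printf 'mboxcachedir = %s/mboxcache\n' "$cache"
     # aspellDictDir needs Recoll 1.22 or newer.
     printf 'aspellDictDir = %s\n' "$cache"
 }
 xdg_settings
 ```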
 Once these are put away, there are still a few modifiable files in the
 configuration directory, for example the 'recoll.pid' and 'history'
 files, but these are small files. Moving 'recoll.pid' away would be a
 serious headache because it is used by scripts. 
--- a/website/faqsandhowtos/ZDevCaseAndDiacritics1.txt
+++ b/website/faqsandhowtos/ZDevCaseAndDiacritics1.txt
@ -0,0 +1,143 @@
 == Character case and diacritic marks (1), issues with stemming
 === Case and diacritics in Recoll
 Recoll versions up to 1.17 almost fully ignore character case and diacritic
 marks. 
 All terms are converted to lower case and unaccented before they are
 written to the index. There are only two exceptions:
 * File paths (as used in _dir:_ clauses) are not converted. This might
   be a bug or a feature, but the main reason is that we don't know how they
   are encoded.
 * It is possible to specify that some characters will keep their diacritic
   marks, because the entity formed by the character and the diacritic mark
   is considered to be a different letter, not a modified one. This is
   highly dependent on the language. For example, in Swedish, +å+ should
   be preserved, not turned into +a+.
 As a necessary consequence, the same transformations are applied to search
 terms, and it is impossible to search for a specific capitalization of a
 word (+US+ is looked for as +us+), or a specific accented form
 (+café+ will be looked for as +cafe+).
 However, there are some cases where you would like to be more specific:
 * Searching for +US+ or +us+ should probably return different results.
 * Diacritics are seldom significant in English, but we can find a
   few examples anyway: +sake+ and +saké+, +mate+ and +maté+. Of
   course, there are many more cases in languages which use more diacritics.
 On the other hand, accents are often mistyped or forgotten (résumé, résume,
 resume?), and capitalization is most often insignificant. It is therefore
 very important to retain the capability to ignore accent and character
 case differences, and to make it easy to switch the discrimination on or
 off for each search (or even for specific terms).
 This text and other pages which will follow will discuss issues in adding
 character case and diacritics sensitivity to Recoll, under the assumption
 that the main index will contain the raw source terms instead of
 case-folded and unaccented ones.
 The following will use the _unaccent_ neologism to mean _remove
 diacritic marks_ (and not only accents). 
 English examples are used when possible, but given the limited use of
 diacritics in English, some French will probably creep in.
 === Diacritics and stemming
 Stemming is the process by which we extend a search to terms related by
 grammatical inflexion, for example singular/plural, verb tenses, etc. For
 example a search for +floor+ is normally expanded by Recoll to +floors,
 floored, flooring, ...+
 In practice Recoll has a separate data structure that has stemmed terms
 (stems) as keys pointing to a list of expansion terms:
 +floor -> (floor, floors, floorings, ...)+
 Stemming should be applied to terms before they are stripped of
 diacritics. Accents may have a grammatical significance, and the accent may
 change how the term is stemmed. For example, in French the +âmes+ suffix
 generally marks a past conjugation but +ames+ does not. The standard
 Xapian French stemmer will turn +évitâmes+ (avoided) into an +évit+ stem,
 but +évitames+ will be turned into +évitam+ (stripping
 plural and feminine suffixes).
 When the search is set to ignore diacritics, this poses a specific problem:
 if the user enters the search term without accents (which is correct
 because the system is supposed to ignore them), there is no guarantee that
 the term will be correctly expanded by stemming.
 The diacritic mismatch breaks the family relationship between the stem
 siblings, and this is independent of the type of index: it will happen with
 an index where diacritics are stripped just as with a raw one.
 The simpler case where diacritics in the original term only affect
 diacritics in the stem also necessitates specific processing, but it is
 easier to work around.
 Two examples illustrating these issues follow.
 ==== The simple case: diacritics in the term only affect diacritics in the stem
 Let's imagine that the document set contains the term +éviter+
 (infinitive of +to avoid+), but not +évite+ (present). The only term in
 the actual index is then +éviter+.
 The user enters an unaccented +evite+, counting on the
 diacritics-insensitive search mode to deal with the accents. As +évite+
 is not present in the index, we have no way to guess that +evite+ is
 really +évite+.
 The stemmer will turn +evite+ into +evit+. There is no way that this
 can be related to +éviter+, and this legitimate result can't be found.
 There is a way around this: we can compute a separate
 stem expansion dictionary for unaccented terms. This dictionary, to be used
 with diacritics-insensitive searches only, contains the relationship
 between +evit+ and +eviter+ (as +éviter+ is in the index). We can
 then relate +eviter+ and +éviter+ because they differ only by accents,
 and the search will find the document with +éviter+.
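 The workaround can be sketched with toy stand-ins. This is not how Recoll
 is implemented: +unaccent+ only knows two accented letters, +toy_stem+
 is a crude suffix stripper standing in for the Xapian French stemmer, and
 all the names are hypothetical.
 ```shell
 # Toy stand-ins (assumptions), only good enough for the terms used here.
 unaccent() { printf '%s\n' "$1" | sed -e 's/é/e/g' -e 's/â/a/g'; }
 toy_stem() {
     printf '%s\n' "$1" |
         sed -e 's/âmes$//' -e 's/es$//' -e 's/er$//' -e 's/e$//'
 }
 # Terms actually present in the (raw) index.
 INDEX_TERMS='éviter'
 # Print the "stem term" pairs of the unaccented stem expansion dictionary.
 unaccented_stem_table() {
     for term in $INDEX_TERMS; do
         stripped=$(unaccent "$term")
         printf '%s %s\n' "$(toy_stem "$stripped")" "$stripped"
     done
 }
 # Print the index terms matching $1 in diacritics-insensitive mode.
 expand_insensitive() {
     qstem=$(toy_stem "$(unaccent "$1")")
     unaccented_stem_table | while read -r stem stripped; do
         [ "$stem" = "$qstem" ] || continue
         # Relate the unaccented term to the index terms that differ
         # from it only by accents.
         for term in $INDEX_TERMS; do
             if [ "$(unaccent "$term")" = "$stripped" ]; then
                 printf '%s\n' "$term"
             fi
         done
     done
 }
 expand_insensitive evite
 ```
 The unaccented +evite+ is stemmed to +evit+, which the dictionary relates
 to +eviter+, which in turn differs from the indexed +éviter+ only by its
 accent.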
 ==== The bad case: diacritics in the term change the stem beyond diacritics
 Some grammatically significant accents will cause unexpectedly missing
 search results when using a supposedly diacritics-insensitive search mode.
 Let's imagine that the document set contains the term +éviter+ 
 (infinitive of +to avoid+), but not +évitâmes+ (past). So the stemming
 expansion table has an entry for +évit+ -> +éviter+.
 If the user enters an unaccented +evitames+, she would expect to find the
 documents containing +éviter+ in the results, because the latter term is
 a stemming sibling of +évitâmes+ and the search is supposedly not
 influenced by diacritics, so that +evitames+ and +évitâmes+ should be
 equivalent. 
 However, our search is now in trouble, because +évitâmes+ is not in any
 document, so that there is no data in the index which would inform us about
 how to transform the input term into something that differs only by accents
 but would yield a correct input for the stemmer.
 If we try to feed the raw user input to the stemmer, it will propose 
 an +evitam+ stem, which will not work, because the stem that actually 
 exists is +évit+, and +evitam+ cannot be related to +éviter+.
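 The mismatch can be reproduced with a toy suffix stripper. It is only a
 hypothetical stand-in for the Xapian French stemmer, written to mimic its
 behaviour on these two words.
 ```shell
 # Toy stemmer (assumption): strips a few French suffixes in sequence.
 toy_stem() {
     printf '%s\n' "$1" |
         sed -e 's/âmes$//' -e 's/es$//' -e 's/er$//' -e 's/e$//'
 }
 toy_stem evitames    # stem of the raw, unaccented user input
 toy_stem évitâmes    # stem of the correctly accented form
 ```
 The two stems differ by more than an accent, so unaccenting alone cannot
 bring them back together.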
 The only palliative approach I can think of would be a spelling correction
 of the input, performed independently of the actual index contents, which
 would notice that +evitames+ is not a French word and propose a change or an
 expansion to +évitâmes+, which would correctly stem to +évit+ and allow
 us to find +éviter+.
 This issue is not specific to Recoll or indeed to the fact that the index
 retains accents or not. As far as I can see, it is an intrinsic bad
 interaction between diacritics insensitivity and stemming.
 It is also interesting to note that this case becomes less probable when
 the data set becomes bigger, because more term inflexions will then be
 present in the index.
 We'll next think about an link:ZDevCaseAndDiacritics2.html[appropriate
 interface].
--- a/website/faqsandhowtos/ZDevCaseAndDiacritics2.txt
+++ b/website/faqsandhowtos/ZDevCaseAndDiacritics2.txt
@ -0,0 +1,122 @@
 == Character case and diacritic marks (2), user interface
 In a link:ZDevCaseAndDiacritics1.html[previous document], we discussed some
 of the problems which arise when mixing case/diacritics sensitivity and
 stemming.
 As of version 1.18, Recoll can create two types of indexes:
 * _Dumb_ indexes contain terms which are lowercased and stripped of
  diacritics. Searches using such an index are naturally case- and
  diacritics-insensitive: search terms are stripped before processing.
 * _Raw_ indexes contain terms which are just like they were found in the
  source document. Searching such an index is naturally sensitive to case
  and diacritics, and can be made insensitive by further processing.
 The following explains how users can control these Recoll features.
 === Controlling the type of index we create: stripped or raw
 The kind of index that recoll creates is determined by:
 * A build-time *configure* switch: _--enable-stripchars_. If this is
   set, the code for case and diacritics sensitivity is not compiled in and
   recoll will work like the previous versions: unaccented and casefolded
   index, no runtime options for case or diacritics sensitivity.
 * An indexing configuration switch (in recoll.conf): if Recoll was built
   with _--disable-stripchars_, this will provide a dynamic way to return
   to the "traditional" index. The case and diacritics code will be present
   but inactive. Normally, a recoll installation with this switch set
   should behave exactly like one built with _--enable-stripchars_. When
   using multiple indexes, this switch MUST be consistent between
   indexes. There is no support whatsoever for mixing raw and dumb indexes.
   The option is named _indexStripChars_, and it is not settable from the
   GUI to avoid errors. This is something that would typically be set once
   and for all for a given installation. We need to decide what the default
   value will be for 1.18.
 * A number of query time switches. Using these it is also possible to
   perform a search insensitive to case and diacritics on a raw index. Note,
   however, that given the complexity of the issues involved, I give no
   guarantee at this time that this will yield exactly the same results as
   searching a dumb index. Details about query time behaviour follow.
 === Controlling stem, case and diacritics expansion: user query interface 
 Recoll versions up to 1.17 were insensitive to case and diacritics. We only
 needed to give the user a way to control stem expansion. This was done in
 three ways:
 * Globally, by setting a menu option.
 * Globally, by setting the stemming language value to empty.
 * On a term by term basis by capitalizing the term, or, in query language
   mode only, by using an 'l' clause modifier (_"term"l_).
 After switching to an unstripped index, capable of case and diacritic
 sensitivity, we need ways to control what processing is performed among:
 * Case expansion.
 * Diacritics expansion.
 * Stem expansion.
 The default mode will be compatible with the previous version, because
 this is most generally what we want to do: ignore case and diacritics,
 expand stems.
 There are two easy approaches for controlling the parameters:
 * Global options set in the GUI menus or as *recollq* command line
   switches. 
 * Per-clause options set by modifiers in the query language.
 We would like, however, to let the user's input automatically override the
 defaults in a sensible way. For example:
 * If a term is entered with diacritics, diacritic sensitivity is turned on
   (for this term only).
 * If a term is entered with upper-case characters, case sensitivity is
   turned on. In this case, we turn off stem expansion, because it
   really makes no sense with case sensitivity.
 With this method we are stuck with 3 problems (only if the global mode is
 set to insensitive, and we're not using the query language):
 * Turning off stemming without turning on case sensitivity.
 * Searching for an all lower-case term in case-sensitive mode.
 * Searching for a term without diacritics in diacritic-sensitive mode.
 The two latter issues are relatively marginal and can be worked around easily
 by switching to query language mode or using negative clauses in the
 advanced search. 
 However, we need to be able to turn stemming off while remaining
 insensitive to case, and we need to stay reasonably compatible with the
 previous versions. This means that a term which has a capital first letter
 but is otherwise lowercase will turn stemming off, but not case sensitivity
 on. 
 So we're left with how to search for such a term in a case-sensitive way,
 and for this, you'll have to use global options or the query language.
 The modified method is:
 * If a term is entered with diacritics, diacritic sensitivity is turned on
   (for this term only).
 * If the first letter in a term is upper-case and the rest is lower-case,
   we turn stem expansion off, but we do not become case-sensitive.
 * If any letter in a term except the first is upper-case, case sensitivity
   is turned on. Stem expansion is also turned off (even if the first
   letter is lower-case), because it really makes no sense with case
   sensitivity.
 * To search for an all lower-case or capitalized term in a case-sensitive
   way, use the query language: "Capitalized"C, "lowercase"C.
 * Use the query language and the "D" modifier to turn on diacritics
   sensitivity.
 It can be noted that some combinations of choices do not make sense and
 they are not allowed by Recoll: for example, diacritics or case sensitivity
 do not make sense with stem expansion (which cannot preserve diacritics in
 any meaningful general way).
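 The automatic rules above can be summarized by a small sketch. It is an
 illustration only, not Recoll code: the +classify_term+ name is
 hypothetical, upper-case detection is limited to ASCII letters, and any
 non-ASCII byte is assumed to be a letter carrying a diacritic mark.
 ```shell
 # Print the processing implied by the way term $1 was typed.
 classify_term() {
     term=$1
     rest=${term#?}            # the term minus its first character
     first=${term%"$rest"}     # the first character
     case_sens=no; diac_sens=no; stem=yes
     # Diacritics in the input turn diacritics sensitivity on, which
     # rules out stem expansion.
     if printf '%s' "$term" | LC_ALL=C grep -q '[^ -~]'; then
         diac_sens=yes; stem=no
     fi
     case $rest in
         # Upper-case beyond the first letter: case-sensitive, no stemming.
         *[ABCDEFGHIJKLMNOPQRSTUVWXYZ]*) case_sens=yes; stem=no ;;
         *) case $first in
                # Only capitalized: no stemming, still case-insensitive.
                [ABCDEFGHIJKLMNOPQRSTUVWXYZ]) stem=no ;;
            esac ;;
     esac
     echo "case=$case_sens diacritics=$diac_sens stem=$stem"
 }
 classify_term floor
 classify_term Floor
 classify_term US
 ```
 The cases that these rules cannot express (for example a case-sensitive
 search for an all lower-case term) are the ones left to the query
 language modifiers.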
 The link:ZDevCaseAndDiacritics3.html[next page] describes the actual
 implementation in Recoll 1.18.
--- a/website/faqsandhowtos/ZDevCaseAndDiacritics3.txt
+++ b/website/faqsandhowtos/ZDevCaseAndDiacritics3.txt
@ -0,0 +1,67 @@
 == Character case and diacritic marks (3), implementation
 In previous pages, we discussed link:ZDevCaseAndDiacritics1.html[diacritics
 and stemming], and an link:ZDevCaseAndDiacritics2.html[appropriate
 interface] for switchable search sensitivity to diacritics and character
 case.
 So you are in this mood again and you don't want to type accents (maybe you're
 stuck with a QWERTY American English keyboard), or conversely you
 want to resume looking for your résumé, and you've told Recoll as much,
 using the appropriate interface. What happens then?
 The second case is easy if the index is raw, and mostly impossible if it is
 stripped. So we'll concentrate on the first case: how to achieve case and
 diacritics insensitivity on a raw index?
 Recoll uses three expansion tables:
 * The first table has stripped and lowercased terms as keys and raw terms as
  data: +mate -> (mate, maté, MATE,...)+.
 * The second table has lowercased stems as keys and original lowercase terms
  as data (when using multiple languages, there are several such tables):
  +évit -> (éviter, évite, évitâmes, ...)+.
 * The third table has stripped and lowercased stems as keys and stripped
  lowercased terms as data:
  +evit -> (eviter, evite, evitons)+ and +evitam -> (evitames, ...)+
 The first table can be used for full case and diacritics expansion or for
 only one of those, by post-filtering the results of full expansion (e.g. if
 we only want diacritics expansion, we strip diacritics from each result
 term and check that the result is identical to the input). For example
 if we have +mate -> (mate, maté, MATE, MATÉ)+ in the table and want to
 only perform case expansion for an input of +maté+, we apply case folding
 to each term of the initial output and keep only those which fold to
 +maté+ (here +maté+ and +MATÉ+), as +mate+ differs from the input.
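 A minimal sketch of this post-filtering for the +maté+ example follows.
 The function names are hypothetical, and the toy +fold_case+ only handles
 ASCII letters plus the single accented capital used here.
 ```shell
 # Toy case folding (assumption): ASCII letters and "É" only.
 fold_case() {
     printf '%s\n' "$1" | tr 'A-Z' 'a-z' | sed -e 's/É/é/g'
 }
 # Case expansion only: $1 is the input term, the other arguments are the
 # result of the full expansion. Keep the terms which fold to the input.
 case_expand_only() {
     input=$1; shift
     for cand in "$@"; do
         if [ "$(fold_case "$cand")" = "$input" ]; then
             printf '%s\n' "$cand"
         fi
     done
 }
 case_expand_only 'maté' mate 'maté' MATE 'MATÉ'
 ```
 Only +maté+ and +MATÉ+ pass the filter: +mate+ and +MATE+ fold to +mate+,
 which differs from the input.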
 We only perform stemming expansion when case and diacritics sensitivity is
 off. It is performed using the second and third tables, both on the
 lowercased and lowercased/stripped output of the first step, and each term
 in the stemming output is expanded again for case (using the first table).
 A full example of the expansion occurring during an insensitive search 
 for +resume+ using French stemming on a mixed English/French index
 follows. An important thing to remember is that the result of each
 expansion is a function of the terms actually present in the index, not
 some arbitrary computation (and so, of course, many of the possible but
 absent variations are missing).
 . The case and diacritics expansion of +resume+ yields +RESUME Resume
   Résumé resumé résume résumé resume+.
 . The stem expansion input list (lower-cased) is
   +resume resumé résume résumé+, and the output is
   +resum resume resumenes resumer resumes resumé resumée résum résumait
   résumant résume résumer résumerai résumerait résumes résumez résumé résumée
   résumées résumés+.
 . Each of the above terms is then fed to case and diacritics expansion (first
   table), for the final output:
   +resume résumé Résumé résumer résume Resume résumés RESUME resumes
   resumer résumant resúmenes resumé résumait résumes résumée resumee
   résumerait Résumez résumerai RÉSUMÉES Resumée Resumes résumées+.
 A Xapian OR query is finally constructed from the expanded term list.
--- a/website/faqsandhowtos/makeindex.sh
+++ b/website/faqsandhowtos/makeindex.sh
@ -0,0 +1,20 @@
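 #!/bin/sh
 # Regenerate WikiIndex.txt: one link per page, titled from its first line.
 WIDX=WikiIndex.txt
 echo "== Recoll Wiki file index" > "$WIDX"
 for f in *.txt; do
 if test "$f" = "$WIDX" ; then continue; fi
 h="$(basename "$f" .txt).html"
 # Title: first line minus '=' markers, carriage returns and outer spaces.
 title=$(head -n 1 "$f" | tr -d '\r' | sed -e 's/=//g' -e 's/^ *//' -e 's/ *$//')
 echo "link:${h}[${title}]" >> "$WIDX"
 echo >> "$WIDX"
 done
 exit 0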
 # Check and display what files are in the index but not in the contents table:
 grep \| FaqsAndHowTos.txt | awk -F\| '{print $1}'  | sed -e 's/\* \[\[//' -e 's/.wiki//' |sort > ctfiles.tmp
 grep '\[\[' WikiIndex.txt | awk -F\| '{print $1}'  | sed -e 's/\[\[//' -e 's/.wiki//' -e 's/.md//' | sort > ixfiles.tmp
 echo 'diff ContentFiles  IndexFiles:'
 diff ctfiles.tmp ixfiles.tmp
 rm ctfiles.tmp ixfiles.tmp