web

2017-06-05 11:57:26 +02:00 · 2017-06-05 11:57:26 +02:00 · 821fb780d2
commit 821fb780d2
parent 06b414cfc6
35 changed files with 2078 additions and 0 deletions
--- a/website/faqsandhowtos/ElinksWeb.txt
+++ b/website/faqsandhowtos/ElinksWeb.txt
@ -0,0 +1,35 @@
+== Extending the Recoll Firefox visited web page indexing mechanism to other browsers
+
+The *Recoll* _Web Queue_ function allows using WEB browser plug-ins
+originally designed for indexing visited WEB pages with *Beagle* (rip). The
+browser plug-ins work very simply by creating copies of the visited pages
+in a designated directory. Two files are created for each page, one for the
+contents, the other for the metadata. 
+
+When activated, *Recoll* will visit the queue directory and index each HTML
+page and its associated metadata. There is more detail about the mechanism
+on the link:IndexWebHistory.html[page about the Recoll Web queue], but mostly, you
+just need to go to the _Indexing Preferences_ in the *recoll* GUI, open the
+_Web history_ panel and check the top button. 
+
+Franck, a *Recoll* and *Elinks* user from New Zealand, designed a method
+and wrote a script to index the *Elinks* WEB history in this fashion.  
+
+The script works by using *wget* to fetch the visited page into the queue
+directory. This means that it would be reusable to index arbitrary WEB
+pages in contexts other than *Elinks* visits. 
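+
+As a minimal sketch of that reuse, assuming the script accepts the page URL
+as its only argument (as in the *Elinks* setup, where it receives `%c`), a
+list of URLs could be queued from a file. The `queue_urls` function, the
+`INDEXER` variable and the 'urls.txt' file are hypothetical names used for
+illustration only:
+
+----
+# Queue every URL listed in a file (one URL per line, blank lines skipped).
+# INDEXER is a hypothetical override for the command receiving each URL.
+queue_urls() {
+    while IFS= read -r url; do
+        [ -n "$url" ] || continue
+        # stdin is redirected so that the command cannot consume the list
+        "${INDEXER:-/path/to/the/script/elinks_recoll.sh}" "$url" < /dev/null
+    done < "$1"
+}
+
+# Usage: queue_urls urls.txt
+----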
+
+Recipe for *Elinks* and Recoll 1.18 and later:
+
+* Retrieve the 
+  link:https://www.recoll.org/files/elinks_recoll.sh[elinks_recoll.sh] shell
+  script and make it executable (`chmod a+x elinks_recoll.sh`).
+* In the Elinks Keyboard shortcut manager (k)/Main, add a shortcut to pass
+  the current URL to an external command, e.g. _Ctrl-P_.
+* In the Options manager (o) /Document/Uri Passing, add an action named for
+  example _ToIndex_
+* Modify the ToIndex action to execute `/path/to/the/script/elinks_recoll.sh %c`
+* Save, and you are done.
+
+For Recoll 1.17, the method is analogous, but the script is named
+link:https://www.recoll.org/files/elinks_recoll.sh[elinks_beagle.sh].
--- a/website/faqsandhowtos/FaqsAndHowTos.txt
+++ b/website/faqsandhowtos/FaqsAndHowTos.txt
@ -0,0 +1,37 @@
+== Faqs and Howtos
+
+=== Indexing
+* link:WhyIsMyFileNotIndexed.html[Why is this file not indexed ? Investigating indexing issues]
+* link:PreventIndexingDir.html[Preventing the indexing of a directory]
+* link:IndexOnAc.html[Starting/stopping the indexer depending on power/battery status]
+* link:IndexMozillaCalendari.html[Indexing Mozilla Sunbird / Lightning calendar data]
+* link:MultipleIndexes.html[Creating and using multiple indexes]
+* link:IndexWebHistory.html[Indexing Web history with the Firefox browser extension]
+* link:ElinksWeb.html[Extending the Web queue mechanism to other browsers and general WEB indexing]
+* link:IndexMailHeader.html[Indexing arbitrary mail headers]
+* link:IndexOutlook.html[Indexing Outlook archives]
+* link:HandleCustomField.html[Generating a custom field and using it to sort results]
+* link:http://www.recoll.org/recoll_XMP/index.html.html[An example of filter/field customisation, using XMP metadata with PDFs]
+* link:FilteringOutZipArchiveMembers.html[Filtering out Zip archive members]
+
+=== Searching
+* link:GUIKeyboard.html[Recoll GUI keyboard navigation]
+* link:HotRecoll.html[On the desktop: using a keyboard shortcut for starting/hiding recoll]
+* link:OpenHelperScript.html[Handling issues for starting native apps, esp. email clients - getting Thunderbird to open message files]
+* link:QpdfviewHelperScript.html[Another example open helper script - using qpdfview to open pdf and postscript files, with support for page and search options]
+* link:UsingOpenWith.html[Using the new Open With menu in recoll 1.20 with a custom app]
+* link:ReplaceCategories.html[Replacing the document category filters]
+* link:ResultsThumbnails.html[Result list thumbnails and how to create them]
+* link:MuttAndRecoll.html[Interfacing Recoll and Mutt]
+* link:QueryFromC.html[Querying from a C program]
+
+=== Administration and miscellaneous
+* link:http://www.recoll.org/pages/recoll-webui-install-wsgi.html.html[Installation of the Recoll WebUI with Apache]
+* link:FilterRetrofit.html[Installing a filter for a new document type]
+* link:UnityLens.html[Building and Installing the Ubuntu Unity Recoll Lens]
+* link:SavingConfig.wiki.html[Recoll configuration backup]
+* link:XDGBase.wiki.html[Tidying Recoll data storage]
+* link:ProblemSolvingData.html[Collecting diagnostic information]
+* link:NonAsciiFileNames.html[Unix and non-ascii file names]
+* link:FilterArch.html[Recoll filters]
--- a/website/faqsandhowtos/FilterArch.txt
+++ b/website/faqsandhowtos/FilterArch.txt
@ -0,0 +1,82 @@
+== Recoll input handlers
+
+In the end, Recoll indexes plain UTF-8 text, remembering where it came
+from.
+
+But of course, this is not what the source data looks like.
+The text content of the original documents is encoded in many formats
+(e.g. pdf, ms-word, html), and it can also be stored in quite
+involved ways (inside archives, email attachments ...).
+
+For getting to the data and converting it to plain text, Recoll uses a set
+of modules which it calls input handlers (or filters), which either operate
+on the storage structure (e.g. a zip handler), or the storage format (e.g. a
+pdf to text translator), or both. In addition, there is a tentative notion
+of a higher level storage backend which we will ignore for now (for
+reference there are currently two of those: the file system and the web
+history cache).
+
+The basic task of filters is to take a document as input and produce a
+series of subdocuments as output. The subdocument's format is defined
+either dynamically (as part of the output data), or statically, in the
+filter definition. 
+
+=== Simple filters
+
+These are executed by the *mh_exec* Recoll module. They are the vast
+majority.
+
+These filters are very simple. They are designed to perform a simple task
+with minimal interface, they mostly don't know anything about each other,
+and they don't know much about their context. This makes writing a filter
+quite easy as there is not much to learn about their environment.
+
+Only one output document is produced and the format is fixed. 
+
+In practice the filter, which is usually a shell script (but could
+be any executable program), takes a file name on the command line and
+outputs an html or plain text document on standard output, then exits.
+
+For example, the pdf filter takes one pdf file name as input on the command
+line and produces one html document on stdout. The fact that the output is
+html is statically defined in a configuration file. 
+
+For filters which produce plain text, the output character set information
+is in general defined in the configuration file. Else it will be obtained
+from the locale (hoping that it makes sense).
+
+Filters that output html can produce metadata information in the html
+header (e.g. author). Filters that output plain text can only output
+main text data, no metadata fields. 
+
+Besides the file name, there is one other piece of input information, which
+is in the form of an environment variable, and can be safely ignored:
++RECOLL_FILTER_FORPREVIEW+. This indicates whether the filter is being used
+for previewing or for indexing data. Some filters will elect to suppress
+repetitive parts of the output text when indexing to avoid distorting the
+term statistics. For example, the man filter suppresses the section
+headers (NAME, SYNOPSIS...) when indexing.
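+
+A minimal sketch of such a simple filter follows. It is a hypothetical
+handler, not one shipped with Recoll: it wraps a UTF-8 plain text file in
+HTML, and drops lines starting with `#` when indexing. The function names,
+the choice of input format, and the assumption that the variable is set to
+`yes` for previews are illustrative and should be checked against an
+existing handler:
+
+----
+#!/bin/sh
+# Hypothetical simple input handler: file name in, HTML on standard output.
+
+# Escape the characters which are special in HTML.
+html_escape() {
+    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
+}
+
+# Print the HTML document for the file named by $1 (assumed to be UTF-8).
+emit_html() {
+    printf '<html><head>\n'
+    printf '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\n'
+    printf '</head><body><pre>\n'
+    if [ "${RECOLL_FILTER_FORPREVIEW:-no}" = "yes" ]; then
+        # Previewing (assumed value "yes"): show the whole text
+        html_escape < "$1"
+    else
+        # Indexing: skip comment lines to keep them out of the index
+        grep -v '^#' "$1" | html_escape
+    fi
+    printf '</pre></body></html>\n'
+}
+
+# Run only when a file name was given on the command line.
+if [ $# -ge 1 ]; then
+    emit_html "$1"
+fi
+----
+
+Such a script would then be declared for its MIME type with an `exec` line
+in the 'mimeconf' configuration file.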
+
+=== Multiple input filters
+
+These filters are more complex, but still quite easy to write, especially
+if you can use Python, because they can then use a common module which
+manages the communication with the indexer.
+
+Newer Recoll versions have converted many previously 'simple' filters to
+this kind as part of the port to Windows.
+
+These filters are executed by the *mh_execm* Recoll module.
+
+They are persistent (one instance will persist through a whole indexing
+pass), and will process multiple input files in succession (the point being
+to avoid the startup performance penalty), and possibly multiple documents
+per input file if this makes sense for their input format (e.g. zip archive,
+chm help file). 
+
+They use a simple communication protocol over a pipe with the main recoll
+or recollindex process, with file names and a few other parameters being
+sent as input, and decoded data and attributes being sent in return.
+
+The shared Python module is 'filters/rclexecm.py'. You can look at 'rclzip'
+or 'rclaudio' for reasonably straightforward examples.
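+
+For comparison with simple filters, a persistent handler is declared in
+'mimeconf' with the `execm` keyword instead of `exec`. The entries below
+are an illustrative sketch: the exact handler names and any extra
+attributes depend on the Recoll version, so check the 'mimeconf' file
+shipped with your release:
+
+----
+[index]
+# Simple filter: one process per file, run through mh_exec
+application/x-gnote = exec rclxml
+# Persistent filter: one process per indexing pass, run through mh_execm
+application/zip = execm rclzip
+----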
--- a/website/faqsandhowtos/FilterRetrofit.txt
+++ b/website/faqsandhowtos/FilterRetrofit.txt
@ -0,0 +1,62 @@
+== Installing a filter for a new document type
+
+It will sometimes happen that a newer Recoll release has support for a
+document type which would be useful to you, but which your older release
+does not support.
+
+It is in general easy to import support from the newer to the older
+release: the Recoll input handler interface is very stable, so things should just
+work.
+
+Input handler updates are generally described on the Recoll web site
+link:https://www.recoll.org/filters/filters.html[new filters pages]. They
+may include notes about which versions need the new input handler, or specifics
+about installing it.
+
+An up to date copy of input handlers and configuration files is also kept
+link:https://www.recoll.org/filters/[at the same location].
+
+We will take an example to make things more concrete: Tomboy and Gnote
+files are directly supported by Recoll 1.19, but not in older Recoll
+releases. The *rclxml* handler is needed to process them.
+
+The following procedure will allow you to retrofit support:
+
+- Retrieve the *rclxml* input handler from:
+  link:https://www.lesbonscomptes.com/recoll/filters/rclxml[]
+
+- Copy it to '/usr/share/recoll/filters' and make it executable:
+  `chmod +x rclxml`.
+  The input handler needs *xsltproc*, but this is probably already on your
+  system (otherwise install it with the package manager).
+
+- Edit '~/.recoll/mimemap', add the following line:
+ `.note = application/x-gnote`
+- Edit '~/.recoll/mimeconf', add the following lines:
+
+----
+[index]
+application/x-gnote = exec rclxml
+----
+- Edit '~/.recoll/mimeview', add the following lines:
+
+----
+[view]
+application/x-gnote = tomboy %f
+----
+
+- The easiest way to make sure the files are indexed with the new input
+  handlers may then be to just run a full indexing pass (`recollindex -z`). 
+
+Notes:
+ 
+- The MIME type which is used is not crucial. You could use, e.g.,
+  +application/x-tomboy+ instead: it just has to be consistent. To
+  avoid future trouble, it's better to use the type used by newer Recoll
+  releases though.
+- The 'mimeview' entry is necessary even if you are using the desktop
+  preferences to open files. The value will not be used, but it has to be
+  there.
+
+
+
--- a/website/faqsandhowtos/FilteringOutZipArchiveMembers.txt
+++ b/website/faqsandhowtos/FilteringOutZipArchiveMembers.txt
@ -0,0 +1,34 @@
+== Filtering out Zip archive members ==
+
+The *rclzip* Zip archive extraction input handler does not use the general
+configuration variables which define what file system objects should be
+skipped, but it has an equivalent internal function. 
+
+The name-skipping code depends on a recent addition to the Recoll Python
+package. This will become standard for release 1.20, but for earlier
+releases, you need to do two things to use this function: 
+
+- Fetch 'python/recoll/recoll/rclconfig.py' and 'filters/rclzip' from the
+  source repository. 
+- Copy both to '/usr/share/recoll/filters' and make 'rclzip' executable.
+
+You can then set a variable named +zipSkippedNames+ inside
+'recoll.conf'. +zipSkippedNames+ should be a space-separated list of
+patterns which will be passed to the Python fnmatch() function. The +/+
+character is not special in patterns: it is matched like any other
+character. 
+
+You can't use embedded spaces in patterns (there is no double-quote quoting
+for now).
+
+This can be redefined for file system directories using the usual section
+indicators (Zip archives in different file-system directories can have
+different skip lists). 
+
+Example:
+
+----
+zipSkippedNames = *.txt
+[/path/to/the/dir]
+zipSkippedNames = somedir/*/*.html
+----
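+
+Shell `case` patterns also treat +/+ as an ordinary character, so they can
+be used to get a feel for what a pattern will skip before reindexing. This
+is only an approximation of the Python matching used by the handler, and
+the `name_matches` helper is a hypothetical name:
+
+----
+# Succeed if the archive member name $2 matches the pattern $1.
+name_matches() {
+    case "$2" in
+        $1) return 0 ;;
+        *)  return 1 ;;
+    esac
+}
+
+# The "*" wildcards can span "/" characters:
+name_matches 'somedir/*/*.html' 'somedir/a/b/index.html' && echo skipped
+# No match: the name does not start with "somedir/"
+name_matches 'somedir/*/*.html' 'other/a/index.html' || echo kept
+----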
+
+
--- a/website/faqsandhowtos/GUIKeyboard.txt
+++ b/website/faqsandhowtos/GUIKeyboard.txt
@ -0,0 +1,60 @@
+== Recoll GUI keyboard navigation
+
+Using Recoll without the mouse is not completely straightforward, but it is
+mostly feasible. Here follows a description of the usable shortcuts. 
+
+=== Anywhere
+
+`Ctrl+q` should exit Recoll from anywhere.
+
+=== Main window and result list ===
+
+When Recoll starts up, the focus is in the simple search entry. The main
+window tab order is as follows: 
+
+* Clear
+* Search
+* Search type combo
+* Search entry  (Initial focus)
+* Result list (scrolling etc)
+* Result list 1st link
+* Result list next links...
+* Back to Clear
+
+Each result list entry has 3 links: the icon link is not active, but its
+value is the URL, so that it can be dragged and dropped to another
+application. The 2 other links are _Preview_ and _Open_ and can be
+activated by typing _Enter_. 
+
+Typing _Ctrl+Shift+s_ anywhere in the main window should return the focus to the search entry. So will _Ctrl+l_ in future versions (for compatibility with WEB browser usage).
+
+For pure keyboard usage, you can improve this by:
+
+- Disabling the icon link: use _Preferences->GUI configuration->Result
+  List->Edit result paragraph_ and remove the `<a href='%U'>` and `</a>`
+  around the `<img...>` tag. 
+- Making the active link more visible by adding the following code to the
+  result page HTML header insert (same preferences tab). Feel free to
+  adjust the color :=) : 
+
+----
+<style type="text/css">
+a:focus {background-color: red;}
+</style>
+----
+
+=== Result table
+
+The same _Ctrl+Shift+s_ will return the focus to the search entry when
+working with the result table. 
+
+_Ctrl+r_ will move the focus from the search entry to the result table.
+Once there, the arrow keys will navigate the lines.  
+
+When a line is selected:
+
+* _Ctrl+o_ will _Open_ the document.
+* _Ctrl+Shift+o_ will _Open_ the document and exit Recoll.
+* _Ctrl+d_ (detail) will start a _Preview_.
+
+_Esc_ will deselect the current line so that mouse hovering will work again.
--- a/website/faqsandhowtos/HandleCustomField.txt
+++ b/website/faqsandhowtos/HandleCustomField.txt
@ -0,0 +1,69 @@
+== Generating a custom field and using it to sort results
+
+We are going to show how to generate a custom field from a Recoll filter,
+and use it for sorting results. The example chosen comes from an actual
+user request: sorting results on pdf page counts. 
+
+The details here are obsolete, as the +pdf+ input handler is now a quite
+different python program, but the general idea is still relevant.
+
+The page count from a pdf file can be displayed by the pdfinfo command
+(xpdf or poppler tools). 
+
+We first modify a copy of the rclpdf filter
+('/usr/[local/]share/recoll/filters/rclpdf'), to compute the pdf page count,
+and output the value as an html meta field. This is a not very interesting
+bit of shell/awk magic. Another approach would be to just rewrite the
+rclpdf filter in your favorite scripting language (e.g. perl, python...), as
+all it does is execute pdftotext and pdfinfo and output html, nothing
+complicated. Here follows the rclpdf modification as a pseudo patch: 
+
+----
+# compute the page count and format it so that it's alphabetically sortable
+set `pdfinfo "$infile" | egrep ^Pages:`
+pages=`printf "%04d" $2`
+[skip...]
+# Pass the page count value to awk
+-awk 'BEGIN'\
+awk -v Pages="$pages" 'BEGIN'\
+[skip...]
+# Inside the awk program startup section: compute the "meta" field line
+  pagemeta = "<meta name=\"pdfpages\" content=\"" Pages "\">\n"
+[skip...]
+# Then print it as part of the header:
+    $0 =  part1 charsetmeta pagemeta part2
+[skip...]
+----
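+
+The page count step can be tried on its own before touching the filter.
+This is a small sketch, assuming that *pdfinfo* prints a `Pages:` line as
+used by the patch above. The `format_pages` and `pages_meta` function names
+are hypothetical:
+
+----
+# Read pdfinfo output on stdin, print the page count padded to 4 digits
+# so that it sorts correctly as a string.
+format_pages() {
+    awk '/^Pages:/ { printf "%04d\n", $2 }'
+}
+
+# Print the HTML meta line carrying the value given as $1.
+pages_meta() {
+    printf '<meta name="pdfpages" content="%s">\n' "$1"
+}
+
+# Usage: pages_meta "$(pdfinfo "$infile" | format_pages)"
+----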
+
+You can execute your own version of rclpdf by modifying '~/.recoll/mimeconf':
+
+----
+[index]
+application/pdf = exec /path/to/my/own/rclpdf
+----
+
+At this point, recollindex would receive and extract a +pdfpages+ field,
+but it would not know what to do with it. We are going to tell it to store
+the value inside the document data record so that it can be displayed in
+the results, and sorted on. For this we modify the '~/.recoll/fields' file: 
+
+----
+[stored]
+pdfpages=
+----
+
+That's it! After reindexing, you can now display +pdfpages+ inside the
+result list (add a +%(pdfpages)+ value to the paragraph format), display
++pdfpages+ inside the result table (right-click the table header), and sort
+the results on page count (click the column header). 
+
+Note that +pdfpages+ has not been defined as searchable (this would not make
+much sense). To make it searchable, you would have to define a prefix and
+add it to the [prefixes] section of the fields file: 
+
+----
+[prefixes]
+pdfpages = XYPDFP
+----
+
+Have a look at the comments inside the 'fields' file for more information.
--- a/website/faqsandhowtos/Home.txt
+++ b/website/faqsandhowtos/Home.txt
@ -0,0 +1,13 @@
+== Welcome to the Recoll Faqs and Recipes
+
+link:FaqsAndHowTos.html[FAQs and Howtos] are stored here, but 
+the main source for Recoll user documentation is 
+link:https://www.recoll.org/doc.html[the _Recoll user manual_] on the
+link:https://www.recoll.org/[Recoll Web site] where you will also find a
+lot of other Recoll information, source code tarballs and contact
+information.
+
+If you want to make your problem report as useful as possible, you may want
+to take a look at link:ProblemSolvingData.html[this page]. 
+
+link:WikiIndex.html[Full file index]
--- a/website/faqsandhowtos/HotRecoll.txt
+++ b/website/faqsandhowtos/HotRecoll.txt
@ -0,0 +1,79 @@
+== Recoll hotkey: starting / hiding recoll with a keyboard shortcut
+
+Type a key (e.g. F12) and have recoll appear or disappear. On the first
+occurrence, recoll is started if it's not already running. Further
+occurrences toggle recoll between visible and minimized states. Never
+thought this would be useful until someone asked for it. Can't do without
+it anymore :) 
+
+This works well with both Gnome and KDE, but is implemented using a gnome
+library (*libwnck*) and its python interface, which you may have to install
+on your system if you are a pure KDE user. The library most probably exists
+in the package repositories for your distribution, so this should not be
+too complicated. 
+
+This should also work with other window managers, because it is based on a
+standard window manager interface extension (EWMH) that most modern window
+managers implement. 
+
+=== Installing the script (all desktops):
+
+- You will need the libwnck library and its python interface. These are
+  usually part of a gnome installation, otherwise check and possibly
+  install them. For OpenSuse, the library should already be there but you
+  need to install gnome-python-desktop. 
+- Download the
+  link:https://www.recoll.org/files/hotrecoll.py[hotrecoll.py script].
+  If you have a recent recoll installation (1.14.3 or later), it's already
+  in the recoll filters directory ('/usr/[local/]share/recoll/filters').
+- Copy the script to some permanent place (e.g. '~/bin') and make it
+  executable (you can leave it in the filters directory if it's there). In a
+  shell window: `chmod +x hotrecoll.py`.
+- You can check that the script works (or not) by executing it on the
+  command line. It does not need an argument. Recoll should appear or
+  disappear every time you execute the script. A few warning messages may
+  be considered normal. If the script says that it does not find the wnck
+  library or some other module, you'll have to install them. 
+
+=== Installing the keyboard shortcut (Gnome):
+
+- _System->Preferences->Keyboard shortcuts_, or execute
+  *gnome-keybinding-properties* 
+- Click _Add_ and fill in the fields, e.g. Name: StartRecoll, Action:
+  /path/to/hotrecoll.py
+- This will add the shortcut to the "Custom shortcuts" section. You can
+  then click in the "Shortcut" column for "StartRecoll", and type any key
+  combination (e.g. press F12) to assign a key shortcut. 
+
+=== Installing the keyboard shortcut (KDE):
+
+Under KDE, a global custom keyboard shortcut like the one we need is, most
+helpfully, not set up under "Keyboard Shortcuts" but under "Input Actions". 
+
+- _Kmenu -> Configure Desktop -> Input Actions -> Edit -> New -> Global
+  Shortcut -> Command/Url_ 
+- A new Action appears, named _New Action_. You can rename it something
+  like +hotrecoll+ for clarity. 
+- Click the _Trigger_ tab, click the input area and press your preferred
+  key combination (e.g. F12).
+- Click the _Action_ tab, and enter +hotrecoll.py+ (if it's in your PATH),
+  or else the full path to the command (e.g.:
+  '/usr/share/recoll/filters/hotrecoll.py').
+- Click _Apply_.
+
+=== Installing the keyboard shortcut (XFCE):
+
+Open the settings manager, and add the shortcut in the 
+_Application Shortcuts_ panel inside the _Keyboard_ tool.
+
+
+=== Other environments
+
+Many window managers have a way to set up a keyboard shortcut for running
+an arbitrary command. You'll need to look at the documentation for yours,
+or search the web for a solution.  
+
+An alternative independent of the environment would be to use the XBindKeys
+utility. See this link:http://www.linux.com/archive/feed/59494[linux.com
+article] for helpful instructions. 
+
--- a/website/faqsandhowtos/IndexMailHeader.txt
+++ b/website/faqsandhowtos/IndexMailHeader.txt
@ -0,0 +1,33 @@
+== Indexing arbitrary mail headers
+
+By default the Recoll mail handler only processes a subset of email headers
+(+From+, +To+, +Cc+, +Date+, +Subject+). It is possible to index additional
+headers by specifying them inside the 'fields' configuration file, inside
+the configuration directory (typically '~/.recoll/').
+
+Lengthy explanations are not really needed here, and I'll just show an
+example (duplicated from the configuration section of the manual):
+
+----
+[prefixes]
+# Index mailmytag contents (with the given prefix)
+mailmytag = XMTAG
+
+[stored]
+# Store mailmytag inside the document data record (so that it can be
+# displayed - as %(mailmytag) - in result lists).
+mailmytag = 
+
+[mail]
+# Extract the X-My-Tag mail header, and use it internally with the
+# mailmytag field name
+x-my-tag = mailmytag
+
+----
+
+Limitations:
+
+- The mail filter will only process the first instance of a header
+  occurring several times.
+- No decoding will take place (this affects non-ASCII headers, which would
+  use some kind of encoding). 
--- a/website/faqsandhowtos/IndexMozillaCalendari.txt
+++ b/website/faqsandhowtos/IndexMozillaCalendari.txt
@ -0,0 +1,32 @@
+== Indexing Mozilla calendar data
+
+Mozilla calendar programs (*Sunbird*, *Lightning*) do not store their
+data in +ics+ files natively. They use an *SQLite* database (the
+'storage.sdb' file inside the profile). This means that calendar data
+cannot be indexed directly.  
+
+To get Recoll to index calendar data, you need to export it to an +ics+
+file. This can be done manually, from the application menus, or by
+installing the
+link:https://addons.mozilla.org/en-US/sunbird/addon/3740[Automatic Export
+extension]. 
+
+The extension can be configured to export the data when exiting the
+program, or at regular time intervals.  You can even set up a command to be
+executed after the export. If you are not using real time indexing, this
+can usefully be *recollindex*.
+
+In _Tools->Add Ons->Automatic Export preferences_, in the _Start an
+application after export_ subpanel, set _Path of application_ to
+'/usr/[local/]bin/recollindex' and _Parameters of application_ to
+something like _-i;/home/me/path/to/nameofexportedcal.ics_ 
+
+This will ensure that the calendar is indexed every time it is exported
+(this is not necessary though, you can let the next batch indexing pass
+take care of it). 
+
+It may happen that the exported data has some syntax errors which will
+prevent indexing with the *rclics* filter which was distributed up to
+and including Recoll 1.13.04. You may get an updated filter from the
+link:https://www.recoll.org/download.html[Recoll download page].
+
--- a/website/faqsandhowtos/IndexOnAc.txt
+++ b/website/faqsandhowtos/IndexOnAc.txt
@ -0,0 +1,24 @@
+== Laptops: starting or stopping indexing according to AC power status
+
+For people using real time indexing on a laptop, kind user "The Doctor"
+contributed a script to automatically start and stop indexing according to
+power status. The script can be found here:
+link:https://bitbucket.org/medoc/recoll/src/tip/src/desktop/recoll_index_on_ac.sh[recoll_index_on_ac.sh]
+
+To use it, you need to copy it somewhere (e.g.: '/usr/bin', but any place
+will do), make it executable (`chmod a+x recoll_index_on_ac.sh`), and edit
+'~/.config/autostart/recollindex.desktop'
+
+Change the following line:
+
+    Exec=recollindex -w 60 -m
+
+to something like the following (depending where you copied the script):
+
+    Exec=/usr/bin/recoll_index_on_ac.sh
+
+You may also want to change
+'/usr/share/recoll/examples/recollindex.desktop', otherwise your change
+will be reverted the next time you toggle real time indexing through the
+GUI. And, yes, sorry about it, _this_ change will be lost on the next
+Recoll update, so save a copy.
--- a/website/faqsandhowtos/IndexOutlook.txt
+++ b/website/faqsandhowtos/IndexOutlook.txt
@ -0,0 +1,11 @@
+== Indexing Outlook archives ==
+
+Recoll has no direct support for indexing Microsoft Outlook data, because,
+if you are a Windows user, you probably are not a good customer for Linux
+desktop indexing...
+
+However, if you have a need to index Outlook data at some point, I can
+recommend the excellent link:http://www.five-ten-sg.com/libpst/[libpst]
+library and its link:http://www.five-ten-sg.com/libpst/rn01re01.html[readpst]
+utility. Using this you can very easily convert the Outlook data into MH or
+mbox format, and then index the result with Recoll.
--- a/website/faqsandhowtos/IndexWebHistory.txt
+++ b/website/faqsandhowtos/IndexWebHistory.txt
@ -0,0 +1,29 @@
+== Indexing Web history with the Firefox extension ==
+
+Note: this document is valid for Recoll versions 1.18 and later.
+
+The link:http://sourceforge.net/projects/recollfirefox/[Recoll Firefox
+extension] 
+works together with Recoll to index the Web pages that you visit. The
+extension is based on an older one which was initially written for the
+Beagle indexer.
+
+The extension works by copying the data for the visited pages to a queue
+directory ('~/.recollweb/ToIndex' by default), from which they are
+indexed and removed by Recoll, and then stored in a local cache.
+
+The extension is now hosted on the Mozilla add-ons site, so you can install
+it very simply in Firefox: link:https://addons.mozilla.org/fr/firefox/addon/recoll-indexer-1/[Recoll Firefox add-on page].
+
+This feature can be enabled in the Recoll GUI index configuration panel
+(Web history section), or by editing the configuration file (set
++processwebqueue+ to 1).
+
+Please remember that Recoll only stores a limited amount of cached web data
+(adjustable from the GUI Index Configuration section), and that old pages
+will be purged from the index. Pages that you want to archive permanently
+need to be saved elsewhere, as they will otherwise eventually disappear
+from the Recoll results.
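+
+In 'recoll.conf', the two settings discussed above could look as follows.
+This is a sketch: `processwebqueue` is the variable named above, while the
+`webcachemaxmbs` name and the value shown for it are assumptions which
+should be checked against the manual for your Recoll version:
+
+----
+# Enable processing of the Web queue
+processwebqueue = 1
+# Maximum size of the Web cache in megabytes (assumed name, example value)
+webcachemaxmbs = 100
+----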
+
+Recoll will index +.maff+ files, which may be a better choice for archival
+usage. 
--- a/website/faqsandhowtos/Makefile
+++ b/website/faqsandhowtos/Makefile
@ -0,0 +1,9 @@
+.SUFFIXES: .txt .html
+
+.txt.html:
+	asciidoc $<
+
+all: $(addsuffix .html,$(basename $(wildcard *.txt)))
+
+clean:
+	rm *.html
--- a/website/faqsandhowtos/MultipleIndexes.txt
+++ b/website/faqsandhowtos/MultipleIndexes.txt
@ -0,0 +1,96 @@
+== Creating and using multiple indexes
+
+=== Why would you want to do this ?
+
+- Easy adjustment of search areas: you can filter results by using the
+  directory filter in the advanced search panel, but, if you have
+  separate well defined places where you store different kinds of data,
+  it is easier to maintain separate indexes and use the External indexes
+  dialog to switch them on or off, and it will also yield much better
+  search performance. 
+- Shared indexes: it may be useful to maintain one or several indexes
+  for shared data, and separate personal indexes for each user. Indexes
+  can be shared over the network.
+- Creating separate indexes for removable volumes.
+
+=== How to do it
+
+As an example we'll suppose that you have Recoll installed and indexing
+your home directory, and that you would like to have a separate index for
+/usr/share/doc. 
+
+You need to create a separate configuration for the new index, then add it
+to the external indexes list in the user interface, and activate it as
+needed. 
+
+. Create a directory for the new index, and create an empty configuration
+  file
+
+----
+cd
+mkdir .recoll-sharedoc
+touch .recoll-sharedoc/recoll.conf
+----
+. Either edit the new configuration by hand or start recoll to use the GUI
+   configuration editor.
+
+----
+cd .recoll-sharedoc
+echo "topdirs = /usr/share/doc" > recoll.conf
+# OR
+recoll -c ~/.recoll-sharedoc
+----
+
+If using the GUI, click _Cancel_ when asked, to start the configuration
+editor.
+
+. Perform initial indexing. If you chose the GUI route, indexing will
+  start as soon as you leave the configuration editor. Else, on the
+  command line: 
+
+----
+recollindex -c ~/.recoll-sharedoc
+----
+. Optionally set up *cron* to perform nightly indexing, use +crontab -e+
+  and insert a line like the following:
+
+----
+45 20 * * * recollindex -c ~/.recoll-sharedoc
+----
+
+This would start the indexing at 20:45. `crontab -e` will use the *vi*
+editor by default. You can change this by using the EDITOR
+environment variable. Example: `EDITOR=kate crontab -e`.
+Your favorite desktop may also have a dedicated tool to add crontab entries.
+
+. Start recoll and choose the _Preferences->External index dialog_ menu
+  entry, then click the Browse button (near the bottom), and select the
+  new index Xapian database directory '~/.recoll-sharedoc/xapiandb'.
+  Then click _Add index_.
+
+. You can then activate or deactivate the new index by clicking the box
+  in front of the directory name in the list. 
+
+When adding an index shared by multiple users, it may be helpful to use the
+RECOLL_EXTRA_DBS environment variable instead of editing individual
+configurations, see the manual for more details.
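+
+A possible setup, assuming that the variable holds a colon-separated list
+of Xapian database directories (check the manual for the exact syntax and
+for how the listed indexes get activated), is to set it in a shell profile
+shared by the users. The paths below are placeholders:
+
+----
+# Placeholder paths: replace with the real shared index directories
+RECOLL_EXTRA_DBS="/path/to/shared1/xapiandb:/path/to/shared2/xapiandb"
+export RECOLL_EXTRA_DBS
+----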
+
+=== Path adjustments
+
+When sharing indexes over a network, in most cases, the indexed data will
+be accessible through different paths on the different hosts. This will
+prevent the Preview and Open functions from working because the paths they get
+from the index do not match the ones which are usable from the local
+host.
+
+For example my home directory is accessed as '/home/me' on my home
+machine, and as '/net/myhost/home/me' on other hosts. By default, trying
+to access a result from a remote host would use the first path, when the
+second is the one that would work.
+
+As of release 1.19 *Recoll* has a facility to perform index-dependent
+path translations. This facility is accessible from the _external index
+dialog_ in the GUI preferences. Path translations can be set for the main
+index if no index is selected (rarely useful), or for the selected
+additional index.
+
--- a/website/faqsandhowtos/MuttAndRecoll.txt
+++ b/website/faqsandhowtos/MuttAndRecoll.txt
@ -0,0 +1,77 @@
+== Interfacing Recoll and Mutt
+
+It is possible to either use Mutt as a Recoll search result viewer, or
+start Recoll from the Mutt search.
+
+=== Starting Mutt to view Recoll search results
+
+This method and the associated 
+link:http://www.recoll.org/files/recoll2mutt[recoll2mutt script] were kindly
+contributed by Morten Langlo.
+
+This allows finding mail messages in recoll and then calling *mutt*
+or *mutt-kz* to read or process the mail. 
+
+Installation:
+
+- Copy the link:http://www.recoll.org/files/recoll2mutt[recoll2mutt script]
+  somewhere in your PATH, and make it executable.
+- In the **recoll** GUI menus: 
+_Preferences->GUI configuration->User interface->Choose editor applications_
+change the entry for "message/rfc822" to: +recoll2mutt %f+
+
+The script has options for setting a number of parameters. You may not need
+to set any of them. The defaults are:
+
+- -c mutt
+- -F .muttrc
+- -m Mail
+- -x "-fn 10*20 -geometry 115x40"
+
+Example:
+
+----
+recoll2mutt -c mutt-kz -F .mutt_kzrc -m Mail -x "-fn 10*20 -geometry 115x40"  %f
+----
+
+The option +-x+ is passed to *xterm*, which is used to call *mutt* or
+*mutt-kz*.
+
+The script works for both _mbox_ and _maildir_ mail boxes, and it
+expects the configuration file for mutt and the mail directory to reside in
+your $HOME and the spool file to be '/var/spool/mail/$USER' if it is
+not in your mail directory. But it is easy to change the values in the
+script if you need to.
+
+*mutt* is opened with the right mailbox and the limit set to _Date_ and
+_Sender_.  In theory you could set the limit to _Message-Id_, but *mutt*
+very often reports invalid patterns in _Message-Id_, so the script plays it
+safe, even though this shows all emails in the opened mailbox which have
+the same date and sender.
+
+
+=== Starting Recoll from the Mutt search
+
+This will work only when using maildir storage (messages in individual
+files). It will not work with mailbox files. The latter would probably be
+possible by extracting the individual result messages using the Python
+interface, but I did not try.
+
+The classic way to interface Mutt and a search application is to create a
+shortcut to an external command which creates a temporary Maildir
+containing the search results.
+
+There is such a script for Recoll: you will find it link:https://bitbucket.org/medoc/recoll/raw/41d41799dbac4c69a34db985b3ab9f1597c9c742/src/python/samples/mutt-recoll.py[here].
+
+Copy the script somewhere in your PATH, and make it executable, then add
+the following line to your '.muttrc':
+
+
+----
+
+macro index S "<enter-command>unset wait_key<enter><shell-escape>mutt-recoll.py -G<enter><change-folder-readonly>~/.cache/mutt_results<enter>" \
+          "search mail (using recoll)"
+
+----
+
+Obviously, you can replace the 'S' letter with whatever will suit you (e.g. '/').
--- a/website/faqsandhowtos/NonAsciiFileNames.txt
+++ b/website/faqsandhowtos/NonAsciiFileNames.txt
@ -0,0 +1,85 @@
+== Unix and non-ASCII file names, a summary of issues
+
+Unix/Linux file and directory names are binary byte C strings. Only the
+null byte and the slash character (/) are forbidden inside a name,
+nowhere does the kernel interpret the strings as meaningful or
+printable.  
+
+In the old times, all utilities that would display to the user were
+ASCII-based, and people would use pure printable ASCII file names (even
+using space characters inside names was a cause for trouble). Non
+alphanumeric characters were exclusively used for playing tricks on
+colleagues. And all was well. 
+
+Then the devil came under the guise of accented 8 bit characters. The
+system has no problem with them, file names are still binary C strings, but
+the utilities have to display them or take them as input, and, because
+there is no encoding specification stored with the file names, they can
+only do this according to the character encoding taken from the user's
+current locale.
+
+For example fr_FR.UTF-8, and fr_FR.ISO8859-1 could be used simultaneously
+on the same system (by different users), but they are completely
+incompatible: ISO-8859-1 strings are illegal when viewed in an UTF-8 locale
+(will display as question marks or some other conventional error
+marker). UTF-8 strings will display as gibberish in an ISO-8859-1 locale.
+
+This means that the file names created by an UTF-8 user are displayed as
+garbage to the ISO-8859 one...
+
+If you ever change your locale, your old files are still there and named
+the same (in the binary sense), but the names display badly and you have
+great trouble inputting them. If you add distributed (NFS) file system
+issues, things become totally unmanageable. Also think about archives sent
+from another system with a different encoding.
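+
+To check whether a given name is valid UTF-8, you can feed it to *iconv*.
+This is a small sketch, not something Recoll does: `is_utf8` is a made-up
+helper name, and it assumes an *iconv* command which fails on invalid
+input (as the glibc one does):
+

```shell
# Succeed if the string given as argument is valid UTF-8.
is_utf8() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# "café" with the last character encoded as UTF-8 (2 bytes), then as
# ISO-8859-1 (1 byte). Same displayed name, different binary file names.
utf8_name=$(printf 'caf\303\251')
latin1_name=$(printf 'caf\351')

is_utf8 "$utf8_name" && echo "first name: valid UTF-8"
is_utf8 "$latin1_name" || echo "second name: not valid UTF-8"
```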
+
+As far as Recoll is concerned:
+
+- The file names inside recoll.conf are not transcoded, they are taken as
+  binary strings (mostly, only +\n+ and +space+ are a bit special), and
+  passed as is to the system. So if you edit 'recoll.conf' with a text
+  editor, inside the same locale that is or has been used for file names,
+  you'll be fine.
+- There was a bug in the GUI configuration tool, up to 1.12: it should
+  have transcoded between the internal Qt format and locale-dependent
+  strings, but it did not, or did it badly.
+- There is also an exception for the +unac_except_trans+ variable, this
+  *has* to be UTF-8, so if the rest of the file uses another encoding,
+  you'll need to edit two separate files and concatenate them.
+
+As of version 1.13, Recoll uses local8Bit()/fromLocal8Bit() to convert
+recoll.conf file names from/to QStrings (it uses UTF-8 for all string
+values which are not file names).
+
+The Qt file dialog is broken (at least was, I have not checked this on
+recent versions). It should consider file paths as almost-binary data, not
+QStrings, but doesn't. In consequence, things are even more broken than
+necessary as seen from there:
+
+With LANG="C", non-ASCII paths can't be used at all:
+
+- Strings read from recoll.conf are stripped of 8bit characters before display.
+- Directory entries with 8bit characters are not displayed at all in the
+  selection dialog.
+
+With LANG="fr_FR.UTF-8", only UTF-8 paths can be used:
+
+- Strings read from recoll.conf are damaged when converted to QString
+  (except those that were actually UTF-8) 
+- Only the UTF-8 directory entries are displayed in the selection dialog.
+
+
+With LANG="fr_FR.iso8859-1", everything works ok.
+
+- Strings read from recoll.conf are displayed with weird characters if
+  they use another encoding such as UTF-8, but are correctly maintained
+  and can be read back from the dialogs and rewritten without damage. 
+- Directory entries with 8 bit characters are displayed weirdly (normal),
+  but can be manipulated without trouble (this includes utf-8 names of
+  course). 
+
+In conclusion, only the iso-8859 locales can be used for handling mixed
+encoding situations. This is a possible workaround for people who need it. 
+
+More data about path encoding issues:
+http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html 
--- a/website/faqsandhowtos/OpenHelperScript.txt
+++ b/website/faqsandhowtos/OpenHelperScript.txt
@ -0,0 +1,71 @@
+== Starting native applications
+
+It is sometimes difficult to start a native application on a result
+document, especially when the result comes from a container file (e.g. an
+email folder file or a chm file).
+
+The problem is that native applications usually expect at most a file name
+on the command line, and sometimes not even that (emailers). 
+
+The _Open parent documents_ link in the result list right click menu is
+sometimes useful in this situation (e.g.: +chm+ files). 
+
+In some other cases it may help that Recoll does make a lot of data
+available to the application. This data may have to be pre-processed in a
+script before calling the actual application. 
+
+Details about configuring how the native application or script are called
+are given with the 
+link:http://www.recoll.org/usermanual/usermanual.html#RCL.INSTALL.CONFIG.MIMEVIEW[description of the mimeview configuration file]
+
+Information about
+link:http://www.recoll.org/usermanual/usermanual.html#RCL.INSTALL.CONFIG.FIELDS[configuring
+customised fields] may also be useful in combination. 
+
+=== Example
+
+This is a simple example, because it does not need to use special
+fields. It just shows how to solve a simple issue by using an intermediary
+script. The problem is due to the fact that thunderbird's +-file+ option
+won't open a file if the extension is not '.eml'. Jorge, the kind Recoll
+user who supplied the example stores his email in Maildir++ format, the
+file names have no extension, so an intermediary script is necessary to get
+thunderbird to open them: 
+
+Note that this only works with messages stored in Maildir or MH format (one
+message per file). As far as I know, there is no way to get Thunderbird to
+open an arbitrary mbox file. 
+
+The 'recoll-thunderbird-open-file' script:
+
+----
+#!/bin/sh
+# Thunderbird only opens files with an '.eml' extension: work on a copy.
+cp "$1" "/tmp/$$.eml"
+thunderbird -file "/tmp/$$.eml"
+----
+
+Create the file in an editor, save it somewhere, and make it executable
+(`chmod +x recoll-thunderbird-open-file`).
+
+The mail line in the '~/.recoll/mimeview' file:
+
+----
+[view]
+message/rfc822  = recoll-thunderbird-open-file  %f
+----
+
+If the place where you saved the script is not in your PATH, you will need
+to use the full path instead of just the script name, as in  
+
+----
+[view]
+message/rfc822 = /home/me/somewhere/recoll-thunderbird-open-file  %f
+----
+
+You should then be able to open the messages in Thunderbird, which is
+useful, for example, to handle the attachments. 
+
+With recent Recoll versions, if using the normal option of letting the
+Desktop choose the _Open_ application to use (_Use Desktop default_),
+you should also add +message/rfc822+ to the exceptions, and the whole
+thing is probably more easily done from the Recoll GUI. 
--- a/website/faqsandhowtos/PreventIndexingDir.txt
+++ b/website/faqsandhowtos/PreventIndexingDir.txt
@ -0,0 +1,27 @@
+== Preventing indexing in a directory
+
+=== Why would you want to do this ?
+
+By default, recollindex (or the indexing thread inside the recoll Qt user
+interface) will process your home directory and most of its subdirectories,
+with the exception of some well known places (thumbnails, beagle and web
+browser caches, etc.) 
+
+You may want to prevent indexing in some directories where you don't expect
+interesting search results. This will avoid polluting the search result
+lists, speed up indexing times and make the index smaller. 
+
+=== How to do it
+
+There are two ways to block indexing at certain points: either by listing
+specific paths, or by directory name pattern matches. 
+
+- Blocking specific paths: this is controlled by the skippedPaths variable
+  in the main configuration file. You can adjust the value either by
+  editing the file or by using the indexing configuration dialog:
+  _Preferences->Indexing configuration->Global parameters->Skipped paths_
+- Using pattern matches: these are listed in the skippedNames variable in
+  the main configuration file. You can adjust the value either by editing
+  the file or by using the GUI: _Preferences->Indexing configuration->Local
+  parameters->Skipped names_
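+
+When editing the file by hand, the first method comes down to a single
+line in 'recoll.conf'. The following sketch appends such a line:
+`add_skipped_paths` is a made-up helper name and the paths are examples.
+If the file already has a +skippedPaths+ line, edit it instead of
+appending a second one.
+

```shell
# add_skipped_paths CONFFILE PATH...
# Append a skippedPaths line listing the given paths to the configuration.
add_skipped_paths() {
    conf=$1
    shift
    printf 'skippedPaths = %s\n' "$*" >> "$conf"
}

# Example (hypothetical paths):
# add_skipped_paths ~/.recoll/recoll.conf /home/me/tmp /home/me/Downloads
```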
+
--- a/website/faqsandhowtos/ProblemSolvingData.txt
+++ b/website/faqsandhowtos/ProblemSolvingData.txt
@ -0,0 +1,157 @@
+== Gathering useful data for asking help about or reporting a Recoll issue
+
+Once in a while it will happen that a Recoll program will either signal an
+error, or even crash (either the *recoll* graphical interface or the
+*recollindex* command line indexing command). 
+
+Reporting errors and crashes is very useful. It can help others, and it can
+get your own problem solved. 
+
+Any problem report should include the exact Recoll and system versions.
+
+If at all possible, reading the following and performing part of the
+suggested steps will be useful. This is not a condition for obtaining help
+though ! If you have any problem and have difficulty with the following,
+just contact the mailing list or the developers (see contacts on
+link:https://www.recoll.org/support.html[the Recoll site support page]).
+
+If the problem concerns indexing, and was initially found using the
+*recoll* GUI, you should try to reproduce it using the
+*recollindex* command-line indexer, which is much simpler and easier to
+debug. 
+
+There are then two sources of useful information to diagnose the issue: the
+debug log file and, possibly, in case of a crash, a stack trace. 
+
+Crash and other problem reports are of very high value to me, and I am
+willing to help you with any of the steps described below if it is not
+familiar to you. I do realize that not everybody is a programmer or a
+system administrator. 
+
+=== Obtaining information from the log file
+
+All Recoll commands write a varying amount of information to a common log file.
+
+_All commands use the same log, and the file is reset every time a command
+is started: so it is important to make a copy right after the problem
+occurs (for example, do not start *recoll* after a *recollindex*
+crash, this would reset the log). A workaround for this issue is to let the
+messages go to the default +stderr+, and redirect this._
+
+By default, the messages are output to +stderr+, and you probably don't even
+see them if Recoll is started from the desktop. In this case, you need to
+set the parameters so that output goes to a file, and the appropriate
+verbosity level is set. When using the command-line, you may actually
+prefer to redirect stderr to avoid the log-truncating issue described
+above. 
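+
+On the command line, the redirection can look like the following sketch.
+`run_logged` is a made-up helper name, it assumes that the messages go to
+the default +stderr+, and the log path is only an example:
+

```shell
# run_logged LOGFILE COMMAND [ARGS...]
# Run the command with its stderr (where the trace messages go) saved to
# LOGFILE. Each command gets its own file, so nothing is overwritten by
# the next Recoll program you start.
run_logged() {
    log=$1
    shift
    "$@" 2> "$log"
}

# Example:
# run_logged /tmp/recollindex-stderr.log recollindex
```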
+
+You can set the log parameters from the GUI _Indexing parameters_
+section or by editing the '~/.recoll/recoll.conf' file: set the
++loglevel+ and +logfilename+ parameters. E.g.:
+
+----
+loglevel = 6
+logfilename = /tmp/recolltrace
+----
+
+The log file can become very big if you need a big indexing run to
+reproduce the problem. Choose a file system with enough space available
+(possibly a few gigabytes). 
+
+Then run the sequence that leads to the problem, and make a copy of the log
+file just after. If the log is too big, it will usually be sufficient to
+use the last 500 lines or so (tail -500). 
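+
+A small sketch for extracting the end of the log (`save_log_tail` is a
+made-up helper name, and the output file name is only an example):
+

```shell
# save_log_tail LOGFILE OUTFILE [NLINES]
# Copy the last NLINES lines (default 500) of the log to OUTFILE.
save_log_tail() {
    tail -n "${3:-500}" "$1" > "$2"
}

# Example, using the log file name from the configuration above:
# save_log_tail /tmp/recolltrace /tmp/recolltrace-report.txt
```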
+
+==== Single file indexing issues
+
+When the problem concerns, or can be reproduced with, a single file it is
+very cumbersome to have to run a full indexing pass to reproduce it. There
+are two ways around this: 
+
+- Set up an ad hoc configuration with only the file of interest, or its
+  parent directory: 
+----
+cd
+mkdir recoll-test
+cd recoll-test
+echo /path/to/my/file/or/its/parent/dir > recoll.conf
+echo 'loglevel = 6' >> recoll.conf
+echo 'logfilename = /tmp/recolltrace' >> recoll.conf
+recollindex -z -c .
+----
+- Use the -e and -i options to recollindex to erase/reindex a single
+  file. Set up the log, then: 
+----
+recollindex -e /path/to/my/file
+recollindex -i /path/to/my/file
+----
+
+When using the second approach, you must take care that the path used is
+consistent with the paths listed/used in the configuration (i.e. if '/home' is
+a link to '/usr/home', and '/usr/home/me' is used in the configuration
++topdirs+, `recollindex -i /home/me/myfile` will not work, you need
+to use `recollindex -i /usr/home/me/myfile`).
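+
+The situation behaves as if the paths were compared as plain strings,
+without resolving links. The following sketch illustrates the pitfall:
+`is_under` is a made-up helper, not a Recoll command:
+

```shell
# is_under FILE TOPDIR
# Succeed if FILE is textually located under TOPDIR. No symbolic link is
# resolved, which mimics the pitfall described above.
is_under() {
    case "$1" in
        "$2"/*) return 0 ;;
        *)      return 1 ;;
    esac
}

is_under /usr/home/me/myfile /usr/home/me && echo "matches topdirs"
is_under /home/me/myfile /usr/home/me || echo "does not match topdirs"
```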
+
+
+=== Obtaining a stack trace
+
+If the program actually crashes, and in order to maximize usefulness, a
+crash report should also include a so-called stack trace, something that
+indicates what the program was doing when it crashed. Getting a useful
+stack trace is not very difficult, but it may need a little work on your
+part (which will then enable me to do my part of the work). 
+
+If your distribution includes a separate package for Recoll debugging
+symbols, it probably also has a page on its web site explaining how to use
+them to get a stack trace. You should follow these instructions. If there
+is no debugging package, you should follow the instructions below. A little
+familiarity with the command line will be necessary. 
+
+==== Compiling and installing a debugging version
+
+- Obtain the recoll source for the version you are using (www.recoll.org),
+  and extract the source tree. 
+- Follow the
+  link:http://www.lesbonscomptes.com/recoll/usermanual/rcl.install.building.html[instructions
+  for building Recoll from source] with the following modifications:
+- Before running configure, edit the mk/localdefs.in file and remove the
+  -O2 option(s). 
+- When running configure, specify the standard installation location for
+  your system as a prefix (to avoid ending up with two installed versions,
+  which would almost certainly end in confusion). On Linux this would
+  typically be: `configure --prefix=/usr`
+- When installing, arrange for the installed executables not to be stripped
+  of debugging symbols by specifying a value for the STRIP environment
+  variable (e.g. *echo* or *ls*): `sudo make install STRIP=ls`
+
+==== Getting a core dump
+    
+You will need to run the operation that caused the crash inside a writable
+directory, and tell the system that you accept core dumps. The commands
+need to be run in a shell inside a terminal window. E.g.: 
+
+----
+cd
+ulimit -c unlimited
+recoll  #(or recollindex or whatever you want to run).
+----
+
+Hopefully, you will succeed in getting the command to crash, and you will
+get a core file. A possible approach then would be to make both the
+executable and the core files available to me by uploading them to a file
+sharing site (the core file may be quite big). You should be aware though
+that the core file may contain some of the data that was being indexed,
+which may be a privacy issue. Another approach is to generate the stack
+trace yourself. 
+
+=== Using gdb to get a stack trace
+
+- Install gdb if it is not already on the system.
+- Run gdb on the command that crashed and the core file (depending on the
+  system, the core file may be named "core" or something else, like
+  recollindex.core, or core.pid), e.g.: `gdb /usr/bin/recollindex core`
+- Inside gdb, you need to use different commands to get a stack trace for
+  recoll and recollindex. For recollindex you can use the bt command. For
+  recoll use `thread apply all bt full`
+- Copy/paste the output to your report email :), and quit gdb ("q").
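+
+If you prefer not to type the commands inside gdb, the steps above can
+also be run non-interactively. This is a sketch: `gdb_bt_command` is a
+made-up helper which just picks the right backtrace command, and the
+executable and core file names are examples:
+

```shell
# gdb_bt_command EXECUTABLE
# Print the gdb backtrace command to use for the given Recoll program.
gdb_bt_command() {
    case "$(basename "$1")" in
        recoll) echo 'thread apply all bt full' ;;
        *)      echo 'bt' ;;
    esac
}

# Example: write the stack trace to a file, ready to be sent.
# gdb -batch -ex "$(gdb_bt_command /usr/bin/recollindex)" \
#     /usr/bin/recollindex core > recollindex-trace.txt 2>&1
```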
+
--- a/website/faqsandhowtos/QpdfviewHelperScript.txt
+++ b/website/faqsandhowtos/QpdfviewHelperScript.txt
@ -0,0 +1,61 @@
+== Starting native applications ==
+
+Another example of using an intermediary script for an application with a
+command line syntax which can't be directly defined in mimeview. 
+
+We use a script to preprocess and adapt the options before calling the
+actual command. 
+
+Details about configuring how the native application or script are called
+are given with the
+link:http://www.recoll.org/usermanual/usermanual.html#RCL.INSTALL.CONFIG.MIMEVIEW[description
+of the mimeview configuration file].
+
+*qpdfview* (link:http://launchpad.net/qpdfview[web site]) is a very
+lightweight tabbed PDF viewer with great search performance and result
+highlighting.
+
+It does support parsing the search term and page number from the command
+line with the following syntax:
+
+----
+qpdfview --unique "%f"#%p --search "%s"
+----
+
+However, qpdfview will not launch if either %p or %s are empty in the
+command above. To work around this, Recoll user Florian has written a
+small wrapper shell script:
+
+----
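+#!/bin/bash
+# Usage: qpdfviewwrapper FILE [PAGE] [SEARCHTERM]
+
+qpdfviewpath=qpdfview
+
+# Only append "#page" to the file name if a page number was given
+if [ -z "$2" ]
+then
+    page=""
+else
+    page="#$2"
+fi
+
+# Only pass --search if a search term was given. Using an array keeps a
+# multi-word search term as a single argument.
+if [ -z "$3" ]
+then
+    search=()
+else
+    search=(--search "$3")
+fi
+
+"$qpdfviewpath" --unique "$1$page" "${search[@]}" >&0 2>&0 &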
+----
+
+
+The corresponding handler line for Recoll would be (depending on how you
+name the script and where you store it):
+
+----
+      qpdfviewwrapper %f %p %s
+----
+
+
--- a/website/faqsandhowtos/QueryFromC.txt
+++ b/website/faqsandhowtos/QueryFromC.txt
@ -0,0 +1,18 @@
+== Querying Recoll from a C program
+
+The easiest way to query Recoll from a C or C++ program is to execute an
+external search command (`recollq` or `recoll -t`).
+
+I have written a simple C module which deals with the related housekeeping
+and presents an easy to use API to the rest of the code. You will find it
+here:
+
+    https://bitbucket.org/medoc/recoll-capi
+
+It is a bit experimental and will only work with recoll 1.20 for now
+(because it uses a new option for recollq). However it would be trivial to
+modify for working with 1.19, get in touch with me if you need this.
+
+The other approach is to link with the Recoll library. This has no official
+API, but in practice, the internal one is fairly stable, and if you want to
+choose this approach, you should start from the code in recollq.cpp
--- a/website/faqsandhowtos/ReplaceCategories.txt
+++ b/website/faqsandhowtos/ReplaceCategories.txt
@ -0,0 +1,58 @@
+== Replacing the Category filter controls
+
+The document category filter controls normally appear at the top of the
+*recoll* GUI, either as checkboxes just above the result list, or as a
+dropbox in the tool area.
+
+By default, they are labeled _Media_, _Message_, _Spreadsheet_, _Text_,
+etc. and each map to a document category.
+
+The mapping used to be fixed. You could change the number and composition
+of categories by redefining them inside the 'mimeconf' configuration
+file (you still can), but the filters always used document categories.
+
+Categories can also be selected from the query language by using an
++rclcat:+ selector. E.g.: _rclcat:message_.
+
+As of Recoll release 1.17, the filters are not hard-wired any more. They
+map to query language fragments. This means that you can freely redefine
+what they do. 
+
+The associations are configured inside the 'mimeconf' file, in the
++[guifilters]+ section. Most GUI parameters are stored in the *Qt*
+configuration file, so this is not entirely consistent, and you will have
+to bear with my laziness here.
+
+A simple example will hopefully make things clearer. If you add the 
+following to your '~/.recoll/mimeconf' file:
+
+----
+[guifilters]
+
+Big Books = dir:"~/My Books" size>10K
+My Docs = dir:"~/My Documents"
+Small Books = dir:"~/My Books" size<10K
+System Docs = dir:/usr/share/doc
+
+----
+
+You will have four filter checkboxes, labelled _Big Books_, _My Docs_, etc.
+
+The text after the equal sign must be a valid query language fragment, and
+will be translated to a *Recoll* query and combined with the rest of the
+query with an AND conjunction.
+
+Any name text before a colon character will be erased in the display, but
+used for sorting. You can use this to display the checkboxes in any order
+you like. For example, the following would do exactly the same as above,
+but ordering the checkboxes in the reverse order.
+
+----
+[guifilters]
+
+d:Big Books = dir:"~/My Books" size>10K
+c:My Docs = dir:"~/My Documents"
+b:Small Books = dir:"~/My Books" size<10K
+a:System Docs = dir:/usr/share/doc
+
+----
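+
+The sorting and prefix removal can be pictured with a few shell commands.
+This only illustrates the resulting order of the labels, it is not how
+the GUI is implemented (`sorted_labels` is a made-up name):
+

```shell
# Sort the labels on the full text, then strip everything up to and
# including the first colon, as the display does.
sorted_labels() {
    LC_ALL=C sort | sed 's/^[^:]*://'
}

printf '%s\n' 'd:Big Books' 'c:My Docs' 'b:Small Books' 'a:System Docs' |
    sorted_labels
```
+
+The labels come out as _System Docs_, _Small Books_, _My Docs_, _Big
+Books_, which is the reverse of the order for the unprefixed names.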
--- a/website/faqsandhowtos/ResultsThumbnails.txt
+++ b/website/faqsandhowtos/ResultsThumbnails.txt
@ -0,0 +1,23 @@
+== Result list thumbnails and how to create them
+
+Recoll will display thumbnails for the results if the images exist in the 
+standard location ('$HOME/.thumbnails' or '$HOME/.cache/thumbnails' depending
+on the xdg version). 
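+
+For reference, here is a sketch of how the thumbnail file name for a
+document can be computed. It assumes the freedesktop.org thumbnail naming
+convention (MD5 hash of the file URI), the newer '$HOME/.cache/thumbnails'
+location, and a path without characters which would need escaping in a
+URI. `thumb_path` is a made-up helper name:
+

```shell
# thumb_path /absolute/path/to/file
# Print the expected location of the normal size thumbnail for the file.
thumb_path() {
    hash=$(printf '%s' "file://$1" | md5sum | cut -d ' ' -f 1)
    printf '%s\n' "$HOME/.cache/thumbnails/normal/$hash.png"
}

# Example (hypothetical file):
# ls -l "$(thumb_path /home/me/docs/report.pdf)"
```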
+
+But it will not create thumbnails, mainly because it is very hard to do
+portably.
+
+Thumbnails are most commonly created when you visit a directory with your
+file manager, but visiting the whole file tree just to create thumbnails is
+a bit tedious.
+
+One simple trick to create thumbnails from the recoll GUI is to visit the
+parent directory for a result by using the _Open parent document/folder_
+entry in the right-click menu.
+
+You can also find tools for the systematic creation of thumbnails for a
+directory tree. Three such tools are discussed on this 
+link:http://askubuntu.com/questions/199110/how-can-i-instruct-nautilus-to-pre-generate-pdf-thumbnails[askubuntu.com discussion]
+
+Also please note that no thumbnails can currently be generated or displayed
+for embedded documents (attachments, archive members, etc.).
--- a/website/faqsandhowtos/SavingConfig.txt
+++ b/website/faqsandhowtos/SavingConfig.txt
@ -0,0 +1,61 @@
+== User configuration backup
+
+=== Why you would want to do this
+
+If you are going to reinstall your system, and have some custom
+configuration, you may save some time by making a backup of your
+configuration and restoring it on the new system, rather than going through
+the menus to recreate it.
+
+=== How to do it
+
+==== Index/search configuration
+
+The main recoll configuration data is normally kept inside '~/.recoll' or
+whatever *$RECOLL_CONFDIR* is set to.
+
+This directory contains both configuration files and generated index
+data. In a standard configuration, the following files and directories
+contain generated data: 
+
+- 'xapiandb' contains the Xapian index, which normally consumes most of the
+  total space. 
+- 'aspdict.en.rws' contains the aspell dictionary used for spelling
+  corrections. 
+- 'mboxcache' contains cached offset data for email messages inside mbox
+  folders. 
+- 'webcache' contains saved web pages. This is more than a cache as
+  destroying it will purge the corresponding data during the next
+  indexing. 
+
+The other files are either very small or contain configuration data.
+
+If you want to only save configuration, using minimum space, you can
+destroy the above files and directories (with the possible exception of
+'webcache'). Then taking a copy of the '.recoll' directory and adding the
+GUI configuration data described in the next section will get you a full
+configuration data backup. 
+
+==== GUI configuration
+
+The parameters set from the _Query configuration_ Qt menus are stored in
+Qt standard places:
+
+- '~/.qt/recollrc' for Qt 3.x
+- '~/.config/Recoll.org/recoll.conf' for Qt 4 and later
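+
+As an alternative to destroying the generated data before taking a copy,
+the backup can just leave it out. The following is a sketch under a few
+assumptions: the default '.recoll' location, a *tar* command which
+supports the `--exclude` option (GNU tar or bsdtar), and a made-up
+`backup_recoll_config` helper name. The 'webcache' directory is kept on
+purpose:
+

```shell
# backup_recoll_config HOMEDIR ARCHIVE
# Archive the Recoll configuration found under HOMEDIR, leaving out the
# generated index data.
backup_recoll_config() {
    home=$1
    out=$2
    set -- .recoll
    # Add the Qt 4 and later GUI settings file if it exists
    if [ -f "$home/.config/Recoll.org/recoll.conf" ]; then
        set -- "$@" .config/Recoll.org/recoll.conf
    fi
    tar -czf "$out" -C "$home" \
        --exclude='.recoll/xapiandb' \
        --exclude='.recoll/mboxcache' \
        --exclude='.recoll/aspdict.*.rws' \
        "$@"
}

# Example:
# backup_recoll_config "$HOME" /tmp/recoll-config.tar.gz
```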
+
+
+==== Other data
+
+If you wish to save index data in addition to the customisation files,
+which only makes sense if the document access paths do not change after
+reinstallation, you can just take a backup of the full '.recoll'
+directory, taking care that the storage locations for some data elements
+can be changed (not be inside '.recoll'): 
+
+- The index data is normally kept inside '~/.recoll/xapiandb', but the
+  location of this directory can be modified by the +dbdir+
+  configuration parameter if it is set (check 'recoll.conf'). 
+- If you use the Firefox Recoll plugin, the WEB history cache is normally
+  kept inside '~/.recoll/webcache', but the location can be modified by
+  the +webcachedir+ configuration parameter. 
--- a/website/faqsandhowtos/UnityLens.txt
+++ b/website/faqsandhowtos/UnityLens.txt
@ -0,0 +1,109 @@
+== Building and Installing the Ubuntu Unity Recoll Lens
+
+Important preliminary notes:
+
+- This only makes sense for Ubuntu versions using the Unity environment:
+  Natty (11.04), Oneiric (11.10), Precise (12.04), and later. 
+- _Remember that you still need to use the recoll GUI (or the recollindex
+  command) to get the indexing going !_
+- The Lens is artificially limited to showing at most 20 results. Use the
+  recoll GUI for more complete capabilities (or edit rclsearch.py, change
+  the "if actual_results >= 20:" line). 
+
+
+=== The Lens with Recoll 1.17 and later
+
+If you are willing to install or upgrade to Recoll version 1.17, all
+necessary packages are on the Recoll PPA, you just need to add the
+repository to your system sources and add or upgrade the packages: *_This
+is the recommended approach!_*
+
+----
+sudo add-apt-repository ppa:recoll-backports/recoll-1.15-on
+sudo apt-get update
+sudo apt-get install recoll-lens recoll
+----
+
+This document may still be useful if you want to modify the lens source
+code.
+
+=== The Lens with older Recoll versions
+
+If, for some reason, you wish to test the Lens with an older Recoll
+version, read the following. 
+
+Please note that such an installation is somewhat crippled: you will not be
+able to display results for embedded documents (emails inside an mbox,
+attachments etc.). This requires a recoll command line option which is only
+available in 1.17.
+
+The Lens is based on the Recoll Python module which is not built by default
+for versions prior to 1.17, so you will first need to pull the Recoll
+source code (for your version), then untar and proceed with the
+configure/build instructions below. 
+
+The following uses --prefix=/usr. I have no real reason to believe 
+that this would not work with /usr/local (lenses are also searched there by
+default). If you confirm that things work with another prefix, please drop
+me a line.
+
+When doing this over a previous Recoll compilation, run a "make clean" to
+get rid of the non-PIC objects. 
+
+Note that the following instructions change nothing in your existing Recoll
+installation, they only install the Python module and the Unity Lens,
+recoll, recollindex etc. are unaffected. 
+
+'/TOP/OF/RECOLL/SRC' designates the top of the recoll source tree.
+
+=== Configure and build the recoll library and python module, install the module
+
+The following needs the development packages for Xapian, Python and zlib.
+
+----
+cd /TOP/OF/RECOLL/SRC 
+# May fail if no previous build was performed
+make clean
+
+# the gui/x11 disabling is just here to avoid having to install the
+# development libraries for Qt.
+./configure --prefix=/usr --enable-pic --without-x --disable-qtgui
+make
+
+cd python/recoll
+python setup.py build
+sudo python setup.py install
+----
+
+=== Build and install the Unity Lens
+
+----
+cd /TOP/OF/RECOLL/SRC
+cd desktop/unity-lens-recoll
+./configure --prefix=/usr --sysconfdir=/etc
+sudo make install
+
+----
+
+Voilà, it should work...
+
+Try to start the Dash, you should see the Recoll checkerboard (or
+whatever...) in the Lens list. 
+
+The Recoll Lens expects a Recoll query language string, so you can use
+field searches, directory, size, and date filtering (see the
+link:http://www.lesbonscomptes.com/recoll/usermanual/rcl.search.lang.html[Recoll
+manual] for a description of the query language).  
+
+If you want to disable the Lens, I think that you just have to delete
+'/usr/share/unity/lenses/recoll'
+
+Other installed files:
+
+----
+/usr/libexec/unity-recoll-daemon
+/usr/share/dbus-1/services/unity-lens-recoll.service
+/usr/share/doc/unity-lens-recoll
+/usr/share/unity-lens-recoll
+----
+
--- a/website/faqsandhowtos/UsingOpenWith.txt
+++ b/website/faqsandhowtos/UsingOpenWith.txt
@ -0,0 +1,68 @@
+== Using the _Open With_ context menu in recoll 1.20 and newer
+
+Recoll versions 1.20 and newer have an _Open With_ entry in the result list
+context menu (the thing which pops up on a right click).
+
+This allows choosing the application used to edit the document, instead of
+using the default one.
+
+The list of applications is built from the desktop files found inside
+'/usr/share/applications'. For each application on the system, these
+files list the mime types that the application can process.
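+
+To check which desktop files claim a given mime type, you can search them
+directly. A small sketch (`apps_for_mime` is a made-up helper name, and
+the mime type is used as a regular expression, which is good enough for
+simple types like the one below):
+

```shell
# apps_for_mime DIRECTORY MIMETYPE
# List the desktop files in DIRECTORY which declare support for MIMETYPE.
apps_for_mime() {
    grep -l "^MimeType=.*$2" "$1"/*.desktop 2>/dev/null
}

# Example:
# apps_for_mime /usr/share/applications application/pdf
```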
+
+If the application which you would want listed does not appear, the most
+probable cause is that it has no desktop file, which could happen due to a
+number of reasons.
+
+This can be fixed very easily: just add a +.desktop+ file to
+'/usr/share/applications', starting from an existing one as a template.
+
+As an example, based on an original idea from Recoll user +florianbw+,
+the following describes setting up a script for editing a PDF document
+title found in the recoll result list.
+
+The script uses the *zenity* shell script dialog box tool to let you
+enter the new title, and then executes *exiftool* to actually change
+the document.
+
+----
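+#!/bin/sh
+# Interactively edit the title of the PDF file given as first argument
+
+PDF=$1
+# Read the current title, to use it as the initial dialog value
+TITLE=$(exiftool -Title -s3 "$PDF")
+
+RES=$(zenity --entry \
+  --title="Change PDF Title" \
+  --text="Enter the Title:" \
+  --entry-text "$TITLE")
+
+if [ -n "$RES" ]; then
+    # Set the new title, then update the index entry for the file
+    printf 'Changing title to %s ... ' "$RES" && \
+        exiftool -Title="$RES" "$PDF" && \
+        recollindex -i "$PDF" && echo "Done!"
+else
+    echo "No title entered"
+fi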
+----
+
+Name it, for example, 'pdf-edit-title.sh', and make it executable 
+(`chmod a+x pdf-edit-title.sh`).
+
+Then create a file named 'pdf-edit-title.desktop' inside
+'/usr/share/applications'. The file name does not need to be the same as the
+script's; using the same name just makes things clearer:
+
+----
+[Desktop Entry]
+Name=PDF Title Editor
+Comment=Small script based on exiftool used to edit a pdf document title
+Exec=/home/dockes/bin/pdf-edit-title.sh %F
+Type=Application
+MimeType=application/pdf;
+----
+
+You're done! Restart Recoll, perform a search and right-click on a PDF
+result: you should see an entry named _PDF Title Editor_ in the _Open
+With_ list. Click on it, and you will be able to edit the title.
+
+
--- a/website/faqsandhowtos/WhyIsMyFileNotIndexed.txt
+++ b/website/faqsandhowtos/WhyIsMyFileNotIndexed.txt
@ -0,0 +1,99 @@
+== Using the log file to investigate indexing issues
+
+All *Recoll* processes print trace messages. By default these go to the
+standard error output, and you may not ever see them (in the case, for
+example, of the *recoll* GUI started from the desktop interface). 
+
+There are a number of potential issues with indexing that may need
+investigation, such as: 
+
+- A file can't be found by searching even if it appears that it should have
+  been indexed (this could happen because the file is not selected at all or
+  because a filter program crashes). 
+- The indexing process gets stuck and never finishes.
+- The indexing process ends up with an error.
+- The indexing process seems to be using too much system capacity.
+
+The right way to approach these problems is to use the *recollindex*
+command line tool (instead of the *recoll* GUI), and to set up the
+trace log to provide information about what indexing is actually doing. 
+
+Trace log parameters can be set either from the GUI _Preferences->Indexing
+Configuration->Global Parameters_ panel, or by editing the configuration
+file '~/.recoll/recoll.conf'. You should set the following parameters: 
+
+----
+loglevel = 6
+logfilename = stderr
+thrQSizes = -1 -1 -1
+----
+
+We use _stderr_ instead of an actual file in order to capture direct filter
+messages (such as a *python* stack trace) along with normal
+*recollindex* messages. 
+
+The last line sets recollindex for single-threaded operation, which will
+make the log much more readable. 
+
+You should then check that no *recoll* or *recollindex* process is
+currently running, and kill any you find. 
+
+Then, if this is an issue about an identified file, try indexing it only:
+
+----
+recollindex -i myunfindablefile.xxx > /tmp/myindexlog 2>&1
+----
+
+If this is a general issue with indexing (process not finishing properly),
+just start it: 
+
+----
+recollindex > /tmp/myindexlog 2>&1
+----
+
+Usually, looking at the trace will show what is wrong (e.g. a
+configuration issue or a missing filter) and let you solve the problem.
+
+In case of indexer misbehaviour (e.g. using too much memory), you should run
+_tail -f_ on the log to see what is going on.
+
+If this is not enough, please
+link:http://bitbucket.org/medoc/recoll/issues/new[open a tracker issue] and
+attach or link to the log data, or just email me (jfd at recoll.org). 
+
+*recollindex* and *recollindex -i* usually have the same criteria to
+include a file or not (but see the _Path gotcha_ note below). It may
+happen that they behave differently, so it may sometimes be useful to run a
+full *recollindex* even for a specific file, but this will produce a
+big log file. 
+
+When you are done, it is better to reset the verbosity to a reasonable
+level (e.g.: +2+ : just errors, +4+ : basic traces). 
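+
+As a sketch, and assuming that the debugging parameters were added by hand
+to '~/.recoll/recoll.conf', going back to a quieter setup could look like
+the following. The level value is only an example taken from the range
+mentioned above:
+
+----
+# Errors only; use 4 for basic traces
+loglevel = 2
+# Remove or comment out the single-threading line used for debugging
+#thrQSizes = -1 -1 -1
+----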
+
+=== Note: the path gotcha
+
+*recollindex -i* will only index files under the directories defined by the
++topdirs+ configuration variable (your home directory by
+default). Unfortunately, the test is done on the file path text, ignoring
+possible symbolic links. If you give a simple file name as a parameter to
+*recollindex -i* and there are symbolic links inside the +topdirs+
+entries, the comparison may fail. For example, if your home directory is
+'/home/me/' and '/home/' is a link to '/usr/home/', *recollindex -i
+somefilename* will actually try to index '/usr/home/me/somefilename', and
+fail (because '/usr/home/me/' is not a subdirectory of '/home/me/'). This
+will manifest itself in the log by a message like the following.  
+
+----
+:4:../index/fsindexer.cpp:149:FsIndexer::indexFiles: skipping [/usr/home/me/somefile] (ntd)
+----
+
+If this happens, give a full path consistent with what is found in the
+configuration file (e.g.: _recollindex -i /home/me/somefile_). 
+
+=== File system occupation
+
+One of the possible reasons for failed indexing is a +maxfsoccup+
+parameter set too low. This is the file system occupation level (used
+space, not free space) above which indexing stops. It is set from the GUI
+indexing configuration or by editing 'recoll.conf'. A value of 0 implies no
+checking, but a very low, non-zero, value will just prevent indexing.
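+
+For illustration, a 'recoll.conf' fragment might look like the following.
+The 90 value is an arbitrary example, not a recommendation; the value is a
+percentage of used space, comparable to the _Capacity_ column of *df*:
+
+----
+# Stop indexing when the file system is more than 90% full.
+# 0 (or no entry at all) disables the check.
+maxfsoccup = 90
+----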
--- a/website/faqsandhowtos/WikiIndex.txt
+++ b/website/faqsandhowtos/WikiIndex.txt
@ -0,0 +1,65 @@
+== Recoll Wiki file index
+link:ElinksWeb.html[Extending the Recoll Firefox visited web page indexing mechanism to other browsers]
+
+link:FaqsAndHowTos.html[Faqs and Howtos]
+
+link:FilterArch.html[Recoll input filters ]
+
+link:FilterRetrofit.html[Installing a filter for a new document type]
+
+link:FilteringOutZipArchiveMembers.html[Filtering out Zip archive members]
+
+link:GUIKeyboard.html[Recoll GUI keyboard navigation]
+
+link:HandleCustomField.html[Generating a custom field and using it to sort results]
+
+link:Home.html[Welcome to the Recoll Wiki]
+
+link:HotRecoll.html[Recoll hotkey: starting / hiding recoll with a keyboard shortcut]
+
+link:IndexMailHeader.html[Indexing arbitrary mail headers ]
+
+link:IndexMozillaCalendari.html[Indexing Mozilla calendar data ]
+
+link:IndexOnAc.html[Laptops: automatically starting or stopping indexing according to AC power status]
+
+link:IndexOutlook.html[Indexing Outlook archives]
+
+link:IndexWebHistory.html[Indexing Web history with the Firefox extension ]
+
+link:MultipleIndexes.html[Creating and using multiple indexes]
+
+link:MuttAndRecoll.html[Interfacing Recoll and Mutt]
+
+link:NonAsciiFileNames.html[Unix and non-ASCII file names, a summary of issues]
+
+link:OpenHelperScript.html[Starting native applications ]
+
+link:PreventIndexingDir.html[Preventing indexing in a directory]
+
+link:ProblemSolvingData.html[Gathering useful data for asking help about or reporting a Recoll issue]
+
+link:QpdfviewHelperScript.html[Starting native applications ]
+
+link:QueryFromC.html[Querying Recoll from a C program]
+
+link:ReplaceCategories.html[Replacing the Category filter controls]
+
+link:ResultsThumbnails.html[Result list thumbnails and how to create them]
+
+link:SavingConfig.html[User configuration backup]
+
+link:UnityLens.html[Building and Installing the Ubuntu Unity Recoll Lens]
+
+link:UsingOpenWith.html[Using the Open With context menu in recoll 1.20 and newer]
+
+link:WhyIsMyFileNotIndexed.html[Using the log file to investigate indexing issues]
+
+link:XDGBase.html[XDG: Tidying Recoll data storage]
+
+link:ZDevCaseAndDiacritics1.html[Character case and diacritic marks (1), issues with stemming]
+
+link:ZDevCaseAndDiacritics2.html[Character case and diacritic marks (2), user interface]
+
+link:ZDevCaseAndDiacritics3.html[Character case and diacritic marks (3), implementation]
+
--- a/website/faqsandhowtos/XDGBase.txt
+++ b/website/faqsandhowtos/XDGBase.txt
@ -0,0 +1,42 @@
+== XDG: Tidying Recoll data storage ==
+
+The default storage structure of Recoll configuration and index data is
+at odds with the recommendations of the
+link:http://standards.freedesktop.org/basedir-spec/basedir-spec-latest.html[XDG
+Base Directory Specification], because it predates that specification.
+
+By default, Recoll stores all its data in a single directory: '$HOME/.recoll'
+
+This is not going to change, because it would be quite disruptive for
+current users.
+
+However, the location of this directory can be modified using the
++$RECOLL_CONFDIR+ environment variable.
+
+Furthermore all significant Recoll data categories can be moved away from
+the configuration directory (maybe to '$HOME/.cache'), by setting
+configuration variables:
+
+* _dbdir_ defines the location for storing the Xapian
+  index. This could be set to, e.g., '$HOME/.cache/recoll/xapiandb'. It is
+  strongly recommended that this directory be dedicated to Xapian (don't
+  store other things in there).
+* _mboxcachedir_ defines the location for caching access speedup information
+  about mail folders in mbox format, e.g. '$HOME/.cache/recoll/mboxcache'.
+* New in 1.22: you can use _aspellDicDir_ to define the storage
+  location for the aspell spelling approximation
+  dictionary. E.g. '$HOME/.cache/recoll'
+* _webcachedir_ may be used to define where the visited web pages
+  archive is stored. E.g. '$HOME/.cache/recoll/webcache'. This is only used
+  if you activate the Firefox plugin and web history indexing. You may
+  want to think a bit more about where to store it, because, contrary to
+  the above, this is not discardable data: your Recoll Web history goes
+  away if you delete it.
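+
+As an illustration only, some of the settings above could be combined in
+'recoll.conf' as follows. The '/home/me' paths are hypothetical examples
+(absolute paths are used to avoid relying on any variable expansion), and
+the web cache is deliberately kept out of '.cache' because it is not
+discardable:
+
+----
+# Xapian index: keep this directory dedicated to Xapian
+dbdir = /home/me/.cache/recoll/xapiandb
+# Cache of access speedup data for mbox folders
+mboxcachedir = /home/me/.cache/recoll/mboxcache
+# Visited web pages archive (not discardable data)
+webcachedir = /home/me/.local/share/recoll/webcache
+----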
+
+If you use multiple Recoll configurations, each will have to be customized.
+
+Once these are put away, there are still a few modifiable files in the
+configuration directory, for example the 'recoll.pid' and 'history'
+files, but these are small files. Moving 'recoll.pid' away would be a
+serious headache because it is used by scripts. 
--- a/website/faqsandhowtos/ZDevCaseAndDiacritics1.txt
+++ b/website/faqsandhowtos/ZDevCaseAndDiacritics1.txt
@ -0,0 +1,143 @@
+== Character case and diacritic marks (1), issues with stemming
+
+=== Case and diacritics in Recoll
+
+Recoll versions up to 1.17 almost fully ignore character case and diacritic
+marks. 
+
+All terms are converted to lower case and unaccented before they are
+written to the index. There are only two exceptions:
+
+ * File paths (as used in _dir:_ clauses) are not converted. This might
+   be a bug or a feature, but the main reason is that we don't know how they
+   are encoded.
+ * It is possible to specify that some characters will keep their diacritic
+   marks, because the entity formed by the character and the diacritic mark
+   is considered to be a different letter, not a modified one. This is
+   highly dependent on the language. For example, in Swedish, +å+ should
+   be preserved, not turned into +a+.
+
+As a necessary consequence, the same transformations are applied to search
+terms, and it is impossible to search for a specific capitalization of a
+word (+US+ is looked for as +us+), or a specific accented form
+(+café+ will be looked for as +cafe+).
+
+However, there are some cases where you would like to be more specific:
+
+ * Searching for +US+ or +us+ should probably return different results.
+ * Diacritics are seldom significant in English, but we can find a
+   few examples anyway: +sake+ and +saké+, +mate+ and +maté+. Of
+   course, there are many more cases in languages which use more diacritics.
+
+On the other hand, accents are often mistyped or forgotten (résumé, résume,
+resume?), and capitalization is most often insignificant. It is therefore
+very important to retain the capability to ignore accent and character
+case differences, and to make the discrimination easy to switch on or
+off for each search (or even for specific terms).
+
+This text and other pages which will follow will discuss issues in adding
+character case and diacritics sensitivity to Recoll, under the assumption
+that the main index will contain the raw source terms instead of
+case-folded and unaccented ones.
+
+The following will use the _unaccent_ neologism to mean _remove
+diacritic marks_ (and not only accents). 
+
+English examples are used when possible, but given the limited use of
+diacritics in English, some French will probably creep in.
+
+=== Diacritics and stemming
+
+Stemming is the process by which we extend a search to terms related by
+grammatical inflexion, for example singular/plural, verb tenses, etc. For
+example a search for +floor+ is normally expanded by Recoll to +floors,
+floored, flooring, ...+
+
+In practice Recoll has a separate data structure that has stemmed terms
+(stems) as keys pointing to a list of expansion terms:
++floor -> (floor,floors,floorings,...)+
+
+Stemming should be applied to terms before they are stripped of
+diacritics. Accents may have a grammatical significance, and the accent may
+change how the term is stemmed. For example, in French the +âmes+ suffix
+generally marks a past conjugation but +ames+ does not. The standard
+Xapian French stemmer will turn +évitâmes+ (avoided) into an +évit+ stem,
+but +évitames+ will be turned into +évitam+ (stripping
+plural and feminine suffixes).
+
+When the search is set to ignore diacritics, this poses a specific problem:
+if the user enters the search term without accents (which is correct
+because the system is supposed to ignore them), there is no guarantee that
+the term will be correctly expanded by stemming.
+
+The diacritic mismatch breaks the family relationship between the stem
+siblings, and this is independent of the type of index: it will happen with
+an index where diacritics are stripped just as with a raw one.
+
+The simpler case where diacritics in the original term only affect
+diacritics in the stem also necessitates specific processing, but it is
+easier to work around.
+
+Two examples illustrating these issues follow.
+
+==== The simple case: diacritics in the term only affect diacritics in the stem
+
+Let's imagine that the document set contains the term +éviter+
+(infinitive of +to avoid+), but not +évite+ (present). The only term in
+the actual index is then +éviter+.
+
+The user enters an unaccented +evite+, counting on the
+diacritics-insensitive search mode to deal with the accents. As +évite+
+is not present in the index, we have no way to guess that +evite+ is
+really +évite+.
+
+The stemmer will turn +evite+ into +evit+. There is no way that this
+can be related to +éviter+, and this legitimate result can't be found.
+
+There is a way around this: we can compute a separate
+stem expansion dictionary for unaccented terms. This dictionary, to be used
+with diacritics-insensitive searches only, contains the relationship
+between +evit+ and +eviter+ (as +éviter+ is in the index). We can
+then relate +eviter+ and +éviter+ because they differ only by accents,
+and the search will find the document with +éviter+.
+
+==== The bad case: diacritics in the term change the stem beyond diacritics
+
+Some grammatically significant accents will cause unexpectedly missing
+search results when using a supposedly diacritics-insensitive search mode.
+
+Let's imagine that the document set contains the term +éviter+ 
+(infinitive of +to avoid+), but not +évitâmes+ (past). So the stemming
+expansion table has an entry for +évit+ -> +éviter+.
+
+If the user enters an unaccented +evitames+, she would expect to find the
+documents containing +éviter+ in the results, because the latter term is
+a stemming sibling of +évitâmes+ and the search is supposedly not
+influenced by diacritics, so that +evitames+ and +évitâmes+ should be
+equivalent. 
+
+However, our search is now in trouble, because +évitâmes+ is not in any
+document, so that there is no data in the index which would inform us about
+how to transform the input term into something that differs only by accents
+but would yield a correct input for the stemmer.
+
+If we try to feed the raw user input to the stemmer, it will propose 
+an +evitam+ stem, which will not work, because the stem that actually 
+exists is +évit+, and +evitam+ cannot be related to +éviter+.
+
+The only palliative approach I can think of would be a spelling correction
+of the input, performed independently of the actual index contents, which
+would notice that +evitames+ is not a French word and propose a change or an
+expansion to +évitâmes+, which would correctly stem to +évit+ and allow
+us to find +éviter+.
+
+This issue is not specific to Recoll, or indeed to whether the index
+retains accents or not. As far as I can see, it is an intrinsically bad
+interaction between diacritics insensitivity and stemming.
+
+It is also interesting to note that this case becomes less probable when
+the data set becomes bigger, because more term inflexions will then be
+present in the index.
+
+We'll next think about an link:ZDevCaseAndDiacritics2.html[appropriate
+interface].
--- a/website/faqsandhowtos/ZDevCaseAndDiacritics2.txt
+++ b/website/faqsandhowtos/ZDevCaseAndDiacritics2.txt
@ -0,0 +1,122 @@
+== Character case and diacritic marks (2), user interface
+
+In a link:ZDevCaseAndDiacritics1.html[previous document], we discussed some
+of the problems which arise when mixing case/diacritics sensitivity and
+stemming.
+
+As of version 1.18, Recoll can create two types of indexes:
+
+* _Dumb_ indexes contain terms which are lowercased and stripped of
+  diacritics. Searches using such an index are naturally case- and
+  diacritics-insensitive: search terms are stripped before processing.
+* _Raw_ indexes contain terms which are just like they were found in the
+  source document. Searching such an index is naturally sensitive to case
+  and diacritics, and can be made insensitive by further processing.
+
+The following explains how users can control these Recoll features.
+
+=== Controlling the type of index we create: stripped or raw
+
+The kind of index that recoll creates is determined by:
+
+ * A build-time *configure* switch: _--enable-stripchars_. If this is
+   set, the code for case and diacritics sensitivity is not compiled in and
+   recoll will work like the previous versions: unaccented and casefolded
+   index, no runtime options for case or diacritics sensitivity.
+
+ * An indexing configuration switch (in recoll.conf): if Recoll was built
+   with _--disable-stripchars_, this will provide a dynamic way to return
+   to the "traditional" index. The case and diacritics code will be present
+   but inactive. Normally, a recoll installation with this switch set
+   should behave exactly like one built with _--enable-stripchars_. When
+   using multiple indexes, this switch MUST be consistent between
+   indexes. There is no support whatsoever for mixing raw and dumb indexes.
+   The option is named _indexStripChars_, and it is not settable from the
+   GUI to avoid errors. This is something that would typically be set once
+   and for all for a given installation. We need to decide what the default
+   value will be for 1.18.
+
+ * A number of query time switches. Using these it is also possible to
+   perform a search insensitive to case and diacritics on a raw index. Note
+   however, that, given the complexity of the issues involved, I give no
+   guarantee at this time that this will yield exactly the same results as
+   searching a dumb index. Details about query time behaviour follow.
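+
+As a minimal sketch, assuming a Recoll built with _--disable-stripchars_,
+the index type would be selected in 'recoll.conf' with the
+_indexStripChars_ switch described above. The value shown asks for a raw
+index; an existing index would presumably have to be reset and rebuilt
+after changing it:
+
+----
+# 1: stripped ("dumb") index, 0: raw index keeping case and diacritics
+indexStripChars = 0
+----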
+
+
+=== Controlling stem, case and diacritics expansion: user query interface 
+
+Recoll versions up to 1.17 were insensitive to case and diacritics. We only
+needed to give the user a way to control stem expansion. This was done in
+three ways:
+
+ * Globally, by setting a menu option.
+ * Globally, by setting the stemming language value to empty.
+ * On a term by term basis by Capitalizing the term, or, in query language
+   mode only, by using an 'l' clause modifier (_"term"l_).
+
+After switching to an unstripped index, capable of case and diacritic
+sensitivity, we need ways to control what processing is performed among:
+
+ * Case expansion.
+ * Diacritics expansion.
+ * Stem expansion.
+
+The default mode will be compatible with the previous version, because
+this is most generally what we want to do: ignore case and diacritics,
+expand stems.
+
+There are two easy approaches for controlling the parameters:
+
+ * Global options set in the GUI menus or as *recollq* command line
+   switches. 
+ * Per-clause options set by modifiers in the query language.
+
+We would like, however, to let the user's input automatically override the
+defaults in a sensible way. For example:
+
+ * If a term is entered with diacritics, diacritic sensitivity is turned on
+   (for this term only).
+ * If a term is entered with upper-case characters, case sensitivity is
+   turned on. In this case, we turn off stem expansion, because it really
+   makes no sense with case sensitivity.
+
+With this method we are stuck with 3 problems (only if the global mode is
+set to insensitive, and we're not using the query language):
+
+ * Turning off stemming without turning on case sensitivity.
+ * Searching for an all lower-case term in case-sensitive mode.
+ * Searching for a term without diacritics in diacritic-sensitive mode.
+
+The two latter issues are relatively marginal and can be worked around easily
+by switching to query language mode or using negative clauses in the
+advanced search. 
+
+However, we need to be able to turn stemming off while remaining
+insensitive to case, and we need to stay reasonably compatible with the
+previous versions. This means that a term which has a capital first letter
+but is otherwise lowercase will turn stemming off, but not case sensitivity
+on. 
+
+So we're left with how to search for such a term in a case-sensitive way,
+and for this, you'll have to use global options or the query language.
+
+The modified method is:
+
+ * If a term is entered with diacritics, diacritic sensitivity is turned on
+   (for this term only).
+ * If the first letter in a term is upper-case and the rest is lower-case,
+   we turn stem expansion off, but we do not become case-sensitive.
+ * If any letter in a term except the first is upper-case, case sensitivity
+   is turned on. Stem expansion is also turned off (even if the first
+   letter is lower-case), because it really makes no sense with case
+   sensitivity.
+ * To search for an all lower-case or capitalized term in a case-sensitive
+   way, use the query language: "Capitalized"C, "lowercase"C.
+ * Use the query language and the "D" modifier to turn on diacritics
+   sensitivity.
+
+It can be noted that some combinations of choices do not make sense and
+they are not allowed by Recoll: for example, diacritics or case sensitivity
+do not make sense with stem expansion (which cannot preserve diacritics in
+any meaningful general way).
+
+The link:ZDevCaseAndDiacritics3.html[next page] describes the actual
+implementation in Recoll 1.18.
--- a/website/faqsandhowtos/ZDevCaseAndDiacritics3.txt
+++ b/website/faqsandhowtos/ZDevCaseAndDiacritics3.txt
@ -0,0 +1,67 @@
+== Character case and diacritic marks (3), implementation
+
+In previous pages, we discussed link:ZDevCaseAndDiacritics1.html[diacritics
+and stemming], and an link:ZDevCaseAndDiacritics2.html[appropriate
+interface] for switchable search sensitivity to diacritics and character
+case.
+
+So you are in this mood again and you don't want to type accents (maybe you're
+stuck with a QWERTY American English keyboard), or conversely you
+want to resume looking for your résumé, and you've told Recoll as much,
+using the appropriate interface. What happens then?
+
+The second case is easy if the index is raw, and mostly impossible if it is
+stripped. So we'll concentrate on the first case: how to achieve case and
+diacritics insensitivity on a raw index?
+
+Recoll uses three expansion tables:
+
+* The first table has stripped and lowercased terms as keys and raw terms as
+  data: +mate -> (mate, maté, MATE,...)+.
+
+* The second table has lowercased stems as keys and original lowercase terms
+  as data (when using multiple languages, there are several such tables):
+  +évit -> (éviter, évite, évitâmes, ...)+.
+
+* The third table has stripped and lowercased stems as keys and stripped
+  lowercased terms as data:
+  +evit -> (eviter, evite, evitons)+ and +evitam -> (evitames, ...)+
+
+The first table can be used for full case and diacritics expansion or for
+only one of those, by post-filtering the results of full expansion (e.g. if
+we only want diacritics expansion, we filter by stripping diacritics from
+each result term and check that it's identical to the input). For example
+if we have +mate -> (mate, maté, MATE, MATÉ)+ in the table and want to
+only perform case expansion for an input of +maté+, we apply case folding
+to the initial output and keep only the terms which fold to +maté+ (+maté+
+and +MATÉ+), as +mate+ differs from the input.
+
+We only perform stemming expansion when case and diacritics sensitivity is
+off. It is performed using the second and third tables, both on the
+lowercased and lowercased/stripped output of the first step, and each term
+in the output stemming is expanded again for case (using the first table).
+
+A full example of the expansion occurring during an insensitive search 
+for +resume+ using French stemming on a mixed English/French index
+follows. An important thing to remember is that the result of each
+expansion is a function of the terms actually present in the index, not
+some arbitrary computation (and so, of course, many of the possible but
+absent variations are missing).
+
+. The case and diacritics expansion of +resume+ yields +RESUME Resume
+  Résumé resumé résume résumé resume+
+
+. The stem expansion input list (lower-cased) is:
+ +resume resumé résume résumé+, and the output is:
+ +resum resume resumenes resumer resumes resumé resumée résum résumait
+ résumant résume résumer résumerai résumerait résumes résumez résumé résumée
+ résumées résumés+
+
+. Each of the above terms is then fed to case and diacritics expansion (first
+ table), for the final output:
+ +resume résumé Résumé résumer résume Resume résumés RESUME resumes
+ resumer résumant resúmenes resumé résumait résumes résumée resumee
+ résumerait Résumez résumerai RÉSUMÉES Resumée Resumes résumées+.
+
+A Xapian OR query is finally constructed from the expanded term list.
+
--- a/website/faqsandhowtos/makeindex.sh
+++ b/website/faqsandhowtos/makeindex.sh
@ -0,0 +1,20 @@
+#!/bin/sh
+WIDX=WikiIndex.txt
+
+echo "== Recoll Wiki file index" > $WIDX
+for f in *.txt; do
+ if test "$f" = $WIDX ; then continue; fi
+ h="`basename $f .txt`.html"
+ title=`head -1 "$f" | tr -d '\r' | sed -e 's/=//g' -e 's/^ *//' -e 's/ *$//'`
+ echo 'link:'$h'['$title']' >> $WIDX
+ echo >> $WIDX
+done
+
+exit 0
+# Check and display what files are in the index but not in the contents table:
+
+grep \| FaqsAndHowTos.txt | awk -F\| '{print $1}'  | sed -e 's/\* \[\[//' -e 's/.wiki//' |sort > ctfiles.tmp
+grep '\[\[' WikiIndex.txt | awk -F\| '{print $1}'  | sed -e 's/\[\[//' -e 's/.wiki//' -e 's/.md//' | sort > ixfiles.tmp
+echo 'diff ContentFiles  IndexFiles:'
+diff ctfiles.tmp ixfiles.tmp
+rm ctfiles.tmp ixfiles.tmp