diff --git a/packaging/rpm/recollCooker.spec b/packaging/rpm/recollCooker.spec
new file mode 100644
index 00000000..e36ccfde
--- /dev/null
+++ b/packaging/rpm/recollCooker.spec
@@ -0,0 +1,88 @@
+Summary:	Desktop full text search tool with a Qt GUI
+Name:           recoll
+Version:        1.8.1
+Release:        %mkrel 1
+License:	GPL
+Group:          Databases
+URL:            http://www.recoll.org/
+Source0:	http://www.lesbonscomptes.com/recoll/%{name}-%{version}.tar.bz2
+Patch1:		%{name}-configure.patch
+BuildRequires:	libxapian-devel
+BuildRequires:	libfam-devel
+BuildRequires:	libqt-devel	>= 3.3.7
+BuildRequires:	libaspell-devel
+Requires:	xapian
+BuildRoot:      %{_tmppath}/%{name}-%{version}--buildroot
+
+%description
+Recoll is a personal full text search tool for Unix/Linux.
+It is based on the very strong Xapian backend, for which
+it provides a feature-rich Qt graphical interface that is
+easy to use and administer.
+
+%prep
+%setup -q 
+%patch1 -p0
+
+%build
+%configure2_5x \
+	--with-fam \
+	--with-aspell
+
+%make
+
+%install
+[ "%{buildroot}" != "/" ] && rm -rf %{buildroot}
+
+%makeinstall_std
+desktop-file-install --vendor="" \
+	--add-category="X-MandrivaLinux-MoreApplications-Databases" \
+	--dir %{buildroot}%{_datadir}/applications %{buildroot}%{_datadir}/applications/*
+
+%clean
+[ "%{buildroot}" != "/" ] && rm -rf %{buildroot}
+
+%files
+%defattr(644,root,root,755)
+%doc %{_datadir}/%{name}/doc
+%attr(755,root,root) %{_bindir}/%{name}*
+%{_datadir}/applications/recoll-searchgui.desktop
+%{_datadir}/icons/hicolor/48x48/apps/recoll-searchgui.png
+%dir %{_datadir}/%{name}
+%dir %{_datadir}/%{name}/examples
+%dir %{_datadir}/%{name}/filters
+%dir %{_datadir}/%{name}/images
+%dir %{_datadir}/%{name}/translations
+%{_datadir}/%{name}/examples/mime*
+%{_datadir}/%{name}/examples/*.conf
+%attr(755,root,root) %{_datadir}/%{name}/examples/rclmon.sh
+%attr(755,root,root) %{_datadir}/%{name}/filters/rc*
+%{_datadir}/%{name}/filters/xdg-open
+%{_datadir}/%{name}/images/*png
+%{_mandir}/man1/recoll*
+%{_mandir}/man5/recoll*
+%{_datadir}/%{name}/translations/*.qm
+
+
+%changelog
+* Fri Apr 20 2007 Tomasz Pawel Gajc <tpg@mandriva.org> 1.8.1-1mdv2008.0
++ Revision: 16093
+- new version
+- drop P0
+
+  + Mandriva <devel@mandriva.com>
+
+
+* Tue Mar 06 2007 Tomasz Pawel Gajc <tpg@mandriva.org> 1.7.5-2mdv2007.0
++ Revision: 134128
+- rebuild
+
+* Tue Jan 30 2007 Tomasz Pawel Gajc <tpg@mandriva.org> 1.7.5-1mdv2007.1
++ Revision: 115423
+- add patch 1 - fix build on x86_64
+- add patch 0 - fix menu entry
+- fix group
+- add buildrequires
+- set correct bits on files
+- Import recoll
+
diff --git a/src/doc/user/usermanual.sgml b/src/doc/user/usermanual.sgml
index 19a9f52c..d2d9a693 100644
--- a/src/doc/user/usermanual.sgml
+++ b/src/doc/user/usermanual.sgml
@@ -24,11 +24,12 @@
       Dockes</holder>
     </copyright>
 
-    <releaseinfo>$Id: usermanual.sgml,v 1.44 2007-06-08 16:46:53 dockes Exp $</releaseinfo>
+    <releaseinfo>$Id: usermanual.sgml,v 1.45 2007-06-26 16:58:25 dockes Exp $</releaseinfo>
 
     <abstract>
       <para>This document introduces full text search notions
-      and describes the installation and use of the &RCL; application.</para>
+      and describes the installation and use of the &RCL;
+      application. It currently describes &RCL; 1.9.</para>
     </abstract>
 
 
@@ -771,30 +772,6 @@ fvwm
       <replaceable>unplugged</replaceable> but not
       <replaceable>potatoes</replaceable> (in any part of the document).</para>
 
-      <para>The first element <literal>author:"john doe"</literal> is
-      a phrase search limited to a specific field. Phrase searches are
-      specified as usual by enclosing the words in double quotes. The
-      field specification appears before the colon (of course this is
-      not limited to phrases, <literal>author:Balzac</literal> would
-      be ok too). &RCL; currently manages the following fields:</para>
-
-      <itemizedlist>
-	<listitem><para><literal>title</literal>,
-	<literal>subject</literal> or <literal>caption</literal> are
-	synonyms which specify data to be searched for in the
-	document title or subject.</para>
-	</listitem>
-	<listitem><para><literal>author</literal> or
-	<literal>from</literal> for searching the documents originators.</para>
-	</listitem>
-	<listitem><para><literal>keyword</literal> for searching the
-	document specified keywords (few documents actually have any).</para>
-	</listitem>
-      </itemizedlist>
-
-      <para>The query language is currently the only way to use the
-      &RCL; field search capability.</para>
-
       <para>All elements in the search entry are normally combined
       with an implicit AND. It is possible to specify that elements be
       OR'ed instead, as in <replaceable>Beatles</replaceable>
@@ -817,8 +794,54 @@ fvwm
       <para>An entry preceded by a <literal>-</literal> specifies a
       term that should <emphasis>not</emphasis> appear.</para>
 
+      <para>The first element in the above example,
+      <literal>author:"john doe"</literal>, is a phrase search limited
+      to a specific field. Phrase searches are specified as usual by
+      enclosing the words in double quotes. The field specification
+      appears before the colon (of course this is not limited to
+      phrases, <literal>author:Balzac</literal> would be ok
+      too). &RCL; currently manages the following fields:</para>
+      <itemizedlist>
+	<listitem><para><literal>title</literal>,
+	<literal>subject</literal> or <literal>caption</literal> are
+	synonyms which specify data to be searched for in the
+	document title or subject.</para>
+	</listitem>
+	<listitem><para><literal>author</literal> or
+	<literal>from</literal> for searching the document originators.</para>
+	</listitem>
+	<listitem><para><literal>keyword</literal> for searching the
+	document-specified keywords (few documents actually have any).</para>
+	</listitem>
+      </itemizedlist>
+
+      <para>As of release 1.9, filters can create other fields
+      with arbitrary names. No standard filter uses this
+      capability yet.</para>
+
+      <para>There are two other elements which may be specified
+      through the field syntax, but are somewhat special:</para>
+      <itemizedlist>
+	<listitem><para><literal>ext</literal> for specifying the file
+	name extension (Ex: <literal>ext:html</literal>)</para>
+	</listitem>
+	<listitem><para><literal>mime</literal> for specifying the
+	mime type. This one is quite special because you can specify
+	several values which will be OR'ed (the normal default for the
+	language is AND). Ex: <literal>mime:text/plain
+	mime:text/html</literal>. Specifying an explicit boolean
+	operator or negation (<literal>-</literal>) before a
+	<literal>mime</literal> specification is not supported and
+	will produce strange results.</para>
+	</listitem>
+      </itemizedlist>
+      <para>The query language is currently the only way to use the
+      &RCL; field search capability.</para>
+
       <para>Words inside phrases and capitalized words are not
-      stem-expanded. Wildcards may be used anywhere.</para>
+      stem-expanded. Wildcards may be used anywhere inside a term.
+      Specifying a wildcard on the left of a term can produce a very
+      slow search.</para>
 
       <para>You can use the <literal>show query</literal> link at the
       top of the result list to check the exact query which was
@@ -2089,36 +2112,91 @@ skippedPaths = ~/somedir/*.txt
 	  will be given a file name as argument and should output the
 	  text contents in html format on the standard output.</para>
 
-	  <para>The html could be very minimal like the following
-	  example:</para>
-	  <programlisting>&lt;html>&lt;head>
-&lt;meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
-&lt/head>
-&lt;body>some text content&lt;/body>&lt;/html>
-          </programlisting>
-
-	  <para>You should take care to escape some characters inside
-	  the text by transforming them into appropriate
-	  entities. "<literal>&amp;</literal>" should be transformed into
-	  "<literal>&amp;amp;</literal>", "<literal>&lt;</literal>"
-	  should be transformed into "<literal>&amp;lt;</literal>".</para>
-
-	  <para>The character set needs to be specified in the
-	  header. It does not need to be UTF-8 (&RCL; will take care
-	  of translating it), but it must be accurate for good
-	  results.</para>
-
-	  <para>&RCL; will also make use of other header fields if
-	  they are present: <literal>title</literal>,
-	  <literal>description</literal>, <literal>keywords</literal>.
-          <para>
-          <para>The easiest way to write a new filter is probably to start
-          from an existing one.</para>
+	  <para>You can find more details about writing a &RCL; filter
+	  in the <link linkend="rcl.extending.filters">section about
+	  writing filters</link>.</para>
 	</sect3>
 
       </sect2>
 
     </sect1>
+
+    <sect1 id="rcl.extending">
+      <title>Extending &RCL;</title>
+      
+      <sect2 id="rcl.extending.filters">
+	<title>Writing a document filter</title>
+
+	<para>&RCL; filters are executable programs which 
+	translate from a specific format (e.g.
+	<application>openoffice</application>,
+	<application>acrobat</application>, etc.) to the &RCL;
+	indexing input format, which was chosen to be HTML.</para>
+
+	<para>&RCL; filters are usually shell scripts, but this is in
+	no way necessary. These programs are extremely simple and most
+	of the difficulty lies in extracting the text from the native
+	format, not outputting what is expected by &RCL;. Happily
+	enough, most document formats already have translators or text
+	extractors which handle the difficult part and can be called
+	from the filter.</para>
+
+	<para>Filters are called with a single argument which is the
+	source file name. They should output the result to stdout.</para>
+
+	<para>The <literal>RECOLL_FILTER_FORPREVIEW</literal>
+	environment variable (values <literal>yes</literal>,
+	<literal>no</literal>) tells the filter if the operation is
+	for indexing or previewing. Some filters use this to output a
+	slightly different format. This is not essential.</para>
+
+	<para>The output HTML could be very minimal like the following
+	example:</para>
+
+	<programlisting>&lt;html>&lt;head>
+&lt;meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
+&lt;/head>
+&lt;body>some text content&lt;/body>&lt;/html>
+          </programlisting>
+
+	<para>You should take care to escape some characters inside
+	  the text by transforming them into appropriate
+	  entities. "<literal>&amp;</literal>" should be transformed into
+	  "<literal>&amp;amp;</literal>", "<literal>&lt;</literal>"
+	  should be transformed into "<literal>&amp;lt;</literal>".</para>
+
+	<para>The character set needs to be specified in the
+	  header. It does not need to be UTF-8 (&RCL; will take care
+	  of translating it), but it must be accurate for good
+	  results.</para>
+
+	<para>&RCL; will also make use of other header fields if
+	  they are present: <literal>title</literal>,
+	  <literal>description</literal>,
+	  <literal>keywords</literal>.</para>
+
+	<para>As of &RCL; release 1.9, filters can also
+	"invent" field names. These should be output as
+	meta tags:</para>
+
+	<programlisting>
+&lt;meta name="somefield" content="Some textual data" /&gt;
+</programlisting>
+	
+	<para>In this case, a correspondence between field name and
+	&XAP; prefix should also be added to the
+	<filename>mimeconf</filename> file. See the existing entries
+	for inspiration. The field can then be used inside the query
+	language to narrow searches.</para>
+
+	<para>The easiest way to write a new filter is probably to start
+          from an existing one.</para>
+
+	
+      </sect2>
+
+    </sect1>
+
   </chapter>
 
 </book>
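
Reviewer note on the new "Writing a document filter" section. The steps it
describes (one file name argument, HTML on standard output, escaped text, a
charset declared in the header, optional invented fields as meta tags) can
be sketched as below. This is only an illustration in Python, not code from
the Recoll tree; the manual itself says filters are usually shell scripts.
The names to_filter_html and run_filter, and the field "somefield", are
assumptions made for the example.

```python
import html
import sys


def to_filter_html(text, charset="UTF-8", fields=None):
    """Wrap plain text in the minimal HTML layout shown in the manual.

    `fields` maps invented field names to values (hypothetical example
    data); each one is emitted as a <meta name=... content=...> tag.
    """
    # Escape "&" and "<" in the body, as the manual asks (">" is also
    # escaped by html.escape, which is harmless).
    body = html.escape(text, quote=False)
    metas = [
        '<meta http-equiv="Content-Type" content="text/html;charset=%s">'
        % charset
    ]
    for name, value in (fields or {}).items():
        # Attribute values need quotes escaped too.
        metas.append(
            '<meta name="%s" content="%s" />'
            % (html.escape(name, quote=True), html.escape(value, quote=True))
        )
    return "<html><head>\n%s\n</head>\n<body>%s</body></html>\n" % (
        "\n".join(metas),
        body,
    )


def run_filter(path, out=None):
    """Read a text file and write the filter output to `out` (stdout by default)."""
    out = sys.stdout if out is None else out
    # Assumption for this sketch: the input is UTF-8 text; a real filter
    # would call a format-specific extractor here instead.
    with open(path, encoding="utf-8", errors="replace") as f:
        text = f.read()
    out.write(to_filter_html(text, fields={"somefield": "Some textual data"}))
```

A real filter script would end by calling run_filter(sys.argv[1]). As the
manual text says, an invented field such as "somefield" also needs a matching
field name to prefix entry in mimeconf before it can be searched.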
diff --git a/website/BUGS.txt b/website/BUGS.txt
index 68274c66..e63e47b6 100644
--- a/website/BUGS.txt
+++ b/website/BUGS.txt
@@ -4,10 +4,21 @@ Bugs that are listed in an older version section are supposedly fixed in
 later versions. Bugs listed in the topmost section may also exist in older
 versions. 
 
-Latest (1.8.1):
+Latest (1.8.2):
+- There are a few problems in the qt4 version of recoll: some accelerators
+  (esc-spc, ctl-arrow) do not work, neither do copy/paste between the
+  result list and preview windows and x11 applications.
 - The dates shown for email attachments in a result list are the email
   folder modification date. This should be inherited from the parent
   message instead.
+- There are sometimes problems with document deletions: the index can
+  get in a state where deleted or moved documents are not purged from the
+  index (the log file says that the docs are deleted, but they aren't
+  actually). When this happens, the only solution currently is to reindex
+  from scratch (recollindex -z). This is due to a xapian bug, which will be
+  fixed in a future release. You can apply the following patch to xapian
+  1.0.1 to fix it:
+      http://www.lesbonscomptes.com/recoll/xapian/xapian-delete-document.patch 
 - NEAR crashes: 1.6 has added NEAR searches. Unlike what recoll did
   with PHRASES, stemming expansion is performed on terms inside NEAR
   clauses (except if prevented by a capitalized entry of course). There is
@@ -39,9 +50,9 @@ Latest (1.8.1):
   compressed (ie: xxx.txt.gz), recoll will try to start the external viewer
   on the compressed file, which will not work in most cases.
 
-- There are problems which have been reported indexing big mailstores
-  (several hundreds of thousands of messages): resulting in a very big
-  database and even crashes during indexation.
+- Problems have been reported when indexing big mailstores (several
+  hundred thousand messages), resulting in a very big database and even
+  crashes.
 
 - Under some versions of KDE (ie: Fedora FC5 KDE 3.5.4-0.5.fc5), there is a
   problem with the window stacking order. Opening the "browse" file
diff --git a/website/CHANGES.txt b/website/CHANGES.txt
index 120bafa8..67be7bab 100644
--- a/website/CHANGES.txt
+++ b/website/CHANGES.txt
@@ -1,5 +1,31 @@
 CHANGES 
 
+1.9.0
+- Add option to remember sort tool state between program invocations (it is
+  reset to inactive by default)
+- Improve qt4 build: no more need for --enable-qt4
+- Fixed a number of qt4 glitches: selection and keyboard shortcuts.
+- When searching for an empty string inside the preview window, position
+  the window to the next occurrence of the primary search terms.
+- Have email attachments inherit date and author from their parent message
+- Added an adjustable flush threshold during indexing: should help control
+  memory usage. See the idxflushmb configuration parameter.
+- Added a check for file system free space. Indexing will stop if the
+  threshold is reached. See the maxfsoccuppc configuration parameter.
+- Fix bus error on rclmon exit
+- Better handle aspell errors inside rclmon
+- Added File menu entry to erase document history.
+- Added ext: and mime: selectors to the query language.
+- Added support for arbitrary fields. Filters can now produce any number of
+  fields which will be selectively searchable through the query language.
+- Added abiword and kword support. 
+- Contributed filter: rcljpeg. This should be extended to use the new field
+  support.
+- Changed the icon to an ugly one. The previous one was nicer but looked
+  too much like Xapian's.
+- Added some kind of support for a stopword list.
+- Bound space and backspace to PgUp/PgDown in preview.
+
 1.8.2 2007-05-19
 - Fixed method name for compatibility with xapian 1.0.0
 - Add .beagle to default list of skipped names (avoids indexing beagle
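
Reviewer note on the 1.9.0 entry about the file system free space check.
The behaviour it describes (stop indexing once a disk occupancy threshold
is reached) can be sketched as below. This is an illustration in Python,
not Recoll's implementation, which this diff does not show. Reading
maxfsoccuppc as a maximum occupancy percentage is an assumption based on
the parameter name, and the function names are made up for the example.

```python
import shutil


def occupancy_percent(total_bytes, used_bytes):
    """Return used space as a percentage of the total size."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def should_stop_indexing(path, max_occupancy_pc):
    """True when the file system holding `path` has reached the threshold.

    `max_occupancy_pc` plays the role assumed here for maxfsoccuppc.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free
    return occupancy_percent(usage.total, usage.used) >= max_occupancy_pc
```

An indexer loop could call should_stop_indexing on the index directory
before each flush and stop cleanly when it returns True.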
diff --git a/website/credits.html b/website/credits.html
index 6b50ad28..fbdc469f 100644
--- a/website/credits.html
+++ b/website/credits.html
@@ -38,7 +38,7 @@
       <p>First of all, many thanks to the users who provided criticism
 	and ideas to make <span class="application">Recoll</span> go
 	forward ! Please 
-	<a href="mailto:jean-francois.dockes@wanadoo.fr>
+	<a href="mailto:jean-francois.dockes@wanadoo.fr">
 	  contact me</a> if you have something to suggest.</p>
 
       <p><span class="application">Recoll</span> borrows
diff --git a/website/doc.html b/website/doc.html
index a9dc810e..7897446c 100644
--- a/website/doc.html
+++ b/website/doc.html
@@ -30,16 +30,24 @@
     
     <div class="content">
 
-      <h1>Recoll user manuals</h1>
+      <h1>Recoll user manual</h1>
       
-      <blockquote>
       <ul>
       <li><a href="usermanual/index.html">English</a></li>
       <li><a href="http://mcz.altervista.org/Pagine/usermanual-italian.html">
 	  Italian</a></li>
       </ul>
-      </blockquote>
 
+      <p><br></p>
+
+      <h1>Other documentation</h1>
+
+      <ul>
+      <li><a href="perfs.html">Index size and indexing performance
+	      data.</a></li> 
+      </ul>
+
+      
     </div>
   </body>
 </html>
diff --git a/website/download.html b/website/download.html
index a84bc0a9..4985a0d9 100644
--- a/website/download.html
+++ b/website/download.html
@@ -24,7 +24,7 @@
       <ul>
 	<li><a href="index.html">Home</a></li>
 	<li><b>Downloads</b></li>
-	<li><a href="usermanual/index.html">User manual</a></li>
+	<li><a href="doc.html">Documentation</a></li>
 	<li><a href="usermanual/rcl.install.html">Installation</a></li>
 	<li><a href="index.html#support">Support</a></li>
       </ul>
@@ -47,6 +47,8 @@
       </table>
       </p>
 
+      <h2><a name="source">General information</a></h2>
+
       <p>You will probably need to have a look at the
 	<a href="usermanual/rcl.install.html">installation manual</a> for
 	building and/or installation instructions.</p>
@@ -68,12 +70,17 @@
 	<a href="usermanual/index.html#RCL.INSTALL.EXTERNAL">list</a> to
 	decide what you may want to install.</p>
 
+      <p>In addition, optional functionality in Recoll (the term explorer
+	tool in phonetic mode) uses the <b>aspell</b> package. The
+	installed version should be at least 0.60 (utf-8 support) for
+	this to run smoothly. This function is far from essential.</p>
+
       <p>If you find problems with the package or its
 	installation, <em>please</em> 
 	<a href="mailto:jean-francois.dockes@wanadoo.fr">
 	  report them</a>.</p>
 
-      <h4>What do the release numbers mean?</h4>
+      <h3>What do the release numbers mean?</h3>
 
       <p>The Recoll releases are numbered X.Y.Z. </p>
 
@@ -110,7 +117,16 @@
 	1.8.2 was released purely for fixing a small issue of
 	compatibility with xapian 1.0.0 and small config/install
 	glitches.  There is no functional reason to upgrade from
-	1.8.1, (or update packages).
+	1.8.1, (or update packages).</p>
+
+      <p>Recoll 1.8.2 is the first release that will let you take
+	advantage of the new Xapian 1.0, the main user-visible change
+	of which is the new default index format. In order to take
+	advantage of the new format (which is not mandatory) Recoll
+	users updating from an older release need to delete their old
+	index. There are <a
+	href="usermanual/usermanual.html#RCL.INDEXING.STORAGE.FORMAT">more
+	details in the user manual</a>.</p>
 
       <p>Older recoll releases:
 	<a href="recoll-1.8.1.tar.gz">1.8.1</a>
@@ -128,8 +144,8 @@
       <h2><a name="rpms">Packages</a></h2>
 
       <p>The executables inside the binary rpms have a static link to
-	xapian, there is no dependency except Qt 3.3. Of course you need
-	xapian-core installed to use the source rpm. </p>
+	xapian 0.9.x, there is no dependency except Qt 3.3. Of course
+	you need xapian-core installed to use the source rpm. </p>
 
       <p><b>Fedora Core</b>
 	FC6 RPM: 
@@ -168,10 +184,16 @@
 	<a href="debian/edgy/">debian/edgy</a>
       </p>
 
+      <p><b>Ubuntu 6.06 dapper</b> (the feisty version does not work
+      on dapper). This is statically linked with xapian 0.9.10:
+	<a href="debian/dapper/recoll_1.8.2-0ubuntu1_i386.deb">
+	  recoll_1.8.2-0ubuntu1_i386.deb</a> </p>
+
       <p><b>Debian unstable</b> Recoll is in the package repository,
-      you can install it with the usual <em>apt-get install
-      recoll</em>. <a
-      href="http://packages.qa.debian.org/r/recoll.html">Package page</a></p>
+	you can install it with the usual <em>apt-get install
+	  recoll</em>. <a
+	  href="http://packages.qa.debian.org/r/recoll.html">
+	  Package page</a></p>
 
       <p><b>Debian 3.1</b> Thanks to Mario (<img align="top" src="mario.png">)
       for these: i386: 
diff --git a/website/features.html b/website/features.html
index 7cb84ff5..e04201d9 100644
--- a/website/features.html
+++ b/website/features.html
@@ -142,6 +142,7 @@
 	</dd>
       </ul>
 
+
       <h2><a name="#stemming"></a>Stemming</h2>
 
       <p>Stemming is a process which transforms inflected words into
diff --git a/website/fr/features.html b/website/fr/features.html
new file mode 100644
index 00000000..7f70d4a1
--- /dev/null
+++ b/website/fr/features.html
@@ -0,0 +1,205 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+
+<html>
+  <head>
+    <title>RECOLL: un outil personnel de recherche textuelle pour 
+    Unix et Linux</title>
+    <meta name="generator" content="HTML Tidy, see www.w3.org">
+    <meta name="Author" content="Jean-Francois Dockes">
+    <meta name="Description" content=
+    "recoll est un logiciel personnel de recherche textuelle pour unix et linux basé sur Xapian, un moteur d'indexation puissant et mature.">
+    <meta name="Keywords" content=
+      "recherche textuelle,desktop,unix,linux,solaris,open source,free">
+    <meta http-equiv="Content-language" content="fr">
+    <meta http-equiv="content-type" content=
+    "text/html; charset=iso-8859-1">
+    <meta name="robots" content="All,Index,Follow">
+    <link type="text/css" rel="stylesheet" href="../styles/style.css">
+  </head>
+
+  <body>
+
+    <div class="rightlinks">
+      <ul>
+	<li><a href="../index.html">Base</a></li>
+	<li><a href="../pics/index.html">Copies d'écrans</a></li>
+	<li><a href="../download.html">Téléchargements</a></li>
+	<li><a href="../manuals.html">Documentation</a></li>
+	<li><a href="../index.html#support">Support</a></li>
+	<li><a href="../devel.html">Développement</a></li>
+      </ul>
+    </div>
+
+    <div class="content">
+
+      <h1 class="intro">Caractéristiques de Recoll</h1>
+
+      <dl>
+	<dt><a name="systems">Systèmes</a></dt>
+	<dd><span class="application">Recoll</span> a été compilé et
+	testé sur FreeBSD, Linux, Darwin, Solaris (versions
+	  FreeBSD 5.5, Fedora Core 5, Suse 10.1, Gentoo,
+	  Debian 3.1, Ubuntu Edgy, Solaris 8/9, mais d'autres versions
+	  récentes conviennent sans doute également).</dd>
+
+	<dd>Versions de QT: 3.2, 3.3 et 4.2</dd>
+
+        <dt><a name="doctypes">Types de documents</a></dt>
+	<dd>Recoll peut traiter les types de documents suivants, ainsi
+	que des fichiers compressés du même type: 
+
+          <dl>
+            <dt>En interne</dt>
+
+            <dd>
+              <ul>
+                <li><var class="literal">text</var>.</li>
+
+                <li><var class="literal">html</var>.</li>
+
+                <li><span class="application">OpenOffice</span>
+                (avec l'aide de la commande <b>unzip</b>).</li>
+
+                <li><var class="literal">maildir</var> et <var
+		    class="literal">mailbox</var> (<span class=
+		    "application">Mozilla</span>, <span class=
+		    "application">Thunderbird</span>, <span class=
+		    "application">Evolution</span> et sans doute
+		    d'autres).</li> 
+
+                <li>Fichiers de conversation <span class="application">
+		    gaim</span>.</li>
+
+                <li><span class="application">Scribus</span>.</li>
+
+              </ul>
+            </dd>
+
+            <dt>Avec des programmes externes</dt>
+
+            <dd>
+              <ul>
+                <li><var class="literal">pdf</var> avec <a href=
+                "http://www.foolabs.com/xpdf/">xpdf</a>.</li>
+
+                <li><var class="literal">postscript</var> avec 
+           <a href="http://www.gnu.org/software/ghostscript/ghostscript.html">
+                ghostscript</a> et 
+           <a href="http://www.cs.wisc.edu/~ghost/doc/pstotext.htm">
+		    pstotext</a>.</li>
+
+                <li>Fichiers <span class="application">Lyx</span>
+                (nécessite l'application 
+		  <span class="application">Lyx</span>).</li>
+
+                <li><span class="application">msword</span> avec <a href=
+                "http://www.winfield.demon.nl/">antiword</a>.</li>
+
+                <li><span class="application">Powerpoint</span> et 
+		  <span class="application">Excel</span> avec les utilitaires
+		  <a href="http://www.45.free.net/~vitus/software/catdoc/">
+		    catdoc</a>.</li>
+
+                <li><var class="literal">rtf</var> avec <a href=
+                "http://www.gnu.org/software/unrtf/unrtf.html">unrtf</a>.</li>
+
+		<li><var class="literal">dvi</var> avec 
+		  <a href="http://www.radicaleye.com/dvips.html">dvips</a>.
+		</li>
+
+		<li><var class="literal">djvu</var> avec 
+		  <a href="http://djvulibre.djvuzone.org/doc/index.html">
+		    DjVuLibre</a>. </li>
+
+		<li>Tags <var class="literal">mp3</var> avec 
+		  <a href="http://id3lib.sourceforge.net/">
+		    id3info (id3lib)</a>. </li>
+
+              </ul>
+            </dd>
+          </dl>
+	</dd>
+
+	<dt>Autres caractéristiques</dt>
+	<dd>
+	  <ul>
+	    <li>Index multiples interrogeables ensemble ou séparément.</li>
+
+	    <li>Fonctions de recherche puissantes, avec expressions
+	    booléennes, phrases et proximité, caractères jokers,
+	    filtrage sur les types de fichiers ou l'emplacement.</li>
+
+	    <li>Fonction spécifique de recherche de noms de fichiers.</li>
+
+	    <li>Support de jeux de caractères multiples. Les traitements
+	      internes et l'index utilisent l'encodage Unicode UTF-8.</li>
+
+	    <li>L'extraction des racines de mots <a href="#Stemming">
+		Stemming</a> est effectuée au moment de la recherche
+		(permet de changer de langue après l'indexation).</li>
+
+	    <li>Installation facile. Pas de processus permanent, de
+	      serveur web ou environnement exotique.</li>
+
+	    <li>Un indexeur qui peut fonctionner soit comme un
+	      processus léger dans l'interface de consultation, comme un
+	      programme batch externe intégrable par 
+	      <span class="application">cron</span>, ou comme un processus
+	      permanent pour l'indexation au fil de l'eau.</li>
+
+	  </ul>
+	</dd>
+      </dl>
+
+      <h2><a name="#stemming"></a>Lemmatisation</h2>
+
+      <p><em>Note: je serais preneur d'une traduction française
+	agréable pour "stemming".</em></p>
+      <p>La lemmatisation transforme un mot dérivé vers sa racine.
+       Par exemple, <i>aimer</i>, <i>aimerai</i>, <i>aimait</i>,
+	<i>aimez</i> etc. seraient transformés en <i>aim</i> en
+	français. Une recherche de l'un quelconque des dérivés peut
+	automatiquement être étendue vers tous les autres</p>
+
+      <p>Certains moteurs de recherche appliquent la transformation
+      pendant l'indexation. L'index ne stocke que les racines des
+      mots, avec des exceptions pour les termes qui sont reconnus
+      comme des noms propres (capitalisation). Au moment de la
+      recherche, les termes de la requête sont également transformés
+      avant comparaison à l'index.</p>
+      
+      <p>Cette approche permet un index plus petit, mais elle perd
+	irrévocablement de l'information pendant l'indexation.</p>
+
+      <p>Recoll fonctionne différemment. Les termes sont indexés sans
+	transformation. L'index résultant est plus gros, ce qui n'a
+	probablement pas beaucoup d'importance à une époque de disques
+	de 100 Go principalement remplis d'information multimédia
+	<em>non indexée</em>.
+
+      <p>À la fin de l'indexation, Recoll construit un ou plusieurs
+      dictionnaires de transformation (pour différents langages), où
+      toutes les racines sont listées avec leurs transformations
+      possibles.</p>
+
+
+      <p>Au moment de la recherche, par défaut, les termes de
+      l'utilisateur sont transformés, et étendus aux dérivés par
+      utilisation du dictionnaire.
+	Les résultats obtenus sont analogues à ceux de
+	l'autre méthode. L'avantage est que l'expansion peut être
+	contrôlée au moment de la recherche:
+	<ul>
+	<li>On peut la supprimer pour n'importe quel terme de la
+	  requête, (en le faisant débuter par une capitale:
+	  <em>Aime</em> par exemple pour chercher la ville d'Aime la
+	  Plagne). </li>
+	<li>Le langage de transformation peut également être changé,
+	en supposant que plusieurs dictionnaires de transformation
+	aient été construits lors de l'indexation.</li>
+      </ul>
+	
+    </div>
+  </body>
+</html>
+
diff --git a/website/index.html.en b/website/index.html.en
index 1e28084b..74c5db76 100644
--- a/website/index.html.en
+++ b/website/index.html.en
@@ -81,6 +81,16 @@
 	<li><a class="weak" href="features.html">(more detail)</a></li>
       </ul>
 
+
+      <h2>News: </h2>
+      <p>There are new filters for 
+	<span class="application">kword</span> and 
+	<span class="application">abiword</span> files in the 
+	<a href="filters/filters.html">new filters section</a>. These
+	are usable with an existing <span
+	class="application">Recoll</span> 1.8 installation.</p>
+
+	
       <h2><a name="support">Support</a></h3>
 
       <p>If you have any problem with Recoll, its
diff --git a/website/index.html.fr b/website/index.html.fr
index 601b6c6d..9451eeaf 100644
--- a/website/index.html.fr
+++ b/website/index.html.fr
@@ -97,6 +97,15 @@
 
       </ul>
 
+      <h2>Nouvelles: </h2>
+      <p>Il y a de nouveaux filtres d'indexation pour les fichiers
+	<span class="application">kword</span> et 
+	<span class="application">abiword</span>. Ils sont téléchargeables
+	dans la   <a href="filters/filters.html">zone des nouveaux
+	filtres</a>, et sont utilisables avec une installation existante de 
+	<span class="application">Recoll</span> 1.8.</p>
+
+
       <h2><a name="support">Support</a></h3>
 
       <p>Si vous avez un problème quelconque avec le logiciel ou son
diff --git a/website/mario.png b/website/mario.png
new file mode 100644
index 00000000..773946b0
Binary files /dev/null and b/website/mario.png differ
diff --git a/website/perfs.html b/website/perfs.html
new file mode 100644
index 00000000..bfd8ed70
--- /dev/null
+++ b/website/perfs.html
@@ -0,0 +1,114 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
+
+<html>
+  <head>
+    <title>RECOLL: a personal text search system for
+    Unix/Linux</title>
+    <meta name="generator" content="HTML Tidy, see www.w3.org">
+    <meta name="Author" content="Jean-Francois Dockes">
+    <meta name="Description" content=
+    "recoll is a simple full-text search system for unix and linux based on the powerful and mature xapian engine">
+    <meta name="Keywords" content=
+      "full text search,fulltext,desktop search,unix,linux,solaris,open source,free">
+    <meta http-equiv="Content-language" content="en">
+    <meta http-equiv="content-type" content=
+    "text/html; charset=iso-8859-1">
+    <meta name="robots" content="All,Index,Follow">
+    <link type="text/css" rel="stylesheet" href="styles/style.css">
+  </head>
+
+  <body>
+
+    <div class="rightlinks">
+      <ul>
+	<li><a href="index.html">Home</a></li>
+	<li><a href="pics/index.html">Screenshots</a></li>
+	<li><a href="download.html">Downloads</a></li>
+	<li><a href="doc.html">Documentation</a></li>
+      </ul>
+    </div>
+
+    <div class="content">
+
+      <h1 class="intro">Recoll: Indexing performance and index sizes</h1>
+
+      <p>The time needed to index a given set of documents and the
+	resulting index size depend on many factors: file size and
+	proportion of actual text content for the index size; CPU
+	speed, available memory, average file size and format for the
+	speed of indexing.</p>
+
+      <p>We try here to give a number of reference points which can
+	be used to roughly estimate the resources needed to create and
+	store an index. Obviously, your data set will never fit one of
+	the samples, so the results cannot be exactly predicted.</p>
+
+      <p>The following data was obtained on a machine with a 1800 MHz
+	AMD Duron CPU, 768 MB of RAM, and a 7200 RPM 160 GB IDE
+	disk, running SuSE 10.1.</p>
+
+      <p><b>recollindex</b> (version 1.8.2 with xapian 1.0.0) is
+	executed with the default flush threshold value.
+	The process memory usage is the one reported by <b>ps</b>.</p>
+
+      <table border=1>
+	<thead>
+	  <tr>
+	    <th>Data</th>
+	    <th>Data size</th>
+	    <th>Indexing time</th>
+	    <th>Index size</th>
+	    <th>Peak process memory usage</th>
+	  </tr>
+	<tbody>
+	  <tr>
+	    <td>Random PDFs harvested on Google</td>
+	    <td>1.7 GB, 3564 files</td>
+	    <td>27 min</td>
+	    <td>230 MB</td>
+	    <td>225 MB</td>
+	  </tr>
+	  <tr>
+	    <td>IETF mailing list archive</td>
+	    <td>211 MB, 44,000 messages</td>
+	    <td>8 min</td>
+	    <td>350 MB</td>
+	    <td>90 MB</td>
+	  </tr>
+	  <tr>
+	    <td>Partial Wikipedia dump</td>
+	    <td>15 GB, one million files</td>
+	    <td>6 h 30 min</td>
+	    <td>10 GB</td>
+	    <td>324 MB</td>
+	  </tr>
+	  <tr>
+	    <!-- DB: ndocs 3564 lastdocid 3564 avglength 6460.71 -->
+	    <td>Random PDFs harvested on Google<br>
+	    Recoll 1.9, <em>idxflushmb</em> set to 10</td>
+	    <td>1.7 GB, 3564 files</td>
+	    <td>25 min</td>
+	    <td>262 MB</td>
+	    <td>65 MB</td>
+	  </tr>
+	</tbody>
+      </table>
+
+      <p>Notice how the index size for the mail archive is bigger than
+	the data size. Myriads of small pure text documents will do
+	this. The expansion factor would of course be even worse
+	with compressed folders (the test was on uncompressed
+	data).</p>
+
+      <p>The last test was performed with Recoll 1.9.0, which has an
+	adjustable flush threshold (<em>idxflushmb</em> parameter), here
+	set to 10 MB. Notice the much lower peak memory usage, with no
+	performance degradation. The resulting index is bigger,
+	though. The exact reason is not known to me; it is possibly
+	due to additional fragmentation.
+      </p>
+
+    </div>
+  </body>
+</html>
+
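
Note on the perfs.html figures above: the remark that the mail archive index is bigger than its data can be checked by dividing index size by data size for each row of the table. The following is only a rough sketch. The sizes are copied from the table, the conversion 1 GB = 1024 MB is an assumption (the page does not say which convention it uses), and the names `SAMPLES`, `expansion_ratio` and `ratios` are made up for this illustration.

```python
# Sizes copied from the table in perfs.html, converted to MB.
# Assumption: 1 GB = 1024 MB (the page does not state the convention).
MB_PER_GB = 1024

SAMPLES = [
    # (label, data size in MB, index size in MB)
    ("Random PDFs (Recoll 1.8.2)", 1.7 * MB_PER_GB, 230),
    ("IETF mailing list archive", 211, 350),
    ("Partial Wikipedia dump", 15 * MB_PER_GB, 10 * MB_PER_GB),
    ("Random PDFs (Recoll 1.9, idxflushmb=10)", 1.7 * MB_PER_GB, 262),
]


def expansion_ratio(data_mb, index_mb):
    """Return the index size divided by the data size."""
    return index_mb / data_mb


def ratios(samples=SAMPLES):
    """Map each sample label to its index/data size ratio."""
    return {label: expansion_ratio(d, i) for label, d, i in samples}


if __name__ == "__main__":
    for label, r in ratios().items():
        print(f"{label}: index is {r:.2f} x the data size")
```

Only the mail archive has a ratio above 1. The PDF sets sit far below it, which is consistent with the page's point that the proportion of actual text content drives the index size.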
diff --git a/website/rclidxfmt.html b/website/rclidxfmt.html
index 41b330de..57ced06a 100644
--- a/website/rclidxfmt.html
+++ b/website/rclidxfmt.html
@@ -2,72 +2,146 @@
 <html>
   <head>
     <title>Recoll Index format</title>
+    <meta name="generator" content="HTML Tidy, see www.w3.org">
+    <meta name="Author" content="Jean-Francois Dockes">
+    <meta name="Description" content=
+    "recoll is a personal text search tool for unix and linux based on Xapian, a powerful and mature indexing engine.">
+    <meta name="Keywords" content=
+      "text search,desktop,unix,linux,solaris,open source,free">
+    <meta http-equiv="Content-language" content="en">
+    <meta http-equiv="content-type" content=
+    "text/html; charset=iso-8859-1">
+    <meta name="robots" content="All,Index,Follow">
+    <link type="text/css" rel="stylesheet" href="styles/style.css">
   </head>
 
   <body>
+    <div class="content">
     <h1>Recoll index format details</h1>
 
-    <p>Terms are not stemmed before being stored. They are turned to
-      all minuscule letters with no accents.</p>
+    <p>A comparison of index formats for recoll 1.8 and omega
+    1.0.1</p>
 
-    <p>Special prefixed terms:</p>
-    <ul>
-      <li>Ddate: modification date of file, like YYYYMMDD</li>
+    <p>Recoll terms are not stemmed before being stored. They are
+      converted to lowercase and stripped of accents. An auxiliary
+      database handles stem expansion. Omega stores both raw
+      terms and stemmed versions (with prefix Z).</p>
 
-      <li>Mmonth: YYYYMM</li>
+    <h2>Special prefixed terms:</h2>
 
-      <li>Ppathhash truncated/hashed version of file path. For
+    <p>A comparison of prefixed term usage between Recoll and
+      omega/xapian. <em>xapian-core</em> in the Omega column means
+      that the prefix is not used by Omega, but is mentioned as
+      allocated in the xapian prefix definition document.</p>
+
+    <table border=1 cellspacing=0 width="90%">
+	<thead>
+	<tr><th>Pref.</th><th>Recoll use</th><th>Omega use</th>
+	</tr>
+      </thead>
+      <tbody>
+	<tr><td>T</td><td>mime type</td><td>Same</td>
+	</tr>
+
+	<tr><td>P</td><td>Truncated/hashed version of file path. For
 	single-document files, and for the file part of a
 	multi-document file. Used for up-to-date checks and for
-	retrieving a document by path. omega uses U for the equivalent
-	term used for up to date checks.</li>
+	retrieving a document by path. </td><td>Path part of URL (no
+	hashing). Uses U for the equivalent
+	term used for up to date checks.</td> 
+	</tr>
 
-      <li>Qpathhash+ipath same + internal path for documents inside
-	multi-document files. Used to set the existence flag for
-	subdocs when a multi-document file is found to be up to date,
-	or for deleting all subdocs for a file, or for retrieving a
-	document by path+ipath. No real omega equivalent. Compatible
-	with Q definition in termprefixes.txt: unique identifier.</li>
+	<tr><td>Q</td><td>pathhash+ipath: same as P, plus the internal
+	path, for documents inside multi-document files. Used to set the
+	existence flag for subdocs when a multi-document file is found
+	to be up to date, or for deleting all subdocs for a file, or
+	for retrieving a document by path+ipath. Compatible
+	with Q definition in xapian/termprefixes.txt: unique
+	identifier.</td><td>None</td> 
+	</tr>
 
-      <li>Tmimetype: document mime type.</li>
+	<tr><td>D</td><td>date: modification date of file, like
+	YYYYMMDD</td><td>Same</td>
+	</tr>
 
-      <li>Wweak: 10 days period (not used any more by omega)</li>
+	<tr><td>M</td><td>month: YYYYMM</td><td>Same</td>
+	</tr>
+	<tr><td>Y</td><td>year: YYYY</td><td>Same</td>
+	</tr>
 
-      <li>Yyear YYYY</li>
+	<tr><td>XSFN</td><td>utf8 version of file name. Used for specific
+	file name searches</td><td>None</td>
+	</tr>
+	<tr><td>U</td><td>None</td><td>Url term. Truncated/hashed version
+	    of URL. Used for duplicate checks.</td>
+	</tr>
 
-      <li>XSFNfilename utf8 version of file name. Used for specific
-	file name searches</li>
+	<tr><td>S</td><td>Subject/title</td><td>xapian-core</td>
+	</tr>
+	<tr><td>A</td><td>Author</td><td>xapian-core</td>
+	</tr>
+	<tr><td>K</td><td>Keyword</td><td>xapian-core</td>
+	</tr>
+	
+      </tbody>
+    </table>
 
-    </ul>
-
-    <p>Omega prefixes with no equivalents in Recoll: P, R, U</p>
     <p>None of the "date" terms are currently used by recoll queries</p>
 
-    <p>Values: Recoll currently stores no document values.</p>
+    <h2>Values</h2>
+    <p>Recoll currently stores no document values.</p>
+    <p>Omega stores 2 values: the md5 hash of the file, and the
+      last modification date (as unix time). The md5 value does not
+      appear to be currently used.</p>
 
-    <p>Document data record format<p>
-    <ul>
-      <li>url= Full url. Always file://abspath. The path is not
+    <h2>Document data record format</h2>
+      <p>Recoll uses the same line-based, prefixed data record format
+      as omega (name=value\n).</p>
+
+    <table border=1 cellspacing=0 width="90%">
+	<thead>
+	<tr><th>Prefix</th><th>Recoll use</th><th>Omega use</th>
+	</tr>
+      </thead>
+      <tbody>
+	
+      <tr><td>url=</td><td>Full url. Always file://abspath. The path is not
 	encoded to utf-8, this is the system file name ,usable as an
-	argument to open(). (omega: sort of same)</li>
-      <li>mtype= mime type (omega: type)</li>
-      <li>fmtime= file modification date (omega: modtime)</li>
-      <li>dmtime= document modification date (omega: none)</li>
-      <li>origcharset= character set the text was converted from
-	(omega: none)</li>
-      <li>fbytes= file size in bytes (omega: size)</li>
-      <li>dbytes= document size in bytes (omega: none)</li>
-      <li>ipath= internal path for docs in multidoc files. (omega: none)</li>
-      <li>caption= title of document, utf8 (omega: same)</li>
-      <li>keywords= key words, utf8 (omega: none)</li>
-      <li>abstract= document abstract, utf8 (omega: sample)</li>
-    </ul>
+	argument to open()</td><td>Same</td>
+	</tr>
+
+	<tr><td>mtype=</td><td>mime type</td><td>type=</td>
+	</tr>
+	<tr><td>fmtime=</td><td>file modification date</td><td>modtime=</td>
+	</tr>
+	<tr><td>dmtime=</td><td> document modification date</td><td>None</td>
+	</tr>
+	<tr><td>origcharset=</td><td> character set the text was
+	    converted from</td><td>None</td>
+	</tr>
+	<tr><td>fbytes=</td><td> file size in bytes</td><td>size=</td>
+	</tr>
+	<tr><td>dbytes=</td><td>document size in bytes</td><td>None</td>
+	</tr>
+	<tr><td>ipath=</td><td>internal path for docs in multidoc
+	    files</td><td>None</td>
+	</tr>
+
+	<tr><td>caption=</td><td>title of document, utf8</td><td>Same</td>
+	</tr>
+	<tr><td>keywords=</td><td>key words, utf8</td><td>None</td>
+	</tr>
+	<tr><td>abstract=</td><td>document abstract, utf8</td><td>sample=</td>
+	</tr>
+      </tbody>
+    </table>
+    </div>
 
     <hr>
     <address><a href="mailto:jean-francois.dockes@wanadoo.fr">Jean-Francois Dockes</a></address>
 <!-- Created: Thu Dec  7 13:07:40 CET 2006 -->
 <!-- hhmts start -->
-Last modified: Thu Dec  7 14:19:02 CET 2006
+Last modified: Thu Jun 14 11:14:38 CEST 2007
 <!-- hhmts end -->
   </body>
 </html>
diff --git a/website/smile.png b/website/smile.png
new file mode 100644
index 00000000..49d678dd
Binary files /dev/null and b/website/smile.png differ
diff --git a/website/styles/style.css b/website/styles/style.css
index f4a7918c..8fd1315a 100644
--- a/website/styles/style.css
+++ b/website/styles/style.css
@@ -92,3 +92,4 @@ a.weak {
     color: #aaaaaa;
 }
 
+table { empty-cells:show; }
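
Note on the prefixed terms table in rclidxfmt.html above: the D, M, Y and T rows can be illustrated by building the terms for one document. This is a minimal sketch, not Recoll code. It assumes a term is simply the prefix followed by the value, as the earlier list form of the page ("Ddate", "Mmonth", "Tmimetype") suggests. The function name `prefixed_terms` and the sample values are hypothetical.

```python
from datetime import date


def prefixed_terms(mod_date, mime_type):
    """Build the date and mime type terms for one document.

    Assumption: each term is the prefix letter directly followed by
    the value (D + YYYYMMDD, M + YYYYMM, Y + YYYY, T + mime type).
    """
    return [
        "D" + mod_date.strftime("%Y%m%d"),  # modification date
        "M" + mod_date.strftime("%Y%m"),    # month
        "Y" + mod_date.strftime("%Y"),      # year
        "T" + mime_type,                    # mime type
    ]


if __name__ == "__main__":
    # Hypothetical document, last modified on 14 June 2007.
    for term in prefixed_terms(date(2007, 6, 14), "text/html"):
        print(term)
```

The function takes a calendar date rather than a unix timestamp so that the sketch does not have to guess whether local time or UTC is used when the date terms are computed.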

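Note on the document data record format in rclidxfmt.html above: a name=value\n record can be split into fields, and the Recoll field names mapped to the Omega names listed in the second table. This is a sketch under stated assumptions, not the Recoll or Omega implementation. It assumes one field per line and splits on the first "=" only. The names `parse_record`, `to_omega_names` and `RECOLL_TO_OMEGA`, and the sample record, are hypothetical.

```python
# Field-name correspondence taken from the data record table.
# Recoll fields with "None" in the Omega column are left out.
RECOLL_TO_OMEGA = {
    "url": "url",
    "mtype": "type",
    "fmtime": "modtime",
    "fbytes": "size",
    "caption": "caption",
    "abstract": "sample",
}


def parse_record(data):
    """Parse a name=value record (one field per line) into a dict."""
    fields = {}
    for line in data.split("\n"):
        name, sep, value = line.partition("=")
        if not sep:
            # Empty line or no "=": skipped (a choice made for this sketch).
            continue
        fields[name] = value
    return fields


def to_omega_names(fields):
    """Rename Recoll fields to their Omega equivalents, dropping the rest."""
    return {RECOLL_TO_OMEGA[k]: v for k, v in fields.items()
            if k in RECOLL_TO_OMEGA}


if __name__ == "__main__":
    # Hypothetical record, for illustration only.
    sample = "url=file:///home/me/doc.txt\nmtype=text/plain\nfbytes=1234\n"
    print(to_omega_names(parse_record(sample)))
```

Splitting on the first "=" only keeps values that themselves contain "=" intact, which matters for the url field since file names are stored unencoded.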