Jean-Francois Dockes 2010-10-19 15:57:36 +02:00
parent fe108af875
commit 9d89fc2061
6 changed files with 342 additions and 356 deletions


@@ -1,7 +1,8 @@
<!DOCTYPE BOOK PUBLIC "-//FreeBSD//DTD DocBook V4.1-Based Extension//EN" [ <!DOCTYPE BOOK PUBLIC "-//FreeBSD//DTD DocBook V4.1-Based Extension//EN" [
<!ENTITY RCL "<application>Recoll</application>"> <!ENTITY RCL "<application>Recoll</application>">
<!ENTITY RCLVERSION "1.12-1.13"> <!ENTITY RCLAPPS "<ulink url='http://www.recoll.org/features.html'>Recoll helper applications page</ulink>">
<!ENTITY RCLVERSION "1.14">
<!ENTITY XAP "<application>Xapian</application>"> <!ENTITY XAP "<application>Xapian</application>">
]> ]>
@@ -2630,128 +2631,109 @@ while query.next >= 0 and query.next < nres:
<command>iconv</command> command, which is not always listed as a <command>iconv</command> command, which is not always listed as a
dependancy.</para> dependancy.</para>
<para>As of &RCL; release 1.14, a number of XML-based formats that <para>Please note that, due to the relatively dynamic nature of this
were handled by ad hoc filter code now use information, the most up to date version is now kept on the &RCLAPPS;
<command>xsltproc</command>, which usually comes with along with links to the home pages or best source/patches download
<ulink links. The list below is not updated often and may be quite
url="http://xmlsoft.org/XSLT/index.html">libxslt</ulink>. These stale.</para>
are: abiword, fb2 (ebooks), kword, openoffice, svg.</para>
<para>For many Linux distributions, most of the commands listed can
be installed from the package repositories. However, the packages
are sometimes outdated, or not the best version for &RCL;, so you
should take a look at the &RCLAPPS; if a file
type is important to you.</para>
<para>As of &RCL; release 1.14, a number of XML-based formats that
were handled by ad hoc filter code now use the
<command>xsltproc</command> command, which usually comes with
<application>libxslt</application>. These are: abiword, fb2
(ebooks), kword, openoffice, svg.</para>
<para>Now for the list:</para>
<itemizedlist> <itemizedlist>
<listitem><para>Openoffice: supported natively, but needs the <listitem><para>Openoffice files need <command>unzip</command> and
<command>unzip</command> command to be installed.</para> <command>xsltproc</command>.</para></listitem>
<listitem><para>PDF files need <command>pdftotext</command> which
is part of the <application>Xpdf</application> or
<application>Poppler</application> packages.</para></listitem>
<listitem><para>Postscript files need <command>pstotext</command>.
The original version has an issue with shell
characters in file names, which is corrected in recent
packages. See the &RCLAPPS; for more detail.</para>
</listitem> </listitem>
<listitem><para>PDF: pdftotext is part of the <ulink <listitem><para>MS Word needs
url="http://www.foolabs.com/xpdf/">Xpdf</ulink> or <ulink <command>antiword</command>. It is also useful to have
url="http://poppler.freedesktop.org/">Poppler</ulink> packages.</para> <command>wvWare</command> installed as it may be
used as a fallback for some files which
<command>antiword</command> does not handle.</para></listitem>
<listitem><para>MS Excel and PowerPoint need
<command>catdoc</command>.</para></listitem>
<listitem><para>MS Open XML (docx) needs
<command>xsltproc</command>.</para></listitem>
<listitem><para>Wordperfect files need <command>wpd2html</command>
from the <application>libwpd</application> package.</para></listitem>
<listitem><para>RTF files need <command>unrtf</command>, which, in
its standard version, has a lot of trouble with non-Western character
sets. Check the &RCLAPPS;.</para></listitem>
<listitem><para>TeX files need <command>untex</command> or
<command>detex</command>. Check the &RCLAPPS; for sources if it's not
packaged for your distribution.</para></listitem>
<listitem><para>dvi files need <command>dvips</command>.</para>
</listitem> </listitem>
<listitem><para>Postscript: <ulink <listitem><para>djvu files need <command>djvutxt</command> and
url="http://www.cs.wisc.edu/~ghost/doc/pstotext.htm"> <command>djvused</command> from the
pstotext</ulink>. The original version has an issue with shell <application>DjVuLibre</application> package.</para></listitem>
character in file names. Most recent package repositories /
ports system use a patched version (ie FreeBSD, Debian). If <listitem><para>Audio files: &RCL; releases before 1.13
compiling from source, it would be better to apply the patch used the <command>id3info</command> command from the <application>
found id3lib</application> package to extract mp3 tag information,
<ulink url="http://www.recoll.org/files/pstotext-1.9_4-debian.patch"> <command>metaflac</command> (standard flac tools) for flac files,
here</ulink>.</para> and <command>ogginfo</command> (vorbis tools) for ogg
files. Releases 1.14 and later use a single
<application>Python</application> filter based
on <application>mutagen</application> for all audio file
types.</para>
</listitem> </listitem>
<listitem><para>MS Word: <ulink url="http://www.winfield.demon.nl"> <listitem><para>Pictures: &RCL; uses the
antiword</ulink>.</para> <application>Exiftool</application>
</listitem> <application>Perl</application> package to extract tag
information. Most image file formats are supported. Note that
<listitem><para>MS Excel and PowerPoint: there may not be much interest in indexing the technical tags
<ulink url="http://catdoc.klik.atekon.de/"> (image size, aperture, etc.). This is only of interest if you
catdoc</ulink>.</para> store personal tags or textual descriptions inside the image
</listitem> files.</para></listitem>
<listitem><para>MS Open XML (docx): needs
<command>xsltproc</command>.</para>
</listitem>
<listitem><para>Wordperfect files:
<ulink url="http://libwpd.sourceforge.net/download.html">
libwpd</ulink>.</para>
</listitem>
<listitem>
<para>RTF: <ulink
url="http://www.gnu.org/software/unrtf/unrtf.html">unrtf</ulink>
</para>
</listitem>
<listitem>
<para>TeX: &RCL; uses the <application>untex</application>
program. Your distribution may have a package for it. If it doesn't,
<ulink url="http://www.recoll.org/untex/untex-1.3.jf.tar.gz">
there is a copy of the source on the &RCL; web site</ulink>,
because the program has no obvious home. The filter can
also work with
<ulink url="http://www.cs.purdue.edu/homes/trinkle/detex/">
detex</ulink> and will use it if it is installed.</para>
</listitem>
<listitem>
<para>dvi: <ulink
url="http://www.radicaleye.com/dvips.html">dvips</ulink></para>
</listitem>
<listitem>
<para>djvu:
<ulink
url="http://djvu.sourceforge.net">DjVuLibre
</ulink></para>
</listitem>
<listitem><para>mp3, flac, ogg vorbis: &RCL; releases before 1.13
use the <command>id3info</command> command from the <ulink
url="http://id3lib.sourceforge.net/">id3lib</ulink> package to
extract mp3 tag information. (Some gcc versions after 4.4 may have
trouble compiling <application>id3lib</application>. <ulink
url="http://www.recoll.org/id3lib.html">You can find a
workaround here</ulink>), metaflac (standard flac tools) for flac
files, and ogginfo (vorbis tools) for ogg files. Releases 1.14
and later use a single Python filter based on
<ulink url="http://code.google.com/p/mutagen/">mutagen</ulink>
for all audio file types.</para>
</listitem>
<listitem>
<para>Pictures: &RCL; uses the
<ulink url="http://www.sno.phy.queensu.ca/~phil/exiftool/">
Exiftool</ulink> <application>Perl</application> package to
extract tag information. Most image file formats are
supported. Note that there may not be much interest in indexing
the technical tags (image size, aperture, etc.). This is only of
interest if you store personal tags or textual descriptions inside
the image files.</para>
</listitem>
<listitem><para>chm: files in Microsoft help format need Python and <listitem><para>chm: files in Microsoft help format need Python and
the <ulink the <application>pychm</application> module (which needs
url="http://gnochm.sourceforge.net/pychm.html">pychm</ulink> <application>chmlib</application>).</para></listitem>
module (which needs <ulink
url="http://www.jedrea.com/chmlib/">chmlib</ulink>).</para>
</listitem>
<listitem><para>ics: up to &RCL; 1.13, iCalendar files need Python <listitem><para>ICS: up to &RCL; 1.13, iCalendar files need
and the <application>icalendar</application> module. For newer <application>Python</application>
versions, <application>icalendar</application> is not needed and the <application>icalendar</application>
</para></listitem> module. <application>icalendar</application> is not needed for newer
versions, which use internal code.</para></listitem>
<listitem><para>zip: Zip archives need Python (and the standard <listitem><para>Zip archives need <application>Python</application>
zipfile module).</para> (and the standard zipfile module).</para></listitem>
</listitem>
</itemizedlist> </itemizedlist>
<para>Text, HTML, mail folders, Openoffice and Scribus files <para>Text, HTML, mail folders, and Scribus files are
are processed internally. Lyx is used to index Lyx files. Many processed internally. <application>Lyx</application> is used to
filters need <command>iconv</command> and the standard index Lyx files. Many filters need <command>iconv</command> and the
<command>sed</command> and <command>awk</command>. standard <command>sed</command> and <command>awk</command>.
</para> </para>
</sect1> </sect1>
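A quick way to find out which of the helper commands listed above are missing on a given system is to look each one up on the `PATH`. The following is a minimal Python sketch, not part of Recoll. The `HELPERS` table is a hand-made summary of the list above, not Recoll's own configuration data, and `missing_helpers()` is a hypothetical function. For TeX, the text says either `untex` or `detex` is enough, so that type is only reported when both are absent.

```python
import shutil

# Illustrative summary of the helper commands named in the list above,
# keyed by document type. This is NOT Recoll's own configuration data.
HELPERS = {
    "OpenOffice": ["unzip", "xsltproc"],
    "PDF": ["pdftotext"],
    "PostScript": ["pstotext"],
    "MS Word": ["antiword"],
    "MS Excel/PowerPoint": ["catdoc"],
    "MS Open XML": ["xsltproc"],
    "WordPerfect": ["wpd2html"],
    "RTF": ["unrtf"],
    "TeX": ["untex", "detex"],
    "dvi": ["dvips"],
    "djvu": ["djvutxt", "djvused"],
    "Pictures": ["exiftool"],
}

# Types for which any single listed command is enough (untex OR detex).
ANY_OF = {"TeX"}


def missing_helpers(helpers=HELPERS, which=shutil.which):
    """Return {document type: [missing commands]} for types lacking helpers.

    `which` is injectable so the check can be tested without touching PATH.
    """
    report = {}
    for doctype, commands in helpers.items():
        missing = [cmd for cmd in commands if which(cmd) is None]
        if doctype in ANY_OF:
            # Report only when none of the alternatives is installed.
            if len(missing) == len(commands):
                report[doctype] = missing
        elif missing:
            report[doctype] = missing
    return report


if __name__ == "__main__":
    for doctype, cmds in sorted(missing_helpers().items()):
        print(f"{doctype}: missing {', '.join(cmds)}")
```

Passing `which` as a parameter keeps the lookup logic separate from the real system state, so the same function can be checked against a fake set of installed commands.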


@@ -46,18 +46,11 @@
<li><a href="perfs.html">Index size and indexing performance <li><a href="perfs.html">Index size and indexing performance
data.</a></li> data.</a></li>
<li>Faqs and Howtos are now kept in the <li><a href="http://bitbucket.org/medoc/recoll/wiki/FaqsAndHowTos">
<a href="http://bitbucket.org/medoc/recoll/wiki/FaqsAndHowTos"> Faqs and Howtos</a> are now kept in the
<a href="http://bitbucket.org/medoc/recoll/wiki/">
Recoll Wiki</a> on Recoll Wiki</a> on
<a href="http://bitbucket.org/medoc/recoll">bitbucket.org</a>.</li> <a href="http://bitbucket.org/medoc/recoll">bitbucket.org</a>.</li>
<p>Current list of HowTos:</p>
<ul>
<li><a href="http://bitbucket.org/medoc/recoll/wiki/PreventIndexingDir">Prevent indexing of a directory</a></li>
<li><a href="http://bitbucket.org/medoc/recoll/wiki/MultipleIndexes">Creating and using multiple indexes</a></li>
<li><a href="http://bitbucket.org/medoc/recoll/wiki/SavingConfig.wiki">Recoll configuration backup</a></p>
<li><a href="http://bitbucket.org/medoc/recoll/wiki/IndexMozillaCalendari">Indexing Mozilla Sunbird / Lightning calendar data</a></li>
</ul>
</ul> </ul>
</div> </div>


@@ -384,7 +384,8 @@ sudo add-apt-repository ppa:recoll-backports/ppa
<h2><a name="translations">Translations</a></h2> <h2><a name="translations">Translations</a></h2>
<p>Most of the translations for 1.13 are incomplete. The source <p>Most of the translations for 1.13 are incomplete (and I
forgot to update the message files for 1.14, ugh). The source
translation files are included in the source release. If your translation files are included in the source release. If your
language has some English messages left and you want to take a language has some English messages left and you want to take a
shot at fixing the problem, you can send the results to shot at fixing the problem, you can send the results to
@@ -400,17 +401,17 @@ sudo add-apt-repository ppa:recoll-backports/ppa
</p> </p>
<p><a href="translations/recoll_xx.ts">recoll_xx.ts</a> is a blank <p><a href="translations/recoll_xx.ts">recoll_xx.ts</a> is a blank
Recoll 1.13 message file, handy to work on a new translation.</p> Recoll 1.14 message file, handy to work on a new translation.</p>
<h3>Updated 1.13 translations that became available after the <h3>Updated 1.13/1.14 translations that became available after the
release:</h3> release:</h3>
<p>None for now :(</p> <!-- <p>None for now :(</p> -->
<!-- <p>Lithuanian.
<p>German. <a href="translations/recoll_lt.ts">recoll_lt.ts</a>
<a href="translations/recoll_de.ts">recoll_de.ts</a> <a href="translations/recoll_lt.qm">recoll_lt.qm</a>
<a href="translations/recoll_de.qm">recoll_de.qm</a>
</p> </p>
<!--
<p>Ukrainian. <p>Ukrainian.
<a href="translations/recoll_uk.ts">recoll_uk.ts</a> <a href="translations/recoll_uk.ts">recoll_uk.ts</a>
<a href="translations/recoll_uk.qm">recoll_uk.qm</a> <a href="translations/recoll_uk.qm">recoll_uk.qm</a>


@@ -18,189 +18,194 @@
</head> </head>
<body> <body>
<div class="rightlinks"> <div class="rightlinks">
<ul> <ul>
<li><a href="index.html">Home</a></li> <li><a href="index.html">Home</a></li>
<li><a href="pics/index.html">Screenshots</a></li> <li><a href="pics/index.html">Screenshots</a></li>
<li><a href="download.html">Downloads</a></li> <li><a href="download.html">Downloads</a></li>
<li><a href="usermanual/index.html">User manual</a></li> <li><a href="usermanual/index.html">User manual</a></li>
<li><a href="index.html#support">Support</a></li> <li><a href="index.html#support">Support</a></li>
<li><a href="devel.html">Development</a></li> <li><a href="devel.html">Development</a></li>
</ul> </ul>
</div> </div>
<div class="content"> <div class="content">
<h1 class="intro">Recoll features</h1> <h1 class="intro">Recoll features</h1>
<dl> <h2><a name="systems">Supported systems</a></h2>
<dt><a name="systems">Supported systems</a></dt>
<dd><span class="application">Recoll</span> has been compiled and
tested on FreeBSD, Linux, Darwin and Solaris (versions
FreeBSD 5-7, Redhat 7/8/9, Fedora Core 5-13, Suse 10/11,
Gentoo, Debian 3.1, Solaris 8/9/10. Other not too distant
releases should be ok too).</dd>
<dd>Qt versions from 3.1 to 4.5</dd> <p><span class="application">Recoll</span> has been compiled
and tested on FreeBSD, Linux, Darwin and Solaris (initial
versions FreeBSD 5, Redhat 7, Fedora Core 5, Suse 10, Gentoo,
Debian 3.1, Solaris 8). It should compile and run on all
subsequent releases of these systems and probably a few
others too.</p>
<dt><a name="doctypes">Document types</a></dt> <p>Qt versions from 3.1 to 4.7</p>
<dd>Recoll can index many document types (along with their
<h2><a name="doctypes">Document types</a></h2>
<p>Recoll can index many document types (along with their
compressed versions). Some types are handled internally (no compressed versions). Some types are handled internally (no
external application needed). Other types need some application to external application needed). Other types need a separate
be installed to extract the text. Types that only need common application to be installed to extract the text. Types that
very common utilities (awk/sed/groff etc.) are listed in the only need very common utilities (awk/sed/groff etc.) are
native section.</dd> listed in the native section.</p>
<dl> <h4>File types indexed natively</h4>
<dt>Natively</dt>
<dd>
<ul> <ul>
<li><span class="literal">text</span>.</li> <li><span class="literal">text</span>.</li>
<li><span class="literal">html</span>.</li> <li><span class="literal">html</span>.</li>
<li><span class="literal">maildir</span> and <span <li><span class="literal">maildir</span> and <span class=
class="literal">mailbox</span> (<span class= "literal">mailbox</span> (<span class=
"literal">Mozilla</span>, <span class= "literal">Mozilla</span>, <span class=
"literal">Thunderbird</span> and <span class= "literal">Thunderbird</span> and <span class=
"literal">Evolution</span> mail ok).</li> "literal">Evolution</span> mail ok).</li>
<li><span class="literal">OpenOffice</span> <li><span class="literal">gaim</span> and <span class=
files (needs <span class="command">unzip</span> command).</li> "literal">purple</span> log files.</li>
<li><span class="literal">Abiword</span> files.</li> <li><span class="literal">Lyx</span> files (needs <span
class="literal">Lyx</span> to be installed).</li>
<li><span class="literal">Kword</span> files.</li>
<li><span class="literal">gaim</span> and <span
class="literal">purple</span> log files.</li>
<li><span class="literal">Lyx</span> files (needs
<span class="literal">Lyx</span> to be installed).</li>
<li><span class="literal">Scribus</span> files.</li> <li><span class="literal">Scribus</span> files.</li>
<li><span class="literal">Man pages</span> (need <span <li><span class="literal">Man pages</span> (need <span
class="command">groff</span>).</li> class="command">groff</span>).</li>
</ul> </ul>
</dd>
<dt>With external helpers</dt> <h4>File types indexed with external helpers</h4>
<dd> <p>Many document types need the <span class="command">iconv</span>
<para>In addition to the applications listed below, many command in addition to the applications specifically listed.</p>
document types need the <span
class="command">iconv</span> command.</para> <p>The following types need <span class=
"command">xsltproc</span> from the <b>libxslt</b> package.
Quite a few also need <span class="command">unzip</span>:</p>
<ul> <ul>
<li><span class="literal">Microsoft Office Open XML</span> <li><span class="literal">Abiword</span> files.</li>
files with the <span class="command">unzip</span>
and <span class="command">xsltproc</span> commands.</li>
<li><span class="literal">pdf</span> with the <span <li><span class="literal">Fb2</span> ebooks.</li>
class="command">pdftotext</span> command, which can be
installed as part of <a href= <li><span class="literal">Kword</span> files.</li>
"http://www.foolabs.com/xpdf/">xpdf</a> or <a
href="http://poppler.freedesktop.org/">poppler</a>, <li><span class="literal">Microsoft Office Open XML</span>
files.</li>
<li><span class="literal">OpenOffice</span> files.</li>
<li><span class="literal">SVG</span> files.</li>
</ul>
<p>Others:</p>
<ul>
<li><span class="literal">pdf</span> with the <span class=
"command">pdftotext</span> command, which can be installed
as part of <a href="http://www.foolabs.com/xpdf/">xpdf</a>
or <a href="http://poppler.freedesktop.org/">poppler</a>,
depending on your distribution.</li> depending on your distribution.</li>
<li><span class="literal">msword</span> with <a href= <li><span class="literal">msword</span> with <a href=
"http://www.winfield.demon.nl/">antiword</a>.</li> "http://www.winfield.demon.nl/">antiword</a>.</li>
<li><span class="literal">Powerpoint</span> and <li><span class="literal">Powerpoint</span> and <span
<span class="literal">Excel</span> with the class="literal">Excel</span> with the <a href=
<a href="http://catdoc.klik.atekon.de"> "http://catdoc.klik.atekon.de">catdoc</a> utilities.</li>
catdoc</a> utilities.</li>
<li><span class="literal">CHM (Microsoft help)</span> <li><span class="literal">CHM (Microsoft help)</span> files
files (needs <span class="command">Python, pychm or (needs <span class="command">Python, pychm or
chmlib</span>).</li> chmlib</span>).</li>
<li><span class="literal">Zip</span> <li><span class="literal">Zip</span> archives (needs <span
archives (needs <span class="command">Python</span>).</li> class="command">Python</span>).</li>
<li><span class="literal">iCalendar</span>(.ics) files <li><span class="literal">iCalendar</span>(.ics) files
(needs <span class="command">Python, (needs <span class="command">Python, <a href=
<a href="http://pypi.python.org/pypi/icalendar/2.1">icalendar</a></span>).</li> "http://pypi.python.org/pypi/icalendar/2.1">icalendar</a></span>).</li>
<li><span class="literal">Mozilla calendar data</span> <li><span class="literal">Mozilla calendar data</span> See
See <a href="http://bitbucket.org/medoc/recoll/wiki/IndexMozillaCalendari"> <a href=
"http://bitbucket.org/medoc/recoll/wiki/IndexMozillaCalendari">
the wiki</a> about this.</li> the wiki</a> about this.</li>
<li><span class="literal">Wordperfect</span> with <a href= <li><span class="literal">Wordperfect</span> with <a href=
"http://libwpd.sourceforge.net">libwpd</a>.</li> "http://libwpd.sourceforge.net">libwpd</a>.</li>
<li><span class="literal">postscript</span> with <li><span class="literal">postscript</span> with <a href=
<a href="http://www.gnu.org/software/ghostscript/ghostscript.html"> "http://www.gnu.org/software/ghostscript/ghostscript.html">ghostscript</a>
ghostscript</a> and and <a href=
<a href="http://www.cs.wisc.edu/~ghost/doc/pstotext.htm"> "http://www.cs.wisc.edu/~ghost/doc/pstotext.htm">pstotext</a>.
pstotext</a>. Actually the pstotext 1.9 found at the latter link has a
Actually the pstotext 1.9 found at the latter link problem with file names using special shell characters, and
has a problem with file names using special shell you should either use the version packaged for your system
characters, and you should either use the version which is probably patched, or apply the Debian patch which
packaged for your system which is probably patched, is stored <a href=
or apply the Debian patch which is "files/pstotext-1.9_4-debian.patch">here</a> for
stored <a href="files/pstotext-1.9_4-debian.patch">here</a> convenience. See
for convenience. See
http://packages.debian.org/squeeze/pstotext and http://packages.debian.org/squeeze/pstotext and
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=356988 http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=356988 for
for references/explanations.</li> references/explanations.</li>
<li><span class="literal">rtf</span> with <a href= <li><span class="literal">RTF</span> files with <a href=
"http://www.gnu.org/software/unrtf/unrtf.html">unrtf</a>.</li> "http://www.gnu.org/software/unrtf/unrtf.html">unrtf</a>. Please
note that up to version
0.21, <span class="command">unrtf</span> mostly does not work
with non-Western-European character sets. If you need to
index, e.g., Russian or Chinese RTF files, I have
produced a modified version which works much better (as
indicated by my tests and a few external ones). You can
download the <a href="unrtf/unrtf-0.22.0beta.tar.gz">source
here</a>. The development is hosted
on <a href="http://www.bitbucket.org/medoc/unrtf-int">
bitbucket.org</a>.</li>
<li><span class="literal">TeX</span> with <li><span class="literal">TeX</span> with <span class=
<span class="command">untex</span>. If there is no untex "command">untex</span>. If there is no untex package for
package for your distribution, your distribution, <a href="untex/untex-1.3.jf.tar.gz">a
<a href="untex/untex-1.3.jf.tar.gz">a source package is source package is stored on this site</a> (as untex has no
stored on this site</a> (as untex has no obvious obvious home). Will also work with <a href=
home). "http://www.cs.purdue.edu/homes/trinkle/detex/">detex</a>
Will also work if this is installed.</li>
with <a
href="http://www.cs.purdue.edu/homes/trinkle/detex/">detex</a>
if this is installed.
</li>
<li><span class="literal">dvi</span> with <li><span class="literal">dvi</span> with <a href=
<a href="http://www.radicaleye.com/dvips.html">dvips</a>. "http://www.radicaleye.com/dvips.html">dvips</a>.</li>
</li>
<li><span class="literal">djvu</span> with <li><span class="literal">djvu</span> with <a href=
<a href="http://djvu.sourceforge.net">DjVuLibre</a>. "http://djvu.sourceforge.net">DjVuLibre</a>.</li>
</li>
<li><span class="literal">mp3/flac/ogg vorbis</span>
tags support with
<a href="http://id3lib.sourceforge.net/">id3info (id3lib)
</a> (compiling id3lib on recent systems may need
a small patch, see <a href="id3lib.html">here.</a>) or
the ogg and flac tools. Release 1.14 and later use a
python filter based on
<a href="http://code.google.com/p/mutagen/">mutagen</a>
for all audio tags.
</li>
<li>Image file tags support with
<a href="http://www.sno.phy.queensu.ca/~phil/exiftool/">
exiftool</a>. This is a perl program, so you also
need perl on the system. This works with about any
possible image file and tag format (jpg, png, tiff,
gif etc.).
</li>
<li>Audio file tags: Recoll releases 1.13 and older use <a
href="http://id3lib.sourceforge.net/">id3info (id3lib)</a>
(compiling id3lib on recent systems may need a small patch,
see <a href="id3lib.html">here.</a>) or the ogg and flac
tools.<br>
Recoll releases 1.14 and later use a Python filter based
on <a href="http://code.google.com/p/mutagen/">mutagen</a>
for all audio types.</li>
<li>Image file tags support with <a href=
"http://www.sno.phy.queensu.ca/~phil/exiftool/">exiftool</a>.
This is a perl program, so you also need perl on the
system. This works with about any possible image file and
tag format (jpg, png, tiff, gif etc.).</li>
</ul> </ul>
</dd>
</dl>
</dd>
<dt>Other features</dt> <h2>Other features</h2>
<dd>
<ul> <ul>
<li>Can use <b>Beagle</b> browser plug-ins to index web <li>Can use <b>Beagle</b> browser plug-ins to index web
history. See the history. See the <a href=
<a href="http://bitbucket.org/medoc/recoll/wiki/IndexBeagleWeb"> "http://bitbucket.org/medoc/recoll/wiki/IndexBeagleWeb">
Wiki</a> for more detail.</li> Wiki</a> for more detail.</li>
<li>Processes all email attachments.</li> <li>Processes all email attachments.</li>
@@ -211,8 +216,8 @@
<li>Xesam-compatible query language.</li> <li>Xesam-compatible query language.</li>
<li>Wildcard searches (with a specific and faster function for <li>Wildcard searches (with a specific and faster function
file names).</li> for file names).</li>
<li>Support for multiple charsets. Internal processing and <li>Support for multiple charsets. Internal processing and
storage uses Unicode UTF-8.</li> storage uses Unicode UTF-8.</li>
@@ -223,55 +228,58 @@
<li>Easy installation. No database daemon, web server or <li>Easy installation. No database daemon, web server or
exotic language necessary.</li> exotic language necessary.</li>
<li>An indexer which runs either as a thread inside the GUI, <li>An indexer which runs either as a thread inside the
as an external, batch, cron'able program, or as a GUI, as an external, batch, cron'able program, or as a
real-time indexing daemon.</li> real-time indexing daemon.</li>
</ul> </ul>
</dd>
</ul>
<h2><a name="#stemming"></a>Stemming</h2> <h2><a name="#stemming"></a>Stemming</h2>
<p>Stemming is a process which transforms inflected words into <p>Stemming is a process which transforms inflected words
their most basic form. For example, <i>flooring</i>, into their most basic form. For example, <i>flooring</i>,
<i>floors</i>, <i>floored</i> would probably all be transformed <i>floors</i>, <i>floored</i> would probably all be
to <i>floor</i> by a stemmer for the English language.</p> transformed to <i>floor</i> by a stemmer for the English
language.</p>
<p>In many search engines, the stemming process occurs during <p>In many search engines, the stemming process occurs during
indexing. The index will only contain the stemmed form of words, indexing. The index will only contain the stemmed form of
with exceptions for terms which are detected as being probably words, with exceptions for terms which are detected as being
proper nouns (ie: capitalized). At query time, the terms entered probably proper nouns (ie: capitalized). At query time, the
by the user are stemmed, then matched against the index.</p> terms entered by the user are stemmed, then matched against
the index.</p>
<p>This process results in a smaller index, but it has the <p>This process results in a smaller index, but it has the
serious drawback of irrevocably losing information during serious drawback of irrevocably losing information during
indexing.</p> indexing.</p>
<p>Recoll works in a different way. No stemming is performed at <p>Recoll works in a different way. No stemming is performed
indexing time, so that all information gets into the index. The at indexing time, so that all information gets into the index.
resulting index is bigger, but most people probably don't care The resulting index is bigger, but most people probably don't
much about this nowadays, because they have a 100 GB disk 95% care much about this nowadays, because they have a 100 GB disk
full of binary data <em>which does not get indexed</em>.</p> 95% full of binary data <em>which does not get
<p>At the end of an indexing pass, Recoll builds one or several indexed</em>.</p>
stemming dictionaries, where all word stems are listed in
correspondence to the list of their derivatives.</p> <p>At the end of an indexing pass, Recoll builds one or
several stemming dictionaries, where all word stems are
listed in correspondence to the list of their
derivatives.</p>
<p>At query time, by default, user-entered terms are stemmed, <p>At query time, by default, user-entered terms are stemmed,
then matched against the stem database, and the query is then matched against the stem database, and the query is
expanded to include all derivatives. This will yield search expanded to include all derivatives. This will yield search
results analogous to those obtained by a classical engine. results analogous to those obtained by a classical engine.
The benefit of this approach is that stem expansion can be The benefit of this approach is that stem expansion can be
controlled instantly at query time in several ways: controlled instantly at query time in several ways:</p>
<ul> <ul>
<li>It can be selectively turned off for any query term by <li>It can be selectively turned off for any query term by
capitalizing it (<i>Floor</i>).</li> capitalizing it (<i>Floor</i>).</li>
<li>The stemming language (ie: english, french...) can be
selected (this supposes that several stemming databases have
been built, which can be configured as part of the indexing,
or done later, in a reasonably fast way).</li>
</ul>
<li>The stemming language (ie: english, french...) can be
selected (this supposes that several stemming databases
have been built, which can be configured as part of the
indexing, or done later, in a reasonably fast way).</li>
</ul>
</div> </div>
</body> </body>
</html> </html>
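The query-time stem expansion described in the Stemming section above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Recoll's code:

- `toy_stem()` is a hypothetical, deliberately naive suffix stripper standing in for a real stemmer.
- The stem dictionary is a plain in-memory mapping built after "indexing" a list of terms.
- A capitalized term is matched without expansion, lowercased on the assumption that the index stores lowercase terms.

```python
from collections import defaultdict


def toy_stem(word):
    """Hypothetical, deliberately naive English stemmer (illustration only).

    Strips one of a few suffixes when at least three letters remain.
    """
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_stem_dictionary(index_terms):
    """After indexing: map each stem to the unstemmed terms that produce it."""
    stems = defaultdict(set)
    for term in index_terms:
        stems[toy_stem(term)].add(term)
    return dict(stems)


def expand_query_term(term, stem_dict):
    """At query time: expand a user term to all indexed derivatives of its stem.

    A capitalized term (e.g. "Floor") is not expanded; it is matched as-is,
    lowercased here on the assumption that the index holds lowercase terms.
    """
    if term[:1].isupper():
        return {term.lower()}
    return set(stem_dict.get(toy_stem(term), {term}))


if __name__ == "__main__":
    index = ["floor", "floors", "floored", "flooring", "ceiling"]
    stems = build_stem_dictionary(index)
    print(sorted(expand_query_term("floors", stems)))
    print(sorted(expand_query_term("Floor", stems)))
```

Consider an index containing floor, floors, floored and flooring. A query for "floors" then expands to all four terms, while "Floor" matches only floor. Because the index keeps the unstemmed words, turning expansion off loses nothing.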


@@ -104,16 +104,14 @@
</ul> </ul>
</li> </li>
<li>2010-04-14 :
Recoll <a href="download.html#source">1.13.04</a> is out. It
fixes a nasty bug (broken stemming) in 1.13.02.</li>
<li>2010-01-29 : the full Recoll source repository is now <li>2010-01-29 : the full Recoll source repository is now
hosted on hosted on
<a href="http://bitbucket.org/medoc/recoll">Bitbucket</a>, along <a href="http://bitbucket.org/medoc/recoll">Bitbucket</a>,
with a Wiki and an along with a Wiki
<a href="http://bitbucket.org/medoc/recoll/issues">issues tracking (<a href="http://bitbucket.org/medoc/recoll/wiki/FaqsAndHowTos">
system</a>. Hopefully, this Faqs and Howtos</a>) and an
<a href="http://bitbucket.org/medoc/recoll/issues">
issues tracking system</a>. Hopefully, this
new channel for reporting bugs and making suggestions will new channel for reporting bugs and making suggestions will
increase the feedback rate...</li> increase the feedback rate...</li>


@@ -135,6 +135,10 @@
contributions of code or suggestions, see the contributions of code or suggestions, see the
<a class="important" href="credits.html">Credits</a> page.</p> <a class="important" href="credits.html">Credits</a> page.</p>
<h2>Other</h2>
<p>I rent out a
<a href="http://www.metairie-enbor.com/index.html">
nice big house in the Aude</a> :)</p>
</div> </div>
</body> </body>