dockes 2006-12-24 08:02:11 +00:00
parent 40ee0199a9
commit 5f022b80d5
9 changed files with 114 additions and 34 deletions

@@ -22,6 +22,7 @@ share/icons/hicolor/48x48/apps/recoll.png
 %%DATADIR%%/filters/rclxls
 %%DATADIR%%/images/document.png
 %%DATADIR%%/images/drawing.png
+%%DATADIR%%/images/folder.png
 %%DATADIR%%/images/html.png
 %%DATADIR%%/images/image.png
 %%DATADIR%%/images/message.png

@@ -28,4 +28,5 @@ qtgui/recoll.pro
 recollinstall
 sampleconf/recoll.conf
 sysconf
+wasabi
 wxgui

@@ -4,8 +4,7 @@ Bugs that are listed in an older version section are supposedly fixed in
 later versions. Bugs listed in the topmost section may also exist in older
 versions.
-Latest (1.6.2):
+Latest (1.6.3):
 - 1.6 NEAR crashes: 1.6 has added NEAR searches. Unlike what recoll did
 with PHRASES, stemming expansion is performed on terms inside NEAR
 clauses (except if prevented by a capitalized entry of course). There is
@@ -53,6 +52,11 @@ Latest (1.6.2):
 exception handling (recoll catches an exception while trying the
 yest inexistant db).
+1.6.2
+- Relatively unfrequent issue with message boundary detection in mbox
+files, could cause miscellaneous problems.
+- Executing an external viewer for a file with single-quotes in the name
+would not work.
 ***************************************************************************
 1.5.10
 - If a defaultcharset was set in the configuration file for a subdirectory,

@@ -1,15 +1,30 @@
 CHANGES
-Updating from 1.2 to 1.3 or 1.4 or 1.5:
----------------------------------------
-From version 1.3 up, there is a new feature to search specifically for file
-names (with wildcard processing). If you want to take full advantage of
-this, you should perform a full reindex after installing the new version
-(ie: use recollindex -z, or delete ~/.recoll/xapiandb).
-Also, we now use the central copies of configuration files for default
-values, and the user ones only for overrides. Your old configuration files
-will still work, but, you may want to remove them if they are unmodified,
-or keep only the modified parameters.
+1.7.0 2006-12-20
+- Email attachments are now indexed.
+- Right-click menu option to access the parent document of an embedded
+result (ie from mail attachment to parent message).
+- The sort tool has been improved: no need to restart the query after sort
+criteria change.
+- Support for real-time indexing with inotify is now enabled by default
+when appropriate.
+- Recoll now warns when the configured native viewer can not be found and
+starts an interface for chosing another one.
+- Categories (text, presentation, spreadsheets, etc.) can be used instead
+of raw mime types when filtering on file types in advanced search.
+- The port to qt4 is functional and can be enabled with configure --enable-qt4
+- 'autophrase' option improved and may now actually be useful.
+- Improved highlighting (again...)
+- Display term frequencies in term explorer.
+- Recollindex -e to remove data from index for listed files.
+1.6.3
+- Fixed problem with bad detection of mbox message boundaries.
+Upgrading can change the message numbering in some cases, and you should
+perform a full index update (recollindex -z) after installing
+the new version.
+- Fixed problem with execution of external viewer for files with
+single-quotes in the name.
 1.6.2
 - Minor solaris compilation glitches only.
@@ -34,6 +49,18 @@ or keep only the modified parameters.
 managers.
 - Improved recall for phrases with composite words like email addresses.
+Updating from 1.2 to 1.3 or 1.4 or 1.5:
+---------------------------------------
+From version 1.3 up, there is a new feature to search specifically for file
+names (with wildcard processing). If you want to take full advantage of
+this, you should perform a full reindex after installing the new version
+(ie: use recollindex -z, or delete ~/.recoll/xapiandb).
+Also, we now use the central copies of configuration files for default
+values, and the user ones only for overrides. Your old configuration files
+will still work, but, you may want to remove them if they are unmodified,
+or keep only the modified parameters.
 1.5.9
 - Fix bad timezone conversion in email dates. Display timezone in result
 list dates.

@@ -55,20 +55,20 @@
 <h3>Source</h3>
-<p><b>Current version:</b>
-1.6.1: <a href="recoll-1.6.1.tar.gz">recoll-1.6.1.tar.gz</a>
-See the <a href="BUGS.txt">known bugs and issues</a> and <a
-href="CHANGES.txt">changes</a>.</p>
-<p>recoll 1.6 has the capacity to perform proximity searches (a
-bit like phrases, but unordered). There is a still unpatched
-problem in Xapian 0.9.9 which will make NEAR searches fail.
-If you intend to perform proximity searches, have a look at the
-<a href="BUGS.txt">errata</a> for a workaround and Xapian
-patch. All the statically linked binary packages below use a
-patched Xapian-core library in order for NEAR searches to work.</p>
+<p><b>The cutting edge</b>
+Version 1.7.0: <a
+href="recoll-1.7.0.tar.gz">recoll-1.7.0.tar.gz</a> brings some
+nice features such as email attachment indexing, and
+improvements to real-time indexing session support. See the
+<a href="CHANGES.txt">changes file</a> for more detail.</p>
+<p><b>Current version:</b>
+1.6.3: <a href="recoll-1.6.3.tar.gz">recoll-1.6.3.tar.gz</a>
+See the <a href="BUGS.txt">known bugs and issues</a> and
+<a href="CHANGES.txt">changes</a>.</p>
 <p>Older recoll releases:
+<a href="recoll-1.6.1.tar.gz">1.6.1</a>
 <a href="recoll-1.5.11.tar.gz">1.5.11</a>.
 <a href="recoll-1.5.6.tar.gz">1.5.6</a>.
 <a href="recoll-1.4.3.tar.gz">1.4.3</a>.
@@ -94,11 +94,11 @@
 <p><b>Mandriva 2006</b> (also works on 2005 and 2007)
 RPM:
-<a href="recoll-1.6.1-0.1.20060mdk.i586.rpm">
-recoll-1.6.1-0.1.20060mdk.i586.rpm</a>.
+<a href="recoll-1.6.3-0.1.20060mdk.i586.rpm">
+recoll-1.6.3-0.1.20060mdk.i586.rpm</a>.
 Source:
-<a href="recoll-1.6.1-0.1.20060mdk.src.rpm">
-recoll-1.6.1-0.1.20060mdk.src.rpm</a>
+<a href="recoll-1.6.3-0.1.20060mdk.src.rpm">
+recoll-1.6.3-0.1.20060mdk.src.rpm</a>
 </p>
 <p><b>Suse 10.1</b>
@@ -150,6 +150,9 @@
 <a href="http://cvsweb.freebsd.org/ports/deskutils/recoll">
 recoll port</a>.</p>
+<p>Up to date ports for <a href="port-recoll.tgz">recoll-1.6</a> and
+<a href="port-xapian-core.tgz">xapian-0.9.9</a> (without the
+NEAR patch).</p>
 </div>
 </body>
 </html>

@@ -59,7 +59,7 @@
 <li><var class="literal">html</var>.</li>
 <li><span class="application">OpenOffice</span>
-files.</li>
+files (needs <b>unzip</b> command).</li>
 <li><var class="literal">maildir</var> and <var
 class="literal">mailbox</var> (<span class=
@@ -122,8 +122,8 @@
 <li>Support for multiple charsets. Internal processing and
 storage uses Unicode UTF-8.</li>
-<li>Stemming performed at query time (can switch stemming
-language after indexing).</li>
+<li><a href="#Stemming">Stemming</a> performed at query
+time (can switch stemming language after indexing).</li>
 <li>Easy installation. No database daemon, web server or
 exotic language necessary.</li>
@@ -134,6 +134,46 @@
 </dd>
 </ul>
+<h2><a name="#stemming"></a>Stemming</h2>
+<p>Stemming is a process which transforms inflected words into
+their most basic form. For exemple, <i>flooring</i>,
+<i>floors</i>, <i>floored</i> would probably all be transformed
+to <i>floor</i> by a stemmer for the English language.</p>
+<p>In many search engines, the stemming process occurs during
+indexing. The index will only contain the stemmed form of words,
+with exceptions for terms which are detected as being probably
+proper nouns (ie: capitalized). At query time, the terms entered
+by the user are stemmed, then matched against the index.</p>
+<p>This process results into a smaller index, but it has the
+grave inconvenient of irrevocably losing information during
+indexing.</p>
+<p>Recoll works in a different way. No stemming is performed at
+query time, so that all information gets into the index. The
+resulting index is bigger, but most people probably don't care
+much about this nowadays, because they have a 100Gb disk 95%
+full of binary data <em>which does not get indexed</em>.</p>
+<p>At the end of an indexing pass, Recoll builds one or several
+stemming dictionaries, where all word stems are listed in
+correspondence to the list of their derivatives.</p>
+<p>At query time, by default, user-entered terms are stemmed,
+then matched against the stem database, and the query is
+expanded to include all derivatives. This will yield search
+results analogous to those obtained by a classical engine.
+The benefits of this approach is that stem expansion can be
+controlled instantly at query time in several ways:
+<ul>
+<li>It can be selectively turned-off for any query term by
+capitalizing it (<i>Floor</i>).</li>
+<li>The stemming language (ie: english, french...) can be
+selected (this supposes that several stemming databases have
+been built, which can be configured as part of the indexing,
+or done later, in a reasonably fast way).</li>
+</ul>
 </div>
 </body>
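
The stemming page added above describes query-time stem expansion: the index keeps unstemmed terms, a stem dictionary built after indexing maps each stem to the index terms that share it, and a capitalized query term bypasses expansion. The sketch below illustrates that idea only, under loud assumptions: `toy_stem`, `build_stem_db` and `expand_query_term` are hypothetical names, the suffix stripper is a crude stand-in for a real stemmer (for example Snowball), and none of this is Recoll's actual C++/Xapian code.

```python
from collections import defaultdict


def toy_stem(word: str) -> str:
    """Hypothetical, crude English suffix stripper standing in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when a reasonably long stem remains.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_stem_db(index_terms):
    """Stem dictionary: map each stem to the set of unstemmed index terms producing it."""
    stem_db = defaultdict(set)
    for term in index_terms:
        stem_db[toy_stem(term)].add(term)
    return stem_db


def expand_query_term(user_term, stem_db):
    """Return the set of index terms that a user-entered term should match."""
    if user_term[:1].isupper():
        # Capitalized entry: no stem expansion. Index terms are lowercase,
        # so match the lowercased literal term only.
        return {user_term.lower()}
    term = user_term.lower()
    # .get() does not insert into the defaultdict; "|" builds a new set.
    return stem_db.get(toy_stem(term), set()) | {term}


if __name__ == "__main__":
    db = build_stem_db(["floor", "floors", "floored", "flooring", "ceiling"])
    print(sorted(expand_query_term("floors", db)))  # ['floor', 'floored', 'flooring', 'floors']
    print(sorted(expand_query_term("Floor", db)))   # ['floor']
```

Because the index itself is never stemmed in this model, switching stemming language would only mean keeping one such dictionary per language and picking one at query time, which matches the flexibility the page claims.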

@@ -47,7 +47,7 @@
 <p><span class="application">Recoll</span> is free, open source,
 and GPL-licensed. The current version is
-<a class="important" href="download.html">1.6.1</a></p>
+<a class="important" href="download.html">1.6.3</a></p>
 <p>We borrow a lot of code
 from other packages, and welcome code and ideas from
 contributors, see the <a class="important"

@@ -21,6 +21,7 @@
 <a href="recoll2.html"><img src="recoll2-thumb.png"></a>
 <a href="recoll3.html"><img src="recoll3-thumb.png"></a>
 <a href="recoll4.html"><img src="recoll4-thumb.png"></a>
+<a href="recoll5.html"><img src="recoll5-thumb.png"></a>
 </div>
 </body>
 </html>

@@ -7,7 +7,10 @@
 <body>
 <h1>Recoll index format details</h1>
-<p>Special (capitalized) terms:</p>
+<p>Terms are not stemmed before being stored. They are turned to
+all minuscule letters with no accents.</p>
+<p>Special prefixed terms:</p>
 <ul>
 <li>Ddate: modification date of file, like YYYYMMDD</li>
@@ -64,7 +67,7 @@
 <address><a href="mailto:jean-francois.dockes@wanadoo.fr">Jean-Francois Dockes</a></address>
 <!-- Created: Thu Dec 7 13:07:40 CET 2006 -->
 <!-- hhmts start -->
-Last modified: Thu Dec 7 14:13:36 CET 2006
+Last modified: Thu Dec 7 14:19:02 CET 2006
 <!-- hhmts end -->
 </body>
 </html>
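
The index-format page now states that terms are stored unstemmed, in all-lowercase letters with accents removed, alongside prefixed terms such as `D` followed by the modification date as YYYYMMDD. A minimal Python sketch of that normalization follows, assuming Unicode NFKD decomposition plus removal of combining marks is an acceptable way to strip accents; Recoll's own implementation may well differ, and `normalize_term` and `date_term` are hypothetical names.

```python
import unicodedata
from datetime import date


def normalize_term(word: str) -> str:
    """Hypothetical helper: lowercase a word and strip its accents."""
    # NFKD splits accented letters into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFKD", word)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.lower()


def date_term(d: date) -> str:
    """Hypothetical helper: build a 'D'-prefixed modification-date term (YYYYMMDD)."""
    return "D" + d.strftime("%Y%m%d")


if __name__ == "__main__":
    print(normalize_term("Élève"))       # eleve
    print(date_term(date(2006, 12, 7)))  # D20061207
```

In this sketch the prefix is an uppercase letter while ordinary terms are always lowercased, so a prefixed term can never collide with a normalized word.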