*** empty log message ***
parent
348b4bc717
commit
2674e45f29
88
packaging/rpm/recollCooker.spec
Normal file
@@ -0,0 +1,88 @@
|
|||||||
|
Summary: Desktop full text search tool with a Qt GUI
|
||||||
|
Name: recoll
|
||||||
|
Version: 1.8.1
|
||||||
|
Release: %mkrel 1
|
||||||
|
License: GPL
|
||||||
|
Group: Databases
|
||||||
|
URL: http://www.recoll.org/
|
||||||
|
Source0: http://www.lesbonscomptes.com/recoll/%{name}-%{version}.tar.bz2
|
||||||
|
Patch1: %{name}-configure.patch
|
||||||
|
BuildRequires: libxapian-devel
|
||||||
|
BuildRequires: libfam-devel
|
||||||
|
BuildRequires: libqt-devel >= 3.3.7
|
||||||
|
BuildRequires: libaspell-devel
|
||||||
|
Requires: xapian
|
||||||
|
BuildRoot: %{_tmppath}/%{name}-%{version}--buildroot
|
||||||
|
|
||||||
|
%description
|
||||||
|
Recoll is a personal full text search tool for Unix/Linux.
|
||||||
|
It is based on the very strong Xapian backend, for which
|
||||||
|
it provides an easy-to-use, feature-rich, easily administered
|
||||||
|
Qt graphical interface.
|
||||||
|
|
||||||
|
%prep
|
||||||
|
%setup -q
|
||||||
|
%patch1 -p0
|
||||||
|
|
||||||
|
%build
|
||||||
|
%configure2_5x \
|
||||||
|
--with-fam \
|
||||||
|
--with-aspell
|
||||||
|
|
||||||
|
%make
|
||||||
|
|
||||||
|
%install
|
||||||
|
[ "%{buildroot}" != "/" ] && rm -rf %{buildroot}
|
||||||
|
|
||||||
|
%makeinstall_std
|
||||||
|
desktop-file-install --vendor="" \
|
||||||
|
--add-category="X-MandrivaLinux-MoreApplications-Databases" \
|
||||||
|
--dir %{buildroot}%{_datadir}/applications %{buildroot}%{_datadir}/applications/*
|
||||||
|
|
||||||
|
%clean
|
||||||
|
[ "%{buildroot}" != "/" ] && rm -rf %{buildroot}
|
||||||
|
|
||||||
|
%files
|
||||||
|
%defattr(644,root,root,755)
|
||||||
|
%doc %{_datadir}/%{name}/doc
|
||||||
|
%attr(755,root,root) %{_bindir}/%{name}*
|
||||||
|
%{_datadir}/applications/recoll-searchgui.desktop
|
||||||
|
%{_datadir}/icons/hicolor/48x48/apps/recoll-searchgui.png
|
||||||
|
%dir %{_datadir}/%{name}
|
||||||
|
%dir %{_datadir}/%{name}/examples
|
||||||
|
%dir %{_datadir}/%{name}/filters
|
||||||
|
%dir %{_datadir}/%{name}/images
|
||||||
|
%dir %{_datadir}/%{name}/translations
|
||||||
|
%{_datadir}/%{name}/examples/mime*
|
||||||
|
%{_datadir}/%{name}/examples/*.conf
|
||||||
|
%attr(755,root,root) %{_datadir}/%{name}/examples/rclmon.sh
|
||||||
|
%attr(755,root,root) %{_datadir}/%{name}/filters/rc*
|
||||||
|
%{_datadir}/%{name}/filters/xdg-open
|
||||||
|
%{_datadir}/%{name}/images/*png
|
||||||
|
%{_mandir}/man1/recoll*
|
||||||
|
%{_mandir}/man5/recoll*
|
||||||
|
%{_datadir}/%{name}/translations/*.qm
|
||||||
|
|
||||||
|
|
||||||
|
%changelog
|
||||||
|
* Fri Apr 20 2007 Tomasz Pawel Gajc <tpg@mandriva.org> 1.8.1-1mdv2008.0
|
||||||
|
+ Revision: 16093
|
||||||
|
- new version
|
||||||
|
- drop P0
|
||||||
|
|
||||||
|
+ Mandriva <devel@mandriva.com>
|
||||||
|
|
||||||
|
|
||||||
|
* Tue Mar 06 2007 Tomasz Pawel Gajc <tpg@mandriva.org> 1.7.5-2mdv2007.0
|
||||||
|
+ Revision: 134128
|
||||||
|
- rebuild
|
||||||
|
|
||||||
|
* Tue Jan 30 2007 Tomasz Pawel Gajc <tpg@mandriva.org> 1.7.5-1mdv2007.1
|
||||||
|
+ Revision: 115423
|
||||||
|
- add patch 1 - fix build on x86_64
|
||||||
|
- add patch 0 - fix menu entry
|
||||||
|
- fix group
|
||||||
|
- add buildrequires
|
||||||
|
- set correct bits on files
|
||||||
|
- Import recoll
|
||||||
|
|
||||||
@@ -24,11 +24,12 @@
|
|||||||
Dockes</holder>
|
Dockes</holder>
|
||||||
</copyright>
|
</copyright>
|
||||||
|
|
||||||
<releaseinfo>$Id: usermanual.sgml,v 1.44 2007-06-08 16:46:53 dockes Exp $</releaseinfo>
|
<releaseinfo>$Id: usermanual.sgml,v 1.45 2007-06-26 16:58:25 dockes Exp $</releaseinfo>
|
||||||
|
|
||||||
<abstract>
|
<abstract>
|
||||||
<para>This document introduces full text search notions
|
<para>This document introduces full text search notions
|
||||||
and describes the installation and use of the &RCL; application.</para>
|
and describes the installation and use of the &RCL;
|
||||||
|
application. It currently describes &RCL; 1.9.</para>
|
||||||
</abstract>
|
</abstract>
|
||||||
|
|
||||||
|
|
||||||
@@ -771,30 +772,6 @@ fvwm
|
|||||||
<replaceable>unplugged</replaceable> but not
|
<replaceable>unplugged</replaceable> but not
|
||||||
<replaceable>potatoes</replaceable> (in any part of the document).</para>
|
<replaceable>potatoes</replaceable> (in any part of the document).</para>
|
||||||
|
|
||||||
<para>The first element <literal>author:"john doe"</literal> is
|
|
||||||
a phrase search limited to a specific field. Phrase searches are
|
|
||||||
specified as usual by enclosing the words in double quotes. The
|
|
||||||
field specification appears before the colon (of course this is
|
|
||||||
not limited to phrases, <literal>author:Balzac</literal> would
|
|
||||||
be ok too). &RCL; currently manages the following fields:</para>
|
|
||||||
|
|
||||||
<itemizedlist>
|
|
||||||
<listitem><para><literal>title</literal>,
|
|
||||||
<literal>subject</literal> or <literal>caption</literal> are
|
|
||||||
synonyms which specify data to be searched for in the
|
|
||||||
document title or subject.</para>
|
|
||||||
</listitem>
|
|
||||||
<listitem><para><literal>author</literal> or
|
|
||||||
<literal>from</literal> for searching the documents originators.</para>
|
|
||||||
</listitem>
|
|
||||||
<listitem><para><literal>keyword</literal> for searching the
|
|
||||||
document specified keywords (few documents actually have any).</para>
|
|
||||||
</listitem>
|
|
||||||
</itemizedlist>
|
|
||||||
|
|
||||||
<para>The query language is currently the only way to use the
|
|
||||||
&RCL; field search capability.</para>
|
|
||||||
|
|
||||||
<para>All elements in the search entry are normally combined
|
<para>All elements in the search entry are normally combined
|
||||||
with an implicit AND. It is possible to specify that elements be
|
with an implicit AND. It is possible to specify that elements be
|
||||||
OR'ed instead, as in <replaceable>Beatles</replaceable>
|
OR'ed instead, as in <replaceable>Beatles</replaceable>
|
||||||
@@ -817,8 +794,54 @@ fvwm
|
|||||||
<para>An entry preceded by a <literal>-</literal> specifies a
|
<para>An entry preceded by a <literal>-</literal> specifies a
|
||||||
term that should <emphasis>not</emphasis> appear.</para>
|
term that should <emphasis>not</emphasis> appear.</para>
|
||||||
|
|
||||||
|
<para>The first element in the above example,
|
||||||
|
<literal>author:"john doe"</literal> is a phrase search limited
|
||||||
|
to a specific field. Phrase searches are specified as usual by
|
||||||
|
enclosing the words in double quotes. The field specification
|
||||||
|
appears before the colon (of course this is not limited to
|
||||||
|
phrases, <literal>author:Balzac</literal> would be ok
|
||||||
|
too). &RCL; currently manages the following fields:</para>
|
||||||
|
<itemizedlist>
|
||||||
|
<listitem><para><literal>title</literal>,
|
||||||
|
<literal>subject</literal> or <literal>caption</literal> are
|
||||||
|
synonyms which specify data to be searched for in the
|
||||||
|
document title or subject.</para>
|
||||||
|
</listitem>
|
||||||
|
<listitem><para><literal>author</literal> or
|
||||||
|
<literal>from</literal> for searching the document's originators.</para>
|
||||||
|
</listitem>
|
||||||
|
<listitem><para><literal>keyword</literal> for searching the
|
||||||
|
document-specified keywords (few documents actually have any).</para>
|
||||||
|
</listitem>
|
||||||
|
</itemizedlist>
|
||||||
|
|
||||||
|
<para>As of release 1.9, filters can create other fields
|
||||||
|
with arbitrary names. No standard filter uses this
|
||||||
|
capability yet.</para>
|
||||||
|
|
||||||
|
<para>There are two other elements which may be specified
|
||||||
|
through the field syntax, but are somewhat special:</para>
|
||||||
|
<itemizedlist>
|
||||||
|
<listitem><para><literal>ext</literal> for specifying the file
|
||||||
|
name extension (Ex: <literal>ext:html</literal>)</para>
|
||||||
|
</listitem>
|
||||||
|
<listitem><para><literal>mime</literal> for specifying the
|
||||||
|
mime type. This one is quite special because you can specify
|
||||||
|
several values which will be OR'ed (the normal default for the
|
||||||
|
language is AND). Ex: <literal>mime:text/plain
|
||||||
|
mime:text/html</literal>. Specifying an explicit boolean
|
||||||
|
operator or negation (<literal>-</literal>) before a
|
||||||
|
<literal>mime</literal> specification is not supported and
|
||||||
|
will produce strange results.</para>
|
||||||
|
</listitem>
|
||||||
|
</itemizedlist>
|
||||||
|
<para>The query language is currently the only way to use the
|
||||||
|
&RCL; field search capability.</para>
|
||||||
|
|
||||||
<para>Words inside phrases and capitalized words are not
|
<para>Words inside phrases and capitalized words are not
|
||||||
stem-expanded. Wildcards may be used anywhere.</para>
|
stem-expanded. Wildcards may be used anywhere inside a term.
|
||||||
|
Specifying a wildcard at the start of a term can produce a very
|
||||||
|
slow search.</para>
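As a rough illustration of the field syntax described above, the following Python sketch splits a query string into field and value clauses and collects the `mime:` values separately, since those are OR'ed together while everything else is AND'ed. This is not Recoll's parser: the function names `parse_query` and `split_mime` are hypothetical, quoting is delegated to Python's `shlex`, and explicit OR, negation and wildcards are passed through untouched.

```python
import shlex


def parse_query(query):
    """Split a query into (field, value) pairs; field is None for plain terms."""
    clauses = []
    # shlex honours double quotes, so author:"john doe" stays one token.
    for token in shlex.split(query):
        field, sep, value = token.partition(":")
        if sep and field and value:
            clauses.append((field, value))
        else:
            clauses.append((None, token))
    return clauses


def split_mime(clauses):
    """Separate the mime values (OR'ed together) from the AND'ed clauses."""
    mime_types = [value for field, value in clauses if field == "mime"]
    others = [(field, value) for field, value in clauses if field != "mime"]
    return others, mime_types


clauses = parse_query('author:"john doe" ext:html mime:text/plain mime:text/html')
others, mime_types = split_mime(clauses)
print(others)
print(mime_types)
```

The sketch drops the distinction between a quoted phrase and a single word, which a real implementation would need to keep because phrases are not stem-expanded.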
|
||||||
|
|
||||||
<para>You can use the <literal>show query</literal> link at the
|
<para>You can use the <literal>show query</literal> link at the
|
||||||
top of the result list to check the exact query which was
|
top of the result list to check the exact query which was
|
||||||
@@ -2089,8 +2112,47 @@ skippedPaths = ~/somedir/*.txt
|
|||||||
will be given a file name as argument and should output the
|
will be given a file name as argument and should output the
|
||||||
text contents in html format on the standard output.</para>
|
text contents in html format on the standard output.</para>
|
||||||
|
|
||||||
<para>The html could be very minimal like the following
|
<para>You can find more details about writing a &RCL; filter
|
||||||
|
in the <link linkend="rcl.extending.filters">section about
|
||||||
|
writing filters</link>.</para>
|
||||||
|
</sect3>
|
||||||
|
|
||||||
|
</sect2>
|
||||||
|
|
||||||
|
</sect1>
|
||||||
|
|
||||||
|
<sect1 id="rcl.extending">
|
||||||
|
<title>Extending &RCL;</title>
|
||||||
|
|
||||||
|
<sect2 id="rcl.extending.filters">
|
||||||
|
<title>Writing a document filter</title>
|
||||||
|
|
||||||
|
<para>&RCL; filters are executable programs which
|
||||||
|
translate from a specific format (e.g.
|
||||||
|
<application>openoffice</application>,
|
||||||
|
<application>acrobat</application>, etc.) to the &RCL;
|
||||||
|
indexing input format, which was chosen to be HTML.</para>
|
||||||
|
|
||||||
|
<para>&RCL; filters are usually shell scripts, but this is in
|
||||||
|
no way necessary. These programs are extremely simple and most
|
||||||
|
of the difficulty lies in extracting the text from the native
|
||||||
|
format, not outputting what is expected by &RCL;. Happily
|
||||||
|
enough, most document formats already have translators or text
|
||||||
|
extractors which handle the difficult part and can be called
|
||||||
|
from the filter.</para>
|
||||||
|
|
||||||
|
<para>Filters are called with a single argument which is the
|
||||||
|
source file name. They should output the result to stdout.</para>
|
||||||
|
|
||||||
|
<para>The <literal>RECOLL_FILTER_FORPREVIEW</literal>
|
||||||
|
environment variable (values <literal>yes</literal>,
|
||||||
|
<literal>no</literal>) tells the filter if the operation is
|
||||||
|
for indexing or previewing. Some filters use this to output a
|
||||||
|
slightly different format. This is not essential.</para>
|
||||||
|
|
||||||
|
<para>The output HTML could be very minimal like the following
|
||||||
example:</para>
|
example:</para>
|
||||||
|
|
||||||
<programlisting><html><head>
|
<programlisting><html><head>
|
||||||
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
|
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
|
||||||
</head>
|
</head>
|
||||||
@@ -2110,15 +2172,31 @@ skippedPaths = ~/somedir/*.txt
|
|||||||
|
|
||||||
<para>&RCL; will also make use of other header fields if
|
<para>&RCL; will also make use of other header fields if
|
||||||
they are present: <literal>title</literal>,
|
they are present: <literal>title</literal>,
|
||||||
<literal>description</literal>, <literal>keywords</literal>.
|
<literal>description</literal>,
|
||||||
<para>
|
<literal>keywords</literal>.</para>
|
||||||
|
|
||||||
|
<para>As of &RCL; release 1.9, filters also have the
|
||||||
|
possibility to "invent" field names. These should be output as
|
||||||
|
meta tags:</para>
|
||||||
|
|
||||||
|
<programlisting>
|
||||||
|
<meta name="somefield" content="Some textual data" />
|
||||||
|
</programlisting>
|
||||||
|
|
||||||
|
<para>In this case, a correspondence between field name and
|
||||||
|
&XAP; prefix should also be added to the
|
||||||
|
<filename>mimeconf</filename> file. See the existing entries
|
||||||
|
for inspiration. The field can then be used inside the query
|
||||||
|
language to narrow searches.</para>
|
||||||
|
|
||||||
<para>The easiest way to write a new filter is probably to start
|
<para>The easiest way to write a new filter is probably to start
|
||||||
from an existing one.</para>
|
from an existing one.</para>
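The filter contract described above (one file name argument, HTML on standard output, optional title and meta fields, the `RECOLL_FILTER_FORPREVIEW` variable) can be sketched in Python as follows. This is a minimal illustration for plain text input, not one of the filters shipped with Recoll: the helper names `to_html` and `main` are hypothetical, `somefield` is the placeholder field name used in the manual, and the choice of `<pre>` for previews is an assumption made for the example.

```python
import html
import os
import sys


def to_html(text, title="", fields=None, for_preview=False):
    """Wrap extracted text in minimal HTML with optional title and meta fields."""
    head = ['<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">']
    if title:
        head.append("<title>%s</title>" % html.escape(title))
    for name, content in (fields or {}).items():
        head.append('<meta name="%s" content="%s">'
                    % (html.escape(name), html.escape(content)))
    # Assumption for this sketch: keep line breaks for previews only.
    tag = "pre" if for_preview else "p"
    body = "<%s>%s</%s>" % (tag, html.escape(text), tag)
    return "<html><head>\n%s\n</head>\n<body>%s</body></html>\n" % (
        "\n".join(head), body)


def main(argv):
    """Filter entry point: one argument (the source file), HTML on stdout."""
    if len(argv) != 2:
        sys.stderr.write("usage: %s FILE\n" % argv[0])
        return 1
    for_preview = os.environ.get("RECOLL_FILTER_FORPREVIEW") == "yes"
    with open(argv[1], encoding="utf-8", errors="replace") as source:
        text = source.read()
    page = to_html(text, os.path.basename(argv[1]),
                   {"somefield": "Some textual data"}, for_preview)
    # Write bytes so the output really is UTF-8, as the meta tag declares.
    sys.stdout.buffer.write(page.encode("utf-8"))
    return 0
```

A script built on this sketch would end with `sys.exit(main(sys.argv))`. For a real document format, the `open(...).read()` step would be replaced by a call to an external text extractor.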
|
||||||
</sect3>
|
|
||||||
|
|
||||||
</sect2>
|
</sect2>
|
||||||
|
|
||||||
</sect1>
|
</sect1>
|
||||||
|
|
||||||
</chapter>
|
</chapter>
|
||||||
|
|
||||||
</book>
|
</book>
|
||||||
|
|||||||
@@ -4,10 +4,21 @@ Bugs that are listed in an older version section are supposedly fixed in
|
|||||||
later versions. Bugs listed in the topmost section may also exist in older
|
later versions. Bugs listed in the topmost section may also exist in older
|
||||||
versions.
|
versions.
|
||||||
|
|
||||||
Latest (1.8.1):
|
Latest (1.8.2):
|
||||||
|
- There are a few problems in the qt4 version of recoll: some accelerators
|
||||||
|
(esc-spc, ctl-arrow) do not work, neither do copy/paste between the
|
||||||
|
result list and preview windows and x11 applications.
|
||||||
- The dates shown for email attachments in a result list are the email
|
- The dates shown for email attachments in a result list are the email
|
||||||
folder modification date. This should be inherited from the parent
|
folder modification date. This should be inherited from the parent
|
||||||
message instead.
|
message instead.
|
||||||
|
- There are sometimes problems with document deletions: the index can
|
||||||
|
get in a state where deleted or moved documents are not purged from the
|
||||||
|
index (the log file says that the docs are deleted, but they aren't
|
||||||
|
actually). When this happens, the only solution currently is to reindex
|
||||||
|
from scratch (recollindex -z). This is due to a xapian bug, which will be
|
||||||
|
fixed in a future release. You can apply the following patch to xapian
|
||||||
|
1.0.1 to fix it:
|
||||||
|
http://www.lesbonscomptes.com/recoll/xapian/xapian-delete-document.patch
|
||||||
- NEAR crashes: 1.6 has added NEAR searches. Unlike what recoll did
|
- NEAR crashes: 1.6 has added NEAR searches. Unlike what recoll did
|
||||||
with PHRASES, stemming expansion is performed on terms inside NEAR
|
with PHRASES, stemming expansion is performed on terms inside NEAR
|
||||||
clauses (except if prevented by a capitalized entry of course). There is
|
clauses (except if prevented by a capitalized entry of course). There is
|
||||||
@@ -39,9 +50,9 @@ Latest (1.8.1):
|
|||||||
compressed (ie: xxx.txt.gz), recoll will try to start the external viewer
|
compressed (ie: xxx.txt.gz), recoll will try to start the external viewer
|
||||||
on the compressed file, which will not work in most cases.
|
on the compressed file, which will not work in most cases.
|
||||||
|
|
||||||
- There are problems which have been reported indexing big mailstores
|
- Problems have been reported indexing big mailstores (several hundreds of
|
||||||
(several hundreds of thousands of messages): resulting in a very big
|
thousands of messages), resulting in a very big database and even
|
||||||
database and even crashes during indexation.
|
crashes.
|
||||||
|
|
||||||
- Under some versions of KDE (ie: Fedora FC5 KDE 3.5.4-0.5.fc5), there is a
|
- Under some versions of KDE (ie: Fedora FC5 KDE 3.5.4-0.5.fc5), there is a
|
||||||
problem with the window stacking order. Opening the "browse" file
|
problem with the window stacking order. Opening the "browse" file
|
||||||
|
|||||||
@@ -1,5 +1,31 @@
|
|||||||
CHANGES
|
CHANGES
|
||||||
|
|
||||||
|
1.9.0
|
||||||
|
- Add option to remember sort tool state between program invocations (it is
|
||||||
|
reset to inactive by default)
|
||||||
|
- Improve qt4 build: no more need for --enable-qt4
|
||||||
|
- Fixed a number of qt4 glitches: selection and keyboard shortcuts.
|
||||||
|
- When searching for an empty string inside the preview window, position
|
||||||
|
the window to the next occurrence of the primary search terms.
|
||||||
|
- Have email attachments inherit date and author from their parent message
|
||||||
|
- Added an adjustable flush threshold during indexing: should help control
|
||||||
|
memory usage. See the idxflushmb configuration parameter.
|
||||||
|
- Added a check for file system free space. Indexing will stop if the
|
||||||
|
threshold is reached. See the maxfsoccuppc configuration parameter.
|
||||||
|
- Fix bus error on rclmon exit
|
||||||
|
- Better handle aspell errors inside rclmon
|
||||||
|
- Added File menu entry to erase document history.
|
||||||
|
- Added ext: and mime: selectors to the query language.
|
||||||
|
- Added support for arbitrary fields. Filters can now produce any number of
|
||||||
|
fields which will be selectively searchable through the query language.
|
||||||
|
- Added abiword and kword support.
|
||||||
|
- Contributed filter: rcljpeg. This should be extended to use the new field
|
||||||
|
support.
|
||||||
|
- Changed the icon to an ugly one. The previous one was nicer but looked
|
||||||
|
too much like Xapian's.
|
||||||
|
- Added some kind of support for a stopword list.
|
||||||
|
- Bound space and backspace to PgUp/PgDown in preview.
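The file system free space check listed above can be pictured with the short Python sketch below. It is only an illustration under stated assumptions: that `maxfsoccuppc` is an occupation percentage (as its name suggests) and that a value of 0 disables the check. The function names are hypothetical and Recoll's own implementation may compute occupation differently.

```python
import shutil


def fs_occupation_percent(path):
    """Return the percentage of the file system holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def should_stop_indexing(path, max_occupation_pc):
    """True when occupation reaches the threshold; 0 disables the check (assumed)."""
    if max_occupation_pc <= 0:
        return False
    return fs_occupation_percent(path) >= max_occupation_pc


print("occupation: %.1f%%" % fs_occupation_percent("."))
```

An indexer would call `should_stop_indexing` with the index directory before each flush, which ties in with the `idxflushmb` threshold mentioned in the same list.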
|
||||||
|
|
||||||
1.8.2 2007-05-19
|
1.8.2 2007-05-19
|
||||||
- Fixed method name for compatibility with xapian 1.0.0
|
- Fixed method name for compatibility with xapian 1.0.0
|
||||||
- Add .beagle to default list of skipped names (avoids indexing beagle
|
- Add .beagle to default list of skipped names (avoids indexing beagle
|
||||||
|
|||||||
@@ -38,7 +38,7 @@
|
|||||||
<p>First of all, many thanks to the users who provided criticism
|
<p>First of all, many thanks to the users who provided criticism
|
||||||
and ideas to make <span class="application">Recoll</span> go
|
and ideas to make <span class="application">Recoll</span> go
|
||||||
forward ! Please
|
forward ! Please
|
||||||
<a href="mailto:jean-francois.dockes@wanadoo.fr>
|
<a href="mailto:jean-francois.dockes@wanadoo.fr">
|
||||||
contact me</a> if you have something to suggest.</p>
|
contact me</a> if you have something to suggest.</p>
|
||||||
|
|
||||||
<p><span class="application">Recoll</span> borrows
|
<p><span class="application">Recoll</span> borrows
|
||||||
|
|||||||
@@ -30,15 +30,23 @@
|
|||||||
|
|
||||||
<div class="content">
|
<div class="content">
|
||||||
|
|
||||||
<h1>Recoll user manuals</h1>
|
<h1>Recoll user manual</h1>
|
||||||
|
|
||||||
<blockquote>
|
|
||||||
<ul>
|
<ul>
|
||||||
<li><a href="usermanual/index.html">English</a></li>
|
<li><a href="usermanual/index.html">English</a></li>
|
||||||
<li><a href="http://mcz.altervista.org/Pagine/usermanual-italian.html">
|
<li><a href="http://mcz.altervista.org/Pagine/usermanual-italian.html">
|
||||||
Italian</a></li>
|
Italian</a></li>
|
||||||
</ul>
|
</ul>
|
||||||
</blockquote>
|
|
||||||
|
<p><br></p>
|
||||||
|
|
||||||
|
<h1>Other documentation</h1>
|
||||||
|
|
||||||
|
<ul>
|
||||||
|
<li><a href="perfs.html">Index size and indexing performance
|
||||||
|
data.</a></li>
|
||||||
|
</ul>
|
||||||
|
|
||||||
|
|
||||||
</div>
|
</div>
|
||||||
</body>
|
</body>
|
||||||
|
|||||||
@@ -24,7 +24,7 @@
|
|||||||
<ul>
|
<ul>
|
||||||
<li><a href="index.html">Home</a></li>
|
<li><a href="index.html">Home</a></li>
|
||||||
<li><b>Downloads</b></li>
|
<li><b>Downloads</b></li>
|
||||||
<li><a href="usermanual/index.html">User manual</a></li>
|
<li><a href="doc.html">Documentation</a></li>
|
||||||
<li><a href="usermanual/rcl.install.html">Installation</a></li>
|
<li><a href="usermanual/rcl.install.html">Installation</a></li>
|
||||||
<li><a href="index.html#support">Support</a></li>
|
<li><a href="index.html#support">Support</a></li>
|
||||||
</ul>
|
</ul>
|
||||||
@@ -47,6 +47,8 @@
|
|||||||
</table>
|
</table>
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
|
<h2><a name="source">General information</a></h2>
|
||||||
|
|
||||||
<p>You will probably need to have a look at the
|
<p>You will probably need to have a look at the
|
||||||
<a href="usermanual/rcl.install.html">installation manual</a> for
|
<a href="usermanual/rcl.install.html">installation manual</a> for
|
||||||
building and/or installation instructions.</p>
|
building and/or installation instructions.</p>
|
||||||
@@ -68,12 +70,17 @@
|
|||||||
<a href="usermanual/index.html#RCL.INSTALL.EXTERNAL">list</a> to
|
<a href="usermanual/index.html#RCL.INSTALL.EXTERNAL">list</a> to
|
||||||
decide what you may want to install.</p>
|
decide what you may want to install.</p>
|
||||||
|
|
||||||
|
<p>In addition, optional functionality in Recoll (the term explorer
|
||||||
|
tool in phonetic mode) uses the <b>aspell</b> package. The
|
||||||
|
installed version should be at least 0.60 (utf-8 support) for
|
||||||
|
this to run smoothly. This function is far from essential.</p>
|
||||||
|
|
||||||
<p>If you find problems with the package or its
|
<p>If you find problems with the package or its
|
||||||
installation, <em>please</em>
|
installation, <em>please</em>
|
||||||
<a href="mailto:jean-francois.dockes@wanadoo.fr">
|
<a href="mailto:jean-francois.dockes@wanadoo.fr">
|
||||||
report them</a>.</p>
|
report them</a>.</p>
|
||||||
|
|
||||||
<h4>What do the release numbers mean?</h4>
|
<h3>What do the release numbers mean?</h3>
|
||||||
|
|
||||||
<p>The Recoll releases are numbered X.Y.Z. </p>
|
<p>The Recoll releases are numbered X.Y.Z. </p>
|
||||||
|
|
||||||
@@ -110,7 +117,16 @@
|
|||||||
1.8.2 was released purely for fixing a small issue of
|
1.8.2 was released purely for fixing a small issue of
|
||||||
compatibility with xapian 1.0.0 and small config/install
|
compatibility with xapian 1.0.0 and small config/install
|
||||||
glitches. There is no functional reason to upgrade from
|
glitches. There is no functional reason to upgrade from
|
||||||
1.8.1, (or update packages).
|
1.8.1, (or update packages).</p>
|
||||||
|
|
||||||
|
<p>Recoll 1.8.2 is the first release that will let you take
|
||||||
|
advantage of the new Xapian 1.0, the main user-visible change
|
||||||
|
of which is the new default index format. In order to take
|
||||||
|
advantage of the new format (which is not mandatory) Recoll
|
||||||
|
users updating from an older release need to delete their old
|
||||||
|
index. There are <a
|
||||||
|
href="usermanual/usermanual.html#RCL.INDEXING.STORAGE.FORMAT">more
|
||||||
|
details in the user manual</a>.</p>
|
||||||
|
|
||||||
<p>Older recoll releases:
|
<p>Older recoll releases:
|
||||||
<a href="recoll-1.8.1.tar.gz">1.8.1</a>
|
<a href="recoll-1.8.1.tar.gz">1.8.1</a>
|
||||||
@@ -128,8 +144,8 @@
|
|||||||
<h2><a name="rpms">Packages</a></h2>
|
<h2><a name="rpms">Packages</a></h2>
|
||||||
|
|
||||||
<p>The executables inside the binary rpms have a static link to
|
<p>The executables inside the binary rpms have a static link to
|
||||||
xapian, there is no dependency except Qt 3.3. Of course you need
|
xapian 0.9.x; there is no dependency except Qt 3.3. Of course
|
||||||
xapian-core installed to use the source rpm. </p>
|
you need xapian-core installed to use the source rpm. </p>
|
||||||
|
|
||||||
<p><b>Fedora Core</b>
|
<p><b>Fedora Core</b>
|
||||||
FC6 RPM:
|
FC6 RPM:
|
||||||
@@ -168,10 +184,16 @@
|
|||||||
<a href="debian/edgy/">debian/edgy</a>
|
<a href="debian/edgy/">debian/edgy</a>
|
||||||
</p>
|
</p>
|
||||||
|
|
||||||
|
<p><b>Ubuntu 6.06 dapper</b> (the feisty version does not work
|
||||||
|
on dapper). This is statically linked against xapian 0.9.10:
|
||||||
|
<a href="debian/dapper/recoll_1.8.2-0ubuntu1_i386.deb">
|
||||||
|
recoll_1.8.2-0ubuntu1_i386.deb</a> </p>
|
||||||
|
|
||||||
<p><b>Debian unstable</b> Recoll is in the package repository,
|
<p><b>Debian unstable</b> Recoll is in the package repository,
|
||||||
you can install it with the usual <em>apt-get install
|
you can install it with the usual <em>apt-get install
|
||||||
recoll</em>. <a
|
recoll</em>. <a
|
||||||
href="http://packages.qa.debian.org/r/recoll.html">Package page</a></p>
|
href="http://packages.qa.debian.org/r/recoll.html">
|
||||||
|
Package page</a></p>
|
||||||
|
|
||||||
<p><b>Debian 3.1</b> Thanks to Mario (<img align="top" src="mario.png">)
|
<p><b>Debian 3.1</b> Thanks to Mario (<img align="top" src="mario.png">)
|
||||||
for these: i386:
|
for these: i386:
|
||||||
|
|||||||
@@ -142,6 +142,7 @@
|
|||||||
</dd>
|
</dd>
|
||||||
</ul>
|
</ul>
|
||||||
|
|
||||||
|
|
||||||
<h2><a name="#stemming"></a>Stemming</h2>
|
<h2><a name="#stemming"></a>Stemming</h2>
|
||||||
|
|
||||||
<p>Stemming is a process which transforms inflected words into
|
<p>Stemming is a process which transforms inflected words into
|
||||||
|
|||||||
205
website/fr/features.html
Normal file
@@ -0,0 +1,205 @@
|
|||||||
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
|
||||||
|
|
||||||
|
<html>
|
||||||
|
<head>
|
||||||
|
<title>RECOLL: un outil personnel de recherche textuelle pour
|
||||||
|
Unix et Linux</title>
|
||||||
|
<meta name="generator" content="HTML Tidy, see www.w3.org">
|
||||||
|
<meta name="Author" content="Jean-Francois Dockes">
|
||||||
|
<meta name="Description" content=
|
||||||
|
"recoll est un logiciel personnel de recherche textuelle pour unix et linux basé sur Xapian, un moteur d'indexation puissant et mature.">
|
||||||
|
<meta name="Keywords" content=
|
||||||
|
"recherche textuelle,desktop,unix,linux,solaris,open source,free">
|
||||||
|
<meta http-equiv="Content-language" content="fr">
|
||||||
|
<meta http-equiv="content-type" content=
|
||||||
|
"text/html; charset=iso-8859-1">
|
||||||
|
<meta name="robots" content="All,Index,Follow">
|
||||||
|
<link type="text/css" rel="stylesheet" href="../styles/style.css">
|
||||||
|
</head>
|
||||||
|
|
||||||
|
<body>
|
||||||
|
|
||||||
|
<div class="rightlinks">
|
||||||
|
<ul>
|
||||||
|
<li><a href="../index.html">Base</a></li>
|
||||||
|
<li><a href="../pics/index.html">Copies d'écrans</a></li>
|
||||||
|
<li><a href="../download.html">Téléchargements</a></li>
|
||||||
|
<li><a href="../manuals.html">Documentation</a></li>
|
||||||
|
<li><a href="../index.html#support">Support</a></li>
|
||||||
|
<li><a href="../devel.html">Développement</a></li>
|
||||||
|
</ul>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<div class="content">
|
||||||
|
|
||||||
|
<h1 class="intro">Caractéristiques de Recoll</h1>
|
||||||
|
|
||||||
|
<dl>
|
||||||
|
<dt><a name="systems">Systèmes</a></dt>
|
||||||
|
<dd><span class="application">Recoll</span> a été compilé et
|
||||||
|
testé sur FreeBSD, Linux, Darwin, Solaris (versions
|
||||||
|
FreeBSD 5.5, Fedora Core 5, Suse 10.1, Gentoo,
|
||||||
|
Debian 3.1, Ubuntu Edgy, Solaris 8/9, mais d'autres versions
|
||||||
|
récentes conviennent sans doute également).</dd>
|
||||||
|
|
||||||
|
<dd>Versions de QT: 3.2, 3.3 et 4.2</dd>
|
||||||
|
|
||||||
|
<dt><a name="doctypes">Types de documents</a></dt>
|
||||||
|
<dd>Recoll peut traiter les types de documents suivants, ainsi
|
||||||
|
que des fichiers compressés du même type:
|
||||||
|
|
||||||
|
<dl>
|
||||||
|
<dt>En interne</dt>
|
||||||
|
|
||||||
|
<dd>
|
||||||
|
<ul>
|
||||||
|
<li><var class="literal">text</var>.</li>
|
||||||
|
|
||||||
|
<li><var class="literal">html</var>.</li>
|
||||||
|
|
||||||
|
<li><span class="application">OpenOffice</span>
|
||||||
|
(avec l'aide de la commande <b>unzip</b>).</li>
|
||||||
|
|
||||||
|
<li><var class="literal">maildir</var> et <var
|
||||||
|
class="literal">mailbox</var> (<span class=
|
||||||
|
"application">Mozilla</span>, <span class=
|
||||||
|
"application">Thunderbird</span>, <span class=
|
||||||
|
"application">Evolution</span> et sans doute
|
||||||
|
d'autres).</li>
|
||||||
|
|
||||||
|
<li>Fichiers de conversation <span class="application">
|
||||||
|
gaim</span>.</li>
|
||||||
|
|
||||||
|
<li><span class="application">Scribus</span>.</li>
|
||||||
|
|
||||||
|
</ul>
|
||||||
|
</dd>
|
||||||
|
|
||||||
|
<dt>With external helpers</dt>
|
||||||
|
|
||||||
|
<dd>
|
||||||
|
<ul>
|
||||||
|
<li><var class="literal">pdf</var> avec <a href=
|
||||||
|
"http://www.foolabs.com/xpdf/">xpdf</a>.</li>
|
||||||
|
|
||||||
|
<li><var class="literal">postscript</var> avec
|
||||||
|
<a href="http://www.gnu.org/software/ghostscript/ghostscript.html">
|
||||||
|
ghostscript</a> et
|
||||||
|
<a href="http://www.cs.wisc.edu/~ghost/doc/pstotext.htm">
|
||||||
|
pstotext</a>.</li>
|
||||||
|
|
||||||
|
<li>Fichiers <span class="application">Lyx</span>
|
||||||
|
(nécessite l'application
|
||||||
|
<span class="application">Lyx</span>).</li>
|
||||||
|
|
||||||
|
<li><span class="application">msword</span> avec <a href=
|
||||||
|
"http://www.winfield.demon.nl/">antiword</a>.</li>
|
||||||
|
|
||||||
|
<li><span class="application">Powerpoint</span> et
|
||||||
|
<span class="application">Excel</span> avec les utilitaires
|
||||||
|
<a href="http://www.45.free.net/~vitus/software/catdoc/">
|
||||||
|
catdoc</a>.</li>
|
||||||
|
|
||||||
|
<li><var class="literal">rtf</var> avec <a href=
|
||||||
|
"http://www.gnu.org/software/unrtf/unrtf.html">unrtf</a>.</li>
|
||||||
|
|
||||||
|
<li><var class="literal">dvi</var> avec
|
||||||
|
<a href="http://www.radicaleye.com/dvips.html">dvips</a>.
|
||||||
|
</li>
|
||||||
|
|
||||||
|
<li><var class="literal">djvu</var> avec
|
||||||
|
<a href="http://djvulibre.djvuzone.org/doc/index.html">
|
||||||
|
DjVuLibre</a>. </li>
|
||||||
|
|
||||||
|
<li>Tags <var class="literal">mp3</var> avec
|
||||||
|
<a href="http://id3lib.sourceforge.net/">
|
||||||
|
id3info (id3lib)</a>. </li>
|
||||||
|
|
||||||
|
</ul>
|
||||||
|
</dd>
|
||||||
|
</dl>
|
||||||
|
</dd>
|
||||||
|
|
||||||
|
<dt>Autres caractéristiques</dt>
|
||||||
|
<dd>
|
||||||
|
<ul>
|
||||||
|
<li>Index multiples interrogeables ensemble ou séparément.</li>
|
||||||
|
|
||||||
|
<li>Fonctions de recherche puissantes, avec expressions
|
||||||
|
booléennes, phrases et proximité, caractères jokers,
|
||||||
|
filtrage sur les types de fichiers ou l'emplacement.</li>
|
||||||
|
|
||||||
|
<li>Fonction spécifique de recherche de noms de fichiers.</li>
|
||||||
|
|
||||||
|
<li>Support de jeux de caractères multiples. Les traitements
|
||||||
|
internes et l'index utilisent l'encodage Unicode UTF-8.</li>
|
||||||
|
|
||||||
|
<li>L'extraction des racines de mots <a href="#Stemming">
|
||||||
|
Stemming</a> est effectuée au moment de la recherche
|
||||||
|
(permet de changer de langue après l'indexation).</li>
|
||||||
|
|
||||||
|
<li>Installation facile. Pas de processus permanent, de
|
||||||
|
serveur web ou environnement exotique.</li>
|
||||||
|
|
||||||
|
<li>Un indexeur qui peut fonctionner soit comme un
|
||||||
|
processus léger dans l'interface de consultation, comme un
|
||||||
|
programme batch externe intégrable par
|
||||||
|
<span class="application">cron</span>, ou comme un processus
|
||||||
|
permanent pour l'indexation au fil de l'eau.</li>
|
||||||
|
|
||||||
|
</ul>
|
||||||
|
</dd>
|
||||||
|
</dl>
|
||||||
|
|
||||||
|
<h2><a name="#stemming"></a>Stemming</h2>

<p><em>Note: I would welcome a pleasant French translation
for "stemming".</em></p>
<p>Stemming reduces a derived word to its root.
For example, <i>aimer</i>, <i>aimerai</i>, <i>aimait</i>,
<i>aimez</i>, etc. would all be reduced to <i>aim</i> in
French. A search for any one of the derived forms can then
automatically be extended to all the others.</p>
|
||||||
|
|
||||||
|
<p>Some search engines apply the transformation
at indexing time. The index stores only word
stems, with exceptions for terms that are recognized
as proper nouns (capitalization). At query
time, the query terms are also stemmed
before being compared with the index.</p>

<p>This approach yields a smaller index, but it irrevocably
discards information during indexing.</p>
|
||||||
|
|
||||||
|
<p>Recoll works differently. Terms are indexed without
transformation. The resulting index is larger, which
probably matters little in an era of 100 GB
disks mostly filled with
<em>unindexed</em> multimedia data.</p>
|
||||||
|
|
||||||
|
<p>At the end of indexing, Recoll builds one or more
stemming dictionaries (for different languages), in which
every stem is listed with its possible
expansions.</p>
|
||||||
|
|
||||||
|
|
||||||
|
<p>At query time, by default, the user's
terms are stemmed, then expanded to their derived forms
using the dictionary.
The results are similar to those of
the other method. The advantage is that expansion can be
controlled at query time:</p>
<ul>
<li>It can be disabled for any query
term by capitalizing its first letter (for example,
<em>Aime</em> to search for the town of Aime la
Plagne). </li>
<li>The stemming language can also be changed,
provided that several stemming dictionaries
were built during indexing.</li>
</ul>
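<p>The query-time expansion described above can be sketched as follows. This is only an illustration, not Recoll's actual code: <code>toy_stem</code> is a hypothetical stand-in for a real stemmer, and the function names are assumptions made for the example.</p>

```python
from collections import defaultdict


def toy_stem(word):
    """Hypothetical stand-in stemmer: strips a few French verb endings."""
    for suffix in ("erai", "ait", "ez", "er", "e"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def build_stem_dictionary(index_terms, stem=toy_stem):
    """Built once after indexing: map each stem to the raw terms sharing it."""
    dictionary = defaultdict(set)
    for term in index_terms:
        dictionary[stem(term)].add(term)
    return dictionary


def expand_query_term(term, dictionary, stem=toy_stem):
    """Expand a user term to its derived forms at query time.

    A leading capital letter disables expansion for that term.
    """
    lowered = term.lower()
    if term[:1].isupper():
        return {lowered}
    return dictionary.get(stem(lowered), set()) | {lowered}


terms = ["aimer", "aimerai", "aimait", "aimez", "aime"]
stem_dictionary = build_stem_dictionary(terms)
print(sorted(expand_query_term("aimez", stem_dictionary)))
print(sorted(expand_query_term("Aime", stem_dictionary)))
```

<p>Because the raw terms stay in the index, switching to another language only means consulting a different dictionary; nothing has to be reindexed.</p>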
|
||||||
|
|
||||||
|
</div>
|
||||||
|
</body>
|
||||||
|
</html>
|
||||||
|
|
||||||
@ -81,6 +81,16 @@
|
|||||||
<li><a class="weak" href="features.html">(more detail)</a></li>
|
<li><a class="weak" href="features.html">(more detail)</a></li>
|
||||||
</ul>
|
</ul>
|
||||||
|
|
||||||
|
|
||||||
|
<h2>News: </h2>
|
||||||
|
<p>There are new filters for
|
||||||
|
<span class="application">kword</span> and
|
||||||
|
<span class="application">abiword</span> files in the
|
||||||
|
<a href="filters/filters.html">new filters section</a>. These
|
||||||
|
are usable with an existing <span
|
||||||
|
class="application">Recoll</span> 1.8 installation.</p>
|
||||||
|
|
||||||
|
|
||||||
<h2><a name="support">Support</a></h2>
|
<h2><a name="support">Support</a></h2>
|
||||||
|
|
||||||
<p>If you have any problem with Recoll, its
|
<p>If you have any problem with Recoll, its
|
||||||
|
|||||||
@ -97,6 +97,15 @@
|
|||||||
|
|
||||||
</ul>
|
</ul>
|
||||||
|
|
||||||
|
<h2>News: </h2>
<p>There are new indexing filters for
<span class="application">kword</span> and
<span class="application">abiword</span> files. They can be downloaded
from the <a href="filters/filters.html">new filters
area</a>, and are usable with an existing
<span class="application">Recoll</span> 1.8 installation.</p>
|
||||||
|
|
||||||
|
|
||||||
<h2><a name="support">Support</a></h2>
|
<h2><a name="support">Support</a></h2>
|
||||||
|
|
||||||
<p>Si vous avez un problème quelconque avec le logiciel ou son
|
<p>Si vous avez un problème quelconque avec le logiciel ou son
|
||||||
|
|||||||
BIN
website/mario.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 1.8 KiB |
114
website/perfs.html
Normal file
@ -0,0 +1,114 @@
|
|||||||
|
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
|
||||||
|
|
||||||
|
<html>
|
||||||
|
<head>
|
||||||
|
<title>RECOLL: a personal text search system for
|
||||||
|
Unix/Linux</title>
|
||||||
|
<meta name="generator" content="HTML Tidy, see www.w3.org">
|
||||||
|
<meta name="Author" content="Jean-Francois Dockes">
|
||||||
|
<meta name="Description" content=
|
||||||
|
"recoll is a simple full-text search system for unix and linux based on the powerful and mature xapian engine">
|
||||||
|
<meta name="Keywords" content=
|
||||||
|
"full text search,fulltext,desktop search,unix,linux,solaris,open source,free">
|
||||||
|
<meta http-equiv="Content-language" content="en">
|
||||||
|
<meta http-equiv="content-type" content=
|
||||||
|
"text/html; charset=iso-8859-1">
|
||||||
|
<meta name="robots" content="All,Index,Follow">
|
||||||
|
<link type="text/css" rel="stylesheet" href="styles/style.css">
|
||||||
|
</head>
|
||||||
|
|
||||||
|
<body>
|
||||||
|
|
||||||
|
<div class="rightlinks">
|
||||||
|
<ul>
|
||||||
|
<li><a href="index.html">Home</a></li>
|
||||||
|
<li><a href="pics/index.html">Screenshots</a></li>
|
||||||
|
<li><a href="download.html">Downloads</a></li>
|
||||||
|
<li><a href="doc.html">Documentation</a></li>
|
||||||
|
</ul>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
<div class="content">
|
||||||
|
|
||||||
|
<h1 class="intro">Recoll: Indexing performance and index sizes</h1>
|
||||||
|
|
||||||
|
<p>The time needed to index a given set of documents and the
resulting index size depend on many factors: file size
and the proportion of actual text content for the index size;
CPU speed, available memory, average file size, and file format
for the indexing speed.</p>
|
||||||
|
|
||||||
|
<p>We try here to give a number of reference points which can
|
||||||
|
be used to roughly estimate the resources needed to create and
|
||||||
|
store an index. Obviously, your data set will never fit one of
|
||||||
|
the samples, so the results cannot be exactly predicted.</p>
|
||||||
|
|
||||||
|
<p>The following data was obtained on a machine with a 1800 MHz
AMD Duron CPU, 768 MB of RAM, and a 7200 RPM 160 GB IDE
disk, running SUSE 10.1.</p>
|
||||||
|
|
||||||
|
<p><b>recollindex</b> (version 1.8.2 with xapian 1.0.0) is
|
||||||
|
executed with the default flush threshold value.
|
||||||
|
The process memory usage is the value reported by <b>ps</b>.</p>
|
||||||
|
|
||||||
|
<table border=1>
|
||||||
|
<thead>
|
||||||
|
<tr>
|
||||||
|
<th>Data</th>
|
||||||
|
<th>Data size</th>
|
||||||
|
<th>Indexing time</th>
|
||||||
|
<th>Index size</th>
|
||||||
|
<th>Peak process memory usage</th>
|
||||||
|
</tr>
|
||||||
|
<tbody>
|
||||||
|
<tr>
<td>Random PDFs harvested from Google</td>
<td>1.7 GB, 3564 files</td>
<td>27 min</td>
<td>230 MB</td>
<td>225 MB</td>
</tr>
|
||||||
|
<tr>
<td>IETF mailing list archive</td>
<td>211 MB, 44,000 messages</td>
<td>8 min</td>
<td>350 MB</td>
<td>90 MB</td>
</tr>
|
||||||
|
<tr>
<td>Partial Wikipedia dump</td>
<td>15 GB, one million files</td>
<td>6 h 30 min</td>
<td>10 GB</td>
<td>324 MB</td>
</tr>
|
||||||
|
<tr>
<!-- DB: ndocs 3564 lastdocid 3564 avglength 6460.71 -->
<td>Random PDFs harvested from Google<br>
Recoll 1.9, <em>idxflushmb</em> set to 10</td>
<td>1.7 GB, 3564 files</td>
<td>25 min</td>
<td>262 MB</td>
<td>65 MB</td>
</tr>
|
||||||
|
</tbody>
|
||||||
|
</table>
|
||||||
|
|
||||||
|
<p>Note that the index for the mail archive is larger than
the data itself. Large numbers of small, pure-text documents
have this effect. The expansion factor would of course be even
worse with compressed folders (the test was run on uncompressed
data).</p>
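<p>As a rough illustration, the index-to-data ratio for the first three rows of the table can be worked out as below. The sizes are copied from the table; treating 1 GB as 1000 MB is an assumption made for the example.</p>

```python
# (label, data size in MB, index size in MB), copied from the table above.
# Assumption: 1 GB is taken as 1000 MB.
samples = [
    ("Random PDFs", 1700, 230),
    ("IETF mailing list archive", 211, 350),
    ("Partial Wikipedia dump", 15000, 10000),
]


def index_ratio(data_mb, index_mb):
    """Return the index size as a fraction of the data size."""
    return index_mb / data_mb


for label, data_mb, index_mb in samples:
    ratio = index_ratio(data_mb, index_mb)
    print(f"{label}: index is {ratio:.0%} of the data size")
```

<p>Only the mail archive gives a ratio above 1, which is the expansion discussed above.</p>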
|
||||||
|
|
||||||
|
<p>The last test was performed with Recoll 1.9.0, which has an
adjustable flush threshold (<em>idxflushmb</em> parameter), here
set to 10 MB. Notice the much lower peak memory usage, with no
performance degradation. The resulting index is bigger, though;
the exact reason is not known to me, but it may be due to
additional fragmentation.</p>
|
||||||
|
|
||||||
|
</div>
|
||||||
|
</body>
|
||||||
|
</html>
|
||||||
|
|
||||||
@ -2,72 +2,146 @@
|
|||||||
<html>
|
<html>
|
||||||
<head>
|
<head>
|
||||||
<title>Recoll Index format</title>
|
<title>Recoll Index format</title>
|
||||||
|
<meta name="generator" content="HTML Tidy, see www.w3.org">
|
||||||
|
<meta name="Author" content="Jean-Francois Dockes">
|
||||||
|
<meta name="Description" content=
|
||||||
|
"recoll est un logiciel personnel de recherche textuelle pour unix et linux basé sur Xapian, un moteur d'indexation puissant et mature.">
|
||||||
|
<meta name="Keywords" content=
|
||||||
|
"recherche textuelle,desktop,unix,linux,solaris,open source,free">
|
||||||
|
<meta http-equiv="Content-language" content="fr">
|
||||||
|
<meta http-equiv="content-type" content=
|
||||||
|
"text/html; charset=iso-8859-1">
|
||||||
|
<meta name="robots" content="All,Index,Follow">
|
||||||
|
<link type="text/css" rel="stylesheet" href="styles/style.css">
|
||||||
</head>
|
</head>
|
||||||
|
|
||||||
<body>
|
<body>
|
||||||
|
<div class="content">
|
||||||
<h1>Recoll index format details</h1>
|
<h1>Recoll index format details</h1>
|
||||||
|
|
||||||
<p>Terms are not stemmed before being stored. They are turned to
|
<p>A comparison of index formats for recoll 1.8 and omega
|
||||||
all minuscule letters with no accents.</p>
|
1.0.1</p>
|
||||||
|
|
||||||
<p>Special prefixed terms:</p>
|
<p>Recoll terms are not stemmed before being stored. They are turned to
|
||||||
<ul>
|
all lowercase letters with no accents. An auxiliary database
|
||||||
<li>Ddate: modification date of file, like YYYYMMDD</li>
|
handles stem expansion. Omega stores both raw
|
||||||
|
terms and stemmed versions (with prefix Z).</p>
|
||||||
|
|
||||||
<li>Mmonth: YYYYMM</li>
|
<h2>Special prefixed terms:</h2>
|
||||||
|
|
||||||
<li>Ppathhash truncated/hashed version of file path. For
|
<p>A comparison of prefixed term usage between Recoll and
|
||||||
|
omega/xapian. <em>xapian-core</em> in the Omega column means
|
||||||
|
that the prefix is not used by Omega, but mentioned as
|
||||||
|
allocated in the xapian prefix definition document.</p>
|
||||||
|
|
||||||
|
<table border=1 cellspacing=0 width="90%">
|
||||||
|
<thead>
|
||||||
|
<tr><th>Pref.</th><th>Recoll use</th><th>Omega use</th>
|
||||||
|
</tr>
|
||||||
|
</thead>
|
||||||
|
<tbody>
|
||||||
|
<tr><td>T</td><td>mime type</td><td>Same</td>
|
||||||
|
</tr>
|
||||||
|
|
||||||
|
<tr><td>P</td><td>Truncated/hashed version of file path. For
|
||||||
single-document files, and for the file part of a
|
single-document files, and for the file part of a
|
||||||
multi-document file. Used for up-to-date checks and for
|
multi-document file. Used for up-to-date checks and for
|
||||||
retrieving a document by path. omega uses U for the equivalent
|
retrieving a document by path. </td><td>Path part of URL (no
|
||||||
term used for up to date checks.</li>
|
hashing). Uses U for the equivalent
|
||||||
|
term used for up to date checks.</td>
|
||||||
|
</tr>
|
||||||
|
|
||||||
<li>Qpathhash+ipath same + internal path for documents inside
|
<tr><td>Q</td><td>pathhash+ipath: same as P, plus the internal path for
|
||||||
multi-document files. Used to set the existence flag for
|
documents inside multi-document files. Used to set the
|
||||||
subdocs when a multi-document file is found to be up to date,
|
existence flag for subdocs when a multi-document file is found
|
||||||
or for deleting all subdocs for a file, or for retrieving a
|
to be up to date, or for deleting all subdocs for a file, or
|
||||||
document by path+ipath. No real omega equivalent. Compatible
|
for retrieving a document by path+ipath. Compatible
|
||||||
with Q definition in termprefixes.txt: unique identifier.</li>
|
with Q definition in xapian/termprefixes.txt: unique
|
||||||
|
identifier.</td><td>None</td>
|
||||||
|
</tr>
|
||||||
|
|
||||||
<li>Tmimetype: document mime type.</li>
|
<tr><td>D</td><td>date: modification date of file, like
|
||||||
|
YYYYMMDD</td><td>Same</td>
|
||||||
|
</tr>
|
||||||
|
|
||||||
<li>Wweak: 10 days period (not used any more by omega)</li>
|
<tr><td>M</td><td>month: YYYYMM</td><td>Same</td>
|
||||||
|
</tr>
|
||||||
|
<tr><td>Y</td><td>year: YYYY</td><td>Same</td>
|
||||||
|
</tr>
|
||||||
|
|
||||||
<li>Yyear YYYY</li>
|
<tr><td>XSFN</td><td>utf8 version of file name. Used for specific
|
||||||
|
file name searches</td><td>None</td>
|
||||||
|
</tr>
|
||||||
|
<tr><td>U</td><td>None</td><td>Url term. Truncated/hashed version
|
||||||
|
of URL. Used for duplicate checks.</td>
|
||||||
|
</tr>
|
||||||
|
|
||||||
<li>XSFNfilename utf8 version of file name. Used for specific
|
<tr><td>S</td><td>Subject/title</td><td>xapian-core</td>
|
||||||
file name searches</li>
|
</tr>
|
||||||
|
<tr><td>A</td><td>Author</td><td>xapian-core</td>
|
||||||
|
</tr>
|
||||||
|
<tr><td>K</td><td>Keyword</td><td>xapian-core</td>
|
||||||
|
</tr>
|
||||||
|
|
||||||
</ul>
|
</tbody>
|
||||||
|
</table>
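<p>To make the D, M and Y rows concrete, here is a minimal sketch of how such date terms could be derived from a file modification time. The function name and the choice of UTC are assumptions for the example; this is not taken from the Recoll or Omega sources.</p>

```python
from datetime import datetime, timezone


def date_terms(mtime):
    """Return the D/M/Y prefixed terms for a Unix modification time.

    Assumption: the timestamp is interpreted in UTC.
    """
    when = datetime.fromtimestamp(mtime, tz=timezone.utc)
    return [
        "D" + when.strftime("%Y%m%d"),  # D + YYYYMMDD
        "M" + when.strftime("%Y%m"),    # M + YYYYMM
        "Y" + when.strftime("%Y"),      # Y + YYYY
    ]


stamp = datetime(2007, 6, 14, 12, 0, tzinfo=timezone.utc).timestamp()
print(date_terms(stamp))
```

<p>The prefix letter keeps these terms apart from ordinary text terms in the same index.</p>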
|
||||||
|
|
||||||
<p>Omega prefixes with no equivalents in Recoll: P, R, U</p>
|
|
||||||
<p>None of the "date" terms are currently used by recoll queries</p>
|
<p>None of the "date" terms are currently used by recoll queries</p>
|
||||||
|
|
||||||
<p>Values: Recoll currently stores no document values.</p>
|
<h2>Values</h2>
|
||||||
|
<p>Recoll currently stores no document values.</p>
|
||||||
|
<p>Omega stores 2 values, for the md5 hash of the file, and the
|
||||||
|
last modification date (as unix time). The md5 value doesn't
|
||||||
|
appear to be currently used?</p>
|
||||||
|
|
||||||
<p>Document data record format<p>
|
<h2>Document data record format</h2>
|
||||||
<ul>
|
<p>Recoll has the same line based / prefixed data record format
|
||||||
<li>url= Full url. Always file://abspath. The path is not
|
as omega (name=value\n).</p>
|
||||||
|
|
||||||
|
<table border=1 cellspacing=0 width="90%">
|
||||||
|
<thead>
|
||||||
|
<tr><th>Prefix</th><th>Recoll use</th><th>Omega use</th>
|
||||||
|
</tr>
|
||||||
|
</thead>
|
||||||
|
<tbody>
|
||||||
|
|
||||||
|
<tr><td>url=</td><td>Full url. Always file://abspath. The path is not
|
||||||
encoded to utf-8: this is the system file name, usable as an
|
encoded to utf-8: this is the system file name, usable as an
|
||||||
argument to open(). (omega: sort of same)</li>
|
argument to open()</td><td>Same</td>
|
||||||
<li>mtype= mime type (omega: type)</li>
|
</tr>
|
||||||
<li>fmtime= file modification date (omega: modtime)</li>
|
|
||||||
<li>dmtime= document modification date (omega: none)</li>
|
<tr><td>mtype=</td><td>mime type</td><td>type=</td>
|
||||||
<li>origcharset= character set the text was converted from
|
</tr>
|
||||||
(omega: none)</li>
|
<tr><td>fmtime=</td><td>file modification date</td><td>modtime=</td>
|
||||||
<li>fbytes= file size in bytes (omega: size)</li>
|
</tr>
|
||||||
<li>dbytes= document size in bytes (omega: none)</li>
|
<tr><td>dmtime=</td><td> document modification date</td><td>None</td>
|
||||||
<li>ipath= internal path for docs in multidoc files. (omega: none)</li>
|
</tr>
|
||||||
<li>caption= title of document, utf8 (omega: same)</li>
|
<tr><td>origcharset=</td><td> character set the text was
|
||||||
<li>keywords= key words, utf8 (omega: none)</li>
|
converted from</td><td>None</td>
|
||||||
<li>abstract= document abstract, utf8 (omega: sample)</li>
|
</tr>
|
||||||
</ul>
|
<tr><td>fbytes=</td><td> file size in bytes</td><td>size=</td>
|
||||||
|
</tr>
|
||||||
|
<tr><td>dbytes=</td><td>document size in bytes</td><td>None</td>
|
||||||
|
</tr>
|
||||||
|
<tr><td>ipath=</td><td>internal path for docs in multidoc
|
||||||
|
files</td><td>None</td>
|
||||||
|
</tr>
|
||||||
|
|
||||||
|
<tr><td>caption=</td><td>title of document, utf8</td><td>Same</td>
|
||||||
|
</tr>
|
||||||
|
<tr><td>keywords=</td><td>key words, utf8</td><td>None</td>
|
||||||
|
</tr>
|
||||||
|
<tr><td>abstract=</td><td>document abstract, utf8</td><td>sample=</td>
|
||||||
|
</tr>
|
||||||
|
</tbody>
|
||||||
|
</table>
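<p>The line-based <code>name=value\n</code> record format can be read with a few lines of code. The sketch below is an illustration only: the function name and the sample values are hypothetical, and only the field names come from the table above.</p>

```python
def parse_data_record(record):
    """Parse a 'name=value' per line data record into a dict of strings."""
    fields = {}
    for line in record.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # skip empty lines and lines without '='
            fields[name] = value
    return fields


# Hypothetical sample record; the values are made up for the example.
sample = (
    "url=file:///home/me/notes.txt\n"
    "mtype=text/plain\n"
    "fbytes=1234\n"
    "caption=Notes on a=b\n"
)

record = parse_data_record(sample)
print(record["mtype"], record["fbytes"], record["caption"])
```

<p>Splitting on the first <code>=</code> only means a value may itself contain <code>=</code> characters.</p>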
|
||||||
|
</div>
|
||||||
|
|
||||||
<hr>
|
<hr>
|
||||||
<address><a href="mailto:jean-francois.dockes@wanadoo.fr">Jean-Francois Dockes</a></address>
|
<address><a href="mailto:jean-francois.dockes@wanadoo.fr">Jean-Francois Dockes</a></address>
|
||||||
<!-- Created: Thu Dec 7 13:07:40 CET 2006 -->
|
<!-- Created: Thu Dec 7 13:07:40 CET 2006 -->
|
||||||
<!-- hhmts start -->
|
<!-- hhmts start -->
|
||||||
Last modified: Thu Dec 7 14:19:02 CET 2006
|
Last modified: Thu Jun 14 11:14:38 CEST 2007
|
||||||
<!-- hhmts end -->
|
<!-- hhmts end -->
|
||||||
</body>
|
</body>
|
||||||
</html>
|
</html>
|
||||||
|
|||||||
BIN
website/smile.png
Normal file
Binary file not shown.
|
After Width: | Height: | Size: 1.4 KiB |
@ -92,3 +92,4 @@ a.weak {
|
|||||||
color: #aaaaaa;
|
color: #aaaaaa;
|
||||||
}
|
}
|
||||||
|
|
||||||
|
table { empty-cells:show; }
|
||||||
|
|||||||