diff --git a/packaging/rpm/recollCooker.spec b/packaging/rpm/recollCooker.spec new file mode 100644 index 00000000..e36ccfde --- /dev/null +++ b/packaging/rpm/recollCooker.spec @@ -0,0 +1,88 @@ +Summary: Desktop full text search tool with a qt gui +Name: recoll +Version: 1.8.1 +Release: %mkrel 1 +License: GPL +Group: Databases +URL: http://www.recoll.org/ +Source0: http://www.lesbonscomptes.com/recoll/%{name}-%{version}.tar.bz2 +Patch1: %{name}-configure.patch +BuildRequires: libxapian-devel +BuildRequires: libfam-devel +BuildRequires: libqt-devel >= 3.3.7 +BuildRequires: libaspell-devel +Requires: xapian +BuildRoot: %{_tmppath}/%{name}-%{version}--buildroot + +%description +Recoll is a personal full text search tool for Unix/Linux. +It is based on the very strong Xapian backend, for which +it provides an easy to use, feature-rich, easy administration, +QT graphical interface. + +%prep +%setup -q +%patch1 -p0 + +%build +%configure2_5x \ + --with-fam \ + --with-aspell + +%make + +%install +[ "%{buildroot}" != "/" ] && rm -rf %{buildroot} + +%makeinstall_std +desktop-file-install --vendor="" \ + --add-category="X-MandrivaLinux-MoreApplications-Databases" \ + --dir %{buildroot}%{_datadir}/applications %{buildroot}%{_datadir}/applications/* + +%clean +[ "%{buildroot}" != "/" ] && rm -rf %{buildroot} + +%files +%defattr(644,root,root,755) +%doc %{_datadir}/%{name}/doc +%attr(755,root,root) %{_bindir}/%{name}* +%{_datadir}/applications/recoll-searchgui.desktop +%{_datadir}/icons/hicolor/48x48/apps/recoll-searchgui.png +%dir %{_datadir}/%{name} +%dir %{_datadir}/%{name}/examples +%dir %{_datadir}/%{name}/filters +%dir %{_datadir}/%{name}/images +%dir %{_datadir}/%{name}/translations +%{_datadir}/%{name}/examples/mime* +%{_datadir}/%{name}/examples/*.conf +%attr(755,root,root) %{_datadir}/%{name}/examples/rclmon.sh +%attr(755,root,root) %{_datadir}/%{name}/filters/rc* +%{_datadir}/%{name}/filters/xdg-open +%{_datadir}/%{name}/images/*png +%{_mandir}/man1/recoll* 
+%{_mandir}/man5/recoll* +%{_datadir}/%{name}/translations/*.qm + + +%changelog +* Fri Apr 20 2007 Tomasz Pawel Gajc 1.8.1-1mdv2008.0 ++ Revision: 16093 +- new version +- drop P0 + + + Mandriva + + +* Tue Mar 06 2007 Tomasz Pawel Gajc 1.7.5-2mdv2007.0 ++ Revision: 134128 +- rebuild + +* Tue Jan 30 2007 Tomasz Pawel Gajc 1.7.5-1mdv2007.1 ++ Revision: 115423 +- add patch 1 - fix build on x86_64 +- add patch 0 - fix menu entry +- fix group +- add buildrequires +- set correct bits on files +- Import recoll + diff --git a/src/doc/user/usermanual.sgml b/src/doc/user/usermanual.sgml index 19a9f52c..d2d9a693 100644 --- a/src/doc/user/usermanual.sgml +++ b/src/doc/user/usermanual.sgml @@ -24,11 +24,12 @@ Dockes - $Id: usermanual.sgml,v 1.44 2007-06-08 16:46:53 dockes Exp $ + $Id: usermanual.sgml,v 1.45 2007-06-26 16:58:25 dockes Exp $ This document introduces full text search notions - and describes the installation and use of the &RCL; application. + and describes the installation and use of the &RCL; + application. It currently describes &RCL; 1.9. @@ -771,30 +772,6 @@ fvwm unplugged but not potatoes (in any part of the document). - The first element author:"john doe" is - a phrase search limited to a specific field. Phrase searches are - specified as usual by enclosing the words in double quotes. The - field specification appears before the colon (of course this is - not limited to phrases, author:Balzac would - be ok too). &RCL; currently manages the following fields: - - - title, - subject or caption are - synonyms which specify data to be searched for in the - document title or subject. - - author or - from for searching the documents originators. - - keyword for searching the - document specified keywords (few documents actually have any). - - - - The query language is currently the only way to use the - &RCL; field search capability. - All elements in the search entry are normally combined with an implicit AND. 
It is possible to specify that elements be OR'ed instead, as in Beatles @@ -817,8 +794,54 @@ fvwm An entry preceded by a - specifies a term that should not appear. + The first element in the above exemple, + author:"john doe" is a phrase search limited + to a specific field. Phrase searches are specified as usual by + enclosing the words in double quotes. The field specification + appears before the colon (of course this is not limited to + phrases, author:Balzac would be ok + too). &RCL; currently manages the following fields: + + title, + subject or caption are + synonyms which specify data to be searched for in the + document title or subject. + + author or + from for searching the documents originators. + + keyword for searching the + document specified keywords (few documents actually have any). + + + + As of release 1.9, the filters have the possibility to + create other fields with arbitrary names. No standard filters + use this possibility yet. + + There are two other elements which may be specified + through the field syntax, but are somewhat special: + + ext for specifying the file + name extension (Ex: ext:html) + + mime for specifying the + mime type. This one is quite special because you can specify + several values which will be OR'ed (the normal default for the + language is AND). Ex: mime:text/plain + mime:text/html. Specifying an explicit boolean + operator or negation (-) before a + mime specification is not supported and + will produce strange results. + + + The query language is currently the only way to use the + &RCL; field search capability. + Words inside phrases and capitalized words are not - stem-expanded. Wildcards may be used anywhere. + stem-expanded. Wildcards may be used anywhere inside a term. + Specifying a wild-card on the left of a term can produce a very + slow search. 
You can use the show query link at the top of the result list to check the exact query which was @@ -2089,36 +2112,91 @@ skippedPaths = ~/somedir/*.txt will be given a file name as argument and should output the text contents in html format on the standard output. - The html could be very minimal like the following - example: - <html><head> -<meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> -</head> -<body>some text content</body></html> - - - You should take care to escape some characters inside - the text by transforming them into appropriate - entities. "&" should be transformed into - "&amp;", "<" - should be transformed into "&lt;". - - The character set needs to be specified in the - header. It does not need to be UTF-8 (&RCL; will take care - of translating it), but it must be accurate for good - results. - - &RCL; will also make use of other header fields if - they are present: title, - description, keywords. - - The easiest way to write a new filter is probably to start - from an existing one. + You can find more details about writing a &RCL; filter + in the section about + writing filters + + + Extending &RCL; + + + Writing a document filter + + &RCL; filters are executable programs which + translate from a specific format (ie: + openoffice, + acrobat, etc.) to the &RCL; + indexing input format, which was chosen to be HTML. + + &RCL; filters are usually shell-scripts, but this is in + no way necessary. These programs are extremely simple and most + of the difficulty lies in extracting the text from the native + format, not outputting what is expected by &RCL;. Happily + enough, most document formats already have translators or text + extractors which handle the difficult part and can be called + from the filter. + + Filters are called with a single argument which is the + source file name. They should output the result to stdout. 
+ + The RECOLL_FILTER_FORPREVIEW + environment variable (values yes, + no) tells the filter if the operation is + for indexing or previewing. Some filters use this to output a + slightly different format. This is not essential. + + The output HTML could be very minimal like the following + example: + + <html><head> +<meta http-equiv="Content-Type" content="text/html;charset=UTF-8"> +</head> +<body>some text content</body></html> + + + You should take care to escape some characters inside + the text by transforming them into appropriate + entities. "&" should be transformed into + "&amp;", "<" + should be transformed into "&lt;". + + The character set needs to be specified in the + header. It does not need to be UTF-8 (&RCL; will take care + of translating it), but it must be accurate for good + results. + + &RCL; will also make use of other header fields if + they are present: title, + description, + keywords. + + As of &RCL; release 1.9, filters also have the + possibility to "invent" field names. This should be output as + meta tags: + + +<meta name="somefield" content="Some textual data" /> + + + In this case, a correspondance between field name and + &XAP; prefix should also be added to the + mimeconf file. See the existing entries + for inspiration. The field can then be used inside the query + language to narrow searches. + + The easiest way to write a new filter is probably to start + from an existing one. + + + + + + diff --git a/website/BUGS.txt b/website/BUGS.txt index 68274c66..e63e47b6 100644 --- a/website/BUGS.txt +++ b/website/BUGS.txt @@ -4,10 +4,21 @@ Bugs that are listed in an older version section are supposedly fixed in later versions. Bugs listed in the topmost section may also exist in older versions. -Latest (1.8.1): +Latest (1.8.2): +- There are a few problems in the qt4 version of recoll: some accelerators + (esc-spc, ctl-arrow) do not work, neither do copy/paste between the + result list and preview windows and x11 applications. 
- The dates shown for email attachments in a result list are the email folder modification date. This should be inherited from the parent message instead. +- There are sometimes problems with document deletions: the index can + get in a state where deleted or moved documents are not purged from the + index (the log file says that the doc are deleted, but they aren't + actually). When this happens, the only solution currently is to reindex + from scratch (recollindex -z). This is due to a xapian bug, which will be + fixed in a future release. You can apply the following patch to xapian + 1.0.1 to fix it: + http://www.lesbonscomptes.com/recoll/xapian/xapian-delete-document.patch - NEAR crashes: 1.6 has added NEAR searches. Unlike what recoll did with PHRASES, stemming expansion is performed on terms inside NEAR clauses (except if prevented by a capitalized entry of course). There is @@ -39,9 +50,9 @@ Latest (1.8.1): compressed (ie: xxx.txt.gz), recoll will try to start the external viewer on the compressed file, which will not work in most cases. -- There are problems which have been reported indexing big mailstores - (several hundreds of thousands of messages): resulting in a very big - database and even crashes during indexation. +- Problems have been reported indexing big mailstores (several hundreds of + thousands of messages): resulting in a very big database and even + crashes. - Under some versions of KDE (ie: Fedora FC5 KDE 3.5.4-0.5.fc5), there is a problem with the window stacking order. Opening the "browse" file diff --git a/website/CHANGES.txt b/website/CHANGES.txt index 120bafa8..67be7bab 100644 --- a/website/CHANGES.txt +++ b/website/CHANGES.txt @@ -1,5 +1,31 @@ CHANGES +1.9.0 +- Add option to remember sort tool state between program invocations (it is + reset to inactive by default) +- Improve qt4 build: no more need for --enable-qt4 +- Fixed a number of qt4 glitches: selection and keyboard shortcuts. 
+- When searching for an empty string inside the preview window, position + the window to the next occurrence of the primary search terms. +- Have email attachments inherit date and author from their parent message +- Added an adjustable flush threshold during indexing: should help control + memory usage. See the idxflushmb configuration parameter. +- Added a check for file system free space. Indexing will stop if the + threshold is reached. See the maxfsoccuppc configuration parameter. +- Fix bus error on rclmon exit +- Better handle aspell errors inside rclmon +- Added File menu entry to erase document history. +- Added ext: and mime: selectors to the query language. +- Added support for arbitrary fields. Filters can now produce any number of + fields which will be selectively searchable through the query language. +- Added abiword and kword support. +- Contributed filter: rcljpeg. This should be extended to use the new field + support. +- Changed the icon to an ugly one. The previous one was nicer but looked + too much like Xapian's. +- Added some kind of support for a stopword list. +- Bound space and backspace to PgUp/PgDown in preview. + 1.8.2 2007-05-19 - Fixed method name for compatibility with xapian 1.0.0 - Add .beagle to default list of skipped names (avoids indexing beagle diff --git a/website/credits.html b/website/credits.html index 6b50ad28..fbdc469f 100644 --- a/website/credits.html +++ b/website/credits.html @@ -38,7 +38,7 @@

First of all, many thanks to the users who provided criticism and ideas to make Recoll go forward ! Please - contact me if you have something to suggest.

Recoll borrows diff --git a/website/doc.html b/website/doc.html index a9dc810e..7897446c 100644 --- a/website/doc.html +++ b/website/doc.html @@ -30,16 +30,24 @@

-

Recoll user manuals

+

Recoll user manual

-
-
+


+ +

Other documentation

+ + + +
diff --git a/website/download.html b/website/download.html index a84bc0a9..4985a0d9 100644 --- a/website/download.html +++ b/website/download.html @@ -24,7 +24,7 @@ @@ -47,6 +47,8 @@

+

General information

+

You will probably need to have a look at the installation manual for building and/or installation instructions.

@@ -68,12 +70,17 @@ list to decide what you may want to install.

+

In addition, optional functionality in Recoll (the term explorer + tool in phonetic mode) uses the aspell package. The + installed version should be at least 0.60 (utf-8 support) for + this to run smoothly. This function is far from essential.

+

If you find problems with the package or its installation, please report them.

-

What do the release numbers mean?

+

What do the release numbers mean?

The Recoll releases are numbered X.Y.Z.

@@ -110,7 +117,16 @@ 1.8.2 was released purely for fixing a small issue of compatibility with xapian 1.0.0 and small config/install glitches. There is no functional reason to upgrade from - 1.8.1, (or update packages). + 1.8.1, (or update packages).

+ +

Recoll 1.8.2 is the first release that will let you take + advantage of the new Xapian 1.0, the main user-visible change + of which is the new default index format. In order to take + advantage of the new format (which is not mandatory) Recoll + users updating from an older release need to delete their old + index. There are more + details in the user manual.

Older recoll releases: 1.8.1 @@ -128,8 +144,8 @@

Packages

The executables inside the binary rpms have a static link to - xapian, there is no dependency except Qt 3.3. Of course you need - xapian-core installed to use the source rpm.

+ xapian 0.9.x, there is no dependency except Qt 3.3. Of course + you need xapian-core installed to use the source rpm.

Fedora Core FC6 RPM: @@ -168,10 +184,16 @@ debian/edgy

+

Ubuntu 6.06 dapper (the feisty version does not work + on dapper). This has a static link on xapian 0.9.10: + + recoll_1.8.2-0ubuntu1_i386.deb

+

Debian unstable Recoll is in the package repository, - you can install it with the usual apt-get install - recoll. Package page

+ you can install it with the usual apt-get install + recoll. + Package page

Debian 3.1 Thanks to Mario () for these: i386: diff --git a/website/features.html b/website/features.html index 7cb84ff5..e04201d9 100644 --- a/website/features.html +++ b/website/features.html @@ -142,6 +142,7 @@ +

Stemming

Stemming is a process which transforms inflected words into diff --git a/website/fr/features.html b/website/fr/features.html new file mode 100644 index 00000000..7f70d4a1 --- /dev/null +++ b/website/fr/features.html @@ -0,0 +1,205 @@ + + + + + RECOLL: un outil personnel de recherche textuelle pour + Unix et Linux + + + + + + + + + + + + +

+ +
+ +

Caractéristiques de Recoll

+ +
+
Systèmes
+
Recoll a été compilé et + testé sur FreeBSD, Linux, Darwin, Solaris (versions + FreeBSD 5.5, Fedora Core 5, Suse 10.1, Gentoo, + Debian 3.1, Ubuntu Edgy, Solaris 8/9, mais d'autres versions + récentes conviennent sans doute également).
+ +
Versions de QT: 3.2, 3.3 et 4.2
+ +
Types de documents
+
Recoll peut traiter les types de documents suivants, ainsi + que des fichiers compressés du même type: + +
+
En interne
+ +
+
    +
  • text.
  • + +
  • html.
  • + +
  • OpenOffice + (avec l'aide de la commande unzip).
  • + +
  • maildir et mailbox (Mozilla, Thunderbird, Evolution et sans doute + d'autres).
  • + +
  • Fichiers de conversation + gaim.
  • + +
  • Scribus.
  • + +
+
+ +
With external helpers
+ +
+ +
+
+
+ +
Autres caractéristiques
+
+
    +
  • Index multiples interrogeables ensemble ou séparément.
  • + +
  • Fonctions de recherche puissantes, avec expressions + booléennes, phrases et proximité, caractères jokers, + filtrage sur les types de fichiers où l'emplacement.
  • + +
  • Fonction spécifique de recherche de noms de fichiers.
  • + +
  • Support de jeux de caractères multiples. Les traitements + internes et l'index utilisent l'encodage Unicode UTF-8.
  • + +
  • L'extraction des racines de mots + Stemming est effectuée au moment de la recherche + (permet de changer de langue après l'indexation).
  • + +
  • Installation facile. Pas de processus permanent, de + serveur web ou environnement exotique.
  • + +
  • Un indexeur qui peut fonctionner soit comme un + processus léger dans l'interface de consultation, comme un + programme batch externe intégrable par + cron, ou comme un processus + permanent pour l'indexation au fil de l'eau.
  • + +
+
+ + +

Lemmatisation

+ +

Note: je serais preneur d'une traduction française + agréable pour "stemming".

+

La lemmatisation transforme un mot dérivé vers sa racine. + Par exemple, aimer, aimerai, aimait, + aimez etc. seraient transformés en aim en + français. Une recherche de l'un quelconque des dérivés peut + automatiquement être étendue vers tous les autres

+ +

Certains moteurs de recherche appliquent la transformation + pendant l'indexation. L'index ne stocke que les racines des + mots, avec des exceptions pour les termes qui sont reconnus + comme des noms propres (capitalisation). Au moment de la + recherche, les termes de la requête sont également transformés + avant comparaison à l'index.

+ +

Cette approche permet un index plus petit, mais elle perd + irrévocablement de l'information pendant l'indexation.

+ +

Recoll fonctionne différemment. Les termes sont indexés sans + transformation. L'index résultant est plus gros, ce qui n'a + probablement pas beaucoup d'importance à une époque de disques + de 100 Go principalement remplis d'information multimédia + non indexée. + +

À la fin de l'indexation, Recoll construit un ou plusieurs + dictionnaires de transformation (pour différents langages), où + toutes les racines sont listées avec leurs transformations + possibles.

+ + +

Au moment de la recherche, par défaut, les termes de + l'utilisateurs sont transformés, et étendus aux dérivés par + utilisation du dictionnaire. + Les résultats obtenus sont analogues à ceux de + l'autre méthode. L'avantage est que l'expansion peut être + contrôlée au moment de la recherche: +

    +
  • On peut la supprimer pour n'importe quel terme de la + requête, (en le faisant débuter par une capitale: + Aime par exemple pour chercher la ville d'Aime la + Plagne).
  • +
  • Le langage de transformation peut également être changé, + en supposant que plusieurs dictionnaires de transformation + aient été construits lors de l'indexation.
  • +
+ +
+ + + diff --git a/website/index.html.en b/website/index.html.en index 1e28084b..74c5db76 100644 --- a/website/index.html.en +++ b/website/index.html.en @@ -81,6 +81,16 @@
  • (more detail)
  • + +

    News:

    +

    There are new filters for + kword and + abiword files in the + new filters section. These + are usable with an existing Recoll 1.8 installation.

    + +

    Support

    If you have any problem with Recoll, its diff --git a/website/index.html.fr b/website/index.html.fr index 601b6c6d..9451eeaf 100644 --- a/website/index.html.fr +++ b/website/index.html.fr @@ -97,6 +97,15 @@ +

    Nouvelles:

    +

    Il y a de nouveaux filtres d'indexation pour les fichiers + kword et + abiword. Ils sont téléchargeables + dans la zone des nouveaux + filtres, et sont utilisable avec une installation existante de + Recoll 1.8.

    + +

    Support

    Si vous avez un problème quelconque avec le logiciel ou son diff --git a/website/mario.png b/website/mario.png new file mode 100644 index 00000000..773946b0 Binary files /dev/null and b/website/mario.png differ diff --git a/website/perfs.html b/website/perfs.html new file mode 100644 index 00000000..bfd8ed70 --- /dev/null +++ b/website/perfs.html @@ -0,0 +1,114 @@ + + + + + RECOLL: a personal text search system for + Unix/Linux + + + + + + + + + + + + +

    + +
    + +

    Recoll: Indexing performance and index sizes

    + +

    The time needed to index a given set of documents, and the + resulting index size depend on many factors, such as file size + and proportion of actual text content for the index size, cpu + speed, available memory, average file size and format for the + speed of indexing.

    + +

    We try here to give a number of reference points which can + be used to roughly estimate the resources needed to create and + store an index. Obviously, your data set will never fit one of + the samples, so the results cannot be exactly predicted.

    + +

    The following data was obtained on a machine with a 1800 MHz + AMD Duron CPU, 768 MB of RAM, and a 7200 RPM 160 GBytes IDE + disk, running Suse 10.1.

    + +

    recollindex (version 1.8.2 with xapian 1.0.0) is + executed with the default flush threshold value. + The process memory usage is the one given by ps

    + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
    Data | Data size | Indexing time | Index size | Peak process memory usage
    Random pdfs harvested on Google | 1.7 GB, 3564 files | 27 mn | 230 MB | 225 MB
    Ietf mailing list archive | 211 MB, 44,000 messages | 8 mn | 350 MB | 90 MB
    Partial Wikipedia dump | 15 GB, one million files | 6H30 | 10 GB | 324 MB
    Random pdfs harvested on Google + Recoll 1.9, idxflushmb set to 10 | 1.7 GB, 3564 files | 25 mn | 262 MB | 65 MB
    + +

    Notice how the index size for the mail archive is bigger than + the data size. Myriads of small pure text documents will do + this. The factor of expansion would be even worse with + compressed folders of course (the test was on uncompressed + data).

    + +

    The last test was performed with Recoll 1.9.0 which has an + adjustable flush threshold (idxflushmb parameter), here + set to 10 MB. Notice the much lower peak memory usage, with no + performance degradation. The resulting index is bigger though. + The exact reason is not known to me, possibly + additional fragmentation.

    +

    + +
    + + + diff --git a/website/rclidxfmt.html b/website/rclidxfmt.html index 41b330de..57ced06a 100644 --- a/website/rclidxfmt.html +++ b/website/rclidxfmt.html @@ -2,72 +2,146 @@ Recoll Index format + + + + + + + + +

    Recoll index format details

    -

    Terms are not stemmed before being stored. They are turned to - all minuscule letters with no accents.

    +

    A comparison of index formats for recoll 1.8 and omega + 1.0.1

    -

    Special prefixed terms:

    -
      -
    • Ddate: modification date of file, like YYYYMMDD
    • +

      Recoll terms are not stemmed before being stored. They are turned to + all minuscule letters with no accents. An auxiliary database + handles stem expansion. Omega stores both raw + terms and stemmed versions (with prefix Z)

      -
    • Mmonth: YYYYMM
    • +

      Special prefixed terms:

      -
    • Ppathhash truncated/hashed version of file path. For +

      A comparison of prefixed term usage between Recoll and + omega/xapian. xapian-core in the Omega column means + that the prefix is not used by Omega, but mentioned as + allocated in the xapian prefix definition document.

      + + + + + + + + + + + + -
    • Qpathhash+ipath same + internal path for documents inside - multi-document files. Used to set the existence flag for - subdocs when a multi-document file is found to be up to date, - or for deleting all subdocs for a file, or for retrieving a - document by path+ipath. No real omega equivalent. Compatible - with Q definition in termprefixes.txt: unique identifier.
    • + + -
    • Tmimetype: document mime type.
    • + + -
    • Wweak: 10 days period (not used any more by omega)
    • + + + + -
    • Yyear YYYY
    • + + + + -
    • XSFNfilename utf8 version of file name. Used for specific - file name searches
    • + + + + + + + + +
      Pref. | Recoll use | Omega use
      T | mime type | Same
      P | Truncated/hashed version of file path. For single-document files, and for the file part of a multi-document file. Used for up-to-date checks and for - retrieving a document by path. omega uses U for the equivalent - term used for up to date checks. + retrieving a document by path. | Path part of URL (no + hashing). Uses U for the equivalent + term used for up to date checks.
      Q | pathhash+ipath same + internal path for + documents inside multi-document files. Used to set the + existence flag for subdocs when a multi-document file is found + to be up to date, or for deleting all subdocs for a file, or + for retrieving a document by path+ipath. Compatible + with Q definition in xapian/termprefixes.txt: unique + identifier. | None
      D | date: modification date of file, like + YYYYMMDD | Same
      M | month: YYYYMM | Same
      Y | year YYYY | Same
      XSFN | utf8 version of file name. Used for specific + file name searches | None
      U | None | Url term. Truncated/hashed version + of URL. Used for duplicate checks.
      S | Subject/title | xapian-core
      A | Author | xapian-core
      K | Keyword | xapian-core
      -
    - -

    Omega prefixes with no equivalents in Recoll: P, R, U

    None of the "date" terms are currently used by recoll queries

    -

    Values: Recoll currently stores no document values.

    +

    Values

    +

    Recoll currently stores no document values.

    +

    Omega stores 2 values: the md5 hash of the file and the + last modification date (as unix time). The md5 value doesn't + appear to be currently used?

    -

    Document data record format

    -

      -
    • url= Full url. Always file://abspath. The path is not +

      Document data record format

      +

      Recoll has the same line based / prefixed data record format + as omega (name=value\n).

      + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
      Prefix | Recoll use | Omega use
      url=Full url. Always file://abspath. The path is not encoded to utf-8, this is the system file name ,usable as an - argument to open(). (omega: sort of same) -
    • mtype= mime type (omega: type)
    • -
    • fmtime= file modification date (omega: modtime)
    • -
    • dmtime= document modification date (omega: none)
    • -
    • origcharset= character set the text was converted from - (omega: none)
    • -
    • fbytes= file size in bytes (omega: size)
    • -
    • dbytes= document size in bytes (omega: none)
    • -
    • ipath= internal path for docs in multidoc files. (omega: none)
    • -
    • caption= title of document, utf8 (omega: same)
    • -
    • keywords= key words, utf8 (omega: none)
    • -
    • abstract= document abstract, utf8 (omega: sample)
    • - + argument to open()
      Same
      mtype= | mime type (omega: type) | type=
      fmtime= | file modification date | modtime=
      dmtime= | document modification date | None
      origcharset= | character set the text was + converted from | None
      fbytes= | file size in bytes | size=
      dbytes= | document size in bytes | None
      ipath= | internal path for docs in multidoc + files | None
      caption= | title of document, utf8 | Same
      keywords= | key words, utf8 | None
      abstract= | document abstract, utf8 | sample=
      +

    Jean-Francois Dockes
    -Last modified: Thu Dec 7 14:19:02 CET 2006 +Last modified: Thu Jun 14 11:14:38 CEST 2007 diff --git a/website/smile.png b/website/smile.png new file mode 100644 index 00000000..49d678dd Binary files /dev/null and b/website/smile.png differ diff --git a/website/styles/style.css b/website/styles/style.css index f4a7918c..8fd1315a 100644 --- a/website/styles/style.css +++ b/website/styles/style.css @@ -92,3 +92,4 @@ a.weak { color: #aaaaaa; } +table { empty-cells:show; }
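
Note on the "Writing a document filter" section added to usermanual.sgml in this diff: the contract it describes (one file name argument, HTML on stdout, "&" and "<" escaped, an accurate charset in the header, optional meta fields as of 1.9) can be illustrated with a small sketch. This is not one of the filters shipped with Recoll, which the manual says are usually shell scripts. The function names `to_filter_html` and `main`, the plain-text input, and the use of the file name as the title are all assumptions made for illustration.

```python
#!/usr/bin/env python3
"""Hypothetical minimal filter sketch: plain text file in, HTML on stdout."""
import html
import os
import sys


def to_filter_html(text, title="", fields=None, charset="UTF-8"):
    """Wrap already-extracted text in the minimal HTML the manual describes."""
    # quote=False escapes &, < and >, which covers the two characters
    # the manual asks filters to transform into entities.
    body = html.escape(text, quote=False)
    head = ['<meta http-equiv="Content-Type" content="text/html;charset=%s">'
            % charset]
    if title:
        head.append("<title>%s</title>" % html.escape(title, quote=False))
    # Arbitrary field names (described for release 1.9) go out as meta tags.
    # Attribute values also need their quotes escaped, hence the default quote=True.
    for name, value in (fields or {}).items():
        head.append('<meta name="%s" content="%s" />'
                    % (html.escape(name), html.escape(value)))
    return "<html><head>\n%s\n</head>\n<body>%s</body></html>\n" % (
        "\n".join(head), body)


def main(path):
    """Read the source file named by the single argument, print HTML."""
    # Assumption: the input is UTF-8 text; a real filter would call a
    # format-specific extractor here instead of reading the file directly.
    with open(path, encoding="utf-8", errors="replace") as src:
        text = src.read()
    sys.stdout.write(to_filter_html(text, title=os.path.basename(path)))


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

Saved as an executable script, it would be run with the source file name as its only argument. The sketch ignores RECOLL_FILTER_FORPREVIEW, which the manual describes as not essential. As the manual notes, a field name introduced this way still needs a matching prefix entry in the mimeconf file before it can be searched.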