Recoll index format details
-A comparison of index formats for recoll 1.8 and omega - 1.0.1
+A comparison of index formats for recoll 1.17 and omega + 1.0.1
Recoll terms are not stemmed before being stored. They are converted to lowercase with accents removed. An auxiliary database handles stem expansion. Omega stores both raw
- terms and stemmed versions (with prefix Z)
+ terms (with prefix R) and stemmed versions (with prefix Z).
+ The Xapian side of the information here comes from the relevant
+ xapian-omega documentation page.
+Special prefixed terms:
A comparison of prefixed term usage between Recoll and
- omega/xapian. xapian-core in the Omega column means
- that the prefix is not used by Omega, but mentioned as
- allocated in the xapian prefix definition document.
+ omega/xapian.

| Prefix | Recoll use | Omega use | |
|---|---|---|---|
| T | mime type | Same | -|
| A | Author | Same | |
| P | Truncated/hashed version of file path. For single-document files, and for the file part of a multi-document file. Used for up-to-date checks and for retrieving a document by path. | Path part of URL (no hashing). Uses U for the equivalent term used for up-to-date checks. | -|
| Q | pathhash+ipath same + internal path for documents inside multi-document files. Used to set the existence flag for subdocs when a multi-document file is found to be up to date, or for deleting all subdocs for a file, or for retrieving a document by path+ipath. Compatible with Q definition in xapian/termprefixes.txt: unique identifier. | None | -|
| B | Unused | Reserved | |
| C | Unused | Reserved | |
| D | date: modification date of file, like YYYYMMDD | Same | |
| E | Unused. Recoll uses XE | +file name extension folded to lowercase | |
| F | Unused | Reserved | |
| G | Unused | Newsgroup / forum name | |
| H | Unused | host name | |
| I | Unused | "Can see" | |
| J | Unused | Reserved | |
| K | Keyword | Same | |
| L | Unused | ISO language code | |
| M | month: YYYYMM | Same | |
| N | Unused | ISO country code | |
| O | Unused | Owner | |
| P | Unused | Path part of URL | |
| Q | Unique Id. fs backend: trunc-hashed path+ipath. Other backends may use a different unique id. | Unique Id | |
| R | Unused | Raw (unstemmed) term | |
| S | Subject/title | Same | |
| T | mime type | Same | |
| U | Unused | Full Url of indexed document. Truncated/hashed version of URL. Used for duplicate checks. | |
| V | Unused | "Can't see" | |
| W | Unused | Owner | |
| X | Prefix prefix for multichar prefixes | +Same | |
| Y | year YYYY | Same | |
| Z | Unused | Stemmed term | |
| XE | File name extension folded as lowercase (omega uses E) | Unused | |
| XP | Path elements (for phrase-based directory filtering) | Unused | |
| XSFN | utf8 lowercased/unaccented version of file name. Used for specific file name searches. NOT SPLIT (spaces as normal chars). | None | +|
| XTO | Recipient | None | +|
| XXST | Not really a prefix: start of field marker (for anchored phrase searches) | None | +|
| XXND | Not really a prefix: end of field marker (for anchored phrase searches) | None | +|
| M | month: YYYYMM | Same | -|
| Y | year YYYY | Same | -|
| XSFN | utf8 version of file name. Used for specific file name searches | None | -|
| U | None | Url term. Truncated/hashed version of URL. Used for duplicate checks. | -|
| S | Subject/title | xapian-core | -|
| A | Author | xapian-core | -|
| K | Keyword | xapian-core | -
None of the "date" terms (D, M, Y) are currently used by Recoll queries.
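
The term handling described above (lowercasing, accent removal, and the D, M, Y, XE and XSFN prefixes) can be illustrated with a small sketch. This is not Recoll's code: the function names are hypothetical, accent removal is approximated with Unicode decomposition, and the use of UTC for the date terms is an assumption.

```python
import os
import unicodedata
from datetime import datetime, timezone


def normalize(term):
    """Lowercase a term and drop accents (Unicode combining marks)."""
    decomposed = unicodedata.normalize("NFD", term)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.lower()


def date_terms(mtime):
    """Build the D (YYYYMMDD), M (YYYYMM) and Y (YYYY) terms from a Unix time.

    Assumption: UTC is used here; the source does not say which zone applies.
    """
    d = datetime.fromtimestamp(mtime, tz=timezone.utc)
    return [d.strftime("D%Y%m%d"), d.strftime("M%Y%m"), d.strftime("Y%Y")]


def file_terms(filename):
    """Build the XSFN term (file name kept whole, not split) and the XE term."""
    terms = ["XSFN" + normalize(filename)]
    ext = os.path.splitext(filename)[1]
    if ext:
        # Drop the leading dot and fold the extension to lowercase.
        terms.append("XE" + ext[1:].lower())
    return terms
```

For example, `file_terms("Rapport Été.PDF")` yields one XSFN term that still contains the space, plus `XEpdf`.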
Values
-Recoll currently stores no document values.
-Omega stores 2 values, for the md5 hash of the file, and the - last modification date (as unix time). The md5 value doesn't - appear to be currently used ?
| Value slot | Recoll use | Omega use |
|---|---|---|
| 0 | Unused | Unix modification time |
| 1 | MD5 | Same |
| 2 | Unused | Size |
| 10 | Signature: value to be checked for up-to-dateness, i.e. mtime\|size for the fs backend | Unused |
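
As an illustration of the signature in value slot 10, the sketch below derives a string from a file's modification time and size and compares it with a stored value to decide whether the file must be reindexed. The function names and the exact string layout are assumptions made for this example; the table only says that the fs backend signature combines mtime and size.

```python
import os


def fs_signature(path):
    """Return an illustrative signature built from mtime and size.

    Assumption: the "mtime|size" layout is a placeholder, not necessarily
    the byte layout Recoll stores in the value slot.
    """
    st = os.stat(path)
    return f"{int(st.st_mtime)}|{st.st_size}"


def needs_reindex(path, stored_signature):
    """A file is up to date only if its current signature matches the stored one."""
    return fs_signature(path) != stored_signature
```

Comparing a cheap stat-based signature avoids reading file contents during the up-to-date check, at the cost of missing changes that preserve both mtime and size.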
Document data record format
Recoll has the same line-based, prefixed data record format
- as omega (name=value\n).
+ as omega (name=value\n). The Omega data below is quite out of
+ date.
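
A record in the name=value\n layout can be read and written with very little code. The sketch below is only an illustration: the function names are hypothetical, and the field names used in any example record are placeholders rather than the actual Recoll or Omega field list.

```python
def parse_record(data):
    """Parse a name=value\\n data record into a dict.

    Only the first '=' on a line separates name from value, so values may
    themselves contain '='. Lines without '=' are skipped.
    """
    fields = {}
    for line in data.split("\n"):
        name, sep, value = line.partition("=")
        if sep:
            fields[name] = value
    return fields


def build_record(fields):
    """Serialize a dict back into name=value lines, one field per line."""
    return "".join(f"{name}={value}\n" for name, value in fields.items())
```

Because each field ends at a newline, values must not contain raw newlines in this simple form.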