353 Commits

Author SHA1 Message Date
Jean-Francois Dockes
075f1f7518 filenames used for "filename search" need to be lowercased and stripped 2012-10-15 08:06:04 +02:00
Jean-Francois Dockes
bfeb681574 mimetype T prefix was mishandled for a raw index 2012-10-13 11:08:53 +02:00
Jean-Francois Dockes
3a2b15da10 comment 2012-10-12 13:36:38 +02:00
Jean-Francois Dockes
a16d047f8d Snippet generation: limit positions walk to max hit position. Return status code when truncated walk possibly generated incomplete snippets. Implement config variable for max pos walk 2012-10-08 14:30:14 +02:00
Jean-Francois Dockes
c9f6612c10 implemented proper limitation and error reporting in case of truncation for term and query expansions 2012-10-05 12:36:19 +02:00
Jean-Francois Dockes
bfd111ecaa removed list size truncation on filename expansion 2012-10-05 09:19:42 +02:00
Jean-Francois Dockes
3f331ebb3e fix glitch caused by udi prefix change 2012-10-03 08:05:39 +02:00
Jean-Francois Dockes
be27f404d2 Prefixes for unique identifier and parent terms were not wrapped 2012-10-02 19:16:57 +02:00
Jean-Francois Dockes
4a0a4fcf8e fix 2 glitches in pdf page number handling 2012-10-01 11:27:16 +02:00
Jean-Francois Dockes
af2d031e50 moved snippets generation code from db to query object 2012-09-26 12:13:40 +02:00
Jean-Francois Dockes
52bc9f4aa3 merged the case/diac sensitivity code back into trunk 2012-09-25 19:20:24 +02:00
Jean-Francois Dockes
ab32062fcc Separate count and context for snippets in the snippets popup from the default values for the result list 2012-09-23 18:19:43 +02:00
Jean-Francois Dockes
d9dc7cf142 preliminary implementation for the snippets "open to page" popup window 2012-09-20 13:51:40 +02:00
Jean-Francois Dockes
d25d79ea42 changed variable names for clarity 2012-09-19 19:49:43 +02:00
Jean-Francois Dockes
1b5136539f Bad concatenation generated absurd page numbers for document with several multiple page breaks 2012-09-19 14:04:20 +02:00
Jean-Francois Dockes
9b273d94e8 ensure that recoll configured with indexStripChars=1 runs as compiled with -DRCL_INDEX_STRIPCHARS
--HG--
branch : CASEDIACSENS
2012-09-15 15:16:20 +02:00
Jean-Francois Dockes
a7222d4f96 Make Recoll optionally sensitive to case and diacritics
--HG--
branch : CASEDIACSENS
2012-09-14 14:34:27 +02:00
Jean-Francois Dockes
3dfaa7525b Display page numbers inside abstracts when possible (e.g.: for pdfs) 2012-09-11 12:44:40 +02:00
Jean-Francois Dockes
3343a7f724 Fix the page break recording function for multiple page break at same term position 2012-09-10 18:14:21 +02:00
Jean-Francois Dockes
de812094b5 more small prefix fixups 2012-08-28 17:36:24 +02:00
Jean-Francois Dockes
2d6e11c0aa simplified field config a bit by moving some hard coded values from the c++ to the fields file 2012-08-28 14:44:53 +02:00
Jean-Francois Dockes
776800f47a arrange to create all stem dicts in one pass 2012-08-28 13:39:34 +02:00
Jean-Francois Dockes
fc8b458222 create class StemDb as derived class from XapSynFamily 2012-08-27 15:38:08 +02:00
Jean-Francois Dockes
bd0f002c1a Reimplemented the stem expansion mechanism over Xapian synonyms feature 2012-08-25 11:12:36 +02:00
Jean-Francois Dockes
0ebfc496d8 add capability to remember page breaks generated by, e.g. pdftotext, and use them to start an external viewer on a match page 2012-08-21 15:03:02 +02:00
Jean-Francois Dockes
baf450e75a rcldb fix crash caused by 5c8d237c639d in case there is only one index 2012-05-04 11:54:07 +02:00
Jean-Francois Dockes
73a3106a6d GUI: only do the result up to date check before preview for the main index (we can't update the others anyway) 2012-05-04 09:52:14 +02:00
Jean-Francois Dockes
8b34610dde Cleaned up file name handling. Fixes that file names were sometimes indexed split, sometimes not. They now always are both, with different prefixes. Forces reindex 2012-04-13 09:18:08 +02:00
Jean-Francois Dockes
4eaf12fb9c more delistification 2012-04-12 08:15:50 +02:00
Jean-Francois Dockes
ec7b40a52e cosmetics: list -> vector in more places 2012-04-11 19:58:08 +02:00
Jean-Francois Dockes
c7c9c49437 add -Z "in place reset" option to recollindex 2012-04-11 11:33:33 +02:00
Jean-Francois Dockes
07813ab6ba Don't store filename in empty title at index time, to keep choice at display time. Define %t as title in addition to %T as title or filename 2012-03-10 14:45:40 +01:00
Jean-Francois Dockes
7ddbbb1ee8 search language: implemented filtering on file size 2012-03-07 17:08:22 +01:00
Jean-Francois Dockes
85166c93b2 Changed the way we handle document sizes. The fbytes field should now be in most cases the most "natural" document size. pcbytes holds the top external container size and dbytes the text size 2012-03-07 15:39:30 +01:00
Jean-Francois Dockes
7b5a891ee3 idx: make Doc parameter to addOrUpdate non const to avoid extra copy 2012-03-07 08:34:25 +01:00
Jean-Francois Dockes
9bc2fc8958 Experimented with multithreading the indexing pipeline. Left undef'd as 15%-30% improvement of indexing time does not seem worth the complexity 2012-02-21 17:09:02 +01:00
Jean-Francois Dockes
516863b5d6 GUI: perform up to date check before previewing a subdoc. This is for example to avoid showing the wrong message if a mail folder has been compacted 2012-01-20 17:48:55 +01:00
Jean-Francois Dockes
607d3cc27b Add prefix translation for "mtype". Allows using term expansion to retrieve all the types from the index 2011-11-25 19:47:39 +01:00
Jean-Francois Dockes
0860b559ee get rid of a few garbage terms during indexing. Set a threshold for conversion errors after which we discard the doc. Stabilize the new termproc pipeline but no commongrams for now 2011-10-12 17:55:58 +02:00
Jean-Francois Dockes
5fd31172f5 New text to terms processing pipelines: results identical to 1.16 when used with empty stopfile 2011-10-07 07:53:49 +02:00
Jean-Francois Dockes
eda494153e simplify calls to isStop 2011-10-05 17:25:35 +02:00
Jean-Francois Dockes
acb297c9df comments + move the position jump to text_to_words 2011-10-04 16:33:44 +02:00
Jean-Francois Dockes
4ced9bee49 add termDocCnt method 2011-10-04 08:04:17 +02:00
Jean-Francois Dockes
38e0957962 const string cleanup 2011-10-01 16:39:38 +02:00
Jean-Francois Dockes
383468e2fc bump doc create/update messages updates to loginfo so that indexing progress can be monitored with less noise 2011-09-30 08:47:39 +02:00
Jean-Francois Dockes
424e4173ba threading cleanup: add mutex protection around moronic change to transcode. Add mutex to equiv issue in unac. Rename const strings everywhere to cstr_xx to ease future detection of potentially problematic static variables. Most probably close issue #65 2011-09-28 15:01:14 +02:00
Jean-Francois Dockes
e0d211d602 none 2011-09-20 17:16:41 +02:00
Jean-Francois Dockes
ee0d602ab3 Implement anchored searches: terms to be found at a maximum distance of the start or end of the text 2011-09-20 16:42:56 +02:00
Jean-Francois Dockes
c5ff0cdf52 Control memory usage when deleting documents: use idxflushmb as when adding/updating 2011-09-07 19:11:11 +02:00
Jean-Francois Dockes
a380873029 suppress some sources of spurious ellipses in abstracts 2011-08-24 14:51:59 +02:00