226 Commits

Author SHA1 Message Date
Jean-Francois Dockes
3f331ebb3e fix glitch caused by udi prefix change 2012-10-03 08:05:39 +02:00
Jean-Francois Dockes
efd319025d attempt to eliminate more unicode uninteresting characters 2012-10-02 17:45:16 +02:00
Jean-Francois Dockes
f510262e1d default to stripped index 2012-10-01 17:27:16 +02:00
Jean-Francois Dockes
52bc9f4aa3 merged the case/diac sensitivity code back into trunk 2012-09-25 19:20:24 +02:00
Jean-Francois Dockes
de4225e1ae cleaned up uproplist file 2012-09-20 07:15:15 +02:00
Jean-Francois Dockes
63d97e597b added a bunch of graphic characters to the word breakers list and changed the container used from set to unordered_set for speed 2012-09-19 19:50:45 +02:00
Jean-Francois Dockes
9b273d94e8 ensure that recoll configured with indexStripChars=1 runs as compiled with -DRCL_INDEX_STRIPCHARS
--HG--
branch : CASEDIACSENS
2012-09-15 15:16:20 +02:00
Jean-Francois Dockes
a7222d4f96 Make Recoll optionally sensitive to case and diacritics
--HG--
branch : CASEDIACSENS
2012-09-14 14:34:27 +02:00
Jean-Francois Dockes
8b40cb0499 merged from cdsens branch 2012-09-13 12:28:42 +02:00
Jean-Francois Dockes
e0bc65bfdd small mods innocuous or auxiliary to case/diac sensitivity but which can live in main branch 2012-09-13 12:25:01 +02:00
Jean-Francois Dockes
25f4fc3b2c none 2012-09-13 12:07:27 +02:00
Jean-Francois Dockes
c030a15780 Remove improper assertion use from beagle cache handling code 2012-09-13 09:44:47 +02:00
Jean-Francois Dockes
20f79e400f fixed incorrect unique() algo usage 2012-09-01 17:27:49 +02:00
Jean-Francois Dockes
913dffc597 added code for unac to perform pure case-folding 2012-08-27 12:40:57 +02:00
Jean-Francois Dockes
909d92b218 added some currency symbols to punctuation 2012-08-24 20:54:03 +02:00
Jean-Francois Dockes
0ebfc496d8 add capability to remember page breaks generated by, e.g. pdftotext, and use them to start an external viewer on a match page 2012-08-21 15:03:02 +02:00
Jean-Francois Dockes
f95cee0cb4 config checking test program 2012-05-28 09:08:55 +02:00
Jean-Francois Dockes
7bbf61c8cb config: getDaemSkippedPaths() could crash if daemSkippedPaths was empty 2012-05-25 17:06:01 +02:00
Jean-Francois Dockes
4eaf12fb9c more delistification 2012-04-12 08:15:50 +02:00
Jean-Francois Dockes
ec7b40a52e cosmetics: list -> vector in more places 2012-04-11 19:58:08 +02:00
Jean-Francois Dockes
a4c17941b1 Added a configuration parameter to set specific unaccenting/lowercasing for some characters to be handled differently than would result from using the Unicode database. Example: "a with ring above" could be set to be preserved by a Swedish speaker 2012-04-09 12:42:23 +02:00
Jean-Francois Dockes
f044b20d5a Remove dependence on system type name in a few more places 2012-04-02 09:52:04 +02:00
Jean-Francois Dockes
581fcbc01e fix handling for some trademark, registered and copyright signs 2012-03-20 10:33:27 +01:00
Jean-Francois Dockes
85166c93b2 Changed the way we handle document sizes. The fbytes field should now be in most cases the most "natural" document size. pcbytes holds the top external container size and dbytes the text size 2012-03-07 15:39:30 +01:00
Jean-Francois Dockes
638d468796 clarified the use of string keys inside the Filter metaData array 2012-03-07 10:13:46 +01:00
Jean-Francois Dockes
2c6b023a88 real time indexer: monitor the configuration for changes and re-execute when needed 2012-03-06 09:35:21 +01:00
Jean-Francois Dockes
a5af2b93bd "md5"->cstr_md5 2012-02-25 10:41:27 +01:00
Jean-Francois Dockes
ef00bfae70 Implement the gui category filters as query language fragments instead of hard-coding them. This allows implementing other kinds of filtering (e.g. on directory) just by changing a configuration file 2012-02-18 11:21:09 +01:00
Jean-Francois Dockes
1b0c77c2e4 add parameter to specify indexing status file path 2012-02-17 16:33:47 +01:00
Jean-Francois Dockes
068fa8ccc7 test driver fix 2012-02-17 10:17:12 +01:00
Jean-Francois Dockes
f59e2e033a index: update a status file while indexing 2012-02-06 17:03:39 +01:00
Jean-Francois Dockes
07226fa306 GUI tools for setting up indexing schedule, initial implementation done 2011-12-07 13:41:05 +01:00
Jean-Francois Dockes
b9c64e8591 Gui: help for cron etc. 1st checkpoint 2011-12-02 19:15:24 +01:00
Jean-Francois Dockes
3759c0b52d index: add skippedPathsFnmPathname variable to enable disabling the use of FNM_PATHNAME while matching in skippedPaths. Closes issue #67 2011-11-30 16:36:51 +01:00
Jean-Francois Dockes
27430403e2 comment 2011-11-25 19:44:37 +01:00
Jean-Francois Dockes
49554e42c2 Factorized common text transcoding code in separate module 2011-10-20 17:53:42 +02:00
Jean-Francois Dockes
6c72454396 generate acronyms for dotted abbrevs, e.g. O.E.C.D -> OECD 2011-10-20 13:24:29 +02:00
Jean-Francois Dockes
56fe54412f Protect against deadlock when using fam/gamin by adding a small timeout to the peek for events done between add calls. Add alarm to the addwatch call in case the deadlock happens anyway 2011-10-13 15:20:28 +02:00
Jean-Francois Dockes
0860b559ee get rid of a few garbage terms during indexing. Set a threshold for conversion errors after which we discard the doc. Stabilize the new termproc pipeline but no commongrams for now 2011-10-12 17:55:58 +02:00
Jean-Francois Dockes
5fd31172f5 New text to terms processing pipelines: results identical to 1.16 when used with empty stopfile 2011-10-07 07:53:49 +02:00
Jean-Francois Dockes
38e0957962 const string cleanup 2011-10-01 16:39:38 +02:00
Jean-Francois Dockes
3013e843a2 log 2011-10-01 09:20:10 +02:00
Jean-Francois Dockes
91778f8943 lower verbosity 2011-09-30 08:21:43 +02:00
Jean-Francois Dockes
424e4173ba threading cleanup: add mutex protection around moronic change to transcode. Add mutex to equiv issue in unac. Rename const strings everywhere to cstr_xx to ease future detection of potentially problematic static variables. Most probably close issue #65 2011-09-28 15:01:14 +02:00
Jean-Francois Dockes
5b3c5d8a5d small OpenBSD fixes (mount.h and FILE_OFFSET_BITS) 2011-09-23 10:32:41 +02:00
Jean-Francois Dockes
cd27645cc2 Avoid fwrite failure while trying to write empty missing helpers string 2011-09-20 07:37:28 +02:00
Jean-Francois Dockes
c5ff0cdf52 Control memory usage when deleting documents: use idxflushmb as when adding/updating 2011-09-07 19:11:11 +02:00
Jean-Francois Dockes
bc6587f07a get rid of unused guesscharset 2011-08-21 13:27:37 +02:00
Jean-Francois Dockes
ebbcc115a8 Allow setting a weight increase for field terms 2011-07-22 16:43:39 +02:00
Jean-Francois Dockes
36516b091b textsplit: discard - in front of words. Handle cjk punctuation characters 2011-07-16 11:51:38 +02:00