Jean-Francois Dockes
|
3f331ebb3e
|
fix glitch caused by udi prefix change
|
2012-10-03 08:05:39 +02:00 |
|
Jean-Francois Dockes
|
efd319025d
|
attempt to eliminate more unicode uninteresting characters
|
2012-10-02 17:45:16 +02:00 |
|
Jean-Francois Dockes
|
f510262e1d
|
default to stripped index
|
2012-10-01 17:27:16 +02:00 |
|
Jean-Francois Dockes
|
52bc9f4aa3
|
merged the case/diac sensitivity code back into trunk
|
2012-09-25 19:20:24 +02:00 |
|
Jean-Francois Dockes
|
de4225e1ae
|
cleaned up uproplist file
|
2012-09-20 07:15:15 +02:00 |
|
Jean-Francois Dockes
|
63d97e597b
|
added a bunch of graphic characters to the word breakers list and changed the container used from set to unordered_set for speed
|
2012-09-19 19:50:45 +02:00 |
|
Jean-Francois Dockes
|
9b273d94e8
|
ensure that recoll configured with indexStripChars=1 runs as compiled with -DRCL_INDEX_STRIPCHARS
--HG--
branch : CASEDIACSENS
|
2012-09-15 15:16:20 +02:00 |
|
Jean-Francois Dockes
|
a7222d4f96
|
Make Recoll optionally sensitive to case and diacritics
--HG--
branch : CASEDIACSENS
|
2012-09-14 14:34:27 +02:00 |
|
Jean-Francois Dockes
|
8b40cb0499
|
merged from cdsens branch
|
2012-09-13 12:28:42 +02:00 |
|
Jean-Francois Dockes
|
e0bc65bfdd
|
small mods innocuous or auxiliary to case/diac sensitivity which can live in the main branch
|
2012-09-13 12:25:01 +02:00 |
|
Jean-Francois Dockes
|
25f4fc3b2c
|
none
|
2012-09-13 12:07:27 +02:00 |
|
Jean-Francois Dockes
|
c030a15780
|
Remove improper assertion use from beagle cache handling code
|
2012-09-13 09:44:47 +02:00 |
|
Jean-Francois Dockes
|
20f79e400f
|
fixed incorrect unique() algo usage
|
2012-09-01 17:27:49 +02:00 |
|
Jean-Francois Dockes
|
913dffc597
|
added code for unac to perform pure case-folding
|
2012-08-27 12:40:57 +02:00 |
|
Jean-Francois Dockes
|
909d92b218
|
added some currency symbols to punctuation
|
2012-08-24 20:54:03 +02:00 |
|
Jean-Francois Dockes
|
0ebfc496d8
|
add capability to remember page breaks generated by, e.g. pdftotext, and use them to start an external viewer on a match page
|
2012-08-21 15:03:02 +02:00 |
|
Jean-Francois Dockes
|
f95cee0cb4
|
config checking test program
|
2012-05-28 09:08:55 +02:00 |
|
Jean-Francois Dockes
|
7bbf61c8cb
|
config: getDaemSkippedPaths() could crash if daemSkippedPaths was empty
|
2012-05-25 17:06:01 +02:00 |
|
Jean-Francois Dockes
|
4eaf12fb9c
|
more delistification
|
2012-04-12 08:15:50 +02:00 |
|
Jean-Francois Dockes
|
ec7b40a52e
|
cosmetics: list -> vector in more places
|
2012-04-11 19:58:08 +02:00 |
|
Jean-Francois Dockes
|
a4c17941b1
|
Added a configuration parameter to set specific unaccenting/lowercasing for some characters to be handled differently than would result from using the Unicode database. Example: "a with ring above" could be set to be preserved by a Swedish speaker
|
2012-04-09 12:42:23 +02:00 |
|
Jean-Francois Dockes
|
f044b20d5a
|
Remove dependence on system type name in a few more places
|
2012-04-02 09:52:04 +02:00 |
|
Jean-Francois Dockes
|
581fcbc01e
|
fix handling for some trademark, registered and copyright signs
|
2012-03-20 10:33:27 +01:00 |
|
Jean-Francois Dockes
|
85166c93b2
|
Changed the way we handle document sizes. The fbytes field should now be in most cases the most "natural" document size. pcbytes holds the top external container size and dbytes the text size
|
2012-03-07 15:39:30 +01:00 |
|
Jean-Francois Dockes
|
638d468796
|
clarified the use of string keys inside the Filter metaData array
|
2012-03-07 10:13:46 +01:00 |
|
Jean-Francois Dockes
|
2c6b023a88
|
real time indexer: monitor the configuration for changes and reexecute when needed
|
2012-03-06 09:35:21 +01:00 |
|
Jean-Francois Dockes
|
a5af2b93bd
|
"md5"->cstr_md5
|
2012-02-25 10:41:27 +01:00 |
|
Jean-Francois Dockes
|
ef00bfae70
|
Implement the gui category filters as query language fragments instead of hard-coding them. This allows implementing other kinds of filtering (e.g. on directory) just by changing a configuration file
|
2012-02-18 11:21:09 +01:00 |
|
Jean-Francois Dockes
|
1b0c77c2e4
|
add parameter to specify indexing status file path
|
2012-02-17 16:33:47 +01:00 |
|
Jean-Francois Dockes
|
068fa8ccc7
|
test driver fix
|
2012-02-17 10:17:12 +01:00 |
|
Jean-Francois Dockes
|
f59e2e033a
|
index: update a status file while indexing
|
2012-02-06 17:03:39 +01:00 |
|
Jean-Francois Dockes
|
07226fa306
|
GUI tools for setting up indexing schedule, initial implementation done
|
2011-12-07 13:41:05 +01:00 |
|
Jean-Francois Dockes
|
b9c64e8591
|
Gui: help for cron etc. 1st checkpoint
|
2011-12-02 19:15:24 +01:00 |
|
Jean-Francois Dockes
|
3759c0b52d
|
index: add skippedPathsFnmPathname variable to enable disabling the use of FNM_PATHNAME while matching in skippedPaths. Closes issue #67
|
2011-11-30 16:36:51 +01:00 |
|
Jean-Francois Dockes
|
27430403e2
|
comment
|
2011-11-25 19:44:37 +01:00 |
|
Jean-Francois Dockes
|
49554e42c2
|
Factorized common text transcoding code in separate module
|
2011-10-20 17:53:42 +02:00 |
|
Jean-Francois Dockes
|
6c72454396
|
generate acronyms for dotted abbrevs, e.g. O.E.C.D -> OECD
|
2011-10-20 13:24:29 +02:00 |
|
Jean-Francois Dockes
|
56fe54412f
|
Protect against deadlock when using fam/gamin by adding a small timeout to the peek for events done between add calls. Add alarm to the addwatch call in case the deadlock happens anyway
|
2011-10-13 15:20:28 +02:00 |
|
Jean-Francois Dockes
|
0860b559ee
|
get rid of a few garbage terms during indexing. Set a threshold for conversion errors after which we discard the doc. Stabilize the new termproc pipeline but no commongrams for now
|
2011-10-12 17:55:58 +02:00 |
|
Jean-Francois Dockes
|
5fd31172f5
|
New text to terms processing pipelines: results identical to 1.16 when used with empty stopfile
|
2011-10-07 07:53:49 +02:00 |
|
Jean-Francois Dockes
|
38e0957962
|
const string cleanup
|
2011-10-01 16:39:38 +02:00 |
|
Jean-Francois Dockes
|
3013e843a2
|
log
|
2011-10-01 09:20:10 +02:00 |
|
Jean-Francois Dockes
|
91778f8943
|
lower verbosity
|
2011-09-30 08:21:43 +02:00 |
|
Jean-Francois Dockes
|
424e4173ba
|
threading cleanup: add mutex protection around moronic change to transcode. Add mutex to equiv issue in unac. Rename const strings everywhere to cstr_xx to ease future detection of potentially problematic static variables. Most probably close issue #65
|
2011-09-28 15:01:14 +02:00 |
|
Jean-Francois Dockes
|
5b3c5d8a5d
|
small OpenBSD fixes (mount.h and FILE_OFFSET_BITS)
|
2011-09-23 10:32:41 +02:00 |
|
Jean-Francois Dockes
|
cd27645cc2
|
Avoid fwrite failure while trying to write empty missing helpers string
|
2011-09-20 07:37:28 +02:00 |
|
Jean-Francois Dockes
|
c5ff0cdf52
|
Control memory usage when deleting documents: use idxflushmb as when adding/updating
|
2011-09-07 19:11:11 +02:00 |
|
Jean-Francois Dockes
|
bc6587f07a
|
get rid of unused guesscharset
|
2011-08-21 13:27:37 +02:00 |
|
Jean-Francois Dockes
|
ebbcc115a8
|
Allow setting a weight increase for field terms
|
2011-07-22 16:43:39 +02:00 |
|
Jean-Francois Dockes
|
36516b091b
|
textsplit: discard - in front of words. Handle cjk punctuation characters
|
2011-07-16 11:51:38 +02:00 |
|