Jean-Francois Dockes
|
b4c7efe490
|
Added (unifdefd) code to detect garbage data like undecoded base64 by looking at word length stats
|
2013-04-27 08:29:55 +02:00 |
|
Jean-Francois Dockes
|
0ebfc496d8
|
add capability to remember page breaks generated by, e.g. pdftotext, and use them to start an external viewer on a match page
|
2012-08-21 15:03:02 +02:00 |
|
Jean-Francois Dockes
|
4eaf12fb9c
|
more delistification
|
2012-04-12 08:15:50 +02:00 |
|
Jean-Francois Dockes
|
5fd31172f5
|
New text to terms processing pipelines: results identical to 1.16 when used with empty stopfile
|
2011-10-07 07:53:49 +02:00 |
|
Jean-Francois Dockes
|
cb0794e92c
|
textsplit: eliminate some garbage terms (ie long sequences of dashes)
|
2011-07-06 16:20:32 +02:00 |
|
Jean-Francois Dockes
|
b28eaf23fb
|
Got rid of all the old RCS id strings
|
2011-04-27 08:22:17 +02:00 |
|
Jean-Francois Dockes
|
48358c8252
|
Added option nonumbers not to generate terms for numbers. closes #16
|
2010-05-05 10:18:56 +02:00 |
|
Jean-Francois Dockes
|
8b2b00bc72
|
cosmetics: use derived class for actual splitter instead of callback
|
2010-02-02 15:33:52 +01:00 |
|
dockes
|
6169fdec4b
|
Emit a_b intermediary span when splitting a_b.c
|
2009-01-27 10:25:26 +00:00 |
|
dockes
|
64ef8d0b81
|
dont insert space in cjk abstracts
|
2008-12-12 11:53:45 +00:00 |
|
dockes
|
3414963810
|
take care of splitting user string with respect to unicode white space, not only ascii
|
2008-12-05 11:09:31 +00:00 |
|
dockes
|
90e378333e
|
make cjk ngramlen configurable
|
2007-10-04 12:21:52 +00:00 |
|
dockes
|
4adb351ca4
|
add flag to disable cjk processing
|
2007-10-02 11:39:08 +00:00 |
|
dockes
|
069d71ea8f
|
initial cjk support
|
2007-09-20 08:45:05 +00:00 |
|
dockes
|
ba295fae4f
|
use m_ prefix for members
|
2007-09-18 20:35:31 +00:00 |
|
dockes
|
d12021b22c
|
handle wildcards in search terms
|
2007-01-18 12:09:58 +00:00 |
|
dockes
|
554f75c99c
|
only autophrase if query has several terms
|
2006-12-08 07:11:17 +00:00 |
|
dockes
|
9d6963c95a
|
improved textsplit speed (needs utf8iter modifs too)
|
2006-11-20 11:17:53 +00:00 |
|
dockes
|
b3ab39522b
|
optim ckpt
|
2006-11-19 18:37:37 +00:00 |
|
dockes
|
31b348b736
|
phrase queries with both spans and words must be split as words only
|
2006-11-12 08:35:11 +00:00 |
|
dockes
|
3872f8cf38
|
*** empty log message ***
|
2006-01-30 11:15:28 +00:00 |
|
dockes
|
3c78938565
|
*** empty log message ***
|
2006-01-28 15:36:59 +00:00 |
|
dockes
|
8c9eb8c6d3
|
more textsplit tweaking
|
2006-01-28 10:23:55 +00:00 |
|
dockes
|
ce740a26ad
|
most of adv search working. Still need subtree/filename filters
|
2005-10-19 10:21:48 +00:00 |
|
dockes
|
8493933aef
|
comments
|
2005-10-10 13:25:23 +00:00 |
|
dockes
|
4588803281
|
phrases ok except for preview position
|
2005-02-08 10:56:13 +00:00 |
|
dockes
|
4c54a8478f
|
fixes in textsplit
|
2005-02-08 09:34:47 +00:00 |
|
dockes
|
2a020407da
|
simple term highlighting in query preview
|
2005-02-07 13:17:47 +00:00 |
|
dockes
|
5210139b85
|
*** empty log message ***
|
2005-01-24 13:17:59 +00:00 |
|
dockes
|
869b57eb8c
|
*** empty log message ***
|
2004-12-17 13:01:01 +00:00 |
|
dockes
|
5ca462cdff
|
*** empty log message ***
|
2004-12-14 17:54:16 +00:00 |
|