Jean-Francois Dockes | 04cd868950 | Handle the case where unac produces whitespace, which may occur with letter-less accents | 2015-08-13 18:22:09 +02:00
Jean-Francois Dockes | 94b94593e3 | comments and indent | 2015-06-09 19:34:15 +02:00
Jean-Francois Dockes | 657c65d438 | Prevent error caused by trying to add a posting for an empty term (created by unac on really weird data) | 2012-11-16 17:41:14 +01:00
Jean-Francois Dockes | 913dffc597 | added code for unac to perform pure case-folding | 2012-08-27 12:40:57 +02:00
Jean-Francois Dockes | ee9dbda9fc | comments doc and formatting | 2012-08-24 10:26:16 +02:00
Jean-Francois Dockes | 0ebfc496d8 | add capability to remember page breaks generated by, e.g., pdftotext, and use them to start an external viewer on a match page | 2012-08-21 15:03:02 +02:00
Jean-Francois Dockes | 0860b559ee | get rid of a few garbage terms during indexing. Set a threshold for conversion errors after which we discard the doc. Stabilize the new termproc pipeline but no commongrams for now | 2011-10-12 17:55:58 +02:00
Jean-Francois Dockes | 4a7ff398b2 | comments | 2011-10-07 08:05:36 +02:00
Jean-Francois Dockes | 5fd31172f5 | New text-to-terms processing pipelines: results identical to 1.16 when used with empty stopfile | 2011-10-07 07:53:49 +02:00