Jean-Francois Dockes | 560041cab9 | cleared out errant tabs | 2020-05-30 15:54:49 +02:00 |
Jean-Francois Dockes | 0cbc46732f | Fixed the FSF address | 2019-03-04 11:19:14 +01:00 |
Jean-Francois Dockes | 0ba71e0e39 | devanagari punctuation | 2013-10-18 13:06:17 +02:00 |
Jean-Francois Dockes | d3a26706b5 | add a class for skipped characters | 2012-10-03 09:07:59 +02:00 |
Jean-Francois Dockes | efd319025d | attempt to eliminate more unicode uninteresting characters | 2012-10-02 17:45:16 +02:00 |
Jean-Francois Dockes | de4225e1ae | cleaned up uproplist file | 2012-09-20 07:15:15 +02:00 |
Jean-Francois Dockes | 63d97e597b | added a bunch of graphic characters to the word breakers list and changed the container used from set to unordered_set for speed | 2012-09-19 19:50:45 +02:00 |
Jean-Francois Dockes | 909d92b218 | added some currency symbols to punctuation | 2012-08-24 20:54:03 +02:00 |
Jean-Francois Dockes | 581fcbc01e | fix handling for some trademark, registered and copyright signs | 2012-03-20 10:33:27 +01:00 |
Jean-Francois Dockes | 0e37f64a3c | added more punctuation | 2011-07-16 11:50:02 +02:00 |
Jean-Francois Dockes | 5e59354535 | more punctuation | 2011-07-12 03:32:00 -07:00 |
Jean-Francois Dockes | 442ff819d0 | added a number of unicode punctuation characters | 2011-07-06 10:52:16 +02:00 |
Jean-Francois Dockes | b28eaf23fb | Got rid of all the old RCS id strings | 2011-04-27 08:22:17 +02:00 |
dockes | 3991b11d2b | small fix : remove diaeresis from seps + comments | 2009-01-13 16:02:18 +00:00 |
dockes | 3414963810 | take care of splitting user string with respect to unicode white space, not only ascii | 2008-12-05 11:09:31 +00:00 |
dockes | 3872f8cf38 | *** empty log message *** | 2006-01-30 11:15:28 +00:00 |
dockes | d42db8b65d | improved word extraction a bit (unicode punctuation) | 2005-02-11 11:20:02 +00:00 |