17 Commits

Author SHA1 Message Date
Jean-Francois Dockes
560041cab9 cleared out errant tabs 2020-05-30 15:54:49 +02:00
Jean-Francois Dockes
0cbc46732f Fixed the FSF address 2019-03-04 11:19:14 +01:00
Jean-Francois Dockes
0ba71e0e39 devanagari punctuation 2013-10-18 13:06:17 +02:00
Jean-Francois Dockes
d3a26706b5 add a class for skipped characters 2012-10-03 09:07:59 +02:00
Jean-Francois Dockes
efd319025d attempt to eliminate more unicode uninteresting characters 2012-10-02 17:45:16 +02:00
Jean-Francois Dockes
de4225e1ae cleaned up uproplist file 2012-09-20 07:15:15 +02:00
Jean-Francois Dockes
63d97e597b added a bunch of graphic characters to the word breakers list and changed the container used from set to unordered_set for speed 2012-09-19 19:50:45 +02:00
Jean-Francois Dockes
909d92b218 added some currency symbols to punctuation 2012-08-24 20:54:03 +02:00
Jean-Francois Dockes
581fcbc01e fix handling for some trademark, registered and copyright signs 2012-03-20 10:33:27 +01:00
Jean-Francois Dockes
0e37f64a3c added more punctuation 2011-07-16 11:50:02 +02:00
Jean-Francois Dockes
5e59354535 more punctuation 2011-07-12 03:32:00 -07:00
Jean-Francois Dockes
442ff819d0 added a number of unicode punctuation characters 2011-07-06 10:52:16 +02:00
Jean-Francois Dockes
b28eaf23fb Got rid of all the old RCS id strings 2011-04-27 08:22:17 +02:00
dockes
3991b11d2b small fix : remove diaeresis from seps + comments 2009-01-13 16:02:18 +00:00
dockes
3414963810 take care of splitting user string with respect to unicode white space, not only ascii 2008-12-05 11:09:31 +00:00
dockes
3872f8cf38 *** empty log message *** 2006-01-30 11:15:28 +00:00
dockes
d42db8b65d improved word extraction a bit (unicode punctuation) 2005-02-11 11:20:02 +00:00