Jean-Francois Dockes
|
06cd2bfd87
|
unac: exclude Tamil, move to Unicode 14.0.0, modernize autoxx, fix C build
|
2022-09-24 09:14:51 +02:00 |
|
Jean-Francois Dockes
|
a24fc7bacc
|
indents and readability
|
2021-11-02 12:05:04 +01:00 |
|
Jean-Francois Dockes
|
560041cab9
|
cleared out errant tabs
|
2020-05-30 15:54:49 +02:00 |
|
Jean-Francois Dockes
|
4fdfe04ce5
|
independantly->independently
|
2019-12-02 10:46:46 +01:00 |
|
Jean-Francois Dockes
|
c1fad4afc7
|
Replaced pthread with std::thread and std::mutex
|
2016-07-12 18:08:21 +02:00 |
|
Jean-Francois Dockes
|
c1c73573d8
|
more int fixups
--HG--
branch : WINDOWSPORT
|
2015-09-02 07:34:59 +02:00 |
|
Jean-Francois Dockes
|
b6eb3589ba
|
do not unaccent Bengali characters (process like the Hindi ones)
|
2014-07-16 12:47:30 +02:00 |
|
medoc
|
698affcfc8
|
Don't strip diacritics from Hindi Devanagari characters, they determine word meaning
|
2013-10-26 18:56:25 +02:00 |
|
medoc
|
ab28204ab1
|
make modern automake happy
|
2013-10-26 18:54:54 +02:00 |
|
medoc
|
b7a35ad1e1
|
INCLUDES->AM_CPPFLAGS
|
2013-10-26 18:52:19 +02:00 |
|
medoc
|
75ee622f24
|
INCLUDES->AM_CPPFLAGS
|
2013-10-26 18:52:01 +02:00 |
|
Jean-Francois Dockes
|
1d2f93802f
|
simplified the except_trans container, previous method was buggy
|
2012-09-20 13:46:09 +02:00 |
|
Jean-Francois Dockes
|
913dffc597
|
added code for unac to perform pure case-folding
|
2012-08-27 12:40:57 +02:00 |
|
Jean-Francois Dockes
|
a4c17941b1
|
Added a configuration parameter to set specific unaccenting/lowercasing for some characters, so that they are handled differently than would result from using the Unicode database. Example: "a with ring above" could be set to be preserved by a Swedish speaker
|
2012-04-09 12:42:23 +02:00 |
|
Jean-Francois Dockes
|
0d24b5620b
|
Make unac suppress combining accents found in input. Input in decomposed form was previously not unaccented
|
2011-11-04 21:06:48 +01:00 |
|
Jean-Francois Dockes
|
a2c9d2a82b
|
simplify initial memory allocs by using realloc in all cases
|
2011-10-10 18:44:46 +02:00 |
|
Jean-Francois Dockes
|
424e4173ba
|
threading cleanup: add mutex protection around moronic change to transcode. Add mutex to equiv issue in unac. Rename const strings everywhere to cstr_xx to ease future detection of potentially problematic static variables. Most probably closes issue #65
|
2011-09-28 15:01:14 +02:00 |
|
Jean-Francois Dockes
|
773ab56327
|
perform some iconv_open caching
|
2011-08-21 13:54:09 +02:00 |
|
dockes
|
0fc81d26b6
|
new unac approach for Japanese: don't decompose at all
|
2009-01-06 18:40:41 +00:00 |
|
dockes
|
f83d325b72
|
*** empty log message ***
|
2008-12-21 13:05:17 +00:00 |
|
dockes
|
36919ab728
|
no going out of the basic plane!
|
2008-12-18 11:58:13 +00:00 |
|
dockes
|
caf54d1d7f
|
added recoll memory allocation checks
|
2008-12-18 11:12:37 +00:00 |
|
dockes
|
c0aa102ac0
|
*** empty log message ***
|
2008-12-18 11:05:09 +00:00 |
|
dockes
|
869d75ee03
|
use Unicode 5.1.0 and don't unaccent katakana/hiragana. The main change in Unicode is that the letters ae and o with stroke no longer decompose into a+e and o+e; we may actually want to restore this if it proves a problem
|
2008-12-18 11:04:47 +00:00 |
|
dockes
|
00b954c4ef
|
implemented additional case-folding
|
2006-01-06 13:10:08 +00:00 |
|
dockes
|
b396d2c39f
|
initial import
|
2006-01-06 13:08:12 +00:00 |
|