recoll

Author	SHA1	Message	Date
Jean-Francois Dockes	06cd2bfd87	unac: exclude Tamil, move to Unicode 14.0.0, modernize autoxx, fix C build	2022-09-24 09:14:51 +02:00
Jean-Francois Dockes	560041cab9	cleared out errant tabs	2020-05-30 15:54:49 +02:00
Jean-Francois Dockes	c1fad4afc7	Replaced pthread with std:: thread and mutex	2016-07-12 18:08:21 +02:00
Jean-Francois Dockes	b6eb3589ba	do not unaccent Bengali characters (process like the Hindi ones)	2014-07-16 12:47:30 +02:00
medoc	698affcfc8	Don't strip diacritics from Hindi Devanagari characters: they determine word meaning	2013-10-26 18:56:25 +02:00
Jean-Francois Dockes	913dffc597	added code for unac to perform pure case-folding	2012-08-27 12:40:57 +02:00
Jean-Francois Dockes	a4c17941b1	Added a configuration parameter to set specific unaccenting/lowercasing for some characters, so that they are handled differently than the Unicode database would dictate. Example: "a with ring above" could be set to be preserved by a Swedish speaker	2012-04-09 12:42:23 +02:00
Jean-Francois Dockes	0d24b5620b	Make unac suppress combining accents found in input. Input in decomposed form was previously not unaccented	2011-11-04 21:06:48 +01:00
Jean-Francois Dockes	424e4173ba	threading cleanup: add mutex protection around moronic change to transcode. Add mutex to equiv issue in unac. Rename const strings everywhere to cstr_xx to ease future detection of potentially problematic static variables. Most probably close issue #65	2011-09-28 15:01:14 +02:00
dockes	0fc81d26b6	new unac approach for Japanese: don't decompose at all	2009-01-06 18:40:41 +00:00
dockes	869d75ee03	use Unicode 5.1.0 + don't unaccent katakana/hiragana. Main change in Unicode is that letters ae and o with stroke don't decompose anymore into a+e and o+e. We may actually want to restore this if it proves a problem	2008-12-18 11:04:47 +00:00
dockes	00b954c4ef	implemented additional case-folding	2006-01-06 13:10:08 +00:00
dockes	b396d2c39f	initial import	2006-01-06 13:08:12 +00:00

13 Commits