Jean-Francois Dockes
|
48d4678770
|
experiment: Korean: when a Noun is followed by JX, emit both Noun and Noun+JX
|
2020-04-25 14:19:54 +02:00 |
|
Jean-Francois Dockes
|
ec7379f837
|
textsplitko: start cmd as python kosplitter.py
|
2020-04-10 14:34:50 +01:00 |
|
Jean-Francois Dockes
|
afcacf63c0
|
Fix page handling in Korean splitter, bug would shift the byte positions, with bad consequences for snippets
|
2020-03-31 16:11:37 +02:00 |
|
Jean-Francois Dockes
|
7de66aae60
|
Korean splitter: suppress some ctl chars from Komoran input. Better compute pages
|
2020-03-26 18:44:59 +01:00 |
|
Jean-Francois Dockes
|
1afc606718
|
textsplit: break on it.error() not only it.eof(). Seems to make a difference in rare cases? Add Komoran support but this one often fails
|
2020-03-26 09:31:19 +01:00 |
|
Jean-Francois Dockes
|
b677171fa8
|
GUI: Experimental: create a list of MIME types (compiled in for now: hwp) for which we prefer to use stored text for preview because extraction is slow
|
2020-03-25 18:13:00 +01:00 |
|
Jean-Francois Dockes
|
97e89c408a
|
korean splitter: only break korean stretch on non-korean alphabetic (e.g. not numbers or punctuation)
|
2020-03-25 16:57:42 +01:00 |
|
Jean-Francois Dockes
|
207bfec93e
|
korean splitter: restart the python/java splitter from time to time because it leaks memory
|
2020-03-24 11:27:10 +01:00 |
|
Jean-Francois Dockes
|
a323472876
|
typo in textsplitko would prevent use of Mecab
|
2020-03-24 08:50:24 +01:00 |
|
Jean-Francois Dockes
|
9719177c82
|
Korean external splitter: add some support for Mecab
|
2020-03-23 16:20:32 +01:00 |
|
Jean-Francois Dockes
|
c9667b5ba7
|
Korean text: sort-of-working version, in need of validation
|
2020-03-22 15:49:24 +01:00 |
|
Jean-Francois Dockes
|
384e3a1087
|
korean textsplit with external help from konlpy, first step
|
2020-03-22 10:09:50 +01:00 |
|