Jean-Francois Dockes
|
3716ea3dac
|
unify processing for executing a python script
|
2020-09-28 14:04:09 +02:00 |
|
Jean-Francois Dockes
|
c1ef2187d3
|
Fixed LOG calls obsolescence issues preventing build with staticverbosity 7
|
2020-09-06 14:59:00 +01:00 |
|
Jean-Francois Dockes
|
f3858a7e3a
|
limit max size of korean single-word span
|
2020-05-31 09:57:58 +02:00 |
|
Jean-Francois Dockes
|
560041cab9
|
cleared out errant tabs
|
2020-05-30 15:54:49 +02:00 |
|
Jean-Francois Dockes
|
8ac74ca8f5
|
log levels
|
2020-05-24 14:39:06 +02:00 |
|
Jean-Francois Dockes
|
a5bab94ae3
|
korean splitter: break on digits
|
2020-05-24 14:02:23 +02:00 |
|
Jean-Francois Dockes
|
fc981e3733
|
new variation on the korean splitter. Index both the space-less spans whole and the mecab split output
|
2020-05-22 16:48:05 +02:00 |
|
Jean-Francois Dockes
|
ea2db676ed
|
korean: reactivate option to generate both noun,jx and noun+jx
|
2020-05-19 09:23:03 +02:00 |
|
Jean-Francois Dockes
|
97f3212f80
|
korean splitter: disable the noun+jx emitting thing
|
2020-05-14 09:23:09 +02:00 |
|
Jean-Francois Dockes
|
d58fec0b81
|
korean: for now don't filter tags, until it is better understood what should be done
|
2020-05-11 07:33:54 +02:00 |
|
Jean-Francois Dockes
|
48d4678770
|
experiment: Korean when Noun then JX emit both Noun and Noun+JX
|
2020-04-25 14:19:54 +02:00 |
|
Jean-Francois Dockes
|
ec7379f837
|
textsplitko: start cmd as python kosplitter.py
|
2020-04-10 14:34:50 +01:00 |
|
Jean-Francois Dockes
|
afcacf63c0
|
Fix page handling in Korean splitter, bug would shift the byte positions, with bad consequences for snippets
|
2020-03-31 16:11:37 +02:00 |
|
Jean-Francois Dockes
|
7de66aae60
|
Korean splitter: suppress some ctl chars from Komoran input. Better compute pages
|
2020-03-26 18:44:59 +01:00 |
|
Jean-Francois Dockes
|
1afc606718
|
textsplit: break on it.error() not only it.eof(). Seems to make a difference in rare cases? Add Komoran support but this one often fails
|
2020-03-26 09:31:19 +01:00 |
|
Jean-Francois Dockes
|
b677171fa8
|
GUI: Experimental: create a list of MIME types (compiled in for now: hwp) for which we prefer to use stored text for preview because extraction is slow
|
2020-03-25 18:13:00 +01:00 |
|
Jean-Francois Dockes
|
97e89c408a
|
korean splitter: only break korean stretch on non-korean alphabetic (e.g. not numbers or punctuation)
|
2020-03-25 16:57:42 +01:00 |
|
Jean-Francois Dockes
|
207bfec93e
|
korean splitter: restart the python/java splitter from time to time because it leaks memory
|
2020-03-24 11:27:10 +01:00 |
|
Jean-Francois Dockes
|
a323472876
|
typo in textsplitko would prevent use of Mecab
|
2020-03-24 08:50:24 +01:00 |
|
Jean-Francois Dockes
|
9719177c82
|
Korean external splitter: add some support for Mecab
|
2020-03-23 16:20:32 +01:00 |
|
Jean-Francois Dockes
|
c9667b5ba7
|
Korean text: sort-of-working version, in need of validation
|
2020-03-22 15:49:24 +01:00 |
|
Jean-Francois Dockes
|
384e3a1087
|
korean textsplit with extern help from konlpy, first step
|
2020-03-22 10:09:50 +01:00 |
|