Jean-Francois Dockes
48d4678770
experiment: Korean when Noun then JX emit both Noun and Noun+JX
2020-04-25 14:19:54 +02:00
Jean-Francois Dockes
2f794be314
Fix Windows gcc build. Needs some def to get w7+ windows api
2020-04-25 11:41:37 +02:00
Jean-Francois Dockes
07e3387fc1
Avoid calling isalpha() with big ints, may crash, depending on version
2020-04-25 11:19:52 +02:00
Jean-Francois Dockes
39c152bada
Fixed MSVC warnings, all innocuous
2020-04-17 14:26:40 +01:00
Jean-Francois Dockes
12ebb7ac6e
Windows: deal with non-ASCII user login, non-ascii paths in confdir etc.
2020-04-15 14:03:04 +01:00
Jean-Francois Dockes
9565663f09
textsplit: create isNGRAMMED() method to replace isCJK() and let the latter actually return what it says
2020-04-14 09:27:26 +02:00
Jean-Francois Dockes
eb53b598d6
Textsplit: lost char at korean->ascii transition
2020-04-10 14:54:13 +01:00
Jean-Francois Dockes
ec7379f837
textsplitko: start cmd as python kosplitter.py
2020-04-10 14:34:50 +01:00
Jean-Francois Dockes
de246349da
textsplit: use more regular test for ISHANGUL. CJK: do not ignore whitespace, break on alphabetic non cjk character
2020-04-10 14:28:14 +02:00
Jean-Francois Dockes
6999284c42
indent and decls
2020-04-05 13:46:47 +01:00
Jean-Francois Dockes
a468406e17
windows/qtcreator msvc adjustments
2020-04-04 14:00:39 +01:00
Jean-Francois Dockes
7656d1b2ef
Merge branch 'master' of https://framagit.org/medoc90/recoll
2020-04-03 07:34:41 +01:00
Jean-Francois Dockes
b0fb7612ee
some msvc changes
2020-04-03 07:33:27 +01:00
Jean-Francois Dockes
afcacf63c0
Fix page handling in Korean splitter: the bug would shift the byte positions, with bad consequences for snippets
2020-03-31 16:11:37 +02:00
Jean-Francois Dockes
7de66aae60
Korean splitter: suppress some ctl chars from Komoran input. Better compute pages
2020-03-26 18:44:59 +01:00
Jean-Francois Dockes
9b3a5fac12
Merge branch 'kopostag'
2020-03-26 14:03:17 +01:00
Jean-Francois Dockes
f755505e98
bumped version
2020-03-26 11:02:37 +01:00
Jean-Francois Dockes
1afc606718
textsplit: break on it.error() not only it.eof(). Seems to make a difference in rare cases? Add Komoran support but this one often fails
2020-03-26 09:31:19 +01:00
Jean-Francois Dockes
b677171fa8
GUI: Experimental: create a list of MIME types (compiled in for now: hwp) for which we prefer to use stored text for preview because extraction is slow
2020-03-25 18:13:00 +01:00
Jean-Francois Dockes
97e89c408a
korean splitter: only break korean stretch on non-korean alphabetic (e.g. not numbers or punctuation)
2020-03-25 16:57:42 +01:00
Jean-Francois Dockes
207bfec93e
korean splitter: restart the python/java splitter from time to time because it leaks memory
2020-03-24 11:27:10 +01:00
Jean-Francois Dockes
a323472876
typo in textsplitko would prevent use of Mecab
2020-03-24 08:50:24 +01:00
Jean-Francois Dockes
9719177c82
Korean external splitter: add some support for Mecab
2020-03-23 16:20:32 +01:00
Jean-Francois Dockes
c9667b5ba7
Korean text: sort-of-working version, in need of validation
2020-03-22 15:49:24 +01:00
Jean-Francois Dockes
384e3a1087
korean textsplit with external help from konlpy, first step
2020-03-22 10:09:50 +01:00
Jean-Francois Dockes
d83bb8cf69
indents
2020-03-21 10:24:26 +01:00
Jean-Francois Dockes
5be3ed89c5
comments
2020-03-21 10:16:44 +01:00
Jean-Francois Dockes
a52f68657d
bump version
2020-02-27 18:20:38 +01:00
Jean-Francois Dockes
25a5f3a7e0
Explicitly test for malloc_trim() in configure
2020-02-25 16:45:29 +01:00
Jean-Francois Dockes
6efd91c057
bumped version
2020-01-13 10:24:41 +01:00
Jean-Francois Dockes
414222c003
use conftree conversions
2019-12-02 09:37:34 +01:00
Jean-Francois Dockes
83e29a9b01
Windows: enable the firefox recent history indexer.
2019-11-24 10:46:23 +01:00
Jean-Francois Dockes
9c7886e2df
rclconfig: create an empty fields file along with the others during initial run: necessitated by the conftree change
2019-11-08 15:52:51 +01:00
Jean-Francois Dockes
8b4656ede5
rclconfig: more uniform generation and improved readability of error message
2019-11-01 09:06:37 +01:00
Jean-Francois Dockes
375806560b
windows version update
2019-10-27 07:04:50 +01:00
Jean-Francois Dockes
c11cac2868
spelling fixes, mostly in comments, also man pages
2019-10-18 09:13:10 +02:00
Jean-Francois Dockes
3484451d46
small changes after windows aspell support port
2019-10-13 12:58:05 +02:00
Jean-Francois Dockes
6c5440ff7b
Add aspell support on Windows
2019-10-13 10:12:53 +02:00
Jean-Francois Dockes
9c111fba29
macports: ensure the GUI finds recollindex
2019-09-27 11:36:31 +02:00
Jean-Francois Dockes
336dc0dc48
Default to UTF-8 on the mac, nl_langinfo(CODESET) returns US-ASCII for desktop apps
2019-09-23 16:45:40 +02:00
Jean-Francois Dockes
d4c099ab59
merged branch 1.25 fixes
2019-08-09 11:54:39 +02:00
Jean-Francois Dockes
37325ceee5
bump version to 1.25.21
2019-08-08 14:18:57 +02:00
Jean-Francois Dockes
c3bc2da9af
bumped version to 1.25.20
2019-07-22 16:01:26 +02:00
Jean-Francois Dockes
c18f069c58
Windows: add the recoll temporary files directory to skippedPaths
2019-07-22 09:33:19 +02:00
Jean-Francois Dockes
bbf8c90185
experiment: ignore all ascii whitespace when generating cjk ngrams
2019-07-21 19:13:24 +02:00
Jean-Francois Dockes
baa6062de1
Do not process hangul as words, but as ngrams. Same issues as with Katakana: word separation too hard
2019-07-21 19:09:51 +02:00
Jean-Francois Dockes
6b058e9758
Regularise processing of hangul characters (there was a mixup of cjk/regular processing), and add a build-time option to either use cjk/ngram or regular term splitting for them
2019-07-21 19:09:51 +02:00
Jean-Francois Dockes
049ba1e7e4
Windows: build with UNICODE, get rid of all TCHAR/TEXT(), use explicit xxA() interfaces and wchar_t in some places. Add a static cleanup retry method to TempFile, called after clearing the MimeHandler cache (killing the subprocesses which might hold an open file).
2019-07-21 16:23:16 +02:00
Jean-Francois Dockes
45043b816f
add onlyNames config variable for filtering file names
2019-06-17 08:28:14 +02:00
Jean-Francois Dockes
1991e132a7
bumped version to 1.25.19
2019-06-13 08:38:02 +02:00