477 Commits

Author SHA1 Message Date
Jean-Francois Dockes
39c152bada Fixed MSVC warnings, all innocuous 2020-04-17 14:26:40 +01:00
Jean-Francois Dockes
12ebb7ac6e Windows: deal with non-ASCII user login, non-ascii paths in confdir etc. 2020-04-15 14:03:04 +01:00
Jean-Francois Dockes
9565663f09 textsplit: create isNGRAMMED() method to replace isCJK() and let the latter actually return what it says 2020-04-14 09:27:26 +02:00
Jean-Francois Dockes
eb53b598d6 Textsplit: lost char at korean->ascii transition 2020-04-10 14:54:13 +01:00
Jean-Francois Dockes
ec7379f837 textsplitko: start cmd as python kosplitter.py 2020-04-10 14:34:50 +01:00
Jean-Francois Dockes
de246349da textsplit: use more regular test for ISHANGUL. CJK: do not ignore whitespace, break on alphabetic non cjk character 2020-04-10 14:28:14 +02:00
Jean-Francois Dockes
6999284c42 indent and decls 2020-04-05 13:46:47 +01:00
Jean-Francois Dockes
a468406e17 windows/qtcreator msvc adjustments 2020-04-04 14:00:39 +01:00
Jean-Francois Dockes
7656d1b2ef Merge branch 'master' of https://framagit.org/medoc90/recoll 2020-04-03 07:34:41 +01:00
Jean-Francois Dockes
b0fb7612ee some msvc changes 2020-04-03 07:33:27 +01:00
Jean-Francois Dockes
afcacf63c0 Fix page handling in Korean splitter, bug would shift the byte positions, with bad consequences for snippets 2020-03-31 16:11:37 +02:00
Jean-Francois Dockes
7de66aae60 Korean splitter: suppress some ctl chars from Komoran input. Better compute pages 2020-03-26 18:44:59 +01:00
Jean-Francois Dockes
9b3a5fac12 Merge branch 'kopostag' 2020-03-26 14:03:17 +01:00
Jean-Francois Dockes
f755505e98 bumped version 2020-03-26 11:02:37 +01:00
Jean-Francois Dockes
1afc606718 textsplit: break on it.error() not only it.eof(). Seems to make a difference in rare cases? Add Komoran support but this one often fails 2020-03-26 09:31:19 +01:00
Jean-Francois Dockes
b677171fa8 GUI: Experimental: create a list of MIME types (compiled in for now: hwp) for which we prefer to use stored text for preview because extraction is slow 2020-03-25 18:13:00 +01:00
Jean-Francois Dockes
97e89c408a korean splitter: only break korean stretch on non-korean alphabetic (e.g. not numbers or punctuation) 2020-03-25 16:57:42 +01:00
Jean-Francois Dockes
207bfec93e korean splitter: restart the python/java splitter from time to time because it leaks memory 2020-03-24 11:27:10 +01:00
Jean-Francois Dockes
a323472876 typo in textsplitko would prevent use of Mecab 2020-03-24 08:50:24 +01:00
Jean-Francois Dockes
9719177c82 Korean external splitter: add some support for Mecab 2020-03-23 16:20:32 +01:00
Jean-Francois Dockes
c9667b5ba7 Korean text: sort-of-working version, in need of validation 2020-03-22 15:49:24 +01:00
Jean-Francois Dockes
384e3a1087 korean textsplit with extern help from konlpy, first step 2020-03-22 10:09:50 +01:00
Jean-Francois Dockes
d83bb8cf69 indents 2020-03-21 10:24:26 +01:00
Jean-Francois Dockes
5be3ed89c5 comments 2020-03-21 10:16:44 +01:00
Jean-Francois Dockes
a52f68657d bump version 2020-02-27 18:20:38 +01:00
Jean-Francois Dockes
25a5f3a7e0 Explicitly test for malloc_trim() in configure 2020-02-25 16:45:29 +01:00
Jean-Francois Dockes
6efd91c057 bumped version 2020-01-13 10:24:41 +01:00
Jean-Francois Dockes
414222c003 use conftree conversions 2019-12-02 09:37:34 +01:00
Jean-Francois Dockes
83e29a9b01 Windows: enable the firefox recent history indexer. 2019-11-24 10:46:23 +01:00
Jean-Francois Dockes
9c7886e2df rclconfig: create an empty fields along the others during initial run: necessitated by the conftree change 2019-11-08 15:52:51 +01:00
Jean-Francois Dockes
8b4656ede5 rclconfig: more uniform generation and improved readability of error message 2019-11-01 09:06:37 +01:00
Jean-Francois Dockes
375806560b windows version update 2019-10-27 07:04:50 +01:00
Jean-Francois Dockes
c11cac2868 orthography, mostly in comments, also man pages 2019-10-18 09:13:10 +02:00
Jean-Francois Dockes
3484451d46 small changes after windows aspell support port 2019-10-13 12:58:05 +02:00
Jean-Francois Dockes
6c5440ff7b Add aspell support on Windows 2019-10-13 10:12:53 +02:00
Jean-Francois Dockes
9c111fba29 macports: ensure the GUI finds recollindex 2019-09-27 11:36:31 +02:00
Jean-Francois Dockes
336dc0dc48 Default to UTF-8 on the mac, nl_langinfo(CODESET) returns US-ASCII for desktop apps 2019-09-23 16:45:40 +02:00
Jean-Francois Dockes
d4c099ab59 merged branch 1.25 fixes 2019-08-09 11:54:39 +02:00
Jean-Francois Dockes
37325ceee5 bump version to 1.25.21 2019-08-08 14:18:57 +02:00
Jean-Francois Dockes
c3bc2da9af bumped version to 1.25.20 2019-07-22 16:01:26 +02:00
Jean-Francois Dockes
c18f069c58 Windows: add the recoll temporary files directory to skippedPaths 2019-07-22 09:33:19 +02:00
Jean-Francois Dockes
bbf8c90185 experiment: ignore all ascii whitespace when generating cjk ngrams 2019-07-21 19:13:24 +02:00
Jean-Francois Dockes
baa6062de1 Do not process hangul as words, but as ngrams. Same issues as with Katakana: word separation too hard 2019-07-21 19:09:51 +02:00
Jean-Francois Dockes
6b058e9758 Regularise processing of hangul characters (there was a mixup of cjk/regular processing), and add a build-time option to either use cjk/ngram or regular term splitting for them 2019-07-21 19:09:51 +02:00
Jean-Francois Dockes
049ba1e7e4 Windows: build with UNICODE, get rid of all TCHAR/TEXT(), use explicit xxA() interfaces and wchar_t in some places. Add a static cleanup retry method to TempFile, called after clearing the MimeHandler cache (killing the subprocesses which might hold an open file). 2019-07-21 16:23:16 +02:00
Jean-Francois Dockes
45043b816f add onlyNames config variable for filtering file names 2019-06-17 08:28:14 +02:00
Jean-Francois Dockes
1991e132a7 bumped version to 1.25.19 2019-06-13 08:38:02 +02:00
Jean-Francois Dockes
b759490559 gcc 9.1: comparison object needs to be invocable as const. fixes issue #95 2019-06-12 11:17:35 +02:00
Jean-Francois Dockes
0101e6e160 bumped version ->1.25.18 2019-05-27 17:08:43 +02:00
Jean-Francois Dockes
15dc419fec windows: releases adjustments 2019-05-22 15:52:02 +02:00