27 Commits

Author SHA1 Message Date
Jean-Francois Dockes
ce0352eff4 Disable std::regex use for older gcc versions 2021-08-13 21:38:12 +02:00
Jean-Francois Dockes
fc0a48a524 Snippets generation: we did not store a possible last incomplete snippet at the end of the text 2021-04-13 10:43:10 +02:00
Jean-Francois Dockes
a3b1b48450 fixed macOS clang warnings 2021-04-01 09:22:17 -07:00
freddii
89c7efe682 fixed typos 2021-02-04 17:12:22 +01:00
Jean-Francois Dockes
796db76fc6 When splitting to generate abstract from text, do not set ONLYSPANS, generate all terms. Seems to solve issues with the snippet generator not finding a match when the query term is a partial span 2020-05-30 12:37:14 +02:00
Jean-Francois Dockes
39c152bada Fixed MSVC warnings, all innocuous 2020-04-17 14:26:40 +01:00
Jean-Francois Dockes
afcacf63c0 Fix page handling in Korean splitter, bug would shift the byte positions, with bad consequences for snippets 2020-03-31 16:11:37 +02:00
Jean-Francois Dockes
6a405e2089 hldata: comments + map->unordered_map 2019-07-21 19:13:24 +02:00
Jean-Francois Dockes
6cd2c9e2ca snippets: allow a little more contiguous expansion of current snippet 2019-07-21 19:13:24 +02:00
Jean-Francois Dockes
35ee3f7a13 Highlighting and snippets extraction: reworked to handle phrases properly. Use a compound position list instead of multiplying the OR groups inside a near clause 2019-07-21 19:09:51 +02:00
Jean-Francois Dockes
736051fcd6 GUI snippets window: add options for the max list length and for sorting the snippets by page number 2019-07-21 19:09:51 +02:00
Jean-Francois Dockes
2a945c9443 abstract: we used to discard snippets too early, before they might get a phrase weight boost 2019-05-24 08:51:11 +02:00
Jean-Francois Dockes
fdb14e60ac building abstract from stored text: limit count of terms explored to avoid taking forever on monster (multi mega-terms) documents 2019-05-17 09:37:39 +02:00
Jean-Francois Dockes
34bb62a8d9 got rid of a few unused variable warnings 2019-04-11 15:31:27 +02:00
Jean-Francois Dockes
0cbc46732f Fixed the FSF address 2019-03-04 11:19:14 +01:00
Jean-Francois Dockes
b69912bfab Fix crash during abstract generation, occurring when no matching fragments are found 2019-02-19 19:02:23 +01:00
Jean-Francois Dockes
b079f0fb94 adjust log message levels and fix a warning 2019-02-04 11:42:35 +01:00
Jean-Francois Dockes
8358742132 get things to build on centos7.5 (cosmetic changes) 2018-09-02 18:47:03 +02:00
Jean-Francois Dockes
84abb8ac04 Fix regex used for cleaning up snippets 2018-04-12 12:25:05 +02:00
Jean-Francois Dockes
e4e5ee35d6 cleanup repeated punctuation in snippets 2018-04-10 13:07:27 +02:00
Jean-Francois Dockes
3d4fd3c62e When storing doc text, always use a metadata entry. Get rid of the code to store it in the data record. Make storing the default. Add "fetchtext" parameter to getDoc() to fetch and store the text in doc.text. Make this accessible from Python. Misc comments and indents. 2018-01-25 13:20:02 +01:00
Jean-Francois Dockes
2c76a70c0e Abstracts: storing raw doc text in user metadata records 2018-01-06 11:38:24 +01:00
Jean-Francois Dockes
57d9ece876 rclabsfromtext: do not add page numbers if there are no pages 2018-01-06 10:39:02 +01:00
Jean-Francois Dockes
a35de1ef1e snippets: fix to the group matching code 2018-01-03 15:53:04 +01:00
Jean-Francois Dockes
567401233a Building abstract/snippets from the doc text: process phrase/group terms 2018-01-03 15:28:46 +01:00
Jean-Francois Dockes
bb810f9ceb Changed new param name storerawtext->storedoctext. + comments 2018-01-02 19:23:12 +01:00
Jean-Francois Dockes
b4493ed9e1 Snippets generation: add method for generating from doc stored text. Still needs refining, esp. for phrase/near 2017-12-30 08:43:14 +01:00