Jean-Francois Dockes
ce0352eff4
Disable std::regex use for older gcc versions
2021-08-13 21:38:12 +02:00
Jean-Francois Dockes
fc0a48a524
Snippets generation: we did not store a possible last incomplete snippet at the end of the text
2021-04-13 10:43:10 +02:00
Jean-Francois Dockes
a3b1b48450
fixed macOS clang warnings
2021-04-01 09:22:17 -07:00
freddii
89c7efe682
fixed typos
2021-02-04 17:12:22 +01:00
Jean-Francois Dockes
796db76fc6
When splitting to generate abstract from text, do not set ONLYSPANS, generate all terms. Seems to solve issues with the snippet generator not finding a match when the query term is a partial span
2020-05-30 12:37:14 +02:00
Jean-Francois Dockes
39c152bada
Fixed MSVC warnings, all innocuous
2020-04-17 14:26:40 +01:00
Jean-Francois Dockes
afcacf63c0
Fix page handling in Korean splitter, bug would shift the byte positions, with bad consequences for snippets
2020-03-31 16:11:37 +02:00
Jean-Francois Dockes
6a405e2089
hldata: comments + map->unordered_map
2019-07-21 19:13:24 +02:00
Jean-Francois Dockes
6cd2c9e2ca
snippets: allow a little more contiguous expansion of current snippet
2019-07-21 19:13:24 +02:00
Jean-Francois Dockes
35ee3f7a13
Highlighting and snippets extraction: reworked to handle phrases properly. Use a compound position list instead of multiplying the OR groups inside a near clause
2019-07-21 19:09:51 +02:00
Jean-Francois Dockes
736051fcd6
GUI snippets window: add options for the max list length and for sorting the snippets by page number
2019-07-21 19:09:51 +02:00
Jean-Francois Dockes
2a945c9443
abstract: we used to discard snippets too early, before they might get a phrase weight boost
2019-05-24 08:51:11 +02:00
Jean-Francois Dockes
fdb14e60ac
building abstract from stored text: limit count of terms explored to avoid taking forever on monster (multi mega-terms) documents
2019-05-17 09:37:39 +02:00
Jean-Francois Dockes
34bb62a8d9
got rid of a few unused variable warnings
2019-04-11 15:31:27 +02:00
Jean-Francois Dockes
0cbc46732f
Fixed the FSF address
2019-03-04 11:19:14 +01:00
Jean-Francois Dockes
b69912bfab
Fix crash during abstract generation, occurring when no matching fragments are found
2019-02-19 19:02:23 +01:00
Jean-Francois Dockes
b079f0fb94
adjust log message levels and fix a warning
2019-02-04 11:42:35 +01:00
Jean-Francois Dockes
8358742132
get things to build on centos7.5 (cosmetic changes)
2018-09-02 18:47:03 +02:00
Jean-Francois Dockes
84abb8ac04
Fix regex used for cleaning up snippets
2018-04-12 12:25:05 +02:00
Jean-Francois Dockes
e4e5ee35d6
cleanup repeated punctuation in snippets
2018-04-10 13:07:27 +02:00
Jean-Francois Dockes
3d4fd3c62e
When storing doc text, always use a metadata entry. Get rid of the code to
store it in the data record. Make storing the default. Add "fetchtext"
parameter to getDoc() to fetch and store the text in doc.text. Make this
accessible from Python. Misc comments and indents.
2018-01-25 13:20:02 +01:00
Jean-Francois Dockes
2c76a70c0e
Abstracts: storing raw doc text in user metadata records
2018-01-06 11:38:24 +01:00
Jean-Francois Dockes
57d9ece876
rclabsfromtext: do not add page numbers if there are no pages
2018-01-06 10:39:02 +01:00
Jean-Francois Dockes
a35de1ef1e
snippets: fix to the group matching code
2018-01-03 15:53:04 +01:00
Jean-Francois Dockes
567401233a
Building abstract/snippets from the doc text: process phrase/group terms
2018-01-03 15:28:46 +01:00
Jean-Francois Dockes
bb810f9ceb
Changed new param name storerawtext->storedoctext. + comments
2018-01-02 19:23:12 +01:00
Jean-Francois Dockes
b4493ed9e1
Snippets generation: add method for generating from doc stored text. Still needs refining, esp. for phrase/near
2017-12-30 08:43:14 +01:00