623 Commits

Author SHA1 Message Date
Jean-Francois Dockes
4756b1252b Add recollindex option to write file not indexed reasons to diagnostics file 2021-04-01 10:32:04 +02:00
Jean-Francois Dockes
8da0bf28cc GUI: add popup menu option to copy simple file name. fix typo keytcfn->keyctfn. Change utf8check() parms for easier usage 2021-03-27 09:00:28 +01:00
Jean-Francois Dockes
f57530e2a6 Python module: add method to retrieve the full snippets list 2021-03-10 13:30:50 +01:00
freddii
89c7efe682 fixed typos 2021-02-04 17:12:22 +01:00
Jean-Francois Dockes
cd46ba62fc multiword synonyms: fix position wrong by 1, phrase prox to adjacent term failed 2021-01-15 15:42:09 +01:00
Jean-Francois Dockes
aa2f0bfd73 implemented multi-word terms indexing for phrase/prox search on multiword synonyms 2021-01-15 14:13:08 +01:00
Jean-Francois Dockes
3c7e3ccbc7 Add some override qualifiers on methods 2020-12-28 14:17:27 +01:00
Jean-Francois Dockes
3479e7cd85 rclquery: increase the slice size from 50 to 100 seems to generally improve perfs 2020-12-17 11:13:59 +01:00
Jean-Francois Dockes
19eac2d7dc Renamed path_open() -> path_streamopen() 2020-09-29 13:35:55 +02:00
Jean-Francois Dockes
c1ef2187d3 Fixed LOG calls obsolescence issues preventing build with staticverbosity 7 2020-09-06 14:59:00 +01:00
Jean-Francois Dockes
20e845709e recollq: added option -p to be used with -A for showing page-numbered snippets instead of abstract 2020-08-29 09:43:21 +02:00
Jean-Francois Dockes
d9c1a9648c Windows msvc: rename dirent.h->msvc_dirent.h. mh_text: fix mimeconf-win and warning 2020-08-15 10:12:36 +01:00
Jean-Francois Dockes
322e17081f GUI filename search: arrange for directories to be sorted first by default 2020-08-11 18:30:51 +02:00
Jean-Francois Dockes
13333e6512 use common method when concatenating multiple values for a metadata element. Use a comma as separator 2020-08-11 11:39:22 +02:00
Jean-Francois Dockes
09ad94f3b7 removed obsolete test mains Makefiles 2020-08-06 11:46:11 +02:00
Jean-Francois Dockes
3948f9bd33 GUI: create separate popup menu entries for open parent and open folder 2020-07-16 10:25:26 +02:00
Jean-Francois Dockes
02556e7d08 doc and comments 2020-06-25 16:06:45 +02:00
Jean-Francois Dockes
f15e3f21fa Windows: replace unlink() with unicode-capable path_unlink() 2020-06-02 10:56:55 +01:00
Jean-Francois Dockes
560041cab9 cleared out errant tabs 2020-05-30 15:54:49 +02:00
Jean-Francois Dockes
796db76fc6 When splitting to generate abstract from text, do not set ONLYSPANS, generate all terms. Seems to solve issues with the snippet generator not finding a match when the query term is a partial span 2020-05-30 12:37:14 +02:00
Jean-Francois Dockes
5f76c2527d GUI searching with saved query: restore external indexes from saved query 2020-05-19 14:20:21 +02:00
Jean-Francois Dockes
2f794be314 Fix Windows gcc build. Needs some def to get w7+ windows api 2020-04-25 11:41:37 +02:00
Jean-Francois Dockes
126ac47dba tabs and indents 2020-04-24 13:45:41 +02:00
Jean-Francois Dockes
8a29522ef8 Fix issues consequent to type change for searchdata m_minsize and m_maxsize members 2020-04-21 13:45:00 +01:00
Jean-Francois Dockes
39c152bada Fixed MSVC warnings, all innocuous 2020-04-17 14:26:40 +01:00
Jean-Francois Dockes
12ebb7ac6e Windows: deal with non-ASCII user login, non-ascii paths in confdir etc. 2020-04-15 14:03:04 +01:00
Jean-Francois Dockes
9565663f09 textsplit: create isNGRAMMED() method to replace isCJK() and let the latter actually return what it says 2020-04-14 09:27:26 +02:00
Jean-Francois Dockes
5dd8774b3c whitespace and indents only 2020-04-14 09:25:13 +02:00
Jean-Francois Dockes
6999284c42 indent and decls 2020-04-05 13:46:47 +01:00
Jean-Francois Dockes
afcacf63c0 Fix page handling in Korean splitter, bug would shift the byte positions, with bad consequences for snippets 2020-03-31 16:11:37 +02:00
Jean-Francois Dockes
b6cd22c320 rcldb: message log level change (docid beyond updated.size()) 2020-03-27 10:56:14 +01:00
Jean-Francois Dockes
414222c003 use conftree conversions 2019-12-02 09:37:34 +01:00
Jean-Francois Dockes
f42338c026 recollq: add option to obtain exact result count 2019-11-28 16:13:27 +01:00
Jean-Francois Dockes
1243c30980 rcldb_p needs to include log.h if threads disabled 2019-11-25 09:58:26 +01:00
Jean-Francois Dockes
b368e4276f do not include excluded terms in the highlight information data 2019-07-21 19:13:24 +02:00
Jean-Francois Dockes
6a405e2089 hldata: comments + map->unordered_map 2019-07-21 19:13:24 +02:00
Jean-Francois Dockes
8ed8d05aab cjk phrases: hopefully the right fix this time for slack computation. lastpos-termcount correction was applied twice 2019-07-21 19:13:24 +02:00
Jean-Francois Dockes
fae0621d76 hldata generation during query processing: increase slack if position increases faster than term count (cjk) 2019-07-21 19:13:24 +02:00
Jean-Francois Dockes
6cd2c9e2ca snippets: allow a little more contiguous expansion of current snippet 2019-07-21 19:13:24 +02:00
Jean-Francois Dockes
35ee3f7a13 Highlighting and snippets extraction: reworked to handle phrases properly. Use a compound position list instead of multiplying the OR groups inside a near clause 2019-07-21 19:09:51 +02:00
Jean-Francois Dockes
736051fcd6 GUI snippets window: add options for the max list length and for sorting the snippets by page number 2019-07-21 19:09:51 +02:00
Jean-Francois Dockes
3f7d270691 GUI preview: improve operation when the index data is not up to date. Avoid erasing all the file index data in case the subsequent update fails (e.g. the file is locked). Improve the messages. Check for previous indexing error, and modify the message. 2019-06-24 17:37:37 +02:00
Jean-Francois Dockes
ee8c5410bd Avoid purging existing subdocuments on file indexing error (e.g.: maybe a file lock issue that will go away) 2019-06-21 17:18:15 +02:00
Jean-Francois Dockes
be214c4a5a Take advantage of text storage when possible to display preview data for an inaccessible document 2019-06-16 11:49:18 +02:00
Jean-Francois Dockes
2a945c9443 abstract: we used to discard snippets too early, before they might get a phrase weight boost 2019-05-24 08:51:11 +02:00
Jean-Francois Dockes
81a91404a4 logs 2019-05-18 16:50:12 +02:00
Jean-Francois Dockes
8ddcc578ac Reverted 34d43d1188adfddb8fd8a4f7c7a28158a8b534f4 (Keep only the main Snippet-producing makeabstract in rclquery, further formatting done in using modules). This was just a bad idea. The common methods are also used by the python module 2019-05-17 10:19:03 +02:00
Jean-Francois Dockes
a5810508ed abstract: optimize the way we retrieve the wdfs by sorting the list of terms we query for. Big difference on very big docs 2019-05-17 09:39:26 +02:00
Jean-Francois Dockes
fdb14e60ac building abstract from stored text: limit count of terms explored to avoid taking forever on monster (multi mega-terms) documents 2019-05-17 09:37:39 +02:00
Jean-Francois Dockes
8428093f6a synfamily: indent/log formats/extracted test main. No real change 2019-05-16 15:31:41 +02:00