589 Commits

Author SHA1 Message Date
Jean-Francois Dockes
b368e4276f do not include excluded terms in the highlight information data 2019-07-21 19:13:24 +02:00
Jean-Francois Dockes
6a405e2089 hldata: comments + map->unordered_map 2019-07-21 19:13:24 +02:00
Jean-Francois Dockes
8ed8d05aab cjk phrases: hopefully the right fix this time for slack computation. lastpos-termcount correction was applied twice 2019-07-21 19:13:24 +02:00
Jean-Francois Dockes
fae0621d76 hldata generation during query processing: increase slack if position increases faster than term count (cjk) 2019-07-21 19:13:24 +02:00
Jean-Francois Dockes
6cd2c9e2ca snippets: allow a little more contiguous expansion of current snippet 2019-07-21 19:13:24 +02:00
Jean-Francois Dockes
35ee3f7a13 Highlighting and snippets extraction: reworked to handle phrases properly. Use a compound position list instead of multiplying the OR groups inside a near clause 2019-07-21 19:09:51 +02:00
Jean-Francois Dockes
736051fcd6 GUI snippets window: add options for the max list length and for sorting the snippets by page number 2019-07-21 19:09:51 +02:00
Jean-Francois Dockes
3f7d270691 GUI preview: improve operation when the index data is not up to date.
Avoid erasing all the file index data in case the subsequent update fails
(e.g. the file is locked). Improve the messages. Check for previous
indexing error, and modify the message.
2019-06-24 17:37:37 +02:00
Jean-Francois Dockes
ee8c5410bd Avoid purging existing subdocuments on file indexing error (e.g.: maybe a file lock issue that will go away) 2019-06-21 17:18:15 +02:00
Jean-Francois Dockes
be214c4a5a Take advantage of text storage when possible to display preview data for an inaccessible document 2019-06-16 11:49:18 +02:00
Jean-Francois Dockes
2a945c9443 abstract: we used to discard snippets too early, before they might get a phrase weight boost 2019-05-24 08:51:11 +02:00
Jean-Francois Dockes
81a91404a4 logs 2019-05-18 16:50:12 +02:00
Jean-Francois Dockes
8ddcc578ac Reverted 34d43d1188adfddb8fd8a4f7c7a28158a8b534f4
Keep only the main Snippet-producing makeabstract in rclquery, further
  formatting done in using modules
This was just a bad idea. The common methods are also used by the python module
2019-05-17 10:19:03 +02:00
Jean-Francois Dockes
a5810508ed abstract: optimize the way we retrieve the wdfs by sorting the list of terms we query for. Big difference on very big docs 2019-05-17 09:39:26 +02:00
Jean-Francois Dockes
fdb14e60ac building abstract from stored text: limit count of terms explored to avoid taking forever on monster (multi mega-terms) documents 2019-05-17 09:37:39 +02:00
Jean-Francois Dockes
8428093f6a synfamily: indent/log formats/extracted test main. No real change 2019-05-16 15:31:41 +02:00
Jean-Francois Dockes
10a500aa1c comment 2019-05-16 15:29:52 +02:00
Jean-Francois Dockes
d4900584c8 comment 2019-05-16 15:29:32 +02:00
Jean-Francois Dockes
34d43d1188 Keep only the main Snippet-producing makeabstract in rclquery, further formatting done in using modules 2019-05-13 18:11:23 +02:00
Jean-Francois Dockes
ee5a260d54 Print xapian error when flush fails during purge 2019-05-02 10:36:24 +02:00
Jean-Francois Dockes
54f0eda990 make doc.meta an unordered_map 2019-04-20 15:04:19 +02:00
Jean-Francois Dockes
34bb62a8d9 got rid of a few unused variable warnings 2019-04-11 15:31:27 +02:00
Jean-Francois Dockes
b914e1d5b9 Fix abstract building when additional indexes are built: the raw text must be fetched by get_metadata() from its own index, not the combined one 2019-03-20 10:57:05 +01:00
Jean-Francois Dockes
0cbc46732f Fixed the FSF address 2019-03-04 11:19:14 +01:00
Jean-Francois Dockes
037aa07bfd Suppress compiler warning about a possibly truncated snprintf (no real problem) by increasing a buffer size 2019-03-04 10:58:49 +01:00
Jean-Francois Dockes
b69912bfab Fix crash during abstract generation, occurring when no matching fragments are found 2019-02-19 19:02:23 +01:00
Jean-Francois Dockes
9574030edc No need for boosting the original term if there was no expansion 2019-02-14 14:54:17 +01:00
Jean-Francois Dockes
b079f0fb94 adjust log message levels and fix a warning 2019-02-04 11:42:35 +01:00
Jean-Francois Dockes
399c633efd Avoid purging documents from absent mountable volumes 2019-02-03 18:51:52 +01:00
Jean-Francois Dockes
a16e39a92b improve readability by fixing LOG statements and using auto and range-fors 2019-02-03 12:31:42 +01:00
Jean-Francois Dockes
04f3449f99 Avoid multiple expansion of xapian term iterator 2019-02-01 09:07:28 +01:00
Jean-Francois Dockes
2909eec062 get rid of redundant rclversion.h 2019-01-30 12:40:55 +01:00
Jean-Francois Dockes
dcd517bcf2 Fix -z always resetting index to non-text-storing independently of configuration 2019-01-29 20:11:43 +01:00
Jean-Francois Dockes
b62478c0cc Indent + comments + use c++11 loops 2018-11-14 10:38:22 +01:00
Jean-Francois Dockes
ea999ed6e5 Indent + comments + use c++11 initializers 2018-11-14 10:30:31 +01:00
Jean-Francois Dockes
036e1da6b4 rcldb: change flush log message level to INF 2018-11-14 09:42:22 +01:00
Jean-Francois Dockes
3ff88a9541 suppress query time updated map spurious error message 2018-10-07 09:07:51 +02:00
Jean-Francois Dockes
8358742132 get things to build on centos7.5 (cosmetic changes) 2018-09-02 18:47:03 +02:00
Jean-Francois Dockes
6441eea8aa Store the origin dbdir inside the GUI doc history, so we can later fetch documents from external indexes 2018-05-31 15:01:17 +02:00
Jean-Francois Dockes
ea3bd23d7c Fixed namespace decls issues 2018-04-18 09:34:58 +02:00
Jean-Francois Dockes
84abb8ac04 Fix regex used for cleaning up snippets 2018-04-12 12:25:05 +02:00
Jean-Francois Dockes
e4e5ee35d6 cleanup repeated punctuation in snippets 2018-04-10 13:07:27 +02:00
Jean-Francois Dockes
3168ba1082 log message 2018-04-10 10:31:24 +02:00
Jean-Francois Dockes
21adaca229 Add parameter to truncate all document text to specified length 2018-04-08 10:54:09 +02:00
Jean-Francois Dockes
7b83438e9c fix the ifdef condition for not trying to create a stretch db 2018-02-05 15:30:34 +01:00
Jean-Francois Dockes
cecd1b4ba7 Merge 1.23 Windows changes intended to improve the index rebuild failures caused by open files 2018-01-25 15:34:27 +01:00
Jean-Francois Dockes
3d4fd3c62e When storing doc text, always use a metadata entry. Get rid of the code to
store it in the data record. Make storing the default.  Add "fetchtext"
parameter to getDoc() to fetch and store the text in doc.text. Make this
accessible from Python. Misc comments and indents.
2018-01-25 13:20:02 +01:00
Jean-Francois Dockes
8b60cffa65 ranges: lowercase as needed when indexing 2018-01-24 15:58:50 +01:00
Jean-Francois Dockes
595e419d93 Implemented range queries, based on storing fields in xapian values 2018-01-24 09:43:20 +01:00
Jean-Francois Dockes
fd32872218 Improve 'rebuild index' under Windows: this often failed because of some
open files in the Xapian db (could not be deleted under Windows).
Now only fails if a preview has been opened, and a GUI restart fixes the
situation.
2018-01-20 11:59:00 +01:00