233 Commits

Author SHA1 Message Date
Jean-Francois Dockes
2d6e11c0aa simplified field config a bit by moving some hard coded values from the c++ to the fields file 2012-08-28 14:44:53 +02:00
Jean-Francois Dockes
776800f47a arrange to create all stem dicts in one pass 2012-08-28 13:39:34 +02:00
Jean-Francois Dockes
fc8b458222 create class StemDb as derived class from XapSynFamily 2012-08-27 15:38:08 +02:00
Jean-Francois Dockes
bd0f002c1a Reimplemented the stem expansion mechanism over Xapian synonyms feature 2012-08-25 11:12:36 +02:00
Jean-Francois Dockes
0ebfc496d8 add capability to remember page breaks generated by, e.g. pdftotext, and use them to start an external viewer on a match page 2012-08-21 15:03:02 +02:00
Jean-Francois Dockes
baf450e75a rcldb fix crash caused by 5c8d237c639d in case there is only one index 2012-05-04 11:54:07 +02:00
Jean-Francois Dockes
73a3106a6d GUI: only do the result up to date check before preview for the main index (we can't update the others anyway) 2012-05-04 09:52:14 +02:00
Jean-Francois Dockes
8b34610dde Cleaned up file name handling. Fixes that file names were sometimes indexed split, sometimes not. They now always are both, with different prefixes. Forces reindex 2012-04-13 09:18:08 +02:00
Jean-Francois Dockes
4eaf12fb9c more delistification 2012-04-12 08:15:50 +02:00
Jean-Francois Dockes
ec7b40a52e cosmetics: list -> vector in more places 2012-04-11 19:58:08 +02:00
Jean-Francois Dockes
c7c9c49437 add -Z "in place reset" option to recollindex 2012-04-11 11:33:33 +02:00
Jean-Francois Dockes
07813ab6ba Don't store filename in empty title at index time, to keep choice at display time. Define %t as title in addition to %T as title or filename 2012-03-10 14:45:40 +01:00
Jean-Francois Dockes
7ddbbb1ee8 search language: implemented filtering on file size 2012-03-07 17:08:22 +01:00
Jean-Francois Dockes
85166c93b2 Changed the way we handle document sizes. The fbytes field should now be in most cases the most "natural" document size. pcbytes holds the top external container size and dbytes the text size 2012-03-07 15:39:30 +01:00
Jean-Francois Dockes
7b5a891ee3 idx: make Doc parameter to addOrUpdate non const to avoid extra copy 2012-03-07 08:34:25 +01:00
Jean-Francois Dockes
9bc2fc8958 Experimented with multithreading the indexing pipeline. Left undef'd as 15%-30% improvement of indexing time does not seem worth the complexity 2012-02-21 17:09:02 +01:00
Jean-Francois Dockes
516863b5d6 GUI: perform up to date check before previewing a subdoc. This is for example to avoid showing the wrong message if a mail folder has been compacted 2012-01-20 17:48:55 +01:00
Jean-Francois Dockes
607d3cc27b Add prefix translation for "mtype". Allows using term expansion to retrieve all the types from the index 2011-11-25 19:47:39 +01:00
Jean-Francois Dockes
0860b559ee get rid of a few garbage terms during indexing. Set a threshold for conversion errors after which we discard the doc. Stabilize the new termproc pipeline but no commongrams for now 2011-10-12 17:55:58 +02:00
Jean-Francois Dockes
5fd31172f5 New text to terms processing pipelines: results identical to 1.16 when used with empty stopfile 2011-10-07 07:53:49 +02:00
Jean-Francois Dockes
eda494153e simplify calls to isStop 2011-10-05 17:25:35 +02:00
Jean-Francois Dockes
acb297c9df comments + move the position jump to text_to_words 2011-10-04 16:33:44 +02:00
Jean-Francois Dockes
4ced9bee49 add termDocCnt method 2011-10-04 08:04:17 +02:00
Jean-Francois Dockes
38e0957962 const string cleanup 2011-10-01 16:39:38 +02:00
Jean-Francois Dockes
383468e2fc bump doc create/update messages updates to loginfo so that indexing progress can be monitored with less noise 2011-09-30 08:47:39 +02:00
Jean-Francois Dockes
424e4173ba threading cleanup: add mutex protection around moronic change to transcode. Add mutex to equiv issue in unac. Rename const strings everywhere to cstr_xx to ease future detection of potentially problematic static variables. Most probably close issue #65 2011-09-28 15:01:14 +02:00
Jean-Francois Dockes
e0d211d602 none 2011-09-20 17:16:41 +02:00
Jean-Francois Dockes
ee0d602ab3 Implement anchored searches: terms to be found at a maximum distance of the start or end of the text 2011-09-20 16:42:56 +02:00
Jean-Francois Dockes
c5ff0cdf52 Control memory usage when deleting documents: use idxflushmb as when adding/updating 2011-09-07 19:11:11 +02:00
Jean-Francois Dockes
a380873029 suppress some sources of spurious ellipses in abstracts 2011-08-24 14:51:59 +02:00
Jean-Francois Dockes
d3fc258d85 avoid generating empty abstract field 2011-08-19 09:20:11 +02:00
Jean-Francois Dockes
ebbcc115a8 Allow setting a weight increase for field terms 2011-07-22 16:43:39 +02:00
Jean-Francois Dockes
48e86c99b5 GUI restable: fix sorting by file and doc size 2011-07-20 10:44:04 +02:00
Jean-Francois Dockes
469c544915 GUI: allow setting the snippet separator inside abstract (now a real html ellipsis by default) 2011-07-07 11:11:02 +02:00
Jean-Francois Dockes
b6c73ecdeb debug: improve consistency of log messages about up to date/processed files 2011-06-04 10:18:46 +02:00
Jean-Francois Dockes
08a65f5cfc experiment with xapian spell support (not ready yet) + take care of some static init issues showing up on the mac 2011-05-10 10:15:15 +02:00
Jean-Francois Dockes
84d59f18a0 GUI: when opening the index, discriminate errors on the main index from errors on external ones, to avoid starting the initial indexing dialog in the latter case 2011-04-29 16:16:04 +02:00
Jean-Francois Dockes
a4d1689581 try to be more responsive to user interrupts: do not build the aux databases after an interruption, and check for an interruption during the purge pass 2011-04-28 12:27:06 +02:00
Jean-Francois Dockes
55f124725f Fix problems that occurred when multiple threads were trying to read/convert files at the same time (i.e.: indexing and previewing threads in the GUI calling internfile()). Either get rid of or lock-protect all shared data, eliminate misc initialization possible conflicts by using static initializers. Hopefully closes issue #51 2011-04-28 10:58:33 +02:00
Jean-Francois Dockes
01f24fa5fd cleaning up static variables 2011-04-27 09:09:01 +02:00
Jean-Francois Dockes
b28eaf23fb Got rid of all the old RCS id strings 2011-04-27 08:22:17 +02:00
Jean-Francois Dockes
963d7c50fd suppressed some overly repeated log messages 2011-03-11 11:49:54 +01:00
Jean-Francois Dockes
26929e9fb9 index: fixed the fix for path elts too long... 2011-02-14 20:30:26 +01:00
Jean-Francois Dockes
bf39719ac3 Indexing: need to truncate pathologically long path elements (would cause add_document error) 2011-02-13 10:07:25 +01:00
Jean-Francois Dockes
93fb51d59b query: add duplication indicator to relevancy rating 2011-01-17 16:04:07 +01:00
Jean-Francois Dockes
85b36d3c34 filename search fields: generate an AND of OR lists out of wildcard expansion instead of a global OR which did not make much sense 2011-01-13 11:47:35 +01:00
Jean-Francois Dockes
0a6063542f Gui: misc event/signals cleanups. No functional changes 2010-12-22 18:07:18 +01:00
Jean-Francois Dockes
45c08165f5 log message format 2010-12-21 10:34:02 +01:00
Jean-Francois Dockes
c79410da94 Move sort/filtering code out of reslist 2010-12-18 15:45:12 +01:00
Jean-Francois Dockes
61348a7731 GUI: got rid of the sort parameters dialog and sort by mime type, replaced by 2 arrows in toolbar for sorting by date, ascending or descending 2010-12-17 13:18:13 +01:00