305 Commits

Author SHA1 Message Date
Jean-Francois Dockes
7ddbbb1ee8 search language: implemented filtering on file size 2012-03-07 17:08:22 +01:00
Jean-Francois Dockes
85166c93b2 Changed the way we handle document sizes. The fbytes field should now be in most cases the most "natural" document size. pcbytes holds the top external container size and dbytes the text size 2012-03-07 15:39:30 +01:00
Jean-Francois Dockes
7b5a891ee3 idx: make Doc parameter to addOrUpdate non const to avoid extra copy 2012-03-07 08:34:25 +01:00
Jean-Francois Dockes
25a99a3b38 add omega-compatible value slot for file size 2012-03-06 07:28:18 +01:00
Jean-Francois Dockes
6cdf9ae12b Accept and process relative/incomplete paths with the dir: directive (don't anchor path phrase if path does not start with /) 2012-02-24 19:25:55 +01:00
Jean-Francois Dockes
9bc2fc8958 Experimented with multithreading the indexing pipeline. Left undef'd as 15%-30% improvement of indexing time does not seem worth the complexity 2012-02-21 17:09:02 +01:00
Jean-Francois Dockes
516863b5d6 GUI: perform up to date check before previewing a subdoc. This is for example to avoid showing the wrong message if a mail folder has been compacted 2012-01-20 17:48:55 +01:00
Jean-Francois Dockes
036937e8bf added getmeta() method to Rcl::Doc and use in misc places 2012-01-20 14:48:50 +01:00
Jean-Francois Dockes
1931595637 GUI: added menu entry to show all the mime types actually indexed (by content) 2011-11-25 19:47:56 +01:00
Jean-Francois Dockes
607d3cc27b Add prefix translation for "mtype". Allows using term expansion to retrieve all the types from the index 2011-11-25 19:47:39 +01:00
Jean-Francois Dockes
8d52e928d1 increase slack for automatic phrases 2011-10-20 13:25:33 +02:00
Jean-Francois Dockes
0860b559ee get rid of a few garbage terms during indexing. Set a threshold for conversion errors after which we discard the doc. Stabilize the new termproc pipeline but no commongrams for now 2011-10-12 17:55:58 +02:00
Jean-Francois Dockes
4a7ff398b2 comments 2011-10-07 08:05:36 +02:00
Jean-Francois Dockes
5fd31172f5 New text to terms processing pipelines: results identical to 1.16 when used with empty stopfile 2011-10-07 07:53:49 +02:00
Jean-Francois Dockes
eda494153e simplify calls to isStop 2011-10-05 17:25:35 +02:00
Jean-Francois Dockes
acb297c9df comments + move the position jump to text_to_words 2011-10-04 16:33:44 +02:00
Jean-Francois Dockes
e4eba0de97 stoplist: use stringToStrings in place of splitter to support quoted space-containing entries 2011-10-04 16:04:28 +02:00
Jean-Francois Dockes
bb2685c2f5 Add frequency threshold to avoid adding common term to the automatic phrase search extension. Use autophrase by default with simple search, with a default freq threshold at 2% 2011-10-04 09:03:43 +02:00
Jean-Francois Dockes
4ced9bee49 add termDocCnt method 2011-10-04 08:04:17 +02:00
Jean-Francois Dockes
38e0957962 const string cleanup 2011-10-01 16:39:38 +02:00
Jean-Francois Dockes
702fb88a1e Search: remove restriction on empty queries by replacing empty query with Xapian::Query::Matchall. This allows querying all files of a given type, or under a given tree, without an actual text search part 2011-09-30 08:50:50 +02:00
Jean-Francois Dockes
383468e2fc bump doc create/update messages updates to loginfo so that indexing progress can be monitored with less noise 2011-09-30 08:47:39 +02:00
Jean-Francois Dockes
424e4173ba threading cleanup: add mutex protection around moronic change to transcode. Add mutex to equiv issue in unac. Rename const strings everywhere to cstr_xx to ease future detection of potentially problematic static variables. Most probably close issue #65 2011-09-28 15:01:14 +02:00
Jean-Francois Dockes
e0d211d602 none 2011-09-20 17:16:41 +02:00
Jean-Francois Dockes
ee0d602ab3 Implement anchored searches: terms to be found at a maximum distance of the start or end of the text 2011-09-20 16:42:56 +02:00
Jean-Francois Dockes
c5ff0cdf52 Control memory usage when deleting documents: use idxflushmb as when adding/updating 2011-09-07 19:11:11 +02:00
Jean-Francois Dockes
a380873029 suppress some sources of spurious ellipses in abstracts 2011-08-24 14:51:59 +02:00
Jean-Francois Dockes
d3fc258d85 avoid generating empty abstract field 2011-08-19 09:20:11 +02:00
Jean-Francois Dockes
ebbcc115a8 Allow setting a weight increase for field terms 2011-07-22 16:43:39 +02:00
Jean-Francois Dockes
48e86c99b5 GUI restable: fix sorting by file and doc size 2011-07-20 10:44:04 +02:00
Jean-Francois Dockes
469c544915 GUI: allow setting the snippet separator inside abstract (now a real html ellipsis by default) 2011-07-07 11:11:02 +02:00
Jean-Francois Dockes
b6c73ecdeb debug: improve consistency of log messages about up to date/processed files 2011-06-04 10:18:46 +02:00
Jean-Francois Dockes
91f277ec26 Search: allow setting weights on terms, ie: "important"2.5 2011-05-30 14:03:01 +02:00
Jean-Francois Dockes
ce9e9e4d00 query: support negative mime and catg clauses: -mime:text/plain 2011-05-15 09:29:24 +02:00
Jean-Francois Dockes
08a65f5cfc experiment with xapian spell support (not ready yet) + take care of some static init issues showing up on the mac 2011-05-10 10:15:15 +02:00
Jean-Francois Dockes
ce607032fa Fix a number of potential or actual static object initialization issues 2011-05-09 20:49:15 +02:00
Jean-Francois Dockes
32f4f7b6fc Fix a number of potential or actual static object initialization issues 2011-05-09 20:48:59 +02:00
Jean-Francois Dockes
84d59f18a0 GUI: when opening the index, discriminate errors on the main index from errors on external ones, to avoid starting the initial indexing dialog in the latter case 2011-04-29 16:16:04 +02:00
Jean-Francois Dockes
a4d1689581 try to be more responsive to user interrupts: do not build the aux databases after an interruption, and check for an interruption during the purge pass 2011-04-28 12:27:06 +02:00
Jean-Francois Dockes
55f124725f Fix problems that occurred when multiple threads were trying to read/convert files at the same time (ie: indexing and previewing threads in the GUI calling internfile()). Either get rid of or lock-protect all shared data, eliminate misc possible initialization conflicts by using static initializers. Hopefully closes issue #51 2011-04-28 10:58:33 +02:00
Jean-Francois Dockes
01f24fa5fd cleaning up static variables 2011-04-27 09:09:01 +02:00
Jean-Francois Dockes
b28eaf23fb Got rid of all the old RCS id strings 2011-04-27 08:22:17 +02:00
Jean-Francois Dockes
e883c4d04e Search: allow negative directory filtering (all except from dir). Emit more explicit errors for other unallowed negative search clauses. 2011-03-30 14:35:09 +02:00
Jean-Francois Dockes
ae6d758b34 GUI: display estimated result count in status line 2011-03-11 11:54:50 +01:00
Jean-Francois Dockes
963d7c50fd suppressed some overly repeated log messages 2011-03-11 11:49:54 +01:00
Jean-Francois Dockes
26929e9fb9 index: fixed the fix for path elts too long... 2011-02-14 20:30:26 +01:00
Jean-Francois Dockes
bf39719ac3 Indexing: need to truncate pathologically long path elements (would cause add_document error) 2011-02-13 10:07:25 +01:00
Jean-Francois Dockes
e8fcd35fef fix term highlighting for field searches 2011-01-28 15:47:58 +01:00
Jean-Francois Dockes
50238d5577 restable: highlight match terms 2011-01-28 12:28:27 +01:00
Jean-Francois Dockes
76edc0b290 missing stdio.h 2011-01-17 16:09:14 +01:00