Jean-Francois Dockes
|
7ddbbb1ee8
|
search language: implemented filtering on file size
|
2012-03-07 17:08:22 +01:00 |
|
Jean-Francois Dockes
|
85166c93b2
|
Changed the way we handle document sizes. In most cases, the fbytes field should now hold the most "natural" document size; pcbytes holds the top external container size and dbytes the text size
|
2012-03-07 15:39:30 +01:00 |
|
Jean-Francois Dockes
|
7b5a891ee3
|
idx: make Doc parameter to addOrUpdate non const to avoid extra copy
|
2012-03-07 08:34:25 +01:00 |
|
Jean-Francois Dockes
|
25a99a3b38
|
add omega-compatible value slot for file size
|
2012-03-06 07:28:18 +01:00 |
|
Jean-Francois Dockes
|
6cdf9ae12b
|
Accept and process relative/incomplete paths with the dir: directive (don't anchor the path phrase if the path does not start with /)
|
2012-02-24 19:25:55 +01:00 |
|
Jean-Francois Dockes
|
9bc2fc8958
|
Experimented with multithreading the indexing pipeline. Left undef'd, as a 15%-30% improvement in indexing time does not seem worth the complexity
|
2012-02-21 17:09:02 +01:00 |
|
Jean-Francois Dockes
|
516863b5d6
|
GUI: perform an up-to-date check before previewing a subdoc. This avoids, for example, showing the wrong message if a mail folder has been compacted
|
2012-01-20 17:48:55 +01:00 |
|
Jean-Francois Dockes
|
036937e8bf
|
added getmeta() method to Rcl::Doc and use it in misc places
|
2012-01-20 14:48:50 +01:00 |
|
Jean-Francois Dockes
|
1931595637
|
GUI: added menu entry to show all the mime types actually indexed (by content)
|
2011-11-25 19:47:56 +01:00 |
|
Jean-Francois Dockes
|
607d3cc27b
|
Add prefix translation for "mtype". Allows using term expansion to retrieve all the types from the index
|
2011-11-25 19:47:39 +01:00 |
|
Jean-Francois Dockes
|
8d52e928d1
|
increase slack for automatic phrases
|
2011-10-20 13:25:33 +02:00 |
|
Jean-Francois Dockes
|
0860b559ee
|
get rid of a few garbage terms during indexing. Set a threshold for conversion errors after which we discard the doc. Stabilize the new termproc pipeline but no commongrams for now
|
2011-10-12 17:55:58 +02:00 |
|
Jean-Francois Dockes
|
4a7ff398b2
|
comments
|
2011-10-07 08:05:36 +02:00 |
|
Jean-Francois Dockes
|
5fd31172f5
|
New text to terms processing pipelines: results identical to 1.16 when used with empty stopfile
|
2011-10-07 07:53:49 +02:00 |
|
Jean-Francois Dockes
|
eda494153e
|
simplify calls to isStop
|
2011-10-05 17:25:35 +02:00 |
|
Jean-Francois Dockes
|
acb297c9df
|
comments + move the position jump to text_to_words
|
2011-10-04 16:33:44 +02:00 |
|
Jean-Francois Dockes
|
e4eba0de97
|
stoplist: use stringToStrings in place of splitter to support quoted space-containing entries
|
2011-10-04 16:04:28 +02:00 |
|
Jean-Francois Dockes
|
bb2685c2f5
|
Add frequency threshold to avoid adding common terms to the automatic phrase search extension. Use autophrase by default with simple search, with a default freq threshold of 2%
|
2011-10-04 09:03:43 +02:00 |
|
Jean-Francois Dockes
|
4ced9bee49
|
add termDocCnt method
|
2011-10-04 08:04:17 +02:00 |
|
Jean-Francois Dockes
|
38e0957962
|
const string cleanup
|
2011-10-01 16:39:38 +02:00 |
|
Jean-Francois Dockes
|
702fb88a1e
|
Search: remove restriction on empty queries by replacing empty query with Xapian::Query::Matchall. This allows querying all files of a given type, or under a given tree, without an actual text search part
|
2011-09-30 08:50:50 +02:00 |
|
Jean-Francois Dockes
|
383468e2fc
|
bump doc create/update messages to loginfo so that indexing progress can be monitored with less noise
|
2011-09-30 08:47:39 +02:00 |
|
Jean-Francois Dockes
|
424e4173ba
|
threading cleanup: add mutex protection around moronic change to transcode. Add mutex to equiv issue in unac. Rename const strings everywhere to cstr_xx to ease future detection of potentially problematic static variables. Most probably close issue #65
|
2011-09-28 15:01:14 +02:00 |
|
Jean-Francois Dockes
|
e0d211d602
|
none
|
2011-09-20 17:16:41 +02:00 |
|
Jean-Francois Dockes
|
ee0d602ab3
|
Implement anchored searches: terms to be found within a maximum distance from the start or end of the text
|
2011-09-20 16:42:56 +02:00 |
|
Jean-Francois Dockes
|
c5ff0cdf52
|
Control memory usage when deleting documents: use idxflushmb as when adding/updating
|
2011-09-07 19:11:11 +02:00 |
|
Jean-Francois Dockes
|
a380873029
|
suppress some sources of spurious ellipses in abstracts
|
2011-08-24 14:51:59 +02:00 |
|
Jean-Francois Dockes
|
d3fc258d85
|
avoid generating empty abstract field
|
2011-08-19 09:20:11 +02:00 |
|
Jean-Francois Dockes
|
ebbcc115a8
|
Allow setting a weight increase for field terms
|
2011-07-22 16:43:39 +02:00 |
|
Jean-Francois Dockes
|
48e86c99b5
|
GUI restable: fix sorting by file and doc size
|
2011-07-20 10:44:04 +02:00 |
|
Jean-Francois Dockes
|
469c544915
|
GUI: allow setting the snippet separator inside abstract (now a real html ellipsis by default)
|
2011-07-07 11:11:02 +02:00 |
|
Jean-Francois Dockes
|
b6c73ecdeb
|
debug: improve consistency of log messages about up to date/processed files
|
2011-06-04 10:18:46 +02:00 |
|
Jean-Francois Dockes
|
91f277ec26
|
Search: allow setting weights on terms, e.g.: "important"2.5
|
2011-05-30 14:03:01 +02:00 |
|
Jean-Francois Dockes
|
ce9e9e4d00
|
query: support negative mime and catg clauses: -mime:text/plain
|
2011-05-15 09:29:24 +02:00 |
|
Jean-Francois Dockes
|
08a65f5cfc
|
experiment with xapian spell support (not ready yet) + take care of some static init issues showing up on the mac
|
2011-05-10 10:15:15 +02:00 |
|
Jean-Francois Dockes
|
ce607032fa
|
Fix a number of potential or actual static object initialization issues
|
2011-05-09 20:49:15 +02:00 |
|
Jean-Francois Dockes
|
32f4f7b6fc
|
Fix a number of potential or actual static object initialization issues
|
2011-05-09 20:48:59 +02:00 |
|
Jean-Francois Dockes
|
84d59f18a0
|
GUI: when opening the index, discriminate errors on the main index from errors on external ones, to avoid starting the initial indexing dialog in the latter case
|
2011-04-29 16:16:04 +02:00 |
|
Jean-Francois Dockes
|
a4d1689581
|
try to be more responsive to user interrupts: do not build the aux databases after an interruption, and check for an interruption during the purge pass
|
2011-04-28 12:27:06 +02:00 |
|
Jean-Francois Dockes
|
55f124725f
|
Fix problems that occurred when multiple threads were trying to read/convert files at the same time (e.g. indexing and previewing threads in the GUI calling internfile()). Either get rid of or lock-protect all shared data, and eliminate misc possible initialization conflicts by using static initializers. Hopefully closes issue #51
|
2011-04-28 10:58:33 +02:00 |
|
Jean-Francois Dockes
|
01f24fa5fd
|
cleaning up static variables
|
2011-04-27 09:09:01 +02:00 |
|
Jean-Francois Dockes
|
b28eaf23fb
|
Got rid of all the old RCS id strings
|
2011-04-27 08:22:17 +02:00 |
|
Jean-Francois Dockes
|
e883c4d04e
|
Search: allow negative directory filtering (all except from dir). Emit more explicit errors for other unallowed negative search clauses.
|
2011-03-30 14:35:09 +02:00 |
|
Jean-Francois Dockes
|
ae6d758b34
|
GUI: display estimated result count in status line
|
2011-03-11 11:54:50 +01:00 |
|
Jean-Francois Dockes
|
963d7c50fd
|
suppressed some overly repeated log messages
|
2011-03-11 11:49:54 +01:00 |
|
Jean-Francois Dockes
|
26929e9fb9
|
index: fixed the fix for path elts too long...
|
2011-02-14 20:30:26 +01:00 |
|
Jean-Francois Dockes
|
bf39719ac3
|
Indexing: need to truncate pathologically long path elements (would cause add_document error)
|
2011-02-13 10:07:25 +01:00 |
|
Jean-Francois Dockes
|
e8fcd35fef
|
fix term highlighting for field searches
|
2011-01-28 15:47:58 +01:00 |
|
Jean-Francois Dockes
|
50238d5577
|
restable: highlight match terms
|
2011-01-28 12:28:27 +01:00 |
|
Jean-Francois Dockes
|
76edc0b290
|
missing stdio.h
|
2011-01-17 16:09:14 +01:00 |
|