243 Commits

Author SHA1 Message Date
Jean-Francois Dockes
29fe1e4927 implemented maxmemberkb limit for multidoc (e.g. archive) members 2012-10-06 09:05:35 +02:00
Jean-Francois Dockes
1329265b7b check for empty file name in internfile, else gets stuck later because empty fn is interpreted as read stdin in md5 2012-10-05 16:42:13 +02:00
Jean-Francois Dockes
d942b44785 mbox: implement member size limit of 100MB and autodetect thunderbird mboxes (look for .msf) 2012-10-04 17:00:50 +02:00
Jean-Francois Dockes
e0bc65bfdd small mods innocuous or auxiliary to case/diac sensitivity but which can live in main branch 2012-09-13 12:25:01 +02:00
Jean-Francois Dockes
ec3dbb4092 comments 2012-08-21 08:38:23 +02:00
Jean-Francois Dockes
2870274f80 slightly simplified temp file handling 2012-08-21 08:35:39 +02:00
Jean-Francois Dockes
254a7dc972 comment 2012-06-05 14:14:02 +02:00
Jean-Francois Dockes
643f4d56bb internals: virtualized the doc fetcher interface 2012-06-05 07:16:11 +02:00
Jean-Francois Dockes
61a2e28a7c Absurd input source global variable in Binc imap caused the indexer to crash when an email message contained attachments which were disguised messages (ie: x-mimehtml), because this would cause a recursive call into Binc with a different data source (ie: string instead of original fd), clobbering the original source 2012-05-24 14:52:41 +02:00
Jean-Francois Dockes
3accce0b22 index: added sanity checks to mail handler 2012-05-16 12:25:44 +02:00
Jean-Francois Dockes
0333d83d2e html: small additional cleanup after previous <body> processing modification 2012-05-16 10:13:53 +02:00
Jean-Francois Dockes
e6191b51a8 Html: Just ignore opening and closing <body> and <html> tags. Current browsers show text before or after the body and ignore multiple body tags. Not pushed to 1.17 maint because of possible disruption. Closes issue #92 2012-05-16 10:07:09 +02:00
Jean-Francois Dockes
8b34610dde Cleaned up file name handling. Fixes that file names were sometimes indexed split, sometimes not. They now always are both, with different prefixes. Forces reindex 2012-04-13 09:18:08 +02:00
Jean-Francois Dockes
ec7b40a52e cosmetics: list -> vector in more places 2012-04-11 19:58:08 +02:00
Jean-Francois Dockes
78bd8d63da use vector instead of list for execmd arg list 2012-04-11 15:36:49 +02:00
Jean-Francois Dockes
9f402d33cb got rid of unused csguess module 2012-04-06 15:14:01 +02:00
Jean-Francois Dockes
80fb2f553c MIME handling: treat content-type=="text" as "text/plain". Needed for some very old messages 2012-03-18 08:26:44 +01:00
Jean-Francois Dockes
0050f96f57 fix test driver 2012-03-18 08:23:33 +01:00
Jean-Francois Dockes
85166c93b2 Changed the way we handle document sizes. The fbytes field should now be in most cases the most "natural" document size. pcbytes holds the top external container size and dbytes the text size 2012-03-07 15:39:30 +01:00
Jean-Francois Dockes
638d468796 clarified the use of string keys inside the Filter metaData array 2012-03-07 10:13:46 +01:00
Jean-Francois Dockes
a5af2b93bd "md5"->cstr_md5 2012-02-25 10:41:27 +01:00
Jean-Francois Dockes
ec87379015 html: handle the html5 charset meta tag 2012-01-26 19:27:58 +01:00
Jean-Francois Dockes
0d8a61ced9 log message 2012-01-26 19:26:54 +01:00
Jean-Francois Dockes
639a434dce comments 2012-01-26 18:17:37 +01:00
Jean-Francois Dockes
eed31f9ef1 html index: throw an exception after parsing in all cases so that the same code path is always used. The previous approach sometimes resulted in a bad charset used for preview 2012-01-25 17:33:41 +01:00
Jean-Francois Dockes
516863b5d6 GUI: perform up to date check before previewing a subdoc. This is for example to avoid showing the wrong message if a mail folder has been compacted 2012-01-20 17:48:55 +01:00
Jean-Francois Dockes
036937e8bf added getmeta() method to Rcl::Doc and use in misc places 2012-01-20 14:48:50 +01:00
Jean-Francois Dockes
1931595637 GUI: added menu entry to show all the mime types actually indexed (by content) 2011-11-25 19:47:56 +01:00
Jean-Francois Dockes
49554e42c2 Factorized common text transcoding code in separate module 2011-10-20 17:53:42 +02:00
Jean-Francois Dockes
f544b28b4a Transcode mh_execm text/plain output like we do for mh_exec. Adjust handling of transcoding errors. These changes should fix most cases of non-utf8 text making it to unac/index 2011-10-20 14:00:38 +02:00
Jean-Francois Dockes
38e0957962 const string cleanup 2011-10-01 16:39:38 +02:00
Jean-Francois Dockes
487b623faf log 2011-10-01 09:31:38 +02:00
Jean-Francois Dockes
424e4173ba threading cleanup: add mutex protection around moronic change to transcode. Add mutex to equiv issue in unac. Rename const strings everywhere to cstr_xx to ease future detection of potentially problematic static variables. Most probably close issue #65 2011-09-28 15:01:14 +02:00
Jean-Francois Dockes
802ebc7704 comments 2011-08-21 13:29:06 +02:00
Jean-Francois Dockes
9cefcb7283 Simple optimization makes mh_mbox 3x faster 2011-08-20 14:54:29 +02:00
Jean-Francois Dockes
6b04fe7f2c The record for an attachment for which conversion failed (ie: image without exiftool) would erase the message's record because its ipath was not updated 2011-07-16 11:53:54 +02:00
Jean-Francois Dockes
88685d2e64 search/index: fixed a number of bad conversions to properly deal with text documents bigger than 2GB 2011-07-12 08:28:09 -07:00
Jean-Francois Dockes
5292a97de3 mail handler: remove header names when indexing to avoid artificially increasing the frequency of ie, the "subject" term 2011-06-27 18:38:44 +02:00
Jean-Francois Dockes
c7a241d26e htmlparse: merged some updates from xapian 1.2.6 2011-06-24 10:41:54 +02:00
Jean-Francois Dockes
67ad817e52 internfile: revert 2314:17098b627784 which was unneeded and wrong 2011-06-22 17:49:51 +02:00
Jean-Francois Dockes
ce44c0a875 preview: use the index idea of the mime type after decompression instead of re-running mimetype(). This will fix preview for compressed man pages (which were identified as text/troff after decomp because not under man/) 2011-06-22 16:09:55 +02:00
Jean-Francois Dockes
ba5e0c41b4 index: fixed the way we process some mime type aliases, which resulted in accumulating handlers in the handler cache 2011-06-21 19:18:55 +02:00
Jean-Francois Dockes
631121e24e internfile: keep around temp file for possible caller use 2011-05-09 07:00:34 +02:00
Jean-Francois Dockes
c45cdd7561 common data locking: remove deadlock in mbox cache locking 2011-04-28 14:28:19 +02:00
Jean-Francois Dockes
55f124725f Fix problems that occurred when multiple threads were trying to read/convert files at the same time (ie: indexing and previewing threads in the GUI calling internfile()). Either get rid of or lock-protect all shared data, eliminate misc initialization possible conflicts by using static initializers. Hopefully closes issue #51 2011-04-28 10:58:33 +02:00
Jean-Francois Dockes
b28eaf23fb Got rid of all the old RCS id strings 2011-04-27 08:22:17 +02:00
Jean-Francois Dockes
2d8e57ee4f Gui preview, internfile: handle case where target doc of a compound ipath still needs further translation (is not text or html) 2011-04-26 08:26:09 +02:00
Jean-Francois Dockes
f4c1c3678d indexing: an error on an archive member could crash or block the indexing because of the unclean way the ipath was passed in/out of internfile(). Closes issue #55 2011-04-25 16:41:43 +02:00
Jean-Francois Dockes
52fda2a075 GUI: lock handler cache against multiple thread access 2011-04-24 08:47:27 +02:00
Jean-Francois Dockes
7eb182f53c index: escape colon characters inside ipaths. This could potentially happen with the zip (ie: zipped maildir) and chm filters 2011-03-12 12:03:39 +01:00