77 Commits

Author SHA1 Message Date
Jean-Francois Dockes
39c152bada Fixed MSVC warnings, all inocuous 2020-04-17 14:26:40 +01:00
Jean-Francois Dockes
0cbc46732f Fixed the FSF address 2019-03-04 11:19:14 +01:00
Jean-Francois Dockes
b4dfa40cbf mh_mail: use rfc2047 on additional headers requested by config. comments and small cleanups 2018-11-22 17:44:33 +01:00
Jean-Francois Dockes
9a9ce69647 comments and indent 2018-11-22 14:41:06 +01:00
Jean-Francois Dockes
218b3fbfe2 Fix clear() call super antipattern in handlers 2018-11-14 15:29:07 +01:00
Jean-Francois Dockes
7b8ba96b25 md5 for text/plain attachments was not computed, stayed same as parent so they were not shown if hide duplicates option was active in the GUI 2018-07-07 09:17:46 +02:00
Jean-Francois Dockes
f83490a5ee When indexing arbitrary email headers: sanitize the data to utf-8 to avoid later splitter errors 2017-10-20 17:49:30 +02:00
Jean-Francois Dockes
aa56a3540e mail: must not reset the configured list of additional headers for each message ! 2017-10-18 15:21:43 +02:00
Jean-Francois Dockes
9d95de032d mail message: multipart/alternative: avoid choosing the text/plain part if it is empty (yes it happens...) 2017-03-26 17:39:49 +02:00
Jean-Francois Dockes
b55f4b3b0a add nomd5types parameter to set file types for which dedup is not that useful and computation is expensive (e.g. audio files). Replace "call parent" misfeature with call to virtual in MimeHandler constructor. Fix log calls indent 2017-02-02 18:09:00 +01:00
Jean-Francois Dockes
f6a999de84 logging now uses c++ streams 2016-07-12 09:41:04 +02:00
Jean-Francois Dockes
75517f7497 recollindex builds. Still need to implement quite a lot of ifndefed stuff (pathut, rclconfig)
--HG--
branch : WINDOWSPORT
2015-08-30 15:30:50 +02:00
Jean-Francois Dockes
d4cd1dd91c 1st mods to get a build under windows. Does not build yet, far from it
--HG--
branch : WINDOWSPORT
2015-08-30 11:19:18 +02:00
Jean-Francois Dockes
3cceffdb9c Use O_NOATIME to avoid disturbing st_atime when possible. Closes issue #230 2015-03-25 13:49:33 +01:00
Jean-Francois Dockes
9ba0b3e8bc Replaced RSA md5 code with public domain OpenBSD/debian dpkg version 2015-03-01 14:28:01 +01:00
Jean-Francois Dockes
884234784d use content_type "name" attribute as attachment file name if there is no content_disposition "filename" attribute 2013-04-28 09:41:03 +02:00
Jean-Francois Dockes
a7728ceb91 changed the mime handler cache key (was the mime type), to avoid having multiple copies of the same filter when applied to different mime types. This reduces a lot the number of processes during indexing, with no impact on performance 2013-04-25 18:18:48 +02:00
Jean-Francois Dockes
2b80c77c23 Add possibility to display a list of sub-documents for a given result 2013-04-24 16:33:53 +02:00
Jean-Francois Dockes
860521be88 internfile: do not compute md5 when in preview mode 2013-04-09 12:40:46 +02:00
Jean-Francois Dockes
0ae8ec99f6 more utf-8 err checking prevents bogus terms in index 2013-03-30 10:24:10 +01:00
Jean-Francois Dockes
84b561b040 For plain text files, try alternate decode from 8bit charset when decode from UTF-8 fails 2012-10-06 15:12:49 +02:00
Jean-Francois Dockes
2fc294a9c6 factored out common charset handling code in exec and execm, cleaned up charset and textplain handling in mh_mail 2012-10-06 12:14:04 +02:00
Jean-Francois Dockes
29fe1e4927 implemented maxmemberkb limit for multidoc (e.g. archive) members 2012-10-06 09:05:35 +02:00
Jean-Francois Dockes
61a2e28a7c Absurd input source global variable in Binc imap caused the indexer to crash when an email message contained attachments which were disguised messages (ie: x-mimehtml), because this would cause a recursive call into Binc with a different data source (ie: string instead of original fd, clobbering the original source 2012-05-24 14:52:41 +02:00
Jean-Francois Dockes
3accce0b22 index: added sanity checks to mail handler 2012-05-16 12:25:44 +02:00
Jean-Francois Dockes
ec7b40a52e cosmetics: list -> vector in more places 2012-04-11 19:58:08 +02:00
Jean-Francois Dockes
80fb2f553c MIME handling: treat content-type=="text" as "text/plain". Needed for some very old messages 2012-03-18 08:26:44 +01:00
Jean-Francois Dockes
638d468796 clarified the use of string keys inside the Filter metaData array 2012-03-07 10:13:46 +01:00
Jean-Francois Dockes
a5af2b93bd "md5"->cstr_md5 2012-02-25 10:41:27 +01:00
Jean-Francois Dockes
f544b28b4a Transcode mh_execm text/plain output like we do for mh_exec. Adjust handling of transcoding errors. These changes should fix most cases of non-utf8 text making it to unac/index 2011-10-20 14:00:38 +02:00
Jean-Francois Dockes
38e0957962 const string cleanup 2011-10-01 16:39:38 +02:00
Jean-Francois Dockes
5292a97de3 mail handler: remove header names when indexing to avoid articially increasing the frequency of ie, the "subject" term 2011-06-27 18:38:44 +02:00
Jean-Francois Dockes
b28eaf23fb Got rid of all the old RCS id strings 2011-04-27 08:22:17 +02:00
Jean-Francois Dockes
e1a20aa810 got rid of accesses to global config through getMainConfig() 2011-03-02 13:47:07 +01:00
Jean-Francois Dockes
061ffda545 checked/changed all sprintf calls 2010-11-15 11:57:39 +01:00
Jean-Francois Dockes
e6d5f72886 added the possibility to extract arbitrary mail headers and use them as document fields. This forced an incompatible change in the format of the [stored] section inside the "fields" config file 2010-07-06 17:16:36 +02:00
dockes
c78a3bb567 add cnf(maildefcharset) to set specific mail default charset (mainly for readpst extracts which are utf-8 but have no charset set) 2009-11-27 13:23:13 +00:00
dockes
dd6acb07cc mh_mail: use truncate_to_word to avoid cutting an utf8 char. rcldb: logdeb text_to_word errors 2009-11-18 10:26:47 +00:00
dockes
7d18c22142 reason msg 2009-11-16 16:10:31 +00:00
dockes
daae416d98 extract msgid + generate abstract at start of txt, excluding headers 2009-10-31 09:00:31 +00:00
dockes
229645a0e2 added optional extended file attributes support 2009-01-21 13:55:12 +00:00
dockes
f57d4a91f9 compute md5 checksums for all docs and optionally collapse duplicates in results 2009-01-09 14:56:36 +00:00
dockes
9082f3bf65 allow specifying format and charset for ext filters. Cache and reuse filters 2008-10-04 14:26:59 +00:00
dockes
5cc1de9aad emit field for recipients 2008-09-16 08:13:45 +00:00
dockes
022e0e5f43 suppressed a few wasteful string-cstr conversions 2008-07-01 11:51:51 +00:00
dockes
0460f1016c mh_mail now uses mimetype() to try and better identify application/octet-stream 2008-07-01 10:29:45 +00:00
dockes
46a7f05cbc gcc 4 compat, thanks to Kartik Mistry 2007-12-13 06:58:22 +00:00
dockes
02475fba71 text/plain attachments were not transcoded to utf-8 2007-10-17 11:40:35 +00:00
dockes
1d683ad411 added field/prefixes for author and title + command line query language 2007-01-17 13:53:41 +00:00
dockes
094e465252 handle multipart/signed 2007-01-13 10:28:37 +00:00