Jean-Francois Dockes
|
75517f7497
|
recollindex builds. Still need to implement quite a lot of ifndefed stuff (pathut, rclconfig)
--HG--
branch : WINDOWSPORT
|
2015-08-30 15:30:50 +02:00 |
|
Jean-Francois Dockes
|
d4cd1dd91c
|
1st mods to get a build under windows. Does not build yet, far from it
--HG--
branch : WINDOWSPORT
|
2015-08-30 11:19:18 +02:00 |
|
Jean-Francois Dockes
|
3cceffdb9c
|
Use O_NOATIME to avoid disturbing st_atime when possible. Closes issue #230
|
2015-03-25 13:49:33 +01:00 |
|
Jean-Francois Dockes
|
9ba0b3e8bc
|
Replaced RSA md5 code with public domain OpenBSD/debian dpkg version
|
2015-03-01 14:28:01 +01:00 |
|
Jean-Francois Dockes
|
884234784d
|
use content_type "name" attribute as attachment file name if there is no content_disposition "filename" attribute
|
2013-04-28 09:41:03 +02:00 |
|
Jean-Francois Dockes
|
a7728ceb91
|
changed the mime handler cache key (was the mime type), to avoid having multiple copies of the same filter when applied to different mime types. This greatly reduces the number of processes during indexing, with no impact on performance
|
2013-04-25 18:18:48 +02:00 |
|
Jean-Francois Dockes
|
2b80c77c23
|
Add possibility to display a list of sub-documents for a given result
|
2013-04-24 16:33:53 +02:00 |
|
Jean-Francois Dockes
|
860521be88
|
internfile: do not compute md5 when in preview mode
|
2013-04-09 12:40:46 +02:00 |
|
Jean-Francois Dockes
|
0ae8ec99f6
|
more utf-8 err checking prevents bogus terms in index
|
2013-03-30 10:24:10 +01:00 |
|
Jean-Francois Dockes
|
84b561b040
|
For plain text files, try alternate decode from 8bit charset when decode from UTF-8 fails
|
2012-10-06 15:12:49 +02:00 |
|
Jean-Francois Dockes
|
2fc294a9c6
|
factored out common charset handling code in exec and execm, cleaned up charset and textplain handling in mh_mail
|
2012-10-06 12:14:04 +02:00 |
|
Jean-Francois Dockes
|
29fe1e4927
|
implemented maxmemberkb limit for multidoc (e.g. archive) members
|
2012-10-06 09:05:35 +02:00 |
|
Jean-Francois Dockes
|
61a2e28a7c
|
Absurd input source global variable in Binc imap caused the indexer to crash when an email message contained attachments which were disguised messages (e.g. x-mimehtml), because this would cause a recursive call into Binc with a different data source (e.g. a string instead of the original fd), clobbering the original source
|
2012-05-24 14:52:41 +02:00 |
|
Jean-Francois Dockes
|
3accce0b22
|
index: added sanity checks to mail handler
|
2012-05-16 12:25:44 +02:00 |
|
Jean-Francois Dockes
|
ec7b40a52e
|
cosmetics: list -> vector in more places
|
2012-04-11 19:58:08 +02:00 |
|
Jean-Francois Dockes
|
80fb2f553c
|
MIME handling: treat content-type=="text" as "text/plain". Needed for some very old messages
|
2012-03-18 08:26:44 +01:00 |
|
Jean-Francois Dockes
|
638d468796
|
clarified the use of string keys inside the Filter metaData array
|
2012-03-07 10:13:46 +01:00 |
|
Jean-Francois Dockes
|
a5af2b93bd
|
"md5"->cstr_md5
|
2012-02-25 10:41:27 +01:00 |
|
Jean-Francois Dockes
|
f544b28b4a
|
Transcode mh_execm text/plain output like we do for mh_exec. Adjust handling of transcoding errors. These changes should fix most cases of non-utf8 text making it to unac/index
|
2011-10-20 14:00:38 +02:00 |
|
Jean-Francois Dockes
|
38e0957962
|
const string cleanup
|
2011-10-01 16:39:38 +02:00 |
|
Jean-Francois Dockes
|
5292a97de3
|
mail handler: remove header names when indexing to avoid artificially increasing the frequency of, e.g., the "subject" term
|
2011-06-27 18:38:44 +02:00 |
|
Jean-Francois Dockes
|
b28eaf23fb
|
Got rid of all the old RCS id strings
|
2011-04-27 08:22:17 +02:00 |
|
Jean-Francois Dockes
|
e1a20aa810
|
got rid of accesses to global config through getMainConfig()
|
2011-03-02 13:47:07 +01:00 |
|
Jean-Francois Dockes
|
061ffda545
|
checked/changed all sprintf calls
|
2010-11-15 11:57:39 +01:00 |
|
Jean-Francois Dockes
|
e6d5f72886
|
added the possibility to extract arbitrary mail headers and use them as document fields. This forced an incompatible change in the format of the [stored] section inside the "fields" config file
|
2010-07-06 17:16:36 +02:00 |
|
dockes
|
c78a3bb567
|
add cnf(maildefcharset) to set specific mail default charset (mainly for readpst extracts which are utf-8 but have no charset set)
|
2009-11-27 13:23:13 +00:00 |
|
dockes
|
dd6acb07cc
|
mh_mail: use truncate_to_word to avoid cutting a utf8 char. rcldb: logdeb text_to_word errors
|
2009-11-18 10:26:47 +00:00 |
|
dockes
|
7d18c22142
|
reason msg
|
2009-11-16 16:10:31 +00:00 |
|
dockes
|
daae416d98
|
extract msgid + generate abstract at start of txt, excluding headers
|
2009-10-31 09:00:31 +00:00 |
|
dockes
|
229645a0e2
|
added optional extended file attributes support
|
2009-01-21 13:55:12 +00:00 |
|
dockes
|
f57d4a91f9
|
compute md5 checksums for all docs and optionally collapse duplicates in results
|
2009-01-09 14:56:36 +00:00 |
|
dockes
|
9082f3bf65
|
allow specifying format and charset for ext filters. Cache and reuse filters
|
2008-10-04 14:26:59 +00:00 |
|
dockes
|
5cc1de9aad
|
emit field for recipients
|
2008-09-16 08:13:45 +00:00 |
|
dockes
|
022e0e5f43
|
suppressed a few wasteful string-cstr conversions
|
2008-07-01 11:51:51 +00:00 |
|
dockes
|
0460f1016c
|
mh_mail now uses mimetype() to try and better identify application/octet-stream
|
2008-07-01 10:29:45 +00:00 |
|
dockes
|
46a7f05cbc
|
gcc 4 compat, thanks to Kartik Mistry
|
2007-12-13 06:58:22 +00:00 |
|
dockes
|
02475fba71
|
text/plain attachments were not transcoded to utf-8
|
2007-10-17 11:40:35 +00:00 |
|
dockes
|
1d683ad411
|
added field/prefixes for author and title + command line query language
|
2007-01-17 13:53:41 +00:00 |
|
dockes
|
094e465252
|
handle multipart/signed
|
2007-01-13 10:28:37 +00:00 |
|
dockes
|
8fe7cb37d3
|
mh_mail needs to lowercase contentypes
|
2006-12-18 12:06:11 +00:00 |
|
dockes
|
8f1f2ca66d
|
mail attachments sort of ok
|
2006-12-16 15:39:54 +00:00 |
|
dockes
|
229eb0de78
|
test data indexing result same terms as 1.6.3
|
2006-12-15 16:33:15 +00:00 |
|
dockes
|
33c95ef1ba
|
Dijon filters 1st step: mostly working needs check and optim
|
2006-12-15 12:40:24 +00:00 |
|
dockes
|
9c32ef4f16
|
fix bug with bad message "From " delimiter detection
|
2006-12-07 08:06:54 +00:00 |
|
dockes
|
d5745bdb83
|
fix bug with bad message "From " delimiter detection
|
2006-12-07 07:06:28 +00:00 |
|
dockes
|
290a7272be
|
use regexp to better discriminate From delimiter lines in mbox. Avoid reading mboxes twice
|
2006-12-05 15:25:17 +00:00 |
|
dockes
|
417586fb2b
|
fix newlines
|
2006-09-23 07:39:18 +00:00 |
|
dockes
|
b14021f539
|
clarified depth processing and increased limit
|
2006-09-22 07:19:13 +00:00 |
|
dockes
|
3e2bccd259
|
walk the full mime tree instead of staying at level 1
|
2006-09-19 14:30:39 +00:00 |
|
dockes
|
cfe1dd5d9f
|
Use own code to parse rfc822 dates, strptime() can't do it
|
2006-09-15 16:50:44 +00:00 |
|