49 Commits

Author SHA1 Message Date
Jean-Francois Dockes
b118c93b4f small cleanups to avoid a few ifdef _WIN32 2022-01-17 15:46:40 +01:00
Jean-Francois Dockes
d9c1a9648c Windows msvc: rename dirent.h->msvc_dirent.h. mh_text: fix mimeconf-win and warning 2020-08-15 10:12:36 +01:00
Jean-Francois Dockes
15924ce037 Process text/plain subdocuments like .txt files (paging big ones, etc.) 2020-08-15 10:20:48 +02:00
Jean-Francois Dockes
560041cab9 cleared out errant tabs 2020-05-30 15:54:49 +02:00
Jean-Francois Dockes
39c152bada Fixed MSVC warnings, all innocuous 2020-04-17 14:26:40 +01:00
Jean-Francois Dockes
37e203d535 mh_text: log message when skipping file with size over max 2019-05-17 09:32:46 +02:00
Jean-Francois Dockes
0cbc46732f Fixed the FSF address 2019-03-04 11:19:14 +01:00
Jean-Francois Dockes
bde991c08a got rid of off_t 2017-02-28 20:36:01 +01:00
Jean-Francois Dockes
b55f4b3b0a add nomd5types parameter to set file types for which dedup is not that useful and computation is expensive (e.g. audio files). Replace "call parent" misfeature with call to virtual in MimeHandler constructor. Fix log calls indent 2017-02-02 18:09:00 +01:00
Jean-Francois Dockes
93c0001439 pretty 2016-11-08 12:42:46 +01:00
Jean-Francois Dockes
f6a999de84 logging now uses c++ streams 2016-07-12 09:41:04 +02:00
Jean-Francois Dockes
ff15f8fb1c Centralize stat calls to ensure consistency of time fields on windows 2016-01-08 11:23:10 +01:00
Jean-Francois Dockes
d942242047 replace all %lld instances 2015-10-03 17:25:17 +02:00
Jean-Francois Dockes
2fe75dba28 More small windows int types fixes. (--HG-- branch : WINDOWSPORT) 2015-09-01 15:03:21 +02:00
Jean-Francois Dockes
75517f7497 recollindex builds. Still need to implement quite a lot of ifndefed stuff (pathut, rclconfig) (--HG-- branch : WINDOWSPORT) 2015-08-30 15:30:50 +02:00
Jean-Francois Dockes
c6e228b7c6 Prepared windows port by removing a number of spurious reference to unix-specific interfaces, and using some xapian posix adaptor includes 2015-08-19 14:41:10 +02:00
Jean-Francois Dockes
9ba0b3e8bc Replaced RSA md5 code with public domain OpenBSD/debian dpkg version 2015-03-01 14:28:01 +01:00
Jean-Francois Dockes
628302c348 Change things so that the first chunk of a multi-chunk (multi-mb) text files gets an ipath so that it does not stand for the whole file, but is treated like other chunks 2015-01-21 16:21:33 +01:00
Jean-Francois Dockes
7c9e0af8a1 Use readnext() method to read even 1st chunk of text files to perform appropriate end of chunk truncation to eol. Won't affect unchunked files 2015-01-21 16:03:26 +01:00
Jean-Francois Dockes
a7728ceb91 changed the mime handler cache key (was the mime type), to avoid having multiple copies of the same filter when applied to different mime types. This reduces a lot the number of processes during indexing, with no impact on performance 2013-04-25 18:18:48 +02:00
Jean-Francois Dockes
860521be88 internfile: do not compute md5 when in preview mode 2013-04-09 12:40:46 +02:00
Jean-Francois Dockes
66b59c9963 use the "charset" extended attribute for text files if it is set 2013-01-23 12:04:02 +01:00
Jean-Francois Dockes
9f402d33cb got rid of unused csguess module 2012-04-06 15:14:01 +02:00
Jean-Francois Dockes
638d468796 clarified the use of string keys inside the Filter metaData array 2012-03-07 10:13:46 +01:00
Jean-Francois Dockes
a5af2b93bd "md5"->cstr_md5 2012-02-25 10:41:27 +01:00
Jean-Francois Dockes
49554e42c2 Factorized common text transcoding code in separate module 2011-10-20 17:53:42 +02:00
Jean-Francois Dockes
38e0957962 const string cleanup 2011-10-01 16:39:38 +02:00
Jean-Francois Dockes
88685d2e64 search/index: fixed a number of bad conversions to properly deal with text documents bigger than 2GB 2011-07-12 08:28:09 -07:00
Jean-Francois Dockes
b28eaf23fb Got rid of all the old RCS id strings 2011-04-27 08:22:17 +02:00
Jean-Francois Dockes
f4c1c3678d indexing: an error on an archive member could crash or block the indexing because of the unclean way the ipath was passed in/out of internfile(). Closes issue #55 2011-04-25 16:41:43 +02:00
Jean-Francois Dockes
e1a20aa810 got rid of accesses to global config through getMainConfig() 2011-03-02 13:47:07 +01:00
Jean-Francois Dockes
292859a3ac Index: improve processing/rejection for binary files disguising as scripts (ie: shar archives). Use "internal text/plain" instead of "exec rcltext" for script files so that normal text/plain processing is done (max size, splits). Reject text if more than 25% iconv errors 2011-03-01 08:39:30 +01:00
Jean-Francois Dockes
320a869d6e Indexing filters: somewhat clarified and unified some charset-related parameters 2011-02-01 15:04:49 +01:00
Jean-Francois Dockes
061ffda545 checked/changed all sprintf calls 2010-11-15 11:57:39 +01:00
Jean-Francois Dockes
e5f41aeb05 Add large file support 2010-07-16 17:08:07 +02:00
dockes
e7b2bc4b46 new glibc missing includes 2009-11-28 09:15:46 +00:00
dockes
a029de8be9 set defaults usedesktoprefs, maxtext 20mb pagesz 1000k webcache 40m 2009-11-28 08:14:05 +00:00
dockes
6bd43301e1 gcc43+linux compile 2009-10-21 11:32:49 +00:00
dockes
a73a1fb097 don't set ipath for the first page in text files to avoid dual records for files under the page size 2009-09-30 15:53:06 +00:00
dockes
a374b2a7b7 implemented paged text files 2009-09-30 15:45:53 +00:00
dockes
0e1cbddb8b textfilemaxmbs 2009-09-29 15:58:45 +00:00
dockes
229645a0e2 added optional extended file attributes support 2009-01-21 13:55:12 +00:00
dockes
f57d4a91f9 compute md5 checksums for all docs and optionally collapse duplicates in results 2009-01-09 14:56:36 +00:00
dockes
33c95ef1ba Dijon filters 1st step: mostly working needs check and optim 2006-12-15 12:40:24 +00:00
dockes
f96fcd6dd3 get rid of unused temp 2006-03-20 15:14:08 +00:00
dockes
2a3075d6a6 reference to GPL in all .cpp files 2006-01-23 13:32:29 +00:00
dockes
be485e8059 allow indexing individual files. Fix pb with preview and charsets (local defcharset ignored) 2005-12-14 11:00:48 +00:00
dockes
ae8ff5abb3 *** empty log message *** 2005-11-24 07:16:16 +00:00
dockes
6cba3b65c1 restructuring on mimehandler files 2005-11-18 13:23:46 +00:00