268 Commits

Author SHA1 Message Date
Jean-Francois Dockes
50135e3428 process extended attributes by default 2013-02-19 16:12:24 +01:00
Jean-Francois Dockes
d3631b5ddf cleaned up processing of metadata from diverse origins (doc,extattrs,localfields) 2013-01-29 14:33:57 +01:00
Jean-Francois Dockes
66b59c9963 use the "charset" extended attribute for text files if it is set 2013-01-23 12:04:02 +01:00
Jean-Francois Dockes
f897f087aa HTML: do not concatenate text found before body tag with the title. Fixes issue #125 2013-01-12 14:06:40 +01:00
Jean-Francois Dockes
d2f7f11715 Use dynamic lib for shared recoll code 2012-12-29 14:27:01 +01:00
Jean-Francois Dockes
2d5c2a8058 split the iDocToFile method into static and member parts for use from python module 2012-12-20 11:15:10 +01:00
Jean-Francois Dockes
7945e04c5f removed static decls for previously deleted methods 2012-12-20 08:10:54 +01:00
Jean-Francois Dockes
d2acd4896b comments 2012-12-20 08:05:12 +01:00
Jean-Francois Dockes
a8f410606d recollindex threads: reset the config pointer when retrieving handler from cache. Seems to have been the cause of the crashes 2012-11-30 19:34:19 +01:00
Jean-Francois Dockes
6457fb4100 take care of pathologic charset decls with empty value 2012-11-26 11:40:08 +01:00
Jean-Francois Dockes
cd53c0a536 Multithreaded indexing seems not to crash anymore thanks to locked existence map 2012-11-02 21:43:51 +01:00
Jean-Francois Dockes
2ef104bddf comments 2012-11-02 09:47:39 +01:00
Jean-Francois Dockes
5fc8f240fe from 1.18 branch: Adjust things for using the new Firefox plugin: remove visible Beagle references + fix 1.18 web queue indexing bugs 2012-11-01 11:30:39 +01:00
Jean-Francois Dockes
04c19b33d5 from 1.18 branch: When creating initial config directory (1st exec), initialize specific unac_except_trans for some languages: de, se/no/dk/fi + fix mixup of language and country codes 2012-11-01 11:27:50 +01:00
Jean-Francois Dockes
ee7d0f2ee7 1st parallel multithreaded version of indexing which can do my home without crashing... Let's checkpoint 2012-11-01 11:19:48 +01:00
Jean-Francois Dockes
b8963db4b1 cleaned up the missing helper storage class 2012-10-28 16:43:19 +01:00
Jean-Francois Dockes
17f8b652d4 Support explicit HTML markup in fields when the markup="html" attribute is present 2012-10-25 14:22:20 +02:00
Jean-Francois Dockes
95ef518ec7 the missing filter detection code was broken 2012-10-23 19:40:51 +02:00
Jean-Francois Dockes
2a833536d5 handle application tag when looking for icon, and add icons for books and book chapters (epub, chm, info) 2012-10-23 16:34:07 +02:00
Jean-Francois Dockes
5add2e2384 Arrange so we can now open the parent of a document (e.g. chm file instead of temp copy of html page inside chm), even when the parent is itself embedded in an archive 2012-10-12 16:54:52 +02:00
Jean-Francois Dockes
d4edbbaedb rclepub: use elt ids instead of hrefs + debug traces 2012-10-11 15:35:15 +02:00
Jean-Francois Dockes
8e1ed842d2 message 2012-10-09 14:52:32 +02:00
Jean-Francois Dockes
f624d3b10e doc 2012-10-06 21:04:03 +02:00
Jean-Francois Dockes
84b561b040 For plain text files, try alternate decode from 8bit charset when decode from UTF-8 fails 2012-10-06 15:12:49 +02:00
Jean-Francois Dockes
2fc294a9c6 factored out common charset handling code in exec and execm, cleaned up charset and textplain handling in mh_mail 2012-10-06 12:14:04 +02:00
Jean-Francois Dockes
29fe1e4927 implemented maxmemberkb limit for multidoc (e.g. archive) members 2012-10-06 09:05:35 +02:00
Jean-Francois Dockes
1329265b7b check for empty file name in internfile, else gets stuck later because empty fn is interpreted as read stdin in md5 2012-10-05 16:42:13 +02:00
Jean-Francois Dockes
d942b44785 mbox: implement member size limit of 100MB and autodetect thunderbird mboxes (look for .msf) 2012-10-04 17:00:50 +02:00
Jean-Francois Dockes
e0bc65bfdd small mods innocuous or auxiliary to case/diac sensitivity but which can live in main branch 2012-09-13 12:25:01 +02:00
Jean-Francois Dockes
ec3dbb4092 comments 2012-08-21 08:38:23 +02:00
Jean-Francois Dockes
2870274f80 slightly simplified temp file handling 2012-08-21 08:35:39 +02:00
Jean-Francois Dockes
254a7dc972 comment 2012-06-05 14:14:02 +02:00
Jean-Francois Dockes
643f4d56bb internals: virtualized the doc fetcher interface 2012-06-05 07:16:11 +02:00
Jean-Francois Dockes
61a2e28a7c Absurd input source global variable in Binc imap caused the indexer to crash when an email message contained attachments which were disguised messages (ie: x-mimehtml), because this would cause a recursive call into Binc with a different data source (ie: string instead of original fd), clobbering the original source 2012-05-24 14:52:41 +02:00
Jean-Francois Dockes
3accce0b22 index: added sanity checks to mail handler 2012-05-16 12:25:44 +02:00
Jean-Francois Dockes
0333d83d2e html: small additional cleanup after previous <body> processing modification 2012-05-16 10:13:53 +02:00
Jean-Francois Dockes
e6191b51a8 Html: Just ignore opening and closing <body> and <html> tags. Current browsers show text before or after the body and ignore multiple body tags. Not pushed to 1.17 maint because of possible disruption. Closes issue #92 2012-05-16 10:07:09 +02:00
Jean-Francois Dockes
8b34610dde Cleaned up file name handling. Fixes that file names were sometimes indexed split, sometimes not. They now always are both, with different prefixes. Forces reindex 2012-04-13 09:18:08 +02:00
Jean-Francois Dockes
ec7b40a52e cosmetics: list -> vector in more places 2012-04-11 19:58:08 +02:00
Jean-Francois Dockes
78bd8d63da use vector instead of list for execmd arg list 2012-04-11 15:36:49 +02:00
Jean-Francois Dockes
9f402d33cb got rid of unused csguess module 2012-04-06 15:14:01 +02:00
Jean-Francois Dockes
80fb2f553c MIME handling: treat content-type=="text" as "text/plain". Needed for some very old messages 2012-03-18 08:26:44 +01:00
Jean-Francois Dockes
0050f96f57 fix test driver 2012-03-18 08:23:33 +01:00
Jean-Francois Dockes
85166c93b2 Changed the way we handle document sizes. The fbytes field should now be in most cases the most "natural" document size. pcbytes holds the top external container size and dbytes the text size 2012-03-07 15:39:30 +01:00
Jean-Francois Dockes
638d468796 clarified the use of string keys inside the Filter metaData array 2012-03-07 10:13:46 +01:00
Jean-Francois Dockes
a5af2b93bd "md5"->cstr_md5 2012-02-25 10:41:27 +01:00
Jean-Francois Dockes
ec87379015 html: handle the html5 charset meta tag 2012-01-26 19:27:58 +01:00
Jean-Francois Dockes
0d8a61ced9 log message 2012-01-26 19:26:54 +01:00
Jean-Francois Dockes
639a434dce comments 2012-01-26 18:17:37 +01:00
Jean-Francois Dockes
eed31f9ef1 html index: throw an exception after parsing in all cases so that the same code path is always used. The previous approach sometimes resulted in a bad charset used for preview 2012-01-25 17:33:41 +01:00