Jean-Francois Dockes
|
50135e3428
|
process extended attributes by default
|
2013-02-19 16:12:24 +01:00 |
|
Jean-Francois Dockes
|
d3631b5ddf
|
cleaned up processing of metadata from diverse origins (doc,extattrs,localfields)
|
2013-01-29 14:33:57 +01:00 |
|
Jean-Francois Dockes
|
66b59c9963
|
use the "charset" extended attribute for text files if it is set
|
2013-01-23 12:04:02 +01:00 |
|
Jean-Francois Dockes
|
f897f087aa
|
HTML: do not concatenate text found before body tag with the title. Fixes issue #125
|
2013-01-12 14:06:40 +01:00 |
|
Jean-Francois Dockes
|
d2f7f11715
|
Use dynamic lib for shared recoll code
|
2012-12-29 14:27:01 +01:00 |
|
Jean-Francois Dockes
|
2d5c2a8058
|
split the iDocToFile method into static and member parts for use from python module
|
2012-12-20 11:15:10 +01:00 |
|
Jean-Francois Dockes
|
7945e04c5f
|
removed static decls for previously deleted methods
|
2012-12-20 08:10:54 +01:00 |
|
Jean-Francois Dockes
|
d2acd4896b
|
comments
|
2012-12-20 08:05:12 +01:00 |
|
Jean-Francois Dockes
|
a8f410606d
|
recollindex threads: reset the config pointer when retrieving handler from cache. Seems to have been the cause of the crashes
|
2012-11-30 19:34:19 +01:00 |
|
Jean-Francois Dockes
|
6457fb4100
|
take care of pathological charset decls with empty value
|
2012-11-26 11:40:08 +01:00 |
|
Jean-Francois Dockes
|
cd53c0a536
|
Multithreaded indexing seems not to crash anymore thanks to locked existence map
|
2012-11-02 21:43:51 +01:00 |
|
Jean-Francois Dockes
|
2ef104bddf
|
comments
|
2012-11-02 09:47:39 +01:00 |
|
Jean-Francois Dockes
|
5fc8f240fe
|
from 1.18 branch: Adjust things for using the new Firefox plugin: remove visible Beagle references + fix 1.18 web queue indexing bugs
|
2012-11-01 11:30:39 +01:00 |
|
Jean-Francois Dockes
|
04c19b33d5
|
from 1.18 branch: When creating initial config directory (1st exec), initialize specific unac_except_trans for some languages: de, se/no/dk/fi + fix mixup of language and country codes
|
2012-11-01 11:27:50 +01:00 |
|
Jean-Francois Dockes
|
ee7d0f2ee7
|
1st parallel multithreaded version of indexing which can do my home without crashing... Let's checkpoint
|
2012-11-01 11:19:48 +01:00 |
|
Jean-Francois Dockes
|
b8963db4b1
|
cleaned up the missing helper storage class
|
2012-10-28 16:43:19 +01:00 |
|
Jean-Francois Dockes
|
17f8b652d4
|
Support explicit HTML markup in fields when the markup="html" attribute is present
|
2012-10-25 14:22:20 +02:00 |
|
Jean-Francois Dockes
|
95ef518ec7
|
the missing filter detection code was broken
|
2012-10-23 19:40:51 +02:00 |
|
Jean-Francois Dockes
|
2a833536d5
|
handle application tag when looking for icon, and add icons for books and book chapters (epub, chm, info)
|
2012-10-23 16:34:07 +02:00 |
|
Jean-Francois Dockes
|
5add2e2384
|
Arrange so we can now open the parent of a document (e.g. chm file instead of temp copy of html page inside chm), even when the parent is itself embedded in an archive
|
2012-10-12 16:54:52 +02:00 |
|
Jean-Francois Dockes
|
d4edbbaedb
|
rclepub: use elt ids instead of hrefs + debug traces
|
2012-10-11 15:35:15 +02:00 |
|
Jean-Francois Dockes
|
8e1ed842d2
|
message
|
2012-10-09 14:52:32 +02:00 |
|
Jean-Francois Dockes
|
f624d3b10e
|
doc
|
2012-10-06 21:04:03 +02:00 |
|
Jean-Francois Dockes
|
84b561b040
|
For plain text files, try alternate decode from 8bit charset when decode from UTF-8 fails
|
2012-10-06 15:12:49 +02:00 |
|
Jean-Francois Dockes
|
2fc294a9c6
|
factored out common charset handling code in exec and execm, cleaned up charset and textplain handling in mh_mail
|
2012-10-06 12:14:04 +02:00 |
|
Jean-Francois Dockes
|
29fe1e4927
|
implemented maxmemberkb limit for multidoc (e.g. archive) members
|
2012-10-06 09:05:35 +02:00 |
|
Jean-Francois Dockes
|
1329265b7b
|
check for empty file name in internfile, else indexing gets stuck later because an empty fn is interpreted as "read stdin" in md5
|
2012-10-05 16:42:13 +02:00 |
|
Jean-Francois Dockes
|
d942b44785
|
mbox: implement member size limit of 100MB and autodetect Thunderbird mboxes (look for .msf)
|
2012-10-04 17:00:50 +02:00 |
|
Jean-Francois Dockes
|
e0bc65bfdd
|
small mods, innocuous or auxiliary to case/diac sensitivity, which can live in the main branch
|
2012-09-13 12:25:01 +02:00 |
|
Jean-Francois Dockes
|
ec3dbb4092
|
comments
|
2012-08-21 08:38:23 +02:00 |
|
Jean-Francois Dockes
|
2870274f80
|
slightly simplified temp file handling
|
2012-08-21 08:35:39 +02:00 |
|
Jean-Francois Dockes
|
254a7dc972
|
comment
|
2012-06-05 14:14:02 +02:00 |
|
Jean-Francois Dockes
|
643f4d56bb
|
internals: virtualized the doc fetcher interface
|
2012-06-05 07:16:11 +02:00 |
|
Jean-Francois Dockes
|
61a2e28a7c
|
Absurd input source global variable in Binc imap caused the indexer to crash when an email message contained attachments which were disguised messages (e.g. x-mimehtml), because this would cause a recursive call into Binc with a different data source (i.e. a string instead of the original fd), clobbering the original source
|
2012-05-24 14:52:41 +02:00 |
|
Jean-Francois Dockes
|
3accce0b22
|
index: added sanity checks to mail handler
|
2012-05-16 12:25:44 +02:00 |
|
Jean-Francois Dockes
|
0333d83d2e
|
html: small additional cleanup after previous <body> processing modification
|
2012-05-16 10:13:53 +02:00 |
|
Jean-Francois Dockes
|
e6191b51a8
|
Html: Just ignore opening and closing <body> and <html> tags. Current browsers show text before or after the body and ignore multiple body tags. Not pushed to 1.17 maint because of possible disruption. Closes issue #92
|
2012-05-16 10:07:09 +02:00 |
|
Jean-Francois Dockes
|
8b34610dde
|
Cleaned up file name handling. Fixes that file names were sometimes indexed split, sometimes not. They now always are both, with different prefixes. Forces reindex
|
2012-04-13 09:18:08 +02:00 |
|
Jean-Francois Dockes
|
ec7b40a52e
|
cosmetics: list -> vector in more places
|
2012-04-11 19:58:08 +02:00 |
|
Jean-Francois Dockes
|
78bd8d63da
|
use vector instead of list for execmd arg list
|
2012-04-11 15:36:49 +02:00 |
|
Jean-Francois Dockes
|
9f402d33cb
|
got rid of unused csguess module
|
2012-04-06 15:14:01 +02:00 |
|
Jean-Francois Dockes
|
80fb2f553c
|
MIME handling: treat content-type=="text" as "text/plain". Needed for some very old messages
|
2012-03-18 08:26:44 +01:00 |
|
Jean-Francois Dockes
|
0050f96f57
|
fix test driver
|
2012-03-18 08:23:33 +01:00 |
|
Jean-Francois Dockes
|
85166c93b2
|
Changed the way we handle document sizes. The fbytes field should now be in most cases the most "natural" document size. pcbytes holds the top external container size and dbytes the text size
|
2012-03-07 15:39:30 +01:00 |
|
Jean-Francois Dockes
|
638d468796
|
clarified the use of string keys inside the Filter metaData array
|
2012-03-07 10:13:46 +01:00 |
|
Jean-Francois Dockes
|
a5af2b93bd
|
"md5"->cstr_md5
|
2012-02-25 10:41:27 +01:00 |
|
Jean-Francois Dockes
|
ec87379015
|
html: handle the html5 charset meta tag
|
2012-01-26 19:27:58 +01:00 |
|
Jean-Francois Dockes
|
0d8a61ced9
|
log message
|
2012-01-26 19:26:54 +01:00 |
|
Jean-Francois Dockes
|
639a434dce
|
comments
|
2012-01-26 18:17:37 +01:00 |
|
Jean-Francois Dockes
|
eed31f9ef1
|
html index: throw an exception after parsing in all cases so that the same code path is always used. The previous approach sometimes resulted in a bad charset used for preview
|
2012-01-25 17:33:41 +01:00 |
|