372 Commits

Author SHA1 Message Date
Jean-Francois Dockes
6d2454aedb rclpdf.py: fixed typo in processing xmp field names 2019-10-14 19:46:46 +02:00
Jean-Francois Dockes
20ebeec7fc handler verbosity 2019-10-14 09:03:55 +02:00
Jean-Francois Dockes
a96ee950b1 missing import in ppt msodumper 2019-10-11 15:23:58 +02:00
Jean-Francois Dockes
2491388e9e pst handler: improved charset processing 2019-10-11 14:18:27 +02:00
Jean-Francois Dockes
f66b5d1ef9 pdf: fix test on pdfocr config value 2019-10-11 12:05:26 +02:00
Jean-Francois Dockes
239297d3de For zip-bundled modules: prepend the zip to the path instead of appending it, to make sure that our version is used 2019-10-10 14:15:05 +02:00
Jean-Francois Dockes
5210088e8f Epub: failed with python3 when epubcatenate was set 2019-10-10 09:02:00 +02:00
Jean-Francois Dockes
0436b80956 windows: avoid picking up a default pdftotext: we want ours 2019-10-07 11:45:14 +02:00
Jean-Francois Dockes
af42fe8f5e rclconfig.py, rclexecm.py: implement part of mimetype identification for rclexecm test mode 2019-10-06 07:44:50 +02:00
Jean-Francois Dockes
2e801812fe rclpdf: restore pdfextrametafix function and add test 2019-09-04 09:38:11 +02:00
Jean-Francois Dockes
e4576fc12f rcltex: try to detect character encoding 2019-08-27 08:32:50 +02:00
Jean-Francois Dockes
af664e7768 Input handlers: more closing to help with windows temp files 2019-07-21 10:03:03 +02:00
Jean-Francois Dockes
a1daa8de55 Epub: close file (windows temp file cleanup) 2019-07-20 19:17:29 +02:00
Jean-Francois Dockes
16a051c3b6 rcltext.py: make sure to close file (windows temp file removal) 2019-07-20 19:09:07 +02:00
Jean-Francois Dockes
7d168dc198 rclchm: close file (windows temp file removal) 2019-07-20 19:08:33 +02:00
Jean-Francois Dockes
2c454b92a6 rclimg: explicitly close file handle (windows temp file removal) 2019-07-20 15:14:32 +02:00
Jean-Francois Dockes
703caf2ee4 rclzip: close file when done (windows temp file cleanup) 2019-07-20 14:45:11 +02:00
Jean-Francois Dockes
4c2fd82d4e pst: wait for pffexport and generate error if exit code is not 0 2019-06-24 11:47:17 +02:00
Jean-Francois Dockes
db9fd248f3 7z: properly list the needed package as pylzma 2019-06-21 16:57:58 +02:00
Jean-Francois Dockes
628da0e454 pst: new file name was appended to pffexport command instead of replacing the old one 2019-06-17 10:30:02 +02:00
Jean-Francois Dockes
e38e58c37a In case the self-doc was not sent first by the handler, its udi was not recalculated, and it clobbered the last subdoc 2019-06-16 13:46:00 +02:00
Jean-Francois Dockes
5d25094107 pst: pass the command line ipath as base64, as there is no MS Windows way to pass utf-8 2019-06-14 14:33:49 +02:00
Jean-Francois Dockes
6c73a0d666 pst: reset generator for new file 2019-06-13 16:16:32 +02:00
Jean-Francois Dockes
5ff1a92a51 pdf: ocr: small fixes, plus make pdfocr redefinable in subdirs 2019-06-13 09:47:25 +02:00
Jean-Francois Dockes
9dcdb6e9a6 pdf: ocr function was broken for python3 in some cases (depending on how the ocr language was specified) 2019-06-13 08:33:55 +02:00
Jean-Francois Dockes
b895980e95 PDF: fix the XMP metadata extraction code for python3 and other issues. Also get metadata from XML attributes 2019-06-12 19:21:37 +02:00
Jean-Francois Dockes
f0944ae0b2 rclpst: indexing / searching mostly working, with possible issues in data charset conversions (to check). Preview does not work, ipath needs conversion inside pffexport 2019-05-28 18:39:37 +02:00
Jean-Francois Dockes
c1553029b9 Pst on Unix: email message indexing seems fully ok 2019-05-27 12:17:41 +02:00
Jean-Francois Dockes
cc4f4e0c74 ckpt: pst: basic indexing of email. no getipath/preview 2019-05-26 12:30:59 +02:00
Jean-Francois Dockes
c7c413d9e7 email address in copyright 2019-05-26 12:29:36 +02:00
Jean-Francois Dockes
7489427086 pff: dumpreader successfully creating an mbox 2019-05-25 15:14:50 +02:00
Jean-Francois Dockes
ec44e463c1 rclpython: catch more exceptions to avoid system reports 2019-04-28 17:50:02 +02:00
Jean-Francois Dockes
73d1cd36be ppt-dump: catch exceptions to avoid system reports 2019-04-28 17:49:42 +02:00
Jean-Francois Dockes
48bc71da70 djvu: do not set -escape option to djvutxt. Ticket #90 2019-04-13 15:06:25 +02:00
Jean-Francois Dockes
cd0941cb11 rclaudio: allow use of external tag fixer script 2019-04-13 14:23:55 +02:00
Jean-Francois Dockes
4fe6cecd19 rclaudio: better process id3 TXX and TXXX 2019-03-28 06:17:43 +01:00
Jean-Francois Dockes
2f75550348 msodumper 7a364956: improved performance 2019-03-26 14:47:18 +01:00
Jean-Francois Dockes
a7f01c0b87 Try to be a little less noisy about errors processing xls files 2019-03-26 09:20:21 +01:00
Jean-Francois Dockes
083a7dfcc1 msodumper 1d64ca83: catch exceptions in xls-dump to avoid system reports 2019-03-26 09:19:47 +01:00
Jean-Francois Dockes
a3c5c07b22 msodumper d19bebfd 2019-03-26 08:34:58 +01:00
Jean-Francois Dockes
e71d7f183f Python filters: using list append + join instead of string append improves performance hugely for big (book-sized) documents. Impact on a typical pdf mix is moderate though 2019-03-25 11:30:50 +01:00
Jean-Francois Dockes
79724b1d28 msodump: updated to git 1165d665: Add Ole identification checks to avoid crashes and loops on random file formats 2019-03-23 17:22:14 +01:00
Jean-Francois Dockes
5ede21b61b rclaudio: avoid generating errors for files which probably just have no tags 2019-03-21 15:28:39 +01:00
Jean-Francois Dockes
9709836300 Process .avi using rclimg 2019-03-05 14:00:58 +01:00
Jean-Francois Dockes
bacbdce8b8 more fsf address fixes 2019-03-04 11:33:59 +01:00
Jean-Francois Dockes
f482df9707 reset wrong mode change 2019-03-04 11:22:46 +01:00
Jean-Francois Dockes
0cbc46732f Fixed the FSF address 2019-03-04 11:19:14 +01:00
Jean-Francois Dockes
39712d03f3 rclrar: only decode the ipath if it's not str already 2019-02-19 20:53:49 +01:00
Jean-Francois Dockes
1f8afa4e2d trimmed python handler error verbosity 2019-02-18 10:32:54 +01:00
Jean-Francois Dockes
8f10f48555 py-unrar (for windows) wants str/unicode in its interface 2019-02-17 17:46:39 +01:00