Jean-Francois Dockes
|
fe86fa9e1f
|
ocr: compat: make a nonexistent ocrprogs config variable equivalent to "tesseract"
|
2020-02-28 14:38:02 +01:00 |
|
Jean-Francois Dockes
|
1fb9421163
|
OCR: small adjustments for Windows
|
2020-02-28 09:22:03 +01:00 |
|
Jean-Francois Dockes
|
8560467e4a
|
pdf/ocr scripts: no need to look for rclocr if pdfocr is not set. comments.
|
2020-02-27 18:16:28 +01:00 |
|
Jean-Francois Dockes
|
e520176a2a
|
OCR: small adjustments for Windows. Works with Tesseract.
|
2020-02-27 14:10:55 +01:00 |
|
Jean-Francois Dockes
|
abb7ef8803
|
added ocr module for abbyy
|
2020-02-27 11:35:23 +01:00 |
|
Jean-Francois Dockes
|
7bc70a30ae
|
ocrcache: implemented purge functions/script
|
2020-02-27 09:25:52 +01:00 |
|
Jean-Francois Dockes
|
747e37a980
|
rclocr ckpt: cache+tesseract indexing working
|
2020-02-26 17:30:12 +01:00 |
|
Jean-Francois Dockes
|
38dfa5f841
|
1st version of the cached ocr mechanism
|
2020-02-15 21:19:13 +01:00 |
|
Jean-Francois Dockes
|
e7e37b9233
|
openxml: extract more metadata fields (e.g. description, keywords)
|
2020-01-30 08:38:30 +01:00 |
|
Jean-Francois Dockes
|
a1122c4e8a
|
Fix format string used to generate/scan circache headers.
Use _ not . as prefix for webqueue metadata files
Fix log messages and indent
|
2019-11-24 15:02:30 +01:00 |
|
Jean-Francois Dockes
|
83e29a9b01
|
Windows: enable the firefox recent history indexer.
|
2019-11-24 10:46:23 +01:00 |
|
Jean-Francois Dockes
|
b43d1b3287
|
pdf xmp: pdfextrametafix: add method which takes the xml elt as arg instead of the text content
|
2019-11-14 18:19:33 +01:00 |
|
Jean-Francois Dockes
|
6d2454aedb
|
rclpdf.py: fixed typo in processing xmp field names
|
2019-10-14 19:46:46 +02:00 |
|
Jean-Francois Dockes
|
20ebeec7fc
|
handler verbosity
|
2019-10-14 09:03:55 +02:00 |
|
Jean-Francois Dockes
|
a96ee950b1
|
missing import in ppt msodumper
|
2019-10-11 15:23:58 +02:00 |
|
Jean-Francois Dockes
|
2491388e9e
|
pst handler: improved charset processing
|
2019-10-11 14:18:27 +02:00 |
|
Jean-Francois Dockes
|
f66b5d1ef9
|
pdf: fix test on pdfocr config value
|
2019-10-11 12:05:26 +02:00 |
|
Jean-Francois Dockes
|
239297d3de
|
For zip-bundled modules: prepend zip in path instead of append to make sure that our version is used
|
2019-10-10 14:15:05 +02:00 |
|
Jean-Francois Dockes
|
5210088e8f
|
Epub: failed with python3 when epubcatenate was set
|
2019-10-10 09:02:00 +02:00 |
|
Jean-Francois Dockes
|
0436b80956
|
windows: avoid picking up a default pdftotext: we want ours
|
2019-10-07 11:45:14 +02:00 |
|
Jean-Francois Dockes
|
af42fe8f5e
|
rclconfig.py, rclexecm.py: implement part of mimetype identification for rclexecm test mode
|
2019-10-06 07:44:50 +02:00 |
|
Jean-Francois Dockes
|
2e801812fe
|
rclpdf: restore pdfextrametafix function and add test
|
2019-09-04 09:38:11 +02:00 |
|
Jean-Francois Dockes
|
e4576fc12f
|
rcltex: try to detect character encoding
|
2019-08-27 08:32:50 +02:00 |
|
Jean-Francois Dockes
|
af664e7768
|
Input handlers: more closing to help with windows temp files
|
2019-07-21 10:03:03 +02:00 |
|
Jean-Francois Dockes
|
a1daa8de55
|
Epub: close file (windows temp file cleanup)
|
2019-07-20 19:17:29 +02:00 |
|
Jean-Francois Dockes
|
16a051c3b6
|
rcltext.py: make sure to close file (windows temp file removal)
|
2019-07-20 19:09:07 +02:00 |
|
Jean-Francois Dockes
|
7d168dc198
|
rclchm: close file (windows temp file removal)
|
2019-07-20 19:08:33 +02:00 |
|
Jean-Francois Dockes
|
2c454b92a6
|
rclimg: explicitly close file handle (windows temp file removal)
|
2019-07-20 15:14:32 +02:00 |
|
Jean-Francois Dockes
|
703caf2ee4
|
rclzip: close file when done (windows temp file cleanup)
|
2019-07-20 14:45:11 +02:00 |
|
Jean-Francois Dockes
|
4c2fd82d4e
|
pst: wait for pffexport and generate error if exit code is not 0
|
2019-06-24 11:47:17 +02:00 |
|
Jean-Francois Dockes
|
db9fd248f3
|
7z: properly list the needed package as pylzma
|
2019-06-21 16:57:58 +02:00 |
|
Jean-Francois Dockes
|
628da0e454
|
pst: new file name was appended to pffexport command instead of replacing old
|
2019-06-17 10:30:02 +02:00 |
|
Jean-Francois Dockes
|
e38e58c37a
|
In case the self-doc was not sent first by the handler, its udi was not recalculated, and it clobbered the last subdoc
|
2019-06-16 13:46:00 +02:00 |
|
Jean-Francois Dockes
|
5d25094107
|
pst: pass the command line ipath as base64 as there is no msw way to pass utf-8
|
2019-06-14 14:33:49 +02:00 |
|
Jean-Francois Dockes
|
6c73a0d666
|
pst: reset generator for new file
|
2019-06-13 16:16:32 +02:00 |
|
Jean-Francois Dockes
|
5ff1a92a51
|
pdf: ocr: small fixes, plus make pdfocr redefinable in subdirs
|
2019-06-13 09:47:25 +02:00 |
|
Jean-Francois Dockes
|
9dcdb6e9a6
|
pdf: ocr function was broken for python3 in some cases (depending on how the ocr language was specified)
|
2019-06-13 08:33:55 +02:00 |
|
Jean-Francois Dockes
|
b895980e95
|
PDF: fix the XMP metadata extraction code for python3 and other issues. Also get metadata from XML attributes
|
2019-06-12 19:21:37 +02:00 |
|
Jean-Francois Dockes
|
f0944ae0b2
|
rclpst: indexing / searching mostly working, possibly with issues in data
charset conversions (check). Preview does not work, ipath needs conversion
inside pffexport
|
2019-05-28 18:39:37 +02:00 |
|
Jean-Francois Dockes
|
c1553029b9
|
Pst on Unix: email message indexing seems fully ok
|
2019-05-27 12:17:41 +02:00 |
|
Jean-Francois Dockes
|
cc4f4e0c74
|
ckpt: pst: basic indexing of email. no getipath/preview
|
2019-05-26 12:30:59 +02:00 |
|
Jean-Francois Dockes
|
c7c413d9e7
|
email address in copyright
|
2019-05-26 12:29:36 +02:00 |
|
Jean-Francois Dockes
|
7489427086
|
pff: dumpreader successfully creating an mbox
|
2019-05-25 15:14:50 +02:00 |
|
Jean-Francois Dockes
|
ec44e463c1
|
rclpython: catch more exceptions to avoid system reports
|
2019-04-28 17:50:02 +02:00 |
|
Jean-Francois Dockes
|
73d1cd36be
|
ppt-dump: catch exceptions to avoid system reports
|
2019-04-28 17:49:42 +02:00 |
|
Jean-Francois Dockes
|
48bc71da70
|
djvu: do not set -escape option to djvutxt. Ticket #90
|
2019-04-13 15:06:25 +02:00 |
|
Jean-Francois Dockes
|
cd0941cb11
|
rclaudio: allow use of external tag fixer script
|
2019-04-13 14:23:55 +02:00 |
|
Jean-Francois Dockes
|
4fe6cecd19
|
rclaudio: better process id3 TXX and TXXX
|
2019-03-28 06:17:43 +01:00 |
|
Jean-Francois Dockes
|
2f75550348
|
msodumper: 7a364956. Improved performance
|
2019-03-26 14:47:18 +01:00 |
|
Jean-Francois Dockes
|
a7f01c0b87
|
Try to be a little less noisy about errors processing xls files
|
2019-03-26 09:20:21 +01:00 |
|