445 Commits

Author SHA1 Message Date
Jean-Francois Dockes
7179e0dbf8 cmdtalk: remove remains of python2 support 2021-09-23 11:19:36 +02:00
Jean-Francois Dockes
3df83ec982 Zip archives: set the modification date attribute for members 2021-07-30 10:53:43 +02:00
Jean-Francois Dockes
a67dd3f8a3 ost/pst filter: fix not fetching the message dates 2021-07-23 19:12:34 +02:00
Jean-Francois Dockes
174ad9fe22 rcl ocr with tesseract: fix stupid breakage in script 2021-06-13 07:14:51 +01:00
Jean-Francois Dockes
e42a4e9669 Chm: fix catenate mode which was broken a long time ago 2021-05-01 10:29:44 +02:00
Jean-Francois Dockes
3865e1b05f rclchm: chmcatenate=1 would get the handler to crash 2021-05-01 08:10:34 +02:00
Jean-Francois Dockes
5656d376c7 Windows: djvu: need to convert file name before subprocess check_output 2021-04-30 08:37:19 +01:00
Jean-Francois Dockes
3f23000b89 rclpython: when not previewing, just output the file text, with no processing at all. Avoids spurious newlines 2021-04-14 14:26:11 +02:00
Jean-Francois Dockes
7a54c3a110 rclpython.py: don't try to subscript an exception 2021-03-29 09:52:38 +02:00
Jean-Francois Dockes
a4b3aff5c4 rclaudio: if mutagen.File() fails, try with mutagen.ID3(). This allows extracting the tags, e.g. from adts files mistaken for mp3 during initial identification, and for which the later full mp3 init fails because of the wrong kind of frame. 2021-03-03 12:53:59 +01:00
Jean-Francois Dockes
31f6793495 rclaudio: catch exception when parsing bad date, set date to the epoch 2021-02-25 19:27:24 +01:00
Jean-Francois Dockes
dc934b7ddc comment 2021-02-10 14:57:40 +01:00
freddii
89c7efe682 fixed typos 2021-02-04 17:12:22 +01:00
Jean-Francois Dockes
50b64caf5e rclaudio: process the Group tag 2021-01-27 09:32:55 +01:00
Jean-Francois Dockes
2998486d54 revert wrong change in rclaudio 2021-01-19 19:27:48 +01:00
Jean-Francois Dockes
baf2ee8d6b don't make date a field alias for dmtime: it does not make sense because the formats differ in general 2021-01-16 19:19:29 +01:00
Jean-Francois Dockes
cb13b8b6df "print fields" change in rclexecm options had broken -s 2021-01-15 14:06:52 +01:00
Jean-Francois Dockes
72a9548c88 fix warning from rclaudio regexp 2021-01-06 12:01:42 +01:00
Jean-Francois Dockes
e00767d98c rclexecm test/debug: add option -f to dump fields 2020-12-29 15:04:49 +01:00
Jean-Francois Dockes
ee1e84b2f3 comments 2020-12-25 17:35:08 +01:00
Jean-Francois Dockes
53edd7b213 rcl7z: use py7zr if available, rather than pylzma, which does not work on some archives 2020-12-25 17:34:15 +01:00
Jean-Francois Dockes
824e305bb0 Add option to limit tesseract threads 2020-12-17 11:08:31 +01:00
Jean-Francois Dockes
b2f0e2e657 Add handler for emacs org-mode files 2020-11-30 09:50:44 +01:00
Jean-Francois Dockes
33725fd02c simplify stdout redirection for pdftk 2020-11-25 17:54:06 +01:00
Jean-Francois Dockes
8b6082a89f shared 2020-11-09 12:13:30 +01:00
Jean-Francois Dockes
f0abc1df68 pdf: discard pdftk stdout message "Error occurred during initialization of VM", it breaks pdf indexing when it occurs 2020-11-04 14:33:55 +01:00
Jean-Francois Dockes
f50a4e54b1 rclpython: renamed rclpython.py. Use rclexecm. Only colorize for preview, not indexing 2020-11-04 10:32:18 +01:00
Jean-Francois Dockes
e10cb959b3 add test for python program (different handler) 2020-10-18 18:38:44 +02:00
Jean-Francois Dockes
25eda37bc9 Index pdf annotations separately under field name annotation. Add annot, pdfannot and pa aliases. 2020-10-12 10:05:38 +02:00
Jean-Francois Dockes
694d0f155d pdf annot: guard against possible exception while formatting results 2020-10-10 12:48:18 +02:00
Jean-Francois Dockes
96104e7d67 fix rclocrtesseract fix 2020-09-28 11:05:12 +02:00
Jean-Francois Dockes
8accec9b88 rclocrtesseract: unquote tesseractcmd parameter and check existence. 2020-09-24 07:13:21 +02:00
Jean-Francois Dockes
0dd609cf1a python filters: replace misc message printing with single method in rclexecm 2020-09-23 18:38:22 +02:00
Jean-Francois Dockes
10bdf2a0c8 comments 2020-09-05 09:19:10 +02:00
Jean-Francois Dockes
d62bb9016a pdf: try to extract annotation text if the python3 poppler-glib binding is available 2020-09-03 16:16:54 +02:00
Jean-Francois Dockes
2c0fd8502a PDF: pdftk as snap (ubuntu): print warning about pdf attachments if TMPDIR does not belong to user 2020-08-20 11:27:12 +02:00
Jean-Francois Dockes
b305c86041 recoll-we-move-files: apply expanduser to the webdownloadsdir config value 2020-08-17 11:02:46 +02:00
Jean-Francois Dockes
d932d19562 epub handler: extract the opf metadata subjects fields as dc:subject tags. Share more code between rclepub and the now redundant rclepub1 (no more lynx usage in rclepub) 2020-08-09 09:49:08 +02:00
Jean-Francois Dockes
19fe03af62 Support visio .vsdx format 2020-08-04 10:57:13 +02:00
Jean-Francois Dockes
b2e68740ba PDF: attachment extraction was broken since python3 (wrong open mode r instead of rb for the extracted file) 2020-07-27 09:03:58 +02:00
Jean-Francois Dockes
b4306b71c0 openxml word: be more specific for extracting text, avoids treating some image parameters as text 2020-07-15 10:49:06 +02:00
Jean-Francois Dockes
4508b6b064 rclpdf: avoid crash when external metadata filter can't be imported 2020-07-13 10:13:59 +02:00
Jean-Francois Dockes
73f2836317 korean splitter: add inactive option to split on white space before calling the tagger 2020-05-19 09:22:16 +02:00
Jean-Francois Dockes
c6dac9347f cmdtalk: catch param decoding exceptions 2020-05-14 09:23:46 +02:00
Jean-Francois Dockes
dce3bff5d7 comment 2020-04-19 09:19:28 +02:00
Jean-Francois Dockes
c38db0f160 comment 2020-04-18 09:15:45 +02:00
Jean-Francois Dockes
b63cc1b712 Korean splitter script: use python-mecab-ko if possible, else konlpy 2020-04-10 14:27:06 +02:00
Jean-Francois Dockes
e8194dea9d comment 2020-04-08 09:51:37 +02:00
Jean-Francois Dockes
d3de1f0d6f add common execPythonScript method to rclexecm 2020-04-07 10:09:09 +02:00
Jean-Francois Dockes
32ebd65ba8 Windows: small changes for porting back from msvc to mingw 2020-04-07 09:40:00 +02:00