Jean-Francois Dockes
8b3792026f
Renamed a few extension-less python handlers with a .py extension for consistency
2022-01-14 12:12:22 +01:00
Jean-Francois Dockes
667e661c46
Standardize the shebang line of python scripts to use /usr/bin/env, which was already the vastly dominant choice
2022-01-14 09:27:04 +01:00
Jean-Francois Dockes
b3bb3784fc
Define specific document type for orgmode sub-documents
2022-01-04 14:02:26 +01:00
Jean-Francois Dockes
5fcffb7654
tesseract ocr: use compressed tif temp pages if pdftocairo is available (10x smaller than ppm)
2021-12-04 09:35:10 +01:00
Jean-Francois Dockes
e121695a3c
Python handlers: factorise tmp dir code
2021-12-03 11:03:23 +01:00
Jean-Francois Dockes
1593b1d87f
Change the way rclpd executes rclocr to avoid the command being killed before it can clean up when a signal is raised (e.g. timeout or kbd interrupt)
2021-12-03 10:49:44 +01:00
Jean-Francois Dockes
58d98b5626
PST: account for badly formed headers
2021-10-21 20:42:27 +02:00
Jean-Francois Dockes
1d158f329a
pst: account for possible failure in decoding body and possible "unicode" name for encoding
2021-10-19 09:53:59 +02:00
Jean-Francois Dockes
8a98635c3a
ipynb: format variations
2021-10-10 09:45:37 +02:00
Jean-Francois Dockes
7b81c16ea0
add support for ipython/jupyter notebooks
2021-10-10 08:11:59 +02:00
Jean-Francois Dockes
7179e0dbf8
cmdtalk: remove remains of python2 support
2021-09-23 11:19:36 +02:00
Jean-Francois Dockes
3df83ec982
Zip archives: set the modification date attribute for members
2021-07-30 10:53:43 +02:00
Jean-Francois Dockes
a67dd3f8a3
ost/pst filter: fix not fetching the message dates
2021-07-23 19:12:34 +02:00
Jean-Francois Dockes
174ad9fe22
rcl ocr with tesseract: fix stupid breakage in script
2021-06-13 07:14:51 +01:00
Jean-Francois Dockes
e42a4e9669
Chm: fix catenate mode which was broken a long time ago
2021-05-01 10:29:44 +02:00
Jean-Francois Dockes
3865e1b05f
rclchm: chmcatenate=1 would get the handler to crash
2021-05-01 08:10:34 +02:00
Jean-Francois Dockes
5656d376c7
Windows: djvu: need to convert file name before subprocess check_output
2021-04-30 08:37:19 +01:00
Jean-Francois Dockes
3f23000b89
rclpython: when not previewing, just output the file text, with no processing at all. Avoids spurious newlines
2021-04-14 14:26:11 +02:00
Jean-Francois Dockes
7a54c3a110
rclpython.py: don't try to subscript an exception
2021-03-29 09:52:38 +02:00
Jean-Francois Dockes
a4b3aff5c4
rclaudio: if mutagen.File() fails, try with mutagen.ID3()
This allows extracting the tags e.g. from adts files
mistaken for mp3 during initial identification, and for which
the later full mp3 init fails because of the wrong kind of frame.
2021-03-03 12:53:59 +01:00
Jean-Francois Dockes
31f6793495
rclaudio: catch exception when parsing bad date, set date to the epoch
2021-02-25 19:27:24 +01:00
Jean-Francois Dockes
dc934b7ddc
comment
2021-02-10 14:57:40 +01:00
freddii
89c7efe682
fixed typos
2021-02-04 17:12:22 +01:00
Jean-Francois Dockes
50b64caf5e
rclaudio: process the Group tag
2021-01-27 09:32:55 +01:00
Jean-Francois Dockes
2998486d54
revert wrong change in rclaudio
2021-01-19 19:27:48 +01:00
Jean-Francois Dockes
baf2ee8d6b
don't make date a field alias for dmtime: it does not make sense because the formats differ in general
2021-01-16 19:19:29 +01:00
Jean-Francois Dockes
cb13b8b6df
"print fields" change in rclexecm options had broken -s
2021-01-15 14:06:52 +01:00
Jean-Francois Dockes
72a9548c88
fix warning from rclaudio regexp
2021-01-06 12:01:42 +01:00
Jean-Francois Dockes
e00767d98c
rclexecm test/debug: add option -f to dump fields
2020-12-29 15:04:49 +01:00
Jean-Francois Dockes
ee1e84b2f3
comments
2020-12-25 17:35:08 +01:00
Jean-Francois Dockes
53edd7b213
rcl7z: use py7zr if available, rather than pylzma, which does not work on some archives
2020-12-25 17:34:15 +01:00
Jean-Francois Dockes
824e305bb0
Add option to limit tesseract threads
2020-12-17 11:08:31 +01:00
Jean-Francois Dockes
b2f0e2e657
Add handler for emacs org-mode files
2020-11-30 09:50:44 +01:00
Jean-Francois Dockes
33725fd02c
simplify stdout redirection for pdftk
2020-11-25 17:54:06 +01:00
Jean-Francois Dockes
8b6082a89f
shared
2020-11-09 12:13:30 +01:00
Jean-Francois Dockes
f0abc1df68
pdf: discard pdftk stdout message "Error occurred during initialization of VM", it breaks pdf indexing when it occurs
2020-11-04 14:33:55 +01:00
Jean-Francois Dockes
f50a4e54b1
rclpython: renamed rclpython.py. Use rclexecm. Only colorize for preview, not indexing
2020-11-04 10:32:18 +01:00
Jean-Francois Dockes
e10cb959b3
add test for python program (different handler)
2020-10-18 18:38:44 +02:00
Jean-Francois Dockes
25eda37bc9
Index pdf annotations separately under field name annotation. Add annot, pdfannot and pa aliases.
2020-10-12 10:05:38 +02:00
Jean-Francois Dockes
694d0f155d
pdf annot: guard against possible exception while formatting results
2020-10-10 12:48:18 +02:00
Jean-Francois Dockes
96104e7d67
fix rclocrtesseract fix
2020-09-28 11:05:12 +02:00
Jean-Francois Dockes
8accec9b88
rclocrtesseract: unquote tesseractcmd parameter and check existence.
2020-09-24 07:13:21 +02:00
Jean-Francois Dockes
0dd609cf1a
python filters: replace misc message printing with single method in rclexecm
2020-09-23 18:38:22 +02:00
Jean-Francois Dockes
10bdf2a0c8
comments
2020-09-05 09:19:10 +02:00
Jean-Francois Dockes
d62bb9016a
pdf: try to extract annotation text if the python3 poppler-glib binding is available
2020-09-03 16:16:54 +02:00
Jean-Francois Dockes
2c0fd8502a
PDF: pdftk as snap (ubuntu): print warning about pdf attachments if TMPDIR does not belong to user
2020-08-20 11:27:12 +02:00
Jean-Francois Dockes
b305c86041
recoll-we-move-files: apply expanduser to the webdownloadsdir config value
2020-08-17 11:02:46 +02:00
Jean-Francois Dockes
d932d19562
epub handler: extract the opf metadata subjects fields as dc:subject tags. Share more code between rclepub and the now redundant rclepub1 (no more lynx usage in rclepub)
2020-08-09 09:49:08 +02:00
Jean-Francois Dockes
19fe03af62
Support visio .vsdx format
2020-08-04 10:57:13 +02:00
Jean-Francois Dockes
b2e68740ba
PDF: attachment extraction was broken since python3 (wrong open mode r instead of rb for the extracted file)
2020-07-27 09:03:58 +02:00