Jean-Francois Dockes
b2f0e2e657
Add handler for emacs org-mode files
2020-11-30 09:50:44 +01:00

Jean-Francois Dockes
33725fd02c
simplify stdout redirection for pdftk
2020-11-25 17:54:06 +01:00

Jean-Francois Dockes
8b6082a89f
shared
2020-11-09 12:13:30 +01:00

Jean-Francois Dockes
f0abc1df68
pdf: discard pdftk stdout message "Error occurred during initialization of VM", it breaks pdf indexing when it occurs
2020-11-04 14:33:55 +01:00

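The f0abc1df68 change amounts to filtering one known line out of pdftk's output before the output is parsed. The line is a Java VM start-up error, and pdftk-java runs on a JVM. The sketch below assumes the output has already been captured as text. The name `strip_pdftk_noise` is hypothetical, and this is not the actual rclpdf code:

```python
# Hypothetical helper, not the actual rclpdf code: remove a known
# spurious JVM start-up line from captured pdftk output.
NOISE = "Error occurred during initialization of VM"


def strip_pdftk_noise(output: str) -> str:
    """Return pdftk output with any line equal to NOISE removed."""
    kept = [line for line in output.splitlines() if line.strip() != NOISE]
    return "\n".join(kept)
```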
Jean-Francois Dockes
f50a4e54b1
rclpython: renamed rclpython.py. Use rclexecm. Only colorize for preview, not indexing
2020-11-04 10:32:18 +01:00

Jean-Francois Dockes
e10cb959b3
add test for python program (different handler)
2020-10-18 18:38:44 +02:00

Jean-Francois Dockes
25eda37bc9
Index pdf annotations separately under field name annotation. Add annot, pdfannot and pa aliases.
2020-10-12 10:05:38 +02:00

Jean-Francois Dockes
694d0f155d
pdf annot: guard against possible exception while formatting results
2020-10-10 12:48:18 +02:00

Jean-Francois Dockes
96104e7d67
fix rclocrtesseract fix
2020-09-28 11:05:12 +02:00

Jean-Francois Dockes
8accec9b88
rclocrtesseract: unquote tesseractcmd parameter and check existence.
2020-09-24 07:13:21 +02:00

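For 8accec9b88, a configured command path may arrive wrapped in quotes, for example a Windows path containing spaces. The sketch below strips one pair of surrounding quotes, checks that the program exists, and falls back to a `PATH` lookup. The name `resolve_tesseract_cmd` is hypothetical. This illustrates the idea and is not rclocrtesseract's actual code:

```python
import os
import shutil


def resolve_tesseract_cmd(value):
    """Unquote a configured command and check that it exists.

    Hypothetical helper. Returns a usable path, or None if the program
    cannot be found.
    """
    cmd = value.strip()
    # Remove one pair of matching surrounding quotes, if present.
    if len(cmd) >= 2 and cmd[0] == cmd[-1] and cmd[0] in ('"', "'"):
        cmd = cmd[1:-1]
    if os.path.isfile(cmd):
        return cmd
    # Bare names such as "tesseract" are looked up on PATH.
    return shutil.which(cmd)
```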
Jean-Francois Dockes
0dd609cf1a
python filters: replace misc message printing with single method in rclexecm
2020-09-23 18:38:22 +02:00

Jean-Francois Dockes
10bdf2a0c8
comments
2020-09-05 09:19:10 +02:00

Jean-Francois Dockes
d62bb9016a
pdf: try to extract annotation text if the python3 poppler-glib binding is available
2020-09-03 16:16:54 +02:00

Jean-Francois Dockes
2c0fd8502a
PDF: pdftk as snap (ubuntu): print warning about pdf attachments if TMPDIR does not belong to user
2020-08-20 11:27:12 +02:00

Jean-Francois Dockes
b305c86041
recoll-we-move-files: apply expanduser to the webdownloadsdir config value
2020-08-17 11:02:46 +02:00

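Applying `expanduser` (b305c86041) lets a `webdownloadsdir` value such as `~/Downloads` resolve against the user's home directory. Below is a minimal sketch using the standard `os.path.expanduser`. The function name is hypothetical:

```python
import os


def downloads_dir(config_value):
    """Hypothetical helper: expand a leading "~" in the configured directory."""
    # Absolute paths and values without "~" are returned unchanged.
    return os.path.expanduser(config_value)
```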
Jean-Francois Dockes
d932d19562
epub handler: extract the opf metadata subjects fields as dc:subject tags. Share more code between rclepub and the now redundant rclepub1 (no more lynx usage in rclepub)
2020-08-09 09:49:08 +02:00

Jean-Francois Dockes
19fe03af62
Support visio .vsdx format
2020-08-04 10:57:13 +02:00

Jean-Francois Dockes
b2e68740ba
PDF: attachment extraction was broken since python3 (wrong open mode r instead of rb for the extracted file)
2020-07-27 09:03:58 +02:00

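The b2e68740ba bug is a common Python 3 pitfall. Mode `"r"` opens a file in text mode, which decodes the content with the locale encoding and translates newlines. Decoding can raise `UnicodeDecodeError` on binary data. Mode `"rb"` returns the bytes untouched. Below is a minimal sketch of the corrected read, with a hypothetical function name:

```python
def read_extracted_attachment(path):
    """Hypothetical helper: read an extracted attachment as raw bytes.

    Binary mode ("rb") performs no decoding or newline translation, so
    PDF attachments of any type come back exactly as written.
    """
    with open(path, "rb") as f:
        return f.read()
```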
Jean-Francois Dockes
b4306b71c0
openxml word: be more specific for extracting text, avoids treating some image parameters as text
2020-07-15 10:49:06 +02:00

Jean-Francois Dockes
4508b6b064
rclpdf: avoid crash when external metadata filter can't be imported
2020-07-13 10:13:59 +02:00

Jean-Francois Dockes
73f2836317
korean splitter: add inactive option to split on white space before calling the tagger
2020-05-19 09:22:16 +02:00

Jean-Francois Dockes
c6dac9347f
cmdtalk: catch param decoding exceptions
2020-05-14 09:23:46 +02:00

Jean-Francois Dockes
dce3bff5d7
comment
2020-04-19 09:19:28 +02:00

Jean-Francois Dockes
c38db0f160
comment
2020-04-18 09:15:45 +02:00

Jean-Francois Dockes
b63cc1b712
Korean splitter script: use python-mecab-ko if possible, else konlpy
2020-04-10 14:27:06 +02:00

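The "use X if possible, else Y" choice in b63cc1b712 is typically done with an import fallback. Below is a sketch using `importlib.import_module`. The import names `mecab` (for python-mecab-ko) and `konlpy.tag` are assumptions, and the helper name is hypothetical:

```python
import importlib


def first_available(module_names):
    """Hypothetical helper: return the first module that imports, else None."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Not installed (ModuleNotFoundError is a subclass): try the next one.
            continue
    return None

# Possible use, with assumed import names:
# tagger_module = first_available(["mecab", "konlpy.tag"])
```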
Jean-Francois Dockes
e8194dea9d
comment
2020-04-08 09:51:37 +02:00

Jean-Francois Dockes
d3de1f0d6f
add common execPythonScript method to rclexecm
2020-04-07 10:09:09 +02:00

Jean-Francois Dockes
32ebd65ba8
Windows: small changes for porting back from msvc to mingw
2020-04-07 09:40:00 +02:00

Jean-Francois Dockes
a88c0114b1
python filters: htmlescape need not be an RclExecM member
2020-03-27 17:19:40 +01:00

Jean-Francois Dockes
90dd64fc61
Have RclExecM inherit the shared CmdTalk now that the latter is used anyway for the korean splitter. Main diff: cmdtalk strips the colon from param names and does not lowercase them
2020-03-27 11:07:51 +01:00

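The 90dd64fc61 message describes a change in how parameter names are keyed. The sketch below assumes each parameter arrives as a header line `Name: <byte count>` followed by the data. The `legacy` flag reproduces what the message implies the older RclExecM did: keep the colon and lowercase the name. The function is hypothetical and is not the CmdTalk source:

```python
def parse_param_header(line, legacy=False):
    """Hypothetical helper: split a 'Name: size' header into (name, size).

    Default (CmdTalk-style, per the commit message): the colon is
    stripped and the name's case is preserved.
    legacy=True: keep the colon and lowercase the name.
    """
    colon = line.index(":")
    size = int(line[colon + 1:].strip())
    if legacy:
        name = line[:colon + 1].lower()
    else:
        name = line[:colon]
    return name, size
```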
Jean-Francois Dockes
1afc606718
textsplit: break on it.error() not only it.eof(). Seems to make a difference in rare cases? Add Komoran support but this one often fails
2020-03-26 09:31:19 +01:00

Jean-Francois Dockes
207bfec93e
korean splitter: restart the python/java splitter from time to time because it leaks memory
2020-03-24 11:27:10 +01:00

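Restarting a leaky helper from time to time (207bfec93e) can be sketched as a wrapper that replaces its worker after a fixed number of requests. The `factory`/`process`/`close` interface and the `max_calls` threshold are assumptions made for illustration. In the commit, the component being restarted is the python/java splitter process:

```python
class RestartingWorker:
    """Hypothetical wrapper: replace a leaky worker every max_calls requests.

    factory: callable returning a fresh worker that has process(text)
    and close() methods (assumed interface).
    """

    def __init__(self, factory, max_calls=1000):
        self.factory = factory
        self.max_calls = max_calls
        self.worker = None
        self.calls = 0

    def process(self, text):
        # Start the first worker, or recycle the current one once it has
        # served max_calls requests, releasing whatever memory it leaked.
        if self.worker is None or self.calls >= self.max_calls:
            if self.worker is not None:
                self.worker.close()
            self.worker = self.factory()
            self.calls = 0
        self.calls += 1
        return self.worker.process(text)
```

Bounding the number of calls per worker caps how much leaked memory can accumulate. The cost is a periodic start-up delay.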
Jean-Francois Dockes
9719177c82
Korean external splitter: add some support for Mecab
2020-03-23 16:20:32 +01:00

Jean-Francois Dockes
c9667b5ba7
Korean text: sort-of-working version, in need of validation
2020-03-22 15:49:24 +01:00

Jean-Francois Dockes
384e3a1087
korean textsplit with external help from konlpy, first step
2020-03-22 10:09:50 +01:00

Jean-Francois Dockes
03cbc203e1
Hanword: use the html converter, the text one drops data from tables
2020-03-21 10:16:16 +01:00

Jean-Francois Dockes
2cbd9ad79c
Added handler for Hancom .hwp format
2020-03-10 14:38:52 +01:00

Jean-Francois Dockes
0f6b5911d5
rclpython: only python3 now
2020-03-03 18:54:41 +01:00

Jean-Francois Dockes
8c816f50cf
doc
2020-03-03 18:53:31 +01:00

Jean-Francois Dockes
fe86fa9e1f
ocr: compat: make a non-existent ocrprogs config variable equivalent to "tesseract"
2020-02-28 14:38:02 +01:00

Jean-Francois Dockes
1fb9421163
OCR: small adjustments for Windows
2020-02-28 09:22:03 +01:00

Jean-Francois Dockes
8560467e4a
pdf/ocr scripts: no need to look for rclocr if pdfocr is not set. comments.
2020-02-27 18:16:28 +01:00

Jean-Francois Dockes
e520176a2a
OCR: small adjustments for Windows. Works with Tesseract.
2020-02-27 14:10:55 +01:00

Jean-Francois Dockes
abb7ef8803
added ocr module for abbyy
2020-02-27 11:35:23 +01:00

Jean-Francois Dockes
7bc70a30ae
ocrcache: implemented purge functions/script
2020-02-27 09:25:52 +01:00

Jean-Francois Dockes
747e37a980
rclocr ckpt: cache+tesseract indexing working
2020-02-26 17:30:12 +01:00

Jean-Francois Dockes
38dfa5f841
1st version of the cached ocr mechanism
2020-02-15 21:19:13 +01:00

Jean-Francois Dockes
e7e37b9233
openxml: extract more metadata fields (e.g. description, keywords)
2020-01-30 08:38:30 +01:00

Jean-Francois Dockes
a1122c4e8a
Fix format string used to generate/scan circache headers.
Use _ not . as prefix for webqueue metadata files
Fix log messages and indent
2019-11-24 15:02:30 +01:00

Jean-Francois Dockes
83e29a9b01
Windows: enable the firefox recent history indexer.
2019-11-24 10:46:23 +01:00