Jean-Francois Dockes
b895980e95
PDF: fix the XMP metadata extraction code for python3 and other issues. Also get metadata from XML attributes
2019-06-12 19:21:37 +02:00
Jean-Francois Dockes
e71d7f183f
Python filters: using list append + join instead of string append improves performance hugely for big (book-sized) documents. Impact on a typical pdf mix is moderate though
2019-03-25 11:30:50 +01:00
Jean-Francois Dockes
f482df9707
reset wrong mode change
2019-03-04 11:22:46 +01:00
Jean-Francois Dockes
0cbc46732f
Fixed the FSF address
2019-03-04 11:19:14 +01:00
Jean-Francois Dockes
7ea3936420
Windows: use wide char interfaces
Exchange file names and command line parameters with the system using
wchar_t interfaces: allows preserving values which can be reversibly
transcoded in the current multibyte charset (which can't be UTF-8). Store
all file paths internally in UTF-8
2019-01-25 15:28:24 +01:00
Jean-Francois Dockes
a457b6c68e
rclpdf ocr: fix python3 issue. Add pdfocrlang config variable
2018-07-18 18:05:42 +02:00
Jean-Francois Dockes
52d3bfa54f
Change the shebang line from python2 to python3 for all scripts
2018-06-01 14:55:10 +02:00
Jean-Francois Dockes
0b8988cd64
Fix Windows PDF indexing. The successful test for poppler/pdftotext was not acknowledged and pdf indexing always failed
2018-01-19 13:15:51 +01:00
Jean-Francois Dockes
123d5b36ad
pdf: add and document MetaFixer::wrapup() method
2017-05-17 08:32:23 +02:00
Jean-Francois Dockes
ef9e7a935b
PDF XMP: move field editing code to external script, document
2017-05-17 06:57:52 +02:00
Jean-Francois Dockes
9e046187da
pdf xmp metadata: handle the case where the x:xmpmeta node is omitted and the XML root is rdf:RDF
2017-05-16 03:20:57 +02:00
Jean-Francois Dockes
6f44dce466
pdf: Added field-fixing method for XML metadata
2017-05-15 14:04:55 +02:00
Jean-Francois Dockes
ccc0398155
Handle a unicode conversion issue. Avoid returning None as document for an empty document
2017-05-15 12:35:59 +02:00
Jean-Francois Dockes
d87d410f11
pdf: added capability to extract metadata from XML packet
2017-05-12 10:27:12 +02:00
Jean-Francois Dockes
06e8424048
Changed input handler shebang lines to use explicit python2 instead of python. Can't switch to python3 because of msodump anyway
2017-04-09 04:09:02 +02:00
Jean-Francois Dockes
d6b230043c
Check for newer pdftotext version to avoid double HTML escaping. fixes issue #318
2016-08-05 08:51:34 +02:00
Jean-Francois Dockes
b421f86f72
renamed rclmpdf.py to more normal rclpdf.py
2016-04-11 13:59:07 +02:00