265 Commits

Author SHA1 Message Date
Jean-Francois Dockes
7346105dcb rclaudio: properly process unicode tags 2017-12-03 19:01:50 +01:00
Jean-Francois Dockes
bbb30d3351 rclaudio: properly parse mp4 trkn = (x,y) 2017-12-03 17:57:37 +01:00
Jean-Francois Dockes
5afe1aa631 Add and interface a script to move the files generated by the WebExtensions new browser extension into the web input queue 2017-11-24 15:30:27 +01:00
Jean-Francois Dockes
cd44aa33e1 added adaptor script for new browser plugin 2017-11-24 11:10:45 +01:00
Jean-Francois Dockes
f8ce677e65 rclimg: remove perl option -w 2017-07-10 22:51:29 +02:00
Jean-Francois Dockes
d5732e6a74 allow perl to not be /usr/bin/perl 2017-06-30 15:25:52 +02:00
Jean-Francois Dockes
123d5b36ad pdf: add and document MetaFixer::wrapup() method 2017-05-17 08:32:23 +02:00
Jean-Francois Dockes
ef9e7a935b PDF XMP: move field editing code to external script, document 2017-05-17 06:57:52 +02:00
Jean-Francois Dockes
9e046187da pdf xmp metadata: handle the case where the x:xmpmeta node is omitted and the XML root is rdf:RDF 2017-05-16 03:20:57 +02:00
Jean-Francois Dockes
6f44dce466 pdf: Added field-fixing method for Xml metadata 2017-05-15 14:04:55 +02:00
Jean-Francois Dockes
ccc0398155 Handle a unicode conversion issue. Avoid returning None as document for an empty document 2017-05-15 12:35:59 +02:00
Jean-Francois Dockes
d87d410f11 pdf: added capability to extract metadata from XML packet 2017-05-12 10:27:12 +02:00
Jean-Francois Dockes
d6a1f2a7f4 rclaudio: process additional tags 2017-04-25 10:16:48 +02:00
Jean-Francois Dockes
06e8424048 Changed input handler shebang lines to use explicit python2 instead of python. Can't switch to python3 because of msodump anyway 2017-04-09 04:09:02 +02:00
Jean-Francois Dockes
3e141cb2d5 support odf flat xml formats 2017-03-07 18:29:31 +01:00
Jean-Francois Dockes
4de12c11b7 odf file metadata was not properly processed 2017-03-07 18:28:23 +01:00
Jean-Francois Dockes
d35c2a557a Process a few non-standard tag names found in the wild + check for embedded images 2017-02-27 17:15:15 +01:00
Jean-Francois Dockes
d891488687 Get rid of using the "Easy" wrapper and process the original tags instead 2017-02-19 12:38:18 +01:00
Jean-Francois Dockes
28bf7ff93c rclaudio: let mutagen create the right object type. Extract more fields. Use the setfield() method instead of html meta tags. Needs the recent increase in max field count in mh_execm 2017-02-02 18:05:35 +01:00
Jean-Francois Dockes
7567025ad3 added "all in one" rclepub1 filter (no individual indexing of chapters) 2016-12-05 15:19:02 +01:00
Jean-Francois Dockes
d6b230043c Check for newer pdftotext version to avoid double HTML escaping. fixes issue #318 2016-08-05 08:51:34 +02:00
Jean-Francois Dockes
b9e672abda Allow execm input handlers to set arbitrary data fields 2016-07-11 18:13:39 +02:00
Jean-Francois Dockes
236900ee2a comments 2016-05-23 19:16:31 +02:00
Jean-Francois Dockes
b2bd67cee8 added bogus minimum sample execm handler, indexing text lines as docs 2016-05-23 18:59:00 +02:00
Jean-Francois Dockes
b421f86f72 renamed rclmpdf.py to more normal rclpdf.py 2016-04-11 13:59:07 +02:00
Jean-Francois Dockes
4830e35a1b pdf: add config variables to control if we attempt attachment extraction and ocr 2016-04-11 13:57:58 +02:00
Jean-Francois Dockes
74088bdada doc 2016-04-09 20:01:48 +02:00
Jean-Francois Dockes
b995cfb4e8 added module for simplified interface to libxmp 2016-04-08 11:37:23 +02:00
Jean-Francois Dockes
031cdf9761 converted rcldjvu to python 2016-04-08 10:24:52 +02:00
Jean-Francois Dockes
95bd49b420 Restore PDF OCR capability from shell version of rclpdf script 2016-04-08 09:00:23 +02:00
Jean-Francois Dockes
92bb5bfc43 xls filter: catch HTML files disguised as XLS 2016-02-26 09:35:23 +01:00
Jean-Francois Dockes
b4c1fd033a effect-less typo 2016-02-26 08:45:07 +01:00
Jean-Francois Dockes
d115bcfaa2 rclmpdf.py: p2/3 compat 2015-11-21 12:46:58 +01:00
Jean-Francois Dockes
5776c4bc20 rclinfo: remove trace message 2015-11-21 12:46:28 +01:00
Jean-Francois Dockes
953144d131 Make sure to execute python2 scripts with python2 2015-11-16 15:18:59 +01:00
Jean-Francois Dockes
683a258d4d more python3 tweaks 2015-11-16 13:19:44 +01:00
Jean-Francois Dockes
452e5c1c59 comments 2015-11-16 09:26:19 +01:00
Jean-Francois Dockes
585f651919 Use os.devnull instead of /dev/null 2015-11-15 16:04:55 +01:00
Jean-Francois Dockes
2e78f573de more py3 fixups 2015-11-07 17:19:40 +01:00
Jean-Francois Dockes
dfe00ab11f more filters made compatible with python3 2015-11-07 16:59:17 +01:00
Jean-Francois Dockes
f344e8fedd first pass at converting the filters for python 2/3 compat 2015-11-06 16:49:03 +01:00
Jean-Francois Dockes
d416acf1c0 use $HOME instead of ~ 2015-10-27 07:38:05 +01:00
Jean-Francois Dockes
8324f09d19 Get uncompression to work and fix a few other issues 2015-10-13 16:48:16 +02:00
Jean-Francois Dockes
a02a611694 let filter 'which' find a command in a specified subdir of PATH elements 2015-10-13 10:00:48 +02:00
Jean-Francois Dockes
4c3e112c27 Use the python-based filters written for ms-win on Linux too 2015-10-11 08:41:15 +02:00
Jean-Francois Dockes
0e6d921f9a added image tag filter based on pyexiv2 2015-10-10 18:40:04 +02:00
Jean-Francois Dockes
1e3ce6c36f Pure mingw build ok 2015-10-08 15:32:01 +02:00
Jean-Francois Dockes
453ed8748a Windows: manage timeouts, time and size limits 2015-10-08 14:08:36 +02:00
Jean-Francois Dockes
374f775092 Added possibly incomplete rcluncomp.py script for windows 2015-10-01 18:23:30 +02:00
Jean-Francois Dockes
a411d4c964 Windows: small fixes for rclmpdf.py to work with alivate poppler 2015-10-01 16:36:29 +02:00