210 Commits

Author SHA1 Message Date
Jean-Francois Dockes
24c77d2984 more filter conversion to python: svg and xml. Get rid of rclnull
--HG--
branch : WINDOWSPORT
2015-09-14 09:51:11 +02:00
Jean-Francois Dockes
36b36f2c69 rcltext.py
--HG--
branch : WINDOWSPORT
2015-09-13 10:34:40 +02:00
Jean-Francois Dockes
42401c8f26 windows: rclrtf.py and rcldoc.py apparently working ok
--HG--
branch : WINDOWSPORT
2015-09-12 16:53:24 +02:00
Jean-Francois Dockes
118982d25e cleanup in new python filters
--HG--
branch : WINDOWSPORT
2015-09-12 10:54:26 +02:00
Jean-Francois Dockes
330c7fc30d Python filters beginning to work, still issues.
--HG--
branch : WINDOWSPORT
2015-09-11 16:16:16 +02:00
Jean-Francois Dockes
bd58ffb920 open xml python + xslt filter
--HG--
branch : WINDOWSPORT
2015-09-10 17:39:49 +02:00
Jean-Francois Dockes
8794932158 converted/duplicated rclsoff to rclsoff.py, using python-libxslt/xml
--HG--
branch : WINDOWSPORT
2015-09-07 15:34:39 +02:00
Jean-Francois Dockes
e40cf64e66 New python-based msword filter + basic arch to convert the others
--HG--
branch : WINDOWSPORT
2015-09-07 11:16:20 +02:00
Jean-Francois Dockes
f00ed2ba5a actually postprocess
--HG--
branch : WINDOWSPORT
2015-09-07 09:23:07 +02:00
Jean-Francois Dockes
16f495a9c0 temp ckpt
--HG--
branch : WINDOWSPORT
2015-09-06 19:55:43 +02:00
Jean-Francois Dockes
766a34a8db fix flac mime types in rclaudio + small changes for experimenting with embedding an interpreter in recollindex 2015-08-23 09:29:26 +02:00
Jean-Francois Dockes
83939e45ab import sys 2015-08-09 13:37:30 +02:00
Jean-Francois Dockes
6a6552ee43 exit with meaningful status 2015-07-31 11:24:56 +02:00
Jean-Francois Dockes
922a9384f9 rclpdf: work with newer poppler versions, which do escape HTML text inside <head> 2015-06-30 10:35:22 +02:00
Jean-Francois Dockes
eaddefa7c5 Add capability to run tesseract from rclpdf. Disabled by default, see comments at the top of rclpdf 2015-04-24 18:13:52 +02:00
Jean-Francois Dockes
1e6f56522e Let recollindex execute a script at startup to try and guess if it should retry failed files 2015-04-24 10:46:58 +02:00
Jean-Francois Dockes
fb83946183 Contributed rclscribus fixes, thanks to Morten 2015-04-20 09:16:37 +02:00
Jean-Francois Dockes
47b1d77c5d guard against spaces in filenames inside rclokulnote and rcldoc filters 2015-04-17 13:12:01 +02:00
Francois Botha
d80db8c09f Implement filter for .7z files. Based on rclzip and rcltar 2015-04-06 09:57:00 +02:00
Jean-Francois Dockes
fbb2c257a5 python2->python in script headers 2015-02-27 18:43:27 +01:00
Jean-Francois Dockes
cf33d7531c Make xls-dump.py errors less noisy, hopefully avoiding system reports on Fedora 2014-12-18 15:35:42 +01:00
Jean-Francois Dockes
02874255d8 rclmpdf ok? 2014-10-29 11:57:44 +01:00
Jean-Francois Dockes
86bc0e9104 dquot -> quot! 2014-10-29 11:57:18 +01:00
Jean-Francois Dockes
293468bd58 new pdf filter which can process attachments 2014-10-29 08:20:03 +01:00
Jean-Francois Dockes
7837558909 rclpurple: fix for current log format 2014-10-01 11:37:20 +02:00
Jean-Francois Dockes
552eb0965b rclpdf: also escape text inside meta content attributes 2014-08-25 14:16:45 +02:00
Jean-Francois Dockes
729be49a1b Improved error message, closes issue #207 2014-07-14 08:30:41 +02:00
Jean-Francois Dockes
958a8f6abb zip: improved error output. Fixes issue #201 2014-07-06 16:32:41 +02:00
Jean-Francois Dockes
cada24896f ppt-dump: improve error messages 2014-07-06 16:27:40 +02:00
Jean-Francois Dockes
25271db690 msword docs: avoid generating an error for files containing only a picture (empty antiword output) 2014-07-06 16:24:11 +02:00
Jean-Francois Dockes
62c2ff3d4c OpenOffice filter: do produce white space for tab input! 2014-06-24 08:13:32 +02:00
Jean-Francois Dockes
27f77addd6 rcltar: clean up import statements 2014-06-07 11:45:25 +02:00
Jean-Francois Dockes
28a4e4d8a8 catch ppt-dump errors to avoid bogus system reports 2014-05-06 11:39:27 +02:00
Jean-Francois Dockes
45b845769c Replace catdoc with mso-dumper for XLS too 2014-01-09 17:44:05 +01:00
Jean-Francois Dockes
ea2c80f3a8 PPT filter: fix infinite loop in script (happened on invalid files) 2013-11-21 12:59:13 +01:00
Jean-Francois Dockes
064c247499 PPT filter: use mso-dump 2013-11-19 14:42:05 +01:00
Jean-Francois Dockes
aca05b7b2a comments 2013-11-19 14:41:14 +01:00
Jean-Francois Dockes
f078369cbb rclppt: fix absolute paths 2013-11-14 19:20:36 +01:00
Jean-Francois Dockes
9c42bab11b ppt filter: support unoconv 0.4 by using directory as parameter to -o 2013-11-14 19:09:47 +01:00
Jean-Francois Dockes
134153e412 powerpoint: decide to use unoconv based on the number of lines in catppt output 2013-11-12 10:40:07 +01:00
Jean-Francois Dockes
a9358d2f03 Powerpoint docs: add option to have rclppt use unoconv 2013-11-12 09:56:50 +01:00
Jean-Francois Dockes
9d25a0475f have the zip filter access the config if possible and use the zipSkippedNames variable 2013-06-10 14:03:24 +02:00
Jean-Francois Dockes
ea27248837 test driver: no data output by default 2013-06-10 14:01:03 +02:00
Jean-Francois Dockes
2018ef76b8 extract more svg metadata 2013-03-28 08:49:40 +01:00
Jean-Francois Dockes
d3631b5ddf cleaned up processing of metadata from diverse origins (doc,extattrs,localfields) 2013-01-29 14:33:57 +01:00
Jean-Francois Dockes
e24bd240f9 Implement workaround to character encoding issues in chm files and python HTMLParser 2012-12-05 13:24:02 +01:00
Jean-Francois Dockes
e3664ca88b handle filters returning unicode objects 2012-10-23 16:32:52 +02:00
Jean-Francois Dockes
c92cf26316 extract epub metadata into top document 2012-10-23 16:32:20 +02:00
Jean-Francois Dockes
816980a1c4 implemented advanced search history feature 2012-10-16 13:37:56 +02:00
Jean-Francois Dockes
5add2e2384 Arrange so we can now open the parent of a document (e.g. chm file instead of temp copy of html page inside chm), even when the parent is itself embedded in an archive 2012-10-12 16:54:52 +02:00