176 Commits

Author SHA1 Message Date
Jean-Francois Dockes
ea2c80f3a8 PPT filter: fix infinite loop in script (happened on invalid files) 2013-11-21 12:59:13 +01:00
Jean-Francois Dockes
064c247499 PPT filter: use mso-dump 2013-11-19 14:42:05 +01:00
Jean-Francois Dockes
aca05b7b2a comments 2013-11-19 14:41:14 +01:00
Jean-Francois Dockes
f078369cbb rclppt: fix absolute paths 2013-11-14 19:20:36 +01:00
Jean-Francois Dockes
9c42bab11b ppt filter: support unoconv 0.4 by using directory as parameter to -o 2013-11-14 19:09:47 +01:00
Jean-Francois Dockes
134153e412 powerpoint: decide to use unoconv based on the number of lines in catppt output 2013-11-12 10:40:07 +01:00
Jean-Francois Dockes
a9358d2f03 Powerpoint docs: add option to have rclppt use unoconv 2013-11-12 09:56:50 +01:00
Jean-Francois Dockes
9d25a0475f have the zip filter access the config if possible and use the zipSkippedNames variable 2013-06-10 14:03:24 +02:00
Jean-Francois Dockes
ea27248837 test driver: no data output by default 2013-06-10 14:01:03 +02:00
Jean-Francois Dockes
2018ef76b8 extract more svg metadata 2013-03-28 08:49:40 +01:00
Jean-Francois Dockes
d3631b5ddf cleaned up processing of metadata from diverse origins (doc,extattrs,localfields) 2013-01-29 14:33:57 +01:00
Jean-Francois Dockes
e24bd240f9 Implement workaround to character encoding issues in chm files and python HTMLParser 2012-12-05 13:24:02 +01:00
Jean-Francois Dockes
e3664ca88b handle filters returning unicode objects 2012-10-23 16:32:52 +02:00
Jean-Francois Dockes
c92cf26316 extract epub metadata into top document 2012-10-23 16:32:20 +02:00
Jean-Francois Dockes
816980a1c4 implemented advanced search history feature 2012-10-16 13:37:56 +02:00
Jean-Francois Dockes
5add2e2384 Arrange so we can now open the parent of a document (e.g. chm file instead of temp copy of html page inside chm), even when the parent is itself embedded in an archive 2012-10-12 16:54:52 +02:00
Jean-Francois Dockes
c7a35a176c none 2012-10-12 13:35:21 +02:00
Jean-Francois Dockes
7fcb7c9bf7 ensure chm file can be renamed 2012-10-12 13:34:56 +02:00
Jean-Francois Dockes
d4edbbaedb rclepub: use elt ids instead of hrefs + debug traces 2012-10-11 15:35:15 +02:00
Jean-Francois Dockes
7c18d74541 add epub viewer and set rclaptg meta tag for chm and info 2012-10-11 14:03:30 +02:00
Jean-Francois Dockes
7037e1ca38 fix 8bit file name processing 2012-10-06 12:00:05 +02:00
Jean-Francois Dockes
ff2e12f149 glitch in maxmemberkb handling 2012-10-06 11:59:48 +02:00
Jean-Francois Dockes
29fe1e4927 implemented maxmemberkb limit for multidoc (e.g. archive) members 2012-10-06 09:05:35 +02:00
Jean-Francois Dockes
5b3cb69ee9 let rcldvi and rclps emit ^L page markers for use with %p and evince 2012-10-04 09:49:03 +02:00
Jean-Francois Dockes
b321b0babb skip very big files (50M) in zip tar and rar extractors 2012-10-04 08:22:33 +02:00
Jean-Francois Dockes
2bb14cc6ff none 2012-10-04 08:21:54 +02:00
Jean-Francois Dockes
0ebfc496d8 add capability to remember page breaks generated by, e.g. pdftotext, and use them to start an external viewer on a match page 2012-08-21 15:03:02 +02:00
Jean-Francois Dockes
df91cff95f rclsoff: modified to correctly handle exported google docs. Also improves handling regular libreoffice files: spaces were eaten around <span> tags 2012-05-28 09:45:08 +02:00
Jean-Francois Dockes
97ad15c42c Added contributed rcltar filter 2012-05-25 17:04:22 +02:00
Jean-Francois Dockes
eeaf564d4e Handle non-standard file name suffixes during decompression. Recoll should now index arbitrary compressed XML formats. Closes issue #93 2012-05-21 11:50:09 +02:00
Jean-Francois Dockes
cbe7fd21cb rclxml 2012-05-19 09:23:24 +02:00
Jean-Francois Dockes
22655319e3 rcldia fix from the author 2012-04-21 20:48:44 +02:00
Jean-Francois Dockes
ae01899962 added contributed dia filter 2012-04-03 17:30:08 +02:00
Jean-Francois Dockes
544e687afe rclchm: add concatenating mode 2012-04-03 17:29:01 +02:00
Jean-Francois Dockes
5f9095b472 Fixed python filter html escaping 2012-04-03 16:46:16 +02:00
Jean-Francois Dockes
8074523a56 rclchm: decode internal urls 2012-03-27 18:51:27 +02:00
Jean-Francois Dockes
fde36ecccc Handle garbled unrtf http-equiv header causing pbs with html5 handler 2012-01-26 19:30:43 +01:00
Jean-Francois Dockes
4c382b00b3 comment 2012-01-23 21:52:46 +01:00
Jean-Francois Dockes
f0a5eb006c okular notes: remove bit of test code 2012-01-23 21:21:11 +01:00
Jean-Francois Dockes
17542969a5 new gnumeric and okular notes filters 2012-01-23 20:25:55 +01:00
Jean-Francois Dockes
dc3aa5d564 stopwords-based charset guessing: use merged dictionary for all words instead of one dictionary per language/charset. Very marginal speed improvement but somewhat cleaner 2012-01-20 14:45:34 +01:00
Jean-Francois Dockes
f9a6be302b karaoke charset guessing: added greek, updated some languages 2012-01-20 14:43:24 +01:00
Jean-Francois Dockes
6d651cf043 karaoke filter/language guesser: use sets to store common words 2012-01-04 16:16:29 +01:00
Jean-Francois Dockes
9aeda04ccb augment the number of test words 10->20, + comments 2012-01-03 21:17:11 +01:00
Jean-Francois Dockes
636b935904 rclchm: use posixpath not path when dealing with internal paths 2011-12-27 17:59:33 +01:00
Jean-Francois Dockes
502f7e783e chm filter: handle files lacking a topics node 2011-12-17 16:41:45 +01:00
Jean-Francois Dockes
5fa720f23d Typo in error-message printing line crashed rclexecm.py 2011-12-17 16:41:16 +01:00
Jean-Francois Dockes
2afc769c38 rclpython: catch exception caused by indentation error in doc 2011-11-28 17:47:02 +01:00
Jean-Francois Dockes
f9f424de42 removed filters replaced by rclaudio/mutagen 2011-11-24 11:59:42 +01:00
Jean-Francois Dockes
ea61e85b8f multi-doc filter: getnext error would cause uncaught exception because of access to uninitialized eof variable 2011-11-04 17:32:14 +01:00