138 Commits

Author SHA1 Message Date
Jean-Francois Dockes
abc45bc156 internfile: transfer metadata from the last extracted (file-like) stage to the final document 2018-11-30 11:55:30 +01:00
Jean-Francois Dockes
23141307f7 internfile:collectIpathAndMT: simplify a bit 2018-11-14 15:09:45 +01:00
Jean-Francois Dockes
d69d2abbde TempFile: clean-up interface by using internal ref-counted class member. Uncomp: add interface to clear cache 2018-05-17 10:24:01 +02:00
Jean-Francois Dockes
29c6f75423 make sure that python rclextract.idoctofile always retrieves an uncompressed file of the correct MIME type. + misc comments 2017-07-20 12:52:24 +02:00
Jean-Francois Dockes
9f02bc8119 prettified LOG lines 2017-07-19 19:15:29 +02:00
Jean-Francois Dockes
19a4b2a287 Do not filter out text/html when it results from a conversion, even if excluded by indexedmimetypes/excludedmimetypes 2017-06-08 10:09:05 +02:00
Jean-Francois Dockes
bde991c08a got rid of off_t 2017-02-28 20:36:01 +01:00
Jean-Francois Dockes
2594b71ae8 log 2017-01-16 11:14:54 +01:00
Jean-Francois Dockes
d80531fa62 Fix mimetype filtering (indexedmimetypes/excludedmimetypes) not working for embedded documents 2017-01-13 09:18:18 +01:00
Jean-Francois Dockes
9ce6530e7b execm filters: the change to let filters set arbitrary metadata lost the top doc size, now saved aside 2016-08-12 18:00:52 +02:00
Jean-Francois Dockes
f6a999de84 logging now uses c++ streams 2016-07-12 09:41:04 +02:00
Jean-Francois Dockes
1aea57fcb2 defined data access interface for external indexers 2016-06-01 09:46:47 +02:00
Jean-Francois Dockes
ff15f8fb1c Centralize stat calls to ensure consistency of time fields on windows 2016-01-08 11:23:10 +01:00
Jean-Francois Dockes
f70c92c629 rcldb::getSubDocs() (called from GUI show subdocs) was returning too many results because the parent/child ipath test was flawed 2015-11-03 08:40:13 +01:00
Jean-Francois Dockes
3b18facc16 Fixed some "unused xxx" warnings + include autoconfig 2015-10-07 08:30:49 +02:00
Jean-Francois Dockes
1cbf02f713 Suppressed many integer size warnings by a mix of type adjustments and casts, none of which should have a real effect. --HG-- branch : WINDOWSPORT 2015-09-01 19:39:20 +02:00
Jean-Francois Dockes
14c8e740d6 Windows: fixed a number of int size warnings mostly by casting them away --HG-- branch : WINDOWSPORT 2015-08-30 17:30:31 +02:00
Jean-Francois Dockes
d4cd1dd91c 1st mods to get a build under windows. Does not build yet, far from it --HG-- branch : WINDOWSPORT 2015-08-30 11:19:18 +02:00
Jean-Francois Dockes
c6e228b7c6 Prepared windows port by removing a number of spurious reference to unix-specific interfaces, and using some xapian posix adaptor includes 2015-08-19 14:41:10 +02:00
Jean-Francois Dockes
4d1f679eac Use std[::tr1]::shared_ptr instead of local RefCntr by default 2015-08-09 13:54:24 +02:00
Jean-Francois Dockes
0840daf20e Avoid replacing (instead of concatenating) the current author field value with the internal one when the document is a top-level one. This allows metadata from metadatacmds to be used 2015-08-06 08:08:36 +02:00
Jean-Francois Dockes
4d35cbabfb Also index non-html files from the web queue and fix the Open operation for them 2015-07-24 16:30:13 +02:00
Jean-Francois Dockes
d630cbbaec Delete RCL_USE_XATTR configure/compile time variable, it was not useful. Add configuration variable to use mtime instead of ctime for update detection. Useful on a system where xattrs would be modified but not indexed, to avoid excessive reindexing. 2014-12-09 11:15:17 +01:00
Jean-Francois Dockes
4ac34cb134 Off by one error in maximum embedding depth test caused overflow of FileInterner m_tmpflgs temp flags array and possibly bus error depending on arch (only seen on 32 bits arch) 2014-05-15 15:15:01 +02:00
Jean-Francois Dockes
9487a0cffa Code for reaping xattrs and cmd metadata did not need to be implemented as internfile members and can be used in other contexts 2013-10-03 09:38:35 +02:00
Jean-Francois Dockes
ebe9b44a2c fix metadatacmds multifield modif, didnt set anything at all... 2013-09-27 13:04:05 +02:00
Jean-Francois Dockes
3fbcbc8c2b allow multiple field output from metadatacmds entry beginning with rclmulti. Add noxattrfields config variable to allow disabling extended attributes usage 2013-09-27 12:07:32 +02:00
medoc
641acd3d68 move the execution of external metadata-gathering commands from fsindexer to internfile for consistency of handling with filter-generated metadata 2013-09-06 11:51:00 +02:00
Jean-Francois Dockes
243ac82526 missing return statement... 2013-05-26 15:25:16 +02:00
Jean-Francois Dockes
a1b7018cfd Fix problems which occurred when using functions like open-parents with multiple indexes containing identical paths (udis) 2013-05-25 11:26:57 +02:00
Jean-Francois Dockes
167c8a4286 fix minor issues in multisave and popup menus 2013-04-28 16:58:05 +02:00
Jean-Francois Dockes
a7728ceb91 changed the mime handler cache key (was the mime type), to avoid having multiple copies of the same filter when applied to different mime types. This reduces a lot the number of processes during indexing, with no impact on performance 2013-04-25 18:18:48 +02:00
Jean-Francois Dockes
2b80c77c23 Add possibility to display a list of sub-documents for a given result 2013-04-24 16:33:53 +02:00
Jean-Francois Dockes
3c80e51940 simplified temp file handling for compressed documents and, for querying, implemented caching for last file uncompressed 2013-03-06 18:52:57 +01:00
Jean-Francois Dockes
50135e3428 process extended attributes by default 2013-02-19 16:12:24 +01:00
Jean-Francois Dockes
d3631b5ddf cleaned up processing of metadata from diverse origins (doc,extattrs,localfields) 2013-01-29 14:33:57 +01:00
Jean-Francois Dockes
d2f7f11715 Use dynamic lib for shared recoll code 2012-12-29 14:27:01 +01:00
Jean-Francois Dockes
2d5c2a8058 split the iDocToFile method into static and member parts for use from python module 2012-12-20 11:15:10 +01:00
Jean-Francois Dockes
5fc8f240fe from 1.18 branch: Adjust things for using the new Firefox plugin: remove visible Beagle references + fix 1.18 web queue indexing bugs 2012-11-01 11:30:39 +01:00
Jean-Francois Dockes
ee7d0f2ee7 1st parallel multithreaded version of indexing which can do my home without crashing... Let's checkpoint 2012-11-01 11:19:48 +01:00
Jean-Francois Dockes
b8963db4b1 cleaned up the missing helper storage class 2012-10-28 16:43:19 +01:00
Jean-Francois Dockes
95ef518ec7 the missing filter detection code was broken 2012-10-23 19:40:51 +02:00
Jean-Francois Dockes
5add2e2384 Arrange so we can now open the parent of a document (e.g. chm file instead of temp copy of html page inside chm), even when the parent is itself embedded in an archive 2012-10-12 16:54:52 +02:00
Jean-Francois Dockes
8e1ed842d2 message 2012-10-09 14:52:32 +02:00
Jean-Francois Dockes
1329265b7b check for empty file name in internfile, else gets stuck later because empty fn is interpreted as read stdin in md5 2012-10-05 16:42:13 +02:00
Jean-Francois Dockes
2870274f80 slightly simplified temp file handling 2012-08-21 08:35:39 +02:00
Jean-Francois Dockes
643f4d56bb internals: virtualized the doc fetcher interface 2012-06-05 07:16:11 +02:00
Jean-Francois Dockes
8b34610dde Cleaned up file name handling. Fixes that file names were sometimes indexed split, sometimes not. They now always are both, with different prefixes. Forces reindex 2012-04-13 09:18:08 +02:00
Jean-Francois Dockes
ec7b40a52e cosmetics: list -> vector in more places 2012-04-11 19:58:08 +02:00
Jean-Francois Dockes
78bd8d63da use vector instead of list for execmd arg list 2012-04-11 15:36:49 +02:00