diff --git a/src/doc/user/usermanual.html b/src/doc/user/usermanual.html index dea2e342..0347adf7 100644 --- a/src/doc/user/usermanual.html +++ b/src/doc/user/usermanual.html @@ -6920,96 +6920,94 @@ recollindex -c "$confdir" -
Index queries do not provide document content (only
- a partial and unprecise reconstruction is performed to
- show the snippets text). In order to access the actual
- document data, the data extraction part of the indexing
- process must be performed (subdocument access and
- format translation). This is not trivial in the case of
- embedded documents. The rclextract module provides a single
- class which can be used to access the data content for
- result documents.
Prior to Recoll
+ 1.25, index queries could not provide document content
+ because it was never stored. More recent versions usually
+ store the document text, which can optionally be
+ retrieved when running a query (see query.execute() above;
+ the result is always plain text).
The rclextract module
+ can give access to the original document and to the
+ document text content (useful if the text is not stored by the index,
+ or to obtain an HTML version of it). Accessing the
+ original document is particularly useful if it is
+ embedded (e.g. an email attachment).
You need to import the recoll module before the rclextract module.
An Extractor
- object is built from a Doc object, output from a
- query.
Extract document defined by ipath and
- return a Doc
- object. The doc.text field has the
- document text converted to either text/plain
- or text/html according to doc.mimetype. The typical
- use would be as follows:
++++
-- Extractor(doc)
+- +
+An
+Extractor
+ object is built from a Doc object, output from a
+ query.
- Extractor.textextract(ipath)
+- +
-Extract document defined by
+ipath and
+ return a Doc
+ object. The doc.text field has the
+ document text converted to either text/plain or
+ text/html according to doc.mimetype. The typical use
+ would be as follows:
+from recoll import recoll, rclextract
+
 qdoc = query.fetchone()
 extractor = rclextract.Extractor(qdoc)
 doc = extractor.textextract(qdoc.ipath)
 # use doc.text, e.g. for previewing
-Passing
-qdoc.ipath to textextract() is redundant,
- but reflects the fact that the Extractor object actually
- has the capability to access the other
- entries in a compound document.
- Extractor.idoctofile(ipath, targetmtype,
- outfile='')
-- -
+Extracts document into an output file,
- which can be given explicitly or will be
- created as a temporary file to be deleted by
- the caller. Typical use:
-+Passing
+qdoc.ipath to textextract() is redundant,
+ but reflects the fact that the Extractor object actually has
+ the capability to access the other entries in a
+ compound document.
- Extractor.idoctofile(ipath, targetmtype,
+ outfile='')
+- +
-Extracts document into an output file, which
+ can be given explicitly or will be created as a
+ temporary file to be deleted by the caller.
+ Typical use:
+from recoll import recoll, rclextract
+
 qdoc = query.fetchone()
 extractor = rclextract.Extractor(qdoc)
 filename = extractor.idoctofile(qdoc.ipath, qdoc.mimetype)
-In all cases the output is a copy, even if
- the requested document is a regular system
- file, which may be wasteful in some cases. If
- you want to avoid this, you can test for a
- simple file document as follows:
-
+In all cases the output is a copy, even if
+ the requested document is a regular system
+ file, which may be wasteful in some cases. If
+ you want to avoid this, you can test for a
+ simple file document as follows:
+not doc.ipath and ("rclbes" not in doc.keys() or doc["rclbes"] == "FS")
-
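The new text distinguishes document text stored in the index (Recoll 1.25 and later, retrieved at query time) from text produced on demand by `rclextract`. A minimal sketch of combining the two follows. `get_document_text` is a hypothetical helper, not part of the Recoll API. It assumes that `doc.text` is empty when the index did not store the text or the query did not fetch it. The extractor factory is a parameter so the function can be exercised without an index; in real use you would pass `rclextract.Extractor`.

```python
def get_document_text(qdoc, make_extractor):
    """Return (text, mimetype) for a query result document.

    Hypothetical helper, not part of the Recoll API. Assumes qdoc.text is
    empty (or absent) when the index did not store the text or the query
    did not fetch it.
    """
    stored = getattr(qdoc, "text", "")
    if stored:
        # Stored text returned by the query is always plain text.
        return stored, "text/plain"
    # Otherwise run the extraction (subdocument access and format translation).
    doc = make_extractor(qdoc).textextract(qdoc.ipath)
    # Assumption: the returned doc.mimetype tells whether doc.text is
    # text/plain or text/html.
    return doc.text, doc.mimetype
```

The fallback may return HTML rather than plain text, because `textextract()` converts to either text/plain or text/html. Returning the MIME type alongside the text lets the caller handle both cases.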
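When called without an `outfile` argument, `Extractor.idoctofile()` leaves a temporary file that the caller must delete. A context manager is one way to make that cleanup automatic. `extracted_file` below is a hypothetical helper, not part of the Recoll API. It relies only on the `idoctofile(ipath, targetmtype)` call and the returned file name shown in the manual text above.

```python
import contextlib
import os


@contextlib.contextmanager
def extracted_file(extractor, qdoc):
    """Yield the path of a temporary copy of qdoc and delete it afterwards.

    Hypothetical helper, not part of the Recoll API. Relies on idoctofile()
    creating a temporary file (to be deleted by the caller) when no outfile
    argument is given.
    """
    # Keep the document's own MIME type, as in the manual's example.
    path = extractor.idoctofile(qdoc.ipath, qdoc.mimetype)
    try:
        yield path
    finally:
        # The output is always a copy, so removing it never touches the
        # original document. Ignore the error if the caller already deleted it.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(path)
```

In real use, `with extracted_file(rclextract.Extractor(qdoc), qdoc) as path:` yields a path that remains valid only inside the `with` block.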
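You can wrap the simple-file test at the end of the new text so that a copy is made only when needed. In this sketch, `is_simple_file_doc` and `document_path` are hypothetical helper names. The manual does not state that a file-system document's `doc.url` is `file://` followed by the unescaped path; that is an assumption, so check it against your Recoll version before relying on it.

```python
FILE_PREFIX = "file://"


def is_simple_file_doc(doc):
    """Return True if doc is a plain file-system document.

    Same test as in the manual: no ipath (not embedded in another
    document) and either no "rclbes" backend field or the "FS" backend.
    """
    return not doc.ipath and ("rclbes" not in doc.keys() or doc["rclbes"] == "FS")


def document_path(doc, make_extractor):
    """Return (path, is_temporary) for doc, avoiding a copy when possible.

    Hypothetical helper. Assumes doc.url is "file://" followed by the
    unescaped file-system path for simple file documents.
    """
    if is_simple_file_doc(doc) and doc.url.startswith(FILE_PREFIX):
        # Use the original file directly: no copy, nothing to delete.
        return doc.url[len(FILE_PREFIX):], False
    # Embedded or non-FS document: let the extractor write a temporary
    # copy, which the caller is responsible for deleting.
    path = make_extractor(doc).idoctofile(doc.ipath, doc.mimetype)
    return path, True
```

The boolean in the returned tuple tells the caller whether the path is a temporary copy that it must delete.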