diff --git a/src/doc/user/usermanual.html b/src/doc/user/usermanual.html index dea2e342..0347adf7 100644 --- a/src/doc/user/usermanual.html +++ b/src/doc/user/usermanual.html @@ -6920,96 +6920,94 @@ recollindex -c "$confdir" -

Index queries do not provide document content (only - a partial and unprecise reconstruction is performed to - show the snippets text). In order to access the actual - document data, the data extraction part of the indexing - process must be performed (subdocument access and - format translation). This is not trivial in the case of - embedded documents. The rclextract module provides a single - class which can be used to access the data content for - result documents.

+

Prior to Recoll + 1.25, index queries did not provide document content + because it was not stored. More recent versions usually + store the document text, which can optionally be + retrieved when running a query (see query.execute() above; the result is + always plain text).

+

The rclextract module + gives access to the original document and to the + document text content (useful if the text is not stored by the index, or + to obtain an HTML version of the text). Accessing the + original document is particularly useful if it is + embedded (e.g. an email attachment).

+

You need to import the recoll module before the rclextract module.

Classes
+ "RCL.PROGRAM.PYTHONAPI.RCLEXTRACT.CLASSES.EXTRACTOR" + id= + "RCL.PROGRAM.PYTHONAPI.RCLEXTRACT.CLASSES.EXTRACTOR"> + The Extractor class
-
-
-
-
-
- The Extractor class
-
-
-
-
-
-
Extractor(doc)
-
-

An Extractor - object is built from a Doc object, output from a - query.

-
-
Extractor.textextract(ipath)
-
-

Extract document defined by ipath and - return a Doc - object. The doc.text field has the - document text converted to either text/plain - or text/html according to doc.mimetype. The typical - use would be as follows:

-
+              
+
+
Extractor(doc)
+
+

An Extractor + object is built from a Doc object, output from a + query.

+
+
Extractor.textextract(ipath)
+
+

Extract document defined by ipath and + return a Doc + object. The doc.text field has the + document text converted to either text/plain or + text/html according to doc.mimetype. The typical use + would be as follows:

+
+from recoll import recoll, rclextract
+
 qdoc = query.fetchone()
 extractor = rclextract.Extractor(qdoc)
 doc = extractor.textextract(qdoc.ipath)
 # use doc.text, e.g. for previewing
-

Passing qdoc.ipath to textextract() is redundant, - but reflects the fact that the Extractor object actually - has the capability to access the other - entries in a compound document.

-
-
Extractor.idoctofile(ipath, targetmtype, - outfile='')
-
-

Extracts document into an output file, - which can be given explicitly or will be - created as a temporary file to be deleted by - the caller. Typical use:

-
+                    

Passing qdoc.ipath to textextract() is redundant, + but reflects the fact that the Extractor object actually has + the capability to access the other entries in a + compound document.
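Because doc.text may hold either text/plain or text/html depending on doc.mimetype, a previewer that wants plain text has to handle both cases. The following is a minimal sketch using only the Python standard library. `preview_text` is a hypothetical helper, not part of rclextract, and it relies only on the `text` and `mimetype` attributes described above.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects character data, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True: entities are decoded
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def preview_text(doc):
    """Return plain text for a Doc produced by Extractor.textextract().

    Hypothetical helper: doc.text is text/plain or text/html according
    to doc.mimetype, so HTML is reduced to its text content here.
    """
    if doc.mimetype == "text/html":
        parser = _TextCollector()
        parser.feed(doc.text)
        parser.close()
        # Collapse the runs of whitespace left behind by the markup.
        return " ".join("".join(parser.parts).split())
    return doc.text
```

With the recoll module installed, this would be called on the result of `extractor.textextract(qdoc.ipath)`. Using `html.parser` keeps the sketch dependency-free; a real previewer may prefer to display the HTML directly.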

+
+
Extractor.idoctofile(ipath, targetmtype, + outfile='')
+
+

Extracts document into an output file, which + can be given explicitly or will be created as a + temporary file to be deleted by the caller. + Typical use:

+
+from recoll import recoll, rclextract
+
 qdoc = query.fetchone()
 extractor = rclextract.Extractor(qdoc)
 filename = extractor.idoctofile(qdoc.ipath, qdoc.mimetype)
-

In all cases the output is a copy, even if - the requested document is a regular system - file, which may be wasteful in some cases. If - you want to avoid this, you can test for a - simple file document as follows:

-
+                    

In all cases the output is a copy, even if + the requested document is a regular system + file, which may be wasteful in some cases. If + you want to avoid this, you can test for a + simple file document as follows:

+
 not doc.ipath and (not "rclbes" in doc.keys() or doc["rclbes"] == "FS")
 
-
-
-
+
+
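The two cases described above (use a simple filesystem document in place, otherwise let idoctofile() create a temporary copy that the caller deletes) can be combined in a small helper. This is a sketch, not part of the Recoll API: `is_simple_file` and `process_document_file` are hypothetical names, and the code assumes that `doc.url` holds a `file://` URL for filesystem documents.

```python
import os


def is_simple_file(doc):
    """True if doc is a plain filesystem file, not an embedded document.

    Same test as in the manual: no ipath, and either no backend
    indication or the filesystem ("FS") backend.
    """
    return not doc.ipath and ("rclbes" not in doc.keys() or doc["rclbes"] == "FS")


def process_document_file(extractor, doc, process):
    """Call process(path) on a file holding the data for doc, return its result.

    Simple filesystem documents are used in place, avoiding a copy.
    Other documents are extracted with idoctofile(), which creates a
    temporary file that the caller is responsible for deleting.
    """
    if is_simple_file(doc):
        # Assumption: doc.url is a "file://" URL for filesystem documents.
        prefix = "file://"
        url = doc.url
        return process(url[len(prefix):] if url.startswith(prefix) else url)
    filename = extractor.idoctofile(doc.ipath, doc.mimetype)
    try:
        return process(filename)
    finally:
        os.unlink(filename)  # the temporary copy belongs to the caller
```

A call could look like `process_document_file(rclextract.Extractor(qdoc), qdoc, handler)`, where `handler` is your own function taking a file path. The `try`/`finally` ensures the temporary copy is removed even if `handler` raises.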
diff --git a/src/doc/user/usermanual.xml b/src/doc/user/usermanual.xml index 63dfe560..64d8fe8f 100644 --- a/src/doc/user/usermanual.xml +++ b/src/doc/user/usermanual.xml @@ -5349,40 +5349,45 @@ recollindex -c "$confdir" The rclextract module - Index queries do not provide document content (only a - partial and unprecise reconstruction is performed to show the - snippets text). In order to access the actual document data, the - data extraction part of the indexing process must be performed - (subdocument access and format translation). This is not trivial - in the case of embedded documents. The - rclextract module provides a single class - which can be used to access the data content for result - documents. + + Prior to &RCL; 1.25, index queries did not provide document + content because it was not stored. More recent versions usually + store the document text, which can optionally be retrieved when + running a query (see query.execute() + above; the result is always plain text). - - Classes - - - The Extractor class + The rclextract module gives access to + the original document and to the document text content (useful if the text is not + stored by the index, or to obtain an HTML version of the text). + Accessing the original document is particularly useful if it is + embedded (e.g. an email attachment). - + You need to import the recoll module + before the rclextract module. + + + The Extractor class - - Extractor(doc) - An Extractor object is - built from a Doc object, output - from a query. - - - Extractor.textextract(ipath) - Extract document defined by - ipath and return a - Doc object. + + + + Extractor(doc) + An Extractor object is + built from a Doc object, output + from a query. + + + Extractor.textextract(ipath) + Extract document defined by + ipath and return a + Doc object.
The + doc.text field has the document text + converted to either text/plain or text/html according to + doc.mimetype. The typical use would be + as follows: +from recoll import recoll, rclextract + qdoc = query.fetchone() extractor = rclextract.Extractor(qdoc) doc = extractor.textextract(qdoc.ipath) @@ -5401,6 +5406,8 @@ doc = extractor.textextract(qdoc.ipath) temporary file to be deleted by the caller. Typical use: +from recoll import recoll, rclextract + qdoc = query.fetchone() extractor = rclextract.Extractor(qdoc) filename = extractor.idoctofile(qdoc.ipath, qdoc.mimetype) @@ -5417,8 +5424,7 @@ not doc.ipath and (not "rclbes" in doc.keys() or doc["rclbes"] == "FS") - - + diff --git a/src/doc/user/webhelp/Makefile b/src/doc/user/webhelp/Makefile index 9e2d535f..bef829f9 100644 --- a/src/doc/user/webhelp/Makefile +++ b/src/doc/user/webhelp/Makefile @@ -1,6 +1,6 @@ # Configuration # The name of the source DocBook xml file -INPUT_XML = ../usermanual.xml ../recoll.conf.xml +INPUT_XML = ../usermanual.xml # The makefile assumes that you have a # directory named images that contains