2015-01-19 16:57:03 +01:00 · 2015-01-19 16:57:03 +01:00 · d6acbdfd9e
commit d6acbdfd9e
parent 4a987b708e
1 changed files with 93 additions and 56 deletions
--- a/src/doc/user/usermanual.xml
+++ b/src/doc/user/usermanual.xml
@@ -50,18 +50,23 @@
    <sect1 id="RCL.INTRODUCTION.TRYIT">
      <title>Giving it a try</title>
-      <para>If you do not like reading manuals (who does?) and would like
+      <para>If you do not like reading manuals (who does?) but 
-        to give &RCL; a try, just <link
+        wish to give &RCL; a try, just <link
-        linkend="RCL.INSTALL.BINARY">install</link> the application and
+        linkend="RCL.INSTALL.BINARY">install</link> the application
-        start the <command>recoll</command> graphical user interface (GUI),
+        and start the <command>recoll</command> graphical user
-        which will ask to index your home directory by default, allowing
+        interface (GUI), which will ask permission to index your home
-        you to search immediately after indexing completes.</para>
+        directory by default, allowing you to search immediately after
        indexing completes.</para>
      <para>Do not do this if your home directory contains a huge
        number of documents and you do not want to wait or are very
        short on disk space. In this case, you may first want to customize
        the <link linkend="RCL.INDEXING.CONFIG">configuration</link>
-        to restrict the indexed area.</para> 
+        to restrict the indexed area (a shortcut for the impatient, once the package is installed: in the <command>recoll</command> GUI, choose <menuchoice>
 	    <guimenu>Preferences</guimenu>
 	    <guimenuitem>Indexing configuration</guimenuitem>
          </menuchoice>, then adjust the <guilabel>Top
          directories</guilabel> section).</para>
      <para>Also be aware that you may need to install the
        appropriate <link linkend="RCL.INSTALL.EXTERNAL"> supporting
@@ -74,12 +79,12 @@
      <title>Full text search</title>
      <para>&RCL; is a full text search application. Full text search
-        applications let you find your data by content rather
+        finds your data by content rather than by external attributes
-        than by external attributes (like a file name). More
+        (like a file name). You specify words
-        specifically, they will let you specify words (terms) that
+        (terms) which should or should not appear in the text you are
-        should or should not appear in the text you are looking for,
+        looking for, and receive in return a list of matching
-        and return a list of matching documents, ordered so that the
+        documents, ordered so that the most
-        most <emphasis>relevant</emphasis> documents will appear
+        <emphasis>relevant</emphasis> documents will appear
        first.</para>
      <para>You do not need to remember in what file or email message you
@@ -88,27 +93,30 @@
        these terms are prominent, in a similar way to Internet search
        engines.</para>
-      <para>A search application tries to determine which documents are
+      <para>Full text search applications try to determine which
-        most relevant to the search terms you provide. Computer algorithms
+        documents are most relevant to the search terms you
-        for determining relevance can be very complex, and in general are
+        provide. Computer algorithms for determining relevance can be
-        inferior to the power of the human mind to rapidly determine
+        very complex, and in general are inferior to the power of the
-        relevance. The quality of relevance guessing is probably the most
+        human mind to rapidly determine relevance. The quality of
-        important aspect when evaluating a search application.</para>
+        relevance guessing is probably the most important aspect when
        evaluating a search application.</para>
-      <para>In many cases, you are looking for all the forms of a
+        <para>In many cases, you are looking for all the forms of a
-        word, not for a specific form or spelling. These different forms
+        word, including plurals, different tenses for a verb, or terms
-        may include plurals, different tenses for a verb, or terms derived
+        derived from the same root or <emphasis>stem</emphasis>
-        from the same root or <emphasis>stem</emphasis> (example: floor,
+        (example: <replaceable>floor, floors, floored,
-        floors, floored, flooring...). Search applications usually expand
+        flooring...</replaceable>). Queries are usually automatically
-        queries to all such related terms (words that reduce to the same
+        expanded to all such related terms (words that reduce to the
-        stem) and also provide a way to disable this expansion if you are
+        same stem). This expansion can be disabled when searching for a
-        actually searching for a specific form.</para>
+        specific form.</para>
      <para>Stemming, by itself, does not accommodate misspellings or
        phonetic searches. &RCL; supports these features through a specific
        tool (the <literal>term explorer</literal>) which will let you
        explore the set of index terms along different modes.</para>
        <para>Stemming, by itself, does not accommodate misspellings
        or phonetic searches. A full text search application may also
        support this form of approximation. For example, a search for
        <replaceable>aliterattion</replaceable> returning no result may
        propose, depending on index contents, <replaceable>alliteration
        alteration alterations altercation</replaceable> as possible
        replacement terms. </para>
    </sect1>
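The two mechanisms described in this hunk, stem expansion and spelling approximation, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `toy_stem` is a hypothetical suffix stripper that is far cruder than the Xapian stemmers, and `difflib` stands in for the aspell-based suggestions. None of the names below come from Recoll.

```python
import difflib
from collections import defaultdict

# Hypothetical toy rule set; real stemmers use much richer rules.
SUFFIXES = ("ing", "ed", "s")


def toy_stem(word):
    """Strip one common English suffix (stand-in for a real stemmer)."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_stem_map(index_terms):
    """Group the index terms by their toy stem."""
    stem_map = defaultdict(set)
    for term in index_terms:
        stem_map[toy_stem(term)].add(term)
    return stem_map


def expand(term, stem_map):
    """Expand a query term to every index term sharing its stem."""
    return sorted(stem_map.get(toy_stem(term), {term}))


def suggest(term, index_terms, n=4):
    """Propose index terms that are spelled similarly to the query term."""
    return difflib.get_close_matches(term, index_terms, n=n, cutoff=0.7)


INDEX_TERMS = [
    "floor", "floors", "floored", "flooring",
    "alliteration", "alteration", "alterations", "altercation",
]
STEM_MAP = build_stem_map(INDEX_TERMS)
```

With this toy index, `expand("floors", STEM_MAP)` returns all four forms of the word, and `suggest("aliterattion", INDEX_TERMS)` includes `alliteration` among its proposals.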
@@ -120,14 +128,25 @@
      library as its storage and retrieval engine. &XAP; is a very
      mature package using <ulink
      url="http://www.xapian.org/docs/intro_ir.html">a sophisticated
-      probabilistic ranking model</ulink>. &RCL; provides the mechanisms
+      probabilistic ranking model</ulink>.</para>
      and interface to get data into and out of the system.</para>
-      <para>In practice, &XAP; works by remembering where terms appear
+      <para>The &XAP; library manages an index database which
-      in your document files. The acquisition process is called
+      describes where terms appear in your document files. It
-      indexing. </para> 
+      efficiently processes the complex queries which are produced by
      the &RCL; query expansion mechanism, and is in charge of the
      all-important relevance computation task.</para>
-      <para>The resulting index can be big (roughly the size of the
+      <para>&RCL; provides the mechanisms and interface to get data
      into and out of the index. This includes translating the many
      possible document formats into pure text, handling term
      variations (using &XAP; stemmers), and spelling approximations
      (using the <application>aspell</application> speller),
      interpreting user queries and presenting results.</para>
      <para>In short, &RCL; does the dirty footwork, while &XAP;
      deals with the intelligent parts of the process.</para>
      <para>The &XAP; index can be big (roughly the size of the
        original document set), but it is not a document
        archive. &RCL; can only display documents that still exist at
        the place from which they were indexed. (Actually, there is a
@@ -136,9 +155,12 @@
        punctuation and capitalization are lost).</para>
      <para>&RCL; stores all internal data in <application>Unicode
-      UTF-8</application> format, and it can index files with
+      UTF-8</application> format, and it can index files of many types
-      different character sets, encodings, and languages into the same
+      with different character sets, encodings, and languages into the
-      index. It has can process many document types.</para>
+      same index. It can process documents embedded inside other
      documents (for example a PDF document stored inside a Zip
      archive sent as an email attachment...), down to an arbitrary
      depth.</para>
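Processing documents embedded inside other documents "down to an arbitrary depth" is naturally expressed as recursion. The sketch below is a minimal illustration, assuming Zip is the only container type. The names `walk_nested` and `make_zip` are hypothetical and this is not how the Recoll input handlers are actually implemented.

```python
import io
import zipfile


def walk_nested(data, name="<root>", depth=0):
    """Yield (path, depth, payload) for each leaf inside nested Zip archives."""
    buf = io.BytesIO(data)
    if zipfile.is_zipfile(buf):
        with zipfile.ZipFile(buf) as archive:
            for member in archive.namelist():
                if member.endswith("/"):
                    continue  # skip directory entries
                # Recurse: a member may itself be a container.
                yield from walk_nested(
                    archive.read(member), f"{name}/{member}", depth + 1
                )
    else:
        # Not a container: this is a leaf document to convert to text.
        yield name, depth, data


def make_zip(members):
    """Build an in-memory Zip archive from a {name: bytes} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        for member_name, payload in members.items():
            archive.writestr(member_name, payload)
    return buf.getvalue()
```

A real indexer would dispatch on the detected document type at each level (email, archive, PDF, and so on) instead of testing for Zip only, but the recursive shape stays the same.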
      <para>Stemming is the process by which &RCL; reduces words to
        their radicals so that searching does not depend, for example, on a
@@ -206,9 +228,12 @@
      <para>The <link linkend="RCL.INDEXING.PERIODIC.EXEC">indexing
          process</link> is started automatically the first time you
-        execute the <command>recoll</command> GUI. Indexing can also be
+        execute the <command>recoll</command> GUI. Indexing can also
-        performed by executing the <command>recollindex</command>
+        be performed by executing the <command>recollindex</command>
-        command.</para>
+        command. &RCL; indexing is multithreaded by default when
        appropriate hardware resources are available, and can run
        text extraction, segmentation and index updates in
        parallel.</para>
      <para><link linkend="RCL.SEARCH">Searches</link> are usually
        performed inside the <command>recoll</command> GUI, which has many
@@ -220,7 +245,10 @@
          <application>Python</application>
          programming interface</link>, a <link linkend="RCL.SEARCH.KIO">
          <application>KDE</application> KIO slave module</link>, and
-        a <ulink url="&WIKI;UnityLens">Ubuntu Unity Lens</ulink> module.
+        Ubuntu Unity <ulink url="https://bitbucket.org/medoc/unity-lens-recoll">
        Lens</ulink> (for older versions) or 
        <ulink url="https://bitbucket.org/medoc/unity-scope-recoll">
          Scope</ulink> (for current versions) modules.
        </para>
    </sect1>
@@ -236,11 +264,11 @@
      <para>Indexing is the process by which the set of documents is
 	analyzed and the data entered into the database. &RCL;
 	indexing is normally incremental: documents will only be
-	processed if they have been modified. On the first execution,
+	processed if they have been modified since the last run. On
-	all documents will need processing. A full index build can be
+	the first execution, all documents will need processing. A
-	forced later by specifying an option to the indexing command
+	full index build can be forced later by specifying an option
-	(<command>recollindex</command> <option>-z</option>
+	to the indexing command (<command>recollindex</command>
-	or <option>-Z</option>).</para> 
+	<option>-z</option> or <option>-Z</option>).</para>
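The incremental behaviour described in this paragraph (only reprocess what changed since the last run, with an option to force a full rebuild) can be sketched as follows. This is a minimal illustration, assuming a change is detected by comparing a stored (modification time, size) signature. The manual text here does not say how Recoll detects changes, and `file_signature`, `incremental_pass` and the `force` flag are hypothetical names.

```python
import os


def file_signature(path):
    """Return a cheap change signature: (mtime in ns, size in bytes)."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size)


def incremental_pass(paths, seen, force=False):
    """Return the paths needing processing and update `seen` in place.

    `seen` maps each path to the signature recorded on the previous run.
    With force=True the recorded state is discarded, so every file is
    processed again, loosely mirroring a forced full index build.
    """
    if force:
        seen.clear()
    to_process = []
    for path in paths:
        signature = file_signature(path)
        if seen.get(path) != signature:
            to_process.append(path)
            seen[path] = signature
    return to_process
```

On a first call with an empty `seen` dictionary every path is returned, which matches the statement that all documents need processing on the first execution.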
      <para>The following sections give an overview of different
 	aspects of the indexing processes and configuration, with links
@@ -1853,6 +1881,11 @@ MimeType=*/*
      term is not known. For example, you may not remember the exact
      spelling, or only know the beginning of the name.</para>
      <para>The search will only propose replacement terms with
      spelling variations when no matching documents were found. In some
      cases, both proper spellings and misspellings are present in the
      index, and it may be interesting to look for them explicitly.</para>
      <para>The term explorer tool (started from the toolbar icon or
      from the <guilabel>Term explorer</guilabel> entry of the
      <guilabel>Tools</guilabel> menu) can be used to search the full index
@@ -4636,9 +4669,11 @@ except:
        <listitem><para>Openoffice files need <command>unzip</command> and
        <command>xsltproc</command>.</para></listitem>
-        <listitem><para>PDF files need <command>pdftotext</command> which
+        <listitem><para>PDF files need <command>pdftotext</command>
-        is part of the <application>Xpdf</application> or
+        which is part of <application>Poppler</application> (usually
-        <application>Poppler</application> packages.</para></listitem>
+        comes with the <literal>poppler-utils</literal>
        package). Avoid the original <command>pdftotext</command> from
        <application>Xpdf</application>.</para></listitem>
        <listitem><para>Postscript files need <command>pstotext</command>. 
            The original version has an issue with shell
@@ -4663,9 +4698,11 @@ except:
        <application>libwpd-tools</application> on Ubuntu)
        package.</para></listitem>
-        <listitem><para>RTF files need <command>unrtf</command>, which, in
+        <listitem><para>RTF files need <command>unrtf</command>,
-        its standard version, has much trouble with non-western character
+        which, in its older versions, has much trouble with
-        sets. Check  &RCLAPPS;.</para></listitem>
+        non-western character sets. Many Linux distributions carry
        outdated <command>unrtf</command> versions. Check
        &RCLAPPS; for details.</para></listitem>
        <listitem><para>TeX files need <command>untex</command> or
        <command>detex</command>. Check &RCLAPPS; for sources if it's not