Use own code to parse rfc822 dates, strptime() can't do it

2006-09-15 16:50:44 +00:00 · 2006-09-15 16:50:44 +00:00 · cfe1dd5d9f
commit cfe1dd5d9f
parent de7a312051
5 changed files with 559 additions and 188 deletions
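
Note on the commit title: the part of the diff reproduced here covers the version file and the user manual, not the date-parsing code itself. As a rough illustration of what hand-parsing an RFC 822 date involves, here is a minimal sketch in Python (the project itself is C++). Everything in it is an assumption made for illustration: the function name parse_rfc822_date, the regular expression, the two-digit-year pivot, and the choice to treat unknown zones as UTC. It is not the code from this commit. Plausible reasons for avoiding strptime() are that its month and day names depend on the locale, and that the day of week, the seconds and the zone notation vary between messages; the commit itself does not say.

```python
import calendar
import re

# English month abbreviations are fixed by RFC 822, independent of locale.
MONTHS = {name: i + 1 for i, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split())}

# Named zones listed in RFC 822, as offsets from UTC in minutes.
ZONES = {"UT": 0, "GMT": 0, "EST": -300, "EDT": -240, "CST": -360,
         "CDT": -300, "MST": -420, "MDT": -360, "PST": -480, "PDT": -420}

DATE_RE = re.compile(
    r"^\s*(?:[A-Za-z]{3}\s*,\s*)?"                    # optional day of week
    r"(\d{1,2})\s+([A-Za-z]{3})\s+(\d{4}|\d{2})\s+"   # day, month, year
    r"(\d{1,2}):(\d{2})(?::(\d{2}))?"                 # hh:mm, optional :ss
    r"(?:\s+([+-]\d{4}|[A-Za-z]+))?\s*$"              # optional zone
)


def parse_rfc822_date(text):
    """Return seconds since the epoch (UTC), or None if text does not parse."""
    m = DATE_RE.match(text)
    if m is None:
        return None
    day, mon, year, hour, minute, sec, zone = m.groups()
    month = MONTHS.get(mon.capitalize())
    if month is None:
        return None
    year = int(year)
    if year < 100:
        # Assumed pivot for two-digit years: 70-99 -> 19xx, 00-69 -> 20xx.
        year += 1900 if year >= 70 else 2000
    offset = 0  # minutes east of UTC; missing or unknown zones count as UTC
    if zone:
        if zone[0] in "+-":
            sign = -1 if zone[0] == "-" else 1
            offset = sign * (int(zone[1:3]) * 60 + int(zone[3:5]))
        else:
            offset = ZONES.get(zone.upper(), 0)
    # timegm() interprets the tuple as UTC, so no local-time conversion occurs.
    local = calendar.timegm((year, month, int(day), int(hour),
                             int(minute), int(sec or 0), 0, 0, 0))
    return local - offset * 60
```

With this sketch, "Fri, 15 Sep 2006 16:50:44 +0000" and "Fri, 15 Sep 2006 12:50:44 EDT" parse to the same epoch value, and a string that does not match the pattern yields None.
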
--- a/src/VERSION
+++ b/src/VERSION
@@ -1 +1 @@
-1.4.4
+1.5.0
--- a/src/doc/user/usermanual.sgml
+++ b/src/doc/user/usermanual.sgml
@@ -24,7 +24,7 @@
      Dockes</holder>
    </copyright>

-    <releaseinfo>$Id: usermanual.sgml,v 1.16 2006-09-11 14:22:15 dockes Exp $</releaseinfo>
+    <releaseinfo>$Id: usermanual.sgml,v 1.17 2006-09-15 16:50:44 dockes Exp $</releaseinfo>

    <abstract>
      <para>This document introduces full text search notions
@@ -42,18 +42,18 @@

      <para>If you do not like reading manuals (who does?) and would
      like to give &RCL; a try, just perform <link
-      linkend="rcl.install">installation</link> and start the
+      linkend="rcl.install.binary">installation</link> and start the
      <command>recoll</command> user interface, which will index your
      home directory by default, allowing you to search immediately after
      indexing completes.</para>

-      <para>Do not do this if your home has a huge
+      <para>Do not do this if your home directory contains a huge
      number of documents and you do not want to wait or are very
      short on disk space. In this case, you may want to edit the <link
      linkend="rcl.indexing.config">configuration file</link> first to
      restrict the indexed area.</para>

-      <para>Also be aware that you will need to install the
+      <para>Also be aware that you may need to install the
      appropriate <link linkend="rcl.install.external">
      supporting applications</link> for document types that need
      them (for example <application>antiword</application> for
@@ -186,7 +186,7 @@
      <link linkend="rcl.indexing.automat">programmed</link> into your
      <command>cron</command> file.</para>

-      <sidebar><para>Side note: there is nothing in &RCL; and &XAP;
+      <sidebar><para>There is nothing in &RCL; and &XAP;
      that would prevent interfacing with a real time file
      modification monitor, but this would tend to consume significant
      system resources for dubious gain, because you rarely need a
@@ -196,7 +196,6 @@
      the manual page.</para>
      </sidebar>

-
      <para>&RCL; knows about quite a few different document
      types. The parameters for document types recognition and
      processing are set in 
@@ -209,14 +208,23 @@
      <para>&RCL; indexing processes plain text, HTML, openoffice
      and e-mail files internally. Other types (ie: postscript, pdf,
      ms-word, rtf) need external applications for preprocessing. The
-      list is in the <link
-      linkend="rcl.install.building.prereqs">installation</link>
-      section.</para>
+      list is in the <link linkend="rcl.install.external">
+      installation</link> section.</para>

      <para>Without further configuration, &RCL; will index all
      appropriate files from your home directory, with a reasonable
      set of defaults.</para>

+      <para>In some cases, it may be interesting to index different
+      areas of the file system to separate databases. You can do this
+      by using multiple configuration directories, each indexing a
+      file system area to a specific database. You would use the
+      <literal>RECOLL_CONFDIR</literal> environment variable or the
+      <literal>-c</literal> <replaceable>confdir</replaceable> option
+      to <command>recollindex</command> to indicate which
+      configuration to process. The <command>recoll</command> search
+      program can use any selection of the existing databases for each
+      search; this is configurable inside the user interface.</para>
    </sect1>

    <sect1 id="rcl.indexing.storage">
@@ -227,7 +235,7 @@
      be changed by setting the <literal>RECOLL_CONFDIR</literal>
      environment variable, or by specifying the
      <literal>dbdir</literal> parameter in the configuration file
-      (see the <link linkend="rcl.install.config">configuration
+      (see the <link linkend="rcl.install.config.recollconf">configuration
      section</link>).</para>

      <para>The size of the index is determined by the size of the set
@@ -245,8 +253,9 @@
      (2006), that even a big index will be negligible against the
      total amount of data on the computer.</para>
      
-      <para>The index data directory only contains data that will be
-      rebuilt by an index run, so that it can be destroyed safely.</para>
+      <para>The index data directory (<filename>xapiandb</filename>)
+      only contains data that will be rebuilt by an index run, and it 
+      can always be destroyed safely.</para>

      <sect2 id="rcl.indexing.storage.security">
 	<title>Security aspects</title>
@@ -258,13 +267,13 @@

 	<para>As of version 1.4, &RCL; will create the configuration
 	directory with a mode of 0700 (access by owner only). As the
-	index directory is by default a subdirectory of the
+	index data directory is by default a subdirectory of the
 	configuration directory, this should result in appropriate
-	protection. </para> 
+	protection.</para> 

 	<para>If you use another setup, you should think of the kind
 	of protection you need for your index, and set the directory
-	access modes appropriately.</para>
+	and file access modes appropriately.</para>

      </sect2>

@@ -306,21 +315,25 @@
      <para>Indexing is performed either by the
        <command>recollindex</command> program, or by the
        indexing thread inside the <command>recoll</command>
-        program (use the <guimenu>File</guimenu> menu). 
+        program (use the <guimenu>File</guimenu> menu). Both programs
+        will make use of the <literal>RECOLL_CONFDIR</literal>
+        variable or accept a <literal>-c</literal>
+        <replaceable>confdir</replaceable> option to specify the
+        configuration directory to be used.</para>

      <para>If the <command>recoll</command> program finds no index
-      when it starts, it will automatically start indexing (except
-      if cancelled).</para>
+       when it starts, it will automatically start indexing (except
+       if cancelled).</para>

      <para>It is best to avoid interrupting the indexing process, as
-        this may sometimes leave the database in a bad state.  This is
+        this may sometimes leave the index in a bad state.  This is
        not a serious problem, as you then just need to clear
        everything and restart the indexing: the index files are
        normally stored in the <filename>$HOME/.recoll/xapiandb</filename>
-        directory, 
-        which you can just delete if needed. Alternatively, you can
-        start <command>recollindex -z</command>, which will
-        reset the database before indexing.</para>
+        directory, which you can just delete if needed. Alternatively,
+        you can start <command>recollindex</command> with option
+        <literal>-z</literal>, which will reset the database before
+        indexing.</para> 

    </sect1>

@@ -380,46 +393,153 @@
        (<literal>*</literal>, <literal>?</literal> ,
        <literal>[]</literal>). </para>

+      <para>You can search for exact phrases (adjacent words in a
+      given order) by enclosing the input inside double quotes. Ex:
+     <literal>"virtual reality"</literal>.</para>
+      <para>Character case has no influence on search, except that you
+      can disable stem expansion for any term by capitalizing it. Ie:
+      a search for <literal>floor</literal> will also normally look for 
+      <literal>flooring</literal>, <literal>floored</literal>, etc., but
+      a search for <literal>Floor</literal> will only look for
+      <literal>floor</literal>, in any character case (stemming can
+      also be disabled globally in the preferences). </para>
+
      <para>&RCL; remembers the last few searches that you
-      performed. You can use the simple search text entry widget (a
-      combobox) to recall them (click on the thing at the right of the
-      text field). Please note, however, that only the search texts
-      are remembered, not the mode (all/any/filename).</para>
+        performed. You can use the simple search text entry widget (a
+        combobox) to recall them (click on the thing at the right of the
+        text field). Please note, however, that only the search texts
+        are remembered, not the mode (all/any/filename).</para>
+
+      <para>Hitting <keycap>^Tab</keycap> (<keycap>Ctrl</keycap> +
+        <keycap>Tab</keycap>) while entering a word in the 
+        simple search entry will open a window with possible completions
+        for the word. The completions are extracted from the
+        database.</para>
+
+      <para>Double-clicking on a word in the result list or a preview
+      window will insert it into the simple search entry field.</para>

      <para>You can use the <guilabel>Tools</guilabel> / <guilabel>Advanced
        search</guilabel> dialog for more complex searches.</para>
+    </sect1>
+
+    <sect1 id="rcl.search.reslist">
+      <title>The result list</title>

      <para>After starting a search, a list of results will instantly
-      be displayed in the main list window. Clicking on the
-      <literal>Preview</literal> link for an entry will open an
-      internal preview window for the document. Clicking the
-      <literal>Edit</literal> link will attempt to start an external
-      viewer (have a look at the <filename>mimeconf</filename>
-      configuration file to see how these are configured).</para>
+       be displayed in the main list window.</para> 

      <para>By default, the document list is presented in order of
-      relevance (how well the system estimates that the document
-      matches the query). You can specify a different ordering by
-      using the  <link linkend="rcl.search.sort"><guilabel>Tools</guilabel>
+       relevance (how well the system estimates that the document
+       matches the query). You can specify a different ordering by
+       using the  <link linkend="rcl.search.sort"><guilabel>Tools</guilabel>
        / <guilabel>Sort parameters</guilabel></link> dialog.</para>

+      <para>Clicking on the
+       <literal>Preview</literal> link for an entry will open an
+       internal preview window for the document. Clicking the
+       <literal>Edit</literal> link will attempt to start an external
+       viewer (have a look at the <filename>mimeconf</filename>
+       configuration file to see how these are configured).</para>
+
      <para>The <literal>Preview</literal> and <literal>Edit</literal>
-      edit links may not be present for all entries, meaning that
-      &RCL; has no configured way to preview a given file type (which
-      was indexed by name only), or no configured external viewer for
-      the file type. This can sometimes be adjusted simply by tweaking
-      the <link linkend="rclinstall.config.mimemap">
+       links may not be present for all entries, meaning that
+       &RCL; has no configured way to preview a given file type (which
+       was indexed by name only), or no configured external viewer for
+       the file type. This can sometimes be adjusted simply by tweaking
+       the <link linkend="rclinstall.config.mimemap">
             <filename>mimemap</filename></link> and  
-      <link linkend="rclinstall.config.mimeconf">
+       <link linkend="rclinstall.config.mimeconf">
       <filename>mimeconf</filename></link> configuration files.</para> 

      <para>You can click on the <literal>Query details</literal> link
-      at the top of the results page to see the query actually 
-      performed, after stem expansion and other processing.</para>
+        at the top of the results page to see the query actually 
+        performed, after stem expansion and other processing.</para>
+
+      <para>Double-clicking on any word inside the result list or a
+      preview window will insert it into the simple search text.</para>
+
+      <para>The result list is divided into pages (the size of which
+       you can change in the preferences). Use the arrow buttons in the
+       toolbar or the links at the bottom of the page to browse the
+       results.</para>
+
+
+      <sect2 id="rcl.search.resultlist.menu">
+	<title>The result list right-click menu</title>
+
+	<para>Apart from the preview and edit links, you can display a
+          popup menu by right-clicking over a paragraph in the result
+         list. This menu has the following entries:</para>
+
+	<itemizedlist>
+	  <listitem><para><guilabel>Preview</guilabel></para></listitem>
+	  <listitem><para><guilabel>Edit</guilabel></para></listitem>
+	  <listitem><para><guilabel>Copy File Name</guilabel></para></listitem>
+	  <listitem><para><guilabel>Copy Url</guilabel></para></listitem>
+	  <listitem><para><guilabel>Find similar</guilabel></para></listitem>
+	</itemizedlist>
+
+	<para>The <guilabel>Preview</guilabel> and
+          <guilabel>Edit</guilabel> entries do the same thing as the 
+          corresponding links. The two following entries will copy either
+          a URL or the file path to the clipboard, for pasting into
+          another application.</para>
+
+        <para>The <guilabel>Find similar</guilabel> entry will select
+          a number of relevant terms from the current document and enter
+          them into the simple search field. You can then start a simple
+          search, with a good chance of finding documents related to the
+         current result.</para>
+
+      </sect2>
+    </sect1>
+
+    <sect1 id="rcl.search.preview">
+      <title>The preview window</title>
+
+      <para>The preview window opens when you first click a
+      <literal>Preview</literal> link inside the result list.</para>
+
+      <para>Subsequent preview requests for a given search open new
+      tabs in the existing window.</para>
+      
+      <para>Starting another search and requesting a preview will
+      create a new preview window. The old one stays open until you
+      close it.</para>
+
+      <para>You can close a preview tab by typing <keycap>^W</keycap> 
+      (<keycap>Ctrl</keycap> + <keycap>W</keycap>) in the
+      window. Closing the last tab for a window will also close the
+      window.</para> 
+
+      <para>Of course you can also close a preview window by using the
+      window manager button in the top of the frame.</para>
+
+      <para>You can display successive or previous documents from the
+      result list inside a preview tab by typing
+      <keycap>Shift</keycap>+<keycap>Down</keycap> or
+      <keycap>Shift</keycap>+<keycap>Up</keycap> (<keycap>Down</keycap>
+      and <keycap>Up</keycap> are the arrow keys).</para> 
+
+      <para>The preview tabs have an internal incremental search
+      function. You initiate the search either by typing a
+      <keycap>/</keycap> (slash) inside the text area or by clicking
+      into the <guilabel>Search for:</guilabel> text field and
+      entering the search string. You can then use the
+      <guilabel>Next</guilabel> and <guilabel>Previous</guilabel>
+      buttons to find the next/previous occurrence. You can also type
+      <keycap>F3</keycap> inside the text area to get to the next
+      occurrence.</para>
+
+      <para>If you have a search string entered and you use Shift+Up/Shift+Down
+      to browse the results, the search is initiated for each successive
+      document. If the string is found, the cursor will be positioned
+      at the first occurrence of the search string.</para>

    </sect1>

-      <sect1 id="rcl.search.complex">
+    <sect1 id="rcl.search.complex">
      <title>Complex/advanced search</title>

      <para>The advanced search dialog has fields that will allow a more
@@ -427,19 +547,25 @@
        given exact phrase, none of the given elements, or a given file
        name (with wildcard expansion). All relevant fields will be
        combined by an implicit AND clause. All fields except "Exact
-        phrase" can accept single words, or phrases enclosed in double
-        quotes.</para>
+        phrase" can accept a mix of single words and phrases enclosed
+        in double quotes.</para>

-      <para>It will let you search for documents of specific mime
+      <para>Advanced search will let you search for documents of specific mime
        types (ie: only <literal>text/plain</literal>, or
        <literal>text/html</literal> or
-        <literal>application/pdf</literal> etc...)</para>
+        <literal>application/pdf</literal> etc...). The state of the
+        file type selection can be saved as the default (the file type
+        filter will not be activated at program startup, but the lists
+        will be in the restored state).</para>

-      <para>It will let you restrict the search results to a subtree of
-        the indexed area.</para>
+      <para>You can also restrict the search results
+      to a subtree of the indexed area. If you need to do this often,
+      you may think of setting up multiple indexes instead, as the
+      performance will be much better.</para>

      <para>Click on the <guilabel>Start Search</guilabel> button in
-      the advanced search dialog to start the search. The button in
+      the advanced search dialog, or type <keycap>Enter</keycap> in
+      any text field to start the search. The button in
      the main window always performs a simple search.</para>

      <para>Click on the <literal>Show query details</literal> link at
@@ -450,29 +576,57 @@
    <sect1 id="rcl.search.multidb">
      <title>Multiple databases</title>

-      <para>Your &RCL; configuration always defines a main index. This
-      is what gets updated, for example, when you execute
-      <command>recollindex</command>. </para>
+      <para>Multiple &RCL; databases or indexes can be created by
+      using several configuration directories which are usually set to
+      index different areas of the file system. A specific index can
+      be selected for updating or searching, using the
+      <literal>RECOLL_CONFDIR</literal> environment variable or the
+      <literal>-c</literal> option to <command>recoll</command> and
+      <command>recollindex</command>.</para>

-      <para>You can use the <link
-      linkend="rcl.search.custom.extradb">search configuration
-      tool</link> to define additional databases to be searched. These
-      databases can be made active or inactive at any moment.</para>
+      <para>A <command>recollindex</command> program instance can only
+      update one specific index.</para>

-      <para>The typical use of this feature is for a system
-      administrator to set up a central index, that you may choose to
-      search, or not, in addition to your personal data. Of course,
-      there are other possibilities.</para>
+      <para>A <command>recoll</command> program instance is also
+      associated with a specific index, which is the one to be
+      updated by its indexing thread, but it can use any
+      number of &RCL; indexes for searching. The external indexes
+      can be selected through the <guilabel>external
+      indexes</guilabel> tab in the preferences dialog.</para>

-      <para>The main index (defined by your personal configuration) is
-      always active.</para>
+      <para>Index selection is performed in two phases. A set of all
+      usable indexes must first be defined, and then the subset of
+      indexes to be used for searching. Of course, these parameters
+      are retained across program executions (they are kept
+      separately for each &RCL; configuration). The set of all indexes
+      is usually quite stable, while the active ones might typically
+      be adjusted quite frequently.</para>

-      <para>The list of searchable databases may also be defined by
-      the <literal>RECOLL_EXTRA_DBS</literal> environment
-      variable. This should hold a colon-separated list of index
-      directories, ie: 
+      <para>The main index (defined by
+      <literal>RECOLL_CONFDIR</literal>) is always active. If this is
+      undesirable, you can set up your base configuration to index
+      an empty directory.</para>
+
+      <para>As building the set of all indexes can be a little tedious
+      when done through the user interface, you can use the
+      <literal>RECOLL_EXTRA_DBS</literal> environment
+      variable to provide an initial set. This might typically be
+      set up by a system administrator so that every user does not
+      have to do it. The variable should define a colon-separated list
+      of index  directories, ie: 
+     </para>
       <screen>export RECOLL_EXTRA_DBS=/some/place/xapiandb:/some/other/db</screen> 
-     </para> 
+
+      <para>A typical usage scenario for the multiple index feature
+      would be for a system administrator to set up a central index
+      for shared data, that you may choose to search, or not, in
+      addition to your personal data. Of course, there are other
+      possibilities. There are many cases where you know the subset of
+      files that you want to be searched for a given query, and where
+      restricting the query will greatly improve the precision of the
+      results. This can also be performed with the directory filter in
+      advanced search, but multiple indexes will have much better
+      performance and may be worth the trouble.</para>

    </sect1>

@@ -488,7 +642,7 @@
    </sect1>

    <sect1 id="rcl.search.sort">
-      <title>Result list sorting</title>
+      <title>Sorting search results</title>

      <para>The documents in a result list are normally sorted in
      order of relevance. It is possible to specify different sort
@@ -507,35 +661,6 @@

    </sect1>

-    <sect1 id="rcl.search.resultlist">
-      <title>Additional result list functionality</title>
-
-      <para>Apart from the preview and edit links, you can display a
-      popup menu by right-clicking over a paragraph in the result
-      list. This menu has the following entries:</para>
-
-      <itemizedlist>
-        <listitem><para><guilabel>Preview</guilabel></para></listitem>
-        <listitem><para><guilabel>Edit</guilabel></para></listitem>
-        <listitem><para><guilabel>Copy File Name</guilabel></para></listitem>
-        <listitem><para><guilabel>Copy Url</guilabel></para></listitem>
-        <listitem><para><guilabel>Find similar</guilabel></para></listitem>
-      </itemizedlist>
-
-      <para>The <guilabel>Preview</guilabel> and
-      <guilabel>Edit</guilabel> entries do the same thing as the 
-      corresponding links. The two following entries will copy either
-      an url or the file path to the clipboard, for pasting into
-      another application.</para>
-
-      <para>The <guilabel>Find similar</guilabel> entry will select
-      a number of relevant term from the current document and enter
-      them into the simple search field. You can then start a simple
-      search, with a good chance of finding documents related to the
-      current result.</para>
-
-    </sect1>
-
    <sect1 id="rcl.search.tips">
      <title>Search tips, shortcuts</title>

@@ -555,11 +680,27 @@
        only for occurrences of <literal>user</literal> immediately
        followed by <literal>manual</literal>. You can use the
        <guilabel>This exact phrase</guilabel> field of the advanced
-        search dialog to the same effect.</para>
+        search dialog to the same effect. Phrases can be entered along with
+        simple terms in all search entry fields (except <guilabel>This
+        exact phrase</guilabel>).</para>
      </formalpara>

+      <formalpara><title>AutoPhrases</title>
+      <para>This option can be set in the preferences dialog. If it is
+      set, a phrase will be automatically built and added to simple
+      searches when looking for <literal>Any terms</literal>. This
+      will not radically change the results, but will give a relevance
+      boost to the results where the search terms appear as a
+      phrase. Ie: searching for <literal>virtual reality</literal>
+      will still find all documents where either
+      <literal>virtual</literal> or <literal>reality</literal> or 
+      both appear, but those which contain <literal>virtual
+      reality</literal> should appear sooner in the list.</para>
+      </formalpara>
+
      <formalpara><title>Term completion</title>
-	<para>Typing <keycap>^TAB</keycap> (Control+Tab) in the simple
+	<para>Typing <keycap>^TAB</keycap> (<keycap>Control</keycap> +
+	<keycap>Tab</keycap>) in the simple
 	search entry field while entering a word will either complete
 	the current word if its beginning matches a unique term in the
 	index, or open a window to propose a list of completions</para>
@@ -572,7 +713,7 @@
      </formalpara>

      <formalpara><title>Finding related documents</title>
-	<para>Selecting the <guilabel>More like this</guilabel> entry
+	<para>Selecting the <guilabel>Find similar documents</guilabel> entry
 	in the result list paragraph right-click menu will select a
 	set of "interesting" terms from the current result, and insert
 	them into the simple search entry field. You can then possibly
@@ -591,7 +732,7 @@
        specify them as ordinary terms in normal search fields (&RCL; used
        to index all directories in the file path as terms. This has been
        abandoned as it did not seem really useful). Alternatively, you
-        can use specific file name search which will
+        can use the specific file name search which will
        <emphasis>only</emphasis> look for file names and can use wildcard
        expansion.</para>
      </formalpara>
@@ -607,6 +748,14 @@
        close it (and, for the last tab, close the preview window).</para>
      </formalpara>

+      <formalpara><title>List browsing in preview</title> 
+       <para>Entering <keycap>Shift-Down</keycap> or <keycap>Shift-Up</keycap>
+       (<keycap>Shift</keycap> + an arrow key) in a preview window will
+       display the next or the previous document from the result
+       list. Any secondary search currently active will be executed on
+       the new document.</para>
+      </formalpara>
+
    </sect1>

    <sect1 id="rcl.search.custom">
@@ -664,16 +813,17 @@
      <formalpara><title>Search parameters:</title>
 	<para>
      <itemizedlist>
+
 	<listitem><para><guilabel>Stemming language</guilabel>:
 	stemming obviously depends on the document's language. This
 	listbox will let you choose among the stemming databases which
 	were built during indexing (this is set in the <link
 	linkend="rcl.install.config.recollconf">main configuration
 	file</link>), or later added with
-      <command>recollindex -s</command> (See the recollindex
-      manual). Stemming languages which are dynamically added will be
-      deleted at the next indexing pass unless they are also added in
-      the configuration file.</para>
+        <command>recollindex -s</command> (See the recollindex
+        manual). Stemming languages which are dynamically added will be
+        deleted at the next indexing pass unless they are also added in
+        the configuration file.</para>
 	</listitem>

 	<listitem><para><guilabel>Dynamically build
@@ -684,29 +834,38 @@
 	result list display significantly for big documents, and you
 	may want to turn it off.</para>
 	</listitem>
+
 	<listitem><para><guilabel>Replace abstracts from
 	documents</guilabel>: this decides if we should synthesize and
 	display an abstract in place of an explicit abstract found
 	within the document itself.</para>
 	</listitem>
+
+	<listitem><para><guilabel>Synthetic abstract size</guilabel>:
+	adjust to taste...</para>
+	</listitem>
+
+	<listitem><para><guilabel>Synthetic abstract context
+	words</guilabel>: how many words should be displayed around
+	each term occurrence.</para>
+	</listitem>
+
      </itemizedlist>
       </para>
      </formalpara>

-      <formalpara id="rcl.search.custom.extradb"><title>Extra
-      databases:</title> 
-	<para></para>
-      </formalpara>
-      <para>This panel will let you browse for additional databases
-      that you may want to search. Extra databases are designated by
+      <formalpara id="rcl.search.custom.extradb">
+	<title>External indexes:</title> 
+      <para>This panel will let you browse for additional indexes
+      that you may want to search. External indexes are designated by
      their database directory (ie:
      <filename>/home/someothergui/.recoll/xapiandb</filename>,
      <filename>/usr/local/recollglobal/xapiandb</filename>).</para>

-      <para>Once entered, the databases will appear in the
-	<guilabel>All extra databases</guilabel> list, and you can
+      <para>Once entered, the indexes will appear in the
+	<guilabel>All indexes</guilabel> list, and you can
 	choose which ones you want to use at any moment by transferring
-	them to/from the <guilabel>Active extra databases</guilabel>
+	them to/from the <guilabel>Active indexes</guilabel>
 	list.</para> 
      <para>Your main database (the one the current configuration
      indexes to), is always implicitly active. If this is not
@@ -721,6 +880,51 @@
  <chapter id="rcl.install">
    <title>Installation</title>

+    <sect1 id="rcl.install.binary">
+      <title>Installing a prebuilt copy</title>
+
+      <para>Recoll binary installations are always linked statically
+        to the xapian libraries, and have no other dependencies. You
+        will only have to check or install 
+        <link linkend="rcl.install.external">supporting
+        applications</link> for the file types that you want to index
+        beyond text, html and mail files.</para> 
+
+      <sect2 id="rcl.install.binary.package">
+        <title>Installing through a package system</title>
+
+        <para>If you use a BSD-type port system or a
+         prebuilt package (RPM or other), just follow the usual
+         procedure, and maybe have a look at the <link
+         linkend="rcl.install.config">configuration
+         section</link> (but this may not be necessary for a quick
+         test with default parameters).</para>
+
+      </sect2>
+
+      <sect2 id="rcl.install.binary.rcl">
+        <title>Installing a prebuilt &RCL;</title>
+
+      <para>The unpackaged binary versions are just compressed tar
+        files of a build tree, where only the useful parts were kept
+        (executables and sample configuration).</para>
+
+      <para>The executable binary files are built with a static link to
+        libxapian and libiconv, to make installation easier (no
+        dependencies). However, this also means that you cannot change
+        the versions which are used.</para> 
+
+      <para>After extracting the tar file, you can proceed with
+        <link
+        linkend="rcl.install.building.install">installation</link> as
+        if you had built the package from source.</para> 
+
+	<para>The binary trees are built for installation to
+	<filename>/usr/local</filename>.</para>
+      </sect2>
+
+
+    </sect1>
      <sect1 id="rcl.install.building">
      <title>Building from source</title>

@@ -815,46 +1019,19 @@
        and the sample configuration files, scripts and other shared
        data to
        <filename><replaceable>prefix</replaceable>/share/recoll</filename>.</para>
+	<para>If the installation prefix given to
+	<command>recollinstall</command> is different from what was
+	specified when executing <command>configure</command>, you
+	will have to set the <literal>RECOLL_DATADIR</literal>
+	environment variable to indicate where the shared data is to
+	be found.</para>
+
 	<para>You can then proceed to <link
 	linkend="rcl.install.config">configuration</link>. </para>

      </sect2>
    </sect1>

-    <sect1 id="rcl.install.binary">
-      <title>Installing a prebuilt copy</title>
-
-      <sect2 id="rcl.install.binary.package">
-        <title>Installing through a package system</title>
-
-        <para>If you are lucky enough to be using a port system or a
-        prebuilt package (RPM or other), just follow the usual
-        procedure, and have a look at the <link
-        linkend="rcl.install.config">configuration
-        section</link>.</para>
-      </sect2>
-
-      <sect2 id="rcl.install.binary.rcl">
-        <title>Installing a prebuilt &RCL;</title>
-
-      <para>The unpackaged binary versions are just compressed tar
-      files of a build
-        tree, where only the useful parts were kept (executables and
-        sample configuration).</para>
-
-      <para>The executable binary files are built with a static link to
-        libxapian and libiconv, to make installation easier (no
-        dependencies). However, this also means that you cannot change
-        the versions which are used.</para> 
-
-      <para>After extracting the tar file, you can proceed with
-        <link
-        linkend="rcl.install.building.install">installation</link> as
-        if you had built the package from source.</para> 
-      </sect2>
-
-
-    </sect1>


    <sect1 id="rcl.install.external">
@ -880,6 +1057,11 @@
            antiword</ulink>.</para>
          </listitem>

+        <listitem><para>MS Excel and PowerPoint: 
+           <ulink url="http://www.45.free.net/~vitus/software/catdoc/"> 
+            catdoc</ulink>.</para>
+          </listitem>
+
        <listitem>
            <para>RTF: <ulink
            url="http://www.gnu.org/software/unrtf/unrtf.html">unrtf</ulink>
@ -1012,6 +1194,14 @@
            </listitem> 
          </varlistentry>

+          <varlistentry><term><literal>dbdir</literal></term>
+            <listitem><para>The name of the Xapian data directory. It
+            will be created if needed when the index is
+            initialized. If this is not an absolute path, it will be
+            interpreted relative to the configuration directory.</para>
+            </listitem>
+          </varlistentry>
+
          <varlistentry><term><literal>skippedNames</literal></term>
            <listitem>
              <para>A space-separated list of patterns for
@ -1074,22 +1264,7 @@
            </listitem>
          </varlistentry>

-          <varlistentry><term><literal>iconsdir</literal></term>
-            <listitem><para>The name of the directory where
-            <command>recoll</command> result list icons are
-            stored. You can change this if you want different
-            images.</para>
-            </listitem>
-          </varlistentry>
-
-          <varlistentry><term><literal>dbdir</literal></term>
-            <listitem><para>The name of the Xapian data directory. It
-            will be created if needed when the index is
-            initialized. If this is not an absolute path, it will be
-            interpreted relative to the configuration directory.</para>
-            </listitem>
-          </varlistentry>
-          
+         
          <varlistentry><term><literal>defaultcharset</literal></term>
            <listitem><para>The name of the character set used for
            files that do not contain a character set definition (ie:
@ -1128,6 +1303,25 @@
 	    </listitem>
 	  </varlistentry>

+	  <varlistentry><term><literal>idxabsmlen</literal></term>
+	    <listitem><para>&RCL; stores an abstract for each indexed
+	    file inside the database. This is so that abstracts can be
+	    displayed inside the result lists without decoding the
+	    original file. This parameter defines the size of the
+	    stored abstract (which can come from an actual section or
+	    just be the beginning of the text). The default value is 250.
+            </para>
+	    </listitem>
+	  </varlistentry>
+
+          <varlistentry><term><literal>iconsdir</literal></term>
+            <listitem><para>The name of the directory where
+            <command>recoll</command> result list icons are
+            stored. You can change this if you want different
+            images.</para>
+            </listitem>
+          </varlistentry>
+
        </variablelist>

      </sect2>
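The `dbdir` rule described in the manual text above (absolute paths are used as is, anything else is taken relative to the configuration directory) can be sketched as follows. This is a minimal illustration, not code from the Recoll sources: the function name `resolveDbDir` and the example paths are assumptions.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not taken from the Recoll sources: resolve a dbdir
// configuration value against the configuration directory.
std::string resolveDbDir(const std::string& confdir, const std::string& dbdir)
{
    assert(!confdir.empty());
    // An absolute path is used unchanged.
    if (!dbdir.empty() && dbdir[0] == '/')
        return dbdir;
    // Anything else is interpreted relative to the configuration directory.
    if (confdir[confdir.length() - 1] == '/')
        return confdir + dbdir;
    return confdir + "/" + dbdir;
}
```

With an assumed configuration directory of `/home/me/.recoll`, a `dbdir` value of `xapiandb` would resolve to `/home/me/.recoll/xapiandb`, while `/var/db/recoll` would be returned unchanged.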
--- a/src/internfile/mh_mail.cpp
+++ b/src/internfile/mh_mail.cpp
@ -1,5 +1,5 @@
 #ifndef lint
-static char rcsid[] = "@(#$Id: mh_mail.cpp,v 1.16 2006-09-05 17:09:30 dockes Exp $ (C) 2005 J.F.Dockes";
+static char rcsid[] = "@(#$Id: mh_mail.cpp,v 1.17 2006-09-15 16:50:44 dockes Exp $ (C) 2005 J.F.Dockes";
 #endif
 /*
 *   This program is free software; you can redistribute it and/or modify
@ -216,19 +216,14 @@ MimeHandlerMail::processone(const string &fn, Binc::MimeDocument& doc,
    }
    if (doc.h.getFirstHeader("Date", hi)) {
 	rfc2047_decode(hi.getValue(), transcoded);
-	// Try to set the mtime from the date field.
-	string date = transcoded;
-	string::size_type pos;
-	// Possibly get rid of the day
-	if ((pos = date.find(",")) != string::npos)
-	    date = date.substr(pos+1);
-	struct tm tm;
-	if (strptime(date.c_str(), " %d %b %Y %H:%M:%S %z ", &tm)) {
+	time_t t = rfc2822DateToUxTime(transcoded);
+	if (t != (time_t)-1) {
 	    char ascuxtime[100];
-	    sprintf(ascuxtime, "%ld", (long)mktime(&tm));
+	    sprintf(ascuxtime, "%ld", (long)t);
 	    docout.dmtime = ascuxtime;
 	} else {
-	    LOGDEB(("strptime failed for [%s]\n", date.c_str()));
+	    // Leave mtime field alone, ftime will be used instead.
+	    LOGDEB(("rfc2822Date...: failed for [%s]\n", transcoded.c_str()));
 	}

 	docout.text += string("Date: ") + transcoded + string("\n");
--- a/src/utils/mimeparse.cpp
+++ b/src/utils/mimeparse.cpp
@ -1,5 +1,5 @@
 #ifndef lint
-static char rcsid[] = "@(#$Id: mimeparse.cpp,v 1.12 2006-09-06 09:14:43 dockes Exp $ (C) 2004 J.F.Dockes";
+static char rcsid[] = "@(#$Id: mimeparse.cpp,v 1.13 2006-09-15 16:50:44 dockes Exp $ (C) 2004 J.F.Dockes";
 #endif
 /*
 *   This program is free software; you can redistribute it and/or modify
@ -26,6 +26,7 @@ static char rcsid[] = "@(#$Id: mimeparse.cpp,v 1.12 2006-09-06 09:14:43 dockes E
 #include <ctype.h>
 #include <stdio.h>
 #include <ctype.h>
+#include <time.h>

 #include "mimeparse.h"
 #include "base64.h"
@ -578,8 +579,159 @@ bool rfc2047_decode(const std::string& in, std::string &out)
    return true;
 }

+#define DEBUGDATE 1
+#if DEBUGDATE
+#define DATEDEB(X) fprintf X
+#else
+#define DATEDEB(X)
+#endif
+
+// Convert rfc822 date to unix time. A date string normally looks like:
+//  Mon, 3 Jul 2006 09:51:58 +0200
+// But there are many common variations
+//
+time_t rfc2822DateToUxTime(const string& dt)
+{
+    // Strip everything up to first comma if any, we don't need weekday,
+    // then break into tokens
+    list<string> toks;
+    string::size_type idx;
+    if ((idx = dt.find_first_of(",")) != string::npos) {
+	if (idx == dt.length() - 1) {
+	    DATEDEB((stderr, "Bad rfc822 date format (short1): [%s]\n", 
+		     dt.c_str()));
+	    return (time_t)-1;
+	}
+	string date = dt.substr(idx+1);
+	stringToTokens(date, toks, " \t:");
+    } else {
+	stringToTokens(dt, toks, " \t:");
+    }
+
+#if DEBUGDATE
+    for (list<string>::iterator it = toks.begin(); it != toks.end(); it++) {
+	DATEDEB((stderr, "[%s] ", it->c_str()));
+    }
+    DATEDEB((stderr, "\n"));
+#endif
+
+    if (toks.size() == 6) {
+	// Probably no timezone, sometimes happens
+	toks.push_back("+0000");
+    }
+
+    if (toks.size() < 7) {
+	DATEDEB((stderr, "Bad rfc822 date format (toks cnt): [%s]\n", 
+		 dt.c_str()));
+	return (time_t)-1;
+    }
+	
+    struct tm tm;
+    memset(&tm, 0, sizeof(tm));
+
+    // Load struct tm with appropriate tokens, possibly converting
+    // when needed
+
+    list<string>::iterator it = toks.begin();
+
+    // Day of month: no conversion needed
+    tm.tm_mday = atoi(it->c_str());
+    it++;
+
+    // Month. Only Jan-Dec are legal. January, February do happen
+    // though. Convert to 0-11
+    if (*it == "Jan" || *it == "January") tm.tm_mon = 0; else if
+	(*it == "Feb" || *it == "February") tm.tm_mon = 1; else if
+	(*it == "Mar" || *it == "March") tm.tm_mon = 2; else if
+	(*it == "Apr" || *it == "April") tm.tm_mon = 3; else if
+	(*it == "May") tm.tm_mon = 4; else if
+	(*it == "Jun" || *it == "June") tm.tm_mon = 5; else if
+	(*it == "Jul" || *it == "July") tm.tm_mon = 6; else if
+	(*it == "Aug" || *it == "August") tm.tm_mon = 7; else if
+	(*it == "Sep" || *it == "September") tm.tm_mon = 8; else if
+	(*it == "Oct" || *it == "October") tm.tm_mon = 9; else if
+	(*it == "Nov" || *it == "November") tm.tm_mon = 10; else if
+	(*it == "Dec" || *it == "December") tm.tm_mon = 11; else {
+	DATEDEB((stderr, "Bad rfc822 date format (month): [%s]\n", 
+		 dt.c_str()));
+	return (time_t)-1;
+    }
+    it++;
+
+    // Year. Struct tm counts from 1900
+    tm.tm_year = atoi(it->c_str());
+    if (tm.tm_year > 1900)
+	tm.tm_year -= 1900;
+    it++;
+
+    // Hour minute second need no adjustments
+    tm.tm_hour = atoi(it->c_str()); it++;
+    tm.tm_min  = atoi(it->c_str()); it++;
+    tm.tm_sec  = atoi(it->c_str()); it++;	
+
+
+    // Timezone is supposed to be either +-XYZT or a zone name
+    int zonesecs = 0;
+    if (it->length() < 1) {
+	DATEDEB((stderr, "Bad rfc822 date format (zlen): [%s]\n", dt.c_str()));
+	return (time_t)-1;
+    }
+    if (it->at(0) == '-' || it->at(0) == '+') {
+	// Note that +xy:zt (instead of +xyzt) sometimes happens, we
+	// may want to process it one day
+	if (it->length() < 5) {
+	    DATEDEB((stderr, "Bad rfc822 date format (zlen1): [%s]\n", 
+		     dt.c_str()));
+	    goto nozone;
+	}
+	zonesecs = 3600*((it->at(1)-'0') * 10 + it->at(2)-'0')+ 
+	    60*((it->at(3)-'0')*10 + it->at(4)-'0');
+	zonesecs = it->at(0) == '+' ? -1 * zonesecs : zonesecs;
+    } else {
+	int hours;
+	if (*it == "A") hours= 1; else if (*it == "B") hours= 2; 
+	else if (*it == "C") hours= 3; else if (*it == "D") hours= 4; 
+	else if (*it == "E") hours= 5; else if (*it == "F") hours= 6;
+	else if (*it == "G") hours= 7; else if (*it == "H") hours= 8; 
+	else if (*it == "I") hours= 9; else if (*it == "K") hours= 10;
+	else if (*it == "L") hours= 11; else if (*it == "M") hours= 12; 
+	else if (*it == "N") hours= -1; else if (*it == "O") hours= -2; 
+	else if (*it == "P") hours= -3; else if (*it == "Q") hours= -4; 
+	else if (*it == "R") hours= -5; else if (*it == "S") hours= -6; 
+	else if (*it == "T") hours= -7; else if (*it == "U") hours= -8; 
+	else if (*it == "V") hours= -9; else if (*it == "W") hours= -10;
+	else if (*it == "X") hours= -11; else if (*it == "Y") hours= -12;
+	else if (*it == "Z") hours=  0; else if  (*it == "UT") hours= 0; 
+	else if (*it == "GMT") hours= 0; else if (*it == "EST") hours= 5;
+	else if (*it == "EDT") hours= 4; else if (*it == "CST") hours= 6;
+	else if (*it == "CDT") hours= 5; else if (*it == "MST") hours= 7;
+	else if (*it == "MDT") hours= 6; else if (*it == "PST") hours= 8;
+	else if (*it == "PDT") hours= 7; 
+	    // Non standard names
+	    // IST: Indian Standard Time (or Irish Summer Time?) is actually +5.5
+	else if (*it == "CET") hours= -1; else if (*it == "JST") hours= -9; 
+	else if (*it == "IST") hours= -5; else if (*it == "WET") hours= 0; 
+	else if (*it == "MET") hours= -1; 
+	else {
+	    DATEDEB((stderr, "Bad rfc822 date format (zname): [%s]\n", 
+		     dt.c_str()));
+	    // Forget tz
+	    goto nozone;
+	}
+	zonesecs = 3600 * hours;
+    }
+    DATEDEB((stderr, "Tz: [%s] -> %d\n", it->c_str(), zonesecs));
+ nozone:
+
+    time_t tim = mktime(&tm);
+    tim += zonesecs;
+    DATEDEB((stderr, "Date: %s  uxtime %ld \n", ctime(&tim), (long)tim));
+    return tim;
+}
+
 #else 

+#include <time.h>

 #include <string>
 #include "mimeparse.h"
@ -588,6 +740,7 @@ bool rfc2047_decode(const std::string& in, std::string &out)

 using namespace std;
 extern bool rfc2231_decode(const string& in, string& out, string& charset); 
+extern time_t rfc2822DateToUxTime(const string& date);

 int
 main(int argc, const char **argv)
@ -641,7 +794,7 @@ main(int argc, const char **argv)
 	exit(1);
    }
    printf("Decoded: '%s'\n", out.c_str());
-#elif 1
+#elif 0
    char line [1024];
    string out;
    bool res;
@ -675,7 +828,22 @@ main(int argc, const char **argv)
 	exit(1);
    }
    printf("Decoded: [%s]\n", decoded.c_str());
-    
+#elif 1
+    {
+	time_t t;
+	
+	const char *dates[] = {
+	    " Wed, 13 Sep 2006 11:40:26 -0700 (PDT)",
+	    " Mon, 3 Jul 2006 09:51:58 +0200",
+	    " Wed, 13 Sep 2006 08:19:48 GMT-07:00",
+	    " Wed, 13 Sep 2006 11:40:26 -0700 (PDT)",
+	    " Sat, 23 Dec 89 19:27:12 EST",
+            "   13 Jan 90 08:23:29 GMT"};
+
+	for (unsigned int i = 0; i <sizeof(dates) / sizeof(char *); i++) {
+	    t = rfc2822DateToUxTime(dates[i]);
+	}
+    }
 #endif
 }

--- a/src/utils/mimeparse.h
+++ b/src/utils/mimeparse.h
@ -16,18 +16,24 @@
 */
 #ifndef _MIME_H_INCLUDED_
 #define _MIME_H_INCLUDED_
-/* @(#$Id: mimeparse.h,v 1.7 2006-09-06 09:14:43 dockes Exp $  (C) 2004 J.F.Dockes */
+/* @(#$Id: mimeparse.h,v 1.8 2006-09-15 16:50:44 dockes Exp $  (C) 2004 J.F.Dockes */
+
+#include <time.h>

 #include <string>
 #include <map>

 #include "base64.h"

+#ifndef NO_NAMESPACES
+using std::string;
+#endif
+
 /** A class to represent a MIME header value with parameters */
 class MimeHeaderValue {
 public:
-    std::string value;
-    std::map<std::string, std::string> params;
+    string value;
+    std::map<string, string> params;
 };

 /** 
@ -36,10 +42,10 @@ class MimeHeaderValue {
 * @param in the input string should be like: value; pn1=pv1; pn2=pv2. 
 *   Example: text/plain; charset="iso-8859-1" 
 */
-extern bool parseMimeHeaderValue(const std::string& in, MimeHeaderValue& psd);
+extern bool parseMimeHeaderValue(const string& in, MimeHeaderValue& psd);

 /** Quoted printable decoding. Doubles up as rfc2231 decoder, hence the esc */
-extern bool qp_decode(const std::string& in, std::string &out, 
+extern bool qp_decode(const string& in, string &out, 
 		      char esc = '=');

 /** Decode an Internet mail field value encoded according to rfc2047 
@ -53,6 +59,14 @@ extern bool qp_decode(const std::string& in, std::string &out,
 * @param in input string, ascii with rfc2047 markup
 * @return out output string encoded in utf-8
 */
-extern bool rfc2047_decode(const std::string& in, std::string &out);
+extern bool rfc2047_decode(const string& in, string &out);
+
+
+/** Decode RFC2822 date to unix time (gmt secs from 1970)
+ *
+ * @param dt date string (the part after Date: )
+ * @return unix time, or (time_t)-1 if the date could not be parsed
+ */
+time_t rfc2822DateToUxTime(const string& dt);

 #endif /* _MIME_H_INCLUDED_ */
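The header comment promises GMT seconds since 1970. Standard `mktime()` interprets a `struct tm` as local time, so one portable way to get a UTC value is to do the calendar arithmetic directly. The following is a minimal sketch under that assumption, not the commit's implementation: the names `daysFromCivil`, `zoneToSeconds` and `civilToUtcSeconds` are hypothetical, and only numeric `+hhmm` / `-hhmm` zones are handled.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Illustrative helpers only; none of these names exist in the commit.

// Days between 1970-01-01 and a proleptic Gregorian date.
// year is the full year (2006, not 106), month is 1-12, day is 1-31.
long long daysFromCivil(int year, int month, int day)
{
    assert(month >= 1 && month <= 12);
    // Count years from March so the leap day is the last day of the year.
    long long y = (month <= 2) ? year - 1 : year;
    long long era = (y >= 0 ? y : y - 399) / 400;
    long long yoe = y - era * 400;                   // year of era, 0..399
    int mp = (month > 2) ? month - 3 : month + 9;    // March = 0
    long long doy = (153 * mp + 2) / 5 + day - 1;    // day of year, 0..365
    long long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + doe - 719468;
}

// Seconds east of UTC for a numeric zone such as "+0200" or "-0700".
// Returns false unless the input is a sign followed by four digits.
bool zoneToSeconds(const std::string& zone, long long& secsEast)
{
    if (zone.length() != 5 || (zone[0] != '+' && zone[0] != '-'))
        return false;
    for (int i = 1; i < 5; i++) {
        if (!std::isdigit(static_cast<unsigned char>(zone[i])))
            return false;
    }
    long long hours = (zone[1] - '0') * 10 + (zone[2] - '0');
    long long minutes = (zone[3] - '0') * 10 + (zone[4] - '0');
    secsEast = 3600 * hours + 60 * minutes;
    if (zone[0] == '-')
        secsEast = -secsEast;
    return true;
}

// Combine broken-down local fields and a numeric zone into UTC seconds
// since 1970. The zone offset is subtracted: 09:51 +0200 is 07:51 UTC.
bool civilToUtcSeconds(int year, int month, int day, int hour, int min,
                       int sec, const std::string& zone, long long& out)
{
    long long east;
    if (!zoneToSeconds(zone, east))
        return false;
    out = daysFromCivil(year, month, day) * 86400LL
        + hour * 3600LL + min * 60LL + sec - east;
    return true;
}
```

The result does not depend on the timezone of the machine running the indexer, which is the property the "gmt secs" wording asks for. Zone names such as `GMT` or `EST` would still need a lookup table like the one in the commit.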