Jean-Francois Dockes 2012-04-09 14:24:07 +02:00
parent a4c17941b1
commit 411a232fbf
2 changed files with 44 additions and 10 deletions


@@ -3593,6 +3593,28 @@ while query.next >= 0 and query.next < nres:
 List elements with embedded spaces can be quoted using
 double-quotes.</para>
+<formalpara><title>Encoding issues</title>
+<para>Most of the configuration parameters are plain ASCII. Two
+particular sets of values may cause encoding issues:</para>
+<itemizedlist>
+<listitem><para>File path parameters may contain non-ASCII
+characters and should use exactly the same byte values as the
+names found in the file system. Usually, this means that the
+configuration file should use the system default locale
+encoding.</para>
+</listitem>
+<listitem><para>The <literal>unac_except_trans</literal> parameter
+should be encoded in UTF-8. If your system locale is not UTF-8 and
+you also need to specify non-ASCII file paths, this is a problem,
+because common text editors cannot handle multiple encodings in a
+single file. In this relatively unlikely case, you can edit the
+configuration as two separate text files, each with the appropriate
+encoding, and concatenate them to create the complete
+configuration file.</para>
+</listitem>
+</itemizedlist>
 <sect2 id="rcl.install.config.recollconf">
 <title>Main configuration file</title>
@@ -3853,16 +3875,17 @@ skippedPaths = ~/somedir/&lowast;.txt
 </varlistentry>
 <varlistentry><term><literal>unac_except_trans</literal></term>
-<listitem><para>This is a list of characters which should be
-handled specially when converting text to unaccented lowercase.
-For example, in Swedish, the letter <literal>a with diaeresis
-</literal> has full alphabet citizenship and should not be
-turned into an <literal>a</literal>. Each element in the
-space-separated list has the special character as first element
-and the translation following. The handling of both the
-lowercase and upper-case versions of a character should be
-specified, as appartenance to the list will turn-off both
-standard accent and case processing. Example for Swedish:</para>
+<listitem><para>This is a list of characters, encoded in UTF-8,
+which should be handled specially when converting text to
+unaccented lowercase. For example, in Swedish, the letter
+<literal>a with diaeresis </literal> has full alphabet
+citizenship and should not be turned into an
+<literal>a</literal>. Each element in the space-separated list
+has the special character as first element and the translation
+following. The handling of both the lowercase and upper-case
+versions of a character should be specified, as membership in
+the list will turn off both standard accent and case
+processing. Example for Swedish:</para>
 <programlisting>
 unac_except_trans = åå Åå ää Ää öö Öö
 </programlisting>
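
Note on the "Encoding issues" paragraph added in the first hunk: the two-file workaround it describes can be sketched as below. This is only an illustration, not part of the commit. The helper name `concat_config` and the fragment file names in the usage note are hypothetical. The one thing that matters is that the fragments are joined as raw bytes, so neither encoding is altered.

```python
from pathlib import Path


def concat_config(parts, output):
    """Concatenate configuration fragments byte for byte.

    Each fragment keeps its own encoding because nothing is decoded
    or re-encoded on the way through.
    """
    with open(output, "wb") as out:
        for part in parts:
            data = Path(part).read_bytes()
            out.write(data)
            # Terminate each non-empty fragment with a newline so that its
            # last line cannot run into the first line of the next one.
            if data and not data.endswith(b"\n"):
                out.write(b"\n")
```

For example, `concat_config(["paths.part", "unac.part"], "recoll.conf")` would join a locale-encoded fragment holding the file path parameters with a UTF-8 fragment holding `unac_except_trans`. Both fragment names are made up for this sketch.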


@@ -65,6 +65,17 @@ indexstemminglanguages = english
 # match your LANG and is not 8859-1, set it here.
 # defaultcharset = iso-8859-1
+# A list of characters, encoded in UTF-8, which should be handled specially
+# when converting text to unaccented lowercase. For example, in Swedish,
+# the letter a with diaeresis has full alphabet citizenship and should not
+# be turned into an a.
+# Each element in the space-separated list has the special character as
+# first element and the translation following. The handling of both the
+# lowercase and upper-case versions of a character should be specified, as
+# membership in the list will turn off both standard accent and case
+# processing. Example for Swedish:
+# unac_except_trans = åå Åå ää Ää öö Öö
 # Where to store the database (directory). This may be an absolute path,
 # else it is taken as relative to the configuration directory (-c argument
 # or $RECOLL_CONFDIR).
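
Note on the `unac_except_trans` format documented in this commit: each space-separated element starts with the special character and is followed by its translation. The sketch below shows how such a value could be read into a lookup table. It is not Recoll's parser, and the function name `parse_unac_except_trans` is hypothetical. It assumes that each special character is a single precomposed code point and it ignores double-quoted elements.

```python
def parse_unac_except_trans(value):
    """Map each special character to its translation.

    Assumption: the special character is the first code point of the
    element, and everything after it is the translation.
    """
    table = {}
    for element in value.split():
        # First code point: the character exempted from standard
        # unaccenting and case folding. Remainder: its replacement.
        table[element[0]] = element[1:]
    return table
```

Applied to the Swedish example, both `å` and `Å` map to `å`. This is why the upper-case forms have to be listed explicitly: a character in the list no longer gets the standard case folding.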