Bugfix for empty words

\b treats only ASCII characters as letters, so letters with diacritics count as
non-word characters. A word like "leçon" is therefore split in two: "le" is removed
as an empty word (stopword), and the search is done on "çon" (which is not French [1],
so it returns no results).

[1] "con" (without the cedilla) is a French word, but I won't tell you what it means...
Anyway, there is probably no "con" in most catalogues ;-)

Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
Paul POULAIN 2007-10-12 17:35:23 -05:00 committed by Joshua Ferraro
parent 9f58a48ef2
commit 345eaeb725

@@ -564,8 +564,13 @@ sub buildQuery {
     for ( my $i = 0 ; $i <= @operands ; $i++ ) {
         my $operand = $operands[$i];
         # remove stopwords from operand : parse all stopwords & remove them (case insensitive)
+        # we use IsAlpha unicode definition, to deal correctly with diacritics.
+        # otherwise, a french word like "leçon" is splitted in "le" "çon", le is an empty word, we get "çon"
+        # and don't find anything...
         foreach (keys %{C4::Context->stopwords}) {
-            $operand=~ s/\b$_\b//i;
+            $operand=~ s/\P{IsAlpha}$_\P{IsAlpha}/ /i;
+            $operand=~ s/^$_\P{IsAlpha}/ /i;
+            $operand=~ s/\P{IsAlpha}$_$/ /i;
         }
         my $index = $indexes[$i];
         my $stemmed_operand;
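
The difference between the removed and added substitutions can be reproduced outside Koha with a small standalone sketch. It is only an illustration, not the code from this commit: the stopword list and the two helper names are hypothetical, `\Q...\E` quoting is added for safety, and the `/a` flag (Perl 5.14 or later) is an assumption used to force the ASCII-only meaning of `\b` that the commit message describes.

```perl
use strict;
use warnings;

# Hypothetical stopword list; in Koha it comes from C4::Context->stopwords.
my @stopwords = ('le', 'la');

# Old approach: \b word boundaries. The /a flag forces the ASCII-only
# notion of a word character, so a letter such as U+00E7 is a boundary.
sub strip_with_word_boundary {
    my ($operand) = @_;
    for my $word (@stopwords) {
        $operand =~ s/\b\Q$word\E\b//ia;
    }
    return $operand;
}

# New approach: a stopword is only removed when it is delimited by
# non-alphabetic characters (Unicode definition) or by the start or end
# of the operand.
sub strip_with_unicode_alpha {
    my ($operand) = @_;
    for my $word (@stopwords) {
        $operand =~ s/\P{IsAlpha}\Q$word\E\P{IsAlpha}/ /i;
        $operand =~ s/^\Q$word\E\P{IsAlpha}/ /i;
        $operand =~ s/\P{IsAlpha}\Q$word\E$/ /i;
    }
    return $operand;
}

# "leçon", written with an escape so the example does not depend on the
# encoding of the source file.
my $lecon = "le\x{E7}on";

# Old approach strips "le" and leaves the 3 characters of "çon".
printf "word boundary: %d chars left\n", length(strip_with_word_boundary($lecon));
# New approach leaves all 5 characters of "leçon" intact.
printf "unicode alpha: %d chars left\n", length(strip_with_unicode_alpha($lecon));
```

A genuine stopword is still removed by the new patterns: for the operand "le chat", the second substitution replaces the leading "le " with a single space.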