Bugfix for empty words
\b treats only ASCII characters as letters, so letters with diacritics are considered non-word characters. A word like "leçon" is therefore split in two: "le" is an empty word (stopword) and is removed, and the search is done on "çon", which is not French [1] and so returns no results.

[1] "con" (without the cedilla) is a French word, but I won't tell you what it means... anyway, there is probably no "con" in most catalogues ;-)

Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
parent 9f58a48ef2
commit 345eaeb725
1 changed file with 6 additions and 1 deletion
@@ -564,8 +564,13 @@ sub buildQuery {
     for ( my $i = 0 ; $i <= @operands ; $i++ ) {
         my $operand = $operands[$i];
         # remove stopwords from operand : parse all stopwords & remove them (case insensitive)
+        # we use IsAlpha unicode definition, to deal correctly with diacritics.
+        # otherwise, a french word like "leçon" is splitted in "le" "çon", le is an empty word, we get "çon"
+        # and don't find anything...
         foreach (keys %{C4::Context->stopwords}) {
-            $operand=~ s/\b$_\b//i;
+            $operand=~ s/\P{IsAlpha}$_\P{IsAlpha}/ /i;
+            $operand=~ s/^$_\P{IsAlpha}/ /i;
+            $operand=~ s/\P{IsAlpha}$_$/ /i;
         }
         my $index = $indexes[$i];
         my $stemmed_operand;
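The difference between the removed and the added substitutions can be reproduced outside Koha with a small standalone sketch. The two helper names below are hypothetical and exist only for this illustration. The sketch assumes the operand is a plain Perl string that does not carry the internal UTF-8 flag, and that neither `use locale` nor the `unicode_strings` feature is in effect, which is the situation in which `\b` falls back to ASCII-only word characters. `\Q...\E` is added here to quote the stopword; the commit itself interpolates `$_` directly.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Koha): stopword removal with \b,
# as the code did before this commit.
sub strip_with_word_boundary {
    my ( $operand, @stopwords ) = @_;
    foreach my $word (@stopwords) {
        $operand =~ s/\b\Q$word\E\b//i;
    }
    return $operand;
}

# Hypothetical helper mirroring the three substitutions added by the commit:
# a stopword is removed only when it is delimited by a non-alphabetic
# character (Unicode definition) or by the start/end of the operand.
sub strip_with_isalpha {
    my ( $operand, @stopwords ) = @_;
    foreach my $word (@stopwords) {
        $operand =~ s/\P{IsAlpha}\Q$word\E\P{IsAlpha}/ /i;
        $operand =~ s/^\Q$word\E\P{IsAlpha}/ /i;
        $operand =~ s/\P{IsAlpha}\Q$word\E$/ /i;
    }
    return $operand;
}

# "le\x{E7}on" is "leçon"; the escape keeps this file pure ASCII.
my $lecon     = "le\x{E7}on";
my @stopwords = ('le');

# \b sees a boundary between "e" and the c-cedilla, so "le" is stripped.
my $old = strip_with_word_boundary( $lecon, @stopwords );

# \P{IsAlpha} uses Unicode rules, so the c-cedilla counts as a letter
# and the word is left intact.
my $new = strip_with_isalpha( $lecon, @stopwords );
```

Under these assumptions `$old` ends up as "çon" while `$new` keeps the full word "leçon". A real stopword followed by a space, as in "le chat", is still removed by the second substitution.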