Fridolin Somers
9220482cd3
The ICU configuration files contain a rule to remove control characters:

<transform rule="[:Control:] Any-Remove"/>

This rule is applied before tokenization. The problem is that the "[:Control:]" class includes line feed, carriage return and tab. See http://www.regular-expressions.info/posixbrackets.html. So when a field spanning several lines is indexed, the last word of each line is joined to the first word of the next line. Those words are then not searchable.

For example:

First line
Second line

This becomes "First lineSecond line", which is tokenized as "First", "lineSecond" and "line".

Test plan:
- Use ICU in Zebra configuration
- Choose an indexed field, like 300$a
- Create a new record
- Enter several lines in the chosen field, like:
  First line
  Second line
- Index this record
=> Without the patch, a search on "Second" does not return the record
=> With the patch, a search on "Second" returns the record
- Repeat the same tests with tab and carriage return instead of line feed

Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>
Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@gmail.com>
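To make the failure concrete, here is a minimal Python sketch of what the rule does to a multi-line field. It assumes that ICU's [:Control:] matches Unicode General_Category Cc (which includes U+0009 tab, U+000A line feed and U+000D carriage return) and approximates ICU word tokenization with a plain whitespace split. `remove_controls`, `tokenize` and the `keep` parameter are hypothetical names used only for illustration; exempting the separators is one possible way to express a fix, not necessarily what the patch itself does.

```python
import unicodedata


def remove_controls(text: str, keep: str = "") -> str:
    """Drop every General_Category Cc character (assumed equivalent of
    ICU's [:Control:]), except the characters listed in `keep`."""
    return "".join(
        ch for ch in text
        if unicodedata.category(ch) != "Cc" or ch in keep
    )


def tokenize(text: str) -> list[str]:
    # Whitespace split stands in for ICU word tokenization (assumption).
    return text.split()


field = "First line\nSecond line"

# Original rule: the line feed is a control character, so it is deleted
# before tokenization and the two words around it are glued together.
print(tokenize(remove_controls(field)))
# prints ['First', 'lineSecond', 'line']

# Hypothetical fix: keep line feed, carriage return and tab as separators.
print(tokenize(remove_controls(field, keep="\n\r\t")))
# prints ['First', 'line', 'Second', 'line']
```

The same result holds if the line feed in `field` is replaced by "\r" or "\t", which mirrors the last step of the test plan.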
pazpar2
searchengine
zebradb
koha-conf.xml
koha-httpd.conf
README.txt
SIPconfig.xml
Koha Configuration Files:

The following files specify the base configuration for Koha ZOOM:

* koha-httpd.conf
  On a Debian system, this Apache configuration file will be symlinked from /etc/apache2/sites-enabled.
  Specify Koha's IP address with NameVirtualHost.
  Set ServerName, etc.

* koha-production.xml
* koha-testing.xml
  These are the production and testing configurations for zebrasrv and for Koha. The first part of each file specifies Zebra server names, indexing configuration files, and query language configurations. Koha configuration directives follow.

* zebra-authorities.cfg
* zebra-biblios.cfg