Fridolin Somers
9220482cd3
The ICU configuration files contain a rule to remove control characters: <transform rule="[:Control:] Any-Remove"/>. This rule is applied before tokenization. The problem is that the "[:Control:]" character class includes line feed, carriage return and tab. See http://www.regular-expressions.info/posixbrackets.html. So when several lines are indexed, the last word of each line is joined with the first word of the next line. Those words are then not searchable. For example, the two lines "First line" and "Second line" become "First lineSecond line", tokenized as "First", "lineSecond" and "line". Test plan: - Use ICU in the Zebra configuration - Choose an indexed field, like 300$a - Create a new record - Enter several lines in the chosen field, like "First line" followed by "Second line" - Index this record => Without the patch, a search on "Second" does not return the record => With the patch, a search on "Second" returns the record - Run the same tests with tab and carriage return instead of line feed Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz> Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com> Signed-off-by: Tomas Cohen Arazi <tomascohen@gmail.com> |
||
---|---|---|
authorities/etc | ||
biblios/etc | ||
etc | ||
lang_defs | ||
marc_defs | ||
xsl | ||
ccl.properties | ||
cql.properties | ||
explain-authorities.xml | ||
explain-biblios.xml | ||
pqf.properties | ||
retrieval-info-auth-dom.xml | ||
retrieval-info-auth-grs1.xml | ||
retrieval-info-bib-dom.xml | ||
retrieval-info-bib-grs1.xml | ||
zebra-authorities-dom.cfg | ||
zebra-authorities.cfg | ||
zebra-biblios-dom.cfg | ||
zebra-biblios.cfg |
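
The word-joining effect described in the commit message can be reproduced outside Zebra. The sketch below is only an illustration in Python, not the patch itself: it assumes that "[:Control:]" corresponds to Unicode general category Cc, and the helper names `strip_controls` and `strip_controls_keep_whitespace` are hypothetical. The second helper shows one possible way to avoid the problem (leaving tab, line feed and carriage return in place), which may differ from what the actual patch does.

```python
import unicodedata


def strip_controls(text):
    """Drop every character in general category Cc (assumed equivalent to [:Control:])."""
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cc")


def strip_controls_keep_whitespace(text):
    """Drop Cc characters except tab, line feed and carriage return,
    so that they still act as word separators for the tokenizer."""
    keep = {"\t", "\n", "\r"}
    return "".join(
        ch for ch in text if ch in keep or unicodedata.category(ch) != "Cc"
    )


text = "First line\nSecond line"

# Removing all control characters glues "line" and "Second" together.
print(strip_controls(text).split())
# ['First', 'lineSecond', 'line']

# Keeping the whitespace controls preserves the word boundary.
print(strip_controls_keep_whitespace(text).split())
# ['First', 'line', 'Second', 'line']
```

Here `str.split()` stands in for the tokenizer; the real ICU tokenization in Zebra is more involved, but the boundary loss happens before it runs, so the same joined token results.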