Koha/etc/zebradb
Fridolin Somers 9220482cd3 Bug 13064 - Indexing problem with ICU on control characters
The ICU configuration files contain a rule to remove control characters :
  <transform rule="[:Control:] Any-Remove"/>
This rule is applied before tokenization.

The problem is that the "[:Control:]" character class includes line feed, carriage return and tab. See http://www.regular-expressions.info/posixbrackets.html.
So when several lines are indexed, the last word of a line is joined with the first word of the next line. Those words are then not searchable.

For example :
  First line
  Second line
This will become "First lineSecond line", tokenized as "First", "lineSecond" and "line".
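
The effect can be reproduced outside Zebra with a rough Python sketch. It does not call ICU: it assumes that removing every character in Unicode general category "Cc" approximates the "[:Control:] Any-Remove" rule, and that splitting on whitespace approximates the tokenizer. The helper names remove_control and tokenize are hypothetical. The second call only skips the removal step for comparison; it is not meant to show what the patch itself changes.

```python
import unicodedata


def remove_control(text):
    """Drop every character whose Unicode general category is Cc (control).

    Assumption: this stands in for the ICU rule "[:Control:] Any-Remove".
    Line feed, carriage return and tab are all in category Cc.
    """
    return "".join(ch for ch in text if unicodedata.category(ch) != "Cc")


def tokenize(text):
    """Very rough stand-in for the tokenizer: split on whitespace."""
    return text.split()


field = "First line\nSecond line"

# Removal before tokenization: the line feed disappears and two words fuse.
broken = tokenize(remove_control(field))

# Without the removal step the line feed still separates the words.
intact = tokenize(field)

print(broken)
print(intact)
```

With the removal step, "Second" never appears as a token of its own, which matches the failed search described in the test plan below.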

Test plan :
- Use ICU in Zebra configuration
- Choose an indexed field, like 300$a
- Create a new record
- Enter several lines in the chosen field, like :
  First line
  Second line
- Index this record
=> Without patch the search on "Second" does not return the record
=> With patch the search on "Second" returns the record
- Repeat the same tests with tab and carriage return instead of line feed

Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@gmail.com>
2014-11-14 12:03:12 -03:00
..
authorities/etc Bug 9612: (follow-up) restore elementSetName in Context.pm 2014-05-19 16:46:57 +00:00
biblios/etc Revert "Bug 9828: More specific indexing of UNIMARC 6XX fields" 2014-10-28 12:02:34 -03:00
etc Bug 13064 - Indexing problem with ICU on control characters 2014-11-14 12:03:12 -03:00
lang_defs
marc_defs Bug 13163: NORMARC DOM config missing <id> entry 2014-10-31 16:45:04 -03:00
xsl Bug 11232: (qa followup) empty ID due to namespace mistake 2014-10-15 12:55:52 -03:00
ccl.properties Revert "Bug 9828: More specific indexing of UNIMARC 6XX fields" 2014-10-28 12:02:34 -03:00
cql.properties
explain-authorities.xml
explain-biblios.xml
pqf.properties
retrieval-info-auth-dom.xml Bug 9612: fix SRU response for DOM indexing 2014-05-05 20:28:04 +00:00
retrieval-info-auth-grs1.xml
retrieval-info-bib-dom.xml Bug 11232: Add new syntax for facets definition on koha-indexdefs-to-zebra.xsl 2014-10-15 12:55:33 -03:00
retrieval-info-bib-grs1.xml
zebra-authorities-dom.cfg Bug 11362 - increase zebra AUTH register sizes, from 4G to 20G 2014-10-24 09:41:04 -03:00
zebra-authorities.cfg Bug 11362 - increase zebra AUTH register sizes, from 4G to 20G 2014-10-24 09:41:04 -03:00
zebra-biblios-dom.cfg Bug 11232: (qa followup) Add missing fields/subfields to the item types faceta 2014-10-15 12:55:47 -03:00
zebra-biblios.cfg