Koha/etc/zebradb
Fridolin Somers 9220482cd3 Bug 13064 - Indexing problem with ICU on control characters
The ICU configuration files contain a rule that removes control characters:
  <transform rule="[:Control:] Any-Remove"/>
This rule is applied before tokenization.

The problem is that the "[:Control:]" character class includes line feed, carriage return and tab. See http://www.regular-expressions.info/posixbrackets.html.
So when several lines are indexed, the last word of a line is joined with the first word of the next line. Those words are then not searchable.

For example:
  First line
  Second line
This will become "First lineSecond line", tokenized as "First", "lineSecond" and "line".
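
The effect can be reproduced outside Zebra with a minimal Python sketch. It is only an approximation: it assumes the Unicode general category "Cc" stands in for "[:Control:]" and that whitespace splitting stands in for ICU tokenization. The function names are hypothetical, and the second variant is one possible way to avoid the problem, not necessarily what the patch does.

```python
import unicodedata


def is_control(ch):
    # General category "Cc" covers line feed, carriage return and tab.
    return unicodedata.category(ch) == "Cc"


def tokens_remove_all_controls(text):
    # Mimics the problematic order: strip every control character,
    # then tokenize (whitespace split is a stand-in for ICU tokenization).
    stripped = "".join(ch for ch in text if not is_control(ch))
    return stripped.split()


def tokens_keep_whitespace_controls(text):
    # Hypothetical alternative: strip only control characters that are
    # not whitespace, so line breaks and tabs still separate words.
    stripped = "".join(
        ch for ch in text if not is_control(ch) or ch.isspace()
    )
    return stripped.split()


if __name__ == "__main__":
    sample = "First line\nSecond line"
    print(tokens_remove_all_controls(sample))
    print(tokens_keep_whitespace_controls(sample))
```

With the first function, "Second" never appears as a token, which matches the failed search described in the test plan below.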

Test plan:
- Use ICU in Zebra configuration
- Choose an indexed field, like 300$a
- Create a new record
- Enter several lines in the chosen field, like:
  First line
  Second line
- Index this record
=> Without the patch, a search on "Second" does not return the record
=> With the patch, a search on "Second" returns the record
- Repeat the same tests with tab and carriage return instead of line feed

Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@gmail.com>
2014-11-14 12:03:12 -03:00
..
authorities/etc Bug 9612: (follow-up) restore elementSetName in Context.pm 2014-05-19 16:46:57 +00:00
biblios/etc Revert "Bug 9828: More specific indexing of UNIMARC 6XX fields" 2014-10-28 12:02:34 -03:00
etc Bug 13064 - Indexing problem with ICU on control characters 2014-11-14 12:03:12 -03:00
lang_defs Bug 10431 - Redundant mappings removed 2013-07-05 06:56:44 -07:00
marc_defs Bug 13163: NORMARC DOM config missing <id> entry 2014-10-31 16:45:04 -03:00
xsl Bug 11232: (qa followup) empty ID due to namespace mistake 2014-10-15 12:55:52 -03:00
ccl.properties Revert "Bug 9828: More specific indexing of UNIMARC 6XX fields" 2014-10-28 12:02:34 -03:00
cql.properties Add more zebra configuration 2007-10-01 15:34:16 -05:00
explain-authorities.xml Bug 5370: Fix all references to koha.org 2010-11-09 10:45:27 +13:00
explain-biblios.xml Bug 5370: Fix all references to koha.org 2010-11-09 10:45:27 +13:00
pqf.properties fixing a couple mappings for SRU CQL server 2008-01-03 03:01:14 -06:00
retrieval-info-auth-dom.xml Bug 9612: fix SRU response for DOM indexing 2014-05-05 20:28:04 +00:00
retrieval-info-auth-grs1.xml Bug 3087 Fix Z39.50 server to return the correct record syntax 2012-10-22 14:12:22 +02:00
retrieval-info-bib-dom.xml Bug 11232: Add new syntax for facets definition on koha-indexdefs-to-zebra.xsl 2014-10-15 12:55:33 -03:00
retrieval-info-bib-grs1.xml Bug 3087 Fix Z39.50 server to return the correct record syntax 2012-10-22 14:12:22 +02:00
zebra-authorities-dom.cfg Bug 11362 - increase zebra AUTH register sizes, from 4G to 20G 2014-10-24 09:41:04 -03:00
zebra-authorities.cfg Bug 11362 - increase zebra AUTH register sizes, from 4G to 20G 2014-10-24 09:41:04 -03:00
zebra-biblios-dom.cfg Bug 11232: (qa followup) Add missing fields/subfields to the item types faceta 2014-10-15 12:55:47 -03:00
zebra-biblios.cfg Bug 7041: Sort >1000 search results with sortmax parameter in zebra config file 2011-12-03 08:15:34 +01:00