Bug 14456: EmbedSeeFromHeadings record filter shouldn't process MARC holding fields
If the system preference IncludeSeeFromInSearches is enabled, records
exported for zebra indexing are being additionally processed by
EmbedSeeFromHeadings record filter (right now used only in rebuild_zebra.pl
script). This filter embeds 'see from' fields (extracted from authority
records linked with the given biblio via $9 subfields) into target MARC
record, which is then subsequently indexed in zebra.
Currently all fields containing $9 are getting the same exact treatment
by this filter. But on the export stage when the filter is applied, MARC
record being processed already does have holdings data fields added in
the previous stage (usually 952 / 995, depending on the MARC format).
Problem is that holdings data fields use to have $9 subfields in them
as well (mapped to item.itemnumber by default). As a consequence, some
(great many in the typical setup) records exported for zebra indexing
may have surplus "see from" fields added erroneously in semi-random
fashion, so biblio searches would often return some completely
unexpected additional results.
EmbedSeeFromHeadings record filter should not process holdings fields
when dealing with MARC records intended for zebra indexing.
To reproduce:
1) database with as many sample or real-world biblio, item and authority
records as possible is recommended for testing purposes
2) enable IncludeSeeFromInSearches
3) export a bunch of biblio records for zebra (e.g.:
misc/migration_tools/rebuild_zebra.pl -I -b -x -k -length=1000),
inspect the result xml records in /tmp/<whatever> file; observe that at
the end of many records, here and there some extra "see from" (= 1st
indicator: 'z') fields tend to appear, which shouldn't be there ;)
To test:
4) apply patch
5) redo 3)
6) compare results from 3) and 5) with diff
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
I introduced a regression test for this. You should run the tests
without/with the patch and verify that the patch actually fixes the problem.
Good job Jacek! I'm sure writing the regression test would take less time
than such a detailed commit message!
Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
(cherry picked from commit
f3a8b7a0e1e1bf112628c6215105ab80f25ed94f)
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>