bug 9496: improve error checking in rebuild_zebra.pl
When using rebuild_zebra to index all records, skip over
bibliographic or authority records that don't come out
as valid XML. Also, strip extraneous XML declarations when
using --nosanitize.
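
The two behaviours described above can be sketched roughly as follows. This is an illustrative Python sketch, not the Perl code in the patch: the function names, the regular expression, and the choice to apply both steps in one pass are assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET

# Matches an XML declaration such as <?xml version="1.0" encoding="UTF-8"?>
# (hypothetical pattern, chosen for this sketch)
XML_DECL = re.compile(r'<\?xml\b[^>]*\?>\s*')


def strip_xml_declarations(marcxml):
    """Remove every XML declaration so records can be concatenated
    into a single export file without confusing the indexer."""
    return XML_DECL.sub('', marcxml)


def export_valid_records(records):
    """Take an iterable of (record_id, marcxml) pairs.

    Returns (exported, skipped): the cleaned XML of every record that
    parses, and the ids of the records that were left out.
    """
    exported, skipped = [], []
    for record_id, marcxml in records:
        body = strip_xml_declarations(marcxml)
        try:
            ET.fromstring(body)  # raises ParseError on malformed XML
        except ET.ParseError:
            skipped.append(record_id)  # report it, but keep going
            continue
        exported.append(body)
    return exported, skipped
```

The point of the sketch is that a single malformed record is reported and left out of the export instead of ending the whole indexing run.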
Test plans
----------
Note that both plans assume that DOM indexing is turned on.
Test plan #1
============
[1] Run rebuild_zebra.pl with the -x --nosanitize options. Without
the patch, zebraidx should terminate early and complain
about invalid XML.
[2] With the patch, rebuild_zebra.pl should run without
error.
Test plan #2
============
[1] Intentionally make a MARCXML record invalid, e.g., by running
the following SQL:
UPDATE biblioitems SET marcxml = CONCAT(marcxml, 'junk')
WHERE biblionumber = 123;
[2] Run rebuild_zebra.pl -b -x -r
[3] Without the patch, only part of the database will be indexed.
[4] With the patch, rebuild_zebra.pl will not export the bad
record and will give an error message saying so, but will
successfully index the rest of the records.
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
Signed-off-by: Larry Baerveldt <larry@bywatersolutions.com>
Signed-off-by: Mason James <mtj@kohaaloha.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>