When looking for a bad MARC record with rebuild_zebra_sliced.sh, it is
useful to skip the full MARCXML export from Koha and reuse the previously
exported files for Zebra indexing.
This patch adds a new parameter:
  -x | --exclude-export  Do not export biblios from Koha, but use the
                         existing export-dir
which depends on:
  -d | --export-dir      Where rebuild_zebra.pl will export data
                         Default: $EXPORTDIR
!---------!
! TEST PLAN !
!---------!
1. Run
"./rebuild_zebra_sliced.sh --length 1000"
to export 1000 MARC Records
and slice them to one big 1000-Record chunk.
2. Suppose you get a "stack smashing detected" error that crashes your
indexing at some record you cannot identify from the indexing log.
3. Start looking for the bad Record by running:
"./rebuild_zebra_sliced.sh --exclude-export --chunk-size 10"
This skips the biblio export from Koha, which takes ~2h, and goes straight
to splitting your exported biblios into chunks of 10 and indexing them. You
then know which chunk fails, so it is much easier to find the issue there.
Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Signed-off-by: Bernardo Gonzalez Kriegel <bgkriegel@gmail.com>
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
All tests and QA script pass.
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
When in DOM index mode, files exported by `rebuild_zebra.pl -x` are
wrapped in a '<collection></collection>' tag.
This is a problem because splitting such a file produces chunks that lack
the wrapping tag and are therefore invalid.
This is fixed by adding the missing <collection> tags to each generated
file.
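
As an illustration of the <collection> fix, here is a minimal sketch rather
than the actual patch. The helper name wrap_collection is hypothetical, and
the sketch assumes the chunk file holds only bare <record> elements. It uses
a plain <collection> tag as written above; a real export may carry
additional attributes on that tag.

```shell
#!/bin/sh
# Hypothetical helper (not from the patch): wrap a chunk of bare <record>
# elements in <collection> tags so the chunk is a well-formed file again.
#   $1: input chunk file containing only <record>...</record> elements
#   $2: output file
wrap_collection() {
    {
        printf '%s\n' '<collection>'
        cat "$1"
        printf '%s\n' '</collection>'
    } > "$2"
}
```

Usage would look like `wrap_collection chunk_0001 chunk_0001.xml`, where
both file names are placeholders.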
Another problem was that the wrong zebra configuration file was used.
The script now uses C4::Context->zebraconfig($server)->{config} to know
which configuration file has to be used.
Signed-off-by: Bernardo Gonzalez Kriegel <bgkriegel@gmail.com>
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
This avoids indexing failures due to "bad offset" or "bad length" errors
with the ISO2709 format
+ minor improvements:
- --length parameter is optional. If not given, the script runs the
appropriate SQL query to find the number of records to index
- new parameter --reset-index. If set, the index is reset before indexing
Signed-off-by: Bernardo Gonzalez Kriegel <bgkriegel@gmail.com>
Comment: Work as described. No errors.
Test: Edit a record to make it longer than 9999. Without the patch,
rebuild_sliced fails. With the patches, it works.
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
indexing not indexation
some minor grammatical changes
Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
Complete rewrite of rebuild_zebra_sliced.zsh (renamed to .sh). Main
improvements are:
- both biblio and authority records are handled
- records are exported only once
It also adds a --skip-index option to rebuild_zebra.pl, which allows
rebuild_zebra.pl to be used as an 'export only' script.
Description:
Index Koha records in chunks. This is useful when some record causes
errors and stops the indexing process. With this script, if indexing
of one chunk fails, the chunk is split into 2 (or 3) chunks, and
indexing continues on these chunks.
rebuild_zebra.pl is called only once to export records.
Splitting and indexing is handled by this script (using yaz-marcdump and
zebraidx).
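
The split-on-failure idea described above can be sketched as follows. This
is a simplified illustration, not the script's actual code. It assumes one
record per line, so head and tail stand in for the yaz-marcdump splitting.
index_chunk is a hypothetical stand-in for the zebraidx call that fails
whenever the chunk contains the marker BAD. The sketch always splits in two,
never three.

```shell
#!/bin/sh
# Hypothetical stand-in for the real indexing call: succeeds unless the
# chunk contains the marker BAD.
index_chunk() {
    ! grep -q 'BAD' "$1"
}

# Try to index a chunk; on failure, split it into two halves and retry each
# half, until the failing chunk is a single record.
# Only positional parameters are used across the recursive calls, because
# they are local to each function invocation in POSIX sh.
index_or_split() {
    if index_chunk "$1"; then
        return 0
    fi
    total=$(( $(wc -l < "$1") ))
    if [ "$total" -le 1 ]; then
        echo "bad record: $(cat "$1")"
        return 0
    fi
    half=$(( (total + 1) / 2 ))
    head -n "$half" "$1" > "$1.a"
    tail -n +"$((half + 1))" "$1" > "$1.b"
    index_or_split "$1.a"
    index_or_split "$1.b"
}
```

Because only the failing half is split again, the healthy records in a chunk
still get indexed and the bad record is isolated in a number of steps
logarithmic in the chunk size.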
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>