A Zebra rebuild has two parts: exporting the records and indexing them
with zebraidx. This patch speeds up the export part by running it in
several forks. The zebraidx part is not changed.
A new command-line parameter -forks is added to the rebuild_zebra.pl
script. A subroutine export_marc_records is added between index_records
and export_marc_records_from_sth. The latter routine gets a new parameter:
the sequence number of the export file.
NOTE: This report does not touch koha-rebuild-zebra yet! That will be
done in a follow-up.
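
The forked export described above can be pictured with the following
minimal sketch. It is written in Python rather than Perl and is not the
code of rebuild_zebra.pl: the slicing strategy, the function names and the
file name pattern "exported_records.N" are assumptions made for
illustration only. It shows one child per slice, each writing its own
export file identified by a sequence number.

```python
import os


def split_into_slices(total, forks):
    """Split `total` records into at most `forks` contiguous (offset, length) slices."""
    forks = max(1, forks)
    size, extra = divmod(total, forks)
    slices = []
    offset = 0
    for i in range(forks):
        # The first `extra` slices take one record more to absorb the remainder.
        length = size + (1 if i < extra else 0)
        if length == 0:
            break
        slices.append((offset, length))
        offset += length
    return slices


def export_with_forks(record_ids, directory, forks):
    """Fork one child per slice; each child writes one numbered export file."""
    children = []
    slices = split_into_slices(len(record_ids), forks)
    for seq, (offset, length) in enumerate(slices, start=1):
        pid = os.fork()
        if pid == 0:
            # Child process: write only its own slice, then leave immediately.
            exit_code = 0
            try:
                # Hypothetical file name; the sequence number keeps files apart.
                path = os.path.join(directory, f"exported_records.{seq}")
                with open(path, "w") as fh:
                    for rid in record_ids[offset:offset + length]:
                        fh.write(f"{rid}\n")
            except Exception:
                exit_code = 1
            os._exit(exit_code)
        children.append(pid)
    # Parent process: wait for every child and fail if one did not exit cleanly.
    for pid in children:
        _, status = os.waitpid(pid, 0)
        if status != 0:
            raise RuntimeError(f"export child {pid} failed")
    return sorted(os.listdir(directory))
```

Because every child writes to a separate file, the children do not need
to coordinate while exporting; the parent only has to wait for all of
them before the indexing step starts.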
Test plan:
Note that the number of forks/records below can be adjusted
according to your server and database setup.
[1] Reindex a subset of 100 records without forks:
su [YOUR_KOHA_USER]
misc/migration_tools/rebuild_zebra.pl -a -b -r -d /tmp/rebuild01 -k --length 100
Check if /tmp/rebuild01/biblio contains one export file for auth/bib.
Verify that at most 100 authority and 100 biblio records were indexed
(check Auth search, Cataloguing).
[2] Reindex an additional subset of 100 records with forks (remove -r, add -forks):
su [YOUR_KOHA_USER]
misc/migration_tools/rebuild_zebra.pl -a -b -d /tmp/rebuild02 -k --length 100 --offset 100 -forks 3
Check if /tmp/rebuild02/biblio contains 3 export files for auth/bib.
Verify that at most 200 authority and 200 biblio records were indexed
(check Auth search, Cataloguing).
[3] Run a full reindex with forks:
su [YOUR_KOHA_USER]
misc/migration_tools/rebuild_zebra.pl -a -b -d /tmp/rebuild03 -k -forks 3
Check both searches again.
[4] Bonus: To get a feel for the speed improvement, reindex a larger production db
with and without -forks, using commands like the ones above. You may add -I to
skip indexing, so that only the two exports are compared.
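
For the comparison in step [4], a small timing helper can be used. This
is a sketch only: the helper names are hypothetical and not part of Koha.
It runs a command to completion, returns the elapsed wall-clock time, and
expresses the gain as a percentage of the run without forks.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    # check=True raises CalledProcessError if the command exits non-zero.
    subprocess.run(argv, check=True)
    return time.perf_counter() - start


def percent_saved(baseline, candidate):
    """Share of the baseline time saved by the candidate run, in percent."""
    return 100.0 * (baseline - candidate) / baseline
```

A possible use, assuming two scratch directories of your own choosing:
call time_command once with
["misc/migration_tools/rebuild_zebra.pl", "-a", "-b", "-r", "-d", "/tmp/noforks", "-k", "-I"]
and once with the same list plus "-forks", "3" and another directory, then
pass both results to percent_saved.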
Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Reindexed a prod db in 96 mins instead of 150 mins (3 forks, 4 cores). Main gain in
biblio export; complete export took 35 mins, zebraidx 61 mins.
Signed-off-by: Paul Derscheid <paul.derscheid@lmscloud.de>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Katrin Fischer <katrin.fischer@bsz-bw.de>