Koha/misc/migration_tools
Marcel de Rooy 72d0f23367
Bug 38408: Add parallel exporting in rebuild_zebra.pl
The first part of the Zebra rebuild, the export, is made faster by
running it in parallel. The second part, indexing with zebraidx, is
not changed.

A new command-line parameter -forks is added to the rebuild_zebra.pl
script. A subroutine export_marc_records is added between index_records
and export_marc_records_from_sth. The latter routine has a new parameter,
the sequence number of the export file.
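
For illustration only: the patch itself is Perl and may divide the work
differently. The shell sketch below shows one way a range of records could
be split into numbered slices, one per fork. The function name
slice_for_fork and the ceiling-division scheme are assumptions and are not
taken from rebuild_zebra.pl.

```shell
#!/bin/sh
# Hypothetical helper (not part of rebuild_zebra.pl): print "offset length"
# for slice number $3 (0-based) when $1 records are divided over $2 forks.
slice_for_fork() {
    total=$1
    forks=$2
    seq=$3
    # Ceiling division, so the slices together cover every record.
    chunk=$(( (total + forks - 1) / forks ))
    offset=$(( seq * chunk ))
    remaining=$(( total - offset ))
    if [ "$remaining" -lt 0 ]; then
        remaining=0
    fi
    # The last slice may be shorter than the others.
    if [ "$remaining" -lt "$chunk" ]; then
        length=$remaining
    else
        length=$chunk
    fi
    echo "$offset $length"
}

# Usage: 100 records over 3 forks.
for seq in 0 1 2; do
    slice_for_fork 100 3 "$seq"
done
# prints "0 34", "34 34" and "68 32", one pair per line
```

In such a scheme each slice would be written to its own export file,
identified by its sequence number.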

NOTE: This report does not touch koha-rebuild-zebra yet! That will be
done in a follow-up.

Test plan:
Note that the number of forks/records below can be adjusted
according to your server and database setup.

[1] Reindex a subset of 100 records without forks:
    su [YOUR_KOHA_USER]
    misc/migration_tools/rebuild_zebra.pl -a -b -r -d /tmp/rebuild01 -k --length 100
    Check if /tmp/rebuild01/biblio contains one export file for auth/bib.
    Verify that max. 100 auth and bib were indexed (check Auth search, Cataloguing)
[2] Reindex an additional subset of 100 recs with forks (remove -r, add -forks):
    su [YOUR_KOHA_USER]
    misc/migration_tools/rebuild_zebra.pl -a -b -d /tmp/rebuild02 -k --length 100 --offset 100 -forks 3
    Check if /tmp/rebuild02/biblio contains 3 export files for auth/bib.
    Verify that max. 200 auth and bib were indexed (check Auth search, Cataloguing)
[3] Run a full reindex with forks:
    su [YOUR_KOHA_USER]
    misc/migration_tools/rebuild_zebra.pl -a -b -d /tmp/rebuild03 -k -forks 3
    Check both searches again.
[4] Bonus: To get a feel for the speed improvement, reindex a larger production db with and
    without -forks. (Use commands like those above.) You may add -I to skip indexing
    in order to better compare both exports.
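
The comparison in step [4] could be scripted roughly as follows. This is a
sketch, not part of the patch: the helper elapsed_seconds and the output
directories /tmp/rebuild_noforks and /tmp/rebuild_forks are made-up names.
The rebuild_zebra.pl options are the ones used in the test plan above.

```shell
#!/bin/sh
# Hypothetical helper: run a command, discard its output, and print the
# wall-clock time it took in whole seconds.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1 || true
    end=$(date +%s)
    echo $(( end - start ))
}

# Run from the Koha source directory as your Koha user.
# The guard makes the sketch a no-op where the script is not present.
if [ -x misc/migration_tools/rebuild_zebra.pl ]; then
    # -I skips indexing, so only the export phase is timed.
    plain=$(elapsed_seconds misc/migration_tools/rebuild_zebra.pl \
        -a -b -I -d /tmp/rebuild_noforks -k)
    forked=$(elapsed_seconds misc/migration_tools/rebuild_zebra.pl \
        -a -b -I -d /tmp/rebuild_forks -k -forks 3)
    echo "export without forks: ${plain}s, with 3 forks: ${forked}s"
fi
```

Adjust the number of forks to your server, as noted at the top of the test plan.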

Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Reindexed a prod db in 96 mins instead of 150 mins (3 forks, 4 cores). Main gain in
biblio export; complete export took 35 mins, zebraidx 61 mins.
Signed-off-by: Paul Derscheid <paul.derscheid@lmscloud.de>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Katrin Fischer <katrin.fischer@bsz-bw.de>
2025-03-10 11:42:33 +01:00
22_to_30 Bug 38664: Tidy the whole codebase 2025-02-11 14:58:24 +01:00
ifla Bug 38664: Tidy the whole codebase 2025-02-11 14:58:24 +01:00
build_oai_sets.pl Bug 38664: Tidy the whole codebase 2025-02-11 14:58:24 +01:00
buildCOUNTRY.pl Bug 38664: Tidy the whole codebase 2025-02-11 14:58:24 +01:00
buildEDITORS.pl Bug 38664: Tidy the whole codebase 2025-02-11 14:58:24 +01:00
buildLANG.pl Bug 38664: Tidy the whole codebase 2025-02-11 14:58:24 +01:00
bulkmarcimport.pl Bug 38664: Tidy the whole codebase 2025-02-11 14:58:24 +01:00
checkNonIndexedBiblios.pl Bug 38664: Tidy the whole codebase 2025-02-11 14:58:24 +01:00
create_analytical_rel.pl Bug 38664: Tidy the whole codebase 2025-02-11 14:58:24 +01:00
import_lexile.pl Bug 38664: Tidy the whole codebase 2025-02-11 14:58:24 +01:00
koha-svc.pl Bug 38664: Tidy the whole codebase 2025-02-11 14:58:24 +01:00
rebuild_zebra.pl Bug 38408: Add parallel exporting in rebuild_zebra.pl 2025-03-10 11:42:33 +01:00
rebuild_zebra_sliced.sh Bug 13660: Exclude export phase and use existing exported MARCXML - rebuild_zebra_sliced.sh 2018-01-09 17:23:50 -03:00
remove_unused_authorities.pl Bug 38664: Tidy the whole codebase 2025-02-11 14:58:24 +01:00
switch_marc21_series_info.pl Bug 38664: Tidy the whole codebase 2025-02-11 14:58:24 +01:00
upgradeitems.pl Bug 38664: Tidy the whole codebase 2025-02-11 14:58:24 +01:00