Koha/misc
Marcel de Rooy 72d0f23367
Bug 38408: Add parallel exporting in rebuild_zebra.pl
The first part of the Zebra rebuild is the export. This part is now faster
because it can run in parallel. The second part, indexing with zebraidx, is unchanged.

A new command-line parameter, -forks, is added to the rebuild_zebra.pl
script. A new subroutine, export_marc_records, sits between index_records
and export_marc_records_from_sth. The latter routine takes a new parameter:
the sequence number of the export file.
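The general pattern described above (fork N workers, give each a slice of the records, and have each write its own export file tagged with a sequence number) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Perl code in rebuild_zebra.pl. The contiguous (offset, length) split, the helper names split_ranges/export_chunk/parallel_export and the exported_records<N> file name are all hypothetical; the real script may divide and name its export files differently. It requires a POSIX system because it uses os.fork.

```python
import os
import tempfile


def split_ranges(total, forks):
    """Divide `total` records into at most `forks` contiguous (offset, length) ranges.

    Assumption: records are split into contiguous blocks; the real script may differ.
    """
    forks = max(1, min(forks, total)) if total else 1
    base, extra = divmod(total, forks)
    ranges, offset = [], 0
    for i in range(forks):
        length = base + (1 if i < extra else 0)  # spread the remainder over the first ranges
        ranges.append((offset, length))
        offset += length
    return ranges


def export_chunk(records, offset, length, directory, seq):
    """Write one slice to its own file; `seq` is the export file's sequence number.

    The file name "exported_records<seq>" is hypothetical.
    """
    path = os.path.join(directory, f"exported_records{seq}")
    with open(path, "w", encoding="utf-8") as fh:
        for rec in records[offset:offset + length]:
            fh.write(rec + "\n")
    return path


def parallel_export(records, directory, forks):
    """Fork one child per range, then wait for all children in the parent."""
    pids = []
    for seq, (offset, length) in enumerate(split_ranges(len(records), forks), start=1):
        pid = os.fork()
        if pid == 0:  # child: export its slice, then exit without returning to the caller
            status = 0
            try:
                export_chunk(records, offset, length, directory, seq)
            except Exception:
                status = 1
            finally:
                os._exit(status)
        pids.append(pid)  # parent: remember the child to reap it later
    failed = sum(1 for pid in pids if os.waitpid(pid, 0)[1] != 0)
    if failed:
        raise RuntimeError(f"{failed} export worker(s) failed")
    return sorted(os.listdir(directory))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(parallel_export([f"record-{i}" for i in range(100)], d, 3))
```

Run as a script, this writes three files and prints their names. Because each child writes to its own file, the workers never share an output handle, which is why the export step parallelizes cleanly while the single zebraidx run afterwards stays serial.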

NOTE: This report does not touch koha-rebuild-zebra yet! That will be
done in a follow-up.

Test plan:
Note that the number of forks/records below can be adjusted
according to your server and database setup.

[1] Reindex a subset of 100 records without forks:
    su [YOUR_KOHA_USER]
    misc/migration_tools/rebuild_zebra.pl -a -b -r -d /tmp/rebuild01 -k --length 100
    Check that /tmp/rebuild01/biblio contains one export file for auth/bib.
    Verify that at most 100 auth and 100 bib records were indexed (check Auth search, Cataloguing).
[2] Reindex an additional subset of 100 records with forks (remove -r, add -forks):
    su [YOUR_KOHA_USER]
    misc/migration_tools/rebuild_zebra.pl -a -b -d /tmp/rebuild02 -k --length 100 --offset 100 -forks 3
    Check that /tmp/rebuild02/biblio contains 3 export files for auth/bib.
    Verify that at most 200 auth and 200 bib records were indexed (check Auth search, Cataloguing).
[3] Run a full reindex with forks:
    su [YOUR_KOHA_USER]
    misc/migration_tools/rebuild_zebra.pl -a -b -d /tmp/rebuild03 -k -forks 3
    Check both searches again.
[4] Bonus: To get a feel for the speed improvement, reindex a larger production db with and
    without -forks (using commands similar to those above). You may add -I to skip indexing
    so that you compare only the two exports.

Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Reindexed a prod db in 96 mins instead of 150 mins (3 forks, 4 cores). The main gain is in
the biblio export; the complete export took 35 mins, zebraidx 61 mins.
Signed-off-by: Paul Derscheid <paul.derscheid@lmscloud.de>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Katrin Fischer <katrin.fischer@bsz-bw.de>
2025-03-10 11:42:33 +01:00
admin Bug 38664: Tidy the whole codebase 2025-02-11 14:58:24 +01:00
bin Bug 38382: Fresh connection when connexion CSRF token expires 2024-12-20 18:33:25 +01:00
cronjobs Bug 39250: Add archive_purchase_suggestions.pl to cron.daily commented 2025-03-07 15:41:17 +01:00
devel Bug 39149: Make tidy.pl deal with .PL files 2025-02-18 15:35:52 +01:00
interface_customization Bug 23148: Replace Bridge icons with transparent PNG files 2020-07-20 16:16:37 +02:00
maintenance Bug 38762: Make compare_es_to_db.pl provide links to staff interface 2025-02-19 17:05:37 +01:00
migration_tools Bug 38408: Add parallel exporting in rebuild_zebra.pl 2025-03-10 11:42:33 +01:00
release_notes 24.11.00: Add missing manual contributors to release notes 2024-12-03 14:36:23 +01:00
search_tools Bug 38664: Tidy the whole codebase 2025-02-11 14:58:24 +01:00
translator Bug 38871: Remove sub string_list from misc/translator/xgettext.pl 2025-02-19 17:05:36 +01:00
workers Bug 38664: Tidy the whole codebase 2025-02-11 14:58:24 +01:00
add_date_fields_to_marc_records.pl Bug 38664: Tidy the whole codebase 2025-02-11 14:58:24 +01:00
add_statistics_borrowers_categorycode.pl Bug 38664: Tidy the whole codebase 2025-02-11 14:58:24 +01:00
batchCompareMARCvsFrameworks.pl Bug 38664: Tidy the whole codebase 2025-02-11 14:58:24 +01:00
batchdeletebiblios.pl Bug 38664: Tidy the whole codebase 2025-02-11 14:58:24 +01:00
batchDeleteUnusedSubfields.pl Bug 38664: Tidy the whole codebase 2025-02-11 14:58:24 +01:00
batchImportMARCWithBiblionumbers.pl Bug 38664: Tidy the whole codebase 2025-02-11 14:58:24 +01:00
batchRebuildBiblioTables.pl Bug 38664: Tidy the whole codebase 2025-02-11 14:58:24 +01:00
batchRebuildItemsTables.pl Bug 38664: Tidy the whole codebase 2025-02-11 14:58:24 +01:00
batchRepairMissingBiblionumbers.pl Bug 38664: Tidy the whole codebase 2025-02-11 14:58:24 +01:00
check_sysprefs.pl Bug 38664: Tidy the whole codebase 2025-02-11 14:58:24 +01:00
commit_file.pl Bug 38664: Tidy the whole codebase 2025-02-11 14:58:24 +01:00
export_borrowers.pl Bug 38664: Tidy the whole codebase 2025-02-11 14:58:24 +01:00
export_records.pl Bug 36770: (QA follow-up) Tidy export_records.pl 2024-08-09 18:44:54 +02:00
import_patrons.pl Bug 38664: Tidy the whole codebase 2025-02-11 14:58:24 +01:00
koha-install-log Bug 28519: Put CGI::Session::Serialize::yamlxs in lib directory 2021-06-17 10:07:36 +02:00
link_bibs_to_authorities.pl Bug 38664: Tidy the whole codebase 2025-02-11 14:58:24 +01:00
load_yaml.pl Bug 38664: Tidy the whole codebase 2025-02-11 14:58:24 +01:00
mod_zebraqueue.pl Bug 38664: Tidy the whole codebase 2025-02-11 14:58:24 +01:00
process_ill_updates.pl Bug 38664: Tidy the whole codebase 2025-02-11 14:58:24 +01:00
recreateIssueStatistics.pl Bug 38664: Tidy the whole codebase 2025-02-11 14:58:24 +01:00
sax_parser_print.pl Bug 38664: Tidy the whole codebase 2025-02-11 14:58:24 +01:00
sax_parser_test.pl Bug 38664: Tidy the whole codebase 2025-02-11 14:58:24 +01:00
sip_cli_emulator.pl Bug 38664: Tidy the whole codebase 2025-02-11 14:58:24 +01:00
stage_file.pl Bug 38664: Tidy the whole codebase 2025-02-11 14:58:24 +01:00
z3950_responder.pl Bug 38664: Tidy the whole codebase 2025-02-11 14:58:24 +01:00