Koha-community/Koha - Koha: The world's first free and open source library system

Author	SHA1	Message	Date
Tomas Cohen Arazi	b54c854d24	Bug 36901: Add logging for uncaught exceptions in background job classes This patch adds logging of unhandled exceptions that could occur. This is happening on busy production sites right now. This is also useful for plugin jobs that might not be 100% following the guidelines and would benefit from this. But as the [DO NOT PUSH] patch highlights, this is something we really want to have on our current codebase, as a database connection drop might make us reach that `catch` block we are adding logging to on this patch. To test: 1. Apply the [DO NOT PUSH] patch 2. Run: $ ktd --shell k$ restart_all ; tail -f /var/log/koha/kohadev/worker*.log 3. Pick a valid barcode on the staff UI 4. Use the 'Batch delete items' tool in the cataloguing section 5. Start the job for deleting the item => FAIL: The item got deleted, but the job marked as failed and no logs about the reasons 6. Apply this patch and repeat 2-5 => SUCCESS: Same scenario but there's a log with the error message 7. Sign off :-D Signed-off-by: David Nind <david@davidnind.com> Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com> Signed-off-by: Katrin Fischer <katrin.fischer@bsz-bw.de>	2024-07-01 18:55:54 +02:00
Kyle M Hall	e29e058851	Bug 33898: Implement reaping for database polling Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com> Signed-off-by: Victor Grousset/tuxayo <victor@tuxayo.net> Signed-off-by: Katrin Fischer <katrin.fischer@bsz-bw.de>	2024-03-15 09:38:37 +01:00
Marcel de Rooy	bc42006019	Bug 33898: Alternative approach with receive frame timeout See bug 33898 comment6. Test plan: Based on comment12: Stage MARC import and manage. Look at the same time to the results of ps aux\|grep Z. Verify that the lines with [background_jobs] <defunct> disappear within 10 seconds. Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl> Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com> Signed-off-by: Victor Grousset/tuxayo <victor@tuxayo.net> Signed-off-by: Katrin Fischer <katrin.fischer@bsz-bw.de>	2024-03-15 09:38:36 +01:00
Marcel de Rooy	c86e6ee82d	Bug 35819: (QA follow-up) Prevent warning on uninitialized retries count Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl> Signed-off-by: Katrin Fischer <katrin.fischer@bsz-bw.de>	2024-03-07 16:36:39 +01:00
Jonathan Druart	30e69ebc56	Bug 35819: Improve logging Log (warn) if the job will be processed later, but add a debug however. Have a specific log for bad status Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io> Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl> Signed-off-by: Katrin Fischer <katrin.fischer@bsz-bw.de>	2024-03-07 16:36:38 +01:00
Jonathan Druart	8394092568	Bug 35819: nack and not requeue if frame is invalid If a frame cannot be correctly processed (most probably because the body is not valid JSON) then we are not acking or nacking the frame and the worker is stuck. In this specific case we should nack without requeuing the frame. NOTE that requeue must be 'true' or 'false', not 1 or 0, or the default 'true' will be used. Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io> Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl> Signed-off-by: Katrin Fischer <katrin.fischer@bsz-bw.de>	2024-03-07 16:36:38 +01:00
Marcel de Rooy	0feb10df7f	Bug 35819: Add simple delay Here I add 500 ms. In my testing with the 1s sleep from the test plan, I might see one or two 'not found' lines. Obviously things depend on the time needed before the txn commits. But it will reduce a flood of these messages. Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl> Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io> Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl> Signed-off-by: Katrin Fischer <katrin.fischer@bsz-bw.de>	2024-03-07 16:36:37 +01:00
Tomas Cohen Arazi	1a51c2e973	Bug 35819: Notify NACK and requeue when Job ID not found This patch makes the worker reject the incoming frame for putting the message back in the queue, in the event the job id doesn't exist yet. Which is the case when some actions are being triggered inside a transaction which hasn't been commited to the DB yet. To test you will need 3 KTD shells (a) mysql: $ ktd --shell k$ sudo koha-mysql kohadev (b) logs: $ ktd --shell # for restarting the worker and looking at the logs k$ sudo koha-worker --restart kohadev ; tail -f /var/log/koha/kohadev/worker-.log (c) running the test: $ ktd --shell 1. Have (a), (b) and (c) terminals ready 2. On (c), run: $ perl -MKoha::Database -MKoha::BackgroundJob::BatchUpdateBiblioHoldsQueue -e 'Koha::Database->schema->txn_do( sub { Koha::BackgroundJob::BatchUpdateBiblioHoldsQueue->new->enqueue({ biblio_ids => [ 1 ] }); sleep 1; } );' => FAIL: (b) shows (once) an error about a job not existing 3. On (a) run: > SELECT * FROM background_jobs; => FAIL: Notice the job ID mentioned on 2 stands as 'new'. 4. Apply this patch 5. Ctrl+c on (b), and re-run to launch the worker with the patch applied 6. Repeat 2 => SUCCESS (partial): The error about the job not existing is displayed many times 7. Repeat 3 => SUCCESS: The job ID mentioned on 6 stands as 'finished'. 8. Sign off :-D Discussion: * The `requeue` header I added is correct, but it is the default behavior anyway. I prefered to make it explicit, though. * To avoid that bunch of retries, we should requeue with some delay. I didn't manage to make it work (yet) but there's a 'delay' plugin for rabbit [1]. We already install the 'stomp' plugin in koha-common.postinst. But this plugin requires downloading it. Which would require further investigation. * As Nick and Marcel pointed, we need to revisit the whole architecture, the need of a MQ (DB polling wouldn't have this problem), etc. But that's for another place. [1] https://hevodata.com/learn/rabbitmq-delayed-message/#:~:text=To%20delay%20a%20message%2C%20the,to%20queues%20or%20other%20exchanges. Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl> Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io> Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl> Signed-off-by: Katrin Fischer <katrin.fischer@bsz-bw.de>	2024-03-07 16:36:36 +01:00
Julian Maurice	972efe8b5c	Bug 36009: Document background_jobs_worker.pl --queue elastic_index Signed-off-by: Julian Maurice <julian.maurice@biblibre.com> Signed-off-by: Katrin Fischer <katrin.fischer@bsz-bw.de>	2024-03-07 15:02:49 +01:00
David Cook	2988c54971	Bug 35111: Ignore SIGPIPE in background jobs worker This change explicitly ignores SIGPIPE signals in the background jobs worker. Daemons like Starman ignore SIGPIPE so it makes sense to explicitly set this. Differences in the inner workings of MySQL vs MariaDB client libraries have yielded different behaviours in automatic reconnections and potentially SIGPIPE handling, so this helps to make the overall behaviour more consistent. Test plan: 0. Apply patch and run "restart_all" 1. Go to http://localhost:8081/cgi-bin/koha/catalogue/detail.pl?biblionumber=29 2. Click "Save" > "MARCXML" 3. Go to http://localhost:8081/cgi-bin/koha/tools/stage-marc-import.pl 4. Click "Choose file", choose the MARCXML file, click "Upload file" 5. Click "Stage for import" 6. Note the job is marked as "100% Finished" 7. In a separate window run "docker restart koha-db-1" 8. Repeat steps 3-5 for uploading file and running stage for import 9. Note that the job is marked as "100% Finished" as you'd expect Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl> [EDIT] Added comment on the SIG PIPE line. Signed-off-by: Victor Grousset/tuxayo <victor@tuxayo.net> Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>	2023-10-30 09:02:14 -03:00
Marcel de Rooy	f1edd8fe77	Bug 32305: Worker should ignore wrong status or queue in MQ mode MQ mode: The worker has seen that MQ is running and consumes messages instead of polling db only. Test plan: Stop worker for long tasks. Make sure that Rabbit MQ runs. Stage a file. (This adds a long task.) Goto staff view of jobs and cancel this job. Check if job is still in MQ with rabbitmqctl list_queues. Now start worker for long tasks. Check if job is gone in MQ with rabbitmqctl list_queues. And check logfile for the adjusted warning like: [WARN] Job 5 not found, or has wrong status/queue main:: /usr/share/koha/misc/workers/background_jobs_worker.pl (134) Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl> Signed-off-by: Nick Clemens <nick@bywatersolutions.com> Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com> Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl> [EDIT] Removed queue from query at MQ side. Discussed on IRC. Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>	2023-10-20 11:02:55 -03:00
Emily Lamancusa	a3c64f62c0	Bug 34470: Initialize random seed after spawning a child worker process When background_jobs_worker.pl spawns a new child process, it needs to explicitly reinitialize the random seed - otherwise each child process will inherit the same random seed from the parent process, and any randomization will produce identical results each time. This patch adds a call to srand immediately after the fork to reinitialize the seed. Note that child processes should not call srand with no parameter anywhere else, as the Perl documentation indicates that srand should not be called with no parameter more than once per process. To test: 1. Apply the logging patch only 2. Set system preferences: a. RealTimeHoldsQueue -> Enable b. RandomizeHoldsQueueWeight -> in random order 3. Watch the logs for the staff interface in ktd: ktd --shell koha-intra-err 4. Place a hold. Note that the logs display the branch list before and after it is randomized. 5. Place some more holds. Note that the branch order after randomization is identical each time. 6. Apply both patches and restart_all 7. Repeat steps 3-5. -> Note that the branch order before randomization hasn't changed -> Note that the branch order after randomization is now different each time. Signed-off-by: Nick Clemens <nick@bywatersolutions.com> Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>	2023-08-15 11:25:48 +03:00
Marcel de Rooy	3bf4addb24	Bug 32558: (QA follow-up) Leave default to 1, remove extra fork The default of 1 resembles the old behavior: 1 fork for the job. Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl> Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com> Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>	2023-03-06 13:57:53 -03:00
Kyle M Hall	311d68815f	Bug 32558: Add ability for background_jobs_worker.pl to process multiple jobs simultaneously up to a limit Right now background_jobs_worker.pl only processes jobs in serial. It would make sense to handle jobs in parallel up to a user definable limit. Test Plan: 1) Apply this patch 2) Stop background_jobs_worker.pl 3) Generate some background jobs by editing records, placing holds, etc 4) Watch processes in a new terminal: watch -n 0.1 'ps aux \| grep background_jobs_worker.pl' 5) Run background_jobs_worker.pl with parameter -m 3 or some other number of max processes 6) Note the multiple forked processes in the ps output Test notes - also tested the following on KTD: 1. Stop background_jobs_worker.pl 2. Edit /etc/koha/sites/kohadev/koha-conf.xml - set max_processes to 10 3. Generate some background jobs 4. Watch processes in a new terminal: watch -n 0.1 'ps aux \| grep background_jobs_worker.pl' 5. Restart all 6. Confirm multiple forked processes in the ps output Both methods work as expected and generate multiple forked processes based on the value set for max processes. Signed-off-by: emlam <emily.lamancusa@montgomerycountymd.gov> Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl> Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>	2023-03-06 13:57:53 -03:00
Nick Clemens	aba2453ad6	Bug 32594: Add a dedicated Elasticsearch biblio indexing background queue Currently we generate large numebrs if single record reindex for circulation and other actions. It can take a long time to process these as we need to load the ES settings for each. This patch updates the Elasticsearch background jobs to throw records into a new queue that can be processed by it's own worker and adds a dedicated worker that batches the jobs every 1 second. To test: 1 - Apply patches, set SearchEngine system preference to 'Elasticsearch' 2 - perl misc/search_tools/es_indexer_daemon.pl 3 - Leave the running in terminal and perform actions in staff interface: - Checking out a bib - Returning a bib - Editing a single bib - Editing a single item - Batch editing bibs - Batch editing items 4 - Confirm for each action that records are updated in search/search results 5 - Stop the script 6 - set SearchEngine system preference to 'Zebra' 7 - perl misc/search_tools/es_indexer_daemon.pl 8 - Script dies as Elasticsearch not enabled Signed-off-by: David Nind <david@davidnind.com> Signed-off-by: Emily Lamancusa <emily.lamancusa@montgomerycountymd.gov> Bug 32594: (follow-up) Adjust logging per bug 32612 JD amended patch: tidy! There were tabs here... Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org> Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>	2023-02-24 17:52:20 -03:00
Nick Clemens	4c51596a4b	Bug 32992: Move background_jobs_worker to misc/workers On bug 32594 we are adding a new worker, dedicated to Elastic indexing. We should have a common place for workers, and we agreed on misc/workers To test: 1 - Apply patch 2 - reset_all in koha testing docker 3 - ps aux \| grep background 4 - Confirm the workers are running, and running in the new directory 5 - Perform a batch item modification 6 - Ensure the job is processed by the worker Signed-off-by: David Nind <david@davidnind.com> Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org> Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>	2023-02-24 17:52:19 -03:00

16 commits