This patch adds logging of unhandled exceptions that could occur. This
is happening on busy production sites right now. This is also useful for
plugin jobs that might not be 100% following the guidelines and would
benefit from this.
But as the [DO NOT PUSH] patch highlights, this is something we really
want to have on our current codebase, as a database connection drop
might make us reach that `catch` block we are adding logging to on this
patch.
To test:
1. Apply the [DO NOT PUSH] patch
2. Run:
$ ktd --shell
k$ restart_all ; tail -f /var/log/koha/kohadev/worker*.log
3. Pick a valid barcode on the staff UI
4. Use the 'Batch delete items' tool in the cataloguing section
5. Start the job for deleting the item
=> FAIL: The item got deleted, but the job marked as failed and no logs
about the reasons
6. Apply this patch and repeat 2-5
=> SUCCESS: Same scenario but there's a log with the error message
7. Sign off :-D
Signed-off-by: David Nind <david@davidnind.com>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Katrin Fischer <katrin.fischer@bsz-bw.de>
Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Victor Grousset/tuxayo <victor@tuxayo.net>
Signed-off-by: Katrin Fischer <katrin.fischer@bsz-bw.de>
See bug 33898 comment6.
Test plan:
Based on comment12: Stage MARC import and manage. Look at the same
time to the results of ps aux|grep Z. Verify that the lines with
[background_jobs] <defunct> disappear within 10 seconds.
Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Victor Grousset/tuxayo <victor@tuxayo.net>
Signed-off-by: Katrin Fischer <katrin.fischer@bsz-bw.de>
Log (warn) if the job will be processed later, but add a debug however.
Have a specific log for bad status
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Signed-off-by: Katrin Fischer <katrin.fischer@bsz-bw.de>
If a frame cannot be correctly processed (most probably because the body
is not valid JSON) then we are not acking or nacking the frame and the
worker is stuck.
In this specific case we should nack without requeuing the frame.
NOTE that requeue must be 'true' or 'false', not 1 or 0, or the default
'true' will be used.
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Signed-off-by: Katrin Fischer <katrin.fischer@bsz-bw.de>
Here I add 500 ms. In my testing with the 1s sleep from the test
plan, I might see one or two 'not found' lines. Obviously things
depend on the time needed before the txn commits. But it will
reduce a flood of these messages.
Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Signed-off-by: Katrin Fischer <katrin.fischer@bsz-bw.de>
This patch makes the worker reject the incoming frame for putting the
message back in the queue, in the event the job id doesn't exist yet.
Which is the case when some actions are being triggered inside a
transaction which hasn't been commited to the DB yet.
To test you will need 3 KTD shells
(a) mysql:
$ ktd --shell
k$ sudo koha-mysql kohadev
(b) logs:
$ ktd --shell
# for restarting the worker and looking at the logs
k$ sudo koha-worker --restart kohadev ; tail -f /var/log/koha/kohadev/worker-*.log
(c) running the test:
$ ktd --shell
1. Have (a), (b) and (c) terminals ready
2. On (c), run:
$ perl -MKoha::Database -MKoha::BackgroundJob::BatchUpdateBiblioHoldsQueue -e 'Koha::Database->schema->txn_do( sub { Koha::BackgroundJob::BatchUpdateBiblioHoldsQueue->new->enqueue({ biblio_ids => [ 1 ] }); sleep 1; } );'
=> FAIL:
* (b) shows (once) an error about a job not existing
3. On (a) run:
> SELECT * FROM background_jobs;
=> FAIL: Notice the job ID mentioned on 2 stands as 'new'.
4. Apply this patch
5. Ctrl+c on (b), and re-run to launch the worker with the patch applied
6. Repeat 2
=> SUCCESS (partial): The error about the job not existing is displayed
many times
7. Repeat 3
=> SUCCESS: The job ID mentioned on 6 stands as 'finished'.
8. Sign off :-D
Discussion:
* The `requeue` header I added is correct, but it is the default
behavior anyway. I prefered to make it explicit, though.
* To avoid that bunch of retries, we should requeue with some delay. I
didn't manage to make it work (yet) but there's a 'delay' plugin for
rabbit [1]. We already install the 'stomp' plugin in
koha-common.postinst. But this plugin requires downloading it. Which
would require further investigation.
* As Nick and Marcel pointed, we need to revisit the whole architecture,
the need of a MQ (DB polling wouldn't have this problem), etc. But
that's for another place.
[1] https://hevodata.com/learn/rabbitmq-delayed-message/#:~:text=To%20delay%20a%20message%2C%20the,to%20queues%20or%20other%20exchanges.
Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Signed-off-by: Katrin Fischer <katrin.fischer@bsz-bw.de>
This change explicitly ignores SIGPIPE signals in the background jobs
worker.
Daemons like Starman ignore SIGPIPE so it makes sense to explicitly set this.
Differences in the inner workings of MySQL vs MariaDB client libraries have yielded
different behaviours in automatic reconnections and potentially SIGPIPE handling,
so this helps to make the overall behaviour more consistent.
Test plan:
0. Apply patch and run "restart_all"
1. Go to http://localhost:8081/cgi-bin/koha/catalogue/detail.pl?biblionumber=29
2. Click "Save" > "MARCXML"
3. Go to http://localhost:8081/cgi-bin/koha/tools/stage-marc-import.pl
4. Click "Choose file", choose the MARCXML file, click "Upload file"
5. Click "Stage for import"
6. Note the job is marked as "100% Finished"
7. In a separate window run "docker restart koha-db-1"
8. Repeat steps 3-5 for uploading file and running stage for import
9. Note that the job is marked as "100% Finished" as you'd expect
Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
[EDIT] Added comment on the SIG PIPE line.
Signed-off-by: Victor Grousset/tuxayo <victor@tuxayo.net>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
MQ mode: The worker has seen that MQ is running and consumes messages
instead of polling db only.
Test plan:
Stop worker for long tasks.
Make sure that Rabbit MQ runs.
Stage a file. (This adds a long task.)
Goto staff view of jobs and cancel this job.
Check if job is still in MQ with rabbitmqctl list_queues.
Now start worker for long tasks.
Check if job is gone in MQ with rabbitmqctl list_queues.
And check logfile for the adjusted warning like:
[WARN] Job 5 not found, or has wrong status/queue main:: /usr/share/koha/misc/workers/background_jobs_worker.pl (134)
Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
[EDIT] Removed queue from query at MQ side. Discussed on IRC.
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
When background_jobs_worker.pl spawns a new child process, it needs to
explicitly reinitialize the random seed - otherwise each child process
will inherit the same random seed from the parent process, and any
randomization will produce identical results each time.
This patch adds a call to srand immediately after the fork to
reinitialize the seed. Note that child processes should not call
srand with no parameter anywhere else, as the Perl documentation
indicates that srand should not be called with no parameter more than
once per process.
To test:
1. Apply the logging patch only
2. Set system preferences:
a. RealTimeHoldsQueue -> Enable
b. RandomizeHoldsQueueWeight -> in random order
3. Watch the logs for the staff interface
in ktd:
ktd --shell
koha-intra-err
4. Place a hold. Note that the logs display the branch list before and
after it is randomized.
5. Place some more holds. Note that the branch order after randomization
is identical each time.
6. Apply both patches and restart_all
7. Repeat steps 3-5.
-> Note that the branch order before randomization hasn't changed
-> Note that the branch order after randomization is now different
each time.
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
The default of 1 resembles the old behavior: 1 fork for the job.
Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
Right now background_jobs_worker.pl only processes jobs in serial. It would make sense to handle jobs in parallel up to a user definable limit.
Test Plan:
1) Apply this patch
2) Stop background_jobs_worker.pl
3) Generate some background jobs by editing records, placing holds, etc
4) Watch processes in a new terminal: watch -n 0.1 'ps aux | grep background_jobs_worker.pl'
5) Run background_jobs_worker.pl with parameter -m 3 or some other
number of max processes
6) Note the multiple forked processes in the ps output
Test notes - also tested the following on KTD:
1. Stop background_jobs_worker.pl
2. Edit /etc/koha/sites/kohadev/koha-conf.xml - set max_processes to 10
3. Generate some background jobs
4. Watch processes in a new terminal: watch -n 0.1 'ps aux | grep background_jobs_worker.pl'
5. Restart all
6. Confirm multiple forked processes in the ps output
Both methods work as expected and generate multiple forked processes
based on the value set for max processes.
Signed-off-by: emlam <emily.lamancusa@montgomerycountymd.gov>
Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
Currently we generate large numebrs if single record reindex for circulation and other
actions. It can take a long time to process these as we need to load the ES settings for each.
This patch updates the Elasticsearch background jobs to throw records into a new queue
that can be processed by it's own worker and adds a dedicated worker that batches the jobs
every 1 second.
To test:
1 - Apply patches, set SearchEngine system preference to 'Elasticsearch'
2 - perl misc/search_tools/es_indexer_daemon.pl
3 - Leave the running in terminal and perform actions in staff interface:
- Checking out a bib
- Returning a bib
- Editing a single bib
- Editing a single item
- Batch editing bibs
- Batch editing items
4 - Confirm for each action that records are updated in search/search results
5 - Stop the script
6 - set SearchEngine system preference to 'Zebra'
7 - perl misc/search_tools/es_indexer_daemon.pl
8 - Script dies as Elasticsearch not enabled
Signed-off-by: David Nind <david@davidnind.com>
Signed-off-by: Emily Lamancusa <emily.lamancusa@montgomerycountymd.gov>
Bug 32594: (follow-up) Adjust logging per bug 32612
JD amended patch: tidy! There were tabs here...
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
On bug 32594 we are adding a new worker, dedicated to Elastic indexing.
We should have a common place for workers, and we agreed on misc/workers
To test:
1 - Apply patch
2 - reset_all in koha testing docker
3 - ps aux | grep background
4 - Confirm the workers are running, and running in the new directory
5 - Perform a batch item modification
6 - Ensure the job is processed by the worker
Signed-off-by: David Nind <david@davidnind.com>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
2023-02-24 17:52:19 -03:00
Renamed from misc/background_jobs_worker.pl (Browse further)