Commit graph

22 commits

Author SHA1 Message Date
b54c854d24
Bug 36901: Add logging for uncaught exceptions in background job classes
This patch adds logging of unhandled exceptions that could occur. This
is happening on busy production sites right now. This is also useful for
plugin jobs that might not be 100% following the guidelines and would
benefit from this.

But as the [DO NOT PUSH] patch highlights, this is something we really
want to have on our current codebase, as a database connection drop
might make us reach that `catch` block we are adding logging to on this
patch.

To test:
1. Apply the [DO NOT PUSH] patch
2. Run:
   $ ktd --shell
  k$ restart_all ; tail -f /var/log/koha/kohadev/worker*.log
3. Pick a valid barcode on the staff UI
4. Use the 'Batch delete items' tool in the cataloguing section
5. Start the job for deleting the item
=> FAIL: The item got deleted, but the job marked as failed and no logs
about the reasons
6. Apply this patch and repeat 2-5
=> SUCCESS: Same scenario but there's a log with the error message
7. Sign off :-D

Signed-off-by: David Nind <david@davidnind.com>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Katrin Fischer <katrin.fischer@bsz-bw.de>
2024-07-01 18:55:54 +02:00
e29e058851
Bug 33898: Implement reaping for database polling
Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Victor Grousset/tuxayo <victor@tuxayo.net>
Signed-off-by: Katrin Fischer <katrin.fischer@bsz-bw.de>
2024-03-15 09:38:37 +01:00
bc42006019
Bug 33898: Alternative approach with receive frame timeout
See bug 33898 comment6.

Test plan:
Based on comment12: Stage MARC import and manage. Look at the same
time to the results of ps aux|grep Z. Verify that the lines with
[background_jobs] <defunct> disappear within 10 seconds.

Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Victor Grousset/tuxayo <victor@tuxayo.net>
Signed-off-by: Katrin Fischer <katrin.fischer@bsz-bw.de>
2024-03-15 09:38:36 +01:00
c86e6ee82d
Bug 35819: (QA follow-up) Prevent warning on uninitialized retries count
Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Signed-off-by: Katrin Fischer <katrin.fischer@bsz-bw.de>
2024-03-07 16:36:39 +01:00
7c1d4716be
Bug 35819: Adjust es_indexer_daemon
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>

Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
[EDIT] Add forgotten module
Signed-off-by: Katrin Fischer <katrin.fischer@bsz-bw.de>
2024-03-07 16:36:39 +01:00
30e69ebc56
Bug 35819: Improve logging
Log (warn) if the job will be processed later, but add a debug however.

Have a specific log for bad status

Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>

Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Signed-off-by: Katrin Fischer <katrin.fischer@bsz-bw.de>
2024-03-07 16:36:38 +01:00
8394092568
Bug 35819: nack and not requeue if frame is invalid
If a frame cannot be correctly processed (most probably because the body
is not valid JSON) then we are not acking or nacking the frame and the
worker is stuck.

In this specific case we should nack without requeuing the frame.

NOTE that requeue must be 'true' or 'false', not 1 or 0, or the default
'true' will be used.

Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>

Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Signed-off-by: Katrin Fischer <katrin.fischer@bsz-bw.de>
2024-03-07 16:36:38 +01:00
0feb10df7f
Bug 35819: Add simple delay
Here I add 500 ms. In my testing with the 1s sleep from the test
plan, I might see one or two 'not found' lines. Obviously things
depend on the time needed before the txn commits. But it will
reduce a flood of these messages.

Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>

Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Signed-off-by: Katrin Fischer <katrin.fischer@bsz-bw.de>
2024-03-07 16:36:37 +01:00
1a51c2e973
Bug 35819: Notify NACK and requeue when Job ID not found
This patch makes the worker reject the incoming frame for putting the
message back in the queue, in the event the job id doesn't exist yet.
Which is the case when some actions are being triggered inside a
transaction which hasn't been commited to the DB yet.

To test you will need 3 KTD shells
(a) mysql:
   $ ktd --shell
  k$ sudo koha-mysql kohadev

(b) logs:
   $ ktd --shell
   # for restarting the worker and looking at the logs
  k$ sudo koha-worker --restart kohadev  ; tail -f /var/log/koha/kohadev/worker-*.log
(c) running the test:
   $ ktd --shell

1. Have (a), (b) and (c) terminals ready
2. On (c), run:
   $ perl -MKoha::Database -MKoha::BackgroundJob::BatchUpdateBiblioHoldsQueue -e 'Koha::Database->schema->txn_do( sub { Koha::BackgroundJob::BatchUpdateBiblioHoldsQueue->new->enqueue({ biblio_ids => [ 1 ] }); sleep 1;  } );'
=> FAIL:
   * (b) shows (once) an error about a job not existing
3. On (a) run:
   > SELECT * FROM background_jobs;
=> FAIL: Notice the job ID mentioned on 2 stands as 'new'.
4. Apply this patch
5. Ctrl+c on (b), and re-run to launch the worker with the patch applied
6. Repeat 2
=> SUCCESS (partial): The error about the job not existing is displayed
many times
7. Repeat 3
=> SUCCESS: The job ID mentioned on 6 stands as 'finished'.
8. Sign off :-D

Discussion:

* The `requeue` header I added is correct, but it is the default
  behavior anyway. I prefered to make it explicit, though.

* To avoid that bunch of retries, we should requeue with some delay. I
  didn't manage to make it work (yet) but there's a 'delay' plugin for
  rabbit [1]. We already install the 'stomp' plugin in
  koha-common.postinst. But this plugin requires downloading it. Which
  would require further investigation.

* As Nick and Marcel pointed, we need to revisit the whole architecture,
  the need of a MQ (DB polling wouldn't have this problem), etc. But
  that's for another place.

[1] https://hevodata.com/learn/rabbitmq-delayed-message/#:~:text=To%20delay%20a%20message%2C%20the,to%20queues%20or%20other%20exchanges.

Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>

Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Signed-off-by: Katrin Fischer <katrin.fischer@bsz-bw.de>
2024-03-07 16:36:36 +01:00
Julian Maurice
972efe8b5c
Bug 36009: Document background_jobs_worker.pl --queue elastic_index
Signed-off-by: Julian Maurice <julian.maurice@biblibre.com>
Signed-off-by: Katrin Fischer <katrin.fischer@bsz-bw.de>
2024-03-07 15:02:49 +01:00
7496a603cd
Bug 35086: Also split chunks when indexing from background job
The es background indexer is designed to combine background jobs when started based on the 'batch_size' option.

While this is helpful for combining individual updates, it can be problematic when there are several large batch modifications, or when worker has stopped and is restarted.

This patch uses the same logic as in the indexer to split the chunks that are sent directly for indexing.

To test:
1 - Follow test plan on previous patch
2 - Confirm items are correctly indexed and jobs marked

Signed-off-by: David Nind <david@davidnind.com>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Signed-off-by: Katrin Fischer <katrin.fischer@bsz-bw.de>
2024-01-16 12:06:00 +01:00
2988c54971
Bug 35111: Ignore SIGPIPE in background jobs worker
This change explicitly ignores SIGPIPE signals in the background jobs
worker.

Daemons like Starman ignore SIGPIPE so it makes sense to explicitly set this.
Differences in the inner workings of MySQL vs MariaDB client libraries have yielded
different behaviours in automatic reconnections and potentially SIGPIPE handling,
so this helps to make the overall behaviour more consistent.

Test plan:
0. Apply patch and run "restart_all"
1. Go to http://localhost:8081/cgi-bin/koha/catalogue/detail.pl?biblionumber=29
2. Click "Save" > "MARCXML"
3. Go to http://localhost:8081/cgi-bin/koha/tools/stage-marc-import.pl
4. Click "Choose file", choose the MARCXML file, click "Upload file"
5. Click "Stage for import"
6. Note the job is marked as "100% Finished"

7. In a separate window run "docker restart koha-db-1"
8. Repeat steps 3-5 for uploading file and running stage for import
9. Note that the job is marked as "100% Finished" as you'd expect

Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
[EDIT] Added comment on the SIG PIPE line.
Signed-off-by: Victor Grousset/tuxayo <victor@tuxayo.net>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
2023-10-30 09:02:14 -03:00
709dabb7e8
Bug 32305: Counterpart for es_indexer_daemon.pl
This is a kind of copy-and-paste variant of background worker.
We probably could do better than having two scripts here ;)

Test plan:
See former test plan. Apply it to ES indexing.

Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>

Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
[EDIT] Removed queue from query at MQ side. Discussed on IRC.
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
2023-10-20 11:02:56 -03:00
f1edd8fe77
Bug 32305: Worker should ignore wrong status or queue in MQ mode
MQ mode: The worker has seen that MQ is running and consumes messages
instead of polling db only.

Test plan:
Stop worker for long tasks.
Make sure that Rabbit MQ runs.
Stage a file. (This adds a long task.)
Goto staff view of jobs and cancel this job.
Check if job is still in MQ with rabbitmqctl list_queues.
Now start worker for long tasks.
Check if job is gone in MQ with rabbitmqctl list_queues.
And check logfile for the adjusted warning like:
  [WARN] Job 5 not found, or has wrong status/queue main:: /usr/share/koha/misc/workers/background_jobs_worker.pl (134)

Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>

Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
[EDIT] Removed queue from query at MQ side. Discussed on IRC.
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
2023-10-20 11:02:55 -03:00
Emily Lamancusa
a3c64f62c0
Bug 34470: Initialize random seed after spawning a child worker process
When background_jobs_worker.pl spawns a new child process, it needs to
explicitly reinitialize the random seed - otherwise each child process
will inherit the same random seed from the parent process, and any
randomization will produce identical results each time.

This patch adds a call to srand immediately after the fork to
reinitialize the seed. Note that child processes should not call
srand with no parameter anywhere else, as the Perl documentation
indicates that srand should not be called with no parameter more than
once per process.

To test:
1. Apply the logging patch only
2. Set system preferences:
    a. RealTimeHoldsQueue -> Enable
    b. RandomizeHoldsQueueWeight -> in random order
3. Watch the logs for the staff interface
   in ktd:
   ktd --shell
   koha-intra-err
4. Place a hold. Note that the logs display the branch list before and
   after it is randomized.
5. Place some more holds. Note that the branch order after randomization
   is identical each time.
6. Apply both patches and restart_all
7. Repeat steps 3-5.
   -> Note that the branch order before randomization hasn't changed
   -> Note that the branch order after randomization is now different
      each time.

Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
2023-08-15 11:25:48 +03:00
d8721bbc36
Bug 32594: Mark jobs as started and finished
Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
2023-03-28 14:49:44 +02:00
8e00746095
Bug 33108: (follow-up) Don't die, only warn if not using ES
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
2023-03-20 09:39:47 -03:00
3bf4addb24 Bug 32558: (QA follow-up) Leave default to 1, remove extra fork
The default of 1 resembles the old behavior: 1 fork for the job.

Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
2023-03-06 13:57:53 -03:00
311d68815f Bug 32558: Add ability for background_jobs_worker.pl to process multiple jobs simultaneously up to a limit
Right now background_jobs_worker.pl only processes jobs in serial. It would make sense to handle jobs in parallel up to a user definable limit.

Test Plan:
1) Apply this patch
2) Stop background_jobs_worker.pl
3) Generate some background jobs by editing records, placing holds, etc
4) Watch processes in a new terminal: watch -n 0.1 'ps aux | grep background_jobs_worker.pl'
5) Run background_jobs_worker.pl with parameter -m 3 or some other
   number of max processes
6) Note the multiple forked processes in the ps output

Test notes - also tested the following on KTD:
1. Stop background_jobs_worker.pl
2. Edit /etc/koha/sites/kohadev/koha-conf.xml - set max_processes to 10
3. Generate some background jobs
4. Watch processes in a new terminal: watch -n 0.1 'ps aux | grep background_jobs_worker.pl'
5. Restart all
6. Confirm multiple forked processes in the ps output

Both methods work as expected and generate multiple forked processes
based on the value set for max processes.

Signed-off-by: emlam <emily.lamancusa@montgomerycountymd.gov>

Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
2023-03-06 13:57:53 -03:00
a3087d76bd
Bug 32594: (QA follow-up) Fix POD
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
2023-03-02 09:37:32 -03:00
aba2453ad6
Bug 32594: Add a dedicated Elasticsearch biblio indexing background queue
Currently we generate large numebrs if single record reindex for circulation and other
actions. It can take a long time to process these as we need to load the ES settings for each.

This patch updates the Elasticsearch background jobs to throw records into a new queue
that can be processed by it's own worker and adds a dedicated worker that batches the jobs
every 1 second.

To test:
1 - Apply patches, set SearchEngine system preference to 'Elasticsearch'
2 - perl misc/search_tools/es_indexer_daemon.pl
3 - Leave the running in terminal and perform actions in staff interface:
    - Checking out a bib
    - Returning a bib
    - Editing a single bib
    - Editing a single item
    - Batch editing bibs
    - Batch editing items
4 - Confirm for each action that records are updated in search/search results
5 - Stop the script
6 - set SearchEngine system preference to 'Zebra'
7 - perl misc/search_tools/es_indexer_daemon.pl
8 - Script dies as Elasticsearch not enabled

Signed-off-by: David Nind <david@davidnind.com>
Signed-off-by: Emily Lamancusa <emily.lamancusa@montgomerycountymd.gov>

Bug 32594: (follow-up) Adjust logging per bug 32612

JD amended patch: tidy! There were tabs here...

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
2023-02-24 17:52:20 -03:00
4c51596a4b
Bug 32992: Move background_jobs_worker to misc/workers
On bug 32594 we are adding a new worker, dedicated to Elastic indexing.
We should have a common place for workers, and we agreed on misc/workers

To test:
1 - Apply patch
2 - reset_all in koha testing docker
3 - ps aux | grep background
4 - Confirm the workers are running, and running in the new directory
5 - Perform a batch item modification
6 - Ensure the job is processed by the worker

Signed-off-by: David Nind <david@davidnind.com>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
2023-02-24 17:52:19 -03:00