Commit graph

96 commits

Author SHA1 Message Date
bee097b39b Bug 14302: Remove GRS1 specific code
Remove:
- BIB_INDEX_MODE and AUTH_INDEX_MODE env var
- bib_index_mode and auth_index_mode options from scripts
- Warnings from the about page; only one is kept, shown if zebra_bib_index_mode or
zebra_auth_index_mode still exists in the config and is set to grs1

Test plan:
- Install Koha from src
- Install Koha from pkg
- Read the code, carefully!

Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
Rebased

Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>

Signed-off-by: Nick Clemens <nick@bywatersolutions.com>
2018-08-31 11:24:20 +00:00
10b5e1ee04 Bug 19122: (bug 18098 follow-up) Fix IncludeSeeFromInSearches behaviour
The IncludeSeeFromInSearches system preference is designed so that 'See from' headings from the authorities are included when you search in the catalog.
That means that you could find an author not only by the name printed on the book, but for example also by their pseudonym or a different spelling of their name.

It was added by bug 7417.

This regression has been introduced by
  commit 5ef1b6710e
  Bug 18098: Add an index with the count of not onloan items

-        } elsif ($record_type eq 'biblio' && C4::Context->preference('IncludeSeeFromInSearches')) {
-            my $normalizer = Koha::RecordProcessor->new( { filters => 'EmbedSeeFromHeadings' } );
[...]
+            push @filters, 'IncludeSeeFromInSearches'
+                if C4::Context->preference('IncludeSeeFromInSearches');

Test plan:
- Activate IncludeSeeFromInSearches
- Catalog an authority for a person
  - main heading in 100
  - see from headings in 400
- Catalog a bibliographic record and link it to the authority
- Make sure the record is indexed
- Verify that the record can be found searching for the main heading
- Verify that the record can be found searching for the see from headings

Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
Yet another reason to get rid of all these functions from this script.

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
2017-10-04 13:21:51 -03:00
2671eb2f93 Bug 18927: Use fully qualified subroutine names in C4::Items
rebuild_zebra.pl fails under some conditions (perl version?)
I could not reproduce it, but it has been reported that the reindex fails with:
  error retrieving biblio 94540 at /usr/share/koha/bin/migration_tools/rebuild_zebra.pl line 683, <DATA> line 751.

To fix it we can use fully qualified subroutine names for:
  GetMarcFromKohaField
  GetMarcBiblio
  GetBiblionumberFromItemnumber
  TransformKohaToMarc
  GetFrameworkCode

Test plan:
Confirm the rebuild_zebra script still works correctly after this patch

Signed-off-by: Lee Jamison <ldjamison@marywood.edu>

Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
2017-09-06 12:55:00 -03:00
Mark Tompsett
d5986c9b97 Bug 19040: Refactor GetMarcBiblio parameters
Change parameters to a hashref.

Signed-off-by: Josef Moravec <josef.moravec@gmail.com>

Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Looks good to me.
Two calls in migration_tools/22_to_30 still in old style.

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
2017-08-25 10:23:42 -03:00
caa4cccfa0 Bug 16758: Use the default cache instance
I do not see a valid reason not to use the default one instead of the
syspref one.

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
2017-05-12 08:49:42 -04:00
Jacek Ablewicz
84dbc80074 Bug 16758 - Caching issues in scripts running in daemon mode
As the L1 cache does not have an expiration mechanism, scripts running
in daemon mode (rebuild_zebra.pl -daemon, sip server ?, ...) would
not be aware of any possible changes in the data being cached
in the upstream L2 cache.

This patch adds ->flush_L1_caches() call in rebuild_zebra.pl
inside daemon mode loop.

To test:

1) apply patch
2) ensure that rebuild_zebra.pl -daemon is still working properly,
without any noticeable performance degradation
3) stop memcached daemon and try to run rebuild_zebra.pl -daemon
again: there should be a warning emitted stating that the script
is running in daemon mode but without recommended caching system

Signed-off-by: Josef Moravec <josef.moravec@gmail.com>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
2017-05-12 08:49:42 -04:00
5ef1b6710e Bug 18098: Add an index with the count of not onloan items
This patch adds a numeric index 'not-onloan-count' containing the value
of 999$x. This subfield is filled by 'rebuild_zebra.pl' by making use of
(bug's 18208) 'EmbedItemsAvailability' filter.

bib1.att and indexes definitions are updated accordingly.

To test:
- Apply the patch
- Pick the right biblio-zebra-indexdefs.xsl file for your setup and
  replace the one your Zebra uses [1]
- Replace your bib1.att
- Replace your ccl.properties
- Have at least one record with more than one item, checkout some
  item(s) from that record(s).
- Rebuild zebra's indexes:
  $ sudo koha-shell kohadev
 k$ cd kohaclone
 k$ misc/migration_tools/rebuild_zebra.pl -r -b -v -k
 (notice the dump directory is kept, you can try the XSLT yourself
  running:
    $ xsltproc \
       etc/zebradb/marc_defs/marc21/biblios/biblio-zebra-indexdefs.xsl \
       /tmp/the_dump_dir/biblios/exported_records | less
=> SUCCESS: There are records with the not-onloan-count index, and the
            value is correct!
- Check Zebra yourself:
  $ yaz-client unix:/var/run/koha/kohadev/bibliosocket
 Z> base biblios
 Z> find @attr 1=9013 @attr 2=5 @attr 4=109 0
=> SUCCESS: The search matches the amount of records with not-onloan
            items.
 Z> s 1+1
=> SUCCESS: Records with 999$x having a value higher than 0 are rendered
- Sign off :-D

Note: While this work is complete on its purpose, it is part of an
attempt to create a better way of filtering by availability.

Sponsored-by: ByWater Solutions

 [1] In kohadevbox this would be
/etc/koha/zebradb/marc_defs/marc21/biblios/biblio-zebra-indexdefs.xsl

Edit: Added the missing XSLT changes for UNIMARC and NORMARC

Signed-off-by: Josef Moravec <josef.moravec@gmail.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
2017-05-08 09:21:41 -04:00
b4c66d5cc4 Bug 17935: Adjust some POD lines, fix a few typos
This patch does the following:

[1] Move some POD lines from Cache to Caches.
[2] Correct C4::Plugins to Koha::Plugins in POD line of Koha::Plugins
[3] POD Koha/AuthorisedValue.pm: lib_opac moved to opac_description
[4] The POD in Koha/Patron.pm uses head2 and head3 inconsistently.
    Ran s/^=head2/=head3/ on those lines (7 substitutions on 7 lines)
[5] Correct a copied POD line from reports/issues_stats.pl in
    reports/reserve_stats.pl.
[6] Correct a test description in t/db_dependent/Koha/Authorities.t.
    You should never delete the library :)
[7] Correct typo shouild in a comment of rebuild_zebra.pl

Test plan:
[1] Read the patch. Does it make sense?
[2] Run perldoc Koha/Cache.pm and Koha/Caches.pm
[3] Run t/db_dependent/Koha/Authorities.t

Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Signed-off-by: David Cook <dcook@prosentient.com.au>

Signed-off-by: Jonathan Druart <jonathan.druart@biblibre.com>

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
2017-02-14 14:12:50 +00:00
7a2cbfba1f Bug 17731: Remove noxml option from rebuild_zebra.pl
The removal of the noxml option is a logical follow-up of bug 16506 (which
makes xml the default).

Actually this option should have been removed by bug 10455 (it removes
the biblioitem.marc field).

Test plan:
Make sure the rebuild_zebra.pl script works as before.

Signed-off-by: Emma Smith <emma.nakamura.smith@gmail.com>

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
2017-01-19 13:05:08 +00:00
bfcc7cad70 Bug 17376 - rebuild_zebra.pl in daemon mode no database access kills the process
When running rebuild_zebra.pl in daemon mode, a while loop runs the script forever.
But if something crashes inside the rebuild process, the whole daemon crashes,
for example when it cannot access the database.
This problem may be temporary, so the daemon should keep running.

This patch adds an eval around the rebuild process to allow a run to fail without killing the daemon.
It also moves the fetching of the DB handle inside the daemon loop, because the handle is broken if the DB stops.

This is a big issue for indexer running in a systemd service.
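
A minimal sketch of the pattern this patch describes, in Python rather than the script's Perl. `run_daemon`, `connect` and `rebuild` are hypothetical names used for illustration; the try/except plays the role of the Perl eval, and the connection is obtained inside the loop so that a restarted database is picked up on the next pass.

```python
import time


def run_daemon(connect, rebuild, passes, sleep_seconds=0.0):
    """Run `rebuild` repeatedly; a failing pass is recorded, not fatal."""
    errors = []
    for _ in range(passes):          # the real daemon would loop forever
        try:
            dbh = connect()          # fetched on every pass, never reused
            rebuild(dbh)
        except Exception as exc:     # roughly: eval { ... }; warn $@ if $@;
            errors.append(str(exc))
        time.sleep(sleep_seconds)
    return errors


state = {"calls": 0, "indexed": 0}


def flaky_connect():
    # Simulates a database that is down during the second pass only.
    state["calls"] += 1
    if state["calls"] == 2:
        raise ConnectionError("database unavailable")
    return object()


def fake_rebuild(dbh):
    state["indexed"] += 1


errors = run_daemon(flaky_connect, fake_rebuild, passes=3)
```

The second pass fails, the error is kept for reporting, and the third pass runs normally.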

Test plan :
- run rebuild_zebra.pl in daemon mode :
/home/koha/src/misc/migration_tools/rebuild_zebra.pl -daemon -z -a -b -x --sleep 30
- stop the database
- wait a minute
=> you see an error on database connection
=> the daemon is still running
- restart the database
- test the indexer by creating a new record (wait for a minute)

Signed-off-by: Jacek Ablewicz <abl@biblos.pk.edu.pl>

Signed-off-by: Jonathan Druart <jonathan.druart@biblibre.com>

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
2016-10-28 11:28:37 +00:00
6122b8fe6e Bug 16830: (followup) Remove weird character from warning in rebuild_zebra.pl
Signed-off-by: Mark Tompsett <mtompset@hotmail.com>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>

Signed-off-by: Brendan Gallagher <brendan@bywatersolutions.com>
2016-08-04 19:41:42 +00:00
6c65b64c84 Bug 16505: Make sure $as_xml will not be used later
Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
2016-05-23 17:29:23 +00:00
5dd1b1bb66 Bug 16506: Remove warning for UNIMARC installs
Use of uninitialized value in numeric eq (==)

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
2016-05-23 17:29:12 +00:00
6c7a8c57e7 Bug 16506: (followup) Fix wrong option switch warning message
Signed-off-by: Bernardo Gonzalez Kriegel <bgkriegel@gmail.com>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
2016-05-23 17:29:12 +00:00
7f22619001 Bug 16506: (Followup) remove warnings
Signed-off-by: Bernardo Gonzalez Kriegel <bgkriegel@gmail.com>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
2016-05-23 17:29:12 +00:00
0b3b53e4f2 Bug 16506: Make rebuild_zebra.pl use XML as default
This patch deprecates the -x switch, making XML the default serialization format
used by rebuild_zebra.pl. It doesn't remove the option switch, but raises a warning
for the end user about the deprecation so they fix their cronjobs. Later we could remove it.

To test:
- Disable all indexing (daemon/cronjob)
- Create 2 records
- Edit one of them, delete the other one
- Verify they are queued for updates in zebraqueue
- sudo koha-mysql kohadev
  > SELECT * FROM zebraqueue WHERE done=0
...
| 265 |                265 | specialUpdate | biblioserver |    1 | 2016-05-13 14:23:45 |
| 266 |                  1 | recordDelete  | biblioserver |    1 | 2016-05-16 14:14:33 |
| 267 |                  2 | specialUpdate | biblioserver |    1 | 2016-05-16 14:15:06 |
+-----+--------------------+---------------+--------------+------+---------------------+
- Now go to koha-shell
  $ sudo koha-shell kohadev ; cd kohaclone
- Run:
  $ misc/migration_tools/rebuild_zebra.pl -k -b -z

  You will get something similar to this:
NOTHING cleaned : the export /tmp/jI0OeHy6Tn has been kept.
You can re-run this script with the -s  and -d /tmp/jI0OeHy6Tn parameters
if you just want to rebuild zebra after changing the record.abs
or another zebra config file
- Verify
  * less /tmp/jI0OeHy6Tn/del_biblio/exported_records
  * less /tmp/jI0OeHy6Tn/upd_biblio/exported_records
=> FAIL: They contain the records you added/modified/deleted but they are in
         USMARC format
- Apply the patch
- Mark your records for indexing (in koha-mysql kohadev)
  > UPDATE zebraqueue SET done=0 WHERE id > 264
- Run:
  $ misc/migration_tools/rebuild_zebra.pl -k -b -z

  You will get something similar to this:
<WARNINGS> [1]
NOTHING cleaned : the export /tmp/jI0OeHy6Tn has been kept.
You can re-run this script with the -s  and -d /tmp/jI0OeHy6Tn parameters
if you just want to rebuild zebra after changing the record.abs
or another zebra config file
- Verify
  * less /tmp/jI0OeHy6Tn/del_biblio/exported_records
  * less /tmp/jI0OeHy6Tn/upd_biblio/exported_records
=> SUCCESS: Data is correctly in XML format
- Run:
  $ misc/migration_tools/rebuild_zebra.pl -k -b -z -noxml

  You will get something similar to this:
<WARNINGS> [1]
NOTHING cleaned : the export /tmp/jI0OeHy6Tn has been kept.
You can re-run this script with the -s  and -d /tmp/jI0OeHy6Tn parameters
if you just want to rebuild zebra after changing the record.abs
or another zebra config file
- Verify
  * less /tmp/jI0OeHy6Tn/del_biblio/exported_records
  * less /tmp/jI0OeHy6Tn/upd_biblio/exported_records
=> SUCCESS: Data is correctly in USMARC format
- Sign off :-D

[1] Warnings covered by a followup

Signed-off-by: Bernardo Gonzalez Kriegel <bgkriegel@gmail.com>
On top of Bug 16505
Work as described following test plan, usmarc default pre patch,
post patch xml default and usmarc on request.
No errors (all patchset)

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
2016-05-23 17:29:12 +00:00
7a178fd262 Bug 16505: <collection> is missing the marc namespace and updates fail if -x is passed
Using rebuild_zebra.pl with the -x option switch produces incorrect output in
terms of what our XSLTs expect for indexing. This patch introduces the right namespace information
on the exported records so indexing succeeds.
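
Why the missing namespace yields no indexing data can be shown with a small Python sketch. It assumes, as the description implies, that the indexing stylesheet matches elements in the MARC21 slim namespace; `count_marc_records` is a hypothetical helper and the record content is a placeholder.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"

without_ns = "<collection><record><leader>x</leader></record></collection>"
with_ns = (
    f'<collection xmlns="{MARC_NS}">'
    "<record><leader>x</leader></record></collection>"
)


def count_marc_records(xml_text):
    """Count <record> children that are in the MARC21 slim namespace."""
    root = ET.fromstring(xml_text)
    return len(root.findall(f"{{{MARC_NS}}}record"))


before = count_marc_records(without_ns)  # elements are in no namespace
after = count_marc_records(with_ns)      # default namespace applies to children
```

A namespace-aware match simply does not see un-namespaced elements, which is consistent with xsltproc printing nothing before the patch.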

To test:
- On current master, have some records on your db
- Run:
  $ sudo koha-shell kohadev
  $ cd kohaclone
  $ misc/migration_tools/rebuild_zebra.pl -r -b -k -x
=> you will get a message like this:

NOTHING cleaned : the export /tmp/NL5ufjUfpp has been kept.

- Run
  $ less /tmp/NL5ufjUfpp/biblio/exported_records
=> FAIL: The first line looks like this

<?xml version="1.0" encoding="UTF-8"?><collection><record

- Now run:
  $ xsltproc \
     /etc/koha/zebradb/marc_defs/marc21/biblios/biblio-zebra-indexdefs.xsl \
     /tmp/NL5ufjUfpp/biblio/exported_records
=> FAIL: No output
- Apply the patch
- Run:
  $ misc/migration_tools/rebuild_zebra.pl -r -b -k -x
- Take a look at the result file:
  $ less /tmp/asdiouqwiue/biblio/exported_records
=> SUCCESS: The start of the file looks like this:
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim">

- Run:
  $ xsltproc \
     /etc/koha/zebradb/marc_defs/marc21/biblios/biblio-zebra-indexdefs.xsl \
     /tmp/asdiouqwiue/biblio/exported_records
=> SUCCESS: There is actually indexing data :-D
- Sign off :-D

Edit: I changed qq{} for q{} as suggested by Jonathan.

Sponsored-by: American Numismatic Society

Signed-off-by: Bernardo Gonzalez Kriegel <bgkriegel@gmail.com>
Works as described following test plan
No errors

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
2016-05-23 17:04:04 +00:00
Julian Maurice
48df0b8a2d Bug 15325: Fix --table option of rebuild_zebra.pl
Option's value given on command line was never used and 'biblioitems'
was used instead.

Test plan:
1. git checkout master
2. perl misc/migration_tools/rebuild_zebra.pl -b -t items --where "price = 42"
3. You should see errors printed on screen about an unknown column
4. Apply patch
5. perl misc/migration_tools/rebuild_zebra.pl -b -t items --where "price = 42"
6. No errors \o/

Signed-off-by: Frédéric Demians <f.demians@tamil.fr>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
2015-12-11 16:15:50 +00:00
Jonathan Druart
e8055c7ef6 Bug 12368: Die if the --table value is not allowed.
If the table given as a parameter is not in the white list, the script
should die rather than correct to a default value.
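
A sketch of the "die instead of silently correcting" behaviour, in Python. The contents of `ALLOWED_TABLES` and the error message are assumptions made for illustration; the commit does not list the white list here.

```python
ALLOWED_TABLES = ("biblioitems", "items")  # assumed white list, illustration only


def validate_table(table):
    """Return the table name unchanged, or abort if it is not allowed."""
    if table not in ALLOWED_TABLES:
        # Hypothetical message; the point is to stop, not to fall back.
        raise SystemExit(f"Cannot use table '{table}': not in the allowed list")
    return table
```

Falling back to a default would run the user's --where clause against a table they did not ask for, so failing loudly is the safer choice.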

Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
2015-10-09 14:25:58 -03:00
Jonathan Druart
2d9c221abc Bug 12368: Rebuild Zebra improvement: allow to specify a DB table
Currently the --where parameter only allows specifying a condition on
fields in the biblioitems table.
For some needs it would be great to specify a condition on a field in
the items table.

The use case is the following: you want to reindex biblios with items
modified since a specific timestamp.

Test plan:
1/ Pick an item randomly in your catalogue
2/ Edit it and save
3/ Note that the items.timestamp has been set to today but not the
biblioitems.timestamp
4/ launch rebuild_zebra without the new parameter
  perl misc/migration_tools/rebuild_zebra.pl -b -v --where
  "timestamp >= XXX"
where XXX is the today date (e.g. "2014-06-05 00:00:00").
Note that the biblio has not been indexed.
5/ launch rebuild_zebra using the new parameter:
  perl misc/migration_tools/rebuild_zebra.pl -b -v -t items --where
  "timestamp >= XXX"
Note the biblio has been indexed.

Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
2015-10-09 14:25:58 -03:00
0a53d5e6b6 Bug 12651: DOM indexing is the default
At the 23 July development meeting it was decided to formally deprecate
GRS-1 indexing mode for Zebra. This patch makes the code fall back to DOM
in the remaining places. No behaviour change should be noticed, as DOM
has been the default for a while.

Regards

Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Passes tests and QA script.
Also checked running Makefile.PL

Signed-off-by: Tomas Cohen Arazi <tomascohen@gmail.com>
2014-10-27 12:35:44 -03:00
Galen Charlton
03338b70e4 Bug 10955: (follow-up) improve usage information
This patch improves rebuild_zebra.pl's usage help
by explaining when --skip-deletes should be considered
and noting that it should be used in conjunction with
a cronjob to process deletions after hours.

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
2014-03-10 18:46:28 +00:00
b0870311e1 Bug 10955 - Add ability to skip deletions in zebraqueue
It seems that record deletions can cause extreme slowdowns for Koha
installations with extremely large numbers of records. It would be
helpful to be able to skip record deletions when processing the
zebraqueue with rebuild_zebra.pl so the deletions can be processed with
a lower frequency.

Test Plan:
1) Disable any zebra indexing cronjobs you may have
2) Delete a record
3) Note the operation recordDelete in the zebraqueue table having done = 0
4) Run misc/migration_tools/rebuild_zebra.pl -b -z --skip-deletes
5) Note the delete still has done = 0
6) Run misc/migration_tools/rebuild_zebra.pl -b -z
7) Note the delete now has done = 1
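
The effect of --skip-deletes on queue processing can be sketched as a simple filter. This is Python, not the script's Perl; the dictionaries mimic the zebraqueue columns mentioned in the test plan, and `select_queue_entries` is a hypothetical name.

```python
def select_queue_entries(queue, skip_deletes=False):
    """Return the pending entries a run would process."""
    selected = []
    for entry in queue:
        if entry["done"] != 0:
            continue  # already processed
        if skip_deletes and entry["operation"] == "recordDelete":
            continue  # left with done = 0 for a later run
        selected.append(entry)
    return selected


queue = [
    {"id": 1, "operation": "specialUpdate", "done": 0},
    {"id": 2, "operation": "recordDelete", "done": 0},
    {"id": 3, "operation": "specialUpdate", "done": 1},
]

with_skip = [e["id"] for e in select_queue_entries(queue, skip_deletes=True)]
without_skip = [e["id"] for e in select_queue_entries(queue)]
```

Skipped deletions stay pending, so a later run without the switch still picks them up.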

Signed-off-by: Jonathan Druart <jonathan.druart@biblibre.com>
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Passes all tests and QA script.
Also tested for authorities, no problems found.

Signed-off-by: Galen Charlton <gmc@esilibrary.com>

RM note: this is at best a work-around, and I will emphasize that
--skip-deletes should be used only when absolutely necessary.

I hope that --skip-deletes can go away at some point soon, but
that may depend on changes to Zebra.
2014-03-10 18:44:10 +00:00
Galen Charlton
160c44d4e9 Bug 11078: (follow-up) tidy code
- fix a couple typos in comments
- replace a "$i" with a more descriptive variable name
- style some of the new code

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
2014-02-28 22:24:28 +00:00
07de37f0e5 Bug 11078: QA Follow-up for missing file permissions on lockfile
The original patch creates a lockfile in the ZEBRA_LOCKDIR.
It can fall back to /var/lock or even /tmp.
If the create fails, it dies. This can be considered as very
exceptional.

This followup adjusts the fallback location in /var/lock or /tmp
slightly.  It appends the database name to the folder in order to
prevent interfering between multiple Koha instances. Creation of the
lockfile has been moved to a subroutine extending directory and file
creation testing.

In the very unlikely case that we cannot create the lockfile (after
three separate tries), this follow-up allows you to continue instead
of dying.  This is just what we did before we had file locking here.
Skipping a reindex every time could cause more harm than continuing and
hitting the race condition once in a while.
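
The fallback order described above can be sketched as follows, in Python. `pick_lock_dir` and the lock file name are hypothetical; only the ideas of trying several locations, appending the database name to the fallbacks, and continuing without a lock when everything fails come from this follow-up.

```python
import os
import tempfile


def pick_lock_dir(configured, fallbacks, db_name):
    """Return the first directory where a lock file can be created, else None."""
    candidates = [configured] if configured else []
    # Fallback locations get the database name appended so that several
    # instances on one host do not share a lock.
    candidates += [os.path.join(base, db_name) for base in fallbacks]
    for path in candidates:
        try:
            os.makedirs(path, exist_ok=True)
            with open(os.path.join(path, "rebuild.lock"), "a"):  # name is illustrative
                pass
            return path
        except OSError:
            continue
    return None  # caller would warn and carry on without locking


base = tempfile.mkdtemp()
chosen = pick_lock_dir(None, [base], "koha_demo")
```

In the script's terms the fallbacks would be /var/lock and /tmp; a temporary directory is used here so the sketch can run anywhere.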

Test plan:
Test adding and removing lockdir from your koha-conf.xml. Check fallback.
Note that fallback in /var/lock or /tmp must contain database name.
Remove the lockdir config line and remove permissions from fallback. In
this case the reindex should continue but with a warning.

Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Tested with daemon and one-off invocation simultaneously.
Tested new wait parameter.
Tried all variations of lock directory (changing permissions etc.)

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
2014-02-28 22:22:47 +00:00
Doug Kingston
88e7faf860 Bug 11078: Add locking to rebuild_zebra
This patch adds locking to rebuild_zebra.pl to ensure that simultaneous
changes are prevented (as one is likely to overwrite the other).
Incremental updates in daemon mode will be skipped if the lock is busy
and they will be picked up on the next pass.  Non-daemon mode
invocations will also exit immediately if they cannot get the lock
unless the new flag -wait-for-lock is specified, in which case they
will wait until they get the lock and then proceed.

Supporting changes made to Makefile.PL and templates for the new
locking directory (paralleling the other zebra lock directories).
We stash the zebra_lockdir in koha-conf.xml so rebuild_zebra.pl
can find it.

To address earlier QA concerns we:
1. added code to check if flock is available and ignore locking if
it's missing (from M. de Rooy)

2. changed default for adhoc invocations to abort if they cannot
obtain the lock.  Added option -wait-for-lock if the user prefers
to wait until the lock is free, and then continue processing.

3. added missing entry to t/db_dependent/zebra_config.pl

4. added a fallback locking directory of /tmp
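
The two locking behaviours (give up immediately, or wait) map onto non-blocking and blocking flock calls. The following is a Python sketch for Unix-like systems, not the script's Perl code; `acquire_lock` is a hypothetical helper.

```python
import fcntl
import tempfile


def acquire_lock(lock_file, wait=False):
    """Try to take an exclusive lock; return True on success."""
    flags = fcntl.LOCK_EX if wait else fcntl.LOCK_EX | fcntl.LOCK_NB
    try:
        fcntl.flock(lock_file, flags)  # blocks only when wait=True
        return True
    except BlockingIOError:
        return False                   # lock is busy and we chose not to wait


with tempfile.NamedTemporaryFile() as tmp:
    first = open(tmp.name, "a")
    second = open(tmp.name, "a")       # separate handle, like a second process
    got_first = acquire_lock(first)
    got_second = acquire_lock(second)  # busy: returns at once
    first.close()                      # closing the handle releases the lock
    got_retry = acquire_lock(second)
    second.close()
```

A daemon pass or an ad hoc run would treat a False result as "skip this time", while wait=True corresponds to -wait-for-lock.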

Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Doug merged the original patch with the QA changes.
Just for the record, noting here that the original patch was tested
extensively too by Martin Renvoize.
I have added a followup for some exceptional cases.

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
2014-02-28 22:21:41 +00:00
Galen Charlton
b26870e53d Bug 11252: remove deprecated -munge-config switch from rebuild_zebra.pl
The -munge-config switch has been deprecated for years, and
trying to use it would either not work at all or, if it did "work",
almost certainly damage one's Zebra configuration for Koha.

This patch removes this switch.

To test:

[1] Run rebuild_zebra.pl and verify that no mention is made
    of -munge-config.
[2] Run rebuild_zebra.pl to index records in one's test database
    and verify that there are no regressions.

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
Removing a really dangerous option

Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Passes all tests and QA script.
Ran rebuild_zebra.pl with various options and confirmed
that data was reindexed successfully.
No regressions found.

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
2013-12-26 15:24:41 +00:00
Galen Charlton
b25de3e7cf Bug 6435: (follow-up) make -daemon really imply -a and -b
This patch follows up on the previous patch by moving the
check for whether authority and/or biblio indexing have been
specified so that -daemon has a chance to set those modes.

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
2013-11-24 18:20:56 +00:00
Doug Kingston
00240d6970 Bug 6435: (follow-up) rebuild_zebra -daemon option now smarter
Based on feedback, make daemon mode imply -z -a -b and abort
on startup if flags incompatible with an incremental update daemon
are used.  Update documentation to match.

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
2013-11-24 18:15:23 +00:00
Doug Kingston
1b0992e8d5 Bug 6435: Add daemon mode to rebuild_zebra.pl
This change adds code to check the zebraqueue table with a cheap SQL query
and a daemon loop that checks for new entries and processes them incrementally
before sleeping for a controllable number of seconds.  The default is 5 seconds
which provides a near realtime search index update.  This is desirable particularly
for libraries that are doing active catalogue updating.  The query is adjusted
based on whether -a, -b, or -a -b are specified.
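
A sketch of the "cheap query" idea, using Python and an in-memory SQLite table that mimics the zebraqueue columns seen elsewhere in this log. `pending_count` is hypothetical, and the server value "authorityserver" is an assumption ("biblioserver" appears in the log; the authority counterpart does not).

```python
import sqlite3


def pending_count(conn, biblios=True, authorities=True):
    """Count unprocessed queue rows for the selected record types."""
    servers = []
    if biblios:
        servers.append("biblioserver")
    if authorities:
        servers.append("authorityserver")  # assumed value
    if not servers:
        return 0
    placeholders = ", ".join("?" for _ in servers)
    sql = (
        "SELECT COUNT(*) FROM zebraqueue "
        f"WHERE done = 0 AND server IN ({placeholders})"
    )
    return conn.execute(sql, servers).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE zebraqueue (id INTEGER PRIMARY KEY, biblio_auth_number INTEGER,"
    " operation TEXT, server TEXT, done INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO zebraqueue (biblio_auth_number, operation, server, done)"
    " VALUES (?, ?, ?, ?)",
    [
        (1, "specialUpdate", "biblioserver", 0),
        (2, "recordDelete", "biblioserver", 1),
        (7, "specialUpdate", "authorityserver", 0),
    ],
)

both = pending_count(conn)
only_biblios = pending_count(conn, authorities=False)
```

The daemon would run a check like this on each pass and only start the expensive export and index step when the count is non-zero, sleeping otherwise.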

Help text updated.  Tested against a live 3.12 system.

Note that this fix will benefit from the fix to lack of locking (bug 11078)

Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
2013-11-24 18:12:21 +00:00
2eefd1f3a5 Bug 8745: General whitespace and tab tidy
http://bugs.koha-community.org/show_bug.cgi?id=8745
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
1) Does not run as root.
2) Runs with root and -run-as-root.
3) Runs using the normal koha user.

Note: Maybe the message should be clear about why
running as root is bad and which user you should
be running the script with?
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2013-04-21 09:41:34 -04:00
Barry Cannon
ef86a77801 Bug 8745 - Disallow rebuild_zebra.pl from executing, when run by root user.
Added a check to warn users when the script is executed as the root user.
Added a 'run-as-root' switch to allow users to force execution as the root user.

Signed-off-by: Mason James <mtj@kohaaloha.com>
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2013-04-21 09:41:34 -04:00
f9c8f39c02 Bug 9609: Rebuilding zebra reports double number of exported records.
Test plan:
Clear the zebra queue (run rebuild). Update one biblio.
Rebuild zebra (again) with -z. Check zebra log: note 2 exported records.
Now apply patch, and repeat: You will see 1 exported record.

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Works as described.
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2013-04-02 08:41:40 -04:00
Galen Charlton
151e22070a bug 9496: improve error checking in rebuild_zebra.pl
When using rebuild_zebra to index all records, skip over
bibliographic or authority records that don't come out
as valid XML.  Also, strip extraneous XML declarations when
using --nosanitize.
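
The behaviour described above (skip records that are not well-formed XML, and strip per-record XML declarations) can be sketched in Python. `export_records` is a hypothetical function and the record strings are placeholders; the real script works on MARCXML from the database.

```python
import re
import xml.etree.ElementTree as ET

# Matches a leading XML declaration such as <?xml version="1.0" ...?>
XML_DECL = re.compile(r"^\s*<\?xml[^>]*\?>\s*")


def export_records(records):
    """Return (exported bodies, numbers of skipped records)."""
    exported, skipped = [], []
    for number, xml_text in records:
        body = XML_DECL.sub("", xml_text)  # a declaration is only valid at file start
        try:
            ET.fromstring(body)            # well-formedness check only
        except ET.ParseError:
            skipped.append(number)         # report and move on
            continue
        exported.append(body)
    return exported, skipped


records = [
    (1, '<?xml version="1.0" encoding="UTF-8"?><record><leader>a</leader></record>'),
    (2, "<record><leader>b</leader></record>junk"),
    (3, "<record/>"),
]
exported, skipped = export_records(records)
```

One bad record then costs one warning instead of truncating the whole export.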

Test plans
----------
Note that both plans assume that DOM indexing is turned on.

Test plan #1
============

[1] Run rebuild_zebra.pl with the -x -nosanitize options.  Without
    the patch, zebraidx should terminate early and complain
    about invalid XML.
[2] With the patch, the rebuild_zebra.pl should work without
    error.

Test plan #2
============
[1] Intentionally make a MARCXML record invalid, e.g, by running
    the following SQL:

    UPDATE biblioitems SET marcxml = CONCAT(marcxml, 'junk')
    WHERE biblionumber = 123;

[2] Run rebuild_zebra.pl -b -x -r
[3] Without the patch, only part of the database will be indexed.
[4] With the patch, rebuild_zebra.pl will not export the bad
    record and will give an error message saying so, but will
    successfully index the rest of the records.

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
Signed-off-by: Larry Baerveldt <larry@bywatersolutions.com>
Signed-off-by: Mason James <mtj@kohaaloha.com>

Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2013-03-21 22:25:03 -04:00
4dcee58a4d Bug 7440 - Remove NoZebra vestiges
Removed NoZebra vestiges. This comprises several code blocks that depend on the NoZebra syspref and NZ related functions/methods.

C4::Biblio->
 GetNoZebraIndexes
 _DelBiblioNoZebra
 _AddBiblioNoZebra

C4::Search->
 NZgetRecords
 NZanalyse
 NZoperatorAND
 NZoperatorOR
 NZoperatorNOT
 NZorder

C4::Installer->
 set_indexing_engine

Sponsored-by: Universidad Nacional de Córdoba
Signed-off-by: Julian Maurice <julian.maurice@biblibre.com>

Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2013-03-19 21:17:04 -04:00
Jared Camins-Esakov
49cadcf7c1 Bug 9049: Don't use shadow with rebuild_zebra -r
Due to a limitation of Zebra, the register must be cleared *before*
doing shadow indexing if you want to reset the indexes. In light of
that, it does not make sense to do shadow indexing at all when
rebuild_zebra.pl is run with the -r switch. This patch makes -r (reset)
imply -n (no shadow).

To test:
1) Run `rebuild_zebra.pl -b -r -v -v -v`
2) Note that the script never runs the merge phase

Without the patch I see log lines referring to the shadow cache (enabling shadow spec=/home/koha/koha-dev/var/lib/zebradb/biblios/shadow:20G)
With the patch I don't see anything in the logs about shadow.  I do however see lines about merging.  I think it could just be a misunderstanding of the logs.

Signed-off-by: wajasu <matted-34813@mypacks.net>
Signed-off-by: Elliott Davis <elliott@bywatersolutions.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2012-12-08 09:46:30 -05:00
Jared Camins-Esakov
deeeb068d9 Bug 9050: Use safer adelete when deleting records from Zebra index
Previously we used the "delete" command in zebraidx, which fails when
you try to delete a record that doesn't exist in the index. By changing
to the "adelete" command, we can reduce the likelihood of a failed
delete causing ghost records. A symptom of this problem is the warning
message occasionally encountered when indexing from the zebraqueue,
"[warn] cannot delete record above (seems new)."

To test:
1) Add a recordDelete action for a record that does not exist to
   zebraqueue in MySQL:
   INSERT INTO zebraqueue (biblio_auth_number, operation, server) \
       VALUES (999999999, 'recordDelete', 'biblioserver');
2) Run `rebuild_zebra.pl -b -z -v [-x]`.
3) Note that you do not get the message "[warn] cannot delete record
   above (seems new)".

Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>
Passed-QA-by: Paul Poulain <paul.poulain@biblibre.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2012-11-12 18:53:49 -05:00
Jared Camins-Esakov
bc05b5d163 Bug 7417: Include see from references in bibliographic searches
This patch adds the Koha::Indexer::RecordNormalizer and
Koha::Indexer::MARC::RecordNormalizer::EmbedSeeFromHeadings packages
to enable the inclusion of alternate forms of headings in bibliographic
searches. When the new syspref IncludeSeeFromInSearches is turned on
(default is off) rebuild_zebra.pl will insert see from headings from
authority records into bibliographic records when indexing, so that a
search on an obsolete term will turn up relevant records.

To test:
1) Enable IncludeSeeFromInSearches
2) Add a heading that has an alternate form to a record (for example,
   "Cooking" has the alternate form "Cookery," if you have authority
   records from LC)
3) Index the zebraqueue (or reindex if you haven't indexed your system
   yet)
4) Confirm that if you search for "Cookery" you get the record you
   just modified

Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Rebased on master 5 August 2012
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Rebased on master 11 September 2012

Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>

Also checked:
- Verified database update works correctly
- Checked system preference and its description
- Checked staff/opac detail pages with feature on/off
- Checked staff/opac search facets
- Downloaded and tested records in various formats
- Tried different searches for 'see from' entries of authorities
- Ran all unit tests

No problems found.
2012-09-13 14:19:28 +02:00
Julian Maurice
57424a9fdc Bug 7286: rebuild_zebra_sliced for biblios and authorities
Complete rewrite of rebuild_zebra_sliced.zsh (renamed to .sh). Main
improvements are:
  - both biblio and authority records are handled
  - records are exported only once

It also adds a --skip-index option to rebuild_zebra.pl that allows
rebuild_zebra.pl to be used as an 'export only' script.

Description:
Index Koha records in chunks. This is useful when some records cause
errors and stop the indexing process. With this script, if indexing
of one chunk fails, the chunk is split into 2 (or 3) chunks, and
indexing continues on these chunks.
rebuild_zebra.pl is called only once to export records.
Splitting and indexing is handled by this script (using yaz-marcdump and
zebraidx).
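
The splitting strategy described above can be sketched in Python as follows. This is a minimal sketch, not the shell script itself: index_chunk is a hypothetical callable standing in for the yaz-marcdump/zebraidx step, and the sketch always splits in two, whereas the script may also split in three.

```python
def index_in_chunks(records, index_chunk):
    """Index records chunk by chunk, splitting a failing chunk in two
    until the offending records are isolated.

    index_chunk is a hypothetical callable that returns True when the
    given list of records indexes cleanly and False otherwise.
    Returns the records that still fail when indexed on their own.
    """
    if not records:
        return []
    if index_chunk(records):
        return []
    if len(records) == 1:
        # A failing chunk of size 1 identifies a bad record.
        return list(records)
    middle = len(records) // 2
    return (index_in_chunks(records[:middle], index_chunk)
            + index_in_chunks(records[middle:], index_chunk))


def fake_indexer(chunk):
    # Stand-in for the real indexer: fails if the chunk holds a bad record.
    return "bad" not in chunk


print(index_in_chunks(["a", "bad", "c", "d"], fake_indexer))
```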

Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-07-06 15:06:40 +02:00
christophe croullebois
082bb5049d Bug 8136 Changes the expected length of 100$a in rebuild_zebra.pl
In rebuild_zebra.pl, if the "marcflavour" syspref is set to "unimarc", the sub "fix_unimarc_100" is called and checks whether the length of 100$a is equal to 35.
If it is not, the sub inserts the localtime and more, so existing data is lost on reindexing.
The standard length is 36.
I have just changed 35 to 36.
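
For illustration only, the corrected check amounts to something like the Python sketch below. The function name has_valid_100a is an assumption; the real sub is written in Perl and also rewrites the field.

```python
# Expected length of UNIMARC 100$a, as stated in the commit message.
UNIMARC_100A_LENGTH = 36


def has_valid_100a(value):
    """Return True when 100$a has the expected length and can be kept as is."""
    return value is not None and len(value) == UNIMARC_100A_LENGTH


print(has_valid_100a("x" * 36))  # a well-formed field is left alone
print(has_valid_100a("x" * 35))  # the old check wrongly accepted only this
```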

Signed-off-by: Sophie Meynieux <sophie.meynieux@biblibre.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-06-20 09:39:27 +02:00
Galen Charlton
daca5edc52 Bug 7818: -x option of rebuild_zebra.pl now works with DOM filter
One consequence is that the -x and -a options are no longer
mutually exclusive.

Also, because of the way that the GRS-1 SGML filter works, if you're
indexing multiple documents, you can't just wrap them in a document
element, but the DOM filter *requires* it.  Consequently, two
new config settings in koha-conf.xml are added to indicate the
Zebra filter in use so that the -x option of rebuild_zebra.pl
knows whether to wrap the exported records or not:

- bib_index_mode (defaults to 'grs1' if not specified)
- auth_index_mode (defaults to 'dom')
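
A minimal Python sketch of the wrapping decision, under stated assumptions: the function name wrap_for_filter and the "collection" element name are chosen for this example and are not taken from the patch. Only the two setting names and their defaults come from the description above.

```python
def wrap_for_filter(records_xml, index_mode):
    """Join exported XML records, wrapping them in a single document
    element only when the DOM filter is in use."""
    body = "\n".join(records_xml)
    if index_mode == "dom":
        # The DOM filter requires one enclosing element ("collection"
        # is an assumed name for this sketch).
        return "<collection>\n" + body + "\n</collection>"
    # GRS-1: records are written one after another, unwrapped.
    return body


# Settings as they might be read from koha-conf.xml, with the defaults
# given in the commit message.
config = {"auth_index_mode": "dom"}
bib_mode = config.get("bib_index_mode", "grs1")
auth_mode = config.get("auth_index_mode", "dom")

records = ["<record>1</record>", "<record>2</record>"]
print(wrap_for_filter(records, bib_mode))
print(wrap_for_filter(records, auth_mode))
```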

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-06-09 11:44:09 +02:00
Paul Poulain
1fd8c8a4de Bug 7246 add offset/length and where options to rebuild_zebra
This patch reimplements a feature that is on biblibre/master for Koha-community/master

It adds 3 parameters:
* offset = the offset of record. Say 1000 to start rebuilding at the 1000th record of your database
* length = how many records to export. Say 400 to export only 400 records
* where = add a where clause to rebuild only a given itemtype, or anything you want to filter on

Another improvement made possible by the offset & length limits is rebuild_zebra_sliced.zsh,
which will be submitted in another patch.
rebuild_zebra_sliced will slice your whole database into small chunks and, if something goes wrong for a given slice, will slice the slice, and repeat, until it reaches a slice size of 1, showing which record is wrong in your database.
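
To make the effect of the three parameters concrete, here is a hedged Python sketch of how they could shape the export query. The function name build_export_query and the table and column names in the example call are illustrative assumptions, not the SQL that rebuild_zebra.pl actually issues.

```python
def build_export_query(table, id_column, offset=None, length=None, where=None):
    """Build a SELECT that lists the record ids to export.

    The where clause is inserted as is, so it must come from a trusted
    operator (as a command-line option would).
    """
    sql = f"SELECT {id_column} FROM {table}"
    if where:
        sql += f" WHERE {where}"
    sql += f" ORDER BY {id_column}"
    if length is not None:
        sql += f" LIMIT {int(length)}"
    elif offset is not None:
        # MySQL needs a LIMIT before OFFSET; a very large value means
        # "all remaining rows".
        sql += " LIMIT 18446744073709551615"
    if offset is not None:
        sql += f" OFFSET {int(offset)}"
    return sql


print(build_export_query("biblioitems", "biblionumber",
                         offset=1000, length=400, where="itemtype = 'BK'"))
```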

Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Removed mention of -l option for limiting number of items exported, as requested
by QA manager. This can be re-added in a later patch.

Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-02-17 10:59:23 +01:00
Colin Campbell
263dded818 Bug 6752: Be stricter with utf-8 encoding of output
Use encoding(UTF-8) rather than utf-8 for stricter
encoding.
Marking output as ':utf8' only flags the data as utf8;
using :encoding(UTF-8) also checks that it is valid utf-8.
See binmode in perlfunc for more details.
In accordance with the robustness principle, input
filehandles have not been changed, as code may make
the undocumented assumption that invalid utf-8 is present
in the input.
Fixes errors reported by t/00-testcritic.t.
Where feasible, some filehandles have been made lexical rather than
reusing global filehandle vars.

Signed-off-by: Jonathan Druart <jonathan.druart@biblibre.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-01-27 12:11:06 +01:00
Dobrica Pavlinusic
90d68d6f5c Bug 7247 - rebuild_zebra.pl -v should show all Zebra log output
Currently, the -v option resets Zebra log output to the default system values.

This produces the amount of logging specified in the system defaults, which is
usually too low for debugging.

This change explicitly forces all Zebra log output. Because that creates much
more chatter, it is triggered only at verbosity level 2.

Test scenario:
1. pick koha site to reindex
2. use -v -v options to rebuild_zebra.pl to see additional output

Signed-off-by: Liz Rea <wizzyrea@gmail.com>
Verified help corrections and  loglevel 2 output vs. loglevel 1 output. No issues found.

Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-01-17 17:31:25 +01:00
Robin Sheat
849547df68 Bug 7008 - create tmp dir for zebra
Sometimes zebra needs a tmp dir in order to work. This ensures that it
is created both by koha-create-dirs in the packages, and by
rebuild_zebra when it runs.
--

tested ok, signing off
Signed-off-by: Mason James <mtj@kohaaloha.com>
2011-12-03 07:56:44 +01:00
4ce57a102b Bug 6799 rebuild_zebra.pl -x produces invalid XML records
This patch allows items containing extended characters to be handled properly and
sends valid XML records to zebraidx.

Signed-off-by: Julian Maurice <julian.maurice@biblibre.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2011-11-18 23:29:08 +01:00
Ian Walls
4e95e94727 Bug 6789: biblios with many items can result in broken search results link
This patch fixes an issue whereby biblios with many items (often > 500) would index,
but not the biblionumber itself, resulting in search results with a) inaccurate item counts
and b) no biblionumber to use in the link to the details page.  This is due to Net::Z3950::ZOOM  not providing
a mechanism for specifying different connection attributes; the maximumRecordSize ZOOM connection attribute,
if not specified, defaults to 1MB, which is less than the size of a MARC record with many, many 952 fields.  Since
it is unlikely we can fix Net::Z3950::ZOOM in a timely fashion, this patch aims to build a workaround on the Koha end.

This patch changes EmbedItemsInMarcBiblio to use append_fields instead of insert_ordered_fields,
so the 999$c will come before the item records.  It's VERY unlikely we will encounter more than 1MB of biblio-level MARC
content, as this would break the ISO-2709 standard by a large factor.

To this end, it also moves the fix_biblio_ids portion of get_corrected_marc_record out of rebuild_zebra.pl,
and makes it a part of GetMarcBiblio (right before EmbedItemsInMarcBiblio, so the 952s still come last).  fix_biblio_ids
is kept as a subroutine for the deletion portion of rebuild_zebra.pl, which still uses it.

It also uses the subroutine parameter in GetMarcBiblio to do the EmbedItemsInMarcBiblio action, rather than having
rebuild_zebra.pl perform it on the itemless record returned from GetMarcBiblio.  Simpler and cleaner that way.

To verify bug issue:
1. Find a biblio with over 700 items (or enough that the resulting MARCXML is greater than 1MB)
2. search for this biblio (in a search that would return multiple results, not just this title).  You should get the title in
the results list
3. attempt to click the link to this biblio's details page; the biblionumber should be blank, leading to a 404

To test solution:
1. Apply patch
2. modify the biblio slightly (click the 005 for example) and save
   OR manually add the biblio to zebraqueue for reindexing
3. after rebuild_zebra.pl -z -b -x runs, use the same search as above. The title should still appear.
4. click the link, and find yourself on the biblio detail page as desired

Signed-off-by: D Ruth Bavousett <ruth@bywatersolutions.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
2011-10-15 13:47:24 +13:00
Jesse Weaver
048c0dc04e Bug 6492 - Deleted biblios cause rebuild_zebra to fail
This both adds a bit of a failsafe to get_raw_biblio, and prevents
records that have been deleted from being updated by the same instance
of rebuild_zebra.
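
The second part of the fix can be pictured with the Python sketch below, which drops update requests for records that are also queued for deletion in the same run. It is an illustration under assumptions: the function name plan_queue and the tuple layout are hypothetical, and 'specialUpdate' is assumed here as the name of the update operation.

```python
def plan_queue(entries):
    """Split queue entries into ids to delete and ids to update.

    entries: list of (record_id, operation) tuples, where operation is
    'recordDelete' or 'specialUpdate' (hypothetical layout).
    A record deleted in this run is never scheduled for an update.
    """
    deleted = {record_id for record_id, op in entries if op == "recordDelete"}
    updates = []
    for record_id, op in entries:
        if op != "specialUpdate":
            continue
        # Skip records already deleted, and avoid duplicate updates.
        if record_id in deleted or record_id in updates:
            continue
        updates.append(record_id)
    return sorted(deleted), updates


queue = [(1, "specialUpdate"), (2, "specialUpdate"),
         (2, "recordDelete"), (1, "specialUpdate")]
print(plan_queue(queue))
```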

Minor amendment to remove duplication of 6433

Signed-off-by: MJ Ray <mjr@phonecoop.coop>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
2011-07-05 11:18:28 +12:00
3b8f1318e0 Bug 6050 Followup, edit a last function call
Signed-off-by: Frédéric Demians <f.demians@tamil.fr>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
2011-06-14 14:12:05 +12:00
Srdjan Janković
5829cef6d8 bug_6433: exception handling
Signed-off-by: Magnus Enger <magnus@enger.priv.no>
2011-06-10 11:27:25 +12:00