86 commits

Author SHA1 Message Date
6122b8fe6e Bug 16830: (followup) Remove weird character from warning in rebuild_zebra.pl
Signed-off-by: Mark Tompsett <mtompset@hotmail.com>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>

Signed-off-by: Brendan Gallagher <brendan@bywatersolutions.com>
2016-08-04 19:41:42 +00:00
6c65b64c84 Bug 16505: Make sure $as_xml will not be used later
Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
2016-05-23 17:29:23 +00:00
5dd1b1bb66 Bug 16506: Remove warning for UNIMARC installs
Use of uninitialized value in numeric eq (==)

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
2016-05-23 17:29:12 +00:00
6c7a8c57e7 Bug 16506: (followup) Fix wrong option switch warning message
Signed-off-by: Bernardo Gonzalez Kriegel <bgkriegel@gmail.com>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
2016-05-23 17:29:12 +00:00
7f22619001 Bug 16506: (Followup) remove warnings
Signed-off-by: Bernardo Gonzalez Kriegel <bgkriegel@gmail.com>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
2016-05-23 17:29:12 +00:00
0b3b53e4f2 Bug 16506: Make rebuild_zebra.pl use XML as default
This patch deprecates the -x switch, making XML the default serialization format
used by rebuild_zebra.pl. It doesn't remove the option switch, but raises a warning
for the end user about the deprecation so they fix their cronjobs. Later we could remove it.

To test:
- Disable all indexing (daemon/cronjob)
- Create 2 records
- Edit one of them, delete the other one
- Verify they are queued for updates in zebraqueue
- sudo koha-mysql kohadev
  > SELECT * FROM zebraqueue WHERE done=0
...
| 265 |                265 | specialUpdate | biblioserver |    1 | 2016-05-13 14:23:45 |
| 266 |                  1 | recordDelete  | biblioserver |    1 | 2016-05-16 14:14:33 |
| 267 |                  2 | specialUpdate | biblioserver |    1 | 2016-05-16 14:15:06 |
+-----+--------------------+---------------+--------------+------+---------------------+
- Now go to koha-shell
  $ sudo koha-shell kohadev ; cd kohaclone
- Run:
  $ misc/migration_tools/rebuild_zebra.pl -k -b -z

  You will get something similar to this:
NOTHING cleaned : the export /tmp/jI0OeHy6Tn has been kept.
You can re-run this script with the -s  and -d /tmp/jI0OeHy6Tn parameters
if you just want to rebuild zebra after changing the record.abs
or another zebra config file
- Verify
  * less /tmp/jI0OeHy6Tn/del_biblio/exported_records
  * less /tmp/jI0OeHy6Tn/upd_biblio/exported_records
=> FAIL: They contain the records you added/modified/deleted but they are in
         USMARC format
- Apply the patch
- Mark your records for indexing (in koha-mysql kohadev)
  > UPDATE zebraqueue SET done=0 WHERE id > 264
- Run:
  $ misc/migration_tools/rebuild_zebra.pl -k -b -z

  You will get something similar to this:
<WARNINGS> [1]
NOTHING cleaned : the export /tmp/jI0OeHy6Tn has been kept.
You can re-run this script with the -s  and -d /tmp/jI0OeHy6Tn parameters
if you just want to rebuild zebra after changing the record.abs
or another zebra config file
- Verify
  * less /tmp/jI0OeHy6Tn/del_biblio/exported_records
  * less /tmp/jI0OeHy6Tn/upd_biblio/exported_records
=> SUCCESS: Data is correctly in XML format
- Run:
  $ misc/migration_tools/rebuild_zebra.pl -k -b -z -noxml

  You will get something similar to this:
<WARNINGS> [1]
NOTHING cleaned : the export /tmp/jI0OeHy6Tn has been kept.
You can re-run this script with the -s  and -d /tmp/jI0OeHy6Tn parameters
if you just want to rebuild zebra after changing the record.abs
or another zebra config file
- Verify
  * less /tmp/jI0OeHy6Tn/del_biblio/exported_records
  * less /tmp/jI0OeHy6Tn/upd_biblio/exported_records
=> SUCCESS: Data is correctly in USMARC format
- Sign off :-D

[1] Warnings covered by a followup

Signed-off-by: Bernardo Gonzalez Kriegel <bgkriegel@gmail.com>
On top of Bug 16505
Works as described following the test plan: USMARC is the default pre-patch;
post-patch, XML is the default and USMARC is available on request.
No errors (all patchset)

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
2016-05-23 17:29:12 +00:00
7a178fd262 Bug 16505: <collection> is missing the marc namespace and updates fail if -x is passed
Using rebuild_zebra.pl with the -x option switch produces output that does not match
what our XSLTs expect for indexing. This patch adds the right namespace information
to the exported records so indexing succeeds.

To test:
- On current master, have some records on your db
- Run:
  $ sudo koha-shell kohadev
  $ cd kohaclone
  $ misc/migration_tools/rebuild_zebra.pl -r -b -k -x
=> you will get a message like this:

NOTHING cleaned : the export /tmp/NL5ufjUfpp has been kept.

- Run
  $ less /tmp/NL5ufjUfpp/biblio/exported_records
=> FAIL: The first line looks like this

<?xml version="1.0" encoding="UTF-8"?><collection><record

- Now run:
  $ xsltproc \
     /etc/koha/zebradb/marc_defs/marc21/biblios/biblio-zebra-indexdefs.xsl \
     /tmp/NL5ufjUfpp/biblio/exported_records
=> FAIL: No output
- Apply the patch
- Run:
  $ misc/migration_tools/rebuild_zebra.pl -r -b -k -x
- Take a look at the result file:
  $ less /tmp/asdiouqwiue/biblio/exported_records
=> SUCCESS: The start of the file looks like this:
<?xml version="1.0" encoding="UTF-8"?><collection xmlns="http://www.loc.gov/MARC21/slim">

- Run:
  $ xsltproc \
     /etc/koha/zebradb/marc_defs/marc21/biblios/biblio-zebra-indexdefs.xsl \
     /tmp/asdiouqwiue/biblio/exported_records
=> SUCCESS: There is actually indexing data :-D
- Sign off :-D

Edit: I changed qq{} for q{} as suggested by Jonathan.

Sponsored-by: American Numismatic Society

Signed-off-by: Bernardo Gonzalez Kriegel <bgkriegel@gmail.com>
Works as described following test plan
No errors

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
2016-05-23 17:04:04 +00:00
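
The namespace problem described in Bug 16505 can be reproduced outside Koha. The sketch below is only an illustration, assuming the indexing stylesheet selects elements by their MARC21 slim namespace-qualified names; the sample records and the helper `count_marc_records` are hypothetical and not part of rebuild_zebra.pl.

```python
import xml.etree.ElementTree as ET

# Namespace URI quoted in the commit message above.
MARC_NS = "http://www.loc.gov/MARC21/slim"

# Hypothetical one-record exports, before and after the patch.
without_ns = "<collection><record><leader>00000nam a22</leader></record></collection>"
with_ns = (
    f'<collection xmlns="{MARC_NS}">'
    "<record><leader>00000nam a22</leader></record></collection>"
)


def count_marc_records(xml_text):
    """Count <record> children that live in the MARC21 slim namespace."""
    root = ET.fromstring(xml_text)
    return len(root.findall(f"{{{MARC_NS}}}record"))


print(count_marc_records(without_ns))  # 0: the records are not in the namespace
print(count_marc_records(with_ns))     # 1: the default xmlns applies to children
```

A namespace-aware consumer sees nothing in the first export, which is consistent with the empty xsltproc output noted in the test plan.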
Julian Maurice
48df0b8a2d Bug 15325: Fix --table option of rebuild_zebra.pl
Option's value given on command line was never used and 'biblioitems'
was used instead.

Test plan:
1. git checkout master
2. perl misc/migration_tools/rebuild_zebra.pl -b -t items --where "price = 42"
3. You should see errors printed on screen about an unknown column
4. Apply patch
5. perl misc/migration_tools/rebuild_zebra.pl -b -t items --where "price = 42"
6. No errors \o/

Signed-off-by: Frédéric Demians <f.demians@tamil.fr>

Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
2015-12-11 16:15:50 +00:00
Jonathan Druart
e8055c7ef6 Bug 12368: Die if the --table value is not allowed.
If the table given as a parameter is not in the whitelist, the script
should die rather than fall back to a default value.

Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
2015-10-09 14:25:58 -03:00
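
A minimal sketch of the "die instead of silently defaulting" behaviour described in Bug 12368. The whitelist contents and the function name `resolve_table` are assumptions made for illustration; the real script is Perl and may validate the option differently.

```python
# Hypothetical whitelist; the commit only mentions biblioitems and items.
ALLOWED_TABLES = ("biblioitems", "items")


def resolve_table(requested, default="biblioitems"):
    """Return the table to filter on, refusing anything outside the whitelist."""
    if requested is None:
        # No --table given: the default is still acceptable.
        return default
    if requested not in ALLOWED_TABLES:
        # Fail loudly rather than quietly substituting the default.
        raise ValueError(
            f"Unknown table '{requested}'; allowed: {', '.join(ALLOWED_TABLES)}"
        )
    return requested


print(resolve_table(None))     # biblioitems
print(resolve_table("items"))  # items
```

Failing early matters here because a silently replaced table makes a `--where` clause run against columns the user did not intend.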
Jonathan Druart
2d9c221abc Bug 12368: Rebuild Zebra improvement: allow to specify a DB table
Currently the --where parameter only allows specifying a condition on
fields in the biblioitems table.
For some needs it would be useful to specify a condition on a field in
the items table.

The use case is the following: you want to reindex biblios with items
modified since a specific timestamp.

Test plan:
1/ Pick an item randomly in your catalogue
2/ Edit it and save
3/ Note that the items.timestamp has been set to today but not the
biblioitems.timestamp
4/ launch rebuild_zebra without the new parameter
  perl misc/migration_tools/rebuild_zebra.pl -b -v --where
  "timestamp >= XXX"
where XXX is today's date (e.g. "2014-06-05 00:00:00").
Note that the biblio has not been indexed.
5/ launch rebuild_zebra using the new parameter:
  perl misc/migration_tools/rebuild_zebra.pl -b -v -t items --where
  "timestamp >= XXX"
Note the biblio has been indexed.

Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
2015-10-09 14:25:58 -03:00
0a53d5e6b6 Bug 12651: DOM indexing is the default
At the 23 July development meeting it was decided to formally deprecate
GRS-1 indexing mode for Zebra. This patch makes the code fall back to DOM
in the remaining places. No behaviour change should be noticed, as DOM
has been the default for a while.

Regards

Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Passes tests and QA script.
Also checked running Makefile.PL

Signed-off-by: Tomas Cohen Arazi <tomascohen@gmail.com>
2014-10-27 12:35:44 -03:00
Galen Charlton
03338b70e4 Bug 10955: (follow-up) improve usage information
This patch improves rebuild_zebra.pl's usage help
by explaining when --skip-deletes should be considered
and noting that it should be used in conjunction with
a cronjob to process deletions after hours.

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
2014-03-10 18:46:28 +00:00
b0870311e1 Bug 10955 - Add ability to skip deletions in zebraqueue
It seems that record deletions can cause extreme slowdowns for Koha
installations with extremely large numbers of records. It would be
helpful to be able to skip record deletions when processing the
zebraqueue with rebuild_zebra.pl so the deletions can be processed with
a lower frequency.

Test Plan:
1) Disable any zebra indexing cronjobs you may have
2) Delete a record
3) Note the operation recordDelete in the zebraqueue table having done = 0
4) Run misc/migration_tools/rebuild_zebra.pl -b -z --skip-deletes
5) Note the delete still has done = 0
6) Run misc/migration_tools/rebuild_zebra.pl -b -z
7) Note the delete now has done = 1

Signed-off-by: Jonathan Druart <jonathan.druart@biblibre.com>
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Passes all tests and QA script.
Also tested for authorities, no problems found.

Signed-off-by: Galen Charlton <gmc@esilibrary.com>

RM note: this is at best a work-around, and I will emphasize that
--skip-deletes should be used only when absolutely necessary.

I hope that --skip-deletes can go away at some point soon, but
that may depend on changes to Zebra.
2014-03-10 18:44:10 +00:00
Galen Charlton
160c44d4e9 Bug 11078: (follow-up) tidy code
- fix a couple typos in comments
- replace a "$i" with a more descriptive variable name
- style some of the new code

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
2014-02-28 22:24:28 +00:00
07de37f0e5 Bug 11078: QA Follow-up for missing file permissions on lockfile
The original patch creates a lockfile in the ZEBRA_LOCKDIR.
It can fall back to /var/lock or even /tmp.
If the create fails, it dies. This can be considered as very
exceptional.

This followup adjusts the fallback location in /var/lock or /tmp
slightly.  It appends the database name to the folder in order to
prevent interference between multiple Koha instances. Creation of the
lockfile has been moved to a subroutine that extends the directory and file
creation testing.

In the very unlikely case that we cannot create the lockfile (after
three separate tries), this follow-up allows you to continue instead
of dying.  This is just what we did before we had file locking here.
Skipping a reindex every time could cause more harm than continuing and
hitting the race condition once in a while.

Test plan:
Test adding and removing lockdir from your koha-conf.xml. Check fallback.
Note that fallback in /var/lock or /tmp must contain database name.
Remove the lockdir config line and remove permissions from fallback. In
this case the reindex should continue but with a warning.

Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Tested with daemon and one-off invocation simultaneously.
Tested new wait parameter.
Tried all variations of lock directory (changing permissions etc.)

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
2014-02-28 22:22:47 +00:00
Doug Kingston
88e7faf860 Bug 11078: Add locking to rebuild_zebra
This patch adds locking to rebuild_zebra.pl to ensure that simultaneous
changes are prevented (as one is likely to overwrite the other).
Incremental updates in daemon mode will be skipped if the lock is busy
and they will be picked up on the next pass.  Non-daemon mode
invocations will also exit immediately if they cannot get the lock
unless the new flag -wait-for-lock is specified, in which case they
will wait until they get the lock and then proceed.

Supporting changes made to Makefile.PL and templates for the new
locking directory (paralleling the other zebra lock directories).
We stash the zebra_lockdir in koha-conf.xml so rebuild_zebra.pl
can find it.

To address earlier QA concerns we:
1. added code to check if flock is available and ignore locking if
it's missing (from M. de Rooy)

2. changed default for adhoc invocations to abort if they cannot
obtain the lock.  Added option -wait-for-lock if the user prefers
to wait until the lock is free, and then continue processing.

3. added missing entry to t/db_dependent/zebra_config.pl

4. added a fallback locking directory of /tmp

Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Doug merged the original patch with the QA changes.
Just for the record, noting here that the original patch was tested
extensively too by Martin Renvoize.
I have added a followup for some exceptional cases.

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
2014-02-28 22:21:41 +00:00
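
The locking policy described in Bug 11078 (give up at once if the lock is busy, or block when asked to wait) can be sketched with an advisory flock. This is a Unix-only illustration, not the Perl implementation; the function name `acquire_lock`, the lock file name and the temporary directory are hypothetical.

```python
import fcntl
import os
import tempfile


def acquire_lock(path, wait=False):
    """Return an open handle holding an exclusive flock, or None if it is busy."""
    handle = open(path, "w")
    # Without wait, LOCK_NB makes flock fail immediately instead of blocking.
    flags = fcntl.LOCK_EX if wait else fcntl.LOCK_EX | fcntl.LOCK_NB
    try:
        fcntl.flock(handle, flags)
    except BlockingIOError:
        handle.close()
        return None
    return handle


lock_path = os.path.join(tempfile.mkdtemp(), "rebuild.lock")
first = acquire_lock(lock_path)   # gets the lock
second = acquire_lock(lock_path)  # lock is busy, so this returns None
print(first is not None, second is None)

first.close()                     # closing the handle releases the lock
third = acquire_lock(lock_path)   # the lock is free again
print(third is not None)
third.close()
```

Returning None on a busy lock corresponds to the "skip this pass" behaviour for the daemon; calling with `wait=True` corresponds to -wait-for-lock.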
Galen Charlton
b26870e53d Bug 11252: remove deprecated -munge-config switch from rebuild_zebra.pl
The -munge-config switch has been deprecated for years, and
trying to use it would either not work at all or, if it did "work",
almost certainly damage one's Zebra configuration for Koha.

This patch removes this switch.

To test:

[1] Run rebuild_zebra.pl and verify that no mention is made
    of -munge-config.
[2] Run rebuild_zebra.pl to index records in one's test database
    and verify that there are no regressions.

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
Removing a really dangerous option

Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Passes all tests and QA script.
Ran rebuild_zebra.pl with various options and confirmed
that data was reindexed successfully.
No regressions found.

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
2013-12-26 15:24:41 +00:00
Galen Charlton
b25de3e7cf Bug 6435: (follow-up) make -daemon really imply -a and -b
This patch follows up on the previous patch by moving the
check for whether authority and/or biblio indexing have been
specified so that -daemon has a chance to set those modes.

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
2013-11-24 18:20:56 +00:00
Doug Kingston
00240d6970 Bug 6435: (follow-up) rebuild_zebra -daemon option now smarter
Based on feedback, make daemon mode imply -z -a -b and abort
on startup if flags incompatible with an incremental update daemon
are used.  Update documentation to match.

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
2013-11-24 18:15:23 +00:00
Doug Kingston
1b0992e8d5 Bug 6435: Add daemon mode to rebuild_zebra.pl
This change adds code to check the zebraqueue table with a cheap SQL query
and a daemon loop that checks for new entries and processes them incrementally
before sleeping for a controllable number of seconds.  The default is 5 seconds
which provides a near realtime search index update.  This is desirable particularly
for libraries that are doing active catalogue updating.  The query is adjusted
based on whether -a, -b, or -a -b are specified.

Help text updated.  Tested against a live 3.12 system.

Note that this fix will benefit from the fix to lack of locking (bug 11078)

Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
2013-11-24 18:12:21 +00:00
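
The daemon loop described in Bug 6435 can be pictured as a cheap "anything pending?" query followed by a sleep. The sketch below uses an in-memory SQLite table as a stand-in for Koha's MySQL zebraqueue; the column names follow the test plans in this log, while `pending_count`, `daemon_loop`, `mark_all_done` and the `max_passes` parameter are hypothetical, and 'authorityserver' is assumed as the server name for authorities.

```python
import sqlite3
import time


def pending_count(conn, servers):
    """Cheap check: how many queue entries are still waiting for these servers?"""
    placeholders = ",".join("?" for _ in servers)
    row = conn.execute(
        f"SELECT COUNT(*) FROM zebraqueue WHERE done = 0 AND server IN ({placeholders})",
        servers,
    ).fetchone()
    return row[0]


def daemon_loop(conn, servers, process, sleep_seconds=5, max_passes=None):
    """Poll the queue; only do the expensive work when something is pending."""
    passes = 0
    while max_passes is None or passes < max_passes:
        if pending_count(conn, servers) > 0:
            process(conn, servers)
        passes += 1
        time.sleep(sleep_seconds)


def mark_all_done(conn, servers):
    """Stand-in for the real incremental indexing step."""
    placeholders = ",".join("?" for _ in servers)
    conn.execute(
        f"UPDATE zebraqueue SET done = 1 WHERE server IN ({placeholders})", servers
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE zebraqueue (id INTEGER PRIMARY KEY, biblio_auth_number INTEGER,"
    " operation TEXT, server TEXT, done INTEGER DEFAULT 0)"
)
conn.execute(
    "INSERT INTO zebraqueue (biblio_auth_number, operation, server)"
    " VALUES (1, 'specialUpdate', 'biblioserver')"
)
conn.execute(
    "INSERT INTO zebraqueue (biblio_auth_number, operation, server)"
    " VALUES (7, 'specialUpdate', 'authorityserver')"
)

# Biblio-only daemon (the -b case): two passes, no real sleeping in the demo.
daemon_loop(conn, ["biblioserver"], mark_all_done, sleep_seconds=0, max_passes=2)
print(pending_count(conn, ["biblioserver"]), pending_count(conn, ["authorityserver"]))
```

Passing a different `servers` list is how the sketch mirrors the query being adjusted for -a, -b, or both.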
2eefd1f3a5 Bug 8745: General whitespace and tab tidy
http://bugs.koha-community.org/show_bug.cgi?id=8745
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
1) Does not run as root.
2) Runs as root with -run-as-root.
3) Runs as the normal koha user.

Note: Maybe the message should be clear about why
running as root is bad and which user you should
be running the script with?
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2013-04-21 09:41:34 -04:00
Barry Cannon
ef86a77801 Bug 8745 - Disallow rebuild_zebra.pl from executing, when run by root user.
Added a check to warn users of execution as root user.
Added a 'runas-root' switch to allow users to force execution as root user.

Signed-off-by: Mason James <mtj@kohaaloha.com>
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2013-04-21 09:41:34 -04:00
f9c8f39c02 Bug 9609: Rebuilding zebra reports double number of exported records.
Test plan:
Clear the zebra queue (run rebuild). Update one biblio.
Rebuild zebra (again) with -z. Check zebra log: note 2 exported records.
Now apply patch, and repeat: You will see 1 exported record.

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Works as described.
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2013-04-02 08:41:40 -04:00
Galen Charlton
151e22070a bug 9496: improve error checking in rebuild_zebra.pl
When using rebuild_zebra to index all records, skip over
bibliographic or authority records that don't come out
as valid XML.  Also, strip extraneous XML declarations when
using --nosanitize.

Test plans
----------
Note that both plans assume that DOM indexing is turned on.

Test plan #1
============

[1] Run rebuild_zebra.pl with the -x -nosanitize options.  Without
    the patch, zebraidx should terminate early and complain
    about invalid XML.
[2] With the patch, the rebuild_zebra.pl should work without
    error.

Test plan #2
============
[1] Intentionally make a MARCXML record invalid, e.g, by running
    the following SQL:

    UPDATE biblioitems SET marcxml = CONCAT(marcxml, 'junk')
    WHERE biblionumber = 123;

[2] Run rebuild_zebra.pl -b -x -r
[3] Without the patch, only part of the database will be indexed.
[4] With the patch, rebuild_zebra.pl will not export the bad
    record and will give an error message saying so, but will
    successfully index the rest of the records.

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
Signed-off-by: Larry Baerveldt <larry@bywatersolutions.com>
Signed-off-by: Mason James <mtj@kohaaloha.com>

Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2013-03-21 22:25:03 -04:00
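
The idea in Bug 9496 of skipping records that are not valid XML while still exporting the rest can be sketched as follows. This is an assumption-laden illustration: `export_valid_records` and the sample records are hypothetical, and the real script performs its check in Perl.

```python
import xml.etree.ElementTree as ET


def export_valid_records(records):
    """Split (record_id, marcxml) pairs into exportable XML and skipped ids."""
    exported, skipped = [], []
    for record_id, marcxml in records:
        try:
            ET.fromstring(marcxml)  # well-formedness check only
        except ET.ParseError:
            skipped.append(record_id)  # report it and keep going
            continue
        exported.append(marcxml)
    return exported, skipped


sample = [
    (122, "<record><leader>ok</leader></record>"),
    (123, "<record><leader>ok</leader></record>junk"),  # trailing junk, as in test plan #2
    (124, "<record><leader>ok</leader></record>"),
]
exported, skipped = export_valid_records(sample)
print(len(exported), skipped)  # 2 [123]
```

One bad record then costs a single error message instead of cutting the whole indexing run short.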
4dcee58a4d Bug 7440 - Remove NoZebra vestiges
Removed NoZebra vestiges. This comprises several code blocks that depend on the NoZebra syspref and NZ related functions/methods.

C4::Biblio->
 GetNoZebraIndexes
 _DelBiblioNoZebra
 _AddBiblioNoZebra

C4::Search->
 NZgetRecords
 NZanalyse
 NZoperatorAND
 NZoperatorOR
 NZoperatorNOT
 NZorder

C4::Installer->
 set_indexing_engine

Sponsored-by: Universidad Nacional de Córdoba
Signed-off-by: Julian Maurice <julian.maurice@biblibre.com>

Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2013-03-19 21:17:04 -04:00
Jared Camins-Esakov
49cadcf7c1 Bug 9049: Don't use shadow with rebuild_zebra -r
Due to a limitation of Zebra, the register must be cleared *before*
doing shadow indexing if you want to reset the indexes. In light of
that, it does not make sense to do shadow indexing at all when
rebuild_zebra.pl is run with the -r switch. This patch makes -r (reset)
imply -n (no shadow).

To test:
1) Run `rebuild_zebra.pl -b -r -v -v -v`
2) Note that the script never runs the merge phase

Without the patch I see log lines referring to the shadow cache (enabling shadow spec=/home/koha/koha-dev/var/lib/zebradb/biblios/shadow:20G)
With the patch I don't see anything in the logs about shadow.  I do however see lines about merging.  I think it could just be a misunderstanding of the logs

Signed-off-by: wajasu <matted-34813@mypacks.net>
Signed-off-by: Elliott Davis <elliott@bywatersolutions.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2012-12-08 09:46:30 -05:00
Jared Camins-Esakov
deeeb068d9 Bug 9050: Use safer adelete when deleting records from Zebra index
Previously we used the "delete" command in zebraidx, which fails when
you try to delete a record that doesn't exist in the index. By changing
to the "adelete" command, we can reduce the likelihood of a failed
delete causing ghost records. A symptom of this problem is the warning
message occasionally encountered when indexing from the zebraqueue,
"[warn] cannot delete record above (seems new)."

To test:
1) Add a recordDelete action for a record that does not exist to
   zebraqueue in MySQL:
   INSERT INTO zebraqueue (biblio_auth_number, operation, server) \
       VALUES (999999999, 'recordDelete', 'biblioserver');
2) Run `rebuild_zebra.pl -b -z -v [-x]`.
3) Note that you do not get the message "[warn] cannot delete record
   above (seems new)".

Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>
Passed-QA-by: Paul Poulain <paul.poulain@biblibre.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2012-11-12 18:53:49 -05:00
Jared Camins-Esakov
bc05b5d163 Bug 7417: Include see from references in bibliographic searches
This patch adds the Koha::Indexer::RecordNormalizer and
Koha::Indexer::MARC::RecordNormalizer::EmbedSeeFromHeadings packages
to enable the inclusion of alternate forms of headings in bibliographic
searches. When the new syspref IncludeSeeFromInSearches is turned on
(default is off) rebuild_zebra.pl will insert see from headings from
authority records into bibliographic records when indexing, so that a
search on an obsolete term will turn up relevant records.

To test:
1) Enable IncludeSeeFromInSearches
2) Add a heading that has an alternate form to a record (for example,
   "Cooking" has the alternate form "Cookery," if you have authority
   records from LC)
3) Index the zebraqueue (or reindex if you haven't indexed your system
   yet)
4) Confirm that if you search for "Cookery" you get the record you
   just modified

Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Rebased on master 5 August 2012
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Rebased on master 11 September 2012

Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>

Also checked:
- Verified database update works correctly
- Checked system preference and its description
- Checked staff/opac detail pages with feature on/off
- Checked staff/opac search facets
- Downloaded and tested records in various formats
- Tried different searches for 'see from' entries of authorities
- Ran all unit tests

No problems found.
2012-09-13 14:19:28 +02:00
Julian Maurice
57424a9fdc Bug 7286: rebuild_zebra_sliced for biblios and authorities
Complete rewrite of rebuild_zebra_sliced.zsh (renamed to .sh). Main
improvements are:
  - both biblio and authority records are handled
  - records are exported only once

It also adds a --skip-index option to rebuild_zebra.pl that allows
rebuild_zebra.pl to be used as an 'export only' script.

Description:
Index Koha records by chunks. This is useful when some records cause
errors and stop the indexing process. With this script, if indexing
of one chunk fails, the chunk is split into 2 (or 3) chunks, and
indexing continues on these chunks.
rebuild_zebra.pl is called only once to export records.
Splitting and indexing is handled by this script (using yaz-marcdump and
zebraidx).

Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-07-06 15:06:40 +02:00
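
The chunk-splitting strategy described in Bug 7286 amounts to a recursive bisection. The sketch below illustrates that idea only; it is not the shell script itself. `index_in_slices` and the fake indexer `index_chunk` are hypothetical, and it always splits in two, whereas the script may split into 2 or 3 chunks.

```python
def index_in_slices(records, index_chunk):
    """Index records chunk by chunk; return the records that fail on their own."""
    if not records:
        return []
    if index_chunk(records):
        return []             # the whole chunk indexed cleanly
    if len(records) == 1:
        return list(records)  # a single failing record: this is the culprit
    middle = len(records) // 2
    # The chunk failed: split it and retry each half separately.
    return index_in_slices(records[:middle], index_chunk) + index_in_slices(
        records[middle:], index_chunk
    )


BAD_RECORDS = {3, 8}  # hypothetical records that break indexing


def index_chunk(chunk):
    """Stand-in for a zebraidx run: fails if the chunk contains a bad record."""
    return not any(record in BAD_RECORDS for record in chunk)


print(index_in_slices(list(range(1, 11)), index_chunk))  # [3, 8]
```

Every good record ends up in some chunk that indexes successfully, so only the isolated bad records are left out.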
christophe croullebois
082bb5049d Bug 8136 Changes the expected length of 100$a in rebuild_zebra.pl
In rebuild_zebra.pl, if we are in "unimarc" ("marcflavour" syspref), the sub "fix_unimarc_100" is called and checks whether the length of 100$a is equal to 35.
If it is not, the sub inserts the localtime and more, so we lose the data when reindexing.
The standard length is 36.
I have just changed 35 to 36.

Signed-off-by: Sophie Meynieux <sophie.meynieux@biblibre.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-06-20 09:39:27 +02:00
Galen Charlton
daca5edc52 Bug 7818: -x option of rebuild_zebra.pl now works with DOM filter
One consequence is that the -x and -a options are no longer
mutually exclusive.

Also, because of the way that the GRS-1 SGML filter works, if you're
indexing multiple documents, you can't just wrap them in a document
element, but the DOM filter *requires* it.  Consequently, two
new config settings in koha-conf.xml are added to indicate the
Zebra filter in use so that the -x option of rebuild_zebra.pl
knows whether to wrap the exported records or not:

- bib_index_mode (defaults to 'grs1' if not specified)
- auth_index_mode (defaults to 'dom')

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-06-09 11:44:09 +02:00
Paul Poulain
1fd8c8a4de Bug 7246 add offset/length and where options to rebuild_zebra
This patch reimplements a feature that is on biblibre/master for Koha-community/master

It adds 4 parameters:
* offset = the offset of record. Say 1000 to start rebuilding at the 1000th record of your database
* length = how many records to export. Say 400 to export only 400 records
* where = add a where clause to rebuild only a given itemtype, or anything you want to filter on

Another improvement resulting from the offset and length limits is rebuild_zebra_sliced.zsh,
which will be submitted in another patch.
rebuild_zebra_sliced will slice your whole database into small chunks and, if something goes wrong for a given slice, will slice the slice and repeat until it reaches a slice size of 1, showing which record is wrong in your database.

Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Removed mention of -l option for limiting number of items exported, as requested
by QA manager. This can be re-added in a later patch.

Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-02-17 10:59:23 +01:00
Colin Campbell
263dded818 Bug 6752: Be stricter with utf-8 encoding of output
Use encoding(UTF-8) rather than utf-8 for stricter
encoding.
Marking output as ':utf8' only flags the data as utf8;
using :encoding(UTF-8) also checks that it is valid utf-8
(see binmode in perlfunc for more details).
In accordance with the robustness principle, input
filehandles have not been changed, as code may make
the undocumented assumption that invalid utf-8 is present
in the input.
Fixes errors reported by t/00-testcritic.t.
Where feasible, some filehandles have been made lexical rather than
reusing global filehandle vars.

Signed-off-by: Jonathan Druart <jonathan.druart@biblibre.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-01-27 12:11:06 +01:00
Dobrica Pavlinusic
90d68d6f5c Bug 7247 - rebuild_zebra.pl -v should show all Zebra log output
Currently, the -v option resets Zebra log output to the default system values.

This produces the amount of logging specified in the system defaults, which is usually
too low for debugging.

This change explicitly forces all Zebra log output, which creates much more
chatter, so it is only triggered at verbosity level 2.

Test scenario:
1. pick koha site to reindex
2. use -v -v options to rebuild_zebra.pl to see additional output

Signed-off-by: Liz Rea <wizzyrea@gmail.com>
Verified help corrections and  loglevel 2 output vs. loglevel 1 output. No issues found.

Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-01-17 17:31:25 +01:00
Robin Sheat
849547df68 Bug 7008 - create tmp dir for zebra
Sometimes zebra needs a tmp dir in order to work. This ensures that it
is created both by koha-create-dirs in the packages, and by
rebuild_zebra when it runs.
--

tested ok, signing off
Signed-off-by: Mason James <mtj@kohaaloha.com>
2011-12-03 07:56:44 +01:00
4ce57a102b Bug 6799 rebuild_zebra.pl -x produces invalid XML records
This patch properly handles items containing extended characters and
sends valid XML records to zebraidx

Signed-off-by: Julian Maurice <julian.maurice@biblibre.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2011-11-18 23:29:08 +01:00
Ian Walls
4e95e94727 Bug 6789: biblios with many items can result in broken search results link
This patch fixes an issue whereby biblios with many items (often > 500) would index,
but not the biblionumber itself, resulting in search results with a) inaccurate item counts
and b) no biblionumber to use in the link to the details page.  This is due to Net::Z3950::ZOOM  not providing
a mechanism for specifying different connection attributes; the maximumRecordSize ZOOM connection attribute,
if not specified, defaults to 1MB, which is less than the size of a MARC record with many, many 952 fields.  Since
it is unlikely we can fix Net::Z3950::ZOOM in a timely fashion, this patch aims to build a workaround on the Koha end.

This patch changes EmbedItemsInMarcBiblio to use append_fields instead of insert_ordered_fields,
so the 999$c will come before the item records.  It's VERY unlikely we will encounter more than 1MB of biblio-level MARC
content, as this would break the ISO-2709 standard by a large factor.

To this end, it also moves the fix_biblio_ids portion of get_corrected_marc_record out of rebuild_zebra.pl,
and makes it a part of GetMarcBiblio (right before EmbedItemsInMarcBiblio, so the 952s still come last).  fix_biblio_ids
is kept as a subroutine for the deletion portion of rebuild_zebra.pl, which still uses it.

It also uses the subroutine parameter in GetMarcBiblio to do the EmbedItemsInMarcBiblio action, rather than having
rebuild_zebra.pl perform it on the itemless record returned from GetMarcBiblio.  Simpler and cleaner that way.

To verify bug issue:
1. Find a biblio with over 700 items (or enough that the resulting MARCXML is greater than 1MB)
2. search for this biblio (in a search that would return multiple results, not just this title).  You should get the title in
the results list
3. attempt to click the link to this biblio's details page; the biblionumber should be blank, leading to a 404

To test solution:
1. Apply patch
2. modify the biblio slightly (click the 005 for example) and save
   OR manually add the biblio to zebraqueue for reindexing
3. after rebuild_zebra.pl -z -b -x runs, use the same search as above. The title should still appear.
4. click the link, and find yourself on the biblio detail page as desired

Signed-off-by: D Ruth Bavousett <ruth@bywatersolutions.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
2011-10-15 13:47:24 +13:00
Jesse Weaver
048c0dc04e Bug 6492 - Deleted biblios cause rebuild_zebra to fail
This both adds a bit of a failsafe to get_raw_biblio, and prevents
records that have been deleted from being updated by the same instance
of rebuild_zebra.

Minor amendment to remove duplication of 6433

Signed-off-by: MJ Ray <mjr@phonecoop.coop>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
2011-07-05 11:18:28 +12:00
3b8f1318e0 Bug 6050 Followup, edit a last function call
Signed-off-by: Frédéric Demians <f.demians@tamil.fr>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
2011-06-14 14:12:05 +12:00
Srdjan Janković
5829cef6d8 bug_6433: exception handling
Signed-off-by: Magnus Enger <magnus@enger.priv.no>
2011-06-10 11:27:25 +12:00
e96315556b bug 5579: new routine to embed items in bib
Adds a new routine, C4::Biblio::EmbedItemsInMarcBiblio, to
embed the items in the bib record when necessary:

* cataloging/additem.pl
* rebuild_zebra.pl

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
Signed-off-by: Claire Hernandez <claire.hernandez@biblibre.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
2011-04-19 22:34:21 +12:00
Henri-Damien LAURENT
3584c4426b Bug 5579: remove items from MARC bib
This is a squash of four patches by Henri-Damien Laurent
starting work on removing the copy of item record information
in the 9XX field of bibliographic records.  The reason
for doing this is primarily to improve performance, in particular,
the expense of having to add/modify the bib record whenever an
item changes.  Now, whenever an item changes, the bib record is
put in the queue to be reindexed; when the bib is indexed, the 9XX
fields are inserted into the version of the bib that Zebra indexes.
Since rebuild_zebra.pl runs in a separate process, the processing of the
bib record will not delay (e.g.) circulation.

As part of upgrading to 3.4, the following batch script should be run:

misc/maintenance/remove_items_from_biblioitems.pl --run

This should be followed by a complete reindexing of the bib records, e.g.,

misc/migration_tools/rebuild_zebra.pl -b -r

Signed-off-by: Galen Charlton <gmcharlt@gmail.com>
Signed-off-by: Claire Hernandez <claire.hernandez@biblibre.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
2011-04-19 22:33:56 +12:00
Ian Walls
8dc56a0d2c Bug 5831: rebuild_zebra.pl doesn't respect -r
Reimplements support for -r, as well as for -reset

Signed-off-by: D Ruth Bavousett <ruth@bywatersolutions.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
2011-03-06 08:44:57 +13:00
Robin Sheat
8de1ef7e94 Bug 5228 - make rebuild_zebra handle fixing the zebra dirs
If the zebra server directories don't exist, zebra will spit the dummy.
This makes rebuild_zebra.pl smart enough to create them if they're not
there. If that fails, it'll scream loudly so you know zebra isn't
reindexing.

Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
2010-12-13 21:59:49 +13:00
Robin Sheat
57d11aee2c Bug 5077 - ensure rebuild_zebra will run somewhere it can read
This prevents it leaving files lying around in /tmp

Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
Signed-off-by: Galen Charlton <gmcharlt@gmail.com>
2010-10-06 08:00:17 -04:00
Donovan Jones
5e0b850d49 Bug 2505 - Add commented use warnings where missing in the misc/ directory
2010-04-21 20:26:44 +12:00
459d732180 Bug 3301 - Speed up rebuild_zebra script
With this patch, rebuild_zebra can re-index a whole Koha DB
quickly:

  rebuild_zebra -r -b -nosanitize

Biblio (authority) records are dumped directly to a file
from the marcxml field without being transformed into
MARC::Record objects and corrected.

DOCUMENTATION:

rebuild_zebra.pl new parameter:

-nosanitize  export biblio/authority records directly from DB marcxml
             field without sanitizing records. It speeds up the
             dump process but could fail if the DB contains badly
             encoded records. Currently works only with -x and -b

Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
2009-06-29 07:52:46 -05:00
Brian Harrington
25cd35b3a1 bug 2924 fixed rebuild_zebra.pl to work when export is skipped
reindexing now occurs if there are $num_records_exported or if
$skip_export is set

Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
2009-03-04 08:28:22 -06:00
Michael Hafen
086b3ccf9a bug in rebuild_zebra verbose logging - found another print I didn't want to see all the time
Add the phrase 'if ( $verbose_logging )' to the two print statements
concerning the skipping of biblio or authority records.

I recently had to split biblio and authority index updating in my cron
script (had some really big records, so had to add the -x switch, which
should only be used on biblios according to the help).  So I noticed
that rebuild_zebra.pl printed messages that it was skipping biblios or
authorities.

This patch is to conditionalize those prints based on the verbose
logging switch.

Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
2008-12-11 09:23:28 -06:00
Michael Hafen
62a590a954 Reduce logging from rebuild_zebra.pl with a command line option
This reduces the output of the script and zebraidx, and creates a -v
command line switch which will increase the logging to their former
states.

Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
2008-10-01 13:05:20 -05:00