Commit graph

70 commits

Author SHA1 Message Date
Galen Charlton
b26870e53d Bug 11252: remove deprecated -munge-config switch from rebuild_zebra.pl
The -munge-config switch has been deprecated for years, and
trying to use it would either not work at all or, if it did "work",
almost certainly damage one's Zebra configuration for Koha.

This patch removes this switch.

To test:

[1] Run rebuild_zebra.pl and verify that no mention is made
    of -munge-config.
[2] Run rebuild_zebra.pl to index records in one's test database
    and verify that there are no regressions.

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
Removing a really dangerous option

Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Passes all tests and QA script.
Ran rebuild_zebra.pl with various options and confirmed
that data was reindexed successfully.
No regressions found.

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
2013-12-26 15:24:41 +00:00
Galen Charlton
b25de3e7cf Bug 6435: (follow-up) make -daemon really imply -a and -b
This patch follows up on the previous patch by moving the
check for whether authority and/or biblio indexing have been
specified so that -daemon has a chance to set those modes.

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
2013-11-24 18:20:56 +00:00
Doug Kingston
00240d6970 Bug 6435: (follow-up) rebuild_zebra -daemon option now smarter
Based on feedback, make daemon mode imply -z -a -b and abort
on startup if flags incompatible with an incremental update daemon
are used.  Update documentation to match.

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
2013-11-24 18:15:23 +00:00
Doug Kingston
1b0992e8d5 Bug 6435: Add daemon mode to rebuild_zebra.pl
This change adds code to check the zebraqueue table with a cheap SQL query
and a daemon loop that checks for new entries and processes them incrementally
before sleeping for a controllable number of seconds.  The default is 5 seconds
which provides a near realtime search index update.  This is desirable particularly
for libraries that are doing active catalogue updating.  The query is adjusted
based on whether -a, -b, or -a -b are specified.

Help text updated.  Tested against a live 3.12 system.

Note that this fix will benefit from the fix to lack of locking (bug 11078)

Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
2013-11-24 18:12:21 +00:00
2eefd1f3a5 Bug 8745: General whitespace and tab tidy
http://bugs.koha-community.org/show_bug.cgi?id=8745
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
1) Runs not with root.
2) Runs with root and -run-as-root.
3) Runs using the normal koha user.

Note: Maybe the message should be clear about why
running as root is bad and which user you should
be running the script with?
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2013-04-21 09:41:34 -04:00
Barry Cannon
ef86a77801 Bug 8745 - Disallow rebuild_zebra.pl from executing, when run by root user.
Added a check to warn users of execution as root user.
Added a 'runas-root' switch to allow users to force execution as root user.

Signed-off-by: Mason James <mtj@kohaaloha.com>
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2013-04-21 09:41:34 -04:00
f9c8f39c02 Bug 9609: Rebuilding zebra reports double number of exported records.
Test plan:
Clear the zebra queue (run rebuild). Update one biblio.
Rebuild zebra (again) with -z. Check zebra log: note 2 exported records.
Now apply patch, and repeat: You will see 1 exported record.

Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Works as described.
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2013-04-02 08:41:40 -04:00
Galen Charlton
151e22070a bug 9496: improve error checking in rebuild_zebra.pl
When using rebuild_zebra to index all records, skip over
bibliographic or authority records that don't come out
as valid XML.  Also, strip extraneous XML declarations when
using --nosanitize.

Test plans
----------
Note that both plans assume that DOM indexing is turned on.

Test plan #1
============

[1] Run rebuild_zebra.pl with the -x -nosanitize options.  Without
    the patch, zebraidx should terminate early and complain
    about invalid XML.
[2] With the patch, the rebuild_zebra.pl should work without
    error.

Test plan #2
============
[1] Intentionally make a MARCXML record invalid, e.g, by running
    the following SQL:

    UPDATE bilbioitems SET marcxml = CONCATENATE(marcxml, 'junk')
    WHERE biblionumber = 123;

[2] Run rebuild_zebra.pl -b -x -r
[3] Without the patch, only part of the database will be indexed.
[4] With the patch, rebuild_zebra.pl will not export the bad
    record and will give an error message saying so, but will
    successfully index the rest of the records.

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
Signed-off-by: Larry Baerveldt <larry@bywatersolutions.com>
Signed-off-by: Mason James <mtj@kohaaloha.com>

Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2013-03-21 22:25:03 -04:00
4dcee58a4d Bug 7440 - Remove NoZebra vestiges
Removed NoZebra vestiges. This comprises several code blocks that depend on the NoZebra syspref and NZ related functions/methods.

C4::Biblio->
 GetNoZebraIndexes
 _DelBiblioNoZebra
 _AddBiblioNoZebra

C4::Search->
 NZgetRecords
 NZanalyse
 NZoperatorAND
 NZoperatorOR
 NZoperatorNOT
 NZorder

C4::Installer->
 set_indexing_engine

Sponsored-by: Universidad Nacional de Córdoba
Signed-off-by: Julian Maurice <julian.maurice@biblibre.com>

Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2013-03-19 21:17:04 -04:00
Jared Camins-Esakov
49cadcf7c1 Bug 9049: Don't use shadow with rebuild_zebra -r
Due to a limitation of Zebra, the register must be cleared *before*
doing shadow indexing if you want to reset the indexes. In light of
that, it does not make sense to do shadow indexing at all when
rebuild_zebra.pl is run with the -r switch. This patch makes -r (reset)
imply -n (no shadow).

To test:
1) Run `rebuild_zebra.pl -b -r -v -v -v`
2) Note that the script never runs the merge phase

Without the patch I see log lines refering to the shadow cache (enabling shadow spec=/home/koha/koha-dev/var/lib/zebradb/biblios/shadow:20G)
With the patch I don't see anything in the logs about shadow.  I do however see lines about merging.  I think it could just be a misunderstanding of the logs

Signed-off-by: wajasu <matted-34813@mypacks.net>
Signed-off-by: Elliott Davis <elliott@bywatersolutions.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2012-12-08 09:46:30 -05:00
Jared Camins-Esakov
deeeb068d9 Bug 9050: Use safer adelete when deleting records from Zebra index
Previously we used the "delete" command in zebraidx, which fails when
you try to delete a record that doesn't exist in the index. By changing
to the "adelete" command, we can reduce the likelihood of a failed
delete causing ghost records. A symptom of this problem is the warning
message occasionally encountered when indexing from the zebraqueue,
"[warn] cannot delete record above (seems new)."

To test:
1) Add a recordDelete action for a record that does not exist to
   zebraqueue in MySQL:
   INSERT INTO zebraqueue (biblio_auth_number, operation, server) \
       VALUES (999999999, 'recordDelete', 'biblioserver');
2) Run `rebuild_zebra.pl -b -z -v [-x]`.
3) Note that you do not get the message "[warn] cannot delete record
   above (seems new)".

Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>
Passed-QA-by: Paul Poulain <paul.poulain@biblibre.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
2012-11-12 18:53:49 -05:00
Jared Camins-Esakov
bc05b5d163 Bug 7417: Include see from references in bibliographic searches
This patch adds the Koha::Indexer::RecordNormalizer and
Koha::Indexer::MARC::RecordNormalizer::EmbedSeeFromHeadings packages
to enable the inclusion of alternate forms of headings in bibliographic
searches. When the new syspref IncludeSeeFromInSearches is turned on
(default is off) rebuild_zebra.pl will insert see from headings from
authority records into bibliographic records when indexing, so that a
search on an obsolete term will turn up relevant records.

To test:
1) Enable IncludeSeeFromInSearches
2) Add a heading that has an alternate form to a record (for example,
   "Cooking" has the alternate form "Cookery," if you have authority
   records from LC)
3) Index the zebraqueue (or reindex if you haven't indexed your system
   yet)
4) Confirm that if you search for "Cookery" you get the record you
   just modified

Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Rebased on master 5 August 2012
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Rebased on master 11 September 2012

Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>

Also checked:
- Verified database update works correctly
- Checked system preference and its description
- Checked staff/opac detail pages with feature on/off
- Checked staff/opac search facets
- Downloaded and tested records in various formats
- Tried different searches for 'see from' entries of authorities
- Ran all unit tests

No problems found.
2012-09-13 14:19:28 +02:00
Julian Maurice
57424a9fdc Bug 7286: rebuild_zebra_sliced for biblios and authorities
Complete rewrite of rebuild_zebra_sliced.zsh (renamed to .sh). Main
improvements are:
  - both biblio and authority records are handled
  - records are exported only once

It also add an option --skip-index to rebuild_zebra.pl that permit to
use rebuild_zebra.pl as an 'export only' script.

Description:
Index Koha records by chunks. It is useful when some record causes
errors and stop the indexation process. With this script, if indexation
of one chunk fails, chunk is splitted in 2 (or 3) chunks, and
indexation continue on these chunks.
rebuild_zebra.pl is called only once to export records.
Splitting and indexing is handled by this script (using yaz-marcdump and
zebraidx).

Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-07-06 15:06:40 +02:00
christophe croullebois
082bb5049d Bug 8136 Changes the expected lenght of 100$a in rebuild_zebra.pl
In rebuild_zebra.pl, if we are in "unimarc" ("marcflavour" syspref), the sub "fix_unimarc_100" is called and checks if 100$a lenght is equal to 35.
If it is not the case, the sub inserts the localtime and more, so we loose the datas in reindexing.
The standart lenght is 36.
I have just changed 35 to 36.

Signed-off-by: Sophie Meynieux <sophie.meynieux@biblibre.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-06-20 09:39:27 +02:00
Galen Charlton
daca5edc52 Bug 7818: -x option of rebuild_zebra.pl now works with DOM filter
One consequence is that the -x and -a options are no longer
mutually exclusive.

Also, because of the way that the GRS-1 SGML filter works, if you're
indexing multiple documents, you can't just wrap them in a document
element, but the DOM filter *requires* it.  Consequently, two
new config settings in koha-conf.xml are added to indicate the
Zebra filter in use so that the -x option of rebuild_zebra.pl
knows whether to wrap the exported records or not:

- bib_index_mode (defaults to 'grs1' if not specified)
- auth_index_mode (defaults to 'dom')

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-06-09 11:44:09 +02:00
Paul Poulain
1fd8c8a4de Bug 7246 add offset/length and where options to rebuild_zebra
This patch reimplement a feature that is on biblibre/master for Koha-community/master

It adds 4 parameters:
* offset = the offset of record. Say 1000 to start rebuilding at the 1000th record of your database
* length = how many records to export. Say 400 to export only 400 records
* where = add a where clause to rebuild only a given itemtype, or anything you want to filter on

Another improvement resulting from offset & length limit is the rebuild_zebra_sliced.zsh
that will be submitted in another patch.
rebuild_zebra_sliced will slice your all database in small chunks, and, if something went wrong for a given slice, will slice the slice, and repeat, until you reach a slice size of 1, showing which record is wrong in your database.

Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Removed mention of -l option for limiting number of items exported, as requested
by QA manager. This can be re-added in a later patch.

Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-02-17 10:59:23 +01:00
Colin Campbell
263dded818 Bug 6752: Be stricter with utf-8 encoding of output
use encoding(UTF-8) rather than utf-8 for stricter
encoding
Marking output as ':utf8' only flags the data as utf8
using :encoding(UTF-8) also checks it as valid utf-8
see binmode in perlfunc for more details
In accordance with the robustness principle input
filehandles have not been changed as code may make
the undocumented assumption that invalid utf-8 is present
in the imput
Fixes errors reported by t/00-testcritic.t
Where feasable some filehandles have been made lexical rather than
reusing global filehandle vars

Signed-off-by: Jonathan Druart <jonathan.druart@biblibre.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-01-27 12:11:06 +01:00
Dobrica Pavlinusic
90d68d6f5c Bug 7247 - rebuild_zebra.pl -v should show all Zebra log output
Currently, -v option resets Zebra log output to default system values.

This produce amount of log specified in system defaults which is usually
too low for debugging.

This change explicitly forces all Zebra log output which create much more
chatter so it triggers with verbosity level 2

Test scenario:
1. pick koha site to reindex
2. use -v -v options to rebuild_zebra.pl to see additional output

Signed-off-by: Liz Rea <wizzyrea@gmail.com>
Verified help corrections and  loglevel 2 output vs. loglevel 1 output. No issues found.

Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2012-01-17 17:31:25 +01:00
Robin Sheat
849547df68 Bug 7008 - create tmp dir for zebra
Sometimes zebra needs a tmp dir in order to work. This ensures that it
is created both by koha-create-dirs in the packages, and by
rebuild_zebra when it runs.
--

tested ok, signing off
Signed-off-by: Mason James <mtj@kohaaloha.com>
2011-12-03 07:56:44 +01:00
4ce57a102b Bug 6799 rebuild_zebra.pl -x produces invalid XML records
This patch allow to handle properly items containing extended characters and
send valid XML records to zebraidx

Signed-off-by: Julian Maurice <julian.maurice@biblibre.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
2011-11-18 23:29:08 +01:00
Ian Walls
4e95e94727 Bug 6789: biblios with many items can result in broken search results link
This patch fixes an issue whereby biblios with many items (often > 500) would index,
but not the biblionumber itself, resulting in search results with a) inaccurate item counts
and b) no biblionumber to use in the link to the details page.  This is due to Net::Z3950::ZOOM  not providing
a mechanism for specifying different connection attributes; the maximumRecordSize ZOOM connection attribute,
if not specified, defaults to 1MB, which is less than the size of a MARC record with many, many 952 fields.  Since
it is unlikely we can fix Net::Z3950::ZOOM in a timely fashion, this patch aims to build a workaround on the Koha end.

This patch changes EmbedItemsInMarcBiblio to use append_fields instead of insert_ordered_fields,
so the 999$c will come before the item records.  It's VERY unlikely we will encounter more than 1MB of biblio-level MARC
content, as this would break the ISO-2709 standard by a large factor.

To this end, it also moves the fix_biblio_ids portion of get_corrected_marc_record out of rebuild_zebra.pl,
and makes it a part of GetMarcBiblio (right before EmbedItemsInMarcBiblio, so the 952s still come last).  fix_biblio_ids
is kept as a subroutine for the deletion portion of rebuild_zebra.pl, which still uses it.

It also uses the subroutine parameter in GetMarcBiblio to do the EmbedItemsInMarcBiblio action, rather than having
rebuild_zebra.pl perform it on the itemless record returned from GetMarcBiblio.  Simpler and cleaner that way.

To verify bug issue:
1. Find a biblio with over 700 items (or enough that the resulting MARCXML is greater than 1MB)
2. search for this biblio (in a search that would return multiple results, not just this title).  You should get the title in
the results list
3. attempt to click the link to this biblio's details page; the biblionumber should be blank, leading to a 404

To test solution:
1. Apply patch
2. modify the biblio slightly (click the 005 for example) and save
   OR manually add the biblio to zebraqueue for reindexing
3. after rebuild_zebra.pl -z -b -x runs, use the same search as above. The title should still appear.
4. click the link, and find yourself on the biblio detail page as desired

Signed-off-by: D Ruth Bavousett <ruth@bywatersolutions.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
2011-10-15 13:47:24 +13:00
Jesse Weaver
048c0dc04e Bug 6492 - Deleted biblios cause rebuild_zebra to fail
This both adds a bit of a failsafe to get_raw_biblio, and prevents
records that have been deleted from being updated by the same instance
of rebuild_zebra.

Minor amendment to remove duplication of 6433

Signed-off-by: MJ Ray <mjr@phonecoop.coop>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
2011-07-05 11:18:28 +12:00
3b8f1318e0 Bug 6050 Followup, edit a last function call
Signed-off-by: Frédéric Demians <f.demians@tamil.fr>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
2011-06-14 14:12:05 +12:00
Srdjan Janković
5829cef6d8 bug_6433: exception handling
Signed-off-by: Magnus Enger <magnus@enger.priv.no>
2011-06-10 11:27:25 +12:00
e96315556b bug 5579: new routine to embed items in bib
Adds a new routine, C4::Biblio::EmbedItemsInMarcBiblio, to
embed the items in the bib record when necessary:

* cataloging/additem.pl
* rebuild_zebra.pl

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
Signed-off-by: Claire Hernandez <claire.hernandez@biblibre.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
2011-04-19 22:34:21 +12:00
Henri-Damien LAURENT
3584c4426b Bug 5579: remove items from MARC bib
This is a squash of four patches by Henri-Damien Laurent
starting work on removing the copy of item record information
in the 9XX field of bibliographic records.  The reason
for doing this is primarily to improve performance, in particular,
the expense of having to add/modify the bib record whenever an
item changes.  Now, whenever an item changes, the bib record is
put in the queue to be reindexed; when the bib is indexed, the 9XX
fields are inserted into the version of the bib that Zebra indexes.
Since rebuild_zebra.pl runs in a separate process, the processing of the
bib record will not delay (e.g.) circulation.

As part of upgrading to 3.4, the following batch script should be run:

misc/maintenance/remove_items_from_biblioitems.pl --run

This should be followed by a complete reindexing of the bib records, e.g.,

misc/migration_tools/rebuild_zebra.pl -b -r

Signed-off-by: Galen Charlton <gmcharlt@gmail.com>
Signed-off-by: Claire Hernandez <claire.hernandez@biblibre.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
2011-04-19 22:33:56 +12:00
Ian Walls
8dc56a0d2c Bug 5831: rebuild_zebra.pl doesn't respect -r
Reimplements support for -r, as well for -reset

Signed-off-by: D Ruth Bavousett <ruth@bywatersolutions.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
2011-03-06 08:44:57 +13:00
Robin Sheat
8de1ef7e94 Bug 5228 - make rebuild_zebra handle fixing the zebra dirs
If the zebra server directories don't exist, zebra will spit the dummy.
This makes rebuild_zebra.pl smart enough to create them if they're not
there. If that fails, it'll scream loudly so you know zebra isn't
reindexing.

Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
2010-12-13 21:59:49 +13:00
Robin Sheat
57d11aee2c Bug 5077 - ensure rebuild_zebra will run somewhere it can read
This prevents it leaving files lying around in /tmp

Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
Signed-off-by: Galen Charlton <gmcharlt@gmail.com>
2010-10-06 08:00:17 -04:00
Donovan Jones
5e0b850d49 Bug 2505 - Add commented use warnings where missing in the misc/ directory 2010-04-21 20:26:44 +12:00
459d732180 Bug 3301 - Speed up rebuild_zebra script
With this patch, rebuild_zebra can re-index a whole Koha DB
quickly:

  rebuild_zebra -r -b -nosanitize

Biblio (authority) records are dump directly in a file
from marcxml field without beeing transformed into
MARC::Record object and corrected.

DOCUMENTATION:

rebuild_zebra.pl new paramater:

-nosanitize  export biblio/authority records directly from DB marcxml
             field without sanitizing records. It speed up
             dump process but could fail if DB contains badly
             encoded records. Works now only with -x and -b

Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
2009-06-29 07:52:46 -05:00
Brian Harrington
25cd35b3a1 bug 2924 fixed rebuild_zebra.pl to work when export is skipped
reindexing now occurs if there are $num_records_exported or if
$skip_export is set

Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
2009-03-04 08:28:22 -06:00
Michael Hafen
086b3ccf9a bug in rebuild_zebra verbose logging - found another print I didn't want to see all the time
Add the phrase 'if ( $verbose_logging )' to the two print statements
concerning the skipping of biblio or authority records.

I recently had to split biblio and authority index updating in my cron
script ( had some really big records so had to add the -x switch which
should only be used on biblios accourding to the help ).  So I noticed
that rebuild_zebra.pl printed messages that it was skipping biblios or
authorities.

This patch is to conditionalize those prints based on the verbose
logging switch.

Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
2008-12-11 09:23:28 -06:00
Michael Hafen
62a590a954 Reduce logging from rebuild_zebra.pl with a command line option
This reduces the output of the script and zebraidx, and creates a -v
command line switch which will increase the logging to their former
states.

Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
2008-10-01 13:05:20 -05:00
Galen Charlton
df1f46f9da bug 2253: improve rebuild_zebra's handling of zebraqueue
Prior to this patch, rebuild_zebra.pl -z was effectively
hanging on to a lock on the zebraqueue table, preventing
other scripts from inserting new entries into the table.
This had the effect of causing circulation operations
to time out.

Refactored by having rebuld_zebra.pl pull the active
queue into memory, then mark entries done by zebraqueue.id.
Consequently, rebuild_zebra.pl should no longer
block adding new entries into zebraqueue.

Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-06-19 09:49:06 -05:00
Galen Charlton
3109d5820e rebuild_zebra.pl - add -y option
rebuild_zebra.pl will now mark all zebraqueue entries
of the affected record type(s) done when run in
normal mode to index all records (as opposed to running
it with -z to just process the zebraqueue).  This prevents
any running zebraqueue_daemon processes from attempting
to reindex the same records, redundantly.

The new -y swtich overrides this new behavior; in other words, if
running rebuild_zebra.pl without -z, you can specify
-y to *not* mark zebraqueue done.

Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-04-21 11:17:29 -05:00
Galen Charlton
e2c1f11715 fixed memory leak I introduced
Accidentally introducing a circular reference in a
MARC::Record object does not lead to goodness, particularly
if you export lots and lots of them.

Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-04-01 06:46:05 -05:00
Galen Charlton
4f001186b6 still more rebuild_zebra refactoring
Merged duplicate code for indexing bibs and
authorities into a single index_records() function.

Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-03-25 07:58:03 -05:00
Galen Charlton
a5576b8dfe IMPORTANT: added -z option to rebuild_zebra.pl
The -z option, when used in conjunction with -a and/or -b,
selects the records to reindex from the zebraqueue table.
Both record updates and record deletes are handled.

-z is cannot be used with -s or -r: the updated records
must always be freshly exported, and if zebraqueue
is to be processed, it's assumed that you don't want
to drop the Zebra index first.

This means that rebuild_zebra.pl -b -a -x can be
used as a cronjob to update the indexes periodically; it
is believed that this will offer much better indexing
performance on some setups as compared to zebraqueue_daemon.pl,
which uses Z39.50 extended services to send record updates
to Zebra.

Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-03-25 07:58:01 -05:00
Galen Charlton
57d128f727 rebuild_zebra: exit if both -a and -x specified
At moment using both -a (index authorities) and
-x (export records as MARC XML) is not allowed -
if the Zebra authority database is using the DOM
filter, zebraidx will not be able to process the
exported records correctly.

Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-03-25 07:57:44 -05:00
Galen Charlton
f0d5da7448 more rebuild_zebra.pl refactoring
1. Logic to fix up record IDs, UNIMARC 100 field,
   and record leader now in separate functions.
2. Removed (incorrect) logic to save corrected record
   in database.

Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-03-25 07:57:43 -05:00
Galen Charlton
f98c27a8bc refactor rebuild_zebra: new routine for invoking zebraidx
Created a routine for calling zebraidx, replacing
separate invocations for bibs and authorities.

Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-03-25 07:57:42 -05:00
Galen Charlton
ae8a76dacc rebuild_zebra.pl: removed disused $limit option
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-03-25 07:57:41 -05:00
Ryan Higgins
71dd69d5ac add option to export and index xml to rebuild_zebra
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-02-15 08:25:46 -06:00
Paul POULAIN
319a32b16e rebuild_zebra : directories updated
the unimarc stuff has been moved to marc_defs directory and the
lang specific is in lang_defs

Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-01-03 00:55:12 -06:00
Joshua Ferraro
c6ddddad98 adding a new option, -w, which disables shadow indexing for the current batch (faster indexing of large sets where ACID isn't critical)
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-12-30 12:13:27 -06:00
Galen Charlton
4609608ccc allow use of older version of File::Temp
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-12-22 22:58:12 -06:00
Galen Charlton
93beb943c0 bug 1661: rebuild_zebra.pl changes
[1] Use File::Temp to create and manage
    export directory if -d is not specified.
[2] Added usage message.
[3] Code that attempts to fix up Zebra
    configuration files changed so that it
    is invoked only if --munge-config option
    is supplied; this code will ultimately
    either be removed or moved to a separate
    script -- the sorts of errors that it
    tries to fix should no longer be appearing
    in a standard install.
[4] Fixed Win32 portability problem when removing
    temporary directory.

Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-12-20 19:19:43 -06:00
Paul POULAIN
262a6e2a9a Updating rebuild_zebra.pl : now uses etc config files
There are only 2 UNIMARC specific files (.abs and .chr), they have been moved to etc/zebradb

The rebuild_zebra.pl takes all config file from this location now.
the misc/zebra/ can be removed (and will be soon)

Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-11-25 17:07:46 -06:00
Paul POULAIN
f38b7598fc still handling better dirty MARC records
this time it's when a biblio don't have biblionumber, has a 100$a field, and it's invalid.

1 biblio in my 300 000 DB (and it was biblio 294 359, of course !)

Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-11-20 16:20:50 -06:00