The -munge-config switch has been deprecated for years, and
trying to use it would either not work at all or, if it did "work",
almost certainly damage one's Zebra configuration for Koha.
This patch removes this switch.
To test:
[1] Run rebuild_zebra.pl and verify that no mention is made
of -munge-config.
[2] Run rebuild_zebra.pl to index records in one's test database
and verify that there are no regressions.
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
Removing a really dangerous option
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Passes all tests and QA script.
Ran rebuild_zebra.pl with various options and confirmed
that data was reindexed successfully.
No regressions found.
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
This patch follows up on the previous patch by moving the
check for whether authority and/or biblio indexing have been
specified so that -daemon has a chance to set those modes.
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
Based on feedback, make daemon mode imply -z -a -b and abort
on startup if flags incompatible with an incremental update daemon
are used. Update documentation to match.
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
This change adds code that checks the zebraqueue table with a cheap SQL query
and a daemon loop that checks for new entries and processes them incrementally
before sleeping for a controllable number of seconds. The default is 5 seconds,
which provides near-realtime search index updates. This is particularly
desirable for libraries doing active catalogue updating. The query is adjusted
depending on whether -a, -b, or both are specified.
Help text updated. Tested against a live 3.12 system.
Note that this fix will benefit from the fix to lack of locking (bug 11078)
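A minimal sketch of that loop, assuming a DBI handle $dbh and a hypothetical
process_zebraqueue() helper (names are illustrative, not the patch's actual code):

    my $sleep_seconds = 5;    # default polling interval described above
    while (1) {
        # cheap check: count pending zebraqueue entries for the chosen server
        my ($pending) = $dbh->selectrow_array(
            q{SELECT COUNT(*) FROM zebraqueue WHERE done = 0 AND server = ?},
            undef, 'biblioserver'
        );
        process_zebraqueue() if $pending;    # incremental update pass
        sleep $sleep_seconds;
    }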
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
http://bugs.koha-community.org/show_bug.cgi?id=8745
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
1) Does not run as root.
2) Runs as root when -run-as-root is given.
3) Runs as the normal koha user.
Note: Maybe the message should be clearer about why
running as root is bad and which user the script should
be run as?
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Added a check to warn users about execution as the root user.
Added a 'run-as-root' switch to allow users to force execution as the root user.
Signed-off-by: Mason James <mtj@kohaaloha.com>
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Test plan:
Clear the zebra queue (run rebuild). Update one biblio.
Rebuild zebra (again) with -z. Check zebra log: note 2 exported records.
Now apply patch, and repeat: You will see 1 exported record.
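A rough sketch of the deduplication this implies, assuming the pending queue
rows are already in @entries (purely illustrative):

    my %seen;
    my @to_export = grep { !$seen{ $_->{biblio_auth_number} }++ } @entries;
    # each queued record is now exported once, however often it was queued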
Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Works as described.
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
When using rebuild_zebra to index all records, skip over
bibliographic or authority records that don't come out
as valid XML. Also, strip extraneous XML declarations when
using --nosanitize.
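A sketch of the kind of guard this adds, assuming XML::LibXML is available and
$marcxml holds one exported record (variable names hypothetical):

    use XML::LibXML;
    eval { XML::LibXML->load_xml( string => $marcxml ) };
    if ($@) {
        warn "skipping record $record_number: invalid XML: $@";
        next;    # assumes we are in the export loop; zebraidx never sees it
    }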
Test plans
----------
Note that both plans assume that DOM indexing is turned on.
Test plan #1
============
[1] Run rebuild_zebra.pl with the -x -nosanitize options. Without
the patch, zebraidx should terminate early and complain
about invalid XML.
[2] With the patch, the rebuild_zebra.pl should work without
error.
Test plan #2
============
[1] Intentionally make a MARCXML record invalid, e.g., by running
the following SQL:
UPDATE biblioitems SET marcxml = CONCAT(marcxml, 'junk')
WHERE biblionumber = 123;
[2] Run rebuild_zebra.pl -b -x -r
[3] Without the patch, only part of the database will be indexed.
[4] With the patch, rebuild_zebra.pl will not export the bad
record and will give an error message saying so, but will
successfully index the rest of the records.
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
Signed-off-by: Larry Baerveldt <larry@bywatersolutions.com>
Signed-off-by: Mason James <mtj@kohaaloha.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Removed NoZebra vestiges. This comprises several code blocks that depend on the NoZebra syspref and NZ-related functions/methods.
C4::Biblio->
GetNoZebraIndexes
_DelBiblioNoZebra
_AddBiblioNoZebra
C4::Search->
NZgetRecords
NZanalyse
NZoperatorAND
NZoperatorOR
NZoperatorNOT
NZorder
C4::Installer->
set_indexing_engine
Sponsored-by: Universidad Nacional de Córdoba
Signed-off-by: Julian Maurice <julian.maurice@biblibre.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Due to a limitation of Zebra, the register must be cleared *before*
doing shadow indexing if you want to reset the indexes. In light of
that, it does not make sense to do shadow indexing at all when
rebuild_zebra.pl is run with the -r switch. This patch makes -r (reset)
imply -n (no shadow).
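In code terms the change amounts to something like this one-liner ($reset and
$noshadow are illustrative names):

    # the register is wiped first, so shadow indexing would be pointless
    $noshadow = 1 if $reset;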
To test:
1) Run `rebuild_zebra.pl -b -r -v -v -v`
2) Note that the script never runs the merge phase
Without the patch I see log lines referring to the shadow cache (enabling shadow spec=/home/koha/koha-dev/var/lib/zebradb/biblios/shadow:20G).
With the patch I don't see anything in the logs about shadow. I do, however, see lines about merging. I think it could just be a misunderstanding of the logs.
Signed-off-by: wajasu <matted-34813@mypacks.net>
Signed-off-by: Elliott Davis <elliott@bywatersolutions.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Previously we used the "delete" command in zebraidx, which fails when
you try to delete a record that doesn't exist in the index. By changing
to the "adelete" command, we can reduce the likelihood of a failed
delete causing ghost records. A symptom of this problem is the warning
message occasionally encountered when indexing from the zebraqueue,
"[warn] cannot delete record above (seems new)."
To test:
1) Add a recordDelete action for a record that does not exist to
zebraqueue in MySQL:
INSERT INTO zebraqueue (biblio_auth_number, operation, server) \
VALUES (999999999, 'recordDelete', 'biblioserver');
2) Run `rebuild_zebra.pl -b -z -v [-x]`.
3) Note that you do not get the message "[warn] cannot delete record
above (seems new)".
Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>
Passed-QA-by: Paul Poulain <paul.poulain@biblibre.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
This patch adds the Koha::Indexer::RecordNormalizer and
Koha::Indexer::MARC::RecordNormalizer::EmbedSeeFromHeadings packages
to enable the inclusion of alternate forms of headings in bibliographic
searches. When the new syspref IncludeSeeFromInSearches is turned on
(default is off) rebuild_zebra.pl will insert see from headings from
authority records into bibliographic records when indexing, so that a
search on an obsolete term will turn up relevant records.
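Roughly what inserting see-from headings means at the MARC level; a sketch with
MARC::Record, assuming the authority's 4xx tracings map onto the bib's 6xx tags
($bib and $authority are hypothetical MARC::Record objects, and the real tag
mapping is more involved):

    use MARC::Field;
    for my $seefrom ( $authority->field('4..') ) {
        ( my $tag = $seefrom->tag ) =~ s/^4/6/;   # e.g. 450 'Cookery' -> extra 650
        my @pairs = map {@$_} $seefrom->subfields; # flatten [code, value] pairs
        $bib->append_fields( MARC::Field->new( $tag, ' ', ' ', @pairs ) );
    }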
To test:
1) Enable IncludeSeeFromInSearches
2) Add a heading that has an alternate form to a record (for example,
"Cooking" has the alternate form "Cookery," if you have authority
records from LC)
3) Index the zebraqueue (or reindex if you haven't indexed your system
yet)
4) Confirm that if you search for "Cookery" you get the record you
just modified
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Rebased on master 5 August 2012
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Rebased on master 11 September 2012
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Also checked:
- Verified database update works correctly
- Checked system preference and its description
- Checked staff/opac detail pages with feature on/off
- Checked staff/opac search facets
- Downloaded and tested records in various formats
- Tried different searches for 'see from' entries of authorities
- Ran all unit tests
No problems found.
Complete rewrite of rebuild_zebra_sliced.zsh (renamed to .sh). Main
improvements are:
- both biblio and authority records are handled
- records are exported only once
It also adds an option, --skip-index, to rebuild_zebra.pl that permits
using rebuild_zebra.pl as an 'export only' script.
Description:
Index Koha records by chunks. This is useful when some record causes
errors and stops the indexing process. With this script, if indexing
of one chunk fails, the chunk is split into 2 (or 3) smaller chunks, and
indexing continues on those chunks.
rebuild_zebra.pl is called only once to export records.
Splitting and indexing is handled by this script (using yaz-marcdump and
zebraidx).
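The split-and-retry idea in miniature; a hypothetical sketch in which
index_ok() stands in for a zebraidx run on a chunk (the real script splits
files with yaz-marcdump):

    sub index_slice {
        my @records = @_;
        return if index_ok(@records);    # hypothetical zebraidx wrapper
        if ( @records == 1 ) {
            warn "bad record: $records[0]\n";   # slice of 1: found the culprit
            return;
        }
        my $mid = int( @records / 2 );
        index_slice( @records[ 0 .. $mid - 1 ] );
        index_slice( @records[ $mid .. $#records ] );
    }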
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
In rebuild_zebra.pl, if we are in "unimarc" ("marcflavour" syspref), the sub "fix_unimarc_100" is called and checks whether the length of 100$a is equal to 35.
If it is not, the sub inserts the localtime and more, so we lose the data on reindexing.
The standard length is 36.
I have just changed 35 to 36.
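The whole fix in sketch form; in UNIMARC, 100$a is a fixed-length coded string
occupying positions 0-35, i.e. 36 characters:

    my $f100a = $record->subfield( '100', 'a' );
    if ( !defined $f100a || length($f100a) != 36 ) {   # the check wrongly used 35
        # rebuild the coded field (localtime etc.) only for genuinely bad data
    }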
Signed-off-by: Sophie Meynieux <sophie.meynieux@biblibre.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
One consequence is that the -x and -a options are no longer
mutually exclusive.
Also, because of the way that the GRS-1 SGML filter works, when
indexing multiple documents you cannot just wrap them in a document
element, while the DOM filter *requires* exactly that. Consequently, two
new config settings in koha-conf.xml are added to indicate the
Zebra filter in use so that the -x option of rebuild_zebra.pl
knows whether to wrap the exported records or not:
- bib_index_mode (defaults to 'grs1' if not specified)
- auth_index_mode (defaults to 'dom')
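A sketch of how the export side can consult the new settings via Koha's
C4::Context config accessor (the wrapping decision shown is illustrative):

    use C4::Context;
    my $bib_index_mode = C4::Context->config('bib_index_mode') || 'grs1';
    # DOM needs one enclosing element around the batch; GRS-1 must not have one
    my $wrap_collection = ( $bib_index_mode eq 'dom' );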
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
This patch reimplements a feature that is on biblibre/master for Koha-community/master.
It adds the following parameters:
* offset = the offset of records. Say 1000 to start rebuilding at the 1000th record of your database
* length = how many records to export. Say 400 to export only 400 records
* where = add a WHERE clause to rebuild only a given itemtype, or anything else you want to filter on
Another improvement resulting from the offset & length limits is the rebuild_zebra_sliced.zsh script
that will be submitted in another patch.
rebuild_zebra_sliced will slice your whole database into small chunks and, if something goes wrong for a given slice, will slice the slice and repeat, until it reaches a slice size of 1, showing which record in your database is broken.
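In SQL-building terms the three parameters amount to something like this sketch
(table and variable names illustrative):

    my $sql = "SELECT biblionumber FROM biblioitems";
    $sql .= " WHERE $where"           if $where;   # e.g. filter on itemtype
    $sql .= " ORDER BY biblionumber";
    $sql .= " LIMIT $offset, $length" if $length;  # e.g. LIMIT 1000, 400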
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Removed mention of -l option for limiting number of items exported, as requested
by QA manager. This can be re-added in a later patch.
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
Use :encoding(UTF-8) rather than :utf8 for stricter
encoding.
Marking output as ':utf8' only flags the data as UTF-8;
using :encoding(UTF-8) also checks that it is valid UTF-8.
See binmode in perlfunc for more details.
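The difference in one place, for a hypothetical output handle:

    open my $fh, '>', $file or die "cannot open $file: $!";
    # old: binmode $fh, ':utf8';       flags the stream as UTF-8, no validation
    binmode $fh, ':encoding(UTF-8)';   # new: also checks the data is valid UTF-8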
In accordance with the robustness principle, input
filehandles have not been changed, as code may make
the undocumented assumption that invalid UTF-8 is present
in the input.
Fixes errors reported by t/00-testcritic.t.
Where feasible, some filehandles have been made lexical rather than
reusing global filehandle vars.
Signed-off-by: Jonathan Druart <jonathan.druart@biblibre.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
Currently, the -v option resets Zebra log output to the default system values.
This produces the amount of logging specified by the system defaults, which is
usually too low for debugging.
This change explicitly forces all Zebra log output, which creates much more
chatter, so it only triggers at verbosity level 2.
Test scenario:
1. pick koha site to reindex
2. use -v -v options to rebuild_zebra.pl to see additional output
Signed-off-by: Liz Rea <wizzyrea@gmail.com>
Verified help corrections and loglevel 2 output vs. loglevel 1 output. No issues found.
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
Sometimes zebra needs a tmp dir in order to work. This ensures that it
is created both by koha-create-dirs in the packages, and by
rebuild_zebra when it runs.
--
tested ok, signing off
Signed-off-by: Mason James <mtj@kohaaloha.com>
This patch allows items containing extended characters to be handled properly and
sends valid XML records to zebraidx.
Signed-off-by: Julian Maurice <julian.maurice@biblibre.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
This patch fixes an issue whereby biblios with many items (often > 500) would index,
but not the biblionumber itself, resulting in search results with a) inaccurate item counts
and b) no biblionumber to use in the link to the details page. This is due to Net::Z3950::ZOOM not providing
a mechanism for specifying different connection attributes; the maximumRecordSize ZOOM connection attribute,
if not specified, defaults to 1MB, which is less than the size of a MARC record with many, many 952 fields. Since
it is unlikely we can fix Net::Z3950::ZOOM in a timely fashion, this patch aims to build a workaround on the Koha end.
This patch changes EmbedItemsInMarcBiblio to use append_fields instead of insert_fields_ordered,
so the 999$c will come before the item records. It's VERY unlikely we will encounter more than 1MB of biblio-level MARC
content, as this would break the ISO-2709 standard by a large factor.
To this end, it also moves the fix_biblio_ids portion of get_corrected_marc_record out of rebuild_zebra.pl,
and makes it a part of GetMarcBiblio (right before EmbedItemsInMarcBiblio, so the 952s still come last). fix_biblio_ids
is kept as a subroutine for the deletion portion of rebuild_zebra.pl, which still uses it.
It also uses the subroutine parameter in GetMarcBiblio to do the EmbedItemsInMarcBiblio action, rather than having
rebuild_zebra.pl perform it on the itemless record returned from GetMarcBiblio. Simpler and cleaner that way.
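The ordering point in miniature, with MARC::Record ($bib and @item_fields are
hypothetical):

    # insert_fields_ordered would file the 952s in tag order among the other
    # fields; append_fields keeps them after everything else, so the 999$c
    # carrying the biblionumber stays within the 1MB the client can read.
    $bib->append_fields(@item_fields);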
To verify the bug:
1. Find a biblio with over 700 items (or enough that the resulting MARCXML is greater than 1MB)
2. Search for this biblio (in a search that would return multiple results, not just this title). You should get the title in
the results list
3. Attempt to click the link to this biblio's details page; the biblionumber should be blank, leading to a 404
To test the solution:
1. Apply the patch
2. Modify the biblio slightly (click the 005, for example) and save,
OR manually add the biblio to the zebraqueue for reindexing
3. After rebuild_zebra.pl -z -b -x runs, use the same search as above. The title should still appear.
4. Click the link and find yourself on the biblio detail page, as desired
Signed-off-by: D Ruth Bavousett <ruth@bywatersolutions.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
This both adds a bit of a failsafe to get_raw_biblio, and prevents
records that have been deleted from being updated by the same instance
of rebuild_zebra.
Minor amendment to remove duplication of 6433
Signed-off-by: MJ Ray <mjr@phonecoop.coop>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
Adds a new routine, C4::Biblio::EmbedItemsInMarcBiblio, to
embed the items in the bib record when necessary:
* cataloging/additem.pl
* rebuild_zebra.pl
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
Signed-off-by: Claire Hernandez <claire.hernandez@biblibre.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
This is a squash of four patches by Henri-Damien Laurent
starting work on removing the copy of item record information
in the 9XX field of bibliographic records. The reason
for doing this is primarily to improve performance, in particular,
the expense of having to add/modify the bib record whenever an
item changes. Now, whenever an item changes, the bib record is
put in the queue to be reindexed; when the bib is indexed, the 9XX
fields are inserted into the version of the bib that Zebra indexes.
Since rebuild_zebra.pl runs in a separate process, the processing of the
bib record will not delay (e.g.) circulation.
As part of upgrading to 3.4, the following batch script should be run:
misc/maintenance/remove_items_from_biblioitems.pl --run
This should be followed by a complete reindexing of the bib records, e.g.,
misc/migration_tools/rebuild_zebra.pl -b -r
Signed-off-by: Galen Charlton <gmcharlt@gmail.com>
Signed-off-by: Claire Hernandez <claire.hernandez@biblibre.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
Reimplements support for -r, as well for -reset
Signed-off-by: D Ruth Bavousett <ruth@bywatersolutions.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
If the zebra server directories don't exist, zebra will spit the dummy.
This makes rebuild_zebra.pl smart enough to create them if they're not
there. If that fails, it'll scream loudly so you know zebra isn't
reindexing.
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
This prevents it leaving files lying around in /tmp
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
Signed-off-by: Galen Charlton <gmcharlt@gmail.com>
With this patch, rebuild_zebra can re-index a whole Koha DB
quickly:
rebuild_zebra -r -b -nosanitize
Biblio (and authority) records are dumped directly to a file
from the marcxml field without being transformed into
MARC::Record objects and corrected.
DOCUMENTATION:
rebuild_zebra.pl new parameter:
-nosanitize  export biblio/authority records directly from the DB marcxml
field without sanitizing the records. This speeds up the
dump process but could fail if the DB contains badly
encoded records. Currently works only with -x and -b.
Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
Add the phrase 'if ( $verbose_logging )' to the two print statements
concerning the skipping of biblio or authority records.
I recently had to split biblio and authority index updating in my cron
script (I had some really big records, so I had to add the -x switch, which
should only be used on biblios according to the help). So I noticed
that rebuild_zebra.pl printed messages that it was skipping biblios or
authorities.
This patch conditionalizes those prints based on the verbose
logging switch.
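The change boils down to guards of this form (message text illustrative):

    print "Skipping biblios\n" if $verbose_logging;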
Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
This reduces the output of the script and zebraidx, and creates a -v
command line switch which will increase the logging to their former
states.
Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
Prior to this patch, rebuild_zebra.pl -z was effectively
hanging on to a lock on the zebraqueue table, preventing
other scripts from inserting new entries into the table.
This had the effect of causing circulation operations
to time out.
Refactored by having rebuild_zebra.pl pull the active
queue into memory, then mark entries done by zebraqueue.id.
Consequently, rebuild_zebra.pl should no longer
block adding new entries into zebraqueue.
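A sketch of the refactor, assuming a DBI handle $dbh; the two-phase shape is
the point, not the exact SQL:

    # phase 1: pull the active queue into memory, releasing the table quickly
    my $entries = $dbh->selectall_arrayref(
        q{SELECT id, biblio_auth_number FROM zebraqueue WHERE done = 0},
        { Slice => {} }
    );
    # phase 2: after indexing, mark each processed entry done by its id
    my $mark = $dbh->prepare(q{UPDATE zebraqueue SET done = 1 WHERE id = ?});
    $mark->execute( $_->{id} ) for @$entries;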
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
rebuild_zebra.pl will now mark all zebraqueue entries
of the affected record type(s) done when run in
normal mode to index all records (as opposed to running
it with -z to just process the zebraqueue). This prevents
any running zebraqueue_daemon processes from attempting
to reindex the same records, redundantly.
The new -y switch overrides this new behavior; in other words, if
running rebuild_zebra.pl without -z, you can specify
-y to *not* mark the zebraqueue done.
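The mark-everything-done step described above is essentially (server value
illustrative):

    unless ($keep_zebraqueue) {    # i.e. unless -y was given
        $dbh->do(
            q{UPDATE zebraqueue SET done = 1 WHERE server = ? AND done = 0},
            undef, 'biblioserver'
        );
    }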
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
Accidentally introducing a circular reference in a
MARC::Record object does not lead to goodness, particularly
if you export lots and lots of them.
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
The -z option, when used in conjunction with -a and/or -b,
selects the records to reindex from the zebraqueue table.
Both record updates and record deletes are handled.
-z cannot be used with -s or -r: the updated records
must always be freshly exported, and if the zebraqueue
is to be processed, it's assumed that you don't want
to drop the Zebra index first.
This means that rebuild_zebra.pl -b -a -z can be
used as a cronjob to update the indexes periodically; it
is believed that this will offer much better indexing
performance on some setups as compared to zebraqueue_daemon.pl,
which uses Z39.50 extended services to send record updates
to Zebra.
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
At the moment, using both -a (index authorities) and
-x (export records as MARC XML) is not allowed -
if the Zebra authority database is using the DOM
filter, zebraidx will not be able to process the
exported records correctly.
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
1. Logic to fix up record IDs, UNIMARC 100 field,
and record leader now in separate functions.
2. Removed (incorrect) logic to save corrected record
in database.
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
The UNIMARC stuff has been moved to the marc_defs directory and the
language-specific files are in lang_defs.
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
[1] Use File::Temp to create and manage
export directory if -d is not specified.
[2] Added usage message.
[3] Code that attempts to fix up Zebra
configuration files has been changed so that it
is invoked only if the --munge-config option
is supplied; this code will ultimately
either be removed or moved to a separate
script -- the sorts of errors that it
tries to fix should no longer be appearing
in a standard install.
[4] Fixed Win32 portability problem when removing
temporary directory.
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
There are only 2 UNIMARC-specific files (.abs and .chr); they have been moved to etc/zebradb.
rebuild_zebra.pl now takes all config files from this location.
The misc/zebra/ directory can be removed (and will be soon).
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
This time it's when a biblio doesn't have a biblionumber, has a 100$a field, and that field is invalid.
One biblio in my 300,000-record DB (and it was biblio 294,359, of course!).
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>