Koha-community/Koha - Koha: The world's first free and open source library system

Author	SHA1	Message	Date
Frédéric Demians	459d732180	Bug 3301 - Speed up rebuild_zebra script. With this patch, rebuild_zebra can re-index a whole Koha DB quickly: rebuild_zebra -r -b -nosanitize. Biblio (authority) records are dumped directly to a file from the marcxml field without being transformed into MARC::Record objects and corrected. DOCUMENTATION: rebuild_zebra.pl new parameter -nosanitize: export biblio/authority records directly from the DB marcxml field without sanitizing records. It speeds up the dump process but could fail if the DB contains badly encoded records. Works now only with -x and -b. Signed-off-by: Galen Charlton <galen.charlton@liblime.com>	2009-06-29 07:52:46 -05:00
Brian Harrington	25cd35b3a1	bug 2924 fixed rebuild_zebra.pl to work when export is skipped reindexing now occurs if there are $num_records_exported or if $skip_export is set Signed-off-by: Galen Charlton <galen.charlton@liblime.com>	2009-03-04 08:28:22 -06:00
Michael Hafen	086b3ccf9a	bug in rebuild_zebra verbose logging - found another print I didn't want to see all the time. Add the phrase 'if ( $verbose_logging )' to the two print statements concerning the skipping of biblio or authority records. I recently had to split biblio and authority index updating in my cron script (had some really big records, so had to add the -x switch, which should only be used on biblios according to the help). So I noticed that rebuild_zebra.pl printed messages that it was skipping biblios or authorities. This patch makes those prints conditional on the verbose logging switch. Signed-off-by: Galen Charlton <galen.charlton@liblime.com>	2008-12-11 09:23:28 -06:00
Michael Hafen	62a590a954	Reduce logging from rebuild_zebra.pl with a command line option This reduces the output of the script and zebraidx, and creates a -v command line switch which will increase the logging to their former states. Signed-off-by: Galen Charlton <galen.charlton@liblime.com>	2008-10-01 13:05:20 -05:00
Galen Charlton	df1f46f9da	bug 2253: improve rebuild_zebra's handling of zebraqueue. Prior to this patch, rebuild_zebra.pl -z was effectively hanging on to a lock on the zebraqueue table, preventing other scripts from inserting new entries into the table. This had the effect of causing circulation operations to time out. Refactored by having rebuild_zebra.pl pull the active queue into memory, then mark entries done by zebraqueue.id. Consequently, rebuild_zebra.pl should no longer block adding new entries into zebraqueue. Signed-off-by: Joshua Ferraro <jmf@liblime.com>	2008-06-19 09:49:06 -05:00
Galen Charlton	3109d5820e	rebuild_zebra.pl - add -y option. rebuild_zebra.pl will now mark all zebraqueue entries of the affected record type(s) done when run in normal mode to index all records (as opposed to running it with -z to just process the zebraqueue). This prevents any running zebraqueue_daemon processes from attempting to reindex the same records redundantly. The new -y switch overrides this new behavior; in other words, if running rebuild_zebra.pl without -z, you can specify -y to not mark zebraqueue done. Signed-off-by: Joshua Ferraro <jmf@liblime.com>	2008-04-21 11:17:29 -05:00
Galen Charlton	e2c1f11715	fixed memory leak I introduced Accidentally introducing a circular reference in a MARC::Record object does not lead to goodness, particularly if you export lots and lots of them. Signed-off-by: Joshua Ferraro <jmf@liblime.com>	2008-04-01 06:46:05 -05:00
Galen Charlton	4f001186b6	still more rebuild_zebra refactoring Merged duplicate code for indexing bibs and authorities into a single index_records() function. Signed-off-by: Joshua Ferraro <jmf@liblime.com>	2008-03-25 07:58:03 -05:00
Galen Charlton	a5576b8dfe	IMPORTANT: added -z option to rebuild_zebra.pl. The -z option, when used in conjunction with -a and/or -b, selects the records to reindex from the zebraqueue table. Both record updates and record deletes are handled. -z cannot be used with -s or -r: the updated records must always be freshly exported, and if zebraqueue is to be processed, it's assumed that you don't want to drop the Zebra index first. This means that rebuild_zebra.pl -b -a -z can be used as a cronjob to update the indexes periodically; it is believed that this will offer much better indexing performance on some setups as compared to zebraqueue_daemon.pl, which uses Z39.50 extended services to send record updates to Zebra. Signed-off-by: Joshua Ferraro <jmf@liblime.com>	2008-03-25 07:58:01 -05:00
Galen Charlton	57d128f727	rebuild_zebra: exit if both -a and -x specified. At the moment, using both -a (index authorities) and -x (export records as MARC XML) is not allowed - if the Zebra authority database is using the DOM filter, zebraidx will not be able to process the exported records correctly. Signed-off-by: Joshua Ferraro <jmf@liblime.com>	2008-03-25 07:57:44 -05:00
Galen Charlton	f0d5da7448	more rebuild_zebra.pl refactoring 1. Logic to fix up record IDs, UNIMARC 100 field, and record leader now in separate functions. 2. Removed (incorrect) logic to save corrected record in database. Signed-off-by: Joshua Ferraro <jmf@liblime.com>	2008-03-25 07:57:43 -05:00
Galen Charlton	f98c27a8bc	refactor rebuild_zebra: new routine for invoking zebraidx Created a routine for calling zebraidx, replacing separate invocations for bibs and authorities. Signed-off-by: Joshua Ferraro <jmf@liblime.com>	2008-03-25 07:57:42 -05:00
Galen Charlton	ae8a76dacc	rebuild_zebra.pl: removed disused $limit option Signed-off-by: Joshua Ferraro <jmf@liblime.com>	2008-03-25 07:57:41 -05:00
Ryan Higgins	71dd69d5ac	add option to export and index xml to rebuild_zebra Signed-off-by: Chris Cormack <crc@liblime.com> Signed-off-by: Joshua Ferraro <jmf@liblime.com>	2008-02-15 08:25:46 -06:00
Paul POULAIN	319a32b16e	rebuild_zebra : directories updated the unimarc stuff has been moved to marc_defs directory and the lang specific is in lang_defs Signed-off-by: Chris Cormack <crc@liblime.com> Signed-off-by: Joshua Ferraro <jmf@liblime.com>	2008-01-03 00:55:12 -06:00
Joshua Ferraro	c6ddddad98	adding a new option, -w, which disables shadow indexing for the current batch (faster indexing of large sets where ACID isn't critical) Signed-off-by: Joshua Ferraro <jmf@liblime.com>	2007-12-30 12:13:27 -06:00
Galen Charlton	4609608ccc	allow use of older version of File::Temp Signed-off-by: Joshua Ferraro <jmf@liblime.com>	2007-12-22 22:58:12 -06:00
Galen Charlton	93beb943c0	bug 1661: rebuild_zebra.pl changes [1] Use File::Temp to create and manage export directory if -d is not specified. [2] Added usage message. [3] Code that attempts to fix up Zebra configuration files changed so that it is invoked only if --munge-config option is supplied; this code will ultimately either be removed or moved to a separate script -- the sorts of errors that it tries to fix should no longer be appearing in a standard install. [4] Fixed Win32 portability problem when removing temporary directory. Signed-off-by: Chris Cormack <crc@liblime.com> Signed-off-by: Joshua Ferraro <jmf@liblime.com>	2007-12-20 19:19:43 -06:00
Paul POULAIN	262a6e2a9a	Updating rebuild_zebra.pl : now uses etc config files There are only 2 UNIMARC specific files (.abs and .chr), they have been moved to etc/zebradb The rebuild_zebra.pl takes all config file from this location now. the misc/zebra/ can be removed (and will be soon) Signed-off-by: Joshua Ferraro <jmf@liblime.com>	2007-11-25 17:07:46 -06:00
Paul POULAIN	f38b7598fc	still handling dirty MARC records better: this time it's when a biblio doesn't have a biblionumber, has a 100$a field, and it's invalid. 1 biblio in my 300 000 DB (and it was biblio 294 359, of course !) Signed-off-by: Chris Cormack <crc@liblime.com> Signed-off-by: Joshua Ferraro <jmf@liblime.com>	2007-11-20 16:20:50 -06:00
Paul POULAIN	f1bca9ba50	missing biblionumber AND missing unimarc 100 was not properly handled. Now, adding both on the fly when needed. (had 2 biblios like that in a 290 000 DB, but that was enough to have M::F::X complaining & dying !) Signed-off-by: Chris Cormack <crc@liblime.com> Signed-off-by: Joshua Ferraro <jmf@liblime.com>	2007-11-17 11:25:07 -06:00
Paul POULAIN	ef1ac56857	handling wrong MARC record better Signed-off-by: Chris Cormack <crc@liblime.com> Signed-off-by: Joshua Ferraro <jmf@liblime.com>	2007-11-12 17:13:00 -06:00
Paul POULAIN	9149a711fb	bugfixes to config files for zebra 2.0.18 those 2 lines are invalid Signed-off-by: Chris Cormack <crc@liblime.com> Signed-off-by: Joshua Ferraro <jmf@liblime.com>	2007-11-08 17:50:00 -06:00
Paul POULAIN	b7eb9e1b5c	rebuild_zebra now correctly handles improper authority records (missing 100 fields are automatically added) Signed-off-by: Chris Cormack <crc@liblime.com> Signed-off-by: Joshua Ferraro <jmf@liblime.com>	2007-11-07 08:18:24 -06:00
Paul POULAIN	bb5cea8e56	deal with wrong authorities when exporting for zebra (authorities that don't have a 001 field containing authid) also comment some code when exporting biblios (NOT tested, hdl,pls confirm this commit) Signed-off-by: Chris Cormack <crc@liblime.com> Signed-off-by: Joshua Ferraro <jmf@liblime.com>	2007-11-07 08:18:19 -06:00
Paul POULAIN	89b9e8f8c1	skip empty records (new GetMarcRecord behaviour that returns empty string and not empty MARC::Record) Signed-off-by: Chris Cormack <crc@liblime.com> Signed-off-by: Joshua Ferraro <jmf@liblime.com>	2007-10-31 19:41:49 -05:00
Paul POULAIN	49ef1df969	Adding a new option to rebuildzebra : noxml This option uses the iso2709 version of the MARC record instead of the XML one (biblioitems.marc vs biblioitems.marcxml) No change if the parameter is not set. Signed-off-by: Chris Cormack <crc@liblime.com> Signed-off-by: Joshua Ferraro <jmf@liblime.com>	2007-10-09 19:07:36 -05:00
Joshua Ferraro	ae34e8f45a	changing the name of the zebra password file to passwd Signed-off-by: Chris Cormack <crc@liblime.com>	2007-10-01 23:14:47 -05:00
tipaul	1399945a75	eval() on getAuthority & getBiblio to avoid a script failure	2007-08-01 09:20:03 +00:00
tipaul	5dd3f0229a	bugfixes (various), handling utf-8 without guessencoding (as suggested by joshua), fixing some zebra config files (for french, but should be interesting for other languages)	2007-06-06 13:08:35 +00:00
tipaul	0569dccd5f	some changes to default zebra config for better searches	2007-05-25 09:34:30 +00:00
tipaul	5ff7fcffa4	Bugfixes & improvements (various and minor) : - updating templates to have tmpl_process3.pl running without any errors - adding a drupal-like css for prog templates (with 3 small images) - fixing some bugs in circulation & other scripts - updating french translation - fixing some typos in templates	2007-05-22 09:13:54 +00:00
tipaul	ca201e36af	Koha NoZebra : - support for authorities - some bugfixes in ordering and "CCL" parsing - support for authorities <=> biblios walking Seems I can do what I want now, so I consider it's done, except for bugfixes that will be needed, I'm sure !	2007-05-10 14:45:15 +00:00
tipaul	6b201757c1	some bugfixes for this script that automatically build zebra DB from default config files	2007-04-17 08:50:33 +00:00
tipaul	a481fad4b7	Code cleaning : == Biblio.pm cleaning (useless) == * some sub declaration dropped * removed modbiblio sub * removed moditem sub * removed newitems. It was used only in finishrecieve. Replaced by a Koha2Marc+AddItem, that is better. * removed MARCkoha2marcItem * removed MARCdelsubfield declaration * removed MARCkoha2marcBiblio == Biblio.pm cleaning (naming conventions) == * MARCgettagslib renamed to GetMarcStructure * MARCgetitems renamed to GetMarcItem * MARCfind_frameworkcode renamed to GetFrameworkCode * MARCmarc2koha renamed to TransformMarcToKoha * MARChtml2marc renamed to TransformHtmlToMarc * MARChtml2xml renamed to TranformeHtmlToXml * zebraop renamed to ModZebra == MARC=OFF == * removing MARC=OFF related scripts (in cataloguing directory) * removed checkitems (function related to MARC=off feature, that is completely broken in head. If someone wants to reintroduce it, hard work coming...) * removed getitemsbybiblioitem (used only by MARC=OFF scripts, that is removed as well)	2007-03-29 13:30:31 +00:00
tipaul	a3999812e6	rel_3_0 moved to HEAD	2007-03-09 14:52:58 +00:00
tipaul	f74823bf1b	OK, this time it seems to work. The last blocking problem was... a space in recordId: (bib1,Identifier-standard) just after the comma. Adam agreed it was a bug, and it should be solved soon. But now we are aware, we can avoid putting the space ! In this commit you have all what is needed to setup a working zebra DB in Unimarc : * collection.abs is UNIMARC specific and must be rewritten for MARC21, in marc21 directory * pdf.properties is to be copied unmodified in the marc21 directory (can also be put somewhere else) * rebuild_zebra.pl is SLOW, but 1 step reindexing tool, using ZOOM * rebuild_zebra_idx is FAST, but 2 step reindexing tool, and does not use zebra. run it, it will create all biblios XML files in /zebra/biblios directory, then zebraidx update biblios in your zebra directory * zebra.cfg is the zebra config file ;-) * test_cql2rpn.pl is a script that will query the database and show the results. Works for me, just change the query at the beginning to get answers you expect. What has to be done : * benchmarking : it seems the zebraidx update is faster than lightning (400biblios/sec : 10 000biblios in 25seconds), while ZOOM indexing is slow (something like 25biblios/second) More benchmarking could be done. * completing collection.abs for UNIMARC. I'll take care of it. * modifying Biblio.pm to use ZOOM instead of the "zebraidx through exec" running actually. I'll take care of it also. * modify the search API & tools & screens. I'll let the ball to someone else (chris ?) for this. I agree SearchMarc.pm can be dropped and replaced by something else (maybe a new-and-clean Search.pm package)	2006-02-09 10:59:34 +00:00
tipaul	369ee65d94	new version of rebuild_zebra. Should work with Perl-ZOOM, but DOES NOT WORK for me. I get : ZOOM error 10002 "Encoding failed" from diag-set 'ZOOM' help expected from indexdata...	2006-01-10 17:03:32 +00:00
tipaul	d5938493d7	synch'ing head and rel_2_2 (from 2.2.5, including npl templates) Seems not to break too many things, but i'm probably wrong here. at least, new features/bugfixes from 2.2.5 are here (tested on some features on my head local copy) - removing useless directories (koha-html and koha-plucene)	2006-01-06 16:39:37 +00:00
tipaul	dba37f38e7	This script can be used to rebuild the zebra DB. It stores all koha MARC records in iso2709, in the biblios directory. After that, you just have to "zebraidx update biblios" I tried on a 9900 DB, here are the results : [paul@bureau migration_tools]$ ./rebuild_zebra.pl -c 9900 9903 MARC record done in 37.9104120731354 seconds [paul@bureau zebra]$ zebraidx update biblios <snip> 18:31:24-11/08 zebraidx(20348) [log] Iterations . . . 144575 18:31:24-11/08 zebraidx(20348) [log] Distinct words . 39891 18:31:24-11/08 zebraidx(20348) [log] Updates. . . . . 46 18:31:24-11/08 zebraidx(20348) [log] Deletions. . . . 2 18:31:24-11/08 zebraidx(20348) [log] Insertions . . . 39843 18:31:24-11/08 zebraidx(20348) [log] zebra_register_close p=0x8104cf8 18:31:25-11/08 zebraidx(20348) [log] Records: 9887 i/u/d 9881/6/0 18:31:25-11/08 zebraidx(20348) [log] user/system: 531/145 18:31:25-11/08 zebraidx(20348) [log] zebra_stop 18:31:25-11/08 zebraidx(20348) [log] zebraidx times: 11.33 5.31 1.45	2005-08-11 16:35:54 +00:00

40 commits