Commit graph

76 commits

Author SHA1 Message Date
Paul POULAIN
319a32b16e rebuild_zebra : directories updated
the unimarc stuff has been moved to marc_defs directory and the
lang specific is in lang_defs

Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-01-03 00:55:12 -06:00
Joshua Ferraro
c6ddddad98 adding a new option, -w, which disables shadow indexing for the current batch (faster indexing of large sets where ACID isn't critical)
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-12-30 12:13:27 -06:00
Galen Charlton
4609608ccc allow use of older version of File::Temp
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-12-22 22:58:12 -06:00
Galen Charlton
93beb943c0 bug 1661: rebuild_zebra.pl changes
[1] Use File::Temp to create and manage
    export directory if -d is not specified.
[2] Added usage message.
[3] Code that attempts to fix up Zebra
    configuration files changed so that it
    is invoked only if --munge-config option
    is supplied; this code will ultimately
    either be removed or moved to a separate
    script -- the sorts of errors that it
    tries to fix should no longer be appearing
    in a standard install.
[4] Fixed Win32 portability problem when removing
    temporary directory.

Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-12-20 19:19:43 -06:00
Paul POULAIN
262a6e2a9a Updating rebuild_zebra.pl : now uses etc config files
There are only 2 UNIMARC specific files (.abs and .chr), they have been moved to etc/zebradb

The rebuild_zebra.pl takes all config file from this location now.
the misc/zebra/ can be removed (and will be soon)

Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-11-25 17:07:46 -06:00
Paul POULAIN
f38b7598fc still handling better dirty MARC records
this time it's when a biblio don't have biblionumber, has a 100$a field, and it's invalid.

1 biblio in my 300 000 DB (and it was biblio 294 359, of course !)

Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-11-20 16:20:50 -06:00
Paul POULAIN
f1bca9ba50 missing biblionumber AND missing unimarc 100 was not properly handled
now, adding both on the fly when needed.
(had 2 biblios like that in a 290 000 DB, but was enought to have M::F::X complaining & diing !)

Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-11-17 11:25:07 -06:00
Paul POULAIN
ef1ac56857 handling wrong MARC record better
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-11-12 17:13:00 -06:00
Paul POULAIN
9149a711fb bugfixes to config files for zebra 2.0.18
those 2 lines are invalid

Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-11-08 17:50:00 -06:00
Paul POULAIN
b7eb9e1b5c rebuild_zebra now handle correctly improper authorities records
(missing 100 field are automatically added)

Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-11-07 08:18:24 -06:00
Paul POULAIN
bb5cea8e56 deal with wrong authorities when exporting for zebra
(authorities that don't have a 001 field containing authid)

also comment some code when exporting biblios (NOT tested, hdl,pls confirm this commit)

Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-11-07 08:18:19 -06:00
Paul POULAIN
89b9e8f8c1 skip empty records (new GetMarcRecord behaviour that returns empty string and not empty MARC::Record)
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-10-31 19:41:49 -05:00
Paul POULAIN
49ef1df969 Adding a new option to rebuildzebra : noxml
This option uses the iso2709 version of the MARC record instead of the XML one
(biblioitems.marc vs biblioitems.marcxml)
No change if the parameter is not set.

Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2007-10-09 19:07:36 -05:00
Joshua Ferraro
ae34e8f45a changing the name of the zebra password file to passwd
Signed-off-by: Chris Cormack <crc@liblime.com>
2007-10-01 23:14:47 -05:00
tipaul
1399945a75 eval() on getAuthority & getBiblio to avoid a script failure 2007-08-01 09:20:03 +00:00
tipaul
5dd3f0229a bugfixes (various), handling utf-8 without guessencoding (as suggested by joshua, fixing some zebra config files -for french but should be interesting for other languages- 2007-06-06 13:08:35 +00:00
tipaul
0569dccd5f some changes to default zebra config for better searches 2007-05-25 09:34:30 +00:00
tipaul
5ff7fcffa4 Bugfixes & improvements (various and minor) :
- updating templates to have tmpl_process3.pl running without any errors
- adding a drupal-like css for prog templates (with 3 small images)
- fixing some bugs in circulation & other scripts
- updating french translation
- fixing some typos in templates
2007-05-22 09:13:54 +00:00
tipaul
ca201e36af Koha NoZebra :
- support for authorities
- some bugfixes in ordering and "CCL" parsing
- support for authorities <=> biblios walking

Seems I can do what I want now, so I consider its done, except for bugfixes that will be needed i m sure !
2007-05-10 14:45:15 +00:00
tipaul
6b201757c1 some bugfixes for this script that automatically build zebra DB from default config files 2007-04-17 08:50:33 +00:00
tipaul
a481fad4b7 Code cleaning :
== Biblio.pm cleaning (useless) ==
* some sub declaration dropped
* removed modbiblio sub
* removed moditem sub
* removed newitems. It was used only in finishrecieve. Replaced by a Koha2Marc+AddItem, that is better.
* removed MARCkoha2marcItem
* removed MARCdelsubfield declaration
* removed MARCkoha2marcBiblio

== Biblio.pm cleaning (naming conventions) ==
* MARCgettagslib renamed to GetMarcStructure
* MARCgetitems renamed to GetMarcItem
* MARCfind_frameworkcode renamed to GetFrameworkCode
* MARCmarc2koha renamed to TransformMarcToKoha
* MARChtml2marc renamed to TransformHtmlToMarc
* MARChtml2xml renamed to TranformeHtmlToXml
* zebraop renamed to ModZebra

== MARC=OFF ==
* removing MARC=OFF related scripts (in cataloguing directory)
* removed checkitems (function related to MARC=off feature, that is completly broken in head. If someone want to reintroduce it, hard work coming...)
* removed getitemsbybiblioitem (used only by MARC=OFF scripts, that is removed as well)
2007-03-29 13:30:31 +00:00
tipaul
a3999812e6 rel_3_0 moved to HEAD 2007-03-09 14:52:58 +00:00
tipaul
f74823bf1b OK, this time it seems to work. The last blocking problem was... a space in
recordId: (bib1,Identifier-standard) just after the comma. Adam agreed it was a bug, and it should be solved soon. But now we are aware, we can avoid putting the space !

In this commit you have all what is needed to setup a working zebra DB in Unimarc :
* collection.abs is UNIMARC specific and must be rewritten for MARC21, in marc21 directory
* pdf.properties is to be copied unmodified in the marc21 directory (can also be put somewhere else)
* rebuild_zebra.pl is SLOW, but 1 step reindexing tool, using ZOOM
* rebuild_zebra_idx is FAST, but 2 step reindexing tool, and does not use zebra. run it, it will create all biblios XML files in /zebra/biblios directory, then zebraidx update biblios in your zebra directory
* zebra.cfg is the zebra config file ;-)
* test_cql2rpn.pl is a script that will query the database and show the results. Works for me, just change the query at the beginning to get answers you expect.

What has to be done :
* benchmarking : it seems the zebraidx update is faster than lightning (400biblios/sec : 10 000biblios in 25seconds), while ZOOM indexing is slow (something like 25biblios/second) More benchmarking could be done.
* completing collection.abs for UNIMARC. I'll take care of it.
* modifying Biblio.pm to use ZOOM instead of the "zebraidx through exec" running actually. I'll take care of it also.
* modify the search API & tools & screens. I'll let the ball to someone else (chris ?) for this. I agree SearchMarc.pm can be dropped and replaced by something else (maybe a new-and-clean Search.pm package)
2006-02-09 10:59:34 +00:00
tipaul
369ee65d94 new version of rebuild_zebra. Should work with Perl-ZOOM, but DOES NOT WORK for me.
I get  :
ZOOM error 10002 "Encoding failed" from diag-set 'ZOOM'

help expected from indexdata...
2006-01-10 17:03:32 +00:00
tipaul
d5938493d7 synch'ing head and rel_2_2 (from 2.2.5, including npl templates)
Seems not to break too many things, but i'm probably wrong here.
at least, new features/bugfixes from 2.2.5 are here (tested on some features on my head local copy)

- removing useless directories (koha-html and koha-plucene)
2006-01-06 16:39:37 +00:00
tipaul
dba37f38e7 This script can be use to rebuild the zebra DB. It stores all koha MARC records in iso2709, in the bilbios directory. After that, you just have to "zebraidx update biblios"
I tried on a 9900 DB, here are the results :

[paul@bureau migration_tools]$ ./rebuild_zebra.pl -c
9900
9903 MARC record done in 37.9104120731354 seconds

[paul@bureau zebra]$ zebraidx update biblios
<snip>
18:31:24-11/08 zebraidx(20348) [log] Iterations . . . 144575
18:31:24-11/08 zebraidx(20348) [log] Distinct words .  39891
18:31:24-11/08 zebraidx(20348) [log] Updates. . . . .     46
18:31:24-11/08 zebraidx(20348) [log] Deletions. . . .      2
18:31:24-11/08 zebraidx(20348) [log] Insertions . . .  39843
18:31:24-11/08 zebraidx(20348) [log] zebra_register_close p=0x8104cf8
18:31:25-11/08 zebraidx(20348) [log] Records:    9887 i/u/d 9881/6/0
18:31:25-11/08 zebraidx(20348) [log] user/system: 531/145
18:31:25-11/08 zebraidx(20348) [log] zebra_stop
18:31:25-11/08 zebraidx(20348) [log] zebraidx times: 11.33  5.31  1.45
2005-08-11 16:35:54 +00:00