Koha/misc/migration_tools/rebuild_zebra_idx.pl
tipaul f74823bf1b OK, this time it seems to work. The last blocking problem was... a space in
recordId: (bib1,Identifier-standard) just after the comma. Adam agreed it was a bug, and it should be solved soon. But now that we are aware of it, we can avoid putting the space!

In this commit you have everything needed to set up a working zebra DB in UNIMARC:
* collection.abs is UNIMARC specific and must be rewritten for MARC21, in marc21 directory
* pdf.properties is to be copied unmodified into the marc21 directory (it can also be put somewhere else)
* rebuild_zebra.pl is a SLOW, but one-step, reindexing tool, using ZOOM
* rebuild_zebra_idx is a FAST, but two-step, reindexing tool, and does not use zebra. Run it: it creates an XML file for every biblio in the /zebra/biblios directory. Then run zebraidx update biblios in your zebra directory
* zebra.cfg is the zebra config file ;-)
* test_cql2rpn.pl is a script that will query the database and show the results. Works for me, just change the query at the beginning to get answers you expect.

What remains to be done:
* benchmarking: zebraidx update seems faster than lightning (400 biblios/sec: 10 000 biblios in 25 seconds), while ZOOM indexing is slow (something like 25 biblios/second). More benchmarking could be done.
* completing collection.abs for UNIMARC. I'll take care of it.
* modifying Biblio.pm to use ZOOM instead of the "zebraidx through exec" approach currently in use. I'll take care of it also.
* modifying the search API, tools and screens. I'll leave this to someone else (chris ?). I agree SearchMarc.pm can be dropped and replaced by something else (maybe a new-and-clean Search.pm package)
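
To put the benchmark figures above side by side, here is a small arithmetic sketch. It only uses the numbers quoted in this message (10 000 biblios in 25 seconds for zebraidx, roughly 25 biblios/second for ZOOM); the rate() helper and variable names are made up for illustration, and the figures are rough single-run observations, not a proper benchmark.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Figures quoted in the commit message (rough observations only).
my $biblios       = 10_000;
my $zebraidx_secs = 25;    # zebraidx update: 10 000 biblios in 25 seconds
my $zoom_rate     = 25;    # ZOOM indexing: about 25 biblios/second

# Hypothetical helper: records per second = records / elapsed seconds.
sub rate {
    my ($count, $secs) = @_;
    return $count / $secs;
}

my $zebraidx_rate = rate($biblios, $zebraidx_secs);    # 400 biblios/sec
my $zoom_secs     = $biblios / $zoom_rate;             # 400 seconds
my $speedup       = $zebraidx_rate / $zoom_rate;       # 16

printf "zebraidx: %d biblios/sec\n", $zebraidx_rate;
printf "ZOOM: %d seconds for %d biblios\n", $zoom_secs, $biblios;
printf "zebraidx is about %dx faster\n", $speedup;
```

With these numbers, the same 10 000 biblios would take about 400 seconds through ZOOM, i.e. zebraidx is roughly 16 times faster.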
2006-02-09 10:59:34 +00:00

55 lines
No EOL
1.2 KiB
Perl
Executable file

#!/usr/bin/perl
# small script that writes every koha biblio to its own XML file, ready for "zebraidx update biblios"
use strict;
# Koha modules used
use MARC::File::USMARC;
use MARC::Record;
use MARC::Batch;
use C4::Context;
use C4::Biblio;
use Time::HiRes qw(gettimeofday);
use Getopt::Long;
my ( $input_marc_file, $number) = ('',0);
my ($confirm);
GetOptions(
'c' => \$confirm,
);
unless ($confirm) {
    print <<EOF;
Script to write files for zebra DB reindexing. Once it's done, run zebraidx update biblios.
Run the script with -c to confirm the reindexing.
EOF
    die "Aborted: -c not given\n";
}
$|=1; # flushes output
my $dbh = C4::Context->dbh;
my $cgidir = C4::Context->intranetdir ."/cgi-bin";
# fall back to the intranet root when there is no cgi-bin subdirectory
unless (-d $cgidir) {
    $cgidir = C4::Context->intranetdir."/";
}
my $starttime = gettimeofday;
my $sth = $dbh->prepare("select biblionumber from biblio");
$sth->execute;
my $i=0;
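while ((my $biblionumber) = $sth->fetchrow) {
    my $record = MARCgetbiblio($dbh,$biblionumber);
    # NB: the file holds XML (as_xml), despite the "iso2709" suffix in its name
    my $filename = $cgidir."/zebra/biblios/BIBLIO".$biblionumber."iso2709";
    open(my $fh, '>', $filename) or die "cannot write $filename: $!";
    print $fh $record->as_xml();
    close $fh;
    $i++;
    print "\r$i" unless ($i % 100);    # progress counter every 100 records
}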
my $timeneeded = gettimeofday - $starttime;
print "\n$i MARC records done in $timeneeded seconds\n";
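
# The export loop above can be tried without a Koha install. What follows is a
# minimal sketch, not part of the script: write_records() is a hypothetical
# helper, the records are stand-in XML strings rather than real
# $record->as_xml output, and the ".xml" suffix and temporary directory are
# assumptions made for the example. It shows the same one-file-per-biblio
# pattern with checked open/close and Time::HiRes timing.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical helper: write one file per record into $dir.
# %$records maps a biblionumber to an already-serialised XML string.
sub write_records {
    my ($dir, $records) = @_;
    my $written = 0;
    for my $biblionumber (sort { $a <=> $b } keys %$records) {
        my $filename = File::Spec->catfile($dir, "BIBLIO$biblionumber.xml");
        open(my $fh, '>', $filename) or die "cannot write $filename: $!";
        print {$fh} $records->{$biblionumber};
        close($fh) or die "cannot close $filename: $!";
        $written++;
    }
    return $written;
}

# Stand-in data: 250 tiny fake records in a throwaway directory.
my $dir     = tempdir(CLEANUP => 1);
my %records = map { ($_ => "<record><controlfield tag=\"001\">$_</controlfield></record>") } 1 .. 250;

my $t0      = [gettimeofday];
my $count   = write_records($dir, \%records);
my $elapsed = tv_interval($t0);

printf "%d records written in %.3f seconds\n", $count, $elapsed;
```

# Dying on a failed open means a missing or read-only target directory is
# reported at the first record instead of silently producing no files.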