Commit graph

274 commits

Author SHA1 Message Date
tipaul
3e85c9e97f NoZebra SQL index management :
* adding 3 subs in Biblio.pm
- GetNoZebraIndexes, that get the index structure in a new systempreference (added with this commit)
- _DelBiblioNoZebra, that retrieve all index entries for a biblio and remove in a variable the biblio reference
- _AddBiblioNoZebra, that add index entries for a biblio.
Note that the 2 _Add and _Del subs work only in a hash variable, to speed up things in case of a modif (ie : delete+add). The effective SQL update is done in the ModZebra sub (that existed before, and dealed with zebra index).
I think the code has to be more deeply tested, but it works at least partially.
2007-05-02 16:44:31 +00:00
tipaul
4213b6ec98 improving NOzebra search :
- changing nozebra table to have biblionumber,title-ranking; (; is the entry separator. Now, if a value is several times in an index, it is stored only once, with a higher ranking (the ranking is the number of times the word appeard for this index)
- improving search to have ranking value (default order). The ranking is the sum of ranking of all terms. The list is ordered by ranking+title, from most to lower
2007-05-02 11:57:11 +00:00
hdl
097fef712a Removing $dbh from GetMarcFromKohaField (dbh is not used in this function.) 2007-04-27 14:00:48 +00:00
tipaul
b53be9cdaf Koha 3.0 nozebra 1st commit : the script misc/migration_tools/rebuild_nozebra.pl build the nozebra table, and, if you set NoZebra to Yes, queries will be done through zebra. TODO :
- add nozebra table management on biblio editing
- the index table content is hardcoded. I still have to add some specific systempref to let the library update it
- manage pagination (next/previous)
- manage facets
WHAT works :
- NZgetRecords : has exactly the same API & returns as zebra getQuery, except that some parameters are unused
- search & sort works quite good
- CQL parser is better that what I thought I could do : title="harry and sally" and publicationyear>2000 not itemtype=LIVR should work fine
2007-04-25 16:26:42 +00:00
tipaul
6b201757c1 some bugfixes for this script that automatically build zebra DB from default config files 2007-04-17 08:50:33 +00:00
tipaul
eba2552086 Code cleaning of Biblio.pm (continued)
All subs have be cleaned :
- removed useless
- merged some
- reordering Biblio.pm completly
- using only naming conventions

Seems to have broken nothing, but it still has to be heavily tested.
Note that Biblio.pm is now much more efficient than previously & probably more reliable as well.
2007-03-29 16:45:53 +00:00
tipaul
a481fad4b7 Code cleaning :
== Biblio.pm cleaning (useless) ==
* some sub declaration dropped
* removed modbiblio sub
* removed moditem sub
* removed newitems. It was used only in finishrecieve. Replaced by a Koha2Marc+AddItem, that is better.
* removed MARCkoha2marcItem
* removed MARCdelsubfield declaration
* removed MARCkoha2marcBiblio

== Biblio.pm cleaning (naming conventions) ==
* MARCgettagslib renamed to GetMarcStructure
* MARCgetitems renamed to GetMarcItem
* MARCfind_frameworkcode renamed to GetFrameworkCode
* MARCmarc2koha renamed to TransformMarcToKoha
* MARChtml2marc renamed to TransformHtmlToMarc
* MARChtml2xml renamed to TranformeHtmlToXml
* zebraop renamed to ModZebra

== MARC=OFF ==
* removing MARC=OFF related scripts (in cataloguing directory)
* removed checkitems (function related to MARC=off feature, that is completly broken in head. If someone want to reintroduce it, hard work coming...)
* removed getitemsbybiblioitem (used only by MARC=OFF scripts, that is removed as well)
2007-03-29 13:30:31 +00:00
tipaul
f8e9fb6445 rel_3_0 moved to HEAD (introducing new files) 2007-03-09 15:34:17 +00:00
tipaul
a3999812e6 rel_3_0 moved to HEAD 2007-03-09 14:52:58 +00:00
thd
ad657e71eb For MARC 21, instead of deleting the whole subfield when a character does not
translate properly from MARC8 into UTF-8, only the problem characters are
deleted.
2006-09-01 17:11:53 +00:00
toins
eac83ccd45 Head & rel_2_2 merged 2006-07-04 15:02:42 +00:00
rangi
10b2315eb3 Fixing the problem that all items were getting biblioitem=1 set 2006-04-01 22:10:50 +00:00
kados
44b4d37b54 removed Zconns, no need for them anymore with new Context.pm setup 2006-02-27 01:06:30 +00:00
kados
fafe0896d6 minor bugfix with 'commit' option 2006-02-25 23:40:59 +00:00
kados
77abbe2caf A bulkmarcimport.pl that is based on the new Biblio.pm Zebra routines.
It now responds to:

-n : the number of records to import.
-commit : the number of records to wait before performing a 'commit' operation

ALSO: IMPORTANT: I took out the char_encoding as this should be handled by
MARC::File::XML now, unless I'm mistaken.
2006-02-25 21:53:48 +00:00
tipaul
f74823bf1b OK, this time it seems to work. The last blocking problem was... a space in
recordId: (bib1,Identifier-standard) just after the comma. Adam agreed it was a bug, and it should be solved soon. But now we are aware, we can avoid putting the space !

In this commit you have all what is needed to setup a working zebra DB in Unimarc :
* collection.abs is UNIMARC specific and must be rewritten for MARC21, in marc21 directory
* pdf.properties is to be copied unmodified in the marc21 directory (can also be put somewhere else)
* rebuild_zebra.pl is SLOW, but 1 step reindexing tool, using ZOOM
* rebuild_zebra_idx is FAST, but 2 step reindexing tool, and does not use zebra. run it, it will create all biblios XML files in /zebra/biblios directory, then zebraidx update biblios in your zebra directory
* zebra.cfg is the zebra config file ;-)
* test_cql2rpn.pl is a script that will query the database and show the results. Works for me, just change the query at the beginning to get answers you expect.

What has to be done :
* benchmarking : it seems the zebraidx update is faster than lightning (400biblios/sec : 10 000biblios in 25seconds), while ZOOM indexing is slow (something like 25biblios/second) More benchmarking could be done.
* completing collection.abs for UNIMARC. I'll take care of it.
* modifying Biblio.pm to use ZOOM instead of the "zebraidx through exec" running actually. I'll take care of it also.
* modify the search API & tools & screens. I'll let the ball to someone else (chris ?) for this. I agree SearchMarc.pm can be dropped and replaced by something else (maybe a new-and-clean Search.pm package)
2006-02-09 10:59:34 +00:00
tipaul
369ee65d94 new version of rebuild_zebra. Should work with Perl-ZOOM, but DOES NOT WORK for me.
I get  :
ZOOM error 10002 "Encoding failed" from diag-set 'ZOOM'

help expected from indexdata...
2006-01-10 17:03:32 +00:00
tipaul
d5938493d7 synch'ing head and rel_2_2 (from 2.2.5, including npl templates)
Seems not to break too many things, but i'm probably wrong here.
at least, new features/bugfixes from 2.2.5 are here (tested on some features on my head local copy)

- removing useless directories (koha-html and koha-plucene)
2006-01-06 16:39:37 +00:00
tipaul
dba37f38e7 This script can be use to rebuild the zebra DB. It stores all koha MARC records in iso2709, in the bilbios directory. After that, you just have to "zebraidx update biblios"
I tried on a 9900 DB, here are the results :

[paul@bureau migration_tools]$ ./rebuild_zebra.pl -c
9900
9903 MARC record done in 37.9104120731354 seconds

[paul@bureau zebra]$ zebraidx update biblios
<snip>
18:31:24-11/08 zebraidx(20348) [log] Iterations . . . 144575
18:31:24-11/08 zebraidx(20348) [log] Distinct words .  39891
18:31:24-11/08 zebraidx(20348) [log] Updates. . . . .     46
18:31:24-11/08 zebraidx(20348) [log] Deletions. . . .      2
18:31:24-11/08 zebraidx(20348) [log] Insertions . . .  39843
18:31:24-11/08 zebraidx(20348) [log] zebra_register_close p=0x8104cf8
18:31:25-11/08 zebraidx(20348) [log] Records:    9887 i/u/d 9881/6/0
18:31:25-11/08 zebraidx(20348) [log] user/system: 531/145
18:31:25-11/08 zebraidx(20348) [log] zebra_stop
18:31:25-11/08 zebraidx(20348) [log] zebraidx times: 11.33  5.31  1.45
2005-08-11 16:35:54 +00:00
tipaul
c52e5b61dd synch'ing 2.2 and head 2005-08-04 14:10:52 +00:00
tipaul
64cd740d2b synch'ing 2.2 and head 2005-05-04 08:58:30 +00:00
tipaul
93ff09d081 merging 2.2 branch with head. Sorry for not making it before, many many commits done here 2005-03-01 13:40:35 +00:00
tipaul
51e204fa23 moving bulkmarcimport script to migration_tools directory 2005-01-03 15:25:50 +00:00
tipaul
cd6f87a689 Auto-build LANG authorized values 2005-01-03 12:59:49 +00:00