Introduced new C4::Biblio function CheckItemPreSave,
which checks for duplicate barcodes and invalid
branch codes. Not yet sure whether this function
needs to be exported or whether it will just be
used internally to C4::Bibli.
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
Changes to improve speed of MARC bib and item
imports:
[1] Turn off autocommit and commit database
transactions in larger batches.
[2] Introduce a new C4::Biblio function (AddBiblioAndItems)
to combine AddBiblio and AddItems -- this is faster
because we are not parsing the MARC XML of the biblio
every time we add an item.
[3] Introduce FasterTransformMarcToKoha, which is much
faster than TransformMarcToKoha. The new version,
which will replace the old one once it has been
fully tested, scans through each field in the
MARC record just once, instead of potentially
dozens of times.
[4] Remove code in bulkmarcexport that moved the
item tags to separate MARC::Record objects.
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
[1] Use File::Temp to create and manage
export directory if -d is not specified.
[2] Added usage message.
[3] Code that attempts to fix up Zebra
configuration files changed so that it
is invoked only if --munge-config option
is supplied; this code will ultimately
either be removed or moved to a separate
script -- the sorts of errors that it
tries to fix should no longer be appearing
in a standard install.
[4] Fixed Win32 portability problem when removing
temporary directory.
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
There are only 2 UNIMARC specific files (.abs and .chr), they have been moved to etc/zebradb
The rebuild_zebra.pl takes all config file from this location now.
the misc/zebra/ can be removed (and will be soon)
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
this time it's when a biblio don't have biblionumber, has a 100$a field, and it's invalid.
1 biblio in my 300 000 DB (and it was biblio 294 359, of course !)
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
now, adding both on the fly when needed.
(had 2 biblios like that in a 290 000 DB, but was enought to have M::F::X complaining & diing !)
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
(authorities that don't have a 001 field containing authid)
also comment some code when exporting biblios (NOT tested, hdl,pls confirm this commit)
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
- the quotemeta was wrong (and introduced some bugs in diacritics)
- fixing some bugs that appear only sometimes : the union was done including weight, which is wrong & resulted in missing some results (when various weighting)
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
If 100$a repeated, the scripts failed to handle that correctly
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
This option uses the iso2709 version of the MARC record instead of the XML one
(biblioitems.marc vs biblioitems.marcxml)
No change if the parameter is not set.
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
- adding browser and nozebra table definition to kohastructure & updatedatabase
- bumping to 3.00.00.005
Signed-off-by: Chris Cormack <crc@liblime.com>
- use biblio instead of marc_biblio,
- better check that biblionumber is correctly stored
- fix an buggy API call when ModMarcBiblio
Signed-off-by: Chris Cormack <crc@liblime.com>
- updating templates to have tmpl_process3.pl running without any errors
- adding a drupal-like css for prog templates (with 3 small images)
- fixing some bugs in circulation & other scripts
- updating french translation
- fixing some typos in templates
- support for authorities
- some bugfixes in ordering and "CCL" parsing
- support for authorities <=> biblios walking
Seems I can do what I want now, so I consider its done, except for bugfixes that will be needed i m sure !
* adding 3 subs in Biblio.pm
- GetNoZebraIndexes, that get the index structure in a new systempreference (added with this commit)
- _DelBiblioNoZebra, that retrieve all index entries for a biblio and remove in a variable the biblio reference
- _AddBiblioNoZebra, that add index entries for a biblio.
Note that the 2 _Add and _Del subs work only in a hash variable, to speed up things in case of a modif (ie : delete+add). The effective SQL update is done in the ModZebra sub (that existed before, and dealed with zebra index).
I think the code has to be more deeply tested, but it works at least partially.
- changing nozebra table to have biblionumber,title-ranking; (; is the entry separator. Now, if a value is several times in an index, it is stored only once, with a higher ranking (the ranking is the number of times the word appeard for this index)
- improving search to have ranking value (default order). The ranking is the sum of ranking of all terms. The list is ordered by ranking+title, from most to lower
- add nozebra table management on biblio editing
- the index table content is hardcoded. I still have to add some specific systempref to let the library update it
- manage pagination (next/previous)
- manage facets
WHAT works :
- NZgetRecords : has exactly the same API & returns as zebra getQuery, except that some parameters are unused
- search & sort works quite good
- CQL parser is better that what I thought I could do : title="harry and sally" and publicationyear>2000 not itemtype=LIVR should work fine
All subs have be cleaned :
- removed useless
- merged some
- reordering Biblio.pm completly
- using only naming conventions
Seems to have broken nothing, but it still has to be heavily tested.
Note that Biblio.pm is now much more efficient than previously & probably more reliable as well.
== Biblio.pm cleaning (useless) ==
* some sub declaration dropped
* removed modbiblio sub
* removed moditem sub
* removed newitems. It was used only in finishrecieve. Replaced by a Koha2Marc+AddItem, that is better.
* removed MARCkoha2marcItem
* removed MARCdelsubfield declaration
* removed MARCkoha2marcBiblio
== Biblio.pm cleaning (naming conventions) ==
* MARCgettagslib renamed to GetMarcStructure
* MARCgetitems renamed to GetMarcItem
* MARCfind_frameworkcode renamed to GetFrameworkCode
* MARCmarc2koha renamed to TransformMarcToKoha
* MARChtml2marc renamed to TransformHtmlToMarc
* MARChtml2xml renamed to TranformeHtmlToXml
* zebraop renamed to ModZebra
== MARC=OFF ==
* removing MARC=OFF related scripts (in cataloguing directory)
* removed checkitems (function related to MARC=off feature, that is completly broken in head. If someone want to reintroduce it, hard work coming...)
* removed getitemsbybiblioitem (used only by MARC=OFF scripts, that is removed as well)
It now responds to:
-n : the number of records to import.
-commit : the number of records to wait before performing a 'commit' operation
ALSO: IMPORTANT: I took out the char_encoding as this should be handled by
MARC::File::XML now, unless I'm mistaken.
recordId: (bib1,Identifier-standard) just after the comma. Adam agreed it was a bug, and it should be solved soon. But now we are aware, we can avoid putting the space !
In this commit you have all what is needed to setup a working zebra DB in Unimarc :
* collection.abs is UNIMARC specific and must be rewritten for MARC21, in marc21 directory
* pdf.properties is to be copied unmodified in the marc21 directory (can also be put somewhere else)
* rebuild_zebra.pl is SLOW, but 1 step reindexing tool, using ZOOM
* rebuild_zebra_idx is FAST, but 2 step reindexing tool, and does not use zebra. run it, it will create all biblios XML files in /zebra/biblios directory, then zebraidx update biblios in your zebra directory
* zebra.cfg is the zebra config file ;-)
* test_cql2rpn.pl is a script that will query the database and show the results. Works for me, just change the query at the beginning to get answers you expect.
What has to be done :
* benchmarking : it seems the zebraidx update is faster than lightning (400biblios/sec : 10 000biblios in 25seconds), while ZOOM indexing is slow (something like 25biblios/second) More benchmarking could be done.
* completing collection.abs for UNIMARC. I'll take care of it.
* modifying Biblio.pm to use ZOOM instead of the "zebraidx through exec" running actually. I'll take care of it also.
* modify the search API & tools & screens. I'll let the ball to someone else (chris ?) for this. I agree SearchMarc.pm can be dropped and replaced by something else (maybe a new-and-clean Search.pm package)
Seems not to break too many things, but i'm probably wrong here.
at least, new features/bugfixes from 2.2.5 are here (tested on some features on my head local copy)
- removing useless directories (koha-html and koha-plucene)