This patch fixes an issue whereby biblios with many items (often > 500) would index,
but not the biblionumber itself, resulting in search results with a) inaccurate item counts
and b) no biblionumber to use in the link to the details page. This is due to Net::Z3950::ZOOM not providing
a mechanism for specifying different connection attributes; the maximumRecordSize ZOOM connection attribute,
if not specified, defaults to 1MB, which is less than the size of a MARC record with many, many 952 fields. Since
it is unlikely we can fix Net::Z3950::ZOOM in a timely fashion, this patch aims to build a workaround on the Koha end.
This patch changes EmbedItemsInMarcBiblio to use append_fields instead of insert_ordered_fields,
so the 999$c will come before the item records. It's VERY unlikely we will encounter more than 1MB of biblio-level MARC
content, as this would break the ISO-2709 standard by a large factor.
To this end, it also moves the fix_biblio_ids portion of get_corrected_marc_record out of rebuild_zebra.pl,
and makes it a part of GetMarcBiblio (right before EmbedItemsInMarcBiblio, so the 952s still come last). fix_biblio_ids
is kept as a subroutine for the deletion portion of rebuild_zebra.pl, which still uses it.
It also uses the subroutine parameter in GetMarcBiblio to do the EmbedItemsInMarcBiblio action, rather than having
rebuild_zebra.pl perform it on the itemless record returned from GetMarcBiblio. Simpler and cleaner that way.
To verify bug issue:
1. Find a biblio with over 700 items (or enough that the resulting MARCXML is greater than 1MB)
2. search for this biblio (in a search that would return multiple results, not just this title). You should get the title in
the results list
3. attempt to click the link to this biblio's details page; the biblionumber should be blank, leading to a 404
To test solution:
1. Apply patch
2. modify the biblio slightly (click the 005 for example) and save
OR manually add the biblio to zebraqueue for reindexing
3. after rebuild_zebra.pl -z -b -x runs, use the same search as above. The title should still appear.
4. click the link, and find yourself on the biblio detail page as desired
Signed-off-by: D Ruth Bavousett <ruth@bywatersolutions.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
Display links to parent biblios, show linked items in holdings, allow holds on
linked items. This uses MARC to maintain relationships.
Sponsored by the Mississippi Department of Archives and History and RapidRadio
Solution. Originally developed by Savitra Sirohi and Amit Gupta at OSSLabs, with
UNIMARC support added by Zeno Tajoli. Commits squashed and merge conflicts
resolved by Chris Cormack from Catalyst. Respect for NORMARC and some small
framework portability fixes made by Jared Camins-Esakov of C & P Bibliography
Services.
IMPORTANT NOTE: A bug in the 773 coding for MARC21 was corrected from the
original OSS Labs code. The 773s generated by the pre-release code did not have
the first indicator set to '0', which means that they were not supposed to
display. Going forward, the first indicator will be set correctly, but existing
records created with this code will no longer appear (they appeared before only
due to another bug). To correct this, you could globally (or, to make sure you
only modify records created with the Analytics tool, for records with 773$0)
change the first indicator of the 773 from blank to '0'.
== Background ==
An analytic record for an item is a more detailed, monographic biblio for an
item attached to a serial record . This is often used for special issues of a
journal that are released as books on their own (assigned an ISBN, as well as an
ISSN/volume/issue). It is important for researchers to be able to search for
these items both as issues of the serial, and as monographs. It is equally
important for the library to not have duplicate item records for the item in
question to have to keep synchronized.
== Establishing relationships ==
Analytical records are connected to items belonging to parent or host
bibliographic records. This can be accomplished by:
* From an analytical bibliographic record linking to an host item by providing
the item barcode as input
* From a host item by using option "analyze", this creates a new empty
bibliographic record with field 773 (MARC21) populated
* Running a new CLI script that establishes a relationship between the
analytical record and the host item identified by the barcode in the
analytical record's 773$o (MARC21)
== Connecting Records ==
The relationships are maintained in the MARC records, we have not used database
tables at all.
== MARC Representation ==
In MARC21/NORMARC we have used:
* 773$9 to store the Koha item number of the host item
* 773$0 to store the Koha biblio number of the host bibliographic record
The above fields are used to display the relationships in various screens in the
OPAC and the staff interface. Additionally, when populating field 773 with host
item's details, we have used following MARC 21 mapping:
* 'a' <= 100/110/111 $a (author main)
* 'b' <= 250$a (edition)
* 'd' <= 260$a, 260$b, 260$c (place, publisher, year)
* 'o' <= barcode
* 't' <= 245$a (title)
* 'w' <= (003)001 --> if no 001 is available, we can populate biblionumber
* 'x' <= 022$a (issn)
* 'z' <= 020$a (isbn)
In UNIMARC, this code uses:
* 461$9 to store the Koha item number of the host item
* 461$0 to store the Koha biblio number of the host bibliographic record
When populating field 461 in UNIMARC, the following mapping is used:
* 't' <= 200$a (title)
== Treatment of Holds ==
A key requirement was to allow holds to be placed on host items from the
analytical record. We have accomplished this by allowing holds on specific
copies only. Biblio level holds are not allowed. This ensures that holds are
placed on specific items that are relevant to the analytical record.
== Deleting host items with linked analytical records ==
As we have not used database tables to maintain relationships, we had to use
search to find out if any linked analytical records are present. If 1 or more
analytical are present, we do not allow deletion of items. This is similar to
what we see when we try to delete authority records.
== Importing analytical records ==
Analytical records can be imported using bulkmarcimport or the GUI tools. The
new CLI script can be executed after the import to establish relationships with
host items. The script will establish relationships using the host item's
barcode, the barcode must be present in 773$o of the analytical record.
== What if there are two or more copies of the host item? ==
The current design will require that there be two host (773) fields, one for
each copy.
== What if there is no barcode available for the host item? ==
It is still possible to establish a relationship, by populating 773$9 with the
host's item number. However the CLI script uses barcode in 773$o to establish
relationships so it won't work where barcodes are unavailable. Also from an
analytical record, it is possible to establish a relationship to a host item by
providing the barcode as input, this option will not be available as well.
Commits that added the following features were squashed by Chris Cormack (this
is not a list of every commit):
* Display links to host records from biblio detail screens
* Support for UNIMARC, respecting the system preference 'marcflavor'
* Support holds from the OPAC
* Ability to link to items belong to host records from a analytical record
* Display items belonging to host records in the moredetail page
* Ability to edit items belonging to host records, also ability to delink from
them
* Move get host items code into a C4 routine, also calling the new routine in
related perl scripts
* Move host field population to a C4 routine, all changes in pl files to call
new routine
* Allow only specific copy holds for analytical records plus changes to use new
C4 routines
* Support for holds on items linked via host records
* Storing bibnumber and itemnumber in subfields 0 and 9, plus other mapping
changes
* New command line script that establishes relationships between analytical
records and host items and bibs. The script looks for host field (MARC21 773)
in records, and based on barcode in subfield 'o' populates host bibnumber in
subfield '0' and host itemnumber in subfield '9'. The script can be run after
an import of analytical records, it can also be run in the crontab to maintain
the relationships
* Ability to create analytical records from items, to view linked analytics, and
prevent deletion of items that have linked analytics
* New template for catalogue/detail.pl (NOTE: not a new template file, just a
new way of displaying analytics), template displays linked analytics and
allows creation of analytical records
* New zebra index for item number in host fields. This index will be used to
display links to analytical records from host records
* Display title of host record instead of the phrase host record
* Using detail.tmpl for analytics tab instead of a new template file
* Improved qualification info prepration in Prephostmarcfield
* Check for linked analytics before deleting item
* Display link to host record and more meaningful anchor text for edit item link
* Analytical record: Unimarc index in record.abs and help in
create_analytical_rel.pl
* Adding a sys pref that controls display of options to create analytical
relationships
* Add host entry in XSLT stylesheet in staff item detail
* Added host record support to OPAC detail XSLT
* Adding 773$0 and 773$9 to all frameworks
* Adding 773 subfields 0 and 9 to default marc framework via updatedatabase.pl
* Display create analytics and used in links in catalog detail
* Fixed problem where analytical records not showing in OPAC search results
because GetMarcBiblio now needs a flag to add item records
* Fixed problem where analytics count was set to 1 for all records, not just
those with analytics
* Fixed catalogue detail page not to show analytics counts if count is 0
Conflicts:
installer/data/mysql/updatedatabase.pl
koha-tmpl/intranet-tmpl/prog/en/modules/cataloguing/addbiblio.tt
kohaversion.pl
Co-author: Savitra Sirohi <savitra.sirohi@osslabs.biz>
Co-author: Zeno Tajoli <tajoli@cilea.it>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Signed-off-by: Ian Walls <ian.walls@bywatersolutions.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
This both adds a bit of a failsafe to get_raw_biblio, and prevents
records that have been deleted from being updated by the same instance
of rebuild_zebra.
Minor amendment to remove duplication of 6433
Signed-off-by: MJ Ray <mjr@phonecoop.coop>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
Fixes bug where a bib record imported by bulkmarcimport.pl
could become unindexable by ensuring that ModBiblioMarc()
is always called by bulkmarcimport.pl to finalize saving the
bib record (as it was initially created by AddBiblio with the
defer_marc_save option).
Also introduces a utility routine, C4::Biblio::_strip_item_fields.
Signed-off-by: Galen Charlton <gmcharlt@gmail.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
Adds a new routine, C4::Biblio::EmbedItemsInMarcBiblio, to
embed the items in the bib record when necessary:
* cataloging/additem.pl
* rebuild_zebra.pl
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
Signed-off-by: Claire Hernandez <claire.hernandez@biblibre.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
This is a squash of four patches by Henri-Damien Laurent
starting work on removing the copy of item record information
in the 9XX field of bibliographic records. The reason
for doing this is primarily to improve performance, in particular,
the expense of having to add/modify the bib record whenever an
item changes. Now, whenever an item changes, the bib record is
put in the queue to be reindexed; when the bib is indexed, the 9XX
fields are inserted into the version of the bib that Zebra indexes.
Since rebuild_zebra.pl runs in a separate process, the processing of the
bib record will not delay (e.g.) circulation.
As part of upgrading to 3.4, the following batch script should be run:
misc/maintenance/remove_items_from_biblioitems.pl --run
This should be followed by a complete reindexing of the bib records, e.g.,
misc/migration_tools/rebuild_zebra.pl -b -r
Signed-off-by: Galen Charlton <gmcharlt@gmail.com>
Signed-off-by: Claire Hernandez <claire.hernandez@biblibre.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
Import script shouldn't remove an information present in entering biblio
records. With this patch, by default, ISBN are not cleared anymore.
[2011.04.12] Rebased on HEAD
DOCUMENTATION: There is a new paramater --isbn|--noisbn
Signed-off-by: Colin Campbell <colin.campbell@ptfs-europe.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
Remove some unnecessary checks when check of error is
sufficient. Make the order in some cases more logical
Should remove some possibilities of runtime warning noise.
Although some calls belong to the 'Nothing could
ever go wrong' school have added some warnings
Signed-off-by: Christophe Croullebois <christophe.croullebois@biblibre.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
Reimplements support for -r, as well for -reset
Signed-off-by: D Ruth Bavousett <ruth@bywatersolutions.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
If the zebra server directories don't exist, zebra will spit the dummy.
This makes rebuild_zebra.pl smart enough to create them if they're not
there. If that fails, it'll scream loudly so you know zebra isn't
reindexing.
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
This prevents it leaving files lying around in /tmp
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
Signed-off-by: Galen Charlton <gmcharlt@gmail.com>
Adding some new options to bulkmarcimport :
-k idtagsubfield in order to store the id of the file record into another field
-match tagsubfield,index
-a to import authorities
-l logfilename to store logs
Bug Fixing : C4/Charset.pm
Charset was incorrect for UNIMARC Authorities
Signed-off-by: Galen Charlton <gmcharlt@gmail.com>
Numbers in perl with leading zeros are interpreted in octal
Ensure that comparisons are done using string operators
or where appropriate use the MARC::Field method
Signed-off-by: Galen Charlton <gmcharlt@gmail.com>
mergeauthority and ModAuthority were working on two separate directories.
So that no authority would ever be merged via cronjob or commandline script
when MergeAuthoritiesOnUpdate is disable
Signed-off-by: Galen Charlton <gmcharlt@gmail.com>
The standard license statement in the header is fine; please
don't confuse things by doing anything different.
Signed-off-by: Galen Charlton <gmcharlt@gmail.com>
With this patch, rebuild_zebra can re-index a whole Koha DB
quickly:
rebuild_zebra -r -b -nosanitize
Biblio (authority) records are dump directly in a file
from marcxml field without beeing transformed into
MARC::Record object and corrected.
DOCUMENTATION:
rebuild_zebra.pl new paramater:
-nosanitize export biblio/authority records directly from DB marcxml
field without sanitizing records. It speed up
dump process but could fail if DB contains badly
encoded records. Works now only with -x and -b
Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
This patch adds the MARC21 subdivsion record tags (18x) to the
block which recognizes and assigns authtypecodes to imported
authority records.
Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
Fixes a hang of the staging import tool when it
attempts to process a MARC21 record that claims
that it's UTF-8 when it is not. The staging import
will now attempt to fix the character encoding of such
records.
Also added a FIXME to bulkmarcimport.pl, which because
of its use of MARC::Batch will skip over such records -
better than the original hang of the staging import, but
worse than the staging import's new ability to fix such
records.
Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
Adds three new switches:
-idmap <filename> - optional output file of
map of source record ID numbers
to Koha biblionumber
-x - if idmap is supplied, MARC tag
to get source record ID from
-y - if idmap is supplied, MARC subfield
to get source record ID from
Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
This includes part of a patch from Henri-Damien Laurent
that could not be applied because Chris and Joe patches
happened to win the race.
Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
Add "use warnings", remove unused variables and unnecessary finish/disconnect
at the end. This script could be improved to run only on tables that need to
be altered instead of touching all of them. It should also probably contain
warnings to the effect that it does not rescue your DATA that was forced into
whatever encoding the table used previously.
Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
Add the phrase 'if ( $verbose_logging )' to the two print statements
concerning the skipping of biblio or authority records.
I recently had to split biblio and authority index updating in my cron
script ( had some really big records so had to add the -x switch which
should only be used on biblios accourding to the help ). So I noticed
that rebuild_zebra.pl printed messages that it was skipping biblios or
authorities.
This patch is to conditionalize those prints based on the verbose
logging switch.
Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
This reduces the output of the script and zebraidx, and creates a -v
command line switch which will increase the logging to their former
states.
Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
merge works on the fly now.
But for an obscure reason, merge_authority.pl fails to update database when lanched on command line.
Adding one table to LOCK for noZebra UPDATE in Biblio.pm
You should remove C4::Search from merg_authority.pl
Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
Prior to this patch, rebuild_zebra.pl -z was effectively
hanging on to a lock on the zebraqueue table, preventing
other scripts from inserting new entries into the table.
This had the effect of causing circulation operations
to time out.
Refactored by having rebuld_zebra.pl pull the active
queue into memory, then mark entries done by zebraqueue.id.
Consequently, rebuild_zebra.pl should no longer
block adding new entries into zebraqueue.
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
note 995 for items is hardcoded, so it's really for UNIMARC only. The script exit if you're not UNIMARCflavour
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
rebuild_zebra.pl will now mark all zebraqueue entries
of the affected record type(s) done when run in
normal mode to index all records (as opposed to running
it with -z to just process the zebraqueue). This prevents
any running zebraqueue_daemon processes from attempting
to reindex the same records, redundantly.
The new -y swtich overrides this new behavior; in other words, if
running rebuild_zebra.pl without -z, you can specify
-y to *not* mark zebraqueue done.
Signed-off-by: Joshua Ferraro <jmf@liblime.com>