The patch for bug 9523 added a JOIN to the biblio table when identifying
the best match so that if a matched record had been deleted it would
not hold up the import process. Unfortunately, this broke all authority
matching, since of course authorities don't appear in the biblio table.
This patch adds a join to auth_header as well, and decides which to
check based on the record type.
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Comment on third patch of this series.
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Test Plan:
1) Stage a MARC record file that will have matches with existing records
2) Delete the bib from Koha that was matched on
3) Attempt to import the records into Koha, the import will hang
4) Apply the patch
5) Reload manage-marc-import.pl and attempt to import again, this time it should succeed.
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
Signed-off-by: Jonathan Druart <jonathan.druart@biblibre.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Test Plan:
1) Catalog a new record with an ISBN
2) Add some items to the record
3) Download the record as MARCXML
4) Delete the itemnumbers from the 952 fields in the record,
Change the barcode fields to unused barcodes
5) Use xml2marc to save as a standard MARC file
6) Import the record using the 'Stage MARC for import' tool
Use the settings:
Record matching rule: ISBN
Action if matching record found: ignore
Action if no match found: ignore
Item processing: always_add
Check for embedded item record data?: Yes
How to process items: Always add items
7) Import, note the bib is ignored, and the items are not processed
8) Undo import into catalog
8) Apply this patch
9) Import this batch into the catalog
10) Note the items were processed and are now added to the matching
record
Signed-off-by: Mathieu Saby <mathieu.saby@univ-rennes2.fr>
Tested with UNIMARC record. I followed the test plan, just changing 952 by 995
Signed-off-by: Mason James <mtj@kohaaloha.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Only changing some documentation about GetXmlBiblio
Signed-off-by: Mirko Tietgen <mirko@abunchofthings.net>
Added the word 'contain'
Signed-off-by: Jonathan Druart <jonathan.druart@biblibre.com>
Some Z39.50 server may use the MARC-8 encoding, which uses separated
diacritics. By forcing a normalization, all imported records will have
combined diacritics.
Records with separated diacritics might not show up in Zebra searches if
the search terms use accented characters.
Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
http://bugs.koha-community.org/show_bug.cgi?id=8610
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
checked it still works after the patch with UNIMARC and BNF server (that
provide utf-8 records)
A subroutine was not being imported by C4::ImportBatch (ironic, no?)
so this patch makes the call fully-qualified. This patch also cleans
up two warnings in C4::Auth that are raised when logged in as the
database user.
Signed-off-by: Nicole C. Engard <nengard@bywatersolutions.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
The staged MARC management script was not correctly informing
the decoder ring that we had UNIMARC authorities, and the decoder
ring was dutifully trying to turn the authority records into a
bibliographic box of cereal.
Expose authority import functionality to the command line import
scripts, and rename them from commit_biblios_file.pl and
stage_biblios_file.pl to commit_file.pl and stage_file.pl.
To test (note that these instructions assume you have a MARC21
installation and are using the provided sample file):
1. Find a file of authorities (a sample file with MARC21 authorities
is attached to bug 7475) and download it to your server
2. Stage the file using the following command (replace <filename> with
the name of the file you saved in step 1):
> misc/stage_file.pl --file <filename> --authorities
3. Note the batch number the script assigns to your batch
4. Commit the records using the following command (replace <batchnumber>
with the batch number you made note of in step 3):
> misc/commit_file.pl --batch-number <batchnumber>
5. Index the authorities Zebraqueue (or wait)
6. Confirm that the new authorities appear.
7. Create a matching rule with the following settings:
Code: AUTHTEST
Description: Personal name main entry
Match threshold: 999
Record type: Authority record
Search index: Heading-main
Score: 1000
Tag: 100
Subfields: a
Offset: 0
Length: 0
(note the ID of this matching rule)
8. Stage the authority file again, this time using the following
command:
> misc/stage_file.pl --file <filename> --authorities \
--match <matchingrule>
7. Revert the import with the following command:
> misc/commit_file.pl --batch-number <batchnumber> --revert
8. Index the authorities Zebraqueue (or wait)
9. Confirm that the records have been removed
10. Import an authority record with the Stage MARC/Manage staged MARC
tools in exactly the way you would for a bibliographic record,
but choose "Authority" instead of "Bibliographic" for the record
type.
Signed-off-by: Elliott Davis <elliott@bywatersolutions.com>
Testing plan delivers as it should.
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Rebased on latest master 11 September 2012
* Add the code necessary to handle authorities with matching rules and
import batches.
* Update all the scripts that use the matcher and import batch code
to use the new API.
* Add authority records to the matching rules interface in the staff
client.
http://bugs.koha-community.org/show_bug.cgi?id=2060
Signed-off-by: Elliott Davis <elliott@bywatersolutions.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Rebased on latest master 11 September 2012
Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
this patch checks the "on loan" and "reserved" status before deleting item, and
do noting in this case, so the record can't be deleted due to existing item.
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
svc/import_bib:
* takes POST request with parameters in url and MARC XML as DATA
* pushes MARC XML to an impoort bach queue of type 'webservice'
* returns status and imported record XML
* is a drop-in replacement for svc/new_bib
misc/cronjobs/import_webservice_batch.pl:
* a cron job for processing impoort bach queues of type 'webservice'
* batches can also be processed through the UI
misc/bin/connexion_import_daemon.pl:
* a daemon that listens for OCLC Connexion requests and is compliant
with OCLC Gateway spec
* takes request with MARC XML
* takes import batch params from a config file and forwards the lot to
svc/import_bib
* returns status
ImportBatches:
* Added new import batch type of 'webservice'
* Changed interface to AddImportBatch() - now it takes a hashref
* Replaced batch_type = 'batch' with
batch_type IN ( 'batch', 'webservice' ) in some SELECTs
Signed-off-by: MJ Ray <mjr@phonecoop.coop>
On some record, the commit_biblio_file is creating wide
character
because as_xml is not used with correct parameter.
This patch fixes that.
To test on a UNIMARC Koha, stage attachment 7510 and
then import.
It hangs before the patch, it passes after.
Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
Rather than having options for MARC21 and UNIMARC in the "Character encoding"
dropdown, the user should be able to select the appropriate character encoding.
The default retains the current behavior, which is to allow the system to guess
which character encoding is in use. However, it should be noticed that this is
almost always wrong for non-UTF8 records with non-ASCII characters. Specifying
a character set is much more reliable if you're not using UTF-8.
Rebased to use Template::Toolkit instead of HTML::Template::Pro.
Signed-off-by: Jared Camins-Esakov <jcamins@bywatersolutions.com>
Signed-off-by: Nicole C. Engard <nengard@bywatersolutions.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
Fixed glitch where the first page of bibs in a batch (or the first
page of import batches) was displaying the entire list instead
of the correct number of records per page.
Signed-off-by: Galen Charlton <gmcharlt@gmail.com>
- basket.pl: updating display, formatting dates,
- neworderempty: updating display, removing useless code, using ACQ framework if it exist. The ACQ framework will be used for creating items record during acquisitions. If it does not exist, default is used instead (which has many more informations, lot of them being irrelevant during acquisition, like the barcode)
- new order from imported batch: rewrite of the workflow. Now uses neworderempty and changing status of import_record to 'imported'
- s/copyrightdate/publicationyear/ as it's what libraries uses when ordering
- fixing some warnings
-
batches, it is now possible to 'clean' a batch by
removing all bib and item records staged in the batch. This
has the effect of helping to reduce database space used
by old import batches as well as removing staged records
from the cataloging reservoir search. Note that 'cleaning'
a batch affects only the copies of the records that were staged;
if the batch was committed, cleaning the batch does not
affect any bibs and items that were committed into the catalog.
Also note that once you clean a committed batch of records, it is
impossible to undo the previous commit operation.
Signed-off-by: Galen Charlton <gmcharlt@gmail.com>
Also fix issues with normalizing ISBNs and the default
normalizer in C4::Matcher.
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
The feature in question is described in bug 2952; to
summarize the enhancement, which the earlier patch
description did not do, the list of bibs in an
import record batch now includes a column linking
each import record to the bib that was actually created
or updated when the import batch was committed.
The improvements in this patch are:
* If bib in import batch has not been committed, it
has not been linked to a matching new or updated bib.
In that case, do not create a link to a guaranteed
404 (/cgi-bin/koha/catalogue/detail.pl?biblionumber=)
* When reverting an import batch, set matched_biblionumber
to NULL for affected records - otherwise, the Bib
column will include links to bibs that may no longer
exist.
* Fixed a minor HTML validation error.
Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
When recommiting a partially completed MARC
record batch, records that were already imported
(or had an error status) were being processed
again, leading to duplicate bibs. Corrected
so that these records are actually ignored.
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
Enhanced the ability of catalogers to specify how
bib and item records should be added, replaced, or
ignored during a staging import.
When an import batch of bib records is staged and commit,
the user can now explicitly specify what should occur
when an incoming bib record has, or does not have, a match
with a record already in the database. The options are:
if match found (overlay_action):
create_new (just add the incoming record)
replace (replace the matched record with the incoming one)
use_template (option not implemented)
ignore (do nothing with the incoming bib; however, the
items attached to it may still be processed
based on the item action)
if no match is found (nomatch_action):
create_new (just add the incoming record)
ignore (do nothing with the incoming bib; in this
case, any items attached to it will be
ignored since there will be nothing to
attach them to)
The following options for handling items embedded in the
bib record are now available:
always_add (add the items to the new or replaced bib)
add_only_if_match (add the items only if the incoming bib
matches an existing bib)
add_only_if_add (add the items only if the incoming bib
does *not* match an existing bib)
ignore (ignore the items entirely)
With these changes, it is now possible to support the following use cases:
[1] A library joining an existing Koha database wishes to add their
items to existing bib records if they match, but does not want
to overlay the bib records themselves.
[2] A library wants to load a file of records, but only handle
the new ones, not ones that are already in the database.
[3] A library wants to load a file of records, but only
handle the ones that match existing records (e.g., if
the records are coming back from an authority control vendor).
Documentation changes:
* See description above; also, screenshots of the 'stage MARC records
for import' and 'manage staged MARC records' should be updated.
Test cases:
* Added test cases to exercise staging and committing import batches.
UI changes:
* The pages for staging and managing import batches now have
controls for setting the overlay action, action if no match,
and item action separately.
* in the manage import batch tool, user is notified when they
change overlay action, no-match action, and item action
* HTML for manage import batch tool now uses fieldsets
Database changes (DB rev 076):
* added import_batches.item_action
* added import_batches.nomatch_action
* added 'ignore' as a valid value for import_batches.overlay_action
* added 'ignored' as a valid value for import_records.status
* added 'status' as a valid value for import_items.status
API changes:
* new accessor routines for C4::ImportBatch
GetImportBatchNoMatchAction
SetImportBatchNoMatchAction
GetImportBatchItemAction
SetImportBatchItemAction
* new internal functions for C4::ImportBatch to
determine how a given bib and item are to be
processed, based on overlay_action, nomatch_action,
and item_action:
_get_commit_action
_get_revert_action
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
Added invocations of StripNonXmlChars to uses
of new_from_xml() that involve records
saved to Koha fields via MARC::Record->as_xml();
for batch jobs that work on MARC XML files
coming from external sources, StripNonXmlChars
should not necessarily be used, as it may
be better to reject a file or record if it
contains that kind of encoding error.
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
* IsStringUTF8ish - determine if scalar contains a string in UTF8
* MarcToUTF8Record - convert MARC blob or MARC::Record to UTF8
* SetMarcUnicodeFlag - set appropriate MARC21 or UNIMARC field to
indicate that record is in UTF-8.
Design points of this module include:
* No dependencies on other C4 modules, making it easier to add
more test cases
* All character conversion code in one place
* Single entry point for doing a character conversion on a
MARC record
* Capture of errors and warnings produced by Text::Iconv
and MARC::Charset
* Start of support for guessing the source character set of
a MARC record.
Several functions were moved from other scripts
or modules to C4::Charset:
* C4::Koha->FixEncoding (expanded and renamed
MarcToUTF8Record)
* C4::Koha->char_decode5426
* fMARC8ToUTF8 from bulkmarcimport.pl (renamed
_marc_marc8_to_utf8)
Several batch jobs were adjusted to use MarcToUTF8Record instead of
FixEncoding.
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
The problem occured during the population of
import_items.marcxml -- the MARC::Record object
created to store the item did not have the Leader/09
set to 'a', which means that MARC::File::XML
tried to transcode code the item from MARC-8 to UTF-8, which
breaks since the MARC data is already in UTF-8 at that point.
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
Batch import was not removing item fields (e.g., 952
or 995) from MARC records.
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
Introduced C4::Items module to separate items API
from biblio API. Details on changes will be
put in later commit messages.
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
If a MARC batch was imported, then reverted, the 952s
from the saved copied of the bib were added back
when the bib was replaced, leading to duplication
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
Enhancement to store the matching rule associated with an
import batch and to allow the current matching rule in
effect to be changed and the duplicate detection redone.
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
As part of this, modified two routines in C4::ImportBatch
to support a callback for monitor progress of import
processing.
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
Revamps the import options on the tools menu to have two parts:
[1] Staging (load file into reservoir)
[2] Managing (review the list of staged batches, then
choose to commit or undo a given batch.
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>