Adding some new options to bulkmarcimport :
-k idtagsubfield in order to store the id of the file record into another field
-match tagsubfield,index
-a to import authorities
-l logfilename to store logs
Bug Fixing : C4/Charset.pm
Charset was incorrect for UNIMARC Authorities
Signed-off-by: Galen Charlton <gmcharlt@gmail.com>
Numbers in perl with leading zeros are interpreted in octal
Ensure that comparisons are done using string operators
or where appropriate use the MARC::Field method
Signed-off-by: Galen Charlton <gmcharlt@gmail.com>
mergeauthority and ModAuthority were working on two separate directories.
So that no authority would ever be merged via cronjob or commandline script
when MergeAuthoritiesOnUpdate is disable
Signed-off-by: Galen Charlton <gmcharlt@gmail.com>
The standard license statement in the header is fine; please
don't confuse things by doing anything different.
Signed-off-by: Galen Charlton <gmcharlt@gmail.com>
With this patch, rebuild_zebra can re-index a whole Koha DB
quickly:
rebuild_zebra -r -b -nosanitize
Biblio (authority) records are dump directly in a file
from marcxml field without beeing transformed into
MARC::Record object and corrected.
DOCUMENTATION:
rebuild_zebra.pl new paramater:
-nosanitize export biblio/authority records directly from DB marcxml
field without sanitizing records. It speed up
dump process but could fail if DB contains badly
encoded records. Works now only with -x and -b
Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
This patch adds the MARC21 subdivsion record tags (18x) to the
block which recognizes and assigns authtypecodes to imported
authority records.
Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
Fixes a hang of the staging import tool when it
attempts to process a MARC21 record that claims
that it's UTF-8 when it is not. The staging import
will now attempt to fix the character encoding of such
records.
Also added a FIXME to bulkmarcimport.pl, which because
of its use of MARC::Batch will skip over such records -
better than the original hang of the staging import, but
worse than the staging import's new ability to fix such
records.
Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
Adds three new switches:
-idmap <filename> - optional output file of
map of source record ID numbers
to Koha biblionumber
-x - if idmap is supplied, MARC tag
to get source record ID from
-y - if idmap is supplied, MARC subfield
to get source record ID from
Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
This includes part of a patch from Henri-Damien Laurent
that could not be applied because Chris and Joe patches
happened to win the race.
Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
Add "use warnings", remove unused variables and unnecessary finish/disconnect
at the end. This script could be improved to run only on tables that need to
be altered instead of touching all of them. It should also probably contain
warnings to the effect that it does not rescue your DATA that was forced into
whatever encoding the table used previously.
Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
Add the phrase 'if ( $verbose_logging )' to the two print statements
concerning the skipping of biblio or authority records.
I recently had to split biblio and authority index updating in my cron
script ( had some really big records so had to add the -x switch which
should only be used on biblios accourding to the help ). So I noticed
that rebuild_zebra.pl printed messages that it was skipping biblios or
authorities.
This patch is to conditionalize those prints based on the verbose
logging switch.
Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
This reduces the output of the script and zebraidx, and creates a -v
command line switch which will increase the logging to their former
states.
Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
merge works on the fly now.
But for an obscure reason, merge_authority.pl fails to update database when lanched on command line.
Adding one table to LOCK for noZebra UPDATE in Biblio.pm
You should remove C4::Search from merg_authority.pl
Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
Prior to this patch, rebuild_zebra.pl -z was effectively
hanging on to a lock on the zebraqueue table, preventing
other scripts from inserting new entries into the table.
This had the effect of causing circulation operations
to time out.
Refactored by having rebuld_zebra.pl pull the active
queue into memory, then mark entries done by zebraqueue.id.
Consequently, rebuild_zebra.pl should no longer
block adding new entries into zebraqueue.
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
note 995 for items is hardcoded, so it's really for UNIMARC only. The script exit if you're not UNIMARCflavour
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
rebuild_zebra.pl will now mark all zebraqueue entries
of the affected record type(s) done when run in
normal mode to index all records (as opposed to running
it with -z to just process the zebraqueue). This prevents
any running zebraqueue_daemon processes from attempting
to reindex the same records, redundantly.
The new -y swtich overrides this new behavior; in other words, if
running rebuild_zebra.pl without -z, you can specify
-y to *not* mark zebraqueue done.
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
* Add a new parameter -o to begin importing input file after skiping
n records.
* Enclose input file reading in an eval directive to avoid abording
import if few records are corrupted: they are now skipped.
* Help formating.
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
Accidentally introducing a circular reference in a
MARC::Record object does not lead to goodness, particularly
if you export lots and lots of them.
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
The -z option, when used in conjunction with -a and/or -b,
selects the records to reindex from the zebraqueue table.
Both record updates and record deletes are handled.
-z is cannot be used with -s or -r: the updated records
must always be freshly exported, and if zebraqueue
is to be processed, it's assumed that you don't want
to drop the Zebra index first.
This means that rebuild_zebra.pl -b -a -x can be
used as a cronjob to update the indexes periodically; it
is believed that this will offer much better indexing
performance on some setups as compared to zebraqueue_daemon.pl,
which uses Z39.50 extended services to send record updates
to Zebra.
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
At moment using both -a (index authorities) and
-x (export records as MARC XML) is not allowed -
if the Zebra authority database is using the DOM
filter, zebraidx will not be able to process the
exported records correctly.
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
1. Logic to fix up record IDs, UNIMARC 100 field,
and record leader now in separate functions.
2. Removed (incorrect) logic to save corrected record
in database.
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
The version of MARC::Batch->new() distributed with version
2.0.0 of MARC::Record, if given a file name, will
open it using the ':utf8' layer. This results in an
incorrect character conversion when processing records
in the MARC-8 character encoding.
To avoid this, batch jobs that use MARC::Batch now
open the file themselves, then pass the file handle
to MARC::Batch->new().
Signed-off-by: Joshua Ferraro <jmf@liblime.com>