Most of them were found and fixed using codespell.
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
Signed-off-by: Bernardo Gonzalez Kriegel <bgkriegel@gmail.com>
Signed-off-by: Jonathan Druart <jonathan.druart@koha-community.org>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
Add logging of errors.
Signed-off-by: Magnus Enger <magnus@enger.priv.no>
More errors are indeed showing up in the log.
(I took the liberty of changing the commit message a little bit.)
Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@gmail.com>
A minor QA comment.
::: misc/migration_tools/bulkmarcimport.pl
@@ +271,5 @@
> my ( $error, $results, $totalhits ) = C4::Search::SimpleSearch( $query, 0, 3, [$server] );
> + # changed to warn so able to continue with one broken record
> + if ( defined $error ) {
> + warn "unable to search the database for duplicates : $error";
> + next;
For consistency with the rest of the script, should this perhaps be:
next RECORD;
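Here is a minimal sketch of the pattern the reviewer suggests, written in Perl like the script itself. `fake_search` is a hypothetical stand-in for `C4::Search::SimpleSearch`. The `RECORD` label mirrors the one the reviewer refers to, and it states explicitly which loop `next` continues.

```perl
use strict;
use warnings;

# Hypothetical stand-in for C4::Search::SimpleSearch: queries containing
# "broken" fail with an error string, all others succeed with no hits.
sub fake_search {
    my ($query) = @_;
    return $query =~ /broken/ ? ('search failed') : ( undef, [], 0 );
}

# Warn about, and skip, any record whose duplicate search fails, instead of
# dying and losing the rest of the batch.
sub process_records {
    my @queries = @_;
    my @imported;
    RECORD: foreach my $query (@queries) {
        my ( $error, $results, $totalhits ) = fake_search($query);
        if ( defined $error ) {
            warn "unable to search the database for duplicates : $error\n";
            next RECORD;    # labelled, like the other skips in the script
        }
        push @imported, $query;
    }
    return @imported;
}
```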
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@gmail.com>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@gmail.com>
GetFrameworkCode was incorrectly spelt as GetFrameworkcode on line 401.
Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Signed-off-by: Jonathan Druart <jonathan.druart@biblibre.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@gmail.com>
This patch turns off the AuthoritiesLogging syspref when running the
bulkmarcimport.pl script.
It also temporarily disables syspref caching, which would otherwise
make the CataloguingLogging handling ineffectual. (That is,
updating the CataloguingLogging syspref in the script wouldn't
have an effect, as the original cached value would be used anyway.)
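The save-and-restore idea can be sketched as follows. This is an illustration only. `%syspref` is a hypothetical in-memory stand-in for Koha's system preferences, and the syspref caching mentioned above is not modelled. The sketch also converts ^C into an exception so the original values are restored even on interrupt, which addresses the concern raised in the sign-off below.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for Koha's system preferences.
my %syspref = ( CataloguingLogging => 1, AuthoritiesLogging => 1 );

# Set the named preferences to 0, run $code, then restore the saved values,
# whether $code returns normally, dies, or is interrupted with ^C.
sub with_prefs_disabled {
    my ( $names, $code ) = @_;
    my %saved = map { $_ => $syspref{$_} } @$names;

    # Turn SIGINT into an ordinary exception so the restore below still runs.
    local $SIG{INT} = sub { die "interrupted\n" };

    $syspref{$_} = 0 for @$names;
    my $ok  = eval { $code->(); 1 };
    my $err = $@;

    # keys and values of an unmodified hash come back in the same order.
    @syspref{ keys %saved } = values %saved;
    die $err unless $ok;
    return;
}
```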
_TEST PLAN_
0) Turn on "AuthoritiesLogging" syspref
1) Load an authority record using bulkmarcimport.pl
2) Note a new Authorities entry in action_logs
3) Apply the patch
4) Repeat Step 1
5) Note that no new entry is made in action_logs
(Bonus points: Do the same thing with CataloguingLogging and a
bibliographic record.)
Signed-off-by: Bernardo Gonzalez Kriegel <bgkriegel@gmail.com>
Tested with biblio and auth imports.
Works as described, no koha-qa errors.
Note: If you begin to load a big file, get impatient and hit ^C,
it seems that the current syspref value is lost.
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Passes tests and QA script.
Patch copies what was already done for the CataloguingLog, no problems found.
Signed-off-by: Tomas Cohen Arazi <tomascohen@gmail.com>
The initial patch for this bug did not include a specific command line
option for customization. If a module LocalChanges.pm existed, it would
be used without asking.
This patch adds a command line option enabling the customization option
and offering the extra possibility of using another module name. If no file
name is passed, we default to LocalChanges.
Without the -custom option, behavior is unchanged.
Some POD lines are also added to document the feature.
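Below is a sketch of how the `-custom` value could be turned into a module file. Getopt::Long's `:s` type yields an empty string when the option is given without a value, and that empty string falls back to LocalChanges. The path-versus-name rule and the `resolve_custom_module` helper are hypothetical, and the sketch does not claim to be the script's actual logic. Loading the module (`require $file`) and calling its `customize` routine are left out.

```perl
use strict;
use warnings;
use File::Spec;
use Getopt::Long qw(GetOptionsFromArray);

# Parse -custom from an argument list: undef if absent, '' if given bare.
sub parse_custom {
    my @args = @_;
    my $custom;
    GetOptionsFromArray( \@args, 'custom:s' => \$custom ) or die "bad options\n";
    return $custom;
}

# Map the option value to a module file (hypothetical rule): no option means
# no customization; an empty value means LocalChanges; a value with a slash
# or a .pm suffix is a path; anything else is a module next to the script.
sub resolve_custom_module {
    my ( $value, $script_dir ) = @_;
    return undef unless defined $value;
    $value = 'LocalChanges' if $value eq '';
    return $value if $value =~ m{/} || $value =~ /\.pm\z/;
    return File::Spec->catfile( $script_dir, "$value.pm" );
}
```

Getopt::Long accepts unambiguous abbreviations by default, so the `-cust` spelling used in the test plan below is also recognised.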
Test plan:
[1] Make a LocalChanges.pm in migration_tools. Verify that it is not used
if you do not enable the -cust parameter.
[2] Run the script again with -cust. Verify that it is called now.
[3] Copy LocalChanges.pm to Whatever.pm. Make some change. Run with
-cust Whatever and verify that the new module is used.
[4] Copy Whatever.pm to another dir, make some change. Run with -cust and the
full name. Verify that the latest change was used.
[5] Run without any option. Check the pod documentation.
Signed-off-by: Jonathan Druart <jonathan.druart@biblibre.com>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
This patch makes two adjustments:
[1] For the verbose option, verbose level 2 now means print the
formatted version of each record.
[2] If a module LocalChanges.pm is found in misc/migration_tools, the
routine "customize" in this module is called for each marc record.
This allows you to make local changes to these marc records before
importing them.
Test plan:
[1] Test the verbose option: a single -v for medium verbosity and two
-v to dump a human-readable version of the record to standard output.
(Do not copy LocalChanges.pm into the folder yet.)
You may use the attached example file on Bugzilla:
perl misc/migration_tools/bulkmarcimport.pl -file zztest01.xml -v -v -b -m XML -t | more
Note the option t for test; no records will be imported.
[2] Copy LocalChanges.pm into the migration_tools folder. You may use the
example provided on Bugzilla (in a patch). If you use the example module,
check the contents of 001, 005 and 590 fields. (The -v -v option allows
you to easily check that.)
Signed-off-by: Jonathan Druart <jonathan.druart@biblibre.com>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
Signed-off-by: Magnus Enger <digitalutvikling@gmail.com>
Keeps current behaviour as default.
The -append option is described in the POD and works as expected.
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Works as described.
Adding a date/time to the output might
be good, to make it easier to find the entry you were looking for.
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
This patch makes Koha <-> Zebra use MARCXML for the serialization when
using DOM, and USMARC for GRS-1.
* The following functions are modified to set the Zebra record syntax
according to the current sysprefs and configuration:
- C4::Context->Zconn
- C4::Context->_new_Zconn
* A new function 'new_record_from_zebra' is introduced, which checks the
context we are in, and creates the MARC::Record object using the right
constructor.
The following packages get touched to make use of the new function:
- C4::Search
- C4::AuthoritiesMarc
and the same happens to the UI scripts that make use of them (both in
the OPAC and STAFF interfaces).
* Calls to the unsafe ZOOM::Record->render()[1] method are removed.
Due to this last change, the code for building facets was rewritten. For
performance of facet creation, I raised the minimum version
dependencies for MARC::File::XML and MARC::Record (we rely on
MARC::Field->as_string).
* Calls to MARC::Record->new_from_xml and MARC::Record->new_from_usmarc
are wrapped with eval for catching problems [2].
* As of bug 3087, UNIMARC uses the 'unimarc' record syntax. This case is
correctly handled.
* As of bug 7818 misc/migration_tools/rebuild_zebra.pl behaves like:
- bib_index_mode (defaults to 'grs1' if not specified)
- auth_index_mode (defaults to 'dom')
here we do exactly the same.
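The syntax-selection rules above can be sketched as a small function. The `$config` layout, the `'authorityserver'` server name, and the assumption that the 'unimarc' syntax applies only in GRS-1 mode are illustrative guesses, not the patch's code. `safe_parse` shows the eval-wrapping idea. Its parser code refs are hypothetical placeholders for wrappers around `MARC::Record->new_from_xml` and `new_from_usmarc`.

```perl
use strict;
use warnings;

# DOM indexing talks MARCXML; GRS-1 talks USMARC, or 'unimarc' on UNIMARC
# installations. Modes default to 'grs1' (bibs) and 'dom' (authorities).
sub zebra_record_syntax {
    my ( $server, $config, $marcflavour ) = @_;
    my $mode =
        $server eq 'authorityserver'
      ? ( $config->{auth_index_mode} // 'dom' )
      : ( $config->{bib_index_mode}  // 'grs1' );
    return 'xml' if $mode eq 'dom';
    return $marcflavour eq 'UNIMARC' ? 'unimarc' : 'usmarc';
}

# Parse with the constructor matching the syntax; a record that cannot be
# parsed yields undef so the caller can skip it instead of crashing.
sub safe_parse {
    my ( $raw, $syntax, $parsers ) = @_;
    my $parser = $syntax eq 'xml' ? $parsers->{xml} : $parsers->{iso2709};
    my $record = eval { $parser->($raw) };
    return $record;
}
```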
To test:
- prove t/db_dependent/Search.t should pass.
- Searching should remain functional.
- Indexing and searching for a big record should work (that's what the
unit tests do).
- Test an index scan search (on the staff interface):
Search > More options > Check "Scan indexes".
- Enable 'itemBarcodeFallbackSearch' and try to circulate any word, it
shouldn't break.
- Searching for a biblio in a new subscription shouldn't break.
- Running bulkmarcimport.pl shouldn't break.
- And so on... for the rest of the .pl files.
[1] http://search.cpan.org/~mirk/Net-Z3950-ZOOM/lib/ZOOM.pod#render()
[2] a record that cannot be parsed by MARC::Record is simply skipped (bug 10684)
Sponsored-by: Universidad Nacional de Cordoba
Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
bulkmarcimport.pl can crash when searching for duplicates if the 005
field from the incoming or local record is not defined. This patch
fixes it.
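With MARC::Record, `$record->field('005')` returns undef when the field is absent, so chaining `->data` onto it dies. The sketch below guards a 005 comparison against missing values on either side. It uses plain strings so it runs without MARC::Record. The rule for what counts as "newer" when one side lacks a 005 is a hypothetical choice for illustration.

```perl
use strict;
use warnings;

# 005 values look like yyyymmddhhmmss.f; being fixed-width, they sort
# chronologically under string comparison. Missing values never crash:
# an incoming record without 005 is treated as not newer, and one with a
# 005 is treated as newer than a local record lacking it (assumptions).
sub incoming_is_newer {
    my ( $incoming_005, $local_005 ) = @_;
    return 0 unless defined $incoming_005 && length $incoming_005;
    return 1 unless defined $local_005    && length $local_005;
    return $incoming_005 gt $local_005 ? 1 : 0;
}
```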
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
Test plan
1/ Create a record with no 005 field
2/ Try to import it checking for duplicates, notice it crashes
3/ Try with a record that has a 005 field but whose match in Koha is
missing one; it still crashes
4/ Apply patch
5/ No more crash
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Passes all tests and QA script.
Patch fixes the problem described for importing authorities
with the bulkmarcimport.pl when trying to match with existing
records.
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
To test:
0) Don't apply the patch yet.
1) Have the CataloguingLog system preference set to 'Log'.
2) Import a file of bibliographic records with bulkmarcimport.pl.
3) Check the state of CataloguingLog system preference -- it will be
set to 'Don't log'.
4) Apply the patch.
5) Repeat steps 1-3. The CataloguingLog system preference
will be 'Log'.
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
Signed-off-by: Jonathan Druart <jonathan.druart@biblibre.com>
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
See the script's documentation for more details
New parameters are:
- authtypes
- filter
- insert
- update
- all
Signed-off-by: Pascale Nalon <pascale.nalon@gmail.com>
This patch is live in Mines ParisTech since 2012-07-24.
Signing off
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
- Moved the sign-off from bugzilla to the commit message.
- All tests and QA script pass.
- Amended commit message to list new parameters.
- Verified this patch works on a UNIMARC installation.
- Verified normal import still works correctly on a MARC21
installation.
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
With the inclusion of this patch, all searches will try to use
QueryParser for handling queries for both the bibliographic and authority
databases if UseQueryParser is enabled. If QueryParser is unavailable,
UseQueryParser is disabled, or the search uses CCL indexes, the old
search code will be used.
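The fallback rule above reduces to a small predicate. The argument names are hypothetical stand-ins for the three conditions listed. `module_available` shows one common way to test whether a module such as QueryParser can be loaded at run time without making it a hard dependency.

```perl
use strict;
use warnings;

# QueryParser is used only if it is available, the UseQueryParser syspref
# is on, and the query does not use CCL indexes; otherwise fall back.
sub use_query_parser {
    my (%args) = @_;
    return 0 unless $args{qp_available};
    return 0 unless $args{use_query_parser_pref};
    return 0 if $args{uses_ccl_indexes};
    return 1;
}

# True if the named module can be required.
sub module_available {
    my ($module) = @_;
    ( my $file = "$module.pm" ) =~ s{::}{/}g;
    return eval { require $file; 1 } ? 1 : 0;
}
```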
To test:
1) Apply patch.
2) Run the unit test with `prove t/QueryParser.t`
3) Enable the UseQueryParser syspref.
4) Try searches that should return results in the following places:
* OPAC (simple search)
* OPAC (advanced search)
* OPAC (authorities)
* Staff client (header search)
* Staff client (advanced search)
* Staff client (cataloging search)
* Staff client (authorities)
* Staff client (importing a batch using a match point)
* Staff client (searching for an item for adding to a label)
* Staff client (acquisitions)
* Staff client (searching for a record to create a serial)
* ANYWHERE ELSE I HAVE FORGOTTEN
5) Disable the UseQueryParser syspref. Repeat at least some of the
searches you did above.
6) If all searches worked, sign off.
Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>
Signed-off-by: Elliott Davis <elliott@bywatersolions.com>
Searching still works as expected in various places.
QueryParser syspref seemed to be enabled by default
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Replace \r with \n for newline in output for bulkmarcimport.pl
Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
This adds a --framework option to bulkmarcimport, which specifies the
framework code to use for the records being imported.
Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
All tests pass, perlcritic fails before and after.
Tested
- imported records with -framework FA, FA framework is used
- imported records without -framework, default framework is used
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
This adds the -dedupbarcode option, which makes bulkmarcimport erase the
barcode, while keeping the item, of any item it finds with a duplicate
barcode.
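A minimal sketch of the behaviour described: a duplicate barcode is blanked, and the item itself is kept. Items are hypothetical hashrefs, and `$taken` stands in for barcodes already in the database. Catching duplicates within the same batch is an assumption of this sketch.

```perl
use strict;
use warnings;

# Blank duplicate barcodes in place and return the barcodes that were cleared.
sub dedup_barcodes {
    my ( $items, $taken ) = @_;
    my %seen = %$taken;
    my @cleared;
    for my $item (@$items) {
        my $barcode = $item->{barcode};
        next unless defined $barcode && length $barcode;
        if ( $seen{$barcode} ) {
            push @cleared, $barcode;
            $item->{barcode} = undef;    # keep the item, drop the barcode
        }
        else {
            $seen{$barcode} = 1;
        }
    }
    return @cleared;
}
```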
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
Use :encoding(UTF-8) rather than :utf8 for stricter
encoding.
Marking output as ':utf8' only flags the data as UTF-8;
using ':encoding(UTF-8)' also checks that it is valid UTF-8.
See binmode in perlfunc for more details.
In accordance with the robustness principle, input
filehandles have not been changed, as code may make
the undocumented assumption that invalid UTF-8 is present
in the input.
Fixes errors reported by t/00-testcritic.t.
Where feasible, some filehandles have been made lexical rather than
reusing global filehandle variables.
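The sketch below illustrates the two changes together: a lexical filehandle opened with the `:encoding(UTF-8)` layer. It writes to and reads from an in-memory scalar so the resulting bytes can be inspected. The helper names are hypothetical.

```perl
use strict;
use warnings;

# Write character data through a lexical :encoding(UTF-8) handle and
# return the encoded bytes.
sub encode_via_layer {
    my ($text) = @_;
    my $bytes = '';
    open( my $fh, '>:encoding(UTF-8)', \$bytes ) or die "open: $!";
    print {$fh} $text;
    close($fh) or die "close: $!";
    return $bytes;
}

# Read bytes back through the same layer, yielding character data.
sub decode_via_layer {
    my ($bytes) = @_;
    open( my $fh, '<:encoding(UTF-8)', \$bytes ) or die "open: $!";
    local $/;    # slurp mode
    my $text = <$fh>;
    close($fh);
    return $text;
}
```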
Signed-off-by: Jonathan Druart <jonathan.druart@biblibre.com>
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
Fixes bug where a bib record imported by bulkmarcimport.pl
could become unindexable by ensuring that ModBiblioMarc()
is always called by bulkmarcimport.pl to finalize saving the
bib record (as it was initially created by AddBiblio with the
defer_marc_save option).
Also introduces a utility routine, C4::Biblio::_strip_item_fields.
Signed-off-by: Galen Charlton <gmcharlt@gmail.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
The import script shouldn't remove information present in incoming biblio
records. With this patch, ISBNs are no longer cleared by default.
[2011.04.12] Rebased on HEAD
DOCUMENTATION: There is a new parameter --isbn|--noisbn
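Getopt::Long supports this kind of switch through its negatable `!` suffix. In the sketch below, `$isbn` starts at 1 so that ISBNs are kept unless `--noisbn` is given. The exact meaning the script attaches to the flag is an assumption here. `keep_isbn` is a hypothetical helper that takes an argument list instead of `@ARGV` so it can be tested.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# --isbn sets the flag, --noisbn (or --no-isbn) clears it; default is on.
sub keep_isbn {
    my @args = @_;
    my $isbn = 1;
    GetOptionsFromArray( \@args, 'isbn!' => \$isbn ) or die "bad options\n";
    return $isbn;
}
```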
Signed-off-by: Colin Campbell <colin.campbell@ptfs-europe.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
Remove some unnecessary checks where checking the error is
sufficient, and make the order more logical in some cases.
This should remove some possible runtime warning noise.
Although some calls belong to the 'nothing could
ever go wrong' school, I have added some warnings.
Signed-off-by: Christophe Croullebois <christophe.croullebois@biblibre.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
Adding some new options to bulkmarcimport :
-k idtagsubfield in order to store the id of the file record into another field
-match tagsubfield,index
-a to import authorities
-l logfilename to store logs
Bug Fixing : C4/Charset.pm
Charset was incorrect for UNIMARC Authorities
Signed-off-by: Galen Charlton <gmcharlt@gmail.com>
Numeric literals with leading zeros are interpreted as octal in Perl.
Ensure that comparisons are done using string operators
or, where appropriate, the MARC::Field method.
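A short demonstration of the pitfall. In Perl source, the literal `010` is octal (eight). A string such as `'010'` numifies as decimal ten, though. A numeric test against `010` therefore rejects tag 010 and accepts tag 008. Numeric comparison also conflates `'001'` with `'1'`. The helper names are illustrative.

```perl
use strict;
use warnings;

# Buggy: 010 is an octal literal, i.e. eight.
sub tag_is_010_buggy {
    my ($tag) = @_;
    return $tag == 010 ? 1 : 0;
}

# Correct: a string comparison keeps the leading zeros significant.
sub tag_is_010 {
    my ($tag) = @_;
    return $tag eq '010' ? 1 : 0;
}
```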
Signed-off-by: Galen Charlton <gmcharlt@gmail.com>
Fixes a hang of the staging import tool when it
attempts to process a MARC21 record that claims
that it's UTF-8 when it is not. The staging import
will now attempt to fix the character encoding of such
records.
Also added a FIXME to bulkmarcimport.pl, which because
of its use of MARC::Batch will skip over such records -
better than the original hang of the staging import, but
worse than the staging import's new ability to fix such
records.
Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
Adds three new switches:
-idmap <filename> - optional output file of
map of source record ID numbers
to Koha biblionumber
-x - if idmap is supplied, MARC tag
to get source record ID from
-y - if idmap is supplied, MARC subfield
to get source record ID from
Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
* Add a new parameter -o to begin importing the input file after skipping
n records.
* Enclose input file reading in an eval block to avoid aborting the
import if a few records are corrupted: they are now skipped.
* Help formatting.
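The two behaviours above can be sketched together. `$next` is a hypothetical iterator standing in for `MARC::Batch->next`. It returns the next record, returns undef at end of input, or dies on a corrupt record, and it must advance past the bad data before dying. Whether corrupt records count towards the offset is left open by the commit message. In this sketch they do not.

```perl
use strict;
use warnings;

# Skip the first $offset good records, skip corrupt ones with a warning,
# and pass the rest to $import. Returns (imported, skipped) counts.
sub import_records {
    my ( $next, $offset, $import ) = @_;
    my ( $seen, $imported, $skipped ) = ( 0, 0, 0 );
    RECORD: while (1) {
        my $record;
        my $ok = eval { $record = $next->(); 1 };
        unless ($ok) {
            warn "skipping corrupted record: $@";
            $skipped++;
            next RECORD;
        }
        last RECORD unless defined $record;
        next RECORD if ++$seen <= $offset;    # honour -o
        $import->($record);
        $imported++;
    }
    return ( $imported, $skipped );
}
```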
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
The version of MARC::Batch->new() distributed with version
2.0.0 of MARC::Record, if given a file name, will
open it using the ':utf8' layer. This results in an
incorrect character conversion when processing records
in the MARC-8 character encoding.
To avoid this, batch jobs that use MARC::Batch now
open the file themselves, then pass the file handle
to MARC::Batch->new().
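A sketch of the "open it yourself" approach: open with the `:raw` layer so MARC-8 bytes arrive untouched. In the script, the handle would then go to `MARC::Batch->new()`. To stay dependency-free, this sketch instead splits the stream on the ISO 2709 record terminator (0x1D). The helper name is hypothetical.

```perl
use strict;
use warnings;

# $source is a file name or a scalar ref (in-memory file). Returns raw
# records, each ending in the 0x1D terminator; any trailing piece without
# a terminator (e.g. a final newline) is dropped.
sub read_raw_records {
    my ($source) = @_;
    open( my $fh, '<:raw', $source ) or die "cannot open: $!";
    local $/ = "\x1D";
    my @records = grep { /\x1D\z/ } <$fh>;
    close($fh);
    return @records;
}
```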
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
* IsStringUTF8ish - determine if scalar contains a string in UTF8
* MarcToUTF8Record - convert MARC blob or MARC::Record to UTF8
* SetMarcUnicodeFlag - set appropriate MARC21 or UNIMARC field to
indicate that record is in UTF-8.
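For a rough idea of what a "does this look like UTF-8?" check involves, here is a sketch in the spirit of IsStringUTF8ish. It is not Koha's actual implementation. A scalar already flagged as character data counts as UTF-8. Otherwise, its bytes must decode as strict UTF-8. `LEAVE_SRC` stops Encode from consuming the caller's buffer.

```perl
use strict;
use warnings;
use Encode ();

sub looks_like_utf8 {
    my ($str) = @_;
    return 1 if Encode::is_utf8($str);
    return eval {
        Encode::decode( 'UTF-8', $str, Encode::FB_CROAK | Encode::LEAVE_SRC );
        1;
    } ? 1 : 0;
}
```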
Design points of this module include:
* No dependencies on other C4 modules, making it easier to add
more test cases
* All character conversion code in one place
* Single entry point for doing a character conversion on a
MARC record
* Capture of errors and warnings produced by Text::Iconv
and MARC::Charset
* Start of support for guessing the source character set of
a MARC record.
Several functions were moved from other scripts
or modules to C4::Charset:
* C4::Koha->FixEncoding (expanded and renamed
MarcToUTF8Record)
* C4::Koha->char_decode5426
* fMARC8ToUTF8 from bulkmarcimport.pl (renamed
_marc_marc8_to_utf8)
Several batch jobs were adjusted to use MarcToUTF8Record instead of
FixEncoding.
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
Replace C4::Biblio::AddBiblioAndItems with two
things:
* An option to C4::Biblio::AddBiblio to defer writing
biblioitems.marc and biblioitems.marcxml. This
option was created to give a significant
speed boost to bulkmarcimport.pl, but is *not*
recommended for general use.
* C4::Items::AddItemBatchFromMarc
This refactoring removes the need to have functions
in C4::Biblio and C4::Items that call each other's
private functions.
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
* Move CheckItemPreSave to C4::Items (from C4::Biblio)
* Modified C4::Biblio::AddBiblioAndItems to use appropriate
internal routines from C4::Items
* Moved GetItemnumberFromBarcode to C4::Items
* Removed duplicate C4::Biblio::_koha_new_items
* Removed disused C4::Biblio::MARCitemchange
Currently AddBiblioAndItems is a special routine that
uses private subs from both C4::Biblio and C4::Items.
This needs to be refactored.
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
Prior to this fix, the status fields had three 'off' values, NULL, "",
and 0. I've reduced it to two in the db, removing the option for NULL, and
setting the default value to 0, however, we need to verify that we don't ever
write out as "" as this needlessly complicates the indexing process,
critical for searching or limiting by status (e.g., availability). Also,
queries that attempt to write a NULL value to one of these fields will fail
(based on my tests).
This patch includes the following changes:
* Updated the database definition for notforloan, damaged, itemlost, and
wthdrawn in kohastructure.sql to forbid NULL and default to 0; MySQL
can't forbid other values (such as empty ""), so this has to be handled
at the application layer and REQUIRES further patching.
* Fixed the 'limit by availability' query node in Search.pm to use a
much less confusing definition of 'available'
* Added code to set values to 0 where they are NULL or empty ( "" ) for
notforloan, damaged, itemlost or wthdrawn in both the MARC and the items
table:
* Biblio.pm -> AddBiblioAndItems
* catalogue/updateitem.pl
* SEE NOTE BELOW, REQUIRES UPDATE TO THE REST OF KOHA'S ITEM MGT!
* Removed code in bulkmarcimport.pl that sets notforloan status depending
on item-level or bib-level itemtype -- that flag is designed to be set
only to override the notforloan setting for the item's (or bib's,
depending on the syspref) assigned itemtype (it doesn't need to override
to 'for loan', only to 'not for loan').
* Added $dbh->do("truncate zebraqueue"); when operation is 'delete'
* I updated some notes in catalogue/updateitem.pl as to why ModItem can't be
used -- we don't have _a_ place where we can change the item and marc :/
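The application-layer normalization described above might look like this minimal sketch. The item hashref is a hypothetical stand-in for an items row. The column names are those given in the commit message.

```perl
use strict;
use warnings;

my @STATUS_FIELDS = qw(notforloan damaged itemlost wthdrawn);

# Turn NULL (undef) and empty-string statuses into 0, in place; real
# status values are left untouched.
sub normalize_statuses {
    my ($item) = @_;
    for my $field (@STATUS_FIELDS) {
        $item->{$field} = 0
          if !defined $item->{$field} || $item->{$field} eq '';
    }
    return $item;
}
```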
I've tested the following:
bulkmarcimport.pl..........................MARC/items OK
Staged Records Import......................NOT OK
updateitem.pl (via moredetail.pl)..........MARC/items OK
circulation.pl.............................NOT OK
returns.pl.................................NOT OK
addbiblio.pl...............................NOT OK
additem.pl.................................NOT OK
Basically, there isn't a single place to apply this patch that will
update both item data and MARC data in one place ... a future patch
needs to address this issue.
Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
Enabled automatic conversion of MARC-8 records to
UTF-8. Record is converted if its Leader/09 contains
a blank and the -s (skip) option hasn't been supplied
on the command-line. Any record that cannot be converted
to UTF-8 is skipped.
Also now use Unicode Normalization Form C (NFC) for
records converted from MARC-8.
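In MARC21, Leader/09 gives the character coding scheme: a blank means MARC-8 and 'a' means UCS/Unicode. The sketch covers the decision rule described above, plus the post-conversion steps of marking the leader and applying NFC with the core Unicode::Normalize module. The MARC-8 to UTF-8 mapping itself is out of scope, and the helper names are hypothetical.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC);

# Convert only when Leader/09 is blank and -s (skip) was not given.
sub needs_marc8_conversion {
    my ( $leader, $skip ) = @_;
    return 0 if $skip;
    return substr( $leader, 9, 1 ) eq ' ' ? 1 : 0;
}

# After conversion: flag the leader as Unicode and normalize text to NFC.
sub finish_conversion {
    my ( $leader, $text ) = @_;
    substr( $leader, 9, 1 ) = 'a';
    return ( $leader, NFC($text) );
}
```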
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
Introduced new C4::Biblio function CheckItemPreSave,
which checks for duplicate barcodes and invalid
branch codes. Not yet sure whether this function
needs to be exported or whether it will just be
used internally to C4::Biblio.
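The two checks described can be sketched as follows. The error keys, the `homebranch`/`holdingbranch` field names, and the lookup hashes are hypothetical stand-ins for Koha's database queries, not the function's real interface.

```perl
use strict;
use warnings;

# Return a hash of problems: a barcode already in use, and any branch
# code that is not in the list of known branches.
sub check_item_pre_save {
    my ( $item, $existing_barcodes, $valid_branches ) = @_;
    my %errors;
    if ( defined $item->{barcode} && $existing_barcodes->{ $item->{barcode} } ) {
        $errors{duplicate_barcode} = $item->{barcode};
    }
    for my $branch_field (qw(homebranch holdingbranch)) {
        my $code = $item->{$branch_field};
        next unless defined $code;
        $errors{"invalid_$branch_field"} = $code unless $valid_branches->{$code};
    }
    return %errors;
}
```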
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
Changes to improve speed of MARC bib and item
imports:
[1] Turn off autocommit and commit database
transactions in larger batches.
[2] Introduce a new C4::Biblio function (AddBiblioAndItems)
to combine AddBiblio and AddItems -- this is faster
because we are not parsing the MARC XML of the biblio
every time we add an item.
[3] Introduce FasterTransformMarcToKoha, which is much
faster than TransformMarcToKoha. The new version,
which will replace the old one once it has been
fully tested, scans through each field in the
MARC record just once, instead of potentially
dozens of times.
[4] Remove code in bulkmarcimport that moved the
item tags to separate MARC::Record objects.
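Point [1] (committing in larger batches) follows a simple pattern, sketched below. With DBI, one would connect with `AutoCommit => 0` and call `$dbh->commit` where this sketch calls the `$commit` callback. The batch size is a hypothetical tuning value.

```perl
use strict;
use warnings;

# Add each record, committing every $batch_size records and once more for
# any final partial batch.
sub import_in_batches {
    my ( $records, $batch_size, $add, $commit ) = @_;
    my $pending = 0;
    for my $record (@$records) {
        $add->($record);
        if ( ++$pending >= $batch_size ) {
            $commit->();
            $pending = 0;
        }
    }
    $commit->() if $pending;
    return;
}
```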
Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>