Koha/C4
Julian Maurice 01d78e1ec7
Bug 29333: Fix encoding of imported UNIMARC authorities
MARC::Record and MARC::File::* modules sometimes use the position 09 of
the leader to detect encoding. A blank character means 'MARC-8' while an
'a' means 'UTF-8'.

In a UNIMARC authority this position is used to store the authority type
(see https://www.transition-bibliographique.fr/wp-content/uploads/2021/02/AIntroLabel-2004.pdf [FR]).
In this case, 'a' means 'Personal Name'.

The result is that the import will succeed for a Personal Name
authority, but it will fail for all other authority types.
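
The clash described above can be sketched in a few lines. This is a simplified model, not the MARC::Record implementation: the helper names `guess_encoding` and `make_authority_leader` are hypothetical, the leader values are illustrative, and the rule "anything other than 'a' is treated as MARC-8" is an assumption made for brevity.

```python
# Simplified model of a leader-based encoding check (hypothetical helpers,
# not the actual MARC::Record code).

def guess_encoding(leader: str) -> str:
    """Return the encoding implied by position 09 of a 24-character leader."""
    if len(leader) != 24:
        raise ValueError("a MARC leader is 24 characters long")
    # Assumption: 'a' means UTF-8, everything else falls back to MARC-8.
    return "UTF-8" if leader[9] == "a" else "MARC-8"


def make_authority_leader(entity_type: str) -> str:
    """Build an illustrative authority leader with entity_type at position 09."""
    return "00000nx  " + entity_type + "2200000   45  "


personal_name = make_authority_leader("a")  # 'a' = Personal Name in UNIMARC
other_type = make_authority_leader("c")     # 'c' as in the Athènes (Grèce) record

print(guess_encoding(personal_name))  # UTF-8
print(guess_encoding(other_type))     # MARC-8
```

Both records are really UTF-8, but only the Personal Name leader happens to carry the value that the check reads as "UTF-8".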

Steps to reproduce:
0. Be sure to have a Koha UNIMARC instance.
1. Download the MARCXML for "Honoré de Balzac"
   curl -o balzac.marcxml https://www.idref.fr/02670305X.xml
2. Verify that it's encoded in UTF-8
   file balzac.marcxml
   (should output "balzac.marcxml: XML 1.0 document, UTF-8 Unicode
   text")
3. Go to Tools » Stage MARC for import and import balzac.marcxml with
   the following settings:
   Record type: Authority
   Character encoding: UTF-8
   Format: MARCXML
   Do not touch the other settings
4. Once imported, go to the staged MARC management tool and find your
   batch. Click on the authority title "Balzac Honoré de 1799-1850" to
   show the MARC inside a modal window. There should be no encoding
   issue.
5. Write down the imported record id (the number in column '#') and go
   to the MARC authority editor. Replace all URL parameters with
   'breedingid=THE_ID_YOU_WROTE_DOWN'
   The URL should look like this:
   /cgi-bin/koha/authorities/authorities.pl?breedingid=198
   You should see no encoding issues. Do not save the record.
6. Import the batch into the catalog. Verify that the authority record
   has no encoding issue.
7. Now download the MARCXML for "Athènes (Grèce)"
   curl -o athènes.marcxml https://www.idref.fr/027290530.xml
8. Repeat steps 2 to 6 using the athènes.marcxml file. At steps 4 and 5
   you should see encoding issues, and position 09 of the leader will
   have been rewritten from 'c' to 'a'. Strangely, importing this batch
   fixes the encoding issue, but the information in position 09 of the
   leader is still lost.
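
When following these steps, it can help to check position 09 of the leader without opening the file by hand. The sketch below assumes a MARCXML document with a single `leader` element; the function name `leader_position_09` and the inline sample record are hypothetical and only stand in for a downloaded file.

```python
import xml.etree.ElementTree as ET

# Illustrative record; a real MARCXML file may also declare a namespace.
SAMPLE = """<record>
  <leader>00000nx  c2200000   45  </leader>
  <controlfield tag="001">000000000</controlfield>
</record>
"""


def leader_position_09(xml_text: str) -> str:
    """Return the character at position 09 of the first leader element."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Strip a possible "{namespace}" prefix before comparing the tag name.
        if element.tag.rsplit("}", 1)[-1] == "leader":
            return element.text[9]
    raise ValueError("no leader element found")


print(leader_position_09(SAMPLE))  # c
```

Running the same check before and after staging would show whether the value was rewritten.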

This patch uses the MARCXML representation of the record instead of the
ISO2709 representation because, unlike MARC::Record::new_from_usmarc,
MARC::Record::new_from_xml lets us pass the encoding and the format
directly, which prevents data from being double-encoded when position 09
of the leader is anything other than 'a'.

Test plan:
- Follow the "steps to reproduce" above and verify that you have no
  encoding issues.

Signed-off-by: David Nind <david@davidnind.com>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
2022-07-08 15:43:33 -03:00
..
AuthoritiesMarc Bug 17600: Standardize our EXPORT_OK 2021-07-16 08:58:47 +02:00
Barcodes Bug 26328: Cast barcode from varchar to integer for incremental barcode 2022-03-23 10:50:51 -10:00
ClassSortRoutine Bug 17600: Standardize our EXPORT_OK 2021-07-16 08:58:47 +02:00
ClassSplitRoutine Bug 28572: Remove C4::Debug 2021-06-22 12:04:32 +02:00
Creators Bug 24001: Fix Card template/rpinter profile creation and edit 2022-04-28 10:49:20 -10:00
External Bug 17600: Standardize our EXPORT_OK 2021-07-16 08:58:47 +02:00
Form Bug 28572: Remove C4::Debug 2021-06-22 12:04:32 +02:00
Heading Bug 26852: subfield $e missing in X11 definition of MARC21 headings 2021-08-11 13:27:52 +02:00
ILSDI Bug 30275: Rename issues.renewals to issues.renewals_count 2022-07-05 09:45:55 -03:00
Installer Bug 30731: Remove Readonly::XS::MAGIC_COOKIE 2022-06-01 16:15:26 -03:00
Labels Bug 17600: Standardize our EXPORT_OK 2021-07-16 08:58:47 +02:00
Linker Bug 28676: Cache and retrieve match_count when searching a cached heading 2021-09-20 12:06:56 +02:00
Members Bug 17600: Standardize our EXPORT_OK 2021-07-16 08:58:47 +02:00
OAI Bug 17600: Standardize our EXPORT_OK 2021-07-16 08:58:47 +02:00
Output Bug 17600: Standardize our EXPORT_OK 2021-07-16 08:58:47 +02:00
Patroncards Bug 25459: Makes barcode position respect units in patron cards layout 2021-11-02 16:50:01 +01:00
Reports Bug 29695: Remove C4::Reports::Guided::_get_column_defs 2022-04-12 11:40:16 +02:00
Search Bug 29915: Tiny session adjustments 2022-03-22 10:17:33 -10:00
Serials Bug 17600: Standardize our EXPORT_OK 2021-07-16 08:58:47 +02:00
SIP Bug 29755: Check each NoIssuesCharge separately 2022-05-06 15:58:13 -10:00
Utils Bug 29648: (QA follow-up) Minor POD fix 2022-04-27 11:20:45 -10:00
Accounts.pm Bug 17600: Standardize our EXPORT_OK 2021-07-16 08:58:47 +02:00
Acquisition.pm Bug 29844: Fix ->search occurrences 2022-02-09 15:36:23 -10:00
Auth.pm Bug 30842: 2FA - Allow at least one old TOTP 2022-06-01 16:14:42 -03:00
Auth_cas_servers.yaml.sample
Auth_with_cas.pm Bug 28417: Don't require C4::Auth_with_cas from opac-user if not needed 2021-11-03 15:40:52 +01:00
Auth_with_ldap.pm Bug 17600: Standardize our EXPORT_OK 2021-07-16 08:58:47 +02:00
Auth_with_shibboleth.pm Bug 17600: Standardize our EXPORT_OK 2021-07-16 08:58:47 +02:00
AuthoritiesMarc.pm Bug 30464: Make BatchUpdateAuthority update the index in one request 2022-05-05 11:17:36 -10:00
BackgroundJob.pm Bug 17600: Standardize our EXPORT_OK 2021-07-16 08:58:47 +02:00
Barcodes.pm Bug 17600: Standardize our EXPORT_OK 2021-07-16 08:58:47 +02:00
Biblio.pm Bug 30822: Make BatchCommitRecords update the index in one request 2022-06-22 09:43:47 -03:00
Breeding.pm Bug 30813: (QA follow-up) Adjust three use statements 2022-06-08 11:40:32 -03:00
Budgets.pm Bug 24190: (follow-up) Rename AcqLog 2021-09-21 20:22:57 +02:00
Calendar.pm Bug 17600: Standardize our EXPORT_OK 2021-07-16 08:58:47 +02:00
Charset.pm Bug 18984: Remove NORMARC support 2021-10-07 15:36:40 +02:00
Circulation.pm Bug 30275: (follow-up) Drop renewer_id constraint 2022-07-05 09:46:18 -03:00
ClassSortRoutine.pm Bug 29951: Fix EXPORT for C4::ClassS*Routine modules 2022-07-08 15:29:56 -03:00
ClassSource.pm Bug 29951: Fix EXPORT for C4::ClassS*Routine modules 2022-07-08 15:29:56 -03:00
ClassSplitRoutine.pm Bug 29951: Fix EXPORT for C4::ClassS*Routine modules 2022-07-08 15:29:56 -03:00
Context.pm Bug 30702: Fix Context.pm L785 warning on sessionID 2022-05-06 10:33:10 -10:00
Contract.pm Bug 17600: Standardize our EXPORT_OK 2021-07-16 08:58:47 +02:00
CourseReserves.pm Bug 17600: Standardize our EXPORT_OK 2021-07-16 08:58:47 +02:00
Creators.pm Bug 17600: Standardize our EXPORT_OK 2021-07-16 08:58:47 +02:00
Heading.pm Bug 25616: Uppercase hard coded lower case boolean operators for Elasticsearch 2022-02-24 14:35:36 -10:00
HoldsQueue.pm Bug 29346: Use fully qualified names for C4:Circulation routines in C4::HoldsQueue 2022-05-05 11:17:36 -10:00
HTML5Media.pm Bug 18984: Remove NORMARC support 2021-10-07 15:36:40 +02:00
ImportBatch.pm Bug 29333: Fix encoding of imported UNIMARC authorities 2022-07-08 15:43:33 -03:00
ImportExportFramework.pm Bug 13952: (follow-up) JS translatability, clean warns, other 2022-04-04 16:23:46 +02:00
InstallAuth.pm Bug 26019: Koha should set SameSite attribute on cookies 2022-04-13 15:55:38 +02:00
Installer.pm Bug 30620: Add a warning about /*!VERSION lines in kohastructure 2022-05-02 11:22:57 -10:00
ItemCirculationAlertPreference.pm Bug 29844: Fix ->search occurrences 2022-02-09 15:36:23 -10:00
Items.pm Bug 30824: (follow-up) POD 2022-06-13 20:16:32 -03:00
Koha.pm Bug 29883: avoid uninitialized value warn in GetAuthorisedValues sub 2022-06-01 13:40:24 -03:00
Labels.pm
Languages.pm Bug 15067: Follow up to fix sorting 2021-08-04 14:06:43 +02:00
Letters.pm Bug 28739: Execute the letter processing inside a transaction 2022-07-08 15:40:04 -03:00
Linker.pm Bug 17600: Standardize our EXPORT_OK 2021-07-16 08:58:47 +02:00
Log.pm Bug 28692: (QA follow-up) Fix test for objects 2021-11-16 14:00:20 +01:00
MarcModificationTemplates.pm Bug 17600: Standardize our EXPORT_OK 2021-07-16 08:58:47 +02:00
Matcher.pm Bug 17600: Standardize our EXPORT_OK 2021-07-16 08:58:47 +02:00
Members.pm Bug 30275: Rename issues.renewals to issues.renewals_count 2022-07-05 09:45:55 -03:00
Message.pm Bug 17600: Standardize our EXPORT_OK 2021-07-16 08:58:47 +02:00
Output.pm Bug 30115: Uninitialized value warning in C4/Output.pm 2022-02-21 15:15:47 -10:00
Overdues.pm Bug 24865: (QA follow-up) Remove hardcoded notice name from protected_letters 2022-07-05 11:37:39 -03:00
Patroncards.pm Bug 17600: Standardize our EXPORT_OK 2021-07-16 08:58:47 +02:00
Record.pm Bug 18984: Remove NORMARC support 2021-10-07 15:36:40 +02:00
Reports.pm Bug 17600: Standardize our EXPORT_OK 2021-07-16 08:58:47 +02:00
Reserves.pm Bug 12630: Rebase tests and cover CheckReserves 2022-06-13 10:24:50 -03:00
Ris.pm Bug 17600: Standardize our EXPORT_OK 2021-07-16 08:58:47 +02:00
RotatingCollections.pm Bug 17600: Standardize our EXPORT_OK 2021-07-16 08:58:47 +02:00
Scheduler.pm Bug 17600: Standardize our EXPORT_OK 2021-07-16 08:58:47 +02:00
Scrubber.pm Bug 17600: Standardize our EXPORT_OK 2021-07-16 08:58:47 +02:00
Search.pm Bug 30327: Fix tests 2022-06-25 15:25:18 -03:00
Serials.pm Bug 23352: Set default collection code when creating subscription 2022-05-10 15:17:17 -10:00
Service.pm Bug 17600: Standardize our EXPORT_OK 2021-07-16 08:58:47 +02:00
ShelfBrowser.pm Bug 17600: Standardize our EXPORT_OK 2021-07-16 08:58:47 +02:00
SMS.pm
SocialData.pm Bug 17600: Standardize our EXPORT_OK 2021-07-16 08:58:47 +02:00
Stats.pm Bug 19532: Recalls objects and tests 2022-03-14 22:45:51 -10:00
Suggestions.pm Bug 23991: (follow-up) Silence useless warnings 2022-06-27 13:23:06 -03:00
Tags.pm Bug 17600: Standardize our EXPORT_OK 2021-07-16 08:58:47 +02:00
Templates.pm Bug 26019: Koha should set SameSite attribute on cookies 2022-04-13 15:55:38 +02:00
TmplToken.pm
TmplTokenType.pm Bug 17600: Standardize our EXPORT_OK 2021-07-16 08:58:47 +02:00
TTParser.pm
UsageStats.pm Bug 30237: Replace AutoEmailOpacUser with AutoEmailNewUser 2022-04-20 09:03:39 -10:00
XISBN.pm Bug 30813: (QA follow-up) Adjust three use statements 2022-06-08 11:40:32 -03:00
XSLT.pm Bug 30291: Changes to controller scripts 2022-05-05 11:17:36 -10:00