Bug 35659: (follow-up) Better handling of accented characters

If you try to harvest bibliographic records from a UNIMARC OAI
repository (using oai_dc data format) in a MARC21 Koha instance
and run the OAI harvester script in verbose mode, you may get
lines similar to the following in the output:
no mapping found for [0xC9] at position 0 in Économie politique g0=ASCII_DEFAULT g1=EXTENDED_LATIN at /usr/share/perl5/MARC/Charset.pm line 308.
no mapping found for [0xC9] at position 0 in Église et société g0=ASCII_DEFAULT g1=EXTENDED_LATIN at /usr/share/perl5/MARC/Charset.pm line 308.
When looking at the imported records' biblio details page in
the OPAC, most words containing accented characters will not
appear correctly.
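As a rough illustration of what the [0xC9] in those warnings refers to, the Python sketch below shows that the first character of "Économie politique" is Unicode code point 0xC9 and is encoded as two bytes in UTF-8. It is not part of the patch, and the helper name describe_char is made up for this example. The working assumption, inferred from the g0/g1 hints in the warning, is that the record is being treated as MARC-8 rather than Unicode, so no mapping is found for that value.

```python
import unicodedata


def describe_char(ch):
    """Return the code point, Unicode name and UTF-8 bytes of one character.

    Hypothetical helper for illustration only; nothing like it exists in Koha.
    """
    return {
        "code_point": f"0x{ord(ch):02X}",
        "name": unicodedata.name(ch),
        "utf8_hex": ch.encode("utf-8").hex(),
    }


# First character of one of the titles from the warnings above
# (written with an escape so the source file encoding does not matter).
title = "\u00c9conomie politique"
info = describe_char(title[0])
print(info)
```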
The fix is to apply Franck Theeten's solution from Bug 16488
(https://bugs.koha-community.org/bugzilla3/show_bug.cgi?id=16488#c24)
and set the 10th character of the MARC leader (position 09,
the character coding scheme) to 'a' (UCS/Unicode) in the XSLT
that transforms the UNIMARC OAI records into MARC21 XML. The
accented characters are then imported properly and the records
appear correctly in the OPAC.
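The leader change can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual change, which is made in the XSLT. The function name force_unicode_leader and the sample leader value are assumptions made up for this example.

```python
def force_unicode_leader(leader: str) -> str:
    """Return a copy of a MARC21 leader with position 09 set to 'a'.

    Position 09 is the 10th character (zero-based index 9) and holds the
    character coding scheme: blank means MARC-8, 'a' means UCS/Unicode.
    Hypothetical helper for illustration; the real fix lives in the XSLT.
    """
    if len(leader) != 24:
        raise ValueError("a MARC21 leader is exactly 24 characters long")
    return leader[:9] + "a" + leader[10:]


# Illustrative leader with a blank coding scheme at position 09.
sample_leader = "00000nam  2200000   4500"
patched_leader = force_unicode_leader(sample_leader)
print(repr(patched_leader))
```

Only one character changes; the rest of the leader is copied through untouched.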
Test plan:
0) Without this patch, running the OAI harvesting script in
verbose mode produces many warnings, and garbled characters
appear in the OPAC biblio details page wherever accented
characters are in use.
1) Apply this patch.
2) Re-run the OAI harvesting script in verbose + force mode
(force mode is required to ignore record datestamps from
previous runs):
This time there should be no warnings printed on your
screen, and any characters with accents in the updated
records should look OK in the OPAC.
Thanks-to: Franck Theeten <franck.theeten@africamuseum.be>
Signed-off-by: Michal Denar <black23@gmail.com>
Signed-off-by: Julian Maurice <julian.maurice@biblibre.com>
Signed-off-by: Pedro Amorim <pedro.amorim@ptfs-europe.com>
Signed-off-by: Victor Grousset/tuxayo <victor@tuxayo.net>
Signed-off-by: David Cook <dcook@prosentient.com.au>
Sponsored-by: Association KohaLa - https://koha-fr.org/
Signed-off-by: Katrin Fischer <katrin.fischer@bsz-bw.de>