From 13ec430eccd52413e756c88f90a370787842cbc2 Mon Sep 17 00:00:00 2001
From: David Cook
Date: Tue, 26 May 2020 12:30:53 +1000
Subject: [PATCH] Bug 17842: UTF-8 encode ISO2709 MARC download from cart
MIME-Version: 1.0
Content-Type: text/plain; charset=utf8
Content-Transfer-Encoding: 8bit

The cart was outputting ISO2709 MARC records with Latin-1 encoding.
Records containing non-latin1 characters were automatically
re-encoded as UTF-8 by browsers, which led to inconsistent
character encodings for downloaded MARC files.

This patch explicitly encodes ISO2709 MARC characters from the cart
download as UTF-8 encoded bytes, which resolves the problem.

Test Plan:
0) Don't apply patch
1) Create bib record with only ASCII characters
2) Add a ü character to the title
3) Save bib record
4) Download bib record from cart (opac and staff client)
5) Using xxd or some other program, note that the ü is represented
   by a FC byte (latin-1 encoded)
6) Apply the patch
7) Download bib record from cart (opac and staff client)
8) Using xxd or some other program, note that the ü is represented
   by C3 BC bytes (utf-8 encoded)
9) Success

(Note that you could potentially use Notepad++ or some other program
to open the downloaded file and just note the encoding that it finds.
You could also try "chardetect" instead. Lots of options for figuring
out the encoding.)
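The byte difference checked in steps 5 and 8 can also be reproduced outside Koha. The following is a minimal sketch, not part of the patch: it assumes a plain character string standing in for the output of as_usmarc(), and hex_bytes is a hypothetical helper introduced only for this illustration.

```perl
use strict;
use warnings;
use Encode qw( encode );

# Hypothetical helper (not in the patch): render a byte string as hex pairs.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, $bytes;
}

# A character string containing U+00FC (ü); an assumed stand-in for
# the string returned by $record->as_usmarc().
my $title = "M\x{FC}ller";

# Latin-1 bytes: ü becomes the single byte FC (what step 5 observes).
my $latin1 = encode( 'ISO-8859-1', $title );

# UTF-8 bytes: ü becomes the two bytes C3 BC (what step 8 observes).
my $utf8 = encode( 'UTF-8', $title );

print hex_bytes($latin1), "\n";    # 4D FC 6C 6C 65 72
print hex_bytes($utf8),   "\n";    # 4D C3 BC 6C 6C 65 72
```

Running the downloaded file through xxd, as the test plan suggests, should show the same FC versus C3 BC distinction for the ü in the title.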

Signed-off-by: Victor Grousset/tuxayo
Signed-off-by: Julian Maurice
Signed-off-by: Jonathan Druart
---
 basket/downloadcart.pl    | 11 ++++++++++-
 opac/opac-downloadcart.pl | 10 +++++++++-
 2 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/basket/downloadcart.pl b/basket/downloadcart.pl
index 46534493d1..b69c7e2bba 100755
--- a/basket/downloadcart.pl
+++ b/basket/downloadcart.pl
@@ -71,7 +71,16 @@ if ($bib_list && $format) {
         next unless $record;
 
         if ($format eq 'iso2709') {
-            $output .= $record->as_usmarc();
+            my $usmarc = $record->as_usmarc();
+            if ($usmarc){
+                #NOTE: If we don't explicitly UTF-8 encode the output,
+                #the browser will guess the encoding, and it won't always choose UTF-8.
+                my $bytes = encode("UTF-8", $usmarc);
+                if ($bytes) {
+                    $output .= $bytes;
+                }
+
+            }
         }
         elsif ($format eq 'ris') {
             $output .= marc2ris($record);
diff --git a/opac/opac-downloadcart.pl b/opac/opac-downloadcart.pl
index ec5e2c8e4a..f08dea13f1 100755
--- a/opac/opac-downloadcart.pl
+++ b/opac/opac-downloadcart.pl
@@ -90,7 +90,15 @@ if ($bib_list && $format) {
         next unless $record;
 
         if ($format eq 'iso2709') {
-            $output .= $record->as_usmarc();
+            my $usmarc = $record->as_usmarc();
+            if ($usmarc) {
+                #NOTE: If we don't explicitly UTF-8 encode the output,
+                #the browser will guess the encoding, and it won't always choose UTF-8.
+                my $bytes = encode("UTF-8", $usmarc);
+                if ($bytes) {
+                    $output .= $bytes;
+                }
+            }
         }
         elsif ($format eq 'ris') {
             $output .= marc2ris($record);
-- 
2.39.5