From 40d3800cc4d73be3f40fb3b895de3e7fe29e4999 Mon Sep 17 00:00:00 2001
From: Tomas Cohen Arazi
Date: Wed, 22 Jan 2014 15:38:41 -0300
Subject: [PATCH] Bug 9114: Make frameworks import/export routines correctly use UTF-8

Currently the import_export_framework.pl script outputs data with
Perl's default encoding, ISO-8859.

This patch properly sets the binmode to UTF-8 when exporting SQL and CSV
files using the PerlIO layer (":encoding(UTF-8)") for STDOUT.

To test:

Export step test
- Use some non-ASCII character(s) (e.g. letters with diacritics) in some
  field description in a chosen framework.
- Export the framework at Administration > MARC frameworks
- Run this to check the file is ISO-8859 encoded:
    $ file export_XXX.csv
    export_XXX.csv: ISO-8859 text, with very long lines
  (Note: try SQL and the other output formats too, but not ODS.)
- Apply the patch
- Export the framework again (change the name), and test encoding:
    $ file export_XXX_2.csv
    export_XXX_2.csv: UTF-8 Unicode text

Import step test
I assume you have two files, export_XXX.csv (ISO-8859 encoded) and
export_XXX_2.csv (UTF-8 encoded). XXX will depend on your framework's code.
- Reset your testing branch to master
- Import export_XXX.csv
- The string with non-ASCII chars is truncated at the first non-ASCII
  char's position (Note: this is the current behaviour).
- Import export_XXX_2.csv
- The non-ASCII chars are broken, and the logs show errors on non-Unicode
  chars. (Note: even though UTF-8 is the expected encoding, the file is
  treated as ISO-8859.)
- Apply the patch
- Import the good (UTF-8 as expected) file and check everything worked as
  expected.

No double encoding should occur with any combination of formats.

Sponsored-by: Universidad Nacional de Cordoba
Signed-off-by: Magnus Enger
I put some Norwegian and accented letters in a framework to test.
Before the patch, the exported CSV came out as ISO-8859, after the patch
it came out as UTF-8.
ODS and XML (viewed in LibreOffice) both looked good, before and after the
patch.
Importing the ISO-8859 CSV cut off the strings at the first non-ASCII char.
Importing the UTF-8 CSV worked as expected.

Signed-off-by: Katrin Fischer
Works as expected, passes tests and QA script.

Signed-off-by: Galen Charlton
---
 admin/import_export_framework.pl | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/admin/import_export_framework.pl b/admin/import_export_framework.pl
index 2a95c45014..20729f2c67 100755
--- a/admin/import_export_framework.pl
+++ b/admin/import_export_framework.pl
@@ -54,8 +54,12 @@ if ($action eq 'export' && $input->request_method() eq 'GET') {
     my $strXml = '';
     my $format = $input->param('type_export_' . $frameworkcode);
     ExportFramework($frameworkcode, \$strXml, $format);
+
     if ($format eq 'csv') {
         # CSV file
+
+        # Correctly set the encoding to output plain text in UTF-8
+        binmode(STDOUT,':encoding(UTF-8)');
         print $input->header(-type => 'application/vnd.ms-excel', -attachment => 'export_' . $frameworkcode . '.csv');
         print $strXml;
     } elsif ($format eq 'excel') {
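
Note on the binmode call: the following standalone sketch is not part of the patch. It illustrates, under stated assumptions, why setting the ":encoding(UTF-8)" layer changes the bytes written. It uses an in-memory filehandle in place of STDOUT so the result can be inspected; the helper names (bytes_written, hex_dump) and the sample string are hypothetical and do not come from Koha.

```perl
use strict;
use warnings;

# Hypothetical helper: print $text to an in-memory handle, optionally
# applying a PerlIO layer first, and return the raw bytes written.
sub bytes_written {
    my ($text, $layer) = @_;
    my $buffer = '';
    open(my $fh, '>', \$buffer) or die "Cannot open in-memory handle: $!";
    binmode($fh, $layer) if defined $layer;
    print {$fh} $text;
    close($fh) or die "Cannot close in-memory handle: $!";
    return $buffer;
}

# Hypothetical helper: render a byte string as space-separated hex.
sub hex_dump {
    my ($bytes) = @_;
    return join ' ', map { sprintf '%02x', ord($_) } split //, $bytes;
}

# Sample character string containing U+00F3 (o with acute accent).
my $label = "Descripci\x{f3}n";

# No layer: characters below U+0100 are written as single Latin-1 bytes.
my $latin1 = bytes_written($label);

# With the layer used in the patch: the same character becomes two bytes.
my $utf8 = bytes_written($label, ':encoding(UTF-8)');

print 'no layer:         ', hex_dump($latin1), "\n";
print ':encoding(UTF-8): ', hex_dump($utf8),   "\n";
```

Without a layer the accented character is written as the single byte f3, which is what `file` reports as ISO-8859 text; with the layer it is written as the two-byte UTF-8 sequence c3 b3.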