Koha/t/Charset.t
Galen Charlton 60a98d258a IMPORTANT - refactor MARC character set handling
* IsStringUTF8ish - determine if scalar contains a string in UTF8
* MarcToUTF8Record - convert MARC blob or MARC::Record to UTF8
* SetMarcUnicodeFlag - set appropriate MARC21 or UNIMARC field to
  indicate that record is in UTF-8.

Design points of this module include:

* No dependencies on other C4 modules, making it easier to add
  more test cases
* All character conversion code in one place
* Single entry point for doing a character conversion on a
  MARC record
* Capture of errors and warnings produced by Text::Iconv
  and MARC::Charset
* Start of support for guessing the source character set of
  a MARC record.

Several functions were moved from other scripts
or modules to C4::Charset:

* C4::Koha->FixEncoding (expanded and renamed
  MarcToUTF8Record)
* C4::Koha->char_decode5426
* fMARC8ToUTF8 from bulkmarcimport.pl (renamed
  _marc_marc8_to_utf8)

Several batch jobs were adjusted to use MarcToUTF8Record instead of
FixEncoding.

Signed-off-by: Chris Cormack <crc@liblime.com>
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
2008-02-03 07:23:56 -06:00

20 lines
604 B
Perl
Executable file

#!/usr/bin/perl
use strict;
use warnings;
use Test::More tests => 6;
BEGIN {
use_ok('C4::Charset');
}
my $octets = "abc";
ok(IsStringUTF8ish($octets), "verify octets are valid UTF-8 (ASCII)");
$octets = "flamb\c3\a9";
ok(!utf8::is_utf8($octets), "verify that string does not have Perl UTF-8 flag on");
ok(IsStringUTF8ish($octets), "verify octets are valid UTF-8 (LATIN SMALL LETTER E WITH ACUTE)");
ok(!utf8::is_utf8($octets), "verify that IsStringUTF8ish does not magically turn Perl UTF-8 flag on");
$octets = "a\xc2" . "c";
ok(!IsStringUTF8ish($octets), "verify octets are not valid UTF-8");