Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
On bug 17591 we discovered that there was something weird going on with
the way we export and use subroutines/modules.
This patch tries to standardize our EXPORT to use EXPORT_OK only.
That way we will need to explicitely define the subroutine we want to
use from a module.
This patch is a squashed version of:
Bug 17600: After export.pl
Bug 17600: After perlimport
Bug 17600: Manual changes
Bug 17600: Other manual changes after second perlimports run
Bug 17600: Fix tests
And a lot of other manual changes.
export.pl is a dirty script that can be found on bug 17600.
"perlimport" is:
git clone https://github.com/oalders/App-perlimports.git
cd App-perlimports/
cpanm --installdeps .
export PERL5LIB="$PERL5LIB:/kohadevbox/koha/App-perlimports/lib"
find . \( -name "*.pl" -o -name "*.pm" \) -exec perl App-perlimports/script/perlimports --inplace-edit --no-preserve-unused --filename {} \;
The ideas of this patch are to:
* use EXPORT_OK instead of EXPORT
* perltidy the EXPORT_OK list
* remove '&' before the subroutine names
* remove some uneeded use statements
* explicitely import the subroutines we need within the controllers or
modules
Note that the private subroutines (starting with _) should not be
exported (and not used from outside of the module except from tests).
EXPORT vs EXPORT_OK (from
https://www.thegeekstuff.com/2010/06/perl-exporter-examples/)
"""
Export allows to export the functions and variables of modules to user’s namespace using the standard import method. This way, we don’t need to create the objects for the modules to access it’s members.
@EXPORT and @EXPORT_OK are the two main variables used during export operation.
@EXPORT contains list of symbols (subroutines and variables) of the module to be exported into the caller namespace.
@EXPORT_OK does export of symbols on demand basis.
"""
If this patch caused a conflict with a patch you wrote prior to its
push:
* Make sure you are not reintroducing a "use" statement that has been
removed
* "$subroutine" is not exported by the C4::$MODULE module
means that you need to add the subroutine to the @EXPORT_OK list
* Bareword "$subroutine" not allowed while "strict subs"
means that you didn't imported the subroutine from the module:
- use $MODULE qw( $subroutine list );
You can also use the fully qualified namespace: C4::$MODULE::$subroutine
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
We are using Koha::Logger when it makes sense to keep the info,
otherwise we simply remove it
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Bug 28572: Replace missing occurrence in misc/admin/koha-preferences
Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
The standard UNIMARC requires than the 9th character (starting from 0) in
labels must be blank (while it may be 'a' in marc21)
the problem is that C4::Charset::SetMarcUnicodeFlag (called in particular when
we import a record) always add 'a' char in the 9th label'pos whereas it should
do it just for MARC21 and NORMARC (not for UNIMARC) :
C4::Charset::SetMarcUnicodeFlag add 'a' char in the 9th label character for
MARC21 and NORMARC (it's normal), but just before doing this it call
"$marc_record->encoding('UTF-8')" which is a MARC::Record function which, when
called with 'UTF-8' parameter, do only one thing : add 'a' char in the 9th
label character
This patch only removes this incorrect function call, so, when we import a bib
record in UNIMARC : it no longer adds erroneous character (this does not change
anything for MARC21 and NORMARC because SetMarcUnicodeFlag explicitly adds 'a'
char in the 9th label for them)
Signed-off-by: Alex Buckley <alexbuckley@catalyst.net.nz>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
perl -p -i -e 's/^.*set the version for version checking.*\n//' **/*.pm
+ manual adjustements
Signed-off-by: Josef Moravec <josef.moravec@gmail.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@unc.edu.ar>
Signed-off-by: Brendan A Gallagher <brendan@bywatersolutions.com>
Mainly a
perl -p -i -e 's/^.*3.07.00.049.*\n//' **/*.pm
Then some adjustements
Signed-off-by: Josef Moravec <josef.moravec@gmail.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@unc.edu.ar>
Signed-off-by: Brendan A Gallagher <brendan@bywatersolutions.com>
perl -p -i -e 's/^(use vars .*)\$VERSION\s?(.*)/$1$2/' **/*.pm
Signed-off-by: Josef Moravec <josef.moravec@gmail.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@unc.edu.ar>
Signed-off-by: Brendan A Gallagher <brendan@bywatersolutions.com>
Conversion of MARC from ISO5426 is defined in C4::Charset::char_decode5426().
Each character or combined characters conversion is defined in a map.
This patch adds missing conversions.
See http://www.gymel.com/charsets/MAB2.html
Signed-off-by: Frederic Demians <f.demians@tamil.fr>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
Conversion of MARC from ISO5426 is defined in C4::Charset::char_decode5426().
Each character or combined characters conversion is defined in a map.
This patch changes some odd actual conversions.
In char_decode5426(), only characters between 0xC0 and 0xDF will be used for combining with following charater :
($char >= 0xC0 && $char <= 0xDF)
So conversion like "$chars{0x81d1}=0x00b0" will never be used.
Rules for "h with breve below" use combining with 0xf9 but looks like the correct caracter is 0xd5.
See http://www.gymel.com/charsets/MAB2.html
Signed-off-by: Frederic Demians <f.demians@tamil.fr>
Signed-off-by: Jonathan Druart <jonathan.druart@bugs.koha-community.org>
Signed-off-by: Tomas Cohen Arazi <tomascohen@theke.io>
C4::Context is only used to retrieve a syspref value.
This patch moves the use of C4::Context to a require.
Test plan:
Try to reach the SetMarcUnicodeFlag subroutine (batchmod, add/update a biblio, etc.)
Signed-off-by: Bernardo Gonzalez Kriegel <bgkriegel@gmail.com>
Tested on French UNIMARC install
No errors adding/editing biblios
No koha-qa errors
Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Signed-off-by: Tomas Cohen Arazi <tomascohen@gmail.com>
Signed-off-by: Chris Nighswonger <cnighswonger@foundations.edu>
Signed-off-by: Tomas Cohen Arazi <tomascohen@gmail.com>
Signed-off-by: Katrin Fischer <katrin.fischer@bsz-bw.de>
http://bugs.koha-community.org/show_bug.cgi?id=9987
Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@gmail.com>
See the wiki page for the explanation.
Signed-off-by: Paola Rossi <paola.rossi@cineca.it>
Signed-off-by: Bernardo Gonzalez Kriegel <bgkriegel@gmail.com>
Signed-off-by: Dobrica Pavlinusic <dpavlin@rot13.org>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@gmail.com>
This patch
- removes all html_entity usages in tt file which hide utf8 bugs
- removes all encode utf8 in tt plugins because we should get correctly
marked data from DBIC and other sources directly (cf plugin EncodeUTF8
used in renew.tt)
- adds some cleanup in C4::Templates::output: we now use perl utf8 file
handler output so we don't need to decode tt variables manually.
Signed-off-by: Paola Rossi <paola.rossi@cineca.it>
Signed-off-by: Bernardo Gonzalez Kriegel <bgkriegel@gmail.com>
Signed-off-by: Dobrica Pavlinusic <dpavlin@rot13.org>
Signed-off-by: Martin Renvoize <martin.renvoize@ptfs-europe.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@gmail.com>
Calls to C4/Charset.pm's NormalizeString function with an
undefined string were triggering warnings when running:
prove -v t/db_dependent/Holds.t
Sadly, t/Charset.t was also lacking calls to NormalizeString.
TEST PLAN
---------
1) prove -v t/db_dependent/Holds.t
-- This should generate the uninitialized string warnings.
Make sure CPL and MPL are in your branches to save
yourself from headaches due to expected data.
2) cat t/Charset.t
-- note there are no function calls to NormalizeString.
You can see other shortfalls in the tests beyond
NormalizeString with: grep ^sub C4/Charset.pm
3) prove -v t/Charset.t
4) Apply patch
5) prove -v t/Charset.t
-- Run as before with more tests.
6) cat t/Charset.t
-- note there are now function calls to NormalizeString.
7) prove -v t/db_dependent/Holds.t
-- Nice and clean run! :)
8) koha-qa.pl -v 2 -c 1
-- all should be Ok.
Signed-off-by: Tomas Cohen Arazi <tomascohen@gmail.com>
Signed-off-by: Katrin Fischer <katrin.fischer.83@web.de>
Signed-off-by: Tomas Cohen Arazi <tomascohen@gmail.com>
This patch
- rename _entity_clean as _clean_ampersand
- rename the script to sanitize_records.pl
- add a --fix-ampersand switch (the only one FOR NOW, enabled by
default) so it is obvious what the script does.
- make POD and usage reflect this changes.
Signed-off-by: Tomas Cohen Arazi <tomascohen@gmail.com>
This patch adds:
- a new maintenance script batch_sanitize_records
- a new subroutine C4::Charset::SanitizeRecord
- new unit tests for the new subroutine
Test plan:
1/ prove t/db_dependent/Charset.t
2/ Create a record containing "&amp;" (could be follow with as many
'amp;' as you want) in one of its fields and the same for the field
linked to biblioitems.url.
The url should not be sanitized, it may contain "&".
3/ Launch the maintenance script with the -h parameter to see how to use
it.
4/ Launch the script using the different parameters:
--filename=FILENAME
--biblionumbers='XXX'
--auto-search
The auto-search permits to sanitize all records containing "&amp;" in
the marcxml field.
Use the verbose flag for testing.
Without the --confirm flag, nothing is done.
5/ Use the --confirm flag and verify in the biblioitems.marcxml field
that the record has been sanitized.
6/ Try the --reindex flag to reindex records which have been modified.
Signed-off-by: Marcel de Rooy <m.de.rooy@rijksmuseum.nl>
Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Tomas Cohen Arazi <tomascohen@gmail.com>
Signed-off-by: Mathieu Saby <mathieu.saby@univ-rennes2.fr>
This sub was causing 2 bugs :
- tools/exports.pl --clean was removing Â
- authority search plugin used in cataloging was removing  in suggested authorities displayed dynamicly (using ajax)
After applying the patch,
- NSB/NSE are still removed by nsb_clean
- tools/exports.pl --clean is no more removing Â
- authority search plugins is no more removing Â
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Signed-off-by: Tomas Cohen Arazi <tomascohen@gmail.com>
Bug 12041 made xt/author/podcorrectness.t consider files in the 'Koha' namespace.
Some of them where failing. This patch fixes some of those POD problems.
Best regards
To+
Test:
1) run prove xt/author/podcorrectness.t
it fails
2) apply patch
3) run again, now it's ok
Signed-off-by: Bernardo Gonzalez Kriegel <bgkriegel@gmail.com>
Before patch test fails. After it, it pass
No koha-qa errors
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Signed-off-by: Tomas Cohen Arazi <tomascohen@gmail.com>
C4::Charset::SetMarcUnicodeFlag() fetches system preference
values, so since it invokes routines in C4::Context, it should
load the module.
Signed-off-by: Chris Cormack <chris@bigballofwax.co.nz>
Signed-off-by: Jonathan Druart <jonathan.druart@biblibre.com>
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Leila <koha.aixmarseille@gmail.com>
Bug 8015: Catch error in the SetUTF8Flag routine
The eval avoids the interface to run endlessly if an error occurred.
Signed-off-by: Kyle M Hall <kyle@bywatersolutions.com>
Signed-off-by: Leila <koha.aixmarseille@gmail.com>
Signed-off-by: Galen Charlton <gmc@esilibrary.com>
Changed Charset.pm to use defaultlanguage instead of 'fre'.
Signed-off-by: Rolando Isodoro <rolando.isidoro@gmail.com>
Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Passes all tests and QA script.
1) Check system preference was added correctly:
UNIMARCField100Language
2) Change code in preference to be not 'fre'.
3) Catalog a bibliographic record.
- check plugin shows new value
- check empty field is filled with new value from the plugin
- check you can still edit it to be something else
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
While typing an authority, will automatically propose authorities (similar to
autocompletion for patron search if activated)
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Tested searching for authorities with and without autocomplete. Note that
this is most useful when used in the "Main entry" box instead of the
"Main entry ($a only)" box.
Signed-off-by: Jared Camins-Esakov <jcamins@cpbibliography.com>
Corrected tabs to spaces in auth-finder-search.inc while resolving merge
conflict.
Signed-off-by: Paul Poulain <paul.poulain@biblibre.com>
- The following export pages used to embed items when exporting,
this was no longer the case, so they were fixed :
Intranet :
- basket/downloadcart.pl,
- virtualshelves/downloadshelf.pl
- catalogue/export.pl
Opac :
- opac/opac-downloadcart.pl
- opac/opac-downloadshelf.pl
- opac/opac-export.pl
- Notes :
- GetMarcBiblio used to embed items data, this was no longer the case,
so an optional parameter was added to choose if items should be embedded or not.
This way, previous work on this bug is not broken, and this is a pretty usefull
feature, imho.
- An optional parameter has been added to SetUTF8Flag, to be able to use NFD during
normalization. This was required to make Unicode/UTF-8 export work again.
Signed-off-by: Claire Hernandez <claire.hernandez@biblibre.com>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
- XSLT for the OPAC
- Value_builders for lesder, 007 and 008
- Default NORMARC framework
- Reverse MARC logic of some subs, so MARC21 is default (and works for NORMARC)
- Add NORMARC as an option to the syspref marcflavour
- Add record.abs for NORMARC
- Add NORMARC and nb as options to Makefile.PL
- Add etc/zebradb/lang_defs/nb/sort-string-utf.chr
- Copy MARC21slim2OAIDC.xsl to NORMARCslim2OAIDC.xsl
Some things are still missing, e.g.:
- XSLT for Intranet
- More MARC21slim2*.xsl transformations
POD was mistakenly telling that NFD was supposed to be the default
encoding. In fact, it is not, it is NFC.
So the variable $nfc to change to the not default encoding was misleading.
Renaming it into $nfd
(written by hdl)
Refactored by Chris Cormack
Signed-off-by: Davi <davi@gnu.org>
Signed-off-by: Chris Cormack <chrisc@catalyst.net.nz>
When using XSLT Display, and UNIMARC,
since marcFlavour is not used in encoding data, when data is true utf8, as_xml
fails on some subfields.
Moreover, because transformMARCXMLForXSLT edits some values in the marc record
and the PERL UTF8 is not handled by MARC::File::USMARC, it endsup in double
encoding the data.
Sending a patch to fix both issues.
This patch adds
- two functions in C4/Charset.pm
NormalizeString (uses Unicode::Normalize)
SetUTF8Flag (This function in my opinion belongs to MARC::Record, or at least MARC::File::USMARC)
- edits C4::XSLT in order to cope with the correct marcflavour
- edits C4::Search searchResults to use setUTF8Flag
Adding some new options to bulkmarcimport :
-k idtagsubfield in order to store the id of the file record into another field
-match tagsubfield,index
-a to import authorities
-l logfilename to store logs
Bug Fixing : C4/Charset.pm
Charset was incorrect for UNIMARC Authorities
Signed-off-by: Galen Charlton <gmcharlt@gmail.com>
When rebuild_zebra.pl is run from cron, there is an occasional error
of the form:
Use of uninitialized value $str in substitution (s///) at /home/ebpl/kohaclone/C4/Charset.pm line 304.
This error is occuring when the string that is fed to Charset::StripNonXmlChars
is null or undefined, for some reason.
This fix will handle the null-or-empty condition, and thus suppress the error.
Signed-off-by: Galen Charlton <gmcharlt@gmail.com>
Fixes a hang of the staging import tool when it
attempts to process a MARC21 record that claims
that it's UTF-8 when it is not. The staging import
will now attempt to fix the character encoding of such
records.
Also added a FIXME to bulkmarcimport.pl, which because
of its use of MARC::Batch will skip over such records -
better than the original hang of the staging import, but
worse than the staging import's new ability to fix such
records.
Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
This patch fix the funciton SetMarcUnicodeFlag for UNIMARC support, now the function will fix the length of the field, and set encoding as "50 " instead of "5050".
Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
This has bearing on bugs 2905, 2665, 2514 and other "wide character" crashes
related to diacritics and Unicode. This should help open the door for reliable
input of diacriticals via acquisitions.
MARC21_utf8_flag_fix.pl diagnoses and fixes existing problems with MARC data
affected by the bug.
Adding SetMarcUnicodeFlag to TransformKohaToMarc prevents the bug from corrupting
further data.
Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
A script like bulkmarkimport.pl spends most of the time
in C4::Charset::MarcToUTF8Record function, and
specifically in C4::Charset::char_decode5426
just initializing a hash. This patch moves this
hash outside function to avoid its initializing
each time the functon is called.
A test on a specific conversion script shows me
that performances were improved from 23s to 8s.
Signed-off-by: Galen Charlton <galen.charlton@liblime.com>
Because of a bug in MARC::Charset 0.98, if a string to convert from
MARC-8 to UTF-8 has (a) one or more diacritics that (b) are only in character positions
128 to 255 inclusive, the resulting converted string is not in
UTF-8, but the legacy 8-bit encoding (e.g., ISO-8859-1). As a result,
when such a record is converted to XML using ->as_xml_record(), the resulting
XML can be truncated at the offending character. An example of such a record
is one that has a price in Briish pounds in the 260$c but no other diacritics.
Signed-off-by: Joshua Ferraro <jmf@liblime.com>
Added invocations of StripNonXmlChars to uses
of new_from_xml() that involve records
saved to Koha fields via MARC::Record->as_xml();
for batch jobs that work on MARC XML files
coming from external sources, StripNonXmlChars
should not necessarily be used, as it may
be better to reject a file or record if it
contains that kind of encoding error.
Signed-off-by: Joshua Ferraro <jmf@liblime.com>