Bug 11631: Make i18n toolchain ignore useless strings

This patch removes several types of strings from the
PO files that cannot be usefully translated, including
ones that consist entirely of punctuation and/or HTML entities.

Test:
1) Update PO files of some lang, xx-YY-*po
cd misc/translator
perl translate update xx-YY
2) Do it again, just in case
3) rm po/xx-YY*po~
4) Extract all msgid's, sorted
cat po/xx-YY*po | egrep "^msgid" | sort | uniq > xx-YY-pre
5) Apply the patch
6) Repeat 1-3
7) Repeat 4 again, other file
cat po/xx-YY*po | egrep "^msgid" | sort | uniq > xx-YY-post
8) Do a diff, inspect results, only strings with %s and \s
diff xx-YY-pre xx-YY-post | less

Signed-off-by: Bernardo Gonzalez Kriegel <bgkriegel@gmail.com>
Works as described, 380 strings less to 'translate'
No koha-qa errors.

Signed-off-by: Katrin Fischer <Katrin.Fischer.83@web.de>
Tested according to test plan, works as described.

Signed-off-by: Galen Charlton <gmc@esilibrary.com>
This commit is contained in:
Pasi Kallinen 2014-04-24 11:01:52 +03:00 committed by Galen Charlton
parent baa2fb2fba
commit c229553040

View file

@ -37,6 +37,7 @@ sub string_negligible_p ($) {
|| $t =~ /^\d+$/ # purely digits
|| $t =~ /^[-\+\.,:;!\?'"%\(\)\[\]\|]+$/ # punctuation w/o context
|| $t =~ /^[A-Za-z]$/ # single letters
|| $t =~ /^(&[a-z]+;|&#\d+;|&#x[0-9a-fA-F]+;|%%|%s|\s|[[:punct:]])*$/ # html entities,placeholder,punct, ...
)
}