acli [Fri, 27 Feb 2004 13:26:07 +0000 (13:26 +0000)]
- Consider <INPUT type=text> and <INPUT type=text> part of strings.
- If a string is enclosed by a tag, remove that tag from the extracted string
- Generate automatic comments to provide more information for the translator
- A couple of bug fixes
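
The tag-removal item above can be illustrated with a small sketch. This is not the code from the Perl scripts; the function name `strip_enclosing_tag` and the regular expression are assumptions made for illustration. It only strips a tag when a single element wraps the whole string.

```python
import re

# Hypothetical helper: matches a string that starts with an opening tag
# and ends with the closing tag of the same name.
_ENCLOSED = re.compile(r"^<(\w+)(?:\s[^>]*)?>(.*)</\1>$", re.S | re.I)

def strip_enclosing_tag(text):
    """Return text without its enclosing tag, or unchanged if not enclosed."""
    match = _ENCLOSED.match(text)
    if not match:
        return text
    tag, inner = match.group(1), match.group(2)
    # Be conservative: if the same tag closes earlier inside the string,
    # the outer match spans several elements, so leave the string alone.
    if ("</" + tag.lower()) in inner.lower():
        return text
    return inner
```

With this sketch, `<b>foo</b>` becomes `foo`, while `<b>foo</b> bar` is left as it is because the tag does not enclose the whole string.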
acli [Wed, 25 Feb 2004 06:25:29 +0000 (06:25 +0000)]
After the previous change, the scanner will hang if the input is malformed
in a certain way, such as having a <title> but no matching </title>.
This should fix it.
acli [Wed, 25 Feb 2004 06:08:41 +0000 (06:08 +0000)]
This should make it handle commenting out of whole blocks of HTML better.
It seems to be still correct, and it is no longer complaining about syntax
errors when seeing commented-out HTML (esp. with TMPL_* directives).
Also, don't try to translate stuff between <title>...</title>; the stuff in
the middle is supposed to be PCDATA.
acli [Tue, 24 Feb 2004 14:20:46 +0000 (14:20 +0000)]
tmpl_process3.pl did not know how to handle absolute pathnames in -i.
(Actually, xgettext.pl did not know how to handle them in the files listed
in the list of files.)
If the po file is empty (corrupted), $href->{'""'} will be undefined.
We just blindly dereferenced this null value without checking.
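
The missing-header check described above can be sketched as follows. This is an illustration in Python, not the Perl fix itself; the function name `charset_of`, the dictionary layout (quoted msgid mapped to msgstr), and the UTF-8 default are all assumptions.

```python
import re

def charset_of(entries, default="utf-8"):
    """Return the charset named in the PO header, or a default if absent."""
    # The header is stored under the empty msgid, written here as '""'.
    # For an empty or corrupted catalog this lookup yields None.
    header = entries.get('""')
    if header is None:
        return default
    match = re.search(r"charset=([\w-]+)", header)
    return match.group(1).lower() if match else default
```

The point is only that the lookup result is tested before it is used, so an empty catalog falls back to a default instead of failing.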
acli [Mon, 23 Feb 2004 01:21:03 +0000 (01:21 +0000)]
Fold each run of consecutive whitespace into a single blank. This avoids problems
when minor whitespace changes occur in the original templates; it also
makes the strings much easier to read (e.g., instead of "foo\n\n\t\t bar",
xgettext.pl will now always generate "foo bar" and tmpl_process3.pl will
understand it to be the same as the original string).
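
A minimal sketch of the folding step, in Python rather than the Perl of the actual scripts. The function name `fold_whitespace` is hypothetical, and whether the real tools also trim leading and trailing blanks is not stated here, so this sketch does not trim.

```python
import re

def fold_whitespace(text):
    """Collapse every run of whitespace (spaces, tabs, newlines) to one blank."""
    return re.sub(r"\s+", " ", text)
```

Applied to both the extracted string and the string found in the template, this makes `"foo\n\n\t\t bar"` and `"foo bar"` compare equal.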
acli [Sun, 22 Feb 2004 21:34:40 +0000 (21:34 +0000)]
Preliminary support for "analysis" of strings with <a> tags.
Early termination of analysis if we encounter some strings, such as </h1>
or | or ||, in order to avoid extracting strings that are unnecessarily
long and which don't add any meaningful context.
acli [Sun, 22 Feb 2004 09:04:53 +0000 (09:04 +0000)]
Try to relax the criteria for allowing groups of tokens without TMPL_VAR
to be combined together into one string. This seems to have the desired
effect (that "<b>foo</b> bar" type strings are now recognized in one piece).
However, "<h1>foo</h1>\nexplanation"-type things may now also be (arguably
wrongly) recognized as one piece.
acli [Sun, 22 Feb 2004 05:18:52 +0000 (05:18 +0000)]
Handle the iso8859-1 charset somewhat, so that when the po file is in
either iso8859-1 or utf8, msgmerge(1) won't crap out. The code is ugly;
the conversion table is hard-coded, and in some places not very appropriate.
However, this does fix the case where a few strings containing French
characters can't be translated. As a side effect, tmpl_process3 can now
also be used for French or other languages using iso8859-1.
acli [Fri, 20 Feb 2004 07:09:47 +0000 (07:09 +0000)]
Partially allow combination of several TEXT tokens. It seems that this
gives better strings. (Always allowing combinations wreaks havoc; we
currently avoid this by allowing combination only if the first and last
tokens are both TEXT.)
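
The combination rule in this entry can be sketched as a simple predicate. The token representation (a list of `(kind, text)` pairs) and the name `can_combine` are assumptions for illustration; the scanner's real data structures are not shown in this log.

```python
def can_combine(tokens):
    """Allow merging a token group only if it starts and ends with TEXT."""
    # tokens is a hypothetical list of (kind, text) pairs,
    # e.g. ("TEXT", "foo ") or ("TAG", "<b>").
    if not tokens:
        return False
    return tokens[0][0] == "TEXT" and tokens[-1][0] == "TEXT"
```

Under this rule, a group such as text, tag, text is merged into one string, while a group that begins or ends with a tag is left as separate pieces.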
acli [Fri, 20 Feb 2004 04:38:02 +0000 (04:38 +0000)]
Support %0.0s notation so that we can omit the %s as in Year%s for the
Chinese translation. (This won't work for all languages; ultimately the
English templates must be fixed.)
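
The effect of the %0.0s notation can be shown with Python's printf-style formatting, which treats a precision of zero the same way. This only illustrates the idea; the scripts use their own Perl format handling, and the example strings are assumptions.

```python
def render(template, *args):
    """Fill a printf-style template with the given arguments."""
    return template % args

# English template: the argument supplies the plural suffix.
english = render("Year%s", "s")

# Translation: %0.0s still consumes the argument but prints zero
# characters of it, so the suffix disappears from the output.
translated = render("年%0.0s", "s")
```

Here `english` is `"Years"` and `translated` is `"年"`: the argument count stays the same in both strings, which is what lets the translation drop the suffix without breaking the parameter order.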