acli [Mon, 23 Feb 2004 01:21:03 +0000 (01:21 +0000)]
Fold each run of consecutive whitespace into a single blank. This avoids problems
when minor whitespace changes occur in the original templates; it also
makes the strings much easier to read (e.g., instead of "foo\n\n\t\t bar",
xgettext.pl will now always generate "foo bar" and tmpl_process3.pl will
understand it to be the same as the original string).
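
A minimal sketch of the folding described above, in Python rather than the
scripts' own Perl. The function name fold_whitespace is hypothetical, and
whether the real scripts also trim leading or trailing blanks is not stated
here, so this sketch does not trim.

```python
import re

def fold_whitespace(s: str) -> str:
    """Collapse every run of whitespace (spaces, tabs, newlines) into one blank."""
    return re.sub(r"\s+", " ", s)

# The example string from the log entry above.
print(fold_whitespace("foo\n\n\t\t bar"))
```

Applying the same folding on both the extraction side and the lookup side is
what lets two differently indented copies of a string compare as equal.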
acli [Sun, 22 Feb 2004 21:34:40 +0000 (21:34 +0000)]
Preliminary support for "analysis" of strings with <a> tags.
Early termination of analysis if we encounter certain strings, such as </h1>
or | or ||, in order to avoid extracting strings that are unnecessarily
long and that don't add any meaningful context.
acli [Sun, 22 Feb 2004 09:04:53 +0000 (09:04 +0000)]
Try to relax the criteria for allowing groups of tokens without TMPL_VAR
to be combined together into one string. This seems to have the desired
effect (that "<b>foo</b> bar" type strings are now recognized in one piece).
However, "<h1>foo</h1>\nexplanation"-type things may now also be (arguably
wrongly) recognized as one piece.
acli [Sun, 22 Feb 2004 05:18:52 +0000 (05:18 +0000)]
Handle the iso8859-1 charset somewhat, so that when the po file is in
either iso8859-1 or utf8, msgmerge(1) won't crap out. The code is ugly;
the conversion table is hard-coded, and in some places not very appropriate.
However, this does fix the case where a few strings containing French
characters can't be translated. As a side effect, tmpl_process3 can now
also be used for French or other languages using iso8859-1.
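
For illustration only, the kind of conversion involved can be sketched in
Python. This uses Python's built-in codecs instead of a hard-coded table, so
it is not the code from this commit; the function name latin1_to_utf8 is
hypothetical.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode ISO 8859-1 bytes as UTF-8 using the built-in codecs."""
    return raw.decode("iso8859-1").encode("utf-8")

# 0xE7 is "c with cedilla" in ISO 8859-1; in UTF-8 it becomes two bytes.
print(latin1_to_utf8(b"fran\xe7ais"))
```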
acli [Fri, 20 Feb 2004 07:09:47 +0000 (07:09 +0000)]
Partially allow combination of several TEXT tokens. It seems that this
gives better strings. (Always allowing combinations wreaks havoc; we
currently avoid this by allowing combination only if the first and last
tokens are both TEXT.)
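
The rule above can be sketched as follows, in Python rather than Perl. The
token representation (kind, string), the kind name "TAG", and the function
names are assumptions made for illustration; the fallback of extracting each
TEXT token separately is also an assumption.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, string); hypothetical representation

def can_combine(group: List[Token]) -> bool:
    """Allow combination only if the first and last tokens are both TEXT."""
    return bool(group) and group[0][0] == "TEXT" and group[-1][0] == "TEXT"

def extract_strings(group: List[Token]) -> List[str]:
    """Return one combined string if allowed, else each TEXT token by itself."""
    if can_combine(group):
        return ["".join(s for _, s in group)]
    return [s for kind, s in group if kind == "TEXT"]

print(extract_strings([("TEXT", "foo "), ("TAG", "<b>"), ("TEXT", "bar")]))
print(extract_strings([("TAG", "<b>"), ("TEXT", "bar")]))
```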
acli [Fri, 20 Feb 2004 04:38:02 +0000 (04:38 +0000)]
Support %0.0s notation so that we can omit the %s as in Year%s for the
Chinese translation. (This won't work for all languages; ultimately the
English templates must be fixed.)
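
To show why %0.0s helps: a precision of zero makes the conversion consume its
argument but print nothing. The sketch below uses Python's printf-style %
operator, which follows the same convention; how the scripts themselves
implement the notation is not shown here, and the translated string is a
made-up example.

```python
english = "Year%s"        # the template string from the entry above
translated = "年%0.0s"     # hypothetical msgstr: the argument is consumed, not printed

print(english % ("s",))
print(translated % ("s",))
```

Without the %0.0s the translation would have no conversion left to absorb the
plural suffix argument.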
acli [Thu, 19 Feb 2004 21:24:30 +0000 (21:24 +0000)]
New scripts for translation into Chinese and other languages where English
word order is too different from the word order of the target language to
yield meaningful translations.
The new scripts use a different translation file format (namely standard
gettext-style PO files).
This seems to work reasonably well (e.g., producing an empty en_GB translation
and then installing it does not seem to corrupt the "translated" files), but it
likely still contains some bugs. There is also little documentation, but try
running perldoc on the .p[lm] files to see what's there. There are also some
spurious warnings (both from bugs in the new scripts and from the buggy
third-party Locale::PO module).
acli [Sat, 14 Feb 2004 08:03:02 +0000 (08:03 +0000)]
Have to make it know what "closed start tag" notation is; otherwise it spews
out more than a screenful of text for an "unknown token" when such notation
is seen.
acli [Sat, 14 Feb 2004 05:46:38 +0000 (05:46 +0000)]
Make sure that if an attribute contains < or >, a warning is given; these
warnings aren't pedantic because (1) if it's a templating directive, it
might expand into something containing a real < and/or >, (2) if it
contains >, the browser will close the current tag, and (3) if it contains
< and the browser knows what "SGML closed start tags" are (e.g., Mozilla),
the browser will also close the current tag.
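
A rough sketch of such a check, in Python rather than Perl. The regular
expression, the function name attribute_warnings, and the warning wording are
all assumptions for illustration; the scanner's real tokenizer is not shown
in this log.

```python
import re
from typing import List

# Hypothetical attribute pattern: name = "quoted" | 'quoted' | unquoted
ATTR_RE = re.compile(
    r'''([A-Za-z_:][-A-Za-z0-9_:.]*)\s*=\s*("[^"]*"|'[^']*'|[^\s"'>]+)'''
)

def attribute_warnings(tag: str) -> List[str]:
    """Return one warning per attribute whose value contains < or >."""
    warnings = []
    for name, value in ATTR_RE.findall(tag):
        # Drop the surrounding quotes, if any, before inspecting the value.
        if value[0] in "\"'":
            value = value[1:-1]
        if "<" in value or ">" in value:
            warnings.append(f"attribute {name!r} contains < or >: {value!r}")
    return warnings

print(attribute_warnings('<input value="<TMPL_VAR NAME=x>">'))
print(attribute_warnings('<a href="index.html">'))
```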
acli [Fri, 13 Feb 2004 02:42:06 +0000 (02:42 +0000)]
The fixed search.marc/search.tmpl (nothing between <textarea></textarea>)
caused an eof token to be incorrectly generated by next_token(). This
is now fixed.
acli [Fri, 13 Feb 2004 01:14:18 +0000 (01:14 +0000)]
Don't issue warnings for unquoted attributes containing [^-\.a-zA-Z0-9]
unless --pedantic-warnings is given. These don't seem to cause any trouble,
even in Mozilla's standards-compliant mode.
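
The gating described above might look like this, sketched in Python rather
than Perl. The character class is the one quoted in the entry; the function
name, the pedantic parameter (standing in for --pedantic-warnings), and the
message text are hypothetical.

```python
import re
from typing import Optional

# Same class as in the entry: anything other than -, ., letters, and digits.
UNSAFE = re.compile(r"[^-.a-zA-Z0-9]")

def unquoted_attr_warning(value: str, pedantic: bool = False) -> Optional[str]:
    """Warn about an unquoted attribute value only when pedantic is set."""
    if pedantic and UNSAFE.search(value):
        return f"unquoted attribute value {value!r} contains unusual characters"
    return None

print(unquoted_attr_warning("100%"))
print(unquoted_attr_warning("100%", pedantic=True))
```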
acli [Fri, 13 Feb 2004 00:42:52 +0000 (00:42 +0000)]
Seems like I wasn't careful enough recognizing unknown tokens. Incomplete
tags like "<b foo" at the end of the file seem to be discarded silently by
Mozilla, even in quirks mode. We now display a warning for these (in case
these ever come up by accident).
acli [Thu, 12 Feb 2004 08:55:14 +0000 (08:55 +0000)]
This is an experimental filter, based on simple scanning, that *should*
(ultimately) work better than the standard filter based on real parsing
of the .tmpl files.