[g11n-ko-discuss] Re: [i18n-discuss] [Fwd: Re: ksh93 i18n problems on Solaris ? / was: Re: [ksh93-integration-discuss] comments on ksh93 migration plan]
Ienup Sung
Ienup.Sung at Sun.COM
Wed Apr 12 19:42:05 PDT 2006
Hello everyone, Roland,
I just built ksh93 as Roland instructed and uploaded the binaries to
the following location:
http://www.opensolaris.org/os/community/int_localization/tmp/
The binaries were built on Solaris 10 with the SOS10 C compiler, so
they should work on S10 and SX releases. There is also a readme.txt file
with more information on how to extract the binaries and what to
test.
I also briefly tested ksh with the en_US.UTF-8 locale and found the results
a bit of a mixed bag. I was able to input most CJK, Arabic, Greek, and Cyrillic
characters and to view all Unicode characters (by running
"more /usr/pub/UTF-8"), but I wasn't able to input some accented characters
such as ä, ç, and so on. I was, however, able to input ł, ŝ, and so on.
(The attached script file shows the log of my tests. The i386
version of ksh behaves the same way.)
By the way, running "ldd ksh" after the build showed the following:
system% ldd arch/sol10.sun4/bin/ksh
libm.so.2 => /lib/64/libm.so.2
libsecdb.so.1 => /lib/64/libsecdb.so.1
libc.so.1 => /lib/64/libc.so.1
libnsl.so.1 => /lib/64/libnsl.so.1
libcmd.so.1 => /lib/64/libcmd.so.1
libmp.so.2 => /lib/64/libmp.so.2
libmd5.so.1 => /lib/64/libmd5.so.1
libscf.so.1 => /lib/64/libscf.so.1
libdoor.so.1 => /lib/64/libdoor.so.1
libuutil.so.1 => /lib/64/libuutil.so.1
/platform/SUNW,Sun-Blade-1000/lib/sparcv9/libc_psr.so.1
/platform/SUNW,Sun-Blade-1000/lib/sparcv9/libmd5_psr.so.1
I also noticed that you've patched src/lib/libcmd/cat.c. Would I also
need to add -Bstatic before linking with -lcmd?
Ienup
Roland Mainz wrote:
> Hi!
>
> ----
>
> Can anyone in the i18n community help to verify that the attached patch
> for ksh93 (Korn Shell 93) fixes the problems when inputting/editing text
> in ja_JP.PCK or *.UTF-8 locales, please ? We really need some urgent
> feedback on whether the patch fixes this issue...
>
> Building ksh93 from source:
> 1. Download
> http://svn.genunix.org/repos/on/branches/ksh93/gisburn/scripts/buildksh93.ksh
> - this script builds ksh93 from sources (and also contains instructions
> how to download the sources via "wget")
> 2. Fetch sources as described in "buildksh93.ksh"
> 3. Edit "buildksh93.ksh" to match the platform (default is Solaris 10 on
> i386 with Sun Studio 10/11)
> 4. Unpack source
> % mkdir build
> % cd build
> % gunzip -c ../ast-ksh.2006-02-14.tgz | tar -xf -
> % gunzip -c ../INIT.2006-01-24.tgz | tar -xf -
> 5. Apply patch:
> % gpatch -p0 <ksh93-shift_ijs.diff.txt
> 6. Build ksh93:
> % time nice ksh ../buildksh93.ksh 2>&1 | tee -a buildlog.log
> 7. Start ksh93:
> % ./arch/sol10.i386/bin/ksh
> # input and/or edit japanese/chinese/korean text and report whether this
> works correctly
>
> Thanks for the help! :-)
>
> -------- Original Message --------
> Subject: Re: ksh93 i18n problems on Solaris ? / was: Re:
> [ksh93-integration-discuss] comments on ksh93 migration plan
> Date: Sun, 09 Apr 2006 22:41:42 +0200
> From: Roland Mainz <roland.mainz at nrubsig.org>
> To: April Chin <April.Chin at eng.sun.com>
> CC: ksh93-integration-discuss at opensolaris.org,robbin.kawabata at sun.com,
> kenjiro.tsuji at sun.com
> References: <200602221818.k1MIIfKE729837 at jurassic.eng.sun.com>
>
> April Chin wrote:
>
>>>April Chin wrote:
>>>
>>>>>ksh93 already has support for I18N and for multibyte character
>>>>>handling. If there is need for change, it should be minimal
>>>>>since currently all error message translation goes through a single
>>>>>interface.
>>>>>
>>>>>The multibyte character handling uses the POSIX mb*() interface.
>>>>
>>>>I believe in ksh93 this may not be working in all respects.
>>>>An i18n engineer and I tried a few manual tests on ksh93 with
>>>>multibyte characters, and ksh93 did not appear to recognize some of
>>>>them which may have included an ASCII byte within the multibyte character.
>>>
>>>April - did this problem occur only in interactive terminal mode or even
>>>when a script writes the japanese/ASCII text mixture (e.g. % cat
>>>"ksh93_echo_japanese.ksh93" | ksh # ) ?
>>
>>The test was done via a ksh93 script.
>>On a system with the Japanese locales installed, I set up
>>an appropriate locale:
>>
>>% setenv LANG ja_JP.PCK
>>
>>I input a file (sjis.dat, attached below) containing multibyte
>>characters, including ones with an ASCII byte, to a ksh93 script which
>>reads and echoes each line, and then each word in that line.
>>
>>% cat sjis.dat | test.sh
>>
>>where test.sh contains:
>>#!/usr/bin/ksh93
>>read a
>>echo $a
>>
>>for b in $a ;
>>do
>> echo $b;
>>done
>>
>>Doing the same test with the current Solaris ksh, instead of ksh93,
>>output all the characters as expected. ksh93 was able to process
>>2 out of 3 multibyte characters which contained an ASCII component.
>
>
> See below (and the attached patch... once you see the fix, it is quite
> obvious what is going wrong... :-) ) ...
>
>
>>>Linux may suffer from a similar problem, please read
>>>https://mailman.research.att.com/pipermail/ast-users/2006q1/000838.html
>>>and
>>>https://mailman.research.att.com/pipermail/ast-users/2006q1/000839.html
>>>
>>>BTW: Which terminal emulator did you use ? Gnome terminal, konsole,
>>>dtterm or xterm ?
>>
>>The i18n engineer provided me with a terminal emulator for which
>>I could turn on sjis mode, the newer mode which will
>>accept multibyte characters which may have an ASCII component.
>>It sounds like the Linux problem is related to the terminal emulator.
>
>
> No, the terminal emulator (KDE "konsole") was not the problem (the
> developers of it tested that explicitly) - seems to be a problem in
> ksh93r itself.
>
> Attached is a patch ("ksh93-shift_ijs.diff.txt" ; this patch applies
> against ksh93r) written by Werner Fink/SuSE which fixes the problem for
> both SuSE Linux and Solaris 10 (I tested this with en_US.UTF-8 and
> japanese text, for ja_JP.PCK I have to wait until next week when the
> student who knows this locale can run some quick tests herself...).
> Could you please test whether the problems on your side are now fixed ?
>
> The only thing which is interesting now (assuming the patch works): Can
> we somehow make an automated test script for this kind of failure (I'd
> like to see this integrated into the ksh93 test suite) ?
>
> ----
>
> Bye,
> Roland
>
>
>
> ------------------------------------------------------------------------
>
> --- src/cmd/ksh93/sh/lex.c
> +++ src/cmd/ksh93/sh/lex.c 2006-04-06 15:58:08.000000000 +0000
> @@ -293,11 +293,14 @@
> {
> switch(*len = mbsize(_Fcin.fcptr))
> {
> - case -1: /* bogus multiByte char - parse as bytes? */
> - case 0: /* NULL byte */
> + case -1: /* bogus multiByte char - parse as bytes? */
> + case 0: /* NULL byte */
> + *len = 1;
> case 1:
> - lexState = state[curChar=fcget()];
> - break;
> + if ((curChar = fcget()) == 0)
> + curChar = fcfill();
> + lexState = state[curChar];
> + break;
> default:
> /*
> * None of the state tables contain entries
> @@ -1596,6 +1599,36 @@
> {
> if(n!=S_NL)
> {
> +#if SHOPT_MULTIBYTE
> + if(mbwide())
> + {
> + do
> + {
> + ssize_t len;
> + switch((len = mbsize(_Fcin.fcptr)))
> + {
> + case -1: /* bogus multiByte char - parse as bytes? */
> + case 0: /* NULL byte */
> + case 1:
> + n = state[fcget()];
> + break;
> + default:
> + /*
> + * None of the state tables contain
> + * entries for multibyte characters,
> + * however, they should be treated
> + * the same as any other alph
> + * character. Therefore, we'll use
> + * the state of the 'a' character.
> + */
> + mbchar(_Fcin.fcptr);
> + n = state['a'];
> + }
> + }
> + while(n == 0);
> + }
> + else
> +#endif /* SHOPT_MULTIBYTE */
> /* skip over regular characters */
> while((n=state[fcget()])==0);
> }
> --- src/cmd/ksh93/sh/macro.c
> +++ src/cmd/ksh93/sh/macro.c 2006-04-06 16:02:40.000000000 +0000
> @@ -266,7 +266,38 @@
> cp = fcseek(0);
> while(1)
> {
> +#if SHOPT_MULTIBYTE
> + if(mbwide())
> + {
> + do
> + {
> + ssize_t len;
> + switch((len = mbsize(cp)))
> + {
> + case -1: /* bogus multiByte char - parse as bytes? */
> + case 0: /* NULL byte */
> + case 1:
> + n = state[*(unsigned char*)cp++];
> + break;
> + default:
> + /*
> + * None of the state tables contain
> + * entries for multibyte characters,
> + * however, they should be treated
> + * the same as any other alph
> + * character. Therefore, we'll use
> + * the state of the 'a' character.
> + */
> + cp += len;
> + n = state['a'];
> + }
> + }
> + while(n == 0);
> + }
> + else
> +#endif /* SHOPT_MULTIBYTE */
> while((n=state[*(unsigned char*)cp++])==0);
> +
> if(n==S_NL || n==S_QUOTE || n==S_RBRA)
> continue;
> if(c=(cp-1)-fcseek(0))
> @@ -395,8 +426,42 @@
> cp++;
> while(1)
> {
> - while((n=state[*(unsigned char*)cp++])==0);
> - c = (cp-1) - first;
> +#if SHOPT_MULTIBYTE
> + if (mbwide())
> + {
> + ssize_t len;
> + do
> + {
> + switch((len = mbsize(cp)))
> + {
> + case -1: /* bogus multiByte char - parse as bytes? */
> + case 0: /* NULL byte */
> + len = 1;
> + case 1:
> + n = state[*(unsigned char*)cp++];
> + break;
> + default:
> + /*
> + * None of the state tables contain entries
> + * for multibyte characters. However, they
> + * should be treated the same as any other
> + * alpha character, so we'll use the state
> + * which would normally be assigned to the
> + * 'a' character.
> + */
> + cp += len;
> + n = state['a'];
> + }
> + }
> + while(n == 0);
> + c = (cp-len) - first;
> + }
> + else
> +#endif /* SHOPT_MULTIBYTE */
> + {
> + while((n=state[*(unsigned char*)cp++])==0);
> + c = (cp-1) - first;
> + }
> switch(n)
> {
> case S_ESC:
> --- src/lib/libcmd/Mamfile
> +++ src/lib/libcmd/Mamfile 2006-04-06 16:08:42.000000000 +0000
> @@ -444,7 +444,7 @@
> prev cat.c
> meta cat.o %.c>%.o cat.c cat
> prev cat.c
> -exec - ${CC} ${mam_cc_FLAGS} ${CCFLAGS} -I. -I${PACKAGE_ast_INCLUDE} -DERROR_CATALOG=\""libcmd"\" -DUSAGE_LICENSE=\""[-author?Glenn Fowler <gsf at research.att.com>][-author?David Korn <dgk at research.att.com>][-copyright?Copyright (c) 1992-2006 AT&T Knowledge Ventures][-license?http://www.opensource.org/licenses/cpl1.0.txt][--catalog?libcmd]"\" -D_PACKAGE_ast -D_BLD_cmd -c cat.c
> +exec - ${CC} ${mam_cc_FLAGS} ${CCFLAGS} -I. -I${PACKAGE_ast_INCLUDE} -DERROR_CATALOG=\""libcmd"\" -DUSAGE_LICENSE=\""[-author?Glenn Fowler <gsf at research.att.com>][-author?David Korn <dgk at research.att.com>][-copyright?Copyright (c) 1992-2006 AT&T Knowledge Ventures][-license?http://www.opensource.org/licenses/cpl1.0.txt][--catalog?libcmd]"\" -D_PACKAGE_ast -D_BLD_cmd -DSHOPT_MULTIBYTE -c cat.c
> done cat.o generated
> make chgrp.o
> prev chgrp.c
> --- src/lib/libcmd/cat.c
> +++ src/lib/libcmd/cat.c 2006-04-06 16:09:45.000000000 +0000
> @@ -133,8 +133,39 @@
> while (endbuff)
> {
> cpold = cp;
> - /* skip over ASCII characters */
> + /* skip over ASCII and multi byte characters */
> +#if SHOPT_MULTIBYTE
> + if(mbwide())
> + {
> + do
> + {
> + ssize_t len;
> + switch((len = mbsize(cp)))
> + {
> + case -1: /* bogus multiByte char - parse as bytes? */
> + case 0: /* NULL byte */
> + case 1:
> + n = states[*cp++];
> + break;
> + default:
> + /*
> + * None of the state tables contain
> + * entries for multibyte characters,
> + * however, they should be treated
> + * the same as any other alph
> + * character. Therefore, we'll use
> + * the state of the 'a' character.
> + */
> + cp += len;
> + n = states['a'];
> + }
> + }
> + while(n == 0);
> + }
> + else
> +#endif /* SHOPT_MULTIBYTE */
> while ((n = states[*cp++]) == 0);
> +
> if (n==T_ENDBUF)
> {
> if (cp>endbuff)
>
>
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> i18n-discuss mailing list
> i18n-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/i18n-discuss
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ksh93-sparc-test-log.txt
Type: application/text
Size: 1561 bytes
Desc: not available
Url : http://oss-beta1.opensolaris.org/pipermail/g11n-ko-discuss/attachments/20060412/c78213b2/attachment.bin