[g11n-zh-hk-discuss] Re: [i18n-discuss] [Fwd: Re: ksh93 i18n problems on Solaris ? / was: Re: [ksh93-integration-discuss] comments on ksh93 migration plan]

Ienup Sung Ienup.Sung at Sun.COM
Wed Apr 12 19:42:05 PDT 2006


Hello everyone, Roland,

I just built ksh93 as Roland instructed and uploaded the binaries to
the following location:

	http://www.opensolaris.org/os/community/int_localization/tmp/

The binaries were built on Solaris 10 with the SOS10 C compiler, so
they should work on S10 and SX releases. There is also a readme.txt file
there with more information on how to extract the binaries and what to
test.

I also briefly tested ksh in the en_US.UTF-8 locale and found the results
to be a bit of a mixed bag. I was able to input most CJK, Arabic, Greek, and
Cyrillic characters and also to view all Unicode characters (by running
"more /usr/pub/UTF-8"), but I wasn't able to input some accented characters
such as ä and ç. I was, however, able to input others such as ł and ŝ.
(The attached script file shows the log of my test session. The i386
version of ksh behaves the same way.)

By the way, I ran "ldd ksh" after the build and it showed the following:

	system% ldd arch/sol10.sun4/bin/ksh
         libm.so.2 =>     /lib/64/libm.so.2
         libsecdb.so.1 =>         /lib/64/libsecdb.so.1
         libc.so.1 =>     /lib/64/libc.so.1
         libnsl.so.1 =>   /lib/64/libnsl.so.1
         libcmd.so.1 =>   /lib/64/libcmd.so.1
         libmp.so.2 =>    /lib/64/libmp.so.2
         libmd5.so.1 =>   /lib/64/libmd5.so.1
         libscf.so.1 =>   /lib/64/libscf.so.1
         libdoor.so.1 =>  /lib/64/libdoor.so.1
         libuutil.so.1 =>         /lib/64/libuutil.so.1
         /platform/SUNW,Sun-Blade-1000/lib/sparcv9/libc_psr.so.1
         /platform/SUNW,Sun-Blade-1000/lib/sparcv9/libmd5_psr.so.1

I also noticed that you've patched src/lib/libcmd/cat.c. Would I also
need to add -Bstatic before linking with -lcmd?

Ienup

Roland Mainz wrote:
> Hi!
> 
> ----
> 
> Can anyone in the i18n community help verify that the attached patch
> for ksh93 (Korn Shell 93) fixes the problems when inputting/editing text
> in ja_JP.PCK or *.UTF-8 locales, please? We really need some urgent
> feedback on whether the patch fixes this issue...
> 
> Building ksh93 from source:
> 1. Download
> http://svn.genunix.org/repos/on/branches/ksh93/gisburn/scripts/buildksh93.ksh
> - this script builds ksh93 from sources (and also contains instructions
> on how to download the sources via "wget")
> 2. Fetch sources as described in "buildksh93.ksh"
> 3. Edit "buildksh93.ksh" to match the platform (default is Solaris 10 on
> i386 with Sun Studio 10/11)
> 4. Unpack source
> % mkdir build
> % cd build
> % gunzip -c ../ast-ksh.2006-02-14.tgz | tar -xf -
> % gunzip -c ../INIT.2006-01-24.tgz | tar -xf -
> 5. Apply patch:
> % gpatch -p0 <ksh93-shift_ijs.diff.txt
> 6. Build ksh93:
> % time nice ksh ../buildksh93.ksh 2>&1 | tee -a buildlog.log
> 7. Start ksh93:
> % ./arch/sol10.i386/bin/ksh
> # input and/or edit Japanese/Chinese/Korean text and report whether this
> works correctly
> 
> Thanks for the help! :-)
> 
> -------- Original Message --------
> Subject: Re: ksh93 i18n problems on Solaris ? / was: Re:
> [ksh93-integration-discuss]  comments on ksh93 migration plan
> Date: Sun, 09 Apr 2006 22:41:42 +0200
> From: Roland Mainz <roland.mainz at nrubsig.org>
> To: April Chin <April.Chin at eng.sun.com>
> CC: ksh93-integration-discuss at opensolaris.org, robbin.kawabata at sun.com,
> kenjiro.tsuji at sun.com
> References: <200602221818.k1MIIfKE729837 at jurassic.eng.sun.com>
> 
> April Chin wrote:
> 
>>>April Chin wrote:
>>>
>>>>>ksh93 already has support for I18N and for multibyte character
>>>>>handling.  If there is need for change, it should be minimal
>>>>>since currently all error message translation goes through a single
>>>>>interface.
>>>>>
>>>>>The multibyte character handling uses the POSIX mb*() interface.
>>>>
>>>>I believe this may not be working in all respects in ksh93.
>>>>An i18n engineer and I tried a few manual tests on ksh93 with
>>>>multibyte characters, and ksh93 did not appear to recognize some of
>>>>them, possibly those that include an ASCII byte within the multibyte
>>>>character.
>>>
>>>April - did this problem occur only in interactive terminal mode, or also
>>>when a script writes the Japanese/ASCII text mixture (e.g. % cat
>>>"ksh93_echo_japanese.ksh93" | ksh # )?
>>
>>The test was done via a ksh93 script.
>>On a system with the Japanese locales installed, I set up
>>an appropriate locale:
>>
>>% setenv LANG ja_JP.PCK
>>
>>I input a file (sjis.dat, attached below) containing multibyte
>>characters, including ones with an ASCII byte, to a ksh93 script which
>>reads and echoes each line, and then each word in that line.
>>
>>% cat sjis.dat | test.sh
>>
>>where test.sh contains:
>>#!/usr/bin/ksh93
>>read a
>>echo $a
>>
>>for b in $a ;
>>do
>>        echo $b;
>>done
>>
>>Doing the same test with the current Solaris ksh, instead of ksh93,
>>output all the characters as expected.  ksh93 was able to process
>>2 out of 3 multibyte characters which contained an ASCII component.
> 
> 
> See below (and the attached patch, which makes it quite obvious what is
> going wrong... :-) ) ...
> 
> 
>>>Linux may suffer from a similar problem, please read
>>>https://mailman.research.att.com/pipermail/ast-users/2006q1/000838.html
>>>and
>>>https://mailman.research.att.com/pipermail/ast-users/2006q1/000839.html
>>>
>>>BTW: Which terminal emulator did you use? Gnome terminal, konsole,
>>>dtterm or xterm?
>>
>>The i18n engineer provided me with a terminal emulator for which
>>I could turn on sjis mode, the newer mode which will
>>accept multibyte characters which may have an ASCII component.
>>It sounds like the Linux problem is related to the terminal emulator.
> 
> 
> No, the terminal emulator (KDE "konsole") was not the problem (the
> developers of it tested that explicitly) - seems to be a problem in
> ksh93r itself.
> 
> Attached is a patch ("ksh93-shift_ijs.diff.txt"; this patch applies
> against ksh93r) written by Werner Fink/SuSE which fixes the problem for
> both SuSE Linux and Solaris 10 (I tested this with en_US.UTF-8 and
> Japanese text; for ja_JP.PCK I have to wait until next week, when the
> student who knows this locale can run some quick tests herself...).
> Could you please test whether the problems on your side are now fixed?
> 
> The only thing which is interesting now (assuming the patch works): Can
> we somehow make an automated test script for this kind of failure (I'd
> like to see this integrated into the ksh93 test suite)?
> 
> ----
> 
> Bye,
> Roland
> 
> 
> 
> ------------------------------------------------------------------------
> 
> --- src/cmd/ksh93/sh/lex.c
> +++ src/cmd/ksh93/sh/lex.c	2006-04-06 15:58:08.000000000 +0000
> @@ -293,11 +293,14 @@
>  	{
>  		switch(*len = mbsize(_Fcin.fcptr))
>  		{
> -		    case -1: /* bogus multiByte char - parse as bytes? */
> -		    case 0: /* NULL byte */
> +		    case -1:	/* bogus multiByte char - parse as bytes? */
> +		    case 0:	/* NULL byte */
> +				*len = 1;
>  		    case 1:
> -                                lexState = state[curChar=fcget()];
> -                                break;
> +				if ((curChar = fcget()) == 0)
> +					curChar = fcfill();
> +				lexState = state[curChar];
> +				break;
>  		    default:
>  			 /*
>  			 * None of the state tables contain entries
> @@ -1596,6 +1599,36 @@
>  	{
>  		if(n!=S_NL)
>  		{
> +#if SHOPT_MULTIBYTE
> +			if(mbwide())
> +			{
> +				do
> +				{
> +					ssize_t len;
> +					switch((len = mbsize(_Fcin.fcptr)))
> +					{
> +					    case -1:	/* bogus multiByte char - parse as bytes? */
> +					    case 0:	/* NULL byte */
> +					    case 1:
> +						n = state[fcget()];
> +						break;
> +					    default:
> +						/*
> +						 * None of the state tables contain
> +						 * entries for multibyte characters,
> +						 * however, they should be treated
> +						 * the same as any other alph
> +						 * character.  Therefore, we'll use
> +						 * the state of the 'a' character.
> +						 */
> +						mbchar(_Fcin.fcptr);
> +						n = state['a'];
> +					}
> +				}
> +				while(n == 0);
> +			}
> +			else
> +#endif /* SHOPT_MULTIBYTE */
>  			/* skip over regular characters */
>  			while((n=state[fcget()])==0);
>  		}
> --- src/cmd/ksh93/sh/macro.c
> +++ src/cmd/ksh93/sh/macro.c	2006-04-06 16:02:40.000000000 +0000
> @@ -266,7 +266,38 @@
>  	cp = fcseek(0);
>  	while(1)
>  	{
> +#if SHOPT_MULTIBYTE
> +		if(mbwide())
> +		{
> +			do
> +			{
> +				ssize_t len;
> +				switch((len = mbsize(cp)))
> +				{
> +				    case -1:	/* bogus multiByte char - parse as bytes? */
> +				    case 0:	/* NULL byte */
> +				    case 1:
> +					n = state[*(unsigned char*)cp++];
> +					break;
> +				    default:
> +					/*
> +					 * None of the state tables contain
> +					 * entries for multibyte characters,
> +					 * however, they should be treated
> +					 * the same as any other alph
> +					 * character.  Therefore, we'll use
> +					 * the state of the 'a' character.
> +					 */
> +					cp += len;
> +					n = state['a'];
> +				}
> +			}
> +			while(n == 0);
> +		}
> +		else
> +#endif /* SHOPT_MULTIBYTE */
>  		while((n=state[*(unsigned char*)cp++])==0);
> +
>  		if(n==S_NL || n==S_QUOTE || n==S_RBRA)
>  			continue;
>  		if(c=(cp-1)-fcseek(0))
> @@ -395,8 +426,42 @@
>  		cp++;
>  	while(1)
>  	{
> -		while((n=state[*(unsigned char*)cp++])==0);
> -		c = (cp-1) - first;
> +#if SHOPT_MULTIBYTE
> +		if (mbwide())
> +		{
> +			ssize_t len;
> +			do
> +			{
> +				switch((len = mbsize(cp)))
> +				{
> +				    case -1:	/* bogus multiByte char - parse as bytes? */
> +				    case 0:	/* NULL byte */
> +					len = 1;
> +				    case 1:
> +					n = state[*(unsigned char*)cp++];
> +					break;
> +				    default:
> +					/*
> +					 * None of the state tables contain entries
> +					 * for multibyte characters.  However, they
> +					 * should be treated the same as any other
> +					 * alpha character, so we'll use the state
> +					 * which would normally be assigned to the
> +					 * 'a' character.
> +					 */
> +					cp += len;
> +					n = state['a'];
> +				}
> +			}
> +			while(n == 0);
> +			c = (cp-len) - first;
> +                }
> +		else
> +#endif /* SHOPT_MULTIBYTE */
> +		{
> +			while((n=state[*(unsigned char*)cp++])==0);
> +			c = (cp-1) - first;
> +		}
>  		switch(n)
>  		{
>  		    case S_ESC:
> --- src/lib/libcmd/Mamfile
> +++ src/lib/libcmd/Mamfile	2006-04-06 16:08:42.000000000 +0000
> @@ -444,7 +444,7 @@
>  prev cat.c
>  meta cat.o %.c>%.o cat.c cat
>  prev cat.c
> -exec - ${CC} ${mam_cc_FLAGS} ${CCFLAGS} -I. -I${PACKAGE_ast_INCLUDE} -DERROR_CATALOG=\""libcmd"\" -DUSAGE_LICENSE=\""[-author?Glenn Fowler <gsf@research.att.com>][-author?David Korn <dgk@research.att.com>][-copyright?Copyright (c) 1992-2006 AT&T Knowledge Ventures][-license?http://www.opensource.org/licenses/cpl1.0.txt][--catalog?libcmd]"\" -D_PACKAGE_ast -D_BLD_cmd -c cat.c
> +exec - ${CC} ${mam_cc_FLAGS} ${CCFLAGS} -I. -I${PACKAGE_ast_INCLUDE} -DERROR_CATALOG=\""libcmd"\" -DUSAGE_LICENSE=\""[-author?Glenn Fowler <gsf@research.att.com>][-author?David Korn <dgk@research.att.com>][-copyright?Copyright (c) 1992-2006 AT&T Knowledge Ventures][-license?http://www.opensource.org/licenses/cpl1.0.txt][--catalog?libcmd]"\" -D_PACKAGE_ast -D_BLD_cmd -DSHOPT_MULTIBYTE -c cat.c
>  done cat.o generated
>  make chgrp.o
>  prev chgrp.c
> --- src/lib/libcmd/cat.c
> +++ src/lib/libcmd/cat.c	2006-04-06 16:09:45.000000000 +0000
> @@ -133,8 +133,39 @@
>  		while (endbuff)
>  		{
>  			cpold = cp;
> -			/* skip over ASCII characters */
> +			/* skip over ASCII and multi byte characters */
> +#if SHOPT_MULTIBYTE
> +			if(mbwide())
> +			{
> +				do
> +				{
> +					ssize_t len;
> +					switch((len = mbsize(cp)))
> +					{
> +					    case -1:	/* bogus multiByte char - parse as bytes? */
> +					    case 0:	/* NULL byte */
> +					    case 1:
> +						n = states[*cp++];
> +						break;
> +					    default:
> +						/*
> +						 * None of the state tables contain
> +						 * entries for multibyte characters,
> +						 * however, they should be treated
> +						 * the same as any other alph
> +						 * character.  Therefore, we'll use
> +						 * the state of the 'a' character.
> +						 */
> +						cp += len;
> +						n = states['a'];
> +					}
> +				}
> +				while(n == 0);
> +			}
> +			else
> +#endif /* SHOPT_MULTIBYTE */
>  			while ((n = states[*cp++]) == 0);
> +
>  			if (n==T_ENDBUF)
>  			{
>  				if (cp>endbuff)
> 
> 
> 
> ------------------------------------------------------------------------
> 
> _______________________________________________
> i18n-discuss mailing list
> i18n-discuss at opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/i18n-discuss
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 1561
Url: http://oss-beta1.opensolaris.org/pipermail/g11n-zh-hk-discuss/attachments/20060412/c78213b2/attachment.bin

