From roland.mainz at nrubsig.org Tue Apr 11 17:05:40 2006
From: roland.mainz at nrubsig.org (Roland Mainz)
Date: Wed, 12 Apr 2006 02:05:40 +0200
Subject: [g11n-zh-tw-discuss] [Fwd: Re: ksh93 i18n problems on Solaris ? / was: Re: [ksh93-integration-discuss] comments on ksh93 migration plan]
Message-ID: <443C4454.8FE82400@nrubsig.org>

Hi!

----

Can anyone in the i18n community help to verify that the attached patch
for ksh93 (Korn Shell 93) fixes the problems when inputting/editing text
in ja_JP.PCK or *.UTF-8 locales, please ? We really need some urgent
feedback whether the patch fixes this issue...

Building ksh93 from source:
1. Download
http://svn.genunix.org/repos/on/branches/ksh93/gisburn/scripts/buildksh93.ksh
- this script builds ksh93 from sources (and also contains instructions
how to download the sources via "wget")
2. Fetch sources as described in "buildksh93.ksh"
3. Edit "buildksh93.ksh" to match the platform (default is Solaris 10 on
i386 with Sun Studio 10/11)
4. Unpack source
% mkdir build
% cd build
% gunzip -c ../ast-ksh.2006-02-14.tgz | tar -xf -
% gunzip -c ../INIT.2006-01-24.tgz | tar -xf -
5. Apply patch:
% gpatch -p0
6. Build ksh93:
% time nice ksh ../buildksh93.ksh 2>&1 | tee -a buildlog.log
7. Start ksh93:
% ./arch/sol10.i386/bin/ksh
# input and/or edit japanese/chinese/korean text and report whether this
works correctly

Thanks for the help! :-)

-------- Original Message --------
Subject: Re: ksh93 i18n problems on Solaris ? / was: Re: [ksh93-integration-discuss] comments on ksh93 migration plan
Date: Sun, 09 Apr 2006 22:41:42 +0200
From: Roland Mainz
To: April Chin
CC: ksh93-integration-discuss at opensolaris.org,robbin.kawabata at sun.com, kenjiro.tsuji at sun.com
References: <200602221818.k1MIIfKE729837 at jurassic.eng.sun.com>

April Chin wrote:
> > April Chin wrote:
> > > > ksh93 already has support for I18N and for multibyte character
> > > > handling.
> > > > If there is need for change, it should be minimal
> > > > since currently all error message translation goes through a single
> > > > interface.
> > > >
> > > > The multibyte character handling uses the POSIX mb*() interface.
> > > I believe in ksh93 this may not be working in all respects.
> > > An i18n engineer and I tried a few manual tests on ksh93 with
> > > multibyte characters, and ksh93 did not appear to recognize some of
> > > them which may have included an ASCII byte within the multibyte character.
> >
> > April - did this problem occur only in interactive terminal mode or even
> > when a script writes the japanese/ASCII text mixture (e.g. % cat
> > "ksh93_echo_japanese.ksh93" | ksh # ) ?
>
> The test was done via a ksh93 script.
> On a system with the Japanese locales installed, I set up
> an appropriate locale:
>
> % setenv LANG ja_JP.PCK
>
> I input a file (sjis.dat, attached below) containing multibyte
> characters, including ones with an ASCII byte, to a ksh93 script which
> reads and echoes each line, and then each word in that line.
>
> % cat sjis.dat | test.sh
>
> where test.sh contains:
> #!/usr/bin/ksh93
> read a
> echo $a
>
> for b in $a ;
> do
> echo $b;
> done
>
> Doing the same test with the current Solaris ksh, instead of ksh93,
> output all the characters as expected. ksh93 was able to process
> 2 out of 3 multibyte characters which contained an ASCII component.

See below (and the attached patch... the fix is quite obvious what is
going wrong... :-) ) ...

> > Linux may suffer from a similar problem, please read
> > https://mailman.research.att.com/pipermail/ast-users/2006q1/000838.html
> > and
> > https://mailman.research.att.com/pipermail/ast-users/2006q1/000839.html
> >
> > BTW: Which terminal emulator did you use ? Gnome terminal, kconsole,
> > dtterm or xterm ?
>
> The i18n engineer provided me with a terminal emulator for which
> I could turn on sjis mode, the newer mode which will
> accept multibyte characters which may have an ASCII component.
> It sounds like the Linux problem is related to the terminal emulator.

No, the terminal emulator (KDE "konsole") was not the problem (the
developers of it tested that explicitly) - seems to be a problem in
ksh93r itself.

Attached is a patch ("ksh93-shift_ijs.diff.txt" ; this patch applies
against ksh93r) written by Werner Fink/SuSE which fixes the problem for
both SuSE Linux and Solaris 10 (I tested this with en_US.UTF-8 and
japanese text, for ja_JP.PCK I have to wait until next week when the
student who knows this locale can run some quick tests herself...).
Could you please test whether the problems on your side are now fixed ?

The only thing which is interesting now (assuming the patch works): Can
we make somehow an automated test script for this kind of failure (I'd
like to see this integrated into the ksh93 test suite) ?

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) roland.mainz at nrubsig.org
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 7950090
 (;O/ \/ \O;)
-------------- next part --------------
--- src/cmd/ksh93/sh/lex.c
+++ src/cmd/ksh93/sh/lex.c 2006-04-06 15:58:08.000000000 +0000
@@ -293,11 +293,14 @@
     {
         switch(*len = mbsize(_Fcin.fcptr))
         {
-            case -1: /* bogus multiByte char - parse as bytes? */
-            case 0: /* NULL byte */
+            case -1: /* bogus multiByte char - parse as bytes? */
+            case 0: /* NULL byte */
+                *len = 1;
             case 1:
-                lexState = state[curChar=fcget()];
-                break;
+                if ((curChar = fcget()) == 0)
+                    curChar = fcfill();
+                lexState = state[curChar];
+                break;
             default:
                 /*
                  * None of the state tables contain entries
@@ -1596,6 +1599,36 @@
     {
         if(n!=S_NL)
         {
+#if SHOPT_MULTIBYTE
+            if(mbwide())
+            {
+                do
+                {
+                    ssize_t len;
+                    switch((len = mbsize(_Fcin.fcptr)))
+                    {
+                        case -1: /* bogus multiByte char - parse as bytes? */
+                        case 0: /* NULL byte */
+                        case 1:
+                            n = state[fcget()];
+                            break;
+                        default:
+                            /*
+                             * None of the state tables contain
+                             * entries for multibyte characters,
+                             * however, they should be treated
+                             * the same as any other alph
+                             * character. Therefore, we'll use
+                             * the state of the 'a' character.
+                             */
+                            mbchar(_Fcin.fcptr);
+                            n = state['a'];
+                    }
+                }
+                while(n == 0);
+            }
+            else
+#endif /* SHOPT_MULTIBYTE */
             /* skip over regular characters */
             while((n=state[fcget()])==0);
         }
--- src/cmd/ksh93/sh/macro.c
+++ src/cmd/ksh93/sh/macro.c 2006-04-06 16:02:40.000000000 +0000
@@ -266,7 +266,38 @@
     cp = fcseek(0);
     while(1)
     {
+#if SHOPT_MULTIBYTE
+        if(mbwide())
+        {
+            do
+            {
+                ssize_t len;
+                switch((len = mbsize(cp)))
+                {
+                    case -1: /* bogus multiByte char - parse as bytes? */
+                    case 0: /* NULL byte */
+                    case 1:
+                        n = state[*(unsigned char*)cp++];
+                        break;
+                    default:
+                        /*
+                         * None of the state tables contain
+                         * entries for multibyte characters,
+                         * however, they should be treated
+                         * the same as any other alph
+                         * character. Therefore, we'll use
+                         * the state of the 'a' character.
+                         */
+                        cp += len;
+                        n = state['a'];
+                }
+            }
+            while(n == 0);
+        }
+        else
+#endif /* SHOPT_MULTIBYTE */
         while((n=state[*(unsigned char*)cp++])==0);
+
         if(n==S_NL || n==S_QUOTE || n==S_RBRA)
             continue;
         if(c=(cp-1)-fcseek(0))
@@ -395,8 +426,42 @@
     cp++;
     while(1)
     {
-        while((n=state[*(unsigned char*)cp++])==0);
-        c = (cp-1) - first;
+#if SHOPT_MULTIBYTE
+        if (mbwide())
+        {
+            ssize_t len;
+            do
+            {
+                switch((len = mbsize(cp)))
+                {
+                    case -1: /* bogus multiByte char - parse as bytes? */
+                    case 0: /* NULL byte */
+                        len = 1;
+                    case 1:
+                        n = state[*(unsigned char*)cp++];
+                        break;
+                    default:
+                        /*
+                         * None of the state tables contain entries
+                         * for multibyte characters. However, they
+                         * should be treated the same as any other
+                         * alpha character, so we'll use the state
+                         * which would normally be assigned to the
+                         * 'a' character.
+                         */
+                        cp += len;
+                        n = state['a'];
+                }
+            }
+            while(n == 0);
+            c = (cp-len) - first;
+        }
+        else
+#endif /* SHOPT_MULTIBYTE */
+        {
+            while((n=state[*(unsigned char*)cp++])==0);
+            c = (cp-1) - first;
+        }
         switch(n)
         {
             case S_ESC:
--- src/lib/libcmd/Mamfile
+++ src/lib/libcmd/Mamfile 2006-04-06 16:08:42.000000000 +0000
@@ -444,7 +444,7 @@
 prev cat.c
 meta cat.o %.c>%.o cat.c cat
 prev cat.c
-exec - ${CC} ${mam_cc_FLAGS} ${CCFLAGS} -I. -I${PACKAGE_ast_INCLUDE} -DERROR_CATALOG=\""libcmd"\" -DUSAGE_LICENSE=\""[-author?Glenn Fowler ][-author?David Korn ][-copyright?Copyright (c) 1992-2006 AT&T Knowledge Ventures][-license?http://www.opensource.org/licenses/cpl1.0.txt][--catalog?libcmd]"\" -D_PACKAGE_ast -D_BLD_cmd -c cat.c
+exec - ${CC} ${mam_cc_FLAGS} ${CCFLAGS} -I. -I${PACKAGE_ast_INCLUDE} -DERROR_CATALOG=\""libcmd"\" -DUSAGE_LICENSE=\""[-author?Glenn Fowler ][-author?David Korn ][-copyright?Copyright (c) 1992-2006 AT&T Knowledge Ventures][-license?http://www.opensource.org/licenses/cpl1.0.txt][--catalog?libcmd]"\" -D_PACKAGE_ast -D_BLD_cmd -DSHOPT_MULTIBYTE -c cat.c
 done cat.o generated
 make chgrp.o
 prev chgrp.c
--- src/lib/libcmd/cat.c
+++ src/lib/libcmd/cat.c 2006-04-06 16:09:45.000000000 +0000
@@ -133,8 +133,39 @@
     while (endbuff)
     {
         cpold = cp;
-        /* skip over ASCII characters */
+        /* skip over ASCII and multi byte characters */
+#if SHOPT_MULTIBYTE
+        if(mbwide())
+        {
+            do
+            {
+                ssize_t len;
+                switch((len = mbsize(cp)))
+                {
+                    case -1: /* bogus multiByte char - parse as bytes? */
+                    case 0: /* NULL byte */
+                    case 1:
+                        n = states[*cp++];
+                        break;
+                    default:
+                        /*
+                         * None of the state tables contain
+                         * entries for multibyte characters,
+                         * however, they should be treated
+                         * the same as any other alph
+                         * character. Therefore, we'll use
+                         * the state of the 'a' character.
+                         */
+                        cp += len;
+                        n = states['a'];
+                }
+            }
+            while(n == 0);
+        }
+        else
+#endif /* SHOPT_MULTIBYTE */
         while ((n = states[*cp++]) == 0);
+
         if (n==T_ENDBUF)
         {
             if (cp>endbuff)

From Ienup.Sung at Sun.COM Tue Apr 11 18:00:43 2006
From: Ienup.Sung at Sun.COM (Ienup Sung)
Date: Tue, 11 Apr 2006 18:00:43 -0700
Subject: [g11n-zh-tw-discuss] Re: [i18n-discuss] [Fwd: Re: ksh93 i18n problems on Solaris ? / was: Re: [ksh93-integration-discuss] comments on ksh93 migration plan]
In-Reply-To: <443C4454.8FE82400@nrubsig.org>
References: <443C4454.8FE82400@nrubsig.org>
Message-ID: <443C513B.3070707@sun.com>

Hello Roland,

If you have binaries that people can pick up and use to do the test,
that'll be really great. Would you please let us know if you have binaries?

If you have and need a temporary location for people to download, please
also let me know. We have a temp directory at the community web page that
people can download your binaries.

Ienup

From roland.mainz at nrubsig.org Wed Apr 12 12:27:35 2006
From: roland.mainz at nrubsig.org (Roland Mainz)
Date: Wed, 12 Apr 2006 21:27:35 +0200
Subject: [g11n-zh-tw-discuss] Re: [i18n-discuss] [Fwd: Re: ksh93 i18n problems on Solaris ? / was:Re: [ksh93-integration-discuss] comments on ksh93 migration plan]
References: <443C4454.8FE82400@nrubsig.org> <443C513B.3070707@sun.com>
Message-ID: <443D54A7.F06393B9@nrubsig.org>

Ienup Sung wrote:
> If you have binaries that people can pick up and use to do the test,
> that'll be really great. Would you please let us know if you have binaries?

I can make binaries for Solaris 10 x86 and Solaris 8 SPARC (which leads
to the problem that I am currently working from home and do not have any
Solaris 10 SPARC machine around to build OS/Net - any idea who may be
able to help (e.g. remote access to Solaris 10 SPARC machine with Sun
Studio 10 or 11 installed)) ...

> If you have and need a temporary location for people to download, please
> also let me know.
> We have a temp directory at the community web page that
> people can download your binaries.

BTW: Do you know whether there is any per-opensolaris.org FTP/HTTP area
where I could store such downloads ? svn.genunix.org would be one option
- but right now I'd like to avoid spamming their Subversion repository
with giant binary archives...

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) roland.mainz at nrubsig.org
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 7950090
 (;O/ \/ \O;)

From Ienup.Sung at Sun.COM Wed Apr 12 12:37:07 2006
From: Ienup.Sung at Sun.COM (Ienup Sung)
Date: Wed, 12 Apr 2006 12:37:07 -0700
Subject: [g11n-zh-tw-discuss] Re: [i18n-discuss] [Fwd: Re: ksh93 i18n problems on Solaris ? / was:Re: [ksh93-integration-discuss] comments on ksh93 migration plan]
In-Reply-To: <443D54A7.F06393B9@nrubsig.org>
References: <443C4454.8FE82400@nrubsig.org> <443C513B.3070707@sun.com> <443D54A7.F06393B9@nrubsig.org>
Message-ID: <443D56E3.9080606@sun.com>

Roland Mainz wrote:
> I can make binaries for Solaris 10 x86 and Solaris 8 SPARC (which leads
> to the problem that I am currently working from home and do not have any
> Solaris 10 SPARC machine around to build OS/Net - any idea who may be
> able to help (e.g. remote access to Solaris 10 SPARC machine with Sun
> Studio 10 or 11 installed)) ...

Oh. In that case, let me try to build test binaries for both platforms of
S10 and I'll upload them at the I18N and L10N community's temp directory.
(I was hoping you might have the test binaries handy.)

> BTW: Do you know whether there is any per-opensolaris.org FTP/HTTP area
> where I could store such downloads ? svn.genunix.org would be one option
> - but right now I'd like to avoid spamming their Subversion repository
> with giant binary archives...

I don't know if we have such "sandbox" location in OSo. I cc'd Jim and also
Derek in this email. (I noticed that Eric's already in the Cc: header field.)
For this particular purpose, I'll upload the test binaries at:

http://www.opensolaris.org/os/community/int_localization/tmp/

I have a meeting this afternoon, though, so most likely I'll be able to
upload sometime tonight.

Ienup

From Ienup.Sung at Sun.COM Wed Apr 12 19:42:05 2006
From: Ienup.Sung at Sun.COM (Ienup Sung)
Date: Wed, 12 Apr 2006 19:42:05 -0700
Subject: [g11n-zh-tw-discuss] Re: [i18n-discuss] [Fwd: Re: ksh93 i18n problems on Solaris ? / was: Re: [ksh93-integration-discuss] comments on ksh93 migration plan]
In-Reply-To: <443C4454.8FE82400@nrubsig.org>
References: <443C4454.8FE82400@nrubsig.org>
Message-ID: <443DBA7D.5040006@sun.com>

Hello everyone, Roland,

I just built the ksh93 as Roland instructed and uploaded the binaries at
the following location:

http://www.opensolaris.org/os/community/int_localization/tmp/

The binaries have been built with Solaris 10 and SOS10 C compiler and so
they should work for S10 and SX releases. There is also readme.txt file that
people can take a look at to see some more info on how to extract and what to
test.

I also briefly tested the ksh with en_US.UTF-8 locale and found the result
a bit of mixed bag. I was able to input most of CJK, Arabic, Greek, Cyrillic
characters and also view all Unicode characters (for that, by doing
"more /usr/pub/UTF-8") but I wasn't able to input some accented characters
such as ?, ?, and so on. I was, however, able to input ?, ?, and so on.
(The attached script file shows the test log that I did. The same for i386
version of ksh too.)
I did "ldd ksh" after the build and it showed the following by the way:

system% ldd arch/sol10.sun4/bin/ksh
        libm.so.2 => /lib/64/libm.so.2
        libsecdb.so.1 => /lib/64/libsecdb.so.1
        libc.so.1 => /lib/64/libc.so.1
        libnsl.so.1 => /lib/64/libnsl.so.1
        libcmd.so.1 => /lib/64/libcmd.so.1
        libmp.so.2 => /lib/64/libmp.so.2
        libmd5.so.1 => /lib/64/libmd5.so.1
        libscf.so.1 => /lib/64/libscf.so.1
        libdoor.so.1 => /lib/64/libdoor.so.1
        libuutil.so.1 => /lib/64/libuutil.so.1
        /platform/SUNW,Sun-Blade-1000/lib/sparcv9/libc_psr.so.1
        /platform/SUNW,Sun-Blade-1000/lib/sparcv9/libmd5_psr.so.1

And I also noticed that you've src/lib/libcmd/cat.c patched. Would I also
need to add -Bstatic before linking with the -lcmd?

Ienup

Roland Mainz wrote:
> Hi!
>
> ----
>
> Can anyone in the i18n community help to verify that the attached patch
> for ksh93 (Korn Shell 93) fixes the problems when inputting/editing text
> in ja_JP.PCK or *.UTF-8 locales, please ? We really need some urgent
> feedback whether the patch fixes this issue...
>
> Building ksh93 from source:
> 1. Download
> http://svn.genunix.org/repos/on/branches/ksh93/gisburn/scripts/buildksh93.ksh
> - this script builds ksh93 from sources (and also contains instructions
> how to download the sources via "wget")
> 2. Fetch sources as described in "buildksh93.ksh"
> 3. Edit "buildksh93.ksh" to match the platform (default is Solaris 10 on
> i386 with Sun Studio 10/11)
> 4. Unpack source
> % mkdir build
> % cd build
> % gunzip -c ../ast-ksh.2006-02-14.tgz | tar -xf -
> % gunzip -c ../INIT.2006-01-24.tgz | tar -xf -
> 5. Apply patch:
> % gpatch -p0
> 6. Build ksh93:
> % time nice ksh ../buildksh93.ksh 2>&1 | tee -a buildlog.log
> 7. Start ksh93:
> % ./arch/sol10.i386/bin/ksh
> # input and/or edit japanese/chinese/korean text and report whether this
> works correctly
>
> Thanks for the help! :-)

-------------- next part --------------
A non-text attachment was scrubbed...
Name: ksh93-sparc-test-log.txt
Type: application/text
Size: 1561 bytes
Desc: not available
URL:

From roland.mainz at nrubsig.org Thu Apr 13 07:04:54 2006
From: roland.mainz at nrubsig.org (Roland Mainz)
Date: Thu, 13 Apr 2006 16:04:54 +0200
Subject: [g11n-zh-tw-discuss] Re: [i18n-discuss] [Fwd: Re: ksh93 i18n problems on Solaris ?
    / was:Re: [ksh93-integration-discuss] comments on ksh93 migration plan]
References: <443C4454.8FE82400@nrubsig.org> <443DBA7D.5040006@sun.com>
Message-ID: <443E5A86.EA81545C@nrubsig.org>

Ienup Sung wrote:
> I just built the ksh93 as Roland instructed and uploaded the binaries at
> the following location:
>
> http://www.opensolaris.org/os/community/int_localization/tmp/

Thanks! :-)

> The binaries have been built with Solaris 10 and SOS10 C compiler and so
> they should work for S10 and SX releases. There is also readme.txt file that
> people can take a look at to see some more info on how to extract and what to
> test.
>
> I also briefly tested the ksh with en_US.UTF-8 locale and found the result
> a bit of mixed bag. I was able to input most of

What do you mean with "most of" ? Were there any failures with
characters in these languages?

> CJK, Arabic, Greek, Cyrillic
> characters and also view all Unicode characters (for that, by doing
> "more /usr/pub/UTF-8") but I wasn't able to input some accented characters
> such as ??, ??, and so on.

Which terminal emulator did you use ? The Gnome terminal ? Does it work
with other terminal emulators (e.g. "dtterm" or KDE's "konsole") ?
How does normal Solaris ksh behave for such characters (or bash3) ?
BTW: Did you enable "emacs" or "vi" editing mode (e.g. % set -o emacs #)
before entering the chars ?

> I was, however, able to input ?,, ??, and so on.

Weird. Any idea what is going wrong here (except the list of "usual
suspects" listed above) ?

> (The attached script file shows the test log that I did. The same for i386
> version of ksh too.)
>
> I did "ldd ksh" after the build and it showed the following by the way:

BTW: You built the 64bit version on SPARC... did you get the same issues
with accented characters when you run the 32bit version ?
> system% ldd arch/sol10.sun4/bin/ksh
>         libm.so.2 => /lib/64/libm.so.2
>         libsecdb.so.1 => /lib/64/libsecdb.so.1
>         libc.so.1 => /lib/64/libc.so.1
>         libnsl.so.1 => /lib/64/libnsl.so.1
>         libcmd.so.1 => /lib/64/libcmd.so.1
>         libmp.so.2 => /lib/64/libmp.so.2
>         libmd5.so.1 => /lib/64/libmd5.so.1
>         libscf.so.1 => /lib/64/libscf.so.1
>         libdoor.so.1 => /lib/64/libdoor.so.1
>         libuutil.so.1 => /lib/64/libuutil.so.1
>         /platform/SUNW,Sun-Blade-1000/lib/sparcv9/libc_psr.so.1
>         /platform/SUNW,Sun-Blade-1000/lib/sparcv9/libmd5_psr.so.1
>
> And I also noticed that you've src/lib/libcmd/cat.c patched.

That's the builtin version of /bin/cat ...

> Would I also
> need to add -Bstatic before linking with the -lcmd?

The libcmd issue is slightly different. ksh93 has its libcmd and
Solaris has its own libcmd.
- For "buildksh93.ksh" I am simply linking libcmd.a (due to the lack of a
shared library this should work (or not... I really didn't check this
but I think you're right with the -Bstatic thing... ;-/ ) - and for the
non-shared version of ksh93 (e.g. without separate libast, libdll,
libshell etc. libraries) it definitely works as this is the normal way
how the kornshell.com people build it).
- For OS/Net I have a merged version of libcmd which contains both
versions (see
http://mail.opensolaris.org/pipermail/ksh93-integration-discuss/2006-March/000172.html).
IMO it's better to ship the non-shared version for testing purposes
which does not depend on all the other libraries...

BTW: Does Sun have any automated i18n test scripts for shells ?

----

Bye,
Roland

-- 
  __ .  . __
 (o.\ \/ /.o) roland.mainz at nrubsig.org
  \__\/\/__/  MPEG specialist, C&&JAVA&&Sun&&Unix programmer
  /O /==\ O\  TEL +49 641 7950090
 (;O/ \/ \O;)

From Ienup.Sung at Sun.COM Thu Apr 13 11:36:16 2006
From: Ienup.Sung at Sun.COM (Ienup Sung)
Date: Thu, 13 Apr 2006 11:36:16 -0700
Subject: [g11n-zh-tw-discuss] Re: [i18n-discuss] [Fwd: Re: ksh93 i18n problems on Solaris ?
    / was:Re: [ksh93-integration-discuss] comments on ksh93 migration plan]
In-Reply-To: <443E5A86.EA81545C@nrubsig.org>
References: <443C4454.8FE82400@nrubsig.org> <443DBA7D.5040006@sun.com> <443E5A86.EA81545C@nrubsig.org>
Message-ID: <443E9A20.1070705@sun.com>

Roland Mainz wrote:
>>I also briefly tested the ksh with en_US.UTF-8 locale and found the result
>>a bit of mixed bag. I was able to input most of
>
>
> What do you mean with "most of" ? Were there any failures with
> characters in these languages?

Hello Roland,

I have not tested input of all possible CJK and such languages but
briefly typed in some notable characters. And so my test wasn't covering
all possible characters or byte patterns in them. I hoped to make sure to
inform you about that.

>>CJK, Arabic, Greek, Cyrillic
>>characters and also view all Unicode characters (for that, by doing
>>"more /usr/pub/UTF-8") but I wasn't able to input some accented characters
>>such as ??, ??, and so on.
>
>
> Which terminal emulator did you use ? The Gnome terminal ? Does it work
> with other terminal emulators (e.g. "dtterm" or KDE's "konsole") ?
> How does normal Solaris ksh behave for such characters (or bash3) ?
> BTW: Did you enable "emacs" or "vi" editing mode (e.g. % set -o emacs #)
> before entering the chars ?

I used gnome-terminal, dtterm, and xterm at Solaris 10. If I use /usr/bin/csh
or /usr/bin/ksh, then, I don't have any problem. With the ksh93, I wasn't able
to input some of the accented characters as I noted:

ä (0xc3 0xa4, a with diaeresis)
ç (0xc3 0xa7, c with cedilla)
¡ (0xc2 0xa1, inverted !)

I haven't tested all Latin-1 characters but it appears I cannot input Latin-1
characters of Unicode (U+0080 - U+00FF).

I also just re-did some of the tests that I did yesterday and attached
the screen shots in this email. Hope everyone doesn't mind me sending these
JPEG files (about 110KB) in this email...

I didn't use any editing software but just the shell as shown in the log file
I sent out.
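The byte pairs listed above can also be pushed through a shell non-interactively, which helps separate script-mode handling from the interactive line editor. The following is a minimal sketch, assuming a POSIX printf and od are available. The helper name "roundtrip" and the variable SHELL_UNDER_TEST are hypothetical, and /bin/sh is only a stand-in for the ksh93 binary to be checked. It exercises piped input only, so it would not catch a failure that occurs solely when typing at the prompt.

```shell
#!/bin/sh
# Sketch only: feed a known byte sequence to a shell through a pipe and dump
# what comes back as hex. "roundtrip" is a hypothetical helper, not part of
# ksh93 or its test suite.

# $1 = shell binary under test, $2 = bytes written as printf octal escapes
roundtrip() {
    printf "$2\n" |
        "$1" -c 'IFS= read -r a; printf "%s\n" "$a"' |
        od -An -tx1 | tr -d ' \n'
    echo
}

# Stand-in shell (assumption); point this at the ksh93 binary to be checked.
SHELL_UNDER_TEST=${SHELL_UNDER_TEST:-/bin/sh}

# The three UTF-8 sequences from the list above, plus 0x95 0x5c, a Shift_JIS
# (PCK) character whose second byte is the ASCII backslash.
for seq in '\303\244' '\303\247' '\302\241' '\225\134'; do
    roundtrip "$SHELL_UNDER_TEST" "$seq"
done
```

A shell that passes the bytes through unchanged echoes each sequence back in hex followed by 0a for the newline, so any other output for a given sequence points at the read/echo path rather than at the terminal emulator.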
>>I was, however, able to input ?,, ??, and so on.
>
>
> Weird. Any idea what is going wrong here (except the list of "usual
> suspects" listed above) ?

At this point and personally I don't have any clue since I've not looked at
the source code. What is certain is that there appear to be at least two
problems:

1. I cannot input Latin-1 characters of UTF-8 locales.

2. Error messages from the ksh93 appear to have gone through something that
is either not 8-bit clean or performs some form of conversion.

As I mentioned though, I was having no problem whatsoever concatenating or
doing "more" on the entire Unicode characters especially for all printable
characters in the range of U+0000 and U+10FFFF.

I'd like to ask other language experts to also try it out in their favorite
locales and report back. Once some more data are collected, I think I or
anyone in this group could debug and also code review the patches, find out
why there are problems, and suggest a possible fix if it is a bug in the ksh93
source.

> BTW: You built the 64bit version on SPARC... did you get the same issues
> with accented characters when you run the 32bit version ?

The x86 version is an i386 binary (32-bit). As I reported, yes, I saw
the same problems with the ksh-i386 binary.

>>Would I also
>>need to add -Bstatic before linking with the -lcmd?
>
>
> The libcmd issue is slightly different. ksh93 has its libcmd and
> Solaris has its own libcmd.
> - For "buildksh93.ksh" I am simply linking libcmd.a (due to the lack of a
> shared library this should work (or not... I really didn't check this
> but I think you're right with the -Bstatic thing... ;-/ ) - and for the
> non-shared version of ksh93 (e.g. without separate libast, libdll,
> libshell etc. libraries) it definitely works as this is the normal way
> how the kornshell.com people build it).
> - For OS/Net I have a merged version of libcmd which contains both
> versions (see
> http://mail.opensolaris.org/pipermail/ksh93-integration-discuss/2006-March/000172.html).
> IMO it's better to ship the non-shared version for testing purposes
> which does not depend on all the other libraries...

Apparently the ksh built with your build script records that it needs
shared object libraries, which I think is not good unless the libcmd in the
two versions is 100% identical. Please let me know whether the two versions
of the library are 100% compatible, so that it does not matter which one the
ksh93 binaries use. (Depending on your answer, I may need to rebuild the
ksh93 binaries before people do the test.)

>
> BTW: Does Sun have any automated i18n test scripts for shells ?

I do not know. I hope people on this mailing list can answer that question.

Hi Ales, would you know, or know someone who can answer this question,
please?

Ienup
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ksh-test-gnome-terminal.jpg
Type: image/jpeg
Size: 46762 bytes
Desc: not available
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ksh-test-dtterm.jpg
Type: image/jpeg
Size: 67413 bytes
Desc: not available

From Ienup.Sung at Sun.COM Fri Apr 14 17:49:47 2006
From: Ienup.Sung at Sun.COM (Ienup Sung)
Date: Fri, 14 Apr 2006 17:49:47 -0700
Subject: [g11n-zh-tw-discuss] Any volunteer to fix ksh93 multibyte character input issue? (was: Re: [i18n-discuss] [Fwd: Re: ksh93 i18n problems on Solaris ? / was: Re: [ksh93-integration-discuss] comments on ksh93 migration plan])
In-Reply-To: <443C4454.8FE82400@nrubsig.org>
References: <443C4454.8FE82400@nrubsig.org>
Message-ID: <4440432B.6030209@sun.com>

Hello folks,

After some separate email exchanges with Dr. Werner Fink and Roland Mainz,
we seem to agree that ksh93 with the current patch fixes the script/batch
mode of ksh but does not entirely fix the multibyte input issues.
And so I believe we need a volunteer who will identify the source of the
multibyte input problem and contribute a source patch for it to the
ksh93-integration project. If anyone would like to tackle this problem and
supply the patch, please let the mailing list know, and thanks very much.

The email below and the attached diff file from Roland show how to get the
ksh93 source and build ksh93 on your system.

To reproduce the multibyte input problem, please do the following:

1. Log in to CDE with any one of the UTF-8 locales, such as en_US.UTF-8.
   (You can also log in to the JDS desktop with the UTF-8 locale.)

2. Start a dtterm or gnome-terminal.

3. Assuming that you have the ksh93 binary, at the ksh93 prompt, input
   Latin-1 accented characters such as:

   <"> press and release one by one for ?

   and so on and hit a return key. (If you don't have key, you can
   substitute the key with ++ keys pressing and releasing all together
   as one.)

You will see something like the following:

$ ^A^A
./ksh: ^A^A: not found [No such file or directory]

The right thing to happen would be:

$ ?
./ksh: ?: not found [No such file or directory]

For any other Latin-1 compose key sequences or input methods of UTF-8
locales, please refer to the following:

http://docs.sun.com/app/docs/doc/817-2521/6mi67tj51?a=view
http://docs.sun.com/app/docs/doc/817-2521/6mi67tj4s?a=view

Ienup

Roland Mainz wrote:
> Hi!
>
> ----
>
> Can anyone in the i18n community help to verify that the attached patch
> for ksh93 (Korn Shell 93) fixes the problems when inputting/editing text
> in ja_JP.PCK or *.UTF-8 locales, please ? We really need some urgent
> feedback whether the patch fixes this issue...
>
> Building ksh93 from source:
> 1. Download
> http://svn.genunix.org/repos/on/branches/ksh93/gisburn/scripts/buildksh93.ksh
> - this script builds ksh93 from sources (and also contains instructions
> how to download the sources via "wget")
> 2. Fetch sources as described in "buildksh93.ksh"
> 3. Edit "buildksh93.ksh" to match the platform (default is Solaris 10 on
> i386 with Sun Studio 10/11)
> 4. Unpack source
> % mkdir build
> % cd build
> % gunzip -c ../ast-ksh.2006-02-14.tgz | tar -xf -
> % gunzip -c ../INIT.2006-01-24.tgz | tar -xf -
> 5. Apply patch:
> % gpatch -p0
> 6. Build ksh93:
> % time nice ksh ../buildksh93.ksh 2>&1 | tee -a buildlog.log
> 7. Start ksh93:
> % ./arch/sol10.i386/bin/ksh
> # input and/or edit japanese/chinese/korean text and report whether this
> works correctly
>
> Thanks for the help! :-)
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ksh93-shift_ijs.diff.txt
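The interactive failure has to be reproduced by hand as described earlier in this message, but the "not found" message can also be checked from a script. The following is only a sketch: check_not_found_bytes and the KSH93 variable are names invented here, the binary path is the one from the quoted build steps, and the check exercises script mode only, not the interactive line editor.

```sh
#!/bin/sh
# Hypothetical helper, for illustration only: run a shell non-interactively
# with a command name made of the UTF-8 bytes for U+00E4 (0xc3 0xa4) and
# report whether the "not found" message echoes those bytes back unchanged.
check_not_found_bytes() {
    shell=$1
    name=$(printf '\303\244')      # a with diaeresis, as raw bytes
    # Capture stderr only. The command is expected to fail, so its exit
    # status is deliberately ignored.
    msg=$("$shell" -c "$name" 2>&1 >/dev/null) || true
    case $msg in
        *"$name"*) echo "bytes preserved" ;;
        *)         echo "bytes altered"; return 1 ;;
    esac
}

# Assumed location of the freshly built binary; adjust as needed.
KSH93=${KSH93:-./arch/sol10.i386/bin/ksh}
if [ -x "$KSH93" ]; then
    check_not_found_bytes "$KSH93"
fi
```

If this reports "bytes preserved" while typing the same character at the prompt still yields ^A^A, that would suggest the remaining problem sits in the interactive input path rather than in the error message output.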