[ksh93-integration-discuss] ksh93 "printf" builtin vs. CR #6558816 ("printf variants behaving incorrectly for multibyte decimal point") ...
Glenn Fowler
gsf at research.att.com
Mon Jan 14 19:00:36 PST 2008
ast uses localeconv(3), but it assumes that the struct lconv char* elements
point to a single byte.
We will recode it to properly treat the returned members as 0-terminated strings;
it will then be up to the native localeconv() implementation
to do the right thing.
I just checked on sol11: for LC_ALL=ar_SA.UTF-8,
struct lconv decimal_point = ","
-- Glenn Fowler -- AT&T Research, Florham Park NJ --
On Tue, 15 Jan 2008 00:52:19 +0100 Roland Mainz wrote:
> I'm currently trying to figure out whether
> http://bugs.opensolaris.org/view_bug.do?bug_id=6558816 ("printf variants
> behaving incorrectly for multibyte decimal point") applies to the ksh93
> "printf" builtin command, too.
> The (public) description for CR #6558816 says:
> -- snip --
> In snv_62 for ar_EG.UTF-8/ar_SA.UTF-8 locales the decimal point is
> defined as 0x066b, the Arabic decimal point. This is a multibyte
> character with UTF-8 representation 0xd9 0xab.
> Compiling and running the following program in the ar_EG.UTF-8/ar_SA.UTF-8
> locales:
> #include <stdio.h>
> #include <stdlib.h>
> #include <locale.h>
> int main(void) {
> float g=10.111;
> setlocale(LC_ALL,""); /* take LC_NUMERIC from the environment */
> printf("%f\n", g);
> return 0;
> }
> # ./a.out | od -x
> gives:
> 0000000 3130 d931 3131 3130 300a
> which is not correct: the decimal point is truncated to its first byte (0xd9).
> The correct output should be:
> 0000000 3130 d9ab 3131 3131 3030 0a
>
> According to man -s 3C printf:
> All forms of the printf() functions allow for the insertion
> of a language-dependent radix character in the output
> string. The radix character is defined by the program's
> locale (category LC_NUMERIC). In the POSIX locale, or in a
> locale where the radix character is not defined, the radix
> character defaults to a period (.).
> -- snip --
> ast-ksh.2008-01-06 returns the following output:
> -- snip --
> $ ksh93 -c 'float i=10.111 ; export LC_ALL=ar_EG.UTF-8 ; printf "%f" i' | od -t x1
> 0000000 31 30 2c 31 31 31 30 30 30
> 0000011
> -- snip --
> ... i.e. it uses a comma (',') and not the Unicode character U+066B ...
> Kenjiro: Which API should be used to obtain the multibyte character
> value of the Arabic decimal point? (Note that ksh93 uses
> |libast::printf()| and not Solaris's |libc::printf()|, i.e. any fix for
> Solaris libc needs to be ported to |libast::printf()|, too.)
> BTW: Is it a bug that "$ LC_ALL=ar_SA.UTF-8 locale -k decimal_point"
> returns a comma (',')?