
[debian-devel:13814] Re: iconv()

From: Tomohiro KUBOTA <tkubota@xxxxxxxxxxx>
Subject: [debian-devel:13814] Re: iconv()
Date: Tue, 27 Feb 2001 12:15:39 +0900
References: <877l2chmzg.wl@piccolo>
Message-id: <871yskstfh.wl@xxxxxxxxxxxxxxxxxxxxx>
X-mail-count: 13814
User-agent: Wanderlust/1.1.1 (Purple Rain) EMY/1.13.8 (Tastes differ) FLIM/1.13.2 (Kasanui) APEL/10.2 Emacs/20.7 (i386-debian-linux-gnu) MULE/4.1 (AOI)

This is Kubota.

At Tue, 27 Feb 2001 11:34:16 +0900,
Takashi Okamoto <toraneko@xxxxxxxxx> wrote:

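> Apparently glibc 2.1 and glibc 2.2 differ in how they convert xxx -> utf-16.
> In glibc 2.2, something called a BOM is prepended as the first 2 bytes, which
> makes it possible to tell big endian from little endian.
> 
> If you don't want that, it seems you can use utf-16be or utf-16le. However,
> with glibc <= 2.1.x, utf-16be and utf-16le are not available, and no BOM is
> added either.
> (I think the Unicode 3.0 documentation said something about the BOM... (I forget))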

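Rumor has it that FreeBSD's iconv() adds a BOM even when converting to UCS-4.
I am currently working on internationalizing XTerm,
http://www.debian.or.jp/~kubota/xterm.html
and I really wish someone would sort this whole area out. Admittedly,
I can see why Markus and others would want to rely on
__STDC_ISO_10646__, simply assume that wchar_t is UCS-4, and use
the mb/wc conversion functions.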
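
To make the BOM issue concrete, here is a minimal sketch in Python. It uses
Python's own codecs, not glibc's or FreeBSD's iconv(), so it only illustrates
the byte layout under discussion. The helper name split_bom and its width
parameter are hypothetical. The caller states the code-unit width (2 for
UTF-16, 4 for UCS-4/UTF-32) because the little-endian UTF-32 BOM begins with
the same two bytes as the little-endian UTF-16 BOM.

```python
import codecs

# BOMs by code-unit width in bytes: 2 for UTF-16, 4 for UCS-4/UTF-32.
_BOMS = {
    2: ((codecs.BOM_UTF16_BE, "big"), (codecs.BOM_UTF16_LE, "little")),
    4: ((codecs.BOM_UTF32_BE, "big"), (codecs.BOM_UTF32_LE, "little")),
}


def split_bom(data, width):
    """Return (byte_order, payload); byte_order is None if no BOM is present.

    Hypothetical helper for illustration only.
    """
    for bom, order in _BOMS[width]:
        if data.startswith(bom):
            return order, data[len(bom):]
    return None, data


# Python's "utf-16" codec prepends a BOM in native byte order,
# while "utf-16-be" fixes the byte order and writes no BOM.
with_bom = "A".encode("utf-16")
without_bom = "A".encode("utf-16-be")
print(split_bom(with_bom, 2))
print(split_bom(without_bom, 2))
```

Code that has to cope with converters that may or may not emit a BOM could
normalize their output this way before interpreting the code units.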

# As for who Markus is, see the recent logs of the i18n@xfree86
# mailing list:
# http://www.xfree86.org/mailman/listinfo/i18n
# He is a radical who wants to abolish all non-Unicode
# encodings as quickly as possible.
# If his opinion prevails, EUC-JP and the like might
# become unusable in XFree86 and elsewhere,
# which worries me.


> Also, when converting from Shift_JIS to Unicode, you can get
> 
>    TILDE ('~', U+007E) -> OVERLINE (U+203E).
>    BACKSLASH ('\', U+005C)  -> YEN (U+00A5).
> 
> so you need to be careful.

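If you implement the definition of Shift_JIS exactly as written, this is the
natural result, because 0x21-0x7e in Shift_JIS is JIS X 0201 Roman.
(In EUC-JP, 0x21-0x7e is ASCII.)

Still, even if that is natural in theory, it is odd in practice,
so I wonder what should be done about it.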
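
The strict reading can be written down as a small table lookup. The following
is only a sketch: the function name jisx0201_roman_to_unicode is hypothetical,
and it relies on the fact that JIS X 0201 Roman differs from ASCII at exactly
two positions, 0x5C (yen sign) and 0x7E (overline).

```python
def jisx0201_roman_to_unicode(byte):
    """Map one JIS X 0201 Roman byte (0x21-0x7E) to a Unicode code point.

    Hypothetical helper illustrating the strict mapping; a practical
    converter might deliberately treat these bytes as ASCII instead.
    """
    if not 0x21 <= byte <= 0x7E:
        raise ValueError("not a JIS X 0201 Roman graphic byte: %#x" % byte)
    if byte == 0x5C:
        return 0x00A5  # YEN SIGN sits where ASCII has the backslash
    if byte == 0x7E:
        return 0x203E  # OVERLINE sits where ASCII has the tilde
    return byte        # every other position matches ASCII


for b in (0x41, 0x5C, 0x7E):
    print("0x%02X -> U+%04X" % (b, jisx0201_roman_to_unicode(b)))
```

A converter that follows the practical convention instead would simply return
the byte unchanged for the whole 0x21-0x7E range, as for EUC-JP.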


> # Actually, is all of this already written up somewhere?
> # If someone is going to put a summary together, I'd like this information included too...

I believe I covered this in "Introduction to I18N".


> Also, is there no way to convert text one character at a time using iconv?
> It can be done, more or less, with something like the following (for the
> xxx -> utf-16 case)....

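Please see the XTerm internationalization patch mentioned above. The three or
so functions near the end of charsets.c deal with this. That said, I myself
have no idea how far I need to go to achieve reasonable portability.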
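
The following is not the code from charsets.c. It is a minimal Python sketch
of the general idea behind character-at-a-time conversion: feed the converter
one byte at a time and emit a character as soon as a multibyte sequence is
complete. The name iter_chars is hypothetical. With iconv() the analogous
signal is a return value of (size_t)-1 with errno set to EINVAL, which
indicates an incomplete multibyte sequence at the end of the input.

```python
import codecs


def iter_chars(data, encoding):
    """Yield decoded characters one at a time from a byte string.

    Hypothetical helper: bytes are fed singly to an incremental decoder,
    which buffers an incomplete multibyte sequence until it is complete.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    for i in range(len(data)):
        # Returns "" while a multibyte sequence is still incomplete.
        yield from decoder.decode(data[i:i + 1])
    # Flush; raises UnicodeDecodeError if the input ended mid-character.
    yield from decoder.decode(b"", final=True)


sample = "Aあ".encode("euc_jp")
for ch in iter_chars(sample, "euc_jp"):
    print("U+%04X" % ord(ch))
```

Stateful encodings and encodings with shift sequences need more care than
this sketch takes, which is part of why portability is hard to judge.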


---
久保田智広 Tomohiro KUBOTA <kubota@debian.org>
http://surfchem0.riken.go.jp/~kubota/
Currently being revised: "Introduction to I18N"
http://www.debian.org/doc/manuals/intro-i18n/
