[debian-devel:13253] Re: I/O for different encodings

From: Taketoshi Sano <sano@debian.org>
Subject: [debian-devel:13253] Re: I/O for different encodings
Date: Fri, 10 Nov 2000 17:45:12 +0900
References: <877l6d2jml.fsf@xxxxxxxxxxxxxxxxxxxxx> <E13tzkw-0000yK-00@rakefet>
Message-id: <y5azoj8txrp.fsf@xxxxxxxxxxxxxxxxxxxx>

Hi.

In <E13tzkw-0000yK-00@rakefet>,  on Fri, 10 Nov 2000 00:01:10 +0200,
 Shaul Karl <shaulka@xxxxxxxxxxxx> wrote:

> > I'm working on a piece of software that will parse textual data (a
> > list of words), conduct some statistical analyses, and spit out more
> > textual data.  I'd like to support multiple languages, maybe even
> > multibyte encodings.  Can someone please point me towards some
> > resources, in particular how to handle text input and output in a
> > language-independent way?  As you can probably guess, I'm new to i18n.

> Not sure but I believe that everything is in the process of convergence to 
> Unicode (UTF8). Therefore, if I would have written such a program I would make 
> it to use this encoding.
> As for resources, there is a Unicode HOWTO on the LDP and many other resources 
> on the net.

I think UTF-8 support is a minimum (essential) requirement for i18n,
especially where multibyte encodings are concerned.  Some software
claims Unicode support but does not handle multibyte encodings
correctly.  (Such "Unicode supported" software cannot handle some
languages, including Japanese.)
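
To illustrate the kind of multibyte mistake meant here, the following is
a minimal sketch in Python (the language choice and the helper names
char_count and is_safe_cut are assumptions for illustration, not taken
from any particular program).  It shows that a UTF-8 string has more
bytes than characters, and that cutting at an arbitrary byte offset can
split a character.

```python
# Hypothetical helpers for illustration; only the standard library is used.

def char_count(raw: bytes, encoding: str = "utf-8") -> int:
    """Count characters (Unicode code points), not bytes."""
    return len(raw.decode(encoding))


def is_safe_cut(raw: bytes, n: int, encoding: str = "utf-8") -> bool:
    """Return True if cutting raw at byte offset n does not split a character."""
    try:
        raw[:n].decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


# "日本語" is three characters, each encoded as three bytes in UTF-8.
word = "日本語".encode("utf-8")
byte_length = len(word)
character_length = char_count(word)
```

Software that treats one byte as one character would report the byte
length here and could truncate the word in the middle of a character.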

Supporting more encodings is better than supporting Unicode only,
but text data can be passed through a conversion filter (such as
iconv(1) or tcs(1)), so supporting only Unicode (or UCS-4, which is
better) may be a workable compromise.
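
The filter approach can also be sketched inside the program itself.
The example below is a rough Python illustration, not a replacement
for iconv(1): the function names transcode and word_counts are
hypothetical, and EUC-JP is only an example legacy encoding.  The idea
is to decode the input to Unicode once, do the analysis on Unicode
text, and encode again only on output.

```python
from collections import Counter


def transcode(data: bytes, from_enc: str, to_enc: str = "utf-8") -> bytes:
    """Convert bytes from one encoding to another via Unicode text."""
    return data.decode(from_enc).encode(to_enc)


def word_counts(data: bytes, encoding: str) -> Counter:
    """Decode the input, then count whitespace-separated words."""
    text = data.decode(encoding)
    return Counter(text.split())


# A small word list in a legacy multibyte encoding (EUC-JP as an example).
euc_input = "東京 大阪 東京".encode("euc_jp")

# Convert once at the boundary; the analysis no longer depends on the
# encoding the data arrived in.
utf8_input = transcode(euc_input, "euc_jp")
counts = word_counts(utf8_input, "utf-8")
```

With this layout, adding another input encoding means passing a
different encoding name rather than changing the analysis code.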

You may also want to read the recent discussion about i18n of groff
on this list in the web archive.  I think it has something useful
for you.

Regards.
-- 
  Taketoshi Sano: <sano@debian.org>,<sano@debian.or.jp>,<kgh12351@xxxxxxxxxxx>
