[debian-devel:15131] Re: Could you proofread my English translation?
This is dombly.
[Message-ID: <20020524.073557.00420860.dombly@xxxxxxxxxxxxxxxx>]
At Fri, 24 May 2002 07:35:57 +0900 (JST),
dombly (丼) wrote:
丼> - Spell check by ispell
It seems I mistakenly sent the version from before running ispell.
I am resending it.
NAME
Unicode::Japanese - Japanese Charset Converter
SYNOPSIS
use Unicode::Japanese;
# convert utf8 -> sjis
print Unicode::Japanese->new($str)->sjis;
# convert sjis -> utf8
print Unicode::Japanese->new($str,'sjis')->get;
# convert sjis (imode_EMOJI) -> utf8
print Unicode::Japanese->new($str,'sjis-imode')->get;
# convert ZENKAKU (utf8) -> HANKAKU (utf8)
print Unicode::Japanese->new($str)->z2h->get;
DESCRIPTION
Module for conversion among charsets of Japanese encodings.
## "convert" is a verb; "each" is unnecessary; in -> of
FEATURES
* The instance stores internal strings in UTF-8.
## wrong sense of "hold"; unified the preposition that takes an encoding to "in"
* Supports both XS and Non-XS. Use XS for high performance,
or Non-XS for ease of use (installation is just copying Japanese.pm).
## overhaul _o_
* Supports conversion between ZENKAKU and HANKAKU.
## noun phrase
* Safely handles "EMOJI" of mobile phones (DoCoMo i-mode, ASTEL dot-i
and J-PHONE J-Sky) by mapping them to the Unicode Private Use Area.
* Supports mutual conversion of "EMOJI" with the same image between
different mobile phone standards.
## misspell
* Treats SJIS as MS-CP932. (Shift_JIS on MS-Windows (MS-SJIS/MS-CP932)
differs from the generic Shift_JIS charset.)
* When converting Unicode to SJIS (or EUC/JIS), characters that cannot
be converted to SJIS (except "EMOJI") are escaped in "&#xxxx;" format.
## overhaul _o_
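The escaping behaviour described in the last feature can be illustrated with Perl's core Encode module. This is only an analogy, not the Unicode::Japanese implementation: the helper name `to_cp932_with_refs` is hypothetical, and Encode's `FB_HTMLCREF` fallback emits decimal references, which may differ from the exact "&#xxxx;" form this module produces.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Encode a character string as CP932. Characters with no CP932 mapping
# are replaced by decimal character references (&#NNNN;).
# NOTE: uses the core Encode module, not Unicode::Japanese.
sub to_cp932_with_refs {
    my ($text) = @_;
    my $copy = $text;    # encode() may modify its argument when CHECK is set
    return encode('cp932', $copy, Encode::FB_HTMLCREF);
}

# U+1F600 has no CP932 mapping, so it comes out as a reference.
print to_cp932_with_refs("A\x{1F600}B"), "\n";
```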
METHODS
$s = Unicode::Japanese->new($str [, $icode [, $encode]])
Creates a new instance of Unicode::Japanese.
If arguments are specified, they are passed through to the set method.
$s->set($str [, $icode [, $encode]])
$str: string
$icode: charset, may be omitted (default = 'utf8')
$encode: ASCII encoding, may be omitted.
Sets a string in the instance. If '$icode' is omitted, the string is
treated as UTF-8.
## "set A to B" reads as "assign B to A"
## second half overhauled _o_
To specify a charset, choose from the following:
'jis', 'sjis', 'euc', 'utf8', 'ucs2', 'ucs4', 'utf16', 'utf16-be',
'utf16-le', 'utf32', 'utf32-be', 'utf32-le', 'ascii', 'binary',
'sjis-imode', 'sjis-doti', 'sjis-jsky'.
## "choose and specify" is redundant
'&#xxxx;' is converted to "EMOJI" when 'sjis-imode' or 'sjis-doti'
is specified.
For automatic charset detection, you MUST specify 'auto' so that the
getcode method is called automatically.
## overhaul _o_
For ASCII encoding, only 'base64' may be specified. With it,
the string will be decoded before storing.
## overhaul _o_
To decode binary, specify 'binary' as the charset.
$str = $s->get
$str: string(UTF-8)
## no article needed in a variable description
Gets the string in UTF-8.
$code = $s->getcode($str)
$str: string
$code: charset name
## unified the terminology... though admittedly not thoroughly _o_
Detects the charset of *$str*.
## no article needed before a variable
Note: The charset of the string stored in the instance is NOT detected.
## overhaul _o_
Charsets are distinguished by the following algorithm:
1 If BOM of UTF-32 is found, the charset is utf32.
2 If BOM of UTF-16 is found, the charset is utf16.
3 If it is in proper UTF-32BE, the charset is utf32-be.
## In predicative use it would take "for", but that is somewhat ambiguous, so I changed it to attributive use. Same below.
4 If it is in proper UTF-32LE, the charset is utf32-le.
5 If it contains no non-ASCII characters, the charset is ascii. (Control
codes other than escape sequences are treated as ASCII.)
6 If it includes JIS escape sequences, the charset is jis.
7 If it includes "J-PHONE EMOJI", the charset is sjis-jsky.
8 If it is in proper EUC, the charset is euc.
9 If it is in proper SJIS, the charset is sjis.
10 If it is in proper SJIS and "EMOJI" of i-mode, the charset is
sjis-imode.
11 If it is in proper SJIS and "EMOJI" of dot-i, the charset is
sjis-doti.
12 If it is in proper UTF-8, the charset is utf8.
13 If none above is true, the charset is unknown.
## overhaul _o_
Regarding the algorithm, pay attention to the following:
## "cause" is odd; "please" is unnecessary
* UTF-8 is occasionally detected as SJIS.
## overhaul _o_
* Can NOT detect UCS2 automatically.
* Can detect UTF-16 only when the string has BOM.
## overhaul _o_
* Can detect "EMOJI" only when it is stored in binary, not in "&#xxxxx;"
format. (If it is stored only in "&#xxxxx;" format, getcode() will
return an incorrect result, and the "EMOJI" will be corrupted.)
## with "by" it would read as "binary stores EMOJI", which is odd; changed the conjunction
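Steps 1 and 2 of the algorithm above can be sketched in plain Perl. This is a minimal illustration, not the module's code: the function name `detect_bom` is hypothetical, and the later steps (UTF-32 validity, EUC, SJIS, and so on) are deliberately left out.

```perl
use strict;
use warnings;

# Steps 1 and 2 only: look for a byte order mark at the start of the data.
# UTF-32 is tested first, because the UTF-32LE BOM (FF FE 00 00)
# begins with the UTF-16LE BOM (FF FE).
sub detect_bom {
    my ($bytes) = @_;
    return 'utf32' if $bytes =~ /\A(?:\x00\x00\xFE\xFF|\xFF\xFE\x00\x00)/;
    return 'utf16' if $bytes =~ /\A(?:\xFE\xFF|\xFF\xFE)/;
    return undef;    # no BOM: the remaining steps would have to decide
}

my $code = detect_bom("\xFE\xFF\x30\x42");
print defined $code ? $code : 'no BOM', "\n";
```

The order of the two checks matters, which is presumably why the list above puts UTF-32 before UTF-16.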
$str = $s->conv($ocode, $encode)
$ocode: output charset (Choose from 'jis', 'sjis', 'euc', 'utf8',
'ucs2', 'ucs4', 'utf16', 'binary')
$encode: ASCII encoding, may be omitted.
$str: string
Gets a string converted to *$ocode*.
For ASCII encoding, only 'base64' may be specified. With it, the string
encoded in base64 will be returned.
$s->tag2bin
Replaces the substrings "&#xxxxx;" in the string with the binary
characters they represent.
## usage of "replace"
$s->z2h
Converts ZENKAKU to HANKAKU.
$s->h2z
Converts HANKAKU to ZENKAKU.
$s->hira2kata
Converts HIRAGANA to KATAKANA.
$s->kata2hira
Converts KATAKANA to HIRAGANA.
$str = $s->jis
$str: string (JIS)
Gets the string converted to JIS(ISO-2022-JP).
## using "the" makes it clear that this refers to the string held by the instance
$str = $s->euc
$str: string (EUC)
Gets the string converted to EUC.
$str = $s->utf8
$str: string (UTF-8)
Gets the string converted to UTF-8.
$str = $s->ucs2
$str: string (UCS2)
Gets the string converted to UCS2.
$str = $s->ucs4
$str: string (UCS4)
Gets the string converted to UCS4.
$str = $s->utf16
$str: string (UTF-16)
Gets the string converted to UTF-16(big-endian). BOM is not added.
## the subject was unclear
$str = $s->sjis
$str: string (SJIS)
Gets the string converted to SJIS(MS-CP932).
$str = $s->sjis_imode
$str: string (SJIS/imode_EMOJI)
Gets the string converted to SJIS for i-mode.
$str = $s->sjis_doti
$str: string (SJIS/dot-i_EMOJI)
Gets the string converted to SJIS for dot-i.
$str = $s->sjis_sky
$str: string (SJIS/J-SKY_EMOJI)
Gets the string converted to SJIS for J-SKY.
@str = $s->strcut($len)
$len: number of characters
@str: strings
Splits the string by length(*$len*).
$len = $s->strlen
$len: `visual width' of the string
## "width" alone would probably be unclear. The term "display width" itself
## also seems specific to Japanese encodings, so I put it in quotes.
Gets the visual width of the string. This method is provided as a
substitute for Perl's built-in length(). ZENKAKU characters are
counted as width 2, regardless of whether the encoding is
SJIS or UTF-8.
## second half overhauled _o_
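The idea of a visual width can be sketched in plain Perl as follows. This is an illustration under stated assumptions, not the module's implementation: the name `visual_width` is hypothetical, the input is assumed to be an already decoded character string, and the set of full-width ranges is a simplification chosen for this sketch.

```perl
use strict;
use warnings;

# Count the display width of a decoded (character) string: 2 columns for
# characters in a few full-width ranges, 1 column for everything else.
# The ranges are a simplification: CJK symbols and kana (U+3000-U+30FF),
# CJK unified ideographs (U+4E00-U+9FFF), full-width forms (U+FF01-U+FF60).
sub visual_width {
    my ($text) = @_;
    my $wide = () = $text =~ /[\x{3000}-\x{30FF}\x{4E00}-\x{9FFF}\x{FF01}-\x{FF60}]/g;
    return length($text) + $wide;
}

print visual_width("abc"), "\n";                 # three HANKAKU characters
print visual_width("\x{3042}\x{3044}"), "\n";    # two ZENKAKU hiragana
```

Half-width katakana (U+FF61 - U+FF9F) fall outside the listed ranges and so count as one column each.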
$s->join_csv(@values);
@values: data array
Converts the array to a string in CSV format, then stores it in
the instance. A newline ("\n") is appended to the end of the
string.
## changed the conjunction
@values = $s->split_csv;
@values: data array
Splits the string, treating it as CSV. Each newline ("\n") is
removed before splitting.
## matched to the Japanese documentation
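The CSV round trip described by join_csv and split_csv can be sketched in plain Perl. The function names `join_csv_line` and `split_csv_line` are hypothetical, and the quoting rules (quote a field containing a comma, a double quote or a newline; double any embedded quote) are the common CSV convention assumed for this sketch, not something the text above specifies.

```perl
use strict;
use warnings;

# Join fields into one CSV line ending in "\n". A field is quoted when it
# contains a comma, a double quote, or a newline; embedded quotes are doubled.
sub join_csv_line {
    my @fields = @_;
    my @out = map {
        my $f = $_;
        if ($f =~ /[",\n]/) { $f =~ s/"/""/g; $f = qq{"$f"}; }
        $f;
    } @fields;
    return join(',', @out) . "\n";
}

# Split one CSV line back into fields, undoing the quoting above.
sub split_csv_line {
    my ($line) = @_;
    $line =~ s/\n\z//;    # drop the trailing newline first
    my @fields;
    while ($line =~ /\G(?:"((?:[^"]|"")*)"|([^,]*))(,|\z)/g) {
        my ($quoted, $plain, $sep) = ($1, $2, $3);
        if (defined $quoted) { $quoted =~ s/""/"/g; push @fields, $quoted; }
        else                 { push @fields, $plain; }
        last if $sep eq '';   # reached the end of the line
    }
    return @fields;
}

my $line = join_csv_line('a', 'b,c', 'say "hi"');
print $line;
print join('|', split_csv_line($line)), "\n";
```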
DESCRIPTION OF UNICODE MAPPING
SJIS
Mapped as MS-CP932. The mapping table at the following URL is used.
## preposition: the sense here is "published", so "in" or "on" would be appropriate
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT
If a character cannot be mapped to SJIS from Unicode, it will be
converted to &#xxxxx; format.
Also, any unmapped character will be converted into "?" when converting
to SJIS for mobile phones.
EUC/JIS
Converted to SJIS and then mapped to Unicode. Any non-SJIS character
in the string will not be mapped correctly.
## use "any" for the hypothetical; "out of SJIS" means the opposite ^^;
DoCoMo i-mode
The portion of F800 - F9FF that contains "EMOJI" is mapped to
U+0FF800 - U+0FF9FF.
ASTEL dot-i
The portion of F000 - F4FF that contains "EMOJI" is mapped to
U+0FF000 - U+0FF4FF.
J-PHONE J-SKY
"J-SKY EMOJI" are encoded as follows: the escape sequence
"\e\$" (\x1b\x24), the first byte, the second byte, and "\x0f".
A run of "EMOJI" that share the same first byte may be
compressed by listing only the second bytes.
## second half overhauled _o_
4500 - 47FF is mapped to U+0FFB00 - U+0FFDFF, treating the first
and second bytes together as one EMOJI character.
Unicode::Japanese will compress "J-SKY_EMOJI" automatically when
the first bytes of a sequence of "EMOJI" are identical.
## overhaul _o_
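The compression of consecutive "EMOJI" with identical first bytes can be sketched as follows. This is a minimal illustration of the byte layout described above, not the module's code: the function name `pack_jsky` and the list-of-pairs input are hypothetical.

```perl
use strict;
use warnings;

# Build a J-SKY byte sequence from a list of [first byte, second byte]
# pairs. Each run of pairs sharing a first byte becomes
#   ESC '$' <first byte> <second byte>... "\x0f"
sub pack_jsky {
    my @pairs = @_;
    my $out = '';
    my $current;    # first byte of the run being written, if any
    for my $pair (@pairs) {
        my ($first, $second) = @$pair;
        if (!defined $current || $first ne $current) {
            $out .= "\x0f" if defined $current;    # close the previous run
            $out .= "\e\$" . $first;               # open a new run
            $current = $first;
        }
        $out .= $second;
    }
    $out .= "\x0f" if defined $current;            # close the last run
    return $out;
}

# Two EMOJI with first byte 'G' share one escape sequence.
my $bytes = pack_jsky(['G', '!'], ['G', '"'], ['E', '!']);
print join(' ', map { sprintf '%02X', ord } split //, $bytes), "\n";
```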
BUGS
* EUC and JIS strings cannot be converted correctly when they include
non-SJIS characters, because they are converted to SJIS before
being converted to UTF-8.
## unified the terminology; second half overhauled _o_
* The Japanese.pm file will be corrupted if transferred in FTP ASCII
mode, because it has trailing binary data.
## overhaul _o_
AUTHOR INFORMATION
Copyright 2001, SANO Taku (SAWATARI Mikage) All rights reserved.
This library is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.
Bug reports and comments to: mikage@xxxxxxxxx Thank you.
CREDITS
Thanks very much to:
Nao NAKAYAMA