
[debian-devel:15131] Re: Could I ask you to proofread my English translation?



This is dombly.

[Message-ID: <20020524.073557.00420860.dombly@xxxxxxxxxxxxxxxx>]
At Fri, 24 May 2002 07:35:57 +0900 (JST),
dombly (丼) wrote:
丼> * Spell check by ispell

 It seems I mistakenly sent the version from before running ispell.
I am resending it.

NAME
    Unicode::Japanese - Japanese Charset Converter

SYNOPSIS
    use Unicode::Japanese;

    # convert utf8 -> sjis

    print Unicode::Japanese->new($str)->sjis;

    # convert sjis -> utf8

    print Unicode::Japanese->new($str,'sjis')->get;

    # convert sjis (imode_EMOJI) -> utf8

    print Unicode::Japanese->new($str,'sjis-imode')->get;

    # convert ZENKAKU (utf8) -> HANKAKU (utf8)

    print Unicode::Japanese->new($str)->z2h->get;

DESCRIPTION
    Module for conversion among charsets of Japanese encodings.
## "convert" is a verb; "each" is unnecessary; in -> of

  FEATURES
    * The instance stores internal strings in UTF-8.
## Wrong word for "hold"; the preposition governing an encoding is unified to "in"

    * Supports both XS and Non-XS. Use XS for high performance,
      or Non-XS for ease of use (only Japanese.pm needs to be copied).
## overhaul _o_

    * Supports conversion between ZENKAKU and HANKAKU.
## Noun phrase

    * Safely handles "EMOJI" of mobile phones (DoCoMo i-mode, ASTEL dot-i
      and J-PHONE J-Sky) by mapping them to the Unicode Private Use Area.

    * Supports mutual conversion of EMOJI with the same image between
      different mobile phone standards.
## misspell

    * Treats SJIS as MS-CP932. (Shift_JIS on MS-Windows (MS-SJIS/MS-CP932)
      differs from the generic Shift_JIS charset.)

    * On converting Unicode to SJIS (EUC/JIS), those characters that cannot
      be converted to SJIS (except "EMOJI") are escaped in "&#xxxx;" format.
## overhaul _o_
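
The escaping described in the last feature can be illustrated without the module itself. The following is a minimal sketch using Perl's core Encode module, not Unicode::Japanese's own implementation. The helper name to_cp932_with_refs is hypothetical, and the decimal reference form is an assumption, since the text above only gives the placeholder "&#xxxx;".

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: encode a character string to CP932 bytes and
# replace every character that has no CP932 mapping with a decimal
# numeric character reference such as "&#54620;".
sub to_cp932_with_refs {
    my ($text) = @_;
    # FB_HTMLCREF includes LEAVE_SRC, so $text is not modified.
    return encode('cp932', $text, Encode::FB_HTMLCREF());
}

# U+D55C (a hangul syllable) has no CP932 mapping, so it is escaped;
# the ASCII letters around it pass through unchanged.
print to_cp932_with_refs("A\x{D55C}B"), "\n";
```

Characters that do have a mapping, such as hiragana, come out as ordinary CP932 bytes.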

METHODS
    $s = Unicode::Japanese->new($str [, $icode [, $encode]])
        Creates a new instance of Unicode::Japanese.

        If arguments are specified, they are passed through to the set method.

    $s->set($str [, $icode [, $encode]])

        $str: string
        $icode: charset, may be omitted (default = 'utf8')
        $encode: ASCII encoding, may be omitted.

        Sets a string in the instance. If '$icode' is omitted, the string is
        assumed to be UTF-8.
## "set A to B" reads like "assign B to A"
## Second half: overhaul _o_

        To specify a charset, choose from the following:
        'jis', 'sjis', 'euc', 'utf8', 'ucs2', 'ucs4', 'utf16', 'utf16-be',
        'utf16-le', 'utf32', 'utf32-be', 'utf32-le', 'ascii', 'binary',
        'sjis-imode', 'sjis-doti', 'sjis-jsky'.
## "choose and specify" is redundant

        '&#xxxx' will be converted to "EMOJI" when 'sjis-imode'
        or 'sjis-doti' is specified.

        For automatic charset detection, you MUST specify 'auto', which causes
        the getcode method to be called automatically.
## overhaul _o_

        For ASCII encoding, only 'base64' may be specified. When it is
        specified, the string is base64-decoded before being stored.
## overhaul _o_
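
The "decode before storing" step can be pictured with core modules only. This is a sketch of the idea, not the module's code; the helper name store_from_base64 is hypothetical, and it assumes the payload is UTF-8 once the base64 layer is removed.

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);
use Encode qw(decode);

# Hypothetical helper: strip the base64 layer first, then interpret
# the resulting bytes in the given charset.
sub store_from_base64 {
    my ($b64, $charset) = @_;
    my $bytes = decode_base64($b64);
    return decode($charset, $bytes);
}

# "44GC" is the base64 form of the three UTF-8 bytes E3 81 82,
# which encode HIRAGANA LETTER A (U+3042).
my $text = store_from_base64('44GC', 'UTF-8');
printf "U+%04X\n", ord($text);
```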

        To decode binary, specify 'binary' as the charset.

    $str = $s->get

        $str: string(UTF-8)
## No article is needed in a variable description

        Gets the string in UTF-8.

    $code = $s->getcode($str)

        $str: string
        $code: charset name
## Terminology unified... although I have not done so thoroughly _o_

        Detects the charset of *$str*.
## No article is needed before a variable

        Note: The charset of the string stored in the instance is NOT detected.
## overhaul _o_

        Charsets are distinguished by the following algorithm:

        1   If BOM of UTF-32 is found, the charset is utf32.

        2   If BOM of UTF-16 is found, the charset is utf16.

        3   If it is in proper UTF-32BE, the charset is utf32-be.
## In predicative use it would take "for", but that is slightly ambiguous, so I switched to attributive use. The same applies below.

        4   If it is in proper UTF-32LE, the charset is utf32-le.

        5   If it contains no non-ASCII characters, the charset is ascii.
            (Control codes other than escape sequences are treated as ASCII.)

        6   If it includes JIS escape sequences, the charset is jis.

        7   If it includes "J-PHONE EMOJI", the charset is sjis-jsky.

        8   If it is in proper EUC, the charset is euc.

        9   If it is in proper SJIS, the charset is sjis.

        10  If it is in proper SJIS with "EMOJI" of i-mode, the charset is
            sjis-imode.

        11  If it is in proper SJIS with "EMOJI" of dot-i, the charset is
            sjis-doti.

        12  If it is in proper UTF-8, the charset is utf8.

        13  If none of the above is true, the charset is unknown.
## overhaul _o_
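
The ordering of the checks matters, which a reduced sketch can show. The code below covers only steps 1, 2, 5, 6, 12 and 13 of the list above, using core Perl; it is not getcode itself, the function name guess_charset is hypothetical, and the JIS escape sequences it looks for are an assumed subset.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical, reduced detector. Input is a byte string.
sub guess_charset {
    my ($bytes) = @_;

    # Step 1 before step 2: the UTF-32LE BOM (FF FE 00 00) begins with
    # the UTF-16LE BOM (FF FE), so UTF-32 must be tested first.
    return 'utf32' if $bytes =~ /\A(?:\x00\x00\xFE\xFF|\xFF\xFE\x00\x00)/;
    return 'utf16' if $bytes =~ /\A(?:\xFE\xFF|\xFF\xFE)/;

    # Step 5: no ESC and no byte with the high bit set.
    return 'ascii' if $bytes !~ /[\x1B\x80-\xFF]/;

    # Step 6: a few common ISO-2022-JP escape sequences (assumed subset).
    return 'jis' if $bytes =~ /\x1B\$[\@B]|\x1B\([BJ]/;

    # Step 12: strict UTF-8 validation. decode() with a CHECK value may
    # modify its argument, so it works on a copy.
    my $copy = $bytes;
    return 'utf8' if eval { decode('UTF-8', $copy, Encode::FB_CROAK()); 1 };

    # Step 13.
    return 'unknown';
}

print guess_charset("\xFF\xFE\x00\x00a\x00\x00\x00"), "\n";
print guess_charset("\xE3\x81\x82"), "\n";
```

The EUC, SJIS and EMOJI steps are left out because telling them apart needs the byte-range tables that the manual does not reproduce.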

        Regarding the algorithm, pay attention to the following:
## "cause" is odd; "please" is unnecessary

        * UTF-8 is occasionally detected as SJIS.
## overhaul _o_

        * Can NOT detect UCS2 automatically.

        * Can detect UTF-16 only when the string has a BOM.
## overhaul _o_

        * Can detect "EMOJI" when it is stored in binary, not in "&#xxxxx;"
          format. (If it is stored only in "&#xxxxx;" format, getcode() will
          return an incorrect result. In that case, the "EMOJI" will be
          corrupted.)
## With "by" it would read as "binary stores EMOJI", which is odd; conjunction changed

    $str = $s->conv($ocode, $encode)

        $ocode: output charset (Choose from 'jis', 'sjis', 'euc', 'utf8',
        'ucs2', 'ucs4', 'utf16', 'binary')
        $encode: ASCII encoding, may be omitted.
        $str: string

        Gets a string converted to *$ocode*.

        For ASCII encoding, only 'base64' may be specified. With it, the string
        encoded in base64 will be returned.

    $s->tag2bin
        Replaces the substrings "&#xxxxx;" in the string with the binary
        entities they represent.
## Usage of "replace"
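
The replacement that tag2bin performs can be pictured as a single substitution. The sketch below is not the module's code; the function name refs_to_chars is hypothetical, and it assumes the "xxxxx" placeholder stands for a decimal code point.

```perl
use strict;
use warnings;

# Hypothetical helper: turn every "&#NNNNN;" (decimal, assumed) into
# the character with that code point. Other text is left untouched.
sub refs_to_chars {
    my ($text) = @_;
    $text =~ s/&#(\d+);/chr($1)/ge;
    return $text;
}

# 12354 is decimal for U+3042 (HIRAGANA LETTER A).
my $converted = refs_to_chars('A&#12354;B');
printf "%d characters, middle one is U+%04X\n",
    length($converted), ord(substr($converted, 1, 1));
```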

    $s->z2h
        Converts ZENKAKU to HANKAKU.

    $s->h2z
        Converts HANKAKU to ZENKAKU.

    $s->hira2kata
        Converts HIRAGANA to KATAKANA.

    $s->kata2hira
        Converts KATAKANA to HIRAGANA.

    $str = $s->jis
        $str: string (JIS)

        Gets the string converted to JIS(ISO-2022-JP).
## Using "the" makes it clear that this refers to the string held by the instance

    $str = $s->euc
        $str: string (EUC)

        Gets the string converted to EUC.

    $str = $s->utf8
        $str: string (UTF-8)

        Gets the string converted to UTF-8.

    $str = $s->ucs2
        $str: string (UCS2)

        Gets the string converted to UCS2.

    $str = $s->ucs4
        $str: string (UCS4)

        Gets the string converted to UCS4.

    $str = $s->utf16
        $str: string (UTF-16)

        Gets the string converted to UTF-16(big-endian). BOM is not added.
## The subject was unclear

    $str = $s->sjis
        $str: string (SJIS)

        Gets the string converted to SJIS(MS-CP932).

    $str = $s->sjis_imode
        $str: string (SJIS/imode_EMOJI)

        Gets the string converted to SJIS for i-mode.

    $str = $s->sjis_doti
        $str: string (SJIS/dot-i_EMOJI)

        Gets the string converted to SJIS for dot-i.

    $str = $s->sjis_jsky
        $str: string (SJIS/J-SKY_EMOJI)

        Gets the string converted to SJIS for J-SKY.

    @str = $s->strcut($len)

        $len: number of characters
        @str: strings

        Splits the string into pieces of length *$len*.

    $len = $s->strlen
        $len: `visual width' of the string
## "width" alone would be unclear. The term "display width" itself seems
## specific to Japanese text processing, so I put it in quotes.

        Gets the length of the string. This method is provided as a
        substitute for Perl's built-in length(). ZENKAKU characters are
        counted as having a length of 2, regardless of whether the
        encoding is SJIS or UTF-8.
## Second half: overhaul _o_
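
The "visual width" idea can be approximated with Unicode properties in core Perl. This is a sketch under the assumption that ZENKAKU corresponds to the East Asian Width classes Wide and Fullwidth; the module may classify characters differently, and the name visual_width is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: count 2 columns for Wide/Fullwidth characters
# and 1 column for everything else. Input must be a decoded character
# string, not raw SJIS or UTF-8 bytes.
sub visual_width {
    my ($text) = @_;
    my $width = 0;
    for my $ch (split //, $text) {
        if ($ch =~ /[\p{East_Asian_Width=Wide}\p{East_Asian_Width=Fullwidth}]/) {
            $width += 2;
        }
        else {
            $width += 1;
        }
    }
    return $width;
}

# Two hiragana letters occupy four columns; length() would report 2.
print visual_width("\x{3042}\x{3044}"), "\n";
```

Working on decoded characters is what makes the result independent of whether the bytes were originally SJIS or UTF-8.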

    $s->join_csv(@values);
        @values: data array

        Converts the array to a string in CSV format and stores it in
        the instance. A newline ("\n") is also appended to the end
        of the string.
## Conjunction changed
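
As an illustration of what "a string in CSV format" with a trailing newline might look like, here is a small core-Perl sketch. The quoting rules (double quotes around fields containing a comma, quote or line break, with embedded quotes doubled) are an assumption, as the manual does not spell them out, and join_csv_line is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: join values into one CSV line ending in "\n".
sub join_csv_line {
    my @values = @_;
    my @cells;
    for my $value (@values) {
        my $cell = defined $value ? $value : '';
        # Quote only the fields that need it (assumed convention).
        if ($cell =~ /[",\r\n]/) {
            $cell =~ s/"/""/g;
            $cell = qq{"$cell"};
        }
        push @cells, $cell;
    }
    return join(',', @cells) . "\n";
}

print join_csv_line('a', 'b,c', 'd"e');
```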

    @values = $s->split_csv;
        @values: data array

        Splits the string, treating it as CSV format. Each newline ("\n") is
        removed before splitting.
## Matched to the Japanese documentation

DESCRIPTION OF UNICODE MAPPING
    SJIS
      Mapped as MS-CP932. Mapping table in the following URL is used.
## Preposition: the sense here is "published", so "in" or "on" would be appropriate

      ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT

      If a character cannot be mapped to SJIS from Unicode, it will be
      converted to &#xxxxx; format.

      Also, any unmapped character will be converted into "?" when converting
      to SJIS for mobile phones.

    EUC/JIS
      Converted to SJIS and then mapped to Unicode. Any non-SJIS character
      in the string will not be mapped correctly.
## Use "any" for the hypothetical; "out of SJIS" would mean the opposite ^^;

    DoCoMo i-mode
      The portion of F800 - F9FF containing "EMOJI" is mapped to
      U+0FF800 - U+0FF9FF.

    ASTEL dot-i
      The portion of F000 - F4FF containing "EMOJI" is mapped to
      U+0FF000 - U+0FF4FF.

    J-PHONE J-SKY
      "J-SKY EMOJI" are encoded as follows: a "\e\$" (\x1b\x24) escape
      sequence, the first byte, the second byte and "\x0f".
      When consecutive "EMOJI" share the same first byte,
      they may be compressed by listing only the second bytes.
## Second half: overhaul _o_

      4500 - 47FF is mapped to U+0FFB00 - U+0FFDFF, treating the first
      and the second bytes together as one EMOJI character.

      Unicode::Japanese will compress "J-SKY_EMOJI" automatically when
      the first bytes of a sequence of "EMOJI" are identical.
## overhaul _o_
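
The compression rule for J-SKY can be sketched directly from the byte layout described above. This is an illustration only, not the module's code; pack_jsky is a hypothetical name, and each EMOJI is represented here as a two-character string holding its first and second byte.

```perl
use strict;
use warnings;

# Hypothetical helper: emit "\e$", first byte, second byte(s), "\x0f".
# A run of EMOJI with the same first byte shares one escape sequence
# and one terminator; only the second bytes are listed.
sub pack_jsky {
    my @emoji = @_;
    my $out = '';
    my $current_first;
    for my $pair (@emoji) {
        my ($first, $second) = split //, $pair;
        if (defined $current_first && $first eq $current_first) {
            $out .= $second;                      # continue the run
        }
        else {
            $out .= "\x0f" if defined $current_first;  # close previous run
            $out .= "\e\$" . $first . $second;         # open a new run
            $current_first = $first;
        }
    }
    $out .= "\x0f" if defined $current_first;     # close the last run
    return $out;
}

# Two EMOJI with first byte "G" share one escape sequence;
# the one with first byte "E" starts a new sequence.
my $packed = pack_jsky('G!', 'G"', 'E!');
print join(' ', map { sprintf '%02X', ord } split //, $packed), "\n";
```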

BUGS
    * EUC and JIS strings cannot be converted correctly when they include
      non-SJIS characters because they are converted to SJIS before
      being converted to UTF-8.
## Terminology unified; second half: overhaul _o_

    * The Japanese.pm file will be corrupted if transferred in FTP ASCII
      mode, because it has trailing binary data.
## overhaul _o_

AUTHOR INFORMATION
    Copyright 2001, SANO Taku (SAWATARI Mikage) All rights reserved.

    This library is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself.

    Bug reports and comments to: mikage@xxxxxxxxx Thank you.

CREDITS
    Thanks very much to:

    Nao NAKAYAMA