
[debian-devel:15129] Could someone proofread an English translation?



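Hello, this is Sugiura.

Partly with a view to taking it to org, I took on revising the English
translation of the Unicode::Japanese pod and have done what I could. However,
there are many parts I cannot translate well, and my English is not good
enough, so I have not managed to produce a decent translation.

Could someone proofread the text below?
I would be very grateful for any comments at all, whether the meaning has
drifted or the English is simply odd.

The old Japanese original (its content has changed somewhat since) is here:
http://sugi.nemui.org/tmp/Japanese.pm.old.html
# pod2text garbles the characters, so I converted it to HTML.

# For reference, the English translation before my changes and additions
# is at
# http://sugi.nemui.org/tmp/Japanese.pm.orig.txt


Thank you in advance.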

-- 
Tatsuki Sugiura   mailto:sugi@xxxxxxxxxxxxxxxxxxxxxxxxxxx


-----------------------------------------------------------------

NAME
    Unicode::Japanese - Japanese Charset Converter

SYNOPSIS
    use Unicode::Japanese;

    # convert utf8 -> sjis

    print Unicode::Japanese->new($str)->sjis;

    # convert sjis -> utf8

    print Unicode::Japanese->new($str,'sjis')->get;

    # convert sjis (imode_EMOJI) -> utf8

    print Unicode::Japanese->new($str,'sjis-imode')->get;

    # convert ZENKAKU (utf8) -> HANKAKU (utf8)

    print Unicode::Japanese->new($str)->z2h->get;

DESCRIPTION
    A module for converting between Japanese character encodings.

  FEATURES
    * Each instance stores its string internally as UTF-8.

    * Supports both XS and non-XS modes: XS for high performance, non-XS
      for ease of use (you only need to copy Japanese.pm).

    * Supports conversion between ZENKAKU and HANKAKU.

    * Safely handles the "EMOJI" of mobile phones (DoCoMo i-mode, ASTEL
      dot-i, and J-PHONE J-Sky) by mapping them to the Unicode Private Use
      Area.

    * Supports mutual conversion of equivalent EMOJI between the different
      mobile phone standards.

    * Treats SJIS as MS-CP932. (Shift_JIS on MS-Windows (MS-SJIS/MS-CP932)
      differs from the generic Shift_JIS charset.)

    * When converting Unicode to SJIS (EUC/JIS), characters that cannot be
      converted to SJIS are escaped in "&#xxxx;" format (except "EMOJI").
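
The escaping behaviour in the last item can be approximated with the core Encode module alone. This is only a sketch, not the module's own code: the helper name `to_sjis_escaped` is hypothetical, and the decimal "&#NNNN;" form produced by Encode's `FB_HTMLCREF` is an assumption about what "&#xxxx;" looks like.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper, not part of Unicode::Japanese: encode a Perl character
# string as CP932, replacing unmappable characters with decimal "&#NNNN;".
sub to_sjis_escaped {
    my ($chars) = @_;
    my $copy = $chars;    # Encode may modify its input when a CHECK is given
    return Encode::encode('cp932', $copy, Encode::FB_HTMLCREF());
}

# U+3042 (HIRAGANA LETTER A) exists in CP932; U+AC00 (a Hangul syllable)
# does not, so it is emitted as a numeric character reference.
my $sjis = to_sjis_escaped("\x{3042}\x{AC00}");
print length($sjis), " bytes\n";
```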

METHODS
    $s = Unicode::Japanese->new($str [, $icode [, $encode]])
        Create a new instance of Unicode::Japanese.

        If arguments are specified, they are passed through to the set
        method.

    $s->set($str [, $icode [, $encode]])

        $str: string
        $icode: charset, can be omitted (default = 'utf8')
        $encode: encoding, can be omitted.

        Set a string in the instance. If '$icode' is omitted, the string is
        treated as UTF-8.

        If you specify a charset, choose one of the following:
        'jis', 'sjis', 'euc', 'utf8', 'ucs2', 'ucs4', 'utf16', 'utf16-be',
        'utf16-le', 'utf32', 'utf32-be', 'utf32-le', 'ascii', 'binary',
        'sjis-imode', 'sjis-doti', 'sjis-jsky'.

        When 'sjis-imode' or 'sjis-doti' is specified, '&#xxxx;' is
        converted to "EMOJI".

        To detect the charset automatically, you MUST specify 'auto'. (The
        getcode method is then called automatically.)

        For the encoding, only 'base64' can be specified. If it is
        specified, the string is decoded before being stored.

        To decode binary data, specify 'binary' as the charset.
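
The order of operations described for set (undo the base64 encoding first, then interpret the bytes in the given charset and keep UTF-8) can be sketched with core modules. The helper `to_internal_utf8` and its charset table are hypothetical and cover only four of the charsets listed above; the real module may work differently.

```perl
use strict;
use warnings;
use Encode ();
use MIME::Base64 ();

# Assumed mapping from the charset names used in this document to Encode names.
my %ENCODE_NAME = (
    sjis => 'cp932',
    euc  => 'euc-jp',
    jis  => 'iso-2022-jp',
    utf8 => 'UTF-8',
);

# Hypothetical helper: return the UTF-8 bytes an instance would hold.
sub to_internal_utf8 {
    my ($str, $icode, $encode) = @_;
    $icode = 'utf8' unless defined $icode;    # default charset, as documented

    # The encoding layer is removed before the charset is interpreted.
    $str = MIME::Base64::decode_base64($str)
        if defined $encode && $encode eq 'base64';

    my $name = $ENCODE_NAME{$icode}
        or die "unsupported charset in this sketch: $icode\n";
    return Encode::encode('UTF-8', Encode::decode($name, $str));
}
```

For example, `to_internal_utf8("gqA=", 'sjis', 'base64')` decodes the base64 text to the two SJIS bytes 0x82 0xA0 and returns the UTF-8 bytes for HIRAGANA LETTER A.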

    $str = $s->get

        $str: string (UTF-8)

        Get the string as UTF-8.

    $code = $s->getcode($str)

        $str: string
        $code: character set name

        Detect the charset of *$str*.

        Note: this examines the argument $str, not the string stored in the
        instance!

        The charset is determined by the following algorithm:

        1   If a UTF-32 BOM is found, the charset is utf32.

        2   If a UTF-16 BOM is found, the charset is utf16.

        3   If the string is valid UTF-32BE, the charset is utf32-be.

        4   If the string is valid UTF-32LE, the charset is utf32-le.

        5   If there are no non-ASCII characters, the charset is ascii.
            (Control codes other than escape sequences are treated as
            ASCII.)

        6   If it includes JIS escape sequences, the charset is jis.

        7   If it includes "J-PHONE EMOJI", the charset is sjis-sky.

        8   If it is valid EUC, the charset is euc.

        9   If it is valid SJIS, the charset is sjis.

        10  If it is valid SJIS with i-mode "EMOJI", the charset is
            sjis-imode.

        11  If it is valid SJIS with dot-i "EMOJI", the charset is
            sjis-doti.

        12  If it is valid UTF-8, the charset is utf8.

        13  If none of the above apply, the charset is unknown.

        Because of this algorithm, please note the following:

        * UTF-8 may be mistaken for SJIS.

        * UCS2 cannot be detected automatically.

        * UTF-16 can be detected only when a BOM is present.

        * "EMOJI" can be detected only when they are stored as binary, not
          in "&#xxxx;" format. (If they are stored only in &#xxxxx; format,
          getcode() will return an incorrect result, and the "EMOJI" will
          be corrupted when you convert them.)
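
A reduced sketch of the ordering idea, covering only steps 1, 2, 5, 6, 12 and 13 above. The function `guess_charset` is hypothetical and is not the module's getcode; the EUC, SJIS and EMOJI steps are left out, so the "UTF-8 mistaken for SJIS" caveat does not arise here. It does show why UTF-32 has to be tested before UTF-16: the UTF-32LE BOM begins with the same two bytes as the UTF-16LE BOM.

```perl
use strict;
use warnings;

# Hypothetical helper implementing a subset of the documented steps.
sub guess_charset {
    my ($bytes) = @_;

    # Step 1: UTF-32 BOM (big- or little-endian). Must come before UTF-16.
    return 'utf32' if $bytes =~ /\A(?:\x00\x00\xFE\xFF|\xFF\xFE\x00\x00)/;

    # Step 2: UTF-16 BOM.
    return 'utf16' if $bytes =~ /\A(?:\xFE\xFF|\xFF\xFE)/;

    # Step 5: no high-bit bytes and no ESC means plain ASCII.
    return 'ascii' if $bytes !~ /[\x1B\x80-\xFF]/;

    # Step 6: common ISO-2022-JP escape sequences (ESC $ @, ESC $ B,
    # ESC ( B, ESC ( J).
    return 'jis' if $bytes =~ /\x1B(?:\$[\@B]|\([BJ])/;

    # Step 12: utf8::decode returns true only for well-formed UTF-8.
    my $copy = $bytes;
    return 'utf8' if utf8::decode($copy);

    # Step 13: nothing matched.
    return 'unknown';
}
```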

    $str = $s->conv($ocode, $encode)

        $ocode: output charset (Choose from 'jis', 'sjis', 'euc', 'utf8',
        'ucs2', 'ucs4', 'utf16', 'binary')
        $encode: encoding, can be omitted.
        $str: string

        Get the string converted to *$ocode*.

        For the encoding, only 'base64' can be specified. The
        base64-encoded string is then returned.

    $s->tag2bin
        Replace "&#xxxxx;" sequences in the string with their binary
        entities (dereferencing &#xxxxx;).

    $s->z2h
        Convert ZENKAKU to HANKAKU.

    $s->h2z
        Convert HANKAKU to ZENKAKU.

    $s->hira2kata
        Convert HIRAGANA to KATAKANA.

    $s->kata2hira
        Convert KATAKANA to HIRAGANA.

    $str = $s->jis
        $str: string (JIS)

        Get the string converted to JIS (ISO-2022-JP).

    $str = $s->euc
        $str: string (EUC)

        Get the string converted to EUC.

    $str = $s->utf8
        $str: string (UTF-8)

        Get the string converted to UTF-8.

    $str = $s->ucs2
        $str: string (UCS2)

        Get the string converted to UCS2.

    $str = $s->ucs4
        $str: string (UCS4)

        Get the string converted to UCS4.

    $str = $s->utf16
        $str: string (UTF-16)

        Get the string converted to UTF-16 (big-endian), without a BOM.

    $str = $s->sjis
        $str: string (SJIS)

        Get the string converted to SJIS (MS-CP932).

    $str = $s->sjis_imode
        $str: string (SJIS/imode_EMOJI)

        Get the string converted to SJIS for i-mode.

    $str = $s->sjis_doti
        $str: string (SJIS/dot-i_EMOJI)

        Get the string converted to SJIS for dot-i.

    $str = $s->sjis_sky
        $str: string (SJIS/J-SKY_EMOJI)

        Get the string converted to SJIS for J-SKY.

    @str = $s->strcut($len)

        $len: number of characters
        @str: list of strings

        Split the string into pieces of length *$len*.

    $len = $s->strlen
        $len: width of a string

        Get the length of the string stored in $s. This is offered as a
        substitute for Perl's built-in length(). This method counts each
        ZENKAKU character as 2.
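
One way to obtain such a width with core modules is to count the bytes of the CP932 form, where ZENKAKU characters take two bytes and ASCII or HANKAKU kana take one. This is a sketch under that assumption; `display_width` is a hypothetical helper and is not claimed to match how strlen is implemented.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: width of a Perl character string, taken as the
# byte length of its CP932 encoding (ZENKAKU = 2, HANKAKU/ASCII = 1).
sub display_width {
    my ($chars) = @_;
    return length(Encode::encode('cp932', $chars));
}

# Two hiragana characters: built-in length() sees 2 characters,
# while the width-based count is 4.
my $hiragana = "\x{3042}\x{3044}";
print length($hiragana), " chars, width ", display_width($hiragana), "\n";
```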

    $s->join_csv(@values);
        @values: data array

        Convert the array to a string in CSV format and store it in the
        instance. A newline ("\n") is appended to the end of the string.

    @values = $s->split_csv;
        @values: data array

        Split the string stored in $s as CSV. Newlines ("\n") are removed
        before splitting.

DESCRIPTION OF UNICODE MAPPING
    SJIS
      Mapped as MS-CP932. The mapping table at the following URL was used:

      ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT

      If a character cannot be mapped from Unicode to SJIS, it is converted
      to &#xxxxx; format.

      However, when converting to SJIS for mobile phones, every unmapped
      character is converted to "?".

    EUC/JIS
      First converted to SJIS, then mapped to Unicode. If the string
      includes characters outside SJIS, they cannot be mapped correctly.

    DoCoMo i-mode
      "EMOJI" in the range F800 - F9FF are mapped to
      U+0FF800 - U+0FF9FF.

    ASTEL dot-i
      "EMOJI" in the range F000 - F4FF are mapped to
      U+0FF000 - U+0FF4FF.

    J-PHONE J-SKY
      "J-SKY EMOJI" are encoded as follows: the "\e\$" (\x1b\x24) escape
      sequence, the first byte, the second byte, and "\x0f". When
      consecutive "EMOJI" share the same first byte, they are compressed by
      writing the first byte once, followed by the second byte of each
      "EMOJI".

      4500 - 47FF are mapped to U+0FFB00 - U+0FFDFF, treating the first
      byte and the second byte together as one character.

      Unicode::Japanese compresses "J-SKY_EMOJI" automatically when
      consecutive "EMOJI" share the same first byte.
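
The three EMOJI ranges above can be read as fixed offsets into the Private Use Area. The sketch below assumes the mapping is linear within each range, which the text implies but does not state. Both helpers are hypothetical and are not taken from Japanese.pm.

```perl
use strict;
use warnings;

# i-mode (F800-F9FF) and dot-i (F000-F4FF): assumed offset of 0x0F0000
# added to the two-byte SJIS code.
sub sjis_emoji_to_pua {
    my ($two_bytes) = @_;
    my $code = unpack 'n', $two_bytes;    # big-endian 16-bit value
    return 0x0F0000 + $code;
}

# J-SKY: ESC $, one first byte (0x45-0x47), one or more second bytes, 0x0F.
# Assumed offset: 0x4500 maps to U+0FFB00, so offset = 0x0FB600.
sub jsky_to_pua {
    my ($seq) = @_;
    my ($first, $seconds) = $seq =~ /\A\x1B\$([\x45-\x47])([\x21-\x7E]+)\x0F\z/
        or return;
    # A compressed run shares the first byte; expand each second byte.
    return map { 0x0FB600 + (ord($first) << 8) + ord($_) } split //, $seconds;
}

printf "U+%06X\n", sjis_emoji_to_pua("\xF8\x9F");
printf "U+%06X\n", $_ for jsky_to_pua("\x1B\$G!\"\x0F");
```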

BUGS
    * EUC and JIS strings cannot be converted correctly if they include
      characters that cannot be represented in SJIS, because they are
      converted to SJIS first and then to UTF-8.

    * If Japanese.pm is transferred via FTP in ASCII mode, the file will be
      corrupted, because it contains binary data.

AUTHOR INFORMATION
    Copyright 2001, SANO Taku (SAWATARI Mikage) All rights reserved.

    This library is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself.

    Bug reports and comments to: mikage@xxxxxxxxx Thank you.

CREDITS
    Thanks very much to:

    Nao NAKAYAMA