[debian-users:03277] Re: ja_JP locale
>>>>> On Sat, 6 Dec 97 03:35:37 +0900, yosshy@xxxxxxxxxxxxxxx (Akira YOSHIYAMA) said:
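AY> I don't think the workload is large enough to need many hands. Taking JIS as an example,
AY> JIS X 0201 (ANK) 158
AY> JIS X 0208 (first character map) 6879
AY> JIS X 0212 (second character map) 6067
AY> the job is to assign an entry name (ID name) to each of that many characters.
AY> There is little point in giving each kanji an elaborate name anyway, and,
AY> to take it to the extreme, simply using the UCS code number as the entry name
AY> without further thought should leave most functionality working fine.
AY> But I have no reference material. Does anyone have any references?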
I have something like this (below) on hand. In glibc 2, is the
wide-character side always ISO 10646?
Once my OCN connection arrives, I plan to read the glibc 2 source and
try various things.
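As a rough illustration of the idea quoted above (naming each entry after its UCS code number), the following Python sketch walks the 94x94 JIS X 0208 grid, decodes each cell with Python's `euc_jp` codec, and pairs a name of the form `<Uxxxx>` with the EUC bytes. The `<Uxxxx>` naming and the printed layout are assumptions made for illustration; they are not taken from glibc 2 or from the definition below.

```python
def ucs_named_entries():
    """Yield (name, euc_bytes) for every JIS X 0208 cell the codec can decode."""
    for row in range(0x21, 0x7F):
        for col in range(0x21, 0x7F):
            # EUC-JP stores a JIS X 0208 character with the high bit set on both bytes.
            euc = bytes([row | 0x80, col | 0x80])
            try:
                ch = euc.decode("euc_jp")
            except UnicodeDecodeError:
                continue  # undefined code point: no entry is generated
            if len(ch) != 1:
                continue  # defensive: only one-to-one mappings get a name
            yield "<U%04X>" % ord(ch), euc


entries = dict(ucs_named_entries())

# Show a few generated lines in a charmap-like layout (hypothetical format).
for name in ("<U3042>", "<U65E5>"):
    print(name, "".join("\\x%02x" % b for b in entries[name]))
```

No thought goes into the names at all here, which is the point of the suggestion: the only input is the code table itself.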
-- ja_JP,XO
# Date: Mon, 26 Nov 90 03:01:02 GMT
# From: Masahiro SEKIGUCHI <seki@xxxxxxxxxxxxxxxxxxxxxxx>
# > If Sekiguchi-san also agrees,
# > I'd like to get an electronic version for forwarding to RIN.
# >
# > Donn Terry
#
# The following is the latest version of it, i.e., XoTGinter
# seq 1126 + seq 1395 patch. (I guess your own paper copy is
# different from the following...)
#
# Please feel free to forward it to the RIN list. Any comments
# are welcome.
#
# Thank you.
#
# Masahiro Sekiguchi, Fujitsu.
#
# Sample locale definition file for Japanese.
# Based on POSIX.2 D10 syntax with X/Open extension.
#
# NOTES FOR READERS
#
# ``Japanese character set'' in this example consists of:
# 128 characters in JIS X0201 (i.e. Japanese version of ISO 646).
# 8836 characters in JIS X0208 (so-called JIS Kanji).
# Implementations may contain more characters to support
# Japanese.
#
# Note that our ``character set'' includes undefined code points
# of JIS X0208.
#
# The names for JIS Kanji used here are just enough to identify
# the above character set. Kanji are given names based on their
# code value written in hexadecimal. Users may require a more
# general notation to represent all supported characters in
# a portable manner.
#
# This definition implicitly assumes that the underlying encoding
# is UJIS (EUC-JIS) or a similar one. (Characters in G2 and G3
# are completely ignored, though.) The definition may not
# work if the system uses another encoding.
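To make the hexadecimal naming concrete, here is a minimal Python sketch that turns a symbol such as `<467C>` back into the character it names. It assumes, as the note above does, that the underlying encoding is EUC-JP, and it relies on Python's `euc_jp` codec; the helper names `decode_name` and `decode_string` are hypothetical.

```python
import re


def decode_name(symbol: str) -> str:
    """Turn a hex symbol such as "<467C>" into the character it names."""
    code = int(symbol.strip("<>"), 16)
    hi, lo = code >> 8, code & 0xFF
    # EUC-JP represents a JIS X 0208 code as the same two bytes with the high bit set.
    return bytes([hi | 0x80, lo | 0x80]).decode("euc_jp")


def decode_string(value: str) -> str:
    """Decode every four-digit hex symbol found in a locale string value."""
    return "".join(decode_name(s) for s in re.findall(r"<[0-9A-F]{4}>", value))


# The first entry of the "day" keyword in LC_TIME (NICHI-YOBI).
print(decode_string("<467C><4D4B><467C>"))
```

Symbols from the portable character set, such as `<slash>`, are deliberately left out of this sketch; only the four-digit hexadecimal names are handled.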
# Notes on LC_CTYPE:
# Characters from JIS X0208 are included in appropriate
# classes as well as ASCII characters. (I.e., portable
# characters of POSIX.2.) All Kanji (i.e., ideographic
# characters in JIS X0208) are classified as alpha, not
# punct.
# Undefined code points are not included in any class,
# because they are neither printable characters nor control
# characters.
LC_CTYPE
# Upper and lower consist of:
# alphabets in portable character set,
# Roman letters in JIS X0208,
# Greek letters in JIS X0208 and
# Russian letters in JIS X0208.
upper <A>;<B>;<C>;<D>;<E>;<F>;<G>;<H>;<I>;<J>;<K>;<L>;<M>;\
<N>;<O>;<P>;<Q>;<R>;<S>;<T>;<U>;<V>;<W>;<X>;<Y>;<Z>;\
<2341>;...;<235A>;\
<2621>;...;<2638>;\
<2721>;...;<2741>
lower <a>;<b>;<c>;<d>;<e>;<f>;<g>;<h>;<i>;<j>;<k>;<l>;<m>;\
<n>;<o>;<p>;<q>;<r>;<s>;<t>;<u>;<v>;<w>;<x>;<y>;<z>;\
<2361>;...;<237A>;\
<2641>;...;<2658>;\
<2751>;...;<2771>
# Digit and xdigit contain ASCII characters only, as required
# in the standard.
digit <zero>;<one>;<two>;<three>;<four>;\
<five>;<six>;<seven>;<eight>;<nine>
xdigit <zero>;<one>;<two>;<three>;<four>;\
<five>;<six>;<seven>;<eight>;<nine>;\
<A>;<B>;<C>;<D>;<E>;<F>;\
<a>;<b>;<c>;<d>;<e>;<f>
# JIS X0208 contains a ``Kanji space'' character.
# It is recognized as a ``locale specific white-space
# character.''
space <tab>;<newline>;<vertical-tab>;<form-feed>;\
<carriage-return>;<space>;\
<2121>
# Cntrl defined here is exactly the same as the one defined in
# POSIX.2 D10. No locale specific characters are added.
cntrl <alert>;<backspace>;<tab>;<newline>;<vertical-tab>;\
<form-feed>;<carriage-return>;\
<NUL>;<SOH>;<STX>;<ETX>;<EOT>;<ENQ>;<ACK>;<SO>;\
<SI>;<DLE>;<DC1>;<DC2>;<DC3>;<DC4>;<NAK>;<SYN>;\
<ETB>;<CAN>;<EM>;<SUB>;<ESC>;<IS4>;<IS3>;<IS2>;\
<IS1>;<DEL>
# Alpha class consists of all characters in upper and lower,
# and:
# Hiragana in JIS X0208,
# Katakana in JIS X0208 and
# Kanji of level 1 and 2 in JIS X0208.
#
# Please note that the characters specified as upper or
# lower are automatically included in alpha.
alpha <2421>;...;<2473>;\
<2521>;...;<2576>;\
<3021>;...;<307E>;<3121>;...;<317E>;<3221>;...;<327E>;\
<3321>;...;<337E>;<3421>;...;<347E>;<3521>;...;<357E>;\
<3621>;...;<367E>;<3721>;...;<377E>;<3821>;...;<387E>;\
<3921>;...;<397E>;<3A21>;...;<3A7E>;<3B21>;...;<3B7E>;\
<3C21>;...;<3C7E>;<3D21>;...;<3D7E>;<3E21>;...;<3E7E>;\
<3F21>;...;<3F7E>;<4021>;...;<407E>;<4121>;...;<417E>;\
<4221>;...;<427E>;<4321>;...;<437E>;<4421>;...;<447E>;\
<4521>;...;<457E>;<4621>;...;<467E>;<4721>;...;<477E>;\
<4821>;...;<487E>;<4921>;...;<497E>;<4A21>;...;<4A7E>;\
<4B21>;...;<4B7E>;<4C21>;...;<4C7E>;<4D21>;...;<4D7E>;\
<4E21>;...;<4E7E>;<4F21>;...;<4F53>;\
<5021>;...;<507E>;<5121>;...;<517E>;<5221>;...;<527E>;\
<5321>;...;<537E>;<5421>;...;<547E>;<5521>;...;<557E>;\
<5621>;...;<567E>;<5721>;...;<577E>;<5821>;...;<587E>;\
<5921>;...;<597E>;<5A21>;...;<5A7E>;<5B21>;...;<5B7E>;\
<5C21>;...;<5C7E>;<5D21>;...;<5D7E>;<5E21>;...;<5E7E>;\
<5F21>;...;<5F7E>;<6021>;...;<607E>;<6121>;...;<617E>;\
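<6221>;...;<627E>;<6321>;...;<637E>;<6421>;...;<647E>;\
<6521>;...;<657E>;<6621>;...;<667E>;<6721>;...;<677E>;\
<6821>;...;<687E>;<6921>;...;<697E>;<6A21>;...;<6A7E>;\
<6B21>;...;<6B7E>;<6C21>;...;<6C7E>;<6D21>;...;<6D7E>;\
<6E21>;...;<6E7E>;<6F21>;...;<6F7E>;<7021>;...;<707E>;\
<7121>;...;<717E>;<7221>;...;<727E>;<7321>;...;<737E>;\
<7421>;...;<7426>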
# Punct contains all other printable characters, i.e.:
# symbols in portable character set,
# special characters in JIS X0208,
# ruled line elements in JIS X0208.
punct <exclamation-mark>;<quotation-mark>;<number-sign>;\
<dollar-sign>;<percent>;<ampersand>;<apostrophe>;\
<left-parenthesis>;<right-parenthesis>;<asterisk>;\
<plus-sign>;<comma>;<hyphen>;<period>;<slash>;\
<colon>;<semicolon>;<less-than>;<equals-sign>;\
<greater-than>;<question-mark>;<commercial-at>;\
<left-bracket>;<backslash>;<right-bracket>;\
<circumflex>;<underscore>;<grave-accent>;\
<left-brace>;<vertical-line>;<right-brace>;<tilde>;\
<2122>;...;<217E>;<2221>;...;<222E>;<223A>;...;<2241>;\
<224A>;...;<2250>;<225C>;...;<226A>;<2272>;...;<2279>;\
<227E>;\
<2821>;...;<2840>
# Alnum, graph and print are not defined here. They will
# be supplied by the localedef command.
# blank is the same as in the C locale.
blank <space>;<tab>
# Toupper and tolower handle characters in JIS X0208 as well.
toupper (<a>,<A>);(<b>,<B>);(<c>,<C>);(<d>,<D>);(<e>,<E>);\
(<f>,<F>);(<g>,<G>);(<h>,<H>);(<i>,<I>);(<j>,<J>);\
(<k>,<K>);(<l>,<L>);(<m>,<M>);(<n>,<N>);(<o>,<O>);\
(<p>,<P>);(<q>,<Q>);(<r>,<R>);(<s>,<S>);(<t>,<T>);\
(<u>,<U>);(<v>,<V>);(<w>,<W>);(<x>,<X>);(<y>,<Y>);\
(<z>,<Z>);\
(<2361>,<2341>);(<2362>,<2342>);(<2363>,<2343>);\
(<2364>,<2344>);(<2365>,<2345>);(<2366>,<2346>);\
(<2367>,<2347>);(<2368>,<2348>);(<2369>,<2349>);\
(<236A>,<234A>);(<236B>,<234B>);(<236C>,<234C>);\
(<236D>,<234D>);(<236E>,<234E>);(<236F>,<234F>);\
(<2370>,<2350>);(<2371>,<2351>);(<2372>,<2352>);\
(<2373>,<2353>);(<2374>,<2354>);(<2375>,<2355>);\
(<2376>,<2356>);(<2377>,<2357>);(<2378>,<2358>);\
(<2379>,<2359>);(<237A>,<235A>);\
(<2641>,<2621>);(<2642>,<2622>);(<2643>,<2623>);\
(<2644>,<2624>);(<2645>,<2625>);(<2646>,<2626>);\
(<2647>,<2627>);(<2648>,<2628>);(<2649>,<2629>);\
(<264A>,<262A>);(<264B>,<262B>);(<264C>,<262C>);\
(<264D>,<262D>);(<264E>,<262E>);(<264F>,<262F>);\
(<2650>,<2630>);(<2651>,<2631>);(<2652>,<2632>);\
(<2653>,<2633>);(<2654>,<2634>);(<2655>,<2635>);\
(<2656>,<2636>);(<2657>,<2637>);(<2658>,<2638>);\
(<2751>,<2721>);(<2752>,<2722>);(<2753>,<2723>);\
(<2754>,<2724>);(<2755>,<2725>);(<2756>,<2726>);\
(<2757>,<2727>);(<2758>,<2728>);(<2759>,<2729>);\
(<275A>,<272A>);(<275B>,<272B>);(<275C>,<272C>);\
(<275D>,<272D>);(<275E>,<272E>);(<275F>,<272F>);\
(<2760>,<2730>);(<2761>,<2731>);(<2762>,<2732>);\
(<2763>,<2733>);(<2764>,<2734>);(<2765>,<2735>);\
(<2766>,<2736>);(<2767>,<2737>);(<2768>,<2738>);\
(<2769>,<2739>);(<276A>,<273A>);(<276B>,<273B>);\
(<276C>,<273C>);(<276D>,<273D>);(<276E>,<273E>);\
(<276F>,<273F>);(<2770>,<2740>);(<2771>,<2741>)
tolower (<A>,<a>);(<B>,<b>);(<C>,<c>);(<D>,<d>);(<E>,<e>);\
(<F>,<f>);(<G>,<g>);(<H>,<h>);(<I>,<i>);(<J>,<j>);\
(<K>,<k>);(<L>,<l>);(<M>,<m>);(<N>,<n>);(<O>,<o>);\
(<P>,<p>);(<Q>,<q>);(<R>,<r>);(<S>,<s>);(<T>,<t>);\
(<U>,<u>);(<V>,<v>);(<W>,<w>);(<X>,<x>);(<Y>,<y>);\
(<Z>,<z>);\
(<2341>,<2361>);(<2342>,<2362>);(<2343>,<2363>);\
(<2344>,<2364>);(<2345>,<2365>);(<2346>,<2366>);\
(<2347>,<2367>);(<2348>,<2368>);(<2349>,<2369>);\
(<234A>,<236A>);(<234B>,<236B>);(<234C>,<236C>);\
(<234D>,<236D>);(<234E>,<236E>);(<234F>,<236F>);\
(<2350>,<2370>);(<2351>,<2371>);(<2352>,<2372>);\
(<2353>,<2373>);(<2354>,<2374>);(<2355>,<2375>);\
(<2356>,<2376>);(<2357>,<2377>);(<2358>,<2378>);\
(<2359>,<2379>);(<235A>,<237A>);\
(<2621>,<2641>);(<2622>,<2642>);(<2623>,<2643>);\
(<2624>,<2644>);(<2625>,<2645>);(<2626>,<2646>);\
(<2627>,<2647>);(<2628>,<2648>);(<2629>,<2649>);\
(<262A>,<264A>);(<262B>,<264B>);(<262C>,<264C>);\
(<262D>,<264D>);(<262E>,<264E>);(<262F>,<264F>);\
(<2630>,<2650>);(<2631>,<2651>);(<2632>,<2652>);\
(<2633>,<2653>);(<2634>,<2654>);(<2635>,<2655>);\
(<2636>,<2656>);(<2637>,<2657>);(<2638>,<2658>);\
(<2721>,<2751>);(<2722>,<2752>);(<2723>,<2753>);\
(<2724>,<2754>);(<2725>,<2755>);(<2726>,<2756>);\
(<2727>,<2757>);(<2728>,<2758>);(<2729>,<2759>);\
(<272A>,<275A>);(<272B>,<275B>);(<272C>,<275C>);\
(<272D>,<275D>);(<272E>,<275E>);(<272F>,<275F>);\
(<2730>,<2760>);(<2731>,<2761>);(<2732>,<2762>);\
(<2733>,<2763>);(<2734>,<2764>);(<2735>,<2765>);\
(<2736>,<2766>);(<2737>,<2767>);(<2738>,<2768>);\
(<2739>,<2769>);(<273A>,<276A>);(<273B>,<276B>);\
(<273C>,<276C>);(<273D>,<276D>);(<273E>,<276E>);\
(<273F>,<276F>);(<2740>,<2770>);(<2741>,<2771>)
END LC_CTYPE
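The long toupper list above follows mechanically from the upper and lower ranges, so it can be generated rather than typed. The Python sketch below expands an ellipsis range such as `<2361>;...;<237A>` and pairs each lowercase symbol with its uppercase counterpart. The function names and the `BLOCKS` table are hypothetical; the code values are the ones listed in the definition.

```python
def expand(first: int, last: int):
    """Expand an ellipsis range into symbol names (the range stays within one row)."""
    return ["<%04X>" % code for code in range(first, last + 1)]


# (lower_first, lower_last, upper_first) for the Roman, Greek and Russian
# letters of JIS X0208, copied from the upper and lower lists.
BLOCKS = [
    (0x2361, 0x237A, 0x2341),
    (0x2641, 0x2658, 0x2621),
    (0x2751, 0x2771, 0x2721),
]


def toupper_pairs():
    """Return (lower, upper) symbol pairs in the order used by toupper."""
    pairs = []
    for lo_first, lo_last, up_first in BLOCKS:
        lowers = expand(lo_first, lo_last)
        uppers = expand(up_first, up_first + (lo_last - lo_first))
        pairs.extend(zip(lowers, uppers))
    return pairs


pairs = toupper_pairs()
print(";".join("(%s,%s)" % pair for pair in pairs[:3]))
```

Swapping the two elements of each pair gives the corresponding tolower entries.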
# Notes on LC_COLLATE:
# The following definition provides straightforward
# collation for all characters in our Japanese locale.
# The ordering is based on byte-by-byte comparison of
# their codes (in UJIS). No *fancy* ordering is
# described here.
LC_COLLATE
order start
<NUL>
<SOH>
<STX>
<ETX>
<EOT>
<ENQ>
<ACK>
<alert>
<backspace>
<tab>
<newline>
<vertical-tab>
<form-feed>
<carriage-return>
<SO>
<SI>
<DLE>
<DC1>
<DC2>
<DC3>
<DC4>
<NAK>
<SYN>
<ETB>
<CAN>
<EM>
<SUB>
<ESC>
<IS4>
<IS3>
<IS2>
<IS1>
<space>
<exclamation-mark>
<quotation-mark>
<number-sign>
<dollar-sign>
<percent>
<ampersand>
<apostrophe>
<left-parenthesis>
<right-parenthesis>
<asterisk>
<plus-sign>
<comma>
<hyphen>
<period>
<slash>
<zero>
<one>
<two>
<three>
<four>
<five>
<six>
<seven>
<eight>
<nine>
<colon>
<semicolon>
<less-than>
<equals-sign>
<greater-than>
<question-mark>
<commercial-at>
<A>
<B>
<C>
<D>
<E>
<F>
<G>
<H>
<I>
<J>
<K>
<L>
<M>
<N>
<O>
<P>
<Q>
<R>
<S>
<T>
<U>
<V>
<W>
<X>
<Y>
<Z>
<left-bracket>
<backslash>
<right-bracket>
<circumflex>
<underscore>
<grave-accent>
<a>
<b>
<c>
<d>
<e>
<f>
<g>
<h>
<i>
<j>
<k>
<l>
<m>
<n>
<o>
<p>
<q>
<r>
<s>
<t>
<u>
<v>
<w>
<x>
<y>
<z>
<left-brace>
<vertical-line>
<right-brace>
<tilde>
<DEL>
<2121>
...
<217E>
<2221>
...
<227E>
<2321>
...
<237E>
<2421>
...
<247E>
<2521>
...
<257E>
<2621>
...
<267E>
<2721>
...
<277E>
<2821>
...
<287E>
<2921>
...
<297E>
<2A21>
...
<2A7E>
<2B21>
...
<2B7E>
<2C21>
...
<2C7E>
<2D21>
...
<2D7E>
<2E21>
...
<2E7E>
<2F21>
...
<2F7E>
<3021>
...
<307E>
<3121>
...
<317E>
<3221>
...
<327E>
<3321>
...
<337E>
<3421>
...
<347E>
<3521>
...
<357E>
<3621>
...
<367E>
<3721>
...
<377E>
<3821>
...
<387E>
<3921>
...
<397E>
<3A21>
...
<3A7E>
<3B21>
...
<3B7E>
<3C21>
...
<3C7E>
<3D21>
...
<3D7E>
<3E21>
...
<3E7E>
<3F21>
...
<3F7E>
<4021>
...
<407E>
<4121>
...
<417E>
<4221>
...
<427E>
<4321>
...
<437E>
<4421>
...
<447E>
<4521>
...
<457E>
<4621>
...
<467E>
<4721>
...
<477E>
<4821>
...
<487E>
<4921>
...
<497E>
<4A21>
...
<4A7E>
<4B21>
...
<4B7E>
<4C21>
...
<4C7E>
<4D21>
...
<4D7E>
<4E21>
...
<4E7E>
<4F21>
...
<4F7E>
<5021>
...
<507E>
<5121>
...
<517E>
<5221>
...
<527E>
<5321>
...
<537E>
<5421>
...
<547E>
<5521>
...
<557E>
<5621>
...
<567E>
<5721>
...
<577E>
<5821>
...
<587E>
<5921>
...
<597E>
<5A21>
...
<5A7E>
<5B21>
...
<5B7E>
<5C21>
...
<5C7E>
<5D21>
...
<5D7E>
<5E21>
...
<5E7E>
<5F21>
...
<5F7E>
<6021>
...
<607E>
<6121>
...
<617E>
<6221>
...
<627E>
<6321>
...
<637E>
<6421>
...
<647E>
<6521>
...
<657E>
<6621>
...
<667E>
<6721>
...
<677E>
<6821>
...
<687E>
<6921>
...
<697E>
<6A21>
...
<6A7E>
<6B21>
...
<6B7E>
<6C21>
...
<6C7E>
<6D21>
...
<6D7E>
<6E21>
...
<6E7E>
<6F21>
...
<6F7E>
<7021>
...
<707E>
<7121>
...
<717E>
<7221>
...
<727E>
<7321>
...
<737E>
<7421>
...
<747E>
<7521>
...
<757E>
<7621>
...
<767E>
<7721>
...
<777E>
<7821>
...
<787E>
<7921>
...
<797E>
<7A21>
...
<7A7E>
<7B21>
...
<7B7E>
<7C21>
...
<7C7E>
<7D21>
...
<7D7E>
<7E21>
...
<7E7E>
order end
END LC_COLLATE
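The ordering described in the notes on LC_COLLATE amounts to comparing the UJIS byte sequences. A minimal Python sketch of that behaviour, assuming Python's `euc_jp` codec as a stand-in for UJIS (the sample words are made up for illustration):

```python
def ujis_key(text: str) -> bytes:
    """Sort key: the EUC-JP bytes of the string, compared byte by byte."""
    return text.encode("euc_jp")


words = ["日本", "abc", "あい", "アイ"]
print(sorted(words, key=ujis_key))
```

ASCII sorts first, then Hiragana, Katakana and Kanji, simply because that is the order of their lead bytes; nothing language-aware happens.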
# Notes on LC_MESSAGES
#
# The name of this category is LC_MESSAGES here, as in
# POSIX.2 D10.
#
# Any string starting with any form of Latin Y, or with
# Hiragana or Katakana HA (the first sound of ``hai''), is
# recognized as an affirmative answer. A negative answer is
# recognized by Latin N, or by Hiragana or Katakana I (which
# stands for ``iie'').
LC_MESSAGES
yesexpr "^[<y><Y><2379><2359><244F><254F>]"
noexpr "^[<n><N><236E><234E><2424><2524>]"
END LC_MESSAGES
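For readers who want to see what yesexpr and noexpr accept, here is a small Python sketch with the hexadecimal names replaced by the characters they stand for (fullwidth y/Y/n/N, HA and I in Hiragana and Katakana). The `classify` helper is hypothetical, and Python's `re` syntax stands in for the POSIX regular expressions.

```python
import re

# <y><Y><2379><2359><244F><254F> and <n><N><236E><234E><2424><2524>
YESEXPR = re.compile("^[yYｙＹはハ]")
NOEXPR = re.compile("^[nNｎＮいイ]")


def classify(answer: str) -> str:
    """Return "yes", "no" or "unknown" for a user's reply."""
    if YESEXPR.search(answer):
        return "yes"
    if NOEXPR.search(answer):
        return "no"
    return "unknown"


for reply in ("はい", "いいえ", "Yes", "ＮＯ", "maybe"):
    print(reply, classify(reply))
```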
# Notes for LC_MONETARY:
# The <yen> character used as currency_symbol is intended
# to be assigned to either <backslash> (whose ASCII code
# is the YEN sign in JIS X0201) or <216F>.
LC_MONETARY
int_curr_symbol "<J><P><Y><space>"
currency_symbol "<yen>"
mon_decimal_point ""
mon_thousand_sep "<comma>"
mon_grouping "3;0"
positive_sign ""
negative_sign "-"
int_frac_digits "0"
frac_digits "0"
p_cs_precedes "1"
p_sep_by_space "0"
n_cs_precedes "1"
n_sep_by_space "0"
p_sign_posn "1"
n_sign_posn "4"
END LC_MONETARY
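A short Python sketch of how the LC_MONETARY fields above combine when formatting an amount. It is an illustration of the field meanings, not a real `strfmon` implementation; the function name `format_yen` and the default symbol U+00A5 are assumptions.

```python
def format_yen(amount: int, symbol: str = "\u00a5") -> str:
    """Format an integer amount following the LC_MONETARY fields above."""
    # negative_sign is "-", positive_sign is empty.
    sign = "-" if amount < 0 else ""
    # mon_grouping "3;0" with mon_thousand_sep <comma>; frac_digits is 0,
    # so there is no fractional part and no decimal point.
    digits = f"{abs(amount):,}"
    # cs_precedes 1 and sep_by_space 0 put the symbol first with no space;
    # n_sign_posn 4 places the sign immediately after the currency symbol.
    return f"{symbol}{sign}{digits}"


print(format_yen(1234567))
print(format_yen(-1234))
```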
# Notes for LC_NUMERIC
# The Japanese numeric representation was imported from the
# U.S. in the late 19th century, so the following definition
# is exactly the same as the U.S. one.
LC_NUMERIC
decimal_point "<period>"
thousands_sep "<comma>"
grouping "3;0"
END LC_NUMERIC
# Notes for LC_TIME
# Our definition uses Kanji where appropriate.
# We have the same definition for abmon and mon, because the
# full Japanese representation of a month is short enough.
# Ancient names (MUTUKI, KISARAGI, YAYOI, etc.) are not used
# these days.
LC_TIME
# NICHI, GETSU, KA, SUI, MOKU, KIN, DO in Kanji.
abday "<467C>";"<376E>";"<3250>";"<3F65>";\
"<4C5A>";"<3662>";"<455A>"
# NICHI-YOBI, GETSU-YOBI, and so on.
day "<467C><4D4B><467C>";"<376E><4D4B><467C>";\
"<3250><4D4B><467C>";"<3F65><4D4B><467C>";\
"<4C5A><4D4B><467C>";"<3662><4D4B><467C>";\
"<455A><4D4B><467C>"
# 1-gatsu, 2-gatsu, and so on, in ``Kanji-digits''
abmon "<2331><376E>";"<2332><376E>";"<2333><376E>";\
"<2334><376E>";"<2335><376E>";"<2336><376E>";\
"<2337><376E>";"<2338><376E>";"<2339><376E>";\
"<2331><2330><376E>";"<2331><2331><376E>";\
"<2331><2332><376E>"
# same as abmon
mon "<2331><376E>";"<2332><376E>";"<2333><376E>";\
"<2334><376E>";"<2335><376E>";"<2336><376E>";\
"<2337><376E>";"<2338><376E>";"<2339><376E>";\
"<2331><2330><376E>";"<2331><2331><376E>";\
"<2331><2332><376E>"
# Era year definition: THIS IS AN X/OPEN EXTENSION
# This definition handles these 4 eras only, i.e., HEISEI,
# SHOWA, TAISHO and MEIJI. Years before MEIJI are printed
# as SEIREKI (which is ``A.D.'') or KIGENZEN (which is ``B.C.'')
era "+:2:1990/01/01:+*:<4A3F><402E>:%N%o<472F>";\
"+:1:1989/01/08:1989/12/31:<4A3F><402E>:%N<3835><472F>";\
"+:2:1927/01/01:1989/01/07:<3E3C><4F42>:%N%o<472F>";\
"+:1:1926/12/25:1926/12/31:<3E3C><4F42>:%N<3835><472F>";\
"+:2:1913/01/01:1926/12/24:<4267><4035>:%N%o<472F>";\
"+:1:1912/07/30:1912/12/31:<4267><4035>:%N<3835><472F>";\
"+:2:1869/01/01:1912/07/29:<4C40><3C23>:%N%o<472F>";\
"+:1:1868/09/08:1868/12/31:<4C40><3C23>:%N<3835><472F>";\
"+:1:1/1/1:1868/09/07:<403E><4E71>:%N%o<472F>";\
"-:1:-1/12/31:-*:<352A><3835><4130>:%N%o<472F>"
# expected output:
# HEISEI 2 NEN 8 GATSU 20 NICHI ( GETSU ) 9 JI 30 FUN 0 BYOU
# in Kanji. (Note that 1990 is Heisei 2.)
d_t_fmt "%E%m<376E>%d<467C><214A>%a<214B>%H<3B7E>%M<4A2C>%S<4943>"
# 2/8/20
# Note that %o is X/Open extension.
d_fmt "%o<slash>%m<slash>%d"
# 9:30:00
t_fmt "%H<colon>%M<colon>%S"
# GOZEN and GOGO in Kanji; they are not usually needed
am_pm "<3861><4130>";"<3861><3865>"
END LC_TIME
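The era entries in LC_TIME can be read as "offset : start date : end date : name", where the era year is the offset plus the number of calendar years since the start date. The Python sketch below transcribes the four named eras under that reading. The romanized names come from the comment in the definition; the table layout and the function name `era_year` are assumptions, and the SEIREKI and KIGENZEN fallback entries are left out.

```python
from datetime import date

# (offset, first_day, last_day, romanized_name); "+*" is modelled as date.max.
ERAS = [
    (2, date(1990, 1, 1), date.max, "Heisei"),
    (1, date(1989, 1, 8), date(1989, 12, 31), "Heisei"),
    (2, date(1927, 1, 1), date(1989, 1, 7), "Showa"),
    (1, date(1926, 12, 25), date(1926, 12, 31), "Showa"),
    (2, date(1913, 1, 1), date(1926, 12, 24), "Taisho"),
    (1, date(1912, 7, 30), date(1912, 12, 31), "Taisho"),
    (2, date(1869, 1, 1), date(1912, 7, 29), "Meiji"),
    (1, date(1868, 9, 8), date(1868, 12, 31), "Meiji"),
]


def era_year(day: date):
    """Return (era_name, year_within_era) for a Gregorian date."""
    for offset, first, last, name in ERAS:
        if first <= day <= last:
            return name, offset + (day.year - first.year)
    raise ValueError("date is before Meiji; not covered by this sketch")


# The date used in the "expected output" comment of d_t_fmt.
print(era_year(date(1990, 8, 20)))
```

Splitting each era into a one-year "first year" entry and a second entry starting on the following 1 January is what lets the first year be printed with a special format in the definition.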
--
Sasayama <Kaz.Sasayama@xxxxxxxxxxxxxxx> / Hypercore Ltd.
"The only Debian GNU/Linux consultant in Japan"
* The linux-announce news service is accepting Linux-related "news" submissions
* <URL:http://www.spice.or.jp/%7Ehypercor/linux-announce/>