[chinese mac] CNS 11643 Not in Unicode

CNS 11643 in Unicode's Supplementary Private Use Area

These are hanzi found in CNS 11643 but not yet included in Unicode. You can see the glyphs in either TW-Kai-Plus-98_1.ttf or TW-Sung-Plus-98_1.ttf, available at https://data.gov.tw/dataset/5961 [Open_Data/Fonts/].

You can install these fonts, but you don't have to in order to see what's in them. The utility Font File Browser works well for this. You can navigate to the code point in the Characters/Glyphs list (it is in Unicode order), or load this TXT file (and the others below) in the "Custom Text" view: [PUA_1-7.zip]

Planes 1-2

All CNS 11643-2007 Plane 1 and Plane 2 hanzi are in Unicode.

Plane 3

One CNS 11643-1992/2007 Plane 3 hanzi is not in Unicode: T3-6168, U+FFF79 = 󿽹
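
Entries on this page pair a CNS code (T, then the plane in hex, then a two-byte code) with a Unicode code point. As a minimal sketch of how such an entry can be read, the Python below splits one into its parts and checks that the code point falls in Supplementary Private Use Area-A (U+F0000 to U+FFFFD). The names `parse_entry`, `ENTRY`, and `SPUA_A` are hypothetical, and the row/cell arithmetic assumes the usual 94x94 layout in which each byte runs from 0x21 to 0x7E.

```python
import re

# Supplementary Private Use Area-A spans U+F0000..U+FFFFD (Unicode Plane 15).
SPUA_A = range(0xF0000, 0xFFFFD + 1)

# Matches entries written the way this page writes them, e.g. "T3-6168, U+FFF79".
ENTRY = re.compile(r"T([0-9A-F]+)-([0-9A-F]{4}),\s*U\+([0-9A-F]{4,6})")


def parse_entry(text):
    """Split one 'Tp-xxxx, U+yyyyy' entry into its parts (hypothetical helper)."""
    m = ENTRY.search(text)
    if m is None:
        raise ValueError(f"not a CNS/Unicode entry: {text!r}")
    plane = int(m.group(1), 16)       # CNS plane, written in hex (TF = plane 15)
    code = int(m.group(2), 16)        # two-byte code inside that plane
    code_point = int(m.group(3), 16)  # Unicode scalar value
    return {
        "plane": plane,
        # Assumption: 94x94 layout, so subtracting 0x20 gives 1-based row/cell.
        "row": (code >> 8) - 0x20,
        "cell": (code & 0xFF) - 0x20,
        "code_point": code_point,
        "char": chr(code_point),
        "in_spua_a": code_point in SPUA_A,
    }


entry = parse_entry("T3-6168, U+FFF79")
print(entry["plane"], entry["row"], entry["cell"], entry["in_spua_a"])
```

The character only displays as a hanzi when one of the fonts named above is used; any other font will show a placeholder for a Private Use code point.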

Plane 4

Eleven CNS 11643-1992/2007 Plane 4 hanzi are not in Unicode. The list has a twelfth entry, marked *, which Unicode does include (see the note after the list):

T4-225B, U+FFF7A = 󿽺
T4-2361, U+FFFFD = 󿿽
T4-276A, U+FFFFC = 󿿼
T4-2827, U+FFFFB = 󿿻
T4-287D, U+FFFFA = 󿿺
T4-2A6E, U+FFFF9 = 󿿹
T4-3042, U+FFFF8 = 󿿸
T4-385C, U+FFFF7 = 󿿷
T4-4458, U+FFFF6 = 󿿶
T4-533C, U+FFF7B = 󿽻*
T4-6339, U+FFFF5 = 󿿵
T4-655F, U+FFFF4 = 󿿴

* Unicode maps T4-533C to U+8786. There is a Compatibility Ideograph at U+2F9BE that maps to TF-517D. The CNS data omits T4-533C and instead maps TF-517D to the CJK Unified Ideograph at U+8786. This is probably an old mistake from before U+2F9BE existed, but it rests on a misunderstanding of how Unicode works. Two glyphs were unified at U+8786: one GSource (PRC), and one shared by TSource (T4-533C), JSource (Japan), and HSource (Hong Kong): [PDF] The GSource glyph is in CNS at TF-517D as a variant, so a Compatibility Ideograph was created for it: [PDF] The CNS data treats the GSource glyph at U+8786 as if it were normative, but it is not. Both glyphs are correct representations of the unified character.
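
The relationship between the Compatibility Ideograph and the unified character can be checked from Python's standard `unicodedata` module. This is only a sketch: it relies on the Unicode data bundled with the interpreter, and the helper name `canonical_target` is hypothetical. It shows that U+2F9BE carries a canonical decomposition to U+8786, so normalization folds the variant back into the unified character.

```python
import unicodedata

COMPAT = "\U0002F9BE"   # CJK Compatibility Ideograph discussed in the note above
UNIFIED = "\u8786"      # CJK Unified Ideograph it was created alongside


def canonical_target(ch):
    """Return what NFC normalization turns a single character into."""
    return unicodedata.normalize("NFC", ch)


# The decomposition field holds the hex code point of the unified character.
print(unicodedata.decomposition(COMPAT))
# Normalizing the compatibility ideograph yields the unified ideograph.
print(canonical_target(COMPAT) == UNIFIED)
```

One practical consequence is that text passed through any normalization form loses the distinction between the two code points, so U+2F9BE is only useful for round-tripping with TF-517D.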

One hanzi not in Unicode has been added since 2007: T4-6E5D, U+FFB68 = 󿭨

Plane 5

Two CNS 11643-1992/2007 Plane 5 hanzi are not in Unicode:

T5-234B, U+FFFF3 = 󿿳
T5-756C, U+FFFF2 = 󿿲

Three hanzi not in Unicode have been added since 2007:

T5-7C52, U+F8FDD = 󸿝
T5-7C53, U+F8FAF = 󸾯
T5-7C54, U+FFB67 = 󿭧

Planes 6-7

Three CNS 11643-1992/2007 Plane 6 and Plane 7 hanzi are not in Unicode:

T6-2A21, U+FFFF1 = 󿿱
T6-5C2F, U+FFFF0 = 󿿰
T7-4159, U+FFFEF = 󿿯

Three hanzi not in Unicode have been added since 2007:

T6-647B, U+FFB8A = 󿮊
T7-6656, U+F8FDE = 󸿞
T7-6657, U+F8FDA = 󸿚
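
To view the Plane 3 through Plane 7 characters in a font browser's custom text view, the code points listed above can be turned into a short string. The sketch below does that in Python; it is not the content of the attached zip, and the names `PUA_3_7` and `custom_text` are hypothetical.

```python
# Code points copied from the Plane 3-7 lists on this page, in page order.
PUA_3_7 = [
    0xFFF79,                                          # Plane 3
    0xFFF7A, 0xFFFFD, 0xFFFFC, 0xFFFFB, 0xFFFFA,      # Plane 4
    0xFFFF9, 0xFFFF8, 0xFFFF7, 0xFFFF6, 0xFFF7B,
    0xFFFF5, 0xFFFF4, 0xFFB68,
    0xFFFF3, 0xFFFF2, 0xF8FDD, 0xF8FAF, 0xFFB67,      # Plane 5
    0xFFFF1, 0xFFFF0, 0xFFFEF,                        # Planes 6-7
    0xFFB8A, 0xF8FDE, 0xF8FDA,
]


def custom_text(code_points, per_line=8):
    """Join code points into lines of characters for pasting into a viewer."""
    chars = [chr(cp) for cp in code_points]
    rows = [chars[i:i + per_line] for i in range(0, len(chars), per_line)]
    return "\n".join("".join(row) for row in rows)


text = custom_text(PUA_3_7)
print(len(PUA_3_7), "characters on", text.count("\n") + 1, "lines")
```

The resulting string is ordinary UTF-8 text once saved; the glyphs appear only when one of the TW-Kai or TW-Sung Plus fonts is selected.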

Planes 8-9

In CNS 11643-2007, Planes 8 and 9 were not used for hanzi, but 974 hanzi have been added to Plane 9 since then. None are in Unicode: [PUA_9.zip]

Planes 10-11 [0xA-B]

There are 4,172 hanzi not in Unicode on Plane 11: [PUA_11.zip]

Plane 12 [0xC]

There are 5,354 hanzi not in Unicode on Plane 12: [PUA_12.zip]

Plane 13 [0xD]

There are 4,523 hanzi not in Unicode on Plane 13: [PUA_13.zip]

Plane 14 [0xE]

There are 3,113 hanzi not in Unicode on Plane 14: [PUA_14.zip]

Plane 15 [0xF]

There are 175 hanzi not in Unicode on Plane 15: [PUA_15.zip]

Plane 17 [0x11]

28 hanzi, none in Unicode: [PUA_17.zip]

Plane 19 [0x13]

1,760 hanzi, none in Unicode: [PUA_19.zip]
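
As a quick arithmetic check, the per-plane counts given above for Planes 9 through 19 can be summed and compared with the size of Supplementary Private Use Area-A (U+F0000 to U+FFFFD). This is a sketch only: the dictionary name `COUNTS` is hypothetical, and it leaves out the handful of Plane 3 through Plane 7 characters listed individually.

```python
# Counts of hanzi not in Unicode, as stated on this page (CNS plane -> count).
COUNTS = {9: 974, 11: 4172, 12: 5354, 13: 4523, 14: 3113, 15: 175, 17: 28, 19: 1760}

# Supplementary Private Use Area-A: U+F0000..U+FFFFD inclusive.
SPUA_A_SIZE = 0xFFFFD - 0xF0000 + 1

total = sum(COUNTS.values())

for plane, count in sorted(COUNTS.items()):
    # Same plane labelling as the headings above, e.g. "Plane 17 [0x11]".
    print(f"Plane {plane} [0x{plane:X}]: {count:,}")

print(f"Total: {total:,} of {SPUA_A_SIZE:,} available code points")
# Total: 20,099 of 65,534 available code points
```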

SOURCE: https://data.gov.tw/dataset/5961 [Open_Data/Mapping Tables/Unicode/CNS2UNICODE_Unicode 15.txt] [11/3/2017]
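
The counts above can in principle be reproduced from the mapping table itself. The sketch below tallies Private Use entries per CNS plane. It assumes each line of the table holds a CNS code written as plane-code in hex, then whitespace, then a Unicode code point in hex (for example `3-6168` followed by `FFF79`); that layout is an assumption, not something stated in the source, so check it against the actual file before relying on the result. The function name `count_pua_by_plane` and the `SAMPLE` lines are hypothetical, with the pairs taken from this page.

```python
from collections import Counter


def count_pua_by_plane(lines):
    """Tally entries that map into U+F0000..U+FFFFD, keyed by CNS plane."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Assumed layout: "<plane hex>-<code hex>  <Unicode hex>"; skip anything else.
        if len(fields) != 2 or "-" not in fields[0]:
            continue
        plane_hex = fields[0].partition("-")[0]
        try:
            plane = int(plane_hex, 16)
            code_point = int(fields[1], 16)
        except ValueError:
            continue  # header, comment, or otherwise unexpected line
        if 0xF0000 <= code_point <= 0xFFFFD:
            counts[plane] += 1
    return counts


# Stand-in lines built from pairs mentioned on this page.
SAMPLE = ["3-6168\tFFF79", "4-225B\tFFF7A", "4-2361\tFFFFD", "F-517D\t8786"]
print(dict(count_pua_by_plane(SAMPLE)))
```

With the real table, the same function could be fed an open file object instead of `SAMPLE`, since it only iterates over lines.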