Errata fixed in Unicode 4.1.0


This page contains the definitive listing of all errata of record since the publication of The Unicode Standard, Version 4.0, and considered resolved by the release of Unicode Version 4.1. These errata are listed by date in the table below. For prior errata resolved in Unicode 4.0 and earlier, see Errata Fixed in Unicode 4.0.0. For errata still pending subsequent to the release of Unicode 4.1.0, see the list of current Updates and Errata.

Date Summary
2005-February-17 The following 7 Unified Han Ideographs are shown with incorrect representative glyphs in the printed code charts for Unicode 4.0. The representative glyph on the left below shows each character as it appeared in the earlier versions of the code charts; the glyph on the right shows each character as it should appear.

2004-November-15 In the character code charts for Unicode 4.0 (http://www.unicode.org/charts/PDF/Unicode-4.0/U40-2B00.pdf) the following characters are shown with incorrect representative glyphs, which contradict their character names: U+2B00, U+2B01, U+2B08, and U+2B09. The incorrect glyphs for each pair are shown on the left; the corrected glyphs are shown on the right. The correction ensures that the glyphs match the character identity as defined by the character names.

2004-November-15 In the character code charts for Unicode 4.0 (http://www.unicode.org/charts/PDF/Unicode-4.0/U40-2B00.pdf) the character U+01B3 LATIN CAPITAL LETTER Y WITH HOOK is shown with a representative glyph that is not the preferred form. The preferred form, with the hook on the right, is shown here:

2004-July-02 In the 4.0.0 and 4.0.1 versions of UAX #14 an update to the rules for handling the WJ and GL classes was omitted. The pair table, including its annotations that reflect which rules are invoked for each pair, was updated correctly. However, the text of the rules should have been updated as follows to split WJ off from GL and relax the rules for GL to allow SPACE to override the non-breaking nature of GL:
Word Joiner ~~Non-breaking characters~~:

LB 11b Don’t break before or after WORD JOINER, ~~NBSP, and related characters~~
× WJ ~~GL~~

~~GL~~ WJ ×

Spaces:
LB 12 Break after spaces
SP ÷

~~Many existing implementations reverse the order of precedence between rules LB11b and LB12.~~
Non-breaking characters:

LB 13 Don’t break before or after NBSP, and related characters

× GL

GL ×

The change in rule 12 affects only the comment. The modification section should have read:
Several changes to the rules. Moved rule 15b to 18b, added 14b, split rule 13 and moved WJ from 13 to 11b. Split rule 6 into 6a and 7b and split rule 3a into 3a and 3b. Restated rule 7a and added rule 7c.
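
The effect of the corrected rule order can be illustrated with a minimal sketch. It models only the three rules quoted above (LB 11b, LB 12, LB 13) and applies them in that order; the full UAX #14 algorithm has many more rules, so the result for a pair not covered here is not meaningful. The function and constant names are hypothetical and chosen for illustration only.

```python
# Sketch: only LB 11b, LB 12 and LB 13 from the erratum are modeled,
# evaluated in order of precedence. All names here are illustrative.
NO_BREAK, BREAK = "×", "÷"

def lb11b(before, after):
    # × WJ and WJ ×: never break before or after WORD JOINER.
    return NO_BREAK if "WJ" in (before, after) else None

def lb12(before, after):
    # SP ÷: break after spaces.
    return BREAK if before == "SP" else None

def lb13(before, after):
    # × GL and GL ×: do not break before or after NBSP and related characters.
    return NO_BREAK if "GL" in (before, after) else None

RULES = [("LB 11b", lb11b), ("LB 12", lb12), ("LB 13", lb13)]

def decide(before, after):
    """Return (rule name, decision) for the first rule that applies."""
    for name, rule in RULES:
        decision = rule(before, after)
        if decision is not None:
            return name, decision
    return None, None  # none of the three modeled rules applies

print(decide("SP", "GL"))  # LB 12 wins: a space overrides GL
print(decide("SP", "WJ"))  # LB 11b wins: a space does not override WJ
```

Because LB 12 sits between the two non-breaking rules, a preceding space permits a break before a GL character but not before WORD JOINER, which is the distinction the split was meant to introduce.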

2004-April-22 In the 4.0.1 version of UAX #29, in Table 1. Default Grapheme Cluster Boundaries, there is a mistake in an explanation. The property value is correct but the example is not.
The following:

Hangul_Syllable_Type=L, e.g.:
U+1100 (ᄀ) HANGUL CHOSEONG KIYEOK
..U+115F (ᅟ) HANGUL CHOSEONG FILLER

should have been:

Hangul_Syllable_Type=L, e.g.:
U+1100 (ᄀ) HANGUL CHOSEONG KIYEOK
..U+1159 (ᅙ) HANGUL CHOSEONG YEORINHIEUH
U+115F (ᅟ) HANGUL CHOSEONG FILLER
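
As an illustration of the corrected example, the following sketch hard-codes the range and the single code point listed above. It reflects only what this erratum states for 4.0.1; later versions of the standard may assign this property value to further code points, and the function name is hypothetical.

```python
def is_hst_l_per_erratum(cp: int) -> bool:
    """True if cp is in U+1100..U+1159 or is U+115F, as in the corrected example."""
    return 0x1100 <= cp <= 0x1159 or cp == 0x115F

# The code points between the range and the filler are not covered
# by the corrected example.
print(is_hst_l_per_erratum(0x1159), is_hst_l_per_erratum(0x115A), is_hst_l_per_erratum(0x115F))
```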

Also in Table 1. Default Grapheme Cluster Boundaries, the definition of the value Control is incorrect. It should have been adjusted for the change in status of the Joiner characters.
After the line:

and not U+000A LINE FEED (LF)

the following text is missing:

and not U+200C ZERO WIDTH NON-JOINER (ZWNJ)
and not U+200D ZERO WIDTH JOINER (ZWJ)
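
A minimal sketch of the corrected Control test follows. The exclusions for LINE FEED, ZWNJ, and ZWJ come from the text above. The set of General_Category values (Zl, Zp, Cc, Cf) and the exclusion of U+000D CARRIAGE RETURN are not quoted in this erratum and are assumed here from the UAX #29 table. Python's `unicodedata` module uses the Unicode version bundled with the interpreter, not 4.0.1, so results for individual characters may differ from the version discussed here.

```python
import unicodedata

# Assumption: the category set and the CR exclusion follow UAX #29 Table 1;
# only the LF, ZWNJ and ZWJ exclusions are quoted in the erratum itself.
CONTROL_CATEGORIES = {"Zl", "Zp", "Cc", "Cf"}
EXCLUDED = {
    0x000D,  # CARRIAGE RETURN (CR), assumed
    0x000A,  # LINE FEED (LF)
    0x200C,  # ZERO WIDTH NON-JOINER (ZWNJ)
    0x200D,  # ZERO WIDTH JOINER (ZWJ)
}

def is_grapheme_control(cp: int) -> bool:
    """Sketch of the corrected Control value for grapheme cluster boundaries."""
    return unicodedata.category(chr(cp)) in CONTROL_CATEGORIES and cp not in EXCLUDED

print(is_grapheme_control(0x0009))  # TAB is Cc and not excluded
print(is_grapheme_control(0x200D))  # ZWJ is Cf but explicitly excluded
```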

The UTC has committed to having the two properties Numeric_Type:Decimal and General_Category:Decimal_Number in the Unicode Character Database encompass exactly the same characters. In 4.0.1, a production error broke this equivalence.

The following lines in UnicodeData.txt:
1369;ETHIOPIC DIGIT ONE;Nd;0;L;;;1;1;N;;;;;
...
1371;ETHIOPIC DIGIT NINE;Nd;0;L;;;9;9;N;;;;;
should have been:
1369;ETHIOPIC DIGIT ONE;Nd;0;L;;1;1;1;N;;;;;
...
1371;ETHIOPIC DIGIT NINE;Nd;0;L;;9;9;9;N;;;;;

In DerivedNumericTypes.txt the following line:
1369..1371 ; Digit # Nd [9] ETHIOPIC DIGIT ONE..ETHIOPIC DIGIT NINE
should have been:
1369..1371 ; Decimal # Nd [9] ETHIOPIC DIGIT ONE..ETHIOPIC DIGIT NINE
The precise numeric properties of these characters are under review and the noted inconsistency will be resolved in the next version of the standard.
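
The inconsistency can be detected mechanically. The sketch below assumes the standard semicolon-separated layout of UnicodeData.txt, in which field 2 is the General_Category and field 6 is the decimal digit value; a record is flagged when exactly one of "category is Nd" and "decimal digit field is present" holds. The function name is hypothetical.

```python
def decimal_mismatch(record: str) -> bool:
    """True if a UnicodeData.txt record is Nd without a decimal digit value, or vice versa."""
    fields = record.rstrip("\n").split(";")
    is_nd = fields[2] == "Nd"          # General_Category
    has_decimal = fields[6] != ""      # decimal digit value
    return is_nd != has_decimal

as_published = "1369;ETHIOPIC DIGIT ONE;Nd;0;L;;;1;1;N;;;;;"
as_intended = "1369;ETHIOPIC DIGIT ONE;Nd;0;L;;1;1;1;N;;;;;"

print(decimal_mismatch(as_published))  # flags the 4.0.1 production error
print(decimal_mismatch(as_intended))   # the corrected record is consistent
```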
In DerivedCoreProperties.txt in 4.0.1, the comment line with the derivation for Default_Ignorable_Code_Point is in error.
The following:

# Generated from Other_Default_Ignorable_Code_Point + Cf + Cc + Cs
+ Noncharacters - White_Space - Annotation_characters
should have been:
# Generated from Other_Default_Ignorable_Code_Point + Cf + Cc + Cs
# + Variation_Selector + Noncharacter_Code_Point
# - White_Space - Annotation_characters
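
Read as set arithmetic, the corrected comment is a union of six sets followed by two subtractions. The following sketch shows that reading. The input sets are tiny illustrative samples, not the actual contents of the Unicode Character Database, and the function name is hypothetical.

```python
from typing import Dict, Set

def derive_default_ignorable(props: Dict[str, Set[int]]) -> Set[int]:
    """Apply the corrected derivation: union of the included sets minus the excluded sets."""
    included = (
        props["Other_Default_Ignorable_Code_Point"]
        | props["Cf"]
        | props["Cc"]
        | props["Cs"]
        | props["Variation_Selector"]
        | props["Noncharacter_Code_Point"]
    )
    excluded = props["White_Space"] | props["Annotation_characters"]
    return included - excluded

# Toy sample data for illustration only.
sample = {
    "Other_Default_Ignorable_Code_Point": {0x034F},
    "Cf": {0x00AD, 0xFFF9},
    "Cc": {0x0000, 0x0009},
    "Cs": {0xD800},
    "Variation_Selector": {0xFE00},
    "Noncharacter_Code_Point": {0xFFFE},
    "White_Space": {0x0009, 0x0020},
    "Annotation_characters": {0xFFF9, 0xFFFA, 0xFFFB},
}

print(sorted(hex(cp) for cp in derive_default_ignorable(sample)))
```

In this sample, U+0009 is removed because it is White_Space and U+FFF9 is removed because it is an annotation character, while the variation selector and the noncharacter are retained.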

In the 4.0.1 version of UCD.html under BIDIClass, above the row:
L Otherwise

the following default properties for BN should have been added:

BN U+2064..U+2069, U+FDD0..U+FDEF, U+FFFE..U+FFFF, U+1FFFE..U+1FFFF, U+2FFFE..U+2FFFF, U+3FFFE..U+3FFFF, U+4FFFE..U+4FFFF, U+5FFFE..U+5FFFF, U+6FFFE..U+6FFFF, U+7FFFE..U+7FFFF, U+8FFFE..U+8FFFF, U+9FFFE..U+9FFFF, U+AFFFE..U+AFFFF, U+BFFFE..U+BFFFF, U+CFFFE..U+CFFFF, U+DFFFE..U+E0000, U+E0002..U+E001F, U+E0080..U+E00FF, U+E01F0..U+E0FFF, U+EFFFE..U+EFFFF, U+FFFFE..U+FFFFF, U+10FFFE..U+10FFFF
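
For implementers, the ranges above can be encoded as a simple lookup. The sketch below transcribes the list as given; it tests only range membership and does not check whether a code point is actually unassigned, which a real implementation would need to do before applying a default value. The names are hypothetical.

```python
# Default BN ranges transcribed from the list above (inclusive bounds).
DEFAULT_BN_RANGES = (
    [(0x2064, 0x2069), (0xFDD0, 0xFDEF), (0xFFFE, 0xFFFF)]
    # U+1FFFE..U+1FFFF through U+CFFFE..U+CFFFF
    + [((plane << 16) | 0xFFFE, (plane << 16) | 0xFFFF) for plane in range(0x1, 0xD)]
    + [
        (0xDFFFE, 0xE0000),
        (0xE0002, 0xE001F),
        (0xE0080, 0xE00FF),
        (0xE01F0, 0xE0FFF),
        (0xEFFFE, 0xEFFFF),
        (0xFFFFE, 0xFFFFF),
        (0x10FFFE, 0x10FFFF),
    ]
)

def in_default_bn_range(cp: int) -> bool:
    """True if cp falls in one of the listed default BN ranges."""
    return any(lo <= cp <= hi for lo, hi in DEFAULT_BN_RANGES)

print(in_default_bn_range(0xFDD0), in_default_bn_range(0xE0001))
```

Note that U+E0001 falls in the gap between U+DFFFE..U+E0000 and U+E0002..U+E001F, so it is not covered by the default.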

2004-March-7 3396 SQUARE ML: The representative glyph on the left below shows the character as it appeared in some versions of previous code charts; the glyph on the right shows the character as it should appear (with lower case 'm'). The representative glyph was inadvertently shown with an upper case 'M' in Unicode 4.0 and Unicode 2.0, but was shown correctly in all other versions.


2004-March-7 In the character code charts for Unicode 4.0 (http://www.unicode.org/charts/PDF/Unicode-4.0/U40-1D300.pdf) the following characters are shown with an incorrect representative glyph: U+1D301, U+1D302, and U+1D303. The incorrect glyphs are shown on the left, the corrected glyphs are shown on the right.



2003-December-01 The following Unified Han Ideographs are shown with incorrect representative glyphs in the online code charts for Unicode 3.1 (http://www.unicode.org/charts/PDF/Unicode-3.1/U31-20000.pdf) and the printed code charts for Unicode 4.0: U+2384F, U+25D0E, U+27CF1, and U+2890F. The representative glyph on the left below shows each character as it appeared in the earlier versions of the code charts; the glyph on the right shows each character as it should appear.





2003-August-25 The annotation for 200B in the Unicode code charts should read:
* This character is intended for line break control. It has no width, but its presence between two characters does not prevent increased letter spacing in justification.

2003-May-23 031A COMBINING LEFT ANGLE ABOVE: The representative glyph on the left below shows the character as it appeared in some versions of previous charts; the glyph on the right shows the character as it should appear (with left angle aligned over right shoulder of base character, not centered). The representative glyph was inadvertently centered in Unicode 3.0, but was shown correctly in earlier versions.