Errata fixed in Unicode 5.1.0

This page contains the definitive listing of all errata of record since the publication of The Unicode Standard, Version 5.0 and considered resolved by the release of Unicode Version 5.1. These errata are listed by date in the table below. For prior errata resolved in Unicode 5.0 and earlier, see Errata Fixed in Unicode 5.0.

For errata still pending subsequent to the release of Unicode 5.1.0, see the list of current Updates and Errata.

Date Summary
2008-March-07 The representative glyph for U+1D81 in the Unicode 5.0 chart has an extraneous line running from the lower right to upper left side of the glyph. It is most visible at high resolutions. The incorrect glyph is shown on the left, and a corrected glyph on the right.

2007-November-20 The representative glyph for U+1E9A LATIN SMALL LETTER A WITH RIGHT HALF RING in Unicode 2.0 has the ring well to the right. The representative glyph in Unicode 3.0 and later incorrectly had the right half ring over the base letter. Below are shown the incorrect glyph on the left, and the corrected glyph on the right:

2007-August-23 In the code charts for Unicode Version 5.0, the glyphs for U+0333 and U+0347 are incorrect. The glyph for U+0333 should be longer. The glyph for U+0347 should be shorter. The glyphs were not merely swapped: the correct glyph for U+0333 should be longer than the incorrect glyph for U+0347. Below are shown incorrect glyphs on the left and corrected glyphs on the right:

2007-July-30 In the code charts for Unicode Versions 5.0 and earlier, the representative glyphs for U+0460 and U+047E are shown with "broad omega" shaped glyphs. These are being corrected to show "W"-shaped glyphs for the uppercase letters, matching the shapes of their lowercase counterparts. The incorrect glyphs are shown on the left; the corrected glyphs are shown on the right.

2007-June-07 In the 5.0 code charts, the names for U+075E and U+075F are correct, but the glyphs should be swapped.

2007-April-19 In the code charts for Unicode Versions 5.0 and earlier, the representative glyphs for U+0478 and U+0479 are shown in an Old Church Slavonic (OCS) style typeface. The decision to encode a monograph uk character for OCS has made that style choice inappropriate for these characters. The incorrect glyphs are shown on the left; the corrected glyphs are shown on the right.

2007-April-12 In the code charts for Unicode Versions 5.0 and earlier, the lower bar on the glyph for U+2626 ORTHODOX CROSS is slanted downward in the wrong direction. The incorrect glyph is shown on the left; the corrected glyph is shown on the right.

2007-March-14 In UAX #15, Unicode Normalization Forms, for Unicode 5.0, there is an erroneous statement in the last paragraph of Section 14, Detecting Normalization Forms. The text currently states:
"...that no string when decomposed with NFD expands to more than 3x in length (measured in code units)."
That text should be corrected to state:
"...that no string when normalized to NFC expands to more than 3x in length (measured in code units)."
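The 3x bound in the corrected statement can be spot-checked with a short script. This is only a minimal sketch: it assumes Python's standard unicodedata module (whose character data is from a later Unicode version than 5.0) and measures length in UTF-16 code units. The helper names utf16_units and nfc_expansion are hypothetical and are not taken from UAX #15.

```python
import unicodedata


def utf16_units(text: str) -> int:
    """Return the number of UTF-16 code units needed to encode text."""
    return len(text.encode("utf-16-le")) // 2


def nfc_expansion(text: str) -> float:
    """Return the UTF-16 length after NFC divided by the length before.

    An empty string is reported as 1.0 (no expansion) to avoid dividing by zero.
    """
    before = utf16_units(text)
    if before == 0:
        return 1.0
    after = utf16_units(unicodedata.normalize("NFC", text))
    return after / before


# U+FB2C is excluded from composition, so NFC leaves it in decomposed form.
sample = "\uFB2C"
print(f"U+FB2C expands by a factor of {nfc_expansion(sample)}")
```

Scanning every code point with nfc_expansion and checking that the ratio never exceeds 3 is a cheap sanity test, although it exercises single characters only, not arbitrary strings.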

2007-February-14 In the code charts for Unicode Versions 5.0 and earlier, the representative glyphs for U+047C and U+047D represent an incorrect understanding of the nature of the character that was encoded ("beautiful omega"). The incorrect glyphs are shown on the left; the corrected glyphs are shown on the right.

2007-February-02 The sample code in Section 7 of UAX #14 does not handle leading spaces correctly. Adding the following code before the loop provides a fix:

    // treat SP at start of input as if it followed WJ
    if (cls == SP)
        cls = WJ;

2007-January-25 In the file DerivedCoreProperties.txt in the Version 5.0 Unicode Character Database, the stated rule in the comments for the generation of the Default_Ignorable_Code_Point property is incomplete. The rule should include all characters with the Variation_Selector property, so that the complete statement of the rule is:

Other_Default_Ignorable_Code_Point + Cf + Cc + Cs
+ Noncharacter_Code_Point + Variation_Selector - White_Space
- FFF9..FFFB (Annotation Characters)

The actual listing of characters in the data file with the Default_Ignorable_Code_Point property is correct.
Note that the stated rule was further updated for Version 5.1 of the standard, so the correction in this erratum notice applies only to the Version 5.0 data file.
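The corrected Version 5.0 rule reads naturally as set arithmetic, where "+" is union and "-" is difference. The sketch below is illustrative only: the function name and parameters are hypothetical, the input sets would have to be loaded from the Unicode Character Database, and the sample values are a tiny hand-picked toy subset rather than real property listings.

```python
def default_ignorable_5_0(other_di, cf, cc, cs, nonchar, variation_selector, white_space):
    """Apply the corrected 5.0 rule to sets of code points (integers)."""
    annotation_characters = set(range(0xFFF9, 0xFFFB + 1))  # FFF9..FFFB
    included = other_di | cf | cc | cs | nonchar | variation_selector
    return included - white_space - annotation_characters


# Toy inputs (assumption: a few representative code points per property).
result = default_ignorable_5_0(
    other_di={0x034F},            # COMBINING GRAPHEME JOINER
    cf={0x00AD, 0xFFF9},          # SOFT HYPHEN, INTERLINEAR ANNOTATION ANCHOR
    cc={0x0000, 0x0009},          # NULL, CHARACTER TABULATION
    cs={0xD800},                  # first surrogate code point
    nonchar={0xFFFE},
    variation_selector={0xFE00},  # VARIATION SELECTOR-1
    white_space={0x0009, 0x0020},
)
print(sorted(f"U+{cp:04X}" for cp in result))
```

In this toy run U+0009 drops out because it is White_Space, and U+FFF9 drops out as an annotation character, while U+FE00 is retained only because the Variation_Selector term is part of the rule.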

2007-January-22 The code point U+00A0 was supposed to have the Sentence_Break property value Sp in the Unicode Character Database for Version 5.0, but that change was overlooked in the updating of SentenceBreakProperty.txt. This will be corrected in a subsequent version of the standard.

2006-September-11 In the code charts for Unicode Version 5.0, the representative glyph for U+1031 was incorrectly imaged on the wrong side of the dotted circle. The incorrect glyph is shown on the left; the corrected glyph is shown on the right.

2006-September-10 The Index.txt file in version 5.0.0 of the Unicode Character Database is not valid UTF-8. The following substitutions will fix the file:

Replace byte 0x92 in line 74 by U+00FC [ü] LATIN SMALL LETTER U WITH DIAERESIS. Replace byte 0xe1 in lines 854 and 1549 by a space.
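The substitutions above could be applied programmatically along the following lines. This is a sketch under stated assumptions, not an official repair tool: it assumes line numbers are 1-based, that lines are separated by LF, and that every occurrence of the offending byte on a named line should be replaced. The function name repair_index is hypothetical.

```python
def repair_index(data: bytes) -> bytes:
    """Apply the three byte substitutions to the raw contents of Index.txt."""
    lines = data.split(b"\n")
    # Line 74: byte 0x92 becomes U+00FC, encoded as UTF-8.
    lines[74 - 1] = lines[74 - 1].replace(b"\x92", "\u00fc".encode("utf-8"))
    # Lines 854 and 1549: byte 0xE1 becomes a space.
    for line_number in (854, 1549):
        lines[line_number - 1] = lines[line_number - 1].replace(b"\xe1", b" ")
    return b"\n".join(lines)
```

A caller would read the file in binary mode, pass the bytes through repair_index, and then call .decode("utf-8") on the result; a successful decode confirms that the repaired data is valid UTF-8.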