The Unicode Blog: Arabic

Tuesday, September 14, 2021

Announcing The Unicode® Standard, Version 14.0

Version 14.0 of the Unicode Standard is now available, including the core specification, annexes, and data files. This version adds 838 characters, for a total of 144,697 characters. These additions include five new scripts, for a total of 159 scripts, as well as 37 new emoji characters.
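As a quick sanity check, the totals quoted here line up with the Version 13.0 figures reported further down this page. The snippet below is only illustrative arithmetic on numbers taken from the two announcements; the variable names are made up for the example.

```python
# Figures quoted in the Version 13.0 and 14.0 announcements on this page.
V13_CHARACTERS = 143_859
V13_SCRIPTS = 154
V14_NEW_CHARACTERS = 838
V14_NEW_SCRIPTS = 5

# Adding the Version 14.0 additions to the Version 13.0 totals.
v14_characters = V13_CHARACTERS + V14_NEW_CHARACTERS
v14_scripts = V13_SCRIPTS + V14_NEW_SCRIPTS

print(v14_characters, v14_scripts)  # 144697 159
```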

The new scripts and characters in Version 14.0 add support for modern language groups in Bosnia, India, Indonesia, Iran, Java, Malaysia, Mongolia, Myanmar, Pakistan, and the Philippines, plus other languages in Africa and North America, including:

Arabic script additions that include honorifics and additions for Quranic use, and characters used to write languages across Africa, the Balkans, and South and Southeast Asia
The Vithkuqi script historically used to write Albanian and currently undergoing a modern revival
The Tangsa script used to write the Tangsa language, spoken in India and Myanmar
The Toto script used to write the Toto language in northeast India
Many Latin script additions for extended IPA

Popular symbol additions include:

37 emoji characters, including new emoji for smileys, hand gestures, animals and nature, food and drink, transport, and activities. For the full list of new emoji characters, see emoji additions for Unicode 14.0, and Emoji Counts. For a detailed description of support for emoji characters by the Unicode Standard, see UTS #51, Unicode Emoji.

Other symbol and notational additions include:

The som currency sign used in the Kyrgyz Republic
Znamenny musical notation developed in Russia

Support for other modern languages and scholarly work extends worldwide, including:

Cypro-Minoan, historically used primarily on the island of Cyprus
Old Uyghur, historically used in Central Asia and elsewhere to write Turkic, Chinese, Mongolian, Tibetan, and Arabic languages
Ahom, Balinese, Brahmi, Canadian aboriginal languages, Glagolitic, Kaithi, Kannada, Mongolian, Tagalog, Takri, and Telugu
Arabic support for Hausa, Wolof, Hindko, and Punjabi, and Ethiopic support for Gurage

Important chart font updates, including:

Significant updates to the CJK auxiliary blocks and enclosed alphanumerics

Unicode properties and specifications determine the behavior of text on computers and phones. Changes in Version 14.0 include the following Unicode Standard Annexes and Technical Standards that have notable modifications:

Five important Unicode annexes updated for Version 14.0:

Three important Unicode specifications updated for Version 14.0:

UTS #10, Unicode Collation Algorithm — sorting Unicode text
UTS #39, Unicode Security Mechanisms — reducing Unicode spoofing
UTS #46, Unicode IDNA Compatibility Processing — compatible processing of non-ASCII URLs

The Unicode Standard is the foundation for all modern software and communications around the world, including operating systems, browsers, laptops, and smart phones—plus the Internet and Web (URLs, HTML, XML, CSS, JSON, etc.). The Unicode Standard, its associated standards, and data form the foundation for CLDR and ICU releases.

Over 144,000 characters are available for adoption, to help the Unicode Consortium’s work on digitally disadvantaged languages.

Tuesday, March 10, 2020

Announcing The Unicode® Standard, Version 13.0

Version 13.0 of the Unicode Standard is now available, including the core specification, annexes, and data files. This version adds 5,390 characters, for a total of 143,859 characters. These additions include four new scripts, for a total of 154 scripts, as well as 55 new emoji characters.

The new scripts and characters in Version 13.0 add support for modern language groups in Africa, Pakistan, South Asia, and China:

Arabic script additions to write Hausa, Wolof, and other languages in Africa, and other additions to write Hindko and Punjabi in Pakistan
A character for Syloti Nagri in South Asia
Bopomofo additions for Cantonese

Support for scholarly work was extended worldwide, including:

Yezidi, historically used in Iraq and Georgia for liturgical purposes, with some modern revival of usage
Chorasmian, historically used in Central Asia across Uzbekistan, Kazakhstan, and Turkmenistan to write an extinct Eastern Iranian language
Dives Akuru, historically used in the Maldives until the 20th century
Khitan Small Script, historically used in northern China

Popular symbol additions include:

55 emoji characters, including several new emoji for smileys, gender neutral people, animals, and the potted plant. For the full list of new emoji characters, see emoji additions for Unicode 13.0, and Emoji Counts. For a detailed description of support for emoji characters by the Unicode Standard, see UTS #51, Unicode Emoji.
Six Creative Commons license symbols that are used to describe functions, permissions, and concepts related to intellectual property that have widespread use on the web
Two Vietnamese reading marks that mark ideographs as having a distinct, colloquial reading
214 graphic characters that provide compatibility with various home computers from the mid-1970s to the mid-1980s and with early teletext broadcasting standards

Support for Chinese, Japanese, and Korean (CJK) unified ideographs was enhanced in Version 13.0 by the addition of 4,939 characters in Extension G, which is the first block to be encoded in Plane 3, as well as by significant corrections and improvements to the Unihan database. Changes to Unihan include updated regular expressions for many properties, the addition of several new properties, and the removal of three obsolete provisional properties. See UAX #38, Unicode Han Database (Unihan) for more information on the updates.
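To make the "Plane 3" remark concrete, here is a small sketch of how a plane number falls out of a code point. The Extension G range used below (U+30000 to U+3134A) is an assumption taken from the published code charts rather than from this announcement, and `plane` is a helper name invented for the example.

```python
# Assumption: assigned range of CJK Unified Ideographs Extension G,
# taken from the code charts, not from the announcement text.
EXT_G_FIRST = 0x30000
EXT_G_LAST = 0x3134A

def plane(code_point: int) -> int:
    """Return the plane number of a code point (each plane spans 0x10000 code points)."""
    return code_point >> 16

# Number of code points in the assumed range, inclusive of both ends.
ext_g_count = EXT_G_LAST - EXT_G_FIRST + 1

print(plane(EXT_G_FIRST), plane(EXT_G_LAST), ext_g_count)  # 3 3 4939
```

Under that assumed range, the count agrees with the 4,939 characters cited above.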

Important chart font updates, including:

An update to the code charts for the Adlam script, now using the Ebrima font. That font has an improved design and has gained widespread acceptance in the user community.
A completely updated font for the CJK Radicals Supplement and the Kangxi Radicals blocks. This font is also used to show the radicals in the CJK unified ideographs code charts, as well as in the radical-stroke indexes.

Additional support for lesser-used languages and scholarly work was extended, including:

A character used in Sinhala to write Sanskrit

Unicode properties and specifications determine the behavior of text on computers and phones. Changes in Version 13.0 include the following Unicode Standard Annexes and Technical Standards that have notable modifications:

Five important Unicode annexes updated for Version 13.0:

Three important Unicode specifications updated for Version 13.0:

UTS #10, Unicode Collation Algorithm — sorting Unicode text
UTS #39, Unicode Security Mechanisms — reducing Unicode spoofing
UTS #46, Unicode IDNA Compatibility Processing — compatible processing of non-ASCII URLs

Over 140,000 characters are available for adoption, to help the Unicode Consortium’s work on digitally disadvantaged languages.

Tuesday, October 9, 2018

Unicode Arabic Mark Rendering UTR #53 Now Published

The combining classes of Arabic combining characters in Unicode differ from the combining classes in most other scripts. They are a mixture of special classes for specific marks plus two more generalized classes for all the other marks. This has resulted in inconsistent or incorrect rendering for sequences with multiple combining marks since Unicode 2.0.

The Arabic Mark Transient Reordering Algorithm (AMTRA) described in UTR #53 is the recommended solution for achieving correct and consistent rendering of Arabic combining mark sequences. This algorithm provides results that match user expectations and ensures that canonically equivalent sequences are rendered identically, independent of the order of the combining marks.
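The following sketch does not implement AMTRA. It only uses Python's standard `unicodedata` module to show the two ingredients the paragraphs above refer to: the canonical combining classes of a few Arabic marks, and the fact that two different input orders can be canonically equivalent. The selection of marks and the variable names are choices made for this illustration.

```python
import unicodedata

# A handful of Arabic combining marks (an illustrative selection).
MARKS = {
    "FATHATAN": "\u064B",
    "FATHA": "\u064E",
    "KASRA": "\u0650",
    "SHADDA": "\u0651",
    "SUKUN": "\u0652",
    "MADDAH ABOVE": "\u0653",
    "HAMZA ABOVE": "\u0654",
    "HAMZA BELOW": "\u0655",
}

# Print each mark's canonical combining class (ccc).
for label, ch in MARKS.items():
    print(f"U+{ord(ch):04X} {label}: ccc={unicodedata.combining(ch)}")

# Two typing orders of the same marks on the letter BEH.
BEH = "\u0628"
shadda_first = BEH + "\u0651" + "\u064E"
fatha_first = BEH + "\u064E" + "\u0651"

# Normalization reorders marks by combining class, so both strings
# normalize to the same sequence: they are canonically equivalent.
same = (unicodedata.normalize("NFC", shadda_first)
        == unicodedata.normalize("NFC", fatha_first))
print(same)
```

Because the two strings are canonically equivalent, a renderer should display them identically, which is the property the report sets out to guarantee.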

The concepts in this algorithm were first proposed four years ago by Roozbeh Pournader. We are pleased it has now been published as an official Technical Report.

Adopt-a-Character

Over 130,000 characters are available for adoption, to help the Unicode Consortium’s work on digitally disadvantaged languages.

Monday, September 25, 2017

Proposed Draft UTR #53, Unicode Arabic Mark Ordering Algorithm Now Available for Public Review

The Unicode Consortium has released Proposed Draft Unicode Technical Report #53, Unicode Arabic Mark Ordering Algorithm. This UTR describes an algorithm for determining correct rendering of Arabic combining mark sequences.

The combining classes of Arabic combining characters in Unicode are a mixture of special classes for specific marks plus two more generalized classes for all the other marks. For many years this has resulted in inconsistent rendering for sequences with multiple combining marks.

The algorithm described in this UTR provides a method to reorder Arabic combining marks in order to accomplish the following goals:

Display combining marks in the expected visual order under the inside-out rendering rule.
Ensure identical display of canonically equivalent sequences.
Provide a mechanism for overriding the display order in exceptional cases.
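For background on the second goal, the sketch below shows standard canonical ordering, not the reordering proposed in this UTR: each run of combining marks is stable-sorted by combining class. The function name `canonical_order` is invented for the example, and the code assumes the input contains no characters with canonical decompositions, so that the result can be compared directly with NFD.

```python
import unicodedata

def canonical_order(text: str) -> str:
    """Stable-sort every run of combining marks (ccc != 0) by combining class."""
    out, run = [], []
    for ch in text:
        if unicodedata.combining(ch) == 0:
            # A starter ends the current run of marks; flush it in sorted order.
            out.extend(sorted(run, key=unicodedata.combining))
            run = []
            out.append(ch)
        else:
            run.append(ch)
    out.extend(sorted(run, key=unicodedata.combining))
    return "".join(out)

# BEH + SHADDA (33) + FATHA (30) + HAMZA ABOVE (230) + MADDAH ABOVE (230)
sample = "\u0628\u0651\u064E\u0654\u0653"
ordered = canonical_order(sample)
print([f"U+{ord(c):04X}" for c in ordered])

# Marks that share a class keep their typed order, so swapping them
# produces a sequence that is NOT canonically equivalent.
hamza_then_maddah = canonical_order("\u0628\u0654\u0653")
maddah_then_hamza = canonical_order("\u0628\u0653\u0654")
print(hamza_then_maddah == maddah_then_hamza)
```

The last comparison illustrates why the generalized classes matter: marks in the same class are never reordered by normalization, so their stored order is what a renderer has to work with.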

The document is in “Proposed Draft” state and is available for public review and comment. Information about this type of document can be found on the About Unicode Technical Reports page.

For information about how to discuss this Public Review Issue and how to supply formal feedback, please see the PRI #359 page.

Thursday, January 14, 2016

Proposed Update UAX #9, Unicode Bidirectional Algorithm

A new proposed update of UAX #9, Unicode Bidirectional Algorithm, for the Unicode 9.0 release is now available for public review and comment.

The table in Section 2.7, Markup and Formatting, has been updated to reflect changes to isolates in HTML5 and CSS.
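For readers unfamiliar with isolates, here is a minimal plain-text sketch using the isolate formatting characters (LRI, RLI, FSI, PDI). HTML and CSS offer markup-level counterparts such as the `bdi` element and `unicode-bidi: isolate`; the exact mapping is what the table in Section 2.7 documents and is not reproduced here. The helper name `isolate` and the sample sentence are invented for the example.

```python
import unicodedata

# Bidirectional isolate formatting characters.
LRI, RLI, FSI, PDI = "\u2066", "\u2067", "\u2068", "\u2069"

def isolate(text: str) -> str:
    """Wrap text in FSI ... PDI so its direction is resolved from its own
    content and it does not influence the ordering of surrounding text."""
    return f"{FSI}{text}{PDI}"

# Embedding an Arabic word inside an English sentence.
message = "User " + isolate("\u0645\u0631\u062D\u0628\u0627") + " posted 3 comments"

# Show the bidi class the Unicode Character Database assigns to each control.
for ch in (LRI, RLI, FSI, PDI):
    print(f"U+{ord(ch):04X} {unicodedata.bidirectional(ch)}")
```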

For further information and instructions on how to leave feedback, please see Public Review Issue #315.

Wednesday, November 25, 2015

New Character Property for Prepended Concatenation Marks

The Unicode Technical Committee is seeking feedback on a proposal to define a new character property for the class of prepended concatenation marks, also referred to as prefixed format control characters or, more generically, as subtending marks. Characters in that class include U+0600 ARABIC NUMBER SIGN and U+06DD ARABIC END OF AYAH. The new property, named Prepended_Concatenation_Mark and targeted for Unicode 9.0, would provide a mechanism to handle subtending marks collectively via properties rather than by hardcoded enumeration. A detailed description of the issue and how to provide feedback are given in Public Review Issue #310.
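The difference between a hardcoded enumeration and a property lookup can be sketched as follows. The sample data is a hypothetical two-line excerpt written in the style of a Unicode Character Database property file and lists only the two characters named above; `load_property` and `is_prepended_concatenation_mark` are names invented for this illustration.

```python
# Hypothetical excerpt in UCD property-file style; not the published data.
SAMPLE_DATA = """\
# code point(s) ; property name # comment
0600 ; Prepended_Concatenation_Mark # ARABIC NUMBER SIGN
06DD ; Prepended_Concatenation_Mark # ARABIC END OF AYAH
"""

def load_property(data, prop):
    """Collect the code points that a property-file-style text assigns to prop."""
    members = set()
    for raw in data.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        codes, name = (field.strip() for field in line.split(";"))
        if name != prop:
            continue
        # Entries are either a single code point or a "first..last" range.
        first, _, last = codes.partition("..")
        members.update(range(int(first, 16), int(last or first, 16) + 1))
    return members

PCM = load_property(SAMPLE_DATA, "Prepended_Concatenation_Mark")

def is_prepended_concatenation_mark(ch):
    """Property-driven test: no code points are hardcoded in the logic itself."""
    return ord(ch) in PCM

print(is_prepended_concatenation_mark("\u0600"))
print(is_prepended_concatenation_mark("\u0628"))
```

With this design, supporting a newly encoded subtending mark means updating the data, not the code that consumes it.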
