The Unicode Blog: cldr 38


Thursday, October 29, 2020

ICU 68 Released

Unicode® ICU 68 has just been released. ICU 68 updates to CLDR 38 locale data with many additions and corrections. ICU 68 brings support for locale-dependent smart unit preferences (road distance, temperature, etc.), implements locale ID canonicalization conformant with CLDR, and includes many other bug fixes and enhancements.
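Here is a minimal Python sketch of how locale-dependent unit preferences are looked up. Given a quantity, a usage such as road distance, and a region, it picks a unit and falls back to the world default (region "001"). The table and the `preferred_unit` function are hypothetical and cover only a tiny illustrative subset. They are not ICU's API. The real CLDR unit preference data is richer; for example, preferences can depend on the magnitude of the value.

```python
# Hypothetical, hand-picked subset for illustration only; CLDR's real
# unitPreferenceData is much larger and can vary with the value's magnitude.
UNIT_PREFERENCES = {
    # (quantity, usage) -> {region: preferred unit}; "001" is the world default
    ("length", "road"): {"001": "kilometer", "US": "mile"},
    ("temperature", "default"): {"001": "celsius", "US": "fahrenheit"},
}


def preferred_unit(quantity: str, usage: str, region: str) -> str:
    """Return the preferred unit for a region, else the world ("001") default."""
    by_region = UNIT_PREFERENCES[(quantity, usage)]
    return by_region.get(region, by_region["001"])


print(preferred_unit("length", "road", "US"))          # mile
print(preferred_unit("length", "road", "FR"))          # kilometer
print(preferred_unit("temperature", "default", "US"))  # fahrenheit
```

Keying the fallback on "001" mirrors how CLDR uses the world region as the default. A production implementation would also convert the value and apply magnitude thresholds.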

ICU is a software library widely used by products and other libraries to support the world's languages, implementing both the latest version of the Unicode Standard and of the Unicode locale data (CLDR).

For details, please see http://site.icu-project.org/download/68.

Over 140,000 characters are available for adoption to help the Unicode Consortium’s work on digitally disadvantaged languages

Wednesday, October 28, 2020

Unicode CLDR Language Data v38 released

The final release of Unicode CLDR version 38 is now available. Unicode CLDR provides an update to the key building blocks for software supporting the world’s languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

CLDR v38 focused on enhancing support for existing locales: support for units of measurement in inflected languages (phase 1), annotations (names and search keywords) for many more non-emoji symbols (~400), and annotations for Emoji v13.1. This version also substantially increases coverage for (in order of completeness): Norwegian Nynorsk, Hausa, Igbo, Breton, Quechua, Yoruba, Fulah (Adlam script), Chakma, Asturian, Sanskrit, and Dogri.

The Survey Tool has improved performance and now offers structured forum requests to improve coordination among translators. We would like to thank the 393 language experts who contributed to this release.

There are some changes that affect existing specifications and data. For example, the plural rules for French changed to add a new category, and the specification for using aliases is more rigorous. Some alias data has also changed, as has the specification for handling locale identifier canonicalization. For more information, see Migration.
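As a rough illustration of what canonicalization involves, the Python sketch below applies two of its steps to simple identifiers. It normalizes the case of each subtag and replaces a few deprecated codes with their current equivalents (for instance, iw → he and DD → DE). The `canonicalize` function and its alias tables are hypothetical and deliberately tiny. The CLDR algorithm handles many more alias types, as well as extensions, and this sketch does not attempt either.

```python
# Hypothetical sketch of two canonicalization steps: case normalization and
# alias replacement. The alias tables are a tiny subset of CLDR's data.
LANGUAGE_ALIASES = {"iw": "he", "in": "id", "ji": "yi"}
REGION_ALIASES = {"BU": "MM", "DD": "DE", "ZR": "CD"}


def canonicalize(locale_id: str) -> str:
    """Canonicalize language[-script][-region][-variant]; no extensions."""
    subtags = locale_id.replace("_", "-").split("-")
    language = subtags[0].lower()
    result = [LANGUAGE_ALIASES.get(language, language)]
    for subtag in subtags[1:]:
        if len(subtag) == 4 and subtag.isalpha():
            result.append(subtag.title())             # script: Titlecase
        elif len(subtag) == 2 and subtag.isalpha():
            region = subtag.upper()                   # region: UPPERCASE
            result.append(REGION_ALIASES.get(region, region))
        else:
            result.append(subtag.lower())             # numeric region, variant
    return "-".join(result)


print(canonicalize("iw_il"))       # he-IL
print(canonicalize("EN-latn-us"))  # en-Latn-US
print(canonicalize("de-DD"))       # de-DE
```

Because the subtag roles are inferred only from length, this approach breaks down once extensions such as `-u-…` appear. That kind of ambiguity is one reason a precise specification matters.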

The overall changes to the data items were:

Added	Deleted	Changed
155,131	33,805	45,895

See additional details in the CLDR v38 Release note.


Friday, October 9, 2020

Unicode CLDR Locale Data v38 beta available for testing

The beta version of Unicode CLDR version 38 is now available. The data will not be changed except for showstoppers, but the LDML v38 spec can still be changed. The final release of v38 is planned for October 28, 2020. If you find any problems, please file a ticket.

Unicode CLDR provides an update to the key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

CLDR v38 includes:

Enhancements to existing locale data: adding support for units of measurement in inflected languages (phase 1), adding annotations (names and search keywords) for Unicode symbols that are non-emoji (~400), and annotations for Emoji v13.1.
Survey Tool upgrades: substantial performance improvements, plus structured forum entries to improve coordination among translators.

LDML v38 includes:

To make the canonicalization of locale identifiers clear and unambiguous, the specification for it was substantially restructured. (This was done in concert with fixes to the alias data so that it works better with the specification.)
To support inflected units of measurement:
- minimalPairs adds new elements caseMinimalPairs and genderMinimalPairs
- unit adds a new element gender
- grammaticalData adds new elements grammaticalDerivations, deriveCompound, and deriveComponent
- unitPattern adds a new attribute case
- grammaticalCase, grammaticalGender, and grammaticalDefiniteness add a new attribute scope
- compoundUnitPattern1 adds new attributes case and gender
- compoundUnitPattern adds a new attribute case
To allow for overriding dictionary-based segmentation breaks, added the Unicode Dictionary Break Exclusion Identifier, with the new key “dx”.
For picking the correct units of measurement for locales, defined the userPreferences skeleton more precisely.
For accurate plural categories in compact numbers, added the 'c' operand to plural rules to provide formatting for languages such as French.
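The new 'c' operand records the exponent used in compact notation. This lets a rule treat a compact number such as "1,2 M" differently from other values. The Python sketch below evaluates French plural categories from the operands i, v, and c, under two assumptions. First, the French rules in CLDR 38 are taken to be one: i = 0,1; many: c = 0 and i != 0 and i % 1000000 = 0 and v = 0, or c != 0..5; other: everything else. Second, i and v describe the full value while c holds the compact exponent. Check the rule text against the CLDR plurals data before relying on it. `french_plural_category` is a hypothetical helper, not an ICU or CLDR API.

```python
# Hypothetical helper; the French rules encoded here are an assumption:
#   one:   i = 0,1
#   many:  c = 0 and i != 0 and i % 1000000 = 0 and v = 0  or  c != 0..5
#   other: everything else
def french_plural_category(i: int, v: int, c: int) -> str:
    """Return the French plural category from CLDR-style operands.

    i: integer digits of the full numeric value
    v: number of visible fraction digits
    c: compact decimal exponent (0 when compact notation is not used)
    """
    if i in (0, 1):
        return "one"
    is_exact_million_multiple = c == 0 and i != 0 and i % 1_000_000 == 0 and v == 0
    if is_exact_million_multiple or not (0 <= c <= 5):
        return "many"
    return "other"


print(french_plural_category(1, 0, 0))          # one
print(french_plural_category(1_000_000, 0, 0))  # many  (plain 1 000 000)
print(french_plural_category(1_200_000, 0, 6))  # many  ("1,2 M", compact)
print(french_plural_category(1_200, 0, 3))      # other ("1,2 k", compact)
```

Rules are checked in order and the first match wins. Because i carries the full value, "1,2 M" does not fall into "one" even though its displayed digits start with 1.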

See additional details in the draft CLDR v38 Release note.

The overall changes to the data items were:

Added	Deleted	Changed	Total
155,131	33,805	45,895	2,175,821


Tuesday, September 15, 2020

Unicode CLDR Locale Data v38 alpha available for testing

The alpha version of Unicode CLDR version 38 is now available for data testing. The final release of v38 is planned for October 22, 2020. If you find any problems with the data, please file a ticket.

Unicode CLDR provides an update to the key building blocks for software supporting the world's languages. CLDR data is used by all major software systems (including all mobile phones) for their software internationalization and localization, adapting software to the conventions of different languages.

CLDR v38 includes:

Enhancements to existing locale data: adding support for units of measurement in inflected languages (phase 1), adding annotations (names and search keywords) for Unicode symbols that are non-emoji (~400), and annotations for Emoji v13.1.
New locales added: Dogri and Sanskrit.
Survey Tool upgrades: substantial performance improvements, plus structured forum entries to improve coordination among translators.

See additional details in the draft CLDR v38 Release note.

The overall changes to the data items were:

Added	Deleted	Changed
155,131	33,805	45,895
