Second-Level Reference Label Generation Rules

ICANN has developed reference second-level Internationalized Domain Name (IDN) tables in a machine-readable format, known as Label Generation Rules (LGRs), that registry operators can consult when designing their own IDN tables. ICANN org will use these reference LGRs when reviewing IDN tables submitted for use with generic top-level domains (gTLDs).

The reference LGRs were developed using guidelines that have been reviewed by the community. They are provided below in XML format, along with a more readable HTML format.
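For readers who want to inspect the XML files programmatically, the following is a minimal sketch. It assumes the files follow the LGR XML format of RFC 7940 (namespace `urn:ietf:params:xml:ns:lgr-1.0`, with `char` and `range` elements under `data`). The embedded sample is hand-written for illustration and is not one of the published files, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written sample in the assumed RFC 7940 layout.
# It is NOT one of the published reference LGRs.
SAMPLE_LGR = """<lgr xmlns="urn:ietf:params:xml:ns:lgr-1.0">
  <meta>
    <version>1</version>
    <language>und-Latn</language>
  </meta>
  <data>
    <char cp="0061"/>
    <char cp="0062"/>
    <range first-cp="0030" last-cp="0039"/>
  </data>
</lgr>
"""

NS = {"lgr": "urn:ietf:params:xml:ns:lgr-1.0"}


def load_repertoire(xml_text):
    """Collect the single code points listed in <char> and <range> elements."""
    root = ET.fromstring(xml_text)
    repertoire = set()
    for char in root.findall("lgr:data/lgr:char", NS):
        cps = char.get("cp", "").split()
        if len(cps) == 1:  # multi-code-point sequences are skipped in this sketch
            repertoire.add(int(cps[0], 16))
    for rng in root.findall("lgr:data/lgr:range", NS):
        first = int(rng.get("first-cp"), 16)
        last = int(rng.get("last-cp"), 16)
        repertoire.update(range(first, last + 1))
    return repertoire


def label_in_repertoire(label, repertoire):
    """True if every character of the label is in the repertoire."""
    return all(ord(ch) in repertoire for ch in label)


repertoire = load_repertoire(SAMPLE_LGR)
print(label_in_repertoire("ab01", repertoire))
print(label_in_repertoire("abc", repertoire))
```

A repertoire check like this covers only code point membership. Real LGRs can also define variants, sequences, and whole-label rules, which this sketch ignores.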

If you have questions or feedback regarding these reference LGRs, please send an email to IDNprogram@icann.org.

Current Version (24 January 2024)

The current version of the Second-Level Reference LGRs was developed in consultation with the respective script communities, following detailed analysis of the Root Zone Label Generation Rules (RZ-LGR) and a Public Comment proceeding.

This version includes:

  • The script-based Reference LGRs including seven new scripts: Armenian, Cyrillic, Greek, Latin, Japanese, Korean, and Myanmar.
  • In some cases, an LGR may be used in concert with other LGRs under a TLD. For such cases, a set of "full-variant" LGRs has been defined that collectively contains the suggested cross-script variants identified to mitigate whole-script homograph labels.
  • A Common LGR, created by merging the data from the script-based LGRs. This file is intended for collision checking, particularly where multiple LGRs are used in the same zone.
  • Language LGRs, derived from a repertoire that is based on the Root Zone LGR but restricted to specific languages.
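
The idea of collision checking across scripts can be illustrated with a small sketch. It assumes a simplified model in which each character maps to one representative of its variant set, so two labels collide when their mapped forms are equal. The variant sets and function names below are hypothetical examples, not data taken from the reference LGRs.

```python
# Hypothetical variant sets for illustration only; real variant data
# would come from the LGR files themselves.
VARIANT_SETS = [
    {"a", "\u0430"},            # Latin a, Cyrillic a
    {"o", "\u043e", "\u03bf"},  # Latin o, Cyrillic o, Greek omicron
]

# Map every member of a variant set to one representative (the lowest code point).
CANONICAL = {ch: min(group) for group in VARIANT_SETS for ch in group}


def collision_key(label):
    """Replace each character with the representative of its variant set."""
    return "".join(CANONICAL.get(ch, ch) for ch in label)


def collides(label_a, label_b):
    """True if two distinct labels reduce to the same key."""
    return label_a != label_b and collision_key(label_a) == collision_key(label_b)


print(collides("oak", "\u043e\u0430k"))  # Latin "oak" vs. a mixed Cyrillic/Latin look-alike
print(collides("oak", "elm"))
```

This one-to-one mapping is a deliberate simplification. It does not model sequence variants, variant types, or context rules that an actual LGR may specify.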

See the Overview and Summary document for further details about these LGRs. The package of all LGRs is available here [ZIP, 7.3 MB]. The changes from the previous version are documented in the change log document.

Script-based LGRs

Name Language Tag (1) LGR Document
Arabic und-Arab HTML, XML
Armenian und-Armn HTML, XML
Bangla (Bengali) und-Beng HTML, XML
Cyrillic und-Cyrl HTML, XML
Devanagari und-Deva HTML, XML
Ethiopic und-Ethi HTML, XML
Georgian und-Geor HTML, XML
Greek und-Grek HTML, XML
Gujarati und-Gujr HTML, XML
Gurmukhi und-Guru HTML, XML
Hebrew und-Hebr HTML, XML
Japanese und-Jpan HTML, XML
Kannada und-Knda HTML, XML
Khmer und-Khmr HTML, XML
Lao und-Laoo HTML, XML
Latin und-Latn HTML, XML
Malayalam und-Mlym HTML, XML
Myanmar und-Mymr HTML, XML
Oriya und-Orya HTML, XML
Sinhala und-Sinh HTML, XML
Tamil und-Taml HTML, XML
Telugu und-Telu HTML, XML
Thai und-Thai HTML, XML

1: The prefix 'und' (Undetermined) identifies linguistic content whose language is not determined. See RFC 5646 for details of the language tag syntax and the IANA Language Subtag Registry for the available language tags.
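
As a rough illustration of how the tags in the table above are structured, the sketch below splits a tag into its language and script subtags. It handles only the simple `language` and `language-Script` forms that appear on this page, not the full RFC 5646 grammar, and the function name is hypothetical.

```python
def split_tag(tag):
    """Split a tag such as 'und-Arab' into (language, script).

    Covers only the 'language' and 'language-Script' forms; extended
    language, region, variant, and private-use subtags are not handled.
    """
    parts = tag.split("-")
    language = parts[0].lower()
    script = None
    # In these simple forms, a four-letter alphabetic second subtag is the script.
    if len(parts) > 1 and len(parts[1]) == 4 and parts[1].isalpha():
        script = parts[1].title()
    return language, script


for tag in ["und-Arab", "bs-Cyrl", "cnr-Cyrl", "zh"]:
    print(tag, split_tag(tag))
```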

Full Variant Set LGRs and Common LGR

Name Language Tag Script Collection LGR Document
Chinese (Full Variant Set) und-Hani Han used in Chinese, Korean, Japanese scripts HTML, XML
Devanagari (Full Variant Set) und-Deva Devanagari, Bengali, and Gurmukhi HTML, XML
Korean (Full Variant Set) und-Kore Hangul and Han used in Chinese and Korean script HTML, XML
Latin (Full Variant Set) und-Latn Armenian, Cyrillic, Greek, Hebrew, and Latin HTML, XML
Myanmar (Full Variant Set) und-Mymr Georgian, Latin, Malayalam, Myanmar, and Oriya HTML, XML
Tamil (Full Variant Set) und-Taml Tamil and Malayalam HTML, XML
Telugu (Full Variant Set) und-Telu Kannada and Telugu HTML, XML
Common LGR Multiple Tags All scripts HTML, XML

Language-based LGRs

Name Language Tag (2) LGR Document
Arabic ar HTML, XML
Belarusian be HTML, XML
Bosnian (Cyrillic) bs-Cyrl HTML, XML
Bosnian (Latin) bs HTML, XML
Bulgarian bg HTML, XML
Chinese zh HTML, XML
Danish da HTML, XML
English en HTML, XML
Finnish fi HTML, XML
French fr HTML, XML
German de HTML, XML
Hebrew he HTML, XML
Hindi hi HTML, XML
Hungarian hu HTML, XML
Icelandic is HTML, XML
Italian it HTML, XML
Japanese (Standalone) ja HTML, XML
Korean (Hangul) ko HTML, XML
Latvian lv HTML, XML
Lithuanian lt HTML, XML
Macedonian mk HTML, XML
Montenegrin cnr-Cyrl HTML, XML
Norwegian no HTML, XML
Polish pl HTML, XML
Portuguese pt HTML, XML
Russian ru HTML, XML
Serbian sr-Cyrl HTML, XML
Spanish es HTML, XML
Swedish sv HTML, XML
Thai th HTML, XML
Ukrainian uk HTML, XML

2: Where the default script is not identified, the script information is included to avoid ambiguity.

Archive

Name Language/Script Additional Information
Arabic Language Version 2, 18 May 2021 (HTML, XML)
Version 1, 13 January 2021 (HTML, XML)
View public comment materials.
Arabic Script Version 2, 1 January 2022 (HTML, XML)
Version 1, 22 April 2021 (HTML, XML)
View public comment materials.
Bangla (Bengali) Script Version 1, 15 December 2020 (HTML, XML)
View public comment materials.
Belarusian Language Version 2, 18 May 2021 (HTML, XML)
Version 1, 19 December 2016 (HTML, XML)
View public comment materials.
Bosnian (Cyrillic) Language Version 2, 18 May 2021 (HTML, XML)
Version 1, 10 October 2016 (HTML, XML)
View public comment materials.
Bosnian (Latin) Language Version 2, 18 May 2021 (HTML, XML)
Version 1, 10 October 2016 (HTML, XML)
View public comment materials.
Bulgarian Language Version 2, 18 May 2021 (HTML, XML)
Version 1, 10 October 2016 (HTML, XML)
View public comment materials.
Chinese Language Version 2, 11 January 2022 (HTML, XML)
Version 1, 15 December 2020 (HTML, XML)
View public comment materials.
Danish Language Version 2, 18 May 2021 (HTML, XML)
Version 1, 10 October 2016 (HTML, XML)
View public comment materials.
Devanagari Script Version 1, 15 December 2020 (HTML, XML)
View public comment materials.
English Language Version 2, 18 May 2021 (HTML, XML)
Version 1, 10 October 2016 (HTML, XML)
View public comment materials.
Ethiopic Script Version 1, 15 December 2020 (HTML, XML)
View public comment materials.
Finnish Language Version 2, 18 May 2021 (HTML, XML)
Version 1, 10 October 2016 (HTML, XML)
View public comment materials.
French Language Version 2, 18 May 2021 (HTML, XML)
Version 1, 10 October 2016 (HTML, XML)
View public comment materials.
Georgian Script Version 1, 15 December 2020 (HTML, XML)
View public comment materials.
German Language Version 2, 18 May 2021 (HTML, XML)
Version 1, 10 October 2016 (HTML, XML)
View public comment materials.
Gujarati Script Version 1, 15 December 2020 (HTML, XML)
View public comment materials.
Gurmukhi Script Version 1, 15 December 2020 (HTML, XML)
View public comment materials.
Hebrew Language Version 2, 18 May 2021 (HTML, XML)
Version 1, 22 April 2021 (HTML, XML)
View public comment materials.
Hebrew Script Version 1, 22 April 2021 (HTML, XML)
View public comment materials.
Hindi Language Version 2, 18 May 2021 (HTML, XML)
Version 1, 15 December 2020 (HTML, XML)
View public comment materials.
Hungarian Language Version 2, 18 May 2021 (HTML, XML)
Version 1, 10 October 2016 (HTML, XML)
View public comment materials.
Icelandic Language Version 2, 18 May 2021 (HTML, XML)
Version 1, 10 October 2016 (HTML, XML)
View public comment materials.
Italian Language Version 2, 18 May 2021 (HTML, XML)
Version 1, 10 October 2016 (HTML, XML)
View public comment materials.
Kannada Script Version 1, 15 December 2020 (HTML, XML)
View public comment materials.
Khmer Script Version 1, 15 December 2020 (HTML, XML)
View public comment materials.
Korean Language Version 2, 18 May 2021 (HTML, XML)
Version 1, 10 October 2016 (HTML, XML)
View public comment materials.
Lao Script Version 2, 22 April 2021 (HTML, XML)
Version 1, 15 December 2020 (HTML, XML)
View public comment materials.
Latvian Language Version 2, 18 May 2021 (HTML, XML)
Version 1, 10 October 2016 (HTML, XML)
View public comment materials.
Lithuanian Language Version 2, 18 May 2021 (HTML, XML)
Version 1, 10 October 2016 (HTML, XML)
View public comment materials.
Macedonian Language Version 2, 18 May 2021 (HTML, XML)
Version 1, 10 October 2016 (HTML, XML)
View public comment materials.
Malayalam Script Version 1, 15 December 2020 (HTML, XML)
View public comment materials.
Montenegrin Language Version 2, 18 May 2021 (HTML, XML)
Version 1, 10 October 2016 (HTML, XML)
View public comment materials.
Norwegian Language Version 2, 18 May 2021 (HTML, XML)
Version 1, 10 October 2016 (HTML, XML)
View public comment materials.
Oriya Script Version 1, 15 December 2020 (HTML, XML)
View public comment materials.
Polish Language Version 2, 18 May 2021 (HTML, XML)
Version 1, 10 October 2016 (HTML, XML)
View public comment materials.
Portuguese Language Version 2, 18 May 2021 (HTML, XML)
Version 1, 10 October 2016 (HTML, XML)
View public comment materials.
Russian Language Version 2, 18 May 2021 (HTML, XML)
Version 1, 10 October 2016 (HTML, XML)
View public comment materials.
Serbian Language Version 2, 18 May 2021 (HTML, XML)
Version 1, 10 October 2016 (HTML, XML)
View public comment materials.
Sinhala Script Version 1, 22 April 2021 (HTML, XML)
View public comment materials.
Spanish Language Version 2, 18 May 2021 (HTML, XML)
Version 1, 10 October 2016 (HTML, XML)
View public comment materials.
Swedish Language Version 2, 18 May 2021 (HTML, XML)
Version 1, 10 October 2016 (HTML, XML)
View public comment materials.
Tamil Script Version 1, 15 December 2020 (HTML, XML)
View public comment materials.
Telugu Script Version 1, 15 December 2020 (HTML, XML)
View public comment materials.
Thai Language Version 2, 18 May 2021 (HTML, XML)
Version 1, 15 December 2020 (HTML, XML)
View public comment materials.
Ukrainian Language Version 2, 18 May 2021 (HTML, XML)
Version 1, 19 December 2016 (HTML, XML)
View public comment materials.
Domain Name System
Internationalized Domain Name (IDN)

IDNs are domain names that include characters used in the local representation of languages that are not written with the twenty-six letters of the basic Latin alphabet "a-z". An IDN can contain Latin letters with diacritical marks, as required by many European languages, or may consist of characters from non-Latin scripts such as Arabic or Chinese. Many languages also use other types of digits than the European "0-9".

The basic Latin alphabet together with the European-Arabic digits are, for the purpose of domain names, termed "ASCII characters" (ASCII = American Standard Code for Information Interchange). These are also included in the broader range of "Unicode characters" that provides the basis for IDNs.

The "hostname rule" requires that all domain names of the type under consideration here are stored in the DNS using only the ASCII characters listed above, with the one further addition of the hyphen "-". The Unicode form of an IDN therefore requires special encoding before it is entered into the DNS.

The following terminology is used when distinguishing between these forms. A domain name consists of a series of "labels" (separated by "dots"). The ASCII form of an IDN label is termed an "A-label". All operations defined in the DNS protocol use A-labels exclusively. The Unicode form, which a user expects to be displayed, is termed a "U-label". The difference may be illustrated with the Hindi word for "test", परीका, appearing here as a U-label would (in the Devanagari script). A special form of "ASCII compatible encoding" (abbreviated ACE) is applied to this to produce the corresponding A-label: xn--11b5bs1di.

A domain name that only includes ASCII letters, digits, and hyphens is termed an "LDH label". Although the definitions of A-labels and LDH labels overlap, a name consisting exclusively of LDH labels, such as "icann.org", is not an IDN.
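
The U-label to A-label conversion described above can be reproduced with a short sketch using the `punycode` codec in Python's standard library. It covers only the encoding step and assumes the "xn--" prefix shown in the example. The normalization and validity checks that a full IDNA implementation performs are omitted, and the function names are hypothetical.

```python
# The Devanagari example string from the text, written as code point escapes.
U_LABEL = "\u092a\u0930\u0940\u0915\u093e"


def to_a_label(u_label):
    """Encode a U-label as an A-label; ASCII-only labels are returned unchanged."""
    if u_label.isascii():
        return u_label
    return "xn--" + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label):
    """Decode an A-label back to its Unicode form; other labels are returned unchanged."""
    if not a_label.lower().startswith("xn--"):
        return a_label
    return a_label[4:].encode("ascii").decode("punycode")


print(to_a_label(U_LABEL))           # xn--11b5bs1di
print(to_u_label("xn--11b5bs1di") == U_LABEL)
```

Production code would normally rely on a complete IDNA library rather than the bare codec, since the codec alone does not reject labels that are invalid under the IDNA rules.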