On this page I describe the early history of character set standardization. I stop approximately where Roman Czyborra starts; for further developments I refer you to his pages. Five groups were important in the early process (Telegraph Unions, US/International, Soviet, Japanese and IBM); later more standards emerged:

CCITT

CCITT (in full: Comité Consultatif International Téléphonique et Télégraphique, or the International Telegraph and Telephone Consultative Committee) was the first body to standardize character sets. In total it standardized five; the links below refer to the pages giving them:

ASA/USASI/ANSI + ISO

The following are given in this section:

ASCII-1963


The first standardized version of ASCII is shown here. It was adopted as a standard in 1963 by the ASA (American Standards Association), as it was called at that time. The last two rows were not yet decided on, but the last unfilled position (between ACK and ESC) was reserved for an additional control symbol.

ASCII-1965


The second form of the ASCII standard, from 1965. The controls (except DEL) were removed from the last row, some controls were renamed and the last two rows were filled. The arrows and backslash disappeared and made way for other graphics. Also @ was moved. There is a tale here: although the standard was approved, it was never published, so it was also never used! The reason was that ASA heard that ISO (the International Organization for Standardization) would standardize a character set similar to, but slightly conflicting with, this standard. In the meantime (1966) the name of the US standards organization was changed to USASI (United States of America Standards Institute), and the code was often referred to as USASCII. (In 1969 the name was changed to ANSI (American National Standards Institute), but although the acronym ANSCII was proposed it was never used, and the name USASCII fell into disuse.)

USASCII-8


In the meantime an 8-bit version of ASCII-1963 was floating around. There were no differences in the symbols supplied (except that Format Effector 0 was now BackSpace), but the coding was different. (The rows I did not give in the above chart were invalid codes.) I do not know the reason for this particular mapping onto an 8-bit character set, but it was not long-lived.

ASCII-1967


The final version of this standard, from 1967. Note that the backtick and @ have swapped places with respect to the 1965 version. There are also some slight naming differences in the controls, and some symbols were changed. Most notable is the disappearance of the "logical NOT" and the "logical OR". (I may remark here that the use of the vertical bar for "logical OR" is not general European usage; there some form of lower-case v was used, as can be seen on many European code charts presented on these pages.) The vertical bar was replaced by a broken bar. This resulted in much controversy in the US, and it was decided that the exclamation mark could have an alternate representation as vertical bar. This distinction has died out since that time, and in many places the "broken bar" is actually rendered as a vertical line. This standard was also adopted in 1967 by ISO as the international reference version. More information about that can be found in Roman Czyborra's pages.

ISO 6 bit version


While ASA/USASI/ANSI never sanctioned 6 bit subsetting, ISO did, and even standardized it. Here we see such a subset, with the pound symbol replacing a symbol that was marked as possibly for national use.

ISO 6 bit alternate version


Worse, ISO even had two versions of the 6 bit subset with major symbols replaced; here we see the alternative. It did not take ISO many years to throw the 6 bit subsets out of the standard.

GOST

Here I look at three standards, two of which also have two variants:

GOST 10859


At just about the same time that ASA came up with a US standard for character coding, GOST, the standardization institute of the Soviet Union, also came up with a standard. It dates from 1964, and it encodes the Cyrillic upper case letters. In addition it encodes those Latin letters that have no similar-looking Cyrillic counterpart. The special graphics encoded clearly show the intention: programming in Algol 60! The standard allowed a 4 bit subset (only the first row), a 5 bit subset (the first two rows) and a 6 bit subset (the first four rows). In the latter case the encoding for D would have the meaning of DEL.

GOST 10859 Latin


In addition to the Cyrillic encoding given, the standard also allowed a full 6 bit Latin encoding. Here too there is a clear determination to allow programming in Algol 60.

GOST 13052


Later (when?) GOST came up with a newer standard. This was less targeted at programming and encoded a nearly full upper and lower case version of the Cyrillic script. The first four rows were identical to those in ASCII, except for the dollar sign. You can find more about this in Roman Czyborra's pages. Interestingly, the chart he has for this standard differs in one point from mine (spot the difference!). The symbol encoded in Roman Czyborra's chart but not here is actually a symbol that was also not present in the Cyrillic 6 bit subset of GOST 10859. I think his chart comes from a later date.

GOST 19768/74


In 1974 a new coding was introduced. This one encoded the full script in the upper part of the chart. Some positions (the green ones) are left open by the standard, but ECMA filled them in its standard number 111. The name of the coding is DKOI.


A smaller set was also available, with uppercase letters only. In addition the standard provided for an EBCDIC version; see the EBCDIC section for that. There is also a punched card code for this version; see the section on punched cards.

GOST 19768/87


In 1987 the standard was revised, resulting in a complete rearrangement. Again, the green positions are not filled in by the standard, but were filled in by ECMA in its standard number 113. This became the base of ISO 8859-5. Here again an EBCDIC version was defined. See the EBCDIC section.

JISCII


In 1969 Japan incorporated the ASCII standard into its own standard, called JISCII (Japanese Industrial Standard Code for Information Interchange). Later control symbols were introduced in the first two empty rows, and the designation changed to JIS X 0201. The Japanese script does not use only the Chinese characters, but also has two syllabic scripts. One of these (Katakana) was the earliest to be standardized and can be used to write the Japanese language.

Korean standards


Although a standard emerged in South Korea quite early (when?), it only became an official standard in 1992. In addition to the Chinese characters, the Korean script has its own alphabetic script, Hangul. A number of Hangul letters are combined to form a block that denotes a syllable. Within such a syllable you can find the initial consonant, the vowel and the final consonant(s) (if any). The Johab standard is a 16-bit code in which the first bit marks a Johab encoding, the next five bits give the initial consonant, the next five the vowel and the last five the final consonants. Above you see an overview of the encodings. (I know that other encodings did occur, but cannot find them now.) The chart above also shows the approximate relative positions of the symbols, but to get all possible positions you need a chart of 183 symbols.
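The bit layout just described (one marker bit followed by three five-bit fields) can be sketched in a few lines of Python. This is only an illustration of the field arithmetic: the function names `split_johab` and `join_johab` are my own, the value 0x8861 is simply an example 16-bit value, and the sketch does not check whether a field value is actually assigned in the charts.

```python
def split_johab(code: int) -> tuple[int, int, int, int]:
    """Split a 16-bit value into (marker, initial, vowel, final) fields."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("expected a 16-bit value")
    marker = (code >> 15) & 0x1    # first bit: marks a Johab encoding
    initial = (code >> 10) & 0x1F  # next five bits: initial consonant
    vowel = (code >> 5) & 0x1F     # next five bits: vowel
    final = code & 0x1F            # last five bits: final consonant(s)
    return marker, initial, vowel, final


def join_johab(initial: int, vowel: int, final: int) -> int:
    """Pack three five-bit fields behind a set marker bit."""
    for field in (initial, vowel, final):
        if not 0 <= field <= 0x1F:
            raise ValueError("each field must fit in five bits")
    return 0x8000 | (initial << 10) | (vowel << 5) | final


print(split_johab(0x8861))          # (1, 2, 3, 1)
print(hex(join_johab(2, 3, 1)))     # 0x8861
```

Because each field is five bits wide, it has room for 32 values, which is why the charts for the initial consonants, vowels and final consonants each have undefined positions.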


Here I show how the symbols are combined within a syllable. There are six possible ways, depending on the form of the vowel and the presence of a final consonant. The initial consonant is shown as a white block, the vowel as a red block and the final consonant as a black block. It is clear that (except for the possible final consonant) the form of a symbol will change depending on its use.

Below, a breakdown of the different symbols is given in a larger, more legible font.

Initial consonants


Only 19 initial consonants are possible; some are double consonants, which means a different pronunciation.

Vowels


Korean has only 21 different vowels (vowel combinations). Some positions are left undefined to avoid confusion with 7-bit control symbols.

Final consonants


There are 27 possible final consonant combinations; in addition a 28th code is used to denote the absence of a final consonant. The first position is left undefined to avoid confusion with 7-bit codes; the next one denotes the absence of a final consonant. I have no idea why the third position in the second row is left undefined. It is strange that the coding of the final consonant is so different from that of the initial consonant, but three of the combinations allowed as initial consonant do not occur as final consonant.

Thai standards

There are two standards relevant here, of which one has an EBCDIC variant. Full texts of the standard can be found in Thai for both TIS 620 and TIS 1074.

TIS 620


This standard appeared in 1986 and was revised in 1990. The chart shows the upper half of the code; the lower half is identical to ASCII. The same standard also defines an EBCDIC version, which is shown in the EBCDIC section.
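The split into an ASCII lower half and a Thai upper half can be checked with Python's built-in `tis_620` codec. This is a small sketch that assumes the codec faithfully follows the standard described here; the helper name `thai_to_unicode` is my own.

```python
def thai_to_unicode(byte_value: int) -> str:
    """Decode a single byte with Python's tis_620 codec."""
    return bytes([byte_value]).decode("tis_620")


# The lower half (0x00-0x7F) should decode exactly as ASCII does.
lower_half = bytes(range(0x80))
ascii_half_matches = lower_half.decode("tis_620") == lower_half.decode("ascii")
print(ascii_half_matches)

# The first assigned position in the upper half, shown as a Unicode code point.
print(f"0xA1 -> U+{ord(thai_to_unicode(0xA1)):04X}")
```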

TIS 1074


This is a six bit standard for mixed Thai and Latin script to be used on Teletypes. Although it appears similar to the 5 bit Latin teletype code, scrutiny reveals that it is completely different. There are three cases (called shifts in the standard): Lower Case, Middle Case and Upper Case. In each case seven codes are unused, and in Upper Case an additional seven are unused. This does not encode the full TIS-620 codeset, although that would have been possible if the ASCII lower case letters and the Thai numerals had been omitted. Moreover, three symbols are coded that are not in TIS-620 (Middle Case 014 and 075 and Upper Case 055). This standard dates from 1992.

Vietnamese Standards

Vietnamese is about the only major language that is tonal (i.e. uses tones to distinguish meanings) and uses the Latin alphabet as a base. The language has six vowels in addition to the standard list (a-circumflex, a-breve, e-circumflex, o-circumflex and the special o-horn and u-horn) and one additional consonant (barred-d). In addition, tones are indicated on vowels by diacriticals (none, acute, grave, tilde, hook and dot-below). This means that the language needs 134 additional code points above the ASCII standard. Clearly that cannot be done with only the additional code points obtained when 7-bit ASCII is extended to 8 bits, so at least six of the control symbols have to be replaced by printable symbols in a full Vietnamese standard. I show here three such standards; the first is official, the other two are not official but widely used (there are more, but apparently not used as much; I know of at least 7):
I have not yet found any reason for the particular assignment of the code points in any of the three standards.
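The figure of 134 can be retraced with a short calculation. The sketch below assumes that the "standard list" of vowels is a, e, i, o, u and y, and that every vowel occurs with every tone in both cases; the variable names are mine.

```python
BASE_VOWELS = ["a", "e", "i", "o", "u", "y"]  # assumed standard list, already in ASCII
EXTRA_VOWELS = ["a-circumflex", "a-breve", "e-circumflex",
                "o-circumflex", "o-horn", "u-horn"]
TONES = ["none", "acute", "grave", "tilde", "hook", "dot-below"]
CASES = 2  # upper and lower case

# Every vowel with every tone, in both cases.
all_vowel_forms = (len(BASE_VOWELS) + len(EXTRA_VOWELS)) * len(TONES) * CASES

# The toneless base vowels are plain ASCII letters and need no new code point.
already_in_ascii = len(BASE_VOWELS) * CASES

# Barred-d in both cases.
barred_d = CASES

additional_code_points = all_vowel_forms - already_in_ascii + barred_d
upper_half = 256 - 128  # positions gained by going from 7 to 8 bits
controls_to_replace = additional_code_points - upper_half

print(additional_code_points)  # 134
print(controls_to_replace)     # 6
```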

VSCII


This is the code table as found in TCVN 5712 of 1993, the official standard. In addition to the printable characters it also provides code points for the tonal signs, so a lot of the control symbols had to go. This is also known as VSCII-1 (Vietnam Standard Code for Information Interchange). Two additional versions are standardized, VSCII-2 and VSCII-3. In VSCII-2 the specific code points in the first, second, eighth and ninth row are omitted. This code has been registered for use in an ISO 2022 context.


This table shows part of VSCII-3. It is similar to VSCII-2, but all code points of uppercase vowels with tonal signs are omitted. The tonal signs themselves are also omitted, which is a bit strange: with this code you cannot even use combining tonal signs. In this table only the last six rows are shown.

VISCII


Before the official standard (1992) a group of expatriate Vietnamese speakers (Viet Std) came up with the standard code shown above. It was named VISCII = VIet-nam Standard Code for Information Interchange. It is similar to, and just as inexplicable as, VSCII.

VNCII (VPS)


And a third group, the Vietnamese Professionals Society, came up with a third version, originally called VPS, later also called VNCII (Viet-Nam Code for Information Interchange?). This code is mostly used on Windows systems. An additional unexplained feature of this code is the empty code points in the higher part of the table.

IBM

IBM was not a standards institute, but it was a major player, so the following code tables are shown in this section:

Original EBCDIC


This is the original version of EBCDIC, IBM's own standard. Although it has been claimed that it predates ASCII, that is not true: this standard dates from 1964. Strangely enough, IBM cooperated in the creation of ASCII, but did not implement it. The main concern was the cost of translating card columns to internal code, which they thought should be done entirely in hardware, something other manufacturers had already abandoned. The sections outlined in red could be folded together to form a 64 character subset; that is where EBCD Hollerith got its name from. And obviously the symbol that would be folded to the cursed 0-8-2 card code would not be present in some versions of EBCD Hollerith. And also of course, the company being IBM, a single standard was not sufficient, so there was an alternate version in use where the cent sign, vertical bar, exclamation mark and not sign were replaced by open square bracket, exclamation mark, closing square bracket and circumflex (caret) respectively. Note the move of the exclamation mark here! A design consideration was that, although the alphabetics were not contiguous, they would not be interspersed with non-alphabetics. This is clearly violated by the braces, the backslash and the four banking symbols in the lower right quarter that are also present in the standard OCR-A font (but these latter four got very strange positions!).
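That the alphabetics are not contiguous is easy to see by listing the code ranges the letters occupy. Python ships no codec for the original 1964 table, so this sketch uses the later `cp037` codec as a stand-in, on the assumption that the letter positions did not change; the helper `runs` is my own.

```python
import string


def runs(codes):
    """Group code values into (first, last) runs of consecutive values."""
    result = []
    for code in sorted(codes):
        if result and code == result[-1][1] + 1:
            result[-1][1] = code          # extend the current run
        else:
            result.append([code, code])   # start a new run
    return [(first, last) for first, last in result]


upper_runs = runs(string.ascii_uppercase.encode("cp037"))
lower_runs = runs(string.ascii_lowercase.encode("cp037"))

for name, letter_runs in (("upper", upper_runs), ("lower", lower_runs)):
    print(name, [f"0x{first:02X}-0x{last:02X}" for first, last in letter_runs])
```

Each case of the alphabet comes out as three separate runs rather than one, which is the non-contiguity the text refers to.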

Cyrillic EBCDIC


IBM also came up with a Cyrillic version of EBCDIC. Here the lower case letters and a bit of punctuation were replaced by Cyrillic letters. The order is obviously the KOI order from GOST 13052, given above. Some letters are put in a strange position. Again, by folding you would get a 64 character set, but in this case some of the punctuation was actually not present: columns 11, 12, 15 and 17 were not present at all, and from column 16 only the Cyrillic letter was, so the folded set consisted of only 49 symbols. So actually there was no inconsistency in replacing < by a Cyrillic letter but leaving > alone. (Or was there ...)

Japanese EBCDIC


Indeed, a Japanese EBCDIC coding was also introduced, shown above. At first only a subset of the Katakana were defined (those in the red box above), and this allowed a nice folding to a 64 character set. Later the additional symbols from JISCII were added, so folding was no longer possible. As with the Cyrillic set, the new symbols replaced in part the lower case Latin letters.

Revised Japanese EBCDIC


Of course, IBM came up with a revised (and hence completely incompatible) version of Japanese EBCDIC. Here the defined symbols from EBCDIC were left unchanged (except the dollar sign), and the symbols were filled in order from the position of the space. I have no idea why the position immediately next to the space was left undefined, and I do not think I want to know. This code was used on the IBM System/3 computers, while the IBM System/360 and System/370 computers stayed with the previous definition. Can you say argh? But wait, there is more to come.

Augmented EBCDIC


In the meantime ISO had come up with additional symbols in the 8859-1 standard. This was implemented in EBCDIC, of course. In this case the additional symbols not in EBCDIC were just added consecutively in the empty places in the code table (again skipping the position after the space). Of course a variant of this occurred with the interchanges mentioned above.

Revised Augmented EBCDIC


Later the previous version was revised, and by now IBM had come up with code pages for EBCDIC. Here we see code page 500, completely incompatible with the previous version. We see the added (and sometimes changed) control symbols taken from the then current ANSI standard. The rhyme and reason behind the position of the symbols I cannot fathom. It completely defies previous usage. Of course there is an alternate version with the interchanges mentioned above, but here we show the version with square brackets in the alternate place. The version with interchanged symbols is code page 037. Which code page you get depends on the country: for instance, in the Netherlands cp 037 is used, while in Belgium (bordering it to the south) cp 500 is used. But it depends not only on country, but also on device. Some devices would print out symbols different from others. One rather unreliable way to transfer e-mail was to send it from an ASCII site through a number of EBCDIC sites back to an ASCII site. The probability that the result was the same as what had gone in was not 100 %. It appears that there are currently 57 (remember Heinz) versions of this standard.
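The disagreement between the two code pages can be made visible with Python's built-in `cp037` and `cp500` codecs. This is a minimal sketch that assumes those codecs match the code pages discussed here; the helper names `differing_positions` and `via_wrong_code_page` are my own, and the sample text is arbitrary.

```python
def differing_positions(page_a: str, page_b: str) -> list[int]:
    """Return the byte values that the two code pages decode differently."""
    return [
        value for value in range(256)
        if bytes([value]).decode(page_a) != bytes([value]).decode(page_b)
    ]


def via_wrong_code_page(text: str, sent_as: str = "cp037",
                        read_as: str = "cp500") -> str:
    """Encode text under one code page and decode it under the other."""
    return text.encode(sent_as).decode(read_as)


differing = differing_positions("cp037", "cp500")
for value in differing:
    in_037 = bytes([value]).decode("cp037")
    in_500 = bytes([value]).decode("cp500")
    # !a prints non-ASCII symbols as escapes, so any terminal can show them
    print(f"0x{value:02X}: cp037 {in_037!a}  cp500 {in_500!a}")

# A fragment of C source that passes through the wrong code page.
print(ascii(via_wrong_code_page("a[0] != b")))
```

Only a handful of positions differ, but they include the exclamation mark and the square brackets, which is enough to damage program text on the way through.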

The above may look like IBM bashing. And indeed it is. Consider my frustration when I had to develop a C program concurrently on an IBM mainframe and a Unix system. When I transferred a program by FTP from the Unix system to the IBM mainframe, the IBM C compiler would not compile it because FTP and the compiler disagreed about the actual codings of the curly braces. Editing with a Telnet (tn3270) program was also problematic; there was yet another disagreement there. And now this silliness is still present with the Windows code pages...

Russian version of EBCDIC from 1974


The Soviet Union standard of 1974 also had an EBCDIC version. It is based on their standard 8 bit code with the standard translation to EBCDIC. The positions marked green are based on the similar ECMA standard 111. Note that this completely deviates from the IBM Cyrillic standard. This code has been used on IBM clones produced in the Soviet Union.

Russian version of EBCDIC from 1987


In 1987 the Soviet Union switched to a new standard (which is the base for ECMA 113 and ISO 8859-5). This new standard also had an EBCDIC variant. Again the green positions are not in the standard itself. I do not know whether this coding has been used much; I think not.

Thai version of EBCDIC


When the Thai code was standardized, the standard also contained a standard for Thai EBCDIC. Apart from the two Soviet Union standards above, this is, I think, about the only official standard that mentions EBCDIC. The base code was an EBCDIC display with only the non-ISO-8859-1 part. I show it here based on CP 500.

EEC System 4


And in addition there were of course the versions from the competitors that produced IBM System/360 look-alikes. Here we see a coding for the EEC System/4, with considerable differences with respect to the control symbols.