This page summarises what at face value seems a remarkably simple concept: character representation. It turns out to be more like a nightmare. The Relationship column describes how the various standards relate to one another.
| Name | Standard | Aliases | Description | Relationship |
| ASCII | ANSI X3.4-1986, ISO 646, ITU-T T.50 | US-ASCII, IA5, IRA5, ISO 646 | ASCII is a 7 bit code. It is normally stored in an 8 bit field but only uses the values 00 to 7F (0 to 127 decimal). Almost all other character codes contain ASCII as a base. Various national variants of ISO 646 exist which typically replace only a few printable characters. | ASCII is the same as ISO 646 and International Reference Alphabet No. 5 (IRA5), previously International Alphabet No. 5 (IA5), defined in ITU-T T.50. It also forms the first 128 values in ISO 8859-1 (Latin-1), Unicode and ISO 10646 (UCS). |
| IA5 | ITU-T T.50 | IRA5, ASCII, ISO 646 | International Alphabet No. 5 (ISO 646), now renamed International Reference Alphabet No. 5 (IRA5). | |
| IRA5 | ITU-T T.50 | IA5, ISO 646, ASCII | International Reference Alphabet No. 5 (IRA5), formerly International Alphabet No. 5 (IA5), is the ITU equivalent of US-ASCII and ISO 646. IRA5 is a 7 bit code, normally stored in an 8 bit field, that only uses the values 00 to 7F (0 to 127 decimal). Almost all other character codes contain IRA5 as a base. | IRA5 is the same as ISO 646 and ASCII. It also forms the first 128 values in ISO 8859-1 (Latin-1), Unicode and ISO 10646 (UCS). |
| ISO 646 | ISO 646 | IA5, IRA5, ASCII | ISO 646 is a 7 bit code, normally stored in an 8 bit field, that only uses the values 00 to 7F (0 to 127 decimal). Almost all other character codes contain ISO 646 as a base. | ISO 646 is the same as IRA5 (IA5) and ASCII. It also forms the first 128 values in ISO 8859-1 (Latin-1), Unicode and ISO 10646 (UCS). |
| ISO 8859-1 | ISO 8859-1 | Latin-1 | ISO 8859-1 is part of a large family (ISO 8859-1 to 8859-16). It is encoded as an 8 bit field and uses all 8 bits, 00 to FF (0 to 255 decimal). | Its first 128 values are the same as IRA5, ISO 646 and US-ASCII, and are shared with ISO 8859-15 (Latin-9). ISO 8859-1 itself forms the first 256 values of Unicode and ISO 10646 (UCS). |
| ISO 8859-15 | ISO 8859-15 | Latin-9 | ISO 8859-15 is part of a large family (ISO 8859-1 to 8859-16). It is encoded as an 8 bit field and uses all 8 bits, 00 to FF (0 to 255 decimal). It differs from 8859-1 in 8 positions, one of which carries the euro symbol. | Its first 128 values are the same as IRA5, ISO 646, US-ASCII, ISO 8859-1 (Latin-1), Unicode and ISO 10646 (UCS). |
| ISO 10646 | ISO 10646 | UCS | ISO 10646 (Universal Character Set) is designed to replace all previous character sets by providing a single family of standards for encoding all possible characters and symbols in all written languages. It has two implementations: UCS-2 (a 16 bit encoding) and UCS-4 (a 32 bit encoding). | The first 128 values in ISO 10646 are the same as ASCII, IA5, IRA5, ISO 646, ISO 8859-1 and ISO 8859-15. Unicode from version 1.1 is the same as ISO 10646. |
| Unicode | Unicode Consortium | - | Unicode (version 3.0 at the time of writing). | From version 1.1 it is fully compatible with ISO 10646. |
| CP1252 | Microsoft code page 1252 (IANA charset name windows-1252) | - | Microsoft's version of ISO 8859-1. There are 27 differences from 8859-1 (it includes the euro), all in the range x80 to x9F. | The first 128 values are the same as those of IRA5, ISO 646, US-ASCII, ISO 8859-1 (Latin-1), ISO 8859-15 (Latin-9), Unicode and ISO 10646 (UCS). |
| Transformations | ||||
| These values define how the underlying code set of Unicode/ISO 10646 is sent over the wire. They are not charsets. | ||||
| UTF-7 | RFC 2152 | - | UCS Transformation Format-7. Defines how ISO 10646 (UCS) is transformed into a 7 bit, mail-safe form for email data communications. May use from 1 to 9 octets for a single ISO 10646/Unicode character. | |
| UTF-8 | RFC 3629 | UTF-2, FSS-UTF | UCS Transformation Format-8. Defines how ISO 10646 (UCS) is transformed for MIME enabled data communications. Under RFC 3629 it uses from 1 to 4 octets for a single ISO 10646/Unicode character (earlier definitions allowed up to 6). | |
| UTF-16 | RFC 2781 | - | UCS Transformation Format-16. Defines how ISO 10646 (UCS) is transformed for data communications. Uses 2 or 4 octets (one or two 16 bit units) for a single ISO 10646/Unicode character: values that fit in UCS-2 are sent as a single 16 bit unit, and values beyond that range are sent as a pair of 16 bit surrogate units. | |
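
The octet counts for the transformation formats can be checked with a short script. The following is a minimal sketch, assuming Python 3 and its built-in `utf-8`, `utf-16-be` and `utf-7` codecs; the function names `octet_counts` and `is_seven_bit` and the sample characters are illustrative and not part of any standard.

```python
# Sample characters: ASCII, Latin-1 range, the euro sign (inside the 16 bit
# range), and a musical symbol that lies beyond the 16 bit range.
SAMPLES = ["A", "\u00e9", "\u20ac", "\U0001d11e"]


def octet_counts(ch):
    """Return (utf8_octets, utf16_octets) for a single character."""
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-be")  # big-endian, no byte order mark added
    return len(utf8), len(utf16)


def is_seven_bit(data):
    """True if every octet in data has its top bit clear (00 to 7F)."""
    return all(octet < 0x80 for octet in data)


for ch in SAMPLES:
    utf8_len, utf16_len = octet_counts(ch)
    utf7 = ch.encode("utf-7")
    print(f"U+{ord(ch):04X} utf-8={utf8_len} utf-16={utf16_len} "
          f"utf-7 is 7 bit: {is_seven_bit(utf7)}")
```

With these codecs, UTF-8 takes between 1 and 4 octets per character and UTF-16 takes 2 or 4, while the UTF-7 output stays within the 7 bit range.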
| Standard | Name | Coverage |
| ISO 8859-1 | Latin alphabet No. 1 | West European |
| ISO 8859-2 | Latin alphabet No. 2 | Central and East European |
| ISO 8859-3 | Latin alphabet No. 3 | South European, Maltese & Esperanto |
| ISO 8859-4 | Latin alphabet No. 4 | North European |
| ISO 8859-5 | Latin/Cyrillic alphabet | Slavic languages |
| ISO 8859-6 | Latin/Arabic alphabet | Arabic |
| ISO 8859-7 | Latin/Greek alphabet | modern Greek |
| ISO 8859-8 | Latin/Hebrew alphabet | Hebrew and Yiddish |
| ISO 8859-9 | Latin alphabet No. 5 | Turkish |
| ISO 8859-10 | Latin alphabet No. 6 | Nordic (Sámi, Inuit, Icelandic) |
| ISO 8859-11 | Latin/Thai alphabet | Thai |
| ISO 8859-12 | (not defined) | - |
| ISO 8859-13 | Latin alphabet No. 7 | Baltic Rim |
| ISO 8859-14 | Latin alphabet No. 8 | Celtic |
| ISO 8859-15 | Latin alphabet No. 9 | adds euro to -1 (8 changes) |
| ISO 8859-16 | Latin alphabet No. 10 | South-Eastern Europe |
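
The differences between the 8 bit character sets described on this page can be listed mechanically. The following is a minimal sketch, assuming Python 3 and its built-in codec names `iso8859-1`, `iso8859-15` and `cp1252`; the helper names `differing_bytes` and `defined_in_c1_range` are made up for illustration. Note that Python's `cp1252` codec treats the positions Microsoft leaves unassigned as decoding errors, which this sketch relies on.

```python
def differing_bytes(codec_a, codec_b):
    """List the byte values that decode differently in two 8 bit codecs."""
    diffs = []
    for value in range(256):
        raw = bytes([value])
        # errors="replace" turns undefined positions into U+FFFD
        # instead of raising an exception.
        a = raw.decode(codec_a, errors="replace")
        b = raw.decode(codec_b, errors="replace")
        if a != b:
            diffs.append(value)
    return diffs


def defined_in_c1_range(codec):
    """Count the byte values in x80 to x9F that the codec assigns a character."""
    count = 0
    for value in range(0x80, 0xA0):
        try:
            bytes([value]).decode(codec)
        except UnicodeDecodeError:
            continue
        count += 1
    return count


# Positions where Latin-9 departs from Latin-1.
print([hex(v) for v in differing_bytes("iso8859-1", "iso8859-15")])

# Number of characters CP1252 places in the x80 to x9F range.
print(defined_in_c1_range("cp1252"))

# The euro sits at a different byte value in each set that has it.
print(bytes([0xA4]).decode("iso8859-15"), bytes([0x80]).decode("cp1252"))
```

Because every value below x80 decodes identically in all three codecs, no such value ever appears in the difference lists, which matches the shared ASCII base described in the table.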
Copyright © 1994 - 2008 ZyTrax, Inc. All rights reserved.
Page modified: October 07 2007.