Tech Stuff - HTML Character Entity Encoding

A mercifully short overview of HTML character codes: an apparently simple subject that turns out to be brutally complicated, at least to our modest brains.

This stuff (and more) is contained in Chapter 24 of the current HTML 4.01 spec (the last ever version of HTML - it's all XHTML now), which you can get (along with lots of other great stuff) from the W3C site, including their fantastic page validation services.

The reason you can never find this stuff is that the most popular values, confusingly, belong to a number of different character sets.

There are two encoding forms supported by most browsers: the name format (for example, &copy;) and the numeric format (for example, &#169;).

Both formats are shown in the tables below. If you are using anything obscure or have to deal with old browsers (< v3 ish) you should stick to the numeric format only.
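If you do need to fall back to the numeric format only, the conversion can be automated. The following is a minimal sketch, assuming Python and its standard html.entities module (our choice for illustration; the function name to_numeric and the sample string are made up). Names it does not recognise are left alone rather than guessed at.

```python
import re
from html.entities import name2codepoint

# Matches name-format entities such as &copy; or &frac12;
NAMED_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

def to_numeric(text):
    """Rewrite name-format entities as decimal numeric entities."""
    def replace(match):
        codepoint = name2codepoint.get(match.group(1))
        # Unknown names are passed through unchanged.
        return match.group(0) if codepoint is None else f"&#{codepoint};"
    return NAMED_ENTITY.sub(replace, text)

converted = to_numeric("&copy; Example &mdash; caf&eacute; &bogus;")
print(converted)  # &#169; Example &#8212; caf&#233; &bogus;
```

Note that this also rewrites &amp;, &lt;, &gt; and &quot; to their numeric forms, which is harmless since both forms mean the same character.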

For convenience we show some popular values (from a number of character sets) in a single table, then the full Latin-1 (ISO8859-1) table. If you are into the Greek alphabet for obscure mathematical symbols - you are out of luck. Go to the W3C site, click on HTML, then go for Chapter 24 of the latest spec - they are all there.

This lot should all display on most browsers. However, in the tables below the column DN shows the result of using the name format encoding in your browser and the column D# shows the result of using the numeric format. If there is no character in one or other of these columns (except the space, dummy!) then your browser does not handle that format correctly.

Commonly used character entity references
DN Name format D# Numeric format Description Char set
  &nbsp;   &#160; non-breaking space ISO8859-1
© &copy; © &#169; copyright sign ISO8859-1
® &reg; ® &#174; registered trade mark sign ISO8859-1
° &deg; ° &#176; degree sign ISO8859-1
² &sup2; ² &#178; superscript 2 (squared) ISO8859-1
³ &sup3; ³ &#179; superscript 3 (cubed) ISO8859-1
" &quot; " &#34; quotation mark ISO10646
& &amp; & &#38; ampersand sign ISO10646
< &lt; < &#60; less than sign ISO10646
> &gt; > &#62; greater than sign ISO10646
– &ndash; – &#8211; en dash ISO10646
— &mdash; — &#8212; em dash ISO10646
‘ &lsquo; ‘ &#8216; left single quote ISO10646
’ &rsquo; ’ &#8217; right single quote, apostrophe ISO10646
“ &ldquo; “ &#8220; left double quotation mark ISO10646
” &rdquo; ” &#8221; right double quotation mark ISO10646
• &bull; • &#8226; small black circle, bullet ISO10646
† &dagger; † &#8224; dagger sign ISO10646
‡ &Dagger; ‡ &#8225; double dagger sign ISO10646
′ &prime; ′ &#8242; prime = minutes = feet ISO10646
″ &Prime; ″ &#8243; double prime = seconds = inches ISO10646
‹ &lsaquo; ‹ &#8249; single left pointing angle quote ISO10646
› &rsaquo; › &#8250; single right pointing angle quote ISO10646
€ &euro; € &#8364; euro sign ISO10646
™ &trade; ™ &#8482; trade mark sign ISO10646
⊕ &oplus; ⊕ &#8853; circled plus = direct sum ISO10646
⊗ &otimes; ⊗ &#8855; circled times = vector product ISO10646
˜ &tilde; ˜ &#732; tilde sign ISO10646
ˆ &circ; ˆ &#710; circumflex (or caret) sign ISO10646
&#9733; black star ISO10646
&#9734; empty star ISO10646
♠ &spades; ♠ &#9824; black spade suit ISO10646
♣ &clubs; ♣ &#9827; black clubs suit ISO10646
♥ &hearts; ♥ &#9829; black heart suit ISO10646
♦ &diams; ♦ &#9830; black diamonds suit ISO10646
◊ &loz; ◊ &#9674; lozenge ISO10646
← &larr; ← &#8592; left arrow ISO10646
→ &rarr; → &#8594; right arrow ISO10646
↑ &uarr; ↑ &#8593; up arrow ISO10646
↓ &darr; ↓ &#8595; down arrow ISO10646
↔ &harr; ↔ &#8596; left-right arrow ISO10646
¬ &not; ¬ &#172; NOT sign ISO8859-1
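
The four special characters in the table (<, >, & and ") are the ones that have to be encoded whenever they appear as text rather than as markup. A minimal sketch of doing that, assuming Python and its standard html module (our choice for illustration; the sample string is made up):

```python
import html

# Text containing all four characters that are special in HTML.
raw = '<a href="x">Tom & Jerry</a>'

# html.escape() replaces &, < and > and, by default, quotation marks too.
escaped = html.escape(raw)
print(escaped)  # &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;

# html.unescape() reverses the encoding and gives back the original text.
restored = html.unescape(escaped)
print(restored == raw)  # True
```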

Here is the full list of character entities for accented characters and miscellaneous symbols in the Latin-1 (ISO 8859-1) character set. Values in the range 00 to 7F (the ASCII/IA5 set) are typically used as raw characters, with the exception of the special characters used for HTML encoding as shown above (<, >, & and "). They can, however, be represented as HTML entities by using the ASCII/IA5 decimal number: for example, ',' (comma) has a decimal value of 44 in the ASCII/IA5 table and may be represented as an HTML entity by encoding it as &#44;.
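
The comma example can be checked with a couple of lines of code. This is only a sketch, assuming Python (the helper name numeric_entity is made up for illustration): it builds the numeric format from a character's decimal value and decodes it back again.

```python
import html

def numeric_entity(char):
    """Return the decimal numeric entity for a single character."""
    return f"&#{ord(char)};"

comma = numeric_entity(",")
print(comma)                 # &#44;
print(html.unescape(comma))  # ,
```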

Latin-1 (ISO8859-1) character entity references
DN Name format D# Numeric format Description
|   | &#124; vertical bar
  &nbsp;   &#160; non-breaking space
¡ &iexcl; ¡ &#161; inverted exclamation mark
¢ &cent; ¢ &#162; cent sign
£ &pound; £ &#163; pound sign
¤ &curren; ¤ &#164; currency sign
¥ &yen; ¥ &#165; yen sign = yuan sign
¦ &brvbar; ¦ &#166; broken vertical bar
§ &sect; § &#167; section sign
¨ &uml; ¨ &#168; diaeresis = spacing diaeresis
© &copy; © &#169; copyright sign
ª &ordf; ª &#170; feminine ordinal indicator
« &laquo; « &#171; left-pointing double angle quotes (left pointing guillemet)
¬ &not; ¬ &#172; not sign
­ &shy; ­ &#173; soft hyphen
® &reg; ® &#174; registered sign
¯ &macr; ¯ &#175; macron = spacing macron
° &deg; ° &#176; degree sign
± &plusmn; ± &#177; plus-minus sign
² &sup2; ² &#178; superscript two (squared)
³ &sup3; ³ &#179; superscript three (cubed)
´ &acute; ´ &#180; acute accent
µ &micro; µ &#181; micro sign
¶ &para; ¶ &#182; paragraph sign = pilcrow sign
· &middot; · &#183; middle dot = Georgian comma
¸ &cedil; ¸ &#184; cedilla sign
¹ &sup1; ¹ &#185; superscript one
º &ordm; º &#186; masculine ordinal indicator
» &raquo; » &#187; right-pointing double angle quotes (right pointing guillemet)
¼ &frac14; ¼ &#188; vulgar fraction one quarter
½ &frac12; ½ &#189; vulgar fraction one half
¾ &frac34; ¾ &#190; vulgar fraction three quarters
¿ &iquest; ¿ &#191; inverted question mark
À &Agrave; À &#192; latin capital A with grave accent
Á &Aacute; Á &#193; latin capital A with acute accent
Â &Acirc; Â &#194; latin capital A with circumflex
Ã &Atilde; Ã &#195; latin capital A with tilde
Ä &Auml; Ä &#196; latin capital A with diaeresis
Å &Aring; Å &#197; latin capital A with ring
Æ &AElig; Æ &#198; latin capital AE
Ç &Ccedil; Ç &#199; latin capital C with cedilla
È &Egrave; È &#200; latin capital E with grave accent
É &Eacute; É &#201; latin capital E with acute accent
Ê &Ecirc; Ê &#202; latin capital E with circumflex
Ë &Euml; Ë &#203; latin capital E with diaeresis
Ì &Igrave; Ì &#204; latin capital I with grave accent
Í &Iacute; Í &#205; latin capital I with acute accent
Î &Icirc; Î &#206; latin capital I with circumflex
Ï &Iuml; Ï &#207; latin capital I with diaeresis
Ð &ETH; Ð &#208; latin capital letter ETH
Ñ &Ntilde; Ñ &#209; latin capital N with tilde
Ò &Ograve; Ò &#210; latin capital O with grave accent
Ó &Oacute; Ó &#211; latin capital O with acute accent
Ô &Ocirc; Ô &#212; latin capital O with circumflex
Õ &Otilde; Õ &#213; latin capital O with tilde
Ö &Ouml; Ö &#214; latin capital O with diaeresis
× &times; × &#215; multiplication sign
Ø &Oslash; Ø &#216; latin capital O with stroke
Ù &Ugrave; Ù &#217; latin capital U with grave accent
Ú &Uacute; Ú &#218; latin capital U with acute accent
Û &Ucirc; Û &#219; latin capital U with circumflex
Ü &Uuml; Ü &#220; latin capital U with diaeresis
Ý &Yacute; Ý &#221; latin capital Y with acute accent
Þ &THORN; Þ &#222; latin capital THORN
ß &szlig; ß &#223; latin small letter sharp s
à &agrave; à &#224; latin small letter a with grave accent
á &aacute; á &#225; latin small letter a with acute accent
â &acirc; â &#226; latin small letter a with circumflex
ã &atilde; ã &#227; latin small letter a with tilde
ä &auml; ä &#228; latin small letter a with diaeresis
å &aring; å &#229; latin small letter a with ring
æ &aelig; æ &#230; latin small letter ae
ç &ccedil; ç &#231; latin small letter c with cedilla
è &egrave; è &#232; latin small letter e with grave accent
é &eacute; é &#233; latin small letter e with acute accent
ê &ecirc; ê &#234; latin small letter e with circumflex
ë &euml; ë &#235; latin small letter e with diaeresis
ì &igrave; ì &#236; latin small letter i with grave accent
í &iacute; í &#237; latin small letter i with acute accent
î &icirc; î &#238; latin small letter i with circumflex
ï &iuml; ï &#239; latin small letter i with diaeresis
ð &eth; ð &#240; latin small letter eth
ñ &ntilde; ñ &#241; latin small letter n with tilde
ò &ograve; ò &#242; latin small letter o with grave accent
ó &oacute; ó &#243; latin small letter o with acute accent
ô &ocirc; ô &#244; latin small letter o with circumflex
õ &otilde; õ &#245; latin small letter o with tilde
ö &ouml; ö &#246; latin small letter o with diaeresis
÷ &divide; ÷ &#247; division sign
ø &oslash; ø &#248; latin small letter o with stroke
ù &ugrave; ù &#249; latin small letter u with grave accent
ú &uacute; ú &#250; latin small letter u with acute accent
û &ucirc; û &#251; latin small letter u with circumflex
ü &uuml; ü &#252; latin small letter u with diaeresis
ý &yacute; ý &#253; latin small letter y with acute accent
þ &thorn; þ &#254; latin small letter thorn
ÿ &yuml; ÿ &#255; latin small letter y with diaeresis
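
If you would rather generate the rows from 160 to 255 than type them, something along the following lines should work. It is a sketch only, assuming Python and the HTML 4 entity table shipped in its standard html.entities module (the function name latin1_rows is made up); it prints the character, the name format and the numeric format, but not the descriptions.

```python
from html.entities import codepoint2name

def latin1_rows():
    """Yield (character, name format, numeric format) for code points 160-255."""
    for codepoint in range(160, 256):
        # Every code point in this range has a name in the HTML 4 entity set.
        name = codepoint2name[codepoint]
        yield chr(codepoint), f"&{name};", f"&#{codepoint};"

rows = list(latin1_rows())
for char, named, numeric in rows:
    print(char, named, numeric)
```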


