Commit 5688b7ac authored by Walter Dörwald

Add two dictionaries to htmlentitydefs: name2codepoint maps
HTML entity names to Unicode codepoints (as integers).
codepoint2name is the reverse mapping. From SF patch #722017.
parent 19a02ba6
@@ -145,15 +145,27 @@ method without a preceding call to \method{save_bgn()} will raise a
 \modulesynopsis{Definitions of HTML general entities.}
 \sectionauthor{Fred L. Drake, Jr.}{fdrake@acm.org}
-This module defines a single dictionary, \code{entitydefs}, which is
+This module defines three dictionaries, \code{name2codepoint},
+\code{codepoint2name}, and \code{entitydefs}. \code{entitydefs} is
 used by the \refmodule{htmllib} module to provide the
 \member{entitydefs} member of the \class{HTMLParser} class. The
-definition provided here contains all the entities defined by HTML 2.0
+definition provided here contains all the entities defined by XHTML 1.0
 that can be handled using simple textual substitution in the Latin-1
 character set (ISO-8859-1).
 \begin{datadesc}{entitydefs}
-A dictionary mapping HTML 2.0 entity definitions to their
+A dictionary mapping XHTML 1.0 entity definitions to their
 replacement text in ISO Latin-1.
 \end{datadesc}
+\begin{datadesc}{name2codepoint}
+A dictionary that maps HTML entity names to the Unicode codepoints.
+\end{datadesc}
+\begin{datadesc}{codepoint2name}
+A dictionary that maps Unicode codepoints to HTML entity names.
+\end{datadesc}
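As a rough illustration of how the two new dictionaries are meant to be used, the sketch below looks up entries in both directions and uses `name2codepoint` to decode named entity references in a string. It assumes Python 3, where `htmlentitydefs` is available as `html.entities` with the same dictionary names. `decode_named_entities` is a hypothetical helper, not part of the module, and it deliberately leaves unknown names and numeric references untouched.

```python
# Assumes Python 3, where htmlentitydefs lives on as html.entities.
import re
from html.entities import codepoint2name, name2codepoint

# Entity name -> integer codepoint, and back again.
print(hex(name2codepoint['eacute']))   # 0xe9
print(codepoint2name[0x20ac])          # euro

def decode_named_entities(text):
    """Hypothetical helper: replace &name; with the character that
    name2codepoint maps it to; unknown names are left as they are."""
    def replace(match):
        codepoint = name2codepoint.get(match.group(1))
        if codepoint is None:
            return match.group(0)      # not an HTML entity name: keep verbatim
        return chr(codepoint)
    # Entity names start with a letter and may contain digits (frac12, sup1).
    return re.sub(r'&([A-Za-z][A-Za-z0-9]*);', replace, text)

print(decode_named_entities('5 &lt; 7 &amp;&amp; &bogus;'))   # 5 < 7 && &bogus;
```

Storing integers rather than replacement strings is what makes this kind of lookup straightforward: the caller decides whether to turn a codepoint into a character, a numeric reference, or something else.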
"""HTML character entity references.""" """HTML character entity references."""
entitydefs = { # maps the HTML entity name to the Unicode codepoint
'AElig': '\306', # latin capital letter AE = latin capital ligature AE, U+00C6 ISOlat1 name2codepoint = {
'Aacute': '\301', # latin capital letter A with acute, U+00C1 ISOlat1 'AElig': 0x00c6, # latin capital letter AE = latin capital ligature AE, U+00C6 ISOlat1
'Acirc': '\302', # latin capital letter A with circumflex, U+00C2 ISOlat1 'Aacute': 0x00c1, # latin capital letter A with acute, U+00C1 ISOlat1
'Agrave': '\300', # latin capital letter A with grave = latin capital letter A grave, U+00C0 ISOlat1 'Acirc': 0x00c2, # latin capital letter A with circumflex, U+00C2 ISOlat1
'Alpha': 'Α', # greek capital letter alpha, U+0391 'Agrave': 0x00c0, # latin capital letter A with grave = latin capital letter A grave, U+00C0 ISOlat1
'Aring': '\305', # latin capital letter A with ring above = latin capital letter A ring, U+00C5 ISOlat1 'Alpha': 0x0391, # greek capital letter alpha, U+0391
'Atilde': '\303', # latin capital letter A with tilde, U+00C3 ISOlat1 'Aring': 0x00c5, # latin capital letter A with ring above = latin capital letter A ring, U+00C5 ISOlat1
'Auml': '\304', # latin capital letter A with diaeresis, U+00C4 ISOlat1 'Atilde': 0x00c3, # latin capital letter A with tilde, U+00C3 ISOlat1
'Beta': 'Β', # greek capital letter beta, U+0392 'Auml': 0x00c4, # latin capital letter A with diaeresis, U+00C4 ISOlat1
'Ccedil': '\307', # latin capital letter C with cedilla, U+00C7 ISOlat1 'Beta': 0x0392, # greek capital letter beta, U+0392
'Chi': 'Χ', # greek capital letter chi, U+03A7 'Ccedil': 0x00c7, # latin capital letter C with cedilla, U+00C7 ISOlat1
'Dagger': '‡', # double dagger, U+2021 ISOpub 'Chi': 0x03a7, # greek capital letter chi, U+03A7
'Delta': 'Δ', # greek capital letter delta, U+0394 ISOgrk3 'Dagger': 0x2021, # double dagger, U+2021 ISOpub
'ETH': '\320', # latin capital letter ETH, U+00D0 ISOlat1 'Delta': 0x0394, # greek capital letter delta, U+0394 ISOgrk3
'Eacute': '\311', # latin capital letter E with acute, U+00C9 ISOlat1 'ETH': 0x00d0, # latin capital letter ETH, U+00D0 ISOlat1
'Ecirc': '\312', # latin capital letter E with circumflex, U+00CA ISOlat1 'Eacute': 0x00c9, # latin capital letter E with acute, U+00C9 ISOlat1
'Egrave': '\310', # latin capital letter E with grave, U+00C8 ISOlat1 'Ecirc': 0x00ca, # latin capital letter E with circumflex, U+00CA ISOlat1
'Epsilon': 'Ε', # greek capital letter epsilon, U+0395 'Egrave': 0x00c8, # latin capital letter E with grave, U+00C8 ISOlat1
'Eta': 'Η', # greek capital letter eta, U+0397 'Epsilon': 0x0395, # greek capital letter epsilon, U+0395
'Euml': '\313', # latin capital letter E with diaeresis, U+00CB ISOlat1 'Eta': 0x0397, # greek capital letter eta, U+0397
'Gamma': 'Γ', # greek capital letter gamma, U+0393 ISOgrk3 'Euml': 0x00cb, # latin capital letter E with diaeresis, U+00CB ISOlat1
'Iacute': '\315', # latin capital letter I with acute, U+00CD ISOlat1 'Gamma': 0x0393, # greek capital letter gamma, U+0393 ISOgrk3
'Icirc': '\316', # latin capital letter I with circumflex, U+00CE ISOlat1 'Iacute': 0x00cd, # latin capital letter I with acute, U+00CD ISOlat1
'Igrave': '\314', # latin capital letter I with grave, U+00CC ISOlat1 'Icirc': 0x00ce, # latin capital letter I with circumflex, U+00CE ISOlat1
'Iota': 'Ι', # greek capital letter iota, U+0399 'Igrave': 0x00cc, # latin capital letter I with grave, U+00CC ISOlat1
'Iuml': '\317', # latin capital letter I with diaeresis, U+00CF ISOlat1 'Iota': 0x0399, # greek capital letter iota, U+0399
'Kappa': 'Κ', # greek capital letter kappa, U+039A 'Iuml': 0x00cf, # latin capital letter I with diaeresis, U+00CF ISOlat1
'Lambda': 'Λ', # greek capital letter lambda, U+039B ISOgrk3 'Kappa': 0x039a, # greek capital letter kappa, U+039A
'Mu': 'Μ', # greek capital letter mu, U+039C 'Lambda': 0x039b, # greek capital letter lambda, U+039B ISOgrk3
'Ntilde': '\321', # latin capital letter N with tilde, U+00D1 ISOlat1 'Mu': 0x039c, # greek capital letter mu, U+039C
'Nu': 'Ν', # greek capital letter nu, U+039D 'Ntilde': 0x00d1, # latin capital letter N with tilde, U+00D1 ISOlat1
'OElig': 'Œ', # latin capital ligature OE, U+0152 ISOlat2 'Nu': 0x039d, # greek capital letter nu, U+039D
'Oacute': '\323', # latin capital letter O with acute, U+00D3 ISOlat1 'OElig': 0x0152, # latin capital ligature OE, U+0152 ISOlat2
'Ocirc': '\324', # latin capital letter O with circumflex, U+00D4 ISOlat1 'Oacute': 0x00d3, # latin capital letter O with acute, U+00D3 ISOlat1
'Ograve': '\322', # latin capital letter O with grave, U+00D2 ISOlat1 'Ocirc': 0x00d4, # latin capital letter O with circumflex, U+00D4 ISOlat1
'Omega': 'Ω', # greek capital letter omega, U+03A9 ISOgrk3 'Ograve': 0x00d2, # latin capital letter O with grave, U+00D2 ISOlat1
'Omicron': 'Ο', # greek capital letter omicron, U+039F 'Omega': 0x03a9, # greek capital letter omega, U+03A9 ISOgrk3
'Oslash': '\330', # latin capital letter O with stroke = latin capital letter O slash, U+00D8 ISOlat1 'Omicron': 0x039f, # greek capital letter omicron, U+039F
'Otilde': '\325', # latin capital letter O with tilde, U+00D5 ISOlat1 'Oslash': 0x00d8, # latin capital letter O with stroke = latin capital letter O slash, U+00D8 ISOlat1
'Ouml': '\326', # latin capital letter O with diaeresis, U+00D6 ISOlat1 'Otilde': 0x00d5, # latin capital letter O with tilde, U+00D5 ISOlat1
'Phi': 'Φ', # greek capital letter phi, U+03A6 ISOgrk3 'Ouml': 0x00d6, # latin capital letter O with diaeresis, U+00D6 ISOlat1
'Pi': 'Π', # greek capital letter pi, U+03A0 ISOgrk3 'Phi': 0x03a6, # greek capital letter phi, U+03A6 ISOgrk3
'Prime': '″', # double prime = seconds = inches, U+2033 ISOtech 'Pi': 0x03a0, # greek capital letter pi, U+03A0 ISOgrk3
'Psi': 'Ψ', # greek capital letter psi, U+03A8 ISOgrk3 'Prime': 0x2033, # double prime = seconds = inches, U+2033 ISOtech
'Rho': 'Ρ', # greek capital letter rho, U+03A1 'Psi': 0x03a8, # greek capital letter psi, U+03A8 ISOgrk3
'Scaron': 'Š', # latin capital letter S with caron, U+0160 ISOlat2 'Rho': 0x03a1, # greek capital letter rho, U+03A1
'Sigma': 'Σ', # greek capital letter sigma, U+03A3 ISOgrk3 'Scaron': 0x0160, # latin capital letter S with caron, U+0160 ISOlat2
'THORN': '\336', # latin capital letter THORN, U+00DE ISOlat1 'Sigma': 0x03a3, # greek capital letter sigma, U+03A3 ISOgrk3
'Tau': 'Τ', # greek capital letter tau, U+03A4 'THORN': 0x00de, # latin capital letter THORN, U+00DE ISOlat1
'Theta': 'Θ', # greek capital letter theta, U+0398 ISOgrk3 'Tau': 0x03a4, # greek capital letter tau, U+03A4
'Uacute': '\332', # latin capital letter U with acute, U+00DA ISOlat1 'Theta': 0x0398, # greek capital letter theta, U+0398 ISOgrk3
'Ucirc': '\333', # latin capital letter U with circumflex, U+00DB ISOlat1 'Uacute': 0x00da, # latin capital letter U with acute, U+00DA ISOlat1
'Ugrave': '\331', # latin capital letter U with grave, U+00D9 ISOlat1 'Ucirc': 0x00db, # latin capital letter U with circumflex, U+00DB ISOlat1
'Upsilon': 'Υ', # greek capital letter upsilon, U+03A5 ISOgrk3 'Ugrave': 0x00d9, # latin capital letter U with grave, U+00D9 ISOlat1
'Uuml': '\334', # latin capital letter U with diaeresis, U+00DC ISOlat1 'Upsilon': 0x03a5, # greek capital letter upsilon, U+03A5 ISOgrk3
'Xi': 'Ξ', # greek capital letter xi, U+039E ISOgrk3 'Uuml': 0x00dc, # latin capital letter U with diaeresis, U+00DC ISOlat1
'Yacute': '\335', # latin capital letter Y with acute, U+00DD ISOlat1 'Xi': 0x039e, # greek capital letter xi, U+039E ISOgrk3
'Yuml': 'Ÿ', # latin capital letter Y with diaeresis, U+0178 ISOlat2 'Yacute': 0x00dd, # latin capital letter Y with acute, U+00DD ISOlat1
'Zeta': 'Ζ', # greek capital letter zeta, U+0396 'Yuml': 0x0178, # latin capital letter Y with diaeresis, U+0178 ISOlat2
'aacute': '\341', # latin small letter a with acute, U+00E1 ISOlat1 'Zeta': 0x0396, # greek capital letter zeta, U+0396
'acirc': '\342', # latin small letter a with circumflex, U+00E2 ISOlat1 'aacute': 0x00e1, # latin small letter a with acute, U+00E1 ISOlat1
'acute': '\264', # acute accent = spacing acute, U+00B4 ISOdia 'acirc': 0x00e2, # latin small letter a with circumflex, U+00E2 ISOlat1
'aelig': '\346', # latin small letter ae = latin small ligature ae, U+00E6 ISOlat1 'acute': 0x00b4, # acute accent = spacing acute, U+00B4 ISOdia
'agrave': '\340', # latin small letter a with grave = latin small letter a grave, U+00E0 ISOlat1 'aelig': 0x00e6, # latin small letter ae = latin small ligature ae, U+00E6 ISOlat1
'alefsym': 'ℵ', # alef symbol = first transfinite cardinal, U+2135 NEW 'agrave': 0x00e0, # latin small letter a with grave = latin small letter a grave, U+00E0 ISOlat1
'alpha': 'α', # greek small letter alpha, U+03B1 ISOgrk3 'alefsym': 0x2135, # alef symbol = first transfinite cardinal, U+2135 NEW
'amp': '\46', # ampersand, U+0026 ISOnum 'alpha': 0x03b1, # greek small letter alpha, U+03B1 ISOgrk3
'and': '∧', # logical and = wedge, U+2227 ISOtech 'amp': 0x0026, # ampersand, U+0026 ISOnum
'ang': '∠', # angle, U+2220 ISOamso 'and': 0x2227, # logical and = wedge, U+2227 ISOtech
'aring': '\345', # latin small letter a with ring above = latin small letter a ring, U+00E5 ISOlat1 'ang': 0x2220, # angle, U+2220 ISOamso
'asymp': '≈', # almost equal to = asymptotic to, U+2248 ISOamsr 'aring': 0x00e5, # latin small letter a with ring above = latin small letter a ring, U+00E5 ISOlat1
'atilde': '\343', # latin small letter a with tilde, U+00E3 ISOlat1 'asymp': 0x2248, # almost equal to = asymptotic to, U+2248 ISOamsr
'auml': '\344', # latin small letter a with diaeresis, U+00E4 ISOlat1 'atilde': 0x00e3, # latin small letter a with tilde, U+00E3 ISOlat1
'bdquo': '„', # double low-9 quotation mark, U+201E NEW 'auml': 0x00e4, # latin small letter a with diaeresis, U+00E4 ISOlat1
'beta': 'β', # greek small letter beta, U+03B2 ISOgrk3 'bdquo': 0x201e, # double low-9 quotation mark, U+201E NEW
'brvbar': '\246', # broken bar = broken vertical bar, U+00A6 ISOnum 'beta': 0x03b2, # greek small letter beta, U+03B2 ISOgrk3
'bull': '•', # bullet = black small circle, U+2022 ISOpub 'brvbar': 0x00a6, # broken bar = broken vertical bar, U+00A6 ISOnum
'cap': '∩', # intersection = cap, U+2229 ISOtech 'bull': 0x2022, # bullet = black small circle, U+2022 ISOpub
'ccedil': '\347', # latin small letter c with cedilla, U+00E7 ISOlat1 'cap': 0x2229, # intersection = cap, U+2229 ISOtech
'cedil': '\270', # cedilla = spacing cedilla, U+00B8 ISOdia 'ccedil': 0x00e7, # latin small letter c with cedilla, U+00E7 ISOlat1
'cent': '\242', # cent sign, U+00A2 ISOnum 'cedil': 0x00b8, # cedilla = spacing cedilla, U+00B8 ISOdia
'chi': 'χ', # greek small letter chi, U+03C7 ISOgrk3 'cent': 0x00a2, # cent sign, U+00A2 ISOnum
'circ': 'ˆ', # modifier letter circumflex accent, U+02C6 ISOpub 'chi': 0x03c7, # greek small letter chi, U+03C7 ISOgrk3
'clubs': '♣', # black club suit = shamrock, U+2663 ISOpub 'circ': 0x02c6, # modifier letter circumflex accent, U+02C6 ISOpub
'cong': '≅', # approximately equal to, U+2245 ISOtech 'clubs': 0x2663, # black club suit = shamrock, U+2663 ISOpub
'copy': '\251', # copyright sign, U+00A9 ISOnum 'cong': 0x2245, # approximately equal to, U+2245 ISOtech
'crarr': '↵', # downwards arrow with corner leftwards = carriage return, U+21B5 NEW 'copy': 0x00a9, # copyright sign, U+00A9 ISOnum
'cup': '∪', # union = cup, U+222A ISOtech 'crarr': 0x21b5, # downwards arrow with corner leftwards = carriage return, U+21B5 NEW
'curren': '\244', # currency sign, U+00A4 ISOnum 'cup': 0x222a, # union = cup, U+222A ISOtech
'dArr': '⇓', # downwards double arrow, U+21D3 ISOamsa 'curren': 0x00a4, # currency sign, U+00A4 ISOnum
'dagger': '†', # dagger, U+2020 ISOpub 'dArr': 0x21d3, # downwards double arrow, U+21D3 ISOamsa
'darr': '↓', # downwards arrow, U+2193 ISOnum 'dagger': 0x2020, # dagger, U+2020 ISOpub
'deg': '\260', # degree sign, U+00B0 ISOnum 'darr': 0x2193, # downwards arrow, U+2193 ISOnum
'delta': 'δ', # greek small letter delta, U+03B4 ISOgrk3 'deg': 0x00b0, # degree sign, U+00B0 ISOnum
'diams': '♦', # black diamond suit, U+2666 ISOpub 'delta': 0x03b4, # greek small letter delta, U+03B4 ISOgrk3
'divide': '\367', # division sign, U+00F7 ISOnum 'diams': 0x2666, # black diamond suit, U+2666 ISOpub
'eacute': '\351', # latin small letter e with acute, U+00E9 ISOlat1 'divide': 0x00f7, # division sign, U+00F7 ISOnum
'ecirc': '\352', # latin small letter e with circumflex, U+00EA ISOlat1 'eacute': 0x00e9, # latin small letter e with acute, U+00E9 ISOlat1
'egrave': '\350', # latin small letter e with grave, U+00E8 ISOlat1 'ecirc': 0x00ea, # latin small letter e with circumflex, U+00EA ISOlat1
'empty': '∅', # empty set = null set = diameter, U+2205 ISOamso 'egrave': 0x00e8, # latin small letter e with grave, U+00E8 ISOlat1
'emsp': ' ', # em space, U+2003 ISOpub 'empty': 0x2205, # empty set = null set = diameter, U+2205 ISOamso
'ensp': ' ', # en space, U+2002 ISOpub 'emsp': 0x2003, # em space, U+2003 ISOpub
'epsilon': 'ε', # greek small letter epsilon, U+03B5 ISOgrk3 'ensp': 0x2002, # en space, U+2002 ISOpub
'equiv': '≡', # identical to, U+2261 ISOtech 'epsilon': 0x03b5, # greek small letter epsilon, U+03B5 ISOgrk3
'eta': 'η', # greek small letter eta, U+03B7 ISOgrk3 'equiv': 0x2261, # identical to, U+2261 ISOtech
'eth': '\360', # latin small letter eth, U+00F0 ISOlat1 'eta': 0x03b7, # greek small letter eta, U+03B7 ISOgrk3
'euml': '\353', # latin small letter e with diaeresis, U+00EB ISOlat1 'eth': 0x00f0, # latin small letter eth, U+00F0 ISOlat1
'euro': '€', # euro sign, U+20AC NEW 'euml': 0x00eb, # latin small letter e with diaeresis, U+00EB ISOlat1
'exist': '∃', # there exists, U+2203 ISOtech 'euro': 0x20ac, # euro sign, U+20AC NEW
'fnof': 'ƒ', # latin small f with hook = function = florin, U+0192 ISOtech 'exist': 0x2203, # there exists, U+2203 ISOtech
'forall': '∀', # for all, U+2200 ISOtech 'fnof': 0x0192, # latin small f with hook = function = florin, U+0192 ISOtech
'frac12': '\275', # vulgar fraction one half = fraction one half, U+00BD ISOnum 'forall': 0x2200, # for all, U+2200 ISOtech
'frac14': '\274', # vulgar fraction one quarter = fraction one quarter, U+00BC ISOnum 'frac12': 0x00bd, # vulgar fraction one half = fraction one half, U+00BD ISOnum
'frac34': '\276', # vulgar fraction three quarters = fraction three quarters, U+00BE ISOnum 'frac14': 0x00bc, # vulgar fraction one quarter = fraction one quarter, U+00BC ISOnum
'frasl': '⁄', # fraction slash, U+2044 NEW 'frac34': 0x00be, # vulgar fraction three quarters = fraction three quarters, U+00BE ISOnum
'gamma': 'γ', # greek small letter gamma, U+03B3 ISOgrk3 'frasl': 0x2044, # fraction slash, U+2044 NEW
'ge': '≥', # greater-than or equal to, U+2265 ISOtech 'gamma': 0x03b3, # greek small letter gamma, U+03B3 ISOgrk3
'gt': '\76', # greater-than sign, U+003E ISOnum 'ge': 0x2265, # greater-than or equal to, U+2265 ISOtech
'hArr': '⇔', # left right double arrow, U+21D4 ISOamsa 'gt': 0x003e, # greater-than sign, U+003E ISOnum
'harr': '↔', # left right arrow, U+2194 ISOamsa 'hArr': 0x21d4, # left right double arrow, U+21D4 ISOamsa
'hearts': '♥', # black heart suit = valentine, U+2665 ISOpub 'harr': 0x2194, # left right arrow, U+2194 ISOamsa
'hellip': '…', # horizontal ellipsis = three dot leader, U+2026 ISOpub 'hearts': 0x2665, # black heart suit = valentine, U+2665 ISOpub
'iacute': '\355', # latin small letter i with acute, U+00ED ISOlat1 'hellip': 0x2026, # horizontal ellipsis = three dot leader, U+2026 ISOpub
'icirc': '\356', # latin small letter i with circumflex, U+00EE ISOlat1 'iacute': 0x00ed, # latin small letter i with acute, U+00ED ISOlat1
'iexcl': '\241', # inverted exclamation mark, U+00A1 ISOnum 'icirc': 0x00ee, # latin small letter i with circumflex, U+00EE ISOlat1
'igrave': '\354', # latin small letter i with grave, U+00EC ISOlat1 'iexcl': 0x00a1, # inverted exclamation mark, U+00A1 ISOnum
'image': 'ℑ', # blackletter capital I = imaginary part, U+2111 ISOamso 'igrave': 0x00ec, # latin small letter i with grave, U+00EC ISOlat1
'infin': '∞', # infinity, U+221E ISOtech 'image': 0x2111, # blackletter capital I = imaginary part, U+2111 ISOamso
'int': '∫', # integral, U+222B ISOtech 'infin': 0x221e, # infinity, U+221E ISOtech
'iota': 'ι', # greek small letter iota, U+03B9 ISOgrk3 'int': 0x222b, # integral, U+222B ISOtech
'iquest': '\277', # inverted question mark = turned question mark, U+00BF ISOnum 'iota': 0x03b9, # greek small letter iota, U+03B9 ISOgrk3
'isin': '∈', # element of, U+2208 ISOtech 'iquest': 0x00bf, # inverted question mark = turned question mark, U+00BF ISOnum
'iuml': '\357', # latin small letter i with diaeresis, U+00EF ISOlat1 'isin': 0x2208, # element of, U+2208 ISOtech
'kappa': 'κ', # greek small letter kappa, U+03BA ISOgrk3 'iuml': 0x00ef, # latin small letter i with diaeresis, U+00EF ISOlat1
'lArr': '⇐', # leftwards double arrow, U+21D0 ISOtech 'kappa': 0x03ba, # greek small letter kappa, U+03BA ISOgrk3
'lambda': 'λ', # greek small letter lambda, U+03BB ISOgrk3 'lArr': 0x21d0, # leftwards double arrow, U+21D0 ISOtech
'lang': '〈', # left-pointing angle bracket = bra, U+2329 ISOtech 'lambda': 0x03bb, # greek small letter lambda, U+03BB ISOgrk3
'laquo': '\253', # left-pointing double angle quotation mark = left pointing guillemet, U+00AB ISOnum 'lang': 0x2329, # left-pointing angle bracket = bra, U+2329 ISOtech
'larr': '←', # leftwards arrow, U+2190 ISOnum 'laquo': 0x00ab, # left-pointing double angle quotation mark = left pointing guillemet, U+00AB ISOnum
'lceil': '⌈', # left ceiling = apl upstile, U+2308 ISOamsc 'larr': 0x2190, # leftwards arrow, U+2190 ISOnum
'ldquo': '“', # left double quotation mark, U+201C ISOnum 'lceil': 0x2308, # left ceiling = apl upstile, U+2308 ISOamsc
'le': '≤', # less-than or equal to, U+2264 ISOtech 'ldquo': 0x201c, # left double quotation mark, U+201C ISOnum
'lfloor': '⌊', # left floor = apl downstile, U+230A ISOamsc 'le': 0x2264, # less-than or equal to, U+2264 ISOtech
'lowast': '∗', # asterisk operator, U+2217 ISOtech 'lfloor': 0x230a, # left floor = apl downstile, U+230A ISOamsc
'loz': '◊', # lozenge, U+25CA ISOpub 'lowast': 0x2217, # asterisk operator, U+2217 ISOtech
'lrm': '‎', # left-to-right mark, U+200E NEW RFC 2070 'loz': 0x25ca, # lozenge, U+25CA ISOpub
'lsaquo': '‹', # single left-pointing angle quotation mark, U+2039 ISO proposed 'lrm': 0x200e, # left-to-right mark, U+200E NEW RFC 2070
'lsquo': '‘', # left single quotation mark, U+2018 ISOnum 'lsaquo': 0x2039, # single left-pointing angle quotation mark, U+2039 ISO proposed
'lt': '\74', # less-than sign, U+003C ISOnum 'lsquo': 0x2018, # left single quotation mark, U+2018 ISOnum
'macr': '\257', # macron = spacing macron = overline = APL overbar, U+00AF ISOdia 'lt': 0x003c, # less-than sign, U+003C ISOnum
'mdash': '—', # em dash, U+2014 ISOpub 'macr': 0x00af, # macron = spacing macron = overline = APL overbar, U+00AF ISOdia
'micro': '\265', # micro sign, U+00B5 ISOnum 'mdash': 0x2014, # em dash, U+2014 ISOpub
'middot': '\267', # middle dot = Georgian comma = Greek middle dot, U+00B7 ISOnum 'micro': 0x00b5, # micro sign, U+00B5 ISOnum
'minus': '−', # minus sign, U+2212 ISOtech 'middot': 0x00b7, # middle dot = Georgian comma = Greek middle dot, U+00B7 ISOnum
'mu': 'μ', # greek small letter mu, U+03BC ISOgrk3 'minus': 0x2212, # minus sign, U+2212 ISOtech
'nabla': '∇', # nabla = backward difference, U+2207 ISOtech 'mu': 0x03bc, # greek small letter mu, U+03BC ISOgrk3
'nbsp': '\240', # no-break space = non-breaking space, U+00A0 ISOnum 'nabla': 0x2207, # nabla = backward difference, U+2207 ISOtech
'ndash': '–', # en dash, U+2013 ISOpub 'nbsp': 0x00a0, # no-break space = non-breaking space, U+00A0 ISOnum
'ne': '≠', # not equal to, U+2260 ISOtech 'ndash': 0x2013, # en dash, U+2013 ISOpub
'ni': '∋', # contains as member, U+220B ISOtech 'ne': 0x2260, # not equal to, U+2260 ISOtech
'not': '\254', # not sign, U+00AC ISOnum 'ni': 0x220b, # contains as member, U+220B ISOtech
'notin': '∉', # not an element of, U+2209 ISOtech 'not': 0x00ac, # not sign, U+00AC ISOnum
'nsub': '⊄', # not a subset of, U+2284 ISOamsn 'notin': 0x2209, # not an element of, U+2209 ISOtech
'ntilde': '\361', # latin small letter n with tilde, U+00F1 ISOlat1 'nsub': 0x2284, # not a subset of, U+2284 ISOamsn
'nu': 'ν', # greek small letter nu, U+03BD ISOgrk3 'ntilde': 0x00f1, # latin small letter n with tilde, U+00F1 ISOlat1
'oacute': '\363', # latin small letter o with acute, U+00F3 ISOlat1 'nu': 0x03bd, # greek small letter nu, U+03BD ISOgrk3
'ocirc': '\364', # latin small letter o with circumflex, U+00F4 ISOlat1 'oacute': 0x00f3, # latin small letter o with acute, U+00F3 ISOlat1
'oelig': 'œ', # latin small ligature oe, U+0153 ISOlat2 'ocirc': 0x00f4, # latin small letter o with circumflex, U+00F4 ISOlat1
'ograve': '\362', # latin small letter o with grave, U+00F2 ISOlat1 'oelig': 0x0153, # latin small ligature oe, U+0153 ISOlat2
'oline': '‾', # overline = spacing overscore, U+203E NEW 'ograve': 0x00f2, # latin small letter o with grave, U+00F2 ISOlat1
'omega': 'ω', # greek small letter omega, U+03C9 ISOgrk3 'oline': 0x203e, # overline = spacing overscore, U+203E NEW
'omicron': 'ο', # greek small letter omicron, U+03BF NEW 'omega': 0x03c9, # greek small letter omega, U+03C9 ISOgrk3
'oplus': '⊕', # circled plus = direct sum, U+2295 ISOamsb 'omicron': 0x03bf, # greek small letter omicron, U+03BF NEW
'or': '∨', # logical or = vee, U+2228 ISOtech 'oplus': 0x2295, # circled plus = direct sum, U+2295 ISOamsb
'ordf': '\252', # feminine ordinal indicator, U+00AA ISOnum 'or': 0x2228, # logical or = vee, U+2228 ISOtech
'ordm': '\272', # masculine ordinal indicator, U+00BA ISOnum 'ordf': 0x00aa, # feminine ordinal indicator, U+00AA ISOnum
'oslash': '\370', # latin small letter o with stroke, = latin small letter o slash, U+00F8 ISOlat1 'ordm': 0x00ba, # masculine ordinal indicator, U+00BA ISOnum
'otilde': '\365', # latin small letter o with tilde, U+00F5 ISOlat1 'oslash': 0x00f8, # latin small letter o with stroke, = latin small letter o slash, U+00F8 ISOlat1
'otimes': '⊗', # circled times = vector product, U+2297 ISOamsb 'otilde': 0x00f5, # latin small letter o with tilde, U+00F5 ISOlat1
'ouml': '\366', # latin small letter o with diaeresis, U+00F6 ISOlat1 'otimes': 0x2297, # circled times = vector product, U+2297 ISOamsb
'para': '\266', # pilcrow sign = paragraph sign, U+00B6 ISOnum 'ouml': 0x00f6, # latin small letter o with diaeresis, U+00F6 ISOlat1
'part': '∂', # partial differential, U+2202 ISOtech 'para': 0x00b6, # pilcrow sign = paragraph sign, U+00B6 ISOnum
'permil': '‰', # per mille sign, U+2030 ISOtech 'part': 0x2202, # partial differential, U+2202 ISOtech
'perp': '⊥', # up tack = orthogonal to = perpendicular, U+22A5 ISOtech 'permil': 0x2030, # per mille sign, U+2030 ISOtech
'phi': 'φ', # greek small letter phi, U+03C6 ISOgrk3 'perp': 0x22a5, # up tack = orthogonal to = perpendicular, U+22A5 ISOtech
'pi': 'π', # greek small letter pi, U+03C0 ISOgrk3 'phi': 0x03c6, # greek small letter phi, U+03C6 ISOgrk3
'piv': 'ϖ', # greek pi symbol, U+03D6 ISOgrk3 'pi': 0x03c0, # greek small letter pi, U+03C0 ISOgrk3
'plusmn': '\261', # plus-minus sign = plus-or-minus sign, U+00B1 ISOnum 'piv': 0x03d6, # greek pi symbol, U+03D6 ISOgrk3
'pound': '\243', # pound sign, U+00A3 ISOnum 'plusmn': 0x00b1, # plus-minus sign = plus-or-minus sign, U+00B1 ISOnum
'prime': '′', # prime = minutes = feet, U+2032 ISOtech 'pound': 0x00a3, # pound sign, U+00A3 ISOnum
'prod': '∏', # n-ary product = product sign, U+220F ISOamsb 'prime': 0x2032, # prime = minutes = feet, U+2032 ISOtech
'prop': '∝', # proportional to, U+221D ISOtech 'prod': 0x220f, # n-ary product = product sign, U+220F ISOamsb
'psi': 'ψ', # greek small letter psi, U+03C8 ISOgrk3 'prop': 0x221d, # proportional to, U+221D ISOtech
'quot': '\42', # quotation mark = APL quote, U+0022 ISOnum 'psi': 0x03c8, # greek small letter psi, U+03C8 ISOgrk3
'rArr': '⇒', # rightwards double arrow, U+21D2 ISOtech 'quot': 0x0022, # quotation mark = APL quote, U+0022 ISOnum
'radic': '√', # square root = radical sign, U+221A ISOtech 'rArr': 0x21d2, # rightwards double arrow, U+21D2 ISOtech
'rang': '〉', # right-pointing angle bracket = ket, U+232A ISOtech 'radic': 0x221a, # square root = radical sign, U+221A ISOtech
'raquo': '\273', # right-pointing double angle quotation mark = right pointing guillemet, U+00BB ISOnum 'rang': 0x232a, # right-pointing angle bracket = ket, U+232A ISOtech
'rarr': '→', # rightwards arrow, U+2192 ISOnum 'raquo': 0x00bb, # right-pointing double angle quotation mark = right pointing guillemet, U+00BB ISOnum
'rceil': '⌉', # right ceiling, U+2309 ISOamsc 'rarr': 0x2192, # rightwards arrow, U+2192 ISOnum
'rdquo': '”', # right double quotation mark, U+201D ISOnum 'rceil': 0x2309, # right ceiling, U+2309 ISOamsc
'real': 'ℜ', # blackletter capital R = real part symbol, U+211C ISOamso 'rdquo': 0x201d, # right double quotation mark, U+201D ISOnum
'reg': '\256', # registered sign = registered trade mark sign, U+00AE ISOnum 'real': 0x211c, # blackletter capital R = real part symbol, U+211C ISOamso
'rfloor': '⌋', # right floor, U+230B ISOamsc 'reg': 0x00ae, # registered sign = registered trade mark sign, U+00AE ISOnum
'rho': 'ρ', # greek small letter rho, U+03C1 ISOgrk3 'rfloor': 0x230b, # right floor, U+230B ISOamsc
'rlm': '‏', # right-to-left mark, U+200F NEW RFC 2070 'rho': 0x03c1, # greek small letter rho, U+03C1 ISOgrk3
'rsaquo': '›', # single right-pointing angle quotation mark, U+203A ISO proposed 'rlm': 0x200f, # right-to-left mark, U+200F NEW RFC 2070
'rsquo': '’', # right single quotation mark, U+2019 ISOnum 'rsaquo': 0x203a, # single right-pointing angle quotation mark, U+203A ISO proposed
'sbquo': '‚', # single low-9 quotation mark, U+201A NEW 'rsquo': 0x2019, # right single quotation mark, U+2019 ISOnum
'scaron': 'š', # latin small letter s with caron, U+0161 ISOlat2 'sbquo': 0x201a, # single low-9 quotation mark, U+201A NEW
'sdot': '⋅', # dot operator, U+22C5 ISOamsb 'scaron': 0x0161, # latin small letter s with caron, U+0161 ISOlat2
'sect': '\247', # section sign, U+00A7 ISOnum 'sdot': 0x22c5, # dot operator, U+22C5 ISOamsb
'shy': '\255', # soft hyphen = discretionary hyphen, U+00AD ISOnum 'sect': 0x00a7, # section sign, U+00A7 ISOnum
'sigma': 'σ', # greek small letter sigma, U+03C3 ISOgrk3 'shy': 0x00ad, # soft hyphen = discretionary hyphen, U+00AD ISOnum
'sigmaf': 'ς', # greek small letter final sigma, U+03C2 ISOgrk3 'sigma': 0x03c3, # greek small letter sigma, U+03C3 ISOgrk3
'sim': '∼', # tilde operator = varies with = similar to, U+223C ISOtech 'sigmaf': 0x03c2, # greek small letter final sigma, U+03C2 ISOgrk3
'spades': '♠', # black spade suit, U+2660 ISOpub 'sim': 0x223c, # tilde operator = varies with = similar to, U+223C ISOtech
'sub': '⊂', # subset of, U+2282 ISOtech 'spades': 0x2660, # black spade suit, U+2660 ISOpub
'sube': '⊆', # subset of or equal to, U+2286 ISOtech 'sub': 0x2282, # subset of, U+2282 ISOtech
'sum': '∑', # n-ary sumation, U+2211 ISOamsb 'sube': 0x2286, # subset of or equal to, U+2286 ISOtech
'sup': '⊃', # superset of, U+2283 ISOtech 'sum': 0x2211, # n-ary sumation, U+2211 ISOamsb
'sup1': '\271', # superscript one = superscript digit one, U+00B9 ISOnum 'sup': 0x2283, # superset of, U+2283 ISOtech
'sup2': '\262', # superscript two = superscript digit two = squared, U+00B2 ISOnum 'sup1': 0x00b9, # superscript one = superscript digit one, U+00B9 ISOnum
'sup3': '\263', # superscript three = superscript digit three = cubed, U+00B3 ISOnum 'sup2': 0x00b2, # superscript two = superscript digit two = squared, U+00B2 ISOnum
'supe': '⊇', # superset of or equal to, U+2287 ISOtech 'sup3': 0x00b3, # superscript three = superscript digit three = cubed, U+00B3 ISOnum
'szlig': '\337', # latin small letter sharp s = ess-zed, U+00DF ISOlat1 'supe': 0x2287, # superset of or equal to, U+2287 ISOtech
'tau': 'τ', # greek small letter tau, U+03C4 ISOgrk3 'szlig': 0x00df, # latin small letter sharp s = ess-zed, U+00DF ISOlat1
'there4': '∴', # therefore, U+2234 ISOtech 'tau': 0x03c4, # greek small letter tau, U+03C4 ISOgrk3
'theta': 'θ', # greek small letter theta, U+03B8 ISOgrk3 'there4': 0x2234, # therefore, U+2234 ISOtech
'thetasym': 'ϑ', # greek small letter theta symbol, U+03D1 NEW 'theta': 0x03b8, # greek small letter theta, U+03B8 ISOgrk3
'thinsp': ' ', # thin space, U+2009 ISOpub 'thetasym': 0x03d1, # greek small letter theta symbol, U+03D1 NEW
'thorn': '\376', # latin small letter thorn with, U+00FE ISOlat1 'thinsp': 0x2009, # thin space, U+2009 ISOpub
'tilde': '˜', # small tilde, U+02DC ISOdia 'thorn': 0x00fe, # latin small letter thorn with, U+00FE ISOlat1
'times': '\327', # multiplication sign, U+00D7 ISOnum 'tilde': 0x02dc, # small tilde, U+02DC ISOdia
'trade': '™', # trade mark sign, U+2122 ISOnum 'times': 0x00d7, # multiplication sign, U+00D7 ISOnum
'uArr': '⇑', # upwards double arrow, U+21D1 ISOamsa 'trade': 0x2122, # trade mark sign, U+2122 ISOnum
'uacute': '\372', # latin small letter u with acute, U+00FA ISOlat1 'uArr': 0x21d1, # upwards double arrow, U+21D1 ISOamsa
'uarr': '↑', # upwards arrow, U+2191 ISOnum 'uacute': 0x00fa, # latin small letter u with acute, U+00FA ISOlat1
'ucirc': '\373', # latin small letter u with circumflex, U+00FB ISOlat1 'uarr': 0x2191, # upwards arrow, U+2191 ISOnum
'ugrave': '\371', # latin small letter u with grave, U+00F9 ISOlat1 'ucirc': 0x00fb, # latin small letter u with circumflex, U+00FB ISOlat1
'uml': '\250', # diaeresis = spacing diaeresis, U+00A8 ISOdia 'ugrave': 0x00f9, # latin small letter u with grave, U+00F9 ISOlat1
'upsih': 'ϒ', # greek upsilon with hook symbol, U+03D2 NEW 'uml': 0x00a8, # diaeresis = spacing diaeresis, U+00A8 ISOdia
'upsilon': 'υ', # greek small letter upsilon, U+03C5 ISOgrk3 'upsih': 0x03d2, # greek upsilon with hook symbol, U+03D2 NEW
'uuml': '\374', # latin small letter u with diaeresis, U+00FC ISOlat1 'upsilon': 0x03c5, # greek small letter upsilon, U+03C5 ISOgrk3
'weierp': '℘', # script capital P = power set = Weierstrass p, U+2118 ISOamso 'uuml': 0x00fc, # latin small letter u with diaeresis, U+00FC ISOlat1
'xi': 'ξ', # greek small letter xi, U+03BE ISOgrk3 'weierp': 0x2118, # script capital P = power set = Weierstrass p, U+2118 ISOamso
'yacute': '\375', # latin small letter y with acute, U+00FD ISOlat1 'xi': 0x03be, # greek small letter xi, U+03BE ISOgrk3
'yen': '\245', # yen sign = yuan sign, U+00A5 ISOnum 'yacute': 0x00fd, # latin small letter y with acute, U+00FD ISOlat1
'yuml': '\377', # latin small letter y with diaeresis, U+00FF ISOlat1 'yen': 0x00a5, # yen sign = yuan sign, U+00A5 ISOnum
'zeta': 'ζ', # greek small letter zeta, U+03B6 ISOgrk3 'yuml': 0x00ff, # latin small letter y with diaeresis, U+00FF ISOlat1
'zwj': '‍', # zero width joiner, U+200D NEW RFC 2070 'zeta': 0x03b6, # greek small letter zeta, U+03B6 ISOgrk3
'zwnj': '‌', # zero width non-joiner, U+200C NEW RFC 2070 'zwj': 0x200d, # zero width joiner, U+200D NEW RFC 2070
'zwnj': 0x200c, # zero width non-joiner, U+200C NEW RFC 2070
} }
# maps the Unicode codepoint to the HTML entity name
codepoint2name = {}
# maps the HTML entity name to the character
# (or a character reference if the character is outside the Latin-1 range)
entitydefs = {}
for (name, codepoint) in name2codepoint.iteritems():
    codepoint2name[codepoint] = name
    if codepoint <= 0xff:
        entitydefs[name] = chr(codepoint)
    else:
        entitydefs[name] = '&#%d;' % codepoint
del name, codepoint
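The loop above derives both `codepoint2name` and the backward-compatible `entitydefs` from the single `name2codepoint` table. Re-running the same logic in Python 3 on a small subset makes the Latin-1 cut-off visible: codepoints up to 0xff become the character itself, and anything larger becomes a decimal character reference. `derive_tables` and `SAMPLE_NAME2CODEPOINT` are illustrative names only. Note that in Python 2 `chr()` returned a one-byte string, whereas Python 3's `chr()` returns the Unicode character with the same codepoint.

```python
# Hypothetical subset of name2codepoint, for illustration only.
SAMPLE_NAME2CODEPOINT = {
    'amp': 0x0026,
    'eacute': 0x00e9,
    'Alpha': 0x0391,
    'euro': 0x20ac,
}

def derive_tables(name2codepoint):
    """Build (codepoint2name, entitydefs) following the loop in the patch."""
    codepoint2name = {}
    entitydefs = {}
    for name, codepoint in name2codepoint.items():
        codepoint2name[codepoint] = name
        if codepoint <= 0xff:
            # Latin-1 range: substitute the character itself.
            entitydefs[name] = chr(codepoint)
        else:
            # Outside Latin-1: fall back to a decimal character reference.
            entitydefs[name] = '&#%d;' % codepoint
    return codepoint2name, entitydefs

codepoint2name, entitydefs = derive_tables(SAMPLE_NAME2CODEPOINT)
print(entitydefs['amp'])      # &
print(entitydefs['Alpha'])    # &#913;
print(entitydefs['euro'])     # &#8364;
print(codepoint2name[0xe9])   # eacute
```

Because `codepoint2name` is filled by plain assignment, it is an exact inverse only if no two entity names share a codepoint; neither the patch nor this sketch checks for that.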
@@ -103,6 +103,10 @@ Extension modules
 Library
 -------
+- htmlentitydefs has two new dictionaries: name2codepoint maps
+  HTML entity names to Unicode codepoints (as integers).
+  codepoint2name is the reverse mapping. See SF patch #722017.
 - pdb has a new command, "debug", which lets you step through
   arbitrary code from the debugger's (pdb) prompt.