HTML Entity Encoder / Decoder

Convert text to HTML entities and back. Encode the HTML-special characters (& < > " ') to insert text safely into HTML, or encode every non-ASCII character as a named or numeric entity. Decode resolves any named or numeric entity — the full HTML5 table — back to plain text. Runs entirely in your browser. Last reviewed 2026-06-19.

How HTML entities work

HTML gives a few characters structural meaning: < and > open and close tags, & begins an entity, and " / ' delimit attribute values. To display those characters literally — or to drop untrusted text into a page without it being interpreted as markup — you replace them with their entities. Any other character can also be written as an entity by name (&copy; → ©) or by number (&#169; decimal, or &#xA9; hex).

Which mode should I use?

  • HTML-special only — escapes just & < > " '. This is what you want for safely showing or inserting text into HTML; everything else (accents, emoji, spaces) is left readable.
  • All non-ASCII — named — also encodes every character above plain ASCII, using a readable named entity where one exists (&eacute;, &mdash;) and falling back to a decimal numeric entity otherwise. Good for maximum email/legacy compatibility while staying human-readable.
  • All non-ASCII — decimal / hex — encodes every non-ASCII character purely as a numeric entity (&#169; or &#xA9;). Guaranteed to render anywhere, and the safest choice for characters that have no name.

Decode works on anything: it resolves every named entity (the complete HTML5 list) and both decimal and hexadecimal numeric entities, including astral characters and emoji, back to plain text — without ever inserting your input as live markup.
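
The modes above can be sketched in Python. This is an illustration under stated assumptions, not this tool's own code (which runs as JavaScript in the browser): the function name encode_entities and its mode strings are hypothetical, and the named mode relies on Python's html.entities.codepoint2name, which covers only the HTML 4 names rather than the full HTML5 table.

```python
from html.entities import codepoint2name

# The five markup-significant characters; &#39; is used for the apostrophe.
SPECIAL = {"&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;"}


def encode_entities(text: str, mode: str = "special") -> str:
    """Encode text; mode is "special", "named", "decimal" or "hex" (hypothetical names)."""
    out = []
    for ch in text:  # Python strings iterate by code point, so emoji stay whole
        cp = ord(ch)
        if ch in SPECIAL:
            out.append(SPECIAL[ch])
        elif cp < 128 or mode == "special":
            out.append(ch)  # plain ASCII, or non-ASCII left readable
        elif mode == "named" and cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")
        elif mode == "hex":
            out.append(f"&#x{cp:X};")
        else:
            out.append(f"&#{cp};")  # decimal mode, and the fallback for nameless characters
    return "".join(out)


print(encode_entities("a < b & c"))        # a &lt; b &amp; c
print(encode_entities("café ©", "named"))  # caf&eacute; &copy;
print(encode_entities("😀", "named"))      # &#128512;
print(encode_entities("©", "hex"))         # &#xA9;
```

Checking the special characters first means every mode escapes them, so the non-ASCII modes are strict supersets of the HTML-special mode.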

Common HTML entities

Character | Named | Decimal | Hex | Description
& | &amp; | &#38; | &#x26; | Ampersand
< | &lt; | &#60; | &#x3C; | Less-than (opens a tag)
> | &gt; | &#62; | &#x3E; | Greater-than (closes a tag)
" | &quot; | &#34; | &#x22; | Double quote
' | &apos; | &#39; | &#x27; | Apostrophe (use &#39; for HTML4)
  | &nbsp; | &#160; | &#xA0; | Non-breaking space
© | &copy; | &#169; | &#xA9; | Copyright
® | &reg; | &#174; | &#xAE; | Registered trademark
™ | &trade; | &#8482; | &#x2122; | Trademark
€ | &euro; | &#8364; | &#x20AC; | Euro
£ | &pound; | &#163; | &#xA3; | Pound sterling
° | &deg; | &#176; | &#xB0; | Degree
× | &times; | &#215; | &#xD7; | Multiplication sign
— | &mdash; | &#8212; | &#x2014; | Em dash
… | &hellip; | &#8230; | &#x2026; | Horizontal ellipsis
“ | &ldquo; | &#8220; | &#x201C; | Left double quote

Tip: &apos; is valid in HTML5 but not older HTML, so this tool uses the universally-safe &#39; for an apostrophe when encoding.

A short history of HTML entities

Entities are older than the web as most people know it. HTML is an application of SGML (ISO 8879), and it inherited SGML's entity syntax along with the original mnemonic names. The first web standard to formalise them, HTML 2.0 (RFC 1866, November 1995), restricted documents to the ISO-8859-1 (Latin-1) character set and borrowed SGML's "Added Latin 1" entity set so authors could write characters like &copy; and &eacute; using nothing but plain ASCII — essential in an era when reliably transmitting anything beyond ASCII was a gamble.

Each later revision expanded the list. Today the WHATWG HTML standard defines a very large table of named character references — over 2,000 of them (the machine-readable entities.json list, commonly cited as 2,231 entries), covering currency, maths, Greek letters, arrows, punctuation and more. Numeric references, meanwhile, can reach any Unicode character whether or not it has a name, which is why the two systems coexist.

Three kinds of character reference

"Entity" is the everyday word, but the spec's umbrella term is character reference, and there are three flavours that all produce the same character:

  • Named character reference: &name;, a readable alias such as &amp;, &nbsp; or &copy;. Easiest to read, but only works for characters that have a defined name.
  • Decimal numeric reference: &#NN;, the character's Unicode code point in base 10 (&#169; = ©).
  • Hexadecimal numeric reference: &#xNN;, the same code point in base 16 (&#xA9; = ©). The x may be upper- or lower-case.

The crucial detail is that in HTML4 and HTML5 a numeric reference names a Unicode code point, not a byte in some legacy encoding. That is why this tool can round-trip emoji and any script — 😀 is simply &#128512; (decimal) or &#x1F600; (hex) — and why numeric references are the guaranteed-to-render fallback when a character has no name.
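
Because a numeric reference is just the code point written in base 10 or base 16, both forms can be derived with a couple of lines. A minimal Python sketch, where the helper name numeric_refs is hypothetical and html.unescape from the standard library stands in for a decoder:

```python
import html


def numeric_refs(ch: str) -> tuple[str, str]:
    """Return the decimal and hexadecimal references for one character."""
    cp = ord(ch)  # the Unicode code point, not a byte value
    return f"&#{cp};", f"&#x{cp:X};"


decimal, hexadecimal = numeric_refs("😀")
print(decimal, hexadecimal)  # &#128512; &#x1F600;

# Both spellings decode to the same character, and the x is case-insensitive.
print(html.unescape(decimal) == html.unescape(hexadecimal) == "😀")  # True
print(html.unescape("&#xa9;") == html.unescape("&#XA9;") == "©")     # True
```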

The five predefined entities (and the apostrophe trap)

XML — and therefore strict XHTML, RSS and Atom feeds — predefines exactly five entities: &amp;, &lt;, &gt;, &quot; and &apos;. HTML has thousands of names, but those five are the load-bearing ones, because they escape the characters that actually control markup. There is one famous gotcha: &apos; is defined in XML and in HTML5, but it was never part of HTML4. A legacy browser or a strict HTML4 parser will show &apos; as literal text. The numeric &#39; works everywhere, which is exactly why this tool always emits &#39; for an apostrophe rather than the prettier name.
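
Python's standard library happens to make the same choice about the apostrophe, which gives a quick way to see the trap. The sketch below uses only stdlib calls; note that html.escape emits the hexadecimal form &#x27; where this tool emits the decimal &#39;, and the two are equivalent.

```python
import html
from html.entities import name2codepoint  # the HTML 4 name table
from xml.sax.saxutils import escape as xml_escape

# html.escape avoids &apos; and uses a numeric reference for the apostrophe.
print(html.escape("it's"))  # it&#x27;s

# &apos; is missing from the HTML 4 table but is decoded under HTML5 rules.
print("apos" in name2codepoint)    # False
print(html.unescape("it&apos;s"))  # it's

# For XML output, all five predefined entities are safe to use.
xml_quotes = {"'": "&apos;", '"': "&quot;"}
print(xml_escape("'R&D' < 5", xml_quotes))  # &apos;R&amp;D&apos; &lt; 5
```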

Escaping is the front line against XSS

Beyond simply displaying a <, HTML escaping is the primary defence against cross-site scripting (XSS). The rule from the OWASP XSS Prevention Cheat Sheet is to encode untrusted data at the exact moment it is written into a page, so a value like <script>…</script> becomes inert text instead of an executed tag. The catch is that escaping is context-sensitive — HTML entity encoding is only correct in one place:

Where the data lands | Correct escaping
HTML element content (page body) | HTML entity encode & < >
Inside an HTML attribute | Also encode " and ', and always quote the attribute
Inside a <script> / JavaScript | JavaScript escaping (\uXXXX), not HTML entities
Inside a URL / href | URL (percent) encoding
Inside a <style> / CSS | CSS escaping

This is why the " and ' characters matter so much: inside a quoted attribute, an unescaped matching quote lets an attacker close the attribute early and inject an event handler such as onmouseover=, and an unquoted attribute is weaker still because a space is enough to end it. HTML entity encoding does not protect data placed in a script, a CSS rule or a URL; those need their own encoding, and mixing them up is a common source of vulnerabilities.
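
To make the context rule concrete, here is a small Python sketch with one escaper per context. The four function names are hypothetical, and the script-context version is only one reasonable approach (JSON-encode, then replace the characters that could end a script block); CSS escaping is left out.

```python
import html
import json
from urllib.parse import quote


def for_body(value: str) -> str:
    """HTML element content: escape & < > only."""
    return html.escape(value, quote=False)


def for_attribute(value: str) -> str:
    """Quoted attribute value: also escape both quote characters."""
    return html.escape(value, quote=True)


def for_script(value: str) -> str:
    """JavaScript string literal: JSON-encode, then \\u-escape < > &."""
    literal = json.dumps(value)
    for ch in "<>&":
        literal = literal.replace(ch, f"\\u{ord(ch):04x}")
    return literal


def for_url(value: str) -> str:
    """URL component: percent-encode everything outside the unreserved set."""
    return quote(value, safe="")


untrusted = '"><script>alert(1)</script>'
print(for_body(untrusted))       # "&gt;&lt;script&gt;alert(1)&lt;/script&gt;
print(for_attribute(untrusted))  # &quot;&gt;&lt;script&gt;alert(1)&lt;/script&gt;
print(for_script(untrusted))
print(for_url(untrusted))
```

The same input needs four different outputs, which is the point: applying for_body to data that ends up inside a script block or an href leaves it unprotected.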

When you actually need entities (and when you don't)

A widespread misconception is that you must entity-encode every accented or non-Latin character. With a proper UTF-8 document you generally do not: é, 日本語 and 😀 display perfectly as their literal selves. Entities are about markup safety and legacy compatibility, not about representing Unicode. The genuine reasons to reach for them are:

  • Showing code as text — printing &lt;div&gt; so a reader sees the literal tag instead of the browser rendering it.
  • XML / RSS / Atom feeds, which are strict and reject a bare & or < (a bare > is usually tolerated, but is conventionally escaped as well).
  • Email and older systems that cannot be trusted to be UTF-8.
  • Injection prevention, as above.
  • Smart punctuation — curly quotes, em dashes, &copy; and &trade; — for environments that mangle anything outside ASCII.

Two pitfalls to watch for. Double-encoding happens when an escape function runs twice, turning & into &amp; and then &amp;amp;, so the user sees a literal &amp; on the page; use this tool's Decode to unwind it. Mojibake is the garbled text (Ã© where é should be) caused by reading UTF-8 bytes as Latin-1; the real fix is a correct charset declaration, not an entity tool. Separately, &nbsp; (the non-breaking space, U+00A0) keeps things like "10 km" or "Mr. Smith" from wrapping across a line break; use it for that, not as a layout crutch (CSS does spacing better).
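
Double-encoding is easy to reproduce. The sketch below uses Python's html.escape and html.unescape as stand-ins for whatever escape function a real pipeline calls twice:

```python
import html

once = html.escape("Tom & Jerry")
twice = html.escape(once)  # the bug: escaping text that is already escaped

print(once)   # Tom &amp; Jerry
print(twice)  # Tom &amp;amp; Jerry

# One decode pass removes exactly one layer of escaping.
print(html.unescape(twice) == once)  # True

# &nbsp; decodes to U+00A0, which is a different character from a normal space.
print(html.unescape("10&nbsp;km") == "10\u00a0km")  # True
```

Decoding in a loop until the text stops changing would also unwind deeper nesting, but it can over-decode text that legitimately contains an escaped entity, so a single deliberate pass per layer is the safer habit.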

How the browser actually parses entities

Decoding has a few quirks worth knowing, all of which this tool handles. For backward compatibility, the HTML5 parser still recognises a fixed set of legacy names without a trailing semicolon (&copy, &amp, &lt), which is the classic trap behind a URL like ?x&copy=1 turning into ?x©=1 when it appears in page text. Inside an attribute value, the parser leaves a semicolon-less name alone if it is followed by = or an alphanumeric character, precisely to protect query strings. Most newer names require the semicolon. The parser also remaps certain numeric references on purpose: a NULL reference &#0; becomes the replacement character U+FFFD (�); numeric references in the range &#128; to &#159; are remapped to their Windows-1252 characters rather than the literal C1 control codes (so &#147; and &#148; become curly quotes); and references to surrogate or out-of-range code points are replaced with U+FFFD too. These rules are baked into the spec so that messy real-world HTML still renders predictably.
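
These quirks can be observed with Python's html.unescape, which follows the HTML5 rules for character references in text content. It is used here only as a convenient reference decoder, not as a description of how this tool is built.

```python
import html

# Legacy names are recognised without a trailing semicolon...
print(html.unescape("?x&copy=1"))  # ?x©=1
# ...but newer names are not.
print(html.unescape("&hellip"))    # &hellip

# NULL, surrogates and out-of-range values become U+FFFD.
print(html.unescape("&#0;") == "\ufffd")        # True
print(html.unescape("&#xD800;") == "\ufffd")    # True
print(html.unescape("&#x110000;") == "\ufffd")  # True

# 128 to 159 are remapped through Windows-1252, so these are curly quotes.
print(html.unescape("&#147;hi&#148;") == "\u201chi\u201d")  # True
```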

Character encoding vs character references

It is easy to confuse two things that both deal with "characters". Character encoding — almost always UTF-8 today — is how characters are turned into the actual bytes stored in a file and sent over the network. Character references (entities) are a feature of the HTML/XML text itself, a way to write a character using only ASCII. They solve different problems, and getting the relationship wrong is the source of a lot of garbled text.

The key insight: with a correctly declared <meta charset="utf-8">, your document can contain real characters directly (café, €20, 日本語, even 😀) and they will display perfectly without a single entity. Entities are not a way to "support" Unicode; UTF-8 already does that. Reach for entities only when you need to neutralise the markup-significant characters, or when a downstream system (an old email gateway, a strict XML parser) cannot be trusted to handle UTF-8. Treating entities as a charset substitute leads to over-encoded, unreadable source; treating UTF-8 as optional leads to Ã© where an é should be. Use the right tool for each job.
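
The difference between the two layers shows up clearly at the byte level. A short Python sketch (the variable names are illustrative only):

```python
text = "café"

# Character encoding: how the characters become bytes on disk or on the wire.
utf8_bytes = text.encode("utf-8")
print(utf8_bytes)  # b'caf\xc3\xa9'

# Reading those UTF-8 bytes as Latin-1 produces mojibake.
garbled = utf8_bytes.decode("latin-1")
print(garbled)  # cafÃ©

# The repair is to undo the wrong decoding, not to add entities.
print(garbled.encode("latin-1").decode("utf-8"))  # café

# Character reference: the same word spelled in pure ASCII inside the markup.
print("caf&eacute;".isascii())  # True
```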

A tour of the entities you'll actually use

Out of thousands of named references, a small handful do nearly all the real work. Knowing what each one is for saves a lot of guesswork:

  • &nbsp; (non-breaking space) — keeps two words on the same line, so "10&nbsp;km" or "Mr.&nbsp;Smith" never wrap awkwardly. It also stops HTML from collapsing repeated spaces. Use it deliberately, not as a layout hack.
  • &ndash; and &mdash; — the en dash (–, for ranges like 2010–2020) and the longer em dash (—, for breaks in a sentence). Many systems mangle these, so entities make them reliable.
  • Smart quotes: &lsquo; &rsquo; for curly single quotes and &ldquo; &rdquo; for curly doubles. These are a classic source of mojibake when copied from a word processor, so encoding them keeps typography intact.
  • Legal & commercial: &copy; (©), &reg; (®) and &trade; (™) for copyright, registered-trademark and trademark marks.
  • Maths & units: &times; (×), &divide; (÷), &deg; (°), &plusmn; (±) and the arrows &larr; / &rarr;.

Every one of these has an exact numeric equivalent (for example &copy; is &#169; or &#xA9;), which is the guaranteed-to-render fallback if a target system doesn't recognise the name. Switch this tool to "decimal" or "hex" mode to see the numeric form of anything you encode.
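
For reference, the numeric equivalents of these names can be listed programmatically. A small Python sketch using the standard library's HTML 4 name table (html.entities.name2codepoint); the helper name numeric_forms is hypothetical:

```python
from html.entities import name2codepoint


def numeric_forms(name: str) -> tuple[str, str]:
    """Return the decimal and hex references for a named entity (name without & and ;)."""
    cp = name2codepoint[name]
    return f"&#{cp};", f"&#x{cp:X};"


for name in ("nbsp", "ndash", "mdash", "copy", "trade", "times"):
    decimal, hexadecimal = numeric_forms(name)
    print(f"&{name};  {decimal}  {hexadecimal}")
```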

How this tool decodes safely

Decoding looks trivial but hides a real risk: if a tool resolved entities by dropping your text into a live HTML element, a malicious string could execute a script. This tool avoids that entirely. Numeric references are resolved mathematically — the code point is read and the exact character is reconstructed, with no HTML involved at all. Named references are resolved one clean &name; token at a time through a detached element, and because such a token can never itself contain a < or >, there is no way for markup or a script to slip through. That approach also means the decoder supports the full HTML5 named table — every one of the 2,000-plus names — not just a hand-picked few, while staying completely on your device. Anything that isn't a valid, terminated entity is simply left untouched, so a stray & in your text survives decoding intact rather than being mangled. The result is a decoder that is both comprehensive and safe by construction.
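
The same idea can be sketched in Python. This is not the tool's implementation: it swaps the detached-element lookup for a dictionary lookup in the standard library's HTML5 table (html.entities.html5), the function name decode_entities is hypothetical, and for brevity it omits the Windows-1252 remapping of 128 to 159.

```python
import re
from html.entities import html5  # maps names such as "copy;" to characters

# Only well-formed, semicolon-terminated references are matched.
_REF = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def decode_entities(text: str) -> str:
    def replace(match: re.Match) -> str:
        body = match.group(1)
        if body[0] == "#":
            # Numeric reference: pure arithmetic on the code point.
            cp = int(body[2:], 16) if body[1] in "xX" else int(body[1:])
            if cp == 0 or 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
                return "\ufffd"
            return chr(cp)
        # Named reference: table lookup; unknown names are left as written.
        return html5.get(body + ";", match.group(0))

    return _REF.sub(replace, text)


print(decode_entities("&copy; &#169; &#xA9;"))    # © © ©
print(decode_entities("AT&T & co"))               # AT&T & co
print(decode_entities("&lt;b&gt;bold&lt;/b&gt;"))  # <b>bold</b>
```

The result is an ordinary string. Nothing is ever parsed as HTML, so the decoded <b> in the last example is just text until some other code chooses to insert it into a page.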

Frequently asked questions

What is an HTML entity?
An HTML entity is a code that represents a character which is reserved in HTML or hard to type — written either by name (&copy; for ©) or by number (&#169; or &#xA9;). Entities let you show characters like < > & literally instead of having the browser treat them as markup.
When do I need to encode & < > " and '?
Whenever you put text inside HTML. < and > start and end tags, & starts an entity, and " and ' delimit attributes — so showing them literally (or putting user text into a page safely) means replacing them with &lt; &gt; &amp; &quot; and &#39;. The “HTML-special characters only” mode does exactly this.
Named vs numeric entities — which should I use?
They are equivalent. Named entities (&copy;, &mdash;) are more readable; numeric entities (&#169;, &#8212; or hex &#xA9;) work even for characters that have no name and are guaranteed to render. Decimal and hex numeric entities are interchangeable — &#169; and &#xA9; both produce ©.
Does it handle emoji and non-Latin scripts?
Yes. Encoding iterates by Unicode code point, so emoji and astral-plane characters (e.g. 😀 = &#128512; / &#x1F600;) and scripts like 日本語 or हिंदी encode and decode back exactly, with no broken surrogate pairs.
Is decoding safe — could it run a script?
No. Decoding never inserts your text as live HTML. Numeric entities are resolved mathematically from their code point, and named entities are resolved one clean &name; token at a time, so there is no way for markup or a script to execute. Everything also stays on your device.
Is anything uploaded to a server?
No. Encoding and decoding run entirely in your browser using JavaScript, so your text never leaves your device. The tool also works offline once the page has loaded.
