What HTML entities actually are — and when you have to encode

If you have ever pasted a code snippet into a web page and watched your <div> vanish — or worse, actually turn into a div — you have met the problem HTML encoding solves. It is one of those topics that feels fuzzy until someone explains the single rule underneath it, and then it is obvious forever. Here is that rule.

The browser cannot tell your text from your tags

An HTML page is just text. When the browser reads it, it has no way of knowing whether a < you typed was meant as "less than" in a sentence or as the start of a tag like <strong>. It resolves that ambiguity with a blunt rule: a < followed by a letter, a slash or an exclamation mark begins markup, so the safe assumption is that a < always begins a tag. The moment your content contains a literal <, the browser may stop displaying your text and start interpreting markup. That is the entire source of the bug.

Only four characters cause this in ordinary HTML, and they are the four you need to think about: the angle brackets < and > that open and close tags, the ampersand & (because it starts an entity), and the double quote ", which can prematurely close a double-quoted attribute value. If you delimit attribute values with single quotes instead, the same applies to '.
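
As a rough illustration, the four characters can be handled with a small lookup table. This is a sketch in Python, and the name escape_html is made up for this post rather than taken from any library. The one detail that matters is the order: & is replaced first, so the ampersands introduced by the later replacements are not encoded a second time.

```python
# Hypothetical helper for illustration; real projects should prefer a library escaper.
REPLACEMENTS = [
    ("&", "&amp;"),   # must come first, otherwise "&lt;" would become "&amp;lt;"
    ("<", "&lt;"),
    (">", "&gt;"),
    ('"', "&quot;"),
]


def escape_html(text: str) -> str:
    """Replace the four HTML-significant characters with named entities."""
    for char, entity in REPLACEMENTS:
        text = text.replace(char, entity)
    return text


print(escape_html('if a < b && c > "d"'))
# if a &lt; b &amp;&amp; c &gt; &quot;d&quot;
```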

What an "entity" actually is

An HTML entity is an escape code. Instead of writing the character itself, you write a stand-in that the browser decodes back into the character when it renders. Crucially, the decoded character is treated as plain text and is never re-read as markup. So &lt; tells the browser "show a less-than sign here; do not treat it as a tag." The reader sees a <, but the browser never tried to open a tag, because what was actually in the markup was the safe entity.

This is why the ampersand itself has to be encoded. Since every entity begins with &, a raw ampersand in your text can accidentally start one. A raw Q&A happens to survive, because &A is not a known entity, but a raw &copy in your text is decoded to ©. Written as &amp;, the ampersand always renders the way the reader expects.
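
To see the ambiguity concretely, Python's standard html.unescape decodes entities in roughly the way an HTML parser treats them in page text. The sample strings below are made up for illustration.

```python
import html

# "&A" is not a known entity, so the raw ampersand survives.
print(html.unescape("Q&A"))             # Q&A

# "&copy" is a legacy entity that is recognised even without a semicolon.
print(html.unescape("cut&copy=2"))      # cut©=2

# Encoding the ampersand as &amp; removes the ambiguity.
print(html.unescape("cut&amp;copy=2"))  # cut&copy=2
```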

Named vs numeric entities

You will see two styles. Named entities like &lt; are readable and ideal for hand-edited HTML. Numeric entities like &#60; (decimal) or &#x3C; (hex) reference the character by its Unicode code point. Numeric entities matter when a character has no short, memorable name, or when you are targeting a system that does not recognise the named set. They mean the same thing — the choice is about readability and compatibility.
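
A short Python sketch shows how the numeric forms relate to the code point. The helper name numeric_entities is hypothetical; ord and html.unescape are standard library.

```python
import html


def numeric_entities(char: str) -> tuple[str, str]:
    """Return the decimal and hex numeric entities for a single character."""
    code_point = ord(char)
    return f"&#{code_point};", f"&#x{code_point:X};"


print(numeric_entities("<"))  # ('&#60;', '&#x3C;')

# All three spellings decode to the same character.
for entity in ("&lt;", "&#60;", "&#x3C;"):
    assert html.unescape(entity) == "<"
```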

The two situations where this bites

Displaying code on a page. The everyday case. You are writing a tutorial and want readers to see <button> as text rather than have the browser render a button. Encode the snippet, drop it into your <pre><code> block, and the tags appear as text.

Printing user input. The security case. If your site takes something a user typed and writes it straight into the page, an attacker can type markup the browser will execute — the classic cross-site scripting hole. Encoding user-supplied content before printing it turns their <script> into harmless visible text. It is the simplest, oldest defence there is, and it still matters.
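
Both situations come down to the same call. The sketch below uses Python's html.escape; the function names and the surrounding markup are assumptions for illustration, not a recommended template.

```python
import html


def code_block(snippet: str) -> str:
    """Wrap a code snippet so its tags show up as text (hypothetical helper)."""
    return f"<pre><code>{html.escape(snippet)}</code></pre>"


def render_comment(author: str, body: str) -> str:
    """Build an HTML fragment from untrusted input (hypothetical markup)."""
    # quote=True (the default) also encodes " and ', which matters inside attributes.
    safe_author = html.escape(author, quote=True)
    safe_body = html.escape(body)
    return f'<p title="{safe_author}">{safe_body}</p>'


print(code_block("<button>Save</button>"))
# <pre><code>&lt;button&gt;Save&lt;/button&gt;</code></pre>

print(render_comment('x" onmouseover="alert(1)', "<script>alert(1)</script>"))
# <p title="x&quot; onmouseover=&quot;alert(1)">&lt;script&gt;alert(1)&lt;/script&gt;</p>
```

This covers text and quoted attribute values only; content placed inside a script block or a URL follows different escaping rules.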

Doing it without writing a function

For a one-off you do not need code. Paste your text into the HTML Encode / Decode tool, pick a direction, and copy the result. It handles named, decimal, and hex entities in one pass, so messy mixed input — the kind you get copying from an RSS feed or a field that was encoded twice — comes back clean. It runs in your browser, so you can paste internal code or data safely.
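
If you do end up scripting the clean-up, a minimal sketch might look like the following. It relies on Python's html.unescape, which handles named, decimal and hex entities together. The repeat-until-stable loop and the name decode_fully are assumptions for illustration, not a description of how the tool works.

```python
import html


def decode_fully(text: str, max_passes: int = 5) -> str:
    """Decode entities repeatedly until the text stops changing."""
    for _ in range(max_passes):
        decoded = html.unescape(text)
        if decoded == text:
            break
        text = decoded
    return text


# Mixed input: a double-encoded named entity, decimal entities and hex entities.
print(decode_fully("&amp;lt;b&amp;gt; &#60;i&#62; &#x3C;u&#x3E;"))  # <b> <i> <u>
```

Repeated decoding is only appropriate when you know the input was over-encoded; text that was meant to display a literal &lt; would be decoded one step too far.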

The one rule to remember: the browser reads < as the start of a tag, always. Encoding is just the polite way of saying "no, I meant the actual character." Once that clicks, the rest is detail.
