HTML Entity Encoding: A Developer's Guide
What Are HTML Entities?
HTML entities are special codes that represent characters which have reserved meaning in HTML or cannot be typed directly on a keyboard. They begin with an ampersand (&) and end with a semicolon (;). For example, &lt; represents the less-than sign (<), which would otherwise be interpreted as the start of an HTML tag.
Without entity encoding, characters like <, >, &, and " would confuse the browser’s HTML parser, potentially breaking page layout or creating security vulnerabilities. Entity encoding ensures these characters are displayed as text rather than interpreted as markup.
Named vs. Numeric Entities
Named entities use descriptive names: &amp; for the ampersand, &lt; for less-than, &gt; for greater-than, &quot; for double quotes, and &apos; for apostrophes. Named entities are human-readable and easy to remember for common characters.
Numeric entities use the character’s Unicode code point, written in decimal (&#60; for less-than) or hexadecimal (&#x3C; for the same character). Numeric entities can represent any Unicode character, making them more versatile than named entities, which exist only for a subset of characters.
Both formats produce identical results in the browser. Named entities are preferred for common characters because they are more readable in source code. Numeric entities are the only option for characters that have no named entity, such as most emoji and many characters from non-Latin scripts, when those characters cannot be written directly in the document’s encoding.
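A quick way to confirm that the named, decimal, and hexadecimal forms are equivalent is to decode them with Python’s standard-library html.unescape. The sketch below also builds numeric references from a character’s code point; the helper names decimal_entity and hex_entity are hypothetical, chosen for illustration, and not part of any library.

```python
import html


def decimal_entity(ch: str) -> str:
    """Hypothetical helper: decimal numeric reference for one character."""
    return f"&#{ord(ch)};"


def hex_entity(ch: str) -> str:
    """Hypothetical helper: hexadecimal numeric reference for one character."""
    return f"&#x{ord(ch):X};"


# Named, decimal, and hexadecimal forms of the less-than sign.
forms = ["&lt;", decimal_entity("<"), hex_entity("<")]
print(forms)                              # ['&lt;', '&#60;', '&#x3C;']
print([html.unescape(f) for f in forms])  # ['<', '<', '<']

# Numeric references also cover characters with no named entity, such as emoji.
print(hex_entity("😀"), html.unescape(hex_entity("😀")))  # &#x1F600; 😀
```

All three forms decode to the same character, which is why the choice between them is mostly about readability of the source.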
Essential Characters to Encode
Five characters have special meaning in HTML and should be encoded wherever they could be misread as markup:
- & (ampersand) becomes &amp;. Without encoding, the browser may read the text that follows as an entity name.
- < (less-than) becomes &lt;. Without encoding, the browser interprets it as the start of a tag.
- > (greater-than) becomes &gt;. Encoding is recommended for consistency, though browsers handle a bare > in most contexts.
- " (double quote) becomes &quot;. Required inside attribute values delimited by double quotes.
- ' (apostrophe) becomes &apos; or &#39;. Required inside attribute values delimited by single quotes.
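The five replacements above can be sketched as a small encoder. The ampersand must be replaced first; otherwise the & introduced by the other replacements would be encoded again (a < would end up as &amp;lt;). The function name encode_html is hypothetical. In practice Python’s html.escape(s, quote=True) does the same job, except that it writes the apostrophe as &#x27;.

```python
import html

# Order matters: "&" is replaced first so the ampersands added by later
# replacements are not encoded a second time.
_REPLACEMENTS = [
    ("&", "&amp;"),
    ("<", "&lt;"),
    (">", "&gt;"),
    ('"', "&quot;"),
    ("'", "&#39;"),
]


def encode_html(text: str) -> str:
    """Hypothetical helper: encode the five HTML-significant characters."""
    for char, entity in _REPLACEMENTS:
        text = text.replace(char, entity)
    return text


sample = """Tom & Jerry's "<b>" tag"""
print(encode_html(sample))
# Tom &amp; Jerry&#39;s &quot;&lt;b&gt;&quot; tag

# The standard library gives the same result apart from the apostrophe form.
print(html.escape(sample, quote=True))
# Tom &amp; Jerry&#x27;s &quot;&lt;b&gt;&quot; tag
```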
Security: Preventing XSS Attacks
Cross-site scripting (XSS) is one of the most common web vulnerabilities, and encoding untrusted data before it is written into a page is the primary defense. If user-supplied data is inserted into HTML without encoding, an attacker can inject malicious scripts.
For example, if a user enters the name <script>alert('hacked')</script> and the application displays it without encoding, the browser executes the script. Encoding the input transforms it to &lt;script&gt;alert('hacked')&lt;/script&gt;, which displays as harmless text.
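Always encode user input before inserting it into a page, and match the encoding to the context: HTML entity encoding protects element content and quoted attribute values, while JavaScript strings and CSS values need their own escaping rules, because entities are not decoded inside <script> or <style> elements. Most modern web frameworks provide automatic encoding (auto-escaping) in their template engines, but developers must understand the mechanism to avoid bypassing it accidentally.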
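The sketch below covers the two contexts where HTML entity encoding is the right tool, element content and a double-quoted attribute, using Python’s html.escape. The render_greeting function and its markup are hypothetical; a real application would normally rely on its template engine’s auto-escaping rather than assembling strings by hand, and JavaScript or CSS contexts would still need their own escaping.

```python
import html


def render_greeting(name: str) -> str:
    """Hypothetical helper: build an HTML fragment around an untrusted name.

    html.escape with quote=True also encodes " and ', so the same escaped
    value is safe in element content and inside a quoted attribute.
    """
    safe = html.escape(name, quote=True)
    return f'<p title="{safe}">Hello, {safe}!</p>'


payload = "<script>alert('hacked')</script>"
print(render_greeting(payload))
# <p title="&lt;script&gt;alert(&#x27;hacked&#x27;)&lt;/script&gt;">Hello, &lt;script&gt;alert(&#x27;hacked&#x27;)&lt;/script&gt;!</p>

# An attempt to break out of the attribute is neutralized as well.
print(render_greeting('" onmouseover="alert(1)'))
```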
Common HTML Entities Reference
Beyond the essential five, frequently used entities include &nbsp; (non-breaking space), &copy; (copyright symbol), &reg; (registered trademark), &mdash; (em dash), &ndash; (en dash), &bull; (bullet), &hellip; (ellipsis), &euro; (euro sign), and &pound; (pound sign).
Mathematical symbols, arrows, Greek letters, and currency signs all have named entities or can be expressed with numeric codes. The HTML specification defines over 2,000 named entities.
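Python’s standard library ships the HTML5 named character references as the html.entities.html5 dictionary, which maps each name (including its trailing semicolon) to the character it stands for. The sketch below looks up the entities listed above and reports the table size; the lookup helper is hypothetical and assumes every name passed to it is a single-character entity.

```python
from html.entities import html5


def lookup(name: str) -> str:
    """Hypothetical helper: character for a named reference such as "copy"."""
    return html5[name + ";"]


for name in ["nbsp", "copy", "reg", "mdash", "ndash", "bull", "hellip", "euro", "pound"]:
    ch = lookup(name)
    # Show each entity alongside its code point in U+XXXX notation.
    print(f"&{name};  U+{ord(ch):04X}  {ch!r}")

# Some legacy names appear both with and without a semicolon, so the
# dictionary has more entries than there are distinct entity names.
print(len(html5))
```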
Use the HTML entity encoder on CalcHub to encode and decode HTML entities, or explore our text tools for additional formatting utilities.