charset Attribute
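The charset attribute specifies the character encoding of an HTML document. It is typically used in a <meta> tag inside the document's <head> section. UTF-8 is the recommended character encoding because it can represent every character in the Unicode standard, which covers nearly all written languages.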
Syntax
<meta charset="character-set">
Example
Setting the character encoding to UTF-8 in HTML:
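<!DOCTYPE html>
<html>
<head>
<!-- Declare the encoding first: it must appear within the first 1024 bytes -->
<meta charset="UTF-8">
<title>Meta Example</title>
</head>
<body>
<h1>My Website</h1>
<p>Some text...</p>
</body>
</html>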
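Declaring UTF-8 only helps if the file's bytes really are UTF-8. The minimal Python sketch below assumes a hypothetical page string based on the example above, with the word "Café" added so that a non-ASCII character is present. It encodes the page as UTF-8 and checks that the declaration ends within the first 1024 bytes, which the HTML standard requires. It then shows the garbled text a browser would produce if it guessed Windows-1252 instead.

```python
# Hypothetical page based on the example above; "Café" is added for illustration.
PAGE = """<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
</head>
<body>
<h1>My Website</h1>
<p>Café</p>
</body>
</html>
"""

# Encode the text the way an editor saving "as UTF-8" would.
data = PAGE.encode("utf-8")

# The whole <meta charset> element must end within the first 1024 bytes.
tag = b'<meta charset="UTF-8">'
start = data.find(tag)
declared_early = start != -1 and start + len(tag) <= 1024

# Decoding with the declared encoding restores the original text...
correct = data.decode("utf-8")
# ...while guessing Windows-1252 turns the 2-byte "é" into "Ã©" (mojibake).
garbled = data.decode("cp1252")

print(declared_early)        # True
print(correct == PAGE)       # True
print("CafÃ©" in garbled)    # True
```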
ASCII Character Set
ASCII was one of the earliest character encoding standards, containing 128 characters:
- Uppercase and lowercase English letters (A-Z, a-z)
- Digits from 0 to 9
- Symbols such as !, $, +, -, @, <, and >
- Control characters such as NUL, TAB, and LF (codes 0–31 and 127)
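Because ASCII is a 7-bit code, every character maps to a number from 0 to 127, and anything outside that range cannot be encoded. The short check below uses Python's built-in `ascii` codec. It is an illustration only, not part of the HTML feature itself.

```python
# ASCII defines exactly 128 code points, 0 through 127.
ASCII_SIZE = 128

# Characters from the list above map to small integers.
print(ord("A"), ord("a"), ord("0"), ord("@"))  # 65 97 48 64

# Every value below 128 decodes cleanly with the ASCII codec.
all_ascii = bytes(range(ASCII_SIZE)).decode("ascii")
print(len(all_ascii))  # 128

# A character outside the range, such as "é" (233), cannot be encoded.
try:
    "é".encode("ascii")
    encodable = True
except UnicodeEncodeError:
    encodable = False
print(encodable)  # False
```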
ANSI Character Set
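ANSI (Windows-1252) was an early Windows encoding. Despite its common name, it was never an official ANSI standard:
- Matches ASCII for characters 0–127
- Adds printable special characters, such as €, “, ”, and ™, in the range 128–159
- Matches ISO-8859-1 (and the first 256 Unicode code points) for characters 160–255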
To use ANSI in HTML:
<meta charset="Windows-1252">
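The extra characters in the 128–159 range are what set Windows-1252 apart. The sketch below decodes each byte in that range with Python's `cp1252` codec. One caveat is that Python leaves five positions undefined, whereas browsers following the WHATWG Encoding Standard map those five bytes to control characters instead.

```python
import unicodedata

# Decode every byte in 128-159 with Python's "cp1252" (Windows-1252) codec.
defined = {}
undefined = []
for b in range(128, 160):
    try:
        defined[b] = bytes([b]).decode("cp1252")
    except UnicodeDecodeError:
        # Python's codec leaves a few positions in this range unassigned.
        undefined.append(b)

for b in (0x80, 0x93, 0x99):
    print(hex(b), unicodedata.name(defined[b]))
# 0x80 EURO SIGN
# 0x93 LEFT DOUBLE QUOTATION MARK
# 0x99 TRADE MARK SIGN

print([hex(b) for b in undefined])  # ['0x81', '0x8d', '0x8f', '0x90', '0x9d']
print(len(defined))                 # 27
```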
ISO-8859-1 Character Set
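ISO-8859-1 (Latin-1) was the default encoding for HTML 4, with 256 code positions. It is a superset of ASCII and is almost identical to ANSI (Windows-1252), differing only in the 128–159 range:
- Matches ASCII for characters 0–127
- Assigns no printable characters to 128–159 (the C1 control range)
- Matches ANSI for 160–255; these values are also the Unicode code points U+00A0–U+00FF, although UTF-8 stores each of them in two bytes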
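The three properties above can be checked directly with Python's `latin-1` and `cp1252` codecs. This is an illustrative sketch, not a browser's decoder:

```python
import unicodedata

# In ISO-8859-1 every byte decodes to the Unicode code point with the same
# number, so the mapping is the identity for 0-255.
latin1 = bytes(range(256)).decode("latin-1")
identity = all(ord(ch) == i for i, ch in enumerate(latin1))

# Bytes 128-159 decode to C1 control characters (category "Cc"),
# not to printable symbols as in Windows-1252.
c1_controls = all(unicodedata.category(latin1[i]) == "Cc" for i in range(128, 160))

# From 160 up, ISO-8859-1 and Windows-1252 agree byte for byte.
high = bytes(range(160, 256))
same_as_cp1252 = high.decode("latin-1") == high.decode("cp1252")

print(identity, c1_controls, same_as_cp1252)  # True True True
```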
Usage in HTML
HTML 4
<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">
HTML 5
<meta charset="ISO-8859-1">
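Both declaration styles carry the same information. The simplified Python sketch below uses the standard `html.parser` module to read the charset from either form. The class `CharsetFinder` and the function `declared_charset` are hypothetical helpers, and this is not the HTML standard's full encoding-sniffing (prescan) algorithm.

```python
from html.parser import HTMLParser


class CharsetFinder(HTMLParser):
    """Hypothetical helper: records the first charset declared by a <meta> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        # HTMLParser lowercases attribute names; valueless attributes give None.
        attributes = {name: (value or "") for name, value in attrs}
        if "charset" in attributes:  # HTML5 form
            self.charset = attributes["charset"].strip()
        elif attributes.get("http-equiv", "").lower() == "content-type":
            # HTML 4 form: extract "charset=..." from the content attribute.
            for part in attributes.get("content", "").split(";"):
                key, _, value = part.strip().partition("=")
                if key.lower() == "charset":
                    self.charset = value.strip()
                    break


def declared_charset(html: str):
    finder = CharsetFinder()
    finder.feed(html)
    finder.close()
    return finder.charset


print(declared_charset('<meta http-equiv="Content-Type" content="text/html;charset=ISO-8859-1">'))
print(declared_charset('<meta charset="ISO-8859-1">'))
```

Both calls print `ISO-8859-1`. A page without either tag yields `None`.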
UTF-8 Character Set
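UTF-8 is a variable-width encoding that can represent every character in the Unicode standard, using one to four bytes per character.
- Matches ASCII byte for byte for 0–127 (one byte each)
- Treats 128–159 as C1 control characters, as ISO-8859-1 does, rather than as the special characters of ANSI
- Uses the same characters as ANSI and ISO-8859-1 for 160–255, but stores each of them in two bytes
- Supports more than a million additional code points, up to U+10FFFF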
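To make the variable-width idea concrete, here is a minimal hand-written UTF-8 encoder in Python. The function name `utf8_encode` is hypothetical and is used for illustration only. The encoder shows why code points 160–255 keep their ISO-8859-1 numbers but need two bytes, and it cross-checks each result against Python's built-in encoder.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode code point by hand, following the UTF-8 bit layout."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp < 0x80:  # 1 byte: 0xxxxxxx (identical to ASCII)
        return bytes([cp])
    if cp < 0x800:  # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:  # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


# "A" (U+0041), "é" (U+00E9), "€" (U+20AC), and an emoji (U+1F600).
for cp in (0x41, 0xE9, 0x20AC, 0x1F600):
    encoded = utf8_encode(cp)
    # Cross-check against Python's built-in encoder.
    assert encoded == chr(cp).encode("utf-8")
    print(f"U+{cp:04X}", len(encoded), encoded.hex(" "))
```

The four code points need 1, 2, 3, and 4 bytes respectively, so "é" becomes `c3 a9` even though its code point, 233, is the same as in ISO-8859-1.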
Usage in HTML
<meta charset="UTF-8">
Common Character Encodings
- UTF-8: The most widely used encoding, supporting every character in Unicode. Example: <meta charset="UTF-8">.
- ISO-8859-1 (Latin-1): Supports most Western European languages. Example: <meta charset="ISO-8859-1">.
- Windows-1252: Similar to ISO-8859-1, with additional printable characters in the 128–159 range. Example: <meta charset="Windows-1252">.
- UTF-16: Rarely used on the web. Browsers treat <meta charset="UTF-16"> as UTF-8, so a UTF-16 page must instead be identified by a byte order mark or the HTTP Content-Type header.
- ISO-8859-2: Supports Central and Eastern European languages. Example: <meta charset="ISO-8859-2">.
- GBK: Used for Simplified Chinese characters. Example: <meta charset="GBK">.
- Shift_JIS: Used for Japanese text. Example: <meta charset="Shift_JIS">.
- EUC-KR: Used for the Korean language. Example: <meta charset="EUC-KR">.
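Each legacy encoding covers only its own repertoire, while UTF-8 covers them all. The sketch below encodes illustrative sample words with Python's codecs. The dictionary keys are Python codec labels chosen to mirror the HTML labels above, and UTF-16 is left out.

```python
# Illustrative sample text for several encodings from the list above.
SAMPLES = {
    "utf-8": "café ř 日本語",
    "iso-8859-1": "café",
    "windows-1252": "café €",
    "iso-8859-2": "Dvořák",
    "gbk": "中文",
    "shift_jis": "日本語",
    "euc-kr": "한국어",
}

for name, text in SAMPLES.items():
    data = text.encode(name)
    # Each sample round-trips through its own encoding.
    assert data.decode(name) == text
    print(f"{name:12} {len(text):2} characters -> {len(data):2} bytes")

# A Western single-byte encoding has no slot for Czech "ř" (U+0159).
try:
    "\u0159".encode("iso-8859-1")
    latin1_has_r_caron = True
except UnicodeEncodeError:
    latin1_has_r_caron = False
print(latin1_has_r_caron)  # False
```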
Values
character-set: Specifies the character encoding for the document, such as UTF-8 or ISO-8859-1.
Applies To
The charset attribute is used in the following HTML element:
- <meta>
Character Set Comparison
The table below summarizes the key differences between the character sets described above. Codes are decimal; the UTF-8 column shows the Unicode character with that code point.
| Code | ASCII | ANSI (Windows-1252) | ISO-8859-1 | UTF-8 | Description |
|---|---|---|---|---|---|
| 0 | NUL | NUL | NUL | NUL | Null character |
| 1 | SOH | SOH | SOH | SOH | Start of Heading |
| 2 | STX | STX | STX | STX | Start of Text |
| 3 | ETX | ETX | ETX | ETX | End of Text |
| 4 | EOT | EOT | EOT | EOT | End of Transmission |
| 5 | ENQ | ENQ | ENQ | ENQ | Enquiry |
| 6 | ACK | ACK | ACK | ACK | Acknowledgment |
| 7 | BEL | BEL | BEL | BEL | Bell |
| 8 | BS | BS | BS | BS | Backspace |
| 9 | TAB | TAB | TAB | TAB | Horizontal Tab |
| 10 | LF | LF | LF | LF | Line Feed |
| 32 | Space | Space | Space | Space | Space |
| 48-57 | 0-9 | 0-9 | 0-9 | 0-9 | Digits |
| 65-90 | A-Z | A-Z | A-Z | A-Z | Uppercase Latin letters |
| 97-122 | a-z | a-z | a-z | a-z | Lowercase Latin letters |
| 128-159 | Not defined | Special characters (€, “, ”, ™, etc.) | Control characters (no printable characters) | Control characters (no printable characters) | Only ANSI assigns printable characters here |
| 160 | Not defined | Non-breaking space | Non-breaking space | Non-breaking space | Non-breaking space |
| 161-255 | Not defined | Various | Various | Various | Extended characters, identical in ANSI, ISO-8859-1, and UTF-8 (two bytes each in UTF-8) |
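Individual rows of the table can be reproduced by decoding a single byte value with each of Python's codecs. In the sketch below, the function `compare` is a hypothetical helper. Its UTF-8 column lists the bytes needed to store the code point with that number.

```python
def compare(code: int) -> tuple:
    """Return how one code value behaves in each column of the table above."""
    raw = bytes([code])

    def decode(name: str) -> str:
        try:
            ch = raw.decode(name)
        except UnicodeDecodeError:
            return "(not defined)"
        return f"U+{ord(ch):04X}"

    return (
        code,
        decode("ascii"),
        decode("cp1252"),
        decode("latin-1"),
        # UTF-8 column: bytes needed to store the code point with this number.
        chr(code).encode("utf-8").hex(" "),
    )


for code in (65, 128, 160, 233):
    print(compare(code))
# (65, 'U+0041', 'U+0041', 'U+0041', '41')
# (128, '(not defined)', 'U+20AC', 'U+0080', 'c2 80')
# (160, '(not defined)', 'U+00A0', 'U+00A0', 'c2 a0')
# (233, '(not defined)', 'U+00E9', 'U+00E9', 'c3 a9')
```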
Conclusion
The charset attribute is essential for defining a webpage’s character encoding, ensuring correct text display, particularly for non-ASCII characters. While older encodings like ASCII, ANSI, and ISO-8859-1 were once common, UTF-8 is now the standard due to its wide-ranging language support. Using <meta charset="UTF-8"> is strongly recommended for modern web development, as it ensures compatibility and consistent text rendering across various platforms.