Basic Latin (Unicode block)
The Basic Latin Unicode block is the first block of the Unicode standard and is identical to the American Standard Code for Information Interchange (ASCII) character set. It contains the 128 code points from U+0000 to U+007F.
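As a minimal sketch of what membership in this block means in practice, the following Python snippet tests whether every character of a string falls in the range U+0000 to U+007F. The function name `is_basic_latin` is a hypothetical helper chosen for illustration; Python 3.7 and later also provide the built-in `str.isascii()`, which performs the same check.

```python
def is_basic_latin(text: str) -> bool:
    """Return True if every character lies in U+0000..U+007F."""
    # ord() gives the Unicode code point of a single character.
    return all(ord(ch) <= 0x7F for ch in text)


print(is_basic_latin("Hello, world!"))  # True
print(is_basic_latin("café"))           # False: 'é' is U+00E9
```

An empty string counts as Basic Latin under this definition, which matches the behaviour of `str.isascii()`.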
This block covers the letters, digits, and punctuation needed to write English. Many other Western languages also rely on it, although most need additional accented letters from later blocks. It includes:
- Control characters (U+0000 to U+001F and U+007F): These are non-printing characters used for controlling devices or formatting text. Examples include Null (U+0000), Line Feed (U+000A), and Delete (U+007F).
- Basic punctuation marks and symbols (U+0020 to U+002F, U+003A to U+0040, U+005B to U+0060, U+007B to U+007E): These ranges include characters such as space, exclamation mark, quotation mark, number sign, dollar sign, percent sign, ampersand, apostrophe, parentheses, asterisk, plus sign, comma, hyphen-minus, period, slash, colon, semicolon, less-than sign, equals sign, greater-than sign, question mark, at sign, opening square bracket, backslash, closing square bracket, caret, underscore, grave accent, opening curly brace, vertical bar, closing curly brace, and tilde.
- Digits (U+0030 to U+0039): These are the numerals 0 through 9.
- Uppercase letters (U+0041 to U+005A): These are the uppercase letters A through Z.
- Lowercase letters (U+0061 to U+007A): These are the lowercase letters a through z.
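The ranges listed above can be sketched as a small Python classifier. The function name `classify_basic_latin` and the category labels are assumptions made for this illustration; they are not part of any standard API.

```python
from collections import Counter


def classify_basic_latin(ch: str) -> str:
    """Map one Basic Latin character to the category listed above."""
    cp = ord(ch)
    if cp > 0x7F:
        raise ValueError(f"U+{cp:04X} is outside the Basic Latin block")
    if cp <= 0x1F or cp == 0x7F:
        return "control"
    if 0x30 <= cp <= 0x39:
        return "digit"
    if 0x41 <= cp <= 0x5A:
        return "uppercase"
    if 0x61 <= cp <= 0x7A:
        return "lowercase"
    # Everything left is in U+0020..U+002F, U+003A..U+0040,
    # U+005B..U+0060, or U+007B..U+007E.
    return "punctuation or symbol"


# Tally all 128 code points of the block by category.
counts = Counter(classify_basic_latin(chr(cp)) for cp in range(0x80))
print(counts)
```

Tallying the whole block this way gives 33 control characters, 33 punctuation marks and symbols (including the space), 10 digits, and 26 letters in each case, for a total of 128.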
Because it coincides with ASCII, the Basic Latin block is supported by virtually all character encodings in current use and by virtually all computer systems, which makes it a cornerstone of modern computing and text processing. In UTF-8, each Basic Latin character is encoded as a single byte with the same value it has in ASCII, so any ASCII text is already valid UTF-8. Characters outside this block cannot be represented in ASCII at all; they require an encoding with a larger repertoire, such as UTF-8 (where they occupy two to four bytes) or UTF-16.
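To illustrate how encoded size differs inside and outside the block, the following Python sketch compares a Basic Latin letter with three characters from other blocks. The helper name `encoded_lengths` and the sample characters are chosen here purely for illustration.

```python
def encoded_lengths(ch: str):
    """Return (UTF-8 byte count, UTF-16 byte count) for one character."""
    # "utf-16-be" is used so that no byte order mark is prepended.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


for ch in ["A", "é", "€", "😀"]:
    utf8_len, utf16_len = encoded_lengths(ch)
    print(f"U+{ord(ch):04X}: UTF-8 {utf8_len} byte(s), UTF-16 {utf16_len} byte(s)")

# Basic Latin text has the same bytes in ASCII and in UTF-8.
print("A".encode("ascii") == "A".encode("utf-8"))  # True
```

Only the Basic Latin letter fits in a single UTF-8 byte. Attempting `"é".encode("ascii")` raises `UnicodeEncodeError`, since ASCII has no code for characters outside this block.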