What is Unicode?

Unicode is a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. Its objective is to replace the many incompatible encoding schemes with a single one, so that the same text is interpreted the same way on every computer. The standard currently defines well over 100,000 characters and is published by the Unicode Consortium (http://unicode.org).

The Unicode standard has several character encoding forms:

  • UTF-8: uses one byte (8 bits) for each ASCII character, which covers unaccented English text, and a sequence of two to four bytes for every other character. UTF-8 is widely used in email systems and on the Internet.
  • UTF-16: uses two bytes (16 bits) to encode the most commonly used characters. Each of the remaining characters is represented by a pair of 16-bit numbers, called a surrogate pair.
  • UTF-32: uses four bytes (32 bits) to encode every character. As the Unicode standard grew, it became apparent that a 16-bit number is too small to represent all the characters. UTF-32 can represent every Unicode character as a single number.
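The size differences between the three encoding forms can be checked with a short Java sketch. The class and method names below are made up for illustration. The UTF-32 size is computed as four bytes per code point, following the definition above, instead of calling a UTF-32 encoder, so that the example relies only on charsets every Java platform is required to provide.

```java
import java.nio.charset.StandardCharsets;

final class EncodingLengths {

    // Bytes needed to encode the text as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed to encode the text as UTF-16.
    // UTF_16BE is used so that no byte order mark is added to the count.
    static int utf16Length(String text) {
        return text.getBytes(StandardCharsets.UTF_16BE).length;
    }

    // Bytes needed in UTF-32, computed by definition: four bytes per code point.
    static int utf32Length(String text) {
        return 4 * text.codePointCount(0, text.length());
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",              // U+0041, an ASCII letter
            "\u00E9",         // U+00E9, e with acute accent
            "\u20AC",         // U+20AC, euro sign
            "\uD83D\uDE00"    // U+1F600, written as a surrogate pair
        };
        for (String s : samples) {
            System.out.printf("U+%04X  UTF-8: %d  UTF-16: %d  UTF-32: %d%n",
                    s.codePointAt(0), utf8Length(s), utf16Length(s), utf32Length(s));
        }
    }
}
```

For these four samples the UTF-8 sizes are 1, 2, 3 and 4 bytes, the UTF-16 sizes are 2, 2, 2 and 4 bytes, and the UTF-32 size is always 4 bytes.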

UTF stands for Unicode Transformation Format.

ASCII, which stands for American Standard Code for Information Interchange, became the first widespread encoding scheme. However, it defines only 128 characters. This is enough for the most common English letters, digits, and punctuation, but it was limiting for the rest of the world. Depending on where you were, a different character might be displayed for the same code, because national variants of ASCII and the 8-bit extensions built on top of it assigned different characters to the same values. Other parts of the world created their own encoding schemes, some of which used a different number of bytes per character. Programs then had to figure out which encoding scheme a given piece of text was using. The Unicode standard was created to overcome these problems.
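The problem of having to know which encoding is in use can be reproduced in a few lines of Java. The sketch below is only an illustration, with made-up class and method names: it encodes one character as UTF-8 and then decodes the same bytes twice, once with the right charset and once with ISO-8859-1, a common 8-bit extension of ASCII.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DecodingMismatch {

    // The character U+00E9 encoded as UTF-8, which gives the two bytes 0xC3 0xA9.
    static byte[] sampleBytes() {
        return "\u00E9".getBytes(StandardCharsets.UTF_8);
    }

    // Interpret the given bytes according to the given charset.
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        byte[] bytes = sampleBytes();
        String right = decode(bytes, StandardCharsets.UTF_8);
        String wrong = decode(bytes, StandardCharsets.ISO_8859_1);

        // Code points are printed as numbers so the output does not depend
        // on the encoding of the console.
        right.codePoints().forEach(cp -> System.out.printf("UTF-8 reading:      U+%04X%n", cp));
        wrong.codePoints().forEach(cp -> System.out.printf("ISO-8859-1 reading: U+%04X%n", cp));
    }
}
```

Read as UTF-8, the two bytes give back the single character U+00E9. Read as ISO-8859-1, the same two bytes become two unrelated characters, U+00C3 and U+00A9.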

 

ASCII vs Unicode

ASCII is the lowest common denominator of character sets. ASCII has only 128 characters, but Unicode has more than 100,000. A Unicode escape can be used to insert any Unicode character into a program using only ASCII characters. In Java, for example, a Unicode escape is written as \u followed by four hexadecimal digits, so \u0041 is the letter A. A Unicode escape means exactly the same thing as the character that it represents.

Unicode escapes are designed for use when a programmer needs to insert a character that can't be represented in the source file's character set. They are used primarily to put non-ASCII characters into identifiers, string literals, character literals, and comments. Occasionally, a Unicode escape adds to the clarity of a program by positively identifying one of several similar-looking characters.
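The uses described above can be sketched in Java as follows. The class, method, and constant names are hypothetical and chosen only for this illustration. The source text is pure ASCII, yet it contains a non-ASCII character in a string literal, an escape inside an identifier, and three look-alike capital letters that the escapes identify unambiguously.

```java
final class UnicodeEscapes {

    // A string literal containing U+00E9 (e with acute accent), written with ASCII only.
    static String escapedString() {
        return "caf\u00E9";
    }

    // An escape inside an identifier: U+0063 is the letter c,
    // so the variable declared here is simply named "count".
    static int escapedIdentifier() {
        int \u0063ount = 3;
        return count + 1;
    }

    // Three capital letters that look alike in many fonts but are distinct characters.
    static final char LATIN_A = '\u0041';
    static final char GREEK_ALPHA = '\u0391';
    static final char CYRILLIC_A = '\u0410';

    public static void main(String[] args) {
        System.out.println(escapedString().length());          // 4
        System.out.println((int) escapedString().charAt(3));   // 233
        System.out.println(escapedIdentifier());               // 4
        System.out.println(LATIN_A == 'A');                    // true
        System.out.println(GREEK_ALPHA == 'A');                // false
        System.out.println(CYRILLIC_A == GREEK_ALPHA);         // false
    }
}
```

Because the compiler translates escapes before it does anything else with the source, the identifier example compiles exactly as if the letter c had been typed directly.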
