The following sections describe character sets and character encodings as they apply to Java EE 6.
A character set is a set of textual and graphic symbols, each of which is mapped to a non-negative integer.
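The mapping from symbols to integers can be seen directly in Java, where strings are sequences of Unicode characters. The following is a minimal sketch; the class name CodePointDemo and the helper firstCodePoint are illustrative names, not part of any Java EE API.

```java
final class CodePointDemo {
    // Returns the Unicode code point (a non-negative integer) of the first character in s.
    static int firstCodePoint(String s) {
        return s.codePointAt(0);
    }

    public static void main(String[] args) {
        // Latin capital A, e with acute accent, and the euro sign,
        // written as Unicode escapes so the source file encoding does not matter.
        String[] samples = {"A", "\u00E9", "\u20AC"};
        for (String s : samples) {
            System.out.printf("U+%04X%n", firstCodePoint(s));
        }
        // Prints:
        // U+0041
        // U+00E9
        // U+20AC
    }
}
```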
A character encoding maps a character set to units of a specific width and defines byte serialization and ordering rules. Many character sets have more than one encoding.
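As an illustration of unit width and byte ordering, the sketch below encodes the same character with three Unicode encodings from java.nio.charset.StandardCharsets. The class name ByteOrderDemo and the helper encode are hypothetical names chosen for this example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ByteOrderDemo {
    // Serializes the text to bytes according to the rules of the given encoding.
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        String text = "A"; // code point 65 (U+0041)

        // UTF-16 uses 16-bit units; the two variants differ only in byte order.
        System.out.println(Arrays.toString(encode(text, StandardCharsets.UTF_16BE))); // [0, 65]
        System.out.println(Arrays.toString(encode(text, StandardCharsets.UTF_16LE))); // [65, 0]

        // UTF-8 uses 8-bit units, so byte order is not an issue.
        System.out.println(Arrays.toString(encode(text, StandardCharsets.UTF_8)));    // [65]
    }
}
```

The character and its integer value are the same in all three cases; only the serialization differs.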
For example, Java programs can represent Japanese character sets using the EUC-JP or Shift-JIS encodings, among others. Each encoding has its own rules for representing and serializing the characters.
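The difference between the two Japanese encodings can be sketched as follows. This example assumes the runtime ships the extended charsets (Java names "EUC-JP" and "Shift_JIS"), which is typical for a full JDK but not guaranteed on a trimmed-down runtime, so availability is checked first. JapaneseEncodingDemo, toHex, and encodeHex are illustrative names.

```java
import java.nio.charset.Charset;

final class JapaneseEncodingDemo {
    // Formats bytes as space-separated uppercase hex, for example "82 A0".
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encodes the text with the named charset and returns the bytes as hex.
    static String encodeHex(String text, String charsetName) {
        return toHex(text.getBytes(Charset.forName(charsetName)));
    }

    public static void main(String[] args) {
        String hiraganaA = "\u3042"; // HIRAGANA LETTER A
        for (String name : new String[] {"EUC-JP", "Shift_JIS"}) {
            if (Charset.isSupported(name)) {
                System.out.println(name + ": " + encodeHex(hiraganaA, name));
            } else {
                // Assumption noted above: extended charsets may be absent.
                System.out.println(name + ": not available in this runtime");
            }
        }
    }
}
```

When both charsets are available, the same character is written as two different byte pairs, which is why a reader must know the encoding before decoding the bytes.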
The ISO 8859 series defines 13 character encodings that can represent text in dozens of languages. Each ISO 8859 character encoding can have up to 256 characters.
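Because an ISO 8859 encoding has at most 256 characters, every character it supports fits in one byte, and characters outside its repertoire cannot be encoded. A minimal sketch using ISO-8859-1 (Latin-1) follows; the class and method names are illustrative only.

```java
import java.nio.charset.StandardCharsets;

final class Latin1Demo {
    // Reports whether ISO-8859-1 has a mapping for the given character.
    static boolean fitsInLatin1(char c) {
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(c);
    }

    // Encodes the text as ISO-8859-1; unmappable characters are replaced with '?'.
    static byte[] toLatin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        char eAcute = '\u00E9'; // within the Latin-1 range
        char euro = '\u20AC';   // outside the Latin-1 range

        System.out.println(fitsInLatin1(eAcute)); // true
        System.out.println(fitsInLatin1(euro));   // false

        // One character becomes exactly one byte: 0xE9.
        System.out.println(toLatin1(String.valueOf(eAcute))[0] & 0xFF); // 233

        // The euro sign has no mapping, so getBytes substitutes '?' (63).
        System.out.println(toLatin1(String.valueOf(euro))[0]); // 63
    }
}
```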
UTF-8 (Unicode Transformation Format, 8-bit form) is a variable-width character encoding that encodes each Unicode character as one to four bytes. A byte in UTF-8 is equivalent to 7-bit ASCII if its high-order bit is zero; otherwise, the byte is part of a multibyte sequence that encodes a single character.
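The variable width of UTF-8 can be observed by encoding characters from different parts of Unicode and counting the resulting bytes. The sketch below uses hypothetical helper names (utf8Length, isAsciiByte) and is meant only as an illustration.

```java
import java.nio.charset.StandardCharsets;

final class Utf8WidthDemo {
    // Number of bytes UTF-8 needs to encode the given text.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    // A UTF-8 byte stands for a 7-bit ASCII character when its high-order bit is zero.
    static boolean isAsciiByte(byte b) {
        return (b & 0x80) == 0;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("A"));            // 1 (U+0041, ASCII)
        System.out.println(utf8Length("\u00E9"));       // 2 (U+00E9, e with acute accent)
        System.out.println(utf8Length("\u20AC"));       // 3 (U+20AC, euro sign)
        System.out.println(utf8Length("\uD83D\uDE00")); // 4 (U+1F600, written as a surrogate pair)

        byte[] ascii = "A".getBytes(StandardCharsets.UTF_8);
        byte[] accented = "\u00E9".getBytes(StandardCharsets.UTF_8);
        System.out.println(isAsciiByte(ascii[0]));    // true
        System.out.println(isAsciiByte(accented[0])); // false: first byte of a two-byte sequence
    }
}
```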
UTF-8 is compatible with the majority of existing web content and provides access to the Unicode character set. Current versions of browsers and email clients support UTF-8. In addition, many new web standards specify UTF-8 as their character encoding.