A character set is really just that: a set of characters. These are the characters of your alphabet. For practical reasons, each character is identified by a code point; for example, $A is identified by the code point 65.
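The mapping between a character and its code point can be sketched in a few lines. The snippet below uses Python purely for illustration (it is an assumption of this example, not something the text prescribes) and relies only on the built-in functions ord and chr.

```python
# Map a character to its code point and back.
code_point = ord("A")        # character -> integer code point
character = chr(code_point)  # integer code point -> character

print(code_point)  # 65
print(character)   # A
```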
Examples of character sets are ASCII, ISO-8859-1, Unicode, and UCS (Universal Character Set).
- ASCII (American Standard Code for Information Interchange) contains 128 characters. It was designed under several constraints, one of which was that it should be easy to go from a lowercase character to its uppercase equivalent. You can get the list of characters at http://en.wikipedia.org/wiki/Ascii. ASCII was designed with the idea that other countries could plug in their own specific characters, but this did not work out in practice. ASCII was later extended into various 8-bit variants, collectively called Extended ASCII, which offer 256 characters.
- ISO-8859-1 (ISO/IEC 8859-1) is a superset of ASCII, to which it adds 128 new characters. Also called Latin-1 or latin1, it is the standard character set for the Latin alphabet and is well suited to Western Europe, the Americas, and parts of Africa. Since ISO-8859-1 lacks certain characters such as the Euro sign, it was revised as ISO-8859-15. However, ISO-8859-1 was long the default encoding of documents delivered via HTTP with a MIME type beginning with "text/" (as specified in RFC 2616). The table at http://www.utoronto.ca/webdocs/HTMLdocs/NewHTML/iso_table.html shows ISO-8859-1 in particular.
- Unicode is a superset of Latin-1. To accelerate the early adoption of Unicode, its first 256 code points are identical to those of ISO-8859-1. A character is not described by its glyph but identified by its code point, which is usually written as "U+" followed by its hexadecimal value, e.g. U+0041 for A. Note that Unicode also specifies rules for normalization, collation, bidirectional display order, and much more.
- UCS (Universal Character Set), specified by the ISO/IEC 10646 International Standard, contains more than a hundred thousand characters. Each character is unambiguously identified by a name and by an integer called its code point.
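A few of the properties listed above can be checked directly. The following is a minimal sketch in Python, chosen only for illustration; the helper names ascii_upper, u_plus, and ucs_name are hypothetical and appear nowhere in the standards themselves. It shows the ASCII case relationship, the identity between Latin-1 and the first 256 Unicode code points, the "U+" notation, and the character names.

```python
import unicodedata

def ascii_upper(ch):
    """Uppercase an ASCII lowercase letter by clearing bit 0x20.

    In ASCII, 'a' (97) and 'A' (65) differ by exactly 32, so a single
    bit separates each lowercase letter from its uppercase equivalent.
    """
    if "a" <= ch <= "z":
        return chr(ord(ch) & ~0x20)
    return ch

# Every Latin-1 byte value decodes to the Unicode code point
# that has the same number.
latin1_matches_unicode = all(
    ord(bytes([n]).decode("latin-1")) == n for n in range(256)
)

def u_plus(ch):
    """Format a character's code point in the usual U+XXXX notation."""
    return f"U+{ord(ch):04X}"

def ucs_name(ch):
    """Return the standard name that identifies the character."""
    return unicodedata.name(ch)

print(ascii_upper("a"))        # A
print(latin1_matches_unicode)  # True
print(u_plus("\u20ac"))        # U+20AC
print(ucs_name("\u20ac"))      # EURO SIGN
```

The Euro sign (U+20AC) used in the last two lines lies outside the first 256 code points, which is consistent with its absence from ISO-8859-1.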
http://www.fileformat.info/info/charset/index.htm shows several character sets.
In Pharo. Now let us see how these concepts exist in Pharo. The relationship between String and WideString is roughly analogous to that between Integer and LargeInteger. The class Integer is the abstract superclass of SmallInteger, which represents numbers in the range -1073741824 to 1073741823 (in a 32-bit image), and LargeInteger, which represents all the other numbers. Similarly, in Pharo the class String is the abstract superclass of the classes ByteString (ISO-8859-1) and WideString (Unicode minus ISO-8859-1). These classes are about character sets, not encodings.
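The analogy can be made concrete with a small model. The sketch below is written in Python for illustration only and is not Pharo code: the functions pharo_string_class and pharo_integer_class are hypothetical helpers that merely name the class whose range would cover a given value, assuming the 0..255 split for strings and the integer range quoted in the text.

```python
# Hypothetical model, not Pharo's implementation: pick the class whose
# character set is large enough for every character of `text`.
def pharo_string_class(text):
    # ByteString covers code points 0..255 (the ISO-8859-1 range).
    if all(ord(ch) < 256 for ch in text):
        return "ByteString"
    # Anything beyond that range needs WideString.
    return "WideString"

# SmallInteger range as quoted in the text.
SMALL_MIN, SMALL_MAX = -2**30, 2**30 - 1  # -1073741824 .. 1073741823

# Hypothetical model of the same split on the integer side.
def pharo_integer_class(n):
    if SMALL_MIN <= n <= SMALL_MAX:
        return "SmallInteger"
    return "LargeInteger"

print(pharo_string_class("\u00e9t\u00e9"))  # ByteString (e-acute is 233)
print(pharo_string_class("\u20ac"))         # WideString (Euro sign is 8364)
print(pharo_integer_class(1073741823))      # SmallInteger
print(pharo_integer_class(1073741824))      # LargeInteger
```

In both cases the choice depends only on the range of values to be represented, not on how those values are serialized to bytes, which is why these classes concern character sets rather than encodings.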