Dynamic Web Development with Seaside

17.6.3 In Seaside and Pharo

Now let us see how these principles apply to Pharo. Unicode support was introduced in version 3.8 of Squeak and has been consolidated gradually since then. You can still develop applications with different encodings with Seaside. There is an important rule in Seaside about the encoding: “do unto Seaside as you would have Seaside do unto you”. This means that if you run an encoded web server adapter such as WAKomEncoded, Seaside will give you strings in the specified encoding but will also expect strings in that encoding from you. In the Squeak encoding, each character is represented by an instance of Character. If a string contains characters outside Latin-1, it will be an instance of WideString. If all its characters are in Latin-1, it will be a ByteString.
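The difference between the two string classes can be checked in a workspace. The following is a minimal sketch, assuming a Pharo or Squeak image in which String class>>with: picks the string class from the character value and the testing selectors isByteString and isWideString are available; the variable names are only illustrative.

```smalltalk
| latin wide |

"ä is U+00E4. It is inside Latin-1, so one byte per character is enough."
latin := String with: (Character value: 16rE4).
latin isByteString.    "expected: true"
latin isWideString.    "expected: false"

"The euro sign is U+20AC. Its value is above 255, so a WideString is needed."
wide := String with: (Character value: 16r20AC).
wide isWideString.     "expected: true"
wide size.             "expected: 1, still a single Character"
```

In both cases the string holds exactly one Character; only the storage class differs.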

WAKomEncoded. WAKomEncoded takes one or more bytes of UTF-8 and maps them to a single character (and vice versa). This allows it to support every character in Unicode, of which there are more than 100,000. The following code shows how to start the encoding adapter.

"Start on a different port from your standard (WAKom) port"
WAKomEncoded startOn: 8081

WAKom. WAKom, in contrast, performs a one-to-one mapping from bytes to characters. This works if and only if your character set has 256 or fewer entries and your encoding maps each character to exactly one byte. Examples of such combinations are ASCII and ISO-8859-1 (Latin-1).

If you run a non-encoded web server adapter like WAKom, Seaside will give you strings in the encoding of the web page (!) and expect from you strings in the encoding of the web page.

Example. If you have the character ä in a UTF-8 encoded page and you run an encoding server adapter like WAKomEncoded, this character is represented by the Squeak string:

String with: (Character value: 16rE4)

However if you run an adapter like WAKom, the same character ä is represented by the Squeak string:

String with: (Character value: 16rC3) with: (Character value: 16rA4).

Yes, that is a string with two Characters! How can this be? Because ä (the Unicode character U+00E4) is encoded in UTF-8 as the two-byte sequence 0xC3 0xA4, and WAKom does not interpret that sequence; it simply maps each of the two bytes to a Character.
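The two representations can be reproduced without a web server. This is only a sketch: it assumes a recent Pharo image in which String>>utf8Encoded and ByteArray>>utf8Decoded are present (older Squeak images may offer different selectors for the same conversion), and it imitates what a non-encoding adapter does by turning each byte into one Character with ByteArray>>asString.

```smalltalk
| text bytes misread |

"What WAKomEncoded hands to the application: one Character for ä."
text := String with: (Character value: 16rE4).

"The UTF-8 bytes that travel over the wire: 16rC3 16rA4."
bytes := text utf8Encoded.
bytes size.                "expected: 2"

"Byte-for-byte mapping, with no decoding, as a non-encoding adapter does."
misread := bytes asString.
misread size.              "expected: 2"
misread = (String with: (Character value: 16rC3) with: (Character value: 16rA4)).
                           "expected: true"

"Decoding the bytes as UTF-8 gives back the single-character string."
bytes utf8Decoded = text.  "expected: true"
```

The bytes are identical in both cases; what changes is whether they are decoded before they reach your code.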

Use UTF-8. Try to use UTF-8 for your external encodings: because it supports all of Unicode, it gives you access to the largest character set. Then use WAKomEncoded; this way your internal strings hold one Character per Unicode character, stored as WideString where necessary. WAKomEncoded will convert between these internal strings and UTF-8 for both requests and responses.

To see if your encoding works, go to http://localhost:8080/tests/alltests and then to the “Encoding” test (select WAEncodingTest). There is a link there to a page with a lot of foreign characters. Pick the most foreign text you can find, paste it into the upper input field and submit it, then repeat this for the lower field.

Telling the browser the encoding. Once you have decided which encoding to use, and Seaside sends pages to the browser in that encoding, the browser must be told which encoding that is. Seaside does this automatically for you, using the value returned by charSet. To change it, override charSet in your session class (the default is 'utf-8' in Squeak). In Seaside 3.0 this is a configuration setting in the application.

The charset setting ensures that both the HTTP response header and the generated HTML specify the encoding, as shown below.

Content-Type:text/html;charset=utf-8

<meta content="text/html;charset=utf-8"
http-equiv="Content-Type"/>

Now you should understand a little more about character encodings and how Seaside deals with them. Note that the contents of uploaded files are not decoded, even if you use WAKomEncoded. In addition, be aware that other parts of your application may have to deal with the same issues: LDAP, the database, the host OS, etc.

Copyright © 19 March 2024 Stéphane Ducasse, Lukas Renggli, C. David Shaffer, Rick Zaccone
This book is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 license.

This book is published using Seaside, Magritte and the Pier book publishing engine.