2011-04-20

Unicode again

Google just claimed to have Blogger "rewritten from scratch". Let's see whether they undestroyed the Unicode handling (broken since 2008).

This is a cuneiform dingir (U+1202D), entered using the GTK input method (C-S-u, hex number, space to finish): ��

This is the same dingir, entered as an HTML entity (ampersand, hash, letter x, hex digits 1202d, semicolon): ��
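For anyone who wants to generate such entities instead of typing them by hand, here is a minimal Python sketch. The helper name `to_hex_entity` is my own invention for illustration, not part of any library; only `html.unescape` comes from the standard library.

```python
import html

# The dingir, written as a Python escape so the source file itself stays ASCII.
DINGIR = "\U0001202D"


def to_hex_entity(ch: str) -> str:
    """Return the hexadecimal numeric character reference for one character.

    Hypothetical helper for illustration only.
    """
    return f"&#x{ord(ch):x};"


entity = to_hex_entity(DINGIR)
print(entity)  # &#x1202d;

# Round trip: decoding the entity gives back the original character.
print(html.unescape(entity) == DINGIR)  # True
```

The entity consists only of ASCII characters, which is presumably why it has a better chance of surviving a pipeline that mangles anything outside the Basic Multilingual Plane.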

First visible change: the "preview" button opens a new tab. There, the directly entered dingir is already broken (replaced by a pair of black diamonds with question marks inside), while the entitized one works. Now let's save and look.
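Why a pair of diamonds for a single character? My guess (and it is only a guess, I have no idea what Blogger does internally) is that something in the pipeline works on UTF-16 code units one at a time. U+1202D lies outside the Basic Multilingual Plane, so in UTF-16 it takes two code units (a surrogate pair), and neither half is a valid character on its own. The Python sketch below computes the pair and then imitates such a hypothetical buggy decoder; `naive_per_unit_decode` is made up for illustration.

```python
DINGIR = "\U0001202D"

# Encode as big-endian UTF-16 and split the four bytes into two 16-bit code units.
utf16 = DINGIR.encode("utf-16-be")
units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]
print(" ".join(f"{u:04X}" for u in units))  # D808 DC2D


def naive_per_unit_decode(code_units):
    """Imitate a hypothetical decoder that treats every code unit as a
    standalone character and replaces lone surrogates with U+FFFD."""
    return "".join(
        "\uFFFD" if 0xD800 <= u <= 0xDFFF else chr(u)
        for u in code_units
    )


broken = naive_per_unit_decode(units)
print(len(broken))  # 2
print(broken == "\uFFFD\uFFFD")  # True
```

Two code units in, two replacement characters out, which at least matches what the preview shows.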

Edit after looking: The directly entered dingir is now destroyed even in the source code (I did all editing in the HTML editing subtab), but the entity is still there. So at least there is one way to enter Unicode, albeit an inconvenient one.

Edit 2: Fun fact: If I search my blog for "unicode", the result box shows my old post (2008-02-19) with correctly displayed cuneiform characters. If I click on the search result, it gives me a page with the same characters broken.

Edit 3: After adding "Edit 2", Google broke the HTML entity as well. Now I'm stuck with only questionable diamonds (a diamond might be a girl's best friend, but I'm a man).