Whirlycott / Philip Jacob

Sunday, May 28, 2006

More UTF-8 head-thumping with Hibernate 3

Filed under: Technology — Philip Jacob @ 3:45 am

I’ve just finished upgrading an application component so that it uses Hibernate 3 instead of Hibernate 2. The last time I tried to do this, I spent half a day on it, realised that none of my UTF-8 encoded data was working and simply abandoned the effort. But I was feeling brave, so I tried again.

Since I’m running on OS X, the first thing I had to learn is that Firefox on OS X doesn’t handle a lot of Indic text properly (notice that this bug is just over three years old!). That makes it really hard to test, because you end up looking at question marks instead of Tamil! Solution: use Safari. It works fine. A good test page is the OpenOffice Tamil intro.

But it still wasn’t working for me. I fell back on the technique that I used in an earlier blog post, specifically MD5 checksums of the text in question. And, yes, there was definitely a problem.
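
For readers who haven’t seen the earlier post, here is a rough sketch of the checksum idea in Java. It is an illustration, not the original code: the class name Md5Check, the method names and the sample Tamil word are all assumptions made for the example. The point is only that text which survives a round trip keeps its MD5, while text mangled by a lossy charset conversion does not.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, named for this example only.
public class Md5Check {

    // Hex-encoded MD5 of the string's UTF-8 bytes.
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Simulates a layer that squeezes text through ISO-8859-1:
    // characters the charset cannot represent are replaced with '?'.
    static String throughLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"; // the word "Tamil" in Tamil script
        String stored = throughLatin1(original);            // stand-in for what came back from the database

        System.out.println("before: " + md5Hex(original));
        System.out.println("after:  " + md5Hex(stored));
        System.out.println("intact: " + md5Hex(original).equals(md5Hex(stored)));
    }
}
```

In a real test the "stored" value would come from reading the row back through Hibernate rather than from throughLatin1; comparing checksums avoids trusting whatever the browser or terminal decides to render.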

The solution: you need extra parameters for your connection string when using Hibernate 3:

jdbc:mysql://localhost:3306/mydb?autoReconnect=true&useUnicode=true&characterEncoding=UTF-8

… and now things work again (well, in Safari, anyway).
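
As a minimal sketch of where that string ends up, the snippet below builds the URL and stores it under hibernate.connection.url, the property Hibernate reads its JDBC URL from. The class and method names are made up for the example, and nothing here touches Hibernate itself; how the resulting Properties object gets handed to Hibernate depends on your setup.

```java
import java.util.Properties;

// Hypothetical holder for the connection settings, named for this example only.
public class ConnectionSettings {

    // Builds the JDBC URL from the post, including the two encoding parameters.
    static String jdbcUrl(String host, int port, String database) {
        return "jdbc:mysql://" + host + ":" + port + "/" + database
                + "?autoReconnect=true"
                + "&useUnicode=true"
                + "&characterEncoding=UTF-8";
    }

    // Puts the URL under the standard Hibernate property key.
    static Properties hibernateProperties() {
        Properties props = new Properties();
        props.setProperty("hibernate.connection.url", jdbcUrl("localhost", 3306, "mydb"));
        return props;
    }

    public static void main(String[] args) {
        System.out.println(hibernateProperties().getProperty("hibernate.connection.url"));
    }
}
```

If the URL lives in an XML file such as hibernate.cfg.xml instead, each & in it has to be written as &amp; or the file will not parse.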

4 Responses to “More UTF-8 head-thumping with Hibernate 3”

  1. Bart Cilfone Says:

    This solution would seem to indicate that the problem is with mysql rather than Hibernate.

    Last year, I went through our app at work to enable full UTF-8 support across every layer and found that literally every layer was misconfigured. Even before my effort, UTF-8 would seem to work in some areas of the site, but it was purely by accident. We’re using Oracle and a legacy Java app server, so the data was getting into the JVM memory space intact from the database, but when being rendered out to the pages it was being garbled in several exciting ways.

    While it was a very frustrating effort - the result of all the effort is that a web page shows accented text correctly - it was very rewarding as far as learning about a whole area of software that us stupid Americans usually don’t have to worry about.

  2. Isocra blog Says:

    UTF-8 with Hibernate 3.0…

    Hi again, I’m now tackling something I’ve been meaning to do for ages (and probably should have done before I started the project) and that is to use Hibernate rather than rolling my own SQL.

    I’m reading the book “Hibernate in Action” by Christ…

  3. dube Says:

    Arrr thanks! Your approach worked for my charset problem too.

    I have latin1-encoded tables and use Hibernate 3:
    ?useUnicode=true&amp;characterEncoding=iso-8859-1

    - autoReconnect is not related to the problem, so I skipped it
    - I use &amp; because of XML configuration :)

  4. Duke Says:

    Arrr thanks! Your approach worked for my charset problem too.
