Friday, May 25, 2007

Getting to_yaml to generate proper UTF-8 and not "!binary"

The linked article (in Japanese) contains a fix that lets YAML handle multibyte characters properly. This can be very useful when generating test fixtures from existing data. Here is a quick summary if you do not read Japanese:

By default, Ruby's YAML library does not handle multibyte characters properly. It dumps such strings as base64-encoded "!binary" data:

>> puts "あ".to_yaml
--- !binary |
44GC

Changing YAML's default encoding or passing an :Encoding option to the to_yaml call did not help.
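
To see that nothing is actually lost in the "!binary" form, the payload can be decoded by hand. This is only a small sketch and not part of the linked article; the variable names are made up for illustration, and the force_encoding step assumes Ruby 1.9 or later (it is skipped on older versions).

```ruby
# encoding: utf-8

# Base64 payload copied from the "!binary" output above.
payload = "44GC"

# unpack("m") decodes base64 and returns the raw bytes.
bytes = payload.unpack("m").first

# Print the bytes in hex; they are the UTF-8 encoding of "あ".
puts bytes.unpack("C*").map { |b| "%02X" % b }.join(" ")

# Assumption: on Ruby 1.9+ the bytes need to be relabelled as UTF-8
# before they compare equal to the original string.
decoded = bytes.respond_to?(:force_encoding) ? bytes.force_encoding("UTF-8") : bytes
puts decoded
```

So the data is intact, but the fixture file is unreadable for humans, which is the real problem here.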

The author wrote a monkey patch that, among other things, patches the String base class... Hmm, scary... But it seems to work, as the following test shows:

>> puts [["あ", "い"], {"う" => ["え"]}, Struct.new(:name).new("お")].to_yaml
---
- - "あ"
  - "い"
- "う":
  - "え"
- !ruby/struct:
  name: "お"
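
If you are not sure whether your own Ruby and YAML combination needs the patch, a quick check along these lines may help. It is a sketch under a few assumptions: binary_fallback? is a hypothetical helper and not something from the linked article, only the standard yaml library is used, and the Struct from the test above is left out because reloading an anonymous struct is a separate problem.

```ruby
# encoding: utf-8
require "yaml"

# Hypothetical helper (not from the linked article): true when the dumped
# YAML contains base64 "!binary" scalars instead of readable text.
def binary_fallback?(object)
  object.to_yaml.include?("!binary")
end

data = [["あ", "い"], { "う" => ["え"] }]
yaml = data.to_yaml

puts yaml
puts "needs patch: #{binary_fallback?(data)}"

# Loading the dump back should reproduce the original structure,
# whether or not the strings were written as "!binary".
puts "round trip ok: #{YAML.load(yaml) == data}"
```

If "needs patch" comes out true, the fixtures will still load correctly, they will just not be readable.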


Also, some helpful Japanese fellow has created a Rails plugin for this fix. Here are the links to the SVN repository, for the 0.1.0 release, and the trunk.
