FasterCSV, Ruby 1.8, and Character Encodings

December 7, 2010 Evan Farrar

We had a bit of a head scratcher this week at the New York City office while working on Red Rover, a social directory for engaging students with their colleges and employees with their employers. We were adding CSV upload to the application when one file mysteriously failed to parse. We narrowed the failure down to a single row containing strangely encoded international characters, although not every row with such characters was a problem:

Fuentes,Jesús,"Cribbage, Chess, and Bridge Club",Treasurer

But another row with the same character in the same encoding would import fine:

Johnson,Lúisa,Dodgeball Club,President

It turned out that this was due to a problem with how Ruby 1.8 finds character boundaries. If a miscalculated character boundary happens to fall where a quote mark begins in your CSV file, FasterCSV will hurl:

1.8.7> 'Jesús,"'.split(//)
=> ["J","e","s","\349s,\""]
1.9   > 'Jesús,"'.split(//)
=> ["J","e","s","ú","s",",","\""]
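On a current Ruby, the distinction that the split above trips over can be seen directly by comparing a string's characters with its bytes. This is only a small sketch: it assumes the string is UTF-8, and the variable names are illustrative rather than anything from Red Rover.

```ruby
# "ú" is written as an escape so the example does not depend on the
# encoding of the file it is pasted into.
name = "Jes\u00FAs,\""

chars = name.chars   # character-aware split
bytes = name.bytes   # raw byte values

puts chars.inspect
puts "#{chars.length} characters, #{bytes.length} bytes"

# In UTF-8, "ú" occupies two bytes (0xC3 0xBA). A splitter that misjudges
# where this character ends can absorb the bytes that follow it,
# including a quote mark.
puts name[3].bytes.map { |b| format('0x%02X', b) }.join(' ')
```

The character count and byte count differ by one because of the two-byte "ú", which is exactly the kind of boundary a byte-oriented or mis-configured splitter can get wrong.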

This is not a problem with FasterCSV on Ruby 1.9, or with the old-fashioned CSV class included in Ruby’s standard library in 1.8.6. Hopefully this helps others who have this error staring them in the face despite having a CSV that is perfectly valid in every regard:

FasterCSV::MalformedCSVError: FasterCSV::MalformedCSVError
    from /opt/ruby-enterprise-1.8.7-2010.01/lib/ruby/gems/1.8/gems/fastercsv-1.5.3/lib/faster_csv.rb:1623:in `shift'
    from /opt/ruby-enterprise-1.8.7-2010.01/lib/ruby/gems/1.8/gems/fastercsv-1.5.3/lib/faster_csv.rb:1614:in `each'
    from /opt/ruby-enterprise-1.8.7-2010.01/lib/ruby/gems/1.8/gems/fastercsv-1.5.3/lib/faster_csv.rb:1614:in `shift'
    from /opt/ruby-enterprise-1.8.7-2010.01/lib/ruby/gems/1.8/gems/fastercsv-1.5.3/lib/faster_csv.rb:1581:in `loop'
    from /opt/ruby-enterprise-1.8.7-2010.01/lib/ruby/gems/1.8/gems/fastercsv-1.5.3/lib/faster_csv.rb:1581:in `shift'
    from /opt/ruby-enterprise-1.8.7-2010.01/lib/ruby/gems/1.8/gems/fastercsv-1.5.3/lib/faster_csv.rb:1526:in `each'
    from /opt/ruby-enterprise-1.8.7-2010.01/lib/ruby/gems/1.8/gems/fastercsv-1.5.3/lib/faster_csv.rb:1537:in `to_a'
    from /opt/ruby-enterprise-1.8.7-2010.01/lib/ruby/gems/1.8/gems/fastercsv-1.5.3/lib/faster_csv.rb:1537:in `read'
    from /opt/ruby-enterprise-1.8.7-2010.01/lib/ruby/gems/1.8/gems/fastercsv-1.5.3/lib/faster_csv.rb:1229:in `parse'
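
For anyone hitting a similar failure on a newer Ruby, one defensive approach is to check that the uploaded bytes are valid UTF-8 and, if they are not, transcode them before parsing. The following is a rough sketch, not what we did on Red Rover: the helper name `normalize_csv_upload` is hypothetical, the ISO-8859-1 fallback is a guess about the source encoding rather than a detection, and it uses the standard-library CSV (which is FasterCSV's successor from 1.9 onward) instead of the FasterCSV gem.

```ruby
require 'csv'

# Hypothetical helper: returns a UTF-8 string for raw uploaded bytes.
# ASSUMPTION: input that is not valid UTF-8 is treated as ISO-8859-1.
def normalize_csv_upload(raw, fallback_encoding = 'ISO-8859-1')
  utf8 = raw.dup.force_encoding('UTF-8')
  return utf8 if utf8.valid_encoding?

  # Reinterpret the same bytes in the fallback encoding, then transcode.
  raw.dup.force_encoding(fallback_encoding).encode('UTF-8')
end

# The problem row, with "ú" as the single Latin-1 byte 0xFA.
latin1_row = "Fuentes,Jes\xFAs,\"Cribbage, Chess, and Bridge Club\",Treasurer\n".b

rows = CSV.parse(normalize_csv_upload(latin1_row))
fields = rows.first

puts fields.inspect
```

Guessing a fallback encoding is a trade-off: it keeps uploads from failing outright, but a wrong guess silently produces wrong characters, so rejecting invalid input with a clear message may be the better choice in some applications.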
