csv's Introduction

CSV

This library provides a complete interface to CSV files and data. It offers tools to read from and write to Strings or IO objects, as needed.

Installation

Add this line to your application's Gemfile:

gem 'csv'

And then execute:

$ bundle

Or install it yourself as:

$ gem install csv

Usage

require "csv"

CSV.foreach("path/to/file.csv") do |row|
  # use row here...
end
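The same library also reads from and writes to Strings, not only files. The sketch below is a minimal example of that. It assumes a small inline data set, and its column names are made up for illustration:

```ruby
require "csv"

# Parse a String; with headers: true each row is a CSV::Row keyed by header name.
data = "name,qty\napple,3\npear,5\n"
rows = CSV.parse(data, headers: true)
total = rows.sum { |row| Integer(row["qty"]) } # 3 + 5

# Generate a String; each << appends one row followed by the row separator.
output = CSV.generate do |csv|
  csv << ["name", "qty"]
  csv << ["total", total]
end
p output # prints "name,qty\ntotal,8\n"
```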

Documentation

  • API: all classes, methods, and constants.
  • Recipes: specific code for specific tasks.

Development

After checking out the repo, run ruby run-test.rb to check whether your changes pass the tests.

To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.

Contributing

Bug reports and pull requests are welcome on GitHub at https://github.com/ruby/csv.

NOTE: About RuboCop

We don't use RuboCop because we can manage our coding style ourselves. Because we use Ruby, we want to accept small fluctuations in our coding style. Please do not submit issues and PRs that aim to introduce RuboCop in this repository.

License

The gem is available as open source under the terms of the 2-Clause BSD License.

See LICENSE.txt for details.

csv's People

Contributors

284km, akr, alyssais, amatsuda, burdettelamar, dependabot[bot], ericgpks, esparta, gogotanaka, hsbt, koic, kou, marcandre, maschwenk, maumagnaguagno, mrkn, nagachika, nobu, nurse, olleolleolle, sampatbadhe, shibasaka, stevendaniels, stomar, takkanm, tomog105, vbrazo, watson1978, wowinter13, zverok

csv's Issues

#read returns all rows instead of just the remaining ones

Here's the documentation of the #read method:

Slurps the remaining rows and returns an Array of Arrays.

Let's check this with the following sample script (#readlines is an alias of #read):

# frozen_string_literal: true

require 'csv'

data = StringIO.new <<~DATA
  foo,1
  bar,2
  baz,3
DATA

csv = CSV.new(data)
puts "Shift 1: #{csv.shift.inspect}"
puts "Shift 2: #{csv.shift.inspect}"
puts "Rest   : #{csv.readlines.inspect}"

$ ruby -v
ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-darwin18]

$ ruby sample-csv.rb
Shift 1: ["foo", "1"]
Shift 2: ["bar", "2"]
Rest   : [["baz", "3"]]

Perfect: with ruby-2.5.3, this works as documented.

Running the same example with ruby-2.6.2 shows that #read returns all rows of the file instead of just the remaining ones:

$ ruby -v
ruby 2.6.2p47 (2019-03-13 revision 67232) [x86_64-darwin18]

$ ruby sample-csv.rb
Shift 1: ["foo", "1"]
Shift 2: ["bar", "2"]
Rest   : [["foo", "1"], ["bar", "2"], ["baz", "3"]]
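Until the behavior of #read is settled, one way to collect only the remaining rows without depending on it is to call #shift until it returns nil. The sketch below reuses the sample data above and is not a fix for #read itself. The helper name remaining_rows is hypothetical:

```ruby
require "csv"
require "stringio"

# Hypothetical helper: collect whatever rows are left by calling #shift,
# which returns nil once the input is exhausted.
def remaining_rows(csv)
  rows = []
  while (row = csv.shift)
    rows << row
  end
  rows
end

csv = CSV.new(StringIO.new("foo,1\nbar,2\nbaz,3\n"))
csv.shift # first row:  ["foo", "1"]
csv.shift # second row: ["bar", "2"]
rest = remaining_rows(csv)
p rest # prints [["baz", "3"]]
```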

CSV parser fails with mixed quoting and non-standard row_sep

require "csv"

opts = {row_sep: "|\n", col_sep: ","}
csv_string = CSV.generate(**opts) do |csv|
  csv << ["yes, it's true"]
  csv << ["CSV is broken"]
  csv << ["uhoh!"]
end
CSV.parse(csv_string, **opts)

In the example above, the parser raises a CSV::MalformedCSVError (Unquoted fields do not allow new line <"\n"> in line 2.).

It will succeed if any of the following are done:

  1. pass force_quotes: true to generate
  2. add a comma in each row
  3. remove the comma from the first row
  4. use row_sep: "\n"

Therefore, the problem seems to be the combination of a non-standard row_sep with a field that is quoted in one row and unquoted in the next.
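Workaround 1 from the list above can be sketched as follows. It is a minimal example assuming the same options as the report: with force_quotes: true every field is quoted, so no row mixes quoted and unquoted fields and the round trip succeeds:

```ruby
require "csv"

opts = {row_sep: "|\n", col_sep: ","}

# Quote every field when generating, so each row uses the same quoting.
generated = CSV.generate(force_quotes: true, **opts) do |csv|
  csv << ["yes, it's true"]
  csv << ["CSV is broken"]
  csv << ["uhoh!"]
end

rows = CSV.parse(generated, **opts)
p rows # prints [["yes, it's true"], ["CSV is broken"], ["uhoh!"]]
```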

#eof? doesn't reflect the actual object's state anymore for ruby >= 2.6

Hello,

I'm creating this issue to learn more about the methods exposed by the CSV object that are delegated to the underlying IO object, including #eof?.

For ruby < 2.6, whenever we called #readline, #eof? reflected the state of the IO and told us whether the end of the file had been reached. However, this behaviour has changed in ruby >= 2.6, and it seems that we can no longer rely on #eof? to tell the state of the IO when we call readline on the CSV object.

Is it the intended behaviour? If so, shouldn't #eof? (and possibly other delegated methods from IO) be deprecated or at least a notice in the documentation that we shouldn't rely on it (them)?
If not, I'd be happy to give it a try and fix it.

Thanks

  • ruby -v: ruby 2.6.1p33 (2019-01-30 revision 66950) [x86_64-darwin16]

For the following CSV test.csv file:

a,b,c,d
1,2,3,4
5,6,7,8
9,10,11,12

and the following code snippet:

require 'csv'
csv = CSV.open('./test.csv', headers: true, col_sep: ',')
p csv.eof?
csv.readline
p csv.eof?
csv.readline
p csv.eof?
csv.readline
p csv.eof?

I got the following results with 2.6.1:

false
true
true
true

but using ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-darwin16] I got:

false
false
false
true
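Whatever the answer on #eof?, a read loop can avoid the delegated method entirely by relying on #shift returning nil once every row has been consumed. The sketch below assumes a StringIO with the contents of test.csv in place of the file itself:

```ruby
require "csv"
require "stringio"

io = StringIO.new("a,b,c,d\n1,2,3,4\n5,6,7,8\n9,10,11,12\n")
csv = CSV.new(io, headers: true, col_sep: ",")

rows = []
# #shift returns nil after the last data row, so no #eof? check is needed,
# regardless of how far ahead the parser has read the underlying IO.
while (row = csv.shift)
  rows << row.to_h
end
p rows.length   # prints 3
p rows.last["d"] # prints "12"
```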

Confusing error Do not allow except col_sep_split_separator

Is it possible to make the following error message less cryptic? I have no idea what col_sep_split_separator is (it doesn't appear in the documentation or in the parameters), and the wording is not clear at all:

message = "Do not allow except col_sep_split_separator " +

In the past, CSV.parse('"a"bc"') raised "CSV::MalformedCSVError: Unclosed quoted field on line 1.", which clearly states that a quote is unclosed.

Now, it raises "Do not allow except col_sep_split_separator after quoted fields in line 1.", and I think most people wouldn't know what is causing the problem.
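Since the wording of this message has changed between versions, calling code is safer matching on the exception class than on its text. A minimal sketch; the helper name parse_or_report is hypothetical:

```ruby
require "csv"

# Hypothetical helper: return the parsed rows, or report the parser's
# message and return nil. Matches on CSV::MalformedCSVError, not the text.
def parse_or_report(text)
  CSV.parse(text)
rescue CSV::MalformedCSVError => e
  warn "Malformed CSV: #{e.message}"
  nil
end

bad  = parse_or_report('"a"bc"') # warns, returns nil
good = parse_or_report('"a",bc') # returns [["a", "bc"]]
```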

CSV.generate returns unexpected encoding string

Since Ruby's default encoding is UTF-8, I expect CSV.generate to return a UTF-8 string as well. However, CSV.generate returns an ASCII-8BIT string by default.

The default buffer is created in csv/lib/csv.rb (line 537 at commit 9b81ece):

str = String.new

It seems that String.new returns an ASCII-8BIT string.

I'd like to know whether this is the intended behavior of Ruby 2.6's CSV library. Thank you!

Ruby 2.6

$ ruby -v t.rb
ruby 2.6.0p0 (2018-12-25 revision 66547) [x86_64-darwin18]
#<Encoding:ASCII-8BIT>
"\xE5\xBA\x97\xE5\x90\x8D,\xE5\x90\x8D\xE7\xA7\xB0,\xE7\xA8\x8E\xE9\xA1\x8D\xE5\x8C\xBA\xE5\x88\x86,\xE6\x95\xB0\xE9\x87\x8F,\xE7\xA8\x8E\xE9\xA1\x8D\n"

Ruby 2.5

$ ruby -v t.rb
ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-darwin18]
#<Encoding:UTF-8>
"店名,名称,税額区分,数量,税額\n"

Test code

require 'csv'
csv_string = CSV.generate do |csv|
  csv << ["店名", "名称", "税額区分", "数量", "税額"]
end
p csv_string.encoding
p csv_string
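One hedged workaround is to hand CSV.generate an explicitly UTF-8 String to append to, instead of relying on the default buffer. The sketch below assumes that is acceptable for your use case; it does not change the library's default:

```ruby
require "csv"

# Provide the buffer yourself so its encoding is UTF-8 from the start.
buffer = String.new(encoding: Encoding::UTF_8)
csv_string = CSV.generate(buffer) do |csv|
  csv << ["店名", "数量"]
end
p csv_string.encoding # prints #<Encoding:UTF-8>
```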

Regression since ruby 2.4 - `unknown encoding name - bom|utf-8`

Hi there, I updated my app to ruby-2.5 and ran into problems passing the encoding bom|utf-8 to the CSV parser.

I created sample Dockerfiles for this.
With ruby-2.4.3 everything works fine:

FROM ruby:2.4.3

RUN echo 'test' > test.csv
RUN echo "require 'csv'\ncsv = CSV.read('test.csv', encoding: 'bom|utf-8')\n p csv[0][0]" > script.rb
CMD ruby script.rb

The output is "test".

But with ruby-2.5.0, an encoding error occurs:

FROM ruby:2.5.0

RUN echo 'test' > test.csv
RUN echo "require 'csv'\ncsv = CSV.read('test.csv', encoding: 'bom|utf-8')\n p csv[0][0]" > script.rb
CMD ruby script.rb

The error is:

/usr/local/lib/ruby/2.5.0/csv.rb:1532:in `find': unknown encoding name - bom|utf-8 (ArgumentError)
        from /usr/local/lib/ruby/2.5.0/csv.rb:1532:in `initialize'
        from /usr/local/lib/ruby/2.5.0/csv.rb:1280:in `new'
        from /usr/local/lib/ruby/2.5.0/csv.rb:1280:in `open'
        from /usr/local/lib/ruby/2.5.0/csv.rb:1346:in `read'
        from script.rb:2:in `<main>'
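A possible workaround is to open the file yourself with Ruby's BOM|UTF-8 IO mode, which strips a leading byte-order mark, and pass the IO to CSV.new. The sketch below assumes a temporary file with a BOM in place of test.csv:

```ruby
require "csv"
require "tempfile"

# Write a small file that starts with a UTF-8 byte-order mark.
file = Tempfile.new(["bom", ".csv"])
file.write("\uFEFFtest\n")
file.close

# Let the IO layer strip the BOM, then hand the open IO to CSV.new.
rows = File.open(file.path, "r:BOM|UTF-8") { |io| CSV.new(io).read }
p rows[0][0] # prints "test"
file.unlink
```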

Calling to_a or Enumerable method on CSV for the second time empties it

Code to reproduce:

irb(main):015:0> data = CSV.open('FL_insurance_sample.csv', headers: true)
=> <#CSV io_type:File io_path:"FL_insurance_sample.csv" encoding:UTF-8 lineno:0 col_sep:"," row_sep:"\r" quote_char:"\"" headers:true>
irb(main):016:0> data.map {|row| row['policyID'] }
=> ["119736", "448094", "206893", "333743", "172534", "785275", "995932", "223488", "433512", "142071", "253816", , "198381", "746777", "144396", "263732", "696203", "200041", "127791", "945454", "918783", "515966", "430825", "190906", "166568", "353031", "551567", "133628", "441954", "747072", "701430", "420763", "618391", "214094", "253272", "836769", "396418", "212363", "246838", "758281", "551473", "999148", "983276", "751862", "347062", "894809", "762476", "747976", "685314", "387815", "991621", "998347", "996182", "840804", "783604", "253701", "840979", "861594", "779435", "900859", "186330", "512609", "432937", "412242", "597608", "213466", "517820", "980965", "191055", "510055", "843311", "816735", "337433", "623083", "528326", "334012", "531464", "128205", "982934", "366858", "917861", "210083", "586410", "237321", "910610", "617475", "902121", "182262", "945617", "156266", "588078", "688316", "382076", "121582", "563196", "643286", "142017", "879091", "337675", "379732", "932035", "312404", "747081", "901144", "196907", "388423", "856044", "539127", "700397", "661275", "139221", "937181", "543206", "341736", "754850", "866762", "356525", "871304", "758309", "803289", "922786", "549990", "168802", "546453", "813885", "353826", "938192", "302211", "331518", "631541", "373350", "840399", "107484", "134321", "245082", "895779", "614634", "731255", "870031", "240901", "663297", "191014", "325852", "121870", "751791", "273278", "381736", "625288", "315535", "774751", "328683", "423065", "213789", "844864", "662432", "776510", "375098", "161060", "465293", "704983", "378969", "983689", "541087", "525184", "864811", "312073", "830827", "603600", "830638", "755799", "414134", "733306", "483113", "201599", "115553", "188952", "923142", "394238", "884711", "463999", "575587", "971233", "735553", "827885", "842735", "542544", "489799", "179917", "743678", "598714", "895885", "313786", "535248", "706708", "771872", "413294", "408522", "971371", "340752", "348885", 
"889842", "606357", "738016", "231530", "223769", "242550", "649523", "732806", "506127", "682614", "383061", "394483", "100734", "233198", "407586", "777849", "368612", "127678", "120313", "381935", "143107", "805832", "687756", "349748", "169703", "160845", "114716", "639372", "962341", "300088", "447463", "837999", "296462", "921167", "609153", "375107", "566456", "910667", "822256", "306936", "176160", "409658", "413864", "445152", "168986", "288291", "180190", "668451", "614810", "502082", "706039", "306321", "373798", "469052", "481459", "390921", "196529", "740303", "538916", "994961", "409875", "312502", "972492", "973803", "392941", "483132", "941267", "979434", "395199", "935112", "328328", "988479", "938390", "531723", "390295", "800333", "896027", "159053", "850624", "501210", "206613", "215176", "348557", "449758", "149149", "630004", "400230", "820739", "392800", "331634", "289487", "944227", "309870", "781075", "138800", "238602", "766085", "742674", "539986", "970167", "790549", "726087", "466491", "137084", "770955", "296576", "375634", "666525", "885400", "294715", "121406", "667352", "221935", "420569", "121608", "219338", "656313", "719612", "622576", "288272", "483654", "206575", "431741", "574975", "916215", "250085", "498521", "437546", "336215", "850733", "779536", "806053", "215624", "615260", "495971", "514202", "538761", "213484", "143092", "708325", "155509", "746709", "819848", "309835", "139184", "290655", "827012", "318336", "258417", "701302", "142034", "328942", "347135", "909114", "563415", "279517", "556606", "651400", "727059", "696150", "617858", "435154", "526134", "168517", "338250", "183015", "153278", "805805", "123298", "220620", "111828", "809267", "100832", "833501", "855220", "780279", "736500", "769111", "552700", "261355", "232697", "871943", "902178", "642392", "879254", "662332", "322056", "412153", "661157", "340623", "316160", "436138", "117516", "648515", "845348", "369196", "524989", "731292", "270408", "146599", 
"334837", "713639", "994148", "311195", "122957", "792909", "354608", "669472", "606218", "997046", "762150", "598122", "743490", "354377", "423012", "193658", "932158", "145157", "917300", "139283", "677487", "945412", "360081", "942850", "788310", "715861", "597274", "568609", "289654", "280820", "673647", "327821", "950932", "998455", "540523", "424217", "699408", "687347", "680327", "597117", "266549", "673994", "436373", "835868", "214893", "649680", "415864", "119912", "595269", "767070", "676536", "852629", "389349", "152953", "955461", "237748", "761602", "896196", "119126", "384059", "390978", "274235", "771305", "726249", "593040", "492618", "930951", "481346", "441607", "969756", "260394", "344306", "303488", "420736", "999074", "436388", "225877", "239600", "372980", "426550", "668292", "579643", "719414", "909116", "154943", "767203", "969840", "250163", "357693", "202254", "891615", "210155", "406098", "636233", "660021", "908793", "285202", "575821", "905410", "234566", "648800", "118922", "999511", "460983", "746058", "914824", "828851", "215511", "689551", "646941", "239894", "601473", "295575", "284817", "247418", "956631", "274394", "241568", "710879", "389832", "981075", "472817", "270291", "625006", "730065", "127439", "200161", "972046", "731927", "189177", "805881", "689675", "147990", "857740", "650870", "171696", "173411", "134855", "253880", "867210", "201080", "402131", "974455", "703811", "766146", "651977", "395336", "309592", "724503", "133652", "932447", "640925", "268779", "234219", "776191", "862405", "439550", "610152", "636589", "663394", "203390", "460155", "202669", "680696", "932413", "577961", "402465", "441804", "427207", "357789", "962575", "481067", "378030", "227970", "312359", "622542", "654076", "799638", "301168", "704199", "995995", "586496", "166251", "627242", "704509", "392017", "744159", "452889", "877439", "636980", "996522", "929753", "956015", "669029", "228166", "677953", "401301", "268640", "300856", "933265", 
"891070", "511306", "645925", "616982", "719750", "610089", "629045", "685168", "578509", "775294", "210493", "914012", "852931", "727832", "140749", "596550", "859377", "584257", "155589", "514183", "872822", "651684", "779165", "158884", "144149", "383577", "261091", "444338", "327171", "371347", "951686", "447162", "188345", "818636", "614898", "669081", "935876", "468469", "542386", "395441", "303562", "825549", "970082", "716335", "254640", "614480", "738905", "524306", "734078", "427755", "484434", "995789", "105029", "939080", "963857", "269554", "359282", "246752", "204754", "659602", "334792", "674289", "384509", "475176", "332849", "803588", "230743", "411740", "937207", "848833", "504245", "734996", "230134", "816763", "322162", "321155", "613838", "869481", "384695", "983908", "111917", "176957", "470811", "352835", "481615", "906547", "748767", "389516", "594771", "786732", "612028", "989756", "699478", "415463", "351135", "496510", "485677", "677003", "105608", "158676", "777817", "812774", "159089", "786677", "259373", "451475", "882753", "783377", "437430", "827369", "596250", "285886", "458484", "486112", "101338", "232095", "342224", "886136", "903258", "520030", "277195", "181268", "128861", "502760", "848887", "548440", "201566", "744482", "390109", "511392", "408803", "411806", "344504", "433345", "272099", "163684", "719955", "876995", "743399", "595305", "910435", "131728", "372630", "273180", "618362", "653880", "939368", "320372", "695248", "622358", "248006", "717616", "678182", "399488", "739362", "760498", "447188", "255060", "964099", "853251", "280691", "436217", "137690", "364869", "622931", "144721", "879044", "293936", "791778", "771997", "610413", "110834", "810838", "501151", "132290", "790532", "793536", "252450", "390685", "263106", "222923", "478696", "644404", "936549", "726970", "637467", "506061", "405009", "797729", "342656", "808248", "586817", "297047", "377880", "960034", "556888", "440763", "827514", "261396", "971391", 
"329887", "790204", "593181", "567130", "778905", "565101", "124465", "254681", "730075", "469022", "427994", "904768", "952446", "802386", "443430", "306353", "425957", "984289", "841525", "671320", "790892", "387385", "485814", "803291", "575869", "274205", "508824", "176891", "201277", "242745", "272543", "354967", "417685", "216543", "547186", "766555", "340437", "960468", "578597", "316406", "730243", "340294", "348693", "198785", "296809", "466837", "931973", "478208", "327460", "755449", "246887", "790477", "807314", "852868", "757484", "539258", "125300", "515247", "362625", "441367", "428148", "251047", "613819", "494107", "634777", "984559", "388907", "444433", "136473", "928946", "276598", "957251", "437315", "880804", "278989", "223853", "526575", "914254", "485361", "689055", "662290", "311286", "830307", "886048", "100316", "302424", "902058", "465430", "704037", "465920", "485918", "211655", "461546", "956939", "223206", "808184", "263920", "481056", "885148", "639151", "939684", "730817", "881633", "351383", "194298", "892232", "332022", "568246", "363008", "463483", "679828", "958563", "349028", "483460", "267578", "951972", "230094", "107453", "310848", "480667", "376972", "329020", "687753", "839888", "248834", "462344", "583299", "460519", "733735", "267190", "408444", "941525", "701341", "670477", "569296", "186507", "141517", "424714", "569209", "950233", "484152", "196725", "124654", "911722", "765060", "688938", "273297", "248817", "226968", "840749", "599513", "352034", "459881", "596506", "419655", "823571", "700978", "425548", "416477", "997627", "904796", "366928", "620351", "284239", "790773", "220438", "608209", "979905", "430332", "342163", "686993", "871485", "675816", "536685", "623673", "374046", "227293", "544721", "187389", "233931", "979777", "854640", "713166", "622713", "760557", "386420", "990463", "186358", "364912", "352489", "549727", "271003", "814262", "896386", "865501", "453724", "408265", "520733", "993917", "343211", 
"850825", "378860", "682241", "640824", "807855", "349153", "336877", "491718", "963762", "110199", "560859", "971647", "299010", "731013", "183167", "170649", "677574", "980724", "361902", "361949", "762422", "230449", "626481", "596965", "663507", "893473", "144278", "666220", "145040", "350628", "143023", "347038", "306792", "479443", "139293", "928529", "406883", "904481", "800218", "431173", "779054", "995049", "726733", "980126", "353282", "208099", "776568", "219138", "572410", "976645", "243505", "378722", "397940", "502466", "878346", "508637", "982298", "489930", "858457", "339167", "172384", "123923", "554572", "198361", "302493", "111020", "387113", "681245", "335083", "198084", "974019", "189732", "919237", "703813", "992140", "442886", "323672", "209448", "551371", "754333", "803578", "860005", "244382", "557428", "946133", "954047", "751178", "983825", "385875", "521593", "850201", "742445", "688673", "950319", "804282", "964764", "946971", "381999", "703464", "767342", "295825", "511947", "497446", "724330", "715643", "646193", "301733", "512304", "837657", "789826", "631172", "693823", "478886", "996504", "113209", "628450", "296773", "332837", "723813", "973935", "902448", "967792", "991555", "199723", "709705", "581618", "228671", "792113", "362668", "924918", "810749", "916616", "971782", "251835", "647540", "565470", "643886", "970032", "705570", "886873", "109862", "280283", "176659", "605167", "522273", "161421", "964810", "148513", "537098", "939518", "120991", "642958", "102002", "484307", "539985", "660338", "680100", "240664", "948418", "915706", "700735", "521481", "399117", "850152", "267551", "475548", "708729", "875533", "754962", "254136", "520684", "811927", "307890", "441071", "960973", "134341", "830075", "194015", "344814", "594806", "819822", "747511", "274748", "671128", "546588", "281041", "388236", "583457", "634761", "992828", "840869", "753794", "394096", "993901", "785862", "567375", "503438", "428168", "824970", "182953", 
"470499", "309731", "190203", "706567", "449478", "355644", "651125", "629544", "851607", "778665", "869078", "737972", "644493", "206766", "349230", "539320", "696388", "250787", "122420", "613556", "703519", "275369", "332121", "931804", "384379", "740074", "103223", "492213", "246117", "401497", "702684", "672475", "683842", "752251", "256535", "177567", "889944", "167508", "902637", "529761", "824464", "852690", "289551", "714503", "339215", "916316", "562662", "859555", "155867", "326736", "386572", "980608", "403239", "610427", "358889", "328732", "575764", "467957", "664848", "412356", "422141", "604590", "805329", "450931", "266753", "551411", "649329", "992384", "580316", "803768", "905727", "107888", "916778", "942342", "768796", "531150", "293430", "209503", "102398", "903095", "636279", "359690", "891085", "409967", "678303", "645243", "749435", "611857", "430900", "755749", "274616", "544936", "583970", "580276", "542777", "886730", "744670", "772025", "456572", "128131", "271209", "300811", "473009", "526005", "860797", "685747", "123937", "183879", "512260", "691247", "883852", "555449", "570707", "781009", "151838", "181634", "439778", "168008", "968580", "884571", "905912", "128146", "344782", "552814", "500262", "785630", "933238", "503844", "460236", "700206", "891489", "304021", "433793", "464021", "491115", "944950", "838298", "672001", "224331", "676822", "206441", "370773", "394818", "852520", "876010", "593070", "179262", "369830", "828704", "455732", "934502", "470145", "589653", "709858", "660969", "725797", "669238", "868735", "238679", "888350", "537673", "269818", "833285", "556472", "853960", "691111", "902545", "638372", "538203", "963299", "370091", "682597", "430806", "143612", "237956", "245985", "183788", "103810", "278243", "878953", "394030", "965651", "454585", "658942", "481064", "970015", "664155", "504852", "465283", "901625", "997150", "261097", "575697", "209405", "319675", "508721", "495724", "548607", "454012", "588111", 
"569500", "972877", "746177", "415821", "900320", "866150", "790880", "362729", "788533", "727899", "326368", "653519", "814948", "724926", "284110", "676635", "868212", "767054", "506214", "897926", "677790", "101036", "233192", "544426", "268827", "635219", "265871", "552478", "247639", "617777", "227225", "741059", "113309", "154954", "315104", "282118", "901589", "475454", "733014", "872342", "977297", "996797", "930412", "828167", "835470", "130599", "556323", "653783", "199843", "408604", "235265", "330735", "632165", "681096", "547242", "242005", "566613", "974414", "597501", "329516", "182197", "266075", "885528", "970498", "587057", "742260", "815587", "888748", "651160", "900301", "431495", "291761", "392571", "296579", "814503", "397603", "278859", "927946", "307199", "473551", "359896", "630799", "115071", "813157", "446877", "549092", "869635", "181190", "952536", "234730", "990532", "437376", "868414", "560023", "531018", "862279", "913677", "368497", "806491", "956474", "393855", "236809", "233360", "852903", "411412", "638511", "619853", "677532", "203973", "864999", "470074", "454081", "679930", "276744", "689881", "219881", "281812", "681382", "187152", "420392", "254782", "983500", "612229", "613407", "380142", "935588", "203097", "771476", "881297", "474249", "832086", "377902", "605866", "571393", "801676", "520599", "317574", "481235", "942125", "658270", "776764", "549397", "929053", "634233", "747131", "995193", "381321", "995419", "501720", "523558", "650297", "871903", "846959", "898645", "631406", "344642", "180988", "476557", "495613", "813476", "110207", "349233", "355679", "342917", "763188", "684404", "523215", "961479", "872886", "846234", "365135", "420233", "390802", "278448", "901435", "788395", "870434", "584847", "440239", "847832", "177345", "285210", "879413", "107783", "336897", "972481", "860546", "268835", "578855", "678603", "603821", "947816", "286578", "591676", "787369", "935010", "712366", "210708", "692014", "758505", 
"888231", "435390", "323832", "389142", "950551", "836880", "850076", "641388", "342919", "419446", "757410", "584791", "430552", "233436", "770307", "453741", "805159", "655713", "632168", "236753", "814965", "416373", "458001", "421258", "487337", "705392", "724971", "525340", "274641", "725522", "948462", "931725", "297000", "734660", "906872", "209844", "846660", "744254", "785151", "557521", "670304", "210163", "208187", "881910", "715199", "439458", "441512", "560925", "436617", "189453", "800460", "958971", "650415", "807845", "150155", "189513", "518477", "562655", "316784", "327359", "449637", "519610", "753517", "303809", "348892", "730950", "238662", "481664", "546767", "663326", "413119", "885963", "787349", "214885", "799851", "958812", "711408", "766275", "982804", "177639", "182295", "360593", "455558", "498511", "803252", "942252", "642072", "893642", "979888", "385335", "782042", "172382", "267767", "694601", "612302", "508808", "879262", "503798", "166146", "527060", "922058", "944141", "132263", "908240", "454703", "904151", "966797", "128830", "122025", "396173", "983748", "986174", "379626", "463284", "686919", "831575", "475571", "245018", "253022", "597738", "923068", "928173", "860443", "844073", "110751", "215972", "254044", "881800", "826248", "441838", "123081", "417943", "834306", "724894", "991408", "680719", "609567", "499385", "370022", "689051", "873363", "997492", "891852", "495074", "480977", "902538", "126856", "312850", "880863", "926739", "265135", "713991", "709214", "856409", "715519", "275781", "476956", "182028", "307144", "970874", "640315", "319900", "304610", "727625", "611749", "908085", "284304", "515383", "209934", "439491", "114641", "329658", "323610", "280448", "311781", "164040", "494483", "998895", "291130", "651752", "193711", "249527", "611084", "781807", "674839", "449262", "921003", "587925", "755440", "200577", "469159", "942176", "881788", "485164", "547747", "331888", "571760", "172126", "470768", "664373", 
"441872", "389978", "376364", "542479", "640211", "422963", "977255", "613747", "380308", "915355", "383141", "164909", "627420", "759506", "955914", "114477", "431157", "861984", "762453", "582444", "710826", "143379", "473557", "567547", "733330", "984266", "689889", "165580", "192397", "898470", "393024", "857508", "862971", "469168", "863406", "198867", "743079", "473889", "683427", "106321", "448154", "770864", "689741", "667310", "705671", "831095", "181430", "176997", "842701", "336744", "910596", "876682", "182231", "825424", "526224", "296132", "194544", "372807", "109022", "746495", "529196", "540751", "946516", "598909", "801828", "707890", "543857", "114550", "583414", "104959", "806361", "934501", "637915", "870261", "444277", "711042", "155229", "941628", "854460", "426518", "565545", "913552", "734960", "829687", "162975", "679151", "524631", "458718", "147509", "760033", "248427", "809305", "607091", "691454", "136509", "295142", "126875", "135566", "758919", "919634", "924677", "216954", "543931", "127115", "450072", "568353", "838315", "494884", "374662", "317978", "443783", "586472", "619279", "670758", "220374", "533430", "868629", "128756", "333988", "260039", "938522", "948241", "476849", "573272", "939271", "669724", "326868", "888466", "955649", "350333", "265649", "672496", "932711", "813138", "387562", "763690", "144432", "956980", "606510", "191943", "902403", "408994", "706339", "993218", "854831", "359057", "384404", "704979", "565064", "185266", "832418", "118372", "355246", "354474", "952889", "862741", "147554", "379234", "335634", "674044", "278463", "744141", "941104", "369810", "622825", "159490", "949620", "589581", "290570", "115262", "157439", "820427", "208245", "153652", "807437", "205089", "928704", "847055", "443358", "885202", "421647", "288215", "656479", "474918", "239241", "123075", "640862", "742091", "127947", "977282", "663177", "113510", "587466", "536877", "254997", "247253", "437996", "916505", "644885", "730168", 
"954535", "564735", "441080", "873449", "968925", "907594", "775788", "611644", "406061", "618002", "482864", "567849", "751165", "144854", "466348", "315491", "795920", "809607", "474800", "256436", "947732", "408682", "356370", "507120", "127256", "167047", "400444", "342254", "817780", "487950", "667514", "513680", "405754", "532214", "293233", "754154", "249388", "283201", "248770", "672690", "420683", "198679", "351513", "149468", "145769", "601460", "592320", "669544", "762657", "380367", "288147", "182418", "375290", "805557", "829455", "316463", "927046", "850288", "885654", "826742", "941158", "967866", "873747", "829382", "430259", "438032", "303496", "847942", "218356", "452794", "981439", "598715", "163921", "818157", "251458", "671867", "427176", "403131", "764637", "674131", "657414", "527597", "142447", "802496", "696236", "896153", "919260", "884541", "243532", "623924", "213207", "247464", "988513", "350424", "673107", "122904", "852799", "778713", "152959", "565397", "908162", "687004", "497350", "680301", "778222", "643726", "346776", "525302", "594959", "963097", "255200", "905563", "845050", "673682", "347556", "323644", "849213", "315733", "370714", "813661", "601144", "958101", "430723", "161858", "883297", "220889", "337019", "287974", "909550", "100922", "296485", "376743", "451369", "994229", "549339", "197726", "987427", "993682", "976312", "900034", "491015", "617324", "239357", "681327", "352403", "164006", "251771", "503103", "655363", "585539", "887739", "403857", "454622", "528824", "553390", "781776", "752454", "208268", "334746", "345341", "174580", "970196", "748393", "916913", "592832", "576225", "925826", "708420", "175701", "925767", "930709", "983768", "950355", "233788", "353200", "875871", "124583", "952771", "821284", "948967", "430600", "253223", "694013", "235606", "318077", "652306", "930767", "877872", "213778", "844059", "247133", "360152", "998644", "193200", "676176", "917098", "388192", "943356", "218764", "814489", 
"507992", "260882", "539377", "266526", "123549", "829398", "460058", "856342", "692354", "615248", "860257", "931737", "433714", "248035", "526493", "997868", "358238", "451520", "761310", "771480", "101934", "806089", "216998", "948101", "253386", "428457", "724191", "583553", "856147", "555415", "920290", "137015", "403589", "953377", "374030", "336624", "780782", "838573", "585047", "546503", "429947", "674337", "308168", "437807", "732936", "381680", "186331", "271843", "835867", "831567", "254867", "548875", "561547", "571309", "328654", "252040", "444929", "197179", "633582", "592768", "876103", "862211", "790275", "621015", "895668", "135611", "167834", "812700", "325223", "677144", "669129", "293268", "532415", "251466", "894944", "664654", "907057", "998569", "792947", "502869", "725878", "230998", "423210", "389940", "664326", "240123", "331172", "556670", "846632", "322952", "176662", "402035", "605361", "443281", "805272", "241436", "995966", "522413", "161754", "990849", "474544", "324243", "477231", "792493", "412141", "524599", "885997", "420433", "729792", "395368", "205934", "578516", "872862", "114347", "475670", "443621", "673952", "646454", "950759", "429724", "424668", "540318", "491387", "653920", "924154", "495756", "979918", "773304", "546476", "993585", "535339", "690173", "506624", "736529", "180830", "949466", "425185", "147649", "347367", "552384", "811257", "674211", "166223", "765687", "282794", "593914", "324812", "608083", "423580", "387933", "904424", "761790", "598443", "484654", "527134", "176490", "418718", "380161", "472396", "618259", "768626", "458435", "765917", "916187", "162788", "120519", "311330", "387890", "208316", "160228", "874004", "698690", "745209", "110742", "534704", "200400", "774390", "538228", "813558", "818077", "159155", "333123", "757197", "920104", "960185", "725351", "673832", "930523", "498099", "305181", "325705", "466339", "971724", "954511", "540123", "213307", "216606", "874075", "971470", "366404", 
"876618", "405650", "491600", "568134", "242703", "384502", "453529", "901763", "156912", "684723", "424691", "492823", "203236", "676080", "299949", "909870", "973531", "734936", "470691", "110865", "475900", "226393", "210736", "305466", "894356", "885456", "538335", "609325", "137546", "277745", "295124", "621716", "132178", "120869", "140522", "727284", "236135", "787194", "852514", "343463", "780366", "866819", "199429", "119518", "747453", "440966", "194655", "951443", "702469", "793790", "438939", "999934", "250883", "385392", "608329", "457841", "493081", "934596", "548420", "335002", "165548", "377989", "227423", "124412", "377470", "621976", "246495", "835654", "752159", "883077", "770421", "993256", "608797", "177498", "583056", "648684", "772506", "219637", "258611", "809796", "208711", "323348", "728302", "522495", "129550", "453849", "941314", "458742", "895375", "185051", "462082", "882912", "234498", "176588", "842073", "743463", "242608", "231375", "718468", "394321", "444948", "238013", "271859", "211885", "776863", "722439", "359327", "710731", "776506", "645112", "482005", "502056", "572712", "462794", "292594", "501693", "216263", "773421", "245630", "572151", "547063", "230388", "708957", "515262", "756887", "676849", "357598", "934659", "257988", "430403", "569210", "307831", "339743", "745408", "539716", "101917", "820525", "307438", "170986", "357034", "914080", "974421", "220168", "959902", "795347", "732807", "227458", "890669", "819429", "639251", "158748", "772770", "152480", "859106", "581469", "956578", "183150", "784387", "891604", "409847", "636070", "817161", "943428", "806748", "562174", "934491", "436612", "728370", "762676", "734837", "752475", "257481", "695884", "187491", "868235", "840135", "136438", "188725", "281230", "291930", "111122", "676051", "849208", "778619", "472645", "598219", "204934", "958855", "687514", "538729", "696393", "274681", "415978", "160023", "184988", "218831", "596035", "168397", "423292", "828873", 
"204797", "897569", "799747", "464641", "696673", "976632", "732026", "117534", "872422", "899914", "384082", "896585", "152761", "544530", "805168", "709187", "409699", "416182", "590430", "528889", "261688", "115390", "398857", "760015", "903693", "769620", "104286", "146960", "233763", "285808", "646299", "760225", "979849", "219918", "625730", "522608", "165057", "159906", "378949", "662978", "558678", "730488", "229350", "817582", "874837", "774295", "760133", "772729", "239902", "904378", "314179", "303881", "517094", "107937", "917246", "257598", "644718", "747921", "442005", "659759", "213603", "838151", "267426", "571302", "698726", "282310", "539291", "375853", "240937", "919609", "476820", "159650", "615969", "100552", "651052", "748723", "673237", "110064", "703669", "519631", "329672", "294876", "820774", "252218", "865012", "849538", "981829", "770201", "826721", "396474", "880873", "565387", "332302", "460245", "911743", "267295", "872032", "613154", "976549", "179720", "265541", "850806", "852687", "315838", "203148", "130405", "131808", "134560", "505648", "775324", "811476", "182634", "427559", "437777", "902563", "367446", "271334", "612601", "316847", "150303", "394051", "542171", "171860", "400481", "816979", "727342", "445739", "404437", "609024", "755866", "112859", "962786", "496644", "535918", "463363", "226759", "607543", "425549", "327498", "396221", "690336", "136887", "635836", "515258", "651907", "536350", "986555", "201560", "508697", "111291", "609096", "822949", "602022", "705280", "680770", "107422", "856543", "619650", "572365", "595809", "257438", "823119", "617556", "764690", "864046", "193605", "140260", "630427", "667000", "534281", "733846", "441668", "346905", "471804", "924273", "782472", "923115", "460088", "435210", "465217", "632642", "946716", "889218", "404441", "444832", "963351", "933512", "266370", "558297", "906995", "109623", "785276", "184434", "193939", "500086", "639901", "323489", "937572", "756783", "433616", 
"826854", "331401", "259660", "518773", "123592", "475196", "143117", "984362", "134777", "222297", "265957", "918381", "796136", "845445", "226030", "271564", "924063", "103266", "700435", "659660", "227300", "374066", "123016", "420523", "776342", "776037", "804597", "608373", "595264", "407421", "352091", "803973", "270558", "652444", "827200", "850231", "498526", "717198", "448126", "184340", "638794", "407739", "184207", "657784", "814859", "811680", "979770", "252003", "226540", "672570", "155093", "974037", "979183", "233479", "755452", "250746", "537361", "915444", "931045", "901729", "768473", "252678", "681011", "951273", "789023", "240265", "588128", "972275", "168545", "197675", "999382", "692391", "672027", "264144", "770696", "187260", "873049", "593470", "963969", "555273", "615393", "490494", "697353", "421097", "950831", "766780", "204413", "544764", "131988", "129847", "409099", "928010", "466138", "609599", "587061", "230711", "287115", "776232", "745932", "325051", "187894", "195243", "713013", "253583", "781680", "542224", "104974", "723519", "493420", "334249", "527477", "359947", "746102", "949162", "696418", "194983", "680323", "839993", "541094", "239552", "770103", "165523", "344351", "942745", "947678", "672744", "792046", "387999", "908535", "358731", "677721", "941277", "483690", "857802", "740906", "919102", "304392", "375135", "221539", "371791", "982150", "338482", "286495", "366455", "398273", "912893", "966632", "914040", "882363", "546934", "816632", "589723", "430821", "627786", "736953", "533010", "381430", "759310", "444137", "903870", "624726", "495956", "113363", "357742", "279348", "324826", "215848", "962606", "887227", "565260", "328610", "944346", "644507", "729667", "429283", "581781", "755753", "142827", "341946", "270573", "536379", "171775", "708413", "954090", "549875", "781573", "669963", "973078", "587327", "575246", "943418", "578947", "186785", "636868", "396787", "960541", "112131", "188509", "413139", "962216", 
"486011", "807241", "633515", "834247", "784953", "468672", "164662", "136179", "703995", "236855", "574154", "909313", "867949", "988803", "176858", "175809", "742632", "385687", "177371", "112455", "336898", "204230", "830696", "909294", "782864", "843402", "869528", "524469", "169804", "347285", "567160", "654565", "510895", "987492", "586377", "337553", "354199", "981409", "206290", "337732", "645321", "975467", "395047", "642389", "224356", "653196", "333301", "927664", "126301", "363406", "554639", "537517", "985795", "384126", "487525", "922534", "760822", "623487", "872380", "524056", "222610", "198654", "990746", "225303", "925232", "800314", "550794", "350309", "580261", "348652", "889450", "399352", "580279", "943758", "430347", "170664", "978837", "275155", "194680", "695630", "215995", "238812", "968286", "218943", "317113", "451791", "652662", "400773", "665980", "158638", "292508", "464645", "935905", "406770", "590484", "812083", "406339", "590191", "872415", "100183", "388113", "401548", "470715", "867722", "901247", "640036", "241060", "628393", "446363", "260998", "690403", "393469", "264232", "743750", "907482", "921398", "963545", "384813", "116201", "206089", "202042", "870972", "263023", "444839", "554136", "763442", "929994", "531174", "557398", "404442", "506915", "373536", "844429", "283751", "346096", "636998", "244426", "114549", "662993", "221023", "657126", "579728", "342330", "522288", "147972", "413112", "769784", "442020", "323442", "998523", "241351", "254636", "599130", "993406", "129197", "387772", "188539", "966998", "749607", "535991", "193274", "754564", "749566", "847856", "330371", "497553", "113326", "903029", "222753", "681211", "297969", "428746", "803892", "438700", "398597", "164281", "803877", "482168", "755955", "965248", "244345", "936339", "897474", "391278", "957782", "251684", "641478", "474867", "490406", "915481", "841009", "840625", "635630", "177196", "861422", "552214", "238887", "493555", "132602", "845185", 
"164980", "496569", "148266", "147958", "163343", "329163", "583236", "461324", "220158", "328677", "488314", "310994", "713148", "489265", "659216", "279868", "495998", "568583", "677274", "852311", "822395", "253856", "551302", "496047", "374148", "792523", "900748", "445981", "660678", "150447", "364459", "887035", "971990", "951307", "281064", "753008", "502107", "462946", "643422", "120198", "202971", "759562", "799104", "558130", "886216", "951224", "216749", "581390", "502722", "146138", "532633", "927138", "673188", "736391", "604743", "692332", "427050", "988573", "940597", "239535", "347408", "538280", "551460", "749963", "161927", "150363", "107975", "346822", "541383", "776492", "794095", "975186", "913911", "411601", "151923", "819754", "154536", "761630", "620772", "294679", "624136", "721031", "927124", "304866", "346806", "653887", "949615", "536648", "633500", "622395", "633813", "225863", "226126", "772926", "147594", "125807", "363724", "996010", "986423", "253156", "729943", "391026", "412655", "647057", "779973", "689489", "990714", "706555", "129296", "327539", "118944", "144243", "437444", "670813", "575467", "699009", "671436", "413834", "239946", "990000", "826462", "316214", "492961", "442410", "896508", "870816", "677914", "176452", "970883", "725467", "770729", "766802", "588698", "298126", "122168", "892180", "948179", "337833", "554172", "669026", "264935", "505926", "465988", "271099", "556902", "485501", "599108", "358359", "278650", "609336", "572254", "385575", "522335", "834638", "635324", "570424", "942863", "470637", "996559", "301932", "632025", "229880", "109576", "788175", "914574", "569415", "821291", "634124", "185970", "436744", "704860", "101647", "850933", "368464", "104727", "225655", "615422", "206104", "709548", "226074", "518819", "652298", "353715", "319395", "400381", "298323", "845239", "813847", "964981", "885249", "923289", "126697", "132123", "840286", "682904", "257187", "560496", "746359", "190282", "104333", 
"820974", "482419", "300169", "653797", "497445", "504791", "974677", "765649", "494391", "456862", "242153", "947099", "297233", "874742", "360668", "435391", "171307", "239305", "930747", "625422", "130055", "844995", "459561", "462242", "146416", "570060", "536378", "213647", "920679", "926661", "558833", "974897", "207143", "913546", "870447", "552493", "510748", "938188", "532913", "167499", "765386", "210890", "262962", "616053", "784997", "861634", "109830", "266063", "457869", "741249", "763585", "949907", "191615", "818768", "194701", "910643", "637252", "370830", "677724", "850978", "375369", "446144", "487346", "523299", "339883", "804691", "579920", "993078", "109832", "905794", "346428", "972835", "442072", "974847", "455276", "838658", "486358", "562913", "541001", "465850", "589494", "907092", "515739", "809347", "137666", "874786", "559221", "247162", "253810", "192327", "571274", "724204", "147039", "193318", "615282", "680026", "594538", "781700", "984848", "610699", "158137", "493201", "482257", "486523", "288843", "376340", "591874", "635533", "459510", "967051", "413727", "568714", "873338", "784142", "663960", "537983", "228015", "122852", "469724", "324199", "199791", "773124", "182215", "377464", "647274", "637261", "396524", "161803", "498048", "137093", "409641", "381346", "746036", "625114", "417996", "435656", "139250", "743287", "597971", "335302", "236690", "614044", "613017", "768688", "774160", "670560", "657807", "783382", "187163", "557026", "479705", "194308", "368553", "496190", "547412", "173565", "471644", "383282", "693042", "105317", "816245", "803073", "988321", "925937", "730644", "482134", "125516", "182810", "689686", "745335", "357563", "797318", "123000", "792398", "320607", "641460", "182745", "420504", "653409", "355976", "884082", "102209", "215487", "471737", "927260", "702589", "402398", "875957", "893906", "598292", "355578", "156004", "250802", "738022", "861105", "960463", "538138", "942785", "224136", "393896", 
"946965", "931106", "449878", "378063", "210664", "911234", "213856", "203214", "588568", "554477", "321025", "951282", "562737", "938615", "194628", "874342", "734544", "963243", "403614", "108042", "409595", "165986", "336849", "245687", "547858", "906916", "668006", "518416", "643623", "926693", "395381", "953823", "174451", "498894", "751188", "572494", "706173", "720644", "322646", "654622", "464553", "670821", "570426", "362631", "247880", "437580", "252203", "376260", "966934", "472053", "155170", "383762", "311678", "392740", "338463", "742057", "769118", "599926", "422230", "301651", "383830", "163854", "662565", "660484", "617248", "825861", "231197", "916258", "161758", "887223", "383344", "459625", "693149", "351043", "844929", "774573", "660510", "112074", "982904", "507519", "532294", "835886", "161671", "223349", "320534", "776158", "606616", "812455", "555701", "963672", "137571", "330767", "412319", "240615", "118770", "767744", "167183", "849883", "310361", "650883", "876881", "161574", "593465", "413972", "430801", "542491", "141830", "913792", "899387", "922176", "508612", "453383", "988066", "447991", "993055", "322842", "521572", "631055", "321905", "452162", "908980", "890894", "888700", "918787", "457477", "471375", "471133", "354257", "966598", "924670", "371425", "629224", "540985", "721375", "713284", "999959", "662754", "924130", "931461", "257105", "223233", "978076", "705587", "735129", "224104", "986821", "513126", "825668", "748339", "253767", "829906", "244259", "686544", "275362", "504590", "918960", "124890", "745232", "551173", "625539", "182852", "325267", "928844", "564801", "400779", "580684", "592817", "734245", "166317", "238385", "100486", "104523", "499960", "509826", "262040", "257157", "551907", "473018", "336331", "176508", "884653", "884722", "605027", "903230", "476285", "948111", "504675", "671626", "707861", "895905", "507835", "428223", "367585", "830470", "272593", "989877", "699646", "399907", "211482", "554106", 
"590629", "992337", "415862", "846424", "942249", "270306", "525880", "216978", "847119", "435108", "397388", "345366", "223060", "347262", "900540", "513661", "905359", "108901", "318268", "807301", "786635", "812938", "346494", "385437", "825088", "170378", "389374", "129286", "907574", "735800", "115957", "421717", "112435", "110176", "836931", "189595", "990943", "736490", "531947", "611803", "760374", "388637", "218586", "762291", "988675", "541105", "170671", "240936", "626576", "879316", "710780", "745558", "317535", "207231", "439562", "824767", "860072", "206961", "868341", "680474", "428430", "215439", "481384", "924346", "361021", "655610", "934022", "405382", "573268", "377718", "154462", "935950", "484776", "355148", "204231", "758924", "702105", "609967", "916840", "723783", "931709", "328567", "433520", "421307", "616888", "742840", "961261", "312921", "996298", "889576", "336265", "867761", "447892", "616915", "888181", "549952", "745622", "674160", "291839", "587343", "410984", "444661", "268530", "824378", "831910", "160889", "852682", "318811", "356824", "635094", "299775", "241604", "574570", "663265", "876259", "339351", "522848", "410190", "955014", "691750", "487472", "553269", "754185", "319628", "163711", "201993", "200790", "246049", "948699", "350602", "481169", "702608", "536914", "368936", "551454", "132393", ...]
irb(main):017:0> data.map {|row| row['policyID'] }
=> []
irb(main):018:0> data
=> <#CSV io_type:File io_path:"FL_insurance_sample.csv" encoding:UTF-8 lineno:36635 col_sep:"," row_sep:"\r" quote_char:"\"" headers:["policyID", "statecode", "county", "eq_site_limit", "hu_site_limit", "fl_site_limit", "fr_site_limit", "tiv_2011", "tiv_2012", "eq_site_deductible", "hu_site_deductible", "fl_site_deductible", "fr_site_deductible", "point_latitude", "point_longitude", "line", "construction", "point_granularity"]>
irb(main):019:0> data.to_a
=> []
irb(main):020:0> 

Tested on Ruby 2.6.2 and 2.5.1 with their respective default csv versions (3.0.4 and 1.0.0); I experienced the bug on both.
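For reference, here is a minimal sketch of iterating the same data more than once, assuming the whole file fits in memory. CSV.read with headers: true loads the file into a CSV::Table, which can be mapped repeatedly. That differs from the streaming CSV object in the transcript above, which returned [] once it had been read. The file name and the two-row sample are made up for illustration.

```ruby
require "csv"
require "tmpdir"

# Made-up two-row sample written to a temporary file.
path = File.join(Dir.mktmpdir, "sample.csv")
File.write(path, "policyID,statecode\n119736,FL\n448094,FL\n")

# CSV.read with headers: true returns an in-memory CSV::Table, so each
# map below walks the same rows instead of a consumed stream.
table = CSV.read(path, headers: true)
first_pass  = table.map { |row| row["policyID"] }
second_pass = table.map { |row| row["policyID"] }
```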

CSV parser with first 1kB block and wrong encoding causes unexpected behavior

Hello,
We validate users' CSV files. One of the validation steps checks the encoding of each line.

Our validation method looks roughly like this:

    def valid_and_parse_lines
      line_number = 2 # count start after headers

      loop do
        row = csv.gets
        break unless row
        line = line_initialization(row)
        add_error(line_number, line.errors.full_messages) if line.invalid?
      rescue ArgumentError => exception
        raise if exception.message != 'invalid byte sequence in UTF-8'
        add_error(line_number, 'This line contains invalid UTF-8 characters')
      ensure
        line_number += 1
      end

      csv.rewind
    end

The CSV is loaded like this:

@csv ||= ::CSV.new(file || content, encoding: Encoding::UTF_8, headers: true)

What is the issue? If any of the lines in the first 1 kB block (which the stdlib CSV parser reads for detection purposes) contain invalid bytes, some lines are skipped.

Here is an example:
labels_invalid_iso8859-1_encoding.csv.zip

We always open the files as UTF-8. This CSV contains two lines with a character that is valid in ISO-8859-1 but illegal in UTF-8: the second and fourth lines after the header. In this case only the second line is detected; the fourth one is unfortunately skipped. Invalid lines that appear after the first 1 kB block are detected without problems.

It seems the bug relates to these lines in csv.rb
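As a possible workaround until the parser behavior is fixed, here is a hedged sketch that checks each raw line's encoding before any CSV parsing, so CSV's read-ahead cannot hide an invalid line. invalid_utf8_lines is a hypothetical helper (not part of the csv gem), and it assumes no quoted field spans multiple lines. Line numbers follow the same convention as line_number above, with the header as line 1.

```ruby
# Hypothetical helper (not part of the csv gem): returns the 1-based line
# numbers of data lines whose bytes are not valid UTF-8. Raw lines are
# checked before CSV parsing, so CSV's read-ahead cannot hide any of them.
# Assumes one record per physical line (no quoted fields with newlines).
def invalid_utf8_lines(content)
  invalid = []
  content.b.each_line.with_index(1) do |raw, number|
    next if number == 1 # line 1 is the header
    invalid << number unless raw.dup.force_encoding(Encoding::UTF_8).valid_encoding?
  end
  invalid
end

# "\xE9" is an ISO-8859-1 byte that is invalid on its own in UTF-8.
sample = "label,value\nok,1\nJos\xE9,2\nfine,3\nRen\xE9,4\n"
invalid_utf8_lines(sample) # => [3, 5]
```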

CSV parser ignores the stream position of a StringIO

The CSV parser seems to call @input.string if @input is a StringIO, but I think it should check whether @input.pos is zero before doing that.

require 'csv'
require 'stringio'

strio = StringIO.new(<<'EOF')
aaa,b,c
EOF

p strio.read(2) #=> "aa"
p strio.pos #=> 2
p CSV.parse_line(strio) #=> ["aaa", "b", "c"] (["a", "b", "c"] is expected)

This problem was found by @katsyoshi and presented on ruby-jp.slack.com. I tracked it down and identified the commit that caused it as eeab2ed, which landed between v3.0.1 and v3.0.2.
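Until the position is respected, one workaround sketch is to hand CSV only the unread part of the StringIO as a String, which sidesteps the question of whether parse_line looks at pos at all:

```ruby
require "csv"
require "stringio"

strio = StringIO.new("aaa,b,c\n")
strio.read(2) # consume "aa"

# Pass only the unread remainder ("a,b,c\n") as a String, so the result
# does not depend on whether CSV honors the StringIO's current position.
row = CSV.parse_line(strio.read)
```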

Proposal: Test coverage report for the project

The project currently has no test coverage report; adding one would be a nice-to-have for maintaining and improving the CSV gem.

Pros:

  • We can actively monitor gaps in the current tests and be better prepared for new contributions.
  • Identify new areas for improvement (in the tests and in the code).

Cons:

  • None, at least not technical ones.

Plan (if accepted):

  • Add the SimpleCov gem as a development dependency and configure the unit tests accordingly.
  • Optionally, set up a service (such as CodeClimate) to make the coverage information visible.
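A minimal sketch of the first step follows. It assumes SimpleCov is declared in the gemspec with spec.add_development_dependency "simplecov", and that coverage is opt-in through a COVERAGE environment variable (the variable name is an assumption). SimpleCov has to start before the library code is loaded, so this would go at the very top of run-test.rb:

```ruby
# Hypothetical snippet for the top of run-test.rb. Coverage is opt-in via an
# assumed COVERAGE environment variable, so normal test runs are unaffected
# and SimpleCov is only required when coverage is requested.
if ENV["COVERAGE"]
  require "simplecov"
  SimpleCov.start do
    add_filter "/test/" # report on the library code, not on the tests themselves
  end
end
```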

CSV.parse(csv, col_sep: nil) throws a TypeError (no implicit conversion of nil into String) in ruby 2.6

Hi,

We tried to migrate one of our projects to Ruby 2.6 and ran into this difference in behavior, which doesn't seem to be documented in the changelogs.

ruby 2.5.3:

irb(main):006:0> CSV.parse("toto,tata", col_sep: nil)
=> [["t", "o", "t", "o", ",", "t", "a", "t", "a", nil]]

ruby 2.6.3:

irb(main):003:0> CSV.parse("toto,tata", col_sep: nil)
Traceback (most recent call last):
       15: from /Users/beauraf/.asdf/installs/ruby/2.6.3/bin/irb:23:in `<main>'
       14: from /Users/beauraf/.asdf/installs/ruby/2.6.3/bin/irb:23:in `load'
       13: from /Users/beauraf/.asdf/installs/ruby/2.6.3/lib/ruby/gems/2.6.0/gems/irb-1.0.0/exe/irb:11:in `<top (required)>'
       12: from (irb):3
       11: from /Users/beauraf/.asdf/installs/ruby/2.6.3/lib/ruby/2.6.0/CSV.rb:685:in `parse'
       10: from /Users/beauraf/.asdf/installs/ruby/2.6.3/lib/ruby/2.6.0/CSV.rb:1245:in `read'
        9: from /Users/beauraf/.asdf/installs/ruby/2.6.3/lib/ruby/2.6.0/CSV.rb:1245:in `to_a'
        8: from /Users/beauraf/.asdf/installs/ruby/2.6.3/lib/ruby/2.6.0/CSV.rb:1236:in `each'
        7: from /Users/beauraf/.asdf/installs/ruby/2.6.3/lib/ruby/2.6.0/CSV.rb:1430:in `parser_enumerator'
        6: from /Users/beauraf/.asdf/installs/ruby/2.6.3/lib/ruby/2.6.0/CSV.rb:1421:in `parser'
        5: from /Users/beauraf/.asdf/installs/ruby/2.6.3/lib/ruby/2.6.0/CSV.rb:1421:in `new'
        4: from /Users/beauraf/.asdf/installs/ruby/2.6.3/lib/ruby/2.6.0/csv/parser.rb:230:in `initialize'
        3: from /Users/beauraf/.asdf/installs/ruby/2.6.3/lib/ruby/2.6.0/csv/parser.rb:328:in `prepare'
        2: from /Users/beauraf/.asdf/installs/ruby/2.6.3/lib/ruby/2.6.0/csv/parser.rb:455:in `prepare_separators'
        1: from /Users/beauraf/.asdf/installs/ruby/2.6.3/lib/ruby/2.6.0/csv/parser.rb:455:in `escape'
TypeError (no implicit conversion of nil into String)

Is this behaviour intended?

Thanks.
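If the nil comes from optional configuration, a hedged workaround sketch is to drop nil options before calling CSV.parse so the library default (",") applies. parse_with_optional_separator is a hypothetical name. Note that this does not reproduce the 2.5.3 character-by-character output shown above:

```ruby
require "csv"

# Hypothetical wrapper: drop nil-valued options so CSV falls back to its own
# defaults (col_sep defaults to ",") instead of receiving an explicit nil.
def parse_with_optional_separator(text, col_sep: nil)
  options = { col_sep: col_sep }.compact
  CSV.parse(text, **options)
end

parse_with_optional_separator("toto,tata")               # => [["toto", "tata"]]
parse_with_optional_separator("toto;tata", col_sep: ";") # => [["toto", "tata"]]
```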

CSV.each rewinding and including header row

Reproduction script

require 'csv'

data = CSV.new(<<~ROWS, headers: true)
  Name,Department,Salary
  Bob,Engineering,1000
ROWS

data.each do |row|
  puts row.to_s
end

puts 'second loop'

data.each do |row|
  puts row.to_s
end

ruby 2.6.0p0 (2018-12-25 revision 66547) [x86_64-darwin18]

ruby csv_bug.rb

Bob,Engineering,1000
second loop
Name,Department,Salary  <--- 🔥this should not be here 🔥 
Bob,Engineering,1000

ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-darwin18]

ruby csv_bug.rb

Bob,Engineering,1000
second loop

ruby 2.4.1p111 (2017-03-22 revision 58053) [x86_64-darwin17]

ruby csv_bug.rb

Bob,Engineering,1000
second loop

What I think is going on

I may take a crack at fixing this myself, but I'm almost certain this isn't the desired behavior.

It seems the row pointer is rewound when .each is called a second time, but on the second iteration the rewind includes the header line, even though I've specified headers: true.

Side note: going forward, I think .each should always rewind the row counter after execution. That appears to be what 2.6.0 is trying to do, but it accidentally includes the header row.
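A workaround sketch while this is open: parse once into a CSV::Table (CSV.parse with headers: true returns one) and iterate the table as many times as needed. Iterating the table does not re-read the input, so the header row should not reappear:

```ruby
require "csv"

text = <<~ROWS
  Name,Department,Salary
  Bob,Engineering,1000
ROWS

# Parse once into an in-memory CSV::Table; each pass iterates the same
# data rows, and the header is kept separately in the table's headers.
table = CSV.parse(text, headers: true)
first_pass  = table.map(&:to_s)
second_pass = table.map(&:to_s)
```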


Side note

Bob should ask for a raise

NoMethodError occurred at #gets after MalformedCSVError

Reproduction code:

require 'csv'

puts CSV::VERSION
csv = CSV.new(<<~CSV, headers: true, return_headers: true)
  head1,head2,head3
  aaa,bbb,ccc
  ddd,ee"e.fff
  ggg,hhh,iii
CSV

until csv.eof?
  begin
    p csv.gets
  rescue CSV::MalformedCSVError => e
    p e
  end
end

ruby 2.6.2:

$ ruby -v && ruby /tmp/csv.rb
ruby 2.6.2p47 (2019-03-13 revision 67232) [x86_64-linux]
3.0.4
#<CSV::Row "head1":"head1" "head2":"head2" "head3":"head3">
#<CSV::Row "head1":"aaa" "head2":"bbb" "head3":"ccc">
#<CSV::MalformedCSVError: Illegal quoting in line 3.>
Traceback (most recent call last):
	3: from /tmp/csv.rb:1:in `each'
	2: from /home/krororo/.rbenv/versions/2.6.2/lib/ruby/2.6.0/csv/parser.rb:236:in `parse'
	1: from /home/krororo/.rbenv/versions/2.6.2/lib/ruby/2.6.0/csv/parser.rb:236:in `new'
/home/krororo/.rbenv/versions/2.6.2/lib/ruby/2.6.0/csv/row.rb:35:in `initialize': undefined method `size' for nil:NilClass (NoMethodError)

ruby 2.5.3:

$ ruby -v && ruby /tmp/csv.rb
ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-linux]
2.4.8
#<CSV::Row "head1":"head1" "head2":"head2" "head3":"head3">
#<CSV::Row "head1":"aaa" "head2":"bbb" "head3":"ccc">
#<CSV::MalformedCSVError: Illegal quoting in line 3.>
#<CSV::Row "head1":"ggg" "head2":"hhh" "head3":"iii">

The expected behavior is to keep getting data after an error occurs, as Ruby 2.5.3 does above.
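Until #gets can resume after an error, one hedged sketch is to parse each physical line with its own CSV.parse_line call, so a malformed line cannot affect the following ones. parse_skipping_malformed is hypothetical, it assumes no quoted field contains a newline, and the rescue inside a do...end block needs Ruby 2.6 or later:

```ruby
require "csv"

# Hypothetical helper: parses each physical line on its own so that one
# malformed line cannot break parsing of the lines after it.
# Assumes no quoted field contains an embedded newline.
def parse_skipping_malformed(text)
  lines = text.each_line.to_a
  headers = CSV.parse_line(lines.shift)
  rows = []
  errors = []
  lines.each.with_index(2) do |line, lineno| # the header is line 1
    rows << CSV::Row.new(headers, CSV.parse_line(line))
  rescue CSV::MalformedCSVError => e
    errors << [lineno, e.message] # record the error and move on
  end
  [rows, errors]
end

rows, errors = parse_skipping_malformed(<<~ROWS)
  head1,head2,head3
  aaa,bbb,ccc
  ddd,ee"e.fff
  ggg,hhh,iii
ROWS
```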

Accept \r without quote if row separator doesn't include \r

This may work:

@@ -360,12 +320,14 @@ class CSV
       if @liberal_parsing
         @unquoted_value = Regexp.new("[^".encode(@encoding) +
                                      escaped_column_separator +
-                                     "\r\n]+".encode(@encoding))
+                                     escaped_row_separator +
+                                     "]+".encode(@encoding))
       else
         @unquoted_value = Regexp.new("[^".encode(@encoding) +
                                      escaped_quote_character +
                                      escaped_column_separator +
-                                     "\r\n]+".encode(@encoding))
+                                     escaped_row_separator +
+                                     "]+".encode(@encoding))
       end
       @cr_or_lf = Regexp.new("[\r\n]".encode(@encoding))
       @not_line_end = Regexp.new("[^\r\n]+".encode(@encoding))
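For illustration, the character class the diff builds can be sketched outside the parser. This approximates the non-liberal branch only, omits the .encode(@encoding) calls, and unquoted_value_pattern is a hypothetical name. With a "\n" row separator a bare \r is allowed inside an unquoted value; with "\r\n" it is not:

```ruby
# Hypothetical sketch of the proposed pattern: exclude the quote character,
# the column separator and the row separator characters, instead of a
# hard-coded "\r\n". Regexp.escape keeps special characters literal.
def unquoted_value_pattern(col_sep: ",", row_sep: "\n", quote_char: '"')
  excluded = [quote_char, col_sep, row_sep].map { |s| Regexp.escape(s) }.join
  Regexp.new("[^#{excluded}]+")
end

"ab\rc,d"[unquoted_value_pattern(row_sep: "\n")]   # => "ab\rc"
"ab\rc,d"[unquoted_value_pattern(row_sep: "\r\n")] # => "ab"
```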

2.5 --> 2.6: backwards incompatible change in #parse method for StringIO

If you call #seek(n) or #read(n) on a StringIO object, newer versions of CSV.parse ignore the current offset in the IO object.

Consider these examples:
Ruby 2.5.5

require 'csv'
io = StringIO.new("#meta\na,b\n1,2")
io.read(6) # => "#meta\n"
CSV.parse(io) # => [["a", "b"], ["1", "2"]]

Ruby 2.6.3

io = StringIO.new("#meta\na,b\n1,2")
io.read(6) # => "#meta\n"
CSV.parse(io) # => [["#meta"], ["a", "b"], ["1", "2"]] (contains skipped data)

It looks like the new logic always rewinds the StringIO, so there is no longer any way to pass an IO positioned at a specific offset.
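For the specific case in these examples, where the skipped data is a comment-style metadata line, a sketch that avoids relying on the offset at all is the skip_lines option, which drops lines matching a pattern. This only helps when the lines to skip can be recognized by a pattern, not for arbitrary offsets:

```ruby
require "csv"
require "stringio"

io = StringIO.new("#meta\na,b\n1,2")

# skip_lines drops any line matching the pattern, so the metadata line is
# ignored regardless of where the StringIO is positioned.
rows = CSV.parse(io, skip_lines: /\A#/)
```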

#convert_fields's sym/string break isn't clarified in #initialize's docs

When iterating through multiple converters, if any single converter returns a Symbol (in fact, anything other than a String), the subsequent converters are skipped, as seen in the #convert_fields docs below:

#
  # Processes +fields+ with <tt>@converters</tt>, or <tt>@header_converters</tt>
  # if +headers+ is passed as +true+, returning the converted field set.  Any
  # converter that changes the field into something other than a String halts
  # the pipeline of conversion for that field.  This is primarily an efficiency
  # shortcut.
  #
  def convert_fields(fields, headers = false)
    # see if we are converting headers or fields
    converters = headers ? @header_converters : @converters

    fields.map.with_index do |field, index|
      converters.each do |converter|
        break if field.nil?
        field = if converter.arity == 1  # straight field converter
          converter[field]
        else                             # FieldInfo converter
          header = @use_headers && !headers ? @headers[index] : nil
          converter[field, FieldInfo.new(index, lineno, header)]
        end
        break unless field.is_a? String  # short-circuit pipeline for speed
      end
      field  # final state of each field, converted or original
    end
  end

However, the docs for the #initialize options don't mention this behavior for converter arrays:

# <b><tt>:converters</tt></b>::         An Array of names from the Converters
  #                                       Hash and/or lambdas that handle custom
  #                                       conversion.  A single converter
  #                                       doesn't have to be in an Array.  All
  #                                       built-in converters try to transcode
  #                                       fields to UTF-8 before converting.
  #                                       The conversion will fail if the data
  #                                       cannot be transcoded, leaving the
  #                                       field unchanged.

This leads a reader to believe that you can pass converters like so:

:header_converters => [:symbol, lambda { |h| h.do_some_stuff }]

But this will result in lambda { |h| h.do_some_stuff } never executing.

I could reverse the order of my converters, as shown below, so that both execute, but then the intended result depends on getting the order right.

:header_converters => [lambda { |h| h.do_some_stuff }, :symbol]
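The ordering effect can be reproduced in a few lines. The suffix lambda is a made-up custom converter; because :symbol returns a Symbol, anything listed after it is skipped:

```ruby
require "csv"

data = "First Name\nBob\n"
suffix = lambda { |h| "#{h} ID" } # made-up custom header converter

# :symbol runs first and returns a Symbol, so suffix is never called.
symbol_first = CSV.parse(data, headers: true, header_converters: [:symbol, suffix]).headers
# suffix runs first on the String, then :symbol converts its result.
custom_first = CSV.parse(data, headers: true, header_converters: [suffix, :symbol]).headers

symbol_first # => [:first_name]
custom_first # => [:first_name_id]
```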

My workaround:
Because I didn't want to duplicate your :symbol lambda (and risk it falling out of sync with later changes), I wrote a custom lambda that invokes the default :symbol lambda as part of its definition:

  def self.symbol_and_squeezed_lambda
    lambda do |header|
      native_symbol_lambda = CSV::HeaderConverters[:symbol]
      first_conversion = native_symbol_lambda.call(header)
      first_conversion.to_s.squeeze('_').to_sym
    end
  end
  ...

:header_converters => symbol_and_squeezed_lambda

I only found all of this by putting debugger breakpoints inside csv.rb and tracing my way down to and through the convert_fields method, which is when I realized why my second (custom) converter wasn't being executed.

Possible changes:

  1. Document this Symbol/ordering requirement for converters in the docs/comments for #initialize
  2. Remove the break in #convert_fields

Even if change 1 is implemented, it would only improve clarity; a workaround like the one shown above would still be required.

2.5 --> 2.6: backwards incompatible change in MalformedCSVError#new

In Ruby 2.5, CSV::MalformedCSVError simply inherited from RuntimeError. In 2.6, it defines its own constructor, which takes 2 arguments (as opposed to the single argument of RuntimeError).

Code outside the CSV library that explicitly raises errors of this class now breaks, since it doesn't pass the second argument (a line number).

Is making the line number optional something you'd entertain?
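Meanwhile, code that needs to raise this error under both versions could check the constructor's arity. This is a sketch; malformed_csv_error is a hypothetical helper:

```ruby
require "csv"

# Hypothetical helper: builds a MalformedCSVError whether the installed csv
# expects (message, line_number) or, as in Ruby 2.5, just (message).
def malformed_csv_error(message, line_number)
  if CSV::MalformedCSVError.instance_method(:initialize).arity == 2
    CSV::MalformedCSVError.new(message, line_number)
  else
    CSV::MalformedCSVError.new(message)
  end
end

err = malformed_csv_error("Illegal quoting", 3)
```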

CSV.generate_line (and others) generating unexpected results for pipe-delimited values

Given CSV.generate_line(["",""], col_sep: '|'), the resulting ""|"" is unexpected. The expected result is a plain |.

It appears the issue is caused by

csv/lib/csv.rb

Line 1438 in ba560e4

if field.empty? or
What's the reason for quoting empty values?

If the current behavior is intentional (which seems to be the case), could an option be introduced to not quote empty values, such as quote_empty: false?

(Also, does anyone know the smallest workaround for this that doesn't break legitimate doubled double-quotes in CSV quoting?)
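One candidate answer to the workaround question, as a sketch: the writer emits nil as an empty, unquoted field, so mapping "" to nil avoids the "" output, while fields that genuinely need quoting are still quoted and escaped. Later csv releases also accept a quote_empty: false option that does this directly:

```ruby
require "csv"

row = ["", "a\"b", ""]

# Workaround: nil is written as an empty, unquoted field, so map "" to nil.
# The a"b field still needs quoting and is escaped as "a""b".
workaround = CSV.generate_line(row.map { |f| f == "" ? nil : f }, col_sep: "|")

# In csv releases that support the quote_empty option, this is built in.
builtin = CSV.generate_line(row, col_sep: "|", quote_empty: false)
```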

GemSpec is requiring Ruby 2.5.0-dev

Both gem versions on RubyGems, 0.0.1 and 0.1.0, now require Ruby 2.5.0-dev, so neither can be installed on released Ruby versions such as 2.3.x.

#6

CSV gem for Ruby versions below 2.5

Hey friends, I am unable to use this gem with any Ruby version below 2.5. I tried every version of the csv gem but still hit the same issue.
The snippet below shows the error for the oldest csv version:

Bundler could not find compatible versions for gem "ruby":
In Gemfile:
ruby

csv (~> 0.0.1) was resolved to 0.0.1, which depends on
  ruby (>= 2.5.0dev)

BOM UTF-8 is ignored after rewind

I have a CSV file with forced quotes and a UTF-8 BOM (\xEF\xBB\xBF) that CSV cannot read after a rewind; I get "CSV::MalformedCSVError: Illegal quoting in line 1."

My UTF-8 CSV file with BOM:

File.open('bom_test.csv', 'w') do |io|
  io.write("\xEF\xBB\xBF\"Name\",\"City\"\n\"John Doe\",\"New York\"")
end

Reproduce error:

require 'csv'

# Case 1
csv = CSV.open('bom_test.csv', 'r:BOM|UTF-8', headers: true)
csv.shift
# => #<CSV::Row "Name":"John Doe" "City":"New York">
csv.rewind
csv.shift
# => CSV::MalformedCSVError (Illegal quoting in line 1.)

# Case 2
csv = CSV.open('bom_test.csv', 'r:BOM|UTF-8', headers: true)
csv.readline
# => #<CSV::Row "Name":"John Doe" "City":"New York">
csv.rewind
csv.readline
# => CSV::MalformedCSVError (Illegal quoting in line 1.)
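A possible workaround sketch until rewind handles the BOM: strip the BOM once while reading the file into a String, then build the CSV from that String, which can be rewound freely. The temporary path is only for the example:

```ruby
require "csv"
require "tmpdir"

path = File.join(Dir.mktmpdir, "bom_test.csv")
File.write(path, "\xEF\xBB\xBF\"Name\",\"City\"\n\"John Doe\",\"New York\"")

# The BOM| mode strips the BOM once while reading; the CSV then wraps a
# plain String, so rewinding never sees the BOM bytes again.
content = File.read(path, mode: "r:BOM|UTF-8")
csv = CSV.new(content, headers: true)
first = csv.shift
csv.rewind
again = csv.shift # the same data row, read again after rewinding
```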

CSV.parse not removing BOM character

It looks like there's a bug in CSV 3.0.0 where parse with encoding 'bom|utf-8' doesn't remove the BOM character (the read method seems to handle it correctly):

require 'csv'
require 'tempfile'
t = Tempfile.new("file.csv")
bom_character = 65_279
t << "name\nMy CSV".codepoints.unshift(bom_character).pack("U*")
t.rewind
csv = CSV.read(t, headers: true, encoding: 'bom|utf-8')
csv.first["name"] # => "My CSV"
csv = CSV.parse(File.read(t), headers: true, encoding: 'bom|utf-8')
csv.first["name"] # => nil
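A hedged workaround for CSV.parse is to remove a leading U+FEFF (the decoded BOM) from the String before parsing. String#delete_prefix requires Ruby 2.5 or later:

```ruby
require "csv"

# A String that still starts with the decoded BOM character.
text = "\uFEFFname\nMy CSV\n"

# Remove the BOM yourself so the first header is "name", not "\uFEFFname".
table = CSV.parse(text.delete_prefix("\uFEFF"), headers: true)
table.first["name"] # => "My CSV"
```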

Converters don't work with generate_line

When you call

CSV.generate_line(['a'], converters: lambda { |field| field.prepend '=' } )

the converter has no effect.

Checking the code (correct me if I'm wrong), the only options that generate_line actually uses are the ones in this line:

csv/lib/csv.rb

Line 1083 in 9ccdfe3

output = row.map(&@quote).join(@col_sep) + @row_sep # quote and separate

The "problem", in my opinion, is that the documentation says it accepts any option that CSV::new understands:

csv/lib/csv.rb

Lines 563 to 567 in 9ccdfe3

# The +options+ parameter can be anything CSV::new() understands. This method
# understands an additional <tt>:encoding</tt> parameter to set the base
# Encoding for the output. This method will try to guess your Encoding from
# the first non-+nil+ field in +row+, if possible, but you may need to use
# this parameter as a backup plan.

I can try to make it work if you point me to what the expected behaviour should be.
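Until the expected behaviour is settled, one workaround is to apply the transformation to the fields yourself and pass the result to `generate_line`. This is a sketch; `generate_converted_line` and `PREFIX_WITH_EQUALS` are hypothetical names.

```ruby
require "csv"

# Hypothetical converter: returns a new String rather than mutating the
# field in place as String#prepend would.
PREFIX_WITH_EQUALS = ->(field) { "=#{field}" }

# Hypothetical helper: convert each field first, then generate the line,
# since generate_line does not run :converters on output (per the report).
def generate_converted_line(row, converter, **options)
  CSV.generate_line(row.map { |field| converter.call(field) }, **options)
end

p generate_converted_line(["a"], PREFIX_WITH_EQUALS)                   # => "=a\n"
p generate_converted_line(["a", "b"], PREFIX_WITH_EQUALS, col_sep: ";") # => "=a;=b\n"
```

Newer csv releases document a separate `write_converters:` option for output; whether it is available depends on the installed csv version.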

Cloning rows

Is this behavior correct when cloning a CSV::Row?

irb(main):006:0> row = CSV::Row.new([:name], ['Andre'])
=> #<CSV::Row name:"Andre">
irb(main):007:0> row[:name]
=> "Andre"
irb(main):008:0> dup_row = row.dup
=> #<CSV::Row name:"Andre">
irb(main):009:0> dup_row[:name]
=> "Andre"
irb(main):010:0> dup_row.delete(:name)
=> [:name, "Andre"]
irb(main):011:0> dup_row[:name]
=> nil
irb(main):012:0> row['name']
=> nil

I expected that modifying the duplicated row would not change the original object.

Using ruby 2.5.1p57
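If `#dup` shares the row's internal storage in the installed version, a workaround is to build a fresh `CSV::Row` from copies of the header and field arrays. This is a sketch; `copy_row` is a hypothetical helper, and the copy is still shallow (the field Strings themselves are shared).

```ruby
require "csv"

# Hypothetical helper: a new CSV::Row built from fresh header/field arrays,
# so deleting or adding pairs on the copy leaves the original untouched.
# Shallow: the field String objects are shared with the original row.
def copy_row(row)
  CSV::Row.new(row.headers.dup, row.fields.dup)
end

row = CSV::Row.new([:name], ["Andre"])
copy = copy_row(row)
copy.delete(:name)
p copy[:name] # => nil
p row[:name]  # => "Andre"
```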

#eof? raises CSV::MalformedCSVError

Using the same code as #82, #eof? raises CSV::MalformedCSVError. The expected behavior is to return true or false.

backtrace:

$ ruby /tmp/csv.rb 
3.0.9
#<CSV::Row "head1":"head1" "head2":"head2" "head3":"head3">
#<CSV::Row "head1":"aaa" "head2":"bbb" "head3":"ccc">
Traceback (most recent call last):
	7: from /tmp/csv.rb:1:in `each'
	6: from /home/krororo/.rbenv/versions/2.6.2/lib/ruby/gems/2.6.0/gems/csv-3.0.9/lib/csv/parser.rb:303:in `parse'
	5: from /home/krororo/.rbenv/versions/2.6.2/lib/ruby/gems/2.6.0/gems/csv-3.0.9/lib/csv/parser.rb:779:in `parse_quotable_loose'
	4: from /home/krororo/.rbenv/versions/2.6.2/lib/ruby/gems/2.6.0/gems/csv-3.0.9/lib/csv/parser.rb:28:in `each_line'
	3: from /home/krororo/.rbenv/versions/2.6.2/lib/ruby/gems/2.6.0/gems/csv-3.0.9/lib/csv/parser.rb:28:in `each_line'
	2: from /home/krororo/.rbenv/versions/2.6.2/lib/ruby/gems/2.6.0/gems/csv-3.0.9/lib/csv/parser.rb:31:in `block in each_line'
	1: from /home/krororo/.rbenv/versions/2.6.2/lib/ruby/gems/2.6.0/gems/csv-3.0.9/lib/csv/parser.rb:818:in `block in parse_quotable_loose'
/home/krororo/.rbenv/versions/2.6.2/lib/ruby/gems/2.6.0/gems/csv-3.0.9/lib/csv/parser.rb:879:in `parse_quotable_robust': Illegal quoting in line 3. (CSV::MalformedCSVError)

2.6 -> 2.7 behavior change

I tried running Activeadmin's tests against Ruby master, and it all went well except for a single failure related to encodings.

I researched a bit, and it all comes down to the following change in behaviour:

On ruby 2.6 with csv 3.1.1

$ ruby -EUTF-8:UTF-8 -ve 'require "csv"; puts CSV.generate_line(["おはようございます".encode("Shift_JIS")], encoding: Encoding::Shift_JIS).encoding'
ruby 2.6.5p106 (2019-08-29 revision 67799) [x86_64-linux]
last_commit=Revert "merge revision(s) 53e9908d8afc7f03109b0aafd1698ab35f512b05: [Backport #15916]"
Shift_JIS

On master with csv 3.1.1

$ ruby -EUTF-8:UTF-8 -ve 'require "csv"; puts CSV.generate_line(["おはようございます".encode("Shift_JIS")], encoding: Encoding::Shift_JIS).encoding'
ruby 2.7.0dev (2019-09-09T12:27:40Z master 89c5d5a64e) [x86_64-linux]
/home/deivid/.rbenv/versions/master/lib/ruby/gems/2.7.0/gems/csv-3.1.1/lib/csv.rb:568: warning: The last argument is used as the keyword parameter
/home/deivid/.rbenv/versions/master/lib/ruby/gems/2.7.0/gems/csv-3.1.1/lib/csv.rb:915: warning: for `initialize' defined here
UTF-8

Parsing lines with trailing whitespaces. (CSV::MalformedCSVError (Any value after quoted field isn't allowed in line 1.))

When trying to parse a file that contains extra whitespace at the end of the line, the parser fails with CSV::MalformedCSVError (Any value after quoted field isn't allowed in line 1.)

Other programs (Numbers, OpenOffice, etc) seem to have no trouble with the file.

CSV gem version: 3.1.0

Ruby version: 2.5.1

File contents (note the extra space after the closing double quote at the end of each line):

"header_1","header_2" 
"value_1","value_2" 

$ irb
2.5.1 :001 > require 'csv'
 => true
2.5.1 :002 > csv = CSV.open('bad_file.csv', headers: true)
 => <#CSV io_type:File io_path:"bad_file.csv" encoding:UTF-8 lineno:0 col_sep:"," row_sep:"\n" quote_char:"\"" headers:true>
2.5.1 :003 > csv.each { |row| puts row.inspect }
Traceback (most recent call last):
       11: from .rvm/rubies/ruby-2.5.1/bin/irb:11:in '<main>'
       10: from (irb):3
        9: from .rvm/gems/ruby-2.5.1/gems/csv-3.1.0/lib/csv.rb:1243:in 'each'
        8: from .rvm/gems/ruby-2.5.1/gems/csv-3.1.0/lib/csv.rb:1243:in 'each'
        7: from .rvm/gems/ruby-2.5.1/gems/csv-3.1.0/lib/csv/parser.rb:303:in 'parse'
        6: from .rvm/gems/ruby-2.5.1/gems/csv-3.1.0/lib/csv/parser.rb:779:in 'parse_quotable_loose'
        5: from .rvm/gems/ruby-2.5.1/gems/csv-3.1.0/lib/csv/parser.rb:28:in 'each_line'
        4: from .rvm/gems/ruby-2.5.1/gems/csv-3.1.0/lib/csv/parser.rb:28:in 'each_line'
        3: from .rvm/gems/ruby-2.5.1/gems/csv-3.1.0/lib/csv/parser.rb:31:in 'block in each_line'
        2: from .rvm/gems/ruby-2.5.1/gems/csv-3.1.0/lib/csv/parser.rb:818:in 'block in parse_quotable_loose'
        1: from .rvm/gems/ruby-2.5.1/gems/csv-3.1.0/lib/csv/parser.rb:869:in 'parse_quotable_robust'
CSV::MalformedCSVError (Any value after quoted field isn't allowed in line 1.)
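A workaround, sketched under the assumption that blanks after the last field are never meaningful data: remove spaces and tabs that sit right before a line ending before the text reaches the parser. `strip_trailing_blanks` is a hypothetical helper.

```ruby
require "csv"

# Hypothetical helper: delete runs of spaces/tabs immediately before a
# line ending (or the end of the text). It works on raw text, so it would
# also trim such blanks inside a quoted field that spans several lines.
def strip_trailing_blanks(text)
  text.gsub(/[ \t]+(?=\r?\n|\z)/, "")
end

data = "\"header_1\",\"header_2\" \n\"value_1\",\"value_2\" \n"
table = CSV.parse(strip_trailing_blanks(data), headers: true)
p table.first["header_2"] # => "value_2"
```

Newer csv releases also document a `strip:` option; check whether it covers blanks after a closing quote in your version before relying on it.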

Reading from a non-rewindable stream that supports pos can cause row_sep autodetection to silently corrupt data

As the title says, passing an IO object that supports pos but not rewind to CSV.new or CSV.parse can cause silent data corruption (specifically, loss of the first 1024 bytes of input) if row_sep is not explicitly specified.

I originally ran into this problem on JRuby, while trying to use a non-markable Java InputStream (specifically, a java.util.zip.ZipInputStream) wrapped with .to_io as the input to CSV.parse. However, it's also possible to demonstrate the problem with a simple wrapper around StringIO:

require 'csv'
require 'stringio'
require 'forwardable'

class DummyIO
  extend Forwardable
  def_delegators :@io, :gets, :read, :pos  # no seek or rewind!
  def initialize(data)
    @io = StringIO.new(data)
  end
end

csv = (1..10).map do |row|
   (1..100).map { |col| "row#{row}col#{col}" }.to_a.join(",")
end.to_a.join("\n")

CSV.new(DummyIO.new(csv)).each do |row|
  puts row.inspect
end

The output of this code, which should begin with ["row1col1", "row1col2", "row1col3", instead begins with ["ol4", "row2col5", "row2col6", "row2col7".

Ideally, the row_sep autodetection should be rewritten to save the data it reads, so that it can be later parsed without having to seek or rewind the input at all. This would allow any non-seekable input streams to be safely used without needing any special-case hacks.
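Until then, a caller-side workaround is to drain the non-seekable stream into a `StringIO`, which does support rewind, before handing it to CSV. This sketch assumes the input fits in memory; `csv_from_unseekable` is a hypothetical helper, and `DummyIO` is the wrapper from the report.

```ruby
require "csv"
require "stringio"
require "forwardable"

# The reporter's wrapper: supports gets/read/pos but not seek or rewind.
class DummyIO
  extend Forwardable
  def_delegators :@io, :gets, :read, :pos
  def initialize(data)
    @io = StringIO.new(data)
  end
end

# Hypothetical helper: buffer the whole stream in a rewindable StringIO
# so row_sep autodetection can look ahead without losing data.
def csv_from_unseekable(io, **options)
  CSV.new(StringIO.new(io.read), **options)
end

data = (1..10).map { |r| (1..100).map { |c| "row#{r}col#{c}" }.join(",") }.join("\n")
rows = csv_from_unseekable(DummyIO.new(data)).to_a
p rows.first.first(3) # => ["row1col1", "row1col2", "row1col3"]
p rows.size           # => 10
```

Since the report says the corruption occurs only when row_sep is autodetected, passing an explicit `row_sep:` should also avoid it without buffering the whole input.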

Header refreshing problem of CSV::Table in v 3.0.3

In v3.0.3, the headers of a CSV::Table apparently are not refreshed when a new CSV::Row object or Hash is added.

require 'csv'

txt =<<TXT
"h1","h2","h3"
1,"test",3
2,"test",6
3,"test",9
TXT

table = CSV::parse(txt, headers: true)
table.each do |row|
  row << {"h4" => "additional"}
end
puts table.to_csv

ver 3.0.3

h1,h2,h3
1,test,3,additional
2,test,6,additional
3,test,9,additional

ver 3.0.0

h1, h2, h3, h4
1, test, 3, additional
2, test, 6, additional
3, test, 9, additional

I'm not sure, but based on a Stack Overflow discussion and my own testing, it started in 3.0.1.
v3.0.0 works as expected.
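A workaround that does not depend on `CSV::Table` refreshing its header list: write the header line from the rows themselves. This sketch assumes every row gained the same new columns, so the first row's headers describe all of them; `table_to_csv_with_row_headers` is a hypothetical helper.

```ruby
require "csv"

# Hypothetical helper: take headers from the first row rather than from
# the table, then write every row's fields. Assumes all rows share the
# same columns after modification.
def table_to_csv_with_row_headers(table)
  CSV.generate do |csv|
    first_row = table.first
    csv << first_row.headers if first_row
    table.each { |row| csv << row.fields }
  end
end

txt = <<~TXT
  "h1","h2","h3"
  1,"test",3
  2,"test",6
  3,"test",9
TXT

table = CSV.parse(txt, headers: true)
table.each { |row| row << { "h4" => "additional" } }
puts table_to_csv_with_row_headers(table)
```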

Rubocop integration?

Hi there, while filing a bug for this project I noticed that its code style is not in great shape.
I can help integrate RuboCop into the project and fix the current code style issues if the authors agree.

Handle =" as a designator for Excel 'formulas'

We have a .CSV file from a vendor that marks numeric fields which should be treated as strings with =", for example (excerpt from a real file):

USD,="6161015399000000",54585234

I'm going to experiment with a custom converter. If that works, I'll close this issue.
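A custom converter along those lines might look like the sketch below. It assumes that strict parsing rejects the quotes inside the unquoted `="..."` field, so `liberal_parsing: true` is used to keep it as literal text; the converter then unwraps it. `EXCEL_TEXT` is a hypothetical name.

```ruby
require "csv"

# Hypothetical converter: turns ="6161015399000000" into 6161015399000000
# and leaves every other field unchanged (Regexp#match(nil) returns nil).
EXCEL_TEXT = lambda do |field|
  match = /\A="(.*)"\z/.match(field)
  match ? match[1] : field
end

row = CSV.parse_line('USD,="6161015399000000",54585234',
                     liberal_parsing: true, converters: [EXCEL_TEXT])
p row # => ["USD", "6161015399000000", "54585234"]
```

The unwrapped value stays a String, which keeps leading zeros intact.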

2.5.0 / no docs?

Is this Ruby's official CSV library now? I tried using FasterCSV with Ruby 2.4.0, but it fails with an error telling me to use "ruby 1.9 csv" instead. Since this code lives under the official ruby/csv namespace on GitHub, I assume this is that gem, but there's no documentation, and the gemspec says 2.5.0 is the minimum required Ruby version. Should I be looking at the FasterCSV docs? Is this just FasterCSV moved over directly?

If I'm in the wrong place, please point me to the docs for whatever CSV gem that works with 2.4.0.

Can't parse a single new line character

$ ruby -r csv -e 'puts CSV::VERSION'
3.0.1
$ ruby -v
ruby 2.6.0rc1 (2018-12-06 trunk 66253) [x86_64-darwin18]

This seems to be a recent regression.

In Ruby 2.6:

> CSV.parse("\n", {headers:'s'}) { |x| puts x }
Traceback (most recent call last):
       11: from /Users/cabeer/.rubies/ruby-2.6.0-rc1/bin/irb:23:in `<main>'
       10: from /Users/cabeer/.rubies/ruby-2.6.0-rc1/bin/irb:23:in `load'
        9: from /Users/cabeer/.rubies/ruby-2.6.0-rc1/lib/ruby/gems/2.6.0/gems/irb-0.9.6/exe/irb:11:in `<top (required)>'
        8: from (irb):5
        7: from /Users/cabeer/.rubies/ruby-2.6.0-rc1/lib/ruby/2.6.0/csv.rb:693:in `parse'
        6: from /Users/cabeer/.rubies/ruby-2.6.0-rc1/lib/ruby/2.6.0/csv.rb:1147:in `each'
        5: from /Users/cabeer/.rubies/ruby-2.6.0-rc1/lib/ruby/2.6.0/csv.rb:1205:in `shift'
        4: from /Users/cabeer/.rubies/ruby-2.6.0-rc1/lib/ruby/2.6.0/csv.rb:1205:in `loop'
        3: from /Users/cabeer/.rubies/ruby-2.6.0-rc1/lib/ruby/2.6.0/csv.rb:1239:in `block in shift'
        2: from /Users/cabeer/.rubies/ruby-2.6.0-rc1/lib/ruby/2.6.0/csv.rb:1239:in `new'
        1: from /Users/cabeer/.rubies/ruby-2.6.0-rc1/lib/ruby/2.6.0/csv/row.rb:32:in `initialize'
NoMethodError (undefined method `each' for nil:NilClass)

In Ruby 2.5:

CSV.parse("\n", {headers:'s'}) { |x| puts x }

=> nil

`CSV.parse('', headers: true)` was broken?

With Ruby 2.5.0, if the headers: true option is given, CSV.parse() returns a CSV::Table object even when an empty string is passed as the argument.

However, Ruby 2.6 behaves incompatibly: it returns an empty Array instead.

I want to know whether it is expected. Thank you!!

Ruby 2.6

$ ruby -v ttt.rb
ruby 2.6.0p0 (2018-12-25 revision 66547) [x86_64-darwin18]
[]
Traceback (most recent call last):
ttt.rb:4:in `<main>': undefined method `headers' for []:Array (NoMethodError)

Ruby 2.5

$ ruby -v ttt.rb
ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-darwin18]
#<CSV::Table mode:col_or_row row_count:1>
[]

Test code

require 'csv'
row = CSV.parse('', headers: true)
p row
p row.headers
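A defensive sketch for code that must run on either behaviour: always hand back a `CSV::Table` when headers are in play, even if the installed csv version returns a bare Array for empty input. `parse_table` is a hypothetical helper.

```ruby
require "csv"

# Hypothetical helper: normalize the result of CSV.parse(..., headers: true)
# so callers can always call #headers, #each, etc. on a CSV::Table.
def parse_table(text, **options)
  result = CSV.parse(text, **options, headers: true)
  result.is_a?(CSV::Table) ? result : CSV::Table.new([])
end

p parse_table("").class              # => CSV::Table
p parse_table("a,b\n1,2\n").headers  # => ["a", "b"]
```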

Regression #open method for writing from Ruby 2.4.3 to 2.5.0

I have noticed a different behaviour between Ruby <= 2.4.3 and Ruby 2.5.0 for the #open method.

If you open a CSV file for writing with write_headers: true and don't write any rows to it, Ruby <= 2.4.3 leaves the file empty, but Ruby 2.5.0 writes the headers.

Ruby <= 2.4.3

$ ruby -v
ruby 2.4.3p205 (2017-12-14 revision 61247) [x86_64-darwin17]
$ irb
irb(main):001:0> require "csv"
=> true
irb(main):002:0> CSV.open("ruby-2.4.3.csv", "wb", headers: ["name", "surname"], write_headers: true) { }
=> nil
irb(main):003:0> `cat ruby-2.4.3.csv`
=> ""

Ruby 2.5.0

$ ruby -v
ruby 2.5.0p0 (2017-12-25 revision 61468) [x86_64-darwin17]
$ irb
irb(main):001:0> require "csv"
=> true
irb(main):002:0> CSV.open("ruby-2.5.0.csv", "wb", headers: ["name", "surname"], write_headers: true) { }
=> nil
irb(main):003:0> `cat ruby-2.5.0.csv`
=> "name,surname\n"

In the examples I'm using an empty block, but a real application would probably contain a conditional, something like this:

CSV.open(...) do |csv|
  csv << ["hello"] if condition # rows are written as Arrays
end
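To get the same result on every Ruby version, the decision can be made explicit: write the header line only when at least one row is written, which mirrors the Ruby <= 2.4.3 output. This is a sketch; `write_rows_with_headers` is a hypothetical helper.

```ruby
require "csv"
require "tmpdir"

# Hypothetical helper: headers are written only together with real rows.
def write_rows_with_headers(path, headers, rows)
  if rows.empty?
    File.write(path, "") # no rows: leave an empty file
  else
    CSV.open(path, "wb", headers: headers, write_headers: true) do |csv|
      rows.each { |row| csv << row }
    end
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "people.csv")
  write_rows_with_headers(path, ["name", "surname"], [])
  p File.read(path) # => ""
  write_rows_with_headers(path, ["name", "surname"], [["John", "Doe"]])
  p File.read(path) # => "name,surname\nJohn,Doe\n"
end
```

If the 2.5.0 behaviour (headers even for an empty file) is what you want instead, the `rows.empty?` branch is the only place to change.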

When a line contains an illegal character in a given encoding, raise something other than ArgumentError

This is one of the most confusing exceptions in the stdlib.

I'd expect something like CSV::MalformedCSVError rather than ArgumentError. Instead, I have to use something like this in our code to be sure I catch only the illegal-character error for a given line and nothing else:

loop do
  row = csv.gets
  break unless row
  ...
rescue ArgumentError => exception
  raise if exception.message != 'invalid byte sequence in UTF-8'
  ...
end
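An alternative to matching on the exception message is to check each line's encoding before it reaches the parser. This sketch assumes no quoted field spans lines (it parses line by line); `parse_valid_lines` is a hypothetical helper.

```ruby
require "csv"

# Hypothetical helper: parse lines that are valid in their encoding and
# collect the line numbers of the invalid ones, without rescuing
# ArgumentError. Assumes no quoted field contains a newline.
def parse_valid_lines(text)
  good = []
  bad = []
  text.each_line.with_index(1) do |line, lineno|
    if line.valid_encoding?
      good.concat(CSV.parse(line))
    else
      bad << lineno
    end
  end
  [good, bad]
end

data = "a,b\nc,\xFF\ne,f\n".dup.force_encoding(Encoding::UTF_8)
rows, invalid_lines = parse_valid_lines(data)
p rows          # => [["a", "b"], ["e", "f"]]
p invalid_lines # => [2]
```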

MalformedCSVError with a TODO message when col_sep is multiple spaces in Ruby 2.6.0

I'm experiencing a regression in the CSV library on upgrading to ruby 2.6.0:

Consider:

require 'csv'
CSV.parse("John Doe    Male\nJane Doe    Female", col_sep: "    ")

In ruby 2.5.1, this returns:

=> [["John Doe", "Male"], ["Jane Doe", "Female"]]

In ruby 2.6.0, this raises an exception (with a TODO in the message?)

CSV::MalformedCSVError (TODO: Meaningful message in line 1.)
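For quote-free data only, a plain `String#split` on the multi-space separator produces the rows the report expected from Ruby 2.5.1. This is a stopgap sketch, not a CSV parser: it assumes no field contains a quote character or the separator itself. `split_on_separator` is a hypothetical helper.

```ruby
# Hypothetical helper: split each line on a literal multi-character
# separator. Not a CSV parser: quoting is not handled at all.
def split_on_separator(text, separator)
  text.each_line(chomp: true).map { |line| line.split(separator, -1) }
end

p split_on_separator("John Doe    Male\nJane Doe    Female", "    ")
# => [["John Doe", "Male"], ["Jane Doe", "Female"]]
```

The `-1` limit keeps trailing empty fields, which `split` would otherwise drop.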

CSV.parse(csv, headers: true) returns wrong value if CSV string has only one line

When running the following test code, Ruby 2.6.1's CSV library returns a different result from Ruby 2.5.3.

Test code

require 'csv'

csv = <<-CSV
alice,有栖,ありす,,,,,,,,
CSV

rows = CSV.parse(csv, headers: true)
p rows.headers

Ruby 2.6.1

$ ruby -v csv.rb
ruby 2.6.1p33 (2019-01-30 revision 66950) [x86_64-darwin18]
["alice", "有栖", "ありす", nil, nil, nil, nil, nil, nil, nil, nil]

Ruby 2.5.3

$ ruby -v csv.rb
ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-darwin18]
[]
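If the goal is just the header names of a possibly header-only input, parsing the first line directly avoids depending on how a given csv version builds the table. `header_fields` is a hypothetical helper; empty trailing columns come back as `nil`, as in the 2.6.1 output above.

```ruby
require "csv"

# Hypothetical helper: header names from the first line, or [] for empty input.
def header_fields(text)
  CSV.parse_line(text) || []
end

line = "alice,有栖,ありす" + "," * 8 + "\n"
p header_fields(line).size # => 11
p header_fields("")        # => []
```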

Wrong results when parsing a File that includes 3-byte characters with skip_lines

I haven't figured out how to solve this problem, but I found the conditions needed to reproduce it:

  • data is read from File.open
  • the skip_lines option is present
  • the data length exceeds the chunk size (32 * 1024)
  • the data includes 3-byte characters (such as Chinese)
  • v3.0.1 works correctly; it happens in v3.0.2 and later

test case:

def test_three_bytes_chars
  Tempfile.create(['temp', '.csv']) do |tempfile|
    tempfile.close
    path = tempfile.path

    text = "\xE5\x93\x88\xE5\x9B\x89"
    File.open(path, "w") do |csv|
      row = [text].join(',').concat("\n")
      row_count = (32 * 1024) / (StringIO.new(row).length) + 1
      csv << (row * row_count)
    end
    assert_equal(
      [text],
      CSV.read(File.open(path), headers: true, :skip_lines => /\A#/).to_a.last
    )
  end
end

Results in Ruby 2.5.5:

# Running tests:

[102/756] TestCSVFeatures#test_three_bytes_chars = 0.06 s
  1) Failure:
TestCSVFeatures#test_three_bytes_chars [/Users/jeff/project/csv/test/csv/test_features.rb:341]:
<["哈囉"]> expected but was
<["囉"]>.

[142/756] TestCSVFeatures::DifferentOFS#test_three_bytes_chars = 0.06 s
  2) Failure:
TestCSVFeatures::DifferentOFS#test_three_bytes_chars [/Users/jeff/project/csv/test/csv/test_features.rb:341]:
<["哈囉"]> expected but was
<["囉"]>.

Finished tests in 2.863357s, 264.0258 tests/s, 2669.9430 assertions/s.
756 tests, 7645 assertions, 2 failures, 0 errors, 0 skips

ruby -v: ruby 2.5.5p157 (2019-03-15 revision 67260) [x86_64-darwin18]

Can't parse a single line ending with CR

$ ruby -v
ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-darwin16]

$ ruby -r csv -e 'puts CSV::VERSION'
3.0.0

CSV.parse("foo")
#=> [["foo"]]

CSV.parse("foo\n")
#=> [["foo"]]
CSV.parse("foo\nbar\n")
#=> [["foo"], ["bar"]]

CSV.parse("foo\r\n")
#=> [["foo"]]
CSV.parse("foo\r\nbar\r\n")
#=> [["foo"], ["bar"]]

CSV.parse("foo\r")
#=> CSV::MalformedCSVError (Unquoted fields do not allow \r or \n in line 1.)
CSV.parse("foo\rbar\r")
#=> [["foo"], ["bar"]]

Is this intentional?

I would expect this:

CSV.parse("foo\r")
#=> [["foo"]]
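A workaround is to pick `row_sep` explicitly for bare-CR input instead of leaving it to autodetection. The detection rule in this sketch (CR present, LF absent) is an assumption for illustration, and `parse_with_cr_support` is a hypothetical helper.

```ruby
require "csv"

# Hypothetical helper: use "\r" as the row separator when the text has
# CR but no LF; otherwise fall back to autodetection.
def parse_with_cr_support(text, **options)
  row_sep = text.include?("\r") && !text.include?("\n") ? "\r" : :auto
  CSV.parse(text, row_sep: row_sep, **options)
end

p parse_with_cr_support("foo\r")          # => [["foo"]]
p parse_with_cr_support("foo\r\nbar\r\n") # => [["foo"], ["bar"]]
```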

Proposal: Split csv.rb into a more manageable structure

Currently CSV is a single-file library, and a 2300+ LOC file is difficult to manage on its own. We should probably follow the RubyGems guidelines on how to structure a gem.
Pros:

  • Exploring, checking, reviewing and maintaining 5 to 6 files of 500+ LOC is (in my experience) easier
  • Future commits can and should touch only the relevant class, making the git log more user-friendly
  • Reduce and distribute churn across the history of the gem

Cons:

  • None at the moment, at least no technical ones

Plan:

  • Extract internal classes:
    • CSV::Table -> /lib/csv/table.rb
    • CSV::Row -> /lib/csv/row.rb
  • Move extensions to their own namespace/files
    • Array -> /lib/core_ext/array.rb
    • String -> /lib/core_ext/string.rb

With this we will have 5 files, with work still left for future refactors (a 1700+ LOC file still looks like a red flag) and unit tests.
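To make the "extensions" step concrete, the extracted core_ext files could hold little more than thin wrappers that delegate to `CSV`. This is a sketch of plausible contents for the proposed /lib/core_ext/array.rb and /lib/core_ext/string.rb, not the gem's actual layout; reopening the classes in one snippet is only for illustration.

```ruby
require "csv"

# Possible content of lib/core_ext/array.rb (sketch).
class Array
  # Generate one CSV line from this array's fields.
  def to_csv(**options)
    CSV.generate_line(self, **options)
  end
end

# Possible content of lib/core_ext/string.rb (sketch).
class String
  # Parse this string as a single CSV line.
  def parse_csv(**options)
    CSV.parse_line(self, **options)
  end
end

p ["a", "b,c"].to_csv     # => "a,\"b,c\"\n"
p 'a,"b,c"'.parse_csv     # => ["a", "b,c"]
```

Because each file only delegates, splitting them out should not change behaviour, which keeps the refactor reviewable.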

CSV.parse can't handle double double-quoted column at Ruby 2.6.0

First of all, I don't know whether the following CSV is valid.

a,""b""

Ruby 2.5's CSV library handles it well.
However, Ruby 2.6's shows different behavior.

Ruby 2.6

$ ruby -v t.rb
ruby 2.6.0p0 (2018-12-25 revision 66547) [x86_64-darwin18]
Traceback (most recent call last):
	5: from t.rb:7:in `<main>'
	4: from /Users/watson/.rbenv/versions/2.6.0/lib/ruby/2.6.0/csv.rb:683:in `parse'
	3: from /Users/watson/.rbenv/versions/2.6.0/lib/ruby/2.6.0/csv.rb:1180:in `read'
	2: from /Users/watson/.rbenv/versions/2.6.0/lib/ruby/2.6.0/csv.rb:1180:in `to_a'
	1: from /Users/watson/.rbenv/versions/2.6.0/lib/ruby/2.6.0/csv.rb:1171:in `each'
/Users/watson/.rbenv/versions/2.6.0/lib/ruby/2.6.0/csv/parser.rb:273:in `parse': Do not allow except col_sep_split_separator after quoted fields in line 1. (CSV::MalformedCSVError)

Ruby 2.5

$ ruby -v t.rb
ruby 2.5.3p105 (2018-10-18 revision 65156) [x86_64-darwin18]
[["a", "\"b\""]]

Test code

require 'csv'

csv =<<CSV
a,""b""
CSV

p CSV.parse(csv)
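If such input has to be accepted, one option is to try strict parsing first and fall back to `liberal_parsing: true` only when the input is rejected. With liberal parsing the stray quotes stay in the field text, so the value may differ from what Ruby 2.5 returned (`["a", "\"b\""]`). This is a sketch; `parse_lenient` is a hypothetical helper.

```ruby
require "csv"

# Hypothetical helper: strict first, liberal only on MalformedCSVError.
def parse_lenient(text, **options)
  CSV.parse(text, **options)
rescue CSV::MalformedCSVError
  CSV.parse(text, **options, liberal_parsing: true)
end

p parse_lenient("a,b\n")       # => [["a", "b"]]
p parse_lenient(%Q(a,""b""\n)) # second field keeps some of its quotes
```

Newer csv releases also document a `double_quote_outside_quote` setting for `liberal_parsing:` that aims to reproduce the Ruby 2.5 result; check your version's documentation before relying on it.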

ArgumentError: invalid byte sequence in UTF-16LE

I have a file that uses UTF-16LE encoding.

require 'csv'

CSV.foreach('shifenzheng.txt', encoding: "utf-16le") do |row|
    p row
    sleep 10
end

I got the following error:

ArgumentError: invalid byte sequence in UTF-16LE
from ~/.rvm/rubies/ruby-2.5.0/lib/ruby/2.5.0/csv.rb:2046:in `=~'

I tried:

CSV.foreach('shifenzheng.txt', encoding: 'utf-16le:utf-8') do |row|
   p row; sleep 10;
end

This worked.

I also tried:

fp = File.open 'shifenzheng.txt', 'rb', encoding: 'utf-16le'
fp.readline

This worked well, too.

So I looked into the csv code, and it seems CSV::foreach does not accept an argument for reading in binary mode.
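The combination reported to work above, transcoding UTF-16LE to UTF-8 while reading, can be wrapped in a small helper. This is a sketch; `read_utf16le_csv` is a hypothetical name and the sample file is generated only for the demonstration.

```ruby
require "csv"
require "tmpdir"

# Hypothetical helper: "UTF-16LE:UTF-8" is an external:internal encoding
# pair, so Ruby transcodes the file to UTF-8 before CSV parses it.
def read_utf16le_csv(path)
  CSV.read(path, encoding: "UTF-16LE:UTF-8")
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "sample.txt")
  File.binwrite(path, "name,city\nAlice,Tokyo\n".encode(Encoding::UTF_16LE))
  p read_utf16le_csv(path) # => [["name", "city"], ["Alice", "Tokyo"]]
end
```

In recent csv releases, `CSV.foreach` also takes a mode argument after the path, so a mode string such as `"r:UTF-16LE:UTF-8"` can be passed there as well.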
