Git Product home page Git Product logo

Comments (9)

dehesa avatar dehesa commented on September 12, 2024

Hi @xsleonard, I am out for the weekend so I don't think I will manage to solve this till Tuesday. Your bug description seems clear and I believe I understand the problem, but I would appreciate if you could provide a CSV with two or three rows (of dummy data) triggering the problem. That way I can even include it as a test so it doesn't happen again.

from codablecsv.

xsleonard avatar xsleonard commented on September 12, 2024

Ok, I'll look into more on Monday as well.
The attached file is the simplest case with a header (1 column) so its easier to step through in debugging.
It has a .txt extension because .csv extensions are blocked for upload here, you can rename it as .csv.
csv-crlf.txt

from codablecsv.

xsleonard avatar xsleonard commented on September 12, 2024

Using a trimStrategy of ["\r"] avoids a crash (CharacterSet.whitespaces.union(["\r"]) in my case).

A more intuitive solution would accept multicharacter Delimiter.Row (that's what I was doing before, $0.delimiter.row = "\r\n") but seems trickier to implement in the _parseEscapedField function.

from codablecsv.

dehesa avatar dehesa commented on September 12, 2024

Hey @xsleonard. I built some tests and tried your file and everything worked.

extension DecodingBadInputTests {
    /// Tests CRLF delimiters with escape fields.
    func testFileCRLF() throws {
        let decoder = CSVDecoder {
            $0.encoding = .utf8
            $0.bufferingStrategy = .sequential
            $0.headerStrategy = .firstLine
            $0.trimStrategy = .whitespaces
            $0.delimiters.row = "\r\n"
        }
        let file = try decoder.decode([[String]].self, from: URL(fileURLWithPath: "/Users/marcos/Desktop/test.csv"))
        XCTAssertEqual(1, file.count)
        XCTAssertEqual(["G"], file[0])
    }
   
    /// Tests CRLF delimiters with escape fields.
    func testLazyFileCRLF() throws {
        let decoder = try CSVDecoder {
            $0.encoding = .utf8
            $0.bufferingStrategy = .sequential
            $0.headerStrategy = .firstLine
            $0.trimStrategy = .whitespaces
            $0.delimiters.row = "\r\n"
        }.lazy(from: URL(fileURLWithPath: "/Users/marcos/Desktop/test.csv"))
        XCTAssertEqual(["G"], try decoder.decodeRow([String].self))
        XCTAssertNil(decoder.next())
    }
}

I also recreated the file in code in case you sent me an invalid file, but it also worked.

extension DecodingBadInputTests {
    /// Tests CRLF delimiters with escape fields.
    func testInputCRLF() throws {
        let input = "Name\r\n\"G\"\r\n"
        let decoder = CSVDecoder {
            $0.encoding = .utf8
            $0.bufferingStrategy = .sequential
            $0.headerStrategy = .firstLine
            $0.trimStrategy = .whitespaces
            $0.delimiters.row = "\r\n"
        }
        let file = try decoder.decode([String].self, from: input)
        for row in file { print(row) }
    }
}

Are you sure the problem is not on your side. Maybe bad file data?

from codablecsv.

xsleonard avatar xsleonard commented on September 12, 2024

Sorry, I described/diagnosed the problem wrong.

If you change $0.delimiters.row = "\r\n" to $0.delimiters.row = "\n" (or leave it unassigned to default to \n) you will see the error.

The error (invalidInput) occurs if the file has \r\n line endings but $0.delimiters.row = "\n". Previously the parser would leave a \r on the last field, but otherwise parse the file, so I'd trim control characters from the last field (post-decoding) in case the file was CRLF. I don't have control over whether or not the file is LF or CRLF.

The solution to parsing both CRLF and LF files is using a trimStrategy containing \r, which is fine for my case, but is not obvious, and the error message (invalidInput) does not point to the problem. Any CSV file generated on a windows machine will have CRLF line endings, so it needs to be handled for user-provided/wild CSVs.

I suggest one of these:

  • Improving the error message if it suspects CRLF, and/or including the invalid scalar in the error message userinfo, and a pointer to the fix to enable both CRLF + LF handling (trimStrategy)
  • Handling both LF and CRLF by default

from codablecsv.

dehesa avatar dehesa commented on September 12, 2024

You are right. The error message doesn't provide correct information. I have changed it adding more useful data and push to master.

Regarding changing the default (which is the same problem you brought in #33): It has some side-effects, since \r may be meaningful to some users and they may want that data in their fields. That is why I make the user to explicitly change the configuration values. I agree is not ideal and I will probably add some information in the README about CLRF files.

from codablecsv.

xsleonard avatar xsleonard commented on September 12, 2024

I wouldn't make the \r in trimCharacters the default, besides your point, it also forces the use of CharacterSet which you've mentioned is a performance issue, so would impact those who need performance. Possibly it can support multiple row delimiters (\n and \r\n) or if delimiter is not explicitly configured, handle it with special-case code internally

from codablecsv.

dehesa avatar dehesa commented on September 12, 2024

I agree that having a special row delimiter that cover both \n and \r\n would be nice to have. I did add that to the Roadmap the first time you brought it up. But I don't know when I will have time to implement it. If it is something that really bothers you, you can always try to implement it yourself 😅 The code that needs to be changed is pretty encapsulated in CSVReader.

from codablecsv.

dehesa avatar dehesa commented on September 12, 2024

This functionality is now supported by CodableCSV. Thanks @xsleonard for bringing this forward.

from codablecsv.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.