[Feature Request] Add support for custom {Record, Line} delimiter (fastcsv, closed)

IIvm commented on September 16, 2024

[Feature Request] Add support for custom {Record, Line} delimiter

Comments (3)

osiegmar commented on September 16, 2024

I couldn't find a normative specification of the TPC-H data format. According to the dbgen tool, these are ASCII files containing records whose fields are, by default, separated by a pipe character (|), with each record terminated by a line-feed character (\n). Several examples are shown in the answers directory.

While this format is not CSV, its similarity should be sufficient for FastCSV to easily read and write such files by configuring the field separator to | (CsvReader.builder().fieldSeparator('|') / CsvWriter.builder().fieldSeparator('|')). If, for any reason, you need to add a field separator at the end of each line when writing such files, simply add one more null field to the record. When reading such files, you could ignore the last field in each record.
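The trailing-separator trick can be illustrated without FastCSV. The plain-Java sketch below uses hypothetical names and plain string operations, not FastCSV's API, and it assumes dbgen-style lines such as 1|AAA|2.5| with no quoting and no embedded separators. Writing appends one extra empty field so the join ends with the final |; reading splits with a negative limit so the trailing empty field is kept, then ignores it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper illustrating the idea with plain string operations; it is not
// FastCSV and performs no quoting, so it only suits data without '|', quotes or newlines.
class TrailingSeparator {

    // Writing: one extra empty field makes the joined record end with a trailing '|'.
    static String writeRecord(List<String> fields) {
        List<String> withExtra = new ArrayList<>(fields);
        withExtra.add("");
        return String.join("|", withExtra) + "\n";
    }

    // Reading: a limit of -1 keeps the trailing empty field, which is then ignored.
    // Assumes the line has already been stripped of its line feed and ends with '|'.
    static List<String> readRecord(String line) {
        String[] parts = line.split("\\|", -1);
        return Arrays.asList(parts).subList(0, parts.length - 1);
    }
}
```

With FastCSV itself, the same effect comes from the fieldSeparator('|') configuration plus the extra null field described above; the sketch only shows why that extra field yields the trailing separator.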

You may run into problems if the data itself contains the field separator (|), newline characters, or quotation marks. However, the examples I have seen do not appear to use these characters.

IIvm commented on September 16, 2024

Thanks for your feedback! But I think there are still use cases that need a custom record delimiter; Snowflake and MySQL, for example, both support user-defined record delimiters.

ref: https://docs.snowflake.com/en/sql-reference/sql/create-file-format

RECORD_DELIMITER = 'character' | NONE
Use: Data loading, data unloading, and external tables

Definition: One or more singlebyte or multibyte characters that separate records in an input file (data loading) or unloaded file (data unloading). Accepts common escape sequences or the following singlebyte or multibyte characters:

Singlebyte characters: Octal values (prefixed by \\) or hex values (prefixed by 0x or \x). For example, for records delimited by the circumflex accent (^) character, specify the octal (\\136) or hex (0x5e) value.

Multibyte characters: Hex values (prefixed by \x). For example, for records delimited by the cent (¢) character, specify the hex (\xC2\xA2) value.

The delimiter for RECORD_DELIMITER or FIELD_DELIMITER cannot be a substring of the delimiter for the other file format option (e.g. FIELD_DELIMITER = 'aa' RECORD_DELIMITER = 'aabb').
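To make the quoted escape notation and the substring rule concrete, here is a small plain-Java sketch. It is an illustration only, not Snowflake's implementation; the class and method names are hypothetical, and it assumes that multibyte hex values denote UTF-8 bytes, which matches the \xC2\xA2 example for the cent sign.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helpers illustrating the quoted Snowflake notation; not Snowflake code.
class DelimiterOptions {

    // Single-byte value given in octal, e.g. "136" for the circumflex accent.
    static char fromOctal(String digits) {
        return (char) Integer.parseInt(digits, 8);
    }

    // Single-byte value given in hex, e.g. "5e" for the circumflex accent.
    static char fromHex(String digits) {
        return (char) Integer.parseInt(digits, 16);
    }

    // Multibyte delimiter given as hex bytes, decoded here as UTF-8 (an assumption).
    static String fromHexBytes(String... hexBytes) {
        byte[] bytes = new byte[hexBytes.length];
        for (int i = 0; i < hexBytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hexBytes[i], 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // The quoted rule: neither delimiter may be a substring of the other.
    static boolean compatible(String fieldDelimiter, String recordDelimiter) {
        return !fieldDelimiter.contains(recordDelimiter)
                && !recordDelimiter.contains(fieldDelimiter);
    }
}
```

For example, compatible("aa", "aabb") is false, because the field delimiter occurs inside the record delimiter and a parser could not tell where one ends and the other begins.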

Is there any way I can implement this feature with FastCSV without the self-defined record delimiter support?

osiegmar commented on September 16, 2024

Is there any way I can implement this feature with FastCSV without the self-defined record delimiter support?

To use custom line/record delimiters with FastCSV, you could create an implementation of java.io.Reader that replaces the record delimiter with the standard line-feed character, and then pass this customized Reader to the CsvReader. Similarly, for the CsvWriter, implement a custom java.io.Writer that replaces the line-feed character with the record delimiter.
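A minimal sketch of that suggestion, using only java.io. The class names are hypothetical and not part of FastCSV. The reader maps every occurrence of a (possibly multi-character) record delimiter to a line feed; the writer maps every line feed back to the delimiter. Both assume the delimiter never appears inside field data (quoted fields included) and that unquoted data contains no literal line feeds. FastCSV's CsvWriter writes CRLF line endings by default, so the writer sketch assumes the CsvWriter has been configured to use LF; otherwise a carriage return would remain before each delimiter.

```java
import java.io.FilterWriter;
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.Writer;

// Hypothetical adapter: presents each custom record delimiter as a single '\n'.
class RecordDelimiterReader extends Reader {
    private final PushbackReader in;
    private final char[] delimiter;

    RecordDelimiterReader(Reader source, String delimiter) {
        if (delimiter.isEmpty()) throw new IllegalArgumentException("empty delimiter");
        this.delimiter = delimiter.toCharArray();
        // Enough pushback room to restore a partially matched delimiter.
        this.in = new PushbackReader(source, this.delimiter.length);
    }

    @Override
    public int read() throws IOException {
        int c = in.read();
        if (c != delimiter[0]) return c; // also passes through end of input (-1)
        char[] seen = new char[delimiter.length];
        seen[0] = (char) c;
        for (int n = 1; n < delimiter.length; n++) {
            int next = in.read();
            if (next != delimiter[n]) {
                // Only a partial match: push back what was consumed, in original order.
                if (next != -1) in.unread(next);
                in.unread(seen, 1, n - 1);
                return c;
            }
            seen[n] = (char) next;
        }
        return '\n'; // full delimiter matched
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) return 0;
        int count = 0;
        int c;
        while (count < len && (c = read()) != -1) {
            cbuf[off + count++] = (char) c;
        }
        return count == 0 ? -1 : count;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}

// Hypothetical adapter: writes the custom record delimiter wherever '\n' is written.
class RecordDelimiterWriter extends FilterWriter {
    private final String delimiter;

    RecordDelimiterWriter(Writer target, String delimiter) {
        super(target);
        this.delimiter = delimiter;
    }

    @Override
    public void write(int c) throws IOException {
        if (c == '\n') out.write(delimiter);
        else out.write(c);
    }

    @Override
    public void write(char[] cbuf, int off, int len) throws IOException {
        for (int i = off; i < off + len; i++) write(cbuf[i]);
    }

    @Override
    public void write(String str, int off, int len) throws IOException {
        for (int i = off; i < off + len; i++) write(str.charAt(i));
    }
}
```

In use, the wrapped reader (for example new RecordDelimiterReader(fileReader, "^^")) would be handed to the CsvReader builder in place of the raw reader, and the CsvWriter's target Writer would be wrapped the same way; the exact builder method to call depends on the FastCSV version.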

But I think there are still use cases that need a custom record delimiter; Snowflake and MySQL, for example, both support user-defined record delimiters.

The mere presence of this feature in other implementations does not justify its inclusion in FastCSV. Could you share a concrete use case where this feature would be required in the context of CSV (which is not the case for TPC-H data)? Preferably something with a normative specification.

Currently, I don't see how this feature aligns with the goals of FastCSV.
