Comments (3)
I couldn't find the normative specification of the TPC-H data format. According to the dbgen tool, these are ASCII files containing records whose fields are, by default, separated by a pipe character (|) and which are terminated by a line-feed character (\n). Several examples are shown in the answers directory.
While this format is not CSV, its similarity should be sufficient for FastCSV to easily read and write such files by configuring the field separator to | (CsvReader.builder().fieldSeparator('|') / CsvWriter.builder().fieldSeparator('|')). If, for any reason, you need to add a field separator at the end of each line when writing such files, simply add one more null field to the record. When reading such files, you could ignore the last field in each record.
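To illustrate why a trailing separator shows up as one extra empty field, here is a minimal plain-JDK sketch that does not use FastCSV at all. The class name TpchLine, its methods, and the sample record are hypothetical, and the naive split assumes the data contains no quoting and no embedded | characters.

```java
import java.util.Arrays;
import java.util.List;

public class TpchLine {

    // Splits one record on '|' and drops the empty field that a
    // trailing separator produces. Naive: no quoting support (assumption).
    static List<String> parse(String line) {
        // Limit -1 keeps trailing empty strings, so "a|b|" yields ["a", "b", ""].
        String[] parts = line.split("\\|", -1);
        boolean trailingEmpty = parts.length > 0 && parts[parts.length - 1].isEmpty();
        int count = trailingEmpty ? parts.length - 1 : parts.length;
        return Arrays.asList(parts).subList(0, count);
    }

    // Joins fields with '|' and appends the trailing separator, which is
    // equivalent to writing one additional empty field.
    static String format(List<String> fields) {
        return String.join("|", fields) + "|";
    }
}
```

With this sketch, TpchLine.parse("1|foo|bar|") yields the three fields 1, foo and bar, and TpchLine.format turns them back into the same line.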
You may encounter problems when the data itself contains the field separator |, newline characters, or quotation marks. But the examples I have seen seem not to use these characters.
from fastcsv.
Thanks for your feedback! But I think there are still some use cases that need a record delimiter; for example, Snowflake and MySQL both support user-defined record delimiters.
ref: https://docs.snowflake.com/en/sql-reference/sql/create-file-format
RECORD_DELIMITER = 'character' | NONE
Use
Data loading, data unloading, and external tables
Definition
One or more singlebyte or multibyte characters that separate records in an input file (data loading) or unloaded file (data unloading). Accepts common escape sequences or the following singlebyte or multibyte characters:
Singlebyte characters
Octal values (prefixed by \\) or hex values (prefixed by 0x or \x). For example, for records delimited by the circumflex accent (^) character, specify the octal (\\136) or hex (0x5e) value.
Multibyte characters
Hex values (prefixed by \x). For example, for records delimited by the cent (¢) character, specify the hex (\xC2\xA2) value.
The delimiter for RECORD_DELIMITER or FIELD_DELIMITER cannot be a substring of the delimiter for the other file format option (e.g. FIELD_DELIMITER = 'aa' RECORD_DELIMITER = 'aabb').
Is there any way I can implement this feature with FastCSV without the self-defined record delimiter support?
Is there any way I can implement this feature with FastCSV without the self-defined record delimiter support?
To make use of custom line/record delimiters with FastCSV, you may create an implementation of java.io.Reader that replaces the record delimiter with the standard line-feed character. Then, pass this customized Reader to the CsvReader. Similarly, achieve the same for the CsvWriter by implementing a custom java.io.Writer that replaces the line-feed character with the record delimiter.
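The approach described above could be sketched roughly as follows, using only java.io. The class and method names (RecordDelimiters, DelimiterToLineFeedReader, LineFeedToDelimiterWriter, normalize, denormalize) are hypothetical, and the sketch assumes a single-character record delimiter; multi-character delimiters would need extra buffering. It also replaces characters blindly, so a delimiter or line feed inside a quoted field is converted as well.

```java
import java.io.FilterReader;
import java.io.FilterWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

public class RecordDelimiters {

    // Reader that turns every occurrence of the record delimiter into '\n'.
    static final class DelimiterToLineFeedReader extends FilterReader {
        private final char delimiter;

        DelimiterToLineFeedReader(Reader in, char delimiter) {
            super(in);
            this.delimiter = delimiter;
        }

        @Override
        public int read() throws IOException {
            int c = super.read();
            return c == delimiter ? '\n' : c;
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            // n is -1 at end of stream, in which case the loop does not run.
            for (int i = off; i < off + n; i++) {
                if (buf[i] == delimiter) {
                    buf[i] = '\n';
                }
            }
            return n;
        }
    }

    // Writer that turns every '\n' into the record delimiter.
    static final class LineFeedToDelimiterWriter extends FilterWriter {
        private final char delimiter;

        LineFeedToDelimiterWriter(Writer out, char delimiter) {
            super(out);
            this.delimiter = delimiter;
        }

        @Override
        public void write(int c) throws IOException {
            super.write(c == '\n' ? delimiter : c);
        }

        @Override
        public void write(char[] buf, int off, int len) throws IOException {
            for (int i = off; i < off + len; i++) {
                write(buf[i]);
            }
        }

        @Override
        public void write(String s, int off, int len) throws IOException {
            for (int i = off; i < off + len; i++) {
                write(s.charAt(i));
            }
        }
    }

    // Demo helper: reads the whole input through the converting Reader.
    static String normalize(String input, char delimiter) {
        try (Reader r = new DelimiterToLineFeedReader(new StringReader(input), delimiter)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[8];
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Demo helper: writes the whole input through the converting Writer.
    static String denormalize(String input, char delimiter) {
        StringWriter sink = new StringWriter();
        try (Writer w = new LineFeedToDelimiterWriter(sink, delimiter)) {
            w.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toString();
    }
}
```

In real use, the wrapped Reader or Writer would be handed to CsvReader or CsvWriter instead of the string helpers. For the writing direction, the CsvWriter would presumably have to be configured to emit a plain line feed as its line delimiter, since a carriage return would pass through this Writer unchanged.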
But I think there are still some use cases that need a record delimiter; for example, Snowflake and MySQL both support user-defined record delimiters.
The mere presence of this feature in other implementations does not justify its inclusion in FastCSV. Could you share a concrete use case where this feature would be required in the context of CSV (which is not the case for TPC-H data)? Preferably something with a normative specification.
Currently, I don't see how this feature aligns with the goals of FastCSV.