
fastcsv's Introduction

👨🏻‍💻 Oliver Siegmar


fastcsv's People

Contributors

charphi · dependabot[bot] · juergen-albert · nathankleyn · obolrom · osiegmar · richard-lionheart


fastcsv's Issues

Add QuoteStrategy parameter in CsvReader to handle empty strings vs null values

QuoteStrategy.EMPTY is convenient when I want to differentiate empty strings from null values in the output file.

However, there is no such parameter in CsvReader, which means I cannot read back the original data.

Below is a unit test showing this:

    /**
     * Writes a single row of special values, reads back the file, and tests
     * that read values exactly match the original values.
     */
    @Test
    public void test() throws IOException {
        String[] values = new String[]{
            "Simple text",
            "Multiline\ntext",
            // a string containing a comma
            "1,2",
            // a string with double quotes
            "\"Hello\"",
            // a string containing a single character: a double quote
            "\"",
            // an empty string
            "",
            // a null value
            null
        };

        File tmp = new File("C:/tmp/csv.txt");

        // write the csv file
        try (CsvWriter csv = CsvWriter.builder()
            .quoteStrategy(QuoteStrategy.EMPTY)
            .build(tmp.toPath(), StandardCharsets.UTF_8)) {

            csv.writeRow(values);
        }

        // read back the file
        String[] readValues = null;
        try (CsvReader csv = CsvReader.builder()
            .skipEmptyRows(true)
            .build(tmp.toPath(), StandardCharsets.UTF_8)) {

            for (CsvRow row : csv) {
                readValues = new String[row.getFieldCount()];
                for (int i = 0; i < readValues.length; i++) {
                    readValues[i] = row.getField(i);
                }
            }
        }

        Assert.assertNotNull(readValues);
        // this fails because of the null value read back as an empty string
        Assert.assertArrayEquals(values, readValues);
    }

It would be very nice to have the QuoteStrategy parameter in the reader.
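For illustration only, this is roughly the mapping a reader-side QuoteStrategy counterpart would have to perform. The decodeField helper and the idea of having access to the raw, still-quoted field text are assumptions for this sketch — the current CsvRow API does not expose quoting information.

```java
public class QuoteAwareDecode {

    /**
     * Hypothetical reader-side counterpart of QuoteStrategy.EMPTY:
     * an unquoted empty field becomes null, a quoted empty field ("")
     * stays an empty string. Assumes the raw field text (quotes intact)
     * is available, which FastCSV's CsvRow does not currently provide.
     */
    static String decodeField(String rawField) {
        if (rawField.isEmpty()) {
            return null;            // written for a null value
        }
        if (rawField.equals("\"\"")) {
            return "";              // written for an empty string
        }
        // strip surrounding quotes and collapse doubled quotes
        if (rawField.length() >= 2
                && rawField.charAt(0) == '"'
                && rawField.charAt(rawField.length() - 1) == '"') {
            return rawField.substring(1, rawField.length() - 1).replace("\"\"", "\"");
        }
        return rawField;
    }

    public static void main(String[] args) {
        System.out.println(decodeField("") == null);        // true
        System.out.println(decodeField("\"\"").isEmpty());  // true
        System.out.println(decodeField("\"a\"\"b\""));      // a"b
    }
}
```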

CsvRow returns incorrect starting offset for multibyte input

Describe the bug
Random access by offset is incorrect.
Test file: Item.csv
Runtime log:

CsvRow[originalLineNumber=1, startingOffset=0, fields=[JHXMMLCARMY0926DYG0111RL, 139707794, Women’s V Neck Nightshirt Cotton Casual Sleepwear Short Sleeve Nightgown S-XXL, ACTIVE, PUBLISHED, , Clothing, 16.32, USD, 16.32, 0.0, , 2038356, VALUE, 0.48, "LB", , Seller Fulfilled, , 2A5KQQ6BAE5S, 05432968344899, , http://www.walmart.com/ip/Women-s-V-Neck-Nightshirt-Cotton-Casual-Sleepwear-Short-Sleeve-Nightgown-S-XXL/139707794, https://i5.walmartimages.com/asr/f277aaf6-4bf0-4635-be9b-9ecf8826bbfa.c175ab2fc00cdfa902ee3408d6d4c586.jpeg, UNNAV, ["UNNAV"], Carlendan, 10/29/2021, 12/31/2049, 10/29/2021, 10/29/2021, 0, , Y, , , , ], comment=false]
CsvRow[originalLineNumber=1, startingOffset=0, fields=[, ], comment=false]

To Reproduce
JUnit test to reproduce the behavior:

    private static void randomAccessFile() {
        try {

            final Path path = Paths.get(System.getProperty("user.dir") + "/data/Item.csv");

            // collect row offsets (could also be done in larger chunks)
            final List<Long> offsets;
            try (CsvReader csvReader = CsvReader.builder().build(path, UTF_8)) {
                offsets = csvReader.stream()
                        .map(CsvRow::getStartingOffset)
                        .collect(Collectors.toList());
            }

            // random access read with offset seeking
            try (RandomAccessFile raf = new RandomAccessFile(path.toFile(), "r");
                 FileInputStream fin = new FileInputStream(raf.getFD());
                 InputStreamReader isr = new InputStreamReader(fin, UTF_8);
                 CsvReader reader = CsvReader.builder().build(isr);
                 CloseableIterator<CsvRow> iterator = reader.iterator()) {

                // seek to file offset of row 5
                raf.seek(offsets.get(5));
                reader.resetBuffer();
                System.out.println(iterator.next());

                // seek to file offset of row 8
                raf.seek(offsets.get(8));
                reader.resetBuffer();
                System.out.println(iterator.next());
            }
        } catch (final IOException e) {
            throw new UncheckedIOException(e);
        }
    }


Unable to contact via email

Hi,

I'm a Java programmer. I saw the performance statistics and they were awesome. I have used the CSV reader as well.

How does it work so fast? Which change makes it fast, and what is missing in other libraries?

I'm very curious to know. Since I am unable to find an email address, I'm filing this as a feature request. Please don't take it the wrong way.

Thanks

Can't write to file

Hi, I am trying to write using the appendLine() function, but the file is empty after I call it. Is this a bug?

How to append a row to an existing file?

Maybe I'm missing something obvious: I simply want to add a row of data to a CSV file that already exists on disk.

Using CsvWriter and writeRow() does not create a row at the bottom of the file. I noticed things changed in the version overhaul, and the CsvAppender and appendLine() functionality is gone.

So, using CsvWriter, how can you open an existing CSV file and add a row of new data?
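As a workaround sketch (standard library only, with hand-rolled quoting purely to stay self-contained): open the file in append mode and write the new row through it. FastCSV's builder also accepts a plain Writer, so the same append-mode idea should apply there, though I have not verified the exact builder overloads.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class AppendRow {

    // Appends one CSV row to an existing file without rewriting it.
    // The quoting here is minimal and hand-rolled only to keep the
    // sketch self-contained; a CSV writer would normally do this.
    static void appendRow(Path file, String... fields) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) sb.append(',');
            sb.append('"').append(fields[i].replace("\"", "\"\"")).append('"');
        }
        sb.append("\r\n");
        // APPEND is the key: the new row lands after the existing content
        Files.writeString(file, sb, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("csv", ".csv");
        appendRow(tmp, "a", "b");
        appendRow(tmp, "c", "d");
        System.out.println(Files.readString(tmp));
    }
}
```

The same trick should work with FastCSV itself by handing it a Writer opened with StandardOpenOption.APPEND instead of a Path — but check the builder's current overloads before relying on that.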

csvReader.read(file, StandardCharsets.UTF_8) returns null even if the file contains a header

Hi!

I have an issue here. I know you wrote the code this way, but this line returns null:

CsvContainer csvContainer = csvReader.read(file, StandardCharsets.UTF_8);

even though I have a header. I understand csvContainer being null if the CSV file is empty, but it should not be null if the file has at least one row or a header.

Edit:

Found out that this can be done this way.

public void newRow(String rowText) {
    try {
        CsvParser csvParser = csvReader.parse(file, StandardCharsets.UTF_8);
        Collection<String[]> data = new ArrayList<>();
        csvParser.nextRow(); // Need to call this to get the header
        data.add(csvParser.getHeader().toArray(new String[0])); // Add header; toArray(new String[0]) avoids a ClassCastException
        CsvRow csvRow;
        while ((csvRow = csvParser.nextRow()) != null)
            data.add(csvRow.getFields().toArray(new String[0])); // Add existing lines to data

        data.add(rowText.split(delimiter)); // Add the new line to data
        csvWriter.write(file, StandardCharsets.UTF_8, data); // Auto close
    } catch (IOException e) {
        dialogs.exception("Cannot add new rows", e);
    }
}

FastCSV should have a method to append text to files. I know you have such a method, but it first removes all data and then fills the file again.

It's a great library and I will use it with Deeplearning4j. But I wonder if you could make an interface so it would be easier to use?

By easier I mean that a programmer should not need to write this much code to handle null values and exceptions:

package se.danielmartensson.tools;

import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import de.siegmar.fastcsv.reader.CsvContainer;
import de.siegmar.fastcsv.reader.CsvParser;
import de.siegmar.fastcsv.reader.CsvReader;
import de.siegmar.fastcsv.reader.CsvRow;
import de.siegmar.fastcsv.writer.CsvAppender;
import de.siegmar.fastcsv.writer.CsvWriter;
import javafx.scene.control.Alert.AlertType;


/**
 * The reason why we are using FastCSV and not SQLite, is due to memory use.
 * @author Daniel Mårtensson
 *
 */
public class CSVHandler {
	private Dialogs dialogs = new Dialogs();
	private CsvReader csvReader;
	private CsvWriter csvWriter;
	private File file;
	private String delimiter;
	
	/**
	 * Constructor 
	 * @param fileHandler File handler object
	 * @param filePath Path to our file
	 * @param delimiter Separator "," or ";" etc.
	 * @param headers String that contains name of columns with delimiter as separator
	 */
	public CSVHandler(FileHandler fileHandler, String filePath, String delimiter, String headers) {
        file = fileHandler.loadFile(filePath);
        this.delimiter = delimiter;
        csvWriter = new CsvWriter();
        csvReader = new CsvReader();
        csvReader.setFieldSeparator(delimiter.charAt(0));
        csvWriter.setFieldSeparator(delimiter.charAt(0));
        
        /*
         * Check if file has 0 rows = empty
         */
        if(getTotalRows() == 0)
        	newHeader(headers); // Write our header if we don't have one
        csvReader.setContainsHeader(true);
        
	}

	/**
	 * Get a single cell
	 * @param row Row index
	 * @param header Header name
	 * @return String
	 */
	public String getCell(int row, String header) {
		try {
			CsvContainer csvContainer = csvReader.read(file, StandardCharsets.UTF_8);
			int totalRows = csvContainer.getRowCount();
			if(row > totalRows)
				dialogs.alertDialog(AlertType.WARNING, "Index", "Index out of bounds: " + row + " > " + totalRows);
			else
				for (int i = 0; i < totalRows; i++)
					if(i == row)
						return csvContainer.getRow(i).getField(header); // Success!
			return ""; // Nothing happens!
		} catch (IOException | NullPointerException e) {
			dialogs.exception("Cannot get cell. Returning empty string", e);
			return ""; // Empty
		}
	}
	
	/**
	 * Return a complete row
	 * @param row Row number that we want to return
	 * @return
	 */
	public List<String> getRow(int row) {
		try {
			CsvContainer csvContainer = csvReader.read(file, StandardCharsets.UTF_8);
			return csvContainer.getRow(row).getFields();
		}catch(IOException | NullPointerException e) {
			dialogs.exception("Cannot get rows. Return List<String> = null", e);
			return null;
		}
	}
	
	/**
	 * Set one value to a single cell
	 * @param row Row number
	 * @param header Our string header
	 * @param cellValue Our value that we want to insert
	 */
	public void setCell(int row, String header, String cellValue) {
		try {
			/*
			 * Get total columns and get the current cell value in a row 
			 */
			CsvContainer csvContainer = csvReader.read(file, StandardCharsets.UTF_8);
			CsvRow csvRow = csvContainer.getRow(row);
			String currentCell = csvRow.getField(header);
			int totalColumns = csvContainer.getRow(row).getFields().size();

			/*
			 * Search for column index by searching for a known cell value
			 */
			int columIndex = 0;
			while(columIndex < totalColumns)
				if(csvRow.getField(columIndex).equals(currentCell))
					break;
				else
					columIndex++;
			
			/*
			 * Insert cellValue in column and insert row in container
			 */
			csvRow.getFields().set(columIndex, cellValue); 
			csvContainer.getRows().set(row, csvRow); // TODO: Test removing this line
			
			/*
			 * Collect and write all
			 */
			writeAll(csvContainer);
		} catch (IOException | NullPointerException e) {
			dialogs.exception("Cannot set cell", e);
		}
	}
	
	/**
	 * Replace a whole row
	 * @param row row number
	 * @param text text with delimiter separator
	 */
	public void replaceRow(int row, String text) {
		try {
			/*
			 * Replace all items in a row
			 */
			CsvContainer csvContainer = csvReader.read(file, StandardCharsets.UTF_8);
			CsvRow csvRow = csvContainer.getRow(row);
			String[] list = text.split(String.valueOf(delimiter));
			int totalColumns = csvContainer.getRow(row).getFields().size();
			if(list.length == totalColumns) {
				for(int i = 0; i < list.length; i++)
					csvRow.getFields().set(i, list[i]);
				
				/*
				 * Insert row in container
				 */
				csvContainer.getRows().set(row, csvRow); // TODO: Test removing this line
				
				/*
				 * Collect and write all
				 */
				writeAll(csvContainer);
			}else {
				dialogs.alertDialog(AlertType.ERROR, "Insert", "Not same dimension as CSV file");
			}
		} catch (IOException | NullPointerException e) {
			dialogs.exception("Cannot replace row", e);
		}
	}
	
	/**
	 * Search for a cell value under a given header
	 * @param cellValue The cell in form of a string
	 * @param header Name of the column
	 * @return boolean
	 */
	public boolean exist(String cellValue, String header) {
		try {
			CsvContainer csvContainer = csvReader.read(file, StandardCharsets.UTF_8);
			if(csvContainer == null)
				return false; // Nothing has been added, except the header
			for(int i = 0; i < csvContainer.getRowCount(); i++)
				if(cellValue.equals(csvContainer.getRow(i).getField(header)) == true)
					return true; // Yes
			return false; // Nope
		} catch (IOException | NullPointerException e) {
			dialogs.exception("Cannot check existence. Returning false", e);
			return false;
		}
	}
	
	/**
	 * Find on which row cellValue is on a header
	 * @param cellValue
	 * @param header
	 * @return int
	 */
	public int findRow(String cellValue, String header) {
		try {
			CsvContainer csvContainer = csvReader.read(file, StandardCharsets.UTF_8);
			for(int i = 0; i < csvContainer.getRowCount(); i++)
				if(cellValue.equals(csvContainer.getRow(i).getField(header)) == true)
					return i; // Yes
			return 0; // Nope
		} catch (IOException | NullPointerException e) {
			dialogs.exception("Cannot find row index. Returning 0", e);
			return 0;
		}
	}
	
	/**
	 * Delete the whole row at least if we got a row
	 * @param row row number 
	 */
	public void deleteRow(int row)  {
		try {
			/*
			 * Remove a selected row
			 */
			CsvContainer csvContainer = csvReader.read(file, StandardCharsets.UTF_8);
			csvContainer.getRows().remove(row);
			writeAll(csvContainer);
			
		} catch (IOException | NullPointerException e) {
			dialogs.exception("Cannot delete row", e);
		}
	}
	
	/**
	 * Write all to the file
	 * @param csvContainer The CsvContainer object
	 * @throws IOException
	 */
	private void writeAll(CsvContainer csvContainer) throws IOException {
		/*
		 * Collect and write all
		 */
		Collection<String[]> data = new ArrayList<>();
		for(CsvRow csvRow : csvContainer.getRows())
			data.add(csvRow.getFields().toArray(new String[0])); // toArray(new String[0]) avoids a ClassCastException
		csvWriter.write(file, StandardCharsets.UTF_8, data); // Auto close
	}

	/**
	 * Write a new header to the CSV file - This won't give us csvAppender == null if we have empty file
	 * @param rowText Enter the string
	 */
	public void newHeader(String rowText) {
		try {
			Collection<String[]> data = new ArrayList<>();
			data.add(rowText.split(delimiter)); // Add the header data
			csvWriter.write(file, StandardCharsets.UTF_8, data); // Auto close
		} catch (IOException | NullPointerException e) {
			dialogs.exception("Cannot write new row", e);
		}
	}
	
	/**
	 * Create a new row
	 * @param rowText
	 */
	public void newRow(String rowText) {
		try {
			CsvParser csvParser = csvReader.parse(file, StandardCharsets.UTF_8);
			Collection<String[]> data = new ArrayList<>();
			CsvRow csvRow = csvParser.nextRow(); // Need to call this to get the header
			data.add(csvParser.getHeader().toArray(new String[0])); // Add header
			data.add(csvRow.getFields().toArray(new String[0])); // Add the row under the header
			while((csvRow = csvParser.nextRow()) != null) {
				data.add(csvRow.getFields().toArray(new String[0])); // Add existing lines to data
			}
			
			data.add(rowText.split(delimiter)); // Add the new line to data with a new line
			csvWriter.write(file, StandardCharsets.UTF_8, data); // Auto close
		} catch (IOException e) {
			dialogs.exception("Cannot add new rows", e);
		}
	}

	/**
	 * Return total rows
	 * @return int total rows
	 */
	public int getTotalRows() {
		try {
			CsvContainer csvContainer = csvReader.read(file, StandardCharsets.UTF_8);
			if(csvContainer == null)
				return 0; // Null means no rows here
			return csvContainer.getRowCount();
		} catch (IOException e) {
			dialogs.exception("Cannot find total rows. Returning 0", e);
			return 0;
		}
	}

	/**
	 * Return total columns, in this case, it's on row index 0
	 * @return int total columns
	 */
	public int getTotalColumns() {
		try {
			CsvContainer csvContainer = csvReader.read(file, StandardCharsets.UTF_8);
			if(csvContainer == null)
				return 0; // Null means no rows here
			return csvContainer.getRow(0).getFields().size();
		} catch (IOException e) {
			dialogs.exception("Cannot find total columns. Returning 0.", e);
			return 0;
		}
	}
}

Issue with reading

First column of line: id;name ;firstname;age;�������;�����
Russian-language values in rows and fields are not read correctly; they come back as � replacement characters, which suggests the file is being decoded with the wrong charset.

Caused by: java.io.IOException: Maximum buffer size 8388608 is not enough to read data

Describe the bug
I am trying to read a file of approximately 340 MB. After I reach line number 809, I get this error:
Caused by: java.io.IOException: Maximum buffer size 8388608 is not enough to read data

To Reproduce
Try reading a csv with 800k rows.

Code:

CsvReader reader = CsvReader.builder()
            .fieldSeparator('\t')
            .quoteCharacter('"')
            .commentStrategy(CommentStrategy.NONE)
            .skipEmptyRows(true)
            .errorOnDifferentFieldCount(true)
            .build(path, charset);

reader.forEach(System.out::println);

Additional context
java version "1.8.0_201"

Please make compatible for Kotlin

I get an error when using this with Kotlin on Android API level 23. I found that Android is missing many classes, such as java.time and java.nio, on older API levels. Can you fix this incompatibility?

Support for GzipInputstream and GzipOutputstream

Hi,

Currently, CsvWriter supports writing to a file, a file path, or a Writer. I wanted to write a gzipped CSV file, so I did something like the code below, where stream is a GZIPOutputStream. But it only accepts a byte array, and as you can see, the write(...) method provides a char[], so I had to convert it to a byte array. I guess because of that it takes the same time as FasterXML.

Writer writer = new Writer() {
            @Override
            public void write(@NotNull char[] cbuf, int off, int len) throws IOException {
                // Naive char-to-byte cast; this is only correct for ASCII data
                byte[] b = new byte[len];
                for (int i = 0; i < len; i++) {
                    b[i] = (byte) cbuf[off + i];
                }
                stream.write(b);
            }

            @Override
            public void flush() throws IOException {
                stream.flush();
            }

            @Override
            public void close() throws IOException {
                stream.close();
            }
        };

Thanks.
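Rather than casting chars to bytes by hand (which also silently corrupts non-ASCII text), an OutputStreamWriter layered over the GZIPOutputStream handles the encoding, and any Writer-accepting CSV API can write through it. A standard-library-only sketch — no FastCSV calls; the CSV text is written as a plain string just to keep the example self-contained:

```java
import java.io.InputStreamReader;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipCsv {

    // Writes CSV text through a Writer that gzips on the fly;
    // the Writer handles char-to-byte encoding correctly.
    static void writeGzippedCsv(Path file, String csvText) throws IOException {
        try (Writer w = new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(file)),
                StandardCharsets.UTF_8)) {
            w.write(csvText);
        }
    }

    // Reads the gzipped CSV back for verification.
    static String readGzippedCsv(Path file) throws IOException {
        try (Reader r = new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(file)),
                StandardCharsets.UTF_8)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[8192];
            int n;
            while ((n = r.read(buf)) != -1) sb.append(buf, 0, n);
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("data", ".csv.gz");
        writeGzippedCsv(tmp, "a,b\n1,2\n");
        System.out.println(readGzippedCsv(tmp)); // prints the original CSV text
    }
}
```

A CsvWriter built on such a Writer would then gzip transparently, with no manual byte conversion in the hot path.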

Direct access to the fields as String[]

Hi,

I know you've put some extra effort into providing CsvRow.getFields() as an unmodifiable List, but it would also be nice if the String[] were directly available.

The reason in my case is that my application was built on another CSV parser before and handled all rows as String[], so swapping in your library actually requires me to convert the List<String> from getFields() back to String[] to avoid rewriting too much code. However, this feels both cumbersome and unnecessary since the data is already available within the CsvRow, just not accessible. It would be nice to expose it directly.

Since many libraries deal with String[] rows, I think it could actually help FastCSV serve as a drop-in replacement for several of them.

Greetings aus Stuttgart

New read method in CsvReader

Hello!
Do you plan to add another method, where will the InputStream be the first parameter?
Thanks.

public CsvContainer read(final InputStream stream) throws IOException {
    Objects.requireNonNull(stream, "stream must not be null");
    try (final Reader reader = newInputStreamReader(stream)) {
        return read(reader);
    }
}
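In the meantime, callers can do the wrapping themselves and use the existing Reader-based entry point. A standard-library-only sketch of that wrapping (the actual read(Reader) call is omitted; UTF-8 is an assumed default here):

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.Objects;

public class StreamToReader {

    // Wraps a raw InputStream into a buffered Reader that a
    // Reader-based CSV entry point (e.g. read(Reader)) can consume.
    static Reader toReader(InputStream stream) {
        Objects.requireNonNull(stream, "stream must not be null");
        return new BufferedReader(new InputStreamReader(stream, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(
                "a,b\n1,2\n".getBytes(StandardCharsets.UTF_8));
        try (BufferedReader r = (BufferedReader) toReader(in)) {
            System.out.println(r.readLine()); // a,b
        }
    }
}
```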

When are you planning to release v2?

Sorry, this is probably not the best forum for this question, but I'm not quite sure where else to ask this - feel free to point me somewhere else for this discussion.

I see a number of references to a v2 which will probably introduce a number of improvements but also breaking changes. When are you thinking you might be releasing a v2? I see some commits from January in the version2-rewrite branch, but can't quite get a sense of how far along you are with that.

Also, it looks like v2 will require Java 8 - is that correct?

Class naming disambiguation

In the current version, some class names are not clear (see #7 (comment)).
Here are some suggestions for 2.x:

Solution 1: reader, writer, fluent factories

old name      new name
CsvReader     CsvReaderFactory
CsvParser     CsvReader
CsvWriter     CsvWriterFactory
CsvAppender   CsvWriter

CsvWriterFactory factory = new CsvWriterFactory().fieldSeparator(';');

try(StringWriter writer = new StringWriter()) {
  try(CsvWriter csv = factory.create(writer)) {
    ...
  }
}

Path file = ...;
try(CsvWriter csv = factory.create(file, StandardCharsets.UTF_8)) {
  ...
}

Solution 2: reader, writer, bean settings

old name      new name
CsvReader     CsvReaderSettings
CsvParser     CsvReader
CsvWriter     CsvWriterSettings
CsvAppender   CsvWriter

CsvWriterSettings settings = new CsvWriterSettings();
settings.setFieldSeparator(';');

try(StringWriter writer = new StringWriter()) {
  try(CsvWriter csv = CsvWriter.create(settings, writer)) {
    ...
  }
}

Path file = ...;
try(CsvWriter csv = CsvWriter.create(settings, file, StandardCharsets.UTF_8)) {
  ...
}

Solution 3: reader, writer, value settings

old name      new name
CsvReader     CsvReaderSettings
CsvParser     CsvReader
CsvWriter     CsvWriterSettings
CsvAppender   CsvWriter

CsvWriterSettings settings = CsvWriterSettings.builder().fieldSeparator(';').build();

try(StringWriter writer = new StringWriter()) {
  try(CsvWriter csv = CsvWriter.create(settings, writer)) {
    ...
  }
}

Path file = ...;
try(CsvWriter csv = CsvWriter.create(settings, file, StandardCharsets.UTF_8)) {
  ...
}

Any plans to make a release?

Hi, there are commits in master, but no releases have been made in two years. Would it be possible for you to make a release?
I'm specifically interested in the following fix.

Thanks!

Performance regression with 2.1.0

Thank you for your work on this great product! Its performance has significantly improved our application.

We had initially been using version 1.0.4, which greatly improved the performance of our CSV parsing (over commons-csv which we had been previously using). We recently tried upgrading to version 2.1.0. I like the new API, however, we noticed that there was a significant performance degradation over 1.0.4. We have a little bit of a unique data format that we deal with, which involves an embedded CSV list within a CSV column. This is how our data looks:

NAME,NUMBER,WIDGETS_LIST
john doe,123456,"""thequickbrownfoxjumpedoverthelazydog"""
john smith,7890123,"""thequickbrownfoxjumpedoverthelazydog1"",""thequickbrownfoxjumpedoverthelazydog2"""

The WIDGETS_LIST column is a variable length list that is formatted as an embedded csv string. Each item in the list is usually around 200 characters long.
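For reference, the doubled quotes mean the outer field has to be unescaped before the inner list can be parsed, so each widget value passes through two full parses. A minimal, self-contained sketch of that two-stage decode (hand-rolled RFC 4180-style splitting; FastCSV is not involved, so this says nothing about its internals):

```java
import java.util.ArrayList;
import java.util.List;

public class EmbeddedCsv {

    // Minimal RFC 4180-style field splitter, just enough for this example.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        cur.append('"'); i++;   // doubled quote -> literal quote
                    } else {
                        inQuotes = false;       // closing quote
                    }
                } else {
                    cur.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(cur.toString()); cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());
        return fields;
    }

    public static void main(String[] args) {
        // second data row from the example above, with short widget names
        String row = "john smith,7890123,\"\"\"w1\"\",\"\"w2\"\"\"";
        List<String> outer = split(row);
        // outer.get(2) is itself a CSV row: "w1","w2"
        List<String> widgets = split(outer.get(2));
        System.out.println(widgets); // [w1, w2]
    }
}
```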

With fastcsv 1.0.4 we would parse the data with code like this:

class Parser {

  List<Client> parseCsv(Path file) throws IOException {
    List<Client> clients = new ArrayList<>();
    CsvReader csvReader = new CsvReader();
    try (var parser = csvReader.parse(file, StandardCharsets.UTF_8)) {
      CsvRow row;
      while ((row = parser.nextRow()) != null) {
        String name = row.getField(0);
        String number = row.getField(1);
        List<String> widgets = parseWidgets(row.getField(2));
        clients.add(new Client(name, number, widgets));
      }
    }
    return clients;
  }

  List<String> parseWidgets(String data) throws IOException {
    CsvReader csvReader = new CsvReader();
    CsvParser parser = csvReader.parse(new StringReader(data));
    CsvRow row = parser.nextRow();
    return row != null ? List.copyOf(row.getFields()) : List.of();
  }

}

With fastcsv 2.1.0 we parse with code like this:

class Parser {

  List<Client> parseCsv(Path file) throws IOException {
    try (var parser = CsvReader.builder().build(file)) {
      return parser.stream()
         .map(row -> {
             String name = row.getField(0);
             String number = row.getField(1);
             List<String> widgets = parseWidgets(row.getField(2));
             return new Client(name, number, widgets);
         })
         .toList();
    }
  }

  List<String> parseWidgets(String data) {
    return CsvReader.builder().build(data)
        .stream().flatMap(r -> r.getFields().stream())
        .toList();
  }

}

Very surprisingly, the fastcsv 2.1.0 code takes around twice as long to parse the CSV data than version 1.0.4. It seems to be related to the embedded CSV string since for other data without the embedded CSV, 2.1.0 is actually faster than 1.0.4. However, I cannot figure out why the embedded CSV is causing such a significant slow down. To get meaningful performance results we benchmarked with a CSV file containing about 1 million rows, and processed the same file 10 times per run.

Additional context
Output of java -version:

openjdk version "17.0.2" 2022-01-18
OpenJDK Runtime Environment Temurin-17.0.2+8 (build 17.0.2+8)
OpenJDK 64-bit Server VM Temurin-17.0.2+8 (build 17.0.2+8, mixed mode, sharing)

Support for devices with API <26 using read methods

When using the following methods:
read(final File file, final Charset charset) at CsvReader.java:107
read(final Path path, final Charset charset) at CsvReader.java:122

devices below API 26 receive this exception:

java.lang.NoSuchMethodError: No virtual method toPath()Ljava/nio/file/Path; in class Ljava/io/File; or its super classes (declaration of 'java.io.File' appears in /system/framework/core-oj.jar) at de.siegmar.fastcsv.reader.CsvReader.read(CsvReader.java:109)

Most likely this is because some methods in the java.nio package are not available below API 26:
Call requires API level 26 (current min is 19): java.nio.file.Paths#get

My suggestion would be to add a minimum API requirement for these methods, or to read the file in a different way internally if the API level is below 26.
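As a sketch of the second suggestion (standard library only; the CSV parsing call itself is omitted): opening the file through FileInputStream avoids File.toPath() and the java.nio.file classes entirely, so it should work on API 19.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class LegacyFileRead {

    // Opens a Reader without touching java.nio.file, so it works on
    // Android API < 26. The Reader can then feed a Reader-based
    // CSV entry point instead of the Path-based one.
    static Reader openReader(File file) throws IOException {
        return new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("csv", ".csv");
        try (Writer w = new OutputStreamWriter(
                new FileOutputStream(tmp), StandardCharsets.UTF_8)) {
            w.write("a,b\n");
        }
        try (BufferedReader r = (BufferedReader) openReader(tmp)) {
            System.out.println(r.readLine()); // a,b
        }
    }
}
```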

ArrayIndexOutOfBoundsException in ReusableStringBuilder

Please help with this exception:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException
at java.base/java.lang.System.arraycopy(Native Method)
at de.siegmar.fastcsv.reader.ReusableStringBuilder.append(ReusableStringBuilder.java:65)
at de.siegmar.fastcsv.reader.RowReader.readLine(RowReader.java:74)
at de.siegmar.fastcsv.reader.CsvParser.nextRow(CsvParser.java:85)
at de.siegmar.fastcsv.reader.CsvReader.read(CsvReader.java:147)
at de.siegmar.fastcsv.reader.CsvReader.read(CsvReader.java:126)
at com.teamtrade.fundamental.report.screener.ReportScreener.readCsvFile(ReportScreener.java:69)
at com.teamtrade.fundamental.report.screener.ReportScreener.readQuarterReports(ReportScreener.java:141)
at com.teamtrade.fundamental.report.screener.ReportScreener.main(ReportScreener.java:45)

private static CsvContainer readCsvFile(Path path) throws IOException {
    CsvReader csvReader = new CsvReader();
    csvReader.setFieldSeparator('\t');
    csvReader.setContainsHeader(true);
    return csvReader.read(path, StandardCharsets.UTF_8); // line number 69
}

CSV file name 'txt.tsv'. It is in this archive: https://www.sec.gov/files/dera/data/financial-statement-and-notes-data-sets/2015q2_notes.zip

Support for commented lines

I would like support for commented lines. This would help in ignoring lines marked as comments in the CSV file.
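For illustration, comment handling at read time amounts to skipping any line that starts with the comment marker before field parsing. This sketch assumes '#' as the marker (CSV has no standard comment syntax) and, being line-based, would wrongly skip a '#' that opens a line inside a quoted multi-line field:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class CommentSkip {

    // Returns only the non-comment lines of a CSV text.
    // The '#' marker is an assumption; adjust per your data.
    static List<String> dataLines(String csv, char commentChar) throws IOException {
        List<String> lines = new ArrayList<>();
        BufferedReader r = new BufferedReader(new StringReader(csv));
        String line;
        while ((line = r.readLine()) != null) {
            if (!line.isEmpty() && line.charAt(0) == commentChar) {
                continue; // skip commented line
            }
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        String csv = "# generated file\na,b\n# totals below\n1,2\n";
        System.out.println(dataLines(csv, '#')); // [a,b, 1,2]
    }
}
```

FastCSV 2.x later addressed this properly with CommentStrategy in the reader builder, as seen in the reader configuration quoted in another issue above.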

Unnecessary temporary objects in CsvAppender

In CsvAppender.appendField(final String value)

final char[] valueChars = value.toCharArray();
...
for (final char c : valueChars) {

This creates a temporary array that is used only for iteration. It is easy to avoid:

for (int i = 0; i < value.length(); i++) {
    final char c = value.charAt(i);

IMO the extra index checks in charAt() weigh less than the impact of "new char[length]" on GC. Maybe I am wrong.

BTW, thanks for this nice easy to use library!

Unnecessary "throws IOException" in CsvParser.parse()

You can remove the throws because there is nothing that can throw the exception.

public CsvParser parse(final Reader reader) /* nothing here can throw IOException */ {
    return new CsvParser(Objects.requireNonNull(reader, "reader must not be null"),
        fieldSeparator, textDelimiter, containsHeader, skipEmptyRows,
        errorOnDifferentFieldCount);
}

Add CSV table row position

Currently, CsvRow has an originalLineNumber property which represents the line number in the file but not in the table.

Example:

planet,text // line 1 in file and line 1 in table view
Earth,"opan // line 2 in file and line 2 in table view
adsad
sdfsdf
sdfsdfsd
sdfsdfsd
sdfsfdsf"
Mars,marscool // line 8 in file, but line 3 in table view

Crash below Android 8.0 Oreo

When running the example on a device below Android 8.0 Oreo, this LogCat log is created after the crash:

06-14 08:57:12.906 21834-24375/? E/AndroidRuntime: FATAL EXCEPTION: AsyncTask #5
Process: com.app.my, PID: 21834
java.lang.NoSuchMethodError: No virtual method toPath()Ljava/nio/file/Path; in class Ljava/io/File; or its super classes (declaration of 'java.io.File' appears in /system/framework/core-oj.jar)
at de.siegmar.fastcsv.writer.CsvWriter.append(CsvWriter.java:149)
at com.app.my.utils.AirDataUtils.writePath(MyUtils.java:29)
at com.app.my.activities.MainActivity$56$1$1.run(MainActivity.java:2860)
at android.os.AsyncTask$SerialExecutor$1.run(AsyncTask.java:243)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1133)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:607)
at java.lang.Thread.run(Thread.java:762)

After updating the device to Android 8.0 Oreo, it works fine.
My build config:

compileSdkVersion 27
buildToolsVersion '27.0.3'
minSdkVersion 19
targetSdkVersion 27

Quoted fields at end of row are silently dropped

Many thanks for such a great piece of software, we chose to use your library for some big-data processing because we found it had vastly better performance than anything else! Huge thanks!

We ran into a slightly obscure bug parsing some huge CSVs: a field that appears at the end of a row that is quoted but empty gets silently dropped. Here's an example:

"foo",""

If you run this test you'll see it fails:

@Test
public void handlesEmptyQuotedFieldsAtEndOfRow() throws IOException {
  assertEquals(readCsvRow("foo,\"\"").getField(1), "");
}

We ran into this because we receive CSVs that have all fields quoted, even empty ones, and couldn't work out why accessing the final field would sometimes lead to an ArrayIndexOutOfBoundsException.

I've had an attempt at a fix for this which I'll raise a PR for momentarily, and I've done my best to try to stick to the performance sensitive methods you are using, but eager for feedback if I've done anything not to your liking!

Please let me know if we can help in any other way!

"NoSuchMethodError" Exception is thrown while saving to csv

@osiegmar
Unable to save to CSV due to an exception; below are my stack trace and my code.
No virtual method toPath()Ljava/nio/file/Path; in class Ljava/io/File;

if (entityList.size() > 0) {
    File storageDir = new File(Environment.getExternalStorageDirectory() + "/"
            + this.getString(R.string.app_name));

    boolean success = true;
    if (!storageDir.exists()) {
        success = storageDir.mkdirs();
    }

    if (success) {
        File file = new File(storageDir, "contacts.csv");
        CsvWriter csvWriter = new CsvWriter();
        Collection<String[]> data = new ArrayList<>();
        data.add(new String[]{"Name", "Phone Number"}); // header row, added once
        for (ContactEntity d : entityList) {
            data.add(new String[]{d.getName(), d.getName()}); // note: writes the name in both columns
        }
        try {
            csvWriter.write(file, StandardCharsets.UTF_8, data);
            Log.v(TAG, "csv file created");
        } catch (IOException e) {
            e.printStackTrace();
        }
    } else {
        Toast.makeText(this, "Directory does not exist", Toast.LENGTH_SHORT).show();
    }
} else {
    Toast.makeText(this, "No data to export to csv", Toast.LENGTH_SHORT).show();
}

ArrayIndexOutOfBoundsException And Null when reading from getField

Hi,

I have a problem using the CsvReader.
I created the CSV file using the full-CSV-at-once writer, including a header and these custom settings:
csvWriter.setFieldSeparator(';');
csvWriter.setLineDelimiter("\r\n".toCharArray());
csvWriter.setAlwaysDelimitText(true);

However, I am trying to read the CSV file using the full-file-with-header-at-once reader.
My first issue is that I get an ArrayIndexOutOfBoundsException, and I don't know whether this is caused by the file being large.
public Map<String, String> readingFileAtOnceHeader(File file) throws IOException {
Map<String, String> personMap = new HashMap<>();
CsvReader csvReader = new CsvReader();
csvReader.setContainsHeader(true);
CsvContainer csv = csvReader.read(file, StandardCharsets.UTF_8);
for (CsvRow row : csv.getRows()) {
personMap.put(row.getField("PersonID"), row.getField("CivilRegistrationNumber"));
}
return personMap;
}
Then I tried adding
csvReader.setTextDelimiter('\'');
which made the exception go away (however, this is not a setting I actually want).

The second issue is that I am trying to read two of the header fields, as illustrated in the code above, but both of them are null and I can't figure out why. I tried with indexes as well:
row.getField(0)
returns the whole row of data rather than only the field at index 0, while
row.getField(1)
throws an IndexOutOfBoundsException.
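Both symptoms are consistent with a field-separator mismatch: the file was written with ';', but the reader defaults to ','. A ';'-separated line then parses as a single field, so index 0 holds the whole line and index 1 is out of bounds. A stdlib-only sketch of the effect (the likely fix is to configure the reader's separator to match the writer's, e.g. setFieldSeparator(';'), if your FastCSV version supports it):

```java
public class SeparatorMismatch {
    public static void main(String[] args) {
        String line = "PersonID;CivilRegistrationNumber"; // written with ';'
        String[] byComma = line.split(",");      // reader default
        String[] bySemicolon = line.split(";");  // matching separator
        System.out.println(byComma.length);      // 1 — whole line ends up in one field
        System.out.println(bySemicolon.length);  // 2
    }
}
```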

CsvWriter "No such a method"

I think that I found a bug inside the class CsvWriter:

"Caused by: java.lang.NoSuchMethodError: No virtual method toPath()Ljava/nio/file/Path; in class Ljava/io/File; or its super classes (declaration of 'java.io.File' appears in /system/framework/core-oj.jar)
                                                                             at de.siegmar.fastcsv.writer.CsvWriter.append(CsvWriter.java:148)"

It doesn't work on Android 7.0, but it works on Android 8.0 on the same phone.
I already tried forcing Android Studio to use Java VERSION_1_8 and VERSION_1_7, but the result is the same. (File.toPath() is a java.nio.file method that only became available on Android in API level 26, i.e. Android 8.0; the Java language level setting does not change that.)

Use client passed Writer without wrapping it

Is your feature request related to a problem? Please describe.

Basically, CsvWriter violates the Single Responsibility Principle by tackling both high-level CSV formatting and low-level I/O buffering, which creates problems for non-trivial uses.
Some use cases rely on a previously obtained Writer that is used to write more data than just a single CSV file. For example, the same stream can contain multiple CSVs, or some other data at the end.
The user cannot simply call writer.flush(), because the writer is internally wrapped with CachingWriter, so the last bytes may never get written until csvWriter.close() is issued, which closes the writer as well.

Once a Writer is passed, its state is unknown until a call to CsvWriter.close() is made!

Describe the solution you'd like

Allow constructing a CsvWriter that does not touch the passed Writer in any way. Let CsvWriter deal with building the CSV data structure, and let the passed Writer deal with the low-level stuff.

Describe alternatives you've considered

No alternative possible except a source code modification.

RFC 4180 compliance
Would this feature comply with RFC 4180?

Yes, but the important part is that the code will be more correct, because it won't mess with external code passed to it.
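Until such an option exists, a hedged workaround is to hand the CSV layer a wrapper that swallows close(), so the underlying stream stays usable afterwards. A stdlib-only sketch (not part of FastCSV):

```java
import java.io.FilterWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

public class NonClosingWriterDemo {
    /** Swallows close() so a wrapping CSV writer cannot close the underlying stream. */
    static class NonClosingWriter extends FilterWriter {
        NonClosingWriter(Writer out) { super(out); }

        @Override
        public void close() throws IOException {
            flush(); // push buffered data through, but keep the stream open
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter underlying = new StringWriter();
        Writer w = new NonClosingWriter(underlying);
        w.write("a,b,c\n");
        w.close(); // the CSV layer may call this; the underlying writer survives
        underlying.write("more data after the CSV section\n");
        System.out.print(underlying);
    }
}
```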

Steps to use ?

Can anyone please tell me the steps on how to use this in my android application?

Thank You!

GC limit overhead exceeded because of temporary objects

Hi,

I am trying to read a CSV file containing a bit more than 2 million rows, then apply a simple mapping, and finally insert the result into a database. However, I am getting a "GC overhead limit exceeded" error, as reading creates a lot of temporary objects.

I read the other issue regarding temporary objects, but as far as I understood it concerns writing a CSV file, whereas I get this error while reading one.
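A common cause of this error is materializing all rows at once instead of streaming them. The streaming shape looks like the stdlib-only sketch below; FastCSV 1.x similarly offers a row-at-a-time parse API, assuming your version provides one:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class StreamingRead {
    public static void main(String[] args) throws IOException {
        Path csv = Files.createTempFile("rows", ".csv");
        Files.write(csv, "a,1\nb,2\nc,3\n".getBytes(StandardCharsets.UTF_8));

        long count = 0;
        // One line lives in memory at a time; rows can be mapped and batched
        // to the database inside the loop instead of being collected first.
        try (BufferedReader r = Files.newBufferedReader(csv, StandardCharsets.UTF_8)) {
            while (r.readLine() != null) {
                count++;
            }
        }
        System.out.println(count); // 3
    }
}
```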

Per-field quoting

I need a way to enforce per-field quoting in order to generate CSV for a PostgreSQL COPY statement, because a quoted empty string is treated as NULL while a totally empty field is treated as an empty string:

1,,3 ---> ""
1,"",3 ---> NULL

The current CsvAppender API doesn't support such behavior. Possible solutions:

  • add an additional flag to appendField
  • add a new method appendDelimitedField
  • make alwaysDelimitText field mutable so that the consumer can turn it off/on before appending the specific field
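A stdlib-only sketch of the distinction such an API would need to express, with quoting decided per field rather than globally (the helper is illustrative, not FastCSV API):

```java
public class PerFieldQuoting {
    /** Renders a single field, quoting on demand; null becomes an unquoted empty field. */
    static String field(String value, boolean quote) {
        if (value == null) {
            return "";
        }
        String escaped = value.replace("\"", "\"\"");
        return quote ? "\"" + escaped + "\"" : escaped;
    }

    public static void main(String[] args) {
        // an empty string, quoted -> "" ; a null, never quoted -> nothing
        System.out.println("1," + field("", true) + ",3");    // 1,"",3
        System.out.println("1," + field(null, false) + ",3"); // 1,,3
    }
}
```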

java.lang.ArrayIndexOutOfBoundsException: 1

Hello, I have a problem using CsvReader:
Csv file:
Field1 Field2 Field3 Field4 Field5
1Value1 1Value2 1Value3 1Value4 1Value5
2Value1 2Value2 2Value3 2Value4 2Value5
3Value1 3Value2 3Value3 3Value4 3Value5
Simple program:

CsvReader reader = new CsvReader();
reader.setTextDelimiter('\t'); // note: this sets the quote character, not the field separator
reader.setContainsHeader(true);
CsvContainer csv = reader.read(Paths.get("test.csv"), StandardCharsets.UTF_8);
for (CsvRow row : csv.getRows()) {
     System.out.println(row);
     System.out.println(row.getField(0));
     System.out.println(row.getField(1));
}

Output:
CsvRow{originalLineNumber=2, fields={Field1 Field2 Field3 Field4 Field5=1Value1 1Value2 1Value3 1Value4 1Value5}}
1Value1 1Value2 1Value3 1Value4 1Value5
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
at java.util.Arrays$ArrayList.get(Arrays.java:3841)
at de.siegmar.fastcsv.reader.CsvRow.getField(CsvRow.java:66)
at org.mycompany.fastcsvtest.Main.readTest(Main.java:92)
at org.mycompany.fastcsvtest.Main.main(Main.java:28)

Any thoughts?
Thanks.

More <Storage Access Framework>-friendly CsvWriter builder

Hello! First of all, I'd like to both thank you and congratulate you: it's a great library and very straightforward to use.
The only not-so-straightforward part of using this to export data to a CSV on my Android application was a combination of:

  • It's encouraged to use the path version of CsvWriter.builder().build() over the Writer one
  • The standard Storage Access Framework's Documents Provider file creation mechanism returns a Uri whose path seems to be of little use, as it's apparently not linked to the file you create (always throws NoSuchFileException). Instead, the documentation promotes use of FileDescriptors and whatnot.

This is my first Android app, so maybe I'm making some rookie mistakes, but I ended up doing this, based on this method:

    // <Activity code>
    // called from a Button click, prompts the 'save As' window for the user to pick a file name and location
    private void getCSVExportingFile() {
        Intent intent = new Intent(Intent.ACTION_CREATE_DOCUMENT);
        intent.addCategory(Intent.CATEGORY_OPENABLE);
        intent.setType("text/csv");
        intent.putExtra(Intent.EXTRA_TITLE, CSVExporter.buildName());

        startActivityForResult(intent, Constants.CSV_CREATE_FILE_INTENT_CODE);
    }

   // this is called after the 'save As' window exits, with data == null if it fails or != if it succeeds.
   @Override
    protected void onActivityResult(int requestCode, int resultCode, @Nullable Intent data) {
        super.onActivityResult(requestCode, resultCode, data);
        if (requestCode == Constants.CSV_CREATE_FILE_INTENT_CODE && resultCode == RESULT_OK) {
            Uri uri = null;
            if (data != null) {
                uri = data.getData();
                try {
                    CSVExporter.exportToCSV(this.getContentResolver().openOutputStream(uri));   // <- this (1/2)
                } catch (FileNotFoundException e) {
                    e.printStackTrace();
                }
            }
        }
    }

   // <I called this CSVExporter, but could be anything>
   //  the actual dumping function
    public static void exportToCSV(OutputStream openOutputStream) {
        new Thread(() -> {
            try (CsvWriter csv = CsvWriter.builder().build(new OutputStreamWriter(openOutputStream))) { // <- this (2/2)
                writeHeader(csv);
                List<SomeRecord> data = someClass.getRecords();
                for(SomeRecord item : data){
                    writeRow(csv,item);
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }).start();
    }

So what bothers me is that, IMO, the smoothest you can do is:

  1. get the activity, then the content resolver, and then an OutputStream using the Uri
  2. use that OutputStream to build an OutputStreamWriter
  3. still use the less-preferred (according to the documentation) Writer version of CsvWriter.builder().build()

Looking at the source code, the path version doesn't seem to do things much differently from what I did. While this may be unnecessary, I think there's a quality-of-life improvement in adding a builder that takes either the Uri or the OutputStream as a parameter. Also, with little change, the above code could serve as a more realistic version of the CsvWriter file() example, making this library super plug-and-play. Going from the current example to the final version of this code was a big leap, IMO.

Thanks again!

Question about parsing random strings

I want to use FastCSV to implement a large-file CSV editor, using JavaFX.
I plan to parse only individual rows, at render time. This means I need the CSV parser to parse one String line at a time.
Is it possible to have something like this?

CSVParser parser = new CSVParser();
parser.parse(line98);
parser.reset();
parser.parse(line99);

I mean I parse a line, then I may parse another, and so on. For optimal memory usage, I would instantiate the CSVParser only once.
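FastCSV's own parser classes may not expose a reset method, but the reset-and-reuse pattern itself is straightforward. A stdlib-only sketch, with a trivial comma splitter standing in for the real parser:

```java
import java.util.ArrayList;
import java.util.List;

public class ReusableLineParser {
    private final List<String> fields = new ArrayList<>();

    /** Splits on ',' only — a stand-in for a real CSV parser; "reset" is fields.clear(). */
    public List<String> parse(String line) {
        fields.clear(); // reuse the same backing list across calls
        int start = 0;
        for (int i = 0; i <= line.length(); i++) {
            if (i == line.length() || line.charAt(i) == ',') {
                fields.add(line.substring(start, i));
                start = i + 1;
            }
        }
        return fields; // note: the returned list is overwritten by the next parse() call
    }

    public static void main(String[] args) {
        ReusableLineParser parser = new ReusableLineParser();
        System.out.println(parser.parse("a,b,c").size()); // 3
        System.out.println(parser.parse("x,y").get(1));   // y
    }
}
```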

To flush, or not to flush?

I noticed that FastBufferedWriter has a flush method which is never called:

// https://github.com/osiegmar/FastCSV/blob/master/src/main/java/de/siegmar/fastcsv/writer/FastBufferedWriter.java#L70-L73
@Override
public void flush() throws IOException {
    flushBuffer();
    out.flush();
}

FastBufferedWriter uses flushBuffer:

@Override
public void write(final char[] cbuf, final int off, final int len) throws IOException {
    if (pos + len >= buf.length) {
        flushBuffer();
    }

    if (len >= buf.length) {
        out.write(cbuf, off, len);
    } else {
        System.arraycopy(cbuf, off, buf, pos, len);
        pos += len;
    }
}

private void flushBuffer() throws IOException {
    out.write(buf, 0, pos);
    pos = 0;
}

Should FastBufferedWriter use flush instead of flushBuffer? Or what is the purpose of flush, and when should it be used? The README does not provide any information about it.
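For intuition, here is a minimal stdlib-only model of the distinction (a sketch, not FastCSV's actual code): flushBuffer() only drains the internal char buffer into the underlying Writer, while flush() additionally flushes that underlying Writer, so external callers can force data all the way down without closing:

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

public class MiniBufferedWriter extends Writer {
    private final Writer out;
    private final char[] buf = new char[8];
    private int pos;

    public MiniBufferedWriter(Writer out) { this.out = out; }

    @Override
    public void write(char[] cbuf, int off, int len) throws IOException {
        if (pos + len >= buf.length) {
            flushBuffer(); // drain our buffer; the underlying writer may still buffer
        }
        if (len >= buf.length) {
            out.write(cbuf, off, len);
        } else {
            System.arraycopy(cbuf, off, buf, pos, len);
            pos += len;
        }
    }

    private void flushBuffer() throws IOException {
        out.write(buf, 0, pos); // only drains into the next layer
        pos = 0;
    }

    @Override
    public void flush() throws IOException {
        flushBuffer();
        out.flush(); // pushes through the whole chain
    }

    @Override
    public void close() throws IOException {
        flush();
        out.close();
    }

    public static void main(String[] args) throws IOException {
        StringWriter sink = new StringWriter();
        MiniBufferedWriter w = new MiniBufferedWriter(sink);
        w.write("abc".toCharArray(), 0, 3);
        System.out.println(sink.toString().isEmpty()); // true — still buffered
        w.flush();
        System.out.println(sink);                      // abc
    }
}
```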

Could NamedCsvReader and CsvReader share a common interface?

I was wondering whether it might be possible for NamedCsvReader and CsvReader to share a common interface, or perhaps for NamedCsvReader to extend CsvReader? (And similarly for NamedCsvRow and CsvRow.)

Here's the use case I have in my head...

I want to use FastCSV to process a user-specified CSV, which may or may not have headers (but the user will tell me whether it does or not). At the moment, I have to write essentially the same code twice because NamedCsvReader and CsvReader are completely separate classes, as are NamedCsvRow and CsvRow. What would make my code much neater and easier to manage would be something along the following lines...

ICsvReader csvReader;
if(hasHeaders){
  csvReader = NamedCsvReader.builder().build(path, charset);
}else{
  csvReader = CsvReader.builder().build(path, charset);
}

csvReader.stream().forEach(row -> {
  //Do something here with each row, casting to NamedCsvRow where necessary and appropriate
});

Perhaps there's a reason why it hasn't been written this way (or perhaps I've missed an existing way of doing this), but perhaps something that could be considered in a future release?
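In the meantime, a user-side adapter can paper over the split. A minimal stdlib-only sketch, where the RowSource interface and its shape are hypothetical (not FastCSV API) — each branch would wrap the corresponding reader in real code:

```java
import java.util.List;
import java.util.stream.Stream;

public class RowSourceDemo {
    /** Hypothetical common shape both readers could be adapted to. */
    interface RowSource {
        Stream<List<String>> rows();
    }

    public static void main(String[] args) {
        boolean hasHeaders = false;
        // In real code, wrap NamedCsvReader / CsvReader respectively behind this interface.
        RowSource source = hasHeaders
                ? () -> Stream.of(List.of("h1", "h2"))
                : () -> Stream.of(List.of("a", "b"));
        System.out.println(source.rows().count()); // 1
    }
}
```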

Option to set null string?

By default, a null string is written as "null". There should be a setter to override the default null string.
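Until such a setter exists, one workaround is to map nulls to a chosen token before handing rows to the writer. A stdlib-only sketch (the helper name is illustrative):

```java
import java.util.Arrays;

public class NullToken {
    /** Replaces nulls with a caller-chosen token, e.g. "" or "\\N". */
    static String[] withNullToken(String[] row, String token) {
        return Arrays.stream(row)
                .map(v -> v == null ? token : v)
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] row = withNullToken(new String[]{"a", null, "c"}, "");
        System.out.println(String.join(",", row)); // a,,c
    }
}
```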

Don't force the use of FastBufferedWriter

Many Writer implementations are already buffered and fast enough. Also, many use cases start with an already existing BufferedWriter or similar, so adding an extra layer only introduces another temporary buffer copy.
For example, you may have code like this working for different uses cases:

GZIPOutputStream zout = new GZIPOutputStream(
        new CipherOutputStream(new FileOutputStream(file), c));

return new BufferedWriter(new OutputStreamWriter(zout, "utf-8"));

Assuming you can't change the code above or you don't want to be forced to do this:

if (usingFastCsv)
    return new OutputStreamWriter(zout, "utf-8");
else
    return new BufferedWriter(new OutputStreamWriter(zout, "utf-8"));

Can you add some way to construct an appender without wrapping the writer?
Or I will try a pull request...

RowReader.readLine() not reseting copyLen after use in CR and LF logic

The copyLen variable is not reset back to zero after being used in the CR and LF logic.

 } else if (c == CR) {
     if (copyLen > 0) {
         localCurrentField.append(localBuf, localCopyStart, copyLen);
	 copyLen = 0; // FIXME missing this line <=======
     }
     localLine.addField(localCurrentField.toStringAndReset());
     localPrevChar = c;
     localCopyStart = localBufPos;
     break;
 } else if (c == LF) {
     if (localPrevChar != CR) {
         if (copyLen > 0) {
             localCurrentField.append(localBuf, localCopyStart, copyLen);
	     copyLen = 0; // FIXME missing this here too! <========
         }
         localLine.addField(localCurrentField.toStringAndReset());
         localPrevChar = c;
         localCopyStart = localBufPos;
         break;
     }
     localCopyStart = localBufPos;
 } else {

CSV appender for writing big files

I want to create a big csv file using the CSV Appender. I'm using this code:

for (int i = 0; i < writeBuffer.size(); i++) {
    String[] array = writeBuffer.get(i).toArray(new String[writeBuffer.get(i).size()]);
    csvAppender.appendLine(array);
}

writeBuffer being a List<List<String>>. This buffer can have more than 500 lines.
When I finish processing, the resulting file only has 148 lines, and the last one is incomplete.

I also tried flushing after 100 lines, but then the following lines are not written.

Maybe I am using the library in an incorrect way?

Thanks in advance.
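A truncated file whose last line is cut off is the classic signature of buffered output that was never flushed: the appender buffers data, so it must be flushed or closed (e.g. via try-with-resources) after the last line is appended. A stdlib-only illustration of the effect:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;

public class UnflushedBuffer {
    public static void main(String[] args) throws IOException {
        StringWriter sink = new StringWriter();
        BufferedWriter w = new BufferedWriter(sink);
        w.write("row1\nrow2\nrow3\n");
        // Nothing has reached the sink yet — the data sits in the buffer.
        System.out.println(sink.toString().isEmpty());          // true
        w.close(); // close() (or flush()) drains the buffer
        System.out.println(sink.toString().split("\n").length); // 3
    }
}
```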
