parsecsv / parsecsv-for-php



CSV data parser for PHP.

License: MIT License

Makefile 0.67% PHP 99.33%

parsecsv-for-php's People


christiangrech morrislaptop danielcristian annanh mael221090 mwmalinowski norcoen askie kowach wolf777 lodev09 helpse foued611 vergilji navinkumarsharma bfredd sohag07hasan eddy1982 ryanhightower walkeraguilar hanicker ronnevinkx eltictacdicta thais-sa uurtech maitret zhuomingliang lmon semvdwal mrfrogcoder jvlync calinsargan thominj codingeass yohanlebret icyz zythos waaron repat ck-developer marcelovani xeleniumz zodchii jarvine2 2c-gstoqnov breyndotechse marquisknox sbsangpi ozee31 oguzsaka srinathweb mortl yehchge soberanes luizventurote webds alex-vlasov karthikeyansam frozenmosaic bhilleli rafasashi dharin-shah vasia123 wazelin naveedmetlo julioamorim zulhilmixrahman pippinsplugins slaveykov helalbookmark gpauliniko ludovicf01 php-tool bigboi900 kevinjo monicakaggudas diksha694 isnusun bayurepo optimum7com jertippets qa1 methnen piskvor webgensk dragankovacevic makiplanka jestinas ddunod willempaling wir erikhauters kjschabra westguard michaelsharman devthue tofandel atouhou pilr saidrajafallah

parsecsv-for-php's Issues

Coding Style

Right now the coding style of parseCSV is kind of messy. Do we want to update to a standard such as PSR? I personally do not like the PSR styles, as in my opinion they are counterproductive for commercial development. However, parseCSV isn't commercial, so I am not totally against it. I just wanted to see what others think about updating the coding style.

Timezone Error

I'm getting a timezone error instead of data in Excel: Warning: date(): It is not safe to rely on the system's timezone settings. You are *required* to use the date.timezone setting or the date_default_timezone_set() function. In case you used any of those methods an.

Fixed by adding date_default_timezone_set("Europe/London") in the output method.

Indentation, hard-tabs vs soft-tabs?

I'm not sure what the norm is nowadays, but having looked at a bunch of popular packages on packagist.org, soft tabs (4 spaces) seem to be popular.

So, should we switch to 4-space soft tabs, another width, or keep hard tabs?

[enhancement] Line-Based API?

I am parsing a 150MB CSV, and I quickly ran into fgetcsv's shortcomings. One advantage it has, though, is that the number of lines is pretty much irrelevant.

In the examples and in the code, however, I can't really find an equivalent. It seems that the line-by-line logic is bundled up in parse_string, and the file is always read as a whole in _rfile.

Is there a way to just pass line data to parse_string as a workaround?
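Until there is a line-based API, one workaround is to read the file yourself and hand the parser one logical record at a time, so memory use no longer depends on file size. A minimal sketch, assuming only that quoted fields use `"` and escape it by doubling; the function name csv_records is hypothetical and it does not call parseCSV itself. Splitting on raw lines alone would break quoted fields that contain newlines, so it keeps joining lines until the number of enclosure characters is even.

```php
<?php
/**
 * Hypothetical helper (not part of parseCSV): yields one logical CSV record
 * at a time. Physical lines are joined while the buffer holds an odd number
 * of enclosure characters, i.e. while a quoted field is still open.
 */
function csv_records(string $path, string $enclosure = '"'): Generator
{
    $fp = fopen($path, 'rb');
    if ($fp === false) {
        throw new RuntimeException("Cannot open $path");
    }
    $buffer = '';
    try {
        while (($line = fgets($fp)) !== false) {
            $buffer .= $line;
            // An escaped enclosure ("") adds two characters, so it never
            // changes the parity; an odd count means a field is still open.
            if (substr_count($buffer, $enclosure) % 2 === 0) {
                yield rtrim($buffer, "\r\n");
                $buffer = '';
            }
        }
        if ($buffer !== '') {
            yield rtrim($buffer, "\r\n"); // unbalanced tail, returned as-is
        }
    } finally {
        fclose($fp);
    }
}
```

Each yielded string could then be handed to parse_string() or str_getcsv(); whether parse_string() is meant to accept a single record like that is not confirmed here.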

$csv->output in while loop and output header

Hello,

I think this is not an issue with the script itself. Anyway, I'm using parseCSV to build a multi-line CSV from a DB query. On every iteration of a while loop I call "$csv->output($filename, $array, $temparray, ';');". Although I do get a multi-line CSV file as expected, I also get "Cannot modify header information - headers already sent by" errors from the output function, because the headers are sent to the browser on every iteration. Any suggestions on how to get around this? Thanks for helping me.

Kind regards
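HTTP headers can only be sent before any body output, so calling output() once per row cannot work; the usual pattern is to collect every row inside the loop and produce the CSV once afterwards. A minimal sketch of that pattern: fputcsv() stands in for parseCSV so the snippet runs on its own, and the helper name rows_to_csv is hypothetical.

```php
<?php
// Sketch: collect rows inside the loop, serialise them once afterwards.
// fputcsv() stands in for parseCSV here so the example is self-contained.
function rows_to_csv(array $fields, array $rows, string $delimiter = ';'): string
{
    $fp = fopen('php://temp', 'r+');
    fputcsv($fp, $fields, $delimiter, '"', '');
    foreach ($rows as $row) {
        fputcsv($fp, $row, $delimiter, '"', '');
    }
    rewind($fp);
    $csv = stream_get_contents($fp);
    fclose($fp);
    return $csv;
}

$rows = [];
// Stand-in for something like: while ($row = $result->fetch_assoc()) { ... }
foreach ([['1', 'Alice'], ['2', 'Bob']] as $row) {
    $rows[] = $row; // collect only; nothing is output inside the loop
}
$csv = rows_to_csv(['id', 'name'], $rows);
// Headers go out exactly once, after the loop, e.g.:
// header('Content-Disposition: attachment; filename="export.csv"');
// echo $csv;
```

With parseCSV, the equivalent would be a single $csv->output($filename, $allRows, $fields, ';') call after the loop ends.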

encoding issue when parsing Chinese/Arabic

Hi there,
I have been using this library for the last 2 years; it is very easy to use and control.
Thanks for such a useful tool.
Currently I am facing an issue in parseCSV: I need to parse Arabic and Chinese data from a sheet,
and the library shows me only ?????? like this.
Of course, I have tried both methods, auto and encoding, but the data is still not shown the way I want. Any urgent suggestion or help would be highly appreciated.
Thanks,

On output order

In unparse()

    // create heading
    if ($this->heading && !$append && !empty($fields)) {
        foreach ($fields as $key => $value) {
            $entry[] = $this->_enclose_value($value, $delimiter);
        }

        $string .= implode($delimiter, $entry).$this->linefeed;
        $entry   = array();
    }
    // create data
    foreach ($data as $key => $row) {
        foreach ($row as $field => $value) {
            $entry[] = $this->_enclose_value($value, $delimiter);
        }

        $string .= implode($delimiter, $entry).$this->linefeed;
        $entry   = array();
    }

If $fields exists, it seems the data output should be based on $fields too; otherwise, how do we guarantee that the order of values in each data row matches $fields?

If $fields is given to select only some fields, not all, this unparse() seems broken.

Can someone first check whether the above is an issue? I could provide some kind of fix.
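One way the data loop could honour $fields is to look each value up by field name instead of iterating the row as-is, which fixes both the ordering and the subset case. A minimal sketch, not the library's code: the helper name ordered_row is hypothetical, and filling a missing key with an empty string is an assumption.

```php
<?php
// Sketch: return a row's values in the order of $fields, restricted to those
// fields. Filling a missing key with '' is an assumption, not library behaviour.
function ordered_row(array $row, array $fields): array
{
    $ordered = [];
    foreach ($fields as $field) {
        $ordered[] = array_key_exists($field, $row) ? $row[$field] : '';
    }
    return $ordered;
}
```

In unparse(), the inner data loop would then iterate ordered_row($row, $fields) instead of $row whenever $fields is non-empty.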

[feature request] ability to change Content-Type header in order to create TSV file.

Content-Type is hardcoded to application/csv.
The only differences between the TSV and CSV formats are the delimiter (which can be changed) and the MIME type (which is hardcoded). For TSV it's not application/csv but text/tab-separated-values.
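A minimal sketch of deriving the type from the delimiter rather than hard-coding it. The helper name content_type_for is hypothetical; text/csv is the media type registered by RFC 4180 (the library currently sends application/csv), and switching to it is a suggestion, not existing behaviour.

```php
<?php
// Hypothetical helper: pick the MIME type from the delimiter instead of
// hard-coding it. Tab-delimited output is served as TSV.
function content_type_for(string $delimiter): string
{
    return $delimiter === "\t" ? 'text/tab-separated-values' : 'text/csv';
}

// In an output method this would replace the hard-coded header, e.g.:
// header('Content-Type: ' . content_type_for($delimiter));
```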

Unit Tests

As I originally created this project back in the dark ages, before I had any knowledge of unit testing and other sane things, parseCSV currently lacks unit tests.

This issue is for any discussions related to creating proper unit tests.

Current State of Tests

Autoloader not working

Hello

I'm using composer to include parsecsv-for-php.
I added "parsecsv/php-parsecsv": "0.4.5" to my composer.json file.
But the class cannot be loaded with the PSR autoloader.

Error: Class 'parseCSV' not found

PSR says: The fully-qualified namespace and class is suffixed with .php when loading from the file system.

I think the file must be renamed from "parsecsv.lib.php" to "parseCSV.php".

Even better would be to comply with PSR-1 ("Class names MUST be declared in StudlyCaps"),
e.g.
Classname = ParseCSV
Filename = ParseCSV.php

Append Mode

One of the more recent commits killed append mode when writing a file.

Project TODO

In no particular order, my thoughts on getting this library up to date for some well-deserved showtime!

Code Cleanup and new style documentation (done: 4b28088)
Composer Support (done: 5a70a7b)
Proper access for protected methods and properties
PHP 5.3 standardization
Unit TEST(S)!!! 😡 😭 😌 (#4)
Contributors section in Readme (done: f59af53)
Update readme (grammar and maintenance notice) (done: c915579)
Update Changelog

Force UTF-8 BOM

This is not my request!!!!

This request is from:
https://github.com/asessa/php-parsecsv/commit/1d6864c6a41746075dc24fac02774d4d535cf22c

I love the idea and think we should work to integrate this.
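The linked commit is not reproduced here; as an illustration of the idea only, forcing a BOM amounts to prefixing the three UTF-8 BOM bytes (EF BB BF) to the output, which Excel commonly relies on to recognise a CSV file as UTF-8. The helper name with_utf8_bom is hypothetical.

```php
<?php
// Sketch: prefix the UTF-8 byte order mark unless it is already present,
// so calling it twice does not produce a doubled BOM.
function with_utf8_bom(string $data): string
{
    $bom = "\xEF\xBB\xBF";
    return strncmp($data, $bom, 3) === 0 ? $data : $bom . $data;
}
```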

Splitting rows unexpectedly.

I've been working with this and found that whenever there is a zero in the line, it breaks the sequence.
Here:
I have this string in the file:
http://www.amazon.com/ROX-Ice-Ball-Maker-Original/dp/B00MX59NMQ/ref=sr_1_1?ie=UTF8&qid=1435604374&sr=8-1&keywords=rox+ice+molds

I expected this output:
[link] => http://www.amazon.com/ROX-Ice-Ball-Maker-Original/dp/B00MX59NMQ/ref=sr_1_1?ie=UTF8&qid=1435604374&sr=8-1&keywords=rox+ice+molds

but unfortunately I'm getting this one:

[0] => Array
(
[link] => http://www.amazon.com/ROX-Ice-Ball-Maker-Original/dp/B
[1] =>
[2] => MX59NMQ/ref=sr_1_1?ie=UTF8&qid=14356
[3] => 4374&sr=8-1&keywords=rox+ice+molds
)

Fix Documentation

This is not my request!!!

This is a request from:
ChristianGrech@bdab0b5

Will work on solving this shortly.

Output CSV is not enclosed when the source is

When reading an existing CSV file that has all the cells (values) enclosed and then outputting it, the downloaded file doesn't have any values enclosed. This again is an issue with the _enclose_value method.

CSV adding extra line

Hi, thanks for a wonderful CSV library. I only have one problem with it: when I edit data and use $csv->save(), an extra line is added after each row. Please see the screenshot: https://monosnap.com/image/snEP0sXtTjmyIvefVhEXkQpVfSMxGV. Thanks in advance.

Case insensitive headers

Is there any way to make the headers case-insensitive, e.g. force the lib to make all headers lower- or upper-case?

I am dealing with CSVs from multiple users, some of whom use caps and some of whom do not.
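Until the library offers such an option, the keys can be normalised after parsing with PHP's built-in array_change_key_case(). A minimal sketch; $rows stands in for the parsed $csv->data array, and the sample row is made up.

```php
<?php
// Sketch: lower-case every header key after parsing, so 'Email', 'EMAIL'
// and 'email' all become 'email'. $rows stands in for $csv->data.
$rows = [
    ['Name' => 'Ann', 'EMAIL' => 'ann@example.com'],
];
$rows = array_map(
    fn(array $row): array => array_change_key_case($row, CASE_LOWER),
    $rows
);
```

Two caveats: headers that differ only by case collapse into one key (the later value wins), and the separate title list would need the same treatment, e.g. array_map('strtolower', $csv->titles).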

output function un-parses the data printed on screen as well

    $result = array(array('Name' => 'Parser', 'Age' => '30'));
    print '<pre>';
    print_r($result);
    print '</pre>';
    $csv = new parseCSV();
    //$csv->save('list.csv', $result);
    $csv->output('list.csv', $result, null, ',');

Try the above code. It creates a file that contains the results printed on screen along with the array data.

Could I change a column's id when I parse the CSV?

During parsing, I would like to change the id of the column with id=4, for example. Is that possible?
Thanks.

double line endings

Steps to reproduce

$csv = new parseCSV();
$csv->parse('someFile.csv');
$csv->linefeed = "\r\n";
$csv->save('otherFile.csv');

Expected behaviour

otherFile.csv has \r\n Line Endings

Actual behaviour

otherFile.csv has \r\r\n Line Endings

Server configuration

Operating system: Win10

PHP version: 7.0.1

I could fix the problem by changing the write mode in the save function from

    $mode = ($append) ? 'at' : 'wt';

to

    $mode = ($append) ? 'ab' : 'wb';
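Background on why the fix works: PHP's 't' mode flag is a Windows-only text-mode translation that turns "\n" into "\r\n", so a linefeed that is already "\r\n" ends up as "\r\r\n", while 'b' writes bytes unchanged. A minimal sketch of a save routine built that way; write_csv_file is a hypothetical name, not parseCSV's _wfile, and it also checks the flock() result instead of ignoring it.

```php
<?php
// Sketch: write CSV data verbatim. 'b' (binary) performs no newline
// translation on any OS, so "\r\n" stays "\r\n".
function write_csv_file(string $path, string $data, bool $append = false): bool
{
    $fp = fopen($path, $append ? 'ab' : 'wb');
    if ($fp === false) {
        return false;
    }
    // Report a failed lock or short write instead of silently continuing.
    $ok = flock($fp, LOCK_EX) && fwrite($fp, $data) === strlen($data);
    flock($fp, LOCK_UN);
    fclose($fp);
    return $ok;
}
```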

File output header

The current file output code is:

    if ($filename !== null) {
        header('Content-type: application/csv');
        header('Content-Disposition: attachment; filename="' . $filename . '"');
        echo $data;
    }

It should be something like:

    if ($filename !== null) {
        header('Content-type: application/csv');
        header('Content-Length: ' . mb_strlen($data, '8bit'));
        if (isset($_SERVER['HTTP_USER_AGENT']) && strpos($_SERVER['HTTP_USER_AGENT'], 'MSIE') !== false) {
            // needed for IE8 over HTTPS
            header('Expires: 0');
            header('Pragma: cache');
            header('Cache-Control: private');
            header('Content-Disposition: attachment; filename=' . urlencode($filename) . '; modification-date="' . date('r') . '";');
        } else {
            header('Content-Disposition: attachment; filename="' . $filename . '"; modification-date="' . date('r') . '";');
        }
        echo $data;
    }

Still far from perfect, but I hope it's a bit of an improvement :)
I don't use git/GitHub, so sorry I have to post this as a comment.

Might also consider:

    header('Connection: Keep-Alive');
    header('Expires: 0');
    header('Cache-Control: must-revalidate, post-check=0, pre-check=0');
    header('Pragma: public');

Encoding changed only when parsing files

Hi,

It seems that the iconv() conversion takes place only when parsing files, not when parsing data from a string. That's a little inconsistent, and I have to add my own conversion even though the parsecsv library already has one.

keep_file_data construct parameter

While building unit tests for parseCSV, I came across an interesting issue when testing the construct method. The default value of keep_file_data is false (and one of my tests checks that default). However, I can't assert the CSV string that was passed to the constructor, because file_data is wiped by the time I can inspect the parseCSV object. Would you be open to adding keep_file_data as a constructor parameter?

PS. As a side note, I realize that my unit test shouldn't care what the file_data property contains as long as execution and results are correct. I know that, but I am trying to build very strict unit tests to prevent a repeat of my mistake from earlier today.

function _enclose_value

I just tested this class as a replacement for my own ragged parser and came across some data this parser seems to have problems with.

I have a text field in the database containing a semicolon, followed by a space, followed by \r\n, followed by more text. The parser does not enclose this value, so the following columns get shifted in LibreOffice Calc.

I think the root of this problem is the function

function _enclose_value ($value = null)

I'm not very experienced with regular expressions, so I'll need more time to figure it out.
Maybe you guys already have an idea?
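The body of _enclose_value() isn't shown here, so this is an independent sketch of RFC 4180-style quoting rather than a patch to it; the name enclose_value and its parameters are hypothetical. It encloses any value containing the delimiter, the enclosure character, CR or LF (RFC 4180's rule), plus, as an extra safety choice, values with leading or trailing whitespace. The $force flag would also cover the "always enclose" output requested in an earlier issue.

```php
<?php
// Sketch: RFC 4180-style quoting. Embedded enclosure characters are
// doubled; $force encloses every value regardless of content.
function enclose_value(string $value, string $delimiter = ',', string $enclosure = '"', bool $force = false): string
{
    $needsEnclosure = $force
        || strpbrk($value, $delimiter . $enclosure . "\r\n") !== false
        || trim($value) !== $value; // leading/trailing whitespace
    if (!$needsEnclosure) {
        return $value;
    }
    return $enclosure
        . str_replace($enclosure, $enclosure . $enclosure, $value)
        . $enclosure;
}
```

Plain string functions are used instead of a regular expression, which keeps the "which characters trigger quoting" rule easy to read and extend.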

modify csv titles

How to modify CSV titles:
I tried:
$csv->titles = array('fname','LastName','EmailAddress','paxContactNo','paxGenderID','paxAgeGroupID','BookingCode');

$csv->save();

before:
FirstName,LastName,EmailAddress,paxContactNo,paxGenderID,paxAgeGroupID,BookingCode

After: it changes the syntax of the file like this:
"fname""LastName""EmailAddress""paxContactNo""paxGenderID""paxAgeGroupID""BookingCode"

Is this okay, or do I have to change something?

PSR coding style

Hello, I like your CSV parser and I would like to contribute to this project.
What do you think about rewriting the code to PSR-2 style?

https://github.com/php-fig/fig-standards/blob/master/accepted/PSR-2-coding-style-guide.md

Detect BOM and strip it?

Hello!

Thank you for this library.

I have some CSV files generated with Microsoft Office, and they contain a BOM at the beginning of the file. It looks like your parser is not handling it correctly (the BOM sequence is added to the name of the first field).

I suggest detecting and removing the BOM before parsing the file. Right now I have to do this manually.

Cheers!
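The manual workaround can be as small as checking for the three UTF-8 BOM bytes (EF BB BF) and cutting them off before the data reaches the parser. A minimal sketch; the helper name strip_utf8_bom is hypothetical, and it only handles the UTF-8 BOM, not UTF-16 ones.

```php
<?php
// Sketch: drop a leading UTF-8 BOM before parsing so it does not end up
// in the first field name.
function strip_utf8_bom(string $data): string
{
    return strncmp($data, "\xEF\xBB\xBF", 3) === 0 ? substr($data, 3) : $data;
}
```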

Standards compliant vs wikipedia compliant

The relevant standard here is RFC 4180 https://www.ietf.org/rfc/rfc4180.txt

The wikipedia article talks about what the RFC says but it also discusses lots of ways to handle CSV files that have little to do with the specific CSV standard.

Is the goal of this project to be compliant with the wikipedia article (which is what the readme currently says) or the RFC?

Either way, thanks for the great code!

-FT

->auto() delimiter detection does not account for offset

If I have a file that looks like the following:

    Summary
    First,Last,Age
    John,Smith,10
    Billy,Bob,9
    Jane,Fine,14
    Jim,Stark,12

and then pass it to parseCSV with an offset:

$parseCsv = new \parseCsv($file, 2);

then parseCSV is unable to determine the delimiter:

$delimiter = $parseCsv->auto(); // returns false

The reason is that auto() does not account for the offset, so the first row is sent to _check_count, which sees that ',' does not appear on every line and immediately returns false.

I'm guessing someone will say that this is not a valid CSV file. And according to RFC 4180, it isn't.

Each line should contain the same number of fields throughout the file.

However, everyone knows that there are a lot of different CSV implementations, and it would be nice if delimiter detection took the offset into account.
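A minimal sketch of offset-aware detection, written independently of parseCSV's auto()/_check_count. The name detect_delimiter, the candidate list, and treating $offset as a count of physical lines to skip are all assumptions; the last may not match how parseCSV counts its offset. Limiting candidates to punctuation and whitespace also means digits such as '0' or '6' can never be chosen, as happens in the "Splitting rows unexpectedly" and "Single column, numerical CSV" reports.

```php
<?php
/**
 * Hypothetical detector: skip $offset lines, then return the first candidate
 * that occurs the same, non-zero number of times on every remaining
 * non-empty line. Quoted fields are not handled, for brevity.
 */
function detect_delimiter(string $text, int $offset = 0, array $candidates = [',', ';', "\t", '|']): ?string
{
    $lines = array_slice(preg_split('/\r\n|\r|\n/', $text), $offset);
    $lines = array_values(array_filter($lines, fn(string $l): bool => trim($l) !== ''));
    if ($lines === []) {
        return null;
    }
    foreach ($candidates as $candidate) {
        $expected = substr_count($lines[0], $candidate);
        if ($expected === 0) {
            continue;
        }
        foreach ($lines as $line) {
            if (substr_count($line, $candidate) !== $expected) {
                continue 2; // inconsistent count: try the next candidate
            }
        }
        return $candidate;
    }
    return null; // e.g. single-column data: caller falls back to a default
}
```

For the sample file above, skipping the single "Summary" line is enough for ',' to be found.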

The last line should be ignored if it's empty

Thanks for the great work on this project. It fits my needs perfectly except for one minor detail. According to section 2 of RFC 4180, "The last record in the file may or may not have an ending line break." But if I parse a file that ends with a newline character, the parser returns an array ending with a nearly empty record that corresponds to the empty line at the end of the file. The record contains a key for the first column, but no other data. The parser should ignore the empty row at the end of the file.
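Until the parser handles this itself, the input can be normalised before parsing. A minimal sketch; the helper name normalize_final_newline is hypothetical. It strips all trailing CR/LF characters and appends exactly one linefeed, so a final empty line cannot produce an empty record, and a last line without a newline (a problem reported in separate issues) still gets terminated.

```php
<?php
// Sketch: normalise the end of the input before parsing.
function normalize_final_newline(string $data, string $linefeed = "\n"): string
{
    $trimmed = rtrim($data, "\r\n");
    return $trimmed === '' ? '' : $trimmed . $linefeed;
}
```

One trade-off: several blank lines at the end of a file are all dropped, which matches the RFC's "may or may not have an ending line break" rule but discards intentionally empty trailing records.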

Use MS Excel's "sep=" to detect the delimiter

Microsoft Excel uses a different default delimiter depending on the current OS locale. For example, if the locale is set to US, it uses "," as the delimiter, and if it's Danish, it uses ";" by default. It works like that because many European languages use a comma as the decimal separator ("1,23" instead of "1.23" as in the US).
However, the delimiter can be specified in the file by putting "sep=," on the first line (in this case Excel uses a comma as the delimiter regardless of the OS locale). It would be great if this library did the same: detect the delimiter from this line, and then skip the first line of the file if it is used to specify the delimiter.
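A minimal sketch of that behaviour as a pre-parse step. The helper name split_sep_hint is hypothetical; it assumes the hint is exactly "sep=" plus one character on the very first line, and it returns both the delimiter (or null) and the remaining data with the hint line removed.

```php
<?php
// Sketch: read an Excel-style "sep=X" first line, returning
// [delimiter or null, data without the hint line].
function split_sep_hint(string $text): array
{
    if (preg_match('/\Asep=(.)(?:\r\n|\r|\n)/', $text, $m)) {
        return [$m[1], substr($text, strlen($m[0]))];
    }
    return [null, $text];
}
```

If a delimiter comes back, it could be assigned to $csv->delimiter and auto-detection skipped; otherwise detection would run as usual.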

Is it possible to use another less memory intensive structure?

When I try to load a large number of rows, RAM usage goes above the 512MB I allocated for PHP. It would be great if the data could be stored in a less memory-intensive structure than the standard PHP array, which is a memory hog.

flock causes the script to abort

The following line causes the script to abort (there is no error message) on OS X Yosemite 10.10.2 running XAMPP with PHP 5.5.14.

flock($fp, $lock);

It is in parsecsv.lib.php, inside the function _wfile.
If I comment out this line, everything works fine.

the type of result data is only string?

The parsed CSV result data is wrong!
Not all fields should be strings!
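CSV itself carries no type information, so every parsed value starts out as a string; typed values have to be produced after parsing. A minimal sketch; the helper name typed_value is hypothetical. It converts only values that survive the conversion: integers without leading zeros that fit PHP's int range become ints, plain decimal numbers become floats, and everything else (for example '007' or a 20-digit ID) stays a string.

```php
<?php
// Sketch: convert a parsed CSV string to int/float only when that is safe.
function typed_value(string $value)
{
    // FILTER_VALIDATE_INT rejects leading zeros and out-of-range values.
    $int = filter_var($value, FILTER_VALIDATE_INT);
    if ($int !== false) {
        return $int;
    }
    if (preg_match('/\A-?\d+\.\d+\z/', $value)) {
        return (float) $value;
    }
    return $value;
}
```

Applied per row, e.g. array_map('typed_value', $row).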

Last line not parsed

Hi,
When parsing a CSV string (not a file) coming from a textarea, the last line does not end with \r or \n, so it is not retrieved.
I think you should update your lib to append an end-of-line manually if the last character is not \n or \r.
Thanks

Output method always produces headers and outputs directly to browser

Sure, you can set $this->output_filename to NULL before calling the output method, but the doc block for the method should reflect that. As it is now, one would expect the method to return a string if the first parameter is set to NULL.

In my modified parseCSV.php file, I just commented out lines 459 through 461:

    /*if (empty($filename)) {
        $filename = $this->output_filename;
    }*/

Message: iconv(): Detected an incomplete multibyte character in input string

When I use

$this->csv->encoding('UTF-16', 'UTF-8');
$this->csv->parse($file_path);

it gives me this error.

What about using this in the library instead?

iconv(mb_detect_encoding($text, mb_detect_order(), true), "UTF-8", $text);

This will detect the incoming encoding and convert it to UTF-8.
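Note that mb_detect_encoding() returns false when strict detection finds no match, which iconv() would not accept as an encoding name, so the suggestion above needs a fallback. For UTF-16 specifically, the byte order mark is a more dependable signal; an iconv() error like this can appear when the byte order iconv() was told to expect does not match the data. A minimal sketch under those assumptions: the helper name utf16_to_utf8 is hypothetical, it requires the mbstring extension, and treating BOM-less input as little-endian is an assumption (a common choice for files from Windows tools).

```php
<?php
// Sketch: decode UTF-16 using its byte order mark, then return UTF-8.
function utf16_to_utf8(string $data): string
{
    $bom = substr($data, 0, 2);
    if ($bom === "\xFF\xFE") {
        $encoding = 'UTF-16LE';
        $data = substr($data, 2);
    } elseif ($bom === "\xFE\xFF") {
        $encoding = 'UTF-16BE';
        $data = substr($data, 2);
    } else {
        $encoding = 'UTF-16LE'; // assumption: no BOM means little-endian
    }
    return mb_convert_encoding($data, 'UTF-8', $encoding);
}
```

The UTF-8 result could then be parsed without calling encoding() at all.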

Creation of new columns works but headers not appended.

You can add new columns to the data and persist them with $csv->save().
However when this is done the keys added to the data set are not added to the headers.
I suppose it might be hard to ensure that columns are added uniformly across the data, so maybe this is not desired functionality, but it suited my purposes.
Is there a mechanism to at least manually edit the header line in this library? It would be useful.
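One manual approach is to rebuild the header list from every key used in any row before saving. A minimal sketch; the helper name collect_titles is hypothetical, and assigning its result to $csv->titles before $csv->save() assumes that save() takes its header row from that property, which is not confirmed here.

```php
<?php
// Sketch: collect every key used in any row, in first-seen order, so
// columns added to only some rows still get a header.
function collect_titles(array $rows): array
{
    $titles = [];
    foreach ($rows as $row) {
        foreach (array_keys($row) as $key) {
            if (!in_array($key, $titles, true)) {
                $titles[] = $key;
            }
        }
    }
    return $titles;
}
```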

For large CSVs, is there a function to remove the data from memory once it has been loaded?

conditions question

So I see the docblock for the conditions property states that the property should be a string, which is confirmed by the conditions.php example:
$csv->conditions = 'author does not contain dan brown';

However in the construct

if (count($conditions) > 0) {
    $this->conditions = $conditions;
}

I would venture to guess you meant to use strlen($conditions) > 0, but I just wanted to make sure.

Silent data loss: Last line is ignored if it does not end with newline

The CSV parser causes a silent loss of data if an input CSV file does not have a trailing newline. In that case, the last line is ignored without any warning.

Example program:

<?php
$test_csv = "a1;b1;c1;d1\na2;b2;c2;d2";
require 'parsecsv.lib.php';
$parser = new parseCSV();
$parser->encoding('UTF-8', 'UTF-8');
$parser->heading = false;
$parser->delimiter = ';';
$parser->parse($test_csv);
var_export($parser->data);

Expected result:

array (
  0 => 
  array (
    0 => 'a1',
    1 => 'b1',
    2 => 'c1',
    3 => 'd1',
  ),
  1 => 
  array (
    0 => 'a2',
    1 => 'b2',
    2 => 'c2',
    3 => 'd2',
  ),
)

Actual result:

array (
  0 => 
  array (
    0 => 'a1',
    1 => 'b1',
    2 => 'c1',
    3 => 'd1',
  ),
)

Project TODO

make parseCSV use the iterator interface
- let iterator function handle both modes, big array and CSVReaderRows
adapt file-handling from CSVReader in parseCSV
merge functions:
- don't let CSVReader use str_getcsv() (maybe parse_string() can be used as a drop-in replacement)
add option to constructor, which mode to use
- let constructor initialize both ways

how to convert 9.00E+18 numbers into simple numbers

Hey folks,
I have used your lib; it is really awesome and everything works fine.
I'm facing only one issue: there is a field containing a 13-digit integer value '12123123123123xx',
which is shown as 9.00E+18 in the sheet. When I read it and insert it into the DB, it is inserted as 9.00E+18. How can I format these values back into the real numbers before inserting them?
Thanks in advance.

Extra column added into parsed csv array

After upload, I'm moving the file from $_FILES['somefile']['tmp_name'] to the server's file folder.
The path is stored in the $newFile variable:

$csvFile=new parseCSV($newFile);

After this, printing in a foreach loop gives:

Array ( [Full Name] => Amaris Ever [Email] => [email protected] [Phone] => XXX-XXX-8738 [Mobile] => [Fax] => [Address] => [City] => [State] => TX [ZIP] => 75006 [Country] => US [10] => )

The last column does not exist in the CSV.

Not splitting Excel CSV files with quotes around lines

The CSV exported from Excel wraps each line in quotes. However, these are not detected, so the headers (like all other lines) are treated as a single array node.

Example of Google Contacts field headers:

From Excel

"Name,Given Name,Additional Name,Family Name,Yomi Name,Given Name Yomi,Additional Name Yomi,Family Name Yomi,Name Prefix,Name Suffix,Initials,Nickname,Short Name,Maiden Name,Birthday,Gender,Location,Billing Information,Directory....etc"

Returned from parsecsv-for-php

[titles] => Array
(
[0] => Name,Given Name,Additional Name,Family Name,Yomi Name,Given Name Yomi,Additional Name Yomi,Family Name Yomi,Name Prefix,Name Suffix,Initials,Nickname,Short Name,Maiden Name,Birthday,Gender,Location,Billing Information,Directory..... etc
)

Any ideas on why this is happening ?
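Each whole line is one quoted field, so a compliant parser correctly returns a single value; the delimiters are inside the enclosure. A workaround is to parse such a record a second time. A minimal sketch using PHP's str_getcsv() rather than parseCSV; the helper name parse_line is hypothetical, and the trade-off is that a genuine single-column file whose value contains the delimiter would be split as well.

```php
<?php
// Sketch: parse a line; if it came back as one field that still contains
// the delimiter (whole line wrapped in quotes), parse that field again.
function parse_line(string $line, string $delimiter = ','): array
{
    $fields = str_getcsv($line, $delimiter, '"', '');
    if (count($fields) === 1 && is_string($fields[0])
        && strpos($fields[0], $delimiter) !== false) {
        $fields = str_getcsv($fields[0], $delimiter, '"', '');
    }
    return $fields;
}
```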

Single column, numerical CSV

Parsing issue when the CSV is numeric and has only one column,
e.g.:

86545235689
34365587654
13469874576

Somehow, it separates each row based on the digit '6'.

I believe this was caused by having $enclosure = '"';
and requiring all CSV values to be enclosed...

Blind 500 error while parsing more than 13000 lines

php_info and csv files at https://www.dropbox.com/s/5tg3th1cfx1euee/_testCSV.zip

I get a blind 500 error from my host (1and1, which doesn't give me Apache error logs). This is not a catchable error, and not a memory error.

It seems to occur in parse_string, while parsing beyond line 13000.

function parse_string() and unicode files

I don't have any data at hand to test it, but I spent some time playing around with the parse_string() function for my chunk reader, and I think it might fail if the CSV file uses Unicode line terminators.

Maybe more problems could arise from iterating over $data[ ] as single characters without taking multibyte characters into account.

Any thoughts?
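One point that may narrow this down: for UTF-8 input, a byte-by-byte scan like $data[$i] is safe for ASCII delimiters, enclosures, and CR/LF, because every byte of a multibyte UTF-8 sequence is 0x80 or above and can never equal an ASCII byte. That guarantee does not hold for UTF-16, so such files would need converting to UTF-8 first, and a byte scan will not treat Unicode line terminators such as U+2028 as line breaks. A minimal sketch demonstrating the first point; split_bytes is a hypothetical name and handles no enclosures.

```php
<?php
// Sketch: split a UTF-8 string on an ASCII delimiter by walking bytes,
// the way a $data[$i] loop does. Multibyte characters are never cut,
// because none of their bytes can equal the ASCII delimiter byte.
function split_bytes(string $data, string $delimiter = ','): array
{
    $fields = [];
    $current = '';
    $length = strlen($data); // byte length, not character count
    for ($i = 0; $i < $length; $i++) {
        if ($data[$i] === $delimiter) {
            $fields[] = $current;
            $current = '';
        } else {
            $current .= $data[$i];
        }
    }
    $fields[] = $current;
    return $fields;
}
```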

Heading and offset