parsecsv / parsecsv-for-php
CSV data parser for PHP.
License: MIT License
Right now the coding style of parseCSV is kind of messy. Do we want to update to a standard such as PSR? I personally do not like the PSR styles, as in my opinion they are counterproductive for commercial development. However, parseCSV isn't commercial, so I am not totally against it. Just wanted to see what others think about updating the coding style.
Getting a timezone error instead of data in Excel. Warning: date(): It is not safe to rely on the system's timezone settings. You are *required* to use the date.timezone setting or the date_default_timezone_set() function. In case you used any of those methods an...
Fixed by adding date_default_timezone_set("Europe/London") in the output method.
I'm not sure what the norm is nowadays, but having looked at a bunch of popular packages on packagist.org, it seems soft-tabs (4 spaces) are popular.
So, should we switch to 4 spaces wide soft-tabs, another width, or leave hard-tabs?
I am parsing a 150MB CSV, and I (quickly) ran into fgetcsv's shortcomings. One advantage it has, though, is that the number of lines was pretty much irrelevant.
In the examples and in the code, I can't really find an equivalent, though. It seems that the line by line stuff is bundled up in parse_string and the file is always read "as a whole" in _rfile.
Is there a way to just pass line data to parse_string as a workaround?
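A minimal sketch of one possible workaround, using only PHP built-ins: read the file in chunks of whole lines so that each chunk can be parsed separately. The function name read_line_chunks and the chunk size are hypothetical and not part of parseCSV. The sketch assumes no quoted field contains an embedded line break, since such a field could be split across two chunks.

```php
<?php
// Hypothetical helper (not part of parseCSV): yields the file as strings
// of at most $linesPerChunk complete lines each.
function read_line_chunks(string $path, int $linesPerChunk = 1000): Generator
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        $buffer = '';
        $count = 0;
        while (($line = fgets($handle)) !== false) {
            $buffer .= $line;
            if (++$count >= $linesPerChunk) {
                yield $buffer;
                $buffer = '';
                $count = 0;
            }
        }
        if ($buffer !== '') {
            yield $buffer; // remaining lines at the end of the file
        }
    } finally {
        fclose($handle);
    }
}
```

Each yielded chunk could then be handed to parse_string. Whether parse_string accepts such partial input without changes is untested here, and a header row would only be present in the first chunk.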
Hello,
I think that this is not an issue with the script. Anyway, I'm using the CSV parser to make a multi-line CSV from a db query. In every iteration of the while loop, I call "$csv->output($filename, $array, $temparray, ';');". Even though I get a multi-line CSV file as I should, I get "Cannot modify header information - headers already sent by" errors from the output function, because the headers are sent to the browser in every loop iteration. Any suggestions on how to get around this? Thanks for helping me.
Kind regards
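A minimal sketch of the usual way around this, assuming the rows can be collected first: build the complete CSV inside the loop and send the headers only once afterwards. The helper names rows_to_csv and send_csv_download are hypothetical and use only PHP built-ins, not the parseCSV API.

```php
<?php
// Hypothetical helper: turns all rows into one CSV string in memory.
function rows_to_csv(iterable $rows, string $delimiter = ';'): string
{
    $stream = fopen('php://temp', 'w+b');
    foreach ($rows as $row) {
        fputcsv($stream, $row, $delimiter, '"', '\\');
    }
    rewind($stream);
    $csv = stream_get_contents($stream);
    fclose($stream);
    return $csv;
}

// Hypothetical helper: sends the headers exactly once, then the body.
function send_csv_download(string $filename, string $csv): void
{
    header('Content-Type: application/csv');
    header('Content-Disposition: attachment; filename="' . $filename . '"');
    echo $csv;
}
```

With parseCSV the same idea would be to append each row to one array inside the while loop and call $csv->output(...) a single time after the loop ends.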
Hi there,
I have been using this library for the last 2 years; it is very easy to use and control.
Thanks for such a useful tool.
Currently I am facing an issue in parseCSV. I need to parse Arabic and Chinese data from a sheet,
and the library shows me only ?????? like this.
Obviously I have tried both methods, auto and encode, but neither shows the data the way I want. Any urgent suggestion or help will be highly appreciated.
Thanks,
In unparse()
// create heading
if ($this->heading && !$append && !empty($fields)) {
    foreach ($fields as $key => $value) {
        $entry[] = $this->_enclose_value($value, $delimiter);
    }
    $string .= implode($delimiter, $entry).$this->linefeed;
    $entry = array();
}
// create data
foreach ($data as $key => $row) {
    foreach ($row as $field => $value) {
        $entry[] = $this->_enclose_value($value, $delimiter);
    }
    $string .= implode($delimiter, $entry).$this->linefeed;
    $entry = array();
}
If $fields is set, it seems the output should be based on $fields, otherwise
Can someone first check whether the above is an issue? I could provide some kind of fix.
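A minimal sketch of what "output based on $fields" could look like, assuming each data row is an associative array keyed by field name. The function order_row_by_fields is hypothetical and is not the library's implementation; filling missing fields with an empty string is also an assumption.

```php
<?php
// Hypothetical helper: returns the row's values in the order given by $fields.
// Fields that are missing from the row become empty strings (an assumption).
function order_row_by_fields(array $row, array $fields): array
{
    if (empty($fields)) {
        return array_values($row); // no field list: keep the row's own order
    }
    $ordered = array();
    foreach ($fields as $field) {
        $ordered[] = array_key_exists($field, $row) ? $row[$field] : '';
    }
    return $ordered;
}
```

In the data loop quoted above, the inner foreach would then iterate over the result of this function instead of over $row directly, so that every data line has the same columns as the heading.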
Content-Type is hardcoded and set to application/csv.
The only differences between the tsv and csv formats are the delimiter (which can be changed) and the mime type (which is hardcoded). For tsv it's not application/csv but text/tab-separated-values.
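A minimal sketch of how the mime type could follow the delimiter instead of being hardcoded. The function name content_type_for_delimiter is hypothetical; the two mime type strings are the ones named in this report.

```php
<?php
// Hypothetical helper: picks the Content-Type value from the delimiter in use.
function content_type_for_delimiter(string $delimiter): string
{
    return $delimiter === "\t"
        ? 'text/tab-separated-values'
        : 'application/csv';
}
```

The output code could then call header('Content-type: ' . content_type_for_delimiter($delimiter)) in place of the fixed string.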
As I originally created this project back in the dark ages before I had any knowledge of unit testing and other sane things, parseCSV currently lacks them.
This issue is for any discussions related to creating proper unit tests.
Hello
I'm using composer to include parsecsv-for-php.
I added "parsecsv/php-parsecsv": "0.4.5" to my composer.json file.
But the class cannot be loaded with the PSR autoloader.
Error: Class 'parseCSV' not found
PSR says: The fully-qualified namespace and class is suffixed with .php when loading from the file system.
I think the filename must be renamed from "parsecsv.lib.php" to "parseCSV.php"
Much better would be (to be compliant with PSR-1): Class names MUST be declared in StudlyCaps.
e.g.
Classname = ParseCSV
Filename = ParseCSV.php
One of the more recent commits killed append mode when writing a file.
In no particular order, my thoughts on getting this library up to date for some well deserved showtime!
This is not my request!!!!
This request is from:
https://github.com/asessa/php-parsecsv/commit/1d6864c6a41746075dc24fac02774d4d535cf22c
I love the idea and think we should work to integrate this.
I've been working with this and found that whenever there is a zero in the line, it breaks the sequence.
Here.
I have this string in the file
http://www.amazon.com/ROX-Ice-Ball-Maker-Original/dp/B00MX59NMQ/ref=sr_1_1?ie=UTF8&qid=1435604374&sr=8-1&keywords=rox+ice+molds
I expected this output
[link] => http://www.amazon.com/ROX-Ice-Ball-Maker-Original/dp/B00MX59NMQ/ref=sr_1_1?ie=UTF8&qid=1435604374&sr=8-1&keywords=rox+ice+molds
but unfortunately getting this one,
[0] => Array
(
[link] => http://www.amazon.com/ROX-Ice-Ball-Maker-Original/dp/B
[1] =>
[2] => MX59NMQ/ref=sr_1_1?ie=UTF8&qid=14356
[3] => 4374&sr=8-1&keywords=rox+ice+molds
)
When reading an existing csv file that has all the cells (values) enclosed and then outputting it, the downloaded file doesn't have any values enclosed. This again is an issue with the _enclose_value method.
Hi, thanks for a wonderful csv library. I only have one problem with this library: when I edit the data and use $csv->save(), an extra line is added after each row. Please see screenshot https://monosnap.com/image/snEP0sXtTjmyIvefVhEXkQpVfSMxGV. Thanks in advance
Is there any way to make the headers case insensitive, e.g. force the lib to make all headers lower or upper case?
I am dealing with CSVs from multiple users, some of whom use caps and some of whom do not.
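A minimal sketch of a post-processing step, assuming the parsed data is a list of associative arrays keyed by header name. The function lowercase_headers is hypothetical and not a parseCSV option. Note that array_change_key_case only changes ASCII letters.

```php
<?php
// Hypothetical helper: lower-cases the keys of every parsed row.
function lowercase_headers(array $rows): array
{
    return array_map(
        function (array $row) {
            return array_change_key_case($row, CASE_LOWER);
        },
        $rows
    );
}
```

After this step, a column can be read as $row['email'] whether the uploader wrote Email, EMAIL or email in the header line.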
$result = array(array('Name'=>'Parser', 'Age'=>'30'));
print '<pre>';
print_r($result);
print '</pre>';
$csv = new parseCSV();
//$csv->save('list.csv',$result);
$csv->output('list.csv',$result,null,',');
Try the above code. It creates a file that contains the output printed to the screen as well as the array data.
During the parse I would like to change the id of the column with id=4, for example. Is it possible?
Thanks.
$csv = new parseCSV();
$csv->parse('someFile.csv');
$csv->linefeed = "\r\n";
$csv->save('otherFile.csv');
Expected: otherFile.csv has \r\n Line Endings
Actual: otherFile.csv has \r\r\n Line Endings
Operating system: Win10
PHP version: 7.0.1
I could fix the problem by changing the write mode in the save function from
$mode = ($append) ? 'at' : 'wt';
to
$mode = ($append) ? 'ab' : 'wb';
File output is:
if ( $filename !== null ) {
    header('Content-type: application/csv');
    header('Content-Disposition: attachment; filename="'.$filename.'"');
    echo $data;
}
Should be something like:
if ( $filename !== null ) {
    header("Content-type: application/csv");
    header("Content-Length: " . mb_strlen($data, '8bit'));
    if (strstr($_SERVER["HTTP_USER_AGENT"], "MSIE") !== false) {
        // needed for IE8 over https
        header('Expires: 0');
        header('Pragma: cache');
        header('Cache-Control: private');
        header("Content-Disposition: attachment; filename=" . urlencode($filename) . '; modification-date="' . date('r') . '";');
    } else {
        header('Content-Disposition: attachment; filename="' . $filename . '"; modification-date="' . date('r') . '";');
    }
    echo $data;
}
Still far from perfect but I hope it's a bit of an improvement :)
I don't use git/github, so sorry I have to post this as a comment.
Might also consider:
header('Connection: Keep-Alive');
header('Expires: 0');
header('Cache-Control: must-revalidate, post-check=0, pre-check=0');
header('Pragma: public');
Hi,
It seems that the iconv() conversion takes place only when parsing files, not when parsing data from string. It's a little inconsistent and I have to add my own conversion, although there's one in the parsecsv library.
While building unit tests for parseCSV I have come across an interesting issue while testing the construct method. The default value for keep_file_data is false (and in one of my tests I check the default). However, I can't assert on the csv string that was passed as the parameter, because file_data is wiped by the time I can inspect the parseCSV object. Would you be up for the idea of adding keep_file_data as a parameter to the construct?
PS. As a side note, I realize that my unit test shouldn't care what the file_data property contains as long as the execution and results are successful. I know that, but I am trying to build very strict unit tests to prevent my mistake of earlier today.
I just tested this class as a replacement for my own ragged parser and came across some data this parser seems to have some problems with.
I have a textfield in database containing semicolon followed by a whitespace followed by \r\n followed by more text - the parser does not enclose this value, so following columns are getting shifted in libreoffice calc.
I think the root of this problem is the function
function _enclose_value ($value = null)
I'm not very experienced with regular expressions so I'll need more time to figure it out.
Maybe you guys already have an idea?
How to modify CSV titles:
I tried:
$csv->titles = array('fname','LastName','EmailAddress','paxContactNo','paxGenderID','paxAgeGroupID','BookingCode');
$csv->save();
Before:
FirstName,LastName,EmailAddress,paxContactNo,paxGenderID,paxAgeGroupID,BookingCode
After (it changes the syntax of the file like this):
"fname""LastName""EmailAddress""paxContactNo""paxGenderID""paxAgeGroupID""BookingCode"
Is this okay, or do I have to change something?
Hello, I like your CSV parser and I would like to contribute to this project.
What do you think about rewriting the code to PSR-2 style?
https://github.com/php-fig/fig-standards/blob/master/accepted/PSR-2-coding-style-guide.md
Hello!
Thank you for this library.
I have some CSV files generated with Microsoft Office and they contain a BOM at the beginning of the file. It looks like your parser is not handling it correctly (the BOM sequence is added to the name of the first field).
I suggest to detect and remove BOM before parsing the file. Right now I have to do this manually.
Cheers!
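A minimal sketch of the manual step described above, assuming the file is UTF-8 and therefore starts with the three BOM bytes EF BB BF. The function name strip_utf8_bom is hypothetical; UTF-16 BOMs are not handled here.

```php
<?php
// Hypothetical helper: removes a leading UTF-8 byte order mark, if present.
function strip_utf8_bom(string $data): string
{
    $bom = "\xEF\xBB\xBF";
    return strncmp($data, $bom, 3) === 0 ? substr($data, 3) : $data;
}
```

The cleaned string could then be passed to the parser in place of the raw file contents, so the first header name no longer carries the BOM bytes.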
The relevant standard here is RFC 4180 https://www.ietf.org/rfc/rfc4180.txt
The Wikipedia article talks about what the RFC says, but it also discusses lots of ways to handle CSV files that have little to do with the specific CSV standard.
Is the goal of this project to be compliant with the Wikipedia article (which is what the readme currently says) or the RFC?
Either way.. thanks for the great code!
-FT
If I have a file that looks like the following.
Summary
First,Last,Age
John,Smith,10
Billy,Bob,9
Jane,Fine,14
Jim,Stark,12
| A | B | C |
|---|---|---|
| Summary | | |
| First | Last | Age |
| John | Smith | 10 |
| Billy | Bob | 9 |
| Jane | Fine | 14 |
| Jim | Stark | 12 |
And then pass it to parseCsv with an offset.
$parseCsv = new \parseCsv($file, 2);
then parseCsv is unable to determine the delimiter
$delimiter = $parseCsv->auto(); // returns false
The reason for this is that auto does not account for offset, which ends up sending the first row to _check_count, which sees that , is not represented on every line and immediately returns false.
I'm guessing someone will say that this is not a valid CSV file. And according to RFC 4180, it isn't.
Each line should contain the same number of fields throughout the file.
However, everyone knows that there are a lot of different implementations of the CSV and it would be nice if we could allow for this case of considering the offset in delimiter detection.
Thanks for the great work on this project. It fits my needs perfectly except for one minor detail. According to section 2 of RFC 4180, "The last record in the file may or may not have an ending line break." But if I parse a file that ends with a newline character, the parser returns an array ending with a nearly empty record that corresponds to the empty line at the end of the file. The record contains a key for the first column, but no other data. The parser should ignore the empty row at the end of the file.
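A minimal sketch of a workaround on the caller's side, assuming the parsed result is a list of arrays and that the unwanted last record contains only empty values, as described above. The function drop_trailing_empty_row is hypothetical and not part of parseCSV.

```php
<?php
// Hypothetical helper: removes the last row if all of its values are empty.
function drop_trailing_empty_row(array $rows): array
{
    $last = end($rows); // false for an empty list
    if (is_array($last) && implode('', $last) === '') {
        array_pop($rows);
    }
    return $rows;
}
```

Only the final row is checked, so empty records in the middle of the file are left untouched.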
Microsoft Excel uses different default delimiter based on the current OS locale. For example, if locale is set to US it will use "," as delimiter and if it's Danish, it will use ";" by default. It works like that because many European languages use comma for decimal notation ("1,23" instead of "1.23" like in US).
However, the default delimiter can be specified in the file by putting "sep=," as the first line (in this case it will use comma as the delimiter no matter what OS locale is set to). It would be great if this library could do the same to try to detect the default delimiter and then skip the first line of the file if it's used to specify the delimiter.
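A minimal sketch of the detection step, assuming the hint is exactly "sep=" followed by one delimiter character and a line break at the very start of the data. The function extract_sep_hint is hypothetical and not part of parseCSV.

```php
<?php
// Hypothetical helper: returns array(delimiter or null, data without the hint line).
function extract_sep_hint(string $data): array
{
    if (preg_match('/\Asep=(.)\r?\n/', $data, $m) === 1) {
        return array($m[1], substr($data, strlen($m[0])));
    }
    return array(null, $data); // no hint found: data is returned unchanged
}
```

A caller could use the returned delimiter when it is not null and fall back to automatic detection otherwise.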
When I try to load large number of rows, the RAM usage goes above the 512MB I allocated for my PHP. Would be great if it is possible to store the data in a less memory intensive structure as opposed to the standard PHP array which is a memory hog.
The following line causes the script to abort (there is no error message) on OS X Yosemite 10.10.2 running XAMPP with PHP 5.5.14.
flock($fp, $lock);
It is defined inside parsecsv.lib.php inside the function _wfile
If I comment this line, everything works fine.
The parsed CSV result data is wrong!
Not all fields should be strings!
Hi,
When parsing a CSV string (not a file) coming from a textarea, the last line does not contain a \r or \n so it is not retrieved.
I think you should update your lib in order to add manually an end-of-line if the last characters are not \n or \r.
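A minimal sketch of the suggested fix applied before parsing, so it can be used without changing the library. The function name ensure_trailing_newline and the default "\n" terminator are assumptions.

```php
<?php
// Hypothetical helper: appends a line terminator if the data does not end with one.
function ensure_trailing_newline(string $data, string $eol = "\n"): string
{
    if ($data === '') {
        return $data;
    }
    $last = substr($data, -1);
    return ($last === "\n" || $last === "\r") ? $data : $data . $eol;
}
```

Text coming from a textarea would be passed through this function first and then handed to the parser.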
Thanks
Sure, you can set $this->output_filename to NULL before calling the output method, but the doc block for the method should reflect that. As it is now, one would expect the method to return a string if the first parameter is set to NULL.
In my modified parseCSV.php file, I just commented out lines 459 through 461:
/*if (empty($filename)) {
$filename = $this->output_filename;
}*/
When I use
$this->csv->encoding('UTF-16', 'UTF-8');
$this->csv->parse($file_path);
It is giving me this error.
What about using this instead in the library ?
iconv(mb_detect_encoding($text, mb_detect_order(), true), "UTF-8", $text);
This will detect the incoming encoding and convert it to UTF-8.
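A minimal sketch of the same idea with a guard, since mb_detect_encoding returns false when none of the candidate encodings match. The function name to_utf8 and the ISO-8859-1 fallback are assumptions, and mb_convert_encoding is used here in place of iconv. The mbstring extension is required.

```php
<?php
// Hypothetical helper: converts $text to UTF-8, using a fallback source
// encoding when detection fails.
function to_utf8(string $text, string $fallback = 'ISO-8859-1'): string
{
    $detected = mb_detect_encoding($text, mb_detect_order(), true);
    $source = ($detected === false) ? $fallback : $detected;
    return mb_convert_encoding($text, 'UTF-8', $source);
}
```

Detection is a heuristic, so passing the known source encoding explicitly remains the safer choice whenever it is available.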
You can add new columns to the data and persist them with $csv->save().
However when this is done the keys added to the data set are not added to the headers.
I suppose it might be hard to ensure that added rows are done uniformly across the data, so maybe this is not desired functionality; however, it suited my purposes.
Is there a mechanism to at least manually edit the header line in this library? Would be useful.
For large CSVs, is there a function to free the data from memory once it has been loaded?
So I see the docblock for the conditions property states the property should be a string. Which is confirmed by looking at the conditions.php example
$csv->conditions = 'author does not contain dan brown';
However in the construct
if (count($conditions) > 0) {
$this->conditions = $conditions;
}
I would venture to guess you mean to use strlen($conditions)>0 but I just wanted to make sure.
The CSV parser causes a silent loss of data if an input CSV file does not have a trailing newline. In that case, the last line is ignored without any warning.
Example program:
<?php
$test_csv = "a1;b1;c1;d1\na2;b2;c2;d2";
require 'parsecsv.lib.php';
$parser = new parseCSV();
$parser->encoding('UTF-8', 'UTF-8');
$parser->heading = false;
$parser->delimiter = ';';
$parser->parse($test_csv);
var_export($parser->data);
Expected result:
array (
0 =>
array (
0 => 'a1',
1 => 'b1',
2 => 'c1',
3 => 'd1',
),
1 =>
array (
0 => 'a2',
1 => 'b2',
2 => 'c2',
3 => 'd2',
),
)
Actual result:
array (
0 =>
array (
0 => 'a1',
1 => 'b1',
2 => 'c1',
3 => 'd1',
),
)
Hey folks,
I have used your lib and it is really awesome; everything is working fine.
But I am facing one issue: there is a field containing a 13 digit integer value '12123123123123xx'
which is shown like 9.00E+18 in the sheet. When I render it and insert it into the db, the data is inserted as 9.00E+18. So how do I format it back into the real value that was entered?
Thanks in advance.
After upload I'm moving the file from $_FILES['somefile']['tmp_name'] to the server's file folder.
The address is stored in the $newFile variable
$csvFile=new parseCSV($newFile);
After this it's printing in a foreach loop:
Array ( [Full Name] => Amaris Ever [Email] => [email protected] [Phone] => XXX-XXX-8738 [Mobile] => [Fax] => [Address] => [City] => [State] => TX [ZIP] => 75006 [Country] => US [10] => )
The last column does not exist in the CSV.
The CSV exported from EXCEL returns it with quotes. However these are not being detected so headers are treated as a single array node for example (as are all other lines).
Example of Google Contact Fields Headers -
From Excel
"Name,Given Name,Additional Name,Family Name,Yomi Name,Given Name Yomi,Additional Name Yomi,Family Name Yomi,Name Prefix,Name Suffix,Initials,Nickname,Short Name,Maiden Name,Birthday,Gender,Location,Billing Information,Directory....etc"
Returned from parsecsv-for-php
[titles] => Array
(
[0] => Name,Given Name,Additional Name,Family Name,Yomi Name,Given Name Yomi,Additional Name Yomi,Family Name Yomi,Name Prefix,Name Suffix,Initials,Nickname,Short Name,Maiden Name,Birthday,Gender,Location,Billing Information,Directory..... etc
)
Any ideas on why this is happening ?
Parsing issue when the CSV is numerical and has only one column.
ex:
86545235689
34365587654
13469874576
Somehow, it separates each row based on the digit '6'.
I believe this is caused by having $enclosure = '"';
and requiring all CSV values to be enclosed...
php_info and csv files at https://www.dropbox.com/s/5tg3th1cfx1euee/_testCSV.zip
I get a blind 500 error from my hoster (1and1, who doesn't provide me with Apache error logs). This is not a catchable error, not a memory error.
It seems to occur in parse_string, while parsing lines >13000.
I don't have any data at hand to test it, but I spent some time playing around with the parse_string()-function for my chunk reader and think that it might fail if the csv file uses unicode line terminators.
Maybe more problems could occur from iterating $data[ ] as single characters, not taking care of multibyte characters.
Any thoughts?
Hello,
I'm using ParseCSV for big files (169 MB and up to 97000 records) and I must use the offset/limit feature to parse step by step.
If my CSV file has a heading, then for an offset of 0 my array has the header names as keys, but if my offset is greater than 0, I lose the header names as keys.
I suggest modifying the code like this in the parse_string function:
if ( $this->heading && empty($head) ) {
    $head = $row;
} elseif ( $this->_validate_offset($row_count) && $this->_validate_row_conditions($row, $this->conditions) ) {
    if ( empty($this->fields) || (!empty($this->fields) && (($this->heading && $row_count > 0) || !$this->heading)) ) {
        if ( !empty($this->sort_by) && !empty($row[$this->sort_by]) ) {
            if ( isset($rows[$row[$this->sort_by]]) ) {
                $rows[$row[$this->sort_by].'_0'] = &$rows[$row[$this->sort_by]];
                unset($rows[$row[$this->sort_by]]);
                for ( $sn=1; isset($rows[$row[$this->sort_by].'_'.$sn]); $sn++ ) {}
                $rows[$row[$this->sort_by].'_'.$sn] = $row;
            } else $rows[$row[$this->sort_by]] = $row;
        } else $rows[] = $row;
    }
}
With this modification, I always have the correct header names as my array keys.