parsecsv-for-php's People

Contributors

andreybolonin, breyndotechse, fonata, geminorum, gogowitsch, helpse, jimeh, lbajsarowicz, monkeywithacupcake, morozov, morrislaptop, mte90, norcoen, piskvor, repat, sharkmachine, susgo, tunecino, waknauss

parsecsv-for-php's Issues

Coding Style

Right now the coding style of parseCSV is kind of messy. Do we want to adopt a standard such as PSR? Personally I don't like the PSR styles, as in my opinion they are counterproductive for commercial development. However, parseCSV isn't commercial, so I am not totally against it. I just wanted to see what others think about updating the coding style.

Timezone Error

I get a timezone warning instead of data in Excel: Warning: date(): It is not safe to rely on the system's timezone settings. You are *required* to use the date.timezone setting or the date_default_timezone_set() function. In case you used any of those methods an.

Fixed by adding date_default_timezone_set("Europe/London") in the output method.
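A slightly more defensive variant of that fix, sketched under the assumption that a `date.timezone` value configured in php.ini should always win: only call `date_default_timezone_set()` when no timezone is configured. `ensure_default_timezone()` is a hypothetical helper, and Europe/London is simply the zone used in the fix above.

```php
<?php
// Hypothetical helper: set a fallback timezone only when php.ini leaves
// date.timezone empty, so an explicit configuration is never overridden.
// 'Europe/London' mirrors the fix above; substitute your own zone.
function ensure_default_timezone(string $fallback = 'Europe/London'): string
{
    $configured = ini_get('date.timezone');
    if ($configured === false || $configured === '') {
        date_default_timezone_set($fallback);
    }
    return date_default_timezone_get();
}

$tz = ensure_default_timezone();
echo date('r'), "\n"; // date() now has an explicit timezone to work with
```

Calling it once before `output()` (or anywhere `date()` is used) keeps the warning text out of the generated file.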

Indentation, hard-tabs vs soft-tabs?

I'm not sure what the norm is nowadays, but having looked at a bunch of popular packages on packagist.org, soft tabs (4 spaces) seem to be popular.

So, should we switch to 4-space soft tabs, another width, or keep hard tabs?

[enhancement] Line-Based API?

I am parsing a 150MB CSV, and I (quickly) ran into fgetcsv's shortcomings. One advantage it has, though, is that the number of lines was pretty much irrelevant.

In the examples and in the code, I can't really find an equivalent, though. It seems that the line-by-line handling is bundled into parse_string, and the file is always read as a whole in _rfile.

Is there a way to just pass line data to parse_string as a workaround?
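One way to get a line-based workflow, sketched here as a hypothetical generator that is not part of parseCSV, is to read the file with `fgets()` and yield one logical record at a time. It assumes RFC 4180-style quoting, where a record is complete once the enclosure characters balance out, so a quoted field containing a line break is kept together. Whether parseCSV's `parse_string()` accepts a single record like this is not confirmed here, so the usage lines parse each record with PHP's own `str_getcsv()` instead.

```php
<?php
// Hypothetical helper, not part of parseCSV: yield one logical CSV record at a
// time so memory use depends on the longest record, not on the file size.
// Assumes enclosure characters only appear in properly quoted fields.
function read_csv_records($handle, string $enclosure = '"'): Generator
{
    $buffer = '';
    while (($line = fgets($handle)) !== false) {
        $buffer .= $line;
        // An even number of enclosure characters means every quoted field
        // (including "" escapes) is closed, so the record is complete.
        if (substr_count($buffer, $enclosure) % 2 === 0) {
            yield rtrim($buffer, "\r\n");
            $buffer = '';
        }
    }
    if ($buffer !== '') {
        // Text left over at end of file is still returned as a record.
        yield rtrim($buffer, "\r\n");
    }
}

// Usage: an in-memory stream stands in for the 150MB file.
$fh = fopen('php://memory', 'r+');
fwrite($fh, "id,note\n1,\"multi\nline\"\n2,plain\n");
rewind($fh);
$rows = [];
foreach (read_csv_records($fh) as $record) {
    $rows[] = str_getcsv($record, ',', '"', '');
}
fclose($fh);
```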

$csv->output in while loop and output header

Hello,

I think this is not an issue with the script itself. Anyway, I'm using parseCSV to build a multi-line CSV from a DB query. In every iteration of a while loop, I call "$csv->output($filename, $array, $temparray, ';');". Although I get a multi-line CSV file as expected, output() raises "Cannot modify header information - headers already sent by" errors, because the headers are sent to the browser on every iteration. Any suggestions on how to get around this? Thanks for your help.

Kind regards
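The usual way around this is to collect every row inside the loop and produce the CSV once, after the loop, so the headers are sent exactly once. Below is a plain-PHP sketch of that pattern; `build_csv()` is a hypothetical helper built on `fputcsv()`, and `$dbRows` stands in for the query result. With parseCSV, the equivalent would presumably be a single `$csv->output(...)` call placed after the loop instead of inside it.

```php
<?php
// Hypothetical helper: render all collected rows into one CSV string.
function build_csv(array $rows, string $delimiter = ';'): string
{
    $fh = fopen('php://temp', 'r+');
    foreach ($rows as $row) {
        // An empty escape character gives RFC 4180 "" doubling (PHP >= 7.4).
        fputcsv($fh, $row, $delimiter, '"', '');
    }
    rewind($fh);
    $csv = stream_get_contents($fh);
    fclose($fh);
    return $csv;
}

// Inside the loop: only collect rows, send nothing yet.
$rows = [['id', 'name']];
$dbRows = [[1, 'Alice'], [2, 'Bob']]; // placeholder for the while() over the query
foreach ($dbRows as $dbRow) {
    $rows[] = $dbRow;
}

// After the loop: send the headers once, then the whole body.
$csv = build_csv($rows);
if (!headers_sent()) {
    header('Content-Type: text/csv');
    header('Content-Disposition: attachment; filename="export.csv"');
}
echo $csv;
```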

encoding issue when parsing Chinese/Arabic

Hi there,
I have been using this library for the last 2 years; it is very easy to use and control.
Thanks for such a useful tool.
Currently I am facing an issue with parseCSV: I need to parse Arabic and Chinese data from a sheet,
and the library shows me only ?????? instead.
I have tried both methods, auto() and encoding(), but the data is still not shown the way I want. Any urgent suggestion or help would be highly appreciated.
Thanks,
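Question marks like these usually mean the text was read in the wrong encoding or converted into one that cannot represent the characters. A minimal sketch, assuming you know the sheet's real source encoding, is to normalise the raw data to UTF-8 before handing it to the parser and to serve the result with a UTF-8 charset. `to_utf8()` is a hypothetical helper, and encoding names such as `CP1256` (Windows Arabic) or `GB18030` (Simplified Chinese) depend on what the local iconv supports.

```php
<?php
// Hypothetical helper: return the CSV text as valid UTF-8.
// $sourceEncoding is an assumption you must supply for non-UTF-8 input.
function to_utf8(string $text, string $sourceEncoding): string
{
    // preg_match with the /u modifier fails on invalid UTF-8, so input that is
    // already valid UTF-8 (e.g. Arabic or Chinese text) is returned unchanged.
    if (preg_match('//u', $text) === 1) {
        return $text;
    }
    $converted = iconv($sourceEncoding, 'UTF-8', $text);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert from $sourceEncoding to UTF-8");
    }
    return $converted;
}
```

If the parsed data is printed in a web page, the page itself also needs to declare UTF-8 (for example `header('Content-Type: text/html; charset=UTF-8');`), otherwise the browser may still show question marks.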

On output order

In unparse()

    // create heading
    if ($this->heading && !$append && !empty($fields)) {
        foreach ($fields as $key => $value) {
            $entry[] = $this->_enclose_value($value, $delimiter);
        }

        $string .= implode($delimiter, $entry).$this->linefeed;
        $entry   = array();
    }
    // create data
    foreach ($data as $key => $row) {
        foreach ($row as $field => $value) {
            $entry[] = $this->_enclose_value($value, $delimiter);
        }

        $string .= implode($delimiter, $entry).$this->linefeed;
        $entry   = array();
    }

If $fields exists, it seems the output should be based on $fields. Otherwise:

  1. How do we guarantee that the order of values in each data row matches the order of $fields?
  2. If $fields is given to select only some of the fields, unparse() seems broken.

Can someone first confirm whether the above is an issue? I could provide a fix.
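A sketch of the behaviour described above: when `$fields` is given, each data row is emitted by walking `$fields` and looking each value up by name, which fixes both the ordering and the column selection. `unparse_rows()` and its simple quoting rule are illustrative assumptions, not the library's `unparse()` or `_enclose_value()`.

```php
<?php
// Illustrative only: field-ordered output where $fields controls both the
// column order and which columns are written. Missing columns become ''.
function unparse_rows(array $data, array $fields, string $delimiter = ',', string $linefeed = "\r\n"): string
{
    // Minimal quoting: enclose values containing the delimiter, quotes or line breaks.
    $enclose = function ($value) use ($delimiter): string {
        $value = (string) $value;
        if (strpbrk($value, $delimiter . "\"\r\n") !== false) {
            return '"' . str_replace('"', '""', $value) . '"';
        }
        return $value;
    };

    $lines = [implode($delimiter, array_map($enclose, $fields))];
    foreach ($data as $row) {
        $entry = [];
        foreach ($fields as $field) {
            // Look the value up by column name instead of relying on row order.
            $entry[] = $enclose($row[$field] ?? '');
        }
        $lines[] = implode($delimiter, $entry);
    }
    return implode($linefeed, $lines) . $linefeed;
}

$data = [
    ['name' => 'Ann', 'age' => 30, 'city' => 'Oslo'],
    ['age' => 25, 'name' => 'Bo', 'city' => 'Rome'], // keys in a different order
];
$out = unparse_rows($data, ['name', 'city']);
```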

Unit Tests

As I originally created this project back in the dark ages before I had any knowledge of unit testing and other sane things, parseCSV currently lacks them.

This issue is for any discussions related to creating proper unit tests.

Current State of Tests

  • Basic setup of tests.
  • Travis-CI Setup.
  • Expand this list with more to-dos as needed.
  • Properties
    • set methods (methods still need to be built)
      • heading
        • non bool exception
      • fields
        • non array exception
      • sort_by
        • non string exception
      • sort_reverse
        • non bool exception
      • sort_type
        • non string exception
        • must match regular, numeric, or string
      • delimiter
        • non string exception
      • enclosure
        • non string exception
      • enclose_all
        • non bool exception
      • conditions
        • non string exception
      • offset
        • non int or null exception
      • limit
        • non int or null exception
      • auto_depth
        • non int exception
      • auto_non_chars
        • non string exception
      • auto_preferred
        • non string exception
      • convert_encoding
        • non bool exception
      • input_encoding
        • non string exception
      • output_encoding
        • non string exception
      • linefeed
        • non string exception
      • output_delimiter
        • non string exception
      • output_filename
        • non string exception
      • keep_file_data
        • non bool exception
      • file
        • non string exception
      • file_data
        • non string exception
      • titles
        • non array exception
    • default values
      • heading
      • fields
      • sort_by
      • sort_reverse
      • sort_type
      • delimiter
      • enclosure
      • enclose_all
      • conditions
      • offset
      • limit
      • auto_depth
      • auto_non_chars
      • auto_preferred
      • convert_encoding
      • input_encoding
      • output_encoding
      • linefeed
      • output_delimiter
      • output_filename
      • keep_file_data
      • file
      • file_data
      • error
      • error_info
      • titles
      • data
    • methods
      • __construct
        • input param
        • offset param
        • limit param
        • conditions param
        • keep_file_data param
      • parse
        • input param
        • offset param
        • limit param
        • conditions param
      • save
        • file param
        • data param
        • append param
        • fields param
      • output
        • filename param
        • data param
        • fields param
        • delimiter param
      • encoding
        • input param
        • output param
      • auto
        • file param
        • parse param
        • search_depth param
        • preferred param
        • enclosure param
      • parse_file
        • file param
      • parse_string
        • data param
      • unparse
        • data param
        • fields param
        • append param
        • is_php param
        • delimiter param
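To illustrate what one of the planned set methods and its exception tests might look like, here is a hypothetical setter for `sort_type` covering both listed checks (non-string input, and values other than regular, numeric or string). The class and method names are assumptions for illustration; in a PHPUnit test the exception would typically be asserted with `expectException(InvalidArgumentException::class)`.

```php
<?php
// Hypothetical class, not parseCSV's API: a validating setter for sort_type.
class SortTypeHolder
{
    private $sortType = null;

    public function setSortType($value): void
    {
        // "non string exception"
        if (!is_string($value)) {
            throw new InvalidArgumentException('sort_type must be a string');
        }
        // "must match regular, numeric, or string"
        if (!in_array($value, ['regular', 'numeric', 'string'], true)) {
            throw new InvalidArgumentException("Unsupported sort_type: $value");
        }
        $this->sortType = $value;
    }

    public function getSortType(): ?string
    {
        return $this->sortType;
    }
}
```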

Autoloader not working

Hello

I'm using composer to include parsecsv-for-php.
I added "parsecsv/php-parsecsv": "0.4.5" to my composer.json file.
But the class cannot be loaded by the PSR autoloader.

Error: Class 'parseCSV' not found

PSR says: The fully-qualified namespace and class is suffixed with .php when loading from the file system.

I think the filename must be renamed from "parsecsv.lib.php" to "parseCSV.php"

Much better would be (to be compliant with PSR-1): Class names MUST be declared in StudlyCaps.
e.g.
Classname = ParseCSV
Filename = ParseCSV.php
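The rule quoted above can be sketched as a small autoloader: the class name, with namespace separators turned into directory separators, plus `.php` gives the file to load. This is a simplified sketch (no namespace-prefix mapping, and the `src/` base directory is an assumption); it shows why a class named `parseCSV` living in `parsecsv.lib.php` can never be found this way. Composer's `classmap` autoloading is the usual workaround for files that do not follow PSR-4 naming.

```php
<?php
// Simplified PSR-4-style mapping: class name -> base dir + path + ".php".
// The base directory is an assumption for illustration.
function psr4_file(string $class, string $baseDir = 'src/'): string
{
    return $baseDir . str_replace('\\', '/', ltrim($class, '\\')) . '.php';
}

// Register an autoloader that requires the mapped file if it exists.
spl_autoload_register(function (string $class): void {
    $file = psr4_file($class, __DIR__ . '/src/');
    if (is_file($file)) {
        require $file;
    }
});
```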

Append Mode

One of the more recent commits killed append mode when writing a file.

Project TODO

In no particular order, my thoughts on getting this library up to date for some well deserved showtime!

  • Code Cleanup and new style documentation (done: 4b28088)
  • Composer Support (done: 5a70a7b)
  • Proper access for protected methods and properties
  • PHP 5.3 standardization
  • Unit TEST(S)!!! 😡 😭 😌 (#4)
  • Contributors section in Readme (done: f59af53)
  • Update readme (grammar and maintenance notice) (done: c915579)
  • Update Changelog

Splitting rows unexpectedly.

I've been working with this and found that whenever there is a zero in a line, the row gets split at it.
Here is an example.
I have this string in the file:
http://www.amazon.com/ROX-Ice-Ball-Maker-Original/dp/B00MX59NMQ/ref=sr_1_1?ie=UTF8&qid=1435604374&sr=8-1&keywords=rox+ice+molds

I expected this output:
[link] => http://www.amazon.com/ROX-Ice-Ball-Maker-Original/dp/B00MX59NMQ/ref=sr_1_1?ie=UTF8&qid=1435604374&sr=8-1&keywords=rox+ice+molds

but unfortunately I get this one:

[0] => Array
(
[link] => http://www.amazon.com/ROX-Ice-Ball-Maker-Original/dp/B
[1] =>
[2] => MX59NMQ/ref=sr_1_1?ie=UTF8&qid=14356
[3] => 4374&sr=8-1&keywords=rox+ice+molds
)
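The broken output above is exactly what you get by splitting the URL on the character `0`, which suggests the delimiter ended up as "0" (for example through automatic detection). A quick check with PHP's built-in `str_getcsv()`, used here only for illustration, reproduces the four pieces and shows that stating the real delimiter keeps the URL intact; with parseCSV that would mean setting `$csv->delimiter` explicitly before parsing.

```php
<?php
$line = 'http://www.amazon.com/ROX-Ice-Ball-Maker-Original/dp/B00MX59NMQ/ref=sr_1_1?ie=UTF8&qid=1435604374&sr=8-1&keywords=rox+ice+molds';

// With '0' as the delimiter, the URL is cut at every zero, as in the report:
// "...dp/B", "", "MX59NMQ/...qid=14356", "4374&sr=8-1&keywords=rox+ice+molds"
$wrong = str_getcsv($line, '0', '"', '');

// With the real delimiter stated explicitly, the URL stays a single field.
$right = str_getcsv($line, ',', '"', '');
```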

outputted CSV is not enclosed when the source is

When reading an existing CSV file in which all the cells (values) are enclosed and then outputting it, the downloaded file doesn't have any values enclosed. This again is an issue with the _enclose_value method.

Case insensitive headers

Is there any way to make the headers case-insensitive, e.g. force the library to convert all headers to lower or upper case?

I am dealing with CSVs from multiple users, some of whom use capitals and some of whom do not.
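Until the library offers such an option, one workaround is to normalise the keys after parsing with PHP's `array_change_key_case()`. The sketch below assumes the parsed rows are associative arrays keyed by header name (such as `$csv->data` with `heading` enabled); `lowercase_keys()` is a hypothetical helper.

```php
<?php
// Hypothetical helper: lower-case every header key so 'Email', 'EMAIL'
// and 'email' all map to the same key.
function lowercase_keys(array $rows): array
{
    return array_map(function (array $row): array {
        return array_change_key_case($row, CASE_LOWER);
    }, $rows);
}

$rows = [['Email' => 'a@example.com', 'NAME' => 'Ann']];
$normalised = lowercase_keys($rows);
```

Note that if two headers differ only in case, `array_change_key_case()` keeps the value that appears later in the row.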

output function includes the data printed on screen as well

    $result = array(array('Name' => 'Parser', 'Age' => '30'));
    print '<pre>';
    print_r($result);
    print '</pre>';
    $csv = new parseCSV();
    //$csv->save('list.csv', $result);
    $csv->output('list.csv', $result, null, ',');

Try the above code. It creates a file that contains the results printed on screen in addition to the array data.

double line endings

Steps to reproduce

$csv = new parseCSV();
$csv->parse('someFile.csv');
$csv->linefeed = "\r\n";
$csv->save('otherFile.csv');

Expected behaviour

otherFile.csv has \r\n Line Endings

Actual behaviour

otherFile.csv has \r\r\n Line Endings

Server configuration

Operating system: Win10

PHP version: 7.0.1

I could fix the problem by changing the write mode in the save function from

    $mode = ($append) ? 'at' : 'wt';

to

    $mode = ($append) ? 'ab' : 'wb';
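Why the change works: on Windows, the `t` (text) mode flag translates `\n` to `\r\n` on write, so a linefeed that is already `\r\n` becomes `\r\r\n`. In `b` (binary) mode the bytes are written unchanged, which this small round-trip check illustrates with a temporary file.

```php
<?php
// Binary mode writes the linefeed bytes exactly as given, on every OS.
$path = tempnam(sys_get_temp_dir(), 'csv');
$fh = fopen($path, 'wb');
fwrite($fh, "a,b\r\n1,2\r\n");
fclose($fh);

// Read the raw bytes back and clean up the temporary file.
$bytes = file_get_contents($path);
unlink($path);
```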

File output header

The file output currently is:

    if ($filename !== null) {
        header('Content-type: application/csv');
        header('Content-Disposition: attachment; filename="' . $filename . '"');
        echo $data;
    }

It should be something like:

    if ($filename !== null) {
        header('Content-type: application/csv');
        header('Content-Length: ' . mb_strlen($data, '8bit'));
        if (isset($_SERVER['HTTP_USER_AGENT']) && strpos($_SERVER['HTTP_USER_AGENT'], 'MSIE') !== false) {
            // needed for IE8 over HTTPS
            header('Expires: 0');
            header('Pragma: cache');
            header('Cache-Control: private');
            header('Content-Disposition: attachment; filename=' . urlencode($filename) . '; modification-date="' . date('r') . '";');
        } else {
            header('Content-Disposition: attachment; filename="' . $filename . '"; modification-date="' . date('r') . '";');
        }
        echo $data;
    }

Still far from perfect but I hope it's a bit of an improvement :)
I don't use git/github, so sorry I have to post this as a comment.

I might also consider:

    header('Connection: Keep-Alive');
    header('Expires: 0');
    header('Cache-Control: must-revalidate, post-check=0, pre-check=0');
    header('Pragma: public');

Encoding changed only when parsing files

Hi,

It seems that the iconv() conversion takes place only when parsing files, not when parsing data from a string. It's a little inconsistent, and I have to add my own conversion, although the parsecsv library already has one.

keep_file_data construct parameter

While building unit tests for parseCSV, I came across an interesting issue while testing the constructor. The default value for keep_file_data is false (and one of my tests checks that default). However, I can't assert on the CSV string that was passed to the constructor, because file_data has been wiped by the time I can inspect the parseCSV object. Would you be open to adding keep_file_data as a constructor parameter?

PS. As a side note, I realize that my unit test shouldn't care what the file_data property contains as long as execution succeeds and the results are correct. I know that, but I am trying to build very strict unit tests to prevent a repeat of my mistake from earlier today.

function _enclose_value

I just tested this class as a replacement for my own ragged parser and came across some data that this parser seems to have problems with.

I have a text field in the database containing a semicolon, followed by a space, followed by \r\n, followed by more text. The parser does not enclose this value, so the following columns get shifted in LibreOffice Calc.

I think the root of this problem is the function

function _enclose_value ($value = null)

I'm not very experienced with regular expressions so I'll need more time to figure it out.
Maybe you guys already have an idea?
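For comparison, RFC 4180 says fields containing line breaks, double quotes or the delimiter should be enclosed, with embedded quotes doubled. A regex-free sketch of that rule follows; `enclose_value()` is illustrative and not the library's `_enclose_value()`. The optional `$encloseAll` flag mirrors parseCSV's `enclose_all` property, which is also what the "outputted CSV is not enclosed" report is after.

```php
<?php
// Illustrative enclosure rule in the spirit of RFC 4180: quote a value when it
// contains the delimiter, the enclosure character, CR or LF (or always, when
// $encloseAll is true), and double any enclosure characters inside it.
function enclose_value(string $value, string $delimiter = ';', string $enclosure = '"', bool $encloseAll = false): string
{
    $needsQuotes = $encloseAll
        || strpos($value, $delimiter) !== false
        || strpos($value, $enclosure) !== false
        || strpos($value, "\r") !== false
        || strpos($value, "\n") !== false;
    if (!$needsQuotes) {
        return $value;
    }
    return $enclosure . str_replace($enclosure, $enclosure . $enclosure, $value) . $enclosure;
}
```

The value from the report (semicolon, space, `\r\n`, more text) trips both the delimiter and the line-break checks, so it comes out enclosed.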

modify csv titles

How do I modify CSV titles?
I tried:
$csv->titles = array('fname','LastName','EmailAddress','paxContactNo','paxGenderID','paxAgeGroupID','BookingCode');

$csv->save();

Before:
FirstName,LastName,EmailAddress,paxContactNo,paxGenderID,paxAgeGroupID,BookingCode

After: it changes the syntax of the file like this:
"fname""LastName""EmailAddress""paxContactNo""paxGenderID""paxAgeGroupID""BookingCode"

Is this expected, or do I have to change something?

Detect BOM and strip it?

Hello!

Thank you for this library.

I have some CSV files generated with Microsoft Office and they contain BOM at the beginning of the file. Looks like your parser is not handling it correctly (BOM sequence is added to the name of the first field).

I suggest detecting and removing the BOM before parsing the file. Right now I have to do this manually.

Cheers!
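The manual step can be as small as checking for the three UTF-8 BOM bytes (`EF BB BF`) at the start of the data and cutting them off before parsing. This sketch handles only the UTF-8 BOM, not UTF-16 ones, and `strip_utf8_bom()` is a hypothetical helper.

```php
<?php
// Hypothetical helper: remove a leading UTF-8 byte order mark so it does not
// become part of the first header name.
function strip_utf8_bom(string $data): string
{
    if (strncmp($data, "\xEF\xBB\xBF", 3) === 0) {
        return substr($data, 3);
    }
    return $data;
}
```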

Standards compliant vs wikipedia compliant

The relevant standard here is RFC 4180 https://www.ietf.org/rfc/rfc4180.txt

The wikipedia article talks about what the RFC says but it also discusses lots of ways to handle CSV files that have little to do with the specific CSV standard.

Is the goal of this project to be compliant with the wikipedia article (which is what the readme currently says) or the RFC?

Either way, thanks for the great code!

-FT

->auto() delimiter detection does not account for offset

If I have a file that looks like the following.

Summary
First,Last,Age
John,Smith,10
Billy,Bob,9
Jane,Fine,14
Jim,Stark,12
A B C
Summary
First Last Age
John Smith 10
Billy Bob 9
Jane Fine 14
Jim Stark 12

And then pass it to parseCsv with an offset.

$parseCsv = new \parseCsv($file, 2);

then parseCsv is unable to determine the delimiter

$delimiter = $parseCsv->auto(); // returns false

The reason is that auto() does not account for the offset, so it passes the first row to _check_count, which sees that ',' does not appear on every line and immediately returns false.

I'm guessing someone will say that this is not a valid CSV file. And according to RFC 4180, it isn't.

Each line should contain the same number of fields throughout the file.

However, everyone knows that there are a lot of different CSV implementations, and it would be nice if delimiter detection could take the offset into account for cases like this.
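A minimal sketch of offset-aware detection: skip the first `$skipLines` physical lines, then accept a candidate delimiter only if it appears the same non-zero number of times on each of the next `$depth` lines. `detect_delimiter()`, its candidate list and the fixed depth are assumptions (the depth loosely plays the role of parseCSV's `auto_depth`), and the naive counting ignores delimiters inside quoted fields.

```php
<?php
// Hypothetical delimiter sniffer that honours an offset. Delimiters inside
// quoted fields are not handled; this only illustrates skipping leading lines.
function detect_delimiter(string $data, int $skipLines = 0, int $depth = 5, array $candidates = [',', ';', "\t", '|']): ?string
{
    $lines = preg_split('/\r\n|\r|\n/', $data);
    $sample = array_slice($lines, $skipLines, $depth);
    $sample = array_values(array_filter($sample, function ($line) {
        return $line !== '';
    }));
    if ($sample === []) {
        return null;
    }

    $best = null;
    $bestCount = 0;
    foreach ($candidates as $candidate) {
        $counts = array_map(function ($line) use ($candidate) {
            return substr_count($line, $candidate);
        }, $sample);
        $first = $counts[0];
        // Require a consistent, non-zero count on every sampled line.
        if ($first > 0 && count(array_unique($counts)) === 1 && $first > $bestCount) {
            $best = $candidate;
            $bestCount = $first;
        }
    }
    return $best;
}

$data = "Summary\nFirst,Last,Age\nJohn,Smith,10\nBilly,Bob,9\nJane,Fine,14\nJim,Stark,12\nA B C\n";
$withOffset = detect_delimiter($data, 1);    // skips the "Summary" line
$withoutOffset = detect_delimiter($data, 0); // "Summary" has no delimiter at all
```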

The last line should be ignored if it's empty

Thanks for the great work on this project. It fits my needs perfectly except for one minor detail. According to section 2 of RFC 4180, "The last record in the file may or may not have an ending line break." But if I parse a file that ends with a newline character, the parser returns an array ending with a nearly empty record that corresponds to the empty line at the end of the file. The record contains a key for the first column, but no other data. The parser should ignore the empty row at the end of the file.
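Until the parser handles this, a post-processing workaround is to drop the final record when none of its fields contain data. Only the last row is inspected, so intentional blank rows in the middle are kept. `drop_empty_last_record()` is a hypothetical helper that assumes each parsed row is an array, as in `$csv->data`.

```php
<?php
// Hypothetical helper: remove a trailing record whose fields are all empty,
// which is what a line break at the very end of the file can produce.
function drop_empty_last_record(array $rows): array
{
    if ($rows === []) {
        return $rows;
    }
    $last = end($rows);
    $nonEmpty = array_filter($last, function ($value) {
        return $value !== null && $value !== '';
    });
    if ($nonEmpty === []) {
        array_pop($rows);
    }
    return $rows;
}
```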

Use MS Excel's "sep=" to detect the delimiter

Microsoft Excel uses a different default delimiter depending on the current OS locale. For example, if the locale is set to US it uses "," as the delimiter, and if it's Danish it uses ";" by default. This is because many European languages use a comma as the decimal separator ("1,23" instead of "1.23" as in the US).
However, the delimiter can be specified in the file by putting "sep=," on the first line (in this case Excel uses a comma as the delimiter regardless of the OS locale). It would be great if this library could do the same: detect the delimiter from that line and then skip the first line of the file if it is used to specify the delimiter.
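A sketch of the requested behaviour, assuming the hint is an unquoted `sep=X` line with a single-character delimiter at the very start of the data: return the delimiter plus the data with that line removed, or `null` and the untouched data when there is no hint. `detect_excel_sep()` is hypothetical; the returned delimiter could then be assigned to parseCSV's `delimiter` property before parsing the remaining text.

```php
<?php
// Hypothetical helper: read Excel's optional "sep=X" first line.
// Returns [delimiter or null, data without the sep= line].
function detect_excel_sep(string $data): array
{
    if (preg_match('/\Asep=(.)(\r\n|\n|\r)/', $data, $m) === 1) {
        return [$m[1], substr($data, strlen($m[0]))];
    }
    return [null, $data];
}

[$delimiter, $rest] = detect_excel_sep("sep=;\r\na;b\n1;2\n");
[$none, $unchanged] = detect_excel_sep("a,b\n1,2\n");
```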

Is it possible to use another less memory intensive structure?

When I try to load a large number of rows, RAM usage goes above the 512MB I allocated to PHP. It would be great if the data could be stored in a less memory-intensive structure than a standard PHP array, which is a memory hog.

flock causes the script to abort

The following line causes the script to abort (with no error message) on OS X Yosemite 10.10.2 running XAMPP with PHP 5.5.14.

flock($fp, $lock);

It is in parsecsv.lib.php, inside the function _wfile.
If I comment out this line, everything works fine.

Last line not parsed

Hi,
When parsing a CSV string (not a file) coming from a textarea, the last line does not end with \r or \n, so it is not retrieved.
I think you should update your library to append an end-of-line manually if the last character is not \n or \r.
Thanks
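The suggested fix, sketched as a hypothetical helper that appends a line break only when the input does not already end with `\n` or `\r`. Applied by the caller, e.g. `$parser->parse(ensure_trailing_linebreak($text));`, it should also work around the "Silent data loss" report, assuming the parser's only problem is the unterminated last record.

```php
<?php
// Hypothetical helper: make sure the last record is terminated like every
// other record. Empty input is left alone.
function ensure_trailing_linebreak(string $data, string $linefeed = "\n"): string
{
    if ($data === '') {
        return $data;
    }
    $last = substr($data, -1);
    return ($last === "\n" || $last === "\r") ? $data : $data . $linefeed;
}
```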

Output method always produces headers and outputs directly to browser

Sure, you can set $this->output_filename to NULL before calling the output method, but the method's doc block should say so. As it is now, one would expect the method to return a string if the first parameter is set to NULL.

In my modified parseCSV.php file, I just commented out lines 459 through 461:

    /*if (empty($filename)) {
        $filename = $this->output_filename;
    }*/

Creating new columns works, but headers are not appended.

You can add new columns to the data and persist them with $csv->save().
However, when you do this, the keys added to the data set are not added to the headers.
I suppose it might be hard to ensure that new keys are added uniformly across all rows, so maybe this is not desired functionality, but it suited my purposes.
Is there a mechanism in this library to at least edit the header line manually? That would be useful.
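One way to handle the uniformity concern yourself: build the header from the union of keys over all rows (in first-seen order) and fill any column a row lacks with an empty string. `normalise_columns()` is a hypothetical helper. Judging from the parameter list in the Unit Tests issue (file, data, append, fields), the resulting header could then be passed as the fields argument of `save()`, but that usage is an assumption.

```php
<?php
// Hypothetical helper: returns [header, rows] where every row has exactly
// the header's columns, in the header's order.
function normalise_columns(array $rows): array
{
    $header = [];
    foreach ($rows as $row) {
        foreach (array_keys($row) as $key) {
            if (!in_array($key, $header, true)) {
                $header[] = $key;
            }
        }
    }

    $normalised = [];
    foreach ($rows as $row) {
        $line = [];
        foreach ($header as $key) {
            $line[$key] = $row[$key] ?? '';
        }
        $normalised[] = $line;
    }
    return [$header, $normalised];
}

$rows = [
    ['name' => 'Ann', 'age' => '30'],
    ['name' => 'Bo', 'age' => '25', 'score' => '7'], // new column on one row only
];
[$header, $normalised] = normalise_columns($rows);
```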

conditions question

I see that the docblock for the conditions property states the property should be a string, which is confirmed by the conditions.php example:
$csv->conditions = 'author does not contain dan brown';

However, in the constructor:

if (count($conditions) > 0) {
    $this->conditions = $conditions;
}

I would venture to guess you mean to use strlen($conditions)>0 but I just wanted to make sure.

Silent data loss: Last line is ignored if it does not end with newline

The CSV parser causes a silent loss of data if an input CSV file does not have a trailing newline. In that case, the last line is ignored without any warning.

Example program:

<?php
$test_csv = "a1;b1;c1;d1\na2;b2;c2;d2";
require 'parsecsv.lib.php';
$parser = new parseCSV();
$parser->encoding('UTF-8', 'UTF-8');
$parser->heading = false;
$parser->delimiter = ';';
$parser->parse($test_csv);
var_export($parser->data);

Expected result:

array (
  0 => 
  array (
    0 => 'a1',
    1 => 'b1',
    2 => 'c1',
    3 => 'd1',
  ),
  1 => 
  array (
    0 => 'a2',
    1 => 'b2',
    2 => 'c2',
    3 => 'd2',
  ),
)

Actual result:

array (
  0 => 
  array (
    0 => 'a1',
    1 => 'b1',
    2 => 'c1',
    3 => 'd1',
  ),
)

Project TODO

  • make parseCSV use the iterator interface
    • let iterator function handle both modes, big array and CSVReaderRows
  • adapt file-handling from CSVReader in parseCSV
  • merge functions:
    • don't let CSVReader use str_getcsv() (maybe parse_string() can be used as a drop-in replacement)
  • add option to constructor, which mode to use
    • let constructor initialize both ways

how to convert 9.00E+18 numbers into plain numbers

hey folks
I have used your lib; it is really awesome and everything in it works fine,
but I'm facing one issue: there is a field containing a 13-digit integer value '12123123123123xx',
which is shown as 9.00E+18 in the sheet. When I render it and insert it into the DB, the data is inserted as 9.00E+18. How do I format these back into the real values before inserting them?
thanks in advance ..

Extra column added into parsed csv array

After upload, I move the file from $_FILES['somefile']['tmp_name'] to the server's file folder.
The path is stored in the $newFile variable.

$csvFile=new parseCSV($newFile);

After this, it prints the following in a foreach loop:

Array ( [Full Name] => Amaris Ever [Email] => [email protected] [Phone] => XXX-XXX-8738 [Mobile] => [Fax] => [Address] => [City] => [State] => TX [ZIP] => 75006 [Country] => US [10] => )

The last column does not exist in the CSV.

Not splitting Excel CSV files with quotes around lines

The CSV exported from Excel wraps each line in quotes. However, these quotes are not being detected, so the headers are treated as a single array element (as are all other lines).

Example of Google Contact Fields Headers -

From Excel

"Name,Given Name,Additional Name,Family Name,Yomi Name,Given Name Yomi,Additional Name Yomi,Family Name Yomi,Name Prefix,Name Suffix,Initials,Nickname,Short Name,Maiden Name,Birthday,Gender,Location,Billing Information,Directory....etc"

Returned from parsecsv-for-php

[titles] => Array
(
[0] => Name,Given Name,Additional Name,Family Name,Yomi Name,Given Name Yomi,Additional Name Yomi,Family Name Yomi,Name Prefix,Name Suffix,Initials,Nickname,Short Name,Maiden Name,Birthday,Gender,Location,Billing Information,Directory..... etc
)

Any ideas on why this is happening ?

Single column, numerical CSV

Parsing issue when the CSV is numerical and has only one column.
For example:

86545235689
34365587654
13469874576

Somehow, it splits each row on the digit '6'.

I believe this was caused by having $enclosure = '"';
and by requiring all CSVs to be enclosed...

function parse_string() and unicode files

I don't have any data at hand to test it, but I spent some time playing around with the parse_string() function for my chunk reader, and I think it might fail if the CSV file uses Unicode line terminators.

More problems could arise from iterating over $data[] as single bytes without taking multibyte characters into account.

Any thoughts?
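For the line-terminator part, PCRE can help: with the `u` modifier, `\R` matches `\r\n`, `\n`, `\r` and the Unicode terminators U+0085, U+2028 and U+2029 (assuming PHP's PCRE uses its default Unicode newline setting). A hypothetical splitter built on it is sketched below; it splits on every terminator, so quoted fields containing line breaks would still need separate handling in a real parser.

```php
<?php
// Hypothetical helper: split UTF-8 text on any Unicode line terminator.
// preg_split() returns false for invalid UTF-8 when /u is used.
function split_unicode_lines(string $data): array
{
    $lines = preg_split('/\R/u', $data);
    if ($lines === false) {
        throw new RuntimeException('Input is not valid UTF-8');
    }
    return $lines;
}

// U+2028 (LINE SEPARATOR) and a CRLF are both treated as line breaks.
$lines = split_unicode_lines("a\u{2028}b\r\nc");
```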

Heading and offset

Hello,

I'm using parseCSV for big files (169 MB and up to 97,000 records), and I must use the offset/limit feature to parse them step by step.
If my CSV has a heading, with an offset of 0 I get an array keyed by the heading names, but if the offset is greater than 0, I lose the heading names as keys.
I suggest modifying the code in the parse_string function like this:

if ( $this->heading && empty($head) ) {
    $head = $row;
} elseif ( $this->_validate_offset($row_count) && $this->_validate_row_conditions($row, $this->conditions) ) {
    if ( empty($this->fields) || (!empty($this->fields) && (($this->heading && $row_count > 0) || !$this->heading)) ) {
        if ( !empty($this->sort_by) && !empty($row[$this->sort_by]) ) {
            if ( isset($rows[$row[$this->sort_by]]) ) {
                $rows[$row[$this->sort_by].'_0'] = &$rows[$row[$this->sort_by]];
                unset($rows[$row[$this->sort_by]]);
                for ( $sn=1; isset($rows[$row[$this->sort_by].'_'.$sn]); $sn++ ) {}
                $rows[$row[$this->sort_by].'_'.$sn] = $row;
            } else $rows[$row[$this->sort_by]] = $row;
        } else $rows[] = $row;
    }
}

With this modification, I always get the correct heading names as array keys.
