Git Product home page Git Product logo

Comments (19)

ankitpokhrel avatar ankitpokhrel commented on May 18, 2024

The problem is when there is multiline text in a cell. As you are exploding shared string file with PHP_EOL in SharedStringHelper.php#243 it messes up the indexing when there is new line in the cell text itself.

We solved it temporarily by replacing PHP_EOL in text with unique character and then converting it back while retrieving the text.

Replace new line at SharedStringHelper.php#122

 $unescapedTextValue = str_replace(PHP_EOL, '
', $unescapedTextValue);

and convert it back on retrieval. SharedStringHelper.php#249

$sharedString = str_replace('
', PHP_EOL, $this->inMemoryTempFileContents[$indexInFile]);

But this doesn't seems to be the good solution. May be you should consider calculating and adding numeric index in the shared string file and make calculations based on those index.

from spout.

adrilo avatar adrilo commented on May 18, 2024

Thanks for reporting this issue @ankitpokhrel. And thanks for investigating!

Reading the temporary file containing the shared strings should remain a pretty fast task. Currently, as you pointed it out, it is assumed that the data is on one line. This makes the data retrieval really efficient because you know that the shared string with ID 85 will be on the 85th line.
Allowing strings to be on multiple lines will prevent us from making this assumption and lead to a more complex code. For instance, you could use grep to get the data that starts with "85--" (I tried using grep to parse the shared strings file - and avoid having to load the temp file in memory - but ended up not using because it's slow...). Another way to achieve this would be to load everything in memory as I do now, but instead of doing an explode, I could do something smarter like this:

Adding string index and special delimiters "||" around it at SharedStringsHelper.php#L207

fwrite($this->tempFilePointer, "||$sharedStringIndex||$sharedString" . PHP_EOL);

Then, when loading the temp file in memory, instead of using explode -- SharedStringsHelper.php#L243 -- I could have something like this:

$fileHandle = fopen($tempFilePath, 'r');
if ($fileHandle) {
    $currentSharedStringIndex = 0;
    while (($line = fgets($fileHandle)) !== false) {
        if (preg_match('/\|\|(\d+)\|\|(.*)/', $line, $matches)) {
            $currentSharedStringIndex = $matches[1];
            $currentSharedString = $matches[2];
            $this->inMemoryTempFileContents[$currentSharedStringIndex] = $currentSharedString;
        } else {
            $this->inMemoryTempFileContents[$currentSharedStringIndex] .= PHP_EOL . $line;
        }
    }

    fclose($fileHandle);
}

And retrieve it that way:

if (array_key_exists($sharedStringIndex, $this->inMemoryTempFileContents)) {
    $sharedString = $this->inMemoryTempFileContents[$sharedStringIndex];
}

This would still be pretty similar to your solution, as you need to determine a special delimiter that can be ignored on retrieval.

I can't think of another better solution that would not affect the complexity and performance of the code. So I'm going to implement your initial suggestion. If you have other suggestions, please let me know!

from spout.

ankitpokhrel avatar ankitpokhrel commented on May 18, 2024

Glad to know that you find the solution useful @adrilo . Another solution that I can think of is to use special character + '\n' as the line separator for the plugin and explode file content based on this. Not sure how feasible it will be but that should work too.

from spout.

adrilo avatar adrilo commented on May 18, 2024

That would work but I like it less that the first solution you suggested. What I did is encode the \n character the way Excel does it. So the data remains valid and unchanged at any step (which is not true if an index or any extra character gets added).

I'll go ahead and merge my pull request.

from spout.

Alt0car avatar Alt0car commented on May 18, 2024

The problem is always here ;
I'm trying to parse an xlsx and i have this error :

Fatal error: Uncaught exception 'Box\Spout\Reader\Exception\SharedStringNotFoundException' with message 'Shared string not found for index: 0' in /var/www/vhosts/_/httpdocs/vendor/box/spout/src/Spout/Reader/Helper/XLSX/SharedStringsHelper.php:261
Stack trace:
#0 /var/www/vhosts/
/httpdocs/vendor/box/spout/src/Spout/Reader/XLSX.php(219): Box\Spout\Reader\Helper\XLSX\SharedStringsHelper->getStringAtIndex(0)
#1 /var/www/vhosts/**
/httpdocs/vendor/box/spout/src/Spout/Reader/AbstractReader.php(131): Box\Spout\Reader\XLSX->read()
#2 /var/www/vhosts/_/httpdocs/vendor/temp.php(56): Box\Spout\Reader\AbstractReader->hasNextRow()
#3 {main}
thrown in /var/www/vhosts/
*/httpdocs/vendor/box/spout/src/Spout/Reader/Helper/XLSX/SharedStringsHelper.php on line 261

from spout.

adrilo avatar adrilo commented on May 18, 2024

@Alt0car which version of Spout are you using? The bug was fixed in 1.0.2.

from spout.

Alt0car avatar Alt0car commented on May 18, 2024

I'm using the 1.0.2 bro ^^. So i think the bug are not really fixed...

from spout.

adrilo avatar adrilo commented on May 18, 2024

Interesting... @Alt0car If your XLSX file does not contain any sensitive data, would you mind sending it to me to help me debugging this issue? You can send it at adrien_at_box.com

from spout.

 avatar commented on May 18, 2024

Same Problem here...fixing this would bring a huge advantage.
I just try to convert a .XLS-file with 150000 lines into a .CSV-file

    $reader = ReaderFactory::create(Type::XLSX);
    $reader->open('file.xlsx');

    $writer = WriterFactory::create(Type::CSV);
    $writer->openToFile('file.csv');

    while ($reader->hasNextSheet()) {
        $reader->nextSheet();

        while ($reader->hasNextRow()) {
            $row = $reader->nextRow();
            $writer->addRow($row);
        }
        $writer->close();
    }
    $reader->close();

and always get this error:
[2015-04-15 11:55:08] local.ERROR: exception 'Box\Spout\Reader\Exception\SharedStringNotFoundException' with message 'Shared string not found for index: 9' in ...\vendor\box\spout\src\Spout\Reader\Helper\XLSX\SharedStringsHelper.php:261
Stack trace:
#0 ...\vendor\box\spout\src\Spout\Reader\XLSX.php(219): Box\Spout\Reader\Helper\XLSX\SharedStringsHelper->getStringAtIndex(9)
#1 ...\vendor\box\spout\src\Spout\Reader\AbstractReader.php(131): Box\Spout\Reader\XLSX->read()
#2 ---\Box\Spout\Reader\AbstractReader->hasNextRow()

from spout.

ankitpokhrel avatar ankitpokhrel commented on May 18, 2024

@Alt0car and @hkretschmer Are you guys working with multiple sheet or single sheet?

from spout.

 avatar commented on May 18, 2024

@ankitpokhrel I'm using single sheet files

from spout.

ankitpokhrel avatar ankitpokhrel commented on May 18, 2024

@hkretschmer Strange...v1.0.3 is working fine for all of my sheets. Is it possible to send your excel file at info [at] ankitpokhrel [dot] com [dot] np ?

from spout.

 avatar commented on May 18, 2024

@ankitpokhrel Sorry but those files contain sensible data...but I've found out, that there must be a context with empty cells...very strange. If I manually put some dummy data in those empty cells, save the sheet, delete the dummies and save the sheet again, it's working...has anybody an idea?

from spout.

adrilo avatar adrilo commented on May 18, 2024

@hkretschmer how did you create your XLSX? Through Excel or programmatically?
Also, can you confirm that I understand the problem correctly?

This does not work:

A B C
value A1 value C1
value A2 value B2 value C2

But this does:

A B C
value A1 dummy value value C1
value A2 value B2 value C2

Is that correct? Can you reproduce the issue with this setup?

from spout.

 avatar commented on May 18, 2024

@adrilo The XLSX is exportet from another program (not manually created). The setup, that you have shown above describes the problem correctly. But usually I have no problems with empty cells.

from spout.

adrilo avatar adrilo commented on May 18, 2024

@hkretschmer so this error with empty cells only happens sometimes? Will there be a way for you to create a XLSX file without sensitive data where the issue occurs? I guess the XLSX file generated by your program slightly differs from what Spout expects, hence the error. Being able to reproduce the issue would be super useful!

from spout.

ankitpokhrel avatar ankitpokhrel commented on May 18, 2024

@adrilo I think the problem might be because of self closing tags which represent NO data instead of EMPTY data but I am not quite sure.

<si xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"><t/></si>

I received a sample file with issue from @hkretschmer sometime ago which I will forward you shortly.

from spout.

adrilo avatar adrilo commented on May 18, 2024

Thanks!

from spout.

adrilo avatar adrilo commented on May 18, 2024

@ankitpokhrel @hkretschmer Thank you for taking the time to reproduce the issue. I found the cause of it and have a fix for it: #21.
You can now pull the latest version (v1.0.4) and your issue should be fixed!

from spout.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.