Comments (19)
The problem is when there is multiline text in a cell. As you are exploding shared string file with PHP_EOL
in SharedStringHelper.php#243 it messes up the indexing when there is new line in the cell text itself.
We solved it temporarily by replacing PHP_EOL
in text with unique character and then converting it back while retrieving the text.
Replace new line at SharedStringHelper.php#122
$unescapedTextValue = str_replace(PHP_EOL, ' ', $unescapedTextValue);
and convert it back on retrieval. SharedStringHelper.php#249
$sharedString = str_replace(' ', PHP_EOL, $this->inMemoryTempFileContents[$indexInFile]);
But this doesn't seems to be the good solution. May be you should consider calculating and adding numeric index in the shared string file and make calculations based on those index.
from spout.
Thanks for reporting this issue @ankitpokhrel. And thanks for investigating!
Reading the temporary file containing the shared strings should remain a pretty fast task. Currently, as you pointed it out, it is assumed that the data is on one line. This makes the data retrieval really efficient because you know that the shared string with ID 85 will be on the 85th line.
Allowing strings to be on multiple lines will prevent us from making this assumption and lead to a more complex code. For instance, you could use grep
to get the data that starts with "85--" (I tried using grep
to parse the shared strings file - and avoid having to load the temp file in memory - but ended up not using because it's slow...). Another way to achieve this would be to load everything in memory as I do now, but instead of doing an explode, I could do something smarter like this:
Adding string index and special delimiters "||" around it at SharedStringsHelper.php#L207
fwrite($this->tempFilePointer, "||$sharedStringIndex||$sharedString" . PHP_EOL);
Then, when loading the temp file in memory, instead of using explode -- SharedStringsHelper.php#L243 -- I could have something like this:
$fileHandle = fopen($tempFilePath, 'r');
if ($fileHandle) {
$currentSharedStringIndex = 0;
while (($line = fgets($fileHandle)) !== false) {
if (preg_match('/\|\|(\d+)\|\|(.*)/', $line, $matches)) {
$currentSharedStringIndex = $matches[1];
$currentSharedString = $matches[2];
$this->inMemoryTempFileContents[$currentSharedStringIndex] = $currentSharedString;
} else {
$this->inMemoryTempFileContents[$currentSharedStringIndex] .= PHP_EOL . $line;
}
}
fclose($fileHandle);
}
And retrieve it that way:
if (array_key_exists($sharedStringIndex, $this->inMemoryTempFileContents)) {
$sharedString = $this->inMemoryTempFileContents[$sharedStringIndex];
}
This would still be pretty similar to your solution, as you need to determine a special delimiter that can be ignored on retrieval.
I can't think of another better solution that would not affect the complexity and performance of the code. So I'm going to implement your initial suggestion. If you have other suggestions, please let me know!
from spout.
Glad to know that you find the solution useful @adrilo . Another solution that I can think of is to use special character + '\n'
as the line separator for the plugin and explode file content based on this. Not sure how feasible it will be but that should work too.
from spout.
That would work but I like it less that the first solution you suggested. What I did is encode the \n character the way Excel does it. So the data remains valid and unchanged at any step (which is not true if an index or any extra character gets added).
I'll go ahead and merge my pull request.
from spout.
The problem is always here ;
I'm trying to parse an xlsx and i have this error :
Fatal error: Uncaught exception 'Box\Spout\Reader\Exception\SharedStringNotFoundException' with message 'Shared string not found for index: 0' in /var/www/vhosts/_/httpdocs/vendor/box/spout/src/Spout/Reader/Helper/XLSX/SharedStringsHelper.php:261
Stack trace:
#0 /var/www/vhosts//httpdocs/vendor/box/spout/src/Spout/Reader/XLSX.php(219): Box\Spout\Reader\Helper\XLSX\SharedStringsHelper->getStringAtIndex(0)
#1 /var/www/vhosts/**/httpdocs/vendor/box/spout/src/Spout/Reader/AbstractReader.php(131): Box\Spout\Reader\XLSX->read()
#2 /var/www/vhosts/_/httpdocs/vendor/temp.php(56): Box\Spout\Reader\AbstractReader->hasNextRow()
#3 {main}
thrown in /var/www/vhosts/*/httpdocs/vendor/box/spout/src/Spout/Reader/Helper/XLSX/SharedStringsHelper.php on line 261
from spout.
@Alt0car which version of Spout are you using? The bug was fixed in 1.0.2.
from spout.
I'm using the 1.0.2 bro ^^. So i think the bug are not really fixed...
from spout.
Interesting... @Alt0car If your XLSX file does not contain any sensitive data, would you mind sending it to me to help me debugging this issue? You can send it at adrien_at_box.com
from spout.
Same Problem here...fixing this would bring a huge advantage.
I just try to convert a .XLS-file with 150000 lines into a .CSV-file
$reader = ReaderFactory::create(Type::XLSX);
$reader->open('file.xlsx');
$writer = WriterFactory::create(Type::CSV);
$writer->openToFile('file.csv');
while ($reader->hasNextSheet()) {
$reader->nextSheet();
while ($reader->hasNextRow()) {
$row = $reader->nextRow();
$writer->addRow($row);
}
$writer->close();
}
$reader->close();
and always get this error:
[2015-04-15 11:55:08] local.ERROR: exception 'Box\Spout\Reader\Exception\SharedStringNotFoundException' with message 'Shared string not found for index: 9' in ...\vendor\box\spout\src\Spout\Reader\Helper\XLSX\SharedStringsHelper.php:261
Stack trace:
#0 ...\vendor\box\spout\src\Spout\Reader\XLSX.php(219): Box\Spout\Reader\Helper\XLSX\SharedStringsHelper->getStringAtIndex(9)
#1 ...\vendor\box\spout\src\Spout\Reader\AbstractReader.php(131): Box\Spout\Reader\XLSX->read()
#2 ---\Box\Spout\Reader\AbstractReader->hasNextRow()
from spout.
@Alt0car and @hkretschmer Are you guys working with multiple sheet or single sheet?
from spout.
@ankitpokhrel I'm using single sheet files
from spout.
@hkretschmer Strange...v1.0.3 is working fine for all of my sheets. Is it possible to send your excel file at info [at] ankitpokhrel [dot] com [dot] np ?
from spout.
@ankitpokhrel Sorry but those files contain sensible data...but I've found out, that there must be a context with empty cells...very strange. If I manually put some dummy data in those empty cells, save the sheet, delete the dummies and save the sheet again, it's working...has anybody an idea?
from spout.
@hkretschmer how did you create your XLSX? Through Excel or programmatically?
Also, can you confirm that I understand the problem correctly?
This does not work:
A | B | C |
---|---|---|
value A1 | value C1 | |
value A2 | value B2 | value C2 |
But this does:
A | B | C |
---|---|---|
value A1 | dummy value | value C1 |
value A2 | value B2 | value C2 |
Is that correct? Can you reproduce the issue with this setup?
from spout.
@adrilo The XLSX is exportet from another program (not manually created). The setup, that you have shown above describes the problem correctly. But usually I have no problems with empty cells.
from spout.
@hkretschmer so this error with empty cells only happens sometimes? Will there be a way for you to create a XLSX file without sensitive data where the issue occurs? I guess the XLSX file generated by your program slightly differs from what Spout expects, hence the error. Being able to reproduce the issue would be super useful!
from spout.
@adrilo I think the problem might be because of self closing tags which represent NO data instead of EMPTY data but I am not quite sure.
<si xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"><t/></si>
I received a sample file with issue from @hkretschmer sometime ago which I will forward you shortly.
from spout.
Thanks!
from spout.
@ankitpokhrel @hkretschmer Thank you for taking the time to reproduce the issue. I found the cause of it and have a fix for it: #21.
You can now pull the latest version (v1.0.4) and your issue should be fixed!
from spout.
Related Issues (20)
- Thanks to package authors! How does it work? HOT 1
- Style not present when editing existing spreadsheet
- It takes around 5 secs to read 1000 rows (25 cols) HOT 1
- This page isn’t working when calling $reader = ReaderEntityFactory:: HOT 1
- Previous issue #861 prematurely closed HOT 4
- Support Excel cell Comments HOT 2
- Unable to open source data
- Trying to add vertical alignment and fix textWrap for XLSXWriter HOT 1
- Add option to change the encoding of the whole exported csv file (not only the content) HOT 1
- PHP 8.1 Error HOT 4
- Incorrect type reported when adding an unsupported type to a cell
- Percentage type reading supported? HOT 1
- Content-Disposition header for file download is not properly encoded HOT 1
- Format date on export XLSX
- Is it possible to use wb instead of wb+ fopen mode in openToFile() in AbstractWriter in order to use compress.zlib:// ? HOT 5
- PHP 8.1 Warnings HOT 2
- Doing cell->getValue() on loop stops when the value is a DateTime Object
- rowIterator->next() crash HOT 5
- rowIterator->next()
- Count total number of rows for XLSX HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from spout.