Git Product home page Git Product logo

creek's People

Contributors

aralox avatar bgentry avatar bschmeck avatar davich avatar dbernheisel avatar demimismo avatar dmitriyfirsov avatar dpsk avatar flyingmachine avatar harrykiselev avatar jarredholman avatar juniljacob avatar kamilhism avatar maciejmajewski avatar maland avatar mateusmedeiros avatar mdemare avatar melcha avatar mindreframer avatar pavitkaur05 avatar pdenya avatar petergoldstein avatar ppostma avatar pythonicrubyist avatar radar avatar rhammam1 avatar thomassevestre avatar weilandia avatar westonganger avatar y-f-u avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

creek's Issues

Can't read an xlsx file created programmatically with Axlsx

Any files that are created with the Axlsx gem can not be read. The data in the first worksheet appears to be fictional.

I have a spreadsheet filled with data, and a header in the first row going out to column JN2. However, creek reads this row as:

{"A2"=>nil, "B2"=>nil, "C2"=>nil, "D2"=>nil, "E2"=>11000.0, "F2"=>1.0, "G2"=>0.0, "H2"=>490.0, "I2"=>nil, "J2"=>nil, "K2"=>nil, "L2"=>nil, "M2"=>nil, "N2"=>1.0, "O2"=>4.0, "P2"=>nil, "Q2"=>nil, "R2"=>nil, "S2"=>nil, "T2"=>nil, "U2"=>nil, "V2"=>nil, "W2"=>4.0, "X2"=>nil, "Y2"=>nil, "Z2"=>8.0, "AA2"=>8.0, "AB2"=>8.0, "AC2"=>1.0, "AD2"=>nil, "AE2"=>nil, "AF2"=>33597.0, "AG2"=>nil, "AH2"=>588.0, "AI2"=>nil, "AJ2"=>32257.0, "AK2"=>nil, "AL2"=>nil, "AM2"=>nil, "AN2"=>nil, "AO2"=>nil, "AP2"=>0.0, "AQ2"=>nil, "AR2"=>nil, "AS2"=>nil, "AT2"=>nil, "AU2"=>nil, "AV2"=>nil, "AW2"=>nil, "AX2"=>nil, "AY2"=>nil, "AZ2"=>nil, "BA2"=>nil, "BB2"=>nil, "BC2"=>nil, "BD2"=>nil, "BE2"=>nil, "BF2"=>nil, "BG2"=>nil, "BH2"=>nil, "BI2"=>nil, "BJ2"=>nil, "BK2"=>nil, "BL2"=>nil, "BM2"=>nil, "BN2"=>nil, "BO2"=>nil, "BP2"=>nil, "BQ2"=>nil, "BR2"=>nil, "BS2"=>nil, "BT2"=>nil, "BU2"=>nil, "BV2"=>nil, "BW2"=>nil, "BX2"=>nil, "BY2"=>nil, "BZ2"=>nil, "CA2"=>nil, "CB2"=>nil, "CC2"=>nil, "CD2"=>nil, "CE2"=>nil, "CF2"=>nil, "CG2"=>nil, "CH2"=>nil, "CI2"=>nil, "CJ2"=>nil, "CK2"=>nil, "CL2"=>nil, "CM2"=>nil, "CN2"=>nil, "CO2"=>nil, "CP2"=>nil, "CQ2"=>nil, "CR2"=>nil, "CS2"=>nil, "CT2"=>nil, "CU2"=>nil, "CV2"=>nil, "CW2"=>nil, "CX2"=>nil, "CY2"=>nil, "CZ2"=>nil, "DA2"=>nil, "DB2"=>nil, "DC2"=>nil, "DD2"=>nil, "DE2"=>nil, "DF2"=>nil, "DG2"=>nil, "DH2"=>nil, "DI2"=>nil, "DJ2"=>nil, "DK2"=>nil, "DL2"=>nil, "DM2"=>nil, "DN2"=>nil, "DO2"=>nil, "DP2"=>nil, "DQ2"=>nil, "DR2"=>nil, "DS2"=>nil, "DT2"=>nil, "DU2"=>nil, "DV2"=>nil, "DW2"=>nil, "DX2"=>nil, "DY2"=>nil, "DZ2"=>nil, "EA2"=>nil, "EB2"=>nil, "EC2"=>nil, "ED2"=>nil, "EE2"=>nil, "EF2"=>nil, "EG2"=>nil, "EH2"=>nil, "EI2"=>nil, "EJ2"=>nil, "EK2"=>nil, "EL2"=>nil, "EM2"=>nil, "EN2"=>nil, "EO2"=>nil, "EP2"=>nil, "EQ2"=>nil, "ER2"=>nil, "ES2"=>nil, "ET2"=>nil, "EU2"=>nil, "EV2"=>nil, "EW2"=>nil, "EX2"=>nil, "EY2"=>nil, "EZ2"=>nil, "FA2"=>nil, "FB2"=>nil, "FC2"=>nil, "FD2"=>nil, "FE2"=>nil, "FF2"=>nil, "FG2"=>nil, "FH2"=>nil, "FI2"=>nil, "FJ2"=>nil, "FK2"=>nil, "FL2"=>nil, "FM2"=>nil, "FN2"=>nil, "FO2"=>nil, "FP2"=>nil, "FQ2"=>nil, "FR2"=>nil, "FS2"=>nil, "FT2"=>nil, "FU2"=>nil, "FV2"=>nil, "FW2"=>nil, "FX2"=>nil, "FY2"=>nil, "FZ2"=>nil, "GA2"=>nil, "GB2"=>nil, "GC2"=>nil, "GD2"=>nil, "GE2"=>nil, "GF2"=>nil, "GG2"=>nil, "GH2"=>nil, "GI2"=>nil, "GJ2"=>nil, "GK2"=>nil, "GL2"=>nil, "GM2"=>nil, "GN2"=>nil, "GO2"=>nil, "GP2"=>nil, "GQ2"=>nil, "GR2"=>nil, "GS2"=>nil, "GT2"=>nil, "GU2"=>nil, "GV2"=>nil, "GW2"=>nil, "GX2"=>nil, "GY2"=>nil, "GZ2"=>nil, "HA2"=>nil, "HB2"=>nil, "HC2"=>nil, "HD2"=>nil, "HE2"=>nil, "HF2"=>nil, "HG2"=>nil, "HH2"=>nil, "HI2"=>nil, "HJ2"=>nil, "HK2"=>nil, "HL2"=>nil, "HM2"=>nil, "HN2"=>nil, "HO2"=>nil, "HP2"=>nil, "HQ2"=>nil, "HR2"=>nil, "HS2"=>nil, "HT2"=>nil, "HU2"=>nil, "HV2"=>nil, "HW2"=>nil, "HX2"=>nil, "HY2"=>nil, "HZ2"=>nil, "IA2"=>nil, "IB2"=>nil, "IC2"=>nil, "ID2"=>nil, "IE2"=>nil, "IF2"=>nil, "IG2"=>nil, "IH2"=>nil, "II2"=>nil, "IJ2"=>nil, "IK2"=>nil, "IL2"=>nil, "IM2"=>nil, "IN2"=>nil, "IO2"=>nil, "IP2"=>nil, "IQ2"=>nil, "IR2"=>nil, "IS2"=>nil, "IT2"=>nil, "IU2"=>nil, "IV2"=>nil, "IW2"=>nil, "IX2"=>nil, "IY2"=>nil, "IZ2"=>nil, "JA2"=>nil, "JB2"=>nil, "JC2"=>nil, "JD2"=>nil, "JE2"=>nil, "JF2"=>nil, "JG2"=>nil, "JH2"=>nil, "JI2"=>nil, "JJ2"=>nil, "JK2"=>nil, "JL2"=>nil, "JM2"=>nil, "JN2"=>nil}

It's interesting that it does capture the correct number of columns but none of the nil or numeric values above have any basis in reality.

Viewing the file, in Excel, it looks like a normal file and functions normally.

If I first view the file in Excel and then save it, creek reads it successfully.

I can't figure out why creek can read the file after it has been manually opened and saved by Excel, but not before. After all, Excel is able to read it. Obviously there is some kind of state change in the file after Excel reads it, I just can't imagine what it could be.

Why does the next image repeat the last one? Not one in one cell?

New question is cell "H2"=>[#Pathname:/var/folders/8d/csyp9yv9641gfp1t_pf02ybr0000gn/T/creek__drawing20171011-19805-63gbw0/image1.jpeg] It has one picture;
In cell "H3"=>[#Pathname:/var/folders/8d/csyp9yv9641gfp1t_pf02ybr0000gn/T/creek__drawing20171011-19805-63gbw0/image1.jpeg, #Pathname:/var/folders/8d/csyp9yv9641gfp1t_pf02ybr0000gn/T/creek__drawing20171011-19805-63gbw0/image2.jpeg] two pictures,
Why repeat cell "H2" picture? Why not one picture in a cell?

Replace HTTParty with http.rb?

Would you be opposed to replacing this gem's use of HTTParty with another HTTP lib, like http.rb? It appears it is only used in one place.

HTTParty's post-install message is a nuisance that I always try to keep out of my dependency tree:

When you HTTParty, you must party hard!

As a bonus, http.rb should be faster too. I'd be happy to send a PR.

How to close the file after parsing through the xlsx file

Hi,

I am using creek gem in my project, as bellow shown code. After parsing through the xlsx file, how do i close the file ( in below code how do i close the parseFile). Do we have something like "parseFile.close"
require 'creek'
parseFile= Creek::Book.new 'specs/fixtures/sample.xlsx'
sheet= parseFile.sheets[0]

With Regards,
Naresh

xls support

It would be nice to use this for xls files as well.
Cheers

`<module:Constants>': uninitialized constant Creek::Styles::Constants::Date (NameError)

Require the lib raise a uninitialized constant exception with ruby 2.2.0:

/home/reaper/.rvm/gems/ruby-2.2.0/gems/creek-1.0.6/lib/creek/styles/constants.rb:40:in `<module:Constants>': uninitialized constant Creek::Styles::Constants::Date (NameError)
        from /home/reaper/.rvm/gems/ruby-2.2.0/gems/creek-1.0.6/lib/creek/styles/constants.rb:3:in `<class:Styles>'
        from /home/reaper/.rvm/gems/ruby-2.2.0/gems/creek-1.0.6/lib/creek/styles/constants.rb:2:in `<module:Creek>'
        from /home/reaper/.rvm/gems/ruby-2.2.0/gems/creek-1.0.6/lib/creek/styles/constants.rb:1:in `<top (required)>'
        from /home/reaper/.rvm/rubies/ruby-2.2.0/lib/ruby/2.2.0/rubygems/core_ext/kernel_require.rb:54:in `require'
        from /home/reaper/.rvm/rubies/ruby-2.2.0/lib/ruby/2.2.0/rubygems/core_ext/kernel_require.rb:54:in `require'
        from /home/reaper/.rvm/gems/ruby-2.2.0/gems/creek-1.0.6/lib/creek.rb:3:in `<top (required)>'
        from /home/reaper/.rvm/rubies/ruby-2.2.0/lib/ruby/2.2.0/rubygems/core_ext/kernel_require.rb:128:in `require'
        from /home/reaper/.rvm/rubies/ruby-2.2.0/lib/ruby/2.2.0/rubygems/core_ext/kernel_require.rb:128:in `rescue in require'
        from /home/reaper/.rvm/rubies/ruby-2.2.0/lib/ruby/2.2.0/rubygems/core_ext/kernel_require.rb:39:in `require'
        from ./import_excel.rb:3:in `<main>'

Make typecasting optional

This is an excellent and fast gem, but I was surprised to see that it does casting to various Ruby types. Perhaps we can make that optional?

Array sorting exception by certain type of xlsx

Hi,
i have recently stumbled upon your handy library. Unfortunately it was not working out of box for my xlsx files. Sorting exception was thrown.

ArgumentError:
       comparison of NilClass with String failed
     # ./lib/creek/sheet.rb:101:in `sort'

It looks that my files are somehow "malformed" and the Nokogiri XML reader parses our weird node with no content which is stored as nil => "" pair in cells hash. The nil key value blows the array sorting. And should be not there in general. The problem disappears after i re-save the file in LibreOffice.

There is a simple fix for my scenario together with example spec and malformed fixture file in my fork:
martinsabo@5a6a338

The weird node is parsed out from this part:

<?xml version="1.0" encoding="UTF-8"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"><dimension ref="A1"/><sheetViews><sheetView workbookViewId="0" tabSelected="true"/></sheetViews><sheetFormatPr defaultRowHeight="15.0" baseColWidth="40"/><cols><col min="1" max="1" width="60.0" customWidth="true"/><col min="2" max="2" width="20.0" customWidth="true"/></cols>

If somebody had similar problem please post a comment.

Thank you.

More detailed examples in documentation

I was wondering if it may be possible to add examples on how to access whole rows and row elements. All I've seen is row inspect which returns both, or, maybe explain how using row.inspect we could access some specific row elements.

Replace removed example xlsx file in README

Following the example in the README

This doesn't seem to work
Tested on MacOS, ruby version: ruby 2.6.2p47
creek version: creek (2.4.4). It seems the example xlsx file doesn't exist. This should probably be updated to pull from a more reliable source. Why not just included an example xlsx file in this repo and use that?

require 'creek'
remote_url = 'http://dev-builds.libreoffice.org/tmp/test.xlsx'
Creek::Book.new remote_url, remote: true

Raises

...gems/ruby-2.4.1/gems/rubyzip-1.2.3/lib/zip/central_directory.rb:143:in `get_e_o_c_d': Zip end of central directory signature not found (Zip::Error)

Handle weird character escape sequence

The OOXML spec allows non-printable characters to be encoded using an escape sequence _xHHHH_ where HHHH is the hex code of the character.

I have received a few xlsx files that contain _x000D_ sequences (the carriage return character).

If you load the file in creek (or open the xml file in a text editor), the cell contents will be
"first line_x000D_\nsecond line"
Where _x000D_ is the carriage return character, and the line feed character is encoded normally.

When you open it in excel you just see a newline.

see:

SharedStrings are entirely loaded in memory

I'm trying to parse the first couple of rows of a large XLSX, and it seems that the entire workbook's SharedStrings is loaded when calling Creek::Book.new(file), which somewhat defeats the purpose of streaming the rows efficiently.

I tested the memory performance when loading a 12MB XLSX.

require 'benchmark/memory'

describe 'Creek opens files efficiently' do
  it 'foo' do
    Benchmark.memory do |x|
      x.report("opening") { Creek::Book.new 'spec/fixtures/large.xlsx' }
    end
  end
end

Here are the results:

Calculating -------------------------------------
             opening    27.576M memsize (   696.000  retained)
                       257.683k objects (    11.000  retained)
                        50.000  strings (     9.000  retained)

When I comment out loading SharedStrings, then I don't see that memory bloat:

# lib/creek/shared_strings.rb
# lines 17-22

+      @dictionary = Hash.new
+      # if @book.files.file.exist?(path)
+      #   doc = @book.files.file.open path
+      #   xml = Nokogiri::XML::Document.parse doc
+      #   parse_shared_string_from_document(xml)
+      # end
-      if @book.files.file.exist?(path)
-        doc = @book.files.file.open path
-        xml = Nokogiri::XML::Document.parse doc
-        parse_shared_string_from_document(xml)
-      end
Calculating -------------------------------------
             opening    88.282k memsize (     0.000  retained)
                       319.000  objects (     0.000  retained)
                        35.000  strings (     0.000  retained)

Is there a way to get shared strings lazily?

Is it possible to disable the sheet type conversion?

If I upload an excel workbook where a column is labeled as a "Number" type, Creek automatically calls .to_f on the value. I want to only get the string value of the original cell.

Is this possible? Kind of deal break since some users upload workbooks that have the Number/Decimal type set on a column, and then I get weird values like "5.109182736E9" when I want to interpret it as a literal string (the original being "5109182736")

Thanks!

In the Creek::Styles::Converter class it is said that type can also be determined by a style, however there is no way from the Creek::Sheet Class to assign a style to the type

Hi,
In the Creek::Styles::Converter class it is said that type can also be determined by a style, however beyond those established in constant DATE_TYPES, there is no way from the Creek::Sheet class to assign a style to the type.
In my case, I think because an older version of nokogiri gem, cells with Large Numbers based strings (and format as text) are retrieved as nodes with type 'n' (as numbers) and not 's' (strings), so they are converted into float. However the book styles retrieved from Creek::Styles::StyleTypes, determine those cells as :unsupported and if there was a way to explicitly tell the converter to use the style and not the type, the converter's own functionality would make it work correctly.

Kind Regards,

Add rubocop

Clean up the whole app by getting rubocop runs to pass, then add it to the Rakefile

I am willing to implement it.

Remove http dependency?

Hello!

I would like to propose removing the http gem dependency entirely. From what I see, it's only used to download remote XLSX files, which, in my opinion, should be handled outside the gem by the devs (by their favourite http gem with 10 monkeypatches for their own super-secret security policy :) ).

Benefits that I see:

  • does not require to download a lib which has tons of alternatives
  • does not require users to update the lib to match Creek dependency
  • prevents clashes with other gems depending on HTTP

Personally, it would allow me to update Creek without regenerating VCRs and solving problems that come with it... 😇

Let me know what you think!

Creek rounding numbers

I have a cell with the value 2.16 that is formatted as as number with no decimal places. Creek reads 2.
image

How to parse date?

I am trying to import date from an excelsheet, its in the format of dd/mm/yyyy or dd-mm-yyyy.
The cell type is date. When parsing the sheet, it converts that to a number and eventually wrong date.
I was not able to find how to save correct date using the gem. Could you please help?

Character sequence "_xhtml_" is replaced with NUL byte

In #73, a change was made to recognize special hex code escape sequence of the format x, e.g. _x000D_ for carriage return characters.

However, the code to recognize these character squences is using the regular expression
HEX_ESCAPE_REGEXP = /_x[0-9A-Za-z]{4}_/
which finds more than only hex sequences.

This was causing a problem for me with a spreadsheet that somewhere contained the (unescaped) string "_xhtml_". creek replaced it with a NUL byte (\0), which was causing errors in my application.

An example for a sharedStrings.xml (created manually with LibreOffice) to reproduce the issue:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?> <sst xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main" count="9" uniqueCount="9"><si><t xml:space="preserve">Test Case Name</t></si><si><t xml:space="preserve">Description</t></si><si><t xml:space="preserve">Step Instructions</t></si><si><t xml:space="preserve">Expected Results</t></si><si><t xml:space="preserve">A</t></si><si><t xml:space="preserve">B&#10;C</t></si><si><t xml:space="preserve">A_xhtml_B</t></si><si><t xml:space="preserve">D</t></si><si><t xml:space="preserve">A_x005F_x000D_B</t></si></sst>

I entered both "_xhtml_" and "_x000D_" manually in LibreOffice. You can see that in case of the second value it escaped the first underscore, but the "_xhtml_" string was not escaped (because it is not a hex value).

I guess you fix the issue by matching only
/_x[0-9A-Fa-f]{4}_/ ?

Thanks in advance!

xml-namespaces in a book

I've met an xlsx file with xml-prefixes in a workbook and worksheets. Currently parsing such file returns zero sheets :(
I don't know what application saves xlsx files that way and unfortunately I can't share the file - it contains confidential information. I fixed an issue in this PR (#66).
Sorry for lots of changes, I just tried to reduce code duplication.

Error parsing date column in JRuby

I have an XLSX spreadsheet that parses correctly using creek on MRI. When I parse the same file using creek on JRuby, I get the following error:

[1] pry(main)> require 'creek'
Ignoring jruby-launcher-1.1.1-java because its extensions are not built.  Try: gem pristine jruby-launcher --version 1.1.1
Ignoring jruby-launcher-1.1.1-java because its extensions are not built.  Try: gem pristine jruby-launcher --version 1.1.1
=> true
[2] pry(main)> book = Creek::Book.new('./test.xlsx')
=> #<Creek::Book:0x34a75079 @files=./test.xlsx, @shared_strings=#<Creek::SharedStrings:0x107ed6fc @book=#<Creek::Book:0x34a75079 ...>>>
[3] pry(main)> book.sheets[0].rows.each { |row| }
RangeError: too big for int: 2165046705
from /home/duncan/.rbenv/versions/jruby-1.7.19/lib/ruby/1.9/date.rb:1354:in `+'

Versions:

  • creek (1.0.9)
  • nokogiri (~> 1.6.0)
  • rubyzip (>= 1.0.0)
  • jruby 1.7.19 (1.9.3p551) 2015-01-29 20786bd on Java HotSpot(TM) 64-Bit Server VM 1.8.0_45-b14 +jit [linux-amd64]

You can download the test XLSX file here.

excel_col_names bloats memory

While profiling some of our application code, I noticed that these lines:

creek/lib/creek/sheet.rb

Lines 28 to 30 in 4808520

(0...16384).each do |i|
@@excel_col_names[col_name(i)] = i
end

cause 16384 objects (about 1.5MB) to be permanently allocated, since the class variable can't be garbage collected.

I have some ideas for how to fix, if you are up for it:

  1. Make it a instance variable, so the whole object can be garbage collected when the Book goes out of scope. The code is already fast, and is already re-generating this hash every time a Book gets initialized, so it should not be noticeable.

  2. Make @ excel_col_names a sparse hash, so only the subset of column names that are needed for a particular spreadsheet are used, something like:

excel_col_names = Hash.new do |key,hash|
  hash[col_name(key)] = key  # I think this is not quite right, but you get the idea
end

...so then you could reference any of the 16384 column names lazily at any point in the code

I'm totally up for making these changes if you're game

Open xlsx with known Password for reading

I am looking for a way to open a password protected file in Rails. We know the password, so this is not cracking, just opening like normal but with the use of the password.
The client wants to upload a password protected file (details have potential private data in it).
We need to open the file and process the contents.
Normal ... just with the twist of using the password to open the file.

I searched you issues for 'password' and 'protected' and did not see anything. Also it does not appear to be in the docs.

Thanks!

Large numbers casted as BigDecimal

Not sure if it's intended but a original value of 783294732 becomes 7.83294732E8. I dug and it seems to be a conversion happening to large numbers, which are casted as BigDecimal.

Why then the resulting value is a stringified BigDecimal, and not a BigDecimal instance ?

Not reading data. At all.

I have an Excel file, created with Axlsx, that Excel opens just fine.
There is only one sheet.
When looping through the rows of that sheet, the values look like this:

Row 1: {}
Row 2: {"A2"=>nil, "B2"=>nil, "C2"=>nil, "D2"=>nil, "E2"=>nil, "F2"=>1.0, "G2"=>20.0, "H2"=>179.0, "I2"=>nil, "J2"=>nil, "K2"=>nil, "L2"=>nil, "M2"=>nil, "N2"=>1.0, "O2"=>55.0, "P2"=>nil, "Q2"=>nil, "R2"=>nil, "S2"=>nil, "T2"=>nil, "U2"=>nil, "V2"=>nil, "W2"=>55.0, "X2"=>nil, "Y2"=>nil, "Z2"=>13.0, "AA2"=>8.0, "AB2"=>13.0, "AC2"=>nil, "AD2"=>8668.0, "AE2"=>nil, "AF2"=>63692.0, "AG2"=>nil, "AH2"=>229.0, "AI2"=>nil, "AJ2"=>33990.0, "AK2"=>nil, "AL2"=>0.0, "AM2"=>nil, "AN2"=>nil, "AO2"=>nil, "AP2"=>0.0, "AQ2"=>0.0, "AR2"=>0.0, "AS2"=>nil, "AT2"=>nil, "AU2"=>0.0, "AV2"=>nil, "AW2"=>nil, "AX2"=>nil, "AY2"=>0.0, "AZ2"=>0.0, "BA2"=>0.0, "BB2"=>0.0, "BC2"=>nil, "BD2"=>nil, "BE2"=>nil}

The first row is completely blank! In reality it has headers going out to column BD.

If I cut and paste the contents into a brand new sheet, and save it, the file is read properly.

And yet, Excel can read either file.

Any idea what's going on here?

Blank Cells not appearing in one sheet in spreadsheet

I have no clue what is causing this, but I have narrowed the problem down to one sheet in this file:

http://static.crunchbase.com/exports/crunchbase_monthly_export.xlsx

When looking at all other sheets except for sheet 4 (Investments), nils are returned for blank cells.

I just spent several hours scratching my head trying to figure out a gem that would work, and Dullard is so far the fastest option. When I dug in, I noticed other issues referencing this and assumed it wasn't fixed.

In my testing, though, I'd only tested with the same sheet. In testing every other sheet, there is no problem.

Is anyone else seeing the same thing? Any advice?

I'll be able to get at the data I want from other sheets, but would love to help track this one down if I can. Let me know if there's any more info you need from me!

Read Cell Value

Hello,

can you simply provide an example for reading the value from a cell, like A1?

Thanks!
RK

read correct decoding

I have a xlsx in format windows-1252.

When I read cell.format I get utf-8 despite I put the "encoding: windows-1252" line at the start of the file.

Is There a way to change the format to read the file?

Sheets are returned in the wrong order

I have a spreadsheet with 10+ sheets, and I need to access the third one, but when I get the index 2 the sheet returned is the 8th one. The names also seem to be associated wrong, because I get the same results when trying a find by .name().

Creek::Book.new(path).sheets.find { |sheet| sheet.name == 'NAME' }  # returns 8th sheet
Creek::Book.new(path).sheets[2]  # returns 8th sheet

I can't provide the spreadsheet I'm using, but I think it's related to google docs.
I exported the spreadsheet to .xlsx using google sheets and it seems there's a very similar issue on roo that was related to google sheets exported spreadsheets: roo-rb/roo#166

Problems reading file created by AXLSX gem

In my app, the data is pulled from various sources and stored in the form of Excel file created by AXLSX gem.

It seems that creek has problems reading this data file. After opening in Excel and saving, all works fine.

Do you have any hint what is wrong and how can I make the file readable without going throgh Excel save step?

Please find attached the example file that has troubles being read by creek:
failing_creek_read.xlsx

creek 1.1.2 depends on nokigiri ~> 1.6.0 which has a security vulnerability

The following is the output of bundle audit

ruby-advisory-db: 284 advisories
Name: nokogiri
Version: 1.6.8.1
Advisory: CVE-2016-4658
Criticality: Unknown
URL: https://github.com/sparklemotion/nokogiri/issues/1615
Title: Nokogiri gem contains several vulnerabilities in libxml2 and libxslt
Solution: upgrade to >= 1.7.1

Vulnerabilities found!

But creek depends on that specific version of nokigiri. can we release a version of creek that depends on nokigiri 1.7.1 ?

Thanks

Can't install Creek gem in Ruby 2.7

I can't install Creek gem in newest version of Ruby, it throws the following error

C:\>gem install creek
ERROR:  Error installing creek:
        The last version of nokogiri (>= 1.10.0) to support your Ruby & RubyGems was 1.10.7. Try installing it with `gem install nokogiri -v 1.10.7` and then running the current command again
        nokogiri requires Ruby version >= 2.3, < 2.7.dev. The current ruby version is 2.7.0.0.

Creek not able to handle custom formatted Cell. Parsing it as a datetime

I am parsing an excel spreadsheet with a cell that is custom formatted. The value is actually 25 but the format to display it is 'JPY25'. Creek gets the number but the number is converted to a DateTime after passing through the convert function.

This is the node.attributes hash:
{"r"=>"M2", "s"=>"8"}

When the convert function(Ref below) is called from inside the rows_generator function, node.value is 25 for a cell formatted as 'JPY25' but because of the node.attributes['s'], the type(which is nil - determined by node.attributes['t']) is determined by style(which is DateTime) and a DateTime value is generated after Creek::Styles::Converter.call is called.

def convert(value, type, style_idx)
      style = @book.style_types[style_idx.to_i]
      Creek::Styles::Converter.call(value, type, style, converter_options)
end

Can we have the ability to pass an optional hash in instance method simple_rows of Creek::Sheet class?
This way, we can define what type (:float, :string, :date, :bignum etc) we want for a particular column. Maybe a hash like below:

type_hash = {"A" => :float, "B" => :string}

Please let me know if this approach is correct or if there is a more valid solution.

P.S. Please don't mind the explanation part. Doing it for the first time.

Creek reads first row as empty [] even though it has data

When running creek on mac, i see this issue of returning first row as empty []. Once I open the file in Microsoft Excel for mac, and save it, then it reads fine.
Am I missing any thing? Any help/suggestions would be great.
Ruby 2.3.0
creek 1.1.2
MacOS Sierra 10.12.3

How to get header as Key name?

The first row of my excel has the header. How can I get the header name instead of "A", "B" etc.?

{"A"=>"DATE, "B"=>"TEXT", "C"=>"TEXT", "D"=>"TEXT"}

-->

{"Start date"=>"DATE, "First name"=>"TEXT", "Last name"=>"TEXT", "Address"=>"TEXT"}

Would be great to parse header giving a regex expression. Just like Roo gem

Unable to handle formatted cells

I have an excel spreadsheet with a cell that is formatted to show a date based on a number. Creek doesn't seem to take this into account when presenting the value. Is this intended?

Wrongly truncating numbers

I'm not sure if this is a bug of Google Sheets or Creek (or both or neither), but I have a spreadsheet with a row filled with numbers. One of these numbers, for example, is displayed on google sheets as 52.097.260. Up in the "fx" bar, the same number is displayed as 52097260.

In the sheet xml it seems the number is actually a float: <v>5.209726E7</v>. Ok until here.

When I try to get the rows with Creek in my ruby app though, the row is returned like this:

{"A5"=>"Brasil", "B5"=>5, "C5"=>5, "D5"=>5, "E5"=>5, "F5"=>5, "G5"=>5, "H5"=>5, "I5"=>5, "J5"=>5, "K5"=>5, "L5"=>8, "M5"=>8, "N5"=>8, "O5"=>8, "P5"=>8, "Q5"=>8, "R5"=>8, "S5"=>8, "T5"=>8, "U5"=>8, "V5"=>8, "W5"=>1, "X5"=>1, "Y5"=>1, "Z5"=>1, "AA5"=>1, "AB5"=>1, "AC5"=>1, "AD5"=>1, "AE5"=>1, "AF5"=>1, "AG5"=>1, "AH5"=>1, "AI5"=>1, "AJ5"=>1, "AK5"=>1, "AL5"=>1, "AM5"=>1, "AN5"=>1, "AO5"=>1, "AP5"=>1, "AQ5"=>1, "AR5"=>1, "AS5"=>nil, "AT5"=>nil, "AU5"=>nil, "AV5"=>nil, "AW5"=>nil, "AX5"=>nil, "AY5"=>nil, "AZ5"=>nil, "BA5"=>nil, "BB5"=>nil, "BC5"=>nil}

From what I could tell by reading creek's source code, it is something related to styles.
The style of the numbers of these rows is either "38" or "39". Here are they on the styles.xml:

<xf borderId="10" fillId="0" fontId="1" numFmtId="3" xfId="0" applyBorder="1" applyFont="1" applyNumberFormat="1"/>
<xf borderId="11" fillId="0" fontId="1" numFmtId="3" xfId="0" applyBorder="1" applyFont="1" applyNumberFormat="1"/>

Both have numFmtId="3", which is for fixnum, according to https://github.com/pythonicrubyist/creek/blob/ff478caee306774ffd7423d46a7502027277ffa3/lib/creek/styles/constants.rb.

I tried to edit the styles.xml directly just to test that out, changing both styles to have numFmtId="0".

The resulting hash changed to:

{"A5"=>"Brasil", "B5"=>"5.209726E7", "C5"=>"5.21612016E7", "D5"=>"5.22251432E7", "E5"=>"5.22890848E7", "F5"=>"5.235302640000001E7", "G5"=>"5.241696800000001E7", "H5"=>"5.248090960000001E7", "I5"=>"5.254485120000001E7", "J5"=>"5.260879280000001E7", "K5"=>"5.267273440000001E7", "L5"=>"8.0437327E7", "M5"=>"8.047588936363636E7", "N5"=>"8.051445172727272E7", "O5"=>"8.055301409090908E7", "P5"=>"8.059157645454544E7", "Q5"=>"8.06301388181818E7", "R5"=>"8.066870118181816E7", "S5"=>"8.070726354545452E7", "T5"=>"8.074582590909088E7", "U5"=>"8.078438827272724E7", "V5"=>"8.08229506363636E7", "W5"=>"1.1099099E8", "X5"=>"1.139836842222222E8", "Y5"=>"1.169763784444444E8", "Z5"=>"1.199690726666667E8", "AA5"=>"1.229617668888889E8", "AB5"=>"1.259544611111111E8", "AC5"=>"1.289471553333333E8", "AD5"=>"1.319398495555556E8", "AE5"=>"1.349325437777778E8", "AF5"=>"1.37925238E8", "AG5"=>"1.402261791E8", "AH5"=>"1.425271202E8", "AI5"=>"1.448280613E8", "AJ5"=>"1.471290024E8", "AK5"=>"1.494299435E8", "AL5"=>"1.517308846E8", "AM5"=>"1.540318257E8", "AN5"=>"1.563327668E8", "AO5"=>"1.586337078999999E8", "AP5"=>"1.60934649E8", "AQ5"=>"1.632355901E8", "AR5"=>"1.655365312E8", "AS5"=>nil, "AT5"=>nil, "AU5"=>nil, "AV5"=>nil, "AW5"=>nil, "AX5"=>nil, "AY5"=>nil, "AZ5"=>nil, "BA5"=>nil, "BB5"=>nil, "BC5"=>nil}

Which I guess is expected, considering that with numFmtId 0 creek just returns the value.

So, my conclusion was that the styles associated with those cells should have a float numFmtId.

Now, I'm not quite sure if it's creek's fault or if the sheets Google Sheets make are ill-formed. If creek got the values right and the sheets are actually wrong, then please just close this issue.

I made a gist with the offending file before and after I changed the styles.xml, and also both styles.xml and a diff of them showing that the only change I made was in the numFmtIds.

https://gist.github.com/mateusmedeiros/47ec1f4ce704f4c85729
The "working" files are what I got after I changed the numFmtId, the "broken" are before the change.

Open file through URI

Hi,

I have a requirement where-in I need to store my excel files in S3. Is it possible for creek to read files directly using S3 URI?

Thanks,
Shuchi

Text read from excel files is html escaped

I have an xlsx file with an ampersand in it (&) and when I read the file using Creek, it becomes &amp;. Is this kind of html escaping expected? Is there a way of turning it off? Thanks

decimal records are not processed correctly

I have a file containing multiple decimal values.
0.001 | 0.01 | 0.1

when parsing the data as described in README the first decimal is not returned correctly!

This is what I do:

require 'creek'
creek = Creek::Book.new 'spec/fixtures/sample.xlsx'
sheet = creek.sheets[0]
sheet.rows.each do |row|
  puts row
end

This is what I see in the console (the first value is just 1E-3 instead of 0.001):
{"M2"=>"1E-3", "N2"=>"0.01", "O2"=>"0.1"}

Is that an issue or can I somehow configure the format the data is returned?

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.