
Comments (9)

wgtmac commented on June 26, 2024

I have reproduced the issue. The root cause is that the reader tried to read col3 (columnId = 4), which has no data at all: both its PRESENT and DATA streams have zero length, as listed below. The parent of col3 is col2 (columnId = 3), whose values are all null, so the reader should stop at col2 without touching col3.
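To illustrate the expected behavior, here is a toy sketch. The class `ToyColumnReader` and its methods are hypothetical and are not the ORC C++ API. It assumes that a struct's child only stores entries for the rows where the parent is non-null, so the parent hands its non-null count down and skips its children entirely when that count is zero.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical toy reader, NOT the ORC C++ API. A leaf stores one DATA value
// per non-null entry; a struct stores no DATA and delegates to its children.
class ToyColumnReader {
 public:
  // An empty `present` vector models a column with no PRESENT stream (no nulls).
  ToyColumnReader(std::vector<bool> present, std::vector<int> data)
      : present_(std::move(present)), data_(std::move(data)) {}

  void addChild(std::unique_ptr<ToyColumnReader> child) {
    assert(child != nullptr);
    children_.push_back(std::move(child));
  }

  // Consumes `numValues` entries and returns how many of them are non-null.
  std::size_t next(std::size_t numValues) {
    std::size_t nonNull = numValues;
    if (!present_.empty()) {
      if (present_.size() - presentPos_ < numValues) {
        throw std::runtime_error("PRESENT stream too short");
      }
      nonNull = 0;
      for (std::size_t i = 0; i < numValues; ++i) {
        if (present_[presentPos_++]) ++nonNull;
      }
    }
    if (children_.empty()) {
      // Leaf column: one DATA value per non-null entry.
      if (data_.size() - dataPos_ < nonNull) {
        throw std::runtime_error("DATA stream too short");
      }
      dataPos_ += nonNull;
    } else if (nonNull > 0) {
      // Children only hold entries for non-null parent rows; when every
      // parent row is null, leave the (possibly empty) child streams alone.
      for (auto& child : children_) child->next(nonNull);
    }
    return nonNull;
  }

 private:
  std::vector<bool> present_;
  std::vector<int> data_;
  std::size_t presentPos_ = 0;
  std::size_t dataPos_ = 0;
  std::vector<std::unique_ptr<ToyColumnReader>> children_;
};
```

With a root struct whose child col2 has an all-false PRESENT mask and a grandchild col3 with no streams, `root.next(5)` returns 5 without ever calling into col3, whereas calling `next(5)` directly on an empty leaf throws.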

Rows: 50000
Compression: ZLIB
Compression size: 65536
Calendar: Julian/Gregorian
Type: struct<col0:struct<col1:int>,col2:struct<col3:int>>

Stripe Statistics:
  Stripe 1:
    Column 0: count: 50000 hasNull: false
    Column 1: count: 50000 hasNull: false
    Column 2: count: 50000 hasNull: false min: 0 max: 149992 sum: 3752012883
    Column 3: count: 0 hasNull: true
    Column 4: count: 0 hasNull: true sum: 0

File Statistics:
  Column 0: count: 50000 hasNull: false
  Column 1: count: 50000 hasNull: false
  Column 2: count: 50000 hasNull: false min: 0 max: 149992 sum: 3752012883
  Column 3: count: 0 hasNull: true
  Column 4: count: 0 hasNull: true sum: 0

Stripes:
  Stripe: offset: 3 data: 129019 rows: 50000 tail: 68 index: 216
    Stream: column 0 section ROW_INDEX start: 3 length 17
    Stream: column 1 section ROW_INDEX start: 20 length 17
    Stream: column 2 section ROW_INDEX start: 37 length 122
    Stream: column 3 section ROW_INDEX start: 159 length 35
    Stream: column 4 section ROW_INDEX start: 194 length 25
    Stream: column 2 section DATA start: 219 length 129007
    Stream: column 3 section PRESENT start: 129226 length 12
    Stream: column 4 section PRESENT start: 129238 length 0
    Stream: column 4 section DATA start: 129238 length 0
    Encoding column 0: DIRECT
    Encoding column 1: DIRECT
    Encoding column 2: DIRECT_V2
    Encoding column 3: DIRECT
    Encoding column 4: DIRECT_V2
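The column IDs in this listing come from ORC flattening the type tree in pre-order: each type takes the next ID, then its children are numbered left to right. The sketch below uses a hypothetical `ToyType` node (not the ORC `Type` API) to reproduce that mapping for `struct<col0:struct<col1:int>,col2:struct<col3:int>>`, giving col2 → 3 and col3 → 4.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal type node, NOT the ORC Type API.
struct ToyType {
  std::string name;               // field name; empty for the root struct
  std::vector<ToyType> children;  // fields in declaration order
};

// Pre-order numbering: a node takes the next ID, then its children are
// numbered left to right. Appends (columnId, name) pairs to `out`.
void assignColumnIds(const ToyType& type,
                     std::vector<std::pair<int, std::string>>& out) {
  assert(!type.name.empty() || out.empty());  // only the root is unnamed
  out.emplace_back(static_cast<int>(out.size()),
                   type.name.empty() ? std::string("<root>") : type.name);
  for (const ToyType& child : type.children) assignColumnIds(child, out);
}
```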

from orc.

wgtmac commented on June 26, 2024

I will file a JIRA and fix it shortly.

dongjoon-hyun commented on June 26, 2024

Thank you for reporting, @jnwan.

wgtmac commented on June 26, 2024

This issue has been fixed in the main branch. Please give it a try and let us know if there are any issues. Thanks @jnwan!

coderex2522 commented on June 26, 2024

ColumnReader needs to be fixed to handle the case where the DATA stream does not exist.
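One way to handle this, sketched with a hypothetical `ToyIntReader` (not ORC's actual ColumnReader, and not necessarily how the upstream fix works), is to treat a missing DATA stream like an empty one and return early when there are no non-null values to decode, failing only when values are actually requested.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical leaf reader, NOT the ORC ColumnReader API. `data` is
// std::nullopt when the stripe carries no DATA stream for this column.
class ToyIntReader {
 public:
  explicit ToyIntReader(std::optional<std::vector<std::int32_t>> data)
      : data_(std::move(data)) {}

  // Appends `numNonNull` decoded values to `out`.
  void next(std::size_t numNonNull, std::vector<std::int32_t>& out) {
    if (numNonNull == 0) {
      return;  // nothing to decode, so a missing or empty stream is fine
    }
    if (!data_ || data_->size() - pos_ < numNonNull) {
      throw std::runtime_error("DATA stream is missing or too short");
    }
    auto first = data_->begin() + static_cast<std::ptrdiff_t>(pos_);
    out.insert(out.end(), first, first + static_cast<std::ptrdiff_t>(numNonNull));
    pos_ += numNonNull;
    assert(pos_ <= data_->size());
  }

 private:
  std::optional<std::vector<std::int32_t>> data_;
  std::size_t pos_ = 0;
};
```

The design choice here is to keep the error for genuinely truncated data while making the zero-values case a no-op, so an all-null column never reaches the stream at all.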

dongjoon-hyun commented on June 26, 2024

Thank you so much, @wgtmac and @coderex2522!

coderex2522 commented on June 26, 2024

I have created a new issue in Jira.

jnwan commented on June 26, 2024

@wgtmac has explained the root cause well. I just want to re-emphasize that the same issue happens with other compound types: for example, reading an empty map also fails with the "bad read in nextBuffer" error.
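For maps, ORC stores a LENGTH stream with the number of entries in each non-null map, and the key and value child columns hold those entries back to back. The child readers therefore need the sum of the lengths, which is zero for a batch of empty maps even though the maps themselves are non-null. A toy sketch with hypothetical names (`childEntryCount`, `ToyChildReader`, `readMapBatch`; none of them are the ORC API):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Given the decoded LENGTH values of a batch (one per non-null map row),
// return how many key/value entries the child readers must produce.
std::size_t childEntryCount(const std::vector<std::uint64_t>& lengths) {
  std::size_t total = 0;
  for (std::uint64_t len : lengths) total += static_cast<std::size_t>(len);
  return total;
}

// Stand-in for a key or value column reader; it must never be asked for 0.
struct ToyChildReader {
  std::size_t reads = 0;
  void next(std::size_t n) {
    assert(n > 0);
    ++reads;
  }
};

// Reads one batch of a map column: children are read only when at least one
// entry exists, so a batch of empty maps never touches the key/value streams.
void readMapBatch(const std::vector<std::uint64_t>& lengths,
                  ToyChildReader& keys, ToyChildReader& values) {
  const std::size_t entries = childEntryCount(lengths);
  if (entries == 0) return;
  keys.next(entries);
  values.next(entries);
}
```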

jnwan commented on June 26, 2024

Verified that the issue is fixed. Thank you!
