Comments (4)

ag-ramachandran commented on August 16, 2024

Glad that helped, @msft33333. If this works, can we close the issue?

from azure-kusto-spark.

ag-ramachandran commented on August 16, 2024

Hello @msft33333
Apologies for the inconvenience. This is a known issue: a new native implementation of the Parquet writer, which uses newer encoding schemes, was rolled out on the ADX side. It uses the DELTA_BYTE_ARRAY encoding for strings and other byte-array-based Parquet types (the default in Parquet V2, which most modern Parquet readers support out of the box).
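To make the encoding concrete, here is a minimal pure-Python sketch of the idea behind DELTA_BYTE_ARRAY (also called incremental or front coding): each value is stored as the length of the prefix it shares with the previous value plus the remaining suffix. The function names are hypothetical, and this is not how the ADX writer or any Parquet library implements it; the real format additionally packs the prefix lengths and suffixes with other Parquet encodings, which is omitted here.

```python
from typing import List, Tuple


def delta_byte_array_encode(values: List[bytes]) -> Tuple[List[int], List[bytes]]:
    """Split each value into (length of prefix shared with previous value, suffix).

    Conceptual sketch only: real Parquet stores the prefix lengths with
    DELTA_BINARY_PACKED and the suffixes with DELTA_LENGTH_BYTE_ARRAY.
    """
    prefix_lengths: List[int] = []
    suffixes: List[bytes] = []
    previous = b""
    for value in values:
        # Count leading bytes shared with the previous value.
        shared = 0
        limit = min(len(previous), len(value))
        while shared < limit and previous[shared] == value[shared]:
            shared += 1
        prefix_lengths.append(shared)
        suffixes.append(value[shared:])
        previous = value
    return prefix_lengths, suffixes


def delta_byte_array_decode(prefix_lengths: List[int], suffixes: List[bytes]) -> List[bytes]:
    """Rebuild the original values: reuse `shared` bytes of the previous value, then append the suffix."""
    values: List[bytes] = []
    previous = b""
    for shared, suffix in zip(prefix_lengths, suffixes):
        value = previous[:shared] + suffix
        values.append(value)
        previous = value
    return values


if __name__ == "__main__":
    data = [b"kusto-cluster-01", b"kusto-cluster-02", b"kusto-db", b"spark"]
    prefixes, tails = delta_byte_array_encode(data)
    print(prefixes)  # [0, 15, 6, 0]
    print(tails)     # [b'kusto-cluster-01', b'2', b'db', b'spark']
    assert delta_byte_array_decode(prefixes, tails) == data
```

A reader has to understand this prefix/suffix layout to reconstruct the strings, which is why a reader without DELTA_BYTE_ARRAY support fails on such files even though the data itself is intact.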

Spark 2.x has this problem with delta encodings on Parquet V2, as described in SPARK-26509: https://issues.apache.org/jira/si/jira.issueviews:issue-html/SPARK-26509/SPARK-26509.html

Parquet V2, with its delta encodings, was intended as an improvement over Parquet V1 (plain encoding). According to the documentation, delta encoding offers advantages when reading rows in batches (ref: https://issues.apache.org/jira/browse/SPARK-12854). Configuration flags are available as a workaround, as described in the earlier Spark Jira ticket.

Mitigation: a new version of the library (Spark library version 3.1.10) has been released and will be deployed on Synapse.

I will triage this to the right team and get the rollout to you as soon as possible.

Once again, apologies for the inconvenience.

ag-ramachandran commented on August 16, 2024

Also, to work around this, you can try setting the following Spark options, which should fix the issue immediately. In parallel, we will work on getting a new revision to Synapse.

spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")
spark.conf.set("parquet.split.files", "false")
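Disabling the vectorized reader for the whole session also slows down unrelated Parquet reads, so one option is to scope the two settings above to just the affected read. The following is a minimal PySpark sketch under that assumption; the helper name `parquet_v2_workaround` is hypothetical and not part of azure-kusto-spark. It relies only on the `spark.conf.get`, `spark.conf.set`, and `spark.conf.unset` methods of a SparkSession.

```python
from contextlib import contextmanager
from typing import Dict, Iterator, Optional

# The two settings from the workaround above.
WORKAROUND_CONF: Dict[str, str] = {
    "spark.sql.parquet.enableVectorizedReader": "false",
    "parquet.split.files": "false",
}


@contextmanager
def parquet_v2_workaround(spark) -> Iterator[None]:
    """Hypothetical helper: apply the workaround settings, then restore the old values.

    `spark` is assumed to be a SparkSession (anything exposing
    conf.get / conf.set / conf.unset works). Spark evaluates lazily,
    so trigger the action (count, collect, write, ...) inside the block.
    """
    # Remember current values; None means the key was not explicitly set.
    previous: Dict[str, Optional[str]] = {
        key: spark.conf.get(key, None) for key in WORKAROUND_CONF
    }
    for key, value in WORKAROUND_CONF.items():
        spark.conf.set(key, value)
    try:
        yield
    finally:
        for key, old in previous.items():
            if old is None:
                spark.conf.unset(key)  # was not set before: drop our override
            else:
                spark.conf.set(key, old)
```

Usage would look like `with parquet_v2_workaround(spark): rows = df.collect()`. Because of lazy evaluation, creating the DataFrame inside the block and running the action outside it would likely not pick up the settings.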

msft33333 commented on August 16, 2024

Hi @ag-ramachandran ,
Thanks for the detailed reply! The workaround works fine for me, thanks again!