Comments (4)
Glad that helped, @msft33333. If this works, can we close the issue?
from azure-kusto-spark.
Hello @msft33333
Apologies for the inconvenience. This is a known issue: a new native implementation of the Parquet writer, which uses new encoding schemes, was rolled out on the ADX side. It uses delta byte array encoding for strings and other byte-array-based Parquet types (the default in Parquet V2, which most modern Parquet readers support by default).
Spark 2.x has an issue with delta encodings on Parquet V2, as mentioned here: https://issues.apache.org/jira/si/jira.issueviews:issue-html/SPARK-26509/SPARK-26509.html
Parquet V2, which has delta encodings, was meant to be an improvement over Parquet V1 (plain encoding). Based on the documentation, delta encoding provides advantages when reading rows in batches (ref: https://issues.apache.org/jira/browse/SPARK-12854). These flags are provided as a workaround, as you can see in the previous ticket on the Spark Jira.
Mitigation: a new version of the library has been rolled out and will be deployed on Synapse (Spark library version 3.1.10).
We will triage this to the right team and get you a rollout as soon as possible.
Once again, apologies for the inconvenience.
Also, to work around this, you can try setting the following Spark options, which should fix the issue immediately. We will work in parallel to get a new revision to Synapse.
spark.conf.set("spark.sql.parquet.enableVectorizedReader", "false")
spark.conf.set("parquet.split.files", "false")
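In a notebook, the two settings above could be wrapped in a small helper so they are applied together before the read. This is only a minimal sketch: it assumes `spark` is an already active SparkSession, and the helper name `apply_parquet_workaround` and the constant `PARQUET_WORKAROUND_OPTIONS` are hypothetical, not part of the connector.

```python
# The two workaround options quoted above, with the same string values.
PARQUET_WORKAROUND_OPTIONS = {
    "spark.sql.parquet.enableVectorizedReader": "false",
    "parquet.split.files": "false",
}


def apply_parquet_workaround(spark, options=PARQUET_WORKAROUND_OPTIONS):
    """Set each workaround option on the session's runtime config.

    `spark` is assumed to be an active SparkSession (anything exposing
    `conf.set(key, value)` works). Returns a copy of the applied options.
    """
    for key, value in options.items():
        spark.conf.set(key, value)
    return dict(options)
```

Calling `apply_parquet_workaround(spark)` once, before reading from Kusto, would have the same effect as the two `spark.conf.set` lines.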
Hi @ag-ramachandran,
Thanks for the detailed reply! The workaround works fine for me, thanks again!