Comments (2)
a) There are a couple of challenges here. By using SparkIngestionPropertiesJson, the FlushImmediately flag is being set to true, which we do not recommend.
Here is how the internals work:
DataFrame ----> Write to blob ------> Ingest this blob
To optimize ingestion throughput, the size of the blob is critical. Kusto is optimized for a few large blobs, as opposed to many small blobs. Please set the right batching policy in Kusto, and you can get rid of SparkIngestionProperties altogether.
(Refer: https://learn.microsoft.com/en-us/azure/data-explorer/kusto/management/batchingpolicy)
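As a rough illustration of the batching policy suggestion, here is a minimal sketch that builds the Kusto management command for a table's ingestion batching policy. The helper name `batching_policy_command` and the values shown (5 minutes, 500 items, 1024 MB) are assumptions for illustration only, not tuned recommendations; please check the linked documentation for the exact syntax and limits that apply to your cluster.

```python
import json


def batching_policy_command(table, max_timespan="00:05:00",
                            max_items=500, max_raw_mb=1024):
    """Build an '.alter table ... policy ingestionbatching' command string.

    Hypothetical helper: the values are illustrative, and `table` is
    assumed to be a simple identifier that needs no quoting.
    """
    policy = {
        # A batch is sealed when any one of these limits is reached.
        "MaximumBatchingTimeSpan": max_timespan,
        "MaximumNumberOfItems": max_items,
        "MaximumRawDataSizeMB": max_raw_mb,
    }
    return (f".alter table {table} policy ingestionbatching "
            f"@'{json.dumps(policy)}'")


if __name__ == "__main__":
    # "MyTable" is a placeholder table name.
    print(batching_policy_command("MyTable"))
```

The resulting string would be run as a management command against the target database (for example from the Azure Data Explorer web UI); larger limits generally mean fewer, bigger blobs per ingestion at the cost of higher latency.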
b) You can try the Queued writeMode. IIRC, the version in Synapse had an issue where the shards that were to be merged were queried incorrectly, so that could probably be a cause too.
If you still want to use the FlushImmediately flag (not recommended, as it results in no aggregation and many smaller ingestions), please use the queued write option:
.option("writeMode","Queued")
Refer: https://github.com/Azure/azure-kusto-spark/blob/master/docs/KustoSink.md (writeMode)
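For context, here is a minimal PySpark-style sketch of where the `writeMode` option sits in a write call. The helper names `kusto_sink_options` and `write_to_kusto` are hypothetical, the cluster, database, and table values are placeholders, and authentication options are deliberately left out; the option keys are the ones I believe the connector's KustoSink documentation lists, so please verify them against the linked page for your connector version.

```python
KUSTO_FORMAT = "com.microsoft.kusto.spark.datasource"


def kusto_sink_options(cluster, database, table, write_mode="Queued"):
    """Collect sink options as a dict (hypothetical helper).

    Authentication options are omitted here and must be added
    according to your environment.
    """
    return {
        "kustoCluster": cluster,
        "kustoDatabase": database,
        "kustoTable": table,
        # Queued mode hands the blobs to Kusto's queued ingestion.
        "writeMode": write_mode,
    }


def write_to_kusto(df, options):
    """Append a Spark DataFrame to Kusto using the given options."""
    (df.write
       .format(KUSTO_FORMAT)
       .options(**options)
       .mode("append")
       .save())
```

Usage would look like `write_to_kusto(df, kusto_sink_options("<cluster>", "<database>", "<table>"))`, with the authentication options merged into the dict beforehand.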
from azure-kusto-spark.
Thanks @ag-ramachandran
.option("writeMode","Queued") solved my problem.
Thanks for your kind answer and suggestions.