Comments (12)
@ag-ramachandran maybe the message was truncated. I sent you a new email with the exception stack trace
from azure-kusto-spark.
Hello @ravikiransharvirala
This looks like an underlying issue with the cluster capacity rather than a connector issue, and it will require investigation.
With all connector settings unchanged, it may well be that limits are being hit on parallel runs, or that other workloads are using the capacity.
Since this needs a look at the cluster, it would be great if you could raise a support request, or alternatively send me the cluster details at my Microsoft handle (ramacg) so that I can take it up further.
cc: @asaharn
@ag-ramachandran Thanks for responding. I shared cluster details via email. Please let me know if you haven't received it.
@ravikiransharvirala Not yet!
@ag-ramachandran interesting. I sent it to ramacg at microsoft.
I sent it again without links and images.
@ravikiransharvirala Got it now, will check.
@ravikiransharvirala Please share the logs from STDOUT/STDERR/Log4j so that I can look at the time correlation between these errors and the capacity as well.
@ag-ramachandran Sure, will do that.
@ag-ramachandran Sent it as text. Let me know if you haven't received it.
@ravikiransharvirala no exceptions in that log though!
@ag-ramachandran Do you recommend persisting the DataFrame after reading the data through the Kusto connector? After reading the data from the database and performing transformations on it, I notice the connector making calls to the database throughout the job's execution.
These are the two queries I noticed while running the job (the job needs the entire data from the table):
<table_name> | count
<table_name> | evaluate estimate_rows_count()
@ravikiransharvirala Need more specifics. If you are trying to read the same data again and again, it makes sense to cache it.
These queries are used to determine how the data is read; the internals of reading differ between ForceSingle and ForceDistributed modes. In your case you can set the readMode to ForceDistributed, and I think some of these queries would go away.
If, in ForceDistributed mode (parquet export), you want to reuse the same exported files, set the transient cache option to true.
KUSTO_READ_MODE 'readMode' - Override the connector heuristic to choose between 'Single' and 'Distributed' mode. Options are - 'ForceSingleMode', 'ForceDistributedMode'. Scala and Java users may take these options from com.microsoft.kusto.spark.datasource.ReadMode.
KUSTO_DISTRIBUTED_READ_MODE_TRANSIENT_CACHE When 'Distributed' read mode is used and this is set to 'true', the request query is exported only once and exported data is reused.
Read more at: https://github.com/Azure/azure-kusto-spark/blob/master/docs/KustoSource.md
P.S. These may not be related to this issue, but they are good options to set and try for optimized reads.
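The advice above (ForceDistributedMode, transient cache, persisting the DataFrame) can be sketched roughly as follows. This is a hypothetical, untested sketch: the cluster/database/table values are placeholders, and the `distributedReadModeTransientCache` option key is my assumption based on the `KUSTO_DISTRIBUTED_READ_MODE_TRANSIENT_CACHE` constant mentioned above; check KustoSource.md for the exact names.

```python
# Sketch of a Kusto read with ForceDistributedMode and the transient cache
# enabled, followed by persist() so that downstream transformations reuse
# the cached data instead of triggering repeated reads against the database.
# All cluster/database/table values below are placeholders.

kusto_options = {
    "kustoCluster": "https://<cluster>.kusto.windows.net",  # placeholder
    "kustoDatabase": "<database>",                          # placeholder
    "kustoTable": "<table_name>",                           # placeholder
    # KUSTO_READ_MODE: override the connector's Single/Distributed heuristic
    "readMode": "ForceDistributedMode",
    # Assumed key for KUSTO_DISTRIBUTED_READ_MODE_TRANSIENT_CACHE:
    # export the request query only once and reuse the exported data.
    "distributedReadModeTransientCache": "true",
}

def read_kusto_df(spark):
    """Read the table once and persist it for downstream transformations."""
    df = (spark.read
          .format("com.microsoft.kusto.spark.datasource")
          .options(**kusto_options)
          .load())
    return df.persist()  # cache so repeated actions don't re-query Kusto
```

With the DataFrame persisted, repeated actions later in the job should hit the Spark cache rather than issuing fresh queries (such as the `count` and `estimate_rows_count()` calls observed above) against the database.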