Comments (7)
Hello @aniarpu, please provide the code snippet you are using. Is it a read or a write? We'll guide you through the next steps.
from azure-kusto-spark.
Hi @ag-ramachandran, I am trying to perform a write. Here is the code snippet I am running from a Synapse notebook. I am using the following jar: `kusto-spark_3.0_2.12-5.0.4-jar-with-dependencies.jar`
```scala
import com.microsoft.kusto.spark.datasink.KustoSinkOptions
import com.microsoft.kusto.spark.datasource
import com.microsoft.kusto.spark.sql.extension.SparkExtension.DataFrameReaderExtension
import com.microsoft.kusto.spark.utils.{KustoDataSourceUtils => KDSU}
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, LongType, StructField, StructType}
import com.microsoft.kusto.spark.common.KustoOptions

val schema = StructType(
  List(
    StructField("BinaryName", StringType, true),
    StructField("Etime", LongType, true),
    StructField("Ftime", LongType, true),
    StructField("SQLizerPartitionIndex", LongType, true)
  )
)

var df = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)
df = df.union(Seq(("binary1", 100, 100, 1)).toDF)

df.
  write.
  format("com.microsoft.kusto.spark.datasource").
  option(KustoSinkOptions.KUSTO_CLUSTER, "ossec").
  option(KustoSinkOptions.KUSTO_DATABASE, "IsoPlat").
  option(KustoSinkOptions.KUSTO_TABLE, "AIS2").
  option(KustoSinkOptions.KUSTO_MANAGED_IDENTITY_AUTH, true.toString).
  option(KustoSinkOptions.KUSTO_MANAGED_IDENTITY_CLIENT_ID, "").
  mode("Append").
  save()
```
from azure-kusto-spark.
Two things, @aniarpu:
a) In Synapse, you should use a SystemManagedIdentity to create a linked service and then reference that linked service from the code. I doubt MI was tested this way in Synapse.
b) The format is wrong; it should be the following (refer to the Samples):
.format("com.microsoft.kusto.spark.synapse.datasource")
Also, I just confirmed from the git blame that the version used in Synapse is 3.1.16, which does not support ManagedIdentity as a Spark option (hence the compilation error).
from azure-kusto-spark.
Thanks for the quick reply!
a) For this option, did you mean something like this? `option("spark.synapse.linkedService","DataExplorer_OSSec_IsoPlat")`
I did try this, but it uses device authentication, and I'm trying to avoid that.
b) I did try this, but I think my Synapse notebook does not have the package, as I got the error below. I ended up using the format from my snippet to work around that.
Error - java.lang.ClassNotFoundException:
Failed to find data source: com.microsoft.kusto.spark.synapse.datasource.
from azure-kusto-spark.
Hi @aniarpu
In Synapse, we can use a SystemManagedIdentity by creating a linked service (UserManagedIdentity is not supported). For reference, see [LinkedService](https://learn.microsoft.com/en-us/azure/data-factory/concepts-linked-services?tabs=data-factory) and [SystemManagedIdentity](https://learn.microsoft.com/en-us/azure/synapse-analytics/synapse-service-identity). The linked service name can then be used from the notebook.
As for the format `com.microsoft.kusto.spark.synapse.datasource.`, do you have an extra dot at the end?
from azure-kusto-spark.
Thanks for the links. My Synapse notebook session is not running as a managed identity; maybe that has something to do with it?
No, the format doesn't have an extra dot at the end. I must have added a full stop out of habit.
So, to summarize (please let me know if this is incorrect):
- In a Synapse notebook, the only format that can be used is `com.microsoft.kusto.spark.synapse.datasource` and nothing else?
- MI is not a supported option in Synapse with `com.microsoft.kusto.spark.datasource`.
- I have to use only system managed identities, and no user managed identities.
from azure-kusto-spark.
Summary :
- Yes
- Yes
- Yes
Synapse need not run with MI; the linked service can use ManagedIdentity (independent of Synapse).
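Putting the points above together, a write from a Synapse notebook might look roughly like the sketch below. It assumes a linked service named `DataExplorer_OSSec_IsoPlat` (the name mentioned earlier in this thread) already exists and is configured to authenticate with the workspace's system-assigned managed identity. The object and function names (`SynapseKustoWrite`, `synapseKustoOptions`, `appendToKusto`) are hypothetical helpers introduced only for illustration, and the string keys `kustoDatabase` and `kustoTable` are assumed to be the values behind `KustoSinkOptions.KUSTO_DATABASE` and `KustoSinkOptions.KUSTO_TABLE`; this has not been verified against the 3.1.16 build bundled with Synapse.

```scala
import org.apache.spark.sql.DataFrame

object SynapseKustoWrite {
  // Data source name for the connector bundled with Synapse (from the thread above).
  val SynapseFormat = "com.microsoft.kusto.spark.synapse.datasource"

  // Builds the write options. No credentials or managed-identity flags appear here:
  // authentication is assumed to come from the linked service configuration.
  def synapseKustoOptions(
      linkedService: String,
      database: String,
      table: String): Map[String, String] =
    Map(
      "spark.synapse.linkedService" -> linkedService, // option key quoted in the thread
      "kustoDatabase" -> database,                    // assumed string key
      "kustoTable" -> table                           // assumed string key
    )

  // Appends the DataFrame to the target table using the Synapse data source.
  def appendToKusto(df: DataFrame, options: Map[String, String]): Unit =
    df.write
      .format(SynapseFormat)
      .options(options)
      .mode("Append")
      .save()
}
```

With the DataFrame from the original snippet, the call would be something like `SynapseKustoWrite.appendToKusto(df, SynapseKustoWrite.synapseKustoOptions("DataExplorer_OSSec_IsoPlat", "IsoPlat", "AIS2"))`. Whether this avoids device authentication depends on how the linked service itself is set up.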
from azure-kusto-spark.