
Comments (17)

arvindshmicrosoft commented on May 9, 2024

@ederfdias - I have categorically mentioned that results might be better with the optimized SQL DW connector. Your results will depend on data volumes, schema, etc. That said, it is best if you evaluate for yourself: it is quite possible that our new connector is faster for smaller datasets as well, so you should definitely test.

The failed login you hit was caused by typos in the sample script in the README, which have since been corrected. Specifically, the JDBC URL used a wrong attribute for the database name; it is now corrected to databaseName in the README. Please check that:

url = server_name + ";" + "databaseName=" + database_name + ";"
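To make the concatenation above concrete, here is a minimal sketch; the server and database names below are placeholders I have made up, not values from the README:

```python
# Minimal sketch of the README's URL construction; the server and
# database names here are hypothetical placeholders.
server_name = "jdbc:sqlserver://yourserver.database.windows.net:1433"
database_name = "yourdb"

# Note the attribute is "databaseName" - the earlier README typo used a
# wrong attribute name for the database, which caused the failed login.
url = server_name + ";" + "databaseName=" + database_name + ";"

print(url)
# jdbc:sqlserver://yourserver.database.windows.net:1433;databaseName=yourdb;
```

The resulting string is what you pass to `.option("url", url)` in the read or write examples below.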

Secondly, since Azure Synapse / SQL DW operates in READ UNCOMMITTED, you need to specify the isolation level using mssqlIsolationLevel when you write to SQL DW using this connector (make sure you use overwrite or append as appropriate to your case, keeping in mind that overwrite will drop the destination table if it already exists). Note the syntax below for setting the isolation level:

try:
    df.write \
        .format("com.microsoft.sqlserver.jdbc.spark") \
        .mode("append") \
        .option("url", url) \
        .option("dbtable", table_name) \
        .option("user", username) \
        .option("password", password) \
        .option("mssqlIsolationLevel", "READ_UNCOMMITTED") \
        .save()
except ValueError as error:
    print("Connector write failed", error)

In closing, thank you for your interest in using the connector. Hopefully you can get to a good result with your tests.

from sql-spark-connector.

dgajendran commented on May 9, 2024

Thanks for the quick reply @arvindshmicrosoft. I am running it from my local terminal and use findspark to include the JARs. After adding the option .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver"), that error is gone. Now I get this: com.microsoft.sqlserver.jdbc.SQLServerException: Invalid object name 'dbo.job_tbl'

However, I am 100% sure this table exists in the DB. I saw some resources suggesting that Java matches the table name case-sensitively and therefore cannot find it, but I'm not exactly sure what's causing this.


dgajendran commented on May 9, 2024

@arvindshmicrosoft Thanks for the replies. I found the cause with the help of my teammate: I didn't have owner permission on the DB. Once I was granted it, the table started showing up.


shivsood commented on May 9, 2024

Never tried this with SQL DW. From the error it does look like an auth issue. My understanding is that we'll need a separate connector for SQL DW. @rajmera3 @arvindshmicrosoft for requirements.


arvindshmicrosoft commented on May 9, 2024

@ederfdias could you please clarify if using the specialized SQL DW connector within Databricks is an option for you? That leverages the distributed data loading native to SQL DW / Synapse (PolyBase) and might be faster than using a traditional TDS connection such as what is exposed by this connector...


arvindshmicrosoft commented on May 9, 2024

@ederfdias I took a quick look by doing this myself. While this connector itself (as mentioned by @shivsood) is not fully tested with Azure Synapse Analytics (a.k.a. Azure SQL DW) you can proceed (with a certain degree of risk) if you set the transaction isolation level to READ_UNCOMMITTED. Also, the problem with authentication is most likely due to typos in the sample scripts in our README. I have submitted a PR #10 to fix these.

As a reminder, from Azure Databricks, you will still be better off with the specialized SQL DW connector within Databricks as that is the recommended and supported connector for that scenario.


ederfdias commented on May 9, 2024

Hi @arvindshmicrosoft, I was expecting this library to have better performance than the library you mentioned, but based on your comment it looks like that is not the case, right?
Do you know if there are plans to improve this library for saving data to SQL DW?


ederfdias commented on May 9, 2024

Thanks a lot @arvindshmicrosoft


shivsood commented on May 9, 2024

Leaving this open as an enhancement request - supporting SQL DW writes.


KristoR commented on May 9, 2024

I'm interested in this issue getting implemented.

I have a similar problem: I need to write to Synapse from Databricks.
I'm getting the following error:
SQLServerException: 4022;Bulk load data was expected but not sent. The batch will be terminated.

The Databricks SQL DW connector is not an option, since it does not support service principal auth.


arvindshmicrosoft commented on May 9, 2024

@KristoR - that error message is interesting. Would you be able to share a minimal repro, so that we can look at it?


dgajendran commented on May 9, 2024

@arvindshmicrosoft Related error. I get java.sql.SQLException: No suitable driver error when trying to read from synapse DB. Here is the snippet:
jdbc_db = self.spark.read.format("com.microsoft.sqlserver.jdbc.spark") \
    .option("url", "jdbc:sqlserver://server.database.windows.net:1433") \
    .option("dbtable", "config.dockerImages") \
    .option("accessToken", access_token) \
    .option("encrypt", "true") \
    .option("databaseName", "dbname") \
    .option("hostNameInCertificate", "*.database.windows.net") \
    .load()


arvindshmicrosoft commented on May 9, 2024

@arvindshmicrosoft Related error. I get java.sql.SQLException: No suitable driver error when trying to read from synapse DB. Here is the snippet:
jdbc_db = self.spark.read.format("com.microsoft.sqlserver.jdbc.spark") \
    .option("url", "jdbc:sqlserver://server.database.windows.net:1433") \
    .option("dbtable", "config.dockerImages") \
    .option("accessToken", access_token) \
    .option("encrypt", "true") \
    .option("databaseName", "dbname") \
    .option("hostNameInCertificate", "*.database.windows.net") \
    .load()

On first glance, it appears you might be missing the JAR, or it might not be on the classpath. Where are you executing this?
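For context, a "No suitable driver" error typically means the connector and JDBC driver JARs are not on Spark's classpath. Two common ways to supply them when launching from a local terminal are sketched below; the Maven coordinate version and the file paths are examples only, so check the repo's releases for the coordinate matching your Spark/Scala version:

```shell
# Option 1: let Spark resolve the connector from Maven Central
# (version shown is an example; pick the release for your Spark version)
pyspark --packages com.microsoft.azure:spark-mssql-connector_2.12:1.2.0

# Option 2: point Spark at JARs you already downloaded (paths are placeholders)
pyspark --jars /path/to/spark-mssql-connector.jar,/path/to/mssql-jdbc.jar
```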


arvindshmicrosoft commented on May 9, 2024

please append databaseName=dbname; to the JDBC URL (with the semicolon delimiter of course). I believe that is required to use the correct database context.
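As a minimal sketch of that suggestion (the server and database names below are placeholders), appending the attribute with its semicolon delimiter looks like:

```python
# Hypothetical example: appending databaseName to an existing JDBC URL
# so the connection uses the intended database context.
base_url = "jdbc:sqlserver://server.database.windows.net:1433"
url = base_url + ";databaseName=dbname;"

# pass `url` to .option("url", url) on the reader or writer
```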


dgajendran commented on May 9, 2024

Yes, I did that. It's still not reading from the database I specify; instead, it only finds tables in the master DB. Not sure how to supply my database name.


arvindshmicrosoft commented on May 9, 2024

I had a typo in my previous reply - the attribute in the URL needs to be databaseName=dbname; and not database=dbname;


dgajendran commented on May 9, 2024

Yes, I tried this as well, and added databaseName both in the URL and as an option. But the code didn't point to the DB I wanted; it always goes to master.

