
Comments (17)

arvindshmicrosoft commented on May 9, 2024

@ederfdias - I have categorically mentioned that results might be better with the optimized SQL DW connector. Your results will depend on data volumes, schema, etc. That said, it is best if you evaluate for yourself: it is quite possible that our new connector is faster for smaller datasets as well, so you should definitely test.

The failed login you hit was caused by typos in the sample script in the README, which have since been corrected. Specifically, the JDBC URL used a wrong attribute for the database name; it is now corrected to databaseName in the README. Please check that:

url = server_name + ";" + "databaseName=" + database_name + ";"
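To make the concatenation above concrete, here is a minimal sketch; the server and database names below are placeholders I have made up, not values from the README:

```python
# Minimal sketch of the README's URL construction; the server and
# database names here are hypothetical placeholders.
server_name = "jdbc:sqlserver://yourserver.database.windows.net:1433"
database_name = "yourdb"

# Note the attribute is "databaseName" - the earlier README typo used a
# wrong attribute name for the database, which caused the failed login.
url = server_name + ";" + "databaseName=" + database_name + ";"

print(url)
# jdbc:sqlserver://yourserver.database.windows.net:1433;databaseName=yourdb;
```

The resulting string is what you pass to `.option("url", url)` in the read or write examples below.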

Secondly, since Azure Synapse / SQL DW operates in READ UNCOMMITTED, you need to specify the isolation level using mssqlIsolationLevel when you write to SQL DW using this connector (make sure you use overwrite or append as appropriate to your case, keeping in mind that overwrite will drop the destination table if it already exists). Note the syntax below for setting the isolation level:

try:
    df.write \
        .format("com.microsoft.sqlserver.jdbc.spark") \
        .mode("append") \
        .option("url", url) \
        .option("dbtable", table_name) \
        .option("user", username) \
        .option("password", password) \
        .option("mssqlIsolationLevel", "READ_UNCOMMITTED") \
        .save()
except ValueError as error:
    print("Connector write failed", error)

In closing, thank you for your interest in using the connector. Hopefully you can get to a good result with your tests.

from sql-spark-connector.

dgajendran commented on May 9, 2024

Thanks for the quick reply @arvindshmicrosoft. I am running it from my local terminal and use findspark to include the JARs. After adding the option .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver"), that error is gone. Now I get this: com.microsoft.sqlserver.jdbc.SQLServerException: Invalid object name 'dbo.job_tbl'

However, I am 100% sure this table exists in the DB. I saw some resources suggesting that Java matches the table name case-sensitively and therefore cannot find it, but I'm not exactly sure what's causing this.


dgajendran commented on May 9, 2024

@arvindshmicrosoft Thanks for the replies. I found the cause with the help of my teammate: I didn't have owner permission on the DB. Once I was granted it, the table started showing up.


shivsood commented on May 9, 2024

Never tried this with SQL DW. From the error it does look like an auth issue. My understanding is that we'll need a separate connector for SQL DW. @rajmera3 @arvindshmicrosoft for requirements.


arvindshmicrosoft commented on May 9, 2024

@ederfdias could you please clarify if using the specialized SQL DW connector within Databricks is an option for you? That leverages the distributed data loading native to SQL DW / Synapse (PolyBase) and might be faster than using a traditional TDS connection such as what is exposed by this connector...


arvindshmicrosoft commented on May 9, 2024

@ederfdias I took a quick look by doing this myself. While this connector itself (as mentioned by @shivsood) is not fully tested with Azure Synapse Analytics (a.k.a. Azure SQL DW) you can proceed (with a certain degree of risk) if you set the transaction isolation level to READ_UNCOMMITTED. Also, the problem with authentication is most likely due to typos in the sample scripts in our README. I have submitted a PR #10 to fix these.

As a reminder, from Azure Databricks, you will still be better off with the specialized SQL DW connector within Databricks as that is the recommended and supported connector for that scenario.


ederfdias commented on May 9, 2024

Hi @arvindshmicrosoft, I was expecting this library to have better performance than the library you mentioned, but based on your comment it looks like that is not the case, right?
Do you know if there are plans to improve this library for saving data to SQL DW?


ederfdias commented on May 9, 2024

Thanks a lot @arvindshmicrosoft


shivsood commented on May 9, 2024

Leaving this open as an enhancement request - supporting SQL DW writes.


KristoR commented on May 9, 2024

I'm interested in this issue getting implemented.

I have a similar problem: I need to write to Synapse from Databricks.
I'm getting the following error:
SQLServerException: 4022;Bulk load data was expected but not sent. The batch will be terminated.

The Databricks SQL DW connector is not an option, since it does not support service principal auth.


arvindshmicrosoft commented on May 9, 2024

@KristoR - that error message is interesting. Would you be able to share a minimal repro, so that we can look at it?


dgajendran commented on May 9, 2024

@arvindshmicrosoft Related error. I get java.sql.SQLException: No suitable driver error when trying to read from synapse DB. Here is the snippet:
jdbc_db = self.spark.read.format("com.microsoft.sqlserver.jdbc.spark") \
    .option("url", "jdbc:sqlserver://server.database.windows.net:1433") \
    .option("dbtable", "config.dockerImages") \
    .option("accessToken", access_token) \
    .option("encrypt", "true") \
    .option("databaseName", "dbname") \
    .option("hostNameInCertificate", "*.database.windows.net") \
    .load()


arvindshmicrosoft commented on May 9, 2024

@arvindshmicrosoft Related error. I get java.sql.SQLException: No suitable driver error when trying to read from synapse DB. Here is the snippet:
jdbc_db = self.spark.read.format("com.microsoft.sqlserver.jdbc.spark") \
    .option("url", "jdbc:sqlserver://server.database.windows.net:1433") \
    .option("dbtable", "config.dockerImages") \
    .option("accessToken", access_token) \
    .option("encrypt", "true") \
    .option("databaseName", "dbname") \
    .option("hostNameInCertificate", "*.database.windows.net") \
    .load()

On first glance, it appears you might be missing the JAR, or it might not be on the classpath. Where are you executing this?
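For context, a "No suitable driver" error typically means the connector and JDBC driver JARs are not on Spark's classpath. Two common ways to supply them when launching from a local terminal are sketched below; the Maven coordinate version and the file paths are examples only, so check the repo's releases for the coordinate matching your Spark/Scala version:

```shell
# Option 1: let Spark resolve the connector from Maven Central
# (version shown is an example; pick the release for your Spark version)
pyspark --packages com.microsoft.azure:spark-mssql-connector_2.12:1.2.0

# Option 2: point Spark at JARs you already downloaded (paths are placeholders)
pyspark --jars /path/to/spark-mssql-connector.jar,/path/to/mssql-jdbc.jar
```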


arvindshmicrosoft commented on May 9, 2024

please append databaseName=dbname; to the JDBC URL (with the semicolon delimiter of course). I believe that is required to use the correct database context.
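As a minimal sketch of that suggestion (the server and database names below are placeholders), appending the attribute with its semicolon delimiter looks like:

```python
# Hypothetical example: appending databaseName to an existing JDBC URL
# so the connection uses the intended database context.
base_url = "jdbc:sqlserver://server.database.windows.net:1433"
url = base_url + ";databaseName=dbname;"

# pass `url` to .option("url", url) on the reader or writer
```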


dgajendran commented on May 9, 2024

Yes, I did that. It's still not reading from the database I specify; instead, it only finds tables in the master DB. Not sure how to supply my database name.


arvindshmicrosoft commented on May 9, 2024

I had a typo in my previous reply - the attribute in the URL needs to be databaseName=dbname; and not database=dbname;


dgajendran commented on May 9, 2024

Yes, I tried this as well, and added databaseName both in the URL and as an option. But the code didn't point to the DB I wanted; it always goes to master.

