Comments (17)
@ederfdias - I have categorically mentioned that results might be better with the optimized SQL DW connector. Your results will depend on data volumes, schema, etc. That said, it is best if you evaluate for yourself. It is quite possible that for smaller datasets our new connector is faster as well - you should definitely test.
The problem you had with the failed login is because the sample script in the README had some typos, which have since been corrected - specifically, the JDBC URL had the wrong attribute for the database name. It is now corrected to `databaseName` in the README - please check that:

```python
url = server_name + ";" + "databaseName=" + database_name + ";"
```
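For reference, here is a minimal sketch of assembling the full JDBC URL with that corrected attribute (the server and database values below are placeholders, not from the README):

```python
# Sketch: building the JDBC URL with the corrected databaseName attribute.
# server_name and database_name are placeholder values - substitute your own.
server_name = "jdbc:sqlserver://yourserver.database.windows.net:1433"
database_name = "yourdb"

url = server_name + ";" + "databaseName=" + database_name + ";"
# -> "jdbc:sqlserver://yourserver.database.windows.net:1433;databaseName=yourdb;"
```

Note the attribute is `databaseName=`, not `database=`; the latter is silently ignored and the connection falls back to the login's default database.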
Secondly, since Azure Synapse / SQL DW operates in READ UNCOMMITTED, you need to specify the isolation level using `mssqlIsolationLevel` when you write to SQL DW using this connector (make sure you use `overwrite` or `append` as appropriate to your case, keeping in mind that `overwrite` will drop the destination table if it already exists). Note the syntax below for setting the isolation level:
```python
try:
    df.write \
        .format("com.microsoft.sqlserver.jdbc.spark") \
        .mode("append") \
        .option("url", url) \
        .option("dbtable", table_name) \
        .option("user", username) \
        .option("password", password) \
        .option("mssqlIsolationLevel", "READ_UNCOMMITTED") \
        .save()
except ValueError as error:
    print("Connector write failed", error)
```
In closing, thank you for your interest in using the connector. Hopefully you can get to a good result with your tests.
from sql-spark-connector.
Thanks for the quick reply @arvindshmicrosoft. I am running it from my local terminal. I use findspark to include the jars. After adding the option `option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")`, that error is gone. Now I get this: `com.microsoft.sqlserver.jdbc.SQLServerException: Invalid object name 'dbo.job_tbl'`
However, I am 100% sure this table exists in the DB. I saw some resources saying Java tries to match the table name with a different letter case and hence doesn't find the table. Not exactly sure what's causing this.
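One way to see what the connection actually resolves is to query `INFORMATION_SCHEMA.TABLES` through the same connection. This is a hedged sketch, assuming the same `url` and credentials as the failing write, and using the common JDBC trick of passing a parenthesized subquery as the `dbtable` option:

```python
# Build an INFORMATION_SCHEMA probe listing the tables visible to the login.
# Wrapping the query as "(...) AS t" lets it be passed via the dbtable option.
def visible_tables_query(schema="dbo"):
    return (
        "(SELECT TABLE_SCHEMA, TABLE_NAME "
        "FROM INFORMATION_SCHEMA.TABLES "
        f"WHERE TABLE_SCHEMA = '{schema}') AS t"
    )

def list_visible_tables(spark, url, username, password, schema="dbo"):
    # Requires a live connection; shown for illustration only.
    return (spark.read.format("com.microsoft.sqlserver.jdbc.spark")
            .option("url", url)
            .option("dbtable", visible_tables_query(schema))
            .option("user", username)
            .option("password", password)
            .load())
```

If `dbo.job_tbl` does not show up in the result, the login is either connected to a different database context or lacks permission on the object.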
@arvindshmicrosoft Thanks for the replies. I found the error with the help of my teammate. I didn't have owner permission to access the DB. I got it and then it started showing up.
Never tried this with SQL DW. From the error it does look like an auth issue. My understanding is that we'll need a separate connector for SQL DW. @rajmera3 @arvindshmicrosoft for reqs.
@ederfdias could you please clarify if using the specialized SQL DW connector within Databricks is an option for you? That leverages the distributed data loading native to SQL DW / Synapse (PolyBase) and might be faster than using a traditional TDS connection such as what is exposed by this connector...
@ederfdias I took a quick look by doing this myself. While this connector itself (as mentioned by @shivsood) is not fully tested with Azure Synapse Analytics (a.k.a. Azure SQL DW) you can proceed (with a certain degree of risk) if you set the transaction isolation level to READ_UNCOMMITTED. Also, the problem with authentication is most likely due to typos in the sample scripts in our README. I have submitted a PR #10 to fix these.
As a reminder, from Azure Databricks, you will still be better off with the specialized SQL DW connector within Databricks as that is the recommended and supported connector for that scenario.
Hi @arvindshmicrosoft, I was expecting that this library would have better performance than the library you mentioned, but based on your comment it looks like that is not the case. Right?
Do you know if there are plans to improve this library for saving data to SQL DW?
Thanks a lot @arvindshmicrosoft
Leaving this open as an enhancement request: supporting SQL DW writes.
I'm interested in this issue getting implemented.
I have a similar problem: I need to write to Synapse from Databricks.
I'm getting the following error: `SQLServerException: 4022;Bulk load data was expected but not sent. The batch will be terminated.`
The Databricks SQL DW connector is not an option, since it does not support service principal auth.
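For what it's worth, this connector's `accessToken` option (shown in a later snippet in this thread) can carry a service-principal token. A hedged sketch of acquiring one with the `msal` library (assumed available; the tenant/client values are placeholders):

```python
# Scope for Azure SQL / Synapse when requesting an AAD token.
SQL_SCOPE = "https://database.windows.net/.default"

def service_principal_token(tenant_id, client_id, client_secret):
    # Requires the msal package and network access; illustration only.
    import msal
    app = msal.ConfidentialClientApplication(
        client_id,
        authority=f"https://login.microsoftonline.com/{tenant_id}",
        client_credential=client_secret,
    )
    result = app.acquire_token_for_client(scopes=[SQL_SCOPE])
    return result["access_token"]
```

The returned token would then be passed as `.option("accessToken", token)` instead of user/password.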
@KristoR - that error message is interesting. Would you be able to share a minimal repro, so that we can look at it?
@arvindshmicrosoft Related error. I get a `java.sql.SQLException: No suitable driver` error when trying to read from a Synapse DB. Here is the snippet:

```python
jdbc_db = self.spark.read.format("com.microsoft.sqlserver.jdbc.spark") \
    .option("url", "jdbc:sqlserver://server.database.windows.net:1433") \
    .option("dbtable", "config.dockerImages") \
    .option("accessToken", access_token) \
    .option("encrypt", "true") \
    .option("databaseName", "dbname") \
    .option("hostNameInCertificate", "*.database.windows.net") \
    .load()
```
On first glance, it appears you might be missing the JAR, or it might not be on the classpath. Where are you executing this?
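For anyone hitting the same `No suitable driver` error outside Databricks: the connector jar has to be on the driver's classpath. One way is via `spark.jars.packages`; a sketch, where the Maven coordinate and version are illustrative - check the project's releases for the coordinate matching your Spark/Scala version:

```python
# Illustrative Maven coordinate; verify against the project's releases.
CONNECTOR_PACKAGE = "com.microsoft.azure:spark-mssql-connector_2.12:1.2.0"

def build_session(app_name="mssql-connector-test", package=CONNECTOR_PACKAGE):
    # Requires pyspark to be installed; shown for illustration only.
    from pyspark.sql import SparkSession
    return (SparkSession.builder
            .appName(app_name)
            .config("spark.jars.packages", package)
            .getOrCreate())
```

The equivalent on the command line would be passing the same coordinate to `spark-submit --packages`.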
Please append `databaseName=dbname;` to the JDBC URL (with the semicolon delimiter, of course). I believe that is required to use the correct database context.
Yes, I did that, but it's not reading from the database I give. Instead, it's only taking tables from the master DB. Not sure how to supply my database name.
I had a typo in my previous reply - the attribute in the URL needs to be `databaseName=dbname;` and not `database=dbname;`.
Yes, I tried this as well, and added databaseName both in the URL and as an option. But the code didn't point to the DB I wanted - it always goes to master.