Comments (4)
I'm getting the same error using:
13.3 LTS (includes Apache Spark 3.4.1, Scala 2.12)
spark_mssql_connector_2_12_1_4_0_BETA.jar
I can see the table itself is created before the error with columns in it
Standard JDBC is working but processed only 2M rows in 2 hours
from sql-spark-connector.
I found the problem on my side, so please check it on yours.
The problem was between differences in data types in Databricks and SQL server and more precisely between TIMESTAMP columns in the table in Databricks (source table) and DATETIME data type for the column in SQL Server (target table). In the SQL Server, I changed the datatype to DATETIM2 and CAST() the column in Databricks ETL before writing the data to the SQL server and everything started working.
Unfortunately, the message is quite unuseful. I hope MS/Databricks to consider this and return more useful error messages in these cases.
Best Regards,
Roza
from sql-spark-connector.
Interesting, I am going to play around at bit with the datatypes. The weird thing is that I was creating a new table without specifying any datatypes. Probably should because with the standard JDBC all strings become nvarchar(max)
Just creating a new table using this:
df.write
.format("com.microsoft.sqlserver.jdbc.spark")
.mode("overwrite")
.option("url", url)
.option("dbtable", table_name)
.option("user", username)
.option("password", password)
.save()
from sql-spark-connector.
For anyone interested, we faced similar issue recently and the solution is to recast the datatypes on PySpark DataFrame. It looks like that for some reason, the write to SQL works more reliable with strong casting. We added this step before every write in ETL pipelines.
for column in df.schema:
df = df.withColumn(column.name, col(column.name).cast(column.dataType))
If you use .mode(“overwrite”)
, you will probably not see any issue because this will drop and recreate the table in SQL with datatypes inferred from the DataFrame. However, if you want to use the table with predefined datatypes without actually dropping the table before write (with .mode(“overwrite”)
alongside with .option(“truncate”, True)
), I strongly suggest recasting with the code snippet above.
from sql-spark-connector.
Related Issues (20)
- java.lang.ClassNotFoundException: com.microsoft.sqlserver.jdbc.SQLServerDriver HOT 1
- writing with mode "append" to an existing table only rolls back faulty rows w/o "NO_DUPLICATES"
- [Question] how to set MAXDOP
- Error while writing - com.microsoft.sqlserver.jdbc.SQLServerException: The connection Is closed. HOT 1
- Missing Maven distros HOT 6
- TCP/IP connection to the host whatever.database.windows.net, port 1433 has failed. Error: "null. Verify the connection properties. HOT 3
- Need the ability to use a linked service in a notebook to do a connection to a database
- Don't run the AFTER insert trigger
- Possible Wheel (.whl) distribution? HOT 1
- GA versions HOT 2
- TIMESTAMP to datetime2 fails on Azure Synapse HOT 1
- Version 8.4.1 of Microsoft JDBC Driver For SQL Server
- [Request] Support Spark 3.5 HOT 1
- Does bulk upsert data import support?
- Support for Spark 3.4.x HOT 2
- Assessing the risk of duplicated entries for BEST_EFFORT reliabilityLevel HOT 1
- sql-spark-connector issue where the access token expires post 60 mins HOT 1
- Intermittent Authentication Failure using ActiveDirectoryPassword
- The project is dead or archived? HOT 3
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from sql-spark-connector.