Comments (3)
Full callstack / error details please - without this it is not actionable. The inner error will contain information about what caused the job to be aborted.
from azure-cosmosdb-spark.
Py4JJavaError Traceback (most recent call last)
in
----> 1 update_entities(df_input)
in update_entities(df)
15 def update_entities(df):
16 print("Starting ingestion: ", datetime.datetime.utcnow().strftime("%Y-%m-%d %H:%M:%S.%f"))
---> 17 df \
18 .write \
19 .format("cosmos.oltp") \
/databricks/spark/python/pyspark/sql/readwriter.py in save(self, path, format, mode, partitionBy, **options)
736 self.format(format)
737 if path is None:
--> 738 self._jwrite.save()
739 else:
740 self._jwrite.save(path)
/databricks/spark/python/lib/py4j-0.10.9.1-src.zip/py4j/java_gateway.py in __call__(self, *args)
1302
1303 answer = self.gateway_client.send_command(command)
-> 1304 return_value = get_return_value(
1305 answer, self.gateway_client, self.target_id, self.name)
1306
/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
115 def deco(*a, **kw):
116 try:
--> 117 return f(*a, **kw)
118 except py4j.protocol.Py4JJavaError as e:
119 converted = convert_exception(e.java_exception)
/databricks/spark/python/lib/py4j-0.10.9.1-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
325 if answer[1] == REFERENCE_TYPE:
--> 326 raise Py4JJavaError(
327 "An error occurred while calling {0}{1}{2}.\n".
328 format(target_id, ".", name), value)
Py4JJavaError: An error occurred while calling o3320.save.
: org.apache.spark.SparkException: Writing job aborted
at org.apache.spark.sql.errors.QueryExecutionErrors$.writingJobAbortedError(QueryExecutionErrors.scala:727)
at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2(WriteToDataSourceV2Exec.scala:395)
at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2$(WriteToDataSourceV2Exec.scala:339)
at org.apache.spark.sql.execution.datasources.v2.AppendDataExec.writeWithV2(WriteToDataSourceV2Exec.scala:241)
at org.apache.spark.sql.execution.datasources.v2.V2ExistingTableWriteExec.run(WriteToDataSourceV2Exec.scala:318)
at org.apache.spark.sql.execution.datasources.v2.V2ExistingTableWriteExec.run$(WriteToDataSourceV2Exec.scala:317)
at org.apache.spark.sql.execution.datasources.v2.AppendDataExec.run(WriteToDataSourceV2Exec.scala:241)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$1(QueryExecution.scala:130)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$5(SQLExecution.scala:172)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:319)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:146)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:854)
at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:113)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:269)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:130)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:126)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:485)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:86)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:485)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:268)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:264)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:461)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$eagerlyExecuteCommands$1(QueryExecution.scala:126)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:324)
at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:126)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:111)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:102)
at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:153)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:947)
at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:346)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:258)
at sun.reflect.GeneratedMethodAccessor426.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:295)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:251)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 25 in stage 651.0 failed 4 times, most recent failure: Lost task 25.3 in stage 651.0 (TID 12744) (10.139.64.4 executor 0): {"ClassName":"BulkOperationFailedException","userAgent":"azsdk-java-cosmos/4.32.0-snapshot.1 Linux/5.4.0-1080-azure JRE/1.8.0_302","statusCode":409,"resourceAddress":null,"innerErrorMessage":"All retries exhausted for 'UPSERT' bulk operation - statusCode=[409:0] itemId=[e75841c9-534b-423f-b397-afbc2ad7d427], partitionKeyValue=[[{}]]","causeInfo":null,"responseHeaders":"{x-ms-substatus=0}"}
at com.azure.cosmos.spark.BulkWriter.handleNonSuccessfulStatusCode(BulkWriter.scala:396)
at com.azure.cosmos.spark.BulkWriter.$anonfun$subscriptionDisposable$2(BulkWriter.scala:168)
at com.azure.cosmos.spark.BulkWriter.$anonfun$subscriptionDisposable$2$adapted(BulkWriter.scala:144)
at azure_cosmos_spark.reactor.core.scala.publisher.package$.$anonfun$scalaConsumer2JConsumer$2(package.scala:45)
at azure_cosmos_spark.reactor.core.scala.publisher.package$.$anonfun$scalaConsumer2JConsumer$2$adapted(package.scala:45)
at scala.Option.foreach(Option.scala:407)
at azure_cosmos_spark.reactor.core.scala.publisher.package$.$anonfun$scalaConsumer2JConsumer$1(package.scala:45)
at azure_cosmos_spark.reactor.core.publisher.LambdaSubscriber.onNext(LambdaSubscriber.java:160)
at azure_cosmos_spark.reactor.core.publisher.FluxPublishOn$PublishOnSubscriber.runAsync(FluxPublishOn.java:440)
at azure_cosmos_spark.reactor.core.publisher.FluxPublishOn$PublishOnSubscriber.run(FluxPublishOn.java:527)
at azure_cosmos_spark.reactor.core.scheduler.WorkerTask.call(WorkerTask.java:84)
at azure_cosmos_spark.reactor.core.scheduler.WorkerTask.call(WorkerTask.java:37)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2973)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2920)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2914)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2914)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1334)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1334)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1334)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3182)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3123)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3111)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:1096)
at org.apache.spark.SparkContext.runJobInternal(SparkContext.scala:2494)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2477)
at org.apache.spark.sql.execution.datasources.v2.V2TableWriteExec.writeWithV2(WriteToDataSourceV2Exec.scala:363)
... 45 more
Caused by: {"ClassName":"BulkOperationFailedException","userAgent":"azsdk-java-cosmos/4.32.0-snapshot.1 Linux/5.4.0-1080-azure JRE/1.8.0_302","statusCode":409,"resourceAddress":null,"innerErrorMessage":"All retries exhausted for 'UPSERT' bulk operation - statusCode=[409:0] itemId=[e75841c9-534b-423f-b397-afbc2ad7d427], partitionKeyValue=[[{}]]","causeInfo":null,"responseHeaders":"{x-ms-substatus=0}"}
at com.azure.cosmos.spark.BulkWriter.handleNonSuccessfulStatusCode(BulkWriter.scala:396)
at com.azure.cosmos.spark.BulkWriter.$anonfun$subscriptionDisposable$2(BulkWriter.scala:168)
at com.azure.cosmos.spark.BulkWriter.$anonfun$subscriptionDisposable$2$adapted(BulkWriter.scala:144)
at azure_cosmos_spark.reactor.core.scala.publisher.package$.$anonfun$scalaConsumer2JConsumer$2(package.scala:45)
at azure_cosmos_spark.reactor.core.scala.publisher.package$.$anonfun$scalaConsumer2JConsumer$2$adapted(package.scala:45)
at scala.Option.foreach(Option.scala:407)
at azure_cosmos_spark.reactor.core.scala.publisher.package$.$anonfun$scalaConsumer2JConsumer$1(package.scala:45)
at azure_cosmos_spark.reactor.core.publisher.LambdaSubscriber.onNext(LambdaSubscriber.java:160)
at azure_cosmos_spark.reactor.core.publisher.FluxPublishOn$PublishOnSubscriber.runAsync(FluxPublishOn.java:440)
at azure_cosmos_spark.reactor.core.publisher.FluxPublishOn$PublishOnSubscriber.run(FluxPublishOn.java:527)
at azure_cosmos_spark.reactor.core.scheduler.WorkerTask.call(WorkerTask.java:84)
at azure_cosmos_spark.reactor.core.scheduler.WorkerTask.call(WorkerTask.java:37)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more
from azure-cosmosdb-spark.
This error
{"ClassName":"BulkOperationFailedException","userAgent":"azsdk-java-cosmos/4.32.0-snapshot.1 Linux/5.4.0-1080-azure JRE/1.8.0_302","statusCode":409,"resourceAddress":null,"innerErrorMessage":"All retries exhausted for 'UPSERT' bulk operation - statusCode=[409:0] itemId=[e75841c9-534b-423f-b397-afbc2ad7d427], partitionKeyValue=[[{}]]","causeInfo":null,"responseHeaders":"{x-ms-substatus=0}"}
for an upsert operation means that a unique key constraint is configured on your container and at least one document you are trying to insert violates that constraint.
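For context: a unique key policy is declared when the container is created and cannot be changed afterwards. A minimal sketch of such a policy with the azure-cosmos Python SDK, using hypothetical names (an "entities" container with partition key /pk and a unique key on /email - your actual container and paths will differ):

from azure.cosmos import CosmosClient, PartitionKey

# Placeholder endpoint, key, and names - not values taken from this issue.
client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<account-key>")
database = client.get_database_client("mydb")

# Within each logical partition (each distinct /pk value), no two documents
# may have the same /email value; inserting a second document with the same
# /email but a different id fails with 409.
container = database.create_container(
    id="entities",
    partition_key=PartitionKey(path="/pk"),
    unique_key_policy={"uniqueKeys": [{"paths": ["/email"]}]},
)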
See this section of the Ingestion best practices for more details.
You need to fix the input data so it does not violate the unique key constraint before you can ingest it.
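A minimal PySpark sketch of that cleanup before the write, using the same hypothetical paths as above (/pk as partition key, /email as unique key) and placeholder connection options - adjust the column names and options to your container:

from pyspark.sql import functions as F

# Unique keys are enforced per logical partition, so group by the partition
# key column plus the unique key column(s) to surface the offending rows.
violations = df_input.groupBy("pk", "email").count().filter(F.col("count") > 1)
violations.show(truncate=False)

# Keep one row per (pk, email) pair. Note this only removes duplicates within
# the batch itself; a 409 can also come from a document already in the
# container that carries the same unique key values under a different id.
deduped = df_input.dropDuplicates(["pk", "email"])

(deduped.write
    .format("cosmos.oltp")
    .option("spark.cosmos.accountEndpoint", "https://<account>.documents.azure.com:443/")
    .option("spark.cosmos.accountKey", "<account-key>")
    .option("spark.cosmos.database", "mydb")
    .option("spark.cosmos.container", "entities")
    .option("spark.cosmos.write.strategy", "ItemOverwrite")  # upsert semantics
    .mode("append")
    .save())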
from azure-cosmosdb-spark.
Related Issues (20)
- Read Data after enabling TTL on container level HOT 5
- Cosmos db OLTP connector custom query with TOP or Limit clause returns wrong number of documents HOT 1
- Using changefeedmaxpagesperbatch = 1 on streaming
- CosmosDBSpark.load(ss,readConfig) failing with SSL exception.
- Snapshot Isolation Guarantees HOT 1
- Unable to read edges from Cosmos DB
- Cosmos Changefeed Spark Streaming stops suddenly
- Performance issues observed after updating from azure-cosmosdb-spark_2.4.0_2.11-2.0.1 to azure-cosmosdb-spark_2.4.0_2.11-3.70 HOT 2
- Can't write CosmosDB with call stack HOT 1
- Question - does this connector work with java SpringBoot? HOT 1
- Help - Cosmos Spark OLTP 3 Connector With SpringBoot Issue HOT 1
- Unable to Write or Read (un-encrypted) data from Cosmos DB Containers created via Cosmos DB SDK Encryption HOT 1
- Help - How to get Azure Cosmos Db's total Ru's for one year
- itemPatch API Does not work with / HOT 3
- Cannot install on Databricks 11.3 LTS (Spark 3.3.0) HOT 2
- This repo is missing important files
- spark.cosmos.changeFeed.itemCountPerTriggerHint not honoured HOT 1
- Hierarchical partition keys support with spark connector
- Problem with library: com.azure.cosmos.spark:azure-cosmos-spark_3-3_2-12:4.23.0