ashkapsky / bigdatalog
License: Apache License 2.0
I am facing the same error when writing a pyspark.pandas DataFrame to Delta:
*********** An error occurred while calling o11455.save.
: org.apache.spark.SparkException: Job aborted.
at org.apache.spark.sql.errors.QueryExecutionErrors$.jobAbortedError(QueryExecutionErrors.scala:882)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$1(FileFormatWriter.scala:334)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:154)
at com.databricks.sql.transaction.tahoe.commands.WriteIntoDeltaCommand.run(WriteIntoDeltaCommand.scala:70)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.$anonfun$writeFiles$11(TransactionalWriteEdge.scala:571)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$8(SQLExecution.scala:250)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:400)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:195)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:985)
at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:149)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:350)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.$anonfun$writeFiles$1(TransactionalWriteEdge.scala:571)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.withOperationTypeTag(DeltaLogging.scala:193)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.withOperationTypeTag$(DeltaLogging.scala:180)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.withOperationTypeTag(OptimisticTransaction.scala:112)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.$anonfun$recordDeltaOperationInternal$2(DeltaLogging.scala:157)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordFrameProfile(DeltaLogging.scala:262)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordFrameProfile$(DeltaLogging.scala:260)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.recordFrameProfile(OptimisticTransaction.scala:112)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.$anonfun$recordDeltaOperationInternal$1(DeltaLogging.scala:156)
at com.databricks.logging.UsageLogging.$anonfun$recordOperation$1(UsageLogging.scala:547)
at com.databricks.logging.UsageLogging.executeThunkAndCaptureResultTags$1(UsageLogging.scala:642)
at com.databricks.logging.UsageLogging.$anonfun$recordOperationWithResultTags$4(UsageLogging.scala:663)
at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:404)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:147)
at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:402)
at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:399)
at com.databricks.spark.util.PublicDBLogging.withAttributionContext(DatabricksSparkUsageLogger.scala:26)
at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:447)
at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:432)
at com.databricks.spark.util.PublicDBLogging.withAttributionTags(DatabricksSparkUsageLogger.scala:26)
at com.databricks.logging.UsageLogging.recordOperationWithResultTags(UsageLogging.scala:637)
at com.databricks.logging.UsageLogging.recordOperationWithResultTags$(UsageLogging.scala:556)
at com.databricks.spark.util.PublicDBLogging.recordOperationWithResultTags(DatabricksSparkUsageLogger.scala:26)
at com.databricks.logging.UsageLogging.recordOperation(UsageLogging.scala:547)
at com.databricks.logging.UsageLogging.recordOperation$(UsageLogging.scala:517)
at com.databricks.spark.util.PublicDBLogging.recordOperation(DatabricksSparkUsageLogger.scala:26)
at com.databricks.spark.util.PublicDBLogging.recordOperation0(DatabricksSparkUsageLogger.scala:66)
at com.databricks.spark.util.DatabricksSparkUsageLogger.recordOperation(DatabricksSparkUsageLogger.scala:148)
at com.databricks.spark.util.UsageLogger.recordOperation(UsageLogger.scala:72)
at com.databricks.spark.util.UsageLogger.recordOperation$(UsageLogger.scala:59)
at com.databricks.spark.util.DatabricksSparkUsageLogger.recordOperation(DatabricksSparkUsageLogger.scala:107)
at com.databricks.spark.util.UsageLogging.recordOperation(UsageLogger.scala:433)
at com.databricks.spark.util.UsageLogging.recordOperation$(UsageLogger.scala:412)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.recordOperation(OptimisticTransaction.scala:112)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordDeltaOperationInternal(DeltaLogging.scala:155)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordDeltaOperation(DeltaLogging.scala:145)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordDeltaOperation$(DeltaLogging.scala:133)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.recordDeltaOperation(OptimisticTransaction.scala:112)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.$anonfun$recordWriteFilesOperation$1(TransactionalWriteEdge.scala:307)
at com.databricks.sql.acl.CheckPermissions$.trusted(CheckPermissions.scala:1837)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.recordWriteFilesOperation(TransactionalWriteEdge.scala:306)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.writeFiles(TransactionalWriteEdge.scala:339)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.writeFiles$(TransactionalWriteEdge.scala:333)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.writeFiles(OptimisticTransaction.scala:112)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.writeFiles(TransactionalWriteEdge.scala:621)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.writeFiles$(TransactionalWriteEdge.scala:611)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.writeFiles(OptimisticTransaction.scala:112)
at com.databricks.sql.transaction.tahoe.files.TransactionalWrite.writeFiles(TransactionalWrite.scala:226)
at com.databricks.sql.transaction.tahoe.files.TransactionalWrite.writeFiles$(TransactionalWrite.scala:223)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.writeFiles(OptimisticTransaction.scala:112)
at com.databricks.sql.transaction.tahoe.commands.WriteIntoDelta.write(WriteIntoDelta.scala:368)
at com.databricks.sql.transaction.tahoe.commands.WriteIntoDelta.$anonfun$run$2(WriteIntoDelta.scala:111)
at com.databricks.sql.transaction.tahoe.commands.WriteIntoDelta.$anonfun$run$2$adapted(WriteIntoDelta.scala:100)
at com.databricks.sql.transaction.tahoe.DeltaLog.withNewTransaction(DeltaLog.scala:312)
at com.databricks.sql.transaction.tahoe.commands.WriteIntoDelta.$anonfun$run$1(WriteIntoDelta.scala:100)
at com.databricks.sql.acl.CheckPermissions$.trusted(CheckPermissions.scala:1837)
at com.databricks.sql.transaction.tahoe.commands.WriteIntoDelta.run(WriteIntoDelta.scala:99)
at com.databricks.sql.transaction.tahoe.sources.DeltaDataSource.createRelation(DeltaDataSource.scala:168)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:49)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:80)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:78)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:89)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$2(QueryExecution.scala:247)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$8(SQLExecution.scala:250)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:400)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:195)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:985)
at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:149)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:350)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$1(QueryExecution.scala:247)
at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$withMVTagsIfNecessary(QueryExecution.scala:232)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:245)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:238)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:99)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:31)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:298)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:294)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:488)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$eagerlyExecuteCommands$1(QueryExecution.scala:238)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:354)
at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:238)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:192)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:183)
at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:274)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:965)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:430)
at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:339)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:251)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:306)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 43 in stage 360.0 failed 4 times, most recent failure: Lost task 43.3 in stage 360.0 (TID 9912) (172.16.177.134 executor 10): org.apache.spark.SparkException: Checkpoint block rdd_1208_8 not found! Either the executor that originally checkpointed this partition is no longer alive, or the original RDD is unpersisted. If this problem persists, you may consider using rdd.checkpoint() instead, which is slower than local checkpointing but more fault-tolerant.
at org.apache.spark.errors.SparkCoreErrors$.checkpointRDDBlockIdNotFoundError(SparkCoreErrors.scala:82)
at org.apache.spark.rdd.LocalCheckpointRDD.compute(LocalCheckpointRDD.scala:61)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:408)
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:423)
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1559)
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1486)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1550)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1369)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:421)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:372)
at org.apache.spark.rdd.ZippedWithIndexRDD.compute(ZippedWithIndexRDD.scala:71)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:122)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$3(ShuffleMapTask.scala:81)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$1(ShuffleMapTask.scala:81)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.doRunTask(Task.scala:174)
at org.apache.spark.scheduler.Task.$anonfun$run$4(Task.scala:137)
at com.databricks.unity.EmptyHandle$.runWithAndClose(UCSHandle.scala:126)
at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:137)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.Task.run(Task.scala:96)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$13(Executor.scala:902)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1697)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:905)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:760)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:3414)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:3336)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:3325)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:3325)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1433)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1433)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1433)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3625)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3563)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3551)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:51)
Caused by: org.apache.spark.SparkException: Checkpoint block rdd_1208_8 not found! Either the executor that originally checkpointed this partition is no longer alive, or the original RDD is unpersisted. If this problem persists, you may consider using rdd.checkpoint() instead, which is slower than local checkpointing but more fault-tolerant.
at org.apache.spark.errors.SparkCoreErrors$.checkpointRDDBlockIdNotFoundError(SparkCoreErrors.scala:82)
at org.apache.spark.rdd.LocalCheckpointRDD.compute(LocalCheckpointRDD.scala:61)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:408)
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:423)
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1559)
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1486)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1550)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1369)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:421)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:372)
at org.apache.spark.rdd.ZippedWithIndexRDD.compute(ZippedWithIndexRDD.scala:71)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:122)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$3(ShuffleMapTask.scala:81)
at com.data
*** WARNING: max output size exceeded, skipping output. ***
Logging.scala:193)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.withOperationTypeTag$(DeltaLogging.scala:180)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.withOperationTypeTag(OptimisticTransaction.scala:112)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.$anonfun$recordDeltaOperationInternal$2(DeltaLogging.scala:157)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordFrameProfile(DeltaLogging.scala:262)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordFrameProfile$(DeltaLogging.scala:260)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.recordFrameProfile(OptimisticTransaction.scala:112)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.$anonfun$recordDeltaOperationInternal$1(DeltaLogging.scala:156)
at com.databricks.logging.UsageLogging.$anonfun$recordOperation$1(UsageLogging.scala:547)
at com.databricks.logging.UsageLogging.executeThunkAndCaptureResultTags$1(UsageLogging.scala:642)
at com.databricks.logging.UsageLogging.$anonfun$recordOperationWithResultTags$4(UsageLogging.scala:663)
at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:404)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:147)
at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:402)
at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:399)
at com.databricks.spark.util.PublicDBLogging.withAttributionContext(DatabricksSparkUsageLogger.scala:26)
at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:447)
at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:432)
at com.databricks.spark.util.PublicDBLogging.withAttributionTags(DatabricksSparkUsageLogger.scala:26)
at com.databricks.logging.UsageLogging.recordOperationWithResultTags(UsageLogging.scala:637)
at com.databricks.logging.UsageLogging.recordOperationWithResultTags$(UsageLogging.scala:556)
at com.databricks.spark.util.PublicDBLogging.recordOperationWithResultTags(DatabricksSparkUsageLogger.scala:26)
at com.databricks.logging.UsageLogging.recordOperation(UsageLogging.scala:547)
at com.databricks.logging.UsageLogging.recordOperation$(UsageLogging.scala:517)
at com.databricks.spark.util.PublicDBLogging.recordOperation(DatabricksSparkUsageLogger.scala:26)
at com.databricks.spark.util.PublicDBLogging.recordOperation0(DatabricksSparkUsageLogger.scala:66)
at com.databricks.spark.util.DatabricksSparkUsageLogger.recordOperation(DatabricksSparkUsageLogger.scala:148)
at com.databricks.spark.util.UsageLogger.recordOperation(UsageLogger.scala:72)
at com.databricks.spark.util.UsageLogger.recordOperation$(UsageLogger.scala:59)
at com.databricks.spark.util.DatabricksSparkUsageLogger.recordOperation(DatabricksSparkUsageLogger.scala:107)
at com.databricks.spark.util.UsageLogging.recordOperation(UsageLogger.scala:433)
at com.databricks.spark.util.UsageLogging.recordOperation$(UsageLogger.scala:412)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.recordOperation(OptimisticTransaction.scala:112)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordDeltaOperationInternal(DeltaLogging.scala:155)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordDeltaOperation(DeltaLogging.scala:145)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordDeltaOperation$(DeltaLogging.scala:133)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.recordDeltaOperation(OptimisticTransaction.scala:112)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.$anonfun$recordWriteFilesOperation$1(TransactionalWriteEdge.scala:307)
at com.databricks.sql.acl.CheckPermissions$.trusted(CheckPermissions.scala:1837)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.recordWriteFilesOperation(TransactionalWriteEdge.scala:306)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.writeFiles(TransactionalWriteEdge.scala:339)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.writeFiles$(TransactionalWriteEdge.scala:333)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.writeFiles(OptimisticTransaction.scala:112)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.writeFiles(TransactionalWriteEdge.scala:621)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.writeFiles$(TransactionalWriteEdge.scala:611)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.writeFiles(OptimisticTransaction.scala:112)
at com.databricks.sql.transaction.tahoe.files.TransactionalWrite.writeFiles(TransactionalWrite.scala:226)
at com.databricks.sql.transaction.tahoe.files.TransactionalWrite.writeFiles$(TransactionalWrite.scala:223)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.writeFiles(OptimisticTransaction.scala:112)
at com.databricks.sql.transaction.tahoe.commands.WriteIntoDelta.write(WriteIntoDelta.scala:368)
at com.databricks.sql.transaction.tahoe.commands.WriteIntoDelta.$anonfun$run$2(WriteIntoDelta.scala:111)
at com.databricks.sql.transaction.tahoe.commands.WriteIntoDelta.$anonfun$run$2$adapted(WriteIntoDelta.scala:100)
at com.databricks.sql.transaction.tahoe.DeltaLog.withNewTransaction(DeltaLog.scala:312)
at com.databricks.sql.transaction.tahoe.commands.WriteIntoDelta.$anonfun$run$1(WriteIntoDelta.scala:100)
at com.databricks.sql.acl.CheckPermissions$.trusted(CheckPermissions.scala:1837)
at com.databricks.sql.transaction.tahoe.commands.WriteIntoDelta.run(WriteIntoDelta.scala:99)
at com.databricks.sql.transaction.tahoe.sources.DeltaDataSource.createRelation(DeltaDataSource.scala:168)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:49)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:80)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:78)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:89)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$2(QueryExecution.scala:247)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$8(SQLExecution.scala:250)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:400)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:195)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:985)
at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:149)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:350)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$1(QueryExecution.scala:247)
at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$withMVTagsIfNecessary(QueryExecution.scala:232)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:245)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:238)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:99)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:31)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:298)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:294)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:488)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$eagerlyExecuteCommands$1(QueryExecution.scala:238)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:354)
at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:238)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:192)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:183)
at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:274)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:965)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:430)
at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:339)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:251)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:306)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 43 in stage 360.0 failed 4 times, most recent failure: Lost task 43.3 in stage 360.0 (TID 9912) (172.16.177.134 executor 10): org.apache.spark.SparkException: Checkpoint block rdd_1208_8 not found! Either the executor that originally checkpointed this partition is no longer alive, or the original RDD is unpersisted. If this problem persists, you may consider using rdd.checkpoint() instead, which is slower than local checkpointing but more fault-tolerant.
at org.apache.spark.errors.SparkCoreErrors$.checkpointRDDBlockIdNotFoundError(SparkCoreErrors.scala:82)
at org.apache.spark.rdd.LocalCheckpointRDD.compute(LocalCheckpointRDD.scala:61)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:408)
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:423)
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1559)
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1486)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1550)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1369)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:421)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:372)
at org.apache.spark.rdd.ZippedWithIndexRDD.compute(ZippedWithIndexRDD.scala:71)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:122)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$3(ShuffleMapTask.scala:81)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$1(ShuffleMapTask.scala:81)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.doRunTask(Task.scala:174)
at org.apache.spark.scheduler.Task.$anonfun$run$4(Task.scala:137)
at com.databricks.unity.EmptyHandle$.runWithAndClose(UCSHandle.scala:126)
at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:137)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.Task.run(Task.scala:96)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$13(Executor.scala:902)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1697)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:905)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:760)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:3414)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:3336)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:3325)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:3325)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1433)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1433)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1433)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3625)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3563)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3551)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:51)
Caused by: org.apache.spark.SparkException: Checkpoint block rdd_1208_8 not found! Either the executor that originally checkpointed this partition is no longer alive, or the original RDD is unpersisted. If this problem persists, you may consider using rdd.checkpoint() instead, which is slower than local checkpointing but more fault-tolerant.
at org.apache.spark.errors.SparkCoreErrors$.checkpointRDDBlockIdNotFoundError(SparkCoreErrors.scala:82)
at org.apache.spark.rdd.LocalCheckpointRDD.compute(LocalCheckpointRDD.scala:61)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:408)
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:423)
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1559)
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1486)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1550)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1369)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:421)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:372)
at org.apache.spark.rdd.ZippedWithIndexRDD.compute(ZippedWithIndexRDD.scala:71)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:122)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$3(ShuffleMapTask.scala:81)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$1(ShuffleMapTask.scala:81)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.doRunTask(Task.scala:174)
at org.apache.spark.scheduler.Task.$anonfun$run$4(Task.scala:137)
at com.databricks.unity.EmptyHandle$.runWithAndClose(UCSHandle.scala:126)
at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:137)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.Task.run(Task.scala:96)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$13(Executor.scala:902)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1697)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:905)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:760)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
The DataFrame shape is (191365764, 5).
Originally posted by @alanthom in #9 (comment)
Hello,
I get an error saying that a column cannot be resolved (see error.zip for the full log):
org.apache.spark.sql.AnalysisException: cannot resolve 'extern_file_arcs2.Z' given input columns: [X, Z, Z, Y];
I used the following Datalog program:
database({ extern_file_arcs(X:string, Y:string) }).
arcs(X,Y) <- extern_file_arcs(X,Y).
tc(X,Y) <- arcs(X,Y).
tc(X,Y) <- tc(X,Z), arcs(Z,Y).
By chance I duplicated the second line. Surprisingly, the program then ran without problems:
database({ extern_file_arcs(X:string, Y:string) }).
arcs(X,Y) <- extern_file_arcs(X,Y).
arcs(X,Y) <- extern_file_arcs(X,Y).
tc(X,Y) <- arcs(X,Y).
tc(X,Y) <- tc(X,Z), arcs(Z,Y).
Is that a bug or a feature? How can I be sure that I repeat the rules often enough?
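For reference, under standard Datalog set semantics, stating the same rule twice should not change the result, because the second copy only re-derives facts that are already present. The sketch below illustrates this in plain Python with a semi-naive evaluation of the tc program above. It is not BigDatalog code: the function name transitive_closure and the three sample arcs are made up for illustration, and it says nothing about how BigDatalog plans the query internally.

```python
def transitive_closure(arcs):
    """Semi-naive evaluation of:
         tc(X,Y) <- arcs(X,Y).
         tc(X,Y) <- tc(X,Z), arcs(Z,Y).
    """
    arcs = set(arcs)  # set semantics: duplicate facts collapse

    # Index arcs by source node so the join on Z is a dictionary lookup.
    successors = {}
    for z, y in arcs:
        successors.setdefault(z, set()).add(y)

    tc = set(arcs)     # facts from the exit rule
    delta = set(arcs)  # facts that were new in the previous round
    while delta:
        # Apply the recursive rule to the new facts only.
        derived = {(x, y) for (x, z) in delta for y in successors.get(z, ())}
        delta = derived - tc
        tc |= delta
    return tc


# Hypothetical sample data standing in for extern_file_arcs.
extern_file_arcs = [("a", "b"), ("b", "c"), ("c", "d")]

# arcs(X,Y) <- extern_file_arcs(X,Y).   stated once
arcs_once = set(extern_file_arcs)
# The same rule stated twice: the union of two identical derivations.
arcs_twice = set(extern_file_arcs) | set(extern_file_arcs)

tc_once = transitive_closure(arcs_once)
tc_twice = transitive_closure(arcs_twice)

print(sorted(tc_once))
print(tc_once == tc_twice)  # True
```

Since both variants describe the same relation, a program that only works with the duplicated rule suggests a problem in how the query is compiled rather than intended behavior, although that is an inference from the semantics and not something confirmed in this thread.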
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$3(ShuffleMapTask.scala:81)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$1(ShuffleMapTask.scala:81)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.doRunTask(Task.scala:174)
at org.apache.spark.scheduler.Task.$anonfun$run$4(Task.scala:137)
at com.databricks.unity.EmptyHandle$.runWithAndClose(UCSHandle.scala:126)
at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:137)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.Task.run(Task.scala:96)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$13(Executor.scala:902)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1697)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:905)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:760)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:3414)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:3336)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:3325)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:3325)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1433)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1433)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1433)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3625)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3563)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3551)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:51)
Caused by: org.apache.spark.SparkException: Checkpoint block rdd_1208_8 not found! Either the executor that originally checkpointed this partition is no longer alive, or the original RDD is unpersisted. If this problem persists, you may consider using rdd.checkpoint() instead, which is slower than local checkpointing but more fault-tolerant.
at org.apache.spark.errors.SparkCoreErrors$.checkpointRDDBlockIdNotFoundError(SparkCoreErrors.scala:82)
at org.apache.spark.rdd.LocalCheckpointRDD.compute(LocalCheckpointRDD.scala:61)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:408)
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:423)
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1559)
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1486)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1550)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1369)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:421)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:372)
at org.apache.spark.rdd.ZippedWithIndexRDD.compute(ZippedWithIndexRDD.scala:71)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:122)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$3(ShuffleMapTask.scala:81)
at com.data
*** WARNING: max output size exceeded, skipping output. ***
Logging.scala:193)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.withOperationTypeTag$(DeltaLogging.scala:180)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.withOperationTypeTag(OptimisticTransaction.scala:112)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.$anonfun$recordDeltaOperationInternal$2(DeltaLogging.scala:157)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordFrameProfile(DeltaLogging.scala:262)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordFrameProfile$(DeltaLogging.scala:260)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.recordFrameProfile(OptimisticTransaction.scala:112)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.$anonfun$recordDeltaOperationInternal$1(DeltaLogging.scala:156)
at com.databricks.logging.UsageLogging.$anonfun$recordOperation$1(UsageLogging.scala:547)
at com.databricks.logging.UsageLogging.executeThunkAndCaptureResultTags$1(UsageLogging.scala:642)
at com.databricks.logging.UsageLogging.$anonfun$recordOperationWithResultTags$4(UsageLogging.scala:663)
at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:404)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:147)
at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:402)
at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:399)
at com.databricks.spark.util.PublicDBLogging.withAttributionContext(DatabricksSparkUsageLogger.scala:26)
at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:447)
at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:432)
at com.databricks.spark.util.PublicDBLogging.withAttributionTags(DatabricksSparkUsageLogger.scala:26)
at com.databricks.logging.UsageLogging.recordOperationWithResultTags(UsageLogging.scala:637)
at com.databricks.logging.UsageLogging.recordOperationWithResultTags$(UsageLogging.scala:556)
at com.databricks.spark.util.PublicDBLogging.recordOperationWithResultTags(DatabricksSparkUsageLogger.scala:26)
at com.databricks.logging.UsageLogging.recordOperation(UsageLogging.scala:547)
at com.databricks.logging.UsageLogging.recordOperation$(UsageLogging.scala:517)
at com.databricks.spark.util.PublicDBLogging.recordOperation(DatabricksSparkUsageLogger.scala:26)
at com.databricks.spark.util.PublicDBLogging.recordOperation0(DatabricksSparkUsageLogger.scala:66)
at com.databricks.spark.util.DatabricksSparkUsageLogger.recordOperation(DatabricksSparkUsageLogger.scala:148)
at com.databricks.spark.util.UsageLogger.recordOperation(UsageLogger.scala:72)
at com.databricks.spark.util.UsageLogger.recordOperation$(UsageLogger.scala:59)
at com.databricks.spark.util.DatabricksSparkUsageLogger.recordOperation(DatabricksSparkUsageLogger.scala:107)
at com.databricks.spark.util.UsageLogging.recordOperation(UsageLogger.scala:433)
at com.databricks.spark.util.UsageLogging.recordOperation$(UsageLogger.scala:412)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.recordOperation(OptimisticTransaction.scala:112)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordDeltaOperationInternal(DeltaLogging.scala:155)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordDeltaOperation(DeltaLogging.scala:145)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordDeltaOperation$(DeltaLogging.scala:133)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.recordDeltaOperation(OptimisticTransaction.scala:112)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.$anonfun$recordWriteFilesOperation$1(TransactionalWriteEdge.scala:307)
at com.databricks.sql.acl.CheckPermissions$.trusted(CheckPermissions.scala:1837)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.recordWriteFilesOperation(TransactionalWriteEdge.scala:306)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.writeFiles(TransactionalWriteEdge.scala:339)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.writeFiles$(TransactionalWriteEdge.scala:333)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.writeFiles(OptimisticTransaction.scala:112)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.writeFiles(TransactionalWriteEdge.scala:621)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.writeFiles$(TransactionalWriteEdge.scala:611)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.writeFiles(OptimisticTransaction.scala:112)
at com.databricks.sql.transaction.tahoe.files.TransactionalWrite.writeFiles(TransactionalWrite.scala:226)
at com.databricks.sql.transaction.tahoe.files.TransactionalWrite.writeFiles$(TransactionalWrite.scala:223)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.writeFiles(OptimisticTransaction.scala:112)
at com.databricks.sql.transaction.tahoe.commands.WriteIntoDelta.write(WriteIntoDelta.scala:368)
at com.databricks.sql.transaction.tahoe.commands.WriteIntoDelta.$anonfun$run$2(WriteIntoDelta.scala:111)
at com.databricks.sql.transaction.tahoe.commands.WriteIntoDelta.$anonfun$run$2$adapted(WriteIntoDelta.scala:100)
at com.databricks.sql.transaction.tahoe.DeltaLog.withNewTransaction(DeltaLog.scala:312)
at com.databricks.sql.transaction.tahoe.commands.WriteIntoDelta.$anonfun$run$1(WriteIntoDelta.scala:100)
at com.databricks.sql.acl.CheckPermissions$.trusted(CheckPermissions.scala:1837)
at com.databricks.sql.transaction.tahoe.commands.WriteIntoDelta.run(WriteIntoDelta.scala:99)
at com.databricks.sql.transaction.tahoe.sources.DeltaDataSource.createRelation(DeltaDataSource.scala:168)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:49)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:80)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:78)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:89)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$2(QueryExecution.scala:247)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$8(SQLExecution.scala:250)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:400)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:195)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:985)
at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:149)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:350)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$1(QueryExecution.scala:247)
at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$withMVTagsIfNecessary(QueryExecution.scala:232)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:245)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:238)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:99)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:31)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:298)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:294)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:488)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$eagerlyExecuteCommands$1(QueryExecution.scala:238)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:354)
at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:238)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:192)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:183)
at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:274)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:965)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:430)
at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:339)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:251)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:306)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 43 in stage 360.0 failed 4 times, most recent failure: Lost task 43.3 in stage 360.0 (TID 9912) (172.16.177.134 executor 10): org.apache.spark.SparkException: Checkpoint block rdd_1208_8 not found! Either the executor that originally checkpointed this partition is no longer alive, or the original RDD is unpersisted. If this problem persists, you may consider using rdd.checkpoint() instead, which is slower than local checkpointing but more fault-tolerant.
at org.apache.spark.errors.SparkCoreErrors$.checkpointRDDBlockIdNotFoundError(SparkCoreErrors.scala:82)
at org.apache.spark.rdd.LocalCheckpointRDD.compute(LocalCheckpointRDD.scala:61)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:408)
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:423)
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1559)
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1486)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1550)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1369)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:421)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:372)
at org.apache.spark.rdd.ZippedWithIndexRDD.compute(ZippedWithIndexRDD.scala:71)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:122)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$3(ShuffleMapTask.scala:81)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$1(ShuffleMapTask.scala:81)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.doRunTask(Task.scala:174)
at org.apache.spark.scheduler.Task.$anonfun$run$4(Task.scala:137)
at com.databricks.unity.EmptyHandle$.runWithAndClose(UCSHandle.scala:126)
at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:137)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.Task.run(Task.scala:96)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$13(Executor.scala:902)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1697)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:905)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:760)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:3414)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:3336)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:3325)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:3325)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1433)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1433)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1433)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3625)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3563)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3551)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:51)
Caused by: org.apache.spark.SparkException: Checkpoint block rdd_1208_8 not found! Either the executor that originally checkpointed this partition is no longer alive, or the original RDD is unpersisted. If this problem persists, you may consider using rdd.checkpoint() instead, which is slower than local checkpointing but more fault-tolerant.
at org.apache.spark.errors.SparkCoreErrors$.checkpointRDDBlockIdNotFoundError(SparkCoreErrors.scala:82)
at org.apache.spark.rdd.LocalCheckpointRDD.compute(LocalCheckpointRDD.scala:61)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:408)
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:423)
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1559)
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1486)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1550)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1369)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:421)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:372)
at org.apache.spark.rdd.ZippedWithIndexRDD.compute(ZippedWithIndexRDD.scala:71)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:122)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$3(ShuffleMapTask.scala:81)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$1(ShuffleMapTask.scala:81)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.doRunTask(Task.scala:174)
at org.apache.spark.scheduler.Task.$anonfun$run$4(Task.scala:137)
at com.databricks.unity.EmptyHandle$.runWithAndClose(UCSHandle.scala:126)
at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:137)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.Task.run(Task.scala:96)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$13(Executor.scala:902)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1697)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:905)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:760)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
shape - (191365764, 5)
Originally posted by @alanthom in #9 (comment)
Hello,
Could you add more details about how to run a program in the README? It would help to specify the command-line options of Experiments.scala. Here are the steps that made it work for me.
../bigdatalog.deal
database({ arcs(X:string, Y:string) }).
tc(X,Y) <- arcs(X,Y).
A B
B C
C D
D E
E F
F G
G H
./bin/run-example datalog.Experiments --program=99 --file=../redirect.txt --queryform="tc(A,B)" --baserelation_arcs=../arcs
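For reference, the answer one would expect from the steps above can be sketched in plain Python. This assumes the program also contains the usual recursive rule `tc(X,Y) <- tc(X,Z), arcs(Z,Y).`, which is not shown above, and that the arcs file holds one whitespace-separated edge per line. The function name is made up for illustration, and this is not how BigDatalog evaluates the query internally.

```python
def transitive_closure(arcs):
    """Semi-naive evaluation of tc over a set of (src, dst) pairs."""
    tc = set(arcs)      # exit rule: tc(X,Y) <- arcs(X,Y).
    delta = set(arcs)   # facts that were new in the previous round
    while delta:
        # Assumed recursive rule (not in the post): tc(X,Y) <- tc(X,Z), arcs(Z,Y).
        derived = {(x, y) for (x, z) in delta for (z2, y) in arcs if z == z2}
        delta = derived - tc   # keep only facts not seen before
        tc |= delta
    return tc


# Same edges as the arcs file in the post.
ARCS_TEXT = """A B
B C
C D
D E
E F
F G
G H"""

arcs = {tuple(line.split()) for line in ARCS_TEXT.splitlines()}
result = transitive_closure(arcs)
print(len(result))  # a chain of 8 nodes yields 7+6+...+1 = 28 pairs
```

Comparing the size of this set with the number of tuples BigDatalog reports for `tc(A,B)` is a quick sanity check that the file paths and options were picked up correctly.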
Hi,
I was trying to execute a script using the PageRank algorithm, so I defined a schema like this:
database({ node(N:double), edge(S:double, Sink:double), edgeCount(S:double, Cnt:double) }).
rank(Inc, N, msum<R>) <- node(N), Inc = 1, R = 0.15 / N.
rank(Incp, N, msum<R>) <- rank(Inc, P, RP), edge(P,N), edgeCount(P, Cnt), Cnt > 0, R = 0.85 * RP / Cnt, Incp = Inc + 1.
However, when I try to execute it, I get this error:
Exception in thread "main" org.apache.spark.sql.AnalysisException: undefined function mcount;
at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry$$anonfun$2.apply(FunctionRegistry.scala:65)
at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry$$anonfun$2.apply(FunctionRegistry.scala:65)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry.lookupFunction(FunctionRegistry.scala:64)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$12$$anonfun$applyOrElse$5$$anonfun$applyOrElse$24.apply(Analyzer.scala:574)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$12$$anonfun$applyOrElse$5$$anonfun$applyOrElse$24.apply(Analyzer.scala:574)
at org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:48)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$12$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:573)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$12$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:570)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:259)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:259)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:258)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:264)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:264)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:281)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$class.foreach(Iterator.scala:727)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
...
I'm not trying to use the mcount function, but apparently the program is trying to call it.
Thanks!
I guess there is no roadmap :)
Is it not supposed to be compatible with Spark 2.x?
Will there be nothing past DeALS-0.6.jar (0.7, ...)?
Are the DeAL GUI editor and DeALSFileRunner for UCLA use only?
I've had an enormous number of issues building BigDatalog in 2021:
[error]
[error] last tree to typer: TypeTree(trait Seq)
[error] symbol: trait Seq in package collection (flags: <interface> abstract <trait> <lateinterface>)
[error] symbol definition: abstract trait Seq[+A] extends PartialFunction[Int,A] with Iterable[A] with GenSeq[A] with GenericTraversableTemplate[A,Seq] with SeqLike[A,Seq[A]]
[error] tpe: Seq
[error] symbol owners: trait Seq -> package collection
[error] context owners: anonymous class $anonfun -> method createListenerAndUI -> object SQLContext -> package sql
[error]
[error] == Enclosing template or block ==
[error]
[error] ClassDef( // final class $anonfun extends AbstractFunction1[org.apache.spark.ui.SparkUI,org.apache.spark.sql.execution.ui.SQLTab] with Serializable
[error] final <synthetic> @{ SerialVersionUID(0) }
[error] "$anonfun"
[error] []
[error] Template( // val <local $anonfun>: <notype>, tree.tpe=scala.runtime.AbstractFunction1[org.apache.spark.ui.SparkUI,org.apache.spark.sql.execution.ui.SQLTab] with Serializable
[error] "scala.runtime.AbstractFunction1", "scala.Serializable" // parents
[error] ValDef(
[error] private
[error] "_"
[error] <tpt>
[error] <empty>
[error] )
[error] // 2 statements
[error] DefDef( // def <init>(): scala.runtime.AbstractFunction1[org.apache.spark.ui.SparkUI,org.apache.spark.sql.execution.ui.SQLTab] with Serializable
[error] <method> <triedcooking>
[error] "<init>"
[error] []
[error] List(Nil)
[error] <tpt> // tree.tpe=scala.runtime.AbstractFunction1[org.apache.spark.ui.SparkUI,org.apache.spark.sql.execution.ui.SQLTab] with Serializable
[error] Block( // tree.tpe=Unit
[error] Apply( // def <init>(): scala.runtime.AbstractFunction1[T1,R] in class AbstractFunction1, tree.tpe=scala.runtime.AbstractFunction1[org.apache.spark.ui.SparkUI,org.apache.spark.sql.execution.ui.SQLTab]
[error] $anonfun.super."<init>" // def <init>(): scala.runtime.AbstractFunction1[T1,R] in class AbstractFunction1, tree.tpe=()scala.runtime.AbstractFunction1[org.apache.spark.ui.SparkUI,org.apache.spark.sql.execution.ui.SQLTab]
[error] Nil
[error] )
[error] ()
[error] )
[error] )
[error] DefDef( // final def apply(x$14: org.apache.spark.ui.SparkUI): org.apache.spark.sql.execution.ui.SQLTab
[error] <method> final
[error] "apply"
[error] []
[error] // 1 parameter list
[error] ValDef( // x$14: org.apache.spark.ui.SparkUI
[error] <param> <synthetic> <triedcooking>
[error] "x$14"
[error] <tpt> // tree.tpe=org.apache.spark.ui.SparkUI
[error] <empty>
[error] )
[error] <tpt> // tree.tpe=org.apache.spark.sql.execution.ui.SQLTab
[error] Apply( // def <init>(listener: org.apache.spark.sql.execution.ui.SQLListener,sparkUI: org.apache.spark.ui.SparkUI): org.apache.spark.sql.execution.ui.SQLTab in class SQLTab, tree.tpe=org.apache.spark.sql.execution.ui.SQLTab
[error] new org.apache.spark.sql.execution.ui.SQLTab."<init>" // def <init>(listener: org.apache.spark.sql.execution.ui.SQLListener,sparkUI: org.apache.spark.ui.SparkUI): org.apache.spark.sql.execution.ui.SQLTab in class SQLTab, tree.tpe=(listener: org.apache.spark.sql.execution.ui.SQLListener, sparkUI: org.apache.spark.ui.SparkUI)org.apache.spark.sql.execution.ui.SQLTab
[error] // 2 arguments
[error] "listener" // val listener: org.apache.spark.sql.execution.ui.SQLListener, tree.tpe=org.apache.spark.sql.execution.ui.SQLListener
[error] "x$14" // x$14: org.apache.spark.ui.SparkUI, tree.tpe=org.apache.spark.ui.SparkUI
[error] )
[error] )
[error] )
[error] )
[error]
[error] == Expanded type of tree ==
[error]
[error] TypeRef(
[error] TypeSymbol(
[error] abstract trait Seq[+A] extends PartialFunction[Int,A] with Iterable[A] with GenSeq[A] with GenericTraversableTemplate[A,Seq] with SeqLike[A,Seq[A]]
[error]
[error] )
[error] normalize = PolyType(
[error] typeParams = List(TypeParam(+A))
[error] resultType = TypeRef(
[error] TypeSymbol(
[error] abstract trait Seq[+A] extends PartialFunction[Int,A] with Iterable[A] with GenSeq[A] with GenericTraversableTemplate[A,Seq] with SeqLike[A,Seq[A]]
[error]
[error] )
[error] args = List(TypeParamTypeRef(TypeParam(+A)))
[error] )
[error] )
[error] )
[error]
Does anyone have a working copy of BigDatalog? Our team has been trying to benchmark it against our own Datalog engine, but we have had very poor luck getting it to run on anything close to a modern system, and the included Dockerfiles no longer build, which defeats their utility.
We've been trying for a few weeks now to build BigDatalog and have essentially given up because it has bitrotted so badly.
Hello,
I tried to minimize #7 further. I arrived at
a(X,Y) <- b(X,Y).
b(X,Y) <- a(X,Y).
which leads to a StackOverflowError:
Exception in thread "main" java.lang.StackOverflowError
at edu.ucla.cs.wis.bigdatalog.interpreter.relational.ProgramGenerator.generateRecursiveOperator(ProgramGenerator.java:1439)
at edu.ucla.cs.wis.bigdatalog.interpreter.relational.ProgramGenerator.generateOperator(ProgramGenerator.java:186)
at edu.ucla.cs.wis.bigdatalog.interpreter.relational.ProgramGenerator.generateOperators(ProgramGenerator.java:249)
at edu.ucla.cs.wis.bigdatalog.interpreter.relational.ProgramGenerator.generateCliqueOperator(ProgramGenerator.java:1616)
at edu.ucla.cs.wis.bigdatalog.interpreter.relational.ProgramGenerator.generateRecursiveOperator(ProgramGenerator.java:1472)
at edu.ucla.cs.wis.bigdatalog.interpreter.relational.ProgramGenerator.generateOperator(ProgramGenerator.java:186)
at edu.ucla.cs.wis.bigdatalog.interpreter.relational.ProgramGenerator.generateOperators(ProgramGenerator.java:249)
at edu.ucla.cs.wis.bigdatalog.interpreter.relational.ProgramGenerator.generateCliqueOperator(ProgramGenerator.java:1616)
...
and then the stack trace repeats again and again.
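The two rules above make `a` and `b` depend on each other. One generic way such a cycle can exhaust the stack, and a common guard against it, is sketched below in Python. This only illustrates the failure mode; it does not reproduce ProgramGenerator, and the names `DEPENDS_ON`, `expand_naive`, and `expand_guarded` are hypothetical.

```python
# Dependency graph for:  a(X,Y) <- b(X,Y).   b(X,Y) <- a(X,Y).
DEPENDS_ON = {"a": ["b"], "b": ["a"]}


def expand_naive(pred):
    """Follow dependencies with no record of what was already visited."""
    out = [pred]
    for dep in DEPENDS_ON.get(pred, []):
        out += expand_naive(dep)  # a -> b -> a -> b -> ... never terminates
    return out


def expand_guarded(pred, seen=None):
    """Same traversal, but each predicate is expanded at most once."""
    if seen is None:
        seen = set()
    if pred in seen:
        return []  # already visited: we have closed a cycle, so stop here
    seen.add(pred)
    out = [pred]
    for dep in DEPENDS_ON.get(pred, []):
        out += expand_guarded(dep, seen)
    return out


try:
    expand_naive("a")
    naive_overflowed = False
except RecursionError:  # Python's counterpart of a StackOverflowError
    naive_overflowed = True

print(naive_overflowed)
print(expand_guarded("a"))
```

The guarded version visits `a` and `b` once each and returns, which is the behaviour one would hope for when a program consists only of mutually recursive rules with no exit rule.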
Hi, everyone!
I'm new to BigDatalog. My OS is Ubuntu 16.04. My steps are listed as follows:
arc, None
arc, None
== Analyzed Logical Plan ==
A: int, To: int
Subquery tc
+- Recursion tc, true, [1,0]
:- Subquery arc
: +- LogicalRDD [From#0,To#1], MapPartitionsRDD[4] at mapPartitions at Utilities.scala:100
+- Project [A#4,To#1]
+- Join Inner, Some((C#5 = From#0))
:- Subquery tc1
: +- LinearRecursiveRelation tc, [A#4,C#5], [1,0]
+- BroadcastHint
+- Subquery arc2
+- Project [From#0,To#1]
+- Subquery arc
+- LogicalRDD [From#0,To#1], MapPartitionsRDD[4] at mapPartitions at Utilities.scala:100
== Optimized Logical Plan ==
Recursion tc, true, [1,0]
:- LogicalRDD [From#0,To#1], MapPartitionsRDD[4] at mapPartitions at Utilities.scala:100
+- Project [A#4,To#1]
+- Join Inner, Some((C#5 = From#0))
:- LinearRecursiveRelation tc, [A#4,C#5], [1,0]
+- BroadcastHint
+- Project [From#0,To#1]
+- LogicalRDD [From#0,To#1], MapPartitionsRDD[4] at mapPartitions at Utilities.scala:100
== Physical Plan ==
Recursion [A#4,To#1] (Linear) [tc][1,0]
:- TungstenExchange hashpartitioning(From#0,200), None
: +- ConvertToUnsafe
: +- Scan ExistingRDD[From#0,To#1]
+- Project [A#4,To#1]
+- BroadcastHashJoin [C#5], [From#0], BuildRight
:- LinearRecursiveRelation A#4,C#5
+- Project [From#0,To#1]
+- Scan ExistingRDD[From#0,To#1]
17-12-01 09:33:41 INFO BigDatalogContext: ** END BigDatalog Program END **
17-12-01 09:33:41 INFO Recursion: Recursion operator configuration settings:
17-12-01 09:33:41 INFO Recursion: Using memory checkpointing with StorageLevel(false, true, false, true, 1)
17-12-01 09:33:41 INFO Recursion: Recursion version: Single-Job-PSN w/ SetRDD
17-12-01 09:33:41 INFO FileInputFormat: Total input paths to process : 1
17-12-01 09:33:41 INFO FileInputFormat: Total input paths to process : 1
17-12-01 09:33:41 INFO SparkContext: Starting job: run at ThreadPoolExecutor.java:1149
17-12-01 09:33:41 INFO DAGScheduler: Got job 0 (run at ThreadPoolExecutor.java:1149) with 200 output partitions
17-12-01 09:33:41 INFO DAGScheduler: Final stage: ResultStage 0 (run at ThreadPoolExecutor.java:1149)
17-12-01 09:33:41 INFO DAGScheduler: Parents of final stage: List()
17-12-01 09:33:41 INFO DAGScheduler: Missing parents: List()
17-12-01 09:33:41 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[7] at run at ThreadPoolExecutor.java:1149), which has no missing parents
17-12-01 09:33:41 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 7.6 KB, free 180.7 KB)
17-12-01 09:33:41 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 3.9 KB, free 184.6 KB)
17-12-01 09:33:41 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:42113 (size: 3.9 KB, free: 511.1 MB)
17-12-01 09:33:41 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1096
17-12-01 09:33:41 INFO DAGScheduler: Submitting 200 missing tasks from ResultStage 0 (MapPartitionsRDD[7] at run at ThreadPoolExecutor.java:1149)
17-12-01 09:33:41 INFO TaskSchedulerImpl: Adding task set 0.0 with 200 tasks
17-12-01 09:33:41 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,PROCESS_LOCAL, 2522 bytes)
17-12-01 09:33:41 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, partition 1,PROCESS_LOCAL, 2522 bytes)
17-12-01 09:33:41 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, localhost, partition 2,PROCESS_LOCAL, 2522 bytes)
17-12-01 09:33:41 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, localhost, partition 3,PROCESS_LOCAL, 2522 bytes)
17-12-01 09:33:41 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
17-12-01 09:33:41 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)
17-12-01 09:33:41 INFO Executor: Running task 3.0 in stage 0.0 (TID 3)
17-12-01 09:33:41 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
......(omit)
17-12-01 09:33:56 INFO TaskSchedulerImpl: Adding task set 3.0 with 200 tasks
17-12-01 09:33:56 INFO TaskSetManager: Starting task 124.0 in stage 3.0 (TID 600, localhost, partition 124,PROCESS_LOCAL, 2344 bytes)
17-12-01 09:33:56 INFO TaskSetManager: Starting task 125.0 in stage 3.0 (TID 601, localhost, partition 125,PROCESS_LOCAL, 2344 bytes)
17-12-01 09:33:56 INFO TaskSetManager: Starting task 126.0 in stage 3.0 (TID 602, localhost, partition 126,PROCESS_LOCAL, 2344 bytes)
17-12-01 09:33:56 INFO TaskSetManager: Starting task 127.0 in stage 3.0 (TID 603, localhost, partition 127,PROCESS_LOCAL, 2344 bytes)
17-12-01 09:33:56 INFO Executor: Running task 126.0 in stage 3.0 (TID 602)
17-12-01 09:33:56 INFO Executor: Running task 124.0 in stage 3.0 (TID 600)
17-12-01 09:33:56 INFO Executor: Running task 125.0 in stage 3.0 (TID 601)
17-12-01 09:33:56 INFO Executor: Running task 127.0 in stage 3.0 (TID 603)
17-12-01 09:33:56 INFO CacheManager: Partition rdd_31_127 not found, computing it
17-12-01 09:33:56 INFO CacheManager: Partition rdd_31_126 not found, computing it
17-12-01 09:33:56 INFO CacheManager: Partition rdd_31_124 not found, computing it
17-12-01 09:33:56 INFO CacheManager: Partition rdd_31_125 not found, computing it
17-12-01 09:33:56 INFO CacheManager: Partition rdd_27_127 not found, computing it
17-12-01 09:33:56 INFO CacheManager: Partition rdd_27_125 not found, computing it
17-12-01 09:33:56 INFO BlockManager: Found block rdd_17_127 locally
17-12-01 09:33:56 INFO CacheManager: Partition rdd_27_124 not found, computing it
17-12-01 09:33:56 INFO CacheManager: Partition rdd_27_126 not found, computing it
17-12-01 09:33:56 INFO CacheManager: Partition rdd_17_124 not found, computing it
17-12-01 09:33:56 INFO BlockManager: Found block rdd_21_127 locally
17-12-01 09:33:56 INFO SetRDDHashSetPartition: Union set size 988 for rdd 18 took 0 ms
17-12-01 09:33:56 ERROR Executor: Exception in task 124.0 in stage 3.0 (TID 600)
org.apache.spark.SparkException: Checkpoint block rdd_17_124 not found! Either the executor that originally checkpointed this partition is no longer alive, or the original RDD is unpersisted. If this problem persists, you may consider using rdd.checkpoint() or rdd.localcheckpoint() instead, which are slower than memory checkpointing but more fault-tolerant.
at org.apache.spark.rdd.MemoryCheckpointRDD.compute(MemoryCheckpointRDD.scala:43)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:304)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
at edu.ucla.cs.wis.bigdatalog.spark.execution.setrdd.SetRDD.compute(SetRDD.scala:108)
at edu.ucla.cs.wis.bigdatalog.spark.execution.setrdd.SetRDD.computeOrReadCheckpoint(SetRDD.scala:104)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.fixedpoint.FixedPointResultTask.runTask(FixedPointResultTask.scala:54)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
17-12-01 09:33:56 INFO BlockManager: Found block rdd_17_125 locally
17-12-01 09:33:56 INFO CacheManager: Partition rdd_17_126 not found, computing it
17-12-01 09:33:56 ERROR Executor: Exception in task 126.0 in stage 3.0 (TID 602)
org.apache.spark.SparkException: Checkpoint block rdd_17_126 not found! Either the executor that originally checkpointed this partition is no longer alive, or the original RDD is unpersisted. If this problem persists, you may consider using rdd.checkpoint() or rdd.localcheckpoint() instead, which are slower than memory checkpointing but more fault-tolerant.
at org.apache.spark.rdd.MemoryCheckpointRDD.compute(MemoryCheckpointRDD.scala:43)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:304)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
at edu.ucla.cs.wis.bigdatalog.spark.execution.setrdd.SetRDD.compute(SetRDD.scala:108)
at edu.ucla.cs.wis.bigdatalog.spark.execution.setrdd.SetRDD.computeOrReadCheckpoint(SetRDD.scala:104)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.fixedpoint.FixedPointResultTask.runTask(FixedPointResultTask.scala:54)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
17-12-01 09:33:56 INFO BlockManager: Found block rdd_21_125 locally
17-12-01 09:33:56 INFO SetRDDHashSetPartition: Union set size 1032 for rdd 18 took 0 ms
17-12-01 09:33:56 INFO MemoryStore: 1 blocks selected for dropping
17-12-01 09:33:56 INFO BlockManager: Dropping block rdd_21_124 from memory
17-12-01 09:33:56 INFO BlockManagerInfo: Removed rdd_21_124 on localhost:42113 in memory (size: 1708.1 KB, free: 18.3 MB)
17-12-01 09:33:56 INFO TaskSetManager: Starting task 128.0 in stage 3.0 (TID 604, localhost, partition 128,PROCESS_LOCAL, 2344 bytes)
17-12-01 09:33:56 INFO MemoryStore: Block rdd_27_127 stored as values in memory (estimated size 1720.1 KB, free 509.1 MB)
17-12-01 09:33:56 INFO TaskSetManager: Starting task 129.0 in stage 3.0 (TID 605, localhost, partition 129,PROCESS_LOCAL, 2344 bytes)
17-12-01 09:33:56 INFO Executor: Running task 129.0 in stage 3.0 (TID 605)
17-12-01 09:33:56 WARN TaskSetManager: Lost task 124.0 in stage 3.0 (TID 600, localhost): org.apache.spark.SparkException: Checkpoint block rdd_17_124 not found! Either the executor that originally checkpointed this partition is no longer alive, or the original RDD is unpersisted. If this problem persists, you may consider using rdd.checkpoint() or rdd.localcheckpoint() instead, which are slower than memory checkpointing but more fault-tolerant.
at org.apache.spark.rdd.MemoryCheckpointRDD.compute(MemoryCheckpointRDD.scala:43)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:304)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
at edu.ucla.cs.wis.bigdatalog.spark.execution.setrdd.SetRDD.compute(SetRDD.scala:108)
at edu.ucla.cs.wis.bigdatalog.spark.execution.setrdd.SetRDD.computeOrReadCheckpoint(SetRDD.scala:104)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.fixedpoint.FixedPointResultTask.runTask(FixedPointResultTask.scala:54)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
17-12-01 09:33:56 INFO Executor: Running task 128.0 in stage 3.0 (TID 604)
17-12-01 09:33:56 INFO CacheManager: Partition rdd_31_129 not found, computing it
17-12-01 09:33:56 ERROR TaskSetManager: Task 124 in stage 3.0 failed 1 times; aborting job
17-12-01 09:33:56 INFO BlockManagerInfo: Added rdd_27_127 in memory on localhost:42113 (size: 1720.1 KB, free: 16.6 MB)
17-12-01 09:33:56 WARN TaskSetManager: Lost task 126.0 in stage 3.0 (TID 602, localhost): org.apache.spark.SparkException: Checkpoint block rdd_17_126 not found! Either the executor that originally checkpointed this partition is no longer alive, or the original RDD is unpersisted. If this problem persists, you may consider using rdd.checkpoint() or rdd.localcheckpoint() instead, which are slower than memory checkpointing but more fault-tolerant.
at org.apache.spark.rdd.MemoryCheckpointRDD.compute(MemoryCheckpointRDD.scala:43)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:304)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
at edu.ucla.cs.wis.bigdatalog.spark.execution.setrdd.SetRDD.compute(SetRDD.scala:108)
at edu.ucla.cs.wis.bigdatalog.spark.execution.setrdd.SetRDD.computeOrReadCheckpoint(SetRDD.scala:104)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.fixedpoint.FixedPointResultTask.runTask(FixedPointResultTask.scala:54)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
17-12-01 09:33:56 INFO MemoryStore: 1 blocks selected for dropping
17-12-01 09:33:56 INFO BlockManager: Dropping block rdd_11_128 from memory
17-12-01 09:33:56 INFO CacheManager: Partition rdd_31_128 not found, computing it
17-12-01 09:33:56 INFO BlockManagerInfo: Removed rdd_11_128 on localhost:42113 in memory (size: 1708.1 KB, free: 18.3 MB)
17-12-01 09:33:56 INFO CacheManager: Partition rdd_27_128 not found, computing it
17-12-01 09:33:56 INFO BlockManager: Found block rdd_17_128 locally
17-12-01 09:33:56 INFO BlockManager: Found block rdd_21_128 locally
17-12-01 09:33:56 INFO SetRDDHashSetPartition: Union set size 846 for rdd 18 took 0 ms
17-12-01 09:33:56 INFO BlockManager: Found block rdd_21_127 locally
17-12-01 09:33:56 INFO MemoryStore: 1 blocks selected for dropping
17-12-01 09:33:56 INFO BlockManager: Dropping block rdd_15_125 from memory
17-12-01 09:33:56 INFO SetRDDHashSetPartition: Diff set size 150 for rdd 28 took 0 ms
17-12-01 09:33:56 INFO CacheManager: Partition rdd_27_129 not found, computing it
17-12-01 09:33:56 INFO BlockManagerInfo: Removed rdd_15_125 on localhost:42113 in memory (size: 1708.1 KB, free: 20.0 MB)
17-12-01 09:33:56 INFO TaskSchedulerImpl: Cancelling stage 3
17-12-01 09:33:56 INFO BlockManager: Found block rdd_17_129 locally
17-12-01 09:33:56 INFO BlockManager: Found block rdd_21_129 locally
17-12-01 09:33:56 INFO SetRDDHashSetPartition: Union set size 1193 for rdd 18 took 1 ms
17-12-01 09:33:56 INFO MemoryStore: 1 blocks selected for dropping
17-12-01 09:33:56 INFO BlockManager: Dropping block rdd_15_127 from memory
17-12-01 09:33:56 INFO BlockManagerInfo: Removed rdd_15_127 on localhost:42113 in memory (size: 1708.1 KB, free: 21.6 MB)
17-12-01 09:33:56 INFO MemoryStore: 1 blocks selected for dropping
17-12-01 09:33:56 INFO BlockManager: Dropping block rdd_21_126 from memory
17-12-01 09:33:56 INFO BlockManagerInfo: Removed rdd_21_126 on localhost:42113 in memory (size: 1712.1 KB, free: 23.3 MB)
17-12-01 09:33:56 INFO Executor: Executor is trying to kill task 128.0 in stage 3.0 (TID 604)
17-12-01 09:33:56 INFO Executor: Executor is trying to kill task 125.0 in stage 3.0 (TID 601)
17-12-01 09:33:56 INFO Executor: Executor is trying to kill task 129.0 in stage 3.0 (TID 605)
17-12-01 09:33:56 INFO Executor: Executor is trying to kill task 127.0 in stage 3.0 (TID 603)
17-12-01 09:33:56 INFO TaskSchedulerImpl: Stage 3 was cancelled
17-12-01 09:33:56 INFO DAGScheduler: FixedPointResultStage 3 (runFixedPointJob at Recursion.scala:197) failed in 0.047 s
17-12-01 09:33:56 INFO DAGScheduler: Fixed Point Job 1 failed: runFixedPointJob at Recursion.scala:197, took 11.303584 s
17-12-01 09:33:56 INFO SparkContext: Invoking stop() from shutdown hook
17-12-01 09:33:56 INFO MemoryStore: Block rdd_27_128 stored as values in memory (estimated size 1720.1 KB, free 504.1 MB)
17-12-01 09:33:56 INFO BlockManagerInfo: Added rdd_27_128 in memory on localhost:42113 (size: 1720.1 KB, free: 21.6 MB)
17-12-01 09:33:56 INFO BlockManager: Found block rdd_21_128 locally
17-12-01 09:33:56 INFO SetRDDHashSetPartition: Diff set size 139 for rdd 28 took 0 ms
17-12-01 09:33:56 INFO MemoryStore: 1 blocks selected for dropping
17-12-01 09:33:56 INFO BlockManager: Dropping block rdd_15_128 from memory
17-12-01 09:33:56 INFO BlockManagerInfo: Removed rdd_15_128 on localhost:42113 in memory (size: 1708.1 KB, free: 23.3 MB)
17-12-01 09:33:56 INFO MemoryStore: Block rdd_31_127 stored as values in memory (estimated size 1706.1 KB, free 504.1 MB)
17-12-01 09:33:56 INFO BlockManagerInfo: Added rdd_31_127 in memory on localhost:42113 (size: 1706.1 KB, free: 21.6 MB)
17-12-01 09:33:56 INFO Executor: Executor killed task 127.0 in stage 3.0 (TID 603)
17-12-01 09:33:56 WARN TaskSetManager: Lost task 127.0 in stage 3.0 (TID 603, localhost): TaskKilled (killed intentionally)
17-12-01 09:33:56 INFO MemoryStore: 1 blocks selected for dropping
17-12-01 09:33:56 INFO BlockManager: Dropping block rdd_11_130 from memory
17-12-01 09:33:56 INFO BlockManagerInfo: Removed rdd_11_130 on localhost:42113 in memory (size: 1708.1 KB, free: 23.3 MB)
17-12-01 09:33:56 INFO MemoryStore: Block rdd_27_125 stored as values in memory (estimated size 1720.1 KB, free 504.1 MB)
17-12-01 09:33:56 INFO BlockManagerInfo: Added rdd_27_125 in memory on localhost:42113 (size: 1720.1 KB, free: 21.6 MB)
17-12-01 09:33:56 INFO BlockManager: Found block rdd_21_125 locally
17-12-01 09:33:56 INFO SetRDDHashSetPartition: Diff set size 67 for rdd 28 took 0 ms
17-12-01 09:33:56 INFO MemoryStore: Block rdd_27_129 stored as values in memory (estimated size 1720.1 KB, free 505.8 MB)
17-12-01 09:33:56 INFO BlockManagerInfo: Added rdd_27_129 in memory on localhost:42113 (size: 1720.1 KB, free: 19.9 MB)
17-12-01 09:33:56 INFO MemoryStore: Block rdd_31_128 stored as values in memory (estimated size 1706.1 KB, free 507.5 MB)
17-12-01 09:33:56 INFO BlockManager: Found block rdd_21_129 locally
17-12-01 09:33:56 INFO SetRDDHashSetPartition: Diff set size 303 for rdd 28 took 0 ms
17-12-01 09:33:56 INFO BlockManagerInfo: Added rdd_31_128 in memory on localhost:42113 (size: 1706.1 KB, free: 18.3 MB)
17-12-01 09:33:56 INFO MemoryStore: Block rdd_31_125 stored as values in memory (estimated size 1705.1 KB, free 509.1 MB)
17-12-01 09:33:56 INFO Executor: Executor killed task 128.0 in stage 3.0 (TID 604)
17-12-01 09:33:56 INFO BlockManagerInfo: Added rdd_31_125 in memory on localhost:42113 (size: 1705.1 KB, free: 16.6 MB)
17-12-01 09:33:56 INFO Executor: Executor killed task 125.0 in stage 3.0 (TID 601)
17-12-01 09:33:56 INFO MemoryStore: 1 blocks selected for dropping
17-12-01 09:33:56 INFO BlockManager: Dropping block rdd_11_129 from memory
17-12-01 09:33:56 INFO BlockManagerInfo: Removed rdd_11_129 on localhost:42113 in memory (size: 1712.1 KB, free: 18.3 MB)
17-12-01 09:33:56 WARN TaskSetManager: Lost task 125.0 in stage 3.0 (TID 601, localhost): TaskKilled (killed intentionally)
17-12-01 09:33:56 WARN TaskSetManager: Lost task 128.0 in stage 3.0 (TID 604, localhost): TaskKilled (killed intentionally)
17-12-01 09:33:56 INFO MemoryStore: Block rdd_31_129 stored as values in memory (estimated size 1708.1 KB, free 509.1 MB)
17-12-01 09:33:56 INFO BlockManagerInfo: Added rdd_31_129 in memory on localhost:42113 (size: 1708.1 KB, free: 16.6 MB)
17-12-01 09:33:56 INFO Executor: Executor killed task 129.0 in stage 3.0 (TID 605)
17-12-01 09:33:56 WARN TaskSetManager: Lost task 129.0 in stage 3.0 (TID 605, localhost): TaskKilled (killed intentionally)
17-12-01 09:33:56 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool
17-12-01 09:33:56 INFO SparkUI: Stopped Spark web UI at http://172.26.163.180:4040
17-12-01 09:33:56 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17-12-01 09:33:57 INFO MemoryStore: MemoryStore cleared
17-12-01 09:33:57 INFO BlockManager: BlockManager stopped
17-12-01 09:33:57 INFO BlockManagerMaster: BlockManagerMaster stopped
17-12-01 09:33:57 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17-12-01 09:33:57 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
17-12-01 09:33:57 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
17-12-01 09:33:57 INFO SparkContext: Successfully stopped SparkContext
17-12-01 09:33:57 INFO ShutdownHookManager: Shutdown hook called
17-12-01 09:33:57 INFO ShutdownHookManager: Deleting directory /tmp/spark-221ba8c7-a682-4887-ace6-b37e2eb9f122/httpd-6f67ecc9-fb6b-4ba4-8678-8d34d1e4faad
17-12-01 09:33:57 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
17-12-01 09:33:57 INFO ShutdownHookManager: Deleting directory /tmp/spark-221ba8c7-a682-4887-ace6-b37e2eb9f122
We always run into the same problem, even after switching computers and programs, and it is blocking us. Can anyone help?
Thank you very much.
Hello,
I tried to execute the query rel3(X,Y)
with the following datalog program:
rel2(X,Y) <- rel1(X,Y).
rel2(X,Z) <- rel3(X,Y), rel1(Y,Z).
rel3(X,Z) <- rel2(X,Y), rel1(Y,Z).
database({ rel1(X:string, Y:string) }).
and the following data for rel1:
a b
b c
I expect the output rel3(a,c). However, I get a NullPointerException instead:
17/10/14 23:16:44 INFO BigDatalogContext: BigDatalog Query: "rel3(A,B)"
17/10/14 23:16:44 INFO BigDatalogContext: ** START Operator Program START **
17/10/14 23:16:44 INFO BigDatalogContext:
0: rel3(X, Y) <RECURSIVE_CLIQUE>(Recursion: LINEAR, Evaluation Type: SemiNaive)
Exit Rules:
Recursive Rules:
1: (X, Y) <DISTINCT PROJECT>
2: (0.Y = 1.X) <JOIN>
3: rel2(X, Y) <MUTUAL_RECURSIVE_CLIQUE>(Recursion: LINEAR, Evaluation Type: SemiNaive)
Exit Rules:
4: rel1(X, Y) <BASE_RELATION>
Recursive Rules:
4: (X, Y) <DISTINCT PROJECT>
5: (0.Y = 1.X) <JOIN>
6: rel3(X, Y) <RECURSIVE_RELATION>
6: rel1(X, Y) <BASE_RELATION>
3: rel1(X, Y) <BASE_RELATION>
17/10/14 23:16:44 INFO BigDatalogContext: ** END Operator Program END **
17/10/14 23:16:44 INFO BigDatalogContext: ** START BigDatalog Program START **
Exception in thread "main" java.lang.NullPointerException
at edu.ucla.cs.wis.bigdatalog.spark.logical.LogicalPlanGenerator.getPlan(LogicalPlanGenerator.scala:74)
at edu.ucla.cs.wis.bigdatalog.spark.logical.LogicalPlanGenerator.getPlan(LogicalPlanGenerator.scala:101)
at edu.ucla.cs.wis.bigdatalog.spark.logical.LogicalPlanGenerator.getPlan(LogicalPlanGenerator.scala:70)
at edu.ucla.cs.wis.bigdatalog.spark.logical.LogicalPlanGenerator.generateSparkProgram(LogicalPlanGenerator.scala:66)
at edu.ucla.cs.wis.bigdatalog.spark.BigDatalogContext.generateProgram(BigDatalogContext.scala:149)
at edu.ucla.cs.wis.bigdatalog.spark.BigDatalogContext.query(BigDatalogContext.scala:137)
at bigdatalog.Main.main(Main.java:59)
17/10/14 23:16:44 INFO SparkContext: Invoking stop() from shutdown hook
Is there any way to get it working?
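For reference, the expected result can be checked independently of BigDatalog. Below is a minimal sketch of a naive bottom-up evaluation of the three rules in plain Python; the function name `evaluate` is only illustrative and this is not how BigDatalog evaluates the program internally.

```python
def evaluate(rel1):
    """Naive bottom-up evaluation: reapply all rules until no new facts appear."""
    rel2, rel3 = set(), set()
    while True:
        # rel2(X,Y) <- rel1(X,Y).
        new_rel2 = set(rel1)
        # rel2(X,Z) <- rel3(X,Y), rel1(Y,Z).
        new_rel2 |= {(x, z) for (x, y) in rel3 for (y2, z) in rel1 if y == y2}
        # rel3(X,Z) <- rel2(X,Y), rel1(Y,Z).
        new_rel3 = {(x, z) for (x, y) in rel2 for (y2, z) in rel1 if y == y2}
        # Fixpoint reached: this round derived nothing new.
        if new_rel2 <= rel2 and new_rel3 <= rel3:
            return rel2, rel3
        rel2 |= new_rel2
        rel3 |= new_rel3


rel1 = {("a", "b"), ("b", "c")}
rel2, rel3 = evaluate(rel1)
print(sorted(rel3))  # [('a', 'c')]
```

With the two input facts above, the only rel3 tuple is (a, c), which matches the output I expected.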
I tried to run a recursive query, but it fails with a lot of error messages. (See #3 for more details on how I run the program.)
A B
B C
C D
D E
E F
F G
G H
database({ arcs(X:string, Y:string) }).
tc(X,Y) <- arcs(X,Y).
tc(X,Y) <- tc(X, Z), arcs(Z,Y).
./bin/run-example datalog.Experiments --program=99 --file=../redirect.txt --queryform="tc(A,B)" --baserelation_arcs=../bigdatalog-java/arcs
Here is the part of the output with the first error message (see the attachment for the complete log):
17/10/10 16:35:57 INFO Recursion: Fixed Point Iteration # 2, time: 9170ms
17/10/10 16:35:57 INFO DAGScheduler: Submitting FixedPointResultStage 3 (SetRDD.diffRDD SetRDD[32] at RDD at SetRDD.scala:29), which has no missing parents
17/10/10 16:35:57 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 16.9 KB, free 510.0 MB)
17/10/10 16:35:57 INFO MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 8.7 KB, free 510.1 MB)
17/10/10 16:35:57 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory on localhost:43953 (size: 8.7 KB, free: 1135.5 KB)
17/10/10 16:35:57 INFO SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:1096
17/10/10 16:35:57 INFO DAGScheduler: Submitting 200 missing tasks from FixedPointResultStage 3 (SetRDD.diffRDD SetRDD[32] at RDD at SetRDD.scala:29)
17/10/10 16:35:57 INFO TaskSchedulerImpl: Adding task set 3.0 with 200 tasks
17/10/10 16:35:57 INFO TaskSetManager: Starting task 121.0 in stage 3.0 (TID 256, localhost, partition 121,PROCESS_LOCAL, 2343 bytes)
17/10/10 16:35:57 INFO TaskSetManager: Starting task 123.0 in stage 3.0 (TID 257, localhost, partition 123,PROCESS_LOCAL, 2343 bytes)
17/10/10 16:35:57 INFO TaskSetManager: Starting task 124.0 in stage 3.0 (TID 258, localhost, partition 124,PROCESS_LOCAL, 2343 bytes)
17/10/10 16:35:57 INFO TaskSetManager: Starting task 125.0 in stage 3.0 (TID 259, localhost, partition 125,PROCESS_LOCAL, 2343 bytes)
17/10/10 16:35:57 INFO Executor: Running task 121.0 in stage 3.0 (TID 256)
17/10/10 16:35:57 INFO Executor: Running task 123.0 in stage 3.0 (TID 257)
17/10/10 16:35:57 INFO Executor: Running task 124.0 in stage 3.0 (TID 258)
17/10/10 16:35:57 INFO Executor: Running task 125.0 in stage 3.0 (TID 259)
17/10/10 16:35:57 INFO CacheManager: Partition rdd_31_123 not found, computing it
17/10/10 16:35:57 INFO CacheManager: Partition rdd_31_125 not found, computing it
17/10/10 16:35:57 INFO CacheManager: Partition rdd_31_121 not found, computing it
17/10/10 16:35:57 INFO CacheManager: Partition rdd_27_123 not found, computing it
17/10/10 16:35:57 INFO CacheManager: Partition rdd_27_121 not found, computing it
17/10/10 16:35:57 INFO BlockManager: Found block rdd_17_123 locally
17/10/10 16:35:57 INFO BlockManager: Found block rdd_21_123 locally
17/10/10 16:35:57 INFO CacheManager: Partition rdd_17_121 not found, computing it
17/10/10 16:35:57 INFO SetRDDHashSetPartition: Union set size 0 for rdd 18 took 0 ms
17/10/10 16:35:57 ERROR Executor: Exception in task 121.0 in stage 3.0 (TID 256)
org.apache.spark.SparkException: Checkpoint block rdd_17_121 not found! Either the executor that originally checkpointed this partition is no longer alive, or the original RDD is unpersisted. If this problem persists, you may consider using `rdd.checkpoint()` or `rdd.localcheckpoint()` instead, which are slower than memory checkpointing but more fault-tolerant.
at org.apache.spark.rdd.MemoryCheckpointRDD.compute(MemoryCheckpointRDD.scala:43)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:304)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
at edu.ucla.cs.wis.bigdatalog.spark.execution.setrdd.SetRDD.compute(SetRDD.scala:108)
at edu.ucla.cs.wis.bigdatalog.spark.execution.setrdd.SetRDD.computeOrReadCheckpoint(SetRDD.scala:104)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.fixedpoint.FixedPointResultTask.runTask(FixedPointResultTask.scala:54)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
17/10/10 16:35:57 INFO CacheManager: Partition rdd_27_125 not found, computing it
17/10/10 16:35:57 INFO BlockManager: Found block rdd_17_125 locally
17/10/10 16:35:57 INFO BlockManager: Found block rdd_21_125 locally
17/10/10 16:35:57 INFO SetRDDHashSetPartition: Union set size 0 for rdd 18 took 0 ms
17/10/10 16:35:57 INFO MemoryStore: 1 blocks selected for dropping
17/10/10 16:35:57 INFO BlockManager: Dropping block rdd_17_124 from memory
17/10/10 16:35:57 INFO BlockManagerInfo: Removed rdd_17_124 on localhost:43953 in memory (size: 1701.1 KB, free: 2.8 MB)
17/10/10 16:35:57 INFO MemoryStore: 1 blocks selected for dropping
17/10/10 16:35:57 INFO BlockManager: Dropping block rdd_11_125 from memory
17/10/10 16:35:57 INFO BlockManagerInfo: Removed rdd_11_125 on localhost:43953 in memory (size: 1701.1 KB, free: 4.4 MB)
17/10/10 16:35:57 INFO MemoryStore: 1 blocks selected for dropping
17/10/10 16:35:57 INFO BlockManager: Dropping block rdd_15_124 from memory
17/10/10 16:35:57 INFO BlockManagerInfo: Removed rdd_15_124 on localhost:43953 in memory (size: 1701.1 KB, free: 6.1 MB)
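As a sanity check on what the tc program should return for the A to H chain, here is a minimal semi-naive transitive closure in plain Python. It is only a sketch for comparing results: the function name `transitive_closure` is illustrative, and it says nothing about the checkpoint error itself.

```python
def transitive_closure(arcs):
    """Semi-naive evaluation: join only the facts that were new in the last round."""
    tc = set(arcs)      # tc(X,Y) <- arcs(X,Y).
    delta = set(arcs)   # facts derived in the previous round
    while delta:
        # tc(X,Y) <- tc(X,Z), arcs(Z,Y).  (tc restricted to delta)
        derived = {(x, y) for (x, z) in delta for (z2, y) in arcs if z == z2}
        delta = derived - tc
        tc |= delta
    return tc


nodes = "ABCDEFGH"
arcs = set(zip(nodes, nodes[1:]))  # A->B, B->C, ..., G->H
tc = transitive_closure(arcs)
print(len(tc))  # 28
```

For a chain of 8 nodes the closure has 7 + 6 + ... + 1 = 28 pairs, so that is the result size the query tc(A,B) should produce on this input.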
Hello,
I tried to evaluate the query bad(X,Y) on the program
bad(X, X) <- parent(X, X).
with the following input:
parent(bob, alice).
parent(charly, alice).
It should not produce any result tuples; however, I get
bad(bob, alice).
bad(charly, alice).
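To make the expected semantics explicit: a variable repeated in a body atom, as in parent(X, X), acts as an equality constraint on the two arguments. A minimal Python sketch of that reading (the helper name `eval_bad` and the extra `dave` fact are only illustrative):

```python
def eval_bad(parent):
    """bad(X, X) <- parent(X, X): keep only tuples whose two arguments are equal."""
    return {(x, x) for (x, y) in parent if x == y}


parent = {("bob", "alice"), ("charly", "alice")}
print(eval_bad(parent))                       # set()
print(eval_bad(parent | {("dave", "dave")}))  # {('dave', 'dave')}
```

Under this reading the input above yields no bad tuples at all; a tuple only appears once parent contains a fact with identical arguments.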