
bigdatalog's People

Contributors

ashkapsky

bigdatalog's Issues

I am writing a pyspark.pandas dataframe to a Delta table in Azure Databricks with a compute cluster (4-7 workers, 256-448 GB memory, 128-224 cores; 1 driver, 32 GB memory, 16 cores; Runtime 11.3.x-scala2.12). The dataframe has around 20 million rows.
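For context, a minimal sketch of the kind of write being described (the DataFrame name, the toy data, and the target path below are illustrative assumptions, not taken from the original job; an active SparkSession with Delta support is assumed, as on a Databricks cluster):

# Minimal sketch of a pandas-on-Spark -> Delta write (assumed names and path).
import pyspark.pandas as ps

# Stand-in data for the real ~20-million-row frame.
psdf = ps.DataFrame({"id": list(range(10)), "value": list(range(10))})

# Write via the pandas-on-Spark API ...
psdf.to_delta("/mnt/example/delta/my_table", mode="overwrite")

# ... or via the underlying Spark DataFrame writer.
sdf = psdf.to_spark()
sdf.write.format("delta").mode("overwrite").save("/mnt/example/delta/my_table")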

Facing the same error. I am writing a pyspark.pandas dataframe to delta:

*********** An error occurred while calling o11455.save.
: org.apache.spark.SparkException: Job aborted.
at org.apache.spark.sql.errors.QueryExecutionErrors$.jobAbortedError(QueryExecutionErrors.scala:882)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$1(FileFormatWriter.scala:334)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:154)
at com.databricks.sql.transaction.tahoe.commands.WriteIntoDeltaCommand.run(WriteIntoDeltaCommand.scala:70)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.$anonfun$writeFiles$11(TransactionalWriteEdge.scala:571)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$8(SQLExecution.scala:250)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:400)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:195)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:985)
at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:149)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:350)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.$anonfun$writeFiles$1(TransactionalWriteEdge.scala:571)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.withOperationTypeTag(DeltaLogging.scala:193)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.withOperationTypeTag$(DeltaLogging.scala:180)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.withOperationTypeTag(OptimisticTransaction.scala:112)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.$anonfun$recordDeltaOperationInternal$2(DeltaLogging.scala:157)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordFrameProfile(DeltaLogging.scala:262)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordFrameProfile$(DeltaLogging.scala:260)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.recordFrameProfile(OptimisticTransaction.scala:112)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.$anonfun$recordDeltaOperationInternal$1(DeltaLogging.scala:156)
at com.databricks.logging.UsageLogging.$anonfun$recordOperation$1(UsageLogging.scala:547)
at com.databricks.logging.UsageLogging.executeThunkAndCaptureResultTags$1(UsageLogging.scala:642)
at com.databricks.logging.UsageLogging.$anonfun$recordOperationWithResultTags$4(UsageLogging.scala:663)
at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:404)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:147)
at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:402)
at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:399)
at com.databricks.spark.util.PublicDBLogging.withAttributionContext(DatabricksSparkUsageLogger.scala:26)
at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:447)
at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:432)
at com.databricks.spark.util.PublicDBLogging.withAttributionTags(DatabricksSparkUsageLogger.scala:26)
at com.databricks.logging.UsageLogging.recordOperationWithResultTags(UsageLogging.scala:637)
at com.databricks.logging.UsageLogging.recordOperationWithResultTags$(UsageLogging.scala:556)
at com.databricks.spark.util.PublicDBLogging.recordOperationWithResultTags(DatabricksSparkUsageLogger.scala:26)
at com.databricks.logging.UsageLogging.recordOperation(UsageLogging.scala:547)
at com.databricks.logging.UsageLogging.recordOperation$(UsageLogging.scala:517)
at com.databricks.spark.util.PublicDBLogging.recordOperation(DatabricksSparkUsageLogger.scala:26)
at com.databricks.spark.util.PublicDBLogging.recordOperation0(DatabricksSparkUsageLogger.scala:66)
at com.databricks.spark.util.DatabricksSparkUsageLogger.recordOperation(DatabricksSparkUsageLogger.scala:148)
at com.databricks.spark.util.UsageLogger.recordOperation(UsageLogger.scala:72)
at com.databricks.spark.util.UsageLogger.recordOperation$(UsageLogger.scala:59)
at com.databricks.spark.util.DatabricksSparkUsageLogger.recordOperation(DatabricksSparkUsageLogger.scala:107)
at com.databricks.spark.util.UsageLogging.recordOperation(UsageLogger.scala:433)
at com.databricks.spark.util.UsageLogging.recordOperation$(UsageLogger.scala:412)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.recordOperation(OptimisticTransaction.scala:112)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordDeltaOperationInternal(DeltaLogging.scala:155)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordDeltaOperation(DeltaLogging.scala:145)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordDeltaOperation$(DeltaLogging.scala:133)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.recordDeltaOperation(OptimisticTransaction.scala:112)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.$anonfun$recordWriteFilesOperation$1(TransactionalWriteEdge.scala:307)
at com.databricks.sql.acl.CheckPermissions$.trusted(CheckPermissions.scala:1837)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.recordWriteFilesOperation(TransactionalWriteEdge.scala:306)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.writeFiles(TransactionalWriteEdge.scala:339)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.writeFiles$(TransactionalWriteEdge.scala:333)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.writeFiles(OptimisticTransaction.scala:112)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.writeFiles(TransactionalWriteEdge.scala:621)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.writeFiles$(TransactionalWriteEdge.scala:611)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.writeFiles(OptimisticTransaction.scala:112)
at com.databricks.sql.transaction.tahoe.files.TransactionalWrite.writeFiles(TransactionalWrite.scala:226)
at com.databricks.sql.transaction.tahoe.files.TransactionalWrite.writeFiles$(TransactionalWrite.scala:223)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.writeFiles(OptimisticTransaction.scala:112)
at com.databricks.sql.transaction.tahoe.commands.WriteIntoDelta.write(WriteIntoDelta.scala:368)
at com.databricks.sql.transaction.tahoe.commands.WriteIntoDelta.$anonfun$run$2(WriteIntoDelta.scala:111)
at com.databricks.sql.transaction.tahoe.commands.WriteIntoDelta.$anonfun$run$2$adapted(WriteIntoDelta.scala:100)
at com.databricks.sql.transaction.tahoe.DeltaLog.withNewTransaction(DeltaLog.scala:312)
at com.databricks.sql.transaction.tahoe.commands.WriteIntoDelta.$anonfun$run$1(WriteIntoDelta.scala:100)
at com.databricks.sql.acl.CheckPermissions$.trusted(CheckPermissions.scala:1837)
at com.databricks.sql.transaction.tahoe.commands.WriteIntoDelta.run(WriteIntoDelta.scala:99)
at com.databricks.sql.transaction.tahoe.sources.DeltaDataSource.createRelation(DeltaDataSource.scala:168)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:49)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:80)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:78)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:89)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$2(QueryExecution.scala:247)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$8(SQLExecution.scala:250)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:400)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:195)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:985)
at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:149)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:350)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$1(QueryExecution.scala:247)
at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$withMVTagsIfNecessary(QueryExecution.scala:232)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:245)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:238)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:99)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:31)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:298)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:294)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:488)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$eagerlyExecuteCommands$1(QueryExecution.scala:238)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:354)
at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:238)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:192)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:183)
at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:274)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:965)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:430)
at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:339)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:251)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:306)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 43 in stage 360.0 failed 4 times, most recent failure: Lost task 43.3 in stage 360.0 (TID 9912) (172.16.177.134 executor 10): org.apache.spark.SparkException: Checkpoint block rdd_1208_8 not found! Either the executor that originally checkpointed this partition is no longer alive, or the original RDD is unpersisted. If this problem persists, you may consider using rdd.checkpoint() instead, which is slower than local checkpointing but more fault-tolerant.
at org.apache.spark.errors.SparkCoreErrors$.checkpointRDDBlockIdNotFoundError(SparkCoreErrors.scala:82)
at org.apache.spark.rdd.LocalCheckpointRDD.compute(LocalCheckpointRDD.scala:61)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:408)
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:423)
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1559)
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1486)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1550)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1369)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:421)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:372)
at org.apache.spark.rdd.ZippedWithIndexRDD.compute(ZippedWithIndexRDD.scala:71)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:122)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$3(ShuffleMapTask.scala:81)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$1(ShuffleMapTask.scala:81)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.doRunTask(Task.scala:174)
at org.apache.spark.scheduler.Task.$anonfun$run$4(Task.scala:137)
at com.databricks.unity.EmptyHandle$.runWithAndClose(UCSHandle.scala:126)
at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:137)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.Task.run(Task.scala:96)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$13(Executor.scala:902)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1697)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:905)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:760)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:3414)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:3336)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:3325)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:3325)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1433)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1433)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1433)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3625)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3563)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3551)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:51)
Caused by: org.apache.spark.SparkException: Checkpoint block rdd_1208_8 not found! Either the executor that originally checkpointed this partition is no longer alive, or the original RDD is unpersisted. If this problem persists, you may consider using rdd.checkpoint() instead, which is slower than local checkpointing but more fault-tolerant.
at org.apache.spark.errors.SparkCoreErrors$.checkpointRDDBlockIdNotFoundError(SparkCoreErrors.scala:82)
at org.apache.spark.rdd.LocalCheckpointRDD.compute(LocalCheckpointRDD.scala:61)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:408)
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:423)
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1559)
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1486)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1550)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1369)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:421)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:372)
at org.apache.spark.rdd.ZippedWithIndexRDD.compute(ZippedWithIndexRDD.scala:71)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:122)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$3(ShuffleMapTask.scala:81)
at com.data

*** WARNING: max output size exceeded, skipping output. ***

Logging.scala:193)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.withOperationTypeTag$(DeltaLogging.scala:180)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.withOperationTypeTag(OptimisticTransaction.scala:112)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.$anonfun$recordDeltaOperationInternal$2(DeltaLogging.scala:157)
at com.databricks.spark.util.FrameProfiler$.record(FrameProfiler.scala:80)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordFrameProfile(DeltaLogging.scala:262)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordFrameProfile$(DeltaLogging.scala:260)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.recordFrameProfile(OptimisticTransaction.scala:112)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.$anonfun$recordDeltaOperationInternal$1(DeltaLogging.scala:156)
at com.databricks.logging.UsageLogging.$anonfun$recordOperation$1(UsageLogging.scala:547)
at com.databricks.logging.UsageLogging.executeThunkAndCaptureResultTags$1(UsageLogging.scala:642)
at com.databricks.logging.UsageLogging.$anonfun$recordOperationWithResultTags$4(UsageLogging.scala:663)
at com.databricks.logging.UsageLogging.$anonfun$withAttributionContext$1(UsageLogging.scala:404)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:147)
at com.databricks.logging.UsageLogging.withAttributionContext(UsageLogging.scala:402)
at com.databricks.logging.UsageLogging.withAttributionContext$(UsageLogging.scala:399)
at com.databricks.spark.util.PublicDBLogging.withAttributionContext(DatabricksSparkUsageLogger.scala:26)
at com.databricks.logging.UsageLogging.withAttributionTags(UsageLogging.scala:447)
at com.databricks.logging.UsageLogging.withAttributionTags$(UsageLogging.scala:432)
at com.databricks.spark.util.PublicDBLogging.withAttributionTags(DatabricksSparkUsageLogger.scala:26)
at com.databricks.logging.UsageLogging.recordOperationWithResultTags(UsageLogging.scala:637)
at com.databricks.logging.UsageLogging.recordOperationWithResultTags$(UsageLogging.scala:556)
at com.databricks.spark.util.PublicDBLogging.recordOperationWithResultTags(DatabricksSparkUsageLogger.scala:26)
at com.databricks.logging.UsageLogging.recordOperation(UsageLogging.scala:547)
at com.databricks.logging.UsageLogging.recordOperation$(UsageLogging.scala:517)
at com.databricks.spark.util.PublicDBLogging.recordOperation(DatabricksSparkUsageLogger.scala:26)
at com.databricks.spark.util.PublicDBLogging.recordOperation0(DatabricksSparkUsageLogger.scala:66)
at com.databricks.spark.util.DatabricksSparkUsageLogger.recordOperation(DatabricksSparkUsageLogger.scala:148)
at com.databricks.spark.util.UsageLogger.recordOperation(UsageLogger.scala:72)
at com.databricks.spark.util.UsageLogger.recordOperation$(UsageLogger.scala:59)
at com.databricks.spark.util.DatabricksSparkUsageLogger.recordOperation(DatabricksSparkUsageLogger.scala:107)
at com.databricks.spark.util.UsageLogging.recordOperation(UsageLogger.scala:433)
at com.databricks.spark.util.UsageLogging.recordOperation$(UsageLogger.scala:412)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.recordOperation(OptimisticTransaction.scala:112)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordDeltaOperationInternal(DeltaLogging.scala:155)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordDeltaOperation(DeltaLogging.scala:145)
at com.databricks.sql.transaction.tahoe.metering.DeltaLogging.recordDeltaOperation$(DeltaLogging.scala:133)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.recordDeltaOperation(OptimisticTransaction.scala:112)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.$anonfun$recordWriteFilesOperation$1(TransactionalWriteEdge.scala:307)
at com.databricks.sql.acl.CheckPermissions$.trusted(CheckPermissions.scala:1837)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.recordWriteFilesOperation(TransactionalWriteEdge.scala:306)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.writeFiles(TransactionalWriteEdge.scala:339)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.writeFiles$(TransactionalWriteEdge.scala:333)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.writeFiles(OptimisticTransaction.scala:112)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.writeFiles(TransactionalWriteEdge.scala:621)
at com.databricks.sql.transaction.tahoe.files.TransactionalWriteEdge.writeFiles$(TransactionalWriteEdge.scala:611)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.writeFiles(OptimisticTransaction.scala:112)
at com.databricks.sql.transaction.tahoe.files.TransactionalWrite.writeFiles(TransactionalWrite.scala:226)
at com.databricks.sql.transaction.tahoe.files.TransactionalWrite.writeFiles$(TransactionalWrite.scala:223)
at com.databricks.sql.transaction.tahoe.OptimisticTransaction.writeFiles(OptimisticTransaction.scala:112)
at com.databricks.sql.transaction.tahoe.commands.WriteIntoDelta.write(WriteIntoDelta.scala:368)
at com.databricks.sql.transaction.tahoe.commands.WriteIntoDelta.$anonfun$run$2(WriteIntoDelta.scala:111)
at com.databricks.sql.transaction.tahoe.commands.WriteIntoDelta.$anonfun$run$2$adapted(WriteIntoDelta.scala:100)
at com.databricks.sql.transaction.tahoe.DeltaLog.withNewTransaction(DeltaLog.scala:312)
at com.databricks.sql.transaction.tahoe.commands.WriteIntoDelta.$anonfun$run$1(WriteIntoDelta.scala:100)
at com.databricks.sql.acl.CheckPermissions$.trusted(CheckPermissions.scala:1837)
at com.databricks.sql.transaction.tahoe.commands.WriteIntoDelta.run(WriteIntoDelta.scala:99)
at com.databricks.sql.transaction.tahoe.sources.DeltaDataSource.createRelation(DeltaDataSource.scala:168)
at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:49)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:80)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:78)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:89)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$2(QueryExecution.scala:247)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$8(SQLExecution.scala:250)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:400)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withCustomExecutionEnv$1(SQLExecution.scala:195)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:985)
at org.apache.spark.sql.execution.SQLExecution$.withCustomExecutionEnv(SQLExecution.scala:149)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:350)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.$anonfun$applyOrElse$1(QueryExecution.scala:247)
at org.apache.spark.sql.execution.QueryExecution.org$apache$spark$sql$execution$QueryExecution$$withMVTagsIfNecessary(QueryExecution.scala:232)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:245)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$$nestedInanonfun$eagerlyExecuteCommands$1$1.applyOrElse(QueryExecution.scala:238)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:99)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:31)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:298)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:294)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:488)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$eagerlyExecuteCommands$1(QueryExecution.scala:238)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:354)
at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:238)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:192)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:183)
at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:274)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:965)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:430)
at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:339)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:251)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:306)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 43 in stage 360.0 failed 4 times, most recent failure: Lost task 43.3 in stage 360.0 (TID 9912) (172.16.177.134 executor 10): org.apache.spark.SparkException: Checkpoint block rdd_1208_8 not found! Either the executor that originally checkpointed this partition is no longer alive, or the original RDD is unpersisted. If this problem persists, you may consider using rdd.checkpoint() instead, which is slower than local checkpointing but more fault-tolerant.
at org.apache.spark.errors.SparkCoreErrors$.checkpointRDDBlockIdNotFoundError(SparkCoreErrors.scala:82)
at org.apache.spark.rdd.LocalCheckpointRDD.compute(LocalCheckpointRDD.scala:61)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:408)
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:423)
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1559)
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1486)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1550)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1369)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:421)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:372)
at org.apache.spark.rdd.ZippedWithIndexRDD.compute(ZippedWithIndexRDD.scala:71)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:122)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$3(ShuffleMapTask.scala:81)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$1(ShuffleMapTask.scala:81)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.doRunTask(Task.scala:174)
at org.apache.spark.scheduler.Task.$anonfun$run$4(Task.scala:137)
at com.databricks.unity.EmptyHandle$.runWithAndClose(UCSHandle.scala:126)
at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:137)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.Task.run(Task.scala:96)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$13(Executor.scala:902)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1697)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:905)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:760)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:3414)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:3336)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:3325)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:3325)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1433)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1433)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1433)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3625)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3563)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3551)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:51)
Caused by: org.apache.spark.SparkException: Checkpoint block rdd_1208_8 not found! Either the executor that originally checkpointed this partition is no longer alive, or the original RDD is unpersisted. If this problem persists, you may consider using rdd.checkpoint() instead, which is slower than local checkpointing but more fault-tolerant.
at org.apache.spark.errors.SparkCoreErrors$.checkpointRDDBlockIdNotFoundError(SparkCoreErrors.scala:82)
at org.apache.spark.rdd.LocalCheckpointRDD.compute(LocalCheckpointRDD.scala:61)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:408)
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:423)
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1559)
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1486)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1550)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1369)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:421)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:372)
at org.apache.spark.rdd.ZippedWithIndexRDD.compute(ZippedWithIndexRDD.scala:71)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:122)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$3(ShuffleMapTask.scala:81)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$1(ShuffleMapTask.scala:81)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.doRunTask(Task.scala:174)
at org.apache.spark.scheduler.Task.$anonfun$run$4(Task.scala:137)
at com.databricks.unity.EmptyHandle$.runWithAndClose(UCSHandle.scala:126)
at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:137)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.Task.run(Task.scala:96)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$13(Executor.scala:902)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1697)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:905)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:760)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)

shape - (191365764, 5)

Originally posted by @alanthom in #9 (comment)
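The error text above suggests switching from local checkpointing to reliable checkpointing via rdd.checkpoint(). A minimal sketch of the two RDD calls it contrasts (assuming an existing SparkSession, as on Databricks, and a hypothetical checkpoint directory):

# Sketch of the two checkpointing modes named in the error message.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# localCheckpoint(): fast, keeps the checkpointed blocks only on executors;
# if the executor holding a block is lost, later reads fail with
# "Checkpoint block ... not found", as in the trace above.
rdd_local = sc.parallelize(range(1000))
rdd_local.localCheckpoint()
rdd_local.count()

# checkpoint(): slower, writes the data to a reliable store (e.g. DBFS),
# so it survives executor loss. A checkpoint directory must be set first.
sc.setCheckpointDir("dbfs:/tmp/example_checkpoints")  # hypothetical path
rdd_reliable = sc.parallelize(range(1000))
rdd_reliable.checkpoint()
rdd_reliable.count()  # an action materializes the checkpoint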

Problems with "missing" column when using one level of indirection

Hello,

I get an error message that a column is not found (see error.zip for the full log message):

org.apache.spark.sql.AnalysisException: cannot resolve 'extern_file_arcs2.Z' given input columns: [X, Z, Z, Y];

I used the following Datalog program:

database({ extern_file_arcs(X:string, Y:string) }).
arcs(X,Y) <- extern_file_arcs(X,Y).
tc(X,Y) <- arcs(X,Y).
tc(X,Y) <- tc(X,Z), arcs(Z,Y).

By chance I doubled the second line. Astonishingly, it then ran without problems:

database({ extern_file_arcs(X:string, Y:string) }).
arcs(X,Y) <- extern_file_arcs(X,Y).
arcs(X,Y) <- extern_file_arcs(X,Y).
tc(X,Y) <- arcs(X,Y).
tc(X,Y) <- tc(X,Z), arcs(Z,Y).

Is that a bug or a feature? How can I be sure that I have repeated the rules often enough?

at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:512)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:99)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:512)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:31)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:298)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:294)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:31)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:488)
at org.apache.spark.sql.execution.QueryExecution.$anonfun$eagerlyExecuteCommands$1(QueryExecution.scala:238)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:354)
at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:238)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:192)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:183)
at org.apache.spark.sql.execution.QueryExecution.assertCommandExecuted(QueryExecution.scala:274)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:965)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:430)
at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:339)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:251)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
at py4j.Gateway.invoke(Gateway.java:306)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:195)
at py4j.ClientServerConnection.run(ClientServerConnection.java:115)
at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 43 in stage 360.0 failed 4 times, most recent failure: Lost task 43.3 in stage 360.0 (TID 9912) (172.16.177.134 executor 10): org.apache.spark.SparkException: Checkpoint block rdd_1208_8 not found! Either the executor that originally checkpointed this partition is no longer alive, or the original RDD is unpersisted. If this problem persists, you may consider using rdd.checkpoint() instead, which is slower than local checkpointing but more fault-tolerant.
at org.apache.spark.errors.SparkCoreErrors$.checkpointRDDBlockIdNotFoundError(SparkCoreErrors.scala:82)
at org.apache.spark.rdd.LocalCheckpointRDD.compute(LocalCheckpointRDD.scala:61)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:408)
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:423)
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1559)
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1486)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1550)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1369)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:421)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:372)
at org.apache.spark.rdd.ZippedWithIndexRDD.compute(ZippedWithIndexRDD.scala:71)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:122)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$3(ShuffleMapTask.scala:81)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$1(ShuffleMapTask.scala:81)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.doRunTask(Task.scala:174)
at org.apache.spark.scheduler.Task.$anonfun$run$4(Task.scala:137)
at com.databricks.unity.EmptyHandle$.runWithAndClose(UCSHandle.scala:126)
at org.apache.spark.scheduler.Task.$anonfun$run$1(Task.scala:137)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.scheduler.Task.run(Task.scala:96)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$13(Executor.scala:902)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1697)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:905)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.spark.util.ExecutorFrameProfiler$.record(ExecutorFrameProfiler.scala:110)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:760)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)

Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:3414)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:3336)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:3325)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:3325)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1433)
at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1433)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1433)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:3625)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3563)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:3551)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:51)
Caused by: org.apache.spark.SparkException: Checkpoint block rdd_1208_8 not found! Either the executor that originally checkpointed this partition is no longer alive, or the original RDD is unpersisted. If this problem persists, you may consider using rdd.checkpoint() instead, which is slower than local checkpointing but more fault-tolerant.
at org.apache.spark.errors.SparkCoreErrors$.checkpointRDDBlockIdNotFoundError(SparkCoreErrors.scala:82)
at org.apache.spark.rdd.LocalCheckpointRDD.compute(LocalCheckpointRDD.scala:61)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:408)
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:423)
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1559)
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1486)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1550)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1369)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:421)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:372)
at org.apache.spark.rdd.ZippedWithIndexRDD.compute(ZippedWithIndexRDD.scala:71)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.UnionRDD.compute(UnionRDD.scala:122)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:60)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:410)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:374)
at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
at org.apache.spark.scheduler.ShuffleMapTask.$anonfun$runTask$3(ShuffleMapTask.scala:81)
at com.data

*** WARNING: max output size exceeded, skipping output. ***


shape - (191365764, 5)

Originally posted by @alanthom in #9 (comment)
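For anyone landing here: the exception itself hints at the fix, since locally checkpointed partitions live only on the executor that computed them and disappear when that executor is lost (for example under autoscaling or spot-instance reclamation). Below is a minimal sketch of switching to reliable checkpointing, assuming an existing SparkContext named sc and a durable checkpoint location; the paths and the arcs input are only illustrations, not taken from the issue.

// Reliable checkpointing writes partitions to a durable file system,
// so losing an executor does not lose the checkpointed blocks.
sc.setCheckpointDir("dbfs:/tmp/checkpoints")      // illustrative durable path

val arcs = sc.textFile("dbfs:/tmp/arcs.tsv")      // illustrative input
  .map(_.split("\t"))
  .map(a => (a(0), a(1)))

arcs.checkpoint()   // instead of arcs.localCheckpoint()
arcs.count()        // run an action so the checkpoint is actually written

This trades speed for fault tolerance, which is exactly the trade-off the error message describes.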

Improve readme

Hello,

Could you add more details about how to run a program in the README? It would help to specify the command-line options of Experiments.scala. Here are the steps that made it work for me.

content of ../redirect.txt

../bigdatalog.deal

content of ../bigdatalog.deal

database({ arcs(X:string, Y:string) }).
tc(X,Y) <- arcs(X,Y).

content of ../arcs

A	B
B	C
C	D
D	E
E	F
F	G
G	H

command to execute the example

./bin/run-example datalog.Experiments --program=99 --file=../redirect.txt --queryform="tc(A,B)" --baserelation_arcs=../arcs

org.apache.spark.sql.AnalysisException: undefined function mcount

Hi,

I was trying to execute a script using the PageRank algorithm, so I defined a schema like this:

database({ node(N:double), edge(S:double, Sink:double), edgeCount(S:double, Cnt:double) }).
rank(Inc, N, msum<R>) <- node(N), Inc = 1, R = 0.15 / N.
rank(Incp, N, msum<R>) <- rank(Inc, P, RP), edge(P,N), edgeCount(P, Cnt), Cnt > 0, R = 0.85 * RP / Cnt, Incp = Inc + 1.

However, when I tried to execute it, I got this error:

Exception in thread "main" org.apache.spark.sql.AnalysisException: undefined function mcount;
	at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry$$anonfun$2.apply(FunctionRegistry.scala:65)
	at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry$$anonfun$2.apply(FunctionRegistry.scala:65)
	at scala.Option.getOrElse(Option.scala:120)
	at org.apache.spark.sql.catalyst.analysis.SimpleFunctionRegistry.lookupFunction(FunctionRegistry.scala:64)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$12$$anonfun$applyOrElse$5$$anonfun$applyOrElse$24.apply(Analyzer.scala:574)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$12$$anonfun$applyOrElse$5$$anonfun$applyOrElse$24.apply(Analyzer.scala:574)
	at org.apache.spark.sql.catalyst.analysis.package$.withPosition(package.scala:48)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$12$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:573)
	at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveFunctions$$anonfun$apply$12$$anonfun$applyOrElse$5.applyOrElse(Analyzer.scala:570)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:259)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:259)
	at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)
	at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:258)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:264)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:264)
	at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:281)
	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
...

I'm not trying to use the mcount function, but apparently the program is trying to call it.
Thanks!

Project Future

I guess there is no roadmap :)
Is it not supposed to be compatible with Spark 2.x?
Will there be nothing past DeALS-0.6.jar (0.7, ...)?
Are the DeAL GUI editor and DeALSFileRunner for UCLA use only?

Building BigDatalog in 2021

I've run into an enormous number of issues building BigDatalog in 2021:

  • Dockerfiles are all stale. Pointers to packages are dead.
  • Building BigDatalog with Java 1.8 and Spark 1.6.1 results in build errors:
[error]
[error]   last tree to typer: TypeTree(trait Seq)
[error]               symbol: trait Seq in package collection (flags: <interface> abstract <trait> <lateinterface>)
[error]    symbol definition: abstract trait Seq[+A] extends PartialFunction[Int,A] with Iterable[A] with GenSeq[A] with GenericTraversableTemplate[A,Seq] with SeqLike[A,Seq[A]]
[error]                  tpe: Seq
[error]        symbol owners: trait Seq -> package collection
[error]       context owners: anonymous class $anonfun -> method createListenerAndUI -> object SQLContext -> package sql
[error]
[error] == Enclosing template or block ==
[error]
[error] ClassDef( // final class $anonfun extends AbstractFunction1[org.apache.spark.ui.SparkUI,org.apache.spark.sql.execution.ui.SQLTab] with Serializable
[error]   final <synthetic> @{ SerialVersionUID(0) }
[error]   "$anonfun"
[error]   []
[error]   Template( // val <local $anonfun>: <notype>, tree.tpe=scala.runtime.AbstractFunction1[org.apache.spark.ui.SparkUI,org.apache.spark.sql.execution.ui.SQLTab] with Serializable
[error]     "scala.runtime.AbstractFunction1", "scala.Serializable" // parents
[error]     ValDef(
[error]       private
[error]       "_"
[error]       <tpt>
[error]       <empty>
[error]     )
[error]     // 2 statements
[error]     DefDef( // def <init>(): scala.runtime.AbstractFunction1[org.apache.spark.ui.SparkUI,org.apache.spark.sql.execution.ui.SQLTab] with Serializable
[error]       <method> <triedcooking>
[error]       "<init>"
[error]       []
[error]       List(Nil)
[error]       <tpt> // tree.tpe=scala.runtime.AbstractFunction1[org.apache.spark.ui.SparkUI,org.apache.spark.sql.execution.ui.SQLTab] with Serializable
[error]       Block( // tree.tpe=Unit
[error]         Apply( // def <init>(): scala.runtime.AbstractFunction1[T1,R] in class AbstractFunction1, tree.tpe=scala.runtime.AbstractFunction1[org.apache.spark.ui.SparkUI,org.apache.spark.sql.execution.ui.SQLTab]
[error]           $anonfun.super."<init>" // def <init>(): scala.runtime.AbstractFunction1[T1,R] in class AbstractFunction1, tree.tpe=()scala.runtime.AbstractFunction1[org.apache.spark.ui.SparkUI,org.apache.spark.sql.execution.ui.SQLTab]
[error]           Nil
[error]         )
[error]         ()
[error]       )
[error]     )
[error]     DefDef( // final def apply(x$14: org.apache.spark.ui.SparkUI): org.apache.spark.sql.execution.ui.SQLTab
[error]       <method> final
[error]       "apply"
[error]       []
[error]       // 1 parameter list
[error]       ValDef( // x$14: org.apache.spark.ui.SparkUI
[error]         <param> <synthetic> <triedcooking>
[error]         "x$14"
[error]         <tpt> // tree.tpe=org.apache.spark.ui.SparkUI
[error]         <empty>
[error]       )
[error]       <tpt> // tree.tpe=org.apache.spark.sql.execution.ui.SQLTab
[error]       Apply( // def <init>(listener: org.apache.spark.sql.execution.ui.SQLListener,sparkUI: org.apache.spark.ui.SparkUI): org.apache.spark.sql.execution.ui.SQLTab in class SQLTab, tree.tpe=org.apache.spark.sql.execution.ui.SQLTab
[error]         new org.apache.spark.sql.execution.ui.SQLTab."<init>" // def <init>(listener: org.apache.spark.sql.execution.ui.SQLListener,sparkUI: org.apache.spark.ui.SparkUI): org.apache.spark.sql.execution.ui.SQLTab in class SQLTab, tree.tpe=(listener: org.apache.spark.sql.execution.ui.SQLListener, sparkUI: org.apache.spark.ui.SparkUI)org.apache.spark.sql.execution.ui.SQLTab
[error]         // 2 arguments
[error]         "listener" // val listener: org.apache.spark.sql.execution.ui.SQLListener, tree.tpe=org.apache.spark.sql.execution.ui.SQLListener
[error]         "x$14" // x$14: org.apache.spark.ui.SparkUI, tree.tpe=org.apache.spark.ui.SparkUI
[error]       )
[error]     )
[error]   )
[error] )
[error]
[error] == Expanded type of tree ==
[error]
[error] TypeRef(
[error]   TypeSymbol(
[error]     abstract trait Seq[+A] extends PartialFunction[Int,A] with Iterable[A] with GenSeq[A] with GenericTraversableTemplate[A,Seq] with SeqLike[A,Seq[A]]
[error]
[error]   )
[error]   normalize = PolyType(
[error]     typeParams = List(TypeParam(+A))
[error]     resultType = TypeRef(
[error]       TypeSymbol(
[error]         abstract trait Seq[+A] extends PartialFunction[Int,A] with Iterable[A] with GenSeq[A] with GenericTraversableTemplate[A,Seq] with SeqLike[A,Seq[A]]
[error]
[error]       )
[error]       args = List(TypeParamTypeRef(TypeParam(+A)))
[error]     )
[error]   )
[error] )
[error]

Does anyone have a working build of BigDatalog? Our team has been trying to compare it against our own Datalog engine, but we have had very poor luck getting it to run on anything close to a modern system, and the fact that the included Dockerfiles no longer build has defeated their utility.

We've been trying for a few weeks now to build BigDatalog and have essentially given up because it has bitrotted so badly.

StackOverflowError for recursion of type a <- b, b <- a

Hello,

I tried to minimize #7 further. I arrived at

a(X,Y) <- b(X,Y).
b(X,Y) <- a(X,Y).

which leads to a StackOverflowError:

Exception in thread "main" java.lang.StackOverflowError
	at edu.ucla.cs.wis.bigdatalog.interpreter.relational.ProgramGenerator.generateRecursiveOperator(ProgramGenerator.java:1439)
	at edu.ucla.cs.wis.bigdatalog.interpreter.relational.ProgramGenerator.generateOperator(ProgramGenerator.java:186)
	at edu.ucla.cs.wis.bigdatalog.interpreter.relational.ProgramGenerator.generateOperators(ProgramGenerator.java:249)
	at edu.ucla.cs.wis.bigdatalog.interpreter.relational.ProgramGenerator.generateCliqueOperator(ProgramGenerator.java:1616)
	at edu.ucla.cs.wis.bigdatalog.interpreter.relational.ProgramGenerator.generateRecursiveOperator(ProgramGenerator.java:1472)
	at edu.ucla.cs.wis.bigdatalog.interpreter.relational.ProgramGenerator.generateOperator(ProgramGenerator.java:186)
	at edu.ucla.cs.wis.bigdatalog.interpreter.relational.ProgramGenerator.generateOperators(ProgramGenerator.java:249)
	at edu.ucla.cs.wis.bigdatalog.interpreter.relational.ProgramGenerator.generateCliqueOperator(ProgramGenerator.java:1616)
...

and then the stack trace repeats again and again.

SparkException: Checkpoint block not found

Hi, everyone!

I'm new to BigDatalog. My OS is Ubuntu 16.04. My steps are listed as follows:

  1. Download BigDatalog via git;
  2. Compile BigDatalog with the command in Readme.md;
  3. Run a test with a command like this:
    ./bin/run-example org.apache.spark.examples.datalog.Experiments program=11 file=/BigDatalog-master/datalog/src/test/resources/tree11.csv checkpointdir=/checkpoint
  4. Then we saw a bad result like this:
    17-12-01 09:33:56 INFO DAGScheduler: Submitting 200 missing tasks from FixedPointResultStage 3 (SetRDD.diffRDD SetRDD[32] at RDD at SetRDD.scala:29)
    17-12-01 09:33:36 INFO SparkContext: Running Spark version 1.6.1
    17-12-01 09:33:37 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    17-12-01 09:33:37 WARN Utils: Your hostname, magic resolves to a loopback address: 127.0.1.1; using 172.26.163.180 instead (on interface ppp0)
    17-12-01 09:33:37 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
    17-12-01 09:33:37 INFO SecurityManager: Changing view acls to: magic
    17-12-01 09:33:37 INFO SecurityManager: Changing modify acls to: magic
    17-12-01 09:33:37 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(magic); users with modify permissions: Set(magic)
    17-12-01 09:33:37 INFO Utils: Successfully started service 'sparkDriver' on port 37107.
    17-12-01 09:33:37 INFO Slf4jLogger: Slf4jLogger started
    17-12-01 09:33:37 INFO Remoting: Starting remoting
    17-12-01 09:33:38 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:34312]
    17-12-01 09:33:38 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 34312.
    17-12-01 09:33:38 INFO SparkEnv: Registering MapOutputTracker
    17-12-01 09:33:38 INFO SparkEnv: Registering BlockManagerMaster
    17-12-01 09:33:38 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-74c808ac-518c-4872-8034-11f83fe86f8a
    17-12-01 09:33:38 INFO MemoryStore: MemoryStore started with capacity 511.1 MB
    17-12-01 09:33:38 INFO SparkEnv: Registering OutputCommitCoordinator
    17-12-01 09:33:38 INFO Utils: Successfully started service 'SparkUI' on port 4040.
    17-12-01 09:33:38 INFO SparkUI: Started SparkUI at http://172.26.163.180:4040
    17-12-01 09:33:38 INFO HttpFileServer: HTTP File server directory is /tmp/spark-221ba8c7-a682-4887-ace6-b37e2eb9f122/httpd-6f67ecc9-fb6b-4ba4-8678-8d34d1e4faad
    17-12-01 09:33:38 INFO HttpServer: Starting HTTP Server
    17-12-01 09:33:38 INFO Utils: Successfully started service 'HTTP file server' on port 37272.
    17-12-01 09:33:38 INFO SparkContext: Added JAR file:/home/magic/BigDatalog-master/examples/target/scala-2.10/spark-examples-1.6.1-hadoop2.4.0.jar at http://172.26.163.180:37272/jars/spark-examples-1.6.1-hadoop2.4.0.jar with timestamp 1512092018724
    17-12-01 09:33:38 INFO Executor: Starting executor ID driver on host localhost
    17-12-01 09:33:38 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 42113.
    17-12-01 09:33:38 INFO NettyBlockTransferService: Server created on 42113
    17-12-01 09:33:38 INFO BlockManagerMaster: Trying to register BlockManager
    17-12-01 09:33:38 INFO BlockManagerMasterEndpoint: Registering block manager localhost:42113 with 511.1 MB RAM, BlockManagerId(driver, localhost, 42113)
    17-12-01 09:33:38 INFO BlockManagerMaster: Registered BlockManager
    17-12-01 09:33:40 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 156.3 KB, free 156.3 KB)
    17-12-01 09:33:40 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 16.8 KB, free 173.1 KB)
    17-12-01 09:33:40 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:42113 (size: 16.8 KB, free: 511.1 MB)
    17-12-01 09:33:40 INFO SparkContext: Created broadcast 0 from textFile at Utilities.scala:97
    17-12-01 09:33:40 INFO BigDatalogContext: BigDatalog Query: "tc(A,B)."
    17-12-01 09:33:41 INFO BigDatalogContext: ** START Operator Program START **
    17-12-01 09:33:41 INFO BigDatalogContext:
    0: tc(A, To) <RECURSIVE_CLIQUE>(Recursion: LINEAR, Evaluation Type: SemiNaive)
    Exit Rules:
    1: arc(From, To) <BASE_RELATION>
    Recursive Rules:
    1: (A, To)
    2: (0.C = 1.From)
    3: tc(A, C) <RECURSIVE_RELATION>
    3: arc(From, To) <BASE_RELATION>
    17-12-01 09:33:41 INFO BigDatalogContext: ** END Operator Program END **
    17-12-01 09:33:41 INFO BigDatalogContext: ** START BigDatalog Program START **
    17-12-01 09:33:41 INFO BigDatalogContext: == Parsed Logical Plan ==
    'Subquery tc
    +- 'Recursion tc, true, [1,0]
    :- 'UnresolvedRelation arc, None
    +- 'Project ['tc1.A,'arc2.To]
    +- 'Join Inner, Some(('tc1.C = 'arc2.From))
    :- Subquery tc1
    : +- LinearRecursiveRelation tc, [A#4,C#5], [1,0]
    +- 'BroadcastHint
    +- 'Subquery arc2
    +- 'Project [*]
    +- 'UnresolvedRelation arc, None

== Analyzed Logical Plan ==
A: int, To: int
Subquery tc
+- Recursion tc, true, [1,0]
:- Subquery arc
: +- LogicalRDD [From#0,To#1], MapPartitionsRDD[4] at mapPartitions at Utilities.scala:100
+- Project [A#4,To#1]
+- Join Inner, Some((C#5 = From#0))
:- Subquery tc1
: +- LinearRecursiveRelation tc, [A#4,C#5], [1,0]
+- BroadcastHint
+- Subquery arc2
+- Project [From#0,To#1]
+- Subquery arc
+- LogicalRDD [From#0,To#1], MapPartitionsRDD[4] at mapPartitions at Utilities.scala:100

== Optimized Logical Plan ==
Recursion tc, true, [1,0]
:- LogicalRDD [From#0,To#1], MapPartitionsRDD[4] at mapPartitions at Utilities.scala:100
+- Project [A#4,To#1]
+- Join Inner, Some((C#5 = From#0))
:- LinearRecursiveRelation tc, [A#4,C#5], [1,0]
+- BroadcastHint
+- Project [From#0,To#1]
+- LogicalRDD [From#0,To#1], MapPartitionsRDD[4] at mapPartitions at Utilities.scala:100

== Physical Plan ==
Recursion [A#4,To#1] (Linear) [tc][1,0]
:- TungstenExchange hashpartitioning(From#0,200), None
: +- ConvertToUnsafe
: +- Scan ExistingRDD[From#0,To#1]
+- Project [A#4,To#1]
+- BroadcastHashJoin [C#5], [From#0], BuildRight
:- LinearRecursiveRelation A#4,C#5
+- Project [From#0,To#1]
+- Scan ExistingRDD[From#0,To#1]
17-12-01 09:33:41 INFO BigDatalogContext: ** END BigDatalog Program END **
17-12-01 09:33:41 INFO Recursion: Recursion operator configuration settings:
17-12-01 09:33:41 INFO Recursion: Using memory checkpointing with StorageLevel(false, true, false, true, 1)
17-12-01 09:33:41 INFO Recursion: Recursion version: Single-Job-PSN w/ SetRDD
17-12-01 09:33:41 INFO FileInputFormat: Total input paths to process : 1
17-12-01 09:33:41 INFO FileInputFormat: Total input paths to process : 1
17-12-01 09:33:41 INFO SparkContext: Starting job: run at ThreadPoolExecutor.java:1149
17-12-01 09:33:41 INFO DAGScheduler: Got job 0 (run at ThreadPoolExecutor.java:1149) with 200 output partitions
17-12-01 09:33:41 INFO DAGScheduler: Final stage: ResultStage 0 (run at ThreadPoolExecutor.java:1149)
17-12-01 09:33:41 INFO DAGScheduler: Parents of final stage: List()
17-12-01 09:33:41 INFO DAGScheduler: Missing parents: List()
17-12-01 09:33:41 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[7] at run at ThreadPoolExecutor.java:1149), which has no missing parents
17-12-01 09:33:41 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 7.6 KB, free 180.7 KB)
17-12-01 09:33:41 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 3.9 KB, free 184.6 KB)
17-12-01 09:33:41 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:42113 (size: 3.9 KB, free: 511.1 MB)
17-12-01 09:33:41 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1096
17-12-01 09:33:41 INFO DAGScheduler: Submitting 200 missing tasks from ResultStage 0 (MapPartitionsRDD[7] at run at ThreadPoolExecutor.java:1149)
17-12-01 09:33:41 INFO TaskSchedulerImpl: Adding task set 0.0 with 200 tasks
17-12-01 09:33:41 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,PROCESS_LOCAL, 2522 bytes)
17-12-01 09:33:41 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, partition 1,PROCESS_LOCAL, 2522 bytes)
17-12-01 09:33:41 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, localhost, partition 2,PROCESS_LOCAL, 2522 bytes)
17-12-01 09:33:41 INFO TaskSetManager: Starting task 3.0 in stage 0.0 (TID 3, localhost, partition 3,PROCESS_LOCAL, 2522 bytes)
17-12-01 09:33:41 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
17-12-01 09:33:41 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)
17-12-01 09:33:41 INFO Executor: Running task 3.0 in stage 0.0 (TID 3)
17-12-01 09:33:41 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
......(omit)
17-12-01 09:33:56 INFO TaskSchedulerImpl: Adding task set 3.0 with 200 tasks
17-12-01 09:33:56 INFO TaskSetManager: Starting task 124.0 in stage 3.0 (TID 600, localhost, partition 124,PROCESS_LOCAL, 2344 bytes)
17-12-01 09:33:56 INFO TaskSetManager: Starting task 125.0 in stage 3.0 (TID 601, localhost, partition 125,PROCESS_LOCAL, 2344 bytes)
17-12-01 09:33:56 INFO TaskSetManager: Starting task 126.0 in stage 3.0 (TID 602, localhost, partition 126,PROCESS_LOCAL, 2344 bytes)
17-12-01 09:33:56 INFO TaskSetManager: Starting task 127.0 in stage 3.0 (TID 603, localhost, partition 127,PROCESS_LOCAL, 2344 bytes)
17-12-01 09:33:56 INFO Executor: Running task 126.0 in stage 3.0 (TID 602)
17-12-01 09:33:56 INFO Executor: Running task 124.0 in stage 3.0 (TID 600)
17-12-01 09:33:56 INFO Executor: Running task 125.0 in stage 3.0 (TID 601)
17-12-01 09:33:56 INFO Executor: Running task 127.0 in stage 3.0 (TID 603)
17-12-01 09:33:56 INFO CacheManager: Partition rdd_31_127 not found, computing it
17-12-01 09:33:56 INFO CacheManager: Partition rdd_31_126 not found, computing it
17-12-01 09:33:56 INFO CacheManager: Partition rdd_31_124 not found, computing it
17-12-01 09:33:56 INFO CacheManager: Partition rdd_31_125 not found, computing it
17-12-01 09:33:56 INFO CacheManager: Partition rdd_27_127 not found, computing it
17-12-01 09:33:56 INFO CacheManager: Partition rdd_27_125 not found, computing it
17-12-01 09:33:56 INFO BlockManager: Found block rdd_17_127 locally
17-12-01 09:33:56 INFO CacheManager: Partition rdd_27_124 not found, computing it
17-12-01 09:33:56 INFO CacheManager: Partition rdd_27_126 not found, computing it
17-12-01 09:33:56 INFO CacheManager: Partition rdd_17_124 not found, computing it
17-12-01 09:33:56 INFO BlockManager: Found block rdd_21_127 locally
17-12-01 09:33:56 INFO SetRDDHashSetPartition: Union set size 988 for rdd 18 took 0 ms
17-12-01 09:33:56 ERROR Executor: Exception in task 124.0 in stage 3.0 (TID 600)
org.apache.spark.SparkException: Checkpoint block rdd_17_124 not found! Either the executor that originally checkpointed this partition is no longer alive, or the original RDD is unpersisted. If this problem persists, you may consider using rdd.checkpoint() or rdd.localcheckpoint() instead, which are slower than memory checkpointing but more fault-tolerant.
at org.apache.spark.rdd.MemoryCheckpointRDD.compute(MemoryCheckpointRDD.scala:43)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:304)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
at edu.ucla.cs.wis.bigdatalog.spark.execution.setrdd.SetRDD.compute(SetRDD.scala:108)
at edu.ucla.cs.wis.bigdatalog.spark.execution.setrdd.SetRDD.computeOrReadCheckpoint(SetRDD.scala:104)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.fixedpoint.FixedPointResultTask.runTask(FixedPointResultTask.scala:54)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
17-12-01 09:33:56 INFO BlockManager: Found block rdd_17_125 locally
17-12-01 09:33:56 INFO CacheManager: Partition rdd_17_126 not found, computing it
17-12-01 09:33:56 ERROR Executor: Exception in task 126.0 in stage 3.0 (TID 602)
org.apache.spark.SparkException: Checkpoint block rdd_17_126 not found! Either the executor that originally checkpointed this partition is no longer alive, or the original RDD is unpersisted. If this problem persists, you may consider using rdd.checkpoint() or rdd.localcheckpoint() instead, which are slower than memory checkpointing but more fault-tolerant.
at org.apache.spark.rdd.MemoryCheckpointRDD.compute(MemoryCheckpointRDD.scala:43)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:304)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
at edu.ucla.cs.wis.bigdatalog.spark.execution.setrdd.SetRDD.compute(SetRDD.scala:108)
at edu.ucla.cs.wis.bigdatalog.spark.execution.setrdd.SetRDD.computeOrReadCheckpoint(SetRDD.scala:104)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.fixedpoint.FixedPointResultTask.runTask(FixedPointResultTask.scala:54)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
17-12-01 09:33:56 INFO BlockManager: Found block rdd_21_125 locally
17-12-01 09:33:56 INFO SetRDDHashSetPartition: Union set size 1032 for rdd 18 took 0 ms
17-12-01 09:33:56 INFO MemoryStore: 1 blocks selected for dropping
17-12-01 09:33:56 INFO BlockManager: Dropping block rdd_21_124 from memory
17-12-01 09:33:56 INFO BlockManagerInfo: Removed rdd_21_124 on localhost:42113 in memory (size: 1708.1 KB, free: 18.3 MB)
17-12-01 09:33:56 INFO TaskSetManager: Starting task 128.0 in stage 3.0 (TID 604, localhost, partition 128,PROCESS_LOCAL, 2344 bytes)
17-12-01 09:33:56 INFO MemoryStore: Block rdd_27_127 stored as values in memory (estimated size 1720.1 KB, free 509.1 MB)
17-12-01 09:33:56 INFO TaskSetManager: Starting task 129.0 in stage 3.0 (TID 605, localhost, partition 129,PROCESS_LOCAL, 2344 bytes)
17-12-01 09:33:56 INFO Executor: Running task 129.0 in stage 3.0 (TID 605)
17-12-01 09:33:56 WARN TaskSetManager: Lost task 124.0 in stage 3.0 (TID 600, localhost): org.apache.spark.SparkException: Checkpoint block rdd_17_124 not found! Either the executor that originally checkpointed this partition is no longer alive, or the original RDD is unpersisted. If this problem persists, you may consider using rdd.checkpoint() or rdd.localcheckpoint() instead, which are slower than memory checkpointing but more fault-tolerant.
at org.apache.spark.rdd.MemoryCheckpointRDD.compute(MemoryCheckpointRDD.scala:43)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:304)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
at edu.ucla.cs.wis.bigdatalog.spark.execution.setrdd.SetRDD.compute(SetRDD.scala:108)
at edu.ucla.cs.wis.bigdatalog.spark.execution.setrdd.SetRDD.computeOrReadCheckpoint(SetRDD.scala:104)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.fixedpoint.FixedPointResultTask.runTask(FixedPointResultTask.scala:54)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

17-12-01 09:33:56 INFO Executor: Running task 128.0 in stage 3.0 (TID 604)
17-12-01 09:33:56 INFO CacheManager: Partition rdd_31_129 not found, computing it
17-12-01 09:33:56 ERROR TaskSetManager: Task 124 in stage 3.0 failed 1 times; aborting job
17-12-01 09:33:56 INFO BlockManagerInfo: Added rdd_27_127 in memory on localhost:42113 (size: 1720.1 KB, free: 16.6 MB)
17-12-01 09:33:56 WARN TaskSetManager: Lost task 126.0 in stage 3.0 (TID 602, localhost): org.apache.spark.SparkException: Checkpoint block rdd_17_126 not found! Either the executor that originally checkpointed this partition is no longer alive, or the original RDD is unpersisted. If this problem persists, you may consider using rdd.checkpoint() or rdd.localcheckpoint() instead, which are slower than memory checkpointing but more fault-tolerant.
at org.apache.spark.rdd.MemoryCheckpointRDD.compute(MemoryCheckpointRDD.scala:43)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:304)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
at edu.ucla.cs.wis.bigdatalog.spark.execution.setrdd.SetRDD.compute(SetRDD.scala:108)
at edu.ucla.cs.wis.bigdatalog.spark.execution.setrdd.SetRDD.computeOrReadCheckpoint(SetRDD.scala:104)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.fixedpoint.FixedPointResultTask.runTask(FixedPointResultTask.scala:54)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

17-12-01 09:33:56 INFO MemoryStore: 1 blocks selected for dropping
17-12-01 09:33:56 INFO BlockManager: Dropping block rdd_11_128 from memory
17-12-01 09:33:56 INFO CacheManager: Partition rdd_31_128 not found, computing it
17-12-01 09:33:56 INFO BlockManagerInfo: Removed rdd_11_128 on localhost:42113 in memory (size: 1708.1 KB, free: 18.3 MB)
17-12-01 09:33:56 INFO CacheManager: Partition rdd_27_128 not found, computing it
17-12-01 09:33:56 INFO BlockManager: Found block rdd_17_128 locally
17-12-01 09:33:56 INFO BlockManager: Found block rdd_21_128 locally
17-12-01 09:33:56 INFO SetRDDHashSetPartition: Union set size 846 for rdd 18 took 0 ms
17-12-01 09:33:56 INFO BlockManager: Found block rdd_21_127 locally
17-12-01 09:33:56 INFO MemoryStore: 1 blocks selected for dropping
17-12-01 09:33:56 INFO BlockManager: Dropping block rdd_15_125 from memory
17-12-01 09:33:56 INFO SetRDDHashSetPartition: Diff set size 150 for rdd 28 took 0 ms
17-12-01 09:33:56 INFO CacheManager: Partition rdd_27_129 not found, computing it
17-12-01 09:33:56 INFO BlockManagerInfo: Removed rdd_15_125 on localhost:42113 in memory (size: 1708.1 KB, free: 20.0 MB)
17-12-01 09:33:56 INFO TaskSchedulerImpl: Cancelling stage 3
17-12-01 09:33:56 INFO BlockManager: Found block rdd_17_129 locally
17-12-01 09:33:56 INFO BlockManager: Found block rdd_21_129 locally
17-12-01 09:33:56 INFO SetRDDHashSetPartition: Union set size 1193 for rdd 18 took 1 ms
17-12-01 09:33:56 INFO MemoryStore: 1 blocks selected for dropping
17-12-01 09:33:56 INFO BlockManager: Dropping block rdd_15_127 from memory
17-12-01 09:33:56 INFO BlockManagerInfo: Removed rdd_15_127 on localhost:42113 in memory (size: 1708.1 KB, free: 21.6 MB)
17-12-01 09:33:56 INFO MemoryStore: 1 blocks selected for dropping
17-12-01 09:33:56 INFO BlockManager: Dropping block rdd_21_126 from memory
17-12-01 09:33:56 INFO BlockManagerInfo: Removed rdd_21_126 on localhost:42113 in memory (size: 1712.1 KB, free: 23.3 MB)
17-12-01 09:33:56 INFO Executor: Executor is trying to kill task 128.0 in stage 3.0 (TID 604)
17-12-01 09:33:56 INFO Executor: Executor is trying to kill task 125.0 in stage 3.0 (TID 601)
17-12-01 09:33:56 INFO Executor: Executor is trying to kill task 129.0 in stage 3.0 (TID 605)
17-12-01 09:33:56 INFO Executor: Executor is trying to kill task 127.0 in stage 3.0 (TID 603)
17-12-01 09:33:56 INFO TaskSchedulerImpl: Stage 3 was cancelled
17-12-01 09:33:56 INFO DAGScheduler: FixedPointResultStage 3 (runFixedPointJob at Recursion.scala:197) failed in 0.047 s
17-12-01 09:33:56 INFO DAGScheduler: Fixed Point Job 1 failed: runFixedPointJob at Recursion.scala:197, took 11.303584 s
17-12-01 09:33:56 INFO SparkContext: Invoking stop() from shutdown hook
17-12-01 09:33:56 INFO MemoryStore: Block rdd_27_128 stored as values in memory (estimated size 1720.1 KB, free 504.1 MB)
17-12-01 09:33:56 INFO BlockManagerInfo: Added rdd_27_128 in memory on localhost:42113 (size: 1720.1 KB, free: 21.6 MB)
17-12-01 09:33:56 INFO BlockManager: Found block rdd_21_128 locally
17-12-01 09:33:56 INFO SetRDDHashSetPartition: Diff set size 139 for rdd 28 took 0 ms
17-12-01 09:33:56 INFO MemoryStore: 1 blocks selected for dropping
17-12-01 09:33:56 INFO BlockManager: Dropping block rdd_15_128 from memory
17-12-01 09:33:56 INFO BlockManagerInfo: Removed rdd_15_128 on localhost:42113 in memory (size: 1708.1 KB, free: 23.3 MB)
17-12-01 09:33:56 INFO MemoryStore: Block rdd_31_127 stored as values in memory (estimated size 1706.1 KB, free 504.1 MB)
17-12-01 09:33:56 INFO BlockManagerInfo: Added rdd_31_127 in memory on localhost:42113 (size: 1706.1 KB, free: 21.6 MB)
17-12-01 09:33:56 INFO Executor: Executor killed task 127.0 in stage 3.0 (TID 603)
17-12-01 09:33:56 WARN TaskSetManager: Lost task 127.0 in stage 3.0 (TID 603, localhost): TaskKilled (killed intentionally)
17-12-01 09:33:56 INFO MemoryStore: 1 blocks selected for dropping
17-12-01 09:33:56 INFO BlockManager: Dropping block rdd_11_130 from memory
17-12-01 09:33:56 INFO BlockManagerInfo: Removed rdd_11_130 on localhost:42113 in memory (size: 1708.1 KB, free: 23.3 MB)
17-12-01 09:33:56 INFO MemoryStore: Block rdd_27_125 stored as values in memory (estimated size 1720.1 KB, free 504.1 MB)
17-12-01 09:33:56 INFO BlockManagerInfo: Added rdd_27_125 in memory on localhost:42113 (size: 1720.1 KB, free: 21.6 MB)
17-12-01 09:33:56 INFO BlockManager: Found block rdd_21_125 locally
17-12-01 09:33:56 INFO SetRDDHashSetPartition: Diff set size 67 for rdd 28 took 0 ms
17-12-01 09:33:56 INFO MemoryStore: Block rdd_27_129 stored as values in memory (estimated size 1720.1 KB, free 505.8 MB)
17-12-01 09:33:56 INFO BlockManagerInfo: Added rdd_27_129 in memory on localhost:42113 (size: 1720.1 KB, free: 19.9 MB)
17-12-01 09:33:56 INFO MemoryStore: Block rdd_31_128 stored as values in memory (estimated size 1706.1 KB, free 507.5 MB)
17-12-01 09:33:56 INFO BlockManager: Found block rdd_21_129 locally
17-12-01 09:33:56 INFO SetRDDHashSetPartition: Diff set size 303 for rdd 28 took 0 ms
17-12-01 09:33:56 INFO BlockManagerInfo: Added rdd_31_128 in memory on localhost:42113 (size: 1706.1 KB, free: 18.3 MB)
17-12-01 09:33:56 INFO MemoryStore: Block rdd_31_125 stored as values in memory (estimated size 1705.1 KB, free 509.1 MB)
17-12-01 09:33:56 INFO Executor: Executor killed task 128.0 in stage 3.0 (TID 604)
17-12-01 09:33:56 INFO BlockManagerInfo: Added rdd_31_125 in memory on localhost:42113 (size: 1705.1 KB, free: 16.6 MB)
17-12-01 09:33:56 INFO Executor: Executor killed task 125.0 in stage 3.0 (TID 601)
17-12-01 09:33:56 INFO MemoryStore: 1 blocks selected for dropping
17-12-01 09:33:56 INFO BlockManager: Dropping block rdd_11_129 from memory
17-12-01 09:33:56 INFO BlockManagerInfo: Removed rdd_11_129 on localhost:42113 in memory (size: 1712.1 KB, free: 18.3 MB)
17-12-01 09:33:56 WARN TaskSetManager: Lost task 125.0 in stage 3.0 (TID 601, localhost): TaskKilled (killed intentionally)
17-12-01 09:33:56 WARN TaskSetManager: Lost task 128.0 in stage 3.0 (TID 604, localhost): TaskKilled (killed intentionally)
17-12-01 09:33:56 INFO MemoryStore: Block rdd_31_129 stored as values in memory (estimated size 1708.1 KB, free 509.1 MB)
17-12-01 09:33:56 INFO BlockManagerInfo: Added rdd_31_129 in memory on localhost:42113 (size: 1708.1 KB, free: 16.6 MB)
17-12-01 09:33:56 INFO Executor: Executor killed task 129.0 in stage 3.0 (TID 605)
17-12-01 09:33:56 WARN TaskSetManager: Lost task 129.0 in stage 3.0 (TID 605, localhost): TaskKilled (killed intentionally)
17-12-01 09:33:56 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool
17-12-01 09:33:56 INFO SparkUI: Stopped Spark web UI at http://172.26.163.180:4040
17-12-01 09:33:56 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
17-12-01 09:33:57 INFO MemoryStore: MemoryStore cleared
17-12-01 09:33:57 INFO BlockManager: BlockManager stopped
17-12-01 09:33:57 INFO BlockManagerMaster: BlockManagerMaster stopped
17-12-01 09:33:57 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
17-12-01 09:33:57 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
17-12-01 09:33:57 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
17-12-01 09:33:57 INFO SparkContext: Successfully stopped SparkContext
17-12-01 09:33:57 INFO ShutdownHookManager: Shutdown hook called
17-12-01 09:33:57 INFO ShutdownHookManager: Deleting directory /tmp/spark-221ba8c7-a682-4887-ace6-b37e2eb9f122/httpd-6f67ecc9-fb6b-4ba4-8678-8d34d1e4faad
17-12-01 09:33:57 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
17-12-01 09:33:57 INFO ShutdownHookManager: Deleting directory /tmp/spark-221ba8c7-a682-4887-ace6-b37e2eb9f122

The same problem occurs every time, even when we change computers and programs, which is quite frustrating. Can anyone help?
Thank you very much.
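
For reference, the operator program and logical plan in the log above indicate that Experiments program=11 evaluates a transitive-closure query over an arc relation. A hedged reconstruction of the corresponding Datalog program (relation, column, and type names are taken from the log; the exact contents of the shipped example are an assumption) would be:

database({ arc(From:integer, To:integer) }).
tc(A,B) <- arc(A,B).
tc(A,B) <- tc(A,C), arc(C,B).

with the query form tc(A,B).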

NullPointerException for recursion of type a<-b, b<-a

Hello,

I tried to execute the query rel3(X,Y) with the following Datalog program:

rel2(X,Y) <- rel1(X,Y). 
rel2(X,Z) <- rel3(X,Y), rel1(Y,Z). 
rel3(X,Z) <- rel2(X,Y), rel1(Y,Z). 
database({ rel1(X:string, Y:string) }).

and the following data for rel1:

a	b
b	c

I expect the output rel3(a,c). However, I get a NullPointerException instead:

17/10/14 23:16:44 INFO BigDatalogContext: BigDatalog Query: "rel3(A,B)"
17/10/14 23:16:44 INFO BigDatalogContext: ** START Operator Program START **
17/10/14 23:16:44 INFO BigDatalogContext: 
0: rel3(X, Y) <RECURSIVE_CLIQUE>(Recursion: LINEAR, Evaluation Type: SemiNaive)
Exit Rules: 
Recursive Rules: 
 1: (X, Y) <DISTINCT PROJECT>
  2: (0.Y = 1.X) <JOIN>
   3: rel2(X, Y) <MUTUAL_RECURSIVE_CLIQUE>(Recursion: LINEAR, Evaluation Type: SemiNaive)
   Exit Rules: 
    4: rel1(X, Y) <BASE_RELATION>
   Recursive Rules: 
    4: (X, Y) <DISTINCT PROJECT>
     5: (0.Y = 1.X) <JOIN>
      6: rel3(X, Y) <RECURSIVE_RELATION>
      6: rel1(X, Y) <BASE_RELATION>
   3: rel1(X, Y) <BASE_RELATION>
17/10/14 23:16:44 INFO BigDatalogContext: ** END Operator Program END **
17/10/14 23:16:44 INFO BigDatalogContext: ** START BigDatalog Program START **
Exception in thread "main" java.lang.NullPointerException
	at edu.ucla.cs.wis.bigdatalog.spark.logical.LogicalPlanGenerator.getPlan(LogicalPlanGenerator.scala:74)
	at edu.ucla.cs.wis.bigdatalog.spark.logical.LogicalPlanGenerator.getPlan(LogicalPlanGenerator.scala:101)
	at edu.ucla.cs.wis.bigdatalog.spark.logical.LogicalPlanGenerator.getPlan(LogicalPlanGenerator.scala:70)
	at edu.ucla.cs.wis.bigdatalog.spark.logical.LogicalPlanGenerator.generateSparkProgram(LogicalPlanGenerator.scala:66)
	at edu.ucla.cs.wis.bigdatalog.spark.BigDatalogContext.generateProgram(BigDatalogContext.scala:149)
	at edu.ucla.cs.wis.bigdatalog.spark.BigDatalogContext.query(BigDatalogContext.scala:137)
	at bigdatalog.Main.main(Main.java:59)
17/10/14 23:16:44 INFO SparkContext: Invoking stop() from shutdown hook

Is there any way to get it working?
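
A possible workaround, until the NullPointerException in the logical-plan generator is fixed, is to eliminate the mutual recursion: for this particular program, rel3 contains exactly the pairs connected by an even-length path of two or more rel1 edges, so the rel3(X,Y) query can be answered with a single linear recursion. A hedged sketch follows (semantically equivalent for the rel3 query, though whether it actually avoids the crash would need to be verified):

database({ rel1(X:string, Y:string) }).
rel3(X,Z) <- rel1(X,Y), rel1(Y,Z).
rel3(X,Z) <- rel3(X,W), rel1(W,Y), rel1(Y,Z).

With the two-row input above, this yields rel3(a,c) as expected.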

Recursion failing because of missing checkpoint blocks

I tried to use recursion, but it fails with a lot of error messages. (See #3 for more details on how I run the program.)

content of arcs

A	B
B	C
C	D
D	E
E	F
F	G
G	H

content of bigdatalog.deal

database({ arcs(X:string, Y:string) }).
tc(X,Y) <- arcs(X,Y).
tc(X,Y) <- tc(X, Z), arcs(Z,Y).

command

./bin/run-example datalog.Experiments --program=99 --file=../redirect.txt --queryform="tc(A,B)" --baserelation_arcs=../bigdatalog-java/arcs

error message

Here is the part of the output with the first error message (see the attachment for the complete log):

17/10/10 16:35:57 INFO Recursion: Fixed Point Iteration # 2, time: 9170ms
17/10/10 16:35:57 INFO DAGScheduler: Submitting FixedPointResultStage 3 (SetRDD.diffRDD SetRDD[32] at RDD at SetRDD.scala:29), which has no missing parents
17/10/10 16:35:57 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 16.9 KB, free 510.0 MB)
17/10/10 16:35:57 INFO MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 8.7 KB, free 510.1 MB)
17/10/10 16:35:57 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory on localhost:43953 (size: 8.7 KB, free: 1135.5 KB)
17/10/10 16:35:57 INFO SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:1096
17/10/10 16:35:57 INFO DAGScheduler: Submitting 200 missing tasks from FixedPointResultStage 3 (SetRDD.diffRDD SetRDD[32] at RDD at SetRDD.scala:29)
17/10/10 16:35:57 INFO TaskSchedulerImpl: Adding task set 3.0 with 200 tasks
17/10/10 16:35:57 INFO TaskSetManager: Starting task 121.0 in stage 3.0 (TID 256, localhost, partition 121,PROCESS_LOCAL, 2343 bytes)
17/10/10 16:35:57 INFO TaskSetManager: Starting task 123.0 in stage 3.0 (TID 257, localhost, partition 123,PROCESS_LOCAL, 2343 bytes)
17/10/10 16:35:57 INFO TaskSetManager: Starting task 124.0 in stage 3.0 (TID 258, localhost, partition 124,PROCESS_LOCAL, 2343 bytes)
17/10/10 16:35:57 INFO TaskSetManager: Starting task 125.0 in stage 3.0 (TID 259, localhost, partition 125,PROCESS_LOCAL, 2343 bytes)
17/10/10 16:35:57 INFO Executor: Running task 121.0 in stage 3.0 (TID 256)
17/10/10 16:35:57 INFO Executor: Running task 123.0 in stage 3.0 (TID 257)
17/10/10 16:35:57 INFO Executor: Running task 124.0 in stage 3.0 (TID 258)
17/10/10 16:35:57 INFO Executor: Running task 125.0 in stage 3.0 (TID 259)
17/10/10 16:35:57 INFO CacheManager: Partition rdd_31_123 not found, computing it
17/10/10 16:35:57 INFO CacheManager: Partition rdd_31_125 not found, computing it
17/10/10 16:35:57 INFO CacheManager: Partition rdd_31_121 not found, computing it
17/10/10 16:35:57 INFO CacheManager: Partition rdd_27_123 not found, computing it
17/10/10 16:35:57 INFO CacheManager: Partition rdd_27_121 not found, computing it
17/10/10 16:35:57 INFO BlockManager: Found block rdd_17_123 locally
17/10/10 16:35:57 INFO BlockManager: Found block rdd_21_123 locally
17/10/10 16:35:57 INFO CacheManager: Partition rdd_17_121 not found, computing it
17/10/10 16:35:57 INFO SetRDDHashSetPartition: Union set size 0 for rdd 18 took 0 ms
17/10/10 16:35:57 ERROR Executor: Exception in task 121.0 in stage 3.0 (TID 256)
org.apache.spark.SparkException: Checkpoint block rdd_17_121 not found! Either the executor that originally checkpointed this partition is no longer alive, or the original RDD is unpersisted. If this problem persists, you may consider using `rdd.checkpoint()` or `rdd.localcheckpoint()` instead, which are slower than memory checkpointing but more fault-tolerant.
	at org.apache.spark.rdd.MemoryCheckpointRDD.compute(MemoryCheckpointRDD.scala:43)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:304)
	at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
	at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
	at org.apache.spark.rdd.ZippedPartitionsRDD2.compute(ZippedPartitionsRDD.scala:88)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
	at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
	at edu.ucla.cs.wis.bigdatalog.spark.execution.setrdd.SetRDD.compute(SetRDD.scala:108)
	at edu.ucla.cs.wis.bigdatalog.spark.execution.setrdd.SetRDD.computeOrReadCheckpoint(SetRDD.scala:104)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
	at org.apache.spark.scheduler.fixedpoint.FixedPointResultTask.runTask(FixedPointResultTask.scala:54)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
17/10/10 16:35:57 INFO CacheManager: Partition rdd_27_125 not found, computing it
17/10/10 16:35:57 INFO BlockManager: Found block rdd_17_125 locally
17/10/10 16:35:57 INFO BlockManager: Found block rdd_21_125 locally
17/10/10 16:35:57 INFO SetRDDHashSetPartition: Union set size 0 for rdd 18 took 0 ms
17/10/10 16:35:57 INFO MemoryStore: 1 blocks selected for dropping
17/10/10 16:35:57 INFO BlockManager: Dropping block rdd_17_124 from memory
17/10/10 16:35:57 INFO BlockManagerInfo: Removed rdd_17_124 on localhost:43953 in memory (size: 1701.1 KB, free: 2.8 MB)
17/10/10 16:35:57 INFO MemoryStore: 1 blocks selected for dropping
17/10/10 16:35:57 INFO BlockManager: Dropping block rdd_11_125 from memory
17/10/10 16:35:57 INFO BlockManagerInfo: Removed rdd_11_125 on localhost:43953 in memory (size: 1701.1 KB, free: 4.4 MB)
17/10/10 16:35:57 INFO MemoryStore: 1 blocks selected for dropping
17/10/10 16:35:57 INFO BlockManager: Dropping block rdd_15_124 from memory
17/10/10 16:35:57 INFO BlockManagerInfo: Removed rdd_15_124 on localhost:43953 in memory (size: 1701.1 KB, free: 6.1 MB)

bigdatalog.log.zip

Wrong result for term with repeated variables

Hello,

I tried to evaluate the query bad(X,Y) on the following program:

bad(X, X) <- parent(X, X).

with the following input:

parent(bob, alice).
parent(charly, alice).

It should not produce any result tuples; however, I get:

bad(bob, alice).
bad(charly, alice).
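
A possible workaround, assuming the engine accepts an explicit equality comparison in the rule body (an assumption, not confirmed by this issue), is to avoid the repeated variable and state the constraint explicitly:

bad(X, Y) <- parent(X, Y), X = Y.

With the input above, this version should return no tuples, which is the expected result.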
