Comments (4)
Hello guys,
Any updates on this issue? It blocks the use of Delta Lake (a table format from Databricks) with spylon-kernel.
I am getting:
Caused by: java.lang.ArrayStoreException: org.apache.spark.sql.delta.actions.AddFile
at scala.runtime.ScalaRunTime$.array_update(ScalaRunTime.scala:75)
at scala.Array$.slowcopy(Array.scala:152)
at scala.Array$.copy(Array.scala:178)
at scala.collection.mutable.ResizableArray.copyToArray(ResizableArray.scala:80)
at scala.collection.mutable.ResizableArray.copyToArray$(ResizableArray.scala:78)
at scala.collection.mutable.ArrayBuffer.copyToArray(ArrayBuffer.scala:49)
at scala.collection.TraversableOnce.copyToArray(TraversableOnce.scala:316)
at scala.collection.TraversableOnce.copyToArray$(TraversableOnce.scala:315)
at scala.collection.AbstractTraversable.copyToArray(Traversable.scala:108)
at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:324)
at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:321)
at scala.collection.AbstractTraversable.toArray(Traversable.scala:108)
at org.apache.spark.sql.delta.files.DelayedCommitProtocol.commitJob(DelayedCommitProtocol.scala:59)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:215)
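A minimal sketch, in plain Scala with no Spark dependency, of the JVM rule behind the ArrayStoreException above. `TraversableOnce.toArray` allocates an array whose runtime element type comes from a `ClassTag`, and the JVM type-checks every store into that array. One plausible reading of the Delta trace (an assumption, not confirmed in this thread) is that the array was typed with an `AddFile` class from one class loader while the elements were `AddFile` instances from another. `ArrayStoreDemo` and `tryStore` are hypothetical names used only for illustration.

```scala
object ArrayStoreDemo {
  // Hypothetical helper: allocates a one-element array whose runtime component
  // type is `component` (must be a reference type), then tries to store `value`.
  // Returns None on success, or the exception's simple name on failure.
  def tryStore(component: Class[_], value: AnyRef): Option[String] = {
    // The JVM remembers `component` as the array's element type at runtime,
    // even though the static type here is Array[AnyRef].
    val arr = java.lang.reflect.Array
      .newInstance(component, 1)
      .asInstanceOf[Array[AnyRef]]
    try {
      arr(0) = value // checked by the JVM on every store
      None
    } catch {
      case e: ArrayStoreException => Some(e.getClass.getSimpleName)
    }
  }
}
```

For example, `ArrayStoreDemo.tryStore(classOf[String], Integer.valueOf(1))` returns `Some(ArrayStoreException)`: the static type accepts the value, but the array's runtime element type does not.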
from spylon-kernel.
After poking around and doing a bit of research, I still have no insight into what's wrong. There have been similar reports in the past, with slight variations in the details:
- https://groups.google.com/forum/#!topic/spark-users/FwXualTA_sQ
- https://issues.cloudera.org/browse/LIVY-62
Interestingly enough, the case class works fine when using DataFrames and Datasets:
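import org.apache.spark.sql.types.{StructType, StructField, StringType, IntegerType}
import spark.implicits._ // supplies the Encoder needed by df.as[Persons]
// Assumed definition: the Persons case class is not shown in this thread; its field names must match the schema below
case class Persons(first_name: String, last_name: String, age: Int)
val first_name = StructField("first_name", StringType)
val last_name = StructField("last_name", StringType)
val age = StructField("age", IntegerType)
val schema = StructType(Array(first_name, last_name, age))
val df = spark.read.schema(schema).csv("person.txt")
val ds = df.as[Persons]
ds.collect()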
from spylon-kernel.
I used your example to test my spylon-kernel environment, and it didn't work either. When I executed:
ds2.collect
the error message was:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 1 times, most recent failure: Lost task 1.0 in stage 0.0 (TID 1, localhost, executor driver): java.lang.ClassCastException: DataRow cannot be cast to DataRow
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1499)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1487)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1486)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1486)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1714)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2022)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2043)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2062)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:2087)
at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:936)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
at org.apache.spark.rdd.RDD.collect(RDD.scala:935)
at org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:278)
at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:2853)
at org.apache.spark.sql.Dataset$$anonfun$collect$1.apply(Dataset.scala:2390)
at org.apache.spark.sql.Dataset$$anonfun$collect$1.apply(Dataset.scala:2390)
at org.apache.spark.sql.Dataset$$anonfun$55.apply(Dataset.scala:2837)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:2836)
at org.apache.spark.sql.Dataset.collect(Dataset.scala:2390)
... 37 elided
Caused by: java.lang.ClassCastException: DataRow cannot be cast to DataRow
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more
Have you run into this problem? I couldn't figure out whether it comes from the environment or something else.
My Docker image:
docker pull leeivan/spark-lab-spylon
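A ClassCastException whose message names the same class twice ("DataRow cannot be cast to DataRow") is what the JVM reports when two distinct `Class` objects share a name, because a JVM class's identity is its name plus the class loader that defined it. Whether two loaders are involved inside spylon-kernel is an assumption this thread does not confirm. The hypothetical `LoaderCheck` helper below is a sketch for inspecting that from a notebook cell; it uses only standard JVM reflection.

```scala
object LoaderCheck {
  // Hypothetical helper: the chain of class loaders starting at the loader that
  // defined `cls` and walking up through its parents. Classes defined by the
  // bootstrap loader (e.g. java.lang.String) report a null loader, giving Nil.
  def loaderChain(cls: Class[_]): List[String] =
    Iterator
      .iterate(cls.getClassLoader)(_.getParent)
      .takeWhile(_ != null)
      .map(_.toString) // toString includes an identity hash, so instances differ
      .toList

  // True when two classes have the same fully qualified name but are distinct
  // Class objects, i.e. different loaders defined them; instances of one
  // cannot be cast to the other.
  def sameNameDifferentClass(a: Class[_], b: Class[_]): Boolean =
    a.getName == b.getName && (a ne b)
}
```

In a notebook, `LoaderCheck.loaderChain(classOf[DataRow])` would show which loaders sit behind the REPL-defined `DataRow` from the original report; comparing it with the loader chain of an object whose cast failed would help confirm or rule out the class-loader explanation.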
from spylon-kernel.
I used the Apache Toree Scala kernel in JupyterLab with a case class and hit the same problem. I found the JIRA ticket https://issues.apache.org/jira/browse/TOREE-428, but the bug still exists in the latest version. Any updates?
from spylon-kernel.
Related Issues (20)
- ExecutorClassLoader error in Spylon notebook
- How add additional jar files to SparkContext
- [Request] provide example of Spark Yarn cluster conectivity (EMR)
- Write NULL File to HDFS
- Graph Frames modules are missing
- Not able of import external packages
- Question: How to gracefully stop execution in a cell?
- Does spylon-kernel support Spark 3.0?
- spylon-kernel error : compilation: disabled (not enough contiguous free space left)
- Unable to install spylon kernel
- spylon launcher.packages inside kernel.json args
- s3 filesystem not found
- [BUG]: Spark submit fails: No such file or directory: '/opt/spark/python/pyspark/./bin/spark-submit'
- Cannot get Hive data
- Run Scala cell on Jupyter notebook
- Cannot install spylon-kernel on Ubuntu 22
- Failed running `python -m spylon_kernel install`
- Unable to use existing spark server with spylon-kernel
- Using spylon-kernel with java?
- Outdated versioneer.py broken for Python 3.12