Spark 101 (14 min read) (blogpost)
Getting Started with Apache Spark 2.x (4 hour read) (e-book)
Advanced Data Science On Spark (2 hour watch) (Spark Summit Talk) (slides)
Mastering Apache Spark (e-book)
Configuration parameters (Spark docs)
7 tips to debug ... (databricks blog)
- Scale up slowly
- In the Spark UI, drill down to the task page, sort the tasks, and examine the "Errors" column
- Reproduce errors
- Examine dataset partitioning
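  - large data + too few partitions (symptom: there are only a few tasks, but each one takes a long time)
  - small data + too many partitions (the per-partition scheduling overhead can slow down the job)
  - tune spark.sql.shuffle.partitions (the number of partitions used when shuffling data for joins and aggregations; default 200)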
- Beware bad inputs
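
The partitioning tip above can be sketched as a small sizing check. This is only a rough heuristic, not something taken from the linked post: the helper names `suggest_partitions` and `diagnose_partitioning`, the 128 MiB target per partition, and the 4x / 16x thresholds are all assumptions chosen for illustration.

```python
import math

# Assumed target size per partition (128 MiB). Tune it for your workload.
TARGET_PARTITION_BYTES = 128 * 1024**2


def suggest_partitions(data_bytes, target_bytes=TARGET_PARTITION_BYTES):
    """Return a partition count that keeps each partition near target_bytes."""
    if data_bytes < 0 or target_bytes <= 0:
        raise ValueError("data_bytes must be >= 0 and target_bytes must be > 0")
    # Always keep at least one partition, even for empty input.
    return max(1, math.ceil(data_bytes / target_bytes))


def diagnose_partitioning(data_bytes, num_partitions,
                          target_bytes=TARGET_PARTITION_BYTES):
    """Classify a partitioning as 'too few', 'too many', or 'ok'.

    The 4x and 1/16 thresholds are hypothetical cut-offs, not Spark defaults.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be > 0")
    bytes_per_partition = data_bytes / num_partitions
    if bytes_per_partition > 4 * target_bytes:
        return "too few"   # few tasks, each one slow
    if bytes_per_partition < target_bytes / 16:
        return "too many"  # per-task overhead dominates
    return "ok"


if __name__ == "__main__":
    ten_gib = 10 * 1024**3
    print(suggest_partitions(ten_gib))
    print(diagnose_partitioning(ten_gib, 2))
    print(diagnose_partitioning(1024**2, 200))
```

With a live SparkSession, the current partition count of a DataFrame can be read with `df.rdd.getNumPartitions()`, and a chosen value can be applied with `spark.conf.set("spark.sql.shuffle.partitions", n)` or `df.repartition(n)`.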
Spark UI (databricks blog)
Practical optimization (databricks training)
Spark: The Definitive Guide Book (Ch18-Monitoring and Debugging) (Ch19-Performance Tuning)
Debugging and logging best practices (blogpost)
A tale of three Apache Spark APIs:... (databricks blog) (Spark summit video)