
spark-utils's Issues

Potential security vulnerability in the zstd C library.

Hi @tupol, I'd like to report a vulnerable dependency in org.tupol:spark-utils-io_2.12:0.6.2.

Issue Description

I noticed that org.tupol:spark-utils-io_2.12:0.6.2 directly depends on org.apache.spark:spark-core_2.12:3.0.1 in the pom. However, as shown in the following dependency graph, org.apache.spark:spark-core_2.12:3.0.1 is affected by CVE-2021-24032, a vulnerability in the C library zstd (version 1.4.3).

Dependency Graph between Java and Shared Libraries

[Image: dependency graph]

Suggested Vulnerability Patch Versions

org.apache.spark:spark-core_2.12:3.2.0 (>=3.2.0) has upgraded the vulnerable zstd C library to the patched version 1.5.0.

Java build tools cannot report vulnerable C libraries, which may introduce security issues in many downstream Java projects. Could you please upgrade this vulnerable dependency?

Thanks for your help~
Best regards,
Helen Parr

Cross compile on Scala 2.11 and 2.12

Spark-Utils needs to catch up with the world. As a first step, it needs to cross-compile on both Scala 2.11 and 2.12.
While we are at it, Spark 2.4.x should be brought up to the latest version as well.

Avro support is native in Spark 2.4.x

Avro support is native in Spark 2.4.x, so we can no longer pass the format as "com.databricks.spark.avro"; it is now passed as "avro".

One solution is to add a generic source / sink configuration that accepts anything, with minimal validation. This might work, as it should not break compatibility between versions and it allows for more custom sources and sinks.

Another is simply to rename "com.databricks.spark.avro" to "avro", but this will break compatibility between Spark 2.3.x and 2.4.x.
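To make the trade-off between the two options concrete, here is a minimal sketch that combines them: unknown formats pass through untouched (the generic option), while either Avro name is mapped to whatever the running Spark version understands. The object name `AvroFormatSketch`, both function names, and the idea of keying on the Spark version string are assumptions for illustration, not part of spark-utils.

```scala
// Hypothetical sketch; none of these names exist in spark-utils.
object AvroFormatSketch {
  // Name used with the external Databricks package (before Spark 2.4.x).
  val LegacyAvroFormat = "com.databricks.spark.avro"
  // Name of the native Avro source (Spark 2.4.x and later).
  val NativeAvroFormat = "avro"

  /** Returns true when the version string denotes Spark 2.4 or newer. */
  def hasNativeAvro(sparkVersion: String): Boolean = {
    // Keep only the leading digits of the major and minor components.
    val parts = sparkVersion.split("\\.").take(2).map(_.takeWhile(_.isDigit))
    parts match {
      case Array(major, minor) if major.nonEmpty && minor.nonEmpty =>
        val (ma, mi) = (major.toInt, minor.toInt)
        ma > 2 || (ma == 2 && mi >= 4)
      case _ => false // unparsable version: assume no native Avro
    }
  }

  /** Maps either Avro name to the one valid for the given Spark version;
    * any other format is passed through without validation. */
  def resolveFormat(configured: String, sparkVersion: String): String =
    configured match {
      case LegacyAvroFormat | NativeAvroFormat =>
        if (hasNativeAvro(sparkVersion)) NativeAvroFormat else LegacyAvroFormat
      case other => other
    }
}
```

With something along these lines, an existing configuration that still says "com.databricks.spark.avro" would keep working on 2.4.x, at the cost of a small amount of version-sniffing logic.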

Application Configuration File Name

Currently the application configuration file name is limited to application.conf.

We should be able to pass in an alternative application configuration file name as an application parameter, e.g. -conf-file-name='my_app.conf', in which case the SparkApp should pick up its configuration from my_app.conf.
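A rough sketch of how such a parameter could be picked out of the application arguments, falling back to application.conf when it is absent. The object and function names are hypothetical, and the quote stripping is an assumption about how the value might arrive from the command line.

```scala
// Hypothetical sketch; not the actual SparkApp argument handling.
object ConfFileNameSketch {
  val DefaultConfFileName = "application.conf"
  private val Prefix = "-conf-file-name="

  // Removes one pair of matching single or double quotes, if present.
  private def unquote(s: String): String =
    if (s.length >= 2 && (s.head == '\'' || s.head == '"') && s.last == s.head)
      s.substring(1, s.length - 1)
    else s

  /** Returns the first non-empty -conf-file-name value, or the default. */
  def confFileName(args: Seq[String]): String =
    args.collectFirst {
      case arg if arg.startsWith(Prefix) => unquote(arg.stripPrefix(Prefix))
    }.filter(_.nonEmpty).getOrElse(DefaultConfFileName)
}
```

Calling `ConfFileNameSketch.confFileName(args)` at the start of `main` would then give the file name to load, without changing the behaviour for applications that pass no such parameter.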

Exceptions vs `Try[T]`

In the early versions, scala-utils and spark-utils were Try[T]-centric, meaning that everything that could fail returned a Try[T]. At some point, after observing how some developers were using them, I decided that it might be easier to throw exceptions, even though my functional blood was boiling a little.

Question: Should we go for a more functional approach with no side effects, or should we keep throwing exceptions?
In a sense, we are also logging, which leaves us with few pure functions, so this question is not as easy as it seems.
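For comparison, the two styles can coexist: a Try[T]-returning function can serve as the primary API, with a thin exception-throwing wrapper on top for callers who prefer it. The sketch below uses a made-up port-parsing function purely as an illustration; it is not taken from scala-utils or spark-utils.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical example; the names are for illustration only.
object TryVsExceptionsSketch {
  /** Try-centric style: the possibility of failure is visible in the type. */
  def parsePortTry(raw: String): Try[Int] =
    Try(raw.trim.toInt).flatMap { port =>
      if (port > 0 && port < 65536) Success(port)
      else Failure(new IllegalArgumentException(s"Port out of range: $port"))
    }

  /** Exception style: the same logic, but a failure is thrown to the caller. */
  def parsePort(raw: String): Int = parsePortTry(raw).get
}
```

The reverse direction is also possible, since any throwing call can be wrapped in `Try(...)` by the caller, so the choice is mostly about which style the library presents by default.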

SparkApp fails if no configuration is expected

When implementing a SparkApp that has no configuration expectations, the initialization of the app fails with the following exception:

An unexpected com.typesafe.config.ConfigException$Missing was thrown.
ScalaTestFailureLocation: org.tupol.spark.SparkAppSpec$$anonfun$5 at (SparkAppSpec.scala:64)
org.scalatest.exceptions.TestFailedException: An unexpected com.typesafe.config.ConfigException$Missing was thrown.
. . .

Sample test for SparkApp:

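  import com.typesafe.config.Config
  import org.apache.spark.sql.SparkSession

  // A SparkApp with no configuration expectations: the context is a constant.
  object MockAppNoConfig extends SparkApp[String, Unit] {
    def createContext(config: Config): String = "Hello"
    // Nothing to do here; return the unit value.
    override def run(implicit spark: SparkSession, config: String): Unit = ()
  }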

  test("SparkApp.main successfully completes") {
    noException shouldBe thrownBy(MockAppNoConfig.main(Array()))
  }
