
itsvikramagr commented on May 17, 2024

ETL and streaming jobs can leave many small files committed to a table. An obvious next step is a job that overwrites those files with fewer, larger (compacted) files. The Spark module currently supports only "Append" mode when creating a writer, so overwriting files requires extending the writer to support other modes as well. We already have the overWrite and rewriteFile APIs. Can they be used to implement this?

cc @rdblue @prakharjain09
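A compaction job like the one proposed above has two parts: decide which small files to merge, then swap them for the merged output in one atomic commit. The sketch below is a toy in-memory model under stated assumptions: `DataFile`, `plan_compaction`, and `rewrite` are hypothetical names, not Iceberg's API. Planning uses first-fit decreasing bin packing up to a target size, and `rewrite` only loosely models the delete-and-add replacement that the rewriteFile API mentioned above is meant to commit.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class DataFile:
    """Hypothetical stand-in for a table data file: a path plus its size."""
    path: str
    size_bytes: int


def plan_compaction(files: List[DataFile], target_size: int) -> List[List[DataFile]]:
    """Group files smaller than target_size into bins of at most target_size bytes
    using first-fit decreasing. Files already at or above the target are skipped."""
    small = sorted((f for f in files if f.size_bytes < target_size),
                   key=lambda f: f.size_bytes, reverse=True)
    bins: List[List[DataFile]] = []
    totals: List[int] = []
    for f in small:
        for i, total in enumerate(totals):
            if total + f.size_bytes <= target_size:
                bins[i].append(f)
                totals[i] += f.size_bytes
                break
        else:
            # No existing bin has room: start a new one.
            bins.append([f])
            totals.append(f.size_bytes)
    # A bin holding a single file gains nothing from being rewritten.
    return [b for b in bins if len(b) > 1]


def rewrite(live: Set[DataFile], to_delete: Set[DataFile], to_add: Set[DataFile]) -> Set[DataFile]:
    """Toy atomic commit: swap to_delete for to_add in one step, failing if any
    file to delete is no longer live (e.g. a concurrent job already replaced it)."""
    missing = to_delete - live
    if missing:
        raise ValueError(f"files no longer in table: {sorted(f.path for f in missing)}")
    return (live - to_delete) | to_add


# Usage: the four small files are merged into one; the 200-byte file is left alone.
# The merged size is simply the sum of its inputs in this toy model.
files = [DataFile(f"data/part-{i}.parquet", s) for i, s in enumerate([10, 20, 30, 200, 40])]
live = set(files)
for n, group in enumerate(plan_compaction(files, target_size=100)):
    merged = DataFile(f"data/compacted-{n}.parquet", sum(f.size_bytes for f in group))
    live = rewrite(live, set(group), {merged})
```

Failing the commit when an input file has disappeared is the design point: a compaction that raced with another writer should retry rather than silently resurrect or drop data.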


rdblue commented on May 17, 2024

@prakharjain09, Spark is removing SaveMode because its behavior is unreliable. Instead, Spark is introducing overwrite APIs that match some of the APIs Iceberg already supports: see SupportsDynamicOverwrite and SupportsOverwrite in the latest Spark DSv2 design doc.

When Spark supports those, Iceberg will be able to support overwrite.
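The two DSv2 interfaces differ in which existing data a write replaces. The sketch below is a toy in-memory model, not the Spark or Iceberg API: a table is a dict from a hypothetical partition key to rows, `overwrite_by_filter` mimics SupportsOverwrite (delete rows matching a filter, then append) and `overwrite_dynamic` mimics SupportsDynamicOverwrite (replace only the partitions that the new data touches).

```python
from typing import Callable, Dict, List, Tuple

# Toy model (hypothetical, not Spark's API): a row is (partition_key, value)
# and a table maps each partition key to its rows.
Row = Tuple[str, int]
Table = Dict[str, List[Row]]


def _append(table: Table, rows: List[Row]) -> None:
    """Add rows to the table in place, grouped by their partition key."""
    for row in rows:
        table.setdefault(row[0], []).append(row)


def overwrite_by_filter(table: Table, delete_where: Callable[[Row], bool], rows: List[Row]) -> Table:
    """Mimics SupportsOverwrite: delete every row matching the filter, then append."""
    result: Table = {}
    for part, part_rows in table.items():
        kept = [r for r in part_rows if not delete_where(r)]
        if kept:
            result[part] = kept
    _append(result, rows)
    return result


def overwrite_dynamic(table: Table, rows: List[Row]) -> Table:
    """Mimics SupportsDynamicOverwrite: replace only partitions that receive new rows."""
    touched = {r[0] for r in rows}
    result: Table = {p: list(rs) for p, rs in table.items() if p not in touched}
    _append(result, rows)
    return result


table: Table = {"2019-07-01": [("2019-07-01", 1)],
                "2019-07-02": [("2019-07-02", 2), ("2019-07-02", 3)]}
new_rows = [("2019-07-02", 9)]
# Dynamic: 2019-07-01 is untouched; 2019-07-02 now holds only the new row.
dynamic = overwrite_dynamic(table, new_rows)
# A filter matching every row: the whole table is replaced by the new rows.
everything = overwrite_by_filter(table, lambda r: True, new_rows)
```

Both functions return a new table and leave the input unchanged, so the same starting state can be compared under each mode.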


rdblue commented on May 17, 2024

I went ahead and added this recently in #246 to support overwrite in Spark 2.4, since Spark 3.0 is taking a while to ship. It should work the same way in 3.0, but we will no longer recommend this API because it doesn't specify which behavior sources should implement (overwrite, truncate, and replace table are all allowed).

See the overwrite docs and #246.
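Why an unqualified "overwrite" mode is underspecified can be seen in a toy model (hypothetical structures, not Spark's or Iceberg's API): the same request yields three different tables depending on whether a source reads it as a dynamic partition overwrite, a truncate followed by an append, or a full table replacement. Row values are opaque integers here; only the schema and partition layout matter.

```python
from typing import Any, Dict

# Toy model (hypothetical): a table is a schema (tuple of column names) plus
# "parts", a dict mapping a partition key to a list of opaque row values.
Table = Dict[str, Any]


def overwrite(table: Table, incoming: Table, behavior: str) -> Table:
    """Apply an unqualified "overwrite" under one of three readings."""
    new_parts = {p: list(rows) for p, rows in incoming["parts"].items()}
    if behavior == "dynamic":
        # Replace only partitions present in the incoming data; keep the rest.
        kept = {p: list(rows) for p, rows in table["parts"].items() if p not in new_parts}
        return {"schema": table["schema"], "parts": {**kept, **new_parts}}
    if behavior == "truncate":
        # Delete all existing rows but keep the table's schema, then append.
        return {"schema": table["schema"], "parts": new_parts}
    if behavior == "replace_table":
        # Drop the table and recreate it from the incoming data and schema.
        return {"schema": incoming["schema"], "parts": new_parts}
    raise ValueError(f"unknown behavior: {behavior!r}")


current = {"schema": ("day", "value"), "parts": {"d1": [1], "d2": [2]}}
incoming = {"schema": ("day", "value", "note"), "parts": {"d2": [9]}}
results = {b: overwrite(current, incoming, b) for b in ("dynamic", "truncate", "replace_table")}
```

All three outcomes differ, which is why an explicit API (dynamic overwrite, overwrite by filter) is preferable to a single mode flag.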
