
Comments (8)

tokoko commented on July 18, 2024 2

Glad to be able to help. One more pointer that may help you out, but note that this is my preferred direction that I'm trying to push (without much luck as of yet :) ). Despite your preference for polars, you should probably still check out the duckdb PR I linked above. The actual offline store implementation is written using ibis rather than duckdb directly. As ibis has a fairly good polars backend, you could easily reuse the same ibis implementation. In that case, the polars implementation might be just a single-line code change (probably not, but something close to that).

from feast.

tokoko commented on July 18, 2024 1

@ion-elgreco Let me try to give you a quick rundown of how the integration might look. First of all, the concept closest to a backend in feast is an OfflineStore, but offline store implementations don't just specify the sources and how they should be read; they also implement additional logic on top of that (a point-in-time join between the entity dataframe and the feature tables). That's why it's unlikely that we can have a deltalake offline store implementation, as there's no way to specify data transformations with deltalake. The closest thing to what you're looking for is probably a polars implementation (it's using delta-rs if I'm not mistaken, right?) or something like duckdb that can be extended to use delta-rs for working with delta tables (I already have a draft PR that adds duckdb minus delta #3822).
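To make the point-in-time join mentioned above concrete, here is a minimal pure-Python sketch. It is not Feast's implementation: the function name `point_in_time_join`, the column names (`driver_id`, `ts`, `trips`), and the sample rows are all hypothetical, and real offline stores also handle details such as TTL that are left out here. For each entity row it picks the latest feature value whose timestamp is at or before the entity timestamp.

```python
from bisect import bisect_right
from datetime import datetime


def point_in_time_join(entity_rows, feature_rows):
    """Attach to each entity row the latest feature value at or before its timestamp.

    Hypothetical helper for illustration only; column names are assumptions.
    """
    # Group feature rows by entity key, each group sorted by timestamp.
    by_key = {}
    for row in sorted(feature_rows, key=lambda r: r["ts"]):
        by_key.setdefault(row["driver_id"], []).append(row)

    joined = []
    for entity in entity_rows:
        history = by_key.get(entity["driver_id"], [])
        timestamps = [r["ts"] for r in history]
        # Index of the first feature row strictly after the entity timestamp.
        idx = bisect_right(timestamps, entity["ts"])
        # No feature row at or before the entity timestamp means a missing value.
        value = history[idx - 1]["trips"] if idx > 0 else None
        joined.append({**entity, "trips": value})
    return joined


# Made-up sample data.
features = [
    {"driver_id": 1, "ts": datetime(2024, 1, 1), "trips": 10},
    {"driver_id": 1, "ts": datetime(2024, 1, 3), "trips": 12},
    {"driver_id": 2, "ts": datetime(2024, 1, 2), "trips": 7},
]
entities = [
    {"driver_id": 1, "ts": datetime(2024, 1, 2)},
    {"driver_id": 1, "ts": datetime(2024, 1, 4)},
    {"driver_id": 2, "ts": datetime(2024, 1, 1)},
]

result = point_in_time_join(entities, features)
```

The third entity row gets no value because the only feature row for that driver is newer than the entity timestamp; avoiding that kind of leakage from the future is the reason the join is point-in-time rather than a plain key join.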

Feast has another concept called DataSource. This is how you specify the sources that offline stores will have to read later on.
The implementation you might be interested in is FileSource, as @sudohainguyen pointed out. It allows users to specify a file format, but currently only the parquet format is supported. So the first logical step should be to extend FileSource to allow users to specify delta as a file format. Once we have that, we can teach various offline store implementations (jvm-based or otherwise) how to read them.
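As a rough illustration of that first step, the sketch below dispatches on a file format name to pick a reader. It does not use Feast's actual FileSource classes: `READERS`, `read_source`, `read_parquet`, and `read_delta` are hypothetical names, and the only library calls assumed are `pyarrow.parquet.read_table` and `deltalake.DeltaTable(...).to_pyarrow_table()`. The imports are lazy so that deltalake stays an optional dependency.

```python
from typing import Any, Callable, Dict


def read_parquet(path: str) -> Any:
    # Imported lazily so the module loads even without pyarrow installed.
    import pyarrow.parquet as pq

    return pq.read_table(path)


def read_delta(path: str) -> Any:
    # deltalake is the Python binding of delta-rs; no JVM is involved.
    from deltalake import DeltaTable

    return DeltaTable(path).to_pyarrow_table()


# Hypothetical registry mapping a format name to a reader function.
READERS: Dict[str, Callable[[str], Any]] = {
    "parquet": read_parquet,
    "delta": read_delta,
}


def read_source(path: str, file_format: str = "parquet") -> Any:
    """Read a file-based source with the reader registered for its format."""
    try:
        reader = READERS[file_format]
    except KeyError:
        raise ValueError(f"unsupported file format: {file_format!r}") from None
    return reader(path)
```

With something like this in place, an offline store would only need to call `read_source(path, "delta")` to get an Arrow table, and the point-in-time join logic on top could stay unchanged.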


sudohainguyen commented on July 18, 2024

As I understand it, you want to query a feature table in delta format. Spark and trino can help with that.
Feast supports both of them.


ion-elgreco commented on July 18, 2024

No, I would like to do this without a JVM application. The delta-rs Python bindings (deltalake) can be used to achieve this: https://github.com/delta-io/delta-rs


sudohainguyen commented on July 18, 2024

Cool, we need some changes to extend FileSource to read delta tables. Do you mind contributing?


ion-elgreco commented on July 18, 2024

Sure, if you can give me some pointers : )


ion-elgreco commented on July 18, 2024

@tokoko gotcha, that helps! Since I mainly use Polars, I will look into adding that as an offline store and then add delta as an additional file source, using deltalake as a dependency.

Yup, Polars uses deltalake to read and write.


sudohainguyen commented on July 18, 2024

Great explanation, @tokoko!
Looking forward to seeing the changes.

