
Comments (9)

tokoko commented on June 11, 2024

I'm not a big fan of the lookback name tbh. I think it only makes sense if the online store is the only component we're focusing on. If we set aside online stores for a minute, the way the offline store handles rows during a get_historical_features call sounds compatible with the meaning of ttl to me. When a client asks for a feature value at a specific point in time, only the values that haven't expired at that time are considered. The fact that it does a "lookback" to achieve that is an implementation detail.
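To make the point-in-time reading of ttl concrete, here is a minimal sketch in plain Python. It is not Feast's implementation: the row shape, the function name value_as_of, and the choice to treat a row exactly ttl old as still valid are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical row shape: (event_timestamp, value). Not Feast's internal format.
Row = tuple[datetime, float]


def value_as_of(rows: list[Row], as_of: datetime, ttl: timedelta) -> Optional[float]:
    """Return the newest value at or before `as_of` that has not expired by then."""
    # Keep rows that already existed at `as_of` and are at most `ttl` old.
    candidates = [(ts, value) for ts, value in rows if as_of - ttl <= ts <= as_of]
    if not candidates:
        return None
    # The newest surviving row wins.
    return max(candidates, key=lambda row: row[0])[1]


rows = [
    (datetime(2024, 1, 1), 10.0),
    (datetime(2024, 1, 5), 20.0),
]
ttl = timedelta(days=2)

print(value_as_of(rows, datetime(2024, 1, 6), ttl))   # 20.0
print(value_as_of(rows, datetime(2024, 1, 10), ttl))  # None
```

Seen from the caller's side, the second query returns nothing because every value had expired by January 10; the "lookback" window is just how the filter is computed.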

I think my preferred approach would be to fix the underlying issue rather than change the parameter name. The main problem here is that, as the issue indicated, ttl is not handled the same way in the offline and online flows. It's ignored in online stores 😄. I understand there are good reasons why it might be hard to actually delete expired rows from online storage during materialization, but what we can do is discard expired values after reading them, in the online store logic itself. Once we have that, ttl wouldn't be so misleading anymore.

from feast.

breno-costa commented on June 11, 2024

Regarding option 2, there are situations where features will never be fetched again for a given entity key.

Example: imagine that you have features calculated for a customer entity to be used in your product. However, some customers cancel their accounts on your product. You don't need to make inference and generate features for those customers anymore.

Materialization will no longer update features and inference endpoints will not call the get_online_features function for those customers anymore. And then, the old data will remain in the online store forever unless some cleanup is done.


tokoko commented on June 11, 2024

I'm in favor of a mix between options 2 and 3:

  • We store ttl information (expire_date for example) in the online store for every entity in the feature view during materialization.
  • online store reader treats expired feature values in an online store as if they're not there, discards them after reading from the database.
  • introduce feast cleanup command that will physically remove all expired data from an online store. (This is only relevant for an online store, in the offline store nothing's really ever expired as the user might always want to query past information). This command should be only for housekeeping and not affect online store behavior at all, in other words it will only remove the feature values from the online store that would be discarded by online store read method anyway.
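The three points above can be sketched with a toy in-memory store. Everything here is hypothetical: the class InMemoryOnlineStore, its method names, and the stored expire_date field are illustrations of the proposal, not Feast APIs, and a feast cleanup command does not exist in the source beyond this suggestion.

```python
from datetime import datetime, timedelta
from typing import Optional


class InMemoryOnlineStore:
    """Toy stand-in for an online store; not a Feast class."""

    def __init__(self) -> None:
        # entity_key -> (value, expire_date)
        self._rows: dict[str, tuple[float, datetime]] = {}

    def materialize(self, entity_key: str, value: float,
                    event_timestamp: datetime, ttl: timedelta) -> None:
        # Point 1: store the expiry alongside the value at materialization time.
        self._rows[entity_key] = (value, event_timestamp + ttl)

    def read(self, entity_key: str, now: datetime) -> Optional[float]:
        # Point 2: expired values are treated as if they were not there.
        row = self._rows.get(entity_key)
        if row is None:
            return None
        value, expire_date = row
        return value if now <= expire_date else None

    def cleanup(self, now: datetime) -> int:
        # Point 3: physically remove rows that read() would discard anyway.
        expired = [key for key, (_, exp) in self._rows.items() if now > exp]
        for key in expired:
            del self._rows[key]
        return len(expired)


store = InMemoryOnlineStore()
store.materialize("customer_1", 0.7, datetime(2024, 6, 1), timedelta(days=3))
store.materialize("customer_2", 0.9, datetime(2024, 6, 8), timedelta(days=3))

now = datetime(2024, 6, 9)
print(store.read("customer_1", now))  # None, expired on June 4
print(store.read("customer_2", now))  # 0.9
print(store.cleanup(now))             # 1 row removed
print(store.read("customer_2", now))  # 0.9, reads are unaffected by cleanup
```

Because cleanup only deletes what read already hides, running it or skipping it never changes what a caller sees; it only reclaims storage, which also covers the cancelled-customer case where a key is never read again.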


franciscojavierarceo commented on June 11, 2024

I need to spend more time thinking about this but I do agree the ttl at the FeatureView level is misleading as I had this exact experience in my last role and it caused me some headaches. I think renaming it to follow industry conventions would be good.


franciscojavierarceo commented on June 11, 2024

Fixing ttl to behave as expected would be ideal. I haven't used the offline store as much but if it's using the ttl as expected then I agree with your approach.


franciscojavierarceo commented on June 11, 2024

If you look at the documentation it says:

Feature views consist of:
...
(optional) a TTL, which limits how far back Feast will look when generating historical datasets

According to Wikipedia:

Time to live (TTL) or hop limit is a mechanism which limits the lifespan or lifetime of data in a computer or network...The Time to Live is an indication of an upper bound on the lifetime of an internet datagram.

And in HTTP:

Time to live may also be expressed as the date and time on which a record expires. The Expires: header in HTTP responses, the Cache-Control: max-age header field in both requests and responses and the expires field in HTTP cookies express time-to-live in this way.

So it would be rational for the ttl for an online Feature View to behave as an "upper bound on the lifetime of data in a database."

Options

  1. We could change ttl to offline_store_ttl or offline_ttl to make this name more intuitive and explicit
  2. We could add another parameter called online_store_ttl or online_ttl and replicate the HTTP behavior by:
    • returning None or Expired along with some metadata when calling get_online_features
    • Dropping the record in the database when calling get_online_features after the read is received
  3. We could create a command to expire offline data or online data in batch and call it something like feast expire feature_view that users could run on some schedule
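As a rough illustration of option 2, the sketch below returns a status alongside the value and drops the record once it is found to be expired. The names Status, OnlineResult, read_with_status, and online_ttl are hypothetical, and a plain dict stands in for the online store; this is not how get_online_features is implemented.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class Status(Enum):
    PRESENT = "present"
    NOT_FOUND = "not_found"
    EXPIRED = "expired"


@dataclass
class OnlineResult:
    value: Optional[float]
    status: Status
    event_timestamp: Optional[datetime] = None


def read_with_status(store: dict[str, tuple[float, datetime]], entity_key: str,
                     now: datetime, online_ttl: timedelta) -> OnlineResult:
    """Read one entity and report whether the value was present or expired."""
    row = store.get(entity_key)
    if row is None:
        return OnlineResult(None, Status.NOT_FOUND)
    value, event_timestamp = row
    if now - event_timestamp > online_ttl:
        # Second bullet of option 2: drop the record after the read.
        del store[entity_key]
        return OnlineResult(None, Status.EXPIRED, event_timestamp)
    return OnlineResult(value, Status.PRESENT, event_timestamp)


store = {"customer_1": (0.7, datetime(2024, 6, 1))}
online_ttl = timedelta(days=3)

print(read_with_status(store, "customer_1", datetime(2024, 6, 2), online_ttl).status)   # Status.PRESENT
print(read_with_status(store, "customer_1", datetime(2024, 6, 10), online_ttl).status)  # Status.EXPIRED
print(read_with_status(store, "customer_1", datetime(2024, 6, 10), online_ttl).status)  # Status.NOT_FOUND
```

One consequence of deleting on read is visible in the last call: the same key reports EXPIRED once and NOT_FOUND afterwards, and keys that are never read again are never dropped.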

Thoughts @tokoko @HaoXuAI?


franciscojavierarceo commented on June 11, 2024

Yeah, agreed. Approach (2) + (3) is the right one.

Only thing left to decide is naming conventions...do you all have any opinions here?

For example, we could keep the name ttl and just make the behavior more obvious (and document it) within each respective function call (i.e., making ttl behave as expected for get_online_features and get_historical_features).

Or we could go the route of online_ttl and renaming the current ttl to offline_ttl.

And I do like feast cleanup but that may also make the user think more is happening than dropping records. Not sure.


franciscojavierarceo commented on June 11, 2024

  • We store ttl information (expire_date for example) in the online store for every entity in the feature view during materialization.

I would recommend we store the expire_date in the feature view as metadata. Changes to the expiration will be more straightforward that way.


tokoko commented on June 11, 2024

@franciscojavierarceo For some reason, I assumed an entity timestamp was not part of the online store, my bad. If we already have an entity timestamp in there and there's a ttl field in the feature view, that first point is redundant. The online store can decide whether values are expired based on those two.
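A minimal sketch of that check, assuming the stored row carries an event timestamp and the ttl is read from the feature view at query time. The function name is_expired and the convention that a ttl of None means "never expires" are assumptions of this sketch, not confirmed Feast behavior.

```python
from datetime import datetime, timedelta
from typing import Optional


def is_expired(event_timestamp: datetime, ttl: Optional[timedelta], now: datetime) -> bool:
    """Decide expiry from the stored timestamp and the feature view's ttl."""
    if ttl is None:
        # Assumption for this sketch: no ttl means the value never expires.
        return False
    return now - event_timestamp > ttl


stored_ts = datetime(2024, 6, 1, 12, 0)
now = datetime(2024, 6, 3, 12, 0)

print(is_expired(stored_ts, timedelta(days=7), now))  # False
print(is_expired(stored_ts, timedelta(days=1), now))  # True
```

Since nothing about the ttl is written into the stored row, changing the ttl on the feature view changes the outcome for existing rows immediately, without re-materializing them.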
