Comments (9)
I'm not a big fan of the `lookback` name tbh. I think it only makes sense if the online store is the only component that we're focusing on. If we set aside online stores for a minute, the way the offline store handles rows during a `get_historical_features` call sounds compatible with the meaning of `ttl` to me. When a client asks for a feature value for a specific point in time, only the values that haven't expired at that time are considered. The fact that it's doing a "lookback" to do that is an implementation detail.
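To make that reading concrete, here is a minimal sketch in plain Python, not Feast's actual offline store code. The function name `point_in_time_value`, the row layout, and the inclusive window boundaries are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Optional, Sequence, Tuple

# Hypothetical row layout: (event_timestamp, feature_value).
Row = Tuple[datetime, float]


def point_in_time_value(
    rows: Sequence[Row], as_of: datetime, ttl: timedelta
) -> Optional[float]:
    """Return the latest value written at or before `as_of` that is no older than `ttl`."""
    # Keep only rows inside the window [as_of - ttl, as_of] (boundaries assumed inclusive).
    candidates = [(ts, value) for ts, value in rows if as_of - ttl <= ts <= as_of]
    if not candidates:
        return None  # nothing written yet, or everything had expired by `as_of`
    # Among the unexpired rows, the most recent one wins.
    return max(candidates, key=lambda row: row[0])[1]


rows = [
    (datetime(2024, 1, 1), 10.0),
    (datetime(2024, 1, 5), 20.0),
]
ttl = timedelta(days=2)
print(point_in_time_value(rows, datetime(2024, 1, 6), ttl))   # 20.0
print(point_in_time_value(rows, datetime(2024, 1, 10), ttl))  # None
```

From the caller's side this reads as expiry: a value older than `ttl` at the requested time is simply not returned.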
I think my preferred approach would be to fix the underlying issue rather than change the parameter name. The main problem here is that, as the issue indicated, `ttl` is not handled the same way in the offline and online flows. It's ignored in online stores 😄. I understand there are good reasons why it might be hard to actually delete expired rows from online storage during materialization, but what we can do is discard expired values in the online store read logic itself. Once we have that, `ttl` wouldn't be so misleading anymore.
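A rough sketch of what discarding at read time could look like, assuming each online row carries its event timestamp. The dict-based store and the `read_with_ttl` helper are hypothetical stand-ins, not Feast's online store interface.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional, Tuple

# Hypothetical in-memory stand-in for an online store:
# entity key -> (event_timestamp, feature_value).
OnlineStore = Dict[str, Tuple[datetime, float]]


def read_with_ttl(
    store: OnlineStore,
    entity_key: str,
    ttl: Optional[timedelta],
    now: Optional[datetime] = None,
) -> Optional[float]:
    """Read a value, treating rows older than `ttl` as if they were absent."""
    now = now or datetime.now(timezone.utc)
    row = store.get(entity_key)
    if row is None:
        return None
    event_ts, value = row
    if ttl is not None and now - event_ts > ttl:
        return None  # expired: still physically stored, but hidden from the caller
    return value


store: OnlineStore = {"customer:1": (datetime(2024, 1, 1, tzinfo=timezone.utc), 0.7)}
now = datetime(2024, 1, 3, tzinfo=timezone.utc)
print(read_with_ttl(store, "customer:1", timedelta(days=1), now))  # None
print(read_with_ttl(store, "customer:1", timedelta(days=7), now))  # 0.7
```

Nothing is deleted here; the row stays in storage and is only filtered out of the response.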
from feast.
Regarding option 2, there are situations where features will never be fetched again for a given entity key.
Example: imagine that you have features calculated for a customer entity to be used in your product. However, some customers cancel their accounts. You no longer need to run inference or generate features for those customers.
Materialization will no longer update their features, and inference endpoints will no longer call `get_online_features` for those customers. The old data will then remain in the online store forever unless some cleanup is done.
from feast.
I'm in favor of a mix between options 2 and 3:
- We store ttl information (`expire_date`, for example) in the online store for every entity in the feature view during materialization.
- The online store reader treats expired feature values in an online store as if they're not there and discards them after reading from the database.
- We introduce a `feast cleanup` command that will physically remove all expired data from an online store. (This is only relevant for an online store; in the offline store nothing's really ever expired, as the user might always want to query past information.) This command should be only for housekeeping and not affect online store behavior at all. In other words, it will only remove the feature values from the online store that would be discarded by the online store read method anyway.
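The three points above can be sketched together as follows. This is only an illustration under stated assumptions: the store is a plain dict, and `materialize`, `read`, `cleanup`, and `is_expired` are hypothetical helpers, not existing Feast functions or CLI commands. The key design point is that the reader and the cleanup job share one expiry rule, so cleanup can never change what a read returns.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional, Tuple

# Hypothetical online store contents: entity key -> (expire_date, feature_value).
Store = Dict[str, Tuple[datetime, float]]


def is_expired(expire_date: datetime, now: datetime) -> bool:
    """Single expiry rule shared by the reader and the cleanup job."""
    return now >= expire_date


def materialize(
    store: Store, entity_key: str, value: float, event_ts: datetime, ttl: timedelta
) -> None:
    """Write a value together with its precomputed expire_date."""
    store[entity_key] = (event_ts + ttl, value)


def read(store: Store, entity_key: str, now: datetime) -> Optional[float]:
    """Return the value, treating expired rows as if they were not there."""
    row = store.get(entity_key)
    if row is None or is_expired(row[0], now):
        return None
    return row[1]


def cleanup(store: Store, now: datetime) -> int:
    """Physically delete expired rows; returns how many were removed."""
    expired_keys = [key for key, (exp, _) in store.items() if is_expired(exp, now)]
    for key in expired_keys:
        del store[key]
    return len(expired_keys)


store: Store = {}
materialize(store, "customer:1", 0.7, datetime(2024, 1, 1), timedelta(days=1))
materialize(store, "customer:2", 0.4, datetime(2024, 1, 3), timedelta(days=1))
now = datetime(2024, 1, 3)

before = (read(store, "customer:1", now), read(store, "customer:2", now))
removed = cleanup(store, now)
after = (read(store, "customer:1", now), read(store, "customer:2", now))
print(before, removed, after)  # (None, 0.4) 1 (None, 0.4)
```

Reads return the same results before and after `cleanup`; only the physical storage shrinks.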
from feast.
I need to spend more time thinking about this, but I do agree the `ttl` at the `FeatureView` level is misleading, as I had this exact experience in my last role and it caused me some headaches. I think renaming it to follow industry conventions would be good.
from feast.
Fixing `ttl` to behave as expected would be ideal. I haven't used the offline store as much, but if it's using the `ttl` as expected then I agree with your approach.
from feast.
If you look at the documentation it says:
Feature views consist of:
...
(optional) a TTL, which limits how far back Feast will look when generating historical datasets
According to Wikipedia:
Time to live (TTL) or hop limit is a mechanism which limits the lifespan or lifetime of data in a computer or network...The Time to Live is an indication of an upper bound on the lifetime of an internet datagram.
And in HTTP:
Time to live may also be expressed as the date and time on which a record expires. The Expires: header in HTTP responses, the Cache-Control: max-age header field in both requests and responses and the expires field in HTTP cookies express time-to-live in this way.
So it would be rational for the `ttl` for an online Feature View to behave as an "upper bound on the lifetime of data in a database."
Options
1. We could change `ttl` to `offline_store_ttl` or `offline_ttl` to make this name more intuitive and explicit.
2. We could add another parameter called `online_store_ttl` or `online_ttl` and replicate the HTTP behavior by:
   - returning `None` or `Expired` along with some metadata when calling `get_online_features`
   - dropping the record in the database when calling `get_online_features` after the read is received
3. We could create a command to expire offline data or online data in batch, call it something like `feast expire feature_view`, and let users run it on some schedule.
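As an illustration of the HTTP-like behavior in the second option, here is a hedged sketch. The `Status` enum, `ReadResult` type, and `get_online_value` function are invented for this example and are not part of Feast's `get_online_features` API. An expired read returns no value plus an `EXPIRED` status and then drops the record.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Dict, Optional, Tuple


class Status(Enum):
    PRESENT = "PRESENT"
    NOT_FOUND = "NOT_FOUND"
    EXPIRED = "EXPIRED"


@dataclass
class ReadResult:
    value: Optional[float]
    status: Status
    event_timestamp: Optional[datetime] = None  # metadata returned with the read


# Hypothetical store layout: entity key -> (event_timestamp, feature_value).
Store = Dict[str, Tuple[datetime, float]]


def get_online_value(
    store: Store, entity_key: str, online_ttl: timedelta, now: datetime
) -> ReadResult:
    row = store.get(entity_key)
    if row is None:
        return ReadResult(None, Status.NOT_FOUND)
    event_ts, value = row
    if now - event_ts > online_ttl:
        del store[entity_key]  # drop the record once the expired read is detected
        return ReadResult(None, Status.EXPIRED, event_ts)
    return ReadResult(value, Status.PRESENT, event_ts)


store: Store = {"customer:1": (datetime(2024, 1, 1), 0.7)}
now = datetime(2024, 1, 3)
first = get_online_value(store, "customer:1", timedelta(days=1), now)
second = get_online_value(store, "customer:1", timedelta(days=1), now)
print(first.status, second.status)  # Status.EXPIRED Status.NOT_FOUND
```

One consequence of dropping on read is visible in the usage: the same key reports `EXPIRED` once and `NOT_FOUND` afterwards, which callers would need to expect.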
from feast.
Yeah, agreed. Approach (2) + (3) is the right one.
Only thing left to decide is naming conventions... do you all have any opinions here?
For example, we could continue to keep the name `ttl` and just make the behavior more obvious (and document it) within each respective function call (i.e., making `ttl` behave as expected for `get_online_features` and `get_historical_features`).
Or we could go the route of adding `online_ttl` and renaming the current `ttl` to `offline_ttl`.
And I do like `feast cleanup`, but that may also make the user think more is happening than dropping records. Not sure.
from feast.
- We store ttl information (expire_date for example) in the online store for every entity in the feature view during materialization.
I would recommend we store the `expire_date` in the feature view as metadata. Changes to the expiration will be more straightforward that way.
from feast.
@franciscojavierarceo For some reason, I assumed an entity timestamp was not part of the online store, my bad. If we already have an entity timestamp in there and there's a `ttl` field in a feature view, that first point is redundant. The online store can decide whether values are expired or not based on those two.
from feast.