Comments (9)
@shuchu Confirmed this happens in both 0.37.0 and 0.38.0.
@tokoko I understand the concerns completely, both about clock time and about the possible use case for future-dated features.
The main issue with the current setup is that there isn't an easy way to override the start date if you mistyped the previous end date, which is how we found the issue.
Currently, if you accidentally materialize, say, a year ahead, none of your future materializations with correct end dates can work. AFAICT there's no real way to fix this other than blowing away the registry and starting again.
It wouldn't necessarily have to compare with clock time; it could simply check whether start_date is not less than end_date for the materialization, since there's no situation in which a materialization could run in that case. When that happens, you could prompt for a new start_date value, or more generally support start_date as an optional arg to materialize-incremental.
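A minimal sketch of the check proposed above. It assumes a hypothetical helper, resolve_start_date (not part of Feast), that receives the stored most-recent end time, the requested end date, and an optional user-supplied start date override; no clock time is involved.

```python
from datetime import datetime, timezone
from typing import Optional


def resolve_start_date(
    stored_end: datetime,
    end_date: datetime,
    start_override: Optional[datetime] = None,
) -> datetime:
    """Hypothetical helper: pick the start of the next incremental window.

    An explicit override wins; otherwise the stored end time is used, but only
    if it still leaves a non-empty window before end_date.
    """
    start_date = start_override if start_override is not None else stored_end
    if start_date >= end_date:
        # No materialization could run in this window, so surface the
        # problem and let the caller supply an explicit start date instead.
        raise ValueError(
            f"start_date {start_date.isoformat()} is not before "
            f"end_date {end_date.isoformat()}; pass an explicit start date"
        )
    return start_date


# Example: a stored end time typo'd a year ahead, then corrected by an override.
typo_end = datetime(2025, 6, 1, tzinfo=timezone.utc)
today = datetime(2024, 6, 1, tzinfo=timezone.utc)
override = datetime(2024, 5, 31, tzinfo=timezone.utc)
print(resolve_start_date(typo_end, today, start_override=override))
```

The override is only consulted when given, so the default path behaves like a plain "resume from the last end time".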
from feast.
@samhallam-reverb did you try the 0.37.0 version?
Also, you're welcome to help us fix this issue, since you already seem to know the fix.
I'm not sure about this... Currently, materialize commands only take into account values in the event_timestamp column, never clock time. Bringing now into this equation means we'd have to handle timezone differences between the two. Also, isn't it possible that someone wants to prepopulate future values in the offline store? Maybe you're calculating monthly feature values that should take effect from the start of the next month, idk...
To me, this feels like something the user needs to handle: the user passes the end date after all, and can always cap it at the current time beforehand.
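A minimal sketch of capping the end date on the user side before calling materialize. The helper capped_end_date is hypothetical, and it assumes naive datetimes should be read as UTC.

```python
from datetime import datetime, timezone
from typing import Optional


def capped_end_date(requested: datetime, now: Optional[datetime] = None) -> datetime:
    """Return the requested end date, capped at the current UTC time.

    Naive datetimes are treated as UTC (an assumption for this sketch).
    `now` is injectable for testing and must be timezone-aware if given.
    """
    if requested.tzinfo is None:
        requested = requested.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return min(requested, now)
```

Usage would be something like store.materialize_incremental(end_date=capped_end_date(user_end)), where store is a feast.FeatureStore and user_end is whatever the user typed. A typo a year ahead then collapses to "now" instead of being recorded.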
@samhallam-reverb Sure. It looks like it always takes the max value from its history of materialization timestamps, so it seems impossible to rectify by normal means. The registry API also only supports apply_materialization, which adds to the list of timestamp pairs and never removes them.
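A small model of the behavior described above, assuming the next start date is the max end time across all recorded (start, end) pairs and that the history is append-only. It mirrors the description in this thread, not Feast's actual classes.

```python
from datetime import datetime, timezone
from typing import List, Optional, Tuple

Interval = Tuple[datetime, datetime]


def most_recent_end_time(intervals: List[Interval]) -> Optional[datetime]:
    """Next start date: the max end time across all recorded intervals."""
    if not intervals:
        return None
    return max(end for _, end in intervals)


def utc(y: int, m: int, d: int) -> datetime:
    return datetime(y, m, d, tzinfo=timezone.utc)


# Append-only history of (start, end) pairs.
history: List[Interval] = [(utc(2024, 1, 1), utc(2024, 2, 1))]
history.append((utc(2024, 2, 1), utc(2025, 3, 1)))  # typo: a year too far ahead
# Even if a correct interval were appended afterwards, it could not lower the max.
history.append((utc(2024, 2, 1), utc(2024, 3, 1)))

print(most_recent_end_time(history))  # 2025-03-01 00:00:00+00:00
```

Because max only ever moves forward and nothing can be removed, the typo'd end time stays the next start date.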
tbh, I'm not sure I got what you meant in the last paragraph. Unless you either look at clock time or peek into the data, it will be hard to come up with any sort of logic, and I'm not sure what that logic should be either.
What if we do away with storing the materialization_intervals history in the registry altogether? I'll have to double check, but I'm pretty sure it's only used to query the max end_date timestamp that has occurred, so we might as well switch to storing just that. In that case, apply_materialization would simply overwrite the erroneous value, which would solve the problem.
P.S. There is another issue somewhere where a user made very frequent materialize-incremental calls (once an hour, I think) and ended up bloating the registry, as every materialize call added to the Feature View proto size in the registry. Switching to storing a single value would fix that issue as well.
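A minimal sketch of the single-value alternative. MaterializationState and set_last_materialized are hypothetical names for illustration only; the point is that writing overwrites rather than appends.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class MaterializationState:
    """Hypothetical per-feature-view record holding one timestamp, not a list."""

    last_materialized: Optional[datetime] = None

    def set_last_materialized(self, end_date: datetime) -> None:
        # Overwrite instead of append: a bad value can be corrected by
        # writing a good one, and the record never grows.
        self.last_materialized = end_date


state = MaterializationState()
state.set_last_materialized(datetime(2025, 3, 1, tzinfo=timezone.utc))  # typo
state.set_last_materialized(datetime(2024, 3, 1, tzinfo=timezone.utc))  # correction
print(state.last_materialized)  # 2024-03-01 00:00:00+00:00
```

For the registry-size point: with hourly runs, an append-only list gains 24 × 365 = 8,760 pairs per year, while a single-value record stays one field no matter how often materialization runs.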
@samhallam-reverb it seems 0.37 and 0.38 have the same behavior as 0.34
@tokoko That makes sense to me. I don't know the context for why the current schema was designed this way, but storing the materialization runs as individual rows would seem to solve the problem too.
Happy to help contribute to either of those solutions if they seem acceptable.
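A sketch of the "individual rows" idea using an in-memory SQLite table. The schema, the table name materialization_runs, and the feature view name driver_stats are all hypothetical; this is not how Feast's registry is stored today.

```python
import sqlite3
from typing import Optional

# Hypothetical schema: one row per materialization run.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE materialization_runs ("
    " id INTEGER PRIMARY KEY,"
    " feature_view TEXT NOT NULL,"
    " start_ts TEXT NOT NULL,"  # ISO-8601 strings with the same UTC offset
    " end_ts TEXT NOT NULL)"  # sort chronologically as plain text
)
conn.executemany(
    "INSERT INTO materialization_runs (feature_view, start_ts, end_ts) VALUES (?, ?, ?)",
    [
        ("driver_stats", "2024-01-01T00:00:00+00:00", "2024-02-01T00:00:00+00:00"),
        ("driver_stats", "2024-02-01T00:00:00+00:00", "2025-03-01T00:00:00+00:00"),  # typo
    ],
)


def next_start(view: str) -> Optional[str]:
    """Latest recorded end time for a view, or None if it was never materialized."""
    row = conn.execute(
        "SELECT MAX(end_ts) FROM materialization_runs WHERE feature_view = ?", (view,)
    ).fetchone()
    return row[0]


print(next_start("driver_stats"))  # 2025-03-01T00:00:00+00:00

# Because each run is its own row, the bad run can be deleted on its own.
conn.execute(
    "DELETE FROM materialization_runs WHERE feature_view = ? AND end_ts = ?",
    ("driver_stats", "2025-03-01T00:00:00+00:00"),
)
print(next_start("driver_stats"))  # 2024-02-01T00:00:00+00:00
```

The history for reproducibility is preserved, but a single erroneous run is now addressable and removable.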
After thinking about it some more, I'm torn between these 2 approaches:
- As I outlined above, we mostly keep the current workflow but switch from storing materialization_intervals to storing just a single timestamp designating the last ingested value. This fixes the issue in this ticket and also protects us from registry objects growing huge when a user runs materialize-incremental frequently.
- We could also scrap materialization info from the registry altogether and instead move the responsibility for storing those values to the OnlineStore API (adding 2 new methods for storing and retrieving the last_materialized timestamp). One benefit is that the registry becomes less critical as long as you have the object definitions intact in the git repo: even if you nuke the registry, you can always rebuild it with a feast apply. Another big benefit is that this would simplify supporting multiple online stores at the same time, as the materialization information would be owned by the individual online stores themselves. The obvious downside is that it breaks the pluggable OnlineStore API, but imho that's bound to happen before we reach the 1.0 release, so now is the best time to do it.
@franciscojavierarceo @HaoXuAI @jeremyary wdyt?
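A minimal sketch of what the 2 new methods in the second approach might look like. MaterializationTrackingStore, update_last_materialized, get_last_materialized, and InMemoryStore are hypothetical names chosen for illustration; they are not part of Feast's OnlineStore interface.

```python
from abc import ABC, abstractmethod
from datetime import datetime, timezone
from typing import Dict, Optional


class MaterializationTrackingStore(ABC):
    """Hypothetical extension for the second approach; not Feast's OnlineStore API."""

    @abstractmethod
    def update_last_materialized(self, feature_view: str, ts: datetime) -> None:
        """Persist the last materialized timestamp for a feature view."""

    @abstractmethod
    def get_last_materialized(self, feature_view: str) -> Optional[datetime]:
        """Return the last materialized timestamp, or None if never materialized."""


class InMemoryStore(MaterializationTrackingStore):
    """Toy implementation; a real store would persist this next to its data."""

    def __init__(self) -> None:
        self._last: Dict[str, datetime] = {}

    def update_last_materialized(self, feature_view: str, ts: datetime) -> None:
        self._last[feature_view] = ts

    def get_last_materialized(self, feature_view: str) -> Optional[datetime]:
        return self._last.get(feature_view)


# With two online stores, each tracks its own progress independently.
store_a, store_b = InMemoryStore(), InMemoryStore()
store_a.update_last_materialized("driver_stats", datetime(2024, 3, 1, tzinfo=timezone.utc))
print(store_a.get_last_materialized("driver_stats"))  # 2024-03-01 00:00:00+00:00
print(store_b.get_last_materialized("driver_stats"))  # None
```

Since each store owns its own timestamp, a store that fell behind (or was wiped) resumes from its own position rather than from a shared registry value.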
@tokoko I think keeping the current registry is the right thing, as that metadata is important for reproducibility; I'd rather deal with the complexities of implementing timezones. From reviewing the code, the start date and end date are already timezone-aware:
```python
for feature_view in feature_views_to_materialize:
    start_date = feature_view.most_recent_end_time
    ...
    start_date = utils.make_tzaware(start_date)
    end_date = utils.make_tzaware(end_date)
```
From the feature_store.py file.
And it looks like it already uses datetime.utcnow() in the TTL logic.
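Once both timestamps are timezone-aware, checking against clock time reduces to a plain comparison. A sketch, assuming a make_tzaware stand-in that treats naive datetimes as UTC (our reading of the existing helper; check feast.utils before relying on it) and a hypothetical is_in_future check:

```python
from datetime import datetime, timedelta, timezone


def make_tzaware(t: datetime) -> datetime:
    """Stand-in: treat naive datetimes as UTC and leave aware ones unchanged."""
    if t.tzinfo is None:
        return t.replace(tzinfo=timezone.utc)
    return t


def is_in_future(end_date: datetime, now: datetime) -> bool:
    # Both sides are made aware, so the comparison works across UTC offsets.
    return make_tzaware(end_date) > make_tzaware(now)


# A naive 12:00 (read as UTC) vs. 13:00 at UTC+02:00, which is 11:00 UTC.
plus_two = timezone(timedelta(hours=2))
print(is_in_future(datetime(2024, 6, 1, 12, 0), datetime(2024, 6, 1, 13, 0, tzinfo=plus_two)))  # True
```

In practice now would come from datetime.now(timezone.utc); a naive value from datetime.utcnow() also works with this stand-in, since it is read as UTC.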
Related Issues (20)
- feastdev/feature-server{,-java} 0.38.0 & 0.39.0 tags missing from Docker Hub
- feast gcp notebook, getting TypeError: Client.__init__() got an unexpected keyword argument 'database'
- Use modular fixture functions for integration test environments
- OnDemandFeatureView.feature_transformation.infer_features does pass UDF outputs to python_type_to_feast_value_type
- PostgresOnlineStore: Improve materialization
- Is Feast helpful in case of the pictures and video as a features?
- FeastObject type is missing SavedDataset
- Google cloud datastore: Client.__init__() got an unexpected keyword argument 'database'
- Discussion: Postgres Online Store: Is connection handling done properly?
- new Dask version lock pandas to 2.0+
- Enhance the python feature server with new `list` endpoints.
- Add Open Inference Protocol to feature servers
- Add support for get_historical_features for vector search
- Deprecation of distutils in python 3.12 breaking feast init
- no image `feastdev/feast-operator:0.37.0`
- Rewrite RemoteOnlineStore to use get_online_features
- Update CI to have a test for main feast dependency before release/deployment
- Minor doc fix of page feature-servers
- Security fix for possible Cross-site-scripting (XSS) attack
- feast materialize can not load all data into online store