Running feast materialize-incremental for an end date far in future breaks incremental materializations up to that date #4222
Comments
@samhallam-reverb Did you try version 0.37.0? Also, you're welcome to help us fix this issue, since you already seem to know the fix.
I'm not sure about this... Currently [...] To me, this feels like something that needs to be handled by the user: the user passes the end date, after all, and can always cap it with the current time beforehand.
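The user-side workaround described in this comment could look roughly like the sketch below. `cap_end_date` is a hypothetical helper, not part of Feast, and treating naive datetimes as UTC is an assumption made only so the comparison is well-defined.

```python
from datetime import datetime, timezone
from typing import Optional


def cap_end_date(requested_end: datetime, now: Optional[datetime] = None) -> datetime:
    """Hypothetical helper: clamp requested_end so it never lies after `now`."""
    if now is None:
        now = datetime.now(timezone.utc)
    # Assumption: naive datetimes are interpreted as UTC so both sides compare cleanly.
    if requested_end.tzinfo is None:
        requested_end = requested_end.replace(tzinfo=timezone.utc)
    if now.tzinfo is None:
        now = now.replace(tzinfo=timezone.utc)
    return min(requested_end, now)


# Usage sketch (not executed here; assumes a configured feature repository):
# from feast import FeatureStore
# store = FeatureStore(repo_path=".")
# store.materialize_incremental(end_date=cap_end_date(datetime(2025, 4, 8)))
```

This only prevents a future end date from being recorded in the first place; it does not repair a registry that already holds one.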
@shuchu Confirmed this happens in both 0.37.0 and 0.38.0.
@tokoko I completely understand the concerns, both on clock time and the possible use case for future-dated features. The main issue with the current setup is that there isn't an easy way to override the start date if you mistyped the previous end date, which is how we found the issue. Currently, if you accidentally materialize for, say, a year ahead, none of your future materializations with correct end dates can work. There's no real way to solve this other than blowing away the registry and starting again, AFAICT. It wouldn't necessarily have to compare with clock time; it could simply check that [...]
@samhallam-reverb Sure, it looks like it always takes the max value in its history of materialization timestamps, so it seems impossible to rectify by normal means. Registry API also only supports [...]
Tbh, I'm not sure I got what you meant in the last paragraph. Unless you either look at clock time or peek into the data, it will be hard to come up with any sort of logic, and I'm not sure what that logic should be either. What if we do away with storing [...]
P.S. There is another issue somewhere where a user did a very frequent [...]
@samhallam-reverb it seems 0.37 and 0.38 have the same behavior as 0.34
@tokoko That makes sense to me. I don't know the context for why the current schema was designed this way, but it also seems that storing the materialization runs as individual rows would solve the problem too. Happy to help contribute to either of those solutions if they seem acceptable.
After thinking about it some more, I'm between these 2 approaches:
@tokoko I think keeping the current registry is the right thing as that metadata is important for reproducibility. I'd rather have to deal with the complexities of implementing timezones. From reviewing the code, the start date and end date are already timezone aware.

    for feature_view in feature_views_to_materialize:
        start_date = feature_view.most_recent_end_time
        ....
        start_date = utils.make_tzaware(start_date)
        end_date = utils.make_tzaware(end_date)

From the [...]
And it looks like it already uses [...]
Expected Behavior
When running feast materialize-incremental, it should not be possible for a materialization to start from a date that hasn't occurred yet. I would expect materialize-incremental to choose min(now, most_recent_materialization_date) as its start date.
Current Behavior
When running feast materialize-incremental for a future date, say 2025-04-08T00:00:00, it will set that date as the next start date. This breaks all future materialize-incremental commands up until that date: if you run daily materializations, they will no longer function until the actual date goes beyond this erroneously entered start date. Because the data is stored as a serialized protobuf in the SQL-based registry, this is also non-trivial to change.
You can see code here showing that there are no validation checks for the start date; it is simply the latest end date that has ever been run.
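The failure mode described above can be modelled with a small standalone sketch. This is not Feast's code: `next_incremental_window` and the list of `(start, end)` tuples are hypothetical stand-ins for how the latest recorded end time becomes the next start date.

```python
from datetime import datetime, timezone


def next_incremental_window(recorded_intervals, end_date):
    """Hypothetical model: start from the latest end time ever recorded."""
    # Stand-in for feature_view.most_recent_end_time.
    start_date = max(end for _start, end in recorded_intervals)
    # There is only something to materialize if the window is non-empty.
    has_work = start_date < end_date
    return start_date, end_date, has_work


utc = timezone.utc
history = [
    (datetime(2024, 5, 1, tzinfo=utc), datetime(2024, 5, 2, tzinfo=utc)),
    # Mistyped end date, roughly a year ahead.
    (datetime(2024, 5, 2, tzinfo=utc), datetime(2025, 4, 8, tzinfo=utc)),
]

start, end, has_work = next_incremental_window(history, datetime(2024, 5, 3, tzinfo=utc))
print(start > end, has_work)  # True False
```

Every later run with a correct end date gets a start date after its end date, so the window stays empty until real time passes the mistyped date.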
Steps to reproduce
Run a materialize-incremental command for a date far in the future.
Run a second materialize-incremental command for an end date earlier than that future date.
Watch Feast always find no date to materialize.
Specifications
Possible Solution
Refactor the above function to choose min(now, most_recent_materialization_date). Happy to contribute to this fix.
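A minimal sketch of the proposed rule, assuming both values are timezone-aware. `capped_start_date` is a hypothetical name, not an existing Feast function, and the thread raises open concerns about relying on clock time and about future-dated features that this sketch does not address.

```python
from datetime import datetime, timezone
from typing import Optional


def capped_start_date(most_recent_end_time: datetime, now: Optional[datetime] = None) -> datetime:
    """Hypothetical helper: never start an incremental run from a future date."""
    if now is None:
        now = datetime.now(timezone.utc)
    # Assumption: both datetimes are timezone-aware; mixing naive and aware
    # values would raise a TypeError in the comparison.
    return min(now, most_recent_end_time)
```

With a recorded end time of 2025-04-08 and a current time of 2024-05-03, the start date falls back to the current time; with a recorded end time in the past, the rule changes nothing.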