Versioning of features / nodes #175

Closed
tim-x-y-z opened this issue May 23, 2023 · 3 comments

@tim-x-y-z
tim-x-y-z commented May 23, 2023

Is your feature request related to a problem? Please describe.
I'd love to know which version of the code a feature was created with. I could just use a git commit hash, but between two commits the code through which the feature flows may not have changed, in which case the version should stay the same.
This would be ideal in situations where we need to pre-process a large number of features for later use: some process could then check whether the features were created with the same "version" and, if not, recompute only those.

Describe the solution you'd like
Each feature / last node of a graph has a version attached to it. The version depends on the code within the function used to create the feature, but also on the code of all its parents.
I'd like the solution to be clear / easy, like a hash of the code of the node.

I feel it might be tricky to handle nodes that interact with IO, but perhaps that should be dealt with by the user.
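
To make the idea concrete, here is a minimal sketch of what such a version could look like. It is not Hamilton API: `code_fingerprint` and `node_versions` are hypothetical helpers, and it assumes (as in Hamilton-style dataflows) that a parameter named after another function is a parent node. It also assumes the graph has no cycles. IO side effects are not captured.

```python
import hashlib
import inspect
from typing import Callable, Dict


def code_fingerprint(fn: Callable) -> str:
    """Hash a function's own code: source text if available, bytecode otherwise."""
    try:
        payload = inspect.getsource(fn).encode()
    except (OSError, TypeError):
        # No retrievable source (e.g. code run via exec): fall back to bytecode.
        code = fn.__code__
        payload = code.co_code + repr(code.co_consts).encode()
    return hashlib.sha256(payload).hexdigest()


def node_versions(functions: Dict[str, Callable]) -> Dict[str, str]:
    """Version every node as hash(own code + versions of its parents)."""
    versions: Dict[str, str] = {}

    def resolve(name: str) -> str:
        if name in versions:
            return versions[name]
        fn = functions[name]
        # Assumption: parameters that name another node are parents; any other
        # parameter is an external input and does not affect the version.
        parents = sorted(p for p in inspect.signature(fn).parameters if p in functions)
        digest = hashlib.sha256(code_fingerprint(fn).encode())
        for parent in parents:
            digest.update(f"{parent}={resolve(parent)}".encode())
        versions[name] = digest.hexdigest()
        return versions[name]

    for name in functions:
        resolve(name)
    return versions


def spend_doubled(spend):
    return spend * 2


def spend_doubled_changed(spend):  # same node, edited implementation
    return spend * 3


def spend_per_signup(spend_doubled, signups):
    return spend_doubled / signups


def signups_squared(signups):
    return signups ** 2


before = node_versions({
    "spend_doubled": spend_doubled,
    "spend_per_signup": spend_per_signup,
    "signups_squared": signups_squared,
})
after = node_versions({
    "spend_doubled": spend_doubled_changed,
    "spend_per_signup": spend_per_signup,
    "signups_squared": signups_squared,
})

# Only nodes whose own code or upstream code changed need recomputing.
stale = sorted(name for name in before if before[name] != after[name])
print(stale)  # ['spend_doubled', 'spend_per_signup']
```

A source-based hash also changes when only comments or whitespace change, so a stricter notion of "same code" would need normalisation first.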

Describe alternatives you've considered
I considered using the git commit hash, but feature code changes much less frequently than the overall repo. I considered implementing something hacky myself but haven't gotten to it.

Additional context
[Image: Screenshot 2023-05-23 at 15 03 52]

@skrawcz
Collaborator

skrawcz commented May 23, 2023

@tim-habitat thanks for the idea! @elijahbenizzy and I have talked about this, as this is very close in functionality to "caching"/"check pointing". Would love to chat more about your use case. Would you be up for a call? Feel free to jump into slack and we can schedule a time/chat more there.

@tim-x-y-z
Author

this is very close in functionality to "caching"/"check pointing"

I agree, this is why I would like this. It can also be useful for monitoring / tracking changes between training and inference of ML models.

Would you be up for a call?

Sure thing!

@zilto
Collaborator

zilto commented Mar 3, 2024

Some features are now available under the h_diskcache plugin and the Hamilton CLI. Continuing discussion in #728.
