-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make records hashable #107
Make records hashable #107
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #107 +/- ##
==========================================
+ Coverage 80.41% 80.46% +0.05%
==========================================
Files 34 34
Lines 3165 3174 +9
==========================================
+ Hits 2545 2554 +9
Misses 620 620
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding the _mutable
flag/field, I think it's best if we just treat it as how dataclasses works when using unsafe_hash=True
, hashable but still mutable:
In [1]: from dataclasses import dataclass
...:
...: @dataclass(unsafe_hash=True)
...: class MyRecord:
...: foo: str
...: bar: int
...:
In [2]: a = MyRecord("a", 1337)
In [3]: b = MyRecord("b", 1337)
In [4]: set([a, b])
Out[4]: {MyRecord(foo='a', bar=1337), MyRecord(foo='b', bar=1337)}
In [5]: b.foo = "a"
In [6]: set([a, b])
Out[6]: {MyRecord(foo='a', bar=1337)}
I think the _mutable
flag/field introduces unnecessary hand holding and extra code complexity. So it's best if we just not implement this.
Ideally we want to make flow.record compatible with dataclasses which could solve most of these features more nicely. It's something we want to explore in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the nice feature and implementation, LGTM
This PR implements the
__hash__
functions for theRecord
class, enabling grouping records in sets. This enables set operations on collections of records, e.g:For some record types though, there will be fields that you might want to ignore, such as
_generated
and possibly_source
when comparing record output from two different targets. Preferably you want the fields to ignore to be configurable at runtime. Therefore, we opted to add an environment variableFLOW_RECORD_IGNORE
which can contain a list of comma-separated field names to ignore.We implemented a check where hashed records are considered 'immutable' and a
ValueError
is raised if a property is changed after__hash__
has been called on aRecord
instance. This may have some performance impact though. Feel free to suggest alternative approaches.