-
Notifications
You must be signed in to change notification settings - Fork 134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New helper module with some reusable functions #724
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Questions + feedback.
# Datetime conversion functions | ||
|
||
|
||
def date_to_timestamp(value, tzinfo=datetime.timezone.utc): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add type annotations. For example:
import typing as t
def date_to_timestamp(
value: t.Union[int, str, datetime.datetime],
tzinfo: datetime.tzinfo = datetime.timezone.utc) -> t.Optional[int]:
parsed_date = parse_date(value) | ||
|
||
if not parsed_date: | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either the function is misdocumented, or this is incorrect. What if you used -1
as a sentinel values instead of None
?
return int(parsed_date.timestamp()) | ||
|
||
|
||
def parse_date(value, tzinfo=datetime.timezone.utc): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add type annotations.
return parsed | ||
|
||
|
||
def timestamp_to_readable(value, tzinfo=datetime.timezone.utc, format_as='%Y-%m-%d %H:%M:%S'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use the ISO format as the default format?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i.e. the built-in one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ISO is a little less readable, though certainly more readable than unix timestamps! But it might be worth the tradeoff to use a more standardized format.
# Flatten contacts | ||
|
||
def get_primary_contact_from_nested(contact_list, get_first=True, selector=None): | ||
"""Extracts single contact value from list of dictionaries. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see a shifting in dostring styles between this style (which I prefer) and the description on a new line, e.g.:
"""
Extract single contact ...
Which is the correct style?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we have a preference. I was going to say "let's go with what Python prefers" but it looks like Python itself doesn't have a preference. I know it's a little inconsistent but I don't think it's a big deal to switch back and forth within parsons, and that's one less thing that PR contributors have to worry about.
This helper method helps "flatten" the dictionary by returning the primary number (or, | ||
if no primary number is found and get_first is True, the first number found). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Idea: what if we made the keys of the dictionary a tuple with all the numbers? e.g.
result = {
('5554444444'): {'foo': 'bar'},
('4441234567', '7771234567'): {'bin': 'ban'},
# ...
}
I bring this up, because there would be no exceptional cases here. If users wanted to get the first number, they would simply access the tuple and get the first element.
if not selector: # if selector still not found, look in dict | ||
dict_keys = list(contact_list[0].keys()) | ||
dict_keys.pop("primary") | ||
selector = dict_keys[0] # NOTE: this will break on dicts that have additional keys |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs fixing / addressing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
e.g. it could be addressed with a try-catch block, or by raising an error msg.
|
||
# Flatten contacts | ||
|
||
def get_primary_contact_from_nested(contact_list, get_first=True, selector=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this function used? What problem does it solve?
parsed_value = re.compile(r'\d+(?:\.\d+)?').findall(value) # extracts digits only | ||
value = "".join(parsed_value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this a bit more readable / easier to maintain (regexes, in my view, should be avoided at all costs):
parsed_value = re.compile(r'\d+(?:\.\d+)?').findall(value) # extracts digits only | |
value = "".join(parsed_value) | |
value = "".join([v for v in value if v.isdigit()]) |
What problems are you trying to solve? |
Hi Alex, thanks for these reviews, lots of good catches here. Before you
do any additional pr reviews, let's touch base about the project, our goals
and community norms, etc. I'll email you tomorrow to set up a time.
…On Sat, Sep 17, 2022, 6:34 PM Alex Merose ***@***.***> wrote:
the overall goal of providing more helper code for users
What problems are you trying to solve?
—
Reply to this email directly, view it on GitHub
<#724 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AAI75YTTOWZALUSF2CVE3TDV6ZBRDANCNFSM546PV5BQ>
.
You are receiving this because you authored the thread.Message ID:
***@***.***>
|
Closing this but linking it from #836 so that the code/discussion here isn't lost. |
I would like for Parsons to include more reusable code which helps users with transformations and analysis. I have placed some examples in a new
etl_helpers
folder.I am looking for feedback on:
etl_helpers
a decent name? I wanted to distinguish from the existingutilities
which is primarily for code that helps Parsons contributors enhance Parsons connectors.Once the overall approach is approved, and I have feedback on the specific helper functions, I will write some tests for them as well as incorporate them into the documentation.
Note: I copied two datetime functions from
utilities/datetime.py
which I didn't even know were there before. Eventually with enough notice we should remove theutilities/datetime.py
file.Checklist: