
New helper module with some reusable functions #724

Closed

shaunagm wants to merge 2 commits into main from add-helpers

Collaborator

shaunagm commented Jul 28, 2022 (edited)

I would like for Parsons to include more reusable code which helps users with transformations and analysis. I have placed some examples in a new etl_helpers folder.

I am looking for feedback on:

the overall goal of providing more helper code for users
the architecture here: is etl_helpers a decent name? I wanted to distinguish it from the existing utilities module, which is primarily for code that helps Parsons contributors enhance Parsons connectors.
the specific helper functions I added

Once the overall approach is approved, and I have feedback on the specific helper functions, I will write some tests for them as well as incorporate them into the documentation.

Note: I copied two datetime functions from utilities/datetime.py which I didn't even know were there before. Eventually with enough notice we should remove the utilities/datetime.py file.


shaunagm changed the title ~~First stab at helper module with some reusable functions~~ New helper module with some reusable functions


          First stab at helper module with some reusable functions

7155c30

shaunagm force-pushed the add-helpers branch from 8f92cc2 to 7155c30 Compare

July 28, 2022 20:19

alxmrs reviewed


Contributor

alxmrs left a comment

Questions + feedback.

parsons/etl_helpers/conversions.py

		# Datetime conversion functions


		def date_to_timestamp(value, tzinfo=datetime.timezone.utc):

Contributor

alxmrs Sep 17, 2022

Please add type annotations. For example:

import typing as t
def date_to_timestamp(
    value: t.Union[int, str, datetime.datetime],
    tzinfo: datetime.tzinfo = datetime.timezone.utc) -> t.Optional[int]:
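As a rough illustration of what the annotated signature could look like in a complete function, here is a minimal sketch. The body of parse_date below is a hypothetical stand-in written for this example (the PR's actual implementation is not shown in this thread); only the names date_to_timestamp, parse_date, value and tzinfo come from the diff.

```python
import datetime
import typing as t

DateLike = t.Union[int, str, datetime.datetime]


def parse_date(
    value: t.Optional[DateLike],
    tzinfo: datetime.tzinfo = datetime.timezone.utc,
) -> t.Optional[datetime.datetime]:
    """Hypothetical stand-in for the PR's parse_date (assumption, not the real code)."""
    if value is None or value == "":
        return None
    if isinstance(value, datetime.datetime):
        parsed = value
    elif isinstance(value, int):
        # Treat integers as unix timestamps.
        parsed = datetime.datetime.fromtimestamp(value, tz=tzinfo)
    else:
        try:
            parsed = datetime.datetime.fromisoformat(value)
        except ValueError:
            return None
    if parsed.tzinfo is None:
        # Attach the default timezone to naive datetimes.
        parsed = parsed.replace(tzinfo=tzinfo)
    return parsed


def date_to_timestamp(
    value: t.Optional[DateLike],
    tzinfo: datetime.tzinfo = datetime.timezone.utc,
) -> t.Optional[int]:
    """Return a unix timestamp, or None when the value cannot be parsed."""
    parsed_date = parse_date(value, tzinfo)
    if parsed_date is None:
        return None
    return int(parsed_date.timestamp())
```

With annotations like these, a type checker can flag callers that forget to handle the None case.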

parsons/etl_helpers/conversions.py

+                  parsed_date = parse_date(value)
+                  if not parsed_date:
+                      return None

Contributor

alxmrs Sep 17, 2022

Either the function is misdocumented, or this is incorrect. What if you used -1 as a sentinel value instead of None?
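One consideration when weighing the two options: -1 is itself a valid unix timestamp (one second before the epoch), so a sentinel of -1 cannot be told apart from that real value, whereas None can. The sketch below only illustrates that trade-off; the function names are hypothetical and not part of the PR.

```python
import datetime
import typing as t

UTC = datetime.timezone.utc


def to_timestamp_optional(parsed: t.Optional[datetime.datetime]) -> t.Optional[int]:
    # Missing input is reported as None.
    return None if parsed is None else int(parsed.timestamp())


def to_timestamp_sentinel(parsed: t.Optional[datetime.datetime]) -> int:
    # Missing input is reported as -1.
    return -1 if parsed is None else int(parsed.timestamp())


# One second before the unix epoch.
just_before_epoch = datetime.datetime(1969, 12, 31, 23, 59, 59, tzinfo=UTC)

# The sentinel version gives the same answer for "no date" and for this date.
ambiguous = to_timestamp_sentinel(None) == to_timestamp_sentinel(just_before_epoch)
```

Whether that ambiguity matters depends on whether pre-1970 dates can appear in the data being transformed.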

parsons/etl_helpers/conversions.py

		return int(parsed_date.timestamp())


		def parse_date(value, tzinfo=datetime.timezone.utc):

Contributor

alxmrs Sep 17, 2022

Add type annotations.

parsons/etl_helpers/conversions.py

		return parsed


		def timestamp_to_readable(value, tzinfo=datetime.timezone.utc, format_as='%Y-%m-%d %H:%M:%S'):

Contributor

alxmrs Sep 17, 2022

Why not use the ISO format as the default format?

Contributor

alxmrs Sep 17, 2022

i.e. the built-in one?

Collaborator Author

shaunagm Sep 22, 2022 (edited)

ISO is a little less readable, though certainly more readable than unix timestamps! But it might be worth the tradeoff to use a more standardized format.
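For comparison, here is a small sketch of the two formats under discussion, using only the standard library. The example moment is arbitrary and chosen for illustration.

```python
import datetime

# An arbitrary example moment, round-tripped through a unix timestamp.
moment = datetime.datetime(2022, 7, 28, 20, 19, 0, tzinfo=datetime.timezone.utc)
timestamp = int(moment.timestamp())
restored = datetime.datetime.fromtimestamp(timestamp, tz=datetime.timezone.utc)

# The default format string from the diff.
custom = restored.strftime('%Y-%m-%d %H:%M:%S')

# The built-in ISO 8601 rendering, which also carries the UTC offset.
iso = restored.isoformat()

# isoformat accepts a separator, which recovers some readability.
iso_spaced = restored.isoformat(sep=' ')

print(custom)
print(iso)
print(iso_spaced)
```

The main visible difference is the "T" separator and the explicit offset in the ISO output; the custom format drops the timezone information entirely.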

parsons/etl_helpers/conversions.py

+              # Flatten contacts
+              def get_primary_contact_from_nested(contact_list, get_first=True, selector=None):
+                  """Extracts single contact value from list of dictionaries.

Contributor

alxmrs Sep 17, 2022

I see a shift in docstring styles between this style (which I prefer) and the description on a new line, e.g.:

"""
Extract single contact ...

Which is the correct style?

Collaborator Author

shaunagm Sep 22, 2022

I don't think we have a preference. I was going to say "let's go with what Python prefers" but it looks like Python itself doesn't have a preference. I know it's a little inconsistent but I don't think it's a big deal to switch back and forth within parsons, and that's one less thing that PR contributors have to worry about.

parsons/etl_helpers/conversions.py

Comment on lines +101 to +102

		This helper method helps "flatten" the dictionary by returning the primary number (or,
		if no primary number is found and get_first is True, the first number found).

Contributor

alxmrs Sep 17, 2022

Idea: what if we made the keys of the dictionary a tuple with all the numbers? e.g.

result = {
   ('5554444444',): {'foo': 'bar'},
   ('4441234567', '7771234567'): {'bin': 'ban'},
   # ...
}

I bring this up, because there would be no exceptional cases here. If users wanted to get the first number, they would simply access the tuple and get the first element.
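A minimal sketch of how that tuple-keyed structure might be consumed, reusing the placeholder data from the comment. Note that in Python a one-element tuple needs a trailing comma, since parentheses alone around a string leave it a plain string.

```python
# Keys are tuples of every number for a contact; values are the placeholder
# payloads from the review comment.
result = {
    ('5554444444',): {'foo': 'bar'},
    ('4441234567', '7771234567'): {'bin': 'ban'},
}

# Getting the "first" number is plain tuple indexing, with no special cases.
first_numbers = [numbers[0] for numbers in result]

# Without the trailing comma the key would be a string, not a tuple.
not_a_tuple = ('5554444444')

print(first_numbers)
print(type(not_a_tuple).__name__)
```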

parsons/etl_helpers/conversions.py

+                      if not selector:  # if selector still not found, look in dict
+                          dict_keys = list(contact_list[0].keys())
+                          dict_keys.pop("primary")
+                          selector = dict_keys[0]  # NOTE: this will break on dicts that have additional keys

Contributor

alxmrs Sep 17, 2022

This needs fixing / addressing.

Contributor

alxmrs Sep 17, 2022

e.g. it could be addressed with a try/except block, or by raising an error with a clear message.
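One way the "raise an error" option could look is sketched below. The helper name infer_selector is hypothetical; only contact_list, selector and the "primary" key come from the diff. As quoted, the diff calls pop("primary") on a list, and list.pop expects an integer index, so the sketch filters the keys instead.

```python
def infer_selector(contact_list):
    """Guess the value key in a list of contact dicts, or fail loudly.

    Hypothetical helper: assumes each dict has a "primary" flag plus
    exactly one other key holding the contact value.
    """
    if not contact_list:
        raise ValueError("contact_list is empty; cannot infer a selector")

    candidates = [key for key in contact_list[0] if key != "primary"]
    if len(candidates) != 1:
        raise ValueError(
            "cannot infer selector from keys "
            f"{sorted(candidates)}; pass selector explicitly"
        )
    return candidates[0]


selector = infer_selector([{"primary": True, "phone": "5554444444"}])
```

Raising here turns the case flagged in the NOTE comment (dicts with additional keys) into an explicit, actionable error instead of a silent wrong guess.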

parsons/etl_helpers/conversions.py


		# Flatten contacts

		def get_primary_contact_from_nested(contact_list, get_first=True, selector=None):

Contributor

alxmrs Sep 17, 2022

How is this function used? What problem does it solve?

parsons/etl_helpers/conversions.py

Comment on lines +176 to +177

		parsed_value = re.compile(r'\d+(?:\.\d+)?').findall(value) # extracts digits only
		value = "".join(parsed_value)

Contributor

alxmrs Sep 17, 2022

I find this a bit more readable / easier to maintain (regexes, in my view, should be avoided at all costs):

Suggested change

-		parsed_value = re.compile(r'\d+(?:\.\d+)?').findall(value)  # extracts digits only
-		value = "".join(parsed_value)
+		value = "".join([v for v in value if v.isdigit()])
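The two versions are not strictly equivalent, which may be worth checking before swapping one for the other. The sketch below wraps each in a hypothetical function name and runs them on made-up inputs: the regex keeps a dot when it sits between two digit runs, while the isdigit version drops every non-digit character.

```python
import re


def digits_via_regex(value: str) -> str:
    # Original approach from the diff.
    parsed_value = re.compile(r'\d+(?:\.\d+)?').findall(value)
    return "".join(parsed_value)


def digits_via_isdigit(value: str) -> str:
    # Suggested replacement from the review.
    return "".join([v for v in value if v.isdigit()])


# Identical results when the separators are not dots.
same_a = digits_via_regex("(555) 444-4444")
same_b = digits_via_isdigit("(555) 444-4444")

# Different results for dot-separated numbers.
dotted_regex = digits_via_regex("555.444.4444")
dotted_isdigit = digits_via_isdigit("555.444.4444")

print(same_a, same_b)
print(dotted_regex, dotted_isdigit)
```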

Contributor

alxmrs commented Sep 17, 2022

the overall goal of providing more helper code for users

What problems are you trying to solve?

Collaborator Author

shaunagm commented Sep 17, 2022 via email

Hi Alex, thanks for these reviews, lots of good catches here. Before you do any additional PR reviews, let's touch base about the project, our goals, community norms, etc. I'll email you tomorrow to set up a time.



          Merge branch 'main' into add-helpers

b1cf748

shaunagm mentioned this pull request

Add more helper methods and utilities to Parsons (and create standardized process for it) #554

Closed

Collaborator Author

shaunagm commented Apr 10, 2023

Adding helper utilities to Parsons

shaunagm mentioned this pull request

chore(Geocoder): Add more informative error handling to geocode #729 #810

Merged

shaunagm mentioned this pull request

Restructure utilities and improve documentation of them #836

Open

Collaborator Author

shaunagm commented Jun 6, 2023

Closing this but linking it from #836 so that the code/discussion here isn't lost.

shaunagm closed this
