-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: more weather pathfinder improvements #785
base: main
Are you sure you want to change the base?
Conversation
|
||
def remove_locality_suffix(city: str) -> str: | ||
"""Remove either city or municipality suffix from a city name""" | ||
return re.sub(r"\s*(city|municipality)$", "", city, flags=re.IGNORECASE).strip() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, sure this is beneficial? I can think of "Mexico City", "Panama City", "New York City", "Salt Lake City", "Oklahoma City", "Quebec City", "Guatemala City", "Ho Chi Minh City", etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I did test out "New York City", "Quebec City" and they were okay, but you're right the others may or may not work. There are a couple of cities in South America that will benefit. Given that I think it would make sense to have the suffix removed as another city name to try if the original fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. If this doesn't regress other locations, we can give it a try.
You can compile this regex (re.compile()
) as a constant so it won't be compiled on the fly for each removal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
k so i now have it so that we try the original, normalized, then without the city/municipality suffix
f01b316
to
6c62c21
Compare
|
||
LOCALITY_SUFFIX_PATTERN = re.compile(r"\s*(city|municipality)$", re.IGNORECASE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you want to match one and more whitespace before "city"?
LOCALITY_SUFFIX_PATTERN = re.compile(r"\s*(city|municipality)$", re.IGNORECASE) | |
LOCALITY_SUFFIX_PATTERN: re.Pattern = re.compile(r"\s+(city|municipality)$", re.IGNORECASE) |
|
||
def remove_locality_suffix(city: str) -> str: | ||
"""Remove either city or municipality suffix from a city name""" | ||
return LOCALITY_SUFFIX_PATTERN.sub("", city).strip() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return LOCALITY_SUFFIX_PATTERN.sub("", city).strip() | |
return LOCALITY_SUFFIX_PATTERN.sub("", city) |
|
||
if res is not None: | ||
return res, is_skipped | ||
city = geolocation.city |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having to do all the following upfront for every location can be quite wasteful, in particular, for those locations that don't require special handling. Let's make normalization optional and lazy.
# define this at the module level
CITY_NAME_NORMALIZERS: Callable[[Str], Str] = [
lambda a: a,
normalize_string,
remove_locality_suffix,
]
async def explore(...):
...
# if city is None, just return
if geolocation.city is None:
return None, is_skipped
city: str = typing.cast(str, geolocation.city) # tell the type checker it's truly not None
# map is lazy, so items of `cities` would only be evaluated one by one if needed
cities = map(lambda fn: fn(city), CITY_NAME_NORMALIZERS)
explored_cities: list[str] = [] # store the explored cities to avoid duplicates
for region in compass(weather_context.geolocation):
for city in cities:
if city in explored_cities:
continue
else:
explored_cities.append(city)
# probe the location as usual
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh that's fancy
country = location.country | ||
regions = location.regions | ||
city = location.city | ||
|
||
if regions and country and city: | ||
corrected_city = CITY_NAME_CORRECTION_MAPPING.get(city, city) | ||
corrected_city = CITY_NAME_CORRECTION_MAPPING.get(city, remove_locality_suffix(city)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since remove_locality_suffix()
is already handled in explore()
, do we still need it here? Doing the suffix removal in both places seems pretty hard to follow to me. Is there a way for us to not do suffix removing here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yup, you're right
country = location.country | ||
regions = location.regions | ||
city = location.city | ||
|
||
if regions and country and city: | ||
corrected_city = CITY_NAME_CORRECTION_MAPPING.get(city, city) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, is this a mis-kill? CITY_NAME_CORRECTION_MAPPING
is still in use, eh?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, r+wc.
References
Description
Shrunk down the pathfinder list by adding functions to remove the locality suffix, and normalizing the city as an option to try. I left some in the list like
Đồng Nại": "Dong Nai",
since when normalized it givesong Nai
PR Review Checklist
Put an
x
in the boxes that apply[DISCO-####]
, and has the same title (if applicable)[load test: (abort|skip|warn)]
keywords are applied to the last commit message (if applicable)