Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: more weather pathfinder improvements #785

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

tiftran
Copy link
Contributor

@tiftran tiftran commented Feb 7, 2025

References

Description

Shrunk down the pathfinder list by adding functions to remove the locality suffix, and normalizing the city as an option to try. I left some in the list like Đồng Nại": "Dong Nai", since when normalized it gives ong Nai

PR Review Checklist

Put an x in the boxes that apply

  • This PR conforms to the Contribution Guidelines
  • The PR title starts with the JIRA issue reference, format example [DISCO-####], and has the same title (if applicable)
  • [load test: (abort|skip|warn)] keywords are applied to the last commit message (if applicable)
  • Documentation has been updated (if applicable)
  • Functional and performance test coverage has been expanded and maintained (if applicable)

@tiftran tiftran requested a review from a team February 7, 2025 18:42

def remove_locality_suffix(city: str) -> str:
"""Remove either city or municipality suffix from a city name"""
return re.sub(r"\s*(city|municipality)$", "", city, flags=re.IGNORECASE).strip()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, sure this is beneficial? I can think of "Mexico City", "Panama City", "New York City", "Salt Lake City", "Oklahoma City", "Quebec City", "Guatemala City", "Ho Chi Minh City", etc.

Copy link
Contributor Author

@tiftran tiftran Feb 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I did test out "New York City", "Quebec City" and they were okay, but you're right the others may or may not work. There are a couple of cities in South America that will benefit. Given that I think it would make sense to have the suffix removed as another city name to try if the original fails.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. If this doesn't regress other locations, we can give it a try.

You can compile this regex (re.compile()) as a constant so it won't be compiled on the fly for each removal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k so i now have it so that we try the original, normalized, then without the city/municipality suffix

@tiftran tiftran force-pushed the more-improvements branch 2 times, most recently from f01b316 to 6c62c21 Compare February 10, 2025 04:05
@tiftran tiftran requested a review from ncloudioj February 10, 2025 04:10

LOCALITY_SUFFIX_PATTERN = re.compile(r"\s*(city|municipality)$", re.IGNORECASE)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you want to match one and more whitespace before "city"?

Suggested change
LOCALITY_SUFFIX_PATTERN = re.compile(r"\s*(city|municipality)$", re.IGNORECASE)
LOCALITY_SUFFIX_PATTERN: re.Pattern = re.compile(r"\s+(city|municipality)$", re.IGNORECASE)


def remove_locality_suffix(city: str) -> str:
"""Remove either city or municipality suffix from a city name"""
return LOCALITY_SUFFIX_PATTERN.sub("", city).strip()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return LOCALITY_SUFFIX_PATTERN.sub("", city).strip()
return LOCALITY_SUFFIX_PATTERN.sub("", city)


if res is not None:
return res, is_skipped
city = geolocation.city
Copy link
Contributor

@ncloudioj ncloudioj Feb 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having to do all the following upfront for every location can be quite wasteful, in particular, for those locations that don't require special handling. Let's make normalization optional and lazy.

# define this at the module level
CITY_NAME_NORMALIZERS: Callable[[Str], Str] = [
    lambda a: a,
    normalize_string,
    remove_locality_suffix,
]

async def explore(...):
    ...
    
    # if city is None, just return
    if geolocation.city is None:
        return None, is_skipped

    city: str = typing.cast(str, geolocation.city)  # tell the type checker it's truly not None 
    # map is lazy, so items of `cities` would only be evaluated one by one if needed 
    cities = map(lambda fn: fn(city), CITY_NAME_NORMALIZERS)
    explored_cities: list[str] = []  # store the explored cities to avoid duplicates

    for region in compass(weather_context.geolocation):
        for city in cities:
            if city in explored_cities:
                continue
            else:
                explored_cities.append(city)
            
             # probe the location as usual     

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh that's fancy

country = location.country
regions = location.regions
city = location.city

if regions and country and city:
corrected_city = CITY_NAME_CORRECTION_MAPPING.get(city, city)
corrected_city = CITY_NAME_CORRECTION_MAPPING.get(city, remove_locality_suffix(city))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since remove_locality_suffix() is already handled in explore(), do we still need it here? Doing the suffix removal in both places seems pretty hard to follow to me. Is there a way for us to not do suffix removing here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yup, you're right

country = location.country
regions = location.regions
city = location.city

if regions and country and city:
corrected_city = CITY_NAME_CORRECTION_MAPPING.get(city, city)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, is this a mis-kill? CITY_NAME_CORRECTION_MAPPING is still in use, eh?

Copy link
Contributor

@ncloudioj ncloudioj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, r+wc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants