
Fixing the scraping problem from Wikipedia #2

Open · wants to merge 1 commit into master
Conversation

@tanatiem commented Feb 6, 2023

__init__.py : download_thai_address()
When trying to scrape the "List of tambon in Thailand (...)" hrefs from the Wikipedia page, the existing code doesn't seem to work, probably due to a change in the page's HTML structure. The problem occurs immediately on import of the package.

# Doesn't seem to retrieve the desired ul anymore
urls = data.find_all(name='ul')[0]

# Change to this, it can now download and build th_provinces_districts_sub_districts.json successfully
urls = data.select_one('div.mw-parser-output').find('ul')
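A minimal sketch of why the fix works, assuming BeautifulSoup 4. The HTML string below is a hypothetical stand-in for the Wikipedia page's structure (a navigation `<ul>` before the article body), not the real page content:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML mimicking the page layout: a navigation <ul> appears
# before the article body inside div.mw-parser-output.
html = """
<div id="nav"><ul><li>navigation junk</li></ul></div>
<div class="mw-parser-output">
  <ul>
    <li><a href="/wiki/List_of_tambon_in_Thailand_(A)">List of tambon (A)</a></li>
    <li><a href="/wiki/List_of_tambon_in_Thailand_(B)">List of tambon (B)</a></li>
  </ul>
</div>
"""
data = BeautifulSoup(html, "html.parser")

# Old approach: grabs the first <ul> anywhere on the page, which can now
# be a navigation list rather than the list of article links.
first_ul = data.find_all(name="ul")[0]

# Fixed approach: scope the search to the article body first, then take
# its first <ul>.
urls = data.select_one("div.mw-parser-output").find("ul")
hrefs = [a["href"] for a in urls.find_all("a")]
print(hrefs)
```

Scoping to `div.mw-parser-output` makes the scrape robust to extra lists being added outside the article body, which is the likely cause of the breakage.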

@dsin commented Jun 21, 2024

@tanatiem I invited you to collaborate on the forked repository that includes your fix. Now we can help maintain ThaiAddressParserPlus. Please join.

I also published it on PyPI, which means we can now do:
pip install ThaiAddressParserPlus

I also added @HandsomeBrotherShuaiLi.
