
Fixing the scraping problem from Wikipedia #2

Open · wants to merge 1 commit into master
Conversation

@tanatiem commented Feb 6, 2023

__init__.py : download_thai_address()
When trying to scrape the "List of tambon in Thailand (...)" hrefs from the Wikipedia page, the existing code doesn't seem to work, probably due to a change in the page's HTML structure. The problem occurs immediately on import of the package.

# Doesn't seem to retrieve the desired ul anymore
urls = data.find_all(name='ul')[0]

# Change to this, it can now download and build th_provinces_districts_sub_districts.json successfully
urls = data.select_one('div.mw-parser-output').find('ul')
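A minimal sketch of why the fix works, assuming BeautifulSoup 4. The HTML string below is a hypothetical stand-in for the Wikipedia page's structure (a navigation `<ul>` before the article body), not the real page content:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML mimicking the page layout: a navigation <ul> appears
# before the article body inside div.mw-parser-output.
html = """
<div id="nav"><ul><li>navigation junk</li></ul></div>
<div class="mw-parser-output">
  <ul>
    <li><a href="/wiki/List_of_tambon_in_Thailand_(A)">List of tambon (A)</a></li>
    <li><a href="/wiki/List_of_tambon_in_Thailand_(B)">List of tambon (B)</a></li>
  </ul>
</div>
"""
data = BeautifulSoup(html, "html.parser")

# Old approach: grabs the first <ul> anywhere on the page, which can now
# be a navigation list rather than the list of article links.
first_ul = data.find_all(name="ul")[0]

# Fixed approach: scope the search to the article body first, then take
# its first <ul>.
urls = data.select_one("div.mw-parser-output").find("ul")
hrefs = [a["href"] for a in urls.find_all("a")]
print(hrefs)
```

Scoping to `div.mw-parser-output` makes the scrape robust to extra lists being added outside the article body, which is the likely cause of the breakage.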

@dsin commented Jun 21, 2024

@tanatiem I invited you to collaborate on the forked repository that includes your fix. Now we can help maintain ThaiAddressParserPlus. Please join.

I also published it on PyPI, which means we can now do:
pip install ThaiAddressParserPlus

I also added @HandsomeBrotherShuaiLi.
