
Wikipedia web crawler to extract ISO-3166 countries and their subdivisions with ISO codes


leikoilja/ISO-3166-Countries-with-Subdivisions-Wiki-Scraper


ISO-3166 Countries with Subdivisions (Regions)

Overview

A simple web crawler built with Scrapy. It crawls Wikipedia's list of countries and extracts each country's name along with its ISO-3166-1 alpha-2 and alpha-3 codes. It then follows the link to each country and extracts its subdivisions (regions) and their corresponding ISO-3166-2 codes.
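The three code formats mentioned above have fixed shapes, which makes scraped values easy to sanity-check. The following is a minimal sketch, not part of the crawler itself; the function names are made up for illustration. It only checks the format of a code (alpha-2 is two uppercase letters, alpha-3 is three, and ISO-3166-2 is the alpha-2 code, a hyphen, and one to three alphanumeric characters), not whether the code is actually assigned.

```python
import re

# ISO-3166-1 alpha-2: exactly two uppercase letters, e.g. "US"
ALPHA2_RE = re.compile(r"[A-Z]{2}")
# ISO-3166-1 alpha-3: exactly three uppercase letters, e.g. "USA"
ALPHA3_RE = re.compile(r"[A-Z]{3}")
# ISO-3166-2: alpha-2 country code, a hyphen, then 1-3 alphanumeric characters
SUBDIVISION_RE = re.compile(r"([A-Z]{2})-[A-Z0-9]{1,3}")


def is_alpha2(code):
    """Return True if `code` has the shape of an alpha-2 country code."""
    return ALPHA2_RE.fullmatch(code) is not None


def is_alpha3(code):
    """Return True if `code` has the shape of an alpha-3 country code."""
    return ALPHA3_RE.fullmatch(code) is not None


def is_subdivision_code(code, country_alpha2=None):
    """Return True if `code` has the shape of an ISO-3166-2 code.

    If `country_alpha2` is given, the prefix before the hyphen must match it.
    """
    match = SUBDIVISION_RE.fullmatch(code)
    if match is None:
        return False
    return country_alpha2 is None or match.group(1) == country_alpha2
```

For example, `is_subdivision_code("US-CA", "US")` passes, while `is_subdivision_code("US-CA", "CA")` does not because the prefix belongs to a different country.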

All of this is exported to a JSON file.
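The exact layout of the exported file is not reproduced in this text, so the following is only a hypothetical sketch of what one country record could look like. The key names (`name`, `alpha2`, `alpha3`, `subdivisions`, `code`) are assumptions for illustration and may differ from what the crawler actually writes; check the real output before relying on them.

```python
import json

# Hypothetical record shape; the real output may use different keys.
record = {
    "name": "United States",
    "alpha2": "US",    # ISO-3166-1 alpha-2
    "alpha3": "USA",   # ISO-3166-1 alpha-3
    "subdivisions": [
        {"name": "California", "code": "US-CA"},  # ISO-3166-2
        {"name": "New York", "code": "US-NY"},
    ],
}


def subdivision_codes(country):
    """Collect the ISO-3166-2 codes from one (hypothetical) country record."""
    return [sub["code"] for sub in country["subdivisions"]]


# Pretty-print the record the way it might appear in a JSON file.
print(json.dumps(record, indent=2))
```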

Requirements

  • Python 3.5+
  • Scrapy

Install

Running

  • From the repo directory, run: scrapy crawl codes

Note that the crawler does not overwrite the output file country_codes.json; it appends to it. You may therefore want to back up an existing output file by renaming it before running the crawler again.
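One way to handle the renaming is a small helper that moves any existing output aside under a timestamped name. This is a sketch using only the standard library; the function name `backup_output` and the timestamp format are arbitrary choices, not something the repository provides.

```python
import time
from pathlib import Path


def backup_output(path="country_codes.json"):
    """Rename an existing output file so the next crawl starts with a fresh one.

    Returns the path of the backup, or None if there was nothing to back up.
    """
    source = Path(path)
    if not source.exists():
        return None
    # e.g. country_codes.json -> country_codes.20240131-154500.json
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = source.with_name("{}.{}{}".format(source.stem, stamp, source.suffix))
    source.rename(target)
    return target


if __name__ == "__main__":
    backup = backup_output()
    print("Backed up to {}".format(backup) if backup else "No existing output file")
```

Run it from the repo directory before `scrapy crawl codes`. Two backups taken within the same second would produce the same name, so leave at least a second between runs.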
