giacecco edited this page Nov 13, 2014 · 4 revisions

"common" and Companies House ETL modules

Despite its name, this repository includes the source for both of Open Addresses' ETL modules:

  • common ETL (Extract, Transform, Load) module, which harvests address gazetteer entities (town names, locality names, street names, postcodes...) from a series of highly reliable Open Data sources and turns them into reusable reference tables, and
  • Companies House ETL module, dedicated to harvesting addresses specifically from Companies House's "Free Company Data Product" dataset.

The common ETL reference tables can be reused to normalise addresses from any source. The Companies House ETL depends on them.

This documentation describes the modules and how to run them yourself.
