Postcode status flag (was #3 in companies-house-ETL) #6

Open
giacecco opened this issue Nov 13, 2014 · 0 comments

Comments

@giacecco

@MurrayData on 4/11

I have added a postcode status flag: 1 is current, 0 is retired. Where a retired postcode has a direct replacement, I update it and set the status to 2.

We have two options:

1. I can pass back to the ingester only the addresses with current or updated postcodes, and ignore retired ones as they are probably out of date.
2. I can pass back every address together with its status, as a flag carried either in the address itself or in a provenance field.
What do you think?
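
To make the two options concrete, here is a minimal Python sketch. The status values follow the flag described above; everything else (the `AddressRecord` shape, the function names, the `postcode_status` key) is a hypothetical illustration and not taken from the prototype.

```python
from dataclasses import dataclass
from enum import IntEnum


class PostcodeStatus(IntEnum):
    RETIRED = 0  # postcode retired, no direct replacement applied
    CURRENT = 1  # postcode is current
    UPDATED = 2  # retired postcode swapped for its direct replacement


@dataclass(frozen=True)
class AddressRecord:
    # Hypothetical record shape, for illustration only.
    address: str
    postcode: str
    status: PostcodeStatus


def option_1_drop_retired(records):
    """Option 1: pass back only current or updated postcodes."""
    return [r for r in records if r.status != PostcodeStatus.RETIRED]


def option_2_pass_status(records):
    """Option 2: pass back every record, carrying the status as a field."""
    return [
        {
            "address": r.address,
            "postcode": r.postcode,
            "postcode_status": int(r.status),
        }
        for r in records
    ]
```

Option 1 loses information at the ETL stage, while option 2 leaves the decision about retired postcodes to whatever consumes the status downstream.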

@giacecco's response

I have thought about this. @Floppy and @pezholio may have ideas, too.

In my opinion the validity of an address is not a matter of provenance, but a property of the address itself. And it is not a matter of 1 or 0, on or off; it is a matter of periods in time. What I expect ETLs to tell the ingester is that an address:
a) was valid in a defined period of time (even just one day), or rather
b) became invalid on a certain date (without making assumptions on it being still invalid today).

E.g. when processing Companies House's data, I expect their records to be created at the moment of incorporation and updated on any other event in the life of the company that needs to be recorded with Companies House. This means that an address coming from Companies House tells me that the address was valid at the time of the latest change in that company's record.

If I have several copies of the same data taken at different times - and we will - perhaps I know more. For example, I could see one company's incorporation event and, later in time, the event when one director was added. In this case, I know that the address was valid on those two events' dates.
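
As a rough sketch of what an ETL could hand to the ingester under this model, the Python below records either a validity period (case a) or an invalidation date (case b). The class, field and function names are assumptions made for illustration; nothing here is an agreed schema.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class AddressObservation:
    # Hypothetical shape: either a validity period or an invalidation date.
    address: str
    valid_from: Optional[date] = None    # (a) start of a period of known validity
    valid_to: Optional[date] = None      # (a) end of that period, inclusive
    invalid_from: Optional[date] = None  # (b) date the address became invalid


def observations_from_events(address, event_dates):
    """Turn each recorded event date into a one-day validity observation."""
    return [
        AddressObservation(address, valid_from=d, valid_to=d)
        for d in sorted(set(event_dates))
    ]
```

For the example above, passing the incorporation date and the date the director was added yields two one-day observations, with no claim made about the time in between or about today.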

Inferring the "absolute" validity of an address at the moment of publishing, the "on or off" thing, and estimating confidence in that inference, is a job for the "magic" box in the solution architecture diagram.

Any ideas?

@MurrayData's response

If a postcode ceased to be valid, I can get its expiry date.

This isn’t included at present. The prototype has a current flag, which shows that the postcode was in existence at the time of the ONSPD extract, so we can put a date on it.

For the second part, I would suggest the date of the last accounts or company return, which isn’t included at present but would be a simple mod. That is the date the company last transacted with Companies House, and I would suggest it has greater validity than the incorporation date. A company that is in liquidation or has closed won’t transact in this way, so its date will be older.

So maybe 2 dates: a postcode currency date and a last transacted date which refers to the address within the postcode.
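
A minimal Python sketch of a record carrying these two dates, plus the optional expiry date mentioned above. The field names and the helper are hypothetical, and the check it performs is only one possible use of the dates, not something the prototype does.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class AddressDates:
    # Hypothetical field names, for illustration only.
    postcode_currency_date: date                 # date of the ONSPD extract used
    last_transacted_date: Optional[date] = None  # last accounts or company return
    postcode_expiry_date: Optional[date] = None  # set only for retired postcodes


def transacted_after_postcode_expiry(d: AddressDates) -> bool:
    """True if the company last transacted after its postcode had expired."""
    if d.last_transacted_date is None or d.postcode_expiry_date is None:
        return False
    return d.last_transacted_date > d.postcode_expiry_date
```

Keeping the two dates separate means the postcode's currency and the address's last confirmation can be judged independently downstream.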
