implement new import scripts to update Brazilian data #72

Closed
augusto-herrmann opened this issue Jan 19, 2016 · 7 comments · Fixed by #105
Assignees
Labels
Data Data sources and ingestion automation

Comments

@augusto-herrmann
Collaborator

The Brazilian dataset is quite outdated, especially since a recent organizational reform changed a lot of ministries.

The source dataset has changed a lot too. It used to be a big XML dump (this is what the import scripts used to read), but now it's a RESTful API using JSON. The data model and the fields available have also changed, so the scripts need to be completely redone.
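
To make the scope of that rewrite concrete, here is a minimal sketch of the parsing step a JSON-based import script would need. The payload shape and the field names (`unidades`, `codigo`, `nome`, `sigla`) are assumptions made up for illustration, not the actual fields of the Brazilian API or of the publicbodies.org schema. The HTTP request is left out so the example runs offline.

```python
from __future__ import annotations

import csv
import io
import json

# Hypothetical sample of a JSON API response (field names are assumptions).
SAMPLE_RESPONSE = """
{
  "unidades": [
    {"codigo": "1", "nome": "Ministério da Fazenda", "sigla": "MF"},
    {"codigo": "2", "nome": "Ministério da Saúde", "sigla": "MS"}
  ]
}
"""


def parse_bodies(payload: str) -> list[dict]:
    """Turn the (assumed) JSON payload into flat records."""
    data = json.loads(payload)
    return [
        {
            "source_id": unit["codigo"],
            "name": unit["nome"],
            "abbreviation": unit.get("sigla", ""),
        }
        for unit in data.get("unidades", [])
    ]


def to_csv(records: list[dict]) -> str:
    """Serialize the flat records as CSV text with a header row."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=["source_id", "name", "abbreviation"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


records = parse_bodies(SAMPLE_RESPONSE)
csv_text = to_csv(records)
print(csv_text)
```

Keeping the parsing separate from the fetching would let the field mapping be adjusted in one place if the API's data model changes again.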

@augusto-herrmann added the Data (Data sources and ingestion automation) label Jan 19, 2016
@todrobbins
Contributor

@augusto-herrmann are you interested in taking this or should I move forward with this script?

@augusto-herrmann
Collaborator Author

Sure, but before I work on this, @todrobbins, I'd like to confirm whether you have already done any work on it, considering a couple of months have passed since then.

@augusto-herrmann self-assigned this Jul 31, 2017
@rufuspollock
Member

@augusto-herrmann I don't believe @todrobbins has done any specific work here, so please go ahead!

@todrobbins
Contributor

@rufuspollock is correct: I have not done any work in this area. Carry on!

@augusto-herrmann
Collaborator Author

I am having difficulty with this, as I am unsure how to handle the changes in organizational structure and how to map identifiers between versions of that structure. I think we need to solve issue #68 before we can handle this properly.

Alternatively, I could just naïvely run the scripts to obtain the latest version, without trying to keep the ids consistent between the version in use on publicbodies.org (which is quite old) and the current organizational structure of the Brazilian government. Of course, anything that currently depends on those ids would most likely be broken by this approach.
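
One way to gauge how much the naïve approach would break is to compare the ids in the old and new files before replacing anything. The following is only a sketch: it assumes both versions are CSV files with an `id` column, and the sample rows and ids are invented for illustration.

```python
import csv
import io

# Made-up sample data standing in for the old and new versions of the file.
OLD_CSV = "id,name\nbr/ministerio-a,Ministerio A\nbr/ministerio-b,Ministerio B\n"
NEW_CSV = "id,name\nbr/ministerio-a,Ministerio A\nbr/ministerio-c,Ministerio C\n"


def read_ids(csv_text):
    """Collect the set of values in the 'id' column."""
    return {row["id"] for row in csv.DictReader(io.StringIO(csv_text))}


def compare_ids(old_text, new_text):
    """Report which ids survive, disappear, or are new."""
    old_ids, new_ids = read_ids(old_text), read_ids(new_text)
    return {
        "kept": sorted(old_ids & new_ids),
        "removed": sorted(old_ids - new_ids),
        "added": sorted(new_ids - old_ids),
    }


report = compare_ids(OLD_CSV, NEW_CSV)
for status, ids in report.items():
    print(status, ids)
```

Every id listed under `removed` is one that a downstream consumer could no longer resolve after the replacement.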

@rufuspollock
Member

rufuspollock commented Sep 11, 2017

@augusto-herrmann I've commented in #68. We could try out the solution of #68 here and see how it goes ...

@augusto-herrmann
Collaborator Author

> @augusto-herrmann I've commented in #68. We could try out the solution of #68 here and see how it goes ...

I did not handle or track over time the changes to the structure of public bodies in this PR. There have been so many changes over so many years that it would take an "epic" effort to do that. I don't even think the structure at a given point in time is available as open data, so it would be difficult, if not outright impossible, to obtain.

For now, we just replace the file with the current structure of public bodies in Brazil. Even the ids are not necessarily consistent with the old data, as a different algorithm is now used to generate the slugs (we now use the python-slugify package, whereas before it was custom slug-making code; I don't think this library even existed at the time).
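
To illustrate why a change of slug algorithm alone is enough to change ids, here is a sketch with two hypothetical slug functions built on the standard library. Neither one is the project's old custom code, and neither is python-slugify itself; they only show how two plausible choices (dropping accented characters versus transliterating them) diverge on the same name.

```python
import re
import unicodedata


def slug_drop_accents(name):
    """Hypothetical variant: non-ASCII characters are simply dropped."""
    ascii_only = name.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")


def slug_transliterate(name):
    """Hypothetical variant: accents are stripped via NFKD decomposition."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_only.lower()).strip("-")


# "Ministério da Educação", written with escapes to avoid encoding ambiguity.
name = "Minist\u00e9rio da Educa\u00e7\u00e3o"
old_slug = slug_drop_accents(name)
new_slug = slug_transliterate(name)

print(old_slug)  # ministrio-da-educao
print(new_slug)  # ministerio-da-educacao
```

Since the id is derived from the slug, any difference like this yields a different id for the same public body.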
