How should the temporal dimension (ie. organizational reforms) be handled? #68

Open
augusto-herrmann opened this issue Nov 26, 2015 · 7 comments
Labels
Data model (Changes to schema and how to represent data), Needs work

Comments

@augusto-herrmann
Collaborator

Brazil has recently been through quite a comprehensive organizational reform, with many Ministries being merged, renamed, etc. I suppose this is quite common in other countries as well. So I'm opening this issue to discuss how this history of organizations should be handled (if at all) in this project. Ideas are welcome.

Some possibilities:

  1. Do not store or link organizational history in any way. Let the version control keep previous versions of tables and leave it at that.
  2. Create a reorg_from field containing the id of the public body this was merged from. Reorgs can then be tracked by analysing this field along with the existing founding_date and dissolution_date fields.
  3. Store the dynamic history of public bodies and reorgs in a separate table (using a yet-to-be-defined schema).
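
To make option 2 concrete, here is a minimal Python sketch of how reorgs could be traced through a reorg_from field. The ids, dates, and the `predecessors` helper are invented for illustration, and reorg_from itself is only the proposed field, not part of the current schema.

```python
# Hypothetical rows: ids and dates are invented, and reorg_from is the
# proposed field, not part of the current schema.
bodies = [
    {"id": "xx/planning", "founding_date": "1995-01-01",
     "dissolution_date": "2015-10-02", "reorg_from": ""},
    {"id": "xx/planning-and-budget", "founding_date": "2015-10-02",
     "dissolution_date": "2019-01-01", "reorg_from": "xx/planning"},
    {"id": "xx/economy", "founding_date": "2019-01-01",
     "dissolution_date": "", "reorg_from": "xx/planning-and-budget"},
]


def predecessors(body_id, rows):
    """Follow reorg_from links backwards, nearest predecessor first."""
    by_id = {row["id"]: row for row in rows}
    chain = []
    seen = {body_id}  # guards against accidental cycles in the data
    current = by_id[body_id]["reorg_from"]
    while current and current in by_id and current not in seen:
        chain.append(current)
        seen.add(current)
        current = by_id[current]["reorg_from"]
    return chain


print(predecessors("xx/economy", bodies))
```

Note that a single id per row covers renames and one-to-one successions only. A merge of several bodies into one would need a list of ids in that field or the separate table from option 3.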
@augusto-herrmann added the "Data model" label (Changes to schema and how to represent data) on Jan 19, 2016
@augusto-herrmann
Collaborator Author

A use case related to this appears in the first draft of the Core Public Organization Vocabulary (CPOV):

3.7. Keep track of the evolution of public organisations
The structure and responsibilities of public organisations are prone to change, e.g.
following elections. A core vocabulary describing public organisations, allows to track
these changes over time.
The CPOV allows stakeholders to track the frequent changes in structure and responsibilities of public organisations.

I strongly suggest that people interested in the publicbodies.org project participate in the open Working Group that is developing the CPOV.

@rufuspollock
Member

@augusto-herrmann my instinct is the simplest way to handle this is to add three columns to the tables:

  • status: with values like active and obsolete
  • A timespan for the validity/existence of a body
    • from:
    • to: (this would be blank if body is still extant)

There's also more to say about what replaces a body, but I think this gets really complicated quickly and could go in the notes field for now IMO.

I think this is the lightest way to get moving for now but maybe others have good ideas too! E.g. @jpmckinney may have good ideas here!

@jpmckinney

@rufuspollock I think from and to are handled by the existing fields founding_date and dissolution_date (taken from Schema.org), unless I'm misunderstanding something.

I'm not fond of status fields (though I know they are common). I prefer a data model that is 'append-only' (it makes it easier to track changes over time) and where the status is derivable from other facts. So, a status of 'active' is derived from the fact that an organization has no dissolution_date. (There are possibilities like "We don't know whether the organization is active or not" and "We know the organization is dissolved but we don't know when that change occurred" which are harder to model, but I don't know that we need to address those now.)
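
As a rough illustration of deriving the status rather than storing it, here is a small Python sketch. The `derived_status` helper is hypothetical, and it assumes dissolution_date is either empty or an ISO 8601 date (YYYY-MM-DD). It does not try to model the two "unknown" cases mentioned above.

```python
from datetime import date


def derived_status(row, today=None):
    """Derive a status from dissolution_date instead of storing one.

    Assumes dissolution_date is empty or an ISO 8601 date (YYYY-MM-DD).
    """
    today = today or date.today()
    dissolved = (row.get("dissolution_date") or "").strip()
    if not dissolved:
        # No recorded dissolution: treat the body as active.
        return "active"
    # A dissolution scheduled for the future still counts as active.
    return "dissolved" if date.fromisoformat(dissolved) <= today else "active"


print(derived_status({"dissolution_date": ""}))
print(derived_status({"dissolution_date": "2015-10-02"}, today=date(2020, 1, 1)))
```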

Organizational transformation is indeed complicated, and may be better suited to an unstructured notes field. If structured data is necessary, the kinds of change to handle include renaming, merging, splitting, and moving within a hierarchy.

@rufuspollock
Member

@jpmckinney sounds good.

@augusto-herrmann so I think we should go for founding_date and dissolution_date and leave status out.

We can put some unstructured or semistructured info in the notes field about how the change relates to other orgs. I don't think we are in a position to standardize this yet, so I suggest you make something up for Brazil; once we have trialled that a bit, we can revisit whether we can standardize it.

@augusto-herrmann changed the title from "How the temporal dimension (ie. organizational reforms) be handled?" to "How should the temporal dimension (ie. organizational reforms) be handled?" on Sep 24, 2020
@augusto-herrmann
Collaborator Author

This issue is related to #13, and the two should be handled in tandem when tackled.

@augusto-herrmann
Collaborator Author

@jpmckinney @rufuspollock the problem is that data sources often do not provide a founding_date and almost never a dissolution_date.

Right now, with the data sources that have already been automated, we just delete the missing entries when bringing in new data, i.e. the whole list gets replaced by the new version. Since we're tracking the csv in git, this is not much of a problem unless we want to display extinct public bodies in the UI. If people want to get to older data, just check older versions of the csv in git.

This seems to me like the least costly, maintainable solution at the moment.

@jpmckinney

I'm happy with whatever compromise the project chooses. In terms of options:

If rows are deleted, then a dataset that uses publicbodies IDs can no longer reconcile any ID whose body has been deleted from publicbodies. Furthermore, users of that dataset can't tell whether (1) the dataset's ID is incorrect or (2) the ID refers to a body that has since been deleted.

Instead of deleting rows, this issue had suggested additional columns (status, dates, etc.). Dates are a no-go if they are not easily available in practice (and putting sentinel values in these columns would be no different from using a status field). That said, if the IDs are not persistent across time, then status might not work either (e.g. if the ministry of "Environment" becomes "Environment and Climate Change", and the script mints a new ID, even though the ministry has only changed in name and not in identity); in this case, you'd have two rows for the same body with different statuses.

Another solution that has not been discussed is to instead move/merge the old table for country XYZ into a "history" table for country XYZ. The semantics of the history table are that it contains all bodies that had been collected in the past and that could not be reconciled with the most recent collection. This way, users can still easily look up any ID that had existed in the past, without attaching any specific meaning to the "history" table – entries might be there because: a body no longer exists, reconciliation failed (like in the Environment example above), the source accidentally omitted the body, etc.
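
A minimal Python sketch of that "history" table idea, under my own assumptions: rows are dicts with an `id` key, and the `refresh` and `lookup` helpers and the sample ids are hypothetical. It also assumes that a body which reappears in a later collection leaves the history table, which is one reading of the semantics above, not something the project has decided.

```python
def refresh(current, history, incoming):
    """Replace the current table with the incoming collection, moving
    rows that are missing from the incoming data into the history table."""
    incoming_ids = {row["id"] for row in incoming}
    history_by_id = {row["id"]: row for row in history}
    for row in current:
        if row["id"] not in incoming_ids:
            history_by_id[row["id"]] = row
    # Assumption: an id that shows up again is no longer "unreconciled".
    for body_id in incoming_ids:
        history_by_id.pop(body_id, None)
    return list(incoming), list(history_by_id.values())


def lookup(body_id, current, history):
    """Return (table_name, row) for an id, or None if it never existed."""
    for name, table in (("current", current), ("history", history)):
        for row in table:
            if row["id"] == body_id:
                return name, row
    return None


# Invented ids mirroring the Environment example above.
current = [{"id": "xx/environment"}, {"id": "xx/finance"}]
incoming = [{"id": "xx/environment-and-climate-change"}, {"id": "xx/finance"}]
new_current, new_history = refresh(current, [], incoming)
print(lookup("xx/environment", new_current, new_history))
```

With this, a dataset that still refers to the old id can at least tell "this id existed at some point" apart from "this id never existed", without the history table claiming why the row was dropped.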


3 participants