Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HR DB check needs a configurable grace period #805

Closed
maarten-litmaath opened this issue Jul 7, 2024 · 14 comments · Fixed by #844
Closed

HR DB check needs a configurable grace period #805

maarten-litmaath opened this issue Jul 7, 2024 · 14 comments · Fixed by #844

Comments

@maarten-litmaath
Copy link

It has been observed that even when a user's new contract starts the next day after the previous contract ended, and the LHC experiment affiliation has remained the same, the HR DB check can decide to put the user account into the PENDING_REMOVAL state, from which it currently cannot recover automatically (see #768). To avoid glitches due to the exact meaning and implementation of the relevant HR DB API time stamps, which might even change, possibly accidentally, the HR DB check needs to be made more robust: a configurable grace period of at least 1 day should be applied before concluding a user is no longer affiliated with a given experiment.

@rmiccoli
Copy link
Contributor

rmiccoli commented Jul 8, 2024

I think we should consider a few things, first of all the end time.
If the user's new affiliation starts the day after the previous one expires and the end time expires with the old affiliation, then the IAM internal lifecycle job adds the label PENDING_SUSPENSION to the account, but the user remains active for a certain grace period. If we extend the CERN HR DB job, this doesn't change.

I'd like to understand if the end time always coincides with the expiry of the CERN affiliation. Is it always set? and how is it set?

In this specific case, my guess is that the CERN job runs when the new affiliation has already started, otherwise the user would be disabled and then restored at the next run.

Also, does the membership expire at 23:59 or 00:00 and does the new one, the day after, start at exactly what time?

@deesto
Copy link

deesto commented Jul 8, 2024

Someone with access to the CERN HR DB should confirm, but from what we have been shown without direct access, records seem to have a start and end date, with no time specified, e.g.:

"experiment": "ATLAS",
"startDate": "2024-07-01",
"endDate": "2026-06-30"`

If that is indeed the case, the time is likely the same for start and end.

@maarten-litmaath
Copy link
Author

Hi all,
we may well need to have a chat with an HR DB expert about these matters.
That may take some time due to the summer holiday season...

However, my point is that we should ensure we have enough flexibility on the side of IAM to counteract any unwanted behavior prompted by results from HR DB queries. That implies applying grace periods where needed and being able to correct a person's status from the GUI. I suspect we are close already, but with v1.9.0 we observed at least the case of an ATLAS user that had to be manually restored, while the HR DB suggested his next contract started on the next day after his previous contract finished --> we need to avoid that an admin intervention is needed for such cases...

Even if we get an exact specification of the current behavior of the API, there may well come a code change (on purpose or accidental) that adjusts the exact meaning of a timestamp: normally the users of such info do not care much, while we do!

For example, we just add 1 day to the contract end date and only flag the account if on the second day the user still does not have a new contract. We probably would like to have that grace period configurable.

@maarten-litmaath
Copy link
Author

Hi again,
we ran into yet another example yesterday: delayed updates of the HR DB caused an LHCb account to be blocked, while everything looked OK on the LHCb side...

@vokac
Copy link

vokac commented Jul 25, 2024

We discussed in today WLCG AuthZ meeting that while CERN HR API returns just

"startDate": "2024-07-01"
"endDate": "2026-06-30"

we should threat it as

"startDate": "2024-07-01 00:00:00"
"endDate": "2026-06-30 23:59:59"

@deesto
Copy link

deesto commented Jul 25, 2024

we should threat it as

"startDate": "2024-07-01 00:00:00"
"endDate": "2026-06-30 23:59:59"

Is the 1s gap below threshold or enough to continue to trigger suspension?

@maarten-litmaath
Copy link
Author

maarten-litmaath commented Jul 25, 2024

Good point. If we still see unwarranted suspensions,
we could add that last second. Adjusted example:

[...]
# previous contract
"endDate": "2024-07-01 00:00:00",
[...]
# current contract
"startDate": "2024-07-01 00:00:00",
[...]

As the HR DB is sometimes seen to get updated belatedly,
we should try to avoid immediately suffering from that;
hopefully we already have enough grace period flexibility...

@belforte
Copy link

belforte commented Sep 9, 2024

why do you guys discuss "hours" here ??
Let's have a 7-day grace period. End of story.
I AM REALLY REALLY REALLY SICK AND TIRED OF RESTORING USERS MANUALLY

I have seen cases where it took days for User's Office to get records straight when
people simply change institution, there is no reason they should be affected by that.

Unless user association was terminated because of evil acts, there is no possible harm
by having a decent grace period.
BUT YOU MUST WARN THE USER, so that they can talk to User Office and get
things corrected before voms-proxy-init stops working.

@belforte
Copy link

belforte commented Sep 9, 2024

maybe you guys should do the work to deal with users, it will make it easier for you to get it clear where priority lies. I do not give a damn for a configurable grace period nest year, I want a sensible grace period yesterday !! How could this tool ever got to be deployed without it ?
And, by the way, what do all those fields in user status page EXACTLY mean ? Where is this documented ?
image

@maarten-litmaath
Copy link
Author

Ciao Stefano,
we will look into having this issue worked on with a higher priority.
More news hopefully soon...

@rmiccoli
Copy link
Contributor

rmiccoli commented Sep 9, 2024

maybe you guys should do the work to deal with users, it will make it easier for you to get it clear where priority lies. I do not give a damn for a configurable grace period nest year, I want a sensible grace period yesterday !! How could this tool ever got to be deployed without it ? And, by the way, what do all those fields in user status page EXACTLY mean ? Where is this documented ? image

Hi, unfortunately the meaning of all those labels (referring to the account lifecycle) were not documented in Indigo IAM website, but there is already a PR in review almost ready to be merged indigo-iam/iam-website#108.

@vokac
Copy link

vokac commented Sep 9, 2024

Is not a bit weird to require grace period for a service that should enforce security policies? That's why I think it make sense to fix how we use CERN HR infrastructure data instead of applying grace period workarounds in IAM...

@belforte
Copy link

belforte commented Sep 9, 2024

thank you all for listening.
@rmiccoli nice diagram !

@vokac, I appreciate the philosophical point, but we need to have something that can work. One thing is that we must be able to stop evil-doers asap (and being able to search users by DN is important there! ). A different thing is how to deal gracefully with the fact that humans at CERN may not always do every thing in a matter of hours. When a student moves to another institution we do not kick them out of the experiment, nor does CERN disable their computer account. So we do not take their proxy or token permissions out either. If that can be achieved by smart interpretation of HR records, fine. But the end result must be that I do not get bugged because a User Office employee took a day off !

@belforte
Copy link

belforte commented Sep 9, 2024

nor because they delete old institution on Friday and add new one on Monday !

@enricovianello enricovianello linked a pull request Sep 16, 2024 that will close this issue
@rmiccoli rmiccoli moved this to On Review in WLCG tasks Sep 18, 2024
@enricovianello enricovianello closed this as completed by moving to Done in WLCG tasks Oct 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

6 participants