Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Token reloading with RwLock #835

Merged
merged 6 commits into from
Mar 1, 2022
Merged

Token reloading with RwLock #835

merged 6 commits into from
Mar 1, 2022

Conversation

kazk
Copy link
Member

@kazk kazk commented Feb 19, 2022

The original implementation deadlocked (#829) because it was accidentally trying to lock for write while holding read lock, then attempting to read.
The guard not dropped before else was surprising, so here's a minimal demo showing when a temporary value is dropped: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=401eaf29aecf31d4e2d434d005e1c24b

I've tested this fix with in-cluster controller-rs and it's been working fine, but please test it again. I'm assuming RwLock is more efficient than Mutex for this, but I haven't benchmarked. If we want to keep Mutex, we should remove cached_token.

@codecov-commenter
Copy link

codecov-commenter commented Feb 19, 2022

Codecov Report

Merging #835 (b2fc0ff) into master (4a79392) will decrease coverage by 0.01%.
The diff coverage is 0.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #835      +/-   ##
==========================================
- Coverage   69.97%   69.95%   -0.02%     
==========================================
  Files          59       59              
  Lines        4140     4141       +1     
==========================================
  Hits         2897     2897              
- Misses       1243     1244       +1     
Impacted Files Coverage Δ
kube-client/src/client/auth/mod.rs 57.62% <0.00%> (-0.33%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4a79392...b2fc0ff. Read the comment docs.

let mut locked = token_file.lock().await;
if let Some(token) = locked.cached_token() {
bearer_header(token)
if let Some(header) = { let guard = token_file.read().await; guard.cached_token().map(bearer_header) } {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hah, aggressive scoping. i think this makes sense, but will double check with controller-rs just in case.

it might make sense to put a comment on this line to explain the scope, or maybe see if we can put a unit test around to_header in the ::File case

@nightkr
Copy link
Member

nightkr commented Feb 19, 2022

This does have the downside (compared to the Mutex) that it can end up reloading multiple times every time the TTL expires.

I also wonder if we'd rather explicitly drop() the read lock before taking the write lock, instead of relying on the (evidently sometimes quite unclear) scoping rules.

@kazk
Copy link
Member Author

kazk commented Feb 19, 2022

This does have the downside (compared to the Mutex) that it can end up reloading multiple times every time the TTL expires.

If locks happen like the following:

0 1 2 3
R
  R─┐
    W W
  R───┘
R
R

0 = not expired
1 = expiring
2 = reload
3 = not expired

The second W shouldn't reload because expires_at was updated by the first W?

I also wonder if we'd rather explicitly drop() the read lock before taking the write lock, instead of relying on the (evidently sometimes quite unclear) scoping rules.

Yeah, that's probably better.

@nightkr
Copy link
Member

nightkr commented Feb 20, 2022

The second W shouldn't reload because expires_at was updated by the first W?

Ideally, yeah. Then again, I'm not sure it's a huge problem since it should already be in the OS' page cache at that point...

Signed-off-by: kazk <kazk.dev@gmail.com>
Signed-off-by: kazk <kazk.dev@gmail.com>
Signed-off-by: kazk <kazk.dev@gmail.com>
Signed-off-by: kazk <kazk.dev@gmail.com>
@kazk
Copy link
Member Author

kazk commented Feb 20, 2022

Do you mean it's still possible?

fn token(&mut self) -> &str { 
    if self.is_expiring() { // must be false in the second `W`, right?
        if let Ok(token) = std::fs::read_to_string(&self.path) { 
            self.token = SecretString::from(token); 
        }
        self.expires_at = Utc::now() + Duration::seconds(60);
    }
    self.token.expose_secret() 
} 

@nightkr
Copy link
Member

nightkr commented Feb 24, 2022

Ah okay, that makes sense then. I /think/ it might be slightly clearer to have both checks in the same function (nothing else should use token directly, right?), but it's not a huge problem.

Signed-off-by: kazk <kazk.dev@gmail.com>
Copy link
Member

@nightkr nightkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM then. We might want to consider normalizing this logic across all three token sources at some point, but that feels like a future problem to tackle.

@clux
Copy link
Member

clux commented Feb 25, 2022

We might want to consider normalizing this logic across all three token sources at some point, but that feels like a future problem to tackle.

Yeah, the more immediate thing here is probably some kind of e2e test for the token loading (#832) along with a fix for the import issue (#833) so that users don't keep experiencing this issue. Already saw someone run into it again in on kube-rs/controller-rs#26 and it's not a great look. For future bugs like this I think we should probably yank the release since it doesn't leave the crate working very well.

This fix looks great to me otoh!

@clux clux added this to the 0.70.0 milestone Feb 25, 2022
@clux clux added the changelog-fix changelog fix category for prs label Feb 25, 2022
@nightkr
Copy link
Member

nightkr commented Feb 25, 2022

For future bugs like this I think we should probably yank the release since it doesn't leave the crate working very well.

FWIW, yanking wouldn't have fixed that issue, since it was already locked to the broken version. cargo update would have updated to 0.69.1, regardless of whether 0.69.0 was locked. Yanking would only have affected people who explicitly depended on kube-client = "=0.69.1".

@clux
Copy link
Member

clux commented Feb 25, 2022

oh. i thought #833 implied that users who manually bumped the .toml of kube to 0.69.1 could run into the issue if we didn't yank. do you not get warnings of yanked versions when building against a lockfile?

@nightkr
Copy link
Member

nightkr commented Feb 25, 2022

#833 is more about the opposite issue, people who explicitly depend on kube = "=0.69.0" would still get upgraded to 0.69.1 for the subcrates.

do you not get warnings of yanked versions when building against a lockfile?

I'm not sure. Looks like cargo-audit does, which implies to me that cargo build doesn't...

Signed-off-by: kazk <kazk.dev@gmail.com>
@kazk kazk merged commit 0b5e803 into kube-rs:master Mar 1, 2022
@kazk kazk deleted the retry-rwlock branch March 1, 2022 00:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog-fix changelog fix category for prs
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants