-
-
Notifications
You must be signed in to change notification settings - Fork 5.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use a more general (and faster) method to sanitize URLs with credentials #19239
Conversation
7558472
to
e6dd7c2
Compare
e6dd7c2
to
3d3f223
Compare
"https://" + userPlaceholder + "@github.com/go-gitea/test_repo.git", | ||
}, | ||
// case 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still need this normal cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which case? I think I have covered most.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mean why we delete these original test cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All code are rewritten, all new cases cover old ones. If you feel which is missing, please just point out and add it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And many cases are indeed the old cases, for example, these github.com
cases.
Co-authored-by: zeripath <art27@cantab.net>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems okay... The current code seems to be indeed faster:
BenchmarkNew-12 3575206 368.0 ns/op 320 B/op 2 allocs/op
BenchmarkOld-12 373596 2956 ns/op 3288 B/op 9 allocs/op
A relative minor issue that comes with this, is that the current code will "handle" URL's like ://@
into -> ://sanitized-credential@
whereby the old function would return a error that it is a incorrect URL, but like I said relative minor issue.
Thank you for the benchmark! In my mind the major purpose is to make code more simple and intuitive, the performance is just a bonus. And the edge case |
Why the skip-changelog? |
this PR is just a refactoring, it doesn't change the expected output, and doesn't improve performance obviously (because the sanitizer isn't called too much), so I think it's not necessary to mention it in the changelog, it doesn't provide useful information to end users. feel free to adjust the labels. |
modules/util/sanitize.go
Outdated
u.User = url.User(userPlaceholder) | ||
} else { | ||
u.User = nil | ||
// SanitizeCredentialURLs remove all credentials in URLs for the input string: "https://user:pass@domain.com" => "https://sanitized-credential@domain.com" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code only removes the credential from the first URL it finds in the string
If we want to strip out credentials from all urls in the string we're gonna need to do something else. (Possibly iterate across all "://" in the string however see below...)
One other issue is that there are URLs which do not have the ":", for example //username:password@www.google.com
which says use the current protocol. I think /\username:password@www.google.com
also works.
IMO I don't think we need to necessarily sanitize these - we should probably just make it clear that we're not going to do this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I will add it to comment. IMO "//xxxx" couldn't be treated as a valid URL in strings. It's only valid in a context with scheme already (say, inside a browser)
And /\
is not standard, it's just a browser's dirty hack. Golang's url.Parse doesn't support /\
either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// SanitizeCredentialURLs remove all credentials in URLs for the input string: "https://user:pass@domain.com" => "https://sanitized-credential@domain.com" | |
// SanitizeCredentialURLs remove all credentials in URLs (starting with "scheme://") for the input string: "https://user:pass@domain.com" => "https://sanitized-credential@domain.com" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"This code only removes the credential from the first URL it finds in the string"
No, it removes all. See the test case
{
"URLs in log https://u:b@h and https://u:b@h:80/, with https://h.com and u@h.com",
"URLs in log https://" + userPlaceholder + "@h and https://" + userPlaceholder + "@h:80/, with https://h.com and u@h.com",
},
Co-authored-by: zeripath <art27@cantab.net>
…L in strings because it doesn't have a valid scheme context.
…into refactor-url-sanitizer
* giteaofficial/main: Update reserved usernames list (go-gitea#18438) Configure OpenSSH log level via Environment in Docker (go-gitea#19274) Use a more general (and faster) method to sanitize URLs with credentials (go-gitea#19239) [skip ci] Updated translations via Crowdin fix link to package registry docs (go-gitea#19268) Add Redis Sentinel Authentication Support (go-gitea#19213)
As discussed in
We can use a more general (and faster) method to sanitize URLs with credentials
Benefits:
util.NewStringURLSanitizer(opts.Remote, true).Replace(opts.Remote)
util.SanitizeCredentialURLs(opts.Remote)
get a remoteAddr; util.NewURLSanitizedError(err, remoteAddr, true)
util.SanitizeErrorCredentialURLs(err)