-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Redact password from various log messages #5590
Conversation
Thanks @kvalev! This LGTM as is. I think pip should actually show |
I will take a stab at it @pradyunsg, it seems simple enough. |
src/pip/_internal/utils/misc.py
Outdated
def redact_auth_from_url(url): | ||
# Return a copy of url with 'username:password@' redacted by | ||
# substituting the credentials with '<redacted>' in case they | ||
# were present in the url. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better if only the password is replaced with a substitution string (e.g. <redacted>
or ****
). It helps to have the username present, e.g. for troubleshooting purposes.
@@ -646,3 +646,26 @@ def test_call_subprocess_closes_stdin(): | |||
def test_remove_auth_from_url(auth_url, expected_url): | |||
url = remove_auth_from_url(auth_url) | |||
assert url == expected_url | |||
|
|||
|
|||
@pytest.mark.parametrize('auth_url, expected_url', [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add test cases of:
- containing
:
but no@
(e.g.domain.tld:8080
) - password containing a
:
Also, I would order the test cases "simplest" to least simple, e.g. start with the test cases having no user-pass, and put all cases with password redaction at the end. The current ordering makes it harder to tell what cases are covered and left out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does a password containing ":" need to be urlencoded?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe.
>>> urlparse('https://user:pass:word@domain.tld/svn/project/trunk')
ParseResult(scheme='https', netloc='user:pass:word@domain.tld', path='/svn/project/trunk', params='', query='', fragment='')
It couldn't hurt to add test cases for both -- :
both urlencoded and not urlencoded. Since this is user input, it seems like we can't guarantee it will be encoded correctly, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To expand the previous example:
>>> parsed = urlparse('https://user:pass:word@domain.tld/svn/project/trunk')
>>> parsed.password
'pass:word'
(But yes, it looks like it's supposed to be encoded.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is user input, it seems like we can't guarantee it will be encoded correctly, etc.
Yeah. This makes sense. @kvalev Could you add a case here: 'https://user:pass:word@domain.tld/svn/project/trunk'
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already did, see the latest commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed. I was viewing an outdated diff. Sorry for the noise. :)
src/pip/_internal/utils/misc.py
Outdated
@@ -852,6 +853,30 @@ def enum(*sequential, **named): | |||
return type('Enum', (), enums) | |||
|
|||
|
|||
def redact_password_from_url(url): | |||
# Return a copy of url by redacting the password with '****' in case it | |||
# was present in the url. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make this a docstring not code comment.
src/pip/_internal/utils/misc.py
Outdated
# Return a copy of url by redacting the password with '****' in case it | ||
# was present in the url. | ||
|
||
# parsed url |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment seems unnecessary.
src/pip/_internal/utils/misc.py
Outdated
# parsed url | ||
purl = urllib_parse.urlsplit(url) | ||
|
||
# redact the password |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment seems unnecessary.
src/pip/_internal/utils/misc.py
Outdated
|
||
redacted_netloc = userpass + '@' + netloc | ||
|
||
# redacted url |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment seems unnecessary.
news/4746.bugfix
Outdated
@@ -0,0 +1 @@ | |||
Redact username/password from log messages when installing packages from Git |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update to say you're only redacting the password. Also, isn't redact_password_from_url()
being called in non-Git cases, too, in some cases? I would just say "Redact password from the URL in some log messages."
src/pip/_internal/utils/misc.py
Outdated
|
||
redacted_netloc = purl.netloc | ||
if purl.password: | ||
auth, netloc = redacted_netloc.split('@') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, this will cause an exception if the user included a password with an unencoded "@", like
"https://user:pass@word@domain.tld/svn/project/trunk"
tests/unit/test_utils.py
Outdated
|
||
@pytest.mark.parametrize('auth_url, expected_url', [ | ||
('http://user@domain.tld:8080/', | ||
'http://user@domain.tld:8080/',), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see the test case I suggested of a netloc containing a :
with no @
for the netloc, like "domain.tld:8080", to make sure the characters following the colon aren't stripped.
src/pip/_internal/utils/misc.py
Outdated
purl = urllib_parse.urlsplit(url) | ||
|
||
redacted_netloc = purl.netloc | ||
if purl.password: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this new version will leak the password if it's the empty string:
>>> a = urllib.parse.urlsplit("https://user:@domain.tld/svn")
>>> a.password
''
If you want to use the password attribute, you should probably be checking for non-None.
I'm a big fan of sharing code and not duplicating logic. One thing I notice is that I think it would be good to share code between This could even reduce the combinatorial explosion of cases to test because you could independently test the two different |
+1 to what Chris says, that would simplify both the tests and the code. :) |
In the interests of sharing code and not duplicating logic, I think it would help a lot to add a new function to misc.py whose implementation matches this newly added method (it already has tests): pip/src/pip/_internal/vcs/subversion.py Lines 103 to 125 in a296897
And then change You can then also use the new function inside the new functions you'll be creating in misc.py. |
Hello! I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the |
This PR seems abandoned, I created #5773 to follow on the work. |
This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs. |
Redact all authentication information when logging urls of Git-based requirements.
Current behavior:
New behavior:
Fixes #4746