Redact password from various log messages #5590

Closed
wants to merge 8 commits into from

@kvalev kvalev commented Jul 10, 2018

Redact all authentication information when logging urls of Git-based requirements.

Current behavior:

Collecting git+https://username:password@gitlab.com/user/project
  Cloning https://username:password@gitlab.com/user/project to /private/var/folders/rp/7lwtlxfs11g5v1k_9vxph_md2rygl_/T/pip-req-build-04B4B0
Requirement already satisfied (use --upgrade to upgrade): project==0.1.0 from git+https://username:password@gitlab.com/user/project in /usr/local/lib/python2.7/site-packages

New behavior:

Collecting git+https://username:***@gitlab.com/user/project
  Cloning https://username:***@gitlab.com/user/project to /private/var/folders/rp/7lwtlxfs11g5v1k_9vxph_md2rygl_/T/pip-req-build-04B4B0
Requirement already satisfied (use --upgrade to upgrade): project==0.1.0 from git+https://username:***@gitlab.com/user/project in /usr/local/lib/python2.7/site-packages

Fixes #4746

@pradyunsg pradyunsg added the type: enhancement Improvements to functionality label Jul 10, 2018
@pradyunsg
Member

Thanks @kvalev! This LGTM as is.

I think pip should actually show <redacted> for the authentication information here (and there's one more place where we made a similar change recently). I understand that it might make this a little more work than this PR is currently. Do let me know if you'd be willing to try a hand at that. :)

@kvalev
Author

kvalev commented Jul 11, 2018

I will take a stab at it @pradyunsg, it seems simple enough.

@kvalev kvalev changed the title from "Remove username/password from log messages when installing from Git" to "Redact username/password from log messages when installing from Git" Jul 12, 2018
def redact_auth_from_url(url):
    # Return a copy of url with 'username:password@' redacted by
    # substituting the credentials with '<redacted>' in case they
    # were present in the url.
Member

I think it would be better if only the password is replaced with a substitution string (e.g. <redacted> or ****). It helps to have the username present, e.g. for troubleshooting purposes.
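
A minimal sketch of the password-only redaction suggested here. It assumes Python 3's `urllib.parse` rather than the `urllib_parse` compatibility import used in the PR; the function name `redact_password` and the `****` placeholder are illustrative, not the PR's final code.

```python
from urllib.parse import urlsplit, urlunsplit


def redact_password(url, placeholder='****'):
    """Return url with only the password replaced; the username is kept."""
    parts = urlsplit(url)
    if parts.password is None:
        # No password component at all: nothing to redact.
        return url
    # Split from the right so an unencoded '@' inside the password
    # stays on the credentials side.
    userinfo, host = parts.netloc.rsplit('@', 1)
    user = userinfo.split(':', 1)[0]
    netloc = '{}:{}@{}'.format(user, placeholder, host)
    return urlunsplit(parts._replace(netloc=netloc))


print(redact_password('https://username:password@gitlab.com/user/project'))
# https://username:****@gitlab.com/user/project
```

Keeping the username in the output preserves the troubleshooting value mentioned above while still hiding the secret.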

@@ -646,3 +646,26 @@ def test_call_subprocess_closes_stdin():
def test_remove_auth_from_url(auth_url, expected_url):
    url = remove_auth_from_url(auth_url)
    assert url == expected_url


@pytest.mark.parametrize('auth_url, expected_url', [
Member

Maybe add test cases of:

  1. containing : but no @ (e.g. domain.tld:8080)
  2. password containing a :

Also, I would order the test cases "simplest" to least simple, e.g. start with the test cases having no user-pass, and put all cases with password redaction at the end. The current ordering makes it harder to tell what cases are covered and left out.

Member

Does a password containing ":" need to be urlencoded?

Member

Maybe.

>>> urlparse('https://user:pass:word@domain.tld/svn/project/trunk')
ParseResult(scheme='https', netloc='user:pass:word@domain.tld', path='/svn/project/trunk', params='', query='', fragment='')

It couldn't hurt to add test cases for both -- : both urlencoded and not urlencoded. Since this is user input, it seems like we can't guarantee it will be encoded correctly, etc.

Member

To expand the previous example:

>>> parsed = urlparse('https://user:pass:word@domain.tld/svn/project/trunk')
>>> parsed.password
'pass:word'

(But yes, it looks like it's supposed to be encoded.)
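
To make the encoded versus unencoded distinction concrete, here is a small sketch using Python 3's standard library (an assumption; the PR itself goes through a compatibility import). It relies only on the documented `username` and `password` attributes of the `urlsplit()` result, which are returned without percent-decoding.

```python
from urllib.parse import unquote, urlsplit

raw = urlsplit('https://user:pass:word@domain.tld/svn/project/trunk')
encoded = urlsplit('https://user:pass%3Aword@domain.tld/svn/project/trunk')

# In both cases the first ':' separates the username from the password.
print(raw.username, raw.password)          # user pass:word
print(encoded.username, encoded.password)  # user pass%3Aword

# urlsplit() leaves percent-escapes in place; decode explicitly if needed.
print(unquote(encoded.password))           # pass:word
```

Either form yields a non-None password, so a redaction step keyed on the password attribute would cover both.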

Member

> this is user input, it seems like we can't guarantee it will be encoded correctly, etc.

Yeah. This makes sense. @kvalev Could you add a case here: 'https://user:pass:word@domain.tld/svn/project/trunk'?

Author

Already did, see the latest commit.

Member

Indeed. I was viewing an outdated diff. Sorry for the noise. :)

@@ -852,6 +853,30 @@ def enum(*sequential, **named):
    return type('Enum', (), enums)


def redact_password_from_url(url):
    # Return a copy of url by redacting the password with '****' in case it
    # was present in the url.
Member

Make this a docstring not code comment.

    # Return a copy of url by redacting the password with '****' in case it
    # was present in the url.

    # parsed url
Member

This comment seems unnecessary.

    # parsed url
    purl = urllib_parse.urlsplit(url)

    # redact the password
Member

This comment seems unnecessary.


redacted_netloc = userpass + '@' + netloc

# redacted url
Member

This comment seems unnecessary.

news/4746.bugfix Outdated
@@ -0,0 +1 @@
Redact username/password from log messages when installing packages from Git
Member

Update to say you're only redacting the password. Also, isn't redact_password_from_url() being called in non-Git cases, too, in some cases? I would just say "Redact password from the URL in some log messages."

@kvalev kvalev changed the title from "Redact username/password from log messages when installing from Git" to "Redact password from various log messages" Jul 12, 2018

    redacted_netloc = purl.netloc
    if purl.password:
        auth, netloc = redacted_netloc.split('@')
Member

FYI, this will cause an exception if the user included a password with an unencoded "@", like
"https://user:pass@word@domain.tld/svn/project/trunk"

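The failure described above can be reproduced with a short sketch, again assuming Python 3's `urllib.parse`. It contrasts the two-way unpacking of `split('@')` with `rsplit('@', 1)`, which splits only at the last `@`; the variable names are illustrative.

```python
from urllib.parse import urlsplit

url = 'https://user:pass@word@domain.tld/svn/project/trunk'
netloc = urlsplit(url).netloc  # 'user:pass@word@domain.tld'

# split('@') yields three pieces here, so unpacking into two names fails.
failed = False
try:
    auth, host = netloc.split('@')
except ValueError:
    failed = True
print(failed)  # True

# Splitting once from the right keeps the extra '@' inside the credentials.
auth, host = netloc.rsplit('@', 1)
print(auth, host)  # user:pass@word domain.tld
```
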

@pytest.mark.parametrize('auth_url, expected_url', [
    ('http://user@domain.tld:8080/',
     'http://user@domain.tld:8080/',),
Member

I don't see the test case I suggested of a netloc containing a : with no @ for the netloc, like "domain.tld:8080", to make sure the characters following the colon aren't stripped.
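
For reference, a quick sketch of how Python 3's `urlsplit()` (assumed here in place of the PR's compatibility import) treats a netloc that has a port but no credentials. The colon before the port is not read as a username/password separator, which is the behavior the requested test case would pin down.

```python
from urllib.parse import urlsplit

no_auth = urlsplit('http://domain.tld:8080/')
user_only = urlsplit('http://user@domain.tld:8080/')

print(no_auth.netloc, no_auth.username, no_auth.password, no_auth.port)
# domain.tld:8080 None None 8080

print(user_only.netloc, user_only.username, user_only.password, user_only.port)
# user@domain.tld:8080 user None 8080
```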

    purl = urllib_parse.urlsplit(url)

    redacted_netloc = purl.netloc
    if purl.password:
Member

It looks like this new version will leak the password if it's the empty string:

>>> a = urllib.parse.urlsplit("https://user:@domain.tld/svn")
>>> a.password
''

If you want to use the password attribute, you should probably be checking for non-None.
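
A short sketch of the difference between the two checks, assuming Python 3's `urllib.parse`. The helper names `has_password_truthy` and `has_password` are hypothetical and exist only to contrast the conditions.

```python
from urllib.parse import urlsplit


def has_password_truthy(url):
    # Mirrors `if purl.password:` and treats an empty password as absent.
    return bool(urlsplit(url).password)


def has_password(url):
    # Mirrors `if purl.password is not None:` and catches the empty string.
    return urlsplit(url).password is not None


empty = 'https://user:@domain.tld/svn'
missing = 'https://user@domain.tld/svn'

print(has_password_truthy(empty), has_password(empty))      # False True
print(has_password_truthy(missing), has_password(missing))  # False False
```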

@cjerdonek
Member

I'm a big fan of sharing code and not duplicating logic. One thing I notice is that remove_auth_from_url() and redact_password_from_url() are almost identical, except for what they do to the netloc.

I think it would be good to share code between remove_auth_from_url() and redact_password_from_url(), e.g. by defining a function with signature transform_url(url, transform_netloc) that accepts the part that is different between the two functions (a function that transforms the netloc).

This could even reduce the combinatorial explosion of cases to test because you could independently test the two different transform_netloc() functions, instead of having every test be a test of the whole thing.
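
One possible shape for this refactoring, as a sketch only. `transform_url` and its `transform_netloc` parameter follow the signature proposed above; the two netloc helpers (`remove_auth_netloc`, `redact_password_netloc`) are hypothetical names, and Python 3's `urllib.parse` is assumed.

```python
from urllib.parse import urlsplit, urlunsplit


def transform_url(url, transform_netloc):
    """Return url with its netloc replaced by transform_netloc(netloc)."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(netloc=transform_netloc(parts.netloc)))


def remove_auth_netloc(netloc):
    """Drop 'user[:password]@' entirely."""
    return netloc.rsplit('@', 1)[-1]


def redact_password_netloc(netloc):
    """Replace the password, if any, with '****' and keep the username."""
    if '@' not in netloc:
        return netloc
    auth, host = netloc.rsplit('@', 1)
    user, sep, _password = auth.partition(':')
    if not sep:
        # Username only: there is no password to hide.
        return netloc
    return '{}:****@{}'.format(user, host)


url = 'https://user:pass@domain.tld/svn'
print(transform_url(url, remove_auth_netloc))      # https://domain.tld/svn
print(transform_url(url, redact_password_netloc))  # https://user:****@domain.tld/svn
```

With this split, the netloc helpers can be tested on bare strings such as `domain.tld:8080`, and `transform_url` needs only a couple of tests of its own.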

@pradyunsg
Member

+1 to what Chris says, that would simplify both the tests and the code. :)

@pradyunsg pradyunsg closed this Jul 22, 2018
@pradyunsg pradyunsg reopened this Jul 22, 2018
@cjerdonek
Member

In the interests of sharing code and not duplicating logic, I think it would help a lot to add a new function to misc.py whose implementation matches this newly added method (it already has tests):

def get_netloc_and_auth(self, netloc):
    """
    Parse out and remove from the netloc the auth information.
    This allows the auth information to be provided via the --username
    and --password options instead of via the URL.
    """
    if '@' not in netloc:
        return netloc, (None, None)
    # Split from the right because that's how urllib.parse.urlsplit()
    # behaves if more than one @ is present (by checking the password
    # attribute of urlsplit()'s return value).
    auth, netloc = netloc.rsplit('@', 1)
    if ':' in auth:
        # Split from the left because that's how urllib.parse.urlsplit()
        # behaves if more than one : is present (again by checking the
        # password attribute of the return value)
        user_pass = tuple(auth.split(':', 1))
    else:
        user_pass = auth, None
    return netloc, user_pass

And then change Subversion.get_netloc_and_auth() to call this new function.

You can then also use the new function inside the new functions you'll be creating in misc.py.
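
A rough sketch of how that could look. The module-level name `split_auth_from_netloc` and the helper `redact_netloc` are hypothetical, and the `Subversion` class below is a stand-in showing only the delegating method, not pip's actual class.

```python
def split_auth_from_netloc(netloc):
    """Return (netloc_without_auth, (username, password))."""
    if '@' not in netloc:
        return netloc, (None, None)
    # Split from the right, matching urlsplit() when several '@' are present.
    auth, netloc = netloc.rsplit('@', 1)
    if ':' in auth:
        # Split from the left, matching urlsplit() when several ':' are present.
        user_pass = tuple(auth.split(':', 1))
    else:
        user_pass = auth, None
    return netloc, user_pass


class Subversion(object):
    # Stand-in class: the method now only delegates to the shared function.
    def get_netloc_and_auth(self, netloc):
        return split_auth_from_netloc(netloc)


def redact_netloc(netloc):
    """Replace the password in netloc with '****', reusing the shared parser."""
    host, (user, password) = split_auth_from_netloc(netloc)
    if user is None:
        return host
    if password is None:
        return '{}@{}'.format(user, host)
    return '{}:****@{}'.format(user, host)


print(redact_netloc('user:pass:word@domain.tld:8080'))  # user:****@domain.tld:8080
```

Because the password is compared against None rather than tested for truthiness, an empty password such as `user:@domain.tld` is redacted as well.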

@BrownTruck
Contributor

Hello!

I am an automated bot and I have noticed that this pull request is not currently able to be merged. If you are able to either merge the master branch into this pull request or rebase this pull request against master, then it will be eligible for code review and hopefully merging!

@orf
Contributor

orf commented Sep 10, 2018

This PR seems abandoned, I created #5773 to follow on the work.

@kvalev kvalev closed this Sep 10, 2018
@lock

lock bot commented Jun 1, 2019

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot added the auto-locked Outdated issues that have been locked by automation label Jun 1, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Jun 1, 2019
Labels
auto-locked (Outdated issues that have been locked by automation)
needs rebase or merge (PR has conflicts with current master)
type: enhancement (Improvements to functionality)
Development

Successfully merging this pull request may close these issues.

pip prints out username and password from URLs with them
5 participants