Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redact unsafe URLs in the Trace2 output #1616

Closed

Conversation

dscho
Copy link
Member

@dscho dscho commented Nov 21, 2023

The Trace2 output can contain secrets when a user issues a Git command with sensitive information in the command-line. A typical (if highly discouraged) example is: git clone https://user:password@host.com/.

With this PR, the Trace2 output redacts passwords in such URLs by default.

This series also includes a commit to temporarily disable leak checking on t0210,t0211 because the tests uncover other unrelated bugs in Git.

These patches were integrated into Microsoft's fork of Git, as microsoft#616, and have been cooking there ever since.

cc: Elijah Newren newren@gmail.com
cc: Jeff King peff@peff.net

@jeffhostetler
Copy link

I killed the osx-clang after 2h35m and restarted it. I assumed it was stuck. The osx-gcc took only 25 minutes.
I also restarted the CI / sparse jobs. It failed on a download of something.

@dscho
Copy link
Member Author

dscho commented Nov 22, 2023

@jeffhostetler do you have any suggestions how these patches (or the cover letter) could be improved before I send this off?

grep "d0|main|start|.* clone https://user:pwd@example.com" actual &&
grep "d0|main|def_param|.*|remote.origin.url:https://user:pwd@example.com" actual
'

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Since I can't comment on the commit message, I'll do it here.)

This commit just translates your t0210 test into t0211. Should we expand the commit message to say that or squash this into the first commit?

Copy link
Member Author

@dscho dscho Nov 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will edit the commit message!

#define trace2_def_param(param, value) \
trace2_def_param_fl(__FILE__, __LINE__, (param), (value))
#define trace2_def_param(param, value, kvi) \
trace2_def_param_fl(__FILE__, __LINE__, (param), (value), (kvi))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commit is unrelated to the radacting theme of the series, should we put it first to better isolate it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

Add `struct key_value_info` argument to `trace2_def_param()`.

In dc90208 (trace2: plumb config kvi, 2023-06-28) a `kvi`
argument was added to `trace2_def_param_fl()` but the macro
was not up updated. Let's fix that.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
@dscho dscho force-pushed the trace2-redact-credentials-in-https-urls branch from 0d28692 to b7d5f13 Compare November 22, 2023 15:25
@dscho
Copy link
Member Author

dscho commented Nov 22, 2023

@jeffhostetler I force-pushed changes according to your feedback. Here is the range-diff:

  • 3: b8071e2 = 1: 97d17c2 trace2: fix signature of trace2_def_param() macro

  • 1: 5f14510 = 2: 0fb2ac8 trace2: redact passwords from https:// URLs by default

  • 2: 092dd80 ! 3: ea30fa3 t0211: test URL redacting in PERF format

    @@ Metadata
      ## Commit message ##
         t0211: test URL redacting in PERF format
     
    +    This transmogrifies the test case that was just added to t0210, to also
    +    cover the `GIT_TRACE2_PERF` backend.
    +
         Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
     
      ## t/t0211-trace2-perf.sh ##
  • 4: 56f417d ! 4: 185b4de t0212: test URL redacting in EVENT format

    @@ Metadata
      ## Commit message ##
         t0212: test URL redacting in EVENT format
     
    +    This transmogrifies the test case that was just added to t0210, and then
    +    transmogrified and added to t0211, to also cover the `GIT_TRACE2_EVENT`
    +    backend.
    +
         Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
     
      ## t/helper/test-trace2.c ##
  • 5: 0d28692 = 5: b7d5f13 t0210,t0211: unset SANITIZE_LEAK

@jeffhostetler
Copy link

GH is being weird as I review this, so I'll copy it here to avoid losing it.

In my t0212 commit:

# Furthermore, the Trace2 routines print the same content in normal, perf, or event format, and in t0210 and t0211 we already tested the basic functionality, so no need to repeat it here.
#
# In this test, we use the test-helper to unit test each of the event messages where URLs can appear and confirm that they are redacted in each event.

Maybe that goes in the commit message ??

@jeffhostetler
Copy link

Does the SANITIZE_LEAK commit at the end cause problems for later bisectors ?? Should it be moved up ?

Copy link

@jeffhostetler jeffhostetler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got in some weird GH review state while typing while when refreshes appeared, so I'm going to submit this as is to avoid losing state.

# In the following "...redact..." tests, skip testing the GIT_TRACE2_REDACT=0
# case because we would need to exactly model the full JSON event stream like
# we did in the basic tests above and I do not think it is worth it.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# Furthermore, the Trace2 routines print the same content in normal, perf, or event format, and in t0210 and t0211 we already tested the basic functionality, so no need to repeat it here.
#
# In this test, we use the test-helper to unit test each of the event messages where URLs can appear and confirm that they are redacted in each event.

Maybe that goes in the commit message ??

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, I guess I misunderstood you... In cea7cd2 I moved that new paragraph into the commit message, but you wanted to move that code comment into the commit message, too, right?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry to be confusing. my brain was a bit scattered. let me see what you have below.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's leave the code comment here as it is. And then make the commit message something like:

t0212: test URL redacting in EVENT format

Test the redact URL changes in the Trace2 EVENT format.

In the previous 2 commits, we tested basic URL redacting functionality
in NORMAL and PERF formats in t0210 and t0211.  Since the Trace2
routines print the same content regardless of format, we do not need
to repeat those tests here.

In this test, we use the test-helper to unit test each of the event
messages where URLs can appear and confirm that they are redacted in
each event.  This allows us to test redaction for events that are difficult
to setup tests for, such as child events.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. I've now submitted this with the code comment removed. Sorry!

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no problem. the comment is not really needed with the improved commit message.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no problem. the comment is not really needed with the improved commit message.

@jeffhostetler
Copy link

the transmogrifies message doesn't fit on t0212 -- let's go with my other suggestion.

@dscho dscho force-pushed the trace2-redact-credentials-in-https-urls branch from b7d5f13 to cea7cd2 Compare November 22, 2023 17:59
@dscho
Copy link
Member Author

dscho commented Nov 22, 2023

I force-pushed and this is the range-diff:

  • 1: 0fb2ac8 ! 1: 0c4954b trace2: redact passwords from https:// URLs by default

    @@ Commit message
         via `GIT_TRACE_REDACT` (defaulting to `true`), we guard the new code via
         `GIT_TRACE2_REDACT` (also defaulting to `true`).
     
    +    Turn off TEST_PASSES_SANITIZE_LEAK in t0210 and t0211 tests.
    +
    +    The tests this commit uncover leaks in `builtin/clone.c` and `remote.c`.
    +    Therefore we need to turn off `TEST_PASSES_SANITIZE_LEAK`. The reasons:
    +
    +    - We observed that `the_repository->remote_status` is not released
    +      properly.
    +
    +    - We are using `url...insteadOf` and that runs into a code path where an
    +      allocated URL is replaced with another URL, and the original URL is
    +      never released.
    +
    +    - `remote_states` contains plenty of `struct remote`s whose refspecs
    +      seem to be usually allocated by never released.
    +
    +    More investigation is needed here to identify the exact cause and
    +    proper fixes for these leaks/bugs.
    +
    +    Co-authored-by: Jeff Hostetler <jeffhostetler@github.com>
    +    Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
         Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
     
      ## t/t0210-trace2-normal.sh ##
    +@@
    + 
    + test_description='test trace2 facility (normal target)'
    + 
    +-TEST_PASSES_SANITIZE_LEAK=true
    ++TEST_PASSES_SANITIZE_LEAK=false
    + . ./test-lib.sh
    + 
    + # Turn off any inherited trace2 settings for this test.
     @@ t/t0210-trace2-normal.sh: test_expect_success 'using global config with include' '
      	test_cmp expect actual
      '
  • 2: ea30fa3 ! 2: fe87e87 t0211: test URL redacting in PERF format

    @@ Commit message
         This transmogrifies the test case that was just added to t0210, to also
         cover the `GIT_TRACE2_PERF` backend.
     
    +    Just like t0211, we now have to toggle the `TEST_PASSES_SANITIZE_LEAK`
    +    annotation.
    +
         Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
     
      ## t/t0211-trace2-perf.sh ##
    +@@
    + 
    + test_description='test trace2 facility (perf target)'
    + 
    +-TEST_PASSES_SANITIZE_LEAK=true
    ++TEST_PASSES_SANITIZE_LEAK=false
    + . ./test-lib.sh
    + 
    + # Turn off any inherited trace2 settings for this test.
     @@ t/t0211-trace2-perf.sh: test_expect_success PTHREADS 'global counter test/test2' '
      	have_counter_event "main" "counter" "test" "test2" 60 actual
      '
  • 3: 185b4de ! 3: cea7cd2 t0212: test URL redacting in EVENT format

    @@ Metadata
      ## Commit message ##
         t0212: test URL redacting in EVENT format
     
    -    This transmogrifies the test case that was just added to t0210, and then
    -    transmogrified and added to t0211, to also cover the `GIT_TRACE2_EVENT`
    -    backend.
    +    Furthermore, the Trace2 routines print the same content in normal, perf,
    +    or event format, and in t0210 and t0211 we already tested the basic
    +    functionality, so no need to repeat it here.
    +
    +    In this test, we use the test-helper to unit test each of the event
    +    messages where URLs can appear and confirm that they are redacted in
    +    each event.
     
         Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
     
  • 4: b7d5f13 < -: ----------- t0210,t0211: unset SANITIZE_LEAK

But I guess I misunderstood and the last commit needs to move the code comment into the commit message?

@@ -2,7 +2,7 @@

test_description='test trace2 facility (normal target)'

TEST_PASSES_SANITIZE_LEAK=true
TEST_PASSES_SANITIZE_LEAK=false
. ./test-lib.sh

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment on the commit message:

s/The tests this commit/The new tests added in this commit/

Maybe drop the "Turn off TEST... in t0210 and t0211 tests." sentence, since it is explained in the next paragraph (and we only modified t0210 in this commit).

dscho and others added 3 commits November 22, 2023 20:07
It is an unsafe practice to call something like

	git clone https://user:password@example.com/

This not only risks leaking the password "over the shoulder" or into the
readline history of the current Unix shell, it also gets logged via
Trace2 if enabled.

Let's at least avoid logging such secrets via Trace2, much like we avoid
logging secrets in `http.c`. Much like the code in `http.c` is guarded
via `GIT_TRACE_REDACT` (defaulting to `true`), we guard the new code via
`GIT_TRACE2_REDACT` (also defaulting to `true`).

The new tests added in this commit uncover leaks in `builtin/clone.c`
and `remote.c`. Therefore we need to turn off
`TEST_PASSES_SANITIZE_LEAK`. The reasons:

- We observed that `the_repository->remote_status` is not released
  properly.

- We are using `url...insteadOf` and that runs into a code path where an
  allocated URL is replaced with another URL, and the original URL is
  never released.

- `remote_states` contains plenty of `struct remote`s whose refspecs
  seem to be usually allocated by never released.

More investigation is needed here to identify the exact cause and
proper fixes for these leaks/bugs.

Co-authored-by: Jeff Hostetler <jeffhostetler@github.com>
Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This transmogrifies the test case that was just added to t0210, to also
cover the `GIT_TRACE2_PERF` backend.

Just like t0211, we now have to toggle the `TEST_PASSES_SANITIZE_LEAK`
annotation.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
In the added tests cases, skip testing the `GIT_TRACE2_REDACT=0` case
because we would need to exactly model the full JSON event stream like
we did in the preceding basic tests and I do not think it is worth it.

Furthermore, the Trace2 routines print the same content in normal, perf,
or event format, and in t0210 and t0211 we already tested the basic
functionality, so no need to repeat it here.

In this test, we use the test-helper to unit test each of the event
messages where URLs can appear and confirm that they are redacted in
each event.

Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
@dscho dscho force-pushed the trace2-redact-credentials-in-https-urls branch from cea7cd2 to 9bfd00c Compare November 22, 2023 19:12
@dscho
Copy link
Member Author

dscho commented Nov 22, 2023

Next attempt.

Range-diff

  • 1: 0c4954b ! 1: a1686ab trace2: redact passwords from https:// URLs by default

    @@ Commit message
         via `GIT_TRACE_REDACT` (defaulting to `true`), we guard the new code via
         `GIT_TRACE2_REDACT` (also defaulting to `true`).
     
    -    Turn off TEST_PASSES_SANITIZE_LEAK in t0210 and t0211 tests.
    -
    -    The tests this commit uncover leaks in `builtin/clone.c` and `remote.c`.
    -    Therefore we need to turn off `TEST_PASSES_SANITIZE_LEAK`. The reasons:
    +    The new tests added in this commit uncover leaks in `builtin/clone.c`
    +    and `remote.c`. Therefore we need to turn off
    +    `TEST_PASSES_SANITIZE_LEAK`. The reasons:
     
         - We observed that `the_repository->remote_status` is not released
           properly.
  • 2: fe87e87 = 2: e50160f t0211: test URL redacting in PERF format

  • 3: cea7cd2 ! 3: 3e436f7a28d t0212: test URL redacting in EVENT format

    @@ Metadata
      ## Commit message ##
         t0212: test URL redacting in EVENT format
     
    +    In the added tests cases, skip testing the `GIT_TRACE2_REDACT=0` case
    +    because we would need to exactly model the full JSON event stream like
    +    we did in the preceding basic tests and I do not think it is worth it.
    +
         Furthermore, the Trace2 routines print the same content in normal, perf,
         or event format, and in t0210 and t0211 we already tested the basic
         functionality, so no need to repeat it here.
    @@ t/t0212-trace2-event.sh: test_expect_success 'discard traces when there are too
      	head -n2 trace_target_dir/git-trace2-discard | tail -n1 | grep \"event\":\"too_many_files\"
      '
      
    -+# In the following "...redact..." tests, skip testing the GIT_TRACE2_REDACT=0
    -+# case because we would need to exactly model the full JSON event stream like
    -+# we did in the basic tests above and I do not think it is worth it.
    -+
     +test_expect_success 'unsafe URLs are redacted by default in cmd_start events' '
     +	test_when_finished \
     +		"rm -r trace.event" &&

@dscho dscho marked this pull request as ready for review November 22, 2023 19:16
@dscho
Copy link
Member Author

dscho commented Nov 22, 2023

/submit

Copy link

gitgitgadget bot commented Nov 22, 2023

Submitted as pull.1616.git.1700680717.gitgitgadget@gmail.com

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-1616/dscho/trace2-redact-credentials-in-https-urls-v1

To fetch this version to local tag pr-1616/dscho/trace2-redact-credentials-in-https-urls-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-1616/dscho/trace2-redact-credentials-in-https-urls-v1

@@ -337,8 +337,8 @@ struct key_value_info;
void trace2_def_param_fl(const char *file, int line, const char *param,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Jeff Hostetler <jeffhostetler@github.com>
>
> Add `struct key_value_info` argument to `trace2_def_param()`.
>
> In dc90208497 (trace2: plumb config kvi, 2023-06-28) a `kvi`
> argument was added to `trace2_def_param_fl()` but the macro
> was not up updated. Let's fix that.
>
> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
> ---
>  trace2.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/trace2.h b/trace2.h
> index 40d8c2e02a5..1f0669bbd2d 100644
> --- a/trace2.h
> +++ b/trace2.h
> @@ -337,8 +337,8 @@ struct key_value_info;
>  void trace2_def_param_fl(const char *file, int line, const char *param,
>  			 const char *value, const struct key_value_info *kvi);
>  
> -#define trace2_def_param(param, value) \
> -	trace2_def_param_fl(__FILE__, __LINE__, (param), (value))
> +#define trace2_def_param(param, value, kvi) \
> +	trace2_def_param_fl(__FILE__, __LINE__, (param), (value), (kvi))

IOW, this macro was not used back when it was updated, and nobody
used it since then?  

I briefly wondered if we are better off removing it but that does
not make sense because you are adding a new (and only) user to it.

Will queue.  Thanks.

>  
>  /*
>   * Tell trace2 about a newly instantiated repo object and assign

Copy link

gitgitgadget bot commented Nov 23, 2023

This branch is now known as jh/trace2-redact-auth.

Copy link

gitgitgadget bot commented Nov 23, 2023

This patch series was integrated into seen via git@f086c48.

@gitgitgadget gitgitgadget bot added the seen label Nov 23, 2023
@@ -2,7 +2,7 @@

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Wed, Nov 22, 2023 at 11:19 AM Johannes Schindelin via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> It is an unsafe practice to call something like
>
>         git clone https://user:password@example.com/
>
> This not only risks leaking the password "over the shoulder" or into the
> readline history of the current Unix shell, it also gets logged via
> Trace2 if enabled.

Indeed.  Clone urls _also_ seem to be slurped up by other tools, such
as IDEs, and possibly sent off to various third-party cloud services
when users have various AI-assist plugins installed in their IDEs,
resulting in some infosec incidents and fire drills.  (Not a
theoretical scenario, and not fun.)

> Let's at least avoid logging such secrets via Trace2, much like we avoid
> logging secrets in `http.c`. Much like the code in `http.c` is guarded
> via `GIT_TRACE_REDACT` (defaulting to `true`), we guard the new code via
> `GIT_TRACE2_REDACT` (also defaulting to `true`).

Training users is hard.  I appreciate the changes here to make trace2
not be a leak vector, but is it time to perhaps consider bigger safety
measures: At the clone/fetch level, automatically warn loudly whenever
such a URL is provided, accompanied with a note that in the future it
will be turned into a hard error?

Either way, I agree with your "at least" comment here and the changes
you are making.

> The new tests added in this commit uncover leaks in `builtin/clone.c`
> and `remote.c`. Therefore we need to turn off
> `TEST_PASSES_SANITIZE_LEAK`. The reasons:
>
> - We observed that `the_repository->remote_status` is not released
>   properly.
>
> - We are using `url...insteadOf` and that runs into a code path where an
>   allocated URL is replaced with another URL, and the original URL is
>   never released.
>
> - `remote_states` contains plenty of `struct remote`s whose refspecs
>   seem to be usually allocated by never released.
>
> More investigation is needed here to identify the exact cause and
> proper fixes for these leaks/bugs.

Thanks for carefully documenting and explaining.

> Co-authored-by: Jeff Hostetler <jeffhostetler@github.com>
> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  t/t0210-trace2-normal.sh |  20 ++++++-
>  trace2.c                 | 120 ++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 136 insertions(+), 4 deletions(-)
>
> diff --git a/t/t0210-trace2-normal.sh b/t/t0210-trace2-normal.sh
> index 80e76a4695e..c312657a12c 100755
> --- a/t/t0210-trace2-normal.sh
> +++ b/t/t0210-trace2-normal.sh
> @@ -2,7 +2,7 @@
>
>  test_description='test trace2 facility (normal target)'
>
> -TEST_PASSES_SANITIZE_LEAK=true
> +TEST_PASSES_SANITIZE_LEAK=false
>  . ./test-lib.sh
>
>  # Turn off any inherited trace2 settings for this test.
> @@ -283,4 +283,22 @@ test_expect_success 'using global config with include' '
>         test_cmp expect actual
>  '
>
> +test_expect_success 'unsafe URLs are redacted by default' '
> +       test_when_finished \
> +               "rm -r trace.normal unredacted.normal clone clone2" &&
> +
> +       test_config_global \
> +               "url.$(pwd).insteadOf" https://user:pwd@example.com/ &&
> +       test_config_global trace2.configParams "core.*,remote.*.url" &&
> +
> +       GIT_TRACE2="$(pwd)/trace.normal" \
> +               git clone https://user:pwd@example.com/ clone &&
> +       ! grep user:pwd trace.normal &&
> +
> +       GIT_TRACE2_REDACT=0 GIT_TRACE2="$(pwd)/unredacted.normal" \
> +               git clone https://user:pwd@example.com/ clone2 &&
> +       grep "start .* clone https://user:pwd@example.com" unredacted.normal &&
> +       grep "remote.origin.url=https://user:pwd@example.com" unredacted.normal
> +'
> +
>  test_done
> diff --git a/trace2.c b/trace2.c
> index 6dc74dff4c7..87d9a3a0361 100644
> --- a/trace2.c
> +++ b/trace2.c
> @@ -20,6 +20,7 @@
>  #include "trace2/tr2_tmr.h"
>
>  static int trace2_enabled;
> +static int trace2_redact = 1;
>
>  static int tr2_next_child_id; /* modify under lock */
>  static int tr2_next_exec_id; /* modify under lock */
> @@ -227,6 +228,8 @@ void trace2_initialize_fl(const char *file, int line)
>         if (!tr2_tgt_want_builtins())
>                 return;
>         trace2_enabled = 1;
> +       if (!git_env_bool("GIT_TRACE2_REDACT", 1))
> +               trace2_redact = 0;
>
>         tr2_sid_get();
>
> @@ -247,12 +250,93 @@ int trace2_is_enabled(void)
>         return trace2_enabled;
>  }
>
> +/*
> + * Redacts an argument, i.e. ensures that no password in
> + * https://user:password@host/-style URLs is logged.
> + *
> + * Returns the original if nothing needed to be redacted.
> + * Returns a pointer that needs to be `free()`d otherwise.
> + */
> +static const char *redact_arg(const char *arg)
> +{
> +       const char *p, *colon;
> +       size_t at;
> +
> +       if (!trace2_redact ||
> +           (!skip_prefix(arg, "https://", &p) &&
> +            !skip_prefix(arg, "http://", &p)))
> +               return arg;
> +
> +       at = strcspn(p, "@/");
> +       if (p[at] != '@')
> +               return arg;
> +
> +       colon = memchr(p, ':', at);
> +       if (!colon)
> +               return arg;
> +
> +       return xstrfmt("%.*s:<REDACTED>%s", (int)(colon - arg), arg, p + at);
> +}
> +
> +/*
> + * Redacts arguments in an argument list.
> + *
> + * Returns the original if nothing needed to be redacted.
> + * Otherwise, returns a new array that needs to be released
> + * via `free_redacted_argv()`.
> + */
> +static const char **redact_argv(const char **argv)
> +{
> +       int i, j;
> +       const char *redacted = NULL;
> +       const char **ret;
> +
> +       if (!trace2_redact)
> +               return argv;
> +
> +       for (i = 0; argv[i]; i++)
> +               if ((redacted = redact_arg(argv[i])) != argv[i])
> +                       break;
> +
> +       if (!argv[i])
> +               return argv;
> +
> +       for (j = 0; argv[j]; j++)
> +               ; /* keep counting */
> +
> +       ALLOC_ARRAY(ret, j + 1);
> +       ret[j] = NULL;
> +
> +       for (j = 0; j < i; j++)
> +               ret[j] = argv[j];
> +       ret[i] = redacted;
> +       for (++i; argv[i]; i++) {
> +               redacted = redact_arg(argv[i]);
> +               ret[i] = redacted ? redacted : argv[i];
> +       }
> +
> +       return ret;
> +}
> +
> +static void free_redacted_argv(const char **redacted, const char **argv)
> +{
> +       int i;
> +
> +       if (redacted != argv) {
> +               for (i = 0; argv[i]; i++)
> +                       if (redacted[i] != argv[i])
> +                               free((void *)redacted[i]);
> +               free((void *)redacted);
> +       }
> +}
> +
>  void trace2_cmd_start_fl(const char *file, int line, const char **argv)
>  {
>         struct tr2_tgt *tgt_j;
>         int j;
>         uint64_t us_now;
>         uint64_t us_elapsed_absolute;
> +       const char **redacted;
>
>         if (!trace2_enabled)
>                 return;
> @@ -260,10 +344,14 @@ void trace2_cmd_start_fl(const char *file, int line, const char **argv)
>         us_now = getnanotime() / 1000;
>         us_elapsed_absolute = tr2tls_absolute_elapsed(us_now);
>
> +       redacted = redact_argv(argv);
> +
>         for_each_wanted_builtin (j, tgt_j)
>                 if (tgt_j->pfn_start_fl)
>                         tgt_j->pfn_start_fl(file, line, us_elapsed_absolute,
> -                                           argv);
> +                                           redacted);
> +
> +       free_redacted_argv(redacted, argv);
>  }
>
>  void trace2_cmd_exit_fl(const char *file, int line, int code)
> @@ -409,6 +497,7 @@ void trace2_child_start_fl(const char *file, int line,
>         int j;
>         uint64_t us_now;
>         uint64_t us_elapsed_absolute;
> +       const char **orig_argv = cmd->args.v;
>
>         if (!trace2_enabled)
>                 return;
> @@ -419,10 +508,24 @@ void trace2_child_start_fl(const char *file, int line,
>         cmd->trace2_child_id = tr2tls_locked_increment(&tr2_next_child_id);
>         cmd->trace2_child_us_start = us_now;
>
> +       /*
> +        * The `pfn_child_start_fl` API takes a `struct child_process`
> +        * rather than a simple `argv` for the child because some
> +        * targets make use of the additional context bits/values. So
> +        * temporarily replace the original argv (inside the `strvec`)
> +        * with a possibly redacted version.
> +        */
> +       cmd->args.v = redact_argv(orig_argv);
> +
>         for_each_wanted_builtin (j, tgt_j)
>                 if (tgt_j->pfn_child_start_fl)
>                         tgt_j->pfn_child_start_fl(file, line,
>                                                   us_elapsed_absolute, cmd);
> +
> +       if (cmd->args.v != orig_argv) {
> +               free_redacted_argv(cmd->args.v, orig_argv);
> +               cmd->args.v = orig_argv;
> +       }
>  }
>
>  void trace2_child_exit_fl(const char *file, int line, struct child_process *cmd,
> @@ -493,6 +596,7 @@ int trace2_exec_fl(const char *file, int line, const char *exe,
>         int exec_id;
>         uint64_t us_now;
>         uint64_t us_elapsed_absolute;
> +       const char **redacted;
>
>         if (!trace2_enabled)
>                 return -1;
> @@ -502,10 +606,14 @@ int trace2_exec_fl(const char *file, int line, const char *exe,
>
>         exec_id = tr2tls_locked_increment(&tr2_next_exec_id);
>
> +       redacted = redact_argv(argv);
> +
>         for_each_wanted_builtin (j, tgt_j)
>                 if (tgt_j->pfn_exec_fl)
>                         tgt_j->pfn_exec_fl(file, line, us_elapsed_absolute,
> -                                          exec_id, exe, argv);
> +                                          exec_id, exe, redacted);
> +
> +       free_redacted_argv(redacted, argv);
>
>         return exec_id;
>  }
> @@ -637,13 +745,19 @@ void trace2_def_param_fl(const char *file, int line, const char *param,
>  {
>         struct tr2_tgt *tgt_j;
>         int j;
> +       const char *redacted;
>
>         if (!trace2_enabled)
>                 return;
>
> +       redacted = redact_arg(value);
> +
>         for_each_wanted_builtin (j, tgt_j)
>                 if (tgt_j->pfn_param_fl)
> -                       tgt_j->pfn_param_fl(file, line, param, value, kvi);
> +                       tgt_j->pfn_param_fl(file, line, param, redacted, kvi);
> +
> +       if (redacted != value)
> +               free((void *)redacted);
>  }
>
>  void trace2_def_repo_fl(const char *file, int line, struct repository *repo)
> --
> gitgitgadget

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff King wrote (reply to this):

On Thu, Nov 23, 2023 at 10:59:20AM -0800, Elijah Newren wrote:

> > Let's at least avoid logging such secrets via Trace2, much like we avoid
> > logging secrets in `http.c`. Much like the code in `http.c` is guarded
> > via `GIT_TRACE_REDACT` (defaulting to `true`), we guard the new code via
> > `GIT_TRACE2_REDACT` (also defaulting to `true`).
> 
> Training users is hard.  I appreciate the changes here to make trace2
> not be a leak vector, but is it time to perhaps consider bigger safety
> measures: At the clone/fetch level, automatically warn loudly whenever
> such a URL is provided, accompanied with a note that in the future it
> will be turned into a hard error?

Yes, the password in such a case ends up in the plaintext .git/config
file, which is not great.

There's some discussion and patches here:

  https://lore.kernel.org/git/nycvar.QRO.7.76.6.1905172121130.46@tvgsbejvaqbjf.bet/

I meant to follow up on them, but never did.

-Peff

Copy link

gitgitgadget bot commented Nov 23, 2023

User Elijah Newren <newren@gmail.com> has been added to the cc: list.

Copy link

gitgitgadget bot commented Nov 23, 2023

On the Git mailing list, Elijah Newren wrote (reply to this):

On Wed, Nov 22, 2023 at 11:18 AM Johannes Schindelin via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> The Trace2 output can contain secrets when a user issues a Git command with
> sensitive information in the command-line. A typical (if highly discouraged)
> example is: git clone https://user:password@host.com/.
>
> With this PR, the Trace2 output redacts passwords in such URLs by default.
>
> This series also includes a commit to temporarily disable leak checking on
> t0210,t0211 because the tests uncover other unrelated bugs in Git.
>
> These patches were integrated into Microsoft's fork of Git, as
> https://github.com/microsoft/git/pull/616, and have been cooking there ever
> since.

Thanks for making these changes.  Makes me wonder, back when we were
logging trace2 data, if we had some of these leaks.  Eek.

As I commented in patch 2, I think this is a good start, but I'm
curious if others would be willing to turn clone/fetch of such bad
URLs into warnings for now and errors later.  The prevalence of
AI-assist add-ons for various IDEs and the number of developers opting
to use those IDEs and add-ons, and the fact that these tools sometimes
include repository URLs in what they send off to third parties, makes
me wonder if our recent infosec fire drill is soon going to be a
widely shared experience by lots of other companies and individuals.
Training users to not do bad things is hard, and it might be worth
saving them from themselves.  Thoughts?

Copy link

gitgitgadget bot commented Nov 27, 2023

There was a status update in the "New Topics" section about the branch jh/trace2-redact-auth on the Git mailing list:

trace2 streams used to record the URLs that potentially embed
authentication material, which has been corrected.

Will merge to 'next'.
source: <pull.1616.git.1700680717.gitgitgadget@gmail.com>

Copy link

gitgitgadget bot commented Nov 27, 2023

This patch series was integrated into seen via git@0c58ab9.

Copy link

gitgitgadget bot commented Nov 27, 2023

User Jeff King <peff@peff.net> has been added to the cc: list.

Copy link

gitgitgadget bot commented Dec 9, 2023

This patch series was integrated into seen via git@fed91a4.

Copy link

gitgitgadget bot commented Dec 9, 2023

There was a status update in the "Cooking" section about the branch jh/trace2-redact-auth on the Git mailing list:

trace2 streams used to record the URLs that potentially embed
authentication material, which has been corrected.

Will merge to 'next'.
source: <pull.1616.git.1700680717.gitgitgadget@gmail.com>

Copy link

gitgitgadget bot commented Dec 10, 2023

This patch series was integrated into seen via git@89ea8f0.

Copy link

gitgitgadget bot commented Dec 11, 2023

This patch series was integrated into next via git@7e679a4.

@gitgitgadget gitgitgadget bot added the next label Dec 11, 2023
Copy link

gitgitgadget bot commented Dec 11, 2023

This patch series was integrated into seen via git@3153568.

Copy link

gitgitgadget bot commented Dec 12, 2023

There was a status update in the "Cooking" section about the branch jh/trace2-redact-auth on the Git mailing list:

trace2 streams used to record the URLs that potentially embed
authentication material, which has been corrected.

Will merge to 'master'.
source: <pull.1616.git.1700680717.gitgitgadget@gmail.com>

Copy link

gitgitgadget bot commented Dec 13, 2023

This patch series was integrated into seen via git@a2491ca.

Copy link

gitgitgadget bot commented Dec 13, 2023

This patch series was integrated into seen via git@52d1b6a.

Copy link

gitgitgadget bot commented Dec 14, 2023

This patch series was integrated into seen via git@3f332e3.

Copy link

gitgitgadget bot commented Dec 15, 2023

This patch series was integrated into seen via git@c5f4444.

Copy link

gitgitgadget bot commented Dec 19, 2023

This patch series was integrated into seen via git@c1aefb6.

Copy link

gitgitgadget bot commented Dec 19, 2023

This patch series was integrated into master via git@c1aefb6.

Copy link

gitgitgadget bot commented Dec 19, 2023

This patch series was integrated into next via git@c1aefb6.

@gitgitgadget gitgitgadget bot added the master label Dec 19, 2023
@gitgitgadget gitgitgadget bot closed this Dec 19, 2023
Copy link

gitgitgadget bot commented Dec 19, 2023

Closed via c1aefb6.

@dscho dscho deleted the trace2-redact-credentials-in-https-urls branch December 19, 2023 07:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants