Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redact unsafe URLs in the Trace2 output #1616

Closed
Closed
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 19 additions & 1 deletion t/t0210-trace2-normal.sh
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Elijah Newren wrote (reply to this):

On Wed, Nov 22, 2023 at 11:19 AM Johannes Schindelin via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> It is an unsafe practice to call something like
>
>         git clone https://user:password@example.com/
>
> This not only risks leaking the password "over the shoulder" or into the
> readline history of the current Unix shell, it also gets logged via
> Trace2 if enabled.

Indeed.  Clone urls _also_ seem to be slurped up by other tools, such
as IDEs, and possibly sent off to various third-party cloud services
when users have various AI-assist plugins installed in their IDEs,
resulting in some infosec incidents and fire drills.  (Not a
theoretical scenario, and not fun.)

> Let's at least avoid logging such secrets via Trace2, much like we avoid
> logging secrets in `http.c`. Much like the code in `http.c` is guarded
> via `GIT_TRACE_REDACT` (defaulting to `true`), we guard the new code via
> `GIT_TRACE2_REDACT` (also defaulting to `true`).

Training users is hard.  I appreciate the changes here to make trace2
not be a leak vector, but is it time to perhaps consider bigger safety
measures: At the clone/fetch level, automatically warn loudly whenever
such a URL is provided, accompanied with a note that in the future it
will be turned into a hard error?

Either way, I agree with your "at least" comment here and the changes
you are making.

> The new tests added in this commit uncover leaks in `builtin/clone.c`
> and `remote.c`. Therefore we need to turn off
> `TEST_PASSES_SANITIZE_LEAK`. The reasons:
>
> - We observed that `the_repository->remote_status` is not released
>   properly.
>
> - We are using `url...insteadOf` and that runs into a code path where an
>   allocated URL is replaced with another URL, and the original URL is
>   never released.
>
> - `remote_states` contains plenty of `struct remote`s whose refspecs
>   seem to be usually allocated by never released.
>
> More investigation is needed here to identify the exact cause and
> proper fixes for these leaks/bugs.

Thanks for carefully documenting and explaining.

> Co-authored-by: Jeff Hostetler <jeffhostetler@github.com>
> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
> ---
>  t/t0210-trace2-normal.sh |  20 ++++++-
>  trace2.c                 | 120 ++++++++++++++++++++++++++++++++++++++-
>  2 files changed, 136 insertions(+), 4 deletions(-)
>
> diff --git a/t/t0210-trace2-normal.sh b/t/t0210-trace2-normal.sh
> index 80e76a4695e..c312657a12c 100755
> --- a/t/t0210-trace2-normal.sh
> +++ b/t/t0210-trace2-normal.sh
> @@ -2,7 +2,7 @@
>
>  test_description='test trace2 facility (normal target)'
>
> -TEST_PASSES_SANITIZE_LEAK=true
> +TEST_PASSES_SANITIZE_LEAK=false
>  . ./test-lib.sh
>
>  # Turn off any inherited trace2 settings for this test.
> @@ -283,4 +283,22 @@ test_expect_success 'using global config with include' '
>         test_cmp expect actual
>  '
>
> +test_expect_success 'unsafe URLs are redacted by default' '
> +       test_when_finished \
> +               "rm -r trace.normal unredacted.normal clone clone2" &&
> +
> +       test_config_global \
> +               "url.$(pwd).insteadOf" https://user:pwd@example.com/ &&
> +       test_config_global trace2.configParams "core.*,remote.*.url" &&
> +
> +       GIT_TRACE2="$(pwd)/trace.normal" \
> +               git clone https://user:pwd@example.com/ clone &&
> +       ! grep user:pwd trace.normal &&
> +
> +       GIT_TRACE2_REDACT=0 GIT_TRACE2="$(pwd)/unredacted.normal" \
> +               git clone https://user:pwd@example.com/ clone2 &&
> +       grep "start .* clone https://user:pwd@example.com" unredacted.normal &&
> +       grep "remote.origin.url=https://user:pwd@example.com" unredacted.normal
> +'
> +
>  test_done
> diff --git a/trace2.c b/trace2.c
> index 6dc74dff4c7..87d9a3a0361 100644
> --- a/trace2.c
> +++ b/trace2.c
> @@ -20,6 +20,7 @@
>  #include "trace2/tr2_tmr.h"
>
>  static int trace2_enabled;
> +static int trace2_redact = 1;
>
>  static int tr2_next_child_id; /* modify under lock */
>  static int tr2_next_exec_id; /* modify under lock */
> @@ -227,6 +228,8 @@ void trace2_initialize_fl(const char *file, int line)
>         if (!tr2_tgt_want_builtins())
>                 return;
>         trace2_enabled = 1;
> +       if (!git_env_bool("GIT_TRACE2_REDACT", 1))
> +               trace2_redact = 0;
>
>         tr2_sid_get();
>
> @@ -247,12 +250,93 @@ int trace2_is_enabled(void)
>         return trace2_enabled;
>  }
>
> +/*
> + * Redacts an argument, i.e. ensures that no password in
> + * https://user:password@host/-style URLs is logged.
> + *
> + * Returns the original if nothing needed to be redacted.
> + * Returns a pointer that needs to be `free()`d otherwise.
> + */
> +static const char *redact_arg(const char *arg)
> +{
> +       const char *p, *colon;
> +       size_t at;
> +
> +       if (!trace2_redact ||
> +           (!skip_prefix(arg, "https://", &p) &&
> +            !skip_prefix(arg, "http://", &p)))
> +               return arg;
> +
> +       at = strcspn(p, "@/");
> +       if (p[at] != '@')
> +               return arg;
> +
> +       colon = memchr(p, ':', at);
> +       if (!colon)
> +               return arg;
> +
> +       return xstrfmt("%.*s:<REDACTED>%s", (int)(colon - arg), arg, p + at);
> +}
> +
> +/*
> + * Redacts arguments in an argument list.
> + *
> + * Returns the original if nothing needed to be redacted.
> + * Otherwise, returns a new array that needs to be released
> + * via `free_redacted_argv()`.
> + */
> +static const char **redact_argv(const char **argv)
> +{
> +       int i, j;
> +       const char *redacted = NULL;
> +       const char **ret;
> +
> +       if (!trace2_redact)
> +               return argv;
> +
> +       for (i = 0; argv[i]; i++)
> +               if ((redacted = redact_arg(argv[i])) != argv[i])
> +                       break;
> +
> +       if (!argv[i])
> +               return argv;
> +
> +       for (j = 0; argv[j]; j++)
> +               ; /* keep counting */
> +
> +       ALLOC_ARRAY(ret, j + 1);
> +       ret[j] = NULL;
> +
> +       for (j = 0; j < i; j++)
> +               ret[j] = argv[j];
> +       ret[i] = redacted;
> +       for (++i; argv[i]; i++) {
> +               redacted = redact_arg(argv[i]);
> +               ret[i] = redacted ? redacted : argv[i];
> +       }
> +
> +       return ret;
> +}
> +
> +static void free_redacted_argv(const char **redacted, const char **argv)
> +{
> +       int i;
> +
> +       if (redacted != argv) {
> +               for (i = 0; argv[i]; i++)
> +                       if (redacted[i] != argv[i])
> +                               free((void *)redacted[i]);
> +               free((void *)redacted);
> +       }
> +}
> +
>  void trace2_cmd_start_fl(const char *file, int line, const char **argv)
>  {
>         struct tr2_tgt *tgt_j;
>         int j;
>         uint64_t us_now;
>         uint64_t us_elapsed_absolute;
> +       const char **redacted;
>
>         if (!trace2_enabled)
>                 return;
> @@ -260,10 +344,14 @@ void trace2_cmd_start_fl(const char *file, int line, const char **argv)
>         us_now = getnanotime() / 1000;
>         us_elapsed_absolute = tr2tls_absolute_elapsed(us_now);
>
> +       redacted = redact_argv(argv);
> +
>         for_each_wanted_builtin (j, tgt_j)
>                 if (tgt_j->pfn_start_fl)
>                         tgt_j->pfn_start_fl(file, line, us_elapsed_absolute,
> -                                           argv);
> +                                           redacted);
> +
> +       free_redacted_argv(redacted, argv);
>  }
>
>  void trace2_cmd_exit_fl(const char *file, int line, int code)
> @@ -409,6 +497,7 @@ void trace2_child_start_fl(const char *file, int line,
>         int j;
>         uint64_t us_now;
>         uint64_t us_elapsed_absolute;
> +       const char **orig_argv = cmd->args.v;
>
>         if (!trace2_enabled)
>                 return;
> @@ -419,10 +508,24 @@ void trace2_child_start_fl(const char *file, int line,
>         cmd->trace2_child_id = tr2tls_locked_increment(&tr2_next_child_id);
>         cmd->trace2_child_us_start = us_now;
>
> +       /*
> +        * The `pfn_child_start_fl` API takes a `struct child_process`
> +        * rather than a simple `argv` for the child because some
> +        * targets make use of the additional context bits/values. So
> +        * temporarily replace the original argv (inside the `strvec`)
> +        * with a possibly redacted version.
> +        */
> +       cmd->args.v = redact_argv(orig_argv);
> +
>         for_each_wanted_builtin (j, tgt_j)
>                 if (tgt_j->pfn_child_start_fl)
>                         tgt_j->pfn_child_start_fl(file, line,
>                                                   us_elapsed_absolute, cmd);
> +
> +       if (cmd->args.v != orig_argv) {
> +               free_redacted_argv(cmd->args.v, orig_argv);
> +               cmd->args.v = orig_argv;
> +       }
>  }
>
>  void trace2_child_exit_fl(const char *file, int line, struct child_process *cmd,
> @@ -493,6 +596,7 @@ int trace2_exec_fl(const char *file, int line, const char *exe,
>         int exec_id;
>         uint64_t us_now;
>         uint64_t us_elapsed_absolute;
> +       const char **redacted;
>
>         if (!trace2_enabled)
>                 return -1;
> @@ -502,10 +606,14 @@ int trace2_exec_fl(const char *file, int line, const char *exe,
>
>         exec_id = tr2tls_locked_increment(&tr2_next_exec_id);
>
> +       redacted = redact_argv(argv);
> +
>         for_each_wanted_builtin (j, tgt_j)
>                 if (tgt_j->pfn_exec_fl)
>                         tgt_j->pfn_exec_fl(file, line, us_elapsed_absolute,
> -                                          exec_id, exe, argv);
> +                                          exec_id, exe, redacted);
> +
> +       free_redacted_argv(redacted, argv);
>
>         return exec_id;
>  }
> @@ -637,13 +745,19 @@ void trace2_def_param_fl(const char *file, int line, const char *param,
>  {
>         struct tr2_tgt *tgt_j;
>         int j;
> +       const char *redacted;
>
>         if (!trace2_enabled)
>                 return;
>
> +       redacted = redact_arg(value);
> +
>         for_each_wanted_builtin (j, tgt_j)
>                 if (tgt_j->pfn_param_fl)
> -                       tgt_j->pfn_param_fl(file, line, param, value, kvi);
> +                       tgt_j->pfn_param_fl(file, line, param, redacted, kvi);
> +
> +       if (redacted != value)
> +               free((void *)redacted);
>  }
>
>  void trace2_def_repo_fl(const char *file, int line, struct repository *repo)
> --
> gitgitgadget

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jeff King wrote (reply to this):

On Thu, Nov 23, 2023 at 10:59:20AM -0800, Elijah Newren wrote:

> > Let's at least avoid logging such secrets via Trace2, much like we avoid
> > logging secrets in `http.c`. Much like the code in `http.c` is guarded
> > via `GIT_TRACE_REDACT` (defaulting to `true`), we guard the new code via
> > `GIT_TRACE2_REDACT` (also defaulting to `true`).
> 
> Training users is hard.  I appreciate the changes here to make trace2
> not be a leak vector, but is it time to perhaps consider bigger safety
> measures: At the clone/fetch level, automatically warn loudly whenever
> such a URL is provided, accompanied with a note that in the future it
> will be turned into a hard error?

Yes, the password in such a case ends up in the plaintext .git/config
file, which is not great.

There's some discussion and patches here:

  https://lore.kernel.org/git/nycvar.QRO.7.76.6.1905172121130.46@tvgsbejvaqbjf.bet/

I meant to follow up on them, but never did.

-Peff

test_description='test trace2 facility (normal target)'

TEST_PASSES_SANITIZE_LEAK=true
TEST_PASSES_SANITIZE_LEAK=false
. ./test-lib.sh

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A comment on the commit message:

s/The tests this commit/The new tests added in this commit/

Maybe drop the "Turn off TEST... in t0210 and t0211 tests." sentence, since it is explained in the next paragraph (and we only modified t0210 in this commit).


# Turn off any inherited trace2 settings for this test.
Expand Down Expand Up @@ -283,4 +283,22 @@ test_expect_success 'using global config with include' '
test_cmp expect actual
'

test_expect_success 'unsafe URLs are redacted by default' '
test_when_finished \
"rm -r trace.normal unredacted.normal clone clone2" &&

test_config_global \
"url.$(pwd).insteadOf" https://user:pwd@example.com/ &&
test_config_global trace2.configParams "core.*,remote.*.url" &&

GIT_TRACE2="$(pwd)/trace.normal" \
git clone https://user:pwd@example.com/ clone &&
! grep user:pwd trace.normal &&

GIT_TRACE2_REDACT=0 GIT_TRACE2="$(pwd)/unredacted.normal" \
git clone https://user:pwd@example.com/ clone2 &&
grep "start .* clone https://user:pwd@example.com" unredacted.normal &&
grep "remote.origin.url=https://user:pwd@example.com" unredacted.normal
'

test_done
120 changes: 117 additions & 3 deletions trace2.c
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@
#include "trace2/tr2_tmr.h"

static int trace2_enabled;
static int trace2_redact = 1;

static int tr2_next_child_id; /* modify under lock */
static int tr2_next_exec_id; /* modify under lock */
Expand Down Expand Up @@ -227,6 +228,8 @@ void trace2_initialize_fl(const char *file, int line)
if (!tr2_tgt_want_builtins())
return;
trace2_enabled = 1;
if (!git_env_bool("GIT_TRACE2_REDACT", 1))
trace2_redact = 0;

tr2_sid_get();

Expand All @@ -247,23 +250,108 @@ int trace2_is_enabled(void)
return trace2_enabled;
}

/*
* Redacts an argument, i.e. ensures that no password in
* https://user:password@host/-style URLs is logged.
*
* Returns the original if nothing needed to be redacted.
* Returns a pointer that needs to be `free()`d otherwise.
*/
static const char *redact_arg(const char *arg)
{
const char *p, *colon;
size_t at;

if (!trace2_redact ||
(!skip_prefix(arg, "https://", &p) &&
!skip_prefix(arg, "http://", &p)))
return arg;

at = strcspn(p, "@/");
if (p[at] != '@')
return arg;

colon = memchr(p, ':', at);
if (!colon)
return arg;

return xstrfmt("%.*s:<REDACTED>%s", (int)(colon - arg), arg, p + at);
}

/*
* Redacts arguments in an argument list.
*
* Returns the original if nothing needed to be redacted.
* Otherwise, returns a new array that needs to be released
* via `free_redacted_argv()`.
*/
static const char **redact_argv(const char **argv)
{
int i, j;
const char *redacted = NULL;
const char **ret;

if (!trace2_redact)
return argv;

for (i = 0; argv[i]; i++)
if ((redacted = redact_arg(argv[i])) != argv[i])
break;

if (!argv[i])
return argv;

for (j = 0; argv[j]; j++)
; /* keep counting */

ALLOC_ARRAY(ret, j + 1);
ret[j] = NULL;

for (j = 0; j < i; j++)
ret[j] = argv[j];
ret[i] = redacted;
for (++i; argv[i]; i++) {
redacted = redact_arg(argv[i]);
ret[i] = redacted ? redacted : argv[i];
}

return ret;
}

static void free_redacted_argv(const char **redacted, const char **argv)
{
int i;

if (redacted != argv) {
for (i = 0; argv[i]; i++)
if (redacted[i] != argv[i])
free((void *)redacted[i]);
free((void *)redacted);
}
}

void trace2_cmd_start_fl(const char *file, int line, const char **argv)
{
struct tr2_tgt *tgt_j;
int j;
uint64_t us_now;
uint64_t us_elapsed_absolute;
const char **redacted;

if (!trace2_enabled)
return;

us_now = getnanotime() / 1000;
us_elapsed_absolute = tr2tls_absolute_elapsed(us_now);

redacted = redact_argv(argv);

for_each_wanted_builtin (j, tgt_j)
if (tgt_j->pfn_start_fl)
tgt_j->pfn_start_fl(file, line, us_elapsed_absolute,
argv);
redacted);

free_redacted_argv(redacted, argv);
}

void trace2_cmd_exit_fl(const char *file, int line, int code)
Expand Down Expand Up @@ -409,6 +497,7 @@ void trace2_child_start_fl(const char *file, int line,
int j;
uint64_t us_now;
uint64_t us_elapsed_absolute;
const char **orig_argv = cmd->args.v;

if (!trace2_enabled)
return;
Expand All @@ -419,10 +508,24 @@ void trace2_child_start_fl(const char *file, int line,
cmd->trace2_child_id = tr2tls_locked_increment(&tr2_next_child_id);
cmd->trace2_child_us_start = us_now;

/*
* The `pfn_child_start_fl` API takes a `struct child_process`
* rather than a simple `argv` for the child because some
* targets make use of the additional context bits/values. So
* temporarily replace the original argv (inside the `strvec`)
* with a possibly redacted version.
*/
cmd->args.v = redact_argv(orig_argv);

for_each_wanted_builtin (j, tgt_j)
if (tgt_j->pfn_child_start_fl)
tgt_j->pfn_child_start_fl(file, line,
us_elapsed_absolute, cmd);

if (cmd->args.v != orig_argv) {
free_redacted_argv(cmd->args.v, orig_argv);
cmd->args.v = orig_argv;
}
}

void trace2_child_exit_fl(const char *file, int line, struct child_process *cmd,
Expand Down Expand Up @@ -493,6 +596,7 @@ int trace2_exec_fl(const char *file, int line, const char *exe,
int exec_id;
uint64_t us_now;
uint64_t us_elapsed_absolute;
const char **redacted;

if (!trace2_enabled)
return -1;
Expand All @@ -502,10 +606,14 @@ int trace2_exec_fl(const char *file, int line, const char *exe,

exec_id = tr2tls_locked_increment(&tr2_next_exec_id);

redacted = redact_argv(argv);

for_each_wanted_builtin (j, tgt_j)
if (tgt_j->pfn_exec_fl)
tgt_j->pfn_exec_fl(file, line, us_elapsed_absolute,
exec_id, exe, argv);
exec_id, exe, redacted);

free_redacted_argv(redacted, argv);

return exec_id;
}
Expand Down Expand Up @@ -637,13 +745,19 @@ void trace2_def_param_fl(const char *file, int line, const char *param,
{
struct tr2_tgt *tgt_j;
int j;
const char *redacted;

if (!trace2_enabled)
return;

redacted = redact_arg(value);

for_each_wanted_builtin (j, tgt_j)
if (tgt_j->pfn_param_fl)
tgt_j->pfn_param_fl(file, line, param, value, kvi);
tgt_j->pfn_param_fl(file, line, param, redacted, kvi);

if (redacted != value)
free((void *)redacted);
}

void trace2_def_repo_fl(const char *file, int line, struct repository *repo)
Expand Down
4 changes: 2 additions & 2 deletions trace2.h
Original file line number Diff line number Diff line change
Expand Up @@ -337,8 +337,8 @@ struct key_value_info;
void trace2_def_param_fl(const char *file, int line, const char *param,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Jeff Hostetler via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Jeff Hostetler <jeffhostetler@github.com>
>
> Add `struct key_value_info` argument to `trace2_def_param()`.
>
> In dc90208497 (trace2: plumb config kvi, 2023-06-28) a `kvi`
> argument was added to `trace2_def_param_fl()` but the macro
> was not up updated. Let's fix that.
>
> Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
> ---
>  trace2.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/trace2.h b/trace2.h
> index 40d8c2e02a5..1f0669bbd2d 100644
> --- a/trace2.h
> +++ b/trace2.h
> @@ -337,8 +337,8 @@ struct key_value_info;
>  void trace2_def_param_fl(const char *file, int line, const char *param,
>  			 const char *value, const struct key_value_info *kvi);
>  
> -#define trace2_def_param(param, value) \
> -	trace2_def_param_fl(__FILE__, __LINE__, (param), (value))
> +#define trace2_def_param(param, value, kvi) \
> +	trace2_def_param_fl(__FILE__, __LINE__, (param), (value), (kvi))

IOW, this macro was not used back when it was updated, and nobody
used it since then?  

I briefly wondered if we are better off removing it but that does
not make sense because you are adding a new (and only) user to it.

Will queue.  Thanks.

>  
>  /*
>   * Tell trace2 about a newly instantiated repo object and assign

const char *value, const struct key_value_info *kvi);

#define trace2_def_param(param, value) \
trace2_def_param_fl(__FILE__, __LINE__, (param), (value))
#define trace2_def_param(param, value, kvi) \
trace2_def_param_fl(__FILE__, __LINE__, (param), (value), (kvi))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commit is unrelated to the radacting theme of the series, should we put it first to better isolate it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!


/*
* Tell trace2 about a newly instantiated repo object and assign
Expand Down