-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
regression - glob expansion with brace expansion and parameter exapansion #660
Comments
Yikes. Thanks for the report. Will look into it ASAP. |
The nature of the bug seems to be that, if both a variable expansion (quoted or not, set or not) and a brace expansion are used, quoted wildcards are subject to expansion. |
The closest I've been able to come to fixing this so far is the patch below. All the above reproducers act correctly with this patch and all the current regression tests pass. diff --git a/src/cmd/ksh93/sh/macro.c b/src/cmd/ksh93/sh/macro.c
index 44e48bb92..8717b62e7 100644
--- a/src/cmd/ksh93/sh/macro.c
+++ b/src/cmd/ksh93/sh/macro.c
@@ -2583,7 +2583,7 @@ static void endfield(Mac_t *mp,int split)
mp->atmode = 0;
if(mp->patfound)
{
- int musttrim = mp->wasexpan && !mp->noextpat && strchr(argp->argval,'\\');
+ int musttrim = mp->wasexpan && !mp->quoted && !mp->noextpat && strchr(argp->argval,'\\');
sh.argaddr = 0;
#if SHOPT_BRACEPAT
/* in POSIX mode, disallow brace expansion for unquoted expansions */ Unfortunately the patch also reintroduces #549 for cases where any part of the word is quoted -- even if the quoted part is empty. For example:
The fundamental problem is that, by the time endfield() is reached, we have a complete word and we can no longer distinguish between the quoted and unquoted parts of it. The Mac_t flags apply to the entire word. |
From issue #660: Globs are not expanded with the following: % echo "/"{bin,sbin}"/*" /bin/* /sbin/* % v=/; echo "$v"bin"/*" /bin/* But globbing is unexpectedly performed if both parameter substitution and brace expansion are present: % v=/; echo "$v"{bin,sbin}"/*" [output omitted for the sake of brevity] Obtained from: ksh93/ksh#660 MFH: 2023Q3
From issue #660: Globs are not expanded with the following: % echo "/"{bin,sbin}"/*" /bin/* /sbin/* % v=/; echo "$v"bin"/*" /bin/* But globbing is unexpectedly performed if both parameter substitution and brace expansion are present: % v=/; echo "$v"{bin,sbin}"/*" [output omitted for the sake of brevity] Obtained from: ksh93/ksh#660
Patch version two. In mac_copy(), don't internally backslash-escape a backslash in a glob pattern bracket expression. This also fixes the #549 regression reintroduced by the previous patch, at least for standard glob patterns. For this, we need to use the Patch v2diff --git a/src/cmd/ksh93/sh/macro.c b/src/cmd/ksh93/sh/macro.c
index 44e48bb92..29ac26aed 100644
--- a/src/cmd/ksh93/sh/macro.c
+++ b/src/cmd/ksh93/sh/macro.c
@@ -76,6 +76,7 @@ typedef struct _mac_
char macsub; /* set to 1 when running mac_substitute */
int dotdot; /* set for .. in subscript */
void *nvwalk; /* for name space walking */
+ char bracketexpr; /* set when in [brackets] within a non-ERE glob pattern */
} Mac_t;
#undef ESCAPE
@@ -437,7 +438,6 @@ static void copyto(Mac_t *mp,int endch, int newquote)
char oldquote = mp->quote; /* save "double quoted" state */
char ansi_c = 0; /* set when processing ANSI C escape codes */
int32_t ere = 0; /* bitmask of pattern options indicating an extended regular expression */
- char bracketexpr = 0; /* set when in [brackets] within a non-ERE glob pattern */
Sfio_t *sp = mp->sp;
Stk_t *stkp = sh.stk;
char *resume = 0;
@@ -533,7 +533,7 @@ static void copyto(Mac_t *mp,int endch, int newquote)
if(mp->pattern)
{
/* preserve \ for escaping glob pattern bracket expression operators */
- if(bracketexpr && n==S_BRAOP)
+ if(mp->bracketexpr && n==S_BRAOP)
break;
/* preserve \digit for pattern matching */
/* also \alpha for extended patterns */
@@ -636,8 +636,8 @@ static void copyto(Mac_t *mp,int endch, int newquote)
mp->pattern = c;
break;
case S_ENDCH:
- if(bracketexpr && cp[-1]==RBRACT && !(mp->quote || mp->lit))
- bracketexpr--;
+ if(mp->bracketexpr && cp[-1]==RBRACT && !(mp->quote || mp->lit))
+ mp->bracketexpr--;
if((mp->lit || cp[-1]!=endch || mp->quote!=newquote))
goto pattern;
if(endch==RBRACE && mp->pattern && brace)
@@ -738,12 +738,12 @@ static void copyto(Mac_t *mp,int endch, int newquote)
cp = first = fcseek(0);
break;
}
- if(mp->pattern==1 && !ere && !bracketexpr)
+ if(mp->pattern==1 && !ere && !mp->bracketexpr)
{
- bracketexpr++;
+ mp->bracketexpr++;
/* a ] following [, as in []abc], should not close the bracket expression */
if(cp[0]==RBRACT && cp[1])
- bracketexpr++;
+ mp->bracketexpr++;
}
/* FALLTHROUGH */
case S_PAT:
@@ -883,7 +883,7 @@ static void copyto(Mac_t *mp,int endch, int newquote)
break;
case S_BRAOP:
/* escape a quoted !^- within a bracket expression */
- if(!bracketexpr || !(mp->quote || mp->lit))
+ if(!mp->bracketexpr || !(mp->quote || mp->lit))
continue;
if(c)
sfwrite(stkp,first,c);
@@ -2481,7 +2481,10 @@ static void mac_copy(Mac_t *mp,const char *str, int size)
continue;
}
if(n==S_ESC)
- sfputc(stkp,ESCAPE);
+ {
+ if(!mp->bracketexpr)
+ sfputc(stkp,ESCAPE);
+ }
else if(n==S_EPAT)
{
/* don't allow extended patterns in this case */
@@ -2583,7 +2586,7 @@ static void endfield(Mac_t *mp,int split)
mp->atmode = 0;
if(mp->patfound)
{
- int musttrim = mp->wasexpan && !mp->noextpat && strchr(argp->argval,'\\');
+ int musttrim = mp->wasexpan && !mp->quoted && !mp->noextpat && strchr(argp->argval,'\\');
sh.argaddr = 0;
#if SHOPT_BRACEPAT
/* in POSIX mode, disallow brace expansion for unquoted expansions */ edit: It's broken, do not apply |
a quick test runs as expected I noticed this bug in an automated script thank you! |
It still isn't right. With patch v2 (and before):
That pattern should not resolve to the file |
That is just going to have to be a separate bug. And I'm not sure it will ever be fixed without redesigning the entire split/glob mechanism from the ground up. |
Argh. No it doesn't fix those. I don't know what made me think that. I must have been testing the wrong binary. |
So, patch v2 is total nonsense. We cannot use the bracketexpr flag from copyto() in mac_copy(), becasue mac_copy() is called at S_EOF, meaning at the end of processing. I have a new patch where mac_copy() does its own detection of a bracket expression, making that fix work: Patch v3 (also broken, do not apply)diff --git a/src/cmd/ksh93/sh/macro.c b/src/cmd/ksh93/sh/macro.c
index 44e48bb92..9d19df04a 100644
--- a/src/cmd/ksh93/sh/macro.c
+++ b/src/cmd/ksh93/sh/macro.c
@@ -2451,6 +2451,7 @@ static void mac_copy(Mac_t *mp,const char *str, int size)
}
else if(!mp->quote && mp->split && (mp->ifs||mp->pattern))
{
+ char bracketexpr = 0; /* set when in [brackets] */
/* split words at ifs characters */
state = sh.ifstable;
if(mp->pattern)
@@ -2467,6 +2468,8 @@ static void mac_copy(Mac_t *mp,const char *str, int size)
if(state[c]==0)
state[c] = S_PAT;
}
+ if(state[RBRACT]==0 && state[LBRACT]==S_PAT)
+ state[RBRACT] = S_ENDCH;
if(state[ESCAPE]==0)
state[ESCAPE] = S_ESC;
}
@@ -2481,7 +2484,10 @@ static void mac_copy(Mac_t *mp,const char *str, int size)
continue;
}
if(n==S_ESC)
- sfputc(stkp,ESCAPE);
+ {
+ if(!bracketexpr)
+ sfputc(stkp,ESCAPE);
+ }
else if(n==S_EPAT)
{
/* don't allow extended patterns in this case */
@@ -2491,7 +2497,21 @@ static void mac_copy(Mac_t *mp,const char *str, int size)
mp->noextpat = 1;
}
else if(n==S_PAT)
+ {
mp->patfound = mp->pattern;
+ if(!bracketexpr && mp->pattern)
+ {
+ bracketexpr++;
+ /* a ] following [, as in []abc], should not close the bracket expression */
+ if(cp[0]==RBRACT && cp[1])
+ bracketexpr++;
+ }
+ }
+ else if(n==S_ENDCH)
+ {
+ if(bracketexpr)
+ bracketexpr--;
+ }
else if(n && mp->ifs)
{
if(mbwide() && n==S_MBYTE)
@@ -2557,8 +2577,10 @@ static void mac_copy(Mac_t *mp,const char *str, int size)
if(state[c]==S_PAT)
state[c] = 0;
}
- if(sh.ifstable[ESCAPE]==S_ESC)
- sh.ifstable[ESCAPE] = 0;
+ if(state[RBRACT]==S_ENDCH)
+ state[RBRACT] = 0;
+ if(state[ESCAPE]==S_ESC)
+ state[ESCAPE] = 0;
}
}
else
@@ -2583,7 +2605,7 @@ static void endfield(Mac_t *mp,int split)
mp->atmode = 0;
if(mp->patfound)
{
- int musttrim = mp->wasexpan && !mp->noextpat && strchr(argp->argval,'\\');
+ int musttrim = mp->wasexpan && !mp->quoted && !mp->noextpat && strchr(argp->argval,'\\');
sh.argaddr = 0;
#if SHOPT_BRACEPAT
/* in POSIX mode, disallow brace expansion for unquoted expansions */
diff --git a/src/cmd/ksh93/tests/glob.sh b/src/cmd/ksh93/tests/glob.sh
index ae4a8e101..d95eeae42 100755
--- a/src/cmd/ksh93/tests/glob.sh
+++ b/src/cmd/ksh93/tests/glob.sh
@@ -471,6 +471,8 @@ test_glob '<~(K)[]-z]>' ~(K)["]-z"]
unquoted_patvar='[\!N]'; test_glob '<[\!N]>' $unquoted_patvar
unquoted_patvar='[\^N]'; test_glob '<[\^N]>' $unquoted_patvar
unquoted_patvar='[a\-c]'; test_glob '<[a\-c]>' $unquoted_patvar
+unquoted_patvar='[a\-c]'; test_glob '<[a\-c]>' ""$unquoted_patvar
+unquoted_patvar='[a\-c]'; test_glob '<[a\-c]>' $unquoted_patvar""
: > -; test_glob '<->' $unquoted_patvar
: > a > c; test_glob '<-> <a> <c>' $unquoted_patvar
: > !; unquoted_patvar='[\!N]'; test_glob '<!>' $unquoted_patvar But that doesn't fix the #549 regression reintroduced by patch v1 either, it just breaks it in the opposite way:
And, as a bonus, we get a lot of regression test failures as well:
|
As far as I can tell, the "design" of the split/glob mechanism, which is decades of kludge upon kludge upon kludge etc., is simply incompatible with correct operation. Any change in the internal backslash escaping mechanism will introduce regressions. At least I sure don't know how to fix things. And they never did get it right at AT&T. This is simply hopeless. I'm throwing in the towel. The first, simple patch above at least improves things, and fixes the unacceptable regression that @dicktyr reported. So I'm just going to commit it, make an urgent 1.0.6 release, and hope there aren't any other serious issues left to be discovered. And we're just going to have to live with those corner case bugs involving backslashes. I cannot fix them. A comprehensive redesign is needed where that internal backslash escaping mechanism is replaced by something completely different. But that is a pipe dream, at least for the time being. |
@dicktyr (Richard Taityr) reports: > globs are not expanded with the following: > > % echo "/"{bin,sbin}"/*" > /bin/* /sbin/* > > % v=/; echo "$v"bin"/*" > /bin/* > > but globbing is unexpectedly performed if both parameter > substitution and brace expansion are present: > > % v=/; echo "$v"{bin,sbin}"/*" > [output omitted for the sake of brevity] So, quoted pattern characters are expanded. No es bueno. The closest I've been able to come to fixing this so far is: src/cmd/ksh93/sh/macro.c: endfield(): - Do not set musttrim if mp->quoted is set, i.e., if the argument node contains any quoted parts. This fixes the major bug. But it also partially reintroduces the previously fixed bug <#549> for cases where any part of the word is quoted -- even if the quoted part is empty. For example: $ mkdir testdir $ cd testdir $ touch a b c ./- $ p='[a\-c]' $ echo $p # OK - a c $ echo ""$p # BUG a b c $ echo $p"" # BUG a b c The fundamental problem is that, by the time endfield() is reached, we have a complete word and we can no longer distinguish between the quoted and unquoted parts of it. The Mac_t flags, such as mp->quoted, apply to the entire word. This is a fundamental flaw in the current design, where quotes are internally translated to backslash escapes for certain characters. To the best of my knowledge (and I've tried hard), it is not possible to fix this without introducing other regressions. A radical redesign of this entire mechanism is needed where this internal backslash escaping is replaced by a way for every 'struct argnod' argument node to track for each individual character whether it is quoted or not, without modifying the argument string itself. Because it is incorrect to modify the argument string. But, for the foreseeable future, that is a pipe dream, because there is no one who fully understands all this code with all its myriad kludges and workarounds going decades back. Resolves: #660
From the announcement: This is an urgent bugfix release that fixes a serious regression in pathname expansion, see: ksh93/ksh#660 The previous 1.0.5 release is withdrawn and should not be used. Main changes between 1.0.5 and 1.0.6: - Fixed a serious regression in pathname expansion where quoted wildcard characters were incorrectly expanded if a pattern contains both a brace expansion and a variable expansion. - Fixed a bug where the command to launch a full-screen editor (^X^E in emacs and 'v' in vi) could cause the wrong command line to be edited if two shell sessions share a .sh_history file. 1.0.5 had a large amount of bugfixes compared to 93u+m/1.0.4. In summary: - Fixed various bugs causing crashes. - Fixed many bugs in the emacs and vi line editors, in command completion, and in file name completion. - Fixed various bugs in the handling of quotes, backslash escapes and braces when processing shell glob patterns (e.g. in pathname expansion and 'case'). - ksh now throws a panic and exits if a read error (such as an I/O error) occurs while trying to read the next command(s) from a running script. - Fixed many bugs in 'printf' and 'print -f' built-in commands, including: . Multiple bugs causing incorrect output for relative date specifications, e.g., printf %T\\n 'exactly 20 months ago' now outputs a correct result. . More printf bugs with mix and match of % and %x$. . A data corruption bug when using %B with 'printf -v varname'. . A bug causing double evaluation of arithmetic expressions. - Fixed a bug where 'unset -f commandname', executed in a subshell, hides any built-in command by the same name for the duration of that subshell. - Fixed ${var/#/string} and ${var/%/string} (with anchored empty pattern) to work as on mksh, bash and zsh; these are no longer ineffective. - Fixed incorrect result of array slicing ${array[@]:offset:length} where 'length' is a nested expansion involving an array. - Command names can now end in ':' as they can on other shells. - Fixed a spurious syntax error in compound assignments upon encountering a pair of repeated opening parentheses '(('. - Fixed spurious syntax error in ${parameter:offset:length}: the arithmetic expressions 'offset' and 'length' may now contain the operators ( ) & |. - Fixed a parsing bug in the declaration of .sh.math.* arithmetic functions. - Fixed nameref self-reference loop detection for more than two namerefs. - Several improvements to the POSIX compatibility mode. - Many more minor and/or esoteric bugfixes.
This includes an urgent bugfix release that fixes a serious regression in pathname expansion, see: ksh93/ksh#660
release 1.0.5 appears to have introduced a regression
globs are not expanded with the following:
but globbing is unexpectedly performed if both parameter substitution and brace expansion are present:
The text was updated successfully, but these errors were encountered: