-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
echo ˋls "fooˋ
should be a syntax error
#199
Comments
"ls foo
should be a syntax errorls "foo
should be a syntax error
Wow, and
|
And so do
|
Unbelievable: this is deliberate, it's by design. This code specifically refuses to throw a syntax error on EOF and unbalanced quotes if we're running code from a string (such as when doing Lines 448 to 457 in 7ad274f
The balance struct member is specifically for internally adding the missing quote: Lines 1321 to 1326 in 7ad274f
This must be another hack for compatibility with an antiquated shell bug. Sure enough, checking So, this is Historical Behaviour™. And we probably break a few old ksh scripts if we fix it. <sigh> In fact, fixing it (by removing the check for parsing from a string) breaks a few regression tests in basic.sh and bracket.sh. Those tests all have syntax errors in them due to unbalanced quotes. But I'm going to fix this bug in the posix mode at least. That's really turning out to be the sanity mode for this shell. I'll fix the regression tests, too. |
Interesting, FWIW, pbosh and bosh (Schily tools shell, evolved from Solaris SVR4 sh - http://schilytools.sourceforge.net/bosh.html ) also exhibit the same behavior, also in POSIX mode. It should probably be considered a bug in POSIX mode. At the very least I don't recall that posix mentions auto balancing. Except for AT&T ksh variants (including 2020) and the Schily shells, all other shells I have access to throw a syntax error on |
It is most definitely not specified by POSIX. :) If it were, it would have to work for regular scripts executed from files as well, which it doesn't:
However, that does execute on the old Bourne shell. Which is more evidence that it's an unintentional bug in the Bourne shell, and that the ksh93 backporting of that bug is deliberate and specific. Here is the source code for the Tenth Edition UNIX Bourne shell. I can find no evidence in it that the behaviour is deliberate. If there is, it should be somewhere in word.c which looks like it's the ancestor of lex.c and parse.c in ksh93. But I think it's a bug that people apparently started relying on so much that Korn felt compelled to unfix it for these specific use cases. Yech. However, I also found actual unintended breakage caused by this hack: Unbalanced arithmetic command:
If an external command Unbalanced arithmetic expansion:
Doesn't even execute the arithmetic expansion. Neither the arithmetic command nor arithmetic expansion exist in the old Bourne shell. The code is trying to compensate for unbalanced parentheses and braces as well. Even the old Bourne shell never accepts these under any conditions. Apparently single parentheses and braces are caught regardless, elsewhere in the code:
So the code that tries to compensate for those is pointless, causes breakage that has nothing to do with Bourne shell compatibility, and should be removed. Only unbalanced quotes should be compensated for in non-POSIX mode. |
This fixes the syntax errors in the regression tests. diff --git a/src/cmd/ksh93/tests/basic.sh b/src/cmd/ksh93/tests/basic.sh
index 5e894bb..18fd83d 100755
--- a/src/cmd/ksh93/tests/basic.sh
+++ b/src/cmd/ksh93/tests/basic.sh
@@ -357,8 +357,8 @@ fi
print cat > $tmp/scriptx
chmod +x $tmp/scriptx
[[ $($SHELL -c "print foo | $tmp/scriptx ;:" 2> /dev/null ) == foo ]] || err_exit 'piping into script fails'
-[[ $($SHELL -c 'X=1;print -r -- ${X:=$(expr "a(0)" : '"'a*(\([^)]\))')}'" 2> /dev/null) == 1 ]] || err_exit 'x=1;${x:=$(..."...")} failure'
-[[ $($SHELL -c 'print -r -- ${X:=$(expr "a(0)" : '"'a*(\([^)]\))')}'" 2> /dev/null) == 0 ]] || err_exit '${x:=$(..."...")} failure'
+[[ $($SHELL -c 'X=1;print -r -- ${X:=$(expr "a(0)" : '"'a*(\([^)]\))')}" 2> /dev/null) == 1 ]] || err_exit 'x=1;${x:=$(..."...")} failure'
+[[ $($SHELL -c 'print -r -- ${X:=$(expr "a(0)" : '"'a*(\([^)]\))')}" 2> /dev/null) == 0 ]] || err_exit '${x:=$(..."...")} failure'
if cat /dev/fd/3 >/dev/null 2>&1 || whence mkfifo > /dev/null
then [[ $(cat <(print hello) ) == hello ]] || err_exit "process substitution not working outside for or while loop"
$SHELL -c '[[ $(for i in 1;do cat <(print hello);done ) == hello ]]' 2> /dev/null|| err_exit "process substitution not working in for or while loop"
diff --git a/src/cmd/ksh93/tests/substring.sh b/src/cmd/ksh93/tests/substring.sh
index 7b65944..36207dc 100755
--- a/src/cmd/ksh93/tests/substring.sh
+++ b/src/cmd/ksh93/tests/substring.sh
@@ -587,16 +587,16 @@ done
#multibyte locale tests
if((SHOPT_MULTIBYTE)); then
-x='a<2b|>c<3d|\>e' LC_ALL=debug $SHELL -c 'test "${x:0:1}" == a || err_exit ${x:0:1} should be a'
-x='a<2b|>c<3d|\>e' LC_ALL=debug $SHELL -c 'test "${x:1:1}" == "<2b|>" || err_exit ${x:1:1} should be <2b|>'
-x='a<2b|>c<3d|\>e' LC_ALL=debug $SHELL -c 'test "${x:3:1}" == "<3d|\\>" || err_exit ${x:3:1} should be <3d|\>'
-x='a<2b|>c<3d|\>e' LC_ALL=debug $SHELL -c 'test "${x:4:1}" == e || err_exit ${x:4:1} should bee'
-x='a<2b|>c<3d|\>e' LC_ALL=debug $SHELL -c 'test "${x:1}" == "<2b|>c<3d|\\>e" || print -u2 ${x:1}" should be <2b|>c<3d|\>e'
-x='a<2b|>c<3d|\>e' LC_ALL=debug $SHELL -c 'test "${x: -1:1}" == e || err_exit ${x: -1:1} should be e'
-x='a<2b|>c<3d|\>e' LC_ALL=debug $SHELL -c 'test "${x: -2:1}" == "<3d|\\>" || err_exit ${x: -2:1} == <3d|\>'
-x='a<2b|>c<3d|\>e' LC_ALL=debug $SHELL -c 'test "${x:1:3}" == "<2b|>c<3d|\\>" || err_exit ${x:1:3} should be <2b|>c<3d|\>'
-x='a<2b|>c<3d|\>e' LC_ALL=debug $SHELL -c 'test "${x:1:20}" == "<2b|>c<3d|\\>e" || err_exit ${x:1:20} should be <2b|>c<3d|\>e'
-x='a<2b|>c<3d|\>e' LC_ALL=debug $SHELL -c 'test "${x#??}" == "c<3d|\\>e" || err_exit "${x#??} should be c<3d|\>e'
+x='a<2b|>c<3d|\>e' LC_ALL=debug $SHELL -c 'test "${x:0:1}" == a' || err_exit '${x:0:1} should be a'
+x='a<2b|>c<3d|\>e' LC_ALL=debug $SHELL -c 'test "${x:1:1}" == "<2b|>"' || err_exit '${x:1:1} should be <2b|>'
+x='a<2b|>c<3d|\>e' LC_ALL=debug $SHELL -c 'test "${x:3:1}" == "<3d|\\>"' || err_exit '${x:3:1} should be <3d|\>'
+x='a<2b|>c<3d|\>e' LC_ALL=debug $SHELL -c 'test "${x:4:1}" == e' || err_exit '${x:4:1} should be e'
+x='a<2b|>c<3d|\>e' LC_ALL=debug $SHELL -c 'test "${x:1}" == "<2b|>c<3d|\\>e"' || err_exit '${x:1} should be <2b|>c<3d|\>e'
+x='a<2b|>c<3d|\>e' LC_ALL=debug $SHELL -c 'test "${x: -1:1}" == e' || err_exit '${x: -1:1} should be e'
+x='a<2b|>c<3d|\>e' LC_ALL=debug $SHELL -c 'test "${x: -2:1}" == "<3d|\\>"' || err_exit '${x: -2:1} should be <3d|\>'
+x='a<2b|>c<3d|\>e' LC_ALL=debug $SHELL -c 'test "${x:1:3}" == "<2b|>c<3d|\\>"' || err_exit '${x:1:3} should be <2b|>c<3d|\>'
+x='a<2b|>c<3d|\>e' LC_ALL=debug $SHELL -c 'test "${x:1:20}" == "<2b|>c<3d|\\>e"' || err_exit '${x:1:20} should be <2b|>c<3d|\>e'
+x='a<2b|>c<3d|\>e' LC_ALL=debug $SHELL -c 'test "${x#??}" == "c<3d|\\>e"' || err_exit '${x#??} should be c<3d|\>e'
fi # SHOPT_MULTIBYTE
x='a one and a two' |
Bash documentation specifically mentions this behavior, although doesn't call out ksh for it.
From https://tiswww.case.edu/php/chet/bash/bashref.html#Implementation-Differences-From-The-SVR4_002e2-Shell
"Bash does not allow unbalanced quotes. The SVR4.2 shell will silently insert a needed closing quote at `EOF` under certain circumstances. This can be the cause of some hard-to-find errors."
That makes it *sound* deliberate on the part of Bourne sh.
…--
Marc Wilson
posguy99@gmail.com
On Sat, Feb 27, 2021, at 10:14 PM, Martijn Dekker wrote:
Unbelievable: this is deliberate, it's by design.
This code specifically refuses to throw a syntax error on EOF and unbalanced quotes if we're running code from a string (such as when doing `eval`, a `-c` script, or a backtick command substitution). https://github.com/ksh93/ksh/blob/7ad274f8b660879b5535af3f2d8fd3cb86ef2684/src/cmd/ksh93/sh/lex.c#L448-L457
The `balance` struct member is specifically for internally adding the missing quote: https://github.com/ksh93/ksh/blob/7ad274f8b660879b5535af3f2d8fd3cb86ef2684/src/cmd/ksh93/sh/lex.c#L1321-L1326
This must be another hack for compatibility with an antiquated shell bug.
Sure enough, checking `/bin/sh` on Solaris 10.1, which is an original Bourne shell, shows identical behaviour on that. It wrongly accepts unbalanced quotes in the same contexts.
So, this is Historical Behaviour™. And we probably break a few old ksh scripts if we fix it.
In fact, fixing it (by removing the check for parsing from a string) breaks a few regression tests in basic.sh and bracket.sh. Those tests all have syntax errors in them due to unbalanced quotes.
But I'm going to fix this bug in the posix mode at least. That's really turning out to be the sanity mode for this shell. I'll fix the regression tests, too.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub <#199 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AADDYRBDVBQTBCITYEYRB73TBHNOHANCNFSM4YKPV2LQ>.
|
Thanks @posguy99. Yes it does. But I can't find anything in the Bourne shell source code that actually inserts the closing quote or backtick. It looks more like it simply fails to check for a closing quote or backtick. There is another problem: an interaction with the fix for a memory leak in backtick-style command substitutions in combination with aliases. That fix is this, and I still don't understand why it works: Line 2121 in 7ad274f
setupalias() in lex.c to be closed properly, fixing a memory leak. This fix is part of the 93v- beta and ksh2020, as well.
The problem is that this introduces a side effect to this rebalancing quotes compatibility back. This 2015 mailing list post details the side effect. Basically, the added linefeed character becomes part of any quoted string whose quotes are not closed properly. That is wrong, and causes strange corner case bugs. That mailing list post suggests a different fix, and it does also work to fix the memory leak. However, it introduces a different corner-case regression instead. This test, which didn't exist at the time that was written, fails with that fix, because a spurious space gets somehow inserted between So it seems to me we can't win here. We either apply a fix for the memory leak and get one corner case bug or another, or we don't and we have a memory leak. The way I see it, the only realistic way out of this is to actually enforce correct syntax. Tolerating syntax errors is just asking for weird side effects, inconsistent states, and corner case bugs. We should not want to do that. Old scripts that rely on this will just need to be fixed. It's not rocket science. |
I tend to agree. Would be interesting to be able to find a real-world script or class of scripts which made use of this feature/bug. |
Already found some in the regression tests. Not that I'd say they "make use of" the bug – those tests were just written wrong. The basic.sh ones just had a spurious extra single quote, clearly inserted because someone miscounted the quotes. But the substring.sh tests never worked at all, they just failed to throw an error. And that is a good illustration of why this hack is a bad thing. It masks bugs (as the bash documentation entry posted by @posguy99 also points out). Unbalanced quotes were never tolerated in the main script itself, thankfully (unlike in the Bourne shell). The things that work now and would fail are backtick-style command substitutions, For reference and curiosity's sake, here's a patch that disables this hack for POSIX mode only, and also fixes up the hack to avoid that bug with arithmetic commands and expansions. But, as said, I'm pretty sure I would prefer to remove the hack instead, as it was problematic to begin with, and the memory leak fix further broke it. diff --git a/src/cmd/ksh93/sh/lex.c b/src/cmd/ksh93/sh/lex.c
index 0fc46c4..00cd7e5 100644
--- a/src/cmd/ksh93/sh/lex.c
+++ b/src/cmd/ksh93/sh/lex.c
@@ -326,7 +326,6 @@ int sh_lex(Lex_t* lp)
Stk_t *stkp = shp->stk;
int inlevel=lp->lexd.level, assignment=0, ingrave=0;
int epatchar=0;
- Sfio_t *sp;
#if SHOPT_MULTIBYTE
LEN=1;
#endif /* SHOPT_MULTIBYTE */
@@ -397,7 +396,6 @@ int sh_lex(Lex_t* lp)
fcseek(-LEN);
goto breakloop;
case S_EOF:
- sp = fcfile();
if((n=lexfill(lp)) > 0)
{
fcseek(-1);
@@ -424,38 +422,36 @@ int sh_lex(Lex_t* lp)
return(lp->token=EOFSYM);
if(mode >ST_NORM && lp->lexd.level>0)
{
+ /*
+ * Compatibility with ancient Bourne shell breakage: if we're analysing a
+ * string ('eval', -c script, backtick comsub), only throw a syntax error
+ * for unbalanced quotes or backticks if we're in POSIX sanity mode.
+ */
switch(c=endchar(lp))
{
case '$':
if(mode==ST_LIT)
{
- c = '\'';
+ lp->lexd.balance = '\'';
break;
}
mode = oldmode(lp);
poplevel(lp);
continue;
- case RBRACT:
- c = LBRACT;
- break;
case 1: /* for ((...)) */
- case RPAREN:
c = LPAREN;
break;
- default:
- c = LBRACE;
- break;
case '"': case '`': case '\'':
lp->lexd.balance = c;
break;
}
- if(sp && !(sfset(sp,0,0)&SF_STRING))
+ if(sh_isoption(SH_POSIX) ||
+ !lp->lexd.balance && (c==LPAREN || fcfile() && !(sfset(fcfile(),0,0)&SF_STRING)))
{
lp->lasttok = c;
lp->token = EOFSYM;
sh_syntax(lp);
}
- lp->lexd.balance = c;
}
goto breakloop;
case S_COM: |
src/cmd/ksh93/tests/basic.sh: - Fix syntax error (unbalanced single quote) in two -c script invocations. It only failed to throw a syntax error due to a problematic hack in ksh that may be removed soon. See: #199 src/cmd/ksh93/tests/builtins.sh, src/cmd/ksh93/tests/io.sh: - Redirect standard error on two ksh -i invocations to /dev/null to work around the test hanging on AIX. src/cmd/ksh93/tests/comvario.sh: - Remove duplicate copyright header. - Fix warning format. src/cmd/ksh93/tests/functions.sh: - Fix the 'TERM signal sent to last process of function kills the script' test so that it works on AIX. We cannot rely on grepping 'ps' output as the external 'sleep' command does not show the command name on AIX. Instead, find it by its parent PID. src/cmd/ksh93/tests/locale.sh, src/cmd/ksh93/tests/substring.sh: - Rewrite the very broken multibyte locale tests (two outright syntax errors due to unbalanced quotes, and none of the tests actually worked). - Since they set LC_ALL, move them to locale.sh. src/cmd/ksh93/tests/variables.sh: - Redirect stderr on some 'ulimit -t unlimited' invocations (which fork subshells as the intended side effect) to /dev/null in case that exceeds a system-defined limit.
src/cmd/ksh93/tests/basic.sh: - Fix syntax error (unbalanced single quote) in two -c script invocations. It only failed to throw a syntax error due to a problematic hack in ksh that may be removed soon. See: #199 src/cmd/ksh93/tests/builtins.sh, src/cmd/ksh93/tests/io.sh: - Redirect standard error on two ksh -i invocations to /dev/null to work around the test hanging on AIX. src/cmd/ksh93/tests/comvario.sh: - Remove duplicate copyright header. - Fix warning format. src/cmd/ksh93/tests/functions.sh: - Fix the 'TERM signal sent to last process of function kills the script' test so that it works on AIX. We cannot rely on grepping 'ps' output as the external 'sleep' command does not show the command name on AIX. Instead, find it by its parent PID. src/cmd/ksh93/tests/locale.sh, src/cmd/ksh93/tests/substring.sh: - Rewrite the very broken multibyte locale tests (two outright syntax errors due to unbalanced quotes, and none of the tests actually worked). - Since they set LC_ALL, move them to locale.sh. src/cmd/ksh93/tests/variables.sh: - Redirect stderr on some 'ulimit -t unlimited' invocations (which fork subshells as the intended side effect) to /dev/null in case that exceeds a system-defined limit.
src/cmd/ksh93/tests/basic.sh: - Fix syntax error (unbalanced single quote) in two -c script invocations. It only failed to throw a syntax error due to a problematic hack in ksh that may be removed soon. See: #199 src/cmd/ksh93/tests/builtins.sh, src/cmd/ksh93/tests/io.sh: - Redirect standard error on two ksh -i invocations to /dev/null to work around the test hanging on AIX. src/cmd/ksh93/tests/comvario.sh: - Remove duplicate copyright header. - Fix warning format. src/cmd/ksh93/tests/functions.sh: - Fix the 'TERM signal sent to last process of function kills the script' test so that it works on AIX. We cannot rely on grepping 'ps' output as the external 'sleep' command does not show the command name on AIX. Instead, find it by its parent PID. src/cmd/ksh93/tests/locale.sh, src/cmd/ksh93/tests/substring.sh: - Rewrite the very broken multibyte locale tests (two outright syntax errors due to unbalanced quotes, and none of the tests actually worked). - Since they set LC_ALL, move them to locale.sh. src/cmd/ksh93/tests/variables.sh: - Redirect stderr on some 'ulimit -t unlimited' invocations (which fork subshells as the intended side effect) to /dev/null in case that exceeds a system-defined limit.
This is the patch that removes the hack altogether. diff --git a/src/cmd/ksh93/sh/lex.c b/src/cmd/ksh93/sh/lex.c
index 0fc46c4..1b7cee9 100644
--- a/src/cmd/ksh93/sh/lex.c
+++ b/src/cmd/ksh93/sh/lex.c
@@ -92,7 +92,6 @@ struct lexdata
char nested_tilde;
char *docend;
char noarg;
- char balance;
char warn;
char message;
char arith;
@@ -277,7 +276,7 @@ Lex_t *sh_lexopen(Lex_t *lp, Shell_t *sp, int mode)
lp->lexd.warn=1;
if(!mode)
{
- lp->lexd.noarg = lp->lexd.level= lp->lexd.dolparen = lp->lexd.balance = 0;
+ lp->lexd.noarg = lp->lexd.level= lp->lexd.dolparen = 0;
lp->lexd.nocopy = lp->lexd.docword = lp->lexd.nest = lp->lexd.paren = 0;
lp->lexd.lex_state = lp->lexd.lastc=0;
lp->lexd.docend = 0;
@@ -326,7 +325,6 @@ int sh_lex(Lex_t* lp)
Stk_t *stkp = shp->stk;
int inlevel=lp->lexd.level, assignment=0, ingrave=0;
int epatchar=0;
- Sfio_t *sp;
#if SHOPT_MULTIBYTE
LEN=1;
#endif /* SHOPT_MULTIBYTE */
@@ -397,7 +395,6 @@ int sh_lex(Lex_t* lp)
fcseek(-LEN);
goto breakloop;
case S_EOF:
- sp = fcfile();
if((n=lexfill(lp)) > 0)
{
fcseek(-1);
@@ -446,16 +443,11 @@ int sh_lex(Lex_t* lp)
c = LBRACE;
break;
case '"': case '`': case '\'':
- lp->lexd.balance = c;
break;
}
- if(sp && !(sfset(sp,0,0)&SF_STRING))
- {
- lp->lasttok = c;
- lp->token = EOFSYM;
- sh_syntax(lp);
- }
- lp->lexd.balance = c;
+ lp->lasttok = c;
+ lp->token = EOFSYM;
+ sh_syntax(lp);
}
goto breakloop;
case S_COM:
@@ -1299,13 +1291,9 @@ int sh_lex(Lex_t* lp)
}
breakloop:
if(lp->lexd.nocopy)
- {
- lp->lexd.balance = 0;
return(0);
- }
if(lp->lexd.dolparen)
{
- lp->lexd.balance = 0;
if(lp->lexd.docword)
nested_here(lp);
lp->lexd.message = (wordflags&ARG_MESSAGE);
@@ -1318,12 +1306,6 @@ breakloop:
lp->arg = (struct argnod*)stkseek(stkp,ARGVAL);
if(n>0)
sfwrite(stkp,state,n);
- /* add balancing character if necessary */
- if(lp->lexd.balance)
- {
- sfputc(stkp,lp->lexd.balance);
- lp->lexd.balance = 0;
- }
sfputc(stkp,0);
stkseek(stkp,stktell(stkp)-1);
state = stkptr(stkp,ARGVAL); |
If Bourne actually doesn't have the hack, and just fails to check, then yeah, IMO ksh should not do the hack and just be syntactically correct. To be truly compatible, ksh shouldn't check either, and no one wants that. :) Betas are supposed to expose breakage, so change it and let things break, if they're gonna. |
ls "foo
should be a syntax errorls "foo
should be a syntax error
ls "foo
should be a syntax errorls "foo\
should be a syntax error
ls "foo\
should be a syntax error\
ls "foo\`` should be a syntax error
\
ls "foo\`` should be a syntax errorls "foo
</code> should be a syntax error
ls "foo
</code> should be a syntax errorecho
ls "foo``` should be a syntax error
echo
ls "foo``` should be a syntax errorecho
ls "foo
` should be a syntax error
echo
ls "foo
` should be a syntax error echo ls "foo
should be a syntax error
echo ls "foo
should be a syntax error echo
ls "foo
` should be a syntax error
echo
ls "foo
` should be a syntax errorls "foo\
should be a syntax error
ls "foo\
should be a syntax errorecho ˋls "fooˋ
should be a syntax error
Title says it all. It isn't, which is bogus.
The text was updated successfully, but these errors were encountered: