Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backtick command substitutions can't nest double quotes #352

Closed
JohnoKing opened this issue Nov 27, 2021 · 7 comments
Closed

Backtick command substitutions can't nest double quotes #352

JohnoKing opened this issue Nov 27, 2021 · 7 comments
Labels
bug Something is not working

Comments

@JohnoKing
Copy link

I tried setting ksh as /bin/sh on my Linux system, and found that it caused xdg-open to stop working:

$ xdg-open 
/usr/bin/xdg-open: line 139: syntax error at line 164: `(' unexpected

The line causing the error:

command="`grep -E "^Exec(\[[^]=]*])?=" "$file" | cut -d= -f 2- | first_word`"

Backtick command substitutions in ksh cannot nest double quotes, unlike other shells. This bug is likely related to #199:

# Latest ksh93u+m git commit
$ ksh -c 'foobar="`echo "foo bar"`"; echo $foobar'
ksh: syntax error at line 1: `"' unmatched
# '"`

# ksh93u+ (the errors differ because of https://github.com/ksh93/ksh/issues/199)
$ ksh93u -c 'foobar="`echo "foo bar"`"; echo $foobar'
ksh93u: : cannot execute [Is a directory]
ksh93u: bar: not found [No such file or directory]

# Working examples
$ bash -c 'foobar="`echo "foo bar"`"; echo $foobar'
foo bar
$ dash -c 'foobar="`echo "foo bar"`"; echo $foobar'
foo bar
$ mksh -c 'foobar="`echo "foo bar"`"; echo $foobar'
foo bar
@hyenias
Copy link

hyenias commented Nov 28, 2021

It seems to be the outermost double quotes enclosing the backticks that triggers the issue. Are they even needed?

$ bash -c 'a="`echo "one * three"`"; typeset -p a'
declare -- a="one * three"
$ bash -c 'a=`echo "one * three"`; typeset -p a'
declare -- a="one * three"
$
$ zsh -c 'a="`echo "one * three"`"; typeset -p a'
typeset a='one * three'
$ zsh -c 'a=`echo "one * three"`; typeset -p a'
typeset a='one * three'
$
$ mksh -c 'a="`echo "one * three"`"; typeset -p a'
typeset a='one * three'
$ mksh -c 'a=`echo "one * three"`; typeset -p a'
typeset a='one * three'
$
$ ksh -c 'a="`echo "one * three"`"; typeset -p a'
ksh: syntax error at line 1: `"' unmatched
$ ksh -c 'a=`echo "one * three"`; typeset -p a'
a='one * three'
$

@McDutchie
Copy link

Indeed, ksh93 is the only current shell that gives a syntax error here. However, the original Bourne shell (of which ksh93 is a descendant) throws a syntax error here as well.

For information, POSIX specifies:

Within the backquoted style of command substitution, shall retain its literal meaning, except when followed by: $, `, or <backslash>. The search for the matching backquote shall be satisfied by the first unquoted non-escaped backquote; during this search, if a non-escaped backquote is encountered within a shell comment, a here-document, an embedded command substitution of the $(command) form, or a quoted string, undefined results occur. A single-quoted or double-quoted string that begins, but does not end, within the `...` sequence produces undefined results.

Bolsky & Korn (1995) give this documentation on page 143:

ksh constructs the command to be executed from the characters that are left after ksh makes the following deletions. A \ that is followed by a $, `, or \, is removed. The following example illustrates the complexity of quoting with ``.

$ print -r "`print -r \$PWD "\$PWD" ')a\`\$\\'`"
/home/dgk /home/dgk )a`$\

You can include backquotes, `...`, within grouping (double) quotes. If you do that, and you also want to include one or more double quotes within the backquotes, then you must precede each included double quote by a backslash. ksh removes the backslash when it constructs the command.

@McDutchie
Copy link

McDutchie commented Nov 29, 2021

Apart from removing the outer quotes, it seems like the syntax error can be avoided by escaping the parentheses in the regex with backslashes (which will be removed by the backtick comsub):

command="`grep -E "^Exec\(\[[^]=]*]\)?=" "$file" | cut -d= -f 2- | first_word`"

So that should work on every shell.

However, the need to escape parentheses is not documented or specified anywhere AFAICT.

edit: That is the fix for the ancient Bourne shell as well.

@McDutchie McDutchie added the bug Something is not working label Dec 5, 2021
@McDutchie
Copy link

If we make sh_syntax() call abort() instead of throwing an error, the backtrace shows us where the syntax error is thrown:

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0   libsystem_kernel.dylib        	0x00007fff7d22c2c2 __pthread_kill + 10
1   libsystem_pthread.dylib       	0x00007fff7d2e7bf1 pthread_kill + 284
2   libsystem_c.dylib             	0x00007fff7d1966a6 abort + 127
3   ksh                           	0x00000001055069f1 sh_syntax + 17
4   ksh                           	0x000000010552bbce assign + 110 (parse.c:952)
5   ksh                           	0x0000000105529f24 simple + 2244 (parse.c:1528)
6   ksh                           	0x00000001055287ea item + 3050 (parse.c:1346)
7   ksh                           	0x0000000105527a6b term + 187 (parse.c:588)
8   ksh                           	0x000000010552788b list + 27 (parse.c:559)
9   ksh                           	0x000000010552708a sh_cmd + 58 (parse.c:508)
10  ksh                           	0x0000000105526e7a sh_parse + 1146 (parse.c:397)
11  ksh                           	0x00000001054c8924 exfile + 3012 (main.c:582)
12  ksh                           	0x00000001054c9a01 sh_main + 3473 (main.c:362)
13  ksh                           	0x00000001054ae226 main + 38 (pmain.c:46)
14  libdyld.dylib                 	0x00007fff7d0f13d5 start + 1

Somehow it's ending up in assign() which the comment says is for compound assignments. That's very wrong as the reproducer is no such thing. Unfortunately that incorrect decision might be made in a completely different place, probably somewhere in lex.c.

@McDutchie
Copy link

The syntax error is thrown by the first sh_syntax() call in assign().

The following debug patch makes a little more sense of what's happening:

--- a/src/cmd/ksh93/sh/parse.c
+++ b/src/cmd/ksh93/sh/parse.c
@@ -985,7 +985,10 @@ static struct argnod *assign(Lex_t *lexp, register struct argnod *ap, int type)
 	Namval_t *np;
 	n = strlen(ap->argval)-1;
 	if(ap->argval[n]!='=')
+{
+error(ERROR_warn(0),"syntax error at: %c [%s]",ap->argval[n],ap->argval);
 		sh_syntax(lexp);
+}
 	if(ap->argval[n-1]=='+')
 	{
 		ap->argval[n--]=0;

With that, the output is:

$ command="`grep -E "^Exec(\[[^]=]*])?=" "$file" | cut -d= -f 2- | first_word`"
arch/darwin.i386-64/bin/ksh: warning: syntax error at: 'c' [command="`grep -E "^Exec]
arch/darwin.i386-64/bin/ksh: syntax error: `(' unexpected

So what's happening is that it's actually considering the current argument to end at (. And that is because it didn't consider the quote " to be a quote; it considered it to be a perfectly regular character that is part of the argument. Which makes ( unquoted, so it's got a special meaning and ends the argument.

@McDutchie McDutchie added the 1.0 label Feb 17, 2022
@atheik
Copy link

atheik commented Apr 29, 2022

Here's a patch that I think fixes the issue:

diff --git a/src/cmd/ksh93/sh/lex.c b/src/cmd/ksh93/sh/lex.c
index c9971851..a6172e4b 100644
--- a/src/cmd/ksh93/sh/lex.c
+++ b/src/cmd/ksh93/sh/lex.c
@@ -313,6 +313,13 @@ int sh_lex(Lex_t* lp)
 		/* skip over characters in the current state */
 		state = sh_lexstates[mode];
 		while((n=STATE(state,c))==0);
+		/*
+		sfprintf(sfstderr,"sh_lex:        mode: %d\n",mode);        sfsync(sfstderr);
+		sfprintf(sfstderr,"sh_lex: oldmode(lp): %d\n",oldmode(lp)); sfsync(sfstderr);
+		sfprintf(sfstderr,"sh_lex:           c: %d\n",c);           sfsync(sfstderr);
+		sfprintf(sfstderr,"sh_lex:           n: %d\n",n);           sfsync(sfstderr);
+		sfprintf(sfstderr,"======\n");                              sfsync(sfstderr);
+		*/
 		switch(n)
 		{
 			case S_BREAK:
@@ -772,8 +779,17 @@ int sh_lex(Lex_t* lp)
 				{
 					if(sh.inlineno > lp->lastline)
 						lp->lex.last_quote = c;
-					mode = oldmode(lp);
-					poplevel(lp);
+					/* at this point we know that the previous skipping of characters was done according to the
+					   ST_QUOTE state table. we also know that the character that stopped the skipping is also
+					   the ending character for the current level. if that character was " and if we are in a
+					   `...` statement, then don't switch to the old mode, as we are actually at the innermost
+					   section of a "`"..."`" statement, which should be skipped again using the ST_QUOTE state
+					   table. */
+					if(c!='"' || !ingrave)
+					{
+						mode = oldmode(lp);
+						poplevel(lp);
+					}
 				}
 				else if(c=='"' && n==RBRACE)
 					mode = ST_QNEST;
diff --git a/src/cmd/ksh93/sh/macro.c b/src/cmd/ksh93/sh/macro.c
index b1ade153..2ced4c9d 100644
--- a/src/cmd/ksh93/sh/macro.c
+++ b/src/cmd/ksh93/sh/macro.c
@@ -2137,6 +2137,10 @@ static void comsubst(Mac_t *mp,register Shnode_t* t, int type)
 		sp = sfnew(NIL(Sfio_t*),str,c,-1,SF_STRING|SF_READ);
 		c = sh.inlineno;
 		sh.inlineno = error_info.line+sh.st.firstline;
+		/*
+		sfprintf(sfstderr,"comsubst: str: %s",str); sfsync(sfstderr);
+		sfprintf(sfstderr,"^^^^^^^^\n");            sfsync(sfstderr);
+		*/
 		t = (Shnode_t*)sh_parse(sp,SH_EOF|SH_NL);
 		sh.inlineno = c;
 		type = 1;

If you comment out the fix and uncomment the debug prints, then I hope you'll see what I was going after in this fix:

# works
arch/linux.i386-64/bin/ksh -c ': "`: "AB"`"'

# the latter sh_parse() which is called in comsubst() gets wrong input
arch/linux.i386-64/bin/ksh -c ': "`: "A B"`"'

@McDutchie
Copy link

Thank you very much, that fix makes a lot of sense and I've not discovered any side effects so far.

McDutchie pushed a commit that referenced this issue May 20, 2022
The following reproducer causes a spurious syntax error:

    foo="`: "("`"

The nested double quotes are not recognised correctly, causing a
syntax error at the '('. Removing the outer double quotes (which
are unnecessary) is a workaround, but it's still a bug as every
other shell accepts this. This bug has been present since the
original Bourne shell.

src/cmd/ksh93/sh/lex.c: sh_lex(): case S_QUOTE:
- If the current character is '"' and we're in a `...` command
  substitution (ingrave is true), then do not switch to the old
  mode but keep using the ST_QUOTE state table.

Thanks to @JohnoKing for the report and to @atheik for the fix.

Co-authored by: Martijn Dekker <martijn@inlv.org>
Resolves: #352
McDutchie added a commit to modernish/modernish that referenced this issue May 22, 2022
Fixed in ksh 93u+m 2022-05-20
Ref.: ksh93/ksh#352
McDutchie added a commit to modernish/modernish that referenced this issue Jun 2, 2022
Fixed in ksh 93u+m 2022-05-20
Ref.: ksh93/ksh#352
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is not working
Projects
None yet
Development

No branches or pull requests

4 participants