Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Syntax error leaves ksh stuck #779

Closed
posguy99 opened this issue Aug 26, 2024 · 5 comments
Closed

Syntax error leaves ksh stuck #779

posguy99 opened this issue Aug 26, 2024 · 5 comments
Labels
bug Something is not working

Comments

@posguy99
Copy link

A completely errant paste that wasn't meant for the terminal window exposed this...

Reproducer:

$ echo $KSH_VERSION
Version AJM 93u+m/1.1.0-alpha+b5d1f098 2024-08-23
$ for ((i = 5 , i = 0, --i)) 
-ksh: syntax error: `{' unmatched
-ksh: syntax error: `{' unmatched
-ksh: syntax error: `{' unmatched
-ksh: syntax error: `{' unmatched
-ksh: syntax error: `{' unmatched
-ksh: syntax error: `{' unmatched
-ksh: syntax error: `{' unmatched
-ksh: syntax error: `{' unmatched
-ksh: syntax error: `{' unmatched

Continues forever until you kill the shell.

ksh93u+ does not have the problem:

[82] mbp-m2 $ echo $KSH_VERSION         
Version AJM 93u+ 2012-08-01
$ for ((i = 5 , i = 0, --i))
/bin/ksh: syntax error: `))' unexpected
$

Obviously bad code shouldn't cause the shell to get stuck.

@posguy99
Copy link
Author

Even simpler reproducer:

$ for(())
-ksh: syntax error: `{' unmatched
-ksh: syntax error: `{' unmatched
-ksh: syntax error: `{' unmatched
-ksh: syntax error: `{' unmatched

Continues forever.

@McDutchie McDutchie added the bug Something is not working label Aug 26, 2024
@McDutchie
Copy link

Thanks for the report. On my end, ksh93u+ and ksh2020 are also broken, just not in the same way.

Schermafbeelding 2024-08-26 om 21 33 23

@McDutchie
Copy link

The current infinite-loop manifestation of the bug was introduced in: f8f2c4b (found by trying old commits)

@McDutchie
Copy link

McDutchie commented Aug 26, 2024

So, after some debugging and backtracing, I've found that what happens is this:

  1. arithfor() in parse.c is called to parse the arithmetic for loop
  2. arithfor() saves current input stream and opens a new input stream to separate each of the three expressions in (( ... ))
  3. on line 758, arithfor() calls sh_lexskip() to find the next ;
  4. in lex.c line 1749, sh_lexskip() calls the main lexer function, sh_lex()
  5. in lex.c line 375, sh_lex() calls sh_syntax() to throw the syntax error in the reproducer
  6. sh_syntax() calls errormsg() from libast, which causes a longjmp straight back to the prompt, without giving arithfor() a chance to restore the input stream that it saved in step 2. This is the bug. We can get all sorts of unpredictable/undefined behaviour that way.

I can think of two strategies to fix this:

  1. Push context and sigsetjmp in arithfor() before calling sh_lexskip(), so that it longjmps back to arithfor() and it can restore the input stream before doing its own longjmp. This may carry a minor performance hit for nested arithmetic for loops.
  2. Since arithfor() throws a syntax error itself when needed anyway (and does so after restoring the input stream), we could also find a way for sh_lex() not to throw that syntax error when called from sh_lexskip.

Strategy 2 might be the best way. It just so happens that sh_lexskip sets its own flag, lp->lexd.noarg, that we can use in sh_lex() to tell if it was called from sh_lexskip(). So we could use that to prevent it from throwing the syntax error and just return instead, handing control back to arithfor(), so that it can throw the syntax error after restoring state.

Here is a patch that does that. Please test.

diff --git a/src/cmd/ksh93/sh/lex.c b/src/cmd/ksh93/sh/lex.c
index ed8ed2db6..d60cb1cb9 100644
--- a/src/cmd/ksh93/sh/lex.c
+++ b/src/cmd/ksh93/sh/lex.c
@@ -344,7 +344,7 @@ int sh_lex(Lex_t* lp)
 				/* end-of-file */
 				if(mode==ST_BEGIN)
 					return lp->token=EOFSYM;
-				if(mode >ST_NORM && lp->lexd.level>0)
+				if(mode >ST_NORM && lp->lexd.level>0 && !lp->lexd.noarg)
 				{
 					switch(c=endchar(lp))
 					{

@posguy99
Copy link
Author

posguy99 commented Aug 26, 2024

That stopped the infinite loop.

$ for ((i = 5 , i == 0, --i))
arch/darwin.arm64-64/bin/ksh: syntax error: `))' unexpected
$ for (())
arch/darwin.arm64-64/bin/ksh: syntax error: `))' unexpected
$ echo $KSH_VERSION
Version AJM 93u+m/1.1.0-alpha+b75df92a/MOD 2024-08-26

Works for me.

McDutchie added a commit that referenced this issue Aug 27, 2024
Marc Wilson (@posguy99) reports:

> A completely errant paste that wasn't meant for the terminal
> window exposed this...
>
> Reproducer:
>
>   $ for ((i = 5 , i = 0, --i))
>   -ksh: syntax error: `{' unmatched
>   -ksh: syntax error: `{' unmatched
>   -ksh: syntax error: `{' unmatched
>   -ksh: syntax error: `{' unmatched
>   -ksh: syntax error: `{' unmatched
>
> Continues forever until you kill the shell.

Simplest reproducer is 'for(())', with identical symptoms. In
scripts, the symptom is less severe, but the error message is still
incorrect as above. For example:

  $ ksh -c 'for(())'
  ksh: syntax error at line 1: `{' unmatched

The expected error is:

  ksh: syntax error at line 1: `))' unexpected

ksh93u+ and ksh2020 are also broken, just not in the same way, and
harder to reproduce. The current infinite-loop manifestation of the
bug was introduced in commit f8f2c4b.

Analysis: what happens is this:

1. arithfor() in parse.c is called to parse the arithmetic for loop
2. arithfor() saves current input stream and opens a new input
   stream to separate each of the three expressions in (( ... ))
3. on line 758, arithfor() calls sh_lexskip() to find the next ';'
4. in lex.c line 1749, sh_lexskip() calls the main lexer function,
   sh_lex()
5. in lex.c line 375, sh_lex() calls sh_syntax() to throw the
   syntax error in the reproducer
6. sh_syntax() calls errormsg() from libast, which causes a longjmp
   straight back to the prompt, without giving arithfor() a chance
   to restore the input stream that it saved in step 2. This is the
   bug. We can get all sorts of unpredictable/undefined behaviour
   that way.

I can think of two strategies to fix this:

1. Push context and sigsetjmp in arithfor() before calling
   sh_lexskip(), so that it longjmps back to arithfor() and it can
   restore the input stream before doing its own longjmp. This may
   carry a minor performance hit for nested arithmetic for loops.

2. Since arithfor() throws a syntax error itself when needed anyway
   (and does so after restoring the input stream), we could also
   find a way for sh_lex() not to throw that syntax error when
   called from sh_lexskip.

Strategy 2 might be the best way. It just so happens that
sh_lexskip sets its own flag, lp->lexd.noarg, that we can use in
sh_lex() to tell if it was called from sh_lexskip(). So we could
use that to prevent it from throwing the syntax error and just
return instead, handing control back to arithfor(), so that it can
throw the syntax error after restoring state.

src/cmd/ksh93/include/shlex.h,
src/cmd/ksh93/sh/lex.c:
- Rename lexd.noarg flag to lexd.inlexskip. Since it will now be
  multi-purpose, this is clearer to the code reader. It simply
  means that the current sh_lex() invocation was called from
  sh_lexwskip().
- When reading end-of-file, do not throw a symtax error from
  sh_lex() if lexd.inlexskip is set, letting the caller handle it
  instead. This fixes the bug.

Resolves: #779
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something is not working
Projects
None yet
Development

No branches or pull requests

3 participants
@posguy99 @McDutchie and others