Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

awk option parser cannot handle -vfoo=bar, only -v foo=bar #82

Closed
tejing1 opened this issue Jul 26, 2022 · 8 comments
Closed

awk option parser cannot handle -vfoo=bar, only -v foo=bar #82

tejing1 opened this issue Jul 26, 2022 · 8 comments

Comments

@tejing1
Copy link

tejing1 commented Jul 26, 2022

error: builder for '/nix/store/q1jmr4hvrri48p2d6zqanip92b5ay2gh-sfeedrc.drv' failed with exit code 9;
       last 10 log lines:
       > [resholve context] RESHOLVE_LORE=/nix/store/3ns00nskan71bf01z2912f9h2xmnifqj-more-binlore
       > [resholve context] RESHOLVE_EXECER=
       > [resholve context] RESHOLVE_FAKE='function:'\''feed'\'';'\''_fetch'\'';'\''_convertencoding'\'';'\''_parse'\'';'\''_filter'\'';'\''_merge'\'';'\''_order'\'''
       > [resholve context] RESHOLVE_INPUTS=/nix/store/47n5hzqpahs7yv84ia6cxp3jg9ca8r86-coreutils-9.0/bin:/nix/store/j2ja4qf1hs51n9zvzq1i5sbl1vkxn8wd-lockfile-progs-0.1.19/bin:/nix/store/fr7vrxblkj327ypn3vhjwfhf19lddqqd-gawk-5.1.1/bin
       > [resholve context] RESHOLVE_INTERPRETER=/nix/store/0d3wgx8x6dxdb2cpnq105z23hah07z7l-bash-5.1-p16/bin/bash
       > [resholve context] /nix/store/x7qhhcwch658pj4sqfc3dj5bi6j99rbw-resholve-0.8.0/bin/resholve --overwrite /nix/store/1sgljd7baig9da7vs50wdkx9x0q72gad-sfeedrc
       > WARNING:__main__:CommandParser CommandParser(prog='awk (generic)', usage=None, description=None, version=None, formatter_class=<class 'argparse.HelpFormatter'>, conflict_handler='error', add_help=False) passing instead of "argument -c/--traditional: ignored explicit argument 'ategory=$1'"
       >     awk -F'\t' -vcategory="$1" '{ split($9, cs, "|");for (i in cs) if (cs[i] == category) { print; break; }; }'
       >     ^~~
       > /nix/store/1sgljd7baig9da7vs50wdkx9x0q72gad-sfeedrc:18: 'awk' _might_ be able to execute its arguments, and I don't have any command-specific rules for figuring out if this specific invocation does or not.
       For full logs, run 'nix log /nix/store/q1jmr4hvrri48p2d6zqanip92b5ay2gh-sfeedrc.drv'.
  • This is a pretty bad error message for this situation.
  • Awk (gnu) accepts -vcategory="$1" and -v category="$1" interchangeably, but resholve does not. Changing the source to the form with a space solved the problem for me, but I still thought I'd report it.

This may well be a fundamental limitation of the option parsing library in use, but let's at least make the error message more helpful. After all, you DO have "command-specific rules" for awk, they just failed in this instance.

@abathur
Copy link
Owner

abathur commented Jul 30, 2022

After peeking at this and realizing how many stars align to create this problem I'm not sure if there'll be an immediate/obvious fix (but I think it'll be feasible to raise a better error). For now I'll just break this down in public:

  1. In the parsing routine, I've been tentatively trying out a step that knocks ~varlikes out of most invocations. I don't remember the exact circumstances that encouraged me to add it and my comment wasn't useful enough, but (projecting/guessing) maybe I couldn't get argparse to handle them, and they're reasonably common in invocations of env and sudo.

    resholve/resholve

    Lines 3938 to 3939 in b743d2e

    # tentatively try knocking out varlikes for everyone?
    invocation = self._strip_varlikes(invocation)

  2. The assumption that the varlikes will be stripped out leaks into the parser for awk. I assumed the space-separated form, so I set that argument not to actually expect an argument. (It would be stripped in the space-separated case.)

    resholve/resholve

    Lines 1908 to 1910 in b743d2e

    # actually assign should be append, but strip_varlikes will
    # pluck out the args it is looking for...
    generic.add_argument("-v", "--assign", action="count")

  3. I think argparse is parsing this as roughly equivalent to -v -c ategory="$1" (and failing because I've set this option not to expect an argument). resholve surfaces these as warnings instead of errors here:

    resholve/resholve

    Lines 1367 to 1368 in b743d2e

    def error(self, message):
    logger.warn("CommandParser %r passing instead of %r", self, message)

    It's doing this for two reasons, one of which is thornier than the other:

    • In a few cases, resholve is coping with mutually-exclusive syntaxes from different command variants by just trying N parsers in turn, surfacing parse errors as mere warnings, and issuing the "I don't have any command-specific rules" error if none of them hit without error (i.e., there is no command parser for this command that can cope with the syntax in this invocation). This occurs at

      resholve/resholve

      Lines 3941 to 3978 in b743d2e

      if externals[subcmd]:
      for parser in externals[subcmd]:
      parsed = _unparsed = None
      parsed = parser.parse_known_args(invocation.args)
      if not parsed:
      logger.debug(
      "parse failed (parser=%r, args=%r)", parser, invocation
      )
      continue
      else:
      parsed, _unparsed = parsed
      # see sudo parser for example
      if "skip" in parsed and parsed.skip:
      continue
      handler = getattr(
      self,
      "handle_external_{:}".format(subcmd),
      self.handle_external_generic,
      )
      logger.debug(
      "parsed %r args %r (unparsed: %r)", parser, parsed, _unparsed
      )
      logger.debug(
      "calling %r(parsed=%r, ob=%r)",
      handler,
      parsed,
      invocation,
      )
      if handler(parsed, invocation):
      return
      observe(
      PotentialExecer,
      invocation.firstword,
      word=invocation.firstword.ast,
      arena=self.arena,
      )

    • I didn't have a great sense of how good the parsers would be and what all kinds of things would cause parse errors here, so I didn't want to break a bunch of existing uses by making parse errors blocking.

      (I think I'm on board with raising a distinct error if there are parsers and they all fail--you're right that the error is misleading.)

@abathur
Copy link
Owner

abathur commented Jul 30, 2022

I have some uncommited WIP to add a new error, at least. Here's what the new error looks like when I use your invocation as a test case:

not ok 65 exercise built-in syntax parsers in 343ms
# (from function `parsers' in file tests/helpers.bash, line 174,
#  in test file tests/parsers.bats, line 10)
#   `parsers' failed with status 15
# WARNING:__main__:CommandParser CommandParser(prog='awk (generic)', usage=None, description=None, version=None, formatter_class=<class 'argparse.HelpFormatter'>, conflict_handler='error', add_help=False) passing instead of "argument -c/--traditional: ignored explicit argument 'ategory=$1'"
#   awk -F'\t' -vcategory="$1" '{ split($9, cs, "|");for (i in cs) if (cs[i] == category) { print; break; }; }'
#   ^~~
# [ stdinNone ]:1: 'awk' is able to execute its arguments, and none of my command-specific rules for figuring out if this specific invocation does or not succeeded. Look above for warning messages with more context on what went wrong.

WDYT?

@tejing1
Copy link
Author

tejing1 commented Jul 31, 2022

Oof, removing var=val arguments before the parser sounds like the kind of thing that was bound to come back to bite you eventually. Long term, you may not be able to manage without writing a custom argument parser. The ones that already exist generally assume you're defining the interface being parsed and can just... not do things it doesn't support.

As for the proposed error message: not bad, but maybe do s/is able to execute its arguments, and/executes its arguments in some circumstances, and/. It sounds slightly clearer, at least to me.

@abathur
Copy link
Owner

abathur commented Jul 31, 2022

Oof, removing var=val arguments before the parser sounds like the kind of thing that was bound to come back to bite you eventually.

Hehe :)

I suspected it might.

Long term, you may not be able to manage without writing a custom argument parser. The ones that already exist generally assume you're defining the interface being parsed and can just... not do things it doesn't support.

Yes, I've suspected as much for a bit.

There was actually an early period (at least, for this functionality) where I was trying to just build up a stable of parsing techniques, but it was a bit of a mess and I suspected there'd be literally 0 chance of outsider contributions on these while doing it that way.

I hope to end up, whether it's on a common parser or home grown, at something that is config driven, so I've got some incentive to stretch argparse as far as it'll go to understand the requirements better before tilting at the windmill.

I might have mentioned this in the issue about making custom parsers easier to bolt on, but I also a vague sense that it'd be nice if the format could graduate into something flexible enough to be re-usable in other tooling cases? (A probably-quixotic dream about actual upstreams or at least other parties helping with some of the lifting of creating and managing them...)

As for the proposed error message: not bad, but maybe do s/is able to execute its arguments, and/executes its arguments in some circumstances, and/. It sounds slightly clearer, at least to me.

👍

@tejing1
Copy link
Author

tejing1 commented Jul 31, 2022

I hope to end up, whether it's on a common parser or home grown, at something that is config driven, so I've got some incentive to stretch argparse as far as it'll go to understand the requirements better before tilting at the windmill.

Whatever you do, I think you at least need a code-based escape hatch. There's no way a config-driven system can really handle everything. The world is too weird. Given that, you might as well make the system modular at the code level, then make some generic handlers that cover a lot of cases and can be invoked from data read from a config file.

I've been mulling over how that kind of system should be structured, off and on, since your reply on #81. Being a functional programmer at heart, I come at it from the perspective of composability. It's clear you need some kind of system that takes a list of strings representing a command and returns several lists of strings representing the commands it would invoke, and also produces some (sometimes failing) method to turn desired modifications of the invoked commands back into a modification of the original command. Where it gets messy is that you can't always statically determine the subcommands. They often depend partially on runtime data, and you need some kind of sufficiently flexible structure to represent most cases of that, while also still being workable for the individual command's parsers.

The more I think about this problem as a whole, the more I think it basically all comes down to a composable element like that. Parsing shell and modifying it, in itself, comes down to almost the same operation. And the hard part will always come down to defining the boundary of the statically analyzable portion of the system in a useful way.

abathur added a commit that referenced this issue Jul 31, 2022
abathur added a commit that referenced this issue Jul 31, 2022
Closes #82. resholve strips "varlikes" out of commands before handling
them (I think argparse may choke on them?). Canonical examples are
`env` and `sudo`. Because of this, the main awk parser lies and says
the `-v` option has no argument.

This adds a second near-identical awk parser that correctly expects
the argument.
abathur added a commit that referenced this issue Jul 31, 2022
@abathur
Copy link
Owner

abathur commented Jul 31, 2022

Woke up a little smarter today and realized that we can brute force this for now by adding a second parser that recognizes the closed form (well, it recognizes the actual argument).

I've cut a v0.8.1 release and opened a nixpkgs PR.

In case speaking it out loud helps me remember or search for it later, I separately committed a failing syntax/parser test so that there's a good example to point at.

@tejing1
Copy link
Author

tejing1 commented Jul 31, 2022

Ah of course, the good old getopt approach. normalize then use a simpler parser.

EDIT: Oh, I see. You're doing either/or. It should cover more cases at least, so long as people don't mix both forms on the same command line.

@abathur
Copy link
Owner

abathur commented Jul 31, 2022

so long as people don't mix both forms on the same command line.

Nightmare fuel :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants