Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Parsing errors #839

Closed
azeemshaikh38 opened this issue Aug 11, 2021 · 32 comments
Closed

BUG: Parsing errors #839

azeemshaikh38 opened this issue Aug 11, 2021 · 32 comments
Assignees
Labels
kind/bug Something isn't working
Projects

Comments

@azeemshaikh38
Copy link
Contributor

Noticing many parsing errors during cron job. Not blocking cron job, but good to fix since otherwise we just return ErrScorecardInternal. Some examples to reproduce shown below:

scorecard --repo=github.com/ufal/udpipe --checks=Pinned-Dependencies

scorecard --repo=github.com/ubisoft/Sharpmake --checks=Pinned-Dependencies

scorecard --repo=github.com/uber/okbuck --checks=Pinned-Dependencies

scorecard --repo=github.com/uber/NullAway --checks=Pinned-Dependencies

scorecard --repo=github.com/aliyun/aliyun-odps-python-sdk --checks=Token-Permissions,Pinned-Dependencies

scorecard --repo=github.com/alibaba/GraphScope --checks=Token-Permissions

scorecard --repo=github.com/u-boot/u-boot --checks=Pinned-Dependencies

There might be more cases. Will add as I find them. Would be good to fix these and add these repos to cron/data/projects.release.csv as and when we fix them.

@azeemshaikh38 azeemshaikh38 added the kind/bug Something isn't working label Aug 11, 2021
@laurentsimon
Copy link
Contributor

@chrismcgehee are you interested in helping for this issue? It's in the Pinned-Dependency check that you're looking at to add lines.

@ristomcgehee
Copy link
Contributor

Sure, I'll take this one.

@ristomcgehee
Copy link
Contributor

There are several different categories these parsing errors fall into. I'll list them and state how I plan to address them:

  • Treating powershell code as shell code
    Example: ubisoft/Sharpmake
    How to fix: Check for if: runner.os == 'Windows'. Skip for now and add a todo to parse powershell code
  • Valid bash that is not parsed correctly
    Example: uber/NullAway
    How to fix: Open issue in mvdan.cc/sh/v3/syntax.
  • Bash files in .github/workflows
    Example: alibaba/GraphScope
    How to fix: Parse as shell code instead of yaml
  • Non-shell files with ".sh" extension
    Example: u-boot/u-boot
    How to fix: Evaluate if shebang is for a recognized shell program (/bin/bash, /bin/zsh, etc.) before trying to parse

Let me know if you have any suggestions.

@azeemshaikh38
Copy link
Contributor Author

Thanks Chris, this is great! Comments inline:

  • Treating powershell code as shell code
    Example: ubisoft/Sharpmake
    How to fix: Check for if: runner.os == 'Windows'. Skip for now and add a todo to parse powershell code

How would we check for this? If we have parsers that can understand this, that can work. But if we are planning to use string matching to do this, I'm not too sure about this one.

  • Valid bash that is not parsed correctly
    Example: uber/NullAway
    How to fix: Open issue in mvdan.cc/sh/v3/syntax.
  • Bash files in .github/workflows
    Example: alibaba/GraphScope
    How to fix: Parse as shell code instead of yaml
  • Non-shell files with ".sh" extension
    Example: u-boot/u-boot
    How to fix: Evaluate if shebang is for a recognized shell program (/bin/bash, /bin/zsh, etc.) before trying to parse

All of this sound good to me.

@ristomcgehee
Copy link
Contributor

How would we check for this? If we have parsers that can understand this, that can work. But if we are planning to use string matching to do this, I'm not too sure about this one.

This is a pretty simple check, so I think string matching would be appropriate here. I was unable to find any open-source parsers for github workflows.

@laurentsimon
Copy link
Contributor

Thanks Chris, this is great! Comments inline:

  • Treating powershell code as shell code
    Example: ubisoft/Sharpmake
    How to fix: Check for if: runner.os == 'Windows'. Skip for now and add a todo to parse powershell code

workflows also have a os param you can additionally look for.

How would we check for this? If we have parsers that can understand this, that can work. But if we are planning to use string matching to do this, I'm not too sure about this one.

  • Valid bash that is not parsed correctly
    Example: uber/NullAway
    How to fix: Open issue in mvdan.cc/sh/v3/syntax.
  • Bash files in .github/workflows
    Example: alibaba/GraphScope
    How to fix: Parse as shell code instead of yaml

you should not need to parse the file if it's a shell, because it will be done by validateShellScriptIsFreeOfInsecureDownloads . All you need to do is return early.
We're already calling isShellScriptFile() (see https://github.com/ossf/scorecard/blob/main/checks/pinned_dependencies.go#L419) to check if a file is a shell script, and I forgot to add it for github workflows :-) So I think just calling isShellScriptFile` should be enough.

  • Non-shell files with ".sh" extension
    Example: u-boot/u-boot
    How to fix: Evaluate if shebang is for a recognized shell program (/bin/bash, /bin/zsh, etc.) before trying to parse

this is already done in isSupportedShell(), see supported shell names in https://github.com/ossf/scorecard/blob/main/checks/shell_download_validate.go#L35
I think https://github.com/ossf/scorecard/blob/main/checks/pinned_dependencies.go#L193 is missing a call to isSupportedShell()

All of this sound good to me.

@ristomcgehee
Copy link
Contributor

Laurent, thanks for the comments.

workflows also have a os param you can additionally look for.

Is there an example yml file you can point to that shows the os param?

@ristomcgehee
Copy link
Contributor

Ok, maybe if all the values in runs-on or strategy.matrix.os start with "windows-", we can assume the script is powershell. Otherwise, we can look for if: runner.os == 'Windows' (using a slightly more sophisticated regex).

@laurentsimon
Copy link
Contributor

sounds like a reasonable approach. Also verify that if if: runner.os == 'Windows' is present, at least one windows os is specified in runs-on

@azeemshaikh38
Copy link
Contributor Author

Found one more. Might be similar to previous ones I found.

scorecard --repo=https://github.com/crawl/crawl --checks=Pinned-Dependencies

@azeemshaikh38
Copy link
Contributor Author

One more from #954

scorecard --repo=github.com/sourcegraph/sourcegraph --checks Pinned-Dependencies 

@ristomcgehee
Copy link
Contributor

The error from https://github.com/crawl/crawl is because there is a python file in .github/workflows. I'll fix this by only trying to unmarshal files in .github/workflows if it has a .yml or .yaml extension. From https://docs.github.com/en/actions/reference/workflow-syntax-for-github-actions:

Workflow files use YAML syntax, and must have either a .yml or .yaml file extension.

The error from https://github.com/sourcegraph/sourcegraph is actually failing on the CODENOTIFY file. The same fix should handle this one also. @nathan-415, fyi.

@naveensrinivasan
Copy link
Member

@chrismcgehee Can we close this?

@ristomcgehee
Copy link
Contributor

@chrismcgehee Can we close this?

Not yet. There's still one more issue I need to address: shell code that isn't able to be parsed.

@laurentsimon
Copy link
Contributor

I found another one --repo=tensorflow/tensorflow that fails invalid shell code: 18:48: &> must be followed by a word

@ristomcgehee
Copy link
Contributor

Ok, there are 4 repos that give us invalid shell code. Here's the deal with each of them:

That leads me to a broader question of how we handle shell code that we are unable to parse. Currently, the entire check fails with a score of '?'. What I would like to do instead is log a warning whenever we fail to parse a file, but continue on with the check. It would be good if we put a system in place to monitor for these warnings so we can evaluate if there is a bug in our parsing code. How does this sound?

@azeemshaikh38
Copy link
Contributor Author

It would be good if we put a system in place to monitor for these warnings

+1. See https://github.com/ossf/scorecard/blob/main/stats/views.go#L50. You can use that to record/monitor these parse errors.

We can start with this to get a sense of how bad the problem is and then decide on whether we should log a warning and continue.

@azeemshaikh38
Copy link
Contributor Author

@chrismcgehee thanks for fixing so many of the parsing issues. The number of parsing errors we see in the cron job has significantly reduced!

Some new parsing issues I see as of now:

./scorecard --repo=github.com/ljharb/array-includes

./scorecard --repo=github.com/llooker/swilt_test1

./scorecard --repo=github.com/llvm/llvm-project

./scorecard --repo=github.com/logur/logur

./scorecard --repo=github.com/looker/ci_demo

./scorecard --repo=github.com/lucas-clemente/quic-go

@azeemshaikh38
Copy link
Contributor Author

azeemshaikh38 commented Oct 5, 2021

Monitoring data @chrismcgehee put in place for shell errors. We seem to consistently have some shell related failures when running the cron job. Although the number is not that high which is a good thing. But we clearly seem to fail parsing shell code on a significant number of repos.

image

@azeemshaikh38
Copy link
Contributor Author

Another parsing issue reported by @asraa in #1163:

The Token-Permission check seems to error out on an invalid GitHub workflow from https://securitylab.github.com/research/github-actions-preventing-pwn-requests/

The mistake is in parsing jobs in

mjobs, ok := jobs.(map[interface{}]interface{})

I believe this should be a list of jobs.

What happens is that no top-level perms are defined, so this checks run level perms. Then it parses jobs as a map, when in fact it is a list, and the result is an internal error.

@ristomcgehee
Copy link
Contributor

@asraa, can you provide an example yaml file (or a repo) where this would fail? I looked through https://securitylab.github.com/research/github-actions-preventing-pwn-requests/ and didn't see any yaml file that looked like it had jobs as a list instead of a map.

@asraa
Copy link
Contributor

asraa commented Oct 27, 2021

@asraa, can you provide an example yaml file (or a repo) where this would fail? I looked through https://securitylab.github.com/research/github-actions-preventing-pwn-requests/ and didn't see any yaml file that looked like it had jobs as a list instead of a map.

Ooops! I failed to update this. I found two things

The YAML linked in that blog will return github internal error for parsing workflow for the Token-Permissions check

return sce.WithMessage(sce.ErrScorecardInternal, errInvalidGitHubWorkflow.Error())

@azeemshaikh38
Copy link
Contributor Author

@chrismcgehee FYI, code is panic-ing:

./scorecard --repo=github.com/twisted/towncrier --checks=Pinned-Dependencies
Starting [Pinned-Dependencies]
panic: interface conversion: interface {} is map[string]interface {}, not string

goroutine 82 [running]:
github.com/ossf/scorecard/v3/checks.getOSesForJob(0xc0001872f8)
        github.com/ossf/scorecard/v3/checks/pinned_dependencies.go:595 +0x2c9
github.com/ossf/scorecard/v3/checks.jobAlwaysRunsOnWindows(0xc000187480)
        github.com/ossf/scorecard/v3/checks/pinned_dependencies.go:607 +0x1d
github.com/ossf/scorecard/v3/checks.getShellForStep(0xbe6b9b, 0xc0001872f8)
        github.com/ossf/scorecard/v3/checks/pinned_dependencies.go:637 +0x58
github.com/ossf/scorecard/v3/checks.validateGitHubWorkflowIsFreeOfInsecureDownloads({0xc0006541f4, 0x18}, {0xc000453500, 0x14fb, 0x14fc}, {0xce6d18, 0xc0007ffda0}, {0xa92200, 0xc00001b828})
        github.com/ossf/scorecard/v3/checks/pinned_dependencies.go:555 +0x4f5
github.com/ossf/scorecard/v3/checks.CheckFilesContent({0xbeccf5, 0x13}, 0x0, 0xc000120ec0, 0xc22670, {0xa92200, 0xc00001b828})
        github.com/ossf/scorecard/v3/checks/file_utils.go:106 +0x1aa
github.com/ossf/scorecard/v3/checks.isGitHubWorkflowScriptFreeOfInsecureDownloads(0xc000120ec0)
        github.com/ossf/scorecard/v3/checks/pinned_dependencies.go:502 +0x57
github.com/ossf/scorecard/v3/checks.PinnedDependencies(0x7)
        github.com/ossf/scorecard/v3/checks/pinned_dependencies.go:160 +0xfe
github.com/ossf/scorecard/v3/checker.(*Runner).Run(0xc000187ed8, {0xce16b0, 0xc000117bf0}, 0xc22600)
        github.com/ossf/scorecard/v3/checker/check_runner.go:123 +0x2b2
github.com/ossf/scorecard/v3/pkg.runEnabledChecks.func1()
        github.com/ossf/scorecard/v3/pkg/scorecard.go:61 +0x1a5
created by github.com/ossf/scorecard/v3/pkg.runEnabledChecks
        github.com/ossf/scorecard/v3/pkg/scorecard.go:54 +0x118

@naveensrinivasan
Copy link
Member

On a similar note why not use this library for parsing https://github.com/rhysd/actionlint #989 ?

@azeemshaikh38
Copy link
Contributor Author

Only briefly looked at https://pkg.go.dev/github.com/rhysd/actionlint#Parse. Sounds like a good idea to me @naveensrinivasan .

@chrismcgehee wdyt?

@ristomcgehee
Copy link
Contributor

@azeemshaikh38 Thanks for fixing the panicky code.
@naveensrinivasan That library looks great! Glad you found it. I would love doing less maintenance for parsing actions.
@asraa I expect that when we move to the library for parsing actions, the yaml included in the blog will get parsed correctly. If it doesn't, we can look into it further at that time.

@naveensrinivasan
Copy link
Member

@azeemshaikh38 Thanks for fixing the panicky code.
@naveensrinivasan That library looks great! Glad you found it. I would love doing less maintenance for parsing actions.
@asraa I expect that when we move to the library for parsing actions, the yaml included in the blog will get parsed correctly. If it doesn't, we can look into it further at that time.

@chrismcgehee Do you want to do this? Or I can take it.

@ristomcgehee
Copy link
Contributor

@chrismcgehee Do you want to do this? Or I can take it.

I can do this one.

@naveensrinivasan
Copy link
Member

@chrismcgehee Do you want to do this? Or I can take it.

I can do this one.

Thanks

@azeemshaikh38
Copy link
Contributor Author

Parsing errors that showed up with the new actionlint parser:

./scorecard --repo=github.com/containous/maesh --checks=Pinned-Dependencies
./scorecard --repo=github.com/mahmoud/boltons --checks=Pinned-Dependencies

@spencerschrock
Copy link
Contributor

Closing as most of the parse errors here were fixed, and the fact that we continue on error now (#3515). More recent parse errors have their own issues: e.g. #3075

Scorecard automation moved this from Backlog to Done Nov 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Something isn't working
Projects
Development

No branches or pull requests

6 participants