Create `github-actions-usage` lambda #810
Looks great! Super useful feature, and I really like the code style overall. Separating side effects from logic makes it easier to test the logic; perhaps we could take advantage of this with a couple more tests?
What's the behaviour we want when any part of this fails? Not something to worry about in this PR, but do we want best efforts for the data that it can process, or do we want it to stop completely? In either case, how would we find out that this happened, and resolve it?
I'm also a bit nervous about the schema validation, because it looks like a potential source of failures in the future. There are some comments about this below.
Great stuff though, really exciting to have this data.
…the schema
Validating the Workflow content against the schema allows us to type-cast the objects with confidence. Using a schema from https://www.schemastore.org.
…github_actions_usage`
By joining `github_workflows` with `github_repositories` at query time, the volume of code reduces, due to less validation being necessary.
Organise the logic into three distinct steps:
1. Read data
2. Transform data
3. Write data

This should improve readability.
Also add comments.
This view shows the archived status of a repository, the name of the Action being used, and the version.
Triggered when the `GitHubRepositories` task stops successfully.
Awesome work!
…low-parser`
Swap the community maintained Schema Store parsing of GitHub Workflow files for the GitHub authored `@actions/workflow-parser` NPM module.

Interestingly, the two approaches yield slightly different results:
- https://github.com/guardian/service-catalogue/blob/3f25b6c553e1ed2b192bce11cc15e6f25afa7239/.github/workflows/ecs-publish.yml
  - Schema Store fails to parse
  - `@actions/workflow-parser` able to parse
- https://github.com/guardian/grid/blob/e37e3acaeea198beb82896a143cc9c66d200aadc/.github/workflows/ci.yml
  - Schema Store able to parse
  - `@actions/workflow-parser` fails to parse

Anecdotally, this version also appears to be more performant.

`@actions/workflow-parser` is an ESM only module, and it was proving tricky to get Jest to work with it. For this reason, the tests use Node's native test runner.
Great stuff - this is going to be really useful!
Rollout steps:
This is great - thanks @akash1810!
The `contents` field of the `github_workflows` table contains the contents of a repository's GitHub Workflows, as a YAML string. Querying YAML in SQL isn't trivial.

What does this change?
This change adds a new table, `guardian_github_actions_usage`, to track the Actions we're using. It has the schema:

For ease, a view, `view_github_actions`, is also created, which yields results similar to [1]:

The `guardian_github_actions_usage` table is populated by a new AWS Lambda, triggered once the `GitHubRepositories` ECS task completes successfully.

The Lambda works as follows:
1. … `github_workflows` table
2. … `contents` field
3. … `@actions/workflow-parser` to parse the `contents` field [2]
4. … `uses` string from the `contents`
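Each extracted `uses` string can then be split into an Action name and a version. The following is a minimal sketch of one way to do that, not the Lambda's actual code: the `ActionUsage` interface and the `parseUses` function are hypothetical names, and treating local (`./…`) and Docker (`docker://…`) references as unversioned is an assumption.

```typescript
// Hypothetical row shape; the real table's columns are not reproduced here.
interface ActionUsage {
  name: string; // e.g. "actions/checkout"
  version: string | undefined; // e.g. "v4"; undefined when no ref is given
}

// Split a `uses` string such as "actions/checkout@v4" into name and version.
function parseUses(uses: string): ActionUsage {
  // Assumption: local actions and Docker images are recorded without a version.
  if (uses.startsWith('./') || uses.startsWith('docker://')) {
    return { name: uses, version: undefined };
  }

  // The ref follows the last "@"; everything before it is the name.
  const at = uses.lastIndexOf('@');
  if (at === -1) {
    return { name: uses, version: undefined };
  }
  return { name: uses.slice(0, at), version: uses.slice(at + 1) };
}
```

Keeping this as a pure function, with no database access, means it can be unit tested on its own.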
Validating the Workflow content against the schema allows us to type-cast the rows with confidence, and without any complex parsing.
The Lambda is invoked when the GitHubRepositories task successfully ends.
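As an illustration of what "successfully ends" could mean, here is a minimal sketch that inspects an ECS "Task State Change" event. It is an assumption that the trigger works this way. The `EcsTaskStateChange` interface is a hand-written subset of the event, and `isSuccessfulStop` is a hypothetical helper that treats a stop as successful when every container exited with code 0.

```typescript
// Hand-written subset of the "ECS Task State Change" event used by this sketch.
interface EcsTaskStateChange {
  source: string;
  'detail-type': string;
  detail: {
    lastStatus: string;
    containers: Array<{ name: string; exitCode?: number }>;
  };
}

// Assumption: "successful" means the task has stopped and every container
// reported exit code 0.
function isSuccessfulStop(event: EcsTaskStateChange): boolean {
  return (
    event.source === 'aws.ecs' &&
    event['detail-type'] === 'ECS Task State Change' &&
    event.detail.lastStatus === 'STOPPED' &&
    event.detail.containers.length > 0 &&
    event.detail.containers.every((container) => container.exitCode === 0)
  );
}
```

In practice this filtering would more likely live in the event rule's pattern than in code; the function form just makes the condition explicit.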
Note
Why?
To observe how we're using GitHub Actions across the organisation, and how it compares to the guidance provided by GitHub.
Other usage for this data includes answering:
How has it been verified?
… `GitHubRepositories` task, observed lambda starting once the task completes, and the new table being populated

Notes
Comparing `@actions/workflow-parser` to Schema Store

Previous commits in this branch used Schema Store to parse the `contents` field of the `github_workflows` table.

Schema Store is a community maintained set of YAML schemas. It's also used by JetBrains IDEs (and others, I imagine!).

`@actions/workflow-parser` is used within the GitHub Actions VS Code extension. In theory, `@actions/workflow-parser` will be kept up to date. It's unclear if this library is intended for external use, or if it's an internal library. However, it's on NPM...

It's worth noting Schema Store and `@actions/workflow-parser` yield slightly different results. At present, the `github_workflows` table has 1315 rows. Schema Store can process 1305 of them, and `@actions/workflow-parser` can process 1301 of them.

Additionally, Schema Store and `@actions/workflow-parser` have differing results for some files:
- guardian/service-catalogue
  - Schema Store fails to parse
  - `@actions/workflow-parser` able to parse
- guardian/grid
  - Schema Store able to parse
  - `@actions/workflow-parser` fails to parse

I'll create an issue on the two repositories to ask for clarification here.

The default error messages provided by `@actions/workflow-parser` are more helpful, describing the line and column of an error.

Alternative solutions
Just use SQL
An alternative to this change would be to add a `contents_as_json` column to the CloudQuery table - cloudquery/cloudquery#16846.

However, the schema of a Workflow file (specifically where the `uses` property can appear) is tricky. It can appear within `steps`, or outside.

That is, the SQL to extract these values would be complicated (we'd likely continue to abstract it away into a view though).
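To illustrate the two positions, here is a minimal sketch that collects `uses` values from an already-parsed Workflow object. It assumes a simplified shape in which `uses` appears either on a step or directly on a job (as it does when calling a reusable workflow). The `Workflow`, `Job` and `Step` interfaces and the `collectUses` function are hypothetical, and this is not how `@actions/workflow-parser` represents a Workflow.

```typescript
// Hypothetical, simplified shape of a parsed Workflow; real files have many more keys.
interface Step {
  uses?: string;
  run?: string;
}
interface Job {
  uses?: string; // job-level `uses`, e.g. a reusable workflow
  steps?: Step[];
}
interface Workflow {
  jobs?: Record<string, Job>;
}

// Collect every `uses` value, whether it sits on a job or on one of its steps.
function collectUses(workflow: Workflow): string[] {
  const found: string[] = [];
  for (const job of Object.values(workflow.jobs ?? {})) {
    if (job.uses !== undefined) {
      found.push(job.uses);
    }
    for (const step of job.steps ?? []) {
      if (step.uses !== undefined) {
        found.push(step.uses);
      }
    }
  }
  return found;
}
```

The equivalent SQL would need to unnest `jobs` and `steps` separately and union the results, which is the complexity described above.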
Footnotes
1. Some columns removed, for brevity
2. https://github.com/actions/languageservices/tree/main/workflow-parser is written by GitHub, and used by various GitHub authored VS Code extensions