Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GitHub intel updates to error handling #1303

Merged
merged 8 commits into from
May 13, 2024
Merged

GitHub intel updates to error handling #1303

merged 8 commits into from
May 13, 2024

Conversation

serge-wq
Copy link
Contributor

Summary

We've found constant errors in the GitHub module when the response data does not have the expected schema. These updates, while not always prevent the crash, they will provide more insight into what happened.

@serge-wq serge-wq changed the title GitHub intel updates to error handlingg GitHub intel updates to error handling May 10, 2024
Copy link

@kledo-lyft kledo-lyft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we able to use this version using the git commit hash without having to merge?

cartography/intel/github/util.py Show resolved Hide resolved
Comment on lines +173 to +180
if 'data' not in resp:
logger.warning(
f'Got no "data" attribute in response: {resp}. '
f'Stopping requests for organization: {organization} and '
f'resource_type: {resource_type}',
)
has_next_page = False
continue

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this a known issue that could happen? It's like a key in a dictionary that we expect but doesn't exist; and that can happen in many other places. Wonder if we know more about this particular case to suspect it might happen here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data is the root element from GH GraphQL API.
Unfortunately, there is no official documentation explicitly describing the response 🙃 .
But, in case it doesn't exists, the code cannot get any data to ingest.

Copy link
Contributor Author

@serge-wq serge-wq May 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @heryxpc said, GraphQL response usually contain data as the root attribute in the response which contains the actual requested data. This is a convention I've seen across GraphQL, not only for GitHub. Another thing that I've seen across GraphQL APIs is that if there is no data available, the attributes are returned as undefined, null or not even present. That's why this check here ill help us know what happened.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this issue is what has been breaking the GitHub crons in our infra. As it's unclear in which page it happens, I'm just breaking out the loop gracefully and making sure we at least got the org_data.

Comment on lines -183 to -184

org_data = {'url': resp['data']['organization']['url'], 'login': resp['data']['organization']['login']}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume the change here is that: in the past, this information is retrieved in the last.
We change this to collect this info on every page if not already collected.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

org_data is mandatory to perform the full ingestion of Github info to Cartography: https://github.com/lyft/cartography/blob/fa1be115c80e83cefe0e31be5d738b5cda9a0887/cartography/intel/github/users.py#L116
I believe below code ensures that it exists on any page, otherwise it breaks the full process before starting ingestion.

heryxpc
heryxpc previously approved these changes May 10, 2024
Copy link
Contributor

@heryxpc heryxpc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
Like @khanhldt mentioned, might be good to skip the formatting changes on this PR and see if we can put black formatter to the whole repo on a separate PR.

@serge-wq
Copy link
Contributor Author

@heryxpc @khanhldt styling updates were applied by the linter if I remove them checks will fail. Can I get another look?

Copy link
Contributor

@khanhldt khanhldt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the linter changes are quite arbitrary as I tested this locally.
I'm not sure why, but don't have much time to look into this.

--

We should really set up Black on this

@serge-wq serge-wq merged commit a50573b into master May 13, 2024
5 checks passed
@serge-wq serge-wq deleted the github-gql-fixes branch May 13, 2024 18:01
chandanchowdhury pushed a commit to juju4/cartography that referenced this pull request Jun 26, 2024
## Summary

We've found constant errors in the GitHub module when the response data
does not have the expected schema. These updates, while not always
prevent the crash, they will provide more insight into what happened.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants