-
Notifications
You must be signed in to change notification settings - Fork 340
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GitHub intel updates to error handling #1303
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we able to use this version using the git commit hash without having to merge?
if 'data' not in resp: | ||
logger.warning( | ||
f'Got no "data" attribute in response: {resp}. ' | ||
f'Stopping requests for organization: {organization} and ' | ||
f'resource_type: {resource_type}', | ||
) | ||
has_next_page = False | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this a known issue that could happen? It's like a key in a dictionary that we expect but doesn't exist; and that can happen in many other places. Wonder if we know more about this particular case to suspect it might happen here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
data
is the root element from GH GraphQL API.
Unfortunately, there is no official documentation explicitly describing the response 🙃 .
But, in case it doesn't exists, the code cannot get any data to ingest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As @heryxpc said, GraphQL response usually contain data
as the root attribute in the response which contains the actual requested data. This is a convention I've seen across GraphQL, not only for GitHub. Another thing that I've seen across GraphQL APIs is that if there is no data available, the attributes are returned as undefined, null or not even present. That's why this check here ill help us know what happened.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, this issue is what has been breaking the GitHub crons in our infra. As it's unclear in which page it happens, I'm just breaking out the loop gracefully and making sure we at least got the org_data.
|
||
org_data = {'url': resp['data']['organization']['url'], 'login': resp['data']['organization']['login']} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume the change here is that: in the past, this information is retrieved in the last.
We change this to collect this info on every page if not already collected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
org_data
is mandatory to perform the full ingestion of Github info to Cartography: https://github.com/lyft/cartography/blob/fa1be115c80e83cefe0e31be5d738b5cda9a0887/cartography/intel/github/users.py#L116
I believe below code ensures that it exists on any page, otherwise it breaks the full process before starting ingestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Like @khanhldt mentioned, might be good to skip the formatting changes on this PR and see if we can put black
formatter to the whole repo on a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of the linter changes are quite arbitrary as I tested this locally.
I'm not sure why, but don't have much time to look into this.
--
We should really set up Black on this
## Summary We've found constant errors in the GitHub module when the response data does not have the expected schema. These updates, while not always prevent the crash, they will provide more insight into what happened.
Summary
We've found constant errors in the GitHub module when the response data does not have the expected schema. These updates, while not always prevent the crash, they will provide more insight into what happened.