Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

replay a GitHub event locally at a bot instance #178

Closed
wants to merge 10 commits into from

Conversation

jonas-lq
Copy link
Collaborator

@jonas-lq jonas-lq commented Apr 26, 2023

Fixes #94

PR for being able to replay events stored in the file system.

Triggered by commenting "replay " on the pr (Your GitHub app must subscribe to the event "Issue comment")

There were already some support for replaying events (see line 200 in eessi_bot_event_handler.py), but this is outdated and would not work even if the specified file contained both the header and body (PyGHee logs events as separate files for body and header). Should I change the code here to utilise my newly implemented function?

@trz42 trz42 changed the title Issue#94 replay a GitHub event locally at a bot instance Apr 27, 2023
Copy link
Contributor

@trz42 trz42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks already quite nice. The PR does more than what #94 asked for. #94 just suggested to have a means to replay an event locally (where a bot instance is running). A couple of suggestions:

  • remove the ability to send the replay command from GitHub (it will be impractical to use or worse lead to unintended consequences)
  • think of improving the efficiency, as of now os.walk() will traverse the whole directory of the bot instance and access 100k or even millions of entries
    • one could/should make use of the directory structure starting at events_log
    • one could also think of improving PyGHee such that it creates symlinks, eg, event_id -> directory that contains the header & body json
      • thus there need to be no lookup/traversal of the file system
  • there could be a new small script, say replay_event.py that imports the replay_event method
    • this probably requires to move the core of the method replay_event out of the EESSIBotEventHandler class to another module, say tools/replay_events.py
    • thus we could have a tool that replays an event and if we later decide to use that from the bot (event handler) we can just reuse it
  • a follow-up PR could improve the replay_event tool to limit the effects of replaying an event, for example, only replay it for certain architectures or job ids

Comment on lines 69 to 70
if comment_txt.lower()[:6] == "replay":
self.replay_event(comment_txt[7:])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to remove the ability to trigger the replay from GitHub. #94 only intended to replay an event locally, that is, where a bot instance is running. Someone who only has access to GitHub may also have difficulties to know/obtain the ID of an event.

"""
event = namedtuple('Request', ['headers', 'json', 'data'])

for (dir, _subdirs, files) in os.walk("."):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This traverses the whole directory tree under .. Benefit would be that even if the directory structure changes, it could still find the event. Disadvantage could be that it does not find the event because they are not stored under . (it's just a relative reference to the file system) and it checks many files/directories.

Could this be made more efficient and explicitly use the directory that stores event information?

. could be replaced with what app.cfg defines for the directory that contains jobs OR the directory PyGHee uses to log events. For example, for the current NESSI bot instances on Fram, eX3 and AWS, there are 15159, 15161 and 15129 entries under events_log, while under . there are 49037, 53289, 284738 entries, respectively.

Comment on lines 156 to 164
if any([event_id in file for file in files]):
with open(f"{dir}/{event_id}_headers.json", 'r') as jf:
headers = json.load(jf)
event.headers = CaseInsensitiveDict(headers)
with open(f"{dir}/{event_id}_body.json", 'r') as jf:
body = json.load(jf)
event.json = body

event_info = get_event_info(event)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks very good. Clean and concise. No immediate requests for changes. Maybe need to be updated if the os.walk() is replaced with something more efficient.

@trz42
Copy link
Contributor

trz42 commented Nov 26, 2023

Closing this one. We may revisit it at a later time.

@trz42 trz42 closed this Nov 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Replay a GitHub event locally
2 participants