rework logprep temp files #402
I think we will have issues with multiprocessing... :-/
temp_dir = Path(tempfile.gettempdir())
list_path = temp_dir / Path(f"{self.name}-tldlist-{index}.dat")
list_path.touch()
list_path.write_bytes(GetterFactory.from_string(tld_list).get_raw())
I think we could run into multiprocessing issues with this solution: not when touching or reading the file, but when two processes try to write to it at the same time, as they do in line 92.
Same in the other touched files.
How about creating the file as before, with the process name, but deleting it on restart?
Hm, yeah, I can try to delete the file or the whole temp directory on shutdown of logprep, but that doesn't help with your point from the issue: "this leads to downloading the file on every restart instead of using the already downloaded file."
If the issues can only happen when writing to the file, can't we solve this with a lock? I think the processors only create this file during startup and do not write to it while logprep is running. So creating these files initially under a lock should work, provided multiple processes can read from them at the same time. Besides, I think even the reading shouldn't be an issue, as the TLDExtract library should load the list contents into memory.
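To make the idea concrete, here is a minimal sketch of "create the file once under a lock, reuse it afterwards". It is not the PR's code: `ensure_list_file` and the `fetch` callback are hypothetical names, the sidecar `.lock` file is an assumption, and it uses the POSIX-only `fcntl.flock` from the standard library as a stand-in for whatever lock ends up being used.

```python
import fcntl
from pathlib import Path
from typing import Callable


def ensure_list_file(list_path: Path, fetch: Callable[[], bytes]) -> Path:
    """Create list_path at most once; later callers and restarts reuse it.

    `fetch` is a hypothetical callback that returns the raw list content,
    e.g. a download. It only runs if the file does not exist yet.
    """
    # Assumption: a sidecar lock file next to the data file.
    lock_path = list_path.with_name(list_path.name + ".lock")
    with open(lock_path, "w") as lock_file:
        # Blocks until no other process holds the exclusive lock.
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            # Re-check inside the lock: another process may have
            # written the file while this one was waiting.
            if not list_path.exists():
                list_path.write_bytes(fetch())
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
    return list_path
```

Only the process that gets the lock first writes; the others wait, see the file, and skip the download. Reads happen outside the lock, which matches the assumption above that the file is not rewritten while logprep is running.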
Yes, a lock should work for this.
We should also give it a try for the geoip_enricher database; we should then lock every database access.
I added a FileLock to these write operations.
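For readers who have not used it, a write guarded this way might look roughly like the sketch below. It assumes `FileLock` here is the one from the third-party `filelock` package; `write_locked`, the `.lock` suffix, and the timeout value are illustrative choices, not taken from the PR.

```python
from pathlib import Path

from filelock import FileLock  # third-party package: pip install filelock


def write_locked(list_path: Path, payload: bytes, timeout: float = 10.0) -> None:
    """Write payload to list_path while holding an inter-process file lock.

    Hypothetical helper: the lock file name and timeout are assumptions.
    """
    lock = FileLock(str(list_path) + ".lock", timeout=timeout)
    # Raises filelock.Timeout if the lock cannot be acquired in time.
    with lock:
        list_path.write_bytes(payload)
```

The lock only protects writers that also go through it, so every write path to the same file would need to use the same lock file.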
Codecov Report
@@            Coverage Diff             @@
##             main     #402      +/-   ##
==========================================
+ Coverage   92.21%   92.32%   +0.11%
==========================================
  Files         133      133
  Lines        9452     9460       +8
==========================================
+ Hits         8716     8734      +18
+ Misses        736      726      -10
closes #401