Gsoc phase2 #51
Conversation
It already looks to be in very good shape!
Hi @akshitpatel01, nice job!
A general comment: now that we have some basic functionality that uses TinyDB, I think we should test where the limit is. Can you generate a very large log file (in the order of 100s of MBs, up to 1 GB) and measure how long it takes to read, parse, filter, and so on? I'd like to know whether we should jump ship and move to something else while there's still time.
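A minimal timing harness for this could look like the sketch below. The `timed` helper and the sample line are my own (the line only loosely mimics ns-3's log prefix format), and the parse/filter calls are left as comments since their exact signatures live in this PR:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label):
    # Print wall-clock time for the enclosed block.
    start = time.perf_counter()
    yield
    print(f"{label}: {time.perf_counter() - start:.2f}s")

# Generate a large file by repeating a made-up ns-3-style line
# (~70 bytes per line, so 12.5M lines is on the order of a GB).
sample = "+0.000000000s 0 FrameExchangeManager:FrameExchangeManager(0x5612a3b0)\n"
with open("big.log", "w") as f:
    for _ in range(12_500_000):
        f.write(sample)

with timed("read"):
    lines = open("big.log").readlines()
# with timed("parse"):
#     entries = parse_logs(lines)   # the PR's parser; arguments assumed
# with timed("filter"):
#     filter_logs(entries, ...)     # likewise assumed
```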
Next step: tests :) While playing around with the code, I tried adding the following components to logging_example2.py: 'WifiPhy': 'level_all', 'FrameExchangeManager': 'level_all'. A couple of things seemed to go wrong - I think that's a good starting point for understanding how to improve the code!
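For anyone reproducing this in plain ns-3, those two entries would correspond to an NS_LOG string like the one below. How the example's dict actually maps onto NS_LOG is my assumption about the PR code, not something confirmed in this thread:

```python
# The components dict from the example above:
components = {
    'WifiPhy': 'level_all',
    'FrameExchangeManager': 'level_all',
}

# ns-3 separates components with ':' and options with '|':
ns_log = ':'.join(f'{c}={lvl}|prefix_all' for c, lvl in components.items())
print(ns_log)
# WifiPhy=level_all|prefix_all:FrameExchangeManager=level_all|prefix_all
```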
I found this while working on tests. Consider these two logs: if we greedily match the arguments (i.e. like this: ), the parse comes out wrong. Note: the second log was made manually for testing purposes and the first log is an actual log from FrameExchangeManager. Can you think of any workarounds?
I get the following output when applying
I'm not sure I get what is incorrect in this parsing - can you explain in further detail? Is it the arguments in the first entry?
Just a couple of minor comments.
sem/utils.py (outdated)
'Time': timestamp,                      # float
'Context': context/nodeId,              # str
'Extended_Context': ,                   # str
'Component': log component,             # str
'Function': function name,              # str
'Arguments': function arguments,        # str
'Level': log level,                     # str
'Severity_class': log severity class,   # str
'Message': log message                  # str
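To make the schema concrete, here is a made-up entry in that shape (all field values are illustrative only, not taken from a real trace):

```python
entry = {
    'Time': 0.0,                          # float, simulation time in seconds
    'Context': '0',                       # str, node id
    'Extended_Context': '',               # str, empty when the log has none
    'Component': 'FrameExchangeManager',  # str
    'Function': 'FrameExchangeManager',   # str
    'Arguments': '0x5612a3b0',            # str
    'Level': 'DEBUG',                     # str
    'Severity_class': 'DEBUG',            # str
    'Message': '',                        # str, empty for function-entry logs
}
```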
I'm not sure if this is being too picky, but dictionary keys are usually lower case (lower snake case, to be precise). Any opinion, @DvdMgr?
Yes. The arguments should be: But I think I solved it by updating the regex.
That's great (I was also missing the last parenthesis - thank you Davide for asking).
The '^' and '$' ensure that the regex matches the string from beginning to end. In other words, if after matching a log with this regex there are still some trailing characters present, previously the regex would match (some part of the log would match), but now it will not.
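A toy illustration of the difference (the real log regex is much larger; this pattern is only for demonstration):

```python
import re

loose = re.compile(r"(\w+):(\w+)\(\)")       # old behaviour: a partial match is enough
anchored = re.compile(r"^(\w+):(\w+)\(\)$")  # new behaviour: the whole line must match

line = "WifiPhy:Send() some trailing garbage"
print(loose.search(line))     # <re.Match ...> - the prefix still matches
print(anchored.search(line))  # None - trailing characters reject the line
```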
The following are the statistics for reading a file, parsing it, and executing two filters: 101 MB: 7.71 s. I think these numbers will be highly system-specific; in particular, on my system the main bottleneck for bigger files was RAM (I have 16 GB).
Just a few minor comments.
sem/utils.py (outdated)
if isinstance(severity_class, str):
    severity_class = [severity_class]

if severity_class is not None or components is not None:
components is checked but never used - is there a reason for this?
See 65afd95.
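For readers without the commit at hand, the fix presumably makes both filters take effect, along these lines (a sketch under my own assumptions about filter_logs, not the actual code in 65afd95):

```python
def filter_logs(logs, severity_class=None, components=None):
    # Accept single strings as well as lists for both filters.
    if isinstance(severity_class, str):
        severity_class = [severity_class]
    if isinstance(components, str):
        components = [components]
    for log in logs:
        if severity_class is not None and log['Severity_class'] not in severity_class:
            continue
        if components is not None and log['Component'] not in components:
            continue
        yield log
```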
The execution time is almost halved if I change this line in insert_logs: Does this performance improvement justify removing the deepcopy()? Also, all these functions will be called internally by the backend, so the user will never call them directly.
I have also added profiling plots here.
deepcopy() is quite aggressive and expensive; it makes sense that the runtime increases when you use it.
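A quick toy measurement of that overhead (the numbers will vary by machine; the dict just mimics one parsed entry):

```python
import copy
import timeit

entry = {'Time': 0.0, 'Component': 'WifiPhy', 'Message': 'x' * 64}
logs = [dict(entry) for _ in range(100_000)]

with_deepcopy = timeit.timeit(lambda: [copy.deepcopy(e) for e in logs], number=1)
without_copy = timeit.timeit(lambda: list(logs), number=1)
print(f"deepcopy: {with_deepcopy:.2f}s, no copy: {without_copy:.2f}s")
```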
Akshit, good job! I'm good to merge once these final comments are addressed.
Almost there! I found a couple of minor issues to fix before merging. Another thing that was mentioned during the call was the plan to move these functions into a logging.py file - do you think you can apply that change in this pull request too?
I created a dedicated logging file in the last commit. Do we want to create a new test_logging.py, or keep the tests in test_utils?
Best to move all those tests to test_logging.py.
Squashed and merged in 6ba05d1: thanks for the effort :)
Hi,
No problem, I'll make the change myself and force-push it to the
Done!
Hi @DvdMgr, @mattia-lecci,
I have added a new example, logging-example2.py, and the phase 2 functions process_logs, insert_logs, parse_logs, and filter_logs. It would be great if you could review this code.
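For context, a hypothetical end-to-end flow of the four functions, kept as comments because the import path, argument names, and the db handle are all guesses based on this conversation, not the actual signatures:

```python
# from sem.logging import parse_logs, insert_logs, filter_logs  # assumed path
#
# lines = open('example.log').readlines()
# entries = parse_logs(lines)                      # raw lines -> list of dicts
# insert_logs(entries, db)                         # persist entries in TinyDB
# wifi = filter_logs(entries, components='WifiPhy',
#                    severity_class='DEBUG')       # narrow down by field
```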