Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gsoc phase2 #51

Closed
wants to merge 29 commits into from
Closed

Conversation

akshitpatel01
Copy link

Hi @DvdMgr, @mattia-lecci,
I have added a new example logging-example2.py and the phase 2 functions process_logs, insert_logs, parse_logs, and filter_logs. It would be great if you could review this code.

Copy link
Member

@mattia-lecci mattia-lecci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks already in very good shape!

sem/utils.py Outdated Show resolved Hide resolved
sem/utils.py Outdated Show resolved Hide resolved
sem/utils.py Outdated Show resolved Hide resolved
sem/utils.py Outdated Show resolved Hide resolved
sem/utils.py Outdated Show resolved Hide resolved
sem/utils.py Outdated Show resolved Hide resolved
sem/utils.py Outdated Show resolved Hide resolved
sem/utils.py Outdated Show resolved Hide resolved
sem/utils.py Outdated Show resolved Hide resolved
sem/utils.py Outdated Show resolved Hide resolved
Copy link
Member

@DvdMgr DvdMgr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @akshitpatel01, nice job!

A general comment: now that we have some basic functionality that uses TinyDB, I think we should test where the limit is. Can you generate a very large log file (in the order of 100s of MBs, up to 1 GB) and measure how long it takes to read, parse, filter, and so on? I'd like to know whether we should jump ship and move to something else while there's still time.

Next step: tests :) while playing around with the code, I tried adding the following components to logging_example2.py: 'WifiPhy': 'level_all', 'FrameExchangeManager': 'level_all'. A couple things seemed to go wrong - I think that's a good starting point to understand how to improve the code!

examples/logging_example2.py Outdated Show resolved Hide resolved
examples/logging_example2.py Outdated Show resolved Hide resolved
examples/logging_example2.py Outdated Show resolved Hide resolved
sem/utils.py Outdated Show resolved Hide resolved
sem/utils.py Outdated Show resolved Hide resolved
sem/utils.py Outdated Show resolved Hide resolved
@akshitpatel01
Copy link
Author

I found this while working on tests. Consider these two logs:

  • +0.045510017s 1 [mac=00:00:00:00:00:01] FrameExchangeManager:RxStartIndication(0x5576595683e0, "PSDU reception started for ", +76us, " (txVector: ", txpwrlvl: 17 preamble: LONG channel width: 20 GI: 800 NTx: 97 Ness: 0 MPDU aggregation: 0 STBC: 0 FEC coding: BCC mode: OfdmRate6Mbps Nss: 1, ")")
  • +0.000000000s -1 WifiPhy:SetChannelNumber(): [DEBUG] (Message)

If we greedily match the arguments (i.e. like this: .*?), the first log will not be matched correctly and if we do not match greedily (i.e. like this .*), the second log will not be matched correctly. Matched correctly here means that these logs do get matched by the regex, but the groups are incorrectly formed.

Note: The second log is made manually for testing purposes and the first log is an actual log from FrameExchangeManager.

Can you guys think of any workarounds?

@DvdMgr
Copy link
Member

DvdMgr commented Jul 12, 2021

Can you guys think of any workarounds?

I get the following output when applying parse_logs to a file containing the logs you described above:

[{'Time': 0.045510017,
  'Context': '1',
  'Extended_context': '[mac=00:00:00:00:00:01] ',
  'Component': 'FrameExchangeManager',
  'Function': 'RxStartIndication',
  'Arguments': '0x5576595683e0, "PSDU reception started for ", +76us, " (txVector: ", txpwrlvl: 17 preamble: LONG channel width: 20 GI: 800 NTx: 97 Ness: 0 MPDU aggregation: 0 STBC: 0 FEC coding: BCC mode: OfdmRate6Mbps Nss: 1, "',
  'Severity_class': 'FUNCTION',
  'Message': ''},
 {'Time': 0.0,
  'Context': '-1',
  'Extended_context': None,
  'Component': 'WifiPhy',
  'Function': 'SetChannelNumber',
  'Arguments': '',
  'Severity_class': 'DEBUG',
  'Message': '(Message)'}]

I'm not sure I get what is incorrect in this parsing - can you explain in further detail? Is it the arguments in the first entry?

Copy link
Member

@mattia-lecci mattia-lecci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a couple of minor comments

sem/utils.py Outdated
Comment on lines 531 to 538
'Time': timestamp, # float
'Context': context/nodeId, # str
'Extended_Context': , #str
'Component': log component, # str
'Function': function name, # str
'Arguments': function arguments, # str
'Level': log level, # str
'Severity_class': log severity class, # str
'Message': log message # str
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm not sure if this is being too picky, but dictionary keys are usually lower case (lower snake case to be precise). any opinion @DvdMgr ?

sem/utils.py Outdated Show resolved Hide resolved
@akshitpatel01
Copy link
Author

akshitpatel01 commented Jul 12, 2021

I'm not sure I get what is incorrect in this parsing - can you explain in further detail? Is it the arguments in the first entry?

Yes. The arguments should be:
'0x5576595683e0, "PSDU reception started for ", +76us, " (txVector: ", txpwrlvl: 17 preamble: LONG channel width: 20 GI: 800 NTx: 97 Ness: 0 MPDU aggregation: 0 STBC: 0 FEC coding: BCC mode: OfdmRate6Mbps Nss: 1, ")" '

But I think I solved it by updating the regex.
newRegex = ^OldRegex$

@mattia-lecci
Copy link
Member

But I think I solved it by updating the regex.
newRegex = ^OldRegex$

That's great (i was also missing the last parenthesis, thank you Davide for asking).
Although, I am not sure what you mean with this last sentence. Could you explain the new regex?

@akshitpatel01
Copy link
Author

The '^$' ensures that the regex begins and ends with the regex. So in other words, if after matching a log with this regex, there are some trailing characters still present, then previously the regex will match (some part of the log will match) but now it will not.

@akshitpatel01
Copy link
Author

Can you generate a very large log file (in the order of 100s of MBs, up to 1 GB) and measure how long it takes to read, parse, filter, and so on

The following are the statistics for reading a file, parsing it, and executing two filters:

101MB: 7.711092710494995s
205MB: 16.106702089309692s
308MB: 24.7750563621521s
411MB: 33.843762159347534s (8s for two filters)
513MB: 47.96710252761841s
617MB: 55.62735986709595s
719MB: 64.37102270126343s
815MB: 73.82458209991455s (17s for two filters)
895MB: 77.21614909172058s (18s for two filters)
964MB: 86.9548716545105s (19s for two filters)

I think these numbers will be highly system-specific. Particularly on my system, the main bottleneck was RAM(I have 16G) for bigger files.

Copy link
Member

@mattia-lecci mattia-lecci left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a few minor comments

sem/utils.py Outdated Show resolved Hide resolved
sem/utils.py Outdated Show resolved Hide resolved
sem/utils.py Outdated Show resolved Hide resolved
sem/utils.py Outdated Show resolved Hide resolved
sem/utils.py Outdated Show resolved Hide resolved
tests/test_utils.py Outdated Show resolved Hide resolved
tests/test_utils.py Outdated Show resolved Hide resolved
tests/test_utils.py Outdated Show resolved Hide resolved
tests/test_utils.py Outdated Show resolved Hide resolved
tests/test_utils.py Outdated Show resolved Hide resolved
sem/utils.py Outdated
if isinstance(sevirity_class, str):
sevirity_class = [sevirity_class]

if severity_class is not None or components is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

components was check but never used. is there a reason for this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refer 65afd95.

tests/test_utils.py Outdated Show resolved Hide resolved
sem/utils.py Outdated Show resolved Hide resolved
sem/utils.py Outdated Show resolved Hide resolved
@akshitpatel01
Copy link
Author

The following are the statistics for reading a file, parsing it, and executing two filters:

This execution time almost reduces by half if I change this line in insert_logs:
db.table('logs').insert_multiple(deepcopy(logs))
to
db.table('logs').insert_multiple(logs).

Does this improvement in performance justify removing this deepcopy()? Also, all these functions will be internally called by the backend so the user will never call these functions directly.

@akshitpatel01
Copy link
Author

I have also added profiling plots here.

sem/utils.py Outdated Show resolved Hide resolved
tests/test_utils.py Outdated Show resolved Hide resolved
tests/test_utils.py Outdated Show resolved Hide resolved
tests/test_utils.py Outdated Show resolved Hide resolved
@mattia-lecci
Copy link
Member

This execution time almost reduces by half if I change this line in insert_logs:
db.table('logs').insert_multiple(deepcopy(logs))
to
db.table('logs').insert_multiple(logs).

Deepcopy is quite aggressive and expensive, it makes sense that runtime increase if you use it.
Since we never modify the original log file nor the log dictionaries, it might make sense to avoid deepcopy the logs within the program

Copy link
Member

@DvdMgr DvdMgr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Akshit, good job! I'm good to merge after these final comments are implemented.

sem/utils.py Outdated Show resolved Hide resolved
sem/utils.py Outdated Show resolved Hide resolved
sem/utils.py Outdated Show resolved Hide resolved
sem/utils.py Outdated Show resolved Hide resolved
sem/utils.py Outdated Show resolved Hide resolved
sem/utils.py Outdated Show resolved Hide resolved
sem/utils.py Outdated Show resolved Hide resolved
sem/utils.py Outdated Show resolved Hide resolved
sem/utils.py Outdated Show resolved Hide resolved
sem/utils.py Outdated Show resolved Hide resolved
Copy link
Member

@DvdMgr DvdMgr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there! I found a couple of minor issues to fix before merging. Another thing that was mentioned during the call was the plan to move these functions into a logging.py file - do you think you can apply that change to this pull request too?

sem/utils.py Outdated Show resolved Hide resolved
examples/logging_example2.py Outdated Show resolved Hide resolved
@akshitpatel01
Copy link
Author

Another thing that was mentioned during the call was the plan to move these functions into a logging.py

I created a dedicated logging file in the last commit. Do we want to create a new test_logging.py or keep the tests in test_utils?

@DvdMgr
Copy link
Member

DvdMgr commented Jul 28, 2021

I created a dedicated logging file in the last commit. Do we want to create a new test_logging.py or keep the tests in test_utils?

Best to move all those tests to test_logging.py, you are right!

@DvdMgr
Copy link
Member

DvdMgr commented Jul 28, 2021

Squashed and merged in 6ba05d1: thanks for the effort :)

@DvdMgr DvdMgr closed this Jul 28, 2021
@akshitpatel01
Copy link
Author

Hi,
There is a minor issue in sem.logging::filter_logs(). The following line:
if query_final is not None:
needs to be changed to
if query_final != []:.
Should I push a commit on my local repository for this?

@DvdMgr
Copy link
Member

DvdMgr commented Jul 28, 2021

Hi,
There is a minor issue in sem.logging::filter_logs(). The following line:
if query_final is not None:
needs to be changed to
if query_final != []:.
Should I push a commit on my local repository for this?

No problem, I'll make the change myself and force-push it to the gsoc branch!

@DvdMgr
Copy link
Member

DvdMgr commented Jul 28, 2021

Done!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants