Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

check-sof-logger: rewrite most of it so it can actually find bugs #666

Merged
merged 1 commit into from
Jun 10, 2021

Conversation

marc-hb
Copy link
Collaborator

@marc-hb marc-hb commented Apr 27, 2021

Probably the main change is fixing the huge etrace test gaps #321 and
thesofproject/sof#3281

Also fixes DMA trace gaps #297 and #298

I initial tried to preserve some of the existing code but it was just
too bad. PR #161 / commit 7274f49 seemed especially bad:

  • It tried to ignore a specific ll drift error but instead it filtered
    out almost every log statement out of... stderr, that does not have show
    log statements!! (Just for the record this ll drift error has been
    downgraded to warning now, see
    agent: trace drift message as warning, not error sof#2686 and
    [BUG] WARN validate(), ll drift detected sof#3854)

  • That same commit also added code that merely starts the DMA trace with
    "there is an error below" (without failing the test) but that's eclipsed
    by the entire log that follows. Later, the firmware started printing
    ERROR every single time when the ERROR FW ABI prefix was introduced yet
    no one ever noticed which proves how useless this prefix is was.

So remove this DMA trace prefix as the purpose of this test is - as
clearly stated in #167 - not to find firmware errors but errors with the
sof-logger itself (even though we never had anything looking at firmware
errors so far)

Don't grep for "error" on stderr: anything on stderr is a logger
failure (not a firmware failure).

Don't require whitespace before the TIMESTAMP header.

Add set -e.

Use shell functions.

Signed-off-by: Marc Herbert marc.herbert@intel.com

@marc-hb
Copy link
Collaborator Author

marc-hb commented Apr 27, 2021

https://sof-ci.01.org/softestpr/PR666/build677/devicetest is all green despite failing to parse the new output

@marc-hb
Copy link
Collaborator Author

marc-hb commented Apr 27, 2021

@aiChaoSONG , @fredoh9 these "Log $ftype BEG>>" changes seem to break some CI parsing, see for instance the bottom of https://sof-ci.01.org/softestpr/PR666/build677/devicetest/?model=CML_RVP_SDW&testcase=check-sof-logger

Any idea what exactly breaks and maybe how?

@aiChaoSONG
Copy link

@marc-hb drivers.audio.ci.sof-jenkins/blob/main/json2html.py this file will add some style to plain text log, and transform log into html format, in html format, pair of <any contents> can not display.

@marc-hb
Copy link
Collaborator Author

marc-hb commented Apr 27, 2021

Thanks for the pointer! BTW how come these tests pass without any TIMEOUT? They don't print PASS at the end! The similar situation in #661 caused TIMEOUTs

@aiChaoSONG
Copy link

@marc-hb No, there is!!!, if you check the source from your browser, it looks like this:

image

@marc-hb
Copy link
Collaborator Author

marc-hb commented Apr 27, 2021

Got it, thanks! Looks like I just invented the invisible PASS :-)

@marc-hb
Copy link
Collaborator Author

marc-hb commented Apr 27, 2021

@aiChaoSONG , @fredoh9 , @keqiaozhang : I thought this PR just broke some code looking for Log data BEG>> and <<END whatever but actually:

marc-hb added a commit to marc-hb/sof-test that referenced this pull request Apr 27, 2021
Also change <<END to ::END and print both files always.

<< and >> seem to corrupt the generated HTML and I could not find
anything parsing these anyway as of now.

See discussion in (much larger) PR thesofproject#666 for more details.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
marc-hb added a commit to marc-hb/sof-test that referenced this pull request Apr 27, 2021
Also change <<END to ::END and print both files always.

<< and >> seem to corrupt the generated HTML and I could not find
anything parsing these anyway as of now.

See discussion in (much larger) PR thesofproject#666 for more details.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
@marc-hb
Copy link
Collaborator Author

marc-hb commented Apr 28, 2021

https://sof-ci.01.org/softestpr/PR666/build680/devicetest/?model=APL_UP2_NOCODEC&testcase=check-suspend-resume-with-playback is the usual #650 TIMEOUT, everything else is green.

In the same run the check-sof-logger outputs look OK except the 2, now 4 logger tabs are all empty but that's a problem that long predates this PR (example at https://sof-ci.01.org/sofpr/PR3995/build8564/devicetest/?model=CML_HEL_RT5682&testcase=check-sof-logger ) The logger tabs seem to be empty only for the check-sof-logger test, maybe because it deals with the sof-logger differently. Still, tabs should never be empty without at least a message why.

Long story short this PR is ready however I'd like much smaller PR #668 to go first (DONE) to be cautious and make really really sure the BEG>> -> BEG:: change does not break anything. Maybe the reason it does not break anything is because it's already broken?

marc-hb added a commit that referenced this pull request Apr 28, 2021
Also change <<END to ::END and print both files always.

<< and >> seem to corrupt the generated HTML and I could not find
anything parsing these anyway as of now.

See discussion in (much larger) PR #666 for more details.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
@marc-hb
Copy link
Collaborator Author

marc-hb commented Apr 28, 2021

https://sof-ci.01.org/softestpr/PR666/build681/devicetest has the usual #650 TIMEOUT and everything else is green and looks good.

The dmesg and logger tabs are empty for this test (and for this test only) but that's not new at all.

@marc-hb marc-hb marked this pull request as ready for review April 28, 2021 22:01
@marc-hb marc-hb requested a review from a team as a code owner April 28, 2021 22:01
@marc-hb
Copy link
Collaborator Author

marc-hb commented May 26, 2021

SOFCI TEST

EDIT: https://sof-ci.01.org/softestpr/PR666/build702/devicetest is all green and looks OK

@aiChaoSONG
Copy link

aiChaoSONG commented May 26, 2021

Seems this case generated several empty logger log files.
https://sof-ci.01.org/softestpr/PR666/build702/devicetest/
image

@marc-hb
Copy link
Collaborator Author

marc-hb commented May 26, 2021

Seems this case generated several empty logger log files.

Yes thanks for paying close attention! This is a known, UI issue, see comments and links above. It's not caused by this PR. You can still download the logs, only the UI is empty.

The only difference this PR makes is that there are now 4 empty tabs instead of 2.

Copy link

@aiChaoSONG aiChaoSONG left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once the log file are renamed to *.txt, it is good to merge. thank you @marc-hb

# Search for the log header, should be something like this:
# TIMESTAMP DELTA C# COMPONENT LOCATION CONTENT
for f in etrace data; do
local tracef="$LOG_ROOT/logger.$f.log"
Copy link

@aiChaoSONG aiChaoSONG Jun 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The web cannot render .log file, so it automatically download it for you, that's what you see in the internal web, please rename it to *.txt.

dlogw "Catch keyword 'ERROR' in firmware log"
fi
for f in etrace_stderr error; do
local stderr_file="$LOG_ROOT/logger.$f.log"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

local bname
for ftype in data etrace error etrace_stderr; do
printf '\n\n'
bname="logger.$ftype.log"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

func_lib_check_sudo
# etrace shared memory mailbox, newer feature.
etrace_file=$LOG_ROOT/logger.etrace.log
etrace_stderr_file=$LOG_ROOT/logger.etrace_stderr.log

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto.

data_file=$LOG_ROOT/logger.data.log
# stderr
error_file=$LOG_ROOT/logger.error.log

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The web cannot render .log file, so it automatically download it for you, that's what you see in the internal web, please rename it to *.txt.

Probably the main change is fixing the huge etrace test gaps thesofproject#321 and
thesofproject/sof#3281

Also fixes DMA trace gaps thesofproject#297 and thesofproject#298

I initial tried to preserve some of the existing code but it was just
too bad. PR thesofproject#161 / commit 7274f49 seemed especially bad:

- It tried to ignore a specific `ll drift` error but instead it filtered
out almost every log statement out of... stderr, that does not have show
log statements!! (Just for the record this `ll drift` error has been
downgraded to warning now, see
thesofproject/sof#2686 and
thesofproject/sof#3854)

- That same commit also added code that merely starts the DMA trace with
"there is an error below" (without failing the test) but that's eclipsed
by the entire log that follows. Later, the firmware started printing
ERROR every single time when the ERROR FW ABI prefix was introduced yet
no one ever noticed which proves how useless this prefix is was.

So remove this DMA trace prefix as the purpose of this test is - as
clearly stated in thesofproject#167 - not to find firmware errors but errors with the
sof-logger itself (even though we never had anything looking at firmware
errors so far)

Don't grep for "error" on stderr: anything on stderr is a logger
failure (not a firmware failure).

Don't require whitespace before the TIMESTAMP header.

Add set -e.

Use shell functions.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
@marc-hb
Copy link
Collaborator Author

marc-hb commented Jun 9, 2021

The [ 1034.664354] kernel: IO error in rt700_jack_detect_handler, ret -16 in https://sof-ci.01.org/softestpr/PR666/build721/devicetest/?model=CML_RVP_SDW&testcase=check-suspend-resume-5 is the well known thesofproject/linux#2943

In the same run, the logger tabs are visible and not auto-downloaded like the .log files were. Thanks @aiChaoSONG ! It's still confusing to have all 4 of them named the same and to have the stderr tabs empty but that can be fixed later.

@marc-hb marc-hb requested a review from aiChaoSONG June 9, 2021 18:43
@marc-hb
Copy link
Collaborator Author

marc-hb commented Jun 10, 2021

Thanks @aiChaoSONG . This has been in review for months and badly needed for tracing changes like thesofproject/sof#4327 and others. Merging.

@marc-hb marc-hb merged commit 8732f67 into thesofproject:main Jun 10, 2021
@marc-hb marc-hb deleted the logger-test-rewrite branch June 10, 2021 04:11
@marc-hb marc-hb added False Pass / green failure area:logs Log and results collection, storage, etc. labels Jul 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:logs Log and results collection, storage, etc. False Pass / green failure
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants