Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SQA] Door lock test cases failed due to "Invalid CASE parameter" #17244

Closed
Survensa opened this issue Apr 11, 2022 · 15 comments · Fixed by #17663
Closed

[SQA] Door lock test cases failed due to "Invalid CASE parameter" #17244

Survensa opened this issue Apr 11, 2022 · 15 comments · Fixed by #17663

Comments

@Survensa
Copy link

Door lock cases were failed due to "Invalid CASE parameter" in few seconds after it has been paired.

During any "read" actions after it has been paired, Chip-Tool log thrown an error "The device does not support GetClock_RealTimeMS() API. This will eventually result in CASE session setup failures."

Following the above error, it has failed with "Invalid CASE parameter" for the further commands from Chip-tool.

Entire log has been attached. It contains few read attribute commands & the failure of the commands at the end of the log.

Error log file :
DL timeout error log.txt

@bzbarsky-apple
Copy link
Contributor

bzbarsky-apple commented Apr 13, 2022

Some questions, @Survensa:

  1. What SHA is being used for chip-tool? What SHA is being used for the server app?
  2. Which server app is being used?
  3. Which platform is the server app running on?
  4. Is this reliably reproducible? Does it happen after the same number of reads each time, or is it somewhat random?

@bzbarsky-apple
Copy link
Contributor

Oh, and:

"The device does not support GetClock_RealTimeMS() API. This will eventually result in CASE session setup failures."

"eventually" here means "after some code gets changed to require time-based certificate validation". Which has not happened, so those messages are not relevant to whatever is going on here.

@Survensa
Copy link
Author

  1. Commit used : fa5561c
  2. Used doorlock app.
  3. Executed doorlock app using ble-wifi platform.
  4. Its a random count, the issue is like after executing any command I am able to execute the next command but if that interval is > 60 sec it throws "CASE session setup failures" even if we run the same command.

Note : Used recent commit 3af5410 ,here I am able to pair with onnetwork and not faced the issue but I am not able to pair using ble-wifi platform.

@bzbarsky-apple
Copy link
Contributor

Executed doorlock app using ble-wifi platform.

No, I mean what hardware/OS? In terms of trying to reproduce, it's best if people are using a setup as close to the one you have as possible.

@bzbarsky-apple
Copy link
Contributor

Oh, and when you say "doorlock app", does that mean the thing in examples/lock-app or in examples/door-lock-app?

@cjandhyala
Copy link
Contributor

@bzbarsky-apple it is examples/door-lock-app

@bzbarsky-apple
Copy link
Contributor

Alright, I have tried to reproduce on that SHA for a while as follows:

  1. scripts/examples/gn_build_example.sh examples/door-lock-app/linux out/debug/standalone chip_config_network_layer_ble=false
  2. scripts/examples/gn_build_example.sh examples/chip-tool out/debug/standalone/
  3. rm /tmp/chip*
  4. Run ./out/debug/standalone/chip-door-lock-app
  5. ./out/debug/standalone/chip-tool pairing onnetwork 17 20202021
  6. Repeat calls to ./out/debug/standalone/chip-tool doorlock read attribute-list 1 1 a number of times, with various gaps between them (did gaps of up to 10 mins or so). Every read worked.

Maybe this is specific to the ble-wifi case, but that seems pretty doubtful. But just to check:

  • Can you reproduce when using on-network commissioning.
  • Either way, can you attach a log from the server (and a matching log for the client) for a failing case?

@Survensa
Copy link
Author

@bzbarsky-apple Door-lock-app work fine on onnetwork/ethernet but facing the issue only on Ble-wifi.

I have attached both DUT and CNTRL log for the "CASE Parameter" error .

LOG:
door lock ctrl log.txt
door lock dut log.txt

@bzbarsky-apple
Copy link
Contributor

bzbarsky-apple commented Apr 14, 2022

Door-lock-app work fine on onnetwork/ethernet but facing the issue only on Ble-wifi.

I am finding this very hard to believe; once we finish commissioning the two cases should act identically.

What other differences are there between the ble-wifi and onnetwork cases?

I never did see an answer to the "what hardware/OS?" question....

Looking at the logs, the key part is this, on the server:

[1649916219.318395][3326:3326] CHIP:SC: CASE failed to match destination ID with local fabrics

and then it logs that it failed out because there is no NOC for the fabric on the device:

[1649916219.319358][3326:3326] CHIP:IN: CASE Session establishment failed: ../../third_party/connectedhomeip/src/credentials/FabricTable.h:164: CHIP Error 0x00000003: Incorrect state

(That's in FabricInfo::GetNOCCert.)

So we had an IPK match the group storage for some fabic index, but the actual fabric for that index was not initialized properly...

Just to check: are the client and server running on the same hardware or different pieces of hardware?

@Survensa
Copy link
Author

@bzbarsky-apple Using two different raspberrypi, one as DUT and other as CNTRL.

@bzbarsky-apple
Copy link
Contributor

bzbarsky-apple commented Apr 18, 2022

OK. @Survensa Can you step through in a debugger on the CASE responder and see what's actually going on? What does the fabric table look like?

Is it possible the storage is somehow getting reset on the rPI running the CASE responder?

@liamgonyea
Copy link
Contributor

I had run the same commands mentioned above and had the same findings as well with no failures when chip-tool reads an attribute from chip-door-lock-app after onNetwork pairing

Steps Ran

  1. scripts/examples/gn_build_example.sh examples/door-lock-app/linux out/debug/standalone chip_config_network_layer_ble=false
  2. scripts/examples/gn_build_example.sh examples/chip-tool out/debug/standalone/
  3. ./out/debug/standalone/chip-door-lock-app
  4. ./out/debug/standalone/chip-tool pairing onnetwork 17 202020215.
  5. ./out/debug/standalone/chip-tool doorlock read cluster-revision 17 1

Test Setup

1st RasPi: chip-tool built with commit SHA: fa5561c23d8c1f9464da8669931affbae0660f51

2nd RasPi: door-lock-app built with commit SHA: fa5561c23d8c1f9464da8669931affbae0660f51
scripts/examples/gn_build_example.sh examples/door-lock-app/linux out/debug/standalone chip_config_network_layer_ble=false

Lines of Interest

Lines of Interest:

ubuntu@controller:~/TE/stableBuild/connectedhomeip$ ./out/debug/standalone/chip-tool pairing onnetwork 17 20202021
...
[1650578349.007276][17409:17416] CHIP:TOO: Device commissioning completed with success

ubuntu@controller:~/TE/stableBuild/connectedhomeip$ ./out/debug/standalone/chip-tool doorlock read cluster-revision 17 1
...
[1650579166.721625][17799:17804] CHIP:TOO:   ClusterRevision: 3

Logs

chip-tool_issue_17244.txt
doorlock_issue_17244.txt

@liamgonyea
Copy link
Contributor

When pairing over ble-wifi, I had experienced the same issue when chip-tool attempts to read an attribute from chip-door-lock-app

Summary of Findings

(chip-tool logs for each attempt at reading an attribute from chip-door-lock-app)

1st Attempt: Successful

[1650657650.054116][3880:3885] CHIP:TOO: ClusterRevision: 3

2nd Attempt: Fail (Invalid CASE parameter)

[1650657980.921764][3898:3898] CHIP:TOO: Run command failure: ../../../examples/chip-tool/third_party/connectedhomeip/src/protocols/secure_channel/CASESession.cpp:1479: CHIP Error 0x00000054: Invalid CASE parameter

3rd Attempt: Fail (Timeout)

[1650658795.688720][3936:3936] CHIP:TOO: Run command failure: ../../../examples/chip-tool/third_party/connectedhomeip/src/lib/address_resolve/AddressResolve_DefaultImpl.cpp:174: CHIP Error 0x00000032: Timeout

Steps Ran

  1. scripts/examples/gn_build_example.sh examples/door-lock-app/linux out/debug/standalone_ble chip_config_network_layer_ble=true
  2. scripts/examples/gn_build_example.sh examples/chip-tool out/debug/standalone/
  3. ./out/debug/standalone_ble/chip-door-lock-app
  4. ./out/debug/standalone/chip-tool pairing ble-wifi 1234 SSID SSIDpwd 20202021 3840
  5. ./out/debug/standalone/chip-tool doorlock read cluster-revision 1234 1
  6. ./out/debug/standalone/chip-tool doorlock read cluster-revision 1234 1
  7. ./out/debug/standalone/chip-tool doorlock read cluster-revision 1234 1

Test Setup

1st RasPi: chip-tool built with commit SHA: fa5561c23d8c1f9464da8669931affbae0660f51

2nd RasPi: door-lock-app built with commit SHA: fa5561c23d8c1f9464da8669931affbae0660f51
scripts/examples/gn_build_example.sh examples/door-lock-app/linux out/debug/standalone_ble chip_config_network_layer_ble=true

Logs

ble_wifi_chip-tool_issue_17244.txt
ble_wifi_doorlock_issue_17244.txt

@bzbarsky-apple
Copy link
Contributor

@liamgonyea Thank you! Those logs make it pretty clear what's going on here:

  1. The commissionee in this case is already on the network (via ethernet I am told) but also being commissioned as a wifi device.
  2. Commissioning fails after AddNOC, because the commissionee claims to be a thread-only device, or so we think, but we only have wifi credentials. From the chip-tool log:
    [1650657604.415267][3870:3875] CHIP:CTL: Finished commissioning step 'SendNOC' with error '../../../examples/chip-tool/third_party/connectedhomeip/src/controller/CHIPDeviceController.cpp:1390: Success'
    [1650657604.415327][3870:3875] CHIP:CTL: Required network information not provided in commissioning parameters
    [1650657604.415378][3870:3875] CHIP:CTL: Parameters supplied: wifi (yes) thread (no)
    [1650657604.415428][3870:3875] CHIP:CTL: Device supports: wifi (no) thread(yes)
    [1650657604.415492][3870:3875] CHIP:CTL: Going from commissioning step 'SendNOC' with lastErr = '../../../examples/chip-tool/third_party/connectedhomeip/src/controller/AutoCommissioner.cpp:180: CHIP Error 0x0000002F: Invalid argument' --> 'Cleanup'
    
  3. The commissioner claims to commission successfully (says Device commissioning completed with success) because the cleanup step succeeds, but that's a lie, of course. We failed out in an earlier step. We need to get this part fixed.
  4. Since we did AddNOC, the commissionee advertises operational, and it's on the network, so when we try to do a read it works.
  5. Some time passes, and the fail-safe expires on the commissionee:
    [1650657662.023673][10820:10820] CHIP:SVR: Failsafe timer expired
    ...
    [1650657662.023706][10820:10820] CHIP:SVR: Commissioning failed (attempt 1): ../../../examples/door-lock-app/linux/third_party/connectedhomeip/src/app/server/CommissioningWindowManager.cpp:63: CHIP Error 0x00000032: Timeout
    
    and then it kills off the new fabric we added. The next time we try a read, we still have the cached advertisement (until the DNS TTL expires), so even though commissionee is no longer advertising operational we discover it. But when we try to set up CASE that fails, because the fabric is gone on the commissionee.

The "we think this is thread" issue is suspiciously like #17477 but in reverse.

@bzbarsky-apple
Copy link
Contributor

OK, so LinuxWiFiDriver is only instantiated in all-clusters-app, lighting-app, and thermostat. LinuxThreadDriver only in all-clusters-app, for that matter.

Which means that the door lock app is not actually running a useful network commissioning server at all. But it does claim to implement the cluster, and has a FeatureMap attribute with default value 2: see

So that's what we end up with here.

bzbarsky-apple added a commit to bzbarsky-apple/connectedhomeip that referenced this issue Apr 23, 2022
…oning.

When AutoCommissioner::GetNextCommissioningStageInternal moves out of
the kSendNOC state, it can end up in one of three states:
kWiFiNetworkSetup, kThreadNetworkSetup, and kCleanup.  This third case
happens if we don't have credentials for the network technology the
commissionee supports.  In that case, we set the error outparam and
proceed to cleanup.

Unfortunately, the caller completely ignores the outparam, except for
logging.  So we get situations like
project-chip#17244 where the
log looks like this:

    CHIP:CTL: Finished commissioning step 'SendNOC' with error '../../../examples/chip-tool/third_party/connectedhomeip/src/controller/CHIPDeviceController.cpp:1390: Success'
    CHIP:CTL: Required network information not provided in commissioning parameters
    CHIP:CTL: Parameters supplied: wifi (yes) thread (no)
    CHIP:CTL: Device supports: wifi (no) thread(yes)
    CHIP:CTL: Going from commissioning step 'SendNOC' with lastErr = '../../../examples/chip-tool/third_party/connectedhomeip/src/controller/AutoCommissioner.cpp:180: CHIP Error 0x0000002F: Invalid argument' --> 'Cleanup'
    CHIP:CTL: Performing next commissioning step 'Cleanup' with completion status = '../../../examples/chip-tool/third_party/connectedhomeip/src/controller/CHIPDeviceController.cpp:1390: Success'
    CHIP:CTL: Finished commissioning step 'Cleanup' with error '../../../examples/chip-tool/third_party/connectedhomeip/src/controller/CHIPDeviceController.cpp:1497: Success'
    CHIP:TOO: Device commissioning completed with success

and that last line shows the pairing delegate was notified that
commissioning completed successfully, even though we bailed out
partway through.

The fix is to store the error in our completionStatus.  With this
change, the log ends up looking like this:

    CHIP: [CTL] Successfully finished commissioning step 'SendNOC'
    CHIP: [CTL] Required network information not provided in commissioning parameters
    CHIP: [CTL] Parameters supplied: wifi (no) thread (yes)
    CHIP: [CTL] Device supports: wifi (yes) thread(no)
    CHIP: [CTL] Going from commissioning step 'SendNOC' with lastErr = '../../../examples/chip-tool/third_party/connectedhomeip/src/controller/AutoCommissioner.cpp:203: CHIP Error 0x0000002F: Invalid argument' -> 'Cleanup'
    CHIP: [CTL] Performing next commissioning step 'Cleanup' with completion status = '../../../examples/chip-tool/third_party/connectedhomeip/src/controller/AutoCommissioner.cpp:203: CHIP Error 0x0000002F: Invalid argument'
    CHIP: [TOO] Device commissioning Failure: ../../../examples/chip-tool/third_party/connectedhomeip/src/controller/AutoCommissioner.cpp:203: CHIP Error 0x0000002F: Invalid argument

and we see the PairingDelegate is notified with an error status, as expected.

Another option would have been to add a new commissioning step for "no
network credentials" and set the completion status to error from that.
That would have the benefit of also setting the failedStage in the
completion status, which we can't easily set right now because part of
the problem is that we don't actually know what stage to proceed to
and hence which stage to deem as having failed.
bzbarsky-apple added a commit that referenced this issue Apr 25, 2022
…oning. (#17662)

When AutoCommissioner::GetNextCommissioningStageInternal moves out of
the kSendNOC state, it can end up in one of three states:
kWiFiNetworkSetup, kThreadNetworkSetup, and kCleanup.  This third case
happens if we don't have credentials for the network technology the
commissionee supports.  In that case, we set the error outparam and
proceed to cleanup.

Unfortunately, the caller completely ignores the outparam, except for
logging.  So we get situations like
#17244 where the
log looks like this:

    CHIP:CTL: Finished commissioning step 'SendNOC' with error '../../../examples/chip-tool/third_party/connectedhomeip/src/controller/CHIPDeviceController.cpp:1390: Success'
    CHIP:CTL: Required network information not provided in commissioning parameters
    CHIP:CTL: Parameters supplied: wifi (yes) thread (no)
    CHIP:CTL: Device supports: wifi (no) thread(yes)
    CHIP:CTL: Going from commissioning step 'SendNOC' with lastErr = '../../../examples/chip-tool/third_party/connectedhomeip/src/controller/AutoCommissioner.cpp:180: CHIP Error 0x0000002F: Invalid argument' --> 'Cleanup'
    CHIP:CTL: Performing next commissioning step 'Cleanup' with completion status = '../../../examples/chip-tool/third_party/connectedhomeip/src/controller/CHIPDeviceController.cpp:1390: Success'
    CHIP:CTL: Finished commissioning step 'Cleanup' with error '../../../examples/chip-tool/third_party/connectedhomeip/src/controller/CHIPDeviceController.cpp:1497: Success'
    CHIP:TOO: Device commissioning completed with success

and that last line shows the pairing delegate was notified that
commissioning completed successfully, even though we bailed out
partway through.

The fix is to store the error in our completionStatus.  With this
change, the log ends up looking like this:

    CHIP: [CTL] Successfully finished commissioning step 'SendNOC'
    CHIP: [CTL] Required network information not provided in commissioning parameters
    CHIP: [CTL] Parameters supplied: wifi (no) thread (yes)
    CHIP: [CTL] Device supports: wifi (yes) thread(no)
    CHIP: [CTL] Going from commissioning step 'SendNOC' with lastErr = '../../../examples/chip-tool/third_party/connectedhomeip/src/controller/AutoCommissioner.cpp:203: CHIP Error 0x0000002F: Invalid argument' -> 'Cleanup'
    CHIP: [CTL] Performing next commissioning step 'Cleanup' with completion status = '../../../examples/chip-tool/third_party/connectedhomeip/src/controller/AutoCommissioner.cpp:203: CHIP Error 0x0000002F: Invalid argument'
    CHIP: [TOO] Device commissioning Failure: ../../../examples/chip-tool/third_party/connectedhomeip/src/controller/AutoCommissioner.cpp:203: CHIP Error 0x0000002F: Invalid argument

and we see the PairingDelegate is notified with an error status, as expected.

Another option would have been to add a new commissioning step for "no
network credentials" and set the completion status to error from that.
That would have the benefit of also setting the failedStage in the
completion status, which we can't easily set right now because part of
the problem is that we don't actually know what stage to proceed to
and hence which stage to deem as having failed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants