Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support async NOC generation via op cred delegate API #7863

Merged
merged 6 commits into from
Jun 28, 2021

Conversation

pan-apple
Copy link
Contributor

Problem

The OperationalCredentialDelegate requires NOC to be generated synchronously. Some CA implementations may not support such design. The delegate should provide a callback mechanism so that the NOC can be provided to the controller layer once CA has generated it.

Change overview

Updated OperationalCredentialDelegate API to take a callback which the delegate implementation will call on NOC generation. Updated ExampleOperationalCredentialsIssuer and CHIPOperationalCredentialsDelegate to the new interface.

Testing

Tested commissioning flow using Python controller, chip-tool and iOS CHIPTool apps.

@todo
Copy link

todo bot commented Jun 24, 2021

- Reduce memory requirement for generating x509 certificates

// TODO - Reduce memory requirement for generating x509 certificates
// As per specifications (section 6.3.7. Trusted Root CA Certificates), DER certs
// should require 600 bytes. Currently, due to ASN writer overheads, a larger buffer
// is needed, even though the generated certificate fits in 600 bytes limit.
constexpr uint32_t kMaxCHIPDERCertLength = 1024;
/// Callbacks for CHIP operational credentials generation
class DLL_EXPORT OperationalCredentialsDelegate
{


This comment was generated by todo based on a TODO comment in af20809 in #7863. cc @pan-apple.

@pan-apple
Copy link
Contributor Author

Fixes #7859

@bzbarsky-apple
Copy link
Contributor

/rebase

Copy link
Contributor

@tcarmelveilleux tcarmelveilleux left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR adds DeviceCommissioner -> OpCSRDelegate -> DeviceCommissioner workflow coupling. This may work for now, but I think overall it would be great if DeviceCommissioner had more callbacks in general (rather than its delegates) to notify a caller of some steps, so that they can go off, and do everything they need to do async, before just making a next call of "InstallNOC" or such on DeviceCommissioner, that proceed with the flow.

Instead of:
client app -> DeviceCommissioner -> OpCSRDelegate -> DeviceCommissioner.OnDeviceNOCGenerated

We would have something like:
client app -> DeviceCommissioner: start the commissioning as per today
DeviceCommissioner -> client app: OnCSRResponseReceived callback
client app -> client app: do what you need to process CSR, take time if you need...
client app -> DeviceCommissioner: InstallNOC(noc_chain)

This would reduce synchronous flow round trip coupling. Right now, with the callback approach, if you really do want to do real async work, you have to dispatch it while synchronously blocked within the callchain of processing the CSR response, which amounts to needing a different thread (or equivalently a separate dispatch queue entirely) and also not processing other events that may arise from the network, within the stack.

I do understand that the change here "fits what is already there", but we just keep extending the breadth of the long "SDK-owned" workflows and don't leave much control to the client to actually sequence the steps as needed, in their own time. It's odd for a delegate to call a callback of its owner, rather than just a method of its owner. The OperationalCredentialsDelegate is really own by the commissioning logic.

@bzbarsky-apple
Copy link
Contributor

if you really do want to do real async work, you have to dispatch it while synchronously blocked within the callchain of processing the CSR response

Wait, why? The whole point of this PR is that the commissioner calls OperationalCredentialsDelegate::GenerateNodeOperationalCertificate and the client app (which implements that delegate) can do whatever async stuff it wants and just needs to call the provided callback when done. It doesn't have to synchronously block anything; the stack can unwind, etc.

As in, this PR pretty much does the "We would have something like" flow, with the one difference I see being that it only allows the app to provide the NOC async and still gets the ICA cert sync, right?

mspang
mspang previously approved these changes Jun 25, 2021
@mspang mspang self-requested a review June 25, 2021 03:04
@msandstedt
Copy link
Contributor

This API is by definition for vendor-specific code. Fully 100% of the production code that implements this lives outside of the tree. For this reason, we cannot iterate on this. It needs to support all of our spec-compliant use cases, but it also needs to stabilize rather immediately. It's too costly to teams to be repeatedly refactoring prototypes which, again, live outside of the sdk.
I would support this PR in its approximate current form solely due to the fact that without it, it is impossible on some single-threaded platforms to interact with a remote service to acquire an NOC . That is obviously a blocker for some teams.
But this cannot be the first of many design iterations. After this PR, can we commit to a different design approach for this particular API? I think we collectively need to design an interface that is agreeable to all. Only then should we proceed to implementation.

What is vendor specific here? This is just adding the ability to be async, which likely many vendors will require. Being async says nothing about the threading model implied here, which is able to be both single and/or multi threaded.

Can you point out specifically what is the issue?

I wasn't saying that the interface is vendor-specific. I'm pointing out that these are pure virtual methods and the implementations are vendor-specific. Because of this, we have an unusual situation where the majority of important code that needs to implement this doesn't live inside project-chip/connectedhomeip. We can't look at code in the sdk to tell us how this needs to be designed because the code that would tell us this isn't in the sdk in the first place.

Therefore, I'm advocating for a different design approach for these particular features. My objection to this PR isn't necessarily the changes it introduces. My objection is to the prospect of many subsequent pull requests and many subsequent design iterations. I don't believe that is the right development approach for us to choose in this case.

@msandstedt msandstedt self-requested a review June 25, 2021 21:08
@msandstedt msandstedt dismissed their stale review June 25, 2021 21:09

Unblocking for merge since we have a robust sideband conversation discussing broader needs.

@todo
Copy link

todo bot commented Jun 25, 2021

Send DAC as input parameter to GenerateNodeOperationalCertificate()

// TODO: Send DAC as input parameter to GenerateNodeOperationalCertificate()
// This will be done when device attestation is implemented.
ReturnErrorOnFailure(mOperationalCredentialsDelegate->GenerateNodeOperationalCertificate(
Optional<NodeId>(device->GetDeviceId()), 0, CSR, ByteSpan(), &mDeviceNOCCallback));
return CHIP_NO_ERROR;
}


This comment was generated by todo based on a TODO comment in 5bca5e2 in #7863. cc @pan-apple.

*
* @return CHIP_ERROR CHIP_NO_ERROR on success, or corresponding error code.
*/
virtual CHIP_ERROR GenerateNodeOperationalCertificate(const PeerId & peerId, const ByteSpan & csr, int64_t serialNumber,
uint8_t * certBuf, uint32_t certBufSize, uint32_t & outCertLen) = 0;
virtual CHIP_ERROR GenerateNodeOperationalCertificate(const Optional<NodeId> & nodeId, FabricId fabricId, const ByteSpan & csr,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm approving as this PR is needed to unblock some activities.

However, I want to point out that even with NodeID as an optional parameter, this interface misses the mark. There should be no concept under project-chip/connectedhomeip/src of the commissioner proposing a node ID, or even fabric ID, to the CA infrastructure. That is a possible implementation. It is equally possible that all node IDs will be assigned by the CA itself.

The right approach is for such local assignment of IDs in our example implementations to occur in code under project-chip/connectedhomeip/examples. The interface here must not specify IDs; instead, an opaque context should be passed from the app. There will be an implicit contract between app and ecosystem-specific CA infrastructure as to what context is needed, but it is impossible for any code under project-chip/connectedhomeip/src to know or predict what that will be.

The immediate consequence here is that any CA-assignment of a node ID or fabric ID other than what is passed is likely to break the commissioner. I don't think it's overstating the problem to call this a bug. And until it's fixed, any implementation that adopts an approach of CA-assignment of all IDs will have to fork and patch the sdk to work around the problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@msandstedt these are the main reasons we have the optional node ID in the current CA interface.

  1. In the current commissioner interface, the node ID is provided by the commissioner app when the PairDevice() is called. I understand that it's not optimal and may even be incorrect. But unless we update all current test tools and apps as part of this PR, we will end up breaking the tip of the tree. Given we are using test events as project schedule, and TE4 cut off is 1 week away, not sure if we want to introduce all those changes in this PR.
  2. UpdateNOC command will update the existing NOC on a device with the new NOC. Would node ID be updated as part of this process? I tried reading the spec and could not conclude one way or another. But in case we want to retain the same node ID as part of UpdateNOC, this optional node ID parameter can be used to inform CA of the requested node ID value. Again, may be it is not needed, or there are other ways of achieving it. Just my thoughts..

If there is no use case where the optional node ID is needed, it can certainly be removed from the interface. Removing it right now will just cause the code to break.

The commissioner interface was designed before the specifications were finalized, and it is evolving as features are getting integrated. I understand the problem an evolving public API can cause to the customers of the API. Capturing the API requirements in an issue and getting it prioritized in a test event schedule might be a good way to get to the final solution in a deterministic timeline.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would node ID be updated as part of this process?

Yes, if the new certificate has a new node id.

I tried reading the spec and could not conclude one way or another.

"Effect When Received" for UpdateNOC says:

... such that the Node’s Operational Identifier within the Fabric immediately changes.

and

All internal data reflecting the prior operational identifier of the Node within the Fabric SHALL be revoked and removed, to an outcome equivalent to the disappearance of the prior Node, except for ....

If there's something we can do to make the spec clearer here, we should do it...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. In the current commissioner interface, the node ID is provided by the commissioner app when the PairDevice() is called. I understand that it's not optimal and may even be incorrect. But unless we update all current test tools and apps as part of this PR, we will end up breaking the tip of the tree. Given we are using test events as project schedule, and TE4 cut off is 1 week away, not sure if we want to introduce all those changes in this PR.

The problem is that with controllers in scope for TE4, it is no longer sufficient for the sdk to only support the in-tree use cases. The only alternative would be for test event participants to develop infrastructure solely for the purpose of test event compatibility. That's not desirable.

I've filed #7951 to track this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would node ID be updated as part of this process?

Yes, if the new certificate has a new node id.

I tried reading the spec and could not conclude one way or another.

"Effect When Received" for UpdateNOC says:

... such that the Node’s Operational Identifier within the Fabric immediately changes.

and

All internal data reflecting the prior operational identifier of the Node within the Fabric SHALL be revoked and removed, to an outcome equivalent to the disappearance of the prior Node, except for ....

If there's something we can do to make the spec clearer here, we should do it...

Is it desired/required for the node ID to be changed as part of UpdateNOC command? Maybe it's left up to the commissioner implementation. If the use case allows the commissioner to continue to use the same node ID for the device, having the Optional node ID in the CA interface would be useful.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine for a commissioner implementation to propose its node ID to ecosystem-specific signing infrastructure. It's not appropriate to formalize that type of flow in the chip::controller::DeviceCommissioner object.

We need to invert our design here. Ecosystem-specific signing infrastructure will require context to field CSRs that the sdk cannot know or predict. Issues like #6656 are well-meaning but misguided. We do not know how, where or if ecosystems will embed other RDNs. It makes no sense to formalize local APIs for this in the first place.

At the simplest level, app code should pass a void * context that will be completely opaque to the sdk and only discernible by the associated OperationalCredentialsDelegate implementation. Consider this: how does ecosystem-specific signing infrastructure verify the commissioner's permission to commission a particular node? OperationalCredentialsDelegate needs some context from the app to prove this permission. What this should look like though is unknowable from the sdk perspective, just like fabric ID, node ID and tag assignment rules are unknowable from the sdk perspective.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pan-apple @bzbarsky-apple @msandstedt what are the next steps here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we all agree that the app is the only thing that has the full context here; the SDK does not. So the SDK needs to communicate some sort of app-meaningful context indicator here.

In addition to that, we should figure out whether there are parts of the context that the SDK can provide to the app and if so whether that provision is useful in common cases. That could include things like "the CSR is for a node with a certain id", or "the CSR is for a node that does not yet have a node id", or whatnot. If we think the app would always have this information anyway, there's not much point communicating it, of course.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was merged before comments were resolved, I thought I enabled this - will investigate. I want to make sure that we have a plan moving forward here though as there is obviously clarification required.

@pan-apple
Copy link
Contributor Author

/rebase

@woody-apple
Copy link
Contributor

@mspang can you re-review given @andy31415 is out?

@Damian-Nordic @hawk248 @jepenven-silabs, any thoughts here?

Copy link
Contributor

@mspang mspang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's continue the design discussions in #7951

Thanks @pan-apple for working to address #7859

src/controller/CHIPDeviceController.cpp Outdated Show resolved Hide resolved
src/controller/CHIPDeviceController.cpp Outdated Show resolved Hide resolved
pan-apple and others added 2 commits June 28, 2021 08:54
Co-authored-by: Michael Spang <mcspang@gmail.com>
Co-authored-by: Michael Spang <mcspang@gmail.com>
@mspang mspang merged commit ab5c617 into project-chip:master Jun 28, 2021
@pan-apple pan-apple deleted the async-ca branch June 28, 2021 18:54
nikita-s-wrk pushed a commit to nikita-s-wrk/connectedhomeip that referenced this pull request Sep 23, 2021
)

* Support async NOC generation via op cred delegate API

* address review comments

* remove nodeId from NOC callback

* fix Android build

* Update src/controller/CHIPDeviceController.cpp

Co-authored-by: Michael Spang <mcspang@gmail.com>

* Update src/controller/CHIPDeviceController.cpp

Co-authored-by: Michael Spang <mcspang@gmail.com>

Co-authored-by: Michael Spang <mcspang@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants