Opportunities to improve the credential query/request syntax #112

Closed
tplooker opened this issue Feb 21, 2024 · 84 comments

Comments

@tplooker
Contributor

Over time, implementation feedback on OpenID4VP has raised several issues with the current credential query syntax (based on P.E v2). To the best of my ability, I have attempted to summarise below those that I have heard.

Usage of JSON Path/Pointer

Presentation Exchange v2, which is the current target of OpenID4VP, makes use of JSON Path, which is a highly expressive DSL for reading JSON documents. The problem identified with it is that its broad expressivity and feature set create certain security and usability challenges.

JSON Pointer is a narrower expression syntax than JSON Path, which alleviates most of the security challenges, and it has been considered before as an alternative. But to some it still represents significant usability challenges and is still arguably too flexible/complex for most of the applications we have within the OpenID4VP credential query/request syntax, where the path expressions we want to define are usually just simple traversals through nested maps and/or arrays.

Furthermore, some of the credential formats defined by OpenID4VP are not JSON-based and instead use technologies like CBOR, which gives rise to other possible challenges.
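To make "simple traversals through nested maps and/or arrays" concrete, here is a minimal sketch in which a path is just a list of map keys and array indices. The function name `resolve` and the sample data are assumptions made up for illustration; nothing here is taken from OpenID4VP or P.E.

```python
from typing import Any, Sequence, Union

Segment = Union[str, int]  # a map key or an array index


def resolve(document: Any, path: Sequence[Segment]) -> Any:
    """Follow `path` through nested dicts and lists; raise KeyError if it does not resolve."""
    current = document
    for segment in path:
        if isinstance(current, dict) and isinstance(segment, str) and segment in current:
            current = current[segment]
        elif isinstance(current, list) and type(segment) is int and 0 <= segment < len(current):
            current = current[segment]
        else:
            raise KeyError(f"path does not resolve at segment {segment!r}")
    return current


# Made-up sample data, for illustration only.
credential = {"address": {"locality": "Example City"}, "nationalities": ["NZ", "AU"]}

print(resolve(credential, ["address", "locality"]))  # Example City
print(resolve(credential, ["nationalities", 1]))     # AU
```

A path of this shape has no wildcards, filters, or script expressions, and it applies to any decoded map/array structure, whether the source encoding was JSON or CBOR.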

The feature set of P.E is massive

Presentation Exchange itself, beyond its usage of JSON Path/Pointer, is highly expressive, enabling all sorts of incredibly complex queries to be represented. However, this complexity burdens developers, who need both to write robust implementations of P.E and to know how to create valid credential requests/queries using P.E. Certainly the feedback I have heard over time is that many of the features of P.E aren't needed for most use cases, or that the implementation complexity they create outweighs their value.

P.E protocol agnostic design goal complexity

Presentation Exchange from the outset aimed to define a query/request syntax that was protocol agnostic. While this design goal does lead to the potential for re-use of P.E across different protocols, it has created duplication of features that are already inherently present in OpenID4VP (and OAuth2 and OpenID Connect for that matter), such as capability negotiation between the protocol participants. For example, OpenID and OAuth2, atop which OpenID4VP is built, have always largely handled capability negotiation through metadata exchanged/registered between the protocol participants rather than doing this negotiation during execution of the protocol itself, whereas Presentation Exchange, due to its protocol-agnostic design goal, defines how to negotiate things such as cryptographic suites in a credential request/query.

There are several options we have to solve these challenges and the purpose of this issue is to first better collectively understand these concerns and then discuss what solutions we could apply.

@selfissued
Member

Notes from conversations from three open space sessions at OSW and IIW titled "What does Presentation Exchange do and what parts of it do we actually need?" can be found at https://self-issued.info/?p=2427 and https://self-issued.info/?p=2395. In all these conversations, many people expressed the viewpoint that Presentation Exchange is pretty complicated.

@bc-pi
Member

bc-pi commented Feb 21, 2024

I'll go on record here as being not a fan of PE.

@Sakurann
Collaborator

I think I understand the points outlined in this issue and agree we should discuss if/how we would address them. For the limitations of using JSON Path with CBOR-based mdoc format, this thread also might be useful: WICG/digital-credentials#80 (comment).

Having said that, I also think it is important to discuss and understand what in PE has worked well. So I'll go on record here that PE does the job:

  • its flexibility allowed our implementation to meet customers' requirements without the need to go back to the WG and ask to add a new parameter.
  • as an implementer that needs to support multiple credential formats, being able to reuse PE across multiple formats is helpful.
  • ISO has done multiple test events where multiple vendors were able to interoperate using OID4VP with PE for mdocs.
  • OID4VP has never mandated implementing the entire feature set of PE and so far we have taken the approach of limiting the feature set of PE i.e. as in HAIP and security considerations section in OID4VP.

Again, not saying PE is perfect, or the current approach is the best, just believe these points should be taken into account in this discussion.

@David-Chadwick
Contributor

You will see from comments that I made over a year ago that mandating the use of DIF PE was a bad design choice. Instead we should give flexibility to implementors to use whatever query language their community of users choose.
By replacing DIF PE with a type and value extension mechanism, then those that want to use DIF PE can set the type to "DIF PE" and the value to the PE query. Those that want to use SQL, or the W3C presentation request (see https://w3c-ccg.github.io/vp-request-spec/) or any other query language can set the type to their chosen language and the value to the query in their language. The metadata of verifiers and wallets can say which query type(s) they support.
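The type-and-value mechanism described above could be sketched roughly as follows. The envelope shape (`type`, `value`), the registry, and the handler are hypothetical names chosen for illustration; no such structure is defined by OID4VP.

```python
from typing import Any, Callable, Dict

Handler = Callable[[Any], str]

# Hypothetical registry mapping a query "type" label to the code that understands it.
HANDLERS: Dict[str, Handler] = {}


def register(query_type: str) -> Callable[[Handler], Handler]:
    """Decorator that records a handler for one query language."""
    def decorator(func: Handler) -> Handler:
        HANDLERS[query_type] = func
        return func
    return decorator


@register("DIF PE")
def handle_pe(value: Any) -> str:
    # A real handler would evaluate the query; this one only summarises it.
    return f"PE query with {len(value['input_descriptors'])} input descriptor(s)"


def dispatch(query: Dict[str, Any]) -> str:
    """Route a {"type": ..., "value": ...} envelope to the matching handler."""
    handler = HANDLERS.get(query["type"])
    if handler is None:
        raise ValueError(f"unsupported query type: {query['type']!r}")
    return handler(query["value"])


print(dispatch({"type": "DIF PE", "value": {"input_descriptors": [{}]}}))
# PE query with 1 input descriptor(s)
```

A wallet that has no handler for the requested type fails immediately, which is the case the metadata advertisement of supported query types is meant to prevent.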

@awoie
Contributor

awoie commented Feb 22, 2024

I also believe we need a better way than PE for expressing queries.

I have worked on and with several different credential formats, including fully JSON-based ones like IETF SD-JWT VC that do not have any special namespace considerations and have native support for selective disclosure, W3C VCDM 2.0 with a lot of optionality even in the proof format and the way claims use namespaces via @context/JSON-LD processing, and also ISO 18013-5 mdocs that are CBOR-native using COSE_Sign1 and have their own way to define namespaces. All of them are quite different, and I had troubles in applying PE because of differences in:

  • typing (JSON/JSON-LD)
  • structure (where in the credential claims are defined)
  • encoding (JSON/CBOR),
  • feature set (selective disclosure, policies, etc.).

I believe having a credential-format agnostic query language like PE is quite challenging for those reasons.

@selfissued
Member

As cited at openid/OpenID4VCI#266 (comment), a few of us have been working on a format-specific query language proposal as a strawman to replace PE. This was motivated in part by @tlodderstedt's request during the IIW session to "show me a specific counter-proposal".

See https://docs.google.com/document/d/10JT--pXWsfwC4QVu3XJpXGwcO08M6tnpkZ4PKdtCEWo .

@tlodderstedt
Collaborator

tlodderstedt commented Feb 22, 2024

I'm not a fan of PE. However, it is implemented now with OID4VP in a couple of places. So it is not a show stopper per se and people have invested money in it. That's why I'm hesitant to replace it.

I think most concerns can be addressed by defining a subset of PE. That's what we did in the HAIP spec. Please have a look
image

Having said that, I would be ok to add another way to request credentials (and deprecate PE) if the existing implementers and the SDOs that build standards on top of OID4VP w/ PE support that step.

@Sakurann
Collaborator

Sakurann commented Feb 22, 2024

Chair hat on, would suggest to proceed as follows:

  1. as a first step, agree on what are the problems with PE we want to address
  2. then discuss how to solve those problems

Notes from Feb-22-2024 WG call,

Some requirements

  • need to work well with currently known credential formats, which include both JSON-based and CBOR-based formats
    • could define credential format specific syntax, but sounded like there is some implementation feedback that ability to reuse the common syntax across credential formats is useful (@ve7jtb said it better)
  • need to clarify how verifier metadata negotiation happens: using traditional oauth style mechanisms, or using the query language
    • seems to be the feedback that oauth style mechanism might not be the best fit for issuer-holder-verifier model and using query language has worked

Options in the solution space (does not have to be only one of these - doing 1 and 4, for example, seems to be an option)

  1. define a (mandatory to implement?) minimum subset of PEv2.0
  2. work with DIF on PEv3.0
  3. define a credential format agnostic query syntax in OIDF. maybe something like the one proposed in Define claims display description and claims path query OpenID4VCI#276
  4. define a credential format specific query syntax in OIDF.
  5. having the query language as an extension point

@awoie
Contributor

awoie commented Feb 22, 2024

Chair hat on, would suggest to proceed as follows:

  1. define a (mandatory to implement?) minimum subset of PEv2.0
  2. work with DIF on PEv3.0
  3. define a credential format agnostic query syntax in OIDF. maybe something like the one proposed in Define claims display description and claims path query OpenID4VCI#276
  4. define a credential format specific query syntax in OIDF.

In all cases, we need to define profiles for credential formats. Even if we had one query syntax that appears similar for all credential formats, it would have significantly different meanings per credential format, which is equivalent to defining a syntax per credential format. This applies to all four options.

Under this assumption:

Regarding option 1, I'm not convinced that having a profile for PE for each credential format is the best option. There is a risk that implementers will make mistakes, implement additional features not recommended, or become overwhelmed by the feature set that PEv2 offers. Note that they will need to refer to the PE spec anyway and find sections that apply.

Regarding option 2, it is similar to option 1.

Regarding option 3, I would be fine with that, but again, we would need to profile it anyway. I am uncertain if being credential format agnostic is a goal that can be achieved or one that we actually want to achieve. Reusing some components across credential formats does make sense, for example, how to encode constraints such as certain fields being required in the response. Note that openid/OpenID4VCI#276 is actually more like option 4 since it uses the top-level namespace/credentialSubject object for mdocs/VCs, which is not common among the credential formats.

Regarding option 4, it is essentially the same as option 3.

Regarding options 3 and 4, I see them both as almost equivalent since you will need profiles anyway, and the interpretation of the query will differ depending on the format even if it looks similar. I think the reuse of constraints (required, requiredIfPresent, intentToRetain, etc.) makes a lot of sense, but we should have the flexibility to cater to structural/encoding characteristics of formats, which has already been demonstrated in openid/OpenID4VCI#276. I don't think the goal is to have an n-number of query syntaxes for one credential format. Ideally, it should be just one, defined in line with the patterns the respective community has already used.

@awoie
Contributor

awoie commented Feb 23, 2024

need to clarify how verifier metadata negotiation happens: using traditional oauth style mechanisms, or using the query language

  • seems to be the feedback that oauth style mechanism might not be the best fit for issuer-holder-verifier model and using query language has worked

I don't understand why verifier metadata negotiation is part of this discussion. There is a mechanism in place that works well with OID4VP. We even added a feature, client identifier scheme, to facilitate this negotiation. The method by which PE populates some of the supported curves is redundant and unnecessary. We even have a paragraph that states, "The Wallet MUST ignore any format property inside a presentation_definition object if that format was not included in the vp_formats property of the metadata." Therefore, format in PE is yet another potential source of developer confusion. Negotiation by taking into account wallet capabilities is another topic we try to address in #59.
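One possible reading of the quoted MUST, as a minimal sketch: the wallet intersects the formats named in the request with those advertised in the verifier's metadata. The helper name `effective_formats` is hypothetical, and only a top-level `format` object is handled here for brevity.

```python
from typing import Any, Dict


def effective_formats(
    presentation_definition: Dict[str, Any], verifier_metadata: Dict[str, Any]
) -> Dict[str, Any]:
    """Keep only the requested formats that the verifier metadata also advertises."""
    advertised = verifier_metadata.get("vp_formats", {})
    requested = presentation_definition.get("format", {})
    return {name: params for name, params in requested.items() if name in advertised}


# Illustrative values only.
metadata = {"vp_formats": {"mso_mdoc": {"alg": ["ES256"]}}}
definition = {
    "id": "example",
    "format": {"mso_mdoc": {"alg": ["ES256"]}, "jwt_vc_json": {"alg": ["ES256"]}},
    "input_descriptors": [],
}

print(effective_formats(definition, metadata))  # {'mso_mdoc': {'alg': ['ES256']}}
```

The request carries a second copy of information that the metadata already holds, and the wallet has to reconcile the two, which is the redundancy this comment objects to.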

@David-Chadwick
Contributor

@Sakurann You appear to have missed out the option of having the query language as an extension point, which brings a lot of advantages, a main one being future proofing, another one being innovation. Some in the WG like this approach whilst others do not. An argument made against it was that it kills interop. But this argument is fallacious for several reasons.
a) we already have a number of extension points e.g. the crypto suite. Is anyone arguing for replacing this with a fixed crypto suite?
b) having a fixed query language can kill interop when more than one standard query language exists. We originally mandated DIF PEv1, now DIF PEv2 and some are thinking of moving to DIF PEv3. There is also the W3C presentation request draft that I referred to above, which many have implemented. Why should OID4VP try to pick the winner? Let implementations and the market decide and let the protocol cater for all.

@Sakurann
Collaborator

@awoie verifier metadata negotiation is part of the discussion because there is no agreement on the statement The method by which PE populates some of the supported curves is redundant and unnecessary.

@David-Chadwick thank you! updated my original comment to add "having the query language as an extension point" as an option :)

@awoie
Contributor

awoie commented Feb 23, 2024

@awoie verifier metadata negotiation is part of the discussion because there is no agreement on the statement The method by which PE populates some of the supported curves is redundant and unnecessary.

My point was that there is implicit agreement codified by normative language in the OID4VP spec: "The Wallet MUST ignore any format property inside a presentation_definition object if that format was not included in the vp_formats property of the metadata."

@Sakurann
Collaborator

that language might be residual from before we negotiated with PE authors to add a top level format parameter in the PE. also that sentence only talks about the formats negotiation and not the rest of the metadata. so I think the larger question is still intact.

@awoie
Contributor

awoie commented Feb 23, 2024

that language might be residual from before we negotiated with PE authors to add a top level format parameter in the PE. also that sentence only talks about the formats negotiation and not the rest of the metadata. so I think the larger question is still intact.

I don't understand what this means. According to GitHub, format has been in PE for a long time (v2.0.0, > 2 years), even before the normative statement I was talking about was added (02/2023).

IMO, it is not a requirement of any query syntax to communicate any of this metadata. We have client metadata for this, and changing this would be a fundamental shift for OID4VP.

We have an issue to do real capability negotiation and using PE for this was never brought up nor included in any proposal. I believe nobody argued for putting other client metadata into the query syntax.

@bc-pi
Member

bc-pi commented Feb 24, 2024

I'll go on record here as being not a fan of PE.

On the 22 Feb 24 call @Sakurann pointed out that this kind of "voting" alone on PE wasn't particularly helpful, which is fair. My concern with PE is somewhat more high-level. It promises to provide an awful lot of useful functionality but is extremely complex. This kind of standard often sounds/looks good on paper, and even in narrowly constrained testing at first, but the real cost of the complexity, and sometimes the inability to actually provide the purported functionality, doesn't show up until later and manifests in serious interoperability and security issues or lack of adoption/use. That's maybe not much more helpful than a "vote", but I tried to convey some of my rationale.

@nemqe
Contributor

nemqe commented Feb 26, 2024

Our experience is that PE supports a lot of functionality that on paper sounds useful, but that in the end leads to a lot of complexity when trying to build systems that support many different flavors of credentials for reasons many mentioned above (encoding matters, storage method matters, structure matters, typing matters, features matter...)
Main reasons that lead to this complexity were:

  • JSON Path richness of expression
  • Regex filters
  • and submission_requirements mixing and matching

We ignored Features for now

When only using absolute paths with non-regex filters we came to a more or less understandable implementation, but one that still involved having an intermediate representation of credentials and translating PE to another query language, and one that was still very complicated.
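A restricted subset like the one described (absolute paths, no regex filters, no submission_requirements) could be enforced with a check along these lines. This is a sketch under assumptions: the function name and the exact rules are made up for illustration, and "absolute path" is taken to mean bracket notation with quoted member names only.

```python
import re
from typing import Any, Dict, List

# Absolute path made only of bracketed, single-quoted member names, e.g. $['a']['b'].
SIMPLE_PATH = re.compile(r"\$(\['[^'\[\]]+'\])+")


def profile_violations(presentation_definition: Dict[str, Any]) -> List[str]:
    """List the ways a presentation definition steps outside the restricted subset."""
    problems: List[str] = []
    if "submission_requirements" in presentation_definition:
        problems.append("submission_requirements is not allowed")
    for descriptor in presentation_definition.get("input_descriptors", []):
        for field in descriptor.get("constraints", {}).get("fields", []):
            for path in field.get("path", []):
                if not SIMPLE_PATH.fullmatch(path):
                    problems.append(f"non-simple path: {path}")
            # In a JSON Schema filter, "pattern" is the regular expression keyword.
            if "pattern" in field.get("filter", {}):
                problems.append(f"regex filter in descriptor {descriptor.get('id')}")
    return problems


simple = {
    "id": "ok",
    "input_descriptors": [
        {"id": "d1", "constraints": {"fields": [{"path": ["$['org.iso.18013.5.1']['family_name']"]}]}}
    ],
}
print(profile_violations(simple))  # []
```

A verifier could run the same check before sending a request, so that profile violations surface at authoring time rather than in the wallet.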

Chair hat on, would suggest to proceed as follows:

1. define a (mandatory to implement?) minimum subset of PEv2.0

Reducing the set of PE would make sense at least to me, but I am not sure how that would be enforced and communicated. The feature set would need to be very clearly defined, and referenced in the base spec and not in the interop-profile.

2. work with DIF on PEv3.0

This could potentially give us an opportunity to define the base/core of the spec that we think is aligned with our requirements, so I think I prefer it more compared to suggestion #1.

3. define a credential format agnostic query syntax in OIDF. maybe something like the one proposed in [Define claims display description and claims path query OpenID4VCI#276](https://github.com/openid/OpenID4VCI/pull/276)
4. define a credential format specific query syntax in OIDF

Could make sense, but seems like a lot of work, and given that many have already implemented PE it might be easier to try to fix that one before designing yet another mini-language.

5. having the query language as an extension point

I think the spec should allow for multiple query syntaxes (or versions of them), but that it should mandate one across all implementations. Example: VP 18 mandates PEv2, but it could allow you to use a specific one that might fit your system better. The mandated one(s) would ensure interoperability, while the optional ones would allow for experimentation, competition, and local optimizations. This could possibly also allow for easier switching from one version/flavor of query syntax to another. This would probably require some registry of known query syntaxes.
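A rough sketch of "one mandated syntax plus optional ones" follows. The syntax identifiers and the selection rule are hypothetical and not taken from any specification.

```python
from typing import Iterable, Sequence

MANDATORY_SYNTAX = "pe_v2"  # hypothetical identifier for the mandated syntax


def choose_syntax(verifier_preference: Sequence[str], wallet_supported: Iterable[str]) -> str:
    """Pick the verifier's most preferred syntax the wallet supports, else the mandated one."""
    # Every conforming wallet is assumed to support the mandated syntax.
    supported = set(wallet_supported) | {MANDATORY_SYNTAX}
    for syntax in verifier_preference:
        if syntax in supported:
            return syntax
    return MANDATORY_SYNTAX


print(choose_syntax(["experimental_v1", "pe_v2"], ["pe_v2"]))            # pe_v2
print(choose_syntax(["experimental_v1"], ["experimental_v1", "pe_v2"]))  # experimental_v1
```

Because the mandated syntax is always available as a fallback, two parties that share no optional syntax can still interoperate.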

@decentralgabe

decentralgabe commented Feb 29, 2024

I'd like to add support for @Sakurann's suggestions, namely

define a (mandatory to implement?) minimum subset of PEv2.0
work with DIF on PEv3.0

I'd take this a step further and suggest PE v3 is adopted as a work item of this group.

To paraphrase @selfissued I do not believe we should allow for multiple query languages if one will do. Standards make choices. Optionality is the enemy of successful interoperability.

Because I believe PE v3 will take a while, my immediate suggestion is to profile a subset of PE v2 and require it for all implementers of OID4VP.


Separately, I'd like to make a comment on the complexity of PE. As one of the authors/editors of PE v1 and a contributor to v2, we tried to represent the complexity present in the credentials space, which is itself highly complex. I could succeed in making an argument that OAuth, OIDC, and OID4VC are all really complex specs. This is not me suggesting that we increase that complexity, rather, it's an acknowledgment that our subject area is inherently complex and the reasoning "too complex let's exclude it" is misleading at best.

Many engineering teams have implemented PE successfully. I have personally done so multiple times. It's not an insurmountable task, albeit annoying at times.

This represents an opportunity for us as editors to find opportunities for simplicity that work for our use cases.

@TomCJones

Does anyone have an example of how to turn a PE request into a meaningful user consent form that can fit on a smartphone screen?

@jogu
Collaborator

jogu commented Feb 29, 2024

Does anyone have an example of how to turn a PE request into a meaningful user consent form that can fit on a smartphone screen?

@TomCJones My understanding of current models is that the user consents to what information can be released that satisfies the PE request, not what information was requested by PE, so there is never a need to turn PE into a user consent screen.

So for example if the PE requests a proof of name from either a driving license or a passport, it does not seem important for the user to know that this was the request - only for the user to be asked (in the case where they only have a driving license on their phone and not a passport) whether they would like to release the name in their driving license credential to the verifier.

Can you explain a situation where it would instead be important for the user to understand the request the verifier made? At least in my example, it would seem considerably less user friendly to try to explain the request to the user.

@TomCJones

I think you are telling me that the verifier gets to create a SQL injection attack against the user. The user gets a list of all of the credentials, and all of the data fields in those credentials, as a consent request from what? One aggregator wallet, or each wallet in turn (perhaps three of them), while they are standing in a line to get on an airplane or into a concert, and makes an informed consent decision? In all of the ideas about informed consent that I have been involved with, the question in the user's mind is what the verifier is going to do with the data. I don't see where informed consent is addressed in any of this! It sounds to me like a direct violation of nearly every data protection law in existence.

@jogu
Collaborator

jogu commented Feb 29, 2024

Hi Tom

To complete my example, if the user has a driving license and a passport stored on their phone, and for the purposes of this example they are stored in different wallets, then my understanding is that the flow might work as follows (as is being defined in the WICG group):

  1. Verifier tells the browser that it wants a verified name from a passport or driving license
  2. The browser asks the installed wallets what credentials they have that meet that requirement (this matcher is sandboxed, so no information can be leaked to the wallet providers in this step)
  3. Assuming more than one credential matches, the user is asked which credential they want to provide (or if they don't want to provide one)
  4. If the user does want to share a credential, the wallet that holds that credential is then invoked to provide the credential and asks for any necessary user consent before returning the credential to the verifier.

This is fully informed consent. No one here has any interest in designing flows that fail to comply with data protection laws, that would be a fruitless endeavour.
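The matching and consent steps in the flow above can be sketched as follows. All type names, field names, and sample values are hypothetical; this only illustrates the idea that the consent prompt describes what will be released, not the shape of the verifier's query.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set


@dataclass
class Credential:
    wallet: str             # which wallet holds it (hypothetical label)
    doc_type: str           # e.g. "driving_license" or "passport"
    claims: Dict[str, str]


def matching_credentials(
    held: Iterable[Credential], accepted_types: Set[str], claim: str
) -> List[Credential]:
    """Step 2: find held credentials of an accepted type that contain the claim."""
    return [c for c in held if c.doc_type in accepted_types and claim in c.claims]


def consent_prompt(choice: Credential, claim: str, verifier: str) -> str:
    """Step 4: describe what would be released, not what the verifier asked for."""
    return f"Share '{claim}' from your {choice.doc_type} ({choice.wallet}) with {verifier}?"


held = [
    Credential("wallet-a", "driving_license", {"name": "Alex Example", "address": "1 Example St"}),
    Credential("wallet-b", "loyalty_card", {"name": "Alex Example"}),
]
matches = matching_credentials(held, {"driving_license", "passport"}, "name")
print([m.doc_type for m in matches])  # ['driving_license']
print(consent_prompt(matches[0], "name", "verifier.example"))
```

Step 3, the user's choice among several matches, is omitted here because only one credential matches in this sample.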

@TomCJones

I guess I don't see an example of the screens the user sees. Whose job is it to display the name of the verifier and what they want to do with the data? For what purpose do they want the data? Is the verifier trustworthy? The device, or each wallet? I can see some simple use cases going to six screens depending on the design.

@jogu
Collaborator

jogu commented Feb 29, 2024

Hi Tom. I think your original question about PE has been answered and we've now veered significantly away from the subject of this ticket. I think some of the WICG folks have demos of the flow, it might be worth asking there.

@tplooker
Contributor Author

tplooker commented Mar 1, 2024

To paraphrase @selfissued I do not believe we should allow for multiple query languages if one will do. Standards make choices. Optionality is the enemy of successful interoperability.

+1 here. I don't believe we should allow multiple query languages; that would be a failure of the standard to foster interoperability.

Separately, I'd like to make a comment on the complexity of PE. As one of the authors/editors of PE v1 and a contributor to v2, we tried to represent the complexity present in the credentials space, which is itself highly complex. I could succeed in making an argument that OAuth, OIDC, and OID4VC are all really complex specs. This is not me suggesting that we increase that complexity, rather, it's an acknowledgment that our subject area is inherently complex and the reasoning "too complex let's exclude it" is misleading at best.

If you are referring to my last point, I don't believe that is the argument I'm making. I'm merely pointing out that there is a trade-off: every feature a protocol or standard chooses to define has a cost associated with it, and there needs to be a counterbalance asking "do we really need this feature, is it worth the cost?" Having also participated somewhat in P.E, I don't believe it got that right; the inclination was to say "yes" to every use case, rather than challenging which features were actually needed. I think it's abundantly clear that, especially in the context of OpenID4VP, P.E defines way more features than required for its application, evidence of which can be seen in the heavy profiling of P.E done in HAIP, ISO 18013-7 and OpenID4VP itself. That redundancy has a significant cost associated with it.

@nklomp

nklomp commented Mar 1, 2024

Valid points are raised, but I also want to make a few points from our side:

  • Fully agreed on the complexity. It took us a significant amount of engineering to support almost everything from the PE spec for v1 and v2. Having said that, PEv2 already introduced the concept of features. These are all optional, and thus if consensus is that a lot of these features are not used/needed for most use cases, creating a profile of PE makes sense; just leave out a lot of the features.
  • PEX v2.1 will move to JSONPointer and make JSONPath optional, where V3 will remove JSONPath altogether, alleviating the security concerns (which are described in both PE and OID4VP, HAIP)
  • Unless I overlooked it, nobody has mentioned yet that PE is not only used in OID4VP, but also in other SSI contexts, like for instance Aries Present Proof v3. PE is agnostic to the transport protocol. Any party implementing support for other protocols next to OID4VP would thus have to support another specification with very similar goals.
  • A lot of parties are supporting PE in their solutions. The OID4VC specs (no judgement) have already caused a lot of other new related specs to emerge for things that have been around in a slightly different form for quite some time. We are okay with that, but it does mean that vendors need to add yet another way of doing similar things. The OID4VC specs are seeing constant change, which is totally understandable; at the same time the EU is moving fast and a lot of vendors have already implemented PE in their solutions. IMO changing to a new credential query/request specification and then implementing it will cost significant time and money on the spec level as well as on the implementation/adoption level. Almost everyone I talk to is already feeling the pain of the constant changes and amount of optionality in the OID4VC specs. Adding a change this big certainly won't help in that department.

I agree and am in favor of others pointing out:

  • it makes sense to profile to a subset of PE; the optional features would allow for that, as the PE core implementation would be considerably simpler.
  • Work together on PE v3. The whole point of PE is credential query/requests and is used in different contexts, so focus attention there

@TomCJones

TomCJones commented Mar 1, 2024

is there any evidence that normal human users or verifiers will accept VPs?
Especially in view of the likelihood of multiple wallets responding to one request?
I guess it's a foregone conclusion that it will be the browser or device that will function as the metawallet?

@tplooker
Contributor Author

tplooker commented Mar 4, 2024

In an effort to make some of my concerns more concrete, I've compiled 3 examples of P.E requests. I think they highlight very clearly why profiling P.E would be entirely insufficient to fix the issues here.

Example 1

How should one process the following query?

{
    "id":"mDL-sample-req",
    "input_descriptors":[
        {
            "id":"org.iso.18013.5.1.mDL",
            "format":{
                "mso_mdoc":{
                    "alg":[
                        "EdDSA",
                        "ES256"
                    ]
                }
            },
            "constraints":{
                "limit_disclosure":"required",
                "fields":[
                    {
                        "path":[
                            "$['org.iso.18013.5.1']['driving_privileges']['codes']"
                        ],
                        "intent_to_retain":false
                    }
                ]
            }
        }
    ]
}

The ambiguity around this query is that selective disclosure in an mDoc can only be done to the "$['org.iso.18013.5.1']['driving_privileges']" level. So what should a wallet respond with here, the whole driving_privileges structure OR nothing?

A similar issue also exists when applied to an SD-JWT that encodes a nested object as a single disclosure rather than as a series of nested disclosures all the way down to the leaves.

This in my opinion quite clearly highlights the issue with using either JSON Path OR JSON Pointer as the basis for a query syntax: it can't express the nuanced constraints of a credential format like SD-JWT or mDoc reliably.
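The ambiguity in Example 1 can be made concrete with a small sketch. It assumes a hypothetical model in which each selectively disclosable unit of a credential is identified by a path, and asks whether the requested path falls strictly inside one of those units.

```python
from typing import Optional, Sequence, Tuple

Path = Tuple[str, ...]


def covering_unit(requested: Path, disclosable_units: Sequence[Path]) -> Optional[Path]:
    """Return the disclosable unit that is a strict prefix of `requested`, if any."""
    for unit in disclosable_units:
        if len(unit) < len(requested) and requested[: len(unit)] == unit:
            return unit
    return None


# In this model, an mDoc can only disclose whole data elements within a namespace.
units = [
    ("org.iso.18013.5.1", "driving_privileges"),
    ("org.iso.18013.5.1", "family_name"),
]
requested = ("org.iso.18013.5.1", "driving_privileges", "codes")

print(covering_unit(requested, units))
# ('org.iso.18013.5.1', 'driving_privileges')
```

A non-None result means the wallet cannot release only what was asked for: it must either over-disclose the covering unit or return nothing, and the query itself does not say which.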

Example 2

Similar question: how should one process this query?

{
    "id":"mDL-sample-req",
    "input_descriptors":[
        {
            "id":"org.iso.18013.5.1.mDL",
            "format":{
                "mso_mdoc":{
                    "alg":[
                        "EdDSA",
                        "ES256"
                    ]
                }
            },
            "constraints":{
                // Note no limit disclosure
                "fields":[
                    {
                        "path":[
                            "$['org.iso.18013.5.1']['driving_privileges']"
                        ],
                        "intent_to_retain":false
                    }
                ]
            }
        }
    ]
}

Without limit_disclosure in the query, how should a wallet receiving this interpret the query? Should it return all the claims or just the "fields" in the request? If it returns only the fields in the request, does this undermine the point of limit_disclosure in the first place? The point I'm making with this example is that P.E has a bunch of features that make sense in isolation but not when composed together. Whether specific claims can be requested is a property of the credential format and shouldn't be independently communicated/negotiated in the request.
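The two readings in question can be written down side by side. This is a sketch only: the function name, the `strict_reading` flag, and the sample claims are hypothetical, and neither branch is claimed to be the behaviour P.E mandates.

```python
from typing import Any, Dict, Optional, Set


def claims_to_release(
    credential_claims: Dict[str, Any],
    requested: Set[str],
    limit_disclosure: Optional[str],
    strict_reading: bool,
) -> Dict[str, Any]:
    """Return the claims a wallet would release under one of two readings."""
    if limit_disclosure == "required" or strict_reading:
        # Release only the fields named in the request.
        return {k: v for k, v in credential_claims.items() if k in requested}
    # Lenient reading: no limit_disclosure means the whole credential may go out.
    return dict(credential_claims)


claims = {"family_name": "Example", "driving_privileges": ["B"], "portrait": "<bytes>"}
requested = {"driving_privileges"}

print(sorted(claims_to_release(claims, requested, None, strict_reading=False)))
# ['driving_privileges', 'family_name', 'portrait']
print(sorted(claims_to_release(claims, requested, None, strict_reading=True)))
# ['driving_privileges']
```

Two wallets that pick different readings return different data for the same query, and under the strict reading `limit_disclosure` no longer changes the outcome.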

Example 3

As a verifier, if I were trying to determine whether the holder has a US driver's license, e.g. by determining whether it contains a real ID claim without them knowing, I could use the following query to disguise my request:

{
    "id":"mDL-sample-req",
    "input_descriptors":[
        {
            "id":"org.iso.18013.5.1.mDL",
            "format":{
                "mso_mdoc":{
                    "alg":[
                        "EdDSA",
                        "ES256"
                    ]
                }
            },
            "constraints":{
                // Note no limit disclosure
                "fields":[
                    {
                        "path":[
                            "$['org.iso.18013.aamva']['real_id']",
                            "$['org.iso.18013.5.1']['family_name']"
                        ],
                        "name": "Family name",
                        "purpose": "Requesting family name",
                        "intent_to_retain":false
                    }
                ]
            }
        }
    ]
}

The problem here is twofold:

  1. path allowing multiple expressions is meant to be a feature to create equivalent expressions across formats, but there is nothing stopping one from using it to query different attributes within the same credential to disguise the intent of a request.
  2. name creates a second source of truth for what is being requested, meaning the verifier can use it to further disguise the intent of a request to a wallet.

Based on the logical processing of P.E for the above, if the real_id attribute were present on the matched credential it would be returned, and the holder would likely have given consent based on the "name", thinking they were releasing their "Family name". Secondly, in the case where the wallet didn't have the real_id attribute on the driver's license, the fact that the family name is returned proves the real_id attribute doesn't exist, leaking this to the relying party.

Both of these cases are concerning from a data privacy perspective and can't simply be addressed by profiling P.E; they are fundamentally designed into it.
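
The leak can be sketched in a few lines of Python, assuming the wallet tries each entry of "path" in order and stops at the first one that resolves. The function first_match and the two sample credentials are hypothetical; paths are simplified to (namespace, element) pairs rather than full JSONPath strings.

```python
from typing import Any, Optional

# Hypothetical model of a held credential: namespace -> element -> value.
Credential = dict[str, dict[str, Any]]
Path = tuple[str, str]


def first_match(credential: Credential,
                paths: list[Path]) -> Optional[tuple[Path, Any]]:
    """Return (path, value) for the first path that resolves, else None."""
    for namespace, element in paths:
        elements = credential.get(namespace, {})
        if element in elements:
            return (namespace, element), elements[element]
    return None


# Same path order as in the query above: real_id first, family_name second.
paths = [
    ("org.iso.18013.aamva", "real_id"),
    ("org.iso.18013.5.1", "family_name"),
]

us_license: Credential = {
    "org.iso.18013.5.1": {"family_name": "Doe"},
    "org.iso.18013.aamva": {"real_id": True},
}
other_license: Credential = {
    "org.iso.18013.5.1": {"family_name": "Doe"},
}

hit_us = first_match(us_license, paths)        # resolves to real_id
hit_other = first_match(other_license, paths)  # falls through to family_name
```

Under this assumption the verifier learns something in both cases: either the real_id value itself, or, from receiving family_name instead, that real_id is absent.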

@jogu
Collaborator

jogu commented Mar 4, 2024

@tplooker can you explain how these situations can be resolved please - e.g. are you suggesting defining a query language that simply doesn't allow the queries in your examples to be made?

@tplooker
Contributor Author

tplooker commented Mar 4, 2024

@tplooker can you explain how these situations can be resolved please - e.g. are you suggesting defining a query language that simply doesn't allow the queries in your examples to be made?

In short, yes. I believe a proposal like the one given in https://docs.google.com/document/d/10JT--pXWsfwC4QVu3XJpXGwcO08M6tnpkZ4PKdtCEWo/edit#heading=h.7igj7m3na8ru avoids these issues by:

  1. Not using a generalised DSL syntax like JSON Path/Pointer, which has no awareness of the data structure it is referring to.
  2. Assuming the queries are in the context of a format, so features that cut across formats in awkward ways, like limit_disclosure, are no longer an issue, because either the credential format supports selective disclosure or it does not.
  3. Not allowing multiple path expressions to request the same attribute, because the query itself is already localised to the credential format and type.

@alenhorvat

I believe this clearly shows there's a lack of semantic interoperability, and IMO this is not an issue with the query language.
When you process an mDL, you'll never get the prefix explicitly, so the example above would be just ["family_name"] in the iso-mdoc.

But fun begins when you want to combine claims from different VCs. Ecosystems with good data model governance won't have this problem :)

@awoie
Contributor

awoie commented Mar 20, 2024

I believe this clearly shows there's a lack of semantic interoperability, and IMO this is not an issue with the query language. When you process an mDL, you'll never get the prefix explicitly, so the example above would be just ["family_name"] in the iso-mdoc.

Agreed, it is because the credential formats are different. It is not a query language issue. I just wanted to demonstrate that the query language would need to cater to that. It would probably be better to embrace those differences and have Query objects that do the same.

But fun begins when you want to combine claims from different VCs. Ecosystems with good data model governance won't have this problem :)

:)

@alenhorvat

Even if the format is different, one should be able to match the claims. Open or closed system, claim definitions are always defined.
Having full support for 1 query language is expensive, expecting ecosystems to implement 1 per credential format ...

In code (at least in Go), CBOR and JSON are serialised into objects using exactly the same approach. No namespaces, no path extensions, ...

If the issue is not in the query language, which IMO is the case, the issue should be renamed to "querying claims or VCs in an open-world ecosystem" or "how to achieve semantic interoperability"?

@awoie
Contributor

awoie commented Mar 20, 2024

Even if the format is different, one should be able to match the claims. Open or closed system, claim definitions are always defined. Having full support for 1 query language is expensive, expecting ecosystems to implement 1 per credential format ...

In code (at least in Go), CBOR and JSON are serialised into objects using exactly the same approach. No namespaces, no path extensions, ...

With ISO mdocs, there is no such single envelope structure you could serialize from CBOR to JSON (there are multiple). Also, perhaps you could serialize the ISO DeviceResponse structure to JSON but it would be awkwardly complex and you don't want to use that for the query.

@tplooker
Contributor Author

tplooker commented Mar 20, 2024

Having full support for 1 query language is expensive, expecting ecosystems to implement 1 per credential format ..

I'd argue the true cost here isn't actually any higher. For an implementation to support a new credential format, it has to add a bunch of code anyway to handle the actual response, validate it, interrogate it, etc. Asking implementers to also add support for a query structure that likely closely mirrors the actual form of the credential they get in response is actually a reduction in complexity and implementation cost. The alternative is asking them to add a bunch of format-specific sanitisation checks to a generic query syntax to support a new credential format, like what @awoie has pointed out, and even then the resulting queries still appear brittle and prone to abuse IMO.

@tplooker
Contributor Author

tplooker commented Mar 20, 2024

To date, I've seen no solid evidence that we can't address all credential formats with a common JSON evaluation assumption, wherein the targets are cast to the common representation and evaluated with a common query language.

The problem is that most of these credentials are not simple JSON documents, nor can they be modelled as such. They may use other data representation technologies like CBOR, and may not even be one consistent data structure: the information in a credential can be split across entirely different places (e.g. mdoc has an MSO and an IssuerSigned structure, which are totally different structures). W3C VC DM has a single JSON document, but all the subject claims are grouped in one place, "credentialSubject". SD-JWT is different again, with in effect two different places credential information can exist (the payload and header), not to mention that features like selective disclosure can be applied to varying parts of the credential, including at different levels of nesting. All of these differences make it extremely problematic to try to provide a generic query interface, as their fundamental designs are entirely different.

@alenhorvat

I'm not sure this is how information is processed today. If my understanding is correct, you claim that queries are actually made to credentials in a signed format and not in the unsigned format?

JWS/JWT compact or JSON serialised (applies to both JSON or JSON LD representation): all claims are always in the payload. Even if selective disclosure is used, wallet will always expand the SD into a JSON/JSON-LD and store that (and queries should be done over this structure since verifier cannot know in advance how the information will be packaged, when signed) -> In other words, when it comes to JSON, you (always) validate the schema against JSON schema/shacl/...

Same applies to CBOR. CBOR is just a format that is expressing a data model. How information is transported, must be irrelevant. With CBOR it should be even simpler, because CBOR is used in environments with well defined and fixed data models.

@awoie
Contributor

awoie commented Mar 21, 2024

I'm not sure this is how information is processed today. If my understanding is correct, you claim that queries are actually made to credentials in a signed format and not in the unsigned format?

IMO, you are right but this is not well defined. It is also not really clear if you would need to run JSON-LD expansion for LD-credentials first before querying the data. Strictly, those things need to be (better) defined by the format-specific profiles.

JWS/JWT compact or JSON serialised (applies to both JSON or JSON LD representation): all claims are always in the payload. Even if selective disclosure is used, wallet will always expand the SD into a JSON/JSON-LD and store that (and queries should be done over this structure since verifier cannot know in advance how the information will be packaged, when signed) -> In other words, when it comes to JSON, you (always) validate the schema against JSON schema/shacl/...

Claims are not always all in the SD-JWT payload (Issuer-signed JWT plus Disclosures). Some of the claims are in the Disclosures, not in the Issuer-signed JWT. I agree it should be done like this, but it is not defined. For that reason I asked to make this explicit in the PR on the SD-JWT VC profile, #115. Otherwise, you would break PEv2 because there is no PEv2 "input" document that contains all the information.
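
A small Python sketch of why the Issuer-signed JWT payload alone is not a usable PEv2 "input" document: a selectively disclosable claim only appears as a digest in "_sd" until the matching Disclosure is merged back in. The helper names (make_disclosure, digest, expand) and the salt are hypothetical, only top-level object properties are handled, and signature verification is deliberately left out.

```python
import base64
import hashlib
import json
from typing import Any


def b64url(data: bytes) -> str:
    """Base64url-encode without padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_disclosure(salt: str, name: str, value: Any) -> str:
    """A Disclosure is the base64url encoding of the JSON array [salt, name, value]."""
    return b64url(json.dumps([salt, name, value]).encode("utf-8"))


def digest(disclosure: str) -> str:
    """SHA-256 digest over the encoded Disclosure, base64url-encoded."""
    return b64url(hashlib.sha256(disclosure.encode("ascii")).digest())


def expand(payload: dict, disclosures: list[str]) -> dict:
    """Merge top-level Disclosures whose digest is listed in payload['_sd']."""
    by_digest = {digest(d): d for d in disclosures}
    expanded = {k: v for k, v in payload.items() if k not in ("_sd", "_sd_alg")}
    for listed in payload.get("_sd", []):
        if listed in by_digest:
            raw = by_digest[listed]
            padded = raw + "=" * (-len(raw) % 4)  # restore stripped padding
            _salt, name, value = json.loads(base64.urlsafe_b64decode(padded))
            expanded[name] = value
    return expanded


disclosure = make_disclosure("hypothetical-salt", "family_name", "Doe")
issuer_signed_payload = {
    "vct": "https://credentials.example.com/identity_credential",
    "_sd": [digest(disclosure)],
    "_sd_alg": "sha-256",
}

# A query for family_name finds nothing in the signed payload itself...
found_in_payload = "family_name" in issuer_signed_payload
# ...and only resolves against the expanded form.
expanded = expand(issuer_signed_payload, [disclosure])
```

Whether queries run against the signed payload or against something like the expanded form is exactly the step that would need to be defined per format.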

Same applies to CBOR. CBOR is just a format that is expressing a data model. How information is transported, must be irrelevant. With CBOR it should be even simpler, because CBOR is used in environments with well defined and fixed data models.

Please look at the ISO 18013-5 specification. You don't want to use the ISO mdoc internal data structure in your query, you would want to use something more abstract instead, i.e., using namespaces/data element identifiers only. And I think this is also what you were saying because the data model is defined. However, PE uses JSONPath which requires some JSON document to refer to certain elements, i.e., path -> $. which is defined as (from PEv2):

The value of this property MUST be an array of one or more JSONPath string expressions (as defined in the JSONPath Syntax Definition section) that select a target value from the input.

ISO does not define an "input" document that can be used with JSONPath (PEv2). That is why we had to redefine the semantics of PEv2 for ISO in the ISO spec which was a bit awkward.

If you simply transform ISO CBOR to JSON and call this JSON document the "input" document that PEv2 requires, the queries would be bloated and not really usable. It might also be hard to do the CBOR to JSON conversion since there are a few decoupled data structures.

@alenhorvat

JSON-LD has several representations and the format in the query should clearly define the representation (compact JSON-LD (which is essentially JSON), expanded, ...)

SD-JWT + disclosures is actually the signed form. When you decode the SD-JWT you will always end up with JSON. The verifier cannot know in advance where/how the claims are "hidden". They will just say: give me a VC with "first_name", "last_name" according to vocabulary XYZ.

ISO structure is very simple: it has a namespace and an identifier as per 7.2, table 5.

  • compact JSON representation would be with identifiers only (namespace is implicit as usually for JWTs):
    { first_name: alice}
  • the data model we worked on (with the namespaces) would prepend the namespace to the claim name (the extended version):
    {
    org.iso.18013.5.1:first_name: "alice"
    }

If multiple namespaces need to be supported, it's the 2nd option.

And I agree, all this needs to be defined, regardless of the query language you define. Even if the new query language is accepted, you will have exactly the same problem, and everything we're discussing must be defined (prior to considering introducing a new query language).

@awoie
Contributor

awoie commented Mar 21, 2024

ISO structure is very simple: it has a namespace and an identifier as per 7.2, table 5.

  • compact JSON representation would be with identifiers only (namespace is implicit as usually for JWTs):
    { first_name: alice}
  • the data model we worked on (with the namespaces) would prepend the namespace to the claim name (the extended version):
    {
    org.iso.18013.5.1:first_name: "alice"
    }

Sorry, but this is the encoding for the JWT that is used in ISO 18013-5:2021 Server Retrieval (which nobody uses and which does not match the CBOR structure). That is not used by ISO 18013-7.

If multiple namespaces need to be supported, it's the 2nd option.

And I agree, all this needs to be defined, regardless of the query language you define. Even if the new query language is accepted, you will have exactly the same problem, and everything we're discussing must be defined (prior to considering introducing a new query language).

Yes, that all needs to be defined. And yes, it does not matter whether you have a credential format-specific query syntax or not. But by being credential format-specific you are not limited in the expressiveness, effectiveness and robustness of how the syntax is used, because it is tailored to the needs of the credential format.

@awoie
Contributor

awoie commented Mar 21, 2024

ISO structure is very simple: it has a namespace and an identifier as per 7.2, table 5.

  • compact JSON representation would be with identifiers only (namespace is implicit as usually for JWTs):
    { first_name: alice}
  • the data model we worked on (with the namespaces) would prepend the namespace to the claim name (the extended version):
    {
    org.iso.18013.5.1:first_name: "alice"
    }

As mentioned above, this is not the applicable section. You have to look at 8.3.2.1.2.2 Device retrieval mdoc response DeviceResponse which would look odd as JSON. Since Server Retrieval is kind of discouraged (not by ISO officially, but by most participating orgs), most mdocs won't have support for Server Retrieval.

@alenhorvat

alenhorvat commented Mar 21, 2024

You have to look at 8.3.2.1.2.2 Device retrieval mdoc response DeviceResponse which would look odd as JSON.

No. This is exactly what I'm trying to explain. You don't want the verifier to create a query for the signature format. Best example is JWS: can be compact or json serialised (there are 2 variants of JSON serialised) and may or may not have SD -> this gives you 6 options for the same data model.
But in all cases, the data structure is defined by the common JSON schema which clearly defines where the claim is. Hence query should be for the JSON, not JWS.

Same goes for mDoc. Query will always be for structure in 7.2, table 5. Then you can package it in whatever signature structure you want.

Version of 18013-7 I have states:
[image]

Same applies, no matter how the signature is formatted.

@kimdhamilton

kimdhamilton commented Mar 21, 2024

It occurs to me I should clarify, with my bigger DIF hat on*: DIF will happily interoperate with whatever comes out of this discussion. We're also open to bigger changes in the next major PE release if that's helpful.

There are some great ideas being discussed here. Not an easy decision but it seems to be in good hands. Daniel and I are always available if there are further questions we can help with.

*we'll call it the DIF sombrero, vs the PE baseball cap

@awoie
Contributor

awoie commented Mar 22, 2024

You have to look at 8.3.2.1.2.2 Device retrieval mdoc response DeviceResponse which would look odd as JSON.

No. This is exactly what I'm trying to explain. You don't want the verifier to create a query for the signature format. Best example is JWS: can be compact or json serialised (there are 2 variants of JSON serialised) and may or may not have SD -> this gives you 6 options for the same data model. But in all cases, the data structure is defined by the common JSON schema which clearly defines where the claim is. Hence query should be for the JSON, not JWS.

Same goes for mDoc. Query will always be for structure in 7.2, table 5. Then you can package it in whatever signature structure you want.

I don't think I ever said anything other than this, and I fully agree that a query syntax should follow this approach. That is what, for instance, the query syntax proposal we shared does.

My point was that with the current version of PEv2, you won't be able to do that since it requires a "JSONPath" and an "input" document (that defines the $ for the JSONPath). Structure 7.2 in Table 5 does not exist and is not a real structure btw. (not persisted this way). It is just a list of data element identifiers and their values. You can certainly say, use Table 5 and put it in a specific JSON document and then use this as the value for the input document in PEv2.

Updated:
In addition to Table 5, you would also need to consider the document type which is not part of Table 5.

@awoie
Contributor

awoie commented Mar 22, 2024

@alenhorvat PEv2 requires a transformation algorithm from the credential to an intermediate format. PEv2 defines a technical algorithm for matching/filtering based on JSONPath applied to an input document. The intermediate format is the PEv2 input document. This transformation algorithm needs to be defined per credential format. Some credential formats, such as W3C VCDM, don't require that since they are JSON(-LD) already. Others, such as ISO mdocs, do require this algorithm (which we kind of implicitly defined in ISO 18013-7 Annex B; we didn't call it such). Because you need such an algorithm, you are additionally limited in how you persist credentials, because you have to persist them in this intermediate format in your database to run the filters/matchers efficiently. Like you said, it would be better to use Table 5 on namespaces and data element identifiers directly in the case of ISO. The query syntax should use those directly and should not require a transformation to an intermediate format first.
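
As an illustration of what such a per-format transformation step looks like, here is a minimal Python sketch. It assumes the intermediate "input" document for an mdoc is simply namespace -> data element identifier -> value, and that paths are restricted to the two-segment bracket form used in the examples earlier in this thread. The names to_input_document, parse_path and select are hypothetical and not taken from PEv2 or ISO 18013-7.

```python
import re
from typing import Any

# Matches one bracket segment such as ['org.iso.18013.5.1'].
_SEGMENT = re.compile(r"\['([^']+)'\]")


def to_input_document(name_spaces: dict[str, dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Hypothetical transformation: copy namespaces and elements into plain JSON-like dicts."""
    return {ns: dict(elements) for ns, elements in name_spaces.items()}


def parse_path(path: str) -> list[str]:
    """Accept only $['namespace']['element'] and return the two segments."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    rest = path[1:]
    segments = _SEGMENT.findall(rest)
    # Reject anything that is not exactly two bracket segments
    # (filters, wildcards, recursive descent, and so on).
    if len(segments) != 2 or "".join(f"['{s}']" for s in segments) != rest:
        raise ValueError(f"unsupported path: {path}")
    return segments


def select(document: dict[str, dict[str, Any]], path: str) -> Any:
    """Return the selected value, or None if the element is not present."""
    namespace, element = parse_path(path)
    return document.get(namespace, {}).get(element)


input_document = to_input_document(
    {"org.iso.18013.5.1": {"family_name": "Doe", "birth_date": "1990-01-01"}}
)
family_name = select(input_document, "$['org.iso.18013.5.1']['family_name']")
```

Note that once the path language is restricted this far, the path carries no more information than a (namespace, element) pair, which is the argument for using those identifiers directly.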

@awoie
Contributor

awoie commented Mar 22, 2024

To make it crystal clear, the goal of the credential-format specific query language we proposed is NOT to replicate the structure of the credential (in JSON). The goal is to create a syntax with a reasonable semantic abstraction per credential format. The reason I talked about the CBOR structures was to counter the argument that a simple transformation from CBOR to JSON can be done and its output used as the input doc in PEv2. Doing this would make no sense since it is way too convoluted.

@csuwildcat

I'd much rather have a simpler per-format directive for placing all target data in a single set of curly braces and running a unified query language over that than descend into a per-format query language expedition where eventually a lot of features will have to be copied across differing query languages in slightly different forms and have User Agents/OSes continually involved as they evolve toward equivalent levels of expressiveness - thar be far more dragons of far greater ferocity, imo.

@awoie
Contributor

awoie commented Mar 22, 2024

I'd much rather have a simpler per-format directive for placing all target data in a single set of curly braces and running a unified query language over that than descend into a per-format query language expedition where eventually a lot of features will have to be copied across differing query languages in slightly different forms and have User Agents/OSes continually involved as they evolve toward equivalent levels of expressiveness - thar be far more dragons of far greater ferocity, imo.

@csuwildcat you mean something like this?

mdoc

  "mso_mdoc": {
    "docType":"org.iso.18013.5.1.mDL",
    "nameSpaces":{
      "org.iso.18013.5.1":{
        "birthdate":{
          "required":true          
        }
      },
      "org.iso.18013.5.1.aamva":{
        "DHS_compliance":{
          "requiredIfPresent":true
        }
      }
    }
  }

SD-JWT VC

  "sd-jwt-vc": {
    "vct":"https://credentials.example.com/identity_credential",
    "claims":{
        "family_name":{
        "required":true
      }
    }
  }

This would be my preference.
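
For illustration, a query of the mso_mdoc shape above could be evaluated with something like the following Python sketch. The semantics are assumptions, not something defined in this thread: "required" is taken to mean the credential does not match if the element is missing, and "requiredIfPresent" to mean the element is returned when held and silently skipped otherwise. The function evaluate_mdoc_query and the sample data are hypothetical.

```python
from typing import Any, Optional

NameSpaces = dict[str, dict[str, Any]]


def evaluate_mdoc_query(query: dict,
                        doc_type: str,
                        name_spaces: NameSpaces) -> Optional[NameSpaces]:
    """Return the elements to release, or None if the credential does not match."""
    if query["docType"] != doc_type:
        return None
    released: NameSpaces = {}
    for namespace, elements in query["nameSpaces"].items():
        held = name_spaces.get(namespace, {})
        for element, flags in elements.items():
            if element in held:
                released.setdefault(namespace, {})[element] = held[element]
            elif flags.get("required"):
                # Assumed semantics: a missing required element means no match.
                return None
            # Assumed semantics: requiredIfPresent elements are skipped when absent.
    return released


query = {
    "docType": "org.iso.18013.5.1.mDL",
    "nameSpaces": {
        "org.iso.18013.5.1": {"birthdate": {"required": True}},
        "org.iso.18013.5.1.aamva": {"DHS_compliance": {"requiredIfPresent": True}},
    },
}
held = {"org.iso.18013.5.1": {"birthdate": "1990-01-01", "family_name": "Doe"}}

result = evaluate_mdoc_query(query, "org.iso.18013.5.1.mDL", held)
```

The evaluator only needs the document type, namespaces and element identifiers; no path language or intermediate JSON document is involved.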

@csuwildcat

Not quite - I was referring to placing the target data portions of a credential in a JSON container, running JSON Pointer queries over it, and testing the results with JSON Schema declarations, instead of creating bespoke, per-format syntaxes that will eventually end up recreating one-off doppelganger versions of these well-established, industry standard utilities that are already known by millions of devs.
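
A rough Python sketch of that approach, assuming the target data has already been placed in a plain JSON container: a simplified RFC 6901 JSON Pointer resolver followed by a type check. The check is only a stand-in for a real JSON Schema validator, and resolve_pointer, is_string and the sample container are hypothetical names for illustration.

```python
from typing import Any


def resolve_pointer(document: Any, pointer: str) -> Any:
    """Resolve a JSON Pointer (RFC 6901) against parsed JSON data.

    Simplified: array indexes with leading zeros are not rejected
    and the '-' index is not supported.
    """
    if pointer == "":
        return document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    current = document
    for token in pointer[1:].split("/"):
        # Unescape in the order required by RFC 6901: ~1 first, then ~0.
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            if not token.isdigit():
                raise KeyError(token)
            current = current[int(token)]
        elif isinstance(current, dict):
            current = current[token]
        else:
            raise KeyError(token)
    return current


def is_string(value: Any) -> bool:
    """Stand-in for validating against the JSON Schema {"type": "string"}."""
    return isinstance(value, str)


container = {
    "org.iso.18013.5.1": {
        "family_name": "Doe",
        "driving_privileges": [{"vehicle_category_code": "B"}],
    }
}

family_name = resolve_pointer(container, "/org.iso.18013.5.1/family_name")
category = resolve_pointer(
    container, "/org.iso.18013.5.1/driving_privileges/0/vehicle_category_code"
)
```

How the container is built from each credential format is left open here, and is the part the rest of this thread argues needs to be defined somewhere.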

@awoie
Contributor

awoie commented Mar 22, 2024

IMO, this requires ISO mdocs to be transformed to JSON documents first, which is already a bit awkward. This has to be defined somewhere. It would also make it hard to run queries without support for JSON Pointer. It means developers cannot use their own database schema: they would need a database that supports JSON Pointer queries, and they would need to store the credentials in the transformed JSON document. Am I missing something? Wouldn't this be a strong limitation or assumption we make for a lot of developers?

Note that in the case of format-specific queries, you don't need to define that transformation. You just define a couple of indexes but how those are implemented is up to the developer.

@csuwildcat

That's a bit of a canard, because any bespoke syntax isn't going to be natively supported by a db query syntax either. The reality is that users aren't going to have millions of credentials, so pulling hundreds to a few thousand from local storage and running a query function over them is trivial. Also, popular DBs actually do support JSON Path/Pointer, Postgres for example.

@awoie
Contributor

awoie commented Mar 22, 2024

That's a bit of a canard, because any bespoke syntax isn't going to be natively supported by a db query syntax either.

That is true but it is more trivial to define indexes/keys for a very limited set of defined terms in your query syntax. It would be easier to choose your own database schema.

The reality is that users aren't going to have millions of credentials, so pulling hundreds to a few thousand from local storage and running a query function over them is trivial.

There are people that don't want to run those queries locally.

Also, popular DBs actually do support JSON Path/Pointer, Postgres for example.

Yes, that is true but this means, you would need to store the credentials (e.g. mdoc) in the transformed JSON form.

@tplooker
Contributor Author

Not quite - I was referring to placing the target data portions of a credential in a JSON container, running JSON Pointer queries over it, and testing the results with JSON Schema declarations, instead of creating bespoke, per-format syntaxes that will eventually end up recreating one-off doppelganger versions of these well-established, industry standard utilities that are already known by millions of devs.

To support a new credential format, developers already have to invest significant implementation effort, because they need to know how to parse, validate and generally understand it, so even with a generic query syntax, a lot (the majority IMO) of this cost will exist. The question then becomes: how much cost does a generic query syntax save vs the complexity it creates through its abstract nature? What I've argued, based on the developer feedback we've seen over multiple years, is that any savings in implementation re-use that P.E might bring across credential formats are far outstripped by the general confusion around how to use P.E in the first place because of how abstract it is.

@Sakurann
Collaborator

Sakurann commented Apr 19, 2024

WG mtg

Could we close this issue and open a new one focusing on the concrete syntax (starting point being what we have here: https://docs.google.com/presentation/d/1OxqQy4-WC5BmplCyuo-oKriXZVj2fBrGPCg7n9ryCcg/edit), @tplooker? That might influence the requirements, so issue #144 will remain open until the updated syntax is merged.

@Sakurann
Collaborator

Based on the discussion in this issue, a specific proposal for an alternative query language has been made in #178, and the WG has agreed to use it as a starting point. Closing this issue in a week if there are no objections.
