Support for ignoring repo specific fields through configurable parameters #109

dakbhavesh · 2023-06-05T13:18:19Z

While adding support for generating hash from EPCIS Query Document we observed that query document may be enriched with repository-specific fields like "hash", "transactionId", "captureID", etc. To ensure the consistency of generated hash must ignore such fields. Such fields are hard to know in advance as each EPCIS repository may behave uniquely and have its own nomenclature. Thus, it would be nice to add a parameter that can be used to provide fields to ignore externally.

The parameter may look like below:

Name: ignoreFields
Values: comma separated name of the fields to ignore

--ignoreFields hash,captureID

The text was updated successfully, but these errors were encountered:

sboeckelmann · 2023-06-07T13:49:33Z

If we to support this kind of feature officially, ignoredFields should be added to hash url
i.e.
ni:///sha-256;b0c8cf9e9a733fb18a0134bfb0f64e701432c6bebda06124a1e62f087d71a333?ver=CBV2.0&ignoreFields=captureID,hash

Otherwise it's not possible to regenerate hash for verification purposes.
This request doesn't seem to be completely thought through.
Maybe it's better to not allow any additional fields in events returned witth EPCISQueryDocument
Only fields that are excluded from hash generation by default should be allowed to be added in query (eventID, recordTime and error declaration related one).

@RalphTro what's your take on this?

RalphTro · 2023-06-07T14:23:04Z

Hi @dakbhavesh and @sboeckelmann,
When Bhavesh initially suggested to tackle this issue through an additional parameter (which I think is a an appropriate way!), I was thinking of a command line parameter (similar to e.g. "-j '\n'" if you want our module to return all key-vale pairs as part of the pre-hash string on a new line for better readability). Note that an eventID field can also be populated by a UUID, or even be empty. So, with that in mind, what is your take on this alternative solution variant?

sboeckelmann · 2023-06-07T15:31:25Z

Having this parameter is not necessarily wrong.
But the root cause, which led @dakbhavesh to making this feature request (repo specific fields) must be taken care of by the repo itself and not resolved by adding a feature to the hash generator. ;-)
Because the repo in this case, simply doesn't return the exact event that was captured and used for generating the hash id.
I will pick this up in a separate discussion with @dakbhavesh

dakbhavesh · 2023-07-05T11:47:34Z

Dear @RalphTro & @sboeckelmann ,

Posting here the proposed idea we brainstormed for reference:

{
  "@context": [
    "https://ref.gs1.org/standards/epcis/2.0.0/epcis-context.jsonld",
    {
      "example": "http://openepcis.io/schema.jsonld", // vendor-specific schema
      "hashGen": "http://some.well.known.domain/schema.jsonld"
    }
  ],
  	"hashGen:ignoreFields": ["hashGen:hash", "example:captureID"],
	"example:captureID": "",
	"hashGen:hash": "",
	.....
}

The above structure ensures that EPCISQueryDocument remains JSON-LD compliant.

hashGen schema [hopefully makes it to standard] will include standard fields like ignoreFields, hash, etc.
example schema is placeholder schema for any vendor-specific schema with fields like captureID which are helpful for client/consumer of EPCISQueryDocument

Incorporating the above will ensure consistency of hash between captured and query document which is the goal for raising this issue.

RalphTro · 2023-07-05T14:19:54Z

Dear @dakbhavesh ,

Many thanks! Based on your ideas, I just compiled a concrete example we could use for illustration purposes. Voilà:

{
    "@context": [
      "https://ref.gs1.org/standards/epcis/2.0.0/epcis-context.jsonld",
      {
        "example": "https://ns.example.com/epcis/",
        "hashGen": "https://some.well.known.domain/schema.jsonld"
      }
    ],
    "type": "EPCISQueryDocument",
    "schemaVersion": "2.0",
    "creationDate": "2005-07-11T11:30:47.0Z",
    "epcisBody": {
      "queryResults": {
        "subscriptionID": "32d2aec1-a6d2-46d9-900a-24124288cce1",
        "queryName": "SimpleEventQuery",
        "hashGen:ignoreFields": ["hashGen:hash", "example:captureID"],
        "resultsBody": {
          "eventList": [
            {
                "type": "ObjectEvent",
                "eventTime": "2019-10-21T14:45:00Z",
                "recordTime": "2023-06-02T07:03:16.073Z",
                "eventTimeZoneOffset": "+01:00",
                "eventID": "ni:///sha-256;b3d481c5590757ab8361a6cb0e924820feb3b61eb568e7d7703f21a96f76f51d?ver=CBV2.0",
                "epcList": [
                    "https://id.gs1.org/00/040123451111111127"
                ],
                "action": "OBSERVE",
                "bizStep": "inspecting",
                "disposition": "in_progress",
                "readPoint": {
                    "id": "https://id.gs1.org/414/4012345000245/254/1"
                },
                "hashGen:hash": "ni:///sha-256;b3d481c5590757ab8361a6cb0e924820feb3b61eb568e7d7703f21a96f76f51d?ver=CBV2.0",
                "example:captureID": "cc90bbe4-7f3a-4abd-b150-d4371c72a64f"
            }
          ]
        }
      }
    }
  }

Does this resonate with you?

dakbhavesh · 2023-07-05T14:32:16Z

Dear @RalphTro , In your example above, shall we keep the values of "eventID" & "hashGen:hash" the same to avoid confusion?

sboeckelmann · 2023-07-05T15:41:13Z

Just a few thoughts:

let's have the hash generator(s) always ignore hashGen related attributes. Which means there would be no need to include them in hashGen:ignoreFields, they shall always be ignored by default.
what do you think about being able to provide an array of event hashes, to support multiple algorithms (sha-256, sha-512, etc.) instead of single hashGen:hash value?

And yes, @dakbhavesh - I also think, if hash is used for eventID, values should match. Unless we want to add some really funky edge-case example with eventTime, recordTime being ignored, which can be a valid use case for identifying certain duplication issues.
But that's fetching too far and would cause a lot of confusion 😄

dakbhavesh · 2023-07-06T07:42:00Z

Hi @sboeckelmann,

I agree with your idea to ignore hashGen:* fields by default.

For your second point, I am curious, what benefit we will gain by providing an array of event hashes? Wouldn't businesses agree beforehand on supported algorithms?

RalphTro · 2023-07-06T08:22:24Z

@dakbhavesh : I think I now finally understand the purpose of the field hashGen:hash: I am correct in assuming that this conveys the event's hash value e.g. to cater for cases where an event does not have an eventID field or the latter is populated with a UUID? (Yesterday, I thouht that this hash also takes fields into account that should be ignored; that's why the values were different :-))
Happy to adjust it.

@sboeckelmann:
As to "let's have the hash generator(s) always ignore hashGen related attributes. Which means there would be no need to include them in hashGen:ignoreFields, they shall always be ignored by default.": how should an accessing application know about that? As it is not officially specified anywhere, wouldn't you agree that it was better to make this more explicit?

As to "what do you think about being able to provide an array of event hashes": I agree with Bhavesh: what is the benefit of this? If there is a need for going for another sha, a company/industry consortium can choose to use it in the eventID field today.

sboeckelmann · 2023-07-06T09:59:20Z

@RalphTro : always ignore hashGen related attributes
The reason why I was suggesting not to add hashGen extension related fields explicitly is, because hashGen will only be active if

Server supports this extension
Client explicitly asks for it by adding the extension to the GS1-Extensions HTTP Request Header

which means these fields don't show up automagically. A client will always be aware of hashGen related implementation details.
I do understand that calling them out explicitly may be helpful and I wouldn't be opposing against it. However, there are already a couple of fields being implicitly irgnored by the hash-generator implementation(s) (eventId, recordTime and error declaration). If we want to be 100% explicit about ignored fields, these fields should also be taken into consideration.
My personal opinion is, that fields that are anyway always or implicitly ignored by the hash generator may not have to be included in ignoreFields, because clients will be aware of the hashGen extention's implementation details.

@RalphTro , @dakbhavesh
array of event hashes
For the initial vanilla use cases it may seem correct that static setting of hash algorithm may be sufficient. Especially when having in mind, that most commonly we will see usage of sha256. I think we are safe to support single hash value only if a client can request a specific algorithm by declaring it in a specific HTTP Request Header.
I am not convinced that EPCIS query request being made will only be targeting for a single company, industry or consortium.
Just think of a transport agency that wants to provide EPCIS services to their customers. Their customers may be coming from a wide variety of industries, having different hash algorithm requirements. In the future we may have even more EPCIS services, that are not bound to a single industry. If EPCIS 2.0 Semantic Web and Linked Data is being picked up and adopted by the industries in such a way that I am hoping for, there shouldn't be a built-in limitation.

RalphTro · 2023-07-06T11:18:04Z

As to hashGen fields: I got your point. If the client specifically requests these fields, I tend to agree with you. (+ considering to add "automagically" to my vocabulary ;-))

As to event hashes array. TBD/need more time to think this through. How do you like the idea to wait at least for a first implementation in industry where multiple hash algorithms are required?

sboeckelmann · 2023-07-06T12:19:37Z

Re: multi hash array
We can keep it as a single hash for now. But we have to add some functionality for the client to set desired algorithm.

RalphTro · 2024-04-08T07:01:55Z

Dear @dakbhavesh and @sboeckelmann ,
Suggestion for tomorrow's call: let's try and implement support for this feature. :-)
Here is how it should work from my POV:

Input: an EPCIS query document (XML + JSON/JSON-LD) with an ignoreFields field, e.g. under the example namespace in this/our reference implementation - though we should point out that this should be configurable
Output: an EPCIS Event Hash ID value which is identical to the one of the original event (meaning that the algorithm disregards the fields listed in ignoreFields.

Report back to the Visibility SMG about this matter once we have implemented it.

WDYT?

sboeckelmann · 2024-04-08T08:29:48Z

Report back to the Visibility SMG about this matter once we have implemented it.

WDYT?

I like the idea of reporting back to the SMG, let's discuss with Craig tomorrow. The EPCIS Event Hash get more attention and awareness from the SMG.

dakbhavesh · 2024-04-08T08:37:21Z

I just got back in the context of the topic :)
As we agree with the approach, I believe I can pick this up for adjusting logic to ignore the fields specified in ignoreFields.
Working logic may not be present by tomorrow however I will try to get it working as soon as possible, most likely in a week's time. Would that be okay?

RalphTro · 2024-04-08T09:35:04Z

Dear @dakbhavesh ,
This would be fantastic! Many thanks for your kind offer to advance this.
I am happy to support through testing.
Looking forward to speaking with you tomorrow!

dakbhavesh · 2024-04-11T12:59:08Z

Dear @RalphTro , @sboeckelmann ,

Summarising our discussion here:

We need to finalize one fixed URL for hashGen fields. This will allow us to filter out fields to exclude in our algorithms
We will start adjusting the algorithm to ignore user extension fields as per the design already discussed i.e. "hashGen:ignoreFields": ["example:captureID"]. By default, all hashGen fields will be excluded.
The algorithm should work even if the name of the key hashGen is provided differently in context. e.g. "hg:ignoreFields": ["example:captureID"].

Please feel free to add any missing points on top of the above. Also, feel free to ask in case of any queries :)

RalphTro · 2024-04-11T14:30:44Z

Dear @dakbhavesh ,
This sounds like a great summary. Agreed + thanks!
To enable better understanding ('hashGen' may be confusing to some people ;-)), we may could use a different URI prefix than 'hashGen', e.g. 'repository-x:ignoreFields' (tied to e.g. 'repository-x.example.com'). This might convey of what we have in mind in a better way, as we could explain that fields such as 'captureID' were added by a fictitious EPCIS repository solution named 'repository-x'.
What is your take on this?
Kind regards,
Ralph

dakbhavesh · 2024-04-17T10:47:42Z

Hi @RalphTro, I agree with your suggestion to keep the prefix and URL as you suggested. It is a better alternative than some technical name like hashGen.

dakbhavesh · 2024-04-23T12:33:03Z

Dear @RalphTro, I am almost there with adjusting logic. Thought to keep you posted on progress. I am going to create a PR soon once sufficient testing and test cases are in place. Thanks for your patience :)

dakbhavesh · 2024-06-08T10:03:01Z

Dear @RalphTro, I have adjusted the logic to incorporate support for ignoring fields from EPCIS XML document and query documents.
Please have a look at PR and feel free to give it a try.

Regarding @Echsecutor proposal to add the ignored fields into the NI. I think we should take it up separately as it may make the generated hash look complex.

For example, we need to use a fully qualified name in NI for the ignored fields <example:testField1/> & <example:testField1/>
ni:///sha-256;b3d481c5590757ab8361a6cb0e924820feb3b61eb568e7d7703f21a96f76f51d?ver=CBV2.0&ignoredFields={https://ns.example.com/epcis/}testField1,{https://ns.example.com/epcis/}testField2

Please share your thoughts.

FYI: @sboeckelmann , @Echsecutor

RalphTro · 2024-06-10T06:24:32Z

I suggest that we keep this issue open until we have reached consensus in the course of our next alignment call.

RalphTro · 2024-07-15T08:41:45Z

For now, reached consensus. Additional thought: if EPCIS events need to be processed stand-alone afterwards, the repository-specific fields (indicated in ignore-fields) need to be redacted/stripped off before passing them further.

Notwithstanding the latter, in an EPCIS query doc, it makes more sense to actually convey that information once, i.e. in the document (no in each and every event).

dakbhavesh added the enhancement New feature or request label Jun 5, 2023

dakbhavesh self-assigned this Jun 5, 2023

dakbhavesh mentioned this issue Apr 23, 2024

support for ignoring industry/repository specific fields #114

Merged

RalphTro closed this as completed Jul 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support for ignoring repo specific fields through configurable parameters #109

Support for ignoring repo specific fields through configurable parameters #109

dakbhavesh commented Jun 5, 2023 •

edited

Loading

sboeckelmann commented Jun 7, 2023 •

edited

Loading

RalphTro commented Jun 7, 2023

sboeckelmann commented Jun 7, 2023

dakbhavesh commented Jul 5, 2023

RalphTro commented Jul 5, 2023 •

edited

Loading

dakbhavesh commented Jul 5, 2023

sboeckelmann commented Jul 5, 2023

dakbhavesh commented Jul 6, 2023

RalphTro commented Jul 6, 2023

sboeckelmann commented Jul 6, 2023 •

edited

Loading

RalphTro commented Jul 6, 2023

sboeckelmann commented Jul 6, 2023 •

edited

Loading

RalphTro commented Apr 8, 2024

sboeckelmann commented Apr 8, 2024

dakbhavesh commented Apr 8, 2024

RalphTro commented Apr 8, 2024

dakbhavesh commented Apr 11, 2024

RalphTro commented Apr 11, 2024

dakbhavesh commented Apr 17, 2024

dakbhavesh commented Apr 23, 2024

dakbhavesh commented Jun 8, 2024 •

edited

Loading

RalphTro commented Jun 10, 2024

RalphTro commented Jul 15, 2024

Support for ignoring repo specific fields through configurable parameters #109

Support for ignoring repo specific fields through configurable parameters #109

Comments

dakbhavesh commented Jun 5, 2023 • edited Loading

sboeckelmann commented Jun 7, 2023 • edited Loading

RalphTro commented Jun 7, 2023

sboeckelmann commented Jun 7, 2023

dakbhavesh commented Jul 5, 2023

RalphTro commented Jul 5, 2023 • edited Loading

dakbhavesh commented Jul 5, 2023

sboeckelmann commented Jul 5, 2023

dakbhavesh commented Jul 6, 2023

RalphTro commented Jul 6, 2023

sboeckelmann commented Jul 6, 2023 • edited Loading

RalphTro commented Jul 6, 2023

sboeckelmann commented Jul 6, 2023 • edited Loading

RalphTro commented Apr 8, 2024

sboeckelmann commented Apr 8, 2024

dakbhavesh commented Apr 8, 2024

RalphTro commented Apr 8, 2024

dakbhavesh commented Apr 11, 2024

RalphTro commented Apr 11, 2024

dakbhavesh commented Apr 17, 2024

dakbhavesh commented Apr 23, 2024

dakbhavesh commented Jun 8, 2024 • edited Loading

RalphTro commented Jun 10, 2024

RalphTro commented Jul 15, 2024

dakbhavesh commented Jun 5, 2023 •

edited

Loading

sboeckelmann commented Jun 7, 2023 •

edited

Loading

RalphTro commented Jul 5, 2023 •

edited

Loading

sboeckelmann commented Jul 6, 2023 •

edited

Loading

sboeckelmann commented Jul 6, 2023 •

edited

Loading

dakbhavesh commented Jun 8, 2024 •

edited

Loading