Enhancement Proposal: GET linked invoice/ACK metadata #6
Comments
How should ack status get updated? The financial system of the Australian Business must be authenticated/authorised to the Gateway where acks get posted. So that means we could accept Alternatives depend on the design of the ACK state machine. If there is a reasonably linear, irreversible process where states can be skipped, then there are other options to PUT. For example, clobbering state with Suggest PUT updates to the ACK with an embedded state-change journal that is returned in the body of the GET response. |
About signature verification of invoice parameters: Maybe
This way, the invoice data is only POSTed once (by the seller) and PUT once (transition out of null-acknowledgement state). And PUT is tightly constrained to the null fields; signed hashes are immutable. |
sorry, that's silly. Salt with the GUID. |
Maybe we should chunk it up in a way that is congruent with the UBL objects/collections. What I mean is
Where the url of each object is optional (but not the hash: if an object is present then it must have a hash), and if a url is present then presumably it requires authentication and appropriate authority to access. So, if an Australian Business provided the whole invoice to an interested party, they could verify each chunk in turn. Or the business might provide only relevant chunks to interested parties, who could verify those objects individually. This might seem overcomplicated, but mapping to UBL semantics does not mean we have to map to their document granularity. A HATEOAS-style approach should be simpler and more versatile in the long run, and shouldn't add real difficulty to maintaining a UBL/REST adapter. |
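To make the chunk idea concrete, here is a minimal sketch of a per-object manifest where the hash is mandatory and the url is optional. The names `build_manifest` and `verify_chunk`, the chunk names, and the choice of SHA-256 are all assumptions for illustration; nothing here is part of the spec.

```python
import hashlib
import hmac
from typing import Dict, Optional


def chunk_digest(chunk_bytes: bytes) -> str:
    """Hex SHA-256 digest of one serialized invoice chunk (SHA-256 is an assumption)."""
    return hashlib.sha256(chunk_bytes).hexdigest()


def build_manifest(chunks: Dict[str, bytes],
                   urls: Optional[Dict[str, str]] = None) -> Dict[str, dict]:
    """Every chunk that is present gets a hash; a url is added only if one was supplied."""
    urls = urls or {}
    manifest = {}
    for name, data in chunks.items():
        entry = {"hash": chunk_digest(data)}
        if name in urls:
            entry["url"] = urls[name]
        manifest[name] = entry
    return manifest


def verify_chunk(manifest: Dict[str, dict], name: str, received: bytes) -> bool:
    """An interested party checks a single chunk without needing the others."""
    entry = manifest.get(name)
    if entry is None:
        return False
    return hmac.compare_digest(entry["hash"], chunk_digest(received))
```

An interested party who was handed only one chunk could call `verify_chunk` for that chunk alone, which is the selective-disclosure case described above.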
Phew! What a lot of ideas. I like the idea of a HATEOAS style collection of URL actions in response to a GET/invoices/{GUID}. Good questions about the state lifecycle of invoice - and something that will for sure provoke a lot of community discussion. There are several possible states and so that implies there are logically several response messages for a given invoice. There are a few implementation options. All start with a POST /invoices to the receiver gateway and get a GUID response that is the key to that specific invoice. After that there are several ways we could manage responses.
Since I think we would also want to allow the recipient to POST responses back to the sender whenever there is a status change, option 2 fits more neatly with that model. So the recipient gateway (of an invoice) will POST responses back to the invoice sender (assuming the sender SMP specifies that capability) and it would also host the response for any third parties (eg debtor financiers) to GET/response/{GUID}. This discrete responses model is probably also a bit more compatible with a future blockchain style shared state model. If you make a pull request along these lines, I'll be pleased to accept. |
Can we get higher-entropy resource identifiers please? Even if there's no easy exploit, a recognizable UID structure may encourage hostile experimentation. |
Excellent point. I'm not an expert on algorithms for cryptographically strong GUIDs (as opposed to just unique but still maybe guessable GUIDs). Any suggestions? |
Yes. The bonus is it's based on GUIDs under-the-bonnet, so the generating access point can still work with those. I have to produce a "vanilla" implementation and get permission to publish. Won't take too long. |
UUID4 has 122 random bits. It didn't occur to me that it wasn't enough; it's a good point. https://tools.ietf.org/html/rfc4122.html#section-6 actually makes it too:
We should be using payload encryption where we need it. It's fair to assume anyone who wants a complete copy of the public data can have one. What would be the problem if someone could magically guess the id's? |
|
No argument about payload encryption and signing, we are on the same page there. I'm not saying "I can't think of a specific threat, therefore there isn't one". I'm saying "any sufficiently determined/lucky attacker can get access to (some or all) invoice and acknowledgement URLs". The URLs are not guarded secrets, they are bandied about to whoever needs them. Some parties will be malevolent, foolish, and/or unlucky; it's unavoidable. So, when (not if) an attacker gets access to invoice and acknowledgement URLs, what's the damage? In other words we might POST a signed and encrypted invoice/response, but we can only GET safe public data (e.g. the signature of the plaintext). And only store safe public data too. So if I provide a plaintext invoice + URL to an interested party and assert that I sent it, the interested party can compare a locally computed hash of the plaintext with the signature retrieved by GET URL, and verify/falsify my assertion. Assuming we don't make a mistake with the crypto, I think that publishing the hash does not leak any commercially sensitive information. Is that right? The difficult thing is to prevent commercially sensitive information leaking out through historic traffic analysis (assuming the attacker has access to a high proportion of URLs, or even all of them). I think that means we need to ensure the public information is non-identifying. Assuming the attacker also has the entire NAPTR record, the only identifying information about the URL is the Access Point / Gateway (AP/GW, nomenclature?) that the URL belongs to, which translates to a collection of ABNs through the NAPTR DB. That's an argument for using a popular AP/GW: there will be a size of AP/GW that's too small (identity of URL could be guessed). A sufficiently large one will provide identity-safety-in-numbers. |
I am not technically experienced enough to comment here, but from a commercial perspective for debtor finance, this HATEOAS style collection of URLs to provide up-to-date statuses for invoices will be invaluable for a financier. We will be able to calibrate our API with the accounting platform that sends out the invoice to dynamically track these statuses, so if there were any red flag events, the appropriate actions can be pursued immediately. |
@AlistairSkippr the only argument we have with HATEOAS is the acronym itself: it sucks. What we're saying here is when we specify an identifier in a URL we don't use a database ID or a UID or any recognizable/enumerable data type, but rather as purely random (and therefore meaningless, except as an ID) a sequence of characters as possible. |
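As a rough illustration of "as purely random a sequence of characters as possible", the sketch below draws the identifier from the operating system's CSPRNG via Python's `secrets` module. The function names, the 32-byte length, and the `/invoices/` path layout are assumptions, not anything agreed in this thread.

```python
import secrets


def new_resource_id(num_bytes: int = 32) -> str:
    """URL-safe identifier carrying num_bytes of randomness from the OS CSPRNG.

    It has no internal structure (no timestamp, counter, or database key),
    so it reveals nothing about the Access Point that generated it.
    """
    return secrets.token_urlsafe(num_bytes)


def invoice_url(gateway_base: str, resource_id: str) -> str:
    """Hypothetical URL layout; only the identifier part matters here."""
    return f"{gateway_base.rstrip('/')}/invoices/{resource_id}"
```

With 32 random bytes the identifier carries 256 bits of randomness, comfortably more than a version 4 UUID.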
Do you prefer "RESTafarian", or should we just say "no session-state, locking mechanisms or any of that rubbish". |
We're stuck with the acronym. Otherwise we'd have to link to an appropriate explanatory resource every time we used a less annoying term because people wouldn't be able to Google it. |
We would expect that URLs will contain information useful to observers without compromising the confidentiality of the messages or the parties involved. Just so long as no one can infer anything about an Access Point's implementation from components of the URL (so to repeat: no DB IDs, no UIDs). Please note @monkeypants your example of a plain text invoice won't happen, because we'll be end-to-end signed and encrypted. Another reason why it won't goes right to your point about needing to assure participants that senders are who they say they are. We get that for free with no additional complications with asymmetric crypto. Otherwise we're going to have to specify canonical forms, signing blocks, appending rules and hey presto: we've reinvented ws-security. You could adopt an existing RESTful signing implementation, but trust me, a good one will be just as much of a pain to implement as ws-security. Actually no, don't trust me. Check this out: http://docs.aws.amazon.com/general/latest/gr/sigv4_signing.html |
Alice sends Bob an Invoice by posting it to his Gateway. The invoice is signed with Alice's private key and encrypted with Bob's public key. Nobody can inspect the invoice in flight without Bob's private key (including the Gateway). Bob is certain the invoice came from someone with access to Alice's private key (i.e. either Alice or someone she fully trusts). There is some back and forth between the Gateways, resulting in URLs for the invoice (at Bob's Gateway, where Alice sent it) and Bob's response (at Alice's Gateway, where Bob sent it). The resource at the URL does not contain anything that identifies Bob or Alice, nor does it contain the contents of the invoice (encrypted or otherwise).
Charles is an interested party in Alice's business. Perhaps a Debtor Financier or Auditor. Alice has access to the plaintext invoice (because she sent it). She sends it to Charles through another channel (good news Charles, I anticipate this income!). She also sends Charles the URL of the invoice endpoint. Charles generates a hash of the alleged plaintext, because he has the copy Alice gave him. Charles accesses the invoice URL and gets the published hash. Because they match, he knows that Alice did in fact send the exact invoice she provided him with. At that point it's all he knows for sure, not who it was actually sent to (or what they think of it). Charles follows the link on the invoice endpoint to access the response endpoint. This contains information about the status of the invoice, allegedly from Bob's point of view (although its origin is not trusted at face value). This information is valuable to Charles, because he knows the subject of the status (the invoice), although this context is not evident from the response endpoint. The status information is signed by Bob. Charles knows who should be signing it (because he has access to the invoice plaintext), so he finds Bob's public key and verifies the signature. This tells him two things: that the right person received and responded to the Invoice (not some shill of Alice's), and the state of the invoice processing by Bob. If Alice had not provided Charles with the invoice document, the public data at the invoice and response endpoints would just be random-looking hashes at random-looking URLs. There might also be a case for Dave, a party with an interest in Bob's liabilities...
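Charles's first two steps in the scenario above (hash the copy Alice gave him, compare it with the published hash, then follow the link to the response) can be sketched as follows. `PUBLIC_DATA` is a hypothetical in-memory stand-in for what the gateways would serve over GET, the field names `hash` and `response` are assumptions, and verifying Bob's signature on the response is deliberately left out because it needs a real asymmetric crypto library.

```python
import hashlib
import hmac
from typing import Optional

# Hypothetical stand-in for the public, non-identifying data served by gateways.
PUBLIC_DATA = {}


def publish_invoice(invoice_url: str, invoice_bytes: bytes, response_url: str) -> None:
    """What a gateway would expose: a hash and a link, never the invoice itself."""
    PUBLIC_DATA[invoice_url] = {
        "hash": hashlib.sha256(invoice_bytes).hexdigest(),
        "response": response_url,
    }


def charles_checks(invoice_url: str, alleged_invoice: bytes) -> Optional[str]:
    """Return the response URL if Alice's copy matches the published hash, else None."""
    record = PUBLIC_DATA.get(invoice_url)
    if record is None:
        return None
    local_hash = hashlib.sha256(alleged_invoice).hexdigest()
    if not hmac.compare_digest(local_hash, record["hash"]):
        return None
    return record["response"]
```

Anyone without the invoice bytes sees only a hash and a link, which matches the point that the data is public while the information is not.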
Yes, but it's also not that simple. Alice and Bob are known to each other. The public data needs enough information to unlock value without leaking sensitive information. I think about this in three layers of trust.
If we can make it work, the value of the REST design is that it factors trust and sophistication out of the gateways. It's not eliminating Gateways, it just reduces them to a commodity service that proxies the counter-party and exposes public data. The Gateway is still valuable because it:
The way I see it, concentrating trust in the Financial Software (by factoring it out of the gateway) is the fundamental value proposition of the REST standard. Although the protocol described above does not demand it, trust in the gateway not to delete/tamper with public data could be factored out even further by using a blockchain ledger. For example, about 10-40 minutes after the invoice endpoint is published, it becomes impossible for the Gateway to delete or tamper with it (neat!). This would make it even more attractive for enterprises such as Charles to build systems on the protocol, perhaps enabled by partnerships with Financial Software that cultivate a differentiating service ecosystem. That would be ruined by complex trusted gateways dependent on perimeter security; the system is more liquid if trust and complexity can be pushed to the edge of the network. |
We agree with what you're trying to achieve. It's the mechanism that is the problem. Hashing is fraught without a canonical form. With plain text you have to specify a newline convention, a character encoding, and within that other little details that will break a hash such as whether a UTF-8 encoding must have a byte order mark. We can't guarantee that the invoice will always be plain text though. With JSON and XML the rules for canonical forms are considerably uglier. And what about future serialization mechanisms? It needs to be a binary format. In this case we could specify that documents are signed first and then encrypted. This way Alice can hash the signature block, include the hash in the URL (noting that it has to be passed between all Access Points in the chain from her to the recipient) and pass the signature block to Charles. Byte-for-byte it's exactly the stream that arrives at the recipient's Access Point. If you're good with that I think we're on to something. |
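A small demonstration of why hashing is fraught without a canonical form: two serializations of the same JSON document parse to equal values but produce different digests. The sample document and the use of SHA-256 are assumptions chosen only for illustration.

```python
import hashlib
import json


def digest(data: bytes) -> str:
    """Hex SHA-256 of the exact bytes given."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical invoice fragment, serialized two equally valid ways.
doc = {"amount": "100.00", "currency": "AUD"}
compact = json.dumps(doc, separators=(",", ":")).encode("utf-8")
pretty = json.dumps(doc, indent=2).encode("utf-8")

# Same meaning once parsed...
same_meaning = json.loads(compact) == json.loads(pretty)
# ...but the bytes differ, so the hashes differ.
same_hash = digest(compact) == digest(pretty)
```

This is why the thread converges on hashing one agreed byte stream rather than a document that each party might re-serialize.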
When I said "plaintext", I just meant "not cyphertext". I didn't mean any particular encoding. Yes, it has to be byte-for-byte equivalent. I'd be happy with a binary format that's well supported by open source libraries and not patent encumbered. Anything spring to mind? |
I see why you think a binary invoice encoding will simplify reliable hash comparison. Response status is probably a simple finite state machine with a timestamped journal of states. It might be easier for Charles and Dave if this was json/xml encoded. Do you think we need it to be in a binary format too (for simple reliable signature checking)? |
|
Sorry I'm a bit late to this discussion. There seem to be a number of issues being discussed here. Just to clarify/re-cap, we're trying to answer these questions?
And the above need to be achieved under the following constraints:
My thoughts on the above constraints:
So on the questions posed further up, just as a starting point for discussion, how about something along the lines of:
I'm sure there are problems with the above scheme, but I figure it's a decent starting point for discussion of specifics. Commence hole-poking :) EDIT: Just to get the ball rolling on 'hole-poking', a sensible alternative would be to define very granular OAuth scopes/attributes and then condition GET access to an invoice (or field value) on the request embedding a valid bearer token in the request header (or whichever other OAuth/OIDC auth flow is appropriate). Personally, I think this is a neater solution, but would only be comfortable with it if the government were to develop an OAuth/OIDC identity provider assurance & audit framework (to allow private third-parties to offer competing alternatives to the government IDP). Otherwise the entire system will be reliant on a government identity assurance service monopoly. It's also neater in the sense that:
|
This thread is getting epic, thank you patient reader.
Sorry, it took a while for the penny to drop on this. You are right, the hash of the encrypted payload could be the pseudo-random ID used in the URL. So anyone in the chain from Alice to Bob could generate the ID. Since they can also look up Bob's Gateway, they could derive the entire public invoice URL. This squeezes a bit more trust/value out of the gateway by removing its need to contribute entropy, so it seems like an improvement to me. I had assumed encrypt-before-sign, not sign-before-encrypt. Gateways need to know sender and recipient identity, so with encrypt-before-sign they could validate the sender (perhaps return a 409 Conflict response to clients that attempt to forward a payload with an invalid signature). It would be OK without this, but it just seems like good manners. On the other hand, the hash of the response would not be a stable URL if the response changes over time (timestamped journal etc, e.g. 1. queried, then 2. acknowledged, then 3. cheque's in the mail). It could be a hash of the invoice identifier salted with an attribute of Bob (such as his unique SMP endpoint), rather than a hash of his response payload (which would obviously not be known ahead of time). Other schemes include pre-computing hashes for all possible responses, but I think that gets a bit messy.
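The two derivations described above can be sketched like this: the invoice ID is a hash of the encrypted payload, and the response ID is a hash of that invoice ID salted with Bob's SMP endpoint, so it stays stable while the response content changes. The function names, SHA-256, and the separator byte are assumptions.

```python
import hashlib


def invoice_id_from_payload(encrypted_payload: bytes) -> str:
    """Anyone in the chain who holds the encrypted bytes can derive the same ID."""
    return hashlib.sha256(encrypted_payload).hexdigest()


def response_id(invoice_id: str, bob_smp_endpoint: str) -> str:
    """Stable response ID: depends on the invoice and on Bob, not on what Bob says.

    The zero byte separates salt from identifier so that different
    (salt, id) pairs cannot concatenate to the same input.
    """
    material = bob_smp_endpoint.encode("utf-8") + b"\x00" + invoice_id.encode("ascii")
    return hashlib.sha256(material).hexdigest()
```

Because `response_id` never reads the response payload, the URL does not change as the journal moves from queried to acknowledged and beyond.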
That's good, I didn't think about protecting an out-of-band channel between Alice and Charles. Can you explain a little more about how to generate the RSA bytestream, I don't see how that avoids canonical form. Imagine I start with a UBL/XML document of dubious encoding...
Yes, that's exactly why I imagined encrypt-before-sign. I had assumed Charles fetched the response himself rather than Alice provided it to him, but otherwise the same. Charles always knows who Bob is, because Alice gave him a copy of the invoice plaintext (binary encoded :)
Yes it's counterintuitive, but that's exactly what I am advocating. I'm pretty sure it's the simplest and most secure scheme with the best national productivity dividend (especially in conjunction with a public blockchain ledger that prevents future tampering/deletion). But, I'm not going to argue with users about what they want, that's a very tricky business. Here's my argument about what's the best technical solution... If we accept that there is value in Charles obtaining proof that Bob responded to Alice's invoice, then we have to consider two things: should it be an optional or mandatory part of the protocol, and what is the best trust zone for that information to come from.
I am advocating that it should be mandatory and public (zero trust). Apart from the false intuition of insecurity (a very real problem that can't be ignored), this is not dangerous because while the data is public, the information is not (unless you also possess a guarded secret, the content of the invoice). Optional + Trustless is counterproductive because the presence/absence of response data exposes information about Alice's financial circumstances (by implication). As the number of derived services increases over time, the implication about Alice's financial arrangement would be diluted and there would also be more reasons to elect to publish response data. So eventually publishing would become de facto mandatory (the implication of not publishing is that you are an empty shell business, which is suspicious). So the option of not publishing would be an annoying quirk. There is very little difference between optional and mandatory high-trust schemes. They both suffer from a systematic response bias (reducing the utility to Charles). For example if Charles is a creditor, Alice will be inclined to send positive responses eagerly but negative responses hesitantly (or not at all). This will increase the cost of Alice's credit, reducing the national productivity benefit of the whole scheme. The possible compromise is a low-trust zone message, where mandatory and optional are indistinguishable. Both have the benefits of the no-trust schemes (eliminate Alice's response bias) and high-trust schemes (obscure Alice's financial arrangements). Two examples of low-trust schemes might be:
So far, all the schemes in the low-trust zone that I have been able to think of push trust and sophistication back from the edge of the network towards the gateway. That seems like a heavy price to me, which has to be weighed against the cost of managing a false security intuition. Difficult problem... |
You can say that again @monkeypants . I have to submit some pseudo code and respond to Mr Muir's "asymmetric crypto is an infosec risk" thing before I get to your latest essay. A few things you can expect from us:
|
@asmith1024 solved that to my satisfaction. Hash of the payload. This would need to be salted to prevent a known-cyphertext attack, cycled through a predetermined number of iterations of something like pbkdf2 to prevent rainbow table attack etc. |
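A minimal sketch of the salted, iterated hash suggested here, using the standard library's PBKDF2. The function name, the iteration count, and treating the payload as the PBKDF2 "password" input are assumptions; the thread does not fix any of them.

```python
import hashlib


def payload_id(payload: bytes, salt: bytes, iterations: int = 100_000) -> str:
    """Derive a hex identifier from the payload with a salt and a fixed iteration count.

    All parties must agree on salt and iterations in advance, otherwise they
    will not derive the same identifier from the same payload.
    """
    return hashlib.pbkdf2_hmac("sha256", payload, salt, iterations).hex()
```

The salt means identical payloads do not map to a predictable ID, and the iteration count raises the cost of building a lookup table of candidate payloads.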
@monkeypants pseudocode that satisfies your requirements is coming. We also need to establish a lingua franca for code snippets. |
@asmith1024 Absolutely agreed on the OASIS complexity thing. The easier to implement the better. |
OASIS seems economically inverted to me. It maximises the value of the traffic to the gateway network, rather than maximising the value of the gateway network to the traffic. It's a bridge-troll by design. |
@monkeypants Btw, have we got a basic example of an e-invoice somewhere? I've got the Hydra OAuth/OIDC system running on my laptop (see: https://github.com/ory-am/hydra) and wouldn't mind trying to create a bunch of granular, customised scopes to secure endpoints and enable selective info disclosure. |
If you follow the links on the read me in this repo you'll be taken to the swaggerhub spec which includes a sample structure Steven Capell
|
Cheers Steve. I'm heading home now but should be back up and running in a bit. Maybe we should think about setting up a gitter or slack chatroom at some point. Then again, maybe not. It would make some of this discussion 'not so public'. |
@markmuir87 first the crypto thing:
All of this (except item 4), in fact the entire Internet, is predicated on certificates being properly secured by their owners. If a hostile party in possession of a compromised cert also controls an access point, router or proxy, every payload encrypted by a compromised cert that passes through is vulnerable (so in general for a transaction this will be only in one direction). Until the cert is replaced. I'll guarantee you that this will happen. TLS is underpinned by a cert. Compromise that and your session keys can be as ephemeral as they like, your server and all its comms are still P0wNx0rd. Our risk profile includes a section that tries to understand the impact on the security of our clients' data in the presence of a hostile insider. No matter how clever you are this is bad, but you can limit the damage. Relevant here is not encrypting all your clients' data with the same cert, and not allowing any single admin account access to all of the certs. Of course someone will turn up with an access point that doesn't adhere to proper policies, that eventually gets compromised and dumps its clients' data. Happens all the time. |
OK so we need a transaction ID, potentially covering a whole paper trail of interactions in a business scenario. Every time a new document is created we have a new signing hash and it's going to get messy chaining them all together. (See what I did there? Version 2). The one good thing that ebMS3 did was introduce the notion of a "Conversation ID", but we need our equivalent to be part of URLs, so it can't be enumerable, etc. The following method is UID-based and very fast (although it only has to be computed once in the lifetime of a transaction). It also produces a minimum of 86% entropy (varies with your platform's UID generation algorithm). This is good.
The symmetric encryption preserves the uniqueness of the underlying UID without exposing its structure. No hash collisions. An algorithm quite similar to this was developed during the course of penetration testing an API I am involved with. I would not hesitate to recommend the testing resource. Let me know if you need some severe punishment and I'll give you his number. |
That does look like a reliable way to produce an unguessable, unique transaction id. I'm not qualified to scrutinise it, but happy to accept it as a black box that may or may not get updated later in the development process. I also see why it would be useful to have a transaction id linking the invoice and responses, especially for the inevitable hairy corner cases we haven't discussed yet (key revocation, software provider changes, SMP updates, etc), that can't be allowed to disrupt business of course. But are you sure it belongs in the URL, not the payload? It seems to function a lot like an invoice number (Alice's reference code). I might be misunderstanding your intention here. If it was in a URL, maybe that is a new interface that links invoice versions? |
@monkeypants yes it is an easy and fast algorithm to start with and it can easily be replaced. Since it doesn't signify anything except an unguessable unique identifier multiple generator versions can coexist. I'm only including it because now we are juggling two essayists with contributions that we would consider taking on board. A transaction ID is one of them. |
Isn't the restful equivalent of a transactionID to be found in that dreaded acronym .. HATEOAS ? Steven Capell
|
The use case for this is partly your fault @onthebreeze, and partly @markmuir87 . Alice presents Charles with Bob's acknowledgement signature block, but when Bob later puts a stop on the invoice she forgets to inform Charles. The TxID allows Charles to query the workflow (not its contents but its existence). He is then able to confirm that there is more to the story than Alice has told him and can request the missing signature blocks. HATEOAS does not specify the form this TxID takes. We will require a string of random nastiness for this. |
That depends. If I understand correctly, transaction is a new concept (relative to invoice and response) that links one or more invoices. Bob queries Alice's invoice "I thought we agreed you weren't going to charge me for those peanuts". Alice says "oh sorry, belay that, here's a new one" (new URL, same transaction id). Bob acknowledges the new one, Charles nods approvingly. But I'm far from certain that's what @asmith1024 had in mind. |
I am trying to accommodate @markmuir87's observations here, so yes a workflow link. Later down the track we could imagine this ID linking all the documents in a business process (tender, quote, invoice, delivery, blah blah), including updates, addenda and so on. |
Yes I see. What we are really talking about is an un-guessable key that is shared with On 9 July 2016 at 11:45, Andrew Smith notifications@github.com wrote:
|
I think I might be missing something on this epic thread. If the hash of Charles uses the GUID to GET from Bob's gateway and, HATEOAS style, can find So the GUID first generated by the creator of the “thing” that has an As I mentioned before, I don't think there is any point in a kind of "mega On 9 July 2016 at 12:04, steve capell steve.capell@gmail.com wrote:
|
My last post was simultaneous with Andrew's. Workflow link is a good description of what I thought you meant. Another URL, a different thing. Maybe a new ticket? The subject of this ticket is a proposal to extend the public gateway GET interfaces with linked invoice/ACK metadata. Assuming we pull all the interesting side conversations into new discrete tickets, what do we need to do to resolve that proposal? |
shrugs I think we've reached the point where we've got far too many words and nowhere near enough code. This is a discussion about transport and metadata, and can safely be continued independently of the BPL->JSON-or-whatever guff. Let me run something up we can play with, or that I can at least demonstrate in a Hangout or something. Then we can alter the routes and mock responses until everyone's singing off the same song sheet. |
@monkeypants the canonical form I was thinking of was simply the conventional byte encoding for UTF-8 (so if you're in .NET you use a new UTF8Encoding(false) and then you're playing nicely with all the other kids). Send it as an application/octet-stream or Base64 encode it and send as text/plain (the receiver knows what to do with it based on the MIME type). This way we separate the cryptographic properties of the document from the semantics. |
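The Python equivalent of the .NET `new UTF8Encoding(false)` convention mentioned above is the plain `"utf-8"` codec, which never writes a byte order mark (the `"utf-8-sig"` codec does). The sketch below also shows the Base64 option for sending as text/plain; the function names are assumptions.

```python
import base64
import codecs


def to_wire_bytes(text: str) -> bytes:
    """Encode as UTF-8 with no byte order mark; these are the bytes to sign and hash."""
    return text.encode("utf-8")


def as_text_plain(raw: bytes) -> str:
    """Base64 form for transports that carry text/plain instead of application/octet-stream."""
    return base64.b64encode(raw).decode("ascii")


def from_text_plain(body: str) -> bytes:
    """Recover the exact original bytes on the receiving side."""
    return base64.b64decode(body)


# A BOM would change the byte stream (and so any hash of it) without changing the text.
with_bom = "invoice".encode("utf-8-sig")
without_bom = to_wire_bytes("invoice")
```

Either transport form round-trips to the same byte stream, so the hash comparison is unaffected by the MIME type chosen.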
For binary format (octet-stream or Base64 encoded), http://msgpack.org/ seems extremely well supported by different language bindings. Getting UTF-8 encoded json (or native types) through a msgpack codec seems like a few lines of code in any language I care about. Does it look like a good fit to you @asmith1024? |
Shiny! |
nb: GovHack project http://slay-the-bridge-trolls.readthedocs.io/ |
Further to conversation about #5, suggest we consider enhancement to 1.0 version of the protocol that allows access to data about the state of an invoice/receipt by utilising gateway-generated GUIDs and GET verbs.
For example, if successful submission (response code 200) of POST /invoices/ returned data containing a gateway submission GUID, then GET /invoices/<GUID>/ (at the same gateway) could return data about the state of the submission. (Subsequent analysis required: what is the state transition model of a submission? Is it more complicated than "unacknowledged" or "acknowledged"?)
and also:
This implies that the data returned from POST /invoices/ has two properties:
For example, an Australian Business might tell their Financier (Interested Party) "I sent this invoice, and here is the gateway endpoint for the submission", and the financier would be able to verify that claim (of submission - same date, amount, etc) and monitor its status (hmm, queried/disputed). Financiers making use of this information may be able to provide more price-competitive credit through a combination of lower operating costs (automation) and more precise risk management.
Another example might be a debt collection service that is sent verifiable invoices / status endpoints, and who uses them (in combination with payment data) to drive an automated debt recovery protocol.
Is this all-or-nothing (one signature for the whole invoice), or would an Australian Business want to share some but not all data from a particular invoice with a third party who then needs to verify it?
Similarly, a successful POST /responses/ (or whatever the noun is, it's still not specified in the spec) should return data containing a GUID, such that GET /responses/<GUID>/ will describe the state of the acknowledgement. This may or may not be a duplication of the data available in the GET /invoices/<GUID> endpoint. My preference is that the /invoice/ endpoint contains a link to the relevant /responses/ endpoint (for better cache performance of GET /invoices/ subsequent to transitioning out of null-acknowledgement state), but either could work I think.
Again, modelling the state machine of acknowledgements would be an interesting exercise - are there different stages that could be mapped to standard accounting practice? For example, is an invoice first mechanically acknowledged (valid submission received by gateway, forwarded to business), then some kind of human acknowledgement (received by business but no comment about liability), then some kind of affirmation (or dispute - indicate intention to pay/not pay)? Perhaps even a "the cheque's in the mail" assertion...
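One way to start modelling the acknowledgement state machine is a small transition table plus a timestamped journal. The state names and the allowed transitions below are assumptions drawn loosely from the stages floated above; the real model still needs the analysis this issue asks for.

```python
from datetime import datetime, timezone
from enum import Enum


class AckState(Enum):
    UNACKNOWLEDGED = "unacknowledged"  # null-acknowledgement state
    RECEIVED = "received"              # mechanical: gateway accepted and forwarded
    SEEN = "seen"                      # human: received, no comment on liability
    AFFIRMED = "affirmed"              # intention to pay
    DISPUTED = "disputed"              # intention not to pay
    PAYMENT_SENT = "payment_sent"      # "the cheque's in the mail"


# Hypothetical, strictly forward-moving transitions.
ALLOWED = {
    AckState.UNACKNOWLEDGED: {AckState.RECEIVED},
    AckState.RECEIVED: {AckState.SEEN},
    AckState.SEEN: {AckState.AFFIRMED, AckState.DISPUTED},
    AckState.AFFIRMED: {AckState.PAYMENT_SENT},
    AckState.DISPUTED: set(),
    AckState.PAYMENT_SENT: set(),
}


class Acknowledgement:
    def __init__(self):
        self.state = AckState.UNACKNOWLEDGED
        self.journal = []  # list of (timestamp, old_state, new_state)

    def transition(self, new_state, at=None):
        """Apply a state change, recording it in the journal, or raise if not allowed."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"{self.state.value} -> {new_state.value} is not allowed")
        when = at or datetime.now(timezone.utc)
        self.journal.append((when, self.state, new_state))
        self.state = new_state
```

The journal is what a GET on the response endpoint could return, so an interested party sees the history and not just the latest state.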
Note: I'm assuming GUID is essentially random and that /invoice/ and /response/ are different values of GUID, even if the response corresponds to the invoice.
If an invoice has been acknowledged, then GET /invoices/<GUID> should contain a URL for GET /responses/<GUID>. In other words, an invoice should link to its response (if any) but not necessarily the other way round.
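A minimal sketch of what the body of GET /invoices/<GUID> might look like under this proposal: the link to the response appears only once an acknowledgement exists. The field names (`state`, `links`, `self`, `response`) and the function name are assumptions, not part of the spec.

```python
from typing import Optional


def invoice_resource(gateway_base: str, invoice_guid: str,
                     response_url: Optional[str] = None) -> dict:
    """Build a hypothetical representation of the invoice submission resource.

    The response link is one-directional: the invoice points at its response,
    and nothing here requires the response to point back.
    """
    base = gateway_base.rstrip("/")
    links = {"self": f"{base}/invoices/{invoice_guid}"}
    if response_url is not None:
        links["response"] = response_url
    state = "acknowledged" if response_url is not None else "unacknowledged"
    return {"state": state, "links": links}
```

Because the invoice and response GUIDs are assumed to be different random values, the response URL is passed in rather than derived from the invoice GUID.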