Add status endpoint to check pin and deal info per cid #78

Closed
olizilla opened this issue Jul 13, 2021 · 21 comments · Fixed by #82 or #89

Comments

@olizilla
Contributor

olizilla commented Jul 13, 2021

GET /user/uploads/:cid

Proposed response shape:

{
  "cid": "bafy",
  "dagSize": 101,
  "pins": [{
    "peerId": "12D3KooWR1Js",
    "peerName": "who?",
    "region": "where?",
    "status": "Pinned"
  }],
  "deals": [{
    "dealId": 12345,
    "miner": "f99",
    "status": "active",
    "activation": "<iso timestamp>",
    "pieceCid": "baga",
    "dataCid": "bafy",
    "dataModelSelector": "Links/0/Links"
  }]
}

This is the subset of properties in the current w3 schema that have some relevance to the user, flattened. Most Batch info is removed, apart from the pieceCid, the dataCid (the cid of the batch), and the dataModelSelector, which, I'm told, could be used to extract the cid from the Piece.
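
For readers who prefer types to sample JSON, the proposed shape could be sketched in TypeScript roughly as follows. The field names come from the proposal above; the type names (`PinStatus`, `DealStatus`, `UploadStatus`) and the `isUploadStatus` helper are hypothetical and only for illustration.

```typescript
// Hypothetical type names; field names are taken from the proposed response.
interface PinStatus {
  peerId: string;
  peerName: string;
  region: string;
  status: string; // e.g. "Pinned"
}

interface DealStatus {
  dealId: number;
  miner: string;
  status: string; // e.g. "active"
  activation: string; // ISO timestamp
  pieceCid: string;
  dataCid: string;
  dataModelSelector: string;
}

interface UploadStatus {
  cid: string;
  dagSize: number;
  pins: PinStatus[];
  deals: DealStatus[];
}

// Shallow runtime check of the top-level fields only; it does not
// validate the individual pin or deal entries.
function isUploadStatus(value: unknown): value is UploadStatus {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.cid === "string" &&
    typeof v.dagSize === "number" &&
    Array.isArray(v.pins) &&
    Array.isArray(v.deals)
  );
}
```

A shallow check like this is enough to reject obviously malformed responses before reading `pins` or `deals`.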

The full shape of the schema that is available can be seen in the Fauna GraphQL explorer below, for comparison:
[Screenshot 2021-07-13 at 10 16 57: Fauna GraphQL explorer showing the full schema]

@olizilla
Contributor Author

@ribasushi is dataModelSelector the right name for it for a user facing api? Does it define the "path through the piece to extract your file"? dataSelector might be better or piecePath could pair nicely with pieceCid... but I'm not clear enough on how it's used to say. pls advise!

@olizilla
Contributor Author

also, i'd be reaching for createdAt in place of activation... or if activation is a term of art here, then perhaps activatedAt?

@olizilla
Contributor Author

olizilla commented Jul 13, 2021

As for the endpoint path, /user/uploads is already used to list and delete a user's content, so we could go with GET /user/uploads/:cid/status or similar, to be consistent with "you can use this to get info on CIDs you provided", as opposed to a more general-purpose CID-checking API that would let anyone check the status of any CID. I'm assuming we won't provide that here. Please confirm @jnthnvctr

@ribasushi

is dataModelSelector the right name for it for a user facing api?

As far as I am concerned - 100%, and this is what the corresponding lotus CLI/API part is called filecoin-project/lotus#6393 (comment). Caveat emptor: opinions differ on this somewhat

i'd be reaching for createdAt in place of activation... or if activation is a term of art here, then perhaps activatedAt?

I'd use activation yeah. In corner cases it can be weeks between a specific piece-cid<=>miner deal being "conceived" and it actually landing on chain.

@olizilla
Contributor Author

@ribasushi thanks! Should we include the batchCID here too? Is it the batch cid or the piece cid that one would need to apply the dataModelSelector to?

@ribasushi

Errrr wait... I didn't read this close enough :(

  • Batch should go from user-facing stuffs: I do not refer to anything by batch anymore, it just remained in my schema, and @alanshaw picked it up
  • The correct term in Filecoin-land is Data Cid or Root Cid, take your pick
    • Users absolutely need that, no retrieval is possible without it
    • It usually starts with baf.... ( it's a typical IPFS-ey thing )
  • The piece cid is not something that actually plays a role during retrieval, it's what you use if you want to see the status of something on chain yourself ( a stretch for vast majority of users )
    • The PieceCid encoding is always baga...
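
The prefix hints in the list above can be turned into a rough heuristic. This is only a sketch: the function name `guessCidKind` is hypothetical, it looks at the string prefix alone, and "unknown" means the heuristic did not match (not that the CID is invalid), since a Data Cid only usually starts with baf.

```typescript
type CidKind = "piece" | "data" | "unknown";

// Heuristic only: classifies a CID string by its textual prefix,
// following the hints above (PieceCid: "baga...", Data Cid: usually "baf...").
function guessCidKind(cid: string): CidKind {
  if (cid.startsWith("baga")) return "piece";
  if (cid.startsWith("baf")) return "data";
  return "unknown";
}
```

A real implementation would decode the CID and inspect its codec rather than trust the prefix.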

@olizilla
Contributor Author

And that Data Cid Is what we (internally) store as the Batch cid, right?

@mikeal

mikeal commented Jul 13, 2021

love it!

@alanshaw
Member

I think GET /user/uploads/:cid/status is good... if it's scoped to the user's uploads.

Apart from the missing batchCid/dataCid/rootCid the response looks great.

If this is authenticated then we can also include the upload name (and created date?), which might be helpful.

I originally thought we'd only have a public status API but I'm struggling to think of use cases where this is beneficial - help?

@jnthnvctr
Contributor

jnthnvctr commented Jul 13, 2021 via email

@ribasushi

And that Data Cid Is what we (internally) store as the Batch cid, right?

@olizilla correct!

@olizilla
Contributor Author

@jnthnvctr i'm inclined to start with "a user can ask about the status of CIDs they have stored". It would then not be much work to open that api to allow anyone to check the status of a CID.

@olizilla
Contributor Author

@ribasushi just for my context, when you said

Batch should go from user-facing stuffs: I do not refer to anything by batch anymore

...what do you refer to a batch as now?

@ribasushi

@olizilla I call it aggregate. It can simply be called deal as well. batch implies a certain set of queuing properties we do not actually have, which is why I want to move away from it.

@olizilla
Contributor Author

For the scope of this endpoint we could

  1. Require authentication (logged in user) and authorization (you are the user that uploaded the CID)
  2. Require authentication, but any logged in user can check any CID
  3. Require nothing. No auth, No masters. TOTAL FREEE3DOOM. anyone can ask us the status of any CID.

I assumed I was building 1. where a user could check on their stuff. But it is simpler to build the public version. The concern there would be folks hammering the api, and a nebulous concern about leaking info about what our users are storing. Filecoin deal info is public, but I don't think it's currently possible for anyone to determine which CIDs are contained within which deals. On the flip side, you'd also have to already have the content or be given the CID to ask about it, so this may not be a concern.
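
To make the difference between the three options concrete, here is a minimal sketch of the access check each one implies. Everything in it is hypothetical: the scope names, the `mayCheckStatus` function, and the idea that the caller can supply the list of user ids that uploaded the CID.

```typescript
// Hypothetical names for options 1, 2 and 3 above.
type Scope = "owner-only" | "any-user" | "public";

// requesterId is undefined when the request is not authenticated.
// uploaderIds is assumed to be the set of users who uploaded the CID.
function mayCheckStatus(
  scope: Scope,
  requesterId: string | undefined,
  uploaderIds: string[]
): boolean {
  if (scope === "owner-only") {
    // Option 1: authenticated and authorized.
    return requesterId !== undefined && uploaderIds.some((id) => id === requesterId);
  }
  if (scope === "any-user") {
    // Option 2: authenticated only.
    return requesterId !== undefined;
  }
  // Option 3: no auth at all.
  return true;
}
```

Option 3 is the simplest to build because it needs neither the requester's identity nor a lookup of who uploaded the CID.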

@ribasushi

Require nothing. No auth, No masters. TOTAL FREEE3DOOM. anyone can ask us the status of any CID.

The above is 100% the desired endstate, be it M1 or M2.

@ribasushi

ribasushi commented Jul 14, 2021

I don't think it's currently possible for anyone to determine which CIDs are contained within which deals

@olizilla every deal our aggregator makes carries a complete manifest of what is contained therein, as the first entry of the first directory, by design. This is to allow indexers and plain curious bystanders to attempt a retrieval of ... --datamodel-path-selector 'Links/0/Hash' and examine the few MiB of content without needing to retrieve the entire thing.

@jnthnvctr
Contributor

jnthnvctr commented Jul 14, 2021

Agreed with @ribasushi I think we want to do (3) - but this is a good reminder that in the docs (cc @terichadbourne ) we should be heavily emphasizing that these are public open networks and the onus is on users to be encrypting before sticking into these systems.

@olizilla re: your last comment - I think the last bit is right, for a lot of these concerns they require knowledge about a CID already (if they had the CID they could also just request the content out of IPFS), so it's not necessarily leaking additional information. In the end state of what a deal indexing should surface, you would be able to come with a CID to request deal info (which miners have X CID) - so I'd nudge us to mirror that (eventual) world.

I think that does mean that the info contained in the Status API should only be the public network info (so not exposing user defined names, creation time) - as even in the end state of a public deal index we wouldn't have those.

@olizilla
Contributor Author

@jnthnvctr GOOD NEWS. I read the vibe. You and everyone commented in favour of option 3, so that is what I have implemented in #82

olizilla added a commit that referenced this issue Jul 14, 2021
- add **Unauthenticated** endpoint for checking pin and deal status by CID.
- adds mock and fixture for testing.
- only show pins and deals that are queued or active.

**Example response**

`GET /status/testcid`
```json
{
  "cid": "testcid",
  "dagSize": 101,
  "pins": [{
    "peerId": "12D3KooWR1Js",
    "peerName": "who?",
    "region": "where?",
    "status": "Pinned"
  }],
  "deals": [{
    "dealId": 12345,
    "miner": "f99",
    "status": "Active",
    "activation": "<iso timestamp>",
    "pieceCid": "baga",
    "dataCid": "bafy",
    "dataModelSelector": "Links/0/Links"
  }]
}
```

Fixes #78 

License: (Apache-2.0 AND MIT)
Signed-off-by: Oli Evans <oli@tableflip.io>
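
The "only show pins and deals that are queued or active" rule from the commit message above could be expressed as a small filter. This is a sketch, not the code from the commit: the function name `visibleOnly` is hypothetical, and the status name "Queued" is an assumption (only "Active" and "Pinned" appear in the example response).

```typescript
interface HasStatus {
  status: string;
}

// Assumed status names; only "Active" is confirmed by the example response.
const VISIBLE_DEAL_STATUSES: ReadonlySet<string> = new Set(["Queued", "Active"]);

// Keeps only the entries whose status is in the visible set.
// The comparison is exact, so "active" and "Active" are different values.
function visibleOnly<T extends HasStatus>(items: T[], visible: ReadonlySet<string>): T[] {
  return items.filter((item) => visible.has(item.status));
}
```

The same helper works for pins if it is given a set of pin statuses instead.
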
@mikeal

mikeal commented Jul 14, 2021

Is it possible to get the timestamp of when the CID was first written to web3.storage and when each deal completed?

we want metrics on this anyway, so we want to make sure we capture all that data. if we have the data we might as well expose it here too.

@olizilla
Contributor Author

Yes! we have a created property on the content object in the schema. I shall add it to the response.

alanshaw pushed a commit that referenced this issue Jul 15, 2021
- add `created` denoting when we first saw this content
- add `pins.updated` for when the pin status last changed
- add `deals.created` and `deals.updated` to the filecoin deal status
- updated tests and docs
- fix 404 logic

fixes #78 (comment)

**Example status response**

```json
{
  "cid": "testcid",
  "created": "2021-07-14T19:27:14.934572Z",
  "dagSize": 101,
  "pins": [{
    "peerId": "12D3KooWR1Js",
    "peerName": "who?",
    "region": "where?",
    "status": "Pinned",
    "created": "2021-07-14T19:27:14.934572Z"
  }],
  "deals": [{
    "dealId": 12345,
    "miner": "f99",
    "status": "Active",
    "pieceCid": "baga",
    "dataCid": "bafy",
    "dataModelSelector": "Links/0/Links",
    "activation": "<iso timestamp>",
    "created": "2021-07-14T19:27:14.934572Z",
    "updated": "2021-07-14T19:27:14.934572Z"
  }]
}
```

License: (Apache-2.0 AND MIT)
Signed-off-by: Oli Evans <oli@tableflip.io>
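
With the timestamps in the response, the metric discussed earlier in the thread (how long after the content was first seen each deal record appeared) can be derived on the client. The sketch below is an assumption-laden illustration: the names `TimedStatus`, `parseIsoMs` and `msUntilDealCreated` are hypothetical, and it treats `deals.created` as the moment the deal record was created, which the thread does not spell out.

```typescript
interface TimedDeal {
  dealId: number;
  created: string; // ISO timestamp
}

interface TimedStatus {
  created: string; // ISO timestamp of when the content was first seen
  deals: TimedDeal[];
}

// The example timestamps carry microseconds; trim to milliseconds
// before handing the string to Date.parse.
function parseIsoMs(iso: string): number {
  return Date.parse(iso.replace(/(\.\d{3})\d+/, "$1"));
}

// Maps each dealId to the milliseconds elapsed between the content's
// `created` timestamp and that deal's `created` timestamp.
function msUntilDealCreated(status: TimedStatus): Map<number, number> {
  const start = parseIsoMs(status.created);
  const result = new Map<number, number>();
  for (const deal of status.deals) {
    result.set(deal.dealId, parseIsoMs(deal.created) - start);
  }
  return result;
}
```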
5 participants