Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Add a script for fetching and flattening published capdata #5880

Merged
merged 6 commits into from
Aug 8, 2022

Conversation

gibson042
Copy link
Member

...with minimal dependencies
Fixes #5878

Description

By liberal use of awk (which is required in any POSIX implementation), dependencies have been reduced to hopefully-noncontroversial curl and base64.

$ scripts/get-flattened-publication.sh https://xnet.rpc.agoric.net/ published.amm.pool0.metrics | \
  jq .
{
  "blockHeight": "513450",
  "centralAmount-brand-id": "board0425",
  "centralAmount-brand-allegedName": "RUN",
  "centralAmount-value": "202019830953",
  "liquidityTokens-brand-id": "board0188",
  "liquidityTokens-brand-allegedName": "IbcATOMLiquidity",
  "liquidityTokens-value": "202503724998",
  "secondaryAmount-brand-id": "board0639",
  "secondaryAmount-brand-allegedName": "IbcATOM",
  "secondaryAmount-value": "101498052"
}

It's also possible to use with e.g. agd, at the cost of block height information.

$ agd --node=https://xnet.rpc.agoric.net:443 query vstorage data published.amm.pool0.metrics -o json | \
  scripts/flatten_json_sequence.awk -v unwrap=value -v capdata=true | \
  jq .
{
  "centralAmount-brand-id": "board0425",
  "centralAmount-brand-allegedName": "RUN",
  "centralAmount-value": "202019830953",
  "liquidityTokens-brand-id": "board0188",
  "liquidityTokens-brand-allegedName": "IbcATOMLiquidity",
  "liquidityTokens-value": "202503724998",
  "secondaryAmount-brand-id": "board0639",
  "secondaryAmount-brand-allegedName": "IbcATOM",
  "secondaryAmount-value": "101498052"
}

Security Considerations

No one should object to using the code, since it is so narrowly-scoped in both power and effects. However, auditing it is likely more difficult than would be the case if it were written in a more common language such as JS or Go (which could be done).

I'm also not sure if this code belongs in agoric-sdk or another repository such as dckc/ag-admin.

Documentation Considerations

I have tried to include sufficient examples in source code comments.

Testing Considerations

The input data is by nature a moving target, so I think it makes sense to skip tests at least until we have adopted another publication approach (cf. #5681, #5508, #5366).

@gibson042 gibson042 added enhancement New feature or request Inter-protocol Overarching Inter Protocol labels Aug 2, 2022
@gibson042 gibson042 requested review from arirubinstein and dckc August 2, 2022 22:23
Copy link
Member

@dckc dckc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minimal dependencies

no kidding! An implementation of JSON in AWK. That's... hard core.

OTOH, as you say, reviewing it is kind of a pain. At least one of our customers is happy with python. I'd be OK reviewing python.

As is, I'll just say I have no objections but leave approval to @arirubinstein .

# }
# }
vars="$(
curl -sS "${URL_PREFIX%/}/abci_query?path=%22/custom/vstorage/data/$STORAGE_KEY%22" | \
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incidentally, this is probably a bug somewhere in Cosmos/Tendermint. Their documentation gives example values of path like /a/b/c, but making a query with ?path=/a/b/c yields a response like

{
  "jsonrpc": "2.0",
  "id": -1,
  "error": {
    "code": -32602,
    "message": "Invalid params",
    "data": "error converting http params to arguments: invalid character '/' looking for beginning of value"
  }
}

Something is apparently JSON-decoding the parameter value rather than using it directly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically https://github.com/tendermint/tendermint/blob/75d51e18f740c7cbfb7d8b4d49182ee6c7f41982/rpc/jsonrpc/server/http_uri_handler.go#L42-L51 , but I can't really get to the bottom of the reflection-heavy nonJSONStringToArg in httpParamsToArgs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

urlencode?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I get the same results with %22 vs. ". I even replaced an a with \u0061 (which is a JSON escape sequence) in the quote-wrapped version and it still worked.

@turadg
Copy link
Member

turadg commented Aug 3, 2022

By liberal use of awk (which is required in any POSIX implementation), dependencies have been reduced to hopefully-noncontroversial curl and base64.

Was the use of https://stedolan.github.io/jq/ controversial?

I see the ticket requirement,

The utility should be written such that adopting it does not risk a significant increase to potentially vulnerable surface area (e.g. by installing a complex executable).

I take that to mean that the executable we're giving them isn't complex. jq is included in most distributions so what we'd be providing would be much simpler using that than this awk script.

@gibson042
Copy link
Member Author

I don't think jq was inherently controversial, but there was definitely consternation about the complexity of requiring expressions like the following as seen on Slack from @dckc (and which would only get more complicated to incorporate block height and/or data from slots):

.value | fromjson | .body | fromjson | {
  centralAmount: .centralAmount | { brand: .brand.iface, value: .value.digits | tonumber | (. / 1000000 ) },
  secondaryAmount: .secondaryAmount | { brand: .brand.iface, value: .value.digits | tonumber | (. / 1000000 ) },
  liquidityTokens: .liquidityTokens | { brand: .brand.iface, value: .value.digits | tonumber | (. / 1000000 ) }
}

@turadg
Copy link
Member

turadg commented Aug 3, 2022

complexity of requiring expressions like the following

Maybe we're talking about different complexity. If that jq syntax were put into a flatten_json_sequence.jq then the end user complexity would be the same (modulo having to perhaps apt-get jq).

The stark difference is within flatten_json_sequence file. The the AWK is much harder to follow than any JQ and it could have errors in its JSON parsing.

# Decode the value into flat JSON and insert a member for the block height.
printf '%s' "${value_base64:-}" | \
base64 -d | \
awk -f "$FLATTENER" -v unwrap=value -v capdata=true | \
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the record, if jq is acceptable then this most complex use of awk (unwrap, expand capdata, and then flatten) can be replaced with the following:

Suggested change
awk -f "$FLATTENER" -v unwrap=value -v capdata=true | \
jq '[
# Decode JSON in `value`, capture `slots`, then decode JSON in `body`.
.value | fromjson | .slots as $slots | .body | fromjson |
walk(
if type=="object" and .["@qclass"]=="bigint" then
# Replace bigint capdata with the sequence of digits.
.digits
elif type=="object" and .["@qclass"]=="slot" then
# Replace slot reference capdata with {
# id: <value from global `slots`>,
# allegedName: <extracted from local `iface`>,
# }.
{
id: $slots[.index],
allegedName: .iface | sub("^Alleged: (?<name>.*) brand$"; "\(.name)"; "m")
}
else
.
end
) |
paths(scalars) as $path |
# Join deep object member names with "-".
{ key: $path | join("-"), value: getpath($path) }
] | from_entries'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd approve that ;-)

if jq is acceptable

I believe it is but I defer to @arirubinstein who I think represents the customer of this script

Copy link
Member Author

@gibson042 gibson042 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@arirubinstein @dckc @turadg Replaced the awk with jq. PTAAL.

.result.response.height? as $height |

# Decode `value` as base64, then decode that as JSON.
.result.response.value | @base64d | fromjson |
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Today I learned that jq can encode/decode base64: https://stedolan.github.io/jq/manual/#Formatstringsandescaping

Copy link
Member

@turadg turadg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The input data is by nature a moving target, so I think it makes sense to skip tests at least until we have adopted another publication approach

Fair. I trust you've tested it with the status quo and that you'll remember to retest when the source data changes.

not sure if this code belongs in agoric-sdk or another repository such as dckc/ag-admin.

Good question. I think it's fine to include here in this misc script directory. It will also help keep the version in sync with producer logic.


if [ ":${1:-}" = '--help' -o ":$URL_PREFIX" = : -o ":$STORAGE_KEY" = : -o $# -gt 2 ]; then
printf '%s' "$USAGE"
exit 64
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL!

fi

# Verify jq version.
jq --version | awk '
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm more familiar with version checking by grep and having the condition in shell code. nbd.

exit;
}
{
print "jq version out of range (must be >=1.6 <2.0): " $0 > "/dev/stderr";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just curious, what's required in 1.6?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@base64d, walk, and getpath.

Comment on lines +46 to +47
# Avoid the GET interface in case interpretation of `path` as JSON is ever fixed.
# https://github.com/tendermint/tendermint/issues/9164
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

helpful 👍

@gibson042 gibson042 added the automerge:rebase Automatically rebase updates, then merge label Aug 7, 2022
@turadg turadg force-pushed the gibson-5878-flatten-capdata branch from 66f93bc to 21284f3 Compare August 7, 2022 23:55
@gibson042 gibson042 removed the automerge:rebase Automatically rebase updates, then merge label Aug 8, 2022
@gibson042 gibson042 added the automerge:rebase Automatically rebase updates, then merge label Aug 8, 2022
@mergify mergify bot merged commit 914fc4c into master Aug 8, 2022
@mergify mergify bot deleted the gibson-5878-flatten-capdata branch August 8, 2022 01:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
automerge:rebase Automatically rebase updates, then merge enhancement New feature or request Inter-protocol Overarching Inter Protocol
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Translate capdata into flat JSON
4 participants