-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
adds filteraddrs param to searchrawtransactions API #516
adds filteraddrs param to searchrawtransactions API #516
Conversation
Thanks for the pull request. I haven't reviewed the code yet, but I think the idea is sound. I am a little concerned about the parameter creep, as you pointed out. I believe it has had 3 (and this would be the 4th) new parameter added in the last few months. The way optional parameters are handled is one of the things I really dislike about JSON-RPC 1.0, but unfortunately it's required for compatibility purposes. I suspect adding an options struct could help, although that would also have the downside of making it slightly harder to specify via the command-line such as with |
Yeah, so I'm thinking that a simple and backward-compatible way forward would be to accept this pull request basically as-is, then create a new API "searchrawtransactionsv2" or whatever you want to call it that accepts only two params: addr and options. Any new options would go into the new API, and the old one could be frozen/deprecated. Ideally addr would be changed to addr_list, in order to more efficiently retrieve transactions for multiple addresses at once. ( Bitpay's Insight API already supports this. ) With this change in mind, perhaps the new API could be called searchtransactionsforaddresses or something like that and offer more selectivity in which fields are returned. up to you... I'm just brainstorming. :-) |
For an update to this, I haven't forgot about it, just been rather busy with the PR barrage as you've probably noted. |
thx for the update. Yeah, I'm not in a real big rush about this. Though my tool/app does use the filter and it would be nice if I could announce it on reddit one of these days and have it work with official btcd instead of a forked copy. So that was my primary incentive for finishing up the pull request. |
if len(filterAddrs) > 0 { | ||
passesFilter = false | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps
passesFilter := len(filterAddrs) == 0
I am OK with this diff. My comments above are merely style. |
@dajohi thanks for the review. I made the change you suggested. The multiline if() bothered me too. I'm still getting used to Go and wanted to use the ? : operator which doesn't seem to exist. Your suggestion is a nice shortcut. |
Reviewed 3 of 4 files at r1, 1 of 1 files at r2. btcjson/chainsvrcmds.go, line 554 [r2] (raw file): rpcserver.go, line 661 [r2] (raw file): vinList := make([]btcjson.VinPrevOut, 0, len(mtx.TxIn)) Notice the addition of the extra 0 for the number of entries while the preallocated space in the backing array is the number of inputs. The rest of the code then can stay the same. rpcserver.go, line 667 [r2] (raw file): // Don't include tx if filterAddrs is not empty because coinbase
// has no address and so would never match a non-empty filter.
if len(filterAddrs) != 0 {
return nil
}
...
return vinList rpcserver.go, line 689 [r2] (raw file): rpcserver.go, line 730 [r2] (raw file): rpcserver.go, line 752 [r2] (raw file): rpcserver.go, line 754 [r2] (raw file): rpcserver.go, line 771 [r2] (raw file): encodedAddrs := make([]string, len(addrs))
for j, addr := range addrs {
encodedAddrs[j] = addr.EncodeAddress()
}
if len(filterAddrMap) > 0 {
for _, addr := range encodedAddrs {
if _, exists := filterAddrMap[addr]; !exists {
continue nextTxOut
}
}
}
var vout btcjson.Vout
vout.N = uint32(i)
...
vout.ScriptPubKey.Addresses = encodedAddrs
... rpcserver.go, line 791 [r2] (raw file): rpcserver.go, line 805 [r2] (raw file): rpcserver.go, line 810 [r2] (raw file): Comments from the review on Reviewable.io |
Review status: all files reviewed at latest revision, 1 unresolved discussion. btcjson/chainsvrcmds.go, line 554 [r2] (raw file): I will attempt the change. Comments from the review on Reviewable.io |
Review status: all files reviewed at latest revision, 1 unresolved discussion. btcjson/chainsvrcmds.go, line 554 [r2] (raw file): From the API caller's perspective, that seems messier and uglier than before:, which was simply: [addr1, addr2, addr3]. It could be that we keep the external interface an array but then copy it into a map internally. of course that adds some overhead. thoughts? Comments from the review on Reviewable.io |
return vinList | ||
} | ||
|
||
// Lookup all of the referenced transactions needed to populate the | ||
// previous output information if requested. | ||
var txStore blockchain.TxStore | ||
if vinExtra != 0 { | ||
if vinExtra != 0 || len(filterAddrs) > 0 { | ||
tx := btcutil.NewTx(mtx) | ||
txStoreNew, err := s.server.txMemPool.fetchInputTransactions(tx, true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be FetchInputTransactions, or does the mempool need locking?
Review status: all files reviewed at latest revision, 2 unresolved discussions. btcjson/chainsvrcmds.go, line 554 [r2] (raw file): rpcserver.go, line 682 [r2] (raw file): Comments from the review on Reviewable.io |
Ok, I've made all the requested changes and I think it is ready for another (hopefully final) review. I changed the internal filterAddrs var from []string to map[string]bool. This is copied from the JSON input, which is still []string. I also refactored createVinListPrevOut a bit so it is does not create the btcjson.VinPrevOut entry until after the filter passes. This matches a small refactor in createVoutList requested by @davecgh. I tested the latest code once again with: each case appears to work correctly and my client app that uses the data displays transaction list with correct values for inputs and outputs. |
Review status: 3 of 4 files reviewed at latest revision, 1 unresolved discussion. rpcserver.go, line 682 [r2] (raw file): Comments from the review on Reviewable.io |
Thanks for the updates and your work on this. I've noted a few minor efficiency-related things. Reviewed 1 of 1 files at r4. rpcserver.go, line 682 [r2] (raw file): rpcserver.go, line 691 [r4] (raw file): rpcserver.go, line 718 [r4] (raw file): nextTxIn:
for _, txIn := range mtx.TxIn {
...
if _, exists := filterAddrMap[encodedAddrs[j]]; !exists {
continue nextTxIn
} I suggest this since there isn't any reason to continue encoding all the other addresses when it's just going to continue and skip them anyways. rpcserver.go, line 756 [r4] (raw file): rpcserver.go, line 775 [r4] (raw file): nextTxOut:
for _, txOut := range mtx.TxOut {
...
if _, exists := filterAddrMap[encodedAddrs[j]]; !exists {
continue nextTxOut
} rpcserver.go, line 3180 [r4] (raw file): Comments from the review on Reviewable.io |
Review status: all files reviewed at latest revision, 3 unresolved discussions. rpcserver.go, line 718 [r4] (raw file): Now the common/usual case for Vin is that len(encodedAddrs) == 1. So they get the same result. But if len(encodedAddrs) > 1 then I believe my logic is more correct, as the goal is to retrieve all inputs and outputs that contain the filter addr(s). Am I wrong? also, I'm not sure exactly how/when len(encodedAddrs) > 1. multisig tx perhaps? Comments from the review on Reviewable.io |
I changed the type of filterAddrs to map[string]struct{}. I also renamed filterAddrs to filterAddrMap. I did not yet make the change to remove the passesFilter var and use a continue/label because I believe that changes the logic for cases when len(filterAddrMap) > 1. See my previous comment. |
rpcserver.go, line 718 [r4] (raw file): Comments from the review on Reviewable.io |
Thanks. You're correct about the code the way it is. Just need the reviewable comments to be acknowledged by you and @dajohi and this is good to go. It does need to be rebased and squashed to the latest master however. See Managing a btcd Pull Request for Code Contibutions if you need instructions. Review status: all files reviewed at latest revision, 2 unresolved discussions. Comments from the review on Reviewable.io |
d5dee01
to
1d178c8
Compare
ackknowledged. rebased/squashed. pushed. paging @dajohi |
Thanks. Though, it doesn't look like it was rebased properly. There are 15 commits instead of one that is on top of b691a22. EDIT: Looks like it's on top of your fork's master instead of upstream master. EDIT2: I'm going to guess you aren't setup the same way as the gist I linked, so your fork's master branch doesn't match the upstream branch. The following steps should work to get this PR rebased properly: git remote add tempupstream https://github.com/btcsuite/btcd.git
git fetch tempupstream
git checkout filteraddrs_pull_request_branch
git rebase --onto tempupstream HEAD^
git push -f
git remote remove tempupstream However, I would suggest setting it up like the gist I linked above since it generally makes it easier to deal with pull requests and doesn't require changing paths to match your fork's path. If you need any help there, feel free to pop on IRC and I can help you modify the repo as needed. |
1d178c8
to
c7eaee6
Compare
git is crazy, but I think I beat it into submission. :-) It's a single commit now. Review status: 0 of 5 files reviewed at latest revision, 1 unresolved discussion. Comments from the review on Reviewable.io |
Yep. Looks good. Once @dajohi acks the final discussion, I'll get it merged. Reviewed 5 of 5 files at r6. Comments from the review on Reviewable.io |
so does @dan-da get to do a btcsuite/btcrpcclient PR too? :) |
abstract
The new filteraddrs parameter enables the caller to pass a list of addresses to the searchrawtransactions API. Only inputs and outputs that match addresses in the list will be returned.
Additionally, if at least 1 filter address is passed then the (loooong) Hex attribute of the transaction will be empty. ( though key is still present. )
motivation
To increase efficiency of the API particularly in terms of bandwidth usage, processing overhead for the caller, and overall roundtrip time. The API is presently returning datasets that are 20+ MB for some addresses when the application may need only a tiny fraction of that.
example
My application displays a transaction history for one or more addresses. To do this, it calls searchrawtransaction once per address. It only needs to see the vin/vout entries that pay to or from that address. Below we call this address passed as the first param to searchrawtransactions the "target address".
For bitcoin addresses with many inputs and outputs such as those involved in mining pool payouts, the bulk of the inputs/outputs are NOT related to the target address. All this extra data is unnecessary and wasteful.
Additionally, my application does not use the Hex attribute of the transaction, but a large amount of unused data is returned with it.
Using the new filteraddrs param, my application simply passes the target address as the only element in the filteraddrs array, and the majority of unused data disappears.
This implementation makes it possible to filter vin/vout by an address other than the target address. Or by multiple addresses. But likely the most common use-case will be to filter by target address only.
how much savings?
Obviously the savings will vary depending on the address. But the savings can be very significant. For one address that I know was receiving mining pool payouts for a couple months, the total size of returned JSON is:
without filteraddrs: 22938656 bytes, 6.911 seconds
with filteraddrs: 123229 bytes, 2.087 seconds
So 22.9 MB vs 123KB. That's 186 times smaller. Also a 3x roundtrip speedup on localhost. The speedup would of course be larger if making the call over the internet given the size of data being transferred.
Also, many addresses exist with many times more transactions and vin/vout. I expect there are addresses that would return 100+ MB result sets.
testing performed
further thoughts / notes
bool flag vs array/list
If we assume that the target address is going to be the desired filter 99.99% of the time then it is simpler just to use a boolean "filterAddr=true" than a list. In fact, my initial implementation did this. However, for public use I imagine there will likely be other use cases, and the performance is about the same, so I decided to expose a filterlist to the caller.
empty filter list behavior
In this implementation the behavior is the same if filteraddrs arg is omitted or if an empty array is passed. In either case, all vin/vout and also the tx.Hex attribute are returned as per legacy behavior.
It is possible that a caller might wish to omit ALL vin/vout and might try passing an empty array to achieve this. But that won't work. A workaround for this person would be to pass a single impossible address. That will never match anything, so all will be filtered out.
Alternatively, it would be possible to change the behavior so that an empty array filters out all, but nil/null filters nothing.
Tx.Hex attribute
This patch presently includes the hex attribute but with an empty value when filterAddrs contains 1 or more values. We could hide/remove it entirely via the json omitempty flag. Alternatively, this change could be de-bundled from the patch and be part of a more comprehensive "user chooses fields" feature. see below.
further size savings
Additional savings are possible. I tried to make this patch conservative by removing only the worst offenders and thereby getting the most bang for the buck. A possible future approach could be to allow the caller to specify which fields to receive/omit for the tx, vins, vouts via a separate parameter. This would be similar to a SQL select list. I'm not proposing to work on this, as the present patch is good enough for my needs. Just throwing it out there as food for thought.
parameter creep
searchrawtransactions is such a useful function it would not surprise me if future parameters get added, and each one changes the signature and adds new default. So an idea for the dev team to consider would be creating a new param called "options" which is a struct. So adding new params is as simple as adding a new key to this struct, and callers do not have to specify all the defaults for previous params.