Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adds filteraddrs param to searchrawtransactions API #516

Merged

Conversation

dan-da
Copy link
Contributor

@dan-da dan-da commented Oct 17, 2015

abstract

The new filteraddrs parameter enables the caller to pass a list of addresses to the searchrawtransactions API. Only inputs and outputs that match addresses in the list will be returned.

Additionally, if at least 1 filter address is passed then the (loooong) Hex attribute of the transaction will be empty. ( though key is still present. )

motivation

To increase efficiency of the API particularly in terms of bandwidth usage, processing overhead for the caller, and overall roundtrip time. The API is presently returning datasets that are 20+ MB for some addresses when the application may need only a tiny fraction of that.

example

My application displays a transaction history for one or more addresses. To do this, it calls searchrawtransaction once per address. It only needs to see the vin/vout entries that pay to or from that address. Below we call this address passed as the first param to searchrawtransactions the "target address".

For bitcoin addresses with many inputs and outputs such as those involved in mining pool payouts, the bulk of the inputs/outputs are NOT related to the target address. All this extra data is unnecessary and wasteful.

Additionally, my application does not use the Hex attribute of the transaction, but a large amount of unused data is returned with it.

Using the new filteraddrs param, my application simply passes the target address as the only element in the filteraddrs array, and the majority of unused data disappears.

This implementation makes it possible to filter vin/vout by an address other than the target address. Or by multiple addresses. But likely the most common use-case will be to filter by target address only.

how much savings?

Obviously the savings will vary depending on the address. But the savings can be very significant. For one address that I know was receiving mining pool payouts for a couple months, the total size of returned JSON is:
without filteraddrs: 22938656 bytes, 6.911 seconds
with filteraddrs: 123229 bytes, 2.087 seconds

So 22.9 MB vs 123KB. That's 186 times smaller. Also a 3x roundtrip speedup on localhost. The speedup would of course be larger if making the call over the internet given the size of data being transferred.

Also, many addresses exist with many times more transactions and vin/vout. I expect there are addresses that would return 100+ MB result sets.

testing performed

  1. Updated test cases for searchrawtransactions and added 1 new test case for the filteraddrs param.
  2. passes goclean.sh tests.
  3. Verified manually that the API works as expected when called (a) without filteraddrs param, (b) with filteraddrs set to nil, (c) with filteraddrs set to array containing target address only, (d) with filteraddrs set to array with multiple addresses from the result set.
  4. Used in my application and verified that the transaction history appears the same with or without the filterAddrs param in use.

further thoughts / notes

bool flag vs array/list

If we assume that the target address is going to be the desired filter 99.99% of the time then it is simpler just to use a boolean "filterAddr=true" than a list. In fact, my initial implementation did this. However, for public use I imagine there will likely be other use cases, and the performance is about the same, so I decided to expose a filterlist to the caller.

empty filter list behavior

In this implementation the behavior is the same if filteraddrs arg is omitted or if an empty array is passed. In either case, all vin/vout and also the tx.Hex attribute are returned as per legacy behavior.

It is possible that a caller might wish to omit ALL vin/vout and might try passing an empty array to achieve this. But that won't work. A workaround for this person would be to pass a single impossible address. That will never match anything, so all will be filtered out.

Alternatively, it would be possible to change the behavior so that an empty array filters out all, but nil/null filters nothing.

Tx.Hex attribute

This patch presently includes the hex attribute but with an empty value when filterAddrs contains 1 or more values. We could hide/remove it entirely via the json omitempty flag. Alternatively, this change could be de-bundled from the patch and be part of a more comprehensive "user chooses fields" feature. see below.

further size savings

Additional savings are possible. I tried to make this patch conservative by removing only the worst offenders and thereby getting the most bang for the buck. A possible future approach could be to allow the caller to specify which fields to receive/omit for the tx, vins, vouts via a separate parameter. This would be similar to a SQL select list. I'm not proposing to work on this, as the present patch is good enough for my needs. Just throwing it out there as food for thought.

parameter creep

searchrawtransactions is such a useful function it would not surprise me if future parameters get added, and each one changes the signature and adds new default. So an idea for the dev team to consider would be creating a new param called "options" which is a struct. So adding new params is as simple as adding a new key to this struct, and callers do not have to specify all the defaults for previous params.

Review on Reviewable

@davecgh
Copy link
Member

davecgh commented Oct 19, 2015

Thanks for the pull request. I haven't reviewed the code yet, but I think the idea is sound.

I am a little concerned about the parameter creep, as you pointed out. I believe it has had 3 (and this would be the 4th) new parameter added in the last few months. The way optional parameters are handled is one of the things I really dislike about JSON-RPC 1.0, but unfortunately it's required for compatibility purposes. I suspect adding an options struct could help, although that would also have the downside of making it slightly harder to specify via the command-line such as with btcctl. The said, it's probably easier than having to specify defaults for all of the other optional parameter you probably don't really care about due to the positional requirements.

@dan-da
Copy link
Contributor Author

dan-da commented Oct 19, 2015

Yeah, so I'm thinking that a simple and backward-compatible way forward would be to accept this pull request basically as-is, then create a new API "searchrawtransactionsv2" or whatever you want to call it that accepts only two params: addr and options. Any new options would go into the new API, and the old one could be frozen/deprecated.

Ideally addr would be changed to addr_list, in order to more efficiently retrieve transactions for multiple addresses at once. ( Bitpay's Insight API already supports this. ) With this change in mind, perhaps the new API could be called searchtransactionsforaddresses or something like that and offer more selectivity in which fields are returned.

up to you... I'm just brainstorming. :-)

@davecgh
Copy link
Member

davecgh commented Oct 23, 2015

For an update to this, I haven't forgot about it, just been rather busy with the PR barrage as you've probably noted.

@dan-da
Copy link
Contributor Author

dan-da commented Oct 23, 2015

thx for the update. Yeah, I'm not in a real big rush about this. Though my tool/app does use the filter and it would be nice if I could announce it on reddit one of these days and have it work with official btcd instead of a forked copy. So that was my primary incentive for finishing up the pull request.

if len(filterAddrs) > 0 {
passesFilter = false
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps

passesFilter := len(filterAddrs) == 0

@dajohi
Copy link
Member

dajohi commented Oct 28, 2015

I am OK with this diff. My comments above are merely style.

@dan-da
Copy link
Contributor Author

dan-da commented Oct 28, 2015

@dajohi thanks for the review. I made the change you suggested. The multiline if() bothered me too. I'm still getting used to Go and wanted to use the ? : operator which doesn't seem to exist. Your suggestion is a nice shortcut.

@davecgh
Copy link
Member

davecgh commented Nov 10, 2015

Reviewed 3 of 4 files at r1, 1 of 1 files at r2.
Review status: all files reviewed at latest revision, 13 unresolved discussions.


btcjson/chainsvrcmds.go, line 554 [r2] (raw file):
What do you think about making this a *map[string]struct{} instead of a *[]string? My thinking is it would automatically prevent duplicate addresses, something the current approach doesn't deal with, and it would also generally make checking if an address was specified faster versus the current need for the stringInSlice function which searches through the slice each time.


rpcserver.go, line 661 [r2] (raw file):
This should preallocate the entries in the backing array even if only some of them end up being added. The problem with the current approach is you force several reallocates and memory copies since the original backing array is empty. It is hard on the garbage collector and less efficient.

vinList := make([]btcjson.VinPrevOut, 0, len(mtx.TxIn))

Notice the addition of the extra 0 for the number of entries while the preallocated space in the backing array is the number of inputs. The rest of the code then can stay the same.


rpcserver.go, line 667 [r2] (raw file):
Please reverse this so the code below it is unindented and the return is instant. That is:

// Don't include tx if filterAddrs is not empty because coinbase
// has no address and so would never match a non-empty filter.
if len(filterAddrs) != 0 {
    return nil
}

...
return vinList

rpcserver.go, line 689 [r2] (raw file):
Please move this down to just before it's used. This is exactly where the old vinEntry := ... was.


rpcserver.go, line 730 [r2] (raw file):
Please add the condition after the && on the next line. We aim to keep the code at 80 cols. A few chars is acceptable here and there, but this one is pushing 106.


rpcserver.go, line 752 [r2] (raw file):
Same here regarding the preallocation.


rpcserver.go, line 754 [r2] (raw file):
Please move it down to specify it just before it's used for the first time to be consistent.


rpcserver.go, line 771 [r2] (raw file):
I'd suggest moving most of the vout field setting code below this point and simply check if it passes the filter here. There is no point in doing all of the work to build a vout only to ignore it later. Since EncodeAddress is relatively expensive, I'd definitely build the list of encoded address strings here too since they're needed to check the filter. For example:

encodedAddrs := make([]string, len(addrs))
for j, addr := range addrs {
    encodedAddrs[j] = addr.EncodeAddress()
}
if len(filterAddrMap) > 0 {
    for _, addr := range encodedAddrs {
        if _, exists := filterAddrMap[addr]; !exists {
            continue nextTxOut
        }
    }
}

var vout btcjson.Vout
vout.N = uint32(i)
...
vout.ScriptPubKey.Addresses = encodedAddrs
...

rpcserver.go, line 791 [r2] (raw file):
This won't be needed if following my advice above.


rpcserver.go, line 805 [r2] (raw file):
Please move this just before the if for consistency.


rpcserver.go, line 810 [r2] (raw file):
Extra blank line that shouldn't be here.


Comments from the review on Reviewable.io

@dan-da
Copy link
Contributor Author

dan-da commented Nov 12, 2015

Review status: all files reviewed at latest revision, 1 unresolved discussion.


btcjson/chainsvrcmds.go, line 554 [r2] (raw file):
I suppose that it is worth using a map for the (probably unusual) case that a large filter list is received. For small lists (common case) performance is probably similar or may actually be better for the array loop than a hash lookup.

I will attempt the change.


Comments from the review on Reviewable.io

@dan-da
Copy link
Contributor Author

dan-da commented Nov 12, 2015

Review status: all files reviewed at latest revision, 1 unresolved discussion.


btcjson/chainsvrcmds.go, line 554 [r2] (raw file):
Hmm, if we change it here, then doesn't that force the caller to send the data as a map, eg: { addr1 : true, addr2: true, addr3: true }, etc?

From the API caller's perspective, that seems messier and uglier than before:, which was simply: [addr1, addr2, addr3].

It could be that we keep the external interface an array but then copy it into a map internally. of course that adds some overhead.

thoughts?


Comments from the review on Reviewable.io

return vinList
}

// Lookup all of the referenced transactions needed to populate the
// previous output information if requested.
var txStore blockchain.TxStore
if vinExtra != 0 {
if vinExtra != 0 || len(filterAddrs) > 0 {
tx := btcutil.NewTx(mtx)
txStoreNew, err := s.server.txMemPool.fetchInputTransactions(tx, true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be FetchInputTransactions, or does the mempool need locking?

@davecgh
Copy link
Member

davecgh commented Nov 12, 2015

Review status: all files reviewed at latest revision, 2 unresolved discussions.


btcjson/chainsvrcmds.go, line 554 [r2] (raw file):
Yes it would force the caller to specify it that way. I'm ok with building a map out of it internally. That would provided the desired duplicate filtering functionality and scale well with large numbers of addresses.


rpcserver.go, line 682 [r2] (raw file):
Yes. It should be.


Comments from the review on Reviewable.io

@dan-da
Copy link
Contributor Author

dan-da commented Nov 13, 2015

Ok, I've made all the requested changes and I think it is ready for another (hopefully final) review.

I changed the internal filterAddrs var from []string to map[string]bool. This is copied from the JSON input, which is still []string.

I also refactored createVinListPrevOut a bit so it is does not create the btcjson.VinPrevOut entry until after the filter passes. This matches a small refactor in createVoutList requested by @davecgh.

I tested the latest code once again with:
vinExtra=0, filterParams omitted
vinExtra=1, filterParams omitted
filterParams empty
filterParams non-empty

each case appears to work correctly and my client app that uses the data displays transaction list with correct values for inputs and outputs.

@dan-da
Copy link
Contributor Author

dan-da commented Nov 13, 2015

Review status: 3 of 4 files reviewed at latest revision, 1 unresolved discussion.


rpcserver.go, line 682 [r2] (raw file):
I'm not clear on this. Do I need to change something, or not? Anyway, this call was pre-existing to this pull request.


Comments from the review on Reviewable.io

@davecgh
Copy link
Member

davecgh commented Nov 14, 2015

Thanks for the updates and your work on this. I've noted a few minor efficiency-related things.


Reviewed 1 of 1 files at r4.
Review status: all files reviewed at latest revision, 6 unresolved discussions.


rpcserver.go, line 682 [r2] (raw file):
We can do it in a separate PR as I realize it's existing code. Don't worry about it in this PR.


rpcserver.go, line 691 [r4] (raw file):
With the other changes that cause a continue as soon as possible, this variable should no longer be needed.


rpcserver.go, line 718 [r4] (raw file):
Once it passes the filter, the continue can happen immediately if you add a label to the outer for loop:

nextTxIn:
    for _, txIn := range mtx.TxIn {
...
        if _, exists := filterAddrMap[encodedAddrs[j]]; !exists {
            continue nextTxIn
        }

I suggest this since there isn't any reason to continue encoding all the other addresses when it's just going to continue and skip them anyways.


rpcserver.go, line 756 [r4] (raw file):
Nor this one.


rpcserver.go, line 775 [r4] (raw file):
The same here as above in the input section:

nextTxOut:
    for _, txOut := range mtx.TxOut {
...
        if _, exists := filterAddrMap[encodedAddrs[j]]; !exists {
            continue nextTxOut
        }

rpcserver.go, line 3180 [r4] (raw file):
Since this is only acting a set, please use make(map[string]struct[}) and filterAddrs[addr] = struct{}{}. This approach causes no memory to be allocated for the keys which aren't used. The rest of the syntax that checks for key existence (if _, exists := filterAddrs[addr]; exists { ... }) stays the same.


Comments from the review on Reviewable.io

@dan-da
Copy link
Contributor Author

dan-da commented Nov 14, 2015

Review status: all files reviewed at latest revision, 3 unresolved discussions.


rpcserver.go, line 718 [r4] (raw file):
If I'm not mistaken, this changes the logic. My code was saying: "if any address in encodedAddrs matches the filter, then pass filter". This code (if I understand it correctly) is saying: "if any address in encodedAddrs does not match filter, then fail filter and go to next vin". ( because the if/continue take place inside the for loop that iterates over encodedAddrs. )

Now the common/usual case for Vin is that len(encodedAddrs) == 1. So they get the same result.

But if len(encodedAddrs) > 1 then I believe my logic is more correct, as the goal is to retrieve all inputs and outputs that contain the filter addr(s).

Am I wrong?

also, I'm not sure exactly how/when len(encodedAddrs) > 1. multisig tx perhaps?


Comments from the review on Reviewable.io

@dan-da
Copy link
Contributor Author

dan-da commented Nov 15, 2015

I changed the type of filterAddrs to map[string]struct{}.

I also renamed filterAddrs to filterAddrMap.

I did not yet make the change to remove the passesFilter var and use a continue/label because I believe that changes the logic for cases when len(filterAddrMap) > 1. See my previous comment.

@davecgh
Copy link
Member

davecgh commented Nov 15, 2015

rpcserver.go, line 718 [r4] (raw file):
Nevermind, you also want it to pass when there are no addresses specified. It's correct the way it is.


Comments from the review on Reviewable.io

@davecgh
Copy link
Member

davecgh commented Nov 15, 2015

Thanks. You're correct about the code the way it is. Just need the reviewable comments to be acknowledged by you and @dajohi and this is good to go.

It does need to be rebased and squashed to the latest master however.

See Managing a btcd Pull Request for Code Contibutions if you need instructions.


Review status: all files reviewed at latest revision, 2 unresolved discussions.


Comments from the review on Reviewable.io

@dan-da dan-da force-pushed the filteraddrs_pull_request_branch branch from d5dee01 to 1d178c8 Compare November 15, 2015 22:17
@dan-da
Copy link
Contributor Author

dan-da commented Nov 15, 2015

ackknowledged. rebased/squashed. pushed.

paging @dajohi

@davecgh
Copy link
Member

davecgh commented Nov 15, 2015

Thanks. Though, it doesn't look like it was rebased properly. There are 15 commits instead of one that is on top of b691a22.

EDIT: Looks like it's on top of your fork's master instead of upstream master.

EDIT2: I'm going to guess you aren't setup the same way as the gist I linked, so your fork's master branch doesn't match the upstream branch. The following steps should work to get this PR rebased properly:

git remote add tempupstream https://github.com/btcsuite/btcd.git
git fetch tempupstream
git checkout filteraddrs_pull_request_branch
git rebase --onto tempupstream HEAD^
git push -f
git remote remove tempupstream

However, I would suggest setting it up like the gist I linked above since it generally makes it easier to deal with pull requests and doesn't require changing paths to match your fork's path. If you need any help there, feel free to pop on IRC and I can help you modify the repo as needed.

@dan-da dan-da force-pushed the filteraddrs_pull_request_branch branch from 1d178c8 to c7eaee6 Compare November 15, 2015 23:32
@dan-da
Copy link
Contributor Author

dan-da commented Nov 15, 2015

git is crazy, but I think I beat it into submission. :-) It's a single commit now.


Review status: 0 of 5 files reviewed at latest revision, 1 unresolved discussion.


Comments from the review on Reviewable.io

@davecgh
Copy link
Member

davecgh commented Nov 15, 2015

Yep. Looks good. Once @dajohi acks the final discussion, I'll get it merged.


Reviewed 5 of 5 files at r6.
Review status: all files reviewed at latest revision, 1 unresolved discussion.


Comments from the review on Reviewable.io

@conformal-deploy conformal-deploy merged commit c7eaee6 into btcsuite:master Nov 16, 2015
@dajohi
Copy link
Member

dajohi commented Nov 16, 2015

so does @dan-da get to do a btcsuite/btcrpcclient PR too? :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants