Transactions being returned out of order #75

Open
Tracked by #31
0xarvin opened this issue Jul 29, 2022 · 9 comments

0xarvin commented Jul 29, 2022

Via BDK, we're pulling transactions for a wallet through a signet Electrum instance. Recently, we've noticed that the order in which we call batch_transaction_get_raw isn't being respected by Electrum, which results in the transactions returned by the client also being out of order, as the reader takes the top result if it's unable to grab hold of the lock on the reader.

This doesn't affect production calls in BDK, only tests, as there's a debug assert here. I was wondering if there's anything I'm missing or if this is a known issue.

Wallet Descriptor for example:
wsh(sortedmulti(2,[e5cd860e/84'/1'/0']tpubDFVZtuMvX7n6DCWbNAgq361KiJdVFBy4D1sZmdafF1zDNW6dv79MwbK6JBRxGV42BS4STayPsG7Prn3QT7LQwzfaBhE8YhgCs4VzhkYVWTg/0/*,[73ee041b/84'/1'/0']tpubDEPCQp4KHW3sS5o18QbknuC6dtQJPFgWtS6e8wm6ewnc2B1vucCnKf9uf9gCddHeJoK5d5hgJv5ji5Q8VnpT4h9ydsi9FE4t8F2GoPrFPYa/0/*,[08a73043/84'/1'/0']tpubDEZ565bfePfqSVkSfrLaR7g5nruLDWxJFKkJvKWUpMEQBsS9p2f5um5uTwgmKWuHbyoiythmEvDRSgdEoYBgLXfosBsjMBTp1jev9wfi2q9/0/*

Network: Signet
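
For context, the call pattern in question looks roughly like this (a sketch only; the helper name is made up and this is not BDK's actual code):

use bitcoin::Txid;
use electrum_client::{Client, ElectrumApi};

// The caller implicitly assumes that the i-th raw transaction in the result
// corresponds to txids[i]; BDK's debug assert fires when that pairing breaks.
fn fetch_raw_in_order(client: &Client, txids: &[Txid]) -> Result<Vec<Vec<u8>>, electrum_client::Error> {
    client.batch_transaction_get_raw(txids.iter())
}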

afilini (Member) commented Aug 1, 2022

I don't think this was a known issue, and I'm guessing this may be related to the specific server you're running, because I've never seen this happen with electrs.

Looking at the code, it's also very weird that this is happening: we do have code to re-order the replies from the server for batch requests:

for req_id in missing_responses.iter() {
    match self.recv(&receiver, *req_id) {
        // Store each reply under its request id so the replies can be
        // returned in request order at the end.
        Ok(mut resp) => answers.insert(req_id, resp["result"].take()),
        Err(e) => {
            // In case of error our sender could still be left in the map, depending on where
            // the error happened. Just in case, try to remove it here
            warn!("got error for req_id {}: {:?}", req_id, e);
            warn!("removing all waiting req of this batch");
            let mut guard = self.waiting_map.lock()?;
            for req_id in missing_responses.iter() {
                guard.remove(req_id);
            }
            return Err(e);
        }
    };
}
Ok(answers.into_iter().map(|(_, r)| r).collect())

Either there's an issue there, or the server is replying with the wrong request id. Can you try running your tests with trace log level for this crate and then attach the log here? From that log we should be able to see whether the server replies are correct.
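
For reference, one way to capture such a trace log (a sketch; it assumes the env_logger crate as the log backend, which is just one option):

fn main() {
    // Honour RUST_LOG, e.g. run the reproducer with:
    //   RUST_LOG=electrum_client=trace cargo run
    env_logger::init();

    // ... perform the batch calls that reproduce the problem here ...
}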

0xarvin (Author) commented Aug 1, 2022

That's what I'm starting to think, but here's the trace. I replaced the electrum server but let me know if I should include that.

https://gist.github.com/0xarvin/9baee245a45e6e409d9be65e4a447101

notmandatory (Member) commented:
Which electrum server software are you using where you see this issue?

0xarvin (Author) commented Nov 8, 2022

@notmandatory We spun up a set of signet nodes:

url: ssl://electrum.nodes.wallet.build:51002
electrum server: fulcrum

conduition (Contributor) commented:
+1 for this issue. It happens for many servers, not just one special guy. Most of the servers I tested exhibit this bug, including:

ssl://fulcrum.grey.pw:51002
ssl://electrum.jochen-hoenicke.de:50099
tcp://v7gtzf7nua6hdmb2wtqaqioqmesdb4xrlly4zwr7bvayxv2bpg665pqd.onion:50001
tcp://venmrle3xuwkgkd42wg7f735l6cghst3sdfa3w3ryib2rochfhld6lid.onion:50001
ssl://blackie.c3-soft.com:57002
ssl://e.keff.org:50002
tcp://udfpzbte2hommnvag5f3qlouqkhvp3xybhlus2yvfeqdwlhjroe4bbyd.onion:60001
tcp://kittycp2gatrqhlwpmbczk5rblw62enrpo2rzwtkfrrr27hq435d4vid.onion:50001
ssl://horsey.cryptocowboys.net:50002

I can confirm the cause does NOT originate within this library.

The culprit electrum servers are not respecting JSON-RPC request IDs correctly. If I ask for a series of transactions with a given set of request IDs, the server will return responses for those request IDs, but the actual response data might not be mapped to the correct request IDs.

E.g. request with ID 9 asks for TX abcd..., but the response with ID 9 instead contains TX beef... which was requested elsewhere in the batch under a different request ID.

I imagine this is a bug somewhere in the upstream servers' async request handling code. We could easily fix this inside this library by sorting the response TXs by their true TXID. However, if the servers aren't respecting JSON-RPC request IDs, it's hard to know what other properties of JSON-RPC they might be breaking. I didn't test any other batch methods to see if they suffered from the same problem as batch_transaction_get. I think more testing is required here.

The fact that so many servers are apparently broken in this respect is concerning and suggests that maybe a client-side hack is needed, otherwise callers will inevitably reopen issues like this in the future.
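
A client-side reordering along those lines might look roughly like this (a sketch only, not library code; the helper name is made up, and on older rust-bitcoin versions compute_txid() is called txid()):

use std::collections::HashMap;
use bitcoin::{consensus::deserialize, Transaction, Txid};

// Re-pair the raw transactions returned by the server with the txids that
// were actually requested, regardless of the order they arrived in.
fn reorder_by_txid(requested: &[Txid], raw_txs: Vec<Vec<u8>>) -> Option<Vec<Vec<u8>>> {
    let mut by_txid: HashMap<Txid, Vec<u8>> = HashMap::new();
    for raw in raw_txs {
        let tx: Transaction = deserialize(&raw).ok()?;
        by_txid.insert(tx.compute_txid(), raw);
    }
    // Yields None if the server returned a transaction we never asked for,
    // or skipped one we did ask for.
    requested.iter().map(|txid| by_txid.remove(txid)).collect()
}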

conduition (Contributor) commented:
I can confirm that this problem is not limited to batched transaction fetching. It also affects batch_script_get_history. If I had to guess, I'd suspect afflicted servers just don't respect JSON-RPC request IDs for any method.
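
One possible way to test other batch methods is to compare the batched results against individual calls (a sketch; it assumes the script_get_history method and the tx_hash field of its result type, as in the current electrum-client API):

use bitcoin::Script;
use electrum_client::{Client, ElectrumApi};

// Returns true if, for every script, the batched history matches the history
// fetched with an individual call (compared by txid only).
fn batch_history_in_order(client: &Client, scripts: &[&Script]) -> Result<bool, electrum_client::Error> {
    let batched = client.batch_script_get_history(scripts.iter().copied())?;
    for (script, batch_res) in scripts.iter().zip(batched.iter()) {
        let single = client.script_get_history(script)?;
        let single_txids: Vec<_> = single.iter().map(|r| r.tx_hash).collect();
        let batch_txids: Vec<_> = batch_res.iter().map(|r| r.tx_hash).collect();
        if single_txids != batch_txids {
            return Ok(false);
        }
    }
    Ok(true)
}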

johnzweng commented Sep 5, 2024

Hi!

I can provide some new data on this issue 🙂

I also stumbled over this problem, and after some tests I think I can confirm that this is not a server issue but must be happening somewhere on the Rust side of things. Unfortunately my knowledge of Rust is still too limited to spot the error. 🙈

Here is what I tested:

I wrote a simple test program to reproduce this error. It runs in a loop, and every now and then the results come back out of order:

extern crate electrum_client;
use bitcoin::block::Header;
use electrum_client::{Client, ElectrumApi};

fn main() {
    loop {
        let client = Client::new("tcp://bitcoin.aranguren.org:50001").unwrap();

        // random blockheights
        let heights: Vec<u32> = vec![1, 4, 8, 12, 222, 6666];
        let headers: Vec<Header> = client.batch_block_header(heights).unwrap();

        let mut err_counter = 0;
        if headers.get(0).unwrap().time != 1231469665 { // time of block 1
            err_counter += 1;
            println!("error 0");
        }
        if headers.get(1).unwrap().time != 1231470988 { // time of block 4
            err_counter += 1;
            println!("error 1");
        }
        if headers.get(2).unwrap().time != 1231472743 { // time of block 8
            err_counter += 1;
            println!("error 2");
        }
        if headers.get(3).unwrap().time != 1231474888 { // time of block 12
            err_counter += 1;
            println!("error 3");
        }
        if headers.get(4).unwrap().time != 1231770653 { // time of block 222
            err_counter += 1;
            println!("error 4");
        }
        if headers.get(5).unwrap().time != 1236456633 { // time of block 6666
            err_counter += 1;
            println!("error 5");
        }
        println!("Result: {} headers were not at expected index in result vector", err_counter);
        println!("==========================================")
    }
}

I also added some debug println!s here in this block:

for x in resp {
    answer.push(serde_json::from_value(x)?);
}

which I changed to:

for x in resp {
    println!("************************");
    println!("{:#?}", x);
    answer.push(serde_json::from_value(x)?);
}

Additionally, I started Wireshark to monitor the traffic to the Electrum server (that's why I chose a cleartext server).

Finally, I set a breakpoint at the end of my loop so that I could step through it and identify the corresponding TCP connection in my Wireshark window. As soon as I saw the error in the console output, I looked into Wireshark and displayed the last TCP connection with "Follow TCP stream".

And here is the result:
(Screenshot: ElectrumX responses vs. rust-client)

I cannot explain why, but we can definitely see that the server returned the responses with their correct corresponding JSON-RPC ids, yet here in this line the resp ends up in a different order:

let resp = $self.batch_call(&batch)?;

Any ideas, @afilini? 🙂 (My Rust knowledge is still too limited to spot the issue here.)

johnzweng commented Sep 6, 2024

As a short side note: this bug also impacts BDK and leads to incorrect (or at least inconsistent) data being synced into the wallet database.

In detail:
During sync, the .zip(...) call here in this line also assumes that the results are in the same order as the requested block heights.

If they are not, the result of wallet.list_transactions(...) will contain transactions whose confirmation_time (a BlockTime struct) holds inconsistent data: the height does not match the timestamp, which actually belongs to a different block.

pub struct BlockTime {
    /// confirmation block height
    pub height: u32,
    /// confirmation block timestamp
    pub timestamp: u64,
}
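
To illustrate the failure mode (a hypothetical sketch, not BDK's actual sync code; BlockTime is redefined locally here for the example):

use electrum_client::{Client, ElectrumApi};

#[derive(Debug)]
struct BlockTime {
    height: u32,
    timestamp: u64,
}

// Pairs each requested height with the header at the same index. If the
// headers come back out of order, a height silently gets the timestamp of a
// different block.
fn confirmation_times(client: &Client, heights: Vec<u32>) -> Result<Vec<BlockTime>, electrum_client::Error> {
    let headers = client.batch_block_header(heights.clone())?;
    Ok(heights
        .into_iter()
        .zip(headers)
        .map(|(height, header)| BlockTime {
            height,
            timestamp: header.time as u64,
        })
        .collect())
}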

notmandatory (Member) commented:
@johnzweng can you try to reproduce this with the latest 1.0.0-beta.2 release? The syncing has seen major changes, and if it's still broken we'll likely fix it for the 1.0 version first.
