
fix: fix batch ordering issue #160

Open · wants to merge 2 commits into master from fix_transaction_ordering

Conversation

@marshallyale commented Jan 22, 2025

Fixes issue #75
This is my first open-source pull request, so I apologize for any formatting issues. Additionally, I don't know the repository as well as others, so there may be a better way to implement the fix.

I believe I found the root cause of this. I opened this pull request with a fix, but I'll also copy/paste what I believe is causing the error.

The main issue is in raw_client.rs, inside the recv method implementation (snippet below):

fn recv(
    &self,
    receiver: &Receiver<ChannelMessage>,
    req_id: usize,
) -> Result<serde_json::Value, Error> {
    loop {
        // Try to take the lock on the reader. If we manage to do so, we'll become the reader
        // thread until we get our response
        match self._reader_thread(Some(req_id)) {
            Ok(response) => break Ok(response),
            Err(Error::CouldntLockReader) => {
                match receiver.recv()? {
                    // Received our response, returning it
                    ChannelMessage::Response(received) => break Ok(received),
                    ChannelMessage::WakeUp => {
                        // ...

When recv is first called, self._reader_thread runs. Inside self._reader_thread, if the request id matches the response id, everything works fine. However, if the request id does not match the response id, the following code runs:

Some(resp_id) => {
    // We have an id, but it's not our response. Notify the thread and
    // move on
    trace!("Reader thread received response for {}", resp_id);
    if let Some(sender) = self.waiting_map.lock()?.remove(&resp_id) {
        sender.send(ChannelMessage::Response(resp))?;
    } else {
        warn!("Missing listener for {}", resp_id);
    }
}

The channel the response is sent back on is not unique: every request shares the same sender.clone() and receiver. The only validation performed is checking that the request id is still being waited on in self.waiting_map. As a result, at match receiver.recv()? the waiting caller simply takes whatever response arrives next on the shared channel, with no check that it matches its own request id.
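
As a rough illustration of the approach this PR takes, here is a minimal, self-contained sketch (hypothetical names and a plain String payload, using only std; this is not the actual raw_client.rs code): each request id registers its own channel in the waiting map, and the reader routes every response to the channel registered for that id, so an out-of-order response can never be delivered to the wrong caller.

use std::collections::HashMap;
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

fn main() {
    // waiting_map: request id -> sender of that request's private channel
    let waiting_map: Arc<Mutex<HashMap<usize, mpsc::Sender<String>>>> =
        Arc::new(Mutex::new(HashMap::new()));

    // Register two requests, each with its own channel.
    let mut receivers = Vec::new();
    for req_id in [1usize, 2] {
        let (tx, rx) = mpsc::channel();
        waiting_map.lock().unwrap().insert(req_id, tx);
        receivers.push((req_id, rx));
    }

    // "Reader" thread: responses arrive out of order, but each one is
    // routed to the channel registered for its id.
    let map = Arc::clone(&waiting_map);
    let reader = thread::spawn(move || {
        for (resp_id, resp) in [(2usize, "response for 2"), (1, "response for 1")] {
            if let Some(sender) = map.lock().unwrap().remove(&resp_id) {
                sender.send(resp.to_string()).unwrap();
            }
        }
    });

    // Each caller blocks on its own receiver and always gets its own response.
    for (req_id, rx) in receivers {
        assert_eq!(rx.recv().unwrap(), format!("response for {}", req_id));
    }
    reader.join().unwrap();
    println!("each request received its own response");
}

With per-request channels, the receiver a caller blocks on can only ever yield that caller's own response (or a wake-up), so batch responses that arrive out of order are no longer mismatched.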

This is fixed by implementing a unique channel for every request id, as sketched above. The fix can be verified with the code @johnzweng used to demonstrate the issue:

use bitcoin::block::Header;
use electrum_client::{Client, ElectrumApi};
use std::process;

fn main() {
    let url = "tcp://exs.dyshek.org:50001";
    let client = Client::new(url).unwrap();

    loop {
        let heights: Vec<u32> = vec![1, 4, 8, 12, 222, 6666];
        let headers: Vec<Header> = client.batch_block_header(heights).unwrap();

        let mut err_counter = 0;
        if headers.get(0).unwrap().time != 1231469665 {
            // time of block 1
            err_counter += 1;
            println!("error 0");
            process::exit(1);
        }
        if headers.get(1).unwrap().time != 1231470988 {
            // time of block 4
            err_counter += 1;
            println!("error 1");
            process::exit(1);
        }
        if headers.get(2).unwrap().time != 1231472743 {
            // time of block 8
            err_counter += 1;
            println!("error 2");
            process::exit(1);
        }
        if headers.get(3).unwrap().time != 1231474888 {
            // time of block 12
            err_counter += 1;
            println!("error 3");
            process::exit(1);
        }
        if headers.get(4).unwrap().time != 1231770653 {
            // time of block 222
            err_counter += 1;
            println!("error 4");
            process::exit(1);
        }
        if headers.get(5).unwrap().time != 1236456633 {
            // time of block 6666
            err_counter += 1;
            println!("error 5");
            process::exit(1);
        }
        println!("==========================================\n")
    }
}

If you run this against the original code, it normally errors out after 1-10 cycles. With the fix applied, it runs indefinitely.

@notmandatory (Member) commented

@marshallyale thanks for identifying and working on this. Please rebase your branch since I just merged #159, which changes this project's MSRV and CI checks. Also, you have a few clippy errors that need to be fixed. We'll need some time to get it reviewed, so please be patient with getting this merged. 🙂

@notmandatory added the bug label Jan 23, 2025
marshallyale and others added 2 commits January 24, 2025 11:52
Fixes issue bitcoindevkit#75
The raw client's waiting map was using the same
channel for every request/response. When items were
put back into the channel inside _reader_thread,
the waiting receiver in recv would just take the
next response without validating it against the
request id.
This fixes that by using a unique channel for each
request/response in the waiting map.
@marshallyale force-pushed the fix_transaction_ordering branch from 8c98446 to 2d3f4a5 on January 24, 2025 20:02
@marshallyale (Author) commented

No worries, I should have just rebased and pushed this. I'm working off my forked repo currently so there's no rush on getting any changes synced. Let me know if there's anything else I need to change!

@oleonardolima self-requested a review January 27, 2025 14:33
@oleonardolima (Collaborator) left a comment

Thanks for working on this fix!

I've reviewed the issue and the proposed changes. It seems to be a library problem: the client isn't managing and matching the context between requests and responses, and the responses to a batch call may come back out of order (as allowed by the JSON-RPC spec).

Using an exclusive channel for each request in a batch solves the problem; however, I'm still trying to think through any shortcomings of that approach.
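
To make the ordering point concrete, here is a small standalone sketch (illustrative ids and results, not the library's batching code; it assumes the serde_json crate is available): per the JSON-RPC 2.0 spec a server may return the responses to a batch in any order, so the client has to pair each response with its request by id rather than by position.

use serde_json::Value;
use std::collections::HashMap;

fn main() {
    // Batch responses as they might arrive over the wire, out of order.
    let raw = r#"[
        {"jsonrpc": "2.0", "id": 3, "result": "header-12"},
        {"jsonrpc": "2.0", "id": 1, "result": "header-1"},
        {"jsonrpc": "2.0", "id": 2, "result": "header-4"}
    ]"#;
    let responses: Vec<Value> = serde_json::from_str(raw).unwrap();

    // Index responses by their "id" field, then read the results back in
    // request order regardless of the order they arrived in.
    let by_id: HashMap<u64, &Value> = responses
        .iter()
        .map(|r| (r["id"].as_u64().unwrap(), r))
        .collect();

    for req_id in 1..=3u64 {
        println!("request {} -> {}", req_id, by_id[&req_id]["result"]);
    }
}

The same id-based matching is what the per-request channels in this PR enforce at the transport level.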

Labels: bug · 4 participants