
fix: fix batch ordering issue #160

Open · wants to merge 2 commits into master from fix_transaction_ordering

Conversation

@marshallyale commented Jan 22, 2025

Fixes issue #75
This is my first open-source pull request, so I apologize for any formatting issues. Additionally, I don't know the repository as well as others, so there may be a better way to implement the fix.

I believe I found the root cause of this. I opened this pull request with a fix, but I'll also copy/paste what I believe is causing the error.

The main issue is in raw_client.rs, inside the recv method implementation (snippet below):

fn recv(
    &self,
    receiver: &Receiver<ChannelMessage>,
    req_id: usize,
) -> Result<serde_json::Value, Error> {
    loop {
        // Try to take the lock on the reader. If we manage to do so, we'll become the reader
        // thread until we get our response
        match self._reader_thread(Some(req_id)) {
            Ok(response) => break Ok(response),
            Err(Error::CouldntLockReader) => {
                match receiver.recv()? {
                    // Received our response, returning it
                    ChannelMessage::Response(received) => break Ok(received),
                    ChannelMessage::WakeUp => {
                        // ...

When recv is first called, self._reader_thread runs. Inside self._reader_thread, if the request id matches the response id, everything works fine. However, if the request id does not match the response id, the following code runs:

Some(resp_id) => {
    // We have an id, but it's not our response. Notify the thread and
    // move on
    trace!("Reader thread received response for {}", resp_id);
    if let Some(sender) = self.waiting_map.lock()?.remove(&resp_id) {
        sender.send(ChannelMessage::Response(resp))?;
    } else {
        warn!("Missing listener for {}", resp_id);
    }
}

The channel the response is sent back on is not unique: every request shares the same sender.clone() and receiver. The only validation performed is checking that the request id is still being waited on in self.waiting_map. As a result, at match receiver.recv()? the waiting caller simply takes whatever response arrives next on the shared channel, with no check that it matches its own request id.
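
As a rough illustration of the approach this PR takes, here is a minimal, self-contained sketch (hypothetical names and a plain String payload, using only std; this is not the actual raw_client.rs code): each request id registers its own channel in the waiting map, and the reader routes every response to the channel registered for that id, so an out-of-order response can never be delivered to the wrong caller.

use std::collections::HashMap;
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

fn main() {
    // waiting_map: request id -> sender of that request's private channel
    let waiting_map: Arc<Mutex<HashMap<usize, mpsc::Sender<String>>>> =
        Arc::new(Mutex::new(HashMap::new()));

    // Register two requests, each with its own channel.
    let mut receivers = Vec::new();
    for req_id in [1usize, 2] {
        let (tx, rx) = mpsc::channel();
        waiting_map.lock().unwrap().insert(req_id, tx);
        receivers.push((req_id, rx));
    }

    // "Reader" thread: responses arrive out of order, but each one is
    // routed to the channel registered for its id.
    let map = Arc::clone(&waiting_map);
    let reader = thread::spawn(move || {
        for (resp_id, resp) in [(2usize, "response for 2"), (1, "response for 1")] {
            if let Some(sender) = map.lock().unwrap().remove(&resp_id) {
                sender.send(resp.to_string()).unwrap();
            }
        }
    });

    // Each caller blocks on its own receiver and always gets its own response.
    for (req_id, rx) in receivers {
        assert_eq!(rx.recv().unwrap(), format!("response for {}", req_id));
    }
    reader.join().unwrap();
    println!("each request received its own response");
}

With per-request channels, the receiver a caller blocks on can only ever yield that caller's own response (or a wake-up), so batch responses that arrive out of order are no longer mismatched.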

This is fixed by implementing a unique channel for every request id, as sketched above. The fix can be verified with the code @johnzweng used to demonstrate the issue:

use bitcoin::block::Header;
use electrum_client::{Client, ElectrumApi};
use std::process;

fn main() {
    let url = "tcp://exs.dyshek.org:50001";
    let client = Client::new(url).unwrap();

    loop {
        let heights: Vec<u32> = vec![1, 4, 8, 12, 222, 6666];
        let headers: Vec<Header> = client.batch_block_header(heights).unwrap();

        let mut err_counter = 0;
        if headers.get(0).unwrap().time != 1231469665 {
            // time of block 1
            err_counter += 1;
            println!("error 0");
            process::exit(1);
        }
        if headers.get(1).unwrap().time != 1231470988 {
            // time of block 4
            err_counter += 1;
            println!("error 1");
            process::exit(1);
        }
        if headers.get(2).unwrap().time != 1231472743 {
            // time of block 8
            err_counter += 1;
            println!("error 2");
            process::exit(1);
        }
        if headers.get(3).unwrap().time != 1231474888 {
            // time of block 12
            err_counter += 1;
            println!("error 3");
            process::exit(1);
        }
        if headers.get(4).unwrap().time != 1231770653 {
            // time of block 222
            err_counter += 1;
            println!("error 4");
            process::exit(1);
        }
        if headers.get(5).unwrap().time != 1236456633 {
            // time of block 6666
            err_counter += 1;
            println!("error 5");
            process::exit(1);
        }
        println!("==========================================\n")
    }
}

If you run this against the original code, it normally errors out after 1-10 cycles. With the fix applied, it runs indefinitely.

@notmandatory (Member) commented

@marshallyale thanks for identifying and working on this. Please rebase your branch since I just merged #159, which changes this project's MSRV and CI checks. Also, you have a few clippy errors that need to be fixed. We'll need some time to get it reviewed, so please be patient with getting this merged. 🙂

@notmandatory added the bug label Jan 23, 2025
marshallyale and others added 2 commits January 24, 2025 11:52
Fixes issue bitcoindevkit#75
The raw client's waiting map was using the same
channel for every request/response. When items were
put back into the channel inside _reader_thread,
the waiting receiver in recv would just take the
next response without validating it against the
request id.
This fixes that by using a unique channel for each
request/response in the waiting map.
@marshallyale force-pushed the fix_transaction_ordering branch from 8c98446 to 2d3f4a5 on January 24, 2025 20:02
@marshallyale (Author) commented

No worries, I should have just rebased and pushed this. I'm working off my forked repo currently so there's no rush on getting any changes synced. Let me know if there's anything else I need to change!

@oleonardolima self-requested a review January 27, 2025 14:33
@oleonardolima (Collaborator) left a comment

Thanks for working on this fix!

I've reviewed the issue and the proposed changes. It seems to be a library problem: the client isn't managing and matching the context between requests and responses, and the responses to a batch call may come back out of order (as allowed by the JSON-RPC spec).

Using an exclusive channel for each request in a batch solves the problem; however, I'm still trying to think through any shortcomings of that approach.
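
To make the ordering point concrete, here is a small standalone sketch (illustrative ids and results, not the library's batching code; it assumes the serde_json crate is available): per the JSON-RPC 2.0 spec a server may return the responses to a batch in any order, so the client has to pair each response with its request by id rather than by position.

use serde_json::Value;
use std::collections::HashMap;

fn main() {
    // Batch responses as they might arrive over the wire, out of order.
    let raw = r#"[
        {"jsonrpc": "2.0", "id": 3, "result": "header-12"},
        {"jsonrpc": "2.0", "id": 1, "result": "header-1"},
        {"jsonrpc": "2.0", "id": 2, "result": "header-4"}
    ]"#;
    let responses: Vec<Value> = serde_json::from_str(raw).unwrap();

    // Index responses by their "id" field, then read the results back in
    // request order regardless of the order they arrived in.
    let by_id: HashMap<u64, &Value> = responses
        .iter()
        .map(|r| (r["id"].as_u64().unwrap(), r))
        .collect();

    for req_id in 1..=3u64 {
        println!("request {} -> {}", req_id, by_id[&req_id]["result"]);
    }
}

The same id-based matching is what the per-request channels in this PR enforce at the transport level.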

Labels: bug · 4 participants