Remove a random subset of validators in net_dynamic_hb #385

RicoGit · 2019-02-26T13:28:02Z

related issue #374

do_drop_and_re_add chooses at least one node at random, removes the chosen nodes from the network, and then re-adds them. The cluster always remains correct: both before and after the validators are removed, the network correctness condition (N ≥ 3f + 1) is satisfied.

The removed nodes can be faulty as well as correct. The test log shows how many nodes are actually removed:

Max number of nodes for removing is 8, was chosen 2 nodes
Will remove and re-add nodes {1, 2}
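The selection described above can be sketched as follows. This is only an illustration under stated assumptions, not the code from this PR: the function signature, the `is_correct` helper, the `min_left` parameter, and the convention that IDs `0..f` are the faulty nodes are all hypothetical, and a tiny xorshift generator stands in for the `rand` crate so that the sketch has no dependencies. It first picks the size of the network after removal, then a faulty count that still satisfies N ≥ 3f + 1, and finally removes the matching number of faulty and correct nodes.

```rust
use std::collections::BTreeSet;

/// Minimal xorshift64 generator (stand-in for the `rand` crate; seed must be non-zero).
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Returns a value in `lo..=hi` (slightly biased, which is fine for a sketch).
    fn in_range(&mut self, lo: usize, hi: usize) -> usize {
        lo + (self.next_u64() % ((hi - lo + 1) as u64)) as usize
    }

    /// Picks `k` distinct items using a partial Fisher-Yates shuffle.
    fn pick(&mut self, mut items: Vec<usize>, k: usize) -> Vec<usize> {
        for i in 0..k {
            let j = self.in_range(i, items.len() - 1);
            items.swap(i, j);
        }
        items.truncate(k);
        items
    }
}

/// Hypothetical helper: `n` nodes tolerate `f` faulty ones iff n >= 3f + 1.
fn is_correct(n: usize, f: usize) -> bool {
    n >= 3 * f + 1
}

/// Hypothetical sketch: chooses a non-empty set of node IDs to remove so that
/// at least `min_left` nodes remain and the remaining network is still correct.
/// Assumption: IDs `0..f` are faulty, IDs `f..n` are correct.
fn subset_for_remove(rng: &mut XorShift, n: usize, f: usize, min_left: usize) -> BTreeSet<usize> {
    assert!(is_correct(n, f), "initial network must be correct");
    assert!(min_left >= 1 && min_left < n, "at least one node must be removable");

    // Size of the network after removal: at least one node goes away.
    let new_n = rng.in_range(min_left, n - 1);
    // We cannot remove more faulty nodes than we remove nodes in total...
    let min_new_f = f.saturating_sub(n - new_n);
    // ...and the remaining faulty nodes must satisfy new_n >= 3 * new_f + 1.
    let max_new_f = f.min((new_n - 1) / 3);
    let new_f = rng.in_range(min_new_f, max_new_f);

    let remove_faulty = rng.pick((0..f).collect(), f - new_f);
    let remove_correct = rng.pick((f..n).collect(), (n - f) - (new_n - new_f));
    remove_faulty.into_iter().chain(remove_correct).collect()
}

fn main() {
    let mut rng = XorShift(0x2545_F491_4F6C_DD1D);
    for _ in 0..3 {
        let removed = subset_for_remove(&mut rng, 10, 3, 2);
        println!("Will remove and re-add nodes {:?}", removed);
    }
}
```

Choosing the target sizes first avoids rejection sampling: the range for the new faulty count is never empty as long as the initial network is correct. Passing `min_left = 2` mirrors the "at least two nodes" restriction discussed later in this thread.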

afck

Looks good so far! 👍
I'd just like to make the test slightly more general (see comment at subset_for_remove).

tests/net/mod.rs

tests/net_dynamic_hb.rs

afck

Looks good to me! (Just one more nit-pick below.)

tests/net_dynamic_hb.rs

RicoGit · 2019-02-27T12:03:59Z

Running tests on the CI server caught an error:

thread 'drop_and_re_add' panicked at 'crank: node failed to process step: Fault { reported_by: 3, faulty_id: 7, fault_kind: HbFault(SubsetFault(BaFault(DuplicateBVal))) }', libcore/result.rs:1009:5

Trying to reproduce the error locally I caught another error:

thread 'drop_and_re_add' panicked at 'crank: network queue empty', src/libcore/option.rs:1008:5

The second error appears when only one node is left in the network after the removal. I'm trying to figure out what's wrong.

vkomenda · 2019-02-27T12:25:38Z

@RicoGit, to reproduce locally, add the proptest seed from the failing test run on Travis to your local seeds.

RicoGit · 2019-02-27T12:36:16Z

Should I add a static test case for the failing inputs, like this?

#[test]
fn drop_and_re_add_one_node_left() {
    // Fixed inputs taken from the failing run: 4 nodes, none of them faulty.
    let cfg = TestConfig {
        dimension: NetworkDimension::new(4, 0),
        total_txs: 20,
        batch_size: 10,
        contribution_size: 1,
        seed: [151, 234, 50, 31, 109, 65, 28, 122, 82, 93, 226, 143, 185, 27, 195, 133],
    };
    do_drop_and_re_add(cfg)
}

vkomenda · 2019-02-27T12:38:27Z

No, just add the seed to net_dynamic_hb.proptest-regressions.

vkomenda · 2019-02-27T12:40:14Z

This one: https://travis-ci.org/poanetwork/hbbft/jobs/499202391#L2980.

RicoGit · 2019-02-27T14:01:30Z

I've got it. Awesome! Thanks! But the local test (with the seed from CI) produces a different error.

vkomenda · 2019-02-27T14:08:15Z

Have a look at crank error variants if your local failure is 'crank: network queue empty'. Check that the removed nodes finished sending all their messages before removal.

afck · 2019-02-27T14:58:18Z

Does the second error always happen whenever we're left with only a single node?
That one may be related to the test (or the test net framework) and not the production code: Of course there are no messages in the queue if there's only one node. A single node will always immediately deliver a batch whenever we provide input.
If it's difficult to make the test handle that case, I'd be happy with making that a TODO for a separate PR, and restricting the test to at least two nodes for now.

The first error is strange, though. We are a bit aggressive with our fault reports and sending the same BVal twice wouldn't break anything in production; but it still shouldn't happen in theory… 🤔

RicoGit · 2019-02-27T15:29:01Z

"restricting the test to at least two nodes for now." - Yes! That fixed all the tests. Thanks! I've run the test about ten times for all known seeds.
"I'd be happy with making that a TODO" - I'll open an issue for that, OK?

afck · 2019-02-27T15:46:27Z

Great, thanks!
@vkomenda: Feel free to merge once you're happy with it.

vkomenda · 2019-02-27T15:53:07Z

@afck, are you OK with versioning tests/net_dynamic_hb.proptest-regressions? It's reasonable but wasn't done before.

afck · 2019-02-27T16:16:14Z

Ah, right, that file should be removed, since with a single node it always fails anyway. Good catch!

RicoGit added 6 commits February 26, 2019 16:48

Choose pivot node at random

49f44ff

Choose random number of nodes for removing in net_dynamic_hb test

3be837d

Docs and code small fixes

8e71da6

clippy fix

d58368e

Cargo fmt for stable toolchain and add rust-toolchain file as well

d082f69

Remove rust-toolchain file

884fd20

afck reviewed Feb 26, 2019

Fix grammar and improve selecting nodes for removing

03439c8

vkomenda reviewed Feb 26, 2019

tests/net_dynamic_hb.rs

afck reviewed Feb 27, 2019

tests/net_dynamic_hb.rs

Simplify selecting nodes for remove

5ec2945

afck approved these changes Feb 27, 2019

tests/net_dynamic_hb.rs (outdated)

Fix tests

2124b71

RicoGit mentioned this pull request Feb 27, 2019

Let net_dynamic_hb success with single node network. #386

Open

Remove net_dynamic_hb.proptest-regressions file

dc972a6

vkomenda merged commit 3336fa7 into poanetwork:master Feb 27, 2019

RicoGit deleted the net_dyn_hb_improvement branch February 27, 2019 17:19
