Do you ensure a large difference between splits? #3

Sandy4321 · 2020-06-14T17:12:54Z

Important question:
Could you clarify whether you use any special algorithm when generating the many test data / train data splits

to get a large difference between splits?
Some splits may differ only negligibly, for example for

samples: 1 2 3 4 5 6 7 8 9 10 11 12

good split
split 1: test data 1 2 3 4 5 6, train data 7 8 9 10 11 12
split 2: test data 1 2 3 7 8 9, train data 4 5 6 10 11 12
minimum difference between the sets is 3 samples

bad split
split 1: test data 1 2 3 4 5 6, train data 7 8 9 10 11 12
split 2: test data 1 2 3 4 5 7, train data 6 8 9 10 11 12
minimum difference between the sets is 1 sample
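
The "minimum difference" in the example above can be made concrete with a small sketch. The helper names `split_difference` and `min_split_difference` are hypothetical (they are not part of this repo), and the sketch assumes that "difference" means the number of test-set samples that appear in one split but not in the other.

```r
# Hypothetical helper: number of samples in test_a that are not in test_b.
# For test sets of equal size this count is symmetric.
split_difference <- function(test_a, test_b) {
  length(setdiff(test_a, test_b))
}

# Smallest pairwise difference over a list of at least two test sets.
min_split_difference <- function(test_sets) {
  pairs <- combn(length(test_sets), 2)  # every pair of split indices
  min(apply(pairs, 2, function(p) {
    split_difference(test_sets[[p[1]]], test_sets[[p[2]]])
  }))
}

good <- list(c(1, 2, 3, 4, 5, 6), c(1, 2, 3, 7, 8, 9))
bad  <- list(c(1, 2, 3, 4, 5, 6), c(1, 2, 3, 4, 5, 7))

min_split_difference(good)  # 3
min_split_difference(bad)   # 1
```

Because each train set here is the complement of its test set, comparing the test sets is enough: the train sets differ by the same number of samples.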

Sandy4321 · 2020-06-14T18:12:25Z

My guess is that it happens here:
d_test_list[[i]] <- d1_test[sample(1:nrow(d1_test), size_test),]

Are the subsamples generated many times with only small differences between them?

szilard · 2020-06-14T20:04:53Z

Yes, those are random samples. With relatively large sample sizes, those idiosyncrasies do not matter.
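
A rough way to check this is to draw several random subsamples without replacement and count how much each pair differs. The sketch below is an illustration only: `n_pool`, `size_test`, and `n_splits` are hypothetical values, not the ones used in this repo, and `n_pool` merely stands in for `nrow(d1_test)`.

```r
set.seed(1)

n_pool    <- 100000  # hypothetical pool size (stands in for nrow(d1_test))
size_test <- 10000   # hypothetical subsample size
n_splits  <- 20      # hypothetical number of random subsamples

# Draw the row indices the same way as the line quoted in the thread:
# sampling without replacement from the pool.
test_idx <- lapply(seq_len(n_splits), function(i) {
  sample(seq_len(n_pool), size_test)
})

# Number of rows present in one subsample but not the other, for every pair.
pairs <- combn(n_splits, 2)
diffs <- apply(pairs, 2, function(p) {
  length(setdiff(test_idx[[p[1]]], test_idx[[p[2]]]))
})

# Two independent subsamples share size_test^2 / n_pool rows on average,
# so the expected number of differing rows is:
expected_diff <- size_test - size_test^2 / n_pool  # 9000

range(diffs)  # observed smallest and largest pairwise difference
```

Under these assumed sizes, every pair differs in roughly 90% of its rows, so a near-duplicate pair like the "bad split" example is extremely unlikely. When `size_test` is close to `n_pool`, the subsamples necessarily overlap heavily and this argument weakens.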

Sandy4321 · 2020-06-15T14:37:10Z

I see, thanks. But my guess is that you are only assuming this and have not tested it by experiment.
