Do you take care to ensure large differences between splits? #3

Open
Sandy4321 opened this issue Jun 14, 2020 · 3 comments

Comments

@Sandy4321

Important question:
Could you clarify whether you use a special algorithm when generating the many test/train data splits, to ensure the splits differ substantially from one another? Some splits may differ only negligibly, for example:

Samples: 1 2 3 4 5 6 7 8 9 10 11 12

Good splits:
test data: 1 2 3 4 5 6; train data: 7 8 9 10 11 12
test data: 1 2 3 7 8 9; train data: 4 5 6 10 11 12
Minimum difference between any two test sets: 3 samples

Bad splits:
test data: 1 2 3 4 5 6; train data: 7 8 9 10 11 12
test data: 1 2 3 4 5 7; train data: 6 8 9 10 11 12
Minimum difference between any two test sets: 1 sample
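The "minimum difference" above can be read as the number of samples in one test set that are missing from the other, minimized over all pairs of test sets. A minimal sketch in R, assuming all test sets have the same size; `min_split_difference` is a hypothetical helper, not part of the project:

```r
# Hypothetical helper (not from the repository): given a list of test-index
# vectors of equal size, return the smallest number of samples by which
# any two of them differ.
min_split_difference <- function(test_sets) {
  pairs <- combn(length(test_sets), 2)  # every pair of test-set indices
  diffs <- apply(pairs, 2, function(p) {
    # samples in the first test set that are absent from the second
    length(setdiff(test_sets[[p[1]]], test_sets[[p[2]]]))
  })
  min(diffs)
}

good <- list(1:6, c(1, 2, 3, 7, 8, 9))
bad  <- list(1:6, c(1, 2, 3, 4, 5, 7))
min_split_difference(good)  # returns 3
min_split_difference(bad)   # returns 1
```

Because the sets have equal size, counting `setdiff` in one direction gives the same number as in the other direction.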

@Sandy4321
Author

My guess is that it happens here:
d_test_list[[i]] <- d1_test[sample(1:nrow(d1_test), size_test),]

So the subsamples are generated many times and may differ only slightly from one another?

@szilard
Owner

szilard commented Jun 14, 2020

Yes, those are random samples. With relatively large sample sizes, those idiosyncrasies do not matter.
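The sampling line quoted earlier in the thread can be simulated to see how much random test sets overlap. This is a minimal sketch, not code from the repository: `n`, `size_test` and `n_splits` are hypothetical values. For two independent random samples of k rows out of n, each row of one sample falls in the other with probability k/n, so the expected overlap is k^2/n. With k = 10,000 and n = 100,000 that is about 1,000 shared rows, so any two test sets still differ by roughly 9,000 rows. Note that this only measures overlap between index sets; it does not measure how such overlap affects model metrics.

```r
set.seed(42)

n <- 100000         # hypothetical number of rows in d1_test
size_test <- 10000  # hypothetical value of size_test
n_splits <- 10      # hypothetical number of resampled test sets

# Draw test-row indices the same way as sample(1:nrow(d1_test), size_test)
test_sets <- lapply(seq_len(n_splits), function(i) sample(seq_len(n), size_test))

# Number of shared rows for every pair of test sets
pairs <- combn(n_splits, 2)
overlaps <- apply(pairs, 2, function(p) {
  length(intersect(test_sets[[p[1]]], test_sets[[p[2]]]))
})

expected_overlap <- size_test^2 / n         # k^2 / n, i.e. 1000 here
min_difference <- size_test - max(overlaps)  # closest pair of test sets

cat("expected overlap:", expected_overlap, "\n")
cat("mean observed overlap:", mean(overlaps), "\n")
cat("smallest difference between any two test sets:", min_difference, "\n")
```

Increasing `n` while keeping `size_test / n` fixed leaves the overlap fraction unchanged, but makes the near-duplicate "bad split" case from the question increasingly unlikely.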

@Sandy4321
Author

I see, thanks. But my guess is that you are only assuming this and have not tested it by experiment.
