
Strange db performance issue, disk dependent #314

Open
sb10 opened this issue Jul 1, 2020 · 1 comment

sb10 commented Jul 1, 2020

There is a possible db performance issue when running the ehive longmult test with 1500+ clients.

cd /dev/shm/sb10/benchmark
rm -fr /dev/shm/sb10/.wr_development && mkdir -p /dev/shm/sb10/.wr_development && wr manager start
cat /lustre/scratch117/sciops/team117/npg/sb10/ehive/longmult_inputs_10klarge | perl -ne '($am, $bm) = split("\t", $_); print "perl /lustre/scratch117/sciops/team117/npg/sb10/ehive/longmult_step1.pl $am $bm\n"' | wr add -i longmult_step1 --cwd_matters -o 2 -m 200M

This took 3m25s, and progress was nice and consistent.

Using the local scheduler limited to 24 cores, with the db in /tmp:
node-1-2-5 took 2m12s (2mins with ram disk)
vr-server1 [faster SAS disk] took 3m53s (steady progress)
vr-login [old SCSI disk] took too long: step1 was slow (though with speedier bursts), step2/3 ran at variable speed with pauses, and it was still going slowly when I gave up after 8mins

We can just accept that db calls can be slow on slow disks like Lustre.

But what's the problem with vr-server1's ssd? It doesn't actually seem to be slow in terms of db writes. Rather, for some bizarre reason LSF doesn't give us many jobs, because we're constantly switching to new job arrays and emptying out ones that have only run a few jobs. Just changing the db location to /tmp fixes it!

sb10 added the question label Jul 1, 2020

sb10 commented Jul 1, 2020

~/src/go/src/github.com/sb10/tests/boltdb_benchmark/boltdb_benchmark.go
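I haven't pasted boltdb_benchmark.go itself here, but from the output below it does roughly this: one goroutine per core doing single-key db.Update puts, logging the average of every 1000 writes and flagging any individual write that stalls, then reporting the total for 100000 writes. A minimal sketch under those assumptions (the bucket name, key/value layout and the 1s "slow write" threshold are mine, not necessarily what the real file uses):

package main

import (
	"encoding/binary"
	"log"
	"os"
	"runtime"
	"sync"
	"sync/atomic"
	"time"

	"github.com/boltdb/bolt"
)

const totalWrites = 100000

func main() {
	if len(os.Args) < 2 {
		log.Fatal("usage: boltdb_benchmark /path/to/db/file")
	}

	db, err := bolt.Open(os.Args[1], 0600, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// single bucket to hold all benchmark keys (bucket name is an assumption)
	if err := db.Update(func(tx *bolt.Tx) error {
		_, err := tx.CreateBucketIfNotExists([]byte("bench"))
		return err
	}); err != nil {
		log.Fatal(err)
	}

	cores := runtime.NumCPU()
	var (
		wg      sync.WaitGroup
		counter uint64
		mu      sync.Mutex
		batch   time.Duration // accumulated time for the current batch of 1000 writes
	)

	start := time.Now()
	for c := 0; c < cores; c++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			val := make([]byte, 64)
			for {
				i := atomic.AddUint64(&counter, 1)
				if i > totalWrites {
					return
				}
				key := make([]byte, 8)
				binary.BigEndian.PutUint64(key, i)

				t := time.Now()
				err := db.Update(func(tx *bolt.Tx) error {
					return tx.Bucket([]byte("bench")).Put(key, val)
				})
				took := time.Since(t)
				if err != nil {
					log.Fatal(err)
				}

				// flag individual writes that stall, like the 1.66s ones in the output below
				if took > time.Second {
					log.Printf("a single write took %fs", took.Seconds())
				}

				mu.Lock()
				batch += took
				if i%1000 == 0 {
					log.Printf("average time for last 1000 writes: %fs", (batch / 1000).Seconds())
					batch = 0
				}
				mu.Unlock()
			}
		}()
	}
	wg.Wait()

	log.Printf("%d writes using %d cores took %s", totalWrites, cores, time.Since(start))
}

The per-core goroutine count would explain why the reports below say "24 cores" on one machine and "32 cores" on another.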

On vr-2-2-02:

$ /nfs/users/nfs_s/sb10/src/go/src/github.com/sb10/tests/boltdb_benchmark/boltdb_benchmark /tmp/sb10_boltdb
2018/02/23 11:44:32 average time for last 1000 writes: 0.061388s
2018/02/23 11:44:35 average time for last 1000 writes: 0.071500s
[...]
2018/02/23 11:45:07 average time for last 1000 writes: 0.011722s
2018/02/23 11:45:09 a single write took 1.664716s
2018/02/23 11:45:09 a single write took 1.665978s
[...]
2018/02/23 11:45:09 a single write took 1.664879s
2018/02/23 11:45:13 average time for last 1000 writes: 0.143018s
2018/02/23 11:45:18 average time for last 1000 writes: 0.116287s
[...]
2018/02/23 11:45:54 average time for last 1000 writes: 0.123277s
2018/02/23 11:45:58 average time for last 1000 writes: 0.094098s
2018/02/23 11:45:58 average time for last 1000 writes: 0.011980s
2018/02/23 11:45:59 average time for last 1000 writes: 0.011911s
[...]
2018/02/23 11:46:20 100000 writes using 24 cores took 1m50.976081525s

Note the frustrating wild fluctuations in speed over time.

On vr-3-1-11:

$ /nfs/users/nfs_s/sb10/src/go/src/github.com/sb10/tests/boltdb_benchmark/boltdb_benchmark /tmp/sb10_boltdb
2018/02/23 11:50:49 average time for last 1000 writes: 0.010986s
2018/02/23 11:50:49 average time for last 1000 writes: 0.011079s
[...]
2018/02/23 11:51:34 100000 writes using 32 cores took 46.156045093s

/tmp on this machine is consistently fast.

$ /nfs/users/nfs_s/sb10/src/go/src/github.com/sb10/tests/boltdb_benchmark/boltdb_benchmark /data-ssd/neo4j/sb10_boltdb
2018/02/23 11:52:41 average time for last 1000 writes: 0.013596s
2018/02/23 11:52:41 average time for last 1000 writes: 0.013291s
[...]
2018/02/23 11:53:33 100000 writes using 32 cores took 52.115732941s

Hmmm, also consistently fast. I need a better benchmark... Or maybe this was never a disk problem, but some other bug preventing many jobs from running at once when our db is on the ssd?
