
Strange db performance issue, disk dependent #314

Open
sb10 opened this issue Jul 1, 2020 · 1 comment

sb10 commented Jul 1, 2020

There is a possible db performance issue when running the ehive longmult test with 1500+ clients.

cd /dev/shm/sb10/benchmark
rm -fr /dev/shm/sb10/.wr_development && mkdir -p /dev/shm/sb10/.wr_development && wr manager start
cat /lustre/scratch117/sciops/team117/npg/sb10/ehive/longmult_inputs_10klarge | perl -ne '($am, $bm) = split("\t", $_); print "perl /lustre/scratch117/sciops/team117/npg/sb10/ehive/longmult_step1.pl $am $bm\n"' | wr add -i longmult_step1 --cwd_matters -o 2 -m 200M

This took 3m25s, and progress was nice and consistent.

Using the local scheduler limited to 24 cores, with the db in /tmp:
node-1-2-5 took 2m12s (2mins with ram disk)
vr-server1 [faster SAS disk] took 3m53s (steady progress)
vr-login [old SCSI disk] took too long: step1 was slow (though with speedier bursts), step2/3 ran at variable speed with pauses, and it was still going slowly when I gave up after 8mins

We can just accept that db calls can be slow on slow disks like Lustre.

But what's the problem with vr-server1's ssd? It doesn't actually seem to be slow in terms of db writes. Rather, for some bizarre reason LSF doesn't give us many jobs, because we're constantly switching to new job arrays and emptying out ones that have only run a few jobs. Just changing the db location to /tmp fixes it!

sb10 added the question label Jul 1, 2020

sb10 commented Jul 1, 2020

~/src/go/src/github.com/sb10/tests/boltdb_benchmark/boltdb_benchmark.go
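I haven't pasted boltdb_benchmark.go itself here, but from the output below it does roughly this: one goroutine per core doing single-key db.Update puts, logging the average of every 1000 writes and flagging any individual write that stalls, then reporting the total for 100000 writes. A minimal sketch under those assumptions (the bucket name, key/value layout and the 1s "slow write" threshold are mine, not necessarily what the real file uses):

package main

import (
	"encoding/binary"
	"log"
	"os"
	"runtime"
	"sync"
	"sync/atomic"
	"time"

	"github.com/boltdb/bolt"
)

const totalWrites = 100000

func main() {
	if len(os.Args) < 2 {
		log.Fatal("usage: boltdb_benchmark /path/to/db/file")
	}

	db, err := bolt.Open(os.Args[1], 0600, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// single bucket to hold all benchmark keys (bucket name is an assumption)
	if err := db.Update(func(tx *bolt.Tx) error {
		_, err := tx.CreateBucketIfNotExists([]byte("bench"))
		return err
	}); err != nil {
		log.Fatal(err)
	}

	cores := runtime.NumCPU()
	var (
		wg      sync.WaitGroup
		counter uint64
		mu      sync.Mutex
		batch   time.Duration // accumulated time for the current batch of 1000 writes
	)

	start := time.Now()
	for c := 0; c < cores; c++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			val := make([]byte, 64)
			for {
				i := atomic.AddUint64(&counter, 1)
				if i > totalWrites {
					return
				}
				key := make([]byte, 8)
				binary.BigEndian.PutUint64(key, i)

				t := time.Now()
				err := db.Update(func(tx *bolt.Tx) error {
					return tx.Bucket([]byte("bench")).Put(key, val)
				})
				took := time.Since(t)
				if err != nil {
					log.Fatal(err)
				}

				// flag individual writes that stall, like the 1.66s ones in the output below
				if took > time.Second {
					log.Printf("a single write took %fs", took.Seconds())
				}

				mu.Lock()
				batch += took
				if i%1000 == 0 {
					log.Printf("average time for last 1000 writes: %fs", (batch / 1000).Seconds())
					batch = 0
				}
				mu.Unlock()
			}
		}()
	}
	wg.Wait()

	log.Printf("%d writes using %d cores took %s", totalWrites, cores, time.Since(start))
}

The per-core goroutine count would explain why the reports below say "24 cores" on one machine and "32 cores" on another.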

On vr-2-2-02:

$ /nfs/users/nfs_s/sb10/src/go/src/github.com/sb10/tests/boltdb_benchmark/boltdb_benchmark /tmp/sb10_boltdb
2018/02/23 11:44:32 average time for last 1000 writes: 0.061388s
2018/02/23 11:44:35 average time for last 1000 writes: 0.071500s
[...]
2018/02/23 11:45:07 average time for last 1000 writes: 0.011722s
2018/02/23 11:45:09 a single write took 1.664716s
2018/02/23 11:45:09 a single write took 1.665978s
[...]
2018/02/23 11:45:09 a single write took 1.664879s
2018/02/23 11:45:13 average time for last 1000 writes: 0.143018s
2018/02/23 11:45:18 average time for last 1000 writes: 0.116287s
[...]
2018/02/23 11:45:54 average time for last 1000 writes: 0.123277s
2018/02/23 11:45:58 average time for last 1000 writes: 0.094098s
2018/02/23 11:45:58 average time for last 1000 writes: 0.011980s
2018/02/23 11:45:59 average time for last 1000 writes: 0.011911s
[...]
2018/02/23 11:46:20 100000 writes using 24 cores took 1m50.976081525s

Note the frustrating wild fluctuations in speed over time.

On vr-3-1-11:

$ /nfs/users/nfs_s/sb10/src/go/src/github.com/sb10/tests/boltdb_benchmark/boltdb_benchmark /tmp/sb10_boltdb
2018/02/23 11:50:49 average time for last 1000 writes: 0.010986s
2018/02/23 11:50:49 average time for last 1000 writes: 0.011079s
[...]
2018/02/23 11:51:34 100000 writes using 32 cores took 46.156045093s

/tmp on this machine is consistently fast.

$ /nfs/users/nfs_s/sb10/src/go/src/github.com/sb10/tests/boltdb_benchmark/boltdb_benchmark /data-ssd/neo4j/sb10_boltdb
2018/02/23 11:52:41 average time for last 1000 writes: 0.013596s
2018/02/23 11:52:41 average time for last 1000 writes: 0.013291s
[...]
2018/02/23 11:53:33 100000 writes using 32 cores took 52.115732941s

Hmmm, also consistently fast. I need a better benchmark... Or maybe this was never a disk problem, but some other bug preventing many jobs from running at once when our db is on the ssd?
